Connect to a remote, Kerberized Hadoop cluster

To use a remote Hadoop cluster with Kerberos authentication you will need a proper krb5.conf file (e.g. from your remote cluster's /etc/krb5.conf) and place it at /etc/krb5.conf on your client OS X machine. To use this configuration from your OS X Hadoop client, add the following to your .[z]profile:
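For orientation, a minimal krb5.conf typically looks like the sketch below. The realm and host names here are placeholders, not values from any real cluster; always prefer the file shipped by your cluster administrators over writing one by hand.

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```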

export HADOOP_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
export YARN_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"

With Java 1.7 this should be sufficient to detect the default realm, the KDC, and any site-specific authentication options. Make sure the Kerberos configuration is already in place before you obtain your ticket with

$ kinit

If you obtained a ticket beforehand, you may have to run kinit again or log in to your local account again.

For the next step you will need the remote cluster's configuration files (e.g. scp them from the remote cluster to a local directory such as ~/remote-hadoop-conf). The result should be a local copy similar to this:

$ ls -l  ~/remote-hadoop-conf

total 184
-rw-r--r--  1 dirkd  staff  4146 Jun 25  2013 capacity-scheduler.xml
-rw-r--r--  1 dirkd  staff  4381 Oct 21 11:44 core-site.xml
-rw-r--r--  1 dirkd  staff   253 Aug 21 11:46 dfs.includes
-rw-r--r--  1 dirkd  staff     0 Jun 25  2013 excludes
-rw-r--r--  1 dirkd  staff   896 Dec  1 11:44 hadoop-env.sh
-rw-r--r--  1 dirkd  staff  3251 Aug  5 09:50 hadoop-metrics.properties
-rw-r--r--  1 dirkd  staff  4214 Oct  7  2013 hadoop-policy.xml
-rw-r--r--  1 dirkd  staff  7283 Nov  3 16:44 hdfs-site.xml
-rw-r--r--  1 dirkd  staff  8713 Nov 18 16:26 log4j.properties
-rw-r--r--  1 dirkd  staff  6112 Nov  5 16:52 mapred-site.xml
-rw-r--r--  1 dirkd  staff   253 Aug 21 11:46 mapred.includes
-rw-r--r--  1 dirkd  staff   127 Apr  4  2014 taskcontroller.cfg
-rw-r--r--  1 dirkd  staff   931 Oct 20 09:44 topology.table.file
-rw-r--r--  1 dirkd  staff    70 Jul  2 11:52 yarn-env.sh
-rw-r--r--  1 dirkd  staff  5559 Nov  5 16:52 yarn-site.xml
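The copy step above can be sketched as follows; the gateway host name and the remote configuration path are assumptions and will differ per site.

```shell
# create a local directory for the remote cluster configuration
mkdir -p ~/remote-hadoop-conf

# copy the config files over (host and remote path are assumptions;
# adjust both to match your cluster)
scp 'namenode.example.com:/etc/hadoop/conf/*' ~/remote-hadoop-conf/
```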

Then point your hadoop and hdfs commands to this configuration:

$ hdfs --config ~/remote-hadoop-conf dfs -ls /

If all went well, you should now see the contents of the remote HDFS root directory and be ready to use the standard hdfs and hadoop commands remotely.
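If typing --config on every invocation gets tedious, the standard HADOOP_CONF_DIR environment variable achieves the same effect once per shell session; the directory name below is the one used above.

```shell
# point all hadoop/hdfs invocations at the remote configuration
export HADOOP_CONF_DIR=~/remote-hadoop-conf

# from now on, plain commands talk to the remote cluster, e.g.:
#   hdfs dfs -ls /
#   hadoop fs -du -h /user
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

Putting the export into your .[z]profile alongside HADOOP_OPTS and YARN_OPTS makes the setting permanent.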
