To use a remote Hadoop cluster with Kerberos authentication you will need to get a proper krb5.conf file (e.g. from your remote cluster's /etc/krb5.conf) and place it at /etc/krb5.conf on your client OS X machine. To use this configuration from your OS X Hadoop client, change your .[z]profile to:
export HADOOP_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
export YARN_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
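For reference, a minimal krb5.conf typically looks like the following (EXAMPLE.COM and kdc.example.com are placeholders; your site's file will likely define additional options, which is why you should copy it from the cluster rather than write it by hand):

[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com
    admin_server = kdc.example.com
  }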
With Java 1.7 this should be sufficient to detect the default realm, the KDC, and any site-specific authentication options. Make sure the Kerberos configuration is already in place when you obtain your ticket with
$ kinit
If you obtained a ticket beforehand, you may have to run kinit again or log in to your local account again.
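To verify that the configuration is picked up, you can check your ticket cache (the principal below is just an illustration; use your own):

$ kdestroy                   # drop any stale ticket obtained before krb5.conf was in place
$ kinit dirkd@EXAMPLE.COM    # authenticate against your site's KDC
$ klist                      # should show a valid krbtgt/EXAMPLE.COM ticket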
For the next step you will need to obtain the remote cluster configuration files (e.g. scp the config files from the remote cluster to a local directory such as ~/remote-hadoop-conf). The result should be a local copy similar to this:
$ ls -l ~/remote-hadoop-conf
total 184
-rw-r--r-- 1 dirkd staff 4146 Jun 25 2013 capacity-scheduler.xml
-rw-r--r-- 1 dirkd staff 4381 Oct 21 11:44 core-site.xml
-rw-r--r-- 1 dirkd staff 253 Aug 21 11:46 dfs.includes
-rw-r--r-- 1 dirkd staff 0 Jun 25 2013 excludes
-rw-r--r-- 1 dirkd staff 896 Dec 1 11:44 hadoop-env.sh
-rw-r--r-- 1 dirkd staff 3251 Aug 5 09:50 hadoop-metrics.properties
-rw-r--r-- 1 dirkd staff 4214 Oct 7 2013 hadoop-policy.xml
-rw-r--r-- 1 dirkd staff 7283 Nov 3 16:44 hdfs-site.xml
-rw-r--r-- 1 dirkd staff 8713 Nov 18 16:26 log4j.properties
-rw-r--r-- 1 dirkd staff 6112 Nov 5 16:52 mapred-site.xml
-rw-r--r-- 1 dirkd staff 253 Aug 21 11:46 mapred.includes
-rw-r--r-- 1 dirkd staff 127 Apr 4 2014 taskcontroller.cfg
-rw-r--r-- 1 dirkd staff 931 Oct 20 09:44 topology.table.file
-rw-r--r-- 1 dirkd staff 70 Jul 2 11:52 yarn-env.sh
-rw-r--r-- 1 dirkd staff 5559 Nov 5 16:52 yarn-site.xml
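One way to obtain such a copy, assuming the configuration lives in /etc/hadoop/conf on a gateway host (both the hostname and the path below are placeholders, adjust them for your site):

$ mkdir -p ~/remote-hadoop-conf
$ scp 'dirkd@gateway.example.com:/etc/hadoop/conf/*' ~/remote-hadoop-conf/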
Then point your hadoop and hdfs commands at this configuration:
$ hdfs --config ~/remote-hadoop-conf dfs -ls /
If all worked well, you should now see the contents of the remote HDFS root directory and be ready to use the standard hdfs and hadoop commands remotely.
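If you use the remote cluster regularly, you can export HADOOP_CONF_DIR once instead of passing --config on every invocation:

$ export HADOOP_CONF_DIR=~/remote-hadoop-conf
$ hdfs dfs -ls /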