Why would I do this?
An OSX laptop will not let you do any larger-scale data processing, but it can
be a convenient place to develop and debug hadoop scripts before running them on
a real cluster. For this you likely want a local hadoop “cluster” to play
with, and to use the local commands as a client for a larger remote hadoop
cluster. This post covers the local install and basic testing. A second post
shows how to extend the setup for accessing and processing data against a remote
cluster.
If you don’t yet have java (Yosemite does not actually come with it), the
first step is to download the installer from the Oracle download site. Once it
is installed, a terminal shell should show something like:
$ java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
If you need to have several java versions installed and want to be able to
switch between them, have a look at the nice description here.
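On OSX the bundled `/usr/libexec/java_home` helper is handy for this; a minimal sketch for a `.[z]profile` entry, assuming a 1.7 JDK is among those installed (the version string is an example, not taken from this post):

```shell
# list the JDKs known to the system
/usr/libexec/java_home -V

# pin the current shell to a specific major version
export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)
java -version
```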
If you don’t yet have the homebrew package manager installed, get it now
by following the (one line) installation on http://brew.sh. Homebrew packages
live under /usr/local and rarely interfere with other stuff on your machine
(unless you ask them to). Install the hadoop package as a normal user with
brew install hadoop.
(At the time of writing I got hadoop 2.5.1)
BTW: once you start using brew for other packages as well, be careful when
running brew upgrade. E.g. you may want to use
brew pin to avoid getting a new
hadoop version installed as a side effect of other package upgrades.
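A short sketch of how pinning fits into the upgrade workflow:

```shell
# freeze hadoop at the currently installed version
brew pin hadoop

# later blanket upgrades now leave pinned formulae alone
brew upgrade

# undo the pin when you actually want a newer hadoop
brew unpin hadoop
```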
Next stop: edit a few config files. In
.[z]profile you may want to add a few
shortcuts to quickly jump to the relevant places, or to be able to switch
between hadoop and java versions, but this is not strictly required to run hadoop.
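For example, something along these lines could go into .[z]profile. The paths reflect a typical brew install of hadoop 2.5.1 and the alias name is made up here; adjust both to your setup:

```shell
# convenience variables for a brew-installed hadoop
export HADOOP_VERSION=2.5.1
export HADOOP_HOME=/usr/local/Cellar/hadoop/$HADOOP_VERSION/libexec
export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)

# quickly jump to the hadoop configuration directory
alias cdhconf='cd $HADOOP_HOME/etc/hadoop'
```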
Now you should edit a few hadoop files in your hadoop configuration directory.
In core-site.xml, expand the configuration to set a base temporary directory
(its stock description reads
<description>A base for other temporary directories.</description>)
and the default file system; in hdfs-site.xml, set the replication factor;
and finally, in mapred-site.xml, point at the local job tracker.
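A minimal pseudo-distributed sketch of the three files, using the stock single-node property names and localhost ports; the exact values (ports, tmp path) are assumptions rather than taken from this post:

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>
```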
Now it's time to:
$ hadoop namenode -format
## start the hadoop daemons (move to a launchd plist to do this automatically)
$ start-dfs.sh    # found under $HADOOP_HOME/sbin if not on your PATH
$ start-yarn.sh
## show (still empty) homedir in hdfs
$ hdfs dfs -ls
## put some local file
$ hdfs dfs -put myfile.txt
## now we should see the new file
$ hdfs dfs -ls
Work around an annoying Kerberos realm problem on OSX
The hadoop setup will at this point likely still complain with a message
Unable to load realm info from SCDynamicStore, which is caused by a java
bug on OSX (more details here).
There are different ways to work around this, depending on whether you just
want to get a local hadoop installation going or need your hadoop client to
(also) access a remote kerberized hadoop cluster.
To get java running on the local (non-kerberized) setup, it is
sufficient to just add some definitions to
.[z]profile as described in this post.
The actual hostname probably does not matter too much, as you won’t do an
actual kerberos exchange locally, but just get past the flawed
“do we know a default realm” check in java.
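One commonly described form of this workaround is to hand java a dummy realm and KDC via HADOOP_OPTS in .[z]profile; the empty values below are placeholders that merely satisfy the check, since no real kerberos exchange happens locally:

```shell
# work around "Unable to load realm info from SCDynamicStore" on OSX
# dummy realm/kdc: sufficient for a local, non-kerberized setup
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
```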
In case you are planning to access a kerberized hadoop cluster
please continue reading the next post.
Some of the default logging settings make hadoop rather chatty on the console
about deprecated configuration keys and other things. On OSX there are a few
items that become a nag after a while, as they make it harder to spot real
problems. You may want to adjust the
log4j settings to mute warnings that you
don’t want to see every single time you enter a hadoop command. In
$HADOOP_HOME/etc/hadoop/log4j.properties you could add:
# Logging Threshold
# the native libs don't exist for OSX
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
# yes, we'll keep in mind that some things are deprecated
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=ERROR