The latest Hadoop can be installed on Mac OS X in various ways, for example with Homebrew. If you prefer not to use Homebrew, this guide walks through setting up pseudo-distributed mode on a single-node cluster.
1. Required software
1) Java
Run the following command in a terminal:
$ java -version
If Java is already installed, you will see output similar to:
$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
If not, the terminal will prompt you to install it, or you can download the Java JDK from Oracle's website.
2) SSH
First, enable Remote Login in System Preferences -> Sharing.
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
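If key-based login still fails, a common cause is file permissions on the key files. Assuming the default ~/.ssh location, a typical fix is:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
After that, ssh localhost should log you in without a passphrase.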
2. Get a Hadoop distribution
You can download a release from an Apache download mirror.
3. Prepare to start the Hadoop cluster
1) Unpack the downloaded Hadoop distribution.
2) Run the following command to find your Java home directory:
$ /usr/libexec/java_home
You will see a result like:
$ /usr/libexec/java_home
/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home
3) In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:
# set to the root of your Java installation
export JAVA_HOME={your java home directory}
# set to the root of your Hadoop installation
export HADOOP_PREFIX={your hadoop distribution directory}
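For example, on Mac OS X you can let the system locate Java for you; the Hadoop path below is only an illustrative placeholder for wherever you unpacked the distribution:
# example values only; adjust the paths for your machine
export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_PREFIX=/Users/{username}/hadoop-2.6.0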
4) Try the following command:
$ cd {your hadoop distribution directory}
$ bin/hadoop
This will display the usage documentation for the hadoop script.
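As a quick sanity check, you can also print the build information of the distribution you unpacked:
$ bin/hadoop version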
Now you are ready to start your Hadoop cluster in one of the three supported modes:
- Standalone mode
- Pseudo-distributed mode
- Fully-distributed mode
We will go through pseudo-distributed mode and run a MapReduce job on YARN here. In this mode, Hadoop runs on a single node and each Hadoop daemon runs in a separate Java process.
4. Configuration
Edit the following config files in your Hadoop distribution directory.
1) etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
2) etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
3) etc/hadoop/mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
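Note: some 2.x distributions ship only a template for this file. If etc/hadoop/mapred-site.xml does not exist in your download, you can create it from the bundled template first:
$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml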
4) etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
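To confirm that a setting is being picked up, you can query it from the command line. For example, the following should print the hdfs://localhost:9000 value configured above:
$ bin/hdfs getconf -confKey fs.defaultFS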
5. Execution
1) Format and start HDFS and YARN
$ cd {your hadoop distribution directory}
Format the filesystem:
$ bin/hdfs namenode -format
Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
Now you can browse the web interface for the NameNode at http://localhost:50070/
Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/{username} # make sure you use your correct username here
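You can confirm that the directory was created:
$ bin/hdfs dfs -ls /user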
Start ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh
Browse the web interface for the ResourceManager at http://localhost:8088/
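At this point you can verify that all the daemons are up with the JDK's jps tool; you should see processes named NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (the process IDs will differ):
$ jps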
2) Test the example code that ships with the Hadoop distribution
Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
This example extracts every string matching the regular expression 'dfs[a-z.]+' from the input and counts the matches.
Examine the output files. You can copy them from the distributed filesystem to the local filesystem and examine them there:
$ bin/hdfs dfs -get output output
$ cat output/*
Or view the output files directly on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
You should see a result like:
4 dfs.class
4 dfs.audit.logger
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.replication
1 dfs.file
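If you want to re-run the example, note that MapReduce will refuse to write to an existing output directory, so remove the previous output (and the local copy, if you made one) first:
$ bin/hdfs dfs -rm -r output
$ rm -r output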
3) Stop YARN and HDFS
$ sbin/stop-yarn.sh
$ sbin/stop-dfs.sh
You are all done.
Comments

Hi, I keep having the "Input path does not exist:" issue when running the grep example.
15/11/22 20:43:35 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/richmond/.staging/job_1448196147691_0002
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/richmond/grep-temp-1479442832
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
I have googled and found that others have the same issue, but no solution.
Do you have any ideas?
Thanks
Hi Ravi, I can't find "HADOOP_PREFIX" in the mentioned file. Any idea if it has changed?
That parameter is not required in the latest version.
Do not use DSA keys. They are fixed at 1024 bits and not secure enough today. Generate a new RSA key and your steps should work:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
If the RSA key does not work either, the problem is most likely permissions (the server's debug log will show it). Make sure that your home directory, ~/.ssh/, and the authorized_keys file are owned by you and not writable by any other user or group.
You should change the permissions so the file is readable and writable only by you:
$ chmod 600 ~/.ssh/authorized_keys