Hadoop single node setup on Ubuntu
I tried to figure out how to use Hadoop and HDFS for a while. But the information on their official site is very scattered and out-of-updated. Here are some notes I did.
- A clean ubuntu 10.04 LTS build.
- Download hadoop package from here.
- The Hadoop's versioning rule is very confusing.
- 1.0.x is stable version
- 1.1.x is beta version
- 2.x.x is alpha version
- 0.23.x is similar to 2.x.x but missing Name Node HA
- I tried to ignore all other version started with 0.2x. I just use 1.0.4 directly.
- Download the KEYS in the root directory
- Download the from hadoop_1.0.4-1_i386.deb(or it's x64 version) and it's asc file from hadoop-1.0.4 folder.
- Check the integrity
- run `gpg --import KEYS`
- run `gpg --verify hadoop_1.0.4-1_i386.deb.asc`
- You should see
mac@mac-ubuntu:~/projects/hadoop$ gpg --verify hadoop_1.0.4-1_i386.deb.asc
gpg: Signature made Thu 04 Oct 2012 01:04:55 PM PDT using RSA key ID ECB31663
gpg: Good signature from "Matthew Foley (CODE SIGNING KEY) <mattf@apache.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 7854 36A7 8258 6B71 829C 67A0 4169 AA27 ECB3 1663 - Then it's good enough to me.
- Install it
- sudo dpkg -i hadoop_1.0.4-1_i386.deb
- Patch it
- I found there were some issues when it deb file ran on Ubuntu, I just patched it.
- add following lines to /usr/sbin/hadoop-daemon.sh, line 81
export USER=`whoami`
if [ "$command" == "jobtracker" ] || [ "$command" == "tasktracker" ]; then
export HADOOP_LOG_DIR=/var/log/hadoop/$USER
export HADOOP_IDENT_STRING="$USER"
fi - Configure it
- It comes with a handy script, just run it.
- `sudo hadoop-setup-single-node.sh --default`
- So the Name node, Data node, Task tracker should be running now. But the Job tracker is not.
- Fix it
- `sudo -u hdfs hadoop fs -mkdir /mapred`
- `sudo -u hdfs hadoop fs -chown mapred /mapred`
- `sudo /etc/init.d/hadoop-jobtracker restart`
- Test it
- `sudo hadoop-validate-setup.sh --user=hdfs`
- It will run 3 test cases, you should see they are all passed.
I think it's the cleanest way to install hadoop in ubuntu.
留言