Hadoop single node setup on Ubuntu

I tried to figure out how to use Hadoop and HDFS for a while. But the information on their official site is very scattered and out-of-updated. Here are some notes I did.

  1. A clean ubuntu 10.04 LTS build.
  2. Download hadoop package from here.
    1. The Hadoop's versioning rule is very confusing.
      • 1.0.x is stable version
      • 1.1.x is beta version
      • 2.x.x is alpha version
      • 0.23.x is similar to 2.x.x but missing Name Node HA
      • I tried to ignore all other version started with 0.2x. I just use 1.0.4 directly.
    2. Download the KEYS in the root directory
    3. Download the from hadoop_1.0.4-1_i386.deb(or it's x64 version) and it's asc file from hadoop-1.0.4 folder.
  3. Check the integrity
    1. run `gpg --import KEYS`
    2. run `gpg --verify hadoop_1.0.4-1_i386.deb.asc`
    3. You should see
      mac@mac-ubuntu:~/projects/hadoop$ gpg --verify hadoop_1.0.4-1_i386.deb.asc
      gpg: Signature made Thu 04 Oct 2012 01:04:55 PM PDT using RSA key ID ECB31663
      gpg: Good signature from "Matthew Foley (CODE SIGNING KEY) <mattf@apache.org>"
      gpg: WARNING: This key is not certified with a trusted signature!
      gpg:          There is no indication that the signature belongs to the owner.
      Primary key fingerprint: 7854 36A7 8258 6B71 829C  67A0 4169 AA27 ECB3 1663
    4. Then it's good enough to me.
  4. Install it
    • sudo dpkg -i hadoop_1.0.4-1_i386.deb
  5. Patch it
    • I found there were some issues when it deb file ran on Ubuntu, I just patched it.
    • add following lines to /usr/sbin/hadoop-daemon.sh, line 81
      export USER=`whoami`

      if [ "$command" == "jobtracker" ] || [ "$command" == "tasktracker" ]; then
        export HADOOP_LOG_DIR=/var/log/hadoop/$USER
        export HADOOP_IDENT_STRING="$USER"
  6. Configure it
    • It comes with a handy script, just run it.
    • `sudo hadoop-setup-single-node.sh --default`
    • So the Name node, Data node, Task tracker should be running now. But the Job tracker is not.
  7. Fix it
    1. `sudo -u hdfs hadoop fs -mkdir /mapred`
    2. `sudo -u hdfs hadoop fs -chown mapred /mapred`
    3. `sudo /etc/init.d/hadoop-jobtracker restart`
  8. Test it
    1. `sudo hadoop-validate-setup.sh --user=hdfs`
    2. It will run 3 test cases, you should see they are all passed.
I think it's the cleanest way to install hadoop in ubuntu.




怎麼在兩台linux server間用scp而不需打密碼?