This page describes how we successfully set up a Hadoop cluster in the ED202 laboratory. [[BR]]
'''Step 1.''' First, read and carefully follow the single-machine Hadoop setup in [http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster) this how-to on a Hadoop single-node setup]. [[BR]]
'''Step 2.''' After that, go a step further by following [http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster) the guide for a dual-node setup]. [[BR]]
'''Step 3.''' To extend the cluster by adding an N-th slave, do the following: [[BR]]
'''Step 3.1.''' Follow Step 1 on the ''slave-N'' machine. [[BR]]
'''Step 3.2.''' Copy all the contents of the conf/ directory (under the Hadoop installation directory) from a working slave configuration. [[BR]]
'''Step 3.3.''' Set the hostname of the machine to a suggestive string, say ''slave-N'' (a sketch of how we did this on Ubuntu is at the end of this page). [[BR]]
'''Step 3.4.''' Add an entry to the master's /etc/hosts file like
{{{
...
10.10.10.10    slave-N
...
}}}
where 10.10.10.10 is the IP address of the ''slave-N'' machine. [[BR]]
'''Step 3.5.''' Add the same entry to every other slave's /etc/hosts file as well, so that every slave can resolve the name ''slave-N''. [[BR]]
'''Step 3.6.''' Append the master's SSH public key to the ~/.ssh/authorized_keys file on ''slave-N'' (an example is given at the end of this page), and check that
{{{
# ssh slave-N
}}}
works without asking for a password. [[BR]]
'''Step 3.7.''' Add a line with ''slave-N'' to the conf/slaves file on the master. [[BR]]
'''Step 3.8.''' Restart the cluster from the master's console (the start/stop scripts are in the bin/ directory; stop MapReduce before HDFS, then start HDFS before MapReduce):
{{{
hadoop# bin/stop-mapred.sh
hadoop# bin/stop-dfs.sh
hadoop# bin/start-dfs.sh
hadoop# bin/start-mapred.sh
}}}
'''Step 3.9.''' Check the status of the cluster by executing the ''jps'' command on each node (expected output is shown at the end of this page).

Main issues: [[BR]]
 - Make sure you install on each machine a Linux distribution that easily permits changing the computer's hostname. Ubuntu 8.04 and Ubuntu 9.10 worked fine for us; we also tried Fedora 10 but did not succeed, because of this hostname issue. [[BR]]
 - An incompatible namespaceID error appears after (re-)formatting the DFS. It is discussed in the [http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)#java.io.IOException:_Incompatible_namespaceIDs dual-node setup guide], and a workaround is sketched at the end of this page. [[BR]]
 - For other issues, check the logs on the suspect machines.
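
'''Examples referenced above.''' [[BR]]
For Step 3.3, this is a minimal sketch of how to change the hostname persistently on the Ubuntu releases we used (''slave-3'' is just a placeholder name):
{{{
# run as root on the new slave
echo "slave-3" > /etc/hostname
hostname slave-3
# also replace the old name with slave-3 in the 127.0.1.1 line of /etc/hosts
}}}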
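For Step 3.6, the master's public key can be copied with ''ssh-copy-id''. This sketch assumes the cluster runs under a user named ''hadoop'' with an RSA key in ~/.ssh/id_rsa.pub, as in the single-node guide; adjust the user and key path to your setup:
{{{
# run on the master as the hadoop user
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave-N
# equivalent manual variant:
cat ~/.ssh/id_rsa.pub | ssh hadoop@slave-N "cat >> ~/.ssh/authorized_keys"
# verify: this should log in without a password prompt
ssh slave-N
}}}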
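For Step 3.9, a healthy node should list the Hadoop daemons in its ''jps'' output. The listing below is only an illustration (process IDs will differ), and it assumes the master also runs a DataNode and a TaskTracker, as in the dual-node guide:
{{{
hadoop@master:~$ jps
14799 NameNode
14880 DataNode
14977 SecondaryNameNode
15059 JobTracker
15183 TaskTracker
15397 Jps

hadoop@slave-N:~$ jps
15183 DataNode
15897 TaskTracker
16284 Jps
}}}
If one of the daemons is missing, check that node's logs under logs/ in the Hadoop installation directory.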
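For the incompatible namespaceID issue, the workaround described in the guide linked above is to wipe the DataNode's data directory on the affected slave and restart, so that the DataNode re-registers with the newly formatted namespace. The path below assumes ''hadoop.tmp.dir'' is set to /app/hadoop/tmp, as in the single-node guide; adjust it to your configuration. Note that this deletes the HDFS blocks stored on that node:
{{{
# run on the affected slave, with the cluster stopped
rm -rf /app/hadoop/tmp/dfs/data/*
# then restart the cluster from the master (Step 3.8)
}}}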