Hadoop On Demand
================

1. Introduction:
================

The Hadoop On Demand (HOD) project is a system for provisioning and
managing independent Hadoop MapReduce instances on a shared cluster
of nodes. HOD uses a resource manager for allocation. At present it
supports Torque (http://www.clusterresources.com/pages/products/torque-resource-manager.php)
out of the box.

2. Feature List:
================

The following are the features provided by HOD:

2.1 Simplified interface for managing MapReduce clusters:

The MapReduce user interacts with the cluster through a simple
command line interface, the HOD client. HOD brings up a virtual
MapReduce cluster with the required number of nodes, which the
user can use for running Hadoop jobs. When the user is done, HOD
automatically cleans up the resources and makes the nodes available
again.

2.2 Automatic installation of Hadoop:

With HOD, Hadoop does not even need to be installed on the cluster.
The user can provide a Hadoop tarball that HOD will automatically
distribute to all the nodes in the cluster.

2.3 Configuring Hadoop:

Dynamic parameters of the Hadoop configuration, such as the NameNode
and JobTracker addresses and ports and the file system temporary
directories, are generated by HOD and distributed automatically to
all nodes in the cluster.

In addition, HOD allows the user to configure Hadoop parameters
at both the server (e.g. JobTracker) and client (e.g. JobClient)
level, including the 'final' parameters that were introduced with
Hadoop 0.15.

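As a rough illustration of the kind of file this produces, the sketch
below builds a hadoop-site.xml style document in Python. It is not
HOD's actual code: the helper name build_site_xml, the host names and
the ports are hypothetical. Only the property/name/value/final layout
follows the Hadoop configuration file format.

```python
import xml.etree.ElementTree as ET


def build_site_xml(params, final_names=()):
    """Return a hadoop-site.xml style string (hypothetical helper).

    params: dict mapping property name -> value.
    final_names: property names to mark with <final>true</final>.
    """
    root = ET.Element("configuration")
    for name in sorted(params):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(params[name])
        if name in final_names:
            # A 'final' parameter cannot be overridden later on.
            ET.SubElement(prop, "final").text = "true"
    return ET.tostring(root, encoding="unicode")


# Hypothetical addresses standing in for values picked at allocation time.
site_xml = build_site_xml(
    {
        "fs.default.name": "node01.example.com:50010",
        "mapred.job.tracker": "node02.example.com:50020",
    },
    final_names={"mapred.job.tracker"},
)
print(site_xml)
```

Marking a parameter as final in the generated file is what keeps a
later, user-supplied setting from overriding it.
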
2.4 Auto-cleanup of unused clusters:

HOD has an automatic timeout so that users cannot hold on to
resources they are not using. The timeout applies only when there is
no MapReduce job running.

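The rule above (the timeout counts only while no job is running) can
be sketched as follows. This is a minimal illustration, not HOD's
implementation: the class name IdleTracker, its parameters and the
60 second timeout are all assumptions made for the example.

```python
import time


class IdleTracker:
    """Track how long a cluster has gone without a running job.

    Hypothetical helper; names and behaviour are assumptions.
    """

    def __init__(self, idle_timeout_secs, clock=time.monotonic):
        self.idle_timeout_secs = idle_timeout_secs
        self.clock = clock
        self.idle_since = clock()

    def update(self, running_jobs):
        """Record the current job count; return True if cleanup is due."""
        now = self.clock()
        if running_jobs > 0:
            # A job is active, so the idle period is reset.
            self.idle_since = None
            return False
        if self.idle_since is None:
            # The last job just finished; the idle period starts now.
            self.idle_since = now
        return now - self.idle_since >= self.idle_timeout_secs


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
tracker = IdleTracker(idle_timeout_secs=60, clock=lambda: fake_now[0])
fake_now[0] = 500.0
busy = tracker.update(running_jobs=1)    # a job is running
fake_now[0] = 510.0
just_idle = tracker.update(running_jobs=0)
fake_now[0] = 570.0
expired = tracker.update(running_jobs=0)
print(busy, just_idle, expired)
```
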
2.5 Log services:

HOD can be used to collect all MapReduce logs to a central location
for archiving and inspection after the job is completed.

3. HOD Components
=================

This is a brief overview of the various components of HOD and how they
interact to provision Hadoop.

HOD Client: The HOD client is a Unix command that users run to allocate
Hadoop MapReduce clusters. The command also provides options to list
allocated clusters and to deallocate them. The HOD client generates the
hadoop-site.xml in a user-specified directory. The user can point to
this configuration file when running MapReduce jobs on the allocated
cluster.

RingMaster: The RingMaster is a HOD process that is started on one node
of every allocated cluster. It is submitted as a 'job' to the resource
manager by the HOD client. It controls which Hadoop daemons start on
which nodes. It provides this information to other HOD processes,
such as the HOD client, so users can also determine this information.
The RingMaster is responsible for hosting and distributing the
Hadoop tarball to all nodes in the cluster. It also automatically
cleans up unused clusters.

HodRing: The HodRing is a HOD process that runs on every allocated node
in the cluster. These processes are run by the RingMaster through the
resource manager, using its facility for parallel execution. The
HodRings are responsible for launching Hadoop commands on the nodes to
bring up the Hadoop daemons. They get the commands to launch from the
RingMaster.

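The division of work between the RingMaster and the HodRings can be
pictured with the toy model below. It is only a sketch: the class
names, the method names and the placement policy (first node runs the
JobTracker, the rest run TaskTrackers) are assumptions made for
illustration, and nothing is actually launched.

```python
class RingMasterSketch:
    """Decides which daemon each node should start (toy model)."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # Arbitrary illustrative policy: the first node hosts the
        # JobTracker, every other node hosts a TaskTracker.
        self.assignments = {nodes[0]: "jobtracker"}
        for node in nodes[1:]:
            self.assignments[node] = "tasktracker"

    def command_for(self, node):
        """Answer a HodRing asking what it should launch."""
        return self.assignments[node]

    def daemon_locations(self):
        """Let other processes, such as a client, look up the layout."""
        return dict(self.assignments)


class HodRingSketch:
    """Runs on one node and asks the RingMaster what to start."""

    def __init__(self, node, ringmaster):
        self.node = node
        self.ringmaster = ringmaster

    def launch(self):
        daemon = self.ringmaster.command_for(self.node)
        # A real HodRing would start a process here; the sketch only
        # reports what it would do.
        return f"{self.node}: start {daemon}"


nodes = ["node01", "node02", "node03"]  # hypothetical node names
ringmaster = RingMasterSketch(nodes)
for node in nodes:
    print(HodRingSketch(node, ringmaster).launch())
```
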
Hodrc / HOD configuration file: An INI-style configuration file in
which users configure various options for the HOD system, including
install locations of different software, resource manager parameters,
log and temp file directories, parameters for their MapReduce jobs,
etc.

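To show what an INI-style file looks like in practice, the sketch below
parses a small sample with Python's standard configparser module. The
section names, option names and values are illustrative placeholders,
not a statement of the real hodrc options; config.txt describes those.

```python
import configparser

# Illustrative content only; every name and value here is an assumption.
SAMPLE_HODRC = """
[hod]
temp-dir = /tmp/hod
log-dir = /var/log/hod

[resource_manager]
queue = batch
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_HODRC)

# Walk every section and print its options as section.option = value.
for section in config.sections():
    for option, value in config[section].items():
        print(f"{section}.{option} = {value}")
```
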
Submit Nodes: Nodes on which the HOD client is run and from which jobs
are submitted to the resource manager system for allocating and
running clusters.

Compute Nodes: Nodes which get allocated by a resource manager,
and on which the Hadoop daemons are provisioned and started.

4. Next Steps:
==============

- Read getting_started.txt to get an idea of how to get started with
installing, configuring and running HOD.

- Read config.txt to get more details on configuration options for HOD.