Hadoop On Demand
================

1. Introduction:
================

The Hadoop On Demand (HOD) project is a system for provisioning and
managing independent Hadoop MapReduce instances on a shared cluster
of nodes. HOD relies on a resource manager to allocate nodes. At present
it supports Torque (http://www.clusterresources.com/pages/products/torque-resource-manager.php)
out of the box.

2. Feature List:
================

HOD provides the following features:

2.1 Simplified interface for managing MapReduce clusters:

The MapReduce user interacts with the cluster through a simple
command line interface, the HOD client. HOD brings up a virtual
MapReduce cluster with the required number of nodes, on which the
user can run Hadoop jobs. When the user is done, HOD automatically
cleans up the resources and makes the nodes available again.

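As a rough illustration of this allocate, use, release lifecycle, here is a minimal Python sketch. NodePool and virtual_cluster are hypothetical names invented for this example, not part of HOD; the point is only that the release step runs whether or not the user's jobs succeed.

```python
from contextlib import contextmanager


class NodePool:
    """Hypothetical stand-in for the nodes a resource manager controls."""

    def __init__(self, nodes):
        self.free = list(nodes)

    def take(self, n):
        if n > len(self.free):
            raise RuntimeError(f"requested {n} nodes, only {len(self.free)} free")
        taken, self.free = self.free[:n], self.free[n:]
        return taken

    def release(self, nodes):
        self.free.extend(nodes)


@contextmanager
def virtual_cluster(pool, num_nodes):
    """Allocate num_nodes for a virtual MapReduce cluster and always give them back."""
    nodes = pool.take(num_nodes)
    try:
        yield nodes          # the user runs Hadoop jobs while the cluster is up
    finally:
        pool.release(nodes)  # cleanup runs even if a job raises an error


pool = NodePool([f"node{i}" for i in range(4)])
with virtual_cluster(pool, 3) as nodes:
    print("running jobs on", nodes)
print("free after cleanup:", len(pool.free))
```

A context manager fits this pattern because the release step sits in a finally clause, mirroring the guarantee that nodes become available again once the user is done.
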
2.2 Automatic installation of Hadoop:

With HOD, Hadoop does not even need to be installed on the cluster.
The user can provide a Hadoop tarball that HOD automatically
distributes to all the nodes in the cluster.

2.3 Configuring Hadoop:

Dynamic parameters of the Hadoop configuration, such as the NameNode
and JobTracker addresses and ports and the file system temporary
directories, are generated by HOD and distributed automatically to
all nodes in the cluster.

In addition, HOD allows the user to configure Hadoop parameters at
both the server level (e.g. the JobTracker) and the client level
(e.g. the JobClient), including 'final' parameters, which were
introduced in Hadoop 0.15.

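The sketch below shows roughly what a generated hadoop-site.xml could contain. It assumes Hadoop 0.15-era property names (fs.default.name, mapred.job.tracker, hadoop.tmp.dir, mapred.reduce.tasks); the host names, ports, paths and the function make_hadoop_site are hypothetical. It also assumes, for illustration only, that HOD-generated values win over user-supplied values for the same key.

```python
import xml.etree.ElementTree as ET


def make_hadoop_site(dynamic, user_params, final_keys=()):
    """Render hadoop-site.xml text from generated and user-supplied parameters.

    dynamic     -- values computed at allocation time (addresses, temp dirs)
    user_params -- values supplied by the user
    final_keys  -- names marked <final>true</final> so later configs cannot override them
    """
    merged = {**user_params, **dynamic}  # generated values win on conflicts (assumption)
    root = ET.Element("configuration")
    for name in sorted(merged):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(merged[name])
        if name in final_keys:
            ET.SubElement(prop, "final").text = "true"
    return ET.tostring(root, encoding="unicode")


xml_text = make_hadoop_site(
    dynamic={
        "fs.default.name": "hdfs://node1:50001",
        "mapred.job.tracker": "node2:50002",
        "hadoop.tmp.dir": "/tmp/hod/cluster1",
    },
    user_params={"mapred.reduce.tasks": 4},
    final_keys={"mapred.reduce.tasks"},
)
print(xml_text)
```
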
2.4 Auto-cleanup of unused clusters:

HOD has an automatic idle timeout so that users cannot hold on to
cluster resources they are not using. The timeout applies only while
no MapReduce job is running on the cluster.

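A minimal sketch of an idle timeout that only counts time during which no job is running. IdleTimeout and its methods are hypothetical; the clock is injectable so the behaviour can be checked without actually waiting.

```python
import time


class IdleTimeout:
    """Hypothetical idle-cluster timer that expires only after `limit_seconds` with no jobs."""

    def __init__(self, limit_seconds, clock=time.monotonic):
        self.limit = limit_seconds
        self.clock = clock
        self.idle_since = clock()  # a freshly allocated cluster starts out idle

    def observe(self, running_jobs):
        """Call periodically with the number of MapReduce jobs currently running."""
        if running_jobs > 0:
            self.idle_since = None          # the timer is suspended while jobs run
        elif self.idle_since is None:
            self.idle_since = self.clock()  # the cluster has just become idle

    def expired(self):
        if self.idle_since is None:
            return False
        return self.clock() - self.idle_since >= self.limit


now = [0.0]
timer = IdleTimeout(3600, clock=lambda: now[0])
timer.observe(running_jobs=1)
now[0] = 7200
print(timer.expired())  # a job was running, so no idle time has accumulated
timer.observe(running_jobs=0)
now[0] = 7200 + 3600
print(timer.expired())  # idle for a full hour since the job finished
```
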
2.5 Log services:

HOD can collect all MapReduce logs in a central location for
archiving and for inspection after the job has completed.

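One way to picture central log collection is a per-cluster archive directory holding one tarball per node. This layout and the function archive_logs are assumptions made for the example, not HOD's actual log format.

```python
import os
import tarfile


def archive_logs(log_dir, archive_root, cluster_id, node):
    """Pack one node's log directory into <archive_root>/<cluster_id>/<node>-logs.tar.gz."""
    dest_dir = os.path.join(archive_root, cluster_id)
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, f"{node}-logs.tar.gz")
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps archives from different nodes distinguishable when unpacked
        tar.add(log_dir, arcname=f"{node}-logs")
    return dest
```

For example, with hypothetical paths, archive_logs("/tmp/hod/logs", "/shared/hod-logs", "cluster1", "node3") would write /shared/hod-logs/cluster1/node3-logs.tar.gz.
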
3. HOD Components:
==================

This section gives a brief overview of the components of HOD and how
they interact to provision Hadoop.

HOD Client: The HOD client is a Unix command that users run to
allocate Hadoop MapReduce clusters. It also provides options to list
allocated clusters and to deallocate them. The HOD client generates
a hadoop-site.xml file in a user-specified directory. The user can
point to this configuration file when running MapReduce jobs on the
allocated cluster.

RingMaster: The RingMaster is a HOD process that is started on one
node of every allocated cluster. The HOD client submits it as a 'job'
to the resource manager. It controls which Hadoop daemons start on
which nodes and provides this information to other HOD processes,
such as the HOD client, so that users can also find out where each
daemon runs. The RingMaster is also responsible for hosting the
Hadoop tarball and distributing it to all nodes in the cluster, and
it automatically cleans up unused clusters.

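The RingMaster's bookkeeping of which daemon runs where can be sketched as a simple mapping from node to daemons. The placement policy here (NameNode on the first node, JobTracker on the second, DataNode and TaskTracker on the rest) is a hypothetical choice for illustration, not necessarily what HOD does; plan_daemons and where_is are invented names.

```python
def plan_daemons(nodes):
    """Map each allocated node to the Hadoop daemons it should run (hypothetical policy)."""
    if len(nodes) < 3:
        raise ValueError("this sketch needs at least three nodes")
    plan = {nodes[0]: ["namenode"], nodes[1]: ["jobtracker"]}
    for node in nodes[2:]:
        plan[node] = ["datanode", "tasktracker"]
    return plan


def where_is(plan, daemon):
    """Answer the kind of question the HOD client asks: which nodes run `daemon`?"""
    return sorted(node for node, daemons in plan.items() if daemon in daemons)


plan = plan_daemons(["n1", "n2", "n3", "n4"])
print(where_is(plan, "jobtracker"))
print(where_is(plan, "tasktracker"))
```

Keeping the plan in one place is what lets both the HodRings (which daemon to start) and the HOD client (where a daemon lives) query the same source of truth.
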
HodRing: The HodRing is a HOD process that runs on every allocated
node in the cluster. The RingMaster starts these processes through
the resource manager, using its parallel execution facility. The
HodRings are responsible for launching Hadoop commands on the nodes
to bring up the Hadoop daemons; they get the commands to launch from
the RingMaster.

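A HodRing ultimately turns a command from the RingMaster into a process on its node. This sketch assumes the daemons are started through the bin/hadoop script of the unpacked tarball with its --config option; hadoop_command, launch and the paths used are hypothetical.

```python
import os
import shlex
import subprocess

HADOOP_DAEMONS = {"namenode", "datanode", "jobtracker", "tasktracker"}


def hadoop_command(hadoop_home, conf_dir, daemon):
    """Build the argv that starts one Hadoop daemon via bin/hadoop --config."""
    if daemon not in HADOOP_DAEMONS:
        raise ValueError(f"unknown daemon: {daemon}")
    return [os.path.join(hadoop_home, "bin", "hadoop"), "--config", conf_dir, daemon]


def launch(argv):
    """Start the daemon as a child process (needs a real Hadoop install to succeed)."""
    return subprocess.Popen(argv)


argv = hadoop_command("/opt/hadoop", "/tmp/hod/conf", "tasktracker")
print(shlex.join(argv))
```

Building the argv separately from launching it keeps the command inspectable and easy to log before the HodRing commits to starting a long-running daemon.
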
Hodrc / HOD configuration file: An INI-style configuration file in
which users configure various options for the HOD system, including
the install locations of different software, resource manager
parameters, log and temporary file directories, parameters for their
MapReduce jobs, etc.

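Because the hodrc is INI-style, Python's standard configparser module can read it. The section and option names below are illustrative assumptions modelled loosely on HOD's configuration, and the paths are hypothetical; see config.txt for the real options.

```python
import configparser

# Illustrative hodrc fragment; section and option names are assumptions.
HODRC = """
[hod]
java-home = /usr/lib/jvm/java
temp-dir = /tmp/hod

[resource_manager]
queue = batch
batch-home = /usr/local/torque

[gridservice-mapred]
server-params = mapred.child.java.opts=-Xmx512m
"""

config = configparser.ConfigParser()
config.read_string(HODRC)
for section in config.sections():
    for key, value in config.items(section):
        print(f"{section}.{key} = {value}")
```

Note that configparser splits each line at the first '=' or ':', so a value such as mapred.child.java.opts=-Xmx512m survives intact.
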
Submit Nodes: Nodes on which the HOD client runs and from which jobs
are submitted to the resource manager to allocate and run clusters.

Compute Nodes: Nodes that are allocated by the resource manager and
on which the Hadoop daemons are provisioned and started.

4. Next Steps:
==============

- Read getting_started.txt to learn how to install, configure and
  run HOD.

- Read config.txt for more details on HOD configuration options.