Getting Started With Hadoop On Demand (HOD)
===========================================

1. Pre-requisites:
==================

Hardware:
HOD requires a minimum of 3 nodes configured through a resource manager.

Software:
The following components are assumed to be installed before using HOD:
*  Torque:
   (http://www.clusterresources.com/pages/products/torque-resource-manager.php)
   Currently HOD supports Torque out of the box. We assume that you are
   familiar with configuring Torque. You can get information about this
   from the following link:
   http://www.clusterresources.com/wiki/doku.php?id=torque:torque_wiki
*  Python (http://www.python.org/)
   We require version 2.5.1 of Python.

The following components can optionally be installed for better
functionality from HOD:
*  Twisted Python: This can be used to improve the scalability of HOD.
   (http://twistedmatrix.com/trac/)
*  Hadoop: HOD can automatically distribute Hadoop to all nodes in the
   cluster. However, it can also use a pre-installed version of Hadoop,
   if it is available on all nodes in the cluster.
   (http://hadoop.apache.org/core)
   HOD currently supports Hadoop 0.15 and above.

NOTE: HOD configuration requires the location of installs of these
components to be the same on all nodes in the cluster. It will also
make the configuration simpler to have the same location on the submit
nodes.

2. Resource Manager Configuration Pre-requisites:
=================================================

For using HOD with Torque:
*  Install Torque components: pbs_server on a head node, pbs_moms on all
   compute nodes, and PBS client tools on all compute nodes and submit
   nodes.
*  Create a queue for submitting jobs on the pbs_server.
*  Specify a name for all nodes in the cluster, by setting a 'node
   property' on all the nodes.
   This can be done by using the 'qmgr' command. For example:
     qmgr -c "set node node properties=cluster-name"
*  Ensure that jobs can be submitted to the nodes. This can be done by
   using the 'qsub' command. For example:
     echo "sleep 30" | qsub -l nodes=3
*  More information about setting up Torque can be found in the
   documentation under:
   http://www.clusterresources.com/pages/products/torque-resource-manager.php
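The Torque setup from the steps above can be sanity-checked with the standard
PBS client tools; for example (assuming the client tools are on your PATH):

```
# List all compute nodes with their state; each node should show the
# 'properties' attribute carrying the cluster name set via qmgr.
pbsnodes -a

# Print the server configuration, including the queue created for HOD.
qmgr -c "print server"
```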

3. Setting up HOD:
==================

*  HOD is available under the 'contrib' section of Hadoop under the root
   directory 'hod'.
*  Distribute the files under this directory to all the nodes in the
   cluster. Note that the location where the files are copied should be
   the same on all the nodes.
*  On the node from where you want to run hod, edit the file hodrc,
   which can be found in the <install dir>/conf directory. This file
   contains the minimal set of values required for running hod.
*  Specify values suitable to your environment for the following
   variables defined in the configuration file. Note that some of these
   variables are defined at more than one place in the file.

   *  ${JAVA_HOME}: Location of Java for Hadoop. Hadoop supports Sun JDK
      1.5.x.
   *  ${CLUSTER_NAME}: Name of the cluster, as specified in the
      'node property' mentioned in the resource manager configuration.
   *  ${HADOOP_HOME}: Location of the Hadoop installation on the compute
      and submit nodes.
   *  ${RM_QUEUE}: Queue configured for submitting jobs in the resource
      manager configuration.
   *  ${RM_HOME}: Location of the resource manager installation on the
      compute and submit nodes.
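As an illustration, the relevant parts of a filled-in hodrc might look like
the sketch below. The section and key names follow the sample configuration
shipped with HOD, but verify them against your own copy; all paths and the
queue name are placeholders:

```
[hod]
java-home            = /usr/java/jdk1.5.0

[resource_manager]
id                   = torque
batch-home           = /usr/local/torque
queue                = hodq

[gridservice-mapred]
pkgs                 = /usr/local/hadoop

[gridservice-hdfs]
pkgs                 = /usr/local/hadoop
```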

*  The following environment variables *may* need to be set depending on
   your environment. These variables must be defined where you run the
   HOD client, and also be specified in the HOD configuration file as the
   value of the key resource_manager.env-vars. Multiple variables can be
   specified as a comma-separated list of key=value pairs.

   *  HOD_PYTHON_HOME: If you install Python to a non-default location
      on the compute nodes or submit nodes, then this variable must be
      defined to point to the Python executable in the non-standard
      location.
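For example, if Python 2.5.1 lives under a non-default prefix on the compute
or submit nodes (the path below is purely a placeholder), the variable would
be set where the HOD client runs and also named in the configuration:

```shell
# On the node where the HOD client runs (placeholder path):
export HOD_PYTHON_HOME=/opt/python-2.5.1/bin/python

# The same key=value pair must appear in hodrc as the value of
# resource_manager.env-vars, e.g.:
#   env-vars = HOD_PYTHON_HOME=/opt/python-2.5.1/bin/python
```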


NOTE:

You can also review other configuration options in the file and
modify them to suit your needs. Refer to the file config.txt for
information about the HOD configuration.


4. Running HOD:
===============

4.1 Overview:
-------------

A typical HOD session involves at least three steps: allocate,
run Hadoop jobs, deallocate.

4.1.1 Operation allocate
------------------------

The allocate operation is used to allocate a set of nodes and install and
provision Hadoop on them. It has the following syntax:

  hod -c config_file -t hadoop_tarball_location -o "allocate \
                                        cluster_dir number_of_nodes"

The hadoop_tarball_location must be a location on a shared file system
accessible from all nodes in the cluster. Note that the cluster_dir must exist
before running the command. If the command completes successfully,
cluster_dir/hadoop-site.xml will be generated and will contain information
about the allocated cluster's JobTracker and NameNode.

For example, the following command uses a hodrc file in ~/hod-config/hodrc and
allocates Hadoop (provided by the tarball ~/share/hadoop.tar.gz) on 10 nodes,
storing the generated Hadoop configuration in a directory named
~/hadoop-cluster:

  $ hod -c ~/hod-config/hodrc -t ~/share/hadoop.tar.gz -o "allocate \
                                                        ~/hadoop-cluster 10"
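On success, the generated ~/hadoop-cluster/hadoop-site.xml points Hadoop
clients at the allocated daemons. A sketch of what it might contain follows;
the host names and ports are purely illustrative, since the actual values
depend on which nodes the resource manager hands out:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Address of the allocated cluster's JobTracker (illustrative) -->
  <property>
    <name>mapred.job.tracker</name>
    <value>node042.example.com:54311</value>
  </property>
  <!-- Address of the allocated cluster's NameNode (illustrative) -->
  <property>
    <name>fs.default.name</name>
    <value>node041.example.com:54310</value>
  </property>
</configuration>
```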

HOD also supports an environment variable called HOD_CONF_DIR. If this is
defined, HOD will look for a default hodrc file at $HOD_CONF_DIR/hodrc.
Defining this allows the above command to also be run as follows:

  $ export HOD_CONF_DIR=~/hod-config
  $ hod -t ~/share/hadoop.tar.gz -o "allocate ~/hadoop-cluster 10"

4.1.2 Running Hadoop jobs using the allocated cluster
-----------------------------------------------------

Now, one can run Hadoop jobs using the allocated cluster in the usual manner:

  hadoop --config cluster_dir hadoop_command hadoop_command_args

Continuing our example, the following command will run a wordcount example on
the allocated cluster:

  $ hadoop --config ~/hadoop-cluster jar \
      /path/to/hadoop/hadoop-examples.jar wordcount /path/to/input /path/to/output

4.1.3 Operation deallocate
--------------------------

The deallocate operation is used to release an allocated cluster. When
finished with a cluster, deallocate must be run so that the nodes become free
for others to use. The deallocate operation has the following syntax:

  hod -o "deallocate cluster_dir"

Continuing our example, the following command will deallocate the cluster:

  $ hod -o "deallocate ~/hadoop-cluster"

4.2 Command Line Options
------------------------

This section covers the major command line options available via the hod
command:

--help
Prints the help message listing the basic options.

--verbose-help
All configuration options provided in the hodrc file can be passed on the
command line, using the syntax --section_name.option_name[=value]. When
provided this way, the value provided on the command line overrides the option
provided in hodrc. The verbose-help command lists all the available options in
the hodrc file. This is also a convenient way to see the meaning of the
configuration options.

-c config_file
Provides the configuration file to use. Can be used with all other options of
HOD. Alternatively, the HOD_CONF_DIR environment variable can be defined to
specify a directory that contains a file named hodrc, alleviating the need to
specify the configuration file in each HOD command.

-b 1|2|3|4
Enables the given debug level. Can be used with all other options of HOD. 4 is
most verbose.

-o "help"
Lists the operations available in the operation mode.

-o "allocate cluster_dir number_of_nodes"
Allocates a cluster on the given number of cluster nodes, and stores the
allocation information in cluster_dir for use with subsequent hadoop commands.
Note that the cluster_dir must exist before running the command.

-o "list"
Lists the clusters allocated by this user. Information provided includes the
Torque job id corresponding to the cluster, the cluster directory where the
allocation information is stored, and whether the Map/Reduce daemon is still
active or not.

-o "info cluster_dir"
Lists information about the cluster whose allocation information is stored in
the specified cluster directory.

-o "deallocate cluster_dir"
Deallocates the cluster whose allocation information is stored in the
specified cluster directory.

-t hadoop_tarball
Provisions Hadoop from the given tar.gz file. This option is only applicable
to the allocate operation. For better distribution performance it is
recommended that the Hadoop tarball contain only the libraries and binaries,
and not the source or documentation.

-Mkey1=value1 -Mkey2=value2
Provides configuration parameters for the provisioned Map/Reduce daemons
(JobTracker and TaskTrackers). A hadoop-site.xml is generated with these
values on the cluster nodes.

-Hkey1=value1 -Hkey2=value2
Provides configuration parameters for the provisioned HDFS daemons (NameNode
and DataNodes). A hadoop-site.xml is generated with these values on the
cluster nodes.

-Ckey1=value1 -Ckey2=value2
Provides configuration parameters for the client from where jobs can be
submitted. A hadoop-site.xml is generated with these values on the submit
node.
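As an illustration, these option groups can be combined on a single allocate
command. The parameter names below are standard Hadoop configuration keys of
that era, but the values chosen are purely illustrative:

```
# Allocate 10 nodes, setting the number of reduce tasks for the
# provisioned Map/Reduce daemons (-M), the block size for the HDFS
# daemons (-H), and a client-side sort buffer size on the submit
# node (-C).
hod -c ~/hod-config/hodrc -t ~/share/hadoop.tar.gz \
    -Mmapred.reduce.tasks=10 \
    -Hdfs.block.size=134217728 \
    -Cio.sort.mb=100 \
    -o "allocate ~/hadoop-cluster 10"
```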
---|