Getting Started With Hadoop On Demand (HOD)
===========================================

1. Pre-requisites:
==================

Hardware:
HOD requires a minimum of 3 nodes configured through a resource manager.

Software:
The following components are assumed to be installed before using HOD:
* Torque:
  (http://www.clusterresources.com/pages/products/torque-resource-manager.php)
  Currently HOD supports Torque out of the box. We assume that you are
  familiar with configuring Torque. You can get more information from
  the following link:
  http://www.clusterresources.com/wiki/doku.php?id=torque:torque_wiki
* Python (http://www.python.org/)
  We require version 2.5.1 of Python.

The following components can optionally be installed to get better
functionality from HOD:
* Twisted Python: This can be used to improve the scalability of HOD
  (http://twistedmatrix.com/trac/)
* Hadoop: HOD can automatically distribute Hadoop to all nodes in the
  cluster. However, it can also use a pre-installed version of Hadoop,
  if it is available on all nodes in the cluster.
  (http://hadoop.apache.org/core)
  HOD currently supports Hadoop 0.15 and above.

NOTE: HOD configuration requires these components to be installed at the
same location on all nodes in the cluster. Configuration is also simpler
if they are installed at the same location on the submit nodes.

2. Resource Manager Configuration Pre-requisites:
=================================================

For using HOD with Torque:
* Install Torque components: pbs_server on a head node, pbs_moms on all
  compute nodes, and PBS client tools on all compute nodes and submit
  nodes.
* Create a queue for submitting jobs on the pbs_server.
* Specify a name for all nodes in the cluster by setting a 'node
  property' on each of them.
  This can be done by using the 'qmgr' command. For example:
     qmgr -c "set node <node name> properties=cluster-name"
* Ensure that jobs can be submitted to the nodes. This can be done by
  using the 'qsub' command. For example:
     echo "sleep 30" | qsub -l nodes=3
* More information about setting up Torque can be found in the
  documentation under:
  http://www.clusterresources.com/pages/products/torque-resource-manager.php

3. Setting up HOD:
==================

* HOD is available under the 'contrib' section of Hadoop, in the
  directory 'hod'.
* Distribute the files under this directory to all the nodes in the
  cluster. Note that the location where the files are copied must be
  the same on all the nodes.
* On the node from which you want to run hod, edit the file hodrc,
  which can be found in the <install dir>/conf directory. This file
  contains the minimal set of values required for running hod.
* Specify values suitable to your environment for the following
  variables defined in the configuration file. Note that some of these
  variables are defined in more than one place in the file.

  * ${JAVA_HOME}: Location of Java for Hadoop. Hadoop supports Sun JDK
    1.5.x.
  * ${CLUSTER_NAME}: Name of the cluster, as specified in the 'node
    property' mentioned in the resource manager configuration.
  * ${HADOOP_HOME}: Location of the Hadoop installation on the compute
    and submit nodes.
  * ${RM_QUEUE}: Queue configured for submitting jobs in the resource
    manager configuration.
  * ${RM_HOME}: Location of the resource manager installation on the
    compute and submit nodes.

* The following environment variables *may* need to be set depending on
  your environment. These variables must be defined where you run the
  HOD client, and also be specified in the HOD configuration file as the
  value of the key resource_manager.env-vars. Multiple variables can be
  specified as a comma-separated list of key=value pairs.

  * HOD_PYTHON_HOME: If you install Python to a non-default location
    on the compute nodes or submit nodes, then this variable must be
    defined to point to the Python executable in the non-standard
    location.
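For instance, if Python 2.5.1 were built under a prefix such as
/opt/python-2.5.1 (a hypothetical path used here purely for illustration),
the variable could be set as follows; the same name must then appear in the
resource_manager.env-vars key mentioned above:

```shell
# Hypothetical install location; adjust to wherever Python 2.5.1 actually
# lives on your compute and submit nodes:
export HOD_PYTHON_HOME=/opt/python-2.5.1/bin/python

# The same variable must also be listed in hodrc, e.g.:
#   resource_manager.env-vars = HOD_PYTHON_HOME=/opt/python-2.5.1/bin/python
echo "$HOD_PYTHON_HOME"
```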

NOTE:

You can also review the other configuration options in the file and
modify them to suit your needs. Refer to the file config.txt for
information about the HOD configuration.


4. Running HOD:
===============

4.1 Overview:
-------------

A typical HOD session involves at least three steps: allocate,
run Hadoop jobs, deallocate.

4.1.1 Operation allocate
------------------------

The allocate operation is used to allocate a set of nodes and install and
provision Hadoop on them. It has the following syntax:

  hod -c config_file -t hadoop_tarball_location -o "allocate \
                                        cluster_dir number_of_nodes"

The hadoop_tarball_location must be a location on a shared file system
accessible from all nodes in the cluster. Note that the cluster_dir must
exist before running the command. If the command completes successfully,
cluster_dir/hadoop-site.xml will be generated and will contain information
about the allocated cluster's JobTracker and NameNode.

For example, the following command uses a hodrc file in ~/hod-config/hodrc and
allocates Hadoop (provided by the tarball ~/share/hadoop.tar.gz) on 10 nodes,
storing the generated Hadoop configuration in a directory named
~/hadoop-cluster:

  $ hod -c ~/hod-config/hodrc -t ~/share/hadoop.tar.gz -o "allocate \
                                                    ~/hadoop-cluster 10"

HOD also supports an environment variable called HOD_CONF_DIR. If this is
defined, HOD will look for a default hodrc file at $HOD_CONF_DIR/hodrc.
Defining this allows the above command to also be run as follows:

  $ export HOD_CONF_DIR=~/hod-config
  $ hod -t ~/share/hadoop.tar.gz -o "allocate ~/hadoop-cluster 10"

4.1.2 Running Hadoop jobs using the allocated cluster
-----------------------------------------------------

Now, one can run Hadoop jobs using the allocated cluster in the usual manner:

  hadoop --config cluster_dir hadoop_command hadoop_command_args

Continuing our example, the following command will run a wordcount example on
the allocated cluster:

  $ hadoop --config ~/hadoop-cluster jar \
      /path/to/hadoop/hadoop-examples.jar wordcount /path/to/input \
      /path/to/output
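To see which JobTracker and NameNode a cluster was given, you can inspect the
generated cluster_dir/hadoop-site.xml directly. A minimal sketch follows; the
file below is a hand-made stand-in with hypothetical hostnames and ports,
since the real one is produced by the allocate operation:

```shell
# Stand-in for the hadoop-site.xml that "hod ... allocate" generates;
# the hostname and ports here are hypothetical.
CLUSTER_DIR="$(mktemp -d)"
cat > "$CLUSTER_DIR/hadoop-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>node001.example.com:54311</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>node001.example.com:54310</value>
  </property>
</configuration>
EOF
# Show the JobTracker address recorded for the cluster:
grep -A 1 'mapred.job.tracker' "$CLUSTER_DIR/hadoop-site.xml"
```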

4.1.3 Operation deallocate
--------------------------

The deallocate operation is used to release an allocated cluster. When
finished with a cluster, deallocate must be run so that the nodes become free
for others to use. The deallocate operation has the following syntax:

  hod -o "deallocate cluster_dir"

Continuing our example, the following command will deallocate the cluster:

  $ hod -o "deallocate ~/hadoop-cluster"

4.2 Command Line Options
------------------------

This section covers the major command line options available via the hod
command:

--help
Prints the help message showing the basic options.

--verbose-help
All configuration options provided in the hodrc file can also be passed on
the command line, using the syntax --section_name.option_name[=value]. When
provided this way, the value on the command line overrides the option in
hodrc. The verbose-help command lists all the available options in the hodrc
file. This is also a convenient way to see the meaning of the configuration
options.

-c config_file
Provides the configuration file to use. Can be used with all other options of
HOD. Alternatively, the HOD_CONF_DIR environment variable can be defined to
specify a directory that contains a file named hodrc, removing the need to
specify the configuration file in each HOD command.

-b 1|2|3|4
Enables the given debug level. Can be used with all other options of HOD.
4 is the most verbose.

-o "help"
Lists the operations available in the operation mode.

-o "allocate cluster_dir number_of_nodes"
Allocates a cluster on the given number of cluster nodes, and stores the
allocation information in cluster_dir for use with subsequent hadoop commands.
Note that the cluster_dir must exist before running the command.

-o "list"
Lists the clusters allocated by this user. The information provided includes
the Torque job id corresponding to the cluster, the cluster directory where
the allocation information is stored, and whether the Map/Reduce daemon is
still active or not.

-o "info cluster_dir"
Lists information about the cluster whose allocation information is stored in
the specified cluster directory.

-o "deallocate cluster_dir"
Deallocates the cluster whose allocation information is stored in the
specified cluster directory.

-t hadoop_tarball
Provisions Hadoop from the given tar.gz file. This option is only applicable
to the allocate operation. For better distribution performance it is
recommended that the Hadoop tarball contain only the libraries and binaries,
and not the source or documentation.
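One way to produce such a slimmed tarball is to repackage an unpacked release
while excluding its src and docs trees. The sketch below demonstrates the
idea on a dummy directory layout; the release name hadoop-0.15.0 and all
paths are placeholders, so substitute your actual unpacked Hadoop directory:

```shell
# Build a slim tarball by excluding source and documentation.
# A dummy hadoop-0.15.0 layout stands in for a real unpacked release.
TMP="$(mktemp -d)"
mkdir -p "$TMP"/hadoop-0.15.0/lib "$TMP"/hadoop-0.15.0/bin \
         "$TMP"/hadoop-0.15.0/src "$TMP"/hadoop-0.15.0/docs
tar -C "$TMP" \
    --exclude 'hadoop-0.15.0/src' \
    --exclude 'hadoop-0.15.0/docs' \
    -czf "$TMP/hadoop.tar.gz" hadoop-0.15.0
# List the archive contents: lib/ and bin/ are kept, src/ and docs/ are not.
tar -tzf "$TMP/hadoop.tar.gz"
```

The resulting $TMP/hadoop.tar.gz is the kind of file you would pass to the
-t option.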

-Mkey1=value1 -Mkey2=value2
Provides configuration parameters for the provisioned Map/Reduce daemons
(JobTracker and TaskTrackers). A hadoop-site.xml is generated with these
values on the cluster nodes.

-Hkey1=value1 -Hkey2=value2
Provides configuration parameters for the provisioned HDFS daemons (NameNode
and DataNodes). A hadoop-site.xml is generated with these values on the
cluster nodes.

-Ckey1=value1 -Ckey2=value2
Provides configuration parameters for the client from which jobs can be
submitted. A hadoop-site.xml is generated with these values on the submit
node.
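Putting the -t, -M, -H and -C options together, a combined allocate
invocation might look like the following. Since actually running it needs a
live Torque/HOD setup, this sketch only assembles and prints the command; the
keys and values shown (mapred.reduce.tasks, dfs.replication, io.sort.mb) are
illustrative Hadoop settings, not recommendations:

```shell
# Sketch only: we echo the command rather than execute it, since it
# requires a configured HOD/Torque cluster. Keys/values are illustrative.
echo 'hod -c ~/hod-config/hodrc -t ~/share/hadoop.tar.gz' \
     '-Mmapred.reduce.tasks=2 -Hdfs.replication=2 -Cio.sort.mb=100' \
     '-o "allocate ~/hadoop-cluster 10"'
```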