HOD Configuration
=================

1. Introduction:
================

Configuration options for HOD are organized as sections and options
within them. They can be specified in two ways: a configuration file
in the INI format, and as command line options to the HOD shell,
specified in the format --section.option[=value]. If the same option is
specified in both places, the value specified on the command line
overrides the value in the configuration file.
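
The precedence rule can be illustrated with a short Python sketch. It assumes the configuration file is plain INI text that the standard configparser module can read, and that overrides arrive as literal --section.option=value arguments. The helper name load_hod_config and the sample values are hypothetical. This is not HOD's own option parser; for instance, it skips the bare --section.option form.

```python
import configparser


def load_hod_config(ini_text, argv):
    """Parse INI text, then apply --section.option=value overrides.

    Hypothetical helper: values from argv replace values read from the
    file, so the command line takes precedence.
    """
    config = configparser.ConfigParser()
    config.read_string(ini_text)

    for arg in argv:
        # Only the --section.option=value form is handled in this sketch.
        if not arg.startswith("--") or "=" not in arg:
            continue
        key, value = arg[2:].split("=", 1)
        if "." not in key:
            continue
        section, option = key.split(".", 1)
        if not config.has_section(section):
            config.add_section(section)
        config.set(section, option, value)  # overrides the file's value
    return config


ini_text = """
[hod]
debug = 3
temp-dir = /tmp/hod
"""

config = load_hod_config(ini_text, ["--hod.debug=4"])
print(config.get("hod", "debug"))     # 4 (from the command line)
print(config.get("hod", "temp-dir"))  # /tmp/hod (from the file)
```

The command-line values take precedence because they are applied after the file is read: configparser's set() simply replaces whatever the file supplied.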

To get a short description of all configuration options, type:

    hod --verbose-help

This document explains some of the most important or commonly used
configuration options in more detail.

2. Sections:
============

The HOD configuration has the following sections:

  * hod:                 Options for the HOD client
  * resource_manager:    Options for specifying which resource
                         manager to use, and other parameters for
                         using that resource manager
  * ringmaster:          Options for the RingMaster process
  * hodring:             Options for the HodRing processes
  * gridservice-mapred:  Options for the MapReduce daemons
  * gridservice-hdfs:    Options for the HDFS daemons

The next section describes some of the important options in the HOD
configuration.

3. Important / Commonly Used Configuration Options:
===================================================

3.1. Common configuration options:
----------------------------------

Certain configuration options are defined in most of the sections of
the HOD configuration. Options defined in a section are used by the
process to which that section applies. These options have the same
meaning in every section, but can have a different value in each one.
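
The "same meaning, different value per section" rule can be sketched in Python. The snippet reads the debug option from every section that defines it and checks the 1-4 range given for debug in the list that follows. The section contents are made up, and debug_levels is a hypothetical helper, not part of HOD.

```python
import configparser

ini_text = """
[hod]
debug = 3

[ringmaster]
debug = 4

[hodring]
debug = 1
"""

config = configparser.ConfigParser()
config.read_string(ini_text)


def debug_levels(config):
    """Return {section: debug level} for every section that sets 'debug'.

    The option means the same thing everywhere (1 = least logging,
    4 = most), but each process reads it from its own section.
    """
    levels = {}
    for section in config.sections():
        if config.has_option(section, "debug"):
            level = config.getint(section, "debug")
            if not 1 <= level <= 4:
                raise ValueError(f"[{section}] debug={level} is outside 1-4")
            levels[section] = level
    return levels


print(debug_levels(config))  # {'hod': 3, 'ringmaster': 4, 'hodring': 1}
```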

  * temp-dir: Temporary directory used by the HOD processes. Make
              sure that the users who will run hod have permission to
              create directories under the directory specified here.

  * debug: A numeric value from 1 to 4. 4 produces the most log
           information, and 1 the least.

  * log-dir: Directory where log files are stored. By default, this is
             <install-location>/logs/. The restrictions and notes for
             the temp-dir option apply here too.

  * xrs-port-range: A range of ports from which an available port is
                    picked to run an XML-RPC server.

  * http-port-range: A range of ports from which an available port is
                     picked to run an HTTP server.

  * java-home: Location of the Java installation to be used by Hadoop.
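
Both port-range options describe the same procedure: try ports in the range until one is free. The Python sketch below shows that idea under two assumptions. First, the range is written as low-high (for example 32768-65536). Second, "available" means the port can currently be bound on the local host. pick_free_port is a hypothetical name, and HOD's actual selection logic may differ.

```python
import socket


def pick_free_port(port_range, host="127.0.0.1"):
    """Return the first port in a 'low-high' range that can be bound now.

    The 'low-high' format is an assumption of this sketch.
    """
    low, high = (int(part) for part in port_range.split("-"))
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # in use or not permitted: try the next port
            return port  # the socket is closed when the with-block exits
    raise RuntimeError(f"no free port in {port_range}")


print(pick_free_port("32768-32800"))
```

Binding and then closing the socket only shows that the port was free at that instant. A real server should bind the port it was given directly and retry on failure.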

3.2 hod options:
----------------

  * cluster: A descriptive name given to the cluster. For Torque, this
             is specified as a 'Node property' for every node in the
             cluster. HOD uses this value to compute the number of
             available nodes.

  * client-params: A comma-separated list of Hadoop configuration
                   parameters specified as key-value pairs. These are
                   used to generate a hadoop-site.xml on the submit
                   node, which should be used for running MapReduce
                   jobs.
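
The client-params transformation can be sketched in Python. The sketch assumes the list uses the form key1=value1,key2=value2, and that the generated file uses the usual <configuration>/<property> layout of Hadoop configuration files. The helper names and the sample properties are illustrative only.

```python
import xml.etree.ElementTree as ET


def parse_params(params):
    """Split 'k1=v1,k2=v2' into an ordered dict of Hadoop properties."""
    result = {}
    for pair in params.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, value = pair.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def hadoop_site_xml(properties):
    """Render properties as <configuration><property>... elements."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


params = parse_params("mapred.reduce.tasks=4,io.sort.mb=100")
print(hadoop_site_xml(params))  # one <property> element per pair
```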

3.3 resource_manager options:
-----------------------------

  * queue: Name of the queue, configured in the resource manager, to
           which jobs are submitted.

  * batch-home: Install directory to which 'bin' is appended, and under
                which the executables of the resource manager can be
                found.

  * env-vars: A comma-separated list of key-value pairs, expressed as
              key=value, which are passed to the jobs launched on the
              compute nodes. For example, if the Python installation
              is in a non-standard location, you can set the
              environment variable 'HOD_PYTHON_HOME' to the path of
              the Python executable. The HOD processes launched on the
              compute nodes can then use this variable.
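
batch-home and env-vars can be sketched together in Python. batch-home names the directory whose bin subdirectory holds the resource manager commands; qsub is used here as an example Torque command. env-vars becomes an environment mapping for a launched process. The helper names are hypothetical, POSIX paths are assumed, and the example paths are made up.

```python
import os
import posixpath


def executable_path(batch_home, name):
    """Append 'bin' to batch-home and locate a resource manager command."""
    return posixpath.join(batch_home, "bin", name)


def job_environment(env_vars, base=None):
    """Merge 'KEY=value,KEY2=value2' into a copy of a base environment."""
    env = dict(os.environ if base is None else base)
    for pair in env_vars.split(","):
        if pair.strip():
            key, value = pair.split("=", 1)
            env[key.strip()] = value.strip()
    return env


print(executable_path("/usr/local/torque", "qsub"))
# /usr/local/torque/bin/qsub

env = job_environment("HOD_PYTHON_HOME=/opt/python/bin/python", base={})
print(env)  # {'HOD_PYTHON_HOME': '/opt/python/bin/python'}
```

Passing base={} keeps the example output short. In practice, the merged mapping could be handed to subprocess.run(..., env=env).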

3.4 ringmaster options:
-----------------------

  * work-dirs: A comma-separated list of paths that serve as the roots
               for the directories that HOD generates and passes to
               Hadoop to store DFS and MapReduce data. For example,
               this is where DFS data blocks are stored. Typically, one
               path is specified per available disk, so that all disks
               are used. The restrictions and notes for the temp-dir
               option apply here too.
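
The following Python sketch shows how one root per disk can become one data directory per disk. The <root>/<cluster-id>/<service> layout and the names data_dirs and job42 are hypothetical, chosen only to illustrate the idea; this document does not describe HOD's real directory naming.

```python
import posixpath


def data_dirs(work_dirs, cluster_id, service):
    """Derive one directory per work-dirs root (hypothetical layout).

    Each root is assumed to sit on its own disk, so the returned list
    spreads the service's data across all of them.
    """
    roots = [d.strip() for d in work_dirs.split(",") if d.strip()]
    return [posixpath.join(root, cluster_id, service) for root in roots]


dirs = data_dirs("/disk1/hod,/disk2/hod,/disk3/hod", "job42", "dfs-data")
print(",".join(dirs))
# /disk1/hod/job42/dfs-data,/disk2/hod/job42/dfs-data,/disk3/hod/job42/dfs-data
```

Joining the result with commas produces a value in the form that Hadoop's dfs.data.dir property accepts. With that value, Hadoop stores blocks in all of the listed directories.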

3.5 gridservice-hdfs options:
-----------------------------

  * external: If false, HOD must bring up an HDFS cluster on the nodes
              it allocates via the allocate command. In that case, when
              the cluster is deallocated, HOD brings down the HDFS
              cluster, and all the data in it is lost.
              If true, HOD tries to connect to an externally configured
              HDFS system.
              Because job input is typically placed in HDFS before jobs
              run, and job output in HDFS must persist, an internal
              HDFS cluster is of little value in a production system.
              However, it is useful for quick testing.

  * host: Hostname of the externally configured NameNode, if any.

  * fs_port: Port to which the NameNode RPC server is bound.

  * info_port: Port to which the NameNode web UI server is bound.

  * pkgs: Installation directory under which the bin/hadoop executable
          is located. This can be used to run a pre-installed version
          of Hadoop on the cluster.

  * server-params: A comma-separated list of Hadoop configuration
                   parameters specified as key-value pairs. These are
                   used to generate a hadoop-site.xml that is used by
                   the NameNode and DataNodes.

  * final-server-params: Same as server-params, except that these
                         parameters are marked final.
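
The effect of final-server-params can be sketched in Python. The sketch assumes both options use the form key1=value1,key2=value2, and that "marked final" means the property carries <final>true</final> in the generated hadoop-site.xml, which is how Hadoop configuration files declare a final property. The helper names and sample values are illustrative. How HOD resolves a key that appears in both lists is not stated here, so the sketch's rule (the final entry wins) is an assumption.

```python
import xml.etree.ElementTree as ET


def parse_pairs(params):
    """Split 'k1=v1,k2=v2' into a dict (assumed list format)."""
    pairs = {}
    for item in params.split(","):
        if item.strip():
            key, value = item.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs


def server_site_xml(server_params, final_server_params):
    """Render both lists as hadoop-site.xml properties.

    Entries from final_server_params get <final>true</final>; if a key
    appears in both lists, the final entry wins in this sketch.
    """
    merged = {k: (v, False) for k, v in parse_pairs(server_params).items()}
    merged.update(
        {k: (v, True) for k, v in parse_pairs(final_server_params).items()}
    )
    root = ET.Element("configuration")
    for name, (value, final) in merged.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        if final:
            # Hadoop does not let later-loaded resources override a final value.
            ET.SubElement(prop, "final").text = "true"
    return ET.tostring(root, encoding="unicode")


print(server_site_xml("dfs.replication=2", "dfs.block.size=134217728"))
```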

3.6 gridservice-mapred options:
-------------------------------

  * external: If false, HOD must bring up a MapReduce cluster on the
              nodes it allocates via the allocate command.
              If true, HOD tries to connect to an externally configured
              MapReduce system.

  * host: Hostname of the externally configured JobTracker, if any.

  * tracker_port: Port to which the JobTracker RPC server is bound.

  * info_port: Port to which the JobTracker web UI server is bound.

  * pkgs: Installation directory under which the bin/hadoop executable
          is located.

  * server-params: A comma-separated list of Hadoop configuration
                   parameters specified as key-value pairs. These are
                   used to generate a hadoop-site.xml that is used by
                   the JobTracker and TaskTrackers.

  * final-server-params: Same as server-params, except that these
                         parameters are marked final.
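
The external, host, and tracker_port options combine in a straightforward way, sketched below in Python with configparser. The helper name jobtracker_address, the host:port result format, and the sample values are assumptions for illustration. The same pattern would apply to host and fs_port in the gridservice-hdfs section.

```python
import configparser


def jobtracker_address(config):
    """Return 'host:port' of an external JobTracker, or None.

    None means external is false, i.e. HOD brings up its own
    MapReduce cluster on the allocated nodes.
    """
    section = "gridservice-mapred"
    if not config.getboolean(section, "external", fallback=False):
        return None
    host = config.get(section, "host")
    port = config.getint(section, "tracker_port")
    return f"{host}:{port}"


config = configparser.ConfigParser()
config.read_string("""
[gridservice-mapred]
external = true
host = jobtracker.example.com
tracker_port = 9001
""")
print(jobtracker_address(config))  # jobtracker.example.com:9001
```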

4. Known Issues:
================

HOD does not currently handle special characters such as spaces,
commas, and equals signs in configuration values.
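
The limitation can be made concrete with a small Python sketch. The comma split shown is an assumed, simplified parser, used only to show why a comma inside a value is ambiguous in a comma-separated list. check_value is a hypothetical pre-flight check that rejects the characters listed above in an individual value, not in a whole key=value list.

```python
UNSUPPORTED = {" ": "space", ",": "comma", "=": "equals sign"}


def check_value(option, value):
    """Raise ValueError if a single configuration value contains a
    character that HOD is documented not to handle."""
    found = [name for char, name in UNSUPPORTED.items() if char in value]
    if found:
        raise ValueError(f"{option}: value {value!r} contains {', '.join(found)}")
    return value


# A naive comma split (assumed for illustration) cannot tell a list
# separator from a comma that belongs to a value:
print("dfs.data.dir=/d1,/d2,dfs.replication=2".split(","))
# ['dfs.data.dir=/d1', '/d2', 'dfs.replication=2']

print(check_value("temp-dir", "/tmp/hod"))  # /tmp/hod
try:
    check_value("java-home", "/opt/my java")
except ValueError as err:
    print(err)  # java-home: value '/opt/my java' contains space
```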