****************** FailMon Quick Start Guide ***********************

This document is a guide to quickly setting up and running FailMon.
For more information and details please see the FailMon User Manual.

***** Building FailMon *****

Normally, FailMon lies under <hadoop-dir>/src/contrib/failmon, where
<hadoop-dir> is the Hadoop project root folder. To compile it, one can
either run ant for the whole Hadoop project, i.e.:

$ cd <hadoop-dir>
$ ant

or run ant only for FailMon:

$ cd <hadoop-dir>/src/contrib/failmon
$ ant

Either of the above will compile FailMon and place all class files
under <hadoop-dir>/build/contrib/failmon/classes.

Invoking:

$ cd <hadoop-dir>/src/contrib/failmon
$ ant tar

packages FailMon as a standalone jar application in
<hadoop-dir>/src/contrib/failmon/failmon.tar.gz.


***** Deploying FailMon *****

There are two ways FailMon can be deployed in a cluster:

a) Within Hadoop, in which case the whole Hadoop package is uploaded
to the cluster nodes. In that case, nothing else needs to be done on
individual nodes.

b) Independently of the Hadoop deployment, i.e., by uploading
failmon.tar.gz to all nodes and uncompressing it. In that case, the
bin/failmon.sh script needs to be edited so that the environment
variable HADOOPDIR points to the root directory of the Hadoop
distribution. The location of the Hadoop configuration files should
also be given in the property 'hadoop.conf.path' in file
conf/failmon.properties. Note that these files refer to the HDFS in
which we want to store the FailMon data (which can potentially be
different from the one on the cluster we are monitoring).

We assume that either way FailMon is placed in the same directory on
all nodes, which is typical for most clusters. If this is not
feasible, one should create the same symbolic link on all nodes of the
cluster, pointing to the FailMon directory of each node.
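
The symbolic-link workaround can be scripted. The following is a
minimal Python sketch and is not part of FailMon; the function name
ensure_failmon_link is hypothetical. It creates the link if it is
missing, leaves it alone if it already points to the right directory,
and refuses to overwrite anything else.

```python
import os


def ensure_failmon_link(target_dir, link_path):
    """Make link_path a symbolic link to target_dir (hypothetical helper).

    Returns "created" if a new link was made and "unchanged" if the
    link already pointed at target_dir. Raises FileExistsError if
    link_path exists but is something else, so nothing is overwritten.
    """
    if os.path.islink(link_path):
        if os.readlink(link_path) == target_dir:
            return "unchanged"
        raise FileExistsError(link_path + " points elsewhere")
    if os.path.exists(link_path):
        raise FileExistsError(link_path + " exists and is not a link")
    os.symlink(target_dir, link_path)
    return "created"
```

Running this on each node, with that node's actual FailMon directory
as target_dir and one agreed-upon link_path, gives every node the same
path to FailMon.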

One should also edit the conf/failmon.properties file on each node to
set one's own property values. However, the default values are
expected to serve most practical cases. Refer to the FailMon User
Manual for the various properties and configuration parameters.
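
Before starting FailMon, it can help to confirm that the properties
named in this guide are actually set. The following is a minimal
Python sketch, assuming conf/failmon.properties uses plain
'key = value' lines in the Java properties style (line continuations
and escape sequences are not handled). The helper names are
hypothetical and the code is not part of FailMon.

```python
def parse_properties(text):
    """Parse simple 'key = value' or 'key: value' lines.

    Assumption: no line continuations or escape sequences; lines
    starting with '#' or '!' are comments, as in Java properties files.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on the first '=' or ':' separator, whichever comes first.
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if not positions:
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


def missing_properties(props, required=("hadoop.conf.path", "hdfs.upload.dir")):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not props.get(key)]
```

For example, passing the contents of conf/failmon.properties to
parse_properties and the result to missing_properties returns an empty
list when both properties mentioned in this guide have values.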


***** Running FailMon *****

To run FailMon using one node to do the ad-hoc scheduling of
monitoring jobs, one needs to edit the hosts.list file to specify the
hostnames of the machines on which FailMon is to be run. Also, in file
conf/global.config, the username used to connect to the machines has
to be specified in property 'ssh.username' (passwordless SSH is
assumed), and the path to the FailMon folder has to be specified in
property 'failmon.dir' (it is assumed to be the same on all machines
in the cluster). Then one only needs to invoke the command:

$ cd <hadoop-dir>
$ bin/scheduler.py

to start the system.
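
Since passwordless SSH to every listed host is assumed, it can be
worth checking that assumption before starting the scheduler. The
following is a minimal Python sketch, assuming hosts.list holds one
hostname per line (skipping blank lines and '#' comments is an
assumption about the format). The function names are hypothetical and
this is not how scheduler.py itself is implemented. The ssh option
BatchMode=yes makes ssh fail instead of prompting for a password.

```python
import subprocess


def read_hosts(text):
    """Return hostnames from hosts.list content, one per line (assumed format)."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def ssh_check_command(username, host, timeout_s=5):
    """Build an ssh command that succeeds only if key-based login works."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                  # never prompt for a password
        "-o", "ConnectTimeout=%d" % timeout_s,  # give up on unreachable hosts
        "%s@%s" % (username, host),
        "true",                                 # remote no-op command
    ]


def unreachable_hosts(username, hosts):
    """Run the check on each host and return those where it failed."""
    failed = []
    for host in hosts:
        result = subprocess.run(
            ssh_check_command(username, host),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Calling unreachable_hosts with the value of 'ssh.username' and the
hosts read from hosts.list returns the machines that still need their
SSH keys set up.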


***** Merging HDFS files *****

To merge the files created on HDFS by FailMon, the following command
can be used:

$ cd <hadoop-dir>
$ bin/failmon.sh --mergeFiles

This will concatenate all files in the HDFS folder (pointed to by the
'hdfs.upload.dir' property in the conf/failmon.properties file) into a
single file, which will be placed in the same folder. The location of
the Hadoop configuration files should also be given in the property
'hadoop.conf.path' in file conf/failmon.properties. Note that these
files refer to the HDFS in which we have stored the FailMon data
(which can potentially be different from the one on the cluster we are
monitoring). In addition, the scheduler.py script can be set up to
merge the HDFS files when their number surpasses a configurable limit
(see the 'conf/global.config' file).
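
The threshold rule for automatic merging amounts to a simple
comparison. The Python sketch below only illustrates that rule; it is
not the scheduler.py implementation, and the names merge_command and
max_files are hypothetical (the actual limit is whatever is configured
in conf/global.config).

```python
def merge_command(file_count, max_files):
    """Return the merge command if file_count surpasses max_files, else None.

    max_files is a hypothetical stand-in for the configurable limit;
    "surpasses" is read here as strictly greater than.
    """
    if file_count > max_files:
        return ["bin/failmon.sh", "--mergeFiles"]
    return None
```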

Please refer to the FailMon User Manual for more details.