org.apache.hadoop.mapred
Class TaskTracker

java.lang.Object
  extended by org.apache.hadoop.mapred.TaskTracker
All Implemented Interfaces:
Runnable

public class TaskTracker
extends Object
implements Runnable

TaskTracker is a process that starts and tracks MapReduce (MR) Tasks in a networked environment. It contacts the JobTracker to receive Task assignments and to report results.


Nested Class Summary
static class TaskTracker.MapOutputServlet
          This class is used in TaskTracker's Jetty to serve the map outputs to other nodes.
 
Field Summary
static org.apache.commons.logging.Log ClientTraceLog
           
static int CLUSTER_INCREMENT
           
static long COUNTER_UPDATE_INTERVAL
           
static int FILE_NOT_FOUND
           
static String FOR_REDUCE_TASK
          The reduce task number for which this map output is being transferred
static String FROM_MAP_TASK
          The map task from which the map output data is being transferred
static int HEARTBEAT_INTERVAL_MIN
           
static org.apache.commons.logging.Log LOG
           
static String MAP_OUTPUT_LENGTH
          The custom http header used for the map output length.
static String MR_CLIENTTRACE_FORMAT
           
static String RAW_MAP_OUTPUT_LENGTH
          The custom http header used for the "raw" map output length.
static int SUCCESS
           
static long versionID
          Changed the version to 2, since we have a new method getMapOutputs. Changed version to 3 to have progress() return a boolean. Changed the version to 4, since we have replaced TaskUmbilicalProtocol.progress(String, float, String, org.apache.hadoop.mapred.TaskStatus.Phase, Counters) with statusUpdate(String, TaskStatus). Version 5 changed counters representation for HADOOP-2248. Version 6 changes the TaskStatus representation for HADOOP-2208. Version 7 changes the done api (via HADOOP-3140).
static String WORKDIR
           
 
Constructor Summary
TaskTracker(JobConf conf)
          Start with the local machine name, and the default JobTracker
 
Method Summary
 boolean canCommit(TaskAttemptID taskid)
          Child checking whether it can commit
 void cleanupStorage()
          Removes all contents of temporary storage.
 void close()
          Close down the TaskTracker and all its components.
 void commitPending(TaskAttemptID taskid, org.apache.hadoop.mapred.TaskStatus taskStatus)
          The task reports that it is in the commit_pending state and is waiting for the commit response.
 void done(TaskAttemptID taskid)
          The task is done.
 void fatalError(TaskAttemptID taskId, String msg)
          A child task had a fatal error.
 void fsError(TaskAttemptID taskId, String message)
          A child task had a local filesystem error.
static Class<? extends org.apache.hadoop.mapred.TaskTrackerInstrumentation> getInstrumentationClass(Configuration conf)
           
 org.apache.hadoop.mapred.InterTrackerProtocol getJobClient()
          The connection to the JobTracker, used by the TaskRunner for locating remote files.
 org.apache.hadoop.mapred.JvmManager getJvmManagerInstance()
           
 org.apache.hadoop.mapred.MapTaskCompletionEventsUpdate getMapCompletionEvents(JobID jobId, int fromEventId, int maxLocs, TaskAttemptID id)
          Called by a reduce task to get the map output locations for finished maps.
 long getProtocolVersion(String protocol, long clientVersion)
          Return protocol version corresponding to protocol interface.
 org.apache.hadoop.mapred.JvmTask getTask(org.apache.hadoop.mapred.JVMId jvmId)
          Called upon startup by the child process, to fetch Task data.
 org.apache.hadoop.mapred.TaskMemoryManagerThread getTaskMemoryManager()
           
 org.apache.hadoop.mapred.TaskTrackerInstrumentation getTaskTrackerInstrumentation()
           
 InetSocketAddress getTaskTrackerReportAddress()
          Return the address (host and port) to which the TaskTracker is bound.
 boolean isIdle()
          Is this task tracker idle?
 boolean isTaskMemoryManagerEnabled()
          Is the TaskMemoryManager Enabled on this system?
static void main(String[] argv)
          Start the TaskTracker, pointed toward the indicated JobTracker.
 void mapOutputLost(TaskAttemptID taskid, String errorMsg)
          A completed map task's output has been lost.
 boolean ping(TaskAttemptID taskid)
          Child checking to see if we're alive.
 void reportDiagnosticInfo(TaskAttemptID taskid, String info)
          Called when the task dies before completion, and we want to report back diagnostic info
 void reportNextRecordRange(TaskAttemptID taskid, org.apache.hadoop.mapred.SortedRanges.Range range)
          Report the record range that the Task is going to process next.
 void run()
          The server retry loop.
static void setInstrumentationClass(Configuration conf, Class<? extends org.apache.hadoop.mapred.TaskTrackerInstrumentation> t)
           
 void shuffleError(TaskAttemptID taskId, String message)
          A reduce-task failed to shuffle the map-outputs.
 void shutdown()
           
 boolean statusUpdate(TaskAttemptID taskid, org.apache.hadoop.mapred.TaskStatus taskStatus)
          Called periodically to report Task progress, from 0.0 to 1.0.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

MR_CLIENTTRACE_FORMAT

public static final String MR_CLIENTTRACE_FORMAT
See Also:
Constant Field Values

ClientTraceLog

public static final org.apache.commons.logging.Log ClientTraceLog

HEARTBEAT_INTERVAL_MIN

public static final int HEARTBEAT_INTERVAL_MIN
See Also:
Constant Field Values

CLUSTER_INCREMENT

public static final int CLUSTER_INCREMENT
See Also:
Constant Field Values

COUNTER_UPDATE_INTERVAL

public static final long COUNTER_UPDATE_INTERVAL
See Also:
Constant Field Values

SUCCESS

public static final int SUCCESS
See Also:
Constant Field Values

FILE_NOT_FOUND

public static final int FILE_NOT_FOUND
See Also:
Constant Field Values

MAP_OUTPUT_LENGTH

public static final String MAP_OUTPUT_LENGTH
The custom http header used for the map output length.

See Also:
Constant Field Values

RAW_MAP_OUTPUT_LENGTH

public static final String RAW_MAP_OUTPUT_LENGTH
The custom http header used for the "raw" map output length.

See Also:
Constant Field Values

FROM_MAP_TASK

public static final String FROM_MAP_TASK
The map task from which the map output data is being transferred

See Also:
Constant Field Values

FOR_REDUCE_TASK

public static final String FOR_REDUCE_TASK
The reduce task number for which this map output is being transferred

See Also:
Constant Field Values

WORKDIR

public static final String WORKDIR
See Also:
Constant Field Values

versionID

public static final long versionID
Protocol version history:
Changed the version to 2, since we have a new method getMapOutputs.
Changed version to 3 to have progress() return a boolean.
Changed the version to 4, since we have replaced TaskUmbilicalProtocol.progress(String, float, String, org.apache.hadoop.mapred.TaskStatus.Phase, Counters) with statusUpdate(String, TaskStatus).
Version 5 changed counters representation for HADOOP-2248.
Version 6 changes the TaskStatus representation for HADOOP-2208.
Version 7 changes the done api (via HADOOP-3140). It now expects whether or not the task's output needs to be promoted.
Version 8 changes {job|tip|task}id's to use their corresponding objects rather than strings.
Version 9 changes the counter representation for HADOOP-1915.
Version 10 changed the TaskStatus format and added reportNextRecordRange for HADOOP-153.
Version 11 adds RPCs for task commit as part of HADOOP-3150.
Version 12: getMapCompletionEvents() now also indicates if the events are stale or not. Hence the return type is a class that encapsulates the events and whether to reset events index.
Version 13 changed the getTask method signature for HADOOP-249.
Version 14 changed the getTask method signature for HADOOP-4232.
Version 15 adds FAILED_UNCLEAN and KILLED_UNCLEAN states for HADOOP-4759.
Version 16 added fatalError for child to communicate fatal errors to TT.

See Also:
Constant Field Values
Constructor Detail

TaskTracker

public TaskTracker(JobConf conf)
            throws IOException
Start with the local machine name, and the default JobTracker

Throws:
IOException
Method Detail

getTaskTrackerInstrumentation

public org.apache.hadoop.mapred.TaskTrackerInstrumentation getTaskTrackerInstrumentation()

getProtocolVersion

public long getProtocolVersion(String protocol,
                               long clientVersion)
                        throws IOException
Description copied from interface: VersionedProtocol
Return protocol version corresponding to protocol interface.

Parameters:
protocol - The classname of the protocol interface
clientVersion - The version of the protocol that the client speaks
Returns:
the version that the server will speak
Throws:
IOException

getInstrumentationClass

public static Class<? extends org.apache.hadoop.mapred.TaskTrackerInstrumentation> getInstrumentationClass(Configuration conf)

setInstrumentationClass

public static void setInstrumentationClass(Configuration conf,
                                           Class<? extends org.apache.hadoop.mapred.TaskTrackerInstrumentation> t)

cleanupStorage

public void cleanupStorage()
                    throws IOException
Removes all contents of temporary storage. Called upon startup, to remove any leftovers from a previous run.

Throws:
IOException

shutdown

public void shutdown()
              throws IOException
Throws:
IOException

close

public void close()
           throws IOException
Close down the TaskTracker and all its components. We must also shut down any running tasks or threads, and clean up disk space. A new TaskTracker might be restarted within the same process space, so everything must be clean.

Throws:
IOException

getJobClient

public org.apache.hadoop.mapred.InterTrackerProtocol getJobClient()
The connection to the JobTracker, used by the TaskRunner for locating remote files.


getTaskTrackerReportAddress

public InetSocketAddress getTaskTrackerReportAddress()
Return the address (host and port) to which the TaskTracker is bound.


getJvmManagerInstance

public org.apache.hadoop.mapred.JvmManager getJvmManagerInstance()

run

public void run()
The server retry loop. This while-loop attempts to connect to the JobTracker. It only loops when the old TaskTracker has gone bad (its state is stale somehow) and we need to reinitialize everything.

Specified by:
run in interface Runnable
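
The retry behavior described for run() can be illustrated with a minimal, self-contained sketch. It is not the TaskTracker source: RetryLoopSketch, its State enum, and the session supplier are hypothetical stand-ins, and the only point shown is that the loop repeats solely when a session ends in a stale state.

```java
import java.util.function.Supplier;

// Hypothetical stand-in, not the Hadoop TaskTracker: models only the retry loop.
class RetryLoopSketch implements Runnable {
    /** Outcome of one service session (illustrative names). */
    enum State { NORMAL, STALE }

    private final Supplier<State> session; // one connect-and-serve cycle
    private int initializations = 0;

    RetryLoopSketch(Supplier<State> session) {
        this.session = session;
    }

    @Override
    public void run() {
        State state;
        do {
            initializations++;          // stands in for "reinitialize everything"
            state = session.get();      // serve until this session ends
        } while (state == State.STALE); // loop again only when the state went stale
    }

    /** Number of times the loop body (re)initialized. */
    int initializations() {
        return initializations;
    }
}
```

In this sketch a session that ends normally exits the loop after a single pass; each stale outcome costs one more reinitialization.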

getTask

public org.apache.hadoop.mapred.JvmTask getTask(org.apache.hadoop.mapred.JVMId jvmId)
                                         throws IOException
Called upon startup by the child process, to fetch Task data.

Parameters:
jvmId - the ID of this JVM with respect to the TaskTracker that launched it
Returns:
Task object
Throws:
IOException

statusUpdate

public boolean statusUpdate(TaskAttemptID taskid,
                            org.apache.hadoop.mapred.TaskStatus taskStatus)
                     throws IOException
Called periodically to report Task progress, from 0.0 to 1.0.

Parameters:
taskid - task-id of the child
taskStatus - status of the child
Returns:
True if the task is known
Throws:
IOException
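
As a rough illustration of the contract above (progress between 0.0 and 1.0, and a return value saying whether the task is known), consider the following sketch. StatusUpdateSketch is a hypothetical model that uses String ids and a bare float in place of TaskAttemptID and TaskStatus; rejecting out-of-range progress is an assumption of the sketch, not documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the statusUpdate contract; not Hadoop's implementation.
class StatusUpdateSketch {
    private final Map<String, Float> progressByTask = new HashMap<>();

    /** Illustrative hook: make a task "known" to the tracker. */
    void register(String taskId) {
        progressByTask.put(taskId, 0.0f);
    }

    /** Records progress and returns true if the task is known. */
    boolean statusUpdate(String taskId, float progress) {
        // Assumption: values outside [0.0, 1.0] (including NaN) are rejected.
        if (!(progress >= 0.0f && progress <= 1.0f)) {
            throw new IllegalArgumentException("progress out of range: " + progress);
        }
        if (!progressByTask.containsKey(taskId)) {
            return false; // unknown task: nothing is recorded
        }
        progressByTask.put(taskId, progress);
        return true;
    }

    /** Last recorded progress, or null if the task is unknown. */
    Float progressOf(String taskId) {
        return progressByTask.get(taskId);
    }
}
```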

reportDiagnosticInfo

public void reportDiagnosticInfo(TaskAttemptID taskid,
                                 String info)
                          throws IOException
Called when the task dies before completion, and we want to report back diagnostic info

Parameters:
taskid - the id of the task involved
info - the text to report
Throws:
IOException

reportNextRecordRange

public void reportNextRecordRange(TaskAttemptID taskid,
                                  org.apache.hadoop.mapred.SortedRanges.Range range)
                           throws IOException
Report the record range that the Task is going to process next.

Parameters:
taskid - the id of the task involved
range - the range of record sequence numbers
Throws:
IOException

ping

public boolean ping(TaskAttemptID taskid)
             throws IOException
Child checking to see if we're alive. Normally does nothing.

Returns:
True if the task is known
Throws:
IOException

commitPending

public void commitPending(TaskAttemptID taskid,
                          org.apache.hadoop.mapred.TaskStatus taskStatus)
                   throws IOException
The task reports that it is in the commit_pending state and is waiting for the commit response.

Parameters:
taskid - task's id
taskStatus - status of the child
Throws:
IOException

canCommit

public boolean canCommit(TaskAttemptID taskid)
Child checking whether it can commit

Returns:
true if the task can commit, false otherwise
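
The interplay between commitPending and canCommit can be pictured with the sketch below. It is only a model under stated assumptions: CommitHandshakeSketch and its approve method are hypothetical, ids are plain Strings, and where the approval actually comes from is not specified on this page.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the commit handshake; not Hadoop's implementation.
class CommitHandshakeSketch {
    private final Set<String> pending = new HashSet<>();
    private final Set<String> approved = new HashSet<>();

    /** The task reports that it is waiting for a commit response. */
    void commitPending(String taskId) {
        pending.add(taskId);
    }

    /** Hypothetical hook: an approval arrives for a task that is pending. */
    void approve(String taskId) {
        if (pending.remove(taskId)) {
            approved.add(taskId);
        }
    }

    /** The child polls this until it is allowed to commit. */
    boolean canCommit(String taskId) {
        return approved.contains(taskId);
    }
}
```

In this model a child would call commitPending once and then poll canCommit until it returns true.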

done

public void done(TaskAttemptID taskid)
          throws IOException
The task is done.

Parameters:
taskid - task's id
Throws:
IOException

shuffleError

public void shuffleError(TaskAttemptID taskId,
                         String message)
                  throws IOException
A reduce-task failed to shuffle the map-outputs. Kill the task.

Throws:
IOException

fsError

public void fsError(TaskAttemptID taskId,
                    String message)
             throws IOException
A child task had a local filesystem error. Kill the task.

Throws:
IOException

fatalError

public void fatalError(TaskAttemptID taskId,
                       String msg)
                throws IOException
A child task had a fatal error. Kill the task.

Throws:
IOException

getMapCompletionEvents

public org.apache.hadoop.mapred.MapTaskCompletionEventsUpdate getMapCompletionEvents(JobID jobId,
                                                                                     int fromEventId,
                                                                                     int maxLocs,
                                                                                     TaskAttemptID id)
                                                                              throws IOException
Called by a reduce task to get the map output locations for finished maps. Returns an update centered around the map-task-completion events. The update also indicates whether the task-tracker's copy of the events has changed; a change triggers some action at the child process.

Parameters:
fromEventId - the index starting from which the locations should be fetched
maxLocs - the max number of locations to fetch
id - The attempt id of the task that is trying to communicate
Returns:
A MapTaskCompletionEventsUpdate
Throws:
IOException
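
The paging parameters fromEventId and maxLocs, together with the "events have changed" signal, can be sketched as follows. MapEventsSketch, its nested Update class, and the invalidate hook are hypothetical; events are plain Strings, the jobId and id parameters are omitted, and a single shared flag replaces whatever per-task bookkeeping the real class performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the paging contract; not Hadoop's implementation.
class MapEventsSketch {
    /** Stand-in for MapTaskCompletionEventsUpdate: events plus a reset flag. */
    static final class Update {
        final List<String> events;
        final boolean reset;

        Update(List<String> events, boolean reset) {
            this.events = events;
            this.reset = reset;
        }
    }

    private final List<String> events = new ArrayList<String>();
    private boolean changed = false; // assumption: one flag shared by all callers

    void addEvent(String mapOutputLocation) {
        events.add(mapOutputLocation);
    }

    /** Hypothetical hook: the tracker's copy of the events was rebuilt. */
    void invalidate() {
        events.clear();
        changed = true;
    }

    /** Returns at most maxLocs events starting at index fromEventId. */
    Update getMapCompletionEvents(int fromEventId, int maxLocs) {
        if (changed) {
            changed = false;
            // Tell the caller to discard its index and start over.
            return new Update(new ArrayList<String>(), true);
        }
        int from = Math.max(0, Math.min(fromEventId, events.size()));
        int to = (int) Math.min((long) from + Math.max(0, maxLocs), (long) events.size());
        return new Update(new ArrayList<String>(events.subList(from, to)), false);
    }
}
```

A caller of this sketch would advance fromEventId by the number of events returned, and restart from index 0 whenever reset is true.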

mapOutputLost

public void mapOutputLost(TaskAttemptID taskid,
                          String errorMsg)
                   throws IOException
A completed map task's output has been lost.

Throws:
IOException

isIdle

public boolean isIdle()
Is this task tracker idle?

Returns:
true if this task tracker has finished and cleaned up all of its tasks

main

public static void main(String[] argv)
                 throws Exception
Start the TaskTracker, pointed toward the indicated JobTracker.

Throws:
Exception

isTaskMemoryManagerEnabled

public boolean isTaskMemoryManagerEnabled()
Is the TaskMemoryManager Enabled on this system?

Returns:
true if enabled, false otherwise.

getTaskMemoryManager

public org.apache.hadoop.mapred.TaskMemoryManagerThread getTaskMemoryManager()


Copyright © 2009 The Apache Software Foundation