/** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.hadoop.mapreduce; import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.io.RawComparator; import org.apache.hadoop.io.compress.CompressionCodec; /** * Maps input key/value pairs to a set of intermediate key/value pairs. * *
Maps are the individual tasks which transform input records into a * intermediate records. The transformed intermediate records need not be of * the same type as the input records. A given input pair may map to zero or * many output pairs.
* *The Hadoop Map-Reduce framework spawns one map task for each
* {@link InputSplit} generated by the {@link InputFormat} for the job.
* Mapper
implementations can access the {@link Configuration} for
* the job via the {@link JobContext#getConfiguration()}.
*
*
The framework first calls
* {@link #setup(org.apache.hadoop.mapreduce.Mapper.Context)}, followed by
* {@link #map(Object, Object, Context)}
* for each key/value pair in the InputSplit
. Finally
* {@link #cleanup(Context)} is called.
All intermediate values associated with a given output key are * subsequently grouped by the framework, and passed to a {@link Reducer} to * determine the final output. Users can control the sorting and grouping by * specifying two key {@link RawComparator} classes.
* *The Mapper
outputs are partitioned per
* Reducer
. Users can control which keys (and hence records) go to
* which Reducer
by implementing a custom {@link Partitioner}.
*
*
Users can optionally specify a combiner
, via
* {@link Job#setCombinerClass(Class)}, to perform local aggregation of the
* intermediate outputs, which helps to cut down the amount of data transferred
* from the Mapper
to the Reducer
.
*
*
Applications can specify if and how the intermediate
* outputs are to be compressed and which {@link CompressionCodec}s are to be
* used via the Configuration
.
If the job has zero
* reduces then the output of the Mapper
is directly written
* to the {@link OutputFormat} without sorting by keys.
Example:
** ** public class TokenCounterMapper * extends Mapper
Applications may override the {@link #run(Context)} method to exert
* greater control on map processing e.g. multi-threaded Mapper
s
* etc.