source: proiecte/HadoopJUnit/hadoop-0.20.1/src/mapred/org/apache/hadoop/mapreduce/InputFormat.java @ 120

Last change on this file since 120 was 120, checked in by (none), 14 years ago

Added the main files for the Hadoop JUnit Project

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.mapreduce;

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

/**
 * <code>InputFormat</code> describes the input-specification for a
 * Map-Reduce job.
 *
 * <p>The Map-Reduce framework relies on the <code>InputFormat</code> of the
 * job to:</p>
 * <ol>
 *   <li>
 *   Validate the input-specification of the job.
 *   </li>
 *   <li>
 *   Split up the input file(s) into logical {@link InputSplit}s, each of
 *   which is then assigned to an individual {@link Mapper}.
 *   </li>
 *   <li>
 *   Provide the {@link RecordReader} implementation used to glean
 *   input records from the logical <code>InputSplit</code> for processing by
 *   the {@link Mapper}.
 *   </li>
 * </ol>
 *
 * <p>The default behavior of file-based {@link InputFormat}s, typically
 * sub-classes of {@link FileInputFormat}, is to split the
 * input into <i>logical</i> {@link InputSplit}s based on the total size, in
 * bytes, of the input files. However, the {@link FileSystem} blocksize of
 * the input files is treated as an upper bound for input splits. A lower bound
 * on the split size can be set via
 * <a href="{@docRoot}/../mapred-default.html#mapred.min.split.size">
 * mapred.min.split.size</a>.</p>
 *
 * <p>Clearly, logical splits based on input size are insufficient for many
 * applications, since record boundaries must be respected. In such cases, the
 * application also has to implement a {@link RecordReader}, which takes on the
 * responsibility of respecting record boundaries and presenting a
 * record-oriented view of the logical <code>InputSplit</code> to the
 * individual task.</p>
 *
 * @see InputSplit
 * @see RecordReader
 * @see FileInputFormat
 */
public abstract class InputFormat<K, V> {

  /**
   * Logically split the set of input files for the job.
   *
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs, and the
   * input files are not physically split into chunks. For example, a split
   * could be an <i>&lt;input-file-path, start, offset&gt;</i> tuple. The
   * InputFormat also creates the {@link RecordReader} to read the
   * {@link InputSplit}.</p>
   *
   * @param context job configuration.
   * @return a list of {@link InputSplit}s for the job.
   */
  public abstract
    List<InputSplit> getSplits(JobContext context
                               ) throws IOException, InterruptedException;

  /**
   * Create a record reader for a given split. The framework will call
   * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
   * the split is used.
   * @param split the split to be read
   * @param context the information about the task
   * @return a new record reader
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract
    RecordReader<K,V> createRecordReader(InputSplit split,
                                         TaskAttemptContext context
                                        ) throws IOException,
                                                 InterruptedException;

}