InputFormat.java @ 120

Last change on this file since 120 was 120, checked in by (none), 14 years ago
Added the mail files for the Hadoop JUNit Project
Property svn:executable set to *
File size: 3.9 KB

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.mapreduce;

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

/**
 * <code>InputFormat</code> describes the input-specification for a
 * Map-Reduce job.
 *
 * <p>The Map-Reduce framework relies on the <code>InputFormat</code> of the
 * job to:</p>
 * <ol>
 *   <li>
 *   Validate the input-specification of the job.
 *   </li>
 *   <li>
 *   Split up the input file(s) into logical {@link InputSplit}s, each of
 *   which is then assigned to an individual {@link Mapper}.
 *   </li>
 *   <li>
 *   Provide the {@link RecordReader} implementation to be used to glean
 *   input records from the logical <code>InputSplit</code> for processing by
 *   the {@link Mapper}.
 *   </li>
 * </ol>
 *
 * <p>The default behavior of file-based {@link InputFormat}s, typically
 * sub-classes of {@link FileInputFormat}, is to split the
 * input into <i>logical</i> {@link InputSplit}s based on the total size, in
 * bytes, of the input files. However, the {@link FileSystem} block size of
 * the input files is treated as an upper bound for input splits. A lower bound
 * on the split size can be set via
 * <a href="{@docRoot}/../mapred-default.html#mapred.min.split.size">
 * mapred.min.split.size</a>.</p>
 *
 * <p>Logical splits based on input size alone are insufficient for many
 * applications, since record boundaries must be respected. In such cases, the
 * application must also implement a {@link RecordReader}, which is
 * responsible for respecting record boundaries and presenting a
 * record-oriented view of the logical <code>InputSplit</code> to the
 * individual task.</p>
 *
 * @see InputSplit
 * @see RecordReader
 * @see FileInputFormat
 */
public abstract class InputFormat<K, V> {

  /**
   * Logically split the set of input files for the job.
   *
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs; the
   * input files are not physically split into chunks. For example, a split
   * could be an <i>&lt;input-file-path, start, length&gt;</i> tuple. The
   * InputFormat also creates the {@link RecordReader} to read the
   * {@link InputSplit}.</p>
   *
   * @param context job configuration.
   * @return a list of {@link InputSplit}s for the job.
   */
  public abstract List<InputSplit> getSplits(JobContext context)
      throws IOException, InterruptedException;

  /**
   * Create a record reader for a given split. The framework will call
   * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
   * the split is used.
   * @param split the split to be read
   * @param context the information about the task
   * @return a new record reader
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract RecordReader<K, V> createRecordReader(InputSplit split,
      TaskAttemptContext context) throws IOException, InterruptedException;

}

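The class comment says file-based splits are sized from the input size, capped by the FileSystem block size, with `mapred.min.split.size` as a lower bound. The dependency-free sketch below illustrates that rule for a single file. It is modeled loosely on the approach `FileInputFormat` takes, but `SplitPlanner`, `planSplits`, the optional `maxSize` cap, the 1.1 `SPLIT_SLOP` tolerance for the final split, and the path `/data/input.txt` are all assumptions for illustration, not quotations of the Hadoop 0.20.1 source.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of how a file-based InputFormat can carve one file into
 * logical splits. Nothing is physically cut: a split is just (path, start, length).
 */
class SplitPlanner {

  /** A logical split: a byte range within one file. */
  static final class Split {
    final String path;
    final long start;
    final long length;

    Split(String path, long start, long length) {
      this.path = path;
      this.start = start;
      this.length = length;
    }

    @Override
    public String toString() {
      return path + ":" + start + "+" + length;
    }
  }

  // Assumed tolerance: the final split may be up to 10% larger than splitSize
  // rather than leaving a tiny trailing split.
  static final double SPLIT_SLOP = 1.1;

  /** Block size caps the split size; minSize (cf. mapred.min.split.size) floors it. */
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  /** Enumerates logical splits for a single file of fileLength bytes (minSize >= 1 assumed). */
  static List<Split> planSplits(String path, long fileLength,
                                long blockSize, long minSize, long maxSize) {
    long splitSize = computeSplitSize(blockSize, minSize, maxSize);
    List<Split> splits = new ArrayList<>();
    long remaining = fileLength;
    while ((double) remaining / splitSize > SPLIT_SLOP) {
      splits.add(new Split(path, fileLength - remaining, splitSize));
      remaining -= splitSize;
    }
    if (remaining != 0) {
      splits.add(new Split(path, fileLength - remaining, remaining));
    }
    return splits;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // 200 MB file, 64 MB blocks, no effective minimum or maximum override.
    System.out.println(planSplits("/data/input.txt", 200 * mb, 64 * mb, 1, Long.MAX_VALUE));
    // Same file with a 100 MB minimum: the lower bound wins over the block size.
    System.out.println(planSplits("/data/input.txt", 200 * mb, 64 * mb, 100 * mb, Long.MAX_VALUE));
  }
}
```

For a 200 MB file with 64 MB blocks, this yields three 64 MB splits followed by an 8 MB tail. Raising `minSize` to 100 MB yields two 100 MB splits, showing that the configured lower bound takes precedence over the block size.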

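The class comment notes that a `RecordReader` must respect record boundaries that a purely byte-based split ignores. One common approach for newline-delimited text, similar in spirit to Hadoop's `LineRecordReader` but not its actual code, is to let each line belong to the split that contains its first byte: a reader skips the partial line at the start of its split and finishes the line that straddles its end. `SplitLineReader` and `readSplit` are hypothetical names, and the input is an in-memory byte array rather than a file stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader that returns only the lines owned by one byte-range split. */
class SplitLineReader {

  /**
   * Returns the lines owned by the split [start, start + length) of data.
   * Ownership rule (an assumption of this sketch): a line belongs to the split
   * that contains its first byte.
   */
  static List<String> readSplit(byte[] data, int start, int length) {
    int end = Math.min(start + length, data.length);
    int pos = start;
    if (start != 0) {
      // Back up one byte and discard through the next '\n'. If the split begins
      // exactly at a line start, only the preceding '\n' is discarded; otherwise
      // the partial line, owned by the previous split, is skipped.
      pos = start - 1;
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      pos++; // step past the newline
    }
    List<String> lines = new ArrayList<>();
    // Read whole lines, even past `end`, as long as each one starts before `end`.
    while (pos < end) {
      int lineEnd = pos;
      while (lineEnd < data.length && data[lineEnd] != '\n') {
        lineEnd++;
      }
      lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
      pos = lineEnd + 1;
    }
    return lines;
  }

  public static void main(String[] args) {
    byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
    // Byte 8 falls inside "bravo"; each line is still read exactly once.
    System.out.println(readSplit(data, 0, 8));  // [alpha, bravo]
    System.out.println(readSplit(data, 8, 12)); // [charlie]
  }
}
```

Backing up one byte before skipping is what keeps a split that starts exactly at a line boundary from dropping that line; splitting on the `'\n'` byte is safe for UTF-8 because that byte never occurs inside a multi-byte sequence.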

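The `getSplits` and `createRecordReader` contracts, including the note that the framework calls `initialize` on a reader before the split is used, imply a driver loop roughly like the one below. Every type here (`MiniInputFormat`, `MiniRecordReader`, `MiniSplit`, `ListInputFormat`, `MiniDriver`) is a hypothetical, dependency-free stand-in whose method names mirror the Hadoop API; it sketches the call order only and is not Hadoop's task runner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-ins: names mirror InputFormat/RecordReader, but these are NOT the Hadoop classes.
interface MiniRecordReader<K, V> extends AutoCloseable {
  void initialize(MiniSplit split);
  boolean nextKeyValue();
  K getCurrentKey();
  V getCurrentValue();
  @Override
  void close();
}

/** A logical split over an in-memory list: records [start, start + length). */
final class MiniSplit {
  final int start;
  final int length;

  MiniSplit(int start, int length) {
    this.start = start;
    this.length = length;
  }
}

abstract class MiniInputFormat<K, V> {
  abstract List<MiniSplit> getSplits();
  abstract MiniRecordReader<K, V> createRecordReader(MiniSplit split);
}

/** Splits an in-memory list of records into fixed-size chunks (recordsPerSplit must be > 0). */
final class ListInputFormat extends MiniInputFormat<Integer, String> {
  private final List<String> records;
  private final int recordsPerSplit;

  ListInputFormat(List<String> records, int recordsPerSplit) {
    this.records = records;
    this.recordsPerSplit = recordsPerSplit;
  }

  @Override
  List<MiniSplit> getSplits() {
    List<MiniSplit> splits = new ArrayList<>();
    for (int s = 0; s < records.size(); s += recordsPerSplit) {
      splits.add(new MiniSplit(s, Math.min(recordsPerSplit, records.size() - s)));
    }
    return splits;
  }

  @Override
  MiniRecordReader<Integer, String> createRecordReader(MiniSplit split) {
    // Returned unconfigured: all per-split setup happens in initialize().
    return new MiniRecordReader<Integer, String>() {
      private int next;
      private int end;
      private int key = -1;

      public void initialize(MiniSplit s) { next = s.start; end = s.start + s.length; }
      public boolean nextKeyValue() {
        if (next >= end) return false;
        key = next++;
        return true;
      }
      public Integer getCurrentKey() { return key; }
      public String getCurrentValue() { return records.get(key); }
      public void close() { }
    };
  }
}

final class MiniDriver {
  /** Call order: getSplits, then per split create, initialize, iterate, close. */
  static <K, V> void run(MiniInputFormat<K, V> format, BiConsumer<K, V> mapper) {
    for (MiniSplit split : format.getSplits()) {
      try (MiniRecordReader<K, V> reader = format.createRecordReader(split)) {
        reader.initialize(split);
        while (reader.nextKeyValue()) {
          mapper.accept(reader.getCurrentKey(), reader.getCurrentValue());
        }
      }
    }
  }

  public static void main(String[] args) {
    List<String> records = Arrays.asList("a", "b", "c", "d", "e");
    run(new ListInputFormat(records, 2), (k, v) -> System.out.println(k + " -> " + v));
  }
}
```

Running `main` visits three splits holding 2, 2 and 1 records and prints keys 0 through 4 with their values, one per line. Keeping `createRecordReader` free of setup and deferring it to `initialize` matches the ordering the Javadoc for `createRecordReader` describes.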
source: proiecte/HadoopJUnit/hadoop-0.20.1/src/mapred/org/apache/hadoop/mapreduce/InputFormat.java @ 120
