Hadoop 0.20.1 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements. The table below is sorted by Component.

Changes Since Hadoop 0.20.0

Common

Sub-task

Bug

Improvement

New Feature

HDFS

Bug

Improvement

Map/Reduce

Bug

Improvement

Changes Since Hadoop 0.19.1

Issue | Component | Notes
HADOOP-3344 | build | Changed build procedure for libhdfs to build correctly for different platforms. Build instructions are in the Jira item.
HADOOP-4253 | conf | Removed deprecated methods public String getName(), public void lock(Path p, boolean shared) and public void release(Path p) from class org.apache.hadoop.fs.RawLocalFileSystem.
HADOOP-4454 | conf | Changed processing of conf/slaves file to allow # to begin a comment.
HADOOP-4631 | conf | Split hadoop-default.xml into core-default.xml, hdfs-default.xml and mapred-default.xml.
HADOOP-4035 | contrib/capacity-sched | Changed capacity scheduler policy to take note of task memory requirements and task tracker memory availability.
HADOOP-4445 | contrib/capacity-sched | Changed JobTracker UI to better present the number of active tasks.
HADOOP-4576 | contrib/capacity-sched | Changed capacity scheduler UI to better present the number of running and pending tasks.
HADOOP-4179 | contrib/chukwa | Introduced Vaidya, a rule-based performance diagnostic tool for Map/Reduce jobs.
HADOOP-4827 | contrib/chukwa | Improved framework for data aggregation in Chukwa.
HADOOP-4843 | contrib/chukwa | Introduced Chukwa collection of job history.
HADOOP-5030 | contrib/chukwa | Changed RPM install location to the value specified by the build.properties file.
HADOOP-5531 | contrib/chukwa | Disabled Chukwa unit tests for 0.20 branch only.
HADOOP-4789 | contrib/fair-share | Changed fair scheduler to divide resources equally between pools, not jobs.
HADOOP-4873 | contrib/fair-share | Changed fair scheduler UI to display minMaps and minReduces variables.
HADOOP-3750 | dfs | Removed deprecated method parseArgs from org.apache.hadoop.fs.FileSystem.
HADOOP-4029 | dfs | Added name node storage information to the dfshealth page, and moved data node information to a separate page.
HADOOP-4103 | dfs | Modified dfsadmin -report to report under-replicated blocks, blocks with corrupt replicas, and missing blocks.
HADOOP-4567 | dfs | Changed GetFileBlockLocations to return topology information for nodes that host the block replicas.
HADOOP-4572 | dfs | Moved org.apache.hadoop.hdfs.{CreateEditsLog, NNThroughputBenchmark} to org.apache.hadoop.hdfs.server.namenode.
HADOOP-4618 | dfs | Moved HTTP server from FSNamesystem to NameNode. Removed FSNamesystem.getNameNodeInfoPort(). Replaced FSNamesystem.getDFSNameNodeMachine() and FSNamesystem.getDFSNameNodePort() with new method FSNamesystem.getDFSNameNodeAddress(). Removed constructor NameNode(bindAddress, conf).
HADOOP-4826 | dfs | Introduced new dfsadmin command saveNamespace to command the name service to do an immediate save of the file system image.
HADOOP-4970 | dfs | Changed trash facility to use absolute path of the deleted file.
HADOOP-5468 | documentation | Reformatted HTML documentation for Hadoop to use submenus in the left column.
HADOOP-3497 | fs | Changed the semantics of file globbing with a PathFilter (using the globStatus method of FileSystem). Previously, the filtering was too restrictive, so that a glob of /*/* and a filter that only accepts /a/b would not have matched /a/b. With this change /a/b does match.
HADOOP-4234 | fs | Changed KFS glue layer to allow applications to interface with multiple KFS metaservers.
HADOOP-4422 | fs/s3 | Modified Hadoop file system to no longer create S3 buckets. Applications can create buckets for their S3 file systems by other means, for example, using the JetS3t API.
HADOOP-3063 | io | Introduced BloomMapFile subclass of MapFile that creates a Bloom filter from all keys.
HADOOP-1230 | mapred | Replaced parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes.
HADOOP-1650 | mapred | Upgraded all core servers to use Jetty 6.
HADOOP-3923 | mapred | Moved class org.apache.hadoop.mapred.StatusHttpServer to org.apache.hadoop.http.HttpServer.
HADOOP-3986 | mapred | Removed classes org.apache.hadoop.mapred.JobShell and org.apache.hadoop.mapred.TestJobShell. Removed from JobClient methods static void setCommandLineConfig(Configuration conf) and public static Configuration getCommandLineConfig().
HADOOP-4188 | mapred | Removed Task's dependency on concrete file systems by taking list from FileSystem class. Added statistics table to FileSystem class. Deprecated FileSystem method getStatistics(Class<? extends FileSystem> cls).
HADOOP-4210 | mapred | Changed public class org.apache.hadoop.mapreduce.ID to be an abstract class. Removed from class org.apache.hadoop.mapreduce.ID the methods public static ID read(DataInput in) and public static ID forName(String str).
HADOOP-4305 | mapred | Improved TaskTracker blacklisting strategy to better exclude faulty trackers from executing tasks.
HADOOP-4435 | mapred | Changed JobTracker web status page to display the amount of heap memory in use. This changes the JobSubmissionProtocol.
HADOOP-4565 | mapred | Improved MultiFileInputFormat so that multiple blocks from the same node or same rack can be combined into a single split.
HADOOP-4749 | mapred | Added a new counter REDUCE_INPUT_BYTES.
HADOOP-4783 | mapred | Changed history directory permissions to 750 and history file permissions to 740.
HADOOP-3422 | metrics | Changed names of Ganglia metrics to avoid conflicts and to better identify source function.
HADOOP-4284 | security | Introduced HttpServer method to support global filters.
HADOOP-4575 | security | Introduced independent HSFTP proxy server for authenticated access to clusters.
HADOOP-4661 | tools/distcp | Introduced distch tool for parallel ch{mod, own, grp}.
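
The globbing change described under HADOOP-3497 may be easier to follow with a small model. The sketch below is not Hadoop code. It assumes, which the note itself does not state, that the old behaviour came from applying the filter to every intermediate path component rather than only to the final match. The in-memory tree `TREE` and the function `glob_status_model` are hypothetical names introduced for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory directory tree: parent path -> child names.
TREE = {"/": ["a", "d"], "/a": ["b", "c"], "/d": ["e"]}


def glob_status_model(pattern, accept, filter_every_level):
    """Expand an absolute glob one path component at a time.

    If filter_every_level is True, `accept` is applied to every
    intermediate path (the assumed old behaviour). Otherwise it is
    applied only to paths produced by the last pattern component.
    """
    parts = [p for p in pattern.split("/") if p]
    candidates = ["/"]
    for depth, part in enumerate(parts):
        is_last = depth == len(parts) - 1
        next_level = []
        for parent in candidates:
            for name in TREE.get(parent, []):
                if not fnmatchcase(name, part):
                    continue
                path = parent.rstrip("/") + "/" + name
                # Decide whether the filter applies at this depth.
                if (filter_every_level or is_last) and not accept(path):
                    continue
                next_level.append(path)
        candidates = next_level
    return candidates


def only_a_b(path):
    """Filter that accepts exactly /a/b, as in the release note."""
    return path == "/a/b"


print(glob_status_model("/*/*", only_a_b, filter_every_level=True))   # []
print(glob_status_model("/*/*", only_a_b, filter_every_level=False))  # ['/a/b']
```

In the first call the filter rejects /a before the second component is ever expanded, so nothing matches. In the second call the filter sees only complete matches and /a/b is returned, which is the outcome the note describes.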