Changes between Version 7 and Version 8 of PDAD_Introduction
- Timestamp:
- Jan 15, 2010, 7:57:12 PM (14 years ago)
Legend:
- Unmodified
- Added
- Removed
- Modified
-
PDAD_Introduction
v7 v8 20 20 21 21 Apache Hadoop is an open-source implementation of the popular MapReduce paradigm introduced by Google for large-scale data processing. Hadoop is written in Java, but it provides support for specifying jobs in multiple languages like Python for example. 22 22 23 Hadoop provides a solid infrastructure that contains a distributed filesystem and a centralized fault-tolerant cluster architecture. 24 Even the framework is not so mature, multiple sub-projects appeared, being built on the Hadoop's MapReduce and targeting different types of application. We found interesting the Pig project, which is a high-level scripting language very similar to SQL used to analyze huge data sets. 25 26 Our curiosity is: it's this language so cool? Is it so hard to design and implement data analysis problems in MapReduce? 23 27 24 28 == MPI ==