Changes between Version 7 and Version 8 of PDAD_Introduction


Ignore:
Timestamp:
Jan 15, 2010, 7:57:12 PM (14 years ago)
Author:
claudiu.gheorghe
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • PDAD_Introduction

    v7 v8  
    2020
    2121Apache Hadoop is an open-source implementation of the popular MapReduce paradigm introduced by Google for large-scale data processing. Hadoop is written in Java, but it provides support for specifying jobs in multiple languages like Python for example.
     22
    2223Hadoop provides a solid infrastructure that contains a distributed filesystem and a centralized fault-tolerant cluster architecture.
     24Even the framework is not so mature, multiple sub-projects appeared, being built on the Hadoop's MapReduce and targeting different types of application. We found interesting the Pig project, which is a high-level scripting language very similar to SQL used to analyze huge data sets.
     25
     26Our curiosity is: it's this language so cool? Is it so hard to design and implement data analysis problems in MapReduce?
    2327
    2428== MPI ==