
Version 5 (modified by cristina.basescu, 14 years ago)

PDAD: Parallel Data Analysis Diff

  • Team members: Cristina Basescu - cristina.basescu, Claudiu-Dan Gheorghe - cluster_account_name2
  • Project description: compare data analysis performed using (a) Hadoop's MapReduce, (b) Hadoop's Pig, and (c) MPI

Technologies and Languages

Project Activity

  • Oct 25 - install the Hadoop framework and get familiar with MapReduce and Pig; run examples
  • Nov 2 - project roadmap
  • Nov 22 - ideas for data analysis applications to implement
  • Dec 3 - decide on two data analysis applications; start implementation

Proposed Data Analysis Applications

  • Image filters and metadata processing - in this scenario, people upload pictures to a website and want to apply a filter (such as blur, sharpen, or emboss) to them, while the company wants to compute statistics on the pictures' metadata, such as camera type, shutter speed, ambient light levels, and whether the flash was used. This is a typical map-reduce application, especially for the metadata phase: the map jobs extract the necessary metadata and group it by key (for example, by producer), and the reduce jobs count the number of occurrences per key. For the filter phase, the map job applies the filter, while the reduce job is an identity (pass-through) one.
    • TODO: find source for downloading data
  • Inverted index for e-mails
    • TODO: add description
  • Semantic web - Recommendation system
    • TODO: add description
  • Weather Analysis
    • Issue: what exactly could we analyse?
    • TODO: complete description
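
The two phases of the image application above can be sketched in plain Python to make the map and reduce roles concrete. This is only an in-memory illustration, not Hadoop, Pig, or MPI code: the sample records, the field names (id, camera_make, pixels), the invert filter, and the run_job helper are all hypothetical stand-ins chosen for the example.

```python
from collections import defaultdict

# Hypothetical sample records; real input would be uploaded pictures
# and their metadata. Pixels are toy grayscale values.
PHOTOS = [
    {"id": "a.jpg", "camera_make": "Canon", "pixels": [0, 128, 255]},
    {"id": "b.jpg", "camera_make": "Nikon", "pixels": [10, 20, 30]},
    {"id": "c.jpg", "camera_make": "Canon", "pixels": [255, 255, 0]},
]


def map_metadata(photo):
    """Metadata phase, map: emit (producer, 1) for one picture."""
    yield photo["camera_make"], 1


def reduce_count(key, values):
    """Metadata phase, reduce: count the occurrences of one producer."""
    return key, sum(values)


def invert(pixels):
    """Toy filter (assumption): invert 8-bit grayscale values."""
    return [255 - p for p in pixels]


def map_filter(photo):
    """Filter phase, map: apply the filter, keyed by picture id."""
    yield photo["id"], invert(photo["pixels"])


def identity_reduce(key, values):
    """Filter phase, reduce: pass the map output through unchanged."""
    return key, values


def run_job(records, mapper, reducer):
    """In-memory stand-in for a job: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


counts = run_job(PHOTOS, map_metadata, reduce_count)
filtered = run_job(PHOTOS, map_filter, identity_reduce)
print(counts)             # {'Canon': 2, 'Nikon': 1}
print(filtered["a.jpg"])  # [[255, 127, 0]]
```

In the filter phase each picture id is unique, so the reduce step has nothing to aggregate; this is why a pass-through reducer is enough there, while the metadata phase relies on the reducer to do the counting.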