= PDAD: Parallel Data Analysis Diff =

 * Acronym: '''PDAD''' (who's your ''parallel'' daddy)
 * SVN: https://svn-batch.grid.pub.ro/svn/PP2009/proiecte/pdad
 * Team members: Cristina Basescu - cristina.basescu, Claudiu-Dan Gheorghe - cluster_account_name2
 * Project description: compare data analysis performed using (a) Hadoop's MapReduce, (b) Hadoop's Pig, and (c) MPI

== Technologies and Languages ==

 * [http://hadoop.apache.org/ Hadoop Framework]
 * [http://hadoop.apache.org/mapreduce/ MapReduce subproject]
 * [http://hadoop.apache.org/pig/ Pig subproject]
 * MPI

== Project Activity ==

 * Oct 25 - install the Hadoop framework and get familiar with MapReduce and Pig; run examples
 * Nov 2 - project roadmap
 * Nov 22 - ideas for data analysis applications to implement
 * Dec 3 - decide on two data analysis applications; start implementation

== Proposed Data Analysis Applications ==

 * '''Image filters and metadata processing''' - in this scenario, people upload pictures to a website and want to apply a filter (such as blur, sharpen, emboss, etc.) to them, while the company wants to compile statistics on the pictures' metadata, such as camera type, shutter speed, ambient light levels, whether the flash was used, etc. This is a typical map-reduce application, especially for the metadata phase: the map jobs extract the necessary metadata and group it, for example, by producer, and the reduce jobs count the number of occurrences. For the filter phase, the map job applies the filter, while the reduce job is essentially an identity (pass-through) step.[[BR]] // ''TODO'': find source for downloading data
 * '''Inverted index for e-mails'''[[BR]] // ''TODO'' Add description
 * '''Semantic web - Recommendation system'''[[BR]] // ''TODO'' Add description
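The metadata phase of the image application described above can be sketched in plain Python, without Hadoop, to make the map / group / reduce flow concrete. This is only a minimal illustration under stated assumptions: the record layout, the `camera_make` field, the sample data, and all function names are hypothetical and are not taken from the project's code.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions for illustration.
SAMPLE_PICTURES = [
    {"file": "a.jpg", "camera_make": "Canon", "flash": True},
    {"file": "b.jpg", "camera_make": "Nikon", "flash": False},
    {"file": "c.jpg", "camera_make": "Canon", "flash": False},
    {"file": "d.jpg", "flash": True},  # metadata without a producer entry
]


def map_metadata(picture):
    """Map step: emit one (producer, 1) pair per picture."""
    producer = picture.get("camera_make", "unknown")
    yield (producer, 1)


def group_by_key(pairs):
    """Stand-in for the shuffle phase: collect all values emitted for each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(producer, values):
    """Reduce step: count the occurrences of one producer."""
    return (producer, sum(values))


def count_by_producer(pictures):
    """Run the map, group, and reduce steps sequentially in one process."""
    pairs = [pair for picture in pictures for pair in map_metadata(picture)]
    grouped = group_by_key(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(count_by_producer(SAMPLE_PICTURES))
```

In a real Hadoop job the grouping step is performed by the framework between the map and reduce phases; here it is written out explicitly only so the sketch runs on its own.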