Changes between Version 27 and Version 28 of PDAD


Timestamp:
Jan 14, 2010, 12:59:57 AM (14 years ago)
Author:
cristina.basescu
Comment:

--

  • PDAD

    v27 v28  
    99 * Project description: compare data analysis performed using (a) [http://hadoop.apache.org/mapreduce Hadoop's MapReduce] (b) [http://hadoop.apache.org/pig/ Hadoop's Pig] (c) [http://www.mcs.anl.gov/research/projects/mpich2/ MPI]
    1010
    11 == Motivation ==
     11== Contents ==
    1212
    13 == Technologies and Languages ==
    14  * [http://hadoop.apache.org/ Hadoop Framework]
    15  * [http://hadoop.apache.org/mapreduce/ MapReduce subproject]
    16  * [http://hadoop.apache.org/pig/ Pig subproject]
    17  * MPI
    18    * http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
     13 * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Introduction Introduction]
     14   * Motivation
     15   * Goals
     16 * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Applications Applications]
     17   * MapReduce
     18   * Pig
     19   * MPI
     20 * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Performance Performance Analysis]
     21   * Testing Infrastructure
     22   * Parameters
     23   * Results
     24 * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Conclusions Conclusions]
    1925
    20 == Project Activity ==
    21  * Oct 25 - install Hadoop framework and get familiar with MapReduce and Pig; run examples
    22  * Nov 2 - project roadmap
    23  * Nov 22 - ideas for data analysis applications to implement
    24  * Dec 3 - decide on two data analysis applications; start implementation
    25  * Jan 9 - finish testing
    26 
    27 == Proposed Data Analysis Applications ==
    28  * '''Image filters and metadata processing''' - in this scenario, people upload pictures to a website and want to apply a filter (such as blur, sharpen, emboss, etc.) to them, while the company wants to compile statistics on the pictures' metadata, such as camera type, shutter speed, ambient light levels, whether the flash was used, etc. This is a typical map-reduce application, especially for the metadata phase: map jobs extract the necessary metadata and group it, for example, by producer, and the reduce jobs count the number of occurrences. For the filter phase, the map job applies the filter, while the reduce job is an identity one (it passes the map output through unchanged).
    29    * // ''TODO'': find source for downloading data
    30 
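The metadata phase described above can be sketched in plain Python, without Hadoop, to show the shape of the map and reduce steps. This is only a minimal illustration under stated assumptions: the `camera_make` field, the sample records, and all function names are hypothetical and do not come from the project's code.

```python
from collections import defaultdict


def map_metadata(record):
    """Map step: emit (producer, 1) for one picture's metadata record."""
    # "camera_make" is an assumed field name; pictures without it count as "unknown".
    yield record.get("camera_make", "unknown"), 1


def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_count(key, values):
    """Reduce step: count the occurrences recorded for one producer."""
    return key, sum(values)


def count_by_producer(records):
    """Run map, shuffle, and reduce in-process over a list of metadata records."""
    pairs = (pair for record in records for pair in map_metadata(record))
    return dict(reduce_count(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample input, for illustration only.
SAMPLE = [
    {"camera_make": "Canon", "flash": True},
    {"camera_make": "Nikon", "flash": False},
    {"camera_make": "Canon", "flash": False},
    {"flash": True},
]

if __name__ == "__main__":
    print(count_by_producer(SAMPLE))
```

In a real Hadoop job the shuffle is done by the framework; it is written out here only so the sketch runs on its own.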
    31  * '''Inverted index for e-mails''' - email servers generate a huge amount of text. Just like web documents, email messages can be classified based on their content, and an inverted index would be useful for finding the emails relevant to a given query. This can serve as an internal application used by email service owners to find information about the users or for advertising, but it is not appropriate as a public application because it violates privacy rules.
    32    * http://cfdr.usenix.org/
    33    * http://fta.inria.fr/apache2-default/pmwiki/
    34 
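As a rough illustration of the inverted-index idea above, the following Python sketch maps each word to the list of emails that contain it and answers a query by intersecting those lists. The tokenizer, the message ids, and the function names are assumptions made for this example, not part of the project.

```python
import re
from collections import defaultdict

# Simplistic tokenizer (an assumption): lowercase runs of ASCII letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def map_email(email_id, body):
    """Map step: emit (word, email_id) once per distinct word in the message."""
    for word in set(TOKEN.findall(body.lower())):
        yield word, email_id


def build_inverted_index(emails):
    """Reduce step done in-process: collect, per word, the sorted ids of emails containing it."""
    index = defaultdict(set)
    for email_id, body in emails.items():
        for word, doc_id in map_email(email_id, body):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return the ids of emails that contain every word of the query."""
    words = TOKEN.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical messages, for illustration only.
    emails = {
        "m1": "Meeting about Hadoop cluster",
        "m2": "Hadoop and Pig results",
        "m3": "Lunch meeting",
    }
    index = build_inverted_index(emails)
    print(search(index, "hadoop meeting"))
```

Sorting the posting lists keeps the output deterministic, which makes results easier to compare across runs.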
    35  * '''Semantic web - Recommendation system'''
    36    * // ''TODO'' Add description
    37 
    38  * '''Weather Analysis'''
    39    * There are lots of data sets freely available on the Internet.
    40    * Issue - what exactly could we analyse? Maybe Emil can suggest an interesting analysis.
    41    * // ''TODO'' Complete description
    42 
    43 == Next meeting ==
    44 
    45  * testing infrastructure?
    46  * decide on the application(s)
    47  * 'play' on cluster
    48  * what is the submission deadline for the project?
    4926
    5027