Changes between Version 27 and Version 28 of PDAD

Timestamp: Jan 14, 2010, 12:59:57 AM
Author: cristina.basescu
Comment: --

PDAD

-                      v27
+                      v28
  * Project description: compare data analysis performed using (a) [http://hadoop.apache.org/mapreduce Hadoop's MapReduce], (b) [http://hadoop.apache.org/pig/ Hadoop's Pig], and (c) [http://www.mcs.anl.gov/research/projects/mpich2/ MPI]
 == Motivation ==
+== Contents ==
+== Technologies and Languages ==
+ * [http://hadoop.apache.org/ Hadoop Framework]
+ * [http://hadoop.apache.org/mapreduce/ MapReduce subproject]
+ * [http://hadoop.apache.org/pig/ Pig subproject]
+ * MPI
+   * http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
+ * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Introduction Introduction]
+   * Motivation
+   * Goals
+ * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Applications Applications]
+   * MapReduce
+   * Pig
+   * MPI
+ * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Performance Performance Analysis]
+   * Testing Infrastructure
+   * Parameters
+   * Results
+ * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Conclusions Conclusions]
-== Project Activity ==
- * Oct 25 - install Hadoop framework and get familiar with MapReduce and Pig; run examples
- * Nov 2 - project roadmap
- * Nov 22 - ideas for data analysis applications to implement
- * Dec 3 - decide on two data analysis applications; start implementation
- * Jan 9 - finish testing
-== Proposed Data Analysis Applications ==
- * '''Image filters and metadata processing''' - in this scenario, people upload pictures to a website and want to apply a filter (such as blur, sharpen, or emboss) to them, while the company wants to compile statistics on the pictures' metadata, such as camera type, shutter speed, ambient light levels, and whether the flash was used. This is a typical map-reduce application, especially for the metadata phase: map jobs extract the necessary metadata and group it, for example, by producer, and the reduce jobs count the number of occurrences. For the filter phase, the map job applies the filter, while the reduce job is an identity (pass-through) one.
-   * // ''TODO'': find source for downloading data
- * '''Inverted index for e-mails''' - email servers generate a huge amount of text. Just like web documents, email messages can be classified based on their content, and an inverted index would be useful for finding the emails relevant to a given query. This can be useful as an internal application that email service owners use to find information about their users or for advertising, but it is not appropriate as a public application because it violates privacy rules.
-   * http://cfdr.usenix.org/
-   * http://fta.inria.fr/apache2-default/pmwiki/
- * '''Semantic web - Recommendation system'''
-   * // ''TODO'' Add description
- * '''Weather Analysis'''
-   * There are lots of data sets freely available on the Internet.
-   * Issue - what exactly could we analyse? Maybe Emil can suggest an interesting analysis.
-   * // ''TODO'' Complete description
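The map and reduce roles described for the metadata statistics and the inverted index can be illustrated with a minimal in-memory sketch. This is not Hadoop code and not the project's implementation: the `map_reduce` helper, the record layouts (`camera_maker`, `(email_id, body)` pairs), and the sample data are all hypothetical, chosen only to show how the grouping by key works in both proposals.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy single-process stand-in for a map-reduce run (hypothetical helper).

    The mapper emits (key, value) pairs, values are grouped by key,
    and the reducer is called once per key with the grouped values.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Job 1: metadata statistics, counting pictures per camera producer.
def metadata_mapper(photo):
    # Assumed record layout: a dict with a "camera_maker" field.
    yield photo["camera_maker"], 1


def count_reducer(key, values):
    return sum(values)


# Job 2: inverted index, mapping each word to the emails that contain it.
def index_mapper(email):
    # Assumed record layout: (email_id, body) tuples.
    email_id, body = email
    for word in set(body.lower().split()):
        yield word, email_id


def postings_reducer(word, email_ids):
    return sorted(email_ids)


# Made-up sample data, for illustration only.
photos = [
    {"camera_maker": "Canon", "flash": True},
    {"camera_maker": "Nikon", "flash": False},
    {"camera_maker": "Canon", "flash": False},
]
emails = [(1, "meeting on the cluster"), (2, "cluster results")]

maker_counts = map_reduce(photos, metadata_mapper, count_reducer)
inverted_index = map_reduce(emails, index_mapper, postings_reducer)
```

In a real Hadoop job the grouping step is performed by the framework's shuffle phase across machines; here it is a plain dictionary, which keeps the sketch runnable on a single machine.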
-== Next meeting ==
- * testing infrastructure?
- * decide on the application(s)
- * 'play' on cluster
- * what is the project submission deadline?