= PDAD: Parallel Data Analysis Diff =

 * Acronym: '''PDAD''' (who's your ''parallel'' daddy)
 * SVN: https://svn-batch.grid.pub.ro/svn/PP2009/proiecte/pdad
   * project members: svn checkout https://pdad.googlecode.com/svn/trunk/ pdad --username google_username
   * non-members: svn checkout http://pdad.googlecode.com/svn/trunk/ pdad-read-only
 * Team members: Cristina Basescu - cristina.basescu, Claudiu-Dan Gheorghe - claudiu.gheorghe
 * Project description: compare data analysis performed using (a) [http://hadoop.apache.org/mapreduce Hadoop's MapReduce] (b) [http://hadoop.apache.org/pig/ Hadoop's Pig] (c) [http://www.mcs.anl.gov/research/projects/mpich2/ MPI]

== Contents ==

 * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Introduction Introduction]
   * Motivation
   * Goals
 * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Applications Applications]
   * MapReduce
   * Pig
   * MPI
 * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Performance Performance Analysis]
   * Testing Infrastructure
   * Parameters
   * Results
 * [https://ncit-cluster.grid.pub.ro/trac/PP2009/wiki/PDAD_Conclusions Conclusions]

== App ==

[available/unavailable probably refers to the type of resource that generated the fault (e.g. CPU availability 60%, or an unavailable resource)]

 * Which fault reason appears most often in events (event_trace.event_end_reason) - claudiu
 * What is the average duration of events - cristina
   * MapReduce DONE
   * Pig DONE
 * Which component appears most often in fault events (component.component_type code) - claudiu
 * With events split into categories by duration, which fault cause is most common in each category (event_trace.event_end_reason) - cristina [-> changed to listing, for each category, the number of jobs terminated by each frequent cause (>1000 failed)]
   * MapReduce DONE
   * Pig DONE
 * For each category in the event_trace.event_end_reason code ranges, which of the event_trace.event_end_reason code definitions appears most often (the number of times each one appears) - claudiu
 * In which geographic location are the nodes that record the most failures (node_location, looked up by the node_id from event_trace) - cristina
   * MapReduce DONE
   * Pig DONE

Note: for Pig, comments in the input do not break the script, but they are not ignored either, so results that involve a COUNT (such as the average) will not be correct.

Solution: use a custom load function, or strip the comments from the input file.

== Comparison ==

 * High performance and scalability
 * Portability
 * Productivity is not a principal objective for MPI!!
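
Returning to the Pig note in the App section: the second workaround (removing comments from the input before the analysis runs) can be illustrated with a small pre-processing sketch. This is only an illustration and not the project's code. It assumes that comment lines start with `#`, that fields are tab-separated, and that the first two columns hold hypothetical event start and end timestamps; none of these details are confirmed by the notes above.

```python
import io

# Assumption: comment lines in the trace input start with '#'.
COMMENT_PREFIX = "#"


def strip_comments(lines, prefix=COMMENT_PREFIX):
    """Yield only data lines, dropping blank lines and comment lines."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(prefix):
            yield line


def mean_duration(lines, start_col=0, end_col=1, sep="\t"):
    """Average of (end - start) over data lines only.

    start_col and end_col are hypothetical column positions.
    Returns None when there are no data lines.
    """
    total = 0.0
    count = 0
    for line in strip_comments(lines):
        fields = line.rstrip("\n").split(sep)
        total += float(fields[end_col]) - float(fields[start_col])
        count += 1
    return total / count if count else None


# Tiny in-memory sample: two comment lines and two data lines.
sample = io.StringIO(
    "# event_start\tevent_end\n"
    "10\t20\n"
    "# another comment\n"
    "30\t60\n"
)
sample_lines = sample.readlines()

naive_count = len(sample_lines)                  # counts comment lines too
clean_lines = list(strip_comments(sample_lines))  # data lines only
average = mean_duration(sample_lines)
```

In this sample the naive row count includes the two comment lines, which is the kind of skew the note describes for COUNT-based results. Filtering first keeps the count and the average consistent with the actual data rows.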