Changes between Version 9 and Version 10 of PDAD_Performance

Timestamp: Jan 14, 2010, 3:33:15 PM
Author: cristina.basescu
Comment: --

PDAD_Performance

-                      v9
+                      v10
 The framework is indeed suitable for large data processing, as shown in the charts below. Increasing the replication factor would probably have increased the throughput for the seti dataset. [[Image(throughput_2.png)]] [[Image(time_char_1.png)]] [[Image(time_chart.png)]]
+When testing for
+Here are [http://spreadsheets.google.com/ccc?key=0Av7LR4rlPvTEdGFRQUdfSklFR29pR0NGRmowZ0otZGc&hl=en some detailed test results] with the number of tasks, maps and reduces.
+== Comparison ==
+ * High performance and scalability
+ * Portability
+ * Productivity is not a principal objective for MPI!!
+When testing the 4-node cluster, after the first job (the failureCause application on the seti dataset, which took 12 s in MapReduce, compared with 14 s on the 2-node cluster), one of the nodes froze. Hadoop proved its fault tolerance by still completing the tests, although the results were not meaningful. From the first test's output, however, we can conclude that there was some scalability.
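To put a number on "some scalability", the two timings quoted above (14 s on 2 nodes, 12 s on 4 nodes) can be turned into a relative speedup and a scaling efficiency. This is only a rough sketch based on a single run per cluster size; the function names are illustrative and not part of the project.

```python
def relative_speedup(t_small, t_large):
    """Speedup of the larger cluster relative to the smaller one."""
    return t_small / t_large


def scaling_efficiency(t_small, n_small, t_large, n_large):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    ideal = n_large / n_small
    return relative_speedup(t_small, t_large) / ideal


# Timings from the failureCause job on the seti dataset (one run each).
speedup = relative_speedup(14.0, 12.0)
efficiency = scaling_efficiency(14.0, 2, 12.0, 4)

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 1.17x, efficiency: 58%"
```

A speedup of about 1.17x from doubling the nodes is well below linear, which is consistent with a short job where fixed startup cost makes up a large share of the runtime, although a single measurement is not enough to say so with confidence.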
+We didn't test the applications for MPI, as the jobs in Hadoop took a really long time and we considered them more challenging. However, in our applications we ''assumed'' that the data was somehow present on all slaves, which in practice is not true. Consequently, we think MPI needs an underlying distributed file system, like NFS, to do things properly.
+From a portability point of view, Hadoop passes the test. We cannot say the same about MPI, which is highly dependent on the runtime system underneath. For example, the proper level of asynchronous message sending depends on the buffers used by the RTS. Also, some MPI implementations may offer features that others lack.
+It's interesting to discuss productivity