Changes between Version 9 and Version 10 of PDAD_Performance

Jan 14, 2010, 3:33:15 PM



  • PDAD_Performance

    The framework is indeed suitable for large data processing, as the charts below show. Increasing the replication factor would probably have increased the throughput for the seti dataset. [[Image(throughput_2.png)]] [[Image(time_char_1.png)]] [[Image(time_chart.png)]]
    When testing for
    Here are [ some detailed test results] with the number of tasks, maps and reduces.
    == Comparison ==
     * High performance and scalability
     * Portability
     * Productivity is not a principal objective for MPI!!
    When testing the 4-node cluster, after the first job (the failureCause application on the seti dataset, which ran in 12 s in MapReduce, versus 14 s in MapReduce on the 2-node cluster), one of the nodes froze. Hadoop proved its fault tolerance by still completing the remaining tests, although without giving relevant results. Still, from the first test's output we can conclude that there was some scalability.
    We didn't test the applications with MPI, as the Hadoop jobs already took a really long time and we considered them more challenging. However, in our applications we ''assumed'' that the data was somehow present on all slaves, which in practice is not true. Consequently, we think MPI needs an underlying distributed file system, such as NFS, to do things properly.
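To make the shared-storage point concrete, here is a minimal sketch of the hidden assumption. The path `/nfs/data/input.txt` and the even-split logic are our own illustration, not the project's code: each rank seeks to its own slice of one input file, which only gives every rank the same bytes if the path sits on a shared file system such as NFS.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: each MPI rank reads its own slice of a common
 * input file. This is only correct when the path resolves to the same
 * file on every node, i.e. it lives on a shared mount (NFS or similar).
 * Path and partitioning are hypothetical, not from the project. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    FILE *f = fopen("/nfs/data/input.txt", "rb");
    if (!f) MPI_Abort(MPI_COMM_WORLD, 1);

    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    long chunk = len / size;                          /* even split */
    long off = (long)rank * chunk;
    long n = (rank == size - 1) ? len - off : chunk;  /* last rank takes the tail */

    char *buf = malloc(n + 1);
    fseek(f, off, SEEK_SET);
    size_t got = fread(buf, 1, (size_t)n, f);
    buf[got] = '\0';
    /* ... process this rank's slice locally, then reduce/gather ... */

    free(buf);
    fclose(f);
    MPI_Finalize();
    return 0;
}
```

Without a shared mount, each rank would open a different local file (or none at all), which is exactly the assumption described above.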
    From a portability point of view, Hadoop passes the test. We cannot say the same about MPI, which is highly dependent on the underlying runtime system. For example, how much asynchronous message sending is actually possible depends on the buffers used by the RTS. Also, some MPI implementations may offer features that others lack.
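The buffering point can be made concrete. A plain `MPI_Send` is allowed to block until a matching receive is posted; whether it returns earlier depends on how much internal buffering the particular runtime provides. The sketch below is our own illustration (not the project's code): a pairwise exchange written with two blocking sends deadlocks on some implementations, while the rewrite with a nonblocking `MPI_Isend` is portable.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = rank ^ 1;          /* assumes exactly 2 ranks: 0 <-> 1 */
    int out = rank, in = -1;

    /* Non-portable: if both ranks ran
     *     MPI_Send(...); MPI_Recv(...);
     * the program would deadlock whenever the runtime's internal send
     * buffers are too small for MPI_Send to return before the matching
     * receive is posted -- exactly the RTS dependence noted above. */

    /* Portable: the send starts immediately and completes in MPI_Wait. */
    MPI_Request req;
    MPI_Isend(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &req);
    MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    printf("rank %d received %d\n", rank, in);
    MPI_Finalize();
    return 0;
}
```

Run with `mpirun -n 2`; the `MPI_Isend`/`MPI_Recv`/`MPI_Wait` pattern behaves the same regardless of how the implementation buffers messages.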
    It's interesting to discuss productivity