Changes between Version 9 and Version 10 of PDAD_Performance

Jan 14, 2010, 3:33:15 PM



  • PDAD_Performance

    The framework is indeed suitable for large data processing, as the charts below show. Increasing the replication factor would probably have increased the throughput for the seti dataset. [[Image(throughput_2.png)]] [[Image(time_char_1.png)]] [[Image(time_chart.png)]]
    When testing for
    Here are [ some detailed test results] with the number of tasks, maps and reduces.
    == Comparison ==
     * High performance and scalability
     * Portability
     * Productivity is not a principal objective for MPI!!
    When testing the 4-node cluster, after the first job (the failureCause application on the seti dataset, which ran in 12 s in MapReduce, versus 14 s in MapReduce on the 2-node cluster), one of the nodes froze. Hadoop proved its fault tolerance by still completing the remaining tests, although without giving relevant results. Still, from the first test's output we can conclude that there was some scalability.
    We didn't test the applications with MPI, as the Hadoop jobs already took a really long time and we considered them more challenging. However, in our applications we ''assumed'' that the data was somehow present on all slaves, which in practice is not true. Consequently, we think MPI needs an underlying distributed file system, such as NFS, to do things properly.
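To make the shared-storage point concrete, here is a minimal sketch of the hidden assumption. The path `/nfs/data/input.txt` and the even-split logic are our own illustration, not the project's code: each rank seeks to its own slice of one input file, which only gives every rank the same bytes if the path sits on a shared file system such as NFS.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: each MPI rank reads its own slice of a common
 * input file. This is only correct when the path resolves to the same
 * file on every node, i.e. it lives on a shared mount (NFS or similar).
 * Path and partitioning are hypothetical, not from the project. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    FILE *f = fopen("/nfs/data/input.txt", "rb");
    if (!f) MPI_Abort(MPI_COMM_WORLD, 1);

    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    long chunk = len / size;                          /* even split */
    long off = (long)rank * chunk;
    long n = (rank == size - 1) ? len - off : chunk;  /* last rank takes the tail */

    char *buf = malloc(n + 1);
    fseek(f, off, SEEK_SET);
    size_t got = fread(buf, 1, (size_t)n, f);
    buf[got] = '\0';
    /* ... process this rank's slice locally, then reduce/gather ... */

    free(buf);
    fclose(f);
    MPI_Finalize();
    return 0;
}
```

Without a shared mount, each rank would open a different local file (or none at all), which is exactly the assumption described above.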
    From a portability point of view, Hadoop passes the test. We cannot say the same about MPI, which is highly dependent on the underlying runtime system. For example, how much asynchronous message sending is actually possible depends on the buffers used by the RTS. Also, some MPI implementations may offer features that others lack.
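The buffering point can be made concrete. A plain `MPI_Send` is allowed to block until a matching receive is posted; whether it returns earlier depends on how much internal buffering the particular runtime provides. The sketch below is our own illustration (not the project's code): a pairwise exchange written with two blocking sends deadlocks on some implementations, while the rewrite with a nonblocking `MPI_Isend` is portable.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = rank ^ 1;          /* assumes exactly 2 ranks: 0 <-> 1 */
    int out = rank, in = -1;

    /* Non-portable: if both ranks ran
     *     MPI_Send(...); MPI_Recv(...);
     * the program would deadlock whenever the runtime's internal send
     * buffers are too small for MPI_Send to return before the matching
     * receive is posted -- exactly the RTS dependence noted above. */

    /* Portable: the send starts immediately and completes in MPI_Wait. */
    MPI_Request req;
    MPI_Isend(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &req);
    MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    printf("rank %d received %d\n", rank, in);
    MPI_Finalize();
    return 0;
}
```

Run with `mpirun -n 2`; the `MPI_Isend`/`MPI_Recv`/`MPI_Wait` pattern behaves the same regardless of how the implementation buffers messages.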
    It's interesting to discuss productivity