Changes between Version 4 and Version 5 of Hadoop

Jan 19, 2010, 10:33:20 AM (14 years ago)



  • Hadoop

    v4 v5  
    11= HadoopJUnitRunner: Hadoop Distributed JUnit Runner =
    2  * Team members: Matei Gruber - gruber.matei, Andreea Lucau - lucau.andreea, Anca Vamanu - vamanu.anca
     2 * Team members: Matei Gruber - gruber.matei, Andreea Lucau - lucau.andreea
    33 * The purpose of this project is to evaluate and implement a way of running JUnit tests in a distributed environment. We chose Hadoop as the distributed framework.
    99 * 15 December - Run JUnit tests with dependencies and add simple results reporting
    1010 * 3 January - Add extended error reporting: the entire stack trace
     12=== Motivation ===
     13The purpose of this project is to run a large number of tests in a distributed environment in order to get results faster and to take advantage of the distributed infrastructure available nowadays.
     14We chose Hadoop, an open source framework that comes with its own distributed filesystem and a MapReduce implementation. We used the pseudo-distributed configuration, which creates a cluster locally on the host machine. As the testing tool we used JUnit, a popular testing framework.
     15We aim to offer the developer the same experience when running JUnit tests with Hadoop as when running them on a single host machine, with the added benefit of a distributed environment: higher speed.
     17=== Project Architecture ===
     19=== Development Problems ===
     20During the development phase of the project we encountered several problems. We describe them below, along with the solutions we chose.
     22==== Sending tests to clusters ====
     23We had to design a protocol for sending the tests we want to run into the cluster, running them and, finally, gathering the results. To accomplish this, we decided to get the Description of each test we want to run (Description is a JUnit class containing information about a single test), serialize it and send it into the cluster. We also put a jar containing the JUnit test classes on the HDFS. In the Map phase we deserialize the Description and run the test; in the Reduce phase we serialize the result and send a <Description, Result> pair back to the centralized management console.
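The flow described above can be sketched in plain Java. This is only an illustration of the shape of the protocol, not the project's code: it uses neither the Hadoop nor the JUnit API, the class `TestDispatchSketch` and its methods are hypothetical names, a test is reduced to a string identifier, and the `runner` predicate stands in for actually executing a test.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for the map/reduce pipeline (no Hadoop or JUnit calls).
class TestDispatchSketch {

    // "Map" step: receive one test identifier, run it, emit an <id, outcome> pair.
    static Map.Entry<String, String> mapStep(String testId, Predicate<String> runner) {
        String outcome;
        try {
            outcome = runner.test(testId) ? "PASSED" : "FAILED";
        } catch (RuntimeException e) {
            // A crash inside the test is reported instead of aborting the whole job.
            outcome = "ERROR: " + e.getMessage();
        }
        return new AbstractMap.SimpleEntry<>(testId, outcome);
    }

    // "Reduce" step: gather the emitted pairs into one report for the console.
    static Map<String, String> reduceStep(List<Map.Entry<String, String>> pairs) {
        Map<String, String> report = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            report.put(pair.getKey(), pair.getValue());
        }
        return report;
    }

    // Drives both steps locally; in a real cluster the map calls would run on different nodes.
    static Map<String, String> runAll(List<String> testIds, Predicate<String> runner) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String testId : testIds) {
            pairs.add(mapStep(testId, runner));
        }
        return reduceStep(pairs);
    }
}
```

In this sketch each test is independent, which is what makes it possible to spread the map calls across nodes.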
     25==== Loading resources into the cluster ====
     26At runtime, a test may require custom user-defined classes. It would be impossible for a developer to know in advance all the classes needed to run the tests in every situation, so we needed a way of getting the classes required by the tests into the cluster. Hadoop has its own ClassLoader that searches for classes in the Hadoop Java environment, but this wasn't enough for us. Using the Hadoop API, we detect the situations in which the Hadoop class loader needs a class that is not yet on the HDFS, send a request for that class to a class server (a new component) and copy the bytecode of the class onto the HDFS.
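The general idea of fetching missing classes on demand can be illustrated with a standard Java class loader. This is a minimal sketch under stated assumptions, not the project's implementation: `FetchingClassLoader` is a hypothetical name, the class server is modelled as a plain function from class name to bytecode (returning `null` when the class is unknown), and the step of copying the bytecode to the HDFS is left out.

```java
import java.util.function.Function;

// Hypothetical loader: asks a "class server" only when the parent loader cannot find a class.
class FetchingClassLoader extends ClassLoader {

    // Assumed interface to the class server: class name in, bytecode out (null if unknown).
    private final Function<String, byte[]> classServer;

    FetchingClassLoader(ClassLoader parent, Function<String, byte[]> classServer) {
        super(parent);
        this.classServer = classServer;
    }

    // Called by loadClass() only after the parent loader has failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytecode = classServer.apply(name);
        if (bytecode == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytecode, 0, bytecode.length);
    }
}
```

Because `loadClass` delegates to the parent first, classes already available in the environment never trigger a request to the class server.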
     28==== Reporting results ====
     29After running the tests, we needed a way of sending the results from the reduce phase to the centralized management console. The problem was that the classic JUnit runner returns a Result object, which wasn't serializable. So we transformed this object into a byte array, encoded it in Base64 and sent the result in this format; upon receiving it, we decode the bytes and recreate the object.
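The byte-array and Base64 round trip can be sketched with the standard library alone. The names below are assumptions made for illustration: `ResultCodec` and `TestOutcome` are hypothetical, and `TestOutcome` is a small serializable summary standing in for the data taken from the JUnit Result. The sketch also uses `java.util.Base64` (Java 8 and later), which the original project could not have used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical helper that turns a serializable object into Base64 text and back.
class ResultCodec {

    // Hypothetical serializable summary of one test result.
    static class TestOutcome implements Serializable {
        private static final long serialVersionUID = 1L;
        final String testName;
        final boolean passed;
        final String stackTrace;

        TestOutcome(String testName, boolean passed, String stackTrace) {
            this.testName = testName;
            this.passed = passed;
            this.stackTrace = stackTrace;
        }
    }

    // Serialize to a byte array, then encode the bytes as Base64 text.
    static String encode(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Decode the Base64 text and rebuild the object from the bytes.
    static Object decode(String text) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }
}
```

Base64 text is convenient here because it survives transports that expect plain text values, at the cost of about a third more bytes than the raw serialized form.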