Changes between Version 8 and Version 9 of PDAD_Applications

Jan 14, 2010, 2:18:48 PM (14 years ago)



  • PDAD_Applications

    v8 v9  
    16 16 Making design choices for !MapReduce is an intricate task. On the one hand, there are many decisions to make: whether to use a Combiner (which cuts down the amount of data transferred from the Mapper to the Reducer), a Partitioner (which partitions the mappers' output among the reducers), a !CompressionCodec (which compresses the intermediate output passed from mappers to reducers), or a Comparator to perform a secondary sort before the reduce phase. On the other hand, combining too many of these extra features can lengthen development time more than the gains justify.
     18 b. We have tried two approaches here. In the first, the Mapper computes each job's duration and emits it under the same key for every pair; the Reducer then sums these values and computes the average. Unfortunately, no Combiner can be specified here, because the Reducer would then no longer know how many elements the Mapper generated.
     20 The second approach computes the average over chunks containing a fixed number of elements; the final result is the average of these chunk averages. Although this approach allows a Combiner to be specified, it yields only an approximation of the average, depending on the distribution of values across the chunks. In this case, the Mapper generates a new key for the durations after every chunk-size number of pairs, the Combiner computes the average of each chunk and outputs all chunk averages under the same key, and the Reducer computes the average as in the first approach.
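The two approaches can be compared with a minimal sketch in plain Python. It only simulates the map, combine and reduce steps in memory and does not use the Hadoop API; the function names, the `(start, end)` record layout and the sample data are assumptions made for illustration. In this particular sample the two results differ because the last chunk holds fewer elements than the others.

```python
from collections import defaultdict


def map_durations(jobs, chunk_size=None):
    """Mapper: emit (key, duration) for each (start, end) job record."""
    for i, (start, end) in enumerate(jobs):
        # Approach 1 uses one shared key; approach 2 uses a new key per chunk.
        key = 0 if chunk_size is None else i // chunk_size
        yield key, end - start


def average(values):
    values = list(values)
    return sum(values) / len(values)


def exact_average(jobs):
    """Approach 1: the reducer sees every duration under a single key."""
    return average(d for _, d in map_durations(jobs))


def chunked_average(jobs, chunk_size):
    """Approach 2: average each chunk, then average the chunk averages."""
    chunks = defaultdict(list)
    for key, duration in map_durations(jobs, chunk_size):
        chunks[key].append(duration)
    chunk_averages = [average(v) for v in chunks.values()]  # combiner step
    return average(chunk_averages)  # reducer step


# Hypothetical sample: five jobs with durations 10, 20, 30, 40 and 100.
jobs = [(0, 10), (0, 20), (0, 30), (0, 40), (0, 100)]
print(exact_average(jobs))       # 40.0
print(chunked_average(jobs, 2))  # 50.0 (chunk averages are 15, 35 and 100)
```

With a chunk size of 5 the whole sample falls into one chunk and both approaches return the same value.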
    18 24 TODO Claudiu for his apps
    20 26 == Pig ==
    22 On the contrary to !MapReduce, writing code in !PigLatin is as straight forward as it can be
     28 In contrast to !MapReduce, writing code in !PigLatin is as straightforward as it can get. There is no need to worry about ''how'' things are done; one just has to specify ''what'' needs to be done.
    23 30 TODO Claudiu for his apps