Changes between Version 19 and Version 20 of PDAD_Applications


Timestamp:
Jan 17, 2010, 9:26:03 PM
Author:
claudiu.gheorghe
Comment:

--

  • PDAD_Applications

    v19 v20  
    35 35 d. The Mapper classifies inputs and emits keys of the form 'duration classification - reason' with a value of 1, while the Reducer counts the values and outputs those keys whose count exceeds 1000. The Combiner does essentially the same thing as the Reducer.
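The Reducer/Combiner logic of point d can be sketched in plain Java, without the Hadoop API. The key format and the 1000 threshold come from the text above; the class and method names are illustrative only:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DurationCount {
    // Reducer-side logic: the Mapper has already emitted one
    // "duration classification - reason" key per input record, each
    // with a value of 1. Here we count occurrences per key and keep
    // only those whose count exceeds the threshold (1000 in the text).
    public static Map<String, Integer> reduce(List<String> mappedKeys, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : mappedKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        // Drop keys at or below the threshold; the Combiner would run
        // the same summing loop but skip this filtering step.
        counts.values().removeIf(count -> count <= threshold);
        return counts;
    }
}
```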
    36 36
    37 (removed) e. On the Mapper we emit a multiple value composed of the fault id and the count of 1. So the output of the Mapper looks like <fault_domain, <fault_code, 1>>. The multiple values are grouped by fault_domain, so we compute the sum for each fault_code using a HashMap<fault_code, sum>. The Reducer therefore emits multiple values to the output, one for each fault_code found.
    37 (added) e. On the Mapper we emit a multiple value composed of the fault id and the count of 1. So the output of the Mapper looks like <fault_domain, <fault_code, 1>>. The multiple values are grouped by fault_domain, so we compute the sum for each fault_code using a !HashMap<fault_code, sum>. The Reducer therefore emits multiple values to the output, one for each fault_code found.
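The per-domain summing of point e can be sketched in plain Java. Only the <fault_code, 1> value structure and the HashMap<fault_code, sum> come from the text; the class and method names are assumptions:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FaultSum {
    // Reducer logic for one fault_domain group: the incoming values
    // are <fault_code, 1> pairs. We accumulate them in a
    // HashMap<fault_code, sum> and emit one entry per fault_code.
    public static Map<String, Integer> reduce(List<SimpleEntry<String, Integer>> values) {
        Map<String, Integer> sums = new HashMap<>();
        for (SimpleEntry<String, Integer> value : values) {
            sums.merge(value.getKey(), value.getValue(), Integer::sum);
        }
        return sums;
    }
}
```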
    38 38
    39 39 f. This is a more interesting application, as it requires a natural join between the node table and the event_trace table on the platform_id and node_id fields. To do that, we concatenate the two join columns. The idea here is that the Mapper reads both files and, for each platform_id;node_id key, emits values consisting of 1s (the failures on that node) as well as a location. So after the map phase, for each platform_id;node_id key we should have many values of 1 plus a location. These pairs reach the Combiner; however, depending on which of the Mappers found the location, some Combiners may receive only 1s among their values, so the best effort here is to output the same key while adding up the 1 values. If the Combiner finds a location among the values, it outputs that pair without changing it. Now, in the third phase, all the pairs having the same key reach the Reducer, which sums the numeric values and outputs the location it finds among the values as a key. However, since multiple nodes can share the same location, each Reducer may output more than one numeric value for the same location, and these then have to be summed up. That's why we need a '''second map-reduce job''', with an identity Mapper and a Reducer that simply sums the values having the same key.
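The two-job scheme of point f can be sketched in plain Java. The key/value structure follows the description above; how count values and location values are told apart is not stated on the page, so the "L:" prefix used below is purely an illustrative assumption, as are the class and method names:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocationJoin {
    // Job 1, Reducer for one platform_id;node_id key: the values are
    // a mix of failure counts and (at most) one location. Location
    // values are assumed to carry an "L:" prefix for illustration.
    public static SimpleEntry<String, Integer> reduceNode(List<String> values) {
        String location = null;
        int failures = 0;
        for (String value : values) {
            if (value.startsWith("L:")) {
                location = value.substring(2);  // the joined-in location
            } else {
                failures += Integer.parseInt(value);  // sum the 1s
            }
        }
        // Emit the location as the key, the failure sum as the value.
        return new SimpleEntry<>(location, failures);
    }

    // Job 2, Reducer (after an identity Mapper): several nodes may map
    // to the same location, so sum the per-node failure counts per key.
    public static Map<String, Integer> reduceLocation(List<SimpleEntry<String, Integer>> pairs) {
        Map<String, Integer> totals = new HashMap<>();
        for (SimpleEntry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }
}
```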