Changes between Version 10 and Version 11 of PDAD_Applications


Timestamp: Jan 14, 2010, 2:37:06 PM
Author: cristina.basescu
Comment: --

d. The Mapper classifies inputs and emits keys of the form 'duration classification - reason' with a value of 1, while the Reducer counts the values and outputs them only if they exceed 1000. The Combiner does essentially the same thing as the Reducer.
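A minimal Hadoop sketch of this job is below; the classify() helper is hypothetical, standing in for whatever logic derives the 'duration classification - reason' string from a record. One caveat: the Combiner can share the Reducer's summing logic, but it should not apply the 1000 threshold, since a partial count below 1000 may still contribute to a global count above it.

{{{
#!java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class DurationClassification {

  public static class ClassifyMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // classify() is a placeholder for the actual classification logic.
      String key = classify(line.toString());
      context.write(new Text(key), ONE);
    }

    private String classify(String record) {
      // ... derive "duration classification - reason" from the record ...
      return record; // placeholder
    }
  }

  // The Combiner only sums partial counts; it must not apply the 1000
  // threshold, or correct global counts could be lost.
  public static class SumCombiner
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) sum += v.get();
      context.write(key, new LongWritable(sum));
    }
  }

  // The Reducer sums the counts and emits a key only if its total exceeds 1000.
  public static class ThresholdReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) sum += v.get();
      if (sum > 1000) context.write(key, new LongWritable(sum));
    }
  }
}
}}}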
f. This is a more interesting application, as it requires a natural join between the node table and the event_trace table on the platform_id and node_id fields. To do that, we concatenate the two join columns. The idea is that the Mapper reads both files and emits, for a platform_id;node_id key, values consisting of 1s (one per failure on that node) as well as the node's location. So after the map phase, for a platform_id;node_id key we should have many values of 1 plus a location. These pairs then reach the Combiner; however, depending on which of the Mappers found the location, some Combiners may receive only values of 1, so the best effort here is to output the same key while adding up the 1 values. If the Combiner finds a location among the values, it outputs that pair unchanged. In the third phase, all pairs having the same key reach the Reducer, which sums the numeric values and outputs the total under the location it finds among the values as the key. However, since multiple nodes can be in the same location, each Reducer may output more than one numeric value for the same location, and these then have to be summed up. That's why we need a '''second map-reduce job''', with an identity Mapper and a Reducer that simply sums the values having the same key.
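Below is a minimal sketch of the first job, under some assumptions not stated in the original: records are comma-separated, the join columns are the first two fields, the node table's location is the third field, and the hypothetical isNodeRecord() test tells the two tables apart. Since the values are plain Text, anything that parses as a number is treated as a (partial) failure count and anything else as a location.

{{{
#!java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class FailuresPerLocation {

  // Reads both the node table and the event_trace table; the concatenated
  // platform_id;node_id string serves as the join key.
  public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] f = line.toString().split(",");
      Text key = new Text(f[0] + ";" + f[1]); // platform_id;node_id
      if (isNodeRecord(f)) {
        context.write(key, new Text(f[2]));   // location (assumed field index)
      } else {
        context.write(key, new Text("1"));    // one failure event
      }
    }

    private boolean isNodeRecord(String[] fields) {
      return fields.length == 3; // placeholder test for which table a record came from
    }
  }

  // Best effort: sum the 1s locally; pass a location through unchanged,
  // since only the Mapper that read the node table has seen it.
  public static class JoinCombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (Text v : values) {
        try { sum += Long.parseLong(v.toString()); }
        catch (NumberFormatException e) { context.write(key, v); } // a location
      }
      if (sum > 0) context.write(key, new Text(Long.toString(sum)));
    }
  }

  // Sums the numeric values and re-keys the total by the location
  // found among the values.
  public static class JoinReducer extends Reducer<Text, Text, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      String location = null;
      for (Text v : values) {
        try { sum += Long.parseLong(v.toString()); }
        catch (NumberFormatException e) { location = v.toString(); }
      }
      if (location != null) context.write(new Text(location), new LongWritable(sum));
    }
  }
}
}}}

For the second job, Hadoop's default (identity) Mapper together with a summing Reducer such as org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer can add up the per-node totals that share a location.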
TODO Claudiu for his apps