Changes between Version 17 and Version 18 of HadoopA51


Timestamp: Jan 19, 2010, 5:25:44 PM
Author: lgrijincu

 * running multiple map-reduce jobs to compute the tables from the input data (a possible job driver is sketched below)

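As an illustration of this step only, the sketch below shows how such per-table jobs might be submitted with Hadoop's ''org.apache.hadoop.mapreduce'' API. The class name ''TableDriver'', the configuration key ''a51.table.index'', and the output layout are hypothetical, and ''ChainBuilder.ChainWalkMapper'' / ''ChainBuilder.MergeReducer'' refer to the mapper and reducer sketched after the next paragraph; the actual project code may differ.

{{{#!java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TableDriver {
    public static void main(String[] args) throws Exception {
        // args: <dir with random start secrets> <output dir> <number of tables>
        int numTables = Integer.parseInt(args[2]);
        // One job per rainbow table: each table uses its own family of R functions,
        // selected here through a (hypothetical) table index in the job configuration.
        for (int table = 0; table < numTables; table++) {
            Configuration conf = new Configuration();
            conf.setInt("a51.table.index", table);
            Job job = Job.getInstance(conf, "a51-table-" + table);
            job.setJarByClass(TableDriver.class);
            job.setMapperClass(ChainBuilder.ChainWalkMapper.class);
            job.setReducerClass(ChainBuilder.MergeReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1] + "/table-" + table));
            job.waitForCompletion(true);
        }
    }
}
}}}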
The mapping phase of each map-reduce job reads, one by one, the secrets from the set of random secrets distributed to the current Hadoop node and generates the chain of alternating Ri and H applications. The result of a MAP instance is a ''<key, value>'' pair, with the ''key'' being the end secret and the ''value'' being the start secret of that chain. We use this approach to skip a phase that would otherwise have had to be implemented by hand: the end secrets need to be sorted and all their starting secrets gathered in the same place to make searching easy. This is achieved by relying on the internal behaviour of Hadoop's MapReduce implementation: each reducer receives a list of ''<key, value>'' pairs with the same ''key'' and ''reduces'' them to produce new ''<key, value>'' pairs. In our case the values are lists of starting secrets and the reduce step is simply a merge of these lists.
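A minimal sketch of this mapper/reducer pair, assuming Hadoop's ''org.apache.hadoop.mapreduce'' API with plain text keys and values, is shown below. The class names, the chain length, and the ''hashA51''/''reduceFn'' placeholders are illustrative only; the real code implements the A5/1 keystream as H and the table-specific Ri reductions.

{{{#!java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ChainBuilder {
    // Chain length is a tunable table parameter; the value is illustrative only.
    private static final int CHAIN_LENGTH = 1 << 12;

    /** Walks one chain: alternately applies the hash H and the reduction function Ri. */
    public static class ChainWalkMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String startSecret = line.toString().trim();
            String secret = startSecret;
            for (int i = 0; i < CHAIN_LENGTH; i++) {
                secret = reduceFn(hashA51(secret), i);
            }
            // Emit <end secret, start secret>; Hadoop's shuffle-and-sort groups all
            // pairs with the same end secret before handing them to a reducer.
            context.write(new Text(secret), new Text(startSecret));
        }
    }

    /** Merges the lists of start secrets that lead to the same end secret. */
    public static class MergeReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text endSecret, Iterable<Text> startSecrets, Context context)
                throws IOException, InterruptedException {
            StringBuilder merged = new StringBuilder();
            for (Text start : startSecrets) {
                if (merged.length() > 0) merged.append(',');
                merged.append(start.toString());
            }
            context.write(endSecret, new Text(merged.toString()));
        }
    }

    // Placeholder for the A5/1 keystream hash H; the real code runs the cipher.
    private static String hashA51(String secret) {
        return Integer.toHexString(secret.hashCode());
    }

    // Placeholder for the position-dependent reduction function Ri, which maps a
    // hash back into the secret space.
    private static String reduceFn(String hash, int i) {
        return Integer.toHexString((hash + i).hashCode());
    }
}
}}}

Emitting the end secret as the key is what lets Hadoop's built-in shuffle-and-sort do the grouping for free.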

The Map phase generates little traffic outside the node (it only receives the set of random input secrets and the specifications of the ''R'' functions).
The Reduce phase, on the other hand, generates a lot of traffic between nodes, because it needs to gather all ''<key, value>'' pairs with the same ''key'' in one place. Due to the nature of the A5/1 cipher and of the ''R'' functions there is no simple correlation between the start secret and the end secret that could be exploited to reduce this inter-node communication.

In our tests we had only four Hadoop threads mapped onto three machines (one dual-core machine and two single-core ones). In such a limited setup the amount of communication between the nodes was not a bottleneck and never saturated the available bandwidth (100 Mbps). In real setups with tens to hundreds of Hadoop machines this may become a problem and may skew our performance estimates.

== Searching a hash ==