Changes between Version 17 and Version 18 of HadoopA51


Timestamp: Jan 19, 2010, 5:25:44 PM
Author: lgrijincu

 * running multiple map-reduce jobs to compute the tables from the input data (a possible job driver is sketched below)

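As an illustration of this step only, the sketch below shows how such per-table jobs might be submitted with Hadoop's ''org.apache.hadoop.mapreduce'' API. The class name ''TableDriver'', the configuration key ''a51.table.index'', and the output layout are hypothetical, and ''ChainBuilder.ChainWalkMapper'' / ''ChainBuilder.MergeReducer'' refer to the mapper and reducer sketched after the next paragraph; the actual project code may differ.

{{{#!java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TableDriver {
    public static void main(String[] args) throws Exception {
        // args: <dir with random start secrets> <output dir> <number of tables>
        int numTables = Integer.parseInt(args[2]);
        // One job per rainbow table: each table uses its own family of R functions,
        // selected here through a (hypothetical) table index in the job configuration.
        for (int table = 0; table < numTables; table++) {
            Configuration conf = new Configuration();
            conf.setInt("a51.table.index", table);
            Job job = Job.getInstance(conf, "a51-table-" + table);
            job.setJarByClass(TableDriver.class);
            job.setMapperClass(ChainBuilder.ChainWalkMapper.class);
            job.setReducerClass(ChainBuilder.MergeReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1] + "/table-" + table));
            job.waitForCompletion(true);
        }
    }
}
}}}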
The mapping phase of each map-reduce job reads, one by one, the secrets from the set of random secrets distributed to the current Hadoop node and generates the chain of alternating Ri and H applications. The result of a MAP instance is a ''<key, value>'' pair, with the ''key'' being the end secret and the ''value'' being the start secret of that chain. We use this approach to skip a phase that would otherwise have had to be implemented by hand: the end secrets need to be sorted and all their starting secrets gathered in the same place to make searching easy. This is achieved by relying on the internal behaviour of Hadoop's MapReduce implementation: each reducer receives a list of ''<key, value>'' pairs with the same ''key'' and ''reduces'' them to produce new ''<key, value>'' pairs. In our case the values are lists of starting secrets and the reduce step is simply a merge of these lists.
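A minimal sketch of this mapper/reducer pair, assuming Hadoop's ''org.apache.hadoop.mapreduce'' API with plain text keys and values, is shown below. The class names, the chain length, and the ''hashA51''/''reduceFn'' placeholders are illustrative only; the real code implements the A5/1 keystream as H and the table-specific Ri reductions.

{{{#!java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ChainBuilder {
    // Chain length is a tunable table parameter; the value is illustrative only.
    private static final int CHAIN_LENGTH = 1 << 12;

    /** Walks one chain: alternately applies the hash H and the reduction function Ri. */
    public static class ChainWalkMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String startSecret = line.toString().trim();
            String secret = startSecret;
            for (int i = 0; i < CHAIN_LENGTH; i++) {
                secret = reduceFn(hashA51(secret), i);
            }
            // Emit <end secret, start secret>; Hadoop's shuffle-and-sort groups all
            // pairs with the same end secret before handing them to a reducer.
            context.write(new Text(secret), new Text(startSecret));
        }
    }

    /** Merges the lists of start secrets that lead to the same end secret. */
    public static class MergeReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text endSecret, Iterable<Text> startSecrets, Context context)
                throws IOException, InterruptedException {
            StringBuilder merged = new StringBuilder();
            for (Text start : startSecrets) {
                if (merged.length() > 0) merged.append(',');
                merged.append(start.toString());
            }
            context.write(endSecret, new Text(merged.toString()));
        }
    }

    // Placeholder for the A5/1 keystream hash H; the real code runs the cipher.
    private static String hashA51(String secret) {
        return Integer.toHexString(secret.hashCode());
    }

    // Placeholder for the position-dependent reduction function Ri, which maps a
    // hash back into the secret space.
    private static String reduceFn(String hash, int i) {
        return Integer.toHexString((hash + i).hashCode());
    }
}
}}}

Emitting the end secret as the key is what lets Hadoop's built-in shuffle-and-sort do the grouping for free.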

The Map phase generates little traffic outside the node (it only receives the set of random input secrets and the specifications of the ''R'' functions).
The Reduce phase, on the other hand, generates a lot of traffic between nodes, because it needs to gather all ''<key, value>'' pairs with the same ''key'' in one place. Due to the nature of the A5/1 cipher and of the ''R'' functions there is no simple correlation between the start secret and the end secret that could be exploited to reduce this inter-node communication.

In our tests we had only four Hadoop threads mapped onto three machines (one dual-core machine and two single-core ones). In such a limited setup the amount of communication between the nodes was not a bottleneck and never saturated the available bandwidth (100 Mbps). In real setups with tens to hundreds of Hadoop machines this may become a problem and may skew our performance estimates.

== Searching a hash ==