
Changes between Version 20 and Version 21 of HadoopA51

Timestamp:: Jan 19, 2010, 6:35:44 PM
Author:: lgrijincu
Comment:: --


HadoopA51

== Performance ==
Our design would produce an estimated 64 TiB of rainbow table data if stored in binary form, and 64*2^40^*[log,,10,,(average hash value) + separator length] (the bracketed term is about 20) if stored as text files.
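To make the text-file estimate concrete, the formula can be evaluated directly. The sketch below is a minimal Python illustration; the average hash value of 2^63^ (the mean of a uniformly distributed 64-bit value), the one-character separator and the one-byte-per-character encoding are our assumptions, not figures reported by the experiment.

```python
import math

# Hypothetical inputs, not values reported by the experiment:
MULTIPLIER = 64 * 2**40       # leading factor of the formula above
AVERAGE_HASH_VALUE = 2**63    # assumed: mean of a uniform 64-bit value
SEPARATOR_LENGTH = 1          # assumed: one separator character per value


def text_size_bytes(multiplier: int, average_hash_value: int,
                    separator_length: int) -> float:
    """Evaluate multiplier * [log10(average hash value) + separator length].

    Each character is assumed to occupy one byte (ASCII text).
    """
    return multiplier * (math.log10(average_hash_value) + separator_length)


per_entry = math.log10(AVERAGE_HASH_VALUE) + SEPARATOR_LENGTH
size = text_size_bytes(MULTIPLIER, AVERAGE_HASH_VALUE, SEPARATOR_LENGTH)
print(f"bracketed term: {per_entry:.2f}")   # close to the ~20 quoted above
print(f"text estimate:  {size / 2**40:.0f} TiB")
```

With these assumptions the bracketed term comes out just under 20, matching the approximation in the text.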
In our tests, network traffic was not a bottleneck: it never reached the maximum of the 100 Mbps Ethernet connection. In a real cluster with a significant number of nodes, however, it might degrade performance.
The algorithms used in this experiment generated table data at a rate of about 10 KiB/s. On a single node, assuming that storing the data and sorting ''<key, value>'' pairs in larger setups does not worsen performance, this would amount to about 200 years of non-stop computing.
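The single-node figure follows from dividing the 64 TiB binary estimate by the measured rate. A minimal sketch, assuming a constant 10 KiB/s, no storage or sorting overhead (as the text does) and a 365.25-day year; the helper name `years_to_generate` is illustrative:

```python
# Assumptions: 64 TiB of output (binary estimate from the text), a sustained
# rate of 10 KiB/s on one node, and a 365.25-day year.
TABLE_SIZE_BYTES = 64 * 2**40
NODE_RATE_BYTES_PER_S = 10 * 2**10
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_generate(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Wall-clock years needed to produce total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / SECONDS_PER_YEAR


years = years_to_generate(TABLE_SIZE_BYTES, NODE_RATE_BYTES_PER_S)
print(f"single node: {years:.0f} years")
```

This gives roughly 218 years, in line with the "about 200 years" above.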
When run on four threads (again: two single-core machines and one dual-core machine connected by a 100 Mbit network), the total data rate was about 37 KiB/s.
Finishing the computation within one year would take a cluster of at least 250 machines. The real number of machines will surely be larger because, even though the mappers always run in parallel and have no data dependencies between them, storing this amount of data on the cluster will decrease performance. Moreover, sorting and transferring ''<key, value>'' pairs between nodes will be the major bottleneck. Because our tests never saturated the network bandwidth, we cannot estimate the factor by which this will decrease performance.
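The cluster size can be cross-checked the same way. The sketch below assumes perfectly linear scaling, treats the measured 37 KiB/s over four threads as about 9.25 KiB/s per single-core machine, and ignores the storage, sorting and transfer costs discussed above; the function name `machines_for_deadline` is illustrative.

```python
import math

TABLE_SIZE_BYTES = 64 * 2**40            # binary estimate from the text
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed year length
MEASURED_TOTAL_RATE = 37 * 2**10         # ~37 KiB/s measured in total
MEASURED_THREADS = 4                     # two single-core + one dual-core


def machines_for_deadline(total_bytes: float, per_machine_rate: float,
                          deadline_s: float) -> int:
    """Smallest machine count meeting the deadline, assuming linear scaling."""
    required_rate = total_bytes / deadline_s   # bytes/s the cluster must sustain
    return math.ceil(required_rate / per_machine_rate)


per_thread_rate = MEASURED_TOTAL_RATE / MEASURED_THREADS   # ~9.25 KiB/s
machines = machines_for_deadline(TABLE_SIZE_BYTES, per_thread_rate,
                                 SECONDS_PER_YEAR)
print(f"required rate: {TABLE_SIZE_BYTES / SECONDS_PER_YEAR / 2**10:.0f} KiB/s")
print(f"machines needed: {machines}")
```

Under these idealized assumptions it gives about 236 machines, the same order of magnitude as the figure above; each of the overheads listed in the text can only raise it.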