Changes between Version 20 and Version 21 of HadoopA51


Timestamp:
Jan 19, 2010, 6:35:44 PM (14 years ago)
Author:
lgrijincu
Comment:

--

  • HadoopA51

    v20 v21  
    82 82
    83 83 == Performance ==
    84 Our design would produce an estimated 64TiB of rainbow table data (if stored binary) and 64*2^40^*[log_10_(medium_hash_value) + length of separators] (the last value is ~20) if stored as text files.
     84 Our design would produce an estimated 64TiB of rainbow table data (if stored binary) and 64*2^40^*[log_10_(medium_hash_value) + length of separators] if stored as text files.
     85 In our tests, network traffic was not a bottleneck: it never exceeded the capacity of the 100 Mbps Ethernet connection. In a real cluster with a significant number of nodes, however, it might degrade performance.
     86
     87 The algorithms used in this experiment generated table data at a rate of about 10 KiB/s. On a single node, and assuming that storing the data and sorting ''<key, value>'' pairs in larger setups does not worsen performance, generating the full 64 TiB would take about 200 years of non-stop computing.
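The single-node figure can be checked with a short calculation. The sketch below is only an illustration: the function name is hypothetical, and it assumes a constant 10 KiB/s rate and a 365-day year, neither of which is stated more precisely in the text above.

```python
TIB = 2**40                          # bytes in one tebibyte
KIB = 2**10                          # bytes in one kibibyte
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: 365-day year

def years_to_generate(total_bytes, rate_bytes_per_s):
    """Years of non-stop computing needed at a constant data rate."""
    return total_bytes / rate_bytes_per_s / SECONDS_PER_YEAR

# 64 TiB of table data at the measured ~10 KiB/s on one node
single_node_years = years_to_generate(64 * TIB, 10 * KIB)
print(f"{single_node_years:.0f} years")
```

Under these assumptions the result is a little over 200 years, which matches the rounded estimate.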
     88
     89 Run on four threads (again: two single-core machines and one dual-core machine connected over a 100 Mbit network), the total data rate was about 37 KiB/s.
     90 Finishing the computation within one year would take a cluster of at least 250 machines. The real number of machines will surely be larger because, even though the mappers always run in parallel and have no data dependencies between them, storing this amount of data on the cluster will decrease performance. Also, sorting and transferring ''<key, value>'' pairs between nodes will be the major bottleneck. Because we did not manage to saturate the network bandwidth in our tests, we cannot estimate the factor by which this will decrease performance.
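The cluster size can be estimated the same way. This is a rough sketch, not the authors' own calculation: it assumes perfectly linear scaling, takes the per-thread rate as 37 KiB/s divided by four threads, and uses a hypothetical helper name. As noted above, storage, sorting and transfer costs would push the real number higher.

```python
import math

TIB = 2**40
KIB = 2**10
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: 365-day year

def workers_needed(total_bytes, per_worker_rate, deadline_s):
    """Smallest number of workers that finishes by the deadline,
    assuming every worker sustains the same rate (linear scaling)."""
    return math.ceil(total_bytes / (per_worker_rate * deadline_s))

per_thread_rate = 37 * KIB / 4       # ~9.25 KiB/s per thread in the test run
workers = workers_needed(64 * TIB, per_thread_rate, SECONDS_PER_YEAR)
print(workers)
```

With these inputs the lower bound comes out at a bit under 250 worker threads, which is consistent with the "at least 250 machines" figure once some margin is added.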
     91
     92