
Version 13 (modified by lgrijincu, 14 years ago)


HadoopA51: Hadoop implementation of a rainbow table generator and searcher for cracking the A5/1 cipher

  • Short name: HadoopA51
  • SVN: https://svn-batch.grid.pub.ro/svn/PP2009/proiecte/HadoopA51/ (imported from the GIT repository)
  • GIT: http://github.com/luciang/hadoop-rainbow-table-a51/ (with development history)
  • Project members: Lucian Adrian Grijincu - lucian.grijincu
  • Project description: the A5/1 cipher, used to encrypt data sent over the air between a cell phone and a cell tower, has been shown to be broken several times. In 2009 a group of researchers started to generate rainbow tables for A5/1 using CUDA machines. We seek to analyse the generation of rainbow tables, and the search on those tables, on a distributed Hadoop cluster.

History & Motivation

History

A5/1 is one of the most widely used stream ciphers worldwide (surpassing the ones used in ssh or https by total size of encrypted data per year). It is used to provide over-the-air (OTA) communication privacy and authentication in GSM cell phone networks, between a cell phone and the first cell tower.

Although the algorithm was not public at first, researchers made it available through reverse engineering and hardware monitoring. A reference implementation is given in https://svn-batch.grid.pub.ro/svn/PP2009/proiecte/HadoopA51/docs/a5/a5-1.c

Over time the algorithm was shown to be weak and broken (a selection of results):

  • 1997 - Jovan Golic published "Cryptanalysis of Alleged A5 Stream Cipher" (http://jya.com/a5-hack.htm). His attack is a time-memory trade-off attack (based on the birthday paradox) which reveals an unknown internal state at a known time for a known keystream sequence. This internal state is then used to obtain the secret key.
  • 2000 - Eli Biham and Orr Dunkelman analyze A5/1 in "Cryptanalysis of the A5/1 GSM Stream Cipher" and demonstrate an attack on A5/1 which relies on ~2^20 bits of known plaintext data.
  • 2003 - Barkan et al. published several active attacks on GSM that force a phone to use A5/2 (which is weaker than A5/1). As GSM implementations use the same key for A5/2 and A5/1, this led to breaking A5/1 as well.
  • 2005 - The same authors in "Conditional Estimators: An Effective Attack on A5/1" provide yet another attack on A5/1.
  • 2009 - Karsten Nohl and Chris Paget provide details of a rainbow table attack on A5/1 in "GSM: SRSLY?" at the 26th Chaos Communication Congress (26C3). This attack was announced in September 2009 and led to this project (HadoopA51).

Motivation

Such a widely used weak cipher must have sparked the interest of government agencies and criminal groups that want to illegally obtain information exchanged over cell phones.

Although the cipher has been shown to be broken several times over at least the last 13 years, the GSM Association has not deployed a stronger cipher to replace A5/1 (the proposed A5/3 is not yet supported in all networks or devices, and it too has been shown to be broken).

This project seeks to demonstrate that using commodity hardware organised in a Hadoop cluster can lead to an effective attack against A5/1.

Related Work

Project design

Rainbow tables

Table generation

Generating all possible outputs for known input strings of a certain length and all possible keys produces too much data, which makes storing and searching such an immense database impractical.
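To give a feel for the scale involved, here is a rough back-of-the-envelope estimate. It assumes a 64-bit A5/1 key and, purely for illustration, that each entry stores the key plus 8 bytes of keystream; the entry layout and the function names are assumptions, not part of the project.

```python
# Rough size of an exhaustive (key -> keystream sample) table.
# Assumption: one entry per possible key, each entry holding the key
# itself plus a fixed-size keystream sample.

KEY_BITS = 64       # A5/1 secret key length in bits
SAMPLE_BYTES = 8    # assumed keystream bytes stored per key (illustrative)


def exhaustive_table_bytes(key_bits: int = KEY_BITS,
                           sample_bytes: int = SAMPLE_BYTES) -> int:
    """Bytes needed to store one (key, keystream sample) pair per key."""
    entry_bytes = key_bits // 8 + sample_bytes
    return (2 ** key_bits) * entry_bytes


def to_exbibytes(n_bytes: int) -> float:
    """Convert a byte count to EiB (2**60 bytes)."""
    return n_bytes / 2 ** 60


if __name__ == "__main__":
    size = exhaustive_table_bytes()
    print(f"exhaustive table: about {to_exbibytes(size):.0f} EiB")
```

Under these assumptions the table would need 2^68 bytes, which is far beyond what a cluster of commodity machines can store or search.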

To alleviate the shortcomings of this approach, we generate a number of precomputed hash chains organized as rainbow tables.

For each table, we define a set of reduction functions R that map hash values to secret values. The reduction functions used in each table must be different. This is needed because the R functions are not inverses of H, the crypto function, so different input values can produce the same end value. Such a merger of two chains makes it difficult to determine which secret value generated a certain end secret. We address this problem by generating a series of tables of chains: the probability that two chains merge at the same position and stay merged is smaller when each table uses a different set of reduction functions.
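A minimal sketch of what a family of per-table reduction functions could look like is given below. The construction (adding a table-and-step dependent offset and truncating), the name make_reduction, and the toy 24-bit secret space are all assumptions chosen for illustration; they are not taken from the project's code.

```python
# Hypothetical family of reduction functions R, one per (table, step).
# Each R maps a hash value back into the space of secrets. Different
# tables use different offsets, so the same hash value is reduced to a
# different secret in each table.

SECRET_BITS = 24     # toy secret space, far smaller than a real key
MAX_STEPS = 1 << 16  # assumed upper bound on chain length


def make_reduction(table_id: int, step: int):
    """Return the reduction function for a given table and chain step."""
    offset = table_id * MAX_STEPS + step

    def reduce_hash(hash_value: int) -> int:
        # Shift by the offset, then truncate to SECRET_BITS bits.
        return (hash_value + offset) % (1 << SECRET_BITS)

    return reduce_hash


if __name__ == "__main__":
    h = 0x1234_5678_9ABC_DEF0  # some hash value
    r_table0 = make_reduction(table_id=0, step=3)
    r_table1 = make_reduction(table_id=1, step=3)
    # Same hash, same step, different table: different secrets.
    print(hex(r_table0(h)), hex(r_table1(h)))
```

With this kind of construction, two chains that hit the same hash value in different tables continue with different secrets, so a merger in one table does not carry over to the others.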

Chain generation selects a set of random input values and, for each of them, repeatedly applies H (the crypto function, in our case A5/1) followed by R_i. For each input secret value we store only the START and END secrets of the chain. Theoretically, chains can have any length we want (from 1-2 rounds of H and R_i applications to 2^15-2^20 applications). In practice, shorter chains require bigger tables, while longer chains increase the lookup time.
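The chain generation described above can be sketched roughly as follows. This is only an illustration: SHA-256 truncated to 64 bits stands in for A5/1, the reduction function is a simple hypothetical one, and the names build_chain and build_table, as well as the toy sizes, are assumptions.

```python
import hashlib
import random

SECRET_BITS = 20      # toy secret space (illustrative only)
CHAIN_LENGTH = 1000   # number of H / R_i rounds per chain (illustrative)


def H(secret: int) -> int:
    """Stand-in for the crypto function (NOT A5/1): truncated SHA-256."""
    digest = hashlib.sha256(secret.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def R(i: int, hash_value: int) -> int:
    """Hypothetical reduction function for chain position i."""
    return (hash_value + i) % (1 << SECRET_BITS)


def build_chain(start: int, length: int = CHAIN_LENGTH):
    """Apply H then R_i for each position; keep only START and END."""
    secret = start
    for i in range(length):
        secret = R(i, H(secret))
    return start, secret


def build_table(num_chains: int, length: int = CHAIN_LENGTH, seed: int = 0):
    """Build chains from random START secrets, indexed by END secret."""
    rng = random.Random(seed)
    table = {}
    for _ in range(num_chains):
        start, end = build_chain(rng.getrandbits(SECRET_BITS), length)
        table[end] = start  # chains with the same END overwrite each other
    return table


if __name__ == "__main__":
    table = build_table(num_chains=100, length=200)
    print(f"stored {len(table)} (END -> START) pairs")
```

Indexing the table by END secret is one possible choice that suits the lookup phase, where candidate END values are searched for; intermediate secrets are never stored, which is where the memory saving comes from.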

Table lookup

Table generation

Searching a hash
