Cryptanalysis of Hagelin machine pin wheels
Cryptologia, Oct 2002 by Sullivan, Geoff
ABSTRACT: A ciphertext-only attack on the pin wheel patterns of the Hagelin CD-57 Cryptographer is described. The method is also applicable to some earlier Hagelin machines of the pin wheel and lug variety, for example the M-209. The only prior knowledge required is the setting of the lugs and the plain text frequency for the language of the message. The method is extended to finding the lug and pin settings of the M-209 using a longer message.
KEYWORDS: Hagelin Cryptographer, CD-57, M-209, Hillclimbing, Ciphertext-Only cryptanalysis, Genetic algorithm.
INTRODUCTION
During winter 1999 - spring 2000, I took part in The Cipher Challenge) in Simon Singh's book The Code Book . Many of those who attempted this cipher challenge used Jim Gillogly's method of `Shotgun Hillclimbing' applied to cryptanalysis of classical ciphers for the solution of one or more of the ciphers. The procedure for attacking a classical cipher is described by Gillogly as follows:
2. Generate and score each "adjacent" postion.
4. Go to 2 if there was any improvement.
Manually stop the process when it looks like natural language.
The 'Shotgun' part is the random restart. Hillclimbing is a well known search technique usually going under the name Stochastic Iterated Hill Climbing (SIHC) or Multistart Hill Climbing (MHC). This has an advantage over other genetic algorithms in that the stochastic variant will cover a much larger area to find the global optimum peak which can be the only true solution to the cipher, thus avoiding becoming trapped in a local peak. One evening, after several weeks without success, the solution to a Playfair cipher appeared on the computer screen - after about 9 minutes of run time. I'm not sure now what I changed to get success but it was great to see it working. The same procedure broke the substitution stage of the ADFGVX cipher in the challenge a few days later. There the matter rested, until I started to think about Hagelin pin and lug machines. Would the procedure work for a pin wheel machine? Since a partially correct decrypt is related to the amount of correct key it seemed worth a try. Having just produced a computer simulation of the Hagelin CD-57, this seemed a suitable candidate machine. The CD-57 [2, 31 has six pin wheels chosen from a set of twelve3 of length 25, 26, 29, 31, 34, 37, 38, 41, 42, 43, 46 and 47. The CD-57 used for this investigation is configured with wheels of length 29, 31, 37, 41, 43 and 47 pins. All wheels step by one position at each cycle. Associated with each wheel is a lug which allows the alphabet disc to rotate by an integral number of places if an active pin is presented. Each of the six lugs can be set to values between 1 and 16.
PLAIN TEXT DETECTOR
An important part of the hillclimb process is the score system used as a detector of the output plain text. It was decided to limit this investigation to test messages of a realistic size, no more than 500 characters long. A randomly set start key for the pin wheels must have around half the pins correctly set, since they can only take on one of two states. However, six wheels are used to generate six parallel bit streams, so we have no more than 0.56. correct settings from our random start. This is equivalent to no more than 8 correct alphabet shifts in a 500 character message. The plain text detector is required to detect very small increases in plain text as the process runs and we shuffle pins into their correct position. The simplest detector is the Index of Coincidence. However this proved to be unsuccessful in detecting a small increase in the correct decrypt of a 500 character message. To improve on unigram counting, bigram and trigram system were considered. Bigrams are more likely to form first and this proved to be the better detector of the two. A bigram frequency table was constructed from a sample of 50,000 characters of plain English text4 - the target language for the test messages to be attacked. A measure of the output plain text is obtained by the sum of the log frequencies of the bigrams in the output. This is the Log-liklehood or Sinkov statistic [6, p. 77].
THE HILLCLIMB PROGRAM