Polyalphabetic Ciphers
Another way to improve
on the simple monoalphabetic technique is to use different monoalphabetic
substitutions as one proceeds through the plaintext message. The general name
for this approach is polyalphabetic substitution cipher. All these techniques have the following
features in common:
1. A set of related monoalphabetic substitution
rules is used.
2. A key determines which particular rule is chosen
for a given transformation.
The best known, and one
of the simplest, such algorithm is referred to as the Vigenère cipher. In this scheme, the set of related monoalphabetic substitution
rules consists of the 26 Caesar ciphers, with shifts of 0 through 25. Each
cipher is denoted by a key letter, which is the ciphertext letter that
substitutes for the plaintext letter a. Thus, a Caesar cipher with a shift of 3
is denoted by the key value d.
To aid in understanding
the scheme and to aid in its use, a matrix known as the Vigenère tableau is
constructed (Table 2.3). Each of the
26 ciphers is laid out horizontally, with the key letter for each cipher to its
left. A normal alphabet for the plaintext runs across the top. The process of
encryption is simple: Given a key letter x and a plaintext letter y, the ciphertext letter is at the
intersection of the row labeled x and
the column labeled y; in this case the ciphertext is V.
To encrypt a message, a
key is needed that is as long as the message. Usually, the key is a repeating
keyword. For example, if the keyword is deceptive, the message "we are discovered save yourself" is
encrypted as follows:
key: deceptivedeceptivedeceptive
plaintext:
wearediscoveredsaveyourself
ciphertext: ZICVTWQNGRZGVTWAVZHCQYGLMGJ
Decryption is equally
simple. The key letter again identifies the row. The position of the ciphertext
letter in that row determines the column, and the plaintext letter is at the
top of that column.
The strength of this
cipher is that there are multiple ciphertext letters for each plaintext letter,
one for each unique letter of the keyword. Thus, the letter frequency
information is obscured. However, not all knowledge of the plaintext structure
is lost. For example, Figure 2.6 shows the frequency distribution for a Vigenère
cipher with a keyword of length 9. An improvement is achieved over the Playfair
cipher, but considerable frequency information remains.
It is instructive to
sketch a method of breaking this cipher, because the method reveals some of the
mathematical principles that apply in cryptanalysis.
First, suppose that the
opponent believes that the ciphertext was encrypted using either monoalphabetic
substitution or a Vigenère cipher. A simple test can be made to make a
determination. If a monoalphabetic substitution is used, then the statistical
properties of the ciphertext should be the same as that of the language of the
plaintext. Thus, referring to Figure 2.5, there
should be one cipher letter with a relative frequency of occurrence of about
12.7%, one with about 9.06%, and so on. If only a single message is available
for analysis, we would not expect an exact match of this small sample with the
statistical profile of the plaintext language. Nevertheless, if the
correspondence is close, we can assume a monoalphabetic substitution.
If, on the other hand, a
Vigenère cipher is suspected, then progress depends on determining the length
of the keyword, as will be seen in a moment. For now, let us concentrate on how
the keyword length can be determined. The important insight that leads to a
solution is the following: If two identical sequences of plaintext letters
occur at a distance that is an integer multiple of the keyword length, they
will generate identical ciphertext sequences. In the foregoing example, two
instances of the sequence "red" are separated by nine character
positions. Consequently, in both cases, r is encrypted using key letter e, e is encrypted using key letter p, and d is encrypted using key letter t. Thus, in both cases the ciphertext sequence is
VTW.
An analyst looking at
only the ciphertext would detect the repeated sequences VTW at a displacement
of 9 and make the assumption that the keyword is either three or nine letters
in length. The appearance of VTW twice could be by chance and not reflect
identical plaintext letters encrypted with identical key letters. However, if
the message is long enough, there will be a number of such repeated ciphertext
sequences. By looking for common factors in the displacements of the various
sequences, the analyst should be able to make a good guess of the keyword
length.
Solution of the cipher
now depends on an important insight. If the keyword length is N, then the cipher, in effect, consists of N monoalphabetic substitution ciphers. For
example, with the keyword DECEPTIVE, the letters in positions 1, 10, 19, and so
on are all encrypted with the same monoalphabetic cipher. Thus, we can use the
known frequency characteristics of the plaintext language to attack each of the
monoalphabetic ciphers separately.
The periodic nature of
the keyword can be eliminated by using a nonrepeating keyword that is as long
as the message itself. Vigenère proposed what is referred to as an autokey system, in which a keyword is concatenated with the
plaintext itself to provide a running key. For our example,
key: deceptivewearediscoveredsav
plaintext:
wearediscoveredsaveyourself
ciphertext: ZICVTWQNGKZEIIGASXSTSLVVWLA
Even this scheme is
vulnerable to cryptanalysis. Because the key and the plaintext share the same
frequency distribution of letters, a statistical technique can be applied. For
example, e enciphered by e, by Figure 2.5, can be
expected to occur with a frequency of (0.127)2 0.016, whereas t enciphered by t would occur only about half as often. These regularities can be
exploited to achieve successful cryptanalysis.[8]
[8] Although the techniques
for breaking a Vigenère cipher are by no means complex, a 1917 issue of Scientific
American characterized this
system as "impossible of translation." This is a point worth
remembering when similar claims are made for modern algorithms.
The ultimate defense
against such a cryptanalysis is to choose a keyword that is as long as the
plaintext and has no statistical relationship to it. Such a system was
introduced by an AT&T engineer named Gilbert Vernam in 1918. His system
works on binary data rather than letters. The system can be expressed
succinctly as follows:
ci = pi ki
where
pi
|
= ith binary digit of
plaintext
|
ki
|
= ith binary digit of key
|
ci
|
= ith binary digit of
ciphertext
|
|
= exclusive-or (XOR) operation
|
Thus, the ciphertext is
generated by performing the bitwise XOR of the plaintext and the key. Because
of the properties of the XOR, decryption simply involves the same bitwise
operation:
pi = ci ki
The essence of this
technique is the means of construction of the key. Vernam proposed the use of a
running loop of tape that eventually repeated the key, so that in fact the
system worked with a very long but repeating keyword. Although such a scheme,
with a long key, presents formidable cryptanalytic difficulties, it can be
broken with sufficient ciphertext, the use of known or probable plaintext
sequences, or both.
An Army Signal Corp
officer, Joseph Mauborgne, proposed an improvement to the Vernam cipher that
yields the ultimate in security. Mauborgne suggested using a random key that is
as long as the message, so that the key need not be repeated. In addition, the
key is to be used to encrypt and decrypt a single message, and then is
discarded. Each new message requires a new key of the same length as the new
message. Such a scheme, known as a one-time pad, is unbreakable. It produces random output that bears no
statistical relationship to the plaintext. Because the ciphertext contains no
information whatsoever about the plaintext, there is simply no way to break the
code.
An example should
illustrate our point. Suppose that we are using a Vigenère scheme with 27
characters in which the twenty-seventh character is the space character, but
with a one-time key that is as long as the message. Thus, the tableau of Table 2.3 must be expanded to 27 x 27. Consider the
ciphertext
ANKYODKYUREPFJBYOJDSPLREYIUNOFDOIUERFPLUYTS
ciphertext: ANKYODKYUREPFJBYOJDSPLREYIUNOFDOIUERFPLUYTS
key: pxlmvmsydofuyrvzwc
tnlebnecvgdupahfzzlmnyih
plaintext: mr mustard with
the candlestick in the hall
ciphertext: ANKYODKYUREPFJBYOJDSPLREYIUNOFDOIUERFPLUYTS
key: mfugpmiydgaxgoufhklllmhsqdqogtewbqfgyovuhwt
plaintext: miss scarlet
with the knife in the library
Suppose that a
cryptanalyst had managed to find these two keys. Two plausible plaintexts are
produced. How is the cryptanalyst to decide which is the correct decryption
(i.e., which is the correct key)? If the actual key were produced in a truly
random fashion, then the cryptanalyst cannot say that one of these two keys is
more likely than the other. Thus, there is no way to decide which key is
correct and therefore which plaintext is correct.
In fact, given any plaintext
of equal length to the ciphertext, there is a key that produces that plaintext.
Therefore, if you did an exhaustive search of all possible keys, you would end
up with many legible plaintexts, with no way of knowing which was the intended
plaintext. Therefore, the code is unbreakable.
The security of the
one-time pad is entirely due to the randomness of the key. If the stream of
characters that constitute the key is truly random, then the stream of
characters that constitute the ciphertext will be truly random. Thus, there are
no patterns or regularities that a cryptanalyst can use to attack the
ciphertext.
In theory, we need look
no further for a cipher. The one-time pad offers complete security but, in
practice, has two fundamental difficulties:
1. There is the practical problem of making large
quantities of random keys. Any heavily used system might require millions of
random characters on a regular basis. Supplying truly random characters in this
volume is a significant task.
2. Even more daunting is the problem of key
distribution and protection. For every message to be sent, a key of equal
length is needed by both sender and receiver. Thus, a mammoth key distribution
problem exists.
Because of these
difficulties, the one-time pad is of limited utility, and is useful primarily
for low-bandwidth channels requiring very high security.
No comments:
Post a Comment