Monoalphabetic Ciphers
With only 25 possible keys, the Caesar
cipher is far from secure. A dramatic increase in the key space can be achieved
by allowing an arbitrary substitution. Recall the assignment for the Caesar
cipher:
plain: a b c d e f g h i j k l m n o p q r s t u v w x y z
cipher: D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
If, instead, the "cipher" line
can be any permutation of the 26 alphabetic characters, then there are 26! or
greater than 4 x 1026 possible
keys. This is 10 orders of magnitude greater than the key space for DES and
would seem to eliminate brute-force techniques for cryptanalysis. Such an
approach is referred to as a monoalphabetic substitution cipher, because a single
cipher alphabet (mapping from plain alphabet to cipher alphabet) is used per
message.
There is, however, another line of attack.
If the cryptanalyst knows the nature of the plaintext (e.g., noncompressed
English text), then the analyst can exploit the regularities of the language.
To see how such a cryptanalysis
might proceed, we give a partial example here that is adapted from one in [SINK66].
The ciphertext to be solved is
UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ
VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX
EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ
As a first step, the relative frequency of
the letters can be determined and compared to a standard frequency distribution
for English, such as is shown in Figure 2.5 (based on [LEWA00]).
If the message were long enough, this technique alone might be sufficient, but
because this is a relatively short message, we cannot expect an exact match. In
any case, the relative frequencies of the letters in the ciphertext (in
percentages) are as follows:
P 13.33
|
H 5.83
|
F 3.33
|
B 1.67
|
C 0.00
|
Z 11.67
|
D 5.00
|
W 3.33
|
G 1.67
|
K 0.00
|
S 8.33
|
E 5.00
|
Q 2.50
|
Y 1.67
|
L 0.00
|
U 8.33
|
V 4.17
|
T 2.50
|
I 0.83
|
N 0.00
|
O 7.50
|
X 4.17
|
A 1.67
|
J 0.83
|
R 0.00
|
M 6.67
|
|
|
|
|
Comparing this breakdown with, it seems
likely that cipher letters P and Z are the equivalents of plain letters e and
t, but it is not certain which is which. The letters S, U, O, M, and H are all
of relatively high frequency and probably correspond to plain letters from the
set {a, h, i, n, o, r, s}.The letters with the lowest frequencies (namely, A,
B, G, Y, I, J) are likely included in the set {b, j, k, q, v, x, z}.
There are a number of ways to proceed at
this point. We could make some tentative assignments and start to fill in the
plaintext to see if it looks like a reasonable "skeleton" of a
message. A more systematic approach is
to look for other regularities. For example, certain words may be known to be
in the text. Or we could look for repeating sequences of cipher letters and try
to deduce their plaintext equivalents.
A powerful tool is to look at the
frequency of two-letter combinations, known as digrams. A table similar
to could be drawn up showing the relative frequency of digrams. The
most common such digram is th. In our ciphertext, the most common digram is ZW,
which appears three times. So we make the correspondence of Z with t and W with
h. Then, by our earlier hypothesis, we can equate P with e. Now notice that the
sequence ZWP appears in the ciphertext, and we can translate that sequence as
"the." This is the most frequent trigram (three-letter combination)
in English, which seems to indicate that we are on the right track.
Next, notice the sequence ZWSZ in the
first line. We do not know that these four letters form a complete word, but if
they do, it is of the form th_t. If so, S equates with a.
So far, then, we have
UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ
t a e e te a that e e a a
VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX
e t ta t ha e ee a e th t a
EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ
e e e tat e the t
Only four letters have been identified,
but already we have quite a bit of the message. Continued analysis of
frequencies plus trial and error should easily yield a solution from this
point. The complete plaintext, with spaces added between words, follows:
it was disclosed yesterday that several informal but
direct contacts have been made with political
representatives of the viet cong in moscow
Monoalphabetic
ciphers are easy to break because they reflect the frequency data of the
original alphabet. A countermeasure is to provide multiple substitutes, known
as homophones, for a single letter. For example, the letter e could be assigned
a number of different cipher symbols, such as 16, 74, 35, and 21, with each
homophone used in rotation, or randomly. If the number of symbols assigned to
each letter is proportional to the relative frequency of that letter, then
single-letter frequency information is completely obliterated. The great
mathematician Carl Friedrich Gauss believed that he had devised an unbreakable
cipher using homophones. However, even with homophones, each element of
plaintext affects only one element of ciphertext, and multiple-letter patterns
(e.g., digram frequencies) still survive in the ciphertext, making
cryptanalysis relatively straightforward.
Two principal methods are used in
substitution ciphers to lessen the extent to which the structure of the
plaintext survives in the ciphertext: One approach is to encrypt multiple
letters of plaintext, and the other is to use multiple cipher alphabets. We
briefly examine each.
Playfair Cipher
The best-known multiple-letter encryption
cipher is the Playfair, which treats digrams in the plaintext as single units
and translates these units into ciphertext digrams.[3]
[3] This cipher was actually invented by
British scientist Sir Charles Wheatstone in 1854, but it bears the name of his
friend Baron Playfair of St. Andrews, who championed the cipher at the British
foreign office.
No comments:
Post a Comment