Substitute Bytes Transformation
Forward and Inverse Transformations
The forward substitute byte
transformation, called SubBytes, is a simple table lookup (Figure 5.4a). AES defines a 16 x 16 matrix
of byte values, called an S-box (Table
5.4a), that contains a permutation of all possible 256 8-bit values. Each
individual byte of State is mapped into a new
byte in the following way: The leftmost 4 bits of the byte are used as a row
value and the rightmost 4 bits are used as a column value. These row and column
values serve as indexes into the S-box to select a unique 8-bit output value.
For example, the hexadecimal value[4] {95} references row 9, column 5 of the S-box,
which contains the value {2A}. Accordingly, the value {95} is mapped into the
value {2A}.
Rationale
The shift row transformation is more substantial than it may
first appear. This is because the State, as
well as the cipher input and output, is treated as an array of four 4-byte
columns. Thus, on encryption, the first 4 bytes of the plaintext are copied to
the first column of State, and so on. Further,
as will be seen, the round key is applied to State column by column. Thus, a row shift moves an
individual byte from one column to another, which is a linear distance of a
multiple of 4 bytes. Also note that the transformation ensures that the 4 bytes
of one column are spread out to four different columns. Figure 5.3 illustrates the effect.
MixColumns Transformation
Forward and Inverse Transformations
The forward mix column transformation, called MixColumns,
operates on each column individually. Each byte of a column is mapped into a new
value that is a function of all four bytes in that column. The transformation
can be defined by the following matrix multiplication on State (Figure
5.5b):
Equation 5-3
Each element in the product matrix is the sum of products of
elements of one row and one column. In this case, the individual additions and
multiplications[6] are performed in GF(28). The MixColumns
transformation on a single column j(0 j 3) of State
can be expressed as
[6] We follow the convention of FIPS PUB 197 and use the symbol · to indicate multiplication over the finite field GF(28) and to indicate bitwise XOR, which corresponds to addition in GF(28).
Equation 5-4
The following is an example of MixColumns:
Let us verify the first column of this example. Recall from Section 4.6 that, in
GF(28), addition is the bitwise XOR operation and that multiplication
can be performed according to the rule established in Equation (4.10). In particular,
multiplication of a value by x (i.e., by {02})
can be implemented as a 1-bit left shift followed by a conditional bitwise XOR
with (0001 1011) if the leftmost bit of the original value (prior to the shift)
is 1. Thus, to verify the MixColumns transformation on the first column, we need
to show that
({02} · {87})
|
({03} · {6E})
|
{46}
|
{A6}
|
= {47}
|
{87}
|
({02} · {6E})
|
({03} · {46})
|
{A6}
|
= {37}
|
{87}
|
{6E}
|
({02} · {46}
|
({03} · {A6})
|
= {94}
|
({03} · {87})
|
{6E}
|
{46}
|
({02} · {A6}
|
= {ED}
|
For the first equation, we have {02} · {87} = (0000 1110) (0001 1011) = (0001 0101); and {03} · {6E} =
{6E} ({02} · {6E}) = (0110 1110) (1101 1100) = (1011 0010). Then
{02} · {87}
|
=
|
0001 0101
|
||
{03} · {6E}
|
=
|
1011 0010
|
||
{46}
|
=
|
0100 0110
|
||
{A6}
|
=
|
1010 0110
| ||
0100 0111 = {47}
|
The other equations can be similarly verified.
The inverse mix column
transformation, called InvMixColumns, is defined by the following matrix
multiplication:
Equation 5-5
It is not immediately clear that Equation (5.5) is the inverse of Equation (5.3). We need to show that:
which is equivalent to showing that:
Equation 5-6
That is, the inverse transformation matrix times the forward
transformation matrix equals the identity matrix. To verify the first column of
Equation (5.6), we need to show that:
({0E} · {02}) {0B} {0D}
({09} · {03}) = {01}
({09} · {02}) {0E} {0B}
({0D} · {03}) = {00}
({0D} · {02}) {09} {0E}
({0B} · {03}) = {00}
({0B} · {02}) {0D} {09}
({0E} · {03}) = {00}
For the first equation, we have {0E} ·
{02}) 00011100; and {09} · {03} = {09}
({09} · {02}) = 00001001 00010010 = 00011011. Then
{0E} · {02}
|
=
|
00011100
| |
{0B}
|
=
|
00001011
| |
{0D}
|
=
|
00001101
| |
{09} · {03}
|
=
|
00011011
| |
00000001
|
The other equations can be similarly verified.
The AES document describes another way of characterizing the
MixColumns transformation, which is in terms of polynomial arithmetic. In the
standard, MixColumns is defined by considering each column of State to be a four-term polynomial with coefficients
in GF(28). Each column is multiplied modulo (x4 + 1) by the fixed polynomial a(x), given by
Equation 5-7
Appendix
5A demonstrates that multiplication of each column of State by a(x) can be written as the matrix multiplication of Equation (5.3). Similarly, it can be seen
that the transformation in Equation (5.5)
corresponds to treating each column as a four-term polynomial and multiplying
each column by b(x), given by
Equation 5-8
It can readily be shown that b(x) = a1 (x) mod
(x4 + 1).
Rationale
The coefficients of the matrix in Equation (5.3) are based on a linear code with maximal
distance between code words, which ensures a good mixing among the bytes of each
column. The mix column transformation combined with the shift row transformation
ensures that after a few rounds, all output bits depend on all input bits. See
[DAEM99] for a
discussion.
In addition, the choice of coefficients in MixColumns, which
are all {01}, {02}, or {03}, was influenced by implementation considerations. As
was discussed, multiplication by these coefficients involves at most a shift and
an XOR. The coefficients in InvMixColumns are more formidable to implement.
However, encryption was deemed more important than decryption for two
reasons:
-
For the CFB and OFB cipher modes (Figures 6.5 and 6.6; described in Chapter 6), only encryption is used.
-
As with any block cipher, AES can be used to construct a message authentication code (Part Two), and for this only encryption is used.
AddRoundKey Transformation
Forward and Inverse Transformations
In the forward add round key
transformation, called AddRoundKey, the 128 bits of State are bitwise XORed with the 128 bits of the
round key. As shown in Figure 5.4b, the
operation is viewed as a columnwise operation between the 4 bytes of a State column and one word of the round key; it can
also be viewed as a byte-level operation. The following is an example of
AddRoundKey:
The first matrix is State, and
the second matrix is the round key.
The inverse add round key
transformation is identical to the forward add round key transformation,
because the XOR operation is its own inverse.
Rationale
The add round key transformation is as simple as possible and
affects every bit of State. The complexity of
the round key expansion, plus the complexity of the other stages of AES, ensure
security.
AES Key Expansion
Key Expansion Algorithm
The AES key expansion algorithm takes as input a 4-word
(16-byte) key and produces a linear array of 44 words (176 bytes). This is
sufficient to provide a 4-word round key for the initial AddRoundKey stage and
each of the 10 rounds of the cipher. The following pseudocode describes the
expansion:
The key is copied into the first four words of the expanded
key. The remainder of the expanded key is filled in four words at a time. Each
added word w[i] depends on the immediately
preceding word, w[i 1], and the word four
positions back,w[i 4]. In three out of four
cases, a simple XOR is used. For a word whose position in the w array is a multiple of 4, a more complex function
is used. Figure 5.6 illustrates the
generation of the first eight words of the expanded key, using the symbol g to
represent that complex function. The function g consists of the following
subfunctions:
-
RotWord performs a one-byte circular left shift on a word. This means that an input word [b0, b1, b2, b3] is transformed into [b1, b2, b3, b0].
-
SubWord performs a byte substitution on each byte of its input word, using the S-box (Table 5.4a).
-
The result of steps 1 and 2 is XORed with a round constant, Rcon[j].
Figure 5.6. AES Key Expansion
The round constant is a word in which the three rightmost bytes
are always 0. Thus the effect of an XOR of a word with Rcon is to only perform
an XOR on the leftmost byte of the word. The round constant is different for
each round and is defined as Rcon[j] = (RC[j], 0, 0, 0), with RC[1] = 1, RC[j] =
2 · RC[j - 1] and with multiplication defined over the field GF(28).
The values of RC[j] in hexadecimal are
j
|
1
|
2
|
3
|
4
|
5
|
6
|
7
|
8
|
9
|
10
|
RC[j]
|
01
|
02
|
04
|
08
|
10
|
20
|
40
|
80
|
1B
|
36
|
For example, suppose that the round key for round 8 is
EA D2 73 21 B5 8D BA D2 31 2B F5 60 7F 8D 29 2F
Then the first 4 bytes (first column) of the round key for
round 9 are calculated as follows:
i (decimal)
|
temp
|
After RotWord
|
After SubWord
|
Rcon (9)
|
After XOR with Rcon
|
w[i 4]
|
w[i] = temp w[i
4]
|
36
|
7F8D292F
|
8D292F7F
|
5DA515D2
|
1B000000
|
46A515D2
|
EAD27321
|
AC7766F3
|
Rationale
The Rijndael developers designed the expansion key algorithm to
be resistant to known cryptanalytic attacks. The inclusion of a round-dependent
round constant eliminates the symmetry, or similarity, between the ways in which
round keys are generated in different rounds.
The specific criteria that were used are as follows [DAEM99]:
-
Knowledge of a part of the cipher key or round key does not enable calculation of many other round key bits
-
An invertible transformation [i.e., knowledge of any Nk consecutive words of the Expanded Key enables regeneration the entire expanded key (Nk = key size in words)]
-
Speed on a wide range of processors
-
Usage of round constants to eliminate symmetries
-
Diffusion of cipher key differences into the round keys; that is, each key bit affects many round key bits
-
Enough nonlinearity to prohibit the full determination of round key differences from cipher key differences only
-
Simplicity of description
The authors do not quantify the first point on the preceding
list, but the idea is that if you know less than Nk consecutive words of either the cipher key or one of
the round keys, then it is difficult to reconstruct the remaining unknown bits.
The fewer bits one knows, the more difficult it is to do the reconstruction or
to determine other bits in the key expansion.
Equivalent Inverse Cipher
As was mentioned, the AES decryption cipher is not identical to
the encryption cipher (Figure 5.1). That
is, the sequence of transformations for decryption differs from that for
encryption, although the form of the key schedules for encryption and decryption
is the same. This has the disadvantage that two separate software or firmware
modules are needed for applications that require both encryption and decryption.
There is, however, an equivalent version of the decryption algorithm that has
the same structure as the encryption algorithm. The equivalent version has the
same sequence of transformations as the encryption algorithm (with
transformations replaced by their inverses). To achieve this equivalence, a
change in key schedule is needed.
Two separate changes are needed to bring the decryption
structure in line with the encryption structure. An encryption round has the
structure SubBytes, ShiftRows, MixColumns, AddRoundKey. The standard decryption
round has the structure InvShiftRows, InvSubBytes, AddRoundKey, InvMixColumns.
Thus, the first two stages of the decryption round need to be interchanged, and
the second two stages of the decryption round need to be interchanged.
Interchanging InvShiftRows and InvSubBytes
InvShiftRows affects the sequence of bytes in State but does not alter byte contents and does not
depend on byte contents to perform its transformation. InvSubBytes affects the
contents of bytes in State but does not alter
byte sequence and does not depend on byte sequence to perform its
transformation. Thus, these two operations commute and can be interchanged. For
a given State Si,
InvShiftRows [InvSubBytes (Si)] = InvSubBytes [InvShiftRows (Si)]
Interchanging AddRoundKey and InvMixColumns
The transformations AddRoundKey and InvMixColumns do not
alter the sequence of bytes in State. If we
view the key as a sequence of words, then both AddRoundKey and InvMixColumns
operate on State one column at a time. These
two operations are linear with respect to the column input. That is, for a given
State Si and a
given round key wj:
InvMixColumns (Si wj) =
[InvMixColumns (Si)]
[InvMixColumns (wj)]
To see this, suppose that the first column of State Si is the sequence (y0, y1, y2, y3) and the first column of the round key
wj is
(k0, k1, k2, k3). Then we need to show that
Let us demonstrate that for the first column entry. We need to
show that:
[{0E} · (y0
k0)] [{0B} · (y1
k1)] [{0D} · (y2 k2)] [{09} · (y3
k3)]
= [{0E} · y0]
[{0B} · y1] [{0D} ·
y2] [{09} · y3]
[[{0E} · k0]
] [{0B} · k1] [{0D} · k2] [{09} · k3]
This equation is valid by inspection. Thus, we can interchange
AddRoundKey and InvMixColumns, provided that we first apply InvMixColumns to the
round key. Note that we do not need to apply InvMixColumns to the round key for
the input to the first AddRoundKey transformation (preceding the first round)
nor to the last AddRoundKey transformation (in round 10). This is because these
two AddRoundKey transformations are not interchanged with InvMixColumns to
produce the equivalent decryption algorithm.
Figure 5.7 illustrates
the equivalent decryption algorithm.
Figure 5.7. Equivalent Inverse Cipher
(This item is displayed on page 158 in the print version)
Implementation Aspects
The Rijndael proposal [DAEM99] provides some suggestions for
efficient implementation on 8-bit processors, typical for current smart cards,
and on 32-bit processors, typical for PCs.
8-Bit Processor
AES can be implemented very efficiently on an 8-bit processor.
AddRoundKey is a bytewise XOR operation. ShiftRows is a simple byte shifting
operation. SubBytes operates at the byte level and only requires a table of 256
bytes.
The transformation MixColumns requires matrix multiplication in
the field GF(28), which means that all operations are carried out on
bytes. MixColumns only requires multiplication by {02} and {03}, which, as we
have seen, involved simple shifts, conditional XORs, and XORs. This can be
implemented in a more efficient way that eliminates the shifts and conditional
XORs. Equation Set (5.4) shows the
equations for the MixColumns transformation on a single column. Using the
identity {03} · x = ({02} · x) x, we can rewrite Equation Set (5.4) as follows:
Equation 5-9
Equation Set (5.9) is
verified by expanding and eliminating terms.
The multiplication
by {02} involves a shift and a conditional XOR. Such an implementation may be
vulnerable to a timing attack of the sort described in Section 3.4. To counter this attack
and to increase processing efficiency at the cost of some storage, the
multiplication can be replaced by a table lookup. Define the 256-byte table X2,
such that X2[i] = {02} · i. Then Equation Set
(5.9) can be rewritten as
Tmp = so, j s1,
j s2, j s3, j
s'0, j = s0, j Tmp X2[so, j s1,
j]
s'1, c = s1, j Tmp X2[s1, j s2,
j]
s'2, c = s2, j Tmp X2[s2, j s3,
j]
s'3, j = s3, j Tmp X2[s3, j s0,
j]
32-Bit Processor
The implementation described in the preceding subsection uses
only 8-bit operations. For a 32-bit processor, a more efficient implementation
can be achieved if operations are defined on 32-bit words. To show this, we
first define the four transformations of a round in algebraic form. Suppose we
begin with a State matrix consisting of
elements ai,j and a round key matrix
consisting of elements ki,j. Then the
transformations can be expressed as follows:
SubBytes
|
bi,j = S[ai,j]
|
ShiftRows
|
|
MixColumns
|
|
AddRoundKey
|
|
In the ShiftRows equation, the column indices are taken mod 4.
We can combine all of these expressions into a single equation:
In the second equation, we are expressing the matrix
multiplication as a linear combination of vectors. We define four 256-word
(1024-byte) tables as follows:
Thus, each table takes as input a byte value and produces a
column vector (a 32-bit word) that is a function of the S-box entry for that
byte value. These tables can be calculated in advance.
We can define a round function operating on a column in the
following fashion:
As a result, an implementation based on the preceding equation
requires only four table lookups and four XORs per column per round, plus 4
Kbytes to store the table. The developers of Rijndael believe that this compact,
efficient implementation was probably one of the most important factors in the
selection of Rijndael for AES.
No comments:
Post a Comment