Tuesday, 19 March 2013

Substitute Bytes Transformation


Substitute Bytes Transformation

Forward and Inverse Transformations
The forward substitute byte transformation, called SubBytes, is a simple table lookup (Figure 5.4a). AES defines a 16 x 16 matrix of byte values, called an S-box (Table 5.4a), that contains a permutation of all possible 256 8-bit values. Each individual byte of State is mapped into a new byte in the following way: The leftmost 4 bits of the byte are used as a row value and the rightmost 4 bits are used as a column value. These row and column values serve as indexes into the S-box to select a unique 8-bit output value. For example, the hexadecimal value[4] {95} references row 9, column 5 of the S-box, which contains the value {2A}. Accordingly, the value {95} is mapped into the value {2A}.

Rationale
The shift row transformation is more substantial than it may first appear. This is because the State, as well as the cipher input and output, is treated as an array of four 4-byte columns. Thus, on encryption, the first 4 bytes of the plaintext are copied to the first column of State, and so on. Further, as will be seen, the round key is applied to State column by column. Thus, a row shift moves an individual byte from one column to another, which is a linear distance of a multiple of 4 bytes. Also note that the transformation ensures that the 4 bytes of one column are spread out to four different columns. Figure 5.3 illustrates the effect.

[Page 151]

MixColumns Transformation

Forward and Inverse Transformations
The forward mix column transformation, called MixColumns, operates on each column individually. Each byte of a column is mapped into a new value that is a function of all four bytes in that column. The transformation can be defined by the following matrix multiplication on State (Figure 5.5b):
Equation 5-3

Each element in the product matrix is the sum of products of elements of one row and one column. In this case, the individual additions and multiplications[6] are performed in GF(28). The MixColumns transformation on a single column j(0 j 3) of State can be expressed as
[6] We follow the convention of FIPS PUB 197 and use the symbol · to indicate multiplication over the finite field GF(28) and to indicate bitwise XOR, which corresponds to addition in GF(28).
Equation 5-4

The following is an example of MixColumns:
Let us verify the first column of this example. Recall from Section 4.6 that, in GF(28), addition is the bitwise XOR operation and that multiplication can be performed according to the rule established in Equation (4.10). In particular, multiplication of a value by x (i.e., by {02}) can be implemented as a 1-bit left shift followed by a conditional bitwise XOR with (0001 1011) if the leftmost bit of the original value (prior to the shift) is 1. Thus, to verify the MixColumns transformation on the first column, we need to show that
({02} · {87})
({03} · {6E})
{46}
{A6}
= {47}
{87}
({02} · {6E})
({03} · {46})
{A6}
= {37}
{87}
{6E}
({02} · {46}
({03} · {A6})
= {94}
({03} · {87})
{6E}
{46}
({02} · {A6}
= {ED}



[Page 152]
For the first equation, we have {02} · {87} = (0000 1110) (0001 1011) = (0001 0101); and {03} · {6E} = {6E} ({02} · {6E}) = (0110 1110) (1101 1100) = (1011 0010). Then
 
{02} · {87}
=
0001 0101
 
 
{03} · {6E}
=
1011 0010
 
 
{46}
=
0100 0110
 
 
{A6}
=
1010 0110
     
0100 0111 = {47}


The other equations can be similarly verified.
The inverse mix column transformation, called InvMixColumns, is defined by the following matrix multiplication:

Equation 5-5

It is not immediately clear that Equation (5.5) is the inverse of Equation (5.3). We need to show that:

which is equivalent to showing that:
Equation 5-6

That is, the inverse transformation matrix times the forward transformation matrix equals the identity matrix. To verify the first column of Equation (5.6), we need to show that:
({0E} · {02}) {0B} {0D} ({09} · {03}) = {01}
({09} · {02}) {0E} {0B} ({0D} · {03}) = {00}
({0D} · {02}) {09} {0E} ({0B} · {03}) = {00}
({0B} · {02}) {0D} {09} ({0E} · {03}) = {00}

[Page 153]
For the first equation, we have {0E} · {02}) 00011100; and {09} · {03} = {09} ({09} · {02}) = 00001001 00010010 = 00011011. Then
 
{0E} · {02}
=
00011100
 
{0B}
=
00001011
 
{0D}
=
00001101
 
{09} · {03}
=
00011011
     
00000001


The other equations can be similarly verified.
The AES document describes another way of characterizing the MixColumns transformation, which is in terms of polynomial arithmetic. In the standard, MixColumns is defined by considering each column of State to be a four-term polynomial with coefficients in GF(28). Each column is multiplied modulo (x4 + 1) by the fixed polynomial a(x), given by
Equation 5-7

Appendix 5A demonstrates that multiplication of each column of State by a(x) can be written as the matrix multiplication of Equation (5.3). Similarly, it can be seen that the transformation in Equation (5.5) corresponds to treating each column as a four-term polynomial and multiplying each column by b(x), given by
Equation 5-8

It can readily be shown that b(x) = a1 (x) mod (x4 + 1).
Rationale
The coefficients of the matrix in Equation (5.3) are based on a linear code with maximal distance between code words, which ensures a good mixing among the bytes of each column. The mix column transformation combined with the shift row transformation ensures that after a few rounds, all output bits depend on all input bits. See [DAEM99] for a discussion.
In addition, the choice of coefficients in MixColumns, which are all {01}, {02}, or {03}, was influenced by implementation considerations. As was discussed, multiplication by these coefficients involves at most a shift and an XOR. The coefficients in InvMixColumns are more formidable to implement. However, encryption was deemed more important than decryption for two reasons:
  1. For the CFB and OFB cipher modes (Figures 6.5 and 6.6; described in Chapter 6), only encryption is used.
  2. As with any block cipher, AES can be used to construct a message authentication code (Part Two), and for this only encryption is used.

AddRoundKey Transformation

Forward and Inverse Transformations
In the forward add round key transformation, called AddRoundKey, the 128 bits of State are bitwise XORed with the 128 bits of the round key. As shown in Figure 5.4b, the operation is viewed as a columnwise operation between the 4 bytes of a State column and one word of the round key; it can also be viewed as a byte-level operation. The following is an example of AddRoundKey:

[Page 154]
The first matrix is State, and the second matrix is the round key.
The inverse add round key transformation is identical to the forward add round key transformation, because the XOR operation is its own inverse.
Rationale
The add round key transformation is as simple as possible and affects every bit of State. The complexity of the round key expansion, plus the complexity of the other stages of AES, ensure security.

AES Key Expansion

Key Expansion Algorithm
The AES key expansion algorithm takes as input a 4-word (16-byte) key and produces a linear array of 44 words (176 bytes). This is sufficient to provide a 4-word round key for the initial AddRoundKey stage and each of the 10 rounds of the cipher. The following pseudocode describes the expansion:

KeyExpansion (byte key[16], word w[44])
{
    word temp
    for (i = 0; i < 4; i++) w[i] = (key[4*i],
 key[4*i+1],
                                   key[4*i+2],
                                   key[4*i+3]);
    for (i = 4; i < 44; i++)
    {
     temp = w[i  1];
     if (i mod 4 = 0) temp = SubWord (RotWord (temp))
                             Rcon[i/4];
     w[i] = w[i4]  temp
    }
}



The key is copied into the first four words of the expanded key. The remainder of the expanded key is filled in four words at a time. Each added word w[i] depends on the immediately preceding word, w[i 1], and the word four positions back,w[i 4]. In three out of four cases, a simple XOR is used. For a word whose position in the w array is a multiple of 4, a more complex function is used. Figure 5.6 illustrates the generation of the first eight words of the expanded key, using the symbol g to represent that complex function. The function g consists of the following subfunctions:

    [Page 155]
  1. RotWord performs a one-byte circular left shift on a word. This means that an input word [b0, b1, b2, b3] is transformed into [b1, b2, b3, b0].
  2. SubWord performs a byte substitution on each byte of its input word, using the S-box (Table 5.4a).
  3. The result of steps 1 and 2 is XORed with a round constant, Rcon[j].
Figure 5.6. AES Key Expansion


The round constant is a word in which the three rightmost bytes are always 0. Thus the effect of an XOR of a word with Rcon is to only perform an XOR on the leftmost byte of the word. The round constant is different for each round and is defined as Rcon[j] = (RC[j], 0, 0, 0), with RC[1] = 1, RC[j] = 2 · RC[j - 1] and with multiplication defined over the field GF(28). The values of RC[j] in hexadecimal are
j
1
2
3
4
5
6
7
8
9
10
RC[j]
01
02
04
08
10
20
40
80
1B
36


For example, suppose that the round key for round 8 is
EA D2 73 21 B5 8D BA D2 31 2B F5 60 7F 8D 29 2F
Then the first 4 bytes (first column) of the round key for round 9 are calculated as follows:
i (decimal)
temp
After RotWord
After SubWord
Rcon (9)
After XOR with Rcon
w[i 4]
w[i] = temp w[i 4]
36
7F8D292F
8D292F7F
5DA515D2
1B000000
46A515D2
EAD27321
AC7766F3


Rationale
The Rijndael developers designed the expansion key algorithm to be resistant to known cryptanalytic attacks. The inclusion of a round-dependent round constant eliminates the symmetry, or similarity, between the ways in which round keys are generated in different rounds. The specific criteria that were used are as follows [DAEM99]:

[Page 156]
  • Knowledge of a part of the cipher key or round key does not enable calculation of many other round key bits
  • An invertible transformation [i.e., knowledge of any Nk consecutive words of the Expanded Key enables regeneration the entire expanded key (Nk = key size in words)]
  • Speed on a wide range of processors
  • Usage of round constants to eliminate symmetries
  • Diffusion of cipher key differences into the round keys; that is, each key bit affects many round key bits
  • Enough nonlinearity to prohibit the full determination of round key differences from cipher key differences only
  • Simplicity of description
The authors do not quantify the first point on the preceding list, but the idea is that if you know less than Nk consecutive words of either the cipher key or one of the round keys, then it is difficult to reconstruct the remaining unknown bits. The fewer bits one knows, the more difficult it is to do the reconstruction or to determine other bits in the key expansion.

Equivalent Inverse Cipher

As was mentioned, the AES decryption cipher is not identical to the encryption cipher (Figure 5.1). That is, the sequence of transformations for decryption differs from that for encryption, although the form of the key schedules for encryption and decryption is the same. This has the disadvantage that two separate software or firmware modules are needed for applications that require both encryption and decryption. There is, however, an equivalent version of the decryption algorithm that has the same structure as the encryption algorithm. The equivalent version has the same sequence of transformations as the encryption algorithm (with transformations replaced by their inverses). To achieve this equivalence, a change in key schedule is needed.
Two separate changes are needed to bring the decryption structure in line with the encryption structure. An encryption round has the structure SubBytes, ShiftRows, MixColumns, AddRoundKey. The standard decryption round has the structure InvShiftRows, InvSubBytes, AddRoundKey, InvMixColumns. Thus, the first two stages of the decryption round need to be interchanged, and the second two stages of the decryption round need to be interchanged.
Interchanging InvShiftRows and InvSubBytes
InvShiftRows affects the sequence of bytes in State but does not alter byte contents and does not depend on byte contents to perform its transformation. InvSubBytes affects the contents of bytes in State but does not alter byte sequence and does not depend on byte sequence to perform its transformation. Thus, these two operations commute and can be interchanged. For a given State Si,
InvShiftRows [InvSubBytes (Si)] = InvSubBytes [InvShiftRows (Si)]

[Page 157]
Interchanging AddRoundKey and InvMixColumns
The transformations AddRoundKey and InvMixColumns do not alter the sequence of bytes in State. If we view the key as a sequence of words, then both AddRoundKey and InvMixColumns operate on State one column at a time. These two operations are linear with respect to the column input. That is, for a given State Si and a given round key wj:
InvMixColumns (Si wj) = [InvMixColumns (Si)] [InvMixColumns (wj)]
To see this, suppose that the first column of State Si is the sequence (y0, y1, y2, y3) and the first column of the round key wj is (k0, k1, k2, k3). Then we need to show that

Let us demonstrate that for the first column entry. We need to show that:
[{0E} · (y0 k0)] [{0B} · (y1 k1)] [{0D} · (y2 k2)] [{09} · (y3 k3)]
= [{0E} · y0] [{0B} · y1] [{0D} · y2] [{09} · y3]
[[{0E} · k0] ] [{0B} · k1] [{0D} · k2] [{09} · k3]
This equation is valid by inspection. Thus, we can interchange AddRoundKey and InvMixColumns, provided that we first apply InvMixColumns to the round key. Note that we do not need to apply InvMixColumns to the round key for the input to the first AddRoundKey transformation (preceding the first round) nor to the last AddRoundKey transformation (in round 10). This is because these two AddRoundKey transformations are not interchanged with InvMixColumns to produce the equivalent decryption algorithm.
Figure 5.7 illustrates the equivalent decryption algorithm.
Figure 5.7. Equivalent Inverse Cipher
(This item is displayed on page 158 in the print version)


Implementation Aspects

The Rijndael proposal [DAEM99] provides some suggestions for efficient implementation on 8-bit processors, typical for current smart cards, and on 32-bit processors, typical for PCs.
8-Bit Processor
AES can be implemented very efficiently on an 8-bit processor. AddRoundKey is a bytewise XOR operation. ShiftRows is a simple byte shifting operation. SubBytes operates at the byte level and only requires a table of 256 bytes.
The transformation MixColumns requires matrix multiplication in the field GF(28), which means that all operations are carried out on bytes. MixColumns only requires multiplication by {02} and {03}, which, as we have seen, involved simple shifts, conditional XORs, and XORs. This can be implemented in a more efficient way that eliminates the shifts and conditional XORs. Equation Set (5.4) shows the equations for the MixColumns transformation on a single column. Using the identity {03} · x = ({02} · x) x, we can rewrite Equation Set (5.4) as follows:

[Page 158]
Equation 5-9

Equation Set (5.9) is verified by expanding and eliminating terms.

[Page 159]
The multiplication by {02} involves a shift and a conditional XOR. Such an implementation may be vulnerable to a timing attack of the sort described in Section 3.4. To counter this attack and to increase processing efficiency at the cost of some storage, the multiplication can be replaced by a table lookup. Define the 256-byte table X2, such that X2[i] = {02} · i. Then Equation Set (5.9) can be rewritten as
Tmp = so, j s1, j s2, j s3, j
s'0, j = s0, j Tmp X2[so, j s1, j]
s'1, c = s1, j Tmp X2[s1, j s2, j]
s'2, c = s2, j Tmp X2[s2, j s3, j]
s'3, j = s3, j Tmp X2[s3, j s0, j]
32-Bit Processor
The implementation described in the preceding subsection uses only 8-bit operations. For a 32-bit processor, a more efficient implementation can be achieved if operations are defined on 32-bit words. To show this, we first define the four transformations of a round in algebraic form. Suppose we begin with a State matrix consisting of elements ai,j and a round key matrix consisting of elements ki,j. Then the transformations can be expressed as follows:
SubBytes
bi,j = S[ai,j]
ShiftRows
MixColumns
AddRoundKey


In the ShiftRows equation, the column indices are taken mod 4. We can combine all of these expressions into a single equation:

In the second equation, we are expressing the matrix multiplication as a linear combination of vectors. We define four 256-word (1024-byte) tables as follows:

Thus, each table takes as input a byte value and produces a column vector (a 32-bit word) that is a function of the S-box entry for that byte value. These tables can be calculated in advance.
We can define a round function operating on a column in the following fashion:

As a result, an implementation based on the preceding equation requires only four table lookups and four XORs per column per round, plus 4 Kbytes to store the table. The developers of Rijndael believe that this compact, efficient implementation was probably one of the most important factors in the selection of Rijndael for AES.


No comments:

Post a Comment