Implementation of AES and its optimization using AVR assembly language

Introduction

With the development of symmetric encryption, the DES data encryption standard is no longer suitable for the security requirements of today's distributed open networks because of its short key length (56 bits). Therefore, in 1997, NIST publicly solicited a new data encryption standard, namely AES[1]. After three rounds of screening, the Rijndael algorithm submitted by Joan Daemen and Vincent Rijmen of Belgium was selected as the final AES algorithm. This algorithm is the new data encryption standard in the United States and is widely used in various fields. Although opinions on AES still differ, in general AES as a new generation of data encryption standard has the advantages of strong security, high performance, high efficiency, ease of use and flexibility. AES is designed with three key lengths: 128, 192, and 256 bits. The key space of a 128-bit AES key is on the order of 10^21 times larger than that of a 56-bit DES key[2]. The AES algorithm mainly involves three aspects: the round transformation, the number of rounds, and key expansion. This article takes the 128-bit key as an example to introduce the basic principles of the algorithm and, using AVR assembly language, implements the advanced data encryption algorithm AES.

1 AES encryption and decryption algorithm principle and AVR implementation

AES is a block cipher. In the case considered here, the algorithm takes a 128-bit data block as input, and the key length is also 128 bits. Nr denotes the number of encryption rounds applied to a data block (the relationship between the number of rounds and the key length is listed in Table 1). Each round requires an expanded key Expandedkey(i) of the same length as the input block. Since the length of the externally supplied encryption key K is limited, the algorithm uses a key expansion routine (Keyexpansion) to expand K into a longer bit string from which the encryption and decryption keys for each round are taken.

1.1 Round Transformation
Each round transformation of AES consists of the following three layers:
nonlinear layer - SubByte transformation;
linear mixing layer - ShiftRow and MixColumn operations;
key addition layer - AddRoundKey operation.

① The SubByte transformation is a nonlinear byte substitution applied to each byte of the state. It can be implemented as a lookup in a precomputed S-box.
Schange:
ldi zh,$01;Point Z at the first address of the S-box
mov zl,r2;Use the byte to be looked up as the low byte of the pointer
ld temp,z+;Fetch the corresponding S-box entry
mov r2,temp;Write it back to complete the table lookup
.
.
.
ret
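
The article assumes the S-box is already present in memory. As a hedged illustration of how such a table can be precomputed offline, the following Python sketch builds it from the standard AES definition (multiplicative inverse in GF(2^8) followed by the affine transformation). It is a host-side reference model, not part of the AVR program, and the names `gf_mul`, `gf_inv`, `rotl8` and `build_sbox` are chosen here for illustration.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        high_bit = a & 0x80
        a = (a << 1) & 0xFF
        if high_bit:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return product


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 equals 1
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """Return the 256-entry AES S-box as a list of integers."""
    sbox = []
    for a in range(256):
        b = gf_inv(a)
        sbox.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
```

The resulting 256 bytes could then be placed at the address the assembly routine expects ($0100 in the listing above).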

② ShiftRow is a byte permutation. It cyclically shifts the rows of the state by different offsets, and the offsets depend on Nb [3].
Shiftrow:;This is a byte-swapping subroutine for the 4×4 state
;State before: State after:
;r2 r6 r10 r14 -> r2 r6 r10 r14
;r3 r7 r11 r15 -> r7 r11 r15 r3
;r4 r8 r12 r17 -> r12 r17 r4 r8
;r5 r9 r13 r18 -> r18 r5 r9 r13
mov temp,r3;Second row: rotate left by one byte
mov r3,r7
mov r7,r11
mov r11,r15
mov r15,temp
mov temp,r4;Third row: rotate left by two bytes
mov temp1,r8
mov r4,r12
mov r8,r17
mov r12,temp
mov r17,temp1
mov temp,r18;Fourth row: rotate left by three bytes
mov r18,r13
mov r13,r9
mov r9,r5
mov r5,temp
ret
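
The register shuffle above can be cross-checked against a small host-side model. The Python sketch below assumes the state is stored column by column in the sixteen registers named in the listing; `REGS` and `shift_rows` are illustrative names, not part of the original program.

```python
# Registers holding the state, column by column, as in the listing above.
REGS = ["r2", "r3", "r4", "r5",
        "r6", "r7", "r8", "r9",
        "r10", "r11", "r12", "r13",
        "r14", "r15", "r17", "r18"]


def shift_rows(state):
    """Rotate row r of a column-major 4x4 state left by r positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


# After the shift, each register holds the old contents of the register
# named in the right-hand column of this mapping.
mapping = dict(zip(REGS, shift_rows(REGS)))
```

For example, `mapping["r3"]` is `"r7"` and `mapping["r5"]` is `"r18"`, which matches the first and last `mov` chains of the subroutine.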

③ In the MixColumn transformation, each column of the state is regarded as a polynomial a(x) with coefficients in GF(2^8) and is multiplied by a fixed polynomial c(x). The product b(x)=c(x)*a(x) is not an ordinary multiplication but a special operation, namely
b(x)=c(x)·a(x) (mod x^4+1)
For this operation,
b0=02·a0+03·a1+a2+a3
Let
xtime(a0)=02·a0
where the symbol "·" denotes multiplication modulo an irreducible polynomial of degree 8, and "+" denotes XOR [3].
mov temp,a0;This is the MixColumn subroutine
rcall xtime;Call xtime: temp=02·a0
mov a0,temp
mov temp,a1
rcall xtime;temp=02·a1
eor a0,a1
eor a0,temp;a0=02·a0+03·a1
eor a0,a2
eor a0,a3;Complete the calculation of b0
.
.
.
xtime:;Subroutine: multiply temp by 02 in GF(2^8)
ldi temp1,$1b
lsl temp
brcs next1;If the highest bit was 1, branch to the reduction
next: ret;Otherwise, nothing more to do
next1:eor temp,temp1
rjmp next
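
As a hedged reference for checking the assembly, the same computation can be modelled on a host machine. The Python sketch below mirrors `xtime` (shift left, then XOR with $1b when a carry occurs) and writes out all four output bytes of one column; `mix_column` is an illustrative name, and the listing above only shows the first byte, b0.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8), as the xtime subroutine does."""
    a <<= 1
    if a & 0x100:      # carry out of bit 7, like brcs after lsl
        a ^= 0x11B     # clears the carry bit and XORs in $1b
    return a & 0xFF


def mix_column(col):
    """Apply MixColumn to one 4-byte column [a0, a1, a2, a3]."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,   # 02*a0 + 03*a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,   # a0 + 02*a1 + 03*a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),   # a0 + a1 + 02*a2 + 03*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),   # 03*a0 + a1 + a2 + 02*a3
    ]
```

Note that every output byte depends on the original a0..a3, so an in-place register implementation has to keep copies of the inputs until all four outputs are computed.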

For the inverse transformation, the matrix C is replaced by the corresponding matrix D, that is, b(x)=d(x)*a(x).

④ The key addition operation (AddRoundKey) performs a bitwise XOR of each byte of the state with the corresponding byte of the round key.
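
A minimal Python sketch of this step, using the plaintext and key from the test vector in Section 3 (for the first key addition, the round key is the external key itself). The function name `add_round_key` is illustrative.

```python
def add_round_key(state, round_key):
    """XOR each state byte with the corresponding round-key byte."""
    return bytes(s ^ k for s, k in zip(state, round_key))


plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")

# State after the initial key addition, before the first round.
state = add_round_key(plaintext, key)
```

Because XOR is its own inverse, applying the same round key a second time restores the previous state, which is why decryption can reuse this routine unchanged.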

⑤ According to the properties of linear transformations [1], the decryption operation is the inverse of the encryption transformation. It is not described in detail here.

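1.2 Number of Rounds

For different block and key lengths, the number of round transformations differs, as listed in Table 1. For the 128-bit block used by AES, Nr is 10, 12, or 14 for key lengths of 128, 192, or 256 bits respectively.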
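
The relationship can be summarized with the Rijndael rule Nr = max(Nk, Nb) + 6, where Nk and Nb are the key and block lengths in 32-bit words. The Python sketch below is a hedged illustration; the function name and parameters are chosen here and do not come from the article.

```python
def number_of_rounds(key_bits, block_bits=128):
    """Return Nr for the given key and block lengths (in bits)."""
    nk = key_bits // 32    # key length in 32-bit words
    nb = block_bits // 32  # block length in 32-bit words
    return max(nk, nb) + 6


rounds = {bits: number_of_rounds(bits) for bits in (128, 192, 256)}
```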

1.3 Key Expansion
The AES algorithm uses the externally supplied key K (Nk words long) to obtain an extended key of 4(Nr+1) words in total through the key expansion procedure. It involves the following three modules:

① Position transformation (rotword) - change a 4-byte sequence [A, B, C, D] into [B, C, D, A];

② S-box transformation (subword) - perform S-box substitution on each byte of a 4-byte word;

③ Transformation Rcon[i] - Rcon[i] represents a 32-bit word [x^(i-1), 00, 00, 00]. Here x is (02), for example
Rcon[1]=[01000000]; Rcon[2]=[02000000]; Rcon[3]=[04000000]...

Generation of the extended key: the first Nk words of the extended key are the external key K. Each subsequent word W[i] is equal to the XOR of the previous word W[i-1] and the word Nk positions earlier, W[i-Nk], that is, W[i]=W[i-1]⊕W[i-Nk]. However, if i is a multiple of Nk, then W[i]=W[i-Nk]⊕Subword(Rotword(W[i-1]))⊕Rcon[i/Nk].

When the program is executed, the above subroutines are called in turn. The specific implementation is as follows:
Keyexpansion:
rcall rotword
rcall subword
rcall Rcon
.
.
.
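
The recurrence described above can be sketched on a host machine as follows. This Python model covers only the 128-bit case (Nk=4, Nr=10) treated in this article; the S-box is regenerated from its definition so that the block is self-contained, and all function names are illustrative rather than taken from the AVR program.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8)."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by repeated xtime."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def build_sbox():
    """Build the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for a in range(256):
        inv = 0
        if a:
            inv = 1
            for _ in range(254):  # a^254 is the inverse of a
                inv = gf_mul(inv, a)
        s = inv
        for n in range(1, 5):  # XOR in the four left rotations of inv
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def rot_word(word):
    """[A, B, C, D] -> [B, C, D, A]."""
    return word[1:] + word[:1]


def sub_word(word):
    """Apply the S-box to each byte of a 4-byte word."""
    return [SBOX[b] for b in word]


def expand_key_128(key):
    """Expand a 16-byte key into 44 words W[0..43] (lists of 4 bytes)."""
    nk, nr = 4, 10
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01  # first byte of Rcon[1]
    for i in range(nk, 4 * (nr + 1)):
        temp = words[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp))
            temp[0] ^= rcon      # XOR with Rcon[i/Nk] = [rcon, 00, 00, 00]
            rcon = xtime(rcon)   # next power of 02
        words.append([a ^ b for a, b in zip(words[i - nk], temp)])
    return words


words = expand_key_128(bytes.fromhex("000102030405060708090a0b0c0d0e0f"))
```

Words W[4] to W[7] form the round key for round 1, W[8] to W[11] the key for round 2, and so on.
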
The encryption and decryption process of AES is shown in Figure 1.

Figure 1 AES encryption and decryption process

2 Optimization of AES encryption and decryption algorithm

From the above process, it can be seen that the most time-consuming part of the program is the round transformation, so the optimization effort is concentrated there. Within the round transformation, the part that can be optimized is the column transformation, because it is a modular multiplication in GF(2^8). Since the AES encryption and decryption procedures are not symmetric, without optimization the decryption speed of the algorithm will be much slower than the encryption speed [1].

① Encryption operation. The column transformation (Mixcolumn) can be optimized by calling the xtime subroutine. The specific algorithm [1] is implemented as follows:
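
The original listing for this step is not reproduced here. One widely used xtime-based formulation, which may or may not be the exact one given in [1], needs only four xtime calls per column. The Python sketch below models it on a host machine; `mix_column_fast` is an illustrative name.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8)."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def mix_column_fast(col):
    """MixColumn on one column using four xtime calls and XORs only."""
    a0, a1, a2, a3 = col
    t = a0 ^ a1 ^ a2 ^ a3          # sum of the whole column
    return [
        a0 ^ t ^ xtime(a0 ^ a1),   # = 02*a0 + 03*a1 + a2 + a3
        a1 ^ t ^ xtime(a1 ^ a2),   # = a0 + 02*a1 + 03*a2 + a3
        a2 ^ t ^ xtime(a2 ^ a3),   # = a0 + a1 + 02*a2 + 03*a3
        a3 ^ t ^ xtime(a3 ^ a0),   # = 03*a0 + a1 + a2 + 02*a3
    ]
```

The saving comes from the identity 02·a0+03·a1 = 02·(a0+a1)+a1, so each output byte needs one xtime instead of two.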

Another effective optimization method is to construct a column transformation table offline, so that the encryption speed is improved by simple table lookups.
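
A hedged sketch of the table approach for encryption: two 256-entry tables hold 02·a and 03·a for every byte a, so MixColumn needs only lookups and XORs. The table names `MUL2` and `MUL3` are chosen for this illustration; on the AVR the tables would occupy 512 bytes of memory.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8)."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


# Tables built once, offline.
MUL2 = [xtime(a) for a in range(256)]        # 02 * a
MUL3 = [MUL2[a] ^ a for a in range(256)]     # 03 * a = 02 * a + a


def mix_column_table(col):
    """MixColumn on one column using table lookups only."""
    a0, a1, a2, a3 = col
    return [
        MUL2[a0] ^ MUL3[a1] ^ a2 ^ a3,
        a0 ^ MUL2[a1] ^ MUL3[a2] ^ a3,
        a0 ^ a1 ^ MUL2[a2] ^ MUL3[a3],
        MUL3[a0] ^ a1 ^ a2 ^ MUL2[a3],
    ]
```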

② Optimization of the decryption algorithm. The coefficients of the decryption column transformation are 09, 0E, 0B and 0D. Implementing these multiplications directly on the AVR microcontroller takes considerable time, which reduces decryption performance.

Optimization method 1: decompose the column transformation to reduce the number of multiplications.

A careful study of the coefficients of the decryption matrix shows that the decryption matrix and the encryption matrix have a certain connection, that is, the decryption matrix is equal to the multiplication of the encryption matrix and a matrix. Through such a connection, the algorithm can be optimized:

In this way, the column transformation needs only a few simple XOR operations, which reduces the number of multiplications and increases the decryption speed.
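
The original equation for this decomposition is not reproduced here. One known relation, which may be the one intended, is d(x) = c(x)·(04·x^2 + 05) mod x^4+1: the decryption column transformation equals a cheap preprocessing step followed by the encryption MixColumn. The Python sketch below is a host-side model under that assumption; the function names are illustrative.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8)."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def mix_column(col):
    """Encryption MixColumn on one column (four xtime calls)."""
    a0, a1, a2, a3 = col
    t = a0 ^ a1 ^ a2 ^ a3
    return [
        a0 ^ t ^ xtime(a0 ^ a1),
        a1 ^ t ^ xtime(a1 ^ a2),
        a2 ^ t ^ xtime(a2 ^ a3),
        a3 ^ t ^ xtime(a3 ^ a0),
    ]


def inv_mix_column_decomposed(col):
    """Inverse MixColumn as preprocessing plus the encryption MixColumn."""
    a0, a1, a2, a3 = col
    u = xtime(xtime(a0 ^ a2))   # 04 * (a0 + a2)
    v = xtime(xtime(a1 ^ a3))   # 04 * (a1 + a3)
    # Multiplying by 04*x^2 + 05 gives 05*a0 + 04*a2 = a0 + u, and so on.
    return mix_column([a0 ^ u, a1 ^ v, a2 ^ u, a3 ^ v])
```

Under this decomposition a decryption column costs eight xtime calls in total (four for preprocessing, four inside MixColumn), with no general multiplication by 09, 0B, 0D or 0E.
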
Optimization method 2: construct a table.

As with the encryption table construction, four tables can be built: T[ea]=e×a; T[9a]=9×a; T[da]=d×a; T[ba]=b×a. In this way, only table lookups and simple XOR operations are needed to complete the decryption column transformation. Although this method requires additional storage, it is an effective method.
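
A hedged Python sketch of the four-table approach. The tables hold the products of every byte with 09, 0B, 0D and 0E, so the decryption column transformation reduces to lookups and XORs. Table and function names are illustrative; four 256-byte tables would occupy 1024 bytes on the microcontroller.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        high_bit = a & 0x80
        a = (a << 1) & 0xFF
        if high_bit:
            a ^= 0x1B
        b >>= 1
    return product


# Tables built once, offline.
T9 = [gf_mul(a, 0x09) for a in range(256)]
TB = [gf_mul(a, 0x0B) for a in range(256)]
TD = [gf_mul(a, 0x0D) for a in range(256)]
TE = [gf_mul(a, 0x0E) for a in range(256)]


def inv_mix_column_table(col):
    """Inverse MixColumn on one column using table lookups only."""
    a0, a1, a2, a3 = col
    return [
        TE[a0] ^ TB[a1] ^ TD[a2] ^ T9[a3],
        T9[a0] ^ TE[a1] ^ TB[a2] ^ TD[a3],
        TD[a0] ^ T9[a1] ^ TE[a2] ^ TB[a3],
        TB[a0] ^ TD[a1] ^ T9[a2] ^ TE[a3],
    ]
```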

3 Experimental simulation of AES encryption and decryption

According to the above experimental steps and optimization methods, the experimental results listed in Tables 2 and 3 are obtained.

Assume that the master key is: 000102030405060708090a0b0c0d0e0f (128bit).
Encrypted plaintext: 00112233445566778899AABBCCDDEEFF.
Ciphertext: 69C4E0D86A7B0430D8CDB78070B4C55A.
Decrypted ciphertext: 69C4E0D86A7B0430D8CDB78070B4C55A.
Plaintext: 00112233445566778899AABBCCDDEEFF.
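
The encryption half of this test vector can be reproduced with a compact host-side model that chains the steps of Section 1. The Python sketch below covers AES-128 encryption only and is meant as a reference for checking an implementation, not as a rendering of the AVR program; all names are illustrative.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8)."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by repeated xtime."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def build_sbox():
    """Build the S-box: field inverse followed by the affine map."""
    sbox = []
    for a in range(256):
        inv = 0
        if a:
            inv = 1
            for _ in range(254):
                inv = gf_mul(inv, a)
        s = inv
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key_128(key):
    """Return the 11 round keys (16 bytes each) for a 16-byte key."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [[b for w in words[4 * r:4 * r + 4] for b in w] for r in range(11)]


def shift_rows(state):
    """Rotate row r of the column-major state left by r positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_column(col):
    """MixColumn on one 4-byte column."""
    a0, a1, a2, a3 = col
    t = a0 ^ a1 ^ a2 ^ a3
    return [a0 ^ t ^ xtime(a0 ^ a1), a1 ^ t ^ xtime(a1 ^ a2),
            a2 ^ t ^ xtime(a2 ^ a3), a3 ^ t ^ xtime(a3 ^ a0)]


def encrypt_block(plaintext, key):
    """Encrypt one 16-byte block with a 16-byte key (AES-128)."""
    round_keys = expand_key_128(key)
    state = [p ^ k for p, k in zip(plaintext, round_keys[0])]
    for rnd in range(1, 11):
        state = [SBOX[b] for b in state]              # SubByte
        state = shift_rows(state)                     # ShiftRow
        if rnd != 10:                                 # no MixColumn in last round
            state = [b for c in range(4)
                     for b in mix_column(state[4 * c:4 * c + 4])]
        state = [s ^ k for s, k in zip(state, round_keys[rnd])]  # AddRoundKey
    return bytes(state)


ciphertext = encrypt_block(
    bytes.fromhex("00112233445566778899aabbccddeeff"),
    bytes.fromhex("000102030405060708090a0b0c0d0e0f"),
)
```

Comparing `ciphertext.hex()` with the ciphertext listed above gives a quick end-to-end check of all four round steps and the key expansion.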

In short, the encryption and decryption procedures of AES are not symmetric: decryption is more complicated and time-consuming than encryption. Decryption optimization method 1 does not increase the storage space; it restructures the column transformation, so the program is smaller than the unoptimized one and saves time. Decryption optimization method 2 (table lookup) is the fastest and most efficient, but it increases the storage space required by the system, so its program is also the largest.

Note: AES128 data encryption and decryption program can be found on the website of this journal (www.dpj.com.cn).

Conclusion

The AES advanced data encryption algorithm is superior to the DES data encryption algorithm in terms of security, efficiency, and key flexibility. It will gradually replace DES and be widely used in the future. This paper implements the AES algorithm on the AVR, taking advantage of its high-speed computing performance, and optimizes the algorithm in assembly language. The appropriate method can be selected according to the specific needs of the actual application.

References
1 Song Zhen, et al. Cryptography. Beijing: China Water Resources and Hydropower Press, 2002
2 Yang Yixian. New Theory of Modern Cryptography. Beijing: Science Press, 2002
3 Gu Dawu, et al. Advanced Encryption Standard (AES) Algorithm - Design of Rijndael. Beijing: Tsinghua University Press, 2003
4 Geng Degen, et al. AVR Microcontroller Application Technology. Beijing: Beijing University of Aeronautics and Astronautics Press, 2002
5 Song Jianguo, et al. Principles and Applications of AVR High-Speed Embedded Microcontrollers. Beijing: Beijing University of Aeronautics and Astronautics Press, 2001
6 NIST. Advanced Encryption Standard (AES). Federal Information Processing Standards Publication, 2001
