Research and design of cryptographic processor architecture based on EPIC technology

Publisher: WhisperingHeart | Latest update time: 2010-08-04 | Source: 电子技术应用

At present, most secure communication equipment controls its dedicated cryptographic chip in one of two ways: with a general-purpose processor (GPP) or with a dedicated hardware circuit. A GPP offers high flexibility, easy maintenance, and convenient upgrades, but the limitations of its instruction set prevent the dedicated cryptographic chip from reaching its best performance, which seriously reduces the speed of secure communication. A dedicated hardware circuit that controls the chip directly can extract the chip's full performance, but because its function depends entirely on the chip and its peripheral devices, it has poor flexibility and a long development cycle.

With either method, separating the operation of the dedicated cryptographic chip from its control limits cryptographic data-processing performance and restricts the overall speed of the system. To address this problem, this paper analyzes a variety of cryptographic algorithms and proposes a programmable cryptographic processor architecture based on the design ideas of explicitly parallel instruction computing (EPIC) processors, which achieves a compromise between speed and flexibility.

1 Cryptographic Algorithm Analysis

1.1 Typical Cryptographic Algorithms and Their Applications

We analyze seven block cipher algorithms and two hash functions: DES, IDEA, Rijndael, RC6, Serpent, Twofish, Mars, MD5 and SHA.

A block cipher algorithm is a bijective function that maps an n-bit plaintext block to an n-bit ciphertext block, where n is the block length. Encryption and decryption use the same key, so it is also called a symmetric cryptographic algorithm. A hash function compresses a message of any length into a message digest of fixed length. It is mainly used in digital signatures, message integrity checking, and message origin authentication.

The DES algorithm (Data Encryption Standard) was the first encryption algorithm whose implementation details were fully and publicly described and which gained worldwide recognition. Its original designer was IBM, which obtained its patent. Over the following two decades, DES, as a typical block cipher algorithm, was widely used to protect commercial data (for example, in banking systems).

The IDEA algorithm (International Data Encryption Algorithm) was announced in 1992; it had previously been known as IPES (Improved Proposed Encryption Standard). It is well known for its wide use in the email encryption and authentication software PGP.

Rijndael was announced in 1998 and won the AES selection hosted by NIST (National Institute of Standards and Technology) in 2000. Since then, the Rijndael algorithm has also been called the AES algorithm and has become the new encryption standard that is gradually replacing DES.

RC6, Serpent, Twofish and Mars algorithms are AES candidate algorithms that participated in the evaluation together with the Rijndael algorithm. They all embody the design principles of block cipher algorithms to varying degrees and have had a considerable impact on the development of applied cryptography.

The MD5 message digest function is a one-way hash function proposed by Rivest, one of the designers of the RSA algorithm. It is not based on any assumptions or cryptographic systems, and uses a direct construction method with a very fast processing speed.

SHA is the secure hash standard of the Federal Information Processing Standard (FIPS-180) published in 1993. It was proposed by NIST and its revised version was launched in 1995, commonly known as SHA-1.
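The fixed-length property of the two hash functions can be checked with a short Python sketch. It uses the standard `hashlib` module; the helper name `digest_bits` and the sample messages are chosen here for illustration only and do not come from the paper.

```python
import hashlib

def digest_bits(name: str, message: bytes) -> int:
    """Return the digest length in bits produced by the named hash function."""
    return hashlib.new(name, message).digest_size * 8

# Messages of very different lengths still yield digests of one fixed size.
for msg in (b"", b"abc", b"x" * 10_000):
    print(len(msg), digest_bits("md5", msg), digest_bits("sha1", msg))
```

Whatever the input length, MD5 yields a 128-bit digest and SHA-1 a 160-bit digest.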

1.2 Basic Operations in Cryptographic Algorithms

From the analysis of the above algorithms, the core operation types of each algorithm are extracted and summarized into six categories of basic operations: S-box operations, bit permutation operations, arithmetic operations, logical operations, shift operations, and finite field multiplication operations. Arithmetic operations include modular addition/subtraction and modular multiplication; logical operations comprise AND, OR, NOT and XOR. Table 1 lists in detail how each category is used in the various algorithms. For example, the DES algorithm mainly uses S-box, bit permutation, XOR and shift operations.
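To make the operation categories concrete, the following Python sketch gives one software instance of several of them. It is an illustration under stated assumptions, not the paper's hardware: the function names are invented here, the modular multiplication follows the well-known IDEA convention (modulus 2^16 + 1, with 0 standing for 2^16), and the finite field multiplication uses the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).

```python
MASK32 = 0xFFFFFFFF

def add_mod32(a: int, b: int) -> int:
    """Modular addition modulo 2^32."""
    return (a + b) & MASK32

def mul_mod_65537(a: int, b: int) -> int:
    """IDEA-style multiplication modulo 2^16 + 1; the value 0 encodes 2^16."""
    a = a or 0x10000
    b = b or 0x10000
    return ((a * b) % 0x10001) & 0xFFFF

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two bytes in GF(2^8), reducing by the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce when the degree reaches 8
        b >>= 1
    return result

def permute_bits(x: int, table: list) -> int:
    """Bit permutation: output bit i is taken from input bit table[i]."""
    out = 0
    for i, src in enumerate(table):
        out |= ((x >> src) & 1) << i
    return out

print(hex(gf256_mul(0x57, 0x83)))   # worked example from the AES specification
```

An S-box operation is a plain table lookup (`sbox[x]`) and is omitted here for brevity.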

2 Design of Programmable Cryptographic Processor Architecture

In the typical programmable cryptographic processor architecture (AFPC), the EPIC approach exploits the parallelism available among scalar operations by increasing the number of functional units. The compiler explicitly packs mutually independent instructions into one very long machine instruction word, which is issued to the pipeline and executed concurrently on the functional units, giving an instruction-level parallelism of 4 to 8. The hardware control of this structure is relatively simple, its inherent parallelism is readily exposed in computationally intensive applications, and it does not rely heavily on branch prediction, so programs running on it can reach a high degree of actual instruction-level parallelism. Because cryptographic algorithms are computationally intensive and execute largely sequentially, the EPIC structure matches their requirements well.
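The compile-time packing of independent instructions can be sketched in a few lines of Python. This is a simplified illustration, not the paper's compiler: the `Op` type, the register names, and the default width of 4 are assumptions made here, and the packer only keeps program order and starts a new word when an operation reads or writes a register written earlier in the same word.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str      # mnemonic (hypothetical)
    dest: str      # destination register
    srcs: tuple    # source registers

def pack_bundles(ops, width=4):
    """Greedy in-order packing of independent operations into wide words."""
    bundles, current, written = [], [], set()
    for op in ops:
        # A dependence exists if the op reads or rewrites a register that
        # an earlier op in the current word produces.
        depends = op.dest in written or any(s in written for s in op.srcs)
        if current and (depends or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append(op)
        written.add(op.dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    Op("xor",  "r1", ("r0", "k0")),
    Op("xor",  "r2", ("r3", "k1")),   # independent of the first xor
    Op("sbox", "r4", ("r1",)),        # needs r1, so it starts a new word
    Op("add",  "r5", ("r2", "r4")),   # needs r4, so it starts a new word
]
for word in pack_bundles(program):
    print([op.name for op in word])
```

The two XOR operations share one word because neither uses the other's result; the dependent operations fall into later words.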

The hardware structure of the programmable cryptographic processor architecture is shown in Figure 1. The entire processor consists of three parts: data path, control unit, and input/output interface circuit.

The data path is one of the key components of the processor. It includes 6 functional units that can execute in parallel (FU0~FU5), 32 32-bit general registers, 4×32 32-bit key registers, and a write-back unit.

The functional units are the core of the processor for executing instruction operations, and each is composed of several cryptographic operation modules. FU0~FU3 have exactly the same internal operation modules and structure. Each takes three 32-bit operands as input, two from the general register file and one from the key register file, and produces a 32-bit result. FU0~FU3 each contain 7 operation modules: an S-box operation module, a modular addition/subtraction module, a modular multiplication module, a 32-bit shift module, a finite field multiplication module, a two-input logic module, and a three-input logic module. FU4 contains a 128-bit permutation module; its input is twelve 32-bit operands, 8 from the general register file and 4 from the key register file. FU5 contains a 128-bit shift module; its input is likewise twelve 32-bit operands, 8 from the general register file and 4 from the key register file.
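As a software illustration of what a 128-bit shift module such as the one in FU5 computes, the sketch below rotates a 128-bit value held in four 32-bit words. The word order (most significant first), the choice of rotation rather than logical shift, and the function name are assumptions made here; the paper does not specify how the operands map onto registers.

```python
MASK32 = 0xFFFFFFFF
MASK128 = (1 << 128) - 1

def rotl128(words, n):
    """Rotate four 32-bit words (most significant first) left by n bits."""
    value = 0
    for w in words:
        value = (value << 32) | (w & MASK32)   # join the words into 128 bits
    n %= 128
    value = ((value << n) | (value >> (128 - n))) & MASK128
    # Split the result back into four 32-bit words.
    return [(value >> shift) & MASK32 for shift in (96, 64, 32, 0)]

print([hex(w) for w in rotl128([0x80000000, 0, 0, 0], 1)])
```

The example moves the top bit of the most significant word around to the bottom bit of the least significant word, which four separate 32-bit shifts could not do in a single step.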

The operation modules above are not single-function; they are reconfigurable. Table 2 shows the modes supported by the four reconfigurable operation modules.

In addition to the reconfigurable modes above, each operation module can also apply an XOR before its operation, after its operation, or both, as the algorithm requires. Because the delay of an XOR is very small, adding it does not lengthen the critical path of the module. It removes the separate clock cycle that a standalone XOR would otherwise need, which reduces the total number of clock cycles without affecting overall performance. Table 3 shows the round operation flow of the Rijndael algorithm: with the XOR merged after the finite field multiplication, the number of clock cycles per round falls from 4 to 3, so 10 rounds save 10 clock cycles.
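The effect of folding an XOR into an operation module can be modeled in software as follows. This is only a behavioral sketch: `run_module`, `add32` and `cycles_saved` are names invented here, and modular addition stands in for whichever core operation the module performs.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """Stand-in core operation: addition modulo 2^32."""
    return (a + b) & MASK32

def run_module(core, a, b, pre_xor=None, post_xor=None):
    """One module pass with an optional XOR before and/or after the core op."""
    if pre_xor is not None:
        a ^= pre_xor
    result = core(a, b)
    if post_xor is not None:
        result ^= post_xor
    return result & MASK32

def cycles_saved(rounds, cycles_before, cycles_after):
    """Total cycles saved when every round gets shorter by the same amount."""
    return rounds * (cycles_before - cycles_after)

# Fused form: core operation and XOR in one pass.
fused = run_module(add32, 1, 2, post_xor=0xFF)
# Unfused form: the XOR is a separate step (a separate cycle in hardware).
separate = add32(1, 2) ^ 0xFF
print(fused == separate, cycles_saved(10, 4, 3))
```

The fused and separate forms give the same value; the saving is purely in the number of steps, which matches the 10-cycle reduction quoted for 10 Rijndael rounds.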

The control unit handles instruction fetch, instruction decoding, instruction memory address generation and related tasks, and coordinates the correct execution of both internal instructions and external user commands.

The input/output interface circuit includes 16 32-bit input registers, 16 32-bit output registers, 4 data length counters, 1 32-bit command register, etc., which complete the operations of loading instructions and operation data from the 32-bit data bus to the instruction memory and input registers, and writing the operation results from the internal general registers to the output registers.

3 Instruction System Design

The instruction system is a concentrated embodiment of the algorithm elements and of the characteristics of the cryptographic processor architecture. Its design must support parallel execution in hardware, that is, the exploitation of instruction-level parallelism (ILP). How well ILP is exploited is critical to making full use of the hardware of the cryptographic processor and to improving program performance. ILP technology refers to a complete set of processor design and compilation techniques that accelerate program execution by performing independent machine operations (such as memory reads and writes, logical operations, and arithmetic operations) in parallel. ILP can be measured by the average number of instructions executed per cycle (IPC) or by its reciprocal, the average number of cycles per instruction over the whole program (CPI = 1/IPC). The programmable cryptographic processor architecture adopts an explicitly parallel instruction computing structure, and its instruction-level parallelism reaches 5.
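The relationship between the two measures can be written out directly. In the sketch below the instruction and cycle counts are made-up illustrative numbers, not measurements from the paper; they are chosen so that the result corresponds to a parallelism of 5.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Average number of instructions executed per cycle."""
    return instructions / cycles

def cpi(instructions: int, cycles: int) -> float:
    """Average number of cycles per instruction; the reciprocal of IPC."""
    return cycles / instructions

# Hypothetical run: 500 instructions completed in 100 cycles.
print(ipc(500, 100), cpi(500, 100))
```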

3.1 Instruction Classification

Instructions in the programmable cryptographic processor architecture are divided into the following categories:

(1) Static configuration instructions. These are control information configuration instructions that remain unchanged or change very rarely during key generation and encryption/decryption. Once the algorithm is determined, its S-box lookup table information, finite field multiplier matrix and irreducible polynomial, and several permutation control information are determined, and they will not change due to different operation modes. In the encryption/decryption process, the method of separating configuration instructions can greatly reduce the redundant encoding of instructions when performing cryptographic operations, thereby shortening the length of instruction words and increasing the number of valid operations contained in the operation instruction words, effectively improving the encryption/decryption speed and reducing the amount of code in the cryptographic program.
(2) Short instructions. These instructions perform all cryptographic operations other than permutation and 128-bit shift, as well as data transfers between internal registers.
(3) Long instructions. These instructions perform permutation and 128-bit shift operations.
(4) Extra-long instructions. These instructions perform immediate operations and multi-branch judgment operations.
(5) Control instructions. These instructions perform control operations such as program jumps, subroutine calls and returns, and single branch judgments.

3.2 Instruction Form

In hardware, the presence of multiple functional units supports the parallel execution of multiple instructions. The rules that determine which instructions can be executed in parallel, which cannot, and how multiple instructions are assembled into one instruction word are called instruction assembly rules. This design has the following instruction forms:

(1) Static configuration instructions.
(2) Extra-long instructions.
(3) Short instruction || Short instruction || Short instruction || Short instruction || Control instruction.
(4) Long instruction || Control instruction.

The length of short instructions is 37 bits, the length of control instructions is 32 bits, and the length of long instructions is 148 bits. Regardless of the above forms, the final instruction word length is 192 bits (including instruction assembly identifiers). For example, four short instructions can be assembled into one instruction with a control instruction, and long instructions can also be assembled into one instruction with a control instruction. However, static configuration instructions and extra-long instructions cannot be assembled with other instructions and form a 192-bit instruction word by themselves.
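The bit budget implied by these lengths can be checked with a short Python sketch. The field widths and the 192-bit word length come from the text; everything else is an assumption made for illustration, in particular the `pack_word` helper and its layout (leftover bits at the top of the word, fields following in order), since the paper does not give the actual encoding.

```python
WORD_BITS = 192
SHORT, CONTROL, LONG = 37, 32, 148

FORMS = {
    "4 short + control": [SHORT] * 4 + [CONTROL],
    "long + control": [LONG, CONTROL],
}

def spare_bits(widths):
    """Bits left in the 192-bit word after the listed fields."""
    return WORD_BITS - sum(widths)

def pack_word(tag, fields):
    """Pack (value, width) fields under a tag held in the leftover top bits.

    Hypothetical layout for illustration only.
    """
    tag_bits = spare_bits([w for _, w in fields])
    if tag_bits < 0 or not 0 <= tag < (1 << tag_bits):
        raise ValueError("tag or fields do not fit in the word")
    word = tag
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value out of range")
        word = (word << width) | value
    return word

for name, widths in FORMS.items():
    print(name, sum(widths), spare_bits(widths))
```

Both assembled forms occupy 4 × 37 + 32 = 148 + 32 = 180 bits, leaving 12 bits of the 192-bit word for the instruction assembly identifiers mentioned in the text (and any padding).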

4 Performance Analysis

Since the programmable cryptographic processor architecture supports binding 5 instructions for parallel execution, its data path is denoted 5CS (5 Combining-Strands). The data path without binding is denoted NCS (No-Combining-Strands). These two cases are compared with the Alpha processor and the CryptoManiac cryptographic processor [9]. The number of clock cycles required for encryption/decryption on the four data paths is shown in Table 4. The comparison shows that the programmable cryptographic processor needs far fewer clock cycles, especially compared with the general-purpose Alpha processor: the number of clock cycles for encryption/decryption is reduced by 83% for the DES algorithm, 92% for the IDEA algorithm, 91% for the Rijndael algorithm, 69% for the RC6 algorithm, and 78% for the Twofish algorithm.

To verify the correctness of the data path and control path of the programmable cryptographic processor architecture, the Altera Stratix II EP2S180F1508C4 device is used as the target FPGA, and the Altera Quartus II 5.0 tool is used for synthesis. Mentor's ModelSim 5.8c is used for functional simulation before synthesis and for timing simulation after synthesis, and the results are correct in both cases. The specific resource usage is shown in Table 5.

The flexibility and efficiency of cryptographic processing have always been limiting factors in the use of cryptographic algorithms. General-purpose microprocessors offer good flexibility, but their performance on some algorithms cannot meet requirements; dedicated algorithm chips achieve high performance but lose flexibility. To resolve this conflict, this paper takes the EPIC microprocessor architecture as its starting point, systematically studies key technologies such as the general parallel block cipher processor model, the various cryptographic operation units, and the instruction set, and implements the resulting design, achieving a good compromise between performance and flexibility.
