At present, most confidential communication equipment performs cryptographic operations in one of two ways: either a general-purpose CPU or a dedicated hardware circuit controls a dedicated cryptographic chip. In the first approach, a general-purpose processor (GPP) is chosen for its high flexibility, easy maintenance, and convenient upgrades. However, the limitations of the general-purpose instruction set prevent the dedicated cryptographic chip from reaching its optimal performance, which seriously reduces the speed of confidential communication. In the second approach, a dedicated hardware circuit controls the cryptographic chip directly and can extract its full performance, but because its function depends entirely on the cryptographic chip and its peripheral devices, it offers poor flexibility and requires a long development cycle.
Whichever method is used, separating the operation of the dedicated cryptographic chip from its control limits cryptographic data processing performance and restricts the overall speed of the system. To address these problems, this paper analyzes a variety of cryptographic algorithms and, drawing on processor design ideas, proposes a programmable cryptographic processor architecture based on the explicitly parallel instruction computing (EPIC) structure, which achieves a compromise between speed and flexibility.
1 Cryptographic Algorithm Analysis
1.1 Typical Cryptographic Algorithms and Their Applications
This section analyzes seven block cipher algorithms and two hash functions: DES, IDEA, Rijndael, RC6, Serpent, Twofish, Mars, MD5, and SHA.
A block cipher algorithm is, for a given key, a bijective function that maps an n-bit plaintext block to an n-bit ciphertext block, where n is the block length. Encryption and decryption use the same key, so it is also called a symmetric cryptographic algorithm. A hash function compresses a message of any length into a message digest of fixed length. It is mainly used in digital signatures, message integrity checking, and message origin authentication.
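The two definitions above can be illustrated with a minimal sketch. The 8-bit "cipher" below (`toy_encrypt`, `toy_decrypt`) is a hypothetical toy invented for this illustration, not one of the algorithms analyzed here, and it offers no security; it only shows that a keyed block mapping is a bijection and that decryption with the same key inverts it. The hash part uses Python's standard `hashlib` to show that digests have a fixed length regardless of message length.

```python
import hashlib

def rotl8(x, r):
    """Rotate an 8-bit value left by r bits (0 < r < 8)."""
    return ((x << r) | (x >> (8 - r))) & 0xFF

def rotr8(x, r):
    """Rotate an 8-bit value right by r bits (0 < r < 8)."""
    return ((x >> r) | (x << (8 - r))) & 0xFF

def toy_encrypt(block, key):
    """Hypothetical 8-bit toy cipher: XOR with the key, then rotate."""
    return rotl8(block ^ key, 3)

def toy_decrypt(block, key):
    """Inverse of toy_encrypt under the same key (symmetric)."""
    return rotr8(block, 3) ^ key

key = 0x5A
ciphertexts = {toy_encrypt(p, key) for p in range(256)}
is_bijection = len(ciphertexts) == 256  # every plaintext maps to a distinct ciphertext
round_trip_ok = all(toy_decrypt(toy_encrypt(p, key), key) == p for p in range(256))

# Hash functions: arbitrary-length input, fixed-length digest.
md5_len = len(hashlib.md5(b"a" * 1000).digest())   # 16 bytes (128 bits)
sha1_len = len(hashlib.sha1(b"").digest())         # 20 bytes (160 bits)
```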
The DES algorithm (Data Encryption Standard) was the first encryption algorithm whose implementation details were fully and publicly described and which gained worldwide recognition. It was originally designed by IBM, which obtained a patent on it. Over the following two decades, DES, as a typical block cipher algorithm, was widely used to protect commercial data (for example, in banking systems).
The IDEA algorithm (International Data Encryption Algorithm) was announced in 1992; it was previously known as IPES (Improved Proposed Encryption Standard). It is well known for its wide use in the email encryption and authentication software PGP.
Rijndael was announced in 1998 and won the AES selection process hosted by NIST (National Institute of Standards and Technology) in 2000. Since then, the Rijndael algorithm has also been called the AES algorithm and has become the new encryption standard, gradually replacing DES.
RC6, Serpent, Twofish and Mars algorithms are AES candidate algorithms that participated in the evaluation together with the Rijndael algorithm. They all embody the design principles of block cipher algorithms to varying degrees and have had a considerable impact on the development of applied cryptography.
The MD5 message digest function is a one-way hash function proposed by Rivest, one of the designers of the RSA algorithm. It is not based on any assumptions or cryptographic systems, and uses a direct construction method with a very fast processing speed.
SHA is the Secure Hash Standard, published in 1993 as a Federal Information Processing Standard (FIPS 180). It was proposed by NIST, and the revised version released in 1995 is commonly known as SHA-1.
1.2 Basic Operations in Cryptographic Algorithms
Based on the analysis of the above algorithms, the core operation types of each algorithm are extracted and summarized into six categories of basic operations: S-box operations, bit permutation operations, arithmetic operations, logical operations, shift operations, and finite field multiplication operations. Arithmetic operations include modular addition/subtraction and modular multiplication, and logical operations consist of AND, OR, NOT, and XOR. Table 1 lists in detail how these operations are used in each algorithm. For example, the DES algorithm mainly uses S-box operations, bit permutation, XOR, and shift operations.
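To make the operation categories concrete, the following is a minimal software sketch of one representative operation from each category on 32-bit words. The function names, the byte-wise S-box layout, and the choice of modulus 2^32 for the arithmetic operations are assumptions made for illustration; they are not taken from the processor's actual operation modules. The default polynomial `0x11B` is the irreducible polynomial used by Rijndael.

```python
MASK32 = 0xFFFFFFFF

def sbox_lookup(table, x):
    """S-box operation: substitute each byte of a 32-bit word via a 256-entry table."""
    out = 0
    for i in range(4):
        byte = (x >> (8 * i)) & 0xFF
        out |= table[byte] << (8 * i)
    return out

def bit_permute(x, perm):
    """Bit permutation: output bit i is taken from input bit perm[i]."""
    out = 0
    for i, src in enumerate(perm):
        out |= ((x >> src) & 1) << i
    return out

def mod_add(a, b):
    """Arithmetic operation: addition modulo 2^32 (assumed modulus)."""
    return (a + b) & MASK32

def mod_mul(a, b):
    """Arithmetic operation: multiplication modulo 2^32 (assumed modulus)."""
    return (a * b) & MASK32

def xor(a, b):
    """Logical operation: bitwise XOR of two 32-bit words."""
    return (a ^ b) & MASK32

def rotl32(x, r):
    """Shift operation: rotate a 32-bit word left by r bits."""
    r %= 32
    return ((x << r) | (x >> (32 - r))) & MASK32

def gf256_mul(a, b, poly=0x11B):
    """Finite field multiplication in GF(2^8) modulo an irreducible polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result
```

For instance, `gf256_mul(0x57, 0x83)` evaluates to `0xC1`, which matches the worked multiplication example in the AES specification.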
2 Design of Programmable Cryptographic Processor Architecture
In the programmable cryptographic processor architecture (AFPC), the EPIC structure exploits the concurrency available among scalar operations by increasing the number of functional units. The compiler explicitly packs independent instructions into one extra-long machine instruction word and issues it to the pipeline, where the functional units execute it concurrently with an instruction-level parallelism of 4 to 8. The hardware control of this structure is relatively simple, its inherent parallelism shows clearly in computationally intensive applications, and it needs little branch prediction, so programs running on it can achieve a high degree of actual instruction-level parallelism. These characteristics match the requirements of cryptographic algorithms well, since such algorithms are computationally intensive and execute largely sequentially.
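The compile-time packing described above can be sketched as a greedy, in-order bundler. This is only an illustration of the general EPIC idea under stated assumptions: the `Op` record, the register names, the bundle width of 4, and the dependency rule (an operation may not read or write a register written earlier in the same bundle) are all hypothetical and are not the scheduling rules of the compiler used with this processor.

```python
from typing import NamedTuple, Tuple

class Op(NamedTuple):
    name: str               # mnemonic (hypothetical)
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers

def bundle(ops, width=4):
    """Greedily pack consecutive independent ops into bundles of at most `width`."""
    bundles, current, written = [], [], set()
    for op in ops:
        # Conflict if the op reads (RAW) or rewrites (WAW) a register
        # that an earlier op in the current bundle writes.
        conflict = op.dest in written or any(s in written for s in op.srcs)
        if current and (len(current) == width or conflict):
            bundles.append(current)
            current, written = [], set()
        current.append(op)
        written.add(op.dest)
    if current:
        bundles.append(current)
    return bundles

# Four independent key XORs fill one bundle; the two S-box lookups that
# depend on them go into the next bundle.
program = [
    Op("xor", "r4", ("r0", "k0")),
    Op("xor", "r5", ("r1", "k1")),
    Op("xor", "r6", ("r2", "k2")),
    Op("xor", "r7", ("r3", "k3")),
    Op("sbox", "r8", ("r4",)),
    Op("sbox", "r9", ("r5",)),
]
bundle_sizes = [len(b) for b in bundle(program)]  # [4, 2]
```

A strictly dependent chain of operations would yield one operation per bundle, which is why the achievable parallelism depends on how much independence the algorithm exposes.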
The hardware structure of the programmable cryptographic processor architecture is shown in Figure 1. The entire processor consists of three parts: data path, control unit, and input/output interface circuit.
The data path is one of the key components of the processor. It includes six functional units that can execute in parallel (FU0~FU5), 32 32-bit general registers, 4×32 32-bit key registers, and write-back units.
The functional units are the core of the processor's instruction execution, and each is composed of several cryptographic operation modules. FU0~FU3 have exactly the same internal composition and structure. Each takes three 32-bit operands as input, two from the general register file and one from the key register file, and produces a 32-bit result. Each of FU0~FU3 contains seven operation modules: an S-box operation module, a modular addition/subtraction module, a modular multiplication module, a 32-bit shift module, a finite field multiplication module, a two-input logic module, and a three-input logic module. FU4 contains a 128-bit permutation module whose input is twelve 32-bit operands, eight from the general register file and four from the key register file. FU5 contains a 128-bit shift module whose input is also twelve 32-bit operands, eight from the general register file and four from the key register file.
The operation modules above are not fixed-function but reconfigurable. Table 2 shows the modes supported by the four reconfigurable operation modules.
In addition to the reconfigurable operation modes mentioned above, each operation module can also apply an XOR before its operation, after its operation, or both, depending on the algorithm. Because the delay of an XOR is very small, adding it does not lengthen the critical path of the module, so overall performance is unaffected, while the clock cycle that a separate XOR instruction would otherwise need is saved and the total cycle count of the cryptographic operation drops. Table 3 shows the round operation flow of the Rijndael algorithm. With the XOR folded in after the finite field multiplication, the number of clock cycles per round is reduced from 4 to 3, so 10 rounds save 10 clock cycles.
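The cycle saving can be checked with a small sketch. The `run_module` function below is a hypothetical software model of an operation module with optional pre- and post-XOR inputs; its parameter names are assumptions. The per-round cycle counts (4 without the fused XOR, 3 with it) and the round count of 10 are the figures quoted above.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """Stand-in for any module operation; here, addition modulo 2^32."""
    return (a + b) & MASK32

def run_module(op, a, b, pre_xor=0, post_xor=0):
    """One pass through a module: optional XOR before and after the operation."""
    return op(a ^ pre_xor, b) ^ post_xor

def total_cycles(rounds, cycles_per_round):
    """Total clock cycles spent in the round operations."""
    return rounds * cycles_per_round

ROUNDS = 10
separate_xor = total_cycles(ROUNDS, 4)  # XOR issued as its own operation
fused_xor = total_cycles(ROUNDS, 3)     # XOR folded into the preceding module
saved = separate_xor - fused_xor        # 10 cycles

# The fused form computes the same value as the two-step sequence.
two_step = add32(0x1234, 0x0F0F) ^ 0xFFFF
fused = run_module(add32, 0x1234, 0x0F0F, post_xor=0xFFFF)
```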
The control unit completes the tasks of instruction access, instruction decoding, instruction memory address generation, etc., and coordinates the correct execution of instructions inside the processor and external user commands.
The input/output interface circuit includes 16 32-bit input registers, 16 32-bit output registers, 4 data length counters, 1 32-bit command register, etc., which complete the operations of loading instructions and operation data from the 32-bit data bus to the instruction memory and input registers, and writing the operation results from the internal general registers to the output registers.
3 Instruction System Design
The instruction system is a concentrated embodiment of the algorithm elements and of the characteristics of the cryptographic processor architecture. Its design must support parallel execution in hardware, that is, the exploitation of instruction-level parallelism (ILP). How well ILP is exploited is critical to making full use of the hardware of the cryptographic microprocessor and to improving program performance. ILP technology refers to a complete set of processor design and compilation techniques that accelerate program execution by executing independent machine operations (such as memory reads and writes, logical operations, and arithmetic operations) in parallel. ILP can be measured by the average number of instructions executed per cycle (IPC) or by the average number of cycles per instruction over the whole program (CPI, where CPI = 1/IPC). The programmable cryptographic processor architecture adopts an explicitly parallel instruction computing structure, and its instruction-level parallelism reaches 5.
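The relationship between the two measures is simple enough to state as a short sketch. The instruction and cycle counts in the example are hypothetical numbers chosen to give an IPC of 5; they are not measurements from this processor.

```python
def ipc(instructions, cycles):
    """Average number of instructions executed per clock cycle."""
    return instructions / cycles

def cpi(instructions, cycles):
    """Average number of clock cycles per instruction (the reciprocal of IPC)."""
    return cycles / instructions

# Hypothetical program: 500 operations issued over 100 cycles.
example_ipc = ipc(500, 100)  # 5.0
example_cpi = cpi(500, 100)  # 0.2
```

An IPC of 5 corresponds to every issue slot of a five-operation instruction word being filled on every cycle, so it is an upper bound that real programs approach only when enough independent operations are available.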
3.1 Instruction Classification
Instructions in the programmable cryptographic processor architecture are divided into the following categories:
(1) Static configuration instructions. These instructions configure control information that remains unchanged, or changes very rarely, during key generation and encryption/decryption. Once the algorithm is chosen, its S-box lookup tables, finite field multiplier matrix and irreducible polynomial, and permutation control information are fixed and do not change with the operation mode. Separating out these configuration instructions greatly reduces redundant encoding in the instructions that perform the cryptographic operations, which shortens the instruction word and increases the number of valid operations each operation instruction word can carry. This effectively improves encryption/decryption speed and reduces the code size of the cryptographic program.
(2) Short instructions. These instructions perform all cryptographic operations other than permutation and 128-bit shift, as well as data transfers between internal registers.
(3) Long instructions. These instructions perform permutation and 128-bit shift operations.
(4) Extra-long instructions. These instructions perform immediate operations and multi-branch judgment operations.
(5) Control instructions. These instructions perform control operations such as program jumps, subroutine calls and returns, and single branch judgments.
3.2 Instruction Form
In hardware, the presence of multiple functional units supports the parallel execution of multiple instructions. The rules that determine which instructions can execute in parallel, which cannot, and how multiple instructions are assembled into one instruction word are called instruction assembly rules. This design uses the following instruction forms:
(1) Static configuration instructions.
(2) Extra-long instructions.
(3) Short instruction || Short instruction || Short instruction || Short instruction || Control instruction.
(4) Long instruction || Control instruction.
A short instruction is 37 bits long, a control instruction is 32 bits long, and a long instruction is 148 bits long. Whichever form is used, the final instruction word is 192 bits long (including the instruction assembly identifier). For example, four short instructions can be assembled with one control instruction into a single instruction word, and a long instruction can likewise be assembled with one control instruction. Static configuration instructions and extra-long instructions, however, cannot be assembled with other instructions; each forms a 192-bit instruction word by itself.
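The field widths above can be checked and exercised with a short packing sketch. Four short instructions plus a control instruction occupy 4 × 37 + 32 = 180 bits, and a long instruction plus a control instruction occupy 148 + 32 = 180 bits, which leaves 12 bits in the 192-bit word. The sketch treats those 12 bits as a single `header` field placed in the most significant bits, with the control instruction in the least significant bits; that field order and the name `header` are assumptions, since the text gives only the widths.

```python
SHORT_BITS = 37
CONTROL_BITS = 32
LONG_BITS = 148
WORD_BITS = 192
HEADER_BITS = WORD_BITS - (4 * SHORT_BITS + CONTROL_BITS)  # 12 bits remain

def _check(value, bits, name):
    """Reject field values that do not fit in the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")

def pack_short_bundle(header, shorts, control):
    """Pack header | short0 | short1 | short2 | short3 | control (assumed order)."""
    if len(shorts) != 4:
        raise ValueError("exactly four short instructions are required")
    _check(header, HEADER_BITS, "header")
    word = header
    for s in shorts:
        _check(s, SHORT_BITS, "short instruction")
        word = (word << SHORT_BITS) | s
    _check(control, CONTROL_BITS, "control instruction")
    return (word << CONTROL_BITS) | control

def pack_long_bundle(header, long_instr, control):
    """Pack header | long | control; the long field spans the four short slots."""
    _check(header, HEADER_BITS, "header")
    _check(long_instr, LONG_BITS, "long instruction")
    _check(control, CONTROL_BITS, "control instruction")
    return (((header << LONG_BITS) | long_instr) << CONTROL_BITS) | control

def unpack_short_bundle(word):
    """Inverse of pack_short_bundle: returns (header, shorts, control)."""
    control = word & ((1 << CONTROL_BITS) - 1)
    word >>= CONTROL_BITS
    shorts = []
    for _ in range(4):
        shorts.append(word & ((1 << SHORT_BITS) - 1))
        word >>= SHORT_BITS
    shorts.reverse()
    return word, shorts, control

word = pack_short_bundle(0xABC, [1, 2, 3, 4], 0xDEADBEEF)
fields = unpack_short_bundle(word)  # (0xABC, [1, 2, 3, 4], 0xDEADBEEF)
```

Because 4 × 37 equals 148, a long instruction occupies exactly the space of the four short-instruction slots, which is consistent with both forms sharing the same 192-bit word.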
4 Performance Analysis
Since the programmable cryptographic processor architecture supports five instructions bound together and executed in parallel, its data path is denoted 5CS (5 Combining-Strands). The data path without binding is denoted NCS (No-Combining-Strands). These two cases are compared with the Alpha processor and the CryptoManiac cryptographic processor [9]. The number of clock cycles required for encryption/decryption on the four data paths is shown in Table 4. The comparison shows that the programmable cryptographic processor needs far fewer clock cycles, especially relative to the general-purpose Alpha processor: the number of clock cycles for encryption/decryption is reduced by 83% for the DES algorithm, 92% for the IDEA algorithm, 91% for the Rijndael algorithm, 69% for the RC6 algorithm, and 78% for the Twofish algorithm.
To verify the correctness of the data path and control path of the programmable cryptographic processor architecture, the Altera Stratix II EP2S180F1508C4 device is used as the FPGA target chip, and the Altera Quartus II 5.0 tool is used for synthesis. Mentor's ModelSim 5.8c is used for functional simulation before synthesis and for timing simulation after synthesis, and the results are correct in both cases. The specific resource usage is shown in Table 5.
Flexibility and efficiency have always been the limiting factors in the practical use of cryptographic algorithms. General-purpose microprocessors offer good flexibility, but their performance on some algorithms cannot meet requirements; dedicated algorithm chips achieve high performance but lose flexibility. To resolve this contradiction, this paper takes the EPIC microprocessor architecture as its starting point, systematically studies key technologies such as the general parallel block cipher processor model, the cryptographic operation units, and the instruction set, and implements the resulting design, achieving a good compromise between performance and flexibility.