In the design of a RISC CPU, the handling of transfer (branch) instructions has a critical impact on processor performance. Transfer instructions determine the execution order of a program and occur frequently. A RISC CPU executes programs in a pipeline. During sequential execution, the address of the next instruction does not depend on the content of the previous instruction. When a transfer instruction is executed, however, the address of the next instruction depends on the result of that instruction. In other words, the address of the next instruction is unknown until the transfer instruction has been executed, which interrupts the pipeline and reduces CPU efficiency.
There are many methods for handling transfer instructions, which can be divided into predictive and non-predictive methods. Predictive methods include static and dynamic prediction. Static prediction includes always predicting a jump, and predicting no jump for forward transfers and a jump for backward transfers; dynamic prediction includes the 2-bit counter (2BC) and the transfer target cache (BTC). Non-predictive methods include delayed jump, among others [1]. These basic methods can be combined to achieve good results.
The RISC CPU described in this paper uses a 5-stage pipeline: instruction fetch, decoding, execution, memory access, and write back. Transfer instructions are handled in the instruction fetch and decoding stages: the decoding stage supplies the detailed information contained in the transfer instruction, and the instruction fetch stage contains an address calculation unit, a transfer target cache (BTC), a jump judgment unit, and related logic. The design combines the delayed jump, 2BC, and BTC methods.
2 Principles of transfer instructions
The instruction set of this RISC CPU contains conditional and unconditional transfer instructions. All transfer instructions use delayed transfer, so each one is followed by a delay slot instruction. 2BC predicts whether a conditional transfer jumps, and BTC saves the information of a transfer instruction with a fixed target after it has been executed. The following sections describe the transfer instructions of this RISC CPU and the implementation of delayed transfer, BTC, and 2BC.
2.1 Transfer instruction type and format
The instruction set of this RISC CPU contains conditional transfer instructions (BCC) and unconditional transfer instructions (CALL and RET); their encoding formats are shown in Figure 1. The CALL instruction contains a 2-bit opcode and a 30-bit absolute address. The BCC instruction contains an 8-bit opcode, a 4-bit condition code, a 19-bit offset, and a 1-bit flag that indicates whether the instruction carries the A parameter (i.e., the ANNUL operation). All BCC instructions share the same opcode and are distinguished by their condition codes, giving 16 types of BCC instruction in total. The offset is a signed number; after two zero bits (00) are appended at the low end, it can address a relative range of ±2^20. The RET instruction contains an 8-bit opcode and two 5-bit register addresses.
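The offset arithmetic described above can be checked with a short sketch. It assumes the 19-bit field is in two's complement, which the text implies but does not state; the function name `branch_offset` is purely illustrative, and the position of the field inside the instruction word is not modeled because Figure 1 is not reproduced here.

```python
def branch_offset(offset19: int) -> int:
    """Sign-extend a 19-bit BCC offset field and append two zero bits."""
    if not 0 <= offset19 < (1 << 19):
        raise ValueError("offset field must fit in 19 bits")
    # Assumption: bit 18 is the sign bit of a two's-complement field.
    signed = offset19 - (1 << 19) if offset19 & (1 << 18) else offset19
    # Appending 00 at the low end is a left shift by two positions.
    return signed << 2


print(branch_offset(0x3FFFF))  # largest positive field: 2**20 - 4
print(branch_offset(0x40000))  # most negative field: -(2**20)
```

Under this assumption the reachable range is -2^20 to 2^20 - 4, which matches the ±2^20 figure given in the text.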
2.2 Delayed transfer
In this RISC CPU, a transfer instruction is recognized only in the decoding stage, so the decision whether to jump can only be made there, and the pipeline would have to wait one clock cycle before fetching the next instruction. To reduce pipeline bubbles, an instruction that does not depend on the jump, the delay slot instruction, is placed immediately after the transfer instruction. This instruction is executed regardless of whether the jump occurs. The compiler inserts the delay slot instruction; when it cannot find a suitable one, it inserts a NOP. To ease the compiler's task, the design also supports transfers with the A parameter: when the instruction carries the A parameter, the delay slot instruction is taken from the code at the transfer target. In that case the delay slot instruction is executed when the transfer occurs and is prevented from entering the decoding stage when it does not. Unconditional transfer instructions are generally much less frequent than conditional ones, and their delay slot instructions are relatively easy to find. Therefore, unconditional transfer instructions do not offer the A parameter option, while conditional transfer instructions do.
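The delay slot rule above reduces to a small truth table. The following sketch only restates that rule; the function name `delay_slot_executes` and its parameters are illustrative and do not come from the original design.

```python
def delay_slot_executes(annul: bool, taken: bool) -> bool:
    """Return True if the delay slot instruction may enter the decoding stage."""
    # Without the A parameter the delay slot instruction always executes.
    # With the A parameter it executes only when the transfer is taken.
    return taken or not annul


for annul in (False, True):
    for taken in (False, True):
        print(f"A={annul!s:5} taken={taken!s:5} -> {delay_slot_executes(annul, taken)}")
```

Only one of the four combinations, an A-parameter transfer that is not taken, suppresses the delay slot instruction.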
2.3 Design of 2BC and BTC
2BC and BTC play an important role in improving the execution efficiency of transfer instructions. Once a transfer instruction has been executed, it is very likely to be executed again. For transfer instructions with fixed targets (BCC and CALL), BTC stores the relevant information the first time the instruction is executed. When the instruction is executed again, this information is read out directly to control the execution order of the program, and the transfer instruction itself does not enter the pipeline. This greatly improves efficiency, but BTC cannot be used for indirect transfer instructions whose targets are not fixed (such as RET). In addition, whether a conditional transfer instruction (BCC) jumps is not known in advance, so this design uses 2BC to predict it.
BTC is a fully associative cache with 16 units. Each unit contains the following fields: TAG stores the address of the executed transfer instruction, DI stores the delay slot instruction, CC stores the condition code, TP stores the transfer instruction type, AN stores the flag indicating whether the A parameter is present, HI stores the execution history of the transfer (that is, the 2BC state), and VI indicates whether the row is valid. BTC performs three tasks: BTC storage, BTC hit, and BTC check. The following sections describe how 2BC and BTC operate in each task.
2.3.1 The role and working principle of 2BC
After a transfer instruction has been executed once, its transfer target address and delay slot instruction are stored in BTC. When the instruction is executed again, this information is read directly from the cache, so the target address and the delay slot instruction are available in the instruction fetch stage. An unconditional transfer instruction always jumps, so when BTC hits, the address of the next instruction is simply the transfer target address, and DI is sent to the instruction bus in the current cycle. For a conditional transfer instruction, whether to jump depends on the condition code and the ALU flags. If the instruction preceding the transfer instruction modifies the flags and is still in the decoding stage when BTC hits, the flags will not be valid for another clock cycle. To keep the pipeline from stalling while it waits, the current 2BC state is used to predict whether the jump is taken. In the next clock cycle, once the flags are valid, the prediction is checked and corrected if it was wrong. When the prediction is correct, 2BC and BTC together shorten the execution time of the transfer instruction by one cycle; when it is wrong, nothing is lost compared with not predicting at all. The working principle of 2BC is shown in Figure 2. The initial value is Nx (the jump was not taken the first time) or Tx (the jump was taken the first time); t denotes a jump that is taken and n a jump that is not taken. When HI is N or Nx, the jump is predicted not to occur; when HI is T or Tx, the jump is predicted to occur.
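As a rough illustration of the four HI states, the sketch below models 2BC as a conventional 2-bit saturating counter. This is an assumption: the actual transitions are defined by Figure 2, which is not reproduced here, and they may differ. Only the initial values (Nx or Tx) and the prediction rule (N/Nx predict no jump, T/Tx predict a jump) come from the text; the state ordering and the function names are illustrative.

```python
# Assumed ordering, from strongest "not taken" to strongest "taken".
STATES = ["N", "Nx", "Tx", "T"]


def initial_state(taken: bool) -> str:
    """Initial HI value after the first execution, as stated in the text."""
    return "Tx" if taken else "Nx"


def predict(hi: str) -> bool:
    """Prediction rule from the text: T or Tx predicts that the jump occurs."""
    return hi in ("T", "Tx")


def update(hi: str, taken: bool) -> str:
    """Assumed saturating-counter update: move one step toward the outcome."""
    i = STATES.index(hi)
    i = min(i + 1, len(STATES) - 1) if taken else max(i - 1, 0)
    return STATES[i]


hi = initial_state(taken=True)
for outcome in (True, True, False):
    hi = update(hi, outcome)
print(hi, predict(hi))  # prints: Tx True
```

With this kind of counter, a single untaken iteration at the end of a loop does not flip the prediction, which is the usual motivation for keeping two bits of history rather than one.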
2.3.2 BTC storage
When a transfer instruction is executed for the first time, BTC starts the storage task in the current clock cycle and writes the information about the instruction into a unit. For a BCC instruction, the initial state of 2BC is also set at this point. At the same time, the VI bit of the row is set to valid. BTC chooses the row to write as follows: after a reset or a cache clear, the rows are filled in order; once BTC is full, a row is selected at random for replacement.
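The storage policy can be sketched as a small behavioral model. The field names follow the text (TAG, DI, CC, TP, AN, HI, VI); the `target` field is an assumed name for the stored transfer target address, and the class names, method names, and the use of Python's `random` module in place of a hardware random source are all illustrative.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

BTC_ENTRIES = 16  # the text specifies 16 fully associative units


@dataclass
class BtcEntry:
    tag: int         # TAG: address of the transfer instruction
    target: int      # transfer target address (field name assumed)
    di: int          # DI: delay slot instruction word
    cc: int          # CC: condition code
    tp: str          # TP: transfer type, here "BCC" or "CALL"
    an: bool         # AN: A parameter present
    hi: str          # HI: 2BC state, e.g. "Nx" or "Tx"
    vi: bool = True  # VI: row valid


class Btc:
    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.rows: List[Optional[BtcEntry]] = [None] * BTC_ENTRIES
        self.rng = rng or random.Random()

    def clear(self) -> None:
        """Model a reset or cache clear: every row becomes invalid."""
        self.rows = [None] * BTC_ENTRIES

    def store(self, entry: BtcEntry) -> int:
        """Fill rows in order; once full, replace a randomly chosen row."""
        for i, row in enumerate(self.rows):
            if row is None or not row.vi:
                self.rows[i] = entry
                return i
        victim = self.rng.randrange(BTC_ENTRIES)
        self.rows[victim] = entry
        return victim

    def lookup(self, fetch_addr: int) -> Optional[BtcEntry]:
        """Fully associative search: compare the address with every valid TAG."""
        for row in self.rows:
            if row is not None and row.vi and row.tag == fetch_addr:
                return row
        return None
```

In hardware the TAG comparison happens in all 16 units in parallel; the loop in `lookup` only models the result of that comparison.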
2.3.3 BTC hit
At the beginning of the instruction fetch cycle, if the current fetch address matches a TAG in BTC and the VI bit of that row is valid, BTC is considered to have hit, and the hit task starts. The data of the hit row is read and DI is sent to the instruction bus. For a CALL instruction, the transfer target address becomes the address of the next instruction. For a BCC instruction, it must first be determined whether the jump occurs: if the flags are valid, the decision is made from the condition code and the flags; otherwise it is predicted from HI. The address of the next instruction is then the transfer target address if the jump occurs and PC+2 if it does not. For a BCC instruction with the A parameter, DI must be prevented from entering the decoding stage in the next clock cycle when the jump is not taken. The BTC hit process is shown in Figure 3.
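The decision made on a hit can be summarized in a few lines. This is a behavioral sketch of the description above, not the circuit of Figure 3; the function name, parameter names, and the returned tuple are hypothetical. The not-taken address follows the text's PC+2.

```python
from typing import Tuple


def btc_hit(pc: int, tp: str, target: int, an: bool,
            hi_predicts_taken: bool, flags_valid: bool,
            condition_true: bool) -> Tuple[int, bool, bool]:
    """Return (next_fetch_addr, di_enters_decode, used_prediction)."""
    if tp == "CALL":
        # An unconditional transfer always jumps and always runs its delay slot.
        return target, True, False
    if flags_valid:
        # Flags are ready: decide from the condition code and the flags.
        taken, used_prediction = condition_true, False
    else:
        # Flags are not ready yet: fall back on the 2BC state stored in HI.
        taken, used_prediction = hi_predicts_taken, True
    next_addr = target if taken else pc + 2
    # With the A parameter, DI is suppressed when the jump is not taken.
    di_enters_decode = taken or not an
    return next_addr, di_enters_decode, used_prediction


print(btc_hit(pc=100, tp="BCC", target=40, an=True,
              hi_predicts_taken=True, flags_valid=False, condition_true=False))
# prints: (40, True, True)
```

The third element of the result records whether the outcome was a prediction, which is the case that requires a check in the following cycle.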
2.3.4 BTC Check
If BTC hit in the previous cycle, the BTC check task starts at the beginning of the current cycle. If the jump of a BCC instruction was predicted from HI in the previous cycle, the decision must be re-evaluated once the flags become valid in the current clock cycle. If the prediction was wrong, it must be corrected: the correct instruction fetch address is supplied, and a request is issued to disable the decoding stage or the execution stage in the next clock cycle. At the same time, HI is updated according to the actual outcome of the jump and the HI update algorithm. The flowchart of the BTC check is shown in Figure 4.
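A minimal sketch of the check step follows. The function and parameter names are hypothetical, and the HI update again assumes a conventional saturating counter over the states N, Nx, Tx, T rather than the exact algorithm of Figure 2. Which stage is disabled on a misprediction is not modeled; the sketch only reports that a correction is needed.

```python
from typing import Optional, Tuple

STATES = ["N", "Nx", "Tx", "T"]  # assumed ordering of the 2BC states


def btc_check(pc: int, target: int, hi: str,
              predicted_taken: bool, actual_taken: bool
              ) -> Tuple[bool, Optional[int], str]:
    """Return (mispredicted, corrected_fetch_addr, new_hi)."""
    # Assumed HI update: step one state toward the actual outcome.
    i = STATES.index(hi)
    i = min(i + 1, len(STATES) - 1) if actual_taken else max(i - 1, 0)
    new_hi = STATES[i]
    if predicted_taken == actual_taken:
        return False, None, new_hi
    # Misprediction: redirect fetch to the address that should have been used.
    corrected = target if actual_taken else pc + 2
    return True, corrected, new_hi


print(btc_check(pc=100, target=40, hi="Tx",
                predicted_taken=True, actual_taken=False))
# prints: (True, 102, 'Nx')
```

HI is updated on every checked prediction, whether or not it was correct, so that the stored history reflects the actual behavior of the transfer instruction.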
3 Conclusion
The entire RISC CPU is described in Verilog HDL and has been simulated with standard programs. The simulation results show that the above method for handling transfer instructions significantly improves pipeline throughput. Because a delay slot instruction is inserted after each transfer instruction, a transfer instruction flows through the pipeline just as sequentially executed code does. Although BTC adds some hardware overhead, it means that a transfer instruction occupies almost no pipeline resources when it is executed again, which greatly improves the efficiency of the CPU.