Functional verification of TURBO51 embedded microprocessor-EEWORLD

1.1 Background

TURBO51 is an embedded microprocessor that adopts the mainstream, time-tested architecture of 32-bit machines. While strictly preserving compatibility with the 8051 instruction set, the architecture of the processor core is redefined to exploit parallelism in the processor structure. Within the traditional 8051 software development environment, it performs work that would otherwise require a wider 32-bit processor, and all existing software resources are fully reused. Although 8051 instructions mix multiple addressing modes and have variable length, the design implements a high-performance architecture with out-of-order issue, branch prediction, precise exception handling, speculative lookahead prefetching, and an on-chip first-level instruction cache. This architectural complexity places high demands on verification. Moreover, because TURBO51 is the embedded processor core of an SoC, it is the control core and user interface of the entire large-scale SoC. If verification of the embedded processor is incomplete, or its performance fails to meet the design requirements, the whole SoC project can fail. Verification of the embedded processor is therefore one of the most important parts of SoC design.

Any pursuit of a high-performance architecture must first rest on the correctness of the design. TURBO51 verification faces four major challenges:

(1) Correctness: The high-performance architecture is complex and high-risk. Only a correct design can improve SoC performance.

(2) Compatibility: Interrupts and exceptions frequently interrupt program execution. Unlike the traditional 8051, a dynamically scheduled pipeline executes instructions out of order, but from the program's point of view it must differ from fully sequential execution only in speed, never in results. The processor must therefore produce exactly the same exception behavior as sequential execution.

(3) Huge instruction and operand space with complex addressing: The 8051 instruction set has 111 instructions with multiple addressing modes and variable instruction lengths. It also accesses input/output device registers and architectural registers as the same kind of register. These properties make both architectural design and verification difficult.

(4) Measuring verification adequacy: During verification, the nature, cause, and distribution of the errors found are used to evaluate the degree of correctness and to adjust the subsequent verification plan, so that verification goes deeper and design errors converge quickly.
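
The compatibility requirement in item (2) is commonly met by letting instructions finish out of order while retiring them strictly in program order. The Python sketch below illustrates that idea with a hypothetical reorder buffer. The `RobEntry` fields and the `commit_in_order` function are assumptions made for illustration and do not describe the actual TURBO51 structures.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict


@dataclass
class RobEntry:
    """One reorder-buffer slot (hypothetical layout, not the TURBO51 one)."""
    dest: str            # architectural register written by the instruction
    value: int = 0       # result, meaningful once done is True
    done: bool = False   # set when execution finishes, in any order
    fault: bool = False  # set when execution raised an exception


def commit_in_order(rob: Deque[RobEntry], regs: Dict[str, int]) -> bool:
    """Retire finished entries from the head of the buffer, oldest first.

    Returns True if a faulting instruction reached the head. In that case
    every younger entry is discarded without updating regs, so the visible
    register state equals that of purely sequential execution.
    """
    while rob and rob[0].done:
        entry = rob.popleft()
        if entry.fault:
            rob.clear()              # squash all younger instructions
            return True
        regs[entry.dest] = entry.value
    return False
```

An entry that finished early but sits behind an unfinished or faulting instruction never changes `regs`, which is the property a compatibility test has to confirm.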

1.2 Current Status of Microprocessor Verification

At present, processor companies around the world mainly use three functional verification methods: simulation, formal verification, and hardware-accelerated simulation. In general, however, the instruction set is so large that exhaustive testing is impractical: the number of test vectors needed for complete coverage is on the order of the factorial of the number of instructions and of the number of operands and addresses, and cannot be run in a limited time. Unless every combination of instructions and operands has been tested, these methods only show that the design is correct where the tests reach; they cannot prove that it is correct in all cases.
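
For a rough feel of this growth, the sketch below counts ordered instruction sequences with repetition, which is a simpler measure than the factorial estimate above. The figure of 111 instructions comes from the text. The `operand_values` parameter is a hypothetical stand-in for the number of operand and address combinations per instruction.

```python
NUM_INSTRUCTIONS = 111  # size of the 8051 instruction set, as stated in the text


def sequence_count(length: int, operand_values: int = 1) -> int:
    """Number of distinct instruction sequences of the given length.

    operand_values is a hypothetical count of operand/address combinations
    per instruction; the default of 1 ignores operands entirely.
    """
    return (NUM_INSTRUCTIONS * operand_values) ** length


if __name__ == "__main__":
    # Even without operands, the count grows by a factor of 111 per instruction.
    for n in (1, 2, 4, 8):
        print(n, sequence_count(n))
```

Even short sequences quickly exceed what simulation can run, which is why the stimulus has to be targeted rather than exhaustive.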

Formal verification proves the completeness of a design mathematically; that is, its sample space is every possible state of the object under test. Tools of this kind include ArithSMV and *PHDD. Because the state space is huge, formal verification is applied only through property-checking tools and is currently limited to local logic verification.

Simulation verification includes RTL simulation and gate-level simulation. Its effectiveness is largely determined by the test stimulus and by the method used to judge the simulation results. In microprocessor verification, stimulus comes from hand-written assembly programs, from running operating systems and applications, and from randomly generated test vectors.

Hardware-accelerated simulation: To overcome the slow speed of simulation, a physical FPGA prototype can run the operating system and applications before tape-out, further verifying correctness at the system level.

2 Verification Method of TURBO51

TURBO51 uses formal verification, simulation, and hardware-accelerated simulation in its design. It combines bottom-up verification at the submodule level with top-down verification at the macromodule and system levels. Throughout the design process, verification and design are treated as one activity: the verification plan is written at the same time as the timing design document. Design and verification both start from the description of behavior and the definition of variables for every clock cycle in the design document and verification plan, which is the most important part of the whole effort.

Because TURBO51 must remain backward compatible with the traditional 8051 instruction set, six reference platforms serve as the standard of correctness: two 8051 hardware emulators that support single-step debugging, two traditional 8051s, and two improved 8051s with a simple pipeline. Each test stimulus is run on these platforms, and the results define correct execution and correct compatibility. The verification plan specifies, for every module, every register read and write in every clock cycle, and every design stage, the verification methods, expected results, problem distribution, and verification strategy. Test programs for simulation are written by hand. The verification document records how the correctness of the design is judged, which serious design flaws were found and what caused them, and which critical states the design document has already considered. This record is an important basis for later deciding whether a given error could occur in a given situation.

Coverage metrics were introduced at the documentation stage. Every piece of designed logic must be tested to show that it is necessary and that it functions correctly, and for every conditional decision in the functional design the test document gives the method for exercising that condition and the criterion for judging it. Writing the test methods often exposed situations that the design had not considered. This coverage-guided interaction between the functional design document and the verification document allowed TURBO51 to complete the timing design, the register definitions, and full coverage of all block-level tests before RTL coding began. Examples include renaming of the same physical address under multiple addressing modes in register renaming, out-of-order issue, and precise exceptions. In general, errors at the document level are easy to correct, whereas errors at the hardware level are harder to find, require larger changes, and easily introduce further errors.

At this stage it is relatively easy to enumerate cases by permutation and combination and to verify them formally for complete coverage, which eliminates most serious errors. The hand-written test programs prepared for simulation are reused in later verification. RTL coding is essentially a translation of the documents into Verilog, so it is not the most important part of the TURBO51 design. It can be completed quickly by following the functional design document and passing code inspection, although synthesis results should be used along the way to balance the pipeline stages and adjust details. Any RTL change that departs from the functional design document must first be made in the functional and verification documents, and the RTL may be changed only after these have been reviewed and approved again.

Simulation and RTL coding proceed together and are divided into three stages: module, macromodule, and system level. The next stage starts only when the design, verification, and documentation of the current stage fully meet the planned requirements, that is, code inspection and code coverage, so that errors converge quickly. Errors are classified as high-risk or low-risk. When behavior is abnormal, checking starts in the high-risk areas that affect the direction of program execution; once those errors are eliminated, errors in low-risk areas are pursued. After module-level RTL simulation come the macromodule level (instruction pipeline, LOAD/STORE, cache, and so on) and then system-level RTL simulation.

FPGA verification may begin only after the whole TURBO51 RTL has passed the coding-rule check, code coverage has reached the target set for RTL simulation, and the design description and verification documents have passed review. The bottom line of TURBO51 verification is therefore to eliminate every serious error that could cause a crash or a compatibility failure before FPGA prototyping. TURBO51 verification does not rely on a later stage to find errors that should have been found and fixed earlier; a later stage only confirms that the goals of the previous stage were met. The purpose of FPGA verification is to run applications in a real environment for a long time, because many responses to external signals are hard to reproduce in RTL simulation. It is not meant for finding and debugging problems that should have been eliminated in simulation.

3 Formal Verification

The advantage of formal verification is that it can traverse the entire state space and so achieve complete verification. In TURBO51 it was used from the behavioral specification stage onward to prove the completeness of the highest-risk combinations in memory access, cache, branch prediction, dynamic execution, and exception handling. For example, the replacement policy of the on-chip first-level instruction cache must handle every possible state, otherwise the state machine may deadlock. The first step is therefore to prove mathematically that only a limited set of states exists; only then are the functional behavior description and the verification plan written. This approach makes logic with a large error impact but a small state space completely correct, and in practice the formally verified logic showed no anomalies under any later test stimulus.
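
When the state space is small, the completeness argument for a state machine can be checked by plain enumeration. The Python sketch below walks every reachable state of a hypothetical cache-refill controller and reports any state and input pair with no defined successor. The state names, input names, and transition table are invented for illustration; they are not the TURBO51 replacement logic, and production formal tools work symbolically rather than by explicit enumeration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Transitions = Dict[Tuple[str, str], str]

# Hypothetical refill controller: IDLE -> EVICT -> FILL -> IDLE.
INPUTS = ["hit", "miss", "done"]
TABLE: Transitions = {
    ("IDLE", "hit"): "IDLE", ("IDLE", "miss"): "EVICT", ("IDLE", "done"): "IDLE",
    ("EVICT", "hit"): "EVICT", ("EVICT", "miss"): "EVICT", ("EVICT", "done"): "FILL",
    ("FILL", "hit"): "FILL", ("FILL", "miss"): "FILL", ("FILL", "done"): "IDLE",
}


def explore(initial: str, inputs: List[str],
            table: Transitions) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Breadth-first walk over every state reachable from `initial`.

    Returns (reachable, unhandled). Each entry of unhandled is a
    (state, input) pair with no next state, i.e. a potential deadlock.
    """
    reachable: Set[str] = {initial}
    unhandled: List[Tuple[str, str]] = []
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for symbol in inputs:
            nxt = table.get((state, symbol))
            if nxt is None:
                unhandled.append((state, symbol))
            elif nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)
    return reachable, unhandled
```

Calling `explore("IDLE", INPUTS, TABLE)` on a complete table returns an empty `unhandled` list; removing any table entry for a reachable state makes the missing pair show up there.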

Formal verification is also used during RTL code-style checking: formal tools compare the function of the RTL before and after each modification. A similar approach checks the equivalence of the physical netlist against the front-end netlist.
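
The idea behind such a functional comparison can be shown on a small combinational block. The sketch below compares two Python models of an 8-bit adder over every input combination. Both adder functions are hypothetical stand-ins for "RTL before and after modification"; real equivalence checkers operate on netlists with BDD or SAT engines rather than by brute force.

```python
from itertools import product
from typing import Callable, Optional, Tuple

AddFn = Callable[[int, int, int], Tuple[int, int]]


def add8_original(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Reference model: returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add8_rewritten(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Same adder restructured as two 4-bit halves with a ripple carry."""
    low = (a & 0x0F) + (b & 0x0F) + carry_in
    high = (a >> 4) + (b >> 4) + (low >> 4)
    return ((high & 0x0F) << 4) | (low & 0x0F), high >> 4


def first_mismatch(f: AddFn, g: AddFn) -> Optional[Tuple[int, int, int]]:
    """Return the first input (a, b, carry_in) where f and g differ, else None."""
    for a, b, c in product(range(256), range(256), (0, 1)):
        if f(a, b, c) != g(a, b, c):
            return a, b, c
    return None
```

A result of `None` from `first_mismatch(add8_original, add8_rewritten)` means the two versions agree on all 131072 input combinations; any other result is a concrete counterexample.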

4 RTL simulation and coverage and code style checking

4.1 RTL simulation

RTL coding and simulation begin only after the functional timing document, the RTL simulation plan, and the functional behavior description and verification plan for every submodule, macromodule, and the system-level design are complete. When the RTL of a submodule is finished, it is simulated against the behavioral-level model and checked with the code-checking facilities of the EDA tool. Simulation continues until the code coverage target is reached, after which code checking and coverage-driven simulation are repeated at the macromodule level and then at the system level. Most debugging and testing takes place at this stage, including verification of full compatibility with the 8051 standard, verification of high-risk areas, and runs of operating systems and applications.

Two measures are used: the code coverage of the designed logic reported by the EDA tool for the test stimulus, and the coverage of self-defined critical functions.

In simulation, stimulus for critical instruction combinations is written by hand in assembly language. Compatibility testing covers instruction set tests, traversal of the bit-addressable space, power-on values, register file read/write traversal, LS variable RAM traversal, code-space page switching, interrupt control, the standard 8051 peripherals, timing, I/O, extended peripherals, SoC bus reads and writes, PWM (pulse width modulation), online program burning, and basic applications. The basic applications are software I2C reads and writes, reading 64 KB of data from outside the chip, system tests, operating-system-based decoding of remote-control keys, and reading parameters from other on-chip devices. A simulation model of the flash memory is used throughout.

For instruction set testing, the stimulus is built in an existing commercial software development environment. For all 111 instructions, the tests check the result value, branch target, branch direction, and effect on the flag bits given for each instruction in the standard 8051 manual. The program is first single-stepped on the reference platform, and every state value of every instruction is recorded as the expected value. In the test program, the result is compared after each instruction. If it matches, execution continues and one I/O pin outputs a square wave. If it differs, execution enters a dead loop dedicated to the result, flag, or branch check of that instruction, and a second I/O pin outputs a different square wave; the address of the dead loop identifies which instruction failed and in what respect. The program is first run on the reference platform to confirm that it never enters a dead loop, and is then converted into a data file and loaded into the simulation model. Register file read/write tests are likewise based on the 8051 manual and distinguish the register files that correspond to different addressing modes. The most important observation point is the instruction commit address register, which records the path the processor actually followed; as long as it shows nothing abnormal, the test item is considered free of serious errors.

RTL simulation in TURBO51 has three parts: test stimulus generation, result checking, and coverage analysis. Critical conditions and basic test programs are written by hand, and real applications and operating systems are run once these pass. The completion criterion for this stage is the code and functional coverage check. While module-level RTL is being written, code style and simulation coverage are checked and synthesis is run to confirm that the critical path meets the design timing. As auxiliary verification, stimulus produced by an automatic instruction generator (an instruction library plus a generation controller) is also run on a behavioral-level 8051 instruction set simulator, and the results are compared one by one. Instructions whose results differ are recorded. Simulation stops when a branch outcome differs or when the simulated volume reaches a set size, at which point code coverage is checked.
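
The compare-after-every-instruction scheme can be sketched for a single instruction. The Python below models the documented result of the 8051 `ADD A,#data` instruction (accumulator plus the CY, AC, and OV flags) and reports the first test case where observed results diverge from it. The function names and the dictionary layout of a result are assumptions for illustration; returning the failing index plays the role of the per-instruction dead loop described above.

```python
from typing import Dict, Optional, Sequence, Tuple

State = Dict[str, int]


def ref_add(acc: int, operand: int) -> State:
    """Reference result of 8051 `ADD A,#data`: accumulator and CY, AC, OV."""
    total = acc + operand
    carry7 = total >> 8                                # carry out of bit 7
    carry6 = ((acc & 0x7F) + (operand & 0x7F)) >> 7    # carry out of bit 6
    carry3 = ((acc & 0x0F) + (operand & 0x0F)) >> 4    # carry out of bit 3
    return {"A": total & 0xFF, "CY": carry7, "AC": carry3, "OV": carry6 ^ carry7}


def first_divergence(cases: Sequence[Tuple[int, int]],
                     observed: Sequence[State]) -> Optional[Tuple[int, State, State]]:
    """Compare observed results with the reference, one test case at a time.

    Returns (index, expected, observed) for the first mismatch, or None
    when every case agrees.
    """
    for index, ((acc, operand), seen) in enumerate(zip(cases, observed)):
        expected = ref_add(acc, operand)
        if seen != expected:
            return index, expected, seen
    return None
```

In a real flow the `observed` list would come from the simulated design, and the reference values from the single-stepped benchmark platform rather than from a software model.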

4.2 Coverage and code style checking

The difficulty of simulation-based verification is that no test stimulus, whether taken from real applications or produced by automatic instruction generation, can prove that the entire processor is free of errors. The completion criterion for TURBO51 simulation is therefore as follows: once the errors have converged, test vectors are added until the design logic coverage reported by the EDA tool reaches 100% at the block level and 93% at the expression level, and functional coverage reaches 100%.
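
A completion criterion of this kind is easy to express as an automated gate. The sketch below checks measured percentages against the three targets quoted above. The metric names `block`, `expression`, and `functional` are labels chosen for this example, and the function is an assumption about how such a gate might look, not part of the TURBO51 flow.

```python
from typing import Dict, List

# Targets quoted in the text; the key names are labels chosen for this sketch.
TARGETS: Dict[str, float] = {"block": 100.0, "expression": 93.0, "functional": 100.0}


def unmet_targets(measured: Dict[str, float]) -> List[str]:
    """Return the metrics whose measured percentage is below its target.

    A metric missing from `measured` is treated as 0% and therefore unmet.
    """
    return [name for name, goal in TARGETS.items()
            if measured.get(name, 0.0) < goal]
```

An empty list means the stage may be signed off; otherwise the returned names show where more test vectors are needed.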

Functional coverage covers every critical point of the behaviors defined in the design specification and of the verification plan. Coverage checking reports the current total coverage together with the logic and input values of any module state that the test vectors have not exercised. These gaps point to holes in the test suite and guide the manual writing of tests aimed directly at the uncovered logic. Code coverage is also used in TURBO51 to remove redundant or duplicated logic, saving unnecessary critical-path overhead and logic resources.

Code checking uses the facilities provided by the EDA tools. It guards against code that would cause problems in synthesis or make simulation results disagree with the FPGA. The formal verification tool is used here to check the equivalence of the code before and after modification. Table 1 gives the RTL simulation code coverage of each module, and Table 2 gives the block coverage and expression coverage of the main modules under different test stimuli. Both are reported by Cadence Incisive.

Table 1. Code test coverage of the main modules.

Table 2. Coverage of the main modules under different test stimuli.

5 Physical Prototype Verification

Physical prototype verification is another important method commonly used in ASIC design. The RTL description is synthesized and optimized for the target FPGA device, placed and routed, and checked by static timing analysis, giving a second physical implementation of the ASIC design. This prototype is closer to the real ASIC than RTL simulation and can functionally replace the ASIC on the system board, although its maximum speed is generally less than half that of the ASIC. Once these steps are complete and the design description and verification documents have passed review, FPGA hardware-accelerated simulation is used to check compatibility and correctness in the full system application environment and to make preliminary performance measurements. Compared with simulation, it runs the system several orders of magnitude faster.

The precondition for TURBO51 FPGA verification is that the design has passed formal verification of its key points, has completed RTL simulation and code checking with 100% block coverage, and that errors have converged. The primary purpose of FPGA verification is therefore to confirm the error estimates from those two steps by running the complete target application system in exactly the real application environment, and to carry out SoC co-verification with the other SoC modules, since systems that are inconvenient to simulate are easy to verify on the FPGA platform.

The spare resources of the FPGA are used to locate and monitor, in real time, the state of the FPGA implementation of TURBO51 at every clock. This largely removes what was once considered a weakness of FPGA prototypes, namely that errors are hard to locate. While the system runs in a real environment, the monitor provides an observation window with debugging capability very close to that of RTL simulation. The main observation points are again the instruction commit address, the instruction fetch address, the accumulator, the B register, the program status word (PSW), the reorder buffer status, the exception handling flags, the write-back bus, and the commit bus. Their state is displayed at every clock and correlated with the outputs of the other SoC modules, and the output waveforms are observed on an oscilloscope to form the FPGA verification results.

TURBO51 runs at 60 MHz during FPGA verification. In addition to all the hand-written simulation test programs, it ran all existing mass-produced RTOS-based commercial systems, including their extreme conditions, for two hundred hours continuously without any serious error. Real-time monitoring of register values revealed no more than ten non-fatal peripheral errors, for example in the multiplexing of GPIO and peripheral inputs and outputs.

Every change to the RTL or to the monitored registers requires regenerating the FPGA programming file, which takes nearly two hours for TURBO51, so the FPGA still cannot replace simulation. After FPGA verification, the design is synthesized and put through static timing analysis with the standard cell library of the foundry process chosen for tape-out, and the netlist is handed over for back-end place and route. Gate-level simulation is then run on the back-end gate-level netlist with gate delays, and finally the test program for the sample chips is written.

6 Verification Results Analysis

Because the implementation and the verification plan were developed together from the start, errors were recorded cumulatively over the entire design phase. Formal verification was used first to prove the highest-risk combinations in memory access, cache, branch prediction, dynamic execution, and exception handling, so that these errors were eliminated early. In later verification, no anomaly occurred in any part that had been formally verified, as shown in Figure 1.

Figure 1. Cumulative error statistics over time.

In this way, all high-risk errors had been eliminated by the middle of RTL simulation, most of them by hand-written test stimuli. Because the 8051 instruction space is relatively small, writing stimuli by hand is feasible. Most errors found in RTL simulation were I/O device errors unrelated to instruction execution in the processor. As Figure 2 shows, 99.7% of the errors had converged before FPGA verification, which indicates that the earlier work was solid and effective. If a design has not converged by the FPGA stage, and many new errors, especially serious ones, are still being found, then the simulation, the behavioral model description, and the verification plan have serious problems and the work should go back and be redone; otherwise the risk at tape-out is high.
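
A convergence figure such as the 99.7% above is simply the share of all recorded errors that were found before a given stage. The sketch below shows one way to compute it from a per-stage error log. The function name and the `(stage, count)` input format are assumptions for illustration, and no TURBO51 error counts are reproduced here.

```python
from typing import Sequence, Tuple


def share_found_before(stage: str, counts: Sequence[Tuple[str, int]]) -> float:
    """Percentage of all recorded errors found in stages preceding `stage`.

    counts is an ordered sequence of (stage_name, errors_found) pairs,
    earliest stage first. Raises ValueError if `stage` is not listed.
    """
    names = [name for name, _ in counts]
    cutoff = names.index(stage)
    total = sum(found for _, found in counts)
    if total == 0:
        return 100.0                 # nothing recorded, nothing left to find
    earlier = sum(found for _, found in counts[:cutoff])
    return 100.0 * earlier / total
```

Tracking this value as stages complete gives an early warning when a late stage starts contributing a large share of the errors.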

Figure 2. Statistics of error distribution found in different verification stages.

7 Conclusion and Future Work

The TURBO51 embedded microprocessor uses the verification methods described above to make the more serious errors converge early. The high RTL code coverage, together with the long-term successful operation of all target applications and all simulation test programs on the FPGA, indicates that the design is correct and fully compatible. This enabled TURBO51 to tape out successfully at the first attempt in the 90 nm CMOS process of Fujitsu Microelectronics (Japan). On the other hand, configurable, constrained automatic random instruction sequences are increasingly used in the verification of more complex processors. TURBO51 verification is still at an early stage in this respect, and this will be the main direction of future improvement.
