In real-time control systems, computation speed is the most important criterion when selecting a microcontroller, and the instruction cycle is a key indicator of that speed. This article therefore analyzes and tests the instruction cycles of three representative microcontrollers: the AT89S51, the LPC2114 (ARM7TDMI core), and the TMS320F2812. To observe the instruction cycle, a GPIO pin on each controller is configured as a digital output and is set and cleared in a loop; the period of the whole loop is read from the waveform on the pin. To relate the loop period to the execution time of each individual instruction, the assembly instructions generated from the C source program are examined and the cycle count of each one is worked out.
1 AT89S51 working mechanism and instruction cycle test
The AT89S51 uses its internal clock mode, in which the clock generator divides the oscillator pulses by 2. One clock (state) period therefore spans two oscillator periods (phase P1 plus phase P2), and since 1 machine cycle contains 6 states, 1 machine cycle spans 12 oscillator periods. With an 11.0592 MHz quartz crystal, the machine cycle is 12/11.0592 = 1.0851 μs. An instruction on a 51-series microcontroller takes 1 to 4 machine cycles: most instructions take 1 cycle, and the rest take 2 or 4 cycles.
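The machine-cycle arithmetic above can be sketched in C as follows. This is only an illustration: the helper name machine_cycle_us and the constant name are assumptions, not part of the original test program.

```c
/* Oscillator periods per machine cycle on a classic 8051 core such as the
 * AT89S51: 6 states x 2 phases (P1, P2). */
#define OSC_PERIODS_PER_MACHINE_CYCLE 12.0

/* Machine cycle in microseconds for a crystal frequency given in MHz.
 * (machine_cycle_us is a hypothetical helper name used for illustration.) */
double machine_cycle_us(double fosc_mhz)
{
    return OSC_PERIODS_PER_MACHINE_CYCLE / fosc_mhz;
}
```

Calling machine_cycle_us(11.0592) gives roughly 1.0851, the value used in the rest of this section.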
In order to observe the instruction cycle, the lowest bit of the P1 port of the microcontroller is set and cleared cyclically. The source program is as follows:
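#include <reg51.h>  /* header name lost in the source; Keil's reg51.h is assumed */
main()
{
    while(1)
    {
        P1=0x01;    /* set the lowest bit of port P1 */
        P1=0x00;    /* clear it again */
    }
}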
Compile and link the program with Keil uVision2 to generate the executable file. Starting Debug in the integrated environment then shows the mixed-mode disassembly of the source program above:
2: main()
3: {
4: while(1)
5: {
6: P1=0x01;
0x000F  759001  MOV   P1(0x90),#0x01
7: P1=0x00;
0x0012  E4      CLR   A
0x0013  F590    MOV   P1(0x90),A
8: }
0x0015  80ED    SJMP  main (C:0003)
In the listing, the lines that begin with a source line number (for example "6:") are the C source program, and the lines that begin with an address are the assembly code generated for the preceding C statement. In each line of assembly code, the first column is the address of the instruction in memory, the second column is the machine code, and the rest is the assembly instruction. The loop takes 6 machine cycles in total: "MOV P1(0x90),#0x01" takes 2 machine cycles, "CLR A" and "MOV P1(0x90),A" take 1 machine cycle each, and the final jump takes 2 machine cycles. The loop period is therefore 6 × 1.0851 μs = 6.51 μs.
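The cycle count above can be reproduced with a short C sketch that sums the machine cycles of the instructions in the loop and converts the total to microseconds. The function name loop_period_us is an assumption made for this illustration.

```c
#include <stddef.h>

/* Sum the machine cycles of the instructions in a loop and convert the
 * total to microseconds. One machine cycle is 12 oscillator periods.
 * (loop_period_us is a hypothetical helper name used for illustration.) */
double loop_period_us(const unsigned *cycles, size_t count, double fosc_mhz)
{
    unsigned total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += cycles[i];
    }
    return total * 12.0 / fosc_mhz;
}
```

With the cycle counts {2, 1, 1, 2} for MOV, CLR, MOV and SJMP and an 11.0592 MHz crystal, the function returns about 6.51.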
Figure 1 The waveform of the lowest bit of P1 port
Download the executable file to the Flash of the AT89S51 and run it to obtain the waveform on the lowest bit of port P1, shown in Figure 1. The measured loop period is 6.51 μs, which agrees with the analysis above.
2 LPC2114 working mechanism and instruction cycle test
The LPC2114 is an encryptable microcontroller based on the ARM7TDMI core, with 128 KB of zero-wait on-chip Flash and 16 KB of SRAM. Its clock frequency can reach 60 MHz. In this test the crystal frequency is 11.0592 MHz, the core clock is set to 11.0592 × 4 = 44.2368 MHz, and the on-chip peripheral clock is 1/4 of the core clock, that is, equal to the crystal frequency. The ARM7TDMI core speeds up the instruction stream with a three-stage pipeline and a large number of internal registers and provides about 0.9 MIPS/MHz, so the average instruction cycle is 1/(0.9 × 44.2368) = 0.02512 μs, or about 25 ns.
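As a rough check of the figures above, the following C sketch derives the core clock from the crystal frequency and the multiplier, and then the average instruction time from the MIPS/MHz rating. Both function names are assumptions made for illustration.

```c
/* Core clock in MHz from the crystal frequency and the PLL multiplier.
 * (core_clock_mhz is a hypothetical helper name.) */
double core_clock_mhz(double fosc_mhz, unsigned pll_multiplier)
{
    return fosc_mhz * pll_multiplier;
}

/* Average instruction time in nanoseconds for a core that delivers
 * mips_per_mhz at a core clock of cclk_mhz.
 * (avg_instruction_ns is a hypothetical helper name.) */
double avg_instruction_ns(double cclk_mhz, double mips_per_mhz)
{
    return 1000.0 / (mips_per_mhz * cclk_mhz);
}
```

For an 11.0592 MHz crystal, a multiplier of 4 and 0.9 MIPS/MHz, this gives a 44.2368 MHz core clock and an average instruction time of about 25.1 ns. This is an average figure only; it says nothing about how long a particular instruction takes.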
In order to observe the instruction cycle, the P0.25 pin of the GPIO in LPC2114 is set as an output port, and the set and clear operations are performed on it cyclically. The C source program is as follows:
#include "config.h"

// P0.25 pin output
#define LEDCON 0x02000000

int main(void)
{
    // Set all pins to connect to GPIO
    PINSEL0 = 0x00000000;
    PINSEL1 = 0x00000000;
    // Set the LED4 control port to output
    IO0DIR = LEDCON;
    while(1)
    {
        IO0SET = LEDCON;
        IO0CLR = LEDCON;
    }
    return(0);
}
ADS1.2 is used to compile and link to generate an executable file. When AXD Debugger is called, the disassembled code of the above source program can be obtained:
main      [0xe59f1020]  ldr  r1,0x40000248
40000224  [0xe3a00000]  mov  r0,#0
40000228  [0xe5810000]  str  r0,[r1,#0]
4000022c  [0xe5810004]  str  r0,[r1,#4]
40000230  [0xe3a00780]  mov  r0,#0x2000000
40000234  [0xe1c115c0]  bic  r1,r1,r0,asr #11
40000238  [0xe5810008]  str  r0,[r1,#8]
4000023c  [0xe5810004]  str  r0,[r1,#4]
40000240  [0xe581000c]  str  r0,[r1,#0xc]
40000244  [0xeafffffc]  b    0x4000023c
40000248  [0xe002c000]  dcd  0xe002c000
In each line, the first column is the address of the instruction in memory, the second column is the machine code, and the rest is the assembly instruction. The loop itself consists of the following three instructions:
4000023c  [0xe5810004]  str  r0,[r1,#4]
40000240  [0xe581000c]  str  r0,[r1,#0xc]
40000244  [0xeafffffc]  b    0x4000023c
Load the program into RAM from AXD Debugger and run it to obtain the output waveform of P0.25 for the loop, shown in Figure 2. The figure shows that the pin stays high for about 350 ns and low for about 450 ns in each loop period. In other words, the instructions "str r0,[r1,#4]" and "str r0,[r1,#0xc]" each take about 350 ns, and the branch instruction takes about 100 ns. These times are far longer than the nominal 25 ns instruction cycle, mainly for the following reasons: ① most ARM instructions are single-cycle, but some (such as multiplication instructions) are multi-cycle; ② a microcontroller based on the ARM core can access memory only through load, store and swap instructions, so reading data from memory or writing data to memory requires an additional clock cycle; ③ accessing an on-chip peripheral requires an additional peripheral clock cycle. In addition, each instruction needs 1 clock cycle of its own, and flushing the pipeline on a branch requires an additional clock cycle.
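One way to read the waveform is sketched below, under the assumption that the pin changes state when each store completes: the high interval then covers the store to IO0CLR, and the low interval covers the branch plus the next store to IO0SET. The struct and function names are hypothetical and exist only for this illustration.

```c
/* Execution times, in nanoseconds, of the three instructions in the loop. */
struct loop_times {
    unsigned str_set_ns;  /* str r0,[r1,#4]   : write to IO0SET */
    unsigned str_clr_ns;  /* str r0,[r1,#0xc] : write to IO0CLR */
    unsigned branch_ns;   /* b 0x4000023c */
};

/* The pin is high from the end of the IO0SET store until the end of the
 * IO0CLR store (assumption: the pin changes when the store completes). */
unsigned high_time_ns(struct loop_times t)
{
    return t.str_clr_ns;
}

/* The pin is low while the branch and the next IO0SET store execute. */
unsigned low_time_ns(struct loop_times t)
{
    return t.branch_ns + t.str_set_ns;
}
```

With the per-instruction times quoted above (about 350 ns per STR and about 100 ns for the branch), this model gives a high time of 350 ns and a low time of 450 ns.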
Figure 2 GPIO P0.25 pin output waveform
To observe the multiplication instruction, the following assembly-language experiment is used. First, the assembly source program without the multiplication instruction:
        INCLUDE LPC2294.INC         ; Import header file
; P0.25 pin controls LED4, low level lights it up
LEDCON  EQU     0x02000000
        EXPORT  MAIN
; Declare program code block
        AREA    LEDCONC,CODE,READONLY
; Load register address, PINSEL0
MAIN    LDR     R0,=PINSEL0
; Set data, that is, set the pins to connect to GPIO
        MOV     R1,#0x00000000
        STR     R1,[R0]             ; [R0] ← R1
        LDR     R0,=PINSEL1
        STR     R1,[R0]
        LDR     R0,=IO0DIR
        LDR     R1,=LEDCON
; Set LED control port to output
        STR     R1,[R0]
; Set GPIO control parameter
LOOP    LDR     R1,=LEDCON
LEDSET  LDR     R0,=IO0SET
; LED control I/O set, that is, LED4 turns off
        STR     R1,[R0]
LEDCLR  LDR     R0,=IO0CLR
; LED control I/O reset, that is, LED4 lights up
        STR     R1,[R0]
; Jump to LOOP unconditionally
        B       LOOP
        END                         ; end of source file (required by the ARM assembler)
The assembly code compiled and linked using ADS1.2 is:
LOOP      [0xe3a01780]  mov  r1,#0x2000000
LEDSET    [0xe59f0028]  ldr  r0,0x40000128
400000fc  [0xe5801000]  str  r1,[r0,#0]
LEDCLR    [0xe59f0024]  ldr  r0,0x4000012c
40000104  [0xe5801000]  str  r1,[r0,#0]
40000108  [0xeafffff9]  b    LOOP
Load the program into RAM from AXD Debugger and run it to obtain the output waveform of GPIO pin P0.25 for the loop, shown in Figure 3. The figure shows that the pin stays high for about 450 ns and low for about 550 ns in each loop period.
Figure 3 GPIO P0.25 pin output waveform 2
Add a multiplication instruction to the LOOP part of the above example, that is, change the loop part to:
LOOP    LDR     R1,=LEDCON
LEDSET  LDR     R0,=IO0SET
        STR     R1,[R0]
        MOV     R2,#0x0234
        MUL     R2,R1,R2
LEDCLR  LDR     R0,=IO0CLR
        STR     R1,[R0]
        B       LOOP
The assembly code compiled and linked using ADS1.2 is:
LOOP      [0xe3a01780]  mov  r1,#0x2000000
LEDSET    [0xe59f0030]  ldr  r0,0x40000130
400000fc  [0xe5801000]  str  r1,[r0,#0]
40000100  [0xe3a02f8d]  mov  r2,#0x234
40000104  [0xe0020291]  mul  r2,r1,r2
LEDCLR    [0xe59f0024]  ldr  r0,0x40000134
4000010c  [0xe5801000]  str  r1,[r0,#0]
40000110  [0xeafffff7]  b    LOOP
Load the program into RAM from AXD Debugger and run it to obtain the output waveform of GPIO pin P0.25 for the loop, shown in Figure 4. The figure shows that the pin stays high for about 550 ns and low for about 550 ns in each loop period. Compared with the previous example, the high time grows from about 450 ns to about 550 ns, so the added MUL and MOV instructions take about 100 ns in total.
In summary, the following conclusions can be drawn for ARM instructions running from RAM: the instructions "str r0,[r1,#4]" and "str r0,[r1,#0xc]" each take about 350 ns, equivalent to 14 instruction cycles; the instruction "ldr r0,0x4000012c" takes about 100 ns, equivalent to 4 instruction cycles; the MUL and MOV instructions together take about 100 ns, equivalent to 4 instruction cycles; and the branch instruction takes about 100 ns, equivalent to 4 instruction cycles.
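The conversion from measured time to instruction cycles used in this summary can be sketched as follows, taking the nominal instruction cycle as about 25 ns. The function name ns_to_cycles is an assumption made for this illustration.

```c
/* Convert a measured time to a whole number of nominal instruction cycles,
 * rounding to the nearest cycle.
 * (ns_to_cycles is a hypothetical helper name used for illustration.) */
unsigned ns_to_cycles(double time_ns, double cycle_ns)
{
    return (unsigned)(time_ns / cycle_ns + 0.5);
}
```

With a 25 ns cycle, 350 ns corresponds to 14 cycles and 100 ns to 4 cycles. Using the unrounded 25.12 ns figure gives the same whole-cycle results.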
3 TMS320F2812 working mechanism and instruction cycle test
The TMS320F2812 is a high-performance, cost-effective 32-bit fixed-point DSP for control applications produced by TI. It can run at up to 150 MHz (set to 100 MHz in this article) and has 18K × 16 bits of zero-wait on-chip SRAM and 128K × 16 bits of on-chip Flash (access time 36 ns). The TMS320F2812 uses a Harvard bus structure, so it can perform an instruction fetch, a data read and a data write in the same clock cycle. It also uses an 8-stage pipeline to raise instruction throughput.
In order to observe the instruction cycle, the GPIOA0 of TMS320F2812 is set and cleared repeatedly. The C source program is as follows:
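#include "DSP28_Device.h"

void main(void) {
    InitSysCtrl();       /* Initialize system */
    DINT;                /* Disable interrupts */
    IER = 0x0000;
    IFR = 0x0000;
    InitPieCtrl();       /* Initialize PIE control register */
    InitPieVectTable();  /* Initialize PIE vector table */
    InitGpio();          /* Initialize GPIO */
    EINT;
    ERTM;
    for(;;) {
        /* Six writes of 0xFFFF and three writes of 0x0000, matching the
           six and three MOV instructions in the disassembly of this loop */
        GpioDataRegs.GPADAT.all=0xFFFF;
        GpioDataRegs.GPADAT.all=0xFFFF;
        GpioDataRegs.GPADAT.all=0xFFFF;
        GpioDataRegs.GPADAT.all=0xFFFF;
        GpioDataRegs.GPADAT.all=0xFFFF;
        GpioDataRegs.GPADAT.all=0xFFFF;
        GpioDataRegs.GPADAT.all=0x0000;
        GpioDataRegs.GPADAT.all=0x0000;
        GpioDataRegs.GPADAT.all=0x0000;
    }
}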
Figure 4 GPIO P0.25 pin output waveform 3
The key steps are initializing the general-purpose input/output and fixing the CPU clock. The system clock is set to 100 MHz through the PLL, and the source code of InitGpio() is:
#include "DSP28_Device.h"

void InitGpio(void)
{
    EALLOW;
    // Multiplexer is selected as digital I/O
    GpioMuxRegs.GPAMUX.all=0x0000;
    // GPIOA0 is output, the rest are inputs
    GpioMuxRegs.GPADIR.all=0x0001;
    GpioMuxRegs.GPAQUAL.all=0x0000;
    EDIS;
}
Setting a breakpoint at the for(;;) in the main program makes it easy to find the assembly instructions generated for the loop:
3F8011        L1:
3F8011  761F  MOVW  DP,#0x01C3
3F8013  2820  MOV   @32,#0xFFFF
3F8015  2820  MOV   @32,#0xFFFF
3F8017  2820  MOV   @32,#0xFFFF
3F8019  2820  MOV   @32,#0xFFFF
3F801B  2820  MOV   @32,#0xFFFF
3F801D  2820  MOV   @32,#0xFFFF
3F801F  2B20  MOV   @32,#0
3F8020  2B20  MOV   @32,#0
3F8021  2B20  MOV   @32,#0
3F8022  6FEF  SB    L1,UNC
In each line, the first column is the address of the instruction in RAM, the second column is the machine code, and the rest is the assembly instruction. The instruction "MOV @32,#0xFFFF" drives the GPIO high, and the instruction "MOV @32,#0" drives it low. Six instructions drive GPIOA0 high and three drive it low. At 100 MHz the instruction cycle is 10 ns, so the high time in each loop period should be 60 ns. Running the program from H0 SARAM under the debugger gives the GPIOA0 waveform shown in Figure 5, in which the high time is exactly 60 ns. Note that the low time is longer than the three write cycles alone, because the branch that follows them flushes the pipeline.
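The operand "@32" in the listing is a direct-addressing offset relative to the data page selected by "MOVW DP,#0x01C3". A minimal C sketch of how that pair resolves to a data address is given below, assuming the usual C28x scheme in which DP selects a 64-word page and the instruction supplies a 6-bit offset. The function name is hypothetical.

```c
#include <stdint.h>

/* C28x direct addressing (assumed scheme): DP selects a 64-word data page
 * and the instruction supplies a 6-bit offset within that page.
 * (c28x_direct_address is a hypothetical helper name.) */
uint32_t c28x_direct_address(uint16_t dp, uint8_t offset6)
{
    return ((uint32_t)dp << 6) | (uint32_t)(offset6 & 0x3Fu);
}
```

For DP = 0x01C3 and an offset of 32 this evaluates to 0x70E0, which should correspond to the address of GPADAT in the F2812 register map.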
Figure 5 Waveform 1 of GPIOA0 in TMS320F2812
In order to observe the cycle of the multiplication instruction, modify the C source program of the above loop part to:
for(;;)
{
    Uint16 test1,test2,test3;
    test1=0x1234; test2=0x2345;
    GpioDataRegs.GPADAT.all=0xFFFF;
    GpioDataRegs.GPADAT.all=0xFFFF;
    GpioDataRegs.GPADAT.all=0xFFFF;
    test3=test1*test2;
    GpioDataRegs.GPADAT.all=0x0000;
    GpioDataRegs.GPADAT.all=0x0000;
    GpioDataRegs.GPADAT.all=0x0000;
}
The assembly instructions of the above program after compilation and linking are as follows:
3F8012        L1:
3F8012  2841  MOV   *-SP[1],#0x1234
3F8014  2842  MOV   *-SP[2],#0x2345
3F8016  761F  MOVW  DP,#0x01C3
3F8018  2820  MOV   @32,#0xFFFF
3F801A  2820  MOV   @32,#0xFFFF
3F801C  2820  MOV   @32,#0xFFFF
3F801E  2D42  MOV   T,*-SP[2]
3F801F  1241  MPY   ACC,T,*-SP[1]
3F8020  9643  MOV   *-SP[3],AL
3F8021  2B20  MOV   @32,#0
3F8022  2B20  MOV   @32,#0
3F8023  2B20  MOV   @32,#0
3F8024  6FEE  SB    L1,UNC
GPIOA0 is still held high for 6 instruction cycles: three writes of 0xFFFF plus the three instructions that perform the multiplication (including one MPY). Because the multiplication instruction is also single-cycle, the high time in each loop period is again 60 ns. Running the program from H0 SARAM under the debugger gives the GPIOA0 waveform shown in Figure 6, in which the high time is exactly 60 ns. The three low writes are followed by a branch, which flushes the pipeline, and by the instructions that prepare the operands for the multiplication, so the low time is longer than in Figure 5. When observing with a digital oscilloscope, if the waveform seen with the probe set to ×1 is not ideal, switch the probe to ×10 and adjust its compensation trimmer.
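The 60 ns figure follows directly from the clock setting: six single-cycle instruction slots separate the first write of 0xFFFF from the first write of 0, and each slot lasts one CPU clock period. A minimal C sketch of that calculation is shown below; the function names are assumptions made for illustration.

```c
/* CPU clock period in nanoseconds for a system clock given in MHz.
 * (cycle_ns is a hypothetical helper name.) */
double cycle_ns(double sysclk_mhz)
{
    return 1000.0 / sysclk_mhz;
}

/* Duration in nanoseconds of n back-to-back single-cycle instructions,
 * ignoring pipeline flushes and wait states.
 * (duration_ns is a hypothetical helper name.) */
double duration_ns(unsigned n_instructions, double sysclk_mhz)
{
    return n_instructions * cycle_ns(sysclk_mhz);
}
```

At 100 MHz, duration_ns(6, 100.0) gives 60 ns. The same model would predict 30 ns for the three low writes, which is why the longer measured low time points to the cost of the branch and the extra instructions.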
Figure 6 Waveform 2 of GPIOA0 in TMS320F2812
4 Comparison of three microprocessors
First, it should be stressed that all of these microcontrollers can shorten their instruction cycle by raising the oscillator frequency, but that frequency is limited: a 51-series MCU does not exceed 40 MHz, the LPC2114 does not exceed 60 MHz, and the TMS320F2812 tops out at 150 MHz. At the same operating frequency, an ARM core executes instructions much faster than a traditional MCU, because a traditional MCU has no pipeline whereas the ARM core and the DSP both do. However, because accesses to peripherals, RAM and other memories cost additional clock cycles, the ARM cannot truly achieve single-cycle operation, and in particular it has no single-cycle multiplication instruction. The DSP does achieve a true single-cycle multiplication instruction, and its speed is much higher than that of the ARM microcontroller.