Testing and analysis of 8051, ARM and DSP instruction cycles

Publisher: 三青  Latest update time: 2011-05-29

In real-time control systems, computing speed is the most important criterion when selecting a microcontroller, and the instruction cycle is a key indicator of that speed. This paper therefore analyzes and measures the instruction cycles of three representative controllers: the AT89S51 (8051 family), the LPC2114 (ARM7TDMI core) and the TMS320F2812 (DSP). To make the instruction cycle observable, a GPIO pin on each controller is configured as a digital output and repeatedly set and cleared in a loop, and the loop period is read from the pin waveform. To relate that period to the cost of each individual instruction, the C source program is compiled and the resulting assembly instructions are examined, so that the cycles taken by each assembly instruction can be calculated.

1 AT89S51 working mechanism and instruction cycle test

The AT89S51 uses its internal clock mode, in which the clock generator divides the oscillator pulses by 2. One clock (state) cycle therefore equals two oscillator periods (state = oscillator period P1 + oscillator period P2), and since one machine cycle contains 6 states, one machine cycle spans 12 oscillator periods. With an 11.0592 MHz quartz crystal, the machine cycle is 12/11.0592 MHz ≈ 1.0851 μs. An instruction on the 51 series takes 1 to 4 machine cycles: most instructions are single-cycle, and the rest take 2 or 4 cycles.

To observe the instruction cycle, the lowest bit of port P1 is repeatedly set and cleared in a loop. The source program is as follows:

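#include <reg51.h>   /* assumed header name: Keil C51 SFR definitions, including P1 */
main()
{
    while(1)
    {
        P1=0x01;     /* P1.0 high */
        P1=0x00;     /* P1.0 low  */
    }
}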

The program is compiled and linked with Keil uVision2 to produce an executable file. Starting the debugger in the IDE displays the program in mixed mode (C source interleaved with disassembly):

2: main()
3: {
4:     while(1)
5:     {
6:         P1=0x01;
0x000F 759001   MOV  P1(0x90),#0x01
7:         P1=0x00;
0x0012 E4       CLR  A
0x0013 F590     MOV  P1(0x90),A
8:     }
0x0015 80ED     SJMP main(C:0003)

Lines beginning with a source line number (for example "6:") are the C source; the lines beneath each one are the assembly generated for it. In each assembly line, the first column is the code address, the second is the machine code, and the rest is the compiled and linked assembly instruction. One pass through the loop takes 6 machine cycles: "MOV P1(0x90),#0x01" takes 2, "CLR A" and "MOV P1(0x90),A" take 1 each, and the final jump "SJMP" takes 2. The loop period is therefore 6 × 1.0851 μs ≈ 6.51 μs.

Figure 1 The waveform of the lowest bit of P1 port

After the compiled and linked executable is downloaded to the Flash of the AT89S51 and run, the lowest bit of port P1 produces the waveform shown in Figure 1. The measured loop period is 6.1 μs, close to the 6.51 μs predicted by the analysis above.

2 LPC2114 working mechanism and instruction cycle test

The LPC2114 is a microcontroller with code encryption support built around the ARM7TDMI core, with 128 KB of zero-wait-state on-chip Flash and 16 KB of SRAM. Its clock can reach 60 MHz; here the crystal is 11.0592 MHz, the CPU clock is set to 11.0592 × 4 = 44.2368 MHz, and the on-chip peripheral clock is 1/4 of the CPU clock, i.e. equal to the crystal frequency. The ARM7TDMI core speeds up the instruction stream with a three-stage pipeline and a large set of internal registers, delivering about 0.9 MIPS/MHz, so the average instruction cycle is 1/(0.9 × 44.2368 MHz) ≈ 0.02512 μs, or about 25 ns.

To observe the instruction cycle, pin P0.25 of the LPC2114 GPIO is configured as an output and repeatedly set and cleared in a loop. The C source program is as follows:

#include "config.h"

// P0.25 pin output
#define LEDCON 0x02000000

int main(void)
{
    // Connect all pins to GPIO
    PINSEL0 = 0x00000000;
    PINSEL1 = 0x00000000;
    // Set the LED4 control pin as output
    IO0DIR = LEDCON;
    while (1)
    {
        IO0SET = LEDCON;    // P0.25 high
        IO0CLR = LEDCON;    // P0.25 low
    }
    return (0);
}

The program is compiled and linked with ADS 1.2 into an executable file; AXD Debugger then shows the following disassembly:

main     [0xe59f1020]  ldr   r1,0x40000248
40000224 [0xe3a00000]  mov   r0,#0
40000228 [0xe5810000]  str   r0,[r1,#0]
4000022c [0xe5810004]  str   r0,[r1,#4]
40000230 [0xe3a00780]  mov   r0,#0x2000000
40000234 [0xe1c115c0]  bic   r1,r1,r0,asr #11
40000238 [0xe5810008]  str   r0,[r1,#8]
4000023c [0xe5810004]  str   r0,[r1,#4]
40000240 [0xe581000c]  str   r0,[r1,#0xc]
40000244 [0xeafffffc]  b     0x4000023c
40000248 [0xe002c000]  dcd   0xe002c000

In each line, the first column is the code address, the second is the machine code, and the rest is the compiled and linked assembly instruction. The critical part is the loop, which consists of these three instructions:

4000023c [0xe5810004]  str   r0,[r1,#4]
40000240 [0xe581000c]  str   r0,[r1,#0xc]
40000244 [0xeafffffc]  b     0x4000023c

Loading the program into RAM with AXD Debugger and running it gives the P0.25 output waveform of the loop, shown in Figure 2. In each loop period the pin stays high for about 350 ns and low for about 450 ns. The high phase covers only "str r0,[r1,#0xc]", and the low phase covers the jump plus "str r0,[r1,#4]", so each of the two store instructions takes about 350 ns and the jump about 100 ns. The main reasons are: ① most ARM instructions are single-cycle, but some (such as multiplication) take several cycles; ② an ARM-based controller can access memory only through load, store and swap instructions, so every read from or write to memory costs extra clock cycles; ③ accessing an on-chip peripheral costs additional peripheral clock cycles. On top of this, every instruction needs at least one clock cycle of its own, and a jump costs additional cycles to refill the pipeline.

Figure 2 GPIO P0.25 pin output waveform

To observe the multiplication instruction, the experiment is repeated in assembly language. The first assembly source program contains no multiplication instruction:

        INCLUDE LPC2294.INC         ; Import header file
; P0.25 pin controls LED4; a low level lights it
LEDCON  EQU     0x02000000
        EXPORT  MAIN
; Declare the program code area
        AREA    LEDCONC,CODE,READONLY
; Load register address, PINSEL0
MAIN    LDR     R0,=PINSEL0
; Set data, i.e. connect the pins to GPIO
        MOV     R1,#0x00000000
        STR     R1,[R0]             ; [R0] ← R1
        LDR     R0,=PINSEL1
        STR     R1,[R0]
        LDR     R0,=IO0DIR
        LDR     R1,=LEDCON
; Set the LED control pin as output
        STR     R1,[R0]
; Set GPIO control parameter
LOOP    LDR     R1,=LEDCON
LEDSET  LDR     R0,=IO0SET
; Set the LED control I/O, i.e. LED4 turns off
        STR     R1,[R0]
LEDCLR  LDR     R0,=IO0CLR
; Clear the LED control I/O, i.e. LED4 lights up
        STR     R1,[R0]
; Jump to LOOP unconditionally
        B       LOOP
        END

After compiling and linking with ADS 1.2, the loop disassembles as:

LOOP     [0xe3a01780]  mov   r1,#0x2000000
LEDSET   [0xe59f0028]  ldr   r0,0x40000128
400000fc [0xe5801000]  str   r1,[r0,#0]
LEDCLR   [0xe59f0024]  ldr   r0,0x4000012c
40000104 [0xe5801000]  str   r1,[r0,#0]
40000108 [0xeafffff9]  b     LOOP

Loading this program into RAM with AXD Debugger and running it gives the GPIO P0.25 output waveform of the loop, shown in Figure 3. In each loop period the pin is high for about 450 ns and low for about 550 ns.

Figure 3 GPIO P0.25 pin output waveform 2

A multiplication instruction is then added to the LOOP part of the example above, changing the loop to:

LOOP    LDR     R1,=LEDCON
LEDSET  LDR     R0,=IO0SET
        STR     R1,[R0]
        MOV     R2,#0x0234
        MUL     R2,R1,R2
LEDCLR  LDR     R0,=IO0CLR
        STR     R1,[R0]
        B       LOOP

After compiling and linking with ADS 1.2, the loop disassembles as:

LOOP     [0xe3a01780]  mov   r1,#0x2000000
LEDSET   [0xe59f0030]  ldr   r0,0x40000130
400000fc [0xe5801000]  str   r1,[r0,#0]
40000100 [0xe3a02f8d]  mov   r2,#0x234
40000104 [0xe0020291]  mul   r2,r1,r2
LEDCLR   [0xe59f0024]  ldr   r0,0x40000134
4000010c [0xe5801000]  str   r1,[r0,#0]
40000110 [0xeafffff7]  b     LOOP

Running this version from RAM in AXD Debugger gives the GPIO P0.25 output waveform of the loop, shown in Figure 4. The pin is high for about 550 ns and low for about 550 ns per loop period. Compared with the previous example, the added MUL and MOV instructions together take about 100 ns.

In summary, when these ARM instructions run from RAM: "str r0,[r1,#4]" and "str r0,[r1,#0xc]" each take about 350 ns, or about 14 instruction cycles; "ldr r0,0x4000012c" takes about 100 ns, or 4 instruction cycles; the MUL and MOV instructions together take about 100 ns, or 4 instruction cycles; and the jump instruction takes about 100 ns, or 4 instruction cycles.

3 TMS320F2812 working mechanism and instruction cycle test

The TMS320F2812 is a high-performance, cost-effective 32-bit fixed-point DSP from TI intended for control applications. It runs at up to 150 MHz (100 MHz in this paper) and has 18K × 16-bit of zero-wait-state on-chip SARAM and 128K × 16-bit of on-chip Flash (36 ns access time). It uses a Harvard bus architecture, so it can fetch an instruction, read data and write data in the same clock cycle, and an 8-stage pipeline further raises instruction throughput.

To observe the instruction cycle, GPIOA0 of the TMS320F2812 is set and cleared repeatedly. The C source program is as follows:

#include "DSP28_Device.h"

void main(void)
{
    InitSysCtrl();          /* Initialize system */
    DINT;                   /* Disable interrupts */
    IER = 0x0000;
    IFR = 0x0000;
    InitPieCtrl();          /* Initialize PIE control registers */
    InitPieVectTable();     /* Initialize PIE vector table */
    InitGpio();             /* Initialize GPIO */
    EINT;
    ERTM;
    for (;;) {
        /* Six writes keep GPIOA0 high (matching the compiled listing) */
        GpioDataRegs.GPADAT.all = 0xFFFF;
        GpioDataRegs.GPADAT.all = 0xFFFF;
        GpioDataRegs.GPADAT.all = 0xFFFF;
        GpioDataRegs.GPADAT.all = 0xFFFF;
        GpioDataRegs.GPADAT.all = 0xFFFF;
        GpioDataRegs.GPADAT.all = 0xFFFF;
        /* Three writes keep GPIOA0 low */
        GpioDataRegs.GPADAT.all = 0x0000;
        GpioDataRegs.GPADAT.all = 0x0000;
        GpioDataRegs.GPADAT.all = 0x0000;
    }
}

Figure 4 GPIO P0.25 pin output waveform 3

The key steps are initializing the general-purpose I/O and setting the CPU clock. The system clock is set to 100 MHz through the PLL, and InitGpio() is as follows:

#include "DSP28_Device.h"

void InitGpio(void)
{
    EALLOW;
    // Select the digital I/O function for the port A pins
    GpioMuxRegs.GPAMUX.all = 0x0000;
    // GPIOA0 is an output; the other pins are inputs
    GpioMuxRegs.GPADIR.all = 0x0001;
    GpioMuxRegs.GPAQUAL.all = 0x0000;
    EDIS;
}

Setting a breakpoint at for(;;) in main() makes it easy to locate the compiled assembly instructions for the loop:

3F8011 L1:
3F8011 761F  MOVW DP,#0x01C3
3F8013 2820  MOV  @32,#0xFFFF
3F8015 2820  MOV  @32,#0xFFFF
3F8017 2820  MOV  @32,#0xFFFF
3F8019 2820  MOV  @32,#0xFFFF
3F801B 2820  MOV  @32,#0xFFFF
3F801D 2820  MOV  @32,#0xFFFF
3F801F 2B20  MOV  @32,#0
3F8020 2B20  MOV  @32,#0
3F8021 2B20  MOV  @32,#0
3F8022 6FEF  SB   L1,UNC

The first column is the program address in RAM, the second is the machine code, and the rest is the assembly program. "MOV @32,#0xFFFF" drives the GPIO high and "MOV @32,#0" drives it low. Six instructions drive GPIOA0 high and three drive it low. With a 10 ns instruction cycle, the pin therefore stays high for 6 × 10 ns = 60 ns per loop. Running the program from H0 SARAM gives the GPIOA0 waveform shown in Figure 5, and the high time is exactly 60 ns. Because the loop branches back after the three low writes and the pipeline must be flushed, the low phase lasts longer.

Figure 5 Waveform 1 of GPIOA0 in TMS320F2812

To observe the cycle of the multiplication instruction, the loop in the C source above is modified as follows:

for (;;)
{
    Uint16 test1, test2, test3;
    test1 = 0x1234;
    test2 = 0x2345;
    GpioDataRegs.GPADAT.all = 0xFFFF;
    GpioDataRegs.GPADAT.all = 0xFFFF;
    GpioDataRegs.GPADAT.all = 0xFFFF;
    test3 = test1 * test2;
    GpioDataRegs.GPADAT.all = 0x0000;
    GpioDataRegs.GPADAT.all = 0x0000;
    GpioDataRegs.GPADAT.all = 0x0000;
}

The assembly instructions of the above program after compilation and linking are as follows:

3F8012 L1:
3F8012 2841  MOV  *-SP[1],#0x1234
3F8014 2842  MOV  *-SP[2],#0x2345
3F8016 761F  MOVW DP,#0x01C3
3F8018 2820  MOV  @32,#0xFFFF
3F801A 2820  MOV  @32,#0xFFFF
3F801C 2820  MOV  @32,#0xFFFF
3F801E 2D42  MOV  T,*-SP[2]
3F801F 1241  MPY  ACC,T,*-SP[1]
3F8020 9643  MOV  *-SP[3],AL
3F8021 2B20  MOV  @32,#0
3F8022 2B20  MOV  @32,#0
3F8023 2B20  MOV  @32,#0
3F8024 6FEE  SB   L1,UNC

GPIOA0 still stays high for 6 instruction cycles (three writes plus MOV T, MPY and MOV, including one multiplication), because the multiplication is also a single-cycle instruction; the high time is therefore 60 ns. Running the program from H0 SARAM gives the GPIOA0 waveform shown in Figure 6, and the high time is exactly 60 ns. The low phase is longer than in Figure 5 because the branch after the three low writes flushes the pipeline and the operands for the multiplication (the "MOV *-SP[1]" and "MOV *-SP[2]" stores) are also prepared during it. When observing with a digital oscilloscope, if the waveform seen with the probe at the ×1 setting is poor, switch the probe to ×10 and adjust its compensation.

Figure 6 Waveform 2 of GPIOA0 in TMS320F2812

4 Comparison of three microprocessors

First, all of these controllers can shorten their instruction cycles by running from a faster crystal oscillator, but each has a frequency limit: the traditional 8051 MCU does not exceed 40 MHz, the LPC2114 does not exceed 60 MHz, and the TMS320F2812 tops out at 150 MHz. At the same operating frequency, ARM instructions execute much faster (with a much shorter instruction cycle) than those of a traditional MCU, because the traditional MCU has no pipeline while both the ARM core and the DSP are pipelined. However, accessing peripherals, RAM and other memory costs the ARM extra clock cycles, so it cannot achieve true single-cycle operation, and in particular not single-cycle multiplication. The DSP does achieve true single-cycle multiplication, and its speed is much higher than that of the ARM microcontroller.
