Introduction
ARM processors are widely used in 32-bit embedded systems because of their high performance, low power consumption, and low cost. Improving execution speed and reducing code size are key requirements in embedded software design. Most ARM toolchains provide optimization options, but a compiler must be conservative to guarantee correct code, and it is also constrained by the structure of the processor. Programmers therefore need to understand how the compiler works and write code that it can optimize well. Of the many code optimization methods, this article describes optimization at the function level.
1 Data types of function local variables
Local variables here include the variables declared inside a function, function parameters, and function return values. ARM data-processing instructions all operate on 32-bit registers, so even if the data itself needs only 8 or 16 bits, the 32-bit types int or long should be used wherever possible for these three kinds of variables to improve execution efficiency. The following analysis takes a simple sum function as an example.
Function add1 calculates the cumulative sum of an array of 10 words. add2 does the same thing as add1, except that the element type of the parameter array is changed to 16-bit short, the type of the local variable i is changed to 8-bit char, and sum and the return value are changed to 16-bit short. The C source code of add1 and add2 is as follows:
int add1(int *array){
    unsigned int i;
    int sum=0;
    for(i=0;i<10;i++)
        sum=sum+array[i];
    return sum;
}
short add2(short *array){
    char i;
    short sum=0;
    for(i=0;i<10;i++)
        sum=sum+array[i];
    return sum;
}
The compiled assembly code of add1 is:
add1
mov r2,r0
mov r0,#0
mov r1,#0
add1_loop
ldr r3,[r2,r1,lsl #2]
add r1,r1,#1
cmp r1,#0x0a
add r0,r3,r0
bcc add1_loop
mov pc,r14
The compiled assembly code of add2 is:
add2
mov r2,r0
mov r0,#0
mov r1,#0
add2_loop
add r3,r2,r1,lsl #1 ; added statement ①
ldrh r3,[r3,#0]
add r1,r1,#1
and r1,r1,#0xff ; added statement ②
cmp r1,#0x0a
add r0,r3,r0
bcc add2_loop
mov r0,r0,lsl #16 ; added statement ③
mov r0,r0,asr #16 ; added statement ④
mov pc,r14
Comparing the assembly code of add1 and add2, we find that add2 has 4 more instructions than add1: two inside the add2_loop loop and two after it.
Statement ①: The elements of array in function add2 are of 16-bit short type and are loaded with ldrh. The ldrh instruction does not support a shifted register offset in its addressing mode, so an extra add instruction is needed to compute the address of the array element.
Statement ②: The loop variable i in function add2 is of 8-bit char type, while the registers of the ARM processor are 32 bits wide. This instruction masks i to 8 bits after every increment so that it overflows like an 8-bit value: when i reaches 255, adding 1 must give 0, not 256.
Statements ③ and ④: The result sum returned by function add2 is of short type. Before returning, the upper 16 bits of the 32-bit register must be filled with the sign bit (shift left by 16, then arithmetic shift right by 16), that is, the value is converted to a 16-bit short.
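One way to apply this advice is to keep the narrow type only where the data is stored in memory and to use 32-bit types for everything that lives in registers. The following is a minimal sketch under that assumption, not compiler output from the analysis above; the name add3 is hypothetical.

```c
/* Hypothetical variant of add2: the array elements stay 16-bit in memory,
   but the loop counter, the accumulator, and the return value are 32-bit. */
int add3(const short *array)
{
    unsigned int i;   /* 32-bit counter: no masking to 8 bits is required */
    int sum = 0;      /* 32-bit accumulator: no narrowing before return   */

    for (i = 0; i < 10; i++)
        sum += array[i];   /* each short is widened to int when it is read */
    return sum;
}
```

Unlike add2, this version does not truncate the result to 16 bits, so a caller that relies on 16-bit wraparound would have to cast the result to short explicitly.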
2 Number of local variables in a function
To speed up execution, the compiler tries to allocate local variables to registers when it compiles a function. When there are more local variables than available registers, the compiler spills the extra variables to the stack (that is, stores them in memory), so the number of local variables must be kept under control.
ARM processors use a RISC structure and have abundant internal registers. When the compiler uses the apcs switch option, that is, supports the ATPCS (ARM-Thumb Procedure Call Standard), there are theoretically 14 registers (R0~R12 and R14) that can hold local variables. In practice, however, some registers have special uses. For example, R9 is used as the static base register in read-write position-independent (RWPI) compilation, and R12 is used as the scratch register for calls between subroutines. The register names and descriptions in the ATPCS rules are listed in Table 1.
Table 1 Register description in ATPCS rules
Therefore, the number of local variables should be limited as much as possible: ① The number of function parameters should be kept to 4 or fewer, because only R0~R3 are used to pass parameters; any further parameters are pushed onto the stack. If an application really needs more than 4 parameters, they can be grouped in a structure and passed through a pointer to that structure. ② The number of local variables inside a function should be kept to 12 or fewer (R0~R11), since R12~R15 have specific uses.
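The structure-pointer approach from point ① can be sketched as follows. The structure filter_params and the function fir3 are hypothetical names chosen for illustration; the example assumes six values that would otherwise be passed as six separate arguments, two of them on the stack.

```c
/* Hypothetical parameter block: three coefficients and three samples. */
struct filter_params {
    int c0, c1, c2;   /* filter coefficients */
    int x0, x1, x2;   /* input samples       */
};

/* Only one pointer is passed, so the call needs a single argument
   register instead of four registers plus two stack slots. */
int fir3(const struct filter_params *p)
{
    return p->c0 * p->x0 + p->c1 * p->x1 + p->c2 * p->x2;
}
```

The trade-off is that each field is now read from memory through the pointer, so this form pays off mainly when the values are already stored together or are passed on through several call levels.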
3 Writing code in function
3.1 Writing loop code
Writing a loop so that its counter decrements to zero can reduce the number of instructions in the loop. Take a loop that sums 10 numbers as an example.
Code 1:
int sum=0;
for(int i=0;i<10;i++)
    sum=sum+i;
Code 2:
int sum=0;
for(int i=10;i!=0;i--)
    sum=sum+i;
Assembly code 1:
mov r0,#0
mov r1,#0
add1
add r0,r1,r0
add r1,r1,#1
cmp r1,#0x0a
bcc add1
Assembly code 2:
mov r0,#0
mov r1,#0x0a
add2
add r0,r1,r0
subs r1,r1,#1
bne add2
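Comparing Code 1 and Code 2, both loops run 10 iterations (Code 1 adds 0 through 9 and Code 2 adds 10 down to 1, so the sums differ, but the loop structure is what matters here). Code 2 has one less instruction in the loop, because subs updates the condition flags itself and no separate cmp is needed. The loop is executed 10 times, so a total of 10 instructions are saved during execution.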
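The same idea still applies when the loop body walks through an array: the counter can count down to zero while a pointer moves forward, so the result is identical to that of the count-up loop. The following is a minimal sketch; the function name sum_down is hypothetical, and whether a given compiler actually emits a subs/bne pair for it depends on the compiler and its options.

```c
/* Sums the first n elements of array with a counter that decrements
   to zero; the pointer advances so elements are read in order. */
int sum_down(const int *array, unsigned int n)
{
    int sum = 0;

    for (; n != 0; n--)
        sum += *array++;
    return sum;
}
```

Because the termination test is a comparison with zero, the loop also handles n equal to 0 correctly without any extra check.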
3.2 Use of inline functions
When a function body is small (usually only one or two statements) and the function is called frequently, it can be declared as an inline function. A call to an inline function is expanded in place, much like a macro, so there is no function call overhead (that is, no parameter passing, branching, or returning of the function value). The cost is that the code size of the calling function increases, because the body is copied at every call site.
For example, in an embedded system, the code that reads and writes frequently accessed peripheral ports can be written as inline functions to improve execution efficiency. The read and write functions for peripheral registers are as follows:
inline unsigned short reg_read(unsigned int reg){
    return *(volatile unsigned short *)(reg); //Peripheral register read function; reg is the 32-bit register address
}
inline void reg_write(unsigned int reg, unsigned short val){
    *(volatile unsigned short *)(reg)=val; //Peripheral register write function; reg is the 32-bit register address
}
These two functions have several things in common: the function body is very short, only one statement, and very few variables are used, only 1 or 2 parameters. Defining them as inline functions keeps the program easy to read, removes the call overhead so that execution is efficient, and costs little extra space when they are expanded because the body is so small.
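Small helpers of this kind can also be combined into other one-line inline helpers. The sketch below is an illustration only: the names reg_read16, reg_write16, and reg_set_bits16 are hypothetical, it assumes a C99 compiler with stdint.h, and it uses uintptr_t for the address so that the same code can be exercised against an ordinary variable instead of real hardware.

```c
#include <stdint.h>

/* Read a 16-bit register at the given address. */
static inline uint16_t reg_read16(uintptr_t addr)
{
    return *(volatile uint16_t *)addr;
}

/* Write a 16-bit value to the register at the given address. */
static inline void reg_write16(uintptr_t addr, uint16_t val)
{
    *(volatile uint16_t *)addr = val;
}

/* Hypothetical helper: set the bits in mask with a read-modify-write. */
static inline void reg_set_bits16(uintptr_t addr, uint16_t mask)
{
    reg_write16(addr, (uint16_t)(reg_read16(addr) | mask));
}
```

On a host machine the helpers can be tried with the address of a normal 16-bit variable, for example reg_set_bits16((uintptr_t)&fake_reg, 0x0100). Declaring them static inline lets them sit in a header without causing duplicate-definition problems at link time.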
Conclusion
Because embedded systems have limited storage space and real-time requirements, code must be written according to suitable methods and principles that reduce its space and time overhead. Code optimization takes time and can reduce the readability of the source code. The most effective approach is therefore to optimize only the functions that are called frequently and have the greatest impact on performance.