Efficient C language programming based on ARM


Introduction

ARM processors are widely used in 32-bit embedded systems because of their high performance, low power consumption, and low cost. Improving execution speed and reducing code size are key requirements in embedded software design. Most ARM compilers provide performance optimizations, but a compiler must stay conservative to guarantee correct code, and it is also constrained by the structure of the processor itself. Programmers therefore need to optimize their code with an understanding of how the compiler works. There are many code optimization methods; this article describes optimizations at the function level.

1 Data types of function local variables

Local variables here include the variables declared inside a function, the function parameters, and the function return value. Because ARM data-processing operations are all 32-bit, the 32-bit types int or long should be used wherever possible for these three kinds of local variables, even when the data itself needs only 8 or 16 bits, to improve execution efficiency. The following analysis uses a simple summing function as an example.

Function add1 computes the sum of an array of 10 words. add2 does the same job as add1, except that the element type of the array parameter is changed to 16-bit short, the local variable i is changed to 8-bit char, and sum and the return value are changed to 16-bit short. The C source code of add1 and add2 is as follows:

int add1(int *array){
    unsigned int i;
    int sum=0;
    for(i=0;i<10;i++)
        sum=sum+array[i];
    return sum;
}

short add2(short *array){
    char i;
    short sum=0;
    for(i=0;i<10;i++)
        sum=sum+array[i];
    return sum;
}

The compiled assembly code of add1 is:

add1
    mov r2,r0
    mov r0,#0
    mov r1,#0
add1_loop
    ldr r3,[r2,r1,lsl #2]
    add r1,r1,#1
    cmp r1,#0x0a
    add r0,r3,r0
    bcc add1_loop
    mov pc,r14

The compiled assembly code of add2 is:

add2
    mov r2,r0
    mov r0,#0
    mov r1,#0
add2_loop
    add r3,r2,r1,lsl #1    ; added statement ①
    ldrh r3,[r3,#0]
    add r1,r1,#1
    and r1,r1,#0xff        ; added statement ②
    cmp r1,#0x0a
    add r0,r3,r0
    bcc add2_loop
    mov r0,r0,lsl #16      ; added statement ③
    mov r0,r0,asr #16      ; added statement ④
    mov pc,r14

Comparing the assembly code of add1 and add2 shows that add2 needs 4 more instructions than add1: two inside add2_loop (statements ① and ②) and two after the loop (statements ③ and ④).

Statement ①: The array elements in add2 are 16-bit short values, which are loaded with ldrh. The ldrh instruction does not support a shifted register offset in its address, so an extra add instruction is needed to compute the address of the array element.

Statement ②: The loop variable i in add2 is an 8-bit char (unsigned by default on ARM compilers), but ARM registers are 32 bits wide, so this instruction handles the wrap-around of the loop counter: when i is 255, adding 1 must give 0, not 256.

Statements ③ and ④: The value returned by add2 is of type short. Before returning, the upper 16 bits of the 32-bit register must be filled with copies of the sign bit; that is, the result is sign-extended from 16 bits by shifting left by 16 and then arithmetically shifting right by 16.
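The effect of the extra statements, and one way to avoid most of them, can be sketched in portable C. The names wrap8, sign_extend16, and add3 are hypothetical and do not appear in the original article. add3 keeps the 16-bit data in memory but uses 32-bit locals and a 32-bit return value; walking the array with a pointer is one commonly suggested way to avoid the separate address calculation, although whether a given compiler takes advantage of it is an assumption here.

```c
#include <stdint.h>

/* Statement 2 (and r1,r1,#0xff): keep an 8-bit counter inside 0..255. */
static uint32_t wrap8(uint32_t i)
{
    return i & 0xFFu;
}

/* Statements 3 and 4 (lsl #16 then asr #16): sign-extend the low 16 bits. */
static int32_t sign_extend16(uint32_t x)
{
    uint32_t low = x & 0xFFFFu;
    /* Portable equivalent of the shift pair: subtract 65536 if bit 15 is set. */
    return (low & 0x8000u) ? (int32_t)low - 65536 : (int32_t)low;
}

/* 16-bit data in memory, 32-bit loop counter, sum, and return value. */
static int add3(const short *array)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 10; i++)
        sum += *array++;   /* pointer walk instead of array[i] */
    return sum;
}
```

With 32-bit locals the compiler has no need to mask the counter on every iteration or to sign-extend the result before returning.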

2 Number of local variables in a function

To speed up execution, the compiler allocates local variables to registers wherever possible when compiling a function. When there are more local variables than available registers, the compiler spills the extra variables to the stack (that is, stores them in memory), so the number of local variables should be kept under control.

ARM processors use a RISC architecture and have a large set of internal registers. When the compiler is run with the apcs option, that is, when it follows the ATPCS (ARM-Thumb Procedure Call Standard), there are in theory 14 registers (R0~R12 and R14) that can hold local variables. In practice, however, some of these registers have special uses. For example, R9 serves as the static base register when compiling read-write position-independent (RWPI) code, and R12 serves as a scratch register for intra-procedure calls. The register names and descriptions in the ATPCS rules are listed in Table 1.

Table 1 Register description in ATPCS rules

Therefore, the number of local variables should be limited as far as possible: ① Keep the number of function parameters to 4 or fewer, because only R0~R3 are used to pass parameters and any further parameters are passed on the stack. If an application really needs more than 4 parameters, they can be grouped into a structure and passed through a pointer to that structure. ② Keep the number of local variables inside a function to 12 or fewer (R0~R11), because R12~R15 have dedicated uses.
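The structure-pointer approach from point ① might look like the following sketch. The structure rect_args and both functions are hypothetical examples, not code from the original article; the comments about registers assume ATPCS parameter passing as described above.

```c
/* Hypothetical parameter block: six values that would not all fit in R0-R3. */
struct rect_args {
    int x, y, width, height, border, scale;
};

/* Six separate arguments: under ATPCS the last two are passed on the stack. */
static int area_many(int x, int y, int width, int height, int border, int scale)
{
    (void)x;   /* position is unused in this small example */
    (void)y;
    return (width + 2 * border) * (height + 2 * border) * scale;
}

/* One pointer argument: it fits in R0 and fields are loaded only as needed. */
static int area_packed(const struct rect_args *a)
{
    return (a->width + 2 * a->border) * (a->height + 2 * a->border) * a->scale;
}
```

The packed form pays off mainly when the same group of values is passed through several calls, since only the pointer has to be copied each time.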

3 Writing code in functions

3.1 Writing loop code

Writing a loop so that its counter decrements to zero reduces the number of instructions in the loop. Take a loop that performs 10 additions as an example.

Code 1:

int sum=0;
for(int i=0;i<10;i++)
    sum=sum+i;

Code 2:

int sum=0;
for(int i=10;i!=0;i--)
    sum=sum+i;

Assembly code 1:

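    mov r0,#0          ; sum = 0
    mov r1,#0          ; i = 0
add1
    add r0,r0,r1       ; sum = sum + i
    add r1,r1,#1       ; i++
    cmp r1,#0x0a       ; compare i with 10
    bcc add1           ; loop while i < 10

Assembly code 2:

    mov r0,#0          ; sum = 0
    mov r1,#0x0a       ; i = 10
add2
    add r0,r0,r1       ; sum = sum + i
    subs r1,r1,#1      ; i--, setting the condition flags
    bne add2           ; loop while i != 0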

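Comparing Code 1 and Code 2, both loops execute 10 iterations, but Code 2 has one fewer instruction in the loop because subs sets the condition flags itself and no separate cmp is needed. Since the loop runs 10 times, 10 instructions are saved in total. Note that, as written, the two loops add different values (Code 1 adds 0 through 9, Code 2 adds 10 down to 1), so the count-down form is a direct replacement only when the loop body does not depend on the value or order of the index, or is adjusted accordingly.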
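When the loop body does use the index, the count-down form can still produce the same result if the index is adjusted. A minimal sketch, with the hypothetical function names sum_up and sum_down, is:

```c
/* Count up: each iteration needs a compare against the limit n. */
static int sum_up(const int *data, unsigned int n)
{
    int sum = 0;
    unsigned int i;

    for (i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Count down to zero: the decrement itself can set the flags for the branch. */
static int sum_down(const int *data, unsigned int n)
{
    int sum = 0;
    unsigned int i;

    for (i = n; i != 0; i--)
        sum += data[i - 1];   /* visits data[n-1] down to data[0] */
    return sum;
}
```

Both functions return the same total for the same input because integer addition does not depend on the order in which the elements are visited.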

3.2 Use of inline functions

When a function body is small (typically only one or two statements) and the function is called frequently, it can be declared as an inline function. A call to an inline function is expanded in place, much like a macro, so there is no function call overhead (parameter passing, the call itself, and returning the value), at the cost of larger code at every call site.

For example, in an embedded system, the code that reads and writes frequently accessed peripheral ports can be written as inline functions to improve execution efficiency. The read and write functions for peripheral registers are as follows:

// Peripheral register read function; reg is the 32-bit address of the register
static inline unsigned short reg_read(unsigned int reg){
    return *(volatile unsigned short *)reg;
}

// Peripheral register write function
static inline void reg_write(unsigned int reg, unsigned short val){
    *(volatile unsigned short *)reg = val;
}

These two functions have several features in common: the function body is tiny, consisting of a single statement, and very few local variables are used, only 1 or 2 parameters. Defining them as inline functions keeps the program easy to read, removes the call overhead at run time, and costs little extra space when they are expanded because the function bodies are so small.
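A variant of these accessors that is portable across pointer sizes can be sketched with the fixed-width types from stdint.h. The names reg_read16, reg_write16, and reg_set_bits16 are hypothetical, and the addresses passed in are assumed to be valid, suitably aligned 16-bit locations.

```c
#include <stdint.h>

/* Read a 16-bit register; volatile forces a real memory access every time. */
static inline uint16_t reg_read16(uintptr_t addr)
{
    return *(volatile uint16_t *)addr;
}

/* Write a 16-bit register. */
static inline void reg_write16(uintptr_t addr, uint16_t val)
{
    *(volatile uint16_t *)addr = val;
}

/* Hypothetical helper: set bits in a register by read-modify-write. */
static inline void reg_set_bits16(uintptr_t addr, uint16_t mask)
{
    reg_write16(addr, (uint16_t)(reg_read16(addr) | mask));
}
```

Because the helpers take the address as an integer, they can be exercised on a host machine by passing the address of an ordinary uint16_t variable in place of a real peripheral register.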

Conclusion

Because embedded systems have limited storage space and real-time requirements, code must be written using suitable methods and principles to reduce its space and time overhead. Code optimization takes time, however, and it reduces the readability of the source code. The most effective approach is therefore to optimize only the functions that are called frequently and have the greatest impact on performance.
