High-efficiency embedded ARM program development-EEWORLD

In computationally demanding applications such as multimedia and communications, embedded programs often need special design to satisfy constraints on manufacturing cost, power consumption, performance, and real-time behavior. Designers therefore need a set of practical programming guidelines when writing embedded software for a specific application. In practice, two areas deserve particular attention: how variables are used and how loops are written.

Variable usage

How variables are used has a large effect on program efficiency. Using global variables can be more efficient than passing parameters to functions, because it eliminates the need to push and pop parameters at each call. Global variables do have side effects on the program, however. In addition, the order in which variables are defined produces different data layouts in the final image, as shown in Figure 1.

Therefore, when declaring variables, consider how to control the memory layout. The simplest approach is to define all variables of the same type together, which avoids the padding otherwise inserted to align variables of different sizes.
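The same alignment effect can be observed inside a C struct, where member order is fixed by the language. The sketch below is only an analogue of the global-variable case, and the type and function names are hypothetical. Exact sizes depend on the compiler and ABI; on a typical 32-bit ARM ABI one would expect 12 bytes for the interleaved struct and 8 for the grouped one.

```c
#include <stddef.h>

/* Hypothetical example types, not taken from the article. */

/* Members of different sizes interleaved: the compiler may insert
   padding after each char so the following member is aligned. */
struct interleaved {
    char  a;
    int   b;
    char  c;
    short d;
};

/* The same members grouped by type, largest first: little or no padding. */
struct grouped {
    int   b;
    short d;
    char  a;
    char  c;
};

/* Bytes saved by grouping; the value is ABI-dependent. */
size_t layout_saving(void)
{
    return sizeof(struct interleaved) - sizeof(struct grouped);
}
```

Calling layout_saving() on the target toolchain gives a quick check of how much padding a given ordering costs.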

Engineers often declare variables as short or char to save memory. However, when a function has only a few local variables, the compiler allocates them to internal registers, one variable per register. In that case short and char variables save no space and bring side effects of their own. Figure 2 illustrates this, where a stands for whichever register holds the function's local variable. For the same add-1 operation, a 32-bit int is the fastest and needs only a single add instruction. For 8-bit and 16-bit variables, the result must be brought back to the variable's width inside the 32-bit register after the addition. Signed variables need two instructions, a logical shift left followed by an arithmetic shift right, to perform the sign extension; unsigned variables need a logical AND instruction to clear the upper bits. Local variables are therefore most efficient when declared as 32-bit int or unsigned int.

In some cases a function reads its working values from external memory. Non-32-bit values then need to be converted to 32 bits when they are loaded. Note that widening an 8-bit or 16-bit variable to 32 bits hides the overflow that the narrow type would have produced, so this case needs careful consideration.

Figure 2 Addition procedures for different types of local variables
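A minimal C sketch of the source side of this comparison follows. The function names are hypothetical, and the instructions actually generated depend on the compiler and target; the comments only restate the narrowing step described above.

```c
/* Hypothetical function names; each adds 1 to a value of a different type. */

int inc_int(int a)
{
    return a + 1;                   /* matches the register width */
}

unsigned char inc_u8(unsigned char a)
{
    /* The sum is computed as int, then narrowed back to 8 bits:
       this is the extra masking step for unsigned narrow types. */
    return (unsigned char)(a + 1);
}

unsigned short inc_u16(unsigned short a)
{
    /* Same idea for a 16-bit unsigned value. */
    return (unsigned short)(a + 1);
}
```

Assuming 8-bit char and 16-bit short, inc_u8(255) wraps to 0 while inc_int(255) gives 256. That difference is the overflow behavior that disappears when a narrow variable is widened to int.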

Switch-case statements are common in programs. When a switch is compiled into a sequence of tests and branches, each test only decides what to do next, so it costs processor time without doing useful work. To improve speed, order the cases by their relative frequency of occurrence: put the most likely case first and the least likely case last, which reduces the average execution time of the code.
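As an illustration, the sketch below orders the cases of a switch by an assumed frequency. The event names, values, and frequencies are hypothetical, and a compiler is free to reorder the tests or build a jump table, so the benefit should be checked in the generated code.

```c
/* Hypothetical event codes; the frequencies in the comments are assumptions. */
enum {
    EVT_SAMPLE = 0x0001,   /* assumed to arrive most often */
    EVT_TIMER  = 0x0040,
    EVT_FAULT  = 0x8000    /* assumed to be rare */
};

/* Returns a small handler id for each event, 0 for unknown codes. */
int handle_event(int code)
{
    switch (code) {
    case EVT_SAMPLE:       /* most likely case listed first */
        return 1;
    case EVT_TIMER:
        return 2;
    case EVT_FAULT:        /* least likely case listed last */
        return 3;
    default:
        return 0;
    }
}
```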

Generally, engineers try to avoid using redundant variables to simplify programs. This is generally correct, but there are exceptions, as shown below:

    int f(void);
    int g(void);
    // f() and g() do not access the global variable errs

    int errs; // global variable

    void test1(void)
    {
        errs += f();
        errs += g();
    }

    void test2(void)
    {
        int localerrs = errs; // define a redundant local variable
        localerrs += f();
        localerrs += g();
        errs = localerrs;
    }

In the first case, test1(), every access to the global variable errs requires loading it from memory into a register and storing it back to memory after f() or g() returns, because the compiler cannot assume that the called function leaves errs unchanged. This example therefore needs two such load/store pairs. In the second case, test2(), the local variable localerrs is kept in a register, so the whole function loads and stores the global variable only once. Reducing the number of memory accesses in this way is very useful for improving system performance.

Processing of loops

Counting loops are a common flow control structure in programs. In C, for loops like the following are common:

for (loop = 1; loop <= limit; loop++)

Counting up matches the natural way of thinking, so it is used more often than the following form, which counts down:

for (loop = limit; loop != 0; loop--)

Logically the two forms are equivalent, but once they are mapped to a specific architecture their efficiency differs considerably.

The count-up loop uses one more instruction than the count-down loop. When the number of iterations is large, the two versions show a significant performance difference. The underlying reason is that comparing against a non-zero value requires a separate CMP instruction, whereas for a comparison with zero the ARM code can use the flags set by the decrement itself and test the NE condition directly.

Loop unrolling is another common loop optimization. In many cases the compiler unrolls loops automatically, but for loops in which intermediate variables or results are modified, the compiler often declines to unroll, and engineers need to do the unrolling themselves.
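The two counting styles can be compared on a loop body that does not depend on the counter value. The following is a sketch with hypothetical function names; whether the count-down version really saves an instruction depends on the compiler and optimization level, so it is worth checking the generated assembly.

```c
/* Hypothetical helpers: both sum `limit` bytes starting at p. */

/* Count up: the loop test compares the counter against limit. */
unsigned sum_up(const unsigned char *p, unsigned limit)
{
    unsigned loop, total = 0;
    for (loop = 1; loop <= limit; loop++)
        total += *p++;
    return total;
}

/* Count down: the loop test is a comparison with zero. */
unsigned sum_down(const unsigned char *p, unsigned limit)
{
    unsigned loop, total = 0;
    for (loop = limit; loop != 0; loop--)
        total += *p++;
    return total;
}
```

Both functions visit the data in the same order and return the same result; only the loop bookkeeping differs.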

It is especially worth noting that on CPUs with an internal instruction cache (such as the ARM946E-S), unrolled code can be large enough to overflow the cache. The unrolled code is then repeatedly swapped between the CPU's cache and main memory, and because main memory is much slower than the cache, loop unrolling actually slows the program down. Loop unrolling can also interfere with vector operation optimization.
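Manual unrolling can be sketched as follows, here by a factor of four with a short clean-up loop for the leftover elements. The function name and the unroll factor are assumptions for illustration; a larger factor increases code size, which is exactly the cache trade-off described above.

```c
/* Hypothetical helper: sums n bytes, with the main loop unrolled four times. */
unsigned sum_unrolled4(const unsigned char *p, unsigned n)
{
    unsigned total = 0;
    unsigned blocks = n / 4;    /* number of full groups of four */
    unsigned rest = n % 4;      /* leftover elements */

    for (; blocks != 0; blocks--) {
        total += p[0];
        total += p[1];
        total += p[2];
        total += p[3];
        p += 4;
    }
    for (; rest != 0; rest--)   /* clean-up loop for the remainder */
        total += *p++;
    return total;
}
```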

ARM processor cores can test for a non-zero result and branch on it (a compare-with-zero jump) very quickly. If the loop is not sensitive to direction, it can therefore count from the largest value down. Note that if the counter i is also used in pointer or index operations, this approach can cause a serious out-of-bounds error (i = MAX+1). The index can of course be corrected by adding to or subtracting from i, but doing so cancels the efficiency gain.
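The indexing hazard can be made concrete with a small sketch. MAX and fill_down are hypothetical names, and the example shows one variant of the problem: when the counter runs from MAX down to 1, using it directly as an index would touch one element past the end of the array.

```c
#define MAX 8   /* hypothetical array length */

/* Count down while indexing: i runs MAX..1, so the element is buf[i - 1].
   Using buf[i] here would access buf[MAX], one past the end. */
void fill_down(int buf[MAX], int value)
{
    unsigned i;
    for (i = MAX; i != 0; i--)
        buf[i - 1] = value;
}
```

The i - 1 adjustment is the kind of correction the text mentions; it may cost the instruction that counting down was meant to save.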

Conclusion

This article has summarized several programming techniques for developing efficient embedded ARM programs. Applied in real embedded system development, they can substantially improve system performance, especially in computationally demanding applications such as multimedia and communications, and they offer practical guidance for program design.
