A Brief Analysis of Optimization Problems in Embedded Programming

Publisher: EtherealJourney | Last updated: 2012-03-31 | Source: 单片机与嵌入式系统应用 (Microcontroller & Embedded System Applications)

Constrained by power consumption, cost, and size, embedded microprocessors lag far behind desktop processors in processing power. Embedded systems therefore impose stringent requirements on both the memory a program occupies and the time it takes to run, and embedded applications usually need performance optimization to meet those requirements.

1 Types of embedded program optimization
Embedded application optimization means modifying a program's algorithms and structure, without changing its function, and using software development tools to improve it, so that the modified program runs faster or occupies less code space.

Depending on the optimization focus, program optimization can be divided into running-speed optimization and code-size optimization. Running-speed optimization shortens the time needed to complete a given task by adjusting the application's structure and similar means, based on a thorough understanding of the software and hardware characteristics; code-size optimization reduces the amount of program code as much as possible while the application still correctly implements its required functions. In practice the two often conflict: raising running speed usually requires more code, while shrinking code size may slow the program down. A concrete optimization strategy should therefore be set according to actual needs before optimizing begins. As computer and microelectronics technology has advanced, storage space is no longer the main constraint on embedded systems, so this article focuses on running-speed optimization.

2 Principles to be followed in embedded program optimization
Embedded program optimization mainly follows the following three principles.
① Equivalence principle: the program implements the same function before and after optimization.
② Effectiveness principle: the optimized program runs faster, occupies less storage space, or both.
③ Economy principle: the optimization should achieve good results at a small cost.

3 Main aspects of embedded program optimization
Embedded program optimization is divided into three aspects: algorithm and data structure optimization, compilation optimization and code optimization.

3.1 Algorithm and data structure optimization
Algorithms and data structures are the core of program design, and the quality of the algorithm largely determines the quality of the program. A given function can usually be implemented by several algorithms whose complexity and efficiency differ greatly, so choosing an efficient algorithm, or improving the chosen one, yields the largest gains. For example, when searching sorted data, binary search is much faster than sequential search. A recursive program makes many procedure calls and must save every call's local variables on the stack, so its time and space efficiency are poor; converting the recursion to an iterative form, or to an explicit stack, as the situation allows can greatly improve performance.
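The recursion-to-iteration conversion described above can be sketched in C. The factorial functions below are illustrative names chosen for this sketch, not code from the original article:

```c
#include <assert.h>

/* Recursive version: every call pushes a stack frame holding its
   locals and return address, which costs both time and RAM. */
unsigned long fact_rec(unsigned int n) {
    return (n <= 1) ? 1UL : n * fact_rec(n - 1);
}

/* Iterative version: constant stack usage and no call overhead. */
unsigned long fact_iter(unsigned int n) {
    unsigned long r = 1UL;
    while (n > 1) {
        r *= n--;
    }
    return r;
}
```

On a small MCU with a few hundred bytes of stack, the iterative form also removes the risk of stack overflow for large `n`.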

Data structures likewise occupy an important place in program design. For example, if items are repeatedly inserted into and deleted from an unordered collection, a linked list is faster than an array.

Algorithm and data structure optimization is the preferred optimization technology.

3.2 Compilation optimization
Many modern compilers have built-in code optimization. During compilation they perform dependence analysis, extract semantic information from the source program, and apply techniques such as software pipelining, data layout planning, and loop restructuring to carry out processor-independent optimizations automatically and generate high-quality code. Most compilers offer several optimization levels, from which a suitable one can be chosen. Note that at the highest level the compiler pursues optimization aggressively, which occasionally changes program behavior, so the result should be retested.

In addition, some dedicated compilers are optimized for particular architectures and can exploit the hardware fully to generate high-quality code. For example, the Intel compiler for Microsoft eMbedded Visual C++ targets the Intel XScale architecture specifically and, with its high level of optimization, produces faster code. It applies a variety of techniques, including instruction scheduling to keep the pipeline full, support for XScale's dual load/store capability, and interprocedural optimization (keeping variables used across functions in registers for fast access).

In the process of embedded software development, a compiler with strong optimization capabilities should be selected to make full use of its code optimization function to generate efficient code and improve the running efficiency of the program.

3.3 Code Optimization
Code optimization replaces the original code with assembly language or more concise source code so that the compiled program runs more efficiently. A compiler can automatically optimize within program segments and basic blocks, but it has difficulty recovering the program's semantics, algorithmic flow, and run-time behavior, so programmers must optimize these aspects by hand. Some commonly used techniques and skills follow.

(1) Code replacement
Replace long-latency instructions with short-latency ones to reduce the strength of operations.
① Reduce division operations. A comparison involving a quotient can be rewritten by multiplying the divisor onto the other side, and division or modulus by a power of two can be replaced by a bit operation. Bit-operation instructions take a single instruction cycle, whereas "/" typically calls a library subroutine, producing long and slow code. For example:
before optimization: if ((a/b) > c) and a = a/4;
after optimization: if (a > (b*c)) and a = a >> 2.
(These rewrites assume b > 0 and a non-negative or unsigned; with signed integer truncation the two comparison forms can differ at boundary values.)
② Reduce exponentiation operations. For example:
before optimization: a = pow(a, 3.0);
after optimization: a = a*a*a.
③ Use self-increment and self-decrement instructions. For example:
before optimization: a = a + 1, a = a - 1;
after optimization: a++, a-- (compiled to inc/dec-style instructions).
④ Use the smallest data types that meet the requirements. When defining variables, the order of preference is: character (char) > integer (int) > long integer (long int) > floating point (float).
For division, unsigned operands are more efficient than signed ones. Minimize forced type conversions in calls, and use floating-point operations sparingly; if the resulting error is acceptable, a long integer type can replace a floating-point type.
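The strength reductions above can be sketched in C (function names here are illustrative). Note the signedness caveats in the comments:

```c
#include <assert.h>

/* Strength reduction for unsigned operands: on many MCUs '/' and '%'
   compile to a library call, while shifts and masks are single-cycle. */
unsigned div4(unsigned a) { return a >> 2; }   /* same as a / 4 */
unsigned mod8(unsigned a) { return a & 7u; }   /* same as a % 8 */

/* Rewriting (a/b) > c as a multiplication removes the division, but
   with integer truncation the exact equivalent of (a/b) > c is
   a >= b*(c+1), assuming a >= 0 and b > 0. */
int gt_after_div(unsigned a, unsigned b, unsigned c) {
    return a >= b * (c + 1u);
}
```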

(2) Global variables and local variables
Use fewer global variables and more local variables. Global variables reside in data memory, so each one defined permanently reduces the data memory available to the MCU, and too many of them can leave the compiler unable to allocate memory. Local variables, by contrast, mostly live in the MCU's internal registers. In most MCUs, registers are faster than data memory and support more flexible instructions, which helps the compiler generate better code; moreover, the registers and data memory occupied by local variables can be reused across different modules.

(3) Use register variables
When a variable is read and written frequently, repeated memory accesses cost considerable time. Declaring it a register variable lets the CPU read and write it directly without accessing memory, improving access efficiency. Loop-control variables of heavily executed loops, and variables used repeatedly inside a loop body, are good candidates; loop counters are the classic case. Only local automatic variables and formal parameters may be declared as register variables: register variables use dynamic storage, so variables requiring static storage cannot be so declared. The keyword is register. The following is an example of using register variables:
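A minimal C sketch of such a register-variable loop (the function name and data are illustrative). Note that `register` is only a hint, and a `register` variable may not have its address taken:

```c
#include <assert.h>

/* Loop counter and accumulator declared 'register': a hint that the
   compiler should keep them in CPU registers rather than memory.
   Modern optimizing compilers usually do this on their own. */
long sum_array(const int *data, int n) {
    register int i;
    register long sum = 0;
    for (i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```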


(4) Reduce or avoid executing time-consuming operations
A large amount of the runtime of an application is usually spent in key program modules, which often contain loops or nested loops. Reducing time-consuming operations in loops can increase the execution speed of the program. Common time-consuming operations include: input/output operations, file access, graphical interface operations, and system calls. Among them, if file reading/writing cannot be avoided, then file access will be a major factor affecting the program's running speed. There are two ways to increase file access speed: one is to use memory-mapped files; the other is to use memory cache.

(5) Optimization of switch statement usage
When programming, sort the case labels by probability: put the most likely case first and the least likely last. This can speed up execution of the switch statement block.
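A short C sketch of probability-ordered cases (the event codes and handler are illustrative). The ordering helps when the compiler lowers the switch to a chain of compares, which is common for sparse case values; for dense values it may emit a jump table, where order no longer matters:

```c
#include <assert.h>

/* Cases ordered by expected frequency, most common first. */
int handle_event(int code) {
    switch (code) {
    case 0:  return 1;   /* most frequent event first */
    case 7:  return 2;   /* less frequent             */
    case 42: return 3;   /* rarest last               */
    default: return -1;
    }
}
```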

(6) Optimization of loop body
The loop body is the focal point of program design and optimization. Computations that do not depend on the loop variable should be hoisted outside the loop. For a loop with a fixed iteration count, a for loop is generally more efficient than an equivalent while loop, and a down-counting loop is faster than an up-counting one. For example:
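A C sketch contrasting the two loop directions (function names are illustrative). Counting down lets the loop test against zero, which many ISAs, including ARM, get for free from the subtract's flag update, saving an explicit compare per iteration:

```c
#include <assert.h>

/* Up-counting loop: needs a compare against n each iteration. */
long sum_up(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

/* Down-counting loop: the decrement itself sets the flags that
   the branch tests, so no separate compare is required. */
long sum_down(const int *a, int n) {
    long s = 0;
    for (int i = n - 1; i >= 0; i--) s += a[i];
    return s;
}
```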


In actual execution, each iteration adds two instructions beyond the loop body: a subtraction (decrementing the loop count) and a conditional branch. These instructions are called the "loop overhead". On an ARM processor the subtraction takes 1 cycle and the conditional branch 3 cycles, so every iteration carries 4 extra cycles of overhead. Loop unrolling can raise the loop's speed: the loop body is repeated several times and the iteration count reduced in the same proportion, cutting the loop overhead and trading larger code size for faster execution.
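The unrolling idea can be sketched in C (the function name is illustrative). Here the body is repeated four times per iteration, so the subtract-and-branch overhead is paid once per four elements, with a tail loop for leftovers:

```c
#include <assert.h>

/* 4x-unrolled summation: one loop-overhead cost per four elements. */
long sum_unrolled(const int *a, int n) {
    long s = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {          /* body repeated 4x */
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    for (; i < n; i++) {                  /* tail: n not divisible by 4 */
        s += a[i];
    }
    return s;
}
```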

(7) Function call
Call functions efficiently, and try to limit the number of parameters to four or fewer. Under the ARM calling convention the first four parameters are passed in registers, while the fifth and subsequent parameters are passed on the memory stack. If more values must be passed, collect the related parameters in a structure and pass a pointer to that structure instead.
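A C sketch of the struct-pointer technique (the struct and function are illustrative stand-ins). Six values that would otherwise spill past the four register arguments travel as a single pointer:

```c
#include <assert.h>

/* Bundle related parameters so the call passes one pointer
   (one register) instead of six separate arguments. */
typedef struct {
    int x, y, w, h, color, flags;
} DrawArgs;

/* Stand-in body: a real function would draw; this one just
   combines the fields so the call can be verified. */
int draw_rect(const DrawArgs *p) {
    return p->x + p->y + p->w + p->h + p->color + p->flags;
}
```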

(8) Inline function and inline assembly
Important functions with a strong impact on performance can be inlined with a keyword such as __inline, which saves the overhead of calling the function; the trade-off is increased code size. The time-critical parts of the program can be written in inline assembly, which usually brings a significant speed improvement.
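A minimal sketch using the portable C99 spelling `static inline` (vendor toolchains also accept spellings such as `__inline`; the function here is illustrative). The compiler is invited to substitute the body at each call site, removing call/return overhead:

```c
#include <assert.h>

/* Small, hot helper: a good inlining candidate because the call
   overhead would rival the cost of the body itself. */
static inline int clamp(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}
```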

(9) Lookup tables instead of calculations
Try not to perform very complex calculations in the program, such as the square root of floating-point numbers. For these time-consuming and resource-consuming calculations, the method of exchanging space for time can be used. Calculate the function value in advance and place it in the program storage area. When the program is running, you can directly look up the table, which reduces the workload of repeated calculations during program execution.
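A C sketch of the space-for-time trade (table contents and names are illustrative). Squares of 0..15 are precomputed into a const table, which the toolchain can place in program storage, so a runtime computation becomes a single indexed load:

```c
#include <assert.h>

/* Precomputed squares of 0..15; 'const' lets the linker keep the
   table in flash/ROM rather than scarce RAM. */
static const unsigned char square_tab[16] = {
      0,   1,   4,   9,  16,  25,  36,  49,
     64,  81, 100, 121, 144, 169, 196, 225
};

/* Table lookup in place of a runtime multiply; n must be < 16
   (the mask guards against out-of-range indices). */
unsigned square(unsigned n) {
    return square_tab[n & 15u];
}
```

The same pattern applies to genuinely expensive functions (square roots, trigonometry) where the saving per call is far larger.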

(10) Use hardware-optimized function libraries.
The GPP (Graphics Performance Primitives library)/IPP (Integrated Performance Primitives library) library designed by Intel for the XScale processor has been manually optimized for some typical operations and algorithms for multimedia processing, graphics processing and numerical calculations. It can fully utilize the computing potential of XScale hardware and achieve high execution efficiency.

(11) Utilize hardware features
In order to improve the running efficiency of the program, it is necessary to make full use of hardware features to reduce its running overhead, such as reducing the number of interrupts and using DMA transmission methods.

The CPU's access speed to the various memories ranks as: internal RAM > external synchronous RAM > external asynchronous RAM > Flash/ROM. If the CPU executes code directly from the Flash or ROM into which it was burned, it runs slowly. In that case, after the system starts, the object code can be copied from Flash or ROM into RAM and executed there to raise the program's running speed.

4 Conclusion
The performance optimization of embedded programs is often in conflict with the software development cycle, development cost, and software readability. We need to weigh the pros and cons and make a compromise. Use algorithm and data structure optimization as the preferred optimization technology; then select efficient compilers, system runtime libraries, and graphics libraries based on factors such as functions, performance differences, and investment budgets; use performance monitoring tools to detect program hotspots that take up most of the running time, and use code optimization methods to optimize them; finally, use an efficient compiler for compilation optimization to obtain high-quality code.
