Introduction
In multimedia, communications, and other computationally intensive applications, embedded software often has to be designed specifically to satisfy constraints such as manufacturing cost, power consumption, throughput, and real-time response. Designers therefore need a set of practical programming guidelines when writing embedded software for a specific application. In practice, two areas deserve particular attention: how variables are used and how loops are written.
Variable usage
When developing a real program, how variables are used matters a great deal. Accessing a global variable can be more efficient than passing a parameter to a function, because the caller no longer has to push the argument before the call and pop it afterwards. Global variables have side effects of their own, however. One of them is that the order in which variables are defined changes the data layout in the final image, as shown in Figure 1.
Figure 1 Effect of declaration order on variable layout in the image
When declaring variables, it is therefore worth considering how to control the memory layout. The simplest rule is to define all variables of the same type together.
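The effect of declaration order is easiest to observe with a structure, because C fixes the order of structure members. The sketch below assumes a typical ABI in which int is 4-byte aligned; the names mixed_order and grouped_order are illustrative and do not come from the original figure. For separate global variables the final placement is decided by the compiler and linker, so the map file is the place to check it.

```c
#include <stddef.h>

/* Hypothetical example: types interleaved, so padding is typically
   inserted before each int to keep it aligned. */
struct mixed_order {
    char flag_a;    /* 1 byte, usually followed by 3 bytes of padding */
    int  count_a;
    char flag_b;    /* 1 byte, usually followed by 3 bytes of padding */
    int  count_b;
};

/* The same members with equal types grouped together. */
struct grouped_order {
    int  count_a;
    int  count_b;
    char flag_a;
    char flag_b;    /* usually followed by 2 bytes of tail padding */
};

/* Helpers so the two sizes can be compared at run time. */
size_t mixed_size(void)   { return sizeof(struct mixed_order); }
size_t grouped_size(void) { return sizeof(struct grouped_order); }
```

On common 32-bit targets the grouped form is usually smaller, but the exact sizes depend on the ABI, so they are best measured with sizeof rather than assumed.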
Engineers often declare variables as short or char to save memory. When a function has only a few local variables, however, the compiler keeps them in internal registers, one register per variable. In that case short and char variables save no space and actually introduce extra work. Figure 2 illustrates this, assuming that a is held in any register available for the function's local variables. For the same add-1 operation, the 32-bit int variable is the fastest and needs only one addition instruction. For 8-bit and 16-bit variables, the result must additionally be brought back into range within the 32-bit register after the addition. For signed variables this takes two instructions, a logical shift left followed by an arithmetic shift right, to perform the sign extension; for unsigned variables a logical AND instruction is needed to clear the bits above the variable's width. It is therefore most efficient to use 32-bit int or unsigned int local variables. When a function reads narrower data from external memory for calculation, the non-32-bit values should likewise be converted to 32 bits before the arithmetic. One caution: widening an 8-bit or 16-bit variable to 32 bits can hide an overflow that the original type would have exposed, so such changes need careful review.
Figure 2 Addition program for different types of local variables
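Figure 2 itself is not reproduced here. As a rough sketch of the kind of functions such a comparison is based on, the three versions below differ only in the type of the local variable. The function names are assumptions for illustration, and the instruction counts in the comments describe typical output for older ARM cores; newer architecture versions provide dedicated extend instructions, and the actual result depends on the compiler.

```c
/* 32-bit local: the increment maps to a single ADD on a 32-bit core. */
int add1_int(int a)
{
    return a + 1;
}

/* 16-bit signed local: the result has to be narrowed back to 16 bits,
   which typically costs an extra shift pair (or a sign-extend). */
short add1_short(short a)
{
    return (short)(a + 1);
}

/* 8-bit unsigned local: the result has to be masked back to 8 bits,
   typically one extra AND (or a zero-extend). */
unsigned char add1_uchar(unsigned char a)
{
    return (unsigned char)(a + 1);
}
```

Compiling these with the target toolchain and inspecting the generated assembly is the most reliable way to see how many instructions each version really costs.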
Programs frequently use switch case statements. At the machine level, each test and branch only decides what to do next, so every test that fails is processor time spent without doing useful work. To increase speed, the cases can be ordered by their relative frequency of occurrence: put the most likely case first and the least likely case last, which reduces the average execution time of the code.
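The ordering idea can be sketched as follows. The event codes and the assumption that EVT_DATA is by far the most frequent are hypothetical. An explicit if chain is used so that the test order is visible in the source; a compiler may turn a dense switch into a jump table, in which case the order of the case labels has little effect.

```c
/* Hypothetical event codes; EVT_DATA is assumed to be the most frequent. */
enum event { EVT_DATA, EVT_ACK, EVT_TIMEOUT, EVT_ERROR };

/* Tests are written in decreasing order of expected frequency, so the
   common case is decided after a single comparison. */
int handle_event(enum event e)
{
    if (e == EVT_DATA)    return 1;   /* most frequent */
    if (e == EVT_ACK)     return 2;
    if (e == EVT_TIMEOUT) return 3;
    return -1;                        /* EVT_ERROR or anything else: rarest */
}
```

The frequencies should come from profiling the real workload; guessing them wrong simply moves the cost to the case that matters.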
Usually, engineers always try to avoid using redundant variables to simplify the program. Generally, this is correct, but there are exceptions, as shown below:
int f(void);
int g(void);
// f() and g() do not access the global variable errs
int errs; // global variable

void test1(void)
{
    errs += f();
    errs += g();
}

void test2(void)
{
    int localerrs = errs; // define a redundant local variable
    localerrs += f();
    localerrs += g();
    errs = localerrs;
}
In the first case, test1(), every access to the global variable errs must first load it from memory into a register and then store it back to memory after the call to f() or g(), because the compiler generally cannot assume that the called functions leave errs untouched. In this example that means two load/store pairs. In the second case, test2(), the local variable localerrs is allocated to a register, so the whole function loads the global variable once and stores it once. Reducing the number of memory accesses in this way is very useful for improving system performance.
Processing of loop programs
Counting loops are commonly used flow control structures in programs. In C, for loops like the following are everywhere:
for(loop=1; loop<=limit; loop++)
Counting up matches the way people naturally think, so it is used far more often than the equivalent count-down form:
for(loop=limit; loop!=0; loop--)
Logically the two loops do the same work, but once they are mapped onto a specific architecture their cost can differ noticeably.
The count-up form needs one more instruction per iteration than the count-down form, and when the iteration count is large the difference in performance becomes significant. The underlying reason is that a comparison against a nonzero limit requires a separate CMP instruction, whereas when a counter is compared with zero, the ARM decrement instruction can set the condition flags itself and the branch can use the NE condition directly.
Loop unrolling is another way to reduce loop overhead. In many cases the compiler unrolls loops automatically, but for loops in which intermediate variables or results are modified the compiler often declines to unroll, and the engineer then has to do the unrolling by hand.
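As a minimal sketch of unrolling by hand, the two functions below compute the same sum; the second processes four elements per iteration and finishes the leftovers in a short tail loop. The function names and the unroll factor of four are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Straightforward loop: one loop test and branch per element. */
long sum_simple(const int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Manually unrolled by four: one loop test and branch per four elements. */
long sum_unrolled4(const int *data, size_t n)
{
    long sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += data[i];
        sum += data[i + 1];
        sum += data[i + 2];
        sum += data[i + 3];
    }
    for (; i < n; i++)      /* handle the 0 to 3 leftover elements */
        sum += data[i];
    return sum;
}
```

Whether the unrolled version is actually faster should be measured on the target, since the larger code size has a cost of its own.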
It is especially worth noting that on CPUs with an internal instruction cache (such as the ARM946E-S), unrolled code can become so large that it no longer fits in the cache. The unrolled code is then repeatedly evicted and fetched again from memory, and because the cache is much faster than memory, the unrolled loop ends up running more slowly than the original. Loop unrolling can also interfere with vector operation optimization.
ARM processor cores can test a result against zero and branch on it very cheaply. If the loop does not depend on the direction of iteration, it can therefore count from large to small. Note that if the loop variable i is also used in pointer or array indexing, reversing the direction can cause a serious out-of-bounds access (for example i = MAX+1). This can be corrected by adding to or subtracting from i, but the extra arithmetic may cancel out the efficiency gain.
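One way to count down without the indexing hazard is to let the counter only count and let a separate pointer walk the data. The sketch below is an illustration under that assumption; the function names are hypothetical, and whether the count-down form saves an instruction depends on the compiler and target.

```c
#include <stddef.h>

/* Count-up version: the loop test compares i against n every iteration. */
long sum_count_up(const int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Count-down version: the loop ends when the counter reaches zero.
   The counter is never used as an index, so it cannot run past the
   end of the array; the pointer visits data[0] .. data[n-1] in order. */
long sum_count_down(const int *data, size_t n)
{
    long sum = 0;
    while (n != 0) {
        sum += *data++;
        n--;
    }
    return sum;
}
```

Both functions visit the elements in the same order, so the change affects only how the loop terminates.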
Conclusion
This article has summarized several programming techniques for developing efficient embedded ARM programs. Applied in real embedded system development, they can noticeably improve system performance, especially in computationally intensive applications such as multimedia and communications, and they can serve as practical guidance for program design.