When developing programs to run on embedded processor cores within SoCs, engineers have two main goals: make the code run fast enough that the processor clock frequency can be kept as low as possible, and make it consume as little memory as possible to minimize memory overhead.
The relative importance of these two goals varies from project to project. Two key factors greatly influence the design team's ability to meet them: how well the compiler optimizes the source code, and the programming style used to write that source code. This article discusses both factors in depth and offers some suggestions for writing small, fast C programs.
Compiler Principles
Compilers usually consist of two parts: the front end and the back end. The front end handles syntactic and semantic analysis, and the back end handles general optimization, code generation, and processor-specific optimization. Many good compiler back ends rely on multiple levels of intermediate representation (IR). Optimization and code generation lower the intermediate representation step by step from a high level (close to the syntax of the input program) to a low level (close to the machine code). Processor-independent optimizations tend to be performed on the higher-level IRs early in compilation, while processor-specific optimizations tend to be performed on the lower-level IRs later in compilation. Information is passed down through the IR levels, so that low-level optimizations can make full use of the high-level information the compiler gathered earlier.
Tensilica's XCC C/C++ compiler for its Xtensa configurable processors and Diamond standard processors provides four basic optimization levels, from -O0 to -O3, in order of increasing optimization. Table 1 describes these levels together with the corresponding code-size and interprocedural analysis (IPA) options. By default, the XCC compiler optimizes one file at a time, but it can also perform interprocedural analysis (by adding the -ipa compile option), which optimizes the entire application across multiple source files; in that case, optimization is deferred until the link step. Table 2 gives a partial list of the optimizations supported by current compilers (including the XCC compiler).
The XCC compiler can also make use of profiling data generated from a previously compiled version of the program. Profile feedback helps the compiler reduce branch delays. In addition, the feedback lets the compiler inline only the most frequently called functions and handle register spilling sensibly in frequently executed code. Profile feedback therefore allows the XCC compiler to apply ordinary optimizations everywhere while optimizing the critical parts of the application for speed.
Some useful C coding rules
To get the best performance from the compiler, the programmer needs to think like a compiler and understand the relationship between the C language and the target processor. The following basic rules can help any embedded programmer get much better-performing compiled code without much effort.
1. Observe the compiled code
It is impossible to know in advance exactly how the compiler will compile every piece of code. If the XCC compiler is run with the -S or -save-temps option, it produces assembly output annotated with explanatory comments. For code with high performance requirements, inspect this output to check whether the result meets your expectations. If it does not, consider the following rules.
2. Understand when aliasing occurs
The C language makes aliasing likely by allowing pointers to be used freely, so a program can refer to the same data object in many ways. If the address of a global variable is passed as an argument to a subroutine, the variable can be referenced either by its name or through the pointer. This is a form of aliasing, and the compiler must be conservative: it keeps such data objects in memory rather than in registers and carefully preserves the order of any accesses that might alias. Consider the following code:
void foo(int *a, int *b)
{
    int i;
    for (i=0; i<100; i++) {
        *a += b[i];
    }
}
You might expect the compiler to generate code that loads *a into a register before the loop starts, adds each b[i] to that register inside the loop, and stores the result back to *a once after the loop. In fact, the compiler generates code that loads and stores *a in memory on every iteration, because a and b may alias: *a may be an element of the array b. Although such aliasing seems unlikely in this example, the compiler cannot be sure that it will not happen. Several techniques help the compiler generate better code in the presence of possible aliasing: you can compile with the -ipa option, use global variables instead of parameters, compile with special compiler options, or use the __restrict qualifier when declaring variables.
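One further workaround for foo, offered here only as a sketch, is to accumulate into a local variable and write the result back once. The name foo_local is hypothetical, and the rewrite is valid only if the caller guarantees that a does not point into b; if the two overlap, this version and the original produce different results.

```c
/* Sketch: keep the running sum in a local so it can live in a register.
 * Assumption: *a is not an element of b[0..99].
 * foo_local is a hypothetical name, not part of the original article. */
void foo_local(int *a, const int *b)
{
    int i;
    int sum = *a;           /* one load before the loop */

    for (i = 0; i < 100; i++) {
        sum += b[i];        /* no store to *a inside the loop */
    }
    *a = sum;               /* one store after the loop */
}
```

Because sum never has its address taken, nothing in the loop can alias it, so the compiler is free to keep it in a register.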
3. Pointers often cause aliasing
The compiler often has trouble identifying the object a pointer points to. The programmer can help the compiler avoid aliasing by using local variables to hold values obtained through pointers, because stores through pointers and function calls can affect the object a pointer refers to but not a local variable whose address has not been taken. The compiler can therefore keep such local variables in registers.
The following example shows how to use pointers carefully to avoid aliasing and produce better compiled code. Here the optimizer does not know whether *p++ = 0 will modify len, so it cannot keep len in a register to improve performance. Instead, len is reloaded from memory on every iteration.
int len = 10;
void
zero(char *p)
{
    int i;
    for (i=0; i<len; i++) *p++ = 0;
}
Copying the global variable into a local variable avoids the aliasing.
int len = 10;
void
zero(char *p)
{
    int local_len = len;
    int i;
    for (i=0; i< local_len; i++) *p++ = 0;
}
4. Use const and restrict qualifiers
The __restrict qualifier tells the compiler to assume that the qualified pointer is the only way to access a particular region of memory or data object. Loads and stores through such a pointer cannot alias any other load or store in the function that does not go through the same pointer. For example:
float x[ARRAY_SIZE];
float *c = x;
void f4_opt(int n, float * __restrict a, float * __restrict b)
{
    int i;
    /* No data dependence across iterations because of __restrict */
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
5. Use local variables instead of global variables
This is because global variables retain their values throughout the life of the program. The compiler must assume that global variables may be accessed through pointers. Consider the following code:
int g;
void foo()
{
    int i;
    for (i=0; i<100; i++){
        fred(i,g);
    }
}
Ideally, g would be loaded once before the loop, and its value passed to fred in a register on every call. However, the compiler does not know whether fred modifies g, so it must reload g from memory before each call. If fred does not modify g, copy it into a local variable as shown below. This avoids reloading g each time fred is called.
int g;
void foo()
{
    int i, local_g=g;
    for (i=0; i<100; i++){
        fred(i,local_g);
    }
}
6. Use the correct data type for the data structure
C programmers often make assumptions about data types, and the compiler must generate code that honors them. For example, on almost all modern computer architectures, an unsigned char uses 8 bits to represent values from 0 to 255. A C program may assume that adding 1 to an unsigned char holding 255 yields 0. However, modern 32-bit processors generally perform 32-bit additions, not 8-bit additions. Therefore, when an unsigned char local variable is incremented, the compiler must emit extra instructions to truncate the result back to 8 bits after the addition. For this reason, use int for variables wherever possible, especially loop index variables.
Additionally, many embedded processors have 16-bit multiply instructions but lack 32-bit multiply instructions. In this case, 32-bit multiplication will be emulated, which is generally slower. If the data being multiplied will not exceed 16 bits of precision, use short or unsigned short variables.
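The two data-type recommendations above can be sketched as follows. The function names sum_bytes and scale_sample are hypothetical, and the fixed-width types from stdint.h are used only to make the 16-bit assumption explicit. Whether a given compiler actually selects a 16-bit multiply instruction depends on the target and the optimization level.

```c
#include <stdint.h>

/* An int loop index needs no truncation back to 8 bits after each increment. */
int sum_bytes(const unsigned char *data, int n)
{
    int i;
    int total = 0;

    for (i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Both operands are known to fit in 16 bits, so a compiler targeting a
 * processor with a 16x16 multiply has the information it needs to use it. */
int32_t scale_sample(int16_t sample, int16_t gain)
{
    return (int32_t)sample * (int32_t)gain;
}
```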
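7. Avoid indirect calls
Avoid calling functions through function pointers, including function pointers passed as arguments. The compiler usually cannot tell which function will be called, so it must assume the call may have unpredictable side effects (such as modifying global variables), which makes optimization difficult.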
8. Write functions that return values instead of pointers
9. Pass arguments by value instead of through pointers or global variables
Pointers should be used only when passing large data structures, because every structure passed by value must be copied in full when the function is called.
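Rules 8 and 9 can be illustrated together with a small sketch. All names here (add_ptr, add_val, Frame, first_sample) are hypothetical and chosen only for illustration.

```c
/* Pointer-based version: operands and result all travel through memory. */
void add_ptr(const int *x, const int *y, int *result)
{
    *result = *x + *y;
}

/* Value-based version: arguments and the return value can stay in registers. */
int add_val(int x, int y)
{
    return x + y;
}

/* Large structures are the exception: pass a pointer to avoid a full copy. */
typedef struct {
    int samples[256];
} Frame;

int first_sample(const Frame *f)
{
    return f->samples[0];
}
```

The value-based add_val gives the compiler no pointers to reason about, while first_sample shows the case where a pointer is still the cheaper choice.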
10. Taking the address of a variable degrades performance
Once the address of a local variable has been taken, the variable can be aliased just like a global variable, so the compiler may have to keep it in memory.
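When an interface forces you to pass an address, one way to limit the cost is to confine the address-taking to a short-lived temporary, as in this sketch. The helper read_sensor and the function accumulate are hypothetical stand-ins for any API that returns a value through a pointer.

```c
/* Hypothetical helper that returns its result through a pointer. */
static void read_sensor(int *out)
{
    *out = 3;
}

int accumulate(int n)
{
    int tmp;            /* only tmp has its address taken */
    int total = 0;      /* total's address is never taken, so it can stay in a register */
    int i;

    for (i = 0; i < n; i++) {
        read_sensor(&tmp);
        total += tmp;
    }
    return total;
}
```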
11. Declare pointer parameters with const
If the object a pointer parameter points to is not modified within the function body, declare the parameter as a pointer to const. This lets the compiler avoid unnecessarily conservative assumptions.
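A minimal sketch of this rule, using a hypothetical dot-product function named dot. How much the qualifier helps in practice depends on the compiler; at the very least it documents that the function only reads its inputs.

```c
/* Both arrays are read-only inside the function, so both are pointers to const. */
int dot(const short *x, const short *y, int n)
{
    int i;
    int sum = 0;

    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```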
12. Use arrays instead of pointers
Consider the following code, which accesses an array through a pointer:
for (i=0; i<100; i++)
    *p++ = ...
Each iteration stores through *p, and this store through the pointer hinders optimization. In some cases the pointer may point to itself, so the store would modify the pointer's own value, which forces the compiler to reload the pointer on every iteration. In addition, the compiler cannot be sure that the pointer is not used after the loop, so the pointer itself must be updated with its incremented value. It is therefore better to write:
for (i=0; i<100; i++)
    p[i] = ...
13. Write simple and understandable code
Compilers are good at applying complex optimizations, such as function inlining and loop unrolling, when they are appropriate. Compilers are not good at simplifying code: they will not re-roll a loop that was unrolled by hand or undo inlining that was done by hand. Manually unrolling loops in source code to suit one processor architecture reduces portability, because it prevents the compiler from automatically choosing the right loop unrolling and function inlining for other processor architectures.
14. Avoid writing functions with a variable number of parameters
If you must do this, use the ANSI standard method: stdarg.h.
Use data tables instead of if-then-else or switch branches. For example, consider the following code:
typedef enum { BLUE, GREEN, RED, NCOLORS } COLOR;
Instead of
switch (c) {
    case BLUE: x = 5; break;
    case GREEN: x = 10; break;
    case RED: x = 1; break;
}
use
static int Mapping[NCOLORS] = { 5, 10, 1 };
...
x = Mapping[c];
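Returning to rule 14: where a variadic function cannot be avoided, a minimal stdarg.h sketch might look like the following. The function name sum_ints is hypothetical, and the caller is assumed to pass exactly count arguments of type int.

```c
#include <stdarg.h>

/* Sums `count` int arguments using the standard stdarg.h macros.
 * Assumption: the caller passes exactly `count` values of type int. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++) {
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```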
15. Rely on libc library functions (such as strcpy, strlen, strcmp, bcopy, bzero, memset, and memcpy). These functions are carefully optimized.
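As a small illustration, the hand-written zero loop from rule 3 could be replaced by a call to memset. The name zero_buf is hypothetical, and the length is passed as a parameter here rather than read from the global len.

```c
#include <string.h>

/* Zero `len` bytes starting at p with the library routine
 * instead of a hand-written byte loop. */
void zero_buf(char *p, size_t len)
{
    memset(p, 0, len);
}
```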
Table 1: Some XCC C/C++ compiler optimization switches
Conclusion
Compiler designers have developed many sophisticated optimizations to get the most performance out of the latest processors, and they continue to develop smarter optimization algorithms. Application developers can take advantage of as many of these optimizations as possible by following the proper programming rules.
Table 2: Optimization methods used by some modern compilers