C Programming Optimization for Embedded SoC Applications

Publisher: 千变万化    Latest update time: 2012-04-26    Source: 21ic    Keywords: Embedded

When developing programs that run on embedded processor cores within SoCs, engineers have two main goals: make the code fast enough that the processor clock frequency can be kept as low as possible, and use as little memory as possible to minimize memory overhead.

The relative importance of these two goals varies from project to project. Two key factors greatly influence the design team's ability to meet them: how well the compiler optimizes the source code, and the programming style used to write it. This article discusses both factors in depth and offers some suggestions for writing small, fast C programs.

Compiler Principles

A compiler usually consists of two parts, a front end and a back end. The front end performs syntactic and semantic analysis; the back end performs optimization, code generation, and processor-specific optimization. Many good compiler back ends rely on multiple levels of intermediate representation (IR). Optimization and code generation lower the intermediate representation step by step from a high level (close to the syntax of the input program) to a low level. Processor-independent optimizations tend to be performed on the higher-level IR early in compilation, while processor-specific optimizations tend to be performed on the lower-level IR later. Information is passed down through the IR levels, so low-level optimizations can make full use of the high-level information the compiler gathered earlier.

Tensilica's XCC C/C++ compiler for its Xtensa configurable processors and Diamond Standard processors provides four basic optimization levels, -O0 to -O3, in order of increasing optimization. Table 1 describes these levels along with the corresponding code-size and interprocedural analysis (IPA) options. By default, the XCC compiler optimizes one file at a time, but it can also perform interprocedural analysis (enabled with the IPA compile option). In that case the whole application is optimized across its source files, and optimization is deferred until after the link step. Table 2 gives a partial list of the optimizations supported by current compilers (including the XCC compiler).

The XCC compiler can also make use of profiling data. Profiling feedback helps the compiler reduce branch delays. It also lets the compiler inline only the most frequently called functions and handle register spilling sensibly in frequently executed code. As a result, profiling feedback lets the XCC compiler apply its normal optimizations everywhere while optimizing the critical parts of the application for speed.

Some useful C coding rules

To get the best performance from the compiler, the programmer needs to think like a compiler and understand the relationship between the C language and the target processor. The following basic rules can help any embedded programmer get much faster compiled code without much effort.

1. Inspect the compiled code

It is impossible to predict exactly how the compiler will translate every piece of code. When the XCC compiler is run with the -S or -save-temps option, it produces assembly output annotated with explanatory comments. For performance-critical code, check whether the generated code matches your expectations. If it does not, consider the following rules.

2. Understand when aliasing occurs

The C language makes aliasing easy because it allows almost unrestricted use of pointers, so a program can refer to the same data object in many different ways. If the address of a global variable is passed as an argument to a subroutine, the variable can be referenced either by its name or through the pointer. This is a form of aliasing, and the compiler must be conservative: it keeps such data objects in memory rather than in registers and carefully preserves the order of any accesses that might alias. Consider the following code:

void foo(int *a, int *b)
{
    int i;
    for (i=0; i<100; i++) {
        *a += b[i];
    }
}

You might expect the compiler to load *a into a register before the loop starts, then in each iteration load b[i] into a register and add it to the register holding *a. In fact, the compiler generates code that keeps *a in memory, because a and b may alias: *a could be one of the elements of the array b. Although such aliasing seems unlikely in this example, the compiler cannot rule it out. Several techniques help the compiler generate better code when aliasing is possible: compile with the -IPA option, use global variables instead of parameters, compile with special compile options, or use the __restrict qualifier when declaring variables.
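A further option that needs no compiler-specific feature is to accumulate in a local variable and write the result back once. The sketch below is an illustration only, not XCC output, and foo_local is a hypothetical name. It is equivalent to foo only when a does not point into b, which is exactly the assumption the compiler is not allowed to make on its own.

```c
/* Original form: *a must be re-read and re-written on every iteration,
   because a may point at one of the elements of b. */
void foo(int *a, int *b)
{
    int i;
    for (i = 0; i < 100; i++) {
        *a += b[i];
    }
}

/* Hypothetical rewrite: the running sum lives in a local variable
   whose address is never taken, so it can stay in a register. */
void foo_local(int *a, const int *b)
{
    int i;
    int sum = *a;
    for (i = 0; i < 100; i++) {
        sum += b[i];
    }
    *a = sum;
}
```

When a and b do not overlap, both functions store the same total. If a caller passes a pointer into b itself, for example foo(&b[50], b), the two functions produce different results, which is why the compiler cannot perform this rewrite automatically.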

3. Pointers often cause aliasing

The compiler often has trouble identifying the object a pointer refers to. The programmer can help the compiler avoid aliasing by copying values obtained through pointers into local variables, because indirect stores and function calls can change the value a pointer refers to but not the value of a local variable. The compiler can therefore keep local variables in registers.

The following example shows the problem. The optimizer does not know whether *p++ = 0 will modify len, so it cannot keep len in a register to improve performance. Instead, len is reloaded from memory on every iteration of the loop.

int len = 10;

void
zero(char *p)
{
    int i;
    for (i=0; i<len; i++) *p++ = 0;
}

Copying the global variable into a local variable avoids the aliasing problem:

int len = 10;

void
zero(char *p)
{
    int local_len = len;
    int i;
    for (i=0; i< local_len; i++) *p++ = 0;
}
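The compiler has to be this cautious because a char pointer is allowed to point at any object, including len itself. The sketch below makes that case concrete. zero_global is a hypothetical name for the first version of zero (the one whose loop bound is read from the global len), and the caller that passes the address of len is contrived purely for illustration.

```c
int len = 10;

/* First version: len is re-read from memory on every iteration. */
void zero_global(char *p)
{
    int i;
    for (i = 0; i < len; i++) *p++ = 0;
}

/* Contrived caller: p points at the bytes of len itself. */
void zero_len_itself(void)
{
    zero_global((char *)&len);
}
```

After zero_len_itself returns, len is 0: the loop overwrote the bytes of len and stopped as soon as the value it was testing dropped to zero. The local_len version would not notice the change and would keep writing for all ten iterations, so the two versions are interchangeable only when p never points at len.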

4. Use const and restrict qualifiers

The __restrict qualifier tells the compiler to assume that the qualified pointer is the only way to access a given region of memory or data object. Loads and stores through this pointer will not alias any other loads and stores in the function, except those made through the same pointer. For example:

float x[ARRAY_SIZE];
float *c = x;

void f4_opt(int n, float * __restrict a, float * __restrict b)
{
    int i;
    /* No data dependence across iterations because of __restrict */
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
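__restrict is a compiler-specific spelling; standard C99 provides the same qualifier as the keyword restrict. A minimal sketch, assuming a C99 compiler, follows. add_arrays and its parameter names are hypothetical and do not come from the original example.

```c
/* Hypothetical example: dst, x and y are promised not to overlap,
   so the compiler may load x[i] and y[i] ahead of earlier stores. */
void add_arrays(int n, float *restrict dst,
                const float *restrict x, const float *restrict y)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}
```

The promise is the caller's responsibility. If the arrays passed in actually overlap, the behavior is undefined, so the qualifier should be added only where the non-overlap is guaranteed by design.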

5. Use local variables instead of global variables

Global variables retain their values throughout the life of the program, and the compiler must assume that they may be accessed through pointers. Consider the following code:

int g;

void foo()
{
    int i;
    for (i=0; i<100; i++){
        fred(i,g);
    }
}

Ideally, g would be loaded once before the loop and its value passed to fred in a register on each call. However, the compiler does not know whether fred modifies g, so it must reload g from memory before every call. If fred does not modify g, use a local variable as shown below. This avoids reloading g into a register each time fred is called.

int g;

void foo()
{
    int i, local_g=g;
    for (i=0; i<100; i++){
        fred(i,local_g);
    }
}

6. Use the correct data type for the data structure

C programmers often make assumptions about data types, and the compiler has to honor those assumptions. For example, on almost all modern computer architectures an unsigned char uses 8 bits to represent values from 0 to 255, and a C program may rely on the fact that adding 1 to an unsigned char holding 255 yields 0. In reality, modern 32-bit processors do not perform 8-bit additions, only 32-bit additions. If arithmetic is performed on an unsigned char local variable, the compiler must therefore emit extra instructions to bring the result back into the 8-bit range after the addition (masking for unsigned types, sign extension for signed ones). For this reason, use int wherever possible, especially for loop index variables.

In addition, many embedded processors have 16-bit multiply instructions but lack 32-bit multiply instructions. On such processors a 32-bit multiplication is emulated in software, which is generally slower. If the values being multiplied never need more than 16 bits of precision, use short or unsigned short variables.
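The effect of type width can be sketched with a few small functions. All names here are hypothetical illustrations, and how many extra instructions the narrow types cost depends on the processor and compiler.

```c
#include <stdint.h>

/* 8-bit index: after each increment the compiler must make sure the
   value still behaves like an 8-bit quantity (255 + 1 wraps to 0). */
int sum_u8_index(const int *v, unsigned char n)
{
    unsigned char i;
    int sum = 0;
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* int index: matches the natural register width of a 32-bit core. */
int sum_int_index(const int *v, int n)
{
    int i;
    int sum = 0;
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* 16-bit operands: the product of two 16-bit values always fits in
   32 bits, so a 16x16 multiply instruction can be used if one exists. */
int32_t mul16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* The wraparound that C guarantees for unsigned char
   (assuming the usual 8-bit char). */
unsigned char next_u8(unsigned char c)
{
    return (unsigned char)(c + 1);
}
```

Both summing functions return the same result for small arrays; the difference is only in the code the compiler has to generate around the index variable.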

7. Don’t use indirect calls

Avoid calling functions through function pointers, including function pointers passed as arguments. The compiler cannot tell which function will be called, so it must assume unpredictable side effects (such as modification of global variables), which makes optimization difficult.
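As a sketch of the difference (all names here are hypothetical): when the operation is selected through a function pointer, the compiler usually cannot see which function runs, so it can neither inline it nor rule out side effects. A direct call to a visible function leaves both possibilities open.

```c
static int scale2(int x) { return 2 * x; }

/* Indirect: op could be any function, including one that modifies
   globals, so the call is hard to optimize around. */
int apply_indirect(int (*op)(int), const int *v, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += op(v[i]);
    return sum;
}

/* Direct: the callee is known at compile time and can be inlined. */
int apply_direct(const int *v, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += scale2(v[i]);
    return sum;
}
```

A call such as apply_indirect(scale2, v, n) computes the same value as apply_direct(v, n); only the amount of information available to the optimizer differs.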

8. Write functions that return values instead of pointers

9. Pass arguments by value instead of through pointers or global variables

Pointers should be used only for passing large data structures, because every structure passed by value is copied in full at the function call.
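Rules 8 and 9 can be sketched together, using hypothetical names. The first function reports its result through an output pointer, the second takes a scalar by value and returns the result, and the third shows the one case where a pointer is the better choice: a large structure that would otherwise be copied.

```c
/* Result written through a pointer: *out may alias other data. */
void square_out(const int *x, int *out)
{
    *out = (*x) * (*x);
}

/* Preferred: scalar in, scalar out, everything can stay in registers. */
int square(int x)
{
    return x * x;
}

/* Large structure: pass a pointer instead of copying 256 ints. */
struct samples {
    int data[256];
};

int first_sample(const struct samples *s)
{
    return s->data[0];
}
```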

10. Taking the address of a variable degrades performance

Once the address of a local variable has been taken, the variable can be aliased just like a global variable.

11. Declare pointer parameters with const

If the object a pointer refers to is not modified within the function body, declare the parameter as a pointer to const. This allows the compiler to avoid unnecessarily pessimistic assumptions.
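A minimal sketch of such a declaration follows; max_value is a hypothetical function. How much the qualifier helps the optimizer depends on the compiler, but it also documents the interface and lets the compiler reject accidental writes.

```c
/* The pointed-to data is read-only inside this function; an accidental
   write such as v[0] = 0 would be rejected at compile time. */
int max_value(const int *v, int n)
{
    int i;
    int best = v[0];          /* assumes n >= 1 */
    for (i = 1; i < n; i++)
        if (v[i] > best)
            best = v[i];
    return best;
}
```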

12. Use arrays instead of pointers. Consider the following code that accesses an array through a pointer.

for (i=0; i<100; i++)
    *p++ = ...

Each iteration stores through *p, and this store through the pointer hinders optimization. In some cases the pointer may point to itself, so the store could modify the pointer's own value, which forces the compiler to reload the pointer on every iteration. In addition, the compiler cannot be sure that the pointer is not used after the loop, so the incremented value must be written back to the pointer when the loop exits. It is therefore better to write:

for (i=0; i<100; i++)
    p[i] = ...

13. Write simple and understandable code

Compilers are good at applying complex optimizations, such as function inlining and loop unrolling, where appropriate. Compilers are not good at simplifying code: they will not re-roll a manually unrolled loop or undo manual inlining. Unrolling loops by hand in the source code to suit one processor architecture reduces portability, because it prevents the compiler from automatically choosing the right loop unrolling and function inlining for other processor architectures.
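The contrast can be sketched with two versions of the same summation; both function names are hypothetical. The first leaves the unrolling decision to the compiler, while the second fixes the unroll factor in the source.

```c
/* Simple form: the compiler is free to unroll this by whatever factor
   suits the target processor. */
int sum_simple(const int *v, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Manually unrolled by 4: more code, a hard-wired unroll factor, and a
   clean-up loop for the leftover elements. */
int sum_unrolled4(const int *v, int n)
{
    int i, sum = 0;
    for (i = 0; i + 3 < n; i += 4)
        sum += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        sum += v[i];
    return sum;
}
```

Both functions return the same value for any n; the manual version simply commits to a factor of four regardless of what the target processor prefers.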

14. Avoid writing functions with a variable number of parameters

If you must do this, use the ANSI standard method: stdarg.h.

Use data tables instead of if-then-else or switch branches. For example, consider the following code:

typedef enum { BLUE, GREEN, RED, NCOLORS } COLOR;

Instead of

switch (c) {
    case BLUE: x = 5; break;
    case GREEN: x = 10; break;
    case RED: x = 1; break;
}

use

static int Mapping[NCOLORS] = { 5, 10, 1 };
...
x = Mapping[c];
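Returning to rule 14: where a variadic function cannot be avoided, the stdarg.h approach looks roughly like the sketch below. sum_ints is a hypothetical function, and it assumes the caller passes exactly count arguments of type int.

```c
#include <stdarg.h>

/* Hypothetical variadic function: sums `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int i, sum = 0;

    va_start(ap, count);          /* start after the last named parameter */
    for (i = 0; i < count; i++)
        sum += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);                   /* required before returning */
    return sum;
}
```

A call such as sum_ints(3, 1, 2, 3) walks the three trailing arguments in order. Nothing checks that the count and the argument types match, which is one reason such functions are best avoided.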

15. Rely on libc library functions (such as strcpy, strlen, strcmp, bcopy, bzero, memset and memcpy). These functions are carefully optimized.
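As an illustration, the two functions below do the same work, once with hand-written loops and once with library routines. The function names and BUF_LEN are hypothetical, and both functions assume dst points to at least BUF_LEN bytes.

```c
#include <string.h>

#define BUF_LEN 64   /* hypothetical buffer size */

/* Hand-written loops: clear the buffer, then copy the string. */
void clear_and_copy_loops(char *dst, const char *src)
{
    int i;
    for (i = 0; i < BUF_LEN; i++)
        dst[i] = 0;
    for (i = 0; i < BUF_LEN - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
}

/* The same work done by library routines. */
void clear_and_copy_libc(char *dst, const char *src)
{
    size_t n = strlen(src);
    if (n > (size_t)(BUF_LEN - 1))
        n = (size_t)(BUF_LEN - 1);   /* leave room for the terminator */
    memset(dst, 0, BUF_LEN);
    memcpy(dst, src, n);
}
```

Both versions leave dst holding a zero-padded, null-terminated copy of src, truncated to BUF_LEN - 1 characters if necessary.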


Table 1: Some XCC C/C++ compiler optimization switches

Conclusion

Compiler designers have developed many sophisticated optimizations to get the most performance out of the latest processors, and they continue to develop smarter optimization algorithms. Application developers can take advantage of as many of these optimizations as possible by following sound coding rules.


Table 2: Optimization methods used by some modern compilers