Efficient C programming under ARM

Writing C in a particular style helps the C compiler generate faster ARM code. The key performance points are:

1. Use signed or unsigned int for local variables, function parameters, and return values. This avoids type conversions and makes efficient use of ARM's 32-bit data processing instructions.

2. The most efficient loop form is a do-while loop that counts down to zero.

3. Unroll important loops to reduce loop overhead.

4. Do not rely on the compiler to optimize away repeated memory accesses; pointer aliasing often prevents it from doing so.

5. Try to limit the number of function parameters to four. A function call is much faster when all of its parameters are passed in registers.

6. Arrange structure members by size, from smallest to largest, especially when compiling for Thumb.

7. Do not use bit fields; use masks and logical operations instead.

8. Avoid division; where possible, multiply by a precomputed reciprocal instead.

9. Avoid misaligned data. If data may be misaligned, access it through a char * pointer.

10. Use inline assembly to exploit instructions or optimizations that the C compiler does not otherwise generate.

1. Optimization of data type usage

1. Local variables

It is commonly assumed that a char variable takes up less register space or less stack space than an int. Both assumptions are wrong for ARM: all ARM registers are 32 bits wide, and every stack entry is at least 32 bits. Worse, to implement i++ exactly when i is a char, the compiler must handle the case i = 255, where i++ must wrap around to 0. This typically costs an extra instruction on every increment.
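To make the cost concrete, here is a minimal sketch (the function names checksum_char and checksum_uint are hypothetical). Both functions sum 64 words; the only difference is the type of the loop counter. An 8-bit counter forces the compiler to reproduce 8-bit wraparound on every i++, which on ARM typically means an extra AND with 0xFF inside the loop; a 32-bit counter needs no narrowing. Since plain char is unsigned by default on ARM, unsigned char is used here to show the same effect portably. The exact instructions depend on the compiler and optimization level.

```c
/* 8-bit loop counter: after each i++ the compiler must keep i in
   the range 0..255 (typically an extra AND #0xFF on ARM). */
int checksum_char(const int *data)
{
    unsigned char i;
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* Same loop with a 32-bit counter: no narrowing after i++. */
int checksum_uint(const int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}
```

Both return the same result; only the generated loop body differs.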

2. Function parameters

Wide and narrow calling conventions each have their advantages, but with either one, function parameters and return values of type char or short incur extra sign- or zero-extension instructions, which degrade performance and increase code size. Therefore, even when passing an 8-bit value, it is more efficient to use int for function parameters and return values.
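A minimal sketch of the difference, with hypothetical function names: add_short must sign-extend or truncate its 16-bit values at the call boundary (whether the caller or the callee does this depends on the calling convention), while add_int works directly on 32-bit registers.

```c
/* Narrow parameters and return value: the result must be truncated
   to 16 bits, and the arguments sign-extended, at the call boundary. */
short add_short(short a, short b)
{
    return (short)(a + (b >> 1));
}

/* Same arithmetic on int: values stay in 32-bit registers. */
int add_int(int a, int b)
{
    return a + (b >> 1);
}
```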

3. Summary

1) For local variables held in registers, do not use char or short unless you specifically need 8-bit or 16-bit modular (wraparound) arithmetic; use signed or unsigned int instead. Unsigned types are faster for division.

2) For arrays and global variables held in main memory, use the smallest type that can hold the data, to save storage. The ARMv4 architecture can load and store every data width efficiently, and arrays can be accessed efficiently with an incrementing pointer. For short arrays, avoid indexing from the array base (data[i]), because LDRH does not support a scaled register offset; step a pointer through the array instead.

3) When reading an array element or global variable into a local variable of a different type, or writing a local variable out to an array element or global variable of a different type, use an explicit cast. The cast makes clear that you are widening a narrow type stored in memory into a wider type in a register (or narrowing it on the way back), which the compiler can do quickly.

4) Implicit or explicit type conversions inside expressions usually cost extra instructions, so avoid them there. Loads and stores generally incur no extra cost, because they perform the conversion (sign or zero extension, or truncation) themselves.

5) Avoid char and short for function parameters and return values. Even when the value range is small, use int so that the compiler does not insert unnecessary conversions.
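The following sketch, with a hypothetical checksum_short function, applies points 2) to 5) together: the data stays in a 16-bit array to save memory, a post-incremented pointer replaces base-plus-index addressing, the halfword load does the widening, and the count and return value are 32-bit types.

```c
/* Sum n 16-bit values. The array stays narrow in memory, but all
   arithmetic happens on 32-bit int in registers. */
int checksum_short(const short *data, unsigned int n)
{
    int sum = 0;

    while (n != 0) {
        /* Post-incremented pointer instead of data[i]; the explicit
           cast documents the widening done by the halfword load. */
        sum += (int)*data++;
        n--;
    }
    return sum;
}
```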

2. C loop structure

On ARM, the overhead of a loop can be as small as two instructions:

a subtract instruction that decrements the loop counter and sets the condition flags on the result;

a conditional branch instruction.

The key is that the loop terminates when the counter reaches zero, rather than when it counts up to some limit. Because the subtract instruction already records the result's status in the condition flags, no separate compare with zero is needed. Counting down causes no problem here because i is not used as an array index.

In summary, whether the loop counter is signed or unsigned, use i != 0 as the loop continuation condition. For a signed i, this takes one instruction fewer than the condition i > 0.
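As a sketch of the count-down form (checksum_down is a hypothetical name), the loop below sums 64 words. The counter i is only a trip count, not an index, so it can run from 64 down to 0, and the test i != 0 can reuse the flags set by the decrement.

```c
/* Count-down loop: the decrement sets the condition flags and the
   branch tests them directly, so no separate compare is needed. */
int checksum_down(const int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 64; i != 0; i--)
        sum += *data++;   /* i is not an index, so counting down is free */
    return sum;
}
```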

Summary:

1) Use loops that count down to zero, so that the compiler does not need a register to hold the termination value and can omit the compare with zero.

2) Use an unsigned loop counter with the continuation condition i != 0 rather than i > 0; this keeps the loop overhead to two instructions.

3) If you know the loop body will execute at least once, use a do-while loop rather than a for loop, because the compiler can then skip the check for a zero count before the first iteration.

4) Unrolling important loops reduces loop overhead, but do not overdo it. If loop overhead is only a small fraction of the total run time, unrolling increases code size and can hurt cache performance.

5) Where possible, make array sizes a multiple of 4 or 8, so that loops can be unrolled 2, 4, or 8 times without extra code for leftover elements.
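The points above can be combined in one sketch, assuming (hypothetically) that the caller guarantees n is a nonzero multiple of 4: a do-while loop avoids the initial zero test, the counter runs down to zero, and the body is unrolled four times so the loop overhead is paid once per four elements. The name checksum_unrolled is illustrative.

```c
/* Sum n words. Assumption: n is a nonzero multiple of 4; the
   caller must guarantee this, since there is no leftover handling. */
int checksum_unrolled(const int *data, unsigned int n)
{
    int sum = 0;

    do {
        /* Four elements per iteration: one decrement and one
           branch cover four additions. */
        sum += *data++;
        sum += *data++;
        sum += *data++;
        sum += *data++;
        n -= 4;
    } while (n != 0);
    return sum;
}
```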

3. Register Allocation

Efficient register allocation

Try to limit the number of local variables used in a function's inner loop to 12 or fewer, so that the compiler can keep all of them in ARM registers.

4. Function call

The four-register rule

Functions with four or fewer parameters are much more efficient than functions with more than four. For functions with up to four parameters, the compiler can pass all parameters in registers (r0 to r3); for functions with more than four, the caller and the callee must pass the remaining parameters through the stack.

If the called function is small and uses only a few registers, there are other ways to reduce call overhead. You can put the caller and the callee in the same C file, so that the compiler knows the code generated for the callee and can use that knowledge to optimize the caller.

Summary:

1) Limit the number of function parameters to four or fewer so that calls are more efficient. You can also group several related parameters into a structure and pass a pointer to the structure instead of the individual parameters.

2) Put small callees in the same source file as their callers, and define each callee before the code that calls it. The compiler can then optimize the call or inline the small function.

3) Performance-critical functions can be inlined with the __inline keyword (ARM compiler) or the C99 inline keyword.
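A sketch of grouping parameters, using a hypothetical rect_params structure and fill_rect function: filling a rectangle needs seven values (buffer, x, y, width, height, stride, color), which would push three of them onto the stack under the four-register rule. Passing a pointer to a structure needs only two argument registers. The small helper pixel_offset is defined before its caller in the same file so that the compiler can inline it.

```c
/* Hypothetical parameter block: together with buf, passing these six
   values separately would need seven arguments, more than r0-r3 hold. */
struct rect_params {
    int x, y;          /* top-left corner */
    int width, height; /* size in pixels */
    int stride;        /* bytes per row of the frame buffer */
    int color;         /* fill value */
};

/* Small helper defined before its caller in the same file,
   so the compiler can see its body and inline it. */
static inline int pixel_offset(int x, int y, int stride)
{
    return y * stride + x;
}

/* Two arguments in total: both are passed in registers. */
void fill_rect(unsigned char *buf, const struct rect_params *p)
{
    int row, col;

    for (row = 0; row < p->height; row++)
        for (col = 0; col < p->width; col++)
            buf[pixel_offset(p->x + col, p->y + row, p->stride)] =
                (unsigned char)p->color;
}
```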

5. Pointer aliases

Definition: two pointers that point to the same object are called aliases of that object. A write through one pointer affects what is read through the other. Inside a function, the compiler usually cannot tell which pointers are aliases and which are not, so it must conservatively assume that a write through one pointer may change the value seen through another.

Avoid pointer aliasing:

1) Do not rely on the compiler to eliminate common sub-expressions that contain memory accesses. Instead, create a new local variable to hold the value of the expression, so that the expression is evaluated only once.

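2) Avoid taking the address of a local variable. Once its address is taken, the compiler must assume that any function call or write through a pointer may change the variable, so it has to keep the variable in memory and reload it after such operations, which makes access slower.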
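This sketch (timers_v1 and timers_v2 are hypothetical names) shows why aliasing blocks optimization. In timers_v1 the compiler must reload *step after writing *timer1, because step might point to the same int as timer1. timers_v2 reads *step once into a local variable; this is only equivalent when step does not alias timer1, which is precisely the knowledge the programmer supplies.

```c
/* *step must be loaded twice: the write to *timer1 might have
   changed it if step and timer1 point to the same int. */
void timers_v1(int *timer1, int *timer2, const int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}

/* The common sub-expression is held in a local variable, so *step is
   loaded once. Correct only if step does not alias timer1. */
void timers_v2(int *timer1, int *timer2, const int *step)
{
    int s = *step;

    *timer1 += s;
    *timer2 += s;
}
```

The two versions give the same result for distinct pointers but differ when step and timer1 refer to the same variable, which is why the compiler cannot make this transformation on its own.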

6. Structure layout

There are two issues to consider when using structures on ARM: the alignment of structure members and the total size of the structure.

Principles for obtaining efficient structures:

1) Place all 8-bit members at the start of the structure;

2) Follow them with the 16-bit, then the 32-bit, then the 64-bit members;

3) Place all arrays and larger members at the end of the structure;

4) If the structure is too large for every member to be reached from the structure base with a single load or store instruction, group the members into substructures. The compiler can then maintain pointers to the individual substructures.

Summary:

Arrange structure members by size, with the smallest at the start and the largest at the end. Avoid very large structures; use a hierarchy of smaller structures instead. To improve portability, add padding manually to structures that form part of an API, so that their layout does not depend on the compiler. Be cautious with enum types in API structures, because the size of an enum is compiler-dependent.
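A sketch of the ordering rule with hypothetical structures: both hold the same four members, but sorting them from small to large removes most of the padding. On an ABI with 4-byte int and 2-byte short alignment, such as the ARM AAPCS, struct unordered typically occupies 12 bytes and struct ordered 8; the exact sizes are compiler- and ABI-dependent.

```c
/* Members in arbitrary order: padding is inserted after a and c. */
struct unordered {
    char  a;   /* 1 byte, then typically 3 bytes of padding */
    int   b;   /* 4 bytes */
    char  c;   /* 1 byte, then typically 1 byte of padding */
    short d;   /* 2 bytes */
};             /* typically 12 bytes */

/* Same members sorted from small to large. */
struct ordered {
    char  a;   /* 1 byte */
    char  c;   /* 1 byte */
    short d;   /* 2 bytes, already 2-byte aligned */
    int   b;   /* 4 bytes, already 4-byte aligned */
};             /* typically 8 bytes */
```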

7. Bit fields

Precautions:

1) Avoid using bit fields and use #define or enum to define mask bits;

2) Use integer logical operations (AND, OR, XOR) with masks to test, set, clear, and toggle bit fields. These operations compile efficiently and can test, set, clear, or toggle several fields at once.
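A sketch of the mask approach, with hypothetical flag names and helper functions: each flag is a #define mask, and testing, setting, clearing, and toggling are single AND, OR, AND-NOT, and XOR operations that can act on several flags at once.

```c
/* Hypothetical status flags defined as masks instead of bit fields. */
#define STATUS_READY (1u << 0)
#define STATUS_ERROR (1u << 1)
#define STATUS_BUSY  (1u << 2)

/* Nonzero if any flag in mask is set. */
static inline unsigned int flags_test(unsigned int status, unsigned int mask)
{
    return status & mask;
}

/* Set every flag in mask. */
static inline unsigned int flags_set(unsigned int status, unsigned int mask)
{
    return status | mask;
}

/* Clear every flag in mask. */
static inline unsigned int flags_clear(unsigned int status, unsigned int mask)
{
    return status & ~mask;
}

/* Toggle (invert) every flag in mask. */
static inline unsigned int flags_toggle(unsigned int status, unsigned int mask)
{
    return status ^ mask;
}
```

For example, flags_set(s, STATUS_READY | STATUS_BUSY) sets two flags in one OR.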

8. Unaligned data and byte order (big/little endian)

Misaligned data and byte order both complicate memory access and portability. You need to consider whether array pointers are aligned and whether the ARM is configured as a big-endian or little-endian memory system.

Summary:

1) Avoid misaligned data where possible;

2) Use the type char * to point to data that may lie on any byte boundary. Read the data one byte at a time and combine the bytes with logical operations, so that the code depends neither on alignment nor on the ARM's byte-order configuration;

3) For fast access to misaligned structures, write separate variants of the routine for each pointer alignment and processor byte order.
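A sketch of byte-wise access (read_le32 and read_be32 are hypothetical names): each function builds a 32-bit value from four byte loads, so it works at any address and gives the same result on big-endian and little-endian configurations. It uses unsigned char * rather than plain char *, so that bytes of 0x80 or above are never sign-extended before shifting.

```c
#include <stdint.h>

/* Read a 32-bit value stored least significant byte first,
   starting at any byte address. Only byte loads are used. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Counterpart for data stored most significant byte first. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```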

9. Division

Classic ARM cores, such as those based on the ARMv4 and ARMv5 architectures, have no divide instruction. When a division appears in the code, the ARM compiler calls a C library routine to perform it (__rt_sdiv for signed division, __rt_udiv for unsigned division). The library provides several division routines suited to different kinds of divisor and dividend.

Summary:

1) Avoid division wherever possible. Ring buffers, for example, can be handled without any division.

2) If a division cannot be avoided, try to use both the quotient n/d and the remainder n%d from it. The division routine produces both at once, so the compiler can often obtain both from a single call.

3) For repeated division by the same divisor d, precompute s = (2^k - 1)/d. A k-bit unsigned division by d can then be replaced by a 2k-bit multiplication by s, followed by a small correction.

4) Use powers of 2 as divisors where possible. When the divisor is a constant power of 2, the compiler converts the division into a shift (a plain shift for unsigned operands; signed operands need a small correction for negative values). So when designing algorithms, prefer powers of 2 as divisors.

5) Rewrite remainder operations. Some common remainder operations can be rewritten to avoid division. For example:

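unsigned int counter1(unsigned int count)
{
    return (++count % 60);
}

If count always lies in the range 0 to 59, this can be rewritten without division as:

unsigned int counter2(unsigned int count)
{
    if (++count >= 60)
        count = 0;
    return (count);
}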
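Point 3) can be sketched for k = 32 as follows (recip_u32 and udiv_by_recip are hypothetical names). The one real division happens once per divisor, in recip_u32; each later division becomes a 32 x 32 to 64-bit multiply (UMULL on ARM), a shift, and one correction step. With s = (2^32 - 1)/d rounded down, (n * s) >> 32 is always either n/d or n/d - 1, so a single remainder check is enough.

```c
#include <stdint.h>

/* Precompute once per divisor: s = (2^32 - 1) / d, with d != 0. */
uint32_t recip_u32(uint32_t d)
{
    return UINT32_MAX / d;
}

/* Compute n / d using a 64-bit multiply by s instead of a division. */
uint32_t udiv_by_recip(uint32_t n, uint32_t d, uint32_t s)
{
    /* Estimate: equal to n / d or one less. */
    uint32_t q = (uint32_t)(((uint64_t)n * s) >> 32);
    uint32_t r = n - q * d;   /* remainder of the estimate */

    if (r >= d)               /* estimate was one too small */
        q++;
    return q;
}
```

In a loop that divides many values by the same d, call recip_u32(d) once before the loop and pass the result to udiv_by_recip on each iteration.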

10. Floating-point operations

Many ARM processors have no floating-point hardware, which saves area and reduces power consumption in price-sensitive embedded systems. Apart from the VFP (Vector Floating Point) coprocessor and the FPA (Floating Point Accelerator) on the ARM7500FE, floating-point support must be provided in software by the C compiler and its libraries.

11. Inline functions and inline assembly

Inline functions make calls efficient: they remove the function call overhead entirely. In addition, many compilers allow inline assembly in C source. Inline functions containing assembly let you use ARM instructions and optimizations that the compiler would not normally generate.

The main benefit of inline functions and inline assembly is that they let you implement operations that are difficult to express in C. Inline functions are also preferable to #define macros, because macros do not check the types of their arguments or return values.
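As a minimal sketch (sat_add32 is a hypothetical name), the inline function below performs a saturating 32-bit add in portable C. Unlike an equivalent #define macro, its arguments are type-checked and evaluated exactly once. Its result matches the ARM QADD instruction available from ARMv5TE, which inline assembly or a compiler intrinsic could use instead; the portable version is shown here because inline assembly syntax differs between compilers.

```c
#include <stdint.h>

/* Saturating add: results beyond the int32_t range are clamped
   to INT32_MAX or INT32_MIN instead of wrapping around. */
static inline int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;   /* exact, cannot overflow */

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```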
