The connection between Arm design philosophy and efficient C language programming

Publisher: colchery | Latest update time: 2017-01-05 | Source: eefocus | Keywords: Arm

1. RISC Design Concept

The ARM core uses a RISC architecture. RISC is a design philosophy whose goal is a simple but powerful instruction set in which each instruction executes in a single cycle at a high clock frequency. RISC concentrates on reducing the complexity of the instructions performed by hardware, because it is easier to provide flexibility and intelligence in software than in hardware. As a result, a RISC design places greater demands on the compiler. In contrast, a traditional complex instruction set computer (CISC) relies more on the hardware for instruction functionality, and its instructions are consequently more complicated.

The RISC design concept is mainly implemented by the following four design principles:

▇ Instruction Set

RISC processors reduce the number of instruction types. The length of each instruction is fixed, allowing the pipeline to fetch the next instruction during the decoding stage of the current instruction. In CISC processors, the length of instructions is usually not fixed and execution requires multiple cycles.

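▇ Pipeline

Instruction processing is divided into smaller stages that the pipeline executes in parallel. Ideally, the pipeline advances one step per cycle to achieve the highest throughput. Instructions can be decoded in a single pipeline stage, so they do not have to be executed by a microprogram (microcode), as they are on CISC processors.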

▇ Registers

RISC processors have a large set of general-purpose registers, each of which can hold either data or an address. Registers act as fast local storage for all data-processing operations. In contrast, CISC processors have dedicated registers for specific purposes.

▇ Load-store architecture

The processor operates only on data held in registers. Separate load and store instructions transfer data between the registers and external memory. Because memory accesses are costly, separating them from data processing has an advantage: a value held in a register can be used many times without further memory accesses. In contrast, in a CISC design the data-processing instructions can operate on memory directly.

2. ARM Design Concept

To reduce power consumption, ARM processors are designed with a small core and high code density. The ARM core is not a pure RISC architecture, because it has been adapted to its main application area: embedded systems. In a sense, the ARM core owes its success to not taking the RISC concept too far. What matters in these systems is not raw processor speed but effective system performance and power consumption.

Instruction Set for Embedded Systems

▇ Some instructions take a variable number of cycles

For example, the cycle count of the load/store-multiple instructions depends on the number of registers being transferred.

▇ An inline barrel shifter leads to more complex instructions

▇ Thumb 16-bit instruction set

▇ Conditional Execution

▇ Enhanced Instructions

3. Efficient C Programming

1) Effective usage of C data types

▇ For local variables held in registers, do not use char or short unless 8-bit or 16-bit modular arithmetic is actually required. Use signed or unsigned int instead. Division is faster with unsigned types.

▇ For arrays and global variables held in main memory, use the smallest data type that can hold the required values, to save storage space. The ARMv4 architecture loads and stores all data widths efficiently, provided that arrays are traversed by incrementing the array pointer. For arrays of short, avoid indexing from the array base address, because the LDRH instruction does not support the scaled register-offset addressing mode that such indexing needs.

▇ Implicit or explicit narrowing conversions usually cost extra instruction cycles, so avoid them in expressions wherever possible. Conversions on loads and stores are generally free, because the load or store instruction performs the conversion itself.

▇ Avoid char and short for function parameters and return values. Even when the range of a parameter is small, use int so that the compiler does not have to insert unnecessary conversions.
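The data-type guidelines above can be illustrated with a small sketch. The function names and the checksum task are hypothetical and chosen only for illustration; the actual instruction sequences depend on the compiler and target. The first version keeps narrow types in memory and wide types in registers; the second is a counter-example that uses narrow locals, parameters, and return type.

```c
#include <stdint.h>

/* Preferred form: the array stays 16-bit in memory to save space, while
 * the parameter, accumulator, and return value are int-sized. */
int checksum_s16(const int16_t *data, unsigned int count)
{
    int sum = 0;                  /* int accumulator: no narrowing per step */

    while (count != 0) {
        /* The widening conversion happens as part of the halfword load,
         * and the array is traversed by incrementing the pointer. */
        sum += (int)*data++;
        count--;
    }
    return sum;
}

/* Counter-example for comparison: narrow locals and a narrow return type
 * may force the compiler to re-narrow values after each operation. */
int16_t checksum_s16_narrow(const int16_t *data, uint8_t count)
{
    int16_t sum = 0;
    uint8_t i;

    for (i = 0; i < count; i++) {
        sum = (int16_t)(sum + data[i]);   /* narrowing cast on every pass */
    }
    return sum;
}
```

Both functions return the same result for small inputs; comparing the generated assembly for a given ARM compiler is the way to see whether the narrow version actually costs extra instructions.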

2) Writing loops efficiently

▇ Use loops that count down to zero. The compiler then needs no register for the termination value, and the explicit comparison with 0 can be omitted.

▇ Use an unsigned loop counter, and continue the loop while i!=0 rather than i>0. This keeps the loop overhead to only two instructions (a subtract that sets the flags and a conditional branch).

▇ If you know in advance that the loop body executes at least once, use a do-while loop rather than a for loop, so the compiler can skip the initial check for a zero count.

▇ Unroll important loops to reduce loop overhead, but do not unroll them too far. If the loop overhead is already a small proportion of the total running time, further unrolling only increases code size and reduces cache performance.

▇ Try to make array sizes a multiple of 4 or 8, so that loops can easily be unrolled by 2, 4, or 8 times without handling leftover array elements.
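The loop guidelines above can be sketched as follows. The function names are hypothetical, and both functions assume the caller guarantees the stated preconditions (a count of at least one, and for the unrolled version a count that is a nonzero multiple of four).

```c
#include <stdint.h>

/* Sums n words. Precondition: n >= 1.
 * The unsigned counter counts down to zero and is tested with != 0,
 * and do-while avoids an initial zero check. */
uint32_t sum_words(const uint32_t *data, unsigned int n)
{
    uint32_t sum = 0;

    do {
        sum += *data++;
    } while (--n != 0);

    return sum;
}

/* Same sum, unrolled four times. Precondition: n is a nonzero multiple
 * of 4, so there are no leftover elements to handle. */
uint32_t sum_words_unrolled4(const uint32_t *data, unsigned int n)
{
    uint32_t sum = 0;
    unsigned int i = n / 4;       /* number of four-element groups */

    do {
        sum += *data++;
        sum += *data++;
        sum += *data++;
        sum += *data++;
    } while (--i != 0);

    return sum;
}
```

The unrolled version pays the counter update and branch once per four elements instead of once per element, at the cost of a larger loop body.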

3) Efficient register allocation

▇ Try to limit the number of local variables used in a function's inner loop to at most 12, so that the compiler can allocate all of them to ARM registers.

▇ You can indicate to the compiler which variables are important by making sure they are used within the innermost loop.

4) Efficient function calling

▇ Try to limit functions to four parameters or fewer, which makes calls more efficient because the first four arguments are passed in registers. Several related parameters can also be grouped in a structure and passed by a pointer to that structure instead of as separate arguments.

▇ Put small called functions in the same source file as their callers, and define them before they are called. The compiler can then optimize the call or inline the smaller function.

▇ Performance-critical functions can be inlined with the __inline keyword (inline in C99).
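A minimal sketch of the function-call guidelines above, assuming a hypothetical circular byte queue. The ByteQueue type and the function names are illustrative only. Standard C99 static inline is used here in place of a compiler-specific inline keyword.

```c
#include <stdint.h>

/* Related parameters grouped in one structure, so the caller passes a
 * single pointer instead of three separate arguments. */
typedef struct {
    uint8_t *buf;        /* start of the storage area */
    unsigned int size;   /* capacity in bytes */
    unsigned int head;   /* index of the next byte to write */
} ByteQueue;

/* Small helper defined before its caller in the same file, so the
 * compiler is free to inline it. */
static inline unsigned int next_index(unsigned int idx, unsigned int size)
{
    idx++;
    return (idx == size) ? 0u : idx;   /* wrap around at the end */
}

/* Three arguments in total, which stays within the four-argument limit. */
void queue_put(ByteQueue *q, const uint8_t *src, unsigned int n)
{
    /* Local copies keep the working values in registers during the loop. */
    uint8_t *buf = q->buf;
    unsigned int size = q->size;
    unsigned int head = q->head;

    while (n != 0) {
        buf[head] = *src++;
        head = next_index(head, size);
        n--;
    }
    q->head = head;                    /* write the index back once */
}
```

Without the structure, the same routine would need the buffer pointer, size, head index, source pointer, and count as five separate arguments, one more than the suggested limit.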

5) Avoid pointer aliasing

▇ Do not rely on the compiler to eliminate common subexpressions that involve memory accesses. Instead, create a new local variable to hold the value of the expression, which ensures that it is evaluated only once.

▇ Avoid taking the address of a local variable, because accesses to that variable may then be less efficient.
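The aliasing problem can be seen in a small sketch with two hypothetical timer-update functions. In the first, the compiler generally has to load *step twice, because the store through timer1 might have changed it. In the second, the value is read once into a local variable.

```c
/* The compiler cannot assume that timer1 and step point to different
 * objects, so *step is typically reloaded after the first store. */
void timers_step_reload(int *timer1, int *timer2, const int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}

/* The step value is read once into a local, so it is evaluated only
 * once regardless of what the pointers refer to. */
void timers_step_cached(int *timer1, int *timer2, const int *step)
{
    int s = *step;

    *timer1 += s;
    *timer2 += s;
}
```

The two versions give the same result when the pointers are distinct, but they differ when timer1 and step refer to the same object. That difference is exactly why the compiler may not perform this optimization on its own.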

6) Efficient structure layout

▇ Arrange structure elements in order of size, with the smallest element first and the largest last.

▇ Avoid very large structures. Use a hierarchy of smaller structures instead.

▇ To improve portability, manually add padding to API structures so that the structure layout does not depend on the compiler.
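The effect of element ordering and explicit padding can be sketched with three hypothetical structures. Exact sizes depend on the compiler and ABI; on a typical 32-bit ARM ABI the first layout occupies 12 bytes and the second 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Elements in arbitrary order: the compiler typically inserts padding
 * after 'a' and after 'c' to align the wider members. */
struct mixed_order {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
    uint16_t d;
};

/* Same members sorted from smallest to largest: typically no padding. */
struct sorted_order {
    uint8_t  a;
    uint8_t  c;
    uint16_t d;
    uint32_t b;
};

/* API structure with the padding written out by hand, so the offset of
 * 'length' does not rely on what the compiler would insert. */
struct api_header {
    uint8_t  type;
    uint8_t  pad[3];   /* explicit padding, otherwise unused */
    uint32_t length;
};
```

Checking sizeof and offsetof for these types on the target compiler confirms the layout actually produced.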

