1. RISC Design Concept
The ARM core uses a RISC architecture. RISC is a design philosophy whose goal is a simple but powerful instruction set in which each instruction executes in a single cycle at a high clock frequency. RISC concentrates on reducing the complexity of the instructions executed by the hardware, because it is easier to provide flexibility and intelligence in software than in hardware. As a result, a RISC design places greater demands on the compiler. In contrast, a traditional complex instruction set computer (CISC) relies more on the hardware for instruction functionality, so CISC instructions are more complex.
The RISC design concept is mainly implemented by the following four design principles:
▇ Instruction Set
RISC processors have a reduced number of instruction classes. Each instruction has a fixed length, which allows the pipeline to fetch the next instruction while the current one is being decoded. In CISC processors, instructions are usually of variable length and take multiple cycles to execute.
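▇ Pipeline
Instruction processing is broken down into smaller units that the pipeline can execute in parallel. Ideally, the pipeline advances one step per cycle to achieve the highest throughput. A RISC instruction can be decoded in a single pipeline stage, whereas executing a CISC instruction requires running a microcode routine.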
▇ Registers
RISC processors have a large set of general-purpose registers, each of which can hold either data or an address. Registers provide fast local storage for all data processing operations. In contrast, CISC processors have dedicated registers for specific purposes.
▇ Load-store architecture
The processor operates only on data held in registers. Separate load and store instructions transfer data between the registers and external memory. Because memory accesses are costly, separating them from data processing has an advantage: data held in a register can be used repeatedly without going back to memory each time. In contrast, in a CISC design the data processing instructions can operate directly on memory.
2. ARM Design Concept
ARM processors are designed to be small, which reduces power consumption, and to offer high code density. The ARM core is not a pure RISC architecture. This is deliberate, so that it suits its main application area, embedded systems. In a sense, the ARM core owes its success to not taking the RISC concept too far. What matters in today's systems is not raw processor speed but effective overall system performance and power consumption.
Instruction Set for Embedded Systems
▇ Some specific instructions have variable cycle counts
For example, the number of cycles taken by a load/store-multiple instruction is not fixed; it varies with the number of registers being transferred.
▇ Inline barrel shifter, leading to more complex instructions
▇ Thumb 16-bit instruction set
▇ Conditional Execution
▇ Enhanced Instructions
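As a rough illustration of two of the features listed above, the following C sketch shows source patterns that map well onto the barrel shifter and onto conditional execution. The function names are hypothetical, and the instruction sequences mentioned in the comments are what an ARM compiler can typically produce, not guaranteed output.

```c
/* Barrel shifter: the shift can be folded into the addition, so an ARM
 * compiler can typically emit a single instruction of the form
 *     ADD r0, r0, r1, LSL #2
 * (scale_add is a hypothetical name used only for this sketch). */
unsigned int scale_add(unsigned int a, unsigned int b)
{
    return a + (b << 2);
}

/* Conditional execution: the if/else in the loop body can be compiled
 * to two conditionally executed SUB instructions after a single CMP,
 * with no branch inside the body.
 * Assumption: a and b are both non-zero. */
unsigned int gcd(unsigned int a, unsigned int b)
{
    while (a != b) {
        if (a > b)
            a -= b;
        else
            b -= a;
    }
    return a;
}
```

For instance, gcd(12, 18) steps through (12, 6) and (6, 6) before returning 6.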
3. Efficient C Programming
1) Efficient use of C data types
▇ For local variables held in registers, avoid the char and short types unless you need 8-bit or 16-bit modular arithmetic; use signed or unsigned int instead. For division, unsigned types are faster.
▇ For arrays and global variables held in main memory, use the smallest type that can hold the required data, in order to save storage space. The ARMv4 architecture loads and stores all data widths efficiently, provided you traverse arrays with an incrementing pointer. For arrays of short, avoid indexing by an offset from the array base, because the LDRH instruction does not support the scaled (shifted) register offset that such indexing needs.
▇ Implicit or explicit type conversions usually cost extra instruction cycles, so avoid them in expressions as far as possible. Conversions performed on a load or store are generally free, because the load or store instruction carries out the conversion itself.
▇ Avoid char and short for function parameters and return values. Even when the range of a parameter is small, use int so that the compiler does not have to insert unnecessary conversions.
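The guidelines above can be sketched as follows. This is a minimal example, assuming 16-bit samples stored in an array; the function name checksum_s16 is hypothetical. The narrow type is used only for the data in memory, while the counter, the accumulator, the parameter, and the return value are all int-sized.

```c
#include <stdint.h>

/* Sum an array of 16-bit samples.
 * - The data in memory uses a narrow type (int16_t) to save space.
 * - The counter and accumulator are int-sized, so the compiler does
 *   not need to truncate them back to 16 bits after each operation.
 * - The array is traversed with an incrementing pointer instead of
 *   an index from the array base. */
int checksum_s16(const int16_t *data, unsigned int count)
{
    int sum = 0;
    unsigned int i;

    for (i = 0; i < count; i++) {
        sum += *data++;   /* widening happens as part of the load */
    }
    return sum;
}
```

A caller that really needs a 16-bit result can cast the return value once, at the point where it is stored.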
2) Writing loops efficiently
▇ Use loops that count down to zero. The compiler then does not need to allocate a register for the termination value, and the separate comparison with zero can be omitted.
▇ Use an unsigned loop counter, with the continuation condition i!=0 rather than i>0. This keeps the loop overhead to only two instructions.
▇ If you know in advance that the loop body will execute at least once, prefer a do-while loop to a for loop, because the compiler can then skip the initial check for a zero loop count.
▇ Unrolling important loops reduces loop overhead, but do not unroll too far. If the loop overhead is already a small proportion of the total, unrolling only increases code size and hurts cache performance.
▇ Try to make the number of array elements a multiple of 4 or 8, so that you can easily unroll the loop by 2, 4, or 8 times without worrying about leftover array elements.
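The loop guidelines above can be sketched as follows. Both functions are hypothetical examples; they assume the caller guarantees that n is non-zero, and the unrolled version additionally assumes that n is a multiple of 4.

```c
/* Count-down do-while loop with an unsigned counter.
 * Assumption: n != 0 (the body always runs at least once). */
int sum_words(const int *data, unsigned int n)
{
    int sum = 0;

    do {
        sum += *data++;
    } while (--n != 0);
    return sum;
}

/* The same loop unrolled four times.
 * Assumption: n != 0 and n is a multiple of 4, so no leftover
 * elements need separate handling. */
int sum_words_unrolled(const int *data, unsigned int n)
{
    int sum = 0;

    do {
        sum += *data++;
        sum += *data++;
        sum += *data++;
        sum += *data++;
        n -= 4;
    } while (n != 0);
    return sum;
}
```

The unrolled version pays the counter update and branch once per four elements instead of once per element, at the cost of a larger loop body.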
3) Efficient register allocation
▇ Try to limit the number of local variables used in a function's inner loop to at most 12, so that the compiler can allocate all of them to ARM registers.
▇ You can guide the compiler as to which variables are important by making sure those variables are used within the innermost loop.
4) Efficient function calls
▇ Try to limit functions to at most 4 parameters, which makes calls more efficient. You can also group several related parameters into a structure and pass a pointer to the structure instead of passing the parameters individually.
▇ Put small called functions in the same source file as their callers, and define them before they are called. The compiler can then optimize the call or inline the small function.
▇ Functions that are critical to performance can be inlined with the __inline keyword (the armcc spelling; standard C99 uses inline).
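A minimal sketch of these points, using hypothetical names (Transform, clamp, apply_transform) and the C99 inline keyword: the related parameters travel in one structure passed by pointer, and the small helper is defined before its caller in the same file so that the compiler is free to inline it.

```c
/* Related parameters grouped into one structure (hypothetical). */
typedef struct {
    int scale;
    int offset;
    int min;
    int max;
} Transform;

/* Small helper defined before its caller, in the same source file. */
static inline int clamp(int v, int lo, int hi)
{
    return (v < lo) ? lo : ((v > hi) ? hi : v);
}

/* Two arguments instead of five: a structure pointer and the value. */
int apply_transform(const Transform *t, int x)
{
    return clamp(x * t->scale + t->offset, t->min, t->max);
}
```

With scale 2, offset 1, and limits 0 to 10, an input of 3 yields 7 and an input of 20 is clamped to 10.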
5) Avoid pointer aliasing
▇ Do not rely on the compiler to eliminate common subexpressions that involve memory accesses. Instead, create a new local variable to hold the value of the expression, which ensures that the expression is evaluated only once.
▇ Avoid taking the address of a local variable; otherwise, accesses to that variable may become inefficient.
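The first point can be illustrated with a small sketch; the function names are hypothetical. In the naive version the compiler must assume that timer1 and step may point to the same location, so it reloads *step after the first store. The cached version reads it once into a local variable.

```c
/* *step is read twice: after the store to *timer1 the compiler has to
 * reload it, because timer1 might alias step. */
void add_step_naive(int *timer1, int *timer2, int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}

/* *step is read once into a local variable, which the compiler can
 * keep in a register for both additions. */
void add_step_cached(int *timer1, int *timer2, const int *step)
{
    int s = *step;

    *timer1 += s;
    *timer2 += s;
}
```

The two versions give the same result only when step does not alias either timer. If the same pointer is passed as both timer1 and step, the naive version adds the updated value to *timer2, while the cached version adds the original one. That possibility is exactly why the compiler cannot make the transformation on its own.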
6) Efficient structure layout
▇ Order structure members by size, with the smallest member first and the largest last.
▇ Avoid very large structures; use a hierarchy of smaller structures instead.
▇ For portability, add padding to API structures by hand so that the layout does not depend on the compiler.
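The effect of member ordering and explicit padding can be sketched as follows. The structure names are hypothetical, and the sizes mentioned in the comments assume natural alignment (each member aligned to its own size), which is typical but not guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Members in arbitrary order: with natural alignment the compiler
 * inserts padding after a and after c (typically 12 bytes in total). */
typedef struct {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
    uint16_t d;
} Unsorted;

/* The same members, smallest first: typically 8 bytes, no padding. */
typedef struct {
    uint8_t  a;
    uint8_t  c;
    uint16_t d;
    uint32_t b;
} Sorted;

/* API structure with the padding written out by hand, so the offset
 * of value does not depend on what the compiler would insert. */
typedef struct {
    uint8_t  kind;
    uint8_t  reserved[3];
    uint32_t value;
} ApiRecord;

/* Bytes saved per element by reordering the members. */
size_t bytes_saved_by_sorting(void)
{
    return sizeof(Unsorted) - sizeof(Sorted);
}
```

Comparing sizeof and offsetof for such structures on the target compiler is a quick way to confirm that the layout is what the API expects.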