Software optimization design based on ARM processor

Publisher: 导航灯  Latest update time: 2015-04-16  Source: eechina
Introduction

Embedded systems are now widely used in industrial control, automotive systems, home networking, medical and health care, wireless technology and other fields, and their developers face a range of complex challenges, including how to balance code performance against system cost. ARM processors offer developers a strong technical solution here. The ARM Cortex series provides a standard architecture that covers the different performance requirements of these technologies. It is designed to deliver high performance in power- and cost-sensitive embedded applications, it considerably simplifies programming, and its maturity makes the ARM architecture a sound choice for a wide range of applications.

The technology that unifies the ARM Cortex processors is Thumb-2. It builds on the existing ARM technology and combines the advantages of the ARM and Thumb instruction sets. It is particularly well suited to optimizing embedded software: it provides high code density and therefore makes better use of memory. This matters most for the fast memory close to the processor core, where saving even a small amount of memory can noticeably improve system performance and reduce power consumption.

1 Introduction to Thumb-2 instructions

Not every operation can be mapped onto the Thumb instruction set. Sometimes several Thumb instructions are needed to do the work of a single 32-bit ARM instruction. Moreover, Thumb instructions cannot access coprocessors, cannot use the exception and interrupt handling instructions, and do not support the media functions. When an application needs these features and is also tight on memory, ARM and Thumb instructions must be mixed: the processor core switches to Thumb state to obtain high code density and to ARM state to obtain high performance. During development, making full use of the memory means repeatedly adjusting which code is built as ARM and which as Thumb. Often the final ARM/Thumb split can only be decided after both the software and the hardware are complete. These factors make the development process very complicated.

Thumb-2 technology is an important extension of the ARM architecture that improves the performance of the Thumb instruction set. Thumb-2 extends the existing Thumb instructions as follows:

· New 16-bit Thumb instructions that improve program flow.

· New 32-bit Thumb instructions that provide functions previously available only as ARM instructions.

· Extensions to the original ARM instructions, plus some new instructions, that improve code performance and data processing efficiency.

With Thumb-2 instructions there is no need to switch repeatedly between ARM and Thumb states, and both code density and performance improve significantly.

2 Use Thumb-2 instructions to optimize design

For embedded development engineers who already have experience with ARM processors, adopting Thumb-2 technology is straightforward, because Thumb-2 was developed from ARM and Thumb. It keeps the established programming model while combining the strengths of both instruction sets. When designing embedded software, developers only need to concentrate on the parts of the code that have the greatest impact on overall performance in order to balance performance, code density and power consumption.

2.1 Reduce the Hamming distance

Table 1 Comparison of changes in the Hamming distance [image: 1.gif]

As shown in Table 1, both code sequences compute the expression (x1 + x2) × (x3 - x4), each using three instructions. The function, size, number of registers used and opcode of each instruction are exactly the same in both. The only difference is the binary encoding of the register operands.

Rd denotes the destination register, and Rn and Rm denote the source operand registers. The Hamming distance between two consecutive instructions is the number of bit positions in which their encodings differ, and every differing bit is a signal that toggles when the next instruction is fetched and decoded. In the original code, the Rd bits change 4 times and the Rn and Rm bits change 3 and 5 times respectively, so the operand bits change 12 times in total. In the optimized code the operand bits change only 6 times, half as many as in the original code, which reduces the power consumed in executing the instructions.
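The counting behind this comparison can be sketched in a few lines of Python. This is only an illustration: the helper names and the two register-number sequences are hypothetical and are not the values from Table 1.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def total_toggles(fields: list[int]) -> int:
    """Sum of Hamming distances between consecutive values of one operand field."""
    return sum(hamming(x, y) for x, y in zip(fields, fields[1:]))


# Hypothetical 4-bit Rd fields of three consecutive instructions.
scattered = [0b0111, 0b1000, 0b0011]  # R7 -> R8 -> R3
clustered = [0b0000, 0b0001, 0b0000]  # R0 -> R1 -> R0
```

With these made-up allocations, `total_toggles(scattered)` is larger than `total_toggles(clustered)` even though both sequences use the same number of registers, which is the effect the register renumbering relies on.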

2.2 Making good use of 16-bit constant instructions

Thumb-2 adds two instructions for 16-bit constants. MOVW loads a 16-bit constant into a register and fills the upper 16 bits of the register with 0. MOVT loads a 16-bit constant into the upper 16 bits of a register and leaves the lower 16 bits unchanged. Used together, the two instructions load a full 32-bit constant into a register.

Operating on a 32-bit immediate value or accessing a peripheral requires a 32-bit constant in a register. In the original ARM/Thumb instruction sets, the instruction encoding leaves at most 12 bits for a constant, of which only 8 bits hold the value and the other 4 bits specify a shift. To get an arbitrary 32-bit immediate or address into a register, the LDR pseudo-instruction is needed: the 32-bit value is placed in a literal pool and read with a PC-relative LDR instruction. This costs extra clock cycles at run time, because the data port has to access the instruction stream to fetch the literal.

Splitting the 32-bit constant into two 16-bit halves and loading them with two instructions keeps the data inside the instruction stream itself, so it no longer needs to be fetched through the data port. Compared with the LDR pseudo-instruction, this removes the overhead of accessing the instruction stream through the data port, which improves performance and reduces power consumption.
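The effect of the MOVW/MOVT pair on a register can be modelled in Python as a minimal sketch. The function names mirror the instruction mnemonics, while `split_constant` and the address `PERIPHERAL_BASE` are hypothetical names and values chosen for illustration.

```python
MASK32 = 0xFFFFFFFF


def movw(imm16: int) -> int:
    """Model of MOVW Rd, #imm16: low half = imm16, high half = 0."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("MOVW takes a 16-bit immediate")
    return imm16


def movt(rd: int, imm16: int) -> int:
    """Model of MOVT Rd, #imm16: high half = imm16, low half of Rd is kept."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("MOVT takes a 16-bit immediate")
    return ((imm16 << 16) | (rd & 0xFFFF)) & MASK32


def split_constant(value: int) -> tuple[int, int]:
    """Split a 32-bit constant into (low half, high half)."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


# Hypothetical peripheral base address, used only as an example value.
PERIPHERAL_BASE = 0x40021000
low, high = split_constant(PERIPHERAL_BASE)
register = movt(movw(low), high)
```

After the two modelled instructions, `register` holds the full 32-bit value again. The order matters: MOVW must come first because it clears the upper half.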

2.3 Flexible application of bit operation instructions

Embedded software developers frequently have to manipulate individual bits, for example to set or clear certain bits of a variable, to extract a bit field from a register, or to insert a bit field into a register. This is usually done with a combination of logical and shift instructions. As shown in Table 2, the two code sequences perform the same function: they pack the useful information of registers R1 and R2 into register R0 in order to save a register. The useful information of R1 is R1[15:0], and that of R2 is R2[23:8]. The original code has to mask 16 bits of data in R1 and R2, which needs a 16-bit constant, so it uses the MOVW instruction to load one. It takes four instructions and uses register R3 for an intermediate value. The optimized code needs only one instruction and no additional register.

Table 2 Comparison of bit operation instructions [image: 2.gif]

Besides the PKHBT instruction, Thumb-2 technology also provides the bit operation instructions PKHTB, BFC, BFI, SBFX and UBFX. They significantly reduce the number of instructions needed to insert and extract bit fields, make packed data structures more convenient to use, and reduce the code's demand for registers.
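As a rough illustration of what the packing does, the following Python sketch models PKHBT next to an equivalent mask-and-shift computation. It assumes the 16 useful bits of R2 sit at bits [23:8], so that a left shift by 8 moves them into the top halfword. The function `pack_with_masks` and the register values are hypothetical and are not the code from Table 2.

```python
MASK32 = 0xFFFFFFFF


def pkhbt(rn: int, rm: int, lsl: int = 0) -> int:
    """Model of PKHBT Rd, Rn, Rm, LSL #lsl.

    Rd[15:0] comes from Rn[15:0]; Rd[31:16] comes from (Rm << lsl)[31:16].
    """
    shifted = (rm << lsl) & MASK32
    return (shifted & 0xFFFF0000) | (rn & 0x0000FFFF)


def pack_with_masks(r1: int, r2: int) -> int:
    """Same result built from separate mask, shift and OR steps."""
    mask = 0xFFFF                # the 16-bit constant
    low = r1 & mask              # keep R1[15:0]
    high = (r2 >> 8) & mask      # isolate R2[23:8]
    return (high << 16) | low    # merge into one word


# Hypothetical register contents.
r1 = 0xAAAA1234
r2 = 0x99ABCD77
r0 = pkhbt(r1, r2, lsl=8)
```

Both routes give the same packed word, but the modelled PKHBT needs neither the mask constant nor a scratch value.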

2.4 Select Byte

Developers used to high-level languages tend to control program flow with if-then-else statements. High-level language code is ultimately translated into assembly that reflects the machine instructions, and a single high-level statement often turns into many assembly instructions. High-level languages are convenient to program in, but that convenience does not by itself bring any advantage in execution efficiency or storage space. Thumb-2 provides an instruction that behaves like an if-then-else statement. Its format is shown in Table 3.

The SEL instruction makes a selection without branching. It chooses each byte of the result from one of two source registers according to the corresponding GE flag, so a single assembly instruction does the work of four if-then-else statements. Each branch can only assign byte (character) data, however, so the instruction is equivalent to four conditional-operator (?:) expressions in C.

Table 3 SEL instruction [image: 3.gif]
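The byte-wise selection can be modelled in Python as follows. This is a sketch of the instruction's semantics, assuming the four GE flags are passed in as a 4-bit integer `ge` (bit i controls byte i). The example operands are arbitrary.

```python
def sel(rn: int, rm: int, ge: int) -> int:
    """Model of SEL Rd, Rn, Rm.

    Byte i of the result is taken from Rn if GE[i] is 1, otherwise from Rm.
    """
    result = 0
    for i in range(4):
        byte_mask = 0xFF << (8 * i)
        source = rn if (ge >> i) & 1 else rm
        result |= source & byte_mask
    return result


# Arbitrary example: GE = 0b0101 picks bytes 0 and 2 from Rn, bytes 1 and 3 from Rm.
picked = sel(0x11223344, 0xAABBCCDD, 0b0101)
```

Each loop iteration corresponds to one `rd_byte = ge_bit ? rn_byte : rm_byte` expression, which is why one SEL stands in for four conditional assignments.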

2.5 Register Reversal

Some algorithm-intensive programs (such as the FFT) need bit[n] of the source register to be assigned to bit[31-n] of the destination register. The bit reversal instruction RBIT does exactly this. Without it, the same function takes many shift and logical instructions, plus a register for intermediate values. Using the bit reversal instruction significantly reduces the number of instructions required and saves registers.

The register reversal instructions in Thumb-2 are shown in Table 4. They cover bit reversal, byte reversal and signed reversal.

Table 4 Reversal instruction set [image: 4.gif]

Encoding/decoding and encryption/decryption programs often need to swap the high and low bytes of data. The byte reversal instruction REV does this directly. It reduces the number of instructions, saves registers and improves software execution efficiency.
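The two most common reversals can be modelled in Python as a minimal sketch. `rbit` and `rev` mirror the instruction semantics on 32-bit values, while `fft_bit_reversed_index` is a hypothetical helper showing how a bit reversal yields the reordering index for an FFT whose length is a power of two.

```python
MASK32 = 0xFFFFFFFF


def rbit(x: int) -> int:
    """Model of RBIT: bit[n] of the input becomes bit[31-n] of the result."""
    result = 0
    for n in range(32):
        if (x >> n) & 1:
            result |= 1 << (31 - n)
    return result


def rev(x: int) -> int:
    """Model of REV: reverse the byte order of a 32-bit word."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def fft_bit_reversed_index(index: int, log2_n: int) -> int:
    """Bit-reversed index for an FFT of 2**log2_n points (hypothetical helper)."""
    return rbit(index) >> (32 - log2_n)
```

For an 8-point transform, `fft_bit_reversed_index(i, 3)` reverses the three index bits, so index 1 maps to 4 and index 3 maps to 6. On the processor this is one RBIT followed by one shift.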

2.6 Implementing jump tables

Jump tables are a common way to implement multi-way branches in high-level languages. Both the ARM and Thumb instruction sets can implement them well. The ARM instruction set is generally used to generate high-performance code, and the compiler optimizes for performance at the cost of code density. The Thumb compiler uses compressed data tables to keep the code as small as possible.

The Thumb-2 instruction set introduces two table branch instructions, TBB and TBH, which branch through a table of byte offsets and a table of halfword offsets respectively. They combine the advantages of ARM and Thumb: the jump table is implemented on a compressed data table with a minimum of instructions, giving high performance with small code and data.
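The idea of a compressed branch table can be sketched in Python. The model assumes the usual TBB form in which each table entry is an unsigned byte holding half of the forward distance from the table base to the target. The addresses and the helper `build_tbb_table` are hypothetical.

```python
def build_tbb_table(base: int, targets: list[int]) -> bytes:
    """Encode branch targets as byte entries: (target - base) / 2."""
    table = bytearray()
    for target in targets:
        offset = target - base
        if offset < 0 or offset % 2 != 0 or offset // 2 > 0xFF:
            raise ValueError(f"target {target:#x} not reachable with a byte entry")
        table.append(offset // 2)
    return bytes(table)


def tbb_target(base: int, table: bytes, index: int) -> int:
    """Model of TBB: branch to base + 2 * table[index]."""
    return base + 2 * table[index]


# Hypothetical layout: table base and three case handlers.
BASE = 0x1000
handlers = [0x1004, 0x1010, 0x1100]
table = build_tbb_table(BASE, handlers)
```

Here the table costs one byte per case instead of the four bytes per case that a table of full 32-bit addresses would need. TBH works the same way with halfword entries, for targets that are further away.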

2.7 Improving the speed of small value calculations

Ordinary control and data acquisition systems often do not need very high precision, and 12-bit data is sufficient. However, neither the ARM nor the Thumb instruction set provides instructions that take a full 12-bit immediate (as mentioned above, the 12-bit immediate field of an ARM instruction holds only 8 significant bits). Thumb-2 technology provides two instructions that use a 12-bit immediate in addition and subtraction. Their format is shown in Table 5. Using these two instructions can improve data processing speed.

Table 5 Arithmetic operations of 12-bit immediate numbers [image: 5.gif]

Closed-loop control systems in particular compute a deviation from the preset value and the feedback value and drive the controlled object according to that deviation. The preset value is usually a constant that does not change while the system runs. Developers used to high-level languages like to define the constant with a macro, but reading a constant stored in memory slows data processing, and keeping it in a register ties up a valuable register. When the preset value rarely changes, encoding it directly as a 12-bit immediate that takes part in the arithmetic saves storage space and speeds up data processing, and it matches the 12-bit A/D converter used for feedback sampling.
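The difference between the two immediate forms can be checked with a short Python sketch. It assumes the classic ARM data-processing immediate (an 8-bit value rotated right by an even amount) and the plain 12-bit immediate of the Thumb-2 ADDW/SUBW instructions. The setpoint value 2500 is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def arm_immediate_encodable(value: int) -> bool:
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated <= 0xFF:
            return True
    return False


def imm12_encodable(value: int) -> bool:
    """True if value fits the plain 12-bit immediate of ADDW/SUBW."""
    return 0 <= value <= 0xFFF


def subw(rn: int, imm12: int) -> int:
    """Model of SUBW Rd, Rn, #imm12: Rd = Rn - imm12 with 32-bit wraparound."""
    if not imm12_encodable(imm12):
        raise ValueError("immediate does not fit in 12 bits")
    return (rn - imm12) & MASK32


SETPOINT = 2500  # hypothetical preset value for a 12-bit A/D reading
```

With this setpoint, `arm_immediate_encodable(SETPOINT)` is false while `imm12_encodable(SETPOINT)` is true, so the deviation `subw(sample, SETPOINT)` needs neither a literal load nor a register for the constant.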

2.8 Maximize the use of registers

On load-store architectures such as ARM, accessing data in registers is far more efficient than accessing data in memory, so keeping software variables in registers is much better than allocating storage space for them.

The ARM Cortex processor has 13 general-purpose registers, R0 to R12, or 14 if the link register R14 is also counted. Real application software usually has more variables than this, but their values are often small, so several variables can be stored in one register. Registers can also be time-multiplexed between the local variables of different functions to make full use of them.
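Storing several small variables in one register amounts to bit-field insertion and extraction. The Python sketch below models this with functions named after the Thumb-2 BFI and UBFX instructions. The three fields (a 12-bit sample, an 8-bit duty value and a 4-bit state) are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def bfi(rd: int, rn: int, lsb: int, width: int) -> int:
    """Model of BFI Rd, Rn, #lsb, #width: insert the low bits of Rn into Rd."""
    field_mask = ((1 << width) - 1) << lsb
    return ((rd & ~field_mask) | ((rn << lsb) & field_mask)) & MASK32


def ubfx(rn: int, lsb: int, width: int) -> int:
    """Model of UBFX Rd, Rn, #lsb, #width: extract an unsigned bit field."""
    return (rn >> lsb) & ((1 << width) - 1)


# Hypothetical layout: sample in [11:0], duty in [19:12], state in [23:20].
packed = 0
packed = bfi(packed, 0xABC, 0, 12)   # 12-bit sample
packed = bfi(packed, 0x5A, 12, 8)    # 8-bit duty value
packed = bfi(packed, 0x9, 20, 4)     # 4-bit state
```

Each field is then read back with one modelled UBFX, for example `ubfx(packed, 12, 8)` for the duty value, and updated with one BFI without disturbing its neighbours.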

2.9 Other

Thumb-2 technology provides two compare-with-zero-and-branch instructions, CBZ and CBNZ, which branch if a register is zero or non-zero respectively. They replace a common two-instruction sequence: a compare with zero followed by a conditional branch. Such a sequence is typically used to check whether a pointer is null. The Thumb-2 instruction set also adds coprocessor access instructions, so that Thumb-2 code can directly use vector floating-point coprocessors and other coprocessors. Together with the other instructions for accessing system registers, this allows an entire application to be written in Thumb-2 instructions, with no need to switch to ARM state for special functions.

3 Conclusion

To get the best performance from an application, you need to write optimized assembly code, but only the critical code that has the greatest impact on performance is worth optimizing. Performance analyzers or instruction cycle counting tools can be used to find these critical sections. The basic idea of optimization is to make the code as small as possible to save storage space, to make it execute as efficiently as possible to obtain higher performance, and to reduce power consumption.