As embedded technology develops, the requirements for system intelligence and miniaturization keep rising. Microprocessors based on the ARM architecture are widely used in electronic products thanks to their high performance, low power consumption, and low price, especially in high-end embedded control applications such as mobile phones, industrial control, and network communications. ARM's partners include many of the world's top semiconductor companies, and ARM technology is now found almost everywhere.
The TCP/IP protocol suite has become the worldwide standard for open system interconnection. It provides excellent interoperability and is compatible with a variety of network technologies. The combination of embedded technology and TCP/IP has shown strong development momentum and huge market potential. How to develop efficient code for ARM, and in particular how to improve the execution efficiency of basic software modules such as the TCP/IP protocol stack, is a question every developer of ARM-based embedded systems has to consider.
Program optimization for ARM
Developing efficient programs involves many aspects, including excellent algorithm implementation, good programming style, and targeted program optimization. Program optimization refers to the process of using software development tools to adjust and improve program code after software programming is basically completed, so that the program can make better use of limited software and hardware resources, reduce code size, and improve operating efficiency.
In practice, the two goals of program optimization (running speed and code size) often conflict. Improving running speed often means sacrificing storage space and increasing the amount of code, while reducing code size and memory use may lower running speed. Depending on the focus, program optimization can therefore be divided into speed optimization and code size optimization. With the continuous development of microelectronics technology, storage space is no longer the main factor restricting system integration, so program optimization for ARM is mainly concerned with writing C programs that run efficiently, based on an understanding of the assembly language and of how the compiler translates C code.
ARM is a high-performance, low-power RISC architecture, and its C compilers are already very mature. Nevertheless, when writing C source code for ARM, deliberate optimization is still an effective way to improve program efficiency. The following are some typical optimization principles and methods used while implementing the TCP/IP protocols. These techniques also apply to other microprocessors with RISC instruction sets.
Variable definitions
The instruction set of the 32-bit ARM processor supports signed and unsigned 8-bit, 16-bit, and 32-bit integer types as well as floating-point variable types, which both saves code and improves running efficiency. By scope, C variables can be divided into global variables and local variables. The ARM compiler usually places global variables in memory and allocates local variables to general-purpose registers.
When declaring global variables, consider the memory layout so that variables of different types align on 32-bit boundaries with as little padding as possible, which reduces wasted storage space and improves running efficiency. For example:
The four variables defined here are of the same form, but in different orders, which leads to different data layouts in the final image, as shown in Figure 1. Obviously, the second method saves more memory space.
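The declarations this paragraph refers to are not reproduced here. The following is a minimal sketch of the idea, not the original listing: the names a, b, c, d and their types are assumptions, and the variables are wrapped in structs (another assumption) because C fixes the order of struct members, whereas the placement of separate globals is up to the toolchain. The offsets in the comments assume a typical ABI with natural alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Small and large members interleaved: padding is needed twice. */
struct loose_layout {
    int8_t  a;   /* offset 0, followed by 1 padding byte  */
    int16_t b;   /* offset 2                              */
    int8_t  c;   /* offset 4, followed by 3 padding bytes */
    int32_t d;   /* offset 8                              */
};

/* Same members, reordered so the small ones share one 32-bit word. */
struct tight_layout {
    int8_t  a;   /* offset 0 */
    int8_t  c;   /* offset 1 */
    int16_t b;   /* offset 2 */
    int32_t d;   /* offset 4 */
};

/* Helpers that report the size of each layout in bytes. */
size_t loose_size(void) { return sizeof(struct loose_layout); }
size_t tight_size(void) { return sizeof(struct tight_layout); }
```

Under these assumptions the reordered layout needs two 32-bit words instead of three.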
For local variables, avoid types narrower than 32 bits where possible. When a function has only a few local variables, the compiler assigns them to registers, with each variable occupying a full 32-bit register. In that case short and char variables save no space, and they cost extra instruction cycles because the compiler has to keep the value within the range of the narrower type. The C language code and its compilation results are shown below:
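The original listing and its compiler output are not reproduced here. As a hedged illustration, the three functions below (names are assumptions) differ only in the width of their argument and return types. The comments describe what an ARM compiler typically has to do; the exact instructions depend on the compiler and the core.

```c
/* 32-bit version: the result fits a register as is,
 * so a single add is typically enough. */
int wordinc(int a)
{
    return a + 1;
}

/* 16-bit version: after the add, the result has to be narrowed
 * back to 16 bits, which typically costs extra instructions. */
short shortinc(short a)
{
    return (short)(a + 1);
}

/* 8-bit version: after the add, the result has to be masked
 * back to 8 bits, again at the cost of an extra instruction. */
unsigned char charinc(unsigned char a)
{
    return (unsigned char)(a + 1);
}
```

All three compute the same thing for small inputs; only the narrowing step differs, and it is that step that makes the short and char versions slower.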
Conditional Execution
Conditional execution is a basic operation in any program. A typical conditional code sequence starts with a comparison instruction, followed by a series of instructions that depend on its outcome. On ARM, conditional execution works by testing the condition flags set by a previous instruction. Many data-processing instructions can set the flags as a side effect, and the N and Z flags they produce are the same as those a comparison of the result with 0 would produce. C has no notion of flag-setting instructions, but in C programs for ARM, if the result of an operation is compared with 0, the compiler can remove the comparison instruction and perform both the operation and the test with a single flag-setting instruction. For example:
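The original example is not reproduced here. The sketch below, with assumed function names, contrasts a test against 0 with a test against another constant. The comments describe typical compiler behavior, not guaranteed output.

```c
/* The sum is compared with 0. An ARM compiler can typically use a
 * flag-setting add (ADDS) and test the N flag, with no separate CMP. */
int sum_is_negative(int x, int y)
{
    return ((x + y) < 0) ? 1 : 0;
}

/* The sum is compared with 10. The flags from the add are of no use
 * here, so an explicit compare instruction is typically needed. */
int sum_below_ten(int x, int y)
{
    return ((x + y) < 10) ? 1 : 0;
}
```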
Therefore, conditions in C programs for ARM should be written as a comparison with 0 wherever possible. In C, conditional execution is mostly applied in if statements, but it is also used when evaluating complex expressions with relational operators (<, ==, >, etc.) and Boolean operators (&&, !, etc.). For signed variables, prefer the relational tests x<0, x>=0, x==0, and x!=0; for unsigned variables, prefer x==0 and x!=0 (or x>0). The compiler can optimize all of these using conditional execution.
The conditions in if and else statements should be kept as simple as possible. Unlike in traditional C programming, in C programming for ARM similar conditions in a relational expression should be grouped together so that the compiler can optimize the tests.
Loops
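As a small sketch of grouping (the function names and the 8x8 range are assumptions), the three functions below perform the same bounds check. Whether the grouped form actually compiles to better code depends on the compiler. The third form shows what grouping makes possible: once both tests on one variable sit together, they can be folded into a single unsigned comparison.

```c
/* Conditions on x and y interleaved. */
int in_box_interleaved(int x, int y)
{
    return (x >= 0 && y >= 0 && x < 8 && y < 8);
}

/* Same test, with the conditions on each variable grouped together. */
int in_box_grouped(int x, int y)
{
    return (x >= 0 && x < 8) && (y >= 0 && y < 8);
}

/* Each grouped pair folded into one unsigned compare: a negative
 * value converts to a large unsigned number and fails the test. */
int in_box_unsigned(int x, int y)
{
    return ((unsigned int)x < 8u) && ((unsigned int)y < 8u);
}
```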
Loops are a very common structure in programs. In embedded systems, a large share of the processor's execution time is spent inside loops, so their efficiency deserves attention. Besides keeping the core loop body as simple as correctness allows, a correct and efficient loop termination condition is also important. Following the "compare with 0" principle described above, loops should count down to 0, and the termination condition should be as simple as possible. Using this form in critical loops lets the compiler omit unnecessary comparison instructions, which reduces overhead and improves performance. As shown in the following two examples:
In fact1 and fact2, the local variable a is introduced to reduce load/store operations on n. fact2 follows the "compare with 0" principle, so the comparison instruction that appears in the compiled code for fact1 is no longer needed. In addition, n takes no part in the calculation inside the loop of fact2 and does not have to be kept for the duration of the loop. The register this frees is available to other parts of the function, which also improves running efficiency.
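The listings for fact1 and fact2 are not reproduced here. The sketch below is one plausible reconstruction under stated assumptions: both compute a factorial, a holds the running product, and the argument is unsigned so that the count-down loop always terminates.

```c
/* Counts up: every iteration has to compare i with n,
 * so n must stay available for the whole loop. */
unsigned int fact1(unsigned int n)
{
    unsigned int i, a = 1;

    for (i = 1; i <= n; i++)
        a *= i;
    return a;
}

/* Counts down to 0: the decrement itself can set the flags,
 * so no separate compare is needed and n is not used in the loop. */
unsigned int fact2(unsigned int n)
{
    unsigned int i, a = 1;

    for (i = n; i != 0; i--)
        a *= i;
    return a;
}
```

Both functions return the same value for every n whose factorial fits in an unsigned int.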
The "decrement to 0" method also applies to while and do statements. If a loop body runs only a few times, the loop can be unrolled to improve running efficiency. Once the loop is unrolled, the loop counter and the associated branch are no longer needed. The code becomes longer, but it executes faster.
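A minimal sketch of unrolling, assuming a fixed count of four 32-bit words (the function names and the count are illustrative):

```c
#include <stdint.h>

/* Loop version: a counter and a branch on every iteration. */
uint32_t sum4_loop(const uint32_t *p)
{
    uint32_t sum = 0;
    unsigned int i;

    for (i = 4; i != 0; i--)
        sum += *p++;
    return sum;
}

/* Unrolled version: no counter and no branch, at the cost of
 * slightly more code. */
uint32_t sum4_unrolled(const uint32_t *p)
{
    return p[0] + p[1] + p[2] + p[3];
}
```

Unrolling pays off only for small, fixed iteration counts; for long loops the growth in code size usually outweighs the saved branches.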
Division and remainder
The ARM instruction set does not provide an integer division instruction. Division is implemented by routines in the C library (signed _rt_sdiv and unsigned _rt_udiv). A 32-bit division takes 20 to 140 cycles, depending on the values of the numerator and denominator. The time taken is a constant plus a time proportional to the number of quotient bits to be computed:
Time(numerator/denominator) = C0 + C1 × log2(numerator/denominator)
                            = C0 + C1 × (log2(numerator) - log2(denominator))
Since division has a long execution cycle and consumes a lot of resources, it should be avoided as much as possible in program design. Here are some workarounds to avoid calling division:
(1) In some situations a division can be rewritten as a multiplication. For example, if y is known to be positive and y×z fits in an integer, (x/y)>z can be written as x>(z×y). This holds for the exact quotient; with truncating integer division and non-negative x, the exactly equivalent test is x>=(z+1)×y.
(2) Use powers of 2 as divisors whenever possible. For such divisors the compiler can use a shift instead of calling the division routine, so 128 is a better choice than 100. Unsigned division is also faster than signed division.
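(3) The remainder operator is often used for modulo counting. The same effect can sometimes be achieved with an if statement. Consider the following two versions of a counter that wraps at 60; counter2 avoids the division, and the two return the same value as long as count is below 60 on entry:

typedef unsigned int uint;

uint counter1(uint count)
{
    return (++count % 60);
}

uint counter2(uint count)
{
    if (++count >= 60)
        count = 0;
    return (count);
}
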
(4) For some special division and remainder operations, the lookup table method can also achieve good performance.
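Workaround (2) above can be sketched as follows, assuming an unsigned operand and a divisor of 128 (the function names are illustrative). For unsigned values, dividing by a power of 2 is a right shift and the remainder is a bit mask, so neither needs the division routine. A compiler will normally make this substitution itself when the divisor is a constant power of 2; writing it out only makes the cost visible.

```c
#include <stdint.h>

/* x / 128 for unsigned x: 128 == 2^7, so shift right by 7. */
uint32_t div_by_128(uint32_t x)
{
    return x >> 7;
}

/* x % 128 for unsigned x: keep the low 7 bits. */
uint32_t mod_by_128(uint32_t x)
{
    return x & 127u;
}
```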
When dividing by certain constants, a function written specifically for that constant is much more efficient than the general-purpose division code the compiler would otherwise call. The ARM C library has two such functions, for dividing signed and unsigned numbers by 10, which are used for fast decimal conversion. ARM and Thumb versions of these two functions can be found in the files examples\explasm\div.c and examples\thumb\div.c in the toolkit's examples subdirectory.
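To illustrate what a constant-specific division routine can look like, here is a minimal sketch of unsigned division by 10 using a multiply and a shift. This is not the ARM library's implementation, and the name udiv10 is an assumption. It relies on the well-known reciprocal constant 0xCCCCCCCD, which equals 2^35/10 rounded up, and on a 32×32 to 64-bit multiply being available.

```c
#include <stdint.h>

/* Unsigned divide by 10 without a division instruction:
 * n / 10 == (n * 0xCCCCCCCD) >> 35 for every 32-bit n. */
uint32_t udiv10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Remainder derived from the quotient: n - 10 * (n / 10). */
uint32_t umod10(uint32_t n)
{
    return n - 10u * udiv10(n);
}
```

A typical use is converting a binary value to decimal digits, where each step needs both the quotient and the remainder by 10.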
Application of ARM-oriented program optimization in embedded TCP/IP protocol implementation
The author uses ATMEL's AT91RM9200 microprocessor and Ethernet physical layer driver chip (DM9161) to build a network-oriented embedded system hardware platform, as shown in Figure 2. On this platform, embedded TCP/IP protocol processing based on ARM microprocessor is implemented.
The ARM-based embedded system handles Ethernet data directly; the typical Ethernet frame encapsulation format is shown in Figure 3. Following the optimization methods above, variables should be defined with the memory layout in mind so that the different types align on 32-bit boundaries, and the data operated on inside functions should be handled as 32-bit values wherever possible.
Embedded TCP/IP implementations usually follow the layered TCP/IP network structure used in Linux. The protocol stack implements ARP/RARP, IP, and ICMP at the network layer and TCP and UDP at the transport layer, and directly supports application layer protocols such as HTTP, SMTP, FTP, and TELNET. Each system needs to define its own interface between the application layer programs and the protocol software.
The general flow of protocol processing is shown in Figure 4. Protocol processing requires many conditional tests, and loops for comparing IP addresses and computing checksums over TCP data are unavoidable. The "compare with 0" form of condition and the "count down to 0" form of loop can therefore be used throughout to optimize the program.
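As a hedged illustration of how these two principles fit a checksum routine, the sketch below computes the standard Internet checksum (the 16-bit one's complement sum described in RFC 1071) with a loop that counts down to 0. It is not the author's code: the function name is an assumption, and the 32-bit accumulator assumes buffers no longer than a maximum-size IP datagram (65535 bytes).

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over len bytes, read as big-endian 16-bit words. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;          /* 32-bit accumulator, as advised for locals */
    size_t words = len >> 1;   /* number of complete 16-bit words */

    /* "Count down to 0" loop: the end test is a comparison with 0. */
    for (; words != 0; words--) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
    }

    /* An odd trailing byte is padded with a zero low byte. */
    if ((len & 1u) != 0)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while ((sum >> 16) != 0)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the byte sequence 00 01 f2 03 f4 f5 f6 f7 used as the worked example in RFC 1071, the folded sum is 0xddf2 and the function returns its complement, 0x220d.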
Conclusion
Besides the ARM-specific optimization principles and methods described above, C programming itself offers many other optimization techniques. During development on the ARM-based hardware platform described above, making full use of these ARM-oriented C optimization methods reduced the executable code of the TCP/IP protocol processing module by more than 5% and improved its execution efficiency. Practice has shown that in the design of ARM-based embedded systems, a thorough understanding of ARM assembly instructions and of the compilation process, combined with a sensible use of these optimization principles and methods, effectively improves both the quality of the compiled code and its execution efficiency.