Understanding the Development of ARM Architecture

　　A processor architecture defines the instruction set architecture (ISA) and the programmer's model for every processor based on it. Individual processors differ in performance and target different applications, but each implementation must conform to the architecture. The ARM architecture gives embedded system developers high system performance while maintaining excellent power and area efficiency.

The development of the ARM architecture

　　The ARM architecture evolves steadily to meet the needs of ARM partners and the design community. Every major revision of the architecture adds critical new technology, and the features added between major revisions are introduced as variants of the architecture. In the list below, the version names mark major revisions of the architecture, and the letter keys attached to them mark the variants.

V3 architecture: 32-bit addressing.

T: Thumb state, with 16-bit instructions.

M: long multiply support (32*32=>64 or 32*32+64=>64). This feature has since become standard.

V4 architecture: adds halfword load and store operations.

D: debug support.

I: EmbeddedICE (in-circuit emulation).
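The two long multiply forms listed above (32*32=>64 and 32*32+64=>64) can be modeled in a few lines. This is a minimal sketch of the unsigned arithmetic only, not of any instruction encoding; the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def long_multiply(a, b):
    """Unsigned 32 x 32 => 64: the full product always fits in 64 bits."""
    return ((a & MASK32) * (b & MASK32)) & MASK64


def long_multiply_accumulate(acc, a, b):
    """Unsigned 32 x 32 + 64 => 64: the sum wraps modulo 2**64 on overflow."""
    return ((acc & MASK64) + (a & MASK32) * (b & MASK32)) & MASK64


def split_hi_lo(value):
    """Split a 64-bit result into the two 32-bit halves a register pair would hold."""
    return (value >> 32) & MASK32, value & MASK32
```

The point of the 64-bit result is that no bits of the product are lost: multiplying the two largest 32-bit values still yields an exact answer, which a 32-bit result could not hold.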

　　Processors (cores) belonging to the V4 architecture include ARM7, ARM7100 (a processor based on the ARM7 core), and ARM7500 (a processor based on the ARM7 core). Processors (cores) belonging to the V4T (supporting Thumb instructions) architecture include ARM7TDMI, ARM7TDMI-S (the synthesizable version of ARM7TDMI), ARM710T, ARM720T, and ARM740T (processors based on the ARM7TDMI core), ARM9TDMI, ARM910T, ARM920T, and ARM940T (processors based on the ARM9TDMI core), and StrongARM (Intel product).

V5 architecture: improves interworking between ARM and Thumb instructions.

E: DSP instruction support.

J: Java instruction support.

　　Processors (cores) belonging to the V5T (supporting Thumb instructions) architecture include ARM10TDMI and ARM1020T (a processor based on the ARM10TDMI core).

　　Processors (cores) belonging to the V5TE (supporting Thumb and DSP instructions) architecture include ARM9E, ARM9E-S (the synthesizable version of ARM9E), ARM946 and ARM966 (processors based on the ARM9E core), ARM10E, ARM1020E and ARM1022E (processors based on the ARM10E core), and XScale (Intel product).
　
　　Processors (cores) belonging to the V5TEJ (supporting Thumb, DSP, and Java instructions) architecture include ARM9EJ, ARM9EJ-S (the synthesizable version of ARM9EJ), ARM926EJ (a processor based on the ARM9EJ core), and ARM10EJ.

　　The V6 architecture adds media instructions. Processor cores belonging to the V6 architecture include ARM11. The ARM architecture has four special instruction sets: Thumb instructions (T), DSP instructions (E), Java instructions (J), and media instructions. The V6 architecture includes all four. For backward compatibility, ARMv6 also retains the memory management and exception handling of ARMv5. This lets third-party developers build on existing work and supports the reuse of software and designs.

　　The new architecture is not intended to replace existing architectures or make them redundant. New CPU cores and derivatives will continue to be built on the existing architectures while keeping pace with manufacturing processes. For example, the ARM7TDMI core, based on the V4T architecture, is still widely used in new products.

The driving force for the development of new architectures

　　The development of next-generation architectures is driven by the emergence of new products and changing markets. The key design constraints are clear: functionality, performance, speed, power consumption, area, and cost must be balanced against the needs of each application. Leading performance per unit of power (MIPS/Watt) has been the cornerstone of ARM's success and remains an important criterion for future applications. As computing and communications continue to converge in many consumer areas, functions are becoming more complex, and consumers expect advanced user interfaces, multimedia, and better product performance. ARMv6 is designed to support these new features and technologies more effectively.

　　The markets driving the development of the ARMv6 architecture are mainly wireless, networking, automation, and consumer entertainment. ARM worked with architecture licensees and major partners such as Intel, Microsoft, Symbian, and TI to define the requirements of the ARMv6 architecture.
　
Improvements in the ARMv6 architecture

　　During the development of the ARMv6 architecture, efforts focused on five areas: memory management, multi-processor support, multimedia support, data processing, and exceptions and interrupts.

Memory Management

　　Memory management has a significant impact on system design and performance. Improvements to the memory architecture greatly improve overall processor performance, especially for platform-oriented applications. The ARMv6 architecture improves instruction and data fetch efficiency, so the processor spends less time waiting for instructions and for data to be reloaded after a cache miss. Improvements in memory management are expected to increase system performance by 30%. They also make bus usage more efficient, and less bus activity means lower power consumption.
　
Multi-processor

Application convergence is driving system implementations toward multi-processor designs. Wireless platforms, especially 2.5G and 3G, are typical applications that integrate multiple ARM processors, or an ARM processor and a DSP. Multi-processor devices share data efficiently through shared memory. The new ARMv6 capabilities for data sharing and synchronization make multi-processor systems easier to implement and improve their performance. New instructions enable complex synchronization strategies, further improving system performance.
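The text does not name the new synchronization instructions; in ARMv6 they are the exclusive load and store pair, LDREX and STREX. The sketch below is a toy Python model of the idea, not of the hardware: the class `ExclusiveWord`, the reservation set, and the CPU labels are all assumptions made for illustration. A store succeeds only if no other processor has written the word since this processor loaded it, so an update is retried until it goes through uninterrupted.

```python
class ExclusiveWord:
    """Toy model of one shared memory word with an exclusive monitor."""

    def __init__(self, value=0):
        self.value = value
        self._reservations = set()  # CPUs that currently hold a reservation

    def load_exclusive(self, cpu):
        """Read the word and record a reservation for this CPU (LDREX-like)."""
        self._reservations.add(cpu)
        return self.value

    def store_exclusive(self, cpu, new_value):
        """Write only if this CPU's reservation is still intact (STREX-like).

        Returns True on success (the real instruction reports success as 0).
        A successful store clears every reservation, so any other CPU that
        loaded the old value must reload and retry.
        """
        if cpu not in self._reservations:
            return False
        self.value = new_value
        self._reservations.clear()
        return True


def atomic_add(word, cpu, amount):
    """Retry loop: reload and recompute until the exclusive store succeeds."""
    while True:
        old = word.load_exclusive(cpu)
        if word.store_exclusive(cpu, old + amount):
            return old + amount
```

If two processors both load the word and then both try to store, only the first store succeeds; the second fails and must go around the loop again. Locks, counters, and other shared-memory primitives can be built on this pattern.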

Multimedia Support

Single Instruction Multiple Data (SIMD) capabilities let software implement high-performance media applications, such as audio and video encoders, more efficiently. More than 60 SIMD instructions have been added to the ARMv6 instruction set, and they are expected to improve performance by 2 to 4 times. SIMD capabilities enable developers to implement high-end applications such as image encoding, speech recognition, and 3D graphics, especially those related to next-generation wireless applications.
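To make the SIMD idea concrete, the sketch below models one packed operation: four unsigned 8-bit additions carried out on a single 32-bit word, with each lane wrapping independently. It follows the style of the ARMv6 `UADD8` instruction but ignores the status flags that instruction also sets. The function names are hypothetical, and the code illustrates the semantics only, not the speed.

```python
def packed_add8(x, y):
    """Add four unsigned 8-bit lanes packed in 32-bit words; each lane wraps mod 256."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((x >> shift) & 0xFF) + ((y >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift  # drop the carry out of the lane
    return result


def packed_add8_no_loop(x, y):
    """Same result computed on the whole word at once.

    Add the low 7 bits of every lane (a carry cannot cross into the next
    lane), then fix up each lane's top bit with an XOR.
    """
    low7 = (x & 0x7F7F7F7F) + (y & 0x7F7F7F7F)
    return (low7 ^ ((x ^ y) & 0x80808080)) & 0xFFFFFFFF
```

For example, `packed_add8(0x01FF7F80, 0x01018080)` adds the lane pairs (0x01, 0x01), (0xFF, 0x01), (0x7F, 0x80), and (0x80, 0x80) in one step. Without a packed instruction, software needs either four separate additions or masking tricks like the second function.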

Data Processing

　　The endianness of data refers to the order in which the bytes of a multi-byte value are stored and referenced in memory.

　　As SoC integration increases, a single chip must handle not only little-endian OS environments and interfaces (such as USB and PCI) but also big-endian data (TCP/IP packets, MPEG streams). The ARMv6 architecture supports mixed-endian operation, so this kind of data is handled more efficiently.

　　Unaligned data is data that is not aligned to its natural boundary. In DSP applications, for example, word data is sometimes aligned only to halfword boundaries. To handle this efficiently, the processor needs to be able to load a word from any halfword boundary.

　　Earlier versions of the architecture need a large number of instructions to handle unaligned data. ARMv6-compatible implementations handle unaligned data more efficiently. For DSP algorithms that rely heavily on unaligned data, the ARMv6 architecture provides both performance improvements and code size reductions. Unaligned data support also makes ARM processors more efficient at emulating other processors, such as Motorola's 68000 series.

　　Like ARMv5 implementations such as ARM10 and XScale, ARMv6 is based on a 32-bit processor. ARMv6 can be implemented with bus widths of 64 bits or more. This gives bus bandwidth equal to or greater than that of a 64-bit processor, but with lower power consumption and area than a 64-bit CPU.
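Both ideas in this section, byte order and unaligned access, can be illustrated with a short sketch. It uses Python's standard `struct` module to read the same bytes in either byte order and at any offset. The last function shows the extra work needed when only aligned word loads are available. The function names and the sample buffer are assumptions for illustration, not part of any ARM API.

```python
import struct

MASK32 = 0xFFFFFFFF


def load_word(buffer, offset, big_endian):
    """Read an unsigned 32-bit word at any byte offset, in either byte order."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack_from(fmt, buffer, offset)[0]


def byte_reverse32(word):
    """Swap the byte order of a 32-bit word (big-endian <-> little-endian)."""
    return int.from_bytes((word & MASK32).to_bytes(4, "little"), "big")


def load_word_aligned_only(buffer, offset):
    """Little-endian unaligned load built from aligned word loads only.

    Models the extra work needed when the hardware cannot load a word from
    an unaligned address: two aligned loads, two shifts, and an OR.
    """
    base = offset & ~3                 # address of the aligned word below
    shift = 8 * (offset & 3)           # how far into that word the data starts
    low = load_word(buffer, base, big_endian=False)
    if shift == 0:
        return low                     # already aligned, one load suffices
    high = load_word(buffer, base + 4, big_endian=False)
    return ((low >> shift) | (high << (32 - shift))) & MASK32
```

With the buffer `bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])`, the word at offset 0 reads as 0x11223344 in big-endian order and 0x44332211 in little-endian order. The word at the halfword boundary (offset 2) spans two aligned words, which is the case that hardware support for unaligned access turns into a single load.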

Exceptions and Interrupts

　　For real-time systems, interrupt efficiency is critical. In applications such as hard disk controllers and engine management, failing to respond to an interrupt in time can have serious consequences. Handling interrupts and exceptions more efficiently also improves overall system performance, and it is especially important for reducing system latency. In the ARMv6 architecture, new instructions have been added to the instruction set to improve the implementation of interrupts and exceptions. They make exception handling in privileged modes more efficient.

Main features of ARM11

ARM11 is the first implementation of the ARMv6 architecture. The ARM11 microarchitecture is designed for high performance, and its pipeline is the key to achieving that goal. Unlike earlier ARM cores, the ARM11 pipeline has 8 stages, which makes its throughput 40% higher than that of previous cores.

Single instruction issue

　　The pipeline of the ARM11 microarchitecture is scalar: only one instruction is issued at a time (single issue). Some pipeline designs can issue several instructions at once, for example one to the ALU pipeline and one to the MAC pipeline in the same cycle.

　　In theory, a multi-issue microarchitecture is more efficient. In practice, it makes the front-end instruction decode stage more complex, because more logic is needed to track dependencies between instructions, and this increases processor area and power consumption.
　
Branch prediction

　　Branch instructions are usually conditional: a condition must be tested before execution jumps to a new instruction. Because the condition codes needed to resolve a conditional instruction may not be available until three or four cycles later, branches can stall the pipeline.

　　Branch prediction helps avoid this delay. The ARM11 microarchitecture uses two techniques to predict branches. First, a dynamic predictor uses the recorded history of a branch to decide whether it is usually taken or usually not taken. The dynamic predictor is a 64-entry branch target address cache (BTAC) in which each entry has one of four states: Strongly Taken, Weakly Taken, Weakly Not Taken, or Strongly Not Taken. The table is large enough to hold the most recently executed branches, and each prediction is based on the previous outcomes of that branch. Second, if the dynamic predictor has no entry for a branch, a static algorithm is used. The static predictor simply checks whether the branch jumps backward or forward. A backward branch is assumed to be a loop and is predicted taken. A forward branch is predicted not taken.

　　With dynamic and static prediction combined, 85% of branch instructions in the ARM11 microarchitecture are predicted correctly.
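The two-level scheme described above can be sketched as a small simulation. This is a minimal model under stated assumptions, not the ARM11 design: the table indexing by address bits, the rule for allocating a new entry, and the initial state of an entry are all choices made for illustration. Only the 64-entry size, the four states, and the backward-taken/forward-not-taken fallback come from the text.

```python
STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN = range(4)


class BranchPredictor:
    """Dynamic 4-state predictor with a static fallback."""

    def __init__(self, entries=64):
        self.entries = entries
        self.table = {}  # index -> (branch_address, state)

    def _index(self, branch_address):
        # Assumption: index by word address modulo the table size.
        return (branch_address >> 2) % self.entries

    def predict(self, branch_address, target_address):
        entry = self.table.get(self._index(branch_address))
        if entry is not None and entry[0] == branch_address:
            return entry[1] >= WEAKLY_TAKEN        # dynamic prediction
        return target_address < branch_address     # static: backward => taken

    def update(self, branch_address, taken):
        index = self._index(branch_address)
        entry = self.table.get(index)
        if entry is None or entry[0] != branch_address:
            # Assumption: a new entry starts in the weak state for its outcome.
            state = WEAKLY_TAKEN if taken else WEAKLY_NOT_TAKEN
        elif taken:
            state = min(entry[1] + 1, STRONGLY_TAKEN)
        else:
            state = max(entry[1] - 1, STRONGLY_NOT_TAKEN)
        self.table[index] = (branch_address, state)


def count_correct(predictor, trace):
    """Replay (branch_address, target_address, taken) records; count correct predictions."""
    correct = 0
    for branch_address, target_address, taken in trace:
        if predictor.predict(branch_address, target_address) == taken:
            correct += 1
        predictor.update(branch_address, taken)
    return correct
```

For a loop branch that is taken nine times and then falls through, this model mispredicts only the final iteration of each pass. The four states matter here: one not-taken outcome moves the entry from Strongly Taken to Weakly Taken, so the branch is still predicted taken when the loop is entered again.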
　
Memory Access

　　One of the improvements in the memory system of the ARM11 microarchitecture is support for non-blocking and hit-under-miss operation. In a typical processor, the pipeline stalls when the data an instruction needs is not in the cache. In ARM11, the cache starts fetching the missing data while the pipeline continues with the following instructions (non-blocking), and those instructions are allowed to read data that is already in the cache (hit-under-miss).
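The benefit of hit-under-miss can be estimated with a toy timing model. The sketch below rests on several simplifying assumptions that do not come from the ARM11 specification: every access takes one cycle to issue, only one miss can be outstanding at a time, and later instructions never depend on the missing data. The function names and the miss penalty are illustrative.

```python
def blocking_cycles(accesses, miss_penalty):
    """Blocking cache: every miss stalls the pipeline for the full penalty."""
    return sum(1 + (miss_penalty if is_miss else 0) for is_miss in accesses)


def hit_under_miss_cycles(accesses, miss_penalty):
    """Non-blocking cache: hits proceed while one miss is being serviced."""
    cycle = 0
    miss_done = 0  # cycle at which the outstanding miss completes
    for is_miss in accesses:
        if is_miss:
            cycle = max(cycle, miss_done)  # a second miss waits for the first
            cycle += 1
            miss_done = cycle + miss_penalty
        else:
            cycle += 1                     # hit is served under the miss
    return max(cycle, miss_done)           # all data must have arrived
```

With the trace `[True, False, False, False]` (one miss followed by three hits) and a 10-cycle penalty, the blocking model takes 14 cycles and the hit-under-miss model takes 11, because the three hits overlap with the miss. Two back-to-back misses gain nothing in this model, since the second must wait for the first.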
　
Parallel pipeline

　　Although the pipeline is single-issue, its back end uses three parallel units: ALU, MAC (multiply-accumulate), and LS (load/store). The LS pipeline is dedicated to load and store instructions. Decoupling data access from arithmetic lets instructions execute more efficiently: because the LS unit has its own pipeline, ALU and MAC instructions do not stall while waiting for a load or store to finish. This also gives the compiler more freedom to improve performance by reordering code. To make the parallel pipelines more efficient, the ARM11 microarchitecture uses out-of-order completion.
　
64-bit data path

　　For many current applications, a true 64-bit processor is not justified because of its cost and power consumption. The ARM11 microarchitecture uses 64-bit data paths selectively to achieve 64-bit performance at 32-bit cost. It uses a 64-bit data bus between the integer unit and the cache, and between the integer unit and the coprocessor. The 64-bit path can fetch two instructions from the cache in one cycle and can transfer the contents of two ARM registers per cycle. This speeds up many data movement and data processing operations.
　
Floating-point processing

　　The ARM11 microarchitecture supports floating-point processing. The ARM11 product line offers a floating-point unit as an option, so developers can choose the product that matches their needs.
