A processor architecture defines the instruction set architecture (ISA) and the programmer's model for every processor based on it. Although processors differ in performance and target different applications, each implementation must conform to its architecture. The ARM architecture gives embedded system developers high system performance while maintaining excellent power and area efficiency.
The development of the ARM architecture
The ARM architecture evolves steadily to meet the needs of ARM partners and the design community. Every major revision of the ARM architecture adds key technologies, and new features introduced alongside a revision are defined as variants of the architecture. In the list below, the version names indicate improvements to the architecture, and the letters attached to them indicate architecture variants.
V3 architecture: 32-bit addressing.
T Thumb state: 16-bit instructions.
M Long multiplication support (32*32=>64 or 32*32+64=>64). This feature has since become standard.
V4 architecture: adds halfword load and store operations.
D Debug support.
I EmbeddedICE (in-circuit emulation).
Processors (cores) belonging to the V4 architecture include ARM7, ARM7100 (ARM7-core processor), and ARM7500 (ARM7-core processor). Processors (cores) belonging to the V4T architecture (with Thumb instruction support) include ARM7TDMI, ARM7TDMI-S (synthesizable version of ARM7TDMI), ARM710T, ARM720T, and ARM740T (ARM7TDMI-core processors), ARM9TDMI, ARM910T, ARM920T, and ARM940T (ARM9TDMI-core processors), and StrongARM (an Intel product).
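The two long multiplication forms listed above (32*32=>64 and 32*32+64=>64) can be illustrated with a small behavioral model. This is only a sketch of the arithmetic, not of instruction encoding or timing; the function names `umull` and `umlal` are borrowed from the ARM mnemonics for unsigned long multiply and multiply-accumulate, and returning the result as a (low word, high word) pair is an assumption made for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def umull(a, b):
    """32*32 => 64: unsigned multiply, returned as (low word, high word)."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, product >> 32


def umlal(acc_lo, acc_hi, a, b):
    """32*32+64 => 64: multiply, then add to a 64-bit accumulator.

    The accumulator is passed and returned as (low word, high word);
    the sum wraps around at 64 bits.
    """
    acc = ((acc_hi & MASK32) << 32) | (acc_lo & MASK32)
    total = (acc + (a & MASK32) * (b & MASK32)) & MASK64
    return total & MASK32, total >> 32
```

Without such an operation, a 64-bit product has to be assembled from several 32-bit multiplies and additions, which is why this feature matters for DSP-style code.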
V5 architecture: improves interworking between ARM and Thumb instructions.
E DSP instruction support.
J Java instruction support.
Processors (cores) belonging to the V5T architecture (with Thumb instruction support) include ARM10TDMI and ARM1020T (ARM10TDMI-core processor).
Processors (cores) belonging to the V5TE architecture (with Thumb and DSP instruction support) include ARM9E, ARM9E-S (synthesizable version of ARM9E), ARM946 (ARM9E-core processor), ARM966 (ARM9E-core processor), ARM10E, ARM1020E (ARM10E-core processor), ARM1022E (ARM10E-core processor), and XScale (an Intel product).
Processors (cores) belonging to the V5TEJ architecture (with Thumb, DSP, and Java instruction support) include ARM9EJ, ARM9EJ-S (synthesizable version of ARM9EJ), ARM926EJ (ARM9EJ-core processor), and ARM10EJ.
The V6 architecture adds media instructions. Processor cores belonging to the V6 architecture include ARM11.
The ARM architecture has four special instruction sets: Thumb instructions (T), DSP instructions (E), Java instructions (J), and media instructions. The V6 architecture includes all four. For backward compatibility, ARMv6 also retains the memory management and exception handling of ARMv5. This lets third-party developers build on existing work and supports the reuse of software and designs.
A new architecture is not intended to replace existing architectures or make them redundant. New CPU cores and derivatives will continue to be built on these architectures while keeping pace with manufacturing processes. For example, the ARM7TDMI core, based on the V4T architecture, is still widely used in new products.
The driving force for the development of new architectures
The development of next-generation architectures is driven by the emergence of new products and changing markets. The key design constraints are clear: functionality, performance, speed, power consumption, area, and cost must be balanced against the needs of each application. Leading performance per watt (MIPS/Watt) has been the cornerstone of ARM's success and remains an important criterion for future applications. As computing and communications spread into more consumer areas, functions are becoming more complex, and consumers expect advanced user interfaces, multimedia, and better product performance. ARMv6 supports these new features and technologies more effectively.
The markets driving the development of the ARMv6 architecture are mainly wireless, networking, automation and consumer entertainment. ARM worked with architecture licensees and major partners such as Intel, Microsoft, Symbian and TI to define the requirements of the ARMv6 architecture.
Improvements in the ARMv6 architecture
During the development of the ARMv6 architecture, efforts were focused on five areas: memory management, multiprocessing, multimedia support, data processing, and exceptions and interrupts.
Memory Management
The way memory is managed has a significant impact on system design and performance. Improvements to the memory structure greatly improve the overall performance of the processor, especially for platform-oriented applications. The ARMv6 architecture improves the efficiency of instruction and data fetches, so the processor spends less time waiting for instructions and reloading data after cache misses. These memory management improvements can raise system performance by about 30%. They also improve bus utilization, and less bus activity means lower power consumption.
Multiprocessing
Application convergence is driving system implementations toward multiprocessing. Wireless platforms, especially 2.5G and 3G, are typical applications that require the integration of multiple ARM processors, or of ARM processors and DSPs. Multiprocessing devices share data efficiently through shared memory. The new ARMv6 capabilities for data sharing and synchronization make multiprocessor systems easier to implement and improve their performance. New instructions enable complex synchronization strategies, further improving system performance.
Multimedia Support
Single Instruction Multiple Data (SIMD) capabilities let software implement high-performance media applications, such as audio and video encoders, more efficiently. More than 60 SIMD instructions have been added to the ARMv6 instruction set, and they can improve performance by a factor of two to four. SIMD capabilities enable developers to implement high-end applications such as image encoding, speech recognition, and 3D graphics, especially those related to next-generation wireless applications.
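The idea behind packed SIMD arithmetic can be sketched in a few lines: one 32-bit register is treated as several independent narrow lanes, and a single operation updates all of them. The function below is a hypothetical software model written for illustration (the name `packed_add` and its parameters are not from the ARM documentation); it shows modulo lane-wise addition only and ignores status flags and saturation.

```python
def packed_add(x, y, lane_bits):
    """Add two 32-bit words lane by lane; each lane wraps independently.

    lane_bits is the lane width: 16 gives two halfword lanes,
    8 gives four byte lanes.
    """
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 32, lane_bits):
        lane_sum = ((x >> shift) & lane_mask) + ((y >> shift) & lane_mask)
        # Keep only the lane's own bits so a carry never spills into its neighbor.
        result |= (lane_sum & lane_mask) << shift
    return result
```

In hardware the whole word is processed in one instruction rather than a loop, which is where the speedup for audio and video kernels working on 8-bit or 16-bit samples comes from.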
Data Processing
Endianness refers to the order in which the bytes of a data item are stored and addressed in memory.
As SoC integration increases, a single chip contains not only little-endian OS environments and interfaces (such as USB and PCI) but also big-endian data (TCP/IP packets, MPEG streams). The ARMv6 architecture supports mixed-endian operation, so this kind of data is handled more efficiently.
Unaligned data is data that is not aligned to its natural boundary. In DSP applications, for example, word data is sometimes aligned only to halfword boundaries, and the processor needs to be able to load a word from any halfword boundary to handle this case efficiently.
Earlier versions of the architecture need a large number of instructions to handle unaligned data. ARMv6-compatible architectures handle it more efficiently. For DSP algorithms that rely heavily on unaligned data, the ARMv6 architecture provides both performance improvements and code size reductions. Unaligned data support also makes ARM processors more efficient at emulating other processors, such as Motorola's 68000 series.
Like ARMv5 implementations such as ARM10 and XScale, ARMv6 is based on a 32-bit processor. ARMv6 implementations can use buses 64 bits wide or wider. This gives bus bandwidth equal to or greater than that of a 64-bit processor, with lower power and area than a 64-bit CPU.
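The two data-handling issues discussed above, mixed endianness and unaligned access, often appear together, for instance when a big-endian protocol field sits at an odd offset in a buffer. The sketch below models that situation in software; the buffer contents and the helper names `load_be32`, `load_le32`, and `byte_swap32` are made up for illustration.

```python
import struct

# Hypothetical buffer: one pad byte, then a 32-bit field in big-endian
# (network) byte order, so the field starts at an unaligned offset.
packet = bytes([0xAA, 0x12, 0x34, 0x56, 0x78, 0xBB])


def load_be32(buf, offset):
    """Read a big-endian 32-bit word at any byte offset."""
    return struct.unpack_from(">I", buf, offset)[0]


def load_le32(buf, offset):
    """Read a little-endian 32-bit word at any byte offset."""
    return struct.unpack_from("<I", buf, offset)[0]


def byte_swap32(word):
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")
```

Reading the field at offset 1 as little-endian and then byte-swapping gives the same value as reading it as big-endian directly. On a processor without hardware support, both the swap and the unaligned load cost extra instructions, which is the overhead that ARMv6 is described as reducing.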
Exceptions and Interrupts
For real-time systems, interrupt efficiency is critical. In applications such as hard disk controllers and engine management, failing to respond to an interrupt in time can have serious consequences. Handling interrupts and exceptions more efficiently also improves overall system performance, which is especially important when the system clock is slowed down. The ARMv6 architecture adds new instructions to improve the implementation of interrupts and exceptions. These make exception handling in privileged modes more efficient.
Main features of the ARM11
ARM11 is the first implementation of the ARMv6 architecture. The ARM11 microarchitecture is designed for high performance, and the pipeline is the key to achieving it. Unlike previous ARM cores, the ARM11 microarchitecture has an 8-stage pipeline, which gives about 40% higher throughput than previous cores.
Single instruction issue
The pipeline of the ARM11 microarchitecture is scalar, that is, only one instruction is issued at a time (single issue). Other pipeline designs can issue several instructions at once, for example to the ALU and MAC pipelines simultaneously.
In theory, multi-issue microarchitectures are more efficient. In practice, they increase the complexity of the front-end instruction decode stages, because more logic is needed to handle dependencies between instructions, and this increases processor area and power consumption.
Branch prediction
Branch instructions are usually conditional: a condition must be tested before execution jumps to a new instruction. Because the condition codes that a conditional instruction depends on may not be available until three or four cycles later, branches can stall the pipeline.
Branch prediction helps avoid this delay. The ARM11 microarchitecture uses two techniques to predict branches. First, a dynamic predictor uses branch history to determine whether a branch is most often taken or most often not taken. The dynamic predictor is a 64-entry, 4-state (strongly taken, weakly taken, strongly not taken, weakly not taken) branch target address cache (BTAC). The table is large enough to hold the most recent branches, and each prediction is based on previous outcomes. Second, if the dynamic predictor finds no entry for a branch, a static algorithm is used. The static prediction simply checks whether the branch jumps forward or backward. If it jumps backward, it is assumed to be a loop and is predicted taken. If it jumps forward, it is predicted not taken.
With dynamic and static branch prediction combined, 85% of branch instructions in the ARM11 microarchitecture are predicted correctly.
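The two-level scheme described above can be sketched as a small simulation. Only the 64-entry size, the four states, and the backward-taken, forward-not-taken fallback come from the text; the table indexing, the tag check, the state transitions (a conventional 2-bit saturating counter), and the state assigned when an entry is first allocated are all assumptions made for illustration, not details of the actual ARM11 design.

```python
STRONG_NOT, WEAK_NOT, WEAK_TAKEN, STRONG_TAKEN = range(4)
ENTRIES = 64


class BranchPredictor:
    def __init__(self):
        self.table = [None] * ENTRIES  # each slot: (tag, state) or None

    def _slot(self, pc):
        # Assumed indexing: word address modulo the table size.
        return (pc >> 2) % ENTRIES

    def predict(self, pc, target):
        entry = self.table[self._slot(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1] >= WEAK_TAKEN  # dynamic: use recorded history
        return target < pc                 # static: backward branch => taken

    def update(self, pc, taken):
        slot = self._slot(pc)
        entry = self.table[slot]
        if entry is not None and entry[0] == pc:
            state = entry[1]
            if taken:
                state = min(state + 1, STRONG_TAKEN)
            else:
                state = max(state - 1, STRONG_NOT)
        else:
            # Assumed initial state when a branch is first recorded.
            state = WEAK_TAKEN if taken else WEAK_NOT
        self.table[slot] = (pc, state)


def run_loop(predictor, pc, target, iterations):
    """Simulate a loop-closing branch, taken on every pass except the last.

    Returns the number of correct predictions.
    """
    correct = 0
    for i in range(iterations):
        taken = i < iterations - 1
        if predictor.predict(pc, target) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct
```

In this model, a backward loop branch is predicted correctly from its first execution thanks to the static rule, and only the final loop exit is mispredicted. The "weak" states keep a single unusual outcome from flipping the prediction for a branch that has moved into a "strong" state.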
Memory Access
One of the improvements in the memory system of the ARM11 microarchitecture is support for non-blocking and hit-under-miss operation. On a typical processor, the pipeline stalls when the data an instruction needs is not in the cache. On the ARM11, the cache starts fetching the missing data while the pipeline continues with the following instructions (non-blocking), and those instructions are allowed to read data that is already in the cache (hit-under-miss).
Parallel pipeline
Although the pipeline is single-issue, its back end uses three parallel units: the ALU, the MAC (multiply-accumulate), and the LS (load/store) pipelines. The LS pipeline is dedicated to load and store instructions. Decoupling data access from arithmetic operations lets instructions execute more efficiently: with a separate LS pipeline, ALU and MAC instructions do not stall waiting for an LS instruction. This also gives the compiler more freedom to improve performance by reordering code. To make the parallel pipelines more efficient, the ARM11 microarchitecture uses out-of-order completion.
64-bit data path
For many current applications, a true 64-bit processor is unnecessary given its cost and power consumption. The ARM11 microarchitecture uses 64-bit data paths selectively to approach 64-bit performance at 32-bit cost. It has a 64-bit data bus between the processor integer unit and the cache, and between the integer unit and the coprocessor. The 64-bit path can fetch two instructions from the cache in one cycle and can transfer the contents of two ARM registers per cycle. This speeds up many data movement and data processing operations.
Floating-point processing
The ARM11 microarchitecture supports floating-point processing. The ARM11 product line offers a floating-point unit as an option, so developers can choose the product that fits their needs.