Design of X86 decoding SOC architecture based on ARM embedded platform

Publisher: 风暴使者 Latest update time: 2010-09-18 Source: 现代电子技术 Keywords: ARM

Binary translation is also a form of compilation. It differs from a traditional compiler in what it processes: a traditional compiler takes a program written in a high-level language and generates target code for a particular machine.

Binary translation, by contrast, translates executable binary programs directly, so that a binary built for one processor can run on another. This makes it easy to port binary programs between processors, widens the range of hardware on which existing software can run, and helps break the mutual lock-in between processors and their supporting software. Its advantages are: software can be ported from old platforms to new ones without recompiling the source code; software, including operating systems and compilers, can be provided quickly for new machines; the translated code can be optimized to exploit the features of the new machine; training costs fall, because the same software is used and employees need no retraining on the new platform; and the cost of multi-platform software is reduced.

1 SOC Architecture Design

1.1 Processor Determination

A general-purpose processor plus hardware logic is the mainstream architecture for SoC design. In applications that process large amounts of data, however, such an architecture cannot meet the requirements. Because different tasks run largely independently of each other, and a task with inherent parallelism can be decomposed into closely related subtasks, different cores can execute different tasks or subtasks, so a multi-core architecture can execute multiple instructions in one cycle. Compared with processing the same work serially on a single-core processor, this parallelism greatly improves overall system performance. In addition, a multi-core design can reuse existing single-core processors as its cores, which shortens the design and verification cycle and saves development cost, in line with the basic idea of SoC design. Multi-core architecture is a trend in the future development of SoC.

The design adopts a dual-core architecture and uses the popular ARM processors ARM7TDMI-S and ARM926EJ-S with good processing power. The biggest advantage of the ARM core is its high speed and low power consumption.

The ARM7TDMI-S has a 3-stage pipeline and supports Windows CE, Linux and other operating systems. The ARM926EJ-S is the most powerful ARM9 processor launched by ARM in 2000. It implements a 5-stage pipeline and has a dual AHB bus structure as its external interface, i.e. an instruction AHB bus and a data AHB bus. In this design, the ARM7TDMI-S is mainly responsible for control, the operating system platform and task scheduling, while the ARM926EJ-S is mainly responsible for executing the various tasks.

1.2 Bus standards used

Since a large number of IP cores are integrated in SoC, the key to the design is how to achieve the interconnection between the IP modules. At present, the interconnection of IP cores in SoC generally adopts a bus structure and communicates through messages.

ARM's AMBA buses, AHB and APB, are used as the on-chip buses. AMBA is an open standard for SoC interconnect design. As more and more companies have adopted it, it has quickly become the standard for SoC architecture and IP library development.

In the specific implementation, a two-level bus structure of AHB plus APB is adopted. AHB is used to support high-speed devices and multiple master-slave devices. The priority between multiple master devices is guaranteed through an arbitration mechanism. The slave device is selected through an address decoding mechanism and responds to the bus transaction initiated by the master device. APB is used to support low-speed devices based on register access. The AHB and APB buses are connected together through a bus bridge to realize the protocol conversion between the two buses. Figure 1 is a system structure block diagram of SoC.

Figure 1 SoC system structure block diagram

1.3 Functions of each IP in the system

In addition to the two processors, the functions of each IP core in the SoC are as follows:

Translation module: Translates X86 instructions into ARM instructions.

SMI: A bridge between external memory and the microprocessor. It supports ROM as the system's non-volatile storage and off-chip SRAM as the system's external high-speed storage.

Interrupt controller: used to support internal and external interrupt control of the system, such as interrupt level/edge trigger, interrupt level polarity and interrupt enable, etc.

Internal Memory: On-chip SRAM, the size is 1 KB, but its size can be changed by modifying the Verilog description.

Default Slave: Used to give a response signal when the master accesses an undefined address space.

Retry Slave: An example slave that can generate retry responses and wait states. It can serve as a template if a similar module is needed.

Watchdog: A monitoring module that ensures system security. The software must access the corresponding registers within a predetermined time; otherwise the hardware generates an internal signal that resets the system automatically.

GPIO controller: used to support extended peripherals and expand the scope of use of SoC.

Remap&Pause: Consists of two units. Remap controls whether the address map is remapped, and Pause manages the system's power-saving mode.

Timer: Supports capture, match output, and an external clock source.

2 X86 to ARM binary translation module

The translation module in this design is written in Verilog HDL. It translates a subset of X86 instructions into ARM instructions, which allows some X86 applications to be migrated to the ARM architecture. Figure 2 shows the internal structure of the decoder.

Figure 2 Decoder internal structure diagram

The translation module first fetches the X86 instructions from ROM, translates them into ARM instructions, and stores them in RAM. After all the instructions have been translated, the translation module generates an interrupt so that the processor starts executing the instructions in RAM. In other words, all instructions are translated before the processor executes any of them, which makes this a static binary translation. The Decoder is the core of the translation module and is responsible for translating instructions. It is implemented as a finite state machine controlling a data path. The states are classified by instruction function and addressing mode, and each state outputs the corresponding ARM instructions. For example, arithmetic instructions with register addressing can be grouped into one category:

ADD EAX, EBX

SUB EAX, EBX

Because these instructions share the same addressing mode and have similar functions, differing only in opcode, they can be merged into one state, and a mapping established within that state translates them into ARM instructions.
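The idea of one state handling a whole class of instructions can be sketched in software. The following is only an illustrative Python model, not the Verilog used in the design: the register assignment in X86_TO_ARM_REG, the choice of the flag-setting ARM forms ADDS and SUBS, and the function name translate_reg_reg are all assumptions made for this example.

```python
# Hypothetical register assignment (not from the article): which ARM
# register stands in for each x86 general-purpose register.
X86_TO_ARM_REG = {"EAX": "R0", "EBX": "R1", "ECX": "R2", "EDX": "R3"}

# One "state" covers every register-addressed arithmetic instruction;
# only the opcode lookup differs between ADD and SUB.
# The S suffix updates the ARM flags, because x86 ADD/SUB update EFLAGS.
# Carry semantics for subtraction differ between the two ISAs and are
# not handled in this sketch.
X86_TO_ARM_OP = {"ADD": "ADDS", "SUB": "SUBS"}


def translate_reg_reg(x86_line: str) -> str:
    """Translate e.g. 'ADD EAX, EBX' into an ARM instruction string."""
    mnemonic, operands = x86_line.strip().split(None, 1)
    dst, src = [op.strip().upper() for op in operands.split(",")]
    arm_op = X86_TO_ARM_OP[mnemonic.upper()]
    rd, rm = X86_TO_ARM_REG[dst], X86_TO_ARM_REG[src]
    # x86 two-operand form (dst = dst op src) becomes the ARM
    # three-operand form (Rd = Rd op Rm).
    return f"{arm_op} {rd}, {rd}, {rm}"


if __name__ == "__main__":
    for line in ("ADD EAX, EBX", "SUB EAX, EBX"):
        print(line, "->", translate_reg_reg(line))
```

Adding another instruction of the same class would only require a new entry in the opcode table, which mirrors why such instructions can share one state.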

Because the AHB bus may be busy, two FIFOs are provided, one for X86 instructions and one for translated ARM instructions. FIFO1 and FIFO2 each contain two memories, one storing instructions and the other storing the address that corresponds to each instruction. Every FIFO operation acts on the instruction and its address together, so that the correspondence between them is preserved.
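A minimal behavioural sketch of such a FIFO is given below, assuming a fixed depth and a producer that stalls when the FIFO is full. The class name PairedFifo, its methods and the depth are hypothetical and chosen only for illustration; the article does not give these details.

```python
from collections import deque


class PairedFifo:
    """FIFO with separate instruction and address memories kept in lockstep."""

    def __init__(self, depth: int):
        self.depth = depth      # assumed fixed capacity
        self._instr = deque()   # instruction memory
        self._addr = deque()    # address memory

    def full(self) -> bool:
        return len(self._instr) >= self.depth

    def empty(self) -> bool:
        return not self._instr

    def push(self, instr: int, addr: int) -> bool:
        """Write instruction and address together; False means 'stall'."""
        if self.full():
            return False
        self._instr.append(instr)
        self._addr.append(addr)
        return True

    def pop(self):
        """Read instruction and address together, or None when empty."""
        if self.empty():
            return None
        return self._instr.popleft(), self._addr.popleft()
```

Because both memories are only ever written and read in the same call, an instruction can never be separated from its address, even when the bus side is stalled for several cycles.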

In addition, the ARM core controls the instruction decoder by setting registers in the Communicate module. The control sequence is:

Set the starting address of the X86 instructions; set the ending address of the X86 instructions; set the initial storage address for the ARM instructions; set the initial address of the complex-instruction segment of the ARM instructions; set the flag register to start the instruction decoder (a high level means it is working); when instruction decoding has finished, the translation module sends an interrupt to the ARM core; on receiving the interrupt, the ARM core sets the flag register low, and the translation module ends this round of work.

The SoC in this article does not use DMA to move X86 and ARM instructions; instead, the translation module reads and writes them itself. It therefore has two Master bus interfaces: it reads X86 instructions through the AHB_1_1 interface and writes ARM instructions into RAM through the AHB_2_1 interface. The Communicate module connects to the bus through a Slave port, which receives the 4 addresses sent by the ARM core. Once these 4 addresses have been received, the start_flag signal in the translation module is set high, indicating that work has started.
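The handshake between the ARM core and the Communicate module can be modelled as a small state holder. This is a sketch under stated assumptions: only start_flag is a name from the text, while the register names, the irq attribute and the method names are hypothetical, and the sketch assumes the registers are cleared after each run.

```python
class CommunicateModel:
    """Behavioural sketch of the Communicate module's control handshake."""

    # Hypothetical names for the 4 address registers described in the text.
    REQUIRED = ("x86_start", "x86_end", "arm_base", "arm_complex_base")

    def __init__(self):
        self.regs = {}
        self.start_flag = 0   # high while the translation module is working
        self.irq = 0          # interrupt line to the ARM core (assumed name)

    def write(self, name: str, value: int) -> None:
        """Slave-port write from the ARM core."""
        if name not in self.REQUIRED:
            raise KeyError(name)
        self.regs[name] = value
        # Once all 4 addresses have been received, work starts.
        if all(r in self.regs for r in self.REQUIRED):
            self.start_flag = 1

    def translation_finished(self) -> None:
        """Called by the decoder model when the last instruction is done."""
        if self.start_flag:
            self.irq = 1

    def acknowledge(self) -> None:
        """ARM core services the interrupt and sets the flag low."""
        self.irq = 0
        self.start_flag = 0
        self.regs.clear()   # assumption: addresses are rewritten per run
```

A typical run writes the four addresses, waits for irq to go high, and then calls acknowledge, after which the model is ready for the next block of X86 code.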

3 On-chip bus structure

In the ARM SoC architecture, there are two important concepts: Master and Slave. A Master is an active unit that can issue requests on the bus and initiate transfers, such as memory reads and writes. A typical Master is a CPU, DSP, or DMA controller. A Slave is a passive unit, typically on-chip or off-chip memory, and each Slave has its own unique address range. When a Master initiates a read or write, it supplies the address of the operation, the address decoder uses this address to determine which Slave the Master has selected, and the selected Slave then responds.
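Address decoding of this kind can be sketched as a range lookup. The address map below is entirely hypothetical, since the article does not give one; only the 1 KB size of the internal memory and the role of the Default Slave for undefined addresses come from the text.

```python
# Hypothetical address map: (base address, size in bytes, slave name).
# The bases and most sizes are assumptions made for this example.
ADDRESS_MAP = [
    (0x0000_0000, 0x0001_0000, "ROM"),
    (0x1000_0000, 0x0000_0400, "InternalMemory"),     # 1 KB on-chip SRAM
    (0x2000_0000, 0x0000_1000, "TranslationModule"),
]


def select_slave(addr: int) -> str:
    """Return the slave whose address range contains addr."""
    for base, size, name in ADDRESS_MAP:
        if base <= addr < base + size:
            return name
    # Undefined address space is answered by the Default Slave.
    return "DefaultSlave"
```

For instance, with this map an access to the last byte of the 1 KB SRAM selects InternalMemory, while the next address falls through to the Default Slave.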

In an AHB system, if two Masters frequently need to access the bus, system performance inevitably declines. To solve this problem, ARM proposed Multi-layer AHB, whose basic idea is that the two Masters use different buses to access the Slaves. If they access different Slaves, the two Masters can perform their transfers simultaneously. If they access the same Slave at the same time, priority determines whose transfer is handled first.

The bus structure uses the Multi-layer bus switch (BusMatrix) module. The design of the AHB BusMatrix can be divided into three parts: input stage, decoding stage and output stage. Figure 3 shows the structure used in this design, in which the number of inputs and outputs can be adjusted flexibly to match the Masters and Slaves in the system.

Figure 3 BusMatrix structure used in the design

Each layer has a decoder that determines which Slave the Master wants to access, and the transfer between Master and Slave passes through a multiplexer. Each Slave port has its own arbiter, which uses fixed priority: the layer with the highest priority accesses the Slave first.
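The per-slave fixed-priority arbitration can be illustrated with a one-cycle model. This is a sketch only: the function name arbitrate, the master and slave names, and the priority order used in the example are assumptions, since the article does not state which layer has the higher priority.

```python
def arbitrate(requests, priority):
    """Grant each requested slave to its highest-priority requesting master.

    requests: dict mapping master name -> slave name requested this cycle
              (None or a missing key means no request).
    priority: list of master names, highest priority first.
    Returns a dict mapping slave name -> granted master name.
    """
    grants = {}
    for master in priority:                 # visit masters by priority
        slave = requests.get(master)
        if slave is not None and slave not in grants:
            grants[slave] = master          # first (highest) requester wins
    return grants


if __name__ == "__main__":
    order = ["ARM7", "ARM9"]                # assumed priority order
    # Different slaves: both transfers proceed in the same cycle.
    print(arbitrate({"ARM7": "ROM", "ARM9": "RAM"}, order))
    # Same slave: only the higher-priority master is granted.
    print(arbitrate({"ARM7": "RAM", "ARM9": "RAM"}, order))
```

The two cases in the usage example correspond to the two situations described for Multi-layer AHB: parallel transfers when the targets differ, and priority-based selection when they collide.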

As the number of Masters and Slaves in the system grows, the complexity of the BusMatrix module increases significantly. If the number of input and output ports were set to match every Master and Slave in the system, the BusMatrix would be very complex, so optimizing the system structure is necessary. From the way the system works, the Slave port of the translation module is accessed only by the ARM7 core, which writes to it the addresses that control the translation module's work. This Slave can therefore be treated as private to the ARM7 core, with no access from other Masters. Other Slaves are accessed only under special circumstances, so several of them can be grouped and attached to the BusMatrix as a single Slave. The optimized SoC hardware architecture is shown in Figure 4.

Figure 4 Optimized SoC hardware architecture

4 Conclusion

This article presents an SoC with X86-to-ARM binary translation and execution functions. The Multi-layer bus switch (BusMatrix) module is used to implement the Multi-layer bus structure. When the cores do not access the same Slave, they can carry out their own work at the same time, which effectively improves system performance, and the bus structure is highly scalable. In addition, the bus structure is optimized according to the way the system works, which reduces the complexity of the bus.
