As shown in Figure 1, the ARM7TDMI accesses every slave through the bus; the DMA controller also uses the bus to exchange data with peripherals; and the LCD controller module must read the video memory over the bus continuously to sustain real-time display. The other masters in the system occupy the bus as well whenever they are active.
The LCD controller module deserves special attention. A color display requires a large amount of data: for example, a 320×240, 16bpp TFT color screen needs 320×240×16/8 = 153,600 bytes (153.6 kByte) per frame. On-chip memory cannot hold this much data, so it must be fetched from external memory through the memory interface. Because the LCD controller needs a large volume of data and must display it in real time, it consumes a large share of the on-chip bus bandwidth and can even disturb the normal operation of the whole system. Yet in today's consumer electronics, support for color screens is almost indispensable.
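The frame-size arithmetic above can be checked with a short sketch. The function names are illustrative, and the 60 Hz refresh rate is an assumption made only to show how a per-frame size turns into a sustained bandwidth requirement; the article does not state a refresh rate.

```python
def frame_bytes(width: int, height: int, bpp: int) -> int:
    """Bytes needed to hold one frame at the given resolution and colour depth."""
    return width * height * bpp // 8


def refresh_bandwidth(width: int, height: int, bpp: int, refresh_hz: int) -> int:
    """Bytes per second the LCD controller must fetch to sustain refresh_hz."""
    return frame_bytes(width, height, bpp) * refresh_hz


# 320x240 at 16 bpp, as in the text.
print(frame_bytes(320, 240, 16))            # 153600 bytes = 153.6 kByte
# 60 Hz is an assumed refresh rate, not a figure from the article.
print(refresh_bandwidth(320, 240, 16, 60))  # 9216000 bytes/s
```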
This problem can be addressed by optimizing the bus switching (arbitration) algorithm, adding on-chip cache, or improving the bus architecture. Of these, optimizing the bus switching algorithm brings only a limited performance gain, and the design complexity and high license cost of a cache make it unsuitable in many cases. An AMBA bus with a double-layer architecture is therefore a good choice.
Dual-bus architecture AMBA and its implementation
With a single-layer bus, all masters and slaves are attached to the same AHB. A master that wants to access a slave must first request the bus. Once it has been granted ownership, it exchanges address, data and control signals with the slave through the MUXes in the bus interconnect, while all other masters must wait.
Double-layer AMBA bus structure
The double-layer AMBA bus architecture uses a more complex internal interconnect that lets two master/slave pairs communicate over AMBA at the same time, which greatly increases the bus bandwidth. Any master can still access a slave on either layer. In addition, the double-layer bus is transparent to the AHB masters and AHB slaves, so they require no modification.
Figure 2 shows the internal structure of the double-layer AMBA bus designed in this paper. The bus is configured to support 16 masters and 16 slaves, with 8 masters and 8 slaves on each layer.
The double-layer AMBA bus itself consists of three parts: the bus decoder, pre-arbiter and multiplexers (MUX) of Layer 1; the bus decoder, pre-arbiter and multiplexers of Layer 2; and the core arbiter of the entire bus. The first two parts are essentially identical, and the core arbiter is the heart of the double-layer architecture. The principle is as follows: the eight masters of each layer are first decoded and arbitrated within their own layer, the results are sent to the core arbiter, and the core arbiter then decides the state transitions and how each MUX routes the data and control flows.
Design of the internal components
With reference to Figure 2 and the AMBA protocol, the following introduces the components of this double-layer AMBA bus. Since the components of Layer 2 are similar in design and function to those of Layer 1, only Layer 1 is described.
* Layer 1 decoder
This decoder adopts a centralized address decoding mechanism, which is conducive to improving the portability of peripheral devices. The decoder receives the address signal sent by the Master currently occupying the bus, generates a chip select signal corresponding to each Slave, and sends it to the core arbiter. The chip select signal is generated by comparing with the base address of each slave.
It is worth noting that since each master can access any one of Slave0~Slave15, the decoder must be able to generate at least 16 chip select signals.
In addition, the decoder of each layer should provide a default chip select signal for a default slave. The default slave responds in one of two ways: it gives an OKAY response to IDLE or BUSY transfers and an ERROR response to NONSEQUENTIAL or SEQUENTIAL transfers.
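The decoder behaviour described above can be sketched as a small behavioural model. This is not the article's RTL: the memory map (sixteen 16 MB windows at multiples of 0x1000_0000) and the function names are assumptions chosen for illustration. Only the 16 chip selects, the default slave, and its OKAY/ERROR rule come from the text.

```python
# Hypothetical memory map: 16 slaves, each owning a 16 MB window that
# starts at n * 0x1000_0000. The article does not give the real map.
SLAVE_SIZE = 0x0100_0000
SLAVE_BASES = [n * 0x1000_0000 for n in range(16)]

# AHB HTRANS transfer-type encodings.
IDLE, BUSY, NONSEQ, SEQ = 0b00, 0b01, 0b10, 0b11


def decode(haddr: int) -> tuple[list[bool], bool]:
    """Return (16 chip selects, default-slave select) for an address."""
    selects = [base <= haddr < base + SLAVE_SIZE for base in SLAVE_BASES]
    # No window matched: the access is routed to the default slave.
    return selects, not any(selects)


def default_slave_response(htrans: int) -> str:
    """OKAY for IDLE/BUSY transfers, ERROR for NONSEQ/SEQ transfers."""
    return "OKAY" if htrans in (IDLE, BUSY) else "ERROR"


selects, use_default = decode(0x3000_0004)
print(selects.index(True), use_default)   # 3 False
print(decode(0x3100_0000)[1])             # True: address falls in a gap
print(default_slave_response(NONSEQ))     # ERROR
```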
* Layer 1 pre-arbiter
The pre-arbiter of Layer1 receives the bus request signal (HBusReq) issued by each master together with the signal indicating that a bus switch is required, applies a bus arbitration algorithm to determine which master may occupy the bus, and generates the control signal of M to S MUX1. Unlike in a single-layer AMBA, the HMaster_layer1 and BusHgrant_layer1 signals it generates are sent to the core arbiter rather than directly to each master. In addition, the current slave response that it receives comes from the core arbiter.
The pre-arbiter can use one of two bus switching algorithms: fixed priority or round-robin priority. The AMBA specification does not mandate an algorithm, so it can be chosen according to actual needs. This component uses fixed priority, with Master0 having the lowest priority and Master7 the highest.
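As a minimal sketch of the fixed-priority rule just described (Master7 highest, Master0 lowest), the following behavioural model picks the winning master from a request vector. The function name and the `None` return for "no request" are assumptions; a real AHB arbiter would typically grant a default master in that case.

```python
from typing import Optional, Sequence


def fixed_priority_grant(hbusreq: Sequence[bool]) -> Optional[int]:
    """Return the index of the highest-numbered requesting master.

    hbusreq[n] is True when Master n asserts its bus request.
    Returns None when nobody requests (a simplification of this sketch).
    """
    for master in range(len(hbusreq) - 1, -1, -1):
        if hbusreq[master]:
            return master
    return None


# Master2 and Master5 request together: Master5 wins.
requests = [False, False, True, False, False, True, False, False]
print(fixed_priority_grant(requests))  # 5
```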
* Layer 1 multiplexers
Layer1 contains four MUXes: M to S MUX1, M to S MUX2, S to M MUX1 and S to M MUX2. M to S MUX1 uses the output of the Layer1 pre-arbiter as its select signal, picks one of the 8 groups of master bus signals and outputs it to the core arbiter, to M to S MUX2 of Layer1 and to M to S MUX2 of Layer2. M to S MUX2 takes its control signal from the core arbiter; it selects one of the two groups of bus signals and sends it to the corresponding slave in Layer1. S to M MUX1 receives the select signal output by the core arbiter and picks one of the 8 groups of Layer1 bus response signals (Hready, Hresp, Hrdata) to send to the core arbiter, to S to M MUX2 of Layer1 and to S to M MUX2 of Layer2. S to M MUX2 outputs one group of bus response signals to all the masters of Layer1.
* Core arbiter
The main functions of the core arbiter are to derive the initial state from the chip select signals output by the decoders of the two layers; to decide when to switch state based on the slave response signals and the transfer status; and, according to its current state, to output control signals to the relevant MUXes, Hmaster and BusHgrant signals to the masters of each layer, and the corresponding slave response signals to the pre-arbiters of the two layers.
Because masters on different layers may try to access slaves on the same layer at the same time, the core arbiter also needs its own bus switching algorithm. Since at most two masters contend for the bus at the core arbiter, a simple round-robin priority algorithm is sufficient.
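With only two contenders, round-robin reduces to remembering which layer won last. The sketch below is a behavioural illustration under that reading; the class name and the choice that Layer 1 wins the very first tie are assumptions, not details from the article.

```python
from typing import Optional


class TwoWayRoundRobin:
    """Round-robin between the Layer 1 and Layer 2 masters."""

    def __init__(self) -> None:
        # Assumed reset value: pretend Layer 2 won last, so Layer 1 wins the first tie.
        self.last_winner = 2

    def grant(self, req_layer1: bool, req_layer2: bool) -> Optional[int]:
        """Return 1 or 2 for the winning layer, or None if nobody requests."""
        if req_layer1 and req_layer2:
            winner = 1 if self.last_winner == 2 else 2
        elif req_layer1:
            winner = 1
        elif req_layer2:
            winner = 2
        else:
            return None
        self.last_winner = winner
        return winner


arb = TwoWayRoundRobin()
# Three back-to-back conflicts alternate between the layers.
print([arb.grant(True, True) for _ in range(3)])  # [1, 2, 1]
```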
The main part of the core arbiter is a state machine, which consists of seven states:
IDLE: Enter this state after the system is reset to complete the initial assignment of some data;
M1S1M2S2: Layer1 Master communicates with Layer1 Slave, Layer2 Master communicates with Layer2 Slave, that is, the two-layer bus runs in parallel;
M1S2M2S1: Layer1 Master communicates with Layer2 Slave, Layer2 Master communicates with Layer1 Slave;
M1S1M2S1: Layer1 Master communicates with Layer1 Slave, and Layer2 Master is waiting for communication with Layer1 Slave;
M1S2M2S2: Layer1 Master communicates with Layer2 Slave, and Layer2 Master is waiting for communication with Layer2 Slave;
M2S1M1S1: Layer2 Master communicates with Layer1 Slave, and Layer1 Master is waiting for communication with Layer1 Slave;
M2S2M1S2: Layer2 Master communicates with Layer2 Slave, and Layer1 Master is waiting for communication with Layer2 Slave.
Transitions between these seven states are determined by the chip select signals from the decoders of the two layers, the control signals sent by the master currently occupying the bus, and the response signals of the slave communicating with that master. When a state transition involves the ARM master, its three-stage pipeline must be taken into account and appropriate wait cycles inserted.
In addition, the core arbiter contains a one-stage input latch that holds the address and control signals sent by the waiting master.
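The seven states can be summarized as a mapping from "which layer each granted master is targeting" to a state name. The sketch below covers only that mapping for the case where both layers have an active master; the function name, its arguments, and the `winner` parameter (the layer that wins when both masters target the same layer) are illustrative assumptions. Wait cycles, the input latch, and the transition timing are not modelled.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # after reset
    M1S1M2S2 = auto()  # both layers run in parallel
    M1S2M2S1 = auto()  # both masters cross to the other layer
    M1S1M2S1 = auto()  # M1 owns a Layer 1 slave, M2 waits for Layer 1
    M1S2M2S2 = auto()  # M1 owns a Layer 2 slave, M2 waits for Layer 2
    M2S1M1S1 = auto()  # M2 owns a Layer 1 slave, M1 waits for Layer 1
    M2S2M1S2 = auto()  # M2 owns a Layer 2 slave, M1 waits for Layer 2


def select_state(m1_target: int, m2_target: int, winner: int = 1) -> State:
    """Pick the state from the slave layer (1 or 2) each master addresses.

    winner is the layer whose master is served first on a conflict.
    """
    if m1_target != m2_target:  # no conflict: both transfers proceed
        return State.M1S1M2S2 if m1_target == 1 else State.M1S2M2S1
    if m1_target == 1:          # both want a Layer 1 slave
        return State.M1S1M2S1 if winner == 1 else State.M2S1M1S1
    return State.M1S2M2S2 if winner == 1 else State.M2S2M1S2


print(select_state(1, 2).name)            # M1S1M2S2
print(select_state(2, 2, winner=2).name)  # M2S2M1S2
```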
Design results and establishment of test platform
The implementation described above is written in Verilog at the RTL level, and Synopsys's VCS tool is used for functional simulation. To verify the design, the single-layer AMBA in the architecture of Figure 1 is replaced with a double-layer AMBA, and the LCDC Master and LCDC Slave are moved to the second layer. A simple MC Slave is also added to the second layer, with SRAM and SDRAM memory models attached to it externally. The SDRAM stores the display memory data of the LCDC Master, and the rest of the structure remains unchanged (as shown in Figure 3). A set of test programs written in ARM assembly language configures the system. Once the test program is running, three masters (ARM Master, DMA Master and LCDC Master) access the bus continuously.
The results show that the design is correct: the ARM Master can configure the slaves of Layer2; while the LCDC Master of Layer2 reads data from the MC Slave of its own layer, a master of Layer1 can access a slave of Layer1 at the same time; and the other masters of Layer1 can also request the Layer2 bus to access the external memory of Layer2.
In addition, to measure how much of the bus the LCD controller occupies, an Hmaster Monitor submodule is attached to the AHB to count the number of clock cycles for which each master occupies the bus.
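What such a monitor computes can be illustrated with a few lines of post-processing over a per-cycle record of the Hmaster value. This is a software sketch, not the article's submodule; the function names, the master numbering, and the trace values are invented for the example.

```python
from collections import Counter
from typing import Sequence


def count_cycles(hmaster_trace: Sequence[int]) -> Counter:
    """Count how many clock cycles each master number owned the bus."""
    return Counter(hmaster_trace)


def occupancy(hmaster_trace: Sequence[int], master: int) -> float:
    """Fraction of the recorded cycles owned by the given master."""
    if not hmaster_trace:
        return 0.0
    return count_cycles(hmaster_trace)[master] / len(hmaster_trace)


# Invented trace: one Hmaster sample per clock cycle.
trace = [0, 0, 2, 0, 1, 2, 2, 0]
print(sorted(count_cycles(trace).items()))  # [(0, 4), (1, 1), (2, 3)]
print(occupancy(trace, 2))                  # 0.375
```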
Comparison of the two bus modes
The single-layer and double-layer AMBA bus designs are compared in two respects.
The first is the bus occupancy of the LCD controller. As Table 1 shows, with a single-layer AMBA bus the LCD controller takes a relatively large share of the bus bandwidth: for a typical 320×240, 16bpp TFT color screen, it occupies 16.3% of the bus bandwidth. With a double-layer AMBA bus, apart from the bus cycles the ARM Master spends configuring the two slaves, the LCD controller occupies only the bandwidth of Layer2.
The second is the synthesis result, where the double-layer AMBA occupies a larger area. Including the APB module, the single-layer AMBA synthesizes to 17,000 gates, while the double-layer AMBA synthesizes to 18,500 gates. Both support 16 masters and 16 slaves. The gate-level netlists were synthesized with Synopsys's Design Compiler using the TSMC 0.25 µm process standard cell library.
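The relative area cost follows directly from the two gate counts quoted above. The helper name below is illustrative.

```python
def area_overhead(single_gates: int, double_gates: int) -> float:
    """Relative area increase of the double-layer bus over the single-layer bus."""
    return (double_gates - single_gates) / single_gates


# Gate counts from the synthesis results in the text.
print(f"{area_overhead(17_000, 18_500):.1%}")  # 8.8%
```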
In a practical application of the double-layer AMBA bus, the MC Slave of Layer1 can be connected to non-volatile memory and the MC Slave of Layer2 to volatile memory. The instruction area can then be placed on Layer 1 and the data area on Layer 2, so that the ARM Master fetches instructions on Layer 1 while the LCD controller reads video memory data on Layer 2. These two operations account for a large share of the bus traffic, so separating them greatly reduces the time each master spends waiting for the bus and improves the effective bus bandwidth.
Conclusion
ARM7TDMI has been widely used in the design of SoC chips, but because it does not have its own cache, it needs to access external memory frequently. If other modules that require a large data bandwidth are integrated on the chip at this time, the performance of the system will be greatly reduced. The double-layer AMBA bus can greatly improve the bus bandwidth and provide a more flexible system architecture under the condition of slightly increasing the occupied area. This has very important significance and practical value for SoC chips based on ARM7TDMI and other SoC chips with similar architectures.