In application fields such as telecommunications, electric power, and national defense, equipment is often required to deliver extremely high real-time performance. When large volumes of information must be exchanged between devices, traditional network packet switching can no longer meet the real-time requirements. With the CPCI bus, two devices can access each other's memory. This approach offers fast transmission, large transfer capacity, and high reliability, and is well suited to large-volume information transfer. The CDMA2000 system integration for "China's Third Generation Mobile Communications System", a National 863 Program project undertaken by the National Digital Switching System Engineering Technology Research Center, chose a multi-SBC platform based on the CPCI bus. The communication efficiency between the SBCs directly determines the performance of the entire system.
Commonly used real-time operating systems such as VxWorks and Lynx all implement message queues over the CPCI bus that can be used for message communication between SBCs. However, message passing in VxWorks and Lynx is inflexible. Generally, a shared memory region is set up on one specific SBC (usually the system board), and the other SBCs (usually non-system boards) exchange information by reading and writing that shared memory. Every exchange between two non-system SBCs therefore requires both a PCI write and a PCI read, which is inefficient. In addition, the message length in VxWorks and Lynx has a maximum value, so when a large amount of data (such as a 1GB in-memory database) must be transferred, the operating system's message passing mechanism cannot be used. These problems can be solved by direct memory access between any two SBCs. This article first introduces the working principle of the PCI Bridge. Then, taking the CPX8000 series industrial computers from Motorola as an example, it discusses how two SBCs use the CPCI bus on the backplane and the address mapping mechanism of the PCI Bridge to access each other's memory and so achieve dual-machine communication. Finally, it introduces the performance optimization issues that deserve attention in practical applications.
1 Working principle of PCI Bridge
In a simple computer system with few external devices, a single-level bus structure can meet the needs of the system. However, electrical limits restrict the number of PCI devices that a single PCI bus can support, so for computer systems with a large number of peripherals a single-level bus structure is no longer sufficient, and bridge devices were introduced. A new PCI bus can be added through a PCI-to-PCI Bridge, and an ISA bus can be added through a PCI-to-ISA Bridge. Special PCI devices such as the PCI Bridge join the buses at every level of the system together, making the entire system an organic whole.
Each PCI device has its own PCI I/O space, PCI memory space, and PCI configuration space. After the device driver of a PCI device has initialized the PCI configuration space, any intelligent controller, such as the CPU or a DMA controller, can access the PCI I/O space and PCI memory space of that device. In Figure 1, if the CPU wants to access the network card, it first generates a physical address on PCI Bus0. After this address has been filtered and converted by the PCI-to-PCI Bridge, a PCI bus address is generated on PCI Bus1. The network card decodes this address and responds to the access.
Figure 1 PCI-based system
As can be understood from this process, PCI-to-PCI Bridge has two basic functions:
(1) Address mapping function. Although they both access the network card, the meanings of the addresses on PCI Bus0 and PCI Bus1 are different. The two addresses belong to their own address spaces, and the mapping of the two addresses is achieved through the PCI-to-PCI Bridge. Depending on whether the two addresses are the same, PCI-to-PCI Bridge can be divided into two types:
· PCI-to-PCI Transparent Bridge. The PCI Bridge does not translate the address on PCI Bus0 and maps it directly to PCI Bus1. The addresses on PCI Bus0 and PCI Bus1 are the same.
· PCI-to-PCI Non-Transparent Bridge. The address on PCI Bus0 must be converted by the PCI Bridge before it can be mapped to PCI Bus1. The addresses on PCI Bus0 and PCI Bus1 are different.
(2) Address filtering function. The PCI Bridge is selective when passing addresses on PCI Bus0 to a downstream bus (ISA Bus, PCI Bus1). In Figure 1, the PCI-to-PCI Bridge accepts only those addresses generated by the CPU on PCI Bus0 that access the SCSI and Ethernet devices; it does not respond to any other address on PCI Bus0. The address range that a PCI Bridge responds to can be thought of as the address window of that PCI Bridge. Only when an address on the upstream bus falls within the address window does the PCI Bridge respond to it and pass it to the downstream bus.
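The two bridge functions can be sketched as a small model. This is only an illustration of the idea, not a driver interface: the class `BridgeWindow`, its field names, and all addresses below are hypothetical, and the mapping inside the window is assumed to be a simple linear offset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BridgeWindow:
    """Toy model of one PCI bridge address window (not a real driver API)."""
    upstream_base: int    # window start as seen on the upstream bus
    size: int             # window length in bytes
    downstream_base: int  # window start as seen on the downstream bus

    def forward(self, upstream_addr: int) -> Optional[int]:
        """Return the downstream address, or None if the bridge ignores the access."""
        offset = upstream_addr - self.upstream_base
        if not 0 <= offset < self.size:
            return None  # address filtering: outside the window, no response
        return self.downstream_base + offset  # address mapping


# Hypothetical addresses, chosen only for illustration.
transparent = BridgeWindow(0x8000_0000, 0x10_0000, 0x8000_0000)
non_transparent = BridgeWindow(0x8000_0000, 0x10_0000, 0x0400_0000)

print(hex(transparent.forward(0x8000_0010)))      # same address on both buses
print(hex(non_transparent.forward(0x8000_0010)))  # translated address
print(transparent.forward(0x9000_0000))           # None: outside the window
```

A transparent bridge is simply the case where the upstream and downstream bases are equal, so the same model covers both bridge types.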
2 Specific implementation of dual-machine communication
This section takes the CPX8000 series industrial computers from Motorola as an example to show how two machines can communicate over the CPCI bus. As shown in Figure 2, the two SBCs are physically connected through the CPCI bus on the backplane. If the two SBCs can access each other's memory, data can be exchanged between them. The case of the System Processor Board (also called the motherboard) accessing the memory of the Non-System Processor Board (also called the daughter board) is used here to describe the implementation of dual-machine communication. This solution has been implemented on the Lynx and VxWorks real-time operating systems.
In Figure 2, if the motherboard CPU wants to access a 1MB block of memory on the daughter board, that memory must be mapped into the virtual address space of the motherboard CPU. This is achieved by configuring the motherboard, the daughter board, and the interface between the motherboard and the daughter board. The 1MB block can be mapped into different address spaces (such as the CPU virtual address space, the physical address space, the Local PCI address space, and the System CPCI address space), and its mapped address differs in each. In Figure 2, the mapped addresses of the first unit of this 1MB block in the different address spaces are denoted by the symbols A1, A2, ..., A7.
Figure 2 Data communication schematic
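2.1 Configuration of the daughter board
(1) Call the kernel memory allocation function to request 1MB of kernel virtual address space, and obtain the start address A7 of the allocated space.
(2) From the operating system's memory mapping, obtain the physical address A6 to which virtual address A7 is mapped.
(3) The Raven ASIC is a Host-to-PCI Bridge. Because the Processor Bus is not a standard bus, Raven converts it to a PCI bus so that PCI devices of all kinds can be attached. The CPU and Raven together form a chipset and are used as a pair. From the Raven settings, obtain the address A5 to which physical address A6 is mapped on the Local PCI Bus.
(4) The 21554 is a PCI-to-PCI Non-Transparent Bridge that can pass data in both directions. Through its two internal configuration registers, set the size of its address window to 1MB, and set the start address of the window on the Local PCI Bus side to A5.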
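2.2 Configuration of the motherboard
(1) Request 1MB of kernel virtual address space and obtain its start address A1.
(2) From the operating system's memory mapping, obtain the physical address A2 to which virtual address A1 is mapped.
(3) From the Raven settings, obtain the address A3 to which physical address A2 is mapped on the Local PCI Bus.
(4) The 21154 is a PCI-to-PCI Transparent Bridge, and it also allows data access in both directions. Set its two internal configuration registers so that the size of its address window is 1MB and the start address of the window is A3. Because the 21154 is transparent, address A3 has the same value as its mapped address A4 on the System CPCI Bus side.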
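2.3 Interface configuration between the motherboard and the daughter board
The daughter board is configured from the motherboard side: set the configuration registers of the 21554 so that the start address of its address window on the System CPCI Bus side is A4. Because the start address of the window on the Local PCI Bus side has already been set to A5, address A4 is thereby mapped to address A5. Because the 21554 is non-transparent, the address spaces of the motherboard and the daughter board are isolated from each other, can each be allocated independently, and are joined at the System CPCI Bus level. From the point of view of the motherboard CPU, the whole daughter board, like the motherboard's network card, is a peripheral attached under the motherboard's Local PCI Bus, and it is accessed in exactly the same way as the network card.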
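2.4 Address translation flow
Once all the configuration is complete, the motherboard CPU only needs to read or write address A1 to access the first unit of the daughter board's 1MB memory; any other unit in the 1MB memory is reached by adding the corresponding offset to A1. The address translation flow below shows how the address is mapped level by level until it finally hits the specified unit.
The motherboard CPU issues virtual memory access address A1 → motherboard physical address A2 → motherboard Local PCI Bus address A3 → System CPCI Bus address A4 → daughter board Local PCI Bus address A5 → daughter board physical address A6 → after decoding by the Falcon Memory Controller, the first unit of the allocated 1MB memory is selected.
As the above shows, the key to mutual memory access between two machines is correct address mapping. When multiple SBCs must access one another, the address mapping becomes more complex, and the operating system's address space allocation, the PCI-to-PCI Bridge settings of each SBC, and the allocation of the System CPCI Bus address space must all be considered together.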
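The translation flow from A1 to A6 can be traced with a short sketch. All base addresses below are invented for illustration (the article gives no concrete values), and each stage is assumed to map the 1 MB window linearly; the stage names and the function `translate` are likewise hypothetical.

```python
WINDOW = 0x10_0000  # 1 MB window, as in the text

# Hypothetical base addresses, chosen only to make the example concrete.
A1 = 0xC010_0000  # motherboard CPU virtual address
A2 = 0xA000_0000  # motherboard physical address
A3 = 0x8000_0000  # motherboard Local PCI Bus address
A4 = A3           # System CPCI Bus address (the 21154 is transparent)
A5 = 0x0400_0000  # daughter board Local PCI Bus address (the 21554 translates)
A6 = 0x0120_0000  # daughter board physical address

# (stage name, base on the source side, base on the destination side)
STAGES = [
    ("motherboard MMU", A1, A2),
    ("motherboard Raven", A2, A3),
    ("21154 (transparent)", A3, A4),
    ("21554 (non-transparent)", A4, A5),
    ("daughter board Raven", A5, A6),
]


def translate(cpu_virtual_addr):
    """Follow one access through every stage; return the address after each."""
    addr = cpu_virtual_addr
    trace = [addr]
    for name, src_base, dst_base in STAGES:
        offset = addr - src_base
        if not 0 <= offset < WINDOW:
            raise ValueError(f"{name}: {addr:#x} is outside the 1 MB window")
        addr = dst_base + offset
        trace.append(addr)
    return trace


# An access at offset 0x40 keeps that offset at every level of the chain.
print(" -> ".join(f"{a:#x}" for a in translate(A1 + 0x40)))
```

Keeping the stages in a table makes it easy to see which base must change when another SBC is added: only the entries for that board's bridge and its window on the System CPCI Bus.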
Figure 3 Timing diagram of read operations between two SBCs
Figure 4 Timing diagram of write operations between two SBCs
3 Performance Optimization
Figures 3 and 4 show data captured with a VMETRO bus analyzer. They are timing diagrams of a continuous 100-byte transfer during read and write accesses between two SBCs.
As can be seen from Figure 3, each 4-byte read operation takes 956.8+4×149.5+179.4=1734.2ns, which is equivalent to 1734.2ns/29.9ns=58 PCI clock cycles.
As can be seen from Figure 4, the first 4-byte write operation took 159.5ns, followed by two burst transfers, and then a further 4-byte write operation took 119.6ns. Together these four transfers take (159.5ns+2×29.9ns+119.6ns)/29.9ns≈11 PCI clock cycles, which is the write cost used in the comparison that follows.
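The figures read off Figures 3 and 4 can be checked with a few lines of arithmetic. This sketch only converts the durations quoted in the text into PCI clock cycles at 29.9 ns per clock; the helper name `to_cycles` is made up for illustration.

```python
PCI_CLOCK_NS = 29.9  # one PCI clock period, as used in the text


def to_cycles(duration_ns):
    """Convert a duration in nanoseconds to PCI clock cycles."""
    return duration_ns / PCI_CLOCK_NS


# Durations quoted in the text for Figure 3 (read) and Figure 4 (write).
read_ns = 956.8 + 4 * 149.5 + 179.4
write_ns = 159.5 + 2 * 29.9 + 119.6

read_cycles = to_cycles(read_ns)
write_cycles = to_cycles(write_ns)

print(f"read:  {read_ns:.1f} ns = {read_cycles:.1f} PCI clocks")
print(f"write: {write_ns:.1f} ns = {write_cycles:.1f} PCI clocks")
```

The read total divides evenly into 58 clocks, while the write sequence comes to a little over 11 clocks.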
Comparing the two access methods of reading and writing, it can be seen that the writing operation is much more efficient than the reading operation. This is mainly due to the following reasons:
(1) When a master device on PCI initiates an access to a target device, the completion times of read and write operations differ greatly. Current PCI devices generally have a posted-write buffer for memory writes. To perform a write (such as the motherboard writing to the daughter board in Figure 2), the master device only needs to copy the data in its write buffer to the posting buffer of the target device, and the operation is considered complete. For example, in Figure 2, as soon as the Raven on the motherboard has sent the data to the 21154, the write is considered complete, and the 21154 drives the rest of the transfer. A write can thus complete on the source bus (the motherboard's Local PCI Bus) before it completes on the destination bus (the daughter board's Processor Bus); it is effectively a register-to-register operation. A read, by contrast, must access the memory itself and incurs the logic delays of every PCI interface along the path. A read must complete on the destination bus before it can complete on the source bus, which is why read operations are inefficient.
(2) As can be seen from Figure 3 and Figure 4, PCI devices can perform burst transfers for write operations, but not for read operations. This is because burst operations are only possible when the previous transaction is a write transaction. Burst transfer eliminates the turnaround cycles of bus signals such as FRAME#, AD, C/BE#, IRDY#, TRDY#, and DEVSEL#, and achieves one data transfer per PCI clock cycle.
(3) Burst transfers cannot continue indefinitely. The number of consecutive burst transfers depends on the size of the posting buffer, the value of the Latency Timer, and how busy the bus is.
For these reasons, the following methods should be used when transferring data between two SBCs:
(1) The SBC that provides the data should write it directly into the memory of the SBC that consumes it, rather than placing the data in its own local memory for the consumer to fetch with PCI reads. That is, always use PCI writes.
(2) When data must be transferred among multiple SBCs, the value of the Latency Timer must be set sensibly so that each SBC gets fair use of the PCI bus.
Consider communication between two daughter boards. If the message passing mechanism provided by the operating system is used, the data provider must first write the data to the motherboard, and the data consumer then reads it from the motherboard. A 4-byte transfer takes 58+11=69 PCI clock cycles on average. With the method described in this article, the SBC that provides the data writes it directly into the memory of the SBC that consumes it, so a 4-byte transfer takes only 11 PCI clock cycles on average. The latter is 69/11≈6.3 times as fast as the former, a large improvement in transfer efficiency.
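The comparison between the two schemes can be written out as a small calculation. This is a rough sketch using the per-transfer cycle counts quoted in the text; the function names are hypothetical, and scaling the cost linearly with the number of 4-byte transfers is an assumption that ignores bus contention and burst effects.

```python
READ_CYCLES = 58     # PCI clocks per 4-byte read, from Figure 3
WRITE_CYCLES = 11    # PCI clocks per 4-byte write, as used in the text
PCI_CLOCK_NS = 29.9  # one PCI clock period


def relay_cycles(transfers):
    """Provider writes to the motherboard, then the consumer reads it back."""
    return transfers * (WRITE_CYCLES + READ_CYCLES)


def direct_cycles(transfers):
    """Provider writes straight into the consumer's memory."""
    return transfers * WRITE_CYCLES


speedup = relay_cycles(1) / direct_cycles(1)
print(f"relay:  {relay_cycles(1)} clocks per 4-byte transfer")
print(f"direct: {direct_cycles(1)} clocks per 4-byte transfer")
print(f"speedup: {speedup:.1f}x")

# Assumed linear scaling: time for 25 transfers (100 bytes) under each scheme.
print(f"100 bytes via relay:  {relay_cycles(25) * PCI_CLOCK_NS:.1f} ns")
print(f"100 bytes via direct: {direct_cycles(25) * PCI_CLOCK_NS:.1f} ns")
```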