Implementation of dual-machine communication on CPCI bus

Publisher: caoda143   Latest update time: 2006-05-18   Source: Application of Electronic Technique (电子技术应用)   Keywords: bus, pci, map, memory

  In application fields such as telecommunications, electric power, and national defense, equipment is often required to have extremely high real-time performance. When large volumes of information must be exchanged between devices, the traditional network packet-switching approach can no longer meet these real-time requirements well. With the CPCI bus, two devices can access each other's memory directly, which offers fast transfers, large transfer capacity, and high reliability, and is therefore well suited to moving large volumes of data. The CDMA2000 system integration for "China's Third Generation Mobile Communications System", a National 863 Program project undertaken by the National Digital Switching System Engineering Technology Research Center, chose a multi-SBC platform based on the CPCI bus; the communication efficiency between the SBCs directly determines the performance of the whole system.

  Commonly used real-time operating systems such as VxWorks and Lynx provide message queues over the CPCI bus that can be used for message communication between SBCs. However, their message passing is quite inflexible. Typically, a region of shared memory is set up on one particular SBC (usually the system board), and the other SBCs (usually non-system boards) exchange information by reading and writing that shared memory. As a result, every exchange between two non-system SBCs requires both a PCI write and a PCI read, which is inefficient. In addition, VxWorks and Lynx impose a maximum message length, so when a large amount of data (such as a 1 GB in-memory database) must be transferred, the operating system's message passing mechanism cannot be used. These problems can be solved by letting any two SBCs access each other's memory directly. This article first introduces the working principle of the PCI Bridge; then, taking the CPX8000 series industrial computers from Motorola as an example, it discusses how two SBCs connected by the CPCI bus on the backplane can use the address mapping mechanism of the PCI Bridge to access each other's memory and thereby communicate; finally, it covers performance optimization issues that deserve attention in practical applications.

  1 Working principle of PCI Bridge

  In a simple computer system with few peripherals, a single-level bus structure is sufficient. However, a single PCI bus can electrically support only a limited number of devices, so a system with many peripherals needs more than one bus, and bridge devices were introduced for this purpose. A PCI-to-PCI Bridge extends the system with a new PCI bus, and a PCI-to-ISA Bridge adds an ISA bus. Special PCI devices such as these bridges glue the buses at every level together, so that the whole system works as one coherent unit.

  Each PCI device has its own PCI I/O space, PCI memory space, and PCI configuration space. Once the device's driver has initialized its PCI configuration space, intelligent controllers such as the CPU and DMA controllers can access the device's PCI I/O space and PCI memory space. In Figure 1, when the CPU accesses the network card, it first generates an address on PCI Bus0. The PCI-to-PCI Bridge filters and converts this address and generates a corresponding address on PCI Bus1. The network card decodes that address and responds to the access.

                       Figure 1 PCI-based system   

  As this process shows, a PCI-to-PCI Bridge has two basic functions:

  (1) Address mapping. Although both refer to the network card, the addresses on PCI Bus0 and PCI Bus1 have different meanings: each belongs to its own address space, and the PCI-to-PCI Bridge maps one to the other. Depending on whether the two addresses are the same, PCI-to-PCI Bridges fall into two types:

  ·PCI-to-PCI Transparent Bridge. The bridge does not translate the address on PCI Bus0 but passes it directly to PCI Bus1, so the addresses on PCI Bus0 and PCI Bus1 are the same.

  ·PCI-to-PCI Non Transparent Bridge. The address on PCI Bus0 must be translated by the bridge before it is mapped onto PCI Bus1, so the addresses on PCI Bus0 and PCI Bus1 are different.

  (2) Address filtering. A PCI Bridge is selective when passing addresses from PCI Bus0 to a downstream bus (ISA Bus, PCI Bus1). In Figure 1, of the addresses the CPU generates on PCI Bus0, the PCI-to-PCI Bridge claims only those that access the SCSI controller and the Ethernet card; it does not respond to any other address on PCI Bus0. The address range to which a PCI Bridge responds can be called its address window: only when an address on the upstream bus falls inside this window does the bridge respond and pass the address on to the downstream bus.
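  The two functions can be summarized in a small model. This is a sketch, assuming a bridge window is fully described by a base, a size and (for a non-transparent bridge) a translated base; `bridge_window_t`, `window_claims` and `window_translate` are hypothetical names and do not reproduce the register layout of any real bridge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one bridge memory window (not a real register layout). */
typedef struct {
    uint32_t base;        /* window start on the upstream bus */
    uint32_t size;        /* window length in bytes */
    uint32_t translated;  /* window start on the downstream bus (non-transparent only) */
    bool     transparent; /* true: downstream address equals upstream address */
} bridge_window_t;

/* Address filtering: does the bridge claim this upstream address? */
static bool window_claims(const bridge_window_t *w, uint32_t addr)
{
    /* If addr < base the unsigned subtraction wraps to a large value,
       so one comparison checks both ends of the window. */
    return addr - w->base < w->size;
}

/* Address mapping: downstream address for an address the bridge has claimed. */
static uint32_t window_translate(const bridge_window_t *w, uint32_t addr)
{
    assert(window_claims(w, addr));
    if (w->transparent)
        return addr;                          /* same value on both buses */
    return w->translated + (addr - w->base);  /* new base, same offset */
}
```

  With a 1 MB window at 0x80000000 translated to 0x00200000, address 0x80000010 passes the filter and appears as 0x00200010 on the downstream bus; a transparent window would pass it through unchanged as 0x80000010.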

  2 Specific implementation of dual-machine communication

  
  This section takes the CPX8000 series industrial computers from Motorola as an example to show how two machines can communicate over the CPCI bus. As shown in Figure 2, the two SBCs are physically connected through the CPCI bus on the backplane; if each SBC can access the other's memory, data can be exchanged between them. The implementation is described for the case of the System Processor Board (also called the motherboard) accessing the memory of a non-system processor board (also called the daughter board). This solution has been implemented on the Lynx and VxWorks real-time operating systems.

  In Figure 2, if the motherboard CPU is to access a 1 MB memory block on the daughter board, this memory must be mapped into the motherboard CPU's virtual address space. This is achieved by configuring the motherboard, the daughter board, and the interface between them. The 1 MB block can be mapped into different address spaces (CPU virtual address space, physical address space, local PCI address space, system CPCI address space, and so on), and its mapped address differs in each. In Figure 2, the addresses of the block's first unit in the different address spaces are denoted A1, A2, ..., A7.

                    Figure 2 Data communication schematic

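  2.1 Configuration of the daughter board

  (1) Call the kernel memory allocation function to request 1 MB of kernel virtual address space, obtaining the start address A7 of the allocated space.

  (2) From the operating system's memory mapping, obtain the physical address A6 to which virtual address A7 is mapped.

  (3) The Raven ASIC is a Host-to-PCI Bridge. Because the Processor Bus is not a standard bus, the Raven converts it into a PCI bus to which the various PCI devices can be attached; the CPU and the Raven together form a chipset and are used as a pair. From the Raven's settings, obtain the address A5 to which physical address A6 is mapped on the Local PCI Bus.

  (4) The 21554 is a PCI-to-PCI Non Transparent Bridge that can transfer data in both directions. Using two of its internal configuration registers, set the size of its address window to 1 MB and the window's start address on the Local PCI Bus side to A5.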
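  Step (4) only works if A5 is a legal start address for a 1 MB window. The steps above can be sketched under several assumptions: the kernel maps virtual to physical addresses with a constant offset, the Raven maps local memory to Local PCI addresses with another constant offset, the 1 MB buffer is physically contiguous, and, as for standard PCI memory BARs, the window size is a power of two and its base is aligned to that size. Both offsets and all function names are hypothetical, and the 21554's actual setup-register bit layout is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_SIZE 0x00100000u   /* 1 MB, as in the article */

/* Hypothetical linear offsets; real values come from the OS memory map and
   the Raven's PCI slave decoder settings on the board in question. */
#define KERNEL_VIRT_OFFSET 0xC0000000u  /* assumed: virt = phys + offset */
#define RAVEN_PCI_OFFSET   0x80000000u  /* assumed: pci  = phys + offset */

static bool is_pow2(uint32_t x) { return x != 0 && (x & (x - 1u)) == 0; }

/* PCI-style window: power-of-two size, base aligned to the size. */
static uint32_t window_mask(uint32_t size)
{
    assert(is_pow2(size));
    return ~(size - 1u);            /* 1 MB -> 0xFFF00000 */
}

static bool window_base_ok(uint32_t base, uint32_t size)
{
    return (base & ~window_mask(size)) == 0;
}

/* Steps (1)-(4): from the kernel virtual address A7 to the 21554's
   Local PCI Bus side window base A5. */
static uint32_t daughter_local_pci_base(uint32_t a7)
{
    uint32_t a6 = a7 - KERNEL_VIRT_OFFSET;    /* step (2): virtual -> physical */
    uint32_t a5 = a6 + RAVEN_PCI_OFFSET;      /* step (3): physical -> Local PCI */
    assert(window_base_ok(a5, WINDOW_SIZE));  /* step (4): needs 1 MB alignment */
    return a5;
}
```

  For example, `daughter_local_pci_base(0xC1F00000)` yields 0x81F00000, which is 1 MB aligned; a buffer that was only page aligned would trip the assertion, which is why the alignment of the allocation in step (1) matters under these assumptions.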

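  2.2 Configuration of the motherboard

  (1) Request 1 MB of kernel virtual address space, obtaining its start address A1.

  (2) From the operating system's memory mapping, obtain the physical address A2 to which virtual address A1 is mapped.

  (3) From the Raven's settings, obtain the address A3 to which physical address A2 is mapped on the Local PCI Bus.

  (4) The 21154 is a PCI-to-PCI Transparent Bridge, and it too can carry accesses in both directions. Set two of its internal configuration registers so that its address window is 1 MB in size and starts at A3. Because the 21154 is transparent, A3 and the address A4 to which it is mapped on the System CPCI Bus side have the same value.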
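  The article does not name the two 21154 registers. Assuming they are the standard Memory Base and Memory Limit registers of a type-1 (PCI-to-PCI bridge) configuration header, as defined by the PCI-to-PCI Bridge Architecture Specification, a 1 MB window starting at A3 could be encoded as sketched below. These registers have 1 MB granularity, which fits the 1 MB window used here. The function names and example address are hypothetical, and the configuration-space write itself is OS-specific and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type-1 header offsets where the encoded values would be written. */
#define PCI_MEMORY_BASE  0x20   /* 16-bit: bits 15:4 = address bits 31:20 */
#define PCI_MEMORY_LIMIT 0x22   /* 16-bit: bits 15:4 = address bits 31:20 */

/* Encode the window [start, start + size); both must be 1 MB multiples. */
static void encode_mem_window(uint32_t start, uint32_t size,
                              uint16_t *base_reg, uint16_t *limit_reg)
{
    assert(size != 0 && (start & 0xFFFFFu) == 0 && (size & 0xFFFFFu) == 0);
    *base_reg  = (uint16_t)((start >> 16) & 0xFFF0u);
    *limit_reg = (uint16_t)(((start + size - 1u) >> 16) & 0xFFF0u);
}

/* Does the transparent bridge forward this address downstream? Since it
   does not translate, the forwarded address (A4) equals the original (A3). */
static bool forwards(uint16_t base_reg, uint16_t limit_reg, uint32_t addr)
{
    uint32_t lo = (uint32_t)(base_reg & 0xFFF0u) << 16;               /* low 20 bits 0 */
    uint32_t hi = ((uint32_t)(limit_reg & 0xFFF0u) << 16) | 0xFFFFFu;  /* low 20 bits 1 */
    return addr >= lo && addr <= hi;
}
```

  For a single 1 MB window at a hypothetical A3 of 0x81F00000, both registers hold 0x81F0, and the bridge forwards 0x81F00000 through 0x81FFFFFF.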

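  2.3 Interface configuration between the motherboard and the daughter board

  The daughter board is configured from the motherboard side: set the 21554's configuration register so that its address window on the System CPCI Bus side starts at A4. Since the window's start address on the Local PCI Bus side has already been set to A5, address A4 is thereby mapped to address A5. Because the 21554 is non-transparent, the address spaces of the motherboard and the daughter board are isolated from each other and can be allocated independently, and they are joined at the System CPCI Bus level. From the point of view of the motherboard CPU, the whole daughter board is, like the motherboard's network card, simply a peripheral attached below the motherboard's Local PCI Bus, and it is accessed in exactly the same way as that network card.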

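  2.4 Address translation flow

  Once all the configuration is complete, the motherboard CPU can access the first unit of the daughter board's 1 MB memory simply by reading or writing address A1; any other unit in that 1 MB is reached by adding the corresponding offset to A1. The address translation flow below shows clearly how the address is mapped level by level until it finally hits the specified unit.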

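  Virtual address A1 issued by the motherboard CPU → motherboard physical address A2 → motherboard Local PCI Bus address A3 → System CPCI Bus address A4 → daughter board Local PCI Bus address A5 → daughter board physical address A6 → decoded by the Falcon Memory Controller, which selects the first unit of the allocated 1 MB memory.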
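  The translation chain can be checked with a small model in which each level is a linear 1 MB window. All six address values are hypothetical placeholders (real values come from the OS memory maps and the Raven, 21154 and 21554 settings), the stage and function names are illustrative, and the final Falcon decode is not modeled. The point the model illustrates is that every stage preserves the offset, so A1 + offset lands on A6 + offset.

```c
#include <assert.h>
#include <stdint.h>

/* One linear mapping stage: [in_base, in_base + size) -> [out_base, out_base + size). */
typedef struct {
    const char *name;
    uint32_t in_base, out_base, size;
} stage_t;

/* Hypothetical example values; only the relationships between them matter. */
#define MB 0x00100000u
#define A1 0xD0000000u  /* motherboard kernel virtual address */
#define A2 0xC1F00000u  /* motherboard physical address (Raven CPU-to-PCI window) */
#define A3 0x01F00000u  /* motherboard Local PCI Bus address */
#define A4 A3           /* System CPCI Bus address: equal to A3, the 21154 is transparent */
#define A5 0x81F00000u  /* daughter board Local PCI Bus address (21554 local-side base) */
#define A6 0x02300000u  /* daughter board physical address of the 1 MB buffer */

static const stage_t chain[] = {
    { "motherboard MMU",   A1, A2, MB },
    { "motherboard Raven", A2, A3, MB },
    { "21154",             A3, A4, MB },
    { "21554",             A4, A5, MB },
    { "daughter Raven",    A5, A6, MB },
};

static int stage_map(const stage_t *s, uint32_t addr, uint32_t *out)
{
    assert(s->size != 0);
    if (addr - s->in_base >= s->size)
        return -1;                      /* outside this stage's window */
    *out = s->out_base + (addr - s->in_base);
    return 0;
}

/* Walk A1 + offset through every stage; on success *a6 holds the daughter
   board physical address that the memory controller would then decode. */
static int walk_chain(uint32_t addr, uint32_t *a6)
{
    for (unsigned i = 0; i < sizeof chain / sizeof chain[0]; i++)
        if (stage_map(&chain[i], addr, &addr) != 0)
            return -1;
    *a6 = addr;
    return 0;
}
```

  An address just past the 1 MB window, such as A1 + 0x100000, is rejected at the first stage, mirroring the address filtering a real bridge performs.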

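  As the above shows, the key to mutual memory access between two machines is correct address mapping. When multiple SBCs must access one another, the mapping becomes more complex, and the operating system's address space allocation, the PCI-to-PCI Bridge settings of each SBC, and the allocation of the System CPCI Bus address space must all be planned together.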
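  One piece of that planning is carving the System CPCI Bus address space into non-overlapping per-board windows. A minimal sketch, assuming each board's window size is a power of two and must be naturally aligned, as PCI memory BARs are; in a real system the host's PCI enumeration normally makes these assignments, and `align_up` and `assign_windows` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to a multiple of size (size must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1u)) == 0);
    return (addr + size - 1u) & ~(size - 1u);
}

/* Give board i a naturally aligned window of sizes[i] bytes, packed upward
   from 'start' inside [start, end). Returns 0, or -1 if there is no room. */
static int assign_windows(uint32_t start, uint32_t end,
                          const uint32_t *sizes, uint32_t *bases, size_t n)
{
    uint32_t next = start;
    for (size_t i = 0; i < n; i++) {
        uint32_t b = align_up(next, sizes[i]);
        if (b < next || b > end || end - b < sizes[i])
            return -1;                  /* wrapped past 4 GB or ran out of room */
        bases[i] = b;
        next = b + sizes[i];
    }
    return 0;
}
```

  Packing a 1 MB, a 2 MB and another 1 MB window upward from 0x80000000 gives bases 0x80000000, 0x80200000 and 0x80400000; the 2 MB window skips ahead to a 2 MB boundary, leaving a 1 MB gap.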

                   Figure 3 Timing diagram of read operations between two SBCs

                     Figure 4 Timing diagram of write operations between two SBCs

  3 Performance Optimization

  Figures 3 and 4 show data captured with a VMETRO bus analyzer: timing diagrams of a continuous 100-Byte transfer using read accesses and write accesses, respectively, between two SBCs.

  As can be seen from Figure 3, each 4Byte read operation takes 956.8+4×149.5+179.4=1734.2ns, which is equivalent to 1734.2ns/29.9ns=58 PCI clock cycles.   

  As can be seen from Figure 4, the first 4-Byte write takes 159.5 ns, followed by two burst transfers, and then another 4-Byte write takes 119.6 ns. Taken together, this write sequence costs (159.5ns+2×29.9ns+119.6ns)/29.9ns≈11 PCI clock cycles, which is taken as the average cost of a 4-Byte write operation.
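  The cycle counts above can be reproduced with a few lines of C, using the 29.9 ns clock period and the interval values read from Figures 3 and 4; the helper names are illustrative.

```c
#include <assert.h>

#define PCI_CLK_NS 29.9   /* PCI clock period measured in Figures 3 and 4 */

static double to_cycles(double ns)
{
    assert(ns >= 0.0);
    return ns / PCI_CLK_NS;
}

/* Figure 3: one 4-Byte read = 956.8 + 4 x 149.5 + 179.4 = 1734.2 ns. */
static double read_cycles(void)
{
    return to_cycles(956.8 + 4 * 149.5 + 179.4);
}

/* Figure 4: first write, two burst data phases, further write = 338.9 ns. */
static double write_sequence_cycles(void)
{
    return to_cycles(159.5 + 2 * 29.9 + 119.6);
}
```

  The read works out to about 58.0 cycles and the write sequence to about 11.3 cycles, which is rounded to 11 in the text.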

  Comparing the two access methods of reading and writing, it can be seen that the writing operation is much more efficient than the reading operation. This is mainly due to the following reasons:   

  (1) When a PCI master initiates an access to a target, reads and writes take very different amounts of time to complete. Current PCI devices generally provide a posted-write buffer for memory writes. For a write (such as the motherboard writing to the daughter board in Figure 2), the master only needs to transfer its data into the posted-write buffer of the next device on the path, and the operation is then considered complete. In Figure 2, for example, the write is considered complete as soon as the Raven on the motherboard has handed the data to the 21154; the rest of the transfer is carried out by the 21154. The write can therefore complete on the source bus (the motherboard's Local PCI Bus) before it completes on the destination bus (the daughter board's Processor Bus); in effect it is a register-to-register operation. A read, by contrast, must actually access the memory and passes through the logic delays of the PCI interface at every level: it must complete on the destination bus before it can complete on the source bus, which makes reads inefficient.

  (2) As Figures 3 and 4 show, the writes were carried out as burst transfers but the reads were not; in this setup a burst was possible only when the preceding transaction was a write. A burst eliminates the turnaround cycles of bus signals such as FRAME#, AD, C/BE#, IRDY#, TRDY#, and DEVSEL#, and achieves one data transfer per PCI clock cycle.

  (3) A burst cannot continue indefinitely. The number of consecutive transfers in a burst depends on the size of the posted-write buffer, the value of the Latency Timer, and how busy the bus is.
  For these reasons, data transfer between two SBCs should follow these rules:

  (1) The SBC that produces the data should write it directly into the memory of the SBC that consumes it, rather than placing the data in its own local memory for the consumer to fetch with PCI reads. In other words, always use PCI writes.

  (2) When data must be transferred among multiple SBCs, set the Latency Timer to a reasonable value so that every SBC gets fair use of the PCI bus.

  Consider communication between two daughter boards. If the operating system's message passing mechanism is used, the data producer must first write the data to the motherboard, and the data consumer then reads it from the motherboard, so transferring 4 Bytes takes on average 58+11=69 PCI clock cycles. With the method described in this article, where the producer writes the data directly into the consumer's memory, transferring 4 Bytes takes only 11 PCI clock cycles on average. The latter is therefore 69/11≈6.3 times faster than the former, a large improvement in transfer efficiency.
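  Rule (1) can be sketched as a write-only mailbox. This is a minimal illustration, not the article's implementation: it assumes a single message slot located in the consumer's memory, which the producer reaches through its mapped window (for example A1 on the motherboard side), and a sequence number written last to publish each message. The layout, names and protocol are hypothetical, and a plain local structure stands in for the mapped memory so the sketch runs anywhere.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define MBOX_PAYLOAD 256u   /* hypothetical slot size */

/* Hypothetical single-slot mailbox placed in the CONSUMER's local memory. */
typedef struct {
    uint8_t           payload[MBOX_PAYLOAD];
    volatile uint32_t len;
    volatile uint32_t seq;  /* written last; a new value means a new message */
} mailbox_t;

/* Producer side: 'remote' points into the producer's mapped PCI window, so
   every store here becomes a (posted) PCI write. memcpy is used for brevity;
   some platforms require fixed-width accesses to device-mapped memory. */
static void mbox_send(mailbox_t *remote, const void *data, uint32_t len, uint32_t seq)
{
    assert(len <= MBOX_PAYLOAD);
    memcpy(remote->payload, data, len);
    remote->len = len;
    /* Make payload and len visible before seq. The C11 fence is a stand-in;
       real hardware needs the platform's I/O barrier (e.g. PowerPC eieio/sync). */
    atomic_thread_fence(memory_order_release);
    remote->seq = seq;
}

/* Consumer side: polls its OWN memory, so waiting never issues a PCI read.
   Returns 1 and copies the message if 'expected_seq' has arrived, else 0.
   Flow control (telling the producer the slot is free) is not shown; done
   the same way, it would be a write into the producer's memory. */
static int mbox_poll(const mailbox_t *local, uint32_t expected_seq,
                     void *out, uint32_t *len)
{
    if (local->seq != expected_seq)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *len = local->len;
    memcpy(out, local->payload, *len);
    return 1;
}
```

  Because the consumer only ever polls local memory, all traffic crossing the CPCI bus in this scheme consists of writes, the cheap direction according to the measurements in Figures 3 and 4.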