PCI Express is a high-speed serial I/O interconnect that uses clock data recovery (CDR) technology. The first generation of PCI Express specifies a line rate of 2.5Gbps per lane, which after 8B/10B encoding yields an effective 2Gbps per lane; links scale from a single lane (x1) at 2Gbps up to 32 lanes at 64Gbps of throughput. This can significantly reduce pin count while maintaining or improving throughput. It can also reduce PCB size, cut the number of traces and layers, and simplify layout and design. Fewer pins mean less noise and less electromagnetic interference (EMI). CDR eliminates the clock-to-data skew problem common in wide parallel buses and simplifies interconnect implementation.
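The arithmetic behind these figures is simple enough to sketch. The short example below is illustrative only (the function name is ours); it derives effective throughput from the 2.5Gbps line rate and the 8B/10B encoding overhead for the common link widths.

```python
# Effective Gen1 PCI Express throughput per link width. 8B/10B encoding
# carries 8 data bits in every 10 line bits, so 2.5Gbps -> 2.0Gbps per lane.
LINE_RATE_GBPS = 2.5           # Gen1 line rate per lane
ENCODING_EFFICIENCY = 8 / 10   # 8B/10B overhead

def link_throughput_gbps(lanes: int) -> float:
    """Aggregate effective throughput for a link of the given width."""
    return lanes * LINE_RATE_GBPS * ENCODING_EFFICIENCY

for lanes in (1, 4, 8, 32):
    print(f"x{lanes}: {link_throughput_gbps(lanes):.0f} Gbps")
# Prints: x1: 2 Gbps, x4: 8 Gbps, x8: 16 Gbps, x32: 64 Gbps
```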
The PCI Express interconnect architecture was primarily targeted at PC-based systems, but like PCI, PCI Express quickly migrated to other system types, such as embedded systems. The architecture defines three types of devices: root complex, switch, and endpoint (Figure 1). The root complex is roughly equivalent to the PCI host; the CPU, system memory, and graphics controller connect to it. Because PCI Express is point-to-point, switch devices must be used to increase the number of functions in a system. A PCI Express switch connects to the root complex on its upstream side and to endpoints on its downstream side.
Figure 1: PCI Express topology.
Endpoint functionality is similar to that of PCI/PCI-X devices. The most common endpoint devices are Ethernet controllers and storage host bus adapters (HBAs). FPGAs are most commonly used for data processing and bridging functions, so the endpoint is their primary target role. FPGA implementations are well suited to video, medical imaging, industrial, test and measurement, data acquisition, and storage applications.
The PCI Express specification adopted by the PCI-SIG (PCI Special Interest Group) defines three protocol layers for every PCI Express device: the physical layer, the data link layer, and the transaction layer. You can build a PCI Express endpoint as either a single-chip or a two-chip solution. For example, using a low-cost FPGA such as a Xilinx Spartan-3 device, you can implement the data link and transaction layers and pair them with a commercial discrete PCI Express PHY (Figure 2). This option is best suited to x1-lane applications such as bus controllers, data acquisition cards, and upgrades of 32-bit/33MHz PCI designs that need more performance. Alternatively, you can use a single-chip solution such as a Virtex-5 LXT or SXT FPGA, which integrates the PCI Express PHY. This option is best suited to communications or high-definition audio/video endpoint devices (Figure 3) that require higher performance: x4 (8Gbps throughput) or x8 (16Gbps throughput) links.
Figure 2: Data acquisition card based on Spartan-3 FPGA.
Before choosing a technology to implement a PCI Express design, careful consideration must be given to the application's IP selection, link efficiency, compatibility testing, and resource availability. In this article, we will briefly review some of the factors that go into building single-chip x4 and x8 lane PCI Express designs using the latest FPGA technology.
Figure 3: Video application based on Virtex-5 LXT FPGA.
IP selection
As a designer, you can build your own soft IP or purchase IP from a third party or an FPGA vendor. The challenge of building your own IP is that you must not only create the design from scratch but also handle verification, validation, compliance, and hardware evaluation. IP purchased from a third party or FPGA vendor has already been through rigorous compliance testing and hardware evaluation and is ready to use. If you use a commercial, proven, compliant PCI Express interface, you can focus on the most value-added part of the design: the user application. The challenge of using soft IP is the resources left over for the application: the PCI Express MAC, data link, and transaction layers of a soft IP core are implemented in the programmable fabric, so you must pay close attention to how much block RAM, lookup-table, and fabric capacity remains.
Figure 4: Virtex-5 LXT FPGA PCI Express endpoint block diagram.
Another option is to use the latest-generation FPGAs. The Virtex-5 LXT and SXT implement an integrated x8-lane PCI Express controller in dedicated silicon (Figure 4). This implementation is very advantageous because, with the design in hard silicon, the number of FPGA logic resources required is minimal. For example, in a Virtex-5 LXT FPGA, an x8-lane soft IP core can occupy up to 10,000 logic cells, whereas the hard implementation requires only about 500 logic cells, most of which are used for interfacing. This resource saving can sometimes let you choose a smaller, and generally cheaper, device. Integrated implementations also typically offer higher performance, wider data paths, and software configurability.
Another challenge with soft IP implementations is feature coverage. Such cores typically implement only the minimum feature set required to meet the target performance or compliance specifications. In contrast, hard IP can support a comprehensive feature list driven by customer requirements and provide full compliance (Table 1) without serious performance or resource penalties.
Table 1: Virtex-5 LXT FPGA PCI Express features.
Latency Issues
Although the latency of the PCI Express controller does not contribute significantly to overall system latency, it can affect the performance of the interface. Using a narrower data path running at a higher clock rate helps reduce latency, because the pipeline depth in cycles stays roughly the same while each cycle is shorter.
For PCI Express, latency is the number of cycles required for a packet to be sent and received across the physical, data link, and transaction layers. A typical x8-lane PCI Express endpoint has a latency of 20-25 cycles, which at 250MHz corresponds to 80-100ns. If a 128-bit datapath is used to relax timing (running at 125MHz, for example), the latency doubles to 160-200ns. In the latest Virtex-5 LXT and SXT devices, both the soft and hard IP implementations of x8 use a 64-bit datapath at 250MHz.
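A quick back-of-the-envelope check of these numbers (an illustrative sketch; the helper function is ours):

```python
# Convert controller latency from clock cycles to nanoseconds:
# latency_ns = cycles * period = cycles * 1000 / f_MHz
def latency_ns(cycles: int, clock_mhz: float) -> float:
    return cycles * 1000.0 / clock_mhz

print(latency_ns(20, 250), latency_ns(25, 250))  # 80.0 100.0 ns (64-bit @ 250MHz)
print(latency_ns(20, 125), latency_ns(25, 125))  # 160.0 200.0 ns (128-bit @ 125MHz)
```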
Link efficiency
Link efficiency is a function of latency, user application design, payload size, and protocol overhead. As the payload size (set by the Max_Payload_Size parameter) increases, effective link efficiency increases, because the per-packet overhead is fixed: the larger the payload, the smaller the overhead's share. Typically, a 256-byte payload gives a theoretical efficiency of about 93%, i.e. 256 payload bytes divided by 276 total bytes (256 payload + 12 header + 8 framing). Although PCI Express allows payloads up to 4KB, most systems see no performance improvement beyond 256- or 512-byte payloads. After accounting for link protocol overhead (ACKs/NAKs, packet retransmissions) and flow control, link efficiencies for x4 or x8 PCI Express implementations in Virtex-5 LXT FPGAs are 88-89%.
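The efficiency figure follows directly from the fixed 20 bytes of per-packet overhead cited above. A minimal sketch (illustrative; the function name is ours) shows how efficiency scales with payload size:

```python
HEADER_BYTES = 12   # TLP header (3 DWORDs)
FRAMING_BYTES = 8   # sequence number, LCRC, and framing symbols

def tlp_efficiency(payload_bytes: int) -> float:
    """Fraction of transferred bytes that is payload rather than overhead."""
    return payload_bytes / (payload_bytes + HEADER_BYTES + FRAMING_BYTES)

for payload in (4, 64, 128, 256, 512):
    print(f"{payload:4d}-byte payload: {tlp_efficiency(payload):.1%}")
# A single-DWORD (4-byte) payload reaches only ~17%; 256 bytes gives ~93%.
```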
FPGA implementations provide greater control over link efficiency because they allow you to choose a receive buffer size appropriate to the endpoint implementation. If the two sides of a link do not implement the datapath in the same way, their internal latencies will differ. For example, if link partner 1 uses a 64-bit, 250MHz implementation with a latency of 80ns, and link partner 2 uses a 128-bit, 125MHz implementation with a latency of 160ns, the combined latency of the link is 240ns. If link partner 1's receive buffer was sized for a combined latency of only 160ns (that is, it expected its partner to also be a 64-bit, 250MHz implementation), link efficiency will be reduced. With an ASIC implementation, the receive buffer size cannot be changed, so the efficiency loss is real and permanent.
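As a rough, illustrative model (our own simplification of flow-control credit sizing), the receive buffer must cover the data in flight during the combined link latency:

```python
# Rough receive-buffer sizing sketch: the buffer must absorb the data in
# flight during the combined link latency. An x8 Gen1 link has 16Gbps
# effective bandwidth, i.e. 2 bytes per nanosecond.
BANDWIDTH_BYTES_PER_NS = 2.0

def min_buffer_bytes(combined_latency_ns: float) -> float:
    return combined_latency_ns * BANDWIDTH_BYTES_PER_NS

needed = min_buffer_bytes(80 + 160)  # actual partners: 80ns + 160ns = 240ns
sized  = min_buffer_bytes(80 + 80)   # designer assumed a matching 80ns partner
print(needed, sized)  # 480.0 vs 320.0 bytes: the undersized buffer exhausts
                      # flow-control credits and stalls the link
```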
User application design also has an impact on link efficiency. The user application must be designed to drain the PCI Express interface's receive buffers regularly and keep the transmit buffers full at all times. If the user application does not immediately consume received packets (or respond immediately to transmit requests), the overall link efficiency will be affected regardless of the performance of the interface.
Some processor designs cannot issue bursts larger than a single DWORD; unless a DMA controller is implemented, such designs leave the link underutilized and inefficient (the payload-efficiency sketch above shows a single-DWORD payload reaching only about 17%). Most embedded CPUs can issue bursts longer than one DWORD, so on these designs link efficiency can be managed effectively with a good FIFO design.
PCI Express Compliance
Compliance is an important detail that is often overlooked and underestimated. If you are building a PCI Express application that must interoperate with other applications and devices, you must ensure that your design is compliant.
Compliance applies not just to the IP but to the entire solution, including the IP, the user application, the semiconductor device, and the hardware board. If the entire solution has been verified through the PCI-SIG compliance program, you can be confident that the PCI Express portion of your design will continue to work effectively.
Conclusions
PCI Express has replaced PCI as the de facto system interconnect standard and has migrated from PCs to other system markets, including embedded systems. FPGAs are well suited to building PCI Express endpoint devices because they let you create compliant PCI Express devices with the additional custom features that users require.
New 65nm FPGAs such as the Virtex-5 LXT and SXT families are fully compliant with the PCI Express v1.1 specification and provide extensive logic and device resources for user applications. Spartan-3 FPGAs paired with an external PHY provide a low-cost solution. These factors, plus the inherent advantages of programmable logic (flexibility, reprogrammability, and low risk), make FPGAs the best platform for PCI Express.