PCI Express is a high-speed serial I/O interconnect that uses clock data recovery (CDR) technology. The first generation of PCI Express specifies a line rate of 2.5Gbps per lane, which after 8B/10B encoding yields an effective 2Gbps per lane; links scale from a single lane (x1) at 2Gbps up to 32 lanes at 64Gbps of throughput. This can significantly reduce pin count while maintaining or improving throughput. It can also reduce PCB size, cut the number of traces and layers, and simplify layout and design. Fewer pins mean less noise and less electromagnetic interference (EMI). CDR eliminates the clock-to-data skew problem common in wide parallel buses and simplifies interconnect implementation.
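The arithmetic behind these figures is simple enough to sketch. The short example below is illustrative only (the function name is ours); it derives effective throughput from the 2.5Gbps line rate and the 8B/10B encoding overhead for the common link widths.

```python
# Effective Gen1 PCI Express throughput per link width. 8B/10B encoding
# carries 8 data bits in every 10 line bits, so 2.5Gbps -> 2.0Gbps per lane.
LINE_RATE_GBPS = 2.5           # Gen1 line rate per lane
ENCODING_EFFICIENCY = 8 / 10   # 8B/10B overhead

def link_throughput_gbps(lanes: int) -> float:
    """Aggregate effective throughput for a link of the given width."""
    return lanes * LINE_RATE_GBPS * ENCODING_EFFICIENCY

for lanes in (1, 4, 8, 32):
    print(f"x{lanes}: {link_throughput_gbps(lanes):.0f} Gbps")
# Prints: x1: 2 Gbps, x4: 8 Gbps, x8: 16 Gbps, x32: 64 Gbps
```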
The PCI Express interconnect architecture was primarily targeted at PC-based systems, but like PCI, PCI Express quickly migrated to other system types, such as embedded systems. The architecture defines three types of devices: root complex, switch, and endpoint (Figure 1). The root complex is roughly equivalent to the PCI host; the CPU, system memory, and graphics controller connect to it. Because PCI Express is point-to-point, switch devices must be used to increase the number of functions in a system. A PCI Express switch connects to the root complex on its upstream side and to endpoints on its downstream side.
Figure 1: PCI Express topology.
Endpoint functionality is similar to that of PCI/PCI-X devices. The most common endpoint devices are Ethernet controllers and storage host bus adapters (HBAs). FPGAs are most commonly used for data processing and bridging functions, so the endpoint is their primary target role. FPGA implementations are well suited to video, medical imaging, industrial, test and measurement, data acquisition, and storage applications.
The PCI Express specification adopted by the PCI-SIG (PCI Special Interest Group) defines three protocol layers for every PCI Express device: the physical layer, the data link layer, and the transaction layer. You can build a PCI Express endpoint as either a single-chip or a two-chip solution. For example, using a low-cost FPGA such as a Xilinx Spartan-3 device, you can implement the data link and transaction layers and pair them with a commercial discrete PCI Express PHY (Figure 2). This option is best suited to x1-lane applications such as bus controllers, data acquisition cards, and upgrades of 32-bit/33MHz PCI designs that need more performance. Alternatively, you can use a single-chip solution such as a Virtex-5 LXT or SXT FPGA, which integrates the PCI Express PHY. This option is best suited to communications or high-definition audio/video endpoint devices (Figure 3) that require higher performance: x4 (8Gbps throughput) or x8 (16Gbps throughput) links.
Figure 2: Data acquisition card based on Spartan-3 FPGA.
Before choosing a technology to implement a PCI Express design, careful consideration must be given to the application's IP selection, link efficiency, compatibility testing, and resource availability. In this article, we will briefly review some of the factors that go into building single-chip x4 and x8 lane PCI Express designs using the latest FPGA technology.
Figure 3: Video application based on Virtex-5 LXT FPGA.
IP selection
As a designer, you can build your own soft IP or purchase IP from a third party or an FPGA vendor. The challenge of building your own IP is that you must not only create the design from scratch but also handle verification, validation, compliance, and hardware evaluation. IP purchased from a third party or FPGA vendor has already been through rigorous compliance testing and hardware evaluation and is ready to use. If you use a commercial, proven, compliant PCI Express interface, you can focus on the most value-added part of the design: the user application. The challenge of using soft IP is the resources left over for the application: the PCI Express MAC, data link, and transaction layers of a soft IP core are implemented in the programmable fabric, so you must pay close attention to how much block RAM, lookup-table, and fabric capacity remains.
Figure 4: Virtex-5 LXT FPGA PCI Express endpoint block diagram.
Another option is to use the latest-generation FPGAs. The Virtex-5 LXT and SXT implement an integrated x8-lane PCI Express controller in dedicated silicon (Figure 4). This implementation is very advantageous because, with the design in hard silicon, the number of FPGA logic resources required is minimal. For example, in a Virtex-5 LXT FPGA, an x8-lane soft IP core can occupy up to 10,000 logic cells, whereas the hard implementation requires only about 500 logic cells, most of which are used for interfacing. This resource saving can sometimes let you choose a smaller, and generally cheaper, device. Integrated implementations also typically offer higher performance, wider data paths, and software configurability.
Another challenge with soft IP implementations is feature coverage. Such cores typically implement only the minimum feature set required to meet the target performance or compliance specifications. In contrast, hard IP can support a comprehensive feature list driven by customer requirements and provide full compliance (Table 1) without serious performance or resource penalties.
Table 1: Virtex-5 LXT FPGA PCI Express features.
Latency Issues
Although the latency of the PCI Express controller does not contribute significantly to overall system latency, it can affect the performance of the interface. Using a narrower data path running at a higher clock rate helps reduce latency, because the pipeline depth in cycles stays roughly the same while each cycle is shorter.
For PCI Express, latency is the number of cycles required for a packet to be sent and received across the physical, data link, and transaction layers. A typical x8-lane PCI Express endpoint has a latency of 20-25 cycles, which at 250MHz corresponds to 80-100ns. If a 128-bit datapath is used to relax timing (running at 125MHz, for example), the latency doubles to 160-200ns. In the latest Virtex-5 LXT and SXT devices, both the soft and hard IP implementations of x8 use a 64-bit datapath at 250MHz.
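A quick back-of-the-envelope check of these numbers (an illustrative sketch; the helper function is ours):

```python
# Convert controller latency from clock cycles to nanoseconds:
# latency_ns = cycles * period = cycles * 1000 / f_MHz
def latency_ns(cycles: int, clock_mhz: float) -> float:
    return cycles * 1000.0 / clock_mhz

print(latency_ns(20, 250), latency_ns(25, 250))  # 80.0 100.0 ns (64-bit @ 250MHz)
print(latency_ns(20, 125), latency_ns(25, 125))  # 160.0 200.0 ns (128-bit @ 125MHz)
```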
Link efficiency
Link efficiency is a function of latency, user application design, payload size, and protocol overhead. As the payload size (set by the Max_Payload_Size parameter) increases, effective link efficiency increases, because the per-packet overhead is fixed: the larger the payload, the smaller the overhead's share. Typically, a 256-byte payload gives a theoretical efficiency of about 93%, i.e. 256 payload bytes divided by 276 total bytes (256 payload + 12 header + 8 framing). Although PCI Express allows payloads up to 4KB, most systems see no performance improvement beyond 256- or 512-byte payloads. After accounting for link protocol overhead (ACKs/NAKs, packet retransmissions) and flow control, link efficiencies for x4 or x8 PCI Express implementations in Virtex-5 LXT FPGAs are 88-89%.
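The efficiency figure follows directly from the fixed 20 bytes of per-packet overhead cited above. A minimal sketch (illustrative; the function name is ours) shows how efficiency scales with payload size:

```python
HEADER_BYTES = 12   # TLP header (3 DWORDs)
FRAMING_BYTES = 8   # sequence number, LCRC, and framing symbols

def tlp_efficiency(payload_bytes: int) -> float:
    """Fraction of transferred bytes that is payload rather than overhead."""
    return payload_bytes / (payload_bytes + HEADER_BYTES + FRAMING_BYTES)

for payload in (4, 64, 128, 256, 512):
    print(f"{payload:4d}-byte payload: {tlp_efficiency(payload):.1%}")
# A single-DWORD (4-byte) payload reaches only ~17%; 256 bytes gives ~93%.
```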
FPGA implementations provide greater control over link efficiency because they allow you to choose a receive buffer size appropriate to the endpoint implementation. If the two sides of a link do not implement the datapath in the same way, their internal latencies will differ. For example, if link partner 1 uses a 64-bit, 250MHz implementation with a latency of 80ns, and link partner 2 uses a 128-bit, 125MHz implementation with a latency of 160ns, the combined latency of the link is 240ns. If link partner 1's receive buffer was sized for a combined latency of only 160ns (that is, it expected its partner to also be a 64-bit, 250MHz implementation), link efficiency will be reduced. With an ASIC implementation, the receive buffer size cannot be changed, so the efficiency loss is real and permanent.
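As a rough, illustrative model (our own simplification of flow-control credit sizing), the receive buffer must cover the data in flight during the combined link latency:

```python
# Rough receive-buffer sizing sketch: the buffer must absorb the data in
# flight during the combined link latency. An x8 Gen1 link has 16Gbps
# effective bandwidth, i.e. 2 bytes per nanosecond.
BANDWIDTH_BYTES_PER_NS = 2.0

def min_buffer_bytes(combined_latency_ns: float) -> float:
    return combined_latency_ns * BANDWIDTH_BYTES_PER_NS

needed = min_buffer_bytes(80 + 160)  # actual partners: 80ns + 160ns = 240ns
sized  = min_buffer_bytes(80 + 80)   # designer assumed a matching 80ns partner
print(needed, sized)  # 480.0 vs 320.0 bytes: the undersized buffer exhausts
                      # flow-control credits and stalls the link
```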
User application design also has an impact on link efficiency. The user application must be designed to drain the PCI Express interface's receive buffers regularly and keep the transmit buffers full at all times. If the user application does not immediately consume received packets (or respond immediately to transmit requests), the overall link efficiency will be affected regardless of the performance of the interface.
Some processor designs cannot issue bursts larger than a single DWORD; unless a DMA controller is implemented, such designs leave the link underutilized and inefficient (the payload-efficiency sketch above shows a single-DWORD payload reaching only about 17%). Most embedded CPUs can issue bursts longer than one DWORD, so on these designs link efficiency can be managed effectively with a good FIFO design.
PCI Express Compliance
Compliance is an important detail that is often overlooked and underestimated. If you are building a PCI Express application that must interoperate with other applications and devices, you must ensure that your design is compliant.
Compliance applies not just to the IP but to the entire solution, including the IP, the user application, the semiconductor device, and the hardware board. If the entire solution has been verified through the PCI-SIG compliance program, you can be confident that the PCI Express portion of your design will continue to work effectively.
Conclusions
PCI Express has replaced PCI as the de facto system interconnect standard and has migrated from PCs to other system markets, including embedded systems. FPGAs are well suited to building PCI Express endpoint devices because they let you create compliant PCI Express devices with the additional custom features that users require.
New 65nm FPGAs such as the Virtex-5 LXT and SXT families are fully compliant with the PCI Express v1.1 specification and provide extensive logic and device resources for user applications. Spartan-3 FPGAs paired with an external PHY provide a low-cost solution. These factors, plus the inherent advantages of programmable logic (flexibility, reprogrammability, and low risk), make FPGAs the best platform for PCI Express.