Enhance DSP co-processing capabilities using Serial RapidIO connectivity

Publisher: 紫菜包饭 | Last update: 2011-10-09

Demand for high-speed communication and ultra-fast computing increases day by day. Wired and wireless communication standards are applied everywhere, and data processing architectures keep expanding. The most common wired communication method is Ethernet (LAN, WAN, and MAN networks), while mobile communication is the most common wireless method, realized with DSP-based architectures. Once primarily a tool for voice connections, the phone must now meet ever-increasing voice, video, and data requirements.

When creating an architecture, system designers must not only address the high-end demands of the triple-play model but also meet the following requirements: high performance, low latency, low system cost (including NRE), a scalable and extensible architecture, integration of off-the-shelf (OTS) components, distributed processing, and support for multiple standards and protocols.

These challenges involve two main aspects: the connectivity between computing platforms/boxes in a wired or wireless architecture and the specific computing resources within these platforms/boxes.

Connectivity between computing platforms

Standards-based connections are now more common. Parallel connection standards (PCI, PCI-X, EMIF) can meet current needs, but fall short in scalability and extensibility. With the emergence of packet-based processing, the usage trend is clearly toward high-speed serial connections (Figure 1).

Figure 1: Serial connection trends.

The desktop computer and networking industries have adopted standards such as PCI Express (PCIe) and Gigabit Ethernet/XAUI. However, the interconnect requirements for data processing systems in wireless architectures are slightly different: low pin count, backplane chip-to-chip connectivity, scalable bandwidth and speed, DMA and message transfer, support for complex scalable topologies, multi-point transmission, high reliability, absolute time synchronization, and quality of service (QoS).

The Serial RapidIO (SRIO) protocol standard easily meets and exceeds most of the above requirements. As a result, SRIO has become the primary interconnect for data plane connections in wireless infrastructure equipment.

Figure 2: SRIO network building blocks.

SRIO networks are built around two basic building blocks: endpoints and switches (Figure 2). Endpoints source and sink packets, while switches pass packets between ports without parsing them. SRIO is specified in a three-layer architectural hierarchy (Figure 3):

Figure 3: Hierarchical SRIO architecture.

1. The physical layer specification describes the details of the device-level interface, such as packet transmission mechanisms, flow control, electrical parameters, and low-level error management.

2. The transport layer specification provides the routing information needed for packets to move between endpoints. Switches operate at the transport layer using device-ID-based routing.

3. The logical layer specification defines the overall protocol and packet formats. All packets carry a payload of 256 bytes or fewer, and transactions use load/store/DMA operations directed at a 34-, 50-, or 66-bit address space. Transaction types include: NREAD (read; the returned data forms the response), NWRITE (write, no response), NWRITE_R (write with a response from the target endpoint), SWRITE (streaming write), ATOMIC (atomic read-modify-write), and MAINTENANCE (system discovery, detection, initialization, configuration, and maintenance operations). A minimal code sketch of these transaction types follows this list.
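To make the logical-layer operations concrete, here is a minimal C sketch that enumerates the transaction types listed above together with a hypothetical request descriptor. The type names mirror the specification, but the struct layout, field names, and encoding are illustrative assumptions, not the on-the-wire format or any vendor API.

#include <stdint.h>

/* Logical-layer request types described above (illustrative encoding,
 * not the specification's ftype/ttype field values). */
typedef enum {
    SRIO_NREAD,        /* read; the returned data forms the response     */
    SRIO_NWRITE,       /* write, no response                             */
    SRIO_NWRITE_R,     /* write with a response from the target endpoint */
    SRIO_SWRITE,       /* streaming write                                */
    SRIO_ATOMIC,       /* atomic read-modify-write                       */
    SRIO_MAINTENANCE   /* discovery, initialization, configuration       */
} srio_txn_type;

/* Hypothetical request descriptor. The 34-/50-/66-bit address space no
 * longer fits in 32 bits, so a 64-bit field plus extension bits is used
 * here; payloads are capped at 256 bytes as stated above. */
typedef struct {
    srio_txn_type type;
    uint16_t      dest_id;      /* destination device ID (transport layer) */
    uint64_t      address;      /* lower 64 address bits                   */
    uint8_t       addr_ext;     /* upper bits for 66-bit addressing        */
    uint16_t      length;       /* payload length, 1..256 bytes            */
    uint8_t       payload[256];
} srio_request;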

SRIO: Advantages and Prospects

A 4-lane SRIO link running at 3.125Gbps per lane can carry 10Gbps of traffic with guaranteed data integrity. Because SRIO behaves like a microprocessor bus (memory and device addressing, rather than software-managed LAN protocols), packet processing is implemented in hardware. This greatly reduces I/O processing overhead, lowers latency, and increases system bandwidth. Yet unlike most bus interfaces, an SRIO interface requires only a small number of pins, and bandwidth can continue to scale from the 3.125Gbps per-lane building block.
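The 10Gbps figure follows from the 8b/10b line coding used by the SRIO physical layer; the short calculation below makes the arithmetic explicit (a plain C illustration, not tied to any particular device).

#include <stdio.h>

int main(void)
{
    double lane_rate_gbps = 3.125;        /* per-lane line rate        */
    int    lanes          = 4;            /* 4x SRIO link              */
    double coding_eff     = 8.0 / 10.0;   /* 8b/10b encoding overhead  */

    /* Raw: 4 x 3.125 = 12.5 Gbps; effective: 12.5 x 0.8 = 10 Gbps */
    double raw_gbps       = lanes * lane_rate_gbps;
    double effective_gbps = raw_gbps * coding_eff;

    printf("raw %.1f Gbps, effective %.1f Gbps\n", raw_gbps, effective_gbps);
    return 0;
}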

Computing resources in the platform

Today's applications demand large amounts of processing resources, and hardware-based implementations are growing rapidly. Compression/decompression algorithms, firewall functions such as anti-virus and intrusion detection, and security applications requiring encryption engines such as AES, Triple DES, and Skipjack were initially implemented in software but are now moving into hardware. This calls for a broadly parallel environment in which bandwidth and processing power can be shared; systems need CPUs, NPUs, FPGAs, or ASICs to achieve shared or distributed processing.

All of these application-specific requirements need to be considered when building a system that can adapt to future developments and changes. The requirements for computing resources include:
1. Multiple hosts - distributed processing;
2. Direct point-to-point communication;
3. Multiple heterogeneous operating systems;
4. Complex topologies;
5. Discovery mechanism;
6. Redundant paths (failure recovery);
7. Support for high reliability;
8. Lossless protocol;
9. Automatic retraining and device synchronization;
10. System-level error management;
11. Ability to support communication data plane;
12. Multi-point transmission;
13. Traffic management (lossy) operation;
14. Link, level and flow-based flow control;
15. Protocol interoperability;
16. High transaction concurrency;
17. Modular and scalable;
18. Support for a wide ecosystem.

The SRIO protocol can support a variety of requirements derived from computing devices in wireless architectures.

The SRIO specification (Figure 4) defines a packet-based layered architecture that supports multiple domains or market segments, helping system architects design next-generation computing platforms. Using SRIO as the computing interconnect makes it easy to keep the design architecture-agnostic, deploy scalable systems with carrier-grade reliability, implement advanced traffic management, and deliver high performance and high throughput. In addition, a large supplier ecosystem makes it easy to choose OTS parts and components.

SRIO is a packet-based protocol that supports:
1. Data movement through packet-based operations (read, write, message);
2. Both non-coherent I/O and cache-coherent operation;
3. Efficient intercommunication and protocol encapsulation through data streaming, segmentation, and reassembly functions;
4. A traffic-management framework enabling millions of flows, 256 traffic classes, and lossy operation (see the flow sketch after this list);
5. Flow control across multiple transaction request flows, providing QoS;
6. Priority levels that help manage bandwidth allocation and transaction ordering while avoiding deadlock;
7. Standard (tree and mesh) and arbitrary (daisy-chain) topologies through system discovery, configuration, and maintenance, including support for multiple hosts;
8. Error management and classification (recoverable, notification, and fatal).
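As a rough illustration of items 4 and 5, a flow in a traffic-managed SRIO system might be identified in software as sketched below. The field widths reflect the concepts above (large numbers of flows, 256 traffic classes); the struct and function names are assumptions for this sketch, not specification text.

#include <stdint.h>

/* Illustrative identification of a data-streaming flow: a 16-bit stream
 * ID allows many flows per source/destination pair, and an 8-bit class
 * of service gives the 256 traffic levels mentioned above. */
typedef struct {
    uint16_t src_id;     /* source endpoint device ID      */
    uint16_t dest_id;    /* destination endpoint device ID */
    uint16_t stream_id;  /* per-pair stream identifier     */
    uint8_t  cos;        /* class of service, 0..255       */
} srio_flow;

/* A simple FNV-1a style hash over the flow identity lets a traffic
 * manager keep per-flow state (queues, rate limits) for large numbers
 * of concurrent flows. */
static inline uint32_t srio_flow_hash(const srio_flow *f)
{
    uint32_t h = 2166136261u;
    h = (h ^ f->src_id)    * 16777619u;
    h = (h ^ f->dest_id)   * 16777619u;
    h = (h ^ f->stream_id) * 16777619u;
    h = (h ^ f->cos)       * 16777619u;
    return h;
}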

Figure 4: SRIO specification.

Xilinx IP Solutions for SRIO

The Xilinx Endpoint IP solution for SRIO is designed for the RapidIO specification (v1.3). The complete Xilinx Endpoint IP solution for SRIO includes the following parts (Figure 5):

Figure 5: Xilinx Endpoint IP architecture for SRIO.

1. The Xilinx Endpoint IP for SRIO is a soft LogiCORE solution implementing the logical (I/O) and transport layers. It supports fully compliant, maximum-payload operation for sourcing and receiving user data through its target and initiator interfaces.

2. A buffer layer reference design, provided as source code, automatically reprioritizes packets and manages the queues (a behavioral sketch follows this list).

3. The SRIO physical layer IP implements link training and initialization, discovery and management, and error and retry recovery mechanisms. In addition, high-speed transceivers are instantiated in the physical layer IP to support 1-lane and 4-lane SRIO bus links with line rates of 1.25Gbps, 2.5Gbps, and 3.125Gbps.

4. The register manager reference design allows the SRIO host device to set and maintain endpoint device configuration, link status, control and timeout mechanisms. In addition, the ports provided on the register manager can be used by the user design to detect the status of the endpoint device.
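The buffer layer's reprioritization can be pictured with the small C sketch below: outgoing packets wait in per-priority queues and the highest non-empty priority is always drained first. This is only an illustration of the behavior described in item 2 above, under assumed queue sizes and names; it is not the reference design's actual code.

#include <stddef.h>
#include <stdint.h>

#define NUM_PRIORITIES 4   /* SRIO packets carry a 2-bit priority field */
#define QUEUE_DEPTH    16  /* illustrative buffer depth                 */

typedef struct {
    uint8_t  data[276];    /* header plus up to 256 bytes of payload    */
    uint16_t length;
} srio_packet;

typedef struct {
    srio_packet slots[QUEUE_DEPTH];
    int head, tail, count;
} pkt_queue;

static pkt_queue tx_queues[NUM_PRIORITIES];

/* Select the next packet to transmit: scan from the highest priority
 * down and return the first queued packet found, or NULL if all queues
 * are empty. */
srio_packet *next_packet_to_send(void)
{
    for (int p = NUM_PRIORITIES - 1; p >= 0; p--) {
        pkt_queue *q = &tx_queues[p];
        if (q->count > 0) {
            srio_packet *pkt = &q->slots[q->head];
            q->head = (q->head + 1) % QUEUE_DEPTH;
            q->count--;
            return pkt;
        }
    }
    return NULL;
}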

The entire Xilinx endpoint IP LogiCORE solution for SRIO has been fully tested and hardware-validated, and is currently undergoing interoperability testing with major SRIO device vendors. The LogiCORE IP is delivered through the Xilinx CORE Generator software GUI, which lets users customize the line rate and endpoint configuration and enable extended features such as flow control, retransmission suppression, doorbells, and messaging. In this way, you can create a flexible, scalable, custom SRIO endpoint optimized for your application.

Virtex-5 FPGA Computing Resources

Xilinx endpoint IP for SRIO ensures a high-speed connection between the two sides of a link using the SRIO protocol. In the smallest Virtex-5 device, the IP occupies less than 20% of the available logic resources, leaving most of the logic, memory, and I/O for the user design and the system application. Let's take a look at the Virtex-5 device resources.

Logic Module

The Virtex-5 logic fabric, built on a 65nm process around a six-input lookup table (LUT), offers the highest FPGA capacity. Improved carry logic delivers 30% higher performance than previous devices, and because designs need fewer LUTs, power consumption drops significantly. The fabric also features a highly optimized, symmetrical routing architecture.

Memory

Virtex-5 memory solutions include LUT RAM, block RAM, and memory controllers for interfacing to large external memories. The block RAM structures include built-in FIFO logic as well as embedded error detection and correction (ECC) logic. In addition, through the Memory Interface Generator (MIG) tool, Xilinx provides comprehensive design resources for instantiating memory controller modules in a system design. In this way, you can take advantage of hardware-proven solutions and focus on other key parts of the design.

Parallel and serial I/O

SelectIO technology can implement almost any parallel source-synchronous interface a design requires. With the SelectIO interface, you can easily build interfaces compliant with more than 40 different industry electrical standards, or create proprietary ones. SelectIO supports rates up to 700Mbps (single-ended) and 1.25Gbps (differential).

All Virtex-5 LXT FPGAs are equipped with GTP transceivers that run at line rates between 100Mbps and 3.2Gbps. GTP transceivers are also among the lowest-power MGTs in the industry, with each transceiver consuming less than 100mW. Proven design techniques and methodologies make the process of high-speed serial design simple and fast.

Additionally, new design tools (the RocketIO Transceiver Wizard and IBERT) and new silicon features (TX and RX equalization plus a built-in pseudo-random bit sequence (PRBS) generator and checker) make it practical to migrate from parallel I/O standards to more than 30 serial standards and emerging serial technologies.

DSP module

Each DSP48E slice delivers 550MHz performance, enabling a variety of applications that require single-precision floating-point performance, such as multimedia, video and imaging, and digital communications. This extends the device's functionality beyond previous generations while also cutting dynamic power consumption by more than 40%. Virtex-5 FPGAs also provide more DSP48E slices, and the ratio of these blocks to the available logic and memory has been optimized.

Integrated I/O modules

All Virtex-5 LXT FPGA devices include an integrated endpoint block for PCIe. This hard-IP endpoint block scales from x1 to x2, x4, or x8 links with minimal effort, simply by reconfiguring it. The block (in x1, x4, and x8 configurations) has passed rigorous PCI-SIG compliance and interoperability testing, so users can deploy it for PCIe with confidence.

In addition, all Virtex-5 LXT FPGA devices are equipped with a Tri-Mode Ethernet Media Access Controller (TEMAC) that can reach speeds of 10/100/1,000Mbps. This module provides dedicated Ethernet functions, and combined with the Virtex-5 LXT RocketIO transceiver and SelectIO technology, it can easily connect you to many network devices.

With these two modules for PCIe and Ethernet, a range of custom packet processing and networking products can be created that significantly reduce resource utilization and power consumption. By using these various resources available in Xilinx FPGAs, intelligent solutions can be easily created and deployed.

Let's look at some system design examples that leverage SRIO and DSP technology.

SRIO Embedded System Applications

Consider building an embedded system around an x86-architecture CPU. The CPU architecture is highly optimized for general-purpose workloads: algorithms that use CPU resources can easily be implemented in hardware and software to perform functions such as email, database management, and word processing, which do not require many multiplications. Performance is measured in millions or billions of instructions or operations per second, while efficiency is measured in the time or cycles required to complete a specific operation.

High-performance applications with heavy fixed-point and floating-point workloads, by contrast, spend most of their time processing data. Examples include signal filtering, fast Fourier transforms, vector multiplication and search, image/video analysis and format conversion, and other number-crunching algorithms. High-end signal-processing architectures implemented in DSPs perform and optimize these operations easily, and the performance of these DSPs is measured by how many multiply-accumulate operations they execute per second.
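To see why DSP performance is quoted in multiply-accumulates per second, consider the inner loop of a basic FIR filter, sketched below in generic C (an illustration, not code for any particular device): every output sample costs one multiply-accumulate per filter tap.

/* N-tap FIR filter: each output sample costs num_taps multiply-accumulates.
 * At 1 Msample/s with 64 taps, that is 64 million MACs per second -- the
 * kind of workload rated in MACs/s rather than instructions/s.
 * The input buffer x must hold num_samples + num_taps - 1 values. */
void fir_filter(const float *x, const float *coeff,
                float *y, int num_samples, int num_taps)
{
    for (int n = 0; n < num_samples; n++) {
        float acc = 0.0f;
        for (int k = 0; k < num_taps; k++) {
            acc += coeff[k] * x[n + k];   /* multiply-accumulate */
        }
        y[n] = acc;
    }
}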

You can easily design an embedded system using CPU and DSP to take advantage of both processing technologies. Figure 6 shows an example of a system using FPGA, CPU, and DSP architecture.

Figure 6: Scalable, high-performance, CPU-based embedded system.

The primary data interconnect in high-end DSPs is SRIO, while the primary data interconnect in x86 CPUs is PCIe. As shown in Figure 6, you can easily deploy FPGAs to scale DSP applications or to bridge disparate data interconnect standards such as PCIe and SRIO.

In the system shown in Figure 6, the PCIe domain is hosted by the root-complex chipset, and the SRIO domain is hosted by the DSP. The 32/64-bit PCIe address space (base addresses) can be intelligently mapped onto the 34/66-bit SRIO address space (base addresses). PCIe applications communicate with the root complex through memory or I/O reads and writes, and these transactions map easily onto the SRIO space as NREAD/NWRITE/SWRITE operations.

Designing such a bridge function in a Xilinx FPGA is simple because the back-end interfaces of the Xilinx PCIe and SRIO endpoint blocks are very similar. A "packet queue" block carries transactions from PCIe to SRIO, or vice versa, establishing a packet flow that can traverse both protocol domains.
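A hedged sketch of the address translation at the heart of such a bridge is shown below in C. The window layout, field names, and the policy for mapping PCIe memory transactions onto NREAD/NWRITE/SWRITE are assumptions made for illustration; a real design would follow the address maps defined by the two specifications and the user interfaces of the endpoint cores.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative SRIO request types used by the bridge (see the logical
 * layer description earlier in the article). */
typedef enum { TXN_NREAD, TXN_NWRITE, TXN_SWRITE } srio_txn;

/* One translation window: a region of PCIe memory space forwarded into
 * the SRIO address space of one destination device ID. */
typedef struct {
    uint64_t pcie_base;     /* window base in PCIe memory space        */
    uint64_t size;          /* window size in bytes                    */
    uint64_t srio_base;     /* base address in the 34/50/66-bit space  */
    uint16_t srio_dest_id;  /* target SRIO endpoint device ID          */
} bridge_window;

/* Translate one PCIe memory transaction; returns false if no window matches. */
bool pcie_to_srio(const bridge_window *win, int num_windows,
                  uint64_t pcie_addr, bool is_write, int burst_bytes,
                  uint64_t *srio_addr, uint16_t *dest_id, srio_txn *type)
{
    for (int i = 0; i < num_windows; i++) {
        if (pcie_addr >= win[i].pcie_base &&
            pcie_addr - win[i].pcie_base < win[i].size) {
            *srio_addr = win[i].srio_base + (pcie_addr - win[i].pcie_base);
            *dest_id   = win[i].srio_dest_id;
            /* Assumed policy: reads become NREAD; large bursts become
             * SWRITE and smaller writes become NWRITE. */
            *type = !is_write ? TXN_NREAD
                  : (burst_bytes >= 64 ? TXN_SWRITE : TXN_NWRITE);
            return true;
        }
    }
    return false;
}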

SRIO DSP System Application

In applications where DSP processing is the primary architectural requirement, the system architecture can be designed as shown in Figure 7.

Figure 7: DSP-intensive array.

Virtex-5 FPGA-based DSP processing, combined with the other DSP devices in the system, forms an intelligent co-processing solution. With SRIO as the data interconnect, the entire DSP system can be expanded easily; such solutions adapt to future developments and changes, provide extensibility, and are supported across multiple form factors. In DSP-intensive applications, fast digital analysis or data processing can be achieved by offloading the corresponding processing tasks to the x86 architecture. Using a Virtex-5 FPGA, the PCIe subsystem and the SRIO architecture can be connected easily to achieve efficient function offloading.

SRIO baseband system application

Existing 3G networks are maturing rapidly, and OEMs are deploying new form factors to ease specific capacity and coverage problems. To address these issues while tracking market trends, an FPGA-based DSP architecture that uses SRIO as the data-plane standard is an ideal choice. In addition, legacy DSP systems can be quickly upgraded to a fast, low-power FPGA DSP architecture to gain the benefits of scalability.

As shown in the system in Figure 8, you can design the Virtex-5 FPGA to handle the existing line-rate processing requirements for antenna traffic and also provide connectivity to other system resources via SRIO. The SRIO endpoint capabilities available to the Virtex-5 FPGA make it easy to migrate existing legacy DSP applications that have inherently slow parallel connections.

Figure 8: Scalable baseband uplink/downlink card.

Conclusion

SRIO is emerging in a large number of new designs, mainly centered on DSPs in wired and wireless applications. The main advantages of implementing an SRIO architecture in Xilinx devices include:
1. Availability of the entire SRIO endpoint solution;
2. Flexibility and scalability to facilitate different levels of products using the same hardware and software architecture;
3. Low power consumption achieved through new GTP transceivers and 65nm technology;
4. Easy configuration through the CORE Generator software GUI tool;
5. Proven hardware interoperability with industry-leading vendors to support SRIO connections on their devices;
6. System integration achieved through the use of integrated I/O modules such as PCIe and TEMAC, thereby reducing overall system cost.

In addition, the DSP resources of Virtex-5 FPGAs meet the power, performance, and bandwidth requirements of existing legacy DSP systems. Further advantages come from system integration: Ethernet MAC blocks, endpoint blocks for PCIe, processor IP, and memory elements and controllers. And because this extensive list of IP cores allows functions from multiple sources to be integrated in one FPGA, overall system cost can be reduced substantially.
