Implementation of Customized Buffer Management Based on FPGA and IP Core

Publisher: 郑大大 | Last updated: 2010-10-08 | Source: 电子发烧友

With the development and diversification of communication protocols, the protocol-processing element (PE) is generally implemented with a commercial network processor (NP) chip for hardware forwarding, while the traffic management (TM) part must either be customized to the system's requirements or built from commercial chips. In many cases the NP, TM, and switch-fabric chips cannot all be sourced from the same manufacturer; customizing the TM then becomes the lowest-cost, most system-optimized solution, and it is generally implemented in an FPGA. The general structure of a TM is shown in Figure 1.

Figure 1 General structure of TM

At present, the mainstream TM interface is the SPI4-P2 interface. Its signal rate is high, and TCCS (channel-to-channel skew, the skew between data channels, including clock skew) is difficult to control, so very high rates are hard to reach under ordinary conditions. To achieve high rates and sidestep the TCCS problem, the SPI4-P2 specification in many cases requires DPA (dynamic phase alignment) at the receiving end. For the SPI4-P2 interface, Altera's IP Core can be used directly, and Altera's mainstream FPGAs implement the DPA function in hardware. Taking Stratix II devices as an example, an interface data rate of 16 Gb/s can be achieved with the SPI4-P2 IP Core and DPA enabled.

The SEG module is the data segmentation block. In the direction of the switching network, it segments IP packets or other data packets into fixed-size data blocks according to the data-structure requirements of the switching network, which simplifies later storage scheduling and switching operations. The SEG module can be implemented in conjunction with the SPI4-P2 IP Core. The RSM module is the counterpart of the SEG module: it reassembles the data blocks arriving from the switching network into complete IP packets or data packets.
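As a small illustration of the segmentation step (not the authors' RTL, which is FPGA logic), the C sketch below models how a SEG-style splitter rounds a packet up to a whole number of fixed-size cells; CELL_SIZE is an assumed value from the 64 to 512 B range discussed later.

```c
#include <stddef.h>

/* Assumed cell size; real systems use 64-512 B (see below). */
#define CELL_SIZE 64u

/* Number of fixed-size cells a packet of `len` bytes occupies.
 * Rounding up is the source of the "N+1" problem mentioned later:
 * a packet one byte longer than N cells still takes N+1 cells,
 * the last one mostly padding. */
static size_t seg_cell_count(size_t len)
{
    return (len + CELL_SIZE - 1) / CELL_SIZE;
}
```

For example, a 65 B packet occupies two 64 B cells, so only about half of the second cell carries payload.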

The BM (Buffer Management) module manages the TM's buffer units and performs the DRAM access operations. The controller for the external DRAM can be implemented with the DDR SDRAM IP Core.

The QM module is the queue management module. It performs per-port data queue management and handles the enqueue and dequeue requests that arise as the BM module writes and reads DRAM. Performance indicators such as the number of data flows, service types, and ports that the TM can support are determined by the QM module.

The Scheduler module schedules packets according to their type, their priority, and the bandwidth allocated to each port. TM traffic shaping, QoS, and related functions are implemented through this module.

The CELL_EDIT module performs output packaging: it encapsulates the data read from DRAM and sends it out.

In the TM, different management strategies must be applied to data packets of different service levels according to the data service policy, while packets belonging to the same media stream must not be reordered. Packets vary in size, so the number of data blocks produced by the SEG module varies as well; an effective linked-list-based data structure is therefore needed to manage them. The QM module manages queues by service and data flow, while per-packet management is handled by the BM module.

The packet-oriented data structure in the BM module consists of two parts: BRAM and PRAM. BRAM is the data buffer and corresponds to the off-chip DRAM; it stores the data units. For each data unit produced by the SEG module there is a corresponding storage unit in BRAM, the BCELL. BCELLs partition the BRAM address space, are all the same size, and form the smallest access unit of BRAM. In practical systems, the data units produced by the SEG module are generally 64 to 512 B.

PRAM is the pointer buffer and corresponds to the off-chip SSRAM. The PRAM address space is likewise divided into PCELLs, which correspond one-to-one with BCELLs: each PCELL shares the address of its BCELL. The basic information held in a PCELL is the next-hop pointer. The relationship between PRAM and BRAM is shown in Figure 2.

Figure 2 PRAM and BRAM relationship diagram
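A minimal C model of this layout may make the correspondence concrete; the array depth and cell size are assumptions, and the arrays simply stand in for the off-chip DRAM and SSRAM.

```c
#include <stdint.h>

#define NUM_CELLS  4096u   /* assumed buffer depth             */
#define BCELL_SIZE 64u     /* assumed data-cell size, 64-512 B */

/* One BCELL: the smallest access unit of BRAM. */
typedef struct {
    uint8_t data[BCELL_SIZE];
} bcell_t;

/* One PCELL: holds the next-hop pointer for the BCELL that
 * lives at the same address. */
typedef struct {
    uint32_t next;
} pcell_t;

static bcell_t bram[NUM_CELLS];   /* models the off-chip DRAM  */
static pcell_t pram[NUM_CELLS];   /* models the off-chip SSRAM */
```

Because the index into `bram` and `pram` is the same, following `pram[i].next` walks the chain of data cells without any separate mapping table.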

There are two types of linked list in PRAM. The PQ List is the linked list of a stored data packet. To facilitate reading, it records the address of the packet's first data block, the head pointer Pq_Hptr; to facilitate writing new data, it records the address of the last data block, the tail pointer Pq_Tptr. The PQ List also records the length of the list, which the scheduling module uses as a weight.

The Free List is the free-address queue. To identify and manage free addresses easily and avoid address conflicts, all free addresses in the BM are managed in a single linked list, the free-address queue. Its form varies with system requirements, but its structure is generally similar to that of the PQ List, consisting of a free-address head pointer Free_Hptr and a free-address tail pointer Free_Tptr. All operations of the BM module revolve around the Free List.
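Continuing the C model above, the two list descriptors might look like the following; the field names simply mirror the pointers named in the text.

```c
/* Descriptor of one PQ List: the linked list of a stored packet. */
typedef struct {
    uint32_t pq_hptr;   /* Pq_Hptr: address of the first data block */
    uint32_t pq_tptr;   /* Pq_Tptr: address of the last data block  */
    uint32_t length;    /* list length, used as a scheduling weight */
} pq_list_t;

/* Descriptor of the Free List: the free-address queue. */
typedef struct {
    uint32_t free_hptr; /* Free_Hptr: head of the free-address queue */
    uint32_t free_tptr; /* Free_Tptr: tail of the free-address queue */
} free_list_t;
```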

Following the data flow through the BM module, it is generally divided into the Write Control module, the Free List control module, the Read Control module, the PRAM Control module, and the BRAM Control module. The structure of the BM is shown in Figure 3.

Figure 3 BM structure diagram

The Write Control module obtains a free address from the Free List module, submits a write request to the BRAM Control module, and updates the contents of PRAM. The Free List control module manages the free-address list: it provides the Write Control module with BRAM and PRAM write addresses and reclaims the addresses released after the Read Control module reads out data blocks. The Read Control module reads the data units to be sent through the BRAM Control module according to the scheduler's decisions, and writes the released buffer-unit addresses back into the free-address list. The PRAM Control module is the controller for the external SSRAM and can be taken directly from a reference design. The BRAM Control module is the external DRAM controller and is generally divided into two sub-modules, Datapath and Controller. The Datapath module handles the data interface, performing the DQ and DQS processing of the DRAM interface and the corresponding delay adjustment; the Controller module implements the DRAM control requirements.

In the BM module, the bandwidths of BRAM and PRAM are usually the bottlenecks of the TM. PRAM bandwidth is limited mainly by the number of accesses, while BRAM bandwidth is limited by the interface bandwidth. For example, a 10G TM must guarantee an effective BRAM bandwidth of 20 Gb/s, since every packet is both written and read. Assuming interface utilization can fall to 65% in the worst case (because of the N+1 problem: a packet slightly longer than N cells still occupies N+1 cells when the SEG module splits it, with the last cell mostly padding), the raw interface bandwidth must reach about 30 Gb/s. With a 64-bit DRAM interface, the per-pin interface rate cannot fall below roughly 500 Mb/s, which places higher demands on the design of the Datapath module. In actual systems, BRAM is mainly built from DDR SDRAM and DDR II SDRAM.
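The arithmetic behind these figures, written out:

```latex
\begin{aligned}
B_{\text{eff}} &= 2 \times 10\ \text{Gb/s} = 20\ \text{Gb/s}
      && \text{(each packet written once, read once)} \\
B_{\text{raw}} &\ge 20\ \text{Gb/s} / 0.65 \approx 30.8\ \text{Gb/s}
      && \text{(65\% worst-case utilization)} \\
R_{\text{pin}} &\ge 30.8\ \text{Gb/s} / 64 \approx 480\ \text{Mb/s}
      && \text{(64-bit interface, i.e. roughly 500 Mb/s per pin)}
\end{aligned}
```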

When a Stratix II FPGA is used and BRAM is built from DDR II SDRAM, tests show that the DDR II SDRAM interface can reach 800 Mb/s per pin, and 667 Mb/s per pin can be guaranteed under normal use. For a 64-bit DRAM interface this yields 42.7 Gb/s, which is fully sufficient for a 10G TM system.

The BM module is a buffer management module whose basic buffer unit is the BCELL. Built on BCELL management, operations on the BM involve the free-address queue and the linked lists; the most basic are the write operation and the read operation. A BM write is initiated by the Write Control module.

When the Write Control module has a data unit to write, it first requests a free address from the Free List module. The Free List hands its head pointer a to the Write Control module as the write address for the data block, reads the PRAM content at address a to obtain the next-hop address b, and makes b the new head of the free-address list.

The PQ List then updates its tail pointer from n to the newly written address a, and updates the PRAM content at address n, recording a as the next hop of n. To save an operation cycle, the next-hop field of the new tail is not written to NULL but keeps its stale value. In this way, one BRAM write requires one PRAM read and one PRAM write. The BM write operation is shown in Figure 4.
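Continuing the C model, a write might be sketched as below; the function name and the in-memory arrays are illustrative, not the authors' implementation.

```c
/* One BM write: allocate from the free list, store the data unit,
 * and append it to the packet's PQ List.
 * Cost: one PRAM read plus one PRAM write per BRAM write. */
static uint32_t bm_write(free_list_t *fl, pq_list_t *pq, const bcell_t *unit)
{
    uint32_t a = fl->free_hptr;     /* free-list head a is the write address */
    fl->free_hptr = pram[a].next;   /* PRAM read: next free address b        */

    bram[a] = *unit;                /* BRAM write of the data unit           */

    pram[pq->pq_tptr].next = a;     /* PRAM write: old tail n points to a    */
    pq->pq_tptr = a;                /* a is the new tail; its own next field
                                       is deliberately left stale (no NULL
                                       write), saving one cycle              */
    pq->length++;
    return a;
}
```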

The QM module receives dequeue information from the scheduling module and passes the dequeued PQ List information to the BM module for the read operation.

Figure 4 BM module write operation

Figure 5 Free List stack structure for read operation

The BM read operation is initiated and completed by the Read Control module. When a data unit is read out, its address must be recycled into the free-address queue, the Free List. The Free List takes different forms for different system requirements; the simpler option is to operate it as a stack.

The Read Control module reads the BRAM content at the PQ List head address 0, reads the corresponding next-hop address 1 from PRAM, and makes address 1 the new head. The Free List updates its head pointer from a to the newly released address 0 and writes a as the next-hop pointer at address 0. In this way, one BRAM read requires one PRAM read and one PRAM write.
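A matching sketch of the read path with the stack-form Free List, again in the illustrative C model:

```c
/* One BM read with the Free List run as a stack (Figure 5).
 * Cost: one PRAM read plus one PRAM write per BRAM read. */
static void bm_read_stack(free_list_t *fl, pq_list_t *pq, bcell_t *out)
{
    uint32_t h = pq->pq_hptr;      /* head cell of the packet's PQ List */

    *out = bram[h];                /* BRAM read of the data unit        */
    pq->pq_hptr = pram[h].next;    /* PRAM read: advance the PQ head    */
    pq->length--;

    pram[h].next = fl->free_hptr;  /* PRAM write: push h onto the stack */
    fl->free_hptr = h;             /* released address is the new top   */
}
```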

In actual operation, a stack-form free-address queue keeps part of the queue in an on-chip buffer. This saves a PRAM write when a read releases an address into the free-address queue, and a PRAM read when a write requests a free address. The built-in free-address queue under the PRAM stack structure is shown in Figure 6.

Figure 6 Built-in free address queue table under PRAM stack structure
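One way to sketch this on-chip cache in the C model is shown below; the cache depth and the exact spill/refill policy are assumptions, since the text only specifies that PRAM is touched when the cache is nearly full or empty.

```c
/* Small on-chip cache in front of the stack-form Free List.
 * Frees and allocations normally hit only this cache; the Free List
 * in PRAM is written only on spill and read only on refill. */
#define FL_CACHE_DEPTH 32u   /* assumed on-chip buffer depth */

static uint32_t fl_cache[FL_CACHE_DEPTH];
static uint32_t fl_cache_top;   /* number of addresses currently cached */

static void fl_release(free_list_t *fl, uint32_t addr)
{
    if (fl_cache_top == FL_CACHE_DEPTH) {   /* cache full: spill one      */
        uint32_t v = fl_cache[--fl_cache_top];
        pram[v].next = fl->free_hptr;       /* one PRAM write             */
        fl->free_hptr = v;
    }
    fl_cache[fl_cache_top++] = addr;        /* usual case: no PRAM access */
}

static uint32_t fl_alloc(free_list_t *fl)
{
    if (fl_cache_top == 0) {                /* cache empty: refill        */
        uint32_t h = fl->free_hptr;
        fl->free_hptr = pram[h].next;       /* one PRAM read              */
        return h;
    }
    return fl_cache[--fl_cache_top];        /* usual case: no PRAM access */
}
```

In the common case, a release followed by an allocation costs no PRAM operation at all, matching the claim of two operations saved per read-write cycle in the best case.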

Taking the read operation of Figure 5 as an example: when the Read Control module reads the BRAM content at the PQ List head address 0, it simultaneously reads the next-hop address 1 from PRAM and makes address 1 the new head. Address 0 is now released and, by the rules of the free queue, must enter the free-address queue; during a later write operation it is read back out and given to the Write Control module for writing BRAM.

With the structure of Figure 6, the released address 0 no longer updates the Free List in PRAM but is written directly into the on-chip buffer, and when the Write Control module requests an address it is served from that buffer. The Free List in PRAM is updated only when the on-chip Free List buffer is nearly full, and read only when the buffer is empty. With this structure, two PRAM operations can be saved per read-write cycle, or one PRAM operation in the worst case.

However, with a stack structure the addresses near the top of the stack are reused frequently while those near the bottom are rarely touched, which affects the service life of the DRAM. To ensure that every DRAM storage location is used evenly, some systems organize the free-address queue as a linked list. The Free List linked-list structure for the read operation is shown in Figure 7.

Figure 7 Free List linked-list structure for the read operation

The Read Control module reads the BRAM content at the PQ List head address 0, simultaneously reads the corresponding next-hop address 1 from PRAM, and makes address 1 the new head.

Compared with the stack mode, the Free List adds a tail pointer d. When reclaiming an address, the Free List leaves the head pointer a unchanged, updates the tail pointer from d to the newly released address 0, and writes 0 as the next-hop pointer at address d. One BRAM read thus still requires one PRAM read and one PRAM write. With a linked-list free-address queue, however, each read-write cycle requires two PRAM writes and two PRAM reads in total, so PRAM efficiency is not high.
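In the C model, the FIFO-style reclaim and allocation might look like this:

```c
/* Free List kept as a FIFO linked list (Figure 7): released addresses
 * are appended at the tail and allocations pop the head, so DRAM
 * addresses are recycled evenly instead of reused off a stack top. */
static void fl_reclaim_fifo(free_list_t *fl, uint32_t addr)
{
    pram[fl->free_tptr].next = addr;  /* PRAM write: old tail d -> addr */
    fl->free_tptr = addr;             /* addr is the new tail           */
}

static uint32_t fl_alloc_fifo(free_list_t *fl)
{
    uint32_t h = fl->free_hptr;       /* head is the allocated address */
    fl->free_hptr = pram[h].next;     /* PRAM read: advance the head   */
    return h;
}
```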

Given the efficiency of the two free-address queue forms and their impact on DRAM, many systems adopt a compromise: the free-address queue in PRAM is managed as a linked list, while a second, stack-mode free-address queue is built on chip. In this case, each read-write cycle requires three PRAM operations.

In actual systems, the bandwidths of BRAM and PRAM are generally the bottlenecks of the TM: PRAM is limited mainly by the number of accesses, while BRAM is limited by interface bandwidth.

In a 10G TM system, the on-chip data bus is 128 bits wide, the system clock is 150 MHz, and the BCELL size is 64 B. A read or write transfer of one BCELL then takes 4 clock cycles, while the 10G line rate allows 7 clock cycles per operation. As calculated above, under 10G TM conditions a BRAM built from 64-bit DDR II SDRAM with a 250 MHz interface clock meets the data-interface requirement. PRAM uses 32-bit ZBT SRAM clocked at the system clock, with a 64-bit PCELL; each read-write cycle's PRAM work completes in 6 clock cycles (three PRAM operations, each taking two clocks for a 64-bit PCELL over the 32-bit interface), within the 7-cycle budget. With an Altera FPGA, this BM design meets the requirements of 10G TM line-speed operation.
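The cycle budget works out as follows:

```latex
\begin{aligned}
t_{\text{BCELL}} &= \frac{64 \times 8\ \text{bit}}{10\ \text{Gb/s}} = 51.2\ \text{ns}
      \approx 7\ \text{cycles at } 150\ \text{MHz} \\
t_{\text{xfer}}  &= \frac{64\ \text{B}}{128\ \text{bit/cycle}} = 4\ \text{cycles} \\
t_{\text{PRAM}}  &= 3\ \text{ops} \times \frac{64\ \text{bit PCELL}}{32\ \text{bit/cycle}}
      = 6\ \text{cycles}
\end{aligned}
```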

In the TM system of a 40G core network, the on-chip data bus is 256 bits wide and the system clock is 250 MHz (200 MHz can be chosen in a 40GE system). Using DDR II SDRAM with a 333 MHz interface clock, a 192-bit BRAM interface meets the 40G TM requirement. The BCELL can then be 96 B, 192 B, or 384 B; 192 B is chosen here. With a 192 B BCELL, a read or write transfer takes 6 clock cycles, while the 40G line rate allows 9 clock cycles per operation. PRAM uses 48-bit QDR SRAM with a 150 MHz interface clock and a 96-bit PCELL, allowing up to 5 PRAM operations per read-write cycle. Using an Altera FPGA with 192-bit DDR II SDRAM for BRAM and 48-bit QDR SRAM for PRAM, the BM design meets the requirements of 40G TM line-speed operation.
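The corresponding 40G arithmetic, for comparison (reusing the 65% worst-case utilization figure from the 10G discussion, which is an assumption here):

```latex
\begin{aligned}
t_{\text{BCELL}} &= \frac{192 \times 8\ \text{bit}}{40\ \text{Gb/s}} = 38.4\ \text{ns}
      \approx 9\ \text{cycles at } 250\ \text{MHz} \\
t_{\text{xfer}}  &= \frac{192\ \text{B}}{256\ \text{bit/cycle}} = 6\ \text{cycles} \\
B_{\text{raw}}   &= 192\ \text{bit} \times 2 \times 333\ \text{MHz} \approx 128\ \text{Gb/s}
      \ \ge\ \frac{2 \times 40\ \text{Gb/s}}{0.65} \approx 123\ \text{Gb/s}
\end{aligned}
```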
