Research on the application of pipeline technology in DSP calculation based on FPGA - EEWORLD

In the field of digital signal processing (DSP), the amount of data that needs to be processed is huge, and the real-time requirements are very high. Traditional DSP design methods mainly use fixed-function DSP devices and DSP processors. Due to their poor flexibility and the sequential nature of software algorithms during execution, their application in high-speed and real-time systems is limited. With the continuous innovation of deep submicron semiconductor manufacturing processes and the continuous introduction of million-gate programmable devices, a third effective solution has been provided for DSP, namely, the use of FPGA to implement DSP operation hardware. It can meet the needs of DSP applications in terms of integration, speed and system functions.

However, when an FPGA design is synthesized, optimizing for operating speed and optimizing for resource utilization often conflict. A design optimized for high speed usually occupies more chip resources, while a design that minimizes chip area usually does so at the expense of system speed. Given the development trend of FPGAs and the demands of DSP computation, the speed target matters more than the area target, so design strategies that raise the maximum operating speed of the chip deserve further study. This article discusses how to apply pipeline technology in FPGA-based DSP system design: it exploits the parallelism inside the hardware to raise the amount of data processed per unit time (the data throughput) within the limited chip area of the FPGA, and thereby raises the working speed of the system.

0 Basic principles of pipeline technology and FPGA structural characteristics

Pipelining is a technique that is serial in time and parallel in space. Its basic principle is shown in Figure 1. The circuit is divided into several pipeline stages, with registers between adjacent stages to latch the data output by the previous stage. Each stage completes only part of the processing, takes one clock cycle to do so, and passes its result to the next stage on the next clock edge. After the first group of data enters the pipeline, it moves to the second stage one clock cycle later while the second group enters the first stage, and the data queue advances in this way. Each group of data must pass through every stage before its final result is available, but once the pipeline is full it delivers one result per clock, so on average a group of data costs only one clock cycle, which greatly improves the processing speed. Because each stage contains only part of the logic, the clock period can be shorter, so the system runs at a higher frequency and the circuit processes more data per unit time, that is, its throughput is higher.

Figure 1. Basic principle diagram of pipeline technology
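The throughput claim above can be stated as a cycle count. The following is a small worked formula, not taken from the original article: the symbols k (number of pipeline stages) and N (number of data groups) are introduced here for illustration, and it assumes one new group enters on every clock with no stalls.

```latex
% k: number of pipeline stages, N: number of data groups (illustrative symbols)
\[
  C_{\mathrm{pipe}}(N) = k + (N - 1),
  \qquad
  \frac{C_{\mathrm{pipe}}(N)}{N} = 1 + \frac{k - 1}{N} \;\longrightarrow\; 1
  \quad (N \to \infty)
\]
```

For example, with k = 4 and N = 100 the pipeline needs 103 clock cycles, or 1.03 cycles per group, whereas pushing each group through all four stages before admitting the next would need 400 cycles.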

The structure of an FPGA is well suited to pipeline design. Take Altera's low-cost Cyclone II family as an example: it offers up to 68,416 logic elements (LEs) and also provides embedded resources, such as memory blocks, multiplier modules and PLLs, to support memory applications and low-cost DSP applications. Each LE contains a four-input lookup table (LUT), a programmable flip-flop, and so on. In many designs this flip-flop is either not used or is used only for routing. In a pipelined design, an arithmetic operation can be decomposed into small basic operations that are configured in the LUTs, while the carry and intermediate values are stored in the flip-flops and the operation continues on the next clock. Pipelining in an FPGA therefore costs little or no additional logic resources. Especially where large numbers of repeated operations are required, such as convolution in digital signal processing, FFT, or FIR filter design, pipeline technology can greatly improve the operating speed of the system.
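As a sketch of the decomposition described above, the following VHDL splits a 16-bit addition into two 8-bit additions, with the carry and the high operand halves held in registers between the two stages. The entity name pipe_add16, the port names and the 8/8 split are assumptions made for illustration; this is not the adder measured in this article.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical two-stage pipelined 16-bit adder (illustrative only).
entity pipe_add16 is
  port (
    clk  : in  std_logic;
    a, b : in  unsigned(15 downto 0);
    sum  : out unsigned(16 downto 0)   -- valid two clock edges after a and b
  );
end entity pipe_add16;

architecture rtl of pipe_add16 is
  signal lo_sum     : unsigned(8 downto 0);  -- low-byte sum; bit 8 is the carry
  signal a_hi, b_hi : unsigned(7 downto 0);  -- high bytes delayed by one clock
begin
  process (clk)
    variable hi_sum : unsigned(8 downto 0);
  begin
    if rising_edge(clk) then
      -- Stage 1: add the low bytes; register the carry and the high bytes.
      lo_sum <= ('0' & a(7 downto 0)) + ('0' & b(7 downto 0));
      a_hi   <= a(15 downto 8);
      b_hi   <= b(15 downto 8);

      -- Stage 2: add the registered high bytes and the registered carry.
      -- Signals read here still hold the values stored on the previous edge.
      hi_sum := ('0' & a_hi) + ('0' & b_hi) + lo_sum(8 downto 8);
      sum    <= hi_sum & lo_sum(7 downto 0);
    end if;
  end process;
end architecture rtl;
```

Each stage now contains only an 8-bit carry chain, so the critical path is roughly half that of a single 16-bit addition, at the price of the extra registers and one more clock of latency.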

1 Pipeline Design and Performance Analysis of Basic DSP Operations in FPGA

Adders and multipliers are the most basic computing components in DSP. On the Quartus II software platform there are two basic ways to design them: schematic entry and VHDL. Because the modules in the Library of Parameterized Modules (LPM) have been rigorously tested and optimized for performance, we adopt schematic entry and use the MegaWizard Plug-In Manager to instantiate two LPM modules with configurable pipelining, lpm_add_sub and lpm_mult, to build adders and multipliers with different bit widths and different numbers of pipeline stages. The Cyclone II device EP2C5Q208C7 is selected for synthesis, placement and routing, timing analysis and simulation, so that the performance of the variants can be compared.

1.1 Performance comparison of operators with different pipeline stages

Different pipeline stages are selected for designing the 16-bit adder and 8-bit multiplier, and the comparison results are shown in Tables 1 and 2.

Tables 1 and 2. Comparison of 16-bit adders and 8-bit multipliers designed with different numbers of pipeline stages

From the comparison results we can see:

(1) The use of pipeline technology generally significantly improves the working speed compared to not using pipeline technology, reflecting the advantages of pipeline technology in high-speed DSP operations.

(2) The use of pipeline technology increases resource consumption (number of logic units and registers, number of memory bits).

(3) Different numbers of pipeline stages give different speeds and different resource consumption. Beyond a certain point, adding stages may no longer raise the speed while resource consumption keeps growing considerably, so speed must be traded off against resource consumption. For example, for a 16-bit adder, a 2-stage pipeline is the best choice if M4K blocks (dedicated memory resources) are not used, and a 6-stage pipeline is the best if they are. For an 8-bit multiplier, a 2-stage or 6-stage pipeline is the best. For other DSP operations, several candidate stage counts must be designed and compared in order to select the one that meets the system performance requirements.

1.2 Performance comparison of different bit-width arithmetic units with the same pipeline stages

The data bit width of the adder and the multiplier, each using a 6-stage pipeline, is then varied, and the resulting changes in their performance figures are analyzed through synthesis and simulation; see Table 3.

Table 3. Performance comparison of 6-stage pipeline adders and multipliers

From the comparison it can be seen that, for the same number of pipeline stages, the working speed is essentially unchanged, but resource consumption rises sharply as the input bit width increases. For the adder the growth is mainly in the number of logic elements (LEs); for the multiplier it is in the number of memory bits and embedded multipliers. Different types of FPGA devices should therefore be selected for different arithmetic circuits, according to the resources each needs. For example, when only additions are performed, devices rich in logic elements, such as the ACEX and FLEX series, can be selected; for multiplication and addition, series with embedded multiplier modules and memory modules, such as Cyclone and Cyclone II, are needed.

2 Other issues that should be paid attention to in implementing DSP pipeline design based on FPGA

2.1 Selection of pipeline design method

Pipeline design can be divided into two basic methods: schematic diagram and VHDL.

As mentioned earlier, when designing with schematic entry, design efficiency is improved by making full use of the LPM modules that have an LPM_PIPELINE parameter (Quartus II provides more than 40 LPM functions) and setting LPM_PIPELINE to the optimal number of pipeline stages reported by the Quartus II compiler.

When no suitable LPM module is available, VHDL is required as the design input.

The essence of pipelining is to add registers at appropriate places to hold the previous operation result or the input data; when the next clock arrives, the registered value becomes the input of the next level of operation. To describe a pipeline in VHDL it is therefore enough to rewrite the description of the non-pipelined operator and add the necessary design constraints. The registers are generally implemented by testing the edge of the clock signal inside a process with a WAIT statement or an IF ... THEN statement.

If the WAIT statement is used, the commonly used description form is:

    PROCESS
    BEGIN
        WAIT UNTIL clk'EVENT AND clk = '1';  -- rising-edge trigger
        reg <= x;
    END PROCESS;

The x here refers to the data input into the added pipeline register reg.

With the IF ... THEN statement, the common description form is:

IF(clk'event and clk='1') THEN…
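To show how such a clocked IF ... THEN process turns a combinational operator into a pipelined one, here is a minimal sketch of a two-stage multiply-add. The entity name pipe_mac, its ports and the choice of a*b + c as the operation are assumptions for illustration and do not come from the original article.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical two-stage pipelined multiply-add: y = a*b + c.
entity pipe_mac is
  port (
    clk  : in  std_logic;
    a, b : in  unsigned(7 downto 0);
    c    : in  unsigned(15 downto 0);
    y    : out unsigned(16 downto 0)   -- valid two clock edges after the inputs
  );
end entity pipe_mac;

architecture rtl of pipe_mac is
  signal prod_reg : unsigned(15 downto 0);  -- pipeline register after the multiplier
  signal c_reg    : unsigned(15 downto 0);  -- c delayed to stay aligned with prod_reg
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      -- Stage 1: multiply and register the product (8 x 8 gives 16 bits).
      prod_reg <= a * b;
      c_reg    <= c;
      -- Stage 2: add the values registered on the previous clock edge.
      y <= ('0' & prod_reg) + ('0' & c_reg);
    end if;
  end process;
end architecture rtl;
```

Note that c is delayed by one register so that it meets the product of the a and b that arrived with it; forgetting to delay the side inputs is a common mistake when registers are inserted into an existing data path.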

In addition, the LPM functions provided by Altera can also be used when VHDL is the design input, but the LPM library must be made visible before the entity declaration by adding the following statements:

    LIBRARY lpm;
    USE lpm.lpm_components.ALL;
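With the library visible, a pipelined LPM multiplier can be instantiated directly in VHDL. The sketch below wraps lpm_mult as an 8-bit unsigned multiplier with LPM_PIPELINE set to 2. The wrapper name mult8_lpm and the stage count are assumptions for illustration, and the generic and port names are written from the standard LPM component declaration as best recalled, so they should be checked against the lpm_components package shipped with the Quartus II version in use.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library lpm;
use lpm.lpm_components.all;

-- Hypothetical wrapper around the LPM multiplier (illustrative only).
entity mult8_lpm is
  port (
    clk  : in  std_logic;
    a, b : in  std_logic_vector(7 downto 0);
    p    : out std_logic_vector(15 downto 0)
  );
end entity mult8_lpm;

architecture structural of mult8_lpm is
begin
  u_mult : lpm_mult
    generic map (
      LPM_WIDTHA         => 8,
      LPM_WIDTHB         => 8,
      LPM_WIDTHS         => 1,           -- optional sum input is not used here
      LPM_WIDTHP         => 16,
      LPM_REPRESENTATION => "UNSIGNED",
      LPM_PIPELINE       => 2            -- number of pipeline stages (assumed value)
    )
    port map (
      dataa  => a,
      datab  => b,
      clock  => clk,
      result => p
    );
end architecture structural;
```

Changing only the LPM_PIPELINE value and recompiling is a convenient way to repeat the kind of stage-count comparison reported in Section 1.1.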

2.2 The first delay of the pipeline and the trigger time of the register

A delay analysis of the system in Figure 1 shows that the combinational logic in the figure consists of two levels. In a properly designed pipeline the delays of the levels should be roughly equal; let each be Tpd, so that the two levels together contribute 2Tpd, and let the trigger (clock-to-output) time of each inserted register group be Tco. The total delay from input to output is then TD1 = 2(Tpd + Tco), which is called the first latency of the pipeline design. For continuous operation, the register groups hold the intermediate result of each level, so at the next clock edge it feeds the next level of logic directly, without waiting for data to propagate from the system input. The second and all subsequent results therefore each take only one clock cycle, with a delay of TD2 = Tpd + Tco.

It can be seen that the first latency of a pipelined design is longer than the delay per result once the pipeline is full (twice as long in this two-level example). Therefore, when deciding whether to use pipeline technology, how often the DSP operations occur should be analyzed. When operations are continuous (that is, the pipeline is always fully loaded), pipelining can greatly improve the data throughput. If additions and multiplications are needed only occasionally, however, the first latency is greater than the pin-to-pin delay of the non-pipelined design, the benefit of the pipeline disappears, and additional chip resources are sacrificed, so pipelining is not recommended. In FPGA/CPLD devices, the logic delay Tpd is much longer than the register trigger time Tco, so Tco can generally be ignored when analyzing pipeline throughput. In high-speed computing applications, or when the pipeline has many stages (as in video signal processing or data processing in wireless communications), Tco is no longer negligible compared with Tpd, and the number of pipeline stages must be chosen carefully so that Tco does not become the bottleneck of the pipeline.
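The trade-off in this section can be made explicit with a short derivation. It is a sketch under stated assumptions, not part of the original analysis: N is the number of consecutive results, the non-pipelined circuit is assumed to have a pin-to-pin delay of 2Tpd with operands applied back to back, register setup time is ignored as in the text, and T_L and k (total logic delay and number of stages) are symbols introduced here.

```latex
% N consecutive results from the two-level circuit of Figure 1
\[ T_{\mathrm{pipe}}(N) = 2(T_{pd} + T_{co}) + (N - 1)(T_{pd} + T_{co}) = (N + 1)(T_{pd} + T_{co}) \]
\[ T_{\mathrm{comb}}(N) = 2 N T_{pd} \]
\[ T_{\mathrm{pipe}}(N) < T_{\mathrm{comb}}(N)
   \iff N > \frac{T_{pd} + T_{co}}{T_{pd} - T_{co}} \qquad (T_{pd} > T_{co}) \]

% A total logic delay T_L split evenly into k stages
\[ f_{\max}(k) = \frac{1}{T_L / k + T_{co}} < \frac{1}{T_{co}} \]
```

For a single operation (N = 1) the pipelined circuit is slower by 2Tco, which is the case where pipelining is not recommended. When Tpd is more than three times Tco, the pipeline is already ahead at N = 2. The last line shows why Tco cannot be ignored in deep pipelines: as k grows, T_L/k shrinks toward zero and the clock frequency saturates near 1/Tco, so further stages add registers without adding speed.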

2.3 Full Utilization of Embedded Memory Block Resources

In FPGA devices, embedded memory blocks are dedicated resources provided to support memory applications and DSP applications. For example, Altera's FLEX10K series devices provide three embedded array blocks (EABs), each a flexibly configurable 2048-bit RAM, and the Cyclone series provides dozens of M4K blocks, each with 4608 bits of RAM, which can be used alone or in combination. Building an operator such as a multiplier from EABs or M4K blocks amounts to forming a multiplication lookup table. Such a multiplier is faster than one built with LPM, but because the memory resources are limited, only small multipliers can be implemented this way. Combining small multipliers based on embedded array blocks with pipeline technology can further increase both the computing capacity and the speed.
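A multiplication lookup table of this kind can be sketched in VHDL as a ROM addressed by the two operands. The example below is a 4 x 4-bit unsigned multiplier whose 256 x 8-bit table (2048 bits) would fit in a single EAB or M4K block. The entity name lut_mult4 and all signal names are assumptions for illustration, and whether the synthesis tool actually places the table in an embedded memory block depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical lookup-table multiplier: p = a * b, read from a ROM.
entity lut_mult4 is
  port (
    clk  : in  std_logic;
    a, b : in  unsigned(3 downto 0);
    p    : out unsigned(7 downto 0)   -- valid one clock edge after a and b
  );
end entity lut_mult4;

architecture rtl of lut_mult4 is
  type rom_t is array (0 to 255) of unsigned(7 downto 0);

  -- Precompute every 4 x 4 product; the table address is a & b.
  function init_products return rom_t is
    variable tbl : rom_t;
  begin
    for i in 0 to 15 loop
      for j in 0 to 15 loop
        tbl(i * 16 + j) := to_unsigned(i * j, 8);
      end loop;
    end loop;
    return tbl;
  end function init_products;

  constant PRODUCTS : rom_t := init_products;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Registered read: this register also serves as a pipeline stage.
      p <= PRODUCTS(to_integer(a & b));
    end if;
  end process;
end architecture rtl;
```

The table size grows as 2^(2n) entries for n-bit operands, which is why only small multipliers are practical this way; wider multipliers can be assembled from several such tables with pipelined adders combining the partial products.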

2.4 Division of control pipeline and data pipeline

As digital signal processing systems grow more complex, another issue to consider when implementing DSP operations with pipeline technology is the division between the control pipeline and the data pipeline. For example, in a high-speed data acquisition and processing system, the processing of sampled data consists mainly of DSP operations and can be classified as the data pipeline. Channel selection for the input sensors and signal conditioning circuits, analog-to-digital conversion, data buffering and transmission, and control of the data operations must be handled by the main control chip, as shown in Figure 2. An FPGA device can serve as the high-speed main control chip, with pipeline technology used to organize the control of channel selection, analog-to-digital conversion, data buffering and transmission, and data operation into a four-stage pipeline. This reduces the average time per sample for acquisition and processing and achieves high-speed data acquisition. The pipelining inside the main control chip can be classified as the control pipeline.

Figure 2. Hardware block diagram of high-speed data acquisition and processing system
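One way to picture the four-stage control pipeline is as a shift register of channel numbers with valid flags, where each position tells one control stage which channel it is handling in the current time slot. The sketch below is an illustration only: the entity name ctrl_pipe, the eight-channel width, the port names, and the simplification that every stage takes exactly one clock period are all assumptions, and a real design would stretch each slot to the A/D conversion time.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical four-stage control pipeline:
-- stage 0 channel select, 1 A/D conversion, 2 buffer/transfer, 3 data operation.
entity ctrl_pipe is
  port (
    clk, rst : in  std_logic;
    start    : in  std_logic;              -- a new acquisition is requested
    channel  : in  unsigned(2 downto 0);   -- channel to acquire (8 channels assumed)
    sel_ch, adc_ch, buf_ch, op_ch : out unsigned(2 downto 0);
    sel_en, adc_en, buf_en, op_en : out std_logic
  );
end entity ctrl_pipe;

architecture rtl of ctrl_pipe is
  type ch_array is array (0 to 3) of unsigned(2 downto 0);
  signal ch  : ch_array;                   -- channel handled by each stage
  signal vld : std_logic_vector(0 to 3);   -- stage holds a valid request
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        vld <= (others => '0');
      else
        -- A new request enters stage 0; older requests advance one stage.
        ch(0)  <= channel;
        vld(0) <= start;
        for i in 1 to 3 loop
          ch(i)  <= ch(i - 1);
          vld(i) <= vld(i - 1);
        end loop;
      end if;
    end if;
  end process;

  -- Each control stage sees its own channel number and enable.
  sel_ch <= ch(0);  sel_en <= vld(0);
  adc_ch <= ch(1);  adc_en <= vld(1);
  buf_ch <= ch(2);  buf_en <= vld(2);
  op_ch  <= ch(3);  op_en  <= vld(3);
end architecture rtl;
```

With requests issued on consecutive slots, four different channels are in flight at once, one per stage, which is how the control pipeline reduces the average time per acquired sample.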

3 Conclusion

Experimental comparison verifies that pipeline technology enables high-speed DSP operations on FPGA devices. When designing a specific operator, the number of pipeline stages should be compared and optimized through synthesis to balance speed against resource usage. When designing pipelines in a DSP system, it is necessary to decide whether to pipeline at all based on how often the operation occurs, to divide control pipelines and data pipelines sensibly, to choose appropriately between schematic entry and VHDL description, and to make full use of LPM modules with LPM_PIPELINE and of EAB (M4K) resources, so as to maximize the system data throughput and the design efficiency.

2009/8/20 22:17:05
