Design of FIR Decimation Filter Based on FPGA

fighting


Author: Tong Liyong, Xiao Shanzhu, ATR Laboratory, National University of Defense Technology
This paper introduces the working principle of the FIR decimation filter, focuses on a method of implementing an FIR decimation filter with the XC2V1000, and gives the simulation waveform and the design features.
Keywords: FIR decimation filter; pipeline operation; FPGA

　　　　Implementing decimation filters in an FPGA has been relatively complicated, mainly because FPGAs lacked an efficient structure for multiplication. Hardware multipliers are now integrated into FPGAs, which has greatly advanced the use of FPGAs in digital signal processing. This paper introduces a design method for implementing an FIR decimation filter using Xilinx's XC2V1000.

Specific implementation
Structure design
　　　　Based on the working principle of the decimation filter, this paper uses the XC2V1000 to implement a 3rd-order, linear-phase FIR decimation filter with a decimation rate of 2. The source files are designed with schematics and VHDL. Figure 1 is the top-level schematic diagram of the decimation filter. Its ports are: clock, the working clock; reset, the reset signal; enable, the input-data-valid signal; data_in (17:0), the input data; data_out (17:0), the output data; and valid, the output-data-valid signal. Its modules are: adder18, the adder; mult18, the multiplier; acc36, the accumulator; signal_36to18, the data truncation module; and fir_controller, the controller. The controller sends data or control signals to the adder, multiplier and accumulator on a fixed schedule to realize pipeline operation.
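As a rough behavioural reference for the structure above, the filter arithmetic can be sketched in Python (this is not the authors' VHDL). The function name, the example coefficients and the test data are assumptions made for illustration, as is the choice of which output phase is kept. The pre-addition of tap pairs relies on the linear-phase symmetry h(0)=h(3), h(1)=h(2), which matches the products h(0)[x(n)+x(n-3)] and h(1)[x(n-1)+x(n-2)] used later in the walkthrough.

```python
def fir_decimate_by_2(samples, h0, h1):
    """Behavioural model of a 4-tap (3rd-order) linear-phase FIR
    that keeps every second output (decimation rate 2)."""
    outputs = []
    # The first full 4-sample window ends at index 3; stepping by 2 decimates.
    for n in range(3, len(samples), 2):
        outer = samples[n] + samples[n - 3]      # taps sharing h(0)
        inner = samples[n - 1] + samples[n - 2]  # taps sharing h(1)
        outputs.append(h0 * outer + h1 * inner)
    return outputs


# Hypothetical coefficients and data, chosen only to make the arithmetic easy to follow.
print(fir_decimate_by_2([1, 2, 3, 4, 5, 6, 7, 8], h0=1, h1=2))  # [15, 27, 39]
```

Pre-adding the symmetric taps halves the number of multiplications per output from four to two, which is what lets a single multiplier keep up with one output every two clocks.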

Figure 1. Top-level schematic diagram of the decimation filter

Controller
　　　　The controller is the core module of the decimation filter and has two functions: receiving input data, and sending data and control signals to the other modules. Following the timing characteristics of the adder, multiplier and accumulator, it sends tap data to the adder, coefficients to the multiplier, and control signals to the accumulator on a fixed schedule, so that each of these modules completes its assigned task in every clock cycle, thereby realizing pipeline operation. The controller is described in VHDL, and registers are used to store the taps and coefficients.

Adder
　　　　The inputs and output of the adder are all 18 bits wide, and the adder is described in VHDL. It has a delay of two working clocks: once the input data is ready, the first clock produces the addition result, and the second clock latches the result for output.

Multiplier
　　　　The multiplier has 18-bit inputs and a 36-bit output, and is implemented with the library component MULT18X18S and a 36-bit latch. MULT18X18S is an 18×18-bit hardware multiplier built into the XC2V1000 that completes a multiplication in a single clock. The 36-bit latch works on the rising edge of the clock and is described in VHDL. The multiplier (mult18) also has a delay of two working clocks: once the input data is ready, the first clock produces the multiplication result, and the second clock latches and outputs it. Both the adder and the multiplier use a latched output structure. Although this adds a delay of one working clock, it helps the decimation filter operate stably and improves reliability.

Accumulator
　　　　The 36-bit accumulator accumulates the outputs of the multiplier to obtain the filtering result. It has a control port, clr. When clr is high, the result of the previous round of accumulation is output, the accumulator is initialized, and a new round of accumulation begins; when clr is low, accumulation continues. The accumulator is described in VHDL.
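The clr behaviour can be illustrated with a small behavioural model in Python. This is a sketch, not the authors' VHDL: the class and method names are hypothetical, and it assumes that on a clr cycle the new round starts by loading the product present at the input, which is one reading of "initialized, and a new round of accumulation begins".

```python
def wrap36(value):
    """Wrap an integer into the signed 36-bit two's-complement range."""
    half = 1 << 35
    return (value + half) % (1 << 36) - half


class Acc36:
    """Hypothetical behavioural model of the 36-bit accumulator with clr."""

    def __init__(self):
        self.total = 0   # running sum of the current round
        self.result = 0  # latched result of the previous round

    def clock(self, product, clr):
        if clr:
            self.result = self.total      # release the finished round
            self.total = wrap36(product)  # assumed: new round starts from this product
        else:
            self.total = wrap36(self.total + product)
        return self.result


acc = Acc36()
acc.clock(10, clr=True)        # round starts with h(0)*(...) = 10
acc.clock(20, clr=False)       # add h(1)*(...) = 20
print(acc.clock(5, clr=True))  # 30
```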

Data truncation
　　　　The data truncation module is described in VHDL and selects which 18 bits of the accumulator's 36-bit output are kept. Normally the low-order bits are discarded and the high-order bits are retained. For the functional simulation in this paper, however, the high 18 bits are discarded and the low 18 bits are retained.
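The two truncation choices can be compared with a short Python sketch. The function names are hypothetical. The example suggests one plausible reason for keeping the low bits in simulation: small integer test results pass through unchanged, whereas keeping the high bits would scale them down to zero.

```python
def keep_low_18(acc):
    """Keep the low 18 bits of a 36-bit value, interpreted as signed."""
    low = acc & 0x3FFFF                 # mask bits 17..0
    return low - (1 << 18) if low & (1 << 17) else low


def keep_high_18(acc):
    """Keep the high 18 bits of a 36-bit value (arithmetic shift right by 18)."""
    return acc >> 18


print(keep_low_18(15), keep_high_18(15))            # 15 0
print(keep_low_18(1 << 20), keep_high_18(1 << 20))  # 0 4
```

Keeping the low bits is only exact while the accumulated value fits in 18 signed bits; larger results, as in the second line, lose their high-order information.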

Working process and functional simulation
　　　　The following walks through one complete decimation filtering operation to illustrate how the filter works. Assume that the controller has received the data x(n-3), x(n-2), x(n-1) and x(n) at clocks 1, 2, 3 and 4 respectively. Then:

Clock 5: the controller sends data x(n) and x(n-3) to the adder;
Clock 6: the adder performs the operation x(n)+x(n-3); the controller sends data x(n-1) and x(n-2) to the adder;
Clock 7: the adder performs the operation x(n-1)+x(n-2) and outputs the result x(n)+x(n-3); the controller sends the coefficient h(0) to the multiplier;
Clock 8: the adder outputs the result x(n-1)+x(n-2); the multiplier performs the operation h(0)[x(n)+x(n-3)]; the controller sends the coefficient h(1) to the multiplier;
Clock 9: the multiplier performs the operation h(1)[x(n-1)+x(n-2)] and outputs the result h(0)[x(n)+x(n-3)]; the controller sends a control signal to the accumulator (clr is high);
Clock 10: the multiplier outputs the result h(1)[x(n-1)+x(n-2)]; the accumulator is initialized and accumulation begins; the controller sends a control signal to the accumulator (clr is low);
Clock 11: the accumulator performs the accumulation h(0)[x(n)+x(n-3)]+h(1)[x(n-1)+x(n-2)]; the controller sends a control signal to the accumulator (clr is high) and asserts the output-data-valid signal (valid is high);
Clock 12: the accumulator outputs the accumulated result h(0)[x(n)+x(n-3)]+h(1)[x(n-1)+x(n-2)] and is initialized to start a new round of accumulation; the controller deasserts the output-data-valid signal (valid is low).

　　　　This completes one decimation filtering operation. It takes 8 working clocks from the input of the data x(n) to the output of the filtered result y(n). If the controller continuously sends taps, coefficients and control signals to the adder, multiplier and accumulator, a pipeline is formed, and the decimation filter then outputs one filtered result every two clocks.
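The latency and throughput figures can be checked with a small scheduling sketch in Python. The clock offsets below are read off the walkthrough (adder busy at clocks 6 and 7, multiplier at clocks 8 and 9, result at clock 12, for x(n) received at clock 4); the function and constant names are hypothetical, and the sketch only counts resource usage rather than simulating the data path.

```python
from collections import Counter

# Offsets relative to the clock at which x(n) is received (assumed from the walkthrough).
ADD_OFFSETS = (2, 3)
MUL_OFFSETS = (4, 5)
OUTPUT_OFFSET = 8


def schedule(num_windows, first_arrival=4, decimation=2):
    """Return per-clock adder and multiplier usage and the output clocks."""
    adder, mult, outputs = Counter(), Counter(), []
    for k in range(num_windows):
        t0 = first_arrival + decimation * k  # arrival clock of the newest sample
        for off in ADD_OFFSETS:
            adder[t0 + off] += 1
        for off in MUL_OFFSETS:
            mult[t0 + off] += 1
        outputs.append(t0 + OUTPUT_OFFSET)
    return adder, mult, outputs


adder, mult, outputs = schedule(3)
print(outputs)                                   # [12, 14, 16]
print(sorted(adder))                             # [6, 7, 8, 9, 10, 11]
print(max(adder.values()), max(mult.values()))   # 1 1
```

Under these assumptions each module is used at most once per clock and the adder is busy on every consecutive clock, which is consistent with one result every two clocks from a single adder and a single multiplier.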

Two points to note
(1) When two n-bit binary numbers are added, their sum needs at least n+1 bits to be represented correctly. The adder inputs and output in this design are all 18 bits wide. To prevent the adder from overflowing, the two highest bits of the 18-bit input data x(n) should be identical (both are sign bits).
(2) To allow decimation filters to be cascaded in multiple stages, the timing of the input-data-valid signal enable and of the output-data-valid signal valid should follow the same convention. This design stipulates that the controller asserts the valid signal on the clock after the accumulator outputs the filter result, and holds it for one working clock cycle.
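Point (1) can be verified numerically. In the Python sketch below (the helper name is hypothetical), an 18-bit value whose two highest bits are identical is treated as a 17-bit signed value, and the extreme sums are checked against the 18-bit signed range.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed two's-complement number of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


# Two identical top bits in an 18-bit word mean the value fits in 17 signed bits.
extremes_17 = [(1 << 16) - 1, -(1 << 16)]
print(all(fits_signed(a + b, 18) for a in extremes_17 for b in extremes_17))  # True

# Without the restriction, the largest 18-bit inputs overflow an 18-bit sum.
largest_18 = (1 << 17) - 1
print(fits_signed(largest_18 + largest_18, 18))  # False
```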

Design features
　　　　A decimation filter implemented with this design structure has the following three features:
(1) It saves on-chip resources and improves resource utilization. Filters generated from IP cores often cannot use on-chip resources in a way that suits the actual application, which wastes resources. This design adopts a pipeline structure in which all functional modules work at full capacity with no idle clocks, thereby saving on-chip resources and improving resource utilization.
(2) A multi-stage decimation filter structure can be realized. Given the output characteristics of the decimation filter, the same design method can be used to design a further decimation stage that decimates and filters the output of the previous stage again, thereby realizing a multi-stage decimation filter.
(3) The design is flexible and highly scalable. Storing taps and coefficients in registers suits filters of low order. If a decimation filter with an order in the hundreds is required, it is best to store the taps and coefficients in the XC2V1000's on-chip RAM, which needs only slight changes to the controller logic. On this basis, a programmable decimation filter can also be realized.

Conclusion
　　　　Taking a 3rd-order, linear-phase FIR decimation filter with a decimation rate of 2 as an example, this paper introduces a design method for implementing an FIR decimation filter using Xilinx's XC2V1000. A decimation filter designed with this method is flexible and makes efficient use of resources, and can be widely used in digital receivers.

叫我国王大人

This is still needed by the hall. I just happened to see it again recently.
