Design and implementation of efficient FIR filter based on FPGA-EEWORLD

Abstract: A design method for FPGA-based digital filters is presented. An FIR filter that meets the specified requirements is first designed in MATLAB. The filter coefficients are then processed so that they are easy to implement on an FPGA. Finally, the filter is built with a structure based on distributed arithmetic (DA) and CSD coding, which avoids multiplication operations and saves hardware resources. Pipelining further increases the operating speed. Matlab and Modelsim simulations show that the design functions correctly and achieves fast filtering.

0 Introduction

Digital filters play an important role in speech and image processing, pattern recognition, radar signal processing, spectrum analysis, and other applications. They avoid problems such as temperature drift and noise that analog filters cannot overcome, and they are more accurate, more stable, more compact, and more flexible than analog filters, so they are widely used. In acoustic well logging, the signal usually has to be filtered accurately, and the filter has strict real-time requirements. With the aid of Matlab design tools, this paper designs a high-order, fast, FPGA-based digital filter that meets these logging requirements.

1 Linear Phase FIR Filter Structure

There are many types of digital filters and several ways of classifying them. In terms of the unit impulse response, digital filters are divided into finite impulse response (FIR) filters and infinite impulse response (IIR) filters. Compared with IIR filters, FIR filters can be designed with exactly linear phase, and their structure remains stable after the filter coefficients are quantized. For acoustic logging, where the acoustic signals must be processed with linear phase, FIR filters are the first choice.

In the time domain, the FIR filter computes the linear convolution of the input signal with the unit impulse response. Its difference equation is:

$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k) \tag{1}$$

Here, y(n) is the filter output, x(n) is the sampled data, and h(n) is the filter tap coefficient. The structure is shown in Figure 1(a). An FIR filter of order N-1 is described by N coefficients and normally requires N multipliers and N-1 two-input adders. The multiplier coefficients are exactly the coefficients of the transfer function, so this structure is called the direct form.
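
As a minimal sketch of the direct form, the convolution sum can be written out tap by tap. The function name `fir_direct` and the sample values are illustrative assumptions, and samples before the start of the input are taken as zero.

```python
def fir_direct(h, x):
    """Direct-form FIR: y(n) = sum over k of h(k) * x(n - k).

    Samples before the start of x are assumed to be zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):          # one multiply-accumulate per tap
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Illustrative values only (not taken from the paper).
    print(fir_direct([1, 2, 3], [1, 0, 0, 2]))
```

Each output sample costs one multiplication per tap, which is the cost the later sections set out to remove.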

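For a linear-phase FIR filter with symmetric coefficients, h(k) = ±h(N-1-k), equation (1) can be rewritten (for even N) as:

$$y(n) = \sum_{k=0}^{N/2-1} h(k)\,\bigl[x(n-k) \pm x(n-N+1+k)\bigr] \tag{2}$$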

The improved structure for symmetric coefficients is shown in Figure 1(b). It first adds (or subtracts) the samples at taps whose coefficients are equal (or opposite) and then multiplies the result by the shared coefficient. This halves the number of multipliers at the cost of a pre-adder in front of each multiplier.
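
The saving can be checked with a short sketch that compares the folded structure against the direct form. It assumes an even number of taps and even symmetry, h(k) = h(N-1-k). The function names and test values are illustrative.

```python
def fir_direct(h, x):
    """Reference direct form: one multiplication per tap."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_folded(h, x):
    """Folded form for even-length, evenly symmetric h.

    Mirrored samples are added first, then multiplied once by the
    shared coefficient, so only len(h) // 2 multiplications are needed.
    """
    n_taps = len(h)
    if n_taps % 2 != 0 or h != h[::-1]:
        raise ValueError("expects an even-length symmetric coefficient list")

    def sample(i):
        return x[i] if i >= 0 else 0     # zero before the start of the input

    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(n_taps // 2):
            pre_add = sample(n - k) + sample(n - (n_taps - 1 - k))
            acc += h[k] * pre_add
        y.append(acc)
    return y


if __name__ == "__main__":
    h = [1, -2, 3, 3, -2, 1]             # illustrative symmetric taps
    x = [3, -1, 4, 1, -5, 9, 2, -6]
    print(fir_folded(h, x) == fir_direct(h, x))
```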

Figure 1 FIR filter structure

2 Design methods and indicators

FDATool is a dedicated filter design and analysis tool in the Matlab Signal Processing Toolbox. Its main function is to produce filter coefficients from the design specifications. The key to designing a digital filter with FDATool lies in choosing the filter type, window function, filter order, and cutoff frequencies. The window function determines the stopband attenuation and the transition bandwidth. Commonly used windows include the rectangular, Hanning, Hamming, and Blackman windows. The rectangular and Hanning windows give low stopband attenuation, while the Blackman window gives a wide transition band. The Hamming window matches the design requirements best: its minimum stopband attenuation reaches 54.5 dB, and its normalized transition bandwidth is 3.11π/M (filter order N = 2M+1). For acoustic logging signals, the parameters listed in Table 1 should be set during design.

Table 1 Filter parameter selection
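
The windowed design that the tool performs can be sketched directly. The example below is a plain windowed-sinc band-pass design with a Hamming window, not FDATool itself. The 5 kHz to 18 kHz passband and the 128 taps come from the text, whereas the 64 kHz sampling rate is a hypothetical value chosen only for illustration.

```python
import math


def hamming_bandpass(num_taps, f_low, f_high, fs):
    """Windowed-sinc band-pass coefficients with a Hamming window."""
    centre = (num_taps - 1) / 2.0
    w_lo = 2.0 * math.pi * f_low / fs
    w_hi = 2.0 * math.pi * f_high / fs
    taps = []
    for n in range(num_taps):
        m = n - centre
        if m == 0:
            ideal = (w_hi - w_lo) / math.pi
        else:
            # Ideal band-pass = difference of two ideal low-pass responses.
            ideal = (math.sin(w_hi * m) - math.sin(w_lo * m)) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def gain_db(taps, f, fs):
    """Magnitude response in dB at frequency f."""
    w = 2.0 * math.pi * f / fs
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20.0 * math.log10(math.hypot(re, im))


if __name__ == "__main__":
    FS = 64e3                            # hypothetical sampling rate
    taps = hamming_bandpass(128, 5e3, 18e3, FS)
    for f in (1e3, 10e3, 25e3):
        print(f, round(gain_db(taps, f, FS), 1))
```

The coefficients come out symmetric about the centre, which is what gives the linear phase that the folded structure of Figure 1(b) relies on.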

Figure 2 shows the amplitude-frequency and phase-frequency responses of the filter. The phase is linear in the passband, the stopband attenuation exceeds 52 dB, and the transition bandwidth is 1.65 kHz. The tap coefficients can be quantized to fixed-point integers in the toolbox, giving a 127th-order filter with 128 coefficients for implementation on the FPGA. For a filter of this order, quantization has little effect on the stopband attenuation and the transition band.
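
The quantization step can be illustrated as follows. This is a sketch of one common scheme (scale to full range, then round), not necessarily the one the toolbox applies. The 12-bit word length and the coefficient values are assumptions made for the example.

```python
def quantize(taps, bits):
    """Scale the largest |tap| to full scale and round to signed integers.

    Returns the integer coefficients and the scale factor that was applied.
    """
    full_scale = 2 ** (bits - 1) - 1
    scale = full_scale / max(abs(t) for t in taps)
    return [int(round(t * scale)) for t in taps], scale


if __name__ == "__main__":
    # Illustrative symmetric taps and a hypothetical 12-bit word length.
    ints, scale = quantize([0.02, -0.1, 0.4, 0.4, -0.1, 0.02], 12)
    print(ints)
```

Because symmetric coefficients are rounded identically, the quantized filter keeps its symmetry and therefore its linear phase.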

Figure 2 Filter amplitude-frequency and phase-frequency response characteristic curves

3 FPGA-based filter design

The key to designing FIR filters on an FPGA is how to handle the multipliers, which occupy a large amount of resources. Distributed arithmetic (DA) converts multiplications into shift-add operations and thereby saves hardware resources. If H_k is the filter coefficient, x_k(n) = x(n-k) is the sample input at time n, and y(n) is the system response at time n, then equation (1) is equivalent to:

$$y(n) = \sum_{k=0}^{N-1} H_k\,x_k(n) \tag{3}$$

If the input data are represented in two's complement form, then:

In this formula, x_kb(n) is a binary digit that is either 0 or 1, and x_k0(n) is the sign bit: 1 means the data value is negative and 0 means it is positive. Substituting (4) into (3) yields:
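
One form of equations (4) and (5) that is consistent with these definitions is sketched below. It assumes a B-bit fractional two's complement format in which x_{k0}(n) is the sign bit and x_{kb}(n) carries the weight 2^{-b}; this bit ordering is an assumption. Scaling by 2^{B-1} gives the equivalent integer form, in which every weight is a non-negative power of two and is therefore realized as a left shift.

```latex
x_k(n) = -x_{k0}(n) + \sum_{b=1}^{B-1} x_{kb}(n)\,2^{-b} \tag{4}

y(n) = \sum_{b=1}^{B-1} 2^{-b}\left[\sum_{k=0}^{N-1} H_k\,x_{kb}(n)\right]
       - \sum_{k=0}^{N-1} H_k\,x_{k0}(n) \tag{5}
```

Equation (5) follows from inserting (4) into the sum of products over H_k x_k(n) and exchanging the order of the two summations.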

The form of equation (5) is called distributed arithmetic. The term in square brackets ANDs one bit of each input variable with the corresponding filter tap coefficient, H_0 to H_{N-1}, and sums the results. The power of two describes the bit weight of that sum. Multiplying an integer by 2^b is a left shift by b bits, which can be done through hardware wiring without occupying logic resources. The operation in the square brackets can therefore be implemented with a lookup table that is addressed by the same bit of all input variables. This is the lookup-table-based distributed arithmetic (LUT-DA).
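
A small behavioural sketch of LUT-DA is given below. It assumes integer two's complement samples with bit 0 as the least significant bit and the most significant bit as the sign bit, which differs from the bit naming used in the text. The function names and values are illustrative.

```python
def build_lut(coeffs):
    """Table entry `addr` holds the sum of coefficients whose address bit is 1."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(2 ** n)]


def da_dot(coeffs, samples, bits):
    """Bit-serial inner product of coeffs and signed `bits`-wide samples."""
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # The address is formed from bit b of every sample.
        addr = 0
        for k, s in enumerate(samples):
            addr |= (((s & mask) >> b) & 1) << k
        partial = lut[addr] << b                  # weight 2^b is a pure shift
        # The sign bit of two's complement carries a negative weight.
        acc += -partial if b == bits - 1 else partial
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]               # illustrative 4-tap example
    samples = [-8, 7, -1, 4]             # 4-bit signed samples
    print(da_dot(coeffs, samples, 4))
    print(sum(c * s for c, s in zip(coeffs, samples)))
```

No multiplication appears in `da_dot`; the result matches the ordinary sum of products.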

The lookup table size of the LUT-DA algorithm is B·2^N bits, where B is the bit width of the input data and N is the number of filter taps. The table therefore grows exponentially with the number of taps; when B is 16 and N is 128, the table is impractically large. Dividing the lookup table into several sub-tables solves this problem effectively and leads to the serial and parallel LUT-DA algorithms, but both have shortcomings. A serial structure needs more than B clock cycles to complete one output. A parallel structure completes one output per clock cycle but has to replicate B identical LUTs, which increases the hardware resource overhead.
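
The effect of partitioning can be seen from a short calculation that uses the B·2^N size measure from the text. The sub-table size of 4 inputs is a hypothetical choice. The outputs of the sub-tables still have to be added together, which the sketch does not count.

```python
def lut_bits(data_width, taps):
    """Table size in bits according to the B * 2^N measure."""
    return data_width * 2 ** taps


def partitioned_lut_bits(data_width, taps, taps_per_table):
    """Total size when the taps are split into equal sub-tables."""
    if taps % taps_per_table != 0:
        raise ValueError("taps must be a multiple of taps_per_table")
    tables = taps // taps_per_table
    return tables * lut_bits(data_width, taps_per_table)


if __name__ == "__main__":
    print(lut_bits(16, 128))                 # single table, B = 16, N = 128
    print(partitioned_lut_bits(16, 128, 4))  # 32 sub-tables of 4 inputs each
```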

To balance speed and area, this paper designs a CSD-DA algorithm based on the DA principle. First, the fixed coefficient H_k in equation (3) is expanded in powers of 2 to obtain:

Swapping the order of shifting and accumulation then gives:
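
A form of these two expansions that agrees with the symbol definitions that follow can be sketched as below. The coefficient magnitude word length W is an assumed symbol that does not appear in the text.

```latex
H_k = S_k \sum_{b=0}^{W-1} H_{kb}\,2^{b} \tag{6}

y(n) = \sum_{k=0}^{N-1} S_k \sum_{b=0}^{W-1} H_{kb}\,2^{b}\,x_k(n)
     = \sum_{b=0}^{W-1} 2^{b}\left[\sum_{k=0}^{N-1} s'_{kb}\,x_k(n)\right],
     \qquad s'_{kb} = S_k H_{kb} \tag{7}
```

The first equality in (7) is equation (6) inserted into the sum of products over H_k x_k(n); the second only exchanges the two summations.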

Here, H_kb is a weight coefficient with a value of 0 or 1; S_k is 1 if H_k is positive and -1 if H_k is negative; and s′_kb can be 0, -1, or 1. After this expansion, all multiplications are converted into shift-add operations, and the terms with a weight of 0 can be dropped without any computation.

To further reduce the number of non-zero entries in the H_kb array, H_k can be encoded in CSD (canonical signed digit) form: starting from the least significant bit of the binary code, every run of two or more consecutive 1s is replaced by 10···0(-1), where the trailing digit has the value -1. Since any two adjacent digits in the CSD representation must contain a 0, at most about half of the digits can be non-zero. On average, about 1/3 of the digits in the CSD representation are non-zero, which is about 1/3 fewer than in the two's complement representation. For example, let h = (15)_10 = (01111)_2, so that y = hx = x·(2^3 + 2^2 + 2^1 + 2^0). If (15)_10 is encoded as (1 0 0 0 -1)_CSD, then y = x·(2^4 - 2^0). Binary encoding needs three adders, whereas CSD encoding needs only one subtractor, so CSD encoding reduces the hardware resource overhead. After CSD optimization, the number of non-zero values of s′_kb is much smaller than the number of non-zero values of H_kb.
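
The recoding can be sketched with the standard non-adjacent-form algorithm, which produces the same digits as the replacement rule described above. The function names are illustrative, and digits are returned least significant first.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0, or 1) of an integer, LSB first."""
    digits = []
    while value != 0:
        if value & 1:
            # +1 if value % 4 == 1, -1 if value % 4 == 3 (start of a run of 1s)
            d = 2 - (value & 3)
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the CSD-coded constant using shifts, adds, and subtracts."""
    acc = 0
    for b, d in enumerate(digits):
        if d == 1:
            acc += x << b
        elif d == -1:
            acc -= x << b
    return acc


if __name__ == "__main__":
    digits = to_csd(15)
    print(digits)                    # LSB first: -1 at 2^0, +1 at 2^4
    print(csd_multiply(7, digits))   # 7 * (2^4 - 2^0)
```

For h = 15 the sketch yields a single subtraction, x·2^4 - x, in line with the example in the text.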

For linear-phase FIR filters with symmetric coefficients, the structure shown in Figure 3 can be used to reduce the number of multiplication units. Because every multiplication is converted into a large number of additions and subtractions, the critical path becomes long and the system can only run at a low clock frequency. Inserting pipeline registers shortens the critical path and thereby raises the maximum operating frequency of the system. For a given bit position b, the number of non-zero values of s′_kb varies, so the pipeline stages can be partitioned flexibly according to s′_kb: the longer the path, the more pipeline registers are inserted. To prevent intermediate results from overflowing, the registers must be given extra bits. For signed numbers, the bit width is M + log2(N) - 1, where M is the bit width of the previous-stage accumulator and N is the filter order.

Figure 3 Local structure of pipeline CSD-DA algorithm

The pipeline-optimized CSD-DA structure in Figure 3 shows that all multiplications are converted into shift-add operations, that the shifts can be implemented by hardware wiring, and that the entire structure has been divided into reasonable pipeline stages.
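
The arithmetic carried out by such a structure can be sketched behaviourally: for every bit position, the samples selected by the non-zero CSD digits are added or subtracted, and the partial sum is then shifted and accumulated. Pipeline registers and the pre-addition of symmetric taps are not modelled, and all names and values are illustrative assumptions.

```python
def to_csd(value):
    """CSD digits (-1, 0, or 1) of an integer, least significant first."""
    digits = []
    while value != 0:
        d = 2 - (value & 3) if value & 1 else 0
        value -= d
        digits.append(d)
        value >>= 1
    return digits


def csd_da_output(coeffs, window):
    """One output sample: y = sum over b of 2^b * (sum over k of s'_kb * x_k)."""
    digit_rows = [to_csd(h) for h in coeffs]
    width = max((len(row) for row in digit_rows), default=0)
    y = 0
    for b in range(width):
        plane = 0
        for digits, x in zip(digit_rows, window):
            if b < len(digits) and digits[b] != 0:
                # Only additions and subtractions inside a bit plane.
                plane += x if digits[b] > 0 else -x
        y += plane << b                 # the bit weight is a wired shift
    return y


if __name__ == "__main__":
    coeffs = [15, -5, 7, 7, -5, 15]      # illustrative integer coefficients
    window = [3, -2, 8, 1, 4, -6]        # x_k(n) for k = 0..5
    print(csd_da_output(coeffs, window))
    print(sum(h * x for h, x in zip(coeffs, window)))
```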

Table 2 lists the synthesis results of filters with different structures. The parallel structure performs worst: it occupies the most resources and is slow. The serial LUT-DA structure occupies few resources and has a high maximum operating frequency, but because it is serial it cannot complete the filtering of one sample in a single clock cycle. The pipelined CSD-DA structure has clear advantages in both speed and area. With a 75 MHz working clock, one output is completed per clock cycle, so a single-channel signal of 330 samples is processed in only 4.4 μs, which meets the real-time requirements of well logging.

Table 2 Synthesis results of the filters
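
The quoted processing time follows from the clock rate and the sample count. The short calculation below reproduces it and ignores the pipeline fill latency. The comparison with a serial structure assumes 16 cycles per output, a hypothetical figure based on the 16-bit example used earlier.

```python
def processing_time_us(samples, clock_hz, cycles_per_output=1):
    """Time in microseconds to produce `samples` outputs (pipeline latency ignored)."""
    return samples * cycles_per_output / clock_hz * 1e6


if __name__ == "__main__":
    print(processing_time_us(330, 75e6))        # pipelined: one output per cycle
    print(processing_time_us(330, 75e6, 16))    # hypothetical 16-cycle serial case
```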

4 Results Analysis

To verify that the filter functions correctly, the design was simulated in Modelsim. With a noisy acoustic signal as the input waveform, the filtering result is shown in Figure 4.

Figure 4 Simulation results of the filter in Modelsim

Figure 5 shows the simulation results of the filter in Matlab. The Modelsim and Matlab results are consistent. In the frequency domain, comparing Figure 5(a) with Figure 5(b) shows that the filtered waveform retains only the 5 kHz to 18 kHz part of the spectrum, which confirms that the digital filter with the pipelined CSD-DA structure is designed correctly.

Figure 5 Simulation results of the filter in Matlab

5 Conclusion

This article describes in detail a method for designing linear-phase FIR filters with Matlab tools and, for acoustic signals, designs a pipelined CSD-DA structure that outperforms traditional structures in both speed and area. The soundness and correctness of the design are verified by simulation. It should be noted, however, that this structure is suitable only when the filter coefficients are fixed. If the coefficients change, they must be CSD-encoded again and the pipeline stages must be re-partitioned.

Keywords: FPGA, Matlab, Modelsim
