Principles and Applications of FPGA and Specialized DSP
An FIR filter (Figure 1) stores a series of n data samples in a delay line, each delayed by one more clock cycle than the previous one. These delayed samples are usually called taps. Each tap is multiplied by a coefficient, and the products are summed to produce the output. Some implementations perform all n multiplications in parallel. A more general approach divides the multiplications into N stages, using accumulators to pass partial sums from one stage to the next. Such implementations trade speed for functional resources: each output takes N computational stages but needs only n/N multipliers. A number of other common design optimizations exist, depending on whether the coefficients are static or dynamic and on the specific coefficient values in the design.
Figure 1. Typical FIR filter implementation
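The two FIR structures described above can be compared with a small behavioral model. This is a minimal sketch, assuming floating-point arithmetic and ignoring pipelining and fixed-point word lengths; the function names `fir_direct` and `fir_folded` are hypothetical. `fir_folded` splits the n taps into N stages of n/N multiplications each and carries the partial sum in an accumulator, so it produces the same output as the fully parallel version.

```python
def fir_direct(x, h):
    """Fully parallel FIR: y[k] = sum_i h[i] * x[k - i], with x[j] = 0 for j < 0."""
    n = len(h)
    taps = [0.0] * n  # delay line: taps[i] holds x[k - i]
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]  # shift the new sample into the delay line
        # all n multiplications would happen in the same clock cycle in hardware
        y.append(sum(c * t for c, t in zip(h, taps)))
    return y


def fir_folded(x, h, stages):
    """Same filter computed in `stages` passes, using len(h) // stages multipliers per pass."""
    n = len(h)
    if n % stages:
        raise ValueError("tap count must be divisible by the number of stages")
    per_stage = n // stages  # multipliers needed: n / N
    taps = [0.0] * n
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]
        acc = 0.0  # accumulator carries the partial sum between stages
        for s in range(stages):
            lo = s * per_stage
            # per_stage multiplications share this stage's multipliers
            acc += sum(h[i] * taps[i] for i in range(lo, lo + per_stage))
        y.append(acc)
    return y
```

For example, `fir_direct([1, 0, 0, 0, 0], [1, 2, 3, 4])` returns the impulse response `[1, 2, 3, 4, 0]`, and `fir_folded` with `stages=2` returns the same values while modeling only two multipliers.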
Implementation methods
FFTs are used in a wide range of applications, from image compression to determining the spectral content of sampled data. There are many ways to implement an FFT. The most common is the Cooley-Tukey decimation-in-time algorithm, which decomposes an FFT into several smaller FFTs. The simplest implementation uses a radix-2 butterfly unit (Figure 2), in which one of the inputs must first be multiplied by a twiddle factor. The computation is conceptually simple; however, all the multiplications and additions on the left side of the figure operate on complex numbers, and expressing them as the real multiplications and additions that the hardware actually performs is considerably more involved (as shown on the right side of the figure).
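The following sketch shows a recursive radix-2 decimation-in-time FFT whose butterfly spells out the complex twiddle multiplication in real arithmetic (four real multiplications and two real additions/subtractions, plus the real additions of the butterfly itself). It is an illustrative model, not a hardware design; the helper names are hypothetical and the input length is assumed to be a power of two.

```python
import cmath
import math


def complex_mul_real(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) using 4 real multiplies and 2 real adds/subtracts."""
    return ar * br - ai * bi, ar * bi + ai * br


def butterfly(a, b, w):
    """Radix-2 DIT butterfly: returns (a + w*b, a - w*b) using real arithmetic only."""
    tr, ti = complex_mul_real(b.real, b.imag, w.real, w.imag)  # twiddle multiply
    return complex(a.real + tr, a.imag + ti), complex(a.real - tr, a.imag - ti)


def fft_radix2(x):
    """Recursive decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * math.pi * k / n)  # twiddle factor W_n^k
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out
```

For the input `[1, 2, 3, 4]`, the result is approximately `[10, -2+2j, -2, -2-2j]`, matching a direct DFT.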
IIR filters are similar to FIR filters except that they introduce feedback paths. These feedback paths make IIR filters more complicated to design and analyze than FIR filters. However, for the same silicon area, an IIR implementation can provide a sharper filter response. Although there are several IIR structures, a common one is the second-order biquad (biquadratic) section (Figure 3).
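A second-order biquad section can be sketched in direct form I as below; the feedback terms `a1` and `a2` are what distinguish it from an FIR filter. This is a floating-point sketch under the assumption that the coefficients are normalized so that a0 = 1; the names `biquad` and `cascade` are hypothetical. Higher-order IIR filters are commonly built by cascading such sections, which `cascade` illustrates.

```python
def biquad(x, b, a):
    """Second-order IIR section, direct form I.

    y[k] = b0*x[k] + b1*x[k-1] + b2*x[k-2] - a1*y[k-1] - a2*y[k-2]
    b = (b0, b1, b2) feedforward, a = (a1, a2) feedback, a0 assumed to be 1.
    """
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0  # delay elements: two feedforward, two feedback
    y = []
    for xk in x:
        yk = b0 * xk + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xk  # shift input delay line
        y2, y1 = y1, yk  # shift output (feedback) delay line
        y.append(yk)
    return y


def cascade(x, sections):
    """Run the signal through several biquad sections in series."""
    for b, a in sections:
        x = biquad(x, b, a)
    return x
```

With `b = (1, 0, 0)` and `a = (-0.5, 0)`, the impulse response `biquad([1, 0, 0, 0], ...)` is `[1.0, 0.5, 0.25, 0.125]`: each output feeds back half of the previous one.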
Many applications use mixers to shift signal frequencies. Conceptually, a mixer is a single multiplier; in digital applications, however, there are many advantages to working in complex form. The most general form represents the signal as in-phase (I) and quadrature (Q) components.
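A complex (I/Q) mixer multiplies the signal by a complex local oscillator. Written out in real arithmetic, (I + jQ)(cos θ - j sin θ) = (I cos θ + Q sin θ) + j(Q cos θ - I sin θ), so each output sample costs four real multiplications. The sketch below assumes the oscillator phase is computed directly with `math.cos`/`math.sin` rather than from a lookup table; `mix_down_iq` is a hypothetical name.

```python
import math


def mix_down_iq(i_samples, q_samples, f_shift, fs):
    """Shift an I/Q signal down by f_shift Hz at sample rate fs."""
    i_out, q_out = [], []
    for n, (i, q) in enumerate(zip(i_samples, q_samples)):
        theta = 2 * math.pi * f_shift * n / fs  # local-oscillator phase
        c, s = math.cos(theta), math.sin(theta)
        # (i + jq) * (c - js) = (i*c + q*s) + j(q*c - i*s)
        i_out.append(i * c + q * s)
        q_out.append(q * c - i * s)
    return i_out, q_out
```

Mixing a 1 kHz complex tone sampled at 8 kHz down by 1 kHz moves it to DC: the I output stays at about 1 and the Q output at about 0.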
DSP Selection
Across these common functions, the core operations of most DSP applications are multiplication, addition, subtraction, and accumulation. General-purpose DSP chips, which combine multipliers with a general-purpose microprocessor architecture, can implement these functions efficiently. Such chips usually have 1 to 4 multipliers, and the processor sequences data through the multipliers and other functional units, storing intermediate results in memory or accumulators. Performance is improved mainly by raising the clock speed at which the multipliers run; typical clock speeds range from tens of MHz to 1 GHz. Performance is measured in MMACs (millions of multiply-accumulates per second), with typical values from about 10 to 4000.
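The processor-style execution described above can be modeled as a loop that issues at most a few multiply-accumulates per cycle into a single accumulator. This is a simplified sketch, assuming one MAC per multiplier per cycle and no memory stalls; `dsp_fir_sample` and `peak_mmacs` are hypothetical names. With 1 to 4 multipliers at tens of MHz to 1 GHz, the peak figure ranges from roughly 10 to 4000 MMACs.

```python
def dsp_fir_sample(taps, coeffs, num_multipliers):
    """Model one FIR output on a DSP with a few multipliers and one accumulator.

    Returns (result, cycles): up to `num_multipliers` MACs are issued per cycle
    until every tap has been processed.
    """
    acc = 0
    cycles = 0
    for start in range(0, len(coeffs), num_multipliers):
        block = range(start, min(start + num_multipliers, len(coeffs)))
        acc += sum(coeffs[i] * taps[i] for i in block)  # MACs issued this cycle
        cycles += 1
    return acc, cycles


def peak_mmacs(num_multipliers, clock_mhz):
    """Peak throughput in MMAC/s: one MAC per multiplier per clock cycle."""
    return num_multipliers * clock_mhz
```

A 64-tap filter on a 4-multiplier processor needs 16 cycles per output sample (`dsp_fir_sample([1] * 64, [1] * 64, 4)` returns `(64, 16)`), and `peak_mmacs(4, 1000)` gives 4000.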
For higher performance, multiple DSP engines must be combined in parallel. The main advantage of the DSP-processor approach is that algorithms written in a high-level programming language such as C can be implemented directly.
DSP-oriented FPGAs can implement many functions in parallel on a single chip. General-purpose routing, logic, and memory resources interconnect the functions, perform additions, and sequence and store data. Some basic devices provide only multiplier support and require the user to build the other arithmetic functions in logic. More advanced devices include addition, subtraction, and accumulation as part of the DSP building block. FPGAs typically have dozens of multiplier units and can run at clock frequencies of hundreds of MHz.
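A rough way to compare the two approaches is to bound the input sample rate an n-tap FIR filter can sustain when every multiplier performs one MAC per clock. The device parameters below are illustrative, hypothetical values chosen to fall within the ranges quoted in the text, not datasheet figures, and the model ignores routing, pipelining, and adder-tree limits.

```python
def max_fir_sample_rate_msps(num_multipliers, clock_mhz, num_taps):
    """Upper bound on FIR input rate (Msamples/s), one MAC per multiplier per cycle."""
    mmacs = num_multipliers * clock_mhz  # peak multiply-accumulate throughput
    return mmacs / num_taps              # each output sample needs num_taps MACs


# Hypothetical parameters for illustration only.
dsp = max_fir_sample_rate_msps(num_multipliers=4, clock_mhz=1000, num_taps=64)   # 62.5
fpga = max_fir_sample_rate_msps(num_multipliers=48, clock_mhz=300, num_taps=64)  # 225.0
```

Even at a lower clock, the larger number of parallel multipliers gives the FPGA the higher bound in this example, which is the trade-off the text describes.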