Designing Software Radios and Modems with Field Programmable Gate Arrays (FPGAs)
FPGAs are often compared with DSP chips. Although FPGAs can easily implement complex logic functions such as convolutional encoders, they have serious shortcomings when a design calls for a large amount of complex arithmetic. Even a matrix multiplier built in the fastest FPGA cannot match the cost and performance of a DSP chip that sells for only $5. The DSP is still the chip of choice when designing with CAD tools, but with the application of distributed arithmetic (DA), FPGAs are once again popular with designers.
One characteristic of the FPGA is its flexible structure. The functional blocks of radio and modem data paths map easily onto independent, parallel hardware nodes. On a digital signal processor, which can only run in time-shared mode, scheduling several time-critical tasks requires very complex programming; an FPGA avoids this problem.
We will introduce the features of the FPGA while designing a 16-QAM RF transmit data pump, and describe in detail how the functional blocks of the data path convert into logic for the Xilinx 4000 series FPGA, so that the amount of logic required can be estimated accurately. Designs for 16-QAM data pumps that meet the same system requirements and use the same type of FPGA have been published in the open literature, but the logic counts they report appear to be much larger than what is actually needed. That may be the result of a rush to market. Relying entirely on CAD tools does not always lead to the best solution; a good design also takes hard work, experience and creativity.
Basic structure of all digital logic
Any digital logic can be built from enough general-purpose gates such as NAND and NOR gates, and FPGAs have plenty of logic. In the Xilinx 4000 series, the logic takes the form of truth tables or, more generally, 16-word x 1-bit lookup tables (LUTs), each of which can implement any Boolean function of four input variables (the address lines of the lookup table). Because the function generated is usually equivalent to a combination of several NAND gates, the LUT is regarded as the basic logic unit. The Xilinx 4000 series configurable logic block (CLB) contains two 16-word LUTs that can be combined to generate any Boolean function of five input variables. The LUTs can also be configured as two 16 x 1 RAMs or one 32 x 1 RAM.
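The idea of a LUT as a programmable truth table can be illustrated with a short software model. This is only a sketch: the helper names (make_lut4, read_lut4, make_lut5, read_lut5) are hypothetical, and the five-input case is modeled as a simple selection between two four-input tables rather than as the actual CLB wiring.

```python
from itertools import product

def make_lut4(func):
    """Fill a 16-word x 1-bit table from any Boolean function of four inputs."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=4)]

def read_lut4(lut, a, b, c, d):
    """The four inputs act as the address lines of the table."""
    return lut[(a << 3) | (b << 2) | (c << 1) | d]

def make_lut5(func):
    """Model a five-input function as two 16-word tables, one per value of e."""
    f = make_lut4(lambda a, b, c, d: func(a, b, c, d, 0))
    g = make_lut4(lambda a, b, c, d: func(a, b, c, d, 1))
    return f, g

def read_lut5(luts, a, b, c, d, e):
    """The fifth input selects which of the two tables drives the output."""
    f, g = luts
    return read_lut4(g if e else f, a, b, c, d)

# Example: a 4-input NAND and a 5-input parity function.
nand4 = make_lut4(lambda a, b, c, d: 1 - (a & b & c & d))
parity5 = make_lut5(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
```

The same 32 bits of storage, addressed directly, are what allow the LUT pair to serve as a 32 x 1 RAM.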
The CLBs are arranged in a two-dimensional square matrix, and both the CLBs and the interconnections between them can be configured individually. The smallest device, the XC4002, contains an 8 x 8 CLB matrix, and the largest, the XC4085XL, contains a 48 x 48 CLB matrix. Each LUT feeds a flip-flop that can be clocked at up to 100 MHz.
16-QAM Modulator
The 16-QAM modulator includes the key functional blocks of the RF transmit data pump (see Figure 1). The 20-Mbps serial data is divided into 4-bit symbol groups and sent in parallel to a differential encoder and symbol mapper at a rate of 5 megasymbols per second. The mapper produces pairs of 3-bit quadrature components. These component pairs are pulse-shaped by a pair of square root raised cosine filters, interpolated to 20 megasamples per second, and modulated onto a 5 MHz carrier. The outputs are summed and converted to analog. The key to the design is the pair of interpolating pulse shaping filters.
To size the design properly, the encoding and mapping blocks and the 5 MHz modulator must also be taken into account when determining the total amount of logic.
Encoding and Symbol Mapping
To determine the amount of logic required for the encoder and signal mapper, we can look to the design of earlier standard modems. The encoder in V.32, for example, includes a differential encoder that protects against 180-degree phase ambiguity and a convolutional encoder that adds redundancy to reduce the receiver's bit error rate (BER). Both the encoder and the mapper are implemented as finite state machines, with all states held in five registers (2.5 CLBs) and the connecting logic consisting of eight two-input XOR gates (4 CLBs) and three two-input AND gates (1.5 CLBs). In this 16-QAM transmitter, a serial-to-parallel conversion register (2 CLBs) captures four 20-Mbps serial bits to form a 4-bit symbol, so the encoder only has to run at 5 megasymbols per second, a rate the CLBs handle easily. Data path control requires clocked registers along the data path, which takes fewer than 15 CLBs. The encoded 5-bit output symbol then drives the address lines of the mapper, which is simply a pair of LUTs with 3-bit outputs.
These outputs are mapped as quadrature components (I and Q) to symbol positions in a two-dimensional plane (the constellation). Only 16 of the 64 grid points represent valid symbol positions. The size of the mapper is 32 words x 3 bits x 2, or 6 CLBs. The total for these functional blocks is 31 CLBs.
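The CLB figures quoted above can be tallied in a few lines. This is a bookkeeping sketch only: the dictionary labels are informal, the data path control entry uses the stated upper bound of 15 CLBs, and the mapper size assumes one 32 x 1 RAM per CLB, as described earlier.

```python
# CLB estimates for the encoder and mapper, taken from the text.
WORDS_PER_CLB = 32  # one CLB configured as a 32 x 1 RAM

mapper_clbs = (32 / WORDS_PER_CLB) * 3 * 2  # 32 words x 3 bits x 2 outputs (I and Q)

encoder_budget = {
    "state registers (5 flip-flops)": 2.5,
    "8 two-input XOR gates": 4.0,
    "3 two-input AND gates": 1.5,
    "serial-to-parallel register": 2.0,
    "data path control (upper bound)": 15.0,
    "symbol mapper LUTs": mapper_clbs,
}

encoder_total = sum(encoder_budget.values())

for name, clbs in encoder_budget.items():
    print(f"{name:35s} {clbs:5.1f}")
print(f"{'total':35s} {encoder_total:5.1f}")
```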
Square Root Raised Cosine Filters
Square root raised cosine filters are a practical way to suppress intersymbol interference within the limited bandwidth of a transmission channel. The raised cosine spectrum is shared between the transmitter and the receiver, each of which implements a square root raised cosine filter. The filter shape and its coefficients were developed with the aid of QEDesign 1000 software. Figure 2 shows the response of a 32-tap finite impulse response (FIR) filter calculated in 12-bit fixed point. We will use the 12-bit filter model and determine its logic count (with 12-bit quantization, the QEDesign program requires only 28 symmetric coefficients, but this design will use a full 32-tap symmetric FIR filter).
Design Tips
Square root raised cosine filters are used for spectrum shaping on both the I and Q channels. With I and Q samples arriving at 5 megasamples per second, each filter produces 20 megasamples per second for the modulator, so the filter acts as a 1:4 interpolator. The corresponding computational load (using symmetric coefficients) is 2 channels x 16 symmetric taps x 20 megasamples per second = 640 million multiply-accumulate operations per second. That is significantly faster than most fixed-point DSP chips can run. The FPGA is therefore an attractive option, but a filter structure must still be chosen that maps most efficiently onto a CLB-based design.
Many configurations, or forms, of logic circuit can implement an FIR filter. The most important are the direct form (the model commonly used in software), the transposed form and its variants (which have been implemented in dedicated filter chips), and the polyphase filter (for multirate applications). However, none of these forms can exploit coefficient symmetry to reduce the number of multiplications. One trick for designing multirate filters is to plot the signal flow trajectory on the sample-coefficient plane.
The vertical axis represents the samples and the horizontal axis represents the coefficients. The data trajectory shows the response of the filter rotated by 90 degrees. Because the coefficients are symmetric, only half of the filter coefficients need to be listed. With an interpolation factor of K, K-1 zeros are inserted between the input samples, which gives a V-shaped trajectory for the 32-tap FIR. Although the input samples are spaced 200 ns apart, a new trajectory point must be produced every 50 ns.
Two computational models can be derived from this figure. The first is a variation of the transposed form, in which the products of each nonzero input sample and all 32 coefficients are added into partial sum registers. Once the 32 products have been added and a complete filter response has been output, the multiply-accumulate circuit can start on a new trajectory. Here, 32 MAC operations are performed every 200 ns. The second model is delay-and-add, which is the direct form of the FIR filter. As the filter trajectory shows, eight stored samples are required to calculate one filter response. Calculating five consecutive filter responses reveals the pattern given in Table 1.
Four consecutive 20 MHz responses can be calculated from the same group of eight input samples, and only two sets of filter coefficients are used. For the third and fourth responses of each sample group (yd and ye), the filter coefficients appear in the opposite order. Can these response equations be mapped into an efficient FPGA circuit? Of course they can! The key is to use distributed arithmetic, a technique that is not available in all current design tools. Before the response equations are implemented, some simplifications can be made.
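The structure described here can be checked numerically with a small model. The sketch below is illustrative only: the coefficients are an arbitrary symmetric placeholder set (not the square root raised cosine coefficients of the design), indices start at 0, and the function names are hypothetical. It compares a zero-stuffed direct-form filter with a version that computes four outputs from each window of eight stored samples, and confirms that two of the four coefficient sets are the other two in reverse order.

```python
K = 4        # interpolation factor (5 -> 20 megasamples per second)
TAPS = 32    # filter length

# Placeholder symmetric coefficients, h[n] == h[TAPS - 1 - n]. NOT the real filter.
h = [min(n, TAPS - 1 - n) + 1 for n in range(TAPS)]

# One coefficient set per output phase: h[p], h[p + 4], ..., h[p + 28].
PHASES = [h[p::K] for p in range(K)]

def interpolate_direct(x):
    """Insert K-1 zeros between samples, then run a plain 32-tap FIR."""
    up = []
    for s in x:
        up.append(s)
        up.extend([0] * (K - 1))
    return [
        sum(h[m] * up[n - m] for m in range(TAPS) if n - m >= 0)
        for n in range(len(up))
    ]

def interpolate_by_phase(x):
    """Compute four outputs from each window of eight stored input samples."""
    out = []
    for i in range(len(x)):
        window = [x[i - j] if i - j >= 0 else 0 for j in range(TAPS // K)]
        for coeffs in PHASES:
            out.append(sum(c * s for c, s in zip(coeffs, window)))
    return out

samples = [1, -3, 2, 0, 3, -1, -2, 1, 3, -4]
same = interpolate_direct(samples) == interpolate_by_phase(samples)
print("outputs match:", same)
print("set 3 is set 0 reversed:", PHASES[3] == PHASES[0][::-1])
print("set 2 is set 1 reversed:", PHASES[2] == PHASES[1][::-1])
```

Each output needs only eight multiply-accumulates instead of 32, and because of the symmetry only two distinct coefficient sets have to be stored.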
5 MHz Carrier
The simple equation for carrier modulation is: Y(k) = yI(k)cos(ωc·t) + yQ(k)sin(ωc·t), where ωc is the carrier frequency, 2π(5 MHz), and yI and yQ are the in-phase and quadrature filter responses of the symbol.
This equation is evaluated every 50 ns, so there are only four carrier values in one symbol period (200 ns). These values can be conveniently defined as: cos(ωc·t) = 1, 0, -1, 0 and sin(ωc·t) = 0, 1, 0, -1.
The modulated output therefore requires no multiplication or addition, and the I and Q filter responses do not both have to be calculated every 50 ns. An I response is calculated in one 50 ns slot, a Q response in the next, then an I response, then a Q response, and so on.
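A short model makes the simplification concrete. In this sketch (function names are hypothetical), the full modulation equation with the four carrier values is compared against a version that merely selects I or Q on alternate samples and inverts the sign on the second half of each carrier cycle.

```python
# Carrier values at the four 50 ns sample instants of one 200 ns carrier cycle.
COS = (1, 0, -1, 0)
SIN = (0, 1, 0, -1)

def modulate_full(y_i, y_q):
    """Evaluate Y(k) = yI(k)cos + yQ(k)sin with the sampled carrier values."""
    return [y_i[k] * COS[k % 4] + y_q[k] * SIN[k % 4] for k in range(len(y_i))]

def modulate_select(y_i, y_q):
    """Same output using only a 2:1 selection and a sign inversion."""
    out = []
    for k in range(len(y_i)):
        phase = k % 4
        sample = y_i[k] if phase % 2 == 0 else y_q[k]   # I, Q, I, Q, ...
        out.append(-sample if phase >= 2 else sample)    # +, +, -, -
    return out

y_i = [5, 7, -2, 4, 1, -6, 3, 0]
y_q = [-1, 2, 8, -3, 6, 4, -5, 9]
print(modulate_select(y_i, y_q))
```

Because only one of the two responses contributes to each output sample, the unused I or Q response for that slot never needs to be computed.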
Distributed Arithmetic (DA) Technology
DA is a computing technique specifically for sum-of-products equations in which one factor of each product is a constant. DA designs range from gate-efficient bit-serial arithmetic to high-performance bit-parallel operation; it is a classic serial/parallel trade-off. DA can be applied to many important linear, time-invariant digital signal processing algorithms, such as filters (FIR and IIR), transforms (the fast Fourier transform [FFT]), and matrix-vector products such as the 8 x 8 discrete cosine transform (DCT).
DA technology has been around for more than 20 years and has proven to be unsuitable for the fixed-point instruction set structure of programmable DSPs. However, DA is very suitable for FPGA implementation, especially LUT logic blocks such as Xilinx CLB. The design of DA FIR filters using Xilinx XC3000 series FPGAs was proposed as early as 1992.
There are no separate multipliers in a DA circuit; the multiplication is performed by a LUT. DA stores the sums of all combinations of the partial product terms of an equation in a lookup table (here called the DALUT), which is addressed by the bits of all the input variables. The serial DA circuit has a single DALUT that is addressed starting with the least significant bits. The output sums of partial products are collected in an accumulator. The approach is reminiscent of the shift-and-add subroutines of early computers: successive DALUT outputs are added into the accumulated sum of partial products, which is shifted down by one binary place each time. This gives a true double-precision result.
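The serial DA procedure can be sketched in software as follows. This is a minimal model under stated assumptions, not the circuit itself: the function names are hypothetical, inputs are two's complement integers of a given width, and the model shifts each DALUT output up by its bit weight instead of shifting the accumulator down as the hardware does (the two are arithmetically equivalent). The DALUT output addressed by the sign bits is subtracted rather than added.

```python
def build_dalut(coeffs):
    """Table of every possible sum of coefficients, indexed by one bit per input."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]

def da_sum_of_products(coeffs, samples, bits):
    """Compute sum(c * x) bit-serially; samples are `bits`-wide two's complement."""
    dalut = build_dalut(coeffs)
    acc = 0
    for j in range(bits):                      # least significant bit first
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> j) & 1) << k        # bit j of every input forms the address
        term = dalut[addr] << j                # weight of bit position j
        if j == bits - 1:
            acc -= term                        # the sign bit carries negative weight
        else:
            acc += term
    return acc

coeffs = [3, -5, 7, 2]
samples = [-4, 3, 1, -2]                       # each fits in 3-bit two's complement
result = da_sum_of_products(coeffs, samples, bits=3)
print(result)
```

The number of clock cycles depends on the input word length, not on the number of taps, which is why short input words such as the 3-bit I and Q values suit the technique well.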
Filter Implementation
The data path of the square root raised cosine filter is built from standard functional blocks that convert directly to CLBs. Every 200 ns, the 3-bit I and Q signals output by the mapper are loaded into a parallel-to-serial shift register (PSR). A chain of RAM shift registers (SRs) stores the seven previous symbols. The first three filter responses, Yb, Yc and Yd, are calculated with the data recirculating in the shift registers. The PSR requires a feedback path for this, whereas the RAM SR recirculates simply through block addressing in read-only mode. There are six blocks here; the first three shifts are for Yb, the next three for Yc, and the last three for Yd. When Ye is calculated, the data is shifted down the SR chain: the same block addressing pattern is repeated while each stage is written with the data from the previous stage. All twelve shifts, along with the corresponding PSR loading, RAM SR addressing and write control, are derived from the 60 MHz system clock.
Since the same coefficient set is used for two sample periods, one for the I channel calculation and the other for the Q channel calculation, a single set of DALUTs is used, with 2:1 multiplexers directing the serial data streams to the corresponding address ports. The structure of the DALUT can be described in terms of these ports (see Figure 1). A logic high at the h3 port selects all addresses whose partial sum includes h3. Similarly, a logic high at the h7 port selects all addresses whose partial sum includes h7, and a logic high at both the h3 and h7 ports selects all addresses whose partial sum includes both h3 and h7. The remaining six coefficients follow the same pattern. Stored this way, the eight coefficients require 2^8, or 256, words. With 12-bit coefficients, (256 words / 32 words per CLB) x 12 = 96 CLBs would be required. Another trick is to use two DALUTs, each handling four coefficients, and to add their outputs. This reduces the number of CLBs to (2 x 2^4)/32 x 12 + 13/2 (parallel adder) = 18.5 CLBs.
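The saving from splitting the DALUT can be reproduced with a short calculation, together with a check that two four-input tables plus an adder give the same output as one eight-input table. This is a sketch with assumptions: the coefficients are arbitrary placeholders, the helper names are hypothetical, and the adder cost of 13/2 CLBs is read from the figure above as a 13-bit adder at two bits per CLB.

```python
WORDS_PER_CLB = 32   # one CLB used as a 32 x 1 RAM
WIDTH_BITS = 12      # DALUT word width used in the text

def dalut_clbs(n_inputs, width_bits=WIDTH_BITS):
    """CLBs needed for a DALUT with 2**n_inputs words of width_bits each."""
    return (2 ** n_inputs) / WORDS_PER_CLB * width_bits

def build_dalut(coeffs):
    """Every possible sum of the coefficients, indexed by one address bit each."""
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]

single_table = dalut_clbs(8)                     # one 256-word table
adder = 13 / 2                                   # 13-bit parallel adder (assumed 2 bits/CLB)
split_tables = 2 * dalut_clbs(4) + adder         # two 16-word tables plus the adder

# Functional check with placeholder coefficients (not the real filter values).
coeffs = [11, -7, 5, 3, -2, 13, -17, 19]
full = build_dalut(coeffs)
low, high = build_dalut(coeffs[:4]), build_dalut(coeffs[4:])
equivalent = all(full[a] == low[a & 0xF] + high[a >> 4] for a in range(256))

print(single_table, split_tables, equivalent)
```

The table size grows exponentially with the number of address inputs, so splitting the address and adding the partial results trades a small adder for a large reduction in storage.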
The same simplification can be applied to the second set of filter coefficients, which starts with h1. The parallel adder can be time-shared between the two sets using 2:1 multiplexers. The adder output is extended to 13 bits and fed to the scaling accumulator described above, which performs the shift-and-add operations. When the sign bits of the input variables are applied to the DALUT, a subtraction is performed. This can be done in the standard way, by placing XOR gates at the DALUT outputs and injecting a carry into the first stage of the accumulator. For the negated responses Yd and Ye, the subtraction can be applied to the data bits instead of the sign bit, so that all DALUT outputs except the one for the sign bit are inverted and complemented.
For I and Q data in fractional two's complement format, the filter coefficients are adjusted to prevent overflow in the final output. The ten most significant bits can be loaded into the D/A conversion driver register.
The filter data path totals 71.5 CLBs, and the FPGA output ports have flip-flops that can serve as the driver register for the D/A converter. Including the encoder (31 CLBs) and the timing and control functions (estimated at fewer than 50 CLBs), the total is about 159 CLBs, which fits neatly into one of the smaller devices in the Xilinx XC4000 series, the XC4005 (196 CLBs). With a more advanced FPGA such as the Xilinx Virtex, the CLB count can be reduced and performance improved.
The entire design is guaranteed to perform at a 60 MHz system clock. The data flow is uniform and unidirectional, and pipeline registers can be inserted (without adding CLBs) to shorten combinatorial paths. The longest combinatorial path is the fourteen-stage carry chain through the scaling accumulator, but the built-in fast carry logic ensures sufficient speed margin.