Xilinx System Generator is key to building a quasi-maximum-likelihood detector (4x4, 64-QAM) for spatially multiplexed MIMO-OFDM systems.
WiMAX is to broadband Internet access what the cell phone is to voice communications. It can replace DSL and cable services and provide Internet access anytime, anywhere. Just turn on your computer, connect to the nearest WiMAX antenna, and surf the web.
One of the biggest challenges for broadband Internet access is mobility, which the latest WiMAX standard aims to address. IEEE 802.16e-2005 introduces multiple antennas for transmission and reception. This multiple-input multiple-output (MIMO) concept is a key feature of mobile WiMAX.
Spatial division multiplexing (SDM) MIMO processing can significantly improve spectral efficiency and thus greatly increase the capacity of wireless communication systems. For this reason, SDM MIMO systems have recently attracted widespread attention as a means of improving both system capacity and link reliability.
The optimal hard-decision detection method for MIMO wireless systems is the maximum likelihood (ML) detector, which is popular because of its excellent bit error rate (BER) performance. However, the complexity of a straightforward implementation grows exponentially with the number of antennas and the modulation order. This limits ASIC and FPGA implementations to low-order modulation schemes with only a few antennas.
Among MIMO detection approaches, sphere detection best preserves BER performance comparable to optimal ML detection while significantly reducing computational complexity. This holds for both SDM and space division multiple access (SDMA) systems. Sphere detectors can be implemented in many ways, and each has many algorithmic variants. Designers can therefore strike the best balance among performance metrics such as wireless channel throughput, BER, and implementation complexity.
The search algorithm (for example, K-best or depth-first search) and the hardware architecture clearly have a large impact on the final BER performance of a MIMO detector. So does the channel matrix preprocessing that is typically performed before sphere detection. This preprocessing can be simple, such as ordering the spatially multiplexed data streams based on variance computations on the channel matrix. It can also be complex, such as using matrix factorization methods to determine a better (in BER terms) processing order for the data streams.
Signum Concepts, a San Diego-based communications systems development company, has worked with Xilinx and Rice University to design an FPGA-based detector for spatial division multiplexing MIMO in 802.16e broadband wireless systems. The detector uses a channel matrix preprocessor to implement a successive interference cancellation technique similar to that of the Bell Labs Layered Space-Time (BLAST) architecture, ultimately achieving near-maximum-likelihood performance.
System Considerations
In principle, ML detection requires computing the metric for every possible combination of symbol vectors. The sphere detector aims to reduce this computational complexity by using simple arithmetic operations while maintaining the numerical integrity of the final result. The first step of our approach is to decompose the complex-valued channel matrix into an equivalent real-valued representation. This doubles the matrix dimension but simplifies the computation on the matrix elements. The second part of reducing complexity is to cut the number of candidate symbols that the detection scheme must evaluate, and QR decomposition of the channel matrix is a crucial step here.
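The real-valued decomposition described above can be sketched as follows. This is a minimal illustration that assumes the common [Re; Im] stacking convention, since the article does not specify the exact layout. The helper names `rvd_matrix`, `rvd_vector`, and `matvec` are hypothetical. The usage lines check that the real 8x8 system reproduces the complex 4x4 one.

```python
def rvd_matrix(H):
    """Map an n x m complex matrix to the 2n x 2m real matrix [[Re H, -Im H], [Im H, Re H]]."""
    n, m = len(H), len(H[0])
    top = [[H[i][j].real for j in range(m)] + [-H[i][j].imag for j in range(m)] for i in range(n)]
    bottom = [[H[i][j].imag for j in range(m)] + [H[i][j].real for j in range(m)] for i in range(n)]
    return top + bottom


def rvd_vector(v):
    """Stack the real parts above the imaginary parts: [Re v; Im v]."""
    return [x.real for x in v] + [x.imag for x in v]


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Usage: a 4x4 complex system becomes an 8x8 real one, and each 64-QAM symbol
# (e.g. 3 - 5j) becomes two 8-PAM values (3 and -5).
H = [[complex(i + 1, j - i) for j in range(4)] for i in range(4)]
s = [3 - 5j, -7 + 1j, 1 + 7j, -3 - 3j]
y = [sum(H[i][j] * s[j] for j in range(4)) for i in range(4)]  # complex model y = H s
y_r = matvec(rvd_matrix(H), rvd_vector(s))                      # real-valued model
```

Because the real and imaginary parts become independent 8-PAM coordinates, each level of the detection tree has only 8 candidates instead of 64.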
Figure 1 shows the mathematical transformations that lead to the final expression for computing the Euclidean distance metric, which is the basis of the sphere detection process. R is the upper-triangular matrix (from the QR decomposition) used to process the candidate symbols iteratively, starting with the element r_{M,M}. Here, M is the dimension of the real-valued channel matrix. The solution defines a tree that is traversed in M iterations, where each level i of the tree corresponds to the symbols being processed for the i-th antenna.
Figure 1. Partial Euclidean distance (PED) metric equations for MIMO sphere detection
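Figure 1 itself is not reproduced here. For reference, the standard real-valued form of the PED recursion is given below. It assumes the notation described above: R is upper triangular from the QR decomposition of the real-valued M x M channel matrix, and processing starts at r_{M,M}. The symbols Ω (real-valued symbol alphabet), T_i (PED at level i), and ỹ are our own labels and may differ from the figure. The full metric equals T_1.

```latex
\begin{aligned}
\hat{\mathbf{s}}_{\mathrm{ML}}
  &= \arg\min_{\mathbf{s}\in\Omega^{M}} \lVert \mathbf{y}-\mathbf{H}\mathbf{s}\rVert^{2}
   = \arg\min_{\mathbf{s}\in\Omega^{M}} \lVert \tilde{\mathbf{y}}-\mathbf{R}\mathbf{s}\rVert^{2},
  \qquad \mathbf{H}=\mathbf{Q}\mathbf{R},\;\; \tilde{\mathbf{y}}=\mathbf{Q}^{T}\mathbf{y},\\
T_{M+1} &= 0,\qquad
T_{i}\big(\mathbf{s}^{(i)}\big) = T_{i+1}\big(\mathbf{s}^{(i+1)}\big)
  + \Big(\tilde{y}_{i}-\sum_{j=i}^{M} r_{i,j}\,s_{j}\Big)^{2},
  \qquad \mathbf{s}^{(i)}=[s_{i},\dots,s_{M}]^{T}.
\end{aligned}
```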
There are several options for implementing the tree traversal. We use a breadth-first search because its feed-forward structure is hardware-friendly and easy to pipeline. At each level, only the K surviving nodes with the smallest partial Euclidean distances are kept and expanded.
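A minimal sketch of breadth-first (K-best) tree search follows. It assumes the real-valued model ỹ = Rs + noise, with R upper triangular and an unnormalized 8-PAM alphabet per real dimension of 64-QAM. The function name `k_best_detect` is hypothetical. This floating-point software model only illustrates the search order, not the pipelined hardware.

```python
# Hypothetical K-best sketch. Assumptions: real-valued system y_t = R s + noise,
# R upper triangular (after QR), unnormalized 8-PAM alphabet for 64-QAM.
PAM8 = (-7, -5, -3, -1, 1, 3, 5, 7)


def k_best_detect(R, y_t, K, alphabet=PAM8):
    """Return (symbols, metric) of the best surviving full path."""
    M = len(R)
    survivors = [((), 0.0)]  # (symbols already decided at higher levels, PED)
    for i in range(M - 1, -1, -1):  # start at the last row, i.e. r_{M,M}
        children = []
        for partial, ped in survivors:
            # Interference from the symbols fixed at the levels processed so far.
            interf = sum(R[i][i + 1 + k] * partial[k] for k in range(len(partial)))
            for a in alphabet:
                e = y_t[i] - interf - R[i][i] * a
                children.append(((a,) + partial, ped + e * e))
        # Breadth-first pruning: keep only the K children with the smallest PEDs.
        children.sort(key=lambda c: c[1])
        survivors = children[:K]
    best_path, best_metric = survivors[0]
    return list(best_path), best_metric
```

With K = 1 the search reduces to successive interference cancellation along the tree. Larger K keeps more candidates per level at the cost of more PED computations.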
The order in which the sphere detector processes antennas has a significant impact on the BER performance. Therefore, our design uses a channel reordering technique similar to the V-BLAST technique before performing sphere detection.
The method iteratively computes the row norms of the pseudo-inverse of the channel matrix to determine the optimal detection order of the channel matrix columns. Depending on the iteration, the method selects either the row with the largest norm or the row with the smallest norm. The row of the pseudo-inverse with the smallest Euclidean norm corresponds to the strongest antenna, while the row with the largest norm corresponds to the weakest. This novel method processes the weakest data stream first and then processes the remaining streams in order from high to low power.
FPGA Hardware Implementation
We implemented the system in Xilinx Virtex®-5 FPGA technology, using Xilinx System Generator for design capture, simulation, and verification. To support a range of antenna/user configurations and modulation orders, we designed the detector for the most demanding case: 4x4 antennas with 64-QAM.
Our model assumes that the receiver has accurate knowledge of the channel matrix, which conventional channel estimation methods can provide. After channel reordering and QR decomposition, the sphere detector is applied. To prepare for soft-input, soft-output channel decoders (such as turbo decoders), we generate soft outputs by computing log-likelihood ratios (LLRs) for the detected bits.
The main architectural elements of the system are the data subcarrier processing and submodule management functions. Together they must process the required number of subcarriers in real time while minimizing latency. Each data subcarrier has its own channel matrix estimate, which limits the processing time available per channel matrix. For the selected FPGA, the target clock frequency is 225 MHz and the channel bandwidth is 5 MHz (360 data subcarriers in a WiMAX system). That leaves 64 clock cycles to process each channel matrix.
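The 64-cycle budget can be checked with a short calculation. The OFDMA numerology used here (512-point FFT, 28/25 sampling factor, 1/8 cyclic prefix) is the usual mobile WiMAX 5 MHz profile. It is an assumption, since the article does not state it.

```python
# Assumed mobile WiMAX 5 MHz OFDMA numerology (not stated in the article):
# 512-point FFT, sampling factor 28/25, cyclic prefix ratio 1/8.
F_SAMPLE = 5e6 * 28 / 25                            # 5.6 MHz sampling rate
SUBCARRIER_SPACING = F_SAMPLE / 512                 # 10.9375 kHz
T_SYMBOL = (1 / SUBCARRIER_SPACING) * (1 + 1 / 8)   # about 102.86 us including CP

F_CLK = 225e6           # target clock frequency from the article
DATA_SUBCARRIERS = 360  # data subcarriers per OFDM symbol at 5 MHz (from the article)

cycles_per_symbol = F_CLK * T_SYMBOL
cycles_per_matrix = cycles_per_symbol / DATA_SUBCARRIERS

print(f"{cycles_per_symbol:.0f} clock cycles per OFDM symbol")
print(f"{cycles_per_matrix:.2f} clock cycles per channel matrix")
```

This gives roughly 64.3 clock cycles per channel matrix, which rounds down to the 64-cycle budget quoted above.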
We exploit extensive pipelining and time-division multiplexing (TDM) of the hardware functional units to meet the real-time requirements of WiMAX OFDM symbols.
Besides high data rates, controlling submodule latency was an important consideration in the architectural design. We address latency by time-division multiplexing consecutive channel matrices. This allows more processing time between elements of the same channel matrix while maintaining high data throughput. The number of channels in a TDM group varies by submodule: the channel matrix inversion uses 5 channels, while the real-valued QR decomposition module time-division multiplexes 15 channels. Figure 2 is a high-level flow chart of the system.
Figure 2. High-level flow chart of a MIMO 802.16e broadband wireless receiver.
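The effect of time-division multiplexing consecutive channel matrices can be illustrated with a simple round-robin schedule. The article does not describe its exact scheduling, so the interleaving order and the helper names here are hypothetical. The point is only that with a G-channel group, consecutive elements of the same matrix arrive G slots apart.

```python
def tdm_schedule(num_channels, elements_per_matrix):
    """Hypothetical round-robin issue order: element k of every channel before element k + 1."""
    return [(ch, k) for k in range(elements_per_matrix) for ch in range(num_channels)]


def same_channel_spacing(schedule, channel):
    """Distances (in issue slots) between consecutive elements of one channel matrix."""
    slots = [t for t, (ch, _) in enumerate(schedule) if ch == channel]
    return {b - a for a, b in zip(slots, slots[1:])}


# With a 5-channel group, consecutive elements of the same matrix are 5 slots apart,
# so a recursive computation may take up to 5 cycles of pipeline latency
# while the unit still accepts a new input every slot.
sched = tdm_schedule(num_channels=5, elements_per_matrix=16)
print(sched[:7])
print(same_channel_spacing(sched, channel=2))
```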
Channel matrix preprocessing
The channel matrix preprocessor determines the optimal order in which to detect each layer of the spatially multiplexed composite signal. It computes the row norms of the pseudo-inverse of the channel matrix and, based on these norms, selects the next transmitted stream to process. The rows of the pseudo-inverse with the smallest norms correspond to the strongest streams (least noise amplification after detection). The rows with the largest norms correspond to the worst-quality layers (most noise amplification after detection). Our implementation detects the weakest layer first and then proceeds layer by layer from lowest to highest noise amplification. At each step of the ordering process, the corresponding column of the channel matrix is zeroed, and the reduced matrix passes to the next stage of the antenna-ordering pipeline.
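The ordering procedure described above can be sketched as below. This is our reading of the text, not the authors' code. The first iteration picks the pseudo-inverse row with the largest norm (weakest stream), and later iterations pick the smallest norm (strongest remaining stream). The selected column is removed before the next iteration. The sketch assumes a real-valued, full-column-rank channel matrix and uses (H^T H)^-1 H^T as the pseudo-inverse. All function names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (small square matrices only)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def pinv(H):
    """Pseudo-inverse (H^T H)^-1 H^T of a real, full-column-rank matrix."""
    Ht = transpose(H)
    return matmul(inverse(matmul(Ht, H)), Ht)


def detection_order(H):
    """Hypothetical ordering: weakest stream first, then strongest to weakest."""
    remaining = list(range(len(H[0])))
    order = []
    while remaining:
        # Dropping already-ordered columns is equivalent to zeroing them for pinv.
        Hk = [[row[c] for c in remaining] for row in H]
        norms = [sum(v * v for v in r) for r in pinv(Hk)]
        pick = max if not order else min  # largest norm first, then smallest
        k = pick(range(len(remaining)), key=lambda idx: norms[idx])
        order.append(remaining.pop(k))
    return order
```

For a diagonal channel with gains 4, 1, 2, 3, the weakest stream (gain 1) is ordered first, followed by the others from strongest to weakest: `detection_order(...)` returns `[1, 0, 3, 2]`.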
Within the preprocessing algorithm, computing the pseudo-inverse is the most demanding step. Its core is the matrix inversion, which is usually performed by QR decomposition (QRD) using Givens rotations. Commonly used angle-estimation and plane-rotation algorithms such as CORDIC introduce severe latency, which is unacceptable for our system. We therefore sought an alternative for vector rotation and phase estimation that uses the FPGA's embedded DSP resources (such as the DSP48E slices in Virtex-5 devices).
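For reference, a textbook complex Givens-rotation QR decomposition is sketched below. It is a sequential floating-point model, not the systolic array. It computes each rotation with an exact square root and division, which are precisely the operations the hardware replaces with DSP48E multiplies and an approximated reciprocal. The function name `givens_qr` is hypothetical.

```python
import math


def givens_qr(H):
    """QR of a square complex matrix by complex Givens rotations.

    Returns (Q, R) with H = Q R, Q unitary and R upper triangular.
    """
    n = len(H)
    R = [[complex(v) for v in row] for row in H]
    Qh = [[complex(i == j) for j in range(n)] for i in range(n)]  # accumulates Q^H
    for j in range(n):
        for i in range(n - 1, j, -1):  # zero R[i][j] by rotating rows i-1 and i
            a, b = R[i - 1][j], R[i][j]
            if b == 0:
                continue
            r = math.hypot(abs(a), abs(b))  # exact norm: the costly part in hardware
            c1, s1 = a.conjugate() / r, b.conjugate() / r
            c2, s2 = -b / r, a / r
            for M in (R, Qh):  # apply the same unitary 2x2 rotation to R and Q^H
                top, bot = M[i - 1], M[i]
                M[i - 1] = [c1 * x + s1 * y for x, y in zip(top, bot)]
                M[i] = [c2 * x + s2 * y for x, y in zip(top, bot)]
    Q = [[Qh[j][i].conjugate() for j in range(n)] for i in range(n)]
    return Q, R
```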
The systolic array for the QRD consists of two types of processing elements: diagonal (boundary) elements and off-diagonal (internal) elements. The boundary elements perform the vectoring function, generating the rotation angles used by the internal elements of the array. To obtain the required rotation, the value is multiplied by the complex conjugate of the diagonal element's value and then divided by the magnitude of that complex number. The division is actually implemented as a multiplication. Over a defined interval where the function is close to linear, the reciprocal is computed from a polynomial approximation and then used as a multiplier. Figure 3 shows the signal flow of this complex rotation in the diagonal systolic element using this approximation.
Figure 3. Structure of the diagonal systolic element
The data sent to the off-diagonal elements are the in-phase and quadrature components of the rotation vector divided by the corresponding approximation. Pipelining the diagonal and off-diagonal elements gives us high data throughput. Time-division multiplexing the hardware across 5 channels hides the latency of the approximation module and the complex multipliers.
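The reciprocal approximation can be sketched in software. Dividing by the magnitude |x| of a complex value amounts to multiplying by 1/sqrt(|x|^2). The sketch below assumes range reduction to the interval [1, 4), followed by a 32-entry table of first-order (linear) polynomial segments. The interval, segment count, and function names are all our assumptions. A hardware version would use fixed-point arithmetic and DSP48E multiply-adds.

```python
import math

SEGMENTS = 32        # hypothetical table size
LO, HI = 1.0, 4.0    # hypothetical normalized input interval

# Per-segment (slope, intercept) of the chord of 1/sqrt(m) over each sub-interval.
_TABLE = []
for _k in range(SEGMENTS):
    _m0 = LO + (HI - LO) * _k / SEGMENTS
    _m1 = LO + (HI - LO) * (_k + 1) / SEGMENTS
    _f0, _f1 = 1 / math.sqrt(_m0), 1 / math.sqrt(_m1)
    _slope = (_f1 - _f0) / (_m1 - _m0)
    _TABLE.append((_slope, _f0 - _slope * _m0))


def approx_rsqrt(p):
    """Approximate 1/sqrt(p), p > 0: scale by powers of 4, then a linear segment."""
    e, m = 0, p
    while m >= HI:
        m /= 4.0
        e += 1
    while m < LO:
        m *= 4.0
        e -= 1
    k = min(int((m - LO) / (HI - LO) * SEGMENTS), SEGMENTS - 1)
    slope, intercept = _TABLE[k]
    return (slope * m + intercept) * 2.0 ** (-e)  # 1/sqrt(m * 4^e) = 2^-e / sqrt(m)


def rotate_to_real(x):
    """Multiply x by conj(x)/|x|, using the approximate reciprocal; result is about |x|."""
    return x * x.conjugate() * approx_rsqrt(x.real ** 2 + x.imag ** 2)
```

With 32 segments the relative error stays below about 0.1%. More segments or a higher polynomial order reduce it further.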
For a 4x4 matrix, we used 1 diagonal element and 7 off-diagonal elements. Decomposing a single matrix takes 4x4 = 16 data cycles, and the design accepts one sample every three clock cycles. The total time to decompose a single matrix is therefore 3x4x4 = 48 clock cycles, within the available 64. Back substitution is then applied to the decomposed matrix, which is further reordered using the same TDM scheme.
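Back substitution on the triangular factor yields the inverse. For a square, invertible H = QR, the pseudo-inverse is H^-1 = R^-1 Q^H. A minimal floating-point sketch with hypothetical function names follows. It inverts R column by column and then applies Q^H.

```python
def upper_triangular_inverse(R):
    """Invert an upper-triangular matrix by back substitution, one column at a time."""
    n = len(R)
    X = [[0j] * n for _ in range(n)]
    for col in range(n):
        # Solve R x = e_col from the bottom up; x is zero below row `col`.
        for i in range(col, -1, -1):
            acc = (1.0 if i == col else 0.0) - sum(R[i][k] * X[k][col] for k in range(i + 1, col + 1))
            X[i][col] = acc / R[i][i]
    return X


def inverse_from_qr(Q, R):
    """H^-1 = R^-1 Q^H for a square H = Q R (equal to the pseudo-inverse when H is invertible)."""
    n = len(Q)
    Rinv = upper_triangular_inverse(R)
    return [[sum(Rinv[i][k] * Q[j][k].conjugate() for k in range(n)) for j in range(n)]
            for i in range(n)]
```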
Sphere detector
The sphere detector uses partial Euclidean distance (PED) units for the norm calculations, with three types of PED unit depending on the tree level. The root-node PED module calculates the PEDs of all possible symbols. The second-level PED module calculates 8 possible PEDs for each of the 8 surviving paths from the previous level, producing 64 PEDs at the next level of the tree. The third type of PED module is used at the remaining tree levels. For each path from the previous level, it calculates the PED of only the nearest child node.
Because the sphere detector (SD) pipeline can accept new data every clock cycle, only one PED module is required per tree level. For a 4x4 64-QAM system, the total number of PED units is therefore 8, equal to the number of tree levels (the dimension of the real-valued channel matrix).
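The three PED-unit types described above can be modeled as follows. This is a hypothetical software sketch of our reading of the text. It fully expands the first two levels (8 PEDs, then 8 x 8 = 64), then keeps only the nearest child per path at the remaining six levels, found by slicing to the nearest 8-PAM symbol. It assumes a real-valued, upper-triangular model ỹ = Rs + noise with an unnormalized 8-PAM alphabet and nonzero diagonal entries of R. The function names are hypothetical.

```python
import math

PAM8 = (-7, -5, -3, -1, 1, 3, 5, 7)  # assumed unnormalized 8-PAM (one axis of 64-QAM)


def slice_pam8(z):
    """Nearest 8-PAM point to z: nearest odd integer, clipped to [-7, 7]."""
    return max(-7, min(7, 2 * math.floor(z / 2) + 1))


def structured_sd(R, y_t):
    """Root: all 8 PEDs; level 2: 8 children per path (64 paths);
    remaining levels: nearest child only. Returns (symbols, metric, final path count)."""
    M = len(R)
    paths = [((), 0.0)]
    for level, i in enumerate(range(M - 1, -1, -1)):
        new_paths = []
        for partial, ped in paths:
            interf = sum(R[i][i + 1 + k] * partial[k] for k in range(len(partial)))
            resid = y_t[i] - interf
            # Full expansion at the first two levels, slicing afterwards.
            candidates = PAM8 if level < 2 else (slice_pam8(resid / R[i][i]),)
            for a in candidates:
                e = resid - R[i][i] * a
                new_paths.append(((a,) + partial, ped + e * e))
        paths = new_paths
    best_path, best_metric = min(paths, key=lambda p: p[1])
    return list(best_path), best_metric, len(paths)
```

Because each of the last six levels keeps exactly 64 paths, the work per level is fixed. That fixed workload is what allows one pipelined PED unit per level.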
The SD can produce either hard or soft decisions. Hard decoding selects the candidate with the minimum distance metric accumulated through every level of the tree. Soft decoding represents each output bit by its log-likelihood ratio. LLRs are typically used as a priori inputs to a channel decoder such as a turbo decoder.
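The soft-output module was not part of the implementation. Still, a common way to form LLRs from a sphere detector's candidate list is the max-log approximation sketched below. Several details are assumptions: the candidate format, the clipping value used when a bit has no counter-hypothesis, and the noise-variance scaling. The correct scale factor depends on the noise model. The function name is hypothetical.

```python
def max_log_llrs(candidates, num_bits, noise_var=1.0, clip=20.0):
    """Max-log LLRs from a candidate list.

    candidates: list of (bit_tuple, distance) pairs, e.g. the detector's final paths.
    Returns LLR_k ~= (min d | b_k = 1  -  min d | b_k = 0) / noise_var,
    so a positive value favors bit 0. Missing hypotheses are clipped to +/- clip.
    """
    llrs = []
    for k in range(num_bits):
        d0 = min((d for bits, d in candidates if bits[k] == 0), default=None)
        d1 = min((d for bits, d in candidates if bits[k] == 1), default=None)
        if d0 is None:
            llrs.append(-clip)  # no candidate with b_k = 0: strongly favor 1
        elif d1 is None:
            llrs.append(clip)   # no candidate with b_k = 1: strongly favor 0
        else:
            llrs.append(max(-clip, min(clip, (d1 - d0) / noise_var)))
    return llrs
```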
FPGA resource usage
The implementation and simulation cover the detection process shown in Figure 2 but exclude the soft-output generation module. The target device is a Virtex-5 XC5VFX130T-2FF1738 FPGA. The design clock frequency is 225 MHz, and the achievable data rate is 83.965 Mb/s.
Table 1 shows the resource usage of each major functional unit in the design. Utilization is measured relative to the total resources of the XC5VFX130T device.
Function | Number of slices | LUTs/FFs | DSP48 | Block RAM
Channel preprocessing | 9,999 | 20,339/29,954 | 159 | 105
RVD QRD | 1,715 | 4,418/5,556 | 30 | 27
Sphere detector | 2,445 | 3,113/6,525 | 48 | 12
Table 1. Resource usage by subsystem
Figure 4. BER curves for 4x4 64-QAM: floating-point MATLAB simulation (hard decision) and System Generator design (hard decision), compared with the maximum likelihood curve
System Generator and Model-Based Design
We implemented the complete hard-decision chain using the Xilinx System Generator for DSP design flow. Design verification used both the simulation semantics of the MATLAB®/Simulink® environment and the co-simulation capabilities of System Generator. The in-phase and quadrature components of the channel matrix were drawn from normal distributions in MATLAB and passed to the System Generator modeling environment. We also used this simulation framework for bit error rate calculations. Figure 4 compares the BER curve of our fixed-point hard-decision design with that of the floating-point hard-decision model and the ML reference curve. We built a hardware demonstration of the design using Ethernet-based hardware co-simulation on the Xilinx ML510 development platform. The channel matrix parameters were sent to the sphere detector using the Xilinx AWGN IP core. We calculated the BER by embedding the design in a self-synchronous BER tester, which sends inputs to the detector and captures bit errors.
This article has briefly introduced sphere detectors for communication systems using spatially multiplexed MIMO and explored the architecture of the sphere detector and the channel matrix preprocessor in detail. There are many ways to implement preprocessing. Although our method is somewhat more computationally complex, it yields BER performance close to maximum likelihood. The discussion centers on WiMAX, but designers can apply many of these methods to 3G LTE (Long Term Evolution) wireless systems.
Our group's next step is to improve BER performance through iterative soft detection, using convolutional turbo codes and a soft-output generation module.