FPGA Implementation of WIMAX LDPC Code Decoder

Abstract: A rate- and length-configurable LDPC code decoder based on the TDMP-NMS algorithm is designed to support decoding of the LDPC codes in the WIMAX standard. By inserting the minimum number of extra clock cycles, updated node information is used as soon as it becomes available. A barrel shift register structure based on a filling algorithm and operating in incremental mode is adopted to support the four code lengths of 576, 768, 1152, and 2304 in the standard. The results show that the designed decoder fully meets the data throughput requirements of the WIMAX standard.

Keywords: WIMAX; low-density parity-check code decoder; FPGA; TDMP; normalized min-sum algorithm

WIMAX is a wireless metropolitan area network (WMAN) technology based on the IEEE 802.16e standard, which uses LDPC codes as its channel coding scheme. The LDPC codes of the WIMAX standard have attracted widespread attention because of their excellent performance. Torben Brack et al. exploited the different characteristics of the check matrices at each code rate: they used the TDMP decoding algorithm for the codes with rates 1/2 and 2/3B and the BP algorithm for the other four code rates, and implemented an LDPC code decoder that supports all code rates and code lengths in the WIMAX standard [1]. Shih Xin Yu et al. permuted the rows and columns of the check matrix so that the processing of the variable nodes and the check nodes can partially overlap in time. Based on the BP algorithm, they implemented a decoder that supports the 19 code lengths of the rate-1/2 LDPC codes in the standard [2].

The reorganization network is an important component of a partially parallel LDPC code decoder. The logarithmic shift register structure and the bidirectional network are applicable only to a single code length. To support multiple code lengths, reference [3] uses a Benes network. A partially parallel decoder based on this network can decode LDPC codes of any code length whose expansion factor is less than the number of input and output ports of the network. However, because each switch control signal in the network is generated by a recursive algorithm, its complexity is relatively high.

This paper applies the TDMP algorithm to the decoding of the LDPC codes of all six code rates in the WIMAX standard. By analysis, the minimum time interval between two successive updates of a variable node's a posteriori LLR is obtained for each code rate, and the minimum number of extra clock cycles is inserted so that updated node information is used as soon as it becomes available. A reorganization network unit with a barrel shift register structure based on a filling algorithm is designed to support the four code lengths of 576, 768, 1152 and 2304 in the standard. An incremental cyclic shift mode is adopted to reduce the hardware implementation complexity and the number of cycles per iteration, thereby improving the throughput.

1 LDPC code decoding algorithm
The standard decoding algorithm for LDPC codes is the BP algorithm. TDMP uses updated node information as soon as it is available, which accelerates decoding convergence. At medium and high signal-to-noise ratios, the average number of iterations this algorithm needs for successful decoding is only half that of the BP algorithm. The TDMP-NMS algorithm uses the normalized min-sum (NMS) algorithm to update the check node information within the TDMP algorithm, which reduces the computational complexity. The decoding steps of this algorithm are as follows [3]:

(2) Start the next iteration.

2 Design of LDPC code decoder for WIMAX standard
2.1 Overall structure of decoder
Based on the TDMP-NMS decoding algorithm above, the structure of the partially parallel LDPC code decoder designed in this paper is shown in Figure 1. The decoder consists of an a posteriori LLR storage unit, a data reorganization network, a processor array, a hard decision output unit and a control unit. To support continuous decoding, the channel information is stored in two identical groups of RAMs working in ping-pong read-write mode. The data reorganization network cyclically shifts the z data output from the a posteriori LLR storage unit according to the corresponding values in the check base matrix and then sends them to the corresponding units in the processor array. The processor array consists of 96 identical processor units. When the expansion factor is z, only z of the processing units are enabled; they process and update in parallel the z SPC codes corresponding to one supercode. The control unit generates the enable and control signals for each module. The stopping criterion is a maximum number of iterations: when the number of decoding iterations reaches the configured maximum, the hard decision output unit makes a hard decision on the information read from the a posteriori LLR storage unit and outputs the decoding result.

2.2 Design of reorganization network
The reorganization network unit proposed in this design supports the cyclic shift of any data sequence whose length is a factor of the number of input and output ports of the network. The structure consists of a data filling unit and a barrel shift register unit. To support the code length with the maximum expansion factor zmax = 96, the barrel shift register unit in this design has 96 input and output data ports. In the preprocessing stage, the filling unit fills all 96 input ports of the barrel shift register with the z valid data input in parallel: the (nz+i)-th port is filled with the i-th valid input datum, where n = 0, 1, ..., (96/z)-1 and i = 1, ..., z. The barrel shift register unit then shifts these 96 data according to the input cyclic shift value control signal. After the shift is completed, the first z data at the output ports are the required data sequence. For a reorganization network with 12 input and output data ports, the data flow when the number of valid input data is 6 and the cyclic shift value is 3 is shown in Figure 2.
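The fill-then-shift idea can be illustrated with a minimal Python sketch. The function name `fill_and_shift` and the left-rotation direction are assumptions made for illustration; the source does not state the shift direction, and this is a behavioral model rather than the hardware structure itself.

```python
def fill_and_shift(data, shift, ports=96):
    """Behavioral model of the filling unit plus barrel shifter.

    data  : the z valid inputs (z must divide `ports`)
    shift : cyclic shift value
    ports : number of barrel shifter ports (96 in the design described)
    Assumption: the barrel shifter rotates toward lower port indices.
    """
    z = len(data)
    if z == 0 or ports % z != 0:
        raise ValueError("number of valid inputs must be a factor of the port count")
    # Filling unit: replicate the z inputs so that every port holds valid data.
    filled = list(data) * (ports // z)
    # Barrel shifter: plain rotation over all ports.
    s = shift % ports
    rotated = filled[s:] + filled[:s]
    # The first z outputs are the z inputs cyclically shifted by `shift` mod z.
    return rotated[:z]


# The 12-port example from the text: 6 valid inputs, cyclic shift value 3.
print(fill_and_shift(["d0", "d1", "d2", "d3", "d4", "d5"], 3, ports=12))
```

Because the filled sequence is periodic with period z, a rotation over all ports is equivalent to a rotation modulo z on the first z outputs, which is why a single fixed-width shifter can serve every expansion factor that divides the port count.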

In each iteration, the variable node a posteriori LLR values passed to the units of the processor array are a data sequence that has been cyclically shifted by the data reorganization network. To ensure that the data input to the units in the next iteration are arranged in the original order, reference [4] uses two reorganization networks: one to reorganize the data read from the variable node LLR memory, and the other to restore the order of the updated a posteriori LLR data. To reduce the hardware complexity, reduce the number of clock cycles per iteration, and improve the decoder throughput, this design adopts an incremental cyclic shift scheme [5]. With this scheme, the cyclic shift applied to the z variable node a posteriori LLR values read from the memory is the difference, modulo z, between the shift value required this time and the shift value required the previous time for the same group of data. The updated a posteriori LLR values produced by the processor array are stored directly in their original locations without passing through the reorganization network.
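The incremental scheme can be checked with a short Python sketch. The helper names `rotate_left` and `incremental_shifts` are hypothetical, the example shift values are made up, and the initial stored state is assumed to be unshifted (previous shift 0).

```python
def rotate_left(seq, s):
    """Cyclically rotate a list left by s positions."""
    s %= len(seq)
    return seq[s:] + seq[:s]


def incremental_shifts(shifts, z):
    """Convert absolute shift values for one group of z LLRs into incremental ones.

    Each increment is (current shift - previous shift) mod z, because the data
    in memory are still in the order left by the previous access.
    """
    prev = 0  # assumption: the data start in their original, unshifted order
    deltas = []
    for s in shifts:
        deltas.append((s - prev) % z)
        prev = s
    return deltas


z = 24
absolute = [5, 20, 3]                 # illustrative shift values only
deltas = incremental_shifts(absolute, z)

stored = list(range(z))               # memory contents, written back without un-shifting
for s, d in zip(absolute, deltas):
    stored = rotate_left(stored, d)   # single pass through the network
    assert stored == rotate_left(list(range(z)), s)
```

The loop confirms that applying only the increments leaves the data in the same order as applying each absolute shift to the original sequence, so no second network is needed on the write-back path.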

2.3 Design of operation unit
To enable the decoder to support all six code rates in the WIMAX standard, this paper designs an operation unit based on serial processing, as shown in Figure 3. The alpha operation unit receives the supercode internal information ri,0, ri,1, ind_min and sign_j, stored in compressed form, together with the a posteriori LLR Pj of variable node j, and calculates qij according to equations (1) and (2). qij is passed to the alpha cache unit for the subsequent a posteriori LLR update, and to the TC2SM conversion module, which converts qij from two's complement into sign-magnitude representation and passes it to the supercode internal information update unit. This unit receives one magnitude and one sign in each cycle. It updates min0, min1 and ind_min by comparing the magnitude with the current minimum min0 and the second minimum min1, stores the received sign in a register, and accumulates the signs modulo 2. When all the variable node information corresponding to a check node has been received, min0 and min1 are normalized to obtain ri,0 and ri,1, and sign_j is updated. After this operation is completed, the previously stored qij values are read out sequentially from the alpha cache unit, and the variable node a posteriori LLR Pj is updated according to (6) and (7). The control module mainly generates the index of the current variable node and the enable signals of the modules required for calculating qij and updating ind_min and Pj, according to the degree of the current supercode check node. To reduce the fan-out of the control unit, every 8 operation units share one control unit, so 12 control units serve the 96 operation units in the processor array.
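The serial processing of one check node can be sketched in Python as follows. This is a generic layered normalized min-sum update written under stated assumptions, not a reproduction of the paper's equations: the function name `process_check_row`, the normalization factor `alpha = 0.75`, and the use of uncompressed previous messages `r_old` (instead of the compressed ri,0, ri,1, ind_min, sign_j form) are all illustrative choices. The names min0, min1 and ind_min follow the text.

```python
def process_check_row(P, r_old, alpha=0.75):
    """One check node update, receiving one variable node per 'cycle'.

    P     : a posteriori LLRs of the variable nodes connected to this check node
    r_old : check-to-variable messages from the previous iteration
    alpha : normalization factor (assumed value, not taken from the source)
    Returns the updated a posteriori LLRs and the new check-to-variable messages.
    """
    # Alpha operation: remove the old check node contribution from each LLR.
    q = [p - r for p, r in zip(P, r_old)]

    min0 = min1 = float("inf")   # smallest and second smallest magnitudes
    ind_min = -1                 # position of the smallest magnitude
    parity = 0                   # modulo-2 accumulation of the sign bits
    signs = []
    for j, qj in enumerate(q):
        sign = 1 if qj < 0 else 0   # sign-magnitude view of q (TC2SM step)
        mag = abs(qj)
        signs.append(sign)
        parity ^= sign
        if mag < min0:
            min1 = min0
            min0, ind_min = mag, j
        elif mag < min1:
            min1 = mag

    # Normalization once all inputs of the check node have been received.
    r0, r1 = alpha * min0, alpha * min1

    r_new = []
    for j in range(len(q)):
        mag = r1 if j == ind_min else r0   # exclude the node's own magnitude
        negative = parity ^ signs[j]       # exclude the node's own sign
        r_new.append(-mag if negative else mag)

    # A posteriori update using the q values held in the cache.
    P_new = [qj + rj for qj, rj in zip(q, r_new)]
    return P_new, r_new


P_new, r_new = process_check_row([2.0, -3.0, 1.0, 4.0], [0.0, 0.0, 0.0, 0.0])
print(P_new, r_new)
```

Tracking only two minima, one index and the sign bits is what makes the compressed storage described above possible: every outgoing message is either the normalized min0 or, for the node at ind_min, the normalized min1.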

2.4 Reading of a posteriori LLR information of variable nodes
For the LDPC codes with code rates 1/2 and 2/3B in the WIMAX standard, the processing order of the supercodes can be changed so that no two adjacent supercodes share variable nodes. The processing of the current supercode therefore does not need to wait until the processing of the previous supercode is completed. In this design, the a posteriori LLR information of the variable nodes of the current supercode is read from the corresponding storage unit after one cycle. For the other code rates, analysis of the corresponding check matrices shows that the processing position of the same variable node in two adjacent supercodes differs by at most 3 cycles. For example, in Figure 4, a group of z variable nodes shared by the first and second supercodes of the rate-2/3 A code is processed 8th in the first supercode and 5th in the second. There are 7 clock cycles between the completion of reading the a posteriori LLRs of all variable nodes adjacent to the check nodes of the current supercode and the output of the first group of z updated a posteriori LLRs. Therefore, so that the updated a posteriori LLR information is available when it is needed, for these code rates the design starts reading the a posteriori LLR information of the next supercode 10 cycles after all the variable nodes of the current supercode have been read, that is, the 7-cycle latency plus the maximum order difference of 3 cycles.

3 Implementation results and analysis
The Cyclone II series FPGA EP2C70F896C6 is selected as the target device. The results after compilation and synthesis show that the decoder consumes a total of 27 077 logic elements and that the maximum operating frequency reaches 69 MHz. At this frequency, when the decoder runs 10 iterations on the LDPC code of each code rate with code length 2304 in the standard, the numbers of decoding cycles required are 1 011, 1 686, 985, 1 520, 1 550 and 1 257, and the corresponding decoding throughputs are 79 Mb/s, 63 Mb/s, 109 Mb/s, 79 Mb/s, 78 Mb/s and 106 Mb/s, which fully meets the data throughput requirements of the WIMAX standard.
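As a rough consistency check, the throughput figure can be related to the cycle count with a short Python calculation. Two assumptions are made here that the source does not state explicitly: throughput is counted in information bits (code length times code rate) per second, and the first listed cycle count, 1 011, belongs to the rate-1/2 code. The function name `throughput_mbps` is hypothetical.

```python
def throughput_mbps(code_length, rate, cycles, f_mhz):
    """Information-bit throughput in Mb/s for one decoded frame.

    code_length : codeword length in bits
    rate        : code rate (assumed: throughput counts information bits only)
    cycles      : clock cycles needed to decode one frame
    f_mhz       : clock frequency in MHz
    """
    return code_length * rate * f_mhz / cycles


# Assumed pairing: rate 1/2, code length 2304, 1011 cycles at 69 MHz.
print(round(throughput_mbps(2304, 0.5, 1011, 69), 1))  # 78.6
```

Under these assumptions the result, about 78.6 Mb/s, agrees with the first reported value of 79 Mb/s after rounding.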
The hardware test of the designed decoder was carried out on the DE2-70 development board, and the test system structure is shown in Figure 5.

The decoding data and control signal generation module generates the data and the control signals the decoder needs: code rate, code length, maximum number of iterations, and the input data valid signal. The decoder decodes according to the input data and control signals. The generation module contains a ROM that stores one frame of decoding data. So that the decoder output can be observed in the SignalTap II Logic Analyzer of Quartus II, the module periodically reads the decoding data from the ROM and generates the corresponding control signals. This paper tests the LDPC code with code rate 1/2 and code length 2304. During the test, the system operating frequency was 50 MHz, and the sampling clock of the logic analyzer was 100 MHz, obtained by multiplying the 50 MHz input clock with a PLL. The decoded output waveform is shown in Figure 6. In the figure, hdd_en is the output data valid signal, and dout0 to dout11 together carry the 96 bits output in parallel by the decoder. The output data were compared with the original information sequence and found to be identical, which shows that the decoder works correctly.

This paper designs and implements a configurable LDPC code decoder that supports the WIMAX standard. An operation unit based on serial processing supports all the code rates in the standard. A reorganization network unit based on the filling algorithm supports the four code lengths of 576, 768, 1152 and 2304 in the standard, which correspond to expansion factors of 24, 32, 48 and 96. The TDMP-NMS algorithm improves the decoding convergence rate while reducing the hardware complexity. The experimental results show that the designed decoder works correctly at a clock frequency of 50 MHz and fully meets the data throughput requirements of the WIMAX standard.
