Implementation of AMR Speech Codec in TD-SCDMA System

0 Introduction

In speech coding, the rapid growth in the volume of information to be transmitted, processed, and stored has made compression an urgent requirement. For new networks and new services, variable-rate speech coding is worth studying both to save transmission bandwidth and to keep link utilization high. AMR (adaptive multi-rate) speech coding was proposed to meet this need. By audio bandwidth it is divided into AMR-NB (AMR narrowband) and AMR-WB (AMR wideband). For AMR-NB the speech bandwidth is limited to 3.7 kHz and the sampling frequency is 8 kHz, while AMR-WB has a bandwidth of 7 kHz and a sampling frequency of 16 kHz; in both cases, given the short-term correlation of speech, each frame is 20 ms long. Although the two coders use different rates to suit their bandwidths, they have much in common. This article focuses on the implementation of AMR-NB in TD-SCDMA. The coder uses algebraic code excited linear prediction (ACELP), a hybrid coding method: the coded speech carries both speech feature parameters and some waveform coding information, and the decoder uses this information to resynthesize the speech signal. The number of parameters extracted is controlled according to the rate requirement, giving the 8 rates listed in Table 1, which together make up the adaptive speech coder. In Table 1, mode AMR-12.2 extracts 244 bits of parameter information per frame, while mode AMR-4.75 extracts only 95 bits. According to the importance of the information they carry, the bits are divided into three classes: class 0, 1, and 2. During channel coding, class 0 and class 1 bits are protected by cyclic redundancy check codes for error detection, while class 2 bits are restored based on the previous frame.
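The relationship between coding rate and bits per frame follows directly from the 20 ms frame length. The sketch below is only an illustration of that arithmetic; the mode names MR515 and MR475 and the helper `bits_per_frame` are assumptions added here, following the MRxxx naming used later in the text.

```python
# Bits per 20 ms frame for each AMR-NB mode: bits = rate (kbit/s) * frame length (ms).
FRAME_MS = 20

# The eight AMR-NB rates in kbit/s, keyed by MRxxx-style mode names.
AMR_NB_RATES_KBPS = {
    "MR122": 12.2, "MR102": 10.2, "MR795": 7.95, "MR74": 7.4,
    "MR67": 6.7, "MR59": 5.9, "MR515": 5.15, "MR475": 4.75,
}

def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Return the number of coded bits in one frame (hypothetical helper)."""
    # kbit/s * ms = bits; round() removes floating-point noise such as 244.00000000000003
    return round(rate_kbps * frame_ms)

frame_bits = {mode: bits_per_frame(rate) for mode, rate in AMR_NB_RATES_KBPS.items()}

for mode, bits in frame_bits.items():
    print(f"{mode}: {bits} bits per frame")
```

For the two modes quoted in the text this gives 244 bits (12.2 kbit/s) and 95 bits (4.75 kbit/s).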

The basic problem in speech coding (speech compression) research is how to obtain the best possible reconstructed speech quality at a given coding rate. Subjective evaluation methods are widely used because they agree with how listeners actually perceive speech quality. A commonly used one is the mean opinion score (MOS). Table 2 shows the speech quality of each mode of the AMR speech coder.

Table 1

Table 2

1 Adaptive Mechanism of AMR Mode Selection

The basic idea of adaptation is to divide the available rate between source coding and channel coding more intelligently, so that radio resources are configured and used more flexibly and efficiently. The actual speech coding rate depends on the channel conditions; it is a function of channel quality. This work is done by the decoder with the assistance of the base station, which selects the mode and determines the rate from measured parameters such as noise. In principle, when the channel is very poor a relatively low-rate mode is used, so that more bits can be allocated to channel coding for error correction and more reliable error control, which suppresses errors effectively and improves voice quality [1].

To allow quantitative comparison in the TD-SCDMA implementation, the carrier-to-interference ratio (C/I) is used. A sliding average of C/I is computed and compared with predefined thresholds to select the rate. Because their characteristics differ, full-rate and half-rate channels need different threshold values. In full-rate channels, when C/I ≥ 13 the MOS of MR122 can exceed 4, which gives good performance; when 9 ≤ C/I < 13, MR122, MR102, or MR795 can be selected, and the lower the rate, the lower the frame error rate; when 6 ≤ C/I < 9, MR74, MR67, or MR59 is the best choice; and when C/I < 6, the lowest possible rate should be selected. As channel quality falls the frame error rate rises, but choosing a lower rate yields better voice quality. The half-rate case is similar and is not repeated here. The adaptive rate selection process is explained further below, and Figure 1 illustrates it in full. Adaptation requires two kinds of information to be transmitted: on the downlink, the base station sends a mode selection measurement command to the mobile station, and on the uplink, the mobile station reports channel measurement information to the base station. This information must be transmitted accurately, reliably, and promptly for adaptation to be effective. The base station sends a measurement command every frame, receives the returned information, and by comparing it with the thresholds selects the mode for the next frame. In this way the coder switches between rates as conditions change. Switching between rates incurs some power loss, and the loss differs from one pair of rates to another; this should be taken into account in the implementation [2,3].
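The threshold comparison described above can be sketched as follows. This is a minimal illustration, not the system's actual algorithm: the class `ModeSelector`, the window length of 8 frames, and the choice of MR515/MR475 for C/I < 6 are assumptions; only the thresholds 13, 9, and 6 and the candidate modes for the upper three bands come from the text.

```python
from collections import deque

# Candidate modes per C/I band for a full-rate channel, as listed in the text.
FULL_RATE_BANDS = [
    (13.0, ["MR122"]),
    (9.0, ["MR122", "MR102", "MR795"]),
    (6.0, ["MR74", "MR67", "MR59"]),
]
# Assumption: "the lowest possible rate" is taken to mean the two lowest modes.
LOWEST_MODES = ["MR515", "MR475"]

def candidate_modes(avg_ci):
    """Map an averaged C/I value to the list of candidate modes."""
    for threshold, modes in FULL_RATE_BANDS:
        if avg_ci >= threshold:
            return modes
    return LOWEST_MODES

class ModeSelector:
    """Keeps a sliding average of C/I reports and proposes modes for the next frame."""

    def __init__(self, window=8):  # window length is an assumed value
        self.samples = deque(maxlen=window)

    def update(self, c_over_i):
        # Add the newest measurement; the deque drops the oldest automatically.
        self.samples.append(c_over_i)
        avg = sum(self.samples) / len(self.samples)
        return avg, candidate_modes(avg)

selector = ModeSelector(window=2)
for report in (14.0, 8.0, 4.0):
    avg, modes = selector.update(report)
    print(f"C/I report {report:5.1f} -> average {avg:5.1f} -> candidates {modes}")
```

Averaging before the comparison keeps a single bad measurement from forcing a rate switch, which matters because each switch carries a power cost.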

2 AMR Encoder Algorithm

The AMR encoder algorithm is a hybrid coding algorithm based on algebraic code excited linear prediction (ACELP) [4, 5]. The original speech is input frame by frame. Using the criterion of minimizing the weighted mean square error between the synthesized speech and the original speech, suitable code vectors are selected from the adaptive codebook and the fixed codebook to replace the residual signal. The code vector indices and gains, together with the filter parameters, are quantized, encoded, and transmitted to the receiving end. After the receiving end has restored each filter, it uses the same codebooks as the transmitting end: it looks up each code vector by its index, multiplies it by its gain, and uses the result to excite the synthesis filter, which produces the synthesized speech. The encoder has to extract the following parameters: the linear prediction (LP) filter coefficients, the adaptive codebook (ACB) and fixed codebook (FCB) indices, and the gains of the two codebooks (see Figure 2). The AMR scheme is explained below, first for the encoder and then for the decoder.

Figure 2 AMR encoder algorithm

(1) Linear prediction calculation. The LPC filter represents the vocal tract model in the speech signal generation model.

Here A(z) is the vocal tract transfer function, and the coefficients a_i change from one speech frame to the next (they are stable only over short intervals). The LPC coefficients therefore have to be extracted in every speech frame. Minimizing the mean square error between the predicted value and the actual value gives the following formula:

Formula 1

The normal equations above [6] can be solved with the Levinson-Durbin algorithm to obtain the coefficients a_k. Line spectral frequencies (LSFs) are ordered and bounded, and their quantization errors are relatively independent of one another; they correspond one-to-one to the linear prediction (LP) coefficients, and the two sets can be converted into each other by evaluating Chebyshev polynomials. For transmission, therefore, the LSF parameters are used instead of the LP coefficients: they are vector quantized in the encoder and restored in the decoder. The 12.2 kbit/s mode uses split matrix quantization (SMQ), and the other modes use split vector quantization (SVQ). Because each frame in the 12.2 kbit/s mode undergoes two linear prediction coding (LPC) analyses, two sets of LSF coefficients are obtained. In the TD-SCDMA implementation, AMR quantizes these two sets jointly: the matrix (r(1), r(2)) is divided into five 2×2 submatrices.
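As an illustration of the Durbin recursion mentioned above, the following floating-point sketch solves the autocorrelation normal equations. It assumes the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, which the text does not state; the function name `levinson_durbin` and the toy autocorrelation values are also illustrative, and the fixed-point details of a real codec are not reproduced.

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    r     : autocorrelation values r[0..order], with r[0] > 0 assumed
    order : prediction order p
    Returns (a, err) where a[0] = 1 and A(z) = sum_k a[k] z^-k (assumed convention),
    and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next autocorrelation lag
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)  # error energy shrinks at every stage
    return a, err

# Toy autocorrelation of a first-order process with correlation 0.5
coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, residual_energy)
```

The recursion needs on the order of p² operations instead of the p³ of a general linear solver, which is why it suits a per-frame analysis on a DSP.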

Each submatrix is vector quantized with dimension 4, and the codebook sizes are 128 (submatrix 1), 64 (submatrix 5), and 256 (submatrices 2, 3, and 4). The distortion measure is the Euclidean distance, which is cheap to compute and subjectively meaningful. The codebook is searched exhaustively (full search). The other coding rates follow the same idea and the same steps; the main difference is how the LSF vector is split. For them it is divided into 3 subvectors with dimensions 3, 3, and 4.
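A full-search split vector quantizer of the kind described for the lower rates can be sketched as below. The 3/3/4 split comes from the text; the tiny two-entry codebooks and the function names are purely hypothetical, since the real codebooks are trained tables that are not reproduced here.

```python
def nearest_index(subvector, codebook):
    """Full search: return the index of the codebook entry closest to subvector."""
    best_i, best_d = 0, float("inf")
    for i, entry in enumerate(codebook):
        # Squared Euclidean distance has the same minimizer as the Euclidean distance
        d = sum((x - y) ** 2 for x, y in zip(subvector, entry))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def split_vq(vector, codebooks, dims=(3, 3, 4)):
    """Quantize a 10-dimensional vector as three independent subvectors."""
    indices, start = [], 0
    for dim, codebook in zip(dims, codebooks):
        indices.append(nearest_index(vector[start:start + dim], codebook))
        start += dim
    return indices

# Hypothetical toy codebooks with two entries each (real ones are far larger)
toy_codebooks = [
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]],
    [[0.0, 0.0, 0.0, 0.0], [3.0, 3.0, 3.0, 3.0]],
]
lsf_like = [0.9, 1.1, 1.0, 0.1, 0.2, 0.0, 2.9, 3.0, 3.1, 2.8]
print(split_vq(lsf_like, toy_codebooks))
```

Splitting the vector keeps each codebook small enough for an exhaustive search, at the cost of ignoring correlation between the subvectors.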

(2) Codebook search [4]. In the TD-SCDMA system, the AMR adaptive codebook search and algebraic codebook search are the key to speech synthesis. Both are carried out per subframe, where each subframe is 5 ms long and corresponds to 40 samples. The adaptive codebook represents the periodic (pitch) structure in the speech production model. The adaptive codebook search uses a long-term prediction (LTP) filter to remove the long-term correlation in the signal, which flattens the spectrum of the residual so that it approaches a white-noise excitation, and it extracts the pitch delay and the corresponding pitch gain. Once the fractional pitch delay has been determined through open-loop and closed-loop pitch analysis, the adaptive codebook vector v(n) is obtained by interpolation at the optimal integer delay kopt and phase (fractional delay) t.
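The open-loop stage of the pitch analysis can be illustrated with a much simplified integer-lag search that maximizes a normalized correlation. This is a sketch under stated assumptions, not the codec's procedure: the lag range 18 to 143 samples, the function name, and the impulse-train test signal are assumptions, and the perceptual weighting, range splitting, and closed-loop fractional refinement are all omitted.

```python
import math

def open_loop_pitch(signal, min_lag=18, max_lag=143):
    """Return the integer lag that maximizes corr / sqrt(energy) (simplified sketch)."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlation between the signal and its copy delayed by `lag` samples
        corr = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        # Energy of the delayed copy, used for normalization
        energy = sum(signal[n - lag] ** 2 for n in range(lag, len(signal)))
        if energy > 0.0:
            score = corr / math.sqrt(energy)
            if score > best_score:
                best_lag, best_score = lag, score
    return best_lag

# Toy signal: an impulse every 50 samples (6.25 ms at 8 kHz), 320 samples long
pulse_train = [1.0 if n % 50 == 0 else 0.0 for n in range(320)]
print(open_loop_pitch(pulse_train))
```

Normalizing by the energy of the delayed signal keeps loud segments from biasing the lag choice; a real coder then refines the result around this lag with fractional resolution.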

Formula 2

The algebraic codebook represents the random (noise-like) component in the speech production model. The algebraic code vector is selected by minimizing the perceptually weighted mean square error [2]. The codebook structure is based on interleaved single-pulse permutation (ISPP): the pulse amplitudes and positions are restricted so as to satisfy a particular algebraic structure and bit allocation. The number of pulses and their allowed positions differ from rate to rate. In addition, the codebook design in this system improves on the earlier Gaussian random codebook structure by constructing a center-clipped overlapping codebook; after sparsification about 90% of the codebook values are zero, which simplifies the search. The search maximizes the following formula, in which d = H^T x2 is the correlation between the target signal x2(n) and the impulse response h′w(n). Once the above parameters have been obtained, they are quantized; the AMR system uses three quantizer designs in total. AMR-12.2 uses a 6-bit scalar quantizer for the algebraic codebook gain. AMR-4.75 quantizes the adaptive codebook gain and the algebraic codebook gain jointly. For the other rates, the target vector is found by minimizing the weighted error between the original speech and the synthesized speech. Because the codebook is shared across multiple rates, its capacity is large, which distinguishes it from other coders.
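The formula referred to in this paragraph did not survive in this copy. For reference, the search criterion usually given for ACELP algebraic codebooks is shown below; the symbols c_k (the code vector at index k), H (the lower triangular Toeplitz convolution matrix built from the impulse response), and Φ are notation assumed here rather than taken from the text.

```latex
A_k = \frac{(C_k)^2}{E_k}
    = \frac{\left(\mathbf{d}^{T}\mathbf{c}_k\right)^{2}}{\mathbf{c}_k^{T}\,\boldsymbol{\Phi}\,\mathbf{c}_k},
\qquad \mathbf{d} = \mathbf{H}^{T}\mathbf{x}_2,
\qquad \boldsymbol{\Phi} = \mathbf{H}^{T}\mathbf{H}
```

Because each code vector contains only a few nonzero pulses, both the numerator and the denominator reduce to sums over the pulse positions, so d and Φ can be computed once per subframe and reused for every candidate.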

(3) AMR decoding principle. The decoder has three parts: parameter decoding, speech synthesis, and post-filtering. At the decoder input, the LSP vector and the adaptive and algebraic codebook parameters (indices and gains) are recovered from the received bit stream. The line spectrum pair (LSP) parameters are converted into linear prediction filter coefficients, and the synthesis filter coefficients for each subframe are then obtained by interpolation. The excitation vector is the sum of the adaptive codebook vector and the algebraic codebook vector, each weighted by its gain. Feeding the excitation vector into the synthesis filter gives the reconstructed speech signal, which is finally post-filtered. Speech coded by an ACELP coder can be regarded as the original speech plus Gaussian noise; post-filtering reduces the noise in the synthesized speech and so improves its quality. Post-processing has two functions: adaptive post-filtering and signal amplification. Adaptive gain control compensates for the gain difference between the synthesized speech and its post-filtered version. The signal is passed through the following filter to obtain the corrected post-filtered synthesized speech.
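The excitation and synthesis steps of the decoder can be sketched as follows. This is a floating-point illustration only: it assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, uses hypothetical function names and toy values, and leaves out parameter decoding, interpolation, and post-filtering.

```python
def build_excitation(adaptive_vec, fixed_vec, gain_pitch, gain_code):
    """Excitation = pitch gain * adaptive vector + code gain * algebraic vector."""
    return [gain_pitch * v + gain_code * c for v, c in zip(adaptive_vec, fixed_vec)]

def synthesis_filter(excitation, a, memory=None):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_{k>=1} a[k] z^-k (assumed convention).

    a      : coefficient list with a[0] == 1
    memory : previous output samples, most recent first (zeros if omitted)
    """
    order = len(a) - 1
    history = list(memory) if memory is not None else [0.0] * order
    out = []
    for u in excitation:
        # s(n) = u(n) - sum_k a[k] * s(n - k)
        s = u - sum(a[k] * history[k - 1] for k in range(1, order + 1))
        out.append(s)
        history = [s] + history[:-1]  # shift the filter memory by one sample
    return out

# Toy values: a first-order filter excited by a single pulse
excitation = build_excitation([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], 1.0, 0.5)
print(synthesis_filter(excitation, [1.0, -0.5]))
```

In a real decoder the filter memory is carried from one subframe to the next, which is why `memory` is a parameter rather than always starting from zeros.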

3 Conclusion

AMR provides high-quality speech, improves resistance to channel errors, and increases system capacity through the flexible use of low coding rates. The coding rate is selected dynamically according to the radio environment and local capacity requirements. This article has analyzed the AMR speech coding algorithm. The algorithm has been implemented on TI's TMS320C5510 DSP using a mix of fixed-point C and assembly language, and is used in the TD-SCDMA system. The computational load can be reduced to about 20 MIPS. In loopback tests on the TD-SCDMA hardware platform, the call voice quality was good and the results were very satisfactory.

Keywords: TD-SCDMA
