0 Introduction
In speech coding, the rapid growth in the volume of information to be transmitted, processed, and stored has made compression an urgent requirement. With new networks and new service requirements, systems that use variable-rate speech coding are of great significance, both for saving transmission bandwidth and for keeping link communication efficient. To meet this need, the adaptive multi-rate (AMR) speech coder has been proposed. By bandwidth, it is divided into AMR-NB (AMR narrowband) and AMR-WB (AMR wideband). For AMR-NB, the speech bandwidth is limited to 3.7 kHz and the sampling frequency is 8 kHz, while AMR-WB has a bandwidth of 7 kHz and a sampling frequency of 16 kHz; in both cases, in view of the short-term correlation of speech, each frame is 20 ms long. Although the two coders use different rates according to their bandwidth requirements, they have much in common. The following focuses on the implementation of AMR-NB in TD-SCDMA. This coder uses the algebraic code-excited linear prediction (ACELP) hybrid coding method: the coded speech signal carries both speech feature parameters and some waveform coding information, and this information is then used to resynthesize the speech signal. The number of parameters extracted is controlled, and the information is selected according to the rate requirement, giving the 8 rates that together form the adaptive speech coder shown in Table 1. In Table 1, mode AMR-12.2 extracts 244 bits of parameter information per frame, while mode AMR-4.75 extracts only 95 bits. According to the amount of information they carry, these bits are divided into 3 classes: class 0, 1, and 2.
During channel coding, class 0 and class 1 bits are protected by cyclic redundancy check codes for error detection, and class 2 bits are restored based on the previous frame.
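The frame sizes quoted above follow directly from the bit rate and the 20 ms frame length. The short sketch below checks that arithmetic. The list of eight rates is the standard AMR-NB mode set and is assumed to match Table 1, which is not reproduced here; the function and constant names are illustrative only.

```python
# Sketch: parameter bits per 20 ms frame for each AMR-NB mode.
# AMR_NB_RATES_KBPS is the standard AMR-NB mode set (assumed to match Table 1).
FRAME_MS = 20
SAMPLE_RATE_HZ = 8000
AMR_NB_RATES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """kbit/s multiplied by ms gives bits; round to absorb float error."""
    return round(rate_kbps * frame_ms)


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of speech samples covered by one frame."""
    return sample_rate_hz * frame_ms // 1000


if __name__ == "__main__":
    for rate in AMR_NB_RATES_KBPS:
        print(f"{rate:5.2f} kbit/s -> {bits_per_frame(rate)} bits per frame")
    print("samples per frame:", samples_per_frame())
```

For example, 12.2 kbit/s × 20 ms gives the 244 bits mentioned above, and 4.75 kbit/s × 20 ms gives 95 bits.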
The basic problem in speech coding, or speech compression coding, is how to obtain the best possible reconstructed speech quality at a given coding rate. Subjective evaluation methods are widely used because they agree with how listeners actually perceive speech quality. A commonly used method is the mean opinion score (MOS). Table 2 shows the speech quality of each mode of the AMR speech coder.
1 Adaptive Mechanism of AMR Mode Selection
The basic idea of adaptation is to solve the rate allocation problem between source coding and channel coding in a more intelligent way, so that wireless resources are configured and used more flexibly and efficiently. The actual speech coding rate depends on the channel conditions; that is, it is a function of the channel quality. This work is done by the decoder with the assistance of the base station, which selects the mode and determines the rate based on measured parameters such as noise. In principle, when the channel is very poor, a relatively low-rate coder is used so that more bits can be allocated to channel coding for error correction and more reliable error control, thereby effectively suppressing errors and improving voice quality [1].
To allow quantitative comparison in the TD-SCDMA implementation, the carrier-to-interference ratio (C/I) is used. Its sliding average is computed and compared with predefined thresholds to select the rate. Because their characteristics differ, full-rate and half-rate channels should use different threshold values. In full-rate channels, when C/I≥13, the MOS value of MR122 can exceed 4, which gives good performance; when 9≤C/I<13, MR122, MR102, and MR795 can all be selected, and the lower the rate, the lower the frame error rate; when 6≤C/I<9, it is best to choose MR74, MR67, or MR59; and when C/I<6, the lowest possible rate should be selected. As the channel quality decreases, the frame error rate increases, but the lower the selected rate, the better the voice quality that can be obtained. Half-rate channels behave similarly and are not discussed further.
The adaptive rate selection process is explained further below; Figure 1 gives a complete illustration. Adaptation requires two types of information to be transmitted: on the downlink, the base station sends a mode selection measurement command to the mobile station, and on the uplink, the mobile station sends channel measurement information to the base station. This information must be transmitted accurately, reliably, and promptly for adaptation to be effective. The base station sends a measurement command every frame, receives the returned information, and by comparison selects the mode for the next frame. In this way the coder switches between rates and adaptation is achieved. Switching between rates incurs a certain power loss, and the loss differs from one pair of rates to another. This should be taken into account in the implementation [2,3].
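The threshold comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the thresholds 13, 9, and 6 are the full-rate values from the text (the unit is presumably dB, which the text does not state), the averaging window length is hypothetical, and mapping C/I<6 to MR515 and MR475 is an assumption based on "the lowest possible rate".

```python
from collections import deque


def candidate_modes(avg_ci):
    """Map an averaged C/I value to the candidate AMR modes (full-rate channel)."""
    if avg_ci >= 13:
        return ["MR122"]
    if avg_ci >= 9:
        return ["MR122", "MR102", "MR795"]
    if avg_ci >= 6:
        return ["MR74", "MR67", "MR59"]
    # Assumption: "lowest possible rate" is taken to mean the two lowest modes.
    return ["MR515", "MR475"]


class ModeSelector:
    """Keeps a sliding average of C/I reports and proposes modes per frame."""

    def __init__(self, window=8):  # window length is a hypothetical choice
        self.history = deque(maxlen=window)

    def update(self, ci):
        """Add one per-frame C/I measurement; return (average, candidates)."""
        self.history.append(ci)
        avg = sum(self.history) / len(self.history)
        return avg, candidate_modes(avg)


if __name__ == "__main__":
    selector = ModeSelector(window=4)
    for ci in [15, 14, 10, 7, 5, 3]:
        avg, modes = selector.update(ci)
        print(f"C/I={ci:>2}  average={avg:5.2f}  candidates={modes}")
```

Averaging before comparing keeps a single bad measurement from forcing a rate switch, which matters because each switch carries a power cost.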
2 AMR Encoder Algorithm
The AMR encoder algorithm is a hybrid coding algorithm based on algebraic code-excited linear prediction (ACELP) [4, 5]. The basic principle is as follows. The original speech is input frame by frame. Using the criterion of minimizing the weighted mean square error between the synthesized speech and the original speech, suitable code vectors are selected from the adaptive codebook and the fixed codebook to replace the residual signal. The code vector addresses and gains, together with the parameters of each filter, are quantized, encoded, and transmitted to the receiving end. The receiving end restores each filter and, using the same codebooks as the transmitting end, looks up the code vector by its address, multiplies it by the gain, and uses the result to excite the synthesis filter, which produces the synthesized speech. In the encoder, the following typical parameters need to be extracted: the linear prediction (LP) filter coefficients, the adaptive codebook (ACB) and fixed codebook (FCB) indices, and the gains of the two codebooks (see Figure 2). The AMR scheme is explained below, first from the encoding side and then from the decoding side.
(1) Linear prediction calculation. The LPC filter represents the vocal tract model in the speech signal generation model.
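The filter expression that belongs to this step is not reproduced in the text. As a hedged sketch in standard LP notation, assuming a 10th-order all-pole model and the sign convention commonly used for AMR-NB (neither is stated in the text), the vocal tract model can be written as:

```latex
H(z) = \frac{1}{A(z)}, \qquad A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}, \qquad p = 10
```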
Here, A(z) characterizes the vocal tract transfer function, and the coefficients a_i change from one speech frame to the next (within a frame they are short-term stationary). The LPC coefficients therefore need to be extracted in every speech frame. Minimizing the mean square error between the predicted value and the actual value yields the following formula:
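As a sketch in standard LP notation, the normal equations that result from this minimization usually take the form below. The assumptions, none of which are spelled out in the text, are: prediction order p, R(k) as the autocorrelation of the windowed speech frame at lag k, and the sign convention in which the prediction error is e(n) = s(n) + Σ a_k s(n−k).

```latex
\sum_{k=1}^{p} a_k \, R(|i-k|) = -R(i), \qquad i = 1, \dots, p
```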
The above normal equations [6] can be solved with the Durbin algorithm to obtain the parameters a_k. Line spectral frequency (LSF) parameters have relatively independent errors and are ordered and bounded; they correspond one-to-one to the linear prediction (LP) parameters, and the two can be converted into each other using the Chebyshev polynomial evaluation method. For transmission, therefore, the LSF parameters are used instead of the LP parameters; they are vector quantized and then restored in the decoder. In the 12.2 kbit/s mode, split matrix quantization (SMQ) is used, and in the other modes, split vector quantization (SVQ) is used. Since each frame in the 12.2 kbit/s mode undergoes two linear prediction coding (LPC) analyses, two sets of LSF coefficients are obtained. In the TD-SCDMA implementation, AMR quantizes these two sets jointly: the matrix (r(1), r(2)) is divided into five 2×2 submatrices.
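The Durbin recursion mentioned above for solving the normal equations can be sketched as follows. This is a floating-point illustration, not the fixed-point routine used on the DSP; it assumes the convention A(z) = 1 + Σ a_k z^-k, and the function name is illustrative.

```python
def levinson_durbin(r, order):
    """Solve sum_k a[k] * r[|i-k|] = -r[i], i = 1..order (Levinson-Durbin).

    r     : autocorrelation values r[0..order]
    order : prediction order p
    Returns (a, err) where a[0] == 1.0 and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("autocorrelation sequence is not positive definite")
        # Correlation between the current predictor and lag i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k  # prediction error shrinks at every stage
    return a, err


if __name__ == "__main__":
    # Autocorrelation of a first-order process with correlation 0.5
    coeffs, error = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(coeffs, error)
```

The recursion needs on the order of p² operations instead of the p³ of a general linear solver, which is why it suits a per-frame DSP implementation.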
Each submatrix is vector quantized with a dimension of 4, and the codebook sizes are 128 (submatrix 1), 64 (submatrix 5), and 256 (submatrices 2, 3, and 4). The distortion measure is the Euclidean distance, chosen for its low computational cost and its subjective relevance. A full search is used in the codebook search. The other coding rates follow the same ideas and steps; the main difference is how the LSF vector is split, namely into 3 subvectors with dimensions 3, 3, and 4.
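A minimal sketch of split vector quantization with a full search is given below, for the 3, 3, 4 split described above. The codebooks in the usage example are tiny made-up placeholders, not the AMR tables, and the sketch quantizes the vector directly with a plain squared Euclidean distance, which is a simplification.

```python
def nearest_index(codebook, target):
    """Full search: index of the code vector with the smallest squared distance."""
    best_index, best_dist = 0, float("inf")
    for index, code_vector in enumerate(codebook):
        dist = sum((t - c) ** 2 for t, c in zip(target, code_vector))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def split_vq(vector, codebooks, dims=(3, 3, 4)):
    """Quantize each subvector independently; return one index per subvector."""
    if sum(dims) != len(vector) or len(dims) != len(codebooks):
        raise ValueError("split dimensions do not match the input")
    indices, start = [], 0
    for dim, codebook in zip(dims, codebooks):
        indices.append(nearest_index(codebook, vector[start:start + dim]))
        start += dim
    return indices


if __name__ == "__main__":
    # Hypothetical toy codebooks with two entries each
    toy_codebooks = [
        [[0, 0, 0], [1, 1, 1]],
        [[0, 0, 0], [2, 2, 2]],
        [[0, 0, 0, 0], [3, 3, 3, 3]],
    ]
    lsf = [0.9, 1.1, 1.0, 0.1, 0.2, 0.0, 2.9, 3.0, 3.1, 2.8]
    print(split_vq(lsf, toy_codebooks))
```

Splitting keeps the search cost low: three small codebooks are searched separately instead of one codebook covering all ten dimensions at once.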
(2) Codebook search [4]. In the TD-SCDMA system, the AMR adaptive codebook search and algebraic codebook search are the key to speech synthesis. Both are carried out per subframe, where each subframe is 5 ms long and corresponds to 40 samples. The adaptive codebook represents the periodic structure in the speech signal generation model. The adaptive codebook search removes the long-term correlation in the signal through a long-term prediction (LTP) filter, which flattens the spectrum of the residual signal so that it approaches a white noise excitation, and it extracts the pitch delay and the corresponding pitch gain. After the fractional pitch delay has been determined through open-loop and closed-loop pitch analysis, the adaptive codebook vector v(n) is obtained by interpolating at the optimal integer delay k_opt and phase (fractional delay) t.
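The integer part of the pitch search can be illustrated with a simple normalized-correlation search, in the spirit of the open-loop analysis mentioned above. This sketch is a simplification: it omits perceptual weighting, the closed-loop refinement, and the fractional interpolation, and the default lag range is a hypothetical choice rather than a value taken from the text.

```python
import math

SAMPLE_RATE_HZ = 8000
SUBFRAME_MS = 5
SAMPLES_PER_SUBFRAME = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000  # 40 samples


def best_integer_lag(signal, start, length=SAMPLES_PER_SUBFRAME,
                     min_lag=20, max_lag=143):
    """Return the integer lag whose delayed segment best matches the subframe.

    The score is corr / sqrt(energy) of the delayed segment. Ties are resolved
    in favor of the smaller lag because lags are scanned in increasing order.
    """
    target = signal[start:start + length]
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        if start - lag < 0:
            break  # not enough history for this lag
        delayed = signal[start - lag:start - lag + length]
        energy = sum(d * d for d in delayed)
        if energy == 0:
            continue
        corr = sum(t * d for t, d in zip(target, delayed))
        score = corr / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    period = [3, -1, 4, 1, -5, 9, -2, 6, 5, -3] * 5  # 50-sample toy pattern
    print(best_integer_lag(period * 6, start=200))
```

Preferring the smaller lag on ties is one simple way to avoid picking a multiple of the true pitch period.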
The algebraic codebook represents the random signal in the speech signal generation model. The algebraic code vector is obtained by minimizing the perceptually weighted mean square error [2]. The algebraic codebook structure is based on interleaved single-pulse permutation (ISPP): the pulse amplitudes and positions are restricted so as to satisfy a particular algebraic structure and the bit allocation. The number of pulses and their positions differ from rate to rate. In addition, the codebook design in this system improves on the earlier Gaussian random codebook structure by constructing a center-clipped overlapping codebook. After sparsification, 90% of the codebook entries are zero, which simplifies the search. The search maximizes a criterion in which d = H^t x2 represents the correlation between the target signal x2(n) and the impulse response h′w(n). Once these parameters have been obtained, a total of three quantizers are used in the AMR system. AMR-12.2 uses a 6-bit scalar quantizer for the algebraic codebook gain. AMR-4.75 jointly quantizes the adaptive codebook gain and the algebraic codebook gain. For the other rates, the target vector is found by minimizing the weighted error between the original speech and the synthesized speech. Because the codebook is shared across multiple rates, its capacity is large, which distinguishes this coder from others.
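The search criterion itself is not reproduced in the text. In standard ACELP notation it is usually written as below. The symbols are assumptions for illustration: c_k is the k-th algebraic code vector, H is the lower-triangular Toeplitz convolution matrix built from the weighted impulse response, and Φ is the matrix of its correlations.

```latex
Q_k = \frac{\left(\mathbf{x}_2^{t}\,\mathbf{H}\,\mathbf{c}_k\right)^2}{\mathbf{c}_k^{t}\,\mathbf{H}^{t}\mathbf{H}\,\mathbf{c}_k}
    = \frac{\left(\mathbf{d}^{t}\mathbf{c}_k\right)^2}{\mathbf{c}_k^{t}\,\boldsymbol{\Phi}\,\mathbf{c}_k},
\qquad \mathbf{d} = \mathbf{H}^{t}\mathbf{x}_2, \qquad \boldsymbol{\Phi} = \mathbf{H}^{t}\mathbf{H}
```

Because c_k contains only a few nonzero pulses, both the numerator and the denominator reduce to sums over a handful of entries of d and Φ, which is what makes the sparse codebook fast to search.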
(3) AMR decoding principle. The decoder consists of three parts: decoding, speech synthesis, and post-filtering. At the decoder input, the LSP vector and the adaptive codebook and algebraic codebook parameters (indices and gains) are obtained from the received bit stream. The line spectrum pair (LSP) parameters are converted into linear prediction filter coefficients, and the synthesis filter coefficients of each subframe are then obtained by interpolation. The excitation vector is obtained by weighting the adaptive codebook vector and the algebraic codebook vector with their respective gains and adding them. The excitation vector is fed into the synthesis filter to obtain the reconstructed speech signal. Finally, the reconstructed speech signal is post-filtered. Speech coded by the ACELP coder can be regarded as the original speech plus Gaussian noise, so post-filtering reduces the noise contained in the synthesized speech and thereby effectively improves its quality. Post-processing includes two functions: adaptive post-filtering and signal amplification. Adaptive gain control is used to compensate for the gain difference between the synthesized speech and the post-filtered synthesized speech. The signal is then passed through the post-filter to obtain the corrected post-filtered synthesized speech.
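The excitation and synthesis steps of the decoder can be sketched as follows. This is a floating-point illustration with hypothetical names: it assumes the convention A(z) = 1 + Σ a_k z^-k, uses a single set of coefficients for the whole call, and leaves out the per-subframe interpolation and the post-filter.

```python
def build_excitation(adaptive_vec, algebraic_vec, pitch_gain, code_gain):
    """u(n) = g_p * v(n) + g_c * c(n): gain-weighted sum of both code vectors."""
    return [pitch_gain * v + code_gain * c
            for v, c in zip(adaptive_vec, algebraic_vec)]


def synthesize(excitation, a, memory=None):
    """All-pole synthesis filter 1/A(z): s(n) = u(n) - sum_k a[k] * s(n - k).

    a      : LP coefficients with a[0] == 1
    memory : previous output samples, most recent first (defaults to zeros)
    """
    order = len(a) - 1
    past = list(memory) if memory else [0.0] * order
    output = []
    for u in excitation:
        s = u - sum(a[k] * past[k - 1] for k in range(1, order + 1))
        output.append(s)
        past = ([s] + past)[:order]  # shift the filter memory
    return output


if __name__ == "__main__":
    u = build_excitation([1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                         pitch_gain=1.0, code_gain=0.0)
    print(synthesize(u, a=[1.0, -0.5]))
```

In a frame-based decoder the last `order` output samples of one subframe would be passed as `memory` to the next call so that the filter state carries over.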
3 Conclusion
The introduction of AMR makes it possible to provide high-quality speech, to improve robustness against channel errors, and to increase system capacity through flexible use of low coding rates. The coding mode is selected dynamically according to the wireless environment and local capacity requirements. The author has analyzed and studied the AMR speech coding algorithm. The algorithm has been implemented on TI's TMS320 C5510 DSP using mixed programming in fixed-point C and assembly language, and it is used in the TD-SCDMA system. The computational load can be reduced to about 20 MIPS. In loopback tests on the hardware platform of the TD-SCDMA system, good call voice quality was obtained.