【Abstract】 This article introduces the algorithm of the ITU-T G.723.1 speech codec standard and its implementation on the ADSP-2181 chip. Software and hardware together perform the sampling and real-time coding of speech signals. The implementation fully complies with the fixed-point algorithm of the ITU-T G.723.1 standard and passes all ITU-T test vectors.
Keywords: speech codec, DSP, ITU-T G.723.1
1 Introduction
Voice over IP (VoIP) technology is becoming increasingly popular, and the low-bit-rate speech compression standards it uses are mainly G.723.1 and G.729. As VoIP technology continues to develop, products are required to offer higher integration and better performance. The trend is to use a new generation of high-performance DSP chips so that a single DSP can process multiple voice channels.
G.723.1 is a low-bit-rate coding algorithm published by the ITU in 1996. It is mainly used to compress speech and other multimedia audio signals in applications such as videophone systems, digital transmission systems and high-quality speech compression systems. The G.723.1 standard operates at two bit rates, 6.3 kbps and 5.3 kbps: the high-rate algorithm gives higher reconstructed speech quality, while the low-rate algorithm has lower computational complexity. Like most low-bit-rate speech coding algorithms, G.723.1 uses linear-prediction analysis-by-synthesis. To quantize the excitation signal, the high-rate algorithm uses multi-pulse maximum likelihood quantization (MP-MLQ), while the low-rate algorithm uses algebraic code-excited linear prediction (ACELP).
2 Algorithm Introduction
The parametric model of the speech signal uses an excitation signal to drive a system model that imitates the sound produced as airflow passes through the vocal tract. Linear prediction assumes an all-pole model and estimates the model parameters in the time domain under the minimum mean square error criterion. The parameters extracted during analysis are the LSP parameters of the vocal tract system, the delay and gain of the adaptive codebook, and the positions and signs of the pulses in the fixed codebook.
The G.723.1 encoder compresses voice-band speech sampled at 8 kHz. To reduce the bit rate, G.723.1 uses a long frame of 240 samples, that is, a frame length of 30 ms. Each input frame first passes through a first-order high-pass filter that removes the DC component and is then divided into four subframes of 60 samples, each of which undergoes its own LPC analysis. To improve the continuity of the LPC coefficients, the analysis uses an overlapping window of 180 samples that also spans the previous and the next subframe. This introduces a look-ahead of 60 samples (7.5 ms), so the total algorithmic delay is 37.5 ms. The LPC coefficients are represented as line spectral frequencies (LSF), which are quantized with a predictive split vector quantizer; this quantization is performed only for the fourth subframe. To improve perceptual quality, the high-pass filtered speech passes through a formant perceptual weighting filter and a harmonic noise shaping filter to produce the initial target signal. The parameters of the former are derived from the unquantized LPC coefficients of each subframe; those of the latter are obtained from an open-loop pitch period estimated once for every two subframes, with the pitch period ranging from 18 to 142 samples. The LPC synthesis filter, the formant perceptual weighting filter and the harmonic noise shaping filter are used together to compute the zero-input response of the system and to estimate the best excitation. The G.723.1 encoder also includes a fifth-order pitch predictor, whose parameters are obtained by a closed-loop pitch search based on the open-loop pitch estimate and the impulse response.
For the best excitation estimate, the zero-input response and the pitch predictor contribution are subtracted from the initial target signal to obtain the final target signal, which is then quantized with MP-MLQ at the high bit rate or with ACELP at the low bit rate. The LSF parameters, the pitch values and the excitation parameters are transmitted to the decoder.
The decoder first reconstructs the LPC synthesis filter according to the obtained LSF parameters, and then obtains the adaptive codebook excitation signal and the fixed codebook excitation signal according to the pitch value and the excitation parameters.
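The frame timing quoted in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only; the constant and function names are hypothetical and are not taken from the ITU-T reference code.

```python
# Frame timing for G.723.1 as described in the text (all names are illustrative).
SAMPLE_RATE_HZ = 8000        # sampling rate of the input speech
FRAME_SAMPLES = 240          # samples per frame
SUBFRAMES_PER_FRAME = 4
LOOKAHEAD_SAMPLES = 60       # look-ahead caused by the 180-sample LPC window

SUBFRAME_SAMPLES = FRAME_SAMPLES // SUBFRAMES_PER_FRAME


def samples_to_ms(samples):
    """Convert a sample count to milliseconds at the 8 kHz sampling rate."""
    return 1000.0 * samples / SAMPLE_RATE_HZ


def split_into_subframes(frame):
    """Split one 240-sample frame into four 60-sample subframes."""
    if len(frame) != FRAME_SAMPLES:
        raise ValueError("expected exactly %d samples" % FRAME_SAMPLES)
    return [frame[i:i + SUBFRAME_SAMPLES]
            for i in range(0, FRAME_SAMPLES, SUBFRAME_SAMPLES)]


frame_ms = samples_to_ms(FRAME_SAMPLES)          # frame length
lookahead_ms = samples_to_ms(LOOKAHEAD_SAMPLES)  # look-ahead
total_delay_ms = frame_ms + lookahead_ms         # algorithmic delay
```

The total covers only the algorithmic delay; processing and transmission delays in a real system come on top of it.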
2.1 Extraction of vocal tract model parameters
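The transfer function of the tenth-order all-pole model is:
$$H(z) = \frac{S(z)}{U(z)} = \frac{1}{1 - \sum_{k=1}^{10} a_k z^{-k}}$$
where S(z) and U(z) are the Z-transforms of the output signal s(n) and the input signal u(n), respectively. The prediction error signal is therefore:
$$e(n) = s(n) - \sum_{k=1}^{10} a_k s(n-k)$$
To minimize the mean square error $E = \sum_n e^2(n)$, the coefficients {a_k} must satisfy $\partial E / \partial a_i = 0$ (i = 1, 2, ..., 10), which gives the system of equations in the unknowns a_k:
$$\sum_{k=1}^{10} a_k R(|i-k|) = R(i), \quad i = 1, 2, \ldots, 10$$
where R(n) is the autocorrelation function of s(n). Because the coefficient matrix is Toeplitz, the system can be solved efficiently with the Levinson-Durbin recursion.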
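As an illustration of the Durbin recursion mentioned here, the following floating-point sketch solves the normal equations for a predictor of the form s(n) ≈ Σ a_k s(n-k). It is a textbook version written for clarity, not the fixed-point routine of the standard; the function names are hypothetical.

```python
def autocorrelation(s, max_lag):
    """R(lag) = sum_n s[n] * s[n - lag] for lag = 0..max_lag."""
    return [sum(s[n] * s[n - lag] for n in range(lag, len(s)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve sum_k a_k * r[|i - k|] = r[i] for i = 1..order.

    Returns (a, err): a[0] holds a_1, a[1] holds a_2, and so on;
    err is the final prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= (1.0 - k_i * k_i)
    return a[1:], err


# Example: an autocorrelation sequence that decays as 0.9**lag is matched
# by a first-order predictor, so the second coefficient comes out near zero.
coeffs, residual = levinson_durbin([1.0, 0.9, 0.81], 2)
```

The recursion needs on the order of p² operations for order p, compared with p³ for general elimination, which matters on a fixed-point DSP that repeats the analysis for every subframe.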
Because line spectrum pair (LSP) parameters have good quantization and interpolation properties, the LPC parameters are converted into LSP parameters for transmission.
Let the inverse filter of the linear prediction filter be
$$A(z) = 1 - \sum_{k=1}^{10} a_k z^{-k}$$
and define the symmetric and antisymmetric polynomials
$$P(z) = A(z) + z^{-11} A(z^{-1}), \qquad Q(z) = A(z) - z^{-11} A(z^{-1})$$
Let ω_i and θ_i be the angular positions of the i-th zeros of P(z) and Q(z), respectively. ω_i and θ_i appear in pairs and reflect the spectral characteristics of the signal, so they are called line spectrum pairs. A discrete Fourier transform of the coefficients of P(z) and Q(z) gives their values at the points $z_k = e^{-j\pi k/N}$ (k = 0, 1, 2, ..., N); the positions of the minima of these values are the candidate zero positions.
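The LPC-to-LSP conversion can be sketched as follows. The sketch assumes the sign convention A(z) = 1 - Σ a_k z^{-k} and the usual definitions P(z) = A(z) + z^{-(p+1)} A(z^{-1}) and Q(z) = A(z) - z^{-(p+1)} A(z^{-1}). Instead of the DFT minimum search described in the text, it swaps in a simpler technique: it evaluates the real-valued cosine and sine forms of P and Q on a grid over (0, π), looks for sign changes and refines each root by bisection. All function names are hypothetical.

```python
import math


def lsp_polynomials(a):
    """Coefficient lists of P(z) and Q(z) for A(z) = 1 - sum a_k z^-k."""
    p = len(a)
    A = [1.0] + [-x for x in a] + [0.0]           # padded to degree p + 1
    P = [A[i] + A[p + 1 - i] for i in range(p + 2)]
    Q = [A[i] - A[p + 1 - i] for i in range(p + 2)]
    return P, Q


def _half_sum(coeffs, omega, trig):
    """Real-valued form of a (anti)symmetric polynomial on the unit circle."""
    half = (len(coeffs) - 1) / 2.0
    return sum(coeffs[i] * trig((half - i) * omega)
               for i in range(len(coeffs) // 2))


def _roots(f, grid=256, iters=40):
    """Roots of f in (0, pi): sign changes on a grid, refined by bisection."""
    roots = []
    prev_w = math.pi / grid
    prev_f = f(prev_w)
    for k in range(2, grid):
        w = math.pi * k / grid
        fw = f(w)
        if prev_f * fw < 0.0:
            lo, hi, f_lo = prev_w, w, prev_f
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                f_mid = f(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
        prev_w, prev_f = w, fw
    return roots


def lpc_to_lsp(a):
    """Return the LSP frequencies (radians, ascending) for LPC coefficients a."""
    P, Q = lsp_polynomials(a)
    w_p = _roots(lambda w: _half_sum(P, w, math.cos))
    w_q = _roots(lambda w: _half_sum(Q, w, math.sin))
    return sorted(w_p + w_q)
```

For an all-zero predictor of order 10 the roots fall at equally spaced multiples of π/11, which makes a convenient sanity check. The trivial roots of P(z) at z = -1 and of Q(z) at z = 1 lie on the interval boundaries and are excluded by the open grid.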
2.2 Search of adaptive codebook
The open-loop search produces an integer pitch estimate based on the entire frame. To improve reliability, the original signal is preprocessed and clipped with a center clipping function, and the pitch T_op is then estimated using the autocorrelation pitch detection method. The closed-loop search is a subframe-based pitch search. The LPC synthesis filter, the formant perceptual weighting filter and the harmonic noise shaping filter are cascaded into a combined filter, and the impulse response of this combined filter is calculated. The fifth-order pitch predictor then computes the closed-loop pitch period from the open-loop pitch estimate and this impulse response.
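The open-loop step can be illustrated with a small sketch: center clipping followed by a normalized autocorrelation search over the lag range of 18 to 142 samples given earlier. The clipping ratio of 0.3 and the scoring rule (squared cross-correlation divided by energy) are assumptions chosen for illustration, not values taken from the standard, and the function names are hypothetical.

```python
import math


def center_clip(x, ratio=0.3):
    """Zero out samples below a threshold and shift the rest toward zero.

    The ratio 0.3 is an illustrative assumption.
    """
    c = ratio * max(abs(v) for v in x)
    return [v - c if v > c else v + c if v < -c else 0.0 for v in x]


def open_loop_pitch(x, min_lag=18, max_lag=142):
    """Return the lag that maximizes cross**2 / energy on the clipped signal."""
    y = center_clip(x)
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        cross = sum(y[n] * y[n - lag] for n in range(lag, len(y)))
        energy = sum(y[n - lag] ** 2 for n in range(lag, len(y)))
        # Only positive correlations are valid pitch candidates.
        if cross > 0.0 and energy > 0.0:
            score = cross * cross / energy
            if score > best_score:
                best_lag, best_score = lag, score
    return best_lag


# One 240-sample frame of a tone with a period of 40 samples.
frame = [math.sin(2.0 * math.pi * n / 40.0) for n in range(240)]
pitch = open_loop_pitch(frame)
```

Center clipping removes low-level formant ripple, so the autocorrelation peaks at the pitch lag stand out more clearly against peaks caused by the vocal tract resonances.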
2.3 Fixed codebook search
The residual signal that remains after the adaptive codebook search is the target for the fixed codebook search.
At the high bit rate (6.3 kbps), the fixed codebook search uses multi-pulse maximum likelihood quantization (MP-MLQ). The excitation signal can be expressed as:
$$v[n] = G \sum_{k=0}^{M-1} \alpha_k \, \delta[n - m_k]$$
where G is the gain factor, δ[n] is the unit impulse, and {α_k} and {m_k} are the signs and positions of the pulses, respectively. M is the number of pulses: 6 for even subframes and 5 for odd subframes.
The task of the coding algorithm is to estimate G, {α_k} and {m_k} so as to minimize the mean square value of the error signal e[n]:
$$e[n] = r[n] - G \sum_{k=0}^{M-1} \alpha_k \, h[n - m_k]$$
where r[n] is the target vector, that is, the residual signal obtained after the adaptive codebook search, and h[n] is the impulse response of the weighted synthesis filter.
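To make the quantities in the MP-MLQ search concrete, the sketch below builds a multi-pulse excitation from a gain, signs and positions, filters it with an impulse response h[n], and evaluates the error energy against a target r[n]. It shows only the cost function that the search minimizes, not the search procedure of the standard; the function names are hypothetical.

```python
def multipulse_excitation(gain, signs, positions, length=60):
    """v[n] = gain * sum_k signs[k] * delta[n - positions[k]]."""
    v = [0.0] * length
    for s, m in zip(signs, positions):
        v[m] += gain * s
    return v


def synthesize(gain, signs, positions, h, length=60):
    """Filter the excitation with h[n] (causal convolution, truncated)."""
    v = multipulse_excitation(gain, signs, positions, length)
    return [sum(h[j] * v[n - j] for j in range(min(n, len(h) - 1) + 1))
            for n in range(length)]


def error_energy(r, h, gain, signs, positions):
    """Sum of e[n]**2 with e[n] = r[n] - (h * v)[n]."""
    y = synthesize(gain, signs, positions, h, len(r))
    return sum((rn - yn) ** 2 for rn, yn in zip(r, y))


# A target generated by six pulses is matched exactly by the same pulses
# and poorly by pulses with inverted signs.
h = [1.0, 0.5, 0.25]
signs = [1, -1, 1, 1, -1, 1]
positions = [0, 10, 20, 30, 40, 50]
target = synthesize(2.0, signs, positions, h)
matched = error_energy(target, h, 2.0, signs, positions)
mismatched = error_energy(target, h, 2.0, [-s for s in signs], positions)
```

A practical encoder does not test every combination exhaustively; it narrows the candidates step by step, but each candidate is still ranked by this error energy.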
For the low bit rate (5.3 kbps) encoder, the fixed codebook search uses algebraic code-excited linear prediction (ACELP). There are 4 pulses in each subframe, and their possible positions are shown in Table 1.
Table 1
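The codebook search again minimizes the mean square error between the weighted speech signal r[n] and the weighted synthesized speech signal, that is:
$$E_\xi = \lVert r - G H v_\xi \rVert^2$$
where r is the target vector, that is, the residual signal obtained after the adaptive codebook search, G is the codebook gain, v_ξ is the code vector with index ξ in the algebraic codebook, and H is the lower-triangular convolution matrix built from the truncated impulse response of the weighted synthesis filter.
Finding the best code vector amounts to searching for the ξ that maximizes τ_ξ:
$$\tau_\xi = \frac{C_\xi^2}{\varepsilon_\xi} = \frac{(d^T v_\xi)^2}{v_\xi^T \Phi v_\xi}$$
where τ_ξ is an intermediate parameter, d = H^T r is the correlation between r[n] and h[n], and Φ = H^T H is the covariance matrix of the impulse response. For a code vector with pulse signs α_k and positions m_k, C and ε are calculated as:
$$C = \sum_{k=0}^{3} \alpha_k \, d[m_k], \qquad \varepsilon = \sum_{k=0}^{3} \phi(m_k, m_k) + 2 \sum_{i=0}^{2} \sum_{j=i+1}^{3} \alpha_i \alpha_j \, \phi(m_i, m_j)$$
For code vectors on the odd positions, the pulses are first shifted from the even grid by one sample, and the same formulas are then applied.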
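The ACELP selection criterion can be sketched as follows. The track layout in `track_positions` (pulse k on positions 2k + 8i, shifted by one sample for the odd grid) is an assumption based on the layout commonly quoted for this codec, since the position table is not reproduced in this text and should be checked against the standard. The remaining functions compute d = H^T r, Φ = H^T H and τ = C²/ε for one candidate code vector. All names are hypothetical.

```python
SUBFRAME = 60


def track_positions(track, grid_shift=0):
    """Candidate positions for pulse `track` (0..3).

    ASSUMPTION: positions are 2*track + 8*i for i = 0..7, and grid_shift=1
    moves all pulses to the odd grid. Positions >= 60 fall outside the
    subframe and would be discarded.
    """
    return [2 * track + 8 * i + grid_shift for i in range(8)]


def correlation(r, h):
    """d[j] = sum_n r[n] * h[n - j], i.e. d = H^T r."""
    L = len(r)
    return [sum(r[n] * h[n - j] for n in range(j, L)) for j in range(L)]


def covariance(h, L):
    """phi[i][j] = sum_n h[n - i] * h[n - j], i.e. Phi = H^T H."""
    return [[sum(h[n - i] * h[n - j] for n in range(max(i, j), L))
             for j in range(L)] for i in range(L)]


def tau(d, phi, signs, positions):
    """Selection criterion C**2 / eps for one candidate code vector."""
    c = sum(s * d[m] for s, m in zip(signs, positions))
    eps = sum(si * sj * phi[mi][mj]
              for si, mi in zip(signs, positions)
              for sj, mj in zip(signs, positions))
    return c * c / eps


# Evaluate one candidate with one pulse per track.
h = [0.8 ** n for n in range(SUBFRAME)]
r = [((7 * n) % 13) - 6.0 for n in range(SUBFRAME)]
d = correlation(r, h)
phi = covariance(h, SUBFRAME)
candidate_tau = tau(d, phi, [1, -1, 1, -1], [0, 10, 20, 30])
```

Precomputing d and Φ once per subframe is what makes the search affordable: each candidate then costs only a handful of table lookups and additions instead of a full filtering operation.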
3 Algorithm Implementation
3.1 Hardware Design
The system block diagram is shown in Figure 1.
The analog voice signal is digitized by the A/D converter of the TP3057 and sent to the ADSP-2181 (sampling frequency 8 kHz). The TP3057 is an A-law codec produced by National Semiconductor (USA). It is a monolithic A-law PCM encoder/decoder/filter built around A/D and D/A conversion, with a serial PCM interface. The encoding part contains an amplifier with adjustable input gain, an active RC prefilter, an auto-zero circuit and an A-law compressing encoder. The decoding part contains an A-law decoder and a low-pass filter with a cutoff frequency of 3400 Hz. The former reconstructs the analog signal from the A-law compressed signal; the latter corrects the sin x/x response of the decoder output and removes high-frequency components.
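Because the codec delivers A-law compressed samples while the G.723.1 encoder works on linear PCM, the samples have to be expanded somewhere between the two. The text does not say where this happens in the design. The sketch below shows the widely used table-free A-law expansion to a 16-bit scale, purely as an illustration; the function name is hypothetical.

```python
def alaw_to_linear(code):
    """Expand one 8-bit A-law code to a signed linear sample (16-bit scale)."""
    code ^= 0x55                      # undo the even-bit inversion
    t = (code & 0x0F) << 4            # mantissa
    seg = (code & 0x70) >> 4          # segment (exponent)
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t += 0x108
        t <<= seg - 1
    # In A-law the sign bit is set for positive samples.
    return t if (code & 0x80) else -t


# Expand a short buffer of received codes.
received = [0xD5, 0x55, 0xAA, 0x2A]
linear = [alaw_to_linear(c) for c in received]
```

The expansion is cheap enough to run per sample, and a 256-entry lookup table is a common alternative when memory allows.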
The ADSP-2181 is a high-performance single-chip microcomputer from Analog Devices, designed for high-speed digital signal processing. In addition to three computational units, two data address generators and a program sequencer, the ADSP-2181 contains two serial ports, a 16-bit internal DMA (IDMA) port, an 8-bit byte DMA (BDMA) port, a programmable timer, external interrupt capability, and on-chip program and data memory. The chip integrates 80K bytes of memory: 16K words of 24-bit program memory and 16K words of 16-bit data memory.
The automatic receive and transmit function of the IDMA port makes data exchange between the ADSP-2181 and the host CPU straightforward. The PC loads the program into the internal memory of the ADSP-2181 through the IDMA port. While the ADSP-2181 runs at full speed, the host can query its status, read the compressed bit stream, and write in the data to be decoded.
3.2 Software Design
The software design includes three modules: interface module, encoding module and decoding module.
The interface module realizes the data exchange between ADSP-2181 and the main CPU. This module includes the main control program of DSP and data transmission. The main control program of DSP is responsible for dividing the collected voice data into frames and sending them to the encoder, and sending the received code stream to the decoding module after classification. The data transmission part is responsible for collecting data and exchanging data with the main CPU.
According to the fixed-point algorithm of the ITU-T G.723.1 standard, the DSP program is divided into three modules: initialization (G723-Init), encoding (G723-Incode), and decoding (G723-Decode).
G723-Incode encodes one frame of 240 samples and returns 12 or 10 words of binary data.
The input data is read from the serial port and placed in the array G723-Enc-Inp; the return value is placed in the array G723-Enc-Out. The output is 12 words long at 6.3 kbps and 10 words long at 5.3 kbps, packed in the format defined by the G.723.1 standard.
G723-Decode reconstructs 240 speech samples from the received 12-word or 10-word packed data. The input data is placed in the array G723-Dec-Inp, and the output is placed in the array G723-Dec-Out.
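The word counts above follow from the frame sizes. The sketch below assumes the bit allocations usually given for G.723.1 (189 bits per frame at 6.3 kbps and 158 bits at 5.3 kbps), which are not stated in this text, and shows how they round up to octets and 16-bit words. The low-octet-first packing order is also an assumption, and the function names are hypothetical.

```python
FRAME_MS = 30

# ASSUMPTION: bits per coded frame, as usually quoted for G.723.1.
BITS_PER_FRAME = {"6.3k": 189, "5.3k": 158}


def octets_per_frame(bits):
    """Round the frame up to whole octets."""
    return (bits + 7) // 8


def words_per_frame(bits):
    """Round the frame up to whole 16-bit words."""
    return (octets_per_frame(bits) + 1) // 2


def bit_rate_bps(bits):
    """Net bit rate for one frame every 30 ms."""
    return bits * 1000.0 / FRAME_MS


def pack_octets_into_words(octets):
    """Pack an even number of octets into 16-bit words, low octet first.

    ASSUMPTION: the byte order is chosen for illustration only.
    """
    return [octets[i] | (octets[i + 1] << 8) for i in range(0, len(octets), 2)]


high_rate_words = words_per_frame(BITS_PER_FRAME["6.3k"])
low_rate_words = words_per_frame(BITS_PER_FRAME["5.3k"])
```

With these assumptions the high rate needs 24 octets and the low rate 20 octets per frame, which matches the 12-word and 10-word buffers described above.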
The host program is written in Visual C++ and communicates with the DSP through the serial port.
3.3 C language optimization
Development used VisualDSP++, the integrated development and simulation environment from Analog Devices. In practice, however, the C compiler typically accomplishes only about 70% of the work; the remaining 30% must be achieved through further optimization in hand-written assembly.
3.3.1 Loop unrolling
When developing software for a DSP with parallel execution capabilities, an important idea is to exploit the word length and the multiple computational units of the DSP by unrolling loop bodies where possible. Executing more operations per iteration reduces the total number of iterations and the loop overhead, and lets more operations be issued in the same clock cycle, which improves loop efficiency.
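The transformation itself can be shown on a dot product, the core operation of the correlation and filtering loops in the codec. The sketch below is written in Python only to show the structure of a 4-way unrolled loop and to confirm that it computes the same result; it does not demonstrate the speed-up, which depends on the DSP's parallel units and on the compiler or assembly code. The function names are hypothetical.

```python
def dot_rolled(a, b):
    """Straightforward loop: one multiply-accumulate per iteration."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc


def dot_unrolled4(a, b):
    """Same computation with the loop body unrolled four times."""
    n = len(a)
    main = n - n % 4                  # largest multiple of 4 not above n
    acc0 = acc1 = acc2 = acc3 = 0     # independent accumulators
    for i in range(0, main, 4):
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
    acc = acc0 + acc1 + acc2 + acc3
    for i in range(main, n):          # leftover elements
        acc += a[i] * b[i]
    return acc


# Integer data, as in a fixed-point implementation, gives identical results.
x = [(3 * n) % 17 - 8 for n in range(62)]
y = [(5 * n) % 23 - 11 for n in range(62)]
same = dot_rolled(x, y) == dot_unrolled4(x, y)
```

The separate accumulators remove the dependency between consecutive multiply-accumulate operations, which is what allows parallel hardware to overlap them. With floating-point data the summation order changes, so results may differ in the last bits.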
3.3.2 Improve the utilization of registers
The computational units inside the DSP are very efficient, but frequent data exchange between registers and the data bus greatly reduces execution efficiency, because memory operations often cost several cycles of delay, for example 4 cycles for a Load instruction and 2 cycles for a Store instruction. To reduce time-consuming memory operations, frequently used data can be placed in registers before the program enters the loop body and then reused on every iteration. In practice, this method improves efficiency to some extent.
4 Experimental results
All code has passed the ITU-T test vectors.
The test results show that the high bit rate (6.3 kbps) requires 24.8 MIPS and the low bit rate (5.3 kbps) requires 21.3 MIPS. This implementation is suitable for IP phones and video conferencing.