1 Introduction
In March 1996, the U.S. Department of Defense Digital Voice Processor Consortium (DDVPC) selected the 2.4 kbps Mixed Excitation Linear Prediction (MELP) speech coder as the new standard for narrowband secure voice products and related applications. Because MELP offers good speech quality at a very low bit rate and is robust to channel errors, it can be used in IP telephony, mobile communications, satellite communications and other fields. It is especially well suited to applications that store large amounts of speech or require secure communication.
A coding algorithm can be implemented in software or in hardware. A software implementation is more flexible, but it runs more slowly and generally cannot meet real-time requirements. Hardware implementations fall into two types: dedicated and general-purpose. The general-purpose approach implements the coding algorithm on a general-purpose digital signal processor (DSP) chip, which offers small size, low power consumption, and high computing speed. It also remains flexible, because the software is easy to modify and the same chip can run a variety of algorithms, including complex ones. This makes it well suited to compressing speech, video, and similar signals.
The MELP algorithm is computationally complex, so a real-time implementation must rely on a high-performance DSP chip. At present, no dedicated vocoder chip is available in China. Therefore, weighing power consumption against performance, this paper adopts the general-purpose approach and selects TI's TMS320VC5416 DSP as the main processor, which carries out the main functions of the vocoder.
2 MELP Codec Algorithm
2.1 Coding
The encoder is based on linear predictive analysis and synthesis. It operates at a sampling rate of 8 kHz and encodes each frame of 180 samples (22.5 ms). The overall block diagram is shown in Figure 1.
The input speech signal first passes through a DC-blocking (high-pass) filter to give the target signal s(n), which is then processed as follows:
① After low-pass filtering, a coarse pitch estimate is obtained by the normalized cross-correlation method; the fractional pitch is then estimated from the [0 Hz, 500 Hz] subband signal in the neighborhood of the coarse estimate.
② Bandpass voicing analysis: the voicing strength is computed in each of 5 subbands to make the voiced/unvoiced decision for that subband, and the [0 Hz, 500 Hz] subband voicing strength is used to set the aperiodic flag.
③ LPC and peakiness calculation: the Levinson-Durbin algorithm extracts 10 LP coefficients, which are multiplied by the bandwidth expansion coefficient; the resulting coefficients are used to compute the residual signal, and the peakiness is computed over 160 samples of the residual.
④ The residual is low-pass filtered with a 6th-order Butterworth filter with a 1 kHz cutoff, and the final pitch period is searched by combining the pitch of the previous subframe with the fractional pitch of the current subframe.
⑤ The gain is computed with a pitch-adaptive window twice per frame and quantized.
⑥ The LPC coefficients are converted to line spectrum pairs (LSP), and the LSP parameters are quantized.
⑦ The quantized LSP parameters are converted back to LPC parameters and used for inverse filtering; the residual is zero-padded to 512 points, a 512-point FFT is applied, and a spectral peak-picking algorithm finds the Fourier magnitudes of the first 10 harmonics.
Figure 1. MELP encoding principle diagram
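Step ③ of the encoder can be illustrated with a minimal floating-point sketch. It is not the fixed-point code that runs on the DSP; the function names are hypothetical, the sign convention A(z) = 1 + Σ a[k] z^-k is a choice made here, and the bandwidth expansion factor is left as a parameter because the text does not state its value.

```python
import math


def autocorrelation(frame, order):
    """Return the autocorrelation lags r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1 and A(z) = 1 + sum_k a[k] z^-k,
    and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # degenerate (e.g. all-zero) frame
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err


def bandwidth_expand(a, gamma):
    """Scale a[i] by gamma**i (gamma slightly below 1 widens formants)."""
    return [c * gamma ** i for i, c in enumerate(a)]


def residual(frame, a):
    """Inverse filter A(z) with zero history: e[n] = sum_k a[k] * s[n-k]."""
    order = len(a) - 1
    return [sum(a[k] * frame[n - k] for k in range(min(order, n) + 1))
            for n in range(len(frame))]


def peakiness(e):
    """Ratio of RMS value to mean absolute value of the residual."""
    n = len(e)
    mean_abs = sum(abs(x) for x in e) / n
    if mean_abs == 0.0:
        return 0.0
    return math.sqrt(sum(x * x for x in e) / n) / mean_abs
```

In use, one frame would go through `autocorrelation(frame, 10)`, `levinson_durbin`, `bandwidth_expand`, and `residual` in that order, with `peakiness` applied to 160 residual samples. A single isolated pulse gives a high peakiness value, while a constant-magnitude residual gives 1.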
2.2 Decoding
The decoder recovers all parameters of each frame from the data received over the channel. If the frame is judged to be a relatively quiet speech frame, the gains of its two subframes are adjusted for noise attenuation and the noise estimate is updated at the same time. All synthesis parameters are then interpolated pitch-synchronously. The interpolated parameters are the pitch period, gain, LSF coefficients, jitter strength, quantized Fourier magnitudes, the coefficients of the pulse (periodic) filter and the noise filter used to generate the mixed excitation, and the spectral tilt coefficient of the adaptive enhancement filter. After interpolation, the periodic pulse excitation and the noise excitation are each passed through their shaping filters and added to give the mixed excitation signal. The mixed excitation is then processed by the adaptive spectral enhancement filter, which improves the shape of the formant peaks, and passed through LPC synthesis to produce the synthesized speech. LPC synthesis uses a direct-form filter whose coefficients are obtained from the interpolated LSP parameters. The synthesized speech is output after gain adjustment and pulse dispersion filtering. The overall block diagram is shown in Figure 2.
Figure 2 MELP decoding principle diagram
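The direct-form LPC synthesis stage of the decoder can be sketched as an all-pole filter. This is a floating-point illustration only; the function name is hypothetical, the filter starts from zero history, and the convention A(z) = 1 + Σ a[k] z^-k with a[0] = 1 is assumed here.

```python
def lpc_synthesize(excitation, a):
    """Direct-form all-pole filter 1/A(z).

    s[n] = e[n] - sum_{k=1..p} a[k] * s[n-k], with a[0] == 1
    and zero initial filter state.
    """
    order = len(a) - 1
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * out[n - k]   # feedback from past outputs
        out.append(acc)
    return out
```

For example, with `a = [1.0, -0.5]` a unit impulse produces the decaying sequence 1, 0.5, 0.25, 0.125. In a real decoder the filter state would be carried from one pitch period to the next rather than reset.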
3 Introduction to TMS320VC5416
The overall architecture of the TMS320VC5416 is shown in Figure 4. Its high-performance CPU contains an arithmetic logic unit (ALU), two 40-bit accumulators (ACCA and ACCB), a 40-bit barrel shifter, a multiply-accumulate (MAC) unit, and an addressing unit. The arithmetic section comprises a 40-bit ALU, a compare, select and store unit (CSSU) and an exponent encoder, and offers a high degree of parallelism. The TMS320VC5416 used in this paper can address up to 192K words (64K words of program space, 64K words of data space and 64K words of I/O space), and its extended addressing mode provides an extended address space of 256K words to 8M words. It also has an efficient and flexible instruction set. With an instruction cycle of 6.25 ns, it can execute up to 160 MIPS, which fully meets the requirements of real-time processing.
Figure 4 TMS320VC5416 overall system architecture
4 Software Design and Its Key Issues
The software design covers the encoding process and the decoding process. The encoding flow chart is shown in Figure 3; because the decoding process is relatively simple, only the encoding flow chart is given here.
The software flow follows the MELP principle exactly. The following key issues need attention during actual programming.
Figure 3 MELP encoding flow chart
(1) Memory allocation
The TMS320VC5416 has a dual-bus structure and provides many multi-function instructions. An implementation should take full advantage of these features: use multi-function instructions wherever possible, and allocate registers and pointers carefully. For example, the MAC instruction completes a multiply and an add in one instruction cycle, and with a suitable arrangement of registers it can perform a run of consecutive multiply-accumulate operations without storing intermediate results, which greatly improves efficiency. The dedicated hardware structures, addressing modes and special instructions of the TMS320VC5416 should also be fully exploited, for example circular addressing, dual-operand addressing, the EXP and NORM instructions, and rounding. Used properly, they greatly improve software efficiency.
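The combination of circular addressing and consecutive multiply-accumulate operations can be modeled at a high level as an FIR filter over a circular delay line. The sketch below is a plain Python model, not TMS320VC5416 assembly; the class name is hypothetical, and Q15 samples and coefficients with a rounded Q30-to-Q15 conversion are assumptions made for the illustration.

```python
class CircularFir:
    """FIR filter using a circular delay line and a running accumulator."""

    def __init__(self, coeffs_q15):
        self.coeffs = list(coeffs_q15)
        self.delay = [0] * len(self.coeffs)
        self.head = 0                      # index of the newest sample

    def step(self, sample_q15):
        n = len(self.delay)
        # Move the write pointer instead of shifting the whole buffer,
        # which is what circular addressing does in hardware.
        self.head = (self.head - 1) % n
        self.delay[self.head] = sample_q15

        acc = 0                            # models the wide accumulator
        idx = self.head
        for c in self.coeffs:
            acc += c * self.delay[idx]     # one multiply-accumulate per tap
            idx = (idx + 1) % n            # pointer wraps at the buffer end
        # Q15 x Q15 products are Q30: round and scale back to Q15.
        return (acc + (1 << 14)) >> 15
```

With the two-tap averaging filter `CircularFir([16384, 16384])`, feeding 1000 and then 3000 yields 500 and 2000. Intermediate products stay in the accumulator for the whole loop, which mirrors the point made above about not storing intermediate results.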
(2) Scaling of numbers
The TMS320VC5416 performs fixed-point arithmetic, and its operands are generally treated as integers; its instructions, however, support two modes of operation: fractional mode and integer mode. The numbers the DSP computes with are 16-bit integers, whereas the quantities that arise in the mathematics are usually not integers, so the programmer must decide where the binary point lies, that is, how each number is scaled. Two notations are used for scaling on the TMS320VC5416: Q notation and S notation. This software uses Q notation.
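As an illustration of Q notation, the sketch below uses Q15, in which a 16-bit integer v represents the value v / 2^15. The text does not say which Q formats the software uses for which variables, so Q15 and the helper names here are assumptions.

```python
Q15_ONE = 1 << 15          # 32768 represents 1.0 in Q15
Q15_MAX = 32767
Q15_MIN = -32768


def clamp_q15(v):
    """Limit an integer to the 16-bit signed range."""
    return max(Q15_MIN, min(Q15_MAX, v))


def to_q15(x):
    """Convert a real number in [-1, 1) to Q15, rounding to nearest."""
    return clamp_q15(int(round(x * Q15_ONE)))


def from_q15(v):
    """Convert a Q15 integer back to a real number."""
    return v / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 numbers and return a Q15 result.

    The integer product is Q30, so it is rounded and shifted right
    by 15 bits. (-1.0) * (-1.0) is the one case that does not fit
    in Q15, so the result is clamped.
    """
    return clamp_q15((a * b + (1 << 14)) >> 15)
```

For example, `q15_mul(to_q15(0.5), to_q15(-0.25))` gives -4096, which `from_q15` converts back to -0.125. Choosing the Q format per variable is a trade-off: more fractional bits give finer resolution but a smaller representable range.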
Programs often need to check whether an operation has overflowed. The TMS320VC5416 provides overflow protection in hardware: saturation is performed automatically when the OVM bit of status register ST1 is set, which can be done once at the start of the program. When an overflow then occurs, the accumulator is loaded with the saturation value (the most positive value 00 7FFF FFFFh on positive overflow, the most negative value FF 8000 0000h on negative overflow), which prevents overflow from seriously degrading accuracy.
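The benefit of saturation over wraparound can be seen in a small model. The sketch assumes saturation to the 32-bit signed limits and ignores the accumulator's guard bits; the function names are hypothetical and this is not a cycle-accurate model of the chip.

```python
ACC_MAX = 0x7FFFFFFF       # most positive 32-bit value
ACC_MIN = -0x80000000      # most negative 32-bit value


def saturate32(value):
    """Clamp a result to the 32-bit signed range (overflow mode on)."""
    if value > ACC_MAX:
        return ACC_MAX
    if value < ACC_MIN:
        return ACC_MIN
    return value


def wrap32(value):
    """Two's-complement wraparound (what an unprotected 32-bit add does)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sat_add(a, b):
    """Saturating 32-bit addition."""
    return saturate32(a + b)
```

Adding 1 to `ACC_MAX` with `sat_add` stays at `ACC_MAX`, while `wrap32(ACC_MAX + 1)` flips to the most negative value. The saturated result has a small error; the wrapped result has the wrong sign, which is why saturation limits the damage to accuracy.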
(3) Preventing pipeline conflicts
The pipeline is the most distinctive feature of the TMS320VC5416 and contributes greatly to its performance. However, pipeline conflicts are likely to occur when instructions in different pipeline stages use the same DSP resource at the same time, or when certain registers are accessed. In such cases the compiler automatically inserts one or more NOP instructions, which increases the cycle count and reduces software efficiency. Pipeline conflicts should therefore be avoided during software design and development.
5 Test Results
The codec has passed all of the MELP test vectors. With the codec running in real time, informal subjective tests give the MELP algorithm a MOS score of about 3.3, and its intelligibility, naturalness and noise robustness are significantly better than those of the traditional LPC algorithm. Tables 1 and 2 give, respectively, the storage and the computation required to run the MELP codec in real time on the fixed-point TMS320VC5416 DSP.
As Table 1 shows, the program and data storage areas total 25.2K words. Since the TMS320VC5416 has 128K words of internal RAM, all program code and data can be loaded into on-chip RAM in a single pass at boot time. Table 2 shows the processing resources used by the vocoder. In full-duplex operation, the peak computational load is 39.9 MIPS, which meets the requirements of real-time implementation.
These results show that a single TMS320VC5416 chip can encode and decode up to 4 voice channels, and the remaining on-chip resources can be used for additional functions.
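The arithmetic behind these figures can be checked with a short script. All numbers are taken from the text (6.25 ns instruction cycle, 39.9 MIPS full-duplex load, 25.2K words of storage, 128K words of internal RAM, 180-sample frames at 8 kHz and 2.4 kbps); the function names are hypothetical, and the channel count considers processing load only, since the text does not say how memory use grows with the number of channels.

```python
import math


def peak_mips(cycle_ns):
    """Instructions per second, in millions, for a given cycle time."""
    return 1000.0 / cycle_ns


def max_channels(cycle_ns, load_mips_per_channel):
    """Whole number of full-duplex channels that fit the MIPS budget."""
    return math.floor(peak_mips(cycle_ns) / load_mips_per_channel)


def ram_fraction(used_kwords, total_kwords):
    """Fraction of on-chip RAM occupied by program and data."""
    return used_kwords / total_kwords


def bits_per_frame(bit_rate_bps, frame_samples, sample_rate_hz):
    """Bits available to encode one frame at the given bit rate."""
    return bit_rate_bps * frame_samples // sample_rate_hz


if __name__ == "__main__":
    print("peak MIPS:", peak_mips(6.25))
    print("channels:", max_channels(6.25, 39.9))
    print("RAM used: {:.1%}".format(ram_fraction(25.2, 128)))
    print("bits per frame:", bits_per_frame(2400, 180, 8000))
```

The MIPS budget divides into four full-duplex channels with little processing headroom left over, while a single codec instance occupies roughly a fifth of the internal RAM.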
6 Conclusion
Innovation: This paper introduces the mixed excitation linear prediction (MELP) vocoder algorithm and briefly analyzes its encoding and decoding principles. It then describes a real-time implementation on TI's TMS320VC5416 DSP and points out the key issues that need attention in the software implementation. Informal subjective tests show that the naturalness, intelligibility and noise robustness of the algorithm are significantly better than those of the traditional LPC algorithm. The vocoder is suitable for short-wave narrowband digital secure communications, wireless communications and other low-rate speech coding applications, and has broad application prospects.