Algorithm Design of MELP Vocoder Using DSP Chip-EEWORLD


1 Introduction

In March 1996, the U.S. Department of Defense Digital Voice Processing Consortium (DDVPC) selected the 2.4 kbps Mixed Excitation Linear Prediction (MELP) speech coder as the new standard for narrowband secure voice coding products and applications. MELP combines good speech quality, a very low bit rate and good robustness to channel errors. This makes it suitable for IP telephony, mobile communications, satellite communications and similar fields. It is especially attractive where large amounts of speech must be stored or where communications must be kept secure, and it has good development prospects.

A speech coding algorithm can be implemented in software or in hardware. A pure software implementation is more flexible, but it runs more slowly and generally cannot meet real-time requirements. Hardware implementations fall into two categories: dedicated and general-purpose. The general-purpose approach runs the coding algorithm on a general-purpose digital signal processor (DSP) chip, which offers small size, low power consumption and high computing speed. Its flexibility lies mainly in how easily the software can be modified, so that different algorithms, including complex ones, can run on the same hardware. This makes it well suited to compressing speech, video and similar signals.

The MELP algorithm is computationally complex, so a real-time implementation must rely on a high-performance DSP chip. At present there is no dedicated vocoder chip for this kind of algorithm in China. Weighing power consumption against performance, this paper therefore takes the general-purpose approach and uses TI's TMS320VC5416 DSP as the main processor to carry out the main functions of the vocoder.

2 MELP Codec Algorithm

2.1 Encoding

The encoder is based on linear prediction analysis and synthesis. Speech is sampled at 8 kHz and encoded in frames of 180 samples (22.5 ms). The overall block diagram is shown in Figure 1.
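The frame arithmetic can be checked with a few lines of C. This is only an illustrative sketch: the macro and function names are hypothetical, and the 2.4 kbps rate is taken from the introduction.

```c
/* Frame parameters quoted in the text: 8 kHz sampling, 180-sample frames,
 * and the 2.4 kbps MELP rate from the introduction. */
#define SAMPLE_RATE_HZ 8000L
#define FRAME_SAMPLES   180L
#define BIT_RATE_BPS   2400L

/* Frame duration in microseconds: 180 / 8000 s = 22.5 ms = 22500 us. */
static long frame_duration_us(void)
{
    return FRAME_SAMPLES * 1000000L / SAMPLE_RATE_HZ;
}

/* Bits available to code one frame: 2400 bit/s * 22.5 ms = 54 bits. */
static long bits_per_frame(void)
{
    return BIT_RATE_BPS * FRAME_SAMPLES / SAMPLE_RATE_HZ;
}
```

Every quantized parameter of a frame therefore has to fit into 54 bits.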

The input speech is first passed through a DC-blocking (high-pass) filter to obtain the target signal s(n), which is then processed as follows:
① Pitch estimation: after low-pass filtering, an integer pitch is roughly estimated with the normalized cross-correlation method; a fractional pitch is then estimated from the [0 Hz, 500 Hz] subband signal in the neighborhood of this rough estimate.
② Bandpass voicing analysis: the voicing strength is calculated in 5 subbands to make a voiced/unvoiced decision for each subband; the voicing strength of the [0 Hz, 500 Hz] subband is also used to set the aperiodic flag.
③ LPC analysis and peakiness: 10 LP coefficients are extracted with the Levinson-Durbin (L-D) algorithm and multiplied by bandwidth-expansion factors; the resulting coefficients are used to compute the residual signal, and the peakiness of 160 residual samples is calculated.
④ Final pitch: the residual is low-pass filtered by a 6th-order Butterworth filter with a 1 kHz cutoff, and the final pitch period is searched using the pitch of the previous subframe and the fractional pitch of the current subframe.
⑤ Gain: using a pitch-adaptive window, the gain is measured twice per frame and quantized.
⑥ LSP quantization: the LPC coefficients are converted into line spectral pairs and the LSP parameters are quantized.
⑦ Fourier magnitudes: the quantized LSP parameters are converted back to LPC coefficients for inverse filtering; the residual is zero-padded to 512 points, a 512-point FFT is computed, and a spectral peak-picking algorithm finds the Fourier magnitudes of the first 10 pitch harmonics.

Figure 1. MELP encoder coding principle diagram

2.2 Decoding

The decoder recovers all the parameters of each frame from the bits received over the channel. If a frame is judged to be a relatively quiet (background-noise) frame, the gains of its two subframes are attenuated for noise suppression, and the noise estimate is updated at the same time. All synthesis parameters are then interpolated pitch-synchronously. The interpolated parameters include the pitch period, gain, LSF coefficients, bandpass voicing strengths, quantized Fourier magnitudes, the coefficients of the pulse (periodic) and noise shaping filters used to generate the mixed excitation, and the spectral tilt coefficient of the adaptive spectral enhancement filter. After interpolation, the periodic pulse excitation and the noise excitation are each filtered by their shaping filters and added together to form the mixed excitation signal. The mixed excitation is then passed through the adaptive spectral enhancement filter, which sharpens the formant (resonance peak) structure, and is fed to LPC synthesis to obtain the synthesized speech. LPC synthesis uses a direct-form filter whose coefficients are derived from the interpolated LSF parameters. The synthesized speech is output after gain adjustment and pulse dispersion filtering. The overall block diagram is shown in Figure 2.

Figure 2. MELP decoder principle diagram
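The direct-form LPC synthesis step can be illustrated with the C sketch below. It assumes the predictor sign convention A(z) = 1 − Σ a[i]·z^(−i) and keeps the filter memory in a struct, so the output stays continuous when the coefficients are replaced for each pitch period after interpolation. The type and function names are hypothetical, and the interpolation itself, excitation generation, and the enhancement and dispersion filters are not shown.

```c
#include <stddef.h>
#include <string.h>

#define LPC_ORDER 10

/* State of the all-pole synthesis filter 1/A(z): the last LPC_ORDER
 * output samples, most recent first. It persists across calls. */
typedef struct {
    double mem[LPC_ORDER];
} lpc_synth_state;

static void lpc_synth_init(lpc_synth_state *st)
{
    memset(st->mem, 0, sizeof st->mem);
}

/* Direct-form synthesis: out[k] = exc[k] + sum_{i=1}^{p} a[i] * out[k-i].
 * a[0] is unused; a[1..LPC_ORDER] may change between calls. */
static void lpc_synthesize(lpc_synth_state *st, const double *a,
                           const double *exc, double *out, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        double y = exc[k];
        for (int i = 1; i <= LPC_ORDER; i++)
            y += a[i] * st->mem[i - 1];
        /* shift the output history by one and store the new sample */
        memmove(&st->mem[1], &st->mem[0], (LPC_ORDER - 1) * sizeof(double));
        st->mem[0] = y;
        out[k] = y;
    }
}
```

With a[1] = 0.5 and an impulse input, the output decays as 1, 0.5, 0.25, 0.125, and a following call with zero input continues the decay (0.0625, ...) because the state is carried over.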

3 Introduction to TMS320VC5416

The overall architecture of the TMS320VC5416 is shown in Figure 4. Its high-performance CPU contains an arithmetic logic unit (ALU), two 40-bit accumulators (ACCA and ACCB), a 40-bit barrel shifter, a multiply-accumulate (MAC) unit and an address generation unit. The arithmetic logic includes a 40-bit ALU, a compare, select and store unit (CSSU) and an exponent encoder, giving a high degree of parallelism. The TMS320VC5416 used in this article can address up to 192K words in its basic address space (64K words each of program, data and I/O space); in extended addressing mode the address space extends from 256K words up to 8M words. It also has an efficient and flexible instruction set. Its instruction cycle time is 6.25 ns, giving an execution speed of up to 160 MIPS, which is ample for real-time processing.

Figure 4 TMS320VC5416 overall system architecture

4 Software Design and Its Key Issues

The software design covers both the encoding and the decoding process. Because decoding is comparatively simple, only the encoding flow chart is given here, in Figure 3.

The software flow follows the MELP algorithm exactly. The following key issues require attention during actual programming.

Figure 3 MELP encoding flow chart

⑴Memory allocation problem

The TMS320VC5416 has a multiple-bus structure with separate program and data buses and provides many multi-function instructions. These features should be fully exploited in the implementation: multi-function instructions should be used wherever possible, and registers and pointers should be allocated sensibly. For example, the MAC instruction performs a multiply and an add in a single instruction cycle; with a sensible register assignment, a chain of MAC instructions can accumulate a whole sum of products without storing intermediate results, which greatly improves efficiency. It is also necessary to make full use of the dedicated hardware, addressing modes and special instructions of the TMS320VC5416, such as circular addressing, dual-operand addressing, the EXP and NORM instructions and hardware rounding. Used properly, these can greatly improve software efficiency.
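The C sketch below mimics, in portable code, what circular addressing and a chain of MAC instructions do in an FIR filter. The delay line is indexed modulo its length instead of being shifted, and every Q15×Q15 product goes into one wide accumulator without intermediate stores. It is an illustration only: int64_t stands in for the 40-bit accumulator (it is wider, so this sketch never overflows inside the loop), the filter length and names are hypothetical, and right shifts of negative values are assumed to be arithmetic, as they are on common compilers.

```c
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 8   /* hypothetical filter length for the sketch */

typedef struct {
    int16_t delay[FIR_TAPS];  /* circular delay line of Q15 samples */
    size_t  head;             /* index of the newest sample */
} fir_q15;

/* Push one Q15 sample through an FIR filter with Q15 coefficients h[]. */
static int16_t fir_q15_step(fir_q15 *f, const int16_t *h, int16_t x)
{
    f->head = (f->head + 1) % FIR_TAPS;   /* circular pointer update */
    f->delay[f->head] = x;

    int64_t acc = 0;                      /* stands in for the accumulator */
    size_t idx = f->head;
    for (size_t k = 0; k < FIR_TAPS; k++) {
        acc += (int32_t)h[k] * (int32_t)f->delay[idx];  /* Q15*Q15 = Q30 */
        idx = (idx == 0) ? FIR_TAPS - 1 : idx - 1;      /* wrap backwards */
    }
    /* Round the Q30 sum to Q15, then saturate to the 16-bit range. */
    acc = (acc + (1 << 14)) >> 15;
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

On the DSP itself the modulo index update can be done by the address generator in circular mode, so the inner loop can reduce to a repeated MAC.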

⑵Scaling of numbers

The TMS320VC5416 is a fixed-point processor, and its operands are normally represented as 16-bit integers. Its instructions, however, support two arithmetic modes: fractional mode and integer mode. Because most quantities that arise during the computation are not integers, the programmer must decide where the binary point lies in each 16-bit word; this is called the scaling of numbers. Two notations are used for scaling on the TMS320VC5416: Q notation and S notation. This software uses Q notation.
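As an illustration of Q notation, the sketch below converts real values to Q15 (one sign bit, 15 fractional bits) and multiplies two Q15 numbers the way fractional mode with rounding does, keeping the rounded upper half of the doubled product. It also counts the redundant sign bits of a 32-bit value, which is the job the EXP/NORM pair does before divisions or square roots. The function names are hypothetical, saturating (−1)×(−1) to 7FFFh is a choice made in this sketch, and arithmetic right shifts are assumed.

```c
#include <stdint.h>

/* Real value in [-1, 1) -> Q15, rounding to nearest and saturating. */
static int16_t to_q15(double v)
{
    double s = v * 32768.0;
    s = (s >= 0.0) ? s + 0.5 : s - 0.5;   /* round half away from zero */
    if (s >= 32767.0)
        return INT16_MAX;
    if (s <= -32768.0)
        return INT16_MIN;
    return (int16_t)s;                    /* cast truncates toward zero */
}

/* Q15 x Q15 -> Q15. The Q30 product is rounded and shifted right by 15,
 * which equals doubling it to Q31, adding 0x8000 and keeping the high word. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* Q30 product */
    if (p == 0x40000000)                  /* (-1) * (-1) would overflow */
        return INT16_MAX;
    return (int16_t)((p + (1 << 14)) >> 15);
}

/* Left shifts needed to normalize a nonzero 32-bit value, i.e. the number
 * of redundant sign bits (what EXP computes and NORM applies). */
static int norm_shift32(int32_t x)
{
    if (x == 0)
        return 0;
    uint32_t u = (x < 0) ? ~(uint32_t)x : (uint32_t)x;
    int n = 0;
    while ((u & 0x40000000u) == 0 && n < 31) {
        u <<= 1;
        n++;
    }
    return n;
}
```

For example, 0.5 is stored as 16384 (4000h), and 0.5 × 0.5 gives 8192 (2000h), which is 0.25 in Q15.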

The program frequently needs to guard against overflow of arithmetic results. The TMS320VC5416 provides hardware overflow protection, enabled by setting the OVM (overflow mode) bit in status register ST1 (for example with SSBX OVM); this can be done once at the start of the program. With OVM set, an accumulator result that overflows is replaced by a saturation value: the most positive 32-bit value (007FFFFFFFh) on positive overflow or the most negative 32-bit value (FF80000000h) on negative overflow, so the high word stored to memory is 7FFFh or 8000h. This prevents an overflow from wrapping around and severely degrading accuracy.
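The effect of overflow mode can be emulated in C as below: the sum is formed in a wider type and clamped to the 32-bit range, so the stored high word saturates at 7FFFh or 8000h instead of wrapping around. This is a sketch of the behavior described above, not TI code; the function names are hypothetical, only the 32-bit part of the 40-bit accumulator is modeled, and an arithmetic right shift is assumed.

```c
#include <stdint.h>

/* Accumulator add with overflow mode on: clamp to the 32-bit range,
 * i.e. to 7FFFFFFFh or 80000000h instead of wrapping around. */
static int32_t add_sat32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* High word of the accumulator, as stored to a 16-bit memory location. */
static int16_t high_word(int32_t acc)
{
    return (int16_t)(acc >> 16);
}
```

Without saturation, 70000000h + 20000000h would wrap to a large negative number; with it, the stored high word is 7FFFh.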

⑶Prevent pipeline conflicts

The pipeline is one of the most distinctive features of the TMS320VC5416 and greatly improves its performance. However, pipeline conflicts can arise when instructions in different pipeline stages need the same DSP resource at the same time, or when certain registers are accessed. To resolve such a conflict, the compiler automatically inserts one or more no-operation (NOP) instructions, which increases the number of cycles required and reduces software efficiency. Pipeline conflicts should therefore be avoided during software design and development.

5 Test Results

The codec has now passed all the MELP test vectors. With the codec running in real time, informal subjective listening tests give a MOS of about 3.3, and its intelligibility, naturalness and robustness to noise are clearly better than those of the traditional LPC algorithm. Tables 1 and 2 give, respectively, the memory and the computational load needed to implement the MELP algorithm in real time on the fixed-point DSP TMS320VC5416.

As Table 1 shows, the program and data memory together occupy 25.2K words. Since the TMS320VC5416 has 128K words of on-chip RAM, all code and data can be copied into internal RAM in one pass at boot time. Table 2 summarizes the processing resources used by the vocoder: in full-duplex operation the peak computational load is 39.9 MIPS, which satisfies the real-time requirement.

These results show that a single TMS320VC5416 can handle up to 4 channels of full-duplex speech encoding and decoding, and that the remaining on-chip resources can be used for other functions.
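The channel count follows from simple arithmetic on the figures quoted above. The sketch below redoes it in integer C, counting in tenths of a MIPS. It assumes that 39.9 MIPS is the worst-case load of one full-duplex channel and that the full 160 MIPS is usable; the names are hypothetical.

```c
/* Figures quoted in the text, in tenths of a MIPS to stay in integers. */
#define DSP_PEAK_DMIPS     1600L   /* 160.0 MIPS (6.25 ns cycle)          */
#define MELP_DUPLEX_DMIPS   399L   /* 39.9 MIPS, one full-duplex channel  */
#define FRAME_US          22500L   /* 180 samples at 8 kHz = 22.5 ms      */

/* Full-duplex MELP channels that fit in the peak rate: 1600 / 399 = 4. */
static long max_duplex_channels(void)
{
    return DSP_PEAK_DMIPS / MELP_DUPLEX_DMIPS;
}

/* Capacity left with the given number of channels, in tenths of a MIPS. */
static long headroom_dmips(long channels)
{
    return DSP_PEAK_DMIPS - channels * MELP_DUPLEX_DMIPS;
}

/* Instruction cycles per 22.5 ms frame at 160 MIPS (160 per microsecond). */
static long cycles_per_frame(void)
{
    return DSP_PEAK_DMIPS / 10L * FRAME_US;
}
```

Four channels need 159.6 MIPS, so with all four running only about 0.4 MIPS is left. At 160 MIPS, one 22.5 ms frame corresponds to 3.6 million instruction cycles.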

6 Conclusion

Innovation: This paper introduces the mixed excitation linear prediction (MELP) vocoder algorithm and briefly analyzes its encoding and decoding principles. It implements the algorithm in real time on TI's TMS320VC5416 DSP and points out the key issues that need attention in the software implementation. Informal subjective tests show that the naturalness, intelligibility and noise robustness of the algorithm are clearly better than those of the traditional LPC algorithm. The coder is suitable for short-wave narrowband digital secure communications, wireless communications and other low-rate speech coding applications, and has broad application prospects.

Keywords: MELP
