Linear prediction analysis is one of the core techniques of modern speech signal processing and has contributed greatly to its rapid development. It is widely used in speech analysis, synthesis, coding, and recognition, and remains one of the most effective speech analysis techniques. For example, G.729, the speech coding algorithm used in VoIP and H.323 multimedia communication systems, is a high-quality speech coding standard based on conjugate-structure algebraic-code-excited linear prediction (CS-ACELP).
1 Basic principles of linear prediction
The model most commonly used in speech signal processing is the all-pole model. The basic idea of linear prediction is that a unique set of predictor coefficients can be determined by minimizing the sum of squared differences between the actual speech samples and their linearly predicted values, that is, in the minimum mean-square-error sense. If a random process is modeled as the output of a p-th order all-pole system excited by white noise, the transfer function of this system is:
$$H(z) = \frac{G}{A(z)} = \frac{G}{1 - \sum_{i=1}^{p} a_i z^{-i}} \tag{1}$$
Where: p is the predictor order; G is the gain of the vocal tract filter. The relationship between the speech sample s(n) and the excitation signal e(n) can therefore be expressed by the following difference equation:
$$s(n) = \sum_{i=1}^{p} a_i \, s(n-i) + G \, e(n)$$
That is, speech samples are correlated with one another, so past sample values can be used to predict future ones. For voiced sounds, the excitation e(n) is a unit impulse train repeating at the pitch period; for unvoiced sounds, e(n) is stationary white noise.
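As an illustration of the source-filter relationship described above, the following Python sketch runs an excitation through the all-pole difference equation, assuming the usual form s(n) = a_1 s(n-1) + ... + a_p s(n-p) + G e(n) with zero initial conditions. The function names `synthesize` and `impulse_train` and the toy coefficient values are hypothetical and chosen only for the example.

```python
def synthesize(a, gain, excitation):
    """All-pole synthesis: s(n) = sum_{i=1..p} a[i-1] * s(n-i) + gain * e(n).

    Samples before n = 0 are assumed to be zero.
    """
    p = len(a)
    s = []
    for n, e_n in enumerate(excitation):
        # Weighted sum of up to p previously generated output samples
        past = sum(a[i - 1] * s[n - i] for i in range(1, p + 1) if n - i >= 0)
        s.append(past + gain * e_n)
    return s


def impulse_train(length, pitch_period):
    """Voiced-style excitation: a unit impulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]


# Toy first-order model excited by impulses 4 samples apart
voiced = synthesize([0.5], 1.0, impulse_train(8, 4))
```

Each impulse starts a decaying response whose shape is set entirely by the coefficients, which is why the output samples are predictable from their predecessors. An unvoiced frame could be sketched the same way by passing a white-noise sequence as the excitation.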
In model parameter estimation, the following system is called a linear predictor:
$$\hat{s}(n) = \sum_{i=1}^{p} a_i \, s(n-i)$$
Where: $a_i$ are the linear prediction coefficients. The system function of the p-th order linear predictor therefore has the form:
$$P(z) = \sum_{i=1}^{p} a_i z^{-i}$$
A(z) in equation (1) is called the inverse filter, and its transfer function is:
$$A(z) = 1 - P(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}$$
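The role of the inverse filter can be sketched in Python as follows, assuming the convention A(z) = 1 - (a_1 z^-1 + ... + a_p z^-p), so that its output is the prediction error e(n) = s(n) - (a_1 s(n-1) + ... + a_p s(n-p)). The name `prediction_residual` is hypothetical.

```python
def prediction_residual(s, a):
    """Apply the inverse filter A(z) to s and return the prediction error.

    e(n) = s(n) - sum_{i=1..p} a[i-1] * s(n-i), with s(n) = 0 for n < 0.
    """
    p = len(a)
    residual = []
    for n in range(len(s)):
        # Linear prediction of s(n) from up to p past samples
        predicted = sum(a[i - 1] * s[n - i] for i in range(1, p + 1) if n - i >= 0)
        residual.append(s[n] - predicted)
    return residual


# A signal generated by s(n) = 0.5 * s(n-1) + e(n) from a single unit impulse
decaying = [1.0, 0.5, 0.25, 0.125]
error = prediction_residual(decaying, [0.5])
```

When the predictor coefficients match the model that produced the signal, the residual collapses back to the excitation; with mismatched coefficients it does not, which is what the least-squares criterion exploits.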
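The linear prediction equations are obtained as follows. The short-term squared prediction error within a frame is defined as:
$$E_n = \sum_{m} e_n^2(m) = \sum_{m} \Big[ s_n(m) - \sum_{i=1}^{p} a_i \, s_n(m-i) \Big]^2$$
Setting $\partial E_n / \partial a_j = 0$ for $j = 1, \dots, p$ gives the normal equations
$$\sum_{i=1}^{p} a_i \, \varphi_n(j, i) = \varphi_n(j, 0), \qquad j = 1, \dots, p$$
from which the prediction coefficients can be calculated.
Because speech signals are stationary only over short intervals, they must be processed in frames (10 to 30 ms). For the speech segment $s_n(m)$ of N samples selected by the window at time n, $\varphi_n(j, i)$ is defined as:
$$\varphi_n(j, i) = \sum_{m} s_n(m-j) \, s_n(m-i)$$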
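The following Python sketch illustrates the frame-based correlation terms, assuming the definition phi_n(j, i) = sum over m of s_n(m-j) s_n(m-i) with the windowed frame taken as zero outside its N samples (the autocorrelation method). Under that assumption phi_n(j, i) depends only on |j - i| and equals the short-time autocorrelation R_n(|j - i|). The names `phi` and `autocorr` are hypothetical.

```python
def phi(frame, j, i):
    """phi_n(j, i) = sum_m s_n(m - j) * s_n(m - i), frame zero outside 0..N-1."""
    n_samples = len(frame)
    total = 0.0
    # m covers every index at which both shifted samples lie inside the frame
    for m in range(max(i, j), n_samples + min(i, j)):
        total += frame[m - j] * frame[m - i]
    return total


def autocorr(frame, lag):
    """Short-time autocorrelation R_n(lag) = sum_m s_n(m) * s_n(m + lag)."""
    return sum(frame[m] * frame[m + lag] for m in range(len(frame) - lag))


frame = [1.0, 2.0, 3.0, 4.0]
# phi(frame, 1, 2), phi(frame, 1, 0) and autocorr(frame, 1) all agree
```

Because only the lag matters, the p-by-p system of normal equations becomes a Toeplitz system in R_n(0), ..., R_n(p), which is the structure that fast recursive solvers rely on.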
2 Basic Principles of Linear Prediction Analysis in G.729
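G.729 uses 10th-order linear prediction (LP) for short-term analysis. The LP synthesis filter is defined as:
$$\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}}$$
Where: $\hat{a}_i$, i = 1, ..., 10, are the quantized LP coefficients. G.729 writes the denominator with a plus sign, so its coefficients have the opposite sign to predictor coefficients defined by $\hat{s}(n) = \sum_{i} a_i s(n-i)$.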
3 Implementing Linear Prediction in Matlab
3.1 Windowing and autocorrelation function calculation
The G.729 speech coding algorithm is usually implemented in C, which makes the program lengthy and harder to inspect. Because Matlab provides powerful functions for data analysis, automatic control, digital signal processing, and plotting, it is used here to implement the linear prediction analysis algorithm so that the results can be presented intuitively.
Figure 1 shows a schematic diagram of the hybrid window, which consists of two parts. The first part is half of a Hamming window; the second part is a quarter of a cosine cycle:
$$w_{lp}(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2\pi n}{399}\right), & n = 0, \dots, 199 \\ \cos\left(\dfrac{2\pi (n - 200)}{159}\right), & n = 200, \dots, 239 \end{cases}$$
Where: the window spans 240 samples (30 ms), of which 15 ms (120 samples) come from past frames, 10 ms (80 samples) are the current frame, and 5 ms (40 samples) come from the next frame. The windowing of the current frame is shown in Figure 1.
The windowed signal is shown in Figure 2.
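The windowing and autocorrelation step can be sketched in Python as follows. The sketch assumes the 240-sample hybrid window of the G.729 recommendation (half a Hamming window over samples 0 to 199, a quarter cosine cycle over samples 200 to 239) and computes autocorrelations for lags 0 to 10. It omits the lag windowing and white-noise correction that G.729 applies afterwards. The names `lp_window` and `windowed_autocorr` are hypothetical.

```python
import math

WINDOW_LEN = 240    # 120 past + 80 current + 40 look-ahead samples
HAMMING_LEN = 200   # samples covered by the half-Hamming part


def lp_window():
    """Return the 240-point hybrid LP analysis window."""
    w = []
    for n in range(WINDOW_LEN):
        if n < HAMMING_LEN:
            # Rising half of a Hamming window
            w.append(0.54 - 0.46 * math.cos(2.0 * math.pi * n / 399.0))
        else:
            # Quarter cycle of a cosine, falling from 1 towards 0
            w.append(math.cos(2.0 * math.pi * (n - HAMMING_LEN) / 159.0))
    return w


def windowed_autocorr(samples, order=10):
    """Window 240 speech samples and return r(0..order)."""
    w = lp_window()
    x = [w[n] * samples[n] for n in range(WINDOW_LEN)]
    return [sum(x[n] * x[n - k] for n in range(k, WINDOW_LEN))
            for k in range(order + 1)]


# Example with a constant test signal; real input would be 240 speech samples
r = windowed_autocorr([1.0] * WINDOW_LEN)
```

The asymmetric shape places the window peak near the end of the current frame, so the look-ahead needed is only 40 samples while the most recent speech still receives the largest weight.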
3.2 Determination of LP coefficients
To solve for the linear prediction filter coefficients $a_i$, the classical Levinson-Durbin algorithm is used. Starting from $E^{(0)} = R(0)$, the steps of this algorithm are:
$$(1)\quad k_i = \frac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E^{(i-1)}}$$
$$(2)\quad a_i^{(i)} = k_i$$
$$(3)\quad a_j^{(i)} = a_j^{(i-1)} - k_i \, a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$$
$$(4)\quad E^{(i)} = (1 - k_i^2) \, E^{(i-1)}$$
The superscript in parentheses in the formulas above denotes the order of the predictor. Steps (1) to (4) are solved recursively for i = 1, 2, ..., p, where: E is the minimum mean-square prediction error; R is the autocorrelation coefficient; $k_i$ is the reflection coefficient, which lies in the range [-1, 1]; $a_j^{(i)}$ is the j-th coefficient of the i-th order predictor. Although the goal is to compute the coefficients of a p-th order linear predictor, the recursion also yields, as a by-product, the coefficients of every predictor of order lower than p, together with the corresponding minimum prediction error energy. (The condition $|k_i| < 1$ on the reflection coefficients is necessary and sufficient for the stability of the system H(z), that is, for all roots of the polynomial A(z) to lie inside the unit circle.)
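A minimal Python sketch of the recursion is given below. It is not the article's Matlab listing. It assumes the textbook form of the algorithm for the predictor convention s(n) ≈ a_1 s(n-1) + ... + a_p s(n-p): start from E = R(0), then for each order i compute k_i = [R(i) - sum_j a_j R(i-j)] / E, set a_i = k_i, update the lower coefficients as a_j - k_i a_(i-j), and scale E by (1 - k_i^2). The function name `levinson_durbin` is hypothetical.

```python
def levinson_durbin(r, p):
    """Solve the normal equations from autocorrelations r[0..p].

    Returns (a, k, err): predictor coefficients a_1..a_p, reflection
    coefficients k_1..k_p, and the final prediction error energy.
    Assumes r[0] > 0 so that the divisions are well defined.
    """
    a = [0.0] * (p + 1)   # a[j] is a_j at the current order; a[0] is unused
    k = [0.0] * (p + 1)
    err = r[0]            # E^(0) = R(0)
    for i in range(1, p + 1):
        # Step (1): reflection coefficient for order i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        # Steps (2) and (3): coefficients of the order-i predictor
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        # Step (4): error energy after adding order i
        err *= 1.0 - k[i] * k[i]
    return a[1:], k[1:], err


# Autocorrelation of a first-order process with R(k) = 0.5 ** k
coeffs, refl, final_err = levinson_durbin([1.0, 0.5, 0.25], 2)
stable = all(abs(k_i) < 1.0 for k_i in refl)
```

For this input the second-order solution reduces to the first-order one, since the second reflection coefficient is zero. Checking `stable` mirrors the stability condition on the reflection coefficients stated above.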
From the derivation, $\varphi_n(j, i)$ can be interpreted physically as the short-time autocorrelation function of $s_n$. It therefore reflects the actual speech waveform: different waveforms give different values. Since the coefficients $a_i$ are determined by $\varphi_n(j, i)$ and change whenever it changes, the $a_i$ can also be said to reflect the actual speech waveform.
Its Matlab description is as follows:
4 Conclusion
Linear prediction is widely used in speech processing. Implementing it in Matlab makes the analysis results easy to inspect and lays the foundation for subsequently implementing the algorithm on a DSP.