Speech signal processing is an important branch of digital signal processing. This book presents a range of digital signal processing methods for speech, together with the MATLAB functions that implement them. It has 10 chapters. Chapters 1 to 4 introduce the basic analysis methods of speech signal processing and the corresponding MATLAB functions. Chapters 5 to 9 cover speech signal preprocessing and feature extraction: trend removal, basic noise reduction, endpoint detection, pitch extraction, and formant extraction. For each task, several extraction methods built on the basic techniques are given along with the corresponding MATLAB programs. Chapter 10 draws on the detected parameters to introduce speech synthesis and the modification of speech rate and pitch, including time-domain pitch-synchronous overlap-add (TD-PSOLA) synthesis, with the corresponding MATLAB program. Appendix A offers methods and ideas for debugging complex programs.
Chapter 1 Speech Production and Perception
1.1 Vocal Organs
1.2 Digital Model of Speech Signals
1.2.1 Excitation Model
1.2.2 Vocal Tract Model
1.2.3 Radiation Model
1.3 Speech Perception
1.3.1 Structure of the Human Ear
1.3.2 Auditory Receptivity
1.3.4 Loudness
1.3.5 Pitch
References

Chapter 2 Time-Domain and Frequency-Domain Characteristics of Speech Signals and Short-Time Analysis Techniques
2.1 Speech Signal Framing in MATLAB
2.2 Window Functions in Speech Analysis
2.3 Short-Time Time-Domain Processing of Speech Signals
2.3.1 Short-Time Energy and Short-Time Average Amplitude
2.3.2
2.3.3 Short-Time Autocorrelation Function
2.3.4 Short-Time Average Amplitude Difference Function
2.4 Short-Time Frequency-Domain Processing of Speech Signals
2.4.1 Definition of the Short-Time Fourier Transform
2.4.2 Spectrogram
2.4.3 Short-Time Power Spectral Density
References

Chapter 3 Analysis Techniques and Characteristics of Speech Signals in Other Transform Domains
3.1
3.1.1 Basic Principles of Homomorphic Processing
3.1.2 Complex Cepstrum and Cepstrum
3.2 Discrete Cosine Transform
3.3 Mel-Frequency Cepstral Coefficient Analysis
3.3.1 Mel Filter Bank
3.3.2 MFCC Feature Parameter Extraction
3.4 Wavelet and Wavelet Packet Transforms
3.4.1 Wavelet Transform
3.4.2 Wavelet Packet Transform
3.4.3 Wavelet Packet Algorithm
3.4.4 One-Dimensional Wavelet and Wavelet Packet Transform Functions in MATLAB
3.4.5 Examples of Wavelet and Wavelet Packet Transforms of Speech Signals in MATLAB
3.5 Basic Theory and Algorithm of EMD
3.5.1 Basic Concepts of EMD
3.5.2 Basic Principles of EMD
3.5.3 Completeness and Orthogonality of the EMD Method
3.5.4
3.5.5 MATLAB Function for the EMD Method
References

Chapter 4 Linear Prediction Analysis of Speech Signals
4.1 Basic Principles of Linear Prediction Analysis
4.1.1 Signal Model
4.1.2 Establishing the Linear Prediction Equations
4.1.3 Linear Prediction Analysis of Speech Signals
4.2 Autocorrelation and Autocovariance Solutions in Linear Prediction Analysis
4.2.1
4.2.2 Covariance Method
4.3 Lattice Method Solution for Linear Prediction Analysis
4.3.1 Basic Principle of the Lattice Method
4.3.2 Solution by the Lattice Method
4.4 Other Parameters Derived from Linear Prediction
4.4.1 Prediction Error and Its Autocorrelation Function
4.4.2 Reflection Coefficients and Vocal Tract Area
4.4.3 Linear Prediction Spectrum and Roots of the Prediction Error Filter Polynomial A(z)
4.4.4 Linear Prediction Cepstrum
4.5 Line Spectrum Pair Analysis
4.5.1 Definition and Characteristics of LSP
4.5.2 Conversion from LPC to LSP Parameters
4.5.3 Conversion from LSP Parameters to LPC
References

Chapter 5 Noisy Speech and Preprocessing
5.1 Clean Speech and Noisy Speech
5.2 Signal-to-Noise Ratio
5.3 Generation of Noisy Speech
5.4 Speech Signal Preprocessing I: Removing the Trend Term and DC Component
5.4.1 Principle of Least-Squares Trend Fitting
5.4.2 Function for Least-Squares Trend Removal
5.5 Speech Signal Preprocessing II: Digital Filters
5.5.1 Design of IIR Low-Pass, High-Pass, Band-Pass, and Band-Stop Filters
5.5.2 Design of FIR Low-Pass, High-Pass, Band-Pass, and Band-Stop Filters
References

Chapter 6 Speech Endpoint Detection
6.1 Double-Threshold Method
6.2 Improvements and Extensions of the Double-Threshold Method
6.2.1 Influence of Noise
6.2.2 Smoothing
6.2.3 Double-Threshold Detection with Two Parameters
6.2.4 Double-Threshold Detection with a Single Parameter
6.3 Endpoint Detection Based on Correlation Methods
6.3.1 Endpoint Detection by the Maximum of the Auto/Cross-Correlation Function
6.3.2 Endpoint Detection by the Normalized Autocorrelation Function
6.3.3 Endpoint Detection by the Main-Peak-to-Sub-Peak Ratio of the Autocorrelation Function
6.3.4 Endpoint Detection by the Cosine Angle of the Autocorrelation Function
6.4 Endpoint Detection by Variance Methods
6.4.1 Endpoint Detection by Band Variance
6.4.2 Endpoint Detection by Band Variance with Uniform Subband Separation
6.4.3 Endpoint Detection by Frequency-Domain Bark Subband Variance
6.4.4 Endpoint Detection by Wavelet Packet Bark Subband Variance
6.5 Endpoint Detection Based on Spectral Distance
6.5.1 Endpoint Detection Based on Log-Spectral Distance
6.5.2 Endpoint Detection Based on Cepstral Distance
6.5.3 Endpoint Detection Based on MFCC Cepstral Distance
6.6 Application of Spectral Entropy in Endpoint Detection
6.6.1 Endpoint Detection Based on Spectral Entropy
6.6.2 Improved Endpoint Detection Based on Spectral Entropy
6.7 Endpoint Detection Based on the Energy-Zero Ratio and the Energy-Entropy Ratio
6.7.1 Endpoint Detection Based on the Energy-Zero Ratio
6.7.2 Endpoint Detection Based on the Energy-Entropy Ratio
6.8 Application of the Wavelet Transform and EMD in Endpoint Detection
6.8.1 Application of the Wavelet Transform in Endpoint Detection
6.8.2 Application of EMD in Endpoint Detection
6.9 Endpoint Detection at Low Signal-to-Noise Ratio
6.9.1 Noise Estimation
6.9.2
6.9.3 Endpoint Detection by Multi-Window Spectral Estimation Spectral Subtraction and the Energy-Entropy Ratio
References

Chapter 7 Noise Reduction of Speech Signals
7.1 Adaptive Filter Noise Reduction
7.1.1 Basic Principles of the LMS Algorithm
7.1.2 Basic LMS Adaptive Algorithm
7.1.3 LMS Adaptive Notch Filter
7.2 Noise Reduction by Spectral Subtraction
7.2.1 Basic Spectral Subtraction
7.2.2 Improved Spectral Subtraction
7.3 Noise Reduction by Wiener Filtering
7.3.1 Basic Principle of Wiener Filtering
7.3.2 Steps of Wiener Filtering Noise Reduction and the Function WienerScalart96
7.3.3 MATLAB Example of Wiener Filtering
References

Chapter 8 Pitch Period Estimation Methods
8.1 Preprocessing for Pitch Period Extraction
8.1.1 Endpoint Detection in Pitch Detection
8.1.2 Bandpass Filter in Pitch Detection
8.2 Pitch Detection by the Cepstrum Method
8.2.1 Principle of Pitch Detection by the Cepstrum Method
8.2.2 MATLAB Program for Pitch Detection by the Cepstrum Method
8.2.3 Simple Post-Processing Method
8.3 Pitch Detection by the Short-Time Autocorrelation Method
8.3.1
8.3.2 Center-Clipping Autocorrelation Method
8.3.3 Three-Level-Clipping Cross-Correlation Method
8.3.4 MATLAB Program for Pitch Extraction Based on the Autocorrelation Function Method
8.4 Pitch Detection Based on the Short-Time Average Amplitude Difference Function
8.4.1 Short-Time Average Amplitude Difference Function Method
8.4.2 Improved Short-Time Average Amplitude Difference Function Method
8.4.3 Cyclic Average Amplitude Difference Function Method
8.4.4 MATLAB Program for Pitch Extraction Based on the Average Amplitude Difference Function Method
8.4.5 Combination of the Autocorrelation Function Method and the Average Amplitude Difference Function Method
8.5 Pitch Detection Based on Linear Prediction
8.5.1 Linear Prediction Cepstrum Method
8.5.2 Simplified Inverse Filtering Method
8.6 Further Improvement of Pitch Detection
8.6.1 Principle and Method of the Main Body Extension Method
8.6.2 Steps of the Main Body Extension Pitch Detection Method
8.6.3 Endpoint Detection and Vowel Main Body Detection
8.6.4 Pitch Detection in the Vowel Main Body
8.6.5 Calculation of the Extension Interval and Length
8.6.6 Pitch Detection in the Extension Interval
8.6.7 MATLAB Program for the Main Body Extension Pitch Detection Method
8.7 Pitch Detection in Noisy Speech
8.7.1 Wavelet Autocorrelation Function Method
8.7.2 Spectral Subtraction Autocorrelation Function Method
8.7.3 Combination of Spectral Subtraction and the Main Body Extension Method
References

Chapter 9 Formant Estimation Methods
9.1 Pre-emphasis and Endpoint Detection
9.1.1 Pre-emphasis
9.1.2 Endpoint Detection
9.2 Formant Estimation by the Cepstrum Method
9.2.1 Principle of Formant Estimation by the Cepstrum Method
9.2.2 MATLAB Program for Formant Estimation by the Cepstrum Method
9.3 Formant Estimation by the LPC Method
9.3.1 Principle of Formant Estimation by the LPC Method
9.3.2 Formant Estimation by the LPC Interpolation Method
9.3.3 Formant Estimation by the LPC Root-Finding Method
9.4 Formant Detection in Continuous Speech by the LPC Method
9.4.1 Simple LPC Formant Detection
9.4.2 Improved LPC Formant Detection
9.5 Formant Detection Based on the Hilbert-Huang Transform (HHT)
9.5.1 Hilbert Transform
9.5.2 Another Model of the Speech Signal: the AM-FM Model
9.5.3 Analysis of the AM-FM Model
9.5.4 HHT Method for Extracting Formant Feature Parameters of Speech Signals
9.5.5 Formant Detection Steps and MATLAB Program Based on the Hilbert-Huang Transform
References

Chapter 10 Speech Signal Synthesis Algorithm
Appendix A Program Debugging and Modification