Speech recognition lets a machine convert a speech signal into text and then, through language understanding, into instructions. The goal is to give machines something like human hearing: to understand what people say and act on it. A speech recognition system usually consists of two parts, an acoustic recognition model and a language understanding model, which handle the speech-to-syllable and syllable-to-word computations respectively. A continuous speech recognition system generally has four main parts (feature extraction, acoustic model, language model, and decoder), preceded by an input preprocessing stage.
(1) Speech input preprocessing. This module processes the raw speech signal: it filters out unimportant information and background noise, performs endpoint detection (finding where the speech begins and ends), and splits the signal into frames. Roughly speaking, a speech segment is like a video made of many ordered frames, so the signal can be cut into individual "frames" for analysis.
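The framing and endpoint detection steps described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: it assumes a simple short-time energy threshold for endpoint detection, and the function names, frame sizes, and threshold value are hypothetical choices made for the example.

```python
def split_into_frames(samples, frame_len, hop_len):
    """Cut a sample sequence into overlapping frames of frame_len samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_endpoints(frames, threshold):
    """Return (first, last) indices of the frames whose energy exceeds
    the threshold, or None if no frame does (assumed energy-based rule)."""
    active = [i for i, f in enumerate(frames) if short_time_energy(f) > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Toy signal: silence, a burst of "speech", then silence again.
signal = [0.0] * 40 + [0.5, -0.5] * 20 + [0.0] * 40
frames = split_into_frames(signal, frame_len=10, hop_len=5)
endpoints = detect_endpoints(frames, threshold=0.01)  # (7, 15)
```

Overlapping frames (a hop shorter than the frame length) are used here so that a sound falling on a frame boundary still appears whole in at least one frame.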
(2) Feature extraction. Redundant information that is useless for recognition is removed from the speech signal, and the information that reflects its essential characteristics is retained and expressed in a fixed form. In other words, the key feature parameters of the speech signal are extracted to form a sequence of feature vectors for subsequent processing.
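As a rough illustration of turning frames into a feature vector sequence, the sketch below computes two very simple per-frame features: log energy and zero-crossing rate. These two features are an assumption chosen to keep the example short; practical systems commonly use richer spectral features such as MFCCs. All function names are hypothetical.

```python
import math


def log_energy(frame, floor=1e-10):
    """Natural log of the frame's total energy (floored to avoid log(0))."""
    return math.log(sum(s * s for s in frame) + floor)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def extract_features(frames):
    """Map each frame to a small feature vector [log_energy, zcr]."""
    return [[log_energy(f), zero_crossing_rate(f)] for f in frames]


# Two toy frames: a rapidly alternating one and a slowly rising one.
toy_frames = [[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]]
features = extract_features(toy_frames)
```

Each frame is reduced from many raw samples to a short vector, which is the form the acoustic model consumes.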
(3) Acoustic model training. The acoustic model is a model of sound: it maps speech input to an acoustic output or, more precisely, gives the probability that a piece of speech belongs to a given acoustic symbol. Its parameters are trained from the feature parameters of a training speech corpus. During recognition, the feature parameters of the speech to be recognized are matched against the acoustic model to obtain the recognition result. Most current mainstream speech recognition systems use the hidden Markov model (HMM) for acoustic modeling.
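To make "the probability that the speech belongs to a certain acoustic symbol" concrete, here is a minimal sketch of the HMM forward algorithm, which computes the likelihood of an observation sequence under one model. It assumes discrete observation symbols (real acoustic models score continuous feature vectors), and the two toy models and their parameters are invented for the example.

```python
def forward_likelihood(observations, start_prob, trans_prob, emit_prob):
    """P(observations | model) for a discrete HMM, via the forward algorithm.

    start_prob[s]      : probability of starting in state s
    trans_prob[p][s]   : probability of moving from state p to state s
    emit_prob[s][o]    : probability that state s emits symbol o
    """
    n_states = len(start_prob)
    alpha = [start_prob[s] * emit_prob[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans_prob[p][s] for p in range(n_states)) * emit_prob[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Two hypothetical left-to-right models, one per acoustic symbol.
start = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
emit_a = [[0.9, 0.1], [0.2, 0.8]]  # tends to emit 0 first, then 1
emit_b = [[0.1, 0.9], [0.8, 0.2]]  # tends to emit 1 first, then 0

observed = [0, 0, 1, 1]
likelihood_a = forward_likelihood(observed, start, trans, emit_a)
likelihood_b = forward_likelihood(observed, start, trans, emit_b)
best_symbol = "a" if likelihood_a > likelihood_b else "b"
```

Matching the input against every model and keeping the most likely one is the simplest form of the recognition step described above.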
(4) Language model training. A language model calculates the probability of a sentence occurring; put simply, it estimates how likely a word sequence is to be a well-formed sentence. Because sentence structure is regular, earlier words often predict the words that follow. The model is mainly used to decide which of several word sequences is more likely, or to predict the next word given the preceding ones. It defines which words can follow the last recognized word (matching proceeds sequentially), so that impossible words can be excluded from the matching process.
Language modeling can effectively combine knowledge of Chinese grammar and semantics to describe the relationships between words, which improves the recognition rate and narrows the search space. The training text corpus undergoes grammatical and semantic analysis, and the language model is then trained from it as a statistical model.
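A statistical language model of the kind described above can be sketched as a bigram model, where each word's probability depends only on the previous word. The bigram form, the add-one smoothing, the tiny English corpus, and all names below are assumptions made for illustration; the article does not say which statistical model it has in mind.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count contexts and word pairs, with <s> and </s> as sentence markers."""
    contexts = Counter()
    bigrams = Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def bigram_prob(prev, word, model):
    """P(word | prev) with add-one smoothing over the vocabulary."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def sentence_log_prob(sentence, model):
    """Log probability of a whole sentence under the bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(bigram_prob(p, w, model)) for p, w in zip(tokens, tokens[1:]))


corpus = ["turn on the light", "turn off the light", "turn on the radio"]
model = train_bigram_model(corpus)
likely = sentence_log_prob("turn on the light", model)
unlikely = sentence_log_prob("light the on turn", model)  # lower than `likely`
```

The scrambled sentence scores lower because none of its word pairs were seen in training, which is how the model steers the search away from implausible word sequences.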
(5) Speech decoding and search algorithm. Decoding is the recognition step itself. For an input speech signal, a recognition network is built from the trained HMM acoustic model, the language model, and the dictionary, and a search algorithm finds the best path through that network. This path is the word string most likely to have produced the speech signal, and it determines the text contained in the speech sample. Decoding therefore comes down to the search algorithm: the method used at the decoding end to find the optimal word string.
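Finding the best path through a network of HMM states is commonly done with the Viterbi algorithm; the article does not name its search algorithm, so treating it as Viterbi is an assumption of this sketch. The two-state network ("sil" and "speech"), the observation symbols, and all probabilities are toy values invented for the example, far smaller than a real recognition network.

```python
def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """Return (best_state_path, probability_of_that_path)."""
    best = {s: start_prob[s] * emit_prob[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this time step.
            prev = max(states, key=lambda p: best[p] * trans_prob[p][s])
            new_best[s] = best[prev] * trans_prob[prev][s] * emit_prob[s][obs]
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


states = ["sil", "speech"]
start_prob = {"sil": 0.6, "speech": 0.4}
trans_prob = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.4, "speech": 0.6},
}
emit_prob = {
    "sil": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
}

path, prob = viterbi(["low", "mid", "high"], states, start_prob, trans_prob, emit_prob)
# path == ["sil", "sil", "speech"]
```

Because only the best predecessor is kept for each state at each step, the cost grows linearly with the length of the input rather than with the number of possible paths.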
The search in continuous speech recognition looks for the sequence of word models that best describes the input speech signal, which yields the decoded word sequence. The search is driven by the acoustic model score and the language model score. In practice, it is often necessary to give the language model a high weight, chosen empirically, and to set a long-word penalty score.
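One common way to write the combined score described above is as a weighted sum of log probabilities. The symbols are assumptions of this sketch rather than the article's own notation: $X$ is the sequence of feature vectors, $W$ a candidate word sequence, $\alpha$ the language model weight, $\beta$ the penalty, and $|W|$ the number of words in $W$. It also assumes the penalty is applied once per word.

```latex
\hat{W} = \arg\max_{W} \Big[ \log P(X \mid W) + \alpha \, \log P(W) + \beta \, |W| \Big]
```

Here $\log P(X \mid W)$ is the acoustic model score and $\log P(W)$ the language model score. With $\alpha = 1$ and $\beta = 0$ this reduces to maximizing $P(X \mid W)\,P(W)$, the plain Bayesian decision rule; $\alpha$ and $\beta$ are typically tuned on held-out data.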
Speech recognition is essentially a pattern recognition process: the pattern of the unknown speech is compared with the reference patterns of known speech one by one, and the best-matching reference pattern is taken as the recognition result. Today's mainstream speech recognition algorithms include dynamic time warping (DTW), vector quantization (VQ) based on non-parametric models, the hidden Markov model (HMM) method based on parametric models, and, more recently, methods based on deep learning and support vector machines.
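The template-matching view above maps directly onto DTW, which can be sketched in a few lines. This minimal version assumes one-dimensional feature sequences and absolute difference as the local distance; real systems compare sequences of feature vectors. The templates, labels, and function names are invented for the example.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best alignment cost of seq_a[:i] against seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_b element stretched
                cost[i][j - 1],      # seq_a element stretched
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


def recognize(unknown, templates):
    """Return the label of the reference template closest to `unknown`."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))


templates = {"yes": [1, 2, 3, 2, 1], "no": [3, 3, 1, 1, 3]}
unknown = [1, 1, 2, 3, 3, 2, 1]  # a slower rendition of the "yes" pattern
result = recognize(unknown, templates)  # "yes"
```

The warping lets the slower utterance align with its template at zero cost even though the two sequences have different lengths, which is what makes DTW suitable for speech spoken at varying speeds.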