Abstract: To improve the efficiency of speech recognition and reduce its dependence on the environment, this paper analyzes and improves both the recognition algorithm and the hardware. The ARM S3C2410 microprocessor serves as the main control module, the UDA1341TS audio processing chip serves as the speech recognition module, and an HMM acoustic model with the Viterbi algorithm is used for pattern training and recognition. On this basis, a continuous, small-vocabulary speech recognition system is designed. Experiments show that the system achieves a high recognition rate and a certain degree of robustness: the recognition rate is 95.6% in the laboratory and 92.3% outdoors.
Keywords: speech recognition; embedded system; Hidden Markov Models; ARM; Viterbi algorithm
0 Introduction
An embedded speech recognition system is one that uses an advanced microprocessor to implement speech recognition in software or hardware at the board level or chip level. Combining embedded technology with speech recognition frees people from the keyboard and lets them operate intelligent terminals through voice commands. This natural and fast form of interaction improves the efficiency of human-computer interaction and gives people better control over intelligent devices, provided that the recognizer fits the limited storage and strict real-time requirements of embedded platforms. At the same time, the development of speech recognition technology is characterized by the wide application of the HMM. The approach gathers statistics over a large amount of speech data to build a statistical model for each term to be recognized, then extracts features from the speech to be recognized, matches them against these models, and obtains the recognition result by comparing the matching probabilities. With a sufficiently large speech database, a robust statistical model can be obtained that improves recognition performance in a variety of practical situations.
1 Markov chain and hidden Markov model (HMM)
The speech signal is an observable sequence. Its characteristics are approximately stable over a sufficiently short time period, but the overall process can be regarded as a sequence of transitions from one relatively stable characteristic to another. Over the whole analysis interval, the signal can therefore be described by a series of linear models connected one after another, and this chain of models is a Markov chain. A Markov chain is a special case of a Markov random process, namely one in which both the state and the time parameter are discrete.
A hidden Markov model is a statistical model of the time series structure of the speech signal. Mathematically, it can be regarded as a double random process: one is a hidden random process that uses a Markov chain with a finite number of states to simulate changes in the statistical characteristics of the speech signal, and the other is the random process of the observation sequence associated with each state of the Markov chain. The former shows itself only through the latter; its specific parameters cannot be measured directly.
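As a small illustration of a chain with discrete states and discrete time, the sketch below advances a state distribution one step at a time. The two-state transition matrix `TRANS` is a made-up example, not a value from the system described here.

```python
def step(dist, trans):
    """Advance a state distribution by one time step: dist' = dist * trans."""
    n = len(dist)
    return [sum(dist[i] * trans[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 2-state chain (assumption); each row sums to 1.
TRANS = [[0.9, 0.1],
         [0.5, 0.5]]

dist = [1.0, 0.0]  # start in state 0 with certainty
for _ in range(3):
    dist = step(dist, TRANS)

print(dist)  # distribution over the two states after three steps
```

The next distribution depends only on the current one, which is the Markov property the text relies on.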
Generally speaking, an HMM is a double random process, which is described by the following five parameters:
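In the conventional textbook notation, which is assumed here rather than taken from this article, the five parameters are usually written as follows, where $S_i$ denotes a state, $v_k$ an observation symbol, $q_t$ the state at time $t$, and $o_t$ the observation at time $t$:

```latex
\lambda = (N, M, \pi, A, B)
\begin{aligned}
N &: \text{number of states in the model} \\
M &: \text{number of distinct observation symbols per state} \\
\pi &= \{\pi_i\}, \quad \pi_i = P(q_1 = S_i) \\
A &= \{a_{ij}\}, \quad a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i) \\
B &= \{b_j(k)\}, \quad b_j(k) = P(o_t = v_k \mid q_t = S_j)
\end{aligned}
```

When $N$ and $M$ are implied by the sizes of $A$ and $B$, the model is often abbreviated as $\lambda = (\pi, A, B)$.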
2 Implementation of speech recognition system based on HMM
Human speech production is itself a double random process. The speech signal is an observable time-varying sequence: a stream of phoneme parameters that the brain emits according to grammatical knowledge and what the speaker wants to say, both of which are unobservable states. An HMM imitates this process well and describes both the overall non-stationarity and the local stationarity of the speech signal, which makes it a well-suited speech model. Taken as a whole, speech is a non-stationary random process, but if it is divided into short segments, each segment can be treated as stationary and analyzed by linear methods. Building a hidden Markov model over these segments makes it possible to identify short-term stationary segments with different parameters and to track the transitions between them, which solves the problem of modeling variations in speaking rate and acoustics.
The speech recognition system first converts the analog speech signal into a digital signal through the A/D converter in the chip, and then preprocesses the digital signal (windowing and filtering) to obtain a clean speech signal. Feature extraction then turns the signal into feature vectors. Finally, the recognition stage recognizes the speaker's speech and produces the recognition result. Overall, the recognition process consists of four main stages: speech signal preprocessing, feature extraction, speech library establishment, and speech signal recognition, as shown in Figure 1.
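The preprocessing stage can be sketched as follows. This is a minimal illustration, not the article's implementation: the pre-emphasis filter, the Hamming window, the 8 kHz sampling rate, and the 25 ms / 10 ms frame sizes are all assumptions chosen because they are common in speech front ends.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def hamming(length):
    """Hamming window coefficients (length must be greater than 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frames(samples, frame_len, hop):
    """Split the signal into overlapping frames and window each one."""
    window = hamming(frame_len)
    out = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        out.append([s * w for s, w in zip(chunk, window)])
    return out


# Synthetic input (assumption): 0.1 s of a 440 Hz tone sampled at 8 kHz.
samples = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
framed = frames(pre_emphasis(samples), frame_len=200, hop=80)
print(len(framed), "frames of", len(framed[0]), "samples")
```

Each windowed frame is short enough to be treated as stationary, so a feature vector can be computed from it independently of the others.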
The speech recognition process is divided into two parts: one is the HMM training process to obtain the HMM speech recognition model, that is, to establish a basic recognition speech library; the other is the HMM recognition process to obtain the speech recognition results.
2.1 HMM training
The HMM algorithm is a common method for solving recognition problems. Consider an HMM with N states and an observation sequence of length T. Computing the output probability directly from the definition requires on the order of $2T \cdot N^T$ operations, which is impractical. The HMM algorithm simplifies this computation.
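The cost difference can be seen in a small sketch. The forward recursion is the usual way to avoid summing over every state path; that it is exactly the simplification meant above is an assumption, and the model values `PI`, `A`, `B` and the sequence `OBS` are made up for illustration.

```python
from itertools import product


def forward(pi, A, B, obs):
    """P(O | model) by the forward recursion, roughly N*N*T multiplications."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def brute_force(pi, A, B, obs):
    """Same probability from the definition: a sum over all N**T state paths."""
    n, total = len(pi), 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Toy discrete-output model (assumption): 2 states, 2 observation symbols.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]
OBS = [0, 1, 1, 0]

p_fast = forward(PI, A, B, OBS)
p_slow = brute_force(PI, A, B, OBS)
print(p_fast, p_slow)
```

Both functions return the same probability, but `brute_force` visits every one of the $N^T$ paths, each costing about $2T$ multiplications, while `forward` reuses partial sums from the previous time step.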
If the difference between the output probability $P(O \mid \lambda)$ of the current model and that of the re-estimated model is still too large, return to step (2) and iterate until the HMM parameters no longer change significantly.
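The iterate-until-stable loop can be sketched with Baum-Welch re-estimation for a discrete-output HMM. Baum-Welch itself, the discrete outputs, the single training sequence, the log-likelihood stopping tolerance `tol`, and the toy values are all assumptions made for illustration; a practical system would also scale the forward and backward variables to avoid underflow on long sequences.

```python
import math


def forward_backward(pi, A, B, obs):
    """Return the forward table, the backward table and P(O | model)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(n))
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta, sum(alpha[T - 1])


def reestimate(pi, A, B, obs):
    """One re-estimation step; also returns P(O | model before the update)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, prob = forward_backward(pi, A, B, obs)
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(n)] for t in range(T)]
    new_A = [row[:] for row in A]
    new_B = [row[:] for row in B]
    for i in range(n):
        visits = sum(gamma[t][i] for t in range(T - 1))
        if visits > 0:  # keep the old row if state i is never occupied
            for j in range(n):
                xi = sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                         for t in range(T - 1)) / prob
                new_A[i][j] = xi / visits
        total = visits + gamma[T - 1][i]
        if total > 0:
            for k in range(m):
                new_B[i][k] = sum(gamma[t][i] for t in range(T) if obs[t] == k) / total
    return gamma[0][:], new_A, new_B, prob


def train(pi, A, B, obs, tol=1e-6, max_iter=50):
    """Repeat re-estimation until the log-likelihood gain falls below tol."""
    history = []
    for _ in range(max_iter):
        pi, A, B, prob = reestimate(pi, A, B, obs)
        history.append(math.log(prob))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return pi, A, B, history


# Toy starting model and training sequence (assumptions).
PI0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.8, 0.2], [0.3, 0.7]]
OBS = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]

pi, A, B, history = train(PI0, A0, B0, OBS)
print(len(history), "iterations, final log-likelihood", history[-1])
```

Each pass cannot lower the likelihood of the training sequence, so the loop stops once the gain between two passes becomes negligible.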
2.2 HMM model recognition
The output probability of the HMM model is calculated using the Viterbi algorithm. Since the probability value is generally much smaller than 1, the logarithmic probability is used as the output value:
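In the standard log-domain form of the Viterbi algorithm, given here as a textbook formulation and assumed to match the one intended, with $\pi_i$ the initial state probabilities, $a_{ij}$ the transition probabilities, $b_j(o_t)$ the output probabilities, and $N$ states over $T$ frames, the recursion reads:

```latex
\begin{aligned}
\delta_1(i) &= \log \pi_i + \log b_i(o_1), \qquad \varphi_1(i) = 0, \\
\delta_t(j) &= \max_{1 \le i \le N} \bigl[\delta_{t-1}(i) + \log a_{ij}\bigr] + \log b_j(o_t), \qquad 2 \le t \le T, \\
\varphi_t(j) &= \arg\max_{1 \le i \le N} \bigl[\delta_{t-1}(i) + \log a_{ij}\bigr], \\
P^* &= \max_{1 \le i \le N} \delta_T(i), \qquad q_T^* = \arg\max_{1 \le i \le N} \delta_T(i), \\
q_t^* &= \varphi_{t+1}(q_{t+1}^*), \qquad t = T-1, \dots, 1.
\end{aligned}
```

All quantities here are logarithms of probabilities, so products become sums and long sequences do not underflow.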
In the above formula, $\delta_t(i)$ represents the cumulative output probability of the i-th state at time t; $\varphi_t(i)$ represents the index of the state that precedes the i-th state at time t; $q_t^*$ is the state at time t in the optimal state sequence; and $P^*$ is the final output probability.
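The recognition step can be sketched as below: a log-domain Viterbi pass per word model, followed by picking the model with the highest score. The variable names `delta` and `phi` follow the symbols described above; the discrete-output toy models, the word labels, and the `recognize` helper are hypothetical and only serve the illustration.

```python
import math

NEG_INF = float("-inf")


def safe_log(x):
    """Logarithm that maps a zero probability to minus infinity."""
    return math.log(x) if x > 0 else NEG_INF


def viterbi(pi, A, B, obs):
    """Return (log P*, best state path) for one discrete-output HMM."""
    n, T = len(pi), len(obs)
    delta = [[NEG_INF] * n for _ in range(T)]  # cumulative log probability
    phi = [[0] * n for _ in range(T)]          # best predecessor state
    for i in range(n):
        delta[0][i] = safe_log(pi[i]) + safe_log(B[i][obs[0]])
    for t in range(1, T):
        for j in range(n):
            best = max(range(n), key=lambda i: delta[t - 1][i] + safe_log(A[i][j]))
            phi[t][j] = best
            delta[t][j] = (delta[t - 1][best] + safe_log(A[best][j])
                           + safe_log(B[j][obs[t]]))
    last = max(range(n), key=lambda i: delta[T - 1][i])
    path = [last]
    for t in range(T - 1, 0, -1):  # backtrack through phi
        path.append(phi[t][path[-1]])
    path.reverse()
    return delta[T - 1][last], path


def recognize(models, obs):
    """Pick the word whose model gives the highest log P*."""
    return max(models, key=lambda word: viterbi(*models[word], obs)[0])


# Two hypothetical word models sharing pi and A but differing in B.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
MODELS = {
    "yes": (PI, A, [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]),
    "no": (PI, A, [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]),
}
OBS = [0, 1, 2]

log_p, path = viterbi(*MODELS["yes"], OBS)
word = recognize(MODELS, OBS)
print(path, math.exp(log_p), word)
```

Working with sums of logarithms keeps the scores in a safe numeric range, which matters on a fixed-resource embedded target where long utterances would otherwise underflow.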
3 Experimental results
The microphone of the voice input module feeds the speech signal into the UDA1341TS digital audio processing chip, which the S3C2410 controls by sending it instructions. The chip samples the signal through its internal A/D converter, calls the speech compression algorithm to compress it, and calls the speech recognition API to recognize the input by pattern matching. Finally, the UDA1341TS passes the recognition result to the ARM S3C2410 through I/O. The S3C2410 then sends the UDA1341TS the instruction that corresponds to that result, which completes the function of the speech recognition system.
The system uses Samsung's S3C2410 as its embedded CPU: a cost-effective, low-power, highly integrated processor based on the ARM9 core with a clock frequency of 203 MHz. It is designed for network communications and handheld devices and meets the speech recognition system's requirements for low cost, low power consumption, high performance, and small size.
The experiment used a vocabulary of 10 Chinese characters, tested in both outdoor and laboratory environments. The results are shown in Table 1.
The tests show that the system running on the UDA1341TS chip performs well in the laboratory environment, with good robustness and a recognition rate that meets practical requirements. Under high-noise outdoor conditions the recognition rate is somewhat lower than in the laboratory, but it still meets the basic requirements of speech recognition.
4 Conclusions
The system in this paper adopts a speech recognition algorithm based on the hidden Markov model and can recognize small-vocabulary, continuous speech with a high recognition rate. Combining the ARM S3C2410 microprocessor with the UDA1341TS audio processing chip gives the system strong real-time performance. Because it is small, easy to carry, flexible to use, and highly portable, the system could be applied to industrial voice control after further improvement and development, and also to everyday products such as voice-controlled toys and voice-controlled equipment.
However, due to the limitations of technical level and hardware environment, the speech recognition system needs further research and improvement in algorithms and hardware. The research on this embedded speech recognition system has made important attempts and explorations for the further development and research of practical embedded speech recognition systems.