Research on a continuous small-vocabulary speech recognition system based on HMM

Publisher: shiwanyongbing  Latest update time: 2011-08-05

Abstract: To improve the efficiency of speech recognition and reduce its dependence on the environment, this paper analyzes and improves both the recognition algorithm and the hardware. The ARM S3C2410 microprocessor serves as the main control module, the UDA1341TS audio processing chip serves as the speech recognition module, and an HMM acoustic model together with the Viterbi algorithm is used for pattern training and recognition. On this basis, a continuous, small-vocabulary speech recognition system is designed. Experiments show that the system achieves a high recognition rate and a certain degree of robustness: the recognition rate is 95.6% in the laboratory and 92.3% outdoors.
Keywords: speech recognition; embedded system; Hidden Markov Models; ARM; Viterbi algorithm

0 Introduction
An embedded speech recognition system is a speech recognition system implemented in software or hardware, at the board level or chip level, on an advanced microprocessor. Combining embedded technology with speech recognition frees people from the keyboard and lets them operate intelligent terminals through voice commands. This natural and fast form of interaction improves the efficiency of human-computer interaction, suits embedded platforms that have limited storage and strict real-time requirements, and gives people better control over intelligent devices. At the same time, the development of speech recognition technology has been marked by the wide application of the HMM. The approach gathers statistics over a large amount of speech data to build a statistical model for each term to be recognized, then extracts features from the speech to be recognized, matches them against these models, and obtains the recognition result by comparing the matching probabilities. With a sufficiently large speech database, a robust statistical model can be obtained, which improves recognition performance in a variety of practical situations.

1 Markov chain and hidden Markov model (HMM)
The speech signal is an observable sequence. Its characteristics are approximately stationary over a sufficiently short time interval, while the overall process can be regarded as a sequence of transitions from one relatively stable characteristic to another. Over the whole analysis interval, many short-time linear models can therefore be connected in series, and this chain is a Markov chain. The Markov chain is a special case of the Markov random process: it is a Markov process in which both the state and the time parameter are discrete.
A hidden Markov model is a statistical model of the time-series structure of the speech signal. Mathematically it can be regarded as a double random process: one is a hidden random process in which a Markov chain with a finite number of states models the change in the statistical characteristics of the speech signal; the other is the random process of the observation sequence associated with each state of the Markov chain. The former manifests itself only through the latter, and its specific parameters cannot be measured directly.
Generally speaking, an HMM is a double random process, which is described by the following five parameters:
a.JPG
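For orientation, the five parameters are usually written as follows in the textbook treatment of HMMs. This is the conventional notation and is assumed, not confirmed, to match the one used in the figure above; the symbols s_i, v_k, q_t and o_t are introduced here only for illustration.

```latex
\lambda = (N, M, \pi, A, B)
\begin{aligned}
N &: \text{number of states of the Markov chain, } S = \{s_1, \dots, s_N\} \\
M &: \text{number of distinct observation symbols, } V = \{v_1, \dots, v_M\} \\
\pi &= \{\pi_i\}, \quad \pi_i = P(q_1 = s_i), \quad 1 \le i \le N \\
A &= \{a_{ij}\}, \quad a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i), \quad 1 \le i, j \le N \\
B &= \{b_j(k)\}, \quad b_j(k) = P(o_t = v_k \mid q_t = s_j), \quad 1 \le j \le N,\ 1 \le k \le M
\end{aligned}
```

Here q_t denotes the hidden state at time t and o_t the observation at time t. Each row of A, each row of B, and the vector π sum to 1.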

2 Implementation of speech recognition system based on HMM
The human speech process is itself a double random process. The speech signal is an observable time-varying sequence: a parameter stream of phonemes that the brain emits according to grammatical knowledge and the need to speak, both of which are unobservable states. The HMM imitates this process reasonably well and describes both the overall non-stationarity and the local stationarity of the speech signal, which makes it a well-suited speech model. Taken as a whole, human speech is a non-stationary random process, but if it is divided into short-time segments, each segment can be treated as a stationary process and analyzed by linear methods. If a hidden Markov model is built for these segments, short-time stationary segments with different parameters can be identified and the transitions between them can be tracked, which solves the problem of modeling variations in speaking rate and acoustic characteristics.
The speech recognition system first converts the analog speech signal into a digital signal through the A/D converter in the chip, and then preprocesses the digital signal (windowing and filtering) to obtain a clean speech signal. Next, the feature extraction stage produces a feature vector that represents the speech. Finally, the recognition stage recognizes the speaker's speech and outputs the result. Overall, the recognition process consists of four main stages: speech signal preprocessing, feature extraction, speech library construction, and speech signal recognition, as shown in Figure 1.

b.JPG
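The preprocessing stage described above (filtering, then windowing into short-time frames that can be treated as stationary) can be sketched as follows. This is a minimal illustration, not the system's actual code: the pre-emphasis filter, the Hamming window, and all function and parameter names (`pre_emphasize`, `frame_signal`, `alpha`, `frame_len`, `hop`) are assumptions chosen for the example.

```python
import math


def pre_emphasize(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1] (assumed filter)."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def hamming(length):
    """Hamming window coefficients for a frame of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_signal(samples, frame_len, hop):
    """Split the signal into overlapping frames and window each one.

    A trailing partial frame is dropped.
    """
    window = hamming(frame_len)
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append([samples[start + n] * window[n] for n in range(frame_len)])
        start += hop
    return frames
```

Each returned frame would then be passed to feature extraction; the frame length and hop are typically chosen so that a frame covers a few tens of milliseconds of speech.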


The speech recognition process is divided into two parts: one is the HMM training process to obtain the HMM speech recognition model, that is, to establish a basic recognition speech library; the other is the HMM recognition process to obtain the speech recognition results.

2.1 HMM training
The HMM algorithm is a common method for solving recognition problems. Suppose an HMM has N states. For an observation sequence of length T, computing the output probability directly from the definition requires on the order of 2T·N^T operations, which is unacceptable in practice. The HMM algorithm can simplify this process.
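To illustrate how the cost drops from exponential to polynomial in T, the sketch below uses the standard forward recursion, which computes P(O|λ) in roughly N²·T operations. It assumes a discrete-observation HMM and the conventional parameters `pi`, `A` and `B`; whether the system uses exactly this formulation is an assumption, and the function name is hypothetical.

```python
def forward_probability(pi, A, B, obs):
    """Return P(O | lambda) for a discrete-observation HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][k] : probability of emitting symbol k in state j
    obs     : list of observation symbol indices
    """
    if not obs:
        return 1.0
    n_states = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: sum over all final states
    return sum(alpha)
```

For long sequences the raw probabilities underflow, so practical implementations scale `alpha` at each step or work with logarithms.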

d.JPG


h.jpg

If the distance between P(O/λZ) and e.JPG is too large, return to step (2) and iterate until the HMM model parameters no longer change significantly.
2.2 HMM model recognition
The output probability of the HMM model is calculated using the Viterbi algorithm. Since the probability value is generally much smaller than 1, the logarithmic probability is used as the output value:
f.JPG
In the above formula, δt(i) is the cumulative output probability of the i-th state at time t; φt(i) is the index of the state that precedes the i-th state at time t on the best path; j.jpg is the state at time t in the optimal state sequence; and P* is the final output probability.
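The recognition step can be sketched as a log-domain Viterbi search. The variables `delta` and `phi` play the roles of δt(i) and φt(i) above; the discrete-observation model, the helper `safe_log`, and the function name `viterbi_log` are assumptions made for the example, not details taken from the system.

```python
import math


def safe_log(x):
    """Natural log that maps zero probability to -infinity."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi_log(pi, A, B, obs):
    """Return (log P*, optimal state sequence) for a discrete-observation HMM."""
    n = len(pi)
    # Initialization: delta_1(i) = log pi_i + log b_i(o_1)
    delta = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    backpointers = []  # one phi_t list per time step t >= 2
    for o in obs[1:]:
        new_delta, phi = [], []
        for j in range(n):
            # Best predecessor of state j at this time step
            best_i = max(range(n), key=lambda i: delta[i] + safe_log(A[i][j]))
            phi.append(best_i)
            new_delta.append(
                delta[best_i] + safe_log(A[best_i][j]) + safe_log(B[j][o])
            )
        delta = new_delta
        backpointers.append(phi)
    # Termination: pick the best final state, then trace back
    last = max(range(n), key=lambda i: delta[i])
    log_p_star = delta[last]
    path = [last]
    for phi in reversed(backpointers):
        path.append(phi[path[-1]])
    path.reverse()
    return log_p_star, path
```

In a word recognizer of this kind, the observation sequence would be scored against the model of each vocabulary word, and the word whose model gives the largest log P* would be taken as the result.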

3 Experimental results
The voice signal is first fed through the microphone of the voice input module into the UDA1341TS digital audio processing chip, which receives its instructions from the S3C2410. The chip samples the voice signal with its internal A/D converter, calls the voice compression algorithm to compress it, and calls the voice recognition API to perform pattern-matching-based recognition on the input speech. Finally, the UDA1341TS passes the recognition result to the ARM S3C2410 through I/O. Depending on the result it receives, the S3C2410 sends the corresponding instructions to the UDA1341TS, thereby realizing the function of the voice recognition system.
The system uses Samsung's S3C2410 as its embedded CPU. It is a cost-effective, low-power, high-performance, and highly integrated CPU based on the ARM9 core, with a clock frequency of 203 MHz. It is designed for network communications and handheld devices and meets the voice recognition system's requirements for low cost, low power consumption, high performance, and small size.
The experiment used 10 Chinese characters and tested them in outdoor and laboratory environments. The results are shown in Table 1.

g.JPG


The test shows that the results obtained by the system on the UDA1341TS chip in the laboratory environment are quite satisfactory, with good robustness and a recognition rate that meets practical requirements. Under outdoor high-noise conditions the recognition rate is somewhat lower than in the laboratory, but it still meets the basic requirements of speech recognition.

4 Conclusions
The system in this paper adopts a speech recognition algorithm based on the hidden Markov model and can recognize small-vocabulary, continuous speech with a high recognition rate. Combining the ARM S3C2410 microprocessor with the UDA1341TS audio processing chip gives the system strong real-time performance. Because it is small, easy to carry, flexible to use, and highly portable, the system could be applied to industrial voice control after further improvement and development, and could also be used in daily life, for example in voice-controlled toys and voice-controlled equipment.
However, owing to the limitations of the current technical level and hardware environment, the speech recognition system needs further research and improvement in both algorithms and hardware. The work on this embedded speech recognition system is a useful exploration toward the further development of practical embedded speech recognition systems.
