Universal voice control system based on Sunplus microcontroller

Publisher: 创意梦者 | Last updated: 2011-11-11

With the rapid development of electronic technology, household appliances and other electronic products are becoming increasingly user-friendly. To provide voice processing and voice control, a universal voice control system based on the Sunplus microcontroller is designed here. The system performs voice recognition, voice control, and voice playback, so that household appliances and other electronic products can be controlled automatically by voice. At the time of writing, no comparable design scheme existed in China. The scheme is divided into three parts: the voice recognition module, the voice playback module, and the voice control module, and a single chip implements both voice processing and voice control. In addition, a complete graphical universal voice integration tool has been developed: users only need to enter the parameters of their voice material and the code is generated automatically. The design of the universal voice control system is introduced below.

1 Design scheme of general voice control system
Figure 1 shows the block diagram of the general voice control system.


The voice recognition module performs voice recognition; this design adopts speaker-dependent recognition. The voice playback module performs voice playback, and the voice control module automatically coordinates recognition and playback. Each module combines a dedicated circuit with software. The main controller embeds the voice control system into home appliances and other electronic products so that all of their functions can be controlled automatically by voice.
1.1 Introduction to the Sunplus MCU SPCE061A
SPCE061A is a 16-bit microcontroller from Sunplus Technology. Its CPU clock runs at 0.32~49.152 MHz, and the high processing speed allows the µ'nSP core to handle complex digital signals easily and quickly. It offers programmable audio processing, built-in 2 KWord SRAM and 32 KWord Flash, two 16-bit programmable timers/counters (with automatic reload of the initial count value), two 10-bit DAC output channels, and 32 general-purpose programmable I/O lines. It is an economical choice for digital speech recognition applications.
1.2 Universal voice integration software
A set of universal voice integration software has been developed. Users do not need to modify any code; they only need to enter the parameters of their voice material and the code is generated automatically. Figure 2 shows the operating interface of the universal voice integration software v0.1.


2 Voice playback module design
Voice processing can be roughly divided into A/D conversion, encoding, storage, decoding, and D/A conversion. A WAVE file recorded from the microphone, however, occupies a large amount of storage, and a single-chip microcomputer cannot store that much data. For the SPCE061A, Sunplus provides a solution: SACM-LIB. The library packages A/D conversion, encoding, decoding, storage, and D/A conversion into modules, each with its own application programming interface (API). You only need to understand what each module does and what its parameters mean, then call the API functions to implement the required function.
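For illustration, a minimal blocking playback routine is sketched below, following the SACM_S480 call sequence found in typical SPCE061A demo code. The function names come from the SACM-LIB API shipped with the Sunplus tools; the argument values and the header file name are assumptions to be checked against the library documentation.

    #include "s480.h"   /* SACM-LIB S480 API header (assumed name) */

    /* Play one compressed speech resource and block until it finishes.
       Snd_Index selects the speech resource linked into FLASH. */
    void PlaySnd(int Snd_Index)
    {
        SACM_S480_Initial(1);              /* 1 = auto (interrupt-driven) mode */
        SACM_S480_Play(Snd_Index, 3, 3);   /* 3 = both DAC channels, ramp up+down */
        while ((SACM_S480_Status() & 0x0001) != 0)   /* bit 0 set while playing */
        {
            SACM_S480_ServiceLoop();       /* keep the decoder queue filled */
        }
        SACM_S480_Stop();
    }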

3 Voice recognition module design
3.1 Voice recognition principle

The voice recognition system comprises two stages: training and recognition. The basic principle is shown in Figure 3.


(1) Preprocessing. This includes pre-emphasis, windowing and framing, endpoint detection, and similar steps. Before preprocessing, the speech signal must first be digitized: anti-aliasing filtering, A/D conversion, and automatic gain control suppress frequency components above half the sampling rate and noise, while pre-emphasis compensates for the spectral tilt introduced by the glottal excitation and oral/nasal (lip) radiation.
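As an illustration of these steps, the sketch below shows a standard pre-emphasis filter and Hamming window in C. The coefficient 0.95, the 8 kHz sampling rate, and the frame length are common textbook values, not figures taken from this design.

    #include <math.h>

    #define FRAME_LEN 240        /* 30 ms frame at 8 kHz (illustrative) */

    /* Pre-emphasis y[n] = x[n] - 0.95*x[n-1]: flattens the spectral tilt
       caused by glottal excitation and lip radiation. */
    void pre_emphasis(const short *x, float *y, int n)
    {
        y[0] = (float)x[0];
        for (int i = 1; i < n; i++)
            y[i] = (float)x[i] - 0.95f * (float)x[i - 1];
    }

    /* Hamming window applied to one frame before feature analysis. */
    void hamming_window(float *frame, int n)
    {
        for (int i = 0; i < n; i++)
            frame[i] *= 0.54f - 0.46f * cosf(2.0f * 3.1415926f * (float)i / (float)(n - 1));
    }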
(2) Acoustic feature analysis and extraction. After preprocessing, feature parameters are extracted from the speech signal, i.e. feature parameter analysis. This step extracts, from the raw speech signal, parameters that reflect the essence of the speech, forming a sequence of feature vectors. Two types of feature parameters dominate speech recognition today: linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC). LPCC coefficients model the human vocal tract but ignore the auditory characteristics of the human ear; MFCC coefficients account for the ear's auditory characteristics but require a Fourier transform, which consumes scarce computing resources. Embedded speech recognition systems therefore generally use LPCC coefficients. Features are extracted frame by frame, with the parameters of each frame forming one vector, so effective data compression is needed to keep the stored templates small.
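A sketch of LPCC extraction for one windowed frame follows: autocorrelation, the Levinson-Durbin recursion for the LPC coefficients, and the standard recursion converting them to cepstral coefficients. The order p = 10 is an assumed value, and sign conventions for the predictor polynomial vary between texts.

    #define P_ORDER 10   /* LPC order (assumed); callers must use p <= P_ORDER */

    /* Autocorrelation r[0..p] of one windowed frame x[0..n-1]. */
    void autocorr(const float *x, int n, float *r, int p)
    {
        for (int k = 0; k <= p; k++) {
            r[k] = 0.0f;
            for (int i = k; i < n; i++)
                r[k] += x[i] * x[i - k];
        }
    }

    /* Levinson-Durbin recursion: solve for LPC coefficients a[1..p]. */
    void levinson(const float *r, float *a, int p)
    {
        float tmp[P_ORDER + 1];
        float err = r[0];
        for (int i = 1; i <= p; i++) {
            float k = r[i];
            for (int j = 1; j < i; j++)
                k -= a[j] * r[i - j];
            k /= err;
            for (int j = 1; j < i; j++)
                tmp[j] = a[j] - k * a[i - j];
            for (int j = 1; j < i; j++)
                a[j] = tmp[j];
            a[i] = k;
            err *= 1.0f - k * k;
        }
    }

    /* Convert LPC a[1..p] to cepstral coefficients c[1..p] (LPCC). */
    void lpc_to_cepstrum(const float *a, float *c, int p)
    {
        for (int n = 1; n <= p; n++) {
            c[n] = a[n];
            for (int k = 1; k < n; k++)
                c[n] += ((float)k / (float)n) * c[k] * a[n - k];
        }
    }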
(3) Reference templates. A reference template is an acoustic parameter template obtained by repeatedly training on the speech of one or more speakers; it is created and stored before the system is used for recognition.
(4) Decision and recognition. Pattern recognition compares the feature parameters of the input speech against the trained reference patterns one by one; the best-matching reference pattern is the recognition result. Commonly used speech recognition algorithms include dynamic time warping (DTW), discrete hidden Markov models, continuous hidden Markov models, and artificial neural networks.
3.2 Principle and algorithm of speech recognition system
Among the eight channels of the built-in 10-bit analog-to-digital converter (ADC) of the SPCE061A, one channel, MIC_IN, is dedicated to speech input. For weaker signals an automatic gain control (AGC) amplifier is provided: the signal is amplified under AGC and then converted. The ADC can be viewed as an encoder that converts an analog signal into a digital one. It works by successive approximation: the digital value held in the successive approximation register (SAR) is sent to the 10-bit DAC0 for D/A conversion, and the analog output of DAC0 is compared with the external analog input voltage so that the digital equivalent of the input is found as quickly as possible. The output value VDAC0 is compared with the sampled input voltage VIN using a binary search: starting from the most significant bit of the SAR, each bit is decided in turn according to the result of the comparison, with the remaining lower bits held at 0.
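This bit-by-bit search can be modeled in a few lines of C. The code below is only a behavioral model of the successive approximation principle described above; vin and vref are illustrative variables, not chip registers.

    /* Behavioral model of 10-bit successive approximation:
       decide each bit from the MSB down by comparing the DAC
       output for the trial code against the sampled input. */
    unsigned int sar_convert(float vin, float vref)
    {
        unsigned int code = 0;
        for (int bit = 9; bit >= 0; bit--) {
            code |= 1u << bit;                          /* try this bit = 1 */
            float vdac = vref * (float)code / 1024.0f;  /* DAC0 trial output */
            if (vdac > vin)
                code &= ~(1u << bit);                   /* too high: keep bit = 0 */
        }
        return code;   /* 10 comparisons yield the digital equivalent */
    }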
The principle of the speech recognition algorithm is as follows. During training, feature vectors that adequately describe each speaker's pronunciation are extracted from the training utterances; these vectors form the speaker's templates. During testing, a test template is extracted from the speaker's utterance by the same processing and compared with the corresponding reference template. Because pronunciation varies from utterance to utterance, the test template and reference template never match exactly on the time axis; to compare them at time-equivalent points, dynamic time warping (DTW) is used. Its basic principle is to apply a nonlinear time-warping technique to align the reference feature vector sequence A = [a1, a2, ..., aM] with the feature vector sequence B = [b1, b2, ..., bN] of the speech to be recognized. DTW is probably the most compact speech recognition algorithm, with low system overhead and fast recognition, and it is very effective in small-vocabulary voice command systems. In the training stage the user speaks the vocabulary words one by one, the feature vector of each frame of sampled data is extracted, and the vectors are stored as templates in the template library. In the recognition stage the feature vectors of the speech to be recognized are extracted and compared for similarity with each template in the library; the most similar template is output as the result.
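A compact DTW implementation matching this description is sketched below; the dimensions and the squared-Euclidean local distance are illustrative choices.

    #define DIM    10      /* feature vector dimension (e.g. LPCC order) */
    #define MAXLEN 100     /* maximum frames per template (assumed) */
    #define BIG    1e30f

    /* Squared Euclidean distance between two feature vectors. */
    static float frame_dist(const float *a, const float *b)
    {
        float s = 0.0f;
        for (int i = 0; i < DIM; i++) {
            float d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /* DTW distance between reference A (M frames) and test B (N frames):
       cost of the best nonlinear alignment of the two sequences. */
    float dtw(float A[][DIM], int M, float B[][DIM], int N)
    {
        static float D[MAXLEN + 1][MAXLEN + 1];
        for (int i = 0; i <= M; i++)
            for (int j = 0; j <= N; j++)
                D[i][j] = BIG;
        D[0][0] = 0.0f;
        for (int i = 1; i <= M; i++)
            for (int j = 1; j <= N; j++) {
                float best = D[i - 1][j - 1];            /* diagonal step */
                if (D[i - 1][j] < best) best = D[i - 1][j];
                if (D[i][j - 1] < best) best = D[i][j - 1];
                D[i][j] = frame_dist(A[i - 1], B[j - 1]) + best;
            }
        return D[M][N];
    }

In recognition, dtw() is evaluated against every stored template, and the word whose template gives the minimum distance is output as the result.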


4 Voice control module design
4.1 Hardware design
The hardware of the voice control module is implemented on a very small circuit board. The hardware has a simple structure, low cost, and small size, and is easy to embed in household appliances. The I/O ports are allocated as follows: IOB4~IOB7 are the common ports that output signals after a specific voice command is recognized; IOB0~IOB1 are reserved output ports; IOB2~IOB3 are external interrupt trigger ports used to trigger the SPCE061A externally, and they can also serve as output ports when resources are tight, as shown in Figure 4.
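Under this port assignment, the initialization might look like the sketch below. The register names P_IOB_Data, P_IOB_Dir, and P_IOB_Attrib follow the Sunplus header conventions; the header file name and the exact Dir/Attrib/Data combinations for input modes should be checked against the SPCE061A datasheet.

    #include "SPCE061A.h"   /* Sunplus register definitions (assumed name) */

    /* IOB4~IOB7: command outputs; IOB0~IOB1: reserved outputs;
       IOB2~IOB3: inputs for the external interrupt triggers. */
    void io_init(void)
    {
        *P_IOB_Dir    |= 0x00F3;   /* IOB0,1,4..7 as outputs (Dir=1, Attrib=1) */
        *P_IOB_Attrib |= 0x00F3;
        *P_IOB_Dir    &= ~0x000C;  /* IOB2, IOB3 as inputs */
        *P_IOB_Attrib &= ~0x000C;  /* input mode bits per the datasheet */
        *P_IOB_Data   &= ~0x00F3;  /* outputs idle low */
    }

    /* Drive the port for recognized command k (0..3 -> IOB4..IOB7). */
    void signal_command(int k)
    {
        *P_IOB_Data |= 1 << (4 + k);    /* assert the output */
        /* hold long enough for the target device to latch, then release */
        *P_IOB_Data &= ~(1 << (4 + k));
    }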


4.2 Main program flow
At this point the whole system is complete. On first use, speaker-dependent voice training is performed; once training succeeds, the templates are automatically stored in FLASH, so no further training is needed. After each power-on the templates are automatically loaded into RAM and recognition begins; when a specific voice command is recognized, the corresponding IOB port outputs its signal.
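This flow can be summarized in C as below. The BSR_* names (BSR_DeleteSDGroup, BSR_Train, BSR_InitRecognizer, BSR_GetResult, BSR_MIC, BSR_TRAIN_TWICE) follow the Sunplus speech recognition library as it appears in typical SPCE061A demo code; the flash helpers, the header name, and the command IDs are hypothetical placeholders.

    #include "bsrsd.h"    /* Sunplus BSR recognition library (assumed name) */

    #define CMD_BASE  0x100   /* first trained word ID (hypothetical) */
    #define CMD_COUNT 5

    extern void io_init(void);             /* from the Section 4.1 sketch */
    extern void signal_command(int k);
    extern int  templates_in_flash(void);  /* hypothetical flash helpers */
    extern void save_templates(void);
    extern void load_templates(void);

    int main(void)
    {
        io_init();
        BSR_DeleteSDGroup(0);              /* clear the recognizer's RAM area */

        if (!templates_in_flash())         /* first use: train the speaker */
        {
            for (int i = 0; i < CMD_COUNT; i++)
                while (BSR_Train(CMD_BASE + i, BSR_TRAIN_TWICE) != 0)
                    ;                      /* retry until this word trains */
            save_templates();              /* keep templates across power-off */
        }
        else
        {
            load_templates();              /* later power-ons: load into RAM */
        }

        BSR_InitRecognizer(BSR_MIC);       /* start listening on MIC_IN */
        while (1)
        {
            int r = BSR_GetResult();
            if (r >= CMD_BASE && r < CMD_BASE + CMD_COUNT)
                signal_command(r - CMD_BASE);   /* pulse IOB4~IOB7 */
        }
    }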

5 Design of general voice integration software
5.1 Overall design of integrated software and tools used

The integrated development environment for this solution was built with VB.NET in Visual Studio (VS) 2005. Visual Basic .NET is one of the important members of Microsoft's VS.NET integrated development environment. It is simple, easy to learn, and easy to use, and many of its new features suit the needs of a new generation of software development. Anyone with a little programming background can master it quickly. Its visual user-interface designer frees programmers from tedious and complex interface coding, and the WYSIWYG design environment makes building an interface feel like assembling building blocks.
Writing the integrated development environment in Visual Studio 2005 reduced the development effort, shortened the development cycle, and improved the code's robustness and portability.
5.2 Introduction to the use of general voice integration software
The general voice integration software automatically generates all the code for the voice recognition, voice playback, and voice control modules, as shown in Figure 2. A prompt sound can be added at startup: click the open-file option and select the voice to be played; ticking the checkbox behind it enables the function. Below that, the playback and recognition of five voice commands are configured, each in the same way. For each command, the voice prompt is set as above; the trigger port is the port (IOB0~IOB7) driven after the voice is recognized; the level selects whether the output signal is high ("1") or low ("0"), to suit different applications. The training-success prompt is played after each instruction is trained successfully during initial training, the training-failure prompt is played when training fails, and the training-complete prompt is played when the initial training finishes successfully.

6 Conclusion
This universal voice control system has a simple structure, low cost, good scalability, and a short development cycle, and it can be widely applied in household appliances and other electronic products. Successfully applied, it should yield good economic and social benefits.
