Design of speech recognition system based on SPCE061A-EEWORLD

1 Introduction

Speech recognition technology allows machines to convert speech signals into corresponding text or commands through recognition and understanding. It is an interdisciplinary field and is gradually becoming a key technology for human-computer interfaces in information technology. Combined with speech synthesis, it lets people dispense with keyboards and operate devices through voice commands. With deeper research in phonetics and advances in digital signal processing software and hardware in recent years, speech technology has gradually moved out of the laboratory and into practical use. In particular, isolated-word recognition for small and medium-sized vocabularies has basically matured and is being applied in home appliances, smart toys, and other fields where recognition-rate requirements are not extremely strict.

2 Overall Hardware Design

This system uses the Lingyang SPCE061A as the main control chip, and the hardware of the embedded speech recognition system is designed around it according to the functional requirements. The SPCE061A is a dedicated speech-processing SoC with DSP functions. The system combines its A/D and D/A converters with the module circuits, external FLASH storage, an LED display circuit, a communication module, and a power amplifier and speaker output module.

Figure 1 Hardware composition of speech recognition system

2.1 Power Circuit

The SPCE061A runs from a low-voltage supply, which greatly reduces the chip's power loss. It has two supplies: the core supply (VDD) and the I/O port supply (VDDH). The I/O port supply is 5 V, while the core supply is 3.3 V or lower. Lowering the core voltage mainly reduces power consumption; it also lowers the operating temperature and extends the service life of the chip. Although this voice chip tolerates a wide operating voltage range, the system uses the following scheme so that the core runs more stably while still meeting the voltage requirements of the I/O port and the external expansion components:

The 220 V AC mains supply is stepped down to 10 V AC and rectified, and a circuit built around a 7805 voltage regulator generates the +5 V supply for the voice recognition and playback modules. A TR1972-33 then converts the 5 V supply to 3.3 V DC to power the CPU core.

Figure 2 Power supply circuit

2.2 Storage module circuit design

Because the SPCE061A has only 32K words of on-chip FLASH, external memory is needed to store a large number of voice resources. The system expands memory serially through the SIO interface, using Lingyang's SPR4096 chip. The SPR4096 is a high-performance 4M-bit (512K × 8-bit) FLASH divided into 256 sectors of 2K bytes each. It also has a built-in 4K × 8-bit SRAM, which can be read and written while the FLASH is being programmed or erased. The SPR4096 has both a bus memory interface and a serial interface, so the microcontroller can access the FLASH and SRAM storage areas in 8-bit parallel mode or 1-bit serial mode. This design uses the serial mode, with an interface operating frequency of 5 MHz. The SPR4096 has two power inputs, VDDI and VDDQ: VDDI powers the internal FLASH and control logic, and VDDQ powers the I/O. The maximum read current is 2 mA, and the maximum program/erase current is 6 mA.
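
The sector geometry quoted above (512K bytes in 256 sectors of 2K bytes) implies some simple address arithmetic when voice resources are laid out in the external FLASH. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and are not part of any Lingyang driver.

```python
# Geometry taken from the text: 4 Mbit = 512K x 8 bits, 256 sectors.
FLASH_BYTES = 512 * 1024
SECTOR_COUNT = 256
SECTOR_BYTES = FLASH_BYTES // SECTOR_COUNT  # 2048 bytes per sector


def sector_of(address):
    """Return (sector index, offset inside the sector) for a byte address."""
    if not 0 <= address < FLASH_BYTES:
        raise ValueError("address outside the 512K-byte FLASH")
    return divmod(address, SECTOR_BYTES)


def sectors_to_erase(start, length):
    """List the sector indices touched by writing `length` bytes at `start`."""
    if length <= 0:
        return []
    first, _ = sector_of(start)
    last, _ = sector_of(start + length - 1)
    return list(range(first, last + 1))
```

For example, a 100-byte write starting at byte 2000 straddles the boundary between sectors 0 and 1, so both sectors would have to be erased before programming.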

2.3 Audio output circuit module

Sound playback uses the DAC integrated in the SPCE061A, which has a current output, so a driving circuit is needed to drive the speaker. The SPY0030 in the figure is a single op amp made by Lingyang. Compared with the commonly used LM386, the SPY0030 has two advantages: it works from only 2.4 V, whereas the LM386 needs more than 4 V; and its output power is about 700 mW, whereas that of the LM386 is below 100 mW, so it provides sufficient driving capability. The audio output circuit is shown in Figure 3.

Figure 3 Audio output circuit

2.4 MIC Input Module

The A/D converter of the SPCE061A has 8 channels, one of which is the MIC_IN input, dedicated to sampling voice signals. The microphone converts the voice into an electrical signal, which is fed to the internal preamplifier of the SPCE061A. Because the distance between the microphone and the speaker's mouth varies, the energy of the voice signal varies widely, and an input that is too large or too small degrades recognition accuracy. The automatic gain control (AGC) circuit inside the SPCE061A continuously monitors the audio level at the preamplifier output. When the input signal increases, the AGC reduces the amplifier gain; when the input signal decreases, it increases the gain. This compensates for signals that are too small or too large, keeps the signal entering the A/D converter at the optimal level, and minimizes clipping.
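
The AGC in the SPCE061A is a hardware circuit, but its feedback behaviour can be illustrated in software. The sketch below is only an analogy: the frame length, target level, step size and gain limits are all assumed values chosen for illustration, not parameters of the chip.

```python
def agc(samples, frame_len=80, target=0.5, step=0.1,
        min_gain=0.1, max_gain=10.0):
    """Feedback AGC sketch: nudge the gain frame by frame toward a target peak.

    All default parameters are illustrative assumptions.
    """
    gain = 1.0
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        scaled = [gain * s for s in frame]
        out.extend(scaled)
        peak = max(abs(s) for s in scaled)
        if peak > target:
            # Output too loud: reduce the gain for the next frame.
            gain = max(min_gain, gain * (1.0 - step))
        elif peak < target:
            # Output too quiet: raise the gain for the next frame.
            gain = min(max_gain, gain * (1.0 + step))
    return out
```

After enough frames, both a loud input and a quiet input settle near the same output level, which is the compensation effect described above.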

2.5 Communication interface circuit

Data from the microcontroller's serial port is converted to RS-232 levels by a MAX232 and then sent to the host computer. Because the SPCE061A serial port uses TTL levels, which are incompatible with RS-232C levels, level conversion is required at the interface between the two. With an external 5 V supply and external capacitors, the MAX232 generates the positive and negative 10 V rails needed to form an RS-232C transceiver. In this system, the communication circuit uploads voice data to the PC so that the PC can carry out the heavy processing, for example the calculation of noise energy and zero-crossing rate, digital filter design, and model library training.

3 Software Design

The software consists of two parts: the speech recognition module and the speech playback module.

3.1 Speech Recognition Design

The speech recognition program is the main part of the software work. The flow chart of the recognition module is shown in Figure 4. The system uses the common double-threshold method based on short-time energy and zero-crossing rate for speech endpoint detection, and uses linear prediction cepstral coefficients (LPCC), which require relatively little calculation, as the speech feature vector. In addition, to meet the requirements of a speaker-independent embedded system and to reduce computation and storage, the extracted feature parameters are compressed by vector quantization. The recognition model is the discrete hidden Markov model (DHMM); the Baum-Welch re-estimation method, the forward-backward algorithm, and the Viterbi algorithm are used to train the speech templates and perform recognition.

Figure 4 Speech recognition module flow chart
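
The energy and zero-crossing-rate double-threshold method named in this section can be sketched as follows. This is a simplified illustration for a single isolated word, not the authors' implementation: the frame length and the three thresholds (`e_high`, `e_low`, `z_thresh`) are hypothetical and would normally be derived from the measured background noise.

```python
def frame_energy_zcr(samples, frame_len=160):
    """Short-time energy and zero-crossing count for non-overlapping frames."""
    energies, zcrs = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame))
        zcrs.append(sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0)))
    return energies, zcrs


def detect_endpoints(energies, zcrs, e_high, e_low, z_thresh):
    """Return (first_frame, last_frame) of the word, or None if no speech."""
    # Stage 1: frames above the high energy threshold are certainly speech.
    strong = [i for i, e in enumerate(energies) if e > e_high]
    if not strong:
        return None
    start, end = strong[0], strong[-1]
    # Stage 2: widen the segment while energy stays above the low threshold.
    while start > 0 and energies[start - 1] > e_low:
        start -= 1
    while end < len(energies) - 1 and energies[end + 1] > e_low:
        end += 1
    # Stage 3: widen further while the zero-crossing rate stays high,
    # which catches weak unvoiced consonants at the word boundaries.
    while start > 0 and zcrs[start - 1] > z_thresh:
        start -= 1
    while end < len(energies) - 1 and zcrs[end + 1] > z_thresh:
        end += 1
    return start, end
```

The high threshold avoids triggering on noise, while the low threshold and the zero-crossing check recover the quieter edges of the word that the high threshold alone would cut off.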

The initialization subroutine configures the microprocessor resources used for speech recognition, such as automatic A/D conversion, so that they can perform their functions.

Endpoint detection is used to avoid unnecessary calculations and to set the starting and ending points of speech recognition decoding to prevent invalid searches. Preprocessing is an important step to improve speech recognition performance and enhance robustness. Preprocessing includes filtering, pre-emphasis, windowing, framing, and other steps of the original speech signal. It may also include speech enhancement, noise cancellation, endpoint detection, and so on. Pre-emphasis is mainly used to enhance the high-frequency part to compensate for the loss of the high-frequency part when the sound is radiated from the lips. It can make the signal spectrum flat and reduce the dynamic range of the signal.
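
The pre-emphasis, framing and windowing steps described above can be sketched as follows. Pre-emphasis is the first-order filter y[n] = x[n] - a·x[n-1]. The coefficient 0.95, the Hamming window, and the frame and hop lengths are common choices assumed here for illustration; the article does not state the values actually used.

```python
import math


def pre_emphasis(samples, alpha=0.95):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def hamming(n_points):
    """Hamming window coefficients for a frame of n_points samples."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def windowed_frames(samples, frame_len=256, hop=128):
    """Split the signal into overlapping frames and apply the window."""
    window = hamming(frame_len)
    return [[s * w for s, w in zip(samples[start:start + frame_len], window)]
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

A constant (purely low-frequency) input is almost cancelled by the pre-emphasis filter, which shows how the filter tilts the spectrum toward the high frequencies.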

Feature extraction analyzes and processes the speech signal, removing redundant information that is irrelevant to recognition and keeping the information that is useful for it.
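
Since the system uses linear prediction cepstral coefficients as features, one common way to compute them is autocorrelation, followed by the Levinson-Durbin recursion, followed by the standard LPC-to-cepstrum recursion. The sketch below follows that route under the predictor convention x̂[n] = Σ a_k·x[n-k]; the prediction order and the number of cepstral coefficients are assumed values, and the function names are hypothetical.

```python
def autocorr(frame, order):
    """Autocorrelation r[0..order] of one windowed frame."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion: return predictor coefficients a_1..a_p."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # silent frame: leave the remaining coefficients at zero
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients to cepstral coefficients c_1..c_n."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=10, n_ceps=12):
    """LPCC feature vector of one frame (order and n_ceps are assumptions)."""
    return lpc_to_cepstrum(levinson(autocorr(frame, order), order), n_ceps)
```

The recursion avoids any FFT or logarithm, which is why LPCC is attractive on a small processor.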

Vector quantization (VQ) is an important signal compression method. It reduces the large amount of storage that speech signal processing would otherwise require, and it reduces the amount of calculation needed for recognition and matching.
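
The compression step can be illustrated with the encoding half of vector quantization: each feature vector is replaced by the index of its nearest codeword, so a whole frame is stored as one small integer. The sketch assumes a codebook that has already been trained (for example on the PC); the codebook in the usage note is made up for illustration.

```python
def nearest_codeword(vector, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize(features, codebook):
    """Map a sequence of feature vectors to a sequence of codebook indices."""
    return [nearest_codeword(v, codebook) for v in features]
```

With a hypothetical codebook `[[0, 0], [1, 1], [4, 4]]`, the vectors `[0.1, -0.2]`, `[0.9, 1.2]` and `[3.5, 5.0]` are encoded as the symbol sequence `[0, 1, 2]`, which is the kind of discrete observation sequence a DHMM consumes.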

The speech signal itself is an observable sequence: it is the parameter stream of phonemes (words, sentences) produced by the brain, whose state selection according to speech needs and grammatical knowledge cannot be observed. A discrete hidden Markov model (DHMM) is therefore used to model the speech signal.
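
For recognition, the Viterbi algorithm scores the quantized observation sequence against each word's DHMM and the best-scoring word is chosen. The sketch below works in the log domain to avoid underflow. It is a minimal illustration, and the two toy word models in the usage note are invented; real models would come from Baum-Welch training.

```python
import math


def safe_log(x):
    """Natural log that maps zero probability to minus infinity."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi_score(obs, pi, A, B):
    """Log-probability of the best state path of a discrete HMM.

    obs: list of VQ symbol indices (non-empty)
    pi:  initial state probabilities
    A:   A[r][s] = transition probability from state r to state s
    B:   B[s][k] = probability that state s emits symbol k
    """
    n_states = len(pi)
    delta = [safe_log(pi[s]) + safe_log(B[s][obs[0]])
             for s in range(n_states)]
    for symbol in obs[1:]:
        delta = [max(delta[r] + safe_log(A[r][s]) for r in range(n_states))
                 + safe_log(B[s][symbol])
                 for s in range(n_states)]
    return max(delta)


def recognize(obs, models):
    """Return the word whose model gives the highest Viterbi score."""
    return max(models, key=lambda word: viterbi_score(obs, *models[word]))
```

As a usage example, take two hypothetical left-to-right models that share `pi = [1, 0]` and `A = [[0.5, 0.5], [0, 1]]` but have mirrored emission tables: the symbol sequence `[0, 0, 1, 1]` is then assigned to the model whose first state prefers symbol 0.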

3.2 Voice Playback Design

To provide friendly human-computer interaction, the system must also play back voice prompts. The voice data is stored in compressed form, using one of the voice compression coding algorithms developed by Taiwan's Lingyang Company. Lingyang also provides API functions for the corresponding compression and decompression algorithms, which simplifies development.

First, the required voice prompts are recorded in advance and compressed with the Lingyang Compress Tool to produce the voice data to be broadcast. The voice playback program calls the API functions in the audio coding algorithm library provided by Lingyang and uses SACM_S480, one of Lingyang's compression algorithms, for automatic playback. Voice playback is executed in the interrupt service routine, and this system uses the FIQ_TMA interrupt source. Playback covers two situations: if the system recognizes the speech correctly, the post-recognition processing broadcasts the result by voice; if it cannot recognize the speech, it broadcasts the reason for the failure. The flow chart of the automatic voice playback program is shown in Figure 5.

Figure 5 Voice playback flow chart

4 Conclusion

The innovation of this paper lies in two points. First, the SPCE061A microprocessor selected for the proposed embedded speaker-independent speech recognition system runs at a clock of up to 49 MHz, so it is comparable to a DSP in processing complex digital signals while costing less than a dedicated DSP chip. It also has strong interrupt handling: it supports 10 interrupt vectors and more than 10 interrupt sources, which suits real-time speech processing. It provides a dual-channel 10-bit DAC audio output and a microphone input with automatic gain control (AGC), which makes speech processing much more convenient. Second, a discrete hidden Markov model is used to model the speech signal. The DHMM concentrates computational complexity in the template training stage, which greatly reduces the computational burden of the recognition stage and meets the requirements of speech control systems for specific people and small vocabularies. After this project was put on the market, it generated more than 500,000 in economic benefits within half a year.
