Design and implementation of DVD/TV voice-controlled remote control
Introduction: A DVD player normally relies on a TV to display its picture, so the user has to handle the DVD remote control and the TV remote control at the same time, which is inconvenient. A single remote control that operates both devices is preferable, and the remote control designed in this article is such a combined DVD/TV unit. The DVD side is custom-built, so its control codes are fully known in advance. TVs, however, come in many makes and models whose control codes differ and cannot be determined ahead of time. The TV codes therefore have to be learned: during a learning process, the TV's control codes are transmitted to the remote control, which stores them. The result is a learning remote control that works with many types of TV.
The remote control also incorporates speech recognition, so it can be operated either with buttons or with voice commands, which makes it considerably more convenient to use.
System hardware design: The main chip is UniSpeech, a speech-dedicated chip recently launched by Tsinghua University and Infineon that combines a DSP core with a microcontroller core (M8051). The chip integrates a 12-bit ADC and an 11-bit DAC, so no external CODEC is needed. Because it is an SoC, the system needs few chips, which improves integration and stability. This makes the chip well suited to applications such as remote controls with speech recognition.
Infrared signals are transmitted by driving the infrared emitter from the MCU's PWM function pin. Because this pin supports pulse-width modulation, setting the corresponding registers is enough to output a carrier signal with the desired duty cycle, so no additional driver is required to send the infrared signal. Infrared signals are received by connecting an infrared receiver to a general-purpose input/output pin of the MCU.
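The register setup mentioned above comes down to choosing a PWM period and a high time for the carrier. The following is a minimal sketch of that arithmetic only. The `PwmConfig` type and the `pwm_carrier_config` function are hypothetical names, and the 12 MHz timer clock, 38 kHz carrier and 33% duty cycle used in the usage note are assumed, commonly seen values rather than figures from this design; the real UniSpeech register layout is not shown here.

```c
#include <stdint.h>

/* Hypothetical container for the two values a PWM unit typically needs. */
typedef struct {
    uint16_t period_ticks; /* length of one carrier cycle in timer ticks */
    uint16_t high_ticks;   /* ticks per cycle during which the pin is high */
} PwmConfig;

/*
 * Derive PWM settings for an infrared carrier.
 * clock_hz     : timer clock feeding the PWM unit (assumed value)
 * carrier_hz   : desired carrier frequency
 * duty_percent : desired duty cycle, 0..100
 * Returns all zeros if carrier_hz is 0.
 */
PwmConfig pwm_carrier_config(uint32_t clock_hz, uint32_t carrier_hz,
                             uint8_t duty_percent)
{
    PwmConfig cfg = { 0u, 0u };
    uint32_t period;

    if (carrier_hz == 0u) {
        return cfg;
    }
    /* Round the period to the nearest whole tick. */
    period = (clock_hz + carrier_hz / 2u) / carrier_hz;
    cfg.period_ticks = (uint16_t)period;
    /* Round the high time to the nearest whole tick. */
    cfg.high_ticks = (uint16_t)((period * duty_percent + 50u) / 100u);
    return cfg;
}
```

With the assumed values, `pwm_carrier_config(12000000ul, 38000ul, 33)` yields a period of 316 ticks and a high time of 104 ticks. On real hardware these two numbers would be written to the PWM registers described in the chip's documentation.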
Because the acoustic models, remote control codes and other data require a relatively large amount of storage, the design uses SST's 8 Mbit Flash memory, the 39VF080.
An important part of remote control design is deciding what the buttons do. Based on how users typically operate DVD and TV remote controls, this design provides only a small 4×4 scanned keypad. The most frequently used functions can be controlled by both buttons and voice; all other functions are available by voice only. The system hardware block diagram is shown in Figure 1.
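A 4×4 scanned keypad is usually read by driving one row at a time and sampling the four column lines. The sketch below shows only the decoding step, under the assumption that the driver has already collected one column reading per row and that a pressed key reads as a 0 bit (active low). The function name `decode_key` and the `KEY_NONE` value are hypothetical.

```c
#include <stdint.h>

#define KEYPAD_ROWS 4u
#define KEYPAD_COLS 4u
#define KEY_NONE    0xFFu /* hypothetical "no key pressed" value */

/*
 * Decode one scan of a 4x4 keypad.
 * cols_per_row[r] holds the column lines sampled while row r was driven;
 * only the low 4 bits are used, and a 0 bit means "pressed" (assumed
 * active-low wiring). Returns row * 4 + col for the first pressed key
 * found, or KEY_NONE if no key is pressed.
 */
uint8_t decode_key(const uint8_t cols_per_row[KEYPAD_ROWS])
{
    uint8_t row, col;

    for (row = 0u; row < KEYPAD_ROWS; row++) {
        uint8_t cols = (uint8_t)(cols_per_row[row] & 0x0Fu);
        for (col = 0u; col < KEYPAD_COLS; col++) {
            if ((cols & (uint8_t)(1u << col)) == 0u) {
                return (uint8_t)(row * KEYPAD_COLS + col);
            }
        }
    }
    return KEY_NONE;
}
```

For example, a scan of `{0x0F, 0x0F, 0x0B, 0x0F}` has bit 2 cleared in row 2, so it decodes to key 10. A practical driver would also debounce by requiring the same result on several consecutive scans.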
In the system, the MCU is the main controller: it handles the various interfaces and the system configuration. The DSP acts as a coprocessor and performs the speech recognition and speech synthesis computations. Speech from the microphone is sampled by the ADC at 8 kHz with 12-bit linear quantization and then passed to the DSP for processing.
Because the speech recognition in this system is speaker-independent, the trained acoustic models and the edited voice command terms must be prepared in advance. The system connects to a computer's serial port through its UART, and the edited command terms and acoustic models are stored in Flash. For practical use, each voice command term (such as "power on" and "power off") is mapped to the remote control code of a key. When a voice command is spoken, speech recognition finds the matching command, and the effect is the same as pressing the corresponding key.
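The mapping from a recognized command term to a key can be pictured as a small lookup table. This is a minimal sketch, not the article's data layout: the `VoiceCommand` type, the `key_for_command` function and the key indices are hypothetical, and only the two terms named in the text are listed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a command term and the key it stands for. */
typedef struct {
    const char *term;
    int key_index; /* assumed index into the keypad's key codes */
} VoiceCommand;

/* Only the two terms mentioned in the text; indices are illustrative. */
static const VoiceCommand kVoiceCommands[] = {
    { "power on",  0 },
    { "power off", 1 },
};

/*
 * Return the key index mapped to a recognized term,
 * or -1 if the term is not in the table.
 */
int key_for_command(const char *term)
{
    size_t i;
    size_t count = sizeof kVoiceCommands / sizeof kVoiceCommands[0];

    for (i = 0; i < count; i++) {
        if (strcmp(kVoiceCommands[i].term, term) == 0) {
            return kVoiceCommands[i].key_index;
        }
    }
    return -1;
}
```

Once the key index is known, the rest of the program can treat a voice command exactly like a key press. In an embedded recognizer the result is more likely to be a numeric term index than a string, in which case the table is simply indexed directly.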
System software design: The system software is designed as a real-time system with a super-loop structure. All task modules are chained together by a super-loop in the main program. When a task gets control, it first checks whether it has an event to process. If not, it gives up control so that the next task in the chain can run. If there is an event, the task processes it completely or partially and then hands over control immediately. Because the tasks cooperate in this way, each one takes only a small amount of running time per pass. The system flow is shown in Figure 2. The code structure is as follows:

    void main(void)
    {
        EA = 0;            // Disable interrupts
        Init();            // System initialization
        EA = 1;            // Enable interrupts
        while (1)
        {
            Drv_Ring();    // Driver layer: scans the keypad and detects voice input
            App_Ring();    // Application layer: sends and receives infrared signals, recognizes voice commands
        }
    }
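The "check for an event, do a bounded amount of work, yield" behaviour of each task can be sketched in portable C as follows. The `Task` structure and the functions `task_step` and `super_loop_pass` are hypothetical illustrations of the cooperative pattern, not the internals of `Drv_Ring()` or `App_Ring()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task state: how many events wait, how many were handled. */
typedef struct {
    uint8_t  pending;
    uint32_t handled;
} Task;

/*
 * Run one task for one pass of the super-loop.
 * Returns 0 if there was nothing to do (control is given up at once),
 * or 1 after handling a single event, leaving the rest for later passes.
 */
int task_step(Task *task)
{
    if (task->pending == 0u) {
        return 0;
    }
    task->pending--;
    task->handled++;
    return 1;
}

/* One pass of the super-loop: give every task one bounded turn. */
void super_loop_pass(Task *tasks, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        (void)task_step(&tasks[i]);
    }
}
```

Because each turn handles at most one event, a busy task cannot starve the others; a task with two pending events needs two passes to drain them.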
Switching between function modules is done with a function switch and a speech recognition start key. Setting the function switch to "LEARN" puts the system in the learning function; setting it to "DVD" selects the DVD function; setting it to "TV" selects the TV function.
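One way to picture the role of the function switch is as a mode value that decides what a key press means. The sketch below is an assumption about how such a dispatch could look; the `Mode` and `KeyAction` enumerations and the function `action_for_key_press` are hypothetical names.

```c
/* Positions of the function switch, as named in the text. */
typedef enum { MODE_LEARN, MODE_DVD, MODE_TV } Mode;

/* Hypothetical actions a key press can trigger. */
typedef enum {
    ACT_CAPTURE_CODE,      /* LEARN: store the incoming infrared code */
    ACT_SEND_BUILTIN_CODE, /* DVD: send the code fixed in the program */
    ACT_SEND_LEARNED_CODE  /* TV: send the code read back from Flash */
} KeyAction;

/* Decide what a key press means for the current switch position. */
KeyAction action_for_key_press(Mode mode)
{
    switch (mode) {
    case MODE_LEARN:
        return ACT_CAPTURE_CODE;
    case MODE_DVD:
        return ACT_SEND_BUILTIN_CODE;
    case MODE_TV:
    default:
        break;
    }
    return ACT_SEND_LEARNED_CODE;
}
```

Keeping the decision in one function means the keypad driver and the speech recognizer can share it, since both end up producing a key value.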
DVD function module: The DVD remote control codes are supplied by the DVD player manufacturer, so the code format is fixed in the program according to the manufacturer's information. When a key is pressed, the program selects the corresponding code directly and transmits it as an infrared signal through the PWM pin.
TV function module: For the TV function, the remote control codes must be obtained through the learning process and stored in the data Flash. When the user presses a key, the program reads the corresponding code from the data Flash according to the key value and then sends it through the PWM function pin.
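Reading a code "according to the key value" implies some fixed relation between a key and a Flash location. The sketch below assumes one possible layout: each of the 16 keys owns one erase sector at the top of the 1 MB address space. The base address, the 4 KB sector size and the names `code_slot_address` and `CODE_ADDR_INVALID` are assumptions for illustration and should be checked against the Flash data sheet; the article does not describe its actual layout.

```c
#include <stdint.h>

#define KEY_COUNT          16u          /* 4x4 keypad */
#define CODE_SECTOR_SIZE   0x1000ul     /* assumed 4 KB erase sector */
#define CODE_BASE_ADDR     0xF0000ul    /* assumed start of the code area */
#define CODE_ADDR_INVALID  0xFFFFFFFFul /* hypothetical error value */

/*
 * Map a key value (0..15) to the Flash address of its code slot.
 * Returns CODE_ADDR_INVALID for a key value outside the keypad.
 */
uint32_t code_slot_address(uint8_t key)
{
    if (key >= KEY_COUNT) {
        return CODE_ADDR_INVALID;
    }
    return CODE_BASE_ADDR + (uint32_t)key * CODE_SECTOR_SIZE;
}
```

Giving each key its own sector wastes space but means that re-learning one key only requires erasing that key's sector, leaving the other learned codes untouched.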
Speech recognition module: When the speech recognition start key is pressed, the system enters the recognition state and starts listening for a voice command. Once the command has been recognized, the corresponding remote control code is sent according to the recognition result. The basic structure of the speech recognition subsystem is shown in Figure 3.
The speech recognition engine uses a sub-word-based, speaker-independent recognition model, which is more flexible and robust than earlier whole-word models based on isolated words.
By function, the recognition algorithm can be divided roughly into three parts: feature extraction, model parameter training and recognition network decoding. For a sub-word-based, speaker-independent embedded recognition engine, the acoustic model (an HMM) is independent of the recognition task and its parameters are relatively fixed, so model parameter training can be done on a PC (see the dotted box in Figure 3). Only feature extraction and recognition network decoding need to run on the chip. Feature extraction uses MFCC parameters as the speech features, and recognition network decoding uses the Viterbi search algorithm. To achieve high recognition accuracy while using few resources, a two-level recognition structure is adopted, and a recognition accuracy of 99% is achieved on the dedicated chip.
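To make the decoding step concrete, here is a minimal sketch of a textbook Viterbi search over a small HMM, using integer log-domain scores as a fixed-point implementation might. It is a generic illustration only: it does not reproduce the two-level recognition structure or the recognition network of this system, and the names `viterbi_decode`, `N_STATES` and `MAX_FRAMES` as well as the array layout are assumptions.

```c
#include <limits.h>

#define N_STATES   3  /* assumed number of HMM states */
#define MAX_FRAMES 16 /* assumed upper bound on feature frames */

/*
 * Find the most likely state sequence for n_frames observations.
 * All scores are log-domain integers (higher is better), so
 * probabilities multiply by adding scores.
 *   log_init[j]               : score of starting in state j
 *   log_trans[i*N_STATES + j] : score of moving from state i to j
 *   log_emit[t*N_STATES + j]  : score of frame t given state j
 * Writes the best state for each frame to path[0..n_frames-1] and
 * returns the best total score, or INT_MIN if n_frames is out of range.
 */
int viterbi_decode(const int *log_init, const int *log_trans,
                   const int *log_emit, int n_frames, int *path)
{
    int score[MAX_FRAMES][N_STATES];
    int back[MAX_FRAMES][N_STATES];
    int t, i, j, best;

    if (n_frames < 1 || n_frames > MAX_FRAMES) {
        return INT_MIN;
    }
    for (j = 0; j < N_STATES; j++) {
        score[0][j] = log_init[j] + log_emit[j];
        back[0][j] = 0;
    }
    for (t = 1; t < n_frames; t++) {
        for (j = 0; j < N_STATES; j++) {
            int best_prev = 0;
            int best_score = score[t - 1][0] + log_trans[j];
            for (i = 1; i < N_STATES; i++) {
                int s = score[t - 1][i] + log_trans[i * N_STATES + j];
                if (s > best_score) {
                    best_score = s;
                    best_prev = i;
                }
            }
            score[t][j] = best_score + log_emit[t * N_STATES + j];
            back[t][j] = best_prev;
        }
    }
    /* Pick the best final state, then follow the back-pointers. */
    best = 0;
    for (j = 1; j < N_STATES; j++) {
        if (score[n_frames - 1][j] > score[n_frames - 1][best]) {
            best = j;
        }
    }
    path[n_frames - 1] = best;
    for (t = n_frames - 1; t > 0; t--) {
        path[t - 1] = back[t][path[t]];
    }
    return score[n_frames - 1][best];
}
```

In a real engine the emission scores would come from comparing each frame's MFCC vector with the acoustic model, and the transition table would encode the recognition network built from the command terms.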
Learning module: In the learning function state, the system detects the incoming infrared remote control code. The received code is stored in the Flash data area that corresponds to the button selected by the user. The next time that button is pressed, the newly learned code is read back from that Flash data area.
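The article does not say how a received code is represented before it is stored. A common approach in learning remotes, assumed here, is to record the lengths of the alternating signal-on ("mark") and signal-off ("space") periods. The sketch below converts a buffer of periodic samples of the receiver pin into such run lengths; the function name `capture_runs` and the sample encoding (nonzero means signal present) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert sampled receiver levels into alternating mark/space lengths.
 *   samples[i] : nonzero = infrared signal present, 0 = idle (assumed)
 *   runs       : output, run lengths in sample periods, starting with
 *                the first mark
 * Leading idle samples and a trailing idle run are not part of the code
 * and are dropped. Run lengths saturate at 65535.
 * Returns the number of runs written (at most max_runs).
 */
size_t capture_runs(const uint8_t *samples, size_t n,
                    uint16_t *runs, size_t max_runs)
{
    size_t i = 0;
    size_t count = 0;

    /* Skip the idle time before the first mark. */
    while (i < n && samples[i] == 0u) {
        i++;
    }
    while (i < n && count < max_runs) {
        int level = (samples[i] != 0u);
        uint16_t len = 0u;

        while (i < n && (samples[i] != 0u) == level) {
            if (len < 0xFFFFu) {
                len++;
            }
            i++;
        }
        if (!level && i == n) {
            break; /* trailing idle run: the code has ended */
        }
        runs[count++] = len;
    }
    return count;
}
```

For the sample sequence 0,0,1,1,1,0,0,1,0,0,0 this produces three runs: 3, 2, 1. The resulting array is what would be written to the Flash data area for the selected button and replayed later by switching the carrier on and off for the recorded durations.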