Design and implementation of robot voice control system based on DSP and FPGA

Publisher: 心境恬淡 | Updated: 2010-08-13 | Source: 现代电子技术 (Modern Electronics Technique)

1 Introduction

A robot hearing system recognizes and interprets human voices, then outputs corresponding action commands to control the movements of the head and arms. Traditional robot hearing systems generally use a PC as the platform: the computer serves as the robot's information-processing core and controls the robot through an interface circuit. Although this approach offers strong processing power, a fairly complete voice library, and easy system updates and function expansion, it is bulky, which hinders miniaturization and operation under complex conditions; it also consumes considerable power and is costly.

This design uses the cost-effective digital signal processor TMS320VC5509 for speech recognition. Its high processing speed allows the robot to complete complex speech-signal processing and action-command control independently, in an offline state. Implementing the timing-control and logic circuits in an FPGA reduces the area they occupy on the PCB [1], making the speech-processing "brain" of the robot small and low-power. Developing a robot system that is compact, low-power, and fast, and that can complete speech recognition and action commands within a specific domain, therefore has great practical significance.

2 Overall design of system hardware

The system hardware acquires voice commands, drives and controls the stepper motors, and provides a development and debugging platform for the system software. The overall architecture is shown in Figure 1.

The hardware is divided into several parts: voice signal acquisition and playback, DSP-based speech recognition, FPGA action-command control, the stepper motors and their drivers, the DSP's external flash memory, JTAG emulation and debugging, and keyboard control. The workflow is as follows: the microphone converts the human voice into an analog signal, which the audio codec TLV320AIC23 samples and quantizes into a digital signal fed to the DSP; after the DSP completes recognition, it outputs the action command.

According to the action instructions from the DSP, the FPGA generates the correct direction signals and accurately timed pulses for the stepper-motor driver chips, which in turn provide the drive signals that rotate the stepper motors. The off-chip flash stores the system program and voice library and handles power-on loading. The JTAG port is used for online emulation with a PC, and the keyboard is used for parameter adjustment and function switching.

3 Speech Recognition System Design

3.1 Characteristics of speech signals

The frequency components of speech signals are mainly distributed between 300 Hz and 3 400 Hz, so by the sampling theorem a sampling rate of 8 kHz is selected. One characteristic of speech signals is their "short-term stationarity": over one short interval a signal may behave like random noise, over another like a periodic signal, or like a mixture of both. The characteristics change with time, and only within a short interval, typically 5 to 50 ms, does the signal exhibit stable, consistent characteristics; speech processing must therefore be based on this short-term nature [2]. The system sets the frame length to 20 ms with a frame shift of 10 ms, so each frame contains 160 samples of 16 bits each.

3.2 Collection and playback of speech signals

The voice acquisition and playback chip is TI's TLV320AIC23B, which integrates the analog-to-digital (ADC) and digital-to-analog (DAC) converters on chip. The chip is configured for an 8 kHz sampling rate, single-channel analog input, and two-channel output. The TLV320AIC23 is programmable: the DSP writes its control registers through the control interface, which supports both SPI and I2C modes. The circuit connection between the TLV320AIC23B and the VC5509 DSP is shown in Figure 2.

The DSP sets the TLV320AIC23's registers over I2C; with MODE = 0 the control interface operates in I2C mode. Using master-transmit mode, the DSP initializes the 11 registers at addresses 0000000 to 0001111 through the I2C port. In I2C mode each write is sent as three 8-bit bytes: the device address byte followed by two bytes carrying the TLV320AIC23's 7-bit register address and 9-bit register data, i.e., the most significant bit of the data is packed into the last bit of the second byte.
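As a concrete illustration, the sketch below builds the two data bytes of such a register write; the bit layout follows the description above, while the function name is of course illustrative:

```c
#include <stdint.h>

/* Pack a TLV320AIC23 register write into the two data bytes of the
 * three-byte I2C transaction (device address byte + these two bytes):
 * 7-bit register address, then 9-bit register data, with data bit 8
 * carried in the low bit of the first byte. */
void aic23_pack(uint8_t reg, uint16_t data, uint8_t out[2])
{
    out[0] = (uint8_t)((reg << 1) | ((data >> 8) & 0x01)); /* addr + data MSB */
    out[1] = (uint8_t)(data & 0xFF);                       /* data bits 7..0  */
}
```

For example, writing the register at address 0001111 with data 0 yields the byte pair 0x1E, 0x00.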

The McBSP serial port connects to the TLV320AIC23 through six pins: CLKX, CLKR, FSX, FSR, DR, and DX. Data is exchanged with the codec through the DR and DX pins, while CLKX, CLKR, FSX, and FSR carry the clock and frame-synchronization signals. The McBSP is set to DSP mode, its receiver and transmitter are synchronized, transfers are started by the TLV320AIC23's frame-sync signals LRCIN and LRCOUT, and the transmit and receive word length is set to 32 bits (16 bits for the left channel and 16 bits for the right) in single-frame mode.

3.3 Design of speech recognition program module

To recognize voice commands from arbitrary speakers, the system adopts speaker-independent isolated-word recognition. Speaker-independent recognition means the voice model is trained by people of different ages, genders, and accents, so a new speaker's voice can be recognized without additional training [2]. The system is divided into pre-emphasis and windowing, endpoint detection, feature extraction, pattern matching against the voice library, and training.

3.3.1 Pre-emphasis and windowing of speech signals

Pre-emphasis mainly removes the influence of the glottal excitation and of lip and nasal radiation. The pre-emphasis digital filter is H(z) = 1 - kz⁻¹, where k is the pre-emphasis coefficient, close to 1; this system uses k = 0.95. The speech sequence X(n) is pre-emphasized to obtain the sequence x(n):

x(n) = X(n) - kX(n-1)    (1)
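Equation (1) maps directly to a few lines of C. The sketch below applies the filter in place over one buffer (the function name is illustrative):

```c
#include <stddef.h>

#define PREEMPH_K 0.95f   /* pre-emphasis coefficient k from Eq. (1) */

/* In-place pre-emphasis x(n) = X(n) - k*X(n-1). Iterating backwards
 * keeps each X(n-1) intact until it has been used; x[0] is left
 * unchanged, since it has no predecessor. */
void preemphasize(float *x, size_t n)
{
    for (size_t i = n; i-- > 1; )
        x[i] -= PREEMPH_K * x[i - 1];
}
```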

The system slides a finite-length Hamming window along the speech sequence to intercept frames of 20 ms with a 10 ms shift; the Hamming window effectively reduces the loss of signal features at the frame edges.

3.3.2 Endpoint Detection

Endpoint detection finds the beginning and end of a word when there is a sufficient time gap between words. It is usually based on the short-time energy distribution:

E = Σ(n=0..N-1) x²(n)    (2)

where x(n) is the speech frame intercepted by the Hamming window and N is the frame length; since each frame holds 160 samples, N = 160. For a silent frame E is very small, while for a speech frame E rises quickly to a substantial value, which distinguishes the start and end points of a word.
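The energy computation and threshold test can be sketched as follows; the threshold value itself is an assumption, to be calibrated against background noise (names are illustrative):

```c
#include <stddef.h>

/* Short-time energy of one frame, E = sum over n of x(n)^2, with
 * N = 160 samples per frame in this system. */
float frame_energy(const float *x, size_t n)
{
    float e = 0.0f;
    for (size_t i = 0; i < n; i++)
        e += x[i] * x[i];
    return e;
}

/* A frame counts as speech once its energy crosses the threshold. */
int is_speech(const float *x, size_t n, float threshold)
{
    return frame_energy(x, n) > threshold;
}
```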

3.3.3 Feature vector extraction

A feature vector extracts the effective information from the speech signal for further analysis and processing. Commonly used feature parameters include linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC); this system extracts MFCCs. The MFCC parameters are based on human auditory characteristics: they exploit the critical-band effect of human hearing [3] and apply Mel-cepstrum analysis to the speech signal to obtain a sequence of Mel-cepstral coefficient vectors that represent the input speech spectrum. Several band-pass filters with triangular or sinusoidal characteristics are placed across the speech spectrum, the speech energy spectrum is passed through this filter bank, the output of each filter is computed, its logarithm is taken, and a discrete cosine transform (DCT) is applied to obtain the MFCC coefficients. The MFCC transformation can be simplified as follows:
C(i) = Σ(k=1..M) lg F(k) · cos[πi(k - 0.5)/M],  i = 1, 2, …, P    (3)
where i is the index of the cepstral coefficient, P is the number of triangular filters (this system selects P = 16), F(k) is the output of the k-th filter, and M is the data length.

3.3.4 Pattern Matching and Training of Speech Signals

Model training builds a template from the feature vectors, and pattern matching compares the current feature vector with the templates in the speech library to obtain the result. Both use the Hidden Markov Model (HMM), a doubly stochastic process that models the statistical characteristics of a random process. Because the HMM describes the non-stationarity and variability of speech signals well, it is widely used [4].

HMMs involve three basic algorithms: the Viterbi algorithm, the forward-backward algorithm, and the Baum-Welch algorithm. This design uses the Viterbi algorithm for state decoding, matching the feature vectors of the collected speech against the models in the speech library, and the Baum-Welch algorithm for training. Since the model's observations are assumed independent between frames, the Baum-Welch algorithm can be used to train the HMM.
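A compact log-domain Viterbi sketch over a toy model follows; the sizes and all scores are illustrative, not the system's actual HMM topology:

```c
#include <stddef.h>

#define N_STATES 3   /* toy model sizes, not the real topology */
#define T_OBS    4

/* Log-domain Viterbi decoding: pi[] holds log initial probabilities,
 * a[][] log transition probabilities, b[t][j] the log emission score
 * of state j for frame t. Returns the best path's log score and
 * writes the state sequence into path[]. */
float viterbi(const float pi[N_STATES],
              float a[N_STATES][N_STATES],
              float b[T_OBS][N_STATES],
              int path[T_OBS])
{
    float delta[T_OBS][N_STATES];   /* best score ending in state j at t */
    int   psi[T_OBS][N_STATES];     /* backpointers                      */

    for (int j = 0; j < N_STATES; j++)
        delta[0][j] = pi[j] + b[0][j];

    for (int t = 1; t < T_OBS; t++)
        for (int j = 0; j < N_STATES; j++) {
            int best = 0;
            for (int i = 1; i < N_STATES; i++)
                if (delta[t-1][i] + a[i][j] > delta[t-1][best] + a[best][j])
                    best = i;
            delta[t][j] = delta[t-1][best] + a[best][j] + b[t][j];
            psi[t][j]   = best;
        }

    int last = 0;
    for (int j = 1; j < N_STATES; j++)
        if (delta[T_OBS-1][j] > delta[T_OBS-1][last])
            last = j;

    path[T_OBS-1] = last;
    for (int t = T_OBS-1; t > 0; t--)   /* trace backpointers */
        path[t-1] = psi[t][path[t]];
    return delta[T_OBS-1][last];
}
```

In recognition, each library model is scored this way against the utterance's feature sequence and the highest-scoring model wins.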

3.4 DSP development of speech recognition program

The DSP development environment is CCS 3.1 with DSP/BIOS. The recognition and training routines are packaged as modules, defined as separate functions and called from the main program: the recognizer is int Recognizer(int Micin), the recognition-result output is int Result(void), the trainer is int Train(int Tmode, int Audiod), and the action-command input is int Keyin(int Action) [5].

The recognizer function transforms the current speech input into a feature vector, matches it against the templates of the speech library, and outputs the result. The speech-response output function plays the response corresponding to the recognition result. Speech training converts voice-command input from multiple people of different ages, genders, and accents into library templates. To guard against bad samples, each person's voice command is entered twice; the two inputs are compared by Euclidean distance, and if their similarity reaches 95% they are added to the sample set. The speech-response input function stores, for each template in the library, the corresponding speech output, so that the robot can answer in language. In normal operation the system runs the recognition subroutine; during training an external interrupt invokes the training function, which builds the library templates and returns when training completes. The program flowchart is shown in Figure 3.

4 Design of robot motion control system

4.1 FPGA logic design

The system controls the robot's head movement by voice. The head moves with two degrees of freedom, up-down and left-right, so two stepper motors are required. After the DSP completes speech recognition, it outputs the corresponding action command; once the action is executed, the DSP issues a zeroing command and the head returns to its initial state. The FPGA provides the DSP interface logic, implements RAM blocks that store the DSP instructions, and generates the drive pulses that control the rotation direction and angle of the stepper motors.

The FPGA serves as the action-instruction control unit; the design uses a FLEX10KE chip. After receiving data from the DSP, it controls the two stepper motors in parallel. The internal logic of the FPGA is shown in Figure 4. Two components inside the FPGA act as motor pulse generators, controlling the working pulses and the forward/reverse rotation of the motors. A0~A7 are the DSP data input ports, WR is the data write port, P1 and P2 are the pulse outputs to the two stepper-motor driver chips, L1 and L2 are the forward/reverse control ports, and ENABLE is the enable signal.

RAM1 and RAM2 are the command registers of the two stepper motors; each motor pulse generator emits the number of square-wave pulses corresponding to the value in its RAM. The DSP outputs a 9-bit instruction on data lines D0~D8, where D8 selects the RAM: when it is 1, RAM1 is selected, and when it is 0, RAM2 is selected. D0~D7 carry the target motor angle. Each axis rotates over a 120° range (up-down and left-right) with 1° accuracy, and the initial position is 60°; thus D0~D7 range from 00000000 to 01111000, with an initial value of 00111100. The FPGA acts as a step-pulse generator, sets the motor speed through its clock-cycle configuration, and determines forward or reverse rotation from the commanded coordinate relative to the current one. The system action-instruction program is shown in Figure 5.
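The 9-bit command layout and the difference operation described here can be sketched in C; the FPGA itself implements this in VHDL, so the C version is only illustrative:

```c
#include <stdint.h>
#include <stdlib.h>

#define ANGLE_MAX  120   /* degrees of travel per axis, 1-degree steps */
#define ANGLE_INIT  60   /* centre position at power-up                */

/* Build the 9-bit instruction word: bit D8 selects the command RAM
 * (1 = RAM1, 0 = RAM2), bits D7..D0 carry the target angle. */
uint16_t make_command(int ram1, uint8_t angle)
{
    return (uint16_t)((ram1 ? 0x100 : 0x000) | (angle & 0xFF));
}

/* The difference operation: pulse count and direction flag derived
 * from the commanded coordinate versus the current coordinate. */
void step_plan(uint8_t target, uint8_t current, int *steps, int *dir)
{
    int diff = (int)target - (int)current;
    *steps = abs(diff);
    *dir   = (diff >= 0);   /* 1 = forward, 0 = reverse */
}
```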

Here R1 is the DSP instruction register and R2 is the current-coordinate register. The rotation direction and angle of a stepper motor are determined by taking the difference between the coordinate output by the DSP and the FPGA's current coordinate. The advantage is that a newly arriving instruction can end the current action and run immediately. After the instruction is executed, the system resets and the stepper motors return to the initial state.

4.2 FPGA Logic Simulation

The FPGA logic above is designed in VHDL on the MAX+PLUS II development platform and debugged through the JTAG interface. The FLEX10KE chip outputs the correct forward/reverse signals and pulse waveforms according to the DSP's output instructions.

4.3 Stepper Motor Drive Design

The FPGA controls the stepper-motor driver chips through the P1, L1, P2, and L2 outputs. The driver is Toshiba's TA8435H, a single-chip sinusoidal-microstepping driver dedicated to two-phase stepper motors. The circuit connection between the FPGA and the TA8435H is shown in Figure 6.

Since the FLEX10KE and TMS320VC5509 operate at 3.3 V while the TA8435H uses 5 V and 25 V supplies, the pin connections use TLP521 optocouplers to isolate the two voltage domains. CLK1 is the clock input pin, CW/CCW is the forward/reverse control pin, and A, /A, B, /B are the connections to the two-phase stepper motor's windings.

5 Conclusion

The system makes full use of the DSP's high processing speed and expandable off-chip storage. It is fast, real-time, achieves a high recognition rate, and supports a large voice library. The FPGA simplifies the system circuitry: a single FLEX10KE chip completes the timing control of both stepper motors. Although its processing speed and voice-library capacity still fall short of a PC-based system, an embedded system built around a DSP and an FPGA clearly has broad prospects for robot miniaturization, low power consumption, and specific-function implementation.
