Design of robot voice control system based on DSP and FPGA-EEWORLD

1 Introduction

The robot hearing system recognizes and interprets human speech and then outputs the corresponding action commands to control the movements of the robot's head and arms. Traditional robot hearing systems generally use a PC as the platform: the computer serves as the robot's information-processing core and controls the robot through an interface circuit. This approach offers strong processing power, a relatively complete voice library, and easy system updates and function expansion. However, the PC is bulky, which hinders miniaturization of the robot and operation under complex conditions, and it also consumes a lot of power and is costly.
This design uses the cost-effective digital signal processor TMS320VC5509 as the speech recognition processor. Its high processing speed lets the robot complete complex speech signal processing and action command control on its own, in an offline state. Using an FPGA reduces the PCB area occupied by timing control and logic circuits, making the speech processing part of the robot's "brain" small and low-power. A robot system that is small, low-power and fast, and that can recognize speech and execute action commands within a specific range of commands, therefore has considerable practical value.

2 Overall design of system hardware

The system hardware acquires voice commands, drives and controls the stepper motors, and provides a development and debugging platform for the system software, as shown in Figure 1.

The system hardware consists of the following parts: voice signal acquisition and playback, DSP-based speech recognition, FPGA action command control, the stepper motors and their drivers, the DSP's external flash memory, the JTAG emulation and debugging port, and keyboard control. In operation, the microphone converts the human voice into an analog signal, which the audio chip TLV320AIC23 samples and quantizes into a digital signal for the DSP. After the DSP completes recognition, it outputs the corresponding action command.

The FPGA generates the correct forward/reverse signals and accurate pulses for the stepper motor driver chips according to the action commands received from the DSP. The driver chips supply the drive signals that rotate the stepper motors. The off-chip flash stores the system program and the voice library and handles loading at power-on. The JTAG port is used for in-circuit emulation with a PC, and the keyboard is used for parameter adjustment and function switching.

3 Speech Recognition System Design

3.1 Characteristics of speech signals

The frequency components of speech signals are mainly distributed between 300 and 3400 Hz. The sampling theorem requires a sampling rate above twice the highest frequency component (6.8 kHz), so the sampling rate is set to 8 kHz. One characteristic of speech is its "short-time nature": over one short interval it may resemble random noise, over another it may look like a periodic signal, or it may show both characteristics at once. Speech characteristics change with time and remain stable and consistent only within a short interval, generally 5 to 50 ms. Speech processing must therefore be based on this short-time nature [2]. The system sets the frame length to 20 ms and the frame shift to 10 ms, so each frame contains 160 samples of 16 bits (8000 Hz × 0.02 s = 160 samples).
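
The frame arithmetic above can be checked with a short C sketch. It assumes the samples arrive as 16-bit integers in one buffer; the helper names `frame_count` and `frame_ptr` are hypothetical and not part of the original program. Because the shift (80 samples) is half the frame length (160 samples), consecutive frames overlap by 10 ms.

```c
#include <stddef.h>
#include <stdint.h>

/* Parameters stated in the text: 8 kHz sampling, 20 ms frames, 10 ms shift. */
#define FS_HZ        8000
#define FRAME_MS     20
#define SHIFT_MS     10
#define FRAME_LEN    (FS_HZ * FRAME_MS / 1000)   /* 160 samples */
#define FRAME_SHIFT  (FS_HZ * SHIFT_MS / 1000)   /* 80 samples  */

/* Number of complete frames that fit in a buffer of n_samples samples. */
size_t frame_count(size_t n_samples)
{
    if (n_samples < FRAME_LEN)
        return 0;
    return (n_samples - FRAME_LEN) / FRAME_SHIFT + 1;
}

/* Pointer to the first sample of frame i; frames overlap by FRAME_LEN - FRAME_SHIFT. */
const int16_t *frame_ptr(const int16_t *signal, size_t i)
{
    return signal + i * FRAME_SHIFT;
}
```

For example, one second of speech (8000 samples) yields (8000 - 160) / 80 + 1 = 99 frames.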

3.2 Collection and playback of speech signals

The voice acquisition and playback chip is the TLV320AIC23B produced by TI, which integrates the analog-to-digital converter (ADC) and digital-to-analog converter (DAC) on chip. Here it uses an 8 kHz sampling rate, single-channel analog input, and dual-channel output. The TLV320AIC23 is programmable: the DSP writes the device's control registers through its control interface, which can be configured as either SPI or I2C. The circuit connection between the TLV320AIC23B and the DSP5509 is shown in Figure 2.

The DSP configures the TLV320AIC23 registers through its I2C port; the control interface operates as I2C when MODE = 0. Acting as the I2C master transmitter, the DSP initializes the 11 registers whose addresses lie in the range 0000000 to 0001111. In I2C mode, each register write consists of three 8-bit bytes. Because the TLV320AIC23 uses a 7-bit register address and 9-bit data, the most significant data bit must be placed in the last (least significant) bit of the second byte, after the 7-bit register address.
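
The byte layout described above can be illustrated with a small C helper that builds one register write. This is a sketch: the function name `aic23_pack_write` is hypothetical, the 7-bit device address is passed in rather than hard-coded (it depends on how the chip is strapped, so it should be taken from the datasheet), and sending the bytes through the C5509 I2C peripheral is not shown.

```c
#include <stdint.h>

/* Build the three bytes of one TLV320AIC23 register write over I2C:
 *   byte 0: 7-bit device address followed by the R/W bit (0 = write)
 *   byte 1: 7-bit register address followed by data bit 8 (the 9th bit)
 *   byte 2: data bits 7..0
 * dev_addr is a parameter because it depends on the board strapping. */
void aic23_pack_write(uint8_t dev_addr, uint8_t reg, uint16_t data, uint8_t out[3])
{
    out[0] = (uint8_t)((dev_addr & 0x7F) << 1);                      /* R/W = 0 */
    out[1] = (uint8_t)(((reg & 0x7F) << 1) | ((data >> 8) & 0x01));  /* addr + D8 */
    out[2] = (uint8_t)(data & 0xFF);                                 /* D7..D0 */
}
```

Writing 0x1FF to register 0000100, for instance, gives a second byte of 0x09 (register address shifted left, data bit 8 in the LSB) and a third byte of 0xFF.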

The McBSP serial port is connected to the TLV320AIC23 through six pins: CLKX, CLKR, FSX, FSR, DR and DX. Data is exchanged with the peripheral over the DR and DX pins, while the four pins CLKX, CLKR, FSX and FSR carry the clock and synchronization signals. The McBSP serial port is set to DSP mode with its receiver and transmitter synchronized; transfers are started by the TLV320AIC23 frame synchronization signals LRCIN and LRCOUT, and both transmit and receive use single-frame mode with a 32-bit word length (16 bits for the left channel and 16 bits for the right channel).
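
With 32-bit words carrying both channels, the receive routine has to extract the channel of interest, and the playback routine has to pack samples back into one word. The sketch below assumes the left channel is sent first and therefore occupies the upper 16 bits; the helper names are hypothetical. Since the input is single-channel, the recognizer would keep only one of the two halves.

```c
#include <stdint.h>

/* Split one 32-bit McBSP word into its two 16-bit channel samples.
 * Assumption: the left channel occupies the upper 16 bits. */
static inline int16_t left_sample(uint32_t word)
{
    return (int16_t)(uint16_t)(word >> 16);
}

static inline int16_t right_sample(uint32_t word)
{
    return (int16_t)(uint16_t)(word & 0xFFFFu);
}

/* Pack two 16-bit samples into one 32-bit word for dual-channel playback. */
static inline uint32_t pack_stereo(int16_t left, int16_t right)
{
    return ((uint32_t)(uint16_t)left << 16) | (uint32_t)(uint16_t)right;
}
```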

3.3 Design of speech recognition program module

To let the robot recognize voice commands from arbitrary speakers, the system adopts speaker-independent ("non-specific") isolated-word recognition. In speaker-independent recognition, the voice models are trained by people of different ages, genders and accents, so a speaker's voice can be recognized without that speaker training the system [2]. The recognition software is divided into pre-emphasis and windowing, endpoint detection, feature extraction, pattern matching against the voice library, and training.

3.3.1 Pre-emphasis and windowing of speech signals

Pre-emphasis mainly compensates for the attenuation of the high-frequency components caused by glottal excitation and by radiation from the mouth and nose. The pre-emphasis digital filter is $H(z) = 1 - kz^{-1}$, where k is the pre-emphasis coefficient, close to 1; in this system, k = 0.95. Pre-emphasizing the speech sequence X(n) gives the pre-emphasized speech sequence x(n):

$$x(n) = X(n) - kX(n-1) \qquad (1)$$
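
The pre-emphasis relation x(n) = X(n) - kX(n-1) maps directly onto a first-order FIR loop. The sketch uses floating point for readability; on the fixed-point TMS320VC5509 the same loop would normally be written in Q15 arithmetic. The function name and the `prev` state argument (which carries X(n-1) across buffer boundaries) are illustrative assumptions.

```c
#include <stddef.h>

#define PREEMPH_K 0.95f  /* pre-emphasis coefficient k from the text */

/* Apply x(n) = X(n) - k*X(n-1) in place.
 * *prev holds X(-1) on entry and the last input sample on return, so that
 * consecutive blocks are filtered as one continuous signal. */
void pre_emphasis(float *buf, size_t len, float *prev)
{
    for (size_t n = 0; n < len; ++n) {
        float cur = buf[n];               /* X(n) */
        buf[n] = cur - PREEMPH_K * (*prev);
        *prev = cur;                      /* becomes X(n-1) for the next sample */
    }
}
```

Calling it block by block with the same `prev` variable gives the same result as filtering the whole recording at once.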

The system slides a finite-length Hamming window along the speech sequence to extract frames with a length of 20 ms and a shift of 10 ms. Compared with simply truncating the signal, the Hamming window tapers the frame edges and so reduces the spectral leakage that would otherwise distort the signal features.

3.3.2 Endpoint Detection

Endpoint detection locates the beginning and end of a word, provided there is a sufficient time gap between words. It usually relies on the short-time energy, defined as:

$$E(n) = \sum_{m=0}^{N-1} x_n^2(m) \qquad (2)$$

where $x_n(m)$ is the n-th frame of the speech sequence intercepted by the Hamming window. Each frame has 160 samples, so N = 160. For silence, E(n) is very small; when speech begins, E(n) rises quickly to a much larger value, and the rise and subsequent fall of E(n) mark the starting and ending points of the word.
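
Short-time energy and a threshold test can be sketched as follows. The energy function sums the squared samples of one windowed frame. The endpoint rule (the first run of at least `min_frames` consecutive frames whose energy exceeds `threshold`) is a simple illustrative choice, and both the threshold and the minimum run length are hypothetical tuning values not given in the text. Requiring a minimum run keeps short noise bursts from being mistaken for a word.

```c
#include <stddef.h>

/* Short-time energy of one windowed frame: sum of x(m)^2 over N samples. */
float frame_energy(const float *x, size_t N)
{
    float e = 0.0f;
    for (size_t m = 0; m < N; ++m)
        e += x[m] * x[m];
    return e;
}

/* Find the first run of at least min_frames frames with energy > threshold.
 * On success returns 1 and sets *start and *end (inclusive frame indices).
 * threshold and min_frames are hypothetical tuning parameters. */
int detect_endpoints(const float *energy, size_t n_frames, float threshold,
                     size_t min_frames, size_t *start, size_t *end)
{
    size_t i = 0;
    while (i < n_frames) {
        if (energy[i] <= threshold) {         /* silence: keep scanning */
            ++i;
            continue;
        }
        size_t j = i;                         /* i is a candidate start */
        while (j < n_frames && energy[j] > threshold)
            ++j;                              /* j is one past the run */
        if (j - i >= min_frames) {            /* long enough to be a word */
            *start = i;
            *end = j - 1;
            return 1;
        }
        i = j;                                /* too short: treat as noise */
    }
    return 0;
}
```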

3.3.3 Feature vector extraction

Feature extraction derives the information useful for further analysis and processing from the speech signal. Commonly used feature parameters include linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC). This system extracts the speech feature vectors as Mel Frequency Cepstrum Coefficients (MFCC). MFCC parameters are based on human auditory characteristics: they exploit the critical-band effect of human hearing and apply Mel cepstral analysis to the speech signal, producing a sequence of Mel cepstral coefficient vectors that represent the spectrum of the input speech. Several band-pass filters with triangular or sinusoidal responses are placed across the speech spectrum, and the speech energy spectrum is passed through this filter bank. The output of each filter is computed, its logarithm is taken, and a discrete cosine transform (DCT) yields the MFCC coefficients. The MFCC transform can be written in simplified form as:

$$C(i) = \sum_{k=1}^{M} \ln F(k) \cos\left[\frac{\pi i (k - 0.5)}{M}\right], \quad i = 1, 2, \ldots, P \qquad (3)$$

where i is the coefficient index (this system selects P = 16 coefficients), F(k) is the output of the k-th triangular filter, and M is the data length, that is, the number of filter outputs.

3.3.4 Pattern Matching and Training of Speech Signals

Model training builds templates from the feature vectors, and pattern matching compares the current feature vector with the templates in the speech library to obtain the recognition result. Both the pattern matching and the training of the speech library use the Hidden Markov Model (HMM), a statistical model of a doubly stochastic process: a hidden Markov chain of states, together with an observable random process whose statistics depend on the current state. Because the HMM describes the non-stationarity and variability of speech signals well, it is widely used.

There are three basic algorithms for HMMs: the Viterbi algorithm, the forward-backward algorithm, and the Baum-Welch algorithm. This design uses the Viterbi algorithm to determine the most likely state sequence and to match the feature vectors of the captured speech against the models in the speech library. The Baum-Welch algorithm is used for training; it is applicable because the model treats the observation features of different frames as independent.
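
The Viterbi matching step can be sketched in the log domain, where products of probabilities become sums and numerical underflow is avoided. To stay independent of the emission model, the sketch takes the per-frame, per-state log likelihoods log b_j(o_t) as precomputed input; how they are derived from the MFCC vectors is not specified in the text. The array limits and names are hypothetical. In a recognizer, this function would be run once per word model and the model with the highest score selected.

```c
#include <float.h>
#include <stddef.h>

#define MAX_STATES 8     /* hypothetical limits; requires N <= MAX_STATES */
#define MAX_FRAMES 128   /* and 1 <= T <= MAX_FRAMES */

/* Log-domain Viterbi decoding.
 *   log_pi[j]      : log initial probability of state j
 *   log_a[i*N + j] : log transition probability from state i to state j
 *   log_b[t*N + j] : log likelihood of frame t's feature vector in state j
 * Writes the best state sequence to path[0..T-1] and returns its log score.
 * Uses static work arrays, so it is not reentrant. */
double viterbi(const double *log_pi, const double *log_a, const double *log_b,
               size_t N, size_t T, int *path)
{
    static double delta[MAX_FRAMES][MAX_STATES];  /* best score ending in j at t */
    static int psi[MAX_FRAMES][MAX_STATES];       /* best predecessor of j at t */

    for (size_t j = 0; j < N; ++j) {
        delta[0][j] = log_pi[j] + log_b[j];
        psi[0][j] = 0;
    }
    for (size_t t = 1; t < T; ++t) {
        for (size_t j = 0; j < N; ++j) {
            double best = -DBL_MAX;
            int arg = 0;
            for (size_t i = 0; i < N; ++i) {
                double s = delta[t - 1][i] + log_a[i * N + j];
                if (s > best) { best = s; arg = (int)i; }
            }
            delta[t][j] = best + log_b[t * N + j];
            psi[t][j] = arg;
        }
    }
    /* Pick the best final state, then backtrack through psi. */
    double best = -DBL_MAX;
    int last = 0;
    for (size_t j = 0; j < N; ++j)
        if (delta[T - 1][j] > best) { best = delta[T - 1][j]; last = (int)j; }
    path[T - 1] = last;
    for (size_t t = T - 1; t > 0; --t)
        path[t - 1] = psi[t][path[t]];
    return best;
}
```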

3.4 DSP Development of Speech Recognition Program

The DSP development environment is CCS 3.1 (Code Composer Studio) with DSP/BIOS. The speech recognition and training programs are implemented as separate modules, each defined as a function and called from the main program. The speech recognition function is defined as int Recognizer(int Micin), the recognition result output function as int Result(void), the speech training function as int Train(int Tmode, int Audiod), and the action command input function as int Keyin(int Action).

The speech recognizer converts the current speech input into speech feature vectors, matches them against the templates in the speech library, and outputs the result. The speech response output function outputs the voice response corresponding to the recognition result. Speech training converts the speech commands spoken by multiple people of different ages, genders and accents into templates in the training library. To guard against bad samples, each person trains each speech command twice. The two inputs are compared using the Euclidean distance, and if their similarity reaches 95%, they are added to the sample set. The speech response input function assigns the corresponding voice response to each template in the speech library, so that the robot can answer by voice. In normal operation, the system runs the speech recognition subroutine; for training, an external interrupt invokes the training function, which builds the database templates and then returns. The program flowchart is shown in Figure 3.

4 Design of robot motion control system

4.1 FPGA Logic Design

The system controls the robot's head movements by voice. Head movement has two degrees of freedom, up-down and left-right, so two stepper motors are required. After the DSP completes voice recognition, it outputs the corresponding action command. After the action has been executed, the DSP issues a zeroing command and the head returns to its initial position. The FPGA provides the DSP interface logic, uses RAM blocks to store the DSP commands, and generates the stepper motor drive pulses that control each motor's direction and angle of rotation.

The FPGA device is the action command control unit. The design uses a FLEX10KE chip, which, after receiving data from the DSP, controls the two stepper motors in parallel. The internal logic structure of the FPGA is shown in Figure 4. Two motor pulse generator components inside the FPGA produce the working pulses and the forward/reverse signals of the motors. A0~A7 are the DSP data input ports, WR is the data write port, P1 and P2 drive the pulse inputs of the two stepper motor driver chips, L1 and L2 are the motor forward/reverse control ports, and ENABLE is the enable signal.

RAM1 and RAM2 are the command registers of the two stepper motors. The motor pulse generator emits the number of square-wave pulses corresponding to the value stored in RAM. The DSP outputs commands on data lines D0~D8: D8 selects the RAM (1 selects RAM1, 0 selects RAM2), and D0~D7 carry the 8-bit motor angle. Each motor rotates through 120° (up-down or left-right) with a resolution of 1°, and the initial position is 60°. D0~D7 therefore range from 00000000 to 01111000 (0 to 120), with an initial value of 00111100 (60). Acting as the stepper pulse generator, the FPGA sets the motor speed through its clock-period configuration, and the commanded coordinate relative to the initial value determines forward or reverse rotation. The system action command program is shown in Figure 5.
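
The command format can be illustrated with a C sketch of the encoding on the DSP side and the field split on the FPGA side. It models only the bit layout described above (D8 selects the RAM, D0~D7 hold an angle from 0 to 120 in 1° steps); the names are hypothetical, and the actual FPGA logic is written in VHDL.

```c
#include <stdint.h>

#define ANGLE_MAX   120u   /* rotation range in degrees, 1 degree per LSB */
#define ANGLE_HOME  60u    /* initial position, 00111100b */
#define SEL_RAM1    1u     /* D8 = 1 selects RAM1 */

/* Build the 9-bit word driven on D0~D8: D8 = RAM select, D7..D0 = angle.
 * Returns -1 if the requested angle is outside 0..120. */
int encode_cmd(unsigned ram_sel, unsigned angle)
{
    if (angle > ANGLE_MAX)
        return -1;
    return (int)(((ram_sel & 1u) << 8) | (angle & 0xFFu));
}

/* Split a received word back into its two fields (FPGA interface side). */
void decode_cmd(uint16_t word, unsigned *ram_sel, unsigned *angle)
{
    *ram_sel = (word >> 8) & 1u;
    *angle = word & 0xFFu;
}
```

For example, returning the first motor to its initial position would send encode_cmd(SEL_RAM1, ANGLE_HOME), which is 0x13C.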

Here R1 is the DSP command register and R2 is the current coordinate register. The direction and angle of stepper motor rotation are determined by subtracting the FPGA's current coordinate from the coordinate output by the DSP. The advantage is that, when a new command arrives, the current motion can be abandoned and the new command executed. After the command has been executed, the system is reset and the stepper motor returns to its initial position.
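
The difference operation can be modeled behaviorally in C (the actual logic is VHDL). In this hypothetical sketch, R1 is the target from the DSP and R2 tracks the current coordinate one pulse at a time, so a new command arriving mid-move is planned from the position actually reached. It assumes one pulse moves the head by 1° and treats increasing angle as "forward"; both are assumptions, since the real pulse-to-degree ratio depends on the motor step angle and the TA8435H microstep setting.

```c
#include <stdlib.h>

typedef struct {
    int forward;      /* 1 = increasing angle (assumed "forward"), 0 = reverse */
    unsigned steps;   /* pulses still to emit on P1/P2 */
} move_t;

/* Difference operation: R1 (target from the DSP) minus R2 (current coordinate). */
move_t plan_move(unsigned r1_target, unsigned r2_current)
{
    int diff = (int)r1_target - (int)r2_current;
    move_t m = { diff >= 0, (unsigned)abs(diff) };
    return m;
}

/* Called for every emitted pulse: R2 follows the motor one degree at a time. */
void on_pulse(unsigned *r2_current, move_t *m)
{
    if (m->steps == 0)
        return;                 /* move already complete */
    if (m->forward)
        ++*r2_current;
    else
        --*r2_current;
    --m->steps;
}
```

For instance, a move from 60° toward 90° that is interrupted at 70° by a command to 40° is replanned as 30 reverse steps, and the zeroing command back to 60° then needs 20 forward steps.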

4.2 FPGA Logic Simulation

The FPGA logic described above is designed in VHDL on the MAX+PLUS II development platform and debugged through the JTAG interface. The FLEX10KE chip outputs the correct forward/reverse signals and pulse waveforms according to the commands output by the DSP.

4.3 Stepper Motor Drive Design

The FPGA controls the stepper motor driver chips through outputs P1, L1, P2 and L2. The stepper motor driver is the TA8435H produced by Toshiba, a dedicated single-chip sine-wave microstepping driver for two-phase stepper motors. The circuit connection between the FPGA and the TA8435H is shown in Figure 6.

Because the FLEX10KE and the TMS320VC5509 operate at 3.3 V while the TA8435H uses 5 V and 25 V, the connecting pins are isolated with TLP521 optocouplers. CLK1 is the clock input pin, CW/CCW is the forward/reverse control pin, and A, Ā, B and B̄ connect to the two phases of the stepper motor.

5 Conclusion

The system makes full use of the DSP's high processing speed and expandable off-chip storage, giving it high speed, real-time operation, a high recognition rate and support for a large voice library. Using an FPGA simplifies the system circuitry: a single FLEX10KE chip handles the timing control of both stepper motors. Although its processing speed and voice-library storage capacity still fall short of a PC-based system, an embedded system built around a DSP and an FPGA has broad prospects for robots that must be small, low-power and dedicated to specific functions.
