Design and implementation of a high-performance voice dialer

Abstract: A voice dialer is designed around the ADSP2186L digital signal processor, with a microcontroller handling system control, to provide voice-controlled automatic dialing.

Keywords: DSP; speech recognition; dual-tone multi-frequency (DTMF)

With the development of speech signal processing theory and very-large-scale integrated circuits, speech recognition, speech coding and speech synthesis technologies are gradually becoming practical. Corresponding consumer products have been released abroad (for example, by Sensory in the United States), and mobile phones with voice recognition have also appeared in the domestic market (for example, from PHILIPS and SAMSUNG). The ADSP2186L-based voice dialer introduced in this article integrates voice recognition, voice encoding and decoding, voice prompts, speech synthesis and dual-tone multi-frequency (DTMF) dialing. Compared with existing products, it offers larger capacity, a higher recognition rate and easier use.

1 System functions

·Stores 200 user entries and 800 phone numbers.

·Supports voice query mode: the user only needs to speak an entry to get the corresponding phone number. Manual query mode is also supported.

·Provides convenient editing functions (add, delete and modify user records).

·Provides dual-tone multi-frequency (DTMF) dialing, so automatic dialing works with an ordinary telephone.

2 System overall module design

The entire system can be divided into three functional modules: signal processing unit, system control unit and user interface unit.

2.1 Signal processing unit

This unit includes a digital signal processor, a codec and memory.

As the core of the system's voice signal processing, ADSP2186L is a low-voltage 16-bit fixed-point digital signal processor produced by ANALOG DEVICES. The chip has the following features:

· Operation speed 33MIPS;

·The chip contains 40K Byte of on-chip RAM: 8K words of 24-bit program RAM (24K Byte) and 8K words of 16-bit data RAM (16K Byte);

·Two independent programmable full-duplex serial ports, supporting hardware A-law/μ-law companding (compression and expansion) of voice data and autobuffering;

·4M Byte external addressing space;

·Support DMA operations between internal and external memory;

·13 programmable I/O ports.

In this system the DSP performs feature extraction, endpoint detection and template matching on the voice signal, and also manages the user records.

Used in conjunction with the ADSP2186L is the latest low-voltage 16-bit codec AD73311L produced by ANALOG DEVICES. By setting the corresponding registers, this chip supports multiple sampling frequencies from 8 to 64 kHz, with programmable input and output gain control. It performs the A/D and D/A conversion of voice signals in the system.

The system memory is the SST39VF080Q, an 8M-bit flash memory produced by SILICON STORAGE TECHNOLOGY. It mainly stores two kinds of content: the DSP application and the user's voice recording data. The chip has a software write-protection function that prevents accidental changes to the application program. The entire chip is divided into 256 sectors of 4K Byte each, which suits the storage and retrieval of voice records.
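The sector figures above are self-consistent: 256 sectors of 4K Byte give 1M Byte, which is 8M bits. The following is a minimal sketch of that arithmetic. It assumes sectors are numbered from 0 and laid out contiguously; the article does not say how the firmware divides sectors between the application and the voice records, so no partition is shown.

```python
SECTOR_SIZE = 4 * 1024   # bytes per sector (from the text)
SECTOR_COUNT = 256       # sectors per chip (from the text)


def sector_range(index):
    """Return (first_byte, last_byte) of a sector.

    Assumes sectors are numbered from 0 and laid out contiguously,
    which is an assumption made for this sketch.
    """
    if not 0 <= index < SECTOR_COUNT:
        raise ValueError("sector index out of range")
    start = index * SECTOR_SIZE
    return start, start + SECTOR_SIZE - 1


total_bytes = SECTOR_SIZE * SECTOR_COUNT
total_bits = total_bytes * 8
```

Because erasure happens one sector at a time, a record that fits within whole sectors can be deleted without disturbing its neighbours.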

2.2 System control unit

The system control unit is the KS57C2308, a 4-bit microcontroller produced by SAMSUNG. This chip is mainly responsible for the overall process control of the system and for user interface control. The KS57C2308 has a wide operating voltage range, provides a segment LCD driver, and has strong I/O capabilities, which makes it suitable for designs that need many I/O ports and an LCD display. Using the KS57C2308 simplifies the system and leaves room for expanding its external interfaces.

2.3 User interface unit

This unit includes a keyboard, an LCD, an electret microphone and a speaker.

The system hardware module structure is shown in Figure 1.

2.4 Communication between digital signal processor and codec

The communication between the digital signal processor ADSP2186L and the codec is completed through an independent serial port. The specific wiring is shown in Figure 2, and the signal description is shown in Table 1.

Table 1 Signal description

Pin name	Function
SCLK	Serial bit clock
RFS	Receive frame synchronization
TFS	Transmit frame synchronization
DR	Data receive pin
DT	Data transmit pin

Because the AD73311L is clocked by its own external crystal oscillator and derives the sampling rate from its internal frequency-division register, the SCLK, TFS and RFS signals of the ADSP2186L are all configured as external inputs. Communication follows the serial port timing of the DSP.

2.5 Communication between digital signal processor and microcontroller

The interface between the ADSP2186L and the microcontroller is similar to the codec interface. The difference is that the SCLK, TFS and RFS signals of the ADSP2186L are all generated internally and driven as outputs. The microcontroller emulates the serial port timing of the DSP on its general-purpose I/O pins. The communication protocol uses a custom data packet format, which is fast and reliable. The microcontroller also controls the reset pin of the DSP, which further improves system reliability.
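The article does not publish its packet format. Purely as an illustration of what a small custom packet between a microcontroller and a DSP can look like, the sketch below frames a command with a sync byte, a length byte and an additive checksum. Every field here (the `SYNC` value, the field order, the checksum rule) is hypothetical and not taken from the original design.

```python
SYNC = 0xA5  # hypothetical start-of-packet marker, not from the article


def checksum(data):
    """8-bit additive checksum: sum of all bytes modulo 256."""
    return sum(data) & 0xFF


def build_packet(command, payload=b""):
    """Frame as: SYNC, command, payload length, payload, checksum."""
    if not 0 <= command <= 0xFF:
        raise ValueError("command must fit in one byte")
    if len(payload) > 0xFF:
        raise ValueError("payload too long")
    body = bytes([SYNC, command, len(payload)]) + bytes(payload)
    return body + bytes([checksum(body)])


def parse_packet(packet):
    """Validate a packet and return (command, payload)."""
    if len(packet) < 4 or packet[0] != SYNC:
        raise ValueError("bad header")
    if len(packet) != packet[2] + 4:
        raise ValueError("bad length")
    if checksum(packet[:-1]) != packet[-1]:
        raise ValueError("bad checksum")
    return packet[1], bytes(packet[3:-1])
```

A length byte plus checksum lets the receiver reject a packet damaged by a missed clock edge, which matters when one side emulates the serial timing in software.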

2.6 Interface between digital signal processor and external memory

The DSP is connected to the external memory through the address and data buses, and the BMS, WR and RD pins of the DSP provide chip select and read/write control for the external memory. When the system is reset, the DSP automatically loads its program from the external memory; data is transferred by DMA while the system is running.

The microcontroller drives the LCD through dedicated LCD drive pins and scans the keyboard through general-purpose I/O pins.

3 System software implementation

The system software adopts a modular design, and the DSP program is divided into 10 basic modules according to functions.

(1) System initialization

Initialization content includes algorithm parameters, state variables, and port settings.

(2) Communication interface

This module receives instructions from the microcontroller, carries out the corresponding sequence of operations, and sends back the result.

(3) Voice recording

Using the DSP's autobuffering, the feature parameters for recognition and playback are extracted in real time while the user speaks an entry. A high-performance endpoint detection algorithm removes silent and noise segments, and the feature data of the actual speech segment is saved in data RAM.
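The article does not describe its endpoint detection algorithm. As a rough illustration of the idea, the sketch below uses short-time energy with a fixed threshold, and requires several consecutive active frames so that isolated clicks are ignored. The frame length (80 samples, 10 ms at an assumed 8 kHz rate), the threshold and `min_frames` are all assumed values.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples for each complete, non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def _first_run(flags, min_frames):
    """Index where the first run of min_frames consecutive True values starts."""
    run = 0
    for i, flag in enumerate(flags):
        run = run + 1 if flag else 0
        if run == min_frames:
            return i - min_frames + 1
    return None


def detect_endpoints(samples, frame_len=80, threshold=1.0e6, min_frames=3):
    """Return (start_sample, end_sample) of the speech segment, or None.

    All default parameter values are assumptions for this sketch.
    """
    active = [e > threshold for e in frame_energies(samples, frame_len)]
    first = _first_run(active, min_frames)
    if first is None:
        return None
    # Scan backwards the same way to find the last sustained active frame.
    last = len(active) - 1 - _first_run(active[::-1], min_frames)
    return first * frame_len, (last + 1) * frame_len
```

For example, a signal made of 400 silent samples, 800 loud samples and 400 silent samples yields the segment from sample 400 to sample 1200. A practical detector would usually adapt the threshold to the measured background noise.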

(4) Voice recognition

A pattern-matching method compares the feature parameters in data RAM with the existing user records stored in external memory, selects the most similar record as the recognition result, and returns the corresponding record pointer.
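The article does not name the matching method. Dynamic time warping (DTW) is one classical choice for speaker-dependent isolated-word recognition, and the sketch below uses it only as an example of template matching. The squared Euclidean frame distance and the `templates` dictionary (record pointer mapped to a feature sequence) are assumptions.

```python
def frame_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Allow a match, or a stretch of either sequence.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features, templates):
    """Return the record pointer whose template is closest to features.

    templates maps a record pointer to its stored feature sequence
    (a hypothetical layout chosen for this sketch).
    """
    return min(templates, key=lambda p: dtw_distance(features, templates[p]))
```

DTW tolerates differences in speaking rate because one stored frame may align with several spoken frames, at the price of a cost table of size n by m per comparison.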

(5) Voice prompts

Pre-encoded data is used to synthesize speech that prompts the user through each operation, which makes the system easier to use and reduces the chance of misoperation. Encoding and decoding use the Multi-Pulse Linear Predictive Coding (MPLPC) algorithm, which yields higher-quality synthetic speech at a lower bit rate.

(6) Voice playback

The playback feature parameters extracted during "voice recording" are used to synthesize speech so that the user can check the recording or recognition result. Because the parameters must be extracted in real time, a fast coding and decoding algorithm with good synthesis quality is used here.

(7) Add records

Store the feature parameters extracted during "voice recording" in the external memory together with the corresponding user phone number, and update the record pointer.

(8) Modify records

Keep the recorded feature parameters unchanged and only modify the phone number.

(9) Delete records

Delete the entire user record, including its feature parameters and phone numbers, and clear the corresponding pointer.
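The add, modify and delete operations in modules (7) to (9) can be pictured as operations on a table indexed by record pointer. The sketch below keeps the table in memory and stores one phone number per record for brevity; the class name `RecordTable` and its layout are hypothetical, and the real system keeps this data in flash.

```python
MAX_RECORDS = 200  # user entries, from "System functions"


class RecordTable:
    """In-memory stand-in for the record store (hypothetical layout)."""

    def __init__(self):
        self.slots = [None] * MAX_RECORDS  # None marks a free slot

    def _get(self, pointer):
        if not 0 <= pointer < MAX_RECORDS or self.slots[pointer] is None:
            raise KeyError("no record at pointer %r" % (pointer,))
        return self.slots[pointer]

    def add(self, features, number):
        """Store features with a phone number; return the record pointer."""
        for pointer, slot in enumerate(self.slots):
            if slot is None:
                self.slots[pointer] = {"features": features, "number": number}
                return pointer
        raise MemoryError("record table full")

    def modify(self, pointer, number):
        """Change only the phone number; the features stay unchanged."""
        self._get(pointer)["number"] = number

    def delete(self, pointer):
        """Remove features and number together and free the pointer."""
        self._get(pointer)
        self.slots[pointer] = None
```

Keeping the features untouched in `modify` means the user does not have to re-record an entry just to change its number.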

(10) Dual tone multi-frequency dialing

The sinusoidal signals are synthesized by series expansion, converted to analog by the D/A converter of the codec, and output to the speaker to realize automatic dialing.
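A DTMF digit is the sum of two sine waves, one from a low-frequency group and one from a high-frequency group. The sketch below builds each sine from a truncated Taylor series, as one possible reading of "series expansion". The frequency pairs are the standard DTMF values; the 8 kHz sample rate, 0.1 s duration, amplitude and number of series terms are assumptions, and the article's fixed-point implementation is not shown.

```python
import math

ROWS = (697, 770, 852, 941)       # low-group frequencies in Hz
COLS = (1209, 1336, 1477, 1633)   # high-group frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")
DTMF_FREQS = {key: (ROWS[r], COLS[c])
              for r, row in enumerate(KEYS) for c, key in enumerate(row)}


def sin_series(x, terms=8):
    """Approximate sin(x) with a truncated Taylor series."""
    # Reduce the argument to [-pi, pi], then fold into [-pi/2, pi/2]
    # using sin(pi - x) = sin(x) so that few terms are needed.
    x = math.fmod(x, 2 * math.pi)
    if x > math.pi:
        x -= 2 * math.pi
    elif x < -math.pi:
        x += 2 * math.pi
    if x > math.pi / 2:
        x = math.pi - x
    elif x < -math.pi / 2:
        x = -math.pi - x
    term = total = x
    for k in range(1, terms):
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


def dtmf_samples(key, duration=0.1, rate=8000, amplitude=0.5):
    """Return the samples of one DTMF digit (parameters are assumed values)."""
    low, high = DTMF_FREQS[key]
    step = 2 * math.pi / rate
    return [amplitude * (sin_series(low * step * i) + sin_series(high * step * i))
            for i in range(round(duration * rate))]
```

An amplitude of 0.5 per tone keeps the sum of the two sines within the range -1 to 1, so the result can be scaled to the D/A converter range without clipping.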

The microcontroller receives user instructions from the keyboard and, by combining the functional modules in different ways, forms a variety of control flows to suit different applications.

The system software block diagram is shown in Figure 3.

4 System performance test

This system achieves a high recognition rate: over 99% for ordinary entries, and over 90% even for easily confused entries with similar pronunciations, such as "Li Ping, Li Ning, Li Ding". The system also has a good user interface: users complete each operation by following voice prompts, which is convenient and fast.

This system has the basic functions of a voice dialer. Because its software and hardware design are modular, it can be easily transformed into a voice control system suitable for other applications. Therefore, this system has broad application prospects.
