Design of a Chinese speech synthesis system based on DSP

0 Preface

As speech signal processing technology continues to develop and mature, speech synthesis is becoming a key technology for human-computer interfaces. A DSP chip (digital signal processor) is a microprocessor whose architecture is specially designed to execute signal processing algorithms quickly. On such algorithms it can run 10 to 50 times faster than a general-purpose CPU. This article introduces an implementation of a Chinese speech synthesis system based on a DSP.

1 Overall system plan

The defining feature of speech synthesis is that it produces continuous sentences with an unlimited vocabulary from a limited set of stored units [1]. To achieve this, the system consists of four basic stages: (1) a front-end preprocessing module that converts the input text file into a standard format the system can process; (2) a prosodic rule library that supplies the prosodic feature parameters of each syllable in its current context; (3) a speech synthesizer that adjusts the acoustic parameters of the corresponding speech units in the original speech library according to the given prosodic feature parameters; and (4) a splicing step that joins the adjusted speech units into continuous speech output corresponding to the input text. The basic block diagram of the system is shown in Figure 1.
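The four stages above can be sketched as a simple pipeline. This is only an illustrative outline, not the article's implementation: the function names, the `Prosody` fields, the toy `SPEECH_LIBRARY` contents and the "lengthen the last syllable" rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Prosody:
    pitch_scale: float     # multiplier applied to the stored unit's pitch
    duration_scale: float  # multiplier applied to the stored unit's duration


# Hypothetical speech library: syllable key -> stored waveform samples.
SPEECH_LIBRARY = {"ni3": [0.1, 0.2, 0.1], "hao3": [0.3, 0.1]}


def preprocess(text):
    """Stage 1: normalize the input text into a list of syllable keys."""
    return text.strip().lower().split()


def lookup_prosody(syllables):
    """Stage 2: return prosodic parameters for each syllable.

    Placeholder rule (an assumption): lengthen the last syllable by 50%.
    """
    last = len(syllables) - 1
    return [Prosody(1.0, 1.5 if i == last else 1.0)
            for i in range(len(syllables))]


def adjust_unit(samples, prosody):
    """Stage 3: adjust a stored unit. Here only duration is changed,
    by nearest-sample resampling; a real system would also adjust pitch."""
    n = max(1, round(len(samples) * prosody.duration_scale))
    return [samples[min(len(samples) - 1, int(i * len(samples) / n))]
            for i in range(n)]


def splice(units):
    """Stage 4: concatenate the adjusted units into one output waveform."""
    out = []
    for unit in units:
        out.extend(unit)
    return out


def synthesize(text):
    syllables = preprocess(text)
    prosody = lookup_prosody(syllables)
    units = [adjust_unit(SPEECH_LIBRARY[s], p)
             for s, p in zip(syllables, prosody)]
    return splice(units)
```

Calling `synthesize("ni3 hao3")` walks both syllables through all four stages; an unknown syllable raises `KeyError`, which a real front end would have to handle.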

2 Hardware system design

*Fund project: Hunan Provincial Department of Education (03C025)

The goal of the Chinese speech synthesis system is to render the input text file as continuous speech that is clear, natural and intelligible. ATMEL's AT89S52 microcontroller displays the text entered from the keyboard and sends it to the TMS320VC5402 for processing, which then outputs the synthesized result. The hardware block diagram is shown in Figure 2.

2.1 Keyboard circuit and display circuit

The keyboard interface circuit of the AT89S52 is interrupt-driven. When a key is pressed, an interrupt request is generated; the interrupt service routine then checks the status of P1.0 and P1.1 and performs the corresponding processing. A resistor-capacitor debounce circuit prevents contact bounce from causing false triggering.

The AT89S52 sends the text entered from the keyboard to the LCD and at the same time writes the data to the external memory CY7C133; the TMS320VC5402 then reads the data from the CY7C133 for processing. The LCD command format is shown in Table 1:

Table 1 LCD command format

Among them, RS and R/W jointly decide which register to select, as shown in Table 2:

Table 2 Register selection

2.2 Communication between TMS320VC5402 and AT89S52

The AT89S52 and the TMS320VC5402 work independently and exchange information and data through a shared external memory. Signaling between them is implemented through hardware connections combined with software checks [2].

The external memory is a CY7C133, a high-speed 2K × 16-bit asynchronous static dual-port RAM with an access time of 25 ns. It has two independent sets of address lines, data lines and control signal lines, so two controllers can exchange data through one shared memory. This dual-port RAM allows the two controllers to read any memory location at the same time (including the same location), but it does not allow simultaneous writes, or a simultaneous read and write, to the same address.
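The access rule just described can be stated as a small predicate. The sketch below is only a software model of the rule for illustration; the function name and the `(operation, offset)` tuple format are assumptions, and it does not model the device's actual arbitration signals.

```python
def access_conflict(left, right):
    """Return True if two simultaneous port accesses are not allowed.

    Each access is a tuple (op, offset) with op either 'r' (read) or
    'w' (write). Per the rule in the text, simultaneous accesses to the
    same address conflict unless both of them are reads.
    """
    (op_l, off_l), (op_r, off_r) = left, right
    for op in (op_l, op_r):
        if op not in ("r", "w"):
            raise ValueError("op must be 'r' or 'w'")
    return off_l == off_r and "w" in (op_l, op_r)
```

For example, `access_conflict(("r", 0x10), ("r", 0x10))` is `False`, while `access_conflict(("w", 0x10), ("r", 0x10))` is `True`. In software, the two processors would need some handshake so that the conflicting case never arises.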

For the TMS320VC5402, the CY7C133 is mapped into data memory at addresses 4000H~47FFH.

For the AT89S52, the CY7C133 is mapped into data memory at addresses 2000H~27FFH.
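Both windows span 800H (2048) locations, so an address on one side can be translated to the other by subtracting one base and adding the other. The sketch below assumes that offset N in the AT89S52 window refers to the same memory location as offset N in the TMS320VC5402 window; the article does not state how the 8-bit and 16-bit buses are wired, so treat this one-to-one mapping and the helper names as assumptions.

```python
DSP_BASE = 0x4000     # TMS320VC5402 window: 4000H~47FFH (from the text)
MCU_BASE = 0x2000     # AT89S52 window: 2000H~27FFH (from the text)
WINDOW_SIZE = 0x800   # 2K locations in each window


def mcu_to_offset(addr):
    """Convert an AT89S52 address to an offset inside the shared RAM."""
    offset = addr - MCU_BASE
    if not 0 <= offset < WINDOW_SIZE:
        raise ValueError("address outside the AT89S52 window")
    return offset


def mcu_to_dsp(addr):
    """Translate an AT89S52 address to the TMS320VC5402 address.

    Assumes a one-to-one offset mapping between the two windows.
    """
    return DSP_BASE + mcu_to_offset(addr)
```

Under this assumption, `mcu_to_dsp(0x2000)` gives `0x4000` and `mcu_to_dsp(0x27FF)` gives `0x47FF`.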

3 Software system design

As a tonal language, Chinese has very complex prosodic features. To synthesize continuous sentences with an unlimited vocabulary from a limited set of stored units, the prosodic parameters of the speech library units must be adjusted according to prosodic rules, yielding modified speech units that fit the current context in the flow of speech [3].

Depending on how the modified speech units are obtained, speech synthesizers fall into two types: (1) waveform splicing synthesis and (2) parametric synthesis (also known as source/filter synthesis). This system uses waveform splicing, adjusting the speech waveform directly in the time and frequency domains to obtain the required modified speech units.

Simple waveform splicing makes it difficult to adjust pitch and duration. Therefore, this system directly splices neutral-intonation syllables using the Pitch Synchronous Overlap-Add (PSOLA) algorithm, and uses Code Excited Linear Prediction (CELP) coding to encode and compress the originally sampled speech library. The basic flow chart is shown in Figure 3.
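To illustrate the overlap-add idea behind PSOLA, here is a minimal time-domain sketch. It is not the article's implementation: it assumes a constant pitch period known in advance (real systems detect pitch marks per voiced segment), it normalizes by the summed window weights, which is one common choice, and the function and parameter names are hypothetical.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def psola(signal, period, pitch_factor=1.0, duration_factor=1.0):
    """Simplified TD-PSOLA for a signal with a constant pitch period.

    period is in samples. pitch_factor > 1 raises the pitch and
    duration_factor > 1 lengthens the output.
    """
    if len(signal) < 2 * period:
        raise ValueError("signal must span at least two pitch periods")
    window = hann(2 * period)
    # Analysis pitch marks: one per period, each with room for a full grain.
    marks = list(range(period, len(signal) - period + 1, period))
    new_period = max(1, round(period / pitch_factor))
    out_len = round(len(signal) * duration_factor)
    out = [0.0] * out_len
    norm = [0.0] * out_len
    t = new_period  # first synthesis pitch mark
    while t + period <= out_len:
        # Map the synthesis mark back to source time and take the
        # nearest analysis mark.
        src_time = t / duration_factor
        mark = min(marks, key=lambda m: abs(m - src_time))
        # Overlap-add a two-period windowed grain centred on the mark.
        for i in range(2 * period):
            j = t - period + i
            if 0 <= j < out_len:
                out[j] += signal[mark - period + i] * window[i]
                norm[j] += window[i]
        t += new_period
    # Divide by the accumulated window weight where it is non-zero.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

With `pitch_factor=1.0` and `duration_factor=1.0` the interior of the signal is reproduced unchanged, which is a useful sanity check. Changing `pitch_factor` respaces the grains to alter the pitch, and changing `duration_factor` repeats or skips grains to alter the length.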

4 Conclusion

The system uses a single-chip microcontroller to display the input text file in real time, so the synthesized speech can be compared directly with the input text, which makes the system highly intuitive. The synthesized speech has high clarity, intelligibility and naturalness. The synthesis algorithm has low computational complexity and works with as small a speech library as possible, which satisfies the constraint of limited storage space.
