Speech Recognition and Its Fixed-Point DSP Implementation

The fundamental goal of speech recognition research is to develop a machine with an auditory function that can directly accept spoken commands, understand the speaker's intentions, and respond accordingly. Research on speech recognition systems draws on many disciplines, including microcomputer technology, artificial intelligence, digital signal processing, pattern recognition, acoustics, linguistics, and cognitive science, making it a multidisciplinary and comprehensive field. In recent years, the rapid development of high-performance digital signal processor (DSP) chips has made real-time speech recognition feasible. Among them, the DSP chips from Analog Devices (AD) are widely used in many fields because of their good cost-performance ratio and code portability. We therefore use AD's fixed-point DSP chip, the ADSP2181, to recognize speech signals.

1 Basic Process of Speech Recognition

Depending on the application, speech recognition systems can be classified as speaker-dependent or speaker-independent, as isolated-word or continuous-speech, and as small-vocabulary, large-vocabulary, or unlimited-vocabulary. Whatever the type, the basic principles and processing methods are broadly similar. The schematic diagram of a typical speech recognition system is shown in Figure 1.

Figure 1. Schematic diagram of a typical speech recognition system

The speech recognition process mainly includes speech signal preprocessing, feature extraction, and pattern matching. Preprocessing includes pre-filtering, sampling and quantization, windowing, endpoint detection, pre-emphasis, and other processes. The most important part of speech signal recognition is feature parameter extraction. The extracted feature parameters must meet the following requirements:

(1) The extracted feature parameters can effectively represent the speech features and have good distinguishability;

(2) There is good independence between the parameters of each order;

(3) The feature parameters should be easy to calculate, and it is best to have an efficient algorithm to ensure real-time implementation of speech recognition.

In the training phase, the feature parameters are processed and a model is built for each vocabulary entry; the models are saved as a template library. In the recognition phase, the speech signal passes through the same processing chain to obtain its feature parameters and form a test template, which is matched against the reference templates; the reference template with the highest matching score is taken as the recognition result. Recognition accuracy can be improved further with the help of prior knowledge.
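
The matching step described above can be sketched in C. This is only an illustration under stated assumptions: the article does not say which matching method is used, so the sketch assumes fixed-length feature vectors and a squared Euclidean distance, where the smallest distance plays the role of the highest matching score. The names `FEATURE_DIM`, `squared_distance`, and `best_match` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FEATURE_DIM 4   /* hypothetical feature-vector length */

/* Squared Euclidean distance between two 16-bit feature vectors.
 * A 64-bit accumulator keeps the sum of squares from overflowing. */
static int64_t squared_distance(const int16_t *a, const int16_t *b)
{
    int64_t acc = 0;
    for (size_t i = 0; i < FEATURE_DIM; ++i) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        acc += (int64_t)d * d;
    }
    return acc;
}

/* Return the index of the reference template closest to the test template,
 * or -1 if the template library is empty. Smallest distance = best match. */
static int best_match(const int16_t templates[][FEATURE_DIM], size_t count,
                      const int16_t *test)
{
    int best = -1;
    int64_t best_dist = INT64_MAX;
    for (size_t i = 0; i < count; ++i) {
        int64_t d = squared_distance(templates[i], test);
        if (d < best_dist) {
            best_dist = d;
            best = (int)i;
        }
    }
    return best;
}
```

A real recognizer would typically compare sequences of frames rather than single vectors, so this should be read as the simplest possible stand-in for the template-matching stage.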

2 System Hardware Structure

2.1 Features of ADSP2181

Analog Devices' ADSP2181 is a 16-bit fixed-point DSP chip with a large on-chip memory, strong computing capability, and flexible interfaces. Its main features are as follows:

(1) Harvard architecture; with an external 16.67 MHz crystal oscillator the instruction cycle is 30 ns (about 33 MIPS), and all instructions execute in a single cycle;

(2) 80 KB of on-chip memory: 16K words (24 bits each) of program memory and 16K words (16 bits each) of data memory;

(3) There are three independent computing units inside: arithmetic logic unit (ALU), multiply-accumulate unit (MAC) and barrel shifter (SHIFT). The multiply-accumulate unit supports multiple precision and automatic unbiased rounding.

(4) A 16-bit internal DMA (IDMA) port for high-speed access to on-chip memory, and an 8-bit byte DMA (BDMA) port for loading data and programs from byte-wide boot memory;

(5) 6 external interrupts, and the priority or mask can be set, etc.

Thanks to these characteristics, a system built around the ADSP2181 is small, high-performing, low in cost and power consumption, and well suited to implementing speech recognition algorithms.
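
The timing and memory figures in items (1) and (2) can be cross-checked with a little arithmetic. One assumption comes from the ADSP-2181 data sheet rather than from this article: the input clock runs at half the instruction rate.

$$
f_{\text{instr}} = 2 \times 16.67\ \text{MHz} \approx 33.3\ \text{MHz}, \qquad
t_{\text{cycle}} = \frac{1}{f_{\text{instr}}} \approx 30\ \text{ns}
$$

$$
\underbrace{16\text{K} \times 24\ \text{bits}}_{48\ \text{KB program memory}} + \underbrace{16\text{K} \times 16\ \text{bits}}_{32\ \text{KB data memory}} = 80\ \text{KB}
$$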

2.2 System Hardware Structure

The speech recognition circuit uses a master-slave design built around the ADSP2181, in which the host CPU loads the program through the IDMA port. The hardware structure of the speech recognition system is shown in Figure 2.

Figure 2. Hardware structure of the speech recognition system

In this structure, the PC is the master CPU and the ADSP2181 is the slave CPU. The PC loads the program into the internal memory of the ADSP2181 through the IDMA port. The PC bus is decoded by the CPLD to form control signals such as IRD, IWR, IAL, IS, etc., which are connected to the IDMA port of the ADSP2181. In this way, when the ADSP2181 is running at full speed, the host can query the running status of the slave and access all the program memory and data memory inside the ADSP2181. This greatly facilitates the compilation and debugging of the program, as well as the real-time processing of voice signals.

3 DSP Implementation Technology of Speech Recognition

3.1 Fixed-point implementation of floating-point operations

Speech recognition algorithms contain many floating-point operations, and implementing them on a fixed-point DSP is the first problem to solve when writing a speech recognition program. It can be solved by scaling the numbers, that is, by fixing the position of the binary point within the fixed-point word. The Q format is a commonly used scaling method. It works as follows:

Let the fixed-point number be $x$ and the floating-point number be $y$. In Q format with $Q$ fractional bits, the two are related by:

Floating-point number $y$ to fixed-point number $x$: $x = (\mathrm{int})(y \times 2^{Q})$;

Fixed-point number $x$ to floating-point number $y$: $y = (\mathrm{float})\,x \times 2^{-Q}$.
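
The two conversions above can be sketched in C for a 16-bit word. The function names `float_to_q` and `q_to_float` are hypothetical, and the rounding and saturation steps are additions of this sketch; the formulas themselves only specify the scaling by 2^Q.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a floating-point value y to a 16-bit fixed-point value with q
 * fractional bits: x = (int)(y * 2^q). This sketch also rounds to nearest
 * and saturates to the int16_t range instead of wrapping around. */
static int16_t float_to_q(double y, int q)
{
    assert(q >= 0 && q <= 15);
    double scaled = y * (double)(1L << q);        /* y * 2^q */
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;       /* round to nearest */
    if (scaled >= 32767.0) return INT16_MAX;      /* saturate high */
    if (scaled <= -32768.0) return INT16_MIN;     /* saturate low */
    return (int16_t)scaled;                       /* cast truncates toward zero */
}

/* Convert a fixed-point value with q fractional bits back to floating
 * point: y = x * 2^(-q). */
static double q_to_float(int16_t x, int q)
{
    assert(q >= 0 && q <= 15);
    return (double)x / (double)(1L << q);
}
```

For example, with Q = 15 the value 0.5 maps to 16384, while 1.0 cannot be represented and saturates to 32767, the largest Q15 value.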

3.2 Data accuracy processing

Implementing a speech recognition algorithm on a 16-bit fixed-point DSP improves execution speed, but data precision is relatively low, and errors accumulated in intermediate steps can lead to incorrect results. To improve calculation precision, the program uses the following methods:

(1) Extended Precision

Where high precision is required, intermediate variables are represented with 32 or even 48 bits. This greatly improves calculation precision at the cost of only a small increase in the number of instructions.

(2) Using pseudo-floating point method to represent floating point numbers

The pseudo-floating-point method represents a floating-point number as a mantissa plus an exponent. The mantissas of a data block can use the Q1.15 format, and all values in the block share the same exponent. This representation has a large enough dynamic range to fully meet the precision requirements, but it requires a library of exponent and mantissa operations, which increases both the instruction count and the amount of computation and therefore works against real-time implementation.

Both of the above methods can improve calculation accuracy, but in actual operation, a trade-off should be made based on system requirements and algorithm complexity.
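
Both ideas can be illustrated with a short C sketch. It is not the article's DSP code: a 64-bit C integer stands in for the DSP's wide accumulator, the block exponent is reduced to a single shared left shift, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate value to the 16-bit range. */
static int16_t clamp16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Narrow version: each Q30 product is cut back to Q15 before it is summed,
 * so small products vanish. Right shifts of negative values are assumed to
 * be arithmetic, as on mainstream compilers. */
static int16_t dot_q15_narrow(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += ((int32_t)a[i] * b[i]) >> 15;
    return clamp16(acc);
}

/* Extended precision: products stay in Q30 and are summed in a wide
 * accumulator; rounding to Q15 happens once, at the end. */
static int16_t dot_q15_wide(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)a[i] * b[i];
    return clamp16((acc + (1 << 14)) >> 15);
}

/* Block floating point: shift the whole block left by the largest amount
 * that keeps its peak magnitude within 16 bits, and return that shift.
 * The shift acts as the shared exponent: original = block[i] * 2^(-shift). */
static int block_normalize(int16_t *block, size_t n)
{
    int32_t peak = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t mag = block[i] < 0 ? -(int32_t)block[i] : block[i];
        if (mag > peak) peak = mag;
    }
    if (peak == 0) return 0;   /* all-zero block: nothing to scale */

    int shift = 0;
    while ((peak << (shift + 1)) <= INT16_MAX) ++shift;
    for (size_t i = 0; i < n; ++i)
        block[i] = (int16_t)(block[i] * (1 << shift));
    return shift;
}
```

With four products of 1 x 16384, the narrow version returns 0 because every product is truncated away, while the wide version returns 2. The extra cost is a wider accumulator rather than extra loop instructions, which is the trade-off the section describes.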

3.3 Variable Maintenance

In high-level languages, global and local variables are stored differently, but in a DSP program every declared variable is allocated in data memory at link time. If local variables are declared as freely as in a high-level language, a great deal of DSP memory is wasted, which is unacceptable on a fixed-point DSP with limited data space. To save memory, it is best to maintain a variable table when writing DSP programs. On entering each DSP submodule, do not rush to allocate new local variables; first reuse variables that have already been allocated but are not currently in use, and allocate new ones only when those are not enough.
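
One possible way to express this reuse discipline in C is a single statically allocated scratch area that submodules borrow from and return to, instead of each declaring its own buffers. This is a sketch of the idea only, not the article's variable table; `SCRATCH_WORDS`, `scratch_take`, and `scratch_give` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_WORDS 64   /* hypothetical size of the shared scratch area */

static int16_t scratch[SCRATCH_WORDS];  /* allocated once, at link time */
static size_t scratch_top = 0;          /* number of words currently in use */

/* Borrow n words from the shared area; returns NULL if too few are free. */
static int16_t *scratch_take(size_t n)
{
    if (n > SCRATCH_WORDS - scratch_top) return NULL;
    int16_t *p = &scratch[scratch_top];
    scratch_top += n;
    return p;
}

/* Return the n most recently borrowed words so that the next submodule can
 * reuse them. Words must be given back in reverse order of borrowing. */
static void scratch_give(size_t n)
{
    scratch_top = (n > scratch_top) ? 0 : scratch_top - n;
}
```

A submodule would call `scratch_take` on entry and `scratch_give` on exit, so two submodules that never run at the same time end up sharing the same data memory.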

3.4 Handling of nested loops

Much of a speech recognition algorithm is implemented in loops. When handling loops, pay attention to the following issues:

(1) On ADSP2100-series DSP chips, hardware loops can be nested at most 4 levels deep; beyond that the loop stack overflows and the program no longer executes correctly. In a speech recognition DSP program, however, the nesting depth, including interrupts, often exceeds 4 levels. In that case the do...until... instruction provided by the DSP cannot be used; instead, define loop variables and maintain them yourself. Because the DSP loop stack is not used, no stack overflow occurs. In addition, if a jump instruction is used to leave a hardware loop early, you must maintain the pointers of the three stacks (PC, LOOP, and CNTR) yourself.

(2) Keep the number of instructions in the loop body as small as possible. In nested loops, every instruction removed from an inner loop body is saved once per iteration, which shortens execution time and improves real-time performance.
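
Point (2) can be illustrated in C with a loop-invariant operation moved out of the inner loop. The example is hypothetical (a per-frame gain applied to a frame energy) and only shows the principle; on the DSP the same restructuring would be done in assembly.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 4   /* hypothetical frame length, kept tiny for illustration */

/* Straightforward version: the multiplication by gain[f] is repeated for
 * every sample, although gain[f] does not change inside the inner loop. */
static void energy_naive(const int16_t x[][FRAME_LEN], const int16_t *gain,
                         size_t frames, int64_t *out)
{
    for (size_t f = 0; f < frames; ++f) {
        out[f] = 0;
        for (size_t i = 0; i < FRAME_LEN; ++i)
            out[f] += (int64_t)gain[f] * x[f][i] * x[f][i];
    }
}

/* Reduced inner loop: only the multiply-accumulate stays inside, and the
 * loop-invariant gain is applied once per frame. */
static void energy_hoisted(const int16_t x[][FRAME_LEN], const int16_t *gain,
                           size_t frames, int64_t *out)
{
    for (size_t f = 0; f < frames; ++f) {
        int64_t acc = 0;
        for (size_t i = 0; i < FRAME_LEN; ++i)
            acc += (int32_t)x[f][i] * x[f][i];
        out[f] = (int64_t)gain[f] * acc;
    }
}
```

Both functions produce identical results in integer arithmetic; the second simply performs one multiplication fewer per sample.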

3.5 Adopt a modular programming approach

In the implementation of speech recognition algorithm, in order to facilitate the design and debugging of the program, a modular programming method is adopted. The module division is based on the basic process of speech recognition, and each module is further divided into several sub-modules, and then programming and debugging are carried out on a module basis. Before writing the program, the algorithm of each module is first simulated in a high-level language, and then the assembly program is written on this basis. When debugging, the debugging method of comparing high-level language with assembly language can be used. In this way, the correctness of the assembly language can be verified by tracking the intermediate state between the high-level language and the assembly language, and errors can be discovered and corrected in time, shortening the programming cycle. In addition, in the process of writing the program, necessary comments and instructions should be added to the key parts to enhance the readability of the program.

During system integration, each module needs well-defined entry (input) and exit (output) parameters, and the stack pointers, intermediate variables, and similar state must be maintained.
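
The compare-against-a-reference debugging approach described in this section can be sketched as follows. The sketch uses pre-emphasis as the module under test and checks a fixed-point version against a floating-point reference. The filter form y[n] = x[n] - 0.95 x[n-1], the coefficient value, and all function names are assumptions made for illustration; the article does not give them. The fixed-point routine is written in C here as a stand-in for the assembly version.

```c
#include <stddef.h>
#include <stdint.h>

#define PREEMPH_Q15 31130   /* 0.95 in Q15; the coefficient is an assumption */

/* Reference model in floating point: y[n] = x[n] - 0.95 * x[n-1]. */
static void preemph_ref(const int16_t *x, double *y, size_t n)
{
    double prev = 0.0;
    for (size_t i = 0; i < n; ++i) {
        y[i] = (double)x[i] - 0.95 * prev;
        prev = (double)x[i];
    }
}

/* Fixed-point version: Q15 coefficient, 32-bit intermediate, rounding,
 * then saturation. Right shifts of negative values are assumed to be
 * arithmetic, as on mainstream compilers. */
static void preemph_fix(const int16_t *x, int16_t *y, size_t n)
{
    int16_t prev = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t acc = (int32_t)x[i] * 32768 - (int32_t)PREEMPH_Q15 * prev;
        acc = (acc + (1 << 14)) >> 15;   /* round and return to sample scale */
        if (acc > INT16_MAX) acc = INT16_MAX;
        if (acc < INT16_MIN) acc = INT16_MIN;
        y[i] = (int16_t)acc;
        prev = x[i];
    }
}

/* Largest absolute difference between reference and fixed-point outputs,
 * measured in least significant bits of the 16-bit samples. */
static double max_abs_error(const double *ref, const int16_t *fix, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double e = ref[i] - (double)fix[i];
        if (e < 0.0) e = -e;
        if (e > worst) worst = e;
    }
    return worst;
}
```

Running both versions on the same input and checking that `max_abs_error` stays within about one LSB gives a quick pass/fail test for the module before moving on to the next one.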

3.6 Mixed Programming Using C and Assembly Language

Most DSP chips now support mixed programming in assembly language and C or C++, and the ADSP2181 is no exception. Developing DSP programs in C shortens the development cycle and reduces program complexity. However, the resulting code is less efficient and consumes extra machine cycles, which works against real-time implementation. For this reason, fixed-point processing is used when writing the speech recognition algorithm in C. The ADSP2181 is a 16-bit fixed-point processor, and the following points should be noted in fixed-point processing:

(1) The ADSP2181 supports both fractional and integer arithmetic modes. Fractional mode should be selected so that the absolute value of every result is less than 1.

(2) Use double-word fixed-point arithmetic library instead of C language floating-point library to improve the calculation accuracy;

(3) Pay attention to performing saturation operations after each multiplication and addition operation to prevent overflow and underflow of the result;

(4) After the loop processing, a set of data may have different exponents and needs to be normalized so that the subsequent fixed-point operations can process the exponent and mantissa separately.
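
Points (1) and (3) can be sketched in portable C as saturating Q15 (fractional) addition and multiplication. The function names are hypothetical, and the code emulates in software what the DSP's fractional mode and saturation logic provide in hardware.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Saturating Q15 addition: the sum is formed in 32 bits, then clamped. */
static int16_t add_q15(int16_t a, int16_t b)
{
    return sat16((int32_t)a + b);
}

/* Saturating Q15 multiplication: Q15 * Q15 gives Q30, which is rounded
 * back to Q15. The only overflowing case is (-1) * (-1), which saturates
 * to just under +1. Right shifts of negative values are assumed to be
 * arithmetic, as on mainstream compilers. */
static int16_t mul_q15(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;             /* Q30 product */
    return sat16((p + (1 << 14)) >> 15);    /* round, rescale, saturate */
}
```

Without the clamp, 30000 + 10000 would wrap around to a negative 16-bit value; with it, the result sticks at 32767, which is usually a far less damaging error in a signal-processing chain.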

4 Conclusion

Speech recognition systems built on fixed-point DSP chips have broad application prospects. The fixed-point processing techniques and the principles and methods described here for writing speech recognition algorithms also offer practical guidance for other, similar algorithms. In practice, the algorithm should be optimized for the characteristics of the DSP chip so that its performance is fully exploited.
