The fundamental purpose of speech recognition research is to build a machine with an auditory function: one that can accept spoken human commands directly, understand the speaker's intent, and respond accordingly. Research on speech recognition systems draws on many disciplines, including microcomputer technology, artificial intelligence, digital signal processing, pattern recognition, acoustics, linguistics and cognitive science, which makes it a multidisciplinary and comprehensive research field. In recent years, the rapid development of high-performance digital signal processor (DSP) chips has made real-time speech recognition feasible. Among these, the DSP chips from Analog Devices (AD) have been widely used in many fields because of their good cost performance and code portability. We therefore use AD's fixed-point DSP chip ADSP2181 to implement speech signal recognition.
1 Basic Process of Speech Recognition
Depending on the application, speech recognition systems can be classified as speaker-dependent or speaker-independent, as isolated-word or continuous-speech, and as small-vocabulary, large-vocabulary or unlimited-vocabulary. Whatever the type, the basic principles and processing methods are broadly similar. The schematic diagram of a typical speech recognition system is shown in Figure 1.
The speech recognition process mainly includes speech signal preprocessing, feature extraction, and pattern matching. Preprocessing includes pre-filtering, sampling and quantization, windowing, endpoint detection, pre-emphasis, and other processes. The most important part of speech signal recognition is feature parameter extraction. The extracted feature parameters must meet the following requirements:
(1) The extracted feature parameters can effectively represent the speech features and have good distinguishability;
(2) There is good independence between the parameters of each order;
(3) The feature parameters should be easy to calculate, and it is best to have an efficient algorithm to ensure real-time implementation of speech recognition.
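Two of the preprocessing steps named above, pre-emphasis and endpoint detection, can be sketched in integer arithmetic as follows. This is a minimal illustration and not the article's implementation: the pre-emphasis coefficient 15/16, the energy-threshold rule and all function names are assumptions chosen for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Pre-emphasis: y[n] = x[n] - a * x[n-1], with a = 15/16 (assumed value). */
void pre_emphasis(const int16_t *x, int16_t *y, size_t n)
{
    int32_t prev = 0;                       /* x[-1] is taken as 0 */
    for (size_t i = 0; i < n; i++) {
        int32_t cur = x[i];
        int32_t out = cur - (prev * 15) / 16;
        if (out > INT16_MAX) out = INT16_MAX;   /* saturate to 16 bits */
        if (out < INT16_MIN) out = INT16_MIN;
        y[i] = (int16_t)out;
        prev = cur;
    }
}

/* Short-time energy of one frame, accumulated in 64 bits to avoid overflow. */
int64_t frame_energy(const int16_t *frame, size_t len)
{
    int64_t e = 0;
    for (size_t i = 0; i < len; i++)
        e += (int32_t)frame[i] * frame[i];
    return e;
}

/* Index of the first non-overlapping frame whose energy exceeds the
 * threshold, or -1 if no frame does (hypothetical endpoint rule). */
long find_speech_start(const int16_t *x, size_t n, size_t frame_len,
                       int64_t threshold)
{
    for (size_t start = 0; start + frame_len <= n; start += frame_len) {
        if (frame_energy(x + start, frame_len) > threshold)
            return (long)(start / frame_len);
    }
    return -1;
}
```

A practical endpoint detector would usually combine energy with other cues and use overlapping, windowed frames; the sketch only shows the basic structure.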
In the training phase, the feature parameters are processed and a model is built for each vocabulary entry; the models are saved as a template library. In the recognition phase, the speech signal passes through the same processing chain to yield its feature parameters, which form a test template. The test template is matched against the reference templates, and the reference template with the highest matching score is taken as the recognition result. Prior knowledge can also be used to improve recognition accuracy.
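The matching step can be illustrated with a small sketch. It assumes, purely for illustration, that each template is a single fixed-length feature vector and that the "matching score" is the squared Euclidean distance, so the best match is the smallest distance. The article does not specify its matching algorithm, and `FEATURE_DIM`, `template_distance` and `best_template` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

#define FEATURE_DIM 4   /* hypothetical feature vector length */

/* Squared Euclidean distance between two feature vectors. */
int64_t template_distance(const int16_t *a, const int16_t *b)
{
    int64_t d = 0;
    for (size_t i = 0; i < FEATURE_DIM; i++) {
        int32_t diff = (int32_t)a[i] - b[i];
        d += (int64_t)diff * diff;
    }
    return d;
}

/* refs holds n_refs vectors stored back to back (the template library).
 * Returns the index of the closest reference template, or -1 if the
 * library is empty. */
int best_template(const int16_t *test, const int16_t *refs, size_t n_refs)
{
    int best = -1;
    int64_t best_d = INT64_MAX;
    for (size_t k = 0; k < n_refs; k++) {
        int64_t d = template_distance(test, refs + k * FEATURE_DIM);
        if (d < best_d) {
            best_d = d;
            best = (int)k;
        }
    }
    return best;
}
```

Real utterances vary in length, so a practical isolated-word recognizer would compare sequences of frames (for example with time alignment) rather than single vectors; the selection logic stays the same.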
2 System Hardware Structure
2.1 Features of ADSP2181
AD's ADSP2181 is a 16-bit fixed-point DSP chip with a large internal memory, strong computing capability and strong interface capability. Its main features are:
(1) Harvard architecture; with an external 16.67 MHz crystal oscillator the instruction cycle is 30 ns, giving an instruction rate of 33 MIPS, and all instructions execute in a single cycle;
(2) 80 KB of on-chip memory: 16K words (24 bits each) of program memory and 16K words (16 bits each) of data memory;
(3) Three independent computing units: the arithmetic logic unit (ALU), the multiply-accumulate unit (MAC) and the barrel shifter (SHIFT). The multiply-accumulate unit supports multiple precision and automatic unbiased rounding;
(4) A 16-bit internal DMA (IDMA) port for high-speed access to on-chip memory, and an 8-bit bootstrap DMA (BDMA) port for loading data and programs from the bootstrap program memory;
(5) 6 external interrupts, and the priority or mask can be set, etc.
Due to the above characteristics of ADSP2181, the system composed of this chip has small size, high performance, low cost and power consumption, and can better implement speech recognition algorithm.
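The memory and timing figures in the feature list can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above (16K words of 24-bit program memory, 16K words of 16-bit data memory, a 30 ns instruction cycle); the function names are hypothetical.

```c
#include <stdint.h>

enum { WORDS_PER_MEMORY = 16 * 1024 };   /* 16K words in each memory */

/* Total on-chip memory in bytes: 16K x 24 bits + 16K x 16 bits. */
uint32_t onchip_memory_bytes(void)
{
    uint32_t program_bits = (uint32_t)WORDS_PER_MEMORY * 24u;
    uint32_t data_bits    = (uint32_t)WORDS_PER_MEMORY * 16u;
    return (program_bits + data_bits) / 8u;
}

/* Single-cycle instructions per second for a given cycle time in ns. */
uint32_t instructions_per_second(uint32_t cycle_ns)
{
    return 1000000000u / cycle_ns;
}
```

The first function returns 81920 bytes, which is 80 KB, and a 30 ns cycle gives roughly 33 million instructions per second, in agreement with the list.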
2.2 System Hardware Structure
When constructing the speech recognition circuit, we adopted a master-slave design in which a host loads the program into the ADSP2181 through the IDMA port. The hardware structure of the speech recognition system is shown in Figure 2.
In this structure, the PC is the master CPU and the ADSP2181 is the slave CPU. The PC loads the program into the internal memory of the ADSP2181 through the IDMA port. The PC bus is decoded by a CPLD to form control signals such as IRD, IWR, IAL and IS, which are connected to the IDMA port of the ADSP2181. As a result, while the ADSP2181 is running at full speed, the host can query the running status of the slave and access all of the program memory and data memory inside the ADSP2181. This greatly facilitates writing and debugging the program, as well as real-time processing of speech signals.
3 DSP Implementation Technology of Speech Recognition
3.1 Fixed-point implementation of floating-point operations
Speech recognition algorithms contain many floating-point operations, so implementing them on a fixed-point DSP is the first problem to solve when writing the recognition program. It can be solved by scaling the numbers, that is, by fixing the position of the binary point within the fixed-point number. The Q representation is a commonly used scaling method. Its mechanism is as follows:
Let the fixed-point number be x and the floating-point number be y. In the Q representation, the two are related by:
Floating-point number y converted to fixed-point number x: x = (int)(y × 2^Q);
Fixed-point number x converted to floating-point number y: y = (float)x × 2^(-Q).
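The two conversions can be written directly in C. The sketch below assumes Q = 15 (a 16-bit word with 15 fractional bits) and adds saturation at the ends of the 16-bit range, which the formulas above do not mention; the function names are hypothetical.

```c
#include <stdint.h>

#define Q 15   /* assumed number of fractional bits */

/* x = (int)(y * 2^Q), saturated to the 16-bit range. */
int16_t float_to_q(float y)
{
    float scaled = y * (float)(1 << Q);
    if (scaled >= 32767.0f)  return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)scaled;   /* the cast truncates toward zero */
}

/* y = (float)x * 2^(-Q). */
float q_to_float(int16_t x)
{
    return (float)x / (float)(1 << Q);
}
```

With Q = 15 the representable range is [-1, 1 - 2^(-15)], so an input of exactly 1.0 saturates to 32767.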
3.2 Data accuracy processing
Implementing a speech recognition algorithm on a 16-bit fixed-point DSP improves execution speed, but data accuracy is relatively low. Errors accumulated in intermediate steps can make the final result incorrect. To improve calculation accuracy, the program uses the following methods:
(1) Extended Precision
Where high precision is required, the intermediate variables of the calculation are represented with 32 or even 48 bits. This greatly improves calculation precision at the cost of only a small increase in the number of instructions.
(2) Using pseudo-floating point method to represent floating point numbers
The pseudo-floating-point method represents each number as a mantissa plus an exponent. The mantissas of a data block can use the Q1.15 format, and all values in the block share the same exponent. This representation has a large enough dynamic range to fully meet the accuracy requirements, but it requires writing a library of exponent and mantissa operations, which increases the instruction count and the amount of computation and works against real-time implementation.
Both of the above methods can improve calculation accuracy, but in actual operation, a trade-off should be made based on system requirements and algorithm complexity.
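The effect of extended precision can be seen in a small Q15 dot product. The sketch below compares an accumulator that is narrowed to 16 bits after every product with one that keeps a 32-bit running sum and narrows once at the end. It is an illustration written for this purpose, not code from the article; the function names are hypothetical and the inputs are assumed to be non-negative and small enough that the 32-bit sum cannot overflow.

```c
#include <stdint.h>
#include <stddef.h>

/* Each Q30 product is cut back to Q15 before summing: small terms vanish. */
int16_t dot_q15_narrow(const int16_t *a, const int16_t *b, size_t n)
{
    int16_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int16_t p = (int16_t)(((int32_t)a[i] * b[i]) >> 15);
        acc = (int16_t)(acc + p);
    }
    return acc;
}

/* Products are summed at full 32-bit precision and narrowed only once. */
int16_t dot_q15_wide(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];   /* Q30 running sum */
    return (int16_t)(acc >> 15);       /* back to Q15 */
}
```

For eight terms of 100 × 100, each product truncates to 0 in the narrow version, so the result is 0, while the wide version returns 2. The extra cost is only the wider accumulator and one final shift.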
3.3 Variable Maintenance
In high-level languages, global and local variables are stored differently, but in a DSP program every declared variable is allocated in data space at link time. Defining local variables as freely as in a high-level language therefore wastes a lot of DSP storage, which is unreasonable on a fixed-point DSP with limited data space. To save storage, it is best to maintain a variable table when writing DSP programs. On entering each DSP submodule, first reuse variables that have already been allocated but are no longer in use, and allocate new local variables only when those are not enough.
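One way to picture the variable table is as a shared scratch area that submodules borrow from and hand back, so the same data memory serves several modules in turn. The sketch below is a hypothetical C analogue of that idea, not the article's assembly-level scheme; `SCRATCH_WORDS` and the function names are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

#define SCRATCH_WORDS 256   /* hypothetical size of the shared area */

static int16_t scratch[SCRATCH_WORDS];
static size_t scratch_used = 0;

/* Remember the current fill level on entry to a submodule. */
size_t scratch_mark(void)
{
    return scratch_used;
}

/* Borrow `words` 16-bit words; returns NULL when the area is exhausted. */
int16_t *scratch_alloc(size_t words)
{
    if (words > SCRATCH_WORDS - scratch_used)
        return NULL;
    int16_t *p = &scratch[scratch_used];
    scratch_used += words;
    return p;
}

/* Hand back everything borrowed since `mark` on exit from the submodule. */
void scratch_release(size_t mark)
{
    if (mark <= scratch_used)
        scratch_used = mark;
}
```

A submodule calls `scratch_mark()` on entry and `scratch_release()` on exit, so the next submodule reuses the same words instead of occupying new data memory.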
3.4 Handling of nested loops
Many implementations of speech recognition algorithms are implemented in loops. When processing loops, you need to pay attention to the following issues:
(1) In the ADSP2100 series DSP chips, loop nesting cannot exceed 4 levels at most, otherwise a stack overflow will occur, causing the program to fail to execute correctly. However, in the DSP program for speech recognition, the nested programs, including interrupts, often exceed 4 levels. In this case, you cannot use the do...un
(2) Keep the number of instructions in the loop body as small as possible. In nested loops, each instruction in the innermost body is executed once per iteration, so removing even one instruction there removes many executed instructions overall. This shortens the execution time of the program and improves real-time performance.
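Point (2) can be illustrated by moving work out of the innermost loop. In the hypothetical example below, both functions compute the same gain-weighted sum over a rows × cols array, but the second computes the row gain once per row and multiplies once per row instead of once per element. The function names and the doubling of the gain are made up for the illustration, and the inputs are assumed small enough that the 32-bit sums do not overflow.

```c
#include <stdint.h>
#include <stddef.h>

/* Gain is recomputed and applied inside the innermost loop. */
int32_t weighted_sum_naive(const int16_t *x, size_t rows, size_t cols,
                           const int16_t *gain)
{
    int32_t total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            int32_t g = (int32_t)gain[r] * 2;   /* loop-invariant work */
            total += g * x[r * cols + c];
        }
    }
    return total;
}

/* Inner loop reduced to a single accumulate; gain applied once per row. */
int32_t weighted_sum_hoisted(const int16_t *x, size_t rows, size_t cols,
                             const int16_t *gain)
{
    int32_t total = 0;
    for (size_t r = 0; r < rows; r++) {
        int32_t g = (int32_t)gain[r] * 2;
        int32_t row_sum = 0;
        for (size_t c = 0; c < cols; c++)
            row_sum += x[r * cols + c];
        total += g * row_sum;
    }
    return total;
}
```

An optimizing C compiler may perform part of this transformation itself; in hand-written assembly it is the programmer's responsibility.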
3.5 Adopt a modular programming approach
In the implementation of speech recognition algorithm, in order to facilitate the design and debugging of the program, a modular programming method is adopted. The module division is based on the basic process of speech recognition, and each module is further divided into several sub-modules, and then programming and debugging are carried out on a module basis. Before writing the program, the algorithm of each module is first simulated in a high-level language, and then the assembly program is written on this basis. When debugging, the debugging method of comparing high-level language with assembly language can be used. In this way, the correctness of the assembly language can be verified by tracking the intermediate state between the high-level language and the assembly language, and errors can be discovered and corrected in time, shortening the programming cycle. In addition, in the process of writing the program, necessary comments and instructions should be added to the key parts to enhance the readability of the program.
During system integration and debugging, each module must define its entry (input) and exit (output) parameters, and the stack pointer, intermediate variables and similar state must be maintained correctly.
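The comparison-based debugging described in this section can be supported by a small host-side check that compares a module's fixed-point output with the floating-point reference from the high-level simulation. The sketch below assumes the fixed-point data is in Q15 and that a caller-chosen absolute tolerance is acceptable; the function name and both assumptions are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns the index of the first sample whose Q15 value differs from the
 * floating-point reference by more than `tol`, or -1 if all samples agree. */
long first_mismatch(const int16_t *fixed_q15, const float *reference,
                    size_t n, float tol)
{
    for (size_t i = 0; i < n; i++) {
        float got = (float)fixed_q15[i] / 32768.0f;   /* Q15 to float */
        float err = got - reference[i];
        if (err < 0.0f)
            err = -err;
        if (err > tol)
            return (long)i;
    }
    return -1;
}
```

Running such a check on the intermediate buffers of each module narrows a discrepancy down to the first module and sample where the assembly version departs from the reference.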
3.6 Mixed Programming Using C and Assembly Language
Most DSP chips now support mixed programming in assembly language and C or C++, and the ADSP2181 is no exception. Developing DSP programs in C shortens the development cycle and reduces program complexity. However, compiled C code executes less efficiently and costs additional machine cycles, which works against real-time implementation. For this reason, we use fixed-point processing techniques when writing the speech recognition algorithm in C. The ADSP2181 is a 16-bit fixed-point processor, and the following points should be noted in fixed-point processing:
(1) The ADSP2181 supports both fractional and integer calculation modes. The fractional mode should be selected so that the absolute value of every calculation result is less than 1.
(2) Use double-word fixed-point arithmetic library instead of C language floating-point library to improve the calculation accuracy;
(3) Pay attention to performing saturation operations after each multiplication and addition operation to prevent overflow and underflow of the result;
(4) After the loop processing, a set of data may have different exponents and needs to be normalized so that the subsequent fixed-point operations can process the exponent and mantissa separately.
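Points (3) and (4) can be sketched in portable C. The example below shows saturating Q15 addition and multiplication, and a block normalization that shifts every value left by a common amount and returns that shift as the shared exponent. It is a minimal sketch under assumed conventions (Q15 data, left-shift normalization); none of the function names come from the article or from an ADSP library.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp a 32-bit intermediate result to the 16-bit range. */
int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Saturating Q15 addition. */
int16_t add_q15_sat(int16_t a, int16_t b)
{
    return sat16((int32_t)a + b);
}

/* Saturating Q15 multiplication; the division truncates toward zero. */
int16_t mul_q15_sat(int16_t a, int16_t b)
{
    return sat16(((int32_t)a * b) / 32768);
}

/* Scale the block up by the largest power of two that keeps every value
 * within 16 bits. Returns the shift applied (the shared block exponent). */
int normalize_block(int16_t *x, size_t n)
{
    int32_t max_abs = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t v = x[i] < 0 ? -(int32_t)x[i] : x[i];
        if (v > max_abs)
            max_abs = v;
    }
    if (max_abs == 0)
        return 0;
    int shift = 0;
    while ((max_abs << (shift + 1)) <= INT16_MAX)
        shift++;
    for (size_t i = 0; i < n; i++)
        x[i] = (int16_t)(x[i] * (1 << shift));
    return shift;
}
```

After normalization the mantissas use as much of the 16-bit range as possible, and later stages only need to carry the returned shift alongside the block.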
4 Conclusion
Speech recognition systems built on fixed-point DSP chips have broad application prospects. The fixed-point processing techniques and programming principles described here for speech recognition also offer practical guidance for implementing other, similar algorithms. In practical applications, the algorithm should be optimized according to the characteristics of the DSP chip so that the chip's performance is fully exploited.