Speech recognition is developing mainly in the direction of far-field and fusion. Far-field reliability still faces many unresolved difficulties, such as multi-turn interaction and noisy scenes with several people talking, and there is a more pressing need for technologies such as voice separation. New technologies should solve these problems thoroughly and make machine hearing far superior to human perception. Algorithmic progress alone cannot achieve this; it requires coordinated technology upgrades across the entire industry chain, including more advanced sensors and chips with greater computing power.
Looking at far-field speech recognition technology alone, many challenges remain, including:
(1) Echo cancellation. Because the loudspeaker introduces nonlinear distortion, it is difficult to eliminate the echo completely with signal processing alone, which also hinders the adoption of voice interaction systems. Existing echo cancellation methods based on deep learning ignore phase information and directly estimate a gain for each frequency band. Using deep learning to fit the nonlinear distortion and combining it with signal processing methods may be a promising direction.
(2) Speech recognition in noisy environments still needs improvement. Signal processing is good at linear problems, while deep learning is good at nonlinear ones. Practical problems, however, combine linear and nonlinear components, so only by integrating the two approaches can speech recognition in noisy environments be handled better.
(3) What the two problems above have in common is that current deep learning approaches use only the energy in each frequency band of the speech signal and ignore its phase. How to make deep learning exploit phase information better, especially for multi-channel input, may be a future direction.
(4) How to obtain a good acoustic model through transfer learning when data is scarce is also an active research topic. In dialect recognition, for example, given a reasonably good Mandarin acoustic model, deriving a good dialect acoustic model from a small amount of dialect data would greatly expand the range of applications for speech recognition. Some progress has been made here, but it consists mostly of training techniques and remains some distance from the ultimate goal.
(5) The purpose of speech recognition is to let machines understand humans, so converting speech into text is not the end goal. Combining speech recognition with semantic understanding may be a more important direction in the future. The LSTMs used in speech recognition already take into account information from earlier time steps of the speech signal, but semantic understanding needs a much longer history to be useful, so passing more conversational context to the speech recognition engine is a difficult problem.
(6) For machines to understand human language, sound alone is not enough. The next step is to integrate physical sensing methods such as "sound, light, electricity, heat, force and magnetism". Only then can machines perceive real information about the world, which is the prerequisite for machines to learn human knowledge. Beyond that, machines must surpass the five human senses, seeing what humans cannot see and hearing what humans cannot hear.
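Item (1) argues that loudspeaker nonlinearity limits what purely linear signal processing can cancel. The following is a minimal sketch of that effect, not a production echo canceller. It runs a normalized LMS (NLMS) adaptive filter, a common linear technique that the article itself does not name, against a simulated echo. The room impulse response, the `tanh` soft-clipping loudspeaker model, and all function names and parameters are assumptions made for illustration.

```python
import math
import random

def nlms_residual(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Run a normalized LMS adaptive filter and return the residual signal."""
    w = [0.0] * taps      # adaptive estimate of the echo path
    buf = [0.0] * taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # predicted echo
        e = d - y                                    # what remains after cancellation
        norm = sum(b * b for b in buf) + eps
        step = mu * e / norm
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual

def erle_db(mic, residual, tail=1000):
    """Echo return loss enhancement over the last `tail` samples, in dB."""
    p_mic = sum(v * v for v in mic[-tail:])
    p_res = sum(v * v for v in residual[-tail:])
    return 10.0 * math.log10(p_mic / (p_res + 1e-20))

def simulate(nonlinear, n=4000, seed=0):
    """Return the ERLE reached with an ideal or a soft-clipping loudspeaker."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Assumed loudspeaker model: ideal, or soft clipping via tanh.
    played = [math.tanh(3.0 * x) if nonlinear else x for x in far_end]
    room = [0.6, 0.3, -0.2, 0.1]   # made-up room impulse response
    mic = [
        sum(h * played[i - k] for k, h in enumerate(room) if i - k >= 0)
        for i in range(n)
    ]
    return erle_db(mic, nlms_residual(far_end, mic))

erle_linear = simulate(nonlinear=False)
erle_nonlinear = simulate(nonlinear=True)
print(f"ERLE, linear loudspeaker:    {erle_linear:.1f} dB")
print(f"ERLE, nonlinear loudspeaker: {erle_nonlinear:.1f} dB")
```

In this toy setup the linear filter removes the echo almost entirely when the loudspeaker is ideal, but leaves a clearly audible residual once the loudspeaker clips. That residual is the part a learned nonlinear model would have to account for.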
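Item (3) notes that features built only from per-band energy discard phase, which matters most for multi-channel input. As a rough illustration under simplified assumptions (a single pure tone, two microphones, an integer sample delay, and a hand-written DFT for one bin), the sketch below shows two channels whose magnitudes are identical while their phase difference encodes the delay between the microphones. The sample rate, frame length, and delay are made-up values.

```python
import cmath
import math

def dft_bin(signal, k):
    """Return the complex DFT coefficient of `signal` at bin index k."""
    n = len(signal)
    return sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal))

SAMPLE_RATE = 16000   # Hz, assumed
N = 512               # frame length in samples
K = 32                # analysed bin: 32 * 16000 / 512 = 1000 Hz
DELAY = 3             # samples by which microphone 2 lags microphone 1 (made up)

freq = K * SAMPLE_RATE / N
mic1 = [math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(N)]
mic2 = [math.sin(2 * math.pi * freq * (t - DELAY) / SAMPLE_RATE) for t in range(N)]

x1, x2 = dft_bin(mic1, K), dft_bin(mic2, K)

# Energy per band is the same on both channels, so a magnitude-only
# feature cannot distinguish them in this example.
mag1, mag2 = abs(x1), abs(x2)

# The inter-channel phase difference carries the delay.
ipd = cmath.phase(x2 * x1.conjugate())
estimated_delay = -ipd * N / (2 * math.pi * K)

print(f"magnitudes: {mag1:.3f} vs {mag2:.3f}")
print(f"phase difference: {ipd:.4f} rad, estimated delay: {estimated_delay:.3f} samples")
```

The delay between microphones is the basis of direction-of-arrival estimation and beamforming, which is one reason phase-aware features are attractive for multi-channel far-field recognition.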