What will be the development trend of speech recognition technology in the future?-EEWORLD

Speech recognition is developing mainly toward far-field operation and fusion. Far-field reliability, however, still faces many unsolved difficulties: scenarios such as multi-turn interaction and noisy multi-speaker environments remain to be cracked, and technologies such as voice separation are urgently needed. New technologies should aim to solve these problems thoroughly and make machine hearing far superior to human perception. Algorithmic advances alone will not achieve this; it requires coordinated upgrades across the entire industry chain, including more advanced sensors and chips with greater computing power.

Looking at far-field speech recognition alone, many challenges remain, including:

(1) Echo cancellation technology. Because loudspeakers introduce nonlinear distortion, echo is difficult to eliminate completely with signal processing alone, which in turn hinders the wider adoption of voice interaction systems. Existing deep-learning-based echo cancellation techniques ignore phase information and directly estimate a gain for each frequency band. Using deep learning to fit the nonlinear distortion and combining it with signal processing methods may be a promising direction.
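The limitation described in point (1) can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a normalized LMS (NLMS) adaptive filter as the "signal processing method", a hypothetical four-tap room echo path, and a `tanh` soft-clipping curve as a stand-in for loudspeaker distortion. All names and parameter values are illustrative.

```python
import math
import random

def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract a linear (NLMS) echo estimate from the microphone signal."""
    w = [0.0] * taps      # adaptive weights: estimate of the echo path
    buf = [0.0] * taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # linear echo estimate
        e = d - y                                     # what the canceller leaves
        step = mu * e / (sum(b * b for b in buf) + eps)
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual

def power(sig):
    return sum(s * s for s in sig) / len(sig)

def simulate_erle(nonlinear, n=4000, seed=0):
    """Echo return loss enhancement (dB) over the second half of the run."""
    rng = random.Random(seed)
    far = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Hypothetical loudspeaker: ideal, or soft-clipping modelled with tanh.
    spk = [math.tanh(3.0 * x) if nonlinear else x for x in far]
    path = [0.6, 0.3, -0.2, 0.1]   # hypothetical room echo path
    mic = []
    for i in range(n):
        echo = sum(h * spk[i - k] for k, h in enumerate(path) if i - k >= 0)
        mic.append(echo + rng.gauss(0.0, 1e-3))   # small assumed sensor noise
    res = nlms_echo_canceller(far, mic)
    half = n // 2
    return 10.0 * math.log10(power(mic[half:]) / power(res[half:]))

erle_linear = simulate_erle(nonlinear=False)
erle_nonlinear = simulate_erle(nonlinear=True)
print(f"ERLE, ideal loudspeaker:     {erle_linear:.1f} dB")
print(f"ERLE, distorted loudspeaker: {erle_nonlinear:.1f} dB")
```

Under these assumptions the linear filter removes far less echo once the loudspeaker distorts, because the distortion component is uncorrelated with the far-end reference that the filter can see. That residual is the part a learned nonlinear model would have to fit.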

(2) Speech recognition in noisy environments still needs improvement. Signal processing is good at handling linear problems, while deep learning is good at handling nonlinear ones. Practical problems, however, are a combination of both, so recognition in noisy environments can only be solved well by integrating the two approaches.

(3) The two problems above share a common point: current deep learning approaches use only the energy in each frequency band of the speech signal and ignore its phase. How to make deep learning exploit phase information more effectively, especially for multi-channel input, may be a future direction.
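A short worked example may help show what is lost when only per-band energy is kept. The sketch below is an illustration, not the article's method: it assumes a two-microphone setup in which the second microphone receives the same tone a few samples later, and the frame length, frequency bin and delay are all hypothetical values.

```python
import cmath
import math

def dft_bin(signal, k):
    """Single DFT coefficient X[k] of a real-valued frame."""
    n = len(signal)
    return sum(s * cmath.exp(-2j * math.pi * k * t / n)
               for t, s in enumerate(signal))

N = 64       # frame length in samples (hypothetical)
K = 4        # frequency bin of the test tone (hypothetical)
DELAY = 3    # inter-microphone delay in samples (hypothetical)

# The same tone at two microphones; the second hears it DELAY samples later.
mic1 = [math.cos(2 * math.pi * K * t / N) for t in range(N)]
mic2 = [math.cos(2 * math.pi * K * (t - DELAY) / N) for t in range(N)]

X1 = dft_bin(mic1, K)
X2 = dft_bin(mic2, K)

# Energy (magnitude) features are the same on both channels...
mag_diff = abs(abs(X1) - abs(X2))

# ...while the inter-channel phase difference still encodes the delay.
ipd = cmath.phase(X1 * X2.conjugate())
estimated_delay = ipd * N / (2 * math.pi * K)

print(f"magnitude difference: {mag_diff:.2e}")
print(f"estimated delay:      {estimated_delay:.3f} samples")
```

In this toy case the magnitudes of the two channels match, so a model fed only band energies has no way to tell where the sound came from, whereas the phase difference recovers the delay. This is the kind of spatial cue the multi-channel remark in point (3) refers to.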

(4) In addition, how to obtain a good acoustic model through transfer learning when data is scarce is a hot research topic. In dialect recognition, for example, being able to start from a reasonably good Mandarin acoustic model and use a small amount of dialect data to obtain a good dialect acoustic model would greatly expand the application scope of speech recognition. Some progress has been made here, but it amounts more to training techniques than to a real solution, and a gap to the ultimate goal remains.

(5) The purpose of speech recognition is to enable machines to understand humans, so conversion into text is not the ultimate goal. How to combine speech recognition with semantic understanding may be a more important direction in the future. The LSTM used in speech recognition already takes past frames of the speech signal into account, but semantic understanding needs far more history than that to be useful, so passing more conversational context to the speech recognition engine is a difficult problem.

(6) For machines to understand human language, sound information alone is not enough. The next step is to integrate physical sensing modalities such as "sound, light, electricity, heat, force and magnetism". Only then can machines perceive the real state of the world, which is the prerequisite for machines to learn human knowledge. Moreover, machines should go beyond the five human senses and be able to see what humans cannot see and hear what humans cannot hear.
