A brief analysis of the smart TV voice control solution-EEWORLD


Among the four modules of the voice control platform, the automatic speech recognition module converts audio signals into text. Its function is relatively simple and was introduced earlier, so it is not repeated here. The semantic understanding, intent decision, and skill distribution/decision modules are more complex and form the core capabilities of voice control. The sub-functions of each module are shown in Figure 4.

The semantic understanding module covers query analysis, scene classification, intent recognition, context recognition, template intervention, and slot extraction. Slot extraction pulls the keywords out of an utterance, the intent is then classified by scenario, and the result is adjusted using the conversational context, so that the true intent of a sentence can be determined accurately. Because the platform has its own slot extraction capability, adding a new service no longer depends on the language understanding of a third-party skill, which allows flexible docking with third-party services. Slots can also be trained to match business needs, which eases the development of new services. In addition, once slots are subdivided by scenario, they can be customized for specific user groups and usage scenarios, improving service accuracy and the operational conversion rate.

The intent decision module covers multi-intent decisions, contextual decisions, personalized intervention, and user portrait generation. It adjusts the intent according to the user's usage habits and the context, selects from several candidate intents the one that best matches the user's scenario, and thereby improves intent accuracy.
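The semantic understanding flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the scene names, keyword lists, regular-expression slot patterns, and the function names `classify_scene`, `extract_slots`, and `understand` do not come from the platform itself, and a production system would use trained models rather than keyword and pattern matching. The sketch also classifies the scene before extracting slots, which is only one possible ordering.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword lists used for scene classification.
SCENE_KEYWORDS = {
    "video": ("play", "watch", "episode"),
    "weather": ("weather", "rain", "temperature"),
}

# Hypothetical slot patterns, subdivided by scene.
SLOT_PATTERNS = {
    "video": {"title": re.compile(r"\bplay (?:the (?:movie|show) )?(?P<title>.+)$")},
    "weather": {"city": re.compile(r"\bweather in (?P<city>[a-z ]+)$")},
}


@dataclass
class Understanding:
    scene: str
    intent: str
    slots: dict = field(default_factory=dict)


def classify_scene(text, previous_scene=None):
    """Score scenes by keyword hits; with no hits, fall back to the previous scene."""
    tokens = set(text.split())
    scores = {
        scene: sum(word in tokens for word in words)
        for scene, words in SCENE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return previous_scene or "unknown"
    return best


def extract_slots(text, scene):
    """Apply the slot patterns registered for the given scene."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.get(scene, {}).items():
        match = pattern.search(text)
        if match:
            slots[name] = match.group(name).strip()
    return slots


def understand(query, previous_scene=None):
    """Query analysis, scene classification, slot extraction, then intent naming."""
    text = query.lower().strip()
    scene = classify_scene(text, previous_scene)
    slots = extract_slots(text, scene)
    intent = f"{scene}.search" if slots else f"{scene}.general"
    return Understanding(scene, intent, slots)
```

Calling `understand("next one", previous_scene="video")` shows the role of context: the utterance contains no scene keyword, so the scene of the previous turn is reused.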
The skill distribution/decision module chooses among the decision results by means of data models or manual intervention, thereby controlling how intents are distributed and enabling flexible docking with third-party content resources.
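As a rough illustration of the intent decision and skill distribution steps, the sketch below re-ranks candidate intents using per-scene usage counts and the scene of the previous turn, then routes the chosen intent to a skill, with manual overrides taking precedence over the default routes. The weights, the routing table, the skill names, and the functions `decide_intent` and `dispatch` are all hypothetical; the article does not specify how the real modules score or route intents.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    intent: str   # for example "video.search"
    score: float  # confidence from semantic understanding, between 0 and 1


def decide_intent(candidates, usage_counts, last_scene=None,
                  habit_weight=0.2, context_bonus=0.1):
    """Pick the candidate that best fits the user's habits and the context.

    The additive weighting is an assumption chosen for illustration only.
    """
    total = sum(usage_counts.values()) or 1

    def adjusted(candidate):
        scene = candidate.intent.split(".")[0]
        habit = habit_weight * usage_counts.get(scene, 0) / total
        context = context_bonus if scene == last_scene else 0.0
        return candidate.score + habit + context

    return max(candidates, key=adjusted)


# Hypothetical default routes, standing in for a data-model decision.
DEFAULT_ROUTES = {
    "video.search": "in_house_video",
    "music.search": "partner_music",
}


def dispatch(intent, overrides=None):
    """Route an intent to a skill; manual overrides win over default routes."""
    overrides = overrides or {}
    if intent in overrides:
        return overrides[intent]
    return DEFAULT_ROUTES.get(intent, "fallback_chat")
```

For a user who mostly watches video, a slightly lower-scoring `video.search` candidate can outrank `music.search`, and an operator can redirect `video.search` to another provider by passing an override without touching the default table.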

Figure 4 Voice control core module

4 Voice Central Control Platform Software Architecture

Architecturally, the voice control platform software is divided into three layers: the underlying technology layer, the core capability layer, and the docking layer, which requires secondary development. The hierarchy is shown in Figure 5. The underlying technology layer includes deep learning algorithms, speech recognition technology, natural language processing, and basic data models. These are the foundational technologies of intelligent speech. They are highly specialized and generally need no special customization, so mature third-party solutions can be used. The core capability layer includes the scene classification, intent recognition, slot extraction, context judgment, decision and skill distribution, user portrait, and personalized recommendation modules, and covers all the core functions of cloud-side voice processing. Performance optimization of speech processing and the custom development of differentiated functions are carried out in this layer. Above the core capability layer, the service docking, model training, decision configuration, and data analysis modules connect to specific businesses and services and must be developed further to suit each business. This layer has to dock flexibly with multiple services, analyze business data and train models, and define decision mechanisms suited to the business type and the user's usage scenario, so that complex or ambiguous utterances are matched to the right function.
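The separation between the three layers can be sketched with simple interfaces. This is only one possible reading of the architecture: the interface names `SpeechRecognizer` and `ServiceAdapter`, the `CoreCapabilities` class, and the stand-in classes are assumptions for illustration, and scene classification is reduced to taking the first word of the recognized text.

```python
from typing import Protocol


class SpeechRecognizer(Protocol):
    """Underlying technology layer: may be backed by a third-party engine."""

    def transcribe(self, audio: bytes) -> str: ...


class ServiceAdapter(Protocol):
    """Docking layer: one adapter per business or third-party service."""

    def handle(self, intent: str, slots: dict) -> str: ...


class CoreCapabilities:
    """Core capability layer: understanding and distribution owned by the platform."""

    def __init__(self, recognizer, adapters):
        self.recognizer = recognizer
        self.adapters = adapters  # maps a scene name to a ServiceAdapter

    def process(self, audio):
        text = self.recognizer.transcribe(audio)
        # Placeholder for scene classification: the first word names the scene.
        scene, _, rest = text.partition(" ")
        adapter = self.adapters.get(scene)
        if adapter is None:
            return f"no service docked for scene '{scene}'"
        return adapter.handle(f"{scene}.request", {"query": rest})


class FakeRecognizer:
    """Stand-in for a third-party engine; treats the audio bytes as UTF-8 text."""

    def transcribe(self, audio):
        return audio.decode("utf-8")


class WeatherAdapter:
    """Stand-in for a docked weather service."""

    def handle(self, intent, slots):
        return f"[weather] {intent}: {slots['query']}"
```

With this split, replacing the recognition engine or docking a new service means supplying another object that satisfies the same interface, while the core capability layer stays unchanged.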

5 Conclusion

This article presents a solution for enterprises that want to build a private voice control platform. Within the end-to-end voice pipeline, the voice control platform occupies a pivotal position. With a private platform, third-party services and skills can be configured flexibly from the cloud without disturbing users, which speeds up the optimization and iteration of intelligent voice features. Voice skills can also be customized for specific businesses and usage scenarios to give users a distinctive voice service. In addition, a private voice control platform makes it easier to manage user data and to keep voice data secure. Therefore, whether from the perspective of resource integration, performance improvement, or business expansion, building a private voice control platform is the future trend for large enterprises.

