With the introduction of China's artificial intelligence development plan and the growing maturity of the key underlying technologies, the Internet of Things era is gradually emerging as the next wave after the mobile Internet era. Human-computer interaction has entered a new round of demand innovation, evolving from the traditional mouse, keyboard, and touch screen to voice, and society is rapidly moving into the era of intelligent voice interaction.
Taking "voice + content + intelligence" as the entry point, the general trend of future smart home development is to create an independently designed, integrated, and operated one-stop voice interaction sharing platform; build an operable, monetizable voice interaction ecosystem; empower multi-form terminal products; and deliver a human-computer interaction experience that can listen and speak. This is also the key direction for the China Mobile Smart Home Operation Center in building the digital home ecosystem.
1. Voice interaction is the key entry point for the smart home ecosystem layout
1.1 The demand for human-computer interaction continues to innovate
As interaction scenarios expand, users have placed ever higher demands on the freedom of interaction, and voice interaction comes closest to humans' instinctive mode of expression. With advantages such as fast input, few scenario restrictions, and a mature technology chain, voice interaction has become the ideal interaction channel of the intelligent era and is developing toward interactive intelligence, terminal polymorphism, and ubiquitous services.
1.2 Home scene services are more intelligent
Voice interaction is the key to industrializing the underlying artificial intelligence technologies. Voice assistants connect multi-form terminals with a wide range of services, providing content services, Internet services, and scenario-based smart home control, and offering home users new product experiences such as interactive entertainment, interactive education, family health, and home security. Among these terminals, the smart speaker has become the first breakout product and is gradually extending to more product forms.
2. Core technology research to improve user experience
Intelligent voice interaction mainly involves speech recognition, semantic understanding, and speech synthesis. Speech recognition converts the speech stream into text, semantic understanding parses the meaning of the sentence to determine user intent, and speech synthesis feeds the result back to the user as speech, thereby realizing intelligent voice interaction.
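As a rough illustration of how these three stages chain together, the Python sketch below wires a stub recognizer, parser, and synthesizer into one interaction turn. All class and method names are hypothetical placeholders for illustration, not the platform's actual API.

```python
# Minimal, illustrative pipeline: speech -> text -> intent -> reply audio.
# Every component here is a hypothetical stand-in, not the platform's real interface.

class SpeechRecognizer:
    def transcribe(self, audio_frames: bytes) -> str:
        # A real system would run an end-to-end acoustic model here.
        return "turn on the living room light"

class SemanticParser:
    def parse(self, text: str) -> dict:
        # A real system would run intent classification and slot filling here.
        return {"intent": "device_control",
                "slots": {"device": "light", "room": "living room", "action": "on"}}

class SpeechSynthesizer:
    def synthesize(self, text: str) -> bytes:
        # A real system would return a generated audio waveform here.
        return b"\x00\x01"  # placeholder waveform bytes

def voice_interaction(audio_frames: bytes) -> bytes:
    """One turn of voice interaction: hear, understand, respond."""
    text = SpeechRecognizer().transcribe(audio_frames)
    result = SemanticParser().parse(text)
    reply = f"OK, turning {result['slots']['action']} the {result['slots']['device']}."
    return SpeechSynthesizer().synthesize(reply)

reply_audio = voice_interaction(b"")  # stubbed input frames
```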
2.1 Speech Recognition - Hear Clearly
The intelligent voice interaction platform now uses an end-to-end model based on the Transformer architecture, which offers fast recognition and high accuracy. Its self-attention mechanism captures context across the whole utterance, improving the extraction of semantic features and solving the problem that the acoustic model and language model cannot be jointly optimized in traditional pipelines. In addition, the architecture makes better use of modern hardware for parallel computation, improving computation speed.
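To make the self-attention idea concrete, here is a minimal scaled dot-product attention in NumPy. It is the generic textbook formulation of the operation at the heart of Transformer encoders, not the platform's model code.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Generic scaled dot-product attention (the core Transformer operation).

    q, k, v: arrays of shape (seq_len, d_model). Each output frame is a weighted
    mix of all input frames, which is what lets the encoder use full-utterance
    context when extracting features, and the matrix products parallelize well.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                      # pairwise frame similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the sequence
    return weights @ v                                   # context-aware representation

# Toy usage: 5 frames of 8-dimensional features, self-attention (q = k = v = x).
x = np.random.randn(5, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (5, 8)
```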
2.2 Semantic Understanding - Understand Clearly
The platform uses a fusion model that combines rule-based, deep learning, and keyword matching algorithms to understand user intent. The rule algorithm provides fast, accurate matching for short texts; the deep learning algorithm recognizes new words not covered by the vocabulary; and the keyword matching algorithm quickly and accurately identifies intent in cases such as inverted word order and long-tail text.
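A hedged sketch of how such a fusion might cascade is shown below: an exact rule lookup first, then keyword matching, falling back to a stubbed deep model. The rule table, keyword sets, and threshold are assumptions made for demonstration.

```python
# Illustrative intent-recognition cascade: rules -> keywords -> deep model.
# The rule table, keyword sets, and the stubbed classifier are assumptions, not platform data.

RULES = {"turn on the light": "light_on", "play music": "play_music"}
KEYWORDS = {"light_on": {"light", "lamp", "on"},
            "play_music": {"play", "song", "music"}}

def deep_model_predict(text: str) -> str:
    # Stand-in for a trained deep classifier handling new words and long-tail queries.
    return "unknown"

def recognize_intent(text: str) -> str:
    text = text.lower().strip()
    if text in RULES:                          # 1) exact rule: fast and precise for short texts
        return RULES[text]
    tokens = set(text.split())
    for intent, kws in KEYWORDS.items():       # 2) keywords: robust to inverted word order
        if len(tokens & kws) >= 2:
            return intent
    return deep_model_predict(text)            # 3) deep model: out-of-vocabulary fallback

print(recognize_intent("play music"))          # -> play_music (rule hit)
print(recognize_intent("music please play"))   # -> play_music (keyword hit)
```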
2.3 Speech Synthesis - Speak Clearly
The platform uses an end-to-end synthesis system that takes text or phonetic symbols as input and directly outputs audio waveforms. The system reduces the need for linguistic expertise, makes it possible to build synthesis systems for dozens of languages in batches, exhibits rich pronunciation styles and strong prosodic expressiveness, and speeds up the synthesis of different voices.
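The snippet below is only meant to show the text-in, waveform-out contract of an end-to-end system; it substitutes a toy sine-tone generator for the neural model, and nothing in it represents the platform's actual synthesis pipeline.

```python
import math
import struct
import wave

def synthesize_to_wav(text: str, path: str, sample_rate: int = 16000) -> None:
    """Toy stand-in for end-to-end synthesis: text in, audio waveform file out.

    A real end-to-end system maps text or phonetic symbols directly to the waveform
    with a neural model; here each character simply becomes a short tone so that the
    input/output contract is runnable end to end.
    """
    samples = []
    for ch in text:
        freq = 200 + (ord(ch) % 40) * 10            # map each character to a pitch
        for n in range(sample_rate // 10):          # 100 ms of audio per character
            samples.append(int(3000 * math.sin(2 * math.pi * freq * n / sample_rate)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)                           # 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(struct.pack("<" + "h" * len(samples), *samples))

synthesize_to_wav("hello", "hello.wav")
```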
3. Forging a voice OS, empowering the voice ecosystem
3.1 Voice Assistant, Empowering Multi-Form Terminals
The intelligent voice interaction platform provides voice assistants for multi-form terminals. It uses hook techniques to decouple the individual sub-modules; implements functions such as voice on-demand, calls, audiobooks, and conversation; and helps the platform build multimodal recognition and interaction, such as voiceprint, emotion, and motion sensing, together with the corresponding feedback and recommendation services. It is compatible with mainstream operating systems and supports custom interface extensions, greatly shortening the access cycle, reducing R&D costs, and quickly enabling voice interaction capabilities for ecosystem hardware and applications.
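As a hypothetical illustration of what decoupled sub-modules behind a common hook point could look like, the sketch below registers independent skills (calls, audiobooks, and so on) in a dispatcher; the decorator, registry, and skill names are all assumptions, not the platform's real interfaces.

```python
# Hypothetical sketch: assistant sub-modules registered behind one hook point, so each
# skill (on-demand, calls, audiobooks, chat) can be developed and replaced independently.

from typing import Callable, Dict

SKILLS: Dict[str, Callable[[dict], str]] = {}

def skill(intent: str):
    """Decorator that hooks a handler into the dispatcher for one intent."""
    def register(handler: Callable[[dict], str]) -> Callable[[dict], str]:
        SKILLS[intent] = handler
        return handler
    return register

@skill("play_audiobook")
def play_audiobook(slots: dict) -> str:
    return f"Playing audiobook: {slots.get('title', 'latest')}"

@skill("make_call")
def make_call(slots: dict) -> str:
    return f"Calling {slots.get('contact', 'unknown contact')}"

def dispatch(intent: str, slots: dict) -> str:
    handler = SKILLS.get(intent)
    return handler(slots) if handler else "Sorry, that skill is not installed."

print(dispatch("make_call", {"contact": "Mom"}))
```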
3.2 Voice Plug-ins, Empowering Massive Applications
The platform provides voice interaction plug-ins for a massive number of applications, defines standard open protocols, and implements cross-process communication between third-party applications and the Launcher based on IPC. When a user invokes voice control, the platform generates hot words and word-slot information, which the Launcher dynamically matches and sends to the third-party application for live broadcast, on-demand playback, playback control, and so on, achieving a "what you see is what you get" experience.
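The following sketch suggests what a hot-word / word-slot registration and routing step might look like in code; the JSON field names and the matching logic are assumptions for illustration, not the platform's actual open protocol, and the IPC transport itself is omitted.

```python
# Illustrative hot-word / word-slot routing between a launcher and third-party apps.
# Field names and matching rules are assumptions; a real system would carry this
# payload over the platform's IPC channel and open protocol.

import json
from typing import List, Optional

# A third-party video app registers the hot words and slots it can handle.
registration = {
    "app": "com.example.video",
    "hotwords": ["play", "pause", "next episode"],
    "slots": {"title": "string", "episode": "int"},
}

def route_utterance(text: str, registrations: List[dict]) -> Optional[dict]:
    """Pick the registered app whose hot word appears in the recognized text."""
    for reg in registrations:
        for hotword in reg["hotwords"]:
            if hotword in text:
                return {"target": reg["app"], "command": hotword, "raw_text": text}
    return None

msg = route_utterance("play next episode of my show", [registration])
print(json.dumps(msg, indent=2))  # payload the launcher would send over IPC
```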
4. Complete scenario packaging to provide system solutions
4.1 Whole-House Intelligence
Based on intelligent voice interaction capabilities and the Andlink smart home cloud platform, the center provides integrated whole-house smart solutions covering smart speakers, smart panels, smart lighting, smart switches, and more, enabling access to and voice control of devices from different manufacturers. Combined with smart access control, cameras, and similar devices, it can realize segmented scenarios such as home-security combinations.
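To illustrate the idea of one voice intent fanning out to devices from different manufacturers, here is a small hypothetical sketch; the device classes, command vocabulary, and the "leave home" scene are invented for demonstration and do not reflect the Andlink platform's real interfaces.

```python
# Hypothetical sketch: a single scene/intent driving devices from several vendors
# through one abstraction, as a whole-house platform might.

class Device:
    def __init__(self, name: str, vendor: str):
        self.name, self.vendor = name, vendor

    def send(self, command: str) -> str:
        # A real platform would translate this into each vendor's own protocol.
        return f"[{self.vendor}] {self.name} <- {command}"

SCENES = {
    # "Leave home" security combination: lights off, door locked, camera armed.
    "leave_home": [("light", "off"), ("door_lock", "lock"), ("camera", "arm")],
}

devices = {
    "light": Device("living room light", "VendorA"),
    "door_lock": Device("front door lock", "VendorB"),
    "camera": Device("hallway camera", "VendorC"),
}

def run_scene(scene: str) -> None:
    for device_key, command in SCENES[scene]:
        print(devices[device_key].send(command))

run_scene("leave_home")
```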
4.2 AI Living Room
Building on the smart screen, the combination of voice remote controls, smart speakers, and smart TVs enables TV playback control and recommendations, empowers large-screen applications such as education, e-commerce, music, games, and health with voice capabilities, and makes full use of lightweight voice skills, delivering a user experience of getting what you ask for with a single sentence.
4.3 Intelligent Dialogue Service
The platform provides dialogue understanding technology that integrates semantic deduction and semantic matching, with preset dialogue capabilities and dictionaries covering audio and video entertainment, device control, life services, and other domains. Dialogue capabilities can be customized efficiently and applied widely in smart assistants, online customer service, voice tutoring, and other fields.
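As a minimal illustration of semantic matching against preset capabilities, the sketch below uses token-overlap (Jaccard) similarity as a stand-in for a learned semantic model; the preset library is made up for demonstration and is not the platform's actual dictionary.

```python
# Minimal semantic-matching illustration: route a query to the closest preset capability.
# Jaccard token overlap stands in for a learned similarity model; presets are invented.

PRESETS = {
    "device_control": "turn on the light switch the fan",
    "av_entertainment": "play a movie song music video",
    "life_service": "what is the weather order food taxi",
}

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def match_capability(query: str) -> str:
    return max(PRESETS, key=lambda name: jaccard(query, PRESETS[name]))

print(match_capability("please play some music"))  # -> av_entertainment
```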
5. Conclusion
The intelligent voice interaction system tackles technologies such as speech recognition, semantic understanding, and speech synthesis; quickly empowers multi-form terminals; and is applied to AI interaction experiences that "can listen and speak" and "can understand and think". It achieves a leap from concepts and technologies to commercial products and functional applications, forming a new application ecosystem centered on voice interaction technology and promoting the rapid development of the artificial intelligence industry.
Against the backdrop of 5G's rapid development, high bandwidth and low latency are driving intelligent voice interaction technology to overcome new challenges and open a new chapter. At the "understanding" level, the focus is on building a cognitive dialogue engine that supports interruption and intelligent correction, meeting the essential requirements of natural interaction. At the "application" level, the voice interaction content and skill ecosystem will penetrate various fields and be packaged by scenario, truly delivering a "say it and get it" experience for massive services. At the "access" level, voice assistants will continue to expand their role as a hub, enabling more terminal forms and interactive applications to scale up so that everything can speak. At the "immersion" level, intelligent human-computer interaction methods such as speech recognition, face recognition, expression analysis, lip-movement detection, eye tracking, gesture recognition, and tactile monitoring will be integrated, and the device-to-device and device-cloud-device interaction protocols will be improved to create an immersive multimodal interaction experience.
As human-computer interaction moves ever closer to natural expression, the China Mobile Smart Home Operation Center will continue to deepen the construction of the intelligent voice ecosystem and lead the way toward a better future life.