With the introduction of China's artificial intelligence development plan and the growing maturity of the key underlying technologies, the Internet of Things era is gradually emerging as the next wave after the mobile Internet era. Human-computer interaction has entered a new round of demand innovation, evolving from the traditional mouse, keyboard, and touch screen to voice, and society is rapidly moving into the era of intelligent voice interaction.
Taking "voice + content + intelligence" as the entry point, the general trend of future smart home development is to create an independently designed, integrated, and operated one-stop voice interaction sharing platform; build an operable, monetizable voice interaction ecosystem; empower multi-form terminal products; and deliver a human-computer interaction experience that can listen and speak. This is also the key direction for the China Mobile Smart Home Operation Center in building the digital home ecosystem.
1. Voice interaction is the key entry point for the smart home ecosystem layout
1.1 The demand for human-computer interaction continues to innovate
As interaction scenarios expand, users have placed ever higher demands on the freedom of interaction, and voice interaction comes closest to humans' instinctive mode of expression. With advantages such as fast input, few scenario restrictions, and a mature technology chain, voice interaction has become the ideal interaction channel of the intelligent era and is developing toward interactive intelligence, terminal polymorphism, and ubiquitous services.
1.2 Home scene services are more intelligent
Voice interaction is the key to industrializing the underlying artificial intelligence technologies. Voice assistants connect multi-form terminals with a wide range of services, providing content services, Internet services, and scenario-based smart home control, and offering home users new product experiences such as interactive entertainment, interactive education, family health, and home security. Among these terminals, the smart speaker has become the first breakout product and is gradually extending to more product forms.
2. Core technology research to improve user experience
Intelligent voice interaction mainly involves speech recognition, semantic understanding, and speech synthesis. Speech recognition converts the speech stream into text, semantic understanding parses the meaning of the sentence to determine user intent, and speech synthesis feeds the result back to the user as speech, thereby realizing intelligent voice interaction.
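As a rough illustration of how these three stages chain together, the Python sketch below wires a stub recognizer, parser, and synthesizer into one interaction turn. All class and method names are hypothetical placeholders for illustration, not the platform's actual API.

```python
# Minimal, illustrative pipeline: speech -> text -> intent -> reply audio.
# Every component here is a hypothetical stand-in, not the platform's real interface.

class SpeechRecognizer:
    def transcribe(self, audio_frames: bytes) -> str:
        # A real system would run an end-to-end acoustic model here.
        return "turn on the living room light"

class SemanticParser:
    def parse(self, text: str) -> dict:
        # A real system would run intent classification and slot filling here.
        return {"intent": "device_control",
                "slots": {"device": "light", "room": "living room", "action": "on"}}

class SpeechSynthesizer:
    def synthesize(self, text: str) -> bytes:
        # A real system would return a generated audio waveform here.
        return b"\x00\x01"  # placeholder waveform bytes

def voice_interaction(audio_frames: bytes) -> bytes:
    """One turn of voice interaction: hear, understand, respond."""
    text = SpeechRecognizer().transcribe(audio_frames)
    result = SemanticParser().parse(text)
    reply = f"OK, turning {result['slots']['action']} the {result['slots']['device']}."
    return SpeechSynthesizer().synthesize(reply)

reply_audio = voice_interaction(b"")  # stubbed input frames
```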
2.1 Speech Recognition - Hear Clearly
The intelligent voice interaction platform now uses an end-to-end model based on the Transformer architecture, which offers fast recognition and high accuracy. Its self-attention mechanism captures context across the whole utterance, improving the extraction of semantic features and solving the problem that the acoustic model and language model cannot be jointly optimized in traditional pipelines. In addition, the architecture makes better use of modern hardware for parallel computation, improving computation speed.
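To make the self-attention idea concrete, here is a minimal scaled dot-product attention in NumPy. It is the generic textbook formulation of the operation at the heart of Transformer encoders, not the platform's model code.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Generic scaled dot-product attention (the core Transformer operation).

    q, k, v: arrays of shape (seq_len, d_model). Each output frame is a weighted
    mix of all input frames, which is what lets the encoder use full-utterance
    context when extracting features, and the matrix products parallelize well.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                      # pairwise frame similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the sequence
    return weights @ v                                   # context-aware representation

# Toy usage: 5 frames of 8-dimensional features, self-attention (q = k = v = x).
x = np.random.randn(5, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (5, 8)
```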
2.2 Semantic Understanding - Understand Clearly
The platform uses a fusion model that combines rule-based, deep learning, and keyword matching algorithms to understand user intent. The rule algorithm provides fast, accurate matching for short texts; the deep learning algorithm recognizes new words not covered by the vocabulary; and the keyword matching algorithm quickly and accurately identifies intent in cases such as inverted word order and long-tail text.
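A hedged sketch of how such a fusion might cascade is shown below: an exact rule lookup first, then keyword matching, falling back to a stubbed deep model. The rule table, keyword sets, and threshold are assumptions made for demonstration.

```python
# Illustrative intent-recognition cascade: rules -> keywords -> deep model.
# The rule table, keyword sets, and the stubbed classifier are assumptions, not platform data.

RULES = {"turn on the light": "light_on", "play music": "play_music"}
KEYWORDS = {"light_on": {"light", "lamp", "on"},
            "play_music": {"play", "song", "music"}}

def deep_model_predict(text: str) -> str:
    # Stand-in for a trained deep classifier handling new words and long-tail queries.
    return "unknown"

def recognize_intent(text: str) -> str:
    text = text.lower().strip()
    if text in RULES:                          # 1) exact rule: fast and precise for short texts
        return RULES[text]
    tokens = set(text.split())
    for intent, kws in KEYWORDS.items():       # 2) keywords: robust to inverted word order
        if len(tokens & kws) >= 2:
            return intent
    return deep_model_predict(text)            # 3) deep model: out-of-vocabulary fallback

print(recognize_intent("play music"))          # -> play_music (rule hit)
print(recognize_intent("music please play"))   # -> play_music (keyword hit)
```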
2.3 Speech Synthesis - Speak Clearly
The platform uses an end-to-end synthesis system that takes text or phonetic symbols as input and directly outputs audio waveforms. The system reduces the need for linguistic expertise, makes it possible to build synthesis systems for dozens of languages in batches, exhibits rich pronunciation styles and strong prosodic expressiveness, and speeds up the synthesis of different voices.
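The snippet below is only meant to show the text-in, waveform-out contract of an end-to-end system; it substitutes a toy sine-tone generator for the neural model, and nothing in it represents the platform's actual synthesis pipeline.

```python
import math
import struct
import wave

def synthesize_to_wav(text: str, path: str, sample_rate: int = 16000) -> None:
    """Toy stand-in for end-to-end synthesis: text in, audio waveform file out.

    A real end-to-end system maps text or phonetic symbols directly to the waveform
    with a neural model; here each character simply becomes a short tone so that the
    input/output contract is runnable end to end.
    """
    samples = []
    for ch in text:
        freq = 200 + (ord(ch) % 40) * 10            # map each character to a pitch
        for n in range(sample_rate // 10):          # 100 ms of audio per character
            samples.append(int(3000 * math.sin(2 * math.pi * freq * n / sample_rate)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)                           # 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(struct.pack("<" + "h" * len(samples), *samples))

synthesize_to_wav("hello", "hello.wav")
```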
3. Forging a voice OS, empowering the voice ecosystem
3.1 Voice Assistant, Empowering Multi-Form Terminals
The intelligent voice interaction platform provides voice assistants for multi-form terminals. It uses hook techniques to decouple the individual sub-modules; implements functions such as voice on-demand, calls, audiobooks, and conversation; and helps the platform build multimodal recognition and interaction, such as voiceprint, emotion, and motion sensing, together with the corresponding feedback and recommendation services. It is compatible with mainstream operating systems and supports custom interface extensions, greatly shortening the access cycle, reducing R&D costs, and quickly enabling voice interaction capabilities for ecosystem hardware and applications.
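As a hypothetical illustration of what decoupled sub-modules behind a common hook point could look like, the sketch below registers independent skills (calls, audiobooks, and so on) in a dispatcher; the decorator, registry, and skill names are all assumptions, not the platform's real interfaces.

```python
# Hypothetical sketch: assistant sub-modules registered behind one hook point, so each
# skill (on-demand, calls, audiobooks, chat) can be developed and replaced independently.

from typing import Callable, Dict

SKILLS: Dict[str, Callable[[dict], str]] = {}

def skill(intent: str):
    """Decorator that hooks a handler into the dispatcher for one intent."""
    def register(handler: Callable[[dict], str]) -> Callable[[dict], str]:
        SKILLS[intent] = handler
        return handler
    return register

@skill("play_audiobook")
def play_audiobook(slots: dict) -> str:
    return f"Playing audiobook: {slots.get('title', 'latest')}"

@skill("make_call")
def make_call(slots: dict) -> str:
    return f"Calling {slots.get('contact', 'unknown contact')}"

def dispatch(intent: str, slots: dict) -> str:
    handler = SKILLS.get(intent)
    return handler(slots) if handler else "Sorry, that skill is not installed."

print(dispatch("make_call", {"contact": "Mom"}))
```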
3.2 Voice Plug-ins, Empowering Massive Applications
The platform provides voice interaction plug-ins for a massive number of applications, defines standard open protocols, and implements cross-process communication between third-party applications and the Launcher based on IPC. When a user invokes voice control, the platform generates hot words and word-slot information, which the Launcher dynamically matches and sends to the third-party application for live broadcast, on-demand playback, playback control, and so on, achieving a "what you see is what you get" experience.
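The following sketch suggests what a hot-word / word-slot registration and routing step might look like in code; the JSON field names and the matching logic are assumptions for illustration, not the platform's actual open protocol, and the IPC transport itself is omitted.

```python
# Illustrative hot-word / word-slot routing between a launcher and third-party apps.
# Field names and matching rules are assumptions; a real system would carry this
# payload over the platform's IPC channel and open protocol.

import json
from typing import List, Optional

# A third-party video app registers the hot words and slots it can handle.
registration = {
    "app": "com.example.video",
    "hotwords": ["play", "pause", "next episode"],
    "slots": {"title": "string", "episode": "int"},
}

def route_utterance(text: str, registrations: List[dict]) -> Optional[dict]:
    """Pick the registered app whose hot word appears in the recognized text."""
    for reg in registrations:
        for hotword in reg["hotwords"]:
            if hotword in text:
                return {"target": reg["app"], "command": hotword, "raw_text": text}
    return None

msg = route_utterance("play next episode of my show", [registration])
print(json.dumps(msg, indent=2))  # payload the launcher would send over IPC
```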
4. Complete scenario packaging to provide system solutions
4.1 Whole-House Intelligence
Based on intelligent voice interaction capabilities and the Andlink smart home cloud platform, the center provides integrated whole-house smart solutions covering smart speakers, smart panels, smart lighting, smart switches, and more, enabling access to and voice control of devices from different manufacturers. Combined with smart access control, cameras, and similar devices, it can realize segmented scenarios such as home-security combinations.
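To illustrate the idea of one voice intent fanning out to devices from different manufacturers, here is a small hypothetical sketch; the device classes, command vocabulary, and the "leave home" scene are invented for demonstration and do not reflect the Andlink platform's real interfaces.

```python
# Hypothetical sketch: a single scene/intent driving devices from several vendors
# through one abstraction, as a whole-house platform might.

class Device:
    def __init__(self, name: str, vendor: str):
        self.name, self.vendor = name, vendor

    def send(self, command: str) -> str:
        # A real platform would translate this into each vendor's own protocol.
        return f"[{self.vendor}] {self.name} <- {command}"

SCENES = {
    # "Leave home" security combination: lights off, door locked, camera armed.
    "leave_home": [("light", "off"), ("door_lock", "lock"), ("camera", "arm")],
}

devices = {
    "light": Device("living room light", "VendorA"),
    "door_lock": Device("front door lock", "VendorB"),
    "camera": Device("hallway camera", "VendorC"),
}

def run_scene(scene: str) -> None:
    for device_key, command in SCENES[scene]:
        print(devices[device_key].send(command))

run_scene("leave_home")
```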
4.2 AI Living Room
Building on the smart screen, the combination of voice remote controls, smart speakers, and smart TVs enables TV playback control and recommendations, empowers large-screen applications such as education, e-commerce, music, games, and health with voice capabilities, and makes full use of lightweight voice skills, delivering a user experience of getting what you ask for with a single sentence.
4.3 Intelligent Dialogue Service
The platform provides dialogue understanding technology that integrates semantic deduction and semantic matching, with preset dialogue capabilities and dictionaries covering audio and video entertainment, device control, life services, and other domains. Dialogue capabilities can be customized efficiently and applied widely in smart assistants, online customer service, voice tutoring, and other fields.
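As a minimal illustration of semantic matching against preset capabilities, the sketch below uses token-overlap (Jaccard) similarity as a stand-in for a learned semantic model; the preset library is made up for demonstration and is not the platform's actual dictionary.

```python
# Minimal semantic-matching illustration: route a query to the closest preset capability.
# Jaccard token overlap stands in for a learned similarity model; presets are invented.

PRESETS = {
    "device_control": "turn on the light switch the fan",
    "av_entertainment": "play a movie song music video",
    "life_service": "what is the weather order food taxi",
}

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def match_capability(query: str) -> str:
    return max(PRESETS, key=lambda name: jaccard(query, PRESETS[name]))

print(match_capability("please play some music"))  # -> av_entertainment
```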
5. Conclusion
The intelligent voice interaction system tackles technologies such as speech recognition, semantic understanding, and speech synthesis; quickly empowers multi-form terminals; and is applied to AI interaction experiences that "can listen and speak" and "can understand and think". It achieves a leap from concepts and technologies to commercial products and functional applications, forming a new application ecosystem centered on voice interaction technology and promoting the rapid development of the artificial intelligence industry.
Against the backdrop of 5G's rapid development, high bandwidth and low latency are driving intelligent voice interaction technology to overcome new challenges and open a new chapter. At the "understanding" level, the focus is on building a cognitive dialogue engine that supports interruption and intelligent correction, meeting the essential requirements of natural interaction. At the "application" level, the voice interaction content and skill ecosystem will penetrate various fields and be packaged by scenario, truly delivering a "say it and get it" experience for massive services. At the "access" level, voice assistants will continue to expand their role as a hub, enabling more terminal forms and interactive applications to scale up so that everything can speak. At the "immersion" level, intelligent human-computer interaction methods such as speech recognition, face recognition, expression analysis, lip-movement detection, eye tracking, gesture recognition, and tactile monitoring will be integrated, and the device-to-device and device-cloud-device interaction protocols will be improved to create an immersive multimodal interaction experience.
As human-computer interaction moves ever closer to natural expression, the China Mobile Smart Home Operation Center will continue to deepen the construction of the intelligent voice ecosystem and lead the way toward a better future life.