Smart speakers with screens offer more intuitive interaction, and shipments are growing

Publisher: SereneSoul55 | Latest update time: 2019-05-09 | Source: eefocus

At the 11th Google I/O conference, Google released the 10-inch Nest Hub Max, a speaker with a screen, priced at US$229. It is also the first product that Google and Nest have created together since Nest was incorporated into Google.

 

Many people still question whether smart speakers with screens are of any real use, but compared with pure voice interaction, interaction through a screen is more intuitive. Internet giants including Amazon, Google, Facebook and Baidu have all launched smart speakers with screens, using them as an entry point to their existing services.

 

The latest research from market research firm Strategy Analytics shows that smart speakers were the hottest consumer electronics product of 2018. Shipments in the fourth quarter of 2018 increased by 95% to 38.5 million units, exceeding the total for all of 2017. Smart speakers with screens accounted for more than 10% of total smart speaker shipments.

 

David Watkins, Director at Strategy Analytics, commented: “Smart speakers with screens, such as Google’s Home Hub, Amazon’s Echo Show and Baidu’s Xiaodu at Home, are popular with consumers, who are attracted by the combination of audio and video. Smart speakers with screens have more usage scenarios than just voice interaction. It is expected that by 2019, smart speakers with screens will become an important driving force for market growth.”

 

 

A vehicle for putting innovation into practice

"Tmall Genie, what's the weather like in Beijing today?" "It's sunny in Beijing today, 12℃~28℃, air quality index 30."

 

Anyone who owns a Tmall Genie speaker will be familiar with the conversation above. Tmall Genie is an example of conversational artificial intelligence, and the entire human-computer interaction process can be divided into four steps: wake-up, recognition, understanding, and feedback.
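The four steps can be sketched in a few lines of Python, using the weather exchange above as the test case. This is a minimal illustration only, not Tmall Genie's implementation: the function names, the keyword matching, and the `FAKE_WEATHER` table are all assumptions made for the example, and the input is already text rather than audio.

```python
WAKE_WORD = "tmall genie"

# Hypothetical stand-in for the weather service behind the assistant.
FAKE_WEATHER = {
    ("beijing", "today"): {"condition": "sunny", "low": 12, "high": 28, "aqi": 30},
}


def is_awake(utterance):
    """Step 1, wake-up: react only if the utterance starts with the wake word."""
    return utterance.lower().startswith(WAKE_WORD)


def recognize(utterance):
    """Step 2, recognition: a real system converts audio to text.
    Here the input is already text, so we only strip the wake word."""
    return utterance[len(WAKE_WORD):].strip(" ,")


def understand(text):
    """Step 3, understanding: extract the keywords the request depends on."""
    lowered = text.lower()
    return {
        "intent": "weather" if "weather" in lowered else None,
        "city": "beijing" if "beijing" in lowered else None,
        "date": "today" if "today" in lowered else None,
    }


def respond(slots):
    """Step 4, feedback: look up the data and assemble a sentence.
    A real system would then pass this sentence to speech synthesis."""
    if slots["intent"] != "weather":
        return "Sorry, I did not understand that."
    report = FAKE_WEATHER.get((slots["city"], slots["date"]))
    if report is None:
        return "Sorry, I have no weather data for that request."
    return (
        f"It's {report['condition']} in {slots['city'].title()} {slots['date']}, "
        f"{report['low']}℃~{report['high']}℃, air quality index {report['aqi']}."
    )


def handle(utterance):
    """Run the four steps in order; return None if the device was not woken."""
    if not is_awake(utterance):
        return None
    return respond(understand(recognize(utterance)))


print(handle("Tmall Genie, what's the weather like in Beijing today?"))
```

Without the wake word, `handle` returns `None`, which is the behavior that makes users repeat "Tmall Genie" before every request.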

 

The spoken question "What's the weather like in Beijing today?" is first converted into text by speech recognition. Keywords such as "today", "Beijing", and "weather" are then extracted and used to retrieve data from the weather forecast website behind the assistant. Finally, the retrieved data is assembled into a natural-language reply and voiced by speech synthesis: "It's sunny in Beijing today, 12℃~28℃, air quality index 30."

 

"The reason I joined Alibaba is that I prefer work related to the implementation of technology," said Nie Zaiqing, head of Tmall Genie voice technology, in an interview with China Business News. He said that the research projects he led while working at Microsoft Research Asia (Human Cube, Microsoft Academic Search, LUIS) were more focused on the combination of innovation and actual technology implementation.

 

After joining Alibaba's AI lab, what impressed him most was the speed of innovation and the close coordination between cutting-edge technology and products. Since joining Alibaba on October 9, 2017, Nie Zaiqing has been responsible for research and development of the Tmall Genie voice assistant's algorithms.

 

Alibaba's AI lab is not a pure research department. It is closely tied to the business, and even its cutting-edge research and development is aimed at future commercial use. This means that in addition to academic work, the lab also has its own products and business logic.

 

Take continuous conversation as an example. Many users have said that it is tiring to call "Tmall Genie" every time before speaking to it. Could users wake Tmall Genie once and then interact with it several times within a short period? The biggest technical challenge in continuous conversation is distinguishing the words a user addresses to Tmall Genie from those that are not meant for it. Two types of information are available for this: the semantic content of the user's words and the acoustic features of the user's voice, such as intensity, pauses, and direction.
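The idea of combining the two types of information can be illustrated with a deliberately simple scorer. The sketch below is a plain logistic function over hand-set weights, not a description of Tmall Genie's actual model; the feature names, weights, and threshold are all hypothetical values chosen for the example.

```python
import math

# Hypothetical, hand-set weights for illustration only.
# A real system would learn these from labeled data.
WEIGHTS = {
    "semantic": 2.5,   # how command-like the words are, 0..1
    "loudness": 1.5,   # normalized speech intensity, 0..1
    "facing": 2.0,     # 1.0 if the voice comes from the device's direction
    "pause": -1.0,     # seconds of silence since the last exchange
}
BIAS = -3.0


def directed_probability(features):
    """Combine semantic and acoustic evidence into one probability
    that the utterance is addressed to the device."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def is_directed_at_device(features, threshold=0.5):
    """Treat the utterance as a command only above the threshold."""
    return directed_probability(features) >= threshold


# A follow-up command spoken toward the device shortly after the last reply.
command = {"semantic": 0.9, "loudness": 0.8, "facing": 1.0, "pause": 0.5}
# Background chatter between family members, some time later.
chatter = {"semantic": 0.2, "loudness": 0.4, "facing": 0.1, "pause": 3.0}

print(is_directed_at_device(command))
print(is_directed_at_device(chatter))
```

Raising the threshold trades fewer false interruptions for more missed commands, which is why the false interruption rate is the metric to watch for this feature.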

 

After many brainstorming sessions, Tmall Genie's speech and semantic scientists jointly built a hybrid neural network that combines speech and semantic features. It incorporates long short-term memory (LSTM) networks, convolutional neural networks (CNN), and attention mechanisms, together with pre-trained language models. Through training on massive amounts of data, the deep network learns on its own to pick out the speech that belongs to the human-computer dialogue. As a result, users get convenient continuous interaction while the system achieves the lowest false interruption rate in the industry. Nie Zaiqing revealed that more than one million users have actively turned on this feature, making it a new dialogue mode for voice interaction.

 

However, everyone has different interests and hobbies, and several members of a family often share one Tmall Genie. In one earlier case, Tmall Genie recommended songs that the parents liked to their daughter.

 

Nie Zaiqing said that the voiceprint algorithm scientists and personalized recommendation scientists came up with a groundbreaking solution that does not rely on voiceprint registration: the acoustic features of voice commands are fed directly into the personalized recommendation deep learning model (a Transformer). This addresses the problem of recommendations for several users of one voice assistant being mixed together, which arises from low voiceprint registration rates and inaccurate voiceprint clustering. User survey data from a public blind evaluation shows that adding voiceprint features greatly reduces the mixing of interests in song recommendations, effectively solving the multi-user recommendation problem and increasing average usage time per user by 10%.
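The core idea, using the acoustic features of the command itself as a model input instead of a registered speaker identity, can be shown with a toy scorer. The sketch below replaces the Transformer with a simple dot product, and the two acoustic features, the song list, and all weights are hypothetical values invented for the illustration.

```python
# Hypothetical per-song weights over two acoustic features of the spoken
# command: (normalized pitch, normalized speaking rate), both centered at 0.
# A real model would learn its parameters from usage data.
SONG_WEIGHTS = {
    "nursery rhyme": (1.0, 0.2),
    "pop single": (0.3, 0.8),
    "classic ballad": (-0.8, -0.3),
}


def score(acoustic, weights):
    """Dot product between the command's acoustic features and a song's weights."""
    return sum(a * w for a, w in zip(acoustic, weights))


def recommend(acoustic):
    """Pick the best-scoring song for this voice command.
    No speaker identity or voiceprint registration is involved."""
    return max(SONG_WEIGHTS, key=lambda song: score(acoustic, SONG_WEIGHTS[song]))


# The same device and the same request, spoken by two different voices.
child_voice = (0.9, 0.5)
adult_voice = (-0.7, -0.2)

print(recommend(child_voice))
print(recommend(adult_voice))
```

Because the features come from each command, two family members sharing one device get different results without either of them enrolling a voiceprint.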

 

No longer just a hardware war

The battle over smart speakers is no longer just a hardware war. Expanding to more scenarios and adding innovative functions may matter more. At the Digital China Summit, Baidu CEO Robin Li said that smart homes, represented by smart speakers, can be seen as a new entry point for search in the AI era. They allow people to interact with machines in a more natural way and also serve as the entry point for information services in the home.

 

From a certain perspective, what smart speakers emphasize is no longer just the basic functions of a speaker. Compared with ordinary smart speakers, for example, models with screens generally add a display and a camera. They can therefore not only perform the original functions of a smart speaker, such as playing music, checking weather and news, and controlling smart home products, but also play videos, make video calls, and even integrate security functions.

 

Compared with Home Hub, Google's previous speaker with a screen, Nest Hub Max adds a wide-angle smart camera and a larger screen. Nest Hub Max supports online video viewing, home control, photo taking, security monitoring and video calls. Google said that Nest Hub Max is designed for shared spaces where family and friends gather.

 

The newly released Nest Hub Max also adds a Face Match feature. Face recognition of this kind is already common on mobile phones; on a smart speaker, it allows the device to present or push the specific services each family member needs in real time.

 

Google gave an example, saying, "When you walk into the kitchen in the morning, the smart assistant greets you with your schedule, commute details, weather, and other information you need for the day. When you get home from work, Nest Hub Max welcomes you home and shows the reminders and messages you need to deal with. The smart assistant provides personalized recommendations for music and TV shows, and you can even see who has left you a video message."

 

Robin Li mentioned that two years ago, Baidu launched the world's first smart speaker with a screen, Xiaoyu at Home, which further built on Baidu's earlier positioning in the video field. Xiaoyu at Home's cooperation with Baidu began in 2015. In 2017, the two jointly launched a smart speaker with a screen. In April 2017, they launched a new video call robot, "Fenshengyu", equipped with Baidu DuerOS. In March 2018, Baidu announced a strategic investment in Xiaoyu at Home, providing support in resources, funding, platforms and other areas. In February 2019, shipments of the Xiaodu at Home smart speaker with a screen exceeded those of the Xiaodu smart speaker without a screen for the first time.

 

"Just like playing chess requires taking the initiative, insisting on technological innovation will allow us to make the 'first move' instead of being a follower." In essence, Robin Li has promoted Baidu's smart speakers on many occasions in order to compete for a leading voice over the entry point to the smart home.

 

However, both smart speaker hardware and voice interaction technologies such as far-field recognition, speech recognition and semantic recognition still have many problems, including high false wake-up rates, unstable continuous dialogue, and poor semantic understanding. Some users said they hope the recognition rate will improve: "Nowadays, people buy smart speakers only to listen to music, use them as alarm clocks and so on, but mobile phone voice assistants can already do these things, and too few speakers can actually connect to and control home appliances."

 

Even in the United States, the biggest use of smart speakers is listening to music. A previous Nielsen report pointed out that almost all consumers (90%) use smart speakers to listen to music, while 68% listen to news; and about 81% of users use voice interaction to obtain real-time information, such as weather and traffic conditions.

 

David Mercer, Vice President at Strategy Analytics, said: “The question now is how to monetize the user base, and it will be interesting to see how each player responds to this challenge. The first step is to encourage consumers to use apps and services more widely and more frequently through smart speakers, which will bring revenue opportunities to device OEMs or platform providers. Voice shopping and ad insertion are very obvious ways.”
