At the 11th Google I/O conference, Google released the Nest Hub Max, a smart speaker with a 10-inch screen, priced at US$229. It is also the first product the two teams have built together since Nest was folded into Google.
Many people still question whether smart speakers with screens serve any real purpose, but interaction through a screen is more intuitive than voice alone. Internet giants including Amazon, Google, Facebook and Baidu have all launched smart speakers with screens, using them as an entry point for delivering their existing services.
The latest research from market research firm Strategy Analytics shows that smart speakers were the hottest consumer electronics product of 2018. Shipments in the fourth quarter of 2018 rose 95% to 38.5 million units, more than were shipped in all of 2017. Smart speakers with screens accounted for more than 10% of total smart speaker shipments.
David Watkins, Director at Strategy Analytics, commented: “Smart speakers with screens, such as Google’s Home Hub, Amazon’s Echo Show and Baidu’s Xiaodu at Home, are popular with consumers, who are attracted by the combination of audio and video. Smart speakers with screens have more usage scenarios than just voice interaction. It is expected that by 2019, smart speakers with screens will become an important driving force for market growth.”
A vehicle for putting innovation into practice
"Tmall Genie, what's the weather like in Beijing today?" "It's sunny in Beijing today, 12℃~28℃, air quality index 30."
Anyone who owns a Tmall Genie speaker will be familiar with this exchange. Tmall Genie is an example of conversational artificial intelligence, and the whole human-computer interaction can be divided into four steps: wake-up, recognition, understanding, and feedback.
Take "What's the weather like in Beijing today?" Speech recognition converts the audio into text, keywords such as "today", "Beijing", and "weather" are extracted from that text, and the matching data is retrieved from a weather forecast service behind the scenes. The retrieved data is then assembled into natural-sounding speech through speech synthesis: "It's sunny in Beijing today, 12℃~28℃, air quality index 30."
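The four steps above can be sketched in a few lines of Python. This is only an illustrative toy, not Tmall Genie's actual implementation: it works on a text transcript instead of audio, it matches keywords instead of running a real language-understanding model, and the names `FAKE_WEATHER`, `Intent`, and `handle` are hypothetical stand-ins invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

WAKE_WORD = "tmall genie"

# Hypothetical stand-in for the weather service queried behind the scenes.
FAKE_WEATHER = {
    ("beijing", "today"): {"condition": "sunny", "low": 12, "high": 28, "aqi": 30},
}
KNOWN_CITIES = ("beijing", "shanghai")
KNOWN_DAYS = ("today", "tomorrow")


@dataclass
class Intent:
    topic: str
    city: str
    day: str


def is_awake(transcript: str) -> bool:
    """Step 1, wake-up: only react when the wake word comes first."""
    return transcript.lower().startswith(WAKE_WORD)


def recognize(transcript: str) -> str:
    """Step 2, recognition: stands in for speech-to-text by dropping the wake word."""
    return transcript[len(WAKE_WORD):].lstrip(" ,").strip()


def understand(text: str) -> Optional[Intent]:
    """Step 3, understanding: pull out keywords such as city, day, and topic."""
    lowered = text.lower()
    if "weather" not in lowered:
        return None
    city = next((c for c in KNOWN_CITIES if c in lowered), None)
    day = next((d for d in KNOWN_DAYS if d in lowered), "today")
    if city is None:
        return None
    return Intent(topic="weather", city=city, day=day)


def respond(intent: Intent) -> str:
    """Step 4, feedback: look up the data and assemble a sentence to be spoken."""
    data = FAKE_WEATHER.get((intent.city, intent.day))
    if data is None:
        return "Sorry, I don't have that forecast."
    return (
        f"It's {data['condition']} in {intent.city.title()} {intent.day}, "
        f"{data['low']}℃~{data['high']}℃, air quality index {data['aqi']}."
    )


def handle(transcript: str) -> Optional[str]:
    """Run the four steps in order; return None if the device was never woken."""
    if not is_awake(transcript):
        return None
    intent = understand(recognize(transcript))
    if intent is None:
        return "Sorry, I didn't understand that."
    return respond(intent)


if __name__ == "__main__":
    print(handle("Tmall Genie, what's the weather like in Beijing today?"))
```

In a real device each function would be replaced by a dedicated model or service; the point here is only the order in which the stages hand data to each other.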
"The reason I joined Alibaba is that I prefer work related to the implementation of technology," said Nie Zaiqing, head of Tmall Genie voice technology, in an interview with China Business News. He said that the research projects he led while working at Microsoft Research Asia (Human Cube, Microsoft Academic Search, LUIS) were more focused on the combination of innovation and actual technology implementation.
After joining Alibaba's Artificial Intelligence Lab, what impressed him most was the speed of innovation and the close coordination between cutting-edge technology and products. Since joining Alibaba on October 9, 2017, Nie Zaiqing has been responsible for the research and development of the Tmall Genie voice assistant algorithms.
Alibaba's AI Lab is not a pure research department. It is closely tied to the business, and even its cutting-edge research is carried out with future commercial products in mind. This means that in addition to academic work, the lab also has its own products and business logic.
Take continuous conversation as an example. Many users have said that it is tiring to call out "Tmall Genie" before every request. Could users wake Tmall Genie once and then interact with it several times within a short period? The biggest technical challenge in continuous conversation is telling apart the words the user addresses to Tmall Genie from the words that are meant for someone else. Two types of information are available for this: the semantic content of what the user says and the acoustic features of the user's voice, such as volume, pauses, and direction.
After many brainstorming sessions, Tmall Genie's speech and semantics scientists jointly built a hybrid neural network that combines speech and semantic features. It incorporates long short-term memory networks (LSTM), convolutional neural networks (CNN) and attention mechanisms, together with pre-trained language models. Trained on massive amounts of data, the deep network learns by itself to pick out the speech that is actually addressed to the device. As a result, users get convenient continuous interaction while, according to the team, the false interruption rate is the lowest in the industry. Nie Zaiqing revealed that more than one million users have actively turned on this feature, making it a new dialogue mode for voice interaction.
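To make the idea of combining semantic and acoustic cues concrete, here is a deliberately simplified Python sketch. It is not the hybrid LSTM/CNN/attention network described above: it swaps in a hand-weighted logistic score, and the feature names, the `COMMAND_WORDS` list, and every weight are assumptions made up for illustration only.

```python
import math
from dataclasses import dataclass

# Hypothetical vocabulary that hints an utterance is a command to the speaker.
COMMAND_WORDS = {"play", "stop", "volume", "weather", "alarm", "next"}


@dataclass
class Utterance:
    text: str
    loudness: float       # assumed normalized volume, 0.0 to 1.0
    pause_before: float   # assumed seconds of silence before the speech
    facing_device: float  # assumed direction score, 0.0 to 1.0


def semantic_score(text: str) -> float:
    """Fraction of words that look like device commands (crude semantic cue)."""
    words = [w.strip(",.?!") for w in text.lower().split()]
    if not words:
        return 0.0
    return sum(w in COMMAND_WORDS for w in words) / len(words)


def directed_probability(u: Utterance) -> float:
    """Fuse semantic and acoustic cues with illustrative, hand-picked weights."""
    z = (
        -3.0
        + 4.0 * semantic_score(u.text)
        + 1.5 * u.loudness
        + 0.5 * min(u.pause_before, 2.0)
        + 2.0 * u.facing_device
    )
    return 1.0 / (1.0 + math.exp(-z))


def is_for_device(u: Utterance, threshold: float = 0.5) -> bool:
    """Decide whether the speaker should respond without a new wake word."""
    return directed_probability(u) >= threshold


if __name__ == "__main__":
    command = Utterance("play the next song", 0.8, 1.0, 0.9)
    chatter = Utterance("did you finish your homework", 0.4, 0.1, 0.1)
    print(is_for_device(command), is_for_device(chatter))
```

A learned model replaces the fixed weights with parameters fitted to data, but the decision has the same shape: both kinds of evidence feed a single score that determines whether the device should answer.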
However, everyone has different interests and hobbies, and several family members often share one Tmall Genie. In the past, Tmall Genie would sometimes recommend songs the parents liked to their daughter.
Nie Zaiqing said that the voiceprint algorithm scientists and personalized recommendation scientists came up with a groundbreaking solution that does not rely on voiceprint registration: the acoustic features of each voice command are fed directly into the personalized recommendation deep learning model (a Transformer). This addresses the problem of recommendations getting mixed up when several people share one voice assistant, a problem made worse by low voiceprint registration rates and inaccurate voiceprint clustering. User survey data from a public blind evaluation shows that adding voiceprint features greatly reduces the mixing of interests in song recommendations and increases average usage time per user by 10%.
No longer just a hardware war
The smart speaker battle is no longer just a hardware war. Support for more scenarios and the addition of innovative functions may matter more. At the Digital China Summit, Baidu CEO Robin Li said that smart homes, represented by smart speakers, can be seen as a new gateway to search in the AI era. They allow people to interact with machines in a more natural way and also serve as the gateway to information services in the home.
In other words, what smart speakers emphasize goes beyond the basic functions of a speaker. Compared with ordinary smart speakers, speakers with screens generally add a display and a camera. They can therefore not only handle the original smart speaker functions, such as playing music, checking the weather and news, and controlling smart home products, but also play videos, make video calls, and even take on home security functions.
Compared with Google's previous speaker with screen, Home Hub, Nest Hub Max also adds a wide-angle smart camera and a larger screen size. Nest Hub Max can realize functions such as online video viewing, home control, photo taking, security monitoring and video calls. Google said that Nest Hub Max is specially designed for shared places for family and friends to gather.
The newly released Nest Hub Max also adds a Face Match feature. Face recognition is already common for unlocking mobile phones; on a smart speaker, it allows the device to present or push the specific services each family member needs in real time.
Google gave an example, saying, "When you walk into the kitchen in the morning, the smart assistant greets you with your schedule, commute details, weather, and other information you need for the day. When you get home from work, Hub Max welcomes you home and shows the reminders and messages waiting for you. The smart assistant provides personalized recommendations for music and TV shows, and you can even see who has left you a video message."
Robin Li mentioned that two years ago, Baidu launched the world's first smart speaker with a screen, Xiaoyu at Home, which further activated Baidu's previous layout in the video field. Xiaoyu at Home's cooperation with Baidu began in 2015. In 2017, they jointly launched a smart speaker with a screen. In April 2017, they launched a new video call robot "Fenshengyu" equipped with Baidu DuerOS. In March 2018, Baidu announced a strategic investment in Xiaoyu at Home, providing support in terms of resources, funds, platforms, etc. In February 2019, the shipment volume of Xiaodu at Home's smart speaker with a screen exceeded that of Xiaodu smart speaker without a screen for the first time.
"Just like playing chess requires taking the initiative, insisting on technological innovation will allow us to make the 'first move' instead of being a follower." In essence, Robin Li's repeated promotion of Baidu's smart speakers is aimed at winning a leading position at the gateway to the smart home.
However, both smart speaker hardware and voice interaction technologies such as far-field recognition, speech recognition and semantic recognition still have many problems, including high false wake-up rates, unstable continuous dialogue, and poor semantic understanding. Some users said they hope the recognition rate will improve: "Nowadays, people buy smart speakers only to listen to music or use them as alarm clocks, but a mobile phone's voice assistant can do all of that, and too few speakers can actually connect to and control home appliances."
Even in the United States, the biggest use of smart speakers is listening to music. A previous Nielsen report pointed out that almost all consumers (90%) use smart speakers to listen to music, while 68% listen to news; and about 81% of users use voice interaction to obtain real-time information, such as weather and traffic conditions.
David Mercer, Vice President at Strategy Analytics, said: “The question now is how to monetize the user base, and it will be interesting to see how each player responds to this challenge. The first step is to encourage consumers to use apps and services more widely and more frequently through smart speakers, which will bring revenue opportunities to device OEMs or platform providers. Voice shopping and ad insertion are very obvious ways.”