Next-generation voice recognition: Technology that could revolutionize the in-car experience

Publisher: DazzlingGaze. Latest update time: 2024-07-18. Source: Mouser Electronics

Speech recognition is the ability of a device to understand and respond to voice commands, enabling hands-free control of a wide range of devices. The earliest applications of the technology were automated phone systems and medical dictation software. Today it is widely used in smartphones and cars; Apple's Siri and the voice commands in Tesla vehicles are familiar examples.


In the car, the biggest benefit of a voice assistant is that the driver can keep their eyes on the road and their hands on the steering wheel while still making and receiving calls, selecting radio stations, setting navigation, or playing music. In-car voice assistants are now a standard feature in most vehicles.


The rise of voice assistants in cars


A car voice assistant is a voice recognition control system that lets the driver operate vehicle functions by voice, such as climate control, entertainment settings, and navigation, as well as hands-free calling and text messaging.


Honda was one of the first automakers to use voice recognition technology in its cars, offering a voice navigation system in 2004 that supported voice command and control of the audio, DVD, and climate functions. In-car voice recognition has improved significantly since then; today's systems can accurately interpret the driver's commands and perform more complex operations.


As early as March 2022, Volkswagen chose to integrate Cerence Drive 2.0, Cerence's voice AI system, into the Volkswagen Golf 8 GTI. Launched in 2021, Cerence Drive 2.0 integrates functions such as natural language understanding and text-to-speech into one stack, making the in-car voice recognition system more responsive. The growing popularity of virtual voice assistants such as Siri, Alexa, Maluuba, and Cortana has made everyday life more convenient and has accustomed people to controlling in-car applications by voice. The emergence of self-driving cars has also strongly promoted the development of automotive voice recognition systems.


According to Precedence Research, the global automotive voice recognition system market was valued at USD 2.89 billion in 2023 and is expected to reach approximately USD 11.17 billion by 2032, growing at a CAGR of 16.20% during the forecast period from 2023 to 2032.
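The quoted growth rate can be sanity-checked from the two endpoint values. The sketch below is only an illustration of the standard compound annual growth rate formula; the function names `cagr` and `project` are made up for this example and do not come from the research firm.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.16 means 16%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value after compounding `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


# Figures quoted above: USD 2.89 billion (2023) and USD 11.17 billion (2032).
rate = cagr(2.89, 11.17, 2032 - 2023)
print(f"Implied CAGR: {rate:.2%}")
```

With these inputs the implied rate comes out at roughly 16.2%, which is consistent with the published figure.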


Market players are currently investing heavily in biometrics and artificial intelligence technologies, which should create more growth opportunities for the automotive voice recognition system market in the coming years.



Development trend of the automotive voice recognition system market from 2022 to 2032 (Source: Precedence Research)


Analysis from VynZ Research indicates that the automotive voice recognition market was valued at $2.81 billion in 2023 and is expected to reach $6.87 billion by 2030, growing at a CAGR of 16.41% during the forecast period from 2025 to 2030.


The two firms forecast very similar growth rates, which suggests that the industry as a whole has good expectations for the automotive voice recognition market.


Voice Recognition Technology in Automotive Innovation


In recent years, voice recognition technology has revolutionized the way consumers interact with their cars. From personalized voice interactions to hands-free operation that improves safety and the overall user experience, voice technology has become a catalyst for driving automotive innovation.


Tesla’s implementation of context-based voice commands represents a significant breakthrough in the application of automotive speech recognition technology, allowing users to interact with their Tesla vehicles more intuitively.


For example, users can simply speak their destination and the vehicle will plan a route, simplifying navigation. The system also has sophisticated natural language understanding and can interpret a command in the context of previous ones, for instance when adjusting the cabin temperature by voice. This capability shows the potential of enhanced in-vehicle voice control and underlines the importance of investing in advanced voice recognition technology.


Obtaining accurate and clear speech signals has always been a major challenge for in-car voice assistants. Road and wind noise, as well as several occupants speaking at once, can reduce recognition accuracy. Most current in-car speech recognition systems use beamforming technology, which models the sound scene using a one-dimensional "direction of arrival" parameter.


However, in enclosed spaces such as cars, sound waves bounce off windows and panels, so a single direction is not enough: a microphone array is deployed to capture speech signals arriving from hundreds of directions before modeling. Today, the accuracy of speech recognition has improved significantly, with the error rate down to around 5% for a vocabulary of tens of thousands of words.
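To make the direction-of-arrival idea concrete, here is a minimal delay-and-sum beamformer sketch for a uniform linear microphone array. It is an illustration only and is not taken from any production system: the function names, the 5 cm spacing, the 16 kHz sample rate, and the rounding of delays to whole samples are all assumptions. It models a single direct-path plane wave and ignores the cabin reflections mentioned above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def steering_delays(num_mics, spacing_m, doa_deg, sample_rate):
    """Whole-sample delay per microphone for a plane wave arriving at
    doa_deg (measured from broadside) on a uniform linear array."""
    delays = []
    for m in range(num_mics):
        tau = m * spacing_m * math.sin(math.radians(doa_deg)) / SPEED_OF_SOUND
        delays.append(round(tau * sample_rate))
    base = min(delays)  # shift so the earliest microphone has delay 0
    return [d - base for d in delays]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay so the target direction lines up,
    then average the aligned samples."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    out = []
    for n in range(length):
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        out.append(total / len(channels))
    return out


# Hypothetical setup: 3 microphones, 5 cm apart, speaker at 30 degrees, 16 kHz.
delays = steering_delays(3, 0.05, 30.0, 16000)

# Simulate the same source reaching each microphone with its own delay.
source = [0, 1, 2, 3, 4, 5, 6, 7]
pad = max(delays)
channels = [[0] * d + source + [0] * (pad - d) for d in delays]
aligned = delay_and_sum(channels, delays)
```

Signals from the steered direction add coherently after alignment, while sound from other directions is averaged down. Real systems typically use fractional delays and adaptive weights rather than this fixed, integer-delay version.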


Voice control provides a safe and convenient way to operate the complex human-machine interface (HMI) functions in modern cars. Developers use machine learning (ML) and speech modeling to add local voice control, with custom commands and multiple wake-up words, to applications such as automotive voice assistants.
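The interplay between wake-up words and custom commands can be sketched as a small two-stage state machine. This is a toy illustration, not the API of any vendor library: the class `VoiceController`, its parameters, and the example phrases and action names are all hypothetical, and already-recognized text phrases stand in for the acoustic models a real system would run.

```python
class VoiceController:
    """Toy two-stage controller: idle until a wake word is heard,
    then accept one command within a limited number of phrases."""

    def __init__(self, wake_words, commands, timeout_phrases=3):
        self.wake_words = {w.lower() for w in wake_words}
        self.commands = {c.lower(): action for c, action in commands.items()}
        self.timeout_phrases = timeout_phrases
        self.remaining = 0  # > 0 means we are listening for a command

    def feed(self, phrase):
        """Process one recognized phrase; return an action name or None."""
        phrase = phrase.lower().strip()
        if self.remaining == 0:
            # Idle: only a wake word can open the command window.
            if phrase in self.wake_words:
                self.remaining = self.timeout_phrases
            return None
        # Listening: each phrase uses up part of the window.
        self.remaining -= 1
        action = self.commands.get(phrase)
        if action is not None:
            self.remaining = 0  # command handled, go back to idle
        return action


controller = VoiceController(
    wake_words=["hey car", "hello auto"],
    commands={"turn on ac": "CLIMATE_ON", "next track": "MEDIA_NEXT"},
)
ignored = controller.feed("turn on ac")   # no wake word yet
controller.feed("hey car")                # opens the command window
handled = controller.feed("Turn on AC")   # matched case-insensitively
```

Keeping the wake-word stage separate means the more expensive command matching only runs after the assistant has been addressed, which is one reason the two engines are usually described separately.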


NXP has a range of voice control and communication software and system solutions that provide high-quality, reliable embedded voice processing for human-to-human and human-to-machine voice applications. A new member of the portfolio, Voice Intelligent Technology (VIT), is a comprehensive voice control software solution available as a ready-made software library in the MCUXpresso Software Development Kit (SDK). Based on advanced deep learning and speech recognition technology, the VIT software provides a complete far-field audio front end (AFE) that supports up to three microphones, an always-on wake-up word engine, and a voice command engine, as well as online tools for generating customer-defined wake-up word and voice command models.


As mentioned earlier, implementing reliable on-device voice control is not an easy task. Developers need to select a high-performance signal processing hardware platform and the corresponding voice processing software, including an AFE beamformer and separate wake-up word and voice command engines. VIT software is available on NXP i.MX edge processing platforms based on Arm Cortex-M7 and M33, Cadence Xtensa HiFi4, and Fusion F1 cores. Currently, the i.MX crossover MCU platforms that support VIT include:


• i.MX RT500 MCU (with M33, DSP and GPU cores)


• i.MX RT600 MCU (with M33 and DSP cores)


• i.MX RT1060 MCU (with M7 core)


• i.MX RT1160 MCU (with M7 and M4 cores)


• i.MX RT1170 MCU, featuring 1 GHz MCU (with M7 and M4 cores)


Among them, the i.MX RT500 crossover MCU is a dual-core microcontroller that uses the Arm Cortex-M33 core and the Cadence Xtensa Fusion F1 DSP, designed for low-power applications. The i.MX RT500 Cortex-M33 core runs at up to 275MHz and includes two coprocessors to provide higher performance. The Fusion DSP runs at up to 275MHz. The series offers rich peripherals, embedded security, and ultra-low power consumption, with up to 5MB SRAM and two FlexSPIs, each with 32KB cache.


Another, the i.MX RT1170 crossover MCU, integrates Arm Cortex-M7 and Arm Cortex-M4 cores and offers real-time performance and high integration. The Cortex-M7 runs at up to 1GHz, the Cortex-M4 runs at 400MHz, and the device has 2MB of on-chip RAM.


This real-time MCU provides a variety of memory interfaces and rich connectivity, including 3 high-speed Ethernet interfaces supporting TSN/AVB technology as well as UART, SPI, I2C, USB, and 3 CAN-FD interfaces. In addition, the i.MX RT1170 enhances built-in security with features such as secure boot and an encryption engine.



NXP i.MX RT1170 crossover MCU system block diagram supporting VIT software (Source: NXP)


Four major challenges of automotive speech recognition technology


Speech recognition technology has been around for a long time, and while the popularity of automotive voice assistants has steadily increased, there are four challenges you’re likely to encounter when implementing and developing speech recognition technology:


1. Accuracy Challenge


Speech recognition systems (SRS) must be highly accurate to be useful and commercially valuable. According to a recent survey, 73% of respondents said that low accuracy is the main barrier to the adoption of speech recognition technology. Background noise is one of the biggest obstacles to improving the accuracy of speech recognition models.
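Accuracy of this kind is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words. The sketch below is a generic illustration of that metric using word-level edit distance; the function name and the example phrases are made up for this example and are not drawn from the survey or from any specific product.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical in-car commands and recognizer outputs.
perfect = word_error_rate("set temperature to twenty degrees",
                          "set temperature to twenty degrees")
one_missing = word_error_rate("navigate to the nearest charging station",
                              "navigate to nearest charging station")
one_wrong = word_error_rate("play some music", "play sum music")
```

Because insertions count as errors, WER can exceed 100% for very poor output, so it is a rate relative to the reference length rather than a strict percentage of words wrong.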


The noise problem can be approached from three angles: first, understand the user's environment before developing the model and choose a microphone with good directionality; second, apply a linear noise reduction filter, such as a Gaussian filter, to smooth the noise; third, build a denoising algorithm that cleans the signal at the audio input and output.
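The second approach, a linear Gaussian smoothing filter, can be sketched in a few lines. This is a minimal illustration under assumed parameters (a kernel radius of 2 samples and a sigma of 1.0); the function names are hypothetical, and a real audio pipeline would choose the kernel from the sample rate and the noise spectrum.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, kernel):
    """Convolve signal with kernel, repeating edge samples at the borders."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(n + k - radius, 0), last)  # clamp to valid range
            acc += w * signal[idx]
        out.append(acc)
    return out


kernel = gaussian_kernel(radius=2, sigma=1.0)
noisy = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]  # a single noise spike
smoothed = smooth(noisy, kernel)
```

A Gaussian filter is a low-pass filter, so it attenuates sharp spikes and high-frequency noise, but it also softens the high-frequency parts of speech. That trade-off is one reason it is usually combined with the other two measures rather than used alone.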


2. Challenges of language, accent and dialect coverage


Currently, no SRS covers all languages, dialects, and accents. An effective way to overcome this challenge is to expand the dataset: only a large enough dataset can train the AI/ML models behind an SRS to handle them.


3. Data privacy and security challenges


A person's voice recordings can serve as biometric data, so many people are hesitant to use voice recognition technology. There is no complete solution to this problem yet. What companies can do is keep the application as transparent as possible and let users limit data collection through settings.
