Speech recognition is the ability of a device to respond to spoken commands, enabling hands-free control of a wide range of devices. The earliest applications of the technology were automated phone systems and medical dictation software. Today it is far more widespread, powering assistants such as Apple's Siri on smartphones and the voice commands in Tesla vehicles.
In the car, the biggest benefit of a voice assistant is that it lets the driver keep their eyes on the road and hands on the steering wheel while still making and receiving calls, selecting radio stations, setting navigation, or playing music. In-car voice assistants are now a standard feature in most vehicles.
The rise of voice assistants in cars
A car voice assistant is a voice-recognition control system that lets the driver operate vehicle functions and features by voice, such as climate control, entertainment settings and navigation, and can also be used for hands-free calling and sending text messages.
Honda was one of the first automakers to use voice recognition technology in its cars, offering a voice navigation system in 2004 that provided voice command and control of audio, DVD and in-car environment settings. The technology has since improved significantly, and today's in-car systems can accurately interpret the driver's commands and perform far more complex operations.
As early as March 2022, Volkswagen integrated Cerence's voice AI system, Cerence Drive 2.0, into the Volkswagen Golf 8 GTI. Launched in 2021, Cerence Drive 2.0 combines natural language understanding and text-to-speech in a single stack, making the in-car voice recognition system more responsive. The growing popularity of virtual assistants such as Siri, Alexa, Maluuba and Cortana has made everyday life more convenient, and people have become accustomed to controlling in-car functions by voice. The emergence of self-driving cars has further accelerated the development of automotive voice recognition systems.
According to Precedence Research, the global automotive voice recognition system market was valued at USD 2.89 billion in 2023 and is expected to exceed approximately USD 11.17 billion by 2032, growing at a CAGR of 16.20% during the forecast period 2023 to 2032.
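The quoted figures can be checked for internal consistency: compounding the 2023 value at the stated CAGR over the nine years from 2023 to 2032 should land near the 2032 forecast. A quick sketch:

```python
# Sanity-check the Precedence Research figures: USD 2.89B in 2023,
# compounding at 16.20% per year for 9 years (2023 -> 2032), should
# land close to the forecast USD 11.17B.
start_value = 2.89   # USD billion, 2023
cagr = 0.1620        # 16.20% per year
years = 9            # 2023 -> 2032

projected = start_value * (1 + cagr) ** years
print(f"Projected 2032 market size: USD {projected:.2f}B")  # ~11.16B
```

The result (about USD 11.16 billion) matches the published forecast to within rounding.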
Currently, market players in the voice recognition system market are investing heavily in biometrics and artificial intelligence technologies, which will provide more growth opportunities for the automotive voice recognition system market in the coming years.
Development trend of automotive voice recognition system market from 2022 to 2032 (Source: Precedence research)
Analysis results from Vynz research indicate that the automotive voice recognition market was valued at $2.81 billion in 2023 and is expected to reach $6.87 billion by 2030, growing at a CAGR of 16.41% during the forecast period 2025-2030.
The two firms' forecasts are closely aligned, which suggests the industry broadly shares optimistic expectations for the automotive voice recognition market.
Voice Recognition Technology in Automotive Innovation
In recent years, voice recognition technology has revolutionized the way consumers interact with their cars. From personalized voice interactions to hands-free operation that improves safety and the overall user experience, voice technology has become a catalyst for driving automotive innovation.
Tesla’s implementation of context-based voice commands represents a significant breakthrough in the application of automotive speech recognition technology, allowing users to interact with their Tesla vehicles more intuitively.
For example, a user can simply speak a destination and the vehicle will plan the driving route, simplifying navigation. The system can also understand the context of previous commands and has sophisticated natural language understanding capabilities, such as adjusting the in-car temperature by voice. This capability shows the potential of enhanced in-vehicle voice control and underscores the value of investing in advanced speech recognition technology.
Obtaining an accurate, clean speech signal has always been a major challenge for in-car voice assistants: road and wind noise, as well as multiple occupants speaking at once, degrade recognition accuracy. Most current in-car speech recognition systems use beamforming technology, which models the sound scene with a one-dimensional "direction of arrival" parameter.
In an enclosed space such as a car, however, sound waves bounce off windows and panels, so a microphone array is deployed to capture speech arriving from hundreds of directions before the scene is modeled. Even so, recognition accuracy has improved markedly, with error rates down to around 5% on vocabularies of tens of thousands of words.
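The core idea behind beamforming can be illustrated with a minimal delay-and-sum sketch (hypothetical array geometry and parameters, not any vendor's implementation): each microphone's signal is time-shifted so that a plane wave from the chosen direction of arrival adds coherently, while off-axis noise is attenuated.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_spacing, doa_deg, fs, c=343.0):
    """Steer a uniform linear mic array toward a direction of arrival (DOA).

    mic_signals: (n_mics, n_samples) array
    mic_spacing: distance between adjacent microphones, in meters
    doa_deg:     steering angle from broadside, in degrees
    fs:          sample rate in Hz; c: speed of sound in m/s
    """
    n_mics, n_samples = mic_signals.shape
    # Per-microphone arrival delay for a plane wave from doa_deg
    delays = np.arange(n_mics) * mic_spacing * np.sin(np.deg2rad(doa_deg)) / c
    delay_samples = np.round(delays * fs).astype(int)
    out = np.zeros(n_samples)
    for m in range(n_mics):
        out += np.roll(mic_signals[m], -delay_samples[m])  # align, then sum
    return out / n_mics

# Usage: a 3-mic array at 4 cm spacing, 16 kHz audio, steered to 30 degrees.
fs = 16000
t = np.arange(fs) / fs
source = np.sin(2 * np.pi * 440 * t)
# Toy per-mic delays of 0, 1 and 2 samples stand in for real propagation
mics = np.stack([np.roll(source, k) for k in range(3)])
aligned = delay_and_sum(mics, 0.04, 30.0, fs)
```

With these toy delays, the beamformer recovers the source exactly; real systems add fractional-delay filtering and adaptive weighting on top of this skeleton.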
Voice control provides a safe and convenient solution for controlling complex human-machine interface (HMI) functions in modern cars. Developers use the power of machine learning (ML) and speech modeling to add local voice control capabilities to applications such as automotive voice assistants using custom commands and multiple wake-up words.
NXP has a range of voice control and communication software and system solutions that provide high-quality, reliable embedded voice processing for human-to-human and human-to-machine voice applications. The newest member of its portfolio, Voice Intelligent Technology (VIT), is a comprehensive advanced voice control software solution available as a ready-made library in the MCUXpresso Software Development Kit (SDK). Built on advanced deep learning and speech recognition technology, VIT provides a complete far-field audio front end (AFE) supporting up to three microphones, an always-on wake-word engine and a voice command engine, plus online tools for generating customer-defined wake-word and voice-command models.
As mentioned earlier, implementing reliable device-side voice control is not an easy task. Developers must also select a high-performance signal-processing hardware platform and corresponding voice processing software, including the AFE beamformer and separate wake-word and voice command engines. VIT software is available on NXP i.MX edge processing platforms based on Arm Cortex-M7 and M33, Cadence Xtensa HiFi4 and Fusion F1 cores. Currently, the i.MX crossover MCU platforms that support VIT include:
• i.MX RT500 MCU (with M33, DSP and GPU cores)
• i.MX RT600 MCU (with M33 and DSP cores)
• i.MX RT1060 MCU (with M7 core)
• i.MX RT1160 MCU (with M7 and M4 cores)
• i.MX RT1170 MCU, featuring 1 GHz MCU (with M7 and M4 cores)
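The two-stage flow described above, where an always-on wake-word engine gates a more expensive command engine, can be sketched in a few lines. This is a hypothetical illustration of the control flow only (the wake words, command set and string matching below stand in for VIT's real neural detectors and APIs):

```python
# Hypothetical sketch of a device-side voice control flow: the always-on
# wake-word stage gates the command stage, so the command recognizer only
# runs after a wake word fires. Names and phrase sets are illustrative.

WAKE_WORDS = {"hey car"}                       # assumed custom wake words
COMMANDS = {"turn on ac", "open navigation"}   # assumed command set

def process_utterances(utterances):
    """Feed recognized phrases through the wake-word gate, then command match."""
    awake = False
    actions = []
    for text in utterances:
        if not awake:
            awake = text in WAKE_WORDS       # stage 1: wake-word engine
        elif text in COMMANDS:
            actions.append(text)             # stage 2: command engine
            awake = False                    # back to low-power listening
    return actions

print(process_utterances(["hello", "hey car", "turn on ac"]))  # ['turn on ac']
```

Note that a command spoken without a preceding wake word is ignored, which is what keeps the always-listening stage cheap enough to run continuously on an MCU.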
Among them, the i.MX RT500 crossover MCU is a dual-core microcontroller that uses the Arm Cortex-M33 core and the Cadence Xtensa Fusion F1 DSP, designed for low-power applications. The i.MX RT500 Cortex-M33 core runs at up to 275MHz and includes two coprocessors to provide higher performance. The Fusion DSP runs at up to 275MHz. The series offers rich peripherals, embedded security, and ultra-low power consumption, with up to 5MB SRAM and two FlexSPIs, each with 32KB cache.
The i.MX RT1170 crossover MCU integrates Arm Cortex-M7 and Cortex-M4 cores, combining real-time performance with high integration. Its Cortex-M7 runs at up to 1GHz, the Cortex-M4 at 400MHz, and it provides 2MB of on-chip RAM.
This real-time MCU provides a variety of memory interfaces and rich connectivity interfaces, including 3 high-speed Ethernet interfaces supporting TSN/AVB technology as well as UART, SPI, I2C, USB and 3 CAN-FD interfaces. In addition, i.MX RT1170 also enhances built-in security, including secure boot and encryption engine.
NXP i.MX RT1170 crossover MCU system block diagram supporting VIT software (Source: NXP)
Three major challenges of automotive speech recognition technology
Speech recognition technology has been around for a long time, and while the popularity of automotive voice assistants has steadily increased, there are three challenges you're likely to encounter when implementing and developing speech recognition technology:
1. Accuracy Challenge
Speech recognition systems (SRS) must be highly accurate to be useful and commercially valuable. According to a recent survey, 73% of respondents said that low accuracy is the main barrier to the adoption of speech recognition technology. When trying to improve the accuracy of speech recognition models, background noise has a significant impact.
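Accuracy in speech recognition is conventionally measured as word error rate (WER): the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the reference length. A minimal sketch (illustrative phrases, not real system output):

```python
# Word error rate (WER): edit distance between reference and hypothesis word
# sequences, divided by the reference length. A "5% error rate" is WER = 0.05.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of five reference words -> WER = 0.2
print(wer("set cabin temperature to twenty",
          "set carbon temperature to twenty"))  # 0.2
```

Background noise typically shows up as extra substitutions and insertions, which is why the mitigations below focus on the signal path before recognition.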
The problem can be attacked from three directions: first, understand the user's acoustic environment before developing the model, and choose a microphone with good directional pickup; second, apply a linear noise-reduction filter such as a Gaussian filter to smooth out noise; third, build a denoising algorithm that smooths the signal at audio input/output.
2. Challenges of language, accent and dialect coverage
Currently, no SRS covers every language, dialect, and accent. The most effective way to close this gap is to expand the training dataset: only a sufficiently large and diverse dataset can support AI/ML model training for an SRS.
3. Data privacy and security challenges
A person's voice recordings can serve as biometric data, which makes many people hesitant to use voice recognition technology. There is no perfect solution to this problem; the best companies can do is keep the application's data handling as transparent as possible and let users limit data collection through settings.