NXP has released a speech recognition engine as part of its new generation of intelligent voice technology. In this blog post, we'll explore the challenges developers face in designing embedded voice controls, introduce our new Speech to Intent engine, and show how you can use it in your applications.
Hearing Your Voice: Challenges of Voice Commands in Embedded Systems
With companies like Amazon, Google, and Apple launching revolutionary smart speakers, embedded voice-controlled devices have become a hot trend, even though the underlying technology has been around for years. With these smart speakers, end users experienced the convenience, practicality and intuitiveness of a voice-first device for the first time. Voice is the user interface (UI) of these devices and their most important, or only, mode of interaction. By leveraging natural language understanding technology in the cloud, smart speakers let end users communicate with their devices in natural language, so that requests, queries, and commands can be understood and responded to.
Implementing natural language processing presents designers and end users with several challenges, such as the need for stable, reliable network connections and the high power consumption of an always-on, always-listening device, not to mention the privacy risks that such connected devices can bring.
To address the speech engine challenges in embedded design, NXP has launched the VIT Speech to Intent engine, the latest product in its Intelligent Voice Technology (VIT) portfolio. Learn more about VIT S2I.
Local voice control vs. cloud-based voice control
To make a device voice-controlled, engineers typically have three options: process speech locally, process it in the cloud, or combine the two, which we call "hybrid processing." With local voice control, end devices process all voice audio locally at the edge without having to connect to the cloud or remote servers for secondary processing. Cloud-based processing uses the computing power of the cloud to process voice audio, and then transmits the response generated by the cloud back to the device over the network. In hybrid processing, a local wake word engine typically wakes the device (for example, on "Hey NXP"), and all voice commands following that wake word are streamed to the cloud or a remote server for processing.
Local processing has the advantages of low latency, low power consumption, and network independence, but it typically only supports basic keywords and commands that require precise wording. For example, turning on the lights might require the exact phrase "Hey, NXP (wake word), turn on the lights (voice command)" and there can be no variations.
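To make the "precise wording" limitation concrete, here is a minimal sketch of exact-phrase command matching. It is not NXP's implementation and does not use the VIT API; the command table, the action names and the `match_command` helper are hypothetical, and the input is assumed to be text that has already been transcribed.

```python
from typing import Optional

# Hypothetical command table: phrases and action names are illustrative only.
COMMANDS = {
    "turn on the lights": "LIGHTS_ON",
    "turn off the lights": "LIGHTS_OFF",
}

WAKE_WORD = "hey nxp"


def match_command(utterance: str) -> Optional[str]:
    """Return an action only for wake word + an exact command phrase."""
    # Normalize case, commas and repeated whitespace before comparing.
    text = " ".join(utterance.lower().replace(",", " ").split())
    if not text.startswith(WAKE_WORD + " "):
        return None  # No wake word: the command is ignored.
    command = text[len(WAKE_WORD) + 1:]
    # Dictionary lookup: any variation in wording fails to match.
    return COMMANDS.get(command)


print(match_command("Hey, NXP, turn on the lights"))         # LIGHTS_ON
print(match_command("Hey, NXP, please turn on the lights"))  # None
```

Adding a single extra word such as "please" is enough to miss the table entry, which is the behavior the paragraph above describes.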
For cloud processing and hybrid systems, the use of cloud services increases latency but offers the advantage of being able to run extremely complex algorithms, including natural language understanding models. Returning to the example of turning on the lights, the user can phrase the request in almost any way and the system can still infer the required action from context, such as "It is dark here, please turn on the lights."
As mentioned earlier, a major drawback of cloud-based natural language processing is the security and privacy risk it introduces. Simply put, this approach works by sending the voice audio stream over the network to a remote server for processing, which means that an accidental activation can transmit unrelated audio to the cloud. These audio streams may include personal conversations, credentials, or other sensitive information.
Introduction to NXP Intelligent Voice Technology (VIT) Speech to Intent (S2I) Engine
To address the speech engine challenges in embedded design, NXP has launched the VIT Speech to Intent engine, the latest product of its Intelligent Voice Technology (VIT) product portfolio. The S2I engine is the high-end product of VIT's product portfolio, which also includes the free wake word engine (WWE) and voice command engine (VCE).
Unlike systems that rely on remote cloud services, VIT S2I is able to determine natural language intent locally. This capability is made possible by NXP's latest developments in neural network algorithms and machine learning models designed for embedded systems. As a result, the intent "turn on the lights" can be expressed in many different ways, such as "turn on the lights", "it's too dark" and "can you make the light brighter".
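The idea of many phrasings resolving to one intent can be illustrated with a toy example. The sketch below is a keyword-rule stand-in, not the neural network approach the VIT S2I engine uses, and it does not call any NXP API; the `INTENT_RULES` table and the `infer_intent` function are hypothetical names chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical rules: an intent fires when every word of any one of its
# keyword sets appears in the utterance. A real engine would use a trained
# model rather than hand-written keywords.
INTENT_RULES = {
    "LIGHTS_ON": [{"turn", "on", "lights"}, {"dark"}, {"light", "brighter"}],
    "LIGHTS_OFF": [{"turn", "off", "lights"}, {"too", "bright"}],
}


def infer_intent(utterance: str) -> Optional[str]:
    """Map a free-form utterance to an intent name, or None if nothing fits."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for intent, keyword_sets in INTENT_RULES.items():
        # Subset test: all keywords in the set must be present.
        if any(keywords <= words for keywords in keyword_sets):
            return intent
    return None


for phrase in ("turn on the lights", "it's too dark",
               "can you make the light brighter"):
    print(phrase, "->", infer_intent(phrase))  # each prints LIGHTS_ON
```

The caller only ever sees the intent name, so application code stays the same no matter which phrasing the user chose.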
This Speech to Intent capability enables users to interact with embedded systems more naturally while avoiding the latency and power consumption associated with cloud-connected systems. Eliminating cloud services also helps improve security and privacy, because all voice audio is processed locally on the device. In addition, pairing VIT S2I with the NXP wake word engine makes ultra-low power designs possible: the VIT S2I engine starts processing voice commands only after a specific wake word has been heard.
NXP devices supporting VIT S2I include the Arm® Cortex®-M based i.MX RT crossover MCUs and RW61x MCUs, as well as the Cortex-A based i.MX 8M Mini, i.MX 8M Plus and i.MX 9x applications processors. VIT S2I, which will launch by the end of 2023, currently supports English, Mandarin and Korean. Online development tools for creating custom commands and training models are planned for release in 2024.
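The wake word gating described above can be sketched as a two-state pipeline. This is an illustrative model only, under the assumption that utterances arrive one at a time as text; the `WakeGatedPipeline` class and both stand-in detector functions are hypothetical and unrelated to the actual VIT interfaces.

```python
from typing import Callable, Optional


class WakeGatedPipeline:
    """Run the (expensive) intent engine only right after a wake word."""

    def __init__(self,
                 is_wake_word: Callable[[str], bool],
                 infer_intent: Callable[[str], Optional[str]]) -> None:
        self.is_wake_word = is_wake_word
        self.infer_intent = infer_intent
        self.awake = False
        self.intent_engine_calls = 0  # Counts how often the engine ran.

    def feed(self, utterance: str) -> Optional[str]:
        if not self.awake:
            # Asleep: only the lightweight wake word check runs.
            self.awake = self.is_wake_word(utterance)
            return None
        # Awake: process exactly one command, then go back to sleep.
        self.awake = False
        self.intent_engine_calls += 1
        return self.infer_intent(utterance)


# Stand-in detectors for the demonstration (assumptions, not real engines).
pipeline = WakeGatedPipeline(
    is_wake_word=lambda u: u.strip().lower() == "hey nxp",
    infer_intent=lambda u: "LIGHTS_ON" if "dark" in u.lower() else None,
)

heard = ["a private conversation", "Hey NXP", "it's too dark", "more chatter"]
results = [pipeline.feed(u) for u in heard]
print(results)                       # [None, None, 'LIGHTS_ON', None]
print(pipeline.intent_engine_calls)  # 1
```

Of the four utterances, only the one that follows the wake word reaches the intent engine, which is where the power saving in such a design would come from.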
VIT Speech to Intent block diagram
How VIT Speech to Intent can add speech capabilities to your next design
The Internet of Things is evolving rapidly, and VIT S2I can adapt to a wide range of application scenarios, including home automation, wearable electronics, automotive telematics and building access control. Consumers like to use natural language to control basic functions of their devices hands-free, and edge voice processing that eliminates cloud services not only reduces system latency but also reduces privacy and security risks.
For devices that require a voice-first user interface, VIT S2I is a key building block. It can be used in smart thermostats, smart appliances, home automation, lighting control, sunshade control and other areas. VIT S2I is also suitable for wearables and fitness devices, where use cases include setting reminders, controlling Bluetooth devices and monitoring health.
Enhance your applications with NXP's VIT portfolio
If you want to develop with NXP's intelligent voice technology portfolio, you are welcome to use our free VIT wake word and voice command engines, available through the MCUXpresso SDK and online model tools. These engines allow you to easily customize wake words and basic voice control, making them suitable for rapid prototyping and development cycles that don't involve natural language understanding. If your application requires more natural language understanding capabilities, contact your local NXP representative to get started with VIT Speech to Intent.
Learn more about NXP’s speech processing portfolio and watch our VIT Speech to Intent demo.
author:
Chris Welsh
Director, IoT Voice and Audio Business Development, Edge Processing Business Unit
Previously a partner at Retune DSP, Chris joined NXP through its acquisition of the company in 2021. Chris focuses on creating value for customers through differentiated voice software technology and services. Chris brings more than 25 years of experience in the embedded voice and audio business to NXP, having held roles in engineering, business development and senior management, and having served as a founder and general manager. Chris holds a bachelor's degree in mechanical engineering from Purdue University and a master's degree in acoustics from Pennsylvania State University.