Speech is an effective way for people to express their thoughts and desires. Before the advent of the industrial age, humans discovered that animals could be trained to recognize and respond to basic commands that allowed them to perform certain tasks.
The next logical step was to find a way of communicating with machines using sound signals and directing their actions in the same way. Voice and audio have become increasingly popular as control interfaces for electronic devices in recent years, and the technology continues to evolve to meet user expectations and the requirements of new applications.
In this article, we will explain the benefits of controlling electronic devices and machines with voice and audio signals and review how to achieve this control. We will also show how such control interfaces can now be embedded in offline devices and how the audio control experience they provide can be greatly improved.
Controlling electronic devices with your voice
There are several obvious benefits to using voice control to interact with machines:
● For humans, speech is an intuitive form of communication, and instructions are easier to convey verbally.
● Voice communication remains possible when a person's eyes and hands are occupied with other things. Real-time voice control is also convenient in scenarios such as driving, where it may be illegal to operate other devices in the car by touch.
● Voice is an effective medium for controlling machines. A voice-controlled machine can listen continuously and respond without the user having to enter complex instructions.
● Integrating voice control can reduce the need for a touchscreen. This is especially useful for remote or portable battery-powered devices, where reducing size and power consumption are common design challenges. For applications with multiple users, removing touchscreen controls is also more hygienic.
● Touchscreen control may not be a realistic option for some people with disabilities, so voice can be an effective assistive tool (Figure 1). Voice interaction can be used to perform tasks such as opening a door or remotely reporting a person's recent health status.
Figure 1: Voice-controlled robot assistant. (Source: PaO_STUDIO via Shutterstock)
The audio front end (AFE) of a voice-controlled device consists of a microphone array and a signal processing module. The AFE processes the signals from the multi-channel microphone array, applying a range of signal processing algorithms to remove background noise and interference from the device's own playback. The cleaned signal is then sent to a "wake-word" detection engine, which is pre-programmed on the device to recognize words such as "Alexa" or "OK Google." The components of the voice control solution include:
Microphone arrays: Voice-activated systems require one or more microphones to capture audio control signals. When selecting a microphone array, important considerations include size, cost, performance, and robustness. Combining the signals from a multi-microphone array in an optimized way helps improve the signal-to-noise ratio (SNR) of the audio signal chain.
Direction of Arrival (DoA) detector: Determines the user's position relative to the controlled device so that the microphone array can steer its beam toward the direction of the speech.
Beamformer: Accepts sound arriving from the direction identified by the DoA detector while rejecting sound from other directions. Its performance depends on factors such as the geometry of the microphone array, SNR, beam width, and background noise level.
Acoustic Echo Canceller (AEC): Removes the playback signal of the device's own speaker (for example, when a voice command is received while music is playing on the device) so that the user's voice command can be picked up clearly.
Adaptive Interference Canceller (AIC): Cancels external noise from other sound sources that is difficult to eliminate with traditional beamformers, such as loud noise generated by other devices.
Wake-word detector: Compares the processed voice signal from the AFE to a library of wake words, such as "Hey Google," using a wake-word detection algorithm, which is usually part of a machine learning model. Larger models are generally more accurate: a 1MB trained model, for example, is more accurate than a 64kB model but is more processing intensive. A larger model detects the wake word more reliably and therefore reduces the number of false alarms.
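To make the DoA detector and beamformer stages more concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the implementation used in any particular product: it assumes a two-microphone array, whole-sample delays, random samples standing in for speech, and independent noise at each microphone. The function names, the 0.1m microphone spacing, and the 16kHz sample rate are all hypothetical choices made for the example. The delay is estimated by cross-correlation, converted to an arrival angle, and then used by a simple delay-and-sum beamformer.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive result means the sound reaches `other` later than `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, sample in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += sample * other[m]  # cross-correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(delay_samples, sample_rate, mic_spacing):
    """Convert an inter-microphone delay to an arrival angle in degrees.

    0 degrees is broadside, i.e. directly in front of the array.
    """
    ratio = SPEED_OF_SOUND * delay_samples / (sample_rate * mic_spacing)
    ratio = max(-1.0, min(1.0, ratio))  # keep asin() within its domain
    return math.degrees(math.asin(ratio))


def delay_and_sum(channels, delays):
    """Advance each channel by its delay so the speech lines up, then average."""
    length = len(channels[0])
    output = []
    for n in range(length):
        total, count = 0.0, 0
        for channel, delay in zip(channels, delays):
            m = n + delay
            if 0 <= m < length:
                total += channel[m]
                count += 1
        output.append(total / count if count else 0.0)
    return output


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def demo():
    rng = random.Random(0)  # fixed seed so the run is repeatable
    n_samples, true_delay = 400, 3
    speech = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    # Microphone 1 hears the speech directly; microphone 2 hears it 3 samples later.
    mic1 = [s + rng.gauss(0.0, 0.3) for s in speech]
    mic2 = [
        (speech[n - true_delay] if n >= true_delay else 0.0) + rng.gauss(0.0, 0.3)
        for n in range(n_samples)
    ]
    delay = estimate_delay(mic1, mic2, max_lag=8)
    angle = delay_to_angle(delay, sample_rate=16000, mic_spacing=0.1)
    beam = delay_and_sum([mic1, mic2], [0, delay])
    return delay, angle, beam, speech, mic1


if __name__ == "__main__":
    delay, angle, beam, speech, mic1 = demo()
    print("estimated delay (samples):", delay)
    print("estimated angle (degrees):", round(angle, 1))
    print("noise power, single mic:", round(mean_squared_error(mic1, speech), 3))
    print("noise power, beamformed:", round(mean_squared_error(beam, speech), 3))
```

Averaging the two aligned channels keeps the speech at full level while the independent noise partially cancels, which is the SNR benefit of combining microphone signals described above. Real beamformers typically use more microphones, fractional delays, and adaptive weights.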
Class D audio amplifiers
The voice processing portion of the control interface has been extensively developed, and even low-cost devices now offer accurate voice recognition. However, the audio side of the interface has received significantly less attention, so many early smart speakers and other audio-enabled Internet of Things (IoT) devices produced poor sound quality compared to high-end audio devices.
While voice control was still new, its novelty may have distracted from these shortcomings. However, as smart devices become more widely adopted, consumers expect more from the audio experience they provide. The low efficiency of traditional Class AB audio amplifiers makes them a poor fit for low-power IoT devices. Fortunately, several chip manufacturers have recently introduced advanced Class D audio amplifiers that represent significant improvements over previously available audio amplifiers. Many of these products have been developed specifically to enable high-quality audio in smart technology and IoT devices.
Texas Instruments' TAS2770 15W digital-input audio amplifier improves loudness and audio quality, and its enhanced voice capture capability makes voice-controlled devices easier and more natural to operate. Maxim Integrated (now part of Analog Devices) developed the MAX98357 and MAX98358 Class D amplifiers, which are 92% efficient and deliver up to 3.2W of output with Class AB audio performance. A simplified block diagram of these amplifiers is shown in Figure 2. Diodes Incorporated's PAM8106 has low power consumption, allowing it to operate well in devices powered by 1.5V lead-acid batteries and 3.5V lithium-ion batteries.
Figure 2: Simplified block diagram of a Maxim Integrated Class D audio amplifier. (Source: Maxim Integrated)
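To illustrate why amplifier efficiency matters in a battery-powered device, the short Python sketch below works through the arithmetic for the 92% efficiency and 3.2W output figures quoted above. The 50% figure used for a Class AB amplifier is purely an illustrative assumption (real Class AB efficiency varies with output level and design), and the function names are hypothetical.

```python
def supply_power(output_w, efficiency):
    """Power drawn from the supply to deliver `output_w` to the speaker."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return output_w / efficiency


def dissipated_power(output_w, efficiency):
    """Power lost as heat inside the amplifier."""
    return supply_power(output_w, efficiency) - output_w


OUTPUT_W = 3.2              # output power quoted in the text
CLASS_D_EFFICIENCY = 0.92   # efficiency quoted in the text
CLASS_AB_EFFICIENCY = 0.50  # ASSUMPTION: illustrative value only

if __name__ == "__main__":
    for name, eff in (("Class D", CLASS_D_EFFICIENCY), ("Class AB", CLASS_AB_EFFICIENCY)):
        print(
            f"{name}: draws {supply_power(OUTPUT_W, eff):.2f} W, "
            f"wastes {dissipated_power(OUTPUT_W, eff):.2f} W as heat"
        )
```

At 92% efficiency the amplifier wastes about 0.28W while delivering 3.2W; under the assumed 50% figure the loss would equal the output power itself, which shortens battery life and may require a heatsink.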
Offline voice control
Cloud-based solutions such as Amazon's Alexa and Google Assistant are easy to use on devices with a stable internet connection, but for devices with an unreliable connection or none at all, offline voice control is a better solution. For example, if a product only needs to respond to simple one-word commands such as "go," "stop," or "reset" (commonly known as keyword spotting), it makes sense to do the processing locally on the device itself. Simple keyword command systems can be implemented using low-cost embedded microcontrollers. One example is NXP's EdgeReady MCU-based offline local voice control solution, which uses i.MX RT crossover MCUs and allows developers to quickly integrate voice control into their products. NXP's i.MX RT106S-based solution includes the SLN-LOCAL2-IOT development kit, as shown in Figure 3.
The development kit comes with fully integrated software running on FreeRTOS, and a software development kit (SDK) is available to enable rapid proof of concept. Offline voice control also helps address the privacy concerns of many consumers who worry that their systems are vulnerable to online hacking.
Figure 3: NXP’s SLN-LOCAL2-IOT offline voice control solution. (Source: NXP)
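The keyword-spotting approach described above ends with the device acting on a recognized command. The Python sketch below shows one possible way to structure that last step. It does not use NXP's SDK or any real speech engine: the `Detection` record, the `Motor` class, the `make_dispatcher` helper, and the 0.8 confidence threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical output of an on-device keyword spotter."""
    keyword: str
    confidence: float  # 0.0 to 1.0


class Motor:
    """Stand-in for the hardware being controlled."""

    def __init__(self):
        self.state = "idle"

    def go(self):
        self.state = "running"

    def stop(self):
        self.state = "stopped"

    def reset(self):
        self.state = "idle"


def make_dispatcher(handlers, threshold=0.8):
    """Build a function that runs the handler for a confidently detected keyword."""

    def dispatch(detection):
        handler = handlers.get(detection.keyword)
        # Ignore unknown words and low-confidence detections to limit false triggers.
        if handler is None or detection.confidence < threshold:
            return False
        handler()
        return True

    return dispatch


if __name__ == "__main__":
    motor = Motor()
    dispatch = make_dispatcher(
        {"go": motor.go, "stop": motor.stop, "reset": motor.reset}
    )
    for detection in (Detection("go", 0.95), Detection("stop", 0.40)):
        handled = dispatch(detection)
        print(detection.keyword, "handled" if handled else "ignored", "->", motor.state)
```

Because the command table and the threshold both live on the device, nothing needs to be sent to a cloud service for the device to respond.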
Conclusion
Voice and audio are quickly becoming the preferred control interface for many smart devices, and this technology is particularly suitable for use in low-power and portable IoT devices because it can eliminate the requirement for expensive and power-hungry digital displays. Many early systems had poor audio quality and could only be implemented using cloud-connected solutions.
However, the advent of a new generation of highly efficient Class D audio amplifiers has enabled manufacturers to ensure their devices deliver a high-quality audio experience for consumers. Solutions are also now available that enable voice control of devices even when internet connectivity is spotty or non-existent. These innovations demonstrate the ability of voice control technology to adapt to new needs as people become more accustomed to this control interface, a trend that looks set to continue.