Advances in Voice and Audio Control in Electronic Devices

Speech is an effective way for people to express their thoughts and desires. Before the advent of the industrial age, humans discovered that animals could be trained to recognize and respond to basic commands that allowed them to perform certain tasks.

The next logical step was to find a way of communicating with machines using sound signals to direct their actions. Voice and audio have become increasingly popular as control interfaces for electronic devices in recent years, and the technology continues to evolve to meet user expectations and the requirements of new applications.

In this article, we will explain the benefits of controlling electronic devices and machines with voice and audio signals and review how to achieve this control. We will also show how such control interfaces can now be embedded in offline devices and how the audio control experience they provide can be greatly improved.

Controlling electronic devices with your voice

There are several obvious benefits to using voice control to interact with machines:

● For humans, speech is an intuitive form of communication, so instructions are easier to convey verbally.

● Voice communication remains possible even when a person's eyes and hands are occupied with other things. Real-time voice control is especially convenient in scenarios such as driving, where trying to operate other devices in the car by touch may be illegal.

● Voice is an effective medium for controlling machines. A voice-controlled machine can listen continuously and respond to short spoken commands without the need for complex instructions.

● Integrating voice control can reduce or remove the need for a touchscreen in many devices. This is especially attractive for remote or portable battery-powered devices, where reducing size and power consumption are common design challenges. For applications with multiple users, removing touchscreen controls is also more hygienic.

● Touchscreen control may not be a realistic option for some people with disabilities, so voice can be an effective support tool (see Figure 1). Interacting with machines by voice can be used to perform tasks such as opening a door, or to transmit a person's recent health status remotely.

Figure 1: Voice-controlled robot assistant. (Source: PaO_STUDIO via Shutterstock)

The audio front end (AFE) of a voice-controlled device consists of a microphone array and a signal processing module. The AFE processes the signals from the multi-channel microphone array, using a variety of signal processing algorithms to remove background noise and interference from the device's own playback. The cleaned signal is then sent to a "wake-word" detection engine, which is pre-programmed on the device to recognize words such as "Alexa" or "OK Google." The components of a voice control solution include:

Microphone arrays: Voice-activated systems require one or more microphones to capture audio control signals. When selecting a microphone array, important considerations include size, cost, performance, and robustness. Optimizing the combination of different signals from a multi-microphone array helps improve the signal-to-noise ratio (SNR) of the audio signal chain.

Direction of Arrival (DoA) detector: Determines the user's position relative to the controlled device so that the beam of the microphone array can be steered toward the person speaking.

Beamformer: Accepts sound arriving from the direction identified by the DoA detector while rejecting sound from other directions. Its performance depends on factors such as the geometry of the microphone array, SNR, beam width, and background noise level.

Acoustic Echo Canceller (AEC): Removes the device's own loudspeaker playback from the microphone signal (for example, when a voice command is given while music is playing on the device) so that the user's voice command can be picked up clearly.

Adaptive Interference Canceller (AIC): Cancels external noise from other sound sources that is difficult to eliminate with traditional beamformers, such as loud noise generated by other devices.

Wake-up word detector: Compares the processed voice signal from the AFE with a library of wake-up words, such as "Hey Google," using a wake-up word detection algorithm, which is usually a machine learning model. Larger models are generally more accurate (a 1MB trained model is typically more accurate than a 64kB model, for example) but are more processing intensive. A larger wake-up word model therefore detects the wake-up word more reliably and reduces the number of false alarms, at the cost of more memory and computation.
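To make the roles of the microphone array, DoA detector, and beamformer more concrete, here is a minimal sketch in plain Python. It is not taken from any product mentioned in this article. It assumes a hypothetical two-microphone array with 10 cm spacing, a 16 kHz sample rate, a broadband test signal standing in for speech, and independent noise of equal power at each microphone. The inter-microphone delay is estimated by brute-force cross-correlation, converted to an arrival angle, and then used for simple delay-and-sum beamforming. Under these assumptions, averaging two aligned channels should improve SNR by roughly 10·log10(2), or about 3 dB.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means the sound reaches `other` later than `ref`.
    """
    n = len(ref)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(max(0, -lag), min(n, n - lag)):
            score += ref[i] * other[i + lag]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, fs, spacing):
    """Convert an inter-microphone delay to an angle in degrees from broadside."""
    x = lag * SPEED_OF_SOUND / (fs * spacing)
    return math.degrees(math.asin(max(-1.0, min(1.0, x))))


def delay_and_sum(ref, other, lag):
    """Align `other` to `ref` by `lag` samples and average the two channels."""
    n = len(ref)
    out = []
    for i in range(n):
        j = i + lag
        # At the edges, where no aligned sample exists, fall back to `ref`.
        out.append(0.5 * (ref[i] + other[j]) if 0 <= j < n else ref[i])
    return out


def snr_db(clean, noisy):
    """Signal-to-noise ratio of `noisy` relative to the known clean signal."""
    signal = sum(c * c for c in clean)
    noise = sum((x - c) ** 2 for c, x in zip(clean, noisy))
    return 10.0 * math.log10(signal / noise)


def simulate(seed=0, fs=16000, spacing=0.10, true_lag=3, n=2000):
    """Simulate two noisy microphones; return (lag, angle_deg, snr_gain_db)."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n + true_lag)]
    clean = source[true_lag:]  # the signal as it arrives at microphone 1
    mic1 = [s + rng.gauss(0.0, 1.0) for s in clean]
    mic2 = [source[k] + rng.gauss(0.0, 1.0) for k in range(n)]  # arrives later
    max_lag = int(fs * spacing / SPEED_OF_SOUND)  # largest physically possible lag
    lag = estimate_delay(mic1, mic2, max_lag)
    angle = delay_to_angle(lag, fs, spacing)
    beam = delay_and_sum(mic1, mic2, lag)
    gain = snr_db(clean, beam) - snr_db(clean, mic1)
    return lag, angle, gain


lag, angle, gain = simulate()
print(f"estimated lag: {lag} samples, angle: {angle:.1f} deg, SNR gain: {gain:.2f} dB")
```

Production AFEs typically use more microphones, frequency-domain processing, and adaptive filters, so this sketch only illustrates the principle of steering toward the talker and combining channels to raise SNR.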

Class D audio amplifiers

The voice processing portion of the control interface has been extensively developed, and even low-cost devices now offer accurate voice recognition. However, the audio side of the interface has received significantly less attention, meaning that many early smart speakers and other audio-enabled Internet of Things (IoT) devices produced poor sound quality compared with high-end audio devices.

The novelty of voice control may initially have distracted from these shortcomings. However, as smart devices become more widely adopted, consumers are expecting more from the audio experience they provide. The low efficiency of traditional Class AB audio amplifiers has made them impractical for low-power IoT devices. Fortunately, several chip manufacturers have recently introduced advanced Class D audio amplifiers that represent significant improvements over previously available parts. Many of these products have been developed specifically to enable high-quality audio in smart technology and IoT devices.

Texas Instruments' TAS2770 15W digital-input audio amplifier improves loudness and audio quality, and its enhanced voice capture capability means easier and more natural operation of voice-controlled devices. Maxim Integrated (now part of Analog Devices) has developed the MAX98357 and MAX98358 Class D amplifiers, which deliver up to 3.2W of output power with Class AB audio performance at efficiencies of up to 92%. A simplified block diagram of these amplifiers is shown in Figure 2. Diodes Incorporated's PAM8106 has low power consumption, allowing it to operate well in devices powered by lead-acid or lithium-ion batteries.

Figure 2: Simplified block diagram of a Maxim Integrated Class D audio amplifier. (Source: Maxim Integrated)
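The effect of amplifier efficiency on a battery-powered device can be illustrated with some simple arithmetic. The sketch below uses the 3.2W output and 92% efficiency figures quoted above. The 50% efficiency for a Class AB amplifier and the 7.4 Wh battery are illustrative assumptions, not figures from any datasheet. Continuous operation at full output is also a worst case that real audio playback rarely approaches.

```python
def supply_power_w(output_w: float, efficiency: float) -> float:
    """Power drawn from the supply to deliver `output_w` to the speaker."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return output_w / efficiency


def heat_w(output_w: float, efficiency: float) -> float:
    """Power lost as heat inside the amplifier."""
    return supply_power_w(output_w, efficiency) - output_w


def runtime_h(battery_wh: float, output_w: float, efficiency: float) -> float:
    """Hours of continuous playback at `output_w` from a battery of `battery_wh`."""
    return battery_wh / supply_power_w(output_w, efficiency)


OUTPUT_W = 3.2        # from the text
CLASS_D_EFF = 0.92    # from the text
CLASS_AB_EFF = 0.50   # assumption, for comparison only
BATTERY_WH = 7.4      # hypothetical battery capacity

for name, eff in (("Class D", CLASS_D_EFF), ("Class AB", CLASS_AB_EFF)):
    print(
        f"{name}: supply {supply_power_w(OUTPUT_W, eff):.2f} W, "
        f"heat {heat_w(OUTPUT_W, eff):.2f} W, "
        f"runtime {runtime_h(BATTERY_WH, OUTPUT_W, eff):.2f} h"
    )
```

Under these assumptions the Class D amplifier dissipates about 0.28W as heat, compared with 3.2W for the Class AB case, and runs nearly twice as long on the same battery.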

Offline voice control

Cloud-based solutions such as Amazon's Alexa and Google Assistant are easy to use for devices with a stable internet connection, but for devices with an unreliable connection or none at all, offline voice control is a better solution. For example, if a product only needs to respond to simple word commands such as "go," "stop," or "reset" (commonly known as keyword spotting), it makes sense to process them locally on the device itself. Simple keyword command systems can be implemented on low-cost embedded microcontrollers. One example is NXP's EdgeReady offline local voice control solution, which uses i.MX RT crossover MCUs and allows developers to quickly integrate voice control into their products. NXP's i.MX RT106S-based solution includes the SLN-LOCAL2-IOT development kit, shown in Figure 3.

The development kit comes with fully integrated software running on FreeRTOS, and a software development kit (SDK) is available to enable rapid proof of concept. Offline voice control also helps address the privacy concerns of many consumers who worry that their connected devices are vulnerable to online hacking.

Figure 3: NXP’s SLN-LOCAL2-IOT offline voice control solution. (Source: NXP)
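Once a local keyword spotter has recognized a word, the application logic that acts on it can be very small. The following sketch shows one possible way to map commands such as "go," "stop," and "reset" to actions. It does not use NXP's SDK or any real recognition engine. The `CommandDispatcher` class, the confidence threshold of 0.8, and the handler functions are all hypothetical names chosen for illustration, and the recognizer is assumed to supply a keyword together with a confidence score between 0 and 1.

```python
from typing import Callable, Dict, Optional


class CommandDispatcher:
    """Maps recognized keywords to handler functions (hypothetical design)."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed minimum confidence to act on
        self._handlers: Dict[str, Callable[[], str]] = {}

    def register(self, keyword: str, handler: Callable[[], str]) -> None:
        self._handlers[keyword.lower()] = handler

    def dispatch(self, keyword: str, confidence: float) -> Optional[str]:
        """Run the handler for `keyword`, or return None if it is rejected."""
        if confidence < self.threshold:
            return None  # too uncertain: ignore rather than risk a false trigger
        handler = self._handlers.get(keyword.lower())
        return handler() if handler is not None else None


# Example handlers for a simple motor-like device; each returns the new state.
dispatcher = CommandDispatcher(threshold=0.8)
dispatcher.register("go", lambda: "running")
dispatcher.register("stop", lambda: "stopped")
dispatcher.register("reset", lambda: "idle")

# Stand-in for the (keyword, confidence) pairs a local recognizer might emit.
for word, score in [("go", 0.95), ("stop", 0.40), ("reset", 0.90)]:
    print(word, score, "->", dispatcher.dispatch(word, score))
```

Rejecting low-confidence results mirrors the trade-off described for wake-up word detection: a higher threshold reduces false triggers but may occasionally miss a genuine command.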

Conclusion

Voice and audio are quickly becoming the preferred control interface for many smart devices, and this technology is particularly suitable for use in low-power and portable IoT devices because it can eliminate the requirement for expensive and power-hungry digital displays. Many early systems had poor audio quality and could only be implemented using cloud-connected solutions.

However, the advent of a new generation of highly efficient Class D audio amplifiers has enabled manufacturers to deliver a high-quality audio experience to consumers. Solutions are also now available that enable voice control of devices even when internet connectivity is spotty or non-existent. These innovations demonstrate the ability of voice control technology to adapt to new needs as people become more accustomed to this control interface, and this trend is likely to continue.
