High signal-to-noise ratio MEMS microphone drives artificial intelligence interaction

Publisher: EE小广播 | Last updated: 2024-11-14 | Source: EEWORLD

By: Dr. Gunar Lorenz, Senior Director, Technical Marketing, Infineon Technologies

Proofreader: Ding Yue, Chief Engineer of the Consumer, Computing and Communications Business in Greater China, Infineon Technologies

Introduction


At Infineon, we have always believed that excellent audio solutions are essential to enhancing the user experience of consumer devices. We are proud of our unwavering commitment to innovation and the significant advances we have made in active noise cancellation, voice transmission, studio recording, audio zoom, and related technologies. As a leading supplier of MEMS microphones, Infineon focuses its resources on improving the audio quality of MEMS microphones to deliver an excellent experience across a variety of consumer devices, such as TWS and over-ear headphones, laptops, tablets, conferencing systems, smartphones, smart speakers, hearing aids, and even cars.

Today, we live in an exciting time where AI is revolutionizing daily life and tools like ChatGPT are redefining productivity through intuitive text and voice interactions. As AI systems continue to advance, traditional business models, beliefs, and assumptions are being challenged. What role does voice play in the emerging AI ecosystem? As business leaders, do we need to rethink our beliefs? Will the rise of generative AI reduce the importance of high-quality voice input, or will high-quality voice input become a necessity for widespread adoption of AI services and personal assistants?

Artificial Intelligence: From Right-Hand Assistant to Best Friend


It’s natural for humans to tailor their responses not only to the content of a question, but also to the format in which it’s asked. The human voice provides a variety of clues that can be used to determine the age, gender, social and cultural background, and emotional state of the person asking the question. Additionally, recognizing the context (e.g., an airport, an office, in traffic, or a physical activity like running) can help determine the questioner’s intent and tailor answers accordingly, leading to a better conversation.

Despite the great advances in AI capabilities, AI-based assistance tools are still widely believed to lack the ability to correctly infer the intent behind a question, or how a specific message will be interpreted. To improve human-computer interaction, an AI should consider three key factors when making rhetorical choices: its understanding of the listener, the listener's emotional state, and the environmental context.

In many cases, the received audio signal alone is sufficient to extract useful information and respond appropriately. For example, consider a phone call or audio conference with someone you have never met, and how your perception of that person develops and changes over repeated conversations without ever communicating in person.

Recent research has shown that even small changes in an AI’s verbal response style can lead to noticeable changes in the AI’s social abilities and personality. It is reasonable to assume that, given the right level of vocal input, future AI systems will be able to function as effective companions, exhibiting the behaviors of a human friend, such as asking questions and really listening to the answers, or simply listening and reserving judgment when appropriate.

How do humans experience audio signals?


Like any verbal communication, spoken messages use words to convey thoughts, emotions, and ideas. In addition, other elements of communication, such as pitch, speed, volume, and background noise, affect the overall perception of the message.

From a scientific point of view, the human ear perceives audio signals based on two key factors: frequency and sound pressure level. Sound pressure level (SPL) is measured in decibels (dBSPL) and represents the amplitude of the sound pressure oscillating around the ambient atmospheric pressure. A sound pressure level of 100 dBSPL is roughly as loud as a lawn mower or a helicopter. The bottom of the range, 0 dBSPL, corresponds to a sound pressure oscillation of 20 µPa, which represents the hearing threshold of a healthy young person with optimal hearing at a frequency of 1 kHz. All sounds relevant to human speech fall into the frequency band from 100 Hz to 8 kHz. The corresponding human hearing thresholds according to the ISO 226:2023 standard are shown in Figure 1.
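The dBSPL scale described above is a logarithmic ratio against the 20 µPa reference. As a minimal sketch (using only the 20 µPa reference and the 100 dBSPL lawn-mower figure from the text), the conversion in both directions looks like this:

```python
import math

P_REF = 20e-6  # 20 µPa: the 0 dBSPL reference, i.e. the 1 kHz hearing threshold

def pressure_to_dbspl(p_pa: float) -> float:
    """Convert an RMS sound pressure in pascals to a level in dBSPL."""
    return 20.0 * math.log10(p_pa / P_REF)

def dbspl_to_pressure(level_db: float) -> float:
    """Convert a level in dBSPL back to an RMS sound pressure in pascals."""
    return P_REF * 10.0 ** (level_db / 20.0)

print(pressure_to_dbspl(20e-6))  # hearing threshold -> 0.0 dBSPL
print(dbspl_to_pressure(100.0))  # lawn-mower loudness -> 2.0 Pa
```

Note how five orders of magnitude in pressure (20 µPa to 2 Pa) collapse into a 0-100 range on the decibel scale.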


Figure 1: Hearing threshold: Sound level at which a person makes 50% correct detection responses in repeated trials according to ISO 226:2023

As shown in Figure 1, the human ear is particularly sensitive to frequencies in the range of 500 Hz to 6 kHz. Any frequency-balance issue in this range has a significant impact on the perceived quality of voices and instruments. Frequencies between 500 Hz and 4 kHz carry most of the information in human speech that determines intelligibility; frequencies around 2 kHz are particularly important. Frequencies between 5 kHz and 10 kHz matter greatly for music, adding "liveliness" and "brightness" to the sound. However, this range contains relatively little speech information, mainly sibilance, the hissing component of consonants such as the "s" and "sh" sounds at the start of words like "sip" and "ship". Attenuating the sibilance around 6-8 kHz therefore still has an adverse effect on speech intelligibility.
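To make the band discussion concrete, the NumPy sketch below measures how much of a signal's energy falls in the 500 Hz-4 kHz intelligibility band versus the 5-10 kHz sibilance band. The test signal is synthetic; its tone frequencies and amplitudes are made-up illustration values, not measurements:

```python
import numpy as np

fs = 48_000               # sample rate in Hz
t = np.arange(fs) / fs    # one second of time samples
# Synthetic test signal: a low fundamental, a mid "intelligibility" tone,
# and a weak high "sibilance" tone (amplitudes chosen arbitrarily)
x = (np.sin(2 * np.pi * 200 * t)
     + 0.5 * np.sin(2 * np.pi * 2000 * t)
     + 0.2 * np.sin(2 * np.pi * 7000 * t))

power = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(len(x), 1 / fs)

def band_fraction(lo_hz: float, hi_hz: float) -> float:
    """Fraction of total signal power inside [lo_hz, hi_hz)."""
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return power[mask].sum() / power.sum()

print(f"500 Hz-4 kHz (intelligibility): {band_fraction(500, 4000):.1%}")
print(f"5-10 kHz (sibilance):           {band_fraction(5000, 10000):.1%}")
```

The same band-energy measurement applied to real speech recordings would show the concentration of energy in the intelligibility band that the paragraph describes.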

Most of us know that the human hearing threshold decreases with age, as shown in Figure 2.


Figure 2: Hearing threshold shift of otologically normal males at different ages under monaural headphone listening conditions. A similar graph exists for females, whose age-related hearing loss is slightly smaller (ISO 7029:2017)

It is important to note that even mild hearing loss (which occurs in most people between the ages of 40 and 50) can have a significant impact on an individual's life. For example, someone with mild hearing loss may have trouble following group conversations in a noisy environment. In addition, they may miss important auditory cues such as warning signs or alarms.

Is current audio hardware sufficient for the needs of future AI?


Now that we have a better understanding of how humans perceive audio signals, let's revisit our original question: what quality of audio input will current and future AI need in order to perform at a level indistinguishable from a human?

Most consumer devices on the market today use MEMS microphones to capture audio. MEMS microphones are therefore the primary audio-capture technology for AI personal assistants, and devices with built-in AI assistant technology are already on the market.

The recording quality of a MEMS microphone depends on its dynamic range. The upper limit of the dynamic range is set by the acoustic overload point (AOP), which defines the microphone's distortion behavior at high sound pressure levels. The microphone's self-noise sets the lower limit. Self-noise is usually quantified through the signal-to-noise ratio (SNR), the ratio between the microphone's output for a reference signal (its sensitivity) and its self-noise. For our discussion, however, SNR is somewhat inappropriate, because the self-noise term in SNR is A-weighted, a weighting that is defined by the human ability to perceive audio signals.
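The A-weighting mentioned above follows an analytic curve standardized in IEC 61672; it is normalized to 0 dB at 1 kHz and strongly attenuates low frequencies, mirroring human hearing sensitivity. A short sketch evaluating it at a few speech-band frequencies:

```python
import math

def a_weight_db(f_hz: float) -> float:
    """A-weighting gain in dB at frequency f_hz (IEC 61672 analytic form)."""
    f2 = f_hz * f_hz
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00  # +2.00 dB normalizes to ~0 dB at 1 kHz

for f in (100, 1000, 2000, 8000):
    print(f"{f:5d} Hz: {a_weight_db(f):+.1f} dB")
```

Because the curve suppresses low-frequency noise by well over 15 dB at 100 Hz, an A-weighted noise figure can look flattering for a microphone whose noise is concentrated at low frequencies, which is exactly why it is a questionable metric when the listener is an AI rather than a human ear.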

If the intended recipient of the audio signal is an artificial intelligence, the microphone's equivalent noise level (ENL) is a more appropriate performance measure, as it ignores human perception of the recorded sound. The ENL is the signal the microphone produces in the absence of any external sound source, expressed in dBSPL as the sound pressure level that would generate the same output voltage as the microphone's self-noise.
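Microphone SNR is conventionally specified against a 94 dBSPL, 1 kHz reference tone, so the A-weighted noise floor implied by a datasheet SNR is simply 94 minus the SNR, and the dynamic range is the distance from that floor up to the AOP. A minimal sketch (the 130 dBSPL AOP is an assumed example value, not a figure from the text):

```python
REF_DBSPL = 94.0  # standard 1 Pa / 1 kHz reference tone for microphone SNR specs

def enl_from_snr(snr_db: float) -> float:
    """A-weighted equivalent noise level implied by a quoted SNR."""
    return REF_DBSPL - snr_db

def dynamic_range_db(aop_dbspl: float, snr_db: float) -> float:
    """Usable range between the self-noise floor and the acoustic overload point."""
    return aop_dbspl - enl_from_snr(snr_db)

# A 65 dB(A) SNR microphone, like the one discussed with Figure 3:
print(enl_from_snr(65.0))             # -> 29.0 dBSPL(A) noise floor
print(dynamic_range_db(130.0, 65.0))  # -> 101.0 dB (assuming a 130 dBSPL AOP)
```

Keep in mind that this single-number ENL is A-weighted; the per-frequency (1/3-octave) ENL curves in Figure 3 carry the information that matters for the comparison with the human hearing threshold.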

It is worth noting that any sound information below the ENL is essentially lost and cannot be recovered, regardless of the sound processing applied later. Therefore, if no other components in the audio chain introduce noise before the signal reaches the AI algorithm, the microphone's ENL can be regarded as the hearing threshold of the AI algorithm. This is a highly simplified assumption, since the audio chain usually contains many other components, including the sound port, a waterproof protective membrane, and the audio processing chain.

Please refer to Figure 3 for a direct comparison of the ENL curves of two MEMS microphones and the human hearing threshold.


Figure 3: 1/3-octave equivalent noise level (ENL) of a mid-range and a high-end MEMS microphone compared with the typical male hearing threshold

The red line is the ENL curve of a microphone with a signal-to-noise ratio of 65 dB(A) and an integrated dust-protection design. This MEMS microphone is currently used in many high-end smartphones from multiple vendors.

The purple line below it shows the ENL curve of Infineon's latest high-end digital microphone, which features an innovative protective design providing both dust and water resistance. This microphone represents the current state of the art and first appeared in high-end tablets this year; we expect microphones with comparable performance in high-end smartphones by the end of the year. Reducing a microphone's self-noise by 5-10 dB is a significant achievement, especially considering that sound pressure is expressed on a logarithmic scale.

While Infineon has made significant progress in reducing the self-noise of high-end MEMS microphones, microphones still lag well behind the human ear in discerning low sound pressure levels. This is especially true around 2 kHz, the region most critical to speech intelligibility for human listeners. The gap between the hearing threshold of a young person and Infineon's most advanced microphone exceeds 12 dBSPL; against the microphones used in current high-end phones, the gap widens to 17 dBSPL. It bears repeating that this assessment considers only the self-noise of the MEMS microphone and ignores the additional noise sources in the audio chain that further degrade overall performance.
