The Importance of Ultra-High Signal-to-Noise for MEMS Microphones-EEWORLD

The popularity of automatic speech recognition systems and the use of video content to share information and experiences are increasing dramatically. The performance and quality of microphones used to capture sound must be high to ensure a good user experience. Critical factors include noise, distortion, frequency response, and component matching.

Previous articles briefly explained that microphone performance is often characterized by self-noise and dynamic range. The upper limit of the dynamic range is defined by the acoustic overload point (AOP). The lower limit is set by the microphone's self-noise, which is specified through the signal-to-noise ratio (SNR). A microphone can only capture signals at sound pressure levels (SPL) above its self-noise floor. Therefore, a microphone with a high SNR can operate at lower sound pressures than a microphone with a low SNR. This article focuses on SNR and AOP and explains the benefits of high microphone performance in speech recognition and audio/video capture systems.

Noise in a microphone output can be defined as any signal that is not an expected input source and is generally considered an undesirable element in the output signal. The higher the noise level, the more it degrades the quality of the audio signal. Noise can come from outside the microphone or from within the microphone itself. People often hear microphone self-noise as a hiss that affects the perceived sound quality. For algorithms, noise degrades the fidelity of the signal, thereby reducing system performance.

Microphone noise can be expressed in different ways:

Self-noise (Vrms, dBV, or dBFS) is the rms noise level at the microphone output when the microphone is not excited by external sound.
Signal-to-noise ratio, SNR (dB), describes the self-noise of a microphone relative to a reference input signal. SNR is usually measured with a standard acoustic input that represents the desired sound: a 1 kHz sine wave at 94 dBSPL (1 Pa).
Equivalent input noise, EIN (dBSPL), is the (hypothetical) acoustic noise level at the microphone input that would produce the same output as the microphone's electronic self-noise.
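
The three noise figures above are linked through the 94 dBSPL reference level, so one can be derived from another. The following is a minimal sketch of those conversions for an analog microphone. The function names and the example values (a sensitivity of -38 dBV/Pa and an SNR of 70 dB) are assumptions chosen for illustration and do not describe a specific part. Datasheet SNR is usually A-weighted, in which case the resulting EIN is A-weighted as well.

```python
REF_SPL_DB = 94.0  # standard test level: 94 dBSPL corresponds to 1 Pa


def equivalent_input_noise(snr_db):
    """EIN in dBSPL: the reference level minus the SNR."""
    return REF_SPL_DB - snr_db


def self_noise_dbv(sensitivity_dbv, snr_db):
    """Output self-noise in dBV.

    sensitivity_dbv is the output level produced by the 94 dBSPL
    reference tone (dBV/Pa); the noise sits snr_db below it.
    """
    return sensitivity_dbv - snr_db


def dbv_to_vrms(level_dbv):
    """Convert a level in dBV (re 1 Vrms) to volts rms."""
    return 10 ** (level_dbv / 20)


# Hypothetical microphone: -38 dBV/Pa sensitivity, 70 dB SNR.
ein_db = equivalent_input_noise(70.0)      # 24 dBSPL
noise_dbv = self_noise_dbv(-38.0, 70.0)    # -108 dBV
noise_vrms = dbv_to_vrms(noise_dbv)        # roughly 4 microvolts rms
```

With these assumed values, sound below about 24 dBSPL would be masked by the microphone's own noise.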

All real-life audio sensors are nonlinear systems, meaning that they add content to the signal passing through them. In the case of distortion, the added content appears at harmonics of the frequencies present in the original signal. Distortion is usually measured as total harmonic distortion, THD (or THD+N if self-noise is included). It is the ratio of the energy in the signal's harmonics (usually the 2nd to 5th harmonics) to the energy in the fundamental frequency when the microphone is excited by a sine wave. The test signal is usually a 1 kHz sine wave at a relatively high sound pressure level (SPL), typically 94 dBSPL or more. THD is expressed in percent (%). The acoustic overload point, AOP, is usually defined as the sound pressure level at which THD reaches 10%. The unit of AOP is dBSPL.
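
As a rough illustration of the THD and AOP definitions above, the sketch below computes THD from harmonic amplitudes and reads an AOP off a level sweep. It uses the common convention in which the percentage is the rms sum of the harmonics divided by the rms fundamental (the square root of the energy ratio). The function names and the sweep data are hypothetical and are not measurements of a real microphone.

```python
import math


def thd_percent(fundamental_rms, harmonic_rms):
    """THD in percent: rms sum of the harmonics over the fundamental."""
    harmonics_total = math.sqrt(sum(h * h for h in harmonic_rms))
    return 100.0 * harmonics_total / fundamental_rms


def estimate_aop(sweep, limit_pct=10.0):
    """Return the lowest tested SPL (dBSPL) at which THD reaches the limit.

    sweep is a list of (spl_db, thd_pct) pairs; returns None if the
    limit is never reached within the tested range.
    """
    for spl_db, thd_pct in sorted(sweep):
        if thd_pct >= limit_pct:
            return spl_db
    return None


# Hypothetical harmonic levels relative to a fundamental of 1.0:
# 2nd harmonic 0.03, 3rd harmonic 0.04, 4th and 5th negligible.
thd = thd_percent(1.0, [0.03, 0.04, 0.0, 0.0])   # about 5 %

# Hypothetical THD sweep over sound pressure level.
sweep = [(94, 0.2), (110, 0.5), (120, 1.5), (126, 4.0), (130, 10.5), (132, 18.0)]
aop = estimate_aop(sweep)   # 130 for this made-up sweep
```

A real measurement would step the level more finely or interpolate between points; the coarse sweep here only shows the principle.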

In most cases, it is beneficial and important to maintain the original form and content of the sound incoming to the microphone. Adding content, such as distortion, to the original signal may sound unpleasant to the human listener. The more energy is added (i.e. the higher the THD value), the worse the perceived audio quality. Distortion may also confuse algorithms such as speech recognition systems, especially speech recognition systems that perform a very detailed analysis of the content of the input signal.

Figure 1: Acoustic signal-to-noise ratio (SNR)

The goal of audio/video recording is to capture the incoming sound from the subject and reproduce it at the output of a microphone system. When the recording is for the human ear, it is desirable for the electronic output signal to match the acoustic signal as closely as possible, providing a "natural" sounding recording. The microphone and its signal-to-noise ratio are a critical part of the acoustic capture signal chain, affecting the quality of the recording. The table below gives some typical use cases.

In a free field, the sound pressure halves (drops by 6 dB) for every doubling of the distance from the source. The farther away the sound source is, the quieter the acoustic signal that reaches the microphone. Since the self-noise of a microphone is practically constant, a lower input signal level means a lower signal-to-noise ratio in the microphone output signal. Often, weak signals must be amplified to bring them to an appropriate level for the device's signal path. Amplifying the signal also amplifies the noise in the output signal. The greater the amplification, the greater the risk that the noise will rise to a level that significantly degrades the quality of the captured signal.

A high microphone signal-to-noise ratio helps keep the noise floor nearly inaudible, even when the signal is amplified. The longer the capture distance, the lower the microphone's self-noise must be to avoid problems. This is especially important when the distance is long and the sound source itself is quiet. Since the sound pressure decays by 6 dB for every doubling of distance, a microphone with a 6 dB higher signal-to-noise ratio allows you to double the capture distance without any degradation in signal quality.
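
The distance argument above can be checked with a short calculation. This sketch assumes ideal free-field spreading (no reflections or ambient noise) and takes the microphone's equivalent input noise as 94 dBSPL minus its SNR. The function names, the 60 dBSPL speech level at 1 m, and the two SNR values are assumptions for illustration only.

```python
import math


def spl_at_distance(spl_ref_db, ref_distance_m, distance_m):
    """Free-field level at distance_m, given the level at ref_distance_m."""
    return spl_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)


def captured_snr(spl_db, mic_snr_db, ref_spl_db=94.0):
    """SNR of the captured signal relative to the microphone self-noise."""
    equivalent_input_noise_db = ref_spl_db - mic_snr_db
    return spl_db - equivalent_input_noise_db


# Hypothetical talker producing 60 dBSPL at 1 m.
level_1m = spl_at_distance(60.0, 1.0, 1.0)   # 60 dBSPL
level_2m = spl_at_distance(60.0, 1.0, 2.0)   # about 54 dBSPL

# A 64 dB SNR microphone at 1 m and a 70 dB SNR microphone at 2 m
# end up with nearly the same captured SNR (about 30 dB).
snr_near = captured_snr(level_1m, 64.0)
snr_far = captured_snr(level_2m, 70.0)
```

In a real room, reflections and background noise change these numbers, so the result should be read as a best-case bound for the microphone's contribution alone.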

POLQA (Perceptual Objective Listening Quality Analysis) is an ITU-T standard model that uses digital speech analysis to objectively determine the quality and intelligibility of recorded speech signals. Microphones with a high signal-to-noise ratio perform significantly better in POLQA tests and deliver better speech intelligibility. A signal at the same level is easier to understand when it is recorded with a high signal-to-noise ratio microphone.

Like SNR, AOP is an important audio/video quality factor. Distortion can easily render a video recording useless. The Internet is full of smartphone videos shot at pop/rock concerts that are unwatchable because of severe audio distortion. If the sound pressure level of the wanted sound (or of interfering sounds) is high or very high, a high AOP can improve the sound quality. A high AOP also helps the microphone system handle very high signal peaks in the incoming sound, even if the average sound pressure level is not very high. See some typical use cases in the table below.

Until a few years ago, the standard AOP for microphones in consumer electronic devices was between 110 and 120 dBSPL. AOP requirements have since risen. To ensure that sound quality and speech recognition performance meet customer requirements, device designers should select microphones with an AOP close to or above 130 dBSPL. Below the AOP, what matters is how far THD stays under the 10% used to define the AOP. In addition to a high AOP, THD should remain low (less than 2%) up to sound pressure levels that are high enough for the intended application (e.g., up to 120 dBSPL).

In systems where the captured sound is intended for an algorithm, the sound quality goals may be different than when the signal is intended for the human ear. The signal does not necessarily have to sound natural, as long as it is optimized for the algorithm. Regardless of the use case, it is always important to keep the signal free of interference, artifacts, distortion, and noise.

Automatic speech recognition (ASR) is the task of automatically transcribing a speech signal into text. Transcription accuracy is getting closer and closer to human levels, at around 95%. So far, however, this level has been achieved only in laboratories with good environmental conditions. Speech recognition in real-life environments and at long distances involves some important acoustic challenges, such as background noise, reverberation, echo cancellation, and microphone positioning. It is not enough to just have a good speech recognition engine. Every element in the system should perform to a high standard to prevent quality bottlenecks. The job of the microphone is to provide the best possible input signal to the speech recognition system. High input signal quality helps the ASR system analyze the incoming sound and find the features it needs to recognize the speech content. Key parameters include noise, distortion, frequency response, and phase.

High AOP can help speech recognition systems in noisy environments. Sometimes the speech signal itself is not strong and other, louder sounds interfere with it. For example, voice-controlled home entertainment systems and digital assistants have loudspeakers close to the microphone that can output loud music or voice information. High AOP helps keep distortion low and improves noise and echo cancellation.

The farther the microphone is from the speech source, the lower the signal-to-noise ratio of the signal fed to the ASR algorithm. Therefore, the longer the target capture distance, the higher the microphone signal-to-noise ratio needs to be.

A key feature of speech recognition systems is the ability to ignore sounds and noise that are not the speech being transcribed. The quality of audio/video capture and human-to-human communication can also be improved by excluding unwanted sounds from the signal. The goal is to increase the signal-to-noise ratio, which in this case is the ratio of the wanted sound (signal) to the unwanted ambient sound (noise).

Combining multiple microphones with algorithms allows for noise cancellation and directionality. Directional microphone systems, such as beamforming, can focus the microphone's sensitivity to the desired direction and highlight the desired sound source. Unwanted sounds can also be canceled based on parameters, such as the level difference between two microphones. Blind source separation is a more sophisticated noise reduction system. It can remove noise that is independent of direction, distance, and position. All of these noise cancellation methods benefit from the accuracy and quality of their received signals. Microphones should have a high signal-to-noise ratio, low distortion, a flat frequency response (which also improves phase response), and low group delay.
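
To make the idea of directionality concrete, here is a minimal delay-and-sum beamformer sketch for two microphones. It is a simplified illustration and not the method of any particular product: delays are whole samples, the microphones are assumed to be perfectly matched and noise-free, and the 16 kHz sample rate, 1 kHz tone, and 4-sample inter-microphone delay are hypothetical values picked so that the off-axis source cancels exactly.

```python
import math


def delay(signal, n):
    """Delay a signal by n samples, zero-padding the start."""
    return [0.0] * n + signal[:len(signal) - n]


def delay_and_sum(channels, steering_delays):
    """Align each channel by its steering delay, then average the channels."""
    aligned = [delay(ch, d) for ch, d in zip(channels, steering_delays)]
    return [sum(samples) / len(aligned) for samples in zip(*aligned)]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


SAMPLE_RATE = 16000   # Hz (assumed)
TONE_HZ = 1000        # test tone, 16 samples per period at this rate
MIC_DELAY = 4         # samples of travel time between the two microphones

tone = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(1600)]

# Wanted source: reaches microphone 0 first, microphone 1 four samples later.
target = [tone, delay(tone, MIC_DELAY)]
# Interferer from the opposite side: reaches microphone 1 first.
interferer = [delay(tone, MIC_DELAY), tone]

# Steer toward the wanted source by delaying microphone 0 to match microphone 1.
steering = [MIC_DELAY, 0]
target_out = delay_and_sum(target, steering)          # tone kept at full level
interferer_out = delay_and_sum(interferer, steering)  # tone largely cancelled

target_level = rms(target_out)
interferer_level = rms(interferer_out)
```

In this idealized case the wanted tone keeps its level while the interferer is suppressed apart from a few start-up samples. Mismatched sensitivity or phase between the two microphones, or self-noise in either one, would reduce that suppression, which is why the signal quality of each microphone matters for such algorithms.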
