The popularity of automatic speech recognition systems and the use of video content to share information and experiences are increasing dramatically. The performance and quality of microphones used to capture sound must be high to ensure a good user experience. Critical factors include noise, distortion, frequency response, and component matching.
Previous articles briefly explained that microphone performance is often characterized by self-noise and dynamic range. The upper limit of the dynamic range is defined by the acoustic overload point (AOP). The lower limit is set by the microphone's self-noise, which is described by the signal-to-noise ratio (SNR). A microphone can only capture signals at sound pressure levels (SPL) above its self-noise floor. Therefore, a microphone with a high SNR can operate at lower sound pressures than a microphone with a low SNR. This article focuses on SNR and AOP and explains the benefits of high microphone performance in speech recognition and audio/video capture systems.
Noise in a microphone output can be defined as any signal that does not come from the intended input source, and it is generally considered an undesirable element in the output signal. The higher the noise level, the more it degrades the quality of the audio signal. Noise can come from outside the microphone or from within the microphone itself. People often hear microphone self-noise as a hiss that affects the perceived sound quality. For algorithms, noise degrades the fidelity of the signal, thereby reducing system performance.
Microphone noise can be expressed in different ways:
Self-noise (Vrms, dBV, dBFS) is the rms noise level produced by the microphone itself when it is not excited by external sounds.
Signal-to-noise ratio, SNR (dB), describes the self-noise of a microphone relative to the intended input signal. SNR is usually measured using a standard acoustic input signal that represents the desired sound, a 94 dBSPL (1 Pa) sine wave.
Equivalent input noise, EIN (dBSPL), is the (hypothetical) acoustic noise level at the microphone input that would produce the same electronic noise level at the microphone output.
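The three noise figures above are linked through the 94 dBSPL reference level. The following is a minimal sketch of those conversions, assuming the SNR is referenced to 94 dBSPL and the sensitivity is given in dBV at that same level. The function names and the example values (a sensitivity of -38 dBV and an SNR of 65 dB) are illustrative assumptions, not figures from this article. If the SNR is A-weighted, as is common on data sheets, the resulting EIN is A-weighted as well.

```python
REFERENCE_SPL_DB = 94.0  # 1 Pa sine wave, the standard SNR test level


def equivalent_input_noise(snr_db):
    """EIN in dBSPL for a microphone whose SNR is referenced to 94 dBSPL."""
    return REFERENCE_SPL_DB - snr_db


def self_noise_dbv(sensitivity_dbv, snr_db):
    """Output self-noise in dBV, given the sensitivity in dBV at 94 dBSPL."""
    return sensitivity_dbv - snr_db


def dbv_to_vrms(level_dbv):
    """Convert a level in dBV (relative to 1 Vrms) to volts rms."""
    return 10 ** (level_dbv / 20)


# Hypothetical analog microphone: -38 dBV sensitivity, 65 dB SNR.
ein = equivalent_input_noise(65.0)        # 29.0 dBSPL
noise_dbv = self_noise_dbv(-38.0, 65.0)   # -103.0 dBV
noise_vrms = dbv_to_vrms(noise_dbv)       # a few microvolts rms
```

In this sketch, each additional decibel of SNR lowers both the EIN and the output noise floor by one decibel.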
All real-life audio sensors are nonlinear systems in that they add content to the signal passing through them. In the case of distortion, the added content is at the harmonics of the frequencies present in the original signal. Distortion is usually measured as total harmonic distortion, THD (or THD+N if self-noise is included). It is the ratio of the energy in the signal's harmonics (usually the 2nd to 5th harmonics) to the energy in the fundamental frequency when the microphone is excited by a sine wave. The test signal is usually a 1 kHz sinusoid at a relatively high sound pressure level (SPL), typically 94 dBSPL or more. THD is expressed in percent (%). The acoustic overload point (AOP) is usually defined as the sound pressure level at which the THD exceeds 10%. The unit of AOP is dBSPL.
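The THD calculation can be sketched as follows. This is a minimal sketch that assumes the common convention in which the percentage is an amplitude ratio, that is, the square root of the harmonic-to-fundamental power ratio. The function name and the example amplitudes are illustrative assumptions, not measurements.

```python
import math


def thd_percent(fundamental_rms, harmonic_rms):
    """THD in percent: rms sum of the harmonic amplitudes over the fundamental.

    harmonic_rms holds the rms amplitudes of the harmonics that are included,
    for example the 2nd to 5th.
    """
    harmonics_total = math.sqrt(sum(h * h for h in harmonic_rms))
    return 100.0 * harmonics_total / fundamental_rms


# Hypothetical spectrum, relative to a fundamental of 1.0:
# 2nd harmonic 0.03, 3rd harmonic 0.04, 4th and 5th negligible.
thd = thd_percent(1.0, [0.03, 0.04, 0.0, 0.0])  # about 5 %
```

With these illustrative amplitudes the result is about 5%, which is below the 10% threshold used to define the AOP.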
In most cases, it is beneficial and important to maintain the original form and content of the sound incoming to the microphone. Adding content, such as distortion, to the original signal may sound unpleasant to the human listener. The more energy is added (i.e. the higher the THD value), the worse the perceived audio quality. Distortion may also confuse algorithms such as speech recognition systems, especially speech recognition systems that perform a very detailed analysis of the content of the input signal.
Figure 1. Acoustic signal-to-noise ratio (SNR)
The goal of audio/video recording is to capture the incoming sound from the subject and reproduce it at the output of a microphone system. When the recording is for the human ear, it is desirable for the electronic output signal to match the acoustic signal as closely as possible, providing a "natural" sounding recording. The microphone and its signal-to-noise ratio are a critical part of the acoustic capture signal chain, affecting the quality of the recording. The table below gives some typical use cases.
In a free field, the sound pressure halves (drops by 6 dB) for every doubling of the distance from the source. The farther the microphone is from the sound source, the quieter the acoustic signal that reaches it. Since the self-noise of a microphone is practically constant, a reduction in the input signal level results in a reduction in the signal-to-noise ratio of the microphone output signal. Often, weak signals must be amplified to bring them to an appropriate level for the device's signal path. Amplifying the signal also amplifies the noise in the output signal. The greater the amplification, the greater the risk that the noise will rise to a level that significantly degrades the quality of the captured signal.
A high microphone signal-to-noise ratio helps keep the noise floor nearly inaudible, even when the signal is amplified. The longer the capture distance, the lower the microphone's self-noise must be to avoid problems. This is especially important when the distance is long and the sound source itself is quiet. Because the sound pressure decays by 6 dB for every doubling of distance, a microphone with a 6 dB higher signal-to-noise ratio allows you to double the capture distance without any degradation in signal quality.
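The trade-off between capture distance and microphone SNR can be checked with a few lines of arithmetic. This is a minimal sketch that assumes an ideal point source in a free field and a microphone SNR referenced to 94 dBSPL. The function names and the example values (speech at 60 dBSPL at 1 m, microphones with 65 dB and 71 dB SNR) are illustrative assumptions.

```python
import math


def spl_at_distance(spl_ref_db, ref_distance_m, distance_m):
    """Free-field SPL of a point source: 6 dB lower per doubling of distance."""
    return spl_ref_db - 20 * math.log10(distance_m / ref_distance_m)


def captured_snr(spl_at_mic_db, mic_snr_db, reference_spl_db=94.0):
    """SNR of the captured signal relative to the microphone self-noise."""
    equivalent_input_noise = reference_spl_db - mic_snr_db
    return spl_at_mic_db - equivalent_input_noise


# Hypothetical talker producing 60 dBSPL at 1 m.
near = captured_snr(spl_at_distance(60.0, 1.0, 1.0), mic_snr_db=65.0)
far = captured_snr(spl_at_distance(60.0, 1.0, 2.0), mic_snr_db=65.0)
far_better_mic = captured_snr(spl_at_distance(60.0, 1.0, 2.0), mic_snr_db=71.0)
```

Here `far` is about 6 dB lower than `near`, while `far_better_mic` is back at roughly the same value as `near`: the 6 dB of extra microphone SNR compensates for the doubled distance.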
POLQA (Perceptual Objective Listening Quality Analysis, ITU-T P.863) is an ITU-T standard model that uses digital speech analysis to objectively assess the quality of recorded speech signals. Microphones with a high signal-to-noise ratio perform significantly better in POLQA tests and have better speech intelligibility. When a signal is recorded with a high signal-to-noise ratio microphone, the same level of signal is easier to understand.
Like SNR, AOP is also an important audio/video quality factor. Distortion can easily render a video recording useless. The Internet is full of smartphone videos shot at pop/rock concerts that are unwatchable due to severe audio distortion. If the incoming sound pressure level of the expected sound (or of interfering sounds) is high or very high, a high AOP can improve the sound quality. A high AOP also helps the microphone system handle very high signal peaks that may appear in the incoming sound, even if the average sound pressure level is not very high. See some typical use cases in the table below.
Until a few years ago, the standard AOP for microphones in consumer electronic devices was between 110 and 120 dBSPL. More recently, AOP requirements have risen. To ensure that sound quality and speech recognition performance meet customer requirements, device designers should select microphones with AOPs close to or above 130 dBSPL. Below the AOP, it makes more sense to look at THD levels well under the 10% used to define the AOP. In addition to a high AOP, it is important that THD stays low (less than 2%) up to sound pressure levels that are sufficiently high for the intended application (e.g., up to 120 dBSPL).
In systems where the captured sound is intended for an algorithm, the sound quality goals may be different than when the signal is intended for the human ear. The signal does not necessarily have to sound natural, as long as it is optimized for the algorithm. Regardless of the use case, it is always important to keep the signal free of interference, artifacts, distortion, and noise.
Automatic speech recognition (ASR) is the task of automatically transcribing a speech signal into text. Transcription accuracy is getting closer and closer to human levels, at around 95%. However, until now, it has been possible to achieve this level only in laboratories with good environmental conditions. Speech recognition in real-life environments and at long distances involves some important acoustic challenges, such as background noise, reverberation, echo cancellation, and microphone positioning. It is not enough to just have a good speech recognition engine. Every element in the system should perform to a high standard to prevent quality bottlenecks. The job of the microphone is to provide the best possible input signal to the speech recognition system. High input signal quality helps the ASR system analyze the incoming sound and find the features it needs to recognize the speech content. Key parameters include noise, distortion, frequency response, and phase.
A high AOP can help speech recognition systems in noisy environments. Sometimes the speech signal itself is not strong and there are other, louder interfering sounds. For example, in voice-controlled home entertainment systems and digital assistants, there are loudspeakers close to the microphone that can output loud music or voice information. A high AOP helps keep distortion low and improves noise and echo cancellation.
The farther the microphone is from the speech source, the lower the signal-to-noise ratio of the signal fed to the ASR algorithm. Therefore, the longer the target capture distance, the higher the microphone signal-to-noise ratio needs to be.
A key feature of speech recognition systems is the ability to ignore sounds and noise that are not the speech being transcribed. The quality of audio/video capture and human-to-human communication can also be improved by excluding unwanted sounds from the signal. The goal is to increase the signal-to-noise ratio, which in this case is the ratio of the wanted sound (signal) to the unwanted ambient sound (noise).
Combining multiple microphones with algorithms allows for noise cancellation and directionality. Directional techniques, such as beamforming, can focus the microphone system's sensitivity in the desired direction and highlight the desired sound source. Unwanted sounds can also be canceled based on parameters such as the level difference between two microphones. Blind source separation is a more sophisticated noise reduction method. It can remove noise regardless of its direction, distance, and position. All of these noise cancellation methods benefit from the accuracy and quality of their received signals. Microphones should have a high signal-to-noise ratio, low distortion, a flat frequency response (which also improves phase response), and low group delay.
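As an illustration of the beamforming idea mentioned above, here is a minimal sketch of a two-microphone delay-and-sum beamformer. It is an idealized example, not a description of any particular product: the arrival delays are whole numbers of samples, the microphones are noise-free, and the interferer frequency is chosen so that it cancels exactly after steering. All names and values are assumptions made for the illustration.

```python
import math


def delay_and_sum(channels, steering_delays):
    """Advance channel i by steering_delays[i] samples, then average."""
    length = min(len(ch) - d for ch, d in zip(channels, steering_delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, steering_delays)) / len(channels)
        for n in range(length)
    ]


def target(n):
    """Wanted sound: a sine with a period of 32 samples."""
    return math.sin(2 * math.pi * n / 32)


def interferer(n):
    """Unwanted sound: a sine with a period of 8 samples."""
    return math.sin(2 * math.pi * n / 8)


NUM_SAMPLES = 256
TARGET_DELAY = 4  # the target reaches mic 2 four samples later than mic 1

# The interferer arrives from another direction and hits both mics together.
mic1 = [target(n) + interferer(n) for n in range(NUM_SAMPLES)]
mic2 = [target(n - TARGET_DELAY) + interferer(n) for n in range(NUM_SAMPLES)]

# Steer toward the target: align its two copies, then average the channels.
output = delay_and_sum([mic1, mic2], [0, TARGET_DELAY])
residual = max(abs(output[n] - target(n)) for n in range(len(output)))
```

After steering, the two copies of the target add in phase, while the two copies of the interferer end up half a period apart and cancel, so `residual` is close to zero. In a real system any self-noise or distortion the microphones add would remain in the output, which is why the methods above benefit from high-quality input signals.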