
Understanding Voices - How Can ADI's Artificial Intelligence Significantly Improve Equipment Uptime?

Last updated: 2021-09-05 12:34

Anyone familiar with equipment maintenance knows how important the sounds and vibrations a machine emits are. Proper health monitoring through sound and vibration can cut maintenance costs in half and double service life. Collecting and analyzing acoustic data in real time is another important approach to condition-based monitoring (CbM).


We can learn the sounds a piece of equipment normally makes. When the sound changes, we can recognize an anomaly, and we can then learn to link that sound to a specific problem. Recognizing an anomaly may take only a few minutes of training, but learning to connect sound, vibration, and cause for a diagnosis can take a lifetime. Experienced technicians and engineers may have this knowledge, but they are a scarce resource, and identifying problems from sound alone can be difficult even with recordings, descriptive frameworks, or in-person training from an expert.



Therefore, the ADI team has spent the past 20 years working to understand how humans interpret sound and vibration. Our goal is to build a system that can learn the sounds and vibrations of a machine, decipher their meaning to detect anomalous behavior, and perform diagnostics. This article details the architecture of OtoSense, an equipment health monitoring system built around what we call computer hearing, which allows computers to understand the main indicators of a machine's behavior: sound and vibration.


The system works with any machine and runs in real time without a network connection. It has been used in industrial applications to implement scalable and efficient equipment health monitoring.


This article explores the principles that guided the development of OtoSense and the role human hearing played in its design. It then discusses how sound and vibration features are designed, how to interpret what they represent, and how OtoSense continuously learns and improves in order to perform ever more complex diagnostics with more accurate results.


Guiding Principles


To ensure robustness, agnosticism, and efficiency, the OtoSense design philosophy is guided by several principles:

  • Draw inspiration from human neurology. Humans can learn and understand any sound they hear in a very energy-efficient way.

  • Learn both stationary and transient sounds. This requires continuous tuning of features and ongoing monitoring.

  • Perform recognition at the endpoint, close to the sensor. There should be no need to connect to a remote server over the network to make decisions.

  • Interact with experts and learn from them in a way that interferes with their daily work as little as possible and makes the process as pleasant as possible.

The human auditory system and OtoSense analysis


Hearing is a sense that is essential to survival. It is a holistic sense of distant, unseen events that matures before birth.


The process by which humans perceive sound can be described using four familiar steps: analog acquisition of sound, digital conversion, feature extraction, and interpretation. In each step, we will compare the human ear to the OtoSense system.


  • Analog acquisition and digitization. A membrane and lever in the middle ear capture sound and adjust impedance to transmit the vibrations into a fluid-filled cavity, where another membrane is selectively displaced according to the spectral components present in the signal. This in turn bends elastic units (hair cells) that emit digital signals reflecting the degree and strength of the bending. These individual signals are then transmitted to the primary auditory cortex via parallel nerves arranged by frequency.

    • In OtoSense, this work is done by the sensor, amplifier, and codec. The digitization process uses a fixed sampling rate adjustable between 250 Hz and 196 kHz, with the waveform encoded at 16 bits and stored in a buffer of between 128 and 4096 samples.


  • Feature extraction occurs in the primary auditory cortex: frequency-domain features such as the dominant frequency, harmonics, and spectral shape, and time-domain features such as impulses, intensity variations, and variations in the main frequency components within a time window of approximately 3 seconds.

    • OtoSense uses a time window, which we call a block, that moves with a fixed step size. The block size and the step size range from 23 milliseconds to 3 seconds, depending on the events that need to be recognized and the sample rate of the features extracted at the endpoint (a minimal sketch of this blocking scheme follows this list). The next section explains the features extracted by OtoSense in more detail.


  • Interpretation occurs in the association cortex, which fuses all perceptions and memories and gives meaning to sounds (for example, through language), playing a central role in shaping perception. Interpretation organizes our descriptions of events and goes far beyond simply naming them. Giving a name to an item, a sound, or an event allows us to attach larger, richer layers of meaning to it. For experts, names and meanings allow them to better understand their surroundings.

    • That’s why OtoSense’s interaction with people begins with a visual, unsupervised sound mapping based on human neurology. OtoSense uses a graphical representation of all the sounds or vibrations it hears, arranged by similarity without trying to impose fixed categories. This allows experts to organize and name the groups displayed on the screen without being forced into artificially bounded categories. They can build a semantic map based on their knowledge, perception, and expectations of OtoSense’s final output. An auto mechanic, an aerospace engineer, or a cold forging press expert, or even people working in the same field but at different companies, may divide, organize, and label the same soundscape in different ways. OtoSense builds meaning with the same bottom-up approach that shapes the meanings in human language.
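
As a rough illustration of the blocking scheme described above, the sketch below slices a digitized waveform into overlapping blocks. The sample rate, block length, and step size are placeholder values chosen within the ranges quoted earlier; OtoSense's actual implementation is not public.

```python
import numpy as np

def make_blocks(waveform, sample_rate_hz, block_s, step_s):
    """Slice a 1-D waveform into fixed-size, possibly overlapping blocks.

    block_s and step_s are illustrative values within the 23 ms to 3 s
    range mentioned in the article, not OtoSense's actual settings.
    """
    block_len = int(block_s * sample_rate_hz)
    step_len = int(step_s * sample_rate_hz)
    blocks = [waveform[start:start + block_len]
              for start in range(0, len(waveform) - block_len + 1, step_len)]
    return np.array(blocks)

# Example: 1 second of 16-bit audio sampled at 48 kHz, 46 ms blocks, 23 ms step.
fs = 48_000
signal = np.random.randint(-2**15, 2**15, size=fs, dtype=np.int16)
blocks = make_blocks(signal, fs, block_s=0.046, step_s=0.023)
print(blocks.shape)  # (number_of_blocks, samples_per_block)
```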


From sound and vibration to features


A feature is a single number that describes a given property of the sound or vibration over a period of time (the time window, or block, described earlier). The OtoSense platform selects features based on the following principles:

  • For both the frequency and time domains, the features should describe the environment as completely as possible, with as much detail as possible. They must describe stationary hums, as well as clicks, rattles, squeaks, and any sounds that change momentarily.

  • Features should form a set as orthogonal as possible. If a feature is defined as "average amplitude over a block", there should not be another feature that is highly correlated with it, such as "total spectral energy over a block". Of course, orthogonality may never be achieved, but no feature should be expressed as a combination of the others, and each feature must contain a single piece of information.

  • Features should minimize computation. Our brains only know addition, comparison, and resetting to zero. Most OtoSense features are designed to be incremental, so that each new sample can modify a feature with a simple operation, without recomputing it over a full buffer or, worse, over a block (a minimal sketch of this incremental idea follows this list). Minimizing computation also means that standard physical units can be ignored: for example, there is no point in trying to express intensity as a value in dBA during feature extraction. If a dBA value is needed, it can be computed at the output stage.
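
To make the incremental idea concrete, here is a minimal sketch of a feature that maintains a running mean and peak amplitude using only additions and comparisons; the class and its names are hypothetical and only illustrate the principle.

```python
class IncrementalAmplitudeFeature:
    """Running mean and peak of absolute amplitude, updated one sample at a time.

    Illustrative only: each update costs one addition and one comparison,
    so the feature never has to be recomputed over a full buffer or block.
    """

    def __init__(self):
        self.count = 0
        self.sum_abs = 0.0
        self.peak = 0.0

    def update(self, sample):
        value = abs(sample)
        self.count += 1
        self.sum_abs += value
        if value > self.peak:
            self.peak = value

    @property
    def mean_amplitude(self):
        return self.sum_abs / self.count if self.count else 0.0


feat = IncrementalAmplitudeFeature()
for s in (0.1, -0.4, 0.25, -0.05):
    feat.update(s)
print(feat.mean_amplitude, feat.peak)  # 0.2 0.4
```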


Of the 2 to 1024 features on the OtoSense platform, some describe the time domain. They are extracted either directly from the waveform or from the evolution of any other feature across the block. These include the average and maximum amplitude, the complexity derived from the linear length of the waveform, the amplitude variation, the presence and characteristics of pulses, the stability measured as the similarity between the first and last buffers, the ultra-short autocorrelation obtained by convolution, and the variation of the main spectral peaks.
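
A few of these time-domain quantities can be approximated in a handful of lines. The definitions below (mean and maximum amplitude, linear length as a complexity proxy, amplitude variation) are generic signal-processing formulations used for illustration; OtoSense's exact feature set is proprietary.

```python
import numpy as np

def time_domain_features(block):
    """Generic time-domain descriptors for one block of samples (illustrative)."""
    block = np.asarray(block, dtype=float)
    return {
        "mean_amplitude": np.mean(np.abs(block)),
        "max_amplitude": np.max(np.abs(block)),
        # Linear length of the waveform: a simple complexity proxy that grows
        # with how much the signal moves up and down within the block.
        "linear_length": np.sum(np.abs(np.diff(block))),
        # Amplitude variation across the block.
        "amplitude_std": np.std(block),
    }

print(time_domain_features(np.sin(np.linspace(0, 20 * np.pi, 1024))))
```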


The frequency-domain features are extracted from FFTs computed on each buffer, producing outputs of 128 to 2048 individual frequencies. The process then builds a vector of the required dimensionality, much smaller than the FFT, that still describes the environment in detail. OtoSense initially uses an agnostic approach that creates equal-size buckets on a log-frequency scale. Depending on the environment and the events to be recognized, these buckets are then focused on the areas of the spectrum with high information density, either from an unsupervised perspective that maximizes entropy or from a semi-supervised perspective that uses labeled events as a guide. This mimics the cellular structure of our inner ear, which is denser in the region of the spectrum where linguistic information is densest.
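
As a sketch of the agnostic starting point, the code below averages FFT magnitudes into equal-size buckets on a log-frequency axis. The number of buckets and the frequency limits are placeholders, and the later refocusing of buckets toward information-dense regions is not shown.

```python
import numpy as np

def log_spectral_buckets(block, sample_rate_hz, n_buckets=32, f_min=20.0):
    """Average FFT magnitudes into logarithmically spaced frequency buckets."""
    spectrum = np.abs(np.fft.rfft(block))
    freqs = np.fft.rfftfreq(len(block), d=1.0 / sample_rate_hz)
    edges = np.logspace(np.log10(f_min), np.log10(sample_rate_hz / 2), n_buckets + 1)
    buckets = np.zeros(n_buckets)
    for i in range(n_buckets):
        in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        buckets[i] = spectrum[in_band].mean() if in_band.any() else 0.0
    return buckets

fs = 48_000
block = np.sin(2 * np.pi * 1000 * np.arange(2048) / fs)  # 1 kHz test tone
print(log_spectral_buckets(block, fs).round(2))
```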


Architecture: supporting the endpoint and local data


OtoSense performs anomaly detection and event recognition at the endpoint, without the need for any remote device. This architecture ensures that the system is not affected by network failures and does not need to send every block of raw data out for analysis. The endpoint device running OtoSense is a self-contained system that describes the behavior of the machine it listens to in real time.


The OtoSense servers running the AI and HMI are typically hosted on site. A cloud architecture can make sense for aggregating the meaningful data streams output by multiple OtoSense devices, but for an AI dedicated to processing large amounts of data and interacting with hundreds of devices at a single site, cloud hosting offers little benefit.


Figure 1. OtoSense system


From features to anomaly detection


The normal/abnormal assessment does not require much interaction with experts. Experts only need to help establish a baseline that represents the normal sounds and vibrations of the machine. This baseline is then converted into an anomaly model on the OtoSense server and pushed to the device.


We then use two different strategies to assess whether the incoming sound or vibration is normal:

  • The first strategy is what we call "normality", which examines the surroundings of any new sound that enters the feature space, its distance from baseline points and clusters, and the size of these clusters. The larger the distance and the smaller the clusters, the more unusual the new sound and the higher the outlier value. When this outlier value is above an expert-defined threshold, the corresponding block is marked as unusual and sent to the server for expert review.

  • The second strategy is very simple: any incoming block with a feature value above the baseline maximum or below the baseline minimum for that feature is marked as "extreme" and sent to the server.


The combination of the unusual and extreme strategies covers abnormal sounds and vibrations well, and excels at detecting both gradual wear and tear and sudden, unexpected events.
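
A toy version of these two strategies might look like the sketch below: a distance-to-baseline outlier score standing in for the "unusual" check, plus a per-feature min/max envelope for the "extreme" check. The nearest-neighbor distance and the threshold value are assumptions made for illustration; the real system also weighs cluster sizes.

```python
import numpy as np

def outlier_score(feature_vector, baseline_features):
    """Distance from the new block to its nearest baseline block (illustrative)."""
    distances = np.linalg.norm(baseline_features - feature_vector, axis=1)
    return distances.min()

def is_extreme(feature_vector, baseline_min, baseline_max):
    """'Extreme' check: any feature outside the baseline min/max envelope."""
    return bool(np.any(feature_vector < baseline_min) or
                np.any(feature_vector > baseline_max))

baseline = np.random.rand(500, 8)        # 500 baseline blocks, 8 features each
new_block = np.random.rand(8) * 1.5      # feature vector of an incoming block
score = outlier_score(new_block, baseline)
unusual = score > 0.6                    # expert-defined threshold (placeholder)
extreme = is_extreme(new_block, baseline.min(axis=0), baseline.max(axis=0))
print(score, unusual, extreme)
```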


From features to event recognition


Features belong to the physical domain, and meaning belongs to human cognition. To connect features to meaning, an interaction between OtoSense AI and human experts is required. We spent a lot of time studying customer feedback and developing a human-machine interface (HMI) that allows engineers to efficiently interact with OtoSense and design event recognition models. This HMI allows exploring data, labeling data, creating anomaly models and sound recognition models, and testing these models.


The OtoSense Sound Platter (also known as splatter) allows exploring and labeling sounds with a complete overview of the dataset. Splatter selects the most interesting and representative sounds in the full dataset and displays them as a 2D similarity map that mixes labeled and unlabeled sounds.


Figure 2. 2D splatter sound map in OtoSense Sound Platter.
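
A 2D similarity map in the spirit of Splatter can be approximated with an off-the-shelf embedding. The sketch below projects hypothetical block feature vectors to two dimensions with scikit-learn's t-SNE; this is purely illustrative and is not OtoSense's actual projection method.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical feature vectors for 300 sound blocks (8 features each).
features = np.random.rand(300, 8)

# Project to 2D so that similar-sounding blocks land near each other.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)
print(embedding.shape)  # (300, 2): one x/y point per block, ready to plot
```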


Any sound or vibration, including its environment, can be visualized in many different ways - for example, using a Sound Widget (also known as a Swidget).


Figure 3. OtoSense sound widget (swidget).


At any time, an anomaly model or an event recognition model can be created. The event recognition model can be displayed as a circular confusion matrix that lets OtoSense users explore events that are being confused with one another.


Figure 4. Event recognition models can be created based on the desired events.


Anomalies can be examined and marked through an interface that displays all unusual and extreme sounds.


Figure 5. Sound analysis over time in the OtoSense anomaly visualization interface.


Continuous learning process—from anomaly detection to increasingly complex diagnostics


OtoSense is designed to learn from multiple experts and, over time, perform increasingly complex diagnostics. A common process is a cycle between OtoSense and the experts (a minimal sketch of the loop follows the list):

  • The anomaly models and event recognition models run on the endpoint. For each block, these models output the probability of each potential event and an anomaly value.

  • An unusual sound or vibration that exceeds a defined threshold triggers an anomaly notification. Technicians and engineers using OtoSense can then examine the sound and its context.

  • These experts then flag this unusual event.

  • New recognition models and anomaly models incorporating this new information are computed and pushed to the endpoint device.
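
The cycle can be pictured as a simple loop. Everything in the sketch below (the function names, the anomaly score, the retraining call) is hypothetical and only mirrors the four steps above.

```python
def monitoring_cycle(blocks, model, threshold, ask_experts, retrain):
    """Hypothetical edge/expert loop mirroring the four steps above."""
    flagged = []
    for block in blocks:
        score = model.anomaly_score(block)   # step 1: run models at the endpoint
        if score > threshold:                # step 2: notify on unusual blocks
            flagged.append(block)
    labels = ask_experts(flagged)            # step 3: experts review and label
    return retrain(model, flagged, labels)   # step 4: push an updated model
```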


Conclusion


OtoSense technology from ADI is designed to make sound and vibration expertise continuously available on any machine, with no network connection needed to perform anomaly detection and event recognition. The technology is seeing growing use for equipment health monitoring in aerospace, automotive, and industrial applications, and it has shown good performance in scenarios that previously required human expertise as well as in embedded applications, especially with complex machines.
