

Convolutional Neural Networks: Have You Grasped These Concepts?

Latest update time:2024-10-21

With the rapid development of artificial intelligence (AI), AI can increasingly support applications that were previously impossible or difficult to realize. Against this background, this article explains convolutional neural networks (CNNs) and their significance for artificial intelligence and machine learning. A CNN is a powerful tool for extracting features from complex data, such as recognizing complex patterns in audio signals or images.


1. What Are Convolutional Neural Networks?


A neural network is a system, or structure, of neurons that enables AI to better understand data and solve complex problems. There are many types of neural networks, but this article focuses on convolutional neural networks (CNNs), whose main application areas are pattern recognition and classification of objects in input data. A CNN is an artificial neural network used in deep learning. Such a network consists of an input layer, several convolutional layers, and an output layer. The convolutional layers are the most important part: they use a unique set of weights and filters that allow the network to extract features from the input data, which can take many forms, such as images, audio, and text. This feature extraction lets CNNs recognize patterns in the data, which in turn lets engineers build more effective and efficient applications. To better understand CNNs, we first discuss classical linear program execution.


2. Linear Program Execution in Classical Control Engineering


The task of control engineering is to read data from sensors, process it, respond according to rules, and finally display or transmit the results. For example, a thermostat measures the temperature once a second; a microcontroller unit (MCU) reads the value from the temperature sensor, uses it as the input of a closed-loop control system, and compares it with the set temperature. This is an example of linear program execution on an MCU: a technique that compares pre-programmed values with actual values to reach an unambiguous conclusion. AI systems, in contrast, usually work on the basis of probabilities.
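The thermostat example can be sketched in a few lines. This is a minimal illustration of rule-based, linear program execution; the setpoint value and function names are made up for the sketch:

```python
SETPOINT_C = 21.0  # desired temperature (illustrative value)

def control_step(measured_c: float) -> str:
    """Classical control: compare a measured value with a
    pre-programmed setpoint and return an unambiguous decision."""
    if measured_c < SETPOINT_C:
        return "heater_on"
    return "heater_off"

print(control_step(19.5))  # below the setpoint -> heater_on
print(control_step(22.0))  # above the setpoint -> heater_off
```

Note that the output is always a clear yes/no decision; there is no notion of probability anywhere in the rule.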


3. Complex Pattern and Signal Processing


Many applications use input data that must first be identified by pattern recognition. Pattern recognition can be applied to different data structures; the examples in this article are limited to one- and two-dimensional structures, such as audio signals, electrocardiograms (ECG), photoplethysmograms (PPG), vibration data or other waveforms (one-dimensional), and thermal images or waterfall plots (two-dimensional).


For such pattern recognition, it is extremely difficult to implement the application in conventional MCU code. One example is recognizing a specific object (such as a cat) in an image; here it makes no difference whether the image to be analyzed was recorded long ago or has just been read from a camera. The analysis software searches for a cat based on specific rules: for example, a cat must have typical pointy ears, a triangular nose, and whiskers. If these features can be identified in the image, the software reports that it has found a cat. But problems arise: what if the image only shows the cat from behind? What if the cat has no whiskers, or lost a leg in an accident? Although these situations are unlikely, the pattern recognition code would have to cover every possible exception with a large number of additional rules. Even in this simple example, the rule set quickly becomes very complex.
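The rule explosion described above can be sketched as follows. The feature detectors here are hypothetical stubs, not real image-processing code; the point is only that each edge case forces yet another hand-written rule:

```python
# Hypothetical, stubbed feature detectors standing in for real
# image-analysis routines (names are made up for illustration).
def has_pointy_ears(img) -> bool:
    return True        # stub: pretend the ears were found

def has_triangular_nose(img) -> bool:
    return True        # stub: pretend the nose was found

def has_whiskers(img) -> bool:
    return False       # stub: whiskers occluded in this image

def is_cat(img) -> bool:
    # Every exception (cat seen from behind, whiskers missing,
    # a lost leg, ...) would require yet another rule here.
    return (has_pointy_ears(img)
            and has_triangular_nose(img)
            and has_whiskers(img))

print(is_cat(None))  # False: the rigid rules miss this cat
```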


4. How Machine Learning Replaces Classical Rules


The core idea behind AI is to imitate, on a small scale, the way humans learn. Instead of formulating a large number of if-then rules, it models a general machine for pattern recognition. The key difference between the two approaches is that, unlike a fixed set of rules, AI does not deliver an unambiguous result. Instead of reporting "I recognized a cat in the image," AI delivers a conclusion such as "there is a 97.5% probability that the image shows a cat; it may also be a leopard (2.1%) or a tiger (0.4%)." This means that, at the end of the pattern recognition process, the application developer must make the final decision, typically via a decision threshold.
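A minimal sketch of such a decision threshold, using the probabilities from the example above. The threshold value itself is an assumption chosen for illustration, not something prescribed by the method:

```python
# Class probabilities as a network might report them (from the
# article's example); the 0.9 threshold is an assumed app setting.
probabilities = {"cat": 0.975, "leopard": 0.021, "tiger": 0.004}
THRESHOLD = 0.9

# Pick the most probable class, then accept it only if it clears
# the application-specific decision threshold.
label, p = max(probabilities.items(), key=lambda kv: kv[1])
decision = label if p >= THRESHOLD else "uncertain"
print(decision)  # "cat": 0.975 exceeds the threshold
```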


Another difference is that AI does not rely on fixed rules but must be trained. Training means showing the neural network a large number of cat images from which it can learn. Eventually, the network can independently determine whether an image contains a cat. The crucial point is that the recognition is not limited to the known training images. The trained neural network then needs to be mapped onto an MCU.


5. What's Inside AI's Pattern Recognition?


AI's neural networks are similar to the biological neural networks of the human brain. A neuron has multiple inputs but only one output. Basically, such a neuron is a linear transformation of its inputs: each input is multiplied by a number (the weight w), a constant (the bias b) is added, and the result is passed through a fixed nonlinear function, also known as the activation function[1]. As the only nonlinear part of the network, the activation function defines the activation range of the artificial neuron's values. The function of a neuron can be mathematically described as

Out = f(w · x + b)

where f is the activation function, w is the weight, x is the input data, and b is the bias. The data can be a single scalar, a vector, or a matrix. Figure 1 shows a neuron with three inputs and a ReLU[2] activation function. Neurons in a network are always arranged in layers.


Figure 1. A neuron with three inputs and one output.
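The neuron in Figure 1 can be sketched directly from the formula Out = f(w · x + b). The weights, bias, and inputs below are made-up illustrative values, not trained ones:

```python
def relu(z: float) -> float:
    """ReLU activation: zero for negative inputs, identity otherwise."""
    return max(0.0, z)

def neuron(x, w, b):
    """One artificial neuron: weighted sum plus bias, then activation."""
    return relu(sum(wi * xi for wi, xi in zip(w, x)) + b)

x = [0.5, -1.0, 2.0]   # three inputs
w = [0.8, 0.2, 0.3]    # weights (assumed values)
b = 0.1                # bias (assumed value)
print(neuron(x, w, b))
```

With these values the weighted sum is 0.4 - 0.2 + 0.6 + 0.1 = 0.9, which ReLU passes through unchanged.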


As mentioned above, CNNs are used for pattern recognition and object classification of input data. CNNs are divided into different parts: an input layer, several hidden layers, and an output layer. Figure 2 shows a small network that contains an input layer with three inputs, a hidden layer with five neurons, and an output layer with four outputs. The outputs of all neurons are connected to all inputs of the next layer. The network shown in Figure 2 is not realistic and is only used for illustration. Even for this small network, the equation used to describe the network has 32 biases and 32 weights.


The CIFAR neural network is a type of CNN widely used for image recognition. It consists of two main layer types: convolutional layers and pooling layers, which use the operations of convolution and pooling, respectively, and are very effective for training neural networks. Convolutional layers use a mathematical operation called convolution to identify patterns in arrays of pixel values. Convolution takes place in the hidden layers, as shown in Figure 3, and is repeated many times until the desired level of accuracy is reached. The output value of a convolution operation is always particularly high when the two inputs being compared (here, the input image and the filter) are similar. The filter is sometimes also called a convolution kernel. The result is then passed to a pooling layer, which extracts features and generates a feature map representing the important characteristics of the input data; this operation is called pooling and requires another filter, the pooling filter. After training, when the network is running, the feature map is compared with the input data. Because the feature map retains object-specific characteristics, a neuron's output is only triggered when the content is similar. By combining convolution and pooling, the CIFAR network can recognize and classify various objects in images with high accuracy.
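The convolution step described above can be sketched with a plain loop, without any library. The 4×4 input and 2×2 kernel values are made up; note how the output is largest where the input matches the kernel's diagonal pattern:

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2-D convolution: slide the kernel
    over the image and sum the elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(
                image[i + u][j + v] * kernel[u][v]
                for u in range(kh) for v in range(kw)
            )
    return out

image = [[1, 0, 0, 1],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [1, 0, 0, 1]]
kernel = [[1, 0],
          [0, 1]]  # responds strongly to diagonal structure
print(conv2d(image, kernel))  # [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
```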


Figure 2. A small neural network


Figure 3. CIFAR network model trained with the CIFAR-10 dataset


CIFAR-10 is a specific dataset commonly used to train CIFAR neural networks. It consists of 60,000 32×32 color images in 10 classes, collected from various sources such as web pages, news, and personal image collections. Each class contains 6,000 images, split between a training set of 50,000 images and a test set of 10,000 images, making it an ideal image set for testing computer vision and other machine learning models.


The main difference between convolutional neural networks and other types of networks lies in how they process data. Through filtering, a CNN examines the properties of the input data step by step; the greater the number of serial convolutional layers, the finer the details it can recognize. The process starts with simple object properties (such as edges or points) after the first convolution; a second convolution then recognizes more detailed structures such as corners, circles, and rectangles. After the third convolution, the features represent complex patterns that resemble parts of objects in the image and are usually unique to a given object class. In our initial example, these features would be the cat's whiskers or ears. Visualizing the feature maps (as in Figure 4) is not necessary for the application itself, but it helps in understanding convolution.


Even a small network like CIFAR has hundreds of neurons per layer and many serially connected layers. As the complexity and size of the network increase, the number of required weights and biases grows rapidly. The CIFAR-10 example in Figure 3 already has 200,000 parameters, each of which requires a value to be determined during training. Feature maps can be further processed by pooling layers to reduce the number of parameters that need to be trained while retaining the important information.


Figure 4. Feature map of CNN


As mentioned above, each convolution in a CNN is usually followed by pooling, often referred to in the literature as subsampling, which helps reduce the dimensionality of the data. Many areas of the feature map in Figure 4 contain little or no meaningful information, because the object covers only a small part of the image; the rest of the image is not reflected in the feature map and is therefore irrelevant for classification. In a pooling layer, the pooling type (maximum or average pooling) and the size of the pooling window matrix are specified. During pooling, the window matrix is moved stepwise over the input data; max pooling, for example, keeps the largest value in the window and discards all others. In this way the amount of data is steadily reduced until, finally, attributes unique to each object class are formed.
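Max pooling as just described can be sketched as follows, with a 2×2 window moved in steps of 2 (the feature-map values are made up):

```python
def max_pool(fmap, size=2):
    """Max pooling: keep the largest value in each size x size window
    (stride equal to the window size), discarding all other values."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(
                fmap[i + u][j + v]
                for u in range(size) for v in range(size)
            ))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 2],
        [2, 0, 1, 3]]
print(max_pool(fmap))  # [[4, 2], [2, 5]]
```

The 4×4 map shrinks to 2×2: three quarters of the values are discarded while the strongest responses survive.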


The result of convolution and pooling is a large two-dimensional matrix. To reach our actual goal, classification, the two-dimensional data is transformed into a long one-dimensional vector. This transformation takes place in a so-called flattening layer, followed by one or two fully connected layers, whose neurons are similar to the structure shown in Figure 2. The number of outputs of the last layer must match the number of classes to be distinguished. In addition, the data in the last layer is normalized to yield a probability distribution (97.5% cat, 2.1% leopard, 0.4% tiger, and so on).
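The flatten, fully connected, and normalization steps can be sketched as follows. The weights and biases are made-up values rather than trained ones, and softmax stands in for the normalization producing the probability distribution:

```python
import math

def flatten(fmap):
    """Turn a 2-D feature map into one long 1-D vector."""
    return [v for row in fmap for v in row]

def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output."""
    return [sum(w * xi for w, xi in zip(ws, x)) + b
            for ws, b in zip(weights, biases)]

def softmax(z):
    """Normalize raw scores into a probability distribution."""
    m = max(z)                                # for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

x = flatten([[0.5, 1.0], [0.0, 2.0]])         # 2x2 map -> 4 inputs
weights = [[0.2, 0.1, 0.0, 0.5],              # one row per class
           [0.1, 0.0, 0.3, 0.1],              # (assumed values)
           [0.0, 0.2, 0.1, 0.0]]
biases = [0.1, 0.0, -0.1]
probs = softmax(dense(x, weights, biases))
print(probs)  # three probabilities summing to 1
```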


This is the entire process of neural network modeling. However, the weights and contents of the convolution kernels and filters are still unknown and must be determined through network training to enable the model to work. This will be explained in subsequent articles.


[1] Typically a sigmoid, tanh, or ReLU function.
[2] ReLU: rectified linear unit. For this function, the output is zero when the input is negative and equal to the input when the input is greater than zero.



