Autonomous driving sensor: camera deep learning vision technology

Publisher: 数字狂舞 | Last updated: 2021-05-26 | Source: eefocus

Introduction

Traditional camera vision technology is relatively easy to implement algorithmically, so most existing car manufacturers have adopted it for assisted-driving functions. With the development of autonomous driving, however, algorithms based on deep learning have begun to take over. In this issue, the editor walks through the technical basics of deep learning vision algorithms. Let us learn together.

 

 

01. Overview of Deep Learning

Deep learning (DL) is a general term for a class of pattern analysis methods and a newer research direction within machine learning (ML). By learning the inherent patterns and levels of representation in sample data, deep learning gives machines a human-like ability to analyze and learn, enabling them to recognize data such as text, images, and sounds, and thereby realizing artificial intelligence (AI).

 

Figure: Relationship between artificial intelligence, machine learning, and deep learning

 

02. Significance of Deep Learning

Many of you may know that for a car to drive autonomously, three major systems are indispensable: perception, decision-making, and control. We put perception first because the vehicle must first understand, in real time, how it relates to the changing three-dimensional world around it, that is, the positions and movements of nearby people, vehicles, obstacles, and road elements. Deep learning algorithms have markedly raised the "intelligence" of sensors such as cameras and lidars, and that intelligence largely determines how reliable an autonomous vehicle is in complex road conditions, which is why applying deep learning has become the key. Moreover, although a car carries many types of perception sensors, the camera is the only one that perceives the real world through images, and deep learning can rapidly improve its image-recognition ability, making our driving safer.

 

03. Differences between traditional camera vision algorithms and deep learning algorithms

Friends who have read the editor's previous article on traditional camera vision algorithms may ask: since traditional vision algorithms already work, why do we still need to study deep learning algorithms?

 

The answer is that traditional vision algorithms have their own bottlenecks. Whether the setup is a monocular camera or a multi-camera rig, traditional vision algorithms rely on hand-crafted feature extraction to build sample feature libraries for recognition and calculation. If, while the autonomous vehicle is driving, an object is missing from the feature library or its stored features are inaccurate, the traditional algorithm simply cannot recognize it. Traditional algorithms also segment complex scenes poorly. So traditional vision based on hand-crafted features hits a performance ceiling and cannot fully meet the object-detection needs of autonomous driving.

 

Image source: Deep Learning vs. Traditional Computer Vision

 

The advantage of a camera deep learning vision algorithm lies in its feature extraction: it is built on neural networks that mimic the brain's networks of neurons, and it can semantically segment the images fed in by the camera (and even lidar point clouds). This sidesteps the traditional algorithms' weaknesses with complex real-world scenes and fixed feature libraries, allowing higher accuracy on tasks such as image classification, semantic segmentation, object detection, and simultaneous localization and mapping (SLAM).

 

Next, to make this easier to understand, I will first explain what deep learning neural networks are, how they help the camera carry out visual computations such as image recognition, and why they outperform traditional camera vision algorithms.

 

04. Deep Learning Neural Network

Taken literally, deep learning is "depth" plus "learning". "Depth" imitates the way neurons in the brain pass and process information. The model structure consists of an input layer, hidden layers, and an output layer; the input and output layers generally have one layer each, while the hidden (middle) layers often number 5, 6, or even more. These many hidden-layer nodes are the "depth" in deep learning. "Learning" means feature learning, or representation learning: through layer-by-layer feature transformations, the representation of a sample in its original space is mapped into a new feature space, and large amounts of data are used to train and tune the network, building an appropriate number of neuron nodes and a multi-layer hierarchy that approximates the true underlying relationship as closely as possible, so that classification or prediction from the features becomes easier.

 

Figure: Schematic diagram of neural network structure

 

If the above sounds too abstract, here it is in simple terms. A neural network has three parts:

 

Input: each neuron in the input layer corresponds to one feature variable; input neurons are essentially containers holding numbers.

 

Output: the output layer has one neuron for regression problems and multiple neurons for classification problems.

 

Parameters: all the parameters in the network, that is, the weights and biases of the neurons in the middle (hidden) layers. Each neuron represents a feature the network has learned at that layer.

 

Here you just need to remember that no matter how large a neural network is, it is built by stacking single neurons, as the sketch below shows.
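To make that "stack of single neurons" concrete, here is a minimal sketch of the forward pass of such a network, assuming NumPy as the only dependency; the layer widths and random weights are purely illustrative, not a real perception network.

```python
import numpy as np

def relu(x):
    # Element-wise non-linearity applied between layers
    return np.maximum(0.0, x)

def forward(x, layers):
    # Each layer is just many single neurons side by side: z = W @ x + b
    *hidden, (W_out, b_out) = layers
    for W, b in hidden:
        x = relu(W @ x + b)
    return W_out @ x + b_out   # no activation on the output (regression-style)

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 6)), np.zeros(4)),  # input (6 features) -> hidden layer of 4 neurons
    (rng.normal(size=(3, 4)), np.zeros(3)),  # hidden layer of 4 -> hidden layer of 3
    (rng.normal(size=(1, 3)), np.zeros(1)),  # hidden layer of 3 -> single output neuron
]

x = rng.normal(size=6)      # one sample with 6 input features
print(forward(x, layers))   # a single number, e.g. a predicted price
```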

     

If that is still hard to grasp, let me explain with an example.

     

Suppose we want to buy a house; the final transaction price we can afford is the output layer;

     

The input layer holds the many raw features, that is, the factors that go into buying a house, such as house size, number of rooms, number of nearby schools, school quality, public transportation, and parking spaces;

     

The neurons in the middle (hidden) layer are the features the network can learn, such as family size, education quality, and travel convenience.

     

     

The more input feature data we collect, the more capable the neural network becomes. As the number of raw-feature neurons in the input layer grows, the middle layers can learn richer and more detailed combined features with distinct meanings. For example, the floor area and the number of rooms together indicate family size, while the number of nearby schools and their quality indicate education quality. By classifying, aggregating, and computing over the features held by each neuron, we finally obtain the "house price" we want at the output layer.
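As a rough sketch of this house-price example, assuming PyTorch is available; the feature values, prices, and layer sizes below are made up purely for illustration.

```python
import torch
import torch.nn as nn

# Made-up raw features: size (m^2), rooms, nearby schools, school quality, transit, parking
x = torch.tensor([[120.0, 3.0, 2.0, 8.5, 1.0, 1.0],
                  [ 60.0, 1.0, 1.0, 6.0, 1.0, 0.0]])
y = torch.tensor([[450.0], [180.0]])   # made-up transaction prices (the regression target)

model = nn.Sequential(
    nn.Linear(6, 4),   # hidden layer: 4 neurons that can learn combined features
    nn.ReLU(),         #   (e.g. "family size" from area + rooms)
    nn.Linear(4, 1),   # output layer: one neuron, because price is a regression target
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for _ in range(500):               # tiny training loop on the toy data
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(model(x))                    # predicted prices after training on the toy data
```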

     

For camera deep learning, the input layer is the image captured by the camera. To the algorithm, the image is just a stream of data, and that stream can be broken into raw features such as the sparsity or density of each pixel, semantic and geometric information, color, brightness, grayscale, and so on. The middle layers classify and compute over these raw features and identify the objects in the image (lane lines, obstacles, pedestrians, cars, traffic lights, and so on). The output layer then delivers, in real time, the distance, size, shape, traffic-light color, and other attributes of the objects relevant to the vehicle, helping the autonomous car perceive its surroundings, recognize objects, and measure distances in real time.
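A minimal sketch of this idea in PyTorch follows; the class list, layer sizes, and input resolution are assumptions for illustration, and real perception networks are far larger and also regress positions and distances.

```python
import torch
import torch.nn as nn

# Hypothetical object classes a camera perception network might distinguish
CLASSES = ["lane_line", "pedestrian", "vehicle", "traffic_light", "obstacle"]

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # input layer: RGB frame as a data stream
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # middle layers: extract edges, shapes, colors
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, len(CLASSES)),                  # output layer: one score per object class
)

frame = torch.rand(1, 3, 224, 224)     # stand-in for one camera image
scores = model(frame)
print(scores.softmax(dim=1))           # probabilities over the hypothetical classes
```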

     

Figure: NavInfo camera vision recognition sample

     

Figure: NavInfo underground garage mapping and real-time relocalization system

     

From the above, it is clear that neural-network-based deep learning vision is far more capable than traditional camera vision based on hand-crafted feature extraction. That is why today's mainstream camera vision algorithms use deep learning to improve the accuracy, recognition rate, and image-processing speed of self-driving cars on tasks such as image classification, semantic segmentation, object detection, multi-target tracking, drivable-area detection, simultaneous localization and mapping (SLAM), and scene analysis. Deep learning vision algorithms are also what make rapid mass production of self-driving cars possible.

     

05. Camera Deep Learning Algorithms

There are three common deep learning vision algorithms used in autonomous driving camera sensors:

     

(1) A neural network built on convolution operations, namely the convolutional neural network (CNN), which is widely used in image recognition.

     

(2) Autoencoder neural networks built from multiple layers of neurons, including autoencoders and sparse coding, which have received widespread attention in recent years.

     

(3) The deep belief network (DBN): pre-training is performed with a multi-layer autoencoder-style network, and the DBN weights are then further optimized using discriminative (label) information (a minimal autoencoder sketch follows this list).
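To give a feel for categories (2) and (3), here is a minimal single autoencoder in PyTorch, trained to reconstruct its own input; stacking several such layers and training them one by one is the kind of unsupervised pre-training those methods rely on. The sizes and random data below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(784, 64),   # encoder: a flattened 28x28 image -> a 64-dimensional code
    nn.ReLU(),
    nn.Linear(64, 784),   # decoder: reconstruct the original 784 values from the code
)

x = torch.rand(16, 784)   # a batch of stand-in image vectors
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(x), x)   # reconstruction error is the training signal
    loss.backward()
    optimizer.step()

print(loss.item())   # reconstruction loss after the short training run
```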

     

Figure: General process of deep learning

     

06. Deep Learning Is a Black Box

Although we have said a lot about how neural-network-based deep learning turns inputs into outputs, the examples and algorithm categories above only give a rough intuition for how it works. In reality, deep learning is a "black box": its intermediate process is opaque and its results are not fully controllable. Programmers do not really know how the network they build learns; they only know that, in the spirit of the universal approximation theorem, the trained network fits the relationship between input data and output results as accurately as it can.
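As a small illustration of that fitting behaviour (a sketch assuming PyTorch; the target function sin(x) and the network size are arbitrary choices): the trained network approximates the input-output relationship well, yet inspecting its weights tells us little about how.

```python
import torch
import torch.nn as nn

x = torch.linspace(-3.14, 3.14, 200).unsqueeze(1)   # inputs
y = torch.sin(x)                                    # the relationship we want to fit

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

for _ in range(2000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())          # small error: the fit is good...
print(net[0].weight[:3])    # ...but the learned weights are not human-readable
```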
