Comparison of common autonomous driving algorithms in the market

Publisher: WiseThinker | Last updated: 2024-02-18 | Source: elecfans

At present, academia still scores autonomous driving algorithms with leaderboards: an algorithm is trained and tested on a common benchmark dataset, and its score is used to rank it against rivals. The most widely used leaderboard dataset in the autonomous driving community is nuScenes, under Aptiv. Strictly speaking, comparing autonomous driving algorithms by leaderboard score is almost impossible to do fairly. Accuracy alone is not a fair basis for comparison; the efficiency and deployability of each algorithm must also be considered, and the structure of the training dataset also affects measured performance. Moreover, because deep learning models are not interpretable, good performance on nuScenes does not guarantee good performance on other datasets (it may be very poor), and likewise, poor performance on nuScenes does not imply poor performance elsewhere. The amount of computing power an algorithm consumes also says nothing about its accuracy.


The nuScenes benchmark covers six tasks: 3D object detection, object tracking, trajectory prediction, lidar segmentation, panoptic segmentation, and planning. Among them, 3D object detection is the most fundamental task in autonomous driving: nearly 300 teams and companies worldwide have submitted results, making it the most heavily contested autonomous driving benchmark in the world and a good indication of its authority. No leaderboard has been published for the planning task because too few teams have entered. Participation in object tracking and trajectory prediction is relatively high, while participation in lidar segmentation and panoptic segmentation is very low, with fewer than 20 entrants each.


Most of the recent leaderboard winners are Chinese companies or universities; outside China, interest in these benchmarks is limited, and even in the United States a large share of autonomous driving researchers are Chinese. Few carmakers enter the leaderboard. In the early days there were entries from companies such as Mercedes-Benz and Bosch; Mercedes-Benz's results were poor, while Bosch's were respectable. The reason carmakers stay away is simple: a good result goes unnoticed by consumers, while a poor one hands competitors ammunition, so most simply do not enter. Those that do enter, such as Leapmotor and SAIC, are very confident in their abilities.

The top 15 are as follows:

[Image: top 15 of the nuScenes 3D object detection leaderboard]

Source: Public information compilation

The nuScenes dataset was inspired by the pioneering KITTI dataset (released in 2012 by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago). nuScenes was the first large-scale dataset to provide data from an autonomous vehicle's entire sensor suite (6 cameras, 1 lidar, 5 radars, GPS, IMU). Compared with KITTI, nuScenes contains seven times more object annotations. The full dataset includes approximately 1.4M camera images, 390k lidar sweeps, 1.4M radar sweeps, and 1.4M object bounding boxes across 40k keyframes. To support common computer vision tasks such as object detection and tracking, 23 object classes are annotated with accurate 3D bounding boxes at 2 Hz over the entire dataset, and object-level attributes such as visibility, activity, and pose are annotated as well.
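For readers who want to explore the dataset directly, here is a minimal sketch using the official nuscenes-devkit Python package; the local path and the choice of the v1.0-mini split are illustrative assumptions:

```python
# pip install nuscenes-devkit
from nuscenes.nuscenes import NuScenes

# Assumes the v1.0-mini split has been downloaded to ./data/nuscenes.
nusc = NuScenes(version='v1.0-mini', dataroot='./data/nuscenes', verbose=True)

sample = nusc.sample[0]           # one 2 Hz keyframe
print(sample['data'].keys())      # sensor channels: 6 cameras, 1 lidar, 5 radars
ann = nusc.get('sample_annotation', sample['anns'][0])
print(ann['category_name'], ann['size'])  # annotated class and 3D box size
```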

Looking at camera-only (pure vision) entries, Horizon Robotics' Sparse4D takes first and second place, Megvii's FAR3D is third, HOP (developed jointly by SenseTime, the University of Hong Kong, Harbin Institute of Technology, and others) is fourth, and Toyota is fifth. Pure vision performs much worse than camera-lidar fusion, whereas pure lidar comes very close to camera-lidar fusion.

There are six scores for 3D object detection (see the table below).

mAP: mean Average Precision.

mATE: mean Average Translation Error. The translation error (ATE) is the two-dimensional Euclidean distance between the predicted and ground-truth box centers, in meters.

mASE: mean Average Scale Error. The scale error (ASE) is 1 − IoU, where IoU is the three-dimensional intersection over union after aligning orientation.

mAOE: mean Average Orientation Error. The orientation error (AOE) is the smallest yaw-angle difference between the predicted value and the ground truth (measured within 360° for all categories except barriers, which are measured within 180°).

mAVE: mean Average Velocity Error. The velocity error (AVE) is the L2 norm of the two-dimensional velocity difference, in m/s.

mAAE: mean Average Attribute Error. The attribute error (AAE) is defined as 1 − acc, where acc is the attribute classification accuracy.

Among them, mAP is the core indicator.

[Image: table of the six 3D object detection metrics]

Source: Public information compilation
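To make the five error terms defined above concrete, here is a minimal Python sketch of how they can be computed for one matched prediction/ground-truth pair. The dictionary fields are illustrative assumptions, not the official nuScenes devkit API:

```python
import math

def tp_errors(pred, gt, iou3d_aligned):
    """The five error terms behind mATE/mASE/mAOE/mAVE/mAAE for one
    matched prediction/ground-truth pair.

    pred and gt are plain dicts with 2D center (x, y), yaw in radians,
    2D velocity (vx, vy), and an attribute label; iou3d_aligned is the
    3D IoU computed after aligning the two yaw angles. All field names
    are illustrative, not the nuScenes devkit API.
    """
    ate = math.hypot(pred["x"] - gt["x"], pred["y"] - gt["y"])      # meters
    ase = 1.0 - iou3d_aligned                                       # 1 - IoU
    d = abs(pred["yaw"] - gt["yaw"]) % (2 * math.pi)
    aoe = min(d, 2 * math.pi - d)      # smallest yaw difference, radians
    ave = math.hypot(pred["vx"] - gt["vx"], pred["vy"] - gt["vy"])  # m/s
    aae = 0.0 if pred["attr"] == gt["attr"] else 1.0                # 1 - acc
    return ate, ase, aoe, ave, aae
```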

mAP stands for mean Average Precision and measures model performance in object detection. Because object detection involves localization boxes, plain classification accuracy does not apply, so this metric specific to object detection was introduced. SAIC ranks first on this single metric.

[Figure: mAP calculation flow chart]

The mAP calculation flow chart is quite involved. "Class" here is the object category; nuScenes has 23 of them. Ground truth is the manually annotated true value. Annotation can also be done automatically by computer, but manual annotation is indispensable; it is only a question of proportion. In general, fine annotation is done by hand, while automatic annotation is sparse. Prediction is the answer the deep learning model produces based on the training dataset.

To understand the concept of average precision, you must first be familiar with several basic concepts:

Precision is the proportion of predicted positives that are actually positive, i.e., how accurate the predictions are.

Recall is the proportion of actual positives that are correctly predicted, i.e., how completely the positives are covered.

TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.

Precision = TP / (TP + FP), and recall = TP / (TP + FN).
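As a quick illustration, here are the two formulas as a minimal Python sketch; the counts are assumed to come from some earlier matching step:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# In the spirit of the "cat" example below: 5 of 10 ground-truth cats
# found, and (assumed for illustration) no false alarms.
print(precision_recall(tp=5, fp=0, fn=5))  # -> (1.0, 0.5)
```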

[Image: example detection with ground-truth (red) and predicted (green) boxes]

For the AP of a single category: each prediction in object detection consists of two parts, the predicted bounding box and its confidence score. The bounding box is usually represented by the coordinates of the top-left and bottom-right corners of the rectangle, i.e., x_min, y_min, x_max, y_max. In the figure, the red box is the ground truth, i.e., the correct answer; the green box is the algorithm's prediction, and 88% is its confidence, meaning the model believes there is an 88% chance it is a dog.

[Image: intersection and union of two rectangular boxes]

Intersection over Union (IoU) measures the degree of overlap between two regions: it is the ratio of the overlapping area to the total area covered by the two regions (with the overlap counted only once). As shown in the figure above, the IoU of two rectangular boxes is the area of their intersection divided by the area of their union.
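A minimal sketch of this computation for two axis-aligned boxes in the x_min, y_min, x_max, y_max format described above:

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """IoU of two axis-aligned boxes in (x_min, y_min, x_max, y_max) form."""
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter  # overlap counted only once
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # -> 1/7 ≈ 0.143
```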

Suppose the test dataset contains 10 ground-truth instances of a certain category, say "cat", and the algorithm detects 5 of them; it also contains 10 ground-truth instances of the "dog" category, of which the algorithm detects all 10. We then obtain the following values.

[Table: TP/FP/FN counts for the cat and dog example]

Using the precision and recall values, we draw a curve by varying the confidence threshold.

[Table: precision and recall at different confidence thresholds]

Conf. Thresh. is short for confidence threshold. From the table we obtain a precision-recall curve.
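The table-to-curve step can be sketched in a few lines of Python: sort the predictions by confidence and, as the threshold sweeps downward, record the running precision and recall. The matching of predictions to ground truth (e.g., by IoU) is assumed to have been done already:

```python
def pr_curve(predictions, num_gt):
    """(recall, precision) pairs as the confidence threshold sweeps down.

    predictions: list of (confidence, is_true_positive) tuples for one
    class; whether each prediction is a TP (e.g., IoU with an unmatched
    ground truth above some threshold) is assumed decided already.
    num_gt: number of ground-truth objects of that class.
    """
    curve, tp, fp = [], 0, 0
    for conf, is_tp in sorted(predictions, key=lambda p: p[0], reverse=True):
        tp, fp = tp + is_tp, fp + (not is_tp)
        curve.append((tp / num_gt, tp / (tp + fp)))
    return curve
```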

AP is a scalar that can be calculated in two ways.

[Figure: precision-recall curve]

1) Obtain AP by rectangular accumulation

[Formula: AP as the accumulated area of rectangles under the precision-recall curve]

2) Calculate AP by interpolating 10 points

[Formula: AP by 10-point interpolation]

In the mAP formula, K is the number of categories, which is 23 here.
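Both ways of reducing the curve to a scalar, plus the final averaging over the K categories, can be sketched as follows; this is a minimal illustration consistent with the pr_curve() sketch above, not the official nuScenes evaluation code:

```python
def ap_rectangles(curve):
    """AP as the accumulated area of rectangles under the P-R curve."""
    ap, prev_r = 0.0, 0.0
    for recall, precision in curve:           # curve sorted by recall
        ap += (recall - prev_r) * precision   # rectangle: width * height
        prev_r = recall
    return ap

def ap_interpolated(curve, num_points=10):
    """AP from the maximum precision at evenly spaced recall levels.

    10 points here, matching the text; the classic Pascal VOC variant
    uses 11.
    """
    total = 0.0
    for i in range(num_points):
        r = i / (num_points - 1)
        # Best precision achieved at any recall >= r (0 if unreachable).
        total += max((p for rec, p in curve if rec >= r), default=0.0)
    return total / num_points

def mean_ap(ap_per_class):
    """mAP: the mean of per-class AP values over all K classes."""
    return sum(ap_per_class) / len(ap_per_class)
```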

The object tracking leaderboard is as follows; only the top five are shown.

[Image: top 5 of the nuScenes object tracking leaderboard]

Source: Public information compilation

These algorithms chase leaderboard performance and rarely consider deployment. There are exceptions that do take real-world deployment into account, such as Aptiv's lidar-only PointPillars, available as early as March 2019. Its mAP is only 0.305, but it runs at 61.2 frames per second on a single GTX 1080 Ti graphics card and can be pushed as high as 150 Hz, with minimal resource consumption. It remains the most widely used lidar algorithm today.


Leapmotor's EA-LSS model runs on an NVIDIA DGX-A100, i.e., eight A100 graphics cards, yet achieves a frame rate below 15 frames per second, which is clearly not viable for deployment.

The development of autonomous driving faces real difficulties: algorithms keep growing more complex, parameter counts keep rising, demand for computing power keeps increasing, and high-compute chips keep getting more expensive. The bottleneck is not only computing power but also memory bandwidth: Transformers require far more memory bandwidth than CNNs, and high-bandwidth HBM costs more than ten times as much as mainstream LPDDR4/5. The same applies to the other chips and components of the computing system. As a result, the cost of autonomous driving systems keeps climbing, and the price of an L4 computing system may ultimately exceed US$30,000, or even more.

