The first half of the autonomous driving race was fought over hardware and algorithms; the second half will be fought over data and the ability to turn data into value, that is, data closed-loop capability.
However, the author learned from conversations with industry insiders that the idea of supporting iterative upgrades of autonomous driving systems through large-scale data collection from mass-produced vehicles has not yet been realized. Some companies have not even set up such a data closed-loop process; others have set up the process and collected some data, but have not yet made good use of it because their closed-loop systems are not advanced enough.
There are many inefficient links in the traditional autonomous driving data closed loop. For example, almost every company relies on a "human sea tactic" for data annotation, manually classifying the collected data into scenes one clip at a time.
Fortunately, we are in an era of rapid technological change. With the development of deep learning, and especially as the potential of large models is gradually unlocked, many links in the data closed loop can now be automated or semi-automated, with a corresponding jump in efficiency.
Thanks to the capacity that comes with their large parameter counts, large models significantly outperform small models in both performance and generalization. They excel precisely at the labor-intensive, inefficient stages of the traditional closed loop, such as data preprocessing and data annotation. Many companies are therefore actively exploring how to apply large models to the data closed loop to accelerate algorithm iteration.
Large models may push the data closed loop into a 2.0 era (the earlier, low-automation era can be called the data closed-loop "1.0" era), and thereby reshape the competitive landscape of the second half of the autonomous driving race.
However, training large models requires vast amounts of data and enormous computing power, which places high demands on the underlying hardware and AI R&D platforms.
To build a highly efficient data closed-loop system, Tesla developed its own Dojo supercomputer. Tesla's Autopilot system has so far collected more than 2.09 billion kilometers of road data. To a certain extent, this investment in the data closed loop is one reason Tesla holds a significant lead in autonomous driving R&D.
However, Tesla's investment in this approach is also huge: its spending on the Dojo supercomputer is reported to exceed US$1 billion in 2024. In China, only a handful of companies have such financial resources.
A more feasible option for domestic OEMs and autonomous driving companies, then, is to go to the cloud: to enter the data closed-loop 2.0 era quickly with the help of the large-model capabilities, computing power, toolchains, and other infrastructure and development platforms that cloud vendors have opened up.
In particular, if a cloud vendor has full-stack in-house R&D capabilities and can provide a complete set of infrastructure, OEMs and autonomous driving companies no longer have to worry about issues such as inconsistent tool interfaces from different suppliers. This eliminates a great deal of adaptation work and further improves development efficiency.
1. How large models accelerate data closed loop 2.0
In the data closed-loop 1.0 era, companies were not yet prepared for the volumes of data that autonomous driving development demands; the automation and efficiency of each module were not high enough.
The data closed-loop 2.0 era calls for a system that can process large amounts of data quickly, letting data flow faster through the system, raising the efficiency of algorithm iteration, and making cars smarter the more they drive.
At the Huawei Cloud Intelligent Driving Innovation Summit on July 21, Huawei Cloud launched its Autonomous Driving Development Platform. Powered by the Pangu large model, the platform's corner-case solving, data preprocessing, data mining, and data annotation capabilities all show significant improvements over a traditional data closed-loop system.
1.1 Pangu large model helps solve corner cases
The traditional way to solve corner cases is to collect as much relevant data as possible through real-vehicle road collection and then train the model on it so that it can respond. This method is expensive and inefficient. Worse, many special scenes occur so rarely that they are difficult to capture with real vehicles at all.
In recent years, people have discovered that NeRF can be used for scene reconstruction: by adjusting parameters such as the viewing angle, the lighting, or the vehicle's driving path, they can simulate scenes (synthetic data) that occur only rarely in the real world, as a supplement to real-vehicle road data.
As early as the beginning of 2022, Waymo began using synthetic data generated with NeRF to train autonomous driving algorithms.
One of this year's CVPR highlight papers, UniSim: A Neural Closed-Loop Sensor Simulator, also explores NeRF-based scene reconstruction. The authors, from self-driving truck company Waabi, divide a scene into three parts: the static background (such as buildings, roads, and traffic signs), dynamic objects (such as pedestrians and cars), and regions outside the area of interest (such as the sky and very distant road), then use NeRF to model the static background and the dynamic objects separately.
The authors found that scenes reconstructed with NeRF are not only highly realistic but also easy to extend: R&D personnel only need to collect data once to reconstruct a scene.
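The decomposition described above still renders through standard NeRF volume rendering: each field contributes color/density samples along a camera ray, and the samples are merged by depth and alpha-composited. Below is a minimal, self-contained sketch of that compositing step (not Waabi's or Huawei's actual implementation; the sample format is an assumption for illustration):

```python
import math

def composite_ray(samples):
    """Alpha-composite samples along one camera ray (standard NeRF volume rendering).

    samples: list of (t, rgb, density, delta) tuples, possibly drawn from several
    separately modelled fields (static background, dynamic objects, distant sky).
    """
    # Merge samples from all fields by depth before compositing.
    samples = sorted(samples, key=lambda s: s[0])
    transmittance = 1.0            # fraction of light not yet absorbed
    out = [0.0, 0.0, 0.0]
    for _, rgb, density, delta in samples:
        alpha = 1.0 - math.exp(-density * delta)   # opacity of this ray segment
        weight = transmittance * alpha
        out = [o + weight * c for o, c in zip(out, rgb)]
        transmittance *= (1.0 - alpha)             # light remaining past this segment
    return out
```

Because the fields are composited only at render time, a dynamic object can be moved or removed simply by changing which samples it contributes, which is what makes the reconstructed scenes easy to edit.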
In China, the scene-reconstruction large model Huawei Cloud built on top of the Pangu large model also incorporates NeRF. The model reconstructs scenes (synthetic data) from collected road-capture video, and ordinary viewers can hardly distinguish these reconstructed scenes from real ones with the naked eye. The reconstructed data supplements real road data and can be used to improve the accuracy of the perception model.
Specifically, the scene-reconstruction large model takes segments of road-capture video as input. Once the model has reconstructed the scene in these videos, the user can edit the weather, the road conditions, and the ego vehicle's pose, position, and driving trajectory, and then generate new video data.
For example, users can change the weather in the original video from sunny to rainy, turn day into night, or turn a wide, flat road into a muddy trail.
In other words, users can generate more data by editing scene elements, without relying entirely on road capture. In particular, for data that is hard to collect in the first place, such as scenes under extreme weather, users can generate it through scene reconstruction.
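The platform's actual editing interface is not public, but the workflow described above amounts to submitting a structured edit request against a reconstructed scene. The sketch below is purely hypothetical (the field names and allowed values are assumptions, not Huawei Cloud's API) and only illustrates what such a request might look like:

```python
from dataclasses import dataclass, field

# Hypothetical value sets, assumed for illustration only.
ALLOWED_WEATHER = {"sunny", "rainy", "snowy", "foggy"}
ALLOWED_TIME = {"day", "night", "dusk"}

@dataclass
class SceneEdit:
    """One edit request applied to a reconstructed scene before re-rendering."""
    weather: str = "sunny"
    time_of_day: str = "day"
    ego_trajectory: list = field(default_factory=list)  # list of (x, y, heading) waypoints

    def validate(self):
        if self.weather not in ALLOWED_WEATHER:
            raise ValueError(f"unsupported weather: {self.weather}")
        if self.time_of_day not in ALLOWED_TIME:
            raise ValueError(f"unsupported time of day: {self.time_of_day}")
        return self

# e.g. turn a sunny daytime clip into a rainy night drive:
edit = SceneEdit(weather="rainy", time_of_day="night").validate()
```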
An engineer from Huawei Cloud told Jiuzhang Zhijia:
"When we used scene-reconstruction data to train the perception algorithm, we found that it really did help. At the same time, our large models keep expanding the coverage of these virtual scenes, so that the data can be used more widely and the autonomous driving algorithm can handle more corner cases."
A stronger ability to solve corner cases means the autonomous driving system can take over more of the driving and the user experience improves, which will ultimately raise the penetration rate of autonomous driving.
1.2 Pangu large model assists data preprocessing
Data collected on the vehicle side generally needs to be preprocessed before it enters the mining and annotation pipeline. The main job of preprocessing is to classify the data, discard what is not needed, and retain the data for important scenarios.
The traditional method of classifying data by manually replaying it is very time-consuming. If a large model is used to understand video content and classify video data automatically, efficiency can improve greatly.
The difficulty of classifying videos with a model lies in understanding the semantics of the scenes they contain. The scene-understanding large model Huawei Cloud built on the Pangu large model can understand video data semantically and then classify it. After a user uploads video data, the model identifies the key information and tags each clip with its category and the time at which it occurs; it also supports combined retrieval.
In testing, the scene-understanding large model recognized weather, time of day, objects, and so on with better than 90% accuracy.
This solution has reportedly already been deployed in a project with an OEM: engineers only need to call the API Huawei Cloud provides to use the scene-understanding large model to classify their video data.
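Once the model has tagged each clip with categories and timestamps, the "combined retrieval" mentioned above reduces to an inverted index over those tags. A minimal sketch (the label format `(tag, start_s, end_s)` is an assumption, not the platform's actual output schema):

```python
from collections import defaultdict

class ClipIndex:
    """Inverted index over model-emitted scene tags, supporting combined queries."""

    def __init__(self):
        self._by_tag = defaultdict(set)   # tag -> set of clip ids
        self._clips = {}                  # clip id -> full label list

    def add(self, clip_id, labels):
        """labels: list of (tag, start_s, end_s) produced by the model for one clip."""
        self._clips[clip_id] = labels
        for tag, _, _ in labels:
            self._by_tag[tag].add(clip_id)

    def search(self, *tags):
        """Combined retrieval: return clips carrying ALL requested tags."""
        sets = [self._by_tag[t] for t in tags]
        return set.intersection(*sets) if sets else set()
```

With such an index, a query like `search("rainy", "night")` pulls out exactly the clips the model tagged with both conditions, instead of someone replaying footage by hand.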
1.3 Pangu large model assists data mining
After a vehicle uploads its road-capture data to the cloud, engineers usually need to mine the higher-value data from it. Traditional tag-based mining of long-tail scenes can generally only distinguish known image categories.
Large models generalize well, which makes them well suited to mining long-tail data.
In 2021, OpenAI released CLIP, a text-image multimodal model that frees retrieval from its dependence on image tags: after unsupervised pre-training it can match text to images and classify images from text prompts.
This means engineers can use such a text-image multimodal model to retrieve image data from drive logs with a text description, for example long-tail scenarios such as "an engineering vehicle towing cargo" or "a traffic light with two bulbs lit at the same time".
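Mechanically, CLIP-style retrieval reduces to cosine similarity between a text embedding and image embeddings in a shared vector space: encode the query, rank the logged images by similarity, take the top hits. The sketch below uses toy 3-dimensional vectors in place of real CLIP features (in practice both sides come from the pretrained text and image encoders):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(text_vec, image_vecs, top_k=3):
    """Return ids of the top_k logged images most similar to the text query.

    image_vecs: dict mapping image id -> embedding vector.
    """
    ranked = sorted(image_vecs.items(),
                    key=lambda kv: cosine(text_vec, kv[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:top_k]]
```

Because the text encoder generalizes to phrases never seen as labels, a query embedding for "a traffic light with two bulbs lit at the same time" can surface matching frames even though no such tag exists in the log, which is exactly what makes this approach suited to long-tail mining.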