As data collection equipment has been optimized and upgraded, autonomous driving datasets have been continually upgraded and iterated as well. Major autonomous driving companies and research institutes in China and abroad have successively released autonomous driving datasets, providing important research material for future technical development in the field. The article "Autonomous Driving Open Source Data System: Current Situation and Future" systematically surveys open source datasets for autonomous driving, which is significant for promoting a virtuous cycle in the industry ecosystem. It is a review released by Shanghai Artificial Intelligence Laboratory together with Shanghai Jiao Tong University, Fudan University, Baidu, BYD, NIO (Weilai), and other organizations. For the first time, the review systematically surveys more than 70 open source autonomous driving datasets from China and abroad, and summarizes how to build high-quality datasets, the core role of data in the closed-loop algorithm system, and how to use generative large models to produce data at scale. On this basis, it analyzes and discusses in depth the characteristics, data scale, and key scientific and technical issues that third-generation autonomous driving datasets should address.
Overview
As one of the important application areas of artificial intelligence, autonomous driving is expected to reshape existing modes of traffic and transportation, greatly improve traffic efficiency and safety, and have a profound impact on future urban and social development. At present, the domestic intelligent connected vehicle industry has entered the pilot and start-up stage of commercialization. Road testing and demonstration scenarios are maturing, autonomous driving functions are iterating faster, vehicle networking applications are becoming richer, and relevant laws, regulations, and policies at all levels are being introduced at an accelerating pace. Together, these factors have pushed the market into a period of rapid development. On the one hand, autonomous driving technology requires a large amount of data to train algorithm models to recognize and understand the road environment, so that they can make correct decisions and take correct actions, delivering an accurate, stable, and safe driving experience. Data construction is therefore crucial to the development of autonomous driving technology. On the other hand, the emergence of large models in natural language processing and general vision has further confirmed the importance of massive, high-quality data and has inspired the construction of autonomous driving datasets.
Figure: Review article structure
Autonomous driving dataset
This review divides the nearly 100 open source datasets into two generations. The first generation is represented by KITTI, which was proposed in 2012. Its input sensor modalities consist of a monocular camera and a lidar, and it defines a series of comprehensive perception tasks. The second generation is represented by the nuScenes and Waymo datasets. Sensor complexity has increased: surround view cameras, lidars, positioning information, and high-precision maps have become common components. Downstream tasks now span perception, mapping, prediction, and path planning.
The complexity of sensor modalities is gradually increasing: surround view cameras, lidar, ultrasonic radar sensors, GPS, IMU, high-precision (HD) maps, etc.
The size and diversity of datasets are growing: In terms of data richness, the collection time of mainstream autonomous driving datasets has gradually increased from about 10 hours at the beginning to 100 hours. With the evolution of automatic labeling technology and labeling tools, datasets of more than 1,000 hours have appeared in recent years. The diversity of driving scenarios is another key factor in the performance of autonomous driving systems. To improve algorithm performance across varied scenarios, some datasets are collected in multiple cities on multiple continents.
Dataset tasks extend from perception to prediction and planning: Downstream tasks of datasets such as Cityscapes and Mapillary, launched in 2016, focus on dynamic object detection. Datasets such as SemanticKITTI and DrivingStereo, launched in 2019, introduced tasks such as semantic segmentation, depth estimation, and optical flow estimation. Traditional prediction and planning modules generally rely on numerical computation, optimization, search, and similar methods. Datasets such as nuScenes, Waymo, and Argoverse V2, proposed around 2019, include not only perception tasks but also prediction and planning tasks. This makes it possible to study multiple tasks on the same dataset, and it has led the community's shift from the traditional multi-module paradigm toward end-to-end autonomous driving research.
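To make the sensor modalities and task labels listed above more concrete, the following is a minimal sketch of how one synchronized frame of a multi-sensor driving dataset might be represented. It is an illustration only: the names `Box3D`, `Sample`, and `modalities`, and all field names, are hypothetical and do not correspond to the schema of KITTI, nuScenes, Waymo, or any other dataset mentioned in the review.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Box3D:
    """One annotated object (hypothetical layout): center, size, heading, class."""
    center: Tuple[float, float, float]  # x, y, z in meters
    size: Tuple[float, float, float]    # length, width, height in meters
    yaw: float                          # heading angle in radians
    label: str                          # e.g. "car", "pedestrian"


@dataclass
class Sample:
    """One synchronized frame of a hypothetical multi-sensor driving dataset."""
    timestamp_us: int
    camera_images: Dict[str, str]                  # camera name -> image path
    lidar_path: Optional[str] = None               # point cloud file, if any
    ego_pose: Optional[Tuple[float, ...]] = None   # GPS/IMU-derived localization
    hd_map_id: Optional[str] = None                # reference to an HD map tile
    boxes: List[Box3D] = field(default_factory=list)  # perception labels
    # Future ego waypoints, usable as prediction/planning labels.
    future_trajectory: List[Tuple[float, float]] = field(default_factory=list)


def modalities(sample: Sample) -> List[str]:
    """List which input modalities are present in a sample."""
    present = []
    if sample.camera_images:
        present.append("camera")
    if sample.lidar_path is not None:
        present.append("lidar")
    if sample.ego_pose is not None:
        present.append("localization")
    if sample.hd_map_id is not None:
        present.append("hd_map")
    return present
```

A frame with a single front camera and a lidar would report only `camera` and `lidar`, while a frame with surround view cameras, localization, and a map reference would report all four modalities, mirroring the growth in sensor complexity described above.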
Figure: Estimation of the impact of open source datasets for autonomous driving
Data algorithm closed-loop system
A modular autonomous driving system includes components such as perception, decision-making, planning, and control, most of which are implemented as data-driven neural network models. For these modules, massive, high-quality data is a necessary condition for good performance. First, massive data is needed to solve several problems in existing autonomous driving systems. A persistent problem in autonomous driving engineering is the long tail: when the training data is insufficient, a small number of cases are never learned by the model, and at inference time the model cannot give correct results for these edge scenarios. In addition, rule-based modules currently rely on manually designed rules that make the module's output conform to hand-crafted logic. This approach is time-consuming and labor-intensive, and it can hardly cover all situations, so the autonomous driving system may fail in unseen scenarios. Replacing these modules with data-driven neural networks is one possible solution. At the same time, noise in the data inevitably harms the optimization process during neural network training and reduces model performance. Data quality covers not only the resolution and synchronization of sensor data but also the accuracy of labels. A quality problem in either aspect directly affects the performance and safety of the autonomous driving system. In summary, massive, high-quality data has become an indispensable part of building an autonomous driving system.
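As a small illustration of the sensor synchronization aspect of data quality mentioned above, the sketch below filters out frames whose sensors were captured too far apart in time. The function names, the per-frame dictionary layout, and the 50 ms default tolerance are assumptions made for this example, not values taken from the review.

```python
from typing import Dict, List


def max_sync_offset_ms(timestamps_us: Dict[str, int]) -> float:
    """Largest timestamp gap between any two sensors in one frame, in ms."""
    values = list(timestamps_us.values())
    return (max(values) - min(values)) / 1000.0


def filter_synchronized(
    frames: List[Dict[str, int]], tolerance_ms: float = 50.0
) -> List[Dict[str, int]]:
    """Keep frames whose sensors all fired within tolerance_ms of each other.

    Each frame maps a sensor name to its capture time in microseconds.
    The default tolerance is an assumed value chosen for illustration.
    """
    return [f for f in frames if max_sync_offset_ms(f) <= tolerance_ms]


frames = [
    {"camera": 1_000_000, "lidar": 1_020_000},  # 20 ms apart: kept
    {"camera": 2_000_000, "lidar": 2_120_000},  # 120 ms apart: dropped
]
clean_frames = filter_synchronized(frames)
```

A real pipeline would typically pair such a check with label validation, since the text identifies label accuracy as the other half of data quality.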
A new generation of autonomous driving datasets in the era of large models
Current foundation models have achieved remarkable results in natural language processing and computer vision, but no large model for the vertical field of autonomous driving is yet available on the market. Taking large models in other fields as a reference, a new generation of datasets should at least raise the data volume to a level comparable to those fields in order to enable large models for autonomous driving. Once the amount of data is sufficient, the richness of scenes matters more to algorithm performance. Autonomous vehicles will inevitably encounter scenes outside the training data in the real world, and large-scale deployment of autonomous driving requires the model to behave correctly in rare scenes to avoid danger or functional failure. Most traffic scenes do not need a very large amount of data to be covered, so more attention should be paid to long-tail scenes. Because some traffic scenes, such as car crashes, are very rare, the lack of data for them can have a large impact on the performance of the autonomous driving system.
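One common way to give rare scenes more attention during training is inverse-frequency reweighting, sketched below. This is a generic technique shown for illustration, not a method described in the review; the function name and the scene tags are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def scene_sampling_weights(scene_tags: List[str]) -> Dict[str, float]:
    """Inverse-frequency weights per scene type, normalized to sum to 1.

    Scene types that appear rarely in the data receive a larger weight,
    so a sampler using these weights draws them more often.
    """
    counts = Counter(scene_tags)
    inverse = {tag: 1.0 / n for tag, n in counts.items()}
    total = sum(inverse.values())
    return {tag: w / total for tag, w in inverse.items()}


# Hypothetical tag distribution: ordinary driving dominates, crashes are rare.
tags = ["highway"] * 90 + ["crash"] * 10
weights = scene_sampling_weights(tags)
```

With this toy distribution the rare `crash` tag receives a much larger weight than `highway`. Reweighting only rebalances data that already exists, so it complements rather than replaces the collection or generation of long-tail scenes.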
The first- and second-generation autonomous driving datasets can no longer meet the development needs of autonomous driving systems, and the construction of a new generation of datasets needs to be put on the agenda. In the era of large models, big data has become an indispensable feature of the new generation of datasets. At the same time, modularly designed autonomous driving systems run into problems such as high iteration costs and a limited performance ceiling during deployment, and end-to-end autonomous driving architectures are gradually gaining favor in the industry. In addition, multimodal sensors, high-quality annotations, and the logical reasoning capabilities of models also deserve attention. Based on this, the review summarizes the development goals of the new generation of datasets: multimodal, with both quality and quantity; end-to-end and decision-oriented; intelligent, with logical reasoning.
Figure: Outlook for autonomous driving datasets in the era of large models
Conclusion
This review comprehensively surveys the current status and challenges of public datasets for autonomous driving. Drawing on the data-algorithm closed-loop system and the current development trend of large models, it proposes a vision and plan for the next generation of autonomous driving datasets. It systematically summarizes the datasets used in the development of autonomous driving and demonstrates the importance of promoting community development through challenges and leaderboards. It also gives an overall analysis of the data-algorithm closed-loop system for autonomous driving, summarizes the role of each important link, and finally shows through application cases how the closed-loop system can be used.