Four common misconceptions about end-to-end autonomous driving

Publisher: Qingliu2022 | Last updated: 2024-05-22 | Source: 智能汽车设计
Unsurprisingly, as Tesla's V12 rolls out widely in North America and wins over more and more users with its strong performance, end-to-end autonomous driving has become the technical direction the industry cares about most. I recently had the chance to talk with many first-rate engineers, product managers, investors, and journalists, and I found that while everyone is keenly interested in end-to-end autonomous driving, some misunderstandings persist at the level of basic concepts. Having had the opportunity to experience the city functions of domestic first-tier brands both with and without HD maps, as well as both FSD V11 and V12, I would like to draw on my professional background and long-term tracking of Tesla FSD to discuss some common misunderstandings about end-to-end autonomous driving at this stage and offer my own interpretation.

01


Question 1: Can end-to-end perception and end-to-end decision-making and planning be considered end-to-end autonomous driving?


First of all, the definition of end-to-end autonomous driving is fairly well established: every step from sensor input to the planned trajectory, or even to the control signal output (Musk's "photon to control"), must be differentiable, so that the entire system can be trained as a single large model by gradient descent. Through backpropagation, the parameters of every stage from input to output can be updated and optimized jointly during training, so that the system's driving behavior, the output users directly experience, is optimized as a whole. Recently, some peer companies promoting "end-to-end autonomous driving" have claimed to have end-to-end perception or end-to-end decision-making, but in my view neither qualifies as end-to-end autonomous driving; they are better described as purely data-driven perception and purely data-driven decision-making and planning.
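To make this concrete, here is a minimal, purely illustrative PyTorch sketch (the module sizes and names are my own assumptions, not Tesla's design) of why full differentiability matters: a single loss at the control output updates the perception and planning parameters alike.

```python
import torch
import torch.nn as nn

# Illustrative "photon to control" model: every stage is a differentiable
# module, so one loss at the output updates all stages jointly.
class EndToEndDriver(nn.Module):
    def __init__(self):
        super().__init__()
        self.perception = nn.Sequential(          # camera features
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.planner = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.control = nn.Linear(64, 2)           # steering, acceleration

    def forward(self, image):
        return self.control(self.planner(self.perception(image)))

model = EndToEndDriver()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 128, 128)   # placeholder camera frames
expert = torch.randn(8, 2)             # placeholder human driving signals

loss = nn.functional.mse_loss(model(images), expert)
loss.backward()   # gradients reach perception, planner, and control alike
opt.step()
```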

Some even call a hybrid strategy, in which a model makes the decision and traditional methods handle safety verification and trajectory optimization, "end-to-end planning". There are also claims that Tesla's V12 does not output control signals purely from a model but uses a hybrid strategy combined with rule-based methods, the reason being that the well-known Tesla hacker Green posted on X.com some time ago that rule-based code can still be found in the V12 stack. My reading is that the code Green found most likely belongs to the V11 code retained for the highway stack: V12 actually replaced only the original city stack with the end-to-end model, and on highways the system still switches back to the V11 solution. Finding fragments of rule-based code in the decompiled software therefore does not prove that V12 is a fake "end-to-end"; it is most likely highway code. In fact, we could already see at AI Day 2022 that V11 and earlier versions were hybrid solutions. So if V12 were not a pure model-to-trajectory system, its approach would be essentially no different from its predecessors, and there would be no reasonable explanation for V12's leap in performance. For Tesla's earlier solutions, see my AI Day write-up: "EatElephant: Tesla AI Day 2022, a ten-thousand-word interpretation: the Spring Festival Gala of autonomous driving, a decentralized R&D team, and an ambitious transformation into an AI technology company".

From AI Day 2022: V11 was already a hybrid planning solution combining an NN Planner with rule-based methods

In short, whether it is perception post-processing code, rule-based scoring of candidate planned trajectories, or even a safety fallback strategy, once rule-based code introduces an if-else branch, gradient propagation through the whole system is truncated, and the biggest advantage of an end-to-end system, global optimization through training, is lost.
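A tiny, hypothetical PyTorch example of the problem (not from any real stack): a rule-based override implemented as an if-else detaches the output from the autograd graph, so no gradient can reach the network behind it.

```python
import torch

def rule_override(steer: torch.Tensor) -> torch.Tensor:
    # Hypothetical safety rule: hard-clamp large steering commands.
    # The if-else branch returns a fresh constant tensor with no autograd
    # history, so the gradient chain is cut at exactly this point.
    if steer.item() > 0.25:
        return torch.tensor(0.25)
    return steer

net_out = torch.tensor(0.3, requires_grad=True)  # stand-in for a network output
steer = rule_override(net_out)
print(steer.requires_grad)                       # False: the graph was truncated

loss = (steer - 0.1) ** 2
try:
    loss.backward()                              # nothing to backpropagate through
except RuntimeError as err:
    print("backward failed:", err)
```

An end-to-end system must instead express such constraints as differentiable terms (for example, a penalty in the loss), so training can still optimize through them.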

02


Question 2: Is end-to-end a complete overhaul of previous technologies?


Another common misconception is that end-to-end means discarding previously accumulated technology in favor of a completely new approach. Many believe that since Tesla has only just shipped an end-to-end system to users, other manufacturers need not keep iterating on their modular perception, prediction, and planning stacks; they can jump straight to an end-to-end system and, with a latecomer's advantage, quickly catch up with or even surpass Tesla. It is true that using one large model to map sensor input directly to planning and control signals is the most thorough form of end-to-end, and companies tried similar approaches long ago, for example Nvidia's DAVE-2 and Wayve. But such a thorough end-to-end system is much closer to a black box, and is hard to debug, iterate on, and optimize. Moreover, sensor inputs such as images and point clouds form a very high-dimensional input space, while control signals such as steering angle, accelerator, and brake form a very low-dimensional output space relative to the input. There are a great many possible mappings from the high-dimensional space to the low-dimensional one, yet only a vanishingly small subset corresponds to correct and reliable driving logic. In other words, direct end-to-end training is very prone to overfitting, producing models that are completely unusable in real-vehicle testing.
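The dimensionality gap is easy to make concrete. Below is a rough behavior-cloning sketch in the spirit of DAVE-2 (layer sizes are illustrative assumptions, not Nvidia's actual architecture): a roughly 40,000-dimensional camera image collapses to a single steering value, which is exactly what makes so many spurious mappings possible.

```python
import torch
import torch.nn as nn

# Behavior cloning in the spirit of DAVE-2 (sizes illustrative, not
# Nvidia's actual network): ~39,600 input dimensions in, 1 value out.
class SteeringNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened feature size
            n = self.features(torch.zeros(1, 3, 66, 200)).shape[1]
        self.head = nn.Linear(n, 1)  # single steering output

    def forward(self, x):
        return self.head(self.features(x))

x = torch.randn(1, 3, 66, 200)
print(x.numel(), "-> 1")  # 39600 input dimensions collapse to one output
print(SteeringNet()(x))
```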

Even a thoroughly end-to-end system therefore usually adds common auxiliary tasks, such as semantic segmentation and depth estimation, to help the model converge and to make debugging possible.
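A hedged sketch of how such auxiliary heads are typically attached (the heads, sizes, and loss weights here are illustrative assumptions, not any specific production design): segmentation and depth share the backbone with the control head, providing dense supervision and human-inspectable outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative multi-task setup: auxiliary segmentation and depth heads
# share the backbone with the control head. Their dense supervision helps
# the shared features converge, and their outputs can be visualized.
class MultiTaskDriver(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, n_classes, 1)   # semantic segmentation
        self.depth_head = nn.Conv2d(64, 1, 1)         # dense depth estimation
        self.control_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2)
        )

    def forward(self, x):
        f = self.backbone(x)
        return self.seg_head(f), self.depth_head(f), self.control_head(f)

model = MultiTaskDriver()
img = torch.randn(2, 3, 64, 64)
seg, depth, ctrl = model(img)

seg_gt = torch.randint(0, 10, (2, 64, 64))   # placeholder labels
depth_gt = torch.rand(2, 1, 64, 64)
ctrl_gt = torch.randn(2, 2)

# Weighted sum of losses; the auxiliary terms regularize the backbone.
loss = (F.cross_entropy(seg, seg_gt)
        + 0.5 * F.l1_loss(depth, depth_gt)
        + F.mse_loss(ctrl, ctrl_gt))
loss.backward()
```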

So the FSD V12 we actually see retains almost all of the previous visualization content, which suggests that V12 was trained end-to-end on top of the original, already powerful perception stack. The FSD rewrite that began in October 2020 was not abandoned; it became the solid technical foundation of V12. Andrej Karpathy has addressed a similar question before: although he did not take part in V12's development, he believes the prior technical accumulation was not thrown away but merely moved from the front of the stage to backstage. End-to-end, then, means gradually removing rule-based code from the existing stack until the system becomes differentiable from end to end.

V12 retains almost all of FSD's perception visualizations, dropping only limited content such as traffic cones


03


Question 3: Can the end-to-end models in academic papers be migrated to actual products?


UniAD winning the CVPR 2023 Best Paper award undoubtedly reflects the high hopes the academic community places on end-to-end autonomous driving. After Tesla presented its innovations in visual BEV perception in 2021, domestic academia poured enormous enthusiasm into BEV perception for autonomous driving, producing a series of studies that pushed BEV methods toward better performance and real deployment. So can end-to-end take a similar route, with academia leading and industry following, to drive rapid iteration and productization? I think that will be much harder. First, BEV perception is still a relatively modular technology, mostly at the algorithm level, and entry-level performance does not require that much data; the launch of the high-quality open dataset nuScenes gave much BEV research a convenient starting point. Although BEV solutions iterated on nuScenes cannot meet product-level performance requirements, they have great reference value for proof of concept and model selection. For end-to-end, however, academia lacks data at the required scale. The largest dataset available, nuPlan, contains 1,200 hours of real-vehicle data collected in four cities. Yet at a 2023 earnings call, Musk said of end-to-end autonomous driving: "after training with 1 million video cases, it can barely work; 2 million, slightly better; 3 million, you will feel wow; when it comes to 10 million, its performance becomes incredible". Tesla's Autopilot feedback data is generally believed to consist of roughly 1-minute clips, so the entry-level 1 million video cases amount to roughly 16,700 hours, at least an order of magnitude more than the largest academic dataset. Note also that nuPlan was collected as continuous drives, so its distribution and diversity have fatal flaws: most of the data covers simple scenes. This means that academic datasets such as nuPlan cannot even yield a version that barely works on a real car.

The nuPlan dataset is already a very large academic dataset, but it is probably still not enough for exploring end-to-end solutions.
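A quick back-of-the-envelope check of the data gap (the 1-minute clip length is the commonly assumed figure from the text above, not an official number):

```python
# Back-of-the-envelope data-scale comparison. The 1-minute clip length is
# a common assumption about Tesla's feedback data, not an official figure.
CLIP_MINUTES = 1
NUPLAN_HOURS = 1200                      # largest academic dataset cited above

for clips in (1_000_000, 10_000_000):    # Musk's "barely works" and "incredible"
    hours = clips * CLIP_MINUTES / 60
    print(f"{clips:>10,} clips = {hours:>9,.0f} h "
          f"({hours / NUPLAN_HOURS:.0f}x nuPlan)")
# 1,000,000 clips =    16,667 h (14x nuPlan)
# 10,000,000 clips =   166,667 h (139x nuPlan)
```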

So we see that most end-to-end autonomous driving solutions, including UniAD, cannot run on real cars and can only be evaluated open-loop. Open-loop metrics have very low reliability, because open-loop evaluation cannot expose a model's confusion of cause and effect: a model that merely learns to extrapolate its own historical trajectory can score very well on open-loop metrics while being completely unusable. In 2023, Baidu published a paper known as AD-MLP (https://arxiv.org/pdf/2305.10430) examining the shortcomings of open-loop planning metrics. Using only historical ego information, without any perception input at all, it achieved very good open-loop results, close to some of the current SOTA work. But obviously, no one can drive a car well with their eyes closed!

AD-MLP achieves good open-loop performance without relying on any sensor input, showing that open-loop performance alone is not a meaningful reference.
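A rough sketch in the spirit of AD-MLP (illustrative, not the paper's exact architecture): the planner receives only the ego history, no perception at all, yet training can still drive the open-loop L2 error low because real trajectories are smooth and extrapolation usually looks right.

```python
import torch
import torch.nn as nn

# In the spirit of AD-MLP (illustrative, not the paper's exact model):
# the planner sees ONLY the ego history: no camera, no lidar, no map.
# Because real trajectories are smooth, pure extrapolation can score well
# on open-loop L2 error while being useless for actual driving.
history = torch.randn(32, 4, 4)    # batch, 4 past steps, (x, y, vx, vy)
future_gt = torch.randn(32, 6, 2)  # 6 future waypoints to predict

blind_planner = nn.Sequential(
    nn.Flatten(),                  # 4 steps * 4 features = 16 inputs
    nn.Linear(16, 128), nn.ReLU(),
    nn.Linear(128, 12),            # 6 waypoints * (x, y)
)

pred = blind_planner(history).view(32, 6, 2)
open_loop_l2 = (pred - future_gt).norm(dim=-1).mean()
print(open_loop_l2)  # training can drive this low with zero perception
```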

So can closed-loop policy evaluation solve the problems of open-loop imitation learning? At least for now, the academic community generally relies on the CARLA simulator for closed-loop end-to-end research and development, but models developed in CARLA, which is built on a game engine, are also difficult to transfer to the real world.
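The difference from open-loop scoring is easiest to see in code: closed-loop evaluation feeds the model's own actions back into the environment, so compounding errors and causal confusion are exposed. This is a generic sketch with a hypothetical `env`/`policy` interface, not the actual CARLA API.

```python
# Generic closed-loop rollout sketch (hypothetical `env` and `policy`
# objects, not the actual CARLA API). Unlike open-loop scoring against a
# recorded log, the model's own actions change the next observation, so
# small errors compound and a "blind" extrapolating planner quickly fails.
def closed_loop_eval(env, policy, max_steps=1000):
    obs = env.reset()
    total_infractions = 0.0
    for _ in range(max_steps):
        action = policy(obs)           # the model drives; no expert in the loop
        obs, info = env.step(action)   # the world reacts to the model's action
        total_infractions += info.get("infractions", 0)
        if info.get("done"):
            break
    return total_infractions
```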

04
