Tesla AI Day: Analysis of advantages and disadvantages of pure visual technology route, how far can it go?
Introduction:
Tesla's pure-vision algorithms are designed from first principles: BEV vision algorithms and grid (occupancy) networks build a three-dimensional bird's-eye view of the surroundings, and time-series features are incorporated, which greatly narrows the gap with lidar solutions. Problems on the environmental side can be avoided to the greatest extent through the driver's own judgment. On the data side, with the help of a huge active fleet, Tesla uses shadow mode, unlabeled-data techniques, and similar methods to drive the continuous optimization and iteration of its perception, decision-making, and planning algorithms. The systemic advantages of integrating Tesla's chips, algorithms, and data, together with the broader commercialization path opened by the cost advantages, stability, and scalability of pure-vision solutions, are expected to keep Tesla's autonomous driving technology in the lead globally and to keep strengthening its leading position in the global electric vehicle market.
On September 30, Tesla held its annual AI Day and shared its latest achievements in AI technology over the past year. The concepts and innovations shared this year included the thorough application of end-to-end, data-driven approaches to every part of the technical solution, as well as the performance leap brought by highly vertically integrated optimization from the underlying hardware to the top-level algorithms. These mainly address the two biggest inherent problems of pure-vision solutions, identifying unknown obstacles and coping with extreme weather conditions, and have achieved certain results.
As a unique presence among global automakers, Tesla has always adhered to purely visual self-driving technology solutions. The purely visual autonomous driving solution has also attracted a large number of followers because of its lower cost and faster commercialization path. However, at the same time, the market has always had many concerns about this technology route.
Tesla’s autonomous driving technology route: a pure visual solution with better economy and scalability. 1) Since February 2022, Tesla has removed millimeter-wave radar from all models sold in North America and runs autonomous driving on a purely visual solution. China's new EV makers NIO, Xpeng, and Li Auto have all adopted a "camera + radar + high-precision map" solution. Traditional car companies such as Volkswagen currently adopt a "camera + radar" solution, and some of their future models are expected to be equipped with lidar. 2) Tesla has achieved full in-house development of its chips and software systems. Compared with other OEMs and RoboTaxi manufacturers, Tesla is currently able to develop the full stack in-house: autonomous driving hardware, software systems, and application algorithms.
Algorithm framework: from perception to planning, decision-making, and simulation
To date, Tesla's autonomous driving software has gradually built a complete system of perception, planning, and simulation training on top of data and neural networks. In the perception layer in particular, Tesla focuses on narrowing the gap with multi-sensor solutions through new techniques such as BEV, Transformer, time-series features, and Occupancy networks.
Efficiency and latency in the planning layer remain common problems for all autonomous driving solutions, and Tesla has currently achieved a certain lead through its data-driven lightweight generation network. In what follows, we analyze the current status of Tesla's pure visual solution based on its latest technological progress in these fields.
Data engine: one of Tesla’s core strengths
Third-party pure visual solution providers: the Software 1.0 development model relies more on open-source or modified open-source algorithms, with multiple subtasks connected in series to form a system. Pure visual systems developed with Software 1.0 thinking currently have the following problems:
1) The perception task under Software 1.0 thinking is composed of multiple independent subtasks in series, including dynamic and static object perception, road line perception, traffic sign perception, etc. Each subtask has a certain error, and in a serial structure the error of one subtask is carried into the next, ultimately seriously affecting the output.
2) A large amount of parallel computing power of the GPU is wasted under the serial structure. For each subtask, there are similar data preprocessing steps. Under the serial process, these repeated data processing steps lead to a large amount of computational redundancy.
3) Code written in Software 1.0 mode has low iteration efficiency when faced with tasks at the scale of autonomous driving. A modification to one subtask may affect the whole serial chain, which is far less efficient than Software 2.0 mode, where the main workload is adjusting network parameters and datasets.
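The error accumulation described in point 1) can be illustrated with a small sketch. It assumes that subtask errors are independent, and the per-subtask accuracies are hypothetical values chosen for illustration, not measurements of any real system.

```python
from math import prod


def serial_pipeline_accuracy(stage_accuracies):
    """Probability that every stage in a serial chain is correct.

    Assumes stage errors are independent, which is a simplification.
    """
    return prod(stage_accuracies)


# Hypothetical per-subtask accuracies (illustrative only).
stages = {
    "object_perception": 0.98,
    "road_line_perception": 0.97,
    "traffic_sign_perception": 0.99,
}

end_to_end = serial_pipeline_accuracy(stages.values())
print(f"end-to-end accuracy: {end_to_end:.4f}")
```

Under these assumptions the chained result is lower than the accuracy of the weakest single subtask, which is the effect the serial structure is criticized for.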
Tesla: the Software 2.0 model is better suited to developing pure visual algorithms. In the Software 1.0 era, engineers wrote, debugged, and ran code in an integrated development environment (IDE). In the Software 2.0 era, engineers no longer program directly; instead they modify the dataset, the network architecture, or the loss function of the neural network, and deploy based on the results the network produces.
In Autopilot, a large number of small neural networks replace the traditional programs written directly in C++, and the various software interfaces are then connected through a larger neural network (HydraNet), all of which form an interlocking and precise whole. What supports Tesla's Software 2.0 development model is its huge data reserves, its efficient data processing mechanism, and the excellent team of AI engineers it has built up over many years.
Tesla’s unique advantages in implementing Software 2.0 development are a huge vehicle fleet, a closed data loop, and mature engineering thinking. Tesla's data engine forms a closed loop of its own. A fleet equipped with standard self-driving hardware collects large amounts of data. In shadow mode, rules and triggers based on differences between the human driver and the AI (such as takeovers and behavioral differences) decide which data is returned. Data with semantic information is filtered and sent back to the cloud, where tools are used to correct the erroneous AI output before the data is added to the data cluster. This valid data is then used to train the on-vehicle online model and the cloud offline model. Finally, new versions are deployed to vehicles in shadow mode for testing, and the metrics of different versions are compared until the verified new model is deployed on the car, completing one data-driven iteration cycle.
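The trigger step of such a closed loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `Frame` fields, the `should_upload` function, and the 5-degree threshold are all hypothetical and do not describe Tesla's actual trigger logic.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One time step recorded while the model runs silently in shadow mode."""
    human_steering_deg: float   # what the driver actually did
    model_steering_deg: float   # what the model would have done
    driver_took_over: bool      # driver overrode an active assist feature


def should_upload(frame: Frame, steering_threshold_deg: float = 5.0) -> bool:
    """Flag a frame when human and model behavior clearly disagree.

    The threshold is an assumed value for illustration.
    """
    if frame.driver_took_over:
        return True
    gap = abs(frame.human_steering_deg - frame.model_steering_deg)
    return gap > steering_threshold_deg


frames = [
    Frame(human_steering_deg=1.0, model_steering_deg=1.5, driver_took_over=False),
    Frame(human_steering_deg=12.0, model_steering_deg=2.0, driver_took_over=False),
    Frame(human_steering_deg=0.0, model_steering_deg=0.2, driver_took_over=True),
]

# Only the disagreeing frames are queued for return to the cloud.
upload_queue = [f for f in frames if should_upload(f)]
print(len(upload_queue), "of", len(frames), "frames flagged")
```

The point of the design is that the fleet sends back only the cases where the model and the driver disagree, so the returned data concentrates on situations the current model handles poorly.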
This entire data closed loop is undoubtedly a model for the application of data-driven systems today, and has been imitated by various companies. However, achieving high levels of automation and process standardization like Tesla is not something that can be achieved overnight. It requires not only understanding the working principles of the framework, but also strong engineering practice capabilities and years of polishing and iteration.
Engineering thinking emphasizes industrial practice and constantly adds new data to solve problems. The biggest difference between industry and academia in the current era of data-driven artificial intelligence is that academia keeps the data fixed and iterates on new algorithms over that fixed data to improve model performance. The core of actual industrial practice, however, lies in finding problems, proactively acquiring the corresponding data, adding it to the training set, and using a steady flow of new data to drive the model to solve them. This is also an important reason why Tesla's autonomous driving team can stay in the lead. In essence, everything a data-driven model learns comes from data, and the differences between models lie mainly in learning speed and operating efficiency. In the final analysis, the quantity and quality of data determine the upper limit of the model.
Tesla data processing: emphasize data scalability and avoid dependence on high-precision maps. At this AI Day, Tesla presented a 4D annotation solution that uses road reconstruction as the ground truth for lane line perception. It is essentially crowdsourced mapping based on Tesla’s powerful visual perception capabilities. The difference is that Tesla does not explicitly use the "low-precision" maps built this way; instead it internalizes them as ground truth into the perception model, avoiding reliance on high-precision maps with detailed information.
4D annotation is an important method for Tesla's future data processing. It was first introduced as early as Autonomy Day in 2019. At that time, Tesla used the SfM (structure from motion) method to reconstruct the surrounding scene and then annotated on the reconstructed point cloud. The evolution of the Autolabeler shared at AI Day 2022 shows that the topology at that time was based on a single trajectory only, with a reprojection error of less than 3 pixels. The annotation process was still relatively manual, and it took 3.5 hours to annotate one video clip.
From 2021 onward, automatic annotation began using 3D features to aggregate and reconstruct trajectories collected over multiple trips, achieving a reprojection accuracy of less than 3 pixels. Manual annotation time is comparable to the 2020 figure, but computational efficiency has improved significantly and scalability has become very strong.
▍ Analysis of advantages and disadvantages of pure visual technology route
The market's main concern about purely visual solutions is that their ability to identify unknown obstacles is not as good as lidar's, and that errors may occur when the amount of data is small. The general view is that pure visual solutions lack the ability to identify unknown obstacles: recognition-based visual perception can only avoid an obstacle if similar objects appeared in the training set, and it is helpless against unfamiliar obstacles it has never seen. Lidar is therefore considered an indispensable sensing device for safe autonomous driving. However, this perception is changing with the rapid development of pure visual algorithms. In 2D image processing, segmentation of drivable areas is already mature, and drivable areas can be determined without relying on the recognition of obstacles.
The results of the Occupancy Network newly announced by Tesla at this AI Day extend the segmentation of the 2D field to the 3D space, which can be directly used for downstream path planning. According to the person in charge of Tesla's vision model, under high moving speeds and normal weather conditions, the recognition capability of the pure visual solution under Occupancy can even exceed that of the lidar solution. With further optimization in the future, the gap between pure vision solutions and lidar solutions in identifying unknown obstacles will become smaller and smaller.
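How a 3D occupancy output can feed path planning without classifying the obstacle can be sketched in a few lines. This is a simplified illustration, not Tesla's implementation: the voxel size, the set-of-voxels representation, and the function names `world_to_voxel` and `path_is_free` are assumptions made for the example.

```python
def world_to_voxel(point, voxel_size_m=0.5):
    """Map an (x, y, z) position in meters to integer voxel indices."""
    return tuple(int(c // voxel_size_m) for c in point)


def path_is_free(path_points, occupied_voxels, voxel_size_m=0.5):
    """A path is free if none of its points falls inside an occupied voxel.

    No object class is needed: occupancy alone decides.
    """
    return all(
        world_to_voxel(p, voxel_size_m) not in occupied_voxels
        for p in path_points
    )


# Hypothetical occupancy output: two occupied voxels about 2 m ahead,
# belonging to an object the detector was never trained to recognize.
occupied = {(4, 0, 0), (4, 1, 0)}

# Two candidate paths sampled every 0.5 m along x.
straight = [(i * 0.5, 0.0, 0.0) for i in range(8)]
swerve = [(i * 0.5, -1.0, 0.0) for i in range(8)]

print("straight free:", path_is_free(straight, occupied))
print("swerve free:", path_is_free(swerve, occupied))
```

The planner only asks whether space is occupied, which is why an occupancy representation helps with obstacles that never appeared in the training set.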
Perception link: The gap between pure visual solutions and lidar can be greatly reduced through system engineering and technical means. The difficulty lies in the need to provide a holistic design like Tesla and optimize every detail in the visual solution. Humans perceive the world around them through vision, and pixel sensors, such as cameras, can provide a large amount of information at the lowest cost. This information is a complex, high-bandwidth constraint on the state of the world.
The hardest part of catching up with multi-sensor fusion solutions is the entire workflow of building a neural network system. With limited on-vehicle computing power, memory, and bandwidth, running everything from the data engine to a low-latency iterative evaluation system at scale and in batches requires a more holistic design and targeted optimization of every link.
Potential disadvantages of purely visual solutions: There are certain requirements for light sources, and recognition will be affected when encountering overexposure or backlighting (such as when exiting a tunnel scene) and in extreme environments. The purely visual solution relies entirely on the data input of the camera. Therefore, in strong backlight scenarios such as exiting a tunnel, the purely visual solution has always had poor detection of white objects. For purely visual solutions, the way to solve this type of problem is to continue to expand the amount of data so that the complete situation can be calculated even when part of the data is blurred or missing.
Multi-sensor fusion solution: it also faces the impact of extreme weather. The performance of lidar in environments such as heavy fog is equally poor and may even be inferior to that of cameras. In a multi-sensor fusion solution, the camera is also an important source of input, and the data from multiple sensors is cross-checked to generate the three-dimensional space around the car body. When the light source is affected, the camera in a multi-sensor fusion solution is seriously affected as well. How to choose between the outputs of multiple sensors then becomes an additional issue.
In addition, lidar detects by emitting light beams, which are strongly affected by the environment under extreme weather conditions. Once the beam is blocked, lidar cannot be used normally, so it cannot be relied on in bad weather such as rain, snow, haze, or sandstorms. In those conditions, multi-sensor solutions have to rely more on millimeter-wave radar for detection.
However, Tesla’s removal of millimeter-wave radar from all models in early 2022 suggests that the accuracy of millimeter-wave radar itself is too low to solve this problem well.
Therefore, in the face of extreme weather and other external conditions, whichever solution is adopted today still needs to rely more on the driver's own judgment and reduce the use of autonomous driving technology.
▍ Analysis of commercialization prospects of pure visual solutions
Compared with the fusion solution, the biggest advantage of the pure visual solution is the overall implementation cost. We estimate that Tesla’s hardware cost is only about US$200. After Tesla removed the millimeter-wave radar, the pure vision solution only required eight cameras, and the cost of a single camera was about $30. For multi-sensor fusion solutions, lidar is essential.
Although the cost of lidar has been continuously reduced in recent years, a single unit is still around US$600-2,000. The entire autonomous driving system generally requires 3-5 lidars, so the total cost also fluctuates between US$3,000-10,000. The cost of multi-sensor fusion solutions is significantly higher than that of pure vision solutions. The high cost is a key factor hindering its commercialization.
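The arithmetic behind this comparison can be laid out explicitly. The sketch below uses only the unit costs and counts quoted above; it covers sensors only and ignores compute, wiring, and integration costs. Multiplying the quoted figures gives US$240 for eight cameras, in the neighborhood of the "about US$200" estimate, and a lidar range whose lower end (US$1,800) is below the quoted US$3,000. The article does not state how its own totals were derived.

```python
# Figures quoted in the text (USD).
CAMERA_COUNT = 8
CAMERA_UNIT_COST = 30
LIDAR_UNIT_COST_RANGE = (600, 2000)
LIDAR_COUNT_RANGE = (3, 5)

# Pure vision: cameras only.
vision_sensor_cost = CAMERA_COUNT * CAMERA_UNIT_COST

# Fusion: lidar cost alone, cheapest and most expensive combinations.
lidar_cost_low = LIDAR_COUNT_RANGE[0] * LIDAR_UNIT_COST_RANGE[0]
lidar_cost_high = LIDAR_COUNT_RANGE[1] * LIDAR_UNIT_COST_RANGE[1]

print(f"camera sensors: ${vision_sensor_cost}")
print(f"lidar sensors:  ${lidar_cost_low} to ${lidar_cost_high}")
print(
    "lidar / camera cost ratio: "
    f"{lidar_cost_low / vision_sensor_cost:.1f}x to "
    f"{lidar_cost_high / vision_sensor_cost:.1f}x"
)
```

Even at the cheapest combination, the lidar units alone cost several times as much as the full camera set, which is the gap the commercialization argument rests on.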
Tesla’s commercialization of autonomous driving: against the background of continuous feature upgrades, prices have risen by roughly 130% to 200% over the past three years. As the feature richness of Tesla FSD continues to increase, its price is also rising. In North America, the FSD price increased from US$5,000 in 2019 to US$15,000 in September 2022, an increase of 200%. In China, the FSD price increased from 27,800 yuan in April 2019 to 64,000 yuan in September 2022, an increase of about 130%.
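The increases implied by the quoted prices can be checked with a one-line formula, (new - old) / old. The helper name `percent_increase` is just for this example.

```python
def percent_increase(old_price, new_price):
    """Relative price increase, in percent of the old price."""
    return (new_price - old_price) / old_price * 100


# Prices quoted in the text.
north_america = percent_increase(5_000, 15_000)   # USD, 2019 -> Sep 2022
china = percent_increase(27_800, 64_000)          # yuan, Apr 2019 -> Sep 2022

print(f"North America: {north_america:.1f}%")
print(f"China: {china:.1f}%")
```

Tripling the price corresponds to a 200% increase, and the China price rose by a little over 130%.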
Tesla's FSD penetration rate: about 7.3% globally, higher in North America than in other regions, and higher for high-end models such as Model S/X than for other models.
1) Troy Teslike data shows that the global penetration rate of Tesla FSD in 2022 Q1 was approximately 7.3%, about 28.4 percentage points below the peak of 35.7% in 2019 Q4. The main reason is that lower-priced models such as Model 3 and Model Y make up a growing share of sales.
2) By region, FSD penetration is highest in North America. Troy Teslike data shows that FSD penetration in North America in 2022 Q1 was about 14.5%, higher than 6.6% in Europe and 0.3% in the Asia-Pacific region.
3) By model, in North America, the region with the highest FSD penetration, the penetration rate of the high-end Model S and Model X is much higher than the 20% penetration rate of Model Y in the same period.
The second advantage of the purely visual solution is the unity and scalability of the system. There is no need to deal with synchronizing different sensing units, as in a multi-sensor solution, or with deciding which sensor to trust when their perceptions differ. Once the purely visual solution obtains data from the cameras, it can be transferred directly into the vector space.
A multi-sensor fusion solution first has to handle transmission synchronization between different sensors, matching transmission bandwidth, delay, and other factors to form a complete data system. Second, when the perception results of different sensors disagree, for example when lidar and camera information do not match, how to choose between them is also an important factor limiting the flexibility of fusion solutions.
The scalability of the pure vision solution will accelerate the commercial expansion of Tesla Semi, and the launch of Tesla Bot will greatly accelerate the commercialization of the company's autonomous driving software. Semi's three-truck platooning and the extension of self-driving technology to Tesla Bot can, to a large extent, accelerate the company's commercialization progress in self-driving.
Semi will bring FSD autonomous driving into use quickly through three-truck platooning. Tesla CEO Elon Musk said on the 2022 Q2 earnings call that Semi would enter mass production in 2023.
1) In terms of price, the starting price of Semi is US$150,000, roughly double the starting price of Model S and Model X. This will play a major role in increasing the penetration rate of FSD in the future.
2) In terms of autonomous driving configuration, Semi will use FSD to run in a three-truck platoon, thereby realizing assisted driving: with FSD equipped, three trucks can use automatic following on the highway with only one truck driver (the two following Semis need no driver), which can save labor costs to a great extent.
Space is limited; to be continued.