Tesla AI Day: Analysis of advantages and disadvantages of pure visual technology route, how far can it go?

Latest update time：2022-12-06

Introduction:

Tesla's pure vision algorithms are designed from first principles. They use BEV vision algorithms and an occupancy (grid) network to build a three-dimensional bird's-eye view of the surroundings and incorporate temporal features, which greatly narrows the gap with lidar solutions. Environment-related problems can be avoided to the greatest extent through the driver's own judgment. On the data side, with the help of a very large active fleet, Tesla uses shadow mode, techniques for exploiting unlabeled data, and similar methods to drive the continuous optimization and iteration of its perception, decision-making, and planning algorithms. The systemic advantages of integrating Tesla's chips, algorithms, and data, together with the broader commercialization path opened up by the cost advantages, stability, and scalability of pure vision solutions, are expected to help Tesla's autonomous driving technology maintain its lead in the global market and continue to strengthen its leading position in the global electric vehicle market.

On September 30, Tesla held its annual AI Day and shared its latest achievements in AI technology over the past year. The new concepts and innovations shared this year included the thorough application of end-to-end, data-driven methods to every part of the technical solution, as well as the performance leap brought about by highly vertically integrated optimization from the underlying hardware to the top-level algorithms. Tesla mainly proposed technical solutions to the two biggest inherent problems of pure vision solutions, identifying unknown obstacles and coping with extreme weather conditions, and has achieved certain results.

Source: CITIC Securities Research Institute

As a unique presence among global automakers, Tesla has always adhered to purely visual self-driving technology solutions. The purely visual autonomous driving solution has also attracted a large number of followers because of its lower cost and faster commercialization path. However, at the same time, the market has always had many concerns about this technology route.

Tesla's autonomous driving technology route: a pure vision solution with better economy and scalability. 1) Starting in February 2022, Tesla removed millimeter-wave radar from all models sold in North America and implemented autonomous driving with a pure vision solution. China's new car makers NIO, Xpeng, and Li Auto have all adopted a "camera + radar + high-precision map" solution. Traditional car companies such as Volkswagen currently adopt a "camera + radar" solution, and some of their future models are expected to be equipped with lidar. 2) Tesla has achieved fully in-house development of its chips and software systems. A comparison with the products of OEMs and RoboTaxi manufacturers shows that Tesla can currently develop the full stack of autonomous driving hardware, software systems, and application algorithms in-house.

The market's main concerns about Tesla's technical route: it is generally believed that the accuracy and robustness of pure vision solutions are not as good as those of lidar solutions, and that it is harder for them to meet the safety standards required for autonomous driving. The pure vision solution eliminates lidar and uses cameras as the only data input, so three-dimensional space must be generated from two-dimensional images, which makes it more difficult to accurately determine the three-dimensional position of obstacles.

At the same time, camera input is highly sensitive to lighting and other environmental factors, which may cause recognition difficulties in backlit or heavy-snow conditions and result in poor detection of white objects. These and other issues have caused the market to worry about the safety of Tesla's pure vision solution: even if its economics are excellent, the concern is that it will not be able to meet the safety standards required for autonomous driving.

In this article, we will combine Tesla’s latest technological advances, related papers, and technical details announced at Tesla AI Day in the past two years to conduct a systematic analysis and discussion on the competitiveness, potential advantages and disadvantages of Tesla’s pure vision technology route for autonomous driving.

▍ Tesla’s pure vision algorithm technology discussion

Tesla has a clear leadership in pure vision algorithms and is committed to solving some problems inherent in pure vision solutions from a technical perspective. From the relevant paper results of Andrej Karpathy to the technical results displayed at AI Day, Tesla has been leading the technological advancement path of pure visual solutions and has maintained its position as an industry benchmark.

On September 30, 2022, Tesla held this year's AI Day and shared its latest achievements in AI technology over the past year. The new concepts and innovations shared this year include the thorough application of end-to-end, data-driven methods to every part of the technical solution, as well as the performance leap brought about by highly vertically integrated optimization from the underlying hardware to the top-level algorithms. Technical solutions to the two biggest inherent problems of pure vision solutions, identifying unknown obstacles and coping with extreme weather conditions, have been proposed and have achieved certain results.

Algorithm framework: from perception to planning, decision-making, and simulation

To date, Tesla's autonomous driving software has gradually built a complete system of perception, planning, and simulation training by relying on data and neural networks. In the perception layer in particular, Tesla focuses on narrowing the gap with multi-sensor solutions through techniques such as BEV, Transformer, temporal (time-series) features, and Occupancy.

The efficiency and latency of the planning layer remain common problems for all autonomous driving solutions, and Tesla has achieved a certain lead here through its data-driven lightweight generative network. In the following sections, we analyze the current status of Tesla's pure vision solution based on Tesla's latest technical progress in these fields.

Visual perception algorithm: an occupancy (grid) network is added to quickly classify recognized objects as dynamic or static and improve obstacle recognition. Tesla shared many technical details of BEV at its 2021 AI Day. The key is that, after loosely coupled perception from each of the car's 8 cameras, a Transformer is used to stitch the results together and generate a 3D space.

The Occupancy Network (grid network) announced this time is Tesla's further exploration of pure vision perception. Occupancy does not replace the BEV bird's-eye view; it extends BEV in the height direction. By adding this dimension, Occupancy turns BEV's 2D grid into a 3D grid and generates Occupancy Features instead of BEV Features.
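The relationship between a 2D BEV grid and a 3D occupancy grid can be illustrated with a toy voxelization. The sketch below is not Tesla's network; it only bins hypothetical 3D points into voxels and then collapses the height axis to recover a BEV-style grid. The cell size, the sample points, and the function names are all assumptions chosen for illustration.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in metres, vehicle frame
Voxel = Tuple[int, int, int]         # (ix, iy, iz) grid indices


def voxelize(points: Iterable[Point], cell: float = 0.5) -> Set[Voxel]:
    """Return the set of occupied voxels for a collection of 3D points."""
    occupied = set()
    for x, y, z in points:
        # Floor division maps each coordinate to its grid cell index.
        occupied.add((int(x // cell), int(y // cell), int(z // cell)))
    return occupied


def collapse_to_bev(occupied: Set[Voxel]) -> Set[Tuple[int, int]]:
    """Drop the height index: a BEV cell is occupied if any voxel above it is."""
    return {(ix, iy) for ix, iy, _ in occupied}


# Hypothetical scene: a low kerb and an overhanging sign above the same
# ground cell, plus one separate low obstacle.
points = [(2.1, 0.2, 0.1), (2.2, 0.3, 2.6), (5.0, -1.0, 0.4)]
voxels = voxelize(points)
bev = collapse_to_bev(voxels)
```

In this toy scene the kerb and the overhanging sign fall into the same BEV cell but into different voxels, which is the extra height information the 3D grid preserves.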

Occupancy's vision-based semantic perception has advantages over lidar in some respects. It can quickly capture unfamiliar objects within the perception range and assign them velocity attributes. Unlike lidar, Occupancy does not need time synchronization or extrinsic calibration with the cameras, and its output frequency can reach 36 Hz, whereas most lidars output at only 10 Hz. In high-speed environments, or when perceiving fast-moving objects, pure vision Occupancy may therefore perform better than lidar.
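To make the frequency comparison concrete, the following sketch computes how far a vehicle travels between two consecutive perception outputs. The 36 Hz and 10 Hz figures come from the text; the 120 km/h speed is an assumed example value.

```python
def metres_per_frame(speed_kmh: float, output_hz: float) -> float:
    """Distance travelled between two consecutive perception outputs."""
    speed_ms = speed_kmh / 3.6          # km/h -> m/s
    return speed_ms / output_hz


SPEED_KMH = 120.0                       # assumed highway speed, not from the source
for name, hz in (("occupancy (36 Hz)", 36.0), ("lidar (10 Hz)", 10.0)):
    print(f"{name}: {metres_per_frame(SPEED_KMH, hz):.2f} m between outputs")
```

At the assumed speed, a 10 Hz sensor leaves roughly 3.3 m between updates, while a 36 Hz output leaves a little under 1 m.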

Path perception: Vector Lane lane-line perception is the latest highlight, adapting to complex urban road sections. As early as its 2021 AI Day, Tesla first announced lane-line perception technology with BEV at its core, which triggered a surge of research on BEV perception in academia. At this AI Day, Tesla again announced its latest progress in lane-line perception, the most eye-catching of which is end-to-end Vector Lane perception. In pre-processing, Tesla first adds a Map Component module, which takes information from a low-precision map, such as the geometric and topological relationships of lanes, lane width, and the number of lanes, and feeds the encoded result into the Vector Lane module.

The Vector Lane module uses a Transformer-like architecture. The start points, end points, fork points, and merge points of lane lines are fed as decoder input into the self-attention model, and the output of the Map Component is used to generate the Key and Value, from which the latest token is obtained. According to the results published by Tesla, the lane model generated in this way is continuously fine-tuned using real-time perception, helping FSD obtain road connectivity at very complex intersections.
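The query/key/value mechanism mentioned above can be sketched with generic scaled dot-product attention. This is the textbook operation, not Tesla's Vector Lane implementation: the two-dimensional embeddings, the variable names, and the idea of treating one lane point as the query and two map features as key/value pairs are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


# Hypothetical embeddings: one lane-point query, two map-feature key/value pairs.
lane_query = [1.0, 0.0]
map_keys = [[1.0, 0.0], [0.0, 1.0]]
map_values = [[10.0, 0.0], [0.0, 10.0]]
token = cross_attention(lane_query, map_keys, map_values)
```

Because the query aligns with the first key, the resulting token is dominated by the first value vector while still mixing in the second.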

Planning and decision-making algorithm: generate path planning based on vector space. Planning and decision-making algorithms have always been the most complex parts of autonomous driving algorithms, requiring low-latency and high-accuracy reasoning on vehicles with limited performance (bandwidth, computing power, memory and other attributes).

Although Tesla's planning and decision-making algorithm is not as dazzling as the results achieved by its perception algorithm, it has undoubtedly reached the industry-leading level. Tesla's decision-making algorithm is based on the vector space constructed by its perception algorithm, using incremental tree search to complete the overall decision. The entire incremental tree search process is divided into two core modules:

1) Decision tree generation: the lane-line results described above and the vector space generated by Occupancy are used as inputs to predict potential goal states in the space. These goals are then split into possible trajectory actions, and finally the interaction with other dynamic objects is resolved in a game-like manner to arrive at the optimal action. For traditional methods, the difficulty lies in computation time: in complex urban scenarios, constrained optimization cannot meet the latency requirements of on-road driving.

Tesla described an optimization method at last year's AI Day. It adds new constraints incrementally, using the optimal solution under fewer constraints as the initial value for solving the more complex optimization problem, until the final optimal solution is obtained. Tesla engineers noted that although this method relies on extensive offline pre-generation and parallel online optimization, and although a computation time of 1 to 5 ms per candidate path is already excellent, it is still not fast enough to explore complex urban scenes as exhaustively as possible.

In the end, Tesla used a data-driven lightweight generative network to help generate planned paths quickly. This data-driven model is trained on the driving data of human drivers in the Tesla fleet and on ground-truth globally optimal paths planned offline without time constraints, and it can generate a candidate path within 100 µs.

2) Decision tree pruning: with the decision tree generation algorithm in place, the planning problem is fully defined. However, even with the generation method above, it is impossible to traverse the decision tree of a complex scene within the limited response time. A system that can quickly evaluate and score candidate paths, reject implausible ones, and prune the decision tree is therefore the other piece of the puzzle in the decision-making and planning system.

Tesla again combines traditional and data-driven methods. Four measures are used to evaluate candidate paths and complete the pruning: collision detection, comfort analysis, a model trained on real fleet driver data and shadow mode that predicts the probability that a candidate path will lead to a takeover, and the difference between the candidate path and a human driver's path.
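The four evaluation measures described above can be sketched as a simple filter-and-rank step. This is a minimal illustration, not Tesla's planner: the `Candidate` fields, the use of jerk as a comfort proxy, the scoring weights, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    collides: bool            # result of a geometric collision check
    max_jerk: float           # comfort proxy, m/s^3
    takeover_prob: float      # predicted probability of driver takeover
    human_distance: float     # deviation from a human-like path, metres


def score(c: Candidate) -> float:
    """Lower is better. The weights are illustrative assumptions."""
    return 1.0 * c.max_jerk + 10.0 * c.takeover_prob + 0.5 * c.human_distance


def prune(candidates: List[Candidate], keep: int) -> List[Candidate]:
    """Reject colliding paths outright, then keep the best-scoring ones."""
    safe = [c for c in candidates if not c.collides]
    return sorted(safe, key=score)[:keep]


candidates = [
    Candidate("yield", False, 1.0, 0.02, 0.4),
    Candidate("assert", False, 2.5, 0.10, 1.2),
    Candidate("cut_in", True, 1.5, 0.30, 2.0),
    Candidate("hard_brake", False, 6.0, 0.05, 3.0),
]
best = prune(candidates, keep=2)
```

Treating collision as a hard rejection and the other three measures as a weighted score is one possible design; the source does not say how Tesla combines them.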

Data engine: one of Tesla’s core strengths

Third-party pure vision solution providers: the Software 1.0 development model relies more on open-source or modified open-source algorithms, with multiple subtasks connected in series to form a system. Other pure vision systems developed with Software 1.0 thinking currently have the following problems:

1) Under Software 1.0 thinking, the perception task consists of multiple independent subtasks in series, including dynamic and static object perception, lane-line perception, traffic sign perception, and so on. Each subtask has some error, and in a serial structure the error of one subtask is carried into the next, ultimately seriously affecting the output.

2) A large amount of parallel computing power of the GPU is wasted under the serial structure. For each subtask, there are similar data preprocessing steps. Under the serial process, these repeated data processing steps lead to a large amount of computational redundancy.

3) Code written in the Software 1.0 mode has low iteration efficiency for tasks on the scale of autonomous driving. A modification to one subtask may affect the whole serial system, which is far less efficient than the Software 2.0 mode, where the main workload is adjusting network parameters and data sets.
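The error accumulation described in point 1) can be illustrated with a simple calculation. The sketch assumes that each subtask fails independently and that the pipeline is correct only when every stage is correct; both the independence assumption and the 95% per-stage figure are illustrative, not from the source.

```python
from typing import Iterable


def serial_accuracy(stage_accuracies: Iterable[float]) -> float:
    """Probability that every stage is correct, assuming independent errors."""
    result = 1.0
    for p in stage_accuracies:
        result *= p
    return result


# Hypothetical pipeline: four chained subtasks, each 95% accurate on its own.
pipeline = [0.95] * 4
print(round(serial_accuracy(pipeline), 3))   # 0.815
```

Under these assumptions, four stages that each look reliable in isolation yield a pipeline that is right only about 81% of the time.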

Tesla: the Software 2.0 model is better suited to developing pure vision algorithms. In the Software 1.0 era, engineers wrote, debugged, and ran code in an integrated development environment (IDE). In the Software 2.0 era, engineers no longer program directly; instead they modify the data set, network architecture, or loss function of the neural network according to the results it produces, and then deploy it.

In Autopilot, a large number of small neural networks replace traditional programs written directly in C++, and the various software interfaces are then connected through a larger neural network (HydraNet), forming a tightly interlocking whole. What supports Tesla's Software 2.0 development model is its huge data reserves, efficient data processing mechanism, and excellent team of AI engineers, all accumulated over many years.

Tesla's unique advantages in implementing Software 2.0 development are a huge vehicle fleet, a closed data loop, and mature engineering thinking. Tesla's data engine forms a closed loop of its own. Fleets equipped with standard self-driving hardware collect large amounts of data. In shadow mode, rules and triggers based on differences between the human driver and the AI (such as takeovers and behavioral differences) decide which data is sent back, so that data with semantic information is filtered and returned to the cloud. In the cloud, tools are used to correct the erroneous AI output, and the corrected data is added to the data cluster. This valid data is then used to train the on-vehicle online model and the cloud offline model. Finally, the new model is deployed to vehicles in shadow mode for testing, and the metrics of different versions are compared until the verified new model is deployed on the car, completing a full data-driven iterative development cycle.
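The trigger step of this loop can be sketched as a filter over logged frames that flags takeovers and large human/model disagreements for upload. The sketch is a hypothetical illustration: the `Frame` fields, the use of steering angle as the compared signal, and the 5-degree threshold are assumptions, not details disclosed by Tesla.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    timestamp: float
    human_steer: float     # steering angle chosen by the driver, degrees
    model_steer: float     # steering angle the shadow model would have chosen
    takeover: bool         # driver overrode an active assistance feature


def should_upload(frame: Frame, steer_threshold_deg: float = 5.0) -> bool:
    """Flag a frame when the driver takes over or disagrees strongly with the model."""
    disagreement = abs(frame.human_steer - frame.model_steer)
    return frame.takeover or disagreement > steer_threshold_deg


def select_for_upload(frames: List[Frame]) -> List[Frame]:
    """Keep only the frames worth sending back for correction and retraining."""
    return [f for f in frames if should_upload(f)]


log = [
    Frame(0.0, 1.0, 1.5, False),    # agreement: not uploaded
    Frame(0.1, -8.0, 2.0, False),   # large disagreement: uploaded
    Frame(0.2, 0.0, 0.5, True),     # takeover: uploaded
]
uploads = select_for_upload(log)
```

Filtering on the vehicle in this way means that only the informative fraction of fleet data needs to be transmitted and corrected in the cloud.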

This entire data closed loop is undoubtedly a model for the application of data-driven systems today, and has been imitated by various companies. However, achieving high levels of automation and process standardization like Tesla is not something that can be achieved overnight. It requires not only understanding the working principles of the framework, but also strong engineering practice capabilities and years of polishing and iteration.

Engineering thinking emphasizes industrial practice and constantly adds new data to solve problems. The biggest difference between industry and academia in the current era of data-driven artificial intelligence is that academia keeps the data fixed and iterates new algorithms on it to improve model performance, whereas the core of industrial practice lies in finding problems, proactively acquiring the corresponding data, adding it to the training set, and using a constant stream of new data to drive the model to solve them. This is also an important reason why Tesla's autonomous driving team has been able to stay ahead. In essence, everything a data-driven model learns comes from data, and different models differ mainly in learning speed and operating efficiency. Ultimately, the quantity and quality of data determine the upper limit of the model.

Tesla data processing: emphasis on data scalability and avoiding dependence on high-precision maps. At this AI Day, Tesla presented a solution that uses 4D annotation through road reconstruction as the ground truth for lane-line perception. This is essentially crowdsourced mapping based on Tesla's powerful visual perception capabilities. The difference is that Tesla does not explicitly use the "low-precision" maps built in this way; instead it internalizes them into the perception model as ground truth, avoiding reliance on detailed high-precision maps.

4D annotation is an important method for Tesla's future data processing. It was first introduced as early as 2019 at Autonomy Day. At that time, Tesla used the SfM (structure from motion) method to reconstruct the surrounding scene and then performed annotation on the reconstructed point cloud. According to the evolution of the Autolabeler shared at 2022 AI Day, the topology at that time was based on only a single trajectory, with a reprojection error of less than 3 pixels. Annotation was still largely manual, and it took 3.5 hours to annotate one video clip.

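From 2021 to the present, automatic labeling has used 3D features to aggregate and reconstruct trajectories collected over multiple trips, achieving a reprojection accuracy of less than 3 pixels. Manual labeling time is comparable to the 2020 figure, but computational efficiency has improved significantly and scalability has become very strong.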
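The reprojection error used as the quality metric here is the pixel distance between where a reconstructed 3D point projects into an image and where it was actually observed. The sketch below uses a standard pinhole camera model; the intrinsics, the 3D point, and the observed pixel are hypothetical values chosen for illustration.

```python
import math
from typing import Tuple


def project(point_cam: Tuple[float, float, float],
            fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a 3D point (camera frame, z forward) to pixels."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(point_cam: Tuple[float, float, float],
                       observed_px: Tuple[float, float],
                       fx: float, fy: float, cx: float, cy: float) -> float:
    """Euclidean pixel distance between the projected and observed positions."""
    u, v = project(point_cam, fx, fy, cx, cy)
    return math.hypot(u - observed_px[0], v - observed_px[1])


# Hypothetical intrinsics and a reconstructed lane point 20 m ahead.
FX = FY = 1000.0
CX, CY = 640.0, 360.0
err = reprojection_error((1.0, 0.5, 20.0), (691.0, 386.0), FX, FY, CX, CY)
within_budget = err < 3.0   # compare against the "< 3 pixels" figure in the text
```

In this example the point projects to (690, 385), one pixel away from the observation on each axis, so the error is about 1.41 pixels.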

▍ Analysis of advantages and disadvantages of pure visual technology route

The market's main concern about pure vision solutions is that their ability to identify unknown obstacles is not as good as lidar's, and that errors may occur when the amount of data is small. The general view is that pure vision solutions lack the ability to identify unknown obstacles: recognition-based visual perception can only avoid obstacles similar to objects it has seen in the training set and is helpless against unfamiliar obstacles it has never seen. Lidar is therefore considered an indispensable sensor for safe autonomous driving. However, this view is changing with the rapid development of pure vision algorithms. In the 2D image plane, segmentation of drivable areas is already mature, and it can determine where the drivable area is without relying on the recognition of obstacles.

The results of the Occupancy Network newly announced by Tesla at this AI Day extend the segmentation of the 2D field to the 3D space, which can be directly used for downstream path planning. According to the person in charge of Tesla's vision model, under high moving speeds and normal weather conditions, the recognition capability of the pure visual solution under Occupancy can even exceed that of the lidar solution. With further optimization in the future, the gap between pure vision solutions and lidar solutions in identifying unknown obstacles will become smaller and smaller.

Perception: the gap between pure vision solutions and lidar can be greatly reduced through systems engineering and technical means. The difficulty is that this requires a holistic design like Tesla's, with every detail of the vision solution optimized. Humans perceive the world around them through vision, and pixel sensors such as cameras can provide a large amount of information at the lowest cost. This information is a complex, high-bandwidth constraint on the state of the world.

The hardest part for pure vision solutions in catching up with multi-sensor fusion solutions is the entire workflow of building the neural network system. With limited on-vehicle compute, memory, and bandwidth, completing this series of tasks at scale, from the data engine through to a low-latency iterative evaluation system, requires a more holistic design and technical means to optimize every link.

Potential disadvantages of pure vision solutions: they place certain requirements on lighting, and recognition is affected by overexposure or backlighting (for example, when exiting a tunnel) and by extreme environments. The pure vision solution relies entirely on camera input, so in strongly backlit scenes such as tunnel exits it has always had poor detection of white objects. For pure vision solutions, the way to address this type of problem is to keep expanding the amount of data so that the complete scene can be inferred even when part of the input is blurred or missing.

Multi-sensor fusion solutions also face the impact of extreme weather. Lidar performs equally badly in environments such as heavy fog and may even be inferior to cameras. In a multi-sensor fusion solution, the camera is still an important data input, and the data from multiple sensors is cross-checked to generate the three-dimensional space around the car body. When lighting is affected, the camera in a multi-sensor fusion solution is also seriously affected, and how to arbitrate between the sensors becomes an additional problem.

In addition, lidar detects by emitting light beams, which are strongly affected by the environment in extreme weather. Once the beam is blocked, lidar cannot work normally, so it cannot be used in bad weather such as rain, snow, haze, or sandstorms. Multi-sensor solutions must then depend more on millimeter-wave radar for detection. However, Tesla's removal of millimeter-wave radar from all models in early 2022 suggests that the accuracy of millimeter-wave radar is too low to solve this problem well. Therefore, in extreme weather and similar external conditions, whichever solution is adopted, it is still necessary to rely more on the driver's own judgment and reduce the use of autonomous driving technology.

▍ Analysis of commercialization prospects of pure visual solutions

Compared with the fusion solution, the biggest advantage of the pure visual solution is the overall implementation cost. We estimate that Tesla’s hardware cost is only about US$200. After Tesla removed the millimeter-wave radar, the pure vision solution only required eight cameras, and the cost of a single camera was about $30. For multi-sensor fusion solutions, lidar is essential.

Although the cost of lidar has fallen continuously in recent years, a single unit still costs around US$600-2,000. A complete autonomous driving system generally requires 3-5 lidars, so the total cost ranges between US$3,000 and US$10,000. The cost of multi-sensor fusion solutions is thus significantly higher than that of pure vision solutions, and this high cost is a key factor hindering their commercialization.
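The cost comparison can be reproduced as simple arithmetic from the unit counts and unit prices quoted above. The sketch multiplies the quoted figures directly; the naive products (US$240 for cameras, US$1,800 to US$10,000 for lidar) differ somewhat from the rounded totals stated in the text, which the source does not break down further.

```python
CAMERA_COUNT = 8                       # cameras in the pure vision setup
CAMERA_UNIT_USD = 30                   # approximate cost per camera
LIDAR_COUNT_RANGE = (3, 5)             # lidars per fusion system
LIDAR_UNIT_USD_RANGE = (600, 2000)     # cost per lidar unit

# Sensor cost of the camera-only configuration.
camera_total = CAMERA_COUNT * CAMERA_UNIT_USD

# Cheapest case: fewest units at the lowest price; most expensive: the opposite.
lidar_low = LIDAR_COUNT_RANGE[0] * LIDAR_UNIT_USD_RANGE[0]
lidar_high = LIDAR_COUNT_RANGE[1] * LIDAR_UNIT_USD_RANGE[1]

print(f"cameras: ${camera_total}")
print(f"lidar:   ${lidar_low} to ${lidar_high}")
```

Even at the low end of this range, lidar alone costs several times as much as the full camera set.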

Tesla's commercialization of autonomous driving: against a background of continuous feature upgrades, prices have risen by roughly 100%-200% over the past three years. As the feature set of Tesla FSD has grown, so has its price. In North America, the FSD price rose from US$5,000 in 2019 to US$15,000 in September 2022, an increase of 200%. In China, the FSD price rose from 27,800 yuan in April 2019 to 64,000 yuan in September 2022, an increase of more than 100%.
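The percentage increases can be checked directly from the quoted prices. The sketch below uses only the four price points given in the text; the function name is an arbitrary choice.

```python
def percent_increase(old: float, new: float) -> float:
    """Increase from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100.0


north_america = percent_increase(5_000, 15_000)   # US dollars, 2019 to Sep 2022
china = percent_increase(27_800, 64_000)          # yuan, Apr 2019 to Sep 2022

print(f"North America: {north_america:.0f}%")
print(f"China: {china:.0f}%")
```

On these figures the North American price tripled, which is an increase of 200%, while the price in China rose by about 130%.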

Tesla's autonomous driving penetration rate: about 7.3% globally, higher in North America than in other regions, and higher for high-end models such as Model S/X than for other models.

1) Troy Teslike data shows that the global penetration rate of Tesla FSD in 22Q1 was approximately 7.3%, about 28.4 percentage points below the peak of 35.7% in 2019Q4. The main reason is that the share of sales from lower-priced models such as Model 3 and Model Y continues to increase.

2) Looking at different regions, the FSD penetration rate in North America is higher than other regions. Troy Teslike data shows that the penetration rate of Tesla FSD in North America in 22Q1 is about 14.5%, which is higher than the 6.6% penetration rate in Europe and the 0.3% penetration rate in the Asia-Pacific region.

3) By model, in North America, the region with the highest FSD penetration rate, the FSD penetration rate of the high-end Model S and Model X is much higher than the 20% penetration rate of Model Y in the same period.
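The decline in the global penetration rate quoted in item 1) can be expressed either in percentage points or relative to the peak, and the two readings differ considerably. The sketch below computes both from the two rates given in the text; the function names are arbitrary.

```python
def percentage_point_drop(peak_pct: float, current_pct: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return peak_pct - current_pct


def relative_drop(peak_pct: float, current_pct: float) -> float:
    """Decline relative to the peak value, in percent."""
    return (peak_pct - current_pct) / peak_pct * 100.0


PEAK_2019Q4 = 35.7   # global FSD penetration rate at its peak, percent
RATE_22Q1 = 7.3      # global FSD penetration rate in 22Q1, percent

points = percentage_point_drop(PEAK_2019Q4, RATE_22Q1)
relative = relative_drop(PEAK_2019Q4, RATE_22Q1)
```

The drop is about 28.4 percentage points, which corresponds to a decline of roughly 80% relative to the peak.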

The second advantage of the pure vision solution is the unity and scalability of the system. There is no need to deal with the synchronization of different sensing units in a multi-sensor solution or with the question of which sensor to trust when their perceptions differ. Once the pure vision solution obtains data from the cameras, the data can be fed directly into vector space generation.

In a multi-sensor fusion solution, the transmission synchronization of different sensors must first be taken into account, and transmission bandwidth, latency, and similar factors must be matched to form a complete data system. Second, when the perception results of different sensors disagree, for example when lidar and camera information do not match, deciding which to trust is another important factor limiting the flexibility of fusion solutions.

The scalability of the pure vision solution will accelerate the commercial expansion of Tesla Semi, and the launch of Tesla Bot will greatly accelerate the commercialization of the company's autonomous driving software. Semi's three-truck platooning and the extension of autonomous driving technology to Tesla Bot can, to a large extent, accelerate the company's commercialization progress in autonomous driving.

Semi will bring FSD autonomous driving functions into use quickly through three-truck platooning. Musk revealed on the 2022Q2 conference call that Semi would enter mass production in 2023.

1) In terms of price, the starting price of Semi is US$150,000, roughly double the starting price of Model S and Model X. This will play a major role in increasing the penetration rate of FSD autonomous driving in the future.

2) In terms of autonomous driving configuration, Semi will use FSD to run a three-truck platoon as an assisted driving function. With FSD, three trucks can use automatic following on the highway with only one truck driver (the two following Semis do not need a driver), which can greatly reduce labor costs.

Space is limited; to be continued.
