The first half of the automobile revolution is electrification; the second half is intelligence. Electrification changes only how the car is powered, not what the car fundamentally is. Intelligence is the main course of this revolution and will bring disruptive change: cars will evolve from traditional mechanical machines into intelligent machines with powerful computing capabilities.
On the road to automotive intelligence there is one leader of unmatched strength: Tesla, led by Elon Musk. The autonomous driving system it has built is the focus of global attention, and Musk once wrote on Weibo that the artificial intelligence Tesla has created is the most advanced in the world.
Tesla is the only technology company in the world to have developed and manufactured the core elements of autonomous driving entirely in-house. It has built a full-stack autonomous driving software and hardware architecture covering perception, planning, control, and execution, across the levels of data, algorithms, and computing power.
Overall, Tesla's autonomous driving architecture perceives the world with a purely vision-based approach, using neural networks to construct a three-dimensional vector space of the real world from raw video data. Within that vector space, a hybrid planning system that combines traditional planning-and-control methods with neural networks performs the vehicle's behavior and path planning and generates the control signals sent to the actuators, while a complete closed-loop data system and simulation platform drive the continuous iteration of its autonomous driving capabilities.
The following analysis covers Tesla's core system for achieving FSD (Full Self-Driving) in four parts: perception, planning and control, data and simulation, and computing power.
01 Perception
According to the presentation at Tesla AI Day in August 2021, Tesla's latest perception scheme is purely vision-based: it completely abandons non-camera sensors such as lidar and millimeter-wave radar and relies on cameras alone, which makes it unique in the field of autonomous driving.
Human visual perception works roughly as follows: light enters the eyes and the information is collected by the retina. After transmission and preprocessing, the information reaches the visual cortex of the brain, where neurons extract features such as color, orientation, and edges, which are then passed on to the inferior temporal cortex. After complex processing by the cognitive neural network, the perception result is finally produced.
Principles of human visual perception
The visual perception approach to autonomous driving imitates the human visual system, with cameras serving as the "eyes of the car". Tesla vehicles use a total of eight cameras distributed around the body. Three sit at the front: the main forward camera, the wide forward camera (fisheye lens), and the narrow forward camera (telephoto lens). Each side carries two cameras, a side forward-view camera and a side rearward-view camera, and one rear-view camera sits at the rear of the body. Together they provide 360-degree surround coverage, with a maximum monitoring distance of up to 250 meters.
Tesla body camera surround view
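For readers who prefer code, the eight-camera layout described above can be summarized as a small data structure. This is purely illustrative: the grouping and key names are assumptions, while the camera counts, lens types, and 250-meter range come from the text.

```python
# Illustrative summary of the eight-camera layout described above.
# Keys and structure are assumptions for illustration; the counts,
# lens types, and 250 m range are taken from the text.
CAMERA_LAYOUT = {
    "front": [
        "main field-of-view camera",
        "wide field-of-view camera (fisheye lens)",
        "narrow field-of-view camera (telephoto lens)",
    ],
    "left":  ["side forward-view camera", "side rearward-view camera"],
    "right": ["side forward-view camera", "side rearward-view camera"],
    "rear":  ["rear-view camera"],
}
MAX_MONITORING_DISTANCE_M = 250  # 360-degree surround coverage overall

assert sum(len(cams) for cams in CAMERA_LAYOUT.values()) == 8  # eight cameras in total
```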
The real-world image data collected by these "eyes of the car" is processed by a complex perception neural network to construct a three-dimensional vector space of the real world. This space contains dynamic traffic participants such as cars and pedestrians, static environmental objects such as lane lines, traffic signs, traffic lights, and buildings, and for each element attribute parameters such as coordinate position, heading angle, distance, velocity, and acceleration. The vector space does not need to match the appearance of the real world exactly; it is better understood as a mathematical representation built for machine understanding.
Cameras collect the data, and the neural network outputs the three-dimensional vector space
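The "vector space" can be made concrete with a small data-structure sketch. All names and types here are illustrative assumptions; the fields mirror the attribute parameters listed above, not Tesla's actual output format.

```python
from dataclasses import dataclass

@dataclass
class VectorSpaceElement:
    """One element of the perceived 3D vector space (illustrative only).

    The fields mirror the attribute parameters listed above; the names and
    types are assumptions for illustration, not Tesla's actual schema.
    """
    kind: str                                       # e.g. "car", "pedestrian", "lane_line"
    position_m: tuple[float, float, float]          # coordinate position in the ego frame
    heading_rad: float                              # direction angle
    distance_m: float                               # distance from the ego vehicle
    velocity_mps: tuple[float, float, float] = (0.0, 0.0, 0.0)
    acceleration_mps2: tuple[float, float, float] = (0.0, 0.0, 0.0)

# A perception "frame" of the vector space is then simply a list of elements,
# refreshed as each new batch of camera frames is processed.
vector_space: list[VectorSpaceElement] = [
    VectorSpaceElement("car", (12.0, -1.5, 0.0), 0.05, 12.1, (8.3, 0.0, 0.0)),
    VectorSpaceElement("lane_line", (0.0, 1.8, 0.0), 0.0, 1.8),
]
```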
According to the information Tesla made public at AI Day, after multiple rounds of upgrades and iteration the visual perception framework Tesla currently uses is shown in the figure below. It is a multi-task neural network architecture with a shared feature space, built on video stream data, with deep object recognition ability and short-term memory.
Tesla Visual Perception Network Architecture
Network infrastructure: HydraNet multi-head network
The basic structure of Tesla's visual perception network consists of a backbone, a neck, and multiple branch heads. Tesla named it "HydraNet" after the Hydra, the nine-headed serpent of ancient Greek mythology.
The backbone layer is trained end to end on the raw video data using a RegNet (a residual-style convolutional network) together with a BiFPN multi-scale feature fusion structure, producing the multi-scale visual feature space (feature maps) of the neck layer. The head layer then trains sub-networks for the different task types and outputs the perception results, supporting more than 1,000 tasks in total, including object detection, traffic light recognition, and lane line recognition.
HydraNet multi-task network structure
The core feature of HydraNet is that multiple subtask branches share the same feature space. Compared with using an independent neural network for each task, this has the following advantages (a minimal code sketch follows the list):
1) Using the same backbone to extract features once and share them with every task head avoids repeated computation across tasks and effectively improves the overall efficiency of the network;
2) The subtask types are decoupled from one another. Each task runs independently without affecting the others, so a single task can be upgraded without having to re-verify every other task, which reduces the upgrade cost;
3) The generated feature space can be cached and called on at any time according to the needs of each task, giving the architecture strong scalability.
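The shared-backbone, multi-head idea can be sketched in a few lines of PyTorch. This is not Tesla's network: the backbone below is a tiny stand-in for the RegNet-plus-BiFPN stack, and only two of the many task heads are shown.

```python
import torch
import torch.nn as nn

class TinyHydraNet(nn.Module):
    """Illustrative multi-head network: one shared backbone, many task heads.

    A stand-in sketch only; Tesla's real backbone is a RegNet with BiFPN
    feature fusion and supports far more task heads.
    """
    def __init__(self, num_object_classes=10, num_light_states=4):
        super().__init__()
        # Shared backbone: extracts a common feature space used by all tasks.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        feat_dim = 64 * 8 * 8
        # Independent task heads that all consume the same cached features.
        self.object_head = nn.Linear(feat_dim, num_object_classes)
        self.traffic_light_head = nn.Linear(feat_dim, num_light_states)

    def forward(self, images):
        features = self.backbone(images).flatten(1)   # computed only once
        return {
            "objects": self.object_head(features),             # object-detection logits
            "traffic_lights": self.traffic_light_head(features),
        }

# Usage: one forward pass through the backbone serves every head.
model = TinyHydraNet()
outputs = model(torch.randn(2, 3, 128, 128))
print({name: out.shape for name, out in outputs.items()})
```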
Data calibration layer: virtual camera builds standardized data
Tesla trains a single, shared perception network on data collected from many different cars. Because the cameras' extrinsic mounting parameters vary slightly from car to car, the collected data can deviate slightly between vehicles. Tesla therefore adds a "virtual standard camera" layer to the perception framework: using each camera's extrinsic calibration parameters, the collected images are de-distorted, rotated, and otherwise transformed so that they all map into one common set of virtual standard camera coordinates. This "rectification" of each camera's raw data removes the extrinsic errors and ensures data consistency, and the rectified data is then fed to the backbone network for training.
Insert a virtual camera layer before the raw data enters the neural network
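As a rough sketch of what per-camera rectification can look like, the OpenCV snippet below undistorts an image and rotates it into an assumed virtual standard camera, given known intrinsics, distortion coefficients, and the rotation into the virtual frame. All calibration values are placeholders; Tesla's exact procedure is not public.

```python
import cv2
import numpy as np

def rectify_to_virtual_camera(image, K, dist_coeffs, R_to_virtual, K_virtual):
    """Warp one camera's image into a shared 'virtual standard camera'.

    Illustrative only: K and dist_coeffs are the physical camera's intrinsics
    and distortion, R_to_virtual rotates it into the virtual camera frame,
    and K_virtual is the common target intrinsic matrix.
    """
    h, w = image.shape[:2]
    # Build per-pixel lookup maps that undistort and re-rotate the image.
    map1, map2 = cv2.initUndistortRectifyMap(
        K, dist_coeffs, R_to_virtual, K_virtual, (w, h), cv2.CV_32FC1
    )
    return cv2.remap(image, map1, map2, interpolation=cv2.INTER_LINEAR)

# Example with made-up calibration values (placeholders, not real parameters).
K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])
K_virtual = np.array([[820.0, 0.0, 640.0], [0.0, 820.0, 360.0], [0.0, 0.0, 1.0]])
dist = np.array([-0.1, 0.01, 0.0, 0.0, 0.0])      # k1, k2, p1, p2, k3
R = np.eye(3)                                     # identity: already aligned
frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in camera frame
rectified = rectify_to_virtual_camera(frame, K, dist, R, K_virtual)
```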
Spatial understanding layer: Transformer realizes three-dimensional transformation
The data the cameras collect lives in the 2D image plane, which is not the same dimensionality as the three-dimensional real world, so achieving full self-driving capability requires transforming this two-dimensional data into three-dimensional space.
To construct a three-dimensional vector space, the network must be able to output object depth information. Most autonomous driving companies obtain depth from sensors such as lidar and millimeter-wave radar and fuse it with the visual perception results. Tesla insists on computing depth from the video data of its pure-vision solution. The idea is to introduce a BEV space conversion layer into the network to give it spatial understanding. The BEV coordinate system is a bird's-eye-view coordinate system: an ego-vehicle coordinate system that ignores elevation information.
Tesla's early solution was to perform perception in the two-dimensional image space, map the results into three-dimensional vector space, and then fuse the results from all cameras. However, image-level perception relies on the flat-ground assumption, that is, treating the ground as an infinitely large plane, whereas real-world roads have slopes, which makes the predicted depth inaccurate. This is the biggest difficulty faced by camera-only vision solutions. A single camera may also fail to see a complete target, which makes this kind of "post-fusion" (late fusion) of per-camera results hard to do well. The sketch below illustrates how the flat-ground assumption produces the depth error.
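To see why the flat-ground assumption breaks down, consider the standard inverse perspective mapping that lifts a pixel to a 3D point under that assumption. The intrinsics and camera height below are made-up values for illustration, not Tesla parameters.

```python
import numpy as np

def pixel_to_ground(u, v, K, camera_height_m):
    """Lift pixel (u, v) to a 3D point assuming a perfectly flat ground plane.

    Illustrative inverse perspective mapping: the viewing ray through the pixel
    is intersected with a horizontal plane camera_height_m below the camera.
    """
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray in the camera frame
    # Camera convention: +z forward, +y down. Scale the ray so its downward
    # component equals the camera's height above the assumed ground plane.
    scale = camera_height_m / ray[1]
    return ray * scale                              # point on the assumed flat ground

K = np.array([[1000.0, 0.0, 640.0],                 # made-up pinhole intrinsics
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
point = pixel_to_ground(640, 500, K, camera_height_m=1.4)
print(point)  # ~[0, 1.4, 10]: the pixel is placed 10 m ahead; on an uphill road
              # the true ground point is closer, so this flat-ground estimate is too far
```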
To address this problem and make the perception results more accurate, Tesla adopts "front fusion" (early fusion): the video streams from the multiple cameras around the body are fused directly and then trained in a single neural network, which transforms the features from two-dimensional image space into three-dimensional vector space.
Introducing BEV three-dimensional space conversion layer
The core module that performs the three-dimensional transformation is the Transformer, a deep learning model based on the attention mechanism. The mechanism is inspired by how the human brain processes information: when faced with a large amount of external input, the brain filters out what is unimportant and focuses only on the key information, which greatly improves processing efficiency. Transformers perform exceptionally well on large-scale data learning tasks.
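One common way such a Transformer-based 2D-to-BEV lift is built is with cross-attention from a grid of learnable BEV queries to the fused multi-camera image features. The sketch below is a generic illustration under that assumption, not Tesla's actual implementation.

```python
import torch
import torch.nn as nn

class BEVCrossAttention(nn.Module):
    """Minimal sketch: BEV grid queries attend over fused multi-camera features.

    A generic illustration of attention-based 2D-to-BEV lifting, not Tesla's
    internal design. Each BEV cell learns to pull in the image features that
    are relevant to its location on the ground plane.
    """
    def __init__(self, embed_dim=128, bev_h=20, bev_w=20, num_heads=4):
        super().__init__()
        # One learnable query per cell of the ego-centric top-down (BEV) grid.
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, embed_dim))
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, camera_features):
        # camera_features: (batch, n_cameras * tokens, embed_dim), i.e. the
        # flattened feature maps of all cameras concatenated (early fusion).
        batch = camera_features.shape[0]
        queries = self.bev_queries.unsqueeze(0).expand(batch, -1, -1)
        bev_features, _ = self.cross_attn(queries, camera_features, camera_features)
        return bev_features                         # (batch, bev_h * bev_w, embed_dim)

# Usage: 8 cameras, each contributing 16x16 feature tokens of width 128.
tokens = torch.randn(2, 8 * 16 * 16, 128)
bev = BEVCrossAttention()(tokens)
print(bev.shape)  # torch.Size([2, 400, 128])
```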