This article gives a brief overview of visual perception in the autonomous driving industry: it compares sensors, covers data collection and annotation, analyzes the perception algorithms along with the difficulties and solutions of each module, and finally introduces the mainstream framework design of the perception module.
The visual perception system mainly uses cameras as its sensor input and, after a series of computations, accurately perceives the environment around the vehicle. Its purpose is to supply the fusion module with accurate and rich information, including the category, distance, speed, and orientation of each detected object, as well as abstract semantic information. Road-traffic perception therefore covers three main tasks:
Dynamic object detection (vehicles, pedestrians and non-motor vehicles)
Static object recognition (traffic signs and traffic lights)
Segmentation of drivable area (road area and lane lines)
If all three of these tasks are handled by a single forward pass of one deep neural network, the system's detection speed improves and the parameter count drops, while detection and segmentation accuracy can still be raised by deepening the backbone network. As the figure below shows, the visual perception task can be decomposed into object detection, image segmentation, object measurement, image classification, and so on.
▍Sensor components
Forward-view camera
The field of view is narrow; a camera module with roughly a 52° field of view is typically mounted at the center of the front windshield. It mainly senses the scene far ahead of the vehicle, with a perception range generally within 120 meters.
Panoramic wide-angle camera
The field of view is relatively large; typically six camera modules with about a 100° field of view are installed around the vehicle to sense the full 360° surroundings (an installation scheme similar to Tesla's). Wide-angle cameras exhibit a certain degree of distortion, as shown in the figure below:
Surround view fisheye camera
The surround-view fisheye camera has a field of view wider than 180° and perceives close range well. It is usually used in parking scenarios such as APA and AVP. Four units are installed, below the left and right rearview mirrors and near the front and rear license plates, to support image stitching, parking-space detection, visualization, and other functions.
▍Camera calibration
The quality of camera calibration directly affects the accuracy of target ranging. Calibration mainly includes intrinsic calibration and extrinsic calibration.
Intrinsic calibration corrects image distortion; extrinsic calibration unifies the coordinate systems of multiple sensors, moving their respective origins to the center of the vehicle's rear axle.
The best-known calibration method is Zhang Zhengyou's checkerboard method: in the laboratory, a checkerboard target is made and used to calibrate the camera, as shown below:
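As a minimal numerical sketch of the core step in Zhang's method, the code below simulates a planar checkerboard, projects its corners with assumed intrinsics (all values here are illustrative), and recovers the board-to-image homography with the Direct Linear Transform. In practice a library such as OpenCV performs corner detection, homography estimation, and intrinsic decomposition end to end.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: estimate H such that dst ~ H @ src
    for corresponding planar points (the core of Zhang's method)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A (smallest singular vector)
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Simulated checkerboard: a 9x6 grid of corners on the Z = 0 plane (mm)
bx, by = np.meshgrid(np.arange(9) * 25.0, np.arange(6) * 25.0)
board = np.stack([bx.ravel(), by.ravel()], axis=1)

# Assumed intrinsics K and a simple pose: board 1 m in front of the camera
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([50.0, 30.0, 1000.0])

# For Z = 0 points, projection reduces to pixels ~ K @ [r1 r2 t] @ [X Y 1]^T
P = K @ np.column_stack([R[:, 0], R[:, 1], t])
homog = P @ np.column_stack([board, np.ones(len(board))]).T
pixels = (homog[:2] / homog[2]).T

H = estimate_homography(board, pixels)
```

With several such homographies from different board poses, Zhang's method then solves for the intrinsic matrix; that step is omitted here for brevity.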
Factory calibration
However, for mass-produced autonomous vehicles it is impractical to calibrate every car with a handheld calibration board. Instead, a dedicated site is built and each vehicle is calibrated as it leaves the factory, as shown in the figure below:
Online calibration
In addition, since the camera's pose can drift after the vehicle has been driven for a while or over bumpy roads, the perception system also includes an online calibration module, which often uses detected cues such as vanishing points or lane lines to update the pitch angle in real time.
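A common version of this trick, sketched below with assumed intrinsics (illustrative values), intersects two detected lane lines to obtain the vanishing point and converts its vertical offset from the principal point into a pitch estimate. The sign convention depends on the camera setup.

```python
import numpy as np

def lane_vanishing_point(l1, l2):
    """Intersect two image lines given in homogeneous form (a, b, c),
    i.e. a*x + b*y + c = 0. The cross product of two homogeneous lines
    is their intersection point."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

def pitch_from_vanishing_point(v_vp, cy, fy):
    """Camera pitch in radians (positive nose-down here), from the
    vanishing point's image row v_vp. Assumes image rows grow downward:
    when the camera pitches down, the horizon rises above the principal
    point row cy."""
    return np.arctan2(cy - v_vp, fy)
```

For example, with a principal point row of 360 and focal length 800 px, a vanishing point 20 px above the principal point implies a pitch of about arctan(20/800) ≈ 1.4°.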
▍Data Annotation
Natural road scenes contain all kinds of unexpected situations, so a large amount of real-vehicle data must be collected for training. High-quality data annotation is therefore crucial: everything the perception system needs to detect must be annotated. Annotation takes two forms, object-level and pixel-level:
Object-level annotation looks like this:
Pixel-level annotation looks like this:
Since the detection and segmentation tasks in perception systems are usually implemented with deep learning, a data-driven technology, they require large amounts of data and annotations for iteration. To improve annotation efficiency, a semi-automatic scheme can be used: a neural network embedded in the annotation tool provides initial labels, humans correct them, and after a period of time the new data and labels are fed back for the next training cycle.
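That loop can be sketched as follows; `model`, `human_correct`, and `retrain` are hypothetical stand-ins for the embedded network, the manual correction step, and the training routine, not names from any real tool.

```python
def annotation_cycle(raw_batches, model, human_correct, retrain):
    """One semi-automatic annotation loop: the model pre-labels each
    batch, humans fix the labels, and the corrected dataset feeds the
    next training round."""
    dataset = []
    for batch in raw_batches:
        pre_labels = [model(x) for x in batch]            # machine pre-annotation
        labels = [human_correct(x, y) for x, y in zip(batch, pre_labels)]
        dataset.extend(zip(batch, labels))                # accumulate corrected pairs
        model = retrain(model, dataset)                   # periodic re-training
    return model, dataset
```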
▍Functional division
Visual perception can be divided into several functional modules: object detection and tracking, object measurement, drivable area, lane-line detection, static object detection, and so on.
Object Detection and Tracking
Detect dynamic objects such as vehicles (cars, trucks, electric vehicles, bicycles) and pedestrians; output each object's category and 3D information; and match objects across frames to stabilize the detection boxes and predict object trajectories. Regressing the 3D box directly with a neural network is not very accurate, so the vehicle is usually decomposed into front, body, rear, and tire parts that are assembled into a 3D box.
Difficulties in target detection: occlusion is common and orientation-angle accuracy is hard to guarantee; pedestrians and vehicles come in many varieties, which invites false detections; and multi-target tracking suffers from ID switching.
Purely visual detection also degrades in bad weather and tends to miss targets at night under dim lighting. Fusing the results with LiDAR greatly improves target recall.
Target detection scheme:
Detecting multiple targets, especially vehicles, requires the vehicle's 3D bounding box. The advantage of 3D is that it provides the vehicle's orientation angle and height. Adding a multi-target tracking algorithm assigns consistent ID numbers to vehicles and pedestrians.
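ID assignment across frames can be illustrated with a deliberately simple greedy IoU matcher. Production trackers add motion models and Hungarian matching; the threshold below is an illustrative assumption.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def assign_ids(tracks, detections, next_id, iou_thresh=0.3):
    """Greedy matching: a detection inherits the ID of the unclaimed
    track it overlaps most (above the threshold); otherwise it opens
    a new track with a fresh ID."""
    assigned, used = {}, set()
    for det_idx, det in enumerate(detections):
        best_id, best_iou = None, iou_thresh
        for tid, box in tracks.items():
            if tid in used:
                continue
            score = iou(box, det)
            if score > best_iou:
                best_id, best_iou = tid, score
        if best_id is None:
            best_id = next_id
            next_id += 1
        used.add(best_id)
        assigned[det_idx] = best_id
    return assigned, next_id
```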
Deep learning is a probabilistic method: even with strong feature-extraction capability it cannot cover every dynamic-object appearance. In engineering development, geometric constraints drawn from real scenes can be added (for example, the length-to-width ratio of cars and trucks is roughly fixed, the distance to a vehicle cannot change abruptly, and pedestrian height falls within a limited range).
The benefit of geometric constraints is a higher detection rate and a lower false-detection rate; for example, a car is less likely to be mistaken for a truck. A practical module can train a 3D detection model (or a 2.5D model) and combine it with back-end multi-target tracking and a monocular-geometry distance-measurement method.
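A minimal sketch of such plausibility gating is shown below; the class names, field names, and thresholds are illustrative assumptions, not production values.

```python
def passes_geometric_checks(det, prev_range_m=None, dt_s=0.1):
    """Reject detections that violate simple geometric priors.
    `det` is a dict like {"cls", "width_m", "height_m", "range_m"}
    (a hypothetical schema for illustration)."""
    checks = {
        # Cars have a bounded width-to-height ratio; trucks are tall;
        # pedestrian height lies in a limited range.
        "car":        lambda d: 1.2 <= d["width_m"] / d["height_m"] <= 2.5,
        "truck":      lambda d: d["height_m"] >= 2.0,
        "pedestrian": lambda d: 0.5 <= d["height_m"] <= 2.3,
    }
    ok = checks.get(det["cls"], lambda d: True)(det)
    # Range continuity: the distance to a tracked object cannot jump
    # faster than a generous bound on closing speed allows.
    if ok and prev_range_m is not None:
        max_speed_mps = 70.0
        ok = abs(det["range_m"] - prev_range_m) <= max_speed_mps * dt_s
    return ok
```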
Target measurement
Target measurement covers the target's lateral and longitudinal distance and lateral and longitudinal speed. Based on the output of detection and tracking, the distance and speed of dynamic obstacles such as vehicles are computed from the 2D image with the help of prior knowledge such as the ground plane, or the object's position in the world coordinate system is regressed directly by a neural network, as shown in the figure below:
Difficulties of monocular measurement:
How can a monocular system, which lacks depth information, compute the distance to an object in a given direction? We first need to answer the following questions:
What kind of needs are there?
What kind of priors are there?
What kind of maps are there?
What kind of accuracy is required?
What kind of resources can be provided?
If we lean heavily on pattern recognition to compensate for the missing depth, is it robust enough to meet the stringent detection-accuracy requirements of mass-produced products?
Monocular measurement solution:
The first method establishes the geometric relationship between the object's world coordinates and its image pixel coordinates through an optical geometric model (the pinhole camera model). Combined with the camera's intrinsic and extrinsic calibration results, this yields the distance to the vehicle or obstacle ahead.
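Under a flat-road assumption this reduces to intersecting the pixel ray with the ground plane. A minimal sketch with assumed intrinsics and camera height (illustrative values) follows.

```python
import numpy as np

def ground_distance(v_px, K, cam_height_m, pitch_rad=0.0):
    """Longitudinal distance to a point on a flat road from its image
    row v_px. For a pinhole camera at height h, the ray through row v
    meets the ground where Z = h / tan(theta), with theta the ray's
    angle below horizontal; for zero pitch this equals h * fy / (v - cy)
    for rows below the horizon."""
    fy, cy = K[1, 1], K[1, 2]
    ray_angle = np.arctan2(v_px - cy, fy) + pitch_rad
    if ray_angle <= 0:
        return np.inf                  # ray at or above the horizon
    return cam_height_m / np.tan(ray_angle)
```

For example, with fy = 800 px, cy = 360, and a camera 1.5 m above the road, a ground-contact point at row 440 maps to 1.5 × 800 / 80 = 15 m.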
The second method directly regresses a functional relationship between image pixel coordinates and vehicle distance from collected image samples. It is pure data fitting without theoretical support, so it is limited by the extraction accuracy of the fitted parameters and is relatively less robust.
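For illustration only, such a fit can be sketched with synthetic samples; the inverse-row model and the assumed horizon row below are hypothetical choices, not the article's prescribed method.

```python
import numpy as np

# Synthetic (image row, measured range) pairs; in practice these come
# from annotated ground-contact rows and ranging ground truth.
v = np.array([700.0, 560.0, 480.0, 440.0, 420.0])   # image rows (px)
z = np.array([3.5, 6.0, 10.0, 15.0, 20.0])          # measured range (m)

# Fit z ~ a / (v - b): linear in 1/(v - b) once a horizon row b is fixed.
b = 360.0                                           # assumed horizon row
a = np.linalg.lstsq(1.0 / (v - b)[:, None], z, rcond=None)[0][0]

def predict_range(v_px):
    """Range predicted by the fitted inverse-row model."""
    return a / (v_px - b)
```

The fitted coefficient plays the role of h·fy in the geometric model above, which is exactly why such fits work until the camera pose or horizon row drifts.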
Drivable area
Segmenting the drivable area mainly involves delineating vehicles, ordinary road edges, curbs, boundaries without visible obstacles, and unknown boundaries, and finally outputting the safe area through which the vehicle can pass.
Difficulties in road segmentation:
In complex scenes the boundary shapes are varied, which makes generalization hard. Unlike detections with clearly defined classes (vehicles, pedestrians, traffic lights), the free space must delimit the vehicle's safe driving area, and every obstacle that blocks the vehicle's path must be segmented, including uncommon ones: water-filled barriers, cones, potholes, unpaved roads, green belts, tiled road boundaries, crossroads, T-junctions, and so on.