Analysis of sensor calibration methods for autonomous driving systems of smart cars


Sensor calibration is a basic requirement for autonomous driving. A vehicle carries multiple sensors of different types, and the coordinate relationships between them must be determined. Jesse Levinson, co-founder and CTO of the Bay Area autonomous driving startup Zoox and a student of Sebastian Thrun, wrote his doctoral thesis on sensor calibration.


This work can be divided into two parts: intrinsic calibration and extrinsic calibration. The intrinsic parameters determine the mapping inside the sensor, such as a camera's focal length, principal point and pixel aspect ratio (plus distortion coefficients), while the extrinsic parameters determine the transformation between the sensor and an external coordinate system, i.e., the pose parameters (rotation and translation, 6 degrees of freedom).
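As a minimal illustration (all numeric values below are made up), the intrinsics form the matrix K that maps camera coordinates to pixels, and the extrinsics are the rigid transform that brings points from an external frame into the camera frame:

```python
import numpy as np

# Illustrative intrinsic parameters: focal lengths and principal point (distortion omitted).
fx, fy = 1000.0, 1000.0          # focal lengths in pixels (fy/fx encodes the aspect ratio)
cx, cy = 640.0, 360.0            # principal point
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Illustrative extrinsic parameters: rotation + translation (6 DoF) from an external frame.
R = np.eye(3)
t = np.array([0.1, 0.0, 0.2])

def project(p_ext):
    """Project a 3-D point given in the external frame to pixel coordinates."""
    p_cam = R @ p_ext + t        # extrinsics: rigid transform into the camera frame
    u, v, w = K @ p_cam          # intrinsics: perspective projection
    return np.array([u / w, v / w])

print(project(np.array([1.0, 0.5, 5.0])))
```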


Camera calibration was once a prerequisite for 3-D reconstruction in computer vision. Zhang Zhengyou's well-known calibration method exploits the invariance properties of the image of the absolute conic to derive a planar-target calibration algorithm, which greatly simplifies the control field.
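For reference, a minimal OpenCV sketch of Zhang-style intrinsic calibration from checkerboard images (the file pattern and board size are placeholders, not taken from the article):

```python
import glob
import numpy as np
import cv2

# 3-D corner coordinates on the planar target (z = 0), in checkerboard-square units.
board = (9, 6)                                        # inner corners per row/column (placeholder)
obj = np.zeros((board[0] * board[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2)

obj_pts, img_pts, size = [], [], None
for path in glob.glob("checkerboard_*.png"):          # placeholder file pattern
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        obj_pts.append(obj)
        img_pts.append(corners)
        size = gray.shape[::-1]

# Returns the intrinsic matrix K, distortion coefficients and per-view extrinsics.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
```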


The focus here is the extrinsic calibration between different sensors, especially between lidar and camera.


In addition, in the development of autonomous driving, calibration between GPS/IMU and camera or lidar, and calibration between radar and camera, are also common. The biggest difficulty in calibrating different sensors against each other is choosing a suitable matching measure, because the types of data they produce are different:


The camera provides RGB images (pixel arrays);


LiDAR provides a 3-D point cloud with range information (possibly with reflectance/intensity values);


GPS/IMU provides vehicle position and attitude information;


Radar provides a 2-D reflectance map.


As a result, the objective function used to minimize the calibration error differs from one sensor pair to another.


In addition, there are two classes of calibration methods: targetless and target-based. The former works in a natural environment with few constraints and requires no special target; the latter requires a dedicated control field and a ground-truth target, such as the typical checkerboard plane.


Here we discuss only targetless methods and present several calibration algorithms in turn.



Hand-eye calibration

This is a constrained problem commonly studied in calibration: broadly speaking, a "hand" (e.g., GPS/IMU) and an "eye" (lidar/camera) are rigidly fixed to the same machine. When the machine moves, the pose changes of the "hand" and the "eye" must satisfy a constraint, and solving the resulting equation, generally of the form AX = XB, yields the coordinate transformation between the "hand" and the "eye".
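In homogeneous-transform notation the constraint can be derived as follows (a standard derivation, not specific to any one paper). Let $T_h$ and $T_e$ be the world poses of the "hand" and the "eye", related by the fixed unknown $X$ through $T_e = T_h X$. For one motion, with relative motions $A = T_{h,1}^{-1} T_{h,2}$ and $B = T_{e,1}^{-1} T_{e,2}$:

\[ B = (T_{h,1} X)^{-1} (T_{h,2} X) = X^{-1} A X \;\;\Longrightarrow\;\; A X = X B . \]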


There are two types of hand-eye systems: eye-in-hand and eye-to-hand. Our case is clearly the former, i.e., both the "hand" and the "eye" move with the vehicle.

Hand-eye calibration methods fall into two-step and one-step approaches. The best-known paper on the latter is "Hand-Eye Calibration Using Dual Quaternions", and the one-step method is generally considered more accurate. The two-step method estimates the rotation first and then the translation.

Here we examine the LiDAR-camera calibration algorithm in the paper "LiDAR and Camera Calibration using Motion Estimated by Sensor Fusion Odometry" from the University of Tokyo.

This amounts to an extended hand-eye calibration problem, namely 2D-3D calibration, as shown in the figure:



The typical solution for hand-eye calibration is a two-step method: first solve the rotation matrix, and then estimate the translation vector. The formula is given below:
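Writing each transform as a rotation plus a translation, the standard decomposition of AX = XB that the two-step method solves is (generic notation, not quoted from the paper):

\[ R_A R_X = R_X R_B , \qquad (R_A - I)\, t_X = R_X\, t_B - t_A . \]

The first equation is solved for $R_X$ from several motions; the second then becomes linear in $t_X$.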


However, because of the scale ambiguity of the camera motion estimated from images, the above solution is unstable, so the LiDAR data are used to modify the formulation, as shown in the figure below:


The points of the 3-D point cloud are tracked in the image, and their 2D-3D correspondence can be described by the following formula:
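In generic pinhole notation (an illustrative form rather than the paper's exact equation), a lidar point $p_k$ tracked at pixel $x_k = (u_k, v_k)$ satisfies

\[ s_k \,[\,u_k,\; v_k,\; 1\,]^{\mathsf T} = K\,( R\, p_k + t ) , \]

where $(R, t)$ is the camera motion to be estimated and $s_k$ the projective depth.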

The problem to be solved becomes:
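In the same notation this amounts to minimizing the reprojection error:

\[ (R^{*}, t^{*}) = \arg\min_{R,\,t}\; \sum_k \bigl\lVert\, \pi\!\bigl(K (R\, p_k + t)\bigr) - x_k \,\bigr\rVert^2 , \qquad \pi([x, y, z]^{\mathsf T}) = [\,x/z,\; y/z\,]^{\mathsf T} . \]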


The initial solution to the above optimization problem is obtained through the classic P3P.
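As a minimal sketch of this initialization step (not the paper's code), OpenCV's P3P solver can recover a pose from four 2D-3D correspondences; all numeric values below are illustrative:

```python
import numpy as np
import cv2

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
object_pts = np.array([[0.0, 0.0, 5.0],     # 3-D points (illustrative)
                       [1.0, 0.0, 5.0],
                       [0.0, 1.0, 6.0],
                       [1.2, 0.8, 7.0]])

# Synthesize pixel observations from a known pose so the demo is self-contained.
rvec_gt = np.array([0.05, -0.02, 0.01])
tvec_gt = np.array([0.10, -0.05, 0.30])
image_pts, _ = cv2.projectPoints(object_pts, rvec_gt, tvec_gt, K, None)

# SOLVEPNP_P3P expects exactly four point pairs; the result seeds the refinement
# that minimizes the reprojection error above.
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None, flags=cv2.SOLVEPNP_P3P)
print(ok, rvec.ravel(), tvec.ravel())
```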

Once the camera motion parameters are obtained, the rotation and translation can be computed with the two-step hand-eye calibration method, where the translation is estimated as follows:
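In the generic notation used above, stacking the translation constraint over n motions gives a linear least-squares problem once $R_X$ is known (the standard form; the paper's exact expression may differ):

\[ \begin{bmatrix} R_{A_1} - I \\ \vdots \\ R_{A_n} - I \end{bmatrix} t_X = \begin{bmatrix} R_X t_{B_1} - t_{A_1} \\ \vdots \\ R_X t_{B_n} - t_{A_n} \end{bmatrix} . \]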


Note: here the estimation of camera motion and the hand-eye calibration are performed alternately to improve accuracy. The author also analyzed how the camera motion affects the calibration accuracy; see the following figure:


Two conclusions follow: 1) the smaller the actual camera motion a is, the smaller the projection error; 2) the smaller the variation in scene depth is, the smaller the projection error. The first point indicates that the camera motion during calibration should be small; the second indicates that the depth of the surroundings used for calibration should vary little, for example a wall.

It was also found that increasing the rotation angle of the camera motion reduces the error propagated from camera-motion estimation to the hand-eye calibration.

This method cannot be used in outdoor natural environments because the image points of the point cloud projection are difficult to determine.

The following three papers optimize LiDAR-camera calibration differently: instead of estimating the calibration parameters from the matching error between 3-D points and tracked image points, they project the point cloud into the image plane to form a depth (or intensity) map and define a global matching measure between that map and the camera image.

However, these methods require many iterations, and the best practice is to initialize them from a hand-eye calibration result.

In addition, the University of Michigan approach uses lidar reflectance values, and the University of Sydney approach improves on it. Neither is as convenient as Stanford's method, which achieves calibration directly by matching the point cloud with the image.

Stanford paper “Automatic Online Calibration of Cameras and Lasers”.

Stanford's approach corrects calibration "drift" online, as shown in the figure below: an accurate calibration aligns the green points (depth discontinuities in the point cloud) with the red edges of the image (obtained via the inverse distance transform, IDT).


The calibration objective function is defined as follows:
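Based on the variable definitions that follow (a reconstruction in the spirit of the paper, not a verbatim quote), the objective sums, over a window of w frames, the IDT edge value at the pixel (i, j) where each lidar point projects, and a correct calibration maximizes it:

\[ J = \sum_{f=1}^{w} \; \sum_{p \in X_f} X_f^{\,p} \cdot D_f^{\,i,j} . \]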


Here w is the video window size (number of frames), f is the frame index, (i, j) is the pixel position in the image, and p is a 3-D point of the point cloud; X represents the lidar point cloud data and D is the result of applying the IDT to the image.

The following figure is an example of the result of real-time online calibration:


The first row was calibrated well, the second row showed drift, and the third row was recalibrated.

Paper from the University of Michigan, “Automatic Targetless Extrinsic Calibration of a 3D Lidar and Camera by Maximizing Mutual Information”

The calibration task defined here is to solve for the transformation between the two sensors, i.e., the rotation R and translation T, as shown in the figure:


The Mutual Information (MI) objective function is defined in terms of entropies:
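With X the lidar reflectivity and Y the grayscale intensity of the pixel each point projects to, mutual information takes its standard form,

\[ \mathrm{MI}(X, Y) = H(X) + H(Y) - H(X, Y), \qquad H(X) = -\sum_x p(x)\log p(x) , \]

and the best extrinsics are those that maximize MI.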


The problem is solved with a gradient-based method:
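A minimal sketch of the idea (not the paper's implementation): estimate MI from a joint histogram of reflectance and image intensity, and climb it with a finite-difference gradient over the six extrinsic parameters. The Euler-angle parameterization and all data below are illustrative assumptions.

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """MI between lidar reflectance samples a and image intensity samples b."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    ent = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return ent(px) + ent(py) - ent(pxy)

def mi_score(theta, points, reflect, image, K):
    """Project the cloud with extrinsics theta = (rx, ry, rz, tx, ty, tz) and
    score the MI between reflectance and the image intensities that are hit."""
    rx, ry, rz = theta[:3]
    Rx = np.array([[1, 0, 0], [0, np.cos(rx), -np.sin(rx)], [0, np.sin(rx), np.cos(rx)]])
    Ry = np.array([[np.cos(ry), 0, np.sin(ry)], [0, 1, 0], [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0], [np.sin(rz), np.cos(rz), 0], [0, 0, 1]])
    cam = points @ (Rz @ Ry @ Rx).T + theta[3:]
    infront = cam[:, 2] > 0.1
    uvw = cam[infront] @ K.T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)
    h, w = image.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    if ok.sum() < 100:
        return 0.0
    return mutual_information(reflect[infront][ok], image[uv[ok, 1], uv[ok, 0]])

def refine(theta0, data, step=1e-3, lr=1e-4, iters=50):
    """Finite-difference gradient ascent on the MI score."""
    theta = np.array(theta0, dtype=float)
    for _ in range(iters):
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            d = np.zeros_like(theta)
            d[i] = step
            grad[i] = (mi_score(theta + d, *data) - mi_score(theta - d, *data)) / (2 * step)
        theta += lr * grad
    return theta

# Toy usage with random data (real use: a lidar scan plus a synchronized camera frame).
rng = np.random.default_rng(0)
points = rng.uniform([-5, -5, 2], [5, 5, 20], size=(5000, 3))
reflect = rng.uniform(0, 1, size=5000)
image = rng.uniform(0, 1, size=(480, 640))
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
print(refine(np.zeros(6), (points, reflect, image, K)))
```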


The following figure is an example of calibration: RGB pixel and point cloud calibration.


“Automatic Calibration of Lidar and Camera Images using Normalized Mutual Information” from the University of Sydney, Australia.

This article is an improvement on the above method. The sensor configuration is shown in the figure:


The calibration process is shown in the figure below:


A new measure Gradient Orientation Measure (GOM) is defined as follows:


It is essentially a gradient-correlation measure between the camera image and the projected lidar point cloud.
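One plausible form consistent with this description (a reconstruction for illustration, not the paper's exact formula) sums, over the projected points p, the product of the two gradient magnitudes weighted by how well their orientations agree:

\[ \mathrm{GOM} \;\propto\; \sum_{p} \lVert \nabla I_{\mathrm{cam}}(p) \rVert \, \lVert \nabla I_{\mathrm{lidar}}(p) \rVert \, \bigl|\cos\bigl(\phi_{\mathrm{cam}}(p) - \phi_{\mathrm{lidar}}(p)\bigr)\bigr| , \]

where $\phi$ denotes gradient orientation; the measure peaks when edges in the image and in the projected point cloud coincide and point the same way.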

When matching point cloud data and image data, the point cloud needs to be projected onto a cylindrical image, as shown in the figure:


The projection formula is as follows:
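Generically (an illustrative form; the paper's exact parameterization may differ), a lidar point (x, y, z) is mapped to the panoramic image by its azimuth and normalized height:

\[ u = f_u \,\operatorname{atan2}(y, x), \qquad v = f_v \,\frac{z}{\sqrt{x^2 + y^2}} . \]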


Before calculating the gradient of the point cloud, the point cloud needs to be projected onto the sphere. The formula is as follows:
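Again in generic form (not quoted from the paper), the spherical coordinates are the azimuth and elevation of each point:

\[ \theta = \operatorname{atan2}(y, x), \qquad \varphi = \arcsin\!\Bigl(\frac{z}{\sqrt{x^2 + y^2 + z^2}}\Bigr) . \]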


Finally, the gradient calculation method of the point cloud is as follows:
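A minimal sketch of this step, assuming the point-cloud intensities have already been resampled into a spherical image (the array `lidar_img` and its size are placeholders):

```python
import numpy as np

# Placeholder spherical-projection image of lidar intensities (elevation x azimuth).
lidar_img = np.random.rand(64, 360)

# Per-pixel finite differences give the gradient magnitude and orientation used by GOM.
gy, gx = np.gradient(lidar_img)
magnitude = np.hypot(gx, gy)
orientation = np.arctan2(gy, gx)
```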


The calibration task is to find the extrinsic parameters that maximize GOM; this paper uses a Monte Carlo method, similar in spirit to a particle filter.
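A sketch of such a particle-filter-style search (assuming a scoring function `gom_score(theta)` over the six extrinsic parameters and simple Gaussian resampling; this is not the paper's code):

```python
import numpy as np

def monte_carlo_search(gom_score, theta0, n_particles=200, n_iters=30, sigma0=0.05):
    """Sample extrinsic hypotheses, resample around high-scoring ones, shrink the noise."""
    particles = theta0 + np.random.randn(n_particles, len(theta0)) * sigma0
    for it in range(n_iters):
        scores = np.array([gom_score(p) for p in particles])
        weights = scores - scores.min() + 1e-9
        weights /= weights.sum()
        idx = np.random.choice(n_particles, n_particles, p=weights)
        sigma = sigma0 * (0.9 ** it)
        particles = particles[idx] + np.random.randn(n_particles, len(theta0)) * sigma
    return particles[np.argmax([gom_score(p) for p in particles])]

# Toy usage with a dummy score peaked at a known parameter vector (illustration only).
true_theta = np.array([0.01, -0.02, 0.005, 0.10, 0.00, 0.05])
dummy_score = lambda th: -np.sum((th - true_theta) ** 2)
print(monte_carlo_search(dummy_score, np.zeros(6)))
```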

The following figure is an example of the result:


IMU-Camera Calibration

Paper from the German Fraunhofer Institute, “INS-Camera Calibration without Ground Control Points”.

Although this paper targets UAV calibration, the method is also applicable to vehicles.

This is the East, North, Up (ENU) coordinate system defined by the IMU:


In fact, IMU-camera calibration is similar to LiDAR-camera calibration: first solve a hand-eye calibration, then refine the result. However, the IMU provides no image-like feedback, only pose data, so a pose-graph optimization is performed. The following figure is a flow chart; the camera pose is still estimated by SfM.



This is the image calibration plate used:


LiDAR system calibration

Oxford University paper "Automatic self-calibration of a full field-of-view 3D n-laser scanner".

This paper defines the "crispness" of the point cloud as a quality measure and minimizes an entropy function, the Rényi Quadratic Entropy (RQE), as the optimization objective for online LiDAR calibration. (Note: the authors also discuss how to handle the LiDAR clock bias.)

"Crispness" describes how concentrated the point cloud distribution is when it is modeled as a GMM (Gaussian Mixture Model). Based on the definition of information entropy, RQE is chosen as the measure:

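In standard form, with the cloud modeled as a mixture of N identical isotropic Gaussian kernels centered on the points $x_i$:

\[ H_{\mathrm{RQE}}[p] = -\log \int p(x)^2\, dx , \qquad \int p(x)^2\, dx = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \mathcal{N}\bigl(x_i - x_j;\; 0,\; 2\sigma^2 I\bigr) , \]

so minimizing the entropy maximizes the pairwise Gaussian affinities, i.e., makes the merged cloud as concentrated ("crisp") as possible.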

The following figure is a point cloud result collected after calibration:


The calibration algorithm is as follows:
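As a toy sketch of the underlying idea (not the paper's algorithm, which also calibrates the per-laser geometry and timing online), a small extrinsic perturbation between two overlapping scans can be recovered by minimizing the RQE of the merged cloud; all names and values are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def rqe(points, sigma=0.05):
    """Renyi Quadratic Entropy of a GMM with isotropic kernels at the points."""
    d2 = cdist(points, points, "sqeuclidean")
    affinity = np.exp(-d2 / (4.0 * sigma ** 2)).mean()
    norm = (4.0 * np.pi * sigma ** 2) ** (-points.shape[1] / 2.0)
    return -np.log(norm * affinity)

def euler_to_R(rx, ry, rz):
    cx, sx, cy, sy, cz, sz = np.cos(rx), np.sin(rx), np.cos(ry), np.sin(ry), np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def cost(theta, fixed_cloud, moving_cloud):
    # Apply candidate extrinsics (Euler angles + translation) to the second cloud
    # and measure how crisp the merged cloud is.
    R, t = euler_to_R(*theta[:3]), theta[3:]
    return rqe(np.vstack([fixed_cloud, moving_cloud @ R.T + t]))

# Toy usage: two scans of the same surface with a small unknown offset between them.
rng = np.random.default_rng(1)
fixed = rng.uniform(0, 1, size=(300, 3))
true_R, true_t = euler_to_R(0.0, 0.0, -0.03), np.array([0.02, 0.0, 0.01])
moving = (fixed - true_t) @ true_R          # rows are R^T (x - t), so (R, t) realigns them
res = minimize(cost, np.zeros(6), args=(fixed, moving), method="Nelder-Mead")
print(res.x)   # roughly recovers (0, 0, -0.03, 0.02, 0, 0.01)
```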

