This article discusses the importance of road environment perception for mobile navigation and the related technologies behind it.
01 Construction of a 3D Road Geometry Model
First of all, the mechanism behind the 3D geometric model is multi-view geometry: to obtain the 3D geometric structure of a scene, the camera must observe it from at least two different positions. As shown in Figure 1, two cameras at different positions can be used to obtain a 3D geometric model; similarly, a single camera can be moved continuously, with 3D reconstruction performed along the way.
The main principle is to use the left and right camera planes to compute R and T. In SLAM, corresponding points in the two images are matched first; at least eight correspondences are then used with SVD to solve for the essential matrix, which is decomposed to obtain R and T. Once the relative pose of the two cameras is known, the coordinates of the corresponding 3D points can be recovered by triangulation.
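As a concrete illustration, here is a minimal sketch of this pipeline using OpenCV. The original talk gives no code, so synthetic correspondences and hypothetical calibration values stand in for real matched feature points:

```python
import cv2
import numpy as np

# --- Synthetic setup (hypothetical values, only to make the sketch runnable) ---
K = np.array([[718.8, 0.0, 607.2],
              [0.0, 718.8, 185.2],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
pts3d_true = rng.uniform([-5, -5, 8], [5, 5, 20], size=(100, 3))  # points in front of both cameras
R_true, _ = cv2.Rodrigues(np.array([0.0, 0.05, 0.0]))             # small relative rotation
t_true = np.array([[0.5], [0.0], [0.0]])                          # relative translation

def project(P, X):
    x = (P @ np.hstack([X, np.ones((len(X), 1))]).T).T
    return x[:, :2] / x[:, 2:]

pts_left = project(K @ np.hstack([np.eye(3), np.zeros((3, 1))]), pts3d_true)
pts_right = project(K @ np.hstack([R_true, t_true]), pts3d_true)

# --- The pipeline described above ---
# >= 8 correspondences give the linear solution; RANSAC rejects outliers.
E, inliers = cv2.findEssentialMat(pts_left, pts_right, K, method=cv2.RANSAC)
_, R, T, _ = cv2.recoverPose(E, pts_left, pts_right, K)  # decompose E into R and T

# Triangulate the correspondences into 3D points (up to scale).
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([R, T])
pts4d = cv2.triangulatePoints(P0, P1, pts_left.T, pts_right.T)
pts3d = (pts4d[:3] / pts4d[3]).T  # homogeneous -> Euclidean
```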
Figure 1 Depth estimation
For binocular vision, the image planes need to be rectified first, because with the method above, matching feature points is a two-dimensional search problem with a relatively large amount of computation.
Therefore, the binocular images are transformed onto the red plane in Figure 2. Once the epipoles are pushed to infinity, matching corresponding points becomes a one-dimensional search problem: for a point selected in the left image, the corresponding point in the right image only needs to be searched for along the same row.
The advantage of using binocular cameras for depth estimation is that a fixed baseline is obtained through camera calibration. The one-dimensional search then saves a great deal of computation and yields a dense disparity map, which in turn corresponds to a dense depth map and finally to dense 3D points.
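The relation behind this is the rectified-stereo formula Z = f·B/d. A minimal sketch, assuming the focal length f (in pixels) and baseline B (in meters) come from calibration; the calibration numbers below are hypothetical:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Rectified-stereo relation Z = f * B / d: a dense disparity map
    (in pixels) converts directly into a dense depth map (in meters)."""
    depth = focal_px * baseline_m / np.maximum(disparity, eps)
    depth[disparity <= 0] = 0.0  # disparity <= 0 means no match; mark invalid
    return depth

# Hypothetical calibration values, for illustration only.
disp = np.random.uniform(1.0, 64.0, size=(480, 640)).astype(np.float32)
depth = disparity_to_depth(disp, focal_px=718.8, baseline_m=0.54)
```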
Figure 2 Stereo matching
With the development of the field, many deep learning networks can now produce disparity maps, but most of them are data-driven, and a big problem with data-driven methods is that sometimes the ground truth is simply not available.
Of course, a Lidar can now be synchronized with the cameras, the Lidar point cloud projected onto the binocular images, and disparity then inferred from depth. Although this solution provides ground truth, its accuracy is limited by the accuracy of the camera-Lidar calibration.
Based on this, we explored many self-supervised methods and designed the PVStereo structure, as shown in Figure 3.
Figure 3 PVStereo structure
It can be seen that a traditional matching method is used to match images at different pyramid levels. A disparity estimate for a corresponding image point is assumed reliable when it is consistent no matter which pyramid level it comes from, which is consistent with the assumptions of deep learning. Traditional pyramid voting then yields a relatively accurate but sparse disparity map.
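One way to picture pyramid voting is the simplified sketch below (an assumed reading of the idea, not the exact PVStereo voting rule): run a traditional matcher such as SGBM at several pyramid scales and keep only the pixels whose disparities agree across all levels.

```python
import cv2
import numpy as np

def pyramid_vote(left, right, levels=3, max_disp=64, tol=1.0):
    """Simplified pyramid voting: compute disparity with a traditional
    matcher at several scales, rescale to full resolution, and keep only
    pixels where all levels agree -- a sparse but reliable disparity map
    usable as pseudo ground truth.  Expects 8-bit grayscale inputs."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=max_disp,
                                    blockSize=5)
    h, w = left.shape[:2]
    votes = []
    for lvl in range(levels):
        s = 2 ** lvl
        l = cv2.resize(left, (w // s, h // s))
        r = cv2.resize(right, (w // s, h // s))
        d = matcher.compute(l, r).astype(np.float32) / 16.0  # SGBM is fixed-point
        votes.append(cv2.resize(d, (w, h)) * s)              # back to full-res units
    votes = np.stack(votes)
    agree = (np.ptp(votes, axis=0) < tol) & (votes[0] > 0)
    return np.where(agree, votes[0], -1.0)                   # -1 marks invalid pixels
```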
Inspired by the KITTI dataset, I wanted to use some sparse true values to train a better network, so traditional methods were used to estimate pseudo ground-truth disparities, avoiding the need for real ground truth when training the network.
The recurrence-based approach proposes the OptStereo network, as shown in Figure 4. First, a multi-scale cost volume is constructed; then a recurrent unit iteratively updates the high-resolution disparity estimate. This not only avoids the error-accumulation problem of the coarse-to-fine paradigm, but, thanks to its simplicity and efficiency, also achieves a good trade-off between accuracy and speed.
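The sketch below illustrates the recurrent-refinement idea with a small ConvGRU (hypothetical layer sizes, not the authors' OptStereo code): at each iteration the cell looks at cost-volume features plus the current disparity and predicts a residual update, keeping the estimate at high resolution throughout.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal convolutional GRU cell."""
    def __init__(self, hidden, inp):
        super().__init__()
        self.convz = nn.Conv2d(hidden + inp, hidden, 3, padding=1)
        self.convr = nn.Conv2d(hidden + inp, hidden, 3, padding=1)
        self.convq = nn.Conv2d(hidden + inp, hidden, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))
        r = torch.sigmoid(self.convr(hx))
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q

class RecurrentDisparityRefiner(nn.Module):
    """Sketch of recurrent disparity refinement: a ConvGRU repeatedly
    predicts a residual update to a high-resolution disparity map."""
    def __init__(self, hidden=64, feat=64):
        super().__init__()
        self.cell = ConvGRUCell(hidden, feat + 1)
        self.head = nn.Conv2d(hidden, 1, 3, padding=1)

    def forward(self, disparity, cost_feat, h, iters=8):
        for _ in range(iters):
            h = self.cell(h, torch.cat([cost_feat, disparity], dim=1))
            disparity = disparity + self.head(h)  # residual update
        return disparity

# Usage with hypothetical shapes: batch 1, 64 feature channels, 120x160.
refiner = RecurrentDisparityRefiner()
d = refiner(torch.zeros(1, 1, 120, 160), torch.randn(1, 64, 120, 160),
            torch.zeros(1, 64, 120, 160))
```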
The experimental results are relatively robust, but outliers may appear in some scenarios.
Figure 4 Disparity map generation
Since ground truth is difficult to obtain, one approach is to use traditional methods to estimate pseudo ground truth and then train the network; another is to train in an unsupervised manner. Based on the previous work, CoT-Stereo was proposed, as shown in Figure 5.
It uses two networks, A and B, that act like two students: their structures are exactly the same, but their initializations differ, so each starts out having mastered different knowledge. Network A then shares the knowledge it believes is correct with network B, and B does the same for A, so the two networks continuously learn from each other and evolve together.
Figure 5 CoT-Stereo architecture
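The sketch below shows one mutual-teaching step in this spirit (an assumed simplification, not the exact CoT-Stereo recipe). It assumes each network returns a disparity map plus a per-pixel confidence in [0, 1]; each network then supervises the other only on the pixels it is confident about.

```python
import torch

def cot_step(net_a, net_b, left, right, opt_a, opt_b, conf_thresh=0.9):
    """One mutual-teaching step: A teaches B with its confident pixels,
    and B teaches A with its own, so both networks co-evolve."""
    disp_a, conf_a = net_a(left, right)   # assumed interface: (disparity, confidence)
    disp_b, conf_b = net_b(left, right)

    # Each network shares only the predictions it is confident about.
    mask_a = (conf_a > conf_thresh).float().detach()
    mask_b = (conf_b > conf_thresh).float().detach()

    # B learns from A's confident predictions, and A from B's.
    loss_b = (mask_a * (disp_b - disp_a.detach()).abs()).mean()
    loss_a = (mask_b * (disp_a - disp_b.detach()).abs()).mean()

    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
    return loss_a.item(), loss_b.item()
```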
The results of unsupervised binocular estimation are also compared with many methods. Although they cannot match fully supervised methods trained on ground truth, the network strikes a good balance between inference time and accuracy, as shown in Figure 6.
Figure 6 Experimental results
How can depth or disparity be converted into normal information? In perception tasks it turns out that depth is sometimes not very useful information, and training on RGB-D data brings other problems. Surface normal information, by contrast, is almost the same whether a surface is near or far, and it provides useful additional assistance for many tasks.
The research found that there was little or almost no work on quickly converting depth maps or disparity maps into surface normal information, so this type of work is studied here. The original intention is to translate depth into normals while consuming almost no computing resources. The general framework is shown in Figure 7.
Figure 7 Three-Filters-to-Normal framework
It can be seen that this starts from the most basic perspective-projection process: a 3D point is mapped to image coordinates through the camera intrinsics. If the local surface is also known to satisfy a plane equation n_x·X + n_y·Y + n_z·Z + d = 0, then, surprisingly, combining the two equations yields a linear expression for 1/Z: 1/Z = -(1/d)·(n_x(u - u0)/f_x + n_y(v - v0)/f_y + n_z).
After a series of derivations, the partial derivative of 1/Z in the u direction turns out to be very easy to handle in image processing: 1/Z is proportional to disparity, so taking the partial derivatives of 1/Z amounts to convolving the disparity map with gradient filters. The normal estimation therefore does not need to convert the depth map into a 3D point cloud, run KNN, and fit local planes as traditional methods do, which is a very complicated process. Instead, the normals can be obtained easily from a known depth map or disparity map.
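The sketch below is a simplified reading of this filter-based idea (not the paper's reference implementation): n_x and n_y come directly from image-space gradients of 1/Z, and n_z follows from the plane constraint, with no point cloud, KNN, or plane fitting.

```python
import numpy as np
from scipy import ndimage

def normals_from_depth(depth, fx, fy, u0, v0):
    """Estimate up-to-scale surface normals by filtering the inverse-depth
    map: image gradients of 1/Z give n_x, n_y; the plane equation gives n_z."""
    inv_z = 1.0 / np.maximum(depth, 1e-6)

    # Gradient filters on 1/Z (the paper compares several filter choices).
    # Sobel/8 approximates a per-pixel central difference.
    gu = ndimage.sobel(inv_z, axis=1) / 8.0   # d(1/Z)/du
    gv = ndimage.sobel(inv_z, axis=0) / 8.0   # d(1/Z)/dv

    # Up-to-scale normal components (scale fixed by setting -d = 1).
    nx = fx * gu
    ny = fy * gv

    h, w = depth.shape
    u = np.arange(w)[None, :] - u0
    v = np.arange(h)[:, None] - v0
    nz = inv_z - nx * u / fx - ny * v / fy    # from the plane equation

    n = np.stack([nx, ny, nz], axis=-1)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)
```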
We conducted a series of experiments with this method, and the results are shown in Figure 8. Compared with the mainstream methods at the time, the method in this paper strikes a very good balance between speed and accuracy: although its accuracy is slightly worse than the best, it surpasses most methods, and its speed reaches 260 Hz on a single CPU core and 21 kHz with CUDA, at an image resolution of 640×480.
Figure 8 Experimental results
After obtaining the above information, scene parsing is required. The current mainstream methods are semantic segmentation and object/instance segmentation. For scene understanding, and semantic segmentation in particular, traditional methods are based on RGB information alone.
The main focus here is RGB-X, that is, how to extract features from RGB plus depth or normals. The main application is freespace detection, i.e., detecting the drivable area seen while driving. The framework shown in Figure 9 is proposed.
Figure 9 Network structure
Here, a dual-path structure is used to extract features separately: one path extracts features from the RGB information, and the other from depth or normals (depth is first converted to normals). The features from the two modalities are then fused into a better representation that contains both the texture characteristics of the RGB image and the geometric characteristics of the depth image. Finally, a better semantic segmentation result is obtained through the connections.
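The sketch below shows the dual-path idea in PyTorch (hypothetical layer sizes, not the paper's exact architecture): two encoders, one per modality, whose features are concatenated before a shared decoder.

```python
import torch
import torch.nn as nn

class DualPathSeg(nn.Module):
    """Two-branch segmentation sketch: an RGB texture path and a
    surface-normal geometry path, fused before decoding."""
    def __init__(self, num_classes=2):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.rgb_enc = encoder()   # texture path
        self.geo_enc = encoder()   # normal-map path
        self.decoder = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, num_classes, 1))

    def forward(self, rgb, normals):
        fused = torch.cat([self.rgb_enc(rgb), self.geo_enc(normals)], dim=1)
        return self.decoder(fused)   # per-pixel class logits

# Usage with hypothetical input sizes.
net = DualPathSeg()
logits = net(torch.randn(1, 3, 480, 640), torch.randn(1, 3, 480, 640))
```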
Some improvements were made to the above version, as shown in Figure 10. Since the fusion structure of the network is relatively complex, there was room for further improvement, so the following work was done. First, deep supervision was used to add constraints to different channels during learning, which alleviates the gradient problem. Second, since the previous network converged too quickly, a new SNE+ module was designed, which performs better than SNE.
Figure 10 Improved network structure
The previous work was all based on feature-level fusion; here some data-level fusion is also studied: how can performance be improved using multiple perspectives and a single ground truth? The network structure shown in Figure 11 is proposed.
It is mainly based on planar homography: a homography matrix can be estimated from four pairs of corresponding points, and once the homography between the left and right images is known, one image can be warped into the perspective of the other. As shown, given a reference image and a target image, the homography matrix is estimated from the corresponding points, and the target image can then be warped directly into the generated image.
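A minimal sketch of this estimate-then-warp step with OpenCV, using hypothetical point correspondences and a synthetic target image in place of real data:

```python
import cv2
import numpy as np

# Hypothetical correspondences between the target and reference views of
# the (planar) road region; four pairs determine a homography.
pts_tgt = np.float32([[15, 20], [190, 12], [200, 170], [20, 185]])
pts_ref = np.float32([[10, 10], [200, 15], [205, 180], [12, 190]])

# Estimate the plane-induced homography (RANSAC copes with noisy matches
# when more than four pairs are available).
H, mask = cv2.findHomography(pts_tgt, pts_ref, cv2.RANSAC)

# Warp the target image into the reference viewpoint to obtain the
# "generated image" described above (synthetic image used here).
target = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
generated = cv2.warpPerspective(target, H, (target.shape[1], target.shape[0]))
```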