Therefore, because embedded computers have limited computing power, the ORB method is considered more suitable for autonomous vehicle applications. Other image feature descriptors used for VO include, but are not limited to, DAISY (Tola et al., 2010), ASIFT (Morel and Yu, 2009), MROGH (Fan et al., 2011a), HARRIS (Wang et al., 2008), LDAHash (Fan et al., 2011b), D-BRIEF (Trzcinski and Lepetit, 2012), VLFeat (Vedaldi and Fulkerson, 2010), FREAK (Alahi et al., 2012), Shape Context (Belongie et al., 2002), and PCA-SIFT (Ke and Sukthankar, 2004).
2.3 Backend
The backend receives the camera pose estimated by the frontend and optimizes this initial pose to obtain a globally consistent motion trajectory and environment map (Sunderhauf and Protzel, 2012). Compared with the diverse frontend algorithms, current backend algorithms fall mainly into two categories: filter-based methods (such as the extended Kalman filter (EKF); Bailey et al., 2006) and optimization-based methods (such as factor graphs; Wrobel, 2001). They are described as follows:
Filter-based methods mainly use the Bayesian principle to estimate the current state from the previous state and the current observation data (Liu, 2019). Typical filter-based methods include the extended Kalman filter (EKF) (Bailey et al., 2006), the unscented Kalman filter (UKF) (Wan and Merwe, 2000), and the particle filter (PF) (Arnaud et al., 2000). Taking the typical EKF-based SLAM method as an example, it is relatively successful in small-scale environments. However, because the covariance matrix must be stored, its storage requirement grows with the square of the number of state variables, which limits its application in large unknown scenes.
Optimization-based methods: the core idea of nonlinear optimization (graph optimization) is to express the back-end optimization problem as a graph, in which the poses of the subject and the environmental features at different times are vertices, and the constraints between vertices are edges (Liang et al., 2013). After the graph is constructed, an optimization algorithm solves for the poses so that the states at the vertices better satisfy the constraints on the corresponding edges. After optimization, the resulting graph gives the motion trajectory and the environment map. At present, most mainstream visual SLAM systems use nonlinear optimization methods.
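The graph formulation can be illustrated with a deliberately small sketch. It assumes a one-dimensional world in which each pose is a single number and each edge is a measured offset between two poses; under that assumption the problem is linear, so one least-squares solve of the normal equations is enough, whereas real systems iterate over 2D or 3D poses. The function names and the measurement values are hypothetical and chosen only for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def optimize_pose_graph(num_poses, edges):
    """Least-squares solution of a 1D pose graph.

    edges holds (i, j, z) tuples: an edge constraining x[j] - x[i] to be
    close to the measurement z. Pose 0 is held fixed at 0 so that the
    solution is unique.
    """
    n = num_poses - 1  # unknowns are x[1] .. x[num_poses - 1]
    h = [[0.0] * n for _ in range(n)]  # H = J^T J
    g = [0.0] * n                      # g = J^T z
    for i, j, z in edges:
        # Residual x[j] - x[i] - z has Jacobian -1 w.r.t. x[i], +1 w.r.t. x[j].
        terms = [(i, -1.0), (j, 1.0)]
        for va, sa in terms:
            if va == 0:
                continue
            g[va - 1] += sa * z
            for vb, sb in terms:
                if vb == 0:
                    continue
                h[va - 1][vb - 1] += sa * sb
    return [0.0] + solve_linear(h, g)


# Three odometry edges that each overestimate a 1 m step, plus one
# loop-closure edge stating that pose 3 lies 3.0 m from pose 0.
edges = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1), (3, 0, -3.0)]
dead_reckoning = [0.0, 1.1, 2.2, 3.3]
optimized = optimize_pose_graph(4, edges)
```

In this sketch the 0.3 m disagreement between odometry and the loop-closure edge is spread evenly over the four edges, so the final pose moves from 3.3 m to about 3.075 m.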
2.4 Loop closure
The task of loop closure is to allow the system to recognize the current scene from sensor information and determine that the area has already been visited when the vehicle returns to an earlier position, thereby eliminating the accumulated error of the SLAM system (Newman and Ho, 2005). For visual SLAM, traditional loop closure detection methods mainly use the bag-of-words (BoW) model (Galvez-Lopez and Tardos, 2012), which is implemented as follows:
i) Construct a vocabulary containing K words by K-means clustering of local features extracted from the images.
ii) Represent each image as a K-dimensional numerical vector based on the number of occurrences of each word.
iii) Compare the vectors to measure the difference between scenes and decide whether the autonomous vehicle has returned to a previously visited scene.
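The three BoW steps can be sketched as follows. This is a minimal illustration, not the implementation used by any particular SLAM system: the descriptors are hypothetical 2D points rather than real image features, the K-means initialization is deliberately naive, and the use of cosine similarity with a fixed threshold in step iii is an assumption, since the text does not specify the comparison measure.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def kmeans(points, k, iterations=10):
    """Step i: cluster descriptors into k visual words (plain Lloyd iterations)."""
    centers = [list(p) for p in points[:k]]  # naive init, for illustration only
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for idx, group in enumerate(groups):
            if group:
                centers[idx] = [sum(c) / len(group) for c in zip(*group)]
    return centers


def bow_vector(descriptors, vocabulary):
    """Step ii: K-dimensional histogram of word occurrences."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(d, vocabulary)] += 1
    return hist


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def is_loop(current, previous, vocabulary, threshold=0.9):
    """Step iii: declare a revisit when the two BoW vectors are similar enough."""
    score = cosine_similarity(bow_vector(current, vocabulary),
                              bow_vector(previous, vocabulary))
    return score >= threshold


# Hypothetical training descriptors drawn from four well-separated groups.
training = [(0, 0), (10, 10), (0, 10), (10, 0),
            (0.5, 0.2), (9.8, 10.1), (0.3, 9.9), (10.2, 0.1)]
vocabulary = kmeans(training, k=4)

image_a = [(0.1, 0.0), (0.2, 0.3), (9.9, 10.0)]   # first visit
image_b = [(0.0, 0.2), (0.3, 0.1), (10.1, 9.8)]   # same place, seen again
image_c = [(0.1, 9.9), (9.8, 0.2), (10.0, 0.0)]   # a different place

revisit = is_loop(image_b, image_a, vocabulary)
different = is_loop(image_c, image_a, vocabulary)
```

Here `image_a` and `image_b` map to the same word histogram and are flagged as a loop, while `image_c` shares no words with `image_a` and is not.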
2.5 Mapping
A fundamental component of autonomous vehicles is the ability to build a map of the environment and localize on the map. Mapping is one of the two tasks of a visual SLAM system (i.e., localization and mapping), and it plays an important role in navigation, obstacle avoidance, and environment reconstruction for autonomous driving. In general, map representations can be divided into two categories: metric maps and topological maps. Metric maps describe the relative positional relationships between map elements, while topological maps emphasize the connectivity between map elements. For classic SLAM systems, metric maps can be further divided into sparse maps and dense maps. Sparse maps contain only a small amount of information in the scene, which is suitable for localization, while dense maps contain more information, which is beneficial for vehicles to perform navigation tasks based on the map.
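The difference between the two map categories can be shown with a small sketch. The landmark names, coordinates and connections below are hypothetical; the point is only that a metric map answers questions about position and distance, while a topological map answers questions about connectivity.

```python
import math
from collections import deque

# Metric map: each element has a position (x, y) in metres.
metric_map = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}

# Topological map: each place lists the places directly reachable from it.
topological_map = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}


def metric_distance(metric, a, b):
    """Euclidean distance between two map elements."""
    return math.dist(metric[a], metric[b])


def is_connected(graph, start, goal):
    """Breadth-first search: can goal be reached from start?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


distance_ab = metric_distance(metric_map, "A", "B")
reach_c = is_connected(topological_map, "A", "C")
reach_d = is_connected(topological_map, "A", "D")
```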
03 SOTA Research
3.1 Visual SLAM
Similar to the VO subsystem described above, pure visual SLAM systems can be divided into two categories according to how they use image information: feature-based methods and direct methods. Feature-based methods estimate camera motion between adjacent frames and build environment maps by extracting and matching feature points. The disadvantage of this approach is that extracting feature points and computing descriptors takes a long time. Therefore, some researchers proposed abandoning the computation of key points and descriptors, which gave rise to direct methods (Zou et al., 2020).
In addition, according to the type of sensor, visual SLAM can be divided into monocular, binocular, RGB-D and event camera-based methods. According to the density of the map, it can be divided into sparse, dense and semi-dense SLAM, which are introduced as follows:
3.1.1 Feature-based methods
In 2007, Davison et al. (2007) proposed the first real-time monocular visual SLAM system, Mono-SLAM. The result of real-time feature patch direction estimation is shown in Figure 3 (a). The EKF algorithm is used in the back end to track the sparse feature points obtained from the front end, with the camera pose and landmark point direction as state quantities whose mean and covariance are updated. In the same year, Klein and Murray (2007) proposed PTAM, a parallel tracking and mapping system that runs tracking and mapping in parallel. The process of feature extraction and mapping is shown in Figure 3 (b). For the first time, the front end and the back end are separated, with nonlinear optimization used in the back end, and a keyframe mechanism is introduced. Keyframes are connected in series to optimize motion trajectories and feature orientation. Many subsequent visual SLAM system designs have adopted similar approaches.
In 2015, Mur-Artal et al. (2015) proposed ORB-SLAM, a relatively complete keyframe-based monocular SLAM method. Compared with the dual-thread mechanism of PTAM, this method divides the entire system into three threads: tracking, mapping, and loop closure. It should be noted that the processes of feature extraction and matching (left column), map construction, and loop detection are all based on ORB features (right column). Figure 3 (c) shows the real-time feature extraction process (left column) and the trajectory tracking and mapping results (right column) of a monocular camera in a university road environment.
In 2017, Mur-Artal et al. proposed ORB-SLAM2, a follow-up version of ORB-SLAM (Mur-Artal and Tardos, 2017). This version supports loop detection and relocalization, can reuse maps in real time, and the improved framework also provides interfaces for stereo cameras and RGB-D cameras. The left column of Figure 3 (d) shows the stereo trajectory estimation and feature extraction of ORB-SLAM2. The right column of Figure 3 (d) shows the keyframes and the dense point cloud mapping results of the RGB-D camera in indoor scenes. The continuous green squares in the picture form the keyframe trajectory, and the dense 3D scene map constructed by the RGB-D camera surrounds the keyframes.
3.1.2 Direct-based methods
In 2011, Newcombe et al. (2011b) proposed DTAM, a monocular SLAM framework based on the direct method. Unlike feature-based methods, DTAM adopts an inverse depth-based method to estimate the depth of features. The pose of the camera is calculated by direct image matching, and a dense map is constructed by an optimization-based method (Figure 4 (a)). In 2014, Jakob et al. (2014) proposed LSD-SLAM (Figure 4 (b)), which is a successful application of direct methods in the monocular visual SLAM framework. This method applies a pixel-oriented method to a semi-dense monocular SLAM system. Compared with feature-based methods, LSD-SLAM has lower sensitivity, but the system is fragile when the camera intrinsics or the illumination change. In 2017, Forster et al. (2017) proposed SVO (Semi-Direct Visual Odometry). It uses a sparse direct method (also called a semi-direct method) to track key points (bottom of Figure 4 (c)) and estimates the pose from the information around the key points. The top of Figure 4 (c) shows the trajectory of the sparse map in an indoor environment. Because the semi-direct method tracks sparse features and neither computes descriptors nor processes dense information, SVO has lower time complexity and stronger real-time performance.
In 2016, Engel et al. (2018) proposed DSO, which also uses a semi-direct method to achieve higher accuracy at faster operating speeds. However, SVO and DSO are only visual odometry systems: because they lack back-end optimization and loop closure modules, their tracking error accumulates over time. Figure 4 (d) shows the 3D reconstruction and tracking results of DSO (monocular visual odometry). The direct method has the advantages of fast computation and insensitivity to weak feature conditions. However, it relies on the strong assumption that the grayscale value of a point stays constant between frames, so it is very sensitive to changes in lighting. In contrast, the feature point method has good invariance to such changes.
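The sensitivity of the grayscale-constancy assumption can be illustrated with a toy comparison. The sketch below assumes the simplest possible lighting change, a uniform brightness offset applied to a small patch, and compares a photometric residual of the kind minimized by direct methods with a BRIEF-like binary descriptor built from pairwise intensity comparisons. The patch values and the sampling pairs are hypothetical.

```python
def photometric_error(patch_a, patch_b):
    """Sum of squared intensity differences between two patches."""
    return sum((a - b) ** 2 for a, b in zip(patch_a, patch_b))


def binary_descriptor(patch, pairs):
    """BRIEF-like descriptor: one bit per intensity comparison of two pixels."""
    return [1 if patch[i] < patch[j] else 0 for i, j in pairs]


def hamming(d1, d2):
    """Number of differing bits between two binary descriptors."""
    return sum(x != y for x, y in zip(d1, d2))


# Flattened 3x3 grayscale patch and the same patch under brighter lighting.
patch = [10, 50, 30, 80, 20, 60, 90, 40, 70]
brighter = [p + 25 for p in patch]

# Hypothetical pixel index pairs used for the intensity comparisons.
pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 0), (1, 4)]

error_same = photometric_error(patch, patch)
error_brighter = photometric_error(patch, brighter)
descriptor_distance = hamming(binary_descriptor(patch, pairs),
                              binary_descriptor(brighter, pairs))
```

The photometric error rises from 0 to 5625 although the scene content is identical, while the binary descriptor does not change because a uniform offset preserves the ordering of pixel intensities. Real illumination changes are rarely uniform, so this only indicates the direction of the effect.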
In 2020, Zubizarreta et al. (2020) proposed DSM, a direct sparse mapping method that is a full monocular visual SLAM system based on the photometric bundle adjustment (PBA) algorithm. Table 1 summarizes the main features of the state-of-the-art visual SLAM frameworks and their advantages and disadvantages. In addition to the typical frameworks above, other related works have been studied, covering (i) sparse visual SLAM; (ii) semi-dense visual SLAM; and (iii) dense visual SLAM. There are many achievements in the field of visual SLAM, and this paper reviews only the popular methods. Although visual SLAM provides good localization and mapping results, every solution has advantages and disadvantages. In this work, the advantages and disadvantages of "sparse-based methods", "dense-based methods" and "feature-based methods" are summarized. Those of "direct-based methods", "monocular methods", "stereoscopic methods", "RGB-D methods" and "event camera methods" can be found in Table 2.