BEV perception is an important trend in autonomous driving. Conventional autonomous driving algorithms perform detection, segmentation, and tracking in the front view or perspective view, whereas the surrounding scene can be represented in BEV, which is more intuitive and better suited for the subsequent modules. This study proposes a new BEV learning mechanism, the Geometry-guided Kernel Transformer (GKT), which uses geometric priors to guide the transformer to focus on discriminative regions and unfolds kernel features to generate the BEV representation.
Background
For multi-view camera systems, the difficulty lies in converting 2D image representations into a BEV representation. Depending on whether geometric information is used explicitly for the feature transformation, existing methods fall into two categories: geometry-based point-wise transformation and geometry-free global transformation.
The geometry-based point-wise transformation is shown in the figure below. Using the geometric correspondences, 2D features are projected into 3D space to form the BEV representation.
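As a concrete illustration of the geometric correspondence such methods rely on, the following is a minimal Python/NumPy sketch (not the paper's code) of the pinhole projection between a 3D point in the ego frame and an image pixel; the intrinsic matrix K and the extrinsics (R, t) shown are placeholder values.

import numpy as np

def project_to_image(p_ego, K, R, t):
    # Transform the 3D point from the ego frame into the camera frame,
    # then apply the perspective projection with the intrinsic matrix K.
    p_cam = R @ p_ego + t
    x, y, depth = K @ p_cam
    return np.array([x / depth, y / depth]), depth   # pixel (u, v) and depth

# Placeholder calibration values for illustration only.
K = np.array([[1266.0, 0.0, 816.0],
              [0.0, 1266.0, 491.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
pixel, depth = project_to_image(np.array([2.0, 0.5, 10.0]), K, R, t)
# The correspondence is only valid when depth > 0 and the pixel lies inside the
# image, which is why errors in K, R, t propagate directly into the BEV features.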
However, this approach depends heavily on the calibration parameters. The environment around a moving car is complex, and the cameras may deviate from their calibrated positions during operation, which introduces errors into the system. In addition, the point-wise projection is computationally expensive and time-consuming, making it difficult to apply in real deployments.
The global transformation method considers the full correlation between the 2D images and the BEV view, so it is less affected by camera deviation. But this brings its own problems: the computation required for the global transformation grows with the number of image pixels, and the model must globally mine discriminative information from all views, which makes convergence more difficult.
In this study, the researchers propose a new 2D-to-BEV transformation mechanism, the Geometry-guided Kernel Transformer (GKT), to achieve higher efficiency and robustness. The method uses coarse camera parameters to project BEV positions onto the multi-view, multi-scale feature maps to obtain rough 2D prior positions. The model then unfolds Kh×Kw kernel features around these prior positions and lets the BEV queries interact with the corresponding unfolded features to generate the BEV representation.
Implementation
GKT uses geometric priors to guide the transformer to focus on the key regions and unfolds kernel features to generate the BEV representation. The framework of the proposed GKT is shown in the figure below. The upper part of the figure shows how the geometric information guides the transformer to focus on the prior regions in the multi-view images. The lower part shows the kernel features unfolded from the prior regions, which interact with the BEV queries to generate the BEV representation. In this method, a shared convolutional neural network extracts features from the surrounding views, and the BEV space is evenly divided into grids, each grid corresponding to a 3D coordinate.
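To make the mechanism concrete, the following is a minimal sketch of the geometry-guided kernel interaction for a single camera and a single feature scale, using simple dot-product attention; the gkt_attend function, tensor names, and shapes are illustrative and do not reproduce the authors' implementation.

import torch
import torch.nn.functional as F

def gkt_attend(bev_queries, feat, pixel_idx, kh=7, kw=1):
    # bev_queries: (N, C)    one query per BEV grid cell
    # feat:        (C, H, W) feature map of one camera from the shared CNN
    # pixel_idx:   (N, 2)    integer (row, col) of each projected BEV cell centre
    C, H, W = feat.shape
    # Unfold a Kh x Kw kernel of features around every pixel position.
    patches = F.unfold(feat.unsqueeze(0), kernel_size=(kh, kw),
                       padding=(kh // 2, kw // 2))             # (1, C*kh*kw, H*W)
    patches = patches.view(C, kh * kw, H, W)
    rows, cols = pixel_idx[:, 0], pixel_idx[:, 1]
    kernel_feats = patches[:, :, rows, cols].permute(2, 1, 0)   # (N, kh*kw, C)
    # Each BEV query attends only to its own small kernel region.
    attn = torch.softmax(
        torch.einsum('nc,nkc->nk', bev_queries, kernel_feats) / C ** 0.5, dim=-1)
    return torch.einsum('nk,nkc->nc', attn, kernel_feats)       # updated BEV features

In the full method the same interaction is applied over all camera views and feature scales, and BEV cells that do not project into a given view are simply not attended to in that view.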
To speed up inference, the study also explores different indexing methods that remove the runtime dependence on the camera calibration parameters. Comparing three methods, im2col, grid sampling, and LUT indexing, shows that LUT indexing gives GKT the highest FPS. In this scheme, the kernel region of each BEV grid is fixed and can be precomputed offline. At runtime, the system reads the corresponding pixel indices of each BEV query from the LUT and gathers the features quickly through indexing.
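The following is a minimal sketch of the LUT idea under the same single-camera assumptions as above; the build_lut helper and its shapes are illustrative.

import torch

def build_lut(pixel_idx, H, W, kh=7, kw=1):
    # Because the kernel region of every BEV cell is fixed by the (coarse)
    # calibration, the flat pixel indices can be precomputed once, offline.
    dr, dc = torch.meshgrid(torch.arange(kh) - kh // 2,
                            torch.arange(kw) - kw // 2, indexing='ij')
    rows = (pixel_idx[:, 0:1] + dr.reshape(1, -1)).clamp(0, H - 1)   # (N, kh*kw)
    cols = (pixel_idx[:, 1:2] + dc.reshape(1, -1)).clamp(0, W - 1)
    return rows * W + cols                                           # flat pixel indices

# At runtime, gathering the kernel features becomes a single indexing operation,
# with no 3D-to-2D projection left in the forward pass:
# kernel_feats = feat.reshape(C, H * W)[:, lut]      # (C, N, kh*kw)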
Testing
The algorithm is trained on the nuScenes dataset and evaluated on the nuScenes validation split. The figure below compares the vehicle map-view segmentation performance of GKT and other BEV-based methods under two settings. The test uses a 1×7 convolution to capture horizontal image information and then applies an efficient 7×1 kernel in GKT for the 2D-to-BEV conversion.
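A small sketch of this kernel decomposition is shown below; the channel count and feature-map size are placeholder values.

import torch
import torch.nn as nn

# A cheap 1x7 convolution mixes information along each image row, so the
# transformer only needs to unfold a 7x1 (vertical) kernel per projected
# position while still covering an effective 7x7 area.
horizontal_mix = nn.Conv2d(256, 256, kernel_size=(1, 7), padding=(0, 3))

feat = torch.randn(1, 256, 28, 60)       # one camera's feature map
feat = horizontal_mix(feat)              # each pixel now summarizes a 1x7 row segment
# A GKT-style interaction with kh=7, kw=1 (as in the sketch above) then reads
# a 7x1 column of these row summaries around every projected BEV position.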
In the tests, GKT shows strong robustness. Compared with point-wise transformation, GKT uses the camera parameters only as guidance rather than relying on them completely. When a camera deviates, the kernel region shifts accordingly but can still cover the target. Because the attention weights over the kernel region are generated dynamically according to the offset, GKT keeps focusing on the target and thus reduces the impact of camera deviation.
At the same time, GKT is highly efficient. With the proposed LUT indexing, the 2D-to-3D mapping computation required by point-wise transformation is eliminated at runtime, making the forward pass compact and fast. Compared with the global transformation, GKT attends only to the geometry-guided kernel regions and avoids global interactions, so it requires much less computation and converges faster.
Therefore, GKT combines the advantages of point-wise transformation and global transformation to achieve efficient and robust 2D-to-BEV learning. Tests on the nuScenes dataset also show that GKT is very fast, running at 72.3 FPS on a 3090 GPU and 45.6 FPS on a 2080Ti GPU, faster than the other methods compared.