Research on autonomous driving fusion algorithms: BEV drives algorithm revolution, and AI large models drive algorithm iterations-EEWORLD

The core of the autonomous driving algorithm technology framework is divided into three parts: environment perception, decision planning, and control execution.

Environment perception: converts sensor data into a machine-readable description of the scene around the vehicle. This can include object detection, recognition and tracking, environment modeling, and motion estimation.

Decision planning: takes the output of the perception algorithms and produces the final driving instructions, including behavioral decisions (following, stopping, overtaking), action decisions (steering, speed, etc.), and path planning.

Control execution: takes the output of the decision-making layer and drives the underlying modules, issuing commands to core control components such as the throttle and brake so that the vehicle follows the planned route.
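
The three stages above can be sketched as a small pipeline. This is only an illustration of how the stages hand data to one another, not a real driving stack: the class names, the gap thresholds, and the proportional gain are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Obstacle:
    distance_m: float  # gap to the object ahead, in meters
    speed_mps: float   # object's speed along our lane, in m/s


@dataclass
class Plan:
    behavior: str             # "cruise", "follow" or "stop"
    target_speed_mps: float


@dataclass
class Command:
    throttle: float  # 0.0 .. 1.0
    brake: float     # 0.0 .. 1.0


def perceive(raw_ranges_m, raw_speeds_mps):
    """Environment perception: turn raw readings into a scene description."""
    return [Obstacle(d, v) for d, v in zip(raw_ranges_m, raw_speeds_mps)]


def plan(obstacles, cruise_speed_mps=15.0, stop_gap_m=5.0, follow_gap_m=30.0):
    """Decision planning: pick a behavior and a target speed (hypothetical rules)."""
    if not obstacles:
        return Plan("cruise", cruise_speed_mps)
    nearest = min(obstacles, key=lambda o: o.distance_m)
    if nearest.distance_m < stop_gap_m:
        return Plan("stop", 0.0)
    if nearest.distance_m < follow_gap_m:
        return Plan("follow", min(cruise_speed_mps, nearest.speed_mps))
    return Plan("cruise", cruise_speed_mps)


def control(plan_, current_speed_mps, gain=0.1):
    """Control execution: map the speed error to a throttle or brake command."""
    error = plan_.target_speed_mps - current_speed_mps
    effort = max(-1.0, min(1.0, gain * error))
    if effort >= 0:
        return Command(throttle=effort, brake=0.0)
    return Command(throttle=0.0, brake=-effort)


# Two objects ahead: one at 22 m moving at 8 m/s, one at 60 m moving at 12 m/s.
scene = perceive([22.0, 60.0], [8.0, 12.0])
decision = plan(scene)
command = control(decision, current_speed_mps=12.0)
print(decision, command)
```

Here the nearest object falls inside the follow gap, so the planner chooses to follow at the lead vehicle's speed, and the controller brakes because the ego vehicle is currently faster than that target.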

BEV drives algorithm revolution

BEV (bird's-eye view) perception has received widespread attention in recent years. A BEV model provides a unified space in which different tasks and sensors can be fused. Its main advantages are as follows:

BEV unifies multi-modal data processing dimensions, making multi-modal fusion easier

The BEV perception system can convert the information obtained by multiple cameras or radars into a bird's-eye view, and then perform tasks such as target detection and instance segmentation, which can more intuitively display the size and direction of objects in the BEV space.

In 2022, Peking University and Alibaba proposed BEVFusion, a lidar and vision fusion framework. The lidar point cloud and the camera images are processed independently: each is encoded by its own neural network and projected into a unified BEV space, and the two are then fused in that BEV space.

BEVFusion framework

Source: arXiv
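
The idea of projecting each sensor into a shared BEV grid and then fusing there can be sketched as follows. This is a toy illustration under stated assumptions, not the BEVFusion implementation: the learned encoders are replaced by a simple point count per cell, the camera branch is a hand-written placeholder grid, and fusion is plain per-cell stacking. The function names and the grid extents are hypothetical.

```python
def lidar_to_bev(points, x_range=(0.0, 4.0), y_range=(-2.0, 2.0), cell_m=1.0):
    """Rasterize (x, y, z) lidar points into a BEV grid of per-cell point counts."""
    cols = int((x_range[1] - x_range[0]) / cell_m)
    rows = int((y_range[1] - y_range[0]) / cell_m)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, _z in points:
        # Points outside the grid extent are dropped.
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            col = int((x - x_range[0]) / cell_m)
            row = int((y - y_range[0]) / cell_m)
            grid[row][col] += 1.0
    return grid


def fuse_bev(lidar_bev, camera_bev):
    """Fuse two same-shaped BEV maps by stacking their values cell by cell."""
    assert len(lidar_bev) == len(camera_bev)
    assert all(len(a) == len(b) for a, b in zip(lidar_bev, camera_bev))
    return [
        [(lidar_value, camera_value)
         for lidar_value, camera_value in zip(lidar_row, camera_row)]
        for lidar_row, camera_row in zip(lidar_bev, camera_bev)
    ]


# Four lidar points; the last one lies outside the 4 m x 4 m grid.
points = [(0.5, -1.5, 0.2), (0.7, -1.2, 0.4), (3.2, 1.1, 0.0), (9.0, 0.0, 0.0)]
lidar_bev = lidar_to_bev(points)

# Placeholder for camera features that were already lifted into the same grid.
camera_bev = [[0.0] * 4 for _ in range(4)]
camera_bev[0][0] = 0.9

fused = fuse_bev(lidar_bev, camera_bev)
print(fused[0][0], fused[3][3])
```

Because both branches end up on the same grid, fusion reduces to combining values that refer to the same patch of ground, which is what makes the BEV space convenient for multi-modal fusion.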

BEV fuses temporal information to build a 4D space

In the 4D space (3D space plus time), the perception algorithm can better handle tasks such as velocity estimation, and can pass motion prediction results on to the decision-making and control modules.

In 2022, PhiGent Robotics proposed BEVDet4D, a version of BEVDet with temporal fusion added. BEVDet4D extends BEVDet by retaining the intermediate BEV features of past frames, then aligning them with the current frame and concatenating the two, so that temporal cues can be obtained by querying and comparing the two candidate features.

BEVDet4D network structure

Source: arXiv
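
The align-then-concatenate step can be sketched on a toy grid. This is a simplified illustration, not the BEVDet4D code: it assumes the ego motion between frames is a pure translation by a whole number of cells, whereas a real system would also handle rotation and sub-cell offsets. All function names and values are hypothetical.

```python
def align_previous_bev(prev_bev, shift_rows, shift_cols, fill=0.0):
    """Shift the previous BEV map so its cells line up with the current ego frame.

    Assumption: ego motion is a pure translation by whole cells, so every
    value simply moves by (shift_rows, shift_cols); vacated cells get `fill`.
    """
    rows, cols = len(prev_bev), len(prev_bev[0])
    aligned = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + shift_rows, c + shift_cols
            if 0 <= nr < rows and 0 <= nc < cols:
                aligned[nr][nc] = prev_bev[r][c]
    return aligned


def concat_frames(current_bev, aligned_prev_bev):
    """Stack the current and the aligned previous value for every cell."""
    return [
        [(cur, prev) for cur, prev in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current_bev, aligned_prev_bev)
    ]


def estimate_speed_mps(prev_cell, cur_cell, cell_m, dt_s):
    """Speed of an object from its cell displacement between aligned frames."""
    d_rows = cur_cell[0] - prev_cell[0]
    d_cols = cur_cell[1] - prev_cell[1]
    return ((d_rows ** 2 + d_cols ** 2) ** 0.5) * cell_m / dt_s


# A static object seen at cell (0, 2) in the previous frame and (1, 2) now,
# because the ego vehicle moved by one row of cells in between.
prev_bev = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
current_bev = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]

aligned = align_previous_bev(prev_bev, shift_rows=1, shift_cols=0)
stacked = concat_frames(current_bev, aligned)
print(stacked[1][2])

# After alignment the static object has not moved, so its speed is zero.
static_speed = estimate_speed_mps((1, 2), (1, 2), cell_m=0.5, dt_s=0.5)
# An object that moved two cells sideways in 0.5 s on a 0.5 m grid.
moving_speed = estimate_speed_mps((1, 0), (1, 2), cell_m=0.5, dt_s=0.5)
print(static_speed, moving_speed)
```

Once ego motion is compensated, a static object occupies the same cell in both frames, while any remaining displacement between the two features reflects the object's own motion.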

BEV can infer occluded objects to achieve object prediction

In the BEV space, the algorithm can use prior knowledge to reason about occluded areas and infer whether objects are present there.

FIERY, proposed by Wayve and the University of Cambridge in 2021, is an end-to-end algorithm for predicting the future instances of dynamic road agents. It does not rely on high-precision maps and works only from a bird's-eye view built from surround monocular cameras.

FIERY model

Source: arXiv

BEV advances end-to-end autonomous driving frameworks

In the BEV space, perception and prediction can be optimized end-to-end by neural networks in a single unified space, with both results produced at the same time. Beyond the perception module, BEV-based planning and decision-making modules are also an active direction of academic research.

In 2022, the paper ST-P3, co-authored by the autonomous driving team of the Shanghai Artificial Intelligence Laboratory and the team of Associate Professor Yan Junchi of Shanghai Jiao Tong University, proposed a spatio-temporal feature learning scheme that provides a more representative set of features for perception, prediction, and planning tasks simultaneously.

ST-P3 architecture

Source: arXiv
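
Optimizing several tasks end-to-end usually means combining their losses into one objective that is minimized jointly. The sketch below shows that idea only; it does not reproduce the losses used in ST-P3. The mean squared error, the task weights, and all the numbers are assumptions chosen for illustration.

```python
def mse(predicted, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def joint_loss(outputs, targets, weights):
    """Weighted sum of per-task losses computed from one shared forward pass."""
    return sum(
        weights[task] * mse(outputs[task], targets[task]) for task in weights
    )


# Toy outputs of three heads that share one BEV feature (values are made up).
outputs = {
    "perception": [0.9, 0.1],  # e.g. occupancy scores
    "prediction": [1.0, 2.0],  # e.g. future positions
    "planning": [0.0, 0.5],    # e.g. trajectory waypoints
}
targets = {
    "perception": [1.0, 0.0],
    "prediction": [1.0, 3.0],
    "planning": [0.0, 0.5],
}
# Hypothetical task weights; in practice these are tuned or learned.
weights = {"perception": 1.0, "prediction": 0.5, "planning": 2.0}

total = joint_loss(outputs, targets, weights)
print(total)
```

Because all three heads read the same shared feature, a gradient step on this single scalar updates that feature for perception, prediction, and planning at once.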

AI large models drive algorithm iteration

Since 2012, deep learning has taken over most of the main branches of autonomous driving algorithms. To support larger and more complex AI computing needs, large AI models characterized by "massive data, massive computing power, and massive algorithms" have emerged, and they are speeding up algorithm iteration.

Large models and intelligent computing centers

In 2021, Haomo Zhixing began researching and deploying large Transformer models, and then gradually applied them at scale in projects including multi-modal sensor data fusion and cognitive model training. In December 2021, Haomo Zhixing released MANA (Chinese name "Xuehu", meaning "Snow Lake"), an autonomous driving data intelligence system that integrates perception, cognition, annotation, simulation, and computing. In January 2023, MANA OASIS, the MANA supercomputing center, was unveiled. Jointly built by Haomo Zhixing and Volcano Engine, it delivers 670 petaFLOPS (6.7 × 10^17 floating-point operations per second). Once the MANA training platform is deployed on OASIS, it can run applications including cloud-side large model training, vehicle-side model training, annotation, and simulation. Backed by MANA OASIS, MANA's five major models are being upgraded.

MANA supercomputing center: Snow Lake·Oasis (MANA OASIS)

Source: Haomo Zhixing

Five major models of Haomo

Source: Haomo Zhixing

In August 2022, Xpeng Motors built "Fuyao", an intelligent computing center on the Alibaba Cloud intelligent computing platform that is dedicated to training autonomous driving models. In October 2022, Xpeng also announced that it was introducing large Transformer models.

Xpeng Motors intelligent computing center: Fuyao

Source: Xpeng Motors

In November 2022, Baidu released the Wenxin large model. Its autonomous driving perception model has more than 1 billion parameters, can recognize thousands of object types, and greatly expands the semantic recognition data available for autonomous driving. At present it is mainly used in three areas: long-range vision, multi-modality, and data mining.

Baidu Wenxin large model application - multi-modal