The most secretive AI vision player on the smart car track

Publisher: 暮烟微雨 | Last updated: 2023-07-10 | Source: 智能车参考

The smart car track hides the most secretive AI vision player.


This player has yet to officially announce any smart car business, but it has repeatedly demonstrated competitiveness in the most core, cutting-edge, and most coveted autonomous driving capability: champion-level results at the world's top AI conferences.


It has produced top-tier research in object detection, semantic segmentation, visual reasoning, and more; it has won multiple championships in autonomous-driving competitions; it has even used a pure-vision solution with just 7 cameras to drive autonomously on highways, in urban areas, and in parking environments.


This player is not Tesla's AI team. It is Megvii Technology.


At the recent CVPR, the top AI conference where vision research backed by large models is opening new directions for autonomous driving, Megvii Research Institute beat a field of autonomous driving and smart car players to take the championship of a challenge examining autonomous driving environment perception.


This superstar of AI vision has yet to announce any smart car business.


But with research and results like these, can it really be purely academic?


Which autonomous driving competition did Megvii top?


The competition Megvii Research Institute entered is a CVPR 2023 challenge built specifically around autonomous driving perception and decision-making.


Megvii took the championship of the OpenLane Topology track.


The challenge has four tracks. Besides the OpenLane Topology track that Megvii entered, there are the Online HD Map Construction track, the 3D Occupancy Prediction track, and the nuPlan Planning track.

Among them, the OpenLane Topology track mainly tests an autonomous driving system's ability to understand a scene.


The track is based on the OpenLane-V2 (OpenLane-Huawei) dataset. Given multi-view camera images, contestants must output perception results for lane centerlines and traffic elements, plus predictions of the topological relationships among these elements.


In other words, the competition does not test the isolated recognition of lane boundary lines or traffic signs, as earlier autonomous driving perception tasks did. Instead, it requires the system to perceive lane centerlines and understand the logical relationships between centerlines and traffic elements, for example, which lane a green light allows to proceed.
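To make the task concrete, a frame-level prediction for this track can be pictured roughly as the structure below. This is an illustrative Python sketch; the field names follow the spirit of OpenLane-V2's annotations but are not the official submission format.

```python
# Hypothetical sketch of one frame's outputs for the OpenLane Topology track.
# Field names are illustrative, not the official OpenLane-V2 format.
prediction = {
    # Lane centerlines: ordered 3D polylines in ego coordinates, with confidence.
    "lane_centerlines": [
        {"points": [(5.0, -1.8, 0.0), (25.0, -1.8, 0.0)], "confidence": 0.92},
        {"points": [(5.0, 1.8, 0.0), (25.0, 1.9, 0.0)], "confidence": 0.88},
    ],
    # Traffic elements: 2D boxes in the front-view image, with a class label.
    "traffic_elements": [
        {"box": (640, 120, 700, 260), "category": "green_light", "confidence": 0.95},
    ],
    # Topology: score matrices in [0, 1].
    # topology_lclc[i][j]: centerline i connects into centerline j.
    "topology_lclc": [[0.0, 0.1], [0.0, 0.0]],
    # topology_lcte[i][k]: traffic element k governs centerline i.
    "topology_lcte": [[0.9], [0.1]],
}
```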


So how is the winner decided? The OpenLane-V2 dataset defines the metric: the OLS score (OpenLane-V2 Score), computed by averaging the mAP-style scores of the perception results and the topology predictions.
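Concretely, the OpenLane-V2 benchmark averages four sub-metrics, applying a square root to the two topology terms before averaging. A minimal sketch of the computation (the example numbers are invented, not Megvii's actual sub-scores):

```python
import math

def ols(det_l: float, det_t: float, top_ll: float, top_lt: float) -> float:
    """OpenLane-V2 Score: average of the four sub-metrics.

    det_l:  mAP of lane centerline detection (DET_l)
    det_t:  mAP of traffic element detection (DET_t)
    top_ll: mAP of centerline-to-centerline topology (TOP_ll)
    top_lt: mAP of centerline-to-traffic-element topology (TOP_lt)

    The benchmark passes the topology terms through a square root
    before taking the mean of the four values.
    """
    return (det_l + det_t + math.sqrt(top_ll) + math.sqrt(top_lt)) / 4

# Example with made-up sub-scores:
print(ols(0.30, 0.70, 0.05, 0.20))  # ~0.42
```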

Among the 34 participating teams, Megvii Research Institute's team was the only one to exceed 55 points, scoring 55.19, a clear lead.

So, what method did Megvii use?


Megvii’s pure vision solution for autonomous driving


First, in the perception phase, Megvii adopted two different models for the two perception tasks of traffic element detection and lane centerline detection.


For traffic element detection, Megvii uses YOLOv8, the latest generation of the mainstream YOLO series of 2D detectors, as the baseline. Compared with other 2D detection methods, YOLO is faster and more accurate.

△ Image source: GitHub user RangeKing


In addition, since the OpenLane-V2 dataset annotates the correspondence between traffic signs and lanes, Megvii added five tricks to the YOLOv8 training process: strong augmentation, reweighted classification loss, resampling of difficult samples, pseudo-label learning, and test-time augmentation. The detector generates features for the traffic elements by interacting with the front-view image.
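As one concrete example of these tricks, test-time augmentation runs the detector on augmented copies of the image and pools the results. A minimal sketch using a horizontal flip (the `detect` function is a placeholder, not Megvii's actual code):

```python
import numpy as np

def detect(image: np.ndarray) -> list[tuple[float, float, float, float, float]]:
    """Placeholder for the trained detector: returns (x1, y1, x2, y2, score) boxes."""
    raise NotImplementedError

def tta_detect(image: np.ndarray) -> list[tuple[float, float, float, float, float]]:
    """Test-time augmentation: detect on the original image and a horizontally
    flipped copy, map the flipped boxes back, and pool the results
    (a real pipeline would then apply NMS or weighted box fusion)."""
    h, w = image.shape[:2]
    boxes = list(detect(image))
    for x1, y1, x2, y2, s in detect(image[:, ::-1]):
        # Un-flip: mirror x coordinates back into the original frame.
        boxes.append((w - x2, y1, w - x1, y2, s))
    return boxes
```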


For lane centerline detection, Megvii uses the self-developed PETRv2 model as the baseline. PETRv2 provides a unified purely visual 3D perception framework that can be used for 3D object detection and BEV segmentation.


In this competition, Megvii used PETRv2 to extract 2D features from the multi-view images, generated 3D coordinates from the camera frustum space, and fed both the 2D features and the 3D coordinates into a 3D position encoder.


The 3D position encoder then produces the key and value components for a Transformer decoder, where lane queries interact with the image features through global attention to produce 3D lane centerline detections and the corresponding lane centerline features.
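Put together, this decoding step can be sketched as follows. This is a toy, PyTorch-style illustration of a PETR-style decoder; the shapes, module choices, and head design are assumptions, not PETRv2's actual implementation:

```python
import torch
import torch.nn as nn

class LaneDecoderSketch(nn.Module):
    """Toy version of the decoding step described above: 3D-position-aware
    image features serve as key/value, and learnable lane queries attend to
    them to produce 3D centerline predictions."""

    def __init__(self, dim: int = 256, num_queries: int = 100, num_points: int = 11):
        super().__init__()
        self.lane_queries = nn.Embedding(num_queries, dim)
        self.pos_encoder = nn.Linear(3, dim)          # stand-in 3D position encoder
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.centerline_head = nn.Linear(dim, num_points * 3)  # 3D polyline per query

    def forward(self, feats_2d: torch.Tensor, coords_3d: torch.Tensor):
        # feats_2d:  (B, N_tokens, dim) flattened multi-view image features
        # coords_3d: (B, N_tokens, 3) frustum-derived 3D coordinates per token
        kv = feats_2d + self.pos_encoder(coords_3d)   # position-aware key/value
        q = self.lane_queries.weight.unsqueeze(0).expand(feats_2d.size(0), -1, -1)
        lane_feats = self.decoder(q, kv)              # queries attend to image tokens
        centerlines = self.centerline_head(lane_feats)
        return centerlines, lane_feats                # detections + per-lane features

# Usage with dummy tensors: 600 image tokens, 100 lane queries.
model = LaneDecoderSketch()
lines, feats = model(torch.randn(1, 600, 256), torch.randn(1, 600, 3))
```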


In the topology prediction stage, Megvii built a multi-stage network on top of YOLOv8 and PETRv2: the features produced by the two perception tasks are concatenated pairwise, and a two-layer MLP predicts the corresponding topology matrix.
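A minimal sketch of such a topology head, reading that description literally (dimensions and design details are assumptions, not Megvii's released code):

```python
import torch
import torch.nn as nn

class TopologyHeadSketch(nn.Module):
    """Predicts a lane-to-traffic-element topology matrix by concatenating
    each lane feature with each traffic element feature and scoring the
    pair with a two-layer MLP."""

    def __init__(self, lane_dim: int = 256, te_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(lane_dim + te_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, lane_feats: torch.Tensor, te_feats: torch.Tensor) -> torch.Tensor:
        # lane_feats: (L, lane_dim), te_feats: (T, te_dim)
        L, T = lane_feats.size(0), te_feats.size(0)
        pairs = torch.cat(
            [lane_feats.unsqueeze(1).expand(L, T, -1),
             te_feats.unsqueeze(0).expand(L, T, -1)],
            dim=-1,
        )                                             # (L, T, lane_dim + te_dim)
        return self.mlp(pairs).squeeze(-1).sigmoid()  # (L, T) scores in [0, 1]

# Usage: 8 lanes and 5 traffic elements -> an 8x5 topology score matrix.
scores = TopologyHeadSketch()(torch.randn(8, 256), torch.randn(5, 256))
```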

△ Megvii's final predictions on the validation set, including bounding boxes, categories, and confidence scores


Finally, in the OLS breakdown, the Megvii team's method leads the other contestants in traffic element perception (DET_t), centerline-to-centerline topology prediction (TOP_ll), and centerline-to-traffic-element topology prediction (TOP_lt).


The most secretive AI vision player on the smart car track


Participating in this competition is the MFV (Megvii-Foundation model-Video) team of Megvii Research Institute.

The first author of the paper describing the winning entry is Wu Dongming, who received a bachelor's degree from the Xu Class of Beijing Institute of Technology in 2019, then continued there as a doctoral student in the Department of Computer Science under Professor Shen Jianbing. In 2022 he became a research intern at Megvii Research Institute.

The other authors of the paper are also from Megvii Research Institute, including Chang Jiahao, who graduated from the University of Science and Technology of China, and Li Zhuoling, who graduated from the University of Hong Kong.


It is worth mentioning that the PETRv2 model used in this challenge was one of the academic achievements released by the research team led by Dr. Sun Jian, the founding director of Megvii Research Institute, before his death.

Moreover, this is not Megvii's only autonomous driving-related research result.


In addition to the PETR series of models, Megvii has also released BEVDepth (a detection model achieving high-accuracy depth estimation for 3D objects), LargeKernel3D (the first work to demonstrate the feasibility and necessity of large convolution kernels for 3D vision tasks), and BEVStereo (a pure-vision 3D object detection SOTA on nuScenes), among other industry-leading results.

△ BEVStereo model framework


Megvii Research Institute has always been the R&D "brain" of Megvii's AI technology, focusing on deep learning and computer vision. It is the birthplace of the Brain++ AI productivity platform, the open-source deep learning framework MegEngine (Tianyuan), and the efficient mobile convolutional network ShuffleNet, among others; it has published more than 120 papers at the world's top conferences, won more than 40 championships in top competitions, and holds more than 1,300 business-related patent grants.


Moreover, unlike corporate research institutes that do pure R&D or speculative pre-research, Megvii Research Institute has been deployed as a front-line force from the beginning. Its latest results, and the directions it aims at, are generally not pursued on a whim or as research for research's sake.


So this is why Megvii deserves attention after it has repeatedly produced top results in autonomous driving and smart cars.


Unlike its old rival SenseTime, Megvii has not officially announced any smart car or autonomous driving business or partnership. SenseTime, by contrast, has launched a dedicated smart car brand, Jueying, led by co-founder Wang Xiaogang and positioned to become a pillar of SenseTime's next growth.


Facing a trillion-scale market like smart cars and autonomous driving, will Megvii simply sit still? Not likely.


What's more, everything from research capability to engineering maturity has already been demonstrated at the top conference.


Moreover, Megvii Research Institute has also shown an autonomous driving pre-research demo: using only 7 cameras, it drives itself on highways and urban roads, and can complete parallel, perpendicular, and angled parking.


What level is this?


For reference, Tesla, the pure vision king, requires at least 8 cameras for its autonomous driving perception solution.

