Cross-Dataset Collaborative Learning for Semantic Segmentation Accepted as an AAAI Paper
This morning Beijing time, the 36th AAAI Conference on Artificial Intelligence (organized by the Association for the Advancement of Artificial Intelligence) officially opened online. At this year's conference, the AMD-Xilinx AI team's paper "Cross-Dataset Collaborative Learning for Semantic Segmentation in Autonomous Driving" was accepted. This marks another recognition of the team at a top industry conference, following last year's CVPR and ICCV.
As one of the top comprehensive conferences in the field of artificial intelligence, AAAI covers research areas such as machine learning, natural language processing, computer vision, and data mining. Its review process is notoriously strict: every submission goes through two rounds of review, and only a small fraction is ultimately accepted. This year the conference received a record 9,251 submissions and accepted 1,349 papers, an overall acceptance rate of only about 15%.
That the AMD-Xilinx AI team's paper stood out in such fierce competition speaks to its innovation and value. We sat down with the paper's first author, Wang Li, an algorithm engineer on the AMD-Xilinx AI team, to bring you this exclusive analysis of the work.
The following is Wang Li's account in his own words.
Wang Li
AMD-Xilinx AI Team Algorithm Engineer
Collaborative learning across datasets
(Cross-Dataset Collaborative Learning, CDCL)
A simple, flexible and general method for semantic segmentation
In this paper, we examine a limitation of current work on semantic segmentation for autonomous driving: most methods focus on designing network structures to improve accuracy on a single target dataset. To address this, we study how to train collaboratively on multiple datasets so that a single model generalizes well across all of them. This is the cross-dataset collaborative learning (CDCL) we propose in the paper.
Our goal is to train a unified model that leverages the comprehensive information from multiple datasets to improve the generalization of the network and achieve satisfactory performance on each dataset.
First, we introduce a series of Dataset-Aware Blocks (DABs) as the basic computational units of the network. Each block combines convolutional parameters that are homogeneous (shared) across datasets with dataset-aware statistics that capture each dataset's distribution.
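As a concrete illustration, below is a minimal PyTorch sketch of what such a block could look like, assuming a shared 3x3 convolution and one BatchNorm branch per dataset. The class and argument names are ours for illustration, not the paper's actual code.

```python
import torch.nn as nn

class DatasetAwareBlock(nn.Module):
    """Illustrative sketch of a dataset-aware block: the convolution is
    shared (homogeneous) across datasets, while each dataset gets its
    own BatchNorm so dataset-specific statistics are kept separate."""

    def __init__(self, in_ch, out_ch, num_datasets, stride=1):
        super().__init__()
        # Homogeneous part: one convolution shared by all datasets.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3,
                              stride=stride, padding=1, bias=False)
        # Heterogeneous part: one BN branch per dataset.
        self.bns = nn.ModuleList(
            [nn.BatchNorm2d(out_ch) for _ in range(num_datasets)])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, dataset_idx):
        # Route shared features through the BN of the active dataset.
        return self.relu(self.bns[dataset_idx](self.conv(x)))
```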
Second, we propose a Dataset-Alternation Training (DAT) mechanism to facilitate collaborative optimization across the datasets. We evaluate the approach on multiple autonomous-driving semantic segmentation datasets and show that it achieves significant improvements over existing single-dataset methods without introducing additional FLOPs.
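One plausible reading of dataset-alternation training is sketched below: each optimization step cycles through the datasets, draws a batch from each, and routes it through the matching BN branch of the DABs. The function and variable names (`train_dat`, `loaders`) and the exact alternation schedule are our assumptions; the paper's schedule may differ.

```python
import itertools

def train_dat(model, loaders, criterion, optimizer, num_steps):
    """Sketch of dataset-alternation training over several DataLoaders.
    Note: itertools.cycle replays cached batches after the first pass;
    a real implementation would re-create iterators each epoch."""
    iters = [itertools.cycle(loader) for loader in loaders]
    model.train()
    for _ in range(num_steps):
        optimizer.zero_grad()
        for dataset_idx, it in enumerate(iters):
            images, labels = next(it)
            # Forward pass selects the dataset-specific BN in each DAB.
            logits = model(images, dataset_idx=dataset_idx)
            loss = criterion(logits, labels)
            loss.backward()  # gradients accumulate across datasets
        optimizer.step()
```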
For example, with a PSPNet (ResNet-18) segmentation baseline, our method improves over single-dataset training by 5.65%, 6.57%, and 5.79% on the validation sets of three public datasets: Cityscapes, BDD100K, and CamVid. We also applied CDCL to the 3D point-cloud semantic segmentation task to further verify the versatility of our method.
Long-term focus + continuous exploration + a new direction = a brand-new algorithm
The new algorithm supports collaborative training on multiple datasets and delivers consistent accuracy gains on each dataset with no additional compute cost
Semantic segmentation for autonomous driving is an area we have focused on for a long time, and we have seen many limitations in current practice.
First, current segmentation algorithms mainly improve accuracy on a target dataset by designing network structures of varying complexity. To handle multiple datasets, this type of algorithm has to maintain multiple sets of network weights; each network is responsible only for its own dataset and ignores the relationships between datasets.
Second, for handling the distribution differences across multiple datasets while improving accuracy on the target dataset, fine-tuning is the most direct solution, but it takes considerable experience to set the hyperparameters, and its two-stage training brings extra training overhead. Another way to train with multiple datasets is label remapping, which requires manually reconciling the label taxonomies of the different datasets and has its own drawbacks.
Our research direction differs from the methods above. We observe that the statistical parameters of the BN layers in a segmentation network directly reflect differences in data distribution, while the convolutions are sharable across datasets. We therefore introduce the Dataset-Aware Block (DAB) as the basic computational unit of the network: its convolutions are homogeneous (shared) while its BN layers are heterogeneous (dataset-specific), which lets the network capture each dataset's statistical distribution. In addition, we propose the Dataset-Alternation Training (DAT) mechanism to drive the collaborative optimization process.
In short, the proposed algorithm not only supports collaborative training on multiple datasets, but also steadily improves accuracy on each dataset without introducing additional compute overhead at inference time.
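The reason there is no extra inference cost is that only the BN branch of the target dataset is active at test time, so per-image compute matches a single-dataset model of the same backbone. Continuing the illustrative sketch above:

```python
import torch

# Inference activates only the chosen dataset's BN branch, so FLOPs
# are identical to a single-dataset model with the same backbone.
model.eval()
with torch.no_grad():
    logits = model(image, dataset_idx=0)  # e.g., 0 = Cityscapes
```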
Saving data collection and annotation costs for autonomous driving
In our view, for semantic segmentation tasks spanning multiple datasets, current algorithms typically have to collect a dataset for each target scene separately to fine-tune the model, and pixel-level annotation is extremely time-consuming and labor-intensive. Our method can effectively use public datasets to improve the network's generalization ability without any additionally annotated data. A single model can then achieve satisfactory results across multiple target scenes without added resource overhead, and it also achieves good accuracy in zero-shot scenarios.
Looking ahead, this method can be applied to autonomous driving deployments: for different driving scenarios, there is no need to collect and annotate a corresponding dataset and train a separate set of model parameters for each. The algorithm in this paper can train a unified model directly on publicly annotated autonomous-driving datasets and obtain strong results across multiple scenarios. This not only saves data collection and annotation costs, but also makes the algorithm more practical.
The paper's first author, Wang Li, an algorithm engineer on the AMD-Xilinx AI team, has posted an online lecture on the paper, in which he walks through the core research and related algorithms in depth. You can click the link to watch it.
Note: Xilinx is now part of AMD