Real Skynet: NVIDIA launches the first cross-camera vehicle tracking dataset

Publisher: 美丽的1号 · Last updated: 2019-03-27 · Source: Toutiao (头条)

Cities have great potential to use traffic cameras as citywide sensors to optimize traffic flow and manage traffic incidents, but existing technologies lack the ability to track vehicles over large areas, across multiple cameras, at different intersections, and in varying weather conditions.

To overcome this challenge, three distinct but closely related research problems must be addressed: 1) detection and tracking of objects within a single camera, i.e., multi-target single-camera (MTSC) tracking; 2) re-identification (ReID) of objects across multiple cameras; and 3) detection and tracking of objects across a network of cameras, i.e., multi-target multi-camera (MTMC) tracking. MTMC tracking can be seen as the combination of MTSC tracking within each camera and image-based ReID, which links object trajectories across cameras.

As shown in Figure 1, multi-target multi-camera tracking consists of three major components: image-based re-identification, multi-target tracking within a single camera, and spatiotemporal analysis between cameras.


Figure 1: Multi-target cross-camera tracking
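
To make this decomposition concrete, here is a minimal sketch (not the paper's implementation) of how single-camera tracklets and image-based ReID combine into MTMC tracking: each tracklet carries an appearance embedding, and tracklets from different cameras are greedily linked when their embeddings are close. The Tracklet class and the 0.6 threshold are illustrative assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Tracklet:
    camera_id: int
    track_id: int               # ID within one camera (from MTSC tracking)
    embedding: np.ndarray       # mean ReID feature over the tracklet's boxes
    global_id: int = -1         # assigned by cross-camera association


def cosine_distance(a, b):
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def link_across_cameras(tracklets, max_dist=0.6):
    """Greedy cross-camera association by appearance distance."""
    next_gid = 0
    for t in tracklets:
        candidates = [u for u in tracklets
                      if u.global_id >= 0 and u.camera_id != t.camera_id]
        best = min(candidates,
                   key=lambda u: cosine_distance(t.embedding, u.embedding),
                   default=None)
        if best is not None and cosine_distance(t.embedding, best.embedding) < max_dist:
            t.global_id = best.global_id   # same vehicle seen in another camera
        else:
            t.global_id = next_gid         # first sighting of a new vehicle
            next_gid += 1
    return tracklets
```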

Compared with the recently popular task of pedestrian re-identification, vehicle re-identification faces two major challenges: high intra-class variability (a vehicle's appearance changes more across viewpoints than a person's does) and high inter-class similarity (vehicle models from different manufacturers can look nearly identical). The existing vehicle re-identification datasets (VeRi-776 from Beihang University, VehicleID from Peking University, and PKU-VD from Peking University) do not provide the original video or camera calibration information, so they cannot be used for video-based cross-camera vehicle tracking research.
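Metric-learning losses, which the paper's experiments compare alongside classification losses, target exactly these two challenges. Below is a minimal PyTorch sketch of a triplet loss (the 0.3 margin and the random tensors are illustrative): it pulls together views of the same vehicle from different cameras and pushes apart different, possibly same-model, vehicles.

```python
import torch
import torch.nn.functional as F


def triplet_loss(anchor, positive, negative, margin=0.3):
    """anchor/positive: the same vehicle seen from different cameras;
    negative: a different (possibly same-model) vehicle."""
    d_ap = F.pairwise_distance(anchor, positive)  # intra-class distance: push down
    d_an = F.pairwise_distance(anchor, negative)  # inter-class distance: push up
    return F.relu(d_ap - d_an + margin).mean()


emb = torch.randn(8, 256)  # a batch of embeddings (illustrative)
print(triplet_loss(emb, emb + 0.01 * torch.randn_like(emb), torch.randn(8, 256)))
```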

The "Mobile City" dataset proposed by the authors of this paper contains high-definition synchronized videos, covers the largest number of intersections (10) and the largest number of cameras (40), collected in a medium-sized American city, and has a variety of scenes, including residential areas and highways. The main contributions of this paper are as follows:

  1. Among existing datasets, this dataset has the largest spatial span and number of cameras/intersections, including diverse urban scenes and traffic flows, providing the best platform for city-scale solutions.

  2. "Mobile City" is also the first dataset that supports (video-based) cross-camera multi-target vehicle tracking, providing original video, camera distribution and camera correction information, which will open the door to a new research field.

  3. The performance of various state-of-the-art algorithms on this dataset was analyzed, and various combinations of visual and spatiotemporal analysis were compared, showing that this dataset is more challenging than existing ones.


Figure 2: Schematic diagram of the spatial distribution of cameras. The red arrows indicate the position and direction of the cameras.

Paper: CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification



Paper link: https://arxiv.org/abs/1903.09254

Abstract: Urban traffic optimization using traffic cameras as sensors requires stronger support for multi-target multi-camera tracking. This paper introduces CityFlow, a city-scale traffic camera dataset comprising more than 3 hours of synchronized HD video collected from 40 cameras spanning 10 intersections, with the longest distance between two synchronized cameras being 2.5 km. To the best of our knowledge, CityFlow is currently the largest dataset in an urban environment in terms of spatial span and number of cameras/videos. The dataset contains more than 200,000 annotated bounding boxes and covers a variety of scenes, viewpoints, vehicle models, and urban traffic conditions.

We provide camera layout and calibration information to assist spatiotemporal analysis. In addition, we provide a subset of the dataset for image-based vehicle re-identification. We conduct extensive experimental analysis, testing a range of baseline and state-of-the-art algorithms for multi-target multi-camera tracking, single-camera multi-target tracking, object detection, and re-identification, and analyzing different network architectures, loss functions, spatiotemporal models, and their combinations.

The dataset and an online evaluation server have been released as part of the 2019 AI City Challenge (https://www.aicitychallenge.org/), where researchers can evaluate their latest techniques. We hope this dataset will promote research in this field, improve the effectiveness of current algorithms, and help optimize real-world traffic management. To protect privacy, all license plates and faces in the dataset have been obscured.

Comparison of CityFlow with related benchmarks


Table 1: Summary of existing object re-identification datasets

As the table shows, CityFlow is currently the only dataset that supports cross-camera vehicle tracking. It has the largest number of cameras and more than 200,000 bounding boxes, and it provides the original video, the camera layout, and support for multi-view analysis.

"Mobile City" Benchmark Dataset

The entire dataset includes 5 different scenes and 40 cameras, with a total video length of about 3 hours and 15 minutes; cross-camera trajectories are annotated for 666 vehicles. Below is a summary of these scenes (some scenes have overlapping cameras).

[Table: per-scene summary of the dataset]


The figure below shows the distribution of vehicle colors and models.

[Figure: distribution of vehicle colors and models]


The following is an example of tracking and annotation results. The researchers first applied state-of-the-art object detection and single-camera tracking methods to obtain rough trajectories, manually corrected the errors in those trajectories, and then annotated cross-camera correspondences on top of them.

[Figure: example tracking and annotation results]
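The released annotations follow the MOTChallenge-style text format used by the AI City Challenge tracking tracks. Assuming lines of the form `frame,id,left,top,width,height,...`, a small loader might look like the sketch below; the directory layout in the example path is an assumption.

```python
import csv
from collections import defaultdict


def load_tracks(path):
    """Return {track_id: [(frame, x, y, w, h), ...]} for one camera."""
    tracks = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame, tid = int(row[0]), int(row[1])
            x, y, w, h = map(float, row[2:6])
            tracks[tid].append((frame, x, y, w, h))
    return tracks


tracks = load_tracks("S01/c001/gt/gt.txt")  # path layout is illustrative
print(len(tracks), "annotated vehicles in this camera")
```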


At the same time, they matched three-dimensional information from Google Maps against its two-dimensional projection in the camera images and optimized the correspondence to obtain a more accurate homography matrix, which is provided to the participating teams for spatiotemporal analysis in 3D.

[Figure: camera calibration against Google Maps]
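As a sketch of this calibration step, landmarks visible both in the camera image and on the map give point correspondences from which OpenCV can solve the image-to-ground homography. All coordinates below are made-up placeholders, not values from the paper.

```python
import cv2
import numpy as np

# Pixel positions of landmarks in the camera image (e.g. lane-marking corners).
image_pts = np.array([[412, 660], [980, 645], [1240, 830], [215, 860]],
                     dtype=np.float32)
# The same landmarks in local ground-plane coordinates (meters), read off the map.
ground_pts = np.array([[0.0, 0.0], [12.5, 0.0], [12.5, 18.0], [0.0, 18.0]],
                      dtype=np.float32)

H, _ = cv2.findHomography(image_pts, ground_pts)  # use RANSAC with more points

# Project a detected vehicle's ground contact point into map coordinates.
foot = np.array([[[700.0, 750.0]]], dtype=np.float32)
print(cv2.perspectiveTransform(foot, H))  # approximate ground-plane position
```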


Their experimental analysis is divided into three parts: image-based vehicle re-identification, single-camera multi-target tracking, and cross-camera tracking combined with spatiotemporal analysis.

First, for the re-identification part, the researchers compared the winning methods from last year's AI City Challenge, the current best methods for pedestrian re-identification (drawn from the deep-person-reid project of Queen Mary University of London), and the best method for vehicle re-identification (from NVIDIA, recently accepted at IJCNN). Below is a comparison of the CMC curves of these methods (the larger the enclosed area, the better). Pedestrian and vehicle re-identification methods perform comparably on this dataset, but their overall accuracy remains low, with a Rank-1 hit rate of only about 50%. By comparison, the same methods achieve a Rank-1 hit rate of over 90% on the existing VeRi dataset, which shows how challenging this dataset is.

[Figure: CMC curves of the compared methods]
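For reference, a Rank-k hit rate (one point on a CMC curve) is computed by ranking the gallery by feature distance for each query and counting a hit at rank k if a correct identity appears in the top k. A small NumPy sketch, with all inputs assumed to be NumPy arrays of illustrative shape:

```python
import numpy as np


def cmc(query_feats, query_ids, gallery_feats, gallery_ids, max_rank=20):
    hits = np.zeros(max_rank)
    # Pairwise Euclidean distances between queries (Q, D) and gallery (G, D).
    dists = np.linalg.norm(query_feats[:, None] - gallery_feats[None], axis=2)
    for i, qid in enumerate(query_ids):
        order = np.argsort(dists[i])                  # nearest gallery first
        match = gallery_ids[order][:max_rank] == qid  # correct IDs in top-k
        if match.any():
            hits[np.argmax(match):] += 1  # a hit at rank r counts for all k >= r
    return hits / len(query_ids)          # cmc[0] is the Rank-1 hit rate
```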


Below is a comparison of the ranking results of these methods. The camera viewpoints are clearly very diverse, which adds further difficulty.

[Figure: example ranking results of the compared methods]


The following table compares combinations of state-of-the-art single-camera tracking algorithms and object detectors. DS stands for Deep SORT, from the University of Koblenz-Landau in Germany; TC is the winning method from last year's AI City Challenge; and MO is MOANA, the leading method on the 3D tracking benchmark of MOTChallenge (the multi-object tracking challenge). For object detection, YOLO, SSD, and Faster R-CNN are compared. The best results so far come from the combination of TC and SSD.

[Table: single-camera tracking results for tracker/detector combinations]
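Tracking scores such as MOTA and IDF1 in tables like this are typically computed with the open-source py-motmetrics package. A minimal one-frame sketch with made-up boxes:

```python
import motmetrics as mm

acc = mm.MOTAccumulator(auto_id=True)

# One frame: two ground-truth vehicles and two tracker hypotheses ([x, y, w, h]).
gt_ids, gt_boxes = [1, 2], [[100, 200, 80, 60], [400, 210, 90, 70]]
hyp_ids, hyp_boxes = [7, 8], [[104, 198, 78, 62], [700, 300, 50, 40]]

# IoU-based distance matrix; pairs below 0.5 IoU are treated as unmatchable.
dists = mm.distances.iou_matrix(gt_boxes, hyp_boxes, max_iou=0.5)
acc.update(gt_ids, hyp_ids, dists)  # call once per frame in a real evaluation

mh = mm.metrics.create()
print(mh.compute(acc, metrics=["mota", "idf1"], name="demo"))
```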


Finally, the table below adds spatiotemporal analysis to compare the final cross-camera multi-target tracking results. PROVID is the method from the authors of the VeRi dataset. 2WGMMF is a method previously proposed by the author's laboratory that learns the spatiotemporal relationships between cameras with Gaussian distributions. FVS, part of the author's winning entry in last year's AI City Challenge, sets the cross-camera Gaussian distributions manually and is therefore more accurate.

[Table: cross-camera tracking results with spatiotemporal analysis]
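As a sketch of the kind of Gaussian spatiotemporal model described above (not the exact 2WGMMF or FVS formulation), the travel time between a camera pair can be modeled as a normal distribution whose log-likelihood reweights the appearance distance of a candidate cross-camera match. All parameter values here are illustrative.

```python
import math


def transition_score(appearance_dist, travel_time, mu=42.0, sigma=8.0, w=0.5):
    """Lower is better: appearance distance penalized for unlikely travel times."""
    # Gaussian log-likelihood of the observed camera-to-camera travel time.
    log_p = (-0.5 * ((travel_time - mu) / sigma) ** 2
             - math.log(sigma * math.sqrt(2.0 * math.pi)))
    return appearance_dist - w * log_p


# A vehicle reappearing after 45 s scores far better than one after 3 minutes.
print(transition_score(0.35, 45.0), transition_score(0.35, 180.0))
```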


About the Author

The first author of this paper, Zheng Tang, is a doctoral student in the Department of Electrical and Computer Engineering at the University of Washington (Seattle) and is expected to graduate in June of this year. He is currently interning at NVIDIA and will join Amazon's unmanned-store "Go" project after graduation. This paper is a result of his internship at NVIDIA.

In 2017 and 2018, Zheng Tang led his laboratory's team in the AI City Challenge hosted by NVIDIA. The team won the championship two years in a row, beating nearly 40 teams from around the world, including teams from UC Berkeley, the University of Illinois at Urbana-Champaign, the University of Maryland, College Park, Beijing University of Posts and Telecommunications, and National Taiwan University. The second competition was held as a workshop at CVPR 2018. Because of the team's outstanding performance, Zheng Tang was invited to intern at NVIDIA, where he helped prepare the third AI City Challenge (again a workshop, this time at CVPR 2019) and its benchmark dataset, the CityFlow dataset introduced in this article.

This year's AI City Challenge has three tracks: multi-target multi-camera vehicle tracking, image-based vehicle re-identification, and traffic anomaly detection. More than 200 teams from around the world (over 700 participants in total) have already registered, four times the combined total of the previous two years. NVIDIA will announce the winning teams and present the prizes (one Quadro GV100, three Titan RTXs, and two Jetson AGX Xaviers) at this year's CVPR in Long Beach, California. The challenge is still accepting team registrations and workshop submissions; the deadline is May 10. The paper's other authors include Milind Naphade, CTO of NVIDIA's AI City project; Ming-Yu Liu, a GAN expert at NVIDIA Research; Xiaodong Yang, also of NVIDIA Research (with three CVPR oral papers this year); Stan Birchfield, a principal research scientist at NVIDIA's Redmond office; and Zheng Tang's advisor, Prof. Jenq-Neng Hwang.

