Cities have great potential to use traffic cameras as citywide sensors to optimize traffic flow and manage traffic incidents, but existing technologies lack the ability to track vehicles over large areas, across multiple cameras, at different intersections, and in varying weather conditions.
To overcome this challenge, three distinct but closely related research problems must be addressed: 1) detection and tracking of objects within a single camera, i.e., multi-target single-camera (MTSC) tracking; 2) re-identification (ReID) of objects across multiple cameras; 3) detection and tracking of objects across a network of cameras, i.e., multi-target multi-camera (MTMC) tracking. MTMC tracking can be seen as the combination of MTSC tracking within each camera and image-based ReID, which connects object trajectories across cameras.
As shown in Figure 1, MTMC tracking consists of three major components: image-based re-identification, multi-target tracking within a single camera, and spatiotemporal analysis between cameras.
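To make this decomposition concrete, the sketch below links single-camera tracklets into cross-camera trajectories by matching averaged ReID embeddings. It is a minimal illustration of the pipeline just described, not the paper's method: the `embed` function, the greedy matching, and the similarity threshold are all assumptions made for the example.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def mtmc_link(tracklets, embed, threshold=0.6):
    """Assign a global ID to each MTSC tracklet by greedily matching
    mean ReID embeddings across cameras.

    tracklets: list of (camera_id, tracklet_id, [image crops])
    embed:     hypothetical function mapping an image crop to a 1-D feature
    """
    assigned = []  # entries: (camera_id, tracklet_id, global_id, descriptor)
    next_gid = 0
    for cam, tid, crops in tracklets:
        # One appearance descriptor per tracklet: the mean per-frame feature.
        desc = np.mean([embed(c) for c in crops], axis=0)
        best_gid, best_sim = None, threshold
        for other_cam, _, gid, other_desc in assigned:
            if other_cam == cam:
                continue  # ReID links trajectories across *different* cameras
            sim = cosine_sim(desc, other_desc)
            if sim > best_sim:
                best_gid, best_sim = gid, sim
        if best_gid is None:  # no plausible match: start a new global identity
            best_gid, next_gid = next_gid, next_gid + 1
        assigned.append((cam, tid, best_gid, desc))
    return [(cam, tid, gid) for cam, tid, gid, _ in assigned]
```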
Compared with the recently popular task of person re-identification, vehicle re-identification faces two major challenges: high intra-class variability (a vehicle's appearance changes more across viewpoints than a person's does) and high inter-class similarity (vehicle models from different manufacturers can look very similar). The existing vehicle re-identification datasets (VeRi-776 from Beihang University, VehicleID from Peking University, and PKU-VD from Peking University) do not provide the original videos or camera calibration information, so they cannot support research on video-based cross-camera vehicle tracking.
The "Mobile City" dataset proposed by the authors of this paper contains high-definition synchronized videos, covers the largest number of intersections (10) and the largest number of cameras (40), collected in a medium-sized American city, and has a variety of scenes, including residential areas and highways. The main contributions of this paper are as follows:
Among existing datasets, this one has the largest spatial span and the most cameras and intersections, covering diverse urban scenes and traffic conditions; it therefore provides the best platform for city-scale solutions.
"Mobile City" is also the first dataset that supports (video-based) cross-camera multi-target vehicle tracking, providing original video, camera distribution and camera correction information, which will open the door to a new research field.
The performance of various state-of-the-art algorithms on this dataset is analyzed, and combinations of visual and spatiotemporal analysis are compared, showing that this dataset is more challenging than existing alternatives.
Paper: CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification
Paper link: https://arxiv.org/abs/1903.09254
Abstract: Urban traffic optimization using traffic cameras as sensors requires more powerful support for multi-target multi-camera tracking. This paper introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD video collected from 40 cameras spanning 10 intersections, with the longest distance between two synchronized cameras being 2.5 km. To the best of our knowledge, CityFlow is currently the largest dataset in an urban environment in terms of spatial span and number of cameras/videos. The dataset contains more than 200,000 bounding boxes and covers a variety of scenes, viewpoints, vehicle models, and urban traffic conditions.
We provide the camera layout and calibration information to assist spatiotemporal analysis. In addition, we provide a subset of the dataset for image-based vehicle re-identification. We conducted extensive experimental analysis, testing a range of baseline and state-of-the-art algorithms for cross-camera multi-target tracking, single-camera multi-target tracking, object detection, and re-identification, and analyzing different network architectures, loss functions, spatiotemporal models, and their combinations.
The dataset and an online evaluation server have been released as part of the 2019 AI City Challenge (https://www.aicitychallenge.org/), where researchers can evaluate their latest algorithms. We hope this dataset will promote research in the field, improve the effectiveness of current algorithms, and help optimize real-world traffic management. To protect privacy, all license plates and faces in the dataset have been redacted.
Comparison of CityFlow with related benchmarks
As the table shows, CityFlow is currently the only dataset that supports cross-camera vehicle tracking. It has the most cameras and more than 200,000 bounding boxes, and it provides the original videos, the camera layout, and support for multi-view analysis.
"Mobile City" Benchmark Dataset
The full dataset covers 5 different scenes and 40 cameras, with a total video length of about 3 hours and 15 minutes; the cross-camera trajectories of 666 vehicles are annotated. Below is a summary of the scenes (some scenes share overlapping cameras).
The figure below shows the distribution of vehicle colors and models.
Below is an example of the tracking and annotation results. The researchers first applied state-of-the-art object detection and single-camera tracking to obtain rough trajectories, manually corrected the errors in those trajectories, and then annotated cross-camera correspondences on top of them.
They also matched the 3D information from Google Maps against its 2D projection in the image, optimizing the correspondence to obtain a more accurate homography matrix for each camera, which is provided to the participating teams for 3D spatiotemporal analysis.
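As an illustration of how such a calibration can be used, the sketch below estimates an image-to-ground-plane homography from manually matched point pairs with OpenCV and projects a pixel into world coordinates. The point values are made up for the example; the paper's actual optimization against Google Maps is not reproduced here.

```python
import cv2
import numpy as np

# Pixel coordinates of landmarks in the camera image (illustrative values).
image_pts = np.array([[412, 310], [880, 295], [951, 640], [203, 655]],
                     dtype=np.float32)
# Corresponding ground-plane coordinates in meters, e.g., read off a map.
world_pts = np.array([[0.0, 0.0], [14.5, 0.2], [15.1, 22.7], [-0.8, 23.0]],
                     dtype=np.float32)

# RANSAC rejects mismatched point pairs while estimating the 3x3 homography.
H, mask = cv2.findHomography(image_pts, world_pts, cv2.RANSAC, 3.0)

def image_to_world(u, v, H):
    """Project a pixel (u, v) onto the ground plane via the homography."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]  # divide out the homogeneous coordinate

print(image_to_world(640, 480, H))
```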
Their experimental analysis is divided into three parts: image-based vehicle re-identification, single-camera multi-target tracking, and cross-camera tracking combined with spatiotemporal analysis.
First, for re-identification, the researchers compared the winning methods from last year's AI City Challenge, the current best methods for person re-identification (drawn from the deep-person-reid project of Queen Mary University of London), and the best method for vehicle re-identification (from NVIDIA, recently accepted by IJCNN). Below is a comparison of the CMC curves of these methods (the larger the area under the curve, the better). Person and vehicle re-identification methods perform comparably on this dataset, but their overall accuracy remains low, with a Rank-1 hit rate of only about 50%. By contrast, the same methods achieve Rank-1 hit rates above 90% on the VeRi dataset, which shows how challenging this dataset is.
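For reference, the CMC curve and the Rank-1 hit rate quoted above can be computed from a query-to-gallery distance matrix as in the sketch below. It omits the standard exclusion of same-camera gallery images, and the shapes are illustrative rather than taken from the paper's evaluation code.

```python
import numpy as np

def cmc_curve(dist, query_ids, gallery_ids, max_rank=100):
    """dist: (num_query, num_gallery) distance matrix.
    Returns CMC[k] = fraction of queries whose first correct gallery
    match appears within the top (k + 1) ranked results."""
    order = np.argsort(dist, axis=1)                    # gallery sorted per query
    matches = gallery_ids[order] == query_ids[:, None]  # True where IDs agree
    hits = np.zeros(max_rank)
    for row in matches:
        first = np.flatnonzero(row)                     # ranks of correct matches
        if first.size and first[0] < max_rank:
            hits[first[0]:] += 1                        # a hit from that rank onward
    return hits / dist.shape[0]

# The Rank-1 hit rate is cmc_curve(...)[0]; plotting the whole vector
# against rank gives curves like those compared in the figure.
```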
Below is a comparison of the ranking results of these methods. The camera viewpoints are highly diverse, which adds to the difficulty.
The following table compares combinations of state-of-the-art single-camera tracking algorithms and object detection methods. DS stands for Deep SORT, from the University of Koblenz-Landau in Germany; TC is the winning method from last year's AI City Challenge; and MO is MOANA, the leading method on the 3D tracking benchmark of MOTChallenge (the multi-object tracking benchmark). For object detection, YOLO, SSD, and Faster R-CNN are compared. The best result so far comes from the combination of TC and SSD.
Finally, the table below adds spatiotemporal analysis to compare the end-to-end results of cross-camera multi-target tracking. PROVID is the method proposed by the authors of the VeRi dataset. 2WGMMF is a method previously proposed by the first author's laboratory, which learns the spatiotemporal relationships between cameras with Gaussian distributions. FVS is part of the first author's winning entry in last year's AI City Challenge; it uses manually specified cross-camera Gaussian distributions and is therefore more accurate.
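The sketch below illustrates the general idea behind such Gaussian spatiotemporal models: fit a normal distribution to observed travel times between a camera pair, then down-weight visually similar matches whose time gaps are implausible. This is a simplified illustration of the concept, not the 2WGMMF or FVS implementation; the fusion weight `alpha` is an assumption.

```python
import numpy as np

def fit_travel_time(times):
    """Fit a Gaussian to observed travel times (seconds) between two cameras."""
    t = np.asarray(times, dtype=float)
    return t.mean(), t.std() + 1e-6      # epsilon avoids a degenerate sigma

def temporal_plausibility(dt, mu, sigma):
    """Gaussian score of a time gap dt, scaled so the mode equals 1."""
    return float(np.exp(-0.5 * ((dt - mu) / sigma) ** 2))

def fused_score(appearance_sim, dt, mu, sigma, alpha=0.5):
    """Blend visual similarity with temporal plausibility (alpha is illustrative)."""
    return alpha * appearance_sim + (1 - alpha) * temporal_plausibility(dt, mu, sigma)

# Example: a pair with high visual similarity but an unlikely travel time
# (e.g., 5 s when the learned mean is about 90 s) gets a much lower fused score.
mu, sigma = fit_travel_time([82, 95, 88, 101, 90])
print(fused_score(0.9, 5, mu, sigma), fused_score(0.9, 92, mu, sigma))
```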
About the Author
The first author of this paper, Zheng Tang, is a doctoral student in the Department of Electrical and Computer Engineering at the University of Washington (Seattle) and is expected to graduate in June this year. He is currently an intern at NVIDIA and will join Amazon's cashierless-store project, Amazon Go, after graduation. This paper is the result of his internship at NVIDIA.
In 2017 and 2018, Zheng Tang led his laboratory's team in the AI City Challenge hosted by NVIDIA. The team won the championship both times, beating nearly 40 teams from around the world, including UC Berkeley, the University of Illinois at Urbana-Champaign, the University of Maryland, College Park, Beijing University of Posts and Telecommunications, and National Taiwan University. The second challenge was held as a workshop at CVPR 2018. Because of the team's outstanding performance, Zheng Tang was invited to intern at NVIDIA, helping to organize the third AI City Challenge (again a CVPR workshop, in 2019) and to prepare its benchmark dataset: the CityFlow dataset introduced in this article.
This year's AI City Challenge has three tracks: cross-camera multi-target vehicle tracking, image-based vehicle re-identification, and traffic anomaly detection. More than 200 teams worldwide (over 700 participants in total) have already registered, four times the combined total of the previous two years. NVIDIA will announce the winning teams and present the prizes (one Quadro GV100, three Titan RTXs, and two Jetson AGX Xaviers) at CVPR in Long Beach, California this year. The challenge is still accepting team registrations and workshop submissions; the deadline is May 10. The paper's other authors include Milind Naphade, CTO of NVIDIA's AI City project; Ming-Yu Liu, a GAN expert at NVIDIA Research; Xiaodong Yang, also of NVIDIA Research (with three CVPR oral papers this year); Stan Birchfield, a principal research scientist at NVIDIA's Redmond office; and Zheng Tang's advisor, Professor Jenq-Neng Hwang.