TL;DR: CityFlow is a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km.
Abstract: Urban traffic optimization using traffic cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. To the best of our knowledge, CityFlow is the largest-scale dataset in terms of spatial coverage and the number of cameras/videos in an urban environment. The dataset contains more than 200K annotated bounding boxes covering a wide range of scenes, viewing angles, vehicle models, and urban traffic flow conditions. Camera geometry and calibration information are provided to aid spatio-temporal analysis. In addition, a subset of the benchmark is made available for the task of image-based vehicle re-identification (ReID). We conducted an extensive experimental evaluation of baselines/state-of-the-art approaches in MTMC tracking, multi-target single-camera (MTSC) tracking, object detection, and image-based ReID on this dataset, analyzing the impact of different network architectures, loss functions, spatio-temporal models and their combinations on task effectiveness. An evaluation server is launched with the release of our benchmark at the 2019 AI City Challenge (https://www.aicitychallenge.org/) that allows researchers to compare the performance of their newest techniques. We expect this dataset to catalyze research in this field, propel the state-of-the-art forward, and lead to deployed traffic optimization(s) in the real world.
TL;DR: This paper extends the feature-tracking algorithm described in [1] to intersections; the extended algorithm can accommodate the problem caused by the disruption of feature tracks.
Abstract: Intelligent Transportation Systems need methods to automatically monitor the road traffic, and especially track vehicles. Most research has concentrated on highways. Traffic in intersections is more variable, with multiple entrance and exit regions. This paper describes an extension to intersections of the feature-tracking algorithm described in [1]. Vehicle features are rarely tracked from their entrance in the field of view to their exit. Our algorithm can accommodate the problem caused by the disruption of feature tracks. It is evaluated on video sequences recorded on four different intersections.
TL;DR: This paper introduces the "MIOvision Traffic Camera Dataset" (MIO-TCD), the largest dataset for motorized traffic analysis to date, and demonstrates the viability of deep learning methods for vehicle localization and classification from a single video frame in real-life traffic scenarios.
Abstract: The ability to train on a large dataset of labeled samples is critical to the success of deep learning in many domains. In this paper, we focus on motor vehicle classification and localization from a single video frame and introduce the "MIOvision Traffic Camera Dataset" (MIO-TCD) in this context. MIO-TCD is the largest dataset for motorized traffic analysis to date. It includes 11 traffic object classes such as cars, trucks, buses, motorcycles, bicycles, and pedestrians. It contains 786,702 annotated images acquired at different times of the day and different periods of the year by hundreds of traffic surveillance cameras deployed across Canada and the United States. The dataset consists of two parts: a "localization dataset", containing 137,743 full video frames with bounding boxes around traffic objects, and a "classification dataset", containing 648,959 crops of traffic objects from the 11 classes. We also report results from the 2017 CVPR MIO-TCD Challenge, which leveraged this dataset, and compare them with results for state-of-the-art deep learning architectures. These results demonstrate the viability of deep learning methods for vehicle localization and classification from a single video frame in real-life traffic scenarios. The top-performing methods achieve both accuracy and Kappa score above 96% on the classification dataset and mean-average precision of 77% on the localization dataset. We also identify scenarios in which state-of-the-art methods still fail and we suggest avenues to address these challenges. Both the dataset and detailed results are publicly available on-line [1].
TL;DR: The second edition of the NVIDIA AI City Challenge gave more than 70 academic and industrial research teams a forum to compete and solve real-world problems using traffic camera video data.
Abstract: The NVIDIA AI City Challenge has been created to accelerate intelligent video analysis that helps make cities smarter and safer. With millions of traffic video cameras acting as sensors around the world, there is a significant opportunity for real-time and batch analysis of these videos to provide actionable insights. These insights will benefit a wide variety of agencies, from traffic control to public safety. The second edition of the NVIDIA AI City Challenge, being organized as a CVPR workshop, provided a forum to more than 70 academic and industrial research teams to compete and solve real-world problems using traffic camera video data. The Challenge was launched with three tracks — speed estimation, anomaly detection, and vehicle re-identification. Each track was chosen in consultation with traffic and public safety officials based on the value of potential solutions. With the largest available dataset for such tasks, and ground truth for each track, the Challenge enabled 22 teams to evaluate their solutions. Given how complex these tasks are, the results are encouraging and reflect increased value addition year over year for the Challenge.