Journal Article, DOI: 10.1109/mits.2023.3298534
Collaborative Perception in Autonomous Driving: Methods, Datasets, and Challenges
Yushan Han, Hui Zhang, Huifang Li, Yi Jin, Congyan Lang, Yidong Li
48
TL;DR: Collaborative perception is crucial for addressing occlusion and sensor failure in autonomous driving. Research on collaborative perception has grown rapidly, but few reviews have focused on systematic collaboration modules and large-scale datasets. This article reviews recent achievements to bridge this gap and motivate future research.
Abstract: Collaborative perception is essential to address occlusion and sensor failure issues in autonomous driving. In recent years, theoretical and experimental investigations of collaborative perception have increased tremendously. So far, however, few reviews have focused on systematic collaboration modules and large-scale collaborative perception datasets. This article reviews recent achievements in this field to bridge this gap and motivate future research. We start with a brief overview of collaboration schemes. After that, we systematically summarize the collaborative perception methods for ideal scenarios and real-world issues. The former focuses on collaboration modules and efficiency, and the latter is devoted to addressing the problems that arise in actual applications. Furthermore, we present large-scale public datasets and summarize quantitative results on these benchmarks. Finally, we highlight gaps and overlooked challenges between current academic research and real-world applications.
Citations
V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception
Runsheng Xu, Xinhui Xia, Jinlong Li, Hanzhao Li, Shuo Zhang, Zhengzhong Tu, Zonglin Meng, Hao Xiang, Xiaoyu Dong, Rui Song, Hongkai Yu, Bolei Zhou, Jiaqi Ma
- 01 Jun 2023
TL;DR: V2V4Real is the first large-scale real-world multi-modal dataset for V2V perception, containing LiDAR, RGB, 3D bounding boxes, and HDMaps. It facilitates the development of cooperative perception systems and provides benchmarks for recent algorithms.
81
Toward Ensuring Safety for Autonomous Driving Perception: Standardization Progress, Research Advances, and Perspectives
Chen Sun, Ruihe Zhang, Yukun Lu, Yaodong Cui, Zejian Deng, Dongpu Cao, Amir Khajepour
TL;DR: The survey explores safety-related advancements in autonomous driving perception systems, covering standards, sensory modeling, metrics, and potential failures. It highlights the challenges and future directions in the field.
12
Artificial intelligence based object detection and traffic prediction by autonomous vehicles – A review
Preeti Sharma, Chhavi Rana
7
QUEST: Query Stream for Practical Cooperative Perception
Siqi Fan, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie
- 13 May 2024
TL;DR: This paper proposes QUEST, a cooperative perception framework that enables interpretable, flexible, instance-level feature interaction via a query stream flowing among agents, and demonstrates its effectiveness for camera-based vehicle-infrastructure perception on the real-world dataset DAIR-V2X-Seq.
Occlusion-Aware Planning for Autonomous Driving With Vehicle-to-Everything Communication
TL;DR: Occlusion-aware planning for autonomous driving with V2X communication enhances driving behaviors by leveraging perception data from onboard sensors and V2X communications independently, generating phantom road users in occluded areas, and integrating real and phantom road users into a POMDP planner to provide safe driving policies.
5
References
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger, Philipp Fischer, Thomas Brox
- 05 Oct 2015
TL;DR: The authors propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
- 01 Jan 2017
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
51.8K
•Posted Content
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
The Pascal Visual Object Classes (VOC) Challenge
TL;DR: Reviews the state of the art in evaluated methods for both classification and detection, examining whether the methods are statistically different, what they are learning from the images, and what they find easy or confuse.
•Posted Content
Squeeze-and-Excitation Networks
TL;DR: The Squeeze-and-Excitation (SE) block adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels. SE blocks can be stacked together to form SENet architectures.
18.9K