Journal Article, DOI: 10.1109/iccv51070.2023.01816
DiffusionDet: Diffusion Model for Object Detection
Shoufa Chen,Peize Sun,Yibing Song,Ping Luo +3 more
- 01 Oct 2023
176
TL;DR: DiffusionDet is a novel object detection framework based on a diffusion process, achieving competitive performance with flexibility in the number of boxes and iterations.
Abstract: We propose DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes. During the training stage, object boxes diffuse from ground-truth boxes to a random distribution, and the model learns to reverse this noising process. In inference, the model progressively refines a set of randomly generated boxes into the output results. Our work possesses an appealing property of flexibility, which enables a dynamic number of boxes and iterative evaluation. Extensive experiments on standard benchmarks show that DiffusionDet achieves favorable performance compared to previous well-established detectors. For example, DiffusionDet achieves 5.3 AP and 4.8 AP gains when evaluated with more boxes and iteration steps, under a zero-shot transfer setting from COCO to CrowdHuman. Our code is available at https://github.com/ShoufaChen/DiffusionDet.
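The noising and refinement described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the cosine noise schedule, the DDIM-style deterministic update, the box encoding as four plain numbers, and the names `cosine_alpha_bar`, `diffuse_boxes`, `refine_boxes`, and `denoise_fn` are all assumptions made for illustration. In a real detector, `denoise_fn` would be a trained network conditioned on image features.

```python
import numpy as np

def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level at step t for a cosine schedule (assumed choice)."""
    f = lambda u: np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)

def diffuse_boxes(gt_boxes, t, T, rng):
    """Training-side noising: blend ground-truth boxes with Gaussian noise."""
    a = cosine_alpha_bar(t, T)
    noise = rng.standard_normal(gt_boxes.shape)
    return np.sqrt(a) * gt_boxes + np.sqrt(1.0 - a) * noise

def refine_boxes(denoise_fn, num_boxes, steps, T, rng):
    """Inference-side refinement: start from random boxes, denoise step by step.

    denoise_fn(boxes, t) is a hypothetical stand-in for the trained model;
    it returns an estimate of the clean boxes given noisy boxes at step t.
    """
    boxes = rng.standard_normal((num_boxes, 4))  # any number of random boxes
    times = np.linspace(T, 0, steps + 1)         # any number of refinement steps
    for t_now, t_next in zip(times[:-1], times[1:]):
        x0_hat = denoise_fn(boxes, t_now)
        a_now = cosine_alpha_bar(t_now, T)
        a_next = cosine_alpha_bar(t_next, T)
        # Recover the implied noise, then re-noise to the next (lower) level.
        eps_hat = (boxes - np.sqrt(a_now) * x0_hat) / np.sqrt(max(1.0 - a_now, 1e-8))
        boxes = np.sqrt(a_next) * x0_hat + np.sqrt(1.0 - a_next) * eps_hat
    return boxes
```

Because `num_boxes` and `steps` are ordinary arguments to `refine_boxes`, both can be changed at evaluation time without retraining, which is the flexibility the abstract refers to.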
Citations
Diffusion models in medical imaging: A comprehensive survey.
A Kazerouni,Ehsan Khodapanah Aghdam,Moein Heidari,Reza Azad,Mohsen Fayyaz,Ilker Hacihaliloglu,Dorit Merhof +6 more
TL;DR: This article gives a comprehensive overview of diffusion models in medical imaging and provides a taxonomy based on application, imaging modality, organ of interest, and algorithm.
196
A Survey on Generative Diffusion Models
Hanqun Cao,Cheng Tan,Zhan Gao,Yilun Xu,Guangyong Chen,Pheng-Ann Heng,Stan Z. Li +6 more
TL;DR: This survey covers diffusion models from three angles: the fundamental formulation of diffusion, algorithmic enhancements, and the manifold applications of diffusion.
145
Dense Distinct Query for End-to-End Object Detection
Shilong Zhang,Xinjiang Wang,Jiaqi Wang,Jiangmiao Pang,Chengqi Lyu,Wenwei Zhang,Ping Luo,Kai Chen +7 more
- 01 Jun 2023
TL;DR: Dense Distinct Query (DDQ) significantly improves object detection performance by combining the advantages of traditional and recent end-to-end detectors.
75
Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao,Yongming Rao,Zuyan Liu,Benlin Liu,Jie Zhou,Jiwen Lu +5 more
- 01 Oct 2023
TL;DR: VPD framework utilizes pre-trained text-to-image diffusion models for visual perception tasks, leveraging their high-level knowledge and achieving state-of-the-art performance.
63
Dif-Fusion: Toward High Color Fidelity in Infrared and Visible Image Fusion With Diffusion Models
Jun Yue,Leyuan Fang,Shaobo Xia,Yue Deng,Jiayi Ma +4 more
TL;DR: Dif-Fusion achieves high color fidelity in infrared and visible image fusion by generating the distribution of multi-channel input data with diffusion models.
50
References
Deep Residual Learning for Image Recognition
Kaiming He,Xiangyu Zhang,Shaoqing Ren,Jian Sun +3 more
- 27 Jun 2016
TL;DR: This work proposes a residual learning framework that eases the training of networks substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.
ImageNet: A large-scale hierarchical image database
Jia Deng,Wei Dong,Richard Socher,Li-Jia Li,Kai Li,Li Fei-Fei +5 more
- 20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon,Santosh K. Divvala,Ross Girshick,Ali Farhadi +3 more
- 27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
SSD: Single Shot MultiBox Detector
Wei Liu,Dragomir Anguelov,Dumitru Erhan,Christian Szegedy,Scott Reed,Cheng-Yang Fu,Alexander C. Berg +6 more
- 08 Oct 2016
TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.