Book Chapter, DOI: 10.1007/978-3-031-20077-9_13
End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution
Mingxiang Liao, Fang Wang, Yuan-Gen Yao, Zhenjun Han, Jialing Zou, Yuze Wang, Bailan Feng, Peng Yuan, Qixiang Ye
- 01 Jan 2022
pp 210-226
TL;DR: The authors propose a sparse proposal evolution (SPE) approach, which advances weakly supervised object detection (WSOD) from the two-stage pipeline with dense proposals to an end-to-end framework with sparse proposals.
Abstract: Conventional methods for weakly supervised object detection (WSOD) typically enumerate dense proposals and select the discriminative proposals as objects. However, these two-stage "enumerate-and-select" methods suffer from object feature ambiguity brought by dense proposals and from low detection efficiency caused by the proposal enumeration procedure. In this study, we propose a sparse proposal evolution (SPE) approach, which advances WSOD from the two-stage pipeline with dense proposals to an end-to-end framework with sparse proposals. SPE is built upon a visual transformer equipped with a seed proposal generation (SPG) branch and a sparse proposal refinement (SPR) branch. SPG generates high-quality seed proposals by taking advantage of the cascaded self-attention mechanism of the visual transformer, and SPR trains the detector to predict sparse proposals that are supervised by the seed proposals in a one-to-one matching fashion. SPG and SPR are performed iteratively, so that seed proposals update into accurate supervision signals and sparse proposals evolve into precise object regions. Experiments on the VOC and COCO object detection datasets show that SPE outperforms the state-of-the-art end-to-end methods by 7.0% mAP and 8.1% AP50. It is an order of magnitude faster than the two-stage methods, setting the first solid baseline for end-to-end WSOD with sparse proposals. The code is available at https://github.com/MingXiangL/SPE .
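The abstract describes one SPG/SPR round: seed boxes are derived from the transformer's self-attention, and the detector's sparse predictions are matched to those seeds one-to-one and supervised by them. The sketch below illustrates such a round under simplifying assumptions that are not taken from the paper: a single 2D attention map per object, a seed box drawn around every cell whose attention is at least a hypothetical fraction `ratio` of the peak, a matching cost of 1 - IoU, and a plain L1 box loss on matched pairs. All function names are hypothetical, and the paper's actual seed generation, matching cost, and losses may differ.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) in attention-map cell units.
Box = Tuple[float, float, float, float]


def seed_box_from_attention(attn: Sequence[Sequence[float]], ratio: float = 0.5) -> Box:
    """Hypothetical SPG-like step: box around all cells with attention >= ratio * peak."""
    peak = max(max(row) for row in attn)
    thresh = ratio * peak
    xs = [x for row in attn for x, v in enumerate(row) if v >= thresh]
    ys = [y for y, row in enumerate(attn) for v in row if v >= thresh]
    # +1 so the box covers the whole last cell rather than only its top-left corner.
    return (float(min(xs)), float(min(ys)), float(max(xs) + 1), float(max(ys) + 1))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_one_match(preds: List[Box], seeds: List[Box]) -> List[Tuple[int, int]]:
    """Assign each seed to a distinct prediction, minimizing total (1 - IoU) cost.

    Returns (pred_index, seed_index) pairs ordered by seed index. Exhaustive search
    over assignments: exact, but only practical for a handful of boxes.
    """
    if len(seeds) > len(preds):
        raise ValueError("need at least as many predictions as seeds")
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(seeds)):
        cost = sum(1.0 - iou(preds[p], seeds[s]) for s, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(p, s) for s, p in enumerate(best_perm)]


def matched_l1_loss(preds: List[Box], seeds: List[Box], pairs: List[Tuple[int, int]]) -> float:
    """Hypothetical SPR box term: mean L1 distance between matched prediction/seed pairs."""
    total = sum(abs(a - b) for p, s in pairs for a, b in zip(preds[p], seeds[s]))
    return total / max(len(pairs), 1)


if __name__ == "__main__":
    attn = [
        [0.0, 0.1, 0.0, 0.0],
        [0.1, 0.9, 0.8, 0.0],
        [0.0, 0.7, 0.6, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    seeds = [seed_box_from_attention(attn), (2.0, 0.0, 4.0, 2.0)]
    preds = [(2.0, 0.0, 4.0, 2.0), (1.0, 1.0, 3.0, 3.0), (0.0, 2.0, 1.0, 4.0)]
    pairs = one_to_one_match(preds, seeds)
    print(pairs)                                 # [(1, 0), (0, 1)]; prediction 2 stays unmatched
    print(matched_l1_loss(preds, seeds, pairs))  # 0.0
```

The exhaustive matcher is used only to keep the sketch dependency-free. DETR-style detectors usually solve the same one-to-one assignment with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`), and unmatched predictions are typically treated as background.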
Citations
Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection
Yufei Yin, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li
- 01 Oct 2023
TL;DR: Cyclic-Bootstrap Labeling (CBL) optimizes the multiple instance detection network (MIDN) with rank information from a reliable teacher network, improving the quality of pseudo-labels and enhancing object detection performance.
Weakly Supervised Open-Vocabulary Object Detection
Jianghang Lin, Yunhang Shen, Bingquan Wang, Shaohui Lin, Huanlai Xing, Liujuan Cao
TL;DR: WSOVOD extends traditional weakly supervised object detection to open-vocabulary and cross-dataset learning, achieving state-of-the-art performance.
Misclassification in Weakly Supervised Object Detection
Yonghua Xu, Jian Yang, Xuelong Li
TL;DR: Misclassification in weakly supervised object detection (WSOD) arises when some proposals exhibit semantic similarities with objects from other categories due to viewing perspective and background interference. The proposed MCC and MCT methods alleviate this problem by summarizing misclassification cases and decreasing the loss weights of misclassified classes.
Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes
TL;DR: A Spatial Self-Distillation based Object Detector (SSD-Det) is proposed that mines spatial information to refine inaccurate boxes in a self-distillation fashion, achieving state-of-the-art performance.
Proposal Feature Learning Using Proposal Relations for Weakly Supervised Object Detection
Zhaofei Wang, Weijia Zhang, Min-Ling Zhang
- 15 Jul 2024
TL;DR: This work proposes PFL-WSOD, which improves weakly supervised object detection with two approaches that capture intra-proposal and inter-proposal relations through self-attention and salient region banks, respectively, enhancing proposal representation and detection accuracy.