Detecting 11K Classes: Large Scale Object Detection Without Fine-Grained Bounding Boxes
TL;DR: This paper proposes a semi-supervised, large-scale, fine-grained detection method that needs bounding-box annotations only for a small number of coarse-grained classes, plus image-level labels for a large number of fine-grained classes, and can detect all classes at nearly fully supervised accuracy.
Abstract: Recent advances in deep learning have greatly boosted the performance of object detection. State-of-the-art methods such as Faster R-CNN, FPN and R-FCN have achieved high accuracy on challenging benchmark datasets. However, these methods require fully annotated object bounding boxes for training, which are incredibly hard to scale up because of the high annotation cost. Weakly supervised methods, on the other hand, require only image-level labels for training, but their performance is far below that of their fully supervised counterparts. In this paper, we propose a semi-supervised, large-scale, fine-grained detection method that needs bounding-box annotations only for a small number of coarse-grained classes, plus image-level labels for a large number of fine-grained classes, and can detect all classes at nearly fully supervised accuracy. We achieve this by exploiting the correlations between coarse-grained and fine-grained classes through a shared backbone, soft-attention-based proposal re-ranking, and a dual-level memory module. Experimental results on two large-scale datasets, ImageNet and OpenImages, show that our method achieves object detection accuracy close to that of state-of-the-art fully supervised methods while only a small fraction of the classes are fully annotated.
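The abstract credits part of the gain to correlations between coarse-grained and fine-grained classes. As a rough illustration only, and not the authors' method, the sketch below shows one simple way such a correlation could be exploited: a detector trained with boxes for coarse classes emits (coarse label, score) pairs, and each score is split across that class's fine-grained children in proportion to image-level fine-grained probabilities. The class hierarchy, the function name `fine_grained_scores`, and the proportional weighting rule are all hypothetical; the sketch does not model the shared backbone, the soft-attention re-ranking, or the dual-level memory module.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level hierarchy: fine-grained class -> coarse-grained parent.
HIERARCHY: Dict[str, str] = {
    "tabby_cat": "cat",
    "siamese_cat": "cat",
    "beagle": "dog",
    "poodle": "dog",
}


def fine_grained_scores(
    detections: List[Tuple[str, float]],
    image_probs: Dict[str, float],
) -> List[Dict[str, float]]:
    """Hypothetical rule: split each coarse detection score over its fine-grained
    children, weighted by image-level fine-grained probabilities."""
    results = []
    for coarse_label, box_score in detections:
        children = [f for f, c in HIERARCHY.items() if c == coarse_label]
        total = sum(image_probs.get(f, 0.0) for f in children)
        scores = {}
        for f in children:
            # Normalize image-level probabilities among siblings; 0 if none is present.
            weight = image_probs.get(f, 0.0) / total if total > 0 else 0.0
            scores[f] = box_score * weight
        results.append(scores)
    return results


if __name__ == "__main__":
    dets = [("cat", 0.9), ("dog", 0.5)]
    probs = {"tabby_cat": 0.6, "siamese_cat": 0.2}
    print(fine_grained_scores(dets, probs))
```

Here a "cat" box with score 0.9 in an image labeled mostly as tabby becomes a stronger tabby_cat detection than siamese_cat detection, while the "dog" box receives no fine-grained score because no dog breed appears in the image-level labels.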
Citations
Detecting Twenty-Thousand Classes Using Image-Level Supervision
TL;DR: Detic trains the classifiers of a detector on image classification data, expanding the detector's vocabulary to tens of thousands of concepts; the approach is simple to implement and compatible with a range of detection architectures and backbones.
Improving Object Detection with Selective Self-supervised Self-training
Yandong Li, Di Huang, Danfeng Qin, Liqiang Wang, Boqing Gong
23 Aug 2020
TL;DR: A selective net is proposed to rectify the supervision signals in web images; it not only identifies positive bounding boxes but also creates a safe zone for mining hard negative boxes.
89
Grounded Situation Recognition
Sarah M Pratt, Mark Yatskar, Luca Weihs, Ali Farhadi, Aniruddha Kembhavi
23 Aug 2020
TL;DR: In this article, the authors introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles, and bounding-box groundings of entities.
73
Learning Open-World Object Proposals Without Learning to Classify
01 Apr 2022
TL;DR: Object Localization Network (OLN) estimates the objectness of each region purely by how well its location and shape overlap with any ground-truth object (e.g., centerness and IoU).
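The OLN summary scores regions by localization quality rather than by class. A minimal sketch, assuming axis-aligned (x1, y1, x2, y2) boxes and an FCOS-style centerness definition, computes both cues for a proposal against one ground-truth box and combines them with a geometric mean. The combination rule and all function names here are assumptions for illustration; a real detector would learn to predict these quantities rather than compute them from ground truth at inference time.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def centerness(px: float, py: float, gt: Box) -> float:
    """FCOS-style centerness of point (px, py) w.r.t. a ground-truth box; 0 outside it."""
    l, t = px - gt[0], py - gt[1]
    r, b = gt[2] - px, gt[3] - py
    if min(l, t, r, b) <= 0:
        return 0.0
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def objectness(proposal: Box, gt: Box) -> float:
    """Class-agnostic score: geometric mean of centerness and IoU (assumed rule)."""
    cx = (proposal[0] + proposal[2]) / 2
    cy = (proposal[1] + proposal[3]) / 2
    return math.sqrt(centerness(cx, cy, gt) * iou(proposal, gt))


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    for p in [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 5.0, 10.0), (20.0, 20.0, 30.0, 30.0)]:
        print(p, round(objectness(p, gt), 3))
```

Neither cue depends on a class label, which is why a score of this kind can rank regions containing objects from categories never seen during training.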
61
Object Detection with a Unified Label Space from Multiple Datasets
Xiangyun Zhao, Samuel Schulter, Gaurav Sharma, Yi-Hsuan Tsai, Manmohan Chandraker, Ying Wu
23 Aug 2020
TL;DR: Zhao et al. propose loss functions that carefully integrate partial but correct annotations with complementary but noisy pseudo labels to train a single object detector that predicts over the union of all the label spaces.
50
References
ImageNet: A large-scale hierarchical image database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges the RPN and Fast R-CNN into a single network by sharing their convolutional features.
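The RPN produces proposals by regressing from a fixed set of reference boxes (anchors) placed at every position of the shared feature map. A minimal sketch, assuming the commonly cited configuration of three scales (128, 256, 512 pixels) and three aspect ratios (0.5, 1, 2), i.e. nine anchors per location. The function name `make_anchors` and the cell-center convention are illustrative, not taken from the original implementation.

```python
import math
from typing import List, Sequence, Tuple


def make_anchors(
    feat_h: int,
    feat_w: int,
    stride: int,
    scales: Sequence[float] = (128, 256, 512),
    ratios: Sequence[float] = (0.5, 1.0, 2.0),
) -> List[Tuple[float, float, float, float]]:
    """Return (x1, y1, x2, y2) anchors in image pixels for a feat_h x feat_w map."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Anchor center: middle of the image cell covered by this feature position.
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area at s*s while making height / width equal to r.
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    a = make_anchors(feat_h=2, feat_w=3, stride=16)
    print(len(a), a[1])
```

Because every anchor is tied to a feature-map position, the RPN can score and refine all of them with one small convolutional head over the shared features, which is what makes the proposals nearly free.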
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
06 Sep 2014
TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the broader context of scene understanding, gathering images of complex everyday scenes that contain common objects in their natural context.
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santosh K. Divvala, Ross Girshick, Ali Farhadi
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
SSD: Single Shot MultiBox Detector
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg
08 Oct 2016
TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
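The SSD summary describes default boxes tiled over several feature maps at different scales and aspect ratios. A minimal sketch, assuming the scale schedule from the SSD paper (scales spaced linearly between s_min = 0.2 and s_max = 0.9 across m feature maps) and the aspect-ratio set {1, 2, 3, 1/2, 1/3} plus one extra square box at scale sqrt(s_k * s_{k+1}). The function names are hypothetical, and released SSD models use fewer aspect ratios on some layers.

```python
import math
from typing import List, Sequence, Tuple


def ssd_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1) for k = 1..m (requires m >= 2)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_boxes(
    f_k: int,
    s_k: float,
    s_next: float,
    ratios: Sequence[float] = (1.0, 2.0, 0.5, 3.0, 1.0 / 3.0),
) -> List[Tuple[float, float, float, float]]:
    """Default boxes (cx, cy, w, h), normalized to [0, 1], for one f_k x f_k feature map."""
    boxes = []
    for i in range(f_k):
        for j in range(f_k):
            # Box centers sit at the middle of each feature-map cell.
            cx, cy = (j + 0.5) / f_k, (i + 0.5) / f_k
            for a in ratios:
                boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
            # Extra aspect-ratio-1 box at the intermediate scale sqrt(s_k * s_{k+1}).
            s_prime = math.sqrt(s_k * s_next)
            boxes.append((cx, cy, s_prime, s_prime))
    return boxes


if __name__ == "__main__":
    scales = ssd_scales(6)
    print([round(s, 2) for s in scales])
    print(len(default_boxes(3, scales[0], scales[1])))
```

With six boxes per location, a 3 x 3 map yields 54 default boxes; the detector predicts class scores and box offsets for each of them in a single forward pass.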