Exploit Bounding Box Annotations for Multi-Label Object Recognition

doi:10.1109/CVPR.2016.37

Open Access • Proceedings Article • 10.1109/CVPR.2016.37

Hao Yang, +5 more

- 27 Jun 2016

- pp 280-288

Cited by 195

TL;DR: This paper first extracts object proposals from each image, then proposes to make use of ground-truth bounding box annotations (strong labels) to add another level of local information by using nearest-neighbor relationships of local regions to form a multi-view pipeline.
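The nearest-neighbor view can be illustrated with a minimal sketch. It assumes a bank of CNN features extracted from ground-truth bounding boxes (the strong labels), each paired with a class index. For each proposal, it returns the fraction of its k nearest boxes, ranked by cosine similarity, that belong to each class. The names `nn_label_view` and `box_bank`, the use of cosine similarity, and k=3 are illustrative assumptions, not the authors' exact formulation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nn_label_view(proposal_feat, box_bank, num_classes, k=3):
    """Hypothetical second view for one proposal.

    box_bank: list of (feature, class_index) pairs taken from ground-truth boxes.
    Returns, per class, the fraction of the k most similar boxes with that label.
    """
    ranked = sorted(box_bank, key=lambda item: cosine(proposal_feat, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return [votes[c] / k for c in range(num_classes)]


# Usage: two classes, four annotated boxes, one proposal close to class 0.
bank = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
print(nn_label_view([1.0, 0.05], bank, num_classes=2, k=3))
```

Because the view is built from label votes rather than raw features, boxes annotated for some categories can still shape the representation of proposals from other categories, which is one plausible reading of the generalization claim in the abstract.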

Abstract: Convolutional neural networks (CNNs) have shown great performance as general feature representations for object recognition applications. However, for multi-label images that contain multiple objects from different categories, scales and locations, global CNN features are not optimal. In this paper, we incorporate local information to enhance the discriminative power of the features. In particular, we first extract object proposals from each image. With each image treated as a bag and object proposals extracted from it treated as instances, we transform the multi-label recognition problem into a multi-class multi-instance learning problem. Then, in addition to extracting the typical CNN feature representation from each proposal, we propose to make use of ground-truth bounding box annotations (strong labels) to add another level of local information by using nearest-neighbor relationships of local regions to form a multi-view pipeline. The proposed multi-view multi-instance framework utilizes both weak and strong labels effectively, and, more importantly, it has the generalization ability to boost performance even on unseen categories using partial strong labels from other categories. Our framework is extensively compared with state-of-the-art handcrafted-feature-based methods and CNN-based methods on two multi-label benchmark datasets. The experimental results validate the discriminative power and the generalization ability of the proposed framework. With strong labels, our framework achieves state-of-the-art results on both datasets.
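The bag-and-instance formulation can be sketched as follows, assuming each proposal (instance) is described by the concatenation of its CNN view and its nearest-neighbor view, scored by a per-class linear classifier, and that an image (bag) takes, for each class, the maximum score over its proposals. Max-pooling is a common multi-instance aggregation chosen here for illustration; the function names, weights and threshold are hypothetical, and training is omitted.

```python
def concat_views(cnn_view, nn_view):
    """Concatenate the two per-proposal views into one instance descriptor."""
    return list(cnn_view) + list(nn_view)


def instance_scores(instance, weights, biases):
    """Linear per-class scores for one instance (one weight row per class)."""
    return [sum(w * x for w, x in zip(row, instance)) + b for row, b in zip(weights, biases)]


def bag_scores(bag, weights, biases):
    """Image-level score per class: the max over all proposals in the bag."""
    per_instance = [instance_scores(inst, weights, biases) for inst in bag]
    return [max(col) for col in zip(*per_instance)]


def predict_labels(bag, weights, biases, threshold=0.0):
    """Multi-label prediction: every class whose bag score exceeds the threshold."""
    return [c for c, s in enumerate(bag_scores(bag, weights, biases)) if s > threshold]


# Usage: an image with two proposals and three candidate classes.
bag = [concat_views([1.0, 0.0], [1.0, 0.0]), concat_views([0.0, 1.0], [0.0, 0.0])]
W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [-1.0, -1.0, 0.0, 0.0]]
b = [-1.0, -0.5, 0.0]
print(bag_scores(bag, W, b))      # [1.0, 0.5, -1.0]
print(predict_labels(bag, W, b))  # [0, 1]
```

Only image-level (weak) labels are needed to supervise the bag scores, while the strong labels enter through the second view of each instance.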

Citations

•Proceedings Article•10.1109/CVPR.2019.00532

Multi-Label Image Recognition With Graph Convolutional Networks

Zhao-Min Chen, +3 more

- 07 Apr 2019

TL;DR: This work proposes a multi-label classification model based on Graph Convolutional Network (GCN), and proposes a novel re-weighted scheme to create an effective label correlation matrix to guide information propagation among the nodes in GCN.
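A re-weighted label correlation matrix of this kind can be sketched as below, following the scheme as we understand it from the ML-GCN paper: estimate conditional probabilities P(L_j | L_i) from training-set co-occurrence counts, binarize them with a threshold tau, then give each node weight 1 - p on itself and spread p evenly over its retained neighbors. The values of tau and p are illustrative, and the handling of labels with no neighbors is our own assumption.

```python
def correlation_matrix(label_sets, num_classes, tau=0.4, p=0.2):
    """Binarized, re-weighted label correlation matrix from training label sets."""
    # M[i][j]: images containing both i and j; N[i]: images containing i.
    M = [[0] * num_classes for _ in range(num_classes)]
    N = [0] * num_classes
    for labels in label_sets:
        labels = set(labels)
        for i in labels:
            N[i] += 1
            for j in labels:
                if i != j:
                    M[i][j] += 1

    A = [[0.0] * num_classes for _ in range(num_classes)]
    for i in range(num_classes):
        # Keep edge i -> j when P(L_j | L_i) = M[i][j] / N[i] reaches tau.
        row = [1 if N[i] and j != i and M[i][j] / N[i] >= tau else 0
               for j in range(num_classes)]
        total = sum(row)
        for j in range(num_classes):
            if i == j:
                A[i][j] = 1.0 - p          # self-weight
            elif total:
                A[i][j] = p * row[j] / total  # neighbors share weight p
        # Assumption: a label with no retained neighbors keeps only 1 - p on itself.
    return A


print(correlation_matrix([[0, 1], [0, 1], [0], [2]], num_classes=3))
```

The re-weighting keeps a fixed share of each node's own feature during propagation, which is meant to limit over-smoothing across correlated labels.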

Cited by 1K

•Posted Content

Multi-Label Image Recognition with Graph Convolutional Networks

Zhao-Min Chen, +3 more

- 07 Apr 2019

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes a multi-label classification model based on a Graph Convolutional Network (GCN), in which each node (label) is represented by the word embedding of that label, and the GCN is learned to map this label graph into a set of inter-dependent object classifiers.
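The mapping from label embeddings to classifiers can be sketched with a single graph-convolution step H' = A_hat · H · W, where A_hat is a (normalized) label correlation matrix, H stacks one word embedding per label, and W is a learnable projection to the image-feature dimension. Each output row then acts as the classifier for one label, scored against an image feature by a dot product. This is a simplified illustration under those assumptions: it uses one layer, omits the nonlinearity, and the matrices are toy values rather than learned parameters.

```python
def matmul(X, Y):
    """Plain-list matrix product X @ Y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def gcn_layer(A_hat, H, W):
    """One graph-convolution step H' = A_hat @ H @ W (nonlinearity omitted)."""
    return matmul(matmul(A_hat, H), W)


def label_scores(classifiers, image_feat):
    """Score each label as the dot product of its classifier row with the image feature."""
    return [sum(w * x for w, x in zip(row, image_feat)) for row in classifiers]


# Usage: two labels with 2-d embeddings mapped to 3-d classifiers.
A_hat = [[0.8, 0.2], [0.2, 0.8]]
H = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
classifiers = gcn_layer(A_hat, H, W)
print(label_scores(classifiers, [1.0, 0.0, 5.0]))
```

Since each classifier mixes in its neighbors' embeddings through A_hat, correlated labels end up with related classifiers, which is the sense in which they are inter-dependent.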

Cited by 763

•Proceedings Article•10.1109/ICCV.2017.58

Multi-label Image Recognition by Recurrently Discovering Attentional Regions

Zhouxia Wang, +4 more

- 01 Oct 2017

TL;DR: In this paper, a spatial transformer layer is proposed to locate attentional regions from the convolutional feature maps in a region-proposal-free way, and an LSTM (Long Short-Term Memory) sub-network is used to sequentially predict semantic labeling scores on the located regions.

Cited by 417

•Journal Article•10.1109/TIP.2017.2689998

Deep Label Distribution Learning With Label Ambiguity

Bin-Bin Gao, +4 more

- 01 Jun 2017

- IEEE Transactions on Image Processing

TL;DR: The proposed deep label distribution learning (DLDL) method effectively utilizes label ambiguity in both feature learning and classifier learning, which helps prevent the network from overfitting even when the training set is small.
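The idea of exploiting label ambiguity can be sketched by replacing a single ground-truth value with a distribution over neighboring label values and training the network's softmax output to match it. The sketch assumes a discretized Gaussian centered on the true label (for example, an age) and a KL-divergence loss, which is the setup we understand DLDL to use; the bin values, sigma, and function names are illustrative.

```python
import math


def gaussian_label_distribution(y, bins, sigma=1.0):
    """Discretized Gaussian over label bins, centered on the true label y."""
    weights = [math.exp(-((b - y) ** 2) / (2 * sigma ** 2)) for b in bins]
    z = sum(weights)
    return [w / z for w in weights]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), the distribution-matching training loss."""
    return sum(t * math.log((t + eps) / (q + eps))
               for t, q in zip(target, predicted) if t > 0)


# Usage: true label 2 on bins 0..4; compare against an untrained (uniform) output.
target = gaussian_label_distribution(2, [0, 1, 2, 3, 4], sigma=1.0)
print(kl_divergence(target, softmax([0.0] * 5)))
```

Neighboring bins receive non-zero target mass, so a prediction that is close but not exact is penalized less than one that is far off, which is how the ambiguity acts as a regularizer.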

Cited by 408

•Proceedings Article•10.1109/ICCV.2019.00061

Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition

Tianshui Chen, +4 more

- 01 Oct 2019

TL;DR: Semantic-Specific Graph Representation Learning (SSGRL) consists of a semantic decoupling module, which incorporates category semantics to guide the learning of semantic-specific representations, and a semantic interaction module, which correlates these representations with a graph built on statistical label co-occurrence.

Cited by 402

References

•Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Cited by 102.6K

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: The authors train a deep convolutional neural network that achieves state-of-the-art image classification performance on ImageNet; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

Cited by 88.4K

•Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

Cited by 51.9K

•Proceedings Article•10.1109/CVPR.2014.81

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, +3 more

- 23 Jun 2014

TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects; moreover, when labeled training data is scarce, supervised pre-training for an auxiliary task followed by domain-specific fine-tuning yields a significant performance boost.

Cited by 33.7K

•Journal Article•10.1007/S11263-009-0275-4

The Pascal Visual Object Classes (VOC) Challenge

Mark Everingham, +4 more

- 01 Jun 2010

- International Journal of Computer Vision

TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, together with an analysis of whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.

Cited by 21.3K

Related Papers (5)

Deep Residual Learning for Image Recognition

The Pascal Visual Object Classes (VOC) Challenge

CNN-RNN: A Unified Framework for Multi-label Image Classification

Microsoft COCO: Common Objects in Context

ImageNet: A large-scale hierarchical image database