Exploit Bounding Box Annotations for Multi-Label Object Recognition
Hao Yang, Joey Tianyi Zhou, Yu Zhang, Bin-Bin Gao, Jianxin Wu, Jianfei Cai
- 27 Jun 2016
- pp 280-288
TL;DR: This paper extracts object proposals from each image and, in addition to a CNN feature for each proposal, uses ground-truth bounding box annotations (strong labels) to add a second level of local information based on nearest-neighbor relationships of local regions, forming a multi-view pipeline.
Abstract: Convolutional neural networks (CNNs) have shown great performance as general feature representations for object recognition applications. However, for multi-label images that contain multiple objects from different categories, scales and locations, global CNN features are not optimal. In this paper, we incorporate local information to enhance the discriminative power of the features. In particular, we first extract object proposals from each image. With each image treated as a bag and the object proposals extracted from it treated as instances, we transform the multi-label recognition problem into a multi-class multi-instance learning problem. Then, in addition to extracting the typical CNN feature representation from each proposal, we propose to make use of ground-truth bounding box annotations (strong labels) to add another level of local information by using nearest-neighbor relationships of local regions to form a multi-view pipeline. The proposed multi-view multi-instance framework utilizes both weak and strong labels effectively and, more importantly, it generalizes well enough to boost the performance on unseen categories using partial strong labels from other categories. Our framework is extensively compared with state-of-the-art handcrafted-feature-based methods and CNN-based methods on two multi-label benchmark datasets. The experimental results validate the discriminative power and the generalization ability of the proposed framework. With strong labels, our framework achieves state-of-the-art results on both datasets.
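The bag-and-instance formulation and the nearest-neighbor "view" described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (nn_label_view, bag_scores), the Euclidean distance, the label-histogram encoding, the max-pooling aggregation, and the toy 2-D features are all assumptions chosen for illustration. In practice the features would come from a CNN applied to each proposal and to each ground-truth box.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_label_view(proposal, gt_pool, num_classes, k=3):
    """Hypothetical second view of one proposal (an instance).

    gt_pool is a list of (feature, class_index) pairs taken from
    ground-truth boxes (the strong labels). The view is the normalized
    class histogram of the proposal's k nearest ground-truth features.
    """
    nearest = sorted(gt_pool, key=lambda item: euclidean(proposal, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return [votes.get(c, 0) / len(nearest) for c in range(num_classes)]


def bag_scores(instance_scores):
    """Aggregate per-instance class scores into image (bag) scores.

    Max pooling over instances is an assumed aggregation rule: a class is
    considered present in the image if any proposal scores high for it.
    """
    return [max(column) for column in zip(*instance_scores)]


# Toy pool of ground-truth box features: two boxes per class.
gt_pool = [
    ([0.0, 0.0], 0),
    ([0.1, 0.0], 0),
    ([1.0, 1.0], 1),
    ([0.9, 1.1], 1),
]

# One image (bag) with two object proposals (instances).
proposals = [[0.05, 0.0], [1.0, 1.05]]

views = [nn_label_view(p, gt_pool, num_classes=2, k=2) for p in proposals]
image_scores = bag_scores(views)
print(views)
print(image_scores)
```

In this toy case each proposal lies next to the ground-truth boxes of a different class, so the pooled image-level scores mark both classes as present. That is the multi-label outcome the bag formulation is meant to capture.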
Citations
Multi-Label Image Recognition With Graph Convolutional Networks
Zhao-Min Chen, Xiu-Shen Wei, Peng Wang, Yanwen Guo
- 07 Apr 2019
TL;DR: This work proposes a multi-label classification model based on a Graph Convolutional Network (GCN) in which each node (label) is represented by a word embedding of the label, and the GCN is learned to map this label graph into a set of inter-dependent object classifiers. It also proposes a novel re-weighting scheme to create an effective label correlation matrix that guides information propagation among the nodes of the GCN.
Multi-label Image Recognition by Recurrently Discovering Attentional Regions
Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, Liang Lin
- 01 Oct 2017
TL;DR: This paper proposes a spatial transformer layer that locates attentional regions from the convolutional feature maps in a region-proposal-free way, and an LSTM (Long Short-Term Memory) sub-network that sequentially predicts semantic labeling scores on the located regions.
Deep Label Distribution Learning With Label Ambiguity
TL;DR: The proposed deep label distribution learning (DLDL) method effectively utilizes the label ambiguity in both feature learning and classifier learning, which helps prevent the network from overfitting even when the training set is small.
Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition
Tianshui Chen, Muxin Xu, Xiaolu Hui, Hefeng Wu, Liang Lin
- 01 Oct 2019
TL;DR: The Semantic-Specific Graph Representation Learning (SSGRL) framework consists of a semantic decoupling module that incorporates category semantics to guide the learning of semantic-specific representations, and a semantic interaction module that correlates these representations with a graph built on statistical label co-occurrence.