Journal Article, DOI: 10.1109/cvpr.2015.7299125
Beyond spatial pooling: Fine-grained representation learning in multiple domains
Chi Li,A. Reiter,Gregory D. Hager +2 more
- 01 Jun 2015
pp 4913-4922
5
TL;DR: This paper formulates a probabilistic framework for analyzing the performance of pooling, applies multiple scales of filters coupled with different pooling granularities, and uses color as an additional pooling domain, thereby reducing the sensitivity to spatial deformations.
Abstract: Object recognition systems have shown great progress over recent years. However, creating object representations that are robust to changes in viewpoint while capturing local visual details continues to be a challenge. In particular, recent convolutional architectures employ spatial pooling to achieve scale and shift invariances, but they are still sensitive to out-of-plane rotations. In this paper, we formulate a probabilistic framework for analyzing the performance of pooling. This framework suggests two directions for improvement. First, we apply multiple scales of filters coupled with different pooling granularities, and second we make use of color as an additional pooling domain, thereby reducing the sensitivity to spatial deformations. We evaluate our algorithm on the object instance recognition task using two independent publicly available RGB-D datasets, and demonstrate significant improvements over the current state-of-the-art. In addition, we present a new dataset for industrial objects to further validate the effectiveness of our approach versus other state-of-the-art approaches for object recognition using RGB-D data.
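As a rough illustration of the two directions named in the abstract (several pooling granularities, and color as a pooling domain alongside space), the toy sketch below max-pools a handful of filter responses first by position and then by color. It is not the authors' implementation: the function names, the 1-D position, the scalar hue in [0, 1], the non-negative responses, and the granularities (1, 2, 4) are all assumptions made for the example.

```python
from typing import List, Sequence


def max_pool_by_bins(responses: Sequence[float],
                     coords: Sequence[float],
                     num_bins: int) -> List[float]:
    """Max-pool responses grouped by equal-width bins over [0, 1].

    Assumes non-negative responses, so an empty bin pools to 0.0.
    """
    pooled = [0.0] * num_bins
    for r, c in zip(responses, coords):
        b = min(int(c * num_bins), num_bins - 1)  # clamp c == 1.0 into the last bin
        pooled[b] = max(pooled[b], r)
    return pooled


def multi_domain_descriptor(responses: Sequence[float],
                            x_pos: Sequence[float],
                            hue: Sequence[float],
                            granularities: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate pooled vectors over the spatial domain, then the color domain."""
    descriptor: List[float] = []
    for domain in (x_pos, hue):          # hypothetical 1-D stand-ins for each domain
        for n in granularities:          # coarse-to-fine pooling granularities
            descriptor.extend(max_pool_by_bins(responses, domain, n))
    return descriptor


# Four responses from two differently colored object parts.
responses = [0.9, 0.2, 0.5, 0.7]
hue = [0.05, 0.05, 0.6, 0.6]
view_a = multi_domain_descriptor(responses, [0.1, 0.4, 0.6, 0.9], hue)
# Same parts and colors, but the parts swap places in the image.
view_b = multi_domain_descriptor(responses, [0.6, 0.9, 0.1, 0.4], hue)

spatial_len = 1 + 2 + 4
print(view_a[:spatial_len] == view_b[:spatial_len])  # spatially pooled half
print(view_a[spatial_len:] == view_b[spatial_len:])  # color-pooled half
```

In this toy setup the spatially pooled half of the descriptor changes when the parts move, while the color-pooled half does not, because each response keeps its color even though its position changes.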
Citations
Comprehensive survey of deep learning in remote sensing: theories, tools, and challenges for the community
TL;DR: This article provides a comprehensive survey of state-of-the-art deep learning research for remote sensing applications, focusing on theories, tools, and challenges for the remote sensing community.
705
Fractal dimension of bag-of-visual words
Lucas Correia Ribas,Diogo Nunes Gonçalves,Jonathan de Andrade Silva,Amaury Castro,Odemir Martinez Bruno,Wesley Nunes Gonçalves +5 more
TL;DR: This paper proposes a new method for describing visual words by their fractal dimension, computed with the box-counting method, and the experimental results show that the proposed method yields highly discriminative features of the visual words.
7
•Posted Content
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
Hirokatsu Kataoka,Yudai Miyashita,Tomoaki K. Yamabe,Soma Shirakabe,Sato Shinichi,Hironori Hoshino,Ryo Kato,Kaori Abe,Takaaki Imanari,Naomichi Kobayashi,Shinichiro Morita,Akio Nakamura +11 more
TL;DR: This review covers all 602 conference papers presented at CVPR2015, the premier annual computer vision event held in June 2015, and proposes "DeepSurvey" as a mechanism embodying the entire process, from reading all the papers, through generating ideas, to writing a paper.
cvpaper.challenge in CVPR2015 -- A review of CVPR2015
Kataoka Hirokatsu,Miyashita Yudai,Yamabe Tomoaki,Shirakabe Soma,Sato Shin'ichi,Hoshino Hironori,Kato Ryo,Abe Kaori,Imanari Takaaki,Kobayashi Naomichi,Morita Shinichiro,Nakamura Akio +11 more
- 14 Dec 2015
TL;DR: This challenge aims to read top conference papers and, at the same time, produce documents in Japanese that make them easy to understand, in the fields of computer vision, image processing, pattern recognition and machine learning.
•Posted Content
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey.
Hirokatsu Kataoka,Soma Shirakabe,Yun He,Shunya Ueta,Teppei Suzuki,Kaori Abe,Asako Kanezaki,Shinichiro Morita,Toshiyuki Yabe,Yoshihiro Kanehara,Hiroya Yatsuyanagi,Shinya Maruyama,Ryousuke Takasawa,Masataka Fuchida,Yudai Miyashita,Kazushige Okayasu,Yuta Matsuzaki +16 more
TL;DR: The paper presents the futuristic challenges discussed in cvpaper.challenge.
References
ImageNet classification with deep convolutional neural networks
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Svetlana Lazebnik,Cordelia Schmid,Jean Ponce +2 more
- 17 Jun 2006
TL;DR: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence that exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories.
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
TL;DR: SPP-Net uses a spatial pyramid pooling strategy that generates a fixed-length representation regardless of image size or scale, and achieves state-of-the-art performance in object detection.
8.6K
•Proceedings Article
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
Pierre Sermanet,David Eigen,Xiang Zhang,Michael Mathieu,Rob Fergus,Yann LeCun +5 more
- 23 Feb 2014
TL;DR: This paper proposes a multiscale, sliding-window approach that predicts object boundaries. The predicted bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. OverFeat is the winner of the ImageNet Large Scale Visual Recognition Challenge 2013.
4.8K
•Proceedings Article
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
Jeff Donahue,Yangqing Jia,Oriol Vinyals,Judy Hoffman,Ning Zhang,Eric Tzeng,Trevor Darrell +6 more
- 21 Jun 2014
TL;DR: DeCAF is an open-source implementation of deep convolutional activation features, released along with all associated network parameters, to enable vision researchers to conduct experimentation with deep representations across a range of visual concept learning paradigms.