Fine-Tuning CNN Image Retrieval with No Human Annotation

doi:10.1109/TPAMI.2018.2846566

Open Access • Journal Article • 10.1109/TPAMI.2018.2846566

Filip Radenovic, +2 more

- 01 Jul 2019

- IEEE Transactions on Pattern Analysis an...

- Vol. 41, Iss: 7, pp 1655-1668

1.1K

TL;DR: It is shown that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval.

Abstract: Image descriptors based on activations of Convolutional Neural Networks (CNNs) have become dominant in image retrieval due to their discriminative power, compactness of representation, and search efficiency. Training of CNNs, either from scratch or fine-tuning, requires a large amount of annotated data, where a high quality of annotation is often crucial. In this work, we propose to fine-tune CNNs for image retrieval on a large collection of unordered images in a fully automated manner. Reconstructed 3D models obtained by the state-of-the-art retrieval and structure-from-motion methods guide the selection of the training data. We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval. CNN descriptor whitening discriminatively learned from the same training data outperforms commonly used PCA whitening. We propose a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and average pooling and show that it boosts retrieval performance. Applying the proposed method to the VGG network achieves state-of-the-art performance on the standard benchmarks: Oxford Buildings, Paris, and Holidays datasets.
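The statement that GeM pooling generalizes max and average pooling can be illustrated with a minimal NumPy sketch. It assumes the usual power-mean form, in which each channel of a non-negative activation map is pooled as (mean of x^p)^(1/p). The function name `gem_pool`, the default `p=3.0`, the clipping constant `eps`, and the random feature map are illustrative assumptions, not values taken from the paper. The abstract describes the layer as trainable, whereas `p` is held fixed here for simplicity.

```python
import numpy as np


def gem_pool(features, p=3.0, eps=1e-6):
    """Pool a (C, H, W) activation map into a (C,) descriptor.

    Assumed power-mean form: (mean over H, W of x**p) ** (1 / p).
    p = 1 reduces to average pooling; large p approaches max pooling.
    """
    # Clip to a small positive value so fractional powers stay well defined
    # (activations are assumed non-negative, e.g. taken after a ReLU).
    x = np.clip(features, eps, None)
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)


# Hypothetical feature map: 4 channels on a 5x5 spatial grid.
rng = np.random.default_rng(0)
fmap = rng.random((4, 5, 5))

avg_like = gem_pool(fmap, p=1.0)    # matches average pooling
mid = gem_pool(fmap, p=3.0)         # lies between average and max
max_like = gem_pool(fmap, p=100.0)  # close to max pooling

print("average pooling:", fmap.mean(axis=(1, 2)))
print("GeM, p=1       :", avg_like)
print("GeM, p=3       :", mid)
print("GeM, p=100     :", max_like)
print("max pooling    :", fmap.max(axis=(1, 2)))
```

In a trainable version, `p` would be a learnable parameter updated by backpropagation together with the network weights; this sketch only shows how a fixed `p` interpolates between the two classical pooling operators.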

Citations

•Proceedings Article•10.1109/CVPR.2019.00342

Understanding the Limitations of CNN-Based Absolute Camera Pose Regression

Torsten Sattler, +3 more

- 15 Jun 2019

TL;DR: In this article, the authors developed a theoretical model for camera pose regression and showed that pose regression is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure.

369

•Journal Article•10.1016/J.COSE.2020.101748

Image-Based malware classification using ensemble of CNN architectures (IMCEC)

Danish Vasan, +6 more

- 01 May 2020

- Computers & Security

TL;DR: A novel ensemble convolutional neural networks (CNNs) based architecture for effective detection of both packed and unpacked malware, named Image-based Malware Classification using Ensemble of CNNs (IMCEC).

345

•Proceedings Article•10.1109/CVPR46437.2021.00326

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose

Paul-Edouard Sarlin, +10 more

- 01 Jun 2021

TL;DR: PixLoc aligns multiscale deep features with a 3D model to estimate a 6-DoF pose from an image, and can localize in large environments given coarse pose priors.

312

•Journal Article•10.1109/tpami.2021.3054775

Deep Learning for Person Re-Identification: A Survey and Outlook

- 01 Jun 2022

- IEEE Transactions on Pattern Analysis an...

TL;DR: The authors present a comprehensive overview, with in-depth analysis, of closed-world person Re-ID from three perspectives: deep feature representation learning, deep metric learning, and ranking optimization.

301

•Posted Content

Understanding the Limitations of CNN-based Absolute Camera Pose Regression

Torsten Sattler, +3 more

- 18 Mar 2019

- arXiv: Computer Vision and Pattern Recog...

TL;DR: A theoretical model for camera pose regression is developed, showing that pose regression is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure, and that additional research is needed before pose regression algorithms are ready to compete with structure-based methods.

277

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; the approach won 1st place on the ILSVRC 2015 classification task.

198.7K

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

138.5K

•Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 04 Sep 2014

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small convolution filters, and shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

102.6K

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: A deep convolutional neural network with five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieves state-of-the-art ImageNet classification performance.

88.4K

•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

82.5K

Related Papers (5)

Deep Residual Learning for Image Recognition

Object retrieval with large vocabularies and fast spatial matching

Distinctive Image Features from Scale-Invariant Keypoints

NetVLAD: CNN Architecture for Weakly Supervised Place Recognition

ImageNet Classification with Deep Convolutional Neural Networks