LoFTR: Detector-Free Local Feature Matching with Transformers

doi:10.1109/CVPR46437.2021.00881

Open Access · Proceedings Article

Jiaming Sun, +4 more

- 01 Apr 2021

- pp 8922-8931

TL;DR: LoFTR uses the self- and cross-attention layers of a Transformer to obtain feature descriptors that are conditioned on both images, which enables the method to produce dense matches in low-texture areas.
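The TL;DR describes the core idea. Features from each image are updated by attending to themselves (self-attention) and then to the other image (cross-attention), so the resulting descriptors depend on both images. The sketch below illustrates only that interleaving pattern and rests on heavy simplifying assumptions: plain softmax dot-product attention with identity query/key/value projections, a bare residual update, and no positional encoding, normalization, or feed-forward layers. All function names are hypothetical. This is not the model's actual architecture or attention variant.

```python
import math
from typing import List, Tuple

Vec = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vec], keys: List[Vec], values: List[Vec]) -> List[Vec]:
    """Scaled dot-product attention with identity projections (illustrative only)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(d)])
    return out


def residual_add(xs: List[Vec], ys: List[Vec]) -> List[Vec]:
    # Element-wise x + y for each feature vector.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def interleaved_attention(
    feat_a: List[Vec], feat_b: List[Vec], n_layers: int = 2
) -> Tuple[List[Vec], List[Vec]]:
    for _ in range(n_layers):
        # Self-attention: each image's features attend within the same image.
        feat_a = residual_add(feat_a, attend(feat_a, feat_a, feat_a))
        feat_b = residual_add(feat_b, attend(feat_b, feat_b, feat_b))
        # Cross-attention: each image attends to the other image.
        # Both updates read the pre-cross features, so they are simultaneous.
        new_a = residual_add(feat_a, attend(feat_a, feat_b, feat_b))
        new_b = residual_add(feat_b, attend(feat_b, feat_a, feat_a))
        feat_a, feat_b = new_a, new_b
    return feat_a, feat_b
```

Changing only image B's features changes image A's output descriptors, which is the "conditioned on both images" property the TL;DR refers to.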

Figures

Figure 1: Comparison between the proposed method LoFTR and the detector-based method SuperGlue [37]. This example demonstrates that LoFTR is capable of finding correspondences on the texture-less wall and the floor with repetitive patterns, where detector-based methods struggle to find repeatable interest points.

Table 3: Evaluation on MegaDepth [21] for outdoor pose estimation. Matching with LoFTR results in better performance in the outdoor pose estimation task.

Table 1: Homography estimation on HPatches [7]. The AUC of the corner error in percentage is reported. The suffix DS indicates differentiable matching with dual-softmax.
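Tables 1 and 2 report the area under the cumulative error curve (AUC) up to a threshold, as a percentage. This page does not state the exact protocol. The sketch below assumes the convention used in the SuperGlue evaluation code: sort the per-pair errors, build the recall curve, integrate it with the trapezoid rule up to each threshold, and divide by the threshold. The function name and the example thresholds are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence


def error_auc(errors: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Normalized AUC of the recall-vs-error curve for each threshold (0..1)."""
    errs = sorted(errors)
    n = len(errs)
    # Recall curve: after the i-th smallest error, (i + 1) / n pairs are within it.
    xs = [0.0] + list(errs)
    ys = [0.0] + [(i + 1) / n for i in range(n)]
    aucs = []
    for t in thresholds:
        # Keep the curve points with error < t, then close the curve at x = t
        # holding the last recall value constant.
        k = bisect_left(xs, t)
        px = xs[:k] + [t]
        py = ys[:k] + [ys[k - 1]]
        # Trapezoid-rule integration over the truncated curve.
        area = sum(
            (px[i + 1] - px[i]) * (py[i + 1] + py[i]) / 2 for i in range(len(px) - 1)
        )
        aucs.append(area / t)
    return aucs
```

Multiply by 100 to obtain the percentages in the tables. For example, `error_auc(pose_errors_deg, [5, 10, 20])` would give pose AUCs at three angular thresholds, where `pose_errors_deg` is a hypothetical list of per-pair errors.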

Table 2: Evaluation on ScanNet [7] for indoor pose estimation. The AUC of the pose error in percentage is reported. LoFTR improves the state-of-the-art methods by a large margin. † indicates models trained on MegaDepth. The suffixes OT and DS indicate differentiable matching with optimal transport and dual-softmax, respectively.
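The DS suffix refers to dual-softmax matching. This page does not spell out the formulation. The sketch below assumes the common form: a similarity matrix between the two feature sets, scaled by a temperature, is softmax-normalized along both rows and columns, and the two probabilities are multiplied. The result is then filtered with a confidence threshold and a mutual-nearest-neighbor check. The temperature, threshold, and function names are illustrative assumptions.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax_matches(
    feat_a: List[List[float]],
    feat_b: List[List[float]],
    temperature: float = 0.1,
    threshold: float = 0.2,
) -> List[Tuple[int, int, float]]:
    n_a, n_b = len(feat_a), len(feat_b)
    # Score matrix S[i][j] = <a_i, b_j> / temperature.
    S = [[sum(x * y for x, y in zip(a, b)) / temperature for b in feat_b] for a in feat_a]
    # Softmax over j for each i, and over i for each j.
    row = [softmax(r) for r in S]
    col = [softmax([S[i][j] for i in range(n_a)]) for j in range(n_b)]
    # Matching probability is the product of the two softmaxes.
    P = [[row[i][j] * col[j][i] for j in range(n_b)] for i in range(n_a)]
    matches = []
    for i in range(n_a):
        j = max(range(n_b), key=lambda jj: P[i][jj])
        # Mutual nearest neighbor: i must also be the best row for column j.
        i_back = max(range(n_a), key=lambda ii: P[ii][j])
        if i_back == i and P[i][j] > threshold:
            matches.append((i, j, P[i][j]))
    return matches
```

Multiplying the two softmaxes penalizes ambiguous matches. If a feature in A is equally similar to two features in B, its row probability is split between them, so the product stays low and can fall below the threshold.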

Table 4: Visual localization evaluation on the Aachen Day-Night [54] benchmark v1.1. The evaluation results on both the local feature evaluation track and the full visual localization track are reported.

Table 5: Visual localization evaluation on the InLoc [41] benchmark.

Citations

Proceedings Article•10.1109/CVPR52688.2022.01578

Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

Jiankun Li, +8 more

- 22 Mar 2022

TL;DR: A hierarchical network with recurrent refinement to update disparities in a coarse-to-fine manner, as well as a stacked cascaded architecture for inference and a new synthetic dataset with special attention to difficult cases for better generalizing to real-world scenes are introduced.

Proceedings Article•10.1109/CVPR52688.2022.01086

Geometric Transformer for Fast and Robust Point Cloud Registration

Zheng Qin, +5 more

- 14 Feb 2022

TL;DR: This work proposes Geometric Transformer, a simplistic design that attains surprisingly high matching accuracy such that no RANSAC is required in the estimation of alignment transformation, leading to 100 times acceleration.

•Proceedings Article•10.1109/cvpr52688.2022.00839

TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers

01 Jun 2022

TL;DR: TransMVSNet uses a feature matching transformer to aggregate long-range context information within and across images, and achieves state-of-the-art performance on the DTU dataset, the Tanks and Temples benchmark, and the BlendedMVS dataset.

•Proceedings Article•10.1145/3528233.3530718

Neural 3D Reconstruction in the Wild

Jiaming Sun, +6 more

- 25 May 2022

TL;DR: This work introduces a new method that enables efficient and accurate surface reconstruction from Internet photo collections in the presence of varying illumination and proposes a hybrid voxel- and surface-guided sampling technique that allows for more efficient ray sampling around surfaces and leads to significant improvements in reconstruction quality.

...

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the resulting models won 1st place on the ILSVRC 2015 classification task.

•Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

•Proceedings Article

Attention Is All You Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

Journal Article•10.1023/B:VISI.0000029664.99615.94

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe

- 01 Nov 2004

- International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Preprint•10.48550/arxiv.1706.03762

Attention Is All You Need

Ashish Vaswani, +7 more

- 01 Jan 2017

Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
