LoFTR: Detector-Free Local Feature Matching with Transformers
Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, Xiaowei Zhou
- 01 Apr 2021
- pp 8922-8931
TL;DR: LoFTR uses self- and cross-attention layers in a Transformer to obtain feature descriptors that are conditioned on both images, which enables it to produce dense matches in low-texture areas.
Abstract: We present a novel method for local image feature matching. Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level and later refine the good matches at a fine level. In contrast to dense methods that use a cost volume to search correspondences, we use self and cross attention layers in Transformer to obtain feature descriptors that are conditioned on both images. The global receptive field provided by Transformer enables our method to produce dense matches in low-texture areas, where feature detectors usually struggle to produce repeatable interest points. The experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin. LoFTR also ranks first on two public benchmarks of visual localization among the published methods. Code is available at our project page: https://zju3dv.github.io/loftr/.
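As a rough illustration of what "feature descriptors that are conditioned on both images" can mean, the following minimal NumPy sketch applies one self-attention step and one cross-attention step to two sets of flattened coarse features. It uses plain scaled dot-product attention with no learned projections. The function names, feature sizes, and residual updates are assumptions made for illustration and are not taken from the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, source):
    """Scaled dot-product attention: each query row aggregates rows of `source`."""
    d = query.shape[-1]
    weights = softmax(query @ source.T / np.sqrt(d), axis=-1)  # (Nq, Ns)
    return weights @ source                                    # (Nq, d)

def self_cross_layer(feat_a, feat_b):
    """One self-attention step per image, then one cross-attention step.

    Hypothetical simplification: no learned weights, residual updates only.
    """
    # Self-attention: every location sees its whole image (global receptive field).
    feat_a = feat_a + attend(feat_a, feat_a)
    feat_b = feat_b + attend(feat_b, feat_b)
    # Cross-attention: each image's features are updated from the other image.
    new_a = feat_a + attend(feat_a, feat_b)
    new_b = feat_b + attend(feat_b, feat_a)
    return new_a, new_b

rng = np.random.default_rng(0)
feat_a = rng.normal(size=(12, 8))  # 12 coarse locations, 8-dim features (assumed sizes)
feat_b = rng.normal(size=(15, 8))  # 15 coarse locations in the second image
out_a, out_b = self_cross_layer(feat_a, feat_b)
print(out_a.shape, out_b.shape)
```

After the cross-attention step, the descriptors of the first image depend on the content of the second image, so changing `feat_b` changes `out_a`.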
Figures
![Figure 1](/figures/figure1-1-2z16ejavm0s4.png)
Figure 1: Comparison between the proposed method LoFTR and the detector-based method SuperGlue [37]. This example demonstrates that LoFTR is capable of finding correspondences on the texture-less wall and the floor with repetitive patterns, where detector-based methods struggle to find repeatable interest points.
![Table 3](/figures/table3-1-719iw5evnzee.png)
Table 3: Evaluation on MegaDepth [21] for outdoor pose estimation. Matching with LoFTR results in better performance in the outdoor pose estimation task.
![Table 1](/figures/table1-1-1odru6m6v34g.png)
Table 1: Homography estimation on HPatches [7]. The AUC of the corner error in percentage is reported. The suffix DS indicates the differentiable matching with dual-softmax.
![Table 2](/figures/table2-1-atecwm4swiar.png)
Table 2: Evaluation on ScanNet [7] for indoor pose estimation. The AUC of the pose error in percentage is reported. LoFTR improves the state-of-the-art methods by a large margin. † indicates models trained on MegaDepth. The suffixes OT and DS indicate differentiable matching with optimal transport and dual-softmax, respectively.
![Table 4](/figures/table4-1-6yghqw78y9i1.png)
Table 4: Visual localization evaluation on the Aachen Day-Night [54] benchmark v1.1. The evaluation results on both the local feature evaluation track and the full visual localization track are reported.
![Table 5](/figures/table5-1-4ry8peczgi7p.png)
Table 5: Visual localization evaluation on the InLoc [41] benchmark.
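The DS suffix in the table captions refers to differentiable matching with dual-softmax. A minimal sketch of the general idea follows: softmax is applied over both dimensions of a descriptor similarity matrix, the two results are multiplied into a confidence matrix, and confident entries are kept as matches. The temperature and threshold values, the mutual-nearest-neighbour filter, and all names here are illustrative assumptions and are not taken from this page.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_softmax_matches(desc_a, desc_b, temperature=0.1, threshold=0.2):
    """Return (matches, conf) for two descriptor sets of shape (N, d) and (M, d).

    `temperature` and `threshold` are assumed values chosen for this toy example.
    """
    sim = desc_a @ desc_b.T / temperature               # (N, M) similarity scores
    conf = softmax(sim, axis=1) * softmax(sim, axis=0)  # dual-softmax confidence
    row_best = conf.argmax(axis=1)                      # best b for each a
    col_best = conf.argmax(axis=0)                      # best a for each b
    idx_a = np.arange(conf.shape[0])
    mutual = col_best[row_best] == idx_a                # mutual nearest neighbours
    keep = mutual & (conf[idx_a, row_best] > threshold)
    matches = np.stack([idx_a[keep], row_best[keep]], axis=1)
    return matches, conf

# Toy data: the second set is a shuffled copy of the first, unit-normalised.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(6, 16))
desc_a /= np.linalg.norm(desc_a, axis=1, keepdims=True)
perm = np.array([3, 0, 5, 1, 4, 2])
desc_b = desc_a[perm]                                   # desc_b[j] == desc_a[perm[j]]
matches, conf = dual_softmax_matches(desc_a, desc_b)
print(matches)
```

Each returned row is an index pair `(i, j)` meaning that descriptor `i` of the first set was matched to descriptor `j` of the second set.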
Citations
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
Xuyang Bai, Zeyu Hu, Xinge Zhu, Qingqiu Huang, Yilun Chen, Hongbo Fu, Chiew-Lan Tai
- 22 Mar 2022
TL;DR: The proposed TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions, achieves state-of-the-art performance on large-scale datasets and is extended to the 3D tracking task.
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
01 Jun 2022
TL;DR: TransFusion proposes a soft-association mechanism to handle inferior image conditions, e.g., bad illumination and sensor misalignment, and achieves state-of-the-art performance on large-scale datasets.
Geometric Transformer for Fast and Robust Point Cloud Registration
01 Jun 2022
TL;DR: GeoTransformer learns geometric features for robust superpoint matching by encoding pair-wise distances and triplet-wise angles, making it robust in low-overlap cases and invariant to rigid transformation.
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization
TL;DR: The authors use a view-conditioned 2D diffusion model, Zero123, to generate multi-view images from the input view and then aim to lift them to 3D space.
•Posted Content
GMFlow: Learning Optical Flow via Global Matching
TL;DR: The authors propose the GMFlow framework, which consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
References
•Proceedings Article
COTR: Correspondence Transformer for Matching Across Images
Wei Jiang, Eduard Trulls, Jan Hosang, Andrea Tagliasacchi, Kwang Moo Yi
- 25 Mar 2021
TL;DR: In this article, the authors propose a novel framework for finding correspondences in images based on a deep neural network that, given two images and a query point in one of them, finds its correspondence in the other.
Neighbourhood Consensus Networks
Ignacio Rocco, Mircea Cimpoi, Relja Arandjelović, Akihiko Torii, Tomáš Pajdla, Josef Šivic
- 01 Jan 2018
TL;DR: A novel end-to-end trainable convolutional neural network for finding reliable dense correspondences between a pair of images based on neighbourhood consensus patterns.
•Posted Content
ContextDesc: Local Descriptor Augmentation with Cross-Modality Context
TL;DR: This paper proposes a unified learning framework that aggregates cross-modality contextual information, including visual context from high-level image representation and geometric context from 2D keypoint distribution, and introduces an N-pair loss that eschews the empirical hyper-parameter search and improves convergence.
Learning Accurate Dense Correspondences and When to Trust Them
Prune Truong, Martin Danelljan, Luc Van Gool, Radu Timofte
- 05 Jan 2021
TL;DR: PDCNet takes a probabilistic approach to estimating a dense flow field relating two images, coupled with a robust pixel-wise confidence map indicating the reliability and accuracy of the prediction.
GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences
Prune Truong, Martin Danelljan, Radu Timofte
- 14 Jun 2020
TL;DR: GLU-Net is a universal network architecture that is directly applicable to multiple dense correspondence problems, achieving both high accuracy and robustness to large displacements through the combined use of global and local correlation layers.