Open Access · Posted Content
From Coarse to Fine: Robust Hierarchical Localization at Large Scale
TL;DR: The authors propose HF-Net, a hierarchical approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization. It achieves remarkable localization robustness across large variations of appearance and sets a new state of the art on two challenging benchmarks for large-scale localization.
Abstract: Robust and accurate visual localization is a fundamental capability for numerous applications, such as autonomous driving, mobile robotics, or augmented reality. It remains, however, a challenging task, particularly in large-scale environments and in the presence of significant appearance changes. State-of-the-art methods not only struggle with such scenarios, but are often too resource-intensive for certain real-time applications. In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization. We exploit the coarse-to-fine localization paradigm: we first perform a global retrieval to obtain location hypotheses and only later match local features within those candidate places. This hierarchical approach yields significant runtime savings and makes our system suitable for real-time operation. By leveraging learned descriptors, our method achieves remarkable localization robustness across large variations of appearance and sets a new state of the art on two challenging benchmarks for large-scale localization.
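The coarse-to-fine paradigm described in the abstract can be illustrated with a minimal sketch. This is not the HF-Net implementation: the function names (`retrieve_candidates`, `match_local`, `localize_coarse_to_fine`), the cosine-similarity ranking, the ratio-test threshold, and the use of plain NumPy arrays as descriptors are all assumptions made for illustration. The final 6-DoF pose estimation from the matches is not shown.

```python
import numpy as np

def retrieve_candidates(query_global, db_globals, k=2):
    """Coarse step: rank database images by global-descriptor similarity.

    Assumes all descriptors are L2-normalized, so the dot product
    equals the cosine similarity.
    """
    scores = db_globals @ query_global
    return np.argsort(-scores)[:k]

def match_local(query_local, db_local, ratio=0.8):
    """Fine step: nearest-neighbor matching with a ratio test.

    query_local has shape (n, d); db_local has shape (m, d) with m >= 2.
    Returns a list of (query_index, database_index) pairs.
    """
    # Pairwise Euclidean distances between query and database descriptors.
    dists = np.linalg.norm(query_local[:, None, :] - db_local[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(query_local))
    # Keep a match only if it is clearly better than the runner-up.
    keep = dists[rows, best] < ratio * dists[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]

def localize_coarse_to_fine(query_global, query_local, db_globals, db_locals, k=2):
    """Match local features only within the top-k retrieved candidate images."""
    candidates = retrieve_candidates(query_global, db_globals, k)
    return {int(c): match_local(query_local, db_locals[int(c)]) for c in candidates}
```

The runtime saving comes from the second step touching only `k` database images instead of all of them; in a full pipeline the returned matches would be passed to a pose solver.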
Citations
View graph construction for large-scale UAV images: an evaluation of state-of-the-art methods
J. Liu,Y. Ma,S. Jiang,Q. Li,W. Jiang,L. Wang +5 more
TL;DR: The test results demonstrate that the best-performing method, VLAD with HNSW, speeds up the search for matching candidate subsets by about 100 times, and that it can create a view graph that guides scene partitioning and sub-scene reconstruction in parallel SfM.
Survey of Deep Learning-Based Methods for FMCW Radar Odometry and Ego-Localization
Marvin Brune,Tobias Meisen,André Pomp +2 more
TL;DR: This survey of deep learning-based methods for FMCW radar odometry and ego-localization focuses on the challenges of odometry and loop closure detection using FMCW radar sensors. It emphasizes the importance of these tasks in autonomous driving and introduces deep learning approaches to address them.
Towards Foundation Models for 3D Vision: How Close Are We?
Yiming Zuo,Karhan Kayan,Maggie Haitian Wang,Kwon-Su Jeon,Jia Deng,Thomas L. Griffiths +5 more
- 14 Oct 2024
TL;DR: This study evaluates 3D visual understanding in foundation models, revealing VLMs perform poorly, while specialized models are accurate but not robust, highlighting the need for improved 3D vision capabilities in AI systems.
Evaluation of Long-term Deep Visual Place Recognition
Farid Alijani,Jukka Peltomaki,Jussi Puura,Heikki Huttunen,Joni-Kristian Kamarainen,Esa Rahtu +5 more
- 01 Jan 2022
TL;DR: This paper studies recent visual place recognition and image retrieval methods and uses them to conduct extensive experiments on two diverse, large, long-term indoor and outdoor robot navigation datasets, namely COLD and Oxford Radar RobotCar.
Enhancing Visual Place Recognition with Multi-modal Features and Time-constrained Graph Attention Aggregation
Zhuo Wang,Yunzhou Zhang,Xin Zhao,Jian Ning,Dehui Zou,Meiqi Pei +5 more
- 13 May 2024
TL;DR: This paper proposes a multi-modal visual place recognition method that incorporates depth information and a time-constrained graph attention aggregation to enhance robustness against appearance and perspective changes in autonomous driving and robotic navigation.
References
ImageNet: A large-scale hierarchical image database
Jia Deng,Wei Dong,Richard Socher,Li-Jia Li,Kai Li,Li Fei-Fei +5 more
- 20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography
TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form that provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing.
Distilling the Knowledge in a Neural Network
TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Mark Sandler,Andrew Howard,Menglong Zhu,Andrey Zhmoginov,Liang-Chieh Chen +4 more
- 18 Jun 2018
TL;DR: MobileNetV2 is based on an inverted residual structure in which the shortcut connections are between the thin bottleneck layers, and the intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity.