Enhancing Conventional Geometry-Based Visual Odometry Pipeline Through Integration of Deep Descriptors

doi:10.1109/ACCESS.2023.3284463

Journal Article10.1109/ACCESS.2023.3284463

Enhancing Conventional Geometry-Based Visual Odometry Pipeline Through Integration of Deep Descriptors

- Vol. 11, pp 58294-58307

TL;DR: In this paper , the authors integrate deep descriptors to improve the correspondence between image points for tracking in a traditional geometry-based visual odometry (VO) pipeline and propose a simple stereo VO pipeline inspired by popular techniques found in the literature.

Abstract: Geometry-based Visual Odometry (VO) techniques are renowned in the fields of computer vision and robotics. They use methods from multi-view geometry to estimate camera motion from visual data obtained from one or more cameras. Tracking the camera motion precisely between different views is dependent on the correct estimation of correspondences between salient points of the views. In practice, geometry-based methods are found to be quite effective but do not perform well in challenging cases due to tracking failures caused by abrupt motion, occlusions, textureless and low-light scenes, etc. On the contrary, end-to-end learning from visual data using deep neural networks is an emerging area of research and deals with challenging cases successfully. Despite being computationally expensive, these methods do not outperform their counterparts in conditions favorable to geometry-based methods. Considering these facts in this work, our goal is to integrate deep descriptors to improve the correspondence between image points for tracking in a traditional geometry-based VO pipeline. We propose a simple stereo VO pipeline inspired by popular techniques found in the literature. Two conventional and four deep descriptors have been used in our experiments conducted on various image sequences of the challenging KITTI benchmark dataset. We have determined empirically that deep descriptors can effectively minimize drift in the VO estimates and produce better camera trajectories. The experimental results on the KITTI dataset demonstrate that our VO method performs at par with the state-of-the-art works reported in the literature.

Chat with Paper

AI Agents for this Paper

Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps

References

Journal Article•10.1023/B:VISI.0000029664.99615.94

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe

- 01 Nov 2004

- International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

59.3K

Journal Article•10.1145/358669.358692

Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography

Martin A. Fischler, +1 more

- 01 Jun 1981

- Communications of The ACM

TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form that provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing.

...read moreread less

27.9K

•Posted Content

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, +20 more

- 03 Dec 2019

- arXiv: Learning

TL;DR: PyTorch as discussed by the authors is a machine learning library that provides an imperative and Pythonic programming style that makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.

...read moreread less

25.9K

Proceedings Article•10.1109/ICCV.1999.790410

Object recognition from local scale-invariant features

David G. Lowe

- 20 Sep 1999

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.

...read moreread less

19.3K

Proceedings Article•10.1109/CVPR.2012.6248074

Are we ready for autonomous driving? The KITTI vision benchmark suite

Andreas Geiger, +2 more

- 16 Jun 2012

TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.

...read moreread less

16.3K

...

Expand