TL;DR: In this article, a framework for tightly-coupled lidar inertial odometry via smoothing and mapping, LIO-SAM, is proposed for real-time mobile robot trajectory estimation and map-building.
Abstract: We propose a framework for tightly-coupled lidar inertial odometry via smoothing and mapping, LIO-SAM, that achieves highly accurate, real-time mobile robot trajectory estimation and map-building. LIO-SAM formulates lidar-inertial odometry atop a factor graph, allowing a multitude of relative and absolute measurements, including loop closures, to be incorporated from different sources as factors into the system. The estimated motion from inertial measurement unit (IMU) pre-integration de-skews point clouds and produces an initial guess for lidar odometry optimization. The obtained lidar odometry solution is used to estimate the bias of the IMU. To ensure high performance in real-time, we marginalize old lidar scans for pose optimization, rather than matching lidar scans to a global map. Scan-matching at a local scale instead of a global scale significantly improves the real-time performance of the system, as does the selective introduction of keyframes, and an efficient sliding window approach that registers a new keyframe to a fixed-size set of prior "sub-keyframes." The proposed method is extensively evaluated on datasets gathered from three platforms over various scales and environments.
TL;DR: A framework for tightly-coupled lidar inertial odometry via smoothing and mapping, LIO-SAM, that achieves highly accurate, real-time mobile robot trajectory estimation and map-building and an efficient sliding window approach that registers a new keyframe to a fixed-size set of prior "sub-keyframes."
Abstract: We propose a framework for tightly-coupled lidar inertial odometry via smoothing and mapping, LIO-SAM, that achieves highly accurate, real-time mobile robot trajectory estimation and map-building. LIO-SAM formulates lidar-inertial odometry atop a factor graph, allowing a multitude of relative and absolute measurements, including loop closures, to be incorporated from different sources as factors into the system. The estimated motion from inertial measurement unit (IMU) pre-integration de-skews point clouds and produces an initial guess for lidar odometry optimization. The obtained lidar odometry solution is used to estimate the bias of the IMU. To ensure high performance in real-time, we marginalize old lidar scans for pose optimization, rather than matching lidar scans to a global map. Scan-matching at a local scale instead of a global scale significantly improves the real-time performance of the system, as does the selective introduction of keyframes, and an efficient sliding window approach that registers a new keyframe to a fixed-size set of prior ``sub-keyframes.'' The proposed method is extensively evaluated on datasets gathered from three platforms over various scales and environments.
TL;DR: Li et al. as mentioned in this paper proposed a self-supervised monocular depth estimation network trained on stereo videos without any external supervision, which aligns the training image pairs into similar lighting condition with predictive brightness transformation parameters.
Abstract: We propose D3VO as a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation. We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision. In particular, it aligns the training image pairs into similar lighting condition with predictive brightness transformation parameters. Besides, we model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy and provides a learned weighting function for the photometric residuals in direct (feature-less) visual odometry. Evaluation results show that the proposed network outperforms state-of-the-art self-supervised depth estimation networks. D3VO tightly incorporates the predicted depth, pose and uncertainty into a direct visual odometry method to boost both the front-end tracking as well as the back-end non-linear optimization. We evaluate D3VO in terms of monocular visual odometry on both the KITTI odometry benchmark and the EuRoC MAV dataset. The results show that D3VO outperforms state-of-the-art traditional monocular VO methods by a large margin. It also achieves comparable results to state-of-the-art stereo/LiDAR odometry on KITTI and to the state-of-the-art visual-inertial odometry on EuRoC MAV, while using only a single camera.
TL;DR: This paper presents a robust, real-time LOAM algorithm for LiDARs with small FoV and irregular samplings, and addresses several fundamental challenges arising from such LiDars, and achieves better performance in both precision and efficiency compared to existing baselines.
Abstract: LiDAR odometry and mapping (LOAM) has been playing an important role in autonomous vehicles, due to its ability to simultaneously localize the robot’s pose and build high-precision, high-resolution maps of the surrounding environment. This enables autonomous navigation and safe path planning of autonomous vehicles. In this paper, we present a robust, real-time LOAM algorithm for LiDARs with small FoV and irregular samplings. By taking effort on both frontend and back-end, we address several fundamental challenges arising from such LiDARs, and achieve better performance in both precision and efficiency compared to existing baselines. To share our findings and to make contributions to the community, we open source our codes on Github1.
TL;DR: Performance on the five tasks of depth estimation, optical flow estimation, odometry, moving object segmentation and scene flow estimation shows that the approach outperforms other SoTA methods, demonstrating the effectiveness of each module of the proposed method.
Abstract: Learning to estimate 3D geometry in a single frame and optical flow from consecutive frames by watching unlabeled videos via deep convolutional network has made significant progress recently. Current state-of-the-art (SoTA) methods treat the two tasks independently. One typical assumption of the existing depth estimation methods is that the scenes contain no independent moving objects. while object moving could be easily modeled using optical flow. In this paper, we propose to address the two tasks as a whole, i.e., to jointly understand per-pixel 3D geometry and motion. This eliminates the need of static scene assumption and enforces the inherent geometrical consistency during the learning process, yielding significantly improved results for both tasks. We call our method as “Every Pixel Counts++” or “EPC++”. Specifically, during training, given two consecutive frames from a video, we adopt three parallel networks to predict the camera motion (MotionNet), dense depth map (DepthNet), and per-pixel optical flow between two frames (OptFlowNet) respectively. The three types of information, are fed into a holistic 3D motion parser (HMP), and per-pixel 3D motion of both rigid background and moving objects are disentangled and recovered. Various loss terms are formulated to jointly supervise the three networks. An effective adaptive training strategy is proposed to achieve better performance and more efficient convergence. Comprehensive experiments were conducted on datasets with different scenes, including driving scenario (KITTI 2012 and KITTI 2015 datasets), mixed outdoor/indoor scenes (Make3D) and synthetic animation (MPI Sintel dataset). Performance on the five tasks of depth estimation, optical flow estimation, odometry, moving object segmentation and scene flow estimation shows that our approach outperforms other SoTA methods, demonstrating the effectiveness of each module of our proposed method. Code will be available at: https://github.com/chenxuluo/EPC .
TL;DR: Experimental results indicate that LINS offers comparable performance with the state-of-the-art lidar-inertial odometry in terms of stability and accuracy and has order- of-magnitude improvement in speed.
Abstract: We present LINS, a lightweight lidar-inertial state estimator, for real-time ego-motion estimation. The proposed method enables robust and efficient navigation for ground vehicles in challenging environments, such as feature-less scenes, via fusing a 6-axis IMU and a 3D lidar in a tightly-coupled scheme. An iterated error-state Kalman filter (ESKF) is designed to correct the estimated state recursively by generating new feature correspondences in each iteration, and to keep the system computationally tractable. Moreover, we use a robocentric formulation that represents the state in a moving local frame in order to prevent filter divergence in a long run. To validate robustness and generalizability, extensive experiments are performed in various scenarios. Experimental results indicate that LINS offers comparable performance with the state-of-the-art lidar-inertial odometry in terms of stability and accuracy and has order-of-magnitude improvement in speed.
TL;DR: Without any external infrastructures prepositioned, each agent cooperatively performs a consensus-based fusion, which fuses the obtained direct and indirect RL estimates, to generate the relative positions to its neighbors in real time despite the fact that some UAVs may not have direct range measurements to their neighbors.
Abstract: This puts forth an infrastructure-free cooperative relative localization (RL) for unmanned aerial vehicles (UAVs) in global positioning system (GPS)-denied environments. Instead of estimating relative coordinates with vision-based methods, an onboard ultra-wideband (UWB) ranging and communication (RCM) network is adopted to both sense the inter-UAV distance and exchange information for RL estimation in 2-D spaces. Without any external infrastructures prepositioned, each agent cooperatively performs a consensus-based fusion, which fuses the obtained direct and indirect RL estimates, to generate the relative positions to its neighbors in real time despite the fact that some UAVs may not have direct range measurements to their neighbors. The proposed RL estimation is then applied to formation control. Extensive simulations and real-world flight tests corroborate the merits of the developed RL algorithm.
TL;DR: This paper proposes a novel accurate method for dead-reckoning of wheeled vehicles based only on an Inertial Measurement Unit (IMU) using the Kalman filter and the use of deep neural networks to dynamically adapt the noise parameters of the filter.
Abstract: In this paper, we propose a novel accurate method for dead-reckoning of wheeled vehicles based only on an Inertial Measurement Unit (IMU). In the context of intelligent vehicles, robust and accurate dead-reckoning based on the IMU may prove useful to correlate feeds from imaging sensors, to safely navigate through obstructions, or for safe emergency stops in the extreme case of exteroceptive sensors failure. The key components of the method are the Kalman filter and the use of deep neural networks to dynamically adapt the noise parameters of the filter. The method is tested on the KITTI odometry dataset, and our dead-reckoning inertial method based only on the IMU accurately estimates 3D position, velocity, orientation of the vehicle and self-calibrates the IMU biases. We achieve on average a 1.10% translational error and the algorithm competes with top-ranked methods which, by contrast, use LiDAR or stereo vision.
TL;DR: A novel system that explicitly disentangles scale from the network estimation, which achieves state-of-the-art results among self-supervised learning-based methods on KITTI Odometry and NYUv2 dataset and presents some interesting findings on the limitation of PoseNet-based relative pose estimation methods in terms of generalization ability.
Abstract: In this work, we tackle the essential problem of scale inconsistency for self supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples, which makes the learning problem harder, resulting in degraded performance and limited generalization in indoor environments and long-sequence visual odometry application. To address this issue, we propose a novel system that explicitly disentangles scale from the network estimation. Instead of relying on PoseNet architecture, our method recovers relative pose by directly solving fundamental matrix from dense optical flow correspondence and makes use of a two-view triangulation module to recover an up-to-scale 3D structure. Then, we align the scale of the depth prediction with the triangulated point cloud and use the transformed depth map for depth error computation and dense reprojection check. Our whole system can be jointly trained end-to-end. Extensive experiments show that our system not only reaches state-of-the-art performance on KITTI depth and flow estimation, but also significantly improves the generalization ability of existing self-supervised depth-pose learning methods under a variety of challenging scenarios, and achieves state-of-the-art results among self-supervised learning-based methods on KITTI Odometry and NYUv2 dataset. Furthermore, we present some interesting findings on the limitation of PoseNet-based relative pose estimation methods in terms of generalization ability. Code is available at https://github.com/B1ueber2y/TrianFlow.
TL;DR: In this article, an end-to-end pipeline for tree diameter estimation based on semantic segmentation and lidar odometry and mapping is described, which is more robust, scalable, and automatically generates tree diameter estimations.
Abstract: This letter describes an end-to-end pipeline for tree diameter estimation based on semantic segmentation and lidar odometry and mapping. Accurate mapping of this type of environment is challenging since the ground and the trees are surrounded by leaves, thorns and vines, and the sensor typically experiences extreme motion. We propose a semantic feature based pose optimization that simultaneously refines the tree models while estimating the robot pose. The pipeline utilizes a custom virtual reality tool for labeling 3D scans that is used to train a semantic segmentation network. The masked point cloud is used to compute a trellis graph that identifies individual instances and extracts relevant features that are used by the SLAM module. We show that traditional lidar and image based methods fail in the forest environment on both Unmanned Aerial Vehicle (UAV) and hand-carry systems, while our method is more robust, scalable, and automatically generates tree diameter estimations.
TL;DR: In this paper, a non-linear factor recovery method was proposed to extract relevant information for visual-inertial mapping from visual-intrusive odometry using nonlinear factors.
Abstract: Cameras and inertial measurement units are complementary sensors for ego-motion estimation and environment mapping. Their combination makes visual-inertial odometry (VIO) systems more accurate and robust. For globally consistent mapping, however, combining visual and inertial information is not straightforward. To estimate the motion and geometry with a set of images large baselines are required. Because of that, most systems operate on keyframes that have large time intervals between each other. Inertial data on the other hand quickly degrades with the duration of the intervals and after several seconds of integration, it typically contains only little useful information. In this letter, we propose to extract relevant information for visual-inertial mapping from visual-inertial odometry using non-linear factor recovery. We reconstruct a set of non-linear factors that make an optimal approximation of the information on the trajectory accumulated by VIO. To obtain a globally consistent map we combine these factors with loop-closing constraints using bundle adjustment. The VIO factors make the roll and pitch angles of the global map observable, and improve the robustness and the accuracy of the mapping. In experiments on a public benchmark, we demonstrate superior performance of our method over the state-of-the-art approaches.
TL;DR: This paper addresses the problem of loop closing for SLAM based on 3D laser scans recorded by autonomous cars by utilizing a deep neural network exploiting different cues generated from LiDAR data for finding loop closures.
Abstract: Simultaneous localization and mapping (SLAM) is a fundamental capability required by most autonomous systems. In this paper, we address the problem of loop closing for SLAM based on 3D laser scans recorded by autonomous cars. Our approach utilizes a deep neural network exploiting different cues generated from LiDAR data for finding loop closures. It estimates an image overlap generalized to range images and provides a relative yaw angle estimate between pairs of scans. Based on such predictions, we tackle loop closure detection and integrate our approach into an existing SLAM system to improve its mapping results. We evaluate our approach on sequences of the KITTI odometry benchmark and the Ford campus dataset. We show that our method can effectively detect loop closures surpassing the detection performance of state-of-the-art methods. To highlight the generalization capabilities of our approach, we evaluate our model on the Ford campus dataset while using only KITTI for training. The experiments show that the learned representation is able to provide reliable loop closure candidates, also in unseen environments.
TL;DR: This work provides a comprehensive survey, and proposes a new taxonomy for localization and mapping using deep learning, and revisits the problem of perceiving self-motion and scene understanding with on-board sensors, and shows how to solve it by integrating these modules into a prospective spatial machine intelligence system (SMIS).
Abstract: Deep learning based localization and mapping has recently attracted significant attention Instead of creating hand-designed algorithms through exploitation of physical models or geometric theories, deep learning based solutions provide an alternative to solve the problem in a data-driven way Benefiting from ever-increasing volumes of data and computational power, these methods are fast evolving into a new area that offers accurate and robust systems to track motion and estimate scenes and their structure for real-world applications In this work, we provide a comprehensive survey, and propose a new taxonomy for localization and mapping using deep learning We also discuss the limitations of current models, and indicate possible future directions A wide range of topics are covered, from learning odometry estimation, mapping, to global localization and simultaneous localization and mapping (SLAM) We revisit the problem of perceiving self-motion and scene understanding with on-board sensors, and show how to solve it by integrating these modules into a prospective spatial machine intelligence system (SMIS) It is our hope that this work can connect emerging works from robotics, computer vision and machine learning communities, and serve as a guide for future researchers to apply deep learning to tackle localization and mapping problems
TL;DR: This work uses a predictive cross-modal criterion, akin to “self-supervision,” measuring photometric consistency across time, forward-backward pose consistency, and geometric compatibility with the sparse point cloud to infer dense depth from camera motion and sparse depth as estimated using a visual-inertial odometry system.
Abstract: We describe a method to infer dense depth from camera motion and sparse depth as estimated using a visual-inertial odometry system. Unlike other scenarios using point clouds from lidar or structured light sensors, we have few hundreds to few thousand points, insufficient to inform the topology of the scene. Our method first constructs a piecewise planar scaffolding of the scene, and then uses it to infer dense depth using the image along with the sparse points. We use a predictive cross-modal criterion, akin to “self-supervision,” measuring photometric consistency across time, forward-backward pose consistency, and geometric compatibility with the sparse point cloud. We also present the first visual-inertial + depth dataset, which we hope will foster additional exploration into combining the complementary strengths of visual and inertial sensors. To compare our method to prior work, we adopt the unsupervised KITTI depth completion benchmark, where we achieve state-of-the-art performance.
TL;DR: A self-supervised framework capable of full mapping and localisation with radar in urban environments, and is sensor agnostic and can be applied to most modalities.
Abstract: This paper presents a self-supervised framework for learning to detect robust keypoints for odometry estimation and metric localisation in radar. By embedding a differentiable point-based motion estimator inside our architecture, we learn keypoint locations, scores and descriptors from localisation error alone. This approach avoids imposing any assumption on what makes a robust keypoint and crucially allows them to be optimised for our application. Furthermore the architecture is sensor agnostic and can be applied to most modalities. We run experiments on 280km of real world driving from the Oxford Radar RobotCar Dataset and improve on the state-of-the-art in point-based radar odometry, reducing errors by up to 45% whilst running an order of magnitude faster, simultaneously solving metric loop closures. Combining these outputs, we provide a framework capable of full mapping and localisation with radar in urban environments.
TL;DR: A complementary multi–modal sensor fusion approach is presented that improves the reliability of the pose estimation process for aerial robots by fusing visual–inertial (VIO) and thermal–inERTial (TIO) odometry estimates with a LiDAR odometry and mapping solution.
Abstract: Resilient pose estimation for autonomous systems, and especially small unmanned aerial robots, is one of the core capabilities required for these robots to perform their assigned tasks in a reliable and efficient manner. Different sensing modalities have been utilized for the robot pose estimation process, particularly in GPS–denied environments. However, as aerial robots are deployed in more complex environments, such as subterranean mines and tunnels, different sensing modalities can become degraded in different parts of the environment due to the diversity of sensor perception challenges presented in terms of both nature and condition of the operational environment. Motivated by this fact, in this work a complementary multi–modal sensor fusion approach is presented that improves the reliability of the pose estimation process for aerial robots by fusing visual–inertial (VIO) and thermal–inertial (TIO) odometry estimates with a LiDAR odometry and mapping solution. In particular, VIO/TIO estimates are utilized for providing robust priors for LiDAR pose estimation as well as for selectively propagating the LiDAR pose estimates when LiDAR pose estimation process becomes degenerate. The proposed approach is experimentally verified in a variety of subterranean environments as well as utilized during the competition run of the tunnel circuit of the DARPA Subterranean Challenge.
TL;DR: Cathias et al. as mentioned in this paper proposed a tightly-coupled Extended Kalman Filter (EKF) framework for IMU-only state estimation, which regresses 3D displacement estimates and its uncertainty.
Abstract: In this letter we propose a tightly-coupled Extended Kalman Filter framework for IMU-only state estimation. Strap-down IMU measurements provide relative state estimates based on IMU kinematic motion model. However the integration of measurements is sensitive to sensor bias and noise, causing significant drift within seconds. Recent research by Yan et al. (RoNIN) and Chen et al. (IONet) showed the capability of using trained neural networks to obtain accurate 2D displacement estimates from segments of IMU data and obtained good position estimates from concatenating them. This letter demonstrates a network that regresses 3D displacement estimates and its uncertainty, giving us the ability to tightly fuse the relative state measurement into a stochastic cloning EKF to solve for pose, velocity and sensor biases. We show that our network, trained with pedestrian data from a headset, can produce statistically consistent measurement and uncertainty to be used as the update step in the filter, and the tightly-coupled system outperforms velocity integration approaches in position estimates, and AHRS attitude filter in orientation estimates. Video materials and code can be found on our project page:http://cathias.github.io/TLIO/.
TL;DR: Li-OM (Livox LiDar-inertial odometry and mapping) is real-time capable and achieves superior accuracy over state-of-the-art systems for both LiDAR types on public data sets of mechanical LiD ARs and in experiments using the Livox Horizon.
Abstract: We present a novel tightly-coupled LiDAR-inertial odometry and mapping scheme
for both solid-state and mechanical LiDARs. As frontend, a feature-based
lightweight LiDAR odometry provides fast motion estimates for adaptive keyframe
selection. As backend, a hierarchical keyframe-based sliding window
optimization is performed through marginalization for directly fusing IMU and
LiDAR measurements. For the Livox Horizon, a newly released solid-state LiDAR,
a novel feature extraction method is proposed to handle its irregular scan
pattern during preprocessing. LiLi-OM (Livox LiDAR-inertial odometry and
mapping) is real-time capable and achieves superior accuracy over
state-of-the-art systems for both LiDAR types on public data sets of mechanical
LiDARs and in experiments using the Livox Horizon. Source code and recorded
experimental data sets are available at https://github.com/KIT-ISAS/lili-om.
TL;DR: ClusterVO as discussed by the authors uses a multi-level probabilistic association mechanism and a heterogeneous CRF clustering approach combining semantic, spatial and motion information to jointly infer cluster segmentations online for every frame.
Abstract: We present ClusterVO, a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects. Unlike previous solutions relying on batch input or imposing priors on scene structure or dynamic object models, ClusterVO is online, general and thus can be used in various scenarios including indoor scene understanding and autonomous driving. At the core of our system lies a multi-level probabilistic association mechanism and a heterogeneous Conditional Random Field (CRF) clustering approach combining semantic, spatial and motion information to jointly infer cluster segmentations online for every frame. The poses of camera and dynamic objects are instantly solved through a sliding-window optimization. Our system is evaluated on Oxford Multimotion and KITTI dataset both quantitatively and qualitatively, reaching comparable results to state-of-the-art solutions on both odometry and dynamic trajectory recovery.
TL;DR: It is shown that the network, trained with pedestrian data from a headset, can produce statistically consistent measurement and uncertainty to be used as the update step in the filter, and the tightly-coupled system outperforms velocity integration approaches in position estimates, and AHRS attitude filter in orientation estimates.
Abstract: In this work we propose a tightly-coupled Extended Kalman Filter framework for IMU-only state estimation. Strap-down IMU measurements provide relative state estimates based on IMU kinematic motion model. However the integration of measurements is sensitive to sensor bias and noise, causing significant drift within seconds. Recent research by Yan et al. (RoNIN) and Chen et al. (IONet) showed the capability of using trained neural networks to obtain accurate 2D displacement estimates from segments of IMU data and obtained good position estimates from concatenating them. This paper demonstrates a network that regresses 3D displacement estimates and its uncertainty, giving us the ability to tightly fuse the relative state measurement into a stochastic cloning EKF to solve for pose, velocity and sensor biases. We show that our network, trained with pedestrian data from a headset, can produce statistically consistent measurement and uncertainty to be used as the update step in the filter, and the tightly-coupled system outperforms velocity integration approaches in position estimates, and AHRS attitude filter in orientation estimates.
TL;DR: In this paper, the authors present the Oxford Inertial Odometry Data Set (OxIOD), a first-of-its-kind public data set for deep learning-based inertial navigation research with fine-grained ground truth on all sequences.
Abstract: Modern inertial measurements units (IMUs) are small, cheap, energy efficient, and widely employed in smart devices and mobile robots. Exploiting inertial data for accurate and reliable pedestrian navigation supports is a key component for emerging Internet of Things applications and services. Recently, there has been a growing interest in applying deep neural networks (DNNs) to motion sensing and location estimation. However, the lack of sufficient labelled data for training and evaluating architecture benchmarks has limited the adoption of DNNs in IMU-based tasks. In this article, we present and release the Oxford Inertial Odometry Data Set (OxIOD), a first-of-its-kind public data set for deep-learning-based inertial navigation research with fine-grained ground truth on all sequences. Furthermore, to enable more efficient inference at the edge, we propose a novel lightweight framework to learn and reconstruct pedestrian trajectories from raw IMU data. Extensive experiments show the effectiveness of our data set and methods in achieving accurate data-driven pedestrian inertial navigation on resource-constrained devices.
TL;DR: The experimental results showed that the method could obtain the pose estimation performance close to the state-of-the-art lidar odometry approach that has been currently utilized in underground coal mine, providing robust and precise localization estimation for CMR applications.
Abstract: Robotic mining equipment plays an increasingly important role in the coal mining industry. Due to the complexity of the confined underground environment, available localization methods are limited, and restrict the development of coal mine robots (CMRs). Ultra-wideband (UWB) is a promising positioning sensor with high ranging accuracy. However, current applications about UWB positioning in coal mine focus mainly on position information, but rarely on orientation information. Positioning accuracy is often plagued by the loss of transmitted signals and multipath effects. In this paper, a pseudo-GPS positioning system in underground coal mine, composed by noisy UWB range measurements, is proposed to provide localization service for CMRs. An Error-State Kalman Filter (ESKF) is used for fusing measurements from the inertial measurement unit (IMU) and the established UWB positioning system. Then the complete six degree of freedom (6-DOF) state estimation can be realized. Meanwhile the biases of the IMU and the translation parameters of IMU w.r.t. UWB mobile node are also estimated online to adapt to long-term operation in harsh underground environments. In addition, an UWB anchor optimal deployment strategy is discussed to deploy UWB nodes appropriately in the laneway, and maintain realistic positioning accuracy for CMR in the meantime. A large number of field tests in different environments including the actual underground coal mine were conducted. The experimental results showed that our method could obtain the pose estimation performance close to the state-of-the-art lidar odometry approach that has been currently utilized in underground coal mine, providing robust and precise localization estimation for CMR applications.
TL;DR: Li et al. as discussed by the authors developed a sliding-window filter based LiDAR-Inertial-Camera odometry with online spatio-temporal calibration, which introduces a novel slidingwindow plane-feature tracking for efficiently processing 3D LiDARS point clouds.
Abstract: Multi-sensor fusion of multi-modal measurements from commodity inertial, visual and LiDAR sensors to provide robust and accurate 6DOF pose estimation holds great potential in robotics and beyond. In this paper, building upon our prior work (i.e., LIC-Fusion), we develop a sliding-window filter based LiDAR-Inertial-Camera odometry with online spatiotemporal calibration (i.e., LIC-Fusion 2.0), which introduces a novel sliding-window plane-feature tracking for efficiently processing 3D LiDAR point clouds. In particular, after motion compensation for LiDAR points by leveraging IMU data, low-curvature planar points are extracted and tracked across the sliding window. A novel outlier rejection criteria is proposed in the plane-feature tracking for high quality data association. Only the tracked planar points belonging to the same plane will be used for plane initialization, which makes the plane extraction efficient and robust. Moreover, we perform the observability analysis for the IMU-LiDAR subsystem under consideration and report the degenerate cases for spatiotemporal calibration using plane features. While the estimation consistency and identified degenerate motions are validated in Monte-Carlo simulations, different real-world experiments are also conducted to show that the proposed LIC-Fusion 2.0 outperforms its predecessor and other state-of-the-art methods.
TL;DR: An efficient multi-sensor odometry system for mobile platforms that jointly optimizes visual, lidar, and inertial information within a single integrated factor graph that runs in real-time at full framerate using fixed lag smoothing is presented.
Abstract: We present an efficient multi-sensor odometry system for mobile platforms that jointly optimizes visual, lidar, and inertial information within a single integrated factor graph. This runs in real-time at full framerate using fixed lag smoothing. To perform such tight integration, a new method to extract 3D line and planar primitives from lidar point clouds is presented. This approach overcomes the suboptimality of typical frame-to-frame tracking methods by treating the primitives as landmarks and tracking them over multiple scans. True integration of lidar features with standard visual features and IMU is made possible using a subtle passive synchronization of lidar and camera frames. The lightweight formulation of the 3D features allows for real-time execution on a single CPU. Our proposed system has been tested on a variety of platforms and scenarios, including underground exploration with a legged robot and outdoor scanning with a dynamically moving handheld device, for a total duration of 96 min and 2.4 km traveled distance. In these test sequences, using only one exteroceptive sensor leads to failure due to either underconstrained geometry (affecting lidar) or textureless areas caused by aggressive lighting changes (affecting vision). In these conditions, our factor graph naturally uses the best information available from each sensor modality without any hard switches.
TL;DR: This paper presents a novel triple-channel deep IO network architecture based on the physical and mathematical models of IMUs that outperforms all the existing solutions on the IMU readings of the challenging Micro Aerial Vehicle dataset and improves the accuracy by approximately 25%.
Abstract: Inertial measurement units (IMUs) suffer from bias and measurement noise, which makes it much more complicated to tackle the problem of inertial odometry (IO). Due to the error propagation over time, while estimating robot position, an inaccurate estimation or a small error will cause the odometry and a localization system unreliable and unusable in a split of seconds. This paper presents a novel triple-channel deep IO network architecture based on the physical and mathematical models of IMUs. The proposed method simulates the noise model in the training phase and becomes robust to noise during testing. Besides, the proposed network architecture also considers the time interval between two consecutive IMU readings (sampling time) so that it is robust to the change of IMU frequency and the missing of IMU information. To the best of our knowledge, this paper is the first work reviewing and analyzing the existing IO methods used by the deep-learning-based visual-IO approaches. The proposed network architecture outperforms all the existing solutions on the IMU readings of the challenging Micro Aerial Vehicle dataset and improves the accuracy by approximately 25%.
TL;DR: A novel framework for the implementation of next-generation visual odometry based on additional high-dimensional features, which have not been implemented in the relevant applications are presented.
Abstract: Monocular visual odometry provides more robust functions on navigation and obstacle avoidance for mobile robots than other visual odometries, such as binocular visual odometry, RGB-D visual odometry and basic odometry. This paper describes the problem of visual odometry and also determines the relationships between visual odometry and visual simultaneous localization and mapping (SLAM). The basic principle of visual odometry is expressed in the form of mathematics, specifically by incrementally solving the pose changes of two series of frames and further improving the odometry through global optimization. After analyzing the three main ways of implementing visual odometry, the state-of-the-art monocular visual odometries, including ORB-SLAM2, DSO and SVO, are also analyzed and compared in detail. The issues of robustness and real-time operations, which are generally of interest in the current visual odometry research, are discussed from the future development of the directions and trends. Furthermore, we present a novel framework for the implementation of next-generation visual odometry based on additional high-dimensional features, which have not been implemented in the relevant applications.
TL;DR: The first-of-its-kind dataset, LIBRE as discussed by the authors, is a dataset of 10 different LiDAR sensors, covering a range of manufacturers, models, and laser configurations.
Abstract: In this work, we present LIBRE: LiDAR Benchmarking and Reference, a first-of-its-kind dataset featuring 10 different LiDAR sensors, covering a range of manufacturers, models, and laser configurations. Data captured independently from each sensor includes three different environments and configurations: static targets, where objects were placed at known distances and measured from a fixed position within a controlled environment; adverse weather, where static obstacles were measured from a moving vehicle, captured in a weather chamber where LiDARs were exposed to different conditions (fog, rain, strong light); and finally, dynamic traffic, where dynamic objects were captured from a vehicle driven on public urban roads, multiple times at different times of the day, and including supporting sensors such as cameras, infrared imaging, and odometry devices. LIBRE will contribute to the research community to (1) provide a means for a fair comparison of currently available LiDARs, and (2) facilitate the improvement of existing self-driving vehicles and robotics-related software, in terms of development and tuning of LiDAR-based perception algorithms.
TL;DR: This work focuses on unsupervised learning for LiDAR odometry (LO) without trainable labels, and introduces the uncertainty-aware loss with geometric confidence, thereby al-lowing the reliability of the proposed pipeline.
Abstract: Learning-based ego-motion estimation approaches have recently drawn strong interest from researchers, mostly focusing on visual perception. A few learning-based approaches using Light Detection and Ranging (LiDAR) have been re-ported; however, they heavily rely on a supervised learning manner. Despite the meaningful performance of these approaches, supervised training requires ground-truth pose labels, which is the bottleneck for real-world applications. Differing from these approaches, we focus on unsupervised learning for LiDAR odometry (LO) without trainable labels. Achieving trainable LO in an unsupervised manner, we introduce the uncertainty-aware loss with geometric confidence, thereby al-lowing the reliability of the proposed pipeline. Evaluation on the KITTI, Complex Urban, and Oxford RobotCar datasets demonstrate the prominent performance of the proposed method compared to conventional model-based methods. The proposed method shows a comparable result against SuMa (in KITTI), LeGO-LOAM (in Complex Urban), and Stereo-VO (in Oxford RobotCar). The video and extra-information of the paper are described in https://sites.google.com/view/deeplo.
TL;DR: This paper reports on a method using direct radar odometry, PhaRaO, which infers relative motion from a pair of radar scans via phase correlation, and applies the Fourier Mellin transform for Cartesian and log-polar radar images to sequentially estimate rotation and translation.
Abstract: Recent studies in radar-based navigation present promising navigation performance using scanning radars. These scanning radar-based odometry methods are mostly feature-based; they detect and match salient features within a radar image. Differing from existing feature-based methods, this paper reports on a method using direct radar odometry, PhaRaO, which infers relative motion from a pair of radar scans via phase correlation. Specifically, we apply the Fourier Mellin transform (FMT) for Cartesian and log-polar radar images to sequentially estimate rotation and translation. In doing so, we decouple rotation and translation estimations in a coarse-to-fine manner to achieve real-time performance. The proposed method is evaluated using large-scale radar data obtained from various environments. The inferred trajectory yields a 2.34% (translation) and 2.93° (rotation) Relative Error (RE) over a 4km path length on average for the odometry estimation.
TL;DR: This paper presents a lightweight and real-time visual semantic SLAM framework running on board aerial robotic platforms that combines low-level visual/visual-inertial odometry (VO/VIO) along with geometrical information corresponding to planar surfaces extracted from detected semantic objects.
Abstract: Indoor environments have abundant presence of high-level semantic information which can provide a better understanding of the environment for robots to improve the uncertainty in their pose estimate. Although semantic information has proved to be useful, there are several challenges faced by the research community to accurately perceive, extract and utilize such semantic information from the environment. In order to address these challenges, in this paper we present a lightweight and real-time visual semantic SLAM framework running on board aerial robotic platforms. This novel method combines low-level visual/visual-inertial odometry (VO/VIO) along with geometrical information corresponding to planar surfaces extracted from detected semantic objects. Extracting the planar surfaces from selected semantic objects provides enhanced robustness and makes it possible to precisely improve the metric estimates rapidly, simultaneously generalizing to several object instances irrespective of their shape and size. Our graph-based approach can integrate several state of the art VO/VIO algorithms along with the state of the art object detectors in order to estimate the complete 6DoF pose of the robot while simultaneously creating a sparse semantic map of the environment. No prior knowledge of the objects is required, which is a significant advantage over other works. We test our approach on a standard RGB-D dataset comparing its performance with the state of the art SLAM algorithms. We also perform several challenging indoor experiments validating our approach in presence of distinct environmental conditions and furthermore test it on board an aerial robot. Video:https://vimeo.com/368217703Released Code:https://bitbucket.org/hridaybavle/semantic_slam.git.