Self-Supervised Multiscale Adversarial Regression Network for Stereo Disparity Estimation

doi:10.1109/TCYB.2020.2999492

Journal Article•10.1109/TCYB.2020.2999492

Chen Wang, +7 more

- 01 Oct 2021

- IEEE Transactions on Systems, Man, and C...

- Vol. 51, Iss: 10, pp 4770-4783

58

TL;DR: This paper proposes a novel deep stereo approach, the “self-supervised multiscale adversarial regression network (SMAR-Net),” which relaxes the need for ground-truth depth maps for training, outperforms the current state-of-the-art self-supervised methods, and achieves results comparable to supervised methods.

Abstract: Deep learning approaches have significantly contributed to recent progress in stereo matching. These deep stereo matching methods are usually based on supervised training, which requires a large amount of high-quality ground-truth depth map annotations that are expensive to collect. Furthermore, only a limited quantity of stereo vision training data is currently available, obtained either by active sensors (Lidar and ToF cameras) or through computer graphics simulations, and it does not meet the requirements for deep supervised training. Here, we propose a novel deep stereo approach called the “self-supervised multiscale adversarial regression network (SMAR-Net),” which relaxes the need for ground-truth depth maps for training. Specifically, we design a two-stage network. The first stage is a disparity regressor, in which a regression network estimates disparity values from stacked stereo image pairs. The stereo image stacking method is a novel contribution, as it not only contains the spatial appearances of stereo images but also implies matching correspondences with different disparity values. In the second stage, a synthetic left image is generated based on the left–right consistency assumption. Our network is trained by minimizing a hybrid loss function composed of a content loss and an adversarial loss. The content loss minimizes the average warping error between the synthetic images and the real ones. In contrast to the generative adversarial loss, our proposed adversarial loss penalizes mismatches using multiscale features. This constrains the synthetic and real images to be pixelwise identical instead of just belonging to the same distribution. Furthermore, the combined use of multiscale feature extraction in both the content loss and the adversarial loss further improves the adaptability of SMAR-Net in ill-posed regions. Experiments on multiple benchmark datasets show that SMAR-Net outperforms the current state-of-the-art self-supervised methods and achieves comparable outcomes to supervised methods. The source code can be accessed at: https://github.com/Dawnstar8411/SMAR-Net.
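
The second stage and the content loss described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (that is in the repository linked above); it is a minimal illustration that assumes rectified grayscale images, a left-view disparity map, linear interpolation along each row, and a mean absolute difference as the "average warping error". The function names `warp_right_to_left` and `mean_warping_error` are hypothetical.

```python
import numpy as np


def warp_right_to_left(right, disparity):
    """Synthesize a left view by sampling the right image at x - d(x, y).

    Assumes rectified stereo, so matches lie on the same row.
    right and disparity are 2-D arrays of shape (H, W).
    """
    h, w = disparity.shape
    # Source column in the right image for every left pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)  # clamp samples that fall outside the image
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation between the two nearest columns.
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]


def mean_warping_error(left, synthetic_left):
    """Mean absolute difference between the real and synthetic left images."""
    return float(np.mean(np.abs(left - synthetic_left)))


# Toy example: the left image is the right image shifted by 3 pixels.
rng = np.random.default_rng(0)
right = rng.random((4, 16))
left = right[:, np.clip(np.arange(16) - 3, 0, 15)]

good = warp_right_to_left(right, np.full((4, 16), 3.0))
bad = warp_right_to_left(right, np.zeros((4, 16)))
print(mean_warping_error(left, good), mean_warping_error(left, bad))
```

In this toy setting the correct disparity reproduces the left image, while a wrong disparity leaves a nonzero error. That gap is the signal a self-supervised method can minimize without ground-truth depth. The multiscale adversarial term from the abstract is not covered by this sketch.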

Citations

Journal Article•10.1016/J.DISPLA.2021.102053

Review of multi-view 3D object recognition methods based on deep learning

Shaohua Qi, +7 more

- 01 Sep 2021

- Displays

TL;DR: A comprehensive review and classification of the latest developments in the deep learning methods for multi-view 3D object recognition is presented, which summarizes the results of these methods on a few mainstream datasets, provides an insightful summary, and puts forward enlightening future research directions.

197

Journal Article•10.1016/j.patcog.2021.108498

Uncertainty estimation for stereo matching based on evidential deep learning

- 01 Apr 2022

- Pattern Recognition

TL;DR: This paper proposes a novel approach to estimate both aleatoric and epistemic uncertainties for stereo matching in an end-to-end way, where the uncertainty parameters are predicted for each potential disparity and then averaged under the guidance of the matching probability distribution.

140

Journal Article•10.1016/J.DISPLA.2021.102076

Voxel-based three-view hybrid parallel network for 3D object classification

Weiwei Cai, +4 more

- 01 Sep 2021

- Displays

TL;DR: This paper proposes a voxel-based three-view hybrid parallel network for 3D shape classification, which first obtains depth projection views of the three-dimensional model from the front, top, and side views, and outputs a predicted probability value for the category of the 3D model.

87

Journal Article•10.1016/J.DISPLA.2021.102102

Multi-view stereo in the Deep Learning Era: A comprehensive review

Xiang Wang, +6 more

- 01 Dec 2021

- Displays

TL;DR: This paper presents a comprehensive review of recent deep learning methods for multi-view stereo. The methods are mainly categorized into depth map based and volumetric based methods according to the 3D representation form, and representative methods are reviewed in detail.

78

Journal Article•10.1016/J.DISPLA.2021.102080

A brief survey on RGB-D semantic segmentation using deep learning

Changshuo Wang, +3 more

- 01 Dec 2021

- Displays

TL;DR: A comprehensive analysis of recent methods for RGB-D semantic segmentation is carried out, following the research progress of recent years.

40
