Open Access | Posted Content
PatchmatchNet: Learned Multi-View Patchmatch Stereo
TL;DR: For the first time, an iterative multi-scale Patchmatch in an end-to-end trainable architecture is introduced and the Patchmatch core algorithm is improved with a novel and learned adaptive propagation and evaluation scheme for each iteration.
Abstract: We present PatchmatchNet, a novel and learnable cascade formulation of Patchmatch for high-resolution multi-view stereo. With high computation speed and a low memory requirement, PatchmatchNet can process higher-resolution imagery and is better suited to resource-limited devices than competitors that employ 3D cost volume regularization. For the first time, we introduce an iterative multi-scale Patchmatch in an end-to-end trainable architecture and improve the Patchmatch core algorithm with a novel, learned adaptive propagation and evaluation scheme for each iteration. Extensive experiments show very competitive performance and generalization for our method on DTU, Tanks & Temples and ETH3D, with significantly higher efficiency than all existing top-performing models: at least two and a half times faster than state-of-the-art methods, with half the memory usage.
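For readers unfamiliar with the underlying idea, the following is a minimal sketch of a classic, non-learned Patchmatch-style loop over per-pixel depth hypotheses: random initialization, propagation of hypotheses from neighbouring pixels, a shrinking random perturbation, and evaluation that keeps the lowest-cost candidate. It is not the learned adaptive propagation and evaluation scheme of PatchmatchNet. The function names (`patchmatch_toy`, `mean_cost`), the fixed 4-neighbourhood, and the toy cost function are assumptions made for illustration; a real multi-view stereo system would use a photometric or feature-matching cost across views.

```python
import random


def patchmatch_toy(cost, height, width, d_min, d_max, iterations=4, seed=0):
    """Classic Patchmatch-style search for a per-pixel depth map (toy version).

    `cost(y, x, d)` is a hypothetical stand-in for a multi-view matching cost.
    """
    rng = random.Random(seed)
    # Initialization: one random depth hypothesis per pixel.
    depth = [[rng.uniform(d_min, d_max) for _ in range(width)]
             for _ in range(height)]
    span = (d_max - d_min) / 2.0
    for _ in range(iterations):
        for y in range(height):
            for x in range(width):
                # The current hypothesis always stays in the candidate set,
                # so the per-pixel cost can never increase.
                candidates = [depth[y][x]]
                # Propagation: borrow hypotheses from the four neighbours.
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        candidates.append(depth[ny][nx])
                # Perturbation: a random refinement, clamped to the depth range.
                perturbed = depth[y][x] + rng.uniform(-span, span)
                candidates.append(min(max(perturbed, d_min), d_max))
                # Evaluation: keep the candidate with the lowest cost.
                depth[y][x] = min(candidates, key=lambda d: cost(y, x, d))
        span /= 2.0  # Narrow the perturbation range after each iteration.
    return depth


def mean_cost(cost, depth):
    """Average cost of a depth map under the given cost function."""
    total = sum(cost(y, x, d)
                for y, row in enumerate(depth)
                for x, d in enumerate(row))
    return total / (len(depth) * len(depth[0]))


# Toy scene: a slanted plane whose true depth grows with the column index.
def toy_cost(y, x, d):
    return abs(d - (2.0 + 0.1 * x))


result = patchmatch_toy(toy_cost, height=6, width=8, d_min=1.0, d_max=4.0)
print("mean cost after search:", mean_cost(toy_cost, result))
```

Because each pixel keeps its current hypothesis among the candidates, the cost is non-increasing across iterations. The abstract's contribution is to replace the fixed neighbourhood and hand-crafted evaluation in a loop of this kind with learned, adaptive counterparts inside a trainable multi-scale cascade.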
Citations
BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection
Yinhao Li,Zheng Ge,Guanyi Yu,Jinrong Yang,Zengran Wang,Yukang Shi,Jian‐Yuan Sun,Zeming Li +7 more
- 21 Jun 2022
TL;DR: Without any bells and whistles, BEVDepth achieves a new state-of-the-art 60.0% NDS on the challenging nuScenes test set while maintaining high efficiency, and for the first time the performance gap between camera and LiDAR is reduced to within 10% NDS.
366
Attention Concatenation Volume for Accurate and Efficient Stereo Matching
Gangwei Xu,Junda Cheng,Peng Guo,Xin Yang +3 more
- 04 Mar 2022
TL;DR: A novel cost volume construction method which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume is presented.
122
BEVStereo: Enhancing Depth Estimation in Multi-View 3D Object Detection with Temporal Stereo
TL;DR: In this paper, the authors propose an effective method for creating temporal stereo by dynamically determining the center and range of the temporal stereo; the most confident center is found using the EM algorithm.
RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo
01 Jun 2022
TL;DR: RayMVSNet directly optimizes the depth value along each camera ray, mimicking the range (depth) finding of a laser scanner, which reduces the MVS problem to ray-based depth optimization that is much more lightweight than full cost volume optimization.
Vis-MVSNet: Visibility-Aware Multi-view Stereo Network
TL;DR: This paper explicitly infers and integrates pixel-wise occlusion information in the MVS network via matching uncertainty estimation; the uncertainty is jointly inferred with the pair-wise depth map and is further used as weighting guidance during multi-view cost volume fusion.
107