Journal Article · DOI: 10.48550/arXiv.2212.00792
SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction
118
TL;DR: SparseFusion reconstructs 3D from sparse views by distilling a 3D-consistent scene representation from a view-conditioned latent diffusion model, yielding a plausible 3D representation whose renderings are both accurate and realistic.
Abstract: We propose SparseFusion, a sparse-view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with re-projected features but fail to generate unseen regions or handle uncertainty under large viewpoint changes. Alternative methods treat this as a (probabilistic) 2D synthesis task, and while they can generate plausible 2D images, they do not infer a consistent underlying 3D. However, we find that this trade-off between 3D consistency and probabilistic image generation does not need to exist. In fact, we show that geometric consistency and generative inference can be complementary in a mode-seeking behavior. By distilling a 3D-consistent scene representation from a view-conditioned latent diffusion model, we are able to recover a plausible 3D representation whose renderings are both accurate and realistic. We evaluate our approach across 51 categories in the CO3D dataset and show that it outperforms existing methods, in both distortion and perception metrics, for sparse-view novel view synthesis.
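The idea of distilling a 3D representation from a diffusion model can be illustrated with a toy version of score distillation sampling (SDS), the objective introduced in DreamFusion. This is a swapped-in, simplified technique, not SparseFusion's own view-conditioned distillation: there is no view conditioning, the "renderer" is assumed to be the identity, and the diffusion model is replaced by the exact noise predictor for an assumed Gaussian prior N(mu, sigma^2 I). All names are hypothetical.

```python
import numpy as np

def optimal_eps(x_t, alpha_bar, mu, sigma):
    # Exact E[eps | x_t] for a toy Gaussian prior x0 ~ N(mu, sigma^2 I)
    # under x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.
    # It stands in for a trained (view-conditioned) noise predictor.
    a, b = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)
    return b * (x_t - a * mu) / (a**2 * sigma**2 + b**2)

def sds_distill(theta, mu, sigma, steps=2000, lr=0.05, seed=0):
    # Toy SDS loop: optimize parameters so their "rendering" looks likely under the prior.
    rng = np.random.default_rng(seed)
    theta = np.array(theta, dtype=float)
    for _ in range(steps):
        alpha_bar = rng.uniform(0.02, 0.98)      # random noise level
        eps = rng.standard_normal(theta.shape)   # injected noise
        x = theta                                # assumed renderer: image == parameters
        x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
        eps_hat = optimal_eps(x_t, alpha_bar, mu, sigma)
        # SDS update with unit weighting w(t) = 1; the renderer Jacobian is the identity.
        theta = theta - lr * (eps_hat - eps)
    return theta

mu = np.array([1.0, -2.0, 0.5, 3.0])
theta = sds_distill(np.zeros(4), mu, sigma=0.1)
print(np.round(theta, 2))
```

With a unimodal prior the optimized parameters settle near mu, the prior's mode, rather than behaving like random samples; this is a toy picture of the kind of mode-seeking behavior the abstract mentions.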
Citations
Zero-1-to-3: Zero-shot One Image to 3D Object
TL;DR: Zero-1-to-3 is a framework for changing the camera viewpoint of an object given only a single RGB image, exploiting the geometric priors that large-scale diffusion models learn about natural images.
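Zero-1-to-3 conditions its diffusion model on the relative camera change between the input and target views. A minimal sketch of such a conditioning vector follows, assuming a spherical (polar, azimuth, radius) camera parameterization in degrees; the function name and conventions are illustrative assumptions, not code from the paper.

```python
import math

def relative_view_embedding(src, tgt):
    """Encode the camera change from src to tgt.

    src, tgt: (polar_deg, azimuth_deg, radius) tuples, an assumed spherical
    camera parameterization around the object (hypothetical convention).
    """
    d_polar = math.radians(tgt[0] - src[0])
    d_azimuth = math.radians(tgt[1] - src[1])
    d_radius = tgt[2] - src[2]
    # Azimuth wraps around, so it is encoded with sin/cos to avoid a jump at 360 degrees.
    return [d_polar, math.sin(d_azimuth), math.cos(d_azimuth), d_radius]

# A quarter turn around the object at fixed elevation and distance.
print(relative_view_embedding((30.0, 0.0, 1.5), (30.0, 90.0, 1.5)))
```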
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization
TL;DR: One-2-3-45 uses a view-conditioned 2D diffusion model, Zero123, to generate multi-view images of the input view and then lifts them into 3D space.
SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, Wenping Wang
TL;DR: Experiments show that SyncDreamer generates images that are highly consistent across views, making it well suited to 3D generation tasks such as novel-view synthesis, text-to-3D, and image-to-3D.
HexPlane: A Fast Representation for Dynamic Scenes
Ang Cao, Justin Johnson
TL;DR: HexPlane computes features for points in spacetime by fusing vectors extracted from each of its six feature planes; the representation is highly efficient and can model dynamic 3D scenes.
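The plane-fusion step can be sketched with NumPy: a space-time point (x, y, z, t) is projected onto six 2D feature grids, each grid is sampled bilinearly, and the samples are fused. The pairing of each spatial plane with its complementary space-time plane, fused by elementwise product and then concatenated, reflects our reading of the approach; grid sizes, names, and the omission of any learned decoding applied afterward are assumptions of this sketch.

```python
import numpy as np

def bilinear(plane, u, v):
    # plane: (H, W, C) feature grid; (u, v) in [0, 1] are continuous coordinates.
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * plane[y0, x0] + wx * plane[y0, x1]
    bot = (1 - wx) * plane[y1, x0] + wx * plane[y1, x1]
    return (1 - wy) * top + wy * bot

def hexplane_feature(planes, x, y, z, t):
    # planes: dict of six (H, W, C) grids keyed by axis pair (hypothetical layout).
    # Each spatial plane is paired with its complementary space-time plane.
    pairs = [("xy", (x, y), "zt", (z, t)),
             ("xz", (x, z), "yt", (y, t)),
             ("yz", (y, z), "xt", (x, t))]
    # Fuse each pair by elementwise product, then concatenate the three results.
    feats = [bilinear(planes[p], *pc) * bilinear(planes[q], *qc)
             for p, pc, q, qc in pairs]
    return np.concatenate(feats)

rng = np.random.default_rng(0)
planes = {k: rng.standard_normal((8, 8, 4)) for k in ["xy", "xz", "yz", "zt", "yt", "xt"]}
f = hexplane_feature(planes, 0.3, 0.7, 0.5, 0.1)
print(f.shape)  # (12,)
```

Because every lookup is a 2D interpolation rather than a pass through a large network, querying a point stays cheap, which is the efficiency argument the TL;DR alludes to.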
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long, Yuanchen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, Wenping Wang
TL;DR: Wonder3D efficiently generates high-fidelity textured meshes from single-view images. It uses a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images, improving the quality, consistency, and efficiency of image-to-3D tasks.
References
Deep Residual Learning for Image Recognition
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
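The residual idea can be sketched in a few lines: a block adds its input back to the output of a learned branch F, so representing the identity mapping only requires driving F toward zero. This sketch uses dense layers in place of the convolutions and batch normalization used in the paper; the shapes and names are arbitrary choices for illustration.

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = ReLU(F(x) + x), with F a two-layer perceptron (a dense stand-in for conv layers)."""
    h = np.maximum(0.0, x @ W1)    # first layer + ReLU
    f = h @ W2                     # second layer, no activation before the addition
    return np.maximum(0.0, f + x)  # identity shortcut, then ReLU

rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal((2, 8)))  # non-negative input, e.g. output of an earlier ReLU
W1 = rng.standard_normal((8, 16)) * 0.1
W2 = np.zeros((16, 8))                   # residual branch switched off
y = residual_block(x, W1, W2)
print(np.allclose(y, x))  # True
```

With the branch zeroed the block passes its input through unchanged, which is why stacking more such blocks need not make optimization harder.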
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger, Philipp Fischer, Thomas Brox
05 Oct 2015
TL;DR: Ronneberger et al. proposed a network and training strategy that relies on strong data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopy stacks.
Denoising Diffusion Probabilistic Models
TL;DR: High-quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.
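The forward (noising) process of a diffusion probabilistic model has a closed form, so any noise level can be sampled directly from clean data during training, and a network is then trained to predict the injected noise. A minimal NumPy sketch, assuming the linear variance schedule from 1e-4 to 0.02 over 1000 steps; the function names are illustrative, and `t` is a 0-based index here.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Noise variances beta_1..beta_T increasing linearly.
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, rng):
    # Sample x_t ~ q(x_t | x_0) in closed form:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # alpha_bar_t = product over s <= t of (1 - beta_s)
rng = np.random.default_rng(0)
x0 = np.ones(100_000)
x_t, eps = q_sample(x0, 999, alpha_bar, rng)
print(alpha_bar[-1], x_t.std())
```

At the last step alpha_bar is close to zero, so x_t is almost pure Gaussian noise regardless of x0; the returned `eps` is the regression target for a noise-prediction network.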
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
TL;DR: The paper introduces a new dataset of human perceptual similarity judgments and finds that deep features outperform all previous metrics by large margins on it, suggesting that perceptual similarity is an emergent property shared across deep visual representations.
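This paper introduced LPIPS, a learned perceptual distance of the kind reported as a "perception metric" in novel view synthesis. Its core computation unit-normalizes deep features per spatial location, weights channel differences, squares and sums them over channels, averages over space, and sums over layers. The sketch below assumes the feature maps are already extracted (no real network is loaded) and uses hypothetical uniform channel weights in place of the learned ones.

```python
import numpy as np

def lpips_style_distance(feats_a, feats_b, weights, eps=1e-10):
    # feats_*: list of (H_l, W_l, C_l) activations taken from the same layers for two images.
    # weights: list of (C_l,) non-negative per-channel weights (learned in LPIPS, assumed here).
    total = 0.0
    for fa, fb, w in zip(feats_a, feats_b, weights):
        # Unit-normalize each spatial feature vector along the channel axis.
        na = fa / (np.linalg.norm(fa, axis=-1, keepdims=True) + eps)
        nb = fb / (np.linalg.norm(fb, axis=-1, keepdims=True) + eps)
        # Weight channel differences, square, sum over channels, average over space.
        diff = (w * (na - nb)) ** 2
        total += diff.sum(axis=-1).mean()
    return total

rng = np.random.default_rng(0)
# Hypothetical activations from two layers for two images.
feats_x = [rng.standard_normal((4, 4, 3)), rng.standard_normal((2, 2, 5))]
feats_y = [rng.standard_normal((4, 4, 3)), rng.standard_normal((2, 2, 5))]
weights = [np.ones(3), np.ones(5)]
print(lpips_style_distance(feats_x, feats_y, weights))
```

The per-location normalization makes the distance insensitive to the overall scale of activations, so it compares the direction of feature vectors rather than their magnitude.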
Structure-from-Motion Revisited
Johannes L. Schönberger, Jan-Michael Frahm
27 Jun 2016
TL;DR: This work proposes a new structure-from-motion (SfM) technique that improves on the state of the art, a further step toward a truly general-purpose SfM pipeline.