Journal Article, DOI: 10.1109/iccv51070.2023.00204
DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models
Shengqu Cai,Eric Ryan Chan,Songyou Peng,Mohamad Shahbazi,Anton Obukhov,Luc Van Gool,Gordon Wetzstein +6 more
- 01 Oct 2023
9
TL;DR: DiffDreamer is an unsupervised framework for long-range scene extrapolation using image-conditioned diffusion models. It generates novel views depicting a long camera trajectory while training solely on internet-collected images.
Abstract: Scene extrapolation, the idea of generating novel views by flying into a given image, is a promising yet challenging task. For each predicted frame, a joint inpainting and 3D refinement problem has to be solved, which is ill-posed and highly ambiguous. Moreover, training data for long-range scenes is difficult to obtain and usually lacks sufficient views to infer accurate camera poses. We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. Utilizing the stochastic nature of the guided denoising steps, we train the diffusion models to refine projected RGBD images but condition the denoising steps on multiple past and future frames for inference. We demonstrate that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving consistency significantly better than prior GAN-based methods. DiffDreamer is a powerful and efficient solution for scene extrapolation, producing impressive results despite limited supervision. Project page: https://primecai.github.io/diffdreamer.
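The abstract mentions refining "projected RGBD images". As a rough illustration of what such a projection could look like, the NumPy sketch below forward-warps an RGBD image into a new camera pose and reports which target pixels received no source point. It is not the authors' implementation: the function name `warp_rgbd`, the pinhole intrinsics `K`, and the pose convention (`R`, `t` map source-camera coordinates to target-camera coordinates) are all assumptions made for this example. The unfilled pixels are the kind of holes a refinement model would have to inpaint.

```python
import numpy as np

def warp_rgbd(rgb, depth, K, R, t):
    """Forward-project an RGBD image into a new camera (illustrative sketch).

    rgb:   (H, W, 3) array of colors
    depth: (H, W) array of positive depths along the camera z-axis
    K:     (3, 3) pinhole intrinsics (assumed shared by both views)
    R, t:  (3, 3) rotation and (3,) translation, source -> target camera
    Returns (warped_rgb, mask); mask is True where a source point landed.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Back-project every pixel to a 3D point in the source camera frame.
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)

    # Move the points into the target camera frame and project them.
    pts_t = R @ pts + t.reshape(3, 1)
    z = pts_t[2]
    valid = z > 1e-6
    z_safe = np.where(valid, z, 1.0)
    proj = K @ pts_t
    u_t = np.round(proj[0] / z_safe).astype(int)
    v_t = np.round(proj[1] / z_safe).astype(int)
    valid &= (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)

    # Z-buffer: for each target pixel keep only the nearest source point.
    idx = np.flatnonzero(valid)
    flat = v_t[idx] * W + u_t[idx]
    order = np.lexsort((z[idx], flat))  # primary key: pixel, secondary: depth
    uniq, first = np.unique(flat[order], return_index=True)
    winners = idx[order][first]

    out = np.zeros((H * W, 3), dtype=rgb.dtype)
    mask = np.zeros(H * W, dtype=bool)
    out[uniq] = rgb.reshape(-1, 3)[winners]
    mask[uniq] = True
    return out.reshape(H, W, 3), mask.reshape(H, W)
```

With an identity pose the warp reproduces the input and the mask is fully set. Moving the camera forward spreads the source points apart, so the mask develops holes. A point-splatting warp like this is only one of several possible projection schemes; mesh-based rendering is a common alternative.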
Citations
State of the Art on Diffusion Models for Visual Computing
Riccardo Pò,Yifan Wang,Vladislav Golyanik,Kfir Aberman,Jonathan T. Barron,Amit H. Bermano,Edwin P. Chan,Tali Dekel,Aleksander Holynski,Angjoo Kanazawa,Chunlei Liu,L. Liu,Ben Mildenhall,Matthias Nießner,Björn Ommer,Christian Theobalt,Peter Wonka,Gordon Wetzstein +17 more
TL;DR: Diffusion models are the generative AI architecture of choice for visual computing, enabling image, video, and 3D scene generation, editing, and reconstruction. The field is rapidly advancing with new works appearing daily on arXiv.
20
WonderJourney: Going from Anywhere to Everywhere
Hong-Xing Yu,Haoyi Duan,Junhwa Hur,Kyle Sargent,Michael Rubinstein,William T. Freeman,Forrester Cole,Deqing Sun,Noah Snavely,Jiajun Wu,Charles Herrmann +10 more
- 16 Jun 2024
5
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Shengqu Cai,Duygu Ceylan,Matheus Gadelha,Chun-Hao P. Huang,Tuanfeng Y. Wang,Gordon Wetzstein +5 more
- 16 Jun 2024
1
NeRF-Enhanced Outpainting for Faithful Field-of-View Extrapolation
Rui Yu,Jiachen Liu,Zihan Zhou,Xiaolei Huang +3 more
- 13 May 2024
TL;DR: This paper presents NeRF-Enhanced Outpainting (NEO), a method for faithful field-of-view extrapolation using pre-captured images and NeRF-generated extended-FOV images to train a scene-specific image outpainting model, evaluated on four datasets with robust results.
DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
Yuru Jia,Lukas Hoyer,Shengyu Huang,Tianfu Wang,Luc Van Gool,Konrad Schindler,Anton Obukhov +6 more
References
Image quality assessment: from error visibility to structural similarity
TL;DR: This article proposes a structural similarity index for image quality assessment based on the degradation of structural information, and compares it against both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola,Jun-Yan Zhu,Tinghui Zhou,Alexei A. Efros +3 more
- 21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
•Posted Content
Rethinking Atrous Convolution for Semantic Image Segmentation
TL;DR: The proposed 'DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
9.9K
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Zhang,Phillip Isola,Alexei A. Efros,Eli Shechtman,Oliver Wang +5 more
- 11 Jan 2018
TL;DR: In this paper, the authors introduce a new dataset of human perceptual similarity judgments, and systematically evaluate deep features across different architectures and tasks and compare them with classic metrics, finding that deep features outperform all previous metrics by large margins on their dataset.
Structure-from-Motion Revisited
Johannes L. Schonberger,Jan-Michael Frahm +1 more
- 27 Jun 2016
TL;DR: This work proposes a new SfM technique that improves upon the state of the art to make a further step towards building a truly general-purpose pipeline.