DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models

doi:10.1109/iccv51070.2023.00204

Journal Article•10.1109/iccv51070.2023.00204

Shengqu Cai, +6 more

- 01 Oct 2023

9

TL;DR: DiffDreamer is an unsupervised framework for long-range scene extrapolation using image-conditioned diffusion models. It generates novel views depicting a long camera trajectory while training solely on internet-collected images.

Abstract: Scene extrapolation—the idea of generating novel views by flying into a given image—is a promising, yet challenging task. For each predicted frame, a joint inpainting and 3D refinement problem has to be solved, which is ill posed and includes a high level of ambiguity. Moreover, training data for long-range scenes is difficult to obtain and usually lacks sufficient views to infer accurate camera poses. We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. Utilizing the stochastic nature of the guided denoising steps, we train the diffusion models to refine projected RGBD images but condition the denoising steps on multiple past and future frames for inference. We demonstrate that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving consistency significantly better than prior GAN-based methods. DiffDreamer is a powerful and efficient solution for scene extrapolation, producing impressive results despite limited supervision. Project page: https://primecai.github.io/diffdreamer.
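The abstract mentions refining "projected RGBD images". As a rough illustration of what such a projection step can look like, the sketch below forward-warps an RGBD image into a new camera and reports which target pixels are left empty. It is not the authors' implementation: the function name `warp_rgbd`, the pinhole intrinsics `K`, the rigid transform `T`, and the nearest-pixel splatting with a z-buffer are all assumptions chosen for brevity.

```python
import numpy as np

def warp_rgbd(rgb, depth, K, T):
    """Forward-warp an RGBD image into a target camera (hypothetical helper).

    rgb:   (H, W, 3) colors
    depth: (H, W) positive depths along the source camera's z axis
    K:     (3, 3) pinhole intrinsics, assumed shared by both cameras
    T:     (4, 4) rigid transform from source-camera to target-camera coordinates
    Returns (warped_rgb, warped_depth, mask); mask is False where no point landed.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Unproject every pixel to a 3D point in the source camera frame.
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)   # (3, N)
    # Move the points into the target camera frame.
    pts = T[:3, :3] @ pts + T[:3, 3:4]
    z = pts[2]
    front = z > 1e-6                                          # drop points behind the camera

    # Project back to pixel coordinates (nearest-pixel splatting).
    proj = K @ pts
    zs = np.where(front, z, 1.0)                              # avoid dividing by ~0
    u2 = np.rint(proj[0] / zs).astype(np.int64)
    v2 = np.rint(proj[1] / zs).astype(np.int64)
    ok = front & (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)

    # Z-buffer: for each target pixel keep the nearest point.
    flat = (v2 * W + u2)[ok]
    z_ok = z[ok]
    zbuf = np.full(H * W, np.inf)
    np.minimum.at(zbuf, flat, z_ok)
    winner = z_ok <= zbuf[flat]

    out_rgb = np.zeros((H * W, 3), dtype=rgb.dtype)
    out_rgb[flat[winner]] = rgb.reshape(-1, 3)[ok][winner]

    mask = np.isfinite(zbuf)
    out_depth = np.where(mask, zbuf, 0.0)
    return out_rgb.reshape(H, W, 3), out_depth.reshape(H, W), mask.reshape(H, W)
```

When the target camera moves forward into the scene, the warped points spread apart and `mask` contains holes. Those holes, together with the stretched regions around them, are the kind of input that an inpainting and refinement model would then need to fill.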

Citations

Journal Article•10.1111/cgf.15063

State of the Art on Diffusion Models for Visual Computing

Riccardo Pò, +17 more

- 30 Apr 2024

- Computer Graphics Forum

TL;DR: Diffusion models are the generative AI architecture of choice for visual computing, enabling image, video, and 3D scene generation, editing, and reconstruction. The field is rapidly advancing with new works appearing daily on arXiv.

20

Journal Article•10.1109/cvpr52733.2024.00636

WonderJourney: Going from Anywhere to Everywhere

Hong-Xing Yu, +10 more

- 16 Jun 2024

5

Journal Article•10.1109/cvpr52733.2024.00727

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Shengqu Cai, +5 more

- 16 Jun 2024

1

Journal Article•10.1109/icra57147.2024.10611328

NeRF-Enhanced Outpainting for Faithful Field-of-View Extrapolation

Rui Yu, +3 more

- 13 May 2024

TL;DR: This paper presents NeRF-Enhanced Outpainting (NEO), a method for faithful field-of-view extrapolation using pre-captured images and NeRF-generated extended-FOV images to train a scene-specific image outpainting model, evaluated on four datasets with robust results.

1

Journal Article•10.1007/978-3-031-72933-1_6

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Yuru Jia, +6 more

- 02 Oct 2024

- Lecture Notes in Computer Science

References

Journal Article•10.1109/TIP.2003.819861

Image quality assessment: from error visibility to structural similarity

Zhou Wang, +3 more

- 01 Apr 2004

- IEEE Transactions on Image Processing

TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is compared against both subjective ratings and other objective methods on a database of images compressed with JPEG and JPEG2000.

56.3K

Proceedings Article•10.1109/CVPR.2017.632

Image-to-Image Translation with Conditional Adversarial Networks

Phillip Isola, +3 more

- 21 Jul 2017

TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

19.6K

Posted Content

Rethinking Atrous Convolution for Semantic Image Segmentation

Liang-Chieh Chen, +3 more

- 17 Jun 2017

- arXiv: Computer Vision and Pattern Recog...

TL;DR: The proposed 'DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

9.9K

Proceedings Article•10.1109/CVPR.2018.00068

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Richard Zhang, +5 more

- 11 Jan 2018

TL;DR: In this paper, the authors introduce a new dataset of human perceptual similarity judgments, and systematically evaluate deep features across different architectures and tasks and compare them with classic metrics, finding that deep features outperform all previous metrics by large margins on their dataset.

8K

Proceedings Article•10.1109/CVPR.2016.445

Structure-from-Motion Revisited

Johannes L. Schonberger, +1 more

- 27 Jun 2016

TL;DR: This work proposes a new SfM technique that improves upon the state of the art to make a further step towards building a truly general-purpose pipeline.

6.1K
