Decoupled Diffusion Models with Explicit Transition Probability

doi:10.48550/arXiv.2306.13720

Journal Article

Yuhang Huang, +3 more

- 23 Jun 2023

- arXiv.org

- Vol. abs/2306.13720

TL;DR: This paper proposes decoupling the intricate diffusion process into two comparatively simpler processes to improve the generative efficacy and speed of DPMs: the image distribution is approximated by an explicit transition probability, while the noise path is controlled by the standard Wiener process.

Abstract: Recent diffusion probabilistic models (DPMs) have shown remarkable content-generation abilities; however, they often suffer from complex forward processes, resulting in inefficient solutions for the reverse process and prolonged sampling times. In this paper, we address these challenges by focusing on the diffusion process itself: we propose to decouple the intricate diffusion process into two comparatively simpler processes to improve generative efficacy and speed. In particular, we present a novel diffusion paradigm named DDM (Decoupled Diffusion Models) based on the Itô diffusion process, in which the image distribution is approximated by an explicit transition probability while the noise path is controlled by the standard Wiener process. We find that decoupling the diffusion process reduces the learning difficulty and that the explicit transition probability improves the generative speed significantly. We derive a new training objective for DPMs, which enables the model to learn to predict the noise and image components separately. Moreover, given the novel forward diffusion equation, we derive the reverse denoising formula of DDM, which naturally supports generation in fewer steps without ordinary differential equation (ODE) based accelerators. Our experiments demonstrate that DDM outperforms previous DPMs by a large margin in the few-function-evaluation setting and achieves comparable performance in the many-function-evaluation setting. We also show that our framework can be applied to image-conditioned generation and high-resolution image synthesis, and that it can generate high-quality images with only 10 function evaluations.
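
As a reading aid, the decoupling described in the abstract can be written as an Itô process with a deterministic drift plus standard Wiener noise. The symbols below are assumptions introduced for illustration and are not taken from this page: h_t stands for the analytic image-attenuation term and w_t for the standard Wiener process. Under those assumptions, the transition probability is an explicit Gaussian:

```latex
% Assumed notation: h_t = analytic image-attenuation term, w_t = standard Wiener process
\begin{aligned}
\mathrm{d}x_t &= h_t\,\mathrm{d}t + \mathrm{d}w_t, \\
x_t &= x_0 + \int_0^t h_s\,\mathrm{d}s + w_t, \qquad w_t \sim \mathcal{N}(0,\, t I), \\
q(x_t \mid x_0) &= \mathcal{N}\!\left(x_0 + \int_0^t h_s\,\mathrm{d}s,\; t I\right).
\end{aligned}
```

The mean carries the image component and the variance tI carries the noise component, which is one way to see why a model could be trained to predict the two separately.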

Figures

Figure 10: 10-step inpainting visualization.

Figure 2: Framework overview. (Top:) DPMs typically use an image-to-noise process, while we propose to split it into two relatively simpler processes: the image-to-zero mapping and the zero-to-noise mapping. We use an analytic function (in blue boxes) to model image to zero, or the attenuation gradient of the image (red line in the middle image). The zero-to-noise path is governed by the standard Wiener process. (Bottom:) We compare the equations of forward sampling, reversed sampling, and training objective of our method and DDPM.
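
The two mappings in the caption above can be sketched numerically. This is a minimal illustration, not the authors' implementation: it assumes a linear image-to-zero path, (1 - t) * x0, as one possible analytic function, and models the zero-to-noise path as Wiener noise with variance t. The names `attenuate` and `forward_sample` are hypothetical.

```python
import numpy as np

def attenuate(x0, t):
    """Image-to-zero path. ASSUMPTION: linear decay, x0 at t=0 and 0 at t=1."""
    return (1.0 - t) * x0

def forward_sample(x0, t, rng):
    """Draw x_t as the attenuated image plus Wiener noise with variance t."""
    eps = rng.standard_normal(x0.shape)  # standard Gaussian noise
    return attenuate(x0, t) + np.sqrt(t) * eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 8, 8))  # toy stand-in for images

for t in (0.0, 0.5, 1.0):
    xt = forward_sample(x0, t, rng)
    # Residual after removing the image component is pure noise of std sqrt(t).
    print(t, float((xt - attenuate(x0, t)).std()))
```

At t = 0 the sample equals the image, and at t = 1 the image component has vanished and only unit-variance noise remains, so the two components can be inspected independently at any intermediate t.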

Table 1: Unconditional generative performance (FID↓) on CIFAR-10 and CelebA-HQ-256 compared to previous DPMs including DDPM, SDE, LSGM, and CLD. Model size in terms of the number of parameters is shown. means FID is lower than the second-best method with statistical significance (p-value < 0.05) based on the two-sample t-test. means the difference between our method and the closest best method is not statistically significant. We did not run the 2000-step sampling on CelebA-HQ-256, because it takes more than 7 days on an RTX 3090 GPU.

Figure 11: 10-step super-resolution visualization.

Figure 7: Comparisons of 10-step unconditional generation on CelebA-HQ-256.

Figure 12: 10-step saliency detection visualization.

References

•Proceedings Article

Auto-Encoding Variational Bayes

Diederik P. Kingma, +1 more

- 01 Jan 2014

TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

28.9K

•Posted Content

Denoising Diffusion Probabilistic Models

Jonathan Ho, +2 more

- 19 Jun 2020

- arXiv: Learning

TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.

11.7K

•Posted Content

The Cityscapes Dataset for Semantic Urban Scene Understanding

Marius Cordts, +8 more

- 06 Apr 2016

- arXiv: Computer Vision and Pattern Recog...

TL;DR: Cityscapes is a large-scale dataset for semantic urban scene understanding, consisting of 5000 images with high-quality pixel-level annotations and 20,000 additional images with coarse annotations.

7.8K

•Proceedings Article

Progressive Growing of GANs for Improved Quality, Stability, and Variation

Tero Karras, +3 more

- 15 Feb 2018

TL;DR: A new training methodology for GANs is proposed that grows both the generator and discriminator progressively, starting from a low resolution and adding new layers that model increasingly fine details as training progresses.

5.9K

•Posted Content

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Jascha Sohl-Dickstein, +3 more

- 12 Mar 2015

- arXiv: Learning

TL;DR: This work develops an approach to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, then learns a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.

4.6K
