Journal Article, DOI: 10.48550/arXiv.2306.13720
Decoupled Diffusion Models with Explicit Transition Probability
TL;DR: This paper proposes decoupling the intricate diffusion process into two comparatively simpler processes to improve the generative efficacy and speed of DPMs: the image distribution is approximated by an explicit transition probability, while the noise path is controlled by the standard Wiener process.
Abstract: Recent diffusion probabilistic models (DPMs) have shown remarkable abilities in content generation; however, they often suffer from complex forward processes, resulting in inefficient solutions for the reverse process and prolonged sampling times. In this paper, we address these challenges by focusing on the diffusion process itself: we propose to decouple the intricate diffusion process into two comparatively simpler processes to improve generative efficacy and speed. In particular, we present a novel diffusion paradigm named DDM (Decoupled Diffusion Models) based on the Itô diffusion process, in which the image distribution is approximated by an explicit transition probability while the noise path is controlled by the standard Wiener process. We find that decoupling the diffusion process reduces the learning difficulty and that the explicit transition probability improves the generative speed significantly. We prove a new training objective for DPMs, which enables the model to learn to predict the noise and image components separately. Moreover, given the novel forward diffusion equation, we derive the reverse denoising formula of DDM, which naturally supports generation in fewer steps without ordinary differential equation (ODE) based accelerators. Our experiments demonstrate that DDM outperforms previous DPMs by a large margin in the setting with few function evaluations and achieves comparable performance in the setting with many function evaluations. We also show that our framework can be applied to image-conditioned generation and high-resolution image synthesis, and that it can generate high-quality images with only 10 function evaluations.
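To make the decoupling described in the abstract concrete, here is a minimal sketch of a forward sampler that adds an image-to-zero term and a zero-to-noise term. It is an illustration under stated assumptions, not the paper's reference implementation: the abstract only says the image path uses an explicit (analytic) transition and the noise path is a standard Wiener process, so the linear attenuation `(1 - t) * x0`, the time range `t` in [0, 1], and the function name `forward_sample` are all assumptions chosen for simplicity.

```python
import numpy as np

def forward_sample(x0, t, rng):
    """Draw x_t from an illustrative decoupled forward process.

    Assumptions (not taken from the source text):
      * image-to-zero path: linear attenuation (1 - t) * x0, with t in [0, 1]
      * zero-to-noise path: standard Wiener process, W_t ~ N(0, t * I)
    """
    eps = rng.standard_normal(x0.shape)   # unit Gaussian noise
    attenuated_image = (1.0 - t) * x0     # analytic image-to-zero component
    wiener_noise = np.sqrt(t) * eps       # Wiener increment from time 0 to t
    return attenuated_image + wiener_noise, eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 8, 8))   # toy batch standing in for images

x_start, _ = forward_sample(x0, 0.0, rng)     # t = 0: the clean image
x_end, eps_end = forward_sample(x0, 1.0, rng) # t = 1: pure noise
```

Under these assumptions the two endpoints are easy to check: at `t = 0` the sample equals the input image, and at `t = 1` the image term vanishes and only the Gaussian noise remains. Because the two components enter additively, a network could in principle be trained to predict each one separately, which is the role the abstract assigns to the new training objective.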
Figures

Figure 10: 10-step inpainting visualization. 
Figure 2: Framework overview. (Top) DPMs typically use a single image-to-noise process, while we propose to split it into two relatively simpler processes: the image-to-zero mapping and the zero-to-noise mapping. We use an analytic function (in blue boxes) to model the image-to-zero path, that is, the attenuation gradient of the image (red line in the middle image). The zero-to-noise path is governed by the standard Wiener process. (Bottom) We compare the equations for forward sampling, reverse sampling, and the training objective of our method and DDPM. 
Table 1: Unconditional generative performance (FID↓) on CIFAR-10 and CelebA-HQ-256 compared to previous DPMs, including DDPM, SDE, LSGM, and CLD. Model size, in number of parameters, is shown. means FID is lower than the second-best method with statistical significance (p-value < 0.05) based on the two-sample t-test. means the difference between our method and the closest best method is not statistically significant. We did not run the 2000-step sampling on CelebA-HQ-256, because it takes more than 7 days on an RTX 3090 GPU. 
Figure 11: 10-step super-resolution visualization. 
Figure 7: Comparisons of 10-step unconditional generation on CelebA-HQ-256. 
Figure 12: 10-step saliency detection visualization.
References
•Proceedings Article
Auto-Encoding Variational Bayes
Diederik P. Kingma, Max Welling
- 01 Jan 2014
TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
•Posted Content
Denoising Diffusion Probabilistic Models
TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.
•Posted Content
The Cityscapes Dataset for Semantic Urban Scene Understanding
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele
TL;DR: Cityscapes is a large-scale dataset for semantic urban scene understanding, consisting of 5000 images with high-quality pixel-level annotations and 200,000 additional images with coarse annotations.
•Proceedings Article
Progressive Growing of GANs for Improved Quality, Stability, and Variation
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen
- 15 Feb 2018
TL;DR: The authors propose a new training methodology for GANs that grows both the generator and discriminator progressively, starting from a low resolution and adding new layers that model increasingly fine details as training progresses.
•Posted Content
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
TL;DR: This work develops an approach to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, then learns a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.