Proceedings Article, DOI: 10.48550/arXiv.2206.00927
DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
- 02 Jun 2022
Vol. abs/2206.00927
TL;DR: This work proposes DPM-Solver, a fast, dedicated high-order solver for diffusion ODEs with a convergence-order guarantee; it applies to both discrete-time and continuous-time DPMs without any further training.
Abstract: Diffusion probabilistic models (DPMs) are emerging powerful generative models. Despite their high-quality generation performance, DPMs still suffer from their slow sampling as they generally need hundreds or thousands of sequential function evaluations (steps) of large neural networks to draw a sample. Sampling from DPMs can be viewed alternatively as solving the corresponding diffusion ordinary differential equations (ODEs). In this work, we propose an exact formulation of the solution of diffusion ODEs. The formulation analytically computes the linear part of the solution, rather than leaving all terms to black-box ODE solvers as adopted in previous works. By applying change-of-variable, the solution can be equivalently simplified to an exponentially weighted integral of the neural network. Based on our formulation, we propose DPM-Solver, a fast dedicated high-order solver for diffusion ODEs with the convergence order guarantee. DPM-Solver is suitable for both discrete-time and continuous-time DPMs without any further training. Experimental results show that DPM-Solver can generate high-quality samples in only 10 to 20 function evaluations on various datasets. We achieve 4.70 FID in 10 function evaluations and 2.87 FID in 20 function evaluations on the CIFAR10 dataset, and a $4\sim 16\times$ speedup compared with previous state-of-the-art training-free samplers on various datasets.
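The idea of computing the linear part exactly and approximating only the exponentially weighted integral can be illustrated with a minimal first-order sketch. This is not the authors' reference implementation. The linear variance-preserving schedule in `alpha_sigma`, its `beta_min`/`beta_max` values, and the names `first_order_step`, `sample`, and `toy_eps` are all assumptions introduced here for illustration. The toy noise predictor stands in for the neural network and corresponds to data concentrated at the origin.

```python
import numpy as np

def alpha_sigma(t, beta_min=0.1, beta_max=20.0):
    """Assumed variance-preserving noise schedule with alpha_t^2 + sigma_t^2 = 1."""
    log_alpha = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    alpha = np.exp(log_alpha)
    return alpha, np.sqrt(1.0 - alpha**2)

def first_order_step(x_s, s, t, eps_fn):
    """One step from time s to time t < s.

    The linear part (alpha_t / alpha_s) * x_s is computed exactly; the
    exponentially weighted integral of the noise prediction is approximated
    by holding eps_fn fixed at (x_s, s) over the step.
    """
    alpha_s, sigma_s = alpha_sigma(s)
    alpha_t, sigma_t = alpha_sigma(t)
    # h is the step size in half-log-SNR, lambda = log(alpha / sigma).
    h = np.log(alpha_t / sigma_t) - np.log(alpha_s / sigma_s)
    return (alpha_t / alpha_s) * x_s - sigma_t * np.expm1(h) * eps_fn(x_s, s)

def sample(x_T, times, eps_fn):
    """Apply first_order_step along a decreasing sequence of times."""
    x = x_T
    for s, t in zip(times[:-1], times[1:]):
        x = first_order_step(x, s, t, eps_fn)
    return x

def toy_eps(x, t):
    """Exact noise prediction when all data sit at the origin (x_t = sigma_t * eps)."""
    _, sigma = alpha_sigma(t)
    return x / sigma

x_T = np.array([1.0, -2.0])
times = np.linspace(1.0, 1e-3, 11)  # 10 steps, i.e. 10 function evaluations
x_0 = sample(x_T, times, toy_eps)
```

In this toy setting the noise prediction is constant along each trajectory, so the first-order step reproduces the analytic solution `x_t = (sigma_t / sigma_s) * x_s` regardless of step size. With a real network the prediction varies over the step, and accuracy then depends on the step size and on the order of the approximation to the integral.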
Figures
Figure 4: Random samples by DDIM [19] (quadratic time steps) and DPM-Solver (ours) with 10, 12, 15, and 20 function evaluations (NFE) and the same random seed, using the pre-trained discrete-time DPMs [2] on CIFAR-10.
Table 4: Sample quality measured by FID ↓ on CIFAR-10, CelebA 64×64 and ImageNet 64×64 with discrete-time DPMs, varying the number of function evaluations (NFE). The method †GGDM needs extra training, and some results are missing in their original papers, which are replaced by “\”.
Table 5: Sample quality measured by FID ↓ on ImageNet 128×128 with classifier guidance and on LSUN bedroom 256×256, varying the number of function evaluations (NFE). For DDIM and DDPM, we use uniform time steps for all the experiments, except that the experiment† uses the fine-tuned time steps by [4]. For DPM-Solver, we use the uniform logSNR steps as described in Appendix D.3.
Figure 5: Random samples by DDIM [19] (quadratic time steps) and DPM-Solver (ours) with 10, 12, 15, and 20 function evaluations (NFE) and the same random seed, using the pre-trained discrete-time DPMs [19] on CelebA 64×64.
Figure 6: Random samples by DDIM [19] (uniform time steps) and DPM-Solver (ours) with 10, 12, 15, and 20 function evaluations (NFE) and the same random seed, using the pre-trained discrete-time DPMs [16] on ImageNet 64×64.
Figure 1: Samples by DDIM [19] with 10, 15, 20, and 100 function evaluations (NFE), and DPM-Solver (ours) with only 10 NFE, using the pre-trained DPMs on ImageNet 256×256 with classifier guidance [4].
Citations
Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
- 01 Jun 2022
TL;DR: This work argues that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seeks to remedy the situation by presenting a design space that clearly separates the concrete design choices, and identifies several changes to both the sampling and training processes, as well as preconditioning of the score networks.
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang, Zhilong Zhang, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Min Yang, Bin Cui
TL;DR: A comprehensive review of existing variants of the diffusion models and a thorough investigation into the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification.
Diffusion Models in Vision: A Survey
TL;DR: A multi-perspective categorization of diffusion models applied in computer vision is introduced, together with a discussion of their relations to other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models, and normalizing flows.
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, B. Catanzaro, Tero Karras, Ming-Yu Liu
TL;DR: The authors propose training an ensemble of text-to-image diffusion models specialized for different synthesis stages, which improves text alignment while maintaining the same inference computation cost and preserving high visual quality, outperforming previous large-scale text-to-image diffusion models on the standard benchmark.
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
TL;DR: The authors apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task, by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.
References
ImageNet: A large-scale hierarchical image database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
- 20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
- 08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
•Proceedings Article
Auto-Encoding Variational Bayes
Diederik P. Kingma, Max Welling
- 01 Jan 2014
TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
•Posted Content
Denoising Diffusion Probabilistic Models
TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.
Deep Learning Face Attributes in the Wild
Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang
- 07 Dec 2015
TL;DR: A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.