DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps

doi:10.48550/arXiv.2206.00927

Proceedings Article•10.48550/arXiv.2206.00927

Cheng Lu, +5 more

- 02 Jun 2022

Vol. abs/2206.00927

734

TL;DR: This work proposes DPM-Solver, a fast dedicated high-order solver for diffusion ODEs with the convergence order guarantee, suitable for both discrete-time and continuous-time DPMs without any further training.

Abstract: Diffusion probabilistic models (DPMs) are emerging powerful generative models. Despite their high-quality generation performance, DPMs still suffer from their slow sampling as they generally need hundreds or thousands of sequential function evaluations (steps) of large neural networks to draw a sample. Sampling from DPMs can be viewed alternatively as solving the corresponding diffusion ordinary differential equations (ODEs). In this work, we propose an exact formulation of the solution of diffusion ODEs. The formulation analytically computes the linear part of the solution, rather than leaving all terms to black-box ODE solvers as adopted in previous works. By applying change-of-variable, the solution can be equivalently simplified to an exponentially weighted integral of the neural network. Based on our formulation, we propose DPM-Solver, a fast dedicated high-order solver for diffusion ODEs with the convergence order guarantee. DPM-Solver is suitable for both discrete-time and continuous-time DPMs without any further training. Experimental results show that DPM-Solver can generate high-quality samples in only 10 to 20 function evaluations on various datasets. We achieve 4.70 FID in 10 function evaluations and 2.87 FID in 20 function evaluations on the CIFAR10 dataset, and a $4\sim 16\times$ speedup compared with previous state-of-the-art training-free samplers on various datasets.
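The abstract's formulation can be made concrete with a small sketch. Writing the forward process as $x_t = \alpha_t x_0 + \sigma_t \epsilon$ and $\lambda_t = \log(\alpha_t/\sigma_t)$, the first-order DPM-Solver update from time $s$ to $t$ is $x_t = \frac{\alpha_t}{\alpha_s} x_s - \sigma_t (e^{h} - 1)\,\epsilon_\theta(x_s, s)$ with $h = \lambda_t - \lambda_s$. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the linear VP schedule with `beta_min=0.1` and `beta_max=20.0`, the uniform time grid, and all function names are hypothetical, and `toy_eps_model` stands in for a trained network (it is the exact noise prediction when the data is one-dimensional standard normal).

```python
import math


def vp_linear_schedule(t, beta_min=0.1, beta_max=20.0):
    """Return (alpha_t, sigma_t, lambda_t) for an assumed linear VP schedule."""
    log_alpha = -0.25 * t * t * (beta_max - beta_min) - 0.5 * t * beta_min
    alpha = math.exp(log_alpha)
    # sigma_t^2 = 1 - alpha_t^2, computed with expm1 for accuracy near t = 0
    sigma = math.sqrt(-math.expm1(2.0 * log_alpha))
    lam = log_alpha - math.log(sigma)  # lambda_t = log(alpha_t / sigma_t)
    return alpha, sigma, lam


def dpm_solver_1_step(x, s, t, eps_model):
    """One first-order step from time s to time t (t < s when sampling)."""
    alpha_s, _, lam_s = vp_linear_schedule(s)
    alpha_t, sigma_t, lam_t = vp_linear_schedule(t)
    h = lam_t - lam_s
    # Linear part is handled exactly; the network term is weighted by
    # sigma_t * (e^h - 1), the closed-form exponential integral.
    return (alpha_t / alpha_s) * x - sigma_t * math.expm1(h) * eps_model(x, s)


def sample(x_T, eps_model, n_steps, t_start=1.0, t_end=1e-3):
    """Run n_steps first-order updates on a uniform time grid (an assumption)."""
    ts = [t_start + (t_end - t_start) * i / n_steps for i in range(n_steps + 1)]
    x = x_T
    for s, t in zip(ts[:-1], ts[1:]):
        x = dpm_solver_1_step(x, s, t, eps_model)
    return x


def toy_eps_model(x, t):
    """Hypothetical stand-in for a trained network: exact for N(0, 1) data."""
    _, sigma, _ = vp_linear_schedule(t)
    return sigma * x


if __name__ == "__main__":
    for n in (10, 20, 100):
        print(n, sample(1.5, toy_eps_model, n))
```

For standard normal data the exact ODE solution keeps `x` constant, so the gap between the returned value and `x_T` measures discretization error only; it shrinks as `n_steps` grows.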

Figures

Figure 4: Random samples by DDIM [19] (quadratic time steps) and DPM-Solver (ours) with 10, 12, 15, 20 number of function evaluations (NFE) with the same random seed, using the pre-trained discrete-time DPMs [2] on CIFAR-10.

Table 4: Sample quality measured by FID ↓ on CIFAR-10, CelebA 64×64 and ImageNet 64×64 with discrete-time DPMs, varying the number of function evaluations (NFE). The method †GGDM needs extra training, and some results are missing in their original papers, which are replaced by “\”.

Table 5: Sample quality measured by FID ↓ on ImageNet 128×128 with classifier guidance and on LSUN bedroom 256×256, varying the number of function evaluations (NFE). For DDIM and DDPM, we use uniform time steps for all the experiments, except that the experiment† uses the fine-tuned time steps by [4]. For DPM-Solver, we use the uniform logSNR steps as described in Appendix D.3.

Figure 5: Random samples by DDIM [19] (quadratic time steps) and DPM-Solver (ours) with 10, 12, 15, 20 number of function evaluations (NFE) with the same random seed, using the pre-trained discrete-time DPMs [19] on CelebA 64×64.

Figure 6: Random samples by DDIM [19] (uniform time steps) and DPM-Solver (ours) with 10, 12, 15, 20 number of function evaluations (NFE) with the same random seed, using the pre-trained discrete-time DPMs [16] on ImageNet 64×64.

Figure 1: Samples by DDIM [19] with 10, 15, 20, 100 number of function evaluations (NFE), and DPM-Solver (ours) with only 10 NFE, using the pre-trained DPMs on ImageNet 256×256 with classifier guidance [4].
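The Table 5 caption mentions uniform logSNR steps for DPM-Solver. One way such a grid could be built is sketched below: pick points equally spaced in $\lambda_t = \log(\alpha_t/\sigma_t)$ and map them back to times. This is an illustration under assumptions, not the procedure from the paper's Appendix D.3: it assumes a linear VP schedule with `BETA_MIN = 0.1` and `BETA_MAX = 20.0`, for which the inverse map has a closed form, and the function names are hypothetical.

```python
import math

BETA_MIN, BETA_MAX = 0.1, 20.0  # assumed linear VP schedule parameters


def lambda_of_t(t):
    """lambda_t = log(alpha_t / sigma_t) for the assumed schedule."""
    log_alpha = -0.25 * t * t * (BETA_MAX - BETA_MIN) - 0.5 * t * BETA_MIN
    log_sigma = 0.5 * math.log(-math.expm1(2.0 * log_alpha))
    return log_alpha - log_sigma


def t_of_lambda(lam):
    """Invert lambda_of_t by solving the quadratic in t for log(alpha_t)."""
    # With alpha^2 + sigma^2 = 1, log(1 + e^{-2*lambda}) = -2 * log(alpha_t).
    big_l = math.log1p(math.exp(-2.0 * lam))
    d_beta = BETA_MAX - BETA_MIN
    return (math.sqrt(BETA_MIN ** 2 + 2.0 * d_beta * big_l) - BETA_MIN) / d_beta


def uniform_logsnr_steps(n_steps, t_start=1.0, t_end=1e-3):
    """Times t_0 > t_1 > ... > t_N whose lambda values are equally spaced."""
    lam_start, lam_end = lambda_of_t(t_start), lambda_of_t(t_end)
    return [
        t_of_lambda(lam_start + (lam_end - lam_start) * i / n_steps)
        for i in range(n_steps + 1)
    ]


if __name__ == "__main__":
    for t in uniform_logsnr_steps(10):
        print(f"t = {t:.5f}, lambda = {lambda_of_t(t):+.4f}")
```

Compared with a uniform grid in `t`, this grid keeps the step size `h` in $\lambda$ constant across all steps.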

Citations

Proceedings Article•10.48550/arXiv.2206.00364

Elucidating the Design Space of Diffusion-Based Generative Models

Tero Karras, +3 more

- 01 Jun 2022

TL;DR: This work argues that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seeks to remedy the situation by presenting a design space that clearly separates the concrete design choices, and identifies several changes to both the sampling and training processes, as well as preconditioning of the score networks.

966

Journal Article•10.48550/arXiv.2209.00796

Diffusion Models: A Comprehensive Survey of Methods and Applications

Ling Yang, +8 more

- 02 Sep 2022

- arXiv.org

TL;DR: A comprehensive review of existing variants of the diffusion models and a thorough investigation into the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification.

734

Journal Article•10.1109/TPAMI.2023.3261988

Diffusion Models in Vision: A Survey

Florinel-Alin Croitoru, +3 more

- 10 Sep 2022

- IEEE Transactions on Pattern Analysis an...

TL;DR: A multi-perspective categorization of diffusion models applied in computer vision, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models and normalizing models is introduced.

635

Journal Article•10.48550/arXiv.2211.01324

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Yogesh Balaji, +12 more

- 02 Nov 2022

- arXiv.org

TL;DR: The authors propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages, which leads to improved text alignment while maintaining the same inference computation cost and preserving high visual quality, outperforming previous large-scale text to image diffusion models on the standard benchmark.

515

Journal Article•10.48550/arXiv.2304.08818

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Andreas Blattmann, +6 more

- 18 Apr 2023

- arXiv.org

TL;DR: In this article, the authors apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task, by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.

480

References

Proceedings Article•10.1109/CVPR.2009.5206848

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

- 20 Jun 2009

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

75.9K

Journal Article•10.3156/JSOFT.29.5_177_2

Generative Adversarial Nets

Ian Goodfellow, +7 more

- 08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

48.6K

Proceedings Article

Auto-Encoding Variational Bayes

Diederik P. Kingma, +1 more

- 01 Jan 2014

TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

28.9K

Posted Content

Denoising Diffusion Probabilistic Models

Jonathan Ho, +2 more

- 19 Jun 2020

- arXiv: Learning

TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.

11.7K

Proceedings Article•10.1109/ICCV.2015.425

Deep Learning Face Attributes in the Wild

Ziwei Liu, +3 more

- 07 Dec 2015

TL;DR: A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.

10.1K
