Diffusion Models Beat GANs on Image Synthesis

Open Access • Posted Content

- 11 May 2021

1.5K

TL;DR: In this paper, a better architecture is found through a series of ablations, and classifier guidance is used to trade off diversity for fidelity using gradients from a classifier, achieving an FID of 2.97 on ImageNet 128$\times$128, 4.59 on ImageNet 256$\times$256, and 7.72 on ImageNet 512$\times$512.

Citations

•Proceedings Article•10.1109/cvpr52688.2022.01042

High-Resolution Image Synthesis with Latent Diffusion Models

01 Jun 2022

TL;DR: This article decomposes the image formation process into a sequential application of denoising autoencoders and applies them in the latent space of powerful pretrained autoencoders.

5.4K

•Posted Content

Taming Transformers for High-Resolution Image Synthesis

Patrick Esser, +2 more

- 17 Dec 2020

- arXiv: Computer Vision and Pattern Recog...

TL;DR: It is demonstrated how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images.

1.7K

•Journal Article•10.1109/iccv51070.2023.00355

Adding Conditional Control to Text-to-Image Diffusion Models

Lvmin Zhang, +2 more

- 01 Oct 2023

TL;DR: ControlNet adds spatial conditioning controls to text-to-image diffusion models, enabling control over various image aspects with single or multiple conditions.

1.1K

•Journal Article•10.1109/cvpr52729.2023.02155

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Nataniel Ruiz, +5 more

- 01 Jun 2023

TL;DR: DreamBooth fine-tunes text-to-image diffusion models to generate subject-driven images from text prompts, leveraging unique subject identifiers and a new autogenous class-specific prior preservation loss.

871

•Proceedings Article•10.1109/cvpr52688.2022.01117

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

01 Jun 2022

TL;DR: RePaint employs a pretrained unconditional DDPM as the generative prior. To condition the generation process, it alters only the reverse diffusion iterations, sampling the unmasked regions using the given image information.

676

References

•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: This article introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

82.5K

•Posted Content

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, +4 more

- 02 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work explores ways to scale up networks that aim at utilizing the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.

21K

•Posted Content

Decoupled Weight Decay Regularization

Ilya Loshchilov, +1 more

- 14 Nov 2017

- arXiv: Learning

TL;DR: This work proposes a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function, and provides empirical evidence that this modification substantially improves Adam's generalization performance.

14.4K

•Posted Content

Conditional Generative Adversarial Nets

Mehdi Mirza, +1 more

- 06 Nov 2014

- arXiv: Learning

TL;DR: The conditional version of generative adversarial nets is introduced, which can be constructed by simply feeding the data to be conditioned on, y, to both the generator and the discriminator, and it is shown that this model can generate MNIST digits conditioned on class labels.

12.1K

•Proceedings Article•10.1109/CVPR.2019.00453

A Style-Based Generator Architecture for Generative Adversarial Networks

Tero Karras, +2 more

- 15 Jun 2019

TL;DR: This paper proposes an alternative generator architecture for GANs, borrowing from the style transfer literature, which leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images.

11.7K

Related Papers (5)

Denoising Diffusion Probabilistic Models

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

Analyzing and Improving the Image Quality of StyleGAN

Auto-Encoding Variational Bayes

A connection between score matching and denoising autoencoders