Video Diffusion Models

doi:10.48550/arxiv.2204.03458

Open Access•Posted Content•10.48550/arxiv.2204.03458

07 Apr 2022

246

TL;DR: The authors propose a diffusion model for video generation that naturally extends the standard image diffusion architecture and supports joint training on image and video data, which they find reduces the variance of minibatch gradients and speeds up optimization.

Citations

Journal Article•10.1109/tpami.2023.3261988

Diffusion Models in Vision: A Survey

01 Jan 2023

- IEEE Transactions on Pattern Analysis an...

TL;DR: Denoising diffusion models are a recently emerging topic in computer vision that have demonstrated remarkable results in generative modeling. They are widely appreciated for the quality and diversity of their generated samples, despite their known computational burdens.

568

Review•10.1145/3626235

Diffusion Models: A Comprehensive Survey of Methods and Applications

Lu Yang, +8 more

- 09 Nov 2023

- ACM Computing Surveys

TL;DR: Diffusion models are a powerful family of generative models for image, video, and molecule generation with record-breaking performance. This survey categorizes the research into sampling, likelihood estimation, and data handling with special structures. It also discusses potential combinations with other generative models and applications in various fields.

478

Journal Article•10.1109/cvpr52729.2023.01764

InstructPix2Pix: Learning to Follow Image Editing Instructions

Tim Brooks, +2 more

- 01 Jun 2023

TL;DR: InstructPix2Pix learns to edit images from human instructions by generating a large dataset of image editing examples and training a conditional diffusion model on it.

458

Journal Article•10.1109/cvpr52729.2023.02161

Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Andreas Blattmann, +6 more

- 01 Jun 2023

TL;DR: High-resolution video synthesis with latent diffusion models enables high-quality video generation while reducing compute demands.

226

Journal Article•10.1109/iccv51070.2023.00701

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Jay Zhangjie Wu, +9 more

- 01 Oct 2023

TL;DR: Tune-A-Video enables one-shot tuning of image diffusion models for text-to-video (T2V) generation, leveraging pre-trained text-to-image (T2I) models and introducing a tailored spatio-temporal attention mechanism.

176

References

10.48550/arxiv.1906.02634

Scaling Autoregressive Video Models

Dirk Weissenborn, +2 more

TL;DR: This study presents a conceptually simple autoregressive video generation model using 3D self-attention, achieving competitive results on benchmark datasets with high fidelity and realism, and demonstrating potential for modeling complex phenomena in large-scale datasets like Kinetics.

10.48550/arxiv.1503.03585

Deep Unsupervised Learning Using Nonequilibrium Thermodynamics

Jascha Sohl-Dickstein, +3 more

TL;DR: Researchers develop a deep unsupervised learning approach inspired by non-equilibrium statistical physics that yields flexible and tractable generative models with thousands of layers and supports rapid learning, sampling, and evaluation. An open-source implementation is released.

10.48550/arxiv.2105.05233

Diffusion Models Beat GANs on Image Synthesis

Prafulla Dhariwal, +1 more

TL;DR: Diffusion models outperform GANs in image synthesis, achieving superior sample quality through architecture ablation and classifier guidance, with FID scores of 2.97-7.72 on ImageNet, and matching BigGAN-deep with fewer forward passes.

10.48550/arxiv.2009.09761

DiffWave: A Versatile Diffusion Model for Audio Synthesis

Zhifeng Kong, +4 more

TL;DR: DiffWave is a non-autoregressive diffusion model for audio synthesis that produces high-fidelity audio across various tasks, matching a strong WaveNet vocoder in speech quality and outperforming autoregressive and GAN-based models in unconditional generation.

10.48550/arxiv.1812.01608

Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling

Jacob Menick, +1 more

TL;DR: Researchers propose Subscale Pixel Networks (SPNs) and Multidimensional Upscaling to generate high-fidelity images, addressing challenges in encoding context and preserving detail. They achieve state-of-the-art results on the CelebA-HQ and ImageNet datasets, setting new benchmarks in unconditional image generation.
