Video Diffusion Models

doi:10.48550/arxiv.2204.03458

Open Access • Posted Content

07 Apr 2022

246

TL;DR: The authors propose a diffusion model for video generation that naturally extends the standard image diffusion architecture. The architecture supports joint training on image and video data, which the authors find reduces the variance of minibatch gradients and speeds up optimization.

Citations

Journal Article•10.1109/tpami.2023.3261988

Diffusion Models in Vision: A Survey

01 Jan 2023

- IEEE Transactions on Pattern Analysis an...

TL;DR: Denoising diffusion models are a recent emerging topic in computer vision with remarkable results in generative modeling. They are widely appreciated for the quality and diversity of their generated samples, despite their known computational burdens.

568

Review•10.1145/3626235

Diffusion Models: A Comprehensive Survey of Methods and Applications

Lu Yang, +8 more

- 09 Nov 2023

- ACM Computing Surveys

TL;DR: Diffusion models are a powerful family of generative models for image, video, and molecule generation with record-breaking performance. This survey categorizes the research into sampling, likelihood estimation, and data handling with special structures. It also discusses potential combinations with other generative models and applications in various fields.

478

Journal Article•10.1109/cvpr52729.2023.01764

InstructPix2Pix: Learning to Follow Image Editing Instructions

Tim Brooks, +2 more

- 01 Jun 2023

TL;DR: InstructPix2Pix learns to edit images from human instructions by generating a large dataset of image editing examples and training a conditional diffusion model on it.

458

Journal Article•10.1109/cvpr52729.2023.02161

Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Andreas Blattmann, +6 more

- 01 Jun 2023

TL;DR: High-resolution video synthesis with latent diffusion models enables high-quality video generation while reducing compute demands.

226

Journal Article•10.1109/iccv51070.2023.00701

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Jay Zhangjie Wu, +9 more

- 01 Oct 2023

TL;DR: Tune-A-Video enables one-shot tuning of image diffusion models for text-to-video (T2V) generation, leveraging pre-trained text-to-image (T2I) models and introducing a tailored spatio-temporal attention mechanism.

176

References

10.48550/arxiv.1712.09763

PixelSNAIL: An Improved Autoregressive Generative Model

Xi Chen, +3 more

TL;DR: PixelSNAIL is an improved autoregressive generative model that combines causal convolutions with self-attention. It achieves state-of-the-art log-likelihoods of 2.85 and 3.80 bits per dimension on CIFAR-10 and ImageNet, respectively.

10.48550/arxiv.1907.06571

Adversarial Video Generation on Complex Datasets

Aidan Clark, +2 more

TL;DR: This study presents DVD-GAN, a large-scale Generative Adversarial Network (GAN) that generates high-fidelity video samples on complex datasets. It achieves state-of-the-art results in video synthesis and prediction, particularly on the Kinetics-600 and UCF-101 datasets.

10.48550/arxiv.2011.03864

Latent Neural Differential Equations for Video Generation

Cade Gordon, +1 more

TL;DR: This study introduces Latent Neural Differential Equations for video generation, leveraging their continuous time representation to improve quality and efficiency, achieving a new state-of-the-art Inception Score of 15.20 in 64x64 pixel unconditional video generation.

10.48550/arxiv.2112.10741

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Alex Nichol, +7 more

TL;DR: Researchers develop GLIDE, a text-guided diffusion model for photorealistic image generation and editing, outperforming DALL-E in human evaluations and demonstrating fine-tuning capabilities for image inpainting and text-driven editing.

10.48550/arxiv.1212.0402

UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild

Khurram Soomro, +2 more

TL;DR: UCF101 is a large-scale human action dataset with 101 classes, 13k clips, and 27 hours of video data, featuring realistic user-uploaded videos with camera motion and cluttered backgrounds, posing a challenging task for action recognition.
