Video Diffusion Models

doi:10.48550/arXiv.2204.03458

Proceedings Article•10.48550/arXiv.2204.03458

Jonathan Ho, +5 more

- 07 Apr 2022

Vol. abs/2204.03458

841

TL;DR: The authors propose a diffusion model for video generation that naturally extends the standard image diffusion architecture and enables joint training on image and video data, which they find reduces the variance of minibatch gradients and speeds up optimization.

Citations

Proceedings Article•10.48550/arXiv.2206.00364

Elucidating the Design Space of Diffusion-Based Generative Models

Tero Karras, +3 more

- 01 Jun 2022

TL;DR: This work argues that the theory and practice of diffusion-based generative models are unnecessarily convoluted and seeks to remedy this by presenting a design space that clearly separates the concrete design choices. It identifies several changes to both the sampling and training processes, as well as to the preconditioning of the score networks.

966

Journal Article•10.48550/arXiv.2211.09800

InstructPix2Pix: Learning to Follow Image Editing Instructions

Tim Brooks, +2 more

- 17 Nov 2022

- arXiv.org

TL;DR: This paper proposes a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, the model follows the instruction to edit the image.

879

Journal Article•10.48550/arXiv.2210.02303

Imagen Video: High Definition Video Generation with Diffusion Models

Jonathan Ho, +10 more

- 05 Oct 2022

- arXiv.org

TL;DR: Imagen Video is presented, a text-conditional video generation system based on a cascade of video diffusion models not only capable of generating videos of high quality, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding.

838

Journal Article•10.48550/arXiv.2209.00796

Diffusion Models: A Comprehensive Survey of Methods and Applications

Ling Yang, +8 more

- 02 Sep 2022

- arXiv.org

TL;DR: A comprehensive review of existing variants of the diffusion models and a thorough investigation into the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification.

734

Proceedings Article•10.48550/arXiv.2206.00927

DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps

Cheng Lu, +5 more

- 02 Jun 2022

TL;DR: This work proposes DPM-Solver, a fast, dedicated high-order solver for diffusion ODEs with a convergence order guarantee, suitable for both discrete-time and continuous-time DPMs without any further training.

734

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms that dispenses with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

94.2K

•Book Chapter•10.1007/978-3-319-24574-4_28

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, +2 more

- 05 Oct 2015

TL;DR: The authors propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

92K

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

•Proceedings Article•10.1109/CVPR.2018.00813

Non-local Neural Networks

Xiaolong Wang, +3 more

- 18 Jun 2018

TL;DR: This article presents non-local operations, which compute the response at a position as a weighted sum of the features at all positions and can therefore capture long-range dependencies.

12.6K

•Posted Content

Denoising Diffusion Probabilistic Models

Jonathan Ho, +2 more

- 19 Jun 2020

- arXiv: Learning

TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.

11.7K
