Proceedings Article
DOI: 10.48550/arXiv.2204.03458
Video Diffusion Models
Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet
- 07 Apr 2022
Vol. abs/2204.03458
841
TL;DR: The authors propose a diffusion model for video generation that naturally extends the standard image diffusion architecture and enables joint training on image and video data, which they find reduces the variance of minibatch gradients and speeds up optimization.
Abstract: Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial results. Our model is a natural extension of the standard image diffusion architecture, and it enables jointly training from image and video data, which we find to reduce the variance of minibatch gradients and speed up optimization. To generate long and higher resolution videos we introduce a new conditional sampling technique for spatial and temporal video extension that performs better than previously proposed methods. We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on established benchmarks for video prediction and unconditional video generation. Supplementary material is available at https://video-diffusion.github.io/
Citations
Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
- 01 Jun 2022
TL;DR: This work argues that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seeks to remedy the situation by presenting a design space that clearly separates the concrete design choices, and identifies several changes to both the sampling and training processes, as well as preconditioning of the score networks.
InstructPix2Pix: Learning to Follow Image Editing Instructions
TL;DR: This paper proposes a method for editing images from human instructions: given an input image and a written instruction describing the desired change, the model edits the image accordingly.
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey A. Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans
TL;DR: This work presents Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. It generates high-quality videos with a high degree of controllability and world knowledge, including diverse videos and text animations in various artistic styles and with 3D object understanding.
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang, Zhilong Zhang, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Min Yang, Bin Cui
TL;DR: A comprehensive review of existing variants of the diffusion models and a thorough investigation into the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification.
DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
- 02 Jun 2022
TL;DR: This work proposes DPM-Solver, a fast, dedicated high-order solver for diffusion ODEs with a convergence order guarantee, suitable for both discrete-time and continuous-time DPMs without any further training.
References
•Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger, Philipp Fischer, Thomas Brox
- 05 Oct 2015
TL;DR: The authors propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
•Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
81.7K
Non-local Neural Networks
Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
- 18 Jun 2018
TL;DR: This article presents non-local operations, which compute the response at a position as a weighted sum of the features at all positions and can therefore capture long-range dependencies.
•Posted Content
Denoising Diffusion Probabilistic Models
TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.
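The "weighted sum of the features at all positions" described in the Non-local Neural Networks entry above can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the referenced paper's implementation: the function name `non_local` is hypothetical, the weights are taken to be a softmax over dot-product similarities for simplicity, and the learned projections that a real layer would typically include are omitted.

```python
import numpy as np

def non_local(x):
    """Hypothetical helper: x has shape (n_positions, channels).

    Returns (output, weights), where output[i] is a weighted sum of the
    features at all positions and weights[i, j] is the weight of position j
    for position i. Assumption: softmax over dot-product similarities.
    """
    # Pairwise similarity between every pair of positions.
    scores = x @ x.T
    # Softmax over all positions j, stabilised by subtracting the row max.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each output row mixes the features at all positions.
    return weights @ x, weights

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))  # 6 positions, 4 channels (arbitrary sizes)
out, weights = non_local(features)
```

Because every output position draws on every input position, distant positions can influence each other in a single step, which is the long-range property the summary refers to.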