Video Diffusion Models

doi:10.48550/arxiv.2204.03458

Open Access•Posted Content•10.48550/arxiv.2204.03458

07 Apr 2022

246

TL;DR: The authors propose a diffusion model for video generation that naturally extends the standard image diffusion architecture. The model can be trained jointly on image and video data, which the authors find reduces the variance of minibatch gradients and speeds up optimization.

Citations

Journal Article•10.1109/iccv51070.2023.01816

DiffusionDet: Diffusion Model for Object Detection

Shoufa Chen, +3 more

- 01 Oct 2023

TL;DR: DiffusionDet is a novel object detection framework based on a diffusion process, achieving competitive performance with flexibility in the number of boxes and iterations.

176

Journal Article•10.1109/tkde.2024.3361474

A Survey on Generative Diffusion Models

Hanqun Cao, +6 more

- 06 Sep 2022

- IEEE Transactions on Knowledge and Data ...

TL;DR: This survey covers generative diffusion models from three angles: the fundamental formulation of diffusion, algorithmic enhancements, and applications.

145

Journal Article•10.1109/cvpr52729.2023.02171

All are Worth Words: A ViT Backbone for Diffusion Models

Fan Bao, +6 more

- 01 Jun 2023

TL;DR: U-ViT is a ViT-based architecture for diffusion models that achieves comparable or superior performance to a CNN-based U-Net. It utilizes long skip connections between shallow and deep layers and treats all inputs as tokens.

66

Journal Article•10.1109/cvpr52729.2023.00421

DiffRF: Rendering-Guided 3D Radiance Field Diffusion

Norman Müller, +5 more

- 01 Jun 2023

TL;DR: DiffRF is a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. It directly generates volumetric radiance fields from posed images, learns multi-view consistent priors, and enables free-view synthesis and accurate shape generation.

63

Journal Article•10.1109/comst.2024.3353265

Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

Minrui Xu, +11 more

- 01 Jan 2024

- IEEE Communications Surveys and Tutorial...

TL;DR: Deploying AIGC services at mobile edge networks provides personalized, customized content generation in real time while maintaining user privacy.

61

References

10.48550/arxiv.1712.09763

PixelSNAIL: an Improved Autoregressive Generative Model.

Xi Chen, +3 more

TL;DR: Researchers introduce PixelSNAIL, an improved autoregressive generative model that combines causal convolutions with self-attention. It achieves state-of-the-art log-likelihood results on CIFAR-10 and ImageNet, reaching 2.85 and 3.80 bits per dim, respectively.

10.48550/arxiv.1907.06571

Adversarial Video Generation on Complex Datasets

Aidan Clark, +2 more

TL;DR: This study presents a large-scale Generative Adversarial Network (GAN) model, DVD-GAN, that generates high-fidelity video samples on complex datasets, achieving state-of-the-art results in video synthesis and prediction tasks, particularly on Kinetics-600 and UCF-101 datasets.

10.48550/arxiv.2011.03864

Latent Neural Differential Equations for Video Generation

Cade Gordon, +1 more

TL;DR: This study introduces Latent Neural Differential Equations for video generation, leveraging their continuous time representation to improve quality and efficiency, achieving a new state-of-the-art Inception Score of 15.20 in 64x64 pixel unconditional video generation.

10.48550/arxiv.2112.10741

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

Alex Nichol, +7 more

TL;DR: Researchers develop GLIDE, a text-guided diffusion model for photorealistic image generation and editing, outperforming DALL-E in human evaluations and demonstrating fine-tuning capabilities for image inpainting and text-driven editing.

10.48550/arxiv.1212.0402

UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild

Khurram Soomro, +2 more

TL;DR: UCF101 is a large-scale human action dataset with 101 classes, 13k clips, and 27 hours of video data, featuring realistic user-uploaded videos with camera motion and cluttered backgrounds, posing a challenging task for action recognition.
