TokenFlow: Consistent Diffusion Features for Consistent Video Editing

19 Jul 2023

116

TL;DR: This paper presents a framework that harnesses a text-to-image diffusion model for text-driven video editing: given a source video and a target text prompt, it generates a high-quality video that adheres to the target text while preserving the spatial layout and motion of the input video.

Citations

Journal Article•10.48550/arxiv.2311.17982

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, +15 more

- 29 Nov 2023

- arXiv.org

TL;DR: VBench is a comprehensive benchmark suite that dissects video generation quality into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods, and provides a dataset of human preference annotations to validate the benchmarks' alignment with human perception.

105

Journal Article•10.48550/arxiv.2312.14125

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dan Kondratyuk, +30 more

- 21 Dec 2023

- arXiv.org

TL;DR: Empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation are presented, specifically highlighting VideoPoet's ability to generate high-fidelity motions.

88

Journal Article•10.48550/arXiv.2305.13840

Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models

Weifeng Chen, +7 more

- 23 May 2023

- arXiv.org

TL;DR: This article proposes a controllable text-to-video diffusion model, named Video-ControlNet, that generates videos conditioned on a sequence of control signals, such as edge or depth maps.

76

Journal Article•10.48550/arxiv.2310.07204

State of the Art on Diffusion Models for Visual Computing

Ryan Po, +17 more

- 11 Oct 2023

- arXiv.org

TL;DR: This report introduces the basic mathematical concepts of diffusion models and the implementation details and design choices of the popular Stable Diffusion model, and gives an overview of important aspects of these generative AI tools, including personalization, conditioning, and inversion, among others.

63

Journal Article•10.48550/arxiv.2310.10647

A Survey on Video Diffusion Models

Zhen Xing, +7 more

- 16 Oct 2023

- arXiv.org

TL;DR: This paper presents a comprehensive review of video diffusion models in the AIGC era, beginning with a concise introduction to the fundamentals and evolution of diffusion models and then surveying research on diffusion models in the video domain.

41

...

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and reports state-of-the-art performance on English-to-French translation.

94.2K

•Book Chapter•10.1007/978-3-319-24574-4_28

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, +2 more

- 05 Oct 2015

TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

92K

•Posted Content

Denoising Diffusion Probabilistic Models

Jonathan Ho, +2 more

- 19 Jun 2020

- arXiv: Learning

TL;DR: High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.

11.7K

•Posted Content

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Jascha Sohl-Dickstein, +3 more

- 12 Mar 2015

- arXiv: Learning

TL;DR: This work develops an approach to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, then learns a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.

4.6K

Journal Article•10.48550/arXiv.2204.06125

Hierarchical Text-Conditional Image Generation with CLIP Latents

Aditya Ramesh, +4 more

- 13 Apr 2022

- arXiv.org

TL;DR: This work proposes a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. It shows that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.

4.3K

...
