SEGA: Instructing Diffusion using Semantic Dimensions

doi:10.48550/arXiv.2301.12247

Journal Article•10.48550/arXiv.2301.12247

Manuel Brack, +5 more

- 28 Jan 2023

- arXiv.org

- Vol. abs/2301.12247

24

TL;DR: SEGA lets the user interact with the diffusion process and flexibly steer it along semantic directions, allowing for subtle and extensive edits, changes in composition and style, and optimization of the overall artistic conception.
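
As a rough illustration of what steering a diffusion step along a semantic direction can look like, the sketch below combines an unconditional noise estimate, a text-conditioned estimate, and an estimate conditioned on an edit concept. It is a simplified assumption for illustration, not the authors' implementation: the function name, its parameters, and the quantile-based masking rule are all hypothetical, and the noise estimates are stand-in NumPy arrays rather than outputs of a real model.

```python
import numpy as np

def semantic_guidance_step(eps_uncond, eps_text, eps_concept,
                           guidance_scale=7.5, edit_scale=5.0,
                           top_fraction=0.05, direction=1.0):
    """Combine three noise estimates into one guided estimate (simplified sketch)."""
    # Classifier-free guidance toward the text prompt.
    guided = eps_uncond + guidance_scale * (eps_text - eps_uncond)
    # Semantic direction: how the edit concept shifts the unconditional estimate.
    concept_dir = eps_concept - eps_uncond
    # Assumption: keep only the largest-magnitude entries so the edit stays sparse.
    threshold = np.quantile(np.abs(concept_dir), 1.0 - top_fraction)
    mask = np.abs(concept_dir) >= threshold
    # direction=+1 pushes toward the concept, direction=-1 pushes away from it.
    return guided + direction * edit_scale * np.where(mask, concept_dir, 0.0)
```

Calling the function once per denoising step with a larger `edit_scale` would correspond to a more extensive edit, and `direction=-1.0` would steer away from the concept instead of toward it.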

Citations

Journal Article•10.48550/arxiv.2402.17525

Diffusion Model-Based Image Editing: A Survey

Yi Huang, +9 more

- 27 Feb 2024

- arXiv.org

TL;DR: This survey provides an exhaustive overview of diffusion model-based image editing methods, covering theoretical and practical aspects, including learning strategies, user-input conditions, and specific editing tasks, with a focus on inpainting and outpainting, and proposes a benchmark, EditEval, for evaluating text-guided image editing algorithms.

38

Journal Article•10.48550/arxiv.2310.01506

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

Xu Ju, +4 more

- 02 Oct 2023

- arXiv.org

TL;DR: This work introduces Direct Inversion, a novel technique that achieves optimal performance of both branches with just three lines of code; it yields superior performance across 8 editing methods and a speed-up of nearly an order of magnitude.

35

Journal Article•10.48550/arXiv.2307.00522

LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

Linoy Tsaban

- 02 Jul 2023

- arXiv.org

TL;DR: This paper proposes a lightweight approach to real-image editing that combines the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real image editing.

16

Improving Negative-Prompt Inversion via Proximal Guidance

Ligong Han, +14 more

TL;DR: Proximal Negative-Prompt Inversion (ProxNPI) extends the concepts of NTI and NPI with a regularization term and reconstruction guidance, which reduces artifacts while retaining the training-free nature of the approach.

16

Journal Article•10.48550/arxiv.2310.16613

On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts

Yixin Wu, +4 more

- 25 Oct 2023

- arXiv.org

TL;DR: This work proposes two poisoning attacks: a basic attack and a utility-preserving attack, the latter introduced as a mitigation strategy that maintains attack stealthiness while ensuring decent attack performance.

7

References

•Proceedings Article

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, +3 more

- 16 Jan 2013

TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

27.5K

•Posted Content

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

- 16 Oct 2013

- arXiv: Computation and Language

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.

22.9K

•Proceedings Article•10.1109/CVPR.2019.00453

A Style-Based Generator Architecture for Generative Adversarial Networks

Tero Karras, +2 more

- 15 Jun 2019

TL;DR: This paper proposes an alternative generator architecture for GANs, borrowing from the style transfer literature, which leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) from stochastic variation in the generated images.

11.7K

•Proceedings Article•10.1109/ICCV.2015.425

Deep Learning Face Attributes in the Wild

Ziwei Liu, +3 more

- 07 Dec 2015

TL;DR: A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.

10.1K

Journal Article•10.48550/arXiv.2204.06125

Hierarchical Text-Conditional Image Generation with CLIP Latents

Aditya Ramesh, +4 more

- 13 Apr 2022

- arXiv.org

TL;DR: This work proposes a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. It shows that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.

4.3K
