Journal Article · DOI: 10.48550/arXiv.2301.12247
SEGA: Instructing Diffusion using Semantic Dimensions
Manuel Brack, Felix Friedrich, Dominik Hintersdorf, Lukas Struppek, Patrick Schramowski, Kristian Kersting
TL;DR: This article shows how the user can interact with the diffusion process to flexibly steer it along semantic directions, allowing for subtle and extensive edits, changes in composition and style, and optimization of the overall artistic conception.
Abstract: Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.
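The abstract does not spell out how the steering is computed. As a rough illustration only, and not the authors' implementation, the sketch below shows one way a "semantic direction" could be added to ordinary classifier-free guidance: the difference between an edit-concept noise estimate and the unconditional estimate is treated as the direction, and only its largest-magnitude entries are applied. The function name, parameter names, default values, and the choice of a simple quantile threshold are all assumptions made for this example.

```python
import numpy as np

def semantic_guidance(eps_uncond, eps_cond, eps_edit,
                      guidance_scale=7.5, edit_scale=5.0,
                      top_fraction=0.05, reverse=False):
    """Combine three noise estimates into one guided estimate (illustrative only).

    All names and defaults are hypothetical; this is a simplified sketch of
    steering along a semantic direction, not a reference implementation.
    """
    # Standard classifier-free guidance toward the text prompt.
    guided = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # Direction associated with the edit concept; flipped to steer away from it.
    direction = eps_edit - eps_uncond
    if reverse:
        direction = -direction

    # Keep only the largest-magnitude entries so the edit stays localized.
    threshold = np.quantile(np.abs(direction), 1.0 - top_fraction)
    mask = np.abs(direction) >= threshold

    return guided + edit_scale * mask * direction
```

In a real sampler this combination would be evaluated at each denoising step on the model's noise predictions; here plain arrays stand in for those predictions. Setting `reverse=True` pushes the estimate away from the edit concept instead of toward it.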
Citations
Diffusion Model-Based Image Editing: A Survey
Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Shifeng Chen, Liangliang Cao
TL;DR: This survey provides an exhaustive overview of diffusion model-based image editing methods, covering theoretical and practical aspects, including learning strategies, user-input conditions, and specific editing tasks, with a focus on inpainting and outpainting, and proposes a benchmark, EditEval, for evaluating text-guided image editing algorithms.
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
Xu Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
TL;DR: This work introduces Direct Inversion, a technique that achieves optimal performance of both branches with just three lines of code; it yields superior performance across 8 editing methods and a speed-up of nearly an order of magnitude.
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance
TL;DR: This paper proposes a lightweight approach to real-image editing that combines the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real image editing.
Improving Negative-Prompt Inversion via Proximal Guidance
Ligong Han, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei Ren, Ruijiang Gao, Yuxiao Chen, Di Liu, Qilong Zhangli, Anastasis Stathopoulos, Jindong Jiang, Zhaoyang Xia, Akash Krishna Srivastava, Dimitris N. Metaxas
TL;DR: Proximal Negative-Prompt Inversion (ProxNPI) extends the concepts of NTI and NPI with a regularization term and reconstruction guidance, which reduces artifacts while capitalizing on its training-free nature.
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
TL;DR: This work proposes two poisoning attacks: a basic attack and a utility-preserving attack, the latter introduced as a mitigation strategy that maintains attack stealthiness while ensuring decent attack performance.
References
•Proceedings Article
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg S. Corrado, Jeffrey Dean
- 16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
•Posted Content
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents extensions to the Skip-gram model, which learns high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships; the extensions improve both the quality of the vectors and the training speed.
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras, Samuli Laine, Timo Aila
- 15 Jun 2019
TL;DR: This paper proposes an alternative generator architecture for GANs, borrowing from style transfer literature, which leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images.
Deep Learning Face Attributes in the Wild
Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang
- 07 Dec 2015
TL;DR: This work proposes a novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags but pre-trained differently.
Hierarchical Text-Conditional Image Generation with CLIP Latents
TL;DR: This work proposes a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding, and shows that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.