Deep Unsupervised Learning Using Nonequilibrium Thermodynamics
Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
318
TL;DR: Researchers develop a deep unsupervised learning approach inspired by non-equilibrium statistical physics. It yields flexible yet tractable generative models with thousands of layers or time steps that support rapid learning, sampling, and probability evaluation, and it comes with an open-source reference implementation.
Abstract: A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
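The forward half of the idea in the abstract can be illustrated with a small sketch. It assumes a Gaussian diffusion kernel, q(x_t | x_{t-1}) = N(sqrt(1 - beta) x_{t-1}, beta), applied to hypothetical one-dimensional toy data; the function names, step count, and constant noise rate beta are illustrative choices, not values taken from the paper or its reference implementation. The learned reverse process is not shown.

```python
import math
import random


def forward_diffusion(samples, num_steps=500, beta=0.02, seed=0):
    """Repeatedly apply a Gaussian diffusion kernel to each sample.

    Each step draws x_t ~ N(sqrt(1 - beta) * x_{t-1}, beta): the previous
    value is shrunk slightly and a small amount of noise is added.
    num_steps and beta are illustrative assumptions.
    """
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - beta)   # fraction of the previous value that survives
    noise_std = math.sqrt(beta)    # standard deviation of the injected noise
    xs = list(samples)
    for _ in range(num_steps):
        xs = [keep * x + noise_std * rng.gauss(0.0, 1.0) for x in xs]
    return xs


def mean_and_variance(xs):
    """Return the sample mean and (biased) sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


# Hypothetical toy "data distribution": two sharp clusters near -2 and +2.
data_rng = random.Random(1)
data = [data_rng.choice([-2.0, 2.0]) + 0.05 * data_rng.gauss(0.0, 1.0)
        for _ in range(1000)]

diffused = forward_diffusion(data)
data_mean, data_var = mean_and_variance(data)
diff_mean, diff_var = mean_and_variance(diffused)

print(f"data:     mean={data_mean:.2f} variance={data_var:.2f}")
print(f"diffused: mean={diff_mean:.2f} variance={diff_var:.2f}")
```

Under these assumptions, the two-cluster structure is gradually erased and the samples approach a zero-mean, unit-variance Gaussian. A generative model in the spirit of the abstract would then learn to run these small steps in reverse.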
Citations
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman
- 01 Jun 2023
TL;DR: DreamBooth fine-tunes text-to-image diffusion models to generate subject-driven images from text prompts, leveraging unique subject identifiers and a new autogenous class-specific prior preservation loss.
871
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, et al.
TL;DR: This paper identifies and evaluates three stages for successfully training video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning. It also shows that a well-curated pretraining dataset is necessary for generating high-quality videos and presents a systematic curation process for training a strong base model.
361
Imagic: Text-Based Real Image Editing with Diffusion Models
Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Hui‐Wen Chang, Tali Dekel, Inbar Mosseri, Michal Irani
- 01 Jun 2023
TL;DR: Imagic is the first method to apply complex text-based semantic edits to a single real image. It requires only a single input image and a target text, and can produce high-quality complex semantic edits.
358
Vector Quantized Diffusion Model for Text-to-Image Synthesis
01 Jun 2022
TL;DR: This paper proposes a vector quantized diffusion (VQ-Diffusion) model for text-to-image generation, whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM).
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
TL;DR: The proposed IP-Adapter is an effective and lightweight adapter that adds image prompt capability to pretrained text-to-image diffusion models. Thanks to its decoupled cross-attention strategy, the image prompt also works well together with the text prompt to achieve multimodal image generation.
References
Training products of experts by minimizing contrastive divergence
TL;DR: A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule.
A sparse texture representation using local affine regions
TL;DR: The proposed texture representation is evaluated in retrieval and classification tasks using the entire Brodatz database and a publicly available collection of 1,000 photographs of textured surfaces taken from different viewpoints.
Theano: A CPU and GPU Math Compiler in Python
James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, Yoshua Bengio
- 01 Jan 2010
TL;DR: This paper illustrates how to use Theano, outlines the scope of the compiler, provides benchmarks on both CPU and GPU processors, and explains its overall design.
A Tutorial on Bayesian Nonparametric Models
Samuel J. Gershman, David M. Blei
TL;DR: This tutorial is a high-level introduction to Bayesian nonparametric methods and contains several examples of their application.
647
A New Learning Algorithm for Mean Field Boltzmann Machines
Max Welling, Geoffrey E. Hinton
- 28 Aug 2002
TL;DR: This paper presents a new learning algorithm for Mean Field Boltzmann Machines based on the contrastive divergence optimization criterion. The algorithm eliminates the need to estimate equilibrium statistics, so it does not need to approximate the multimodal probability distribution of the free network with the unimodal mean field distribution.