
Latent Diffusion Models

arXiv | BibTeX

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer
* equal contribution

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm

Model Zoo

Pretrained Autoencoding Models


In the table below, f denotes the downsampling factor of the autoencoder, Z the codebook size of the VQ-regularized models, and d the dimensionality of the latent code.

| Model | FID vs. val | PSNR | PSIM | Link | Comments |
|---|---|---|---|---|---|
| f=4, VQ (Z=8192, d=3) | 0.58 | 27.43 ± 4.26 | 0.53 ± 0.21 | https://ommer-lab.com/files/latent-diffusion/vq-f4.zip | |
| f=4, VQ (Z=8192, d=3) | 1.06 | 25.21 ± 4.17 | 0.72 ± 0.26 | https://heibox.uni-heidelberg.de/f/9c6681f64bb94338a069/?dl=1 | no attention |
| f=8, VQ (Z=16384, d=4) | 1.14 | 23.07 ± 3.99 | 1.17 ± 0.36 | https://ommer-lab.com/files/latent-diffusion/vq-f8.zip | |
| f=8, VQ (Z=256, d=4) | 1.49 | 22.35 ± 3.81 | 1.26 ± 0.37 | https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip | |
| f=16, VQ (Z=16384, d=8) | 5.15 | 20.83 ± 3.61 | 1.73 ± 0.43 | https://heibox.uni-heidelberg.de/f/0e42b04e2e904890a9b6/?dl=1 | |
| f=4, KL | 0.27 | 27.53 ± 4.54 | 0.55 ± 0.24 | https://ommer-lab.com/files/latent-diffusion/kl-f4.zip | |
| f=8, KL | 0.90 | 24.19 ± 4.19 | 1.02 ± 0.35 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | |
| f=16, KL (d=16) | 0.87 | 24.08 ± 4.22 | 1.07 ± 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f16.zip | |
| f=32, KL (d=64) | 2.04 | 22.27 ± 3.93 | 1.41 ± 0.40 | https://ommer-lab.com/files/latent-diffusion/kl-f32.zip | |

Get the models

Running the following script downloads and extracts all available pretrained autoencoding models.

bash scripts/download_first_stages.sh

The first stage models can then be found in models/first_stage_models/<model_spec>.
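
For a quick sanity check, a downloaded autoencoder can be loaded and run directly from Python. The following is a minimal sketch, assuming the kl-f8 model was downloaded as above and a test image test.png exists; it relies on the repository's instantiate_from_config helper and the PyTorch Lightning checkpoint format used here.

import torch
import numpy as np
from PIL import Image
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Build the model from its config and load the checkpoint weights.
config = OmegaConf.load("models/first_stage_models/kl-f8/config.yaml")
model = instantiate_from_config(config.model)
state = torch.load("models/first_stage_models/kl-f8/model.ckpt", map_location="cpu")
model.load_state_dict(state["state_dict"], strict=False)
model.eval()

# Map an RGB image to [-1, 1] and add a batch dimension.
img = np.array(Image.open("test.png").convert("RGB").resize((256, 256)))
x = torch.from_numpy(img / 127.5 - 1.0).float().permute(2, 0, 1).unsqueeze(0)

# Encode and decode; for f=8, d=4 the latent has shape [1, 4, 32, 32].
with torch.no_grad():
    posterior = model.encode(x)  # KL models return a diagonal Gaussian
    z = posterior.sample()
    x_rec = model.decode(z)
print(z.shape, x_rec.shape)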

Pretrained LDMs

| Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments |
|---|---|---|---|---|---|---|---|---|
| CelebA-HQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 5.11 (5.11) | 3.29 | 0.72 | 0.49 | https://ommer-lab.com/files/latent-diffusion/celeba.zip | |
| FFHQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 4.98 (4.98) | 4.50 (4.50) | 0.73 | 0.50 | https://ommer-lab.com/files/latent-diffusion/ffhq.zip | |
| LSUN-Churches | Unconditional Image Synthesis | LDM-KL-8 (400 DDIM steps, eta=0) | 4.02 (4.02) | 2.72 | 0.64 | 0.52 | https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip | |
| LSUN-Bedrooms | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 2.95 (3.0) | 2.22 (2.23) | 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip | |
| ImageNet | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77 (7.76)* / 15.82** | 201.56 (209.52)* / 78.82** | 0.84* / 0.65** | 0.35* / 0.63** | https://ommer-lab.com/files/latent-diffusion/cin.zip | *: w/ guiding, classifier_scale 10; **: w/o guiding; scores in brackets calculated with the evaluation script provided by ADM |
| Conceptual Captions | Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79 | 13.89 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/text2img.zip | finetuned from LAION |
| OpenImages | Super-resolution | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip | BSR image degradation |
| OpenImages | Layout-to-Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02 | 15.92 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip | |
| Landscapes | Semantic Image Synthesis | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis256.zip | |
| Landscapes | Semantic Image Synthesis | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip | finetuned on resolution 512x512 |

Get the models

The LDMs listed above can jointly be downloaded and extracted via

bash scripts/download_models.sh

The models can then be found in models/ldm/<model_spec>.

Sampling with unconditional models

We provide a first script for sampling from our unconditional models. Start it via

CUDA_VISIBLE_DEVICES=<GPU_ID> python scripts/sample_diffusion.py -r models/ldm/<model_spec>/model.ckpt -l <logdir> -n <#samples> --batch_size <batch_size> -c <#ddim_steps> -e <eta>
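
Sampling can also be scripted directly. Below is a minimal sketch mirroring what scripts/sample_diffusion.py does internally, assuming the CelebA-HQ checkpoint was extracted to models/ldm/celeba256/ (the exact directory name depends on how the zip unpacks):

import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config
from ldm.models.diffusion.ddim import DDIMSampler

# Load the LDM from its config and checkpoint (paths are assumptions).
config = OmegaConf.load("models/ldm/celeba256/config.yaml")
model = instantiate_from_config(config.model)
state = torch.load("models/ldm/celeba256/model.ckpt", map_location="cpu")
model.load_state_dict(state["state_dict"], strict=False)
model = model.cuda().eval()

# Draw 4 samples with 200 DDIM steps; eta=0 gives deterministic DDIM.
sampler = DDIMSampler(model)
with torch.no_grad():
    z, _ = sampler.sample(S=200, batch_size=4,
                          shape=[3, 64, 64],  # latent [C, H, W] of the f=4 VQ model
                          eta=0.0, verbose=False)
    x = model.decode_first_stage(z)  # decode latents to images in [-1, 1]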

Inpainting


Download the pre-trained weights

wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1

and sample with

python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results

indir should contain images (*.png) and corresponding masks (<image_fname>_mask.png), like the examples provided in data/inpainting_examples.
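
Masks are single-channel images of the same size as their input image, with white marking the region to be inpainted, as in the provided examples. A small sketch for producing a compatible mask, with hypothetical filenames:

import numpy as np
from PIL import Image

img = Image.open("data/inpainting_examples/photo.png")  # hypothetical input

# White (255) = region to fill in, black (0) = region to keep.
mask = np.zeros((img.height, img.width), dtype=np.uint8)
mask[100:220, 80:240] = 255  # rectangular hole; choose any region
Image.fromarray(mask).save("data/inpainting_examples/photo_mask.png")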

Train your own LDMs

Data preparation

Faces

For downloading the CelebA-HQ and FFHQ datasets, proceed as described in the taming-transformers repository.

LSUN

The LSUN datasets can be conveniently downloaded via the script available here. We performed a custom split into training and validation images, and provide the corresponding filenames at https://ommer-lab.com/files/lsun.zip. After downloading, extract them to ./data/lsun. The beds/cats/churches subsets should also be placed/symlinked at ./data/lsun/bedrooms, ./data/lsun/cats, and ./data/lsun/churches, respectively.

ImageNet

The code will try to download (through Academic Torrents) and prepare ImageNet the first time it is used. However, since ImageNet is quite large, this requires a lot of disk space and time. If you already have ImageNet on your disk, you can speed things up by putting the data into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ (which defaults to ~/.cache/autoencoders/data/ILSVRC2012_{split}/data/), where {split} is one of train/validation. It should have the following structure:

${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/
├── n01440764
│   ├── n01440764_10026.JPEG
│   ├── n01440764_10027.JPEG
│   ├── ...
├── n01443537
│   ├── n01443537_10007.JPEG
│   ├── n01443537_10014.JPEG
│   ├── ...
├── ...

If you haven't extracted the data, you can also place ILSVRC2012_img_train.tar/ILSVRC2012_img_val.tar (or symlinks to them) into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_train/ or ${XDG_CACHE}/autoencoders/data/ILSVRC2012_validation/, respectively; they will then be extracted into the above structure without being downloaded again. Note that this only happens if neither the folder ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ nor the file ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/.ready exists; remove them if you want to force the dataset preparation to run again.
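
For example, the tarballs can be linked into place without copying them; a sketch (the tar paths are placeholders):

import os

cache = os.environ.get("XDG_CACHE", os.path.expanduser("~/.cache"))
for split, tar in [("train", "/path/to/ILSVRC2012_img_train.tar"),
                   ("validation", "/path/to/ILSVRC2012_img_val.tar")]:
    target_dir = os.path.join(cache, "autoencoders/data", f"ILSVRC2012_{split}")
    os.makedirs(target_dir, exist_ok=True)
    link = os.path.join(target_dir, os.path.basename(tar))
    if not os.path.exists(link):
        os.symlink(tar, link)  # the preparation code extracts from here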

Model Training

Logs and checkpoints for trained models are saved to logs/<START_DATE_AND_TIME>_<config_spec>.

Training autoencoder models

Configs for training a KL-regularized autoencoder on ImageNet are provided at configs/autoencoder. Training can be started by running

CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/autoencoder/<config_spec>.yaml -t --gpus 0,    

where <config_spec> is one of: autoencoder_kl_8x8x64 (f=32, d=64), autoencoder_kl_16x16x16 (f=16, d=16), autoencoder_kl_32x32x4 (f=8, d=4), or autoencoder_kl_64x64x3 (f=4, d=3). The name of each config encodes the spatial size and channel count of the latent space for 256x256 inputs.
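
For example, training the f=8 variant (any of the configs above works the same way):

CUDA_VISIBLE_DEVICES=0 python main.py --base configs/autoencoder/autoencoder_kl_32x32x4.yaml -t --gpus 0,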

For training VQ-regularized models, see the taming-transformers repository.

Training LDMs

In configs/latent-diffusion/ we provide configs for training LDMs on the LSUN, CelebA-HQ, FFHQ, and ImageNet datasets. Training can be started by running

CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,

where <config_spec> is one of: celebahq-ldm-vq-4 (f=4, VQ-reg. autoencoder, spatial size 64x64x3), ffhq-ldm-vq-4 (f=4, VQ-reg. autoencoder, spatial size 64x64x3), lsun_bedrooms-ldm-vq-4 (f=4, VQ-reg. autoencoder, spatial size 64x64x3), lsun_churches-ldm-kl-8 (f=8, KL-reg. autoencoder, spatial size 32x32x4), or cin-ldm-vq-8 (f=8, VQ-reg. autoencoder, spatial size 32x32x4).
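
For example, to train the CelebA-HQ model, fill in the placeholders as:

CUDA_VISIBLE_DEVICES=0 python main.py --base configs/latent-diffusion/celebahq-ldm-vq-4.yaml -t --gpus 0,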

Coming Soon...

Comments

BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}