High-Resolution Image Synthesis with Latent Diffusion Models

Overview

Latent Diffusion Models

arXiv | BibTeX

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer
* equal contribution

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm

Model Zoo

Pretrained Autoencoding Models

| Model | FID vs val | PSNR | PSIM | Link | Comments |
|---|---|---|---|---|---|
| f=4, VQ (Z=8192, d=3) | 0.58 | 27.43 +/- 4.26 | 0.53 +/- 0.21 | https://ommer-lab.com/files/latent-diffusion/vq-f4.zip | |
| f=4, VQ (Z=8192, d=3) | 1.06 | 25.21 +/- 4.17 | 0.72 +/- 0.26 | https://heibox.uni-heidelberg.de/f/9c6681f64bb94338a069/?dl=1 | no attention |
| f=8, VQ (Z=16384, d=4) | 1.14 | 23.07 +/- 3.99 | 1.17 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/vq-f8.zip | |
| f=8, VQ (Z=256, d=4) | 1.49 | 22.35 +/- 3.81 | 1.26 +/- 0.37 | https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip | |
| f=16, VQ (Z=16384, d=8) | 5.15 | 20.83 +/- 3.61 | 1.73 +/- 0.43 | https://heibox.uni-heidelberg.de/f/0e42b04e2e904890a9b6/?dl=1 | |
| f=4, KL | 0.27 | 27.53 +/- 4.54 | 0.55 +/- 0.24 | https://ommer-lab.com/files/latent-diffusion/kl-f4.zip | |
| f=8, KL | 0.90 | 24.19 +/- 4.19 | 1.02 +/- 0.35 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | |
| f=16, KL (d=16) | 0.87 | 24.08 +/- 4.22 | 1.07 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f16.zip | |
| f=32, KL (d=64) | 2.04 | 22.27 +/- 3.93 | 1.41 +/- 0.40 | https://ommer-lab.com/files/latent-diffusion/kl-f32.zip | |
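The PSNR column measures how closely a reconstruction matches its input (higher is better). The sketch below uses the standard definition, PSNR = 10 * log10(MAX^2 / MSE), assuming 8-bit images held as NumPy arrays. Reading the "+/-" entries as the mean and standard deviation of per-image scores is an assumption; the evaluation code behind the table may differ in details such as value range or averaging.

```python
import numpy as np


def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    ref = reference.astype(np.float64)
    rec = reconstruction.astype(np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical images have unbounded PSNR
    return float(10.0 * np.log10(max_value ** 2 / mse))


def psnr_mean_std(pairs) -> tuple[float, float]:
    """Mean and standard deviation of per-image PSNR over (reference, reconstruction) pairs."""
    scores = np.array([psnr(ref, rec) for ref, rec in pairs])
    return float(scores.mean()), float(scores.std())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    noise = rng.integers(-5, 6, size=image.shape)
    noisy = np.clip(image.astype(np.int64) + noise, 0, 255).astype(np.uint8)
    print(f"PSNR: {psnr(image, noisy):.2f} dB")
```

Summarizing per-image scores (rather than computing one MSE over the whole set) is what makes a "mean +/- std" report possible.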

Get the models

Running the following script downloads and extracts all available pretrained autoencoding models.

bash scripts/download_first_stages.sh

The first stage models can then be found in models/first_stage_models/<model_spec>.

Pretrained LDMs

| Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments |
|---|---|---|---|---|---|---|---|---|
| CelebA-HQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 5.11 (5.11) | 3.29 | 0.72 | 0.49 | https://ommer-lab.com/files/latent-diffusion/celeba.zip | |
| FFHQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 4.98 (4.98) | 4.50 (4.50) | 0.73 | 0.50 | https://ommer-lab.com/files/latent-diffusion/ffhq.zip | |
| LSUN-Churches | Unconditional Image Synthesis | LDM-KL-8 (400 DDIM steps, eta=0) | 4.02 (4.02) | 2.72 | 0.64 | 0.52 | https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip | |
| LSUN-Bedrooms | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 2.95 (3.0) | 2.22 (2.23) | 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip | |
| ImageNet | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77 (7.76)* / 15.82** | 201.56 (209.52)* / 78.82** | 0.84* / 0.65** | 0.35* / 0.63** | https://ommer-lab.com/files/latent-diffusion/cin.zip | *: w/ guiding, classifier_scale 10; **: w/o guiding; scores in brackets calculated with the script provided by ADM |
| Conceptual Captions | Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79 | 13.89 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/text2img.zip | finetuned from LAION |
| OpenImages | Super-resolution | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip | BSR image degradation |
| OpenImages | Layout-to-Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02 | 15.92 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip | |
| Landscapes | Semantic Image Synthesis | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis256.zip | |
| Landscapes | Semantic Image Synthesis | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip | finetuned on resolution 512x512 |

Get the models

The LDMs listed above can jointly be downloaded and extracted via

bash scripts/download_models.sh

The models can then be found in models/ldm/<model_spec>.

Sampling with unconditional models

We provide a first script for sampling from our unconditional models. Start it via

CUDA_VISIBLE_DEVICES=<GPU_ID> python scripts/sample_diffusion.py -r models/ldm/<model_spec>/model.ckpt -l <logdir> -n <#samples> --batch_size <batch_size> -c <#ddim steps> -e <eta>
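In this command, -c sets the number of DDIM sampling steps and -e sets eta. The sketch below illustrates what these two values control, using the update rule from the DDIM paper (Song et al., 2020): eta=0 makes sampling deterministic, while eta=1 corresponds to DDPM-like stochastic sampling. The linear beta schedule, its endpoints, and the uniform timestep spacing are illustrative assumptions, not values read from the repository's configs or sampler.

```python
import numpy as np


def ddim_parameters(num_ddim_steps: int, eta: float, num_train_steps: int = 1000,
                    beta_start: float = 1e-4, beta_end: float = 2e-2):
    """Timesteps, cumulative alphas and noise scales for a DDIM sampler.

    Illustrative assumptions: a linear beta schedule and uniformly spaced timesteps.
    """
    betas = np.linspace(beta_start, beta_end, num_train_steps, dtype=np.float64)
    alphas_cumprod = np.cumprod(1.0 - betas)

    # Uniform subsequence of training timesteps, e.g. 0, 5, 10, ... for 200 of 1000.
    # If num_train_steps is not a multiple of num_ddim_steps, this sketch just truncates.
    stride = num_train_steps // num_ddim_steps
    timesteps = np.arange(0, num_train_steps, stride)[:num_ddim_steps]

    alpha_bar = alphas_cumprod[timesteps]
    # "Previous" value for each step; before the first timestep we take alpha_bar = 1 (clean data).
    alpha_bar_prev = np.concatenate([[1.0], alpha_bar[:-1]])

    # sigma_t = eta * sqrt((1 - a_prev) / (1 - a_t)) * sqrt(1 - a_t / a_prev)
    sigmas = (eta * np.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar))
              * np.sqrt(1.0 - alpha_bar / alpha_bar_prev))
    return timesteps, alpha_bar, alpha_bar_prev, sigmas


def ddim_step(x_t, eps, a_t, a_prev, sigma, noise):
    """One DDIM update from x_t to the previous timestep, given the predicted noise eps."""
    pred_x0 = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)  # estimate of the clean sample
    direction = np.sqrt(1.0 - a_prev - sigma ** 2) * eps       # deterministic part pointing to x_t
    return np.sqrt(a_prev) * pred_x0 + direction + sigma * noise


if __name__ == "__main__":
    steps, _, _, sig = ddim_parameters(200, eta=1.0)
    print(len(steps), "steps, largest sigma:", float(sig.max()))
```

With eta=0 every sigma is zero, so the noise term drops out and the same starting latent always decodes to the same sample; fewer steps (-c) trade sample quality for speed.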

Inpainting

Download the pre-trained weights

wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1

and sample with

python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results

indir should contain images (*.png) and their corresponding masks (<image_fname>_mask.png), like the examples provided in data/inpainting_examples.
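As a pre-flight check before running inpaint.py, the hypothetical helper below pairs each mask with its image according to the naming convention just described and fails early if an image is missing. It is a sketch for validating an input folder, not the script's own loading code.

```python
import tempfile
from pathlib import Path

MASK_SUFFIX = "_mask.png"


def find_inpainting_pairs(indir):
    """Pair every '<name>_mask.png' in indir with its image '<name>.png'.

    Returns a sorted list of (image_path, mask_path); raises if an image is missing.
    """
    pairs = []
    for mask in sorted(Path(indir).glob("*" + MASK_SUFFIX)):
        image = mask.with_name(mask.name[: -len(MASK_SUFFIX)] + ".png")
        if not image.is_file():
            raise FileNotFoundError(f"no image found for mask {mask.name}")
        pairs.append((image, mask))
    return pairs


if __name__ == "__main__":
    # Build a throwaway folder that follows the naming convention and list its pairs.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["a.png", "a_mask.png", "b.png", "b_mask.png"]:
            (Path(tmp) / name).touch()
        for image, mask in find_inpainting_pairs(tmp):
            print(image.name, "<->", mask.name)
```

Starting from the masks rather than the images means that stray .png files without a mask are simply ignored.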

Train your own LDMs

Data preparation

Faces

For downloading the CelebA-HQ and FFHQ datasets, proceed as described in the taming-transformers repository.

LSUN

The LSUN datasets can be conveniently downloaded via the script available here. We performed a custom split into training and validation images and provide the corresponding filenames at https://ommer-lab.com/files/lsun.zip. After downloading, extract them to ./data/lsun. The beds/cats/churches subsets should also be placed or symlinked at ./data/lsun/bedrooms, ./data/lsun/cats, and ./data/lsun/churches, respectively.
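The placement step above can be done with plain symlinks. A minimal sketch, assuming the subsets were extracted somewhere else on disk; the source folders (and the hypothetical "church_outdoor" folder in the demo) are placeholders for wherever your download actually lives.

```python
import tempfile
from pathlib import Path


def link_lsun_subsets(sources, data_root="./data/lsun"):
    """Symlink extracted LSUN subsets to data_root/<name>.

    `sources` maps target names ("bedrooms", "cats", "churches") to the folders
    where those subsets were extracted; these locations are user-specific.
    Existing entries under data_root are left untouched.
    """
    root = Path(data_root)
    root.mkdir(parents=True, exist_ok=True)
    links = {}
    for name, src in sources.items():
        link = root / name
        if not link.exists() and not link.is_symlink():
            link.symlink_to(Path(src).resolve(), target_is_directory=True)
        links[name] = link
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "downloads" / "church_outdoor"  # hypothetical download folder
        src.mkdir(parents=True)
        links = link_lsun_subsets({"churches": src}, data_root=Path(tmp) / "data" / "lsun")
        print(links["churches"], "->", links["churches"].resolve())
```

Linking to absolute (resolved) paths keeps the links valid regardless of the directory training is launched from.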

ImageNet

The code will try to download (through Academic Torrents) and prepare ImageNet the first time it is used. However, since ImageNet is quite large, this requires a lot of disk space and time. If you already have ImageNet on your disk, you can speed things up by putting the data into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ (which defaults to ~/.cache/autoencoders/data/ILSVRC2012_{split}/data/), where {split} is one of train/validation. It should have the following structure:

${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/
├── n01440764
│   ├── n01440764_10026.JPEG
│   ├── n01440764_10027.JPEG
│   ├── ...
├── n01443537
│   ├── n01443537_10007.JPEG
│   ├── n01443537_10014.JPEG
│   ├── ...
├── ...
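To confirm that an existing copy matches the tree above before training, the hypothetical checker below encodes only what the tree shows: class folders named by WordNet IDs (such as n01440764) that contain .JPEG files. It is not the repository's own validation, and the data loader may accept or require more than this.

```python
import re
import tempfile
from pathlib import Path

# WordNet synset IDs used as class folder names, e.g. n01440764.
SYNSET_PATTERN = re.compile(r"n\d{8}")


def check_imagenet_layout(data_dir):
    """Return a list of problems found in an ILSVRC2012_{split}/data/ folder (empty if none)."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return [f"{data_dir} is not a directory"]
    problems = []
    for entry in sorted(data_dir.iterdir()):
        if not entry.is_dir() or not SYNSET_PATTERN.fullmatch(entry.name):
            problems.append(f"unexpected entry: {entry.name}")
        elif not any(f.suffix == ".JPEG" for f in entry.iterdir()):
            problems.append(f"no .JPEG files in {entry.name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cls = Path(tmp) / "n01440764"
        cls.mkdir()
        (cls / "n01440764_10026.JPEG").touch()
        print(check_imagenet_layout(tmp) or "layout looks fine")
```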

If you haven't extracted the data, you can also place ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar (or symlinks to them) into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_train/ and ${XDG_CACHE}/autoencoders/data/ILSVRC2012_validation/, respectively. They will then be extracted into the above structure without being downloaded again. Note that this only happens if neither the folder ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ nor the file ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/.ready exists. Remove them if you want to force the dataset preparation to run again.
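The condition for re-running preparation can be written as a small predicate, which helps check what state a machine is in before launching training. A sketch, assuming the paths exactly as given above with split being train or validation; the environment variable name is taken from this README, and the loader itself may read a different one.

```python
import os
import tempfile
from pathlib import Path


def imagenet_split_dir(split, cache_root=None):
    """${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}, defaulting to ~/.cache."""
    if cache_root is None:
        # Variable name as written in this README; the loader itself may read a different one.
        cache_root = os.environ.get("XDG_CACHE", os.path.expanduser("~/.cache"))
    return Path(cache_root) / "autoencoders" / "data" / f"ILSVRC2012_{split}"


def will_prepare(split, cache_root=None):
    """True if dataset preparation would run: neither data/ nor .ready exists for the split."""
    base = imagenet_split_dir(split, cache_root)
    return not (base / "data").exists() and not (base / ".ready").exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(will_prepare("train", tmp))  # nothing there yet -> True
        (imagenet_split_dir("train", tmp) / "data").mkdir(parents=True)
        print(will_prepare("train", tmp))  # data/ exists -> False
```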

Model Training

Logs and checkpoints for trained models are saved to logs/<START_DATE_AND_TIME>_<config_spec>.

Training autoencoder models

Configs for training a KL-regularized autoencoder on ImageNet are provided at configs/autoencoder. Training can be started by running

CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/autoencoder/<config_spec>.yaml -t --gpus 0,    

where config_spec is one of {autoencoder_kl_8x8x64 (f=32, d=64), autoencoder_kl_16x16x16 (f=16, d=16), autoencoder_kl_32x32x4 (f=8, d=4), autoencoder_kl_64x64x3 (f=4, d=3)}.
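The config names appear to spell out the latent shape for 256x256 inputs: for example, f=8 turns a 256x256 image into a 256/8 = 32, i.e. 32x32, grid with d=4 channels. The hypothetical helper below recovers f and d from a name under that 256x256 assumption and reports how many times fewer values the latent stores than the RGB image.

```python
import re


def parse_kl_config(config_spec: str, image_size: int = 256):
    """Recover (f, d) from a config name such as 'autoencoder_kl_32x32x4'.

    Assumes the name encodes a square h x w x d latent for image_size x image_size inputs.
    """
    match = re.fullmatch(r"autoencoder_kl_(\d+)x(\d+)x(\d+)", config_spec)
    if match is None:
        raise ValueError(f"unrecognized config name: {config_spec}")
    h, w, d = map(int, match.groups())
    if h != w or image_size % h:
        raise ValueError(f"{h}x{w} latent does not evenly divide a {image_size}px square image")
    return image_size // h, d


def compression_ratio(config_spec: str, image_size: int = 256, channels: int = 3) -> float:
    """How many times fewer values the latent holds than the RGB input."""
    f, d = parse_kl_config(config_spec, image_size)
    latent_side = image_size // f
    return (image_size * image_size * channels) / (latent_side * latent_side * d)


if __name__ == "__main__":
    for spec in ["autoencoder_kl_8x8x64", "autoencoder_kl_16x16x16",
                 "autoencoder_kl_32x32x4", "autoencoder_kl_64x64x3"]:
        f, d = parse_kl_config(spec)
        print(f"{spec}: f={f}, d={d}, {compression_ratio(spec):.0f}x fewer values")
```

The f values recovered this way match the ones listed above, which is what supports the 256x256 reading of the names.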

For training VQ-regularized models, see the taming-transformers repository.

Training LDMs

In configs/latent-diffusion/ we provide configs for training LDMs on the LSUN-, CelebA-HQ, FFHQ and ImageNet datasets. Training can be started by running

CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,

where <config_spec> is one of {celebahq-ldm-vq-4 (f=4, VQ-reg. autoencoder, spatial size 64x64x3), ffhq-ldm-vq-4 (f=4, VQ-reg. autoencoder, spatial size 64x64x3), lsun_bedrooms-ldm-vq-4 (f=4, VQ-reg. autoencoder, spatial size 64x64x3), lsun_churches-ldm-kl-8 (f=8, KL-reg. autoencoder, spatial size 32x32x4), cin-ldm-vq-8 (f=8, VQ-reg. autoencoder, spatial size 32x32x4)}.

Coming Soon...

Comments

BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
CompVis Heidelberg
Computer Vision research group at the Ruprecht-Karls-University Heidelberg