Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes


This is the repository containing code used for the Unleashing Transformers paper.

Sam Bond-Taylor*, Peter Hessey*, Hiroshi Sasaki, Toby P. Breckon, Chris G. Willcocks
* Authors contributed equally

Abstract

Whilst diffusion probabilistic models can generate high quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior. By contrast, in this paper we propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone. During training, tokens are randomly masked in an order-agnostic manner and the Transformer learns to predict the original tokens. This parallelism of Vector-Quantized token prediction in turn facilitates unconditional generation of globally consistent high-resolution and diverse imagery at a fraction of the computational expense. In this manner, we can generate image resolutions exceeding that of the original training set samples whilst additionally provisioning per-image likelihood estimates (in a departure from generative adversarial approaches). Our approach achieves state-of-the-art results in terms of Density (LSUN Bedroom: 1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73; FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ: 6.11) whilst offering advantages in terms of both computation and reduced training set requirements.
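The training-time masking described in the abstract can be illustrated with a small sketch. It assumes a uniformly sampled timestep `t` out of `T` and an independent masking probability of `t / T` per token; the `MASK` id, the function name and the toy codes are hypothetical and are not taken from this repository.

```python
import random

MASK = -1  # hypothetical id for the absorbing [MASK] state


def mask_tokens(tokens, num_timesteps, rng):
    """Sample a timestep t and mask each token independently with probability t / T."""
    t = rng.randint(1, num_timesteps)  # inclusive on both ends
    noised, targets = [], []
    for tok in tokens:
        if rng.random() < t / num_timesteps:
            noised.append(MASK)   # token absorbed into the mask state
            targets.append(tok)   # the model is trained to recover it
        else:
            noised.append(tok)    # token left untouched
            targets.append(None)  # no loss on unmasked positions
    return t, noised, targets


rng = random.Random(0)
codes = [5, 17, 3, 3, 42, 8, 0, 11]  # toy Vector-Quantized code indices
t, noised, targets = mask_tokens(codes, num_timesteps=8, rng=rng)
```

Because every position is masked independently of its location, the predictor sees no fixed generation order, which is what allows several tokens to be predicted in parallel at sampling time.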


arXiv | BibTeX | Project Page


Setup

Currently, a dedicated graphics card capable of running CUDA is required to run the code used in this repository. All models used for the paper were trained on a single NVIDIA RTX 2080 Ti using CUDA version 11.1.

Set up conda environment

To run the code in this repository we recommend you set up a virtual environment using conda. To get set up quickly, use miniconda.

Run the following commands to clone this repo with git, then create and activate the conda environment unleashing:

git clone https://github.com/samb-t/unleashing-transformers.git && cd unleashing-transformers
conda create --name unleashing --file requirements.yml
conda activate unleashing  

You should now be able to run all commands available in the following sections.

Dataset Setup

To configure the default paths for the datasets used to train the models in this repo, edit datasets.yaml and change the paths attribute of each dataset you wish to use to the location where that dataset is saved locally.

| Dataset | Official Link | Academic Torrents Link |
| --- | --- | --- |
| FFHQ | Official FFHQ | Academic Torrents FFHQ |
| LSUN | Official LSUN | Academic Torrents LSUN |

Commands

This section contains details on the basic commands for training and calculating metrics on the Absorbing Diffusion models. All training was completed on a single NVIDIA RTX 2080 Ti and these commands presume the same level of hardware. If your GPU has less VRAM than a 2080 Ti then you may need to train using smaller batch sizes and/or smaller models than the defaults.

For a detailed list of all command options, including altering model architecture, logging output, checkpointing frequency, etc., add the --help flag to the end of your command.

All commands should be run from the head directory, i.e. the directory containing the README file.

Set up visdom server

Before training, you'll need to start a visdom server in order to easily view model output (loss graphs, reconstructions, etc.). To do this, run the following command:

visdom -p 8097

This starts a visdom server listening on port 8097, which is the default used by our models. If you navigate to localhost:8097 you will be able to view the live server.

To specify a different port when training any models, use the --visdom_port flag.
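Before launching a long training run, it can be useful to confirm that something is actually listening on the chosen port. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and is not part of this repository or of visdom.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8097 is the port used in the visdom command above.
    print("visdom reachable:", port_is_open("localhost", 8097))
```

This only checks that a TCP listener exists; it does not verify that the listener is a visdom server.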

Train a Vector-Quantized autoencoder on LSUN Churches

The following command starts the training for a VQGAN on LSUN Churches:

python3 train_vqgan.py --dataset churches --log_dir vqae_churches --amp --batch_size 4

As specified with the --log_dir flag, results will be saved to the directory logs/vqae_churches. This includes all logs, model checkpoints and saved outputs. The --amp flag enables mixed-precision training, necessary for training using a batch size of 4 (the default) on a single 2080 Ti.

Train an Absorbing Diffusion sampler using the above Vector-Quantized autoencoder

After training the VQ model using the previous command, you'll be able to run the following command to train a discrete diffusion prior on the latent space of the Vector-Quantized model:

python3 train_sampler.py --sampler absorbing --dataset churches --log_dir absorbing_churches --ae_load_dir vqae_churches --ae_load_step 2200000 --amp 

The sampler needs to load the trained Vector-Quantized autoencoder in order to generate the latents it will use for training (and validation). Latents are cached the first time this is run, which speeds up subsequent training runs.
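The compute-once, reuse-afterwards pattern behind latent caching can be sketched as follows. This is an illustration of the general idea only: the function names, the pickle format and the cache location are assumptions, and the repository's own caching code may differ.

```python
import os
import pickle
import tempfile


def load_or_compute_latents(cache_path, compute_fn):
    """Return cached latents if present, otherwise compute and cache them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    latents = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(latents, f)
    return latents


def encode_dataset():
    # Hypothetical stand-in for running the autoencoder over the dataset.
    return [[1, 2, 3], [4, 5, 6]]


cache_file = os.path.join(tempfile.mkdtemp(), "latents.pkl")
first = load_or_compute_latents(cache_file, encode_dataset)   # computes and writes
second = load_or_compute_latents(cache_file, encode_dataset)  # reads from disk
```

A cache like this is only valid for the autoencoder checkpoint that produced it, so it would need to be regenerated if a different checkpoint is loaded.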

Experiments on trained Absorbing Diffusion Sampler

This section contains simple template commands for calculating metrics and other experiments on trained samplers.

Calculate FID

python experiments/calc_FID.py --sampler absorbing --dataset churches --log_dir FID_log --ae_load_dir vqae_churches --ae_load_step 2200000  --load_dir absorbing_churches --load_step 2000000 --n_samples 50000

Calculate PRDC Scores

python experiments/calc_PRDC.py --sampler absorbing --dataset churches --log_dir PRDC_log --ae_load_dir vqae_churches --ae_load_step 2200000 --load_dir absorbing_churches --load_step 2000000 --n_samples 50000
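For readers unfamiliar with Density and Coverage, the two PRDC scores quoted in the abstract, the sketch below computes them on toy 2-D points using the usual k-nearest-neighbour ball definitions. It is not the code used by `calc_PRDC.py`: real evaluations operate on image feature embeddings rather than raw points, and the function names here are hypothetical.

```python
import math


def kth_nn_radius(points, k):
    """Distance from each point to its k-th nearest neighbour among the others."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def density_and_coverage(real, fake, k):
    """Density and Coverage of `fake` samples with respect to `real` samples."""
    radii = kth_nn_radius(real, k)
    # inside[i][j] is True when fake sample j lies in the k-NN ball of real sample i.
    inside = [[math.dist(r, f) < rad for f in fake] for r, rad in zip(real, radii)]
    density = sum(sum(row) for row in inside) / (k * len(fake))
    coverage = sum(any(row) for row in inside) / len(real)
    return density, coverage


real = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(density_and_coverage(real, real, k=1))            # (1.0, 1.0)
print(density_and_coverage(real, [(10.0, 10.0)], k=1))  # (0.0, 0.0)
```

Density counts how many real-sample neighbourhoods each generated sample falls into, so it can exceed 1; Coverage is the fraction of real samples with at least one generated sample nearby, so it is bounded by 1.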

Calculate ELBO Estimates

The following command fine-tunes a Vector-Quantized autoencoder to compute reconstruction likelihood, and then evaluates the ELBO of the overall model.

python experiments/calc_approximate_ELBO.py --sampler absorbing --dataset churches --log_dir nll_churches --ae_load_dir vqae_churches --ae_load_step 2200000 --load_dir absorbing_churches --load_step 2000000 --steps_per_eval 5000 --train_steps 10000

NOTE: the --steps_per_eval flag is required for this script, as a validation dataset is used.

Find Nearest Neighbours

Produces a random batch of samples and finds the nearest neighbour images in the training set based on LPIPS distance.

python experiments/calc_nearest_neighbours.py --sampler absorbing --dataset churches --log_dir nearest_neighbours_churches --ae_load_dir vqae_churches --ae_load_step 2200000 --load_dir absorbing_churches --load_step 2000000
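The search itself is a ranking of training items by distance to each generated sample. The sketch below shows that ranking with Euclidean distance on toy 2-D points as a stand-in for LPIPS, which needs a pretrained network and is not reproduced here; the function name and data are hypothetical.

```python
import math


def nearest_neighbours(samples, training_set, distance=math.dist, k=1):
    """For each sample, return the indices of the k closest training items."""
    results = []
    for s in samples:
        ranked = sorted(range(len(training_set)),
                        key=lambda i: distance(s, training_set[i]))
        results.append(ranked[:k])
    return results


train = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
generated = [(0.9, 1.2), (4.0, 4.5)]
print(nearest_neighbours(generated, train))  # [[1], [2]]
```

Any callable that takes two items and returns a number can be passed as `distance`, so a perceptual metric could be substituted without changing the ranking logic.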

Generate Higher Resolution Samples

By applying the absorbing diffusion model at multiple locations at once and aggregating the denoising probabilities, samples larger than those seen during training can be generated (see Figures 4 and 11 of the paper).

python experiments/generate_big_samples.py --sampler absorbing --dataset churches --log_dir big_samples_churches --ae_load_dir vqae_churches --ae_load_step 2200000 --load_dir absorbing_churches --load_step 2000000 --shape 32 16

Use the --shape flag to specify the dimensions of the latents to generate.
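One way to picture the aggregation step is a sliding window over a latent grid larger than the training size, with the per-position code probabilities averaged wherever windows overlap. The sketch below assumes simple averaging, square windows, and a hypothetical `predict_window` callable standing in for the trained model; `generate_big_samples.py` may aggregate differently.

```python
def window_origins(size, window, stride):
    """Offsets along one axis so that windows of length `window` cover [0, size)."""
    origins = list(range(0, size - window + 1, stride))
    if origins[-1] != size - window:
        origins.append(size - window)  # make sure the final edge is covered
    return origins


def aggregate_probs(grid, window, stride, predict_window, num_codes):
    """Average per-position code probabilities over all overlapping windows."""
    height, width = len(grid), len(grid[0])
    sums = [[[0.0] * num_codes for _ in range(width)] for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for top in window_origins(height, window, stride):
        for left in window_origins(width, window, stride):
            crop = [row[left:left + window] for row in grid[top:top + window]]
            probs = predict_window(crop)  # hypothetical model call
            for dy in range(window):
                for dx in range(window):
                    counts[top + dy][left + dx] += 1
                    for c in range(num_codes):
                        sums[top + dy][left + dx][c] += probs[dy][dx][c]
    return [[[v / counts[y][x] for v in sums[y][x]] for x in range(width)]
            for y in range(height)]


def uniform_model(crop):
    # Toy predictor: every position gets a uniform distribution over 4 codes.
    return [[[0.25] * 4 for _ in row] for row in crop]


latents = [[0] * 5 for _ in range(3)]  # toy 3 x 5 latent grid
probs = aggregate_probs(latents, window=2, stride=1,
                        predict_window=uniform_model, num_codes=4)
```

Averaging valid probability distributions yields a valid distribution at every position, so the aggregated output can be sampled from in the same way as a single-window prediction.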

BibTeX

@article{bond2021unleashing,
  title     = {Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes},
  author    = {Sam Bond-Taylor and Peter Hessey and Hiroshi Sasaki and Toby P. Breckon and Chris G. Willcocks},
  journal   = {arXiv preprint arXiv:2111.12701},
  year      = {2021}
}