Ladder Variational Autoencoders (LVAE) in PyTorch

Overview

Ladder Variational Autoencoders (LVAE)

PyTorch implementation of Ladder Variational Autoencoders (LVAE) [1]:

                 LVAE equation

where the variational distributions q at each layer are multivariate Normal with diagonal covariance.

Significant differences from [1] include:

  • skip connections in the generative path: conditioning on all layers above rather than only on the layer above (see for example [2])
  • spatial (convolutional) latent variables
  • free bits [3] instead of beta annealing [4]

Install requirements and run MNIST example

pip install -r requirements.txt
CUDA_VISIBLE_DEVICES=0 python main.py --zdims 32 32 32 --downsample 1 1 1 --nonlin elu --skip --blocks-per-layer 4 --gated --freebits 0.5 --learn-top-prior --data-dep-init --seed 42 --dataset static_mnist

Dependencies include boilr (a framework for PyTorch) and multiobject (which provides multi-object datasets with PyTorch dataloaders).

Likelihood results

Log likelihood bounds on the test set (average over 4 random seeds).

dataset num layers -ELBO - log p(x)
[100 iws]
- log p(x)
[1000 iws]
binarized MNIST 3 82.14 79.47 79.24
binarized MNIST 6 80.74 78.65 78.52
binarized MNIST 12 80.50 78.50 78.30
multi-dSprites (0-2) 12 26.9 23.2
SVHN 15 4012 (1.88) 3973 (1.87)
CIFAR10 3 7651 (3.59) 7591 (3.56)
CIFAR10 6 7321 (3.44) 7268 (3.41)
CIFAR10 15 7128 (3.35) 7068 (3.32)
CelebA 20 20026 (2.35) 19913 (2.34)

Note:

  • Bits per dimension in brackets.
  • 'iws' stands for importance weighted samples. More samples means tighter log likelihood lower bound. The bound converges to the actual log likelihood as the number of samples goes to infinity [5]. Note that the model is always trained with the ELBO (1 sample).
  • Each pixel in the images is modeled independently. The likelihood is Bernoulli for binary images, and discretized mixture of logistics with 10 components [6] otherwise.
  • One day I'll get around to evaluating the IW bound on all datasets with 10000 samples.

Supported datasets

  • Statically binarized MNIST [7], see Hugo Larochelle's website http://www.cs.toronto.edu/~larocheh/public/datasets/
  • SVHN
  • CIFAR10
  • CelebA rescaled and cropped to 64x64 – see code for details. The path in experiment.data.DatasetLoader has to be modified
  • binary multi-dSprites: 64x64 RGB shapes (0 to 2) in each image

Samples

Binarized MNIST

MNIST samples

Multi-dSprites

multi-dSprites samples

SVHN

SVHN samples

CIFAR

CIFAR samples

CelebA

CelebA samples

Hierarchical representations

Here we try to visualize the representations learned by individual layers. We can get a rough idea of what's going on at layer i as follows:

  • Sample latent variables from all layers above layer i (Eq. 1).

  • With these variables fixed, take S conditional samples at layer i (Eq. 2). Note that they are all conditioned on the same samples. These correspond to one row in the images below.

  • For each of these samples (each small image in the images below), pick the mode/mean of the conditional distribution of each layer below (Eq. 3).

  • Finally, sample an image x given the latent variables (Eq. 4).

Formally:

                

where s = 1, ..., S denotes the sample index.

The equations above yield S sample images conditioned on the same values of z for layers i+1 to L. These S samples are shown in one row of the images below. Notice that samples from each row are almost identical when the variability comes from a low-level layer, as such layers mostly model local structure and details. Higher layers on the other hand model global structure, and we observe more and more variability in each row as we move to higher layers. When the sampling happens in the top layer (i = L), all samples are completely independent, even within a row.

Binarized MNIST: layers 4, 8, 10, and 12 (top layer)

MNIST layers 4   MNIST layers 8

MNIST layers 10   MNIST layers 12

SVHN: layers 4, 10, 13, and 15 (top layer)

SVHN layers 4   SVHN layers 10

SVHN layers 13   SVHN layers 15

CIFAR: layers 3, 7, 10, and 15 (top layer)

CIFAR layers 3   CIFAR layers 7

CIFAR layers 10   CIFAR layers 15

CelebA: layers 6, 11, 16, and 20 (top layer)

CelebA layers 6

CelebA layers 11

CelebA layers 16

CelebA layers 20

Multi-dSprites: layers 3, 7, 10, and 12 (top layer)

MNIST layers 4   MNIST layers 8

MNIST layers 10   MNIST layers 12

Experimental details

I did not perform an extensive hyperparameter search, but this worked pretty well:

  • Downsampling by a factor of 2 in the beginning of inference. After that, activations are downsampled 4 times for 64x64 images (CelebA and multi-dSprites), and 3 times otherwise. The spatial size of the final feature map is always 2x2. Between these downsampling steps there is approximately the same number of stochastic layers.
  • 4 residual blocks between stochastic layers. Haven't tried with more than 4 though, as models become quite big and we get diminishing returns.
  • The deterministic parts of bottom-up and top-down architecture are (almost) perfectly mirrored for simplicity.
  • Stochastic layers have spatial random variables, and the number of rvs per "location" (i.e. number of channels of the feature map after sampling from a layer) is 32 in all layers.
  • All other feature maps in deterministic paths have 64 channels.
  • Skip connections in the generative model (--skip).
  • Gated residual blocks (--gated).
  • Learned prior of the top layer (--learn-top-prior).
  • A form of data-dependent initialization of weights (--data-dep-init). See code for details.
  • freebits=1.0 in experiments with more than 6 stochastic layers, and 0.5 for smaller models.
  • For everything else, see _add_args() in experiment/experiment_manager.py.

With these settings, the number of parameters is roughly 1M per stochastic layer. I tried to control for this by experimenting e.g. with half the number of layers but twice the number of residual blocks, but it looks like the number of stochastic layers is what matters the most.

References

[1] CK Sønderby, T Raiko, L Maaløe, SK Sønderby, O Winther. Ladder Variational Autoencoders, NIPS 2016

[2] L Maaløe, M Fraccaro, V Liévin, O Winther. BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling, NeurIPS 2019

[3] DP Kingma, T Salimans, R Jozefowicz, X Chen, I Sutskever, M Welling. Improved Variational Inference with Inverse Autoregressive Flow, NIPS 2016

[4] I Higgins, L Matthey, A Pal, C Burgess, X Glorot, M Botvinick, S Mohamed, A Lerchner. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, ICLR 2017

[5] Y Burda, RB Grosse, R Salakhutdinov. Importance Weighted Autoencoders, ICLR 2016

[6] T Salimans, A Karpathy, X Chen, DP Kingma. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications, ICLR 2017

[7] H Larochelle, I Murray. The neural autoregressive distribution estimator, AISTATS 2011

Owner
Andrea Dittadi
PhD student at DTU Compute | representation learning, deep generative models
Andrea Dittadi
Code for CVPR 2018 paper --- Texture Mapping for 3D Reconstruction with RGB-D Sensor

G2LTex This repository contains the implementation of "Texture Mapping for 3D Reconstruction with RGB-D Sensor (CVPR2018)" based on mvs-texturing. Due

Fu Yanping(付燕平) 129 Dec 30, 2022
Learning to Segment Instances in Videos with Spatial Propagation Network

Learning to Segment Instances in Videos with Spatial Propagation Network This paper is available at the 2017 DAVIS Challenge website. Check our result

Jingchun Cheng 145 Sep 28, 2022
Official code for article "Expression is enough: Improving traffic signal control with advanced traffic state representation"

1 Introduction Official code for article "Expression is enough: Improving traffic signal control with advanced traffic state representation". The code s

Liang Zhang 10 Dec 10, 2022
Creating Multi Task Models With Keras

Creating Multi Task Models With Keras About The Project! I used the keras and Tensorflow Library, To build a Deep Learning Neural Network to Creating

Srajan Chourasia 4 Nov 28, 2022
The final project of "Applying AI to 2D Medical Imaging Data" of "AI for Healthcare" nanodegree - Udacity.

Pneumonia Detection from X-Rays Project Overview In this project, you will apply the skills that you have acquired in this 2D medical imaging course t

Omar Laham 1 Jan 14, 2022
Official implementation of UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

UTNet (Accepted at MICCAI 2021) Official implementation of UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation Introduction Transf

110 Jan 01, 2023
Veri Setinizi Yolov5 Formatına Dönüştürün

Veri Setinizi Yolov5 Formatına Dönüştürün! Bu Repo da Neler Var? Xml Formatındaki Veri Setini .Txt Formatına Çevirme Xml Formatındaki Dosyaları Silme

Kadir Nar 4 Aug 22, 2022
YOLTv5 rapidly detects objects in arbitrarily large aerial or satellite images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks

YOLTv5 rapidly detects objects in arbitrarily large aerial or satellite images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks.

Adam Van Etten 145 Jan 01, 2023
A repository that finds a person who looks like you by using face recognition technology.

Find Your Twin Hello everyone, I've always wondered how casting agencies do the casting for a scene where a certain actor is young or old for a movie

Cengizhan Yurdakul 3 Jan 29, 2022
Semantic graph parser based on Categorial grammars

Lambekseq "Everyone who failed Greek or Latin hates it." This package is for proving theorems in Categorial grammars (CG) and constructing semantic gr

10 Aug 19, 2022
A simple software for capturing human body movements using the Kinect camera.

KinectMotionCapture A simple software for capturing human body movements using the Kinect camera. The software can seamlessly save joints and bones po

Aleksander Palkowski 5 Aug 13, 2022
Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch.

SE3 Transformer - Pytorch Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch. May be needed for replicating Alphafold2 resu

Phil Wang 207 Dec 23, 2022
Code for ICCV 2021 paper Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes using Scene Graphs

Graph-to-3D This is the official implementation of the paper Graph-to-3d: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs | arx

Helisa Dhamo 33 Jan 06, 2023
Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Segmenter: Transformer for Semantic Segmentation Segmenter: Transformer for Semantic Segmentation by Robin Strudel*, Ricardo Garcia*, Ivan Laptev and

594 Jan 06, 2023
JugLab 33 Dec 30, 2022
CCPD: a diverse and well-annotated dataset for license plate detection and recognition

CCPD (Chinese City Parking Dataset, ECCV) UPdate on 10/03/2019. CCPD Dataset is now updated. We are confident that images in subsets of CCPD is much m

detectRecog 1.8k Dec 30, 2022
Deep Compression for Dense Point Cloud Maps.

DEPOCO This repository implements the algorithms described in our paper Deep Compression for Dense Point Cloud Maps. How to get started (using Docker)

Photogrammetry & Robotics Bonn 67 Dec 06, 2022
Graph-based community clustering approach to extract protein domains from a predicted aligned error matrix

Using a predicted aligned error matrix corresponding to an AlphaFold2 model , returns a series of lists of residue indices, where each list corresponds to a set of residues clustering together into a

Tristan Croll 24 Nov 23, 2022
Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"

RNN-for-Joint-NLU Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"

Kim SungDong 194 Dec 28, 2022
Pytorch implementation of One-Shot Affordance Detection

One-shot Affordance Detection PyTorch implementation of our one-shot affordance detection models. This repository contains PyTorch evaluation code, tr

46 Dec 12, 2022