Vector Quantization, in Pytorch

Last update: Jan 08, 2023

Overview

Vector Quantization - Pytorch

A vector quantization library originally transcribed from DeepMind's TensorFlow implementation, made conveniently into a package. It uses exponential moving averages to update the dictionary.

VQ has been successfully used by DeepMind and OpenAI for high-quality generation of images (VQ-VAE-2) and music (Jukebox).

Install

$ pip install vector-quantize-pytorch

Usage

import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize(
    dim = 256,
    codebook_size = 512,     # codebook size
    decay = 0.8,             # the exponential moving average decay, lower means the dictionary will change faster
    commitment = 1.          # the weight on the commitment loss
)

x = torch.randn(1, 1024, 256)
quantized, indices, commit_loss = vq(x) # (1, 1024, 256), (1, 1024), (1)
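
To make the exponential-moving-average dictionary update concrete, here is a minimal sketch in plain PyTorch. It is not the library's code: the function name, the buffer names (cluster_size, embed_sum) and the choice to start cluster_size at ones are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def ema_codebook_update(codebook, cluster_size, embed_sum, x, decay=0.8, eps=1e-5):
    """One EMA dictionary update, in place. All names here are illustrative."""
    # x: (n, dim) flattened encoder outputs, codebook: (codebook_size, dim)
    indices = torch.cdist(x, codebook).argmin(dim=-1)        # nearest code per vector
    onehot = F.one_hot(indices, codebook.shape[0]).type(x.dtype)

    # Moving averages of the hit count per code and of the vectors assigned to it.
    cluster_size.mul_(decay).add_(onehot.sum(dim=0), alpha=1 - decay)
    embed_sum.mul_(decay).add_(onehot.t() @ x, alpha=1 - decay)

    # Laplace smoothing avoids dividing by a cluster size of exactly zero.
    n = cluster_size.sum()
    smoothed = (cluster_size + eps) / (n + codebook.shape[0] * eps) * n
    codebook.copy_(embed_sum / smoothed.unsqueeze(1))
    return indices

torch.manual_seed(0)
codebook = torch.randn(512, 256)
cluster_size = torch.ones(512)      # assumption: start at one so unused codes stay put
embed_sum = codebook.clone()
x = torch.randn(1024, 256)
indices = ema_codebook_update(codebook, cluster_size, embed_sum, x)
```

With a lower decay the new batch statistics get more weight, which is why the dictionary changes faster.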

Variants

The SoundStream paper proposes to use multiple vector quantizers to recursively quantize the residuals of the waveform. You can use this with the ResidualVQ class and one extra initialization parameter, num_quantizers.

import torch
from vector_quantize_pytorch import ResidualVQ

residual_vq = ResidualVQ(
    dim = 256,
    num_quantizers = 8,      # specify number of quantizers
    codebook_size = 1024,    # codebook size
)

x = torch.randn(1, 1024, 256)
quantized, indices, commit_loss = residual_vq(x)

# (1, 1024, 256), (8, 1, 1024), (8, 1)
# (batch, seq, dim), (quantizer, batch, seq), (quantizer, batch)
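
The recursion itself is short. The following is a rough sketch of residual quantization with fixed random codebooks, assuming plain nearest-neighbour lookup; residual_quantize and the codebook scaling are hypothetical and only meant to show how each stage codes what the previous stages left over.

```python
import torch

def residual_quantize(x, codebooks):
    """Quantize x with a stack of codebooks, each coding the previous residual."""
    residual = x
    quantized = torch.zeros_like(x)
    all_indices = []
    for codebook in codebooks:                        # each: (codebook_size, dim)
        flat = residual.reshape(-1, residual.shape[-1])
        indices = torch.cdist(flat, codebook).argmin(dim=-1).view(residual.shape[:-1])
        chosen = codebook[indices]                    # (batch, seq, dim)
        quantized = quantized + chosen                # running sum of the stage outputs
        residual = residual - chosen                  # what the next stage has to code
        all_indices.append(indices)
    return quantized, torch.stack(all_indices)        # indices: (quantizer, batch, seq)

torch.manual_seed(0)
x = torch.randn(1, 1024, 256)
# Assumption: later codebooks are scaled down, since residuals tend to shrink.
codebooks = [torch.randn(1024, 256) * 0.5 ** i for i in range(8)]
quantized, indices = residual_quantize(x, codebooks)
```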

Initialization

The SoundStream paper proposes that the codebook should be initialized by the k-means centroids of the first batch. You can turn on this feature with one flag, kmeans_init = True, for either the VectorQuantize or ResidualVQ class.

import torch
from vector_quantize_pytorch import ResidualVQ

residual_vq = ResidualVQ(
    dim = 256,
    codebook_size = 256,
    num_quantizers = 4,
    kmeans_init = True,   # set to True
    kmeans_iters = 10     # number of kmeans iterations to calculate the centroids for the codebook on init
)

x = torch.randn(1, 1024, 256)
quantized, indices, commit_loss = residual_vq(x)
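
For intuition, k-means initialization can be sketched as a few rounds of Lloyd's algorithm over the flattened first batch. This is an illustrative stand-in, not the library's routine; kmeans_centroids and its handling of empty clusters are assumptions.

```python
import torch

def kmeans_centroids(samples, num_clusters, num_iters=10):
    """Plain Lloyd's k-means; returns (num_clusters, dim) centroids."""
    # Start from distinct random samples.
    perm = torch.randperm(samples.shape[0])[:num_clusters]
    centroids = samples[perm].clone()
    for _ in range(num_iters):
        assign = torch.cdist(samples, centroids).argmin(dim=-1)
        counts = torch.bincount(assign, minlength=num_clusters)
        sums = torch.zeros_like(centroids).index_add_(0, assign, samples)
        # Assumption: empty clusters simply keep their previous centroid.
        nonempty = counts > 0
        centroids[nonempty] = sums[nonempty] / counts[nonempty].unsqueeze(1)
    return centroids

torch.manual_seed(0)
first_batch = torch.randn(1, 1024, 256)
init_codebook = kmeans_centroids(first_batch.reshape(-1, 256), num_clusters=256, num_iters=10)
```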

Increasing codebook usage

This repository will contain a few techniques from various papers to combat "dead" codebook entries, which is a common problem when using vector quantizers.

Lower codebook dimension

The Improved VQGAN paper proposes keeping the codebook in a lower dimension. The encoder values are projected down before quantization and projected back up to the original dimension afterwards. You can set this with the codebook_dim hyperparameter.

import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize(
    dim = 256,
    codebook_size = 256,
    codebook_dim = 16      # paper proposes setting this to 32 or as low as 8 to increase codebook usage
)

x = torch.randn(1, 1024, 256)
quantized, indices, commit_loss = vq(x)
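
The project-down, look-up, project-up pattern can be sketched as a small module. LowDimLookup is a hypothetical name, and the linear projections and straight-through gradient are assumptions about one reasonable way to wire it, not a copy of the library's internals.

```python
import torch
from torch import nn

class LowDimLookup(nn.Module):
    """Hypothetical module: nearest-neighbour lookup in a projected space."""

    def __init__(self, dim=256, codebook_size=256, codebook_dim=16):
        super().__init__()
        self.project_in = nn.Linear(dim, codebook_dim)
        self.project_out = nn.Linear(codebook_dim, dim)
        self.register_buffer("codebook", torch.randn(codebook_size, codebook_dim))

    def forward(self, x):
        z = self.project_in(x)                              # (batch, seq, codebook_dim)
        flat = z.reshape(-1, z.shape[-1])
        indices = torch.cdist(flat, self.codebook).argmin(dim=-1).view(z.shape[:-1])
        z_q = self.codebook[indices]
        z_q = z + (z_q - z).detach()                        # straight-through gradient
        return self.project_out(z_q), indices

torch.manual_seed(0)
lookup = LowDimLookup()
out, idx = lookup(torch.randn(1, 1024, 256))
```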

Cosine similarity

The Improved VQGAN paper also proposes to L2-normalize the codes and the encoded vectors, which boils down to using cosine similarity for the distance. They claim that constraining the vectors to a sphere leads to improvements in code usage and downstream reconstruction. You can turn this on by setting use_cosine_sim = True.

import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize(
    dim = 256,
    codebook_size = 256,
    use_cosine_sim = True   # set this to True
)

x = torch.randn(1, 1024, 256)
quantized, indices, commit_loss = vq(x)
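
The reason normalization "boils down to" cosine similarity is that for unit vectors the squared Euclidean distance equals 2 minus twice the dot product, so the nearest code is the one with the highest cosine similarity. A quick numerical check, using random stand-in tensors rather than a trained codebook:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = F.normalize(torch.randn(1024, 256), dim=-1)       # encoded vectors on the unit sphere
codes = F.normalize(torch.randn(256, 256), dim=-1)    # codebook on the unit sphere

cosine_sim = x @ codes.t()                            # (1024, 256)
sq_dist = torch.cdist(x, codes) ** 2                  # squared Euclidean distances

by_cosine = cosine_sim.argmax(dim=-1)                 # most similar code
by_distance = sq_dist.argmin(dim=-1)                  # closest code
agreement = (by_cosine == by_distance).float().mean()
```

Up to floating-point near-ties, the two selections pick the same code.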

Expiring stale codes

Finally, the SoundStream paper has a scheme where they replace codes that have hits below a certain threshold with a randomly selected vector from the current batch. You can set this threshold with the threshold_ema_dead_code keyword.

import torch
from vector_quantize_pytorch import VectorQuantize

vq = VectorQuantize(
    dim = 256,
    codebook_size = 512,
    threshold_ema_dead_code = 2  # should actively replace any codes that have an exponential moving average cluster size less than 2
)

x = torch.randn(1, 1024, 256)
quantized, indices, commit_loss = vq(x)
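
As a rough picture of the replacement step, the sketch below swaps every code whose moving-average cluster size is under the threshold for a random vector from the batch. expire_dead_codes is a hypothetical helper, and resetting the cluster size of a replaced code is an assumption, not something the text above specifies.

```python
import torch

def expire_dead_codes(codebook, cluster_size, batch, threshold=2.0):
    """Replace codes whose EMA cluster size is below threshold, in place."""
    dead = cluster_size < threshold                    # (codebook_size,) boolean mask
    num_dead = int(dead.sum())
    if num_dead == 0:
        return num_dead
    flat = batch.reshape(-1, batch.shape[-1])
    picks = torch.randint(0, flat.shape[0], (num_dead,))
    codebook[dead] = flat[picks]                       # random vectors from the batch
    # Assumption: reset the count so the new code has time to collect hits.
    cluster_size[dead] = threshold
    return num_dead

torch.manual_seed(0)
codebook = torch.randn(512, 256)
cluster_size = torch.full((512,), 5.0)
cluster_size[:10] = 0.5                                # pretend the first 10 codes are stale
old_codebook = codebook.clone()
replaced = expire_dead_codes(codebook, cluster_size, torch.randn(1, 1024, 256))
```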

Citations

@misc{oord2018neural,
    title   = {Neural Discrete Representation Learning},
    author  = {Aaron van den Oord and Oriol Vinyals and Koray Kavukcuoglu},
    year    = {2018},
    eprint  = {1711.00937},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

@misc{zeghidour2021soundstream,
    title   = {SoundStream: An End-to-End Neural Audio Codec},
    author  = {Neil Zeghidour and Alejandro Luebs and Ahmed Omran and Jan Skoglund and Marco Tagliasacchi},
    year    = {2021},
    eprint  = {2107.03312},
    archivePrefix = {arXiv},
    primaryClass = {cs.SD}
}

@inproceedings{anonymous2022vectorquantized,
    title   = {Vector-quantized Image Modeling with Improved {VQGAN}},
    author  = {Anonymous},
    booktitle = {Submitted to The Tenth International Conference on Learning Representations },
    year    = {2022},
    url     = {https://openreview.net/forum?id=pfNyExj7z2},
    note    = {under review}
}

Comments

Quantizers are not DDP/AMP compliant
Hi Lucidrains,

Thanks for the amazing work you do by implementing all those papers!

Is there a plan to make the Quantizer be compliant with:

DDP - They need an all gather before calculating anything so the updates are exactly the same across all ranks

AMP - In my experience, if AMP touches upon the quantizers it screws up the gradient magnitudes making it NaN/Overflow

If you want I can have a go at it.
opened by danieltudosiu 7
Commitment Loss Problems

Hello,

First of all, thank you so much for this powerful implementation.

I have been trying to train a VQ-VAE to generate faces from FFHQ 128x128, and I always hit the same problem: if I use the commitment loss weight (0.25) and the gamma (0.99) from the original paper, the commitment loss seems to grow without bound. I know you said that it is an auxiliary loss and not that important, but is this normal behavior? If not, how can I avoid it if I want to use this loss?

Thank you so much in advance!

opened by pedrocg42 6
fix dimensions: the codebook must look at data by taking each time frame individually. In SoundStream article: "This vector quantizer learns a codebook of N vectors to encode each D-dimensional frame of enc(x)."

opened by wesbz 5
kmeans and ddp hangs

kmeans and ddp hangs for me. ddp is initialized by pytorch lightning in my case. I have several questions:

In https://github.com/lucidrains/vector-quantize-pytorch/blob/master/vector_quantize_pytorch/vector_quantize_pytorch.py#L98

all_num_samples = all_gather_sizes(local_samples, dim = 0) should it be dim = 1 (as dim 0 is the codebook dimension)?

Then in https://github.com/lucidrains/vector-quantize-pytorch/blob/master/vector_quantize_pytorch/vector_quantize_pytorch.py#L93 it just hangs for me. I am not totally sure, but I believe distributed.broadcast in

https://github.com/lucidrains/vector-quantize-pytorch/blob/master/vector_quantize_pytorch/vector_quantize_pytorch.py#L90

is called with incompatible shapes. See https://pytorch.org/docs/stable/distributed.html#torch.distributed.broadcast

tensor must have the same number of elements in all processes participating in the collective.

opened by tasptz 4
Cannot Converge with L2 Loss

I am trying to quantize the latent vector. To be specific, I use an Encoder to get the latent representation z of the input. Then I quantize z and send the quantized z into the Decoder.

However, during my experiment, I found that the reconstruction loss does not decrease with L2 loss, namely the EuclideanCodebook. The model can converge with cosine similarity. Do you have any idea what causes this?

I think cosine similarity only considers the direction of the vector, instead of the scale of the vector. I still want to use EuclideanCodebook.

opened by kingnobro 3
Error when using gloo as DDP backend

Hello! Thank you for your great work on implementing the VQ layer. When I use the VQ layer in DDP mode with gloo as the backend, as suggested in the README, I get the following error:

terminate called after throwing an instance of 'gloo::EnforceNotMet'
  what(): [enforce fail at ../third_party/gloo/gloo/transport/tcp/pair.cc:510] op.preamble.length <= op.nbytes. 8773632 vs 8386560

Do you have any ideas on how to solve this problem?
I also tried to use nccl as the backend, however the program only hangs forever...

opened by Saltychtao 3
codebook initialization

Hi, Thank you for this great work. It's quite useful!

I have been having problems with index collapse and I'm not sure where it's coming from. But upon digging into the code, it seems that when we're not using k-means to initialize the codebook vectors, randn (normal distribution) is used to initialize them. The vqvae paper specifically uses uniform distribution for initialization, which allows the authors to ignore KL divergence when training.

This is from the vqvae paper: "Since we assume a uniform prior for z, the KL term that usually appears in the ELBO is constant w.r.t. the encoder parameters and can thus be ignored for training."

Is there any reason why you changed to Normal distribution here?

Thanks!

opened by ramyamounir 3
possible papers (and code) of interest

Have you had a look at bitsandbytes?

https://github.com/TimDettmers/bitsandbytes

https://arxiv.org/abs/2208.07339

https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/

Also this paper on tradeoffs for various 8 bit quantization formats,

https://arxiv.org/pdf/2206.02915v1.pdf

opened by Thomas-MMJ 2
RQ-VAE: How can I get a list of all learned codebook vectors (as indexed in the "indices")?

Hi Lucid, i am working on quantizing CLIP image embeddings with your RQ-VAE. It works pretty well.

Next I want to take all learned codebook vectors and add them to the vocab of a GPT (as frozen token embeddings).

The idea is to train a GPT with CLIP image embeddings in between texts (e.g. IMAGE-CAPTION or TEXT-IMAGE-TEXT-IMAGE- ... Flamingo-style).

If this works, then GPT could maybe also learn to generate quantized CLIP IM embeddings token by token --> and then e.g. show images through a.) retrieval or b.) a DALLE 2 decoder :)

... So my question is: Once the RQ-VAE is trained and i can get the quantized reconstructions and indices - How can I get a list or tensor of the actual codebook? (all possible vectors from the rq-vocab) :)

opened by christophschuhmann 2
Expire codes heuristic is replacing inputs

Thanks for the implementation!

One question, should this

https://github.com/lucidrains/vector-quantize-pytorch/blob/ebce893fff695845f7fe0f04d1400d2c29b94f98/vector_quantize_pytorch/vector_quantize_pytorch.py#L177

be actually self.expire_codes_(quantize)?

opened by kashif 2
orthogonal regularization loss useless?

because the codebooks are not registered as trainable parameters, and the orthogonal loss is only a function of the codebooks, is the orthogonal loss entirely useless?

opened by GallagherCommaJack 2
EMA update on CosineCodebook

The original VIT-VQGAN paper does not seem to use EMA update for codebook learning since their codebook is unit-normalized vectors.

Particularly, to my understanding, EMA update does not quite make sense when the encoder outputs and codebook vectors are unit-normalized ones.

What's your take on this? Should we NOT use EMA update with CosineCodebook?

opened by le4m 3
Loss and Backprop Details

Hi,

During training the VQ-VAE backprops on multiple losses. When feeding feature maps to the model, we are given a loss. Should I manually backpropagate and update the weights with it (the good ol' loss.backward() and optimizer.step()), or is it handled implicitly?

opened by Malik7115 3
Missing parameter of beta
Hi, in the original VQVAE paper, the commit_loss is defined as

(quantize.detach() - x) ** 2 + beta * (quantize - x.detach()) ** 2

where beta is usually set to 0.25. But the commit_loss is defined as the following in your implementation:

F.mse_loss(quantize.detach(), x)

So I wonder if the parameter beta is set to be 1 by default or if the second term is missing? Thank you very much.
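
For reference when reading the two expressions above: in the VQ-VAE paper the weight beta sits on the commitment term, the one in which the codes are detached and the gradient reaches the encoder output, while the other term moves the codes toward the encoder output and is the part an exponential-moving-average update can stand in for. The sketch below uses stand-in tensors and a trainable codebook purely to show where each term's gradient goes; it is not the library's code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 8, requires_grad=True)            # stand-in encoder output
codebook = torch.randn(16, 8, requires_grad=True)    # trainable here only for illustration

indices = torch.cdist(x, codebook).argmin(dim=-1)
quantize = codebook[indices]

beta = 0.25
codebook_loss = F.mse_loss(quantize, x.detach())     # gradient reaches the codebook only
commit_loss = F.mse_loss(quantize.detach(), x)       # gradient reaches the encoder output only
loss = codebook_loss + beta * commit_loss
loss.backward()
```

After the backward pass, x.grad comes entirely from the beta-weighted commitment term, and only the selected rows of codebook.grad are non-zero.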
opened by Corleone-Huang 1
No way of training the codebook
Hi! Could you please explain how the codebook vectors are updated if the codebook vectors are not required to be orthogonal?

embed tensors in both Euclidean and CosineSim codebooks are registered as buffers, so they can't be updated at all

There is no loss on the codebook vectors that moves them closer to the input

Am I missing something? It seems that right now there is no way of updating the codebook vectors without the orthogonal loss.
opened by RafailFridman 5
Plugging vector-quantize-pytorch into taming-transformers
Hi,

I noticed your architecture could be plugged into the pipeline from https://github.com/CompVis/taming-transformers. I have put together code that does this here (https://github.com/tanouch/taming-transformers). It makes it possible to properly compare the different features proposed in your repo (lower codebook dimension, cosine similarity, orthogonal regularization loss, etc.) with the original formulation.

The code from this repo can be seen in both files

taming-transformers/taming/models/vqgan.py

taming-transformers/taming/modules/vqvae/quantize.py

As you can see, it is easy to launch a large scale training with your proposed architecture.

I am not sure whether this issue belongs here or in the taming-transformers repo. However, I thought you might be interested. Thanks again for your work and these open-sourced repositories!
opened by tanouch 2

Releases(0.10.14)

0.10.14(Nov 26, 2022)
0.10.12(Nov 23, 2022)
0.10.11(Nov 17, 2022)
0.10.10(Nov 14, 2022)
0.10.9(Nov 10, 2022)
0.10.8(Nov 7, 2022)
0.10.7(Nov 7, 2022)
0.10.6(Nov 4, 2022)
0.10.5(Nov 3, 2022)
0.10.4(Nov 3, 2022)
0.10.3(Nov 2, 2022)
0.10.2(Oct 30, 2022)
0.10.1(Oct 26, 2022)
0.10.0(Oct 26, 2022)
0.9.2(Aug 8, 2022)
0.9.1(Jul 30, 2022)
0.9.0(Jul 30, 2022)
v0.8.1(Jul 11, 2022)
0.8.0(Jul 7, 2022)
v0.7.3(Jun 24, 2022)
0.7.1(Mar 16, 2022)
0.7.0(Mar 16, 2022)
0.6.0(Mar 13, 2022)
0.5.1(Mar 10, 2022)
0.5.0(Mar 10, 2022)
0.4.10(Dec 17, 2021)
0.4.8(Dec 10, 2021)
0.4.7(Dec 4, 2021)
0.4.6(Dec 4, 2021)
0.4.5(Dec 4, 2021)

Owner

Phil Wang

Working with Attention. It's all we need

