Unofficial PyTorch implementation of DeepMind's Perceiver IO with PyTorch Lightning scripts for distributed training

Last update: Dec 25, 2022

Overview

Perceiver IO

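Unofficial PyTorch implementation of Perceiver: General Perception with Iterative Attention and Perceiver IO: A General Architecture for Structured Inputs & Outputs (see Citations).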

This implementation supports training Perceiver IO models with PyTorch Lightning on some example tasks via a command line interface. Perceiver IO models are constructed from generic encoder and decoder classes and task-specific input and output adapters (see Model API).

Setup

conda env create -f environment.yml
conda activate perceiver-io
export PYTHONPATH=.

Tasks

In the following subsections, Perceiver IO models are trained on some example tasks at a smaller scale. In particular, they were trained on two NVIDIA GTX 1080 GPUs (8 GB memory each) using PyTorch Lightning's support for distributed data-parallel training. I didn't really tune model architectures and other hyper-parameters, so you'll probably get better results with a bit of experimentation. Support for more datasets and tasks will be added later.

Masked language modeling

Pretrain a Perceiver IO model on masked language modeling (MLM) with text from the IMDB training set. The pretrained encoder is then used for training a sentiment classification model.

python train/train_mlm.py --dataset=imdb --learning_rate=1e-3 --batch_size=64 \
  --max_epochs=200 --dropout=0.0 --weight_decay=0.0 \
  --accelerator=ddp --gpus=-1

All available command line options and their default values can be displayed with python train/train_mlm.py -h.
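To make the MLM objective concrete, here is a minimal sketch of token masking. It is not the masking code used by train/train_mlm.py: the function name mask_tokens, the 15% masking probability, and whitespace tokenization are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for masked language modeling.

    labels[i] holds the original token at masked positions and None elsewhere,
    so a loss would only be computed where a token was hidden.
    mask_prob=0.15 is an assumed value, not taken from this repository.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)    # hide the token from the model
            labels.append(token)   # remember what has to be predicted
        else:
            masked.append(token)
            labels.append(None)    # position is ignored by the loss
    return masked, labels

tokens = "i have watched this movie and it was awesome".split()
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
```

Drawing a fresh mask every time a sample is loaded gives dynamic masking, whereas computing it once during preprocessing gives static masking.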

Sentiment classification

Train a classification decoder using a frozen encoder from masked language modeling. If you ran MLM yourself you'll need to modify the --mlm_checkpoint argument accordingly, otherwise download checkpoints from here and extract them in the root directory of this project.

python train/train_seq_clf.py --dataset=imdb --learning_rate=1e-3 --batch_size=128 \
  --max_epochs=15 --dropout=0.0 --weight_decay=1e-3 --freeze_encoder \
  --accelerator=ddp --gpus=-1 \
  --mlm_checkpoint 'logs/mlm/version_0/checkpoints/epoch=199-val_loss=4.899.ckpt'

Unfreeze the encoder and jointly fine-tune it together with the decoder that has been trained in the previous step. If you ran the previous step yourself you'll need to modify the --clf_checkpoint argument accordingly, otherwise download checkpoints from here.

python train/train_seq_clf.py --dataset=imdb --learning_rate=1e-4 --batch_size=128 \
  --max_epochs=15 --dropout=0.2 --weight_decay=1e-3 \
  --accelerator=ddp --gpus=-1 \
  --clf_checkpoint 'logs/seq_clf/version_0/checkpoints/epoch=014-val_loss=0.350.ckpt'

All available command line options and their default values can be displayed with python train/train_seq_clf.py -h.

Image classification

Classify MNIST images. See also Model API for details about the underlying Perceiver IO model.

python train/train_img_clf.py --dataset=mnist --learning_rate=1e-3 --batch_size=128 \
  --max_epochs=20 --dropout=0.0 --weight_decay=1e-4 \
  --accelerator=ddp --gpus=-1

All available command line options and their default values can be displayed with python train/train_img_clf.py -h.

Model API

The model API is based on generic encoder and decoder classes (PerceiverEncoder and PerceiverDecoder) and task-specific input and output adapters. The following snippet shows how they can be used to create an MNIST image classifier, for example:

from perceiver.adapter import ImageInputAdapter, ClassificationOutputAdapter
from perceiver.model import PerceiverIO, PerceiverEncoder, PerceiverDecoder

latent_shape = (32, 128)

# Fourier-encode pixel positions and flatten along spatial dimensions
input_adapter = ImageInputAdapter(image_shape=(28, 28, 1), num_frequency_bands=32)

# Project generic Perceiver decoder output to specified number of classes
output_adapter = ClassificationOutputAdapter(num_classes=10, num_output_channels=128)

# Generic Perceiver encoder
encoder = PerceiverEncoder(
    input_adapter=input_adapter,
    latent_shape=latent_shape,
    num_layers=3,
    num_cross_attention_heads=4,
    num_self_attention_heads=4,
    num_self_attention_layers_per_block=3,
    dropout=0.0)

# Generic Perceiver decoder
decoder = PerceiverDecoder(
    output_adapter=output_adapter,
    latent_shape=latent_shape,
    num_cross_attention_heads=1,
    dropout=0.0)

# MNIST classifier implemented as Perceiver IO model
mnist_classifier = PerceiverIO(encoder, decoder)
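The "Fourier-encode pixel positions" step can be illustrated with a small, dependency-free sketch. It is not the code inside ImageInputAdapter: the helper names, the linear frequency spacing from 1 to half the resolution, and the choice to keep the raw coordinate are assumptions based on the description in the Perceiver paper.

```python
import math

def fourier_features(pos, num_bands, max_freq):
    """Encode one coordinate in [-1, 1] as [pos, sin terms..., cos terms...].

    Frequencies are spaced linearly from 1 to max_freq / 2 (an assumption).
    """
    if num_bands == 1:
        freqs = [1.0]
    else:
        step = (max_freq / 2 - 1.0) / (num_bands - 1)
        freqs = [1.0 + i * step for i in range(num_bands)]
    sines = [math.sin(math.pi * f * pos) for f in freqs]
    cosines = [math.cos(math.pi * f * pos) for f in freqs]
    return [pos] + sines + cosines

def encode_pixel(row, col, height, width, num_bands):
    """Position encoding of one pixel; requires height > 1 and width > 1."""
    y = -1.0 + 2.0 * row / (height - 1)  # map row index to [-1, 1]
    x = -1.0 + 2.0 * col / (width - 1)   # map column index to [-1, 1]
    return fourier_features(y, num_bands, height) + fourier_features(x, num_bands, width)

encoding = encode_pixel(row=0, col=0, height=28, width=28, num_bands=32)
```

Under these assumptions each pixel gets 2 * (2 * 32 + 1) = 130 position channels, which would be concatenated with the single MNIST intensity channel before the encoder's cross-attention.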

Tensorboard

Commands in section Tasks write Tensorboard logs to the logs directory. They can be visualized with tensorboard --logdir logs. MLM training additionally writes predictions of masked sample text to Tensorboard's TEXT page. For example, the command

python train/train_mlm.py --dataset=imdb --learning_rate=1e-3 --batch_size=64 \
  --max_epochs=200 --dropout=0.0 --weight_decay=0.0 \
  --accelerator=ddp --gpus=-1 --predict_k=5 \
  --predict_samples='i have watched this [MASK] and it was awesome'

writes the top 5 predictions for i have watched this [MASK] and it was awesome to Tensorboard after each epoch:

i have watched this [MASK] and it was awesome
i have watched this movie and it was awesome
i have watched this show and it was awesome
i have watched this film and it was awesome
i have watched this series and it was awesome
i have watched this dvd and it was awesome
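The listing above can be reproduced conceptually with a short sketch that ranks candidate tokens by score and substitutes the best k into the masked text. The function name top_k_fills and the scores are hypothetical; a real model would produce one score per vocabulary entry at the masked position.

```python
def top_k_fills(text, scores, k, mask="[MASK]"):
    """Fill the first mask in `text` with each of the k highest-scoring tokens."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return [text.replace(mask, token, 1) for token in ranked]

# Hypothetical scores for the masked position (made up for illustration).
scores = {"movie": 9.1, "show": 7.4, "film": 7.0, "series": 6.2, "dvd": 5.5, "book": 3.0}

predictions = top_k_fills("i have watched this [MASK] and it was awesome", scores, k=5)
```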

Citations

@misc{jaegle2021perceiver,
    title   = {Perceiver: General Perception with Iterative Attention},
    author  = {Andrew Jaegle and Felix Gimeno and Andrew Brock and Andrew Zisserman and Oriol Vinyals and Joao Carreira},
    year    = {2021},
    eprint  = {2103.03206},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

@misc{jaegle2021perceiverio,
    title   = {Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
    author  = {Andrew Jaegle and Sebastian Borgeaud and Jean-Baptiste Alayrac and Carl Doersch and Catalin Ionescu and David Ding and Skanda Koppula and Andrew Brock and Evan Shelhamer and Olivier Hénaff and Matthew M. Botvinick and Andrew Zisserman and Oriol Vinyals and João Carreira},
    year    = {2021},
    eprint  = {2107.14795},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

Comments

use Conda version of TorchMetrics
preferably use Conda version, similar could be done also with PL except Conda does not support extras

https://anaconda.org/conda-forge/torchmetrics

https://anaconda.org/conda-forge/pytorch-lightning
opened by Borda 8
Genomic sequences

Hello,

Thank you for your implementation of the PerceiverIO project. I am trying to use your work for genomic sequences of shape (10k, 1). I noticed that your model produces the SAME output for DIFFERENT inputs when the num_channels dimension is 1 (I am not using the Fourier Feature encodings). If the outputs are not the same, then they are nominally different. Can you please guide me in solving this issue? Thanks in advance!

Please let me know what additional information you would need to reproduce this bug.

opened by ajv012 7
Fix poetry python version limits

The grpc dependency doesn't build with Python 3.10 yet because it relies on an outdated setuptools that the dependency tree isn't managing automatically, so don't allow Python 3.10+ environments until grpc gets its act together.

Commit also updates minor versions of some deps because re-synchronizing poetry.lock with pyproject requires a full "poetry update" which pulls updated minor dependency versions everywhere.

opened by mattsta 5
make repo as installable package
I found it will be useful to have this project as an installable package so I suggest the following changes:

rename actual perceiver as model (as it holds all model-related components)

create new package perceiverio which pulls model, data and cli as sub-packages

simplify the cli package to a single module, as it is quite light

add setup.py to be installed
opened by Borda 5
Data preprocessing and documentation enhancements, major refactorings
Functional enhancements:

Support for static word masking in addition to dynamic word masking.

Support for individual token masking in addition to whole word masking.

Task-specific data preprocessing for all supported text datasets.

Constant learning rate scheduler with warmup now used by default.
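As a rough illustration of a constant learning rate schedule with warmup, the sketch below returns a multiplier for the base learning rate. The function name, the linear shape of the warmup, and the step counts are assumptions, not the scheduler shipped with this project.

```python
def constant_with_warmup(step, warmup_steps):
    """Learning-rate multiplier: linear ramp from 0 to 1, then constant at 1."""
    if warmup_steps <= 0:
        return 1.0  # no warmup requested
    return min(1.0, step / warmup_steps)

# Multiplier at a few training steps, assuming 100 warmup steps.
multipliers = [constant_with_warmup(s, 100) for s in (0, 50, 100, 1000)]
```

A function of this form could be passed as lr_lambda to torch.optim.lr_scheduler.LambdaLR.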

Documentation enhancements:

All training examples now provided as command line and Python script.

Better overview of official models and example training checkpoints.

Example training checkpoints can now be downloaded individually.

Minor enhancements to all other documentation sections.

Refactorings and breaking changes:

Rename image package to vision.

TextDataModule base class now implements complete preprocessing logic.

TextDataModule subclasses only convert source dataset to a common structure.

Abstraction over cross-attention query creation (QueryProvider).

Decouple OutputAdapter interface from trainable cross-attention query.

Implement learned position encodings as nn.Embedding.

Move adapters to separate perceiver.model.core.adapter module.

Rename PerceiverConfig to PerceiverIOConfig

Rename LitModel base class to LitPerceiverIO.

LitClassifier.forward now behaves like the wrapped model's forward.

Object-oriented design of conversion from Hugging Face Perceiver models.

Major refactoring of PerceiverAR and CausalLanguageModel.

Move FourierPositionEncoding to perceiver.model.core.position module.
opened by krasserm 2
Multi-head attention as specified in paper, API changes and refactorings
Multi-head attention as specified in https://arxiv.org/abs/2107.14795 Appendix E

Renaming of constructor parameters in PyTorch Model API

Redesign of config classes in PyTorch Lightning API and CLI

Output query now managed by output adapter instead of decoder
opened by krasserm 2

text encoding error

Hi, I am getting this error

Traceback (most recent call last):
  File "train/train_mlm.py", line 113, in <module>
    main(parser.parse_args())
  File "train/train_mlm.py", line 69, in main
    data_module.setup()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/core/datamodule.py", line 428, in wrapped_fn
    fn(*args, **kwargs)
  File "/opt/perceiver-io/data/imdb.py", line 131, in setup
    self.ds_train = IMDBDataset(root=self.root, split='train')
  File "/opt/perceiver-io/data/imdb.py", line 42, in __init__
    self.raw_x, self.raw_y = load_split(root, split)
  File "/opt/perceiver-io/data/imdb.py", line 34, in load_split
    raw_x.append(f.read())
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 449: ordinal not in range(128)

it is probably related to the unicode encoding
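The traceback suggests the review files are read with the platform default codec (ASCII in that environment), while byte 0xc2 is a typical UTF-8 lead byte. A minimal sketch of the usual remedy is to pass an explicit encoding to open. The helper read_text and the temporary file are made up for illustration, and it is an assumption that load_split opens files without an encoding argument.

```python
import os
import tempfile

def read_text(path):
    """Read a text file as UTF-8 regardless of the locale's default encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()

sample = "caf\u00e9\u00a0ok"  # contains a non-breaking space, bytes 0xc2 0xa0 in UTF-8

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "review.txt")
    with open(path, "wb") as f:
        f.write(sample.encode("utf-8"))
    text = read_text(path)
```

Setting the environment variable PYTHONIOENCODING does not affect open; the encoding argument (or a UTF-8 locale) is what matters here.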

opened by batrlatom 2

What is Q in the latent encoder layers?

It seems that in the multi-layer encoder, you use x_latent as Q, x as KV, shouldn't the QKV all be x_latent in latent layers? Please correct me if I missed something in the paper, thank you!
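For readers with the same question, the two kinds of attention can be contrasted with a dependency-free sketch. It leaves out the learned projections, multiple heads, residual connections and MLPs of the real model, and the function names are made up. The point it illustrates, under the assumption that each encoder layer is a cross-attention followed by a self-attention block, is that only the cross-attention takes Q from x_latent and K/V from x, while the self-attention that follows uses the latents for Q, K and V.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Single-head scaled dot-product attention on plain lists of vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

x = [[float(i), 1.0] for i in range(6)]  # 6 input elements, 2 channels
x_latent = [[0.5, -0.5], [1.0, 0.0]]     # 2 latent vectors, 2 channels

cross = attend(x_latent, x, x)        # cross-attention: Q = latents, K = V = input
latent = attend(cross, cross, cross)  # self-attention: Q = K = V = latents
```

The output length always follows the query, so both results contain two vectors even though the input has six elements.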

opened by zhangyuygss 1
Support key padding masks for Perceiver AR
tokenizers must be configured to padding_side="left" in order to be compatible with Perceiver AR

support configuration of padding_side on base class of text data modules (TextDataModule).

implement random sequence truncation in data module instead of model

sequences in a batch are individually truncated to different lengths.

enable random_train_shift by default which increases regularization.
opened by krasserm 0
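What left padding and the matching key padding mask look like can be sketched in a few lines. This is not the project's data module code: left_pad and pad_id=0 are hypothetical. The motivation, presumably, is that Perceiver AR assigns latents to the final positions of a sequence, so padding has to sit at the start.

```python
def left_pad(sequences, pad_id=0):
    """Left-pad token id lists to equal length.

    Returns (padded, pad_mask) where pad_mask is True at padding positions.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, pad_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append([pad_id] * n_pad + list(seq))
        pad_mask.append([True] * n_pad + [False] * len(seq))
    return padded, pad_mask

padded, pad_mask = left_pad([[11, 12, 13], [21]])
```

With a Hugging Face tokenizer the same layout is requested by setting padding_side="left", as the first item above notes.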
Implement processor for optical flow
Implement OpticalFlowProcessor to preprocess input images and create optical flows from model predictions

Add video_utils to sample frames from videos and create output videos from estimated optical flows

Extend inference notebook with examples for optical flow
opened by cstub 0
Major refactorings

Better modularization

Documentation rewrite

Add support for Huggingface tokenizers

Add support for Huggingface datasets

Add support for Docker

Fix missing bias terms in MHA

Weight init according to paper

opened by krasserm 0

Releases(0.7.0)

0.7.0(Dec 4, 2022)

This release adds a Perceiver IO model for predicting the optical flow between two images. It also adds utilities for producing an optical flow video from an input video (see the inference notebook for a demo). Thanks to @cstub for this great contribution. See milestone 0.7.0 for a list of closed tickets.
0.7b1(Nov 20, 2022)
Data preprocessing and documentation enhancements, major refactorings

Functional enhancements:

Support for static word masking in addition to dynamic word masking.

Support for individual token masking in addition to whole word masking.

Task-specific data preprocessing for all supported text datasets.

Constant learning rate scheduler with warmup now used by default.

Documentation enhancements:

All training examples now provided as command line and Python script.

Better overview of official models and example training checkpoints.

Example training checkpoints can now be downloaded individually.

Minor enhancements to all other documentation sections.

Refactorings and breaking changes:

Rename image package to vision.

TextDataModule base class now implements complete preprocessing logic.

TextDataModule subclasses only convert source dataset to a common structure.

Abstraction over cross-attention query creation (QueryProvider).

Decouple OutputAdapter interface from trainable cross-attention query.

Implement learned position encodings as nn.Embedding.

Move adapters to separate perceiver.model.core.adapter module.

Rename PerceiverConfig to PerceiverIOConfig

Rename LitModel base class to LitPerceiverIO.

LitClassifier.forward now behaves like the wrapped model's forward.

Object-oriented design of conversion from Hugging Face Perceiver models.

Major refactoring of PerceiverAR and CausalLanguageModel.

Move FourierPositionEncoding to perceiver.model.core.position module.

0.6.0(Sep 25, 2022)

Implementation of Perceiver AR including training and inference examples (#20).
0.5.1(Aug 31, 2022)
Upgrade to PyTorch Lightning 1.7.3 and PyTorch 1.12.1.

See milestone 0.5.1 for a complete list of closed tickets.

0.5.0(Aug 22, 2022)
Highlights of the 0.5.0 release:

Import pretrained models from Huggingface Hub

New training examples

New inference examples

UTF-8 bytes tokenization
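UTF-8 bytes tokenization can be pictured as mapping every byte of the encoded text to one token id. The sketch below is an illustration under assumptions, not this project's tokenizer: encode, decode and the offset parameter (a hypothetical number of ids reserved for special tokens) are made up.

```python
def encode(text, offset=0):
    """Map text to token ids: one id per UTF-8 byte, shifted by `offset`."""
    return [b + offset for b in text.encode("utf-8")]

def decode(ids, offset=0):
    """Invert encode: shift ids back to bytes and decode them as UTF-8."""
    return bytes(i - offset for i in ids).decode("utf-8")

ids = encode("h\u00e9llo")  # the accented letter takes two bytes, hence two ids
roundtrip = decode(encode("h\u00e9llo", offset=6), offset=6)
```

The vocabulary then has 256 byte values plus any reserved ids, so no vocabulary file is needed, at the cost of longer sequences than with word piece tokenization.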


Owner

Martin Krasser

Freelance machine learning engineer, software developer and consultant. Mountainbike freerider, bass guitar player.
