Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch

Last update: Dec 29, 2022

Overview

Perceiver - Pytorch

Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch

Install

$ pip install perceiver-pytorch

Usage

import torch
from perceiver_pytorch import Perceiver

model = Perceiver(
    num_fourier_features = 6,    # number of fourier features, with original value (2 * K + 1)
    depth = 48,                  # depth of net, in paper, they went deep, making up for lack of attention
    num_latents = 6,             # number of latents, or induced set points, or centroids. different papers giving it different names
    cross_dim = 512,             # cross attention dimension
    latent_dim = 512,            # latent dimension
    cross_heads = 1,             # number of heads for cross attention. paper said 1
    latent_heads = 8,            # number of heads for latent self attention, 8
    cross_dim_head = 64,
    latent_dim_head = 64,
    num_classes = 1000,          # output number of classes
    attn_dropout = 0.,
    ff_dropout = 0.,
    weight_tie_layers = False    # whether to weight tie layers (optional, as indicated in the diagram)
)

img = torch.randn(1, 224 * 224) # 1 imagenet image, pixelized

model(img) # (1, 1000)

Citations

@misc{jaegle2021perceiver,
    title   = {Perceiver: General Perception with Iterative Attention},
    author  = {Andrew Jaegle and Felix Gimeno and Andrew Brock and Andrew Zisserman and Oriol Vinyals and Joao Carreira},
    year    = {2021},
    eprint  = {2103.03206},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Comments

Latent averaging to the logits?

I read through the paper last night and came away confused about a few things. I looked through your code hoping for some clarity.

One issue that doesn't seem to be explained in the paper (or I am missing it) is how the authors go from a set of latents to the logits used at the classification head. You implemented this by taking the mean of the latent set:

https://github.com/lucidrains/perceiver-pytorch/blob/main/perceiver_pytorch/perceiver_pytorch.py#L203

Is this actually how the authors convert to logits?
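A minimal sketch of the pooling being asked about, for reference. This is an illustration rather than the repository's code or the paper's confirmed method: the class name `MeanPoolHead` and the tensor sizes are hypothetical, and the layer norm before the projection is an assumption.

```python
import torch
from torch import nn

class MeanPoolHead(nn.Module):
    """Hypothetical head: average the latent set, then project to class logits."""

    def __init__(self, latent_dim: int, num_classes: int):
        super().__init__()
        self.norm = nn.LayerNorm(latent_dim)   # assumption: normalize before projecting
        self.to_logits = nn.Linear(latent_dim, num_classes)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (batch, num_latents, latent_dim)
        pooled = latents.mean(dim=1)           # (batch, latent_dim)
        return self.to_logits(self.norm(pooled))

head = MeanPoolHead(latent_dim=512, num_classes=1000)
latents = torch.randn(2, 6, 512)
logits = head(latents)                         # shape (2, 1000)
```

Because the mean ignores ordering, this kind of head gives the same logits for any permutation of the latent set.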

opened by neonbjb 7
PerceiverAR?

Hey @lucidrains - love this repo, and still trying to wrap my head around the various differences between the Perceiver architectures. How hard would it be to extend PerceiverIO to PerceiverAR? What fundamentally needs to change?

opened by siddk 5
Not using the classification head in Perceiver

Hi @lucidrains, thank you for your great work!

I'd like to use the Perceiver (not PerceiverIO) without the classification head (average and projection). Do you think we could add an option to avoid using it? I can do a PR if you want.

Thanks!

opened by gegallego 4
Decoder Attention Module needs a FF network as well in perceiver_io.py script

Hi,

According to the architectural details in the Perceiver IO paper (https://arxiv.org/abs/2107.14795), the decoder attention block consists of a cross-attention block, equation (4), which is already implemented in the perceiver_io.py script (line 151), followed by a feedforward network, given by equation (6), which is not present in that script. I am not aware of the repercussions of not having the FF in the decoder module, but it might be a good idea to have it in the implementation. Something like self.decoder_ff = PreNorm(FeedForward(queries_dim)) would do the job. Experimentally, the authors found that omitting equation (5) is helpful.
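A sketch of the structure described above, written with stock PyTorch modules rather than the repository's own `PreNorm` and `FeedForward` helpers. The class name `DecoderBlock`, the expansion factor `mult`, and all sizes are hypothetical. It also assumes that equation (5) denotes the residual connection around the cross-attention, which is why no residual is added at that step.

```python
import torch
from torch import nn

class DecoderBlock(nn.Module):
    """Hypothetical decoder: cross-attention from queries to latents, then a feedforward."""

    def __init__(self, queries_dim: int, latent_dim: int, heads: int = 1, mult: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(queries_dim)
        self.norm_kv = nn.LayerNorm(latent_dim)
        self.cross_attn = nn.MultiheadAttention(
            queries_dim, heads, kdim=latent_dim, vdim=latent_dim, batch_first=True
        )
        self.ff = nn.Sequential(
            nn.LayerNorm(queries_dim),
            nn.Linear(queries_dim, queries_dim * mult),
            nn.GELU(),
            nn.Linear(queries_dim * mult, queries_dim),
        )

    def forward(self, queries: torch.Tensor, latents: torch.Tensor) -> torch.Tensor:
        kv = self.norm_kv(latents)
        # Cross-attention step; assumption: the attention residual is left out here.
        out, _ = self.cross_attn(self.norm_q(queries), kv, kv)
        # Feedforward step with its own residual connection.
        return out + self.ff(out)

block = DecoderBlock(queries_dim=32, latent_dim=512)
queries = torch.randn(2, 128, 32)
latents = torch.randn(2, 6, 512)
decoded = block(queries, latents)              # shape (2, 128, 32)
```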

opened by Hritikbansal 4
Positional encodings are already part of the input

Hello! First of all, thank you for this implementation.

My inputs already have the proper positional encoding as part of the channel axis. Would it be possible to add a feature to deactivate the default implementation of the positional encoding?

Thank you!

opened by Atlis 4
x = self.latents + self.pos_emb
self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
self.pos_emb = nn.Parameter(torch.randn(num_latents, latent_dim))
...
x = self.latents + self.pos_emb

I'm not very familiar with PyTorch, but does this make sense? I mean, what is intended when two trainable weight matrices are simply summed, and that's the only place where both latents and pos_emb appear? It looks like they could be replaced with a single matrix.
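The observation can be checked with a small sketch. The sizes and the loss below are hypothetical and chosen only for illustration: when two parameters appear only as a sum, a single parameter holding that sum produces the same tensor, and both original parameters receive identical gradients.

```python
import torch
from torch import nn

torch.manual_seed(0)
num_latents, latent_dim = 6, 512

latents = nn.Parameter(torch.randn(num_latents, latent_dim))
pos_emb = nn.Parameter(torch.randn(num_latents, latent_dim))
x_two = latents + pos_emb

# A single parameter initialized to the sum yields the same tensor.
merged = nn.Parameter((latents + pos_emb).detach().clone())
same_output = torch.equal(x_two.detach(), merged.detach())

# Any loss on the sum sends the same gradient to both summands.
loss = x_two.pow(2).sum()
loss.backward()
same_grads = torch.equal(latents.grad, pos_emb.grad)
```

The two formulations are not identical in every respect: the sum of two standard-normal initializations has twice the variance of one, and the summed pair moves twice as far per step under plain gradient descent.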
opened by galchinsky 4
Fourier encoding is not similar to the paper

First of all, thanks for sharing the code !

I have a follow-up question to #4.

In the paper, the authors mention [sin(f_kπx_d), cos(f_kπx_d)], where f_k is a bank of frequencies spaced log-linearly between 1 and µ/2. Can you point out how you arrived at the 1/2**i scaling in the code?

https://github.com/lucidrains/perceiver-pytorch/blob/6ae733773d29cb29383f3ac7b45af8cb6bd2c0dc/perceiver_pytorch/perceiver_pytorch.py#L28-L35

Thanks!

opened by cheneeheng 4
Fourier encoding should be for position coordinates instead of byte array
The fourier_encode function as implemented takes as input a byte array x and directly encodes it with sin/cos before concatenating it with the input.

As I understand the NeRF position encodings, they encode the x/y/etc. position coordinates, and not a transformation of the data itself. From the Perceiver paper:

We parametrize the frequency encoding to take the values [sin(f_kπx_d), cos(f_kπx_d)], where the frequencies f_k is the kth band of a bank of frequencies spaced log-linearly between 1 and µ/2... For example, by allowing the network to resolve the maximum frequency present in an input array, we can encourage it to learn to compare the values of bytes at any positions in the input array. x_d is the value of the input position along the dth dimension (e.g. for images d = 2 and for video d = 3). x_d takes values in [−1, 1] for each dimension. We concatenate the raw positional value x_d to produce the final representation of position. This results in a positional encoding of size d(2K + 1).
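A sketch of the encoding as the quoted passage describes it, applied to grid positions rather than to the data values. The function name `fourier_position_encoding` and its arguments are hypothetical. The geometric spacing of the frequency bank between 1 and max_freq / 2 is one reading of "log-linearly" and is an assumption.

```python
import math
import torch

def fourier_position_encoding(shape, num_bands, max_freq):
    """Encode grid positions (not data values) as [sin, cos, raw] features."""
    # One coordinate axis per input dimension, each spanning [-1, 1].
    axes = [torch.linspace(-1.0, 1.0, steps=n) for n in shape]
    pos = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1)  # (*shape, d)

    # Assumption: K frequencies spaced geometrically between 1 and max_freq / 2.
    freqs = torch.logspace(0.0, math.log10(max_freq / 2), steps=num_bands)

    angles = pos.unsqueeze(-1) * freqs * math.pi                     # (*shape, d, K)
    raw = pos.unsqueeze(-1)                                          # (*shape, d, 1)
    enc = torch.cat([angles.sin(), angles.cos(), raw], dim=-1)       # (*shape, d, 2K + 1)
    return enc.flatten(start_dim=-2)                                 # (*shape, d * (2K + 1))

enc = fourier_position_encoding(shape=(8, 8), num_bands=6, max_freq=10.0)
```

For a 2D grid with K = 6 bands, the last axis has 2 * (2 * 6 + 1) = 26 features, matching the d(2K + 1) size in the quote. These features would then be concatenated to the data along the channel axis.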

NeRF position encoding examples:

https://github.com/bmild/nerf/blob/20a91e764a28816ee2234fcadb73bd59a613a44c/run_nerf_helpers.py#L22

https://github.com/ankurhanda/nerf2D
opened by eridgd 4
Positional encoding frequency bands should be linearly spaced

A small bug, but as alluded to in this comment by @marcdumon, it seems as though the frequency bands are indeed spaced linearly in the official JAX implementation.

opened by djl11 2
Bug in fourier_encode (?)
Thank you for this great implementation. I'm learning a lot from it!

I think I found a problem in the fourier_encode method. In this line: https://github.com/lucidrains/perceiver-pytorch/blob/b33aced4e1b266aeb1383e03ab63f0a9951f9126/perceiver_pytorch/perceiver_pytorch.py#L36

the scales are always the same regardless of the value of the base parameter. Example:

max_freq = 10, num_bands = 6, base = 2 => scales = [1.0000, 1.3797, 1.9037, 2.6265, 3.6239, 5.0000]
max_freq = 10, num_bands = 6, base = 10 => scales = [1.0000, 1.3797, 1.9037, 2.6265, 3.6239, 5.0000]
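One way to see why the base cancels out, as a sketch in plain Python. The helper `log_spaced_scales` is hypothetical and is a reconstruction, not a copy of the linked line: it assumes the scales are computed as base raised to exponents evenly spaced from 0 to log_base(max_freq / 2). Under that assumption, base ** (t * log_base(m)) equals m ** t for any base.

```python
import math

def log_spaced_scales(max_freq, num_bands, base):
    # Hypothetical reconstruction: exponents run from 0 to log_base(max_freq / 2).
    end = math.log(max_freq / 2) / math.log(base)
    exponents = [end * i / (num_bands - 1) for i in range(num_bands)]
    # base ** (t * log_base(m)) == m ** t, so the base has no effect on the result.
    return [base ** e for e in exponents]

scales_base_2 = log_spaced_scales(max_freq=10, num_bands=6, base=2)
scales_base_10 = log_spaced_scales(max_freq=10, num_bands=6, base=10)
```

Both calls give the powers 5 ** (i / 5) for i = 0..5, which agrees with the values reported above.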
opened by marcdumon 2
Attention softmax is applied to incorrect dimension?
I am studying multi-head attention. When I was reading through [1], I found that the attention softmax is applied over the last dimension of the similarity tensor sim:

q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h = h), (q, k, v))
sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
if exists(mask):
    <removed>
# attention, what we cannot get enough of
attn = sim.softmax(dim = -1)

If I understand correctly sim has the shape (b*h) n1 n2. The softmax is computed over the last dimension n2. Shouldn't the softmax be applied to matrices with all the similarity values of a single head (i.e. with shape n1, n2)?
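A short check of what a softmax over the last dimension does to a tensor of this shape, using random tensors with hypothetical sizes. With the heads folded into the leading axis, every (head, query) row is normalized over the n2 keys independently, so each head's n1 x n2 matrix gets the same result as if it were processed alone.

```python
import torch

torch.manual_seed(0)
b, h, n1, n2, d = 2, 8, 4, 5, 16

q = torch.randn(b * h, n1, d)
k = torch.randn(b * h, n2, d)

sim = torch.einsum('bid,bjd->bij', q, k) * d ** -0.5   # (b*h, n1, n2)
attn = sim.softmax(dim=-1)

row_sums = attn.sum(dim=-1)             # (b*h, n1), every entry close to 1
single_head = sim[3].softmax(dim=-1)    # one head's (n1, n2) matrix on its own
```

Here `attn[3]` matches `single_head`: the softmax never mixes values across heads or across queries, only across the keys that a given query attends to.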

[1] https://github.com/lucidrains/perceiver-pytorch/blob/main/perceiver_pytorch/perceiver_io.py#L97
opened by breuderink 2
Issue defining base in fourier_encode for experimental.py, gated.py, mixed_latents.py

Hey Lucid, love the work, it appears you deprecated base in fourier_encode at https://github.com/lucidrains/perceiver-pytorch/commit/144b0d9716a7212b5fd6d95a2267c4d4a08b56a7

But experimental.py, gated.py, and mixed_latents.py are still trying to define base within the forward pass:
https://github.com/lucidrains/perceiver-pytorch/blob/abbb5d5949d3509c57749bd134f5068f2761aac7/perceiver_pytorch/experimental.py#L122
https://github.com/lucidrains/perceiver-pytorch/blob/2d59df42ebb0b7538af77d584f5ae5b50759618b/perceiver_pytorch/mixed_latents.py#L85
https://github.com/lucidrains/perceiver-pytorch/blob/2d59df42ebb0b7538af77d584f5ae5b50759618b/perceiver_pytorch/gated.py#L103

Thanks again, keep up the great work.

opened by TannerLaBorde 0
Audio + Text data?

Can someone please guide me on how to process both audio and .txt data through the Perceiver simultaneously for multimodal learning?

Example code would be nice.

Thanks

opened by Sidz1812 1
just a suggestion

Hi, I'd like to start by thanking you for such great work and so many great implementations. I have a small suggestion: for all your code/modules, try to add an if __name__ == "__main__": block, so that someone who just wants to use one file/module can easily try it without having to go through the whole implementation. For example, I am trying to use this one; with an if __name__ == "__main__": block, I could easily run a random input and see how it works. This would increase usability a great deal.
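A minimal sketch of the pattern being suggested. The function `double` is a hypothetical stand-in for a module's public code, not something from the repository.

```python
def double(values):
    """Stand-in for a module's public function (hypothetical)."""
    return [2 * v for v in values]

def main():
    # Smoke test with a small input; runs only when the file is executed directly.
    print(double([1, 2, 3]))

if __name__ == "__main__":
    main()
```

Importing the file from elsewhere leaves `main` uncalled, so the guard adds a quick self-check without side effects for library users.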

Keep up the great work :)

opened by seyeeet 4
What should I change if I want to use data with input size 720*184?

Thanks for sharing this code. I was wondering what I should change to be able to use data that can be converted into images with an input size of 720*184. Thanks in advance.

opened by Oussamab21 0
Question regarding queries dimensionality in Perceiver IO

Hi @lucidrains,

I think I may be missing something - why do we define the Perceiver IO queries tensor to have a batch dimension (i.e. queries = torch.randn(1, 128, 32))? Was this just to make the code work nicely? Shouldn't we be using queries = torch.randn(128, 32)? I expect to use the same embedding for all of my batch elements, which is, IIUC, what your code is doing.
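For illustration, one way to share a single unbatched set of queries across a batch before passing it to a model that expects a batch axis. The sizes are hypothetical and this is not taken from the repository's code.

```python
import torch

batch = 4
queries = torch.randn(128, 32)                        # one shared set of queries, no batch axis

# expand returns a view that repeats the same data along the new batch axis.
batched = queries.unsqueeze(0).expand(batch, -1, -1)  # (4, 128, 32)
```

Because `expand` creates a view, no memory is copied, and gradients flowing into every batch element accumulate on the single underlying tensor.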

opened by pcicales 3

Releases(0.8.6)

0.8.6(Dec 5, 2022)
0.8.5(Dec 5, 2022)
0.8.4(Dec 5, 2022)
0.8.3(Jan 25, 2022)
0.8.2(Jan 25, 2022)
0.8.1(Dec 12, 2021)
0.8.0(Dec 7, 2021)
0.7.5(Oct 10, 2021)
0.7.4(Oct 4, 2021)
0.7.3(Sep 26, 2021)
0.7.2(Sep 26, 2021)
0.7.1(Sep 13, 2021)
0.7.0(Aug 30, 2021)
0.6.2(Aug 30, 2021)
0.6.1(Aug 30, 2021)
0.6.0(Aug 29, 2021)
0.5.1(Aug 2, 2021)
0.5.0(Aug 2, 2021)
0.4.0(May 11, 2021)
0.3.0(Apr 16, 2021)
0.2.1(Apr 15, 2021)
0.2.0(Apr 10, 2021)
0.1.20(Apr 4, 2021)
0.1.19(Mar 25, 2021)
0.1.18(Mar 23, 2021)
0.1.17(Mar 23, 2021)
0.1.16(Mar 23, 2021)
0.1.15(Mar 23, 2021)
0.1.14(Mar 23, 2021)
0.1.12(Mar 23, 2021)

Owner

Phil Wang

Working with Attention. It's all we need.

GitHub Repository

Nicely is a real-time Feedback and Intervention Program Depression is a prevalent issue across all age groups, socioeconomic classes, and cultural identities.

1 Jan 16, 2022

Repository for "Toward Practical Monocular Indoor Depth Estimation" (CVPR 2022)

Toward Practical Monocular Indoor Depth Estimation Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su [arXiv] [project site] DistDe

122 Dec 13, 2022

HybVIO visual-inertial odometry and SLAM system

HybVIO A visual-inertial odometry system with an optional SLAM module. This is a research-oriented codebase, which has been published for the purposes

320 Jan 03, 2023

🕹️ Official Implementation of Conditional Motion In-betweening (CMIB) 🏃

Conditional Motion In-Betweening (CMIB) Official implementation of paper: Conditional Motion In-betweeening. Paper(arXiv) | Project Page | YouTube in-

81 Dec 22, 2022

[NeurIPS 2021] Well-tuned Simple Nets Excel on Tabular Datasets

[NeurIPS 2021] Well-tuned Simple Nets Excel on Tabular Datasets Introduction This repo contains the source code accompanying the paper: Well-tuned Sim

52 Jan 04, 2023

A python interface for training Reinforcement Learning bots to battle on pokemon showdown

The pokemon showdown Python environment A Python interface to create battling pokemon agents. poke-env offers an easy-to-use interface for creating ru

184 Dec 30, 2022

The materials used in the SaxonJS tutorial presented at Declarative Amsterdam, 2021

SaxonJS-Tutorial-2021, version 1.0.4 Last updated on 4 November, 2021. Table of contents Background Prerequisites Starting a web server Running a Java

11 Oct 23, 2022

TuckER: Tensor Factorization for Knowledge Graph Completion

TuckER: Tensor Factorization for Knowledge Graph Completion This codebase contains PyTorch implementation of the paper: TuckER: Tensor Factorization f

296 Dec 06, 2022

Deep Learning tutorials in jupyter notebooks.

DeepSchool.io Sign up here for Udemy Course on Machine Learning (Use code DEEPSCHOOL-MARCH to get 85% off course). Goals Make Deep Learning easier (mi

1.8k Dec 28, 2022

Pytorch-Swin-Unet-V2 - a modified version of Swin Unet based on Swin Transfomer V2

Swin Unet V2 Swin Unet V2 is a modified version of Swin Unet arxiv based on Swin

26 Dec 03, 2022

Learning Generative Models of Textured 3D Meshes from Real-World Images, ICCV 2021

Learning Generative Models of Textured 3D Meshes from Real-World Images This is the reference implementation of "Learning Generative Models of Texture

115 Jan 07, 2023

Real-time ground filtering algorithm of cloud points acquired using Terrestrial Laser Scanner (TLS)

This repository contains tools to simulate the ground filtering process of a registered point cloud. The repository contains two filtering methods. The first method uses a normal vector, and fit to p

5 Aug 25, 2022

Adversarial Attacks on Probabilistic Autoregressive Forecasting Models.

Attack-Probabilistic-Models This is the source code for Adversarial Attacks on Probabilistic Autoregressive Forecasting Models. This repository contai

25 Sep 14, 2022

Pytorch implementation of the paper Time-series Generative Adversarial Networks

TimeGAN-pytorch Pytorch implementation of the paper Time-series Generative Adversarial Networks presented at NeurIPS'19. Jinsung Yoon, Daniel Jarrett

21 Nov 24, 2022

In this project we predict the forest cover type using the cartographic variables in the training/test datasets.

Kaggle Competition: Forest Cover Type Prediction In this project we predict the forest cover type (the predominant kind of tree cover) using the carto

1 Mar 15, 2022

Prososdy Morph: A python library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.

ProMo (Prosody Morph) Questions? Comments? Feedback? Chat with us on gitter! A library for manipulating pitch and duration in an algorithmic way, for

71 Jan 02, 2023

Standalone pre-training recipe with JAX+Flax

Sabertooth Sabertooth is standalone pre-training recipe based on JAX+Flax, with data pipelines implemented in Rust. It runs on CPU, GPU, and/or TPU, b

26 Nov 28, 2022

Generative Handwriting using LSTM Mixture Density Network with TensorFlow

Generative Handwriting Demo using TensorFlow An attempt to implement the random handwriting generation portion of Alex Graves' paper. See my blog post

686 Nov 24, 2022

Code for Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021)

Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021) authors: Boris Knyazev, Michal Drozdzal, Graham Taylor, Adriana Romero-Soriano Overv

462 Jan 03, 2023

sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code

sequitur sequitur is a library that lets you create and train an autoencoder for sequential data in just two lines of code. It implements three differ

305 Dec 21, 2022
