Implementation of Nyström Self-attention, from the paper Nyströmformer

Last update: Jan 02, 2023

Overview

Nyström Attention

Install

$ pip install nystrom-attention

Usage

import torch
from nystrom_attention import NystromAttention

attn = NystromAttention(
    dim = 512,
    dim_head = 64,
    heads = 8,
    num_landmarks = 256,    # number of landmarks
    pinv_iterations = 6,    # number of moore-penrose iterations for approximating pinverse. 6 was recommended by the paper
    residual = True         # whether to do an extra residual with the value or not. supposedly faster convergence if turned on
)

x = torch.randn(1, 16384, 512)
mask = torch.ones(1, 16384).bool()

attn(x, mask = mask) # (1, 16384, 512)
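
The `pinv_iterations` argument above refers to an iterative approximation of the Moore-Penrose pseudoinverse. The following is a minimal sketch of that iteration as described in the Nyströmformer paper, written for a single square 2D matrix. It is not the library's internal code: the helper name `iterative_pinv` and the small test matrix are assumptions made for illustration.

```python
import torch

def iterative_pinv(a: torch.Tensor, iterations: int = 6) -> torch.Tensor:
    """Approximate the Moore-Penrose pseudoinverse of a square 2D matrix `a`."""
    eye = torch.eye(a.shape[-1], dtype=a.dtype)
    # Initial guess Z_0 = A^T / (||A||_1 * ||A||_inf), the initialization from the paper.
    norm_1 = a.abs().sum(dim=-2).max()    # largest absolute column sum
    norm_inf = a.abs().sum(dim=-1).max()  # largest absolute row sum
    z = a.transpose(-1, -2) / (norm_1 * norm_inf)
    for _ in range(iterations):
        az = a @ z
        # Z_{j+1} = 1/4 * Z_j (13 I - A Z_j (15 I - A Z_j (7 I - A Z_j)))
        z = 0.25 * z @ (13 * eye - az @ (15 * eye - az @ (7 * eye - az)))
    return z

torch.manual_seed(0)
# A small, well-conditioned row-stochastic matrix standing in for the
# landmark-to-landmark attention matrix (illustrative only).
a = torch.softmax(5 * torch.eye(8) + 0.1 * torch.randn(8, 8), dim=-1)
z = iterative_pinv(a, iterations=6)
residual = (a @ z - torch.eye(8)).abs().max().item()
```

For a well-conditioned matrix like this one, `residual` should be close to zero after a handful of iterations. Poorly conditioned matrices may need more iterations to reach the same accuracy.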

Nyströmformer, layers of Nyström attention

import torch
from nystrom_attention import Nystromformer

model = Nystromformer(
    dim = 512,
    dim_head = 64,
    heads = 8,
    depth = 6,
    num_landmarks = 256,
    pinv_iterations = 6
)

x = torch.randn(1, 16384, 512)
mask = torch.ones(1, 16384).bool()

model(x, mask = mask) # (1, 16384, 512)

You can also import it as Nyströmer if you wish

from nystrom_attention import Nystromer

Citations

@misc{xiong2021nystromformer,
    title   = {Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention},
    author  = {Yunyang Xiong and Zhanpeng Zeng and Rudrasis Chakraborty and Mingxing Tan and Glenn Fung and Yin Li and Vikas Singh},
    year    = {2021},
    eprint  = {2102.03902},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Comments

Clarification on masking
Given the dimensionality of the mask argument, (N, T), I'm assuming this is a boolean mask for masking out padding tokens. I created the following function to generate such a mask given an input tensor:

def _create_pad_mask(self, x: torch.LongTensor) -> torch.BoolTensor:
    mask = torch.ones_like(x).to(torch.bool)
    mask[x == 0] = False
    return mask

where 0 is the padding token; its positions are set to False so that they are not attended to.

However, I am unsure how to apply a causal mask to the attention layers so as to prevent my decoder from accessing future elements. I couldn't see an example of this in the full Nystromformer module. How can I achieve this?

For context, I am trying to apply the causal mask generated by the following function:

def _create_causal_mask(self, x: torch.LongTensor) -> torch.FloatTensor:
    size = x.shape[1]
    mask = (torch.triu(torch.ones(size, size)) == 1).transpose(0, 1)
    mask = mask.float().masked_fill_(mask == 0, float('-inf')).masked_fill_(mask == 1, 0.0)
    return mask

One way I can think of is to set return_attn to True, apply the mask on the returned attention weights then matmul with the value tensor. But this has a few issues:

Having to return v

Computing the full attention matrix (I think), defeating the entire point of linear attention

Needlessly calculating out only to discard it.

Is this just a limitation of Nystrom attention? Or am I overlooking something obvious?

Thanks
opened by vvvm23 3
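
For comparison, the workaround described in this issue amounts to standard dense causal attention. The sketch below uses plain PyTorch rather than the library, and `dense_causal_attention` is a hypothetical helper. It makes the cost concern concrete: `scores` has shape (batch, T, T), so memory grows quadratically with sequence length.

```python
import torch

def dense_causal_attention(q, k, v):
    """Standard softmax attention with a causal mask; builds the full (T, T) matrix."""
    t = q.shape[-2]
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5   # (batch, T, T)
    # True above the diagonal marks future positions that must not be attended to.
    future = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
q, k, v = (torch.randn(1, 6, 4) for _ in range(3))
out = dense_causal_attention(q, k, v)

# Changing only the last token must leave all earlier outputs untouched.
k2, v2 = k.clone(), v.clone()
k2[:, -1] += 1.0
v2[:, -1] += 1.0
out2 = dense_causal_attention(q, k2, v2)
```

Comparing `out[:, :-1]` with `out2[:, :-1]` is a quick way to check that no information leaks from future positions.
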
Possible bug with padding
Hey there,

I was going through the code and I noticed the following, which I found curious.

In Line 75, you pad the input tensor to a multiple of num_landmarks from the front:

x = F.pad(x, (0, 0, padding, 0), value = 0)

In Line 144 you trim the extra padding elements you inserted in the output tensor from the end.

out = out[:, :n]

Am I not getting something, or should we be removing the front elements of out?

out = out[:, out.size(1) - n:]
opened by georgepar 2
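
The indexing question in this issue can be checked in isolation. The sketch below pads a toy sequence at the front with `F.pad`, exactly as in the quoted line, and uses an identity function in place of the attention computation. The values of `n` and `m` and the padding formula are illustrative assumptions, not the library's code.

```python
import torch
import torch.nn.functional as F

n, m = 5, 4                       # sequence length and number of landmarks (illustrative values)
padding = (m - n % m) % m         # tokens needed to reach a multiple of m
x = torch.arange(1, n + 1, dtype=torch.float32).view(1, n, 1)   # tokens 1..5

# Pad dimension -2 (the sequence axis) at the front, as in the quoted line.
padded = F.pad(x, (0, 0, padding, 0), value=0)

out = padded                      # identity stand-in for the attention computation
trim_from_start = out[:, :n]      # keeps the padding, drops the last real tokens
trim_from_end = out[:, -n:]       # keeps exactly the original tokens
```

Here `out[:, -n:]` is equivalent to the `out[:, out.size(1) - n:]` form suggested in the issue.
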
Nystrom for Image processing
Thank you for sharing the wonderful code. I am working on image processing and wanted to try your code for it. I have two questions:

How should residual_conv_kernel be selected? I could not find any details about it. Also, it is enabled by a flag. When should we enable it and when should we disable it?

Is there any guideline for deciding num_landmarks for an image processing task?

Thanks
opened by paragon1234 1
Error when mask is of the same size as that of the input X

Hi,

First of all, thank you for putting such an easy-to-use implementation on GitHub. I'm trying to incorporate Nyström attention into a legacy codebase that previously passed the input X and a mask (of the same dimensions as X) to a multi-head attention layer.

When I integrate Nyström attention, it runs fine without the mask. But when I pass the mask alongside the input, it throws an einops rearrange error.

Sorry if this is a very basic question, but how would you recommend handling a 3D mask (with the same dimensions as the input) in the codebase?

Best, VB

opened by Vaibhavs10 1
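
The usage example at the top passes a mask of shape (batch, seq_len) for an input of shape (batch, seq_len, dim). One possible approach, assuming the legacy mask only encodes which positions are padding and repeats that flag across the feature axis, is to collapse the last dimension before passing the mask on. The tensors below are made up for illustration.

```python
import torch

batch, seq_len, dim = 2, 6, 8
x = torch.randn(batch, seq_len, dim)

# Hypothetical legacy mask with the same shape as x: a position is padding
# when every feature at that position is masked out (False).
lengths = torch.tensor([6, 4])
mask_2d_true = torch.arange(seq_len)[None, :] < lengths[:, None]     # (batch, seq_len)
legacy_mask = mask_2d_true[:, :, None].expand(batch, seq_len, dim)   # (batch, seq_len, dim)

# Collapse the feature axis so that one boolean remains per token.
mask = legacy_mask.any(dim=-1)                                        # (batch, seq_len)
```

If the legacy mask carries per-feature information that differs within a token, this reduction discards it, so `any` versus `all` should be chosen to match what the mask means in the original codebase.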

ViewBackward inplace deprecation warning

Hello again,

The following code results in a UserWarning in PyTorch 1.8.1.

In [1]: from nystrom_attention.nystrom_attention import NystromAttention

In [2]: import torch

In [3]: attn = NystromAttention(256)

In [4]: x = torch.randn(1, 8192, 256)

In [5]: attn(x)
/home/alex/.tmp/nystrom-attention/nystrom_attention/nystrom_attention.py:91: UserWarning: Output 0 of ViewBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at  ../torch/csrc/autograd/variable.cpp:547.)
  q *= self.scale
Out[5]:
tensor([[[-0.0449, -0.1726,  0.1409,  ...,  0.0127,  0.2287, -0.2437],
         [-0.1132,  0.3229, -0.1279,  ...,  0.0084, -0.3307, -0.2351],
         [ 0.0361,  0.1013,  0.0828,  ...,  0.1045, -0.1627,  0.0736],
         ...,
         [ 0.0018,  0.1385, -0.1716,  ..., -0.0366, -0.0682,  0.0241],
         [ 0.1497,  0.0149, -0.0020,  ..., -0.0352, -0.1126,  0.0193],
         [ 0.1341,  0.0077,  0.1627,  ..., -0.0363,  0.1057, -0.2071]]],
       grad_fn=<SliceBackward>)

Not a huge issue, but worth mentioning

opened by vvvm23 1
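
The warning points at an in-place multiplication (`q *= self.scale`). Assuming `q` is one of several views produced by a split such as `chunk` (the surrounding library code is not shown in the issue), an out-of-place multiplication is a common way to avoid modifying such a view. A minimal sketch with made-up shapes:

```python
import torch

scale = 64 ** -0.5
qkv = torch.randn(1, 16, 3 * 64, requires_grad=True)

# chunk() returns several views of the same tensor; modifying one of them
# in place (q *= scale) is what the warning above complains about.
q, k, v = qkv.chunk(3, dim=-1)

# Out-of-place multiplication allocates a new tensor and leaves the view untouched.
q_scaled = q * scale

q_scaled.sum().backward()
```

The result is numerically the same as the in-place version, at the cost of one extra tensor allocation.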

Relative position encoding

Similar to the question raised for the Performer architecture, is it possible to implement a relative position encoding given the way attention is calculated here?

opened by jdcla 1
How can we implement "batch_first" in Nystrom attention?

Hi,

Thanks a lot for implementing the nystromformer attention algorithm! Very nice job!

I am wondering whether it is feasible to add a "batch_first" option to the Nyström attention algorithm. This would allow it to be integrated into the existing PyTorch transformer encoder architecture.

opened by mark0935git 0
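
The usage examples above feed tensors shaped (batch, seq, dim). Until such an option exists, one way to use a batch-first layer from sequence-first code is a thin adapter that transposes on the way in and out. `SeqFirstWrapper` is a hypothetical name, and `nn.Linear` stands in for the attention layer so the sketch runs without the library installed.

```python
import torch
from torch import nn

class SeqFirstWrapper(nn.Module):
    """Hypothetical adapter: accepts (seq, batch, dim) and calls a batch-first module."""

    def __init__(self, batch_first_module: nn.Module):
        super().__init__()
        self.inner = batch_first_module

    def forward(self, x: torch.Tensor, **kwargs) -> torch.Tensor:
        x = x.transpose(0, 1)            # (seq, batch, dim) -> (batch, seq, dim)
        out = self.inner(x, **kwargs)    # keyword arguments such as a mask pass through unchanged
        return out.transpose(0, 1)       # back to (seq, batch, dim)

# nn.Linear stands in for a batch-first attention layer such as NystromAttention.
layer = SeqFirstWrapper(nn.Linear(32, 32))
x = torch.randn(10, 2, 32)               # (seq, batch, dim)
y = layer(x)
```

Any mask passed through the wrapper would still need to be shaped (batch, seq), since only `x` is transposed.
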
x-transformers

Hi @lucidrains - just wondering if we can plug in Nystrom Attention with x-transformers?

I've been plugging Vision Transformers into x-transformers, but am wondering if it's possible to have a Nyström transformer with the x-transformers improvements to plug into a ViT?

opened by robbohua 0

Releases(0.0.11)

0.0.11 (Apr 6, 2021)
0.0.10 (Mar 18, 2021)
0.0.9 (Feb 24, 2021)
0.0.8 (Feb 18, 2021)
0.0.7 (Feb 14, 2021)
0.0.6 (Feb 12, 2021)
0.0.5 (Feb 12, 2021)
0.0.4 (Feb 12, 2021)
0.0.3 (Feb 12, 2021)
0.0.2 (Feb 12, 2021)
0.0.1 (Feb 11, 2021)

Owner

Phil Wang

Working with Attention. It's all we need.

