Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch

Overview

CAPE 🌴 pylint pytest

PyTorch implementation of Continuous Augmented Positional Embeddings (CAPE), by Likhomanenko et al. Enhance your Transformer positional embeddings with easy-to-use augmentations!

Setup 🔧

Minimum requirements:

torch >= 1.10.0

Install from source:

git clone https://github.com/gcambara/cape.git
cd cape
pip install --editable ./

Usage 📖

Ready to go along with PyTorch's official implementation of Transformers. Default initialization behaves identically as sinusoidal positional embeddings, summing them up to your content embeddings:

from torch import nn
from cape import CAPE1d

pos_emb = CAPE1d(d_model=512)
transformer = nn.Transformer(d_model=512)

x = torch.randn(10, 32, 512) # seq_len, batch_size, n_feats
x = pos_emb(x) # forward sums the positional embedding by default
x = transformer(x)

Alternatively, you can get positional embeddings separately

x = torch.randn(10, 32, 512)
pos_emb = pos_emb.compute_pos_emb(x)

scale = 512**0.5
x = (scale * x) + pos_emb
x = transformer(x)

Let's see a few examples of CAPE initialization for different modalities, inspired by the original paper experiments.

CAPE for text 🔤

CAPE1d is ready to be applied to text. Keep max_local_shift between 0 and 0.5 to shift local positions without disordering them.

from cape import CAPE1d
pos_emb = CAPE1d(d_model=512, max_global_shift=5.0, 
                 max_local_shift=0.5, max_global_scaling=1.03, 
                 normalize=False)

x = torch.randn(10, 32, 512) # seq_len, batch_size, n_feats
x = pos_emb(x)

Padding is supported by indicating the length of samples in the forward method, with the x_lengths argument. For example, the original length of samples is 7, although they have been padded to sequence length 10.

x = torch.randn(10, 32, 512) # seq_len, batch_size, n_feats
x_lengths = torch.ones(32)*7
x = pos_emb(x, x_lengths=x_lengths)

CAPE for audio 🎙️

CAPE1d for audio is applied similarly to text. Use positions_delta argument to set the separation in seconds between time steps, and x_lengths for indicating sample durations in case there is padding.

For instance, let's consider no padding and same hop size (30 ms) at every sample in the batch:

# Max global shift is 60 s.
# Max local shift is set to 0.5 to maintain positional order.
# Max global scaling is 1.1, according to WSJ recipe.
# Freq scale is 30 to ensure that 30 ms queries are possible with long audios
from cape import CAPE1d
pos_emb = CAPE1d(d_model=512, max_global_shift=60.0, 
                 max_local_shift=0.5, max_global_scaling=1.1, 
                 normalize=True, freq_scale=30.0)

x = torch.randn(100, 32, 512) # seq_len, batch_size, n_feats
positions_delta = 0.03 # 30 ms of stride
x = pos_emb(x, positions_delta=positions_delta)

Now, let's imagine that the original duration of all samples is 2.5 s, although they have been padded to 3.0 s. Hop size is 30 ms for every sample in the batch.

x = torch.randn(100, 32, 512) # seq_len, batch_size, n_feats

duration = 2.5
positions_delta = 0.03
x_lengths = torch.ones(32)*duration
x = pos_emb(x, x_lengths=x_lengths, positions_delta=positions_delta)

What if the hop size is different for every sample in the batch? E.g. first half of the samples have stride of 30 ms, and the second half of 50 ms.

positions_delta = 0.03
positions_delta = torch.ones(32)*positions_delta
positions_delta[16:] = 0.05
x = pos_emb(x, positions_delta=positions_delta)
positions_delta
tensor([0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300,
        0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0500, 0.0500,
        0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500,
        0.0500, 0.0500, 0.0500, 0.0500, 0.0500])

Lastly, let's consider a very rare case, where hop size is different for every sample in the batch, and is not constant within some samples. E.g. stride of 30 ms for the first half of samples, and 50 ms for the second half. However, the hop size of the very first sample linearly increases for each time step.

from einops import repeat
positions_delta = 0.03
positions_delta = torch.ones(32)*positions_delta
positions_delta[16:] = 0.05
positions_delta = repeat(positions_delta, 'b -> b new_axis', new_axis=100)
positions_delta[0, :] *= torch.arange(1, 101)
x = pos_emb(x, positions_delta=positions_delta)
positions_delta
tensor([[0.0300, 0.0600, 0.0900,  ..., 2.9400, 2.9700, 3.0000],
        [0.0300, 0.0300, 0.0300,  ..., 0.0300, 0.0300, 0.0300],
        [0.0300, 0.0300, 0.0300,  ..., 0.0300, 0.0300, 0.0300],
        ...,
        [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500]])

CAPE for ViT 🖼️

CAPE2d is used for embedding positions in image patches. Scaling of positions between [-1, 1] is done within the module, whether patches are square or non-square. Thus, set max_local_shift between 0 and 0.5, and the scale of local shifts will be adjusted according to the height and width of patches. Beyond values of 0.5 the order of positions might be altered, do this at your own risk!

from cape import CAPE2d
pos_emb = CAPE2d(d_model=512, max_global_shift=0.5, 
                 max_local_shift=0.5, max_global_scaling=1.4)

# Case 1: square patches
x = torch.randn(16, 16, 32, 512) # height, width, batch_size, n_feats
x = pos_emb(x)

# Case 2: non-square patches
x = torch.randn(24, 16, 32, 512) # height, width, batch_size, n_feats
x = pos_emb(x)

Citation ✍️

I just did this PyTorch implementation following the paper's Python code and the Flashlight recipe in C++. All the credit goes to the original authors, please cite them if you use this for your research project:

@inproceedings{likhomanenko2021cape,
title={{CAPE}: Encoding Relative Positions with Continuous Augmented Positional Embeddings},
author={Tatiana Likhomanenko and Qiantong Xu and Gabriel Synnaeve and Ronan Collobert and Alex Rogozhnikov},
booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
year={2021},
url={https://openreview.net/forum?id=n-FqqWXnWW}
}

Acknowledgments 🙏

Many thanks to the paper's authors for code reviewing and clarifying doubts about the paper and the implementation. :)

You might also like...
Implementation of
Implementation of "GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings" in PyTorch

PyGAS: Auto-Scaling GNNs in PyG PyGAS is the practical realization of our G NN A uto S cale (GAS) framework, which scales arbitrary message-passing GN

Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch

Rotary Embeddings - Pytorch A standalone library for adding rotary embeddings to transformers in Pytorch, following its success as relative positional

A PyTorch Implementation of
A PyTorch Implementation of "Watch Your Step: Learning Node Embeddings via Graph Attention" (NeurIPS 2018).

Attention Walk ⠀⠀ A PyTorch Implementation of Watch Your Step: Learning Node Embeddings via Graph Attention (NIPS 2018). Abstract Graph embedding meth

PyTorch implementation of the NIPS-17 paper
PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Poincaré Embeddings for Learning Hierarchical Representations PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations

Implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTorch
Implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTorch

Neural Distance Embeddings for Biological Sequences Official implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTo

Styled Augmented Translation
Styled Augmented Translation

SAT Style Augmented Translation Introduction By collecting high-quality data, we were able to train a model that outperforms Google Translate on 6 dif

TANL: Structured Prediction as Translation between Augmented Natural Languages

TANL: Structured Prediction as Translation between Augmented Natural Languages Code for the paper "Structured Prediction as Translation between Augmen

A neuroanatomy-based augmented reality experience powered by computer vision. Features 3D visuals of the Atlas Brain Map slices.

Brain Augmented Reality (AR) A neuroanatomy-based augmented reality experience powered by computer vision that features 3D visuals of the Atlas Brain

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments (CoRL 2020)

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments [Project website] [Paper] This project is a PyTorch

Releases(v1.0.0)
Owner
Guillermo Cámbara
🎙️ PhD Candidate in Self-Supervised Learning + Speech Recognition @ Universitat Pompeu Fabra & Telefónica Research
Guillermo Cámbara
Contextual Attention Localization for Offline Handwritten Text Recognition

CALText This repository contains the source code for CALText model introduced in "CALText: Contextual Attention Localization for Offline Handwritten T

0 Feb 17, 2022
Materials for my scikit-learn tutorial

Scikit-learn Tutorial Jake VanderPlas email: [email protected] twitter: @jakevdp gith

Jake Vanderplas 1.6k Dec 30, 2022
Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yu

UT-Austin Robot Perception and Learning Lab 63 Jan 03, 2023
An educational resource to help anyone learn deep reinforcement learning.

Status: Maintenance (expect bug fixes and minor updates) Welcome to Spinning Up in Deep RL! This is an educational resource produced by OpenAI that ma

OpenAI 7.6k Jan 09, 2023
Memory-Augmented Model Predictive Control

Memory-Augmented Model Predictive Control This repository hosts the source code for the journal article "Composing MPC with LQR and Neural Networks fo

Fangyu Wu 1 Jun 19, 2022
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using

Google Research 2.1k Dec 28, 2022
[ICML 2021] A fast algorithm for fitting robust decision trees.

GROOT: Growing Robust Trees Growing Robust Trees (GROOT) is an algorithm that fits binary classification decision trees such that they are robust agai

Cyber Analytics Lab 17 Nov 21, 2022
GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification

GalaXC GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification @InProceedings{Saini21, author = {Saini, D. and Jain,

Extreme Classification 28 Dec 05, 2022
The Illinois repository for Climatehack (https://climatehack.ai/). We won 1st place!

Climatehack This is the repository for Illinois's Climatehack Team. We earned first place on the leaderboard with a final score of 0.87992. An overvie

Jatin Mathur 20 Jun 09, 2022
1st place solution in CCF BDCI 2021 ULSEG challenge

1st place solution in CCF BDCI 2021 ULSEG challenge This is the source code of the 1st place solution for ultrasound image angioma segmentation task (

Chenxu Peng 30 Nov 22, 2022
Official implement of Paper:A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sening images

A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images 深度监督影像融合网络DSIFN用于高分辨率双时相遥感影像变化检测 Of

Chenxiao Zhang 135 Dec 19, 2022
Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships.

feature-set-comp Compares various time-series feature sets on computational performance, within-set structure, and between-set relationships. Reposito

Trent Henderson 7 May 25, 2022
Randomizes the warps in a stock pokeemerald repo.

pokeemerald warp randomizer Randomizes the warps in a stock pokeemerald repo. Usage Instructions Install networkx and matplotlib via pip3 or similar.

Max Thomas 6 Mar 17, 2022
(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)

IsoTree Fast and multi-threaded implementation of Extended Isolation Forest, Fair-Cut Forest, SCiForest (a.k.a. Split-Criterion iForest), and regular

141 Dec 29, 2022
This library contains a Tensorflow implementation of the paper Stability Analysis of Unfolded WMMSE for Power Allocation

UWMMSE-stability Tensorflow implementation of Stability Analysis of UWMMSE Overview This library contains a Tensorflow implementation of the paper Sta

Arindam Chowdhury 1 Nov 16, 2022
BackgroundRemover lets you Remove Background from images and video with a simple command line interface

BackgroundRemover BackgroundRemover is a command line tool to remove background from video and image, made by nadermx to power https://BackgroundRemov

Johnathan Nader 1.7k Dec 30, 2022
Diabet Feature Engineering - Predict whether people have diabetes when their characteristics are specified

Diabet Feature Engineering - Predict whether people have diabetes when their characteristics are specified

Şebnem 6 Jan 18, 2022
PECOS - Prediction for Enormous and Correlated Spaces

PECOS - Predictions for Enormous and Correlated Output Spaces PECOS is a versatile and modular machine learning (ML) framework for fast learning and i

Amazon 387 Jan 04, 2023
GazeScroller - Using Facial Movements to perform Hands-free Gesture on the system

GazeScroller Using Facial Movements to perform Hands-free Gesture on the system

2 Jan 05, 2022
This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks

NNProject - DeepMask This is a Keras-based Python implementation of DeepMask- a complex deep neural network for learning object segmentation masks. Th

189 Nov 16, 2022