A PyTorch implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

Last update: Sep 20, 2022


Overview

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

A PyTorch implementation of TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? [1-2]. Unlike another unofficial PyTorch implementation [3], our version borrows heavily from the official implementation [4] and a TensorFlow implementation [5], and tries to stay consistent with them.

Usage

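You can import the TokenLearner and TokenLearnerModuleV11 classes from the tokenlearner module. Both take a channels-last feature map of shape [bs, h, w, c] and return num_tokens learned tokens of shape [bs, num_tokens, c], so they can be used inside a Vision Transformer, MLP-Mixer, or Video Vision Transformer, as done in the paper.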

import torch
from tokenlearner import TokenLearner

tklr = TokenLearner(in_channels=128, num_tokens=8, use_sum_pooling=False)

x = torch.ones(256, 32, 32, 128)  # [bs, h, w, c]
y1 = tklr(x)
print(y1.shape)  # [256, 8, 128]
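Conceptually, TokenLearner (v1.0) computes one spatial attention map per output token, squashes it with a sigmoid, and then averages the input features weighted by that map (or sums them when use_sum_pooling=True). The NumPy sketch below illustrates only this pooling step under simplifying assumptions: a single hypothetical projection matrix proj replaces the module's learned convolution stack, the input normalization is omitted, and token_learner_v10_sketch is an illustrative name, not part of this repository.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def token_learner_v10_sketch(x, proj, use_sum_pooling=False):
    """Simplified TokenLearner v1.0-style pooling (illustrative only).

    x:    [bs, h, w, c] channels-last feature map, as in the README.
    proj: [c, num_tokens] hypothetical 1x1 projection standing in for the
          learned conv layers of the real module (assumption).
    Returns an array of shape [bs, num_tokens, c].
    """
    bs, h, wd, c = x.shape
    num_tokens = proj.shape[1]
    # One attention map per token, values in (0, 1).
    attn = sigmoid(x @ proj)                      # [bs, h, w, S]
    attn = attn.reshape(bs, h * wd, num_tokens)   # [bs, h*w, S]
    attn = attn.transpose(0, 2, 1)[..., None]     # [bs, S, h*w, 1]
    feat = x.reshape(bs, 1, h * wd, c)            # [bs, 1, h*w, c]
    weighted = attn * feat                        # broadcast to [bs, S, h*w, c]
    # Pool over the spatial axis.
    if use_sum_pooling:
        return weighted.sum(axis=2)
    return weighted.mean(axis=2)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 32, 32, 128))         # [bs, h, w, c]
proj = rng.standard_normal((128, 8)) * 0.01       # hypothetical weights
y = token_learner_v10_sketch(x, proj)
print(y.shape)  # (2, 8, 128)
```

With a 32 x 32 grid, the 1,024 spatial positions are reduced to 8 tokens, so any layers placed after the learner operate on far fewer tokens. A batch of 2 is used instead of 256 to keep the intermediate [bs, S, h*w, c] array small.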

You can also use TokenLearnerModuleV11, which mirrors the TokenLearner v1.1 module of the official implementation [4].

import torch
from tokenlearner import TokenLearnerModuleV11

tklr_v11 = TokenLearnerModuleV11(in_channels=128, num_tokens=8, num_groups=4, dropout_rate=0.)

tklr_v11.eval()  # eval mode disables dropout
x = torch.ones(256, 32, 32, 128)  # [bs, h, w, c]
y2 = tklr_v11(x)
print(y2.shape)  # [256, 8, 128]
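As far as we understand the official v1.1 module, its main difference from v1.0 is how the attention maps are normalized: each token's logits go through a softmax over the h*w spatial positions, so every output token is a convex combination of input feature vectors. The sketch below assumes that design. The hypothetical proj matrix stands in for the learned layers that produce the logits, and the module's dropout and num_groups grouping are not modeled; token_learner_v11_sketch is an illustrative name only.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def token_learner_v11_sketch(x, proj):
    """Simplified v1.1-style token pooling (illustrative only).

    x:    [bs, h, w, c] channels-last feature map.
    proj: [c, num_tokens] hypothetical projection producing attention
          logits (assumption; the real module uses learned layers).
    Returns an array of shape [bs, num_tokens, c].
    """
    bs, h, wd, c = x.shape
    feat = x.reshape(bs, h * wd, c)               # [bs, h*w, c]
    logits = feat @ proj                          # [bs, h*w, S]
    # Softmax over spatial positions: each token's weights sum to 1.
    attn = softmax(logits.transpose(0, 2, 1))     # [bs, S, h*w]
    return np.einsum("bsn,bnc->bsc", attn, feat)  # [bs, S, c]


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 32, 32, 128))         # [bs, h, w, c]
proj = rng.standard_normal((128, 8)) * 0.01       # hypothetical weights
y2 = token_learner_v11_sketch(x, proj)
print(y2.shape)  # (2, 8, 128)
```

Because each token's softmax weights sum to 1, setting proj to zeros gives uniform attention and every token reduces to the spatial mean of the input, which makes a convenient sanity check.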

References

[1] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?; Ryoo et al.; arXiv 2021; https://arxiv.org/abs/2106.11297

[2] TokenLearner: Adaptive Space-Time Tokenization for Videos; Ryoo et al.; NeurIPS 2021; https://openreview.net/forum?id=z-l1kpDXs88

[3] Unofficial PyTorch implementation

[4] Official implementation

[5] TensorFlow implementation


Owner

Caiyong Wang
