Fastformer

Notes from the authors

PyTorch and Keras implementations of Fastformer. The Keras version includes only the core Fastformer attention module. The PyTorch version is written in the style of Hugging Face Transformers. The Jupyter notebooks contain quickstart code for text classification on AG's News (without pretrained word embeddings, for simplicity) and can be run directly.

In our experiments, we noticed that NOT all tasks need the FFNN, residual connections, layer normalization, or even position embeddings. For example, in news recommendation we find it is better to use Fastformer directly, without layer normalization and position embeddings. In Ad CVR prediction, however, both position embeddings and layer normalization are needed.
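To illustrate the point about optional components, here is a minimal single-head sketch of additive attention in the style of the Fastformer paper, with position embeddings and layer normalization behind switches. It is not the code shipped in this repository: the class name `FastformerSketch` and the flags `use_pos_emb` and `use_layer_norm` are hypothetical names chosen for this example, and multi-head splitting, masking, and the FFNN are left out for brevity.

```python
import torch
import torch.nn as nn


class FastformerSketch(nn.Module):
    """Single-head additive attention, sketched after the Fastformer paper.

    use_pos_emb and use_layer_norm are hypothetical flags for this example;
    they are not taken from the repository code.
    """

    def __init__(self, dim, max_len=512, use_pos_emb=True, use_layer_norm=True):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.q_score = nn.Linear(dim, 1)  # scores each query vector
        self.k_score = nn.Linear(dim, 1)  # scores each query-key interaction
        self.out = nn.Linear(dim, dim)
        self.pos_emb = nn.Embedding(max_len, dim) if use_pos_emb else None
        self.norm = nn.LayerNorm(dim) if use_layer_norm else nn.Identity()

    def forward(self, x):
        # x: (batch, seq_len, dim)
        if self.pos_emb is not None:
            positions = torch.arange(x.size(1), device=x.device)
            x = x + self.pos_emb(positions)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        # Pool all queries into one global query with additive attention.
        alpha = torch.softmax(self.q_score(q) * self.scale, dim=1)
        global_q = (alpha * q).sum(dim=1, keepdim=True)

        # Mix the global query into every key, then pool into a global key.
        p = global_q * k
        beta = torch.softmax(self.k_score(p) * self.scale, dim=1)
        global_k = (beta * p).sum(dim=1, keepdim=True)

        # Mix the global key into every value, project, and add the queries.
        u = global_k * v
        return self.norm(self.out(u) + q)


x = torch.randn(2, 16, 64)
# Configuration in the spirit of the news recommendation note above.
plain = FastformerSketch(64, use_pos_emb=False, use_layer_norm=False)
# Configuration in the spirit of the Ad CVR prediction note above.
full = FastformerSketch(64)
print(plain(x).shape, full(x).shape)
```

Both calls return a tensor with the same shape as the input. Every step is element-wise or a sum over the sequence, so the cost grows linearly with sequence length; which switches help is task-dependent, as noted above, and should be checked empirically.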

Keras version: 2.2.4 (may not be compatible with higher versions)

TF version: from 1.12 to 1.15 (may be compatible with lower versions)

PyTorch version: 1.6.0 (may be compatible with higher/lower versions)

Citation

@article{wu2021fastformer,
  title={Fastformer: Additive Attention Can Be All You Need},
  author={Wu, Chuhan and Wu, Fangzhao and Qi, Tao and Huang, Yongfeng},
  journal={arXiv preprint arXiv:2108.09084},
  year={2021}
}
