Awesome MLP-based Transformers papers

An up-to-date list of Transformers based fully on MLPs without attention!

Why this repo?

After transformers and fully attention-based models took over most of the deep learning world since 2019, it now appears that their power does not come from attention. Indeed, replacing the feed-forward network in a transformer with attention performs horribly (~30% top-1 on ImageNet). It appears that attention is not all we need. After all, we don't need models with strong inductive biases such as CNNs anymore, and we can fall back on MLPs because (1) we have enough data and (2) we have powerful optimization, regularization and data augmentation techniques. Just as we saw a big hype around transformers (see the awesome vision transformer and BERT-related paper lists), we expect to see a big hype around fully MLP-based networks without attention, and the research focus has now shifted to finding efficient ways of mixing tokens without involving attention mechanisms. This repository aims to collect all of these papers.
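To make "mixing tokens without attention" concrete, here is a minimal NumPy sketch of a Mixer-style block: one MLP applied across tokens, followed by one MLP applied across channels. It is an illustration under stated assumptions, not the implementation from any paper listed here. The names `mixer_block` and `init_params`, the hidden sizes, and the weight scale are hypothetical, and biases and learned normalization parameters are omitted for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token's channel vector (no learned scale or shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, w2):
    # Two-layer MLP (no biases) applied along the last axis of x.
    return gelu(x @ w1) @ w2

def mixer_block(x, params):
    # x has shape (num_tokens, num_channels).
    # Token mixing: transpose so the MLP acts across tokens, one channel at a time.
    y = layer_norm(x)
    x = x + mlp(y.T, params["tok_w1"], params["tok_w2"]).T
    # Channel mixing: the MLP acts across channels, one token at a time.
    y = layer_norm(x)
    return x + mlp(y, params["ch_w1"], params["ch_w2"])

def init_params(num_tokens, num_channels, tok_hidden, ch_hidden, seed=0):
    # Hypothetical initialization: small Gaussian weights.
    rng = np.random.default_rng(seed)
    scale = 0.02
    return {
        "tok_w1": rng.normal(0.0, scale, (num_tokens, tok_hidden)),
        "tok_w2": rng.normal(0.0, scale, (tok_hidden, num_tokens)),
        "ch_w1": rng.normal(0.0, scale, (num_channels, ch_hidden)),
        "ch_w2": rng.normal(0.0, scale, (ch_hidden, num_channels)),
    }

# Example: 16 tokens (e.g. image patches) with 32 channels each.
x = np.random.default_rng(1).normal(size=(16, 32))
params = init_params(num_tokens=16, num_channels=32, tok_hidden=8, ch_hidden=64)
out = mixer_block(x, params)
print(out.shape)  # (16, 32)
```

The token-mixing weights have a fixed shape tied to the number of tokens, so unlike attention this sketch only works for one sequence length. Several of the papers below explore alternatives to that constraint.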

Contributing

Please help contribute to this list by submitting an issue or a pull request in the following format:

- Paper Name [[pdf]](link) [[code]](link)

Papers

MLP-Mixer: An all-MLP Architecture for Vision [pdf] [official code] [code] [code] [code] [Yannic Kilcher Video]
Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet [pdf] [code]
ResMLP: Feedforward networks for image classification with data-efficient training [pdf] [code] [code] [code]
Pay Attention to MLPs [pdf] [code] [code] [code]
FNet: Mixing Tokens with Fourier Transforms [pdf] [code] [Yannic Kilcher Video]
Can Attention Enable MLPs To Catch Up With CNNs? [pdf]
MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation [pdf]
On the Bias Against Inductive Biases [pdf]
S²-MLP: Spatial-Shift MLP Architecture for Vision [pdf]
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition [pdf] [code]
Rethinking Token-Mixing MLP for MLP-based Vision Backbone [pdf]
Global Filter Networks for Image Classification [pdf] [code]
What Makes for Hierarchical Vision Transformer? [pdf]
AS-MLP: An Axial Shifted MLP Architecture for Vision [pdf] [code]
CycleMLP: A MLP-like Architecture for Dense Prediction [pdf] [code]
S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision [pdf]
RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision? [pdf] [code]
Hire-MLP: Vision MLP via Hierarchical Rearrangement [pdf]
Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [pdf]
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? [pdf]
Patches Are All You Need? [pdf] [code]
Exploring the Limits of Large Scale Pre-training [pdf]
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs [pdf] [code]
Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation [pdf] [code]
Are We Ready for a New Paradigm Shift? A Survey on Visual Deep MLP [pdf]
MetaFormer is Actually What You Need for Vision [pdf] [code]
An Image Patch is a Wave: Phase-Aware Vision MLP [pdf]
MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video [pdf]
SWAT: Spatial Structure Within and Among Tokens [pdf]
MLP Architectures for Vision-and-Language Modeling: An Empirical Study [pdf] [code]
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [pdf] [code]

Owner

Fawaz Sammani
