A curated list of awesome resources combining Transformers with Neural Architecture Search

Overview

Awesome Transformer Architecture Search: Awesome

To keep track of the large number of recent papers that look at the intersection of Transformers and Neural Architecture Search (NAS), we have created this awesome list of curated papers and resources, inspired by awesome-autodl, awesome-architecture-search, and awesome-computer-vision. Papers are divided into the following categories:

  1. General Transformer search
  2. Domain Specific, applied Transformer search (divided into NLP, Vision, ASR)
  3. Insights on Transformer components or searchable parameters
  4. Transformer Surveys

This repository is maintained by the AutoML Group Freiburg. Please feel free to pull requests or open an issue to add papers.

General Transformer Search

Title Venue Group
UniNet: Unified Architecture Search with Convolutions, Transformer and MLP arxiv [Oct'21] SenseTime
Analyzing and Mitigating Interference in Neural Architecture Search arxiv [Aug'21] Tsinghua, MSR
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search ICCV'21 Sun Yat-sen University
Memory-Efficient Differentiable Transformer Architecture Search ACL-IJCNLP'21 MSR, Peking University
Finding Fast Transformers: One-Shot Neural Architecture Search by Component Composition arxiv [Aug'20] Google Research
AutoTrans: Automating Transformer Design via Reinforced Architecture Search arxiv [Sep'20] Fudan University
NAT: Neural Architecture Transformer for Accurate and Compact Architectures NeurIPS'19 Tencent AI
The Evolved Transformer ICML'19 Google Brain

Domain Specific Transformer Search

Vision

Title Venue Group
AutoFormer: Searching Transformers for Visual Recognition ICCV'21 MSR
GLiT: Neural Architecture Search for Global and Local Image Transformer ICCV'21 University of Sydney
Searching for Efficient Multi-Stage Vision Transformers ICCV'21 workshop MIT
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers CVPR'21 Bytedance Inc.
Vision Transformer Architecture Search arxiv [June'21] SenseTime, Tsingua University

Natural Language Processing

Title Venue Group
AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models ACL'21 MIT
NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search KDD'21 MSR, Tsinghua University
AutoBERT-Zero: Evolving the BERT backbone from scratch arxiv [July'21] Huawei Noah’s Ark Lab
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing ACL'20 MIT

Automatic Speech Recognition

Title Venue Group
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search ICASSP'21 MSR
Darts-Conformer: Towards Efficient Gradient-Based Neural Architecture Search For End-to-End ASR arxiv [Aug'21] NPU, Xi'an
Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search arxiv [April'21] Chinese Academy of Sciences
Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition INTERSPEECH'20 VUNO Inc.

Insights on Transformer components and interesting papers

Title Venue Group
Patches are All You Need ? ICLR'22 under review -
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows ICCV'21 best paper MSR
Rethinking Spatial Dimensions of Vision Transformers ICCV'21 NAVER AI
What makes for hierarchical vision transformers arxiv [Sept'21] HUST
AutoAttend: Automated Attention Representation Search ICML'21 Tsinghua University
Rethinking Attention with Performers ICLR'21 Oral Google
LambdaNetworks: Modeling long-range Interactions without Attention ICLR'21 Google Research
HyperGrid Transformers ICLR'21 Google Research
LocalViT: Bringing Locality to Vision Transformers arxiv [April'21] ETH Zurich
NASABN: A Neural Architecture Search Framework for Attention-Based Networks IJCNN'20 Chinese Academy of Sciences
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned ACL'19 Yandex

Transformer Surveys

Title Venue Group
Transformers in Vision: A Survey arxiv [Oct'21] MBZ University of AI
Efficient Transformers: A Survey arxiv [Sept'21] Google Research

Misc resources

Owner
Yash Mehta
Researcher, deep learning 🍁 Previously @GatsbyUCL, @NTUsingapore, @AmazonSDE
Yash Mehta
Improving the robustness and performance of biomedical NLP models through adversarial training

RobustBioNLP Improving the robustness and performance of biomedical NLP models through adversarial training In this repository you can find suppliment

Milad Moradi 3 Sep 20, 2022
Companion repo of the UCC 2021 paper "Predictive Auto-scaling with OpenStack Monasca"

Predictive Auto-scaling with OpenStack Monasca Giacomo Lanciano*, Filippo Galli, Tommaso Cucinotta, Davide Bacciu, Andrea Passarella 2021 IEEE/ACM 14t

Giacomo Lanciano 0 Dec 07, 2022
Realtime Face Anti Spoofing with Face Detector based on Deep Learning using Tensorflow/Keras and OpenCV

Realtime Face Anti-Spoofing Detection 🤖 Realtime Face Anti Spoofing Detection with Face Detector to detect real and fake faces Please star this repo

Prem Kumar 86 Aug 03, 2022
Used to record WKU's utility bills on a regular basis.

WKU水电费小助手 一个用于定期记录WKU水电费的脚本 Looking for English Readme? 背景 由于WKU校园内的水电账单系统时常存在扣费延迟的现象,而补扣的费用缺乏令人信服的证明。不少学生为费用摸不着头脑,但也没有申诉的依据。为了更好地掌握水电费使用情况,留下一手证据,我开源

2 Jul 21, 2022
The mini-AlphaStar (mini-AS, or mAS) - mini-scale version (non-official) of the AlphaStar (AS)

A mini-scale reproduction code of the AlphaStar program. Note: the original AlphaStar is the AI proposed by DeepMind to play StarCraft II.

Ruo-Ze Liu 216 Jan 04, 2023
PyTorch code to run synthetic experiments.

Code repository for Invariant Risk Minimization Source code for the paper: @article{InvariantRiskMinimization, title={Invariant Risk Minimization}

Facebook Research 345 Dec 12, 2022
This is official implementaion of paper "Token Shift Transformer for Video Classification".

This is official implementaion of paper "Token Shift Transformer for Video Classification". We achieve SOTA performance 80.40% on Kinetics-400 val. Paper link

VideoNet 60 Dec 30, 2022
Exploring Image Deblurring via Blur Kernel Space (CVPR'21)

Exploring Image Deblurring via Encoded Blur Kernel Space About the project We introduce a method to encode the blur operators of an arbitrary dataset

VinAI Research 118 Dec 19, 2022
Machine learning library for fast and efficient Gaussian mixture models

This repository contains code which implements the Stochastic Gaussian Mixture Model (S-GMM) for event-based datasets Dependencies CMake Premake4 Blaz

Omar Oubari 1 Dec 19, 2022
A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

idn-solver Paper | Project Page This repository contains the code release of our ICCV 2021 paper: A Confidence-based Iterative Solver of Depths and Su

zhaowang 43 Nov 17, 2022
Large scale embeddings on a single machine.

Marius Marius is a system under active development for training embeddings for large-scale graphs on a single machine. Training on large scale graphs

Marius 107 Jan 03, 2023
This repository contains PyTorch models for SpecTr (Spectral Transformer).

SpecTr: Spectral Transformer for Hyperspectral Pathology Image Segmentation This repository contains PyTorch models for SpecTr (Spectral Transformer).

Boxiang Yun 45 Dec 13, 2022
DataCLUE: 国内首个以数据为中心的AI测评(含模型分析报告)

DataCLUE: A Benchmark Suite for Data-centric NLP You can get the english version of README. 以数据为中心的AI测评(DataCLUE) 内容导引 章节 描述 简介 介绍以数据为中心的AI测评(DataCLUE

CLUE benchmark 135 Dec 22, 2022
AI Summer's complete catalog of articles

Learn Deep Learning with AI Summer A collection of all articles (almost 100) written for the AI Summer blog organized by topic. Deep Learning Theory M

AI Summer 95 Dec 29, 2022
Controlling a game using mediapipe hand tracking

These scripts use the Google mediapipe hand tracking solution in combination with a webcam in order to send game instructions to a racing game. It features 2 methods of control

3 May 17, 2022
PyTorch implementation of the Deep SLDA method from our CVPRW-2020 paper "Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis"

Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis This is a PyTorch implementation of the Deep Streaming Linear Discriminant

Tyler Hayes 41 Dec 25, 2022
This is the workbook I created while I was studying for the Qiskit Associate Developer exam. I hope this becomes useful to others as it was for me :)

A Workbook for the Qiskit Developer Certification Exam Hello everyone! This is Bartu, a fellow Qiskitter. I have recently taken the Certification exam

Bartu Bisgin 66 Dec 10, 2022
Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF)

Graph Convolutional Gated Recurrent Neural Network (GCGRNN) Improved from Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF

Lei Lin 21 Dec 18, 2022
City Surfaces: City-scale Semantic Segmentation of Sidewalk Surfaces

City Surfaces: City-scale Semantic Segmentation of Sidewalk Surfaces Paper Temporary GitHub page for City Surfaces paper. More soon! While designing s

14 Nov 10, 2022
Multi agent DDPG algorithm written in Python + Pytorch

Multi agent DDPG algorithm written in Python + Pytorch. It also includes a Jupyter notebook, Tennis.ipynb, as a showcase.

Rogier Wachters 2 Feb 26, 2022