Official implementation of "Refiner: Refining Self-attention for Vision Transformers".

Overview

RefinerViT

This repo is the official implementation of "Refiner: Refining Self-attention for Vision Transformers". The repo is build on top of timm and include the relabbeling trick included in TokenLabelling.

Introduction

Refined Vision Transformer is initially described in arxiv, which observes vision transformers require much more datafor model pre-training. Most of recent works thus are dedicated to designing morecomplex architectures or training methods to address the data-efficiency issue ofViTs. However, few of them explore improving the self-attention mechanism, akey factor distinguishing ViTs from CNNs. Different from existing works, weintroduce a conceptually simple scheme, calledrefiner, to directly refine the self-attention maps of ViTs. Specifically, refiner exploresattention expansionthatprojects the multi-head attention maps to a higher-dimensional space to promotetheir diversity. Further, refiner applies convolutions to augment local patternsof the attention maps, which we show is equivalent to adistributed local atten-tion—features are aggregated locally with learnable kernels and then globallyaggregated with self-attention. Extensive experiments demonstrate that refinerworks surprisingly well. Significantly, it enables ViTs to achieve 86% top-1 classifi-cation accuracy on ImageNet with only 81M parameters.

Please run git clone with --recursive to clone timm as submodule and install it with cd pytorch-image-models && pip install -e ./

Requirements

torch>=1.4.0 torchvision>=0.5.0 pyyaml numpy timm==0.4.5

A summary of the results are shown below for quick reference. Details can be found in the paper.

Model head layer dim Image resolution Param Top 1
Refiner-ViT-S 12 16 384 224 25M 83.6
Refiner-ViT-S 12 16 384 384 25M 84.6
Refiner-ViT-M 12 32 420 224 55M 84.6
Refiner-ViT-M 12 32 420 384 55M 85.6
Refiner-ViT-L 16 32 512 224 81M 84.9
Refiner-ViT-L 16 32 512 384 81M 85.8
Refiner-ViT-L 16 32 512 448 81M 86.0

Training

Train the Refiner-ViT-S from scratch:

bash run.sh scripts/refiner_s.yaml 

To use the re-labbeling tricks for improving the accuracy, download the relabel_data based on NFNet. This is provided in TokenLabelling repo. Then, copy the relabbeling data to the data folder.

Modelisation on galaxy evolution using PEGASE-HR

model_galaxy Modelisation on galaxy evolution using PEGASE-HR This is a labwork done in internship at IAP directed by Damien Le Borgne (https://github

Adrien Anthore 1 Jan 14, 2022
Setup freqtrade/freqUI on Heroku

UNMAINTAINED - REPO MOVED TO https://github.com/p-zombie/freqtrade Creating the app git clone https://github.com/joaorafaelm/freqtrade.git && cd freqt

João 51 Aug 29, 2022
CenterFace(size of 7.3MB) is a practical anchor-free face detection and alignment method for edge devices.

CenterFace Introduce CenterFace(size of 7.3MB) is a practical anchor-free face detection and alignment method for edge devices. Recent Update 2019.09.

StarClouds 1.2k Dec 21, 2022
RRxIO - Robust Radar Visual/Thermal Inertial Odometry: Robust and accurate state estimation even in challenging visual conditions.

RRxIO - Robust Radar Visual/Thermal Inertial Odometry RRxIO offers robust and accurate state estimation even in challenging visual conditions. RRxIO c

Christopher Doer 64 Dec 29, 2022
Pytorch implementation of TailCalibX : Feature Generation for Long-tail Classification

TailCalibX : Feature Generation for Long-tail Classification by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi [arXiv] [

Rahul Vigneswaran 34 Jan 02, 2023
A free, multiplatform SDK for real-time facial motion capture using blendshapes, and rigid head pose in 3D space from any RGB camera, photo, or video.

mocap4face by Facemoji mocap4face by Facemoji is a free, multiplatform SDK for real-time facial motion capture based on Facial Action Coding System or

Facemoji 591 Dec 27, 2022
Consecutive-Subsequence - Simple software to calculate susequence with highest sum

Simple software to calculate susequence with highest sum This repository contain

Gbadamosi Farouk 1 Jan 31, 2022
Run object detection model on the Raspberry Pi

Using TensorFlow Lite with Python is great for embedded devices based on Linux, such as Raspberry Pi.

Dimitri Yanovsky 6 Oct 08, 2022
TResNet: High Performance GPU-Dedicated Architecture

TResNet: High Performance GPU-Dedicated Architecture paperV2 | pretrained models Official PyTorch Implementation Tal Ridnik, Hussam Lawen, Asaf Noy, I

426 Dec 28, 2022
Repositorio de los Laboratorios de Análisis Numérico / Análisis Numérico I de FAMAF, UNC.

Repositorio de los Laboratorios de Análisis Numérico / Análisis Numérico I de FAMAF, UNC. Para los Laboratorios de la materia, vamos a utilizar el len

Luis Biedma 18 Dec 12, 2022
Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF shows significant improvements over baseline fine-tuning without data filtration.

Information Gain Filtration Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF sho

4 Jul 28, 2022
“Data Augmentation for Cross-Domain Named Entity Recognition” (EMNLP 2021)

Data Augmentation for Cross-Domain Named Entity Recognition Authors: Shuguang Chen, Gustavo Aguilar, Leonardo Neves and Thamar Solorio This repository

<a href=[email protected]"> 18 Sep 10, 2022
Implementation for the "Surface Reconstruction from 3D Line Segments" paper.

Surface Reconstruction from 3D Line Segments Surface reconstruction from 3d line segments. Langlois, P. A., Boulch, A., & Marlet, R. In 2019 Internati

85 Jan 04, 2023
Code implementation of Data Efficient Stagewise Knowledge Distillation paper.

Data Efficient Stagewise Knowledge Distillation Table of Contents Data Efficient Stagewise Knowledge Distillation Table of Contents Requirements Image

IvLabs 112 Dec 02, 2022
Tiny Object Detection in Aerial Images.

AI-TOD AI-TOD is a dataset for tiny object detection in aerial images. [Paper] [Dataset] Description AI-TOD comes with 700,621 object instances for ei

jwwangchn 116 Dec 30, 2022
The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

PRIMER The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization. PRIMER is a pre-trained model for mu

AI2 114 Jan 06, 2023
Unet network with mean teacher for altrasound image segmentation

Unet network with mean teacher for altrasound image segmentation

5 Nov 21, 2022
YOLOv5🚀 reproduction by Guo Quanhao using PaddlePaddle

YOLOv5-Paddle YOLOv5 🚀 reproduction by Guo Quanhao using PaddlePaddle 支持AutoBatch 支持AutoAnchor 支持GPU Memory 快速开始 使用AIStudio高性能环境快速构建YOLOv5训练(PaddlePa

QuanHao Guo 20 Nov 14, 2022
Repo for our ICML21 paper Unsupervised Learning of Visual 3D Keypoints for Control

Unsupervised Learning of Visual 3D Keypoints for Control [Project Website] [Paper] Boyuan Chen1, Pieter Abbeel1, Deepak Pathak2 1UC Berkeley 2Carnegie

Boyuan Chen 34 Jul 22, 2022
NEG loss implemented in pytorch

Pytorch Negative Sampling Loss Negative Sampling Loss implemented in PyTorch. Usage neg_loss = NEG_loss(num_classes, embedding_size) optimizer =

Daniil Gavrilov 123 Sep 13, 2022