A collection of SOTA Image Classification Models in PyTorch

Last update: Dec 30, 2022

Overview

Intended to make SOTA image classification models easy to use and to integrate as backbones into downstream tasks such as object detection, semantic segmentation, and pose estimation.

Model Zoo

Model	ImageNet-1k Top-1 Acc (%)	Params (M)	GFLOPs	Variants & Weights
MicroNet	51.4 | 59.4 | 62.5	2 | 2 | 3	6M | 12M | 21M	M1 | M2 | M3
MobileFormer	76.7 | 77.9 | 79.3	9 | 11 | 14	214M | 294M | 508M	214 | 294 | 508
GFNet	80.1 | 81.5 | 82.9	15 | 32 | 54	2 | 5 | 8	T | S | B
PVTv2	78.7 | 82.0 | 83.6	14 | 25 | 63	2 | 4 | 10	B1 | B2 | B4
ResT	79.6 | 81.6 | 83.6	14 | 30 | 52	2 | 4 | 8	S | B | L
Conformer	81.3 | 83.4 | 84.1	24 | 38 | 83	5 | 11 | 23	T | S | B
Shuffle	82.4 | 83.6 | 84.0	28 | 50 | 88	5 | 9 | 16	T | S | B
CSWin	82.7 | 83.6 | 84.2	23 | 35 | 78	4 | 7 | 15	T | S | B
CycleMLP	81.6 | 83.0 | 83.2	27 | 52 | 76	4 | 10 | 12	B2 | B4 | B5
HireMLP	81.8 | 83.1 | 83.4	33 | 58 | 96	4 | 8 | 14	S | B | L
sMLP	81.9 | 83.1 | 83.4	24 | 49 | 66	5 | 10 | 14	T | S | B
XCiT	80.4 | 83.9 | 84.3	12 | 48 | 84	2 | 9 | 16	T | S | M
VOLO	84.2 | 85.2 | 85.4	27 | 59 | 86	7 | 14 | 21	D1 | D2 | D3

Table Notes

Image size is 224x224. EfficientNetv2 uses progressive learning (image size from 128 to 380).
All models' weights are from official repositories.
Only models trained on ImageNet1k are compared.
Models with more than 200M parameters are not included.
PVTv2, ResT, Conformer, XCiT and CycleMLP models work with any image size.
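
For readers comparing the accuracy column: Top-1 accuracy is the percentage of images whose highest-scoring class equals the ground-truth label. The sketch below illustrates the metric in plain Python. The function name `top1_accuracy` and the sample scores are made up for illustration and are not taken from this repository's evaluation code.

```python
def top1_accuracy(scores, labels):
    """Return the percentage of rows whose arg-max class equals the label.

    scores: list of per-class score lists, one list per image.
    labels: list of integer class indices, one per image.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row (the predicted class).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Made-up scores for four images and three classes.
example_scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.3, 0.3, 0.4],  # predicts class 2
    [0.2, 0.5, 0.3],  # predicts class 1
]
example_labels = [1, 0, 2, 0]

print(top1_accuracy(example_scores, example_labels))  # 75.0
```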

Usage

Requirements

python >= 3.6
torch >= 1.8.1
torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.
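
Before installing the remaining packages, it can help to confirm that the interpreter and the two core libraries meet the minimum versions listed above. The following is a minimal sketch, not part of this repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the version parsing is deliberately simple (it ignores local build suffixes such as `+cu111`).

```python
import sys

# Minimum versions copied from the requirements list above.
MINIMUMS = {"torch": "1.8.1", "torchvision": "0.9.1"}


def parse_version(text):
    """Turn a string such as '1.8.1+cu111' into the tuple (1, 8, 1)."""
    core = text.split("+")[0]  # drop any local build suffix
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, required):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def check_environment():
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("python >= 3.6 is required")
    for name, minimum in MINIMUMS.items():
        try:
            module = __import__(name)
        except ImportError:
            problems.append("{} is not installed".format(name))
            continue
        if not meets_minimum(module.__version__, minimum):
            problems.append(
                "{} {} found, {} or newer required".format(
                    name, module.__version__, minimum
                )
            )
    return problems
```

Calling `check_environment()` returns an empty list when every requirement is satisfied, and one message per problem otherwise.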

Show Available Models

$ python tools/show.py

A table with model names and variants will be shown:

Model Names    Model Variants
-------------  --------------------------------
ResNet         ['18', '34', '50', '101', '152']
MicroNet       ['M1', 'M2', 'M3']
GFNet          ['T', 'S', 'B']
PVTv2          ['B1', 'B2', 'B3', 'B4', 'B5']
ResT           ['S', 'B', 'L']
Conformer      ['T', 'S', 'B']
Shuffle        ['T', 'S', 'B']
CSWin          ['T', 'S', 'B', 'L']
CycleMLP       ['B1', 'B2', 'B3', 'B4', 'B5']
XciT           ['T', 'S', 'M', 'L']
VOLO           ['D1', 'D2', 'D3', 'D4']
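
When choosing a model for the config file, a quick check that the name and variant pair appears in this table can catch typos early. The sketch below hard-codes the table shown above in a hypothetical `AVAILABLE` dictionary. It does not read from the repository and may differ from how `tools/show.py` builds its output. Names are matched exactly as printed in the table.

```python
# Names and variants copied from the table above (illustrative only).
AVAILABLE = {
    "ResNet": ["18", "34", "50", "101", "152"],
    "MicroNet": ["M1", "M2", "M3"],
    "GFNet": ["T", "S", "B"],
    "PVTv2": ["B1", "B2", "B3", "B4", "B5"],
    "ResT": ["S", "B", "L"],
    "Conformer": ["T", "S", "B"],
    "Shuffle": ["T", "S", "B"],
    "CSWin": ["T", "S", "B", "L"],
    "CycleMLP": ["B1", "B2", "B3", "B4", "B5"],
    "XciT": ["T", "S", "M", "L"],
    "VOLO": ["D1", "D2", "D3", "D4"],
}


def is_available(name, variant):
    """True if the (name, variant) pair is listed in AVAILABLE."""
    return variant in AVAILABLE.get(name, [])


def format_table(registry):
    """Render the registry as a two-column plain-text table."""
    header = ("Model Names", "Model Variants")
    rows = [(name, str(variants)) for name, variants in registry.items()]
    # Column widths are the longest cell in each column, header included.
    w0 = max([len(header[0])] + [len(r[0]) for r in rows])
    w1 = max([len(header[1])] + [len(r[1]) for r in rows])
    lines = [
        header[0].ljust(w0) + "  " + header[1],
        "-" * w0 + "  " + "-" * w1,
    ]
    lines += [name.ljust(w0) + "  " + variants for name, variants in rows]
    return "\n".join(lines)


print(format_table(AVAILABLE))
```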

Inference

Download the weights for your chosen model from the Model Zoo table.
Set the MODEL and TEST parameters in the config file, then run the following command.

$ python tools/infer.py --cfg configs/test.yaml

You will see an output similar to this:

File: assests\dog.jpg >>>>> Golden retriever

Training

$ python tools/train.py --cfg configs/train.yaml

Evaluate

$ python tools/val.py --cfg configs/train.yaml

Fine-tune

Fine-tune on CIFAR-10:

$ python tools/finetune.py --cfg configs/finetune.yaml

Citations

@article{zhql2021ResT,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Zhang, Qinglong and Yang, Yubin},
  journal={arXiv preprint arXiv:2105.13677v3},
  year={2021}
}

@article{peng2021conformer,
  title={Conformer: Local Features Coupling Global Representations for Visual Recognition}, 
  author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
  journal={arXiv preprint arXiv:2105.03889},
  year={2021},
}

@misc{dong2021cswin,
  title={CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows}, 
  author={Xiaoyi Dong and Jianmin Bao and Dongdong Chen and Weiming Zhang and Nenghai Yu and Lu Yuan and Dong Chen and Baining Guo},
  year={2021},
  eprint={2107.00652},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{chen2021cyclemlp,
  title={CycleMLP: A MLP-like Architecture for Dense Prediction}, 
  author={Shoufa Chen and Enze Xie and Chongjian Ge and Ding Liang and Ping Luo},
  year={2021},
  eprint={2107.10224},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{elnouby2021xcit,
  title={XCiT: Cross-Covariance Image Transformers}, 
  author={Alaaeldin El-Nouby and Hugo Touvron and Mathilde Caron and Piotr Bojanowski and Matthijs Douze and Armand Joulin and Ivan Laptev and Natalia Neverova and Gabriel Synnaeve and Jakob Verbeek and Hervé Jegou},
  year={2021},
  eprint={2106.09681},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{yuan2021volo,
  title={VOLO: Vision Outlooker for Visual Recognition}, 
  author={Li Yuan and Qibin Hou and Zihang Jiang and Jiashi Feng and Shuicheng Yan},
  year={2021},
  eprint={2106.13112},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{li2021micronet,
  title={MicroNet: Improving Image Recognition with Extremely Low FLOPs},
  author={Yunsheng Li and Yinpeng Chen and Xiyang Dai and Dongdong Chen and Mengchen Liu and Lu Yuan and Zicheng Liu and Lei Zhang and Nuno Vasconcelos},
  year={2021},
  eprint={2108.05894},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{chen2021mobileformer,
  title={Mobile-Former: Bridging MobileNet and Transformer}, 
  author={Yinpeng Chen and Xiyang Dai and Dongdong Chen and Mengchen Liu and Xiaoyi Dong and Lu Yuan and Zicheng Liu},
  year={2021},
  eprint={2108.05895},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{rao2021global,
  title={Global Filter Networks for Image Classification},
  author={Rao, Yongming and Zhao, Wenliang and Zhu, Zheng and Lu, Jiwen and Zhou, Jie},
  journal={arXiv preprint arXiv:2107.00645},
  year={2021}
}

@article{huang2021shuffle,
  title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer},
  author={Huang, Zilong and Ben, Youcheng and Luo, Guozhong and Cheng, Pei and Yu, Gang and Fu, Bin},
  journal={arXiv preprint arXiv:2106.03650},
  year={2021}
}

Releases

v0.2.0 (Sep 12, 2021)

Add MicroNet model
Bug fixes

v0.1.0 (Aug 5, 2021)

Add the following models and weights:

Swin
PVTv2
GFNet
CSWin
CrossFormer
CycleMLP

Owner

sithu3
