CMT: Convolutional Neural Networks Meet Vision Transformers

Last update: Dec 30, 2022

1. Introduction

This repo is a PyTorch implementation of the CMT model. No reference source code was available, so this is an unofficial version.

2. Environment

python 3.7+
pytorch 1.7.1
pillow
apex
opencv-python

See the apex repo for how to install apex.
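As a quick sanity check before training, the sketch below reports which of the packages listed above cannot be found by the current interpreter. It is only an illustration and is not part of this repo: the helper names `missing_modules` and `python_ok` are made up here, and the import names (`torch` for pytorch, `PIL` for pillow, `cv2` for opencv-python) are the usual ones for those packages. It does not check the pytorch version.

```python
import importlib.util
import sys

# Import names for the packages listed above:
# pytorch -> torch, pillow -> PIL, opencv-python -> cv2, apex -> apex.
REQUIRED_MODULES = ["torch", "PIL", "apex", "cv2"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(min_version=(3, 7)):
    """True if the running Python is at least min_version."""
    return sys.version_info >= min_version


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.7+ is required, found", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Because `find_spec` only locates a module without importing it, the check stays fast even when a heavy package such as pytorch is installed.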

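3. Dataset

The training and testing lists are plain-text files with one image per line: the image path, a comma, and the integer class label (0 to 999).

Training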

/data/home/imagenet/train/xxx.jpeg, 0
/data/home/imagenet/train/xxx.jpeg, 1
...
/data/home/imagenet/train/xxx.jpeg, 999

Testing

/data/home/imagenet/test/xxx.jpeg, 0
/data/home/imagenet/test/xxx.jpeg, 1
...
/data/home/imagenet/test/xxx.jpeg, 999
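The list format above can be read with a few lines of Python. This is a minimal sketch, assuming every non-empty line is `path, label` with an integer label after the last comma; the function names are hypothetical and this is not the loader used in `train.py`.

```python
from typing import Iterable, List, Tuple


def parse_annotation_line(line: str) -> Tuple[str, int]:
    """Split one 'path, label' line into (path, integer label)."""
    # Split on the last comma so a path that itself contains a comma still parses.
    path, label = line.rsplit(",", 1)
    return path.strip(), int(label)


def load_annotations(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse every non-empty line of an annotation file."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        samples.append(parse_annotation_line(line))
    return samples


if __name__ == "__main__":
    example = [
        "/data/home/imagenet/train/xxx.jpeg, 0",
        "/data/home/imagenet/train/xxx.jpeg, 999",
    ]
    for path, label in load_annotations(example):
        print(label, path)
```

To read a real list, pass an open file object, for example `load_annotations(open("train.txt"))`.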

4. Training & Inference

Training

CMT-Tiny

#!/bin/bash
# Replace $file_folder$ and $save_folder$ with your own paths.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
cd CMT-pytorch;
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore -m torch.distributed.launch --nproc_per_node 8 train.py \
--batch_size 512 --num_workers 48 --lr 6e-3 --optimizer_name "adamw" --tf_optimizer 1 --cosine 1 --model_name cmtti --max_epochs 300 \
--warmup_epochs 5 --num-classes 1000 --input_size 184 --crop_size 160 --weight_decay 1e-1 --grad_clip 0 --repeated-aug 0 --max_grad_norm 5.0 \
--drop_path_rate 0.1 --FP16 0 --qkv_bias 1 \
--ape 0 --rpe 1 --pe_nd 0 --mode O2 --amp 1 --apex 0 \
--train_file $file_folder$/train.txt \
--val_file $file_folder$/val.txt \
--log-dir $save_folder$/log_dir \
--checkpoints-path $save_folder$/checkpoints

Note: A batch size of 128 * 8 may give higher accuracy; choose the batch size to balance accuracy and speed.
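The training command sets `--lr 6e-3`, `--warmup_epochs 5`, `--cosine 1`, and `--max_epochs 300`. The sketch below shows one common way such settings translate into a per-epoch learning rate: linear warmup followed by cosine decay. It is an assumption about the general shape of the schedule, not the code in `train.py`; the function name `lr_at_epoch`, the per-epoch granularity, and `min_lr=0.0` are all hypothetical.

```python
import math


def lr_at_epoch(epoch, base_lr=6e-3, warmup_epochs=5, max_epochs=300, min_lr=0.0):
    """Learning rate for a 0-indexed epoch: linear warmup, then cosine decay."""
    if epoch < warmup_epochs:
        # Ramp linearly so the last warmup epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup training that has elapsed, in [0, 1].
    progress = (epoch - warmup_epochs) / max(1, max_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 4, 5, 150, 299):
        print(epoch, lr_at_epoch(epoch))
```

Real implementations often update the rate per iteration rather than per epoch, which gives a smoother curve with the same overall shape.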

Inference

#!/bin/bash
cd CMT-pytorch;
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore test.py \
--dist-url 'tcp://127.0.0.1:9966' --dist-backend 'nccl' --multiprocessing-distributed=1 --world-size=1 --rank=0 \
--batch-size 128 --num-workers 48 --num-classes 1000 --input_size 184 --crop_size 160 \
--ape 0 --rpe 1 --pe_nd 0 --qkv_bias 1 --swin 0 --model_name cmtti --dropout 0.1 --emb_dropout 0.1 \
--test_file $file_folder$/val.txt \
--checkpoints-path $save_folder$/checkpoints/xxx.pth.tar \
--save_folder $save_folder$/acc_logits/
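Both commands pass `--input_size 184 --crop_size 160`. A plausible reading, and only an assumption here, is that each image is first resized to 184x184 and then center-cropped to 160x160. The sketch below computes the crop box for that reading; `center_crop_box` is a hypothetical helper, and the returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts.

```python
def center_crop_box(input_size, crop_size):
    """Return (left, upper, right, lower) for a centered square crop."""
    if crop_size > input_size:
        raise ValueError("crop_size must not exceed input_size")
    # Equal margin on each side (rounded down when the difference is odd).
    offset = (input_size - crop_size) // 2
    return (offset, offset, offset + crop_size, offset + crop_size)


if __name__ == "__main__":
    print(center_crop_box(184, 160))
```

With these settings the crop keeps roughly 87% of each side of the resized image.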

Calculate accuracy

python utils/calculate_acc.py --logits_file $save_folder$/acc_logits/
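For reference, top-1 accuracy from saved logits is the fraction of samples whose highest-scoring class matches the label. The sketch below shows that computation on in-memory lists. It does not reproduce `utils/calculate_acc.py` or the on-disk format of the files in `acc_logits/`, neither of which is described here; `top1_accuracy` is a hypothetical name.

```python
def top1_accuracy(logits, labels):
    """Fraction of rows whose arg-max class index equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for row, label in zip(logits, labels):
        # Index of the largest logit in this row.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


if __name__ == "__main__":
    example_logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
    example_labels = [1, 0, 0]
    print(top1_accuracy(example_logits, example_labels))
```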

5. ImageNet Results

model-name	input_size	FLOPs	Params	acc@one_crop(ours)	acc(papers)	weights
CMT-T	160x160	516M	11.3M	75.124%	79.2%	weights
CMT-T	224x224	1.01G	11.3M	78.4%	-	weights
CMT-XS	192x192	-	-	-	81.8%	-
CMT-S	224x224	-	-	-	83.5%	-
CMT-L	256x256	-	-	-	84.5%	-

6. TODO

Other results may come soon if anyone needs them.
Release the CMT-XS results on ImageNet.
Check the differences from the paper; the authors gave the hyperparameters in an issue.
Tune the hyperparameters for CMT or transformers.

Supplementary

If you want to know more, I have written up an explanation of CMT, together with the tuning and training process, here.
