Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

Last update: Dec 02, 2022

Related tags

Overview

MobileViT

RegNet

Model Architecture
Usage
Citation

Model Architecture

MobileViT Architecture

Usage

Training

python main.py

optional arguments:
  -h, --help            show this help message and exit
  --gpu_device GPU_DEVICE
                        Select specific GPU to run the model
  --batch-size N        Input batch size for training (default: 64)
  --epochs N            Number of epochs to train (default: 20)
  --num-class N         Number of classes to classify (default: 10)
  --lr LR               Learning rate (default: 0.01)
  --weight-decay WD     Weight decay (default: 1e-5)
  --model-path PATH     Path to save the model
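The options listed above could be declared with `argparse` roughly as follows. This is a minimal sketch rather than the repository's actual `main.py`: the defaults for batch size, epochs, classes, learning rate, and weight decay come from the help text, while the argument types and the `None` defaults for `--gpu_device` and `--model-path` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options shown in the help text."""
    parser = argparse.ArgumentParser(description="MobileViT training (sketch)")
    # The help text shows no default for --gpu_device or --model-path,
    # so None (and the int type for the GPU index) is an assumption.
    parser.add_argument("--gpu_device", type=int, default=None,
                        help="Select specific GPU to run the model")
    parser.add_argument("--batch-size", type=int, default=64, metavar="N",
                        help="Input batch size for training (default: 64)")
    parser.add_argument("--epochs", type=int, default=20, metavar="N",
                        help="Number of epochs to train (default: 20)")
    parser.add_argument("--num-class", type=int, default=10, metavar="N",
                        help="Number of classes to classify (default: 10)")
    parser.add_argument("--lr", type=float, default=0.01, metavar="LR",
                        help="Learning rate (default: 0.01)")
    parser.add_argument("--weight-decay", type=float, default=1e-5, metavar="WD",
                        help="Weight decay (default: 1e-5)")
    parser.add_argument("--model-path", type=str, default=None, metavar="PATH",
                        help="Path to save the model")
    return parser


# Equivalent to running: python main.py --epochs 50 --lr 0.001
args = build_parser().parse_args(["--epochs", "50", "--lr", "0.001"])
print(args.epochs, args.lr, args.batch_size)
```

Note that `argparse` converts dashes in option names to underscores, so `--batch-size` is read back as `args.batch_size`.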

Citation

@InProceedings{Sachin2021,
  title = {MOBILEVIT: LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TRANSFORMER},
  author = {Sachin Mehta and Mohammad Rastegari},
  booktitle = {},
  year = {2021}
}

If there is any problem with this implementation, please let me know. Thank you.

Comments

Training settings

I really appreciate your effort in implementing this model in PyTorch. I have one concern about the training settings: if I understand correctly, you trained the model for fewer than 5 epochs.

In addition, the hyperparameters you adopted differ from those in the original paper. For instance, the authors train MobileViT with the AdamW optimizer, label-smoothing cross-entropy, and a multi-scale sampler, and the training phase includes a warmup stage.

I also found that the classification accuracy reported here is much lower than in the original paper.

I conjecture that the gap in accuracy is caused by the different training settings.
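To make one of the settings named in this comment concrete, here is a minimal pure-Python sketch of label-smoothing cross-entropy for a single sample. It is not the repository's or the paper's code; the function name and the `smoothing=0.1` default are illustrative assumptions. In PyTorch, the `label_smoothing` argument of `torch.nn.CrossEntropyLoss` should give the same per-sample value.

```python
import math
from typing import Sequence


def label_smoothing_cross_entropy(logits: Sequence[float], target: int,
                                  smoothing: float = 0.1) -> float:
    """Cross-entropy of one sample against a smoothed target distribution."""
    num_classes = len(logits)
    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_sum_exp for x in logits]
    loss = 0.0
    for k, log_p in enumerate(log_probs):
        # Smoothed target: smoothing / K on every class, plus the remaining
        # (1 - smoothing) mass on the true class.
        q = smoothing / num_classes + ((1.0 - smoothing) if k == target else 0.0)
        loss -= q * log_p
    return loss


# A confident, correct prediction is penalised more once smoothing is applied.
plain = label_smoothing_cross_entropy([5.0, 0.0, 0.0], target=0, smoothing=0.0)
smoothed = label_smoothing_cross_entropy([5.0, 0.0, 0.0], target=0, smoothing=0.1)
print(plain, smoothed)
```

With `smoothing=0.0` the function reduces to ordinary cross-entropy, which makes it easy to compare the two settings on the same logits.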

opened by hkzhang91 6

load pretrain weight failed

import torch
import models

model = models.MobileViT_S()
PATH = "./MobileVit-S.pth.tar"
weights = torch.load(PATH, map_location=lambda storage, loc: storage)
model.load_state_dict(weights['state_dict'])
model.eval()
torch.save(model, './model.pt')

I tried to load the pretrained weights to run a demo, but the network structure does not seem to match the weights. Is there any solution?
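When a checkpoint does not seem to match a model, comparing the two sets of parameter names usually shows why. The following is a minimal sketch using plain lists of strings; the helper name `diff_state_dict_keys` and the toy key names are hypothetical and not taken from the repository.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(model_keys: Iterable[str],
                         checkpoint_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (keys missing from the checkpoint, unexpected checkpoint keys)."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return sorted(model_set - ckpt_set), sorted(ckpt_set - model_set)


# Toy key names for illustration only (hypothetical).
model_keys = ["conv1.weight", "fc.weight"]
checkpoint_keys = ["module.conv1.weight", "module.fc.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
```

With the real objects, the inputs would be `model.state_dict().keys()` and `weights['state_dict'].keys()`. If every unexpected key is a missing key with the same prefix attached, as in the toy data, the mismatch is in the names rather than in the architecture.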

opened by hererookie 2

model training hyperparameter

One problem has been bothering me: the learning rate, optimizer, batch size, L2 regularization, label smoothing, and number of epochs are inconsistent with the paper. How should I modify the code?
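One part of such a change is the learning-rate schedule. Below is a minimal sketch, assuming linear warmup followed by cosine decay, which is a common choice but is not confirmed by this page. The function name `lr_at_step` and all numbers are placeholders; the real values should be taken from the paper.

```python
import math


def lr_at_step(step: int, total_steps: int, warmup_steps: int,
               base_lr: float, min_lr: float = 0.0) -> float:
    """Linear warmup from min_lr to base_lr, then cosine decay back to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup phase: interpolate linearly between min_lr and base_lr.
        return min_lr + (base_lr - min_lr) * step / warmup_steps
    # Decay phase: progress runs from 0 (end of warmup) to 1 (end of training).
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Placeholder numbers for illustration only.
schedule = [lr_at_step(s, total_steps=100, warmup_steps=10, base_lr=0.01)
            for s in range(101)]
print(schedule[0], schedule[5], schedule[10], schedule[100])
```

In a PyTorch training loop, a function like this could be divided by `base_lr` and passed to `torch.optim.lr_scheduler.LambdaLR`, which expects a multiplicative factor, with `torch.optim.AdamW` as the optimizer.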

opened by Agino-ltp 1

Have you tested MobileViT on CIFAR-10?

Thanks for your wonderful work!

I plan to try MobileViT on a small dataset such as MNIST, which requires adjusting the network structure. Before doing this, I would like to know whether MobileViT performs better than other networks on small datasets.

I noticed "get_cifar10_dataset" in utils.py. Have you tested MobileViT on CIFAR-10? If so, could you please share the accuracy and inference time?

opened by Jerryme-xxm 1

Issues when loading MobileViT_S()

I wanted to load the MobileViT_S() model with the pretrained weights, but I got some errors in my code. To help others, I will share my solution (in case someone else is a beginner like me):

import torch
from models import MobileViT_S  # the repository's models module

def load_mobilevit_weights(model_path):
    # Create an instance of the MobileViT model
    net = MobileViT_S()

    # Load the checkpoint on the CPU and extract its state_dict
    checkpoint = torch.load(model_path, map_location=torch.device('cpu'))
    state_dict = checkpoint['state_dict']

    # The checkpoint keys carry a 'module.' prefix that the model's own keys lack,
    # so strip that prefix from each key to match the MobileViT architecture
    prefix = 'module.'
    state_dict = {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

    # With the keys fixed, load the parameters into the model
    net.load_state_dict(state_dict)

    return net

net = load_mobilevit_weights("MobileViT_S_model_best.pth.tar")

opened by Sehaba95 4

Releases(weight)

weight(Oct 18, 2021)

https://drive.google.com/file/d/1ZQt1vACHTN98QJYaT2JW3kPF-wziHyPX/view?usp=sharing
Source code(tar.gz)
Source code(zip)
best_coco.pt(20.07 MB)

Owner

Hong-Jia Chen

Master student at National Chung Cheng University, Taiwan. Interested in Deep Learning and Computer Vision.

GitHub Repository

A hand tracking demo made with mediapipe where you can control lights with pinching your fingers and moving your hand up/down.

HandTrackingBrightnessControl A hand tracking demo made with mediapipe where you can control lights with pinching your fingers and moving your hand up

19 Feb 12, 2022

A machine learning malware analysis framework for Android apps.

🕵️ A machine learning malware analysis framework for Android apps. ☢️ DroidDetective is a Python tool for analysing Android applications (APKs) for p

77 Dec 27, 2022

🎃 Core identification module of AI powerful point reading system platform.

ppReader-Kernel Intro Core identification module of AI powerful point reading system platform. Usage Hardware: Windows10, GPU: NVIDIA GTX 1060, ordinary RGB camera. Software: con

1 Jan 11, 2022

Assessing syntactic abilities of BERT

BERT-Syntax Assessing the syntactic abilities of BERT. What Evaluate Google's BERT-Base and BERT-Large models on the syntactic agreement datasets from

147 Aug 02, 2022

The ARCA23K baseline system

ARCA23K Baseline System This is the source code for the baseline system associated with the ARCA23K dataset. Details about ARCA23K and the baseline sy

4 Jul 02, 2022

PyTorch implementations of algorithms for density estimation

pytorch-flows A PyTorch implementations of Masked Autoregressive Flow and some other invertible transformations from Glow: Generative Flow with Invert

546 Dec 05, 2022

ALBERT-pytorch-implementation - ALBERT pytorch implementation

ALBERT-pytorch-implementation developing... An implementation intended to help in understanding the concepts of the model; variable names are currently written out in detail, and

3 Oct 06, 2022

CN24 is a complete semantic segmentation framework using fully convolutional networks

Build status: master (production branch): develop (development branch): Welcome to the CN24 GitHub repository! CN24 is a complete semantic segmentatio

123 Jul 14, 2022

[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

CPT: Efficient Deep Neural Network Training via Cyclic Precision Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin Accep

26 Oct 25, 2022

🛰️ List of earth observation companies and job sites

[CVPR'21] Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration

Cockpit is a visual and statistical debugger specifically designed for deep learning.

The first public PyTorch implementation of Attentive Recurrent Comparators

Unofficial pytorch implementation of 'Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization'

Jetson Nano-based smart camera system that measures crowd face mask usage in real-time.

This is an implementation of Googles Yogi-Optimizer in Keras (tf.keras)

TensorLight - A high-level framework for TensorFlow

I3-master-layout - Simple master and stack layout script

This is the code for ACL2021 paper A Unified Generative Framework for Aspect-Based Sentiment Analysis

vit for few-shot classification