Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

The official implementation of Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Introduction

TL;DR Arch-Net is a family of neural networks made up of simple and efficient operators. When an Arch-Net is produced, less common network constructs, such as Layer Normalization and Embedding Layers, are eliminated progressively through label-free Blockwise Model Distillation, while sub-eight-bit quantization is performed at the same time to maximize performance. For the classification task, only 30k unlabeled images randomly sampled from the ImageNet dataset are needed.
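To make the idea of label-free blockwise distillation concrete, here is a toy sketch in plain Python. It is not the training code of this repository: the scalar "blocks", the `distill_block` helper, and the choice to feed each student block the teacher's activations are all assumptions made for illustration. The only point is that each student block is fitted to reproduce a teacher block's output on unlabeled inputs, so no ground-truth labels are needed.

```python
# Toy illustration only: each "block" is a scalar function y = w * x + b.
# Names (distill_block, teacher_blocks, unlabeled) are hypothetical.

def distill_block(teacher_block, inputs, lr=0.05, steps=500):
    """Fit a student block y = w*x + b to a teacher block's outputs (MSE loss)."""
    targets = [teacher_block(x) for x in inputs]  # teacher outputs, not labels
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        preds = [w * x + b for x in inputs]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (p - t) * x for p, t, x in zip(preds, targets, inputs)) / n
        grad_b = sum(2 * (p - t) for p, t in zip(preds, targets)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two teacher blocks applied in sequence, and a handful of unlabeled inputs.
teacher_blocks = [lambda x: 2.0 * x + 1.0, lambda x: -0.5 * x + 3.0]
unlabeled = [-1.0, -0.5, 0.0, 0.5, 1.0]

student = []
block_inputs = unlabeled
for teacher_block in teacher_blocks:
    # Distill one block at a time.
    student.append(distill_block(teacher_block, block_inputs))
    # Assumption: the next block is trained on the teacher's activations.
    block_inputs = [teacher_block(x) for x in block_inputs]

for i, (w, b) in enumerate(student):
    print(f"student block {i}: w={w:.3f}, b={b:.3f}")
```

In a real Arch-Net the student block would use a different, simpler set of operators than the teacher block and would be quantized; the toy keeps both sides linear so that the fit can be checked by eye.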

Main Results

ImageNet Classification

Model                   Bit Width   Top-1 (%)   Top-5 (%)
Arch-Net_Resnet18       32w32a      69.76       89.08
Arch-Net_Resnet18       2w4a        68.77       88.66
Arch-Net_Resnet34       32w32a      73.30       91.42
Arch-Net_Resnet34       2w4a        72.40       91.01
Arch-Net_Resnet50       32w32a      76.13       92.86
Arch-Net_Resnet50       2w4a        74.56       92.39
Arch-Net_MobilenetV1    32w32a      68.79       88.68
Arch-Net_MobilenetV1    2w4a        67.29       88.07
Arch-Net_MobilenetV2    32w32a      71.88       90.29
Arch-Net_MobilenetV2    2w4a        69.09       89.13
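The bit-width tags read as "weight bits, activation bits": 2w4a corresponds to the --weight-bit 2 --feature-bit 4 flags used in the commands further down. The sketch below parses such a tag and shows what rounding onto a 2-bit or 4-bit grid does to a few numbers. It is a minimal illustration with a fixed, hand-picked step size; `parse_bit_width` and `fake_quantize` are hypothetical helpers, not functions from this repository, and treating activations as unsigned is an assumption.

```python
import re

def parse_bit_width(tag):
    """Split a tag such as '2w4a' into (weight_bits, activation_bits)."""
    match = re.fullmatch(r"(\d+)w(\d+)a", tag)
    if match is None:
        raise ValueError(f"unrecognised bit-width tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))

def fake_quantize(values, bits, step, signed=True):
    """Round values to a uniform grid with 2**bits levels, then map back to floats."""
    if signed:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bits - 1
    out = []
    for v in values:
        q = round(v / step)          # nearest integer level
        q = min(max(q, qmin), qmax)  # clamp to the representable range
        out.append(q * step)
    return out

weight_bits, act_bits = parse_bit_width("2w4a")

weights = [-0.9, -0.3, 0.05, 0.4, 1.2]
acts = [0.0, 0.26, 1.0, 9.0]

# Step sizes are picked by hand here; they are not taken from the repository.
print(fake_quantize(weights, weight_bits, step=0.5))            # [-1.0, -0.5, 0.0, 0.5, 0.5]
print(fake_quantize(acts, act_bits, step=0.25, signed=False))   # [0.0, 0.25, 1.0, 3.75]
```

With 2 bits only four weight values are representable, which is why 1.2 is clipped to 0.5 in the first output line.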

Multi30k Machine Translation

Model         Translation Direction   Bit Width   BLEU
Transformer   English to German       32w32a      32.44
Transformer   English to German       2w4a        33.75
Transformer   English to German       4w4a        34.35
Transformer   English to German       8w8a        36.44
Transformer   German to English       32w32a      30.32
Transformer   German to English       2w4a        32.50
Transformer   German to English       4w4a        34.34
Transformer   German to English       8w8a        34.05

Dependencies

python == 3.6

see requirements.txt for the full list of packages

Data Preparation

Download the ImageNet and Multi30k data (Google Drive or BaiduYun, code: 8brd) and put them in ./arch-net/data/ as follows:

./data/
├── imagenet
│   ├── train
│   ├── val
├── multi30k

Download the teacher models from Google Drive or BaiduYun (code: 57ew) and put them in ./arch-net/models/teacher/pretrained_models/
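Before starting a long training run it can help to confirm that the folders are where the scripts expect them. The following is a small optional check, assuming the layout described above; the `missing_dirs` helper is hypothetical and is not shipped with this repository.

```python
from pathlib import Path

# Directories named in the Data Preparation section, relative to the repo root.
EXPECTED_DIRS = (
    "data/imagenet/train",
    "data/imagenet/val",
    "data/multi30k",
    "models/teacher/pretrained_models",
)

def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs("./arch-net")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Data and teacher-model directories look complete.")
```

The check only looks at directory names; it does not verify that the downloaded files themselves are complete.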

Get Started

ImageNet Classification (take archnet_resnet18 as an example)

train and evaluate

cd ./train_imagenet

python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py  -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn

evaluate only, if you already have the trained models

python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py  -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn --evaluate

Machine Translation

train an arch-net_transformer of 2w4a

cd ./train_transformer

python3 train_archnet_transformer.py --translate_direction en2de --teacher_model_path ../models/teacher/pretrained_models/transformer_en_de.chkpt --data_pkl ../data/multi30k/m30k_ende_shr.pkl --batch_size 48 --final_epochs 50 --weight_bit 2 --feature_bit 4 --lr 1e-3 --weight_decay 1e-6 --label_smoothing
  • for arch-net_transformer of 8w8a, use the lr of 1e-3 and the weight decay of 1e-4
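If you switch between bit widths often, the per-setting hyperparameters can be kept in one place. The sketch below assembles the training command from the two settings documented above. It assumes that every flag other than --lr and --weight_decay stays the same for 8w8a, which this README does not state explicitly; `HYPERPARAMS` and `build_train_command` are hypothetical names.

```python
# Hyperparameters taken from the README: 2w4a and 8w8a only.
HYPERPARAMS = {
    (2, 4): {"lr": "1e-3", "weight_decay": "1e-6"},
    (8, 8): {"lr": "1e-3", "weight_decay": "1e-4"},
}

def build_train_command(weight_bit, feature_bit):
    """Return the argv list for train_archnet_transformer.py (en2de)."""
    key = (weight_bit, feature_bit)
    if key not in HYPERPARAMS:
        # No documented values for this setting, so refuse to guess.
        raise ValueError(f"no documented hyperparameters for {weight_bit}w{feature_bit}a")
    hp = HYPERPARAMS[key]
    return [
        "python3", "train_archnet_transformer.py",
        "--translate_direction", "en2de",
        "--teacher_model_path", "../models/teacher/pretrained_models/transformer_en_de.chkpt",
        "--data_pkl", "../data/multi30k/m30k_ende_shr.pkl",
        "--batch_size", "48",
        "--final_epochs", "50",
        "--weight_bit", str(weight_bit),
        "--feature_bit", str(feature_bit),
        "--lr", hp["lr"],
        "--weight_decay", hp["weight_decay"],
        "--label_smoothing",
    ]

# Print the 8w8a command so it can be pasted into a shell in ./train_transformer.
print(" ".join(build_train_command(8, 8)))
```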

evaluate

cd ./evaluate

python3 translate.py --data_pkl ./data/multi30k/m30k_ende_shr.pkl --model path_to_the_output_directory/model_max_acc.chkpt
  • to get the BLEU score of the evaluated results, go to this website and upload 'predictions.txt' from the output directory together with 'gt_en.txt' or 'gt_de.txt' from ./arch-net/data_gt/multi30k/
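For a quick offline sanity check, BLEU can also be approximated locally. The sketch below is a plain corpus-level BLEU (up to 4-grams, one reference per sentence, whitespace tokenization, no smoothing). It is not the scorer used by the website, so the numbers may differ from those in the table above; `corpus_bleu` and `bleu_from_files` are hypothetical helpers, and the one-sentence-per-line file format is an assumption.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in [0, 100] with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero n-gram precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)

def bleu_from_files(pred_path, gt_path):
    """Score a predictions file against a ground-truth file, one sentence per line."""
    with open(pred_path, encoding="utf-8") as f:
        hyps = [line.strip() for line in f]
    with open(gt_path, encoding="utf-8") as f:
        refs = [line.strip() for line in f]
    return corpus_bleu(hyps, refs)
```

For example, `bleu_from_files("predictions.txt", "gt_de.txt")` would score an English-to-German run, provided both files have the same number of lines in the same order.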

Citation

If you find this project useful for your research, please consider citing the paper.

@misc{xu2021archnet,
      title={Arch-Net: Model Distillation for Architecture Agnostic Model Deployment}, 
      author={Weixin Xu and Zipeng Feng and Shuangkang Fang and Song Yuan and Yi Yang and Shuchang Zhou},
      year={2021},
      eprint={2111.01135},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgements

attention-is-all-you-need-pytorch

LSQuantization

pytorch-mobilenet-v1

Contact

If you have any questions, feel free to open an issue or contact us at [email protected].

Owner
MEGVII Research