Codes of the paper Deformable Butterfly: A Highly Structured and Sparse Linear Transform.

Overview

Deformable Butterfly: A Highly Structured and Sparse Linear Transform

DeBut

Advantages

  • DeBut generalizes the square power of two butterfly factor matrices, which allows learnable factorized linear transform with strutured sparsity and flexible input-output size.
  • The intermediate matrix dimensions in a DeBut chain can either shrink or grow to permit a variable tradeoff between number of parameters and representation power.

Running Codes

Our codes include two parts, namely: 1) ALS initialization for layers in the pretrained model and 2) fine-tuning the compressed modelwith DeBut layers. To make it easier to verify the experimental results, we provide the running commands and the corresponding script files, which allow the readers to reproduce the results displayed in the tables.

We test our codes on Pytorch 1.2 (cuda 11.2). To install DeBut, run:

git clone https://github.com/RuiLin0212/DeBut.git
pip install -r requirements.txt

Alternative Initialization (ALS)

This part of the codes aims to:

  • Verify whether the given chain is able to generate a dense matrix at the end.
  • Initialize the DeBut factors of a selected layer in the given pretrained model.

Besides, as anextension, ALS initialization can be used to approxiamte any matrix, not necessarily a certain layer of a pretrained model.

Bipolar Test

python chain_test.py \
--sup [superscript of the chain] \
--sub [subscript of the chain] \
--log_path [directory where the summaries will be stored]

We offer an example to check a chain designed for a matrix of size [512, 4608], run:

sh ./script/bipolar_test.sh

Layer Initialization

python main.py
--type_init ALS3 \
--sup [superscript of the chain] \
--sub [subscript of the chain] \
--iter [number of iterations, and 2 iterations are equal to 1 sweep] \
--model [name of the model] \
--layer_name [name of the layer that will be substituted by DeBut factors] \
--pth_path [path of the pretrained model] \
--log_path [directory where the summaries will be stored] \
--layer_type [type of the selected layer, fc or conv] \
--gpu [index of the GPU that will be used]

For LeNet, VGG-16-BN, and ResNet-50, we provide an example of one layer for each neural network, respectively, run:

sh ./script/init_lenet.sh \ # FC1 layer in the modified LeNet
sh ./script/init_vgg.sh \ # CONV1 layer in VGG-16-BN
sh ./script/init_resnet.sh # layer4.1.conv1 in ResNet-50

Matrix Approximation

python main.py \
--type_init ALS3 \
--sup [superscript of the chain] \
--sub [subscript of the chain] \
--iter [number of iterations, and 2 iterations are equal to 1 sweep] \
--F_path [path of the matrix that needs to be approximated] \
--log_path [directory where the summaries will be stored] \
--gpu [index of the GPU that will be used]

We generate a random matrix of size [512, 2048], to approximate this matrix, run:

sh ./script/init_matrix.sh 

Fine-tuning

After using ALS initialization to get the well-initialized DeBut factors of the selected layers, we aim at fine-tuning the compressed models with DeBut layers in the second stage. In the following, we display the commands we use for [email protected], [email protected], and [email protected], respectively. Besides, we give the scripts, which can run to reproduce our experimental results. It is worth noting that there are several important arguments related to the DeBut chains and initialized DeBut factors in the commands:

  • r_shape_txt: The path to .txt files, which describe the shapes of the factors in the given monotonic or bulging DeBut chains
  • debut_layers: The name of the selected layers, which will be substituted by the DeBut factors.
  • DeBut_init_dir: The directory of the well-initialized DeBut factors.

MNIST & CIFAR-10

For dataset MNIST and CIFAR-10, we train our models using the following commands.

python train.py \
–-log_dir [directory of the saved logs and models] \
–-data_dir [directory to training data] \
–-r_shape_txt [path to txt files for shapes of the chain] \
–-dataset [MNIST/CIFAR10] \
–-debut_layers [layers which use DeBut] \
–-arch [LeNet_DeBut/VGG_DeBut] \
–-use_pretrain [whether to use the pretrained model] \
–-pretrained_file [path to the pretrained checkpoint file] \
–-use_ALS [whether to use ALS as the initialization method] \
–-DeBut_init_dir [directory of the saved ALS files] \
–-batch_size [training batch] \
–-epochs [training epochs] \
–-learning_rate [training learning rate] \
–-lr_decay_step [learning rate decay step] \
–-momentum [SGD momentum] \
–-weight_decay [weight decay] \
–-gpu [index of the GPU that will be used]

ImageNet

For ImageNet, we use commands as below:

python train_imagenet.py \
-–log_dir [directory of the saved logs and models] \
–-data_dir [directory to training data] \
–-r_shape_txt [path to txt files for shapes of the chain] \
–-arch resnet50 \
–-pretrained_file [path to the pretrained checkpoint file] \
–-use_ALS [whether to use ALS as the initialization method] \
–-DeBut_init_dir [directory of the saved ALS files] \
–-batch_size [training batch] \
–-epochs [training epochs] \
–-learning_rate [training learning rate] \
–-momentum [SGD momentum] \
–-weight_decay [weight decay] \
–-label_smooth [label smoothing] \
–-gpu [index of the GPU that will be used]

Scripts

We also provide some examples of replacing layers in each neural network, run:

sh ./bash_files/train_lenet.sh n # Use DeBut layers in the modified LeNet
sh ./bash_files/train_vgg.sh n # Use DeBut layers in VGG-16-BN
553 sh ./bash_files/train_imagenet.sh n # Use DeBut layers in ResNet-50

Experimental Results

Architecture

We display the structures of the modified LeNet and VGG-16 we used in our experiments. Left: The modified LeNet with a baseline accuracy of 99.29% on MNIST. Right: VGG-16-BN with a baseline accuracy of 93.96% on CIFAR-10. In both networks, the activation, max pooling and batch normalization layers are not shown for brevity.

LeNet Trained on MNIST

DeBut substitution of single and multiple layers in the modified LeNet. LC and MC stand for layer-wise compression and model-wise compression, respectively, whereas "Params" means the total number of parameters in the whole network. These notations apply to subsequent tables.

VGG Trained on CIFAR-10

DeBut substitution of single and multiple layers in VGG-16-BN.

ResNet-50 Trained on ImageNet

Results of ResNet-50 on ImageNet. DeBut chains are used to substitute the CONV layers in the last three bottleneck blocks.

Comparison

LeNet on MNIST

VGG-16-BN on CIFAR-10

Appendix

For more experimental details please check Appendix.

License

DeBut is released under MIT License.

Owner
Rui LIN
Rui LIN
Official implementation of "An Image is Worth 16x16 Words, What is a Video Worth?" (2021 paper)

An Image is Worth 16x16 Words, What is a Video Worth? paper Official PyTorch Implementation Gilad Sharir, Asaf Noy, Lihi Zelnik-Manor DAMO Academy, Al

213 Nov 12, 2022
MLOps will help you to understand how to build a Continuous Integration and Continuous Delivery pipeline for an ML/AI project.

page_type languages products description sample python azure azure-machine-learning-service azure-devops Code which demonstrates how to set up and ope

1 Nov 01, 2021
A simple log parser and summariser for IIS web server logs

IISLogFileParser A basic parser tool for IIS Logs which summarises findings from the log file. Inspired by the Gist https://gist.github.com/wh13371/e7

2 Mar 26, 2022
DAT4 - General Assembly's Data Science course in Washington, DC

DAT4 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (12/15/14 - 3/16/15). Instructors: Sinan Ozdemir

Kevin Markham 779 Dec 25, 2022
Latte: Cross-framework Python Package for Evaluation of Latent-based Generative Models

Cross-framework Python Package for Evaluation of Latent-based Generative Models Latte Latte (for LATent Tensor Evaluation) is a cross-framework Python

Karn Watcharasupat 30 Sep 08, 2022
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers Results results on COCO val Backbone Method Lr Schd PQ Config Download

155 Dec 20, 2022
6D Grasping Policy for Point Clouds

GA-DDPG [website, paper] Installation git clone https://github.com/liruiw/GA-DDPG.git --recursive Setup: Ubuntu 16.04 or above, CUDA 10.0 or above, py

Lirui Wang 48 Dec 21, 2022
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

4.9k Dec 31, 2022
Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.

FENSE The metric, Fluency ENhanced Sentence-bert Evaluation (FENSE), for audio caption evaluation, proposed in the paper "Can Audio Captions Be Evalua

Zhiling Zhang 13 Dec 23, 2022
P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks

P-tuning v2 P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks An optimized prompt tuning strategy achievi

THUDM 540 Dec 30, 2022
The codes of paper 'Active-LATHE: An Active Learning Algorithm for Boosting the Error exponent for Learning Homogeneous Ising Trees'

Active-LATHE: An Active Learning Algorithm for Boosting the Error exponent for Learning Homogeneous Ising Trees This project contains the codes of pap

0 Apr 20, 2022
Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

alias-free-gan-pytorch Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) This implementation

Kim Seonghyeon 502 Jan 03, 2023
PyTorch implementation for our paper "Deep Facial Synthesis: A New Challenge"

FSGAN Here is the official PyTorch implementation for our paper "Deep Facial Synthesis: A New Challenge". This project achieve the translation between

Deng-Ping Fan 32 Oct 10, 2022
PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"

Improving Visual-Semantic Embeddings with Hard Negatives Code for the image-caption retrieval methods from VSE++: Improving Visual-Semantic Embeddings

Fartash Faghri 441 Dec 05, 2022
Code for reproducing experiments in "Improved Training of Wasserstein GANs"

Improved Training of Wasserstein GANs Code for reproducing experiments in "Improved Training of Wasserstein GANs". Prerequisites Python, NumPy, Tensor

Ishaan Gulrajani 2.2k Jan 01, 2023
This repo is customed for VisDrone.

Object Detection for VisDrone(无人机航拍图像目标检测) My environment 1、Windows10 (Linux available) 2、tensorflow = 1.12.0 3、python3.6 (anaconda) 4、cv2 5、ensemble

53 Jul 17, 2022
Official PyTorch implementation of the Fishr regularization for out-of-distribution generalization

Fishr: Invariant Gradient Variances for Out-of-distribution Generalization Official PyTorch implementation of the Fishr regularization for out-of-dist

62 Dec 22, 2022
Does Pretraining for Summarization Reuqire Knowledge Transfer?

Pretraining summarization models using a corpus of nonsense

Approximately Correct Machine Intelligence (ACMI) Lab 12 Dec 19, 2022
This repository is the official implementation of Open Rule Induction. This paper has been accepted to NeurIPS 2021.

Open Rule Induction This repository is the official implementation of Open Rule Induction. This paper has been accepted to NeurIPS 2021. Abstract Rule

Xingran Chen 16 Nov 14, 2022
Classical OCR DCNN reproduction based on PaddlePaddle framework.

Paddle-SVHN Classical OCR DCNN reproduction based on PaddlePaddle framework. This project reproduces Multi-digit Number Recognition from Street View I

1 Nov 12, 2021