Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm

Overview

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm

This is the Pytorch implementation for sparse progressive distillation (SPD). For more details about the motivation, techniques and experimental results, refer to our paper here.

Running

  • Environment Preparation (using python3)

    pip install -r requirements.txt
  • Dataset Preparation

    The original GLUE dataset could be downloaded here.

BERT_base fine-tuning on GLUE

We use finetuned BERT_base as the teacher. For each task of GLUE benchmark, we obtain the finetuned model using the original huggingface transformers code with the following script.

python run_glue.py \
          --model_name_or_path $INT_DIR \
          --task_name $TASK_NAME \
          --do_train \
          --do_eval \
          --data_dir $GLUE_DIR/$TASK_NAME/ \
          --max_seq_length 128 \
          --per_gpu_train_batch_size 32 \
          --per_gpu_eval_batch_size 32 \
          --learning_rate 3e-5 \
          --num_train_epochs 4.0 \
          --output_dir $OUT_DIR \
          --evaluate_during_training \
          --overwrite_output_dir \
          --logging_steps 400 \
          --logging_dir $OUT_DIR \
          --save_steps 10000

Sparse Progressive Distillation

We use run_glue.py to run the sparse progressive distillation. --num_prune_epochs is the epochs for pruning. --num_train_epochs is the total number of epochs (pruning, progressive distillation, finetuning).

python run_glue.py \
  --model_name_or_path PATH_TO_FINETUNED_MODEL \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $GLUE_DIR/$TASK_NAME/ \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 6.4e-4 \
  --save_steps 50 \
  --num_prune_epochs 30 \
  --num_train_epochs 60 \
  --sparsity 0.9 \
  --output_dir $OUT_DIR \
  --evaluate_during_training \
  --replacing_rate 0.8 \
  --overwrite_output_dir \
  --steps_for_replacing 0 \
  --scheduler_type linear

To Dos

  • Provide our teacher model for each task.

  • Provide best performed model checkpoint for each task.

Ludwig is a toolbox that allows to train and evaluate deep learning models without the need to write code.

Translated in 🇰🇷 Korean/ Ludwig is a toolbox that allows users to train and test deep learning models without the need to write code. It is built on

Ludwig 8.7k Dec 31, 2022
Unicorn can be used for performance analyses of highly configurable systems with causal reasoning

Unicorn can be used for performance analyses of highly configurable systems with causal reasoning. Users or developers can query Unicorn for a performance task.

AISys Lab 27 Jan 05, 2023
The Python3 import playground

The Python3 import playground I have been confused about python modules and packages, this text tries to clear the topic up a bit. Sources: https://ch

Michael Moser 5 Feb 22, 2022
Data visualization app for H&M competition in kaggle

handm_data_visualize_app Data visualization app by streamlit for H&M competition in kaggle. competition page: https://www.kaggle.com/competitions/h-an

Kyohei Uto 12 Apr 30, 2022
GAN-STEM-Conv2MultiSlice - Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation

GAN-STEM-Conv2MultiSlice GAN method to help covert lower resolution STEM images generated by convolution methods to higher resolution STEM images gene

UW-Madison Computational Materials Group 2 Feb 10, 2021
Source code for paper "Deep Diffusion Models for Robust Channel Estimation", TBA.

diffusion-channels Source code for paper "Deep Diffusion Models for Robust Channel Estimation". Generic flow: Use 'matlab/main.mat' to generate traini

The University of Texas Computational Sensing and Imaging Lab 15 Dec 22, 2022
CvT2DistilGPT2 is an encoder-to-decoder model that was developed for chest X-ray report generation.

CvT2DistilGPT2 Improving Chest X-Ray Report Generation by Leveraging Warm-Starting This repository houses the implementation of CvT2DistilGPT2 from [1

The Australian e-Health Research Centre 21 Dec 28, 2022
PyTorch META-DATASET (Few-shot classification benchmark)

PyTorch META-DATASET (Few-shot classification benchmark) This repo contains a PyTorch implementation of meta-dataset and a unified implementation of s

Malik Boudiaf 39 Oct 31, 2022
Hyperbolic Procrustes Analysis Using Riemannian Geometry

Hyperbolic Procrustes Analysis Using Riemannian Geometry The code in this repository creates the figures presented in this article: Please notice that

Ronen Talmon's Lab 2 Jan 08, 2023
NasirKhusraw - The TSP solved using genetic algorithm and show TSP path overlaid on a map of the Iran provinces & their capitals.

Nasir Khusraw : Travelling Salesman Problem The TSP solved using genetic algorithm. This project show TSP path overlaid on a map of the Iran provinces

J Brave 2 Sep 01, 2022
Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

alias-free-gan-pytorch Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) This implementation

Kim Seonghyeon 502 Jan 03, 2023
BARF: Bundle-Adjusting Neural Radiance Fields 🤮 (ICCV 2021 oral)

BARF 🤮 : Bundle-Adjusting Neural Radiance Fields Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey IEEE International Conference on Comp

Chen-Hsuan Lin 539 Dec 28, 2022
💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes Official PyTorch implementation and EmoCause evaluatio

Hyunwoo Kim 51 Jan 06, 2023
HyperaPy: An automatic hyperparameter optimization framework ⚡🚀

hyperpy HyperPy: An automatic hyperparameter optimization framework Description HyperPy: Library for automatic hyperparameter optimization. Build on t

Sergio Mora 7 Sep 06, 2022
PyGAD, a Python 3 library for building the genetic algorithm and training machine learning algorithms (Keras & PyTorch).

PyGAD: Genetic Algorithm in Python PyGAD is an open-source easy-to-use Python 3 library for building the genetic algorithm and optimizing machine lear

Ahmed Gad 1.1k Dec 26, 2022
An unopinionated replacement for PyTorch's Dataset and ImageFolder, that handles Tar archives

Simple Tar Dataset An unopinionated replacement for PyTorch's Dataset and ImageFolder classes, for datasets stored as uncompressed Tar archives. Just

Joao Henriques 47 Dec 20, 2022
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

ROSITA News & Updates (24/08/2021) Release the demo to perform fine-grained semantic alignments using the pretrained ROSITA model. (15/08/2021) Releas

Vision and Language Group@ MIL 48 Dec 23, 2022
Recurrent Neural Network Tutorial, Part 2 - Implementing a RNN in Python and Theano

Please read the blog post that goes with this code! Jupyter Notebook Setup System Requirements: Python, pip (Optional) virtualenv To start the Jupyter

Denny Britz 863 Dec 15, 2022
Official implementation of NeurIPS'2021 paper TransformerFusion

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers Project Page | Paper | Video TransformerFusion: Monocular RGB Scene Reconstru

Aljaz Bozic 118 Dec 25, 2022
Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

PLBART Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021. Note. A detailed documentat

Wasi Ahmad 138 Dec 30, 2022