Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm

Last update: Dec 05, 2022

Overview

This is the PyTorch implementation of sparse progressive distillation (SPD). For details on the motivation, techniques, and experimental results, refer to our paper here.

Running

Environment Preparation (using python3)
```
pip install -r requirements.txt
```
Dataset Preparation

The original GLUE dataset can be downloaded here.

BERT_base fine-tuning on GLUE

We use a finetuned BERT_base as the teacher. For each GLUE task, we obtain the finetuned model with the original Hugging Face Transformers code, using the following script.

```
python run_glue.py \
  --model_name_or_path $INT_DIR \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR/$TASK_NAME/ \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 3e-5 \
  --num_train_epochs 4.0 \
  --output_dir $OUT_DIR \
  --evaluate_during_training \
  --overwrite_output_dir \
  --logging_steps 400 \
  --logging_dir $OUT_DIR \
  --save_steps 10000
```
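
Since a teacher is needed for each task, it may help to generate the command above once per task. The following is a minimal dry-run sketch, not part of this repository: `teacher_command` is a hypothetical helper, the `path/to/...` values are placeholders, the two task names are only examples, and the per-task output subdirectory is an assumption (the command above writes directly to `$OUT_DIR`). It only prints the commands; it does not launch training.

```python
import shlex


def teacher_command(task_name, init_dir, glue_dir, out_dir):
    """Build the teacher fine-tuning command for one GLUE task (dry run only)."""
    # Assumption: each task gets its own output subdirectory.
    task_out = f"{out_dir}/{task_name}"
    return [
        "python", "run_glue.py",
        "--model_name_or_path", init_dir,
        "--task_name", task_name,
        "--do_train",
        "--do_eval",
        "--data_dir", f"{glue_dir}/{task_name}/",
        "--max_seq_length", "128",
        "--per_gpu_train_batch_size", "32",
        "--per_gpu_eval_batch_size", "32",
        "--learning_rate", "3e-5",
        "--num_train_epochs", "4.0",
        "--output_dir", task_out,
        "--evaluate_during_training",
        "--overwrite_output_dir",
        "--logging_steps", "400",
        "--logging_dir", task_out,
        "--save_steps", "10000",
    ]


# Placeholder paths and example task names; replace with your own.
commands = {
    task: teacher_command(task, "path/to/init_model", "path/to/glue", "path/to/output")
    for task in ("MRPC", "RTE")
}

for task, args in commands.items():
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(args))
```

Each printed line can be copied into a shell, or the argument list can be passed to `subprocess.run` once the paths point at real directories.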

Sparse Progressive Distillation

We use run_glue.py to run sparse progressive distillation. --num_prune_epochs sets the number of pruning epochs. --num_train_epochs sets the total number of epochs across all three stages (pruning, progressive distillation, and finetuning).

```
python run_glue.py \
  --model_name_or_path PATH_TO_FINETUNED_MODEL \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $GLUE_DIR/$TASK_NAME/ \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 6.4e-4 \
  --save_steps 50 \
  --num_prune_epochs 30 \
  --num_train_epochs 60 \
  --sparsity 0.9 \
  --output_dir $OUT_DIR \
  --evaluate_during_training \
  --replacing_rate 0.8 \
  --overwrite_output_dir \
  --steps_for_replacing 0 \
  --scheduler_type linear
```
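
To make the relationship between the epoch flags concrete, here is a small sketch of the arithmetic. Both helpers are hypothetical and not part of this repository. `epoch_budget` only relies on the statement that `--num_train_epochs` counts all stages; how the non-pruning epochs are divided between progressive distillation and finetuning is not specified here. `kept_weights` assumes that `--sparsity` means the fraction of weights set to zero, which is the usual convention but is an assumption about this code.

```python
def epoch_budget(num_train_epochs: int, num_prune_epochs: int) -> dict:
    """Split the total epoch count into pruning and post-pruning epochs."""
    if not 0 <= num_prune_epochs <= num_train_epochs:
        raise ValueError("num_prune_epochs must lie between 0 and num_train_epochs")
    return {
        "pruning": num_prune_epochs,
        # Shared by progressive distillation and finetuning; the split
        # between those two stages is not specified in this README.
        "distillation_and_finetuning": num_train_epochs - num_prune_epochs,
    }


def kept_weights(num_weights: int, sparsity: float) -> int:
    """Weights left non-zero, ASSUMING sparsity is the fraction of zeroed weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    return round(num_weights * (1.0 - sparsity))


# Values taken from the command above; 1000 is an arbitrary example size.
budget = epoch_budget(num_train_epochs=60, num_prune_epochs=30)
remaining = kept_weights(num_weights=1000, sparsity=0.9)
print(budget)
print(remaining)
```

With the settings in the command above, 30 of the 60 epochs go to pruning and the other 30 cover progressive distillation and finetuning together.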

To Dos

Provide our teacher model for each task.
Provide the best-performing model checkpoint for each task.
