Explore extreme compression for pre-trained language models

Last update: Nov 14, 2022

Related tags

Deep Learning Xcompression

Overview

Code for the paper "Exploring extreme parameter compression for pre-trained language models" (ICLR 2022).

Before Training

Install required libraries

pip install tensorly==0.5.0

PyTorch is required; versions 1.0 to 1.4 are preferred.

Install Horovod for distributed training

Configuration: see the guide "Install horovod on GPU".

pip install horovod[pytorch]

Load pre-trained models

wget https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin -P models/bert-base-uncased
wget https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt -P models/bert-base-uncased
cp models/bert-base-uncased/pytorch_model.bin models/bert-td-72-384/pytorch_model.bin
cp models/bert-base-uncased/vocab.txt models/bert-td-72-384/vocab.txt
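
The two cp commands above fail if the student directory does not exist yet. As a minimal sketch of the same copy step in Python, the helper below creates the target directory first and checks that each source file is present. The function name seed_student_dir and its parameters are hypothetical and are not part of this repository.

```python
import shutil
from pathlib import Path


def seed_student_dir(base_dir, student_dir,
                     filenames=("pytorch_model.bin", "vocab.txt")):
    """Copy the listed files from base_dir into student_dir.

    Hypothetical helper: mirrors the two `cp` commands, but creates
    student_dir (and any missing parents) before copying.
    """
    base_dir, student_dir = Path(base_dir), Path(student_dir)
    student_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in filenames:
        src = base_dir / name
        if not src.is_file():
            # Fail early with a clear message instead of a partial copy.
            raise FileNotFoundError(f"missing source file: {src}")
        dst = student_dir / name
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied
```

Assuming the directory layout used in the commands above, a call would look like seed_student_dir("models/bert-base-uncased", "models/bert-td-72-384").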

Generate training data for given corpora (e.g., saved in the path "corpora")

python pregenerate_training_data.py --train_corpus ${CORPUS_RAW} \
                  --bert_model ${BERT_BASE_DIR} \
                  --reduce_memory --do_lower_case \
                  --epochs_to_generate 3 \
                  --output_dir ${CORPUS_JSON_DIR}

Task data augmentation

python data_augmentation.py --pretrained_bert_model ${BERT_BASE_DIR} \
                            --glove_embs ${GLOVE_EMB} \
                            --glue_dir ${GLUE_DIR} \
                            --task_name ${TASK_NAME}

Decomposing BERT

Decomposition and general distillation

Run with Horovod

mpirun -np 8 -bind-to none -map-by slot \
       -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
       -mca pml ob1 -mca btl ^openib \
       python3 general_distill.py --teacher_model models/bert-base-uncased \
                  --student_model models/bert-gd-72-384 \
                  --pregenerated_data data/pregenerated_data \
                  --num_train_epochs 2.0 \
                  --train_batch_size 32 \
                  --output_dir output/bert-gd-72-384 \
                  -use_swap --do_lower_case

To restrict sharing among SAN or FFN, add an "ops" entry to bert-gd-72-384/config.json and set it to "san" or "ffn", for example:

"ops": "san"
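
The config change above can also be scripted. The following is a minimal sketch that assumes config.json is a flat JSON object; the helper name set_sharing_ops is hypothetical and does not come from this repository.

```python
import json
from pathlib import Path


def set_sharing_ops(config_path, ops):
    """Add or overwrite the "ops" key in a JSON config file.

    Hypothetical helper: only "san" and "ffn" are accepted, matching
    the two values named in the instructions above.
    """
    if ops not in ("san", "ffn"):
        raise ValueError('ops must be "san" or "ffn"')
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["ops"] = ops  # all other keys are left untouched
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, set_sharing_ops("models/bert-gd-72-384/config.json", "san"), where the models/ prefix is an assumption based on the paths used in the commands above.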

Evaluation

Task distillation with data augmentation in fine-tuning phase

Rename the pretrained checkpoint (for instance, change step_0_pytorch_model.bin to pytorch_model.bin), and change load_compressed_model from false to true in output/config.json.
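
The two manual steps above can be sketched as one small script. It assumes the checkpoint and config.json sit in the same directory, and it copies the checkpoint instead of renaming it so the original step file is kept. The function name prepare_student_checkpoint is hypothetical.

```python
import json
import shutil
from pathlib import Path


def prepare_student_checkpoint(model_dir,
                               checkpoint_name="step_0_pytorch_model.bin"):
    """Expose a step checkpoint as pytorch_model.bin and enable
    load_compressed_model in config.json.

    Hypothetical helper; assumes both files live in model_dir.
    """
    model_dir = Path(model_dir)
    # Copy rather than rename, so the step checkpoint is preserved.
    shutil.copyfile(model_dir / checkpoint_name,
                    model_dir / "pytorch_model.bin")

    config_path = model_dir / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["load_compressed_model"] = True  # serialized as JSON true
    config_path.write_text(json.dumps(config, indent=2) + "\n",
                           encoding="utf-8")
    return config
```

Given the general distillation output path used earlier, a call might be prepare_student_checkpoint("output/bert-gd-72-384"); adjust the directory and checkpoint name to match the actual output.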

Task distillation with distributed training

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 task_distill.py \
                  --teacher_model models/bert-base-uncased/STS-B \
                  --student_model models/bert-gd-72-384 \
                  --task_name STS-B \
                  --data_dir data/glue_data/STS-B \
                  --max_seq_length 128 \
                  --train_batch_size 32 \
                  --aug_train \
                  --learning_rate 2e-5 \
                  --num_train_epochs 3.0 \
                  --output_dir ./output/36-256-STS-B

Task distillation on a single GPU

python3 task_distill.py --teacher_model models/bert-base-uncased \
                  --student_model models/bert-td-72-384 \
                  --output output_demo \
                  --data_dir data/glue_data/SST-2 \
                  --task_name SST-2 \
                  --do_lower_case --aug_train

To train with augmented data, add --aug_train.

Get test results for a model

python run_glue.py --model_name_or_path models/bert-td-72-384/SST-2 \
                  --task_name SST-2 \
                  --do_eval --do_predict \
                  --data_dir data/glue_data/SST-2 \
                  --max_seq_length 128 \
                  --save_steps 500 \
                  --save_total_limit 2 \
                  --output_dir ./output/SST-2

Owner

twinkle
