Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)

Overview

Structured Super Lottery Tickets in BERT

This repo contains our codes for the paper "Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization" (ACL 2021).


Getting Start

  1. python3.6
    Reference to download and install : https://www.python.org/downloads/release/python-360/
  2. install requirements
    > pip install -r requirements.txt

Data

  1. Download data
    sh download.sh
    Please refer to download GLUE dataset: https://gluebenchmark.com/
  2. Preprocess data
    > sh experiments/glue/prepro.sh
    For more data processing details, please refer to this repo.

Verifying Phase Transition Phenomenon

  1. Fine-tune a pre-trained BERT model with single task data, compute importance scores, and generate one-shot structured pruning masks at multiple sparsity levels. E.g., for MNLI, run

    ./scripts/train_mnli.sh GPUID
    
  2. Rewind and evaluate the winning, random, and losing tickets at multiple sparsity levels. E.g., for MNLI, run

    ./scripts/rewind_mnli.sh GPUID
    

You may try tasks with smaller sizes (e.g., SST, MRPC, RTE) to see a more pronounced phase transition.


Multi-task Learning (MTL) with Tickets Sharing

  1. Identify a set of super tickets for each individual task.

    • Identify winning tickets at multiple sparsity levels for each individual task. E.g., for MTDNN-base, run

      ./scripts/prepare_mtdnn_base.sh GPUID
      

      We recommend to use the same optimization settings, e.g., learning rate, optimizer and random seed, in both the ticket identification procedures and the MTL. We empirically observe that the super tickets perform better in MTL in such a case.

    • [Optional] For each individual task, identify a set of super tickets from the winning tickets at multiple sparsity levels. You can skip this step if you wish to directly use the set of super tickets identified by us. If you wish to identify super tickets on your own (This is recommended if you use a different optimization settings, e.g., learning rate, optimizer and random seed, from those in our scripts. These factors may affect the candidacy of super tickets.), we provide the template scripts

      ./scripts/rewind_mnli_winning.sh GPUID
      ./scripts/rewind_qnli_winning.sh GPUID
      ./scripts/rewind_qqp_winning.sh GPUID
      ./scripts/rewind_sst_winning.sh GPUID
      ./scripts/rewind_mrpc_winning.sh GPUID
      ./scripts/rewind_cola_winning.sh GPUID
      ./scripts/rewind_stsb_winning.sh GPUID
      ./scripts/rewind_rte_winning.sh GPUID
      

      These scripts rewind the winning tickets at multiple sparsity levels. You can manually identify the set of super tickets as the set of winning tickets that perform the best among all sparsity levels.

  2. Construct multi-task super tickets by aggregating the identified sets of super tickets of all tasks. E.g., to use the super tickets identified by us, run

    python construct_mtl_mask.py
    

    You can modify the script to use the super tickets identified by yourself.

  3. MTL with tickets sharing. Run

    ./scripts/train_mtdnn.sh GPUID
    

MTL Benchmark

MTL evaluation results on GLUE dev set averaged over 5 random seeds.

Model MNLI-m/mm (Acc) QNLI (Acc) QQP (Acc/F1) SST-2 (Acc) MRPC (Acc/F1) CoLA (Mcc) STS-B (P/S) RTE (Acc) Avg Score Avg Compression
MTDNN, base 84.6/84.2 90.5 90.6/87.4 92.2 80.6/86.2 54.0 86.2/86.4 79.0 82.4 100%
Tickets-Share, base 84.5/84.1 91.0 90.7/87.5 92.7 87.0/90.5 52.0 87.7/87.5 81.2 83.3 92.9%
MTDNN, large 86.5/86.0 92.2 91.2/88.1 93.5 85.2/89.4 56.2 87.2/86.9 83.0 84.4 100%
Tickets-Share, large 86.7/86.0 92.1 91.3/88.4 93.2 88.4/91.5 61.8 89.2/89.1 80.5 85.4 83.3%

Citation

@article{liang2021super,
  title={Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization},
  author={Liang, Chen and Zuo, Simiao and Chen, Minshuo and Jiang, Haoming and Liu, Xiaodong and He, Pengcheng and Zhao, Tuo and Chen, Weizhu},
  journal={arXiv preprint arXiv:2105.12002},
  year={2021}
}

@article{liu2020mtmtdnn,
  title={The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding},
  author={Liu, Xiaodong and Wang, Yu and Ji, Jianshu and Cheng, Hao and Zhu, Xueyun and Awa, Emmanuel and He, Pengcheng and Chen, Weizhu and Poon, Hoifung and Cao, Guihong and Jianfeng Gao},
  journal={arXiv preprint arXiv:2002.07972},
  year={2020}
}

Contact Information

For help or issues related to this package, please submit a GitHub issue. For personal questions related to this paper, please contact Chen Liang ([email protected]).

Owner
Chen Liang
Chen Liang
LCG T-TEST USING EUCLIDEAN METHOD

This project has been created for statistical usage, purposing for determining ATL takers and nontakers using LCG ttest and Euclidean Method, especially for internal business case in Telkomsel.

2 Jan 21, 2022
This project consists of data analysis and data visualization (done using python)of all IPL seasons from 2008 to 2019 and answering the most asked questions about the IPL.

IPL-data-analysis This project consists of data analysis and data visualization of all IPL seasons from 2008 to 2019 and answering the most asked ques

Sivateja A T 2 Feb 08, 2022
Interpretable Models for NLP using PyTorch

This repo is deprecated. Please find the updated package here. https://github.com/EdGENetworks/anuvada Anuvada: Interpretable Models for NLP using PyT

Sandeep Tammu 19 Dec 17, 2022
Implementation of Multistream Transformers in Pytorch

Multistream Transformers Implementation of Multistream Transformers in Pytorch. This repository deviates slightly from the paper, where instead of usi

Phil Wang 47 Jul 26, 2022
Code for Findings at EMNLP 2021 paper: "Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning"

Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning This repo is for Findings at EMNLP 2021 paper: Learn Cont

INK Lab @ USC 6 Sep 02, 2022
Spooky Skelly For Python

_____ _ _____ _ _ _ | __| ___ ___ ___ | |_ _ _ | __|| |_ ___ | || | _ _ |__ || . || . || . || '

Kur0R1uka 1 Dec 23, 2021
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.

Reduce T5 model size by 3X and increase the inference speed up to 5X. Install Usage Details Functionalities Benchmarks Onnx model Quantized onnx model

Kiran R 399 Jan 05, 2023
Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models

PEGASUS library Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models, or PEGASUS, uses self-supervised

Google Research 1.4k Dec 22, 2022
A notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository

We provide a notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository. The notebook also shows how to segment the corpus using BPE tokenizatio

Computation for Indian Language Technology (CFILT) 9 Oct 13, 2022
Synthetic data for the people.

zpy: Synthetic data in Blender. Website • Install • Docs • Examples • CLI • Contribute • Licence Abstract Collecting, labeling, and cleaning data for

Zumo Labs 253 Dec 21, 2022
Words-per-minute - A terminal app written in python utilizing the curses module that tests the user's ability to type

words-per-minute A terminal app written in python utilizing the curses module th

Tanim Islam 1 Jan 14, 2022
A program that uses real statistics to choose the best times to bet on BloxFlip's crash gamemode

Bloxflip Smart Bet A program that uses real statistics to choose the best times to bet on BloxFlip's crash gamemode. https://bloxflip.com/crash. THIS

43 Jan 05, 2023
Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2.

Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2. It is trained (finetuned) on a curated list of approximately 45K Python (~470MB) files gathered from the

Galois Autocompleter 91 Sep 23, 2022
A method to generate speech across multiple speakers

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Facebook Archive 873 Dec 15, 2022
Search msDS-AllowedToActOnBehalfOfOtherIdentity

前言 现在进行RBCD的攻击手段主要是搜索mS-DS-CreatorSID,如果机器的创建者是我们可控的话,那就可以修改对应机器的msDS-AllowedToActOnBehalfOfOtherIdentity,利用工具SharpAllowedToAct-Modify 那我们索性也试试搜索所有计算机

Jumbo 26 Dec 05, 2022
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python) 日本語は以下に続きます (Japanese follows) English: This book is written in Japanese and primaril

Ryuichi Yamamoto 189 Dec 29, 2022
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.

PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.

Tencent 633 Dec 28, 2022
RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2

RoNER RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2. It is meant to be an easy to use, hi

Stefan Dumitrescu 9 Nov 07, 2022
This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

Word-Level Coreference Resolution This is a repository with the code to reproduce the experiments described in the paper of the same name, which was a

79 Dec 27, 2022
official ( API ) for the zAmericanEnglish app in [ Google play ] and [ App store ]

official ( API ) for the zAmericanEnglish app in [ Google play ] and [ App store ]

Plugin 3 Jan 12, 2022