PyTorch source code of NAACL 2019 paper "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models"

Related tags

Text Data & NLPsiatl
Overview

This repository contains source code for NAACL 2019 paper "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models" (Paper link)

Introduction

This paper presents a simple transfer learning approach that addresses the problem of catastrophic forgetting. We pretrain a language model and then transfer it to a new model, to which we add a recurrent layer and an attention mechanism. Based on multi-task learning, we use a weighted sum of losses (language model loss and classification loss) and fine-tune the pretrained model on our (classification) task.

Architecture

Step 1:

  • Pretraining of a word-level LSTM-based language model

Step 2:

  • Fine-tuning the language model (LM) on a classification task

  • Use of an auxiliary LM loss

  • Employing 2 different optimizers (1 for the pretrained part and 1 for the newly added part)

  • Sequentially unfreezing

Reference

@inproceedings{chronopoulou-etal-2019-embarrassingly,
    title = "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models",
    author = "Chronopoulou, Alexandra  and
      Baziotis, Christos  and
      Potamianos, Alexandros",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N19-1213",
    pages = "2089--2095",
}

Prerequisites

Dependencies

  • PyTorch version >=0.4.0

  • Python version >= 3.6

Install Requirements

Create Environment (Optional): Ideally, you should create a conda environment for the project.

conda create -n siatl python=3
conda activate siatl

Install PyTorch 0.4.0 with the desired cuda version to use the GPU:

conda install pytorch==0.4.0 torchvision -c pytorch

Then install the rest of the requirements:

pip install -r requirements.txt

Download Data

You can find Sarcasm Corpus V2 (link) under datasets/

Plot visualization

Visdom is used to visualized metrics during training. You should start the server through the command line (using tmux or screen) by typing visdom. You will be then able to see the visualizations by going to http://localhost:8097 in your browser.

Check here for more: https://github.com/facebookresearch/visdom#usage

Training

In order to train the model, either the LM or the SiATL, you need to run the corresponding python script and pass as an argument a yaml model config. The yaml config specifies all the configuration details of the experiment to be conducted. To make any changes to a model, change an existing or create a new yaml config file.

The yaml config files can be found under model_configs/ directory.

Use the pretrained Language Model:

cd checkpoints/
wget https://www.dropbox.com/s/lalizxf3qs4qd3a/lm20m_70K.pt 

(Download it and place it in checkpoints/ directory)

(Optional) Train a Language Model:

Assuming you have placed the training and validation data under datasets/<name_of_your_corpus/train.txt, datasets/<name_of_your_corpus/valid.txt (check the model_configs/lm_20m_word.yaml's data section), you can train a LM.

See for example:

python models/sent_lm.py -i lm_20m_word.yaml

Fine-tune the Language Model on the labeled dataset, using an auxiliary LM loss, 2 optimizers and sequential unfreezing, as described in the paper:

To fine-tune it on the Sarcasm Corpus V2 dataset:

python models/run_clf.py -i SCV2_aux_ft_gu.yaml --aux_loss --transfer

  • -i: Configuration yaml file (under model_configs/)
  • --aux_loss: You can choose if you want to use an auxiliary LM loss
  • --transfer: You can choose if you want to use a pretrained LM to initalize the embedding and hidden layer of your model. If not, they will be randomly initialized
Owner
Alexandra Chronopoulou
Research Intern at AllenAI. CS PhD student in LMU Munich.
Alexandra Chronopoulou
Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow. Documentation Proper documentation is available at

HUSEIN ZOLKEPLI 151 Jan 05, 2023
HAN2HAN : Hangul Font Generation

HAN2HAN : Hangul Font Generation

Changwoo Lee 36 Dec 28, 2022
Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!

smart-school-chatbot Smart discord chatbot integrated with Dialogflow to interact with students naturally and manage different classes in a school. De

Tom Huynh 5 Oct 24, 2022
GVT is a generic translation tool for parts of text on the PC screen with Text to Speak functionality.

GVT is a generic translation tool for parts of text on the PC screen with Text to Speech functionality. I wanted to create it because the existing tools that I experimented with did not satisfy me in

Nuked 1 Aug 21, 2022
Words-per-minute - A terminal app written in python utilizing the curses module that tests the user's ability to type

words-per-minute A terminal app written in python utilizing the curses module th

Tanim Islam 1 Jan 14, 2022
hashily is a Python module that provides a variety of text decoding and encoding operations.

hashily is a python module that performs a variety of text decoding and encoding functions. It also various functions for encrypting and decrypting text using various ciphers.

DevMysT 5 Jul 17, 2022
Datasets of Automatic Keyphrase Extraction

This repository contains 20 annotated datasets of Automatic Keyphrase Extraction made available by the research community. Following are the datasets and the original papers that proposed them. If yo

LIAAD - Laboratory of Artificial Intelligence and Decision Support 163 Dec 23, 2022
ByT5: Towards a token-free future with pre-trained byte-to-byte models

ByT5: Towards a token-free future with pre-trained byte-to-byte models ByT5 is a tokenizer-free extension of the mT5 model. Instead of using a subword

Google Research 409 Jan 06, 2023
Scikit-learn style model finetuning for NLP

Scikit-learn style model finetuning for NLP Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide vari

indico 665 Dec 17, 2022
Part of Speech Tagging using Hidden Markov Model (HMM) POS Tagger and Brill Tagger

Part of Speech Tagging using Hidden Markov Model (HMM) POS Tagger and Brill Tagger In this project, our aim is to tune, compare, and contrast the perf

Chirag Daryani 0 Dec 25, 2021
Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

GPT2-NewsTitle 带有超详细注释的GPT2新闻标题生成项目 UpDate 01.02.2021 从网上收集数据,将清华新闻数据、搜狗新闻数据等新闻数据集,以及开源的一些摘要数据进行整理清洗,构建一个较完善的中文摘要数据集。 数据集清洗时,仅进行了简单地规则清洗。

logCong 785 Dec 29, 2022
Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Ubiquitous Knowledge Processing Lab 59 Dec 01, 2022
Sequence-to-Sequence learning using PyTorch

Seq2Seq in PyTorch This is a complete suite for training sequence-to-sequence models in PyTorch. It consists of several models and code to both train

Elad Hoffer 514 Nov 17, 2022
UniSpeech - Large Scale Self-Supervised Learning for Speech

UniSpeech The family of UniSpeech: WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing UniSpeech (ICML 202

Microsoft 281 Dec 15, 2022
Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" in ACL 2021.

TRICE: a task-agnostic transferring framework for multi-source sequence generation This is the source code of our work Transfer Learning for Sequence

THUNLP-MT 9 Jun 27, 2022
This is a general repo that helps you develop fast/effective NLP classifiers using Huggingface

NLP Classifier Introduction This project trains a bert model on any NLP classifcation model. And uses the model in make predictions on new data using

Abdullah Tarek 3 Mar 11, 2022
Use Tensorflow2.7.0 Build OpenAI'GPT-2

TF2_GPT-2 Use Tensorflow2.7.0 Build OpenAI'GPT-2 使用最新tensorflow2.7.0构建openai官方的GPT-2 NLP模型 优点 使用无监督技术 拥有大量词汇量 可实现续写(堪比“xx梦续写”) 实现对话后续将应用于FloatTech的Bot

Watermelon 9 Sep 13, 2022
A minimal code for fairseq vq-wav2vec model inference.

vq-wav2vec inference A minimal code for fairseq vq-wav2vec model inference. Runs without installing the fairseq toolkit and its dependencies. Usage ex

Vladimir Larin 7 Nov 15, 2022
Stand-alone language identification system

langid.py readme Introduction langid.py is a standalone Language Identification (LangID) tool. The design principles are as follows: Fast Pre-trained

2k Jan 04, 2023
Tools for curating biomedical training data for large-scale language modeling

Tools for curating biomedical training data for large-scale language modeling

BigScience Workshop 242 Dec 25, 2022