ConvBERT: Improving BERT with Span-based Dynamic Convolution

Last update: Dec 10, 2022

ConvBERT

Introduction

This repo introduces ConvBERT, a new architecture for pre-training-based language models. The code is tested on a V100 GPU. For a detailed description and experimental results, please refer to our NeurIPS 2020 paper, ConvBERT: Improving BERT with Span-based Dynamic Convolution.

Requirements

Python 3
tensorflow 1.15
numpy
scikit-learn
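
As a quick sanity check of the TensorFlow requirement above, the following is a minimal sketch of a version test. The helper name is_supported_tf is hypothetical and not part of this repo. It only checks that a version string belongs to the 1.15.x series listed above; it makes no claim about whether other versions work.

```python
def is_supported_tf(version):
    """Return True when a version string such as '1.15.0' is a 1.15.x release.

    Hypothetical helper for illustration; not part of the ConvBERT codebase.
    """
    parts = version.split(".")
    # Compare only the major and minor components; the patch level is ignored.
    return len(parts) >= 2 and parts[0] == "1" and parts[1] == "15"
```

With TensorFlow installed, the check could be used as is_supported_tf(tf.__version__) after import tensorflow as tf.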

Experiments

Pre-training

These instructions pre-train a medium-small ConvBERT model (17M parameters) on the OpenWebText corpus.

To build the TFRecord files and pre-train the model, download the OpenWebText corpus (12G) and set up your data directory in build_data.sh and pretrain.sh. Then run

bash build_data.sh

The processed data require roughly 30G of disk space. Then, to pre-train the model, run

bash pretrain.sh

See configure_pretraining.py for the details of the supported hyperparameters.
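
Since the processed data need roughly 30G of disk space, it may be worth checking free space on the data drive before running build_data.sh. The following is a minimal sketch; the helper name has_free_space is hypothetical, and it assumes "G" means 10**9 bytes.

```python
import shutil


def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least
    `required_gb` gigabytes free (assumption: 1 GB = 10**9 bytes).

    Hypothetical helper for illustration; not part of the ConvBERT codebase.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9
```

For example, has_free_space("/path/to/data", 30) could be called with your own data directory before building the records; the path shown is a placeholder.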

Fine-tuning

We give instructions to fine-tune a pre-trained medium-small ConvBERT model (17M parameters) on GLUE. You can refer to the Google Colab notebook for a quick example. See our paper for more details on model performance. The pre-trained model can be found here. (You can also download it from Baidu Cloud with extraction code m9d2.)

To evaluate the performance on GLUE, you can download the GLUE data by running

python3 download_glue_data.py

Set up the data by running

mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic/diagnostic.tsv mnli && mkdir -p $DATA_DIR/finetuning_data && mv * $DATA_DIR/finetuning_data

After preparing the GLUE data, set up your data directory in finetune.sh and run

bash finetune.sh

You can test different tasks by changing the configs in finetune.sh.

If you find this repo helpful, please consider citing
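
The GLUE folder layout produced by the mv commands above can also be prepared from Python. This is a minimal sketch under stated assumptions, not part of the repo: the function name arrange_glue is hypothetical, and unlike the shell one-liner it moves only the eight task folders and diagnostic.tsv, leaving any other files in the download folder untouched. It also assumes the target folders do not already exist.

```python
import shutil
from pathlib import Path

# Folder renames taken from the mv commands above.
RENAMES = {
    "CoLA": "cola", "MNLI": "mnli", "MRPC": "mrpc", "QNLI": "qnli",
    "QQP": "qqp", "RTE": "rte", "SST-2": "sst", "STS-B": "sts",
}


def arrange_glue(glue_dir, data_dir):
    """Move downloaded GLUE task folders into data_dir/finetuning_data.

    Hypothetical helper for illustration; not part of the ConvBERT codebase.
    """
    glue_dir = Path(glue_dir)
    target = Path(data_dir) / "finetuning_data"
    target.mkdir(parents=True, exist_ok=True)

    # Rename each task folder while moving it into the target directory.
    for old, new in RENAMES.items():
        src = glue_dir / old
        if src.is_dir():
            shutil.move(str(src), str(target / new))

    # The diagnostic set is stored alongside the MNLI data.
    diagnostic = glue_dir / "diagnostic" / "diagnostic.tsv"
    if diagnostic.is_file() and (target / "mnli").is_dir():
        shutil.move(str(diagnostic), str(target / "mnli" / "diagnostic.tsv"))

    return target
```

Here glue_dir would be the folder where download_glue_data.py placed the data, and data_dir plays the role of $DATA_DIR in the command above.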

@article{Jiang2020ConvBERT,
  title={ConvBERT: Improving BERT with Span-based Dynamic Convolution},
  author={Zi-Hang Jiang and Weihao Yu and Daquan Zhou and Y. Chen and Jiashi Feng and S. Yan},
  journal={ArXiv},
  year={2020},
  volume={abs/2008.02496}
}

References

Here are some great resources we benefited from:

Codebase: Our codebase is based on ELECTRA.

Dynamic convolution: Implementation from Pay Less Attention with Lightweight and Dynamic Convolutions

Dataset: OpenWebText from Language Models are Unsupervised Multitask Learners

Owner

YITUTech
