InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Last update: Nov 25, 2022

Overview

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

This is the official code base for our ICLR 2021 paper:

"InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective".

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Usage

Prepare your environment

Download required packages

pip install -r requirements.txt

ANLI and TextFooler

To run ANLI and TextFooler experiments, refer to README in the ANLI directory.

SQuAD

We will upload the code for the SQuAD experiments soon.

Citation

@inproceedings{
wang2021infobert,
title={InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective},
author={Wang, Boxin and Wang, Shuohang and Cheng, Yu and Gan, Zhe and Jia, Ruoxi and Li, Bo and Liu, Jingjing},
booktitle={International Conference on Learning Representations},
year={2021}}

Owner

AI Secure

UIUC Secure Learning Lab

GitHub Repository

This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

Word-Level Coreference Resolution This is a repository with the code to reproduce the experiments described in the paper of the same name, which was a

79 Dec 27, 2022

Twitter-Sentiment-Analysis - Twitter sentiment analysis for india's top online retailers(2019 to 2022)

Twitter-Sentiment-Analysis Twitter sentiment analysis for india's top online retailers(2019 to 2022) Project Overview : Sentiment Analysis helps us to

1 Jan 01, 2022

Python library for Serbian Natural language processing (NLP)

SrbAI - Python biblioteka za procesiranje srpskog jezika SrbAI je projekat prikupljanja algoritama i modela za procesiranje srpskog jezika u jedinstve

3 Nov 22, 2022

simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 that lets you quickly train your T5 models.

Quickly train T5 models in just 3 lines of code + ONNX support simpleT5 is built on top of PyTorch-lightning ⚡️ and Transformers 🤗 that lets you quic

220 Dec 30, 2022

Uncomplete archive of files from the European Nopsled Team

European Nopsled CTF Archive This is an archive of collected material from various Capture the Flag competitions that the European Nopsled team played

4 Nov 24, 2021

Kinky furry assitant based on GPT2

KinkyFurs-V0 Kinky furry assistant based on GPT2 How to run python3 V0.py then, open web browser and go to localhost:8080 Requirements: Flask trans

1 Jun 11, 2022

A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

Parrot Parrot is a paraphrase based utterance augmentation framework purpose built to accelerate training NLU models. A paraphrase framework is more t

690 Jan 04, 2023

Pipeline for training LSA models using Scikit-Learn.

Latent Semantic Analysis Pipeline for training LSA models using Scikit-Learn. Usage Instead of writing custom code for latent semantic analysis, you j

23 Sep 05, 2022

Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish Language Models 💃🏻 Corpora 📃 Corpora Number of documents Size (GB) BNE 201,080,084 570GB Models 🤖 RoBERTa-base BNE: https://huggingface.co

203 Dec 20, 2022

使用Mask LM预训练任务来预训练Bert模型。训练垂直领域语料的模型表征，提升下游任务的表现。

Pretrain_Bert_with_MaskLM Info 使用Mask LM预训练任务来预训练Bert模型。基于pytorch框架，训练关于垂直领域语料的预训练语言模型，目的是提升下游任务的表现。 Pretraining Task Mask Language Model，简称Mask LM，即

24 Dec 10, 2022

This repository has a implementations of data augmentation for NLP for Japanese.

daaja This repository has a implementations of data augmentation for NLP for Japanese: EDA: Easy Data Augmentation Techniques for Boosting Performance

60 Nov 11, 2022

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU，一个中文文本分类、序列标注工具包，支持中文长文本、短文本的多类、多标签分类任务，支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classifi

186 Dec 24, 2022

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Related tags

Overview

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Usage

Prepare your environment

ANLI and TextFooler

SQuAD

Citation

Owner

AI Secure

This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

Twitter-Sentiment-Analysis - Twitter sentiment analysis for india's top online retailers(2019 to 2022)

Python library for Serbian Natural language processing (NLP)

simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 that lets you quickly train your T5 models.

Uncomplete archive of files from the European Nopsled Team

Kinky furry assitant based on GPT2

A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

Pipeline for training LSA models using Scikit-Learn.

Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

使用Mask LM预训练任务来预训练Bert模型。训练垂直领域语料的模型表征，提升下游任务的表现。

This repository has a implementations of data augmentation for NLP for Japanese.

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Tools and data for measuring the popularity & growth of various programming languages.

뉴스 도메인 질의응답 시스템 (21-1학기 졸업 프로젝트)

A Flask Sentiment Analysis API, with visual implementation

SurvTRACE: Transformers for Survival Analysis with Competing Events

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyTorch implementation of the paper: Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding

CDLA: A Chinese document layout analysis (CDLA) dataset

A Python 3.6+ package to run .many files, where many programs written in many languages may exist in one file.