Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Overview

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Abstract: Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which make them brittle against test cases outside the training distribution. Recently, several proposed debiasing methods are shown to be very effective in improving out-of-distribution performance. However, their improvements come at the expense of performance drop when models are evaluated on the in-distribution data, which contain examples with higher diversity. This seemingly inevitable trade-off may not tell us much about the changes in the reasoning and understanding capabilities of the resulting models on broader types of examples beyond the small subset represented in the out-of-distribution data. In this paper, we address this trade-off by introducing a novel debiasing method, called confidence regularization, which discourage models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets (e.g., 7pp gain on HANS dataset) while maintaining the original in-distribution accuracy.

The repository contains the code to reproduce our work in debiasing NLU models without in-distribution degradation. We provide 2 runs of experiment that are shown in our paper:

  1. Debias MNLI model from syntactic bias and evaluate on HANS as the out-of-distribution data.
  2. Debias MNLI model from hypothesis-only bias and evaluate on MNLI-hard sets as the out-of-distribution data.

Requirements

The code requires python >= 3.6 and pytorch >= 1.1.0.

Additional required dependencies can be found in requirements.txt. Install all requirements by running:

pip install -r requirements.txt

Data

Our experiments use MNLI dataset version provided by GLUE benchmark. Download the file from here, and unzip under the directory ./dataset Additionally download the following files here for evaluating on hard/easy splits of both MNLI dev and test sets. The dataset directory should be structured as the following:

└── dataset 
    └── MNLI
        ├── train.tsv
        ├── dev_matched.tsv
        ├── dev_mismatched.tsv
        ├── dev_mismatched.tsv
        ├── dev_matched_easy.tsv
        ├── dev_matched_hard.tsv
        ├── dev_mismatched_easy.tsv
        ├── dev_mismatched_hard.tsv
        ├── multinli_hard
        │   ├── multinli_0.9_test_matched_unlabeled_hard.jsonl
        │   └── multinli_0.9_test_mismatched_unlabeled_hard.jsonl
        ├── multinli_test
        │   ├── multinli_0.9_test_matched_unlabeled.jsonl
        │   └── multinli_0.9_test_mismatched_unlabeled.jsonl
        └── original

Running the experiments

For each evaluation setting, use the --mode and --which_bias arguments to set the appropriate loss function and the type of bias to mitigate (e.g, hans, hypo).

To reproduce our result on MNLI ⮕ HANS, run the following:

cd src/
CUDA_VISIBLE_DEVICES=6 python train_distill_bert.py \
    --output_dir ../checkpoints/hans/bert_confreg_lr5_epoch3_seed444 \
    --do_train --do_eval --mode smoothed_distill \
    --seed 444 --which_bias hans

For the MNLI ⮕ hard splits, run the following:

cd src/
CUDA_VISIBLE_DEVICES=10 python train_distill_bert.py \
    --output_dir ../checkpoints/hypo/bert_confreg_lr5_epoch3_seed111 \
    --do_train --do_eval --mode smoothed_distill \
    --seed 111 --which_bias hypo

Expected results

Results on the MNLI ⮕ HANS setting:

Mode Seed MNLI-m MNLI-mm HANS avg.
None 111 84.57 84.72 62.04
conf-reg 111 84.17 85.02 69.62

Results on the MNLI ⮕ Hard-splits setting:

Mode Seed MNLI-m MNLI-mm MNLI-m hard MNLI-mm hard
None 111 84.62 84.71 76.07 76.75
conf-reg 111 85.01 84.87 78.02 78.89

Contact

Contact person: Ajie Utama, [email protected]

https://www.ukp.tu-darmstadt.de/

Please reach out to us for further questions or if you encounter any issue. You can cite this work by the following:

@InProceedings{UtamaDebias2020,
  author    = {Utama, P. Ajie and Moosavi, Nafise Sadat and Gurevych, Iryna},
  title     = {Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance},
  booktitle = {Proceedings of the 58th Conference of the Association for Computational Linguistics},
  month     = jul,
  year      = {2020},
  publisher = {Association for Computational Linguistics}
}

Acknowledgement

The code in this repository is build on the implementation of debiasing method by Clark et al. The original version can be found here

Owner
Ubiquitous Knowledge Processing Lab
Ubiquitous Knowledge Processing Lab
Adaptation through prediction: multisensory active inference torque control

Adaptation through prediction: multisensory active inference torque control Submitted to IEEE Transactions on Cognitive and Developmental Systems Abst

Cristian Meo 1 Nov 07, 2022
To provide 100 JAX exercises over different sections structured as a course or tutorials to teach and learn for beginners, intermediates as well as experts

JaxTon 💯 JAX exercises Mission 🚀 To provide 100 JAX exercises over different sections structured as a course or tutorials to teach and learn for beg

Rohan Rao 512 Jan 01, 2023
A NSFW content filter.

Project_Nfilter A NSFW content filter. With a motive of minimizing the spreads and leakage of NSFW contents on internet and access to others devices ,

1 Jan 20, 2022
GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification

GalaXC GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification @InProceedings{Saini21, author = {Saini, D. and Jain,

Extreme Classification 28 Dec 05, 2022
Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (paper) By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software T

Qing-Long Zhang 199 Jan 08, 2023
An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym

gym-idsgame An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym gym-idsgame is a reinforcement learning environment for simulating at

Kim Hammar 29 Dec 03, 2022
Multi-modal Content Creation Model Training Infrastructure including the FACT model (AI Choreographer) implementation.

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ [ICCV-2021]. Overview This package contains the model implementation and training

Google Research 365 Dec 30, 2022
Classification Modeling: Probability of Default

Credit Risk Modeling in Python Introduction: If you've ever applied for a credit card or loan, you know that financial firms process your information

Aktham Momani 2 Nov 07, 2022
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

730 Jan 09, 2023
Self-Supervised Pillar Motion Learning for Autonomous Driving (CVPR 2021)

Self-Supervised Pillar Motion Learning for Autonomous Driving Chenxu Luo, Xiaodong Yang, Alan Yuille Self-Supervised Pillar Motion Learning for Autono

QCraft 101 Dec 05, 2022
Minimalistic PyTorch training loop

Backbone for PyTorch training loop Will try to keep it minimalistic. pip install back from back import Bone Features Progress bar Checkpoints saving/l

Kashin 4 Jan 16, 2020
A simple, high level, easy-to-use open source Computer Vision library for Python.

ZoomVision : Slicing Aid Detection A simple, high level, easy-to-use open source Computer Vision library for Python. Installation Installing dependenc

Nurettin Sinanoğlu 2 Mar 04, 2022
An extremely simple, intuitive, hardware-friendly, and well-performing network structure for LiDAR semantic segmentation on 2D range image. IROS21

FIDNet_SemanticKITTI Motivation Implementing complicated network modules with only one or two points improvement on hardware is tedious. So here we pr

YimingZhao 54 Dec 12, 2022
This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection', CVPR 2019.

Code-and-Dataset-for-CapSal This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detec

lu zhang 48 Aug 19, 2022
Fast and Context-Aware Framework for Space-Time Video Super-Resolution (VCIP 2021)

Fast and Context-Aware Framework for Space-Time Video Super-Resolution Preparation Dependencies PyTorch 1.2.0 CUDA 10.0 DCNv2 cd model/DCNv2 bash make

Xueheng Zhang 1 Mar 29, 2022
Generative Adversarial Networks(GANs)

Generative Adversarial Networks(GANs) Vanilla GAN ClusterGAN Vanilla GAN Model Structure Final Generator Structure A MLP with 2 hidden layers of hidde

Zhenbang Feng 2 Nov 05, 2021
PyTorch implementation of neural style randomization for data augmentation

README Augment training images for deep neural networks by randomizing their visual style, as described in our paper: https://arxiv.org/abs/1809.05375

84 Nov 23, 2022
Stacs-ci - A set of modules to enable integration of STACS with commonly used CI / CD systems

Static Token And Credential Scanner CI Integrations What is it? STACS is a YARA

STACS 18 Aug 04, 2022
Crowd-sourced Annotation of Human Motion.

Motion Annotation Tool Live: https://motion-annotation.humanoids.kit.edu Paper: The KIT Motion-Language Dataset Installation Start by installing all P

Matthias Plappert 4 May 25, 2020
Model Zoo for MindSpore

Welcome to the Model Zoo for MindSpore In order to facilitate developers to enjoy the benefits of MindSpore framework, we will continue to add typical

MindSpore 226 Jan 07, 2023