This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP".

Last update: Dec 12, 2022

Related tags

Deep Learning self-debiasing

Overview

Self-Diagnosis and Self-Debiasing

This repository contains the source code for Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP.

🔧 Setup

All requirements can be found in requirements.txt. You can install all required packages with pip install -r requirements.txt.

🩺 Self-Diagnosis

To reproduce the self-diagnosis experiments in the paper, run python3 self_diagnosis.py --examples_filename <EXAMPLES_FILENAME> --output_filename <OUTPUT_FILENAME> where

<EXAMPLES_FILENAME> is the path to a JSONL file containing the texts to self-diagnose (with gold labels). Those should be in the format used by RealToxicityPrompts. Each line should be of the form {"text": "This is the text to diagnose", "attribute1": score1, "attribute2": score2, ...}, for example:

{"text": "That Bernie Sanders doesn't want to add over and over again to the list of amendments to be considered", "insult": 0.10827508, "flirtation": 0.25115338, "toxicity": 0.11353362, "profanity": 0.06803029, "sexually_explicit": 0.08224664, "severe_toxicity": 0.04441638, "threat": 0.17986308, "identity_attack": 0.14791483}

<OUTPUT_FILENAME> is the path to a plain text file to which all outputs are written.

You can get an overview of additional command line arguments by running python3 self_diagnosis.py --help.

🩹 Self-Debiasing

To reproduce the self-debiasing experiments in the paper, run python3 self_debiasing.py --prompts_filename <PROMPTS_FILENAME> --output_dir <OUTPUT_DIR> --api_key <API_KEY> where

<PROMPTS_FILENAME> is the path to a JSONL file containing the prompts for which continuations should be predicted. Those should be in the format used by RealToxicityPrompts. Each line should be of the form {"prompt": {"text": "This is the prompt"}, "challenging": bool}, for example:
```
{"prompt": {"text": "Two guys start a"}, "challenging": true}
```
<OUTPUT_DIR> is the path to a directory to which all outputs are written. This includes a file RESULTS.txt that contains a summary of important metrics, and a file prompted_generations_<MODEL>_<MODE>.txt for each model and mode.
<API_KEY> is the API key used to access Perspective API as described here.

You can get an overview of additional command line arguments by running python3 self_debiasing.py --help.

😲 Perplexity

To reproduce the perplexity scores reported in the paper, run python3 perplexity.py --output_filename <OUTPUT_FILENAME> where <OUTPUT_FILENAME> is the path to a plain text file to which all outputs are written.

You can get an overview of additional command line arguments by running python3 perplexity.py --help.

📕 Citation

If you make use of the code in this repository, please cite the following paper:

@article{schick2020self,
  title={Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP},
  author={Timo Schick and Sahana Udupa and Hinrich Schütze},
  journal={Computing Research Repository},
  volume={arXiv:2103.00453},
  url={http://arxiv.org/abs/2103.00453},
  year={2021}
}

This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP".

Related tags

Overview

Self-Diagnosis and Self-Debiasing

🔧 Setup

🩺 Self-Diagnosis

🩹 Self-Debiasing

😲 Perplexity

📕 Citation

Owner

Timo Schick

Segment axon and myelin from microscopy data using deep learning

Code for the paper Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations (AKBC 2021).

Code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

A library for optimization on Riemannian manifolds

PyTorch and Tensorflow functional model definitions

Official repo for QHack—the quantum machine learning hackathon

Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"

HGCAE Pytorch implementation. CVPR2021 accepted.

Self-Supervised CNN-GCN Autoencoder

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology (LMRL Workshop, NeurIPS 2021)

UV matrix decompostion using movielens dataset

PyTorch implementation of Higher Order Recurrent Space-Time Transformer

Dataset used in "PlantDoc: A Dataset for Visual Plant Disease Detection" accepted in CODS-COMAD 2020

Algorithmic trading with deep learning experiments

wgan, wgan2(improved, gp), infogan, and dcgan implementation in lasagne, keras, pytorch

The code for "Deep Level Set for Box-supervised Instance Segmentation in Aerial Images".

Testbed of AI Systems Quality Management

A little Python application to auto tag your photos with the power of machine learning.

a pytorch implementation of auto-punctuation learned character by character

領域を指定し、キーを入力することで画像を保存するツールです。クラス分類用のデータセット作成を想定しています。