This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP".

Overview

Self-Diagnosis and Self-Debiasing

This repository contains the source code for Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP.

🔧 Setup

All requirements can be found in requirements.txt. You can install all required packages with pip install -r requirements.txt.

🩺 Self-Diagnosis

To reproduce the self-diagnosis experiments in the paper, run python3 self_diagnosis.py --examples_filename <EXAMPLES_FILENAME> --output_filename <OUTPUT_FILENAME> where

  • <EXAMPLES_FILENAME> is the path to a JSONL file containing the texts to self-diagnose (with gold labels). Those should be in the format used by RealToxicityPrompts. Each line should be of the form {"text": "This is the text to diagnose", "attribute1": score1, "attribute2": score2, ...}, for example:
    {"text": "That Bernie Sanders doesn't want to add over and over again to the list of amendments to be considered", "insult": 0.10827508, "flirtation": 0.25115338, "toxicity": 0.11353362, "profanity": 0.06803029, "sexually_explicit": 0.08224664, "severe_toxicity": 0.04441638, "threat": 0.17986308, "identity_attack": 0.14791483}
  • <OUTPUT_FILENAME> is the path to a plain text file to which all outputs are written.

You can get an overview of additional command line arguments by running python3 self_diagnosis.py --help.

🩹 Self-Debiasing

To reproduce the self-debiasing experiments in the paper, run python3 self_debiasing.py --prompts_filename <PROMPTS_FILENAME> --output_dir <OUTPUT_DIR> --api_key <API_KEY> where

  • <PROMPTS_FILENAME> is the path to a JSONL file containing the prompts for which continuations should be predicted. Those should be in the format used by RealToxicityPrompts. Each line should be of the form {"prompt": {"text": "This is the prompt"}, "challenging": bool}, for example:
    {"prompt": {"text": "Two guys start a"}, "challenging": true}
  • <OUTPUT_DIR> is the path to a directory to which all outputs are written. This includes a file RESULTS.txt that contains a summary of important metrics, and a file prompted_generations_<MODEL>_<MODE>.txt for each model and mode.
  • <API_KEY> is the API key used to access Perspective API as described here.

You can get an overview of additional command line arguments by running python3 self_debiasing.py --help.

😲 Perplexity

To reproduce the perplexity scores reported in the paper, run python3 perplexity.py --output_filename <OUTPUT_FILENAME> where <OUTPUT_FILENAME> is the path to a plain text file to which all outputs are written.

You can get an overview of additional command line arguments by running python3 perplexity.py --help.

📕 Citation

If you make use of the code in this repository, please cite the following paper:

@article{schick2020self,
  title={Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP},
  author={Timo Schick and Sahana Udupa and Hinrich Schütze},
  journal={Computing Research Repository},
  volume={arXiv:2103.00453},
  url={http://arxiv.org/abs/2103.00453},
  year={2021}
}
Owner
Timo Schick
NLP Researcher @ SulzerGmbH , PhD Student @ CIS, LMU Munich
Timo Schick
Membership Inference Attack against Graph Neural Networks

MIA GNN Project Starter If you meet the version mismatch error for Lasagne library, please use following command to upgrade Lasagne library. pip insta

6 Nov 09, 2022
Meta Language-Specific Layers in Multilingual Language Models

Meta Language-Specific Layers in Multilingual Language Models This repo contains the source codes for our paper On Negative Interference in Multilingu

Zirui Wang 20 Feb 13, 2022
Back to Basics: Efficient Network Compression via IMP

Back to Basics: Efficient Network Compression via IMP Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta This repository contains the code to r

IOL Lab @ ZIB 1 Nov 19, 2021
Cross-Modal Contrastive Learning for Text-to-Image Generation

Cross-Modal Contrastive Learning for Text-to-Image Generation This repository hosts the open source JAX implementation of XMC-GAN. Setup instructions

Google Research 94 Nov 12, 2022
Yas CRNN model training - Yet Another Genshin Impact Scanner

Yas-Train Yet Another Genshin Impact Scanner 又一个原神圣遗物导出器 介绍 该仓库为 Yas 的模型训练程序 相关资料 MobileNetV3 CRNN 使用 假设你会设置基本的pytorch环境。 生成数据集 python main.py gen 训练

wormtql 18 Jan 08, 2023
Project ArXiv Citation Network

Project ArXiv Citation Network Overview This project involved the analysis of the ArXiv citation network. Usage The complete code of this project is i

Dennis Núñez-Fernández 5 Oct 20, 2022
For IBM Quantum Challenge 2021 (May 20 - 26)

IBM Quantum Challenge 2021 Introduction Commemorating the 40-year anniversary of the Physics of Computation conference, and 5-year anniversary of IBM

Qiskit Community 140 Jan 01, 2023
PyBrain - Another Python Machine Learning Library.

PyBrain -- the Python Machine Learning Library =============================================== INSTALLATION ------------ Quick answer: make sure you

2.8k Dec 31, 2022
Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF shows significant improvements over baseline fine-tuning without data filtration.

Information Gain Filtration Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF sho

4 Jul 28, 2022
Implementation of the paper "Generating Symbolic Reasoning Problems with Transformer GANs"

Generating Symbolic Reasoning Problems with Transformer GANs This is the implementation of the paper Generating Symbolic Reasoning Problems with Trans

Reactive Systems Group 1 Apr 18, 2022
Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search

Breaking the Curse of Space Explosion: Towards Effcient NAS with Curriculum Search Pytorch implementation for "Breaking the Curse of Space Explosion:

guoyong 17 Jan 03, 2023
Distance Encoding for GNN Design

Distance-encoding for GNN design This repository is the official PyTorch implementation of the DEGNN and DEAGNN framework reported in the paper: Dista

172 Nov 08, 2022
Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

PyTorch-High-Res-Stereo-Depth-Estimation Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch. Stereo dep

Ibai Gorordo 26 Nov 24, 2022
Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems

AequeVox Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems README under development. Python Packages Required

Sai Sathiesh 2 Aug 28, 2022
Learning to Simulate Dynamic Environments with GameGAN (CVPR 2020)

Learning to Simulate Dynamic Environments with GameGAN PyTorch code for GameGAN Learning to Simulate Dynamic Environments with GameGAN Seung Wook Kim,

199 Dec 26, 2022
PyTorch 1.0 inference in C++ on Windows10 platforms

Serving PyTorch Models in C++ on Windows10 platforms How to use Prepare Data examples/data/train/ - 0 - 1 . . . - n examples/data/test/

Henson 88 Oct 15, 2022
Tool for working with Y-chromosome data from YFull and FTDNA

ycomp ycomp is a tool for working with Y-chromosome data from YFull and FTDNA. Run ycomp -h for information on how to use the program. Installation Th

Alexander Regueiro 2 Jun 18, 2022
CVPR 2021: "The Spatially-Correlative Loss for Various Image Translation Tasks"

Spatially-Correlative Loss arXiv | website We provide the Pytorch implementation of "The Spatially-Correlative Loss for Various Image Translation Task

Chuanxia Zheng 89 Jan 04, 2023
PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning"

deepGCFX PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning" Pr

Thilini Cooray 4 Aug 11, 2022
Adaptive Attention Span for Reinforcement Learning

Adaptive Transformers in RL Official implementation of Adaptive Transformers in RL In this work we replicate several results from Stabilizing Transfor

100 Nov 15, 2022