Algorithmic encoding of protected characteristics and its implications on disparities across subgroups

Last update: Oct 24, 2022

This repository contains the code for the paper

B. Glocker, S. Winzeck. Algorithmic encoding of protected characteristics and its implications on disparities across subgroups. 2021. under review. arXiv:2110.14755

Dataset

The CheXpert imaging dataset together with the patient demographic information used in this work can be downloaded from https://stanfordmlgroup.github.io/competitions/chexpert/.
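The image labels (train.csv) and the demographics (CHEXPERT DEMO.xlsx) are separate files that have to be linked per patient. The snippet below is a minimal, stdlib-only sketch of that join. It is not the logic of chexpert.sample.ipynb, and it rests on two assumptions:

- The image paths in the Path column contain an identifier of the form patient00001.
- The demographics table has a matching PATIENT column.

The helper names patient_id_from_path and join_demographics are hypothetical. In practice the spreadsheet would be read first, for example with pandas.read_excel, which uses openpyxl for .xlsx files.

```python
import re
from typing import Dict, List, Optional

# ASSUMPTION: image paths look like
# "CheXpert-v1.0-small/train/patient00001/study1/view1_frontal.jpg".
_PATIENT_RE = re.compile(r"(patient\d+)")


def patient_id_from_path(path: str) -> Optional[str]:
    """Extract the patient identifier from an image path (hypothetical helper)."""
    match = _PATIENT_RE.search(path)
    return match.group(1) if match else None


def join_demographics(
    images: List[Dict[str, str]],
    demographics: List[Dict[str, str]],
    id_column: str = "PATIENT",  # ASSUMPTION: ID column name in the demographics file
) -> List[Dict[str, str]]:
    """Attach demographic fields to each image row via the patient identifier.

    Image rows without a matching demographic record are dropped (inner join).
    """
    by_patient = {row[id_column]: row for row in demographics}
    joined = []
    for image in images:
        pid = patient_id_from_path(image["Path"])
        record = by_patient.get(pid)
        if record is None:
            continue  # no demographics for this patient
        merged = dict(image)
        # Copy demographic fields, but keep a single normalised ID column.
        merged.update({k: v for k, v in record.items() if k != id_column})
        merged["patient_id"] = pid
        joined.append(merged)
    return joined
```

Because the join is an inner join, images from patients missing in the demographics file silently disappear. Comparing row counts before and after the join is a cheap sanity check.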

Code

For running the code, we recommend setting up a dedicated Python environment.

Setup Python environment using conda

Create and activate a Python 3 conda environment:

conda create -n chexploration python=3
conda activate chexploration

Install PyTorch using conda:

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

Setup Python environment using virtualenv

Create and activate a Python 3 virtual environment:

virtualenv -p python3 <path_to_envs>/chexploration
source <path_to_envs>/chexploration/bin/activate

Install PyTorch using pip:

pip install torch torchvision

Install additional Python packages:

pip install matplotlib jupyter pandas seaborn pytorch-lightning scikit-learn scikit-image tensorboard tqdm openpyxl
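After installation, you can confirm that the environment is complete by checking that each package's import name resolves. This is a minimal sketch using importlib.util.find_spec from the standard library.

- The mapping from pip names to import names (for example scikit-learn to sklearn) is written out by hand.
- jupyter is left out because it is mainly used as a command-line tool.

```python
import importlib.util
from typing import Dict, List

# pip distribution name -> top-level import name for the packages installed above.
# torch and torchvision are included because they are installed in a separate step.
PACKAGES: Dict[str, str] = {
    "torch": "torch",
    "torchvision": "torchvision",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "seaborn": "seaborn",
    "pytorch-lightning": "pytorch_lightning",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "tensorboard": "tensorboard",
    "tqdm": "tqdm",
    "openpyxl": "openpyxl",
}


def missing_packages(packages: Dict[str, str] = PACKAGES) -> List[str]:
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```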

How to use

To replicate the results presented in the paper, follow these steps:

1. Download the CheXpert dataset and copy the file train.csv to the datafiles folder.
2. Download the CheXpert demographics data and copy the file CHEXPERT DEMO.xlsx to the datafiles folder.
3. Run the notebook chexpert.sample.ipynb to generate the study data.
4. Adjust the variable img_data_dir to point to the imaging data, then run the following scripts:
   - Run the script chexpert.disease.py to train a disease detection model.
   - Run the script chexpert.sex.py to train a sex classification model.
   - Run the script chexpert.race.py to train a race classification model.
5. Run the notebook chexpert.predictions.ipynb to evaluate all three prediction models.
6. Run the notebook chexpert.explorer.ipynb for the unsupervised exploration of feature representations.
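The evaluation step compares model performance across subgroups. The sketch below illustrates that kind of comparison; it is not the code in chexpert.predictions.ipynb.

- It computes ROC AUC separately for each subgroup from binary labels, scores and group labels.
- It uses the pairwise (Mann-Whitney) definition of AUC so that it runs with the standard library alone.
- The function names are hypothetical.

With real data, sklearn.metrics.roc_auc_score from the already-installed scikit-learn is the better choice. The quadratic loop here is only meant for small examples.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Labels must be 0 or 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(
    labels: Sequence[int], scores: Sequence[float], groups: Sequence[str]
) -> Dict[str, float]:
    """Compute ROC AUC separately for every subgroup label in `groups`."""
    buckets: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in buckets.items()}


def largest_gap(per_group: Dict[str, float]) -> float:
    """Difference between the best and worst subgroup AUC (one simple disparity measure)."""
    return max(per_group.values()) - min(per_group.values())
```

The largest gap is only one way to summarise disparities. Reporting every subgroup's AUC alongside it keeps small or unstable subgroups visible.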

Additionally, the scripts chexpert.sex.split.py and chexpert.race.split.py run SPLIT on the disease detection model. By default, all scripts train a DenseNet-121 on the training data from all patients. Results for models trained on a single subgroup can be produced by changing the paths to the datafiles (e.g., using full_sample_train_white.csv and full_sample_val_white.csv instead of full_sample_train.csv and full_sample_val.csv).
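The subgroup experiments differ from the default run only in which CSV files are loaded. The hypothetical helper below captures that naming pattern so that switching between all patients and a single subgroup is a single argument.

- The file names follow the example above.
- The default directory "datafiles" is an assumption.
- Subgroup suffixes other than "white" are assumptions too; check which files chexpert.sample.ipynb actually writes.

The training scripts set these paths themselves; this helper is not part of them.

```python
from pathlib import Path
from typing import Optional, Tuple


def datafile_paths(
    datafiles_dir: str = "datafiles",  # ASSUMPTION: location of the generated CSVs
    subgroup: Optional[str] = None,    # e.g. "white"; None means all patients
) -> Tuple[Path, Path]:
    """Return (train_csv, val_csv) for all patients or for a single subgroup.

    Pattern: full_sample_train.csv / full_sample_val.csv, or
    full_sample_train_<subgroup>.csv / full_sample_val_<subgroup>.csv.
    """
    suffix = f"_{subgroup}" if subgroup else ""
    base = Path(datafiles_dir)
    return (
        base / f"full_sample_train{suffix}.csv",
        base / f"full_sample_val{suffix}.csv",
    )
```

For example, datafile_paths(subgroup="white") yields the two "_white" files mentioned above, while datafile_paths() yields the default full-sample files.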

Note that the Python scripts also contain code for running the experiments with a ResNet-34 backbone, which requires less GPU memory than the default DenseNet-121.

Trained models

All trained models, feature embeddings and output predictions can be found here.

Funding sources

This work is supported through funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 757173, Project MIRA, ERC-2017-STG) and by the UKRI London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare.

License

This project is licensed under the Apache License 2.0.

Owner

Team MIRA - BioMedIA
