Code for "Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance" at NeurIPS 2021

Overview

Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance

Justin Lim, Christina X Ji, Michael Oberst, Saul Blecker, Leora Horwitz, and David Sontag. 2021. Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance. In Thirty-fifth Conference on Neural Information Processing Systems.

Individuals often make different decisions when faced with the same context, due to personal preferences and background. For instance, judges may vary in their leniency towards certain drug-related offenses, and doctors may vary in their preference for how to start treatment for certain types of patients. With these examples in mind, we present an algorithm for identifying types of contexts (e.g., types of cases or patients) with high inter-decision-maker disagreement. We formalize this as a causal inference problem, seeking a region where the assignment of decision-maker has a large causal effect on the decision. We give an iterative algorithm to find a region maximizing this objective and give a generalization bound for its performance. In a semi-synthetic experiment, we show that our algorithm recovers the correct region of disagreement accurately compared to baselines. Finally, we apply our algorithm to real-world healthcare datasets, recovering variation that aligns with existing clinical knowledge.

To run our algorithm, see run_semisynth_exp_recover_beta.ipynb for how to call IterativeRegionEstimator.py. The baselines and our model are also implemented in baselines.py. Helper functions (e.g. for evaluation) are in helpers.py.

Please refer to the following steps to reproduce the experiments and figures in this paper:

  1. To set-up the required packages, run create_env.sh, passing in a conda environment name. Then run source activate with the environment name to enter it.

  2. To run the semi-synthetic experiment,

    1. Download the criminal justice dataset from https://github.com/stanford-policylab/recidivism-predictions
    2. Process the data using data_processing/semisynth_process_data.ipynb.
    3. To run the iterative algorithm and baselines, run python3 run_baselines_on_semisynth.py with the product of the following arguments:
      1. type of model: Iterative, Direct, TarNet, ULearner, CausalForest
      2. number of agents: 2, 5, 10, 20, 40, 87 in our experiments
      3. subset: drug_possession, misdemeanor_under35
    4. Figures 1, 3, and 4 compare metrics for the methods. They can be produced by running plot_semisynth.ipynb.
    5. Figure 2 examines tuning the region size. run_semisynth_exp_recoverbeta.ipynb is a stand-alone notebook for reproducing it.
    6. Figures 5 and 6 examine convergence of the iterative algorithm. They can be produced by running plot_convergence.ipynb.
    7. Figures 7 and 8 examine how robust the iterative algorithm and direct baselines are to violations of the assumption that there are two agent groups. First, run python3 run_robustness_semisynth_experiment.py with the product of the following arguments:
      1. type of model: Iterative, Direct
      2. number of groups: 2, 3, 5, 10
      3. subset: drug_possession, misdemeanor_under35 Note that the number of agents is fixed at 40. The figures can then be produced by running plot_robustness.ipynb.
    8. Note: Helper code that is called to generate semi-synthetic data is located in semisynth_subsets.py, semisynth_dataloader.py, and semisynth_dataloader_robust.py.
  3. The real-world diabetes experiment uses proprietary data extracted using generate_t2dm_cohort.sql and first_line.sql.

    1. Select an outcome model from logistic regressions, decision trees, and random forests based on AUC, calibration, and partial dependence plots. Figure 9 and the statistics in Table 2 that guided our selection of a random forest outcome model are produced in select_outcome_model_for_diabetes_experiment.ipynb.
    2. The experiment is run with python3 run_baseline_models.py diabetes Iterative DecisionTree RandomForest. Figure 10b, the information needed to create Figures 10a, the statistics in Tables 1 and 3, and the fold consistency evaluation will be outputted.
    3. Note: Data loading helper functions, including how data is split, are located in real_data_loader.py. Most of the functions called to generate the output are located in realdata_analysis.py.
  4. The real-world Parkinson's experiment was run using open-access data.

    1. Download the data from https://www.ppmi-info.org/.
    2. Run python3 ppmi_feature_extraction.py passing in the directory containing the downloaded raw data and directory where processed data will be outputted.
    3. Manually process the treatment data to correct for typos in the drug name and treatment date
    4. Run process_parkinsons_data.ipynb to gather the data for the experiment.
    5. The experiment is run with python3 run_baseline_models.py ppmi Iterative DecisionTree. The information for creating Figure 11 and Table 4 are outputted.
Owner
Sontag Lab
Machine learning algorithms and applications to health care.
Sontag Lab
VM3000 Microphones

VM3000-Microphones This project was completed by Ricky Leman under the supervision of Dr Ben Travaglione and Professor Melinda Hodkiewicz as part of t

UWA System Health Lab 0 Jun 04, 2021
Some pvbatch (paraview) scripts for postprocessing OpenFOAM data

pvbatchForFoam Some pvbatch (paraview) scripts for postprocessing OpenFOAM data For every script there is a help message available: pvbatch pv_state_s

Morev Ilya 2 Oct 26, 2022
A python toolbox for predictive uncertainty quantification, calibration, metrics, and visualization

Website, Tutorials, and Docs    Uncertainty Toolbox A python toolbox for predictive uncertainty quantification, calibration, metrics, and visualizatio

Uncertainty Toolbox 1.4k Dec 28, 2022
Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation

Fine-Grained R2R Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP2020 paper Sub-Instruction Aware Vision-and-Language Navigation. C

YicongHong 34 Nov 15, 2022
A project to build an AI voice assistant using Python . The Voice assistant interacts with the humans to perform basic tasks.

AI_Personal_Voice_Assistant_Using_Python A project to build an AI voice assistant using Python . The Voice assistant interacts with the humans to perf

Chumui Tripura 1 Oct 30, 2021
OpenCV, MediaPipe Pose Estimation, Affine Transform for Icon Overlay

Yoga Pose Identification and Icon Matching Project Goal Detect yoga poses performed by a user and overlay a corresponding icon image. Running the main

Anna Garverick 1 Dec 03, 2021
DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data.

DWIPrep: A Robust Preprocessing Pipeline for dMRI Data DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data. The transp

Gal Ben-Zvi 1 Jan 09, 2023
Deeplab-resnet-101 in Pytorch with Jaccard loss

Deeplab-resnet-101 Pytorch with Lovász hinge loss Train deeplab-resnet-101 with binary Jaccard loss surrogate, the Lovász hinge, as described in http:

Maxim Berman 95 Apr 15, 2022
Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency[ECCV 2020]

Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency(ECCV 2020) This is an official python implementati

304 Jan 03, 2023
[NeurIPS2021] Code Release of K-Net: Towards Unified Image Segmentation

K-Net: Towards Unified Image Segmentation Introduction This is an official release of the paper K-Net:Towards Unified Image Segmentation. K-Net will a

Wenwei Zhang 423 Jan 02, 2023
Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer

VidLanKD Implementation of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer by Zineng Tang, Jaemin Cho, Hao Tan, Mohi

Zineng Tang 54 Dec 20, 2022
Part-Aware Data Augmentation for 3D Object Detection in Point Cloud

Part-Aware Data Augmentation for 3D Object Detection in Point Cloud This repository contains a reference implementation of our Part-Aware Data Augment

Jaeseok Choi 62 Jan 03, 2023
Lux AI environment interface for RLlib multi-agents

Lux AI interface to RLlib MultiAgentsEnv For Lux AI Season 1 Kaggle competition. LuxAI repo RLlib-multiagents docs Kaggle environments repo Please let

Jaime 12 Nov 07, 2022
Official PyTorch implementation of the paper "TEMOS: Generating diverse human motions from textual descriptions"

TEMOS: TExt to MOtionS Generating diverse human motions from textual descriptions Description Official PyTorch implementation of the paper "TEMOS: Gen

Mathis Petrovich 187 Dec 27, 2022
TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

912 Jan 08, 2023
Official Code for "Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning"

CMSF Official Code for "Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning" Requirements Python = 3.7.6 PyTorch

4 Nov 25, 2022
Implementation of ConvMixer for "Patches Are All You Need? 🤷"

Patches Are All You Need? 🤷 This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?" by Asher

CMU Locus Lab 934 Jan 08, 2023
Graph Analysis From Scratch

Graph Analysis From Scratch Goal In this notebook we wanted to implement some functionalities to analyze a weighted graph only by using algorithms imp

Arturo Ghinassi 0 Sep 17, 2022
RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection

RODD Official Implementation of 2022 CVPRW Paper RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection Introduction: Recent studie

Umar Khalid 17 Oct 11, 2022
Code for GNMR in ICDE 2021

GNMR Code for GNMR in ICDE 2021 Please unzip data files in Datasets/MultiInt-ML10M first. Run labcode_preSamp.py (with graph sampling) for ECommerce-c

7 Oct 27, 2022