Official implementation of the method ContIG, for self-supervised learning from medical imaging with genomics

ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics

This is the code implementation of the paper "ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics".

If you find this repository useful, please consider citing our paper in your work:

@misc{contig2021,
      title={ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics}, 
      author={Aiham Taleb and Matthias Kirchler and Remo Monti and Christoph Lippert},
      year={2021},
      eprint={2111.13424},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

To run the experiments, you will have to have access to UK Biobank data (requires application) and will need to set up the data modalities properly.

Paths to external files are configured in paths.toml. Model checkpoints are stored in CHECKPOINTS_BASE_PATH ('checkpoints' by default). Some steps use the plink and plink2 software, which you can download from their respective websites. Unzip them and set the corresponding paths in paths.toml.
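Before running anything, it can help to check that the configured paths exist. The following is a minimal sketch, not part of the repository: it assumes paths.toml is a flat file of top-level string entries, and missing_paths is a hypothetical helper name.

```python
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API


def missing_paths(toml_text):
    """Return the names of string-valued entries whose path does not exist.

    Assumption: the file is flat (no tables); nested tables are ignored here.
    """
    config = tomllib.loads(toml_text)
    return sorted(
        key
        for key, value in config.items()
        if isinstance(value, str) and not Path(value).exists()
    )
```

For example, `missing_paths(Path("paths.toml").read_text())` would list the variables that still need to be set.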

Python

Install the dependencies via

conda env create --file environment.yml

Setting up image data

See image_preprocessing for the code. We first use resize.py to find the retinal fundus circle, crop to that part of the image, and then filter out the darkest and brightest images with filtering_images.py.
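The filtering step can be pictured roughly as follows. This is only an illustrative sketch: it assumes that a mean brightness has already been computed per image and that a fixed fraction is dropped at each end. The actual criterion in filtering_images.py may differ, and filter_by_brightness and drop_fraction are hypothetical names.

```python
def filter_by_brightness(brightness, drop_fraction=0.05):
    """Drop the darkest and brightest images.

    brightness: dict mapping an image name to its mean brightness.
    drop_fraction: share of images removed at EACH end (an assumption,
    not a value taken from the repository).
    Returns the kept image names, ordered from darkest to brightest.
    """
    ranked = sorted(brightness, key=brightness.get)
    k = int(len(ranked) * drop_fraction)
    return ranked[k:len(ranked) - k]
```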

After preprocessing the images, make sure to set BASE_IMG in paths.toml to the directory that contains the directories {left|right}/512_{left|right}/processed/.

Ancestry prediction

We only included individuals who were genetically most likely to be of European ancestry. We used the genotype-based prediction pipeline GenoPred (see its documentation); put the path to the output (a .model_pred file in tsv format) into the ANCESTRY variable in paths.toml.

This ancestry prediction can also be replaced by the UKB variable 22006. In this case, create a tsv file with two columns, IID and EUR; set EUR = 1 for Caucasians and EUR = 0 for others, and point the ANCESTRY variable in paths.toml to this file. Explicit ancestry prediction and the Caucasian variable are mostly identical, but our ancestry prediction is a little more lenient and includes a few more individuals.
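A file like this could be produced along the following lines. This is a sketch under assumptions: the column names eid and 22006-0.0 follow the usual layout of UKB phenotype exports but are not confirmed by this repository, a value of 1 in that field is taken to mean Caucasian, and write_ancestry_tsv is a hypothetical helper.

```python
import csv
import io


def write_ancestry_tsv(pheno_csv, out, id_col="eid", field_col="22006-0.0"):
    """Write a two-column tsv (IID, EUR) from an open phenotype csv file.

    EUR is 1 when the field equals "1" and 0 otherwise, including
    when the field is empty. Column names are assumptions.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["IID", "EUR"])
    for row in csv.DictReader(pheno_csv):
        value = (row.get(field_col) or "").strip()
        writer.writerow([row[id_col], 1 if value == "1" else 0])


# Small in-memory demonstration with made-up IDs.
demo_in = io.StringIO("eid,22006-0.0\n1001,1\n1002,\n")
demo_out = io.StringIO()
write_ancestry_tsv(demo_in, demo_out)
```

With real data, both arguments would be files opened with `open(..., newline="")`.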

Setting up genetic data

We use three different genetic modalities in the paper.

Setting up Raw SNPs

Raw SNPs work mostly without preprocessing and use the basic microarray data from UKB. Make sure to set the BASE_GEN path in paths.toml to the directory that contains all the bed/bim/fam files from the UKB.
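To verify that the BASE_GEN directory is complete, one option is to list the file prefixes for which all three plink files are present. This is a minimal sketch with a hypothetical helper name; it makes no assumption about how the UKB files are named beyond their suffixes.

```python
from pathlib import PurePath

PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def complete_filesets(filenames):
    """Return the sorted prefixes that have a .bed, .bim and .fam file."""
    found = {}
    for name in filenames:
        path = PurePath(name)
        if path.suffix in PLINK_SUFFIXES:
            found.setdefault(path.stem, set()).add(path.suffix)
    return sorted(
        stem for stem, suffixes in found.items()
        if suffixes == set(PLINK_SUFFIXES)
    )
```

For a real directory, the input could be `(p.name for p in pathlib.Path(base_gen).iterdir())`, where base_gen is the value of BASE_GEN.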

Setting up Polygenic Scores

PGS requires the imputed data. See the pgs directory for a reference on how to set everything up. Make sure to update BASE_PGS to point to the output directory of that setup. We also include a list of the scores used in the main paper.

Setting up Burden Scores

Burden scores are computed using the whole exome sequencing release from the UKB. We used faatpipe to preprocess this data; see there for details. Update the BASE_BURDEN variable in paths.toml to include the results (should point to a directory with combined_burdens_colnames.txt, combined_burdens_iid.txt and combined_burdens.h5).

Setting up phenotypic UKB data

Point the UKB_PHENO_FILE variable in paths.toml to the full phenotype csv file from the UKB data release, then run export_card() from data.data_ukb.py to preprocess the data. This only needs to be run once. There may be a bug with pandas >= 1.3 on some systems, so consider using pandas 1.2.5 for this step.

You can ignore the BLOOD_BIOMARKERS variable, since it's not used in any of the experiments.

Setting up downstream tasks

Download and unzip the downstream tasks from PALM, RFMiD and APTOS and point the {PALM|RFMID|APTOS}_PATH variables in paths.toml correspondingly.

UKB downstream tasks are set up with the main UKB set above.

Training self-supervised models

ContIG

To train models with our method ContIG, use the script train_contig.py. This script lets you set many of the constants used in training, such as IMG_SIZE, BATCH_SIZE, LR, CM_EMBEDDING_SIZE, GENETICS_MODALITY and many others. The default values at the beginning of the script are the ones we used for our reported results. Please make sure to set the paths to the datasets in paths.toml beforehand.

Baseline models

To train the baseline models, use the script named after the corresponding algorithm: SimCLR (simclr.py), NNCLR (nnclr.py), SimSiam (simsiam.py), Barlow Twins (barlow_twins.py), and BYOL (byol.py).

Each of these scripts allows setting all the relevant hyper-parameters for training these baselines, such as max_epochs, PROJECTION_DIM, TEMPERATURE, and others. Please make sure to set the paths to the datasets in paths.toml beforehand.

Evaluating Models

To fine-tune (=train) the models on downstream tasks, the following scripts are the starting points:

  • For APTOS Retinopathy detection: use aptos_diabetic_retinopathy.py
  • For RFMiD Multi-Disease classification: use rfmid_retinal_disease_classification.py
  • For PALM Myopia Segmentation: use palm_myopia_segmentation.py
  • For UK Biobank Cardiovascular discrete risk factors classification: use ukb_covariate_classification.py
  • For UK Biobank Cardiovascular continuous risk factors prediction (regression): use ukb_covariate_prediction.py

Each of the above scripts defines its hyper-parameters at the beginning of the file. A variable common to all of them is CHECKPOINT_PATH, whose default value is None. When it is None, the model is trained from scratch without loading any pretrained checkpoint. Otherwise, the encoder weights are loaded from the pretrained model at that path.
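The CHECKPOINT_PATH behaviour can be sketched in a framework-agnostic way. This is not the repository's loading code: the "encoder." key prefix, the load_state callable, and the function name are all assumptions made for illustration.

```python
def select_encoder_weights(checkpoint_path, load_state, prefix="encoder."):
    """Return encoder weights from a checkpoint, or None to train from scratch.

    checkpoint_path: None means no pretrained weights are used.
    load_state: callable that maps a path to a dict of parameter name -> value
        (hypothetical; with PyTorch this might wrap torch.load).
    prefix: assumed key prefix marking encoder parameters in the checkpoint.
    """
    if checkpoint_path is None:
        return None
    state = load_state(checkpoint_path)
    # Keep only encoder entries and strip the prefix so the keys
    # match a standalone encoder module.
    return {
        key[len(prefix):]: value
        for key, value in state.items()
        if key.startswith(prefix)
    }
```

In a PyTorch setting, a non-None result could then be passed to the encoder's `load_state_dict`.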

Running explanations

Global explanations

Global explanations are implemented in feature_explanations.py. See the final_plots function for an example to create explanations with specific models.

Local explanations

Local explanations are implemented in local_explanations.py. Individuals for which to create explanations can be set with the INDIVIDUALS variable. See the final_plots function for an example to create explanations with specific models.

Running the GWAS

The GWAS is implemented in downstream_gwas.py. You can specify models for which to run the GWAS in the WEIGHT_PATHS dict and then run the run_all_gwas function to iterate over this dict.

Owner
Digital Health & Machine Learning