HiRID-ICU-Benchmark

This repository contains the resources needed to build the HiRID-ICU-Benchmark dataset, which is described in the accompanying manuscript.

We first introduce key resources that help to understand the structure and specifics of the data. We then detail the different features of our pipeline and how to use them, as shown in the figure below.

Figure

Key Resources

We build our work on previously released data, models, and metrics. To help users who might be unfamiliar with them, this section provides some related documentation.

HiRID data

We based our benchmark on a recent dataset in intensive care called HiRID. It is a freely accessible critical care dataset containing data from more than 33,000 patient admissions to the Department of Intensive Care Medicine, Bern University Hospital, Switzerland (ICU) from January 2008 to June 2016. It was first released as part of the circulatory Early Warning Score project.

First, you can find more details about the patient demographics in Appendix A: HiRID Dataset Details. For more details about the original data, however, it is better to refer to its latest documentation. In particular, the documentation contains the following sections of interest:

  • Getting started: This first section points to a Jupyter notebook to familiarize yourself with the data.
  • Data details: This second section contains a description of the variables in the dataset. To complement this section, you can refer to our varref.tsv, which we use to build the common version of the data.
  • Structure of the published data: This final section contains details about the structure of the raw data, which you will have to download and place in the hirid-data-root folder (see "Run Pre-Processing").
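To browse a tab-separated reference file such as varref.tsv, a minimal Python sketch could look like the following. It assumes the file has a header row; the helper name read_tsv is ours and is not part of the icu-benchmarks package, and no particular column names are assumed.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts.

    Assumption: the first line of the file holds the column names.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [dict(row) for row in reader]
```

For example, rows = read_tsv("preprocessing/resources/varref.tsv") followed by print(rows[0].keys()) would list whatever columns the file actually defines.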

Models

As for the data, in this benchmark we compare existing machine learning models that are commonly used for multivariate time-series data. To implement them, we use pytorch for the deep learning models, lightgbm for the boosted tree approaches, and sklearn for the logistic regression model and the metrics. For the deep learning approaches, we used the following models:

Metrics

In our benchmark we use different metrics depending on the task. All implementations come from sklearn, which documents their usage well:

Setup

In the following we assume a Linux installation; however, other platforms may also work.

  1. Install Conda (see the official installation instructions)
  2. Clone this repository and change into the directory of the repository
  3. Run conda env update (creates an environment icu-benchmark)
  4. Run pip install -e .

Download Data

  1. Get access to the HiRID 1.1.1 dataset on PhysioNet. This entails
    1. getting a credentialed PhysioNet account
    2. submitting a usage request to the data depositor
  2. Once access is granted, download the following files
    1. reference_data.tar.gz
    2. observation_tables_parquet.tar.gz
    3. pharma_records_parquet.tar.gz
  3. Unpack the files into the same directory, e.g. using cat *.tar.gz | tar zxvf - -i
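If you prefer to unpack the archives from Python rather than from the shell, the step above can be sketched with the standard tarfile module. This is an illustrative alternative, not part of the repository; the function name unpack_archives and both directory arguments are our own.

```python
import tarfile
from pathlib import Path


def unpack_archives(download_dir, target_dir):
    """Extract every .tar.gz archive found in download_dir into target_dir.

    Returns the names of the archives that were extracted.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(download_dir).glob("*.tar.gz")):
        # "r:gz" opens a gzip-compressed tar archive for reading.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=target)
        extracted.append(archive.name)
    return extracted
```

Only use this on archives you downloaded from a trusted source, since extracting a tar archive writes whatever paths it contains.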

How to Run

Run Pre-Processing

Activate the conda environment using conda activate icu-benchmark. Then run:

icu-benchmarks preprocess --hirid-data-root [path to unpacked parquet files as downloaded from physionet] \
                          --work-dir [output directory] \
                          --var-ref-path ./preprocessing/resources/varref.tsv \
                          --split-path ./preprocessing/resources/split.tsv \
                          --nr-workers 8

The above command requires about 6 GB of RAM per core and approximately 30 GB of disk space in total.
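As a rough guide for choosing --nr-workers, the memory figure above can be turned into a small calculation. The sketch below assumes that each worker occupies one core and needs about 6 GB; the function name and its arguments are hypothetical and not part of the pipeline.

```python
def max_workers(available_ram_gb, cpu_cores, ram_per_worker_gb=6):
    """Largest worker count that fits both the RAM budget and the core count.

    Assumption: one worker per core, each needing ram_per_worker_gb of RAM.
    Returns 0 if there is not enough RAM for a single worker.
    """
    by_ram = int(available_ram_gb // ram_per_worker_gb)
    return min(by_ram, cpu_cores)
```

Under these assumptions, a machine with 32 GB of RAM and 8 cores would be limited to 5 workers by memory, while 64 GB would allow all 8 cores to be used.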

Run Training

Custom training

To run a custom training, activate the conda environment using conda activate icu-benchmark. Then run:

icu-benchmarks train -c [path to gin config] \
                     -l [path to logdir] \
                     -t [task name] \
                     -sd [seed number] 

The task name should be one of the following: Mortality_At24Hours, Dynamic_CircFailure_12Hours, Dynamic_RespFailure_12Hours, Dynamic_UrineOutput_2Hours_Reg, Phenotyping_APACHEGroup or Remaining_LOS_Reg. For an example of a gin-config file, please refer to ./configs/. You can also consult the gin-config documentation directly.

The training command creates a new directory [path to logdir]/[task name]/[seed number]/ containing:

  • val_metrics.pkl and test_metrics.pkl: Pickle files with the model's performance on the validation and test sets, respectively.
  • train_config.gin: The so-called "operative" config, which saves the configuration used at training.
  • model.(torch/txt/joblib): The weights of the model that was trained. The extension depends on the model type.
  • tensorboard/: (Optional) Directory with tensorboard logs. You can run tensorboard --logdir ./tensorboard to visualize them.
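The two metrics files listed above can be inspected from Python with the standard pickle module. The sketch below is a minimal example; the helper name load_metrics is ours, and the structure of the unpickled objects is not documented here, so it simply returns whatever each file contains.

```python
import pickle
from pathlib import Path


def load_metrics(run_dir):
    """Load val_metrics.pkl and test_metrics.pkl from one run directory.

    Returns a dict with the keys "val" and "test". The type of each value
    depends on what the training code stored in the pickle files.
    """
    run_dir = Path(run_dir)
    metrics = {}
    for split in ("val", "test"):
        with open(run_dir / f"{split}_metrics.pkl", "rb") as handle:
            metrics[split] = pickle.load(handle)
    return metrics
```

Here run_dir would be a directory of the form [path to logdir]/[task name]/[seed number]/. Only unpickle files that you produced yourself or that come from a trusted source.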

Reproduce experiments from the paper

If you are interested in reproducing the experiments from the paper, you can directly use the pre-built scripts in ./run_scripts/. For instance, you can run the following command to reproduce the GRU baseline on the Mortality task:

sh run_scripts/baselines/Mortality_At24Hours/GRU.sh

As with custom training, this creates a directory containing the files mentioned above. The pre-built scripts are divided into four categories:

  • baselines: This folder contains scripts to reproduce the main benchmark experiment. Each of them will run a model with the best parameters we found using a random search for 10 identical seeds.
  • ablations: This folder contains the scripts to reproduce the ablation studies on the horizon, sequence length, and weighting.
  • random-search: Each run of such a script performs one instance of a random search. This means that if you want a k-run search, you need to run it k times.
  • pretrained: This last type of script allows us to evaluate pretrained models from our experiments. We discuss them in more detail in the next section.
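If you want to repeat a custom training over several seeds, in the spirit of the multi-seed baseline scripts, the train command can be generated programmatically. The sketch below only uses the -c, -l, -t and -sd options shown earlier; the function names and the example config path are hypothetical.

```python
import shlex


def build_train_commands(config_path, log_dir, task, seeds):
    """Return one `icu-benchmarks train` argument list per seed."""
    return [
        [
            "icu-benchmarks", "train",
            "-c", str(config_path),
            "-l", str(log_dir),
            "-t", task,
            "-sd", str(seed),
        ]
        for seed in seeds
    ]


def as_shell(command):
    """Render an argument list as a shell-safe command line."""
    return shlex.join(command)
```

Each returned list can be passed to subprocess.run(command, check=True), or printed with as_shell to obtain lines for a shell script, for example for the (made-up) config path configs/example.gin and seeds 1, 2 and 3.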

Run Evaluation of Pretrained Models

Custom Evaluation

As with training a model, you can evaluate any previously trained model using the evaluate command as follows:

icu-benchmarks evaluate -c [path to gin config] \
                        -l [path to logdir] \
                        -t [task name]

This command will evaluate the model at [path to logdir]/[task name]/model.(torch/txt/joblib) on the test set of the dataset provided in the config. Results are saved to a test_metrics.pkl file.
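Before calling evaluate, it can be useful to check that a model file with one of the three extensions actually exists at the expected location. The following sketch does this check; the helper find_model_file is ours and is not provided by the icu-benchmarks CLI.

```python
from pathlib import Path

# Extensions named in this README for the saved model weights.
MODEL_EXTENSIONS = ("torch", "txt", "joblib")


def find_model_file(log_dir, task):
    """Return the first model.<ext> found under log_dir/task, or None."""
    base = Path(log_dir) / task
    for ext in MODEL_EXTENSIONS:
        candidate = base / f"model.{ext}"
        if candidate.is_file():
            return candidate
    return None
```

A return value of None indicates that no model file was found in that directory, which may mean the log directory or task name does not match the run you intended to evaluate.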

Evaluate Manuscript models

To either check the pre-processing pipeline outcome or simply reproduce the paper results, we provide weights for all models of the benchmark experiment in files/pretrained_weights. Please note that the data items in this repository use the git-lfs framework. You need to install git-lfs on your system to be able to download and access the pretrained weights.

Once this is done, you can evaluate any network by running:

sh ./run_scripts/pretrained/[task name]/[model name].sh

Note that we provide only one set of weights for each model which corresponds to the median performance among the 10 runs reported in the manuscript.

Run Pipeline on Simulated Data

We provide a small toy data set to test the processing pipeline and to give a rough impression of what the original data looks like. Since there are restrictions on accessing the HiRID data set, instead of publishing a small subset of the data, we generated a very simple simulated dataset based on some statistics aggregated from the full HiRID dataset. It is, however, not useful for data exploration or training: for example, the values are sampled independently from each other, so any structure between variables in the original data set is not represented.

The example data set is provided in files/fake_data. As with the original data, the preprocessing pipeline can be run using

icu-benchmarks preprocess --hirid-data-root files/fake_data --work-dir fake_data_wdir --var-ref-path preprocessing/resources/varref.tsv

Note that for this fake dataset some models cannot be trained successfully, as the training instances are degenerate. If you would like to explore the training part of our pipeline, you can work with the pretrained models as described above.

Dataset Generation

The data set was generated using the following command:

python -m icu_benchmarks.synthetic_data.generate_simple_fake_data files/dataset_stats/ files/fake_data/ --var-ref-path preprocessing/resources/varref.tsv

The script generate_simple_fake_data.py generates fake observation and pharma records in the following way: It first generates a series of timestamps where the difference between consecutive timestamps is sampled from the distribution of timestamp differences in the original dataset. Then, for every timestamp, a variableid/pharmaid is selected at random, also according to the distribution in the original dataset. Finally, we sample the values of a variable from a Gaussian with mean and standard deviation as observed in the original data. We then clip the values to fit the lower and upper bounds given in the varref table.
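The three sampling steps described above can be illustrated with a small standalone sketch. This is not the code of generate_simple_fake_data.py: the function name, the argument layout and the representation of the timestamp-difference distribution as a list of observed gaps are all assumptions made for illustration.

```python
import random


def generate_fake_records(n, time_diffs, id_weights, value_stats, bounds, seed=0):
    """Generate n (timestamp, variable_id, value) records.

    time_diffs:  observed gaps between consecutive timestamps (assumed format)
    id_weights:  {variable_id: relative frequency}
    value_stats: {variable_id: (mean, standard deviation)}
    bounds:      {variable_id: (lower bound, upper bound)}
    """
    rng = random.Random(seed)
    ids = list(id_weights)
    weights = [id_weights[i] for i in ids]
    records, timestamp = [], 0.0
    for _ in range(n):
        # 1. Advance time by a gap drawn from the observed gaps.
        timestamp += rng.choice(time_diffs)
        # 2. Pick a variable according to its relative frequency.
        var_id = rng.choices(ids, weights=weights, k=1)[0]
        # 3. Sample a Gaussian value, then clip it to the allowed range.
        mean, std = value_stats[var_id]
        lower, upper = bounds[var_id]
        value = min(max(rng.gauss(mean, std), lower), upper)
        records.append((timestamp, var_id, value))
    return records
```

Because every value is drawn independently of all others, records produced this way carry no relationship between variables, which matches the limitation of the simulated data noted earlier.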

The necessary statistics for sampling can be found in files/dataset_stats. They were generated using

python -m icu_benchmarks.synthetic_data.collect_stats [Path to the decompressed parquet data directory as published on physionet] files/dataset_stats/
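To give an idea of what such sampling statistics might contain, the sketch below summarises a list of (timestamp, variable id, value) records into timestamp differences, per-variable counts, and per-variable mean and standard deviation. It is an illustration only: the actual output format of the collect_stats module is not described here, and the function name summarise_records and the record layout are our assumptions.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def summarise_records(records):
    """Summarise (timestamp, variable_id, value) records for later sampling.

    Returns (time_diffs, counts, value_stats), where value_stats maps each
    variable id to its (mean, population standard deviation).
    """
    records = sorted(records, key=lambda record: record[0])
    timestamps = [t for t, _, _ in records]
    # Gaps between consecutive timestamps.
    time_diffs = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # How often each variable id occurs.
    counts = Counter(var_id for _, var_id, _ in records)
    values = defaultdict(list)
    for _, var_id, value in records:
        values[var_id].append(value)
    value_stats = {v: (mean(xs), pstdev(xs)) for v, xs in values.items()}
    return time_diffs, dict(counts), value_stats
```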

License

You can find the license for the original HiRID data here. Our code is licensed under the MIT License.

Owner
Biomedical Informatics at ETH Zurich