The easiest tool for extracting radiomics features and training ML models on them.

ClassyRadiomics

Simple pipeline for experimenting with radiomics features

Installation

git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git
cd ClassyRadiomics
pip install -e .

Example - Hydronephrosis detection from CT images:

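Extract radiomics features and save them to a CSV table

import pandas as pd

# table_dir is assumed to be a pathlib.Path to the folder containing paths.csv
df = pd.read_csv(table_dir / "paths.csv")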
extractor = FeatureExtractor(
    df=df,
    out_path=(table_dir / "features.csv"),
    image_col="img_path",
    mask_col="seg_path",
    verbose=True,
)
extractor.extract_features()

Create a dataset from the features table

feature_df = pd.read_csv(table_dir / "features.csv")
# feature_cols is assumed to be the list of feature column names in feature_df
data = Dataset(
    dataframe=feature_df,
    features=feature_cols,
    target="Hydronephrosis",
    task_name="Hydronephrosis detection",
)
data.cross_validation_split_test_from_column(
    column_name="cohort", test_value="control"
)

Select classifiers to compare

classifier_names = [
    "Gaussian Process Classifier",
    "Logistic Regression",
    "SVM",
    "Random Forest",
    "XGBoost",
]
classifiers = [MLClassifier(name) for name in classifier_names]

Create an evaluator to train and evaluate selected classifiers

evaluator = Evaluator(dataset=data, models=classifiers)
evaluator.evaluate_cross_validation()
evaluator.boxplot_by_class()
evaluator.plot_all_cross_validation()
evaluator.plot_test()
Comments
  • Preprocessing features fails during machine learning

    Describe the bug

    Running the machine learning step fails both in the self-hosted web app and in example_WORC.ipynb.

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    # download data (it may take a few minutes)
    download_WORCDatabase(
        dataset="Desmoid",
        data_folder=data_dir,
        n_subjects=100,
    )
    
    from autorad.utils.preprocessing import get_paths_with_separate_folder_per_case
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")
    feature_df = extractor.run()
    
    feature_df.head()
    
    label_df = pd.read_csv(data_dir / "labels.csv")
    label_df.sample(5)
    
    from autorad.data.dataset import FeatureDataset
    
    merged_feature_df = feature_df.merge(label_df, left_on="ID",
        right_on="patient_ID", how="left")
    feature_dataset = FeatureDataset(
        merged_feature_df,
        target="diagnosis",
        ID_colname="ID"
    )
    
    splits_path = result_dir / "splits.json"
    feature_dataset.split(method="train_val_test", save_path=splits_path)
    
    from autorad.models.classifier import MLClassifier
    from autorad.training.trainer import Trainer
    
    models = MLClassifier.initialize_default_sklearn_models()
    print(models)
    
    trainer = Trainer(
        dataset=feature_dataset,
        models=models,
        result_dir=result_dir,
        experiment_name="Fibromatosis_vs_sarcoma_classification",
    )
    trainer.run_auto_preprocessing(
            selection_methods=["boruta"],
            oversampling=False,
            )
    
    

    Expected Results

    Initialising the trainer and running preprocessing on the features completes without errors.

    Actual Results

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [15], in <cell line: 7>()
          1 trainer = Trainer(
          2     dataset=feature_dataset,
          3     models=models,
          4     result_dir=result_dir,
          5     experiment_name="Fibromatosis_vs_sarcoma_classification",
          6 )
    ----> 7 trainer.run_auto_preprocessing(
          8         selection_methods=["boruta"],
          9         oversampling=False,
         10         )
    
    File ~/AutoRadiomics/autorad/training/trainer.py:78, in Trainer.run_auto_preprocessing(self, oversampling, selection_methods)
         70 preprocessor = Preprocessor(
         71     normalize=True,
         72     feature_selection_method=selection_method,
         73     oversampling_method=oversampling_method,
         74 )
         75 try:
         76     preprocessed[selection_method][
         77         oversampling_method
    ---> 78     ] = preprocessor.fit_transform(self.dataset.data)
         79 except AssertionError:
         80     log.error(
         81         f"Preprocessing with {selection_method} and {oversampling_method} failed."
         82     )
    
    File ~/AutoRadiomics/autorad/preprocessing/preprocessor.py:66, in Preprocessor.fit_transform(self, data)
         64 result_y = {}
         65 all_features = X.train.columns.tolist()
    ---> 66 X_train_trans, y_train_trans = self.pipeline.fit_transform(
         67     X.train, y.train
         68 )
         69 self.selected_features = self.pipeline["select"].selected_features(
         70     column_names=all_features
         71 )
         72 result_X["train"] = pd.DataFrame(
         73     X_train_trans, columns=self.selected_features
         74 )
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
        432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
        433 if hasattr(last_step, "fit_transform"):
    --> 434     return last_step.fit_transform(Xt, y, **fit_params_last_step)
        435 else:
        436     return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:47, in CoreSelector.fit_transform(self, X, y)
         44 def fit_transform(
         45     self, X: np.ndarray, y: np.ndarray
         46 ) -> tuple[np.ndarray, np.ndarray]:
    ---> 47     self.fit(X, y)
         48     return X[:, self.selected_columns], y
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:124, in BorutaSelector.fit(self, X, y, verbose)
        122 with warnings.catch_warnings():
        123     warnings.simplefilter("ignore")
    --> 124     model.fit(X, y)
        125 self.selected_columns = np.where(model.support_)[0].tolist()
        126 if not self.selected_columns:
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
        188 def fit(self, X, y):
        189     """
        190     Fits the Boruta feature selection with the provided estimator.
        191 
       (...)
        198         The target values.
        199     """
    --> 201     return self._fit(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:251, in BorutaPy._fit(self, X, y)
        249 def _fit(self, X, y):
        250     # check input params
    --> 251     self._check_params(X, y)
        252     self.random_state = check_random_state(self.random_state)
        253     # setup variables for Boruta
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:517, in BorutaPy._check_params(self, X, y)
        513 """
        514 Check hyperparameters as well as X and y before proceeding with fit.
        515 """
        516 # check X and y are consistent len, X is Array and y is column
    --> 517 X, y = check_X_y(X, y)
        518 if self.perc <= 0 or self.perc > 100:
        519     raise ValueError('The percentile should be between 0 and 100.')
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:964, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
        961 if y is None:
        962     raise ValueError("y cannot be None")
    --> 964 X = check_array(
        965     X,
        966     accept_sparse=accept_sparse,
        967     accept_large_sparse=accept_large_sparse,
        968     dtype=dtype,
        969     order=order,
        970     copy=copy,
        971     force_all_finite=force_all_finite,
        972     ensure_2d=ensure_2d,
        973     allow_nd=allow_nd,
        974     ensure_min_samples=ensure_min_samples,
        975     ensure_min_features=ensure_min_features,
        976     estimator=estimator,
        977 )
        979 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
        981 check_consistent_length(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        744         array = array.astype(dtype, casting="unsafe", copy=False)
        745     else:
    --> 746         array = np.asarray(array, order=order, dtype=dtype)
        747 except ComplexWarning as complex_warning:
        748     raise ValueError(
        749         "Complex data not supported\n{}\n".format(array)
        750     ) from complex_warning
    
    ValueError: could not broadcast input array from shape (60,1015) into shape (60,)
    
    opened by wagon-master 3
  • BUG: Time and memory inefficient concating in pandas on every case.

    In the feature extraction, we concatenate a pd.DataFrame for every case. AFAIK, constructing a pd.DataFrame this way leads to a new memory allocation (and copy) every time, which is highly memory inefficient. When parallelized across many CPUs, and combined with the already memory-intensive forking in joblib, this can lead to OOM events (and it is slow, of course). Wouldn't it be more convenient to return only the feature set that is currently being processed? https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L109-L115 These are subsequently collected in results anyway: https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L135-L144

    opened by laqua-stack 2
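The suggestion in the issue above (return only the current case's features and assemble the table once) can be sketched roughly as follows. This is a minimal illustration, not the AutoRadiomics implementation: extract_case and collect_features are hypothetical names, and the feature values are dummy numbers.

```python
import pandas as pd


def extract_case(case_id):
    """Hypothetical stand-in for per-case extraction; returns one flat dict."""
    return {"ID": case_id, "feature_a": case_id * 1.0, "feature_b": case_id * 2.0}


def collect_features(case_ids):
    # Gather plain dicts first; no DataFrame is built or concatenated in the loop.
    rows = [extract_case(case_id) for case_id in case_ids]
    # Assemble the table in a single step from the collected rows.
    return pd.DataFrame(rows)


feature_df = collect_features(range(3))
print(feature_df.shape)
```

The same pattern should carry over to a parallel run: each worker returns a small dict, and only the parent process builds the DataFrame.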
  • Feature/add inference mlflow

    Major changes:

    • fixed training with autologging of training parameters, preprocessor and classifier in MLFlow
    • webapp: added Predict subpage for inference on a single case, returning the class probability and a SHAP explanation
    • webapp: moved all steps into subpages
    • webapp: added Getting started to the landing page

    Fixes:

    • webapp: fixed extraction params discarding Feature Names selected from Feature Classes
    opened by pwoznicki 1
  • example_WORC.ipynb not being up to date with the repository

    Describe the bug

    Some function calls in example_WORC.ipynb no longer work: the code in the repository has changed, but the notebook was not updated to reflect those changes.

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    # download data (it may take a few minutes)
    download_WORCDatabase(
        dataset="Desmoid",
        data_folder=data_dir,
        n_subjects=100,
    )
    
    
    
    from autorad.data.utils import get_paths_with_separate_folder_per_case  # 1
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") # 2
    feature_df = extractor.run()
    
    

    Expected Results

    1: Importing the function get_paths_with_separate_folder_per_case

    2: Using default_MR.yaml as value for extraction_params

    Actual Results

    1:

    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    Input In [7], in <cell line: 1>()
    ----> 1 from autorad.data.utils import get_paths_with_separate_folder_per_case
          3 # create a table with all the paths
          4 paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    
    ModuleNotFoundError: No module named 'autorad.data.utils'
    

    2:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [18], in <cell line: 1>()
    ----> 1 extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml")
          2 feature_df = extractor.run()
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:41, in FeatureExtractor.__init__(self, dataset, feature_set, extraction_params, n_jobs)
         39 self.dataset = dataset
         40 self.feature_set = feature_set
    ---> 41 self.extraction_params = self._get_extraction_param_path(
         42     extraction_params
         43 )
         44 log.info(f"Using extraction params from {self.extraction_params}")
         45 self.n_jobs = set_n_jobs(n_jobs)
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:55, in FeatureExtractor._get_extraction_param_path(self, extraction_params)
         53     result = default_extraction_param_dir / extraction_params
         54 else:
    ---> 55     raise ValueError(
         56         f"Extraction parameter file {extraction_params} not found."
         57     )
         58 return result
    
    ValueError: Extraction parameter file default_MR.yaml not found.
    

    Fix

    1: Change the import from autorad.data.utils to autorad.utils.preprocessing.

    2: Change extraction_params="default_MR.yaml" to extraction_params="MR_default.yaml" in the FeatureExtractor call.

    opened by wagon-master 1
  • Bugfix/refactor

    New features:

    • log feature dataset and splits in MLFlow
    • update docs & add getting-started

    Fixes:

    • fix evaluation in the web app
    • fix docs build on Read the Docs
    opened by pwoznicki 0
  • Support various readers (Nibabel, ITK)

    Currently we use Nibabel for loading images. It works only for Nifti images, but a user may want to load a DICOM image, without converting it to Nifti.

    Consider using MONAI LoadImage() function that provides a common interface for loading both Nifti and DICOM images.

    enhancement 
    opened by pwoznicki 0
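To illustrate what a common loading interface, as requested in the last issue above, would need to decide, here is a minimal sketch that picks a reader backend from the file suffix. It is only an assumption about one possible design: READER_BY_SUFFIX and pick_reader are hypothetical names, the backend values are plain labels, and nothing here calls Nibabel, ITK, or MONAI.

```python
from pathlib import Path

# Hypothetical suffix-to-backend table; the values are labels, not real reader objects.
READER_BY_SUFFIX = {
    ".nii": "nibabel",
    ".nii.gz": "nibabel",
    ".dcm": "itk",
}


def pick_reader(path):
    """Return the backend label for a path, or raise ValueError for an unknown suffix."""
    name = Path(path).name.lower()
    # Try longer suffixes first so ".nii.gz" is matched as a whole.
    for suffix in sorted(READER_BY_SUFFIX, key=len, reverse=True):
        if name.endswith(suffix):
            return READER_BY_SUFFIX[suffix]
    raise ValueError(f"No reader registered for {path}")


print(pick_reader("case_01/image.nii.gz"))
print(pick_reader("case_02/slice_001.dcm"))
```

A library-provided loader would hide this dispatch from the user; the sketch only shows the decision that has to happen somewhere.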
Releases(v0.2.2)
  • v0.2.2(Jul 30, 2022)

Owner
Piotr Woźnicki
Recently graduated medical doctor, working on medical image analysis.