The easiest tool for extracting radiomics features and training ML models on them.

Overview


ClassyRadiomics

License CI Build codecov

Simple pipeline for experimenting with radiomics features

Installation

git clone https://github.com/piotrekwoznicki/ClassyRadiomics.git
cd classrad
pip install -e .

Example - Hydronephrosis detection from CT images:

Extract radiomics features and save them to CSV table

df = pd.read_csv(table_dir / "paths.csv")
extractor = FeatureExtractor(
    df=df,
    out_path=(table_dir / "features.csv"),
    image_col="img_path",
    mask_col="seg_path",
    verbose=True,
)
extractor.extract_features()

Create a dataset from the features table

feature_df = pd.read_csv(table_dir / "features.csv")
data = Dataset(
    dataframe=feature_df,
    features=feature_cols,
    target=label_col="Hydronephrosis",
    task_name="Hydronephrosis detection"
)
data.cross_validation_split_test_from_column(
    column_name="cohort", test_value="control"
)

Select classifiers to compare

classifier_names = [
    "Gaussian Process Classifier",
    "Logistic Regression",
    "SVM",
    "Random Forest",
    "XGBoost",
]
classifiers = [MLClassifier(name) for name in classifier_names]

Create an evaluator to train and evaluate selected classifiers

evaluator = Evaluator(dataset=data, models=classifiers)
evaluator.evaluate_cross_validation()
evaluator.boxplot_by_class()
evaluator.plot_all_cross_validation()
evaluator.plot_test()
Comments
  • Preprocessing features fails during machine learning

    Preprocessing features fails during machine learning

    Describe the bug

    Trying to use Machine Learning in the self-hosted webapp, as well as in example_WORC.ipynb fails.

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    download data (it may take a few minutes)
    download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
    )
    
    from autorad.utils.preprocessing import get_paths_with_separate_folder_per_case
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")
    feature_df = extractor.run()
    
    feature_df.head()
    
    label_df = pd.read_csv(data_dir / "labels.csv")
    label_df.sample(5)
    
    from autorad.data.dataset import FeatureDataset
    
    merged_feature_df = feature_df.merge(label_df, left_on="ID",
        right_on="patient_ID", how="left")
    feature_dataset = FeatureDataset(
        merged_feature_df,
        target="diagnosis",
        ID_colname="ID"
    )
    
    splits_path = result_dir / "splits.json"
    feature_dataset.split(method="train_val_test", save_path=splits_path)
    
    from autorad.models.classifier import MLClassifier
    from autorad.training.trainer import Trainer
    
    models = MLClassifier.initialize_default_sklearn_models()
    print(models)
    
    trainer = Trainer(
        dataset=feature_dataset,
        models=models,
        result_dir=result_dir,
        experiment_name="Fibromatosis_vs_sarcoma_classification",
    )
    trainer.run_auto_preprocessing(
            selection_methods=["boruta"],
            oversampling=False,
            )
    
    

    Expected Results

    Initialising the trainer and running preprocessing on the features

    Actual Results

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [15], in <cell line: 7>()
          1 trainer = Trainer(
          2     dataset=feature_dataset,
          3     models=models,
          4     result_dir=result_dir,
          5     experiment_name="Fibromatosis_vs_sarcoma_classification",
          6 )
    ----> 7 trainer.run_auto_preprocessing(
          8         selection_methods=["boruta"],
          9         oversampling=False,
         10         )
    
    File ~/AutoRadiomics/autorad/training/trainer.py:78, in Trainer.run_auto_preprocessing(self, oversampling, selection_methods)
         70 preprocessor = Preprocessor(
         71     normalize=True,
         72     feature_selection_method=selection_method,
         73     oversampling_method=oversampling_method,
         74 )
         75 try:
         76     preprocessed[selection_method][
         77         oversampling_method
    ---> 78     ] = preprocessor.fit_transform(self.dataset.data)
         79 except AssertionError:
         80     log.error(
         81         f"Preprocessing with {selection_method} and {oversampling_method} failed."
         82     )
    
    File ~/AutoRadiomics/autorad/preprocessing/preprocessor.py:66, in Preprocessor.fit_transform(self, data)
         64 result_y = {}
         65 all_features = X.train.columns.tolist()
    ---> 66 X_train_trans, y_train_trans = self.pipeline.fit_transform(
         67     X.train, y.train
         68 )
         69 self.selected_features = self.pipeline["select"].selected_features(
         70     column_names=all_features
         71 )
         72 result_X["train"] = pd.DataFrame(
         73     X_train_trans, columns=self.selected_features
         74 )
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
        432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
        433 if hasattr(last_step, "fit_transform"):
    --> 434     return last_step.fit_transform(Xt, y, **fit_params_last_step)
        435 else:
        436     return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:47, in CoreSelector.fit_transform(self, X, y)
         44 def fit_transform(
         45     self, X: np.ndarray, y: np.ndarray
         46 ) -> tuple[np.ndarray, np.ndarray]:
    ---> 47     self.fit(X, y)
         48     return X[:, self.selected_columns], y
    
    File ~/AutoRadiomics/autorad/feature_selection/selector.py:124, in BorutaSelector.fit(self, X, y, verbose)
        122 with warnings.catch_warnings():
        123     warnings.simplefilter("ignore")
    --> 124     model.fit(X, y)
        125 self.selected_columns = np.where(model.support_)[0].tolist()
        126 if not self.selected_columns:
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:201, in BorutaPy.fit(self, X, y)
        188 def fit(self, X, y):
        189     """
        190     Fits the Boruta feature selection with the provided estimator.
        191 
       (...)
        198         The target values.
        199     """
    --> 201     return self._fit(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:251, in BorutaPy._fit(self, X, y)
        249 def _fit(self, X, y):
        250     # check input params
    --> 251     self._check_params(X, y)
        252     self.random_state = check_random_state(self.random_state)
        253     # setup variables for Boruta
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/boruta/boruta_py.py:517, in BorutaPy._check_params(self, X, y)
        513 """
        514 Check hyperparameters as well as X and y before proceeding with fit.
        515 """
        516 # check X and y are consistent len, X is Array and y is column
    --> 517 X, y = check_X_y(X, y)
        518 if self.perc <= 0 or self.perc > 100:
        519     raise ValueError('The percentile should be between 0 and 100.')
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:964, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
        961 if y is None:
        962     raise ValueError("y cannot be None")
    --> 964 X = check_array(
        965     X,
        966     accept_sparse=accept_sparse,
        967     accept_large_sparse=accept_large_sparse,
        968     dtype=dtype,
        969     order=order,
        970     copy=copy,
        971     force_all_finite=force_all_finite,
        972     ensure_2d=ensure_2d,
        973     allow_nd=allow_nd,
        974     ensure_min_samples=ensure_min_samples,
        975     ensure_min_features=ensure_min_features,
        976     estimator=estimator,
        977 )
        979 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
        981 check_consistent_length(X, y)
    
    File ~/miniconda3/envs/AutoRadiomics/lib/python3.10/site-packages/sklearn/utils/validation.py:746, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        744         array = array.astype(dtype, casting="unsafe", copy=False)
        745     else:
    --> 746         array = np.asarray(array, order=order, dtype=dtype)
        747 except ComplexWarning as complex_warning:
        748     raise ValueError(
        749         "Complex data not supported\n{}\n".format(array)
        750     ) from complex_warning
    
    ValueError: could not broadcast input array from shape (60,1015) into shape (60,)
    
    opened by wagon-master 3
  • BUG: Time and memory inefficient concating in pandas on every case.

    BUG: Time and memory inefficient concating in pandas on every case.

    In the feature extraction, we concat a pd.DataFrame for every case. AFAIK this construction of a pd.DataFrame leads to a new memory allocation (and copying) every time, which is highly memory inefficient. Especially, when parallelized on many CPUs, combined with the already memory intensive forking in joblib this can lead to OOM-Events (and is slow of course). Wouldn't it be more convenient to return only the feature set, that is currently processed. https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L109-L115 These are subsequently collected in results anyways: https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L135-L144

    opened by laqua-stack 2
  • Feature/add inference mlflow

    Feature/add inference mlflow

    Major changes:

    • fixed training with autologging of training parameters, preprocessor and classifier in MLFlow
    • webapp: added Predict subpage for inference on a single case, giving out class probability and Shap explanation
    • webapp: moved all steps into subpages
    • webapp: added Getting started in the landing page

    Fixes:

    • webapp: fixed extraction params discarding Feature Names selected from Feature Classes
    opened by pwoznicki 1
  • example_WORC.ipynb not being up to date with the repository

    example_WORC.ipynb not being up to date with the repository

    Describe the bug

    In example_WORC.ipynb there are function calls that do not work due to code in the repository being changed while the example_WORC.ipynb code wasn't updated to reflect those changes

    Steps/Code to Reproduce

    import pandas as pd
    from pathlib import Path
    from autorad.external.download_WORC import download_WORCDatabase
    
    # Set where we will save our data and results
    base_dir = Path.cwd() / "autorad_tutorial"
    data_dir = base_dir / "data"
    result_dir = base_dir / "results"
    data_dir.mkdir(exist_ok=True, parents=True)
    result_dir.mkdir(exist_ok=True, parents=True)
    
    %load_ext autoreload
    %autoreload 2
    
    download data (it may take a few minutes)
    download_WORCDatabase(
    dataset="Desmoid",
    data_folder=data_dir,
    n_subjects=100,
    )
    
    
    
    from autorad.data.utils import get_paths_with_separate_folder_per_case  # 1
    
    # create a table with all the paths
    paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    paths_df.sample(5)
    
    
    from autorad.data.dataset import ImageDataset
    from autorad.feature_extraction.extractor import FeatureExtractor
    import logging
    
    logging.getLogger().setLevel(logging.CRITICAL)
    
    image_dataset = ImageDataset(
        paths_df,
        ID_colname="ID",
        root_dir=data_dir,
    )
    
    # Let's take a look at the data, plotting random 10 cases
    image_dataset.plot_examples(n=10, window=None)
    
    extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") # 2
    feature_df = extractor.run()
    
    

    Expected Results

    1: Importing the function get_paths_with_separate_folder_per_case

    2: Using default_MR.yaml as value for extraction_params

    Actual Results

    1:

    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    Input In [7], in <cell line: 1>()
    ----> 1 from autorad.data.utils import get_paths_with_separate_folder_per_case
          3 # create a table with all the paths
          4 paths_df = get_paths_with_separate_folder_per_case(data_dir, relative=True)
    
    ModuleNotFoundError: No module named 'autorad.data.utils'
    

    2:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [18], in <cell line: 1>()
    ----> 1 extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml")
          2 feature_df = extractor.run()
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:41, in FeatureExtractor.__init__(self, dataset, feature_set, extraction_params, n_jobs)
         39 self.dataset = dataset
         40 self.feature_set = feature_set
    ---> 41 self.extraction_params = self._get_extraction_param_path(
         42     extraction_params
         43 )
         44 log.info(f"Using extraction params from {self.extraction_params}")
         45 self.n_jobs = set_n_jobs(n_jobs)
    
    File ~/AutoRadiomics/autorad/feature_extraction/extractor.py:55, in FeatureExtractor._get_extraction_param_path(self, extraction_params)
         53     result = default_extraction_param_dir / extraction_params
         54 else:
    ---> 55     raise ValueError(
         56         f"Extraction parameter file {extraction_params} not found."
         57     )
         58 return result
    
    ValueError: Extraction parameter file default_MR.yaml not found.
    

    Fix

    1: change from autorad.data.utils to from autorad.utils.preprocessing 2: change extractor = FeatureExtractor(image_dataset, extraction_params="default_MR.yaml") to extractor = FeatureExtractor(image_dataset, extraction_params="MR_default.yaml")

    opened by wagon-master 1
  • Bugfix/refactor

    Bugfix/refactor

    New features:

    • log feature dataset and splits in MLFlow
    • update docs & add getting-started

    Fixes:

    • fix evaluation in the web app
    • fix docs build in readthedocss
    opened by pwoznicki 0
  • Support various readers (Nibabel, ITK)

    Support various readers (Nibabel, ITK)

    Currently we use Nibabel for loading images. It works only for Nifti images, but a user may want to load a DICOM image, without converting it to Nifti.

    Consider using MONAI LoadImage() function that provides a common interface for loading both Nifti and DICOM images.

    enhancement 
    opened by pwoznicki 0
Releases(v0.2.2)
  • v0.2.2(Jul 30, 2022)

Owner
Piotr Woźnicki
Recently graduated medical doctor, working on medical image analysis.
Piotr Woźnicki
LightningFSL: Pytorch-Lightning implementations of Few-Shot Learning models.

LightningFSL: Few-Shot Learning with Pytorch-Lightning In this repo, a number of pytorch-lightning implementations of FSL algorithms are provided, inc

Xu Luo 76 Dec 11, 2022
Object-Centric Learning with Slot Attention

Slot Attention This is a re-implementation of "Object-Centric Learning with Slot Attention" in PyTorch (https://arxiv.org/abs/2006.15055). Requirement

Untitled AI 72 Jan 02, 2023
Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR

Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR

Kai Zhang 2k Dec 31, 2022
ParaGen is a PyTorch deep learning framework for parallel sequence generation

ParaGen is a PyTorch deep learning framework for parallel sequence generation. Apart from sequence generation, ParaGen also enhances various NLP tasks, including sequence-level classification, extrac

Bytedance Inc. 169 Dec 22, 2022
StarGAN v2-Tensorflow - Simple Tensorflow implementation of StarGAN v2

Official Tensorflow implementation Open ! - Clova AI StarGAN v2 — Un-official TensorFlow Implementation [Paper] [Pytorch] : Diverse Image Synthesis f

Junho Kim 110 Jul 02, 2022
Codes for our paper "SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge" (EMNLP 2020)

SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge Introduction SentiLARE is a sentiment-aware pre-trained language

74 Dec 30, 2022
StarGAN v2 - Official PyTorch Implementation (CVPR 2020)

StarGAN v2 - Official PyTorch Implementation StarGAN v2: Diverse Image Synthesis for Multiple Domains Yunjey Choi*, Youngjung Uh*, Jaejun Yoo*, Jung-W

Clova AI Research 3.1k Jan 09, 2023
Object Depth via Motion and Detection Dataset

ODMD Dataset ODMD is the first dataset for learning Object Depth via Motion and Detection. ODMD training data are configurable and extensible, with ea

Brent Griffin 172 Dec 21, 2022
This repository contains the code for: RerrFact model for SciVer shared task

RerrFact This repository contains the code for: RerrFact model for SciVer shared task. Setup for Inference 1. Download SciFact database Download the S

Ashish Rana 1 May 22, 2022
QTool: A Low-bit Quantization Toolbox for Deep Neural Networks in Computer Vision

This project provides abundant choices of quantization strategies (such as the quantization algorithms, training schedules and empirical tricks) for quantizing the deep neural networks into low-bit c

Monash Green AI Lab 51 Dec 10, 2022
Deep Residual Networks with 1K Layers

Deep Residual Networks with 1K Layers By Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Microsoft Research Asia (MSRA). Table of Contents Introduc

Kaiming He 856 Jan 06, 2023
Dungeons and Dragons randomized content generator

Component based Dungeons and Dragons generator Supports Entity/Monster Generation NPC Generation Weapon Generation Encounter Generation Environment Ge

Zac 3 Dec 04, 2021
Content shared at DS-OX Meetup

Streamlit-Projects Streamlit projects available in this repo: An introduction to Streamlit presented at DS-OX (Feb 26, 2020) meetup Streamlit 101 - Ja

Arvindra 69 Dec 23, 2022
Action Recognition for Self-Driving Cars

Action Recognition for Self-Driving Cars This repo contains the codes for the 2021 Fall semester project "Action Recognition for Self-Driving Cars" at

VITA lab at EPFL 3 Apr 07, 2022
This is the research repository for Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition.

Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition This is the research repository for Vid2

Future Interfaces Group (CMU) 26 Dec 24, 2022
A Lightweight Hyperparameter Optimization Tool 🚀

Lightweight Hyperparameter Optimization 🚀 The mle-hyperopt package provides a simple and intuitive API for hyperparameter optimization of your Machin

136 Jan 08, 2023
OMNIVORE is a single vision model for many different visual modalities

Omnivore: A Single Model for Many Visual Modalities [paper][website] OMNIVORE is a single vision model for many different visual modalities. It learns

Meta Research 451 Dec 27, 2022
Multi-objective gym environments for reinforcement learning.

MO-Gym: Multi-Objective Reinforcement Learning Environments Gym environments for multi-objective reinforcement learning (MORL). The environments follo

Lucas Alegre 74 Jan 03, 2023
NBEATSx: Neural basis expansion analysis with exogenous variables

NBEATSx: Neural basis expansion analysis with exogenous variables We extend the NBEATS model to incorporate exogenous factors. The resulting method, c

Cristian Challu 100 Dec 31, 2022
Pytorch implementation of Bert and Pals: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

PyTorch implementation of BERT and PALs Introduction Work by Asa Cooper Stickland and Iain Murray, University of Edinburgh. Code for BERT and PALs; mo

Asa Cooper Stickland 70 Dec 29, 2022