Cancer Drug Response Prediction via a Hybrid Graph Convolutional Network

Related tags

Deep LearningDeepCDR
Overview

DeepCDR

Cancer Drug Response Prediction via a Hybrid Graph Convolutional Network

This work has been accepted to ECCB2020 and was also published in the journal Bioinformatics.

model

DeepCDR is a hybrid graph convolutional network for cancer drug response prediction. It takes both multi-omics data of cancer cell lines and drug structure as inputs and predicts the drug sensitivity (binary or contineous IC50 value).

Requirements

  • Keras==2.1.4
  • TensorFlow==1.13.1
  • hickle >= 2.1.0

Installation

DeepCDR can be downloaded by

git clone https://github.com/kimmo1019/DeepCDR

Installation has been tested in a Linux/MacOS platform.

Instructions

We provide detailed step-by-step instructions for running DeepCDR model including data preprocessing, model training, and model test.

Model implementation

Step 1: Data Preparing

Three types of raw data are required to generate genomic mutation matrix, gene expression matrix and DNA methylation matrix from CCLE database.

CCLE_mutations.csv - Genomic mutation profile from CCLE database

CCLE_expression.csv - Gene expression profile from CCLE database

CCLE_RRBS_TSS_1kb_20180614.txt - DNA methylation profile from CCLE database

The three types of raw data genomic mutation file, gene expression file and DNA methylation file can be downloaded from CCLE database or from our provided Cloud Server.

After data preprocessed, the three following preprocessed files will be in located in data folder.

genomic_mutation_34673_demap_features.csv -- genomic mutation matrix where each column denotes mutation locus and each row denotes a cell line

genomic_expression_561celllines_697genes_demap_features.csv -- gene expression matrix where each column denotes a coding gene and each row denotes a cell line

genomic_methylation_561celllines_808genes_demap_features.csv -- DNA methylation matrix where each column denotes a methylation locus and each row denotes a cell line

We recommend to start from the preprocessed data. Please note that each preprocessed file is in csv format, of which the column and row name are provided to speficy mutation location, gene name, methylation location and corresponding Cell line.

Step 2: Drug feature representation

Each drug in our study will be represented as a graph containing nodes and edges. From the GDSC database, we collected 223 drugs that have unique Pubchem ids. Note that a drug under different screening condition (different GDSC drug id) may share the same Pubchem id. Here, we used deepchem library for extracting node features and gragh of a drug. The node feature (75 dimension) corresponds to a stom in within a drug, which includes atom type, degree and hybridization, etc.

We recorded three types of features in a list as following

drug_feat = [node_feature, adj_list, degree_list]
node_feature - features of all atoms within a drug with size (nb_atom, 75)
adj_list - adjacent list of all atoms within a drug. It denotes the all the neighboring atoms indexs
degree_list - degree list of all atoms within a drug. It denotes the number of neighboring atoms 

The above feature list will be further compressed as pubchem_id.hkl using hickle library.

Please note that we provided the extracted features of 223 drugs from GDSC database, just unzip the drug_graph_feat.zip file in data/GDSC folder

Step 3: DeepCDR model training and testing

Here, we provide both DeepCDR regression and classification model here.

DeepCDR regression model

python run_DeepCDR.py -gpu_id [gpu_id] -use_mut [use_mut] -use_gexp [use_gexp] -use_methy [use_methy] 
[gpu_id] - set GPU card id (default:0)
[use_mut] - whether use genomic mutation data (default: True)
[use_gexp] - whether use gene expression data (default: True)
[use_methy] - whether use DNA methylation data (default: True)

One can run python run_DeepCDR.py -gpu_id 0 -use_mut True -use_gexp True -use_methy True to implement the DeepCDR regression model.

The trained model will be saved in data/checkpoint folder. The overall Pearson's correlation will be calculated.

DeepCDR classification model

python run_DeepCDR_classify.py -gpu_id [gpu_id] -use_mut [use_mut] -use_gexp [use_gexp] -use_methy [use_methy] 
[gpu_id] - set GPU card id (default:0)
[use_mut] - whether use genomic mutation data (default: True)
[use_gexp] - whether use gene expression data (default: True)
[use_methy] - whether use DNA methylation data (default: True)

One can run python run_DeepCDR_classify.py -gpu_id 0 -use_mut True -use_gexp True -use_methy True to implement the DeepCDR lassification model.

The trained model will be saved in data/checkpoint folder. The overall AUC and auPRn will be calculated.

External patient data

We also provided the external patient data downloaded from Firehose Broad GDAC. The patient data were preprocessed the same way as cell line data. The preprocessed data can be downloaded from our Server.

The preprocessed data contain three important files:

mut.csv - Genomic mutation profile of patients

expr.csv - Gene expression profile of patients

methy.csv - DNA methylation profile of patients

Note that the preprocessed patient data (csv format) have exact the same columns names as the three cell line data (genomic_mutation_34673_demap_features.csv, genomic_expression_561celllines_697genes_demap_features.csv, genomic_methylation_561celllines_808genes_demap_features.csv). The only difference is that the row name of patient data were replaced with patient unique barcode instead of cell line name.

Such format-consistent data is easy for external evaluation by repacing the cell line data with patient data.

Predicted missing data

As GDSC database only measured IC50 of part cell line and drug paires. We applied DeepCDR to predicted the missing IC50 values in GDSC database. The predicted results can be find at data/Missing_data_pre/records_pre_all.txt. Each record represents a predicted drug and cell line pair. The records were sorted by the predicted median IC50 values of a drug (see Fig.2E).

Contact

If you have any question regard our code or data, please do not hesitate to open a issue or directly contact me ([email protected])

Cite

If you used our work in your research, please consider citing our paper

Qiao Liu, Zhiqiang Hu, Rui Jiang, Mu Zhou, DeepCDR: a hybrid graph convolutional network for predicting cancer drug response, Bioinformatics, 2020, 36(2):i911-i918.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Owner
Qiao Liu
Qiao Liu
Public repo for the ICCV2021-CVAMD paper "Is it Time to Replace CNNs with Transformers for Medical Images?"

Is it Time to Replace CNNs with Transformers for Medical Images? Accepted at ICCV-2021: Workshop on Computer Vision for Automated Medical Diagnosis (C

Christos Matsoukas 80 Dec 27, 2022
DyNet: The Dynamic Neural Network Toolkit

The Dynamic Neural Network Toolkit General Installation C++ Python Getting Started Citing Releases and Contributing General DyNet is a neural network

Chris Dyer's lab @ LTI/CMU 3.3k Jan 06, 2023
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Mask R-CNN for Object Detection and Segmentation This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bound

Matterport, Inc 22.5k Jan 04, 2023
[CVPR 2020] Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

Contents Local and Global GAN Cross-View Image Translation Semantic Image Synthesis Acknowledgments Related Projects Citation Contributions Collaborat

Hao Tang 131 Dec 07, 2022
Human segmentation models, training/inference code, and trained weights, implemented in PyTorch

Human-Segmentation-PyTorch Human segmentation models, training/inference code, and trained weights, implemented in PyTorch. Supported networks UNet: b

Thuy Ng 474 Dec 19, 2022
The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019) News [2020/07/05] A very nice blog from Towards Data Science introd

Leo Xiao 3.9k Jan 05, 2023
Real-time pose estimation accelerated with NVIDIA TensorRT

trt_pose Want to detect hand poses? Check out the new trt_pose_hand project for real-time hand pose and gesture recognition! trt_pose is aimed at enab

NVIDIA AI IOT 803 Jan 06, 2023
Style transfer, deep learning, feature transform

FastPhotoStyle License Copyright (C) 2018 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons

NVIDIA Corporation 10.9k Jan 02, 2023
A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration.

A python package simulating the quasi-2D pseudospin-1/2 Gross-Pitaevskii equation with NVIDIA GPU acceleration. Introduction spinor-gpe is high-level,

2 Sep 20, 2022
Feup-csr - Repository holding my group's submission to the CSR project competition

CSR Competições de Swarm Robotics Swarm Robotics Competitions This repository holds the files submitted for the CSR project competition. Project group

Nuno Pereira 1 Jan 04, 2022
CVPR2021 Content-Aware GAN Compression

Content-Aware GAN Compression [ArXiv] Paper accepted to CVPR2021. @inproceedings{liu2021content, title = {Content-Aware GAN Compression}, auth

52 Nov 06, 2022
PyTorch Implementation of DSB for Score Based Generative Modeling. Experiments managed using Hydra.

Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling This repository contains the implementation for the paper Diffusion

James Thornton 50 Jan 03, 2023
Weakly-supervised object detection.

Wetectron Wetectron is a software system that implements state-of-the-art weakly-supervised object detection algorithms. Project CVPR'20, ECCV'20 | Pa

NVIDIA Research Projects 342 Jan 05, 2023
One Million Scenes for Autonomous Driving

ONCE Benchmark This is a reproduced benchmark for 3D object detection on the ONCE (One Million Scenes) dataset. The code is mainly based on OpenPCDet.

148 Dec 28, 2022
Compact Bidirectional Transformer for Image Captioning

Compact Bidirectional Transformer for Image Captioning Requirements Python 3.8 Pytorch 1.6 lmdb h5py tensorboardX Prepare Data Please use git clone --

YE Zhou 19 Dec 12, 2022
Dynamic Graph Event Detection

DyGED Dynamic Graph Event Detection Get Started pip install -r requirements.txt TODO Paper link to arxiv, and how to cite. Twitter Weather dataset tra

Mert Koşan 3 May 09, 2022
Pytorch implementation of "Forward Thinking: Building and Training Neural Networks One Layer at a Time"

forward-thinking-pytorch Pytorch implementation of Forward Thinking: Building and Training Neural Networks One Layer at a Time Requirements Python 2.7

Kim Heecheol 65 Oct 06, 2022
[ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

CLNER The code is for our ACL-IJCNLP 2021 paper: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning CLNER is a

71 Dec 08, 2022
Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

2 Dec 28, 2021
CLIP (Contrastive Language–Image Pre-training) for Italian

Italian CLIP CLIP (Radford et al., 2021) is a multimodal model that can learn to represent images and text jointly in the same space. In this project,

Italian CLIP 114 Dec 29, 2022