Improving Compound Activity Classification via Deep Transfer and Representation Learning

Overview

Improving Compound Activity Classification via Deep Transfer and Representation Learning

This repository is the official implementation of Improving Compound Activity Classification via Deep Transfer and Representation Learning.

Requirements

Operating systems: Red Hat Enterprise Linux Server 7.9

To install requirements:

pip install -r requirements.txt

Installation guide

Download the code and dataset with the command:

git clone https://github.com/ninglab/TransferAct.git

Data Processing

1. Use provided processed dataset

One can use our provided processed dataset in ./data/pairs/: the dataset of pairs of processed balanced assays $\mathcal{P}$ . Check the details of bioassay selection, processing, and assay pair selection in our paper in Section 5.1.1 and Section 5.1.2, respectively. We provided our dataset of pairs as data/pairs.tar.gz compressed file. Please use tar to de-compress it.

2. Use own dataset

We provide necessary scripts in ./data/scripts/ with the processing steps in ./data/scripts/README.md.

Training

1. Running TAc

  • To run TAc-dmpn,
python code/train_aada.py --source_data_path <source_assay_csv_file> --target_data_path <target_assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --alpha 1 --lamda 0 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance --mpn_shared
  • To run TAc-dmpna, add these arguments to the above command
--attn_dim 100 --aggregation self-attention --model aada_attention

source_data_path and target_data_path specify the path to the source and target assay CSV files of the pair, respectively. First line contains a header smiles,target. Each of the following lines are comma-separated with the SMILES in the 1st column and the 0/1 label in the 2nd column.

dataset_type specifies the type of task; always classification for this project.

extra_metrics specifies the list of evaluation metrics.

hidden_size specifies the dimension of the learned compound representation out of GNN-based feature generators.

depth specifies the number of message passing steps.

init_lr specifies the initial learning rate.

batch_size specifies the batch size.

ffn_hidden_size and ffn_num_layers specify the number of hidden units and layers, respectively, in the fully connected network used as the classifier.

epochs specifies the total number of epochs.

split_type specifies the type of data split.

crossval_index_file specifies the path to the index file which contains the indices of data points for train, validation and test split for each fold.

save_dir specifies the directory where the model, evaluation scores and predictions will be saved.

class_balance indicates whether to use class-balanced batches during training.

model specifies which model to use.

aggregation specifies which pooling mechanism to use to get the compound representation from the atom representations. Default set to mean: the atom-level representations from the message passing network are averaged over all atoms of a compound to yield the compound representation.

attn_dim specifies the dimension of the hidden layer in the 2-layer fully connected network used as the attention network.

Use python code/train_aada.py -h to check the meaning and default values of other parameters.

2. Running TAc-fc variants and ablations

  • To run Tac-fc,
python code/train_aada.py --source_data_path <source_assay_csv_file> --target_data_path <target_assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --local_discriminator_hidden_size 100 --local_discriminator_num_layers 2 --global_discriminator_hidden_size 100 --global_discriminator_num_layers 2 --epochs 40 --alpha 1 --lamda 1 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance --mpn_shared
  • To run TAc-fc-dmpna, add these arguments to the above command
--attn_dim 100 --aggregation self-attention --model aada_attention
Ablations
  • To run TAc-f, add --exclude_global to the above command.
  • To run TAc-c, add --exclude_local to the above command.
  • Adding both --exclude_local and --exclude_global is equivalent to running TAc.

3. Running Baselines

DANN

python code/train_aada.py --source_data_path <source_assay_csv_file> --target_data_path <target_assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --global_discriminator_hidden_size 100 --global_discriminator_num_layers 2 --epochs 40 --alpha 1 --lamda 1 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance --mpn_shared
  • To run DANN-dmpn, add --model dann to the above command.
  • To run DANN-dmpna, add --model dann_attention --attn_dim 100 --aggregation self-attention --model to the above command.

Run the following baselines from chemprop as follows:

FCN-morgan

python chemprop/train.py --data_path <assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --features_generator morgan --features_only --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance

FCN-morganc

python chemprop/train.py --data_path <assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --features_generator morgan_count --features_only --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance

FCN-dmpn

python chemprop/train.py --data_path <assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance

FCN-dmpna

Add the following to the above command:

--model mpnn_attention --attn_dim 100 --aggregation self-attention

For the above baselines, data_path specifies the path to the target assay CSV file.

FCN-dmpn(DT)

python chemprop/train.py --data_path <source_assay_csv_file> --target_data_path <target_assay_csv_file> --dataset_type classification --extra_metrics prc-auc precision recall accuracy f1_score  --hidden_size 25 --depth 4 --init_lr 1e-3 --batch_size 10 --ffn_hidden_size 100 --ffn_num_layers 2 --epochs 40 --split_type index_predetermined --crossval_index_file <index_file> --save_dir <chkpt_dir> --class_balance

FCN-dmpna(DT)

--model mpnn_attention --attn_dim 100 --aggregation self-attention

For FCN-dmpn(DT)and FCN-dmpna(DT), data_path and target_data_path specify the path to the source and target assay CSV files.

Use python chemprop/train.py -h to check the meaning of other parameters.

Testing

  1. To predict the labels of the compounds in the test set for Tac*, DANN methods:

    python code/predict.py --test_path <test_csv_file> --checkpoint_dir <chkpt_dir> --preds_path <pred_file>

    test_path specifies the path to a CSV file containing a list of SMILES and ground-truth labels. First line contains a header smiles,target. Each of the following lines are comma-separated with the SMILES in the 1st column and the 0/1 label in the 2nd column.

    checkpoint_dir specifies the path to the checkpoint directory where the model checkpoint(s) .pt filles are saved (i.e., save_dir during training).

    preds_path specifies the path to a CSV file where the predictions will be saved.

  2. To predict the labels of the compounds in the test set for other methods:

    python chemprop/predict.py --test_path <test_csv_file> --checkpoint_dir <chkpt_dir> --preds_path <pred_file>
    

Compound Prioritization using dmpna:

Please refer to the README.md in the comprank directory.

Owner
NingLab
NingLab
Json2Xml tool will help you convert from json COCO format to VOC xml format in Object Detection Problem.

JSON 2 XML All codes assume running from root directory. Please update the sys path at the beginning of the codes before running. Over View Json2Xml t

Nguyễn Trường Lâu 6 Aug 22, 2022
This repo is official PyTorch implementation of MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices(CVPRW 2021).

Github Code of "MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices" Introduction This repo is official PyTorch implementatio

Choi Sang Bum 203 Jan 05, 2023
Our solution for SSN Invente 2021's Hackathon

Our solution for SSN Invente 2021's Hackathon. To help maitain godowns in a pristine and safe condition using raspberry pi.

1 Jan 12, 2022
moving object detection for satellite videos.

DSFNet: Dynamic and Static Fusion Network for Moving Object Detection in Satellite Videos Algorithm Introduction DSFNet: Dynamic and Static Fusion Net

xiaochao 39 Dec 16, 2022
[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

MuVER This repo contains the code and pre-trained model for our EMNLP 2021 paper: MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity

24 May 30, 2022
Code & Experiments for "LILA: Language-Informed Latent Actions" to be presented at the Conference on Robot Learning (CoRL) 2021.

LILA LILA: Language-Informed Latent Actions Code and Experiments for Language-Informed Latent Actions (LILA), for using natural language to guide assi

Sidd Karamcheti 11 Nov 25, 2022
BboxToolkit is a tiny library of special bounding boxes.

BboxToolkit is a light codebase collecting some practical functions for the special-shape detection, such as oriented detection

jbwang1997 73 Jan 01, 2023
Context-Sensitive Misspelling Correction of Clinical Text via Conditional Independence, CHIL 2022

cim-misspelling Pytorch implementation of Context-Sensitive Spelling Correction of Clinical Text via Conditional Independence, CHIL 2022. This model (

Juyong Kim 11 Dec 19, 2022
Unofficial Implement PU-Transformer

PU-Transformer-pytorch Pytorch unofficial implementation of PU-Transformer (PU-Transformer: Point Cloud Upsampling Transformer) https://arxiv.org/abs/

Lee Hyung Jun 7 Sep 21, 2022
Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks

Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks Stable Neural ODE with Lyapunov-Stable Equilibrium

Kang Qiyu 8 Dec 12, 2022
keyframes-CNN-RNN(action recognition)

keyframes-CNN-RNN(action recognition) Environment: python=3.7 pytorch=1.2 Datasets: Following the format of UCF101 action recognition. Run steps: Mo

4 Feb 09, 2022
A Graph Neural Network Tool for Recovering Dense Sub-graphs in Random Dense Graphs.

PYGON A Graph Neural Network Tool for Recovering Dense Sub-graphs in Random Dense Graphs. Installation This code requires to install and run the graph

Yoram Louzoun's Lab 0 Jun 25, 2021
Physical Anomalous Trajectory or Motion (PHANTOM) Dataset

Physical Anomalous Trajectory or Motion (PHANTOM) Dataset Description This dataset contains the six different classes as described in our paper[]. The

0 Dec 16, 2021
Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models Benchmark and Efficient Evaluation

Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models Benchmark and Efficient Evaluation This reposi

First Person Vision @ Image Processing Laboratory - University of Catania 1 Aug 21, 2022
an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

3d-ken-burns This is a reference implementation of 3D Ken Burns Effect from a Single Image [1] using PyTorch. Given a single input image, it animates

Simon Niklaus 1.4k Dec 28, 2022
FLSim a flexible, standalone library written in PyTorch that simulates FL settings with a minimal, easy-to-use API

Federated Learning Simulator (FLSim) is a flexible, standalone core library that simulates FL settings with a minimal, easy-to-use API. FLSim is domain-agnostic and accommodates many use cases such a

Meta Research 162 Jan 02, 2023
ROS-UGV-Control-Interface - Control interface which can be used in any UGV

ROS-UGV-Control-Interface Cam Closed: Cam Opened:

Ahmet Fatih Akcan 1 Nov 04, 2022
Pytorch implementation of "Geometrically Adaptive Dictionary Attack on Face Recognition" (WACV 2022)

Geometrically Adaptive Dictionary Attack on Face Recognition This is the Pytorch code of our paper "Geometrically Adaptive Dictionary Attack on Face R

6 Nov 21, 2022
Seach Losses of our paper 'Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search', accepted by ICLR 2021.

CSE-Autoloss Designing proper loss functions for vision tasks has been a long-standing research direction to advance the capability of existing models

Peidong Liu(刘沛东) 54 Dec 17, 2022
Code of the paper "Deep Human Dynamics Prior" in ACM MM 2021.

Code of the paper "Deep Human Dynamics Prior" in ACM MM 2021. Figure 1: In the process of motion capture (mocap), some joints or even the whole human

Shinny cui 3 Oct 31, 2022