Public Implementation of ChIRo from "Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations"

Last update: Dec 05, 2022

This directory contains the model architectures and experimental setups used for ChIRo, SchNet, DimeNet++, and SphereNet on the four tasks considered in the preprint:

Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations

These four tasks are:

1. Contrastive learning to cluster conformers of different stereoisomers in a learned latent space
2. Classification of chiral centers as R/S
3. Classification of the sign (+/-; l/d) of rotated circularly polarized light
4. Ranking enantiomers by their docking scores in an enantiosensitive protein pocket

The exact data splits used for tasks (1), (2), and (4) can be downloaded from:

https://figshare.com/s/e23be65a884ce7fc8543

See the appendix of "Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations" for details on how the datasets for task (3) were extracted and filtered from the commercial Reaxys database.

This directory is organized as follows:

- Subdirectory model/ contains the implementation of ChIRo.
  - model/alpha_encoder.py contains the network architecture of ChIRo.
  - model/embedding_functions.py contains the featurization of the input conformers (RDKit mol objects) for ChIRo.
  - model/datasets_samplers.py contains the PyTorch / PyTorch Geometric data samplers used for sampling conformers in each training batch.
  - model/train_functions.py and model/train_models.py contain supporting training/inference loops for each experiment with ChIRo.
  - model/optimization_functions.py contains the loss functions used in the experiments with ChIRo.
  - Subdirectory model/gnn_3D/ contains the implementations of SchNet, DimeNet++, and SphereNet used for each experiment.
    - model/gnn_3D/schnet.py contains the publicly available code for SchNet, with adaptations for readout.
    - model/gnn_3D/dimenet_pp.py contains the publicly available code for DimeNet++, with adaptations for readout.
    - model/gnn_3D/spherenet.py contains the publicly available code for SphereNet, with adaptations for readout.
    - model/gnn_3D/train_functions.py and model/gnn_3D/train_models.py contain the training/inference loops for each experiment with SchNet, DimeNet++, or SphereNet.
    - model/gnn_3D/optimization_functions.py contains the loss functions used in the experiments with SchNet, DimeNet++, or SphereNet.
- Subdirectory params_files/ contains the hyperparameters used to define exact network initializations for ChIRo, SchNet, DimeNet++, and SphereNet for each experiment. The parameter .json files specify a random seed of 1 and, for the l/d classification task, the first fold of cross-validation. For the experiments reported in the paper, we use random seeds 1, 2, and 3 when repeating experiments across three training/test trials.
- Subdirectory training_scripts/ contains the Python scripts to run each of the four experiments for each of the four 3D models (ChIRo, SchNet, DimeNet++, and SphereNet). Before running an experiment, move the corresponding training script to the parent directory.
- Subdirectory hyperopt/ contains hyperparameter optimization scripts for ChIRo using Ray Tune.
- Subdirectory experiment_analysis/ contains Jupyter notebooks for analyzing the results of each experiment.
- Subdirectory paper_results/ contains the parameter files, model parameter dictionaries, and loss curves for each experiment reported in the paper.
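
The step of placing a training script in the parent directory can be scripted. The sketch below is a minimal illustration, not part of the repository: the helper name stage_training_script is hypothetical, and it copies the script rather than moving it so that the original stays in training_scripts/.

```python
import shutil
from pathlib import Path


def stage_training_script(script_name, repo_root="."):
    """Copy training_scripts/<script_name> into the repository root.

    Hypothetical helper: copying (instead of moving) keeps the original
    file in training_scripts/. Returns the path of the staged copy.
    """
    root = Path(repo_root)
    source = root / "training_scripts" / script_name
    if not source.is_file():
        raise FileNotFoundError(f"No such training script: {source}")
    destination = root / script_name
    shutil.copy2(source, destination)  # copy2 also preserves file metadata
    return destination
```

For example, stage_training_script("training_binary_ranking.py") would place a copy of that script next to model/ and params_files/, assuming it is called from the repository root.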

To run each experiment, first create a conda environment with the following dependencies:

python = 3.8.6
pytorch = 1.7.0
torchaudio = 0.7.0
torchvision = 0.8.1
torch-geometric = 1.6.3
torch-cluster = 1.5.8
torch-scatter = 2.0.5
torch-sparse = 0.6.8
torch-spline-conv = 1.2.1
numpy = 1.19.2
pandas = 1.1.3
rdkit = 2020.09.4
scikit-learn = 0.23.2
matplotlib = 3.3.3
scipy = 1.5.2
sympy = 1.8
tqdm = 4.58.0
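
After creating the environment, it can help to confirm that the installed versions match the pins above. The following is a minimal sketch using the standard-library importlib.metadata; the PINNED dictionary covers only a subset of the list, and the distribution names are assumptions (for example, the conda package pytorch is reported as torch). rdkit is left out because conda builds may not expose package metadata in this form.

```python
from importlib import metadata

# Subset of the pinned versions listed above, keyed by distribution name.
# Assumption: these names match what the package metadata reports.
PINNED = {
    "torch": "1.7.0",
    "torch-geometric": "1.6.3",
    "torch-cluster": "1.5.8",
    "torch-scatter": "2.0.5",
    "torch-sparse": "0.6.8",
    "numpy": "1.19.2",
    "pandas": "1.1.3",
    "scikit-learn": "0.23.2",
    "scipy": "1.5.2",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {name: (expected, found)} for packages that differ from the pins.

    `found` is None when the package is not installed. A local build suffix
    such as "+cu110" is ignored when comparing versions.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling find_mismatches(PINNED) inside the environment returns an empty dictionary when every listed package matches its pin.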

Then, download the datasets (with exact training/validation/test splits) from https://figshare.com/s/e23be65a884ce7fc8543 and place them in a new directory named final_data_splits/.

You may then run each experiment by calling:

python training_{experiment}_{model}.py params_files/params_{experiment}_{model}.json {path_to_results_directory}/

For instance, you can run the docking experiment for ChIRo with a random seed of 1 (editable in the params .json file) by calling:

python training_binary_ranking.py params_files/params_binary_ranking_ChIRo.json results_binary_ranking_ChIRo/

After training, this will create a results directory containing model checkpoints, best model parameter dictionaries, and results on the test set (if applicable).
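
Because the seed is set in the params .json file, repeating an experiment with seeds 1, 2, and 3 means preparing one params file per seed. The sketch below shows one way to do that under stated assumptions: the helper names are hypothetical, and the key name "random_seed" is an assumption that should be checked against the actual params file before use.

```python
import json
from pathlib import Path


def write_seed_variants(params_path, seeds, seed_key="random_seed"):
    """Write one copy of a params .json file per seed; return the new paths.

    `seed_key` is an assumed key name: check the params file for the key
    that actually holds the random seed. Only top-level keys are handled.
    """
    params_path = Path(params_path)
    params = json.loads(params_path.read_text())
    if seed_key not in params:
        raise KeyError(f"{seed_key!r} not found in {params_path}")
    written = []
    for seed in seeds:
        variant = dict(params, **{seed_key: seed})  # copy with the seed replaced
        out_path = params_path.with_name(f"{params_path.stem}_seed{seed}.json")
        out_path.write_text(json.dumps(variant, indent=4))
        written.append(out_path)
    return written


def build_command(script, params_file, results_dir):
    """Assemble the command shown above as an argument list for subprocess.run."""
    results = str(results_dir).rstrip("/") + "/"  # keep the trailing slash
    return ["python", str(script), str(params_file), results]
```

Each returned path can then be passed to build_command together with the training script and a per-seed results directory, and the resulting list handed to subprocess.run.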
