Public Implementation of ChIRo from "Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations"

Related tags

Deep LearningChIRo
Overview

Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations

ScreenShot

This directory contains the model architectures and experimental setups used for ChIRo, SchNet, DimeNet++, and SphereNet on the four tasks considered in the preprint:

Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations

These four tasks are:

  1. Contrastive learning to cluster conformers of different stereoisomers in a learned latent space
  2. Classification of chiral centers as R/S
  3. Classification of the sign (+/-; l/d) of rotated circularly polarized light
  4. Ranking enantiomers by their docking scores in an enantiosensitive protein pocket.

The exact data splits used for tasks (1), (2), and (4) can be downloaded from:

https://figshare.com/s/e23be65a884ce7fc8543

See the appendix of "Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations" for details on how the datasets for task (3) were extracted and filtered from the commercial Reaxys database.


This directory is organized as follows:

  • Subdirectory model/ contains the implementation of ChIRo.

    • model/alpha_encoder.py contains the network architecture of ChIRo

    • model/embedding_functions.py contains the featurization of the input conformers (RDKit mol objects) for ChIRo.

    • model/datasets_samplers.py contains the Pytorch / Pytorch Geometric data samplers used for sampling conformers in each training batch.

    • model/train_functions.py and model/train_models.py contain supporting training/inference loops for each experiment with ChIRo.

    • model/optimization_functions.py contains the loss functions used in the experiments with ChIRo.

    • Subdirectory model/gnn_3D/ contains the implementations of SchNet, DimeNet++, and SphereNet used for each experiment.

      • model/gnn_3D/schnet.py contains the publicly available code for SchNet, with adaptations for readout.
      • model/gnn_3D/dimenet_pp.py contains the publicly available code for DimeNet++, with adaptations for readout.
      • model/gnn_3D/spherenet.py contains the publicly available code for SphereNet, with adaptations for readout.
      • model/gnn_3D/train_functions.py and model/gnn_3D/train_models.py contain the training/inference loops for each experiment with SchNet, DimeNet++, or SphereNet.
      • model/gnn_3D/optimization_functions.py contains the loss functions used in the experiments with SchNet, DimeNet++, or SphereNet.
  • Subdirectory params_files/ contains the hyperparameters used to define exact network initializations for ChIRo, SchNet, DimeNet++, and SphereNet for each experiment. The parameter .json files are specified with a random seed = 1, and the first fold of cross validation for the l/d classifcation task. For the experiments specified in the paper, we use random seeds = 1,2,3 when repeating experiments across three training/test trials.

  • Subdirectory training_scripts/ contains the python scripts to run each of the four experiments, for each of the four 3D models ChIRo, SchNet, DimeNet++, and SphereNet. Before running each experiment, move the corresponding training script to the parent directory.

  • Subdirectory hyperopt/ contains hyperparameter optimization scripts for ChIRo using Raytune.

  • Subdirectory experiment_analysis/ contains jupyter notebooks for analyzing results of each experiment.

  • Subdirectory paper_results/ contains the parameter files, model parameter dictionaries, and loss curves for each experiment reported in the paper.


To run each experiment, first create a conda environment with the following dependencies:

  • python = 3.8.6
  • pytorch = 1.7.0
  • torchaudio = 0.7.0
  • torchvision = 0.8.1
  • torch-geometric = 1.6.3
  • torch-cluster = 1.5.8
  • torch-scatter = 2.0.5
  • torch-sparce = 0.6.8
  • torch-spline-conv = 1.2.1
  • numpy = 1.19.2
  • pandas = 1.1.3
  • rdkit = 2020.09.4
  • scikit-learn = 0.23.2
  • matplotlib = 3.3.3
  • scipy = 1.5.2
  • sympy = 1.8
  • tqdm = 4.58.0

Then, download the datasets (with exact training/validation/test splits) from https://figshare.com/s/e23be65a884ce7fc8543 and place them in a new directory final_data_splits/

You may then run each experiment by calling:

python training_{experiment}_{model}.py params_files/params_{experiment}_{model}.json {path_to_results_directory}/

For instance, you can run the docking experiment for ChIRo with a random seed of 1 (editable in the params .json file) by calling:

python training_binary_ranking.py params_files/params_binary_ranking_ChIRo.json results_binary_ranking_ChIRo/

After training, this will create a results directory containing model checkpoints, best model parameter dictionaries, and results on the test set (if applicable).

Repositório para arquivos sobre o Módulo 1 do curso Top Coders da Let's Code + Safra

850-Safra-DS-ModuloI Repositório para arquivos sobre o Módulo 1 do curso Top Coders da Let's Code + Safra Para aprender mais Git https://learngitbranc

Brian Nunes 7 Dec 10, 2022
This repository contains the code for the binaural-detection model used in the publication arXiv:2111.04637

This repository contains the code for the binaural-detection model used in the publication arXiv:2111.04637 Dependencies The model depends on the foll

Jörg Encke 2 Oct 14, 2022
DeepStochlog Package For Python

DeepStochLog Installation Installing SWI Prolog DeepStochLog requires SWI Prolog to run. Run the following commands to install: sudo apt-add-repositor

KU Leuven Machine Learning Research Group 17 Dec 23, 2022
Predicting Student Attentiveness using OpenCV

Predicting-Student-Attentiveness-using-OpenCV The model will predict if a student is attentive or not through facial parameter received through the st

Johann Pinto 2 Aug 20, 2022
Dataset for the Research2Clinics @ NeurIPS 2021 Paper: What Do You See in this Patient? Behavioral Testing of Clinical NLP Models

Behavioral Testing of Clinical NLP Models This repository contains code for testing the behavior of clinical prediction models based on patient letter

Betty van Aken 2 Sep 20, 2022
Improving the robustness and performance of biomedical NLP models through adversarial training

RobustBioNLP Improving the robustness and performance of biomedical NLP models through adversarial training In this repository you can find suppliment

Milad Moradi 3 Sep 20, 2022
Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets.

Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets.

beringresearch 285 Jan 04, 2023
Data loaders and abstractions for text and NLP

torchtext This repository consists of: torchtext.datasets: The raw text iterators for common NLP datasets torchtext.data: Some basic NLP building bloc

3.2k Jan 08, 2023
a minimal terminal with python 😎😉

Meterm a terminal with python 😎 How to use Clone Project: $ git clone https://github.com/motahharm/meterm.git Run: in Terminal: meterm.exe Or pip ins

Motahhar.Mokfi 5 Jan 28, 2022
This repository contains the implementation of the HealthGen model, a generative model to synthesize realistic EHR time series data with missingness

HealthGen: Conditional EHR Time Series Generation This repository contains the implementation of the HealthGen model, a generative model to synthesize

0 Jan 20, 2022
How the Deep Q-learning method works and discuss the new ideas that makes the algorithm work

Deep Q-Learning Recommend papers The first step is to read and understand the method that you will implement. It was first introduced in a 2013 paper

1 Jan 25, 2022
Vehicles Counting using YOLOv4 + DeepSORT + Flask + Ngrok

A project for counting vehicles using YOLOv4 + DeepSORT + Flask + Ngrok

Duong Tran Thanh 37 Dec 16, 2022
MaskTrackRCNN for video instance segmentation based on mmdetection

MaskTrackRCNN for video instance segmentation Introduction This repo serves as the official code release of the MaskTrackRCNN model for video instance

411 Jan 05, 2023
Adversarially Learned Inference

Adversarially Learned Inference Code for the Adversarially Learned Inference paper. Compiling the paper locally From the repo's root directory, $ cd p

Mohamed Ishmael Belghazi 308 Sep 24, 2022
PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+

PaddlePaddle Vision Transformers State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 🤖 PaddlePaddle Visual Transformers (PaddleViT or

1k Dec 28, 2022
A python script to dump all the challenges locally of a CTFd-based Capture the Flag.

A python script to dump all the challenges locally of a CTFd-based Capture the Flag. Features Connects and logins to a remote CTFd instance. Dumps all

Podalirius 77 Dec 07, 2022
Implementation for paper: Self-Regulation for Semantic Segmentation

Self-Regulation for Semantic Segmentation This is the PyTorch implementation for paper Self-Regulation for Semantic Segmentation, ICCV 2021. Citing SR

Dong ZHANG 30 Nov 21, 2022
SimpleDepthEstimation - An unified codebase for NN-based monocular depth estimation methods

SimpleDepthEstimation Introduction This is an unified codebase for NN-based monocular depth estimation methods, the framework is based on detectron2 (

8 Dec 13, 2022
"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

SOLQ: Segmenting Objects by Learning Queries This repository is an official implementation of the paper SOLQ: Segmenting Objects by Learning Queries.

MEGVII Research 179 Jan 02, 2023
This code is a toolbox that uses Torch library for training and evaluating the ERFNet architecture for semantic segmentation.

ERFNet This code is a toolbox that uses Torch library for training and evaluating the ERFNet architecture for semantic segmentation. NEW!! New PyTorch

Edu 104 Jan 05, 2023