GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles

Related tags

Deep LearningGeoMol
Overview

GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles


This repository contains a method to generate 3D conformer ensembles directly from the molecular graph as described in our paper.

Requirements

  • python (version>=3.7.9)
  • pytorch (version>=1.7.0)
  • rdkit (version>=2020.03.2)
  • pytorch-geometric (version>=1.6.3)
  • networkx (version>=2.5.1)
  • pot (version>=0.7.0)

Installation

Data

Download and extract the GEOM dataset from the original source:

  1. wget https://dataverse.harvard.edu/api/access/datafile/4327252
  2. tar -xvf 4327252

Environment

Run make conda_env to create the conda environment. The script will request you to enter one of the supported CUDA versions listed here. The script uses this CUDA version to install PyTorch and PyTorch Geometric. Alternatively, you could manually follow the steps to install PyTorch Geometric here.

Usage

This should result in two different directories, one for each half of GEOM. You should place the qm9 conformers directory in the data/QM9/ directory and do the same for the drugs directory. This is all you need to train the model:

python train.py --data_dir data/QM9/qm9/ --split_path data/QM9/splits/split0.npy --log_dir ./test_run --n_epochs 250 --dataset qm9

Use the provided script to generate conformers. The test_csv arg should be a csv file with SMILES in the first column, and the number of conformers you want to generate in the second column. This will output a compressed dictionary of rdkit mols in the trained_model_dir directory (unless you provide the out arg):

python generate_confs.py --trained_model_dir trained_models/qm9/ --test_csv data/QM9/test_smiles.csv --dataset qm9

You can use the provided visualize_confs.ipynb jupyter notebook to visualize the generated conformers.

Additional comments

Training

To train the model, our code randomly samples files from the GEOM dataset and randomly samples conformers within those files. This is a lot of file I/O, which wasn't a huge issue for us when training, but could be an issue for others. If you're having issues with this, feel free to reach out, and I can help you reconfigure the code.

Some limitations

Currently, the model is hardcoded for atoms with a max of 4 neighbors. Since the dataset we train on didn't have atoms with more than 4 neighbors, we made this choice to speed up the code. In principle, the code can be adapted for something like a pentavalent phosphorus, but this wasn't a priority for us.

We can't deal with disconnected fragments (i.e. there is a "." in the SMILES).

This code will work poorly for macrocycles.

To ensure correct predictions, ALL tetrahedral chiral centers must be specified. There's probably a way to automate the specification of "rigid" chiral centers (e.g. in a fused ring), which I'll hopefully figure out soon, but I'm grad student with limited time :(

Feedback and collaboration

Code like this doesn't improve without feedback from the community. If you have comments/suggestions, please reach out to us! We're always happy to chat and provide input on how you can take this method to the next level.

code for paper -- "Seamless Satellite-image Synthesis"

Seamless Satellite-image Synthesis by Jialin Zhu and Tom Kelly. Project site. The code of our models borrows heavily from the BicycleGAN repository an

Light 14 Apr 05, 2022
Vision-Language Pre-training for Image Captioning and Question Answering

VLP This repo hosts the source code for our AAAI2020 work Vision-Language Pre-training (VLP). We have released the pre-trained model on Conceptual Cap

Luowei Zhou 373 Jan 03, 2023
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools

Hugging Face Optimum 🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools enabling maximum efficiency to t

Hugging Face 842 Dec 30, 2022
RefineMask (CVPR 2021)

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features (CVPR 2021) This repo is the official implementation of RefineMask:

Gang Zhang 191 Jan 07, 2023
Lorien: A Unified Infrastructure for Efficient Deep Learning Workloads Delivery

Lorien: A Unified Infrastructure for Efficient Deep Learning Workloads Delivery Lorien is an infrastructure to massively explore/benchmark the best sc

Amazon Web Services - Labs 45 Dec 12, 2022
Public Models considered for emotion estimation from EEG

Emotion-EEG Set of models for emotion estimation from EEG. Composed by the combination of two deep-learing models learning together (RNN and CNN) with

Victor Delvigne 21 Dec 23, 2022
The dynamics of representation learning in shallow, non-linear autoencoders

The dynamics of representation learning in shallow, non-linear autoencoders The package is written in python and uses the pytorch implementation to ML

Maria Refinetti 4 Jun 08, 2022
Short and long time series classification using convolutional neural networks

time-series-classification Short and long time series classification via convolutional neural networks In this project, we present a novel framework f

35 Oct 22, 2022
Code for Temporally Abstract Partial Models

Code for Temporally Abstract Partial Models Accompanies the code for the experimental section of the paper: Temporally Abstract Partial Models, Khetar

DeepMind 19 Jul 13, 2022
Experimenting with computer vision techniques to generate annotated image datasets from gameplay recordings automatically.

Experimenting with computer vision techniques to generate annotated image datasets from gameplay recordings automatically. The collected data will then be used to train a deep neural network that can

Martin Valchev 3 Apr 24, 2022
HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

HTSeq DEVS: https://github.com/htseq/htseq DOCS: https://htseq.readthedocs.io A Python library to facilitate programmatic analysis of data from high-t

HTSeq 57 Dec 20, 2022
Code for pre-training CharacterBERT models (as well as BERT models).

Pre-training CharacterBERT (and BERT) This is a repository for pre-training BERT and CharacterBERT. DISCLAIMER: The code was largely adapted from an o

Hicham EL BOUKKOURI 31 Dec 05, 2022
Code for Overinterpretation paper Overinterpretation reveals image classification model pathologies

Overinterpretation This repository contains the code for the paper: Overinterpretation reveals image classification model pathologies Authors: Brandon

Gifford Lab, MIT CSAIL 17 Dec 10, 2022
Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

Graph-to-Graph Transformers Self-attention models, such as Transformer, have been hugely successful in a wide range of natural language processing (NL

Idiap Research Institute 40 Aug 14, 2022
Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning accelerators for distributed training using the Ray distributed

166 Dec 27, 2022
TRACER: Extreme Attention Guided Salient Object Tracing Network implementation in PyTorch

TRACER: Extreme Attention Guided Salient Object Tracing Network This paper was accepted at AAAI 2022 SA poster session. Datasets All datasets are avai

Karel 118 Dec 29, 2022
Kaggleship: Kaggle Notebooks

Kaggleship: Kaggle Notebooks This repository contains my Kaggle notebooks. They are generally about data science, machine learning, and deep learning.

Erfan Sobhaei 1 Jan 25, 2022
A PyTorch implementation of a Factorization Machine module in cython.

fmpytorch A library for factorization machines in pytorch. A factorization machine is like a linear model, except multiplicative interaction terms bet

Jack Hessel 167 Jul 06, 2022
Volsdf - Volume Rendering of Neural Implicit Surfaces

Volume Rendering of Neural Implicit Surfaces Project Page | Paper | Data This re

Lior Yariv 221 Jan 07, 2023
CS50's Introduction to Artificial Intelligence Test Scripts

CS50's Introduction to Artificial Intelligence Test Scripts 🤷‍♂️ What's this? 🤷‍♀️ This repository contains Python scripts to automate tests for mos

Jet Kan 2 Dec 28, 2022