MolRep: A Deep Representation Learning Library for Molecular Property Prediction

Related tags

Deep LearningMolRep
Overview

MolRep: A Deep Representation Learning Library for Molecular Property Prediction

Summary

MolRep is a Python package for fairly measuring algorithmic progress on chemical property datasets. It currently provides a complete re-evaluation of 16 state-of-the-art deep representation models over 16 benchmark property datsaets.

architecture

If you found this package useful, please cite biorxiv for now:


Install & Usage

We provide a script to install the environment. You will need the conda package manager, which can be installed from here.

To install the required packages, follow there instructions (tested on a linux terminal):

  1. clone the repository

    git clone https://github.com/Jh-SYSU/MolRep

  2. cd into the cloned directory

    cd MolRep

  3. run the install script

    source install.sh [<your_cuda_version>]

Where <your_cuda_version> is an optional argument that can be either cpu, cu92, cu100, cu101. If you do not provide a cuda version, the script will default to cpu. The script will create a virtual environment named MolRep, with all the required packages needed to run our code. Important: do NOT run this command using bash instead of source!

Data

Data could be download from Google_Driver

Current Dataset

Dataset Task Task type #Molecule Splits Metric Reference
QM7 1 Regression 7160 Stratified MAE Wu et al.
QM8 12 Regression 21786 Random MAE Wu et al.
QM9 12 Regression 133885 Random MAE Wu et al.
ESOL 1 Regression 1128 Random RMSE Wu et al.
FreeSolv 1 Regression 642 Random RMSE Wu et al.
Lipophilicity 1 Regression 4200 Random RMSE Wu et al.
BBBP 1 Classification 2039 Scaffold ROC-AUC Wu et al.
Tox21 12 Classification 7831 Random ROC-AUC Wu et al.
SIDER 27 Classification 1427 Random ROC-AUC Wu et al.
ClinTox 2 Classification 1478 Random ROC-AUC Wu et al.
Liver injury 1 Classification 2788 Random ROC-AUC Xu et al.
Mutagenesis 1 Classification 6511 Random ROC-AUC Hansen et al.
hERG 1 Classification 4813 Random ROC-AUC Li et al.
MUV 17 Classification 93087 Random PRC-AUC Wu et al.
HIV 1 Classification 41127 Random ROC-AUC Wu et al.
BACE 1 Classification 1513 Random ROC-AUC Wu et al.

Methods

Current Methods

Self-/unsupervised Models

Methods Descriptions Reference
Mol2Vec Mol2Vec is an unsupervised approach to learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Jaeger et al.
N-Gram graph N-gram graph is a simple unsupervised representation for molecules that first embeds the vertices in the molecule graph and then constructs a compact representation for the graph by assembling the ver-tex embeddings in short walks in the graph. Liu et al.
FP2Vec FP2Vec is a molecular featurizer that represents a chemical compound as a set of trainable embedding vectors and combine with CNN model. Jeon et al.
VAE VAE is a framework for training two neural networks (encoder and decoder) to learn a mapping from high-dimensional molecular representation into a lower-dimensional space. Kingma et al.

Sequence Models

Methods Descriptions Reference
BiLSTM BiLSTM is an artificial recurrent neural network (RNN) architecture to encoding sequences from compound SMILES strings. Hochreiter et al.
SALSTM SALSTM is a self-attention mechanism with improved BiLSTM for molecule representation. Zheng et al
Transformer Transformer is a network based solely on attention mechanisms and dispensing with recurrence and convolutions entirely to encodes compound SMILES strings. Vaswani et al.
MAT MAT is a molecule attention transformer utilized inter-atomic distances and the molecular graph structure to augment the attention mechanism. Maziarka et al.

Graph Models

Methods Descriptions Reference
DGCNN DGCNN is a deep graph convolutional neural network that proposes a graph convolution model with SortPooling layer which sorts graph vertices in a consistent order to learning the embedding of molec-ular graph. Zhang et al.
GraphSAGE GraphSAGE is a framework for inductive representation learning on molecular graphs that used to generate low-dimensional representations for atoms and performs sum, mean or max-pooling neigh-borhood aggregation to updates the atom representation and molecular representation. Hamilton et al.
GIN GIN is the Graph Isomorphism Network that builds upon the limitations of GraphSAGE to capture different graph structures with the Weisfeiler-Lehman graph isomorphism test. Xu et al.
ECC ECC is an Edge-Conditioned Convolution Network that learns a different parameter for each edge label (bond type) on the molecular graph, and neighbor aggregation is weighted according to specific edge parameters. Simonovsky et al.
DiffPool DiffPool combines a differentiable graph encoder with its an adaptive pooling mechanism that col-lapses nodes on the basis of a supervised criterion to learning the representation of molecular graphs. Ying et al.
MPNN MPNN is a message-passing graph neural network that learns the representation of compound molecular graph. It mainly focused on obtaining effective vertices (atoms) embedding Gilmer et al.
D-MPNN DMPNN is another message-passing graph neural network that messages associated with directed edges (bonds) rather than those with vertices. It can make use of the bond attributes. Yang et al.
CMPNN CMPNN is the graph neural network that improve the molecular graph embedding by strengthening the message interactions between edges (bonds) and nodes (atoms). Song et al.

Training

To train a model by K-fold, run 5-fold-training_example.ipynb.

Testing

To test a pretrained model, run testing-example.ipynb.

Results

Results on Classification Tasks.

Datasets BBBP Tox21 SIDER ClinTox MUV HIV BACE
Mol2Vec 0.9213±0.0052 0.8139±0.0081 0.6043±0.0061 0.8572±0.0054 0.1178±0.0032 0.8413±0.0047 0.8284±0.0023
N-Gram graph 0.9012±0.0385 0.8371±0.0421 0.6482±0.0437 0.8753±0.0077 0.1011±0.0000 0.8378±0.0034 0.8472±0.0057
FP2Vec 0.8076±0.0032 0.8578±0.0076 0.6678±0.0068 0.8834±0.0432 0.0856±0.0031 0.7894±0.0052 0.8129±0.0492
VAE 0.8378±0.0031 0.8315±0.0382 0.6493±0.0762 0.8674±0.0124 0.0794±0.0001 0.8109±0.0381 0.8368±0.0762
BiLSTM 0.8391±0.0032 0.8279±0.0098 0.6092±0.0303 0.8319±0.0120 0.0382±0.0000 0.7962±0.0098 0.8263±0.0031
SALSTM 0.8482±0.0329 0.8253±0.0031 0.6308±0.0036 0.8317±0.0003 0.0409±0.0000 0.8034±0.0128 0.8348±0.0019
Transformer 0.9610±0.0119 0.8129±0.0013 0.6017±0.0012 0.8572±0.0032 0.0716±0.0017 0.8372±0.0314 0.8407±0.0738
MAT 0.9620±0.0392 0.8393±0.0039 0.6276±0.0029 0.8777±0.0149 0.0913±0.0001 0.8653±0.0054 0.8519±0.0504
DGCNN 0.9311±0.0434 0.7992±0.0057 0.6007±0.0053 0.8302±0.0126 0.0438±0.0000 0.8297±0.0038 0.8361±0.0034
GraphSAGE 0.9630±0.0474 0.8166±0.0041 0.6403±0.0045 0.9116±0.0146 0.1145±0.0000 0.8705±0.0724 0.9316±0.0360
GIN 0.8746±0.0359 0.8178±0.0031 0.5904±0.0000 0.8842±0.0004 0.0832±0.0000 0.8015±0.0328 0.8275±0.0034
ECC 0.9620±0.0003 0.8677±0.0090 0.6750±0.0092 0.8862±0.0831 0.1308±0.0013 0.8733±0.0025 0.8419±0.0092
DiffPool 0.8732±0.0391 0.8012±0.0130 0.6087±0.0130 0.8345±0.0233 0.0934±0.0001 0.8452±0.0042 0.8592±0.0391
MPNN 0.9321±0.0312 0.8440±0.014 0.6313±0.0121 0.8414±0.0294 0.0572±0.0001 0.8032±0.0092 0.8493±0.0013
DMPNN 0.9562±0.0070 0.8429±0.0391 0.6378±0.0329 0.8692±0.0051 0.0867±0.0032 0.8137±0.0072 0.8678±0.0372
CMPNN 0.9854±0.0215 0.8593±0.0088 0.6581±0.0020 0.9169±0.0065 0.1435±0.0002 0.8687±0.0003 0.8932±0.0019

More results will be updated soon.

Owner
AI-Health @NSCC-gz
AI-Health @NSCC-gz
Deep motion generator collections

GenMotion GenMotion (/gen’motion/) is a Python library for making skeletal animations. It enables easy dataset loading and experiment sharing for synt

23 May 24, 2022
Code for "ATISS: Autoregressive Transformers for Indoor Scene Synthesis", NeurIPS 2021

ATISS: Autoregressive Transformers for Indoor Scene Synthesis This repository contains the code that accompanies our paper ATISS: Autoregressive Trans

138 Dec 22, 2022
Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Diverse Image Captioning with Context-Object Split Latent Spaces This repository is the PyTorch implementation of the paper: Diverse Image Captioning

Visual Inference Lab @TU Darmstadt 34 Nov 21, 2022
[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

SurRoL IROS 2021 SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning Features dVRK compati

<a href=[email protected]"> 55 Jan 03, 2023
A computational optimization project towards the goal of gerrymandering the results of a hypothetical election in the UK.

A computational optimization project towards the goal of gerrymandering the results of a hypothetical election in the UK.

Emma 1 Jan 18, 2022
GNN4Traffic - This is the repository for the collection of Graph Neural Network for Traffic Forecasting

GNN4Traffic - This is the repository for the collection of Graph Neural Network for Traffic Forecasting

564 Jan 02, 2023
A TensorFlow implementation of the Mnemonic Descent Method.

MDM A Tensorflow implementation of the Mnemonic Descent Method. Mnemonic Descent Method: A recurrent process applied for end-to-end face alignment G.

123 Oct 07, 2022
Project for tracking occupancy in Tel-Aviv parking lots.

Ahuzat Dibuk - Tracking occupancy in Tel-Aviv parking lots main.py This module was set-up to be executed on Google Cloud Platform. I run it every 15 m

Geva Kipper 35 Nov 22, 2022
YOLO5Face: Why Reinventing a Face Detector (https://arxiv.org/abs/2105.12931)

Introduction Yolov5-face is a real-time,high accuracy face detection. Performance Single Scale Inference on VGA resolution(max side is equal to 640 an

DeepCam Shenzhen 1.4k Jan 07, 2023
A practical ML pipeline for data labeling with experiment tracking using DVC.

Auto Label Pipeline A practical ML pipeline for data labeling with experiment tracking using DVC Goals: Demonstrate reproducible ML Use DVC to build a

Todd Cook 4 Mar 08, 2022
Tensorflow-Project-Template - A best practice for tensorflow project template architecture.

Tensorflow Project Template A simple and well designed structure is essential for any Deep Learning project, so after a lot of practice and contributi

Mahmoud G. Salem 3.6k Dec 22, 2022
Robust & Reliable Route Recommendation on Road Networks

NeuroMLR: Robust & Reliable Route Recommendation on Road Networks This repository is the official implementation of NeuroMLR: Robust & Reliable Route

4 Dec 20, 2022
A Flexible Generative Framework for Graph-based Semi-supervised Learning (NeurIPS 2019)

G3NN This repo provides a pytorch implementation for the 4 instantiations of the flexible generative framework as described in the following paper: A

Jiaqi Ma 14 Oct 11, 2022
tf2onnx - Convert TensorFlow, Keras and Tflite models to ONNX.

tf2onnx converts TensorFlow (tf-1.x or tf-2.x), tf.keras and tflite models to ONNX via command line or python api.

Open Neural Network Exchange 1.8k Jan 08, 2023
CondenseNet V2: Sparse Feature Reactivation for Deep Networks

CondenseNetV2 This repository is the official Pytorch implementation for "CondenseNet V2: Sparse Feature Reactivation for Deep Networks" paper by Le Y

Haojun Jiang 74 Dec 12, 2022
Implementation of Change-Based Exploration Transfer (C-BET)

Implementation of Change-Based Exploration Transfer (C-BET), as presented in Interesting Object, Curious Agent: Learning Task-Agnostic Exploration.

Simone Parisi 29 Dec 04, 2022
Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

BiDR Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. Requirements torch==

Microsoft 11 Oct 20, 2022
Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation This is an unofficial PyTorch

MINDs Lab 170 Jan 04, 2023
A simple Python library for stochastic graphical ecological models

What is Viridicle? Viridicle is a library for simulating stochastic graphical ecological models. It implements the continuous time models described in

Theorem Engine 0 Dec 04, 2021
Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

PICASO Official PyTorch implemetation for the paper PICASO:Permutation-Invariant Cascaded Attentive Set Operator. Requirements Python 3 torch = 1.0 n

Samira Zare 0 Dec 23, 2021