Making self-supervised learning work on molecules by using their 3D geometry to pre-train GNNs. Implemented in DGL and Pytorch Geometric.

Overview

3D Infomax improves GNNs for Molecular Property Prediction

Video | Paper

We pre-train GNNs to understand the geometry of molecules given only their 2D molecular graph which they can use for better molecular property predictions. Below is a 3 step guide for how to use the code and how to reproduce our results. If you have questions, don't hesitate to open an issue or ask me via [email protected] or social media. I am happy to hear from you!

This repository additionally adapts different self-supervised learning methods to graphs such as "Bootstrap your own Latent", "Barlow Twins", or "VICReg".

Step 1: Setup Environment

We will set up the environment using Anaconda. Clone the current repo

git clone https://github.com/HannesStark/3DInfomax

Create a new environment with all required packages using environment.yml (this can take a while). While in the project directory run:

conda env create

Activate the environment

conda activate graphssl

Step 2: 3D Pre-train a model

Let's pre-train a GNN with 50 000 molecules and their structures from the QM9 dataset (you can also skip to Step 3 and use the pre-trained model weights provided in this repo). For other datasets see the Data section below.

python train.py --config=configs_clean/pre-train_QM9.yml

This will first create the processed data of dataset/QM9/qm9.csv with the 3D information in qm9_eV.npz. Then your model starts pre-training and all the logs are saved in the runs folder which will also contain the pre-trained model as best_checkpoint.pt that can later be loaded for fine-tuning.

You can start tensorboard and navigate to localhost:6006 in your browser to monitor the training process:

tensorboard --logdir=runs --port=6006

Explanation:

The config files in configs_clean provide additional examples and blueprints to train different models. The files always contain a model_type that should be pre-trained (2D network) and a model3d_type (3D network) where you can specify the parameters of these networks. To find out more about all the other parameters in the config file, have a look at their description by running python train.py --help.

Step 3: Fine-tune a model

During pre-training a directory is created in the runs directory that contains the pre-trained model. We provide an example of such a directory with already pre-trained weights runs/PNA_qmugs_NTXentMultiplePositives_620000_123_25-08_09-19-52 which we can fine-tune for predicting QM9's homo property as follows.

python train.py --config=configs_clean/tune_QM9_homo.yml

You can monitor the fine-tuning process on tensorboard as well and in the end the results will be printed to the console but also saved in the runs directory that was created for fine-tuning in the file evaluation_test.txt.

The model which we are fine-tuning from is specified in configs_clean/tune_QM9_homo.yml via the parameter:

pretrain_checkpoint: runs/PNA_qmugs_NTXentMultiplePositives_620000_123_25-08_09-19-52/best_checkpoint_35epochs.pt

Multiple seeds:

This is a second fine-tuning example where we predict non-quantum properties of the OGB datasets and train multiple seeds (we always use the seeds 1, 2, 3, 4, 5, 6 in our experiments):

python train.py --config=configs_clean/tune_freesolv.yml

After all runs are done, the averaged results are saved in the runs directory of each seed in the file multiple_seed_test_statistics.txt

Data

You can pre-train or fine-tune on different datasets by specifying the dataset: parameter in a .yml file such as dataset: drugs to use GEOM-Drugs.

The QM9 dataset and the OGB datasets are already provided with this repository. The QMugs and GEOM-Drugs datasets need to be downloaded and placed in the correct location.

GEOM-Drugs: Download GEOM-Drugs here ( the rdkit_folder.tar.gz file), unzip it, and place it into dataset/GEOM.

QMugs: Download QMugs here (the structures.tar and summary.csv files), unzip the structures.tar, and place the resulting structures folder and the summary.csv file into a new folder QMugs that you have to create NEXT TO the repository root. Not in the repository root (sorry for this).

Owner
Hannes Stärk
MIT Research Intern • Geometric DL + Graphs :heart: • M. Sc. Informatics from TU Munich
Hannes Stärk
Official implementation of DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations in TensorFlow 2

DreamerPro Official implementation of DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations in TensorFl

22 Nov 01, 2022
Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Multi-modal Interaction Graph Convolutioal Network for Temporal Language Localization in Videos Official implementation for Multi-Modal Interaction Gr

Zongmeng Zhang 15 Oct 18, 2022
KaziText is a tool for modelling common human errors.

KaziText KaziText is a tool for modelling common human errors. It estimates probabilities of individual error types (so called aspects) from grammatic

ÚFAL 3 Nov 24, 2022
Space-invaders - Simple Game created using Python & PyGame, as my Beginner Python Project

Space Invaders This is a simple SPACE INVADER game create using PYGAME whihc hav

Gaurav Pandey 2 Jan 08, 2022
A Unified Generative Framework for Various NER Subtasks.

This is the code for ACL-ICJNLP2021 paper A Unified Generative Framework for Various NER Subtasks. Install the package in the requirements.txt, then u

177 Jan 05, 2023
Official PyTorch(Geometric) implementation of DPGNN(DPGCN) in "Distance-wise Prototypical Graph Neural Network for Node Imbalance Classification"

DPGNN This repository is an official PyTorch(Geometric) implementation of DPGNN(DPGCN) in "Distance-wise Prototypical Graph Neural Network for Node Im

Yu Wang (Jack) 18 Oct 12, 2022
AITUS - An atomatic notr maker for CYTUS

AITUS an automatic note maker for CYTUS. 利用AI根据指定乐曲生成CYTUS游戏谱面。 效果展示:https://www

GradiusTwinbee 6 Feb 24, 2022
EfficientNetv2 TensorRT int8

EfficientNetv2_TensorRT_int8 EfficientNetv2模型实现来自https://github.com/d-li14/efficientnetv2.pytorch 环境配置 ubuntu:18.04 cuda:11.0 cudnn:8.0 tensorrt:7

34 Apr 24, 2022
FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics

FusionNet_Pytorch FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics Requirements Pytorch 0.1.11 Pyt

Choi Gunho 102 Dec 13, 2022
A Python Package for Portfolio Optimization using the Critical Line Algorithm

PyCLA A Python Package for Portfolio Optimization using the Critical Line Algorithm Getting started To use PyCLA, clone the repo and install the requi

19 Oct 11, 2022
Implementation of CSRL from the AAAI2022 paper: Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

CSRL Implementation of CSRL from the AAAI2022 paper: Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning Python: 3

4 Apr 14, 2022
An open-source project for applying deep learning to medical scenarios

Auto Vaidya An open source solution for creating end-end web app for employing the power of deep learning in various clinical scenarios like implant d

Smaranjit Ghose 18 May 29, 2022
Optimizers-visualized - Visualization of different optimizers on local minimas and saddle points.

Optimizers Visualized Visualization of how different optimizers handle mathematical functions for optimization. Contents Installation Usage Functions

Gautam J 1 Jan 01, 2022
IJCAI2020 & IJCV 2020 :city_sunrise: Unsupervised Scene Adaptation with Memory Regularization in vivo

Seg_Uncertainty In this repo, we provide the code for the two papers, i.e., MRNet:Unsupervised Scene Adaptation with Memory Regularization in vivo, IJ

Zhedong Zheng 348 Jan 05, 2023
Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation

Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation

NVIDIA Research Projects 4.8k Jan 09, 2023
DeepLab2: A TensorFlow Library for Deep Labeling

DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks.

Google Research 845 Jan 04, 2023
Python SDK for building, training, and deploying ML models

Overview of Kubeflow Fairing Kubeflow Fairing is a Python package that streamlines the process of building, training, and deploying machine learning (

Kubeflow 325 Dec 13, 2022
[NeurIPS 2021] Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples | ⛰️⚠️

Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples This repository is the official implementation of "Tow

Sungyoon Lee 4 Jul 12, 2022
Auto HMM: Automatic Discrete and Continous HMM including Model selection

Auto HMM: Automatic Discrete and Continous HMM including Model selection

Chess_champion 29 Dec 07, 2022
This project uses Template Matching technique for object detecting by detection of template image over base image.

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

Pratham Bhatnagar 7 May 29, 2022