EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

Overview

EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

Paper on arXiv

EquiBind, is a SE(3)-equivariant geometric deep learning model performing direct-shot prediction of both i) the receptor binding location (blind docking) and ii) the ligand’s bound pose and orientation. EquiBind achieves significant speed-ups and better quality compared to traditional and recent baselines. If you have questions, don't hesitate to open an issue or ask me via [email protected] or social media or Octavian Ganea via [email protected]. We are happy to hear from you!

Dataset

Our preprocessed data (see dataset section in the paper Appendix) is available from zenodo.
The files in data contain the names for the time-based data split.

If you want to train one of our models with the data then:

  1. download it from zenodo
  2. unzip the directory and place it into data such that you have the path data/PDBBind

Use provided model weights to predict binding structure of your own protein-ligand pairs:

Step 1: What you need as input

Ligand files of the formats .mol2 or .sdf or .pdbqt or .pdb.
Receptor files of the format .pdb
For each complex you want to predict you need a directory containing the ligand and receptor file. Like this:

my_data_folder
└───name1
    │   name1_protein.pdb
    │   name1_ligand.sdf
└───name2
    │   name2_protein.pdb
    │   name2_ligand.sdf
...

Step 2: Setup Environment

We will set up the environment using Anaconda. Clone the current repo

git clone https://github.com/HannesStark/EquiBind

Create a new environment with all required packages using environment.yml (this can take a while). While in the project directory run:

conda env create

Activate the environment

conda activate equibind

Here are the requirements themselves if you want to install them manually instead of using the environment.yml:

python=3.7
pytorch 1.10
torchvision
cudatoolkit=10.2
torchaudio
dgl-cuda10.2
rdkit
openbabel
biopython
rdkit
biopandas
pot
dgllife
joblib
pyaml
icecream
matplotlib
tensorboard

Step 3: Predict Binding Structures!

In the config file configs_clean/inference.yml set the path to your input data folder inference_path: path_to/my_data_folder.
Then run:

python inference.py --config=configs_clean/inference.yml

Done! 🎉
Your results are saved as .sdf files in the directory specified in the config file under output_directory: 'data/results/output' and as tensors at runs/flexible_self_docking/predictions_RDKitFalse.pt!

Reproducing paper numbers

Download the data and place it as described in the "Dataset" section above.

Using the provided model weights

To predict binding structures using the provided model weights run:

python inference.py --config=configs_clean/inference_file_for_reproduce.yml

This will give you the results of EquiBind-U and then those of EquiBind after running the fast ligand point cloud fitting corrections.
The numbers are a bit better than what is reported in the paper. We will put the improved numbers into the next update of the paper.

Training a model yourself and using those weights

To train the model yourself, run:

python train.py --config=configs_clean/RDKitCoords_flexible_self_docking.yml

The model weights are saved in the runs directory.
You can also start a tensorboard server tensorboard --logdir=runs and watch the model train.
To evaluate the model on the test set, change the run_dirs: entry of the config file inference_file_for_reproduce.yml to point to the directory produced in runs. Then you can runpython inference.py --config=configs_clean/inference_file_for_reproduce.yml as above!

Reference

📃 Paper on arXiv

@misc{stark2022equibind,
      title={EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction}, 
      author={Hannes Stärk and Octavian-Eugen Ganea and Lagnajit Pattanaik and Regina Barzilay and Tommi Jaakkola},
      year={2022}
}
Owner
Hannes Stärk
MIT Research Intern • Geometric DL + Graphs :heart: • M. Sc. Informatics from TU Munich
Hannes Stärk
Simple node deletion tool for onnx.

snd4onnx Simple node deletion tool for onnx. I only test very miscellaneous and limited patterns as a hobby. There are probably a large number of bugs

Katsuya Hyodo 6 May 15, 2022
This project is based on RIFE and aims to make RIFE more practical for users by adding various features and design new models

CPM 项目描述 CPM(Chinese Pretrained Models)模型是北京智源人工智能研究院和清华大学发布的中文大规模预训练模型。官方发布了三种规模的模型,参数量分别为109M、334M、2.6B,用户需申请与通过审核,方可下载。 由于原项目需要考虑大模型的训练和使用,需要安装较为复杂

hzwer 190 Jan 08, 2023
Reinforcement learning framework and algorithms implemented in PyTorch.

Reinforcement learning framework and algorithms implemented in PyTorch.

Robotic AI & Learning Lab Berkeley 2.1k Jan 04, 2023
Negative Interactions for Improved Collaborative Filtering:

Negative Interactions for Improved Collaborative Filtering: Don’t go Deeper, go Higher This notebook provides an implementation in Python 3 of the alg

Harald Steck 21 Mar 05, 2022
Erpnext app for make employee salary on payroll entry based on one or more project with percentage for all project equal 100 %

Project Payroll this app for make payroll for employee based on projects like project on 30 % and project 2 70 % as account dimension it makes genral

Ibrahim Morghim 8 Jan 02, 2023
GNEE - GAT Neural Event Embeddings

GNEE - GAT Neural Event Embeddings This repository contains source code for the GNEE (GAT Neural Event Embeddings) method introduced in the paper: "Se

João Pedro Rodrigues Mattos 0 Sep 15, 2021
MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research

MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research.The pipeline is based on nn-UNet an

QIMP team 30 Jan 01, 2023
Pytorch implementation for "Implicit Semantic Response Alignment for Partial Domain Adaptation"

Implicit-Semantic-Response-Alignment Pytorch implementation for "Implicit Semantic Response Alignment for Partial Domain Adaptation" Prerequisites pyt

4 Dec 19, 2022
SwinTrack: A Simple and Strong Baseline for Transformer Tracking

SwinTrack This is the official repo for SwinTrack. A Simple and Strong Baseline Prerequisites Environment conda (recommended) conda create -y -n SwinT

LitingLin 196 Jan 04, 2023
Using Self-Supervised Pretext Tasks for Active Learning - Official Pytorch Implementation

Using Self-Supervised Pretext Tasks for Active Learning - Official Pytorch Implementation Experiment Setting: CIFAR10 (downloaded and saved in ./DATA

John Seon Keun Yi 38 Dec 27, 2022
Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [Arxiv] [Paper] As acquiring pixel-wise an

Lukas Hoyer 305 Dec 29, 2022
Caffe: a fast open framework for deep learning.

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berke

Berkeley Vision and Learning Center 33k Dec 28, 2022
Megaverse is a new 3D simulation platform for reinforcement learning and embodied AI research

Megaverse Megaverse is a new 3D simulation platform for reinforcement learning and embodied AI research. The efficient design of the engine enables ph

Aleksei Petrenko 191 Dec 23, 2022
Rayvens makes it possible for data scientists to access hundreds of data services within Ray with little effort.

Rayvens augments Ray with events. With Rayvens, Ray applications can subscribe to event streams, process and produce events. Rayvens leverages Apache

CodeFlare 32 Dec 25, 2022
A Human-in-the-Loop workflow for creating HD images from text

A Human-in-the-Loop? workflow for creating HD images from text DALL·E Flow is an interactive workflow for generating high-definition images from text

Jina AI 2.5k Jan 02, 2023
Learning Representations that Support Robust Transfer of Predictors

Transfer Risk Minimization (TRM) Code for Learning Representations that Support Robust Transfer of Predictors Prepare the Datasets Preprocess the Scen

Yilun Xu 15 Dec 07, 2022
Predict multi paths to a moving person depending on his trajectory history.

Multi-future Trajectory Prediction The project is about using the Multiverse model to make possible multible-future trajectory prediction for a seen p

Said Gamal 1 Jan 18, 2022
Implementation of BI-RADS-BERT & The Advantages of Section Tokenization.

BI-RADS BERT Implementation of BI-RADS-BERT & The Advantages of Section Tokenization. This implementation could be used on other radiology in house co

1 May 17, 2022
Temporal-Relational CrossTransformers

Temporal-Relational Cross-Transformers (TRX) This repo contains code for the method introduced in the paper: Temporal-Relational CrossTransformers for

83 Dec 12, 2022
Barbershop: GAN-based Image Compositing using Segmentation Masks (SIGGRAPH Asia 2021)

Barbershop: GAN-based Image Compositing using Segmentation Masks Barbershop: GAN-based Image Compositing using Segmentation Masks Peihao Zhu, Rameen A

Peihao Zhu 928 Dec 30, 2022