Replication attempt for the Protein Folding Model

Overview

RGN2-Replica (WIP)

This project aims to become an unofficial, working PyTorch implementation of RGN2, a state-of-the-art model for MSA-less protein folding, intended for cases where no evolutionary homologs are available (i.e., for protein design).

Install

$ pip install rgn2-replica

To load the sample dataset

from datasets import load_from_disk
ds = load_from_disk("data/ur90_small")
print(ds['train'][0])

To convert to pandas for exploration

df = ds['train'].to_pandas()
df.sample(5)

To train ProteinLM

Run the following command with default parameters

python -m scripts.lmtrainer

This starts a training run on the CPU, using the sample dataset in the repo directory.

TO-DO LIST: ordered by priority

  • Provide basic package and file structure

  • RGN2:

    • Contribute adaptation of RGN1 for different ops
      • Simple LSTM with:
        • Inputs (B, L, emb_dim)
        • Outputs (B, L, 4) (4 features which should be outputs of linear projections)
    • Find a good (and reproducible) training scheme
    • Benchmark regression vs classification of torsional alphabet
  • Language Model:

  • To be merged when first versions of RGN are ready:

    • Geometry module
    • Adapt functionality from MP-NeRF:
      • Sidechain building
      • Full backbone from CA
      • Fast loss functions and metrics
      • Modifications to convert LSTM cell into RGN cell
  • Contribute trainer classes / functionality.

    • Sequence preprocessing for AminoBERT
      • inverted fragments
      • sequence masking
      • loss function wrapper v1 by @DrHB
      • Sample dataset by @gurvindersingh
      • Dataloader
      • ...
  • Contribute Data Infra for training:

  • Contribute Rosetta Scripts (contact me by email ([email protected]) or Discord to get a key for Rosetta if you are interested in doing this part).

  • NOTES:

  • Use functionality provided in MP-NeRF wherever possible (avoid repetition).
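
As a rough illustration of the "Simple LSTM" item in the to-do list above, the following is a minimal sketch of a module that maps inputs of shape (B, L, emb_dim) to outputs of shape (B, L, 4) through a linear projection. The class name SimpleLSTMHead, the hidden_dim default, and the choice of a single unidirectional layer are assumptions made for this example; they are not taken from the repository or from the RGN2 paper.

```python
import torch
from torch import nn


class SimpleLSTMHead(nn.Module):
    """Hypothetical sketch: LSTM over per-residue embeddings, then a linear projection.

    Names and defaults here are illustrative assumptions, not the project's API.
    """

    def __init__(self, emb_dim: int, hidden_dim: int = 64, out_dim: int = 4):
        super().__init__()
        # batch_first=True so tensors are laid out as (B, L, features)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Linear projection from the LSTM hidden state to the 4 output features
        self.proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.lstm(x)   # (B, L, hidden_dim)
        return self.proj(hidden)   # (B, L, out_dim)


if __name__ == "__main__":
    model = SimpleLSTMHead(emb_dim=32)
    embeddings = torch.randn(2, 10, 32)  # B=2 sequences, L=10 residues
    print(model(embeddings).shape)
```

Converting this plain LSTM into an RGN cell and attaching the geometry module are separate to-do items and are not covered by this sketch.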

Contribute:

Hey there! New ideas are welcome: open/close issues, fork the repo and share your code with a Pull Request.

Currently, the main discussion about model development is happening in this Discord server, under the /self-supervised-learning channel.

Clone this project to your computer:

git clone https://github.com/EricAlcaide/pysimplechain

Please follow this guideline on open source contribution.

Citations:

@article {Chowdhury2021.08.02.454840,
    author = {Chowdhury, Ratul and Bouatta, Nazim and Biswas, Surojit and Rochereau, Charlotte and Church, George M. and Sorger, Peter K. and AlQuraishi, Mohammed},
    title = {Single-sequence protein structure prediction using language models from deep learning},
    elocation-id = {2021.08.02.454840},
    year = {2021},
    doi = {10.1101/2021.08.02.454840},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2021/08/04/2021.08.02.454840},
    eprint = {https://www.biorxiv.org/content/early/2021/08/04/2021.08.02.454840.full.pdf},
    journal = {bioRxiv}
}

@article{alquraishi_2019,
	author={AlQuraishi, Mohammed},
	title={End-to-End Differentiable Learning of Protein Structure},
	volume={8},
	DOI={10.1016/j.cels.2019.03.006},
	URL={https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30076-6},
	number={4},
	journal={Cell Systems},
	year={2019},
	pages={292-301.e3}
}

Owner
Eric Alcaide