Codebase of deep learning models for inferring stability of mRNA molecules

Overview

Kaggle OpenVaccine Models

Codebase of deep learning models for inferring stability of mRNA molecules, corresponding to the Kaggle Open Vaccine Challenge and accompanying manuscript "Predictive models of RNA degradation through dual crowdsourcing", Wayment-Steele et al (2021) (full citation when available).

Models contained here are:

"Nullrecurrent": A reconstruction of winning solution by Jiayang Gao. Link to original notebooks provided below.

"DegScore-XGBoost": A model based the original DegScore model and XGBoost.

NB on other historic names for models

  • The Nullrecurrent model was called "OV" model in some instances and the .h5 model files for the Nullrecurrent model are labeled "ov".

  • The DegScore-XGBoost model was called the "BT" model in Eterna analysis.

Organization

scripts: Python scripts to perform inference.

notebooks: Python notebooks to perform inference.

model_files: Store .h5 model files used at inference time.

data: Data corresponding to Kaggle challenge and to subsequent tests on mRNAs.

data/Kaggle_RYOS_data

This directory contains training set and test sets in .csv and in .json form.

Kaggle_RYOS_trainset_prediction_output_Sep2021.txt contains predictions from the Nullrecurrent code in this repository.

Model MCRMSEs were evaluated by uploading submissions to the Kaggle competition website at https://www.kaggle.com/c/stanford-covid-vaccine.

data/mRNA_233x_data

This directory contains original data and scripts to reproduce model analysis from manuscript.

Because all the original formats are slightly different, the reformat_*.py scripts read in the original formats and reformats them in two forms for each prediction: "FULL" and "PCR" in the directory formatted_predictions.

"FULL" is per-nucleotide predictions for all the nucleotides. "PCR" has had the regions outside the RT-PCR sequencing set to NaN.

python collate_predictions.py reads in all the data and outputs all_predictions_233x.csv

RegenerateFigure5.ipynb reproduces the final scatterplot comparisons.

posthoc_code_predictions contains predictions from the Nullrecurrent code model contained in this repository. To generate these predictions use the sequence file in the mRNA_233x_data folder and run the following command(s):

python scripts/nullrecurrent_inference.py -d deg_Mg_pH10 -i 233_sequences.txt -o 233x_nullrecurrent_output_Oct2021_deg_Mg_50C.txt,

etc.

Dependencies

Install via pip install requirements.txt or conda install --file requirements.txt.

Not pip-installable: EternaFold, Vienna, and Arnie, see below.

Setup

  1. Install git-lfs (best to do before git-cloning this KaggleOpenVaccine repo).

  2. Install EternaFold (the nullrecurrent model uses this), available for free noncommercial use here.

  3. Install ViennaRNA (the DegScore-XGBoost model uses this), available here.

  4. Git clone Arnie, which wraps EternaFold in python and allows RNA thermodynamic calculations across many packages. Follow instructions here to link EternaFold to it.

  5. Add path to this repository as KOV_PATH (so that script can find path to stored model files):

export KOV_PATH='/path/to/KaggleOpenVaccine'

Usage

To run the nullrecurrent winning solution on one construct, given in example.txt:

CGC

Run

python scripts/nullrecurrent_inference.py [-d deg] -i example.txt -o predict.txt

where the deg is one of the following options

deg_Mg_pH10
deg_pH10
deg_Mg_50C
deg_50C

Similarly, for the DegScore-XGBoost model :

python scripts/degscore-xgboost_inference.py -i example.txt -o predict.txt

This write a text file of output predictions to predict.txt:

(Nullrecurrent output)

2.1289976365, 2.650808962, 2.1869660805000004

(DegScore-XGBoost output)

0.2697107, 0.37091506, 0.48528114

A note on energy model versions

The predictions in the Kaggle competition and for the manuscript were performed with EternaFold parameters and CONTRAfold-SE code. The currently available EternaFold code will result in slightly different values. For more on the difference, see the EternaFold README.

Individual Kaggle Solutions

This code is based on the winning solution for the Open Vaccine Kaggle Competition Challenge. The competition can be found here:

https://www.kaggle.com/c/stanford-covid-vaccine/overview

This code is also the supplementary material for the Kaggle Competition Solution Paper. The individual Kaggle writeups for the top solutions that have been featured in that paper can be found in the following table:

Team Name Team Members Rank Link to the solution
Nullrecurrent Jiayang Gao 1 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189620
Kazuki ** 2 Kazuki Onodera, Kazuki Fujikawa 2 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189709
Striderl Hanfei Mao 3 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189574
FromTheWheel & Dyed & StoneShop Gilles Vandewiele, Michele Tinti, Bram Steenwinckel 4 https://www.kaggle.com/group16/covid-19-mrna-4th-place-solution
tito Takuya Ito 5 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189691
nyanp Taiga Noumi 6 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189241
One architecture Shujun He 7 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189564
ishikei Keiichiro Ishi 8 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/190314
Keep going to be GM Youhan Lee 9 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189845
Social Distancing Please Fatih Öztürk,Anthony Chiu,Emin Ozturk 11 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189571
The Machine Karim Amer,Mohamed Fares 13 https://www.kaggle.com/c/stanford-covid-vaccine/discussion/189585
You might also like...
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

PySlowFast PySlowFast is an open source video understanding codebase from FAIR that provides state-of-the-art video classification models with efficie

Official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

GLIDE This is the official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing w

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Decision Transformer Lili Chen*, Kevin Lu*, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas†, and Igor M

Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World
Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

Legged Robots that Keep on Learning Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World, whic

Official codebase for "B-Pref: Benchmarking Preference-BasedReinforcement Learning" contains scripts to reproduce experiments.

B-Pref Official codebase for B-Pref: Benchmarking Preference-BasedReinforcement Learning contains scripts to reproduce experiments. Install conda env

Codebase for "ProtoAttend: Attention-Based Prototypical Learning."

Codebase for "ProtoAttend: Attention-Based Prototypical Learning." Authors: Sercan O. Arik and Tomas Pfister Paper: Sercan O. Arik and Tomas Pfister,

Time-series-deep-learning - Developing Deep learning LSTM, BiLSTM models, and NeuralProphet for multi-step time-series forecasting of stock price.
Time-series-deep-learning - Developing Deep learning LSTM, BiLSTM models, and NeuralProphet for multi-step time-series forecasting of stock price.

Stock Price Prediction Using Deep Learning Univariate Time Series Predicting stock price using historical data of a company using Neural networks for

Spearmint Bayesian optimization codebase

Spearmint Spearmint is a software package to perform Bayesian optimization. The Software is designed to automatically run experiments (thus the code n

A general 3D Object Detection codebase in PyTorch.

Det3D is the first 3D Object Detection toolbox which provides off the box implementations of many 3D object detection algorithms such as PointPillars, SECOND, PIXOR, etc, as well as state-of-the-art methods on major benchmarks like KITTI(ViP) and nuScenes(CBGS).

Comments
  • HW edits

    HW edits

    Changes:

    Remove hardcoded paths in scripts

    Remove tmp csv output files for nullrecurrent

    Rename to reflect model naming in paper "nullrecurrent"

    Reorganize example inputs and outputs

    Update README

    Add requirements file

    opened by HWaymentSteele 0
Releases(v1.0)
  • v1.0(Sep 30, 2022)

Owner
Eternagame
Eternagame
KGDet: Keypoint-Guided Fashion Detection (AAAI 2021)

KGDet: Keypoint-Guided Fashion Detection (AAAI 2021) This is an official implementation of the AAAI-2021 paper "KGDet: Keypoint-Guided Fashion Detecti

Qian Shenhan 35 Dec 29, 2022
Code for paper "Context-self contrastive pretraining for crop type semantic segmentation"

Code for paper "Context-self contrastive pretraining for crop type semantic segmentation" Setting up a python environment Follow the instruction in ht

Michael Tarasiou 11 Oct 09, 2022
This is the workbook I created while I was studying for the Qiskit Associate Developer exam. I hope this becomes useful to others as it was for me :)

A Workbook for the Qiskit Developer Certification Exam Hello everyone! This is Bartu, a fellow Qiskitter. I have recently taken the Certification exam

Bartu Bisgin 66 Dec 10, 2022
Tiny-NewsRec: Efficient and Effective PLM-based News Recommendation

Tiny-NewsRec The source codes for our paper "Tiny-NewsRec: Efficient and Effective PLM-based News Recommendation". Requirements PyTorch == 1.6.0 Tensor

Yang Yu 3 Dec 07, 2022
IAUnet: Global Context-Aware Feature Learning for Person Re-Identification

IAUnet This repository contains the code for the paper: IAUnet: Global Context-Aware Feature Learning for Person Re-Identification Ruibing Hou, Bingpe

30 Jul 14, 2022
Face Alignment using python

Face Alignment Face Alignment using python Input Image Aligned Face Aligned Face Aligned Face Input Image Aligned Face Input Image Aligned Face Instal

Sajjad Aemmi 28 Nov 23, 2022
PyTorch implementation of Algorithm 1 of "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"

Code for On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models This repository will reproduce the main results from our pape

Mitch Hill 32 Nov 25, 2022
Two-stage CenterNet

Probabilistic two-stage detection Two-stage object detectors that use class-agnostic one-stage detectors as the proposal network. Probabilistic two-st

Xingyi Zhou 1.1k Jan 03, 2023
YOLOX-CondInst - Implement CondInst which is a instances segmentation method on YOLOX

YOLOX CondInst -- YOLOX 实例分割 前言 本项目是自己学习实例分割时,复现的代码. 通过自己编程,让自己对实例分割有更进一步的了解。 若想

DDGRCF 16 Nov 18, 2022
With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function

With this package, you can generate mixed-integer linear programming (MIP) models of trained artificial neural networks (ANNs) using the rectified linear unit (ReLU) activation function. At the momen

ChemEngAI 40 Dec 27, 2022
A big endian Gentoo port developed on a Pine64.org RockPro64

Gentoo-aarch64_be A big endian Gentoo port developed on a Pine64.org RockPro64 The endian wars are over... little endian won. As a result, it is incre

Rory Bolt 6 Dec 07, 2022
Official PyTorch implementation of Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

This is the official PyTorch implementation of our paper: "Joint Object Detection and Multi-Object Tracking with Graph Neural Networks". Our project website and video demos are here.

Richard Wang 443 Dec 06, 2022
Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors, CVPR 2021

Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors Human POSEitioning System (H

Aymen Mir 66 Dec 21, 2022
The official PyTorch code for 'DER: Dynamically Expandable Representation for Class Incremental Learning' accepted by CVPR2021

DER.ClassIL.Pytorch This repo is the official implementation of DER: Dynamically Expandable Representation for Class Incremental Learning (CVPR 2021)

rhyssiyan 108 Jan 01, 2023
PPO Lagrangian in JAX

PPO Lagrangian in JAX This repository implements PPO in JAX. Implementation is tested on the safety-gym benchmark. Usage Install dependencies using th

Karush Suri 2 Sep 14, 2022
Technical experimentations to beat the stock market using deep learning :chart_with_upwards_trend:

DeepStock Technical experimentations to beat the stock market using deep learning. Experimentations Deep Learning Stock Prediction with Daily News Hea

Keon 449 Dec 29, 2022
Streamlit Tutorial (ex: stock price dashboard, cartoon-stylegan, vqgan-clip, stylemixing, styleclip, sefa)

Streamlit Tutorials Install pip install streamlit Run cd [directory] streamlit run app.py --server.address 0.0.0.0 --server.port [your port] # http:/

Jihye Back 30 Jan 06, 2023
a Lightweight library for sequential learning agents, including reinforcement learning

SaLinA: SaLinA - A Flexible and Simple Library for Learning Sequential Agents (including Reinforcement Learning) TL;DR salina is a lightweight library

Facebook Research 405 Dec 17, 2022
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Transparency-by-Design networks (TbD-nets) This repository contains code for replicating the experiments and visualizations from the paper Transparenc

David Mascharka 351 Nov 18, 2022
PyTorch Implementation of Vector Quantized Variational AutoEncoders.

Pytorch implementation of VQVAE. This paper combines 2 tricks: Vector Quantization (check out this amazing blog for better understanding.) Straight-Th

Vrushank Changawala 2 Oct 06, 2021