Benchmarking Pipeline for Prediction of Protein-Protein Interactions

Last update: Jun 27, 2022

B4PPI

Benchmarking Pipeline for the Prediction of Protein-Protein Interactions

How this benchmarking pipeline was built, and how to use it, is detailed in our preprint (please cite it if you find this work useful!).

A minimal example and the list of requirements are both available in the repository.

How to use the gold standard

All the data files are in the data directory. Most of them are available both as CSV (sep='|') and as pickled pandas DataFrames; a CSV file may occasionally be missing because of file size constraints on GitHub.
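Because a CSV file may be absent, a small loading helper can be convenient. The sketch below is not part of the repository: load_table is a hypothetical name, and it assumes that the pickle and CSV versions of a table share the same file stem, which may not hold for every file.

```python
from pathlib import Path

import pandas as pd


def load_table(stem, data_dir="data"):
    """Hypothetical helper: load <stem>.pkl if present, else <stem>.csv.

    Assumes both formats share the same file stem inside data_dir.
    """
    base = Path(data_dir)
    pkl_path = base / f"{stem}.pkl"
    csv_path = base / f"{stem}.csv"
    if pkl_path.exists():
        # Pickled pandas DataFrame
        return pd.read_pickle(pkl_path)
    if csv_path.exists():
        # The CSV files use '|' as the field separator
        return pd.read_csv(csv_path, sep="|")
    raise FileNotFoundError(f"No .pkl or .csv file found for '{stem}' in {base}")
```

For example, load_table('benchmarkingGS_v1-0') would read the gold standard from whichever of the two formats is present.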

The gold standard, without pre-processed features, can be loaded using:

import os
import pandas as pd

# The CSV files use '|' as the field separator
goldStandard = pd.read_csv(
    os.path.join('data', 'benchmarkingGS_v1-0.csv'),
    sep='|'
)

Or with the pre-processed features:

goldStandard_with_featuresSeq = pd.read_pickle(
    os.path.join('data', 'benchmarkingGS_v1-0_similarityMeasure_sequence_v3-1.pkl')
)

UniProtIDs are used for both proteins A and B.
isInteraction is the ground truth from the IntAct database (1 = interacting proteins, 0 = non-interacting proteins).
trainTest is the split between training set (train), first testing set T1 (test1) and second testing set T2 (test2).
Pre-processed features are explained in the manuscript.
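As an illustration of how these columns can be used, the sketch below separates the three sets and summarises the share of interacting pairs in each. It runs on a tiny made-up DataFrame so that it is self-contained: only the column names isInteraction and trainTest come from the description above, and the values are invented.

```python
import pandas as pd

# Toy stand-in for the gold standard (invented values, real column names)
goldStandard = pd.DataFrame({
    "isInteraction": [1, 0, 0, 1, 0, 0],
    "trainTest": ["train", "train", "test1", "test1", "test2", "test2"],
})

# One DataFrame per value of trainTest
splits = {name: df for name, df in goldStandard.groupby("trainTest")}
train, test1, test2 = splits["train"], splits["test1"], splits["test2"]

# Number of pairs and fraction of interacting pairs in each set
summary = goldStandard.groupby("trainTest")["isInteraction"].agg(
    n_pairs="count", positive_rate="mean"
)
print(summary)
```

With the real data, the toy DataFrame would simply be replaced by the goldStandard loaded earlier.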

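Training and evaluation can then be done as for any supervised classification task: fit a model on the train rows, then evaluate it separately on test1 and test2. The code from the preprint is in the Training section.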
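A minimal sketch of such a workflow is shown below. It does not reproduce the models from the preprint: logistic regression from scikit-learn is used purely as a placeholder, the data are randomly generated, and the feature columns feature_1 and feature_2 are hypothetical names. Only isInteraction and trainTest match the gold standard.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score

# Synthetic stand-in for the gold standard with pre-processed features
rng = np.random.default_rng(0)
n = 300
labels = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "feature_1": labels + rng.normal(0, 1, size=n),  # weakly informative
    "feature_2": rng.normal(0, 1, size=n),           # pure noise
    "isInteraction": labels,
    "trainTest": rng.choice(["train", "test1", "test2"], size=n, p=[0.6, 0.2, 0.2]),
})
feature_cols = ["feature_1", "feature_2"]  # hypothetical feature names

# Fit on the training set only
train = toy[toy["trainTest"] == "train"]
model = LogisticRegression().fit(train[feature_cols], train["isInteraction"])

# Evaluate separately on each testing set
scores = {}
for split in ["test1", "test2"]:
    test = toy[toy["trainTest"] == split]
    proba = model.predict_proba(test[feature_cols])[:, 1]
    scores[split] = {
        "auroc": roc_auc_score(test["isInteraction"], proba),
        "average_precision": average_precision_score(test["isInteraction"], proba),
    }
print(scores)
```

Reporting the two testing sets separately keeps T1 and T2 results distinct, in line with the trainTest column described above.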

How to cite this work

Lannelongue L., Inouye M., Construction of in silico protein-protein interaction networks across different topologies using machine learning, 2022, bioRxiv

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License.

Credits

The code was written in Python 3.7.
Many libraries were used, in particular pandas, NumPy, scikit-learn and PyTorch Lightning (full list in the code and in the requirements file).
Plots were drawn using Matplotlib, Seaborn and the MetBrewer colour palettes.
Logs were saved using Weights & Biases.

Owner

Loïc Lannelongue