Implementation of ProteinBERT in PyTorch

Last update: Dec 25, 2022

Overview

ProteinBERT - Pytorch (wip)

An implementation of ProteinBERT, a deep-learning model of protein sequence and function, in PyTorch.

Install

$ pip install protein-bert-pytorch

Usage

import torch
from protein_bert_pytorch import ProteinBERT

model = ProteinBERT(
    num_tokens = 21,
    num_annotation = 8943,
    dim = 512,
    dim_global = 256,
    depth = 6,
    narrow_conv_kernel = 9,
    wide_conv_kernel = 9,
    wide_conv_dilation = 5,
    attn_heads = 8,
    attn_dim_head = 64
)

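seq = torch.randint(0, 21, (2, 2048))                # random token ids in [0, 21)
mask = torch.ones(2, 2048).bool()                    # boolean mask over sequence positions
annotation = torch.randint(0, 2, (2, 8943)).float()  # random binary annotations; the upper bound is exclusive, so (0, 1) would give all zeros

seq_logits, annotation_logits = model(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)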
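The usage example above feeds random integers to the model. A real protein would first have to be mapped to integer ids below num_tokens and padded to a common length, with the mask marking which positions hold real residues. The following is a minimal sketch of that step in plain Python. The vocabulary (id 0 for padding, ids 1 to 20 for the standard amino acids in alphabetical one-letter order), the helper name encode_batch, and the convention that True marks a real position are all assumptions made for illustration; none of them is defined by this README.

```python
# Hypothetical vocabulary: NOT defined by protein-bert-pytorch, chosen for illustration only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
PAD_ID = 0                            # assumed padding id
TOKEN_TO_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}  # ids 1..20


def encode_batch(sequences, max_len):
    """Map protein strings to padded id lists and matching boolean masks."""
    ids, mask = [], []
    for s in sequences:
        s = s[:max_len]                          # truncate overly long sequences
        row = [TOKEN_TO_ID[aa] for aa in s]      # raises KeyError on unknown residues
        pad = max_len - len(row)
        ids.append(row + [PAD_ID] * pad)         # right-pad with the padding id
        mask.append([True] * len(row) + [False] * pad)  # assumed: True = real residue
    return ids, mask


ids, mask = encode_batch(["MKV", "ACDEF"], max_len=6)
```

The two nested lists can then be turned into model inputs with torch.tensor(ids) and torch.tensor(mask), which yield an integer tensor and a boolean tensor of shape (batch, max_len).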

Citations

@article {Brandes2021.05.24.445464,
    author      = {Brandes, Nadav and Ofer, Dan and Peleg, Yam and Rappoport, Nadav and Linial, Michal},
    title       = {ProteinBERT: A universal deep-learning model of protein sequence and function},
    year        = {2021},
    doi         = {10.1101/2021.05.24.445464},
    publisher   = {Cold Spring Harbor Laboratory},
    URL         = {https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464},
    eprint      = {https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464.full.pdf},
    journal     = {bioRxiv}
}

Comments

bugFix: x and y not on the same device when Learner is trained on GPU

When the inputs and the learner are moved to the GPU:

seq        = torch.randint(0, 21, (2, 2048)).cuda()
annotation = torch.randint(0, 1, (2, 8943)).float().cuda()
mask       = torch.ones(2, 2048).bool().cuda()

learner.cuda()

loss = learner(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)

OUTPUT

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-60892e498570> in <module>
      4 learner.cuda()
      5 
----> 6 loss = learner(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)

~/data/.conda/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/mnt/5280b/wwang/proteinbert/protein_bert_pytorch.py in forward(self, seq, annotation, mask)
    365 
    366         for token_id in self.exclude_token_ids:
--> 367             random_replace_token_prob_mask = random_replace_token_prob_mask & (random_tokens != token_id)  # make sure you never substitute a token with an excluded token type (pad, start, end)
    368 
    369         # noise sequence

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

opened by wilmerwang 0
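The traceback above points at the line that combines random_replace_token_prob_mask with random_tokens, which suggests that one of the two tensors was created on the CPU while the inputs lived on the GPU. The usual remedy is to create every helper tensor on the device of the input. The sketch below illustrates that pattern; sample_replacement_mask is a hypothetical helper written for this illustration and is not the library's actual code.

```python
import torch


def sample_replacement_mask(seq, num_tokens, exclude_token_ids, prob):
    """Hypothetical helper: draw random tokens and a replacement mask on seq's device."""
    # Passing device=seq.device keeps every new tensor next to the input,
    # so the elementwise operations below never mix CPU and GPU tensors.
    random_tokens = torch.randint(0, num_tokens, seq.shape, device=seq.device)
    replace_mask = torch.rand(seq.shape, device=seq.device) < prob
    for token_id in exclude_token_ids:
        # Never substitute a token with an excluded token type (pad, start, end).
        replace_mask = replace_mask & (random_tokens != token_id)
    return random_tokens, replace_mask


# Falls back to the CPU when no GPU is available, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
seq = torch.randint(0, 21, (2, 2048), device=device)

random_tokens, replace_mask = sample_replacement_mask(
    seq, num_tokens=21, exclude_token_ids=(0, 1, 2), prob=0.05
)
noised_seq = torch.where(replace_mask, random_tokens, seq)  # noised copy of seq
```

Because the device is read from the input tensor rather than hard-coded, the same function works unchanged on CPU and GPU.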

How can this BERT version be used with the pretrained model?

Hi guys, thanks for the great work. I'm trying to load the pretrained model 'ftp://ftp.cs.huji.ac.il/users/nadavb/protein_bert/epoch_92400_sample_23500000.pkl' with this PyTorch version of ProteinBERT, but I have no clue how to do it. Could you please give some suggestions? Thank you so much!

opened by Y-H-Joe 1

Releases (0.1.0)

0.1.0 (Aug 10, 2021)

0.0.11 (Aug 6, 2021)

0.0.10 (Jun 11, 2021)

0.0.9 (Jun 11, 2021)

0.0.8 (Jun 11, 2021)

0.0.7 (Jun 10, 2021)

0.0.6 (May 29, 2021)

0.0.5 (May 28, 2021)

0.0.4 (May 28, 2021)

0.0.3a (May 28, 2021)

0.0.2 (May 28, 2021)

0.0.1 (May 28, 2021)

Owner

Phil Wang
