Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Last update: Dec 28, 2022

Overview

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX.

This package is still in alpha, so some features, such as beam search, are still in development.

Installation

ONNX-T5 is available on PyPI.

pip install onnxt5

For the dev version you can run the following.

git clone https://github.com/abelriboulot/onnxt5
cd onnxt5
pip install -e .

Usage

The simplest way to get started for generation is to use the default pre-trained version of T5 on ONNX included in the package.

NOTE: The first time you call get_encoder_decoder_tokenizer, the models are downloaded, which might take a minute or two.

from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer
decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'

output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
# output_text: "J'ai été victime d'une série d'accidents."
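
As a rough illustration of what the temperature argument controls: with temperature=0. the example above is deterministic, which is consistent with greedy decoding (always picking the highest-scoring token). The sketch below shows the general technique on a toy list of logits. The function name sample_next_token is hypothetical, and this is not the library's internal code.

```python
import math
import random


def sample_next_token(logits, temperature=0.0, rng=None):
    """Pick the index of the next token from a list of raw scores.

    Hypothetical helper for illustration only; not part of onnxt5.
    """
    if temperature == 0.0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])

    rng = rng or random.Random()
    # Dividing by the temperature flattens (T > 1) or sharpens (T < 1)
    # the distribution before sampling from it.
    scaled = [score / temperature for score in logits]
    highest = max(scaled)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - highest) for score in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0]
greedy_choice = sample_next_token(toy_logits)  # deterministic
sampled_choice = sample_next_token(toy_logits, temperature=1.0)  # random
```

Higher temperatures make less likely tokens more probable, so outputs vary more between runs.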

Other tasks only require changing the prefix in your prompt, for instance for summarization:

prompt = 'summarize: <PARAGRAPH>'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)

If you want to get the embeddings of a text, you can run the following:

from onnxt5.api import get_encoder_decoder_tokenizer, run_embeddings_text

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
prompt = 'Listen, Billy Pilgrim has come unstuck in time.'
encoder_embeddings, decoder_embeddings = run_embeddings_text(encoder_sess, decoder_sess, tokenizer, prompt)
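
A common next step with embeddings is to pool them into one vector per text and compare texts by cosine similarity. The sketch below does this on plain Python lists. It assumes you have already converted the encoder output to a list of per-token vectors (for a tensor shaped batch x tokens x hidden, something like encoder_embeddings[0].tolist()); that shape is an assumption, not something documented here, and mean_pool and cosine_similarity are hypothetical helper names.

```python
import math


def mean_pool(token_vectors):
    """Average a list of per-token vectors into a single vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for two texts' per-token embeddings (2 dimensions each).
tokens_a = [[1.0, 0.0], [0.0, 1.0]]
tokens_b = [[2.0, 2.0]]
similarity = cosine_similarity(mean_pool(tokens_a), mean_pool(tokens_b))
```

Mean pooling is only one choice; it is used here because it is simple and independent of the text length.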

ONNXT5 also lets you export and use your own models. See the examples folder for more detailed examples.

T5 works with task prefixes such as summarize:, translate English to German:, or question: ... context:. You can find a list of the pretrained tasks and their prefixes in Appendix D of the original paper.
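
Since every task is selected by the text at the start of the prompt, a small helper can keep prompts consistent. This is a minimal sketch; build_prompt and build_qa_prompt are hypothetical names, not functions provided by onnxt5, and only the prefixes quoted above are taken from this README.

```python
def build_prompt(prefix, text):
    """Join a task prefix and the input text into a single T5 prompt."""
    # Accept the prefix with or without its trailing colon.
    return f"{prefix.rstrip(': ')}: {text.strip()}"


def build_qa_prompt(question, context):
    """Build a prompt in the 'question: ... context: ...' form."""
    return f"question: {question.strip()} context: {context.strip()}"


summary_prompt = build_prompt("summarize", "Some long paragraph.")
german_prompt = build_prompt("translate English to German:", "Good morning.")
qa_prompt = build_qa_prompt("Who is unstuck in time?", "Billy Pilgrim is.")
```

The resulting strings can be passed to generative_t5 in the same way as the prompts in the examples above.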

Functionalities

Run any of the T5 trained tasks in a line (translation, summarization, sentiment analysis, completion, generation)
Export your own T5 models to ONNX easily
Utility functions to generate what you need quickly
Up to 4X speedup compared to PyTorch execution for smaller contexts

Benchmarks

The speedup depends heavily on the length of the context. For contexts shorter than ~500 words, ONNX is substantially faster, reaching up to a 4X speedup over PyTorch. The longer the context, the smaller the speedup, and PyTorch is faster above ~500 words.
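
To check where the crossover falls for your own inputs, you can time both backends on the same prompts. The sketch below is a generic timing harness, not the script used to produce the benchmarks here; time_callable and speedup are hypothetical names.

```python
import statistics
import time


def time_callable(fn, *args, repeats=5, warmup=1, **kwargs):
    """Return the median wall-clock time of fn(*args, **kwargs) in seconds."""
    # Warm-up calls absorb one-off costs such as lazy initialization.
    for _ in range(warmup):
        fn(*args, **kwargs)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Toy usage with a stand-in workload; replace sum with a model call.
median_seconds = time_callable(sum, range(1000), repeats=3)
```

When timing PyTorch on a GPU, calls can return before the device has finished, so synchronizing the device before reading the clock gives more reliable numbers.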

GPU Benchmark, Embedding Task

GPU Benchmark, Generation Task

Contributing

The project is still in its infancy, so I would love your feedback: what problems you are trying to solve, what issues you're encountering, and which features would help you. Feel free to shoot me an e-mail (see my profile for the address!) or join our Slack community.

Acknowledgements

This repo is based on the work of Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu from Google, as well as the Hugging Face implementation of T5, the work of the Microsoft ONNX and onnxruntime teams (in particular Tianlei Wu), and the work of Thomas Wolf on text generation.

Original T5 Paper

@article{2019t5,
  author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {arXiv e-prints},
  year = {2019},
  archivePrefix = {arXiv},
  eprint = {1910.10683},
}

Microsoft onnxruntime repo

HuggingFace implementation of T5

Comments

Given model could not be parsed while creating inference session. Error message: Protobuf parsing failed.

Hi there, I ran the code from the guide and it doesn't work. I'm getting an error on the following line: decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()

text is a text from Wikipedia about cars.

onnxt5==0.1.4 protobuf==3.6.0 python==3.7

opened by vladislavkoz 6

Default T5 summary contains ..

<extra_id_0> the company<extra_id_1> the company<extra_id_2>.<extra_id_3>.<extra_id_4>.<extra_id_5>.<extra_id_6>. <extra_id_7>.

Do I need some postprocessing, or is this an issue?

opened by vladislavkoz 5

int() argument must be a string, when running example.

Hello, I can't run the first example:

from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'

output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
 # output_text: "J'ai été victime d'une série d'accidents."

The model starts computing, but before it finishes I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-257f12b63043> in <module>
      5 prompt = 'translate English to French: I was a victim of a series of accidents.'
      6 
----> 7 output_text, output_logits = generative_t5(prompt, max_length=16, temperature=0.)
      8 # output_text: "J'ai été victime d'une série d'accidents."

~\Anaconda3\envs\onnxt5\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~\Anaconda3\envs\onnxt5\lib\site-packages\onnxt5\models.py in forward(self, prompt, max_length, temperature, repetition_penalty, top_k, top_p, max_context_length)
    145                 new_tokens.append(next_token)
    146 
--> 147             return self.tokenizer.decode(new_tokens), new_logits

~\Anaconda3\envs\onnxt5\lib\site-packages\transformers\tokenization_utils_base.py in decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
   3000             skip_special_tokens=skip_special_tokens,
   3001             clean_up_tokenization_spaces=clean_up_tokenization_spaces,
-> 3002             **kwargs,
   3003         )
   3004 

~\Anaconda3\envs\onnxt5\lib\site-packages\transformers\tokenization_utils.py in _decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, spaces_between_special_tokens)
    730         spaces_between_special_tokens: bool = True,
    731     ) -> str:
--> 732         filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
    733 
    734         # To avoid mixing byte-level and unicode for byte-level BPT

~\Anaconda3\envs\onnxt5\lib\site-packages\transformers\tokenization_utils.py in convert_ids_to_tokens(self, ids, skip_special_tokens)
    708         tokens = []
    709         for index in ids:
--> 710             index = int(index)
    711             if skip_special_tokens and index in self.all_special_ids:
    712                 continue

TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

I have no idea how to find a solution. If you have one, thanks!

opened by AZE38 3

Inference time on gpu vs onnxt5-gpu

@abelriboulot, @Ki6an, @brymck,
I have fine-tuned a T5 model for a paraphrasing task like this: Paraphrase with t5

I want to reduce inference time, so I exported the fine-tuned T5 model using onnxt5. However, the ONNX model on GPU takes longer than the PyTorch model on GPU.

gpu:
time taken = 0.2357314471155405
time taken = 0.24958523781970143
time taken = 0.20342689706012607
time taken = 0.5490081580355763
time taken = 0.10756197292357683

onnxt5-gpu:
time taken = 0.5277913622558117
time taken = 0.6335883080027997
time taken = 0.6975196991115808
time taken = 1.9159171842038631
time taken = 0.7938353712670505

Did I make a mistake in exporting or loading the model? gpu code, onnxt5-gpu code

opened by priyanksonis 1

Add progress bar

This adds a progress bar using tqdm.

The files this library downloads are about 500 MB in size, so I'd like to have some feedback on what's happening. Originally I wasn't clear what was the cause of the delay when running get_encoder_decoder_tokenizer.

opened by brymck 0

Add download progress bar

This adds a progress bar using tqdm.

The files this library downloads are about 500 MB in size, so I'd like to have some feedback on what's happening. Originally I wasn't clear what was the cause of the delay when running get_encoder_decoder_tokenizer.

opened by brymck 0

CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions you may contact us through this project's lead researcher Kasimir Schulz.
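
The kind of check described above (verify every archive member before extracting, raise otherwise) can be sketched as follows. This is an illustration of the general approach, not the patch from the pull request; is_within_directory and safe_extract are names chosen for this example.

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar, path="."):
    """Extract an open TarFile only if no member escapes the target path."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Unsafe path in tar archive: {member.name}")
    # Every member name was checked above, so extraction stays under path.
    tar.extractall(path)
```

This sketch only checks member names and does not inspect symbolic links. Recent Python releases also accept a filter argument on extractall (for example filter="data") that applies stricter built-in checks.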

opened by TrellixVulnTeam 0

Add dtype to new_tokens tensor to avoid an error when decoding

Thanks for the repo!

I was having an error message come up when running the code after my initial install.

Small code example:

import os

import torch
from onnxt5 import GenerativeT5
from onnxt5.api import get_sess
from transformers import AutoTokenizer

model_dir = "<path-to-tokenizer-and-onnx-files>"
model_name = "<name-of-model>"

tokenizer = AutoTokenizer.from_pretrained(
    model_dir,
)

decoder_sess, encoder_sess = get_sess(
    os.path.join(model_dir, model_name)
)

model = GenerativeT5(
    encoder_sess,
    decoder_sess,
    tokenizer,
    onnx=True,
    cuda=torch.cuda.is_available(),
)

sentences = [
    "I has good grammar.",
    "I have bettr grammur."
]

corrected_sentences = [
    model(f"grammar: {sentence}",
          max_length=512,
          temperature=1,
          )[0]
    for sentence in sentences
]

The error

Traceback (most recent call last):
  File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/code/inference.py", line 133, in <module>
    main()
  File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/code/inference.py", line 125, in main
    prediction_output = predict_fn(input_data=input_tokens,
  File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/code/inference.py", line 95, in predict_fn
    corrected_sentences = [model(f"grammar: {sentence}",
  File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/code/inference.py", line 95, in <listcomp>
    corrected_sentences = [model(f"grammar: {sentence}",
  File "/Users/jamiebrandon/Code/inferentia-test/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/jamiebrandon/Code/inferentia-test/onnx_example/compiled-t5-base-grammar-correction/onnxt5/onnxt5/models.py", line 154, in forward
    return self.tokenizer.decode(new_tokens), new_logits
  File "/Users/jamiebrandon/Code/inferentia-test/venv/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3367, in decode
    return self._decode(
  File "/Users/jamiebrandon/Code/inferentia-test/venv/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 548, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
TypeError: 'float' object cannot be interpreted as an integer

It seems the tensor for new tokens is of type float instead of long. Adding dtype=torch.long to the instantiation of the tensor resolved my issue, so I thought I'd share.
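
If you cannot change the library code, the same idea (make sure the tokenizer only ever receives plain integers) can be applied to the token ids before decoding. The helper below is a hypothetical sketch, not part of onnxt5 or transformers; it also flattens nested lists, which is the shape that triggers the "not 'list'" variant of this error.

```python
def to_int_ids(token_ids):
    """Flatten nested token ids and convert each one to a plain int."""
    flat = []
    for item in token_ids:
        if isinstance(item, (list, tuple)):
            # Nested containers are flattened recursively.
            flat.extend(to_int_ids(item))
        else:
            # int() accepts floats such as 3.0 and other number-like scalars.
            flat.append(int(item))
    return flat


clean_ids = to_int_ids([3.0, [5.0], 7])
```

The cleaned list can then be passed to tokenizer.decode.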

opened by jambran 0

Running example "export_pretrained_model.py" as-is fails (See details)

86%|████████▌ | 18/21 [00:00<00:00, 44.29it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-f543e3365977> in <module>()
     27 # Generating text
     28 generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
---> 29 generative_t5('translate English to French: I was a victim of a series of accidents.', 21, temperature=0.)[0]

3 frames
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in _decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
    505         if isinstance(token_ids, int):
    506             token_ids = [token_ids]
--> 507         text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
    508 
    509         if clean_up_tokenization_spaces:

TypeError: 'float' object cannot be interpreted as an integer

Any possible version conflicts that you know of?

opened by PrithivirajDamodaran 2

How to suppress output

How to suppress output? Setting the logging verbosity level does nothing: 5%|█████████▊ | 16/300 [00:01<00:18, 15.65it/s]
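
One generic way to hide a progress bar like this is to redirect standard error while the call runs. This sketch assumes the bar is drawn by tqdm (which the progress-bar pull requests above mention for downloads) and written to stderr, tqdm's default stream; that is an assumption about the library's internals. silence_stderr is a hypothetical helper, not part of onnxt5.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def silence_stderr():
    """Temporarily send everything written to sys.stderr to the null device."""
    with open(os.devnull, "w") as devnull:
        with contextlib.redirect_stderr(devnull):
            yield


# Usage sketch (generative_t5 and prompt as in the README examples):
# with silence_stderr():
#     output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
with silence_stderr():
    print("this line is hidden", file=sys.stderr)
```

Note that this also hides warnings and error messages printed to stderr inside the block, so keep the block as small as possible.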

opened by 127 0

Is this model suitable for accelerating multilingual-t5?

Recently, I have been using the Chinese capabilities of the multilingual-t5 model for Chinese NLG tasks. However, inference is slow. Could this package be used with multilingual-t5? How would I do that?

opened by williamwong91 2

Releases(0.1.9)

0.1.9 (Jan 28, 2021)
0.1.8 (Jan 28, 2021)
0.1.6 (Aug 15, 2020)
0.1.3 (Aug 4, 2020)
0.0.9 (Aug 1, 2020)
0.0.5 (Aug 1, 2020): Now the Generative T5 can stop early if it receives an end of sequence signal.
0.0.2 (Aug 1, 2020): Minor changes to the metadata
0.0.1 (Aug 1, 2020): Initial release of the package
0.03 (Aug 1, 2020): Added url to the benchmarks.

Owner

Abel

Repentant portfolio manager, turned data scientist. I'm one Vonnegut quote away from figuring out this whole life thing.

