The idea is to build a model that takes keywords as input and generates sentences as output.

Last update: Jan 03, 2023

Overview

keytotext

Potential use cases include:

Marketing
Search Engine Optimization
Topic generation
Fine-tuning of topic modeling models

Model:

Keytotext is based on the T5 model:

k2t: Model
k2t-base: Model
mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training notebooks can be found in the Training Notebooks folder.

Note: To add your own model to keytotext, please read the Models documentation.

Usage:

Example usage:

Example notebooks can be found in the Notebooks folder.

pip install keytotext

Trainer:

Keytotext now has a trainer class that can be used to train and fine-tune any T5-based model on new data. Updated Trainer docs: Docs

Trainer example here:

from keytotext import trainer
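The trainer learns from pairs of keywords and target sentences. The column names and keyword formatting keytotext actually expects are not documented in this section, so the sketch below assumes a hypothetical layout: each record is a list of keywords plus one sentence, and the model input is the keywords joined by single spaces. It only shows the pairing and cleanup step, in plain Python, before the data would be loaded into whatever structure the trainer takes; to_seq2seq_pairs is a hypothetical helper.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical record layout: (keywords, target sentence). keytotext's real
# column names and input format are not shown here; adjust to match them.
Example = Tuple[Sequence[str], str]


def to_seq2seq_pairs(examples: Iterable[Example]) -> List[Tuple[str, str]]:
    """Turn (keywords, sentence) records into (source, target) text pairs.

    Keywords are stripped and joined with single spaces (an assumed input
    format). Records with no usable keywords or an empty sentence are skipped.
    """
    pairs = []
    for keywords, sentence in examples:
        cleaned = [k.strip() for k in keywords if k.strip()]
        target = sentence.strip()
        if cleaned and target:
            pairs.append((" ".join(cleaned), target))
    return pairs
```

Skipping empty records up front keeps malformed rows from reaching training as blank inputs or blank targets.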

UI:

pip install streamlit-tags

This uses a custom Streamlit component built by me: GitHub

API:

The API is packaged in a Docker container and can be started quickly. Follow the instructions below to get started:

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This starts the API on port 8000. Visit the URL below to get results:

http://localhost:8000/api?data=["India","Capital","New Delhi"]
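The URL above puts a JSON list of keywords in the data query parameter. A browser accepts it as typed, but HTTP clients generally need the brackets, quotes, commas, and spaces percent-encoded. A minimal standard-library sketch, assuming the container started above is listening on localhost:8000 and that /api reads a JSON list from data exactly as in the example URL. The response format is not documented here, so the body is returned as raw text; build_api_url and fetch_sentence are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: the container started with `docker run -dp 8000:8000 gagan30/keytotext`
# is listening locally, and /api expects a JSON list in the `data` query parameter.
BASE_URL = "http://localhost:8000/api"


def build_api_url(keywords, base_url=BASE_URL):
    """Serialize the keywords as compact JSON and percent-encode them into the URL."""
    payload = json.dumps(list(keywords), separators=(",", ":"))
    # quote_via=quote encodes spaces as %20 (quote_plus would use "+"), and with
    # urlencode's default safe="" it also escapes brackets, quotes, and commas.
    return f"{base_url}?{urlencode({'data': payload}, quote_via=quote)}"


def fetch_sentence(keywords, timeout=30.0):
    """Call the running API and return the raw response body as text."""
    with urllib.request.urlopen(build_api_url(keywords), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the container running, fetch_sentence(["India", "Capital", "New Delhi"]) requests the same resource as the example URL.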

Note: The hosted API is only available on demand.

BibTeX:

To cite keytotext, please use this citation:

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

References

https://github.com/Shivanandroy/simpleT5 (Shivanand Roy)
https://github.com/patil-suraj/question_generation (Suraj Patil)
https://github.com/MathewAlexander/T5_nlg (Mathew Alexander)

Articles about keytotext:

https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45 (Mathew Alexander)
https://www.youtube.com/watch?v=I0iBzP-SxFY (video by 1LittleCoder)
https://medium.com/mlearning-ai/generating-sentences-from-keywords-using-transformers-in-nlp-e89f4de5cf6b (Prakhar Mishra)

Comments

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
Hi,

I tried to install keytotext via pip install keytotext --upgrade on my local machine,

but got the following error:

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
ERROR: No matching distribution found for keytotext

My pip version is the latest, and the same command works fine in Colab. Could you guide me through the fix?
opened by abhijithneilabraham 6
Add finetuning model to keytotext

Is your feature request related to a problem? Please describe.
It's difficult to use keytotext without fine-tuning it on a new corpus, so we need to build a script to fine-tune it on new corpora.
enhancement good first issue

opened by gagan3012 2
"Oh no." ?

"Error running app. If this keeps happening, please file an issue."

Ok,...sure? I know nothing about this app.

Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

Chrome browser, Linux.

opened by drscotthawley 2
Add Citations

Is your feature request related to a problem? Please describe.
Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

opened by gagan3012 1
Adding new models to keytotext

Is your feature request related to a problem? Please describe.
Add new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

enhancement good first issue

opened by gagan3012 1
Inference API for Keytotext

Is your feature request related to a problem? Please describe.
It is difficult to host the UI on Streamlit without an API.

Describe the solution you'd like
An inference API.
enhancement good first issue

opened by gagan3012 1
Create Better UI

Is your feature request related to a problem? Please describe.
The current UI is not functional; it needs to be fixed.

Describe the solution you'd like
A better UI with a nicer design.
enhancement

opened by gagan3012 1
Add `st.cache` to load model

Hi @gagan3012,

Johannes from the Streamlit team here :) I am currently investigating why apps run over the resource limits of Streamlit Sharing and saw that your app was affected in the past few days.

Thought I'd send you a small PR that should fix this. You're already on the right track with st.cache, but it gets even better if you also use it to load the model. This makes sure the model and tokenizer are only loaded once, so the app consumes less memory and shouldn't run into the resource limits again. Plus, I've seen that it also runs a bit faster now ;)

Hope this works for you and let me know if you have any other questions! 🎈

Cheers, Johannes

opened by jrieke 1
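The fix described above makes model loading happen once per process instead of on every rerun of the script. In the app this is done with Streamlit's st.cache decorator (newer Streamlit releases replace it with st.cache_resource for models). The sketch below swaps in the standard library's functools.lru_cache so the load-once behaviour can be shown without Streamlit; load_model and LOAD_CALLS are hypothetical names, and the returned dict is only a placeholder for the real model and tokenizer.

```python
import functools

# Hypothetical counter so the effect of caching is observable.
LOAD_CALLS = {"count": 0}


@functools.lru_cache(maxsize=None)
def load_model(model_name: str):
    """Expensive load, executed once per model name; later calls hit the cache.

    The dict is a placeholder: in the real app this would build the T5
    model and tokenizer (the step st.cache wraps in the PR).
    """
    LOAD_CALLS["count"] += 1
    return {"name": model_name, "weights": object()}
```

Because the cache key is the model name, switching between k2t and k2t-base loads each model once and then reuses it.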
ValueError: transformers.models.auto.__spec__ is None

'from keytotext import pipeline'

While running the above line, I get this error: "ValueError: transformers.models.auto.__spec__ is None"

opened by varunakk 0
Update README.md
opened by gagan3012 0
Update trainer.py
opened by gagan3012 0
Pipeline error on fresh install

Hi, I'm getting this on the first run after a fresh install:

Global seed set to 42
Traceback (most recent call last):
  File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module>
    from keytotext import pipeline
  File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module>
    from .dataset import make_dataset
  File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module>
    from cv2 import randShuffle
ModuleNotFoundError: No module named 'cv2'

opened by skintflickz 0
New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'
I have imported the model and the necessary libraries, but I am getting the error below in Google Colab. I used this model a few months ago and it worked fine; this error started recently with the same code.

TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

Imported libraries:

!pip install keytotext --upgrade
!sudo apt-get install git-lfs

from keytotext import trainer

Training Model:

model = trainer()
model.from_pretrained(model_name="t5-small")
model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5, use_gpu=True)
model.save_model()

I have attached a screenshot of the error.

OS: Windows

Browser: Chrome
opened by aishwaryapisal9 2
Update trainer.py
Delete progress_bar_refresh_rate in trainer.py

Description

Delete progress_bar_refresh_rate=5, since this keyword argument is no longer supported by the latest version (1.7.0) of PyTorch Lightning's Trainer.

Motivation and Context

Passing this argument makes the training process fail.

How Has This Been Tested?

Ran keytotext on a custom dataset before and after August 2, 2022. The new version of PyTorch Lightning's Trainer, in which the argument above was removed, took effect on that date, so custom training has failed since then.

Screenshots (if appropriate):

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[x] My code follows the code style of this project.

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[ ] I have read the CONTRIBUTING document.
opened by anath2110benten 0
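The PR above fixes the crash by deleting the argument. In PyTorch Lightning 1.5 and later, the refresh rate is configured through the TQDMProgressBar callback instead, for example pl.Trainer(callbacks=[TQDMProgressBar(refresh_rate=5)]). For code that must run against several Lightning versions, one defensive option (a sketch, not what the PR does) is to compare the keyword arguments with the installed Trainer's signature before calling it; split_supported_kwargs is a hypothetical helper.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_supported_kwargs(
    func: Callable[..., Any], kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (accepted, rejected) according to func's signature.

    For a class such as a Trainer, inspect.signature describes its __init__
    (without self). If func takes **kwargs, every key counts as accepted.
    """
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rejected
```

The rejected dict is returned rather than discarded so the caller can log or warn about it; a removed option such as progress_bar_refresh_rate then shows up in the training output instead of crashing the run or vanishing silently.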
Why is cv2 required?

https://github.com/gagan3012/keytotext/blob/6f807b940f5e2fdeb755ed085b40af7c0fa5e87e/keytotext/dataset.py#L1

I'm using this framework to generate text from a knowledge graph. The Python interpreter keeps throwing a "cv2 not installed" exception; it looks like the pip package doesn't declare cv2 as a dependency. When I deleted this line from the source code, the model worked well. Is this line necessary for the project? Would you consider adding opencv to the pip package's dependencies? Thanks for your help.

opened by ChunxuYang 0
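The issue reports that the model works with the import removed, which suggests randShuffle is not needed for inference. If shuffling is wanted when building a dataset (an assumption about the line's intent), the standard library covers it without adding OpenCV as a dependency; with pandas, df.sample(frac=1, random_state=seed) does the same for a DataFrame. A minimal sketch, where shuffled is a hypothetical helper:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shuffled(rows: Sequence[T], seed: Optional[int] = None) -> List[T]:
    """Return a shuffled copy of rows without mutating the input.

    A private random.Random instance makes a fixed seed reproducible
    without touching the global RNG state.
    """
    rng = random.Random(seed)
    out = list(rows)
    rng.shuffle(out)
    return out
```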
Hi, I noticed that, given the same input keywords, the generated text is the same across different runs, even when setting different seeds with pl.seed_everything(...).

opened by RuiFeiHe 6
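A likely cause, assuming the pipeline decodes greedily or with beam search by default: Hugging Face generate() only draws random numbers when do_sample=True, so changing the seed cannot change the output. The toy sketch below uses hypothetical per-step token distributions (not a real model) to show the difference between deterministic and sampled decoding.

```python
import random
from typing import Dict, List

# Toy stand-in for a model: one fixed next-token distribution per step.
# (A real decoder conditions each step on the tokens generated so far.)
Steps = List[Dict[str, float]]


def greedy_decode(steps: Steps) -> List[str]:
    """Take the most probable token at every step; no randomness is involved."""
    return [max(dist, key=dist.get) for dist in steps]


def sample_decode(steps: Steps, seed: int) -> List[str]:
    """Draw each token in proportion to its probability, using a seeded RNG."""
    rng = random.Random(seed)
    return [rng.choices(list(dist), weights=list(dist.values()))[0] for dist in steps]


STEPS: Steps = [
    {"Delhi": 0.6, "India": 0.4},
    {"is": 0.7, "was": 0.3},
    {"capital": 0.9, "city": 0.1},
]
```

greedy_decode(STEPS) returns ["Delhi", "is", "capital"] whatever the seed, while sample_decode reproduces the same tokens for the same seed and can vary across seeds. Whether the keytotext pipeline forwards a do_sample flag to generate() is not confirmed here.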

Releases (v1.5.0)

v1.5.0 (Jul 9, 2021)

Trainer tool finalized and completed!

v1.4.1 (Jul 2, 2021)

Validation accuracy added

v1.3.9 (Jul 2, 2021)

Bug fixes

v1.3.8 (Jul 2, 2021)

New module for uploading to the Hugging Face Hub

v1.3.1 (Jun 16, 2021)

Documentation updated, along with semantic versioning

v0.3.1 (Jun 15, 2021)

This version features a tested trainer that can be used in four lines of code:

from keytotext import KeytotextTrainer

model = KeytotextTrainer()
model.from_pretrained(model_name="t5-small")
model.train(data_df=df, batch_size=4, max_epochs=3, use_gpu=True)
model.save_model()

v0.2.9 (Jun 15, 2021)

This release features the new Trainer module. More details coming soon.

v0.2.5 (May 12, 2021)
Changes:

Bug fixes

Maintaining new models

v0.2.4 (May 11, 2021)
Changes:

Refactoring of code

Ability to add new models too

v0.2.3 (May 10, 2021)
Changes:

Bug fixes

New models added

v0.2.2 (May 10, 2021)
Changes:

keytotext now supports models trained by other people too

A new fine-tuning script

v0.2.1 (May 5, 2021)

Bug fixes

v0.2.0 (May 4, 2021)
Changes:

Completed API

Completed testing

Completed all evals

UI improvements

v0.1.6 (May 2, 2021)
Changes:

Updates to Eval pipeline

v0.1.5 (May 2, 2021)
Changes:

Added Trainer API

Added Eval pipeline

v0.1.4 (Apr 30, 2021)

Latest release

v0.1.3 (Apr 27, 2021)

Updates

0.1.1 (Apr 26, 2021)

0.1.0 (Apr 26, 2021)

Production release 0.1.0

Owner

Gagan Bhatia

Software Developer | Machine Learning Enthusiast

GitHub Repository: https://github.com/gagan3012/keytotext
Streamlit app: https://share.streamlit.io/gagan3012/keytotext/UI/app.py

A Fast Sequence Transducer Implementation with PyTorch Bindings

transducer A Fast Sequence Transducer Implementation with PyTorch Bindings. The corresponding publication is Sequence Transduction with Recurrent Neur

184 Dec 18, 2022

Simple Text-To-Speech Bot For Discord

Simple Text-To-Speech Bot For Discord This is a very simple TTS bot for discord made with python. For this bot you need FFMPEG, see installation to se

1 Sep 26, 2022

A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis

WaveGlow A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis Quick Start: Install requirements: pip install

204 Jul 14, 2022

Predict the spans of toxic posts that were responsible for the toxic label of the posts

toxic-spans-detection An attempt at the SemEval 2021 Task 5: Toxic Spans Detection. The Toxic Spans Detection task of SemEval2021 required participant

3 Jul 24, 2022

Code for "AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling"

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling This repository contains PyTorch evaluation code, training code and pretrain

94 Oct 26, 2022

[EMNLP 2021] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.

61 Dec 10, 2022

GVT is a generic translation tool for parts of text on the PC screen with Text-to-Speech functionality.

GVT is a generic translation tool for parts of text on the PC screen with Text to Speech functionality. I wanted to create it because the existing tools that I experimented with did not satisfy me in

1 Aug 21, 2022

Findings of ACL 2021

Assessing Dialogue Systems with Distribution Distances [arXiv][code] We propose to measure the performance of a dialogue system by computing the distr

16 Feb 24, 2022

Twewy-discord-chatbot - Build a Discord AI Chatbot that Speaks like Your Favorite Character

Build a Discord AI Chatbot that Speaks like Your Favorite Character! This is a Discord AI Chatbot that uses the Microsoft DialoGPT conversational mode

231 Dec 30, 2022

Utilizing RBERT model for KLUE Relation Extraction task

RBERT for Relation Extraction task for KLUE Project Description Relation Extraction task is one of the tasks of Korean Language Understanding Evaluatio

14 Nov 15, 2022

Leon is an open-source personal assistant who can live on your server.

Leon Your open-source personal assistant. Website :: Documentation :: Roadmap :: Contributing :: Story 👋 Introduction Leon is an open-source personal

11.7k Dec 30, 2022

A machine learning model that analyzes text for user sentiment and determines whether it's a positive, neutral, or negative review.

Sentiment Analysis on Yelp's Dataset Author: Roberto Sanchez, Talent Path: D1 Group Docker Deployment: Deployment of this application can be found her

0 Aug 04, 2021

Code for the Findings of NAACL 2022(Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks arXiv link: upcoming To be published in Findings of NA

16 Nov 12, 2022

AMUSE - financial summarization

AMUSE AMUSE - financial summarization Unzip data.zip Train new model: python FinAnalyze.py --task train --start 0 --count how many files,-1 for all

1 Jan 11, 2022

This repository implements a brute-force spellchecker utilizing the Damerau-Levenshtein edit distance.

About spellchecker.py Implementing a highly-accurate, brute-force, and dynamically programmed spellchecking program that utilizes the Damerau-Levensht

1 Dec 11, 2021

My implementation of the Safaricom Machine Learning Codility test. The code has bugs; I guess I made logical errors, and any correction will be appreciated.

Safaricom_Codility Machine Learning 2022 The test entails two questions. Question 1 was on Machine Learning. Question 2 was on SQL I ran out of time.

1 Mar 03, 2022

Problem: Given a Nepali news article, find its category

Classification of Nepali news categories using different algorithms. Problem: Multiclass Classification. Approaches: TFIDF for vectorization

2 Jan 09, 2022

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

TextBlob: Simplified Text Processing Homepage: https://textblob.readthedocs.io/ TextBlob is a Python (2 and 3) library for processing textual data. It

8.4k Dec 26, 2022

A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy.

Crosslingual Coreference Coreference is amazing but the data required for training a model is very scarce. In our case, the available training for non

71 Jan 04, 2023

Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks.

KB-NER: a Knowledge-based System for Multilingual Complex Named Entity Recognition The code is for the winner system (DAMO-NLP) of SemEval 2022 MultiC

116 Dec 27, 2022
