Trained T5-base and T5-large models for generating keywords from text

Last update: Nov 24, 2022

Overview

text to keywords

Trained T5-base and T5-large models for generating keywords from text. Supported languages: Russian (ru).

Pretrained Large version (0x7194633/keyt5-large) | Pretrained Base version (0x7194633/keyt5-base)

Article on Habr

Usage

Example usage (the code returns a list of keywords; duplicates are possible):

pip install transformers sentencepiece torch

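from itertools import groupby

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "0x7194633/keyt5-large"  # or 0x7194633/keyt5-base
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def generate(text, **kwargs):
    # Tokenize the input text into PyTorch tensors
    inputs = tokenizer(text, return_tensors='pt')
    # Beam search with 5 beams; no gradients are needed at inference time
    with torch.no_grad():
        hypotheses = model.generate(**inputs, num_beams=5, **kwargs)
    # Decode the best hypothesis; keywords are separated by ';'
    s = tokenizer.decode(hypotheses[0], skip_special_tokens=True)
    # Normalize spaces around ';', lowercase, split on ';' and drop the
    # last segment (empty when the output ends with ';')
    s = s.replace('; ', ';').replace(' ;', ';').lower().split(';')[:-1]
    # groupby collapses only consecutive repeats, so non-adjacent duplicates can remain
    s = [el for el, _ in groupby(s)]
    return s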

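# Russian news snippet (the model supports Russian only). English gist:
# "Reuters reported 3,600 flight cancellations due to Omicron and the weather.
# The most cancellations on January 2 were at the US airlines SkyWest and Southwest,
# each with more than 400 cancelled flights. Of the flights cancelled on January 2,
# more than 2,100 were in the US. Over 6,400 flights were also delayed."
article = """Reuters сообщил об отмене 3,6 тыс. авиарейсов из-за «омикрона» и погоды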
Наибольшее число отмен авиарейсов 2 января пришлось на американские авиакомпании 
SkyWest и Southwest, у каждой — более 400 отмененных рейсов. При этом среди 
отмененных 2 января авиарейсов — более 2,1 тыс. рейсов в США. Также свыше 6400 
рейсов были задержаны."""

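# top_p only takes effect when sampling (do_sample=True); with plain beam search it is ignored
print(generate(article, top_p=1.0, max_length=64))
# ['авиаперевозки', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов', 'отмена рейсов', 'отмена авиарейсов']
# (English: 'air transport', 'cancellation of air flights', 'cancellation of flights', ...)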
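The post-processing inside generate() can be checked without loading the model. The sketch below isolates it in a hypothetical helper, parse_keywords, assuming the model emits keywords separated by ';' with a trailing ';' (which is what the [:-1] slice implies). It also shows an optional variant, not part of the original code, that drops every repeated keyword while keeping first-occurrence order by using dict.fromkeys instead of groupby.

```python
from itertools import groupby


def parse_keywords(raw: str, dedupe_all: bool = False) -> list[str]:
    """Split a ';'-separated model output into keywords (hypothetical helper).

    Mirrors the post-processing in generate(): normalize spaces around ';',
    lowercase, split, and drop the last segment (empty when the output ends
    with ';').
    """
    parts = raw.replace('; ', ';').replace(' ;', ';').lower().split(';')[:-1]
    if dedupe_all:
        # Assumed variant: drop every repeat, keeping first-occurrence order
        return list(dict.fromkeys(parts))
    # Same as generate(): collapse only consecutive repeats
    return [kw for kw, _ in groupby(parts)]


raw = "Flights; flight cancellation; flight cancellation; flights;"
print(parse_keywords(raw))                   # ['flights', 'flight cancellation', 'flights']
print(parse_keywords(raw, dedupe_all=True))  # ['flights', 'flight cancellation']
```

Because groupby only merges adjacent items, alternating repeats like those in the example output above survive; dict.fromkeys removes them while preserving order.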

Training

To train the keyT5-base and keyT5-large models, you will need a table in CSV format, like this:

X	Y
Some text that is fed to the input	The text that should come out
Some text that is fed to the input	The text that should come out

The keyT5 models were trained on ~7,000 compressed habr.com articles (see data.csv and collect.py). Only the Russian language is supported.
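A training file in that shape can be produced with Python's standard csv module. This is a minimal sketch: the column names X and Y come from the table, while the sample rows, the in-memory buffer, and the choice to join target keywords with '; ' plus a trailing ';' are assumptions, picked to match how generate() parses the model output.

```python
import csv
import io

# Hypothetical training pairs: input text -> list of target keywords
pairs = [
    ("Some text that is fed to the input", ["keyword one", "keyword two"]),
    ("Another input text", ["keyword three"]),
]


def write_dataset(rows, out):
    """Write (text, keywords) pairs as CSV with X (input) and Y (target) columns."""
    writer = csv.writer(out)
    writer.writerow(["X", "Y"])
    for text, keywords in rows:
        # Assumed target format 'kw1; kw2;', matching the split in generate()
        writer.writerow([text, "; ".join(keywords) + ";"])


def read_dataset(src):
    """Read the CSV back into a list of {'X': ..., 'Y': ...} dicts."""
    return list(csv.DictReader(src))


# In practice: open("data.csv", "w", newline="", encoding="utf-8")
buf = io.StringIO()
write_dataset(pairs, buf)
buf.seek(0)
rows = read_dataset(buf)
print(rows[0]["Y"])  # keyword one; keyword two;
```

The csv module quotes fields that contain commas or newlines, so article text can be stored safely; UTF-8 encoding matters for Russian input.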

See the training notebook to learn more.
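The training notebook itself is not reproduced here. As a rough, hedged illustration of one seq2seq training step on (X, Y) pairs with transformers, the sketch below passes target token ids as labels so that T5ForConditionalGeneration returns a cross-entropy loss, then takes one AdamW step. To run offline it uses a tiny randomly initialised T5; the config sizes, learning rate, and random token ids are hypothetical. Real fine-tuning would instead load a keyT5 checkpoint and tokenize X and Y with its tokenizer, setting padded label positions to -100 so the loss ignores them.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

torch.manual_seed(0)

# Hypothetical tiny config so the sketch runs without downloading a checkpoint.
# For real fine-tuning: T5ForConditionalGeneration.from_pretrained("0x7194633/keyt5-base")
config = T5Config(vocab_size=64, d_model=32, d_kv=8, d_ff=64,
                  num_layers=2, num_heads=4, decoder_start_token_id=0)
model = T5ForConditionalGeneration(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)


def training_step(input_ids, attention_mask, labels):
    """Run one optimisation step and return the loss value."""
    model.train()
    # With labels, T5 builds decoder inputs by shifting labels right and
    # returns token-level cross-entropy (positions equal to -100 are ignored)
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    optimizer.zero_grad()
    outputs.loss.backward()
    optimizer.step()
    return outputs.loss.item()


# Stand-ins for a tokenized batch of X (inputs) and Y (targets)
input_ids = torch.randint(2, config.vocab_size, (2, 10))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(2, config.vocab_size, (2, 6))
labels[1, 4:] = -100  # padded target positions are masked out of the loss

losses = [training_step(input_ids, attention_mask, labels) for _ in range(5)]
print(losses)
```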

Owner

Danil