A high-level yet extensible library for fast language model tuning via automatic prompt search

Overview

ruPrompts

ruPrompts is a high-level yet extensible library for fast language model tuning via automatic prompt search, featuring integration with HuggingFace Hub, configuration system powered by Hydra, and command line interface.

Prompt is a text instruction for language model, like

Translate English to French:
cat =>

For some tasks the prompt is obvious, but for some it isn't. With ruPrompts you can define only the prompt format, like {text} , and train it automatically for any task, if you have a training dataset.

You can currently use ruPrompts for text-to-text tasks, such as summarization, detoxification, style transfer, etc., and for styled text generation, as a special case of text-to-text.

Features

  • Modular structure for convenient extensibility
  • Integration with HF Transformers, support for all models with LM head
  • Integration with HF Hub for sharing and loading pretrained prompts
  • CLI and configuration system powered by Hydra
  • Pretrained prompts for ruGPT-3

Installation

ruPrompts can be installed with pip:

pip install ruprompts[hydra]

See Installation for other installation options.

Usage

Loading a pretrained prompt for styled text generation:

>> ppln_joke("Говорит кружка ложке") [{"generated_text": 'Говорит кружка ложке: "Не бойся, не утонешь!".'}]">
>>> import ruprompts
>>> from transformers import pipeline

>>> ppln_joke = pipeline("text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_joke")
>>> ppln_joke("Говорит кружка ложке")
[{"generated_text": 'Говорит кружка ложке: "Не бойся, не утонешь!".'}]

For text2text tasks:

>> ppln_detox("Опять эти тупые дятлы все испортили, чтоб их черти взяли") [{"generated_text": 'Опять эти люди все испортили'}]">
>>> ppln_detox = pipeline("text2text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_detox_russe")
>>> ppln_detox("Опять эти тупые дятлы все испортили, чтоб их черти взяли")
[{"generated_text": 'Опять эти люди все испортили'}]

Proceed to Quick Start for a more detailed introduction or start using ruPrompts right now with our Colab Tutorials.

License

ruPrompts is Apache 2.0 licensed. See the LICENSE file for details.

This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2

GPT-2 Catalan playground and scripts to train a GPT-2 model either from scrath or from another pretrained model.

Laura 1 Jan 28, 2022
Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

GPT2-Pytorch with Text-Generator Better Language Models and Their Implications Our model, called GPT-2 (a successor to GPT), was trained simply to pre

Tae-Hwan Jung 775 Jan 08, 2023
source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.

WhiteningBERT Source code and data for paper WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach. Preparation git clone https://github.com

49 Dec 17, 2022
A simple version of DeTR

DeTR-Lite A simple version of DeTR Before you enjoy this DeTR-Lite The purpose of this project is to allow you to learn the basic knowledge of DeTR. P

Jianhua Yang 11 Jun 13, 2022
NLP command-line assistant powered by OpenAI

NLP command-line assistant powered by OpenAI

Axel 16 Dec 09, 2022
Search Git commits in natural language

NaLCoS - NAtural Language COmmit Search Search commit messages in your repository in natural language. NaLCoS (NAtural Language COmmit Search) is a co

Pushkar Patel 50 Mar 22, 2022
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect

Intel Labs 2.9k Jan 02, 2023
Harvis is designed to automate your C2 Infrastructure.

Harvis Harvis is designed to automate your C2 Infrastructure, currently using Mythic C2. 📌 What is it? Harvis is a python tool to help you create mul

Thiago Mayllart 99 Oct 06, 2022
A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis

WaveGlow A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis Quick Start: Install requirements: pip install

Yuchao Zhang 204 Jul 14, 2022
An open-source NLP library: fast text cleaning and preprocessing.

An open-source NLP library: fast text cleaning and preprocessing

Iaroslav 21 Mar 18, 2022
A Structured Self-attentive Sentence Embedding

Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR

Kaushal Shetty 488 Nov 28, 2022
Beyond Paragraphs: NLP for Long Sequences

Beyond Paragraphs: NLP for Long Sequences

AI2 338 Dec 02, 2022
NLP tool to extract emotional phrase from tweets 🤩

Emotional phrase extractor Extract phrase in the given text that is used to express the sentiment. Capturing sentiment in language is important in the

Shahul ES 38 Oct 17, 2022
Deal or No Deal? End-to-End Learning for Negotiation Dialogues

Introduction This is a PyTorch implementation of the following research papers: (1) Hierarchical Text Generation and Planning for Strategic Dialogue (

Facebook Research 1.4k Dec 29, 2022
Text editor on python to convert english text to malayalam(Romanization/Transiteration).

Manglish Text Editor This is a simple transiteration (romanization ) program which is used to convert manglish to malayalam (converts njaan to ഞാൻ ).

Merin Rose Tom 1 May 11, 2022
✔👉A Centralized WebApp to Ensure Road Safety by checking on with the activities of the driver and activating label generator using NLP.

AI-For-Road-Safety Challenge hosted by Omdena Hyderabad Chapter Original Repo Link : https://github.com/OmdenaAI/omdena-india-roadsafety Final Present

Prathima Kadari 7 Nov 29, 2022
A Fast Sequence Transducer Implementation with PyTorch Bindings

transducer A Fast Sequence Transducer Implementation with PyTorch Bindings. The corresponding publication is Sequence Transduction with Recurrent Neur

Awni Hannun 184 Dec 18, 2022
Modified GPT using average pooling to reduce the softmax attention memory constraints.

NLP-GPT-Upsampling This repository contains an implementation of Open AI's GPT Model. In particular, this implementation takes inspiration from the Ny

WD 1 Dec 03, 2021
A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

flair 12.3k Dec 31, 2022
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.

Reduce T5 model size by 3X and increase the inference speed up to 5X. Install Usage Details Functionalities Benchmarks Onnx model Quantized onnx model

Kiran R 399 Jan 05, 2023