This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.

Related tags

Text Data & NLPNLP
Overview

FALLABOUT-SRMMIC 21

POETRY-GENERATION

HINGLISH

DESCRIPTION

We have developed a NLP(natural language processing) model which automatically generates a poem based on the initial/promt text given as input by the user.

Motivation

The majority of ML/DL models result is usualy based on the training/validation accuracy and loss. And one of the models which does not depend on either on accuracy or loss is NLP text generating model. Irrespective of the accuracy the generated text may or maynot make sense. Sometimes the accuracy can be very high and not give satisfactory results or end up in a loop. So this can only be done by looking at the result after many trails and training.

Uses

  1. Can be used for creative and fun purposes.
  2. Can sometimes used for reproducing or generating the text for larger datasets.
  3. Literature purpose like understanding and analysing a certain poetric style.

What's unique?

  1. Unlike many poetry generation, we also built a hindi poetry text generation model.
  2. We provide an analysis for LSTM layers and transformers with an example for better understanding.

Built with

  1. Streamlit for frontend
  2. tensorflow keras for hindi poetry
  3. aitextgen for english poetry

Deeper into the project

The english poetry generation is developed with the help of an open-sourse library known as aitextgen. The famous GPT-2 transformer is used in this project, finetuned on Shakespeares poems and sonnets alone. The hindi poetry generation is built with tensorflow keras. The front-end is simply handled by streamlit.

Here is an example of how aitextgen is fine tuned. Here is an example on how to train your own model using tensorflow keras.

A peek into our project

hindiNLP

EnglishNLP

Installation

The app.py file should be installed and download the model from this link. The trained_model folder should specify the path to your downloaded model. And you have to install trained_model_hindi from this link and specify the path as above. The trained_model_hindi forlder contains the trained model, tokenizer and etc. Similarly the trained_model folder for english also contains the model and uses the default built in GPT-2 transformer. Finally streamlit run app.py in your terminal and enjoy the app.

FALLABOUT SRM

This is how Your code should look while running on local.

Future works

  1. Planning on including a translator to slide easily between languages.
  2. Introduce more poet based model in many languages.

Authors

  1. Paras Rawat
  2. Daketi Yatin
LeBenchmark: a reproducible framework for assessing SSL from speech

LeBenchmark: a reproducible framework for assessing SSL from speech

11 Nov 30, 2022
基于GRU网络的句子判断程序/A program based on GRU network for judging sentences

SentencesJudger SentencesJudger 是一个基于GRU神经网络的句子判断程序,基本的功能是判断文章中的某一句话是否为一个优美的句子。 English 如何使用SentencesJudger 确认Python运行环境 安装pyTorch与LTP python3 -m pip

8 Mar 24, 2022
Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish Language Models 💃🏻 Corpora 📃 Corpora Number of documents Size (GB) BNE 201,080,084 570GB Models 🤖 RoBERTa-base BNE: https://huggingface.co

PlanTL-SANIDAD 203 Dec 20, 2022
A fast and lightweight python-based CTC beam search decoder for speech recognition.

pyctcdecode A fast and feature-rich CTC beam search decoder for speech recognition written in Python, providing n-gram (kenlm) language model support

Kensho 315 Dec 21, 2022
Automatic privilege escalation for misconfigured capabilities, sudo and suid binaries

GTFONow Automatic privilege escalation for misconfigured capabilities, sudo and suid binaries. Features Automatically escalate privileges using miscon

101 Jan 03, 2023
Text editor on python tkinter to convert english text to other languages with the help of ployglot.

Transliterator Text Editor This is a simple transliteration program which is used to convert english word to phonetically matching word in another lan

Merin Rose Tom 1 Jan 16, 2022
Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products.

Leah Pathan Khan 2 Jan 12, 2022
NLP topic mdel LDA - Gathered from New York Times website

NLP topic mdel LDA - Gathered from New York Times website

1 Oct 14, 2021
In this Notebook I've build some machine-learning and deep-learning to classify corona virus tweets, in both multi class classification and binary classification.

Hello, This Notebook Contains Example of Corona Virus Tweets Multi Class Classification. - Classes is: Extremely Positive, Positive, Extremely Negativ

Khaled Tofailieh 3 Dec 06, 2022
Multilingual text (NLP) processing toolkit

polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p

RAMI ALRFOU 2.1k Jan 07, 2023
Correctly generate plurals, ordinals, indefinite articles; convert numbers to words

NAME inflect.py - Correctly generate plurals, singular nouns, ordinals, indefinite articles; convert numbers to words. SYNOPSIS import inflect p = in

Jason R. Coombs 762 Dec 29, 2022
Ελληνικά νέα (Python script) / Greek News Feed (Python script)

Ελληνικά νέα (Python script) / Greek News Feed (Python script) Ελληνικά English Το 2017 είχα υλοποιήσει ένα Python script για να εμφανίζει τα τωρινά ν

Loren Kociko 1 Jun 14, 2022
Long text token classification using LongFormer

Long text token classification using LongFormer

abhishek thakur 161 Aug 07, 2022
This is the code for the EMNLP 2021 paper AEDA: An Easier Data Augmentation Technique for Text Classification

The baseline code is for EDA: Easy Data Augmentation techniques for boosting performance on text classification tasks

Akbar Karimi 81 Dec 09, 2022
SpikeX - SpaCy Pipes for Knowledge Extraction

SpikeX is a collection of pipes ready to be plugged in a spaCy pipeline. It aims to help in building knowledge extraction tools with almost-zero effort.

Erre Quadro Srl 384 Dec 12, 2022
中文无监督SimCSE Pytorch实现

A PyTorch implementation of unsupervised SimCSE SimCSE: Simple Contrastive Learning of Sentence Embeddings 1. 用法 无监督训练 python train_unsup.py ./data/ne

99 Dec 23, 2022
Graph Coloring - Weighted Vertex Coloring Problem

Graph Coloring - Weighted Vertex Coloring Problem This project proposes several local searches and an MCTS algorithm for the weighted vertex coloring

Cyril 1 Jul 08, 2022
Reading Wikipedia to Answer Open-Domain Questions

DrQA This is a PyTorch implementation of the DrQA system described in the ACL 2017 paper Reading Wikipedia to Answer Open-Domain Questions. Quick Link

Facebook Research 4.3k Jan 01, 2023
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost LOVE is accpeted by ACL22 main conference as a long pape

Lihu Chen 32 Jan 03, 2023
Natural Language Processing Tasks and Examples.

Natural Language Processing Tasks and Examples With the advancement of A.I. technology in recent years, natural language processing technology has bee

Soohwan Kim 53 Dec 20, 2022