This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.

Related tags

Text Data & NLPNLP
Overview

FALLABOUT-SRMMIC 21

POETRY-GENERATION

HINGLISH

DESCRIPTION

We have developed a NLP(natural language processing) model which automatically generates a poem based on the initial/promt text given as input by the user.

Motivation

The majority of ML/DL models result is usualy based on the training/validation accuracy and loss. And one of the models which does not depend on either on accuracy or loss is NLP text generating model. Irrespective of the accuracy the generated text may or maynot make sense. Sometimes the accuracy can be very high and not give satisfactory results or end up in a loop. So this can only be done by looking at the result after many trails and training.

Uses

  1. Can be used for creative and fun purposes.
  2. Can sometimes used for reproducing or generating the text for larger datasets.
  3. Literature purpose like understanding and analysing a certain poetric style.

What's unique?

  1. Unlike many poetry generation, we also built a hindi poetry text generation model.
  2. We provide an analysis for LSTM layers and transformers with an example for better understanding.

Built with

  1. Streamlit for frontend
  2. tensorflow keras for hindi poetry
  3. aitextgen for english poetry

Deeper into the project

The english poetry generation is developed with the help of an open-sourse library known as aitextgen. The famous GPT-2 transformer is used in this project, finetuned on Shakespeares poems and sonnets alone. The hindi poetry generation is built with tensorflow keras. The front-end is simply handled by streamlit.

Here is an example of how aitextgen is fine tuned. Here is an example on how to train your own model using tensorflow keras.

A peek into our project

hindiNLP

EnglishNLP

Installation

The app.py file should be installed and download the model from this link. The trained_model folder should specify the path to your downloaded model. And you have to install trained_model_hindi from this link and specify the path as above. The trained_model_hindi forlder contains the trained model, tokenizer and etc. Similarly the trained_model folder for english also contains the model and uses the default built in GPT-2 transformer. Finally streamlit run app.py in your terminal and enjoy the app.

FALLABOUT SRM

This is how Your code should look while running on local.

Future works

  1. Planning on including a translator to slide easily between languages.
  2. Introduce more poet based model in many languages.

Authors

  1. Paras Rawat
  2. Daketi Yatin
SimCSE: Simple Contrastive Learning of Sentence Embeddings

SimCSE: Simple Contrastive Learning of Sentence Embeddings This repository contains the code and pre-trained models for our paper SimCSE: Simple Contr

Princeton Natural Language Processing 2.5k Jan 07, 2023
Beyond the Imitation Game collaborative benchmark for enormous language models

BIG-bench ๐Ÿช‘ The Beyond the Imitation Game Benchmark (BIG-bench) will be a collaborative benchmark intended to probe large language models, and extrap

Google 1.3k Jan 01, 2023
InferSent sentence embeddings

InferSent InferSent is a sentence embeddings method that provides semantic representations for English sentences. It is trained on natural language in

Facebook Research 2.2k Dec 27, 2022
Source code for the paper "TearingNet: Point Cloud Autoencoder to Learn Topology-Friendly Representations"

TearingNet: Point Cloud Autoencoder to Learn Topology-Friendly Representations Created by Jiahao Pang, Duanshun Li, and Dong Tian from InterDigital In

InterDigital 21 Dec 29, 2022
Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

VK.com 847 Dec 19, 2022
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 26 Dec 14, 2022
In this project, we compared Spanish BERT and Multilingual BERT in the Sentiment Analysis task.

Applying BERT Fine Tuning to Sentiment Classification on Amazon Reviews Abstract Sentiment analysis has made great progress in recent years, due to th

Alexander Leonardo Lique Lamas 5 Jan 03, 2022
Exploration of BERT-based models on twitter sentiment classifications

twitter-sentiment-analysis Explore the relationship between twitter sentiment of Tesla and its stock price/return. Explore the effect of different BER

Sammy Cui 2 Oct 02, 2022
A curated list of FOSS tools to improve the Hacker News experience

Awesome-Hackernews Hacker News is a social news website focusing on computer technologies, hacking and startups. It promotes any content likely to "gr

Bryton Lacquement 141 Dec 27, 2022
Generating new names based on trends in data using GPT2 (Transformer network)

MLOpsNameGenerator Overall Goal The goal of the project is to develop a model that is capable of creating Pokรฉmon names based on its description, usin

Gustav Lang Moesmand 2 Jan 10, 2022
Python package for Turkish Language.

PyTurkce Python package for Turkish Language. Documentation: https://pyturkce.readthedocs.io. Installation pip install pyturkce Usage from pyturkce im

Mert Cobanov 14 Oct 09, 2022
Download videos from YouTube/Twitch/Twitter right in the Windows Explorer, without installing any shady shareware apps

youtube-dl and ffmpeg Windows Explorer Integration Download videos from YouTube/Twitch/Twitter and more (any platform that is supported by youtube-dl)

Wolfgang 226 Dec 30, 2022
A curated list of efficient attention modules

awesome-fast-attention A curated list of efficient attention modules

Sepehr Sameni 891 Dec 22, 2022
๋‚ด๋ถ€ ์ž‘์—…์šฉ django + vue(vuetify) boilerplate. ์ง  ํ•˜๋ฉด ๋Œ์•„๊ฐ.

Pocket Galaxy ์•„์ฃผ ๊ฐ„๋‹จํ•œ ๊ฐœ์ธ์šฉ, ํ˜น์€ ๋‚ด๋ถ€์šฉ ํˆด์„ ๋งŒ๋“ค์–ด์•ผํ•˜๋Š”๋ฐ ์ด์™•์ด๋ฉด ์›น์ด ํŽธํ•˜์ฃ ? ๊ทธ๋Ÿด๋•Œ๋ฅผ ์œ„ํ•ด ๋งŒ๋“ค์–ด๋‘” django์™€ vue(vuetify)๋กœ ์ด๋ค„์ง„ boilerplate ์ž…๋‹ˆ๋‹ค. ๊ฐ ํด๋”์— ์žˆ๋Š” ์„ค๋ช…์„œ๋Œ€๋กœ ์‹คํ–‰์„ ์‹œํ‚ค๋ฉด ์ผ๋‹จ ๋‹น์žฅ ๋ญ”๊ฐ€๊ฐ€ ๋Œ์•„๊ฐ‘๋‹ˆ

Jamie J. Seol 16 Dec 03, 2021
Pipeline for fast building text classification TF-IDF + LogReg baselines.

Text Classification Baseline Pipeline for fast building text classification TF-IDF + LogReg baselines. Usage Instead of writing custom code for specif

Dani El-Ayyass 57 Dec 07, 2022
Fixes mojibake and other glitches in Unicode text, after the fact.

ftfy: fixes text for you print(fix_encoding("(ร ยธโ€ก'รขล’ยฃ')ร ยธโ€ก")) (เธ‡'โŒฃ')เธ‡ Full documentation: https://ftfy.readthedocs.org Testimonials โ€œMy life is li

Luminoso Technologies, Inc. 3.4k Dec 29, 2022
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing Trankit is a light-weight Transformer-based Pyth

652 Jan 06, 2023
A framework for implementing federated learning

This is partly the reproduction of the paper of [Privacy-Preserving Federated Learning in Fog Computing](DOI: 10.1109/JIOT.2020.2987958. 2020)

DavidChen 46 Sep 23, 2022
Natural language computational chemistry command line interface.

nlcc Install pip install nlcc Must have Open-AI Codex key: export OPENAI_API_KEY=your key here then nlcc key bindings ctrl-w copy to clipboard (Note

Andrew White 37 Dec 14, 2022