Honors thesis project analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry.

Last update: Jan 09, 2022

gpt2-poetry

The following code is for my senior honors thesis project, completed under the guidance of Dr. Keith Holyoak at the University of California, Los Angeles.

I am currently analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry. The project uses the GPT-2 architecture (code originating from "Language Models are Unsupervised Multitask Learners" by Radford et al.; paper at this link: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) to generate poetry after training on two different corpora: a corpus of sonnets (fourteen-line rhymed poems) and a corpus of free-verse poems of ten to eighteen lines, selected from issues of Poetry Magazine published between January 2012 and December 2021. I plan to compare the quality of the generated poems with that of randomly selected human-written poems from each training set through a participant survey on different characteristics of poetry.
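The ten-to-eighteen-line criterion for the free-verse corpus can be illustrated with a short filter. This is only a sketch: how the poems were actually selected is not described here, the function names are hypothetical, and counting only non-blank lines (so stanza breaks are ignored) is an assumption.

```python
from typing import Iterable, List

# Length bounds for the free-verse corpus, as stated above.
MIN_LINES = 10
MAX_LINES = 18


def count_lines(poem: str) -> int:
    """Count non-blank lines; blank lines between stanzas are ignored (assumption)."""
    return sum(1 for line in poem.splitlines() if line.strip())


def filter_by_length(
    poems: Iterable[str],
    min_lines: int = MIN_LINES,
    max_lines: int = MAX_LINES,
) -> List[str]:
    """Keep only the poems whose line count falls within the inclusive bounds."""
    return [p for p in poems if min_lines <= count_lines(p) <= max_lines]
```

Calling `filter_by_length(poems, 14, 14)` would keep only fourteen-line poems, which matches the line count of the sonnet corpus but does not check rhyme.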

To run: install Python 3.9.8, as well as the following modules: Fire 0.1.3, Regex 2017.4.5, Requests 2.21.0, tqdm 4.31.1, and toposort 1.5.
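As a convenience, the pinned versions above can be checked against the current environment. The following is a minimal sketch, not part of the project code: the helper names are hypothetical, and it assumes the modules are installed under the PyPI distribution names `fire`, `regex`, `requests`, `tqdm`, and `toposort`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README; the distribution names are assumed.
PINNED = {
    "fire": "0.1.3",
    "regex": "2017.4.5",
    "requests": "2.21.0",
    "tqdm": "4.31.1",
    "toposort": "1.5",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {name: (wanted, found)} for each package whose version differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running the script prints one line per package that is missing or installed at a different version, and prints nothing when the environment matches the list.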

This project is in progress, and only the free-verse portion of the data is currently uploaded to GitHub. The sonnets generated by the GPT-2 model will be uploaded soon!

Last updated: 1/5/2021

Owner

Ashley Kim
