Fake Shakespearean Text Generator

Overview

This project contains an implementation of a stateful Char-RNN model to generate fake Shakespearean text.

Files and folders of the project

models folder

This folder contains two zip files, one for the stateful model and the other for the stateless model (these files are fully saved model architectures, not just the weights).
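After unzipping, either model can be restored in a single call; a minimal sketch, assuming the archives extract to directories named stateful_model and stateless_model:

from tensorflow import keras

# load_model restores the full architecture and weights; no rebuilding needed.
stateless_model = keras.models.load_model("stateless_model")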

weights.zip

As its name implies, this zip file contains the model's weights in checkpoint format (see TensorFlow's model save formats).

tokenizer.save

This file is a saved instance of the TensorFlow Tokenizer, trained on the dataset; it is used at inference time.
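If you need it directly, the tokenizer can be restored before inference; a minimal sketch, assuming it was serialized with pickle (check inference.py for the actual loading code):

import pickle

with open("tokenizer.save", "rb") as f:  # assumption: pickle serialization
    tokenizer = pickle.load(f)

print(tokenizer.texts_to_sequences(["To be or not to be"]))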

shakespeare.txt

This file is the dataset, composed of plain text (see below for what it looks like).

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

train.py

Contains the code for training.

inference.py

Contains the code for inference.

How to Train the Model

A deeper look into the train.py file


First, it downloads the dataset from the specified URL (line 11). Then it reads the dataset and uses it to train the tokenizer object mentioned above (line 18). After training, it encodes the dataset (line 24).

Since this is a stateful model, every sequence in a batch must start where the sequence at the same index in the previous batch left off. Say a batch consists of 32 sequences: the 33rd sequence (i.e., the first sequence of the second batch) must start exactly where the 1st sequence (the first sequence of the first batch) ended, the 2nd sequence of the second batch must start where the 2nd sequence of the first batch ended, and so on. The code between lines 28 and 48 builds the dataset this way.

The code between lines 53 and 57 creates the stateful model. Note that to be able to use the recurrent_dropout hyperparameter, you have to train the model on a GPU. After the model is created, a callback that resets the states at the beginning of each epoch is defined. Training then starts with a call to the fit method, and finally the model (see TensorFlow's whole-model saving), the model's weights, and the tokenizer are saved.
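For reference, here is a rough sketch of that pipeline. It is an illustration rather than a copy of train.py: the dataset URL, batch size, window length, and layer sizes are assumptions.

import numpy as np
import pickle
import tensorflow as tf
from tensorflow import keras

# Download the corpus (URL is an assumption, not necessarily the one at line 11).
url = "https://homl.info/shakespeare"
filepath = keras.utils.get_file("shakespeare.txt", url)
with open(filepath) as f:
    text = f.read()

# Character-level tokenizer, trained on the whole corpus.
tokenizer = keras.preprocessing.text.Tokenizer(char_level=True)
tokenizer.fit_on_texts([text])
max_id = len(tokenizer.word_index)  # number of distinct characters
[encoded] = np.array(tokenizer.texts_to_sequences([text])) - 1

# Stateful training data: split the corpus into batch_size contiguous parts,
# so sequence i of each batch continues sequence i of the previous batch.
batch_size = 32   # assumption
n_steps = 100     # window length, assumption
window_length = n_steps + 1
parts = np.array_split(encoded, batch_size)
datasets = []
for part in parts:
    ds = tf.data.Dataset.from_tensor_slices(part)
    ds = ds.window(window_length, shift=n_steps, drop_remainder=True)
    ds = ds.flat_map(lambda window: window.batch(window_length))
    datasets.append(ds)
dataset = tf.data.Dataset.zip(tuple(datasets))
dataset = dataset.map(lambda *windows: tf.stack(windows))
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(lambda X, Y: (tf.one_hot(X, depth=max_id), Y)).prefetch(1)

# Stateful Char-RNN; as noted above, train with recurrent_dropout on a GPU.
model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                     dropout=0.2, recurrent_dropout=0.2,
                     batch_input_shape=[batch_size, None, max_id]),
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                     dropout=0.2, recurrent_dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id, activation="softmax")),
])

class ResetStatesCallback(keras.callbacks.Callback):
    # The corpus restarts every epoch, so the hidden states must be reset.
    def on_epoch_begin(self, epoch, logs=None):
        self.model.reset_states()

model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(dataset, epochs=10, callbacks=[ResetStatesCallback()])

model.save("stateful_model")              # whole-model save
model.save_weights("weights/ckpt")        # checkpoint-format weights
with open("tokenizer.save", "wb") as f:   # assumption: tokenizer is pickled
    pickle.dump(tokenizer, f)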

Usage of the Model

Where the magic happens (inference.py file)


To use the model, it must first be converted to a stateless model, because a stateful model expects a batch of inputs instead of a single input. To do this, a stateless model with the same architecture as the stateful model is created; the code between lines 44 and 49 does this. Before the weights can be loaded, the model must be built; after building, the weights are loaded into the stateless model.

This model uses the character predicted at time step t as the input at time step t + 1 to predict the character at t + 2, and the process continues until the last character is predicted (100 characters in this case, but you can change it to whatever you want; note that longer sequences end up with less accurate results).

To predict the next characters, the provided initial character must first be tokenized; the preprocess function does this. To prevent the same characters from repeating in the generated text, the next character is selected randomly from the candidate characters; the next_char function does this, and the randomness can be controlled with the temperature parameter (to learn its usage, check the comment at line 30). The complete_text function takes a character as an argument, predicts the next character via the next_char function, and concatenates the predicted character to the text, repeating the process until it reaches n_chars. Finally, the stateless model is saved as well.
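A rough sketch of that flow is below. It is not a copy of inference.py: the layer sizes, checkpoint path, and the pickled-tokenizer assumption carry over from the training sketch above.

import pickle
import numpy as np
import tensorflow as tf
from tensorflow import keras

with open("tokenizer.save", "rb") as f:  # assumption: pickle serialization
    tokenizer = pickle.load(f)
max_id = len(tokenizer.word_index)

# Stateless twin of the trained model: same layers, but stateful=False and
# no fixed batch size, so a single sequence can be fed in.
stateless_model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id]),
    keras.layers.GRU(128, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id, activation="softmax")),
])
stateless_model.build(tf.TensorShape([None, None, max_id]))  # build before loading
stateless_model.load_weights("weights/ckpt")  # assumption: path inside weights.zip

def preprocess(texts):
    # Tokenize and one-hot encode, matching the training-time encoding.
    X = np.array(tokenizer.texts_to_sequences(texts)) - 1
    return tf.one_hot(X, max_id)

def next_char(text, temperature=1.0):
    # Sample the next character from the predicted distribution; a higher
    # temperature flattens the distribution and gives more random picks.
    X_new = preprocess([text])
    y_proba = stateless_model.predict(X_new)[0, -1:, :]
    rescaled_logits = tf.math.log(y_proba) / temperature
    char_id = tf.random.categorical(rescaled_logits, num_samples=1) + 1
    return tokenizer.sequences_to_texts(char_id.numpy())[0]

def complete_text(text, n_chars=100, temperature=1.0):
    # Feed each predicted character back in until n_chars are generated.
    for _ in range(n_chars):
        text += next_char(text, temperature)
    return text

stateless_model.save("stateless_model")  # the stateless model is saved too

With this in place, calls like print(complete_text("a")) produce samples like those shown in the Results section below.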

Results

Effects of the magic


print(complete_text("a"))

arpet:
like revenge borning and vinged him not.

lady good:
then to know to creat it; his best,--lord


print(complete_text("k"))

ken countents.
we are for free!

first man:
his honour'd in the days ere in any since
and all this ma


print(complete_text("f"))

ford:
hold! we must percy and he was were good.

gabes:
by fair lord, my courters,
sir.

nurse:
well


print(complete_text("h"))

holdred?
what she pass myself in some a queen
and fair little heartom in this trumpet our hands?
the

Owner
Recep YILDIRIM
Software Imagineering