Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet.

Overview

Sonnet finder

Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet.

Usage

This is a Python script that should run without a GPU or any other special hardware requirements.

  1. Install the required packages, e.g. via: pip install -r requirements.txt

  2. Prepare a plain text file, say input.txt, with text you want to make a sonnet out of (sonnet-ize? sonnet-ify?). It can have multiple sentences on the same line, but a sentence should not be split across multiple lines.

    For example, I used pandoc --to=plain --wrap=none to generate a text file from my LaTeX papers. You could also start by grabbing some text files from Project Gutenberg.

  3. Run sonnet finder: python sonnet_finder.py input.txt -o output.tsv

    Using -o will save a list of all extracted candidate phrases, sorted by rhyme pattern, so you can generate new sonnets more quickly (see below) or browse and cherry-pick from the candidates to make your own sonnet out of these lines.

    Either way, the script will output a full example sonnet to STDOUT (provided enough rhyming pairs in iambic pentameter were found).

  4. If you've saved an output.tsv file before, you can quickly generate new sonnets via python sonnet_remix.py output.tsv. Since the stress and pronunciation prediction can be slow on larger files, this is much better than re-running sonnet_finder.py if you want more automatically generated suggestions.

Examples

This is a sonnet (with cherry-picked lines) made out of my PhD thesis:

the application of existing tools
describe a mapping to a modern form
applying similar replacement rules
the base ensembles slightly outperform

hungarian, icelandic, portuguese
perform a similar evaluation
contemporary lexemes or morphemes
a single dataset in isolation

historical and modern language stages
the weighted combination of encoder
the german dative ending -e in phrases
predictions fed into the next decoder

in this example from the innsbruck letter
machine translation still remains the better

These stanzas are compiled from a couple of automatically-generated suggestions based on the abstracts of all papers published in 2021 in the ACL Anthology:

effective algorithm that enables
improvements on a wide variety
and training with adjudicated labels
anxiety and test anxiety

obtain remarkable improvements on
decoder architecture, which equips
associated with the lexicon
surprising personal relationships

the impact of the anaphoric one
complexity prediction competition
developed for a laboratory run
existing parsers typically condition

examples, while in practice, most unseen
evaluate translation tasks between

Here's the same using Moby Dick:

among the marble senate of the dead
offensive matters consequent upon
a crawling reptile of the land, instead
fifteen, eighteen, and twenty hours on

the lakeman now patrolled the barricade
egyptian tablets, whose antiquity
the waters seemed a golden finger laid
maintains a permanent obliquity

the pequod with the little negro pippin
and with a frightful roll and vomit, he
increased, besides perhaps improving it in
transparent air into the summer sea

the traces of a simple honest heart
the fishery, and not the thousandth part

(The emjambment in the third stanza here is a lucky coincidence; the script currently doesn't do any kind of syntactic analysis or attempt coherence between lines.)

How it works

This script relies on the grapheme-to-phoneme library g2p_en by Park & Kim to convert the English input text to phoneme sequences (i.e., how the text would be pronounced). I chose this because it's a pip-installable Python library that fulfills two important criteria:

  1. it's not restricted to looking up pronunciations in a dictionary, but can handle arbitrary words through the use of a neural model (although, obviously, this will not always be accurate);

  2. it provides stress information for each vowel (i.e., whether any given vowel should be stressed or unstressed, which is important for determining the poetic meter).

The script then scans the g2p output for occurrences of iambic pentameter, i.e. a 0101010101(0) pattern, additionally checking if they coincide with word boundaries.

For finding snippets that rhyme, I rely mostly on Ghazvininejad et al. (2016), particularly §3 (relaxing the iambic pentameter a bit by allowing words that end in 100) and §5.2 (giving an operational definition of "slant rhyme" that I mostly try to follow).

QNA (Questions Nobody Asked)

  • Why does the script sometimes output lines that don't rhyme or don't fit the iambic meter? This script can only be as good as the grapheme-to-phoneme algorithm that's used. It frequently fails on words it doesn't know (for example, it tries to rhyme datasets with Portuguese?!) and also usually fails on abbreviations. Maybe there's a better g2p library that could be used, or the existing g2p_en could be modified to accept a custom dictionary, so you could manually define pronunciations for commonly used words.

  • Could this script also generate other types of poems? Sure. You could start by changing the regex iambic_pentameter to something else; maybe a sequence of dactyls? There are some further hardcoded assumptions in the code about iambic pentameter in the function get_stress_and_boundaries() that might have to be modified.

  • Could this script generate poems in languages other than English? This would require a suitable replacement for g2p_en that predicts pronunciations and stress patterns for the desired language, as well as re-writing the code that determines whether two phrases can rhyme; see the comments in the script for details. In particular, the code for English uses ARPABET notation for the pronunciation, which won't be suitable for other languages.

  • Can this script generate completely novel phrases in the style of an input text? This script does not "hallucinate" any text or generate anything that wasn't already there in the input; if you want to do that, take a look at Deep-speare maybe.

etc.

Written by Marcel Bollmann, inspired by a tweet, licensed under the MIT License.

I'm not the first one to write a script like this, but it was a fun exercise!

Owner
Marcel Bollmann
Computational linguist, postdoc, programming enthusiast.
Marcel Bollmann
Search msDS-AllowedToActOnBehalfOfOtherIdentity

前言 现在进行RBCD的攻击手段主要是搜索mS-DS-CreatorSID,如果机器的创建者是我们可控的话,那就可以修改对应机器的msDS-AllowedToActOnBehalfOfOtherIdentity,利用工具SharpAllowedToAct-Modify 那我们索性也试试搜索所有计算机

Jumbo 26 Dec 05, 2022
A full spaCy pipeline and models for scientific/biomedical documents.

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds

AI2 1.3k Jan 03, 2023
硕士期间自学的NLP子任务,供学习参考

NLP_Chinese_down_stream_task 自学的NLP子任务,供学习参考 任务1 :短文本分类 (1).数据集:THUCNews中文文本数据集(10分类) (2).模型:BERT+FC/LSTM,Pytorch实现 (3).使用方法: 预训练模型使用的是中文BERT-WWM, 下载地

12 May 31, 2022
SimCTG - A Contrastive Framework for Neural Text Generation

A Contrastive Framework for Neural Text Generation Authors: Yixuan Su, Tian Lan,

Yixuan Su 345 Jan 03, 2023
Harvis is designed to automate your C2 Infrastructure.

Harvis Harvis is designed to automate your C2 Infrastructure, currently using Mythic C2. 📌 What is it? Harvis is a python tool to help you create mul

Thiago Mayllart 99 Oct 06, 2022
Source code of the "Graph-Bert: Only Attention is Needed for Learning Graph Representations" paper

Graph-Bert Source code of "Graph-Bert: Only Attention is Needed for Learning Graph Representations". Please check the script.py as the entry point. We

14 Mar 25, 2022
Linking data between GBIF, Biodiverse, and Open Tree of Life

GBIF-biodiverse-OpenTree Linking data between GBIF, Biodiverse, and Open Tree of Life The python scripts will rely on opentree and Dendropy. To set up

2 Oct 03, 2022
Korean Simple Contrastive Learning of Sentence Embeddings using SKT KoBERT and kakaobrain KorNLU dataset

KoSimCSE Korean Simple Contrastive Learning of Sentence Embeddings implementation using pytorch SimCSE Installation git clone https://github.com/BM-K/

34 Nov 24, 2022
Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks

NERDA Not only is NERDA a mesmerizing muppet-like character. NERDA is also a python package, that offers a slick easy-to-use interface for fine-tuning

Ekstra Bladet 141 Dec 30, 2022
Code for the ACL 2021 paper "Structural Guidance for Transformer Language Models"

Structural Guidance for Transformer Language Models This repository accompanies the paper, Structural Guidance for Transformer Language Models, publis

International Business Machines 10 Dec 14, 2022
Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning This is the PyTorch companion code for the paper: A

Amazon 69 Jan 03, 2023
Perform sentiment analysis and keyword extraction on Craigslist listings

craiglist-helper synopsis Perform sentiment analysis and keyword extraction on Craigslist listings Background I love Craigslist. I've found most of my

Mark Musil 1 Nov 08, 2021
DeBERTa: Decoding-enhanced BERT with Disentangled Attention

DeBERTa: Decoding-enhanced BERT with Disentangled Attention This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Dis

Microsoft 1.2k Jan 03, 2023
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Espresso Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning libra

Yiming Wang 919 Jan 03, 2023
vits chinese, tts chinese, tts mandarin

vits chinese, tts chinese, tts mandarin 史上训练最简单,音质最好的语音合成系统

AmorTX 12 Dec 14, 2022
Abhijith Neil Abraham 2 Nov 05, 2021
FB ID CLONER WUTHOT CHECKPOINT, FACEBOOK ID CLONE FROM FILE

* MY SOCIAL MEDIA : Programming And Memes Want to contact Mr. Error ? CONTACT : [ema

Mr. Error 9 Jun 17, 2021
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python) 日本語は以下に続きます (Japanese follows) English: This book is written in Japanese and primaril

Ryuichi Yamamoto 189 Dec 29, 2022
Code to use Augmented Shapiro Wilks Stopping, as well as code for the paper "Statistically Signifigant Stopping of Neural Network Training"

This codebase is being actively maintained, please create and issue if you have issues using it Basics All data files are included under losses and ea

Justin Terry 32 Nov 09, 2021
Türkçe küfürlü içerikleri bulan bir yapay zeka kütüphanesi / An ML library for profanity detection in Turkish sentences

"Kötü söz sahibine aittir." -Anonim Nedir? sinkaf uygunsuz yorumların bulunmasını sağlayan bir python kütüphanesidir. Farkı nedir? Diğer algoritmalard

KaraGoz 4 Feb 18, 2022