A Python library for generating new text from existing samples.

Last update: May 17, 2022

Overview

ReMarkov is a Python library for generating text from existing samples using Markov chains. You can use it to generate all sorts of writing: birthday messages, horoscopes, Wikipedia articles, or the utterances of your game's NPCs. Everything works without an omnipotent "AI": it is dead-simple code and therefore fast.

Check out the examples and feel free to contribute!
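
To illustrate the general idea behind Markov-chain text generation, here is a minimal, self-contained sketch. It is not ReMarkov's implementation: the names `build_chain` and `generate` are hypothetical and chosen only for this illustration, and the sketch assumes a simple word-level chain in which each state is the last `order` words.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=10, seed=None):
    """Walk the chain from a random state until `length` words or a dead end."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: state only occurred at the end of the text
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


chain = build_chain("This is a sample text and this is another.")
print(chain[("is",)])  # ['a', 'another.']
print(generate(chain, length=12, seed=0))
```

Because "is" is followed by both "a" and "another." in the sample, the walk can either loop back through "a sample text and this is" or stop at "another.", which is why such a small input tends to produce repetitive output.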

Installation

pip3 install remarkov

Example

Scrape the Wikipedia page for "Computer Programming" and generate a new text from it:

./tools/scrape-wiki.py Computer_programming | remarkov build | remarkov generate

You can also use remarkov programmatically:

from remarkov import create_model

model = create_model()
model.add_text("This is a sample text and this is another.")

print(model.generate().text())
# Example output (generation is random, so it varies between runs):
# "This is a sample text and this is a sample text and this is a sample text ..."

Development

Make sure you run pytest as a module. This adds the current directory to the import path:

python3 -m pytest

This project uses black for source code formatting:

black .

Generate documentation for the project (this uses the original pdoc at pdoc.dev):

git checkout gh-pages
pdoc -t pdoc/template -o public/docs <path_to_remarkov_module>

Run type checks using mypy:

mypy -p remarkov

Publishing is done like this (don't forget to bump the version in setup.py):

pip3 install twine # optional

git tag -a <version>
git push --tags

python3 setup.py clean --all
python3 setup.py sdist bdist_wheel
twine check "dist/*"
twine upload "dist/*"

Comments

Release schedule

[x] Add source code documentation
[x] Improve explanation on website
[x] Adapt syntax highlighting in docs
[x] Generate samples for showcase
    [x] Articles
    [x] Birthday
    [x] Horoscope
    [x] Utterance
[x] Enable gh-pages

Opened by lausek

Releases(v0.2.3)

v0.2.3(Jan 15, 2022)
ReMarkov Example Datasets - EN

Based on:

https://github.com/kavgan/OpinRank (Cars, Hotels)

https://github.com/dsnam/markovscope (Horoscopes)

https://github.com/hmi-utwente/video-game-text-corpora (NPC)

ReMarkov Wikipedia Scraper (Blockchain)

Source code(tar.gz)
Source code(zip)
remarkov-dataset.7z(6.16 MB)
remarkov-dataset.zip(9.05 MB)
