Taking the fight to the establishment.

Overview

Throwdown

Taking the fight to the establishment.

Wat?

I wanted a simple markdown interpreter in python and/or javascript to output html for my website. Python does not have a bug-free official distribution, javascript only has things you install through npm and I don't want to have anything to do with node and the 100 MB of dependencies you end up uploading to your FTP server in order to do the most basic tasks.

So writing my own parser it is then eh? I tried to trudge through the commonmark markdown spec and had a heart attack at the complexity. 24722 words and 181 pages of complicated language explaining features I absolutely don't need.

I just want minimal, well defined, syntactical elements with maximum payoff, so here is throwdown. Taking the fight to the establishment to have a stupidly minimal markup language in both definition and capability.

Goals

Keep it a subset for markdown so we can use existing IDEs & plugins. Support HTML tags in line with text. Write a well defined language spec in the manual to ease creating new interpreters for it.

The spec

Tokenization

Given a piece of text we tokenize the following concepts (these are regular expressions using DOTALL and MULTILINE modifiers):

blank_line(s): (\r | \n | \r\n){2,}
html tag: <.*?>
code: ```.*?```
unescaped italic: ^_|(?

any characters inbetween matching tokens are flagged content.

Content itself gets an additional treatment where we replace this regex

\\(?)

for escaped characters, with whatever was in the matched group. I currently do this in the generation stage but it could move to any stage.

I am not sure if in the UTF8 2/3/4 byte characters any of these elements may match, so make sure to perform these single-characetr checks per unicode char, not per byte.

Parsing

We then have a parsing pass that tries to group matching tokens:

In this example:

This *word* is bold but this* is wrong.

We have the following tokens:

content, bold, content, bold, content, bold, content

The parser simply finds any content block surrounded by matching code|italic|bold neighbours, and then 'consumes' these neighbours so they can not be picked up more than once. Reading from left to right this means we get (note we search outwards from content recursively to support *_content_* notations, instead of holding on to the boundary tokens as soon as we encounter them):

content group content bold content

Then, any token outside of a group gets merged into it's content, any consecutive content gets merged into 1 content. The first step reduced the bold into it's left neighbour:

content group content content

The next step reduces the two content blocks into one:

content group content

The above step should include html tags.

A final step is to remove the blank line tokens, but first we must make sure to merge consecutive group and content blocks, because after this any consecutive content and/or group tokens are known unique paragraphs (or headers) so the blank lines are no longer necessary to imply this separation.

Generation

Then there is the generation step. We simply walk the resulting tokens and output a html document.

  • If a content group is preceded by a heading, the node gets wrapped into tags where n is the number of #.
  • Every other content node gets wrapped into

    tags.

  • Every group gets wrapped based on the first and last tokens (which are identical).
    • italic becomes In this case the wrapping is recursive, a bold group in an intalic group may exist.
    • bold becomes In this case the wrapping is recursive, an italic group in a bold group may exist.
    • code becomes

Write
to insert single line breaks manually.

TODO:

Consider bullet points and numbered lists, though the html is not super invasive.

Owner
Trevor van Hoof
I write tools & shaders TropicalTrevor in the Demoscene
Trevor van Hoof
Jannik Ramrath 1 Feb 05, 2022
List of short Codeforces problems with a statement of 1000 characters or less. Python script and data files included.

Shortest problems on Codeforces List of Codeforces problems with a short problem statement of 1000 characters or less. Sorted for each rating level. B

32 Dec 24, 2022
Un script en python qui permet d'automatique bumpée (disboard.org) tout les 2h

auto-bumper Un script en python qui permet d'automatique bumpée (disboard.org) tout les 2h Pour la première utilisation, 1.Lancer Install.bat 2.(faire

!! 1 Jan 09, 2022
A simple weather app.

keather A simple weather app. This is currently not finished. Dependencies: yay -S python-beautifulsoup4 tk

1 Jan 09, 2022
Build a grocery store management application.

python_projects_grocery_webapp In this python project, we will build a grocery store management application. It will be 3 tier application, Front end:

codebasics 54 Dec 29, 2022
Ontario-Covid19-Screening - An automated Covid-19 School Screening Tool for Ontario

Ontario-Covid19-Screening An automated Covid-19 School Screening Tool for Ontari

Rayan K 0 Feb 20, 2022
Get a list of content on your Netflix My List that is expiring in the next month or two.

Netflix My List Expiring Movies Annoyed at Netflix for taking away your movies? Now you don't have to be! Installation instructions Install selenium C

24 Aug 06, 2022
Why write code when you can import it directly from GitHub Copilot?

Copilot Importer Why write code when you can import it directly from GitHub Copilot? What is Copilot Importer? The copilot python module will dynamica

Mythic 41 Jan 04, 2023
Chemical Analysis Calculator, with full solution display.

Chemicology Chemical Analysis Calculator, to solve problems efficiently by displaying whole solution. Go to releases for downloading .exe, .dmg, Linux

Muhammad Moazzam 2 Aug 06, 2022
WATTS provides a set of Python classes that can manage simulation workflows for multiple codes where information is exchanged at a coarse level

WATTS (Workflow and Template Toolkit for Simulation) provides a set of Python classes that can manage simulation workflows for multiple codes where information is exchanged at a coarse level.

13 Dec 23, 2022
北大选课网2021年春季验证码识别

北大选课网验证码识别 2021 年春季学期 Powered by Elector Quartet (@Rabbit, @xmcp, @SpiritedAwayCN, @gzz) 数据集描述 最初的数据集为 5130 张人工标记的验证码,之后利用早期训练好的模型在选课网上进行自动验证 (自举),又收集

Rabbit 27 Sep 17, 2022
4Geeks Academy Full-Stack Developer program final project.

Final Project Chavi, Clara y Pablo 4Geeks Academy Full-Stack Developer program final project. Authors Javier Manteca - Coding - chavisam Clara Rojano

1 Feb 05, 2022
Button paginator using discord_components

Button Paginator With discord-components Button paginator using discord_components Welcome! It's a paginator for discord-componets! Thanks to the orig

Decave 7 Feb 12, 2022
A new mini-batch framework for optimal transport in deep generative models, deep domain adaptation, approximate Bayesian computation, color transfer, and gradient flow.

BoMb-OT Python3 implementation of the papers On Transportation of Mini-batches: A Hierarchical Approach and Improving Mini-batch Optimal Transport via

Khai Ba Nguyen 18 Nov 14, 2022
Calibre Libgen Non-fiction / Sci-tech store plugin

CalibreLibgenSci A Libgen Non-Fiction/Sci-tech store plugin for Calibre Installation Download the latest zip file release from here Open Calibre Navig

IDDQD 9 Dec 27, 2022
Rufus port to linux, writed on Python3

Rufus-for-Linux Rufus port to linux, writed on Python3 Программа будет иметь тот же интерфейс что и оригинал, и тот же функционал. Программа создается

10 May 12, 2022
Programmatic startup/shutdown of ASGI apps.

asgi-lifespan Programmatically send startup/shutdown lifespan events into ASGI applications. When used in combination with an ASGI-capable HTTP client

Florimond Manca 129 Dec 27, 2022
Margin Calculator - Personally tailored investment tool

Margin Calculator - Personally tailored investment tool

1 Jul 19, 2022
A Python package that provides astronomical constants.

AstroConst A Python package that provides astronomical constants. The code is being developed by Marc van der Sluys of the department of Astrophysics

Marc van der Sluys 1 Jan 10, 2022
Python with braces. Because Python is awesome, but whitespace is awful.

Bython Python with braces. Because Python is awesome, but whitespace is awful. Bython is a Python preprosessor which translates curly brackets into in

1 Nov 04, 2021