Paranoid text spacing in Python

Overview

pangu.py

Paranoid text spacing for good readability: automatically inserts whitespace between CJK (Chinese, Japanese, Korean) characters and half-width characters (alphabetical letters, numerical digits and symbols).
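To make the core idea concrete, here is a minimal sketch built on Python's standard `re` module. It is a deliberate simplification under stated assumptions, not pangu.py's actual implementation. It only inserts a space where a CJK character directly touches an ASCII letter or digit. It uses a hand-picked set of CJK Unicode blocks, and it ignores the symbol, quote and bracket handling the real library performs. The names `add_spacing`, `CJK` and `ANS` are hypothetical.

```python
import re

# Hand-picked CJK blocks (an assumption, not pangu.py's exact character class):
# CJK radicals, Kangxi radicals, hiragana, katakana, bopomofo, enclosed CJK,
# CJK ideographs (ext. A + unified), Hangul syllables, compatibility ideographs.
CJK = (
    "\u2e80-\u2eff\u2f00-\u2fdf\u3040-\u309f\u30a0-\u30ff"
    "\u3100-\u312f\u3200-\u32ff\u3400-\u4dbf\u4e00-\u9fff"
    "\uac00-\ud7af\uf900-\ufaff"
)
# Half-width side, simplified to ASCII letters and digits (symbols are ignored here).
ANS = "A-Za-z0-9"

CJK_BEFORE_ANS = re.compile(f"([{CJK}])([{ANS}])")
ANS_BEFORE_CJK = re.compile(f"([{ANS}])([{CJK}])")


def add_spacing(text: str) -> str:
    """Hypothetical helper: insert one space wherever CJK and ASCII alphanumerics touch."""
    # First pass handles "CJK then letter/digit", second pass "letter/digit then CJK".
    text = CJK_BEFORE_ANS.sub(r"\1 \2", text)
    return ANS_BEFORE_CJK.sub(r"\1 \2", text)


# add_spacing("當你凝視著bug,bug也凝視著你") returns "當你凝視著 bug,bug 也凝視著你"
```

Because the inserted space separates the two character classes, running the function a second time changes nothing, which is the behaviour you would want from a formatter applied repeatedly to the same text.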

Installation

$ pip install -U pangu

Usage

In Python

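import pangu

# "When you gaze at the bug, the bug also gazes at you"
new_text = pangu.spacing_text('當你凝視著bug,bug也凝視著你')
# new_text = '當你凝視著 bug,bug 也凝視著你'

# spacing_file() returns the spaced content of the given file
new_content = pangu.spacing_file('path/to/file.txt')
# new_content = '與 PM 戰鬥的人,應當小心自己不要成為 PM'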
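If you want to save the spaced text rather than just inspect it, one option is a small wrapper that reads a file, passes its contents through a spacing function, and writes the result back. This is a sketch, not part of pangu's API. The helper name `rewrite_file`, the UTF-8 default, and the assumption that `pangu.spacing_text` is a suitable transform for whole-file contents are all illustrative.

```python
from pathlib import Path
from typing import Callable, Union


def rewrite_file(
    path: Union[str, Path],
    transform: Callable[[str], str],
    encoding: str = "utf-8",
) -> str:
    """Hypothetical helper: apply `transform` to a file's text and overwrite the file.

    Returns the new content so callers can still inspect it, much like
    spacing_file returns the spaced text in the example above.
    """
    target = Path(path)
    new_content = transform(target.read_text(encoding=encoding))
    target.write_text(new_content, encoding=encoding)
    return new_content


# Usage (assumes pangu is installed):
#     import pangu
#     rewrite_file("path/to/file.txt", pangu.spacing_text)
```

Taking the transform as a parameter keeps the file handling separate from the spacing logic, so the same wrapper works with any `str -> str` function.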

In CLI

$ pangu "請使用uname -m指令來檢查你的Linux作業系統是32位元或是64位元"
請使用 uname -m 指令來檢查你的 Linux 作業系統是 32 位元或是 64 位元

$ python -m pangu "為什麼小明有問題都不Google?因為他有Bing"
為什麼小明有問題都不 Google?因為他有 Bing

$ echo "未來的某一天,Gmail配備的AI可能會得出一個結論:想要消滅垃圾郵件最好的辦法就是消滅人類" >> path/to/file.txt
$ pangu -f path/to/file.txt >> pangu_file.txt
$ cat pangu_file.txt
未來的某一天,Gmail 配備的 AI 可能會得出一個結論:想要消滅垃圾郵件最好的辦法就是消滅人類

$ echo "心裡想的是Microservice,手裡做的是Distributed Monolith" | pangu
心裡想的是 Microservice,手裡做的是 Distributed Monolith

$ echo "你從什麼時候開始產生了我沒使用Monkey Patch的錯覺?" | python -m pangu
你從什麼時候開始產生了我沒使用 Monkey Patch 的錯覺?
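The CLI examples above take input in three ways: as a quoted argument, from a file with `-f`, or piped through stdin. The sketch below shows one way such a dispatcher could be written with `argparse`. It is not pangu's actual CLI code. `run`, its `read_stdin` parameter and the injected `transform` are hypothetical names, chosen so the input-selection logic can be tested without pangu installed.

```python
import argparse
from typing import Callable, List


def run(
    argv: List[str],
    read_stdin: Callable[[], str],
    transform: Callable[[str], str],
) -> str:
    """Hypothetical CLI core: pick the input source, then apply `transform`."""
    parser = argparse.ArgumentParser(prog="pangu-sketch")
    parser.add_argument("text", nargs="?", help="text to space")
    parser.add_argument("-f", "--file", help="read the text from this file instead")
    args = parser.parse_args(argv)

    if args.file:
        # File mode, like `pangu -f path/to/file.txt`
        with open(args.file, encoding="utf-8") as fh:
            source = fh.read()
    elif args.text is not None:
        # Argument mode, like `pangu "..."`
        source = args.text
    else:
        # Pipe mode, like `echo "..." | pangu`
        source = read_stdin()
    return transform(source)
```

A real entry point could then write `run(sys.argv[1:], sys.stdin.read, pangu.spacing_text)` to standard output, assuming pangu is installed. Passing `read_stdin` as a callable means stdin is only read when no text or file was given.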
Owner: Vinta Chen
Format COVID values into an ASCII table (only for Germany and Austria)

Covid-19-Formatter (Only for Germany and Austria) This script saves the data reported by the RKI / BMSGPK and formats it into an ASCII table

56 Jan 22, 2022
Add your new words to a text file and get them randomly.

Memorize-New-Words In this very small project, I've written code to memorize new English words. You can add the words and their m

Mostafa 2 Jul 04, 2022
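Generate a word cloud from notes exported from flomo

flomo-word-cloud Generate a word cloud from notes exported from flomo. How to use? Clone this project to your computer and install the required Python libraries with the following command: pip install -r requirements.txt Create a new folder named file in the project and put all the HTML files exported from flomo into it. Run main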

Hannnk 9 Dec 30, 2022
pydantic-i18n is an extension that adds i18n support to pydantic error messages.

Boardpack 48 Dec 21, 2022
Free & simple way to encipher text

VenSipher VenSipher is a free medium through which text can be enciphered. It can convert any text into an unrecognizable secret text that can only be

3 Jan 28, 2022
Word and phrase lists in CSV

Word Lists Word and phrase lists in CSV, collected from different sources. Oxford Word Lists: oxford-5k.csv - Oxford 3000 and 5000 oxford-opal.csv - O

Anton Zhiyanov 14 Oct 14, 2022
Map Reduce Wordcount in Python using gRPC

This project is implemented in Python using gRPC. The input files are given in .txt format, and a word count is performed on them.

Divija 4 Dec 05, 2022
box is a text-based visual programming language inspired by Unreal Engine Blueprint function graphs.

Box is a text-based visual programming language inspired by Unreal Engine blueprint function graphs. $ cat factorial.box ┌─ƒ(Factorial)───┐

Pranav 104 Dec 24, 2022
Fuzz a language by mixing up only a few words.

afasi Fuzz a language by mixing up only a few words. Status Beta. Note: The default branch is default. Use Examples Version General Help Translate Help

Stefan Hagen 2 Dec 14, 2022
strbind - a lapidary text converter for translating a text file into a C-style string

strbind strbind - a lapidary text converter for translating a text file into a C-style string. My motivation is to quickly add large text chunks to the C co

Mihail Zaytsev 1 Oct 22, 2021
A production-ready pipeline for text mining and subject indexing

UF Open Source Club 12 Nov 06, 2022
utoken is a multilingual tokenizer that divides text into words, punctuation and special tokens such as numbers, URLs, XML tags, email-addresses and hashtags.

utoken utoken is a multilingual tokenizer that divides text into words, punctuation and special tokens such as numbers, URLs, XML tags, email-addresse

Ulf Hermjakob 11 Jan 05, 2023
A simple text editor for Linux

wolf-editor A simple text editor for Linux. Installing using the Deb package: download the newest package from releases, cd into the folder where the downloaded acka

Focal Fossa 5 Nov 30, 2021
Open-source linguistic ethnography tool for framing public opinion in mediatized groups.

Qualichat 7 Jun 02, 2022
Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

SeatGeek 8.8k Jan 08, 2023
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- Python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Life4 3k Jan 02, 2023
Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

SeatGeek 1.2k Jan 01, 2023
An anthology of a variety of tools for the Persian language in Python

Persian Tools 106 Nov 08, 2022
Convert text to Morse code and play Morse code sound.

Convert (English) text to Morse code and play Morse code sound!

Mohammad Dori 5 Jul 15, 2022
A Python package to facilitate research on building and evaluating automated scoring models.

Rater Scoring Modeling Tool Introduction Automated scoring of written and spoken test responses is a growing field in educational natural language pro

ETS 59 Oct 10, 2022