Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

Last update: Dec 29, 2022

Overview

Python-Zhuyin (pyzhuyin) 注音和拼音轉換

Introduction 介紹

pyzhuyin is an open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo).

pyzhuyin 是一個開放原始碼的 Python 套件，提供了將拼音轉換成注音的統一介面。

Installation 安裝

pip install pyzhuyin

Usage 使用

from pyzhuyin import pinyin_to_zhuyin, zhuyin_to_pinyin

# Pinyin (numeric tone) to zhuyin
assert pinyin_to_zhuyin("lu3") == "ㄌㄨˇ"
assert pinyin_to_zhuyin("dan4") == "ㄉㄢˋ"
# map() returns an iterator in Python 3, so wrap it in list() before comparing
assert list(map(pinyin_to_zhuyin, ["lu3", "dan4"])) == ["ㄌㄨˇ", "ㄉㄢˋ"]

# Zhuyin to pinyin; u_to_v=True writes ü as v
assert zhuyin_to_pinyin("ㄌㄩˊ") == "lü2"
assert zhuyin_to_pinyin("˙ㄗ") == "zi5"
assert list(map(lambda z: zhuyin_to_pinyin(z, u_to_v=True), ["ㄌㄩˊ", "˙ㄗ"])) == ["lv2", "zi5"]
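Because the functions work on one syllable at a time, converting a phrase means splitting it and converting each piece. The sketch below shows one possible way to do that. The helper name convert_phrase, its fallback parameter, and the two-entry stand-in converter are assumptions made for illustration and are not part of pyzhuyin. In real use, pinyin_to_zhuyin would be passed as the converter. Per the Notes section, it raises ValueError for unknown syllables.

```python
from typing import Callable, List, Optional


def convert_phrase(
    phrase: str,
    convert: Callable[[str], str],
    fallback: Optional[str] = None,
) -> List[str]:
    """Convert a whitespace-separated phrase one syllable at a time.

    Hypothetical helper: `convert` is any per-syllable converter that
    raises ValueError on unknown input. If `fallback` is given, it
    replaces syllables that fail; otherwise the error propagates.
    """
    results = []
    for syllable in phrase.split():
        try:
            results.append(convert(syllable))
        except ValueError:
            if fallback is None:
                raise
            results.append(fallback)
    return results


# Stand-in converter so the sketch runs without pyzhuyin installed.
# It only knows the two syllables used in the usage examples.
_DEMO_TABLE = {"lu3": "ㄌㄨˇ", "dan4": "ㄉㄢˋ"}


def demo_convert(syllable: str) -> str:
    try:
        return _DEMO_TABLE[syllable]
    except KeyError:
        raise ValueError(f"no zhuyin found for {syllable!r}") from None


print(convert_phrase("lu3 dan4", demo_convert))
```

Passing the converter as an argument keeps the helper independent of the conversion direction, so the same function could wrap zhuyin_to_pinyin as well.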

Testing 測試

Run the following command at the root of the project to test the library:

在根目錄執行以下指令以測試套件:

python3 -m unittest

Notes 備註

Only numeric tones are supported for pinyin
- e.g. "lu3" instead of "lǔ"
The neutral tone is represented as 5
- e.g. "˙ㄗ" -> "zi5"
For pinyin_to_zhuyin:
- raises ValueError if no corresponding zhuyin is found
- every v is converted to ü internally
For zhuyin_to_pinyin:
- raises ValueError if no corresponding pinyin is found
Erhua (兒化音) is not supported because it cannot be represented in the zhuyin system as a single "combo" syllable
- e.g. "公園兒" -> "gong1 yuanr2" -> "ㄍㄨㄥㄩㄢㄦˊ" (not allowed)
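Since only numeric tones are accepted, input written with tone marks has to be converted first. The following is a minimal sketch of such a preprocessing step using only the standard library. The function name marked_to_numeric is hypothetical and not part of pyzhuyin. It assumes one syllable per call and treats a syllable without a tone mark as neutral tone 5, following the convention in the notes above.

```python
import unicodedata

# Combining diacritics that mark the four pinyin tones.
_TONE_MARKS = {
    "\u0304": "1",  # macron, first tone
    "\u0301": "2",  # acute, second tone
    "\u030c": "3",  # caron, third tone
    "\u0300": "4",  # grave, fourth tone
}


def marked_to_numeric(syllable: str) -> str:
    """Turn one tone-marked pinyin syllable into numeric-tone form.

    Hypothetical helper, not part of pyzhuyin.
    """
    tone = "5"  # no tone mark is treated as the neutral tone
    letters = []
    # NFD splits each accented letter into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in _TONE_MARKS:
            tone = _TONE_MARKS[ch]
        else:
            letters.append(ch)
    # NFC recombines u + diaeresis into a single ü character.
    return unicodedata.normalize("NFC", "".join(letters)) + tone


print(marked_to_numeric("lǔ"))
```

The result can then be handed to pinyin_to_zhuyin. The sketch does not validate that its input is a legal pinyin syllable.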

Data Sources 資料來源

Ministry of Education, R.O.C. (中華民國教育部). Revised Mandarin Chinese Dictionary (《重編國語辭典修訂本》), version 2015_20210928.

URL: https://dict.revised.moe.edu.tw/

Licensed under CC BY-ND 3.0 TW

Author 作者

Raymond Ku
