Charsiu: A transformer-based phonetic aligner

Last update: Dec 09, 2022

Note. This is a preview version. The aligner is under active development. New functions, new languages and detailed documentation will be added soon!

Intro

Charsiu is a phonetic alignment tool, which can:

recognise phonemes in a given audio file;
perform forced alignment using phone transcriptions created in the previous step or provided by the user;
directly predict the phone-to-audio alignment from audio (text-independent alignment).
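
To make the idea of phone-to-audio alignment concrete, here is a minimal sketch of how per-frame phone labels can be collapsed into timed intervals. It is an illustration only, not Charsiu's own code: the function name `frames_to_segments`, the 10 ms frame duration, the `[SIL]` silence label, and the example labels are all assumptions made for this sketch.

```python
from itertools import groupby
from typing import List, Tuple

# Assumed frame duration (10 ms). This value is hypothetical and is not
# stated anywhere in this README.
FRAME_SEC = 0.01


def frames_to_segments(
    frame_labels: List[str], frame_sec: float = FRAME_SEC
) -> List[Tuple[float, float, str]]:
    """Merge runs of identical frame labels into (start, end, phone) intervals."""
    segments = []
    index = 0  # index of the first frame in the current run
    for phone, run in groupby(frame_labels):
        length = len(list(run))
        start = round(index * frame_sec, 6)
        end = round((index + length) * frame_sec, 6)
        segments.append((start, end, phone))
        index += length
    return segments


# Hypothetical per-frame predictions for a short clip.
frames = ["[SIL]", "[SIL]", "HH", "HH", "HH", "AY", "AY", "AY", "AY", "[SIL]"]
segments = frames_to_segments(frames)
for start, end, phone in segments:
    print(f"{start:.2f}\t{end:.2f}\t{phone}")
```

Each output row is a start time, an end time, and a phone label, which is the kind of interval data an alignment tier holds. How the frame labels are obtained (with or without a transcript) is what distinguishes forced alignment from text-independent alignment.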

Fun fact: Char Siu is one of the most representative dishes of Cantonese cuisine 🍲 (see wiki).

Tutorial (In progress)

You can directly run our model in the cloud via Google Colab!

Forced alignment:
Textless alignment:

Development plan

Package

Items	Progress
Documentation	Nov 2021
Textgrid support	Nov 2021
Model compression	TBD

Multilingual support

Language	Progress
English (American)	√
Mandarin Chinese	Nov 2021
Spanish	Dec 2021
English (British)	TBD
Cantonese	TBD
AAVE	TBD

Pretrained models

Our pretrained models are available at the HuggingFace model hub: https://huggingface.co/charsiu.

Dependencies

pytorch
transformers
datasets
librosa
g2pe
praatio
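
A quick way to see which of these packages are present in the current environment is sketched below. The import names are assumptions: `torch` is the usual import name for pytorch, the others are assumed to match their package names, and g2pe is left out because its import name is not given here. The helper `check_modules` is hypothetical and not part of Charsiu.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Assumed import names for the dependencies listed above (g2pe is omitted
# because this README does not state its import name).
ASSUMED_IMPORT_NAMES = ["torch", "transformers", "datasets", "librosa", "praatio"]


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


status = check_modules(ASSUMED_IMPORT_NAMES)
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

The check only locates the modules and does not import them, so it stays fast even when heavy packages are installed.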

Training

Coming soon!

Finetuning

Coming soon!

Attribution and Citation

For now, you can cite this tool as:

@article{zhu2019charsiu,
  title={Phone-to-audio alignment without text: A Semi-supervised Approach},
  author={Zhu, Jian and Zhang, Cong and Jurgens, David},
  journal={arXiv preprint arXiv:????????????????????},
  year={2021}
 }

To share a direct web link: https://github.com/lingjzhu/charsiu/.

References

Transformers
s3prl
Montreal Forced Aligner

Disclaimer

This tool is a beta version and is still under active development. It may have bugs and quirks, alongside the difficulties and provisos which are described throughout the documentation. This tool is distributed under the MIT license. Please see the license for details.

By using this tool, you acknowledge:

That you understand that this tool does not produce perfect camera-ready data, and that all results should be hand-checked for sanity's sake, or at the very least, noise should be taken into account.
That you understand that this tool is a work in progress which may contain bugs. Future versions will be released, and bug fixes (and additions) will not necessarily be advertised.
That this tool may break with future updates of the various dependencies, and that the authors are not required to repair the package when that happens.
That you understand that the authors are not required or necessarily available to fix bugs which are encountered (although you're welcome to submit bug reports to Jian Zhu ([email protected]), if needed), nor to modify the tool to your needs.
That you will acknowledge the authors of the tool if you use, modify, fork, or re-use the code in your future work.
That rather than re-distributing this tool to other researchers, you will instead advise them to download the latest version from the website.

... and, most importantly:

That neither the authors, our collaborators, nor the University of Michigan or any related universities on the whole, are responsible for the results obtained from the proper or improper usage of the tool, and that the tool is provided as-is, as a service to our fellow linguists.

All that said, thanks for using our tool, and we hope it works wonderfully for you!

Support or Contact

Please contact Jian Zhu ([email protected]) for technical support.
Contact Cong Zhang ([email protected]) if you would like to receive more instructions on how to use the package.
