Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Last update: Oct 14, 2022

Overview

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

This is an implementation of the paper, along with the pipeline and pretrained model using an open dataset. Audio samples of the paper is available here.

Recipe

This open pipeline uses the Databaker dataset. Please refer to our previous pipeline for dataset preprocessing, while only the Databaker dataset is used. Besides, you need to run lexicon/build_databaker.py to build the vocabulary, download the lexicon from zdic.net, and encode them with XLM-R. Feel free to change the target directory to save the data, which is specified in build_databaker.py and lexicon_utils.py.

Below are the commands to train and evaluate. Default target directories specified in the preprocessing scripts are used, so please substitute them with your own. The evaluation script can be run simultaneously with the training script. You may also use the evaluation script to synthesize samples from pretrained models. Please refer to the help of the arguments for their meanings.

python -m torch.distributed.launch --nproc_per_node=NGPU --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=D:\free_corpus\packed\ --training_languages=zh-cn --eval_languages=zh-cn --training_speakers=databaker --eval_steps=100000:150000 --hparams="input_method=char,multi_speaker=True,use_knowledge_attention=True,remove_space=True,data_format=nlti" --external_embed=D:\free_corpus\packed\embed.zip --vocab=D:\free_corpus\packed\db_vocab.json

python eval.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=D:\free_corpus\packed\ --eval_languages=zh-cn --eval_meta=D:\free_corpus\packed\metadata.eval.txt --hparams="input_method=char,multi_speaker=True,use_knowledge_attention=True,remove_space=True,data_format=nlti" --start_step=100000 --vocab=D:\free_corpus\packed\db_vocab.json --external_embed=D:\free_corpus\packed\embed.zip --eval_speakers=databaker

Besides, to report CER, you need to create azure_key.json with your own Azure STT subscription, with content of {"subscription": "YOUR_KEY", "region": "YOUR_REGION"}, see utils/transcribe.py. Due to significant differences of the datasets used, the implementation is for demonstration only and could not fully reproduce the results in the paper.

Pretrained Model

The pretrained models on Databaker are available at OneDrive Link, which reaches a CER of 4.19%. Relevant files necessary for generation of speeches including lexicon texts, lexicon embeddings, the vocabulary file, and evaluation scripts are also included to aid fast reproduction.

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Related tags

Overview

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Recipe

Pretrained Model

Owner

Mutian He

Language-Agnostic SEntence Representations

A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any other format

A notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository

Labelling platform for text using distant supervision

Minimal GUI for accessing the Watson Text to Speech service.

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

Semantic search for quotes.

Th2En & Th2Zh: The large-scale datasets for Thai text cross-lingual summarization

A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.

A python framework to transform natural language questions to queries in a database query language.

Turkish Stop Words Türkçe Dolgu Sözcükleri

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Text Classification Using LSTM

Easy Language Model Pretraining leveraging Huggingface's Transformers and Datasets

This code extends the neural style transfer image processing technique to video by generating smooth transitions between several reference style images

Code for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned Language Models in the wild .

Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!

Chinese Named Entity Recognization (BiLSTM with PyTorch)

This project consists of data analysis and data visualization (done using python)of all IPL seasons from 2008 to 2019 and answering the most asked questions about the IPL.

Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!