PyTorch implementation of Tacotron speech synthesis model.

Last update: Dec 09, 2022

Overview

tacotron_pytorch

Inspired by keithito/tacotron. The speech quality is currently not as good as what keithito/tacotron can generate, but the model appears to be basically working. You can find some generated speech examples from a model trained on the LJ Speech Dataset here.

If you are comfortable working with TensorFlow, I'd recommend trying https://github.com/keithito/tacotron instead. The reason for rewriting it in PyTorch is that it's easier to debug and extend (multi-speaker architecture, etc.), at least for me.

Requirements

PyTorch
TensorFlow (required to run the training script; this could be made optional, but it is needed for now)

Installation

git clone --recursive https://github.com/r9y9/tacotron_pytorch
cd tacotron_pytorch
pip install -e . # or python setup.py develop

If you want to run the training script, then you need to install additional dependencies.

pip install -e ".[train]"

Training

The package relies on keithito/tacotron for text processing, audio preprocessing and audio reconstruction (added as a submodule). Please follow the quick start section at https://github.com/keithito/tacotron and prepare your dataset accordingly.

Once your data is prepared, and assuming it is in "~/tacotron/training" (the default), you can train your model with:

python train.py
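
Before starting a long run, it can help to confirm that the prepared data is where the training script expects it. The following is a minimal sketch and not part of this package: the helper name check_training_dir is hypothetical, and the assumption that preprocessing leaves a train.txt metadata file in the data directory should be checked against your own preprocessing output.

```python
import os


def check_training_dir(data_root="~/tacotron/training", metadata_name="train.txt"):
    """Return a list of problems found with the training data directory.

    `metadata_name` is an assumption about the preprocessing output;
    adjust it if your dataset is laid out differently.
    """
    problems = []
    root = os.path.expanduser(data_root)
    if not os.path.isdir(root):
        problems.append("missing directory: " + root)
        return problems
    metadata = os.path.join(root, metadata_name)
    if not os.path.isfile(metadata):
        problems.append("missing metadata file: " + metadata)
    elif os.path.getsize(metadata) == 0:
        problems.append("empty metadata file: " + metadata)
    return problems


if __name__ == "__main__":
    issues = check_training_dir()
    print("OK" if not issues else "\n".join(issues))
```

An empty list means the directory and metadata file were found; anything else is a hint to revisit the preprocessing step.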

Alignments, predicted spectrograms, target spectrograms, predicted waveforms and checkpoints (model and optimizer states) are saved every 1000 global steps in the checkpoints directory. Training progress can be monitored with:

tensorboard --logdir=log
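
The save schedule described above can be sketched as follows. This is an illustration only, not the code in train.py: the function names, the file name pattern and the choice to skip step 0 are all assumptions.

```python
import os


def is_checkpoint_step(global_step, interval=1000):
    """True when artifacts should be written at this global step (step 0 is skipped by assumption)."""
    return global_step > 0 and global_step % interval == 0


def checkpoint_path(global_step, checkpoint_dir="checkpoints"):
    """Build a file name for the checkpoint written at `global_step` (hypothetical pattern)."""
    return os.path.join(checkpoint_dir, "checkpoint_step{}.pth".format(global_step))


# Steps in a 3500-step run that would trigger a save.
save_steps = [step for step in range(3501) if is_checkpoint_step(step)]
print(save_steps)  # [1000, 2000, 3000]
```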

Testing model

Open the notebook in the notebooks directory and change checkpoint_path to point to your model.
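
To choose a value for checkpoint_path, one option is to pick the checkpoint with the highest step number. The sketch below assumes files are named like checkpoint_step<N>.pth, which is a hypothetical pattern; look at the contents of your checkpoints directory and adapt the regular expression. find_latest_checkpoint is not a function of this package.

```python
import os
import re

# Hypothetical file name pattern; adjust to match your checkpoints directory.
STEP_PATTERN = re.compile(r"checkpoint_step(\d+)\.pth")


def find_latest_checkpoint(checkpoint_dir="checkpoints"):
    """Return the path of the checkpoint with the largest step number, or None."""
    if not os.path.isdir(checkpoint_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(checkpoint_dir):
        match = STEP_PATTERN.fullmatch(name)
        if match is None:
            continue  # ignore files that are not checkpoints
        step = int(match.group(1))  # compare numerically, not as strings
        if step > best_step:
            best_step, best_path = step, os.path.join(checkpoint_dir, name)
    return best_path


if __name__ == "__main__":
    print(find_latest_checkpoint())
```

Comparing the step as an integer matters here: sorting the file names as strings would place step 2000 after step 12000.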

Owner

Ryuichi Yamamoto
