PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Last update: Nov 14, 2022

Overview

VAENAR-TTS - PyTorch Implementation

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Quickstart

Dependencies

You can install the Python dependencies with

pip3 install -r requirements.txt

Inference

You have to download the pretrained models and put them in output/ckpt/LJSpeech/.

For English single-speaker TTS, run

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

The generated utterances will be put in output/result/.

Batch Inference

Batch inference is also supported, try

python3 synthesize.py --source preprocessed_data/LJSpeech/val.txt --restore_step RESTORE_STEP --mode batch -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

to synthesize all utterances in preprocessed_data/LJSpeech/val.txt

Training

Datasets

The supported datasets are

LJSpeech: a single-speaker English dataset consists of 13100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.

Preprocessing

First, run

python3 prepare_align.py config/LJSpeech/preprocess.yaml

for some preparations. And then run the preprocessing script.

python3 preprocess.py config/LJSpeech/preprocess.yaml

Training

Train your model with

python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

TensorBoard

Use

tensorboard --logdir output/log/LJSpeech

to serve TensorBoard on your localhost. The loss curves, synthesized mel-spectrograms, and audios are shown.

Implementation Issues

Removed arguments, methods during converting Tensorflow to PyTorch: name, kwargs, training, get_config()
Specify in_features in LinearNorm which is corresponding to tf.keras.layers.Dense. Also, in_channels is explicitly specified in Conv1D.
get_mask_from_lengths() function returns logical not of that of FastSpeech2.
In this implementation, the griffin_lim algorithms is used to convert a mel-spectrogram to a waveform. You can use HiFi-GAN as a vocoder by setting config, but you need to train it from scratch (you cannot use the provided pre-trained HiFi-GAN model).

Citation

@misc{lee2021vaenar-tts,
  author = {Lee, Keon},
  title = {VAENAR-TTS},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/VAENAR-TTS}}
}

References

thuhcsi's VAENAR-TTS

Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

Graph-to-Graph Transformers Self-attention models, such as Transformer, have been hugely successful in a wide range of natural language processing (NL

40 Aug 14, 2022

PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

Non-Autoregressive Transformer Code release for Non-Autoregressive Neural Machine Translation by Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K.

261 Nov 12, 2022

Pytorch Implementation of DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (TTS Extension)

DiffSinger - PyTorch Implementation PyTorch implementation of DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (TTS Extension). Status

152 Jan 2, 2023

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

1.8k Jan 8, 2023

Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Non-AR Spatial-Temporal Transformer Introduction Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series For

66 Nov 28, 2022

Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

GLAT Implementation for the ACL2021 paper "Glancing Transformer for Non-Autoregressive Neural Machine Translation" Requirements Python = 3.7 Pytorch

117 Jan 9, 2023

Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis

Chunked Autoregressive GAN (CARGAN) Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis [paper] [compan

150 Dec 6, 2022

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

WaveGrad2 - PyTorch Implementation PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis. Status (202

59 Dec 6, 2022

SlotRefine: A Fast Non-Autoregressive Model forJoint Intent Detection and Slot Filling

SlotRefine: A Fast Non-Autoregressive Model for Joint Intent Detection and Slot Filling Reference Main paper to be cited (Di Wu et al., 2020) @article

34 Nov 3, 2022

Comments

inference results

Hi! Thank you for the port. Have you been able to get results on inference stage? I successfully train model, validation losses are decreasing, but at inference there's garbage. I started logging log-probabilities of posterior and prior networks and see that they're also going down throughout training. Logpgobs in around -70000 for both networks which is very very small number, say zero in probability space. Also if remove clipping kl divergence torch.max(kl_divergence, torch.tensor(0., device=device)) something bad happens and kl goes negative, which is not possible in math point of view, but can be if our values are not valid distributions. Then I set n_samples to 4 and reduce batch_size to 8 but still get negative values. Pytorch's implementation of KLDivloss with log_targets=True always give 0 loss values.... so.. have you had any success?

opened by thepowerfuldeez 22
For model/prior.py _initial_sample, why the prob is calculated as from N(0,1)?

Hello, thanks for sharing the pytorch-based code! However, I have some question about the _initial_sample func in model/prior.py. epsilon is sampled from N(0, t) (t is the temperature), how its logprob is calculated? For norm distribution, After log (the mean is 0) . Can you explain why use \sigma as 1 instead of t here?

opened by seekerzz 16

Releases(v1.0.0)

v1.0.0(Aug 3, 2021)

First Official Release
Source code(tar.gz)
Source code(zip)

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Related tags

Overview

VAENAR-TTS - PyTorch Implementation

Quickstart

Dependencies

Inference

Batch Inference

Training

Datasets

Preprocessing

Training

TensorBoard

Implementation Issues

Citation

References

You might also like...

Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement”

PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

Pytorch Implementation of DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (TTS Extension)

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Implementation of "Glancing Transformer for Non-Autoregressive Neural Machine Translation"

Official implementation of the paper Chunked Autoregressive GAN for Conditional Waveform Synthesis

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

SlotRefine: A Fast Non-Autoregressive Model forJoint Intent Detection and Slot Filling

Comments

inference results

For model/prior.py _initial_sample, why the prob is calculated as from N(0,1)?

Releases(v1.0.0)

v1.0.0(Aug 3, 2021)

Owner

Keon Lee

Studying Python release adoptions by looking at PyPI downloads

A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising (CVPR 2020 Oral & TPAMI 2021)

A solution to the 2D Ising model of ferromagnetism, implemented using the Metropolis algorithm

A python comtrade load library accelerated by go

Official Python implementation of the FuzionCoin protocol

Causal-Adversarial-Instruments - PyTorch Implementation for Developing Library of Investigating Adversarial Examples on A Causal View by Instruments

This program creates a formatted excel file which highlights the undervalued stock according to Graham's number.

Source code of all the projects of Udacity Self-Driving Car Engineer Nanodegree.

Nsdf: A mesh SDF with just some code we can directly paste into our raymarcher

The Curious Layperson: Fine-Grained Image Recognition without Expert Labels (BMVC 2021)

百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline

[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

A modular, research-friendly framework for high-performance and inference of sequence models at many scales

OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers (NeurIPS 2021)

A Real-World Benchmark for Reinforcement Learning based Recommender System

Framework that uses artificial intelligence applied to mathematical models to make predictions

Few-shot NLP benchmark for unified, rigorous eval

Optimising chemical reactions using machine learning

PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021