PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

Last update: Dec 23, 2022

Overview

GradTTS

Unofficial PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arXiv)

About this repo

This is an unofficial implementation of GradTTS, built on GlowTTS (https://github.com/jaywalnut310/glow-tts). We replace the GlowDecoder with a DiffusionDecoder that follows the settings of the original paper. For convenience, we also replace torch.distributed with horovod. fp16 training is not used for now.
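To give a feel for what a diffusion decoder is trained on, here is a minimal sketch of the forward (noising) process described in the Grad-TTS paper, where a clean mel value X_0 drifts toward a text-conditioned mean mu while noise is added. It is written in plain Python rather than taken from this repo: the function names are hypothetical, and the linear schedule endpoints beta_min=0.05 and beta_max=20.0 are assumed to match the paper's defaults, not read from this repo's config.

```python
import math
import random


def cumulative_beta(t, beta_min=0.05, beta_max=20.0):
    """Integral over [0, t] of the linear schedule
    beta_s = beta_min + (beta_max - beta_min) * s.
    The default endpoints are assumptions, not values from this repo."""
    return beta_min * t + 0.5 * (beta_max - beta_min) * t * t


def forward_diffusion(x0, mu, t, rng=random):
    """Sample X_t given X_0 for the SDE
    dX = 0.5 * (mu - X) * beta_t dt + sqrt(beta_t) dW  (identity covariance).

    x0 and mu are equal-length lists of floats; t is in [0, 1].
    Returns (x_t, noise) so a score network could be trained on the noise.
    """
    integral = cumulative_beta(t)
    decay = math.exp(-0.5 * integral)            # weight left on x0
    std = math.sqrt(1.0 - math.exp(-integral))   # noise scale at time t
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [decay * x + (1.0 - decay) * m + std * z
           for x, m, z in zip(x0, mu, noise)]
    return x_t, noise
```

At t=0 the sample equals x0, and as t approaches 1 it is close to mu plus unit Gaussian noise, which is the prior the reverse process starts from at inference time.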

Training and inference

Go to the egs/ folder and see run.sh and inference_waveglow_vocoder.py for example usage.

Before training, download and extract the LJ Speech dataset, then rename the dataset folder or create a link to it: ln -s /path/to/LJSpeech-1.1/wavs DUMMY. Then build the Monotonic Alignment Search code (Cython): cd monotonic_align; python setup.py build_ext --inplace.

Before inference, download the WaveGlow checkpoint from download_link and put it in the waveglow folder.
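If you prefer to script the dataset link rather than type it, the ln -s step can be sketched in Python as below. The helper name link_dataset is hypothetical and not part of this repo; it only assumes that the link should be called DUMMY, as in the command above, and that it is created in the current working directory.

```python
from pathlib import Path


def link_dataset(wavs_dir, link_name="DUMMY"):
    """Create link_name as a symlink to wavs_dir,
    equivalent to `ln -s wavs_dir DUMMY`. Hypothetical helper."""
    target = Path(wavs_dir).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"wavs folder not found: {target}")
    link = Path(link_name)
    # Refuse to overwrite an existing file, folder, or (possibly broken) link.
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    link.symlink_to(target, target_is_directory=True)
    return link
```

For example, link_dataset("/path/to/LJSpeech-1.1/wavs") creates DUMMY in the current directory and fails loudly if the wavs folder is missing or DUMMY already exists.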

Reference Materials

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

GlowTTS

Score-Based Generative Modeling through Stochastic Differential Equations

score_sde_pytorch

denoising-diffusion-pytorch

Authors

Heyang Xue (https://github.com/WelkinYang) and Qicong Xie (https://github.com/QicongXie)

Owner

HeyangXue1997
