Tacotron2-HiFiGAN-master

A text-to-speech (TTS) implementation for Mandarin that combines Tacotron2 and HiFi-GAN.

Inference

To run inference, download a pre-trained Tacotron2 model for Mandarin and place it in the root path. Then run infer_tacotron2_hifigan.py to synthesize speech. To change the input text, edit the variable text in infer_tacotron2_hifigan.py. The result is saved in the root path as output.wav.
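
Before running the script, it can help to confirm that the required files are in place. The following is a minimal sketch and not part of the repository: the helper `missing_prerequisites` and the checkpoint file name `tacotron2_mandarin.pt` are hypothetical placeholders. Only `infer_tacotron2_hifigan.py` and `LJ_FT_T2_V3` are names taken from this README.

```python
from pathlib import Path


def missing_prerequisites(root, tacotron2_ckpt="tacotron2_mandarin.pt"):
    """Return the required files/directories that are absent under `root`.

    The default checkpoint name is a hypothetical placeholder; pass the
    actual file name of the Tacotron2 checkpoint you downloaded.
    """
    root = Path(root)
    required = [
        "infer_tacotron2_hifigan.py",  # inference script named in this README
        "LJ_FT_T2_V3",                 # bundled HiFi-GAN model directory
        tacotron2_ckpt,                # downloaded Tacotron2 checkpoint (assumed name)
    ]
    # Keep the original order so the report is easy to read.
    return [name for name in required if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_prerequisites(".")
    if missing:
        print("Missing before inference:", ", ".join(missing))
    else:
        print("All prerequisites found; run: python infer_tacotron2_hifigan.py")
```

Run the check from the root path of the repository. If nothing is reported missing, the inference script should at least be able to find its inputs.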

The pre-trained HiFi-GAN model is provided in LJ_FT_T2_V3; it was trained on LJSpeech and fine-tuned with Tacotron2. More pre-trained models, with different sizes and parameters, are available from the original HiFi-GAN repo. If you want to try a different model or train your own, remember to update the variables in infer_tacotron2_hifigan.py that hold the path of the HiFi-GAN model.
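
When switching to another HiFi-GAN model directory, a small helper can locate the files whose paths need to go into the script. This is only a sketch under a stated assumption: that the directory follows the layout commonly used by the original HiFi-GAN repo, namely a `config.json` plus a generator checkpoint whose file name starts with `generator` or `g_`. The function name `find_hifigan_files` is hypothetical, and the variable names used inside infer_tacotron2_hifigan.py are not shown here because this README does not state them.

```python
from pathlib import Path


def find_hifigan_files(model_dir):
    """Return (config_path, checkpoint_path) for a HiFi-GAN model directory.

    Assumed layout (not confirmed by this README): a config.json file and
    at least one checkpoint file named 'generator*' or 'g_*'.
    """
    model_dir = Path(model_dir)
    config = model_dir / "config.json"
    if not config.is_file():
        raise FileNotFoundError(f"no config.json in {model_dir}")

    candidates = sorted(
        p for p in model_dir.iterdir()
        if p.is_file() and p.name.startswith(("generator", "g_"))
    )
    if not candidates:
        raise FileNotFoundError(f"no generator checkpoint in {model_dir}")

    # Pick the last name in sorted order; with zero-padded step numbers
    # (e.g. g_00002000) this is the checkpoint with the highest step.
    return config, candidates[-1]


if __name__ == "__main__":
    config_path, checkpoint_path = find_hifigan_files("LJ_FT_T2_V3")
    print("config:", config_path)
    print("checkpoint:", checkpoint_path)
```

The two returned paths are the values to copy into the corresponding path variables in infer_tacotron2_hifigan.py.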

Audio Sample

Input: 相对论直接和间接的催生了量子力学的诞生也为研究微观世界的高速运动确立了全新的数学模型
Output: tacotron2-hifigan.wav

Owner

SunLu Z
