Refactored version of FastSpeech2

Last update: May 26, 2022

Overview

FastSpeech2

This repository is a refactored version from ming024's own. I focused on refactoring structure for fitting my cases and making parallel pre-processing codes. And I wrote installation guide with the latest version of MFA(Montreal Force Aligner).

Installation

Tested on python 3.8, Ubuntu 20.04
- Notice ! For installing MFA, you should install the miniconda.
- If you run MFA under 16.04 or ealier version of Ubuntu, you will face a compile error.
In your system
- To install pyworld, run "sudo apt-get install python3.x-dev". (x is your python version).
- To install sndfile, run "sudo apt-get install libsndfile-dev"
- To use MFA, run "sudo apt-get install libopenblas-base"
Install requirements

# install pytorch_sound
pip install git+https://github.com/appleholic/pytorch_sound
pip install -e .

Download datasets

VCTK
- Visit and download dataset from https://datashare.is.ed.ac.uk/handle/10283/2651
- Move to "./data" and extract compressed file.
  - If you wanna save dataset to another directory, you must change the path of configuration files.
LibriTTS
- To be updated

Install MFA
- Visit and follow a guide that described in MFA installation website.
- Additional installation
  - mfa thirdparty download
  - mfa download acoustic english
Pre-trained checkpoint
- VCTK, 400k steps : Google Drive Link

Preprocess (VCTK case)

Prepare MFA

python fastspeech2/scripts/prepare_align.py configs/vctk_prepare_align.json

Run MFA for making alignments

# Define your the number of threads to run MFA at the last of a command. "-j [The number of threads]"
mfa align data/fastspeech2/vctk lexicons/librispeech-lexicon.txt english data/fastspeech2/vctk-pre -j 24

Feature preprocessing

python fastspeech2/scripts/preprocess.py configs/vctk_preprocess.json

Train

Multi-speaker fastspeech2

python fastspeech2/scripts/train.py configs/fastspeech2_vctk_tts.json

If you want to change the parameters of training FastSpeech2, check out the code and put the option to configuration file.
- train code : fastspeech2/scripts/train.py
- config : configs/fastspeech2_vctk_tts.json

Fastspeech2 with reference encoder (To be updated)

Synthesize

Multi-spaker model

In a code

from fastspeech2.inference import Inferencer
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN

# arguments
# chk_path: str, lexicon_path: str, device: str = 'cuda'
inferencer = Inferencer(chk_path=chk_path, lexicon_path=lexicon_path, device=device)

# initialize hifigan
interface = InterfaceHifiGAN(model_name='hifi_gan_v1_universal', device='cuda')

# arguments
# text: str, speaker: int = 0, pitch_control: float = 1., energy_control: float = 1., duration_control: float = 1.
txt = 'Hello, I am a programmer.'
mel_spectrogram = inferencer.tts(txt, speaker=0)

# Reconstructs speech by using Hifi-GAN
pred_wav = interface.decode(mel_spectrogram.transpose(1, 2)).squeeze()

# If you test on a jupyter notebook
from IPython.display import Audio
Audio(pred_wav.cpu().numpy(), rate=22050)

In command line

python fastspeech2/scripts/synthesize.py [TEXT] [OUTPUT PATH] [CHECKPOINT PATH] [LEXICON PATH] [[DEVICE]] [[SPEAKER]]

Reference encoder (not updated)

Reference

ming024/FastSpeech2

Refactored version of FastSpeech2

Related tags

Overview

FastSpeech2

Installation

Preprocess (VCTK case)

Train

Synthesize

Multi-spaker model

Reference encoder (not updated)

Reference

Owner

ILJI CHOI

A simple word search made in python

Material for GW4SHM workshop, 16/03/2022.

List of GSoC organisations with number of times they have been selected.

Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" in ACL 2021.

nlpcommon is a python Open Source Toolkit for text classification.

Residual2Vec: Debiasing graph embedding using random graphs

BiNE: Bipartite Network Embedding

Fastseq 基于ONNXRUNTIME的文本生成加速框架

Write Alphabet, Words and Sentences with your eyes.

The proliferation of disinformation across social media has led the application of deep learning techniques to detect fake news.

Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

Fixes mojibake and other glitches in Unicode text, after the fact.

PyTorch original implementation of Cross-lingual Language Model Pretraining.

Global Rhythm Style Transfer Without Text Transcriptions

Reading Wikipedia to Answer Open-Domain Questions

This repository has a implementations of data augmentation for NLP for Japanese.

Intent parsing and slot filling in PyTorch with seq2seq + attention

PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Knowledge Graph,Question Answering System，基于知识图谱和向量检索的医疗诊断问答系统

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms