ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library

ERISHA is a multilingual, multispeaker, expressive speech synthesis framework. It can transfer expressivity to the voice of a speaker for whom no expressive speech corpus is available. The name ERISHA means "speech" in Sanskrit. For building the prosody encoder, the framework includes several deep learning architectures: Global Style Tokens (GST), Variational Autoencoders (VAE), Gaussian Mixture Variational Autoencoders (GMVAE), and x-vectors.
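To illustrate what one of these prosody encoders computes, here is a minimal sketch of a Global Style Token layer in plain Python. It is simplified to a single attention head with no learned projections. The reference vector stands in for the output of a reference encoder. The function name `style_embedding` and the random token values are hypothetical, and this is not ERISHA's implementation. Applying tanh to the tokens before attention follows the original GST formulation.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def style_embedding(reference, tokens):
    """Hypothetical single-head GST-style attention (simplified sketch).

    reference: query vector produced by a reference encoder (length d)
    tokens:    K style-token vectors (each length d), used as keys and values
    Returns the attention weights over the tokens and the resulting
    style embedding (a weighted sum of the tanh-squashed tokens).
    """
    d = len(reference)
    # Squash the tokens with tanh, as in the original GST formulation.
    keys = [[math.tanh(x) for x in tok] for tok in tokens]
    # Scaled dot-product score between the reference and each token.
    scores = [sum(q * k for q, k in zip(reference, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Style embedding = attention-weighted sum of the tokens.
    embedding = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]
    return weights, embedding

# Toy usage with random (hypothetical) tokens and reference vector.
random.seed(0)
d, num_tokens = 8, 10
tokens = [[random.gauss(0.0, 0.3) for _ in range(d)] for _ in range(num_tokens)]
reference = [random.gauss(0.0, 1.0) for _ in range(d)]
weights, emb = style_embedding(reference, tokens)
print(len(emb), round(sum(weights), 6))
```

In a trained model the tokens are learned jointly with the synthesizer. At inference time the attention weights can also be set by hand to steer the speaking style without a reference utterance.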

The library is at an early stage of development and will be updated frequently.

Stay tuned for more updates; we are open to collaboration!

Installation and Training

Refer to INSTALL for the initial setup.

Available recipes

Available Features

Resampling of speech waveforms to the target sampling rate in recipes
Support for training TTS systems for other languages
Support for training multilingual TTS systems for other languages
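The resampling step can be illustrated with a minimal sketch. It assumes a mono waveform stored as a list of floats and uses plain linear interpolation. This is a swapped-in, simplified technique, not necessarily what the ERISHA recipes use. It applies no anti-aliasing filter, so downsampling with it can alias; band-limited resamplers such as SciPy's `scipy.signal.resample_poly(x, up, down)` are the usual choice in practice. The helper names `resample_linear` and `rational_factors` are hypothetical.

```python
import math

def rational_factors(orig_sr, target_sr):
    """Hypothetical helper: reduced up/down factors for target_sr / orig_sr."""
    g = math.gcd(orig_sr, target_sr)
    return target_sr // g, orig_sr // g

def resample_linear(samples, orig_sr, target_sr):
    """Hypothetical helper: resample a mono waveform by linear interpolation.

    No low-pass (anti-aliasing) filter is applied, so this is only a sketch.
    """
    if orig_sr == target_sr or not samples:
        return list(samples)
    n_out = round(len(samples) / orig_sr * target_sr)  # keep the same duration
    out = []
    for j in range(n_out):
        t = j * orig_sr / target_sr  # position of output sample j, in input-sample units
        i = int(math.floor(t))
        frac = t - i
        if i + 1 < len(samples):
            out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
        else:
            out.append(samples[-1])  # hold the last input sample at the edge
    return out

# Usage: one second of a 440 Hz tone at 16 kHz, resampled to 22.05 kHz.
sr_in, sr_out = 16000, 22050
wave = [math.sin(2 * math.pi * 440 * n / sr_in) for n in range(sr_in)]
resampled = resample_linear(wave, sr_in, sr_out)
print(len(resampled), rational_factors(sr_in, sr_out))  # prints: 22050 (441, 320)
```

The reduced factors (441 up, 320 down for 16 kHz to 22.05 kHz) are the values a polyphase resampler would take.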

Upcoming updates

User documentation
PyTorch Lightning
Multiclass N-pair loss
Cluster sampling for improving the latent representation of speaker and expressivity (proposed work)
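The multiclass N-pair loss listed above is a metric-learning objective (Sohn, 2016). For N anchor embeddings f_i, each paired with a positive f_i+ from the same class, it averages log(1 + sum over j != i of exp(f_i·f_j+ - f_i·f_i+)), so the positives of the other classes act as negatives. Below is a minimal plain-Python sketch of that formula. It omits the L2 regularizer on embeddings used in the paper. The function names are hypothetical, and how ERISHA will apply the loss to speaker or expressivity embeddings is not specified here.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def multiclass_n_pair_loss(anchors, positives):
    """Hypothetical sketch of the multiclass N-pair loss (no L2 regularizer).

    anchors[i] and positives[i] are embeddings of two samples of class i;
    every positives[j] with j != i serves as a negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        pos = dot(anchors[i], positives[i])
        neg = sum(math.exp(dot(anchors[i], positives[j]) - pos) for j in range(n) if j != i)
        total += math.log1p(neg)  # log(1 + sum of exp(negative - positive similarity))
    return total / n

# Usage: correctly paired embeddings give a lower loss than mismatched ones.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = multiclass_n_pair_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = multiclass_n_pair_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Each term equals the softmax cross-entropy of row i of the similarity matrix f_i·f_j+ with target j = i, which is why the loss is often implemented that way on batched tensors.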

Acknowledgements

This implementation uses code from repositories by NVIDIA, Keith Ito, Prem Seetharaman, Chengqi Deng, Dannynis, and Jhosimar George Arias Figueroa.


Owner

Ajinkya Kulkarni
