ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library

ERISHA is a multilingual, multispeaker, expressive speech synthesis framework. It can transfer expressivity to a speaker's voice even when no expressive speech corpus is available for that speaker. The name ERISHA means "speech" in Sanskrit. The framework includes several deep learning architectures for building the prosody encoder: Global Style Tokens (GST), Variational Autoencoders (VAE), Gaussian Mixture Variational Autoencoders (GMVAE), and x-vectors.
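To illustrate the GST idea: a reference encoder summarizes an utterance into a query vector. That query attends over a small bank of style tokens, and the attention-weighted sum of the tokens becomes the style embedding. The sketch below is a minimal single-head version in plain Python, assuming fixed token values supplied by the caller. The GST paper uses multi-head attention and learns the tokens jointly with the TTS model, so this is not ERISHA's implementation. `style_embedding` and `softmax` are hypothetical helpers, not part of the library's API.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(query: Vector, tokens: List[Vector]) -> Tuple[Vector, Vector]:
    """Single-head, GST-style attention over a bank of style tokens (sketch).

    query:  reference-encoder output for one utterance (dimension d)
    tokens: style-token embeddings, each of dimension d (hypothetical fixed values)
    Returns (style_embedding, attention_weights).
    """
    d = len(query)
    # Squash the tokens with tanh before attending, following the GST paper.
    keys = [[math.tanh(x) for x in tok] for tok in tokens]
    # Scaled dot-product score between the query and each token.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The attention-weighted sum of the tokens is the style (prosody) embedding.
    emb = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]
    return emb, weights
```

For example, `style_embedding([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])` places more attention weight on the first token, because the query points in the same direction. The weights always sum to 1, so the style embedding stays a convex combination of the squashed tokens.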

The library is currently in an early stage of development and will be updated frequently.

Stay tuned for more updates. We are open to collaboration!

Installation and Training

Refer to INSTALL for the initial setup.

Available recipes

Available Features

Resampling of speech waveforms to the target sampling rate in recipes
Support for training TTS systems for other languages
Support for training multilingual TTS systems for other languages
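The README does not say how the recipes resample audio. As a rough illustration of what "resampling to the target sampling rate" means, the sketch below maps each output sample onto the input time axis and interpolates linearly between neighbouring input samples. It applies no anti-aliasing low-pass filter, so downsampling with it can alias. Real pipelines typically use a band-limited resampler such as `torchaudio.functional.resample` or `librosa.resample`. `resample_linear` is a hypothetical helper, not part of ERISHA.

```python
from typing import List


def resample_linear(samples: List[float], orig_sr: int, target_sr: int) -> List[float]:
    """Resample a mono waveform by linear interpolation (illustrative only).

    No anti-aliasing filter is applied, so downsampling can alias.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples or orig_sr == target_sr:
        return list(samples)
    # Output length scales with the ratio of sampling rates.
    n_out = round(len(samples) * target_sr / orig_sr)
    out = []
    for j in range(n_out):
        # Position of output sample j on the input time axis, in input samples.
        pos = j * orig_sr / target_sr
        i = int(pos)
        frac = pos - i
        if i + 1 < len(samples):
            # Interpolate between the two nearest input samples.
            out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
        else:
            # Past the last input sample: hold the final value.
            out.append(samples[-1])
    return out
```

Upsampling `[0.0, 1.0, 2.0, 3.0]` from 8000 Hz to 16000 Hz doubles the length. Downsampling the same signal from 16000 Hz to 8000 Hz keeps every second sample.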

Upcoming updates

User documentation
PyTorch Lightning
Multiclass N-pair loss
Cluster sampling for improving the latent representation of speaker and expressivity (proposed work)
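The multiclass N-pair loss listed above is not yet in the library. The standard formulation (Sohn, 2016) works on N (anchor, positive) pairs drawn from N distinct classes. For each anchor, the positives of the other classes act as negatives, and the loss pushes the anchor's similarity to its own positive above its similarity to every negative. The sketch below is a minimal plain-Python version of that formulation. It omits the embedding-norm regularizer used in the paper. It is not ERISHA's planned implementation, and `n_pair_loss` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product used as the similarity between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def n_pair_loss(anchors: List[Vector], positives: List[Vector]) -> float:
    """Multiclass N-pair loss over N (anchor, positive) pairs from N classes.

    For anchor i, positives[j] with j != i act as negatives:
        L_i = log(1 + sum_{j != i} exp(f_i . p_j - f_i . p_i))
    Returns the mean of L_i over all anchors.
    """
    if len(anchors) != len(positives) or len(anchors) < 2:
        raise ValueError("need N >= 2 matching (anchor, positive) pairs")
    total = 0.0
    for i, f in enumerate(anchors):
        pos = dot(f, positives[i])
        # Sum over the other classes' positives, each relative to the true positive.
        neg_sum = sum(
            math.exp(dot(f, p) - pos) for j, p in enumerate(positives) if j != i
        )
        total += math.log1p(neg_sum)
    return total / len(anchors)
```

Writing each term as `log1p` of a sum of exponentials makes the loss equal to softmax cross-entropy over the anchor-positive similarity matrix with the diagonal as targets. This is why N-pair batches can reuse every positive as a negative for the other anchors.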

Acknowledgements

This implementation uses code from the following repositories: NVIDIA, Keith Ito, Prem Seetharaman, Chengqi Deng, Dannynis, and Jhosimar George Arias Figueroa.

Owner

Ajinkya Kulkarni