German Text-To-Speech Engine using Tacotron and Griffin-Lim

jotts

JoTTS is a German text-to-speech engine using Tacotron and Griffin-Lim. The synthesizer model has been trained on my voice using Tacotron1. Because JoTTS is meant for real-time use, I decided not to include a vocoder and to use Griffin-Lim instead, which results in a more robotic voice but is much faster.
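
To illustrate the Griffin-Lim idea, here is a minimal pure-Python sketch: starting from random phases, it alternates between resynthesizing a signal and re-imposing the known magnitudes. This is a toy illustration, not the code JoTTS uses. The function names (stft, istft, griffin_lim) and the tiny frame sizes are assumptions made for the example, and a real system would typically rely on an optimized implementation such as librosa.griffinlim.

```python
import cmath
import math
import random


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (fine for tiny frames)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT that keeps only the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def stft(signal, n_fft, hop):
    """Windowed short-time Fourier transform as a list of complex frames."""
    window = hann(n_fft)
    return [dft([signal[start + i] * window[i] for i in range(n_fft)])
            for start in range(0, len(signal) - n_fft + 1, hop)]


def istft(frames, n_fft, hop, length):
    """Least-squares overlap-add inverse of stft()."""
    window = hann(n_fft)
    out = [0.0] * length
    norm = [0.0] * length
    for index, spectrum in enumerate(frames):
        frame = idft_real(spectrum)
        for i in range(n_fft):
            out[index * hop + i] += frame[i] * window[i]
            norm[index * hop + i] += window[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def griffin_lim(magnitudes, n_fft, hop, length, n_iter=30, seed=0):
    """Estimate a signal whose STFT magnitudes approximate `magnitudes`."""
    rng = random.Random(seed)
    # Start with the known magnitudes and random phases.
    spec = [[m * cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for m in frame]
            for frame in magnitudes]
    for _ in range(n_iter):
        signal = istft(spec, n_fft, hop, length)
        estimate = stft(signal, n_fft, hop)
        # Keep the estimated phases, restore the target magnitudes.
        spec = [[m * (c / abs(c) if abs(c) > 1e-12 else 1.0)
                 for m, c in zip(mags, est)]
                for mags, est in zip(magnitudes, estimate)]
    return istft(spec, n_fft, hop, length)
```

Each iteration is just a pair of transforms with no model to evaluate, which fits the speed argument above; the phases are only estimated, which is one reason the result sounds more robotic.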

API

  • First create an instance of JoTTS. The initializer takes force_model_download as an optional parameter, for the case that the last download of the synthesizer model failed and the model cannot be loaded.

  • Call speak with a text parameter that contains the text to speak out loud. Set the second parameter to True to block until speaking is done.

  • Use text2wav to create a wav file instead of speaking the text.

Example usage

from jotts import JoTTS
jotts = JoTTS()
jotts.speak("Das Wetter heute ist fantastisch.", True)
jotts.text2wav("Es war aber auch schon mal besser!")
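
If the model download was interrupted, the force_model_download parameter described in the API section can be used to fetch it again. The following is a minimal sketch of that retry, not part of JoTTS itself: create_tts is a hypothetical helper, and it assumes that the constructor raises an exception when the model cannot be loaded and that force_model_download can be passed as a keyword argument.

```python
def create_tts(factory):
    """Build a TTS object, forcing a fresh model download if the first try fails.

    `factory` is the class or callable to instantiate (for example JoTTS).
    Assumption: it raises when the cached model is missing or broken and
    accepts `force_model_download` as a keyword argument.
    """
    try:
        return factory()
    except Exception as first_error:
        print(f"Initial load failed ({first_error}); retrying with a forced download.")
        return factory(force_model_download=True)


# Intended usage (requires the jotts package to be installed):
#   from jotts import JoTTS
#   tts = create_tts(JoTTS)
#   tts.speak("Das Wetter heute ist fantastisch.", True)
```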

Todo

  • Add an option to change the default audio device used to speak the text
  • Add a parameter to select models other than the default model
  • Add threading or multiprocessing to allow speaking without blocking
  • Add a vocoder instead of Griffin-Lim to improve audio output
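
The threading item could be approached roughly as follows. This is a sketch only: BackgroundSpeaker is a hypothetical helper that is not part of JoTTS, and it assumes nothing beyond an object with the speak(text, True) call described in the API section. A single worker thread takes texts from a queue, so utterances play in order while the caller continues.

```python
import queue
import threading


class BackgroundSpeaker:
    """Speak queued texts on a worker thread (hypothetical helper)."""

    def __init__(self, tts):
        self._tts = tts                  # any object with speak(text, wait)
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def say(self, text):
        """Queue a text and return immediately."""
        self._queue.put(text)

    def close(self):
        """Finish the queued texts, then stop the worker thread."""
        self._queue.put(None)            # sentinel that ends the loop
        self._worker.join()

    def _run(self):
        while True:
            text = self._queue.get()
            if text is None:
                return
            # Block inside the worker so utterances never overlap.
            self._tts.speak(text, True)


# Intended usage (requires the jotts package to be installed):
#   speaker = BackgroundSpeaker(JoTTS())
#   speaker.say("Das Wetter heute ist fantastisch.")
#   ...do other work...
#   speaker.close()
```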

Training a model for your own voice

Training a synthesizer model is easy - if you know how to do it. I created a course on Udemy to show you how it is done. Don't buy the course at full price; there is a discount every month :-)

https://www.udemy.com/course/voice-cloning/

If you have neither the background nor the resources, or if you are just lazy or too rich, contact me for contract work. Cloning a voice normally needs ~15 minutes of clean audio from the voice you want to clone.

Disclaimer

I hope that my (and any other person's) voice will be used only for legal and ethical purposes. Please do not get into mischief with it.

Comments
  • SSL: CERTIFICATE_VERIFY_FAILED

    My code is:

    from jotts import JoTTS
    jotts = JoTTS()
    jotts.speak("Das Wetter heute ist fantastisch.", True)
    jotts.textToWav("Es war aber auch schon mal besser!")
    

    and I receive this:

    2022-11-01 09:39:57.536 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
    2022-11-01 09:39:57.537 | DEBUG    | jotts.jotts:__prepare_model__:50 - There is no tts model yet, downloading...
    2022-11-01 09:39:57.537 | DEBUG    | jotts.jotts:__prepare_model__:60 - Download file: https://github.com/padmalcom/jotts/releases/download/v0.1/v0.1.pt
    v0.1.pt: 0.00B [00:00, ?B/s]
    
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
        encode_chunked=req.has_header('Transfer-encoding'))
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1275, in _send_request
        self.endheaders(body, encode_chunked=encode_chunked)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1224, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1016, in _send_output
        self.send(msg)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 956, in send
        self.connect()
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1392, in connect
        server_hostname=server_hostname)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 412, in wrap_socket
        session=session
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 853, in _create
        self.do_handshake()
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1117, in do_handshake
        self._sslobj.do_handshake()
    ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "test.py", line 2, in <module>
        jotts = JoTTS()
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jotts/jotts.py", line 68, in __init__
        MODEL_FILE = self.__prepare_model__(force_model_download);
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jotts/jotts.py", line 62, in __prepare_model__
        urllib.request.urlretrieve(DOWNLOAD_URL, filename=MODEL_FILE, reporthook=t.update_to)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 247, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
        return opener.open(url, data, timeout)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
        response = self._open(req, data)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
        '_open', req)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
        result = func(*args)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
        context=self._context, check_hostname=self._check_hostname)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
        raise URLError(err)
    urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>
    

    What am I doing wrong? Thanks!

    opened by deladriere 3
  • Samples of jotts in combination with a modern vocoder like (MB)Melgan, HifiGAN

    I tried to dump a spectrogram sample as npy and feed it to HifiGAN, but it gave me a lot of noise. I am wondering how good your results are. Do you have samples with vocoders like the ones above?

    opened by eqikkwkp25-cyber 2
  • jotts.text2wav not existing / needs jotts.textToWav

    Running this example on macOS 11.6:

    from jotts import JoTTS
    
    jotts = JoTTS()
    jotts.speak("Das Wetter heute ist fantastisch.", True)
    jotts.speak("Wir sind Die Roboter.", True)
    jotts.text2wav("Es war aber auch schon mal besser!")
    

    gives an error when trying to generate the wav file (the speak function works really well!):

    2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
    2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:83 - Using CPU for inference.
    2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:85 - Loading the synthesizer...
    Synthesizer using device: cpu
    Trainable Parameters: 30.874M
    Loaded synthesizer "v0.1.pt" trained to step 79000
    
    | Generating 1/1
    [W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.
    
    
    Done.
    
    | Generating 1/1
    
    
    Done.
    
    Traceback (most recent call last):
      File "test_jotts.py", line 6, in <module>
        jotts.text2wav("Es war aber auch schon mal besser!")
    AttributeError: 'JoTTS' object has no attribute 'text2wav'
    

    Using jotts.textToWav works well, but the [W NNPACK.cpp:79] message is still there. Here is the output:

    2021-12-14 17:45:31.699 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
    2021-12-14 17:45:31.700 | DEBUG    | jotts.jotts:__init__:83 - Using CPU for inference.
    2021-12-14 17:45:31.700 | DEBUG    | jotts.jotts:__init__:85 - Loading the synthesizer...
    Synthesizer using device: cpu
    Trainable Parameters: 30.874M
    Loaded synthesizer "v0.1.pt" trained to step 79000
    
    | Generating 1/1
    [W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.
    
    
    Done.
    
    
    | Generating 1/1
    
    
    Done.
    
    
    | Generating 1/1
    
    
    Done.
    
    opened by deladriere 2
  • Can this run on a Raspberry Pi Zero?

    Sorry, this is not an issue, but I would like to have a Raspberry Pi Zero speak German without needing an Internet connection. (Amazon Polly and IBM Watson have great German voices, but they are paid services that are quite complex to install, not to mention the need for a connection and its delays.) I just subscribed to your course (I understand only a bit of German) ;-) Maybe some of the heavy work can be done on a fast computer, but the text-to-speech itself needs to run on the Raspberry Pi?

    opened by deladriere 2
  • Missing additional information in README

    Typo somewhere: The readme says "The synthesizer model has been trained on my voice using Tacotron1." while the releases say "v0.1 Latest Pre-trained German synthesizer model based on tacotron2."

    Can you add more hints on how you trained your model(s), i.e. which base repository, which data structure, and how many hours of your voice are needed for the current results?

    opened by eqikkwkp25-cyber 1
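
For the CERTIFICATE_VERIFY_FAILED report above, the message "unable to get local issuer certificate" usually means that Python could not find a usable CA bundle. A first diagnostic step, sketched below with only the standard library, is to print where this interpreter looks for certificates. The helper name describe_ca_paths is made up for this example, and it does not touch JoTTS.

```python
import ssl


def describe_ca_paths():
    """Return where this Python build looks for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        # cafile and capath are None when the default location does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        "has_ca_source": paths.cafile is not None or paths.capath is not None,
        # Compiled-in default file location, whether or not it exists.
        "openssl_cafile": paths.openssl_cafile,
    }


if __name__ == "__main__":
    for key, value in describe_ca_paths().items():
        print(f"{key}: {value}")
```

If has_ca_source is False on a python.org installation for macOS, running the bundled "Install Certificates.command" is the usual remedy. That is a general Python hint and has not been confirmed for this particular report.
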
Releases(generic_v0.4)
Owner
padmalcom
PhD in Computer Science, interested in machine learning, game programming and robotics. Hope my projects help somewhere.