German Text-To-Speech Engine using Tacotron and Griffin-Lim

Last update: Aug 28, 2022

Overview

jotts

JoTTS is a German text-to-speech engine using Tacotron and Griffin-Lim. The synthesizer model was trained on my voice using Tacotron 1. Because the engine is meant for real-time use, I decided not to include a neural vocoder and to use Griffin-Lim instead, which results in a more robotic voice but is much faster.

API

First create an instance of JoTTS. The initializer takes force_model_download as an optional parameter; set it if the last download of the synthesizer failed and the model cannot be loaded.
Call speak with the text to speak out loud. Set the second parameter to True to wait until speaking is done.
Use text2wav to create a wav file instead of speaking the text.

Example usage

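from jotts import JoTTS
# Create the engine; the synthesizer model is downloaded if it is not present yet.
jotts = JoTTS()
# Speak the sentence and wait until speaking is done.
jotts.speak("Das Wetter heute ist fantastisch.", True)
# Write the sentence to a wav file instead of speaking it.
jotts.text2wav("Es war aber auch schon mal besser!")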
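
One of the comments further down reports that an installed version exposes textToWav rather than text2wav. The following is a minimal sketch of a helper that works with either name. write_wav is a hypothetical name and not part of JoTTS, and which method exists presumably depends on the installed version.

```python
def write_wav(tts, text):
    """Call whichever wav-export method the given JoTTS-like object provides."""
    # "text2wav" is the name used in this README; "textToWav" is the name
    # reported to work in one of the comments. Try both, in that order.
    for name in ("text2wav", "textToWav"):
        method = getattr(tts, name, None)
        if callable(method):
            return method(text)
    raise AttributeError("object has neither text2wav nor textToWav")
```

With the package installed, this would be called as write_wav(JoTTS(), "Es war aber auch schon mal besser!").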

Todo

Add an option to change the default audio device used to speak the text
Add a parameter to select models other than the default model
Add threading or multiprocessing to allow speaking without blocking
Add a vocoder instead of Griffin-Lim to improve audio output
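
Until the non-blocking item above is implemented, the blocking call could be moved to a worker thread with Python's threading module. This is a rough sketch only: it assumes that speak(text, True) is safe to call from a thread other than the main one, which has not been verified for JoTTS, and speak_in_background is a hypothetical helper, not part of the package.

```python
import threading


def speak_in_background(tts, text):
    """Run tts.speak(text, True) on a worker thread and return that thread."""
    # The second argument True makes speak() block inside the worker,
    # so the thread ends exactly when speaking is done.
    worker = threading.Thread(target=tts.speak, args=(text, True))
    worker.start()
    return worker
```

The caller keeps running while the text is spoken and can call join() on the returned thread to wait for the end of the utterance.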

Training a model for your own voice

Training a synthesizer model is easy if you know how to do it. I created a course on Udemy to show you how it is done. Don't buy the course at full price; there is a discount every month :-)

https://www.udemy.com/course/voice-cloning/

If you have neither the background nor the resources, or if you are just lazy or too rich, contact me for contract work. Cloning a voice normally needs ~15 minutes of clean audio from the voice you want to clone.

Disclaimer

I hope that my (and any other person's) voice will be used only for legal and ethical purposes. Please do not get into mischief with it.

Comments

SSL: CERTIFICATE_VERIFY_FAILED

My code is:

from jotts import JoTTS
jotts = JoTTS()
jotts.speak("Das Wetter heute ist fantastisch.", True)
jotts.textToWav("Es war aber auch schon mal besser!")

and I receive this:

2022-11-01 09:39:57.536 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
2022-11-01 09:39:57.537 | DEBUG    | jotts.jotts:__prepare_model__:50 - There is no tts model yet, downloading...
2022-11-01 09:39:57.537 | DEBUG    | jotts.jotts:__prepare_model__:60 - Download file: https://github.com/padmalcom/jotts/releases/download/v0.1/v0.1.pt
v0.1.pt: 0.00B [00:00, ?B/s]

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1392, in connect
    server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 412, in wrap_socket
    session=session
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 853, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1117, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 2, in <module>
    jotts = JoTTS()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jotts/jotts.py", line 68, in __init__
    MODEL_FILE = self.__prepare_model__(force_model_download);
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/jotts/jotts.py", line 62, in __prepare_model__
    urllib.request.urlretrieve(DOWNLOAD_URL, filename=MODEL_FILE, reporthook=t.update_to)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>

What am I doing wrong? Thanks!

opened by deladriere 3
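
The "unable to get local issuer certificate" error above usually means that this Python installation cannot find a set of trusted CA certificates. The following diagnostic sketch uses only the standard library to show where Python looks for them. describe_ca_paths is a hypothetical helper name. The /Library/Frameworks/Python.framework paths in the traceback suggest a python.org macOS build; those installers typically ship an "Install Certificates.command" script in the Python application folder that sets up the certificates, but whether that applies to this machine is an assumption.

```python
import ssl


def describe_ca_paths():
    """Report where this Python looks for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        # cafile/capath are None when the default location does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Compiled-in OpenSSL defaults, reported even if nothing is there.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }


if __name__ == "__main__":
    for key, value in describe_ca_paths().items():
        print(f"{key}: {value}")
```

If both cafile and capath come back as None, no certificate store was found at the default locations, which would be consistent with the failure shown in the traceback.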

Samples of jotts in combination with a modern vocoder like (MB)Melgan, HifiGAN

I tried to save a spectrogram sample as npy and feed it to HifiGAN, but it gave me a lot of noise. I am wondering how good your results are. Do you have samples with vocoders like the ones above?

opened by eqikkwkp25-cyber 2

jotts.text2wav not existing / needs jotts.textToWav

Running this example on macOS 11.6

from jotts import JoTTS

jotts = JoTTS()
jotts.speak("Das Wetter heute ist fantastisch.", True)
jotts.speak("Wir sind Die Roboter.", True)
jotts.text2wav("Es war aber auch schon mal besser!")

gives an error when trying to generate the wav file (the speak function works really well!):

2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:83 - Using CPU for inference.
2021-12-14 17:41:22.415 | DEBUG    | jotts.jotts:__init__:85 - Loading the synthesizer...
Synthesizer using device: cpu
Trainable Parameters: 30.874M
Loaded synthesizer "v0.1.pt" trained to step 79000

| Generating 1/1
[W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.


Done.

| Generating 1/1


Done.

Traceback (most recent call last):
  File "test_jotts.py", line 6, in <module>
    jotts.text2wav("Es war aber auch schon mal besser!")
AttributeError: 'JoTTS' object has no attribute 'text2wav'

Using jotts.textToWav works well, but the [W NNPACK.cpp:79] message is still there. Here is the output:

2021-12-14 17:45:31.699 | DEBUG    | jotts.jotts:__init__:66 - Initializing JoTTS...
2021-12-14 17:45:31.700 | DEBUG    | jotts.jotts:__init__:83 - Using CPU for inference.
2021-12-14 17:45:31.700 | DEBUG    | jotts.jotts:__init__:85 - Loading the synthesizer...
Synthesizer using device: cpu
Trainable Parameters: 30.874M
Loaded synthesizer "v0.1.pt" trained to step 79000

| Generating 1/1
[W NNPACK.cpp:79] Could not initialize NNPACK! Reason: Unsupported hardware.


Done.


| Generating 1/1


Done.


| Generating 1/1


Done.

opened by deladriere 2

Can this run on a Raspberry Pi Zero?

Sorry, this is not an issue, but I would like to have a Raspberry Pi Zero speak German without the need for an Internet connection. (Amazon Polly and IBM Watson have great German voices, but they are paid services that are quite complex to install, not to mention the need for a connection and its delays.) I just subscribed to your course (I understand only a bit of German) ;-) Maybe some of the heavy work can be done on a fast computer, but I need the text-to-speech itself to run on the Raspberry Pi. Is that possible?

opened by deladriere 2

Missing additional information in README

Typo somewhere: The readme says "The synthesizer model has been trained on my voice using Tacotron1." while the releases say "v0.1 Latest Pre-trained German synthesizer model based on tacotron2."

Can you add more hints on how you trained your model(s), i.e. which base repository and data structure you used and how many hours of your voice are needed for the current results?

opened by eqikkwkp25-cyber 1

Releases (generic_v0.4)

generic_v0.4 (Dec 30, 2022)

Trained for 98k steps on the German Common Voice dataset.
generic_v0.4.pt (353.51 MB)

vocoder_v0.1 (Nov 8, 2022)

WaveRNN vocoder trained for 142,000 steps. It can be used instead of the Griffin-Lim algorithm and might deliver better results, but it requires more resources to run.
vocoder_v0.1.pt (51.40 MB)

jonas_v0.1 (Nov 22, 2021)

Pre-trained German synthesizer model based on Tacotron.
jonas_v0.1.pt (353.49 MB)

generic_v0.3 (Oct 27, 2022)

Trained for 75k steps on a high-quality voice.
generic_v0.3.pt (353.49 MB)

Owner

padmalcom

PhD in Computer Science, interested in machine learning, game programming and robotics. Hope my projects help somewhere.
