glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Last update: Dec 25, 2022

Glow-Speak

Installation

git clone https://github.com/rhasspy/glow-speak.git
cd glow-speak/

python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade setuptools wheel
pip3 install -f 'https://synesthesiam.github.io/prebuilt-apps/' -r requirements.txt

python3 setup.py develop
glow-speak --version

Voices

The following languages/voices are supported:

German
- de_thorsten
Chinese
- cmn_jing_li
Greek
- el_rapunzelina
English
- en-us_ljspeech
- en-us_mary_ann
Spanish
- es_tux
Finnish
- fi_harri_tapani_ylilammi
French
- fr_siwis
Hungarian
- hu_diana_majlinger
Italian
- it_riccardo_fasol
Korean
- ko_kss
Dutch
- nl_rdh
Russian
- ru_nikolaev
Swedish
- sv_talesyntese
Swahili
- sw_biblia_takatifu
Vietnamese
- vi_vais1000

Usage

Download Voices

glow-speak-download de_thorsten
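Each voice archive is roughly 100 MB (see the v1.0 release assets), so when you want several voices it helps to check every name before starting any downloads. The Python sketch below does that. It assumes only what the example above shows: `glow-speak-download` takes a single voice name. It does not assume the command accepts several names at once. The names `KNOWN_VOICES`, `download_commands`, and `download_voices` are hypothetical helpers, not part of glow-speak.

```python
import shutil
import subprocess
from typing import Iterable, List

# Voice names copied from the "Voices" list above (glow-speak v1.0).
KNOWN_VOICES = frozenset({
    "cmn_jing_li", "de_thorsten", "el_rapunzelina", "en-us_ljspeech",
    "en-us_mary_ann", "es_tux", "fi_harri_tapani_ylilammi", "fr_siwis",
    "hu_diana_majlinger", "it_riccardo_fasol", "ko_kss", "nl_rdh",
    "ru_nikolaev", "sv_talesyntese", "sw_biblia_takatifu", "vi_vais1000",
})


def download_commands(voices: Iterable[str]) -> List[List[str]]:
    """Build one `glow-speak-download <voice>` command per requested voice."""
    commands = []
    for voice in voices:
        if voice not in KNOWN_VOICES:
            raise ValueError(f"unknown voice: {voice!r}")
        commands.append(["glow-speak-download", voice])
    return commands


def download_voices(voices: Iterable[str]) -> None:
    """Run glow-speak-download once per voice, stopping at the first failure."""
    # Validate every name before starting any of the large downloads.
    commands = download_commands(voices)
    if shutil.which("glow-speak-download") is None:
        raise RuntimeError("glow-speak-download is not on PATH; activate .venv first")
    for command in commands:
        subprocess.run(command, check=True)
```

Example: `download_voices(["de_thorsten", "en-us_mary_ann"])` runs from inside the activated virtual environment.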

Command-Line Synthesis

glow-speak -v en-us_mary_ann 'This is a test.' --output-file test.wav
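To call the same command from Python, pass the arguments as a list. The sentence then reaches `glow-speak` as a single argument and needs no shell quoting. The sketch below uses only the `-v` and `--output-file` options shown above. It also assumes the output is an ordinary PCM WAV file that Python's standard `wave` module can read. The helper names are hypothetical.

```python
import subprocess
import wave
from typing import List


def synthesis_command(voice: str, text: str, output_file: str) -> List[str]:
    """Mirror: glow-speak -v <voice> '<text>' --output-file <output_file>"""
    return ["glow-speak", "-v", voice, text, "--output-file", output_file]


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def synthesize(voice: str, text: str, output_file: str) -> float:
    """Run glow-speak, raise on a non-zero exit, and return the audio length."""
    subprocess.run(synthesis_command(voice, text, output_file), check=True)
    return wav_duration_seconds(output_file)
```

Example: `synthesize("en-us_mary_ann", "This is a test.", "test.wav")` writes `test.wav` and returns its duration.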

HTTP Server

glow-speak-http-server --debug

Visit http://localhost:5002
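If you start the HTTP server from a script, for example in the background before sending requests, you may want to wait until it answers. The minimal sketch below uses only the Python standard library. It assumes nothing beyond the server answering at http://localhost:5002 as stated above, and it relies on no particular endpoint. `wait_for_server` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:5002", timeout: float = 60.0,
                    interval: float = 0.5, request_timeout: float = 5.0) -> bool:
    """Poll `url` until it answers with any HTTP response or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout):
                return True
        except urllib.error.HTTPError:
            # The server answered (even with an error status), so it is up.
            return True
        except OSError:
            # Connection refused or timed out: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Loading the voice model can take a while on slow hardware such as a Raspberry Pi, so the default `timeout` is generous.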

Socket Server

Start the server:

glow-speak-socket-server --voice en-us_mary_ann --socket /tmp/glow-speak.sock

From a separate terminal:

echo 'This is a test.' | bin/glow-speak-socket-client --socket /tmp/glow-speak.sock | xargs aplay

Each line the client sends to the server is synthesized, and the server replies with the path to the resulting WAV file (usually in /tmp).
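The following Python client is a sketch based only on the description above. Its protocol is an assumption: the client sends one newline-terminated line of text per request, and the server answers on the same connection with one line holding the WAV path. The sketch also accepts a reply that ends when the server closes the connection. Newlines inside the text are collapsed because each line counts as a separate request. `synthesize_via_socket` is a hypothetical name. Compare against `bin/glow-speak-socket-client` before relying on it.

```python
import socket


def synthesize_via_socket(text: str, socket_path: str = "/tmp/glow-speak.sock",
                          timeout: float = 30.0) -> str:
    """Send one line of text to the socket server and return the WAV path it replies with."""
    # Collapse embedded newlines so the server sees exactly one request line.
    line = " ".join(text.splitlines()).strip()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(line.encode("utf-8") + b"\n")
        reply = b""
        # Read until the reply line is complete or the server closes the connection.
        while b"\n" not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return reply.split(b"\n", 1)[0].decode("utf-8").strip()
```

As in the shell example, the returned path can be handed to a player, e.g. `subprocess.run(["aplay", synthesize_via_socket("This is a test.")])`.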

Comments

AssertionError on web interface (only) - and Raspberry Pi Bullseye test

Hi Michael,

Great work again!

I just saw this repository and thought I'd give it a try on my freshly installed Raspberry Pi 4 with 32-bit Raspberry Pi OS Bullseye (Debian 11). Installation finished almost without errors! I just had to fix one thing:

sudo apt-get install libatlas-base-dev

After 15 minutes I was already generating audio.

When I tested en-us_mary_ann and de_thorsten via the web interface, I got this error as soon as my test sentence ended with a question mark:

DEBUG:glow-speak:ɪ_z ð_ɪ_s ɐ_n_ˈʌ_ð_ɚ t_ˈɛ_s_t? .
ERROR:glow_speak.http_server:
Traceback (most recent call last):
  File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
    return await self.ensure_async(handler)(**request_.view_args)
  File "/home/pi/glow-speak/glow_speak/http_server.py", line 484, in app_say
    wav_bytes = await text_to_wav(text, voice, **tts_args)
  File "/home/pi/glow-speak/glow_speak/http_server.py", line 323, in text_to_wav
    text_ids = text_to_ids(
  File "/home/pi/glow-speak/glow_speak/__init__.py", line 110, in text_to_ids
    text_ids = phonemes2ids(
  File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 190, in phonemes2ids
    maybe_extend_ids(sub_phoneme, word_ids, append_list=False)
  File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 108, in maybe_extend_ids
    maybe_ids = missing_func(phoneme)
  File "/home/pi/glow-speak/glow_speak/__init__.py", line 59, in guess_ids
    typing.List[Phoneme], guess_phonemes(phoneme, self.to_phonemes)
  File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/gruut_ipa/accent.py", line 159, in guess_phonemes
    assert dist_split is not None
AssertionError

Maybe some encoding error when reading the web input?

Speed seems pretty good, comparable to Larynx I'd say, and I noticed that the German pronunciation has improved.

Opened by fquirin

Releases (v1.0)

v1.0 (Oct 20, 2021)

- Source code (tar.gz)
- Source code (zip)
- cmn_jing_li.tar.gz (101.49 MB)
- de_thorsten.tar.gz (101.59 MB)
- el_rapunzelina.tar.gz (101.34 MB)
- en-us_ljspeech.tar.gz (101.66 MB)
- en-us_mary_ann.tar.gz (101.69 MB)
- es_tux.tar.gz (101.61 MB)
- fi_harri_tapani_ylilammi.tar.gz (101.46 MB)
- fr_siwis.tar.gz (101.59 MB)
- hu_diana_majlinger.tar.gz (101.47 MB)
- it_riccardo_fasol.tar.gz (101.70 MB)
- ko_kss.tar.gz (101.58 MB)
- nl_rdh.tar.gz (101.60 MB)
- ru_nikolaev.tar.gz (101.64 MB)
- sv_talesyntese.tar.gz (101.42 MB)
- sw_biblia_takatifu.tar.gz (101.71 MB)
- vi_vais1000.tar.gz (101.28 MB)

Owner

Rhasspy

Offline voice assistant
