Utility for generating Google Text-To-Speech audio files in batch. Ideal for creating prompt files with Google voices for use in offline IVRs.

Overview

Google Text-To-Speech Batch Prompt File Maker


Do you need IVR prompts but have no voice actors? Let Google speak your prompts like a pro! This repository contains a tool for generating Google Text-To-Speech audio files in batch. It is ideal for creating offline prompts with Google voices for use in IVRs.

To use this repository, clone it to your local environment with the following console command:

git clone https://github.com/ponchotitlan/google_text-to-speech_prompt_maker.git

Once cloned, follow these steps to set up the environment:

1) GCP account setup

Before configuring this project, you need to set up the Cloud Text-to-Speech API in your Google Cloud project:

  1. Follow the official documentation to enable this API and create a Service Account
  2. Generate a JSON key associated with this Service Account
  3. Save this JSON key file in the same location as the contents of this repository
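Before running the tool, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch, not part of this repository: the function name `check_service_account_key` is hypothetical, and the field list only covers a few of the fields found in a standard service account key file.

```python
import json
from pathlib import Path

# A few of the fields found in a standard Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{key_path} does not exist"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{key_path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{key_path} does not contain a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    # Keys for other credential types (for example user credentials) will not work here.
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" should be "service_account"')
    return problems


if __name__ == "__main__":
    # "my_key.json" is a placeholder for the key file saved in step 3.
    for problem in check_service_account_key("my_key.json"):
        print(problem)
```

An empty list means the file looks like a service account key; it does not prove that the key is still active or that the API is enabled.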

2) CSV and YAML files

Prepare a CSV document with the texts that you want to convert into prompt audio files. The CSV must have the following structure:

    <FILE NAME WITHOUT THE EXTENSION> , <PROMPT TEXT OR COMPLIANT SSML GRAMMAR>

An Excel export to CSV format should be enough to produce a compatible structure, since Excel wraps the text of a cell in quotes when it contains commas, quotes, or line breaks. An example of a compliant file with SSML prompts would look like the following:

    sample_prompt_01,"<speak>Welcome to ACME. How can I help you today?</speak>"
    sample_prompt_02,"<speak>Press 1 for sales. <break time='200ms'/>Press 2 for Tech Support. <break time='200ms'/>Or stay on the line for agent support</speak>"
    ...
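The two-column structure above can be read with Python's standard `csv` module, which handles the quoted prompt text. This is only a sketch of one way to parse such a file; the function name `read_prompts` is hypothetical and the repository's own parsing may differ.

```python
import csv
import io


def read_prompts(csv_file):
    """Yield (file_name, prompt_text) pairs from an open CSV file."""
    for row_number, row in enumerate(csv.reader(csv_file), start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"row {row_number}: expected 2 columns, found {len(row)}")
        name, text = (cell.strip() for cell in row)
        if not name or not text:
            raise ValueError(f"row {row_number}: file name and prompt text are both required")
        yield name, text


# The comma inside the second prompt is safe because the cell is quoted.
sample = io.StringIO(
    'sample_prompt_01,"<speak>Welcome to ACME. How can I help you today?</speak>"\n'
    'sample_prompt_02,"<speak>Press 1 for sales, or stay on the line.</speak>"\n'
)
prompts = list(read_prompts(sample))
for name, text in prompts:
    print(name, "->", text)
```

A prompt that contains a comma but is not wrapped in quotes is split into extra columns, so the sketch reports it as an error instead of silently truncating the text.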

Additionally, prepare a YAML document that follows the structure of the setup.yaml file included in this repository. The fields are the following:

# CSV format is: FILE_NAME , PROMPT_CONTENT
csv_prompts_file: <my_csv_file.csv>

google_settings:
    # PATH TO THE JSON KEY ASSOCIATED WITH GCP. IF THE PATH HAS SPACES, WRAP THE VALUE IN QUOTES
    JSON_key: <my_key.json>

    # PROMPT TYPE. ALLOWED VALUES ARE:
    # normal | SSML
    prompt_type: SSML

    # FILE FORMAT. ALLOWED VALUES ARE:
    # wav | mp3
    output_audio_format: wav

    # COMPLIANT LANGUAGE CODE. SEE https://cloud.google.com/text-to-speech/docs/voices FOR COMPATIBLE CODES
    language_code: es-US

    # COMPLIANT VOICE NAME. SEE https://cloud.google.com/text-to-speech/docs/voices FOR COMPATIBLE NAMES
    voice_name: es-US-Wavenet-C

    # COMPLIANT VOICE GENDER. SEE https://cloud.google.com/text-to-speech/docs/voices FOR COMPATIBLE GENDERS WITH THE SELECTED VOICE ABOVE
    voice_gender: MALE

    # COMPLIANT AUDIO ENCODING. SUPPORTED TYPES ARE:
    # AUDIO_ENCODING_UNSPECIFIED | LINEAR16 | MP3 | OGG_OPUS
    audio_encoding: LINEAR16
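The allowed values listed in the comments above can be checked before any request is sent. The sketch below is an illustration, not the repository's code: it assumes the YAML has already been loaded into a dictionary (for example with `yaml.safe_load` from PyYAML), and the names `validate_settings`, `ALLOWED_VALUES`, and `REQUIRED_KEYS` are hypothetical.

```python
# Allowed values, copied from the comments in the YAML structure above.
ALLOWED_VALUES = {
    "prompt_type": {"normal", "SSML"},
    "output_audio_format": {"wav", "mp3"},
    "audio_encoding": {"AUDIO_ENCODING_UNSPECIFIED", "LINEAR16", "MP3", "OGG_OPUS"},
}

REQUIRED_KEYS = (
    "JSON_key", "prompt_type", "output_audio_format", "language_code",
    "voice_name", "voice_gender", "audio_encoding",
)


def validate_settings(config):
    """Return a list of problems found in the loaded configuration."""
    problems = []
    if not config.get("csv_prompts_file"):
        problems.append("csv_prompts_file is missing")

    google = config.get("google_settings") or {}
    for key in REQUIRED_KEYS:
        if key not in google:
            problems.append(f"google_settings.{key} is missing")
    for key, allowed in ALLOWED_VALUES.items():
        if key in google and google[key] not in allowed:
            problems.append(
                f"google_settings.{key}={google[key]!r} is not one of {sorted(allowed)}"
            )
    return problems


# The same values as the YAML example, written as the dictionary a loader would return.
config = {
    "csv_prompts_file": "my_csv_file.csv",
    "google_settings": {
        "JSON_key": "my_key.json",
        "prompt_type": "SSML",
        "output_audio_format": "wav",
        "language_code": "es-US",
        "voice_name": "es-US-Wavenet-C",
        "voice_gender": "MALE",
        "audio_encoding": "LINEAR16",
    },
}
print(validate_settings(config))
```

The language code, voice name, and gender are not checked here, because the valid combinations are defined by the Google voices list rather than by this file.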

3) Dependencies installation

Install the requirements in a virtual environment with the following command:

pip install -r requirements.txt

4) Command-line usage

The script takes the following command-line arguments:

usage: init.py [-h] [-b BATCH] configurationYAML

Batch prompt generation with Google TTS services

positional arguments:
  configurationYAML     YAML file with operation settings

optional arguments:
  -h, --help            show this help message and exit
  -b BATCH, --batch BATCH
                        Amount of rows in the CSV file to process at the same
                        time. Suggested max value is 100. Default is 10

An example is:

py init.py setup.yaml
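To illustrate what the `-b` option controls, the sketch below rebuilds the argument parser from the usage text and splits a list of rows into groups of the requested size. How init.py actually schedules each batch is not shown in this document, so the `batches` helper and the row values are assumptions for illustration only.

```python
import argparse
from itertools import islice


def build_parser():
    """Recreate the command-line interface described in the usage text."""
    parser = argparse.ArgumentParser(
        prog="init.py",
        description="Batch prompt generation with Google TTS services",
    )
    parser.add_argument("configurationYAML", help="YAML file with operation settings")
    parser.add_argument(
        "-b", "--batch", type=int, default=10,
        help="Amount of rows in the CSV file to process at the same time",
    )
    return parser


def batches(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Equivalent to running: py init.py setup.yaml -b 2
args = build_parser().parse_args(["setup.yaml", "-b", "2"])
for chunk in batches(["row1", "row2", "row3", "row4", "row5"], args.batch):
    print(chunk)
# ['row1', 'row2']
# ['row3', 'row4']
# ['row5']
```

With the default of 10, a CSV of 25 rows would be handled as groups of 10, 10, and 5.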

The command prompt will show logs based on the status of each row:

✅ Prompt sample_prompt_04.WAV created successfully!
✅ Prompt sample_prompt_01.WAV created successfully!
✅ Prompt sample_prompt_03.WAV created successfully!
✅ Prompt sample_prompt_02.WAV created successfully!

The corresponding audio files will be saved in the same location where this script is executed.
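For readers curious about what happens for each row, the following sketch shows how a single prompt could be synthesized with the `google-cloud-texttospeech` client library using the `google_settings` values. It is a hedged illustration rather than the code in init.py: the function names are hypothetical, and the upper-case file extension is only inferred from the log lines above.

```python
def output_file_name(name, audio_format):
    """Build the audio file name; the upper-case extension mirrors the logs above."""
    return f"{name}.{audio_format.upper()}"


def make_client(json_key_path):
    """Create a Text-to-Speech client from the service account key file."""
    # Imported here so the helper above works without the package installed.
    from google.cloud import texttospeech

    return texttospeech.TextToSpeechClient.from_service_account_json(json_key_path)


def synthesize_prompt(client, settings, name, text):
    """Synthesize one CSV row and write the audio to the current directory."""
    from google.cloud import texttospeech

    # SSML prompts and plain-text prompts use different input fields.
    if settings["prompt_type"] == "SSML":
        synthesis_input = texttospeech.SynthesisInput(ssml=text)
    else:
        synthesis_input = texttospeech.SynthesisInput(text=text)

    voice = texttospeech.VoiceSelectionParams(
        language_code=settings["language_code"],
        name=settings["voice_name"],
        ssml_gender=texttospeech.SsmlVoiceGender[settings["voice_gender"]],
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding[settings["audio_encoding"]],
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    target = output_file_name(name, settings["output_audio_format"])
    with open(target, "wb") as audio_file:
        audio_file.write(response.audio_content)
    return target
```

Calling `make_client(settings["JSON_key"])` once and reusing the client for every row avoids repeating the authentication work for each prompt.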

5) Encoding for Cisco CVP Audio Elements

Unfortunately, according to the Python SDK documentation at the time of writing, the Google Text-To-Speech service does not support the 8-bit μ-law encoding that Cisco CVP requires. (I am currently working on a Java version which does support this encoding. This option might be released in the Python SDK in the future.) However, many online services can perform this conversion. Audacity can also be used for this purpose, and a more straightforward batch-processing tool has proven useful to me for converting files with the CVP-compatible settings.
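To show what the conversion involves, here is a minimal pure-Python sketch of G.711 μ-law encoding that wraps the result in a WAV container. It is not part of this repository and all function names are hypothetical. It assumes the input is already mono, 16-bit PCM at 8 kHz (a typical telephony rate, assumed here rather than taken from the Cisco documentation), so audio produced at a higher sample rate must be resampled first with a tool such as Audacity.

```python
import struct
import wave

BIAS = 0x84   # 132, added to the magnitude before the segment is chosen
CLIP = 32635  # largest magnitude that still fits in 16 bits after adding BIAS


def linear_to_mulaw(sample):
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1      # segment number, 0 to 7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F   # 4 bits within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_mulaw(pcm):
    """Encode little-endian signed 16-bit PCM bytes as mu-law bytes."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_mulaw(sample) for sample in samples)


def mulaw_wav_bytes(mulaw, sample_rate=8000):
    """Wrap mono mu-law samples in a WAV container (format tag 7)."""
    pad = b"\x00" if len(mulaw) % 2 else b""  # chunks are aligned to 2 bytes
    fmt_chunk = struct.pack(
        "<4sIHHIIHHH", b"fmt ", 18, 7, 1, sample_rate, sample_rate, 1, 8, 0
    )
    fact_chunk = struct.pack("<4sII", b"fact", 4, len(mulaw))
    data_chunk = struct.pack("<4sI", b"data", len(mulaw)) + mulaw + pad
    body = b"WAVE" + fmt_chunk + fact_chunk + data_chunk
    return struct.pack("<4sI", b"RIFF", len(body)) + body


def convert_file(source_path, target_path):
    """Convert a mono, 16-bit, 8 kHz PCM WAV file to a mu-law WAV file."""
    with wave.open(source_path, "rb") as source:
        layout = (source.getnchannels(), source.getsampwidth(), source.getframerate())
        if layout != (1, 2, 8000):
            raise ValueError("expected mono, 16-bit, 8 kHz PCM input; resample first")
        pcm = source.readframes(source.getnframes())
    with open(target_path, "wb") as target:
        target.write(mulaw_wav_bytes(pcm16_to_mulaw(pcm)))
```

The header is written by hand because the standard `wave` module only writes uncompressed PCM files.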

The resulting files can then be uploaded to the Tomcat server for use in a Cisco CallStudio design. The path on the CVP Windows Server VM is the following:

    C:\Cisco\CVP\VXMLServer\Tomcat\webapps\CVP\audio

Please refer to the Official Cisco Documentation for more information.

Crafted with ❤️ by Alfonso Sandoval - Cisco
