A First Try at Speech Recognition and Conversion (1)
2022-08-07 00:24:00 【Andy Dennis】
Preface
Lately I have found speech processing rather interesting, so I want to explore some implementations using a few libraries.
Speech synthesis
Text-to-Speech, abbreviated TTS
pyttsx3
First, install pyttsx3. The documentation I was reading is for version 2.6, so I install version 2.6 of pyttsx3 here.
pip install pyttsx3==2.6
Speak the text directly
import pyttsx3
# Registry id of the zh-CN voice; a raw string keeps the backslashes literal
zh_voice_id = r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_ZH-CN_HUIHUI_11.0'
en_engine = pyttsx3.init()  # English by default
zh_engine = pyttsx3.init()
zh_engine.setProperty('voice', zh_voice_id)
def show_engine_info(engine):
    # Print every voice the engine knows about
    voices = engine.getProperty('voices')
    for voice in voices:
        print("Voice:")
        print(" - ID: %s" % voice.id)
        print(" - Name: %s" % voice.name)
        print(" - Languages: %s" % voice.languages)
        print(" - Gender: %s" % voice.gender)
        print(" - Age: %s" % voice.age)
def say_text(engine, text):
    show_engine_info(engine)
    engine.say(text)        # queue the utterance
    engine.runAndWait()     # block until it has been spoken
if __name__ == '__main__':
    say_text(en_engine, 'I will study hard. Only in this way can I get a good remark.')
    # say_text(zh_engine, "I don't think I lied")  # Chinese is still a problem
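The voice id above is hard-coded to one machine's registry key. As a minimal sketch, the voice list printed by show_engine_info could instead be searched for a matching entry. The helper name `find_voice_id` and the keyword matching rule are assumptions made for illustration, not part of pyttsx3.

```python
def find_voice_id(voices, keyword):
    """Return the id of the first voice whose id or name contains `keyword`.

    `voices` is any iterable of objects with `.id` and `.name` attributes,
    such as the list returned by engine.getProperty('voices').
    Returns None when nothing matches.
    """
    needle = keyword.lower()
    for voice in voices:
        haystack = "{} {}".format(voice.id, voice.name or "").lower()
        if needle in haystack:
            return voice.id
    return None


# Possible usage (requires pyttsx3 and an installed zh-CN voice):
#   engine = pyttsx3.init()
#   voice_id = find_voice_id(engine.getProperty('voices'), 'zh-cn')
#   if voice_id is not None:
#       engine.setProperty('voice', voice_id)
```

Returning None rather than raising lets the caller fall back to the default voice when no Chinese voice is installed.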
Saving to a file: I found that this fails with: 'Engine' object has no attribute 'save_to_file'
I will look at the source code of this version when I have time…
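The error suggests that the installed release simply does not have this method. A minimal sketch, assuming a newer pyttsx3 release that provides `save_to_file(text, filename)` (an assumption about versions other than the 2.6 used here): the hypothetical helper `save_speech` checks for the method before calling it, so the same script runs on both old and new versions.

```python
def save_speech(engine, text, path):
    """Write `text` as audio to `path` if the engine supports it.

    Returns True when the engine exposes save_to_file, False otherwise.
    The existence of save_to_file is an assumption about newer releases.
    """
    save = getattr(engine, "save_to_file", None)
    if save is None:
        return False          # old release: nothing we can do here
    save(text, path)          # queue the file-writing command
    engine.runAndWait()       # process the queue so the file is written
    return True


# Possible usage (requires pyttsx3):
#   ok = save_speech(pyttsx3.init(), 'I will study hard.', 'out.wav')
#   print('saved' if ok else 'this pyttsx3 version cannot save to a file')
```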
FastSpeech2
Project link: https://github.com/ming024/FastSpeech2
I ran it on Colab, with the notebook runtime switched to GPU acceleration.
The author's checkpoint needs to be uploaded to Google Drive. (You can of course train your own; just put it in the designated place.)
import os
import shutil
def copy_pretrained_weight(ckpt_name, ckpt_share_path):
    assert ckpt_name in ['LJSpeech', 'AISHELL3', 'LibriTTS']
    # Create output/ckpt/<ckpt_name> (and any missing parents) in one call
    dir_path = 'output/ckpt/{}'.format(ckpt_name)
    os.makedirs(dir_path, exist_ok=True)
    # Copy the shared checkpoint from Google Drive into the project tree
    shutil.copy(ckpt_share_path, dir_path)
copy_pretrained_weight('LibriTTS', '/content/drive/MyDrive/share/FastSpeech2/LibriTTS_800000.pth.tar')
Rename the pretrained checkpoint
os.rename('/content/FastSpeech2/output/ckpt/LibriTTS/LibriTTS_800000.pth.tar',
'/content/FastSpeech2/output/ckpt/LibriTTS/800000.pth.tar')
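The copy and rename steps above can be folded into one call. This is a minimal sketch: the helper name `install_checkpoint` and its parameters are hypothetical, and the target name `<step>.pth.tar` follows the rename shown above.

```python
import os
import shutil


def install_checkpoint(src_path, ckpt_root, dataset, step):
    """Copy a checkpoint to <ckpt_root>/<dataset>/<step>.pth.tar.

    Returns the path of the copied file.
    """
    target_dir = os.path.join(ckpt_root, dataset)
    os.makedirs(target_dir, exist_ok=True)  # also creates missing parents
    target_path = os.path.join(target_dir, "{}.pth.tar".format(step))
    shutil.copy(src_path, target_path)      # copy and rename in one step
    return target_path


# Possible usage on Colab, with the paths from this post:
#   install_checkpoint(
#       '/content/drive/MyDrive/share/FastSpeech2/LibriTTS_800000.pth.tar',
#       '/content/FastSpeech2/output/ckpt', 'LibriTTS', 800000)
```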
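Here I am running the LibriTTS version, which does multi-speaker English speech synthesis. There is also a Chinese version, AISHELL3; the readme.md explains it, so I will not go into it here.
Next, install the libraries listed in requirements.txt and unzip HiFiGAN. Here hifigan is the decoder: it turns the generated mel spectrogram into a waveform.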
!unzip -d /content/FastSpeech2/hifigan /content/FastSpeech2/hifigan/generator_universal.pth.tar.zip
Run it
!python3 synthesize.py --text "Want the stars and the sun, want the world to surrender, and want you by your side."\
--speaker_id 0 --restore_step 800000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml
Output:
Removing weight norm…
Raw Text Sequence: Want the stars and the sun, want the world to surrender, and want you by your side
Phoneme Sequence: {W AA1 N T DH AH0 S T AA1 R Z AE1 N D DH AH0 S AH1 N sp W AA1 N T DH AH0 W ER1 L D T AH0 S ER0 EH1 N D ER0 sp AE1 N D W AA1 N T Y UW1 B AY1 Y AO1 R S AY1 D}
It also outputs two files.
After downloading them locally, take a look: one is the mel spectrogram generated for this sentence. The other is a .wav file, which you can simply click to play. (I will reuse this wav file in the ASR example, haha.)
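The synthesize.py invocation shown earlier can also be assembled in Python, which makes it easier to swap the dataset or the text. This is a minimal sketch: the helper name `build_synthesize_cmd` is hypothetical, the flags are copied from the command above, and it assumes every dataset keeps its three YAML files under config/<dataset>/ as LibriTTS does.

```python
def build_synthesize_cmd(text, dataset, restore_step, speaker_id=0):
    """Return the synthesize.py command as an argument list."""
    config_dir = "config/{}".format(dataset)
    return [
        "python3", "synthesize.py",
        "--text", text,
        "--speaker_id", str(speaker_id),
        "--restore_step", str(restore_step),
        "--mode", "single",
        "-p", config_dir + "/preprocess.yaml",
        "-m", config_dir + "/model.yaml",
        "-t", config_dir + "/train.yaml",
    ]


# Possible usage from the repository root:
#   import subprocess
#   cmd = build_synthesize_cmd("Want the stars and the sun.", "LibriTTS", 800000)
#   subprocess.run(cmd, check=True)
```

Passing an argument list instead of a single shell string avoids quoting problems when the text itself contains quotes.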
Speech recognition
Automatic Speech Recognition, abbreviated ASR
pocketSphinx
I tried to install it, but the installation failed with an error. Judging from other blogs, something else has to be installed first, so I will set it aside for now.
wenet
https://github.com/wenet-e2e/wenet
pip install wenet
You can first download the model files from https://github.com/wenet-e2e/wenet/releases/download/
Alternatively, leave model_dir unspecified and the files are downloaded automatically to C:/Users/Administrator/.wenet/

import wave
import wenet
def wav2text(test_wav, only_last=True):
    with wave.open(test_wav, 'rb') as fin:
        assert fin.getnchannels() == 1  # mono audio only
        wav = fin.readframes(fin.getnframes())
    decoder = wenet.Decoder(lang='en', model_dir='.wenet/en')
    # We suppose the wav is 16k, 16bits, and decode every 0.5 seconds
    interval = int(0.5 * 16000) * 2  # bytes per chunk: 8000 samples * 2 bytes
    result = []
    for i in range(0, len(wav), interval):
        last = i + interval >= len(wav)  # True only for the final chunk
        chunk_wav = wav[i: min(i + interval, len(wav))]
        ans = decoder.decode(chunk_wav, last)
        result.append(ans)
    if only_last:
        return result[-1]
    return result
if __name__ == '__main__':
    test_wav = 'demo/demo.wav'
    text = wav2text(test_wav)
    print(text)
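The decoding loop assumes 16 kHz, 16-bit, mono audio but only asserts the channel count. A minimal sketch using the standard `wave` module can check all three properties before decoding; the helper names `describe_wav` and `matches_decoder_assumption` are hypothetical.

```python
import wave


def describe_wav(path):
    """Read the basic format parameters of a wav file."""
    with wave.open(path, "rb") as fin:
        return {
            "channels": fin.getnchannels(),
            "sample_width_bytes": fin.getsampwidth(),
            "sample_rate": fin.getframerate(),
        }


def matches_decoder_assumption(info, sample_rate=16000):
    """True when the file is mono, 16-bit and at the expected sample rate."""
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate"] == sample_rate
    )


# Possible usage:
#   info = describe_wav('demo/demo.wav')
#   if not matches_decoder_assumption(info):
#       print('resample or convert first:', info)
```

A file recorded or synthesized at a different rate would need resampling first, otherwise the 0.5 second chunk size computed in the loop no longer corresponds to 0.5 seconds of audio.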
Output of print(text):
{
  "nbest" : [{
    "sentence" : "want the stars and the sun want the world to surrender and want you by your side"
  }],
  "type" : "final_result"
}
Looking at the recognized sentence (there is no punctuation, which is a problem):
want the stars and the sun want the world to surrender and want you by your side
Original text:
Want the stars and the sun, want the world to surrender, and want you by your side
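To confirm that the two sentences differ only in case and punctuation, the recognized sentence can be pulled out of the result and compared after normalization. This is a minimal sketch, assuming the decoder returns a JSON string with the nbest/sentence layout printed earlier; the helper names `best_sentence` and `normalize` are hypothetical.

```python
import json
import string


def best_sentence(result_json):
    """Extract the top hypothesis from the decoder's JSON result string."""
    return json.loads(result_json)["nbest"][0]["sentence"]


def normalize(text):
    """Lowercase, drop ASCII punctuation and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


# Possible usage with the result printed by wav2text:
#   recognized = best_sentence(text)
#   original = "Want the stars and the sun, want the world to surrender, and want you by your side"
#   print(normalize(recognized) == normalize(original))
```

Note that this normalization also removes apostrophes, so it is only a rough equality check, not a proper word error rate.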