
A First Try at Speech Recognition and Conversion (1)

2022-08-07 00:24:00 【Andy Dennis】

Preface

Lately I have found speech processing rather interesting, so I would like to explore a few implementations using existing libraries.

Speech synthesis

Text-to-Speech, TTS for short

pyttsx3

First, install pyttsx3. The documentation I was reading covers version 2.6, so I install pyttsx3 2.6 here.

pip install pyttsx3==2.6

Read the text aloud directly

import pyttsx3


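# Registry ID of the Chinese (Huihui) voice on Windows; raw string so the backslashes stay literal
zh_voice_id = r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_ZH-CN_HUIHUI_11.0'

en_engine = pyttsx3.init()  # English by default
# Note: pyttsx3.init() appears to cache one engine per driver, so this may be the same object as en_engine
zh_engine = pyttsx3.init()
zh_engine.setProperty('voice', zh_voice_id)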


def say_text(engine, text):
    show_engine_info(engine)
    engine.say(text)
    engine.runAndWait()


def show_engine_info(engine):
    voices = engine.getProperty('voices')
    for voice in voices:
        print("Voice:")
        print(" - ID: %s" % voice.id)
        print(" - Name: %s" % voice.name)
        print(" - Languages: %s" % voice.languages)
        print(" - Gender: %s" % voice.gender)
        print(" - Age: %s" % voice.age)


if __name__ == '__main__':
    say_text(en_engine, 'I will study hard. Only in this way can I get a good remark.')
    # say_text(zh_engine, "I don't think I lied")  # Chinese is still a problem
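
The hard-coded registry ID above only resolves on a Windows machine that has the Huihui voice installed. Here is a minimal sketch of looking a voice up by keyword instead. `find_voice_id` and `make_engine` are hypothetical helpers, not part of pyttsx3, and they rely only on the `id` and `name` attributes that `show_engine_info` prints.

```python
def find_voice_id(voices, keyword):
    """Return the id of the first voice whose id or name contains keyword.

    `voices` is any iterable of objects with `id` and `name` attributes,
    such as the list returned by engine.getProperty('voices').
    Returns None when nothing matches.
    """
    needle = keyword.lower()
    for voice in voices:
        haystack = "{} {}".format(voice.id, voice.name or "").lower()
        if needle in haystack:
            return voice.id
    return None


def make_engine(keyword=None):
    """Create a pyttsx3 engine, switching voice only if a match is found."""
    import pyttsx3  # imported lazily so find_voice_id works without it

    engine = pyttsx3.init()
    if keyword:
        voice_id = find_voice_id(engine.getProperty('voices'), keyword)
        if voice_id is not None:
            engine.setProperty('voice', voice_id)
    return engine
```

With this, `make_engine('zh-cn')` would fall back to the default voice on a machine without a Chinese voice rather than being handed an ID that does not exist.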

Saving to a file: this fails with the error: ‘Engine’ object has no attribute ‘save_to_file’
I will look at the source code of this version when I have time…
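
The error suggests that this release simply does not expose `save_to_file`. To my knowledge, later pyttsx3 releases do provide `Engine.save_to_file(text, filename)`, which is queued like `say` and processed by `runAndWait()`. A minimal sketch that checks for the method at runtime instead of assuming a version; `try_save_to_file` is a hypothetical helper name:

```python
def try_save_to_file(engine, text, path):
    """Queue `text` for synthesis into `path` if the engine supports it.

    Returns True when the request was handed to the engine, and False when
    the engine object has no save_to_file method (as with 2.6 above).
    """
    save = getattr(engine, 'save_to_file', None)
    if not callable(save):
        return False
    save(text, path)
    engine.runAndWait()  # the queued request is processed here
    return True
```

If this returns False, upgrading pyttsx3 is probably the simplest way forward.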

FastSpeech2

Project link: https://github.com/ming024/FastSpeech2

I ran it on Colab, with the notebook runtime switched to GPU acceleration.
The author's ckpt needs to be uploaded to Google Drive first (you can of course train your own; just put it in the designated place).

import os
import shutil


def copy_pretrained_weight(ckpt_name, ckpt_share_path):
  assert ckpt_name in ['LJSpeech', 'AISHELL3', 'LibriTTS']

  # Create output/ckpt/<ckpt_name> (and any missing parents) in one call
  dir_path = 'output/ckpt/{}'.format(ckpt_name)
  os.makedirs(dir_path, exist_ok=True)
  shutil.copy(ckpt_share_path, dir_path)


copy_pretrained_weight('LibriTTS', '/content/drive/MyDrive/share/FastSpeech2/LibriTTS_800000.pth.tar')

Rename the pretrained checkpoint

os.rename('/content/FastSpeech2/output/ckpt/LibriTTS/LibriTTS_800000.pth.tar', 
          '/content/FastSpeech2/output/ckpt/LibriTTS/800000.pth.tar')
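
The copy and the rename can also be done in one step by copying straight to the target file name. This is a minimal sketch that assumes the `<step>.pth.tar` naming used in the rename above; `install_checkpoint` and its `root` parameter are hypothetical names, not part of FastSpeech2.

```python
import os
import shutil


def install_checkpoint(src_path, dataset, step, root='output/ckpt'):
    """Copy a checkpoint to <root>/<dataset>/<step>.pth.tar and return that path."""
    dst_dir = os.path.join(root, dataset)
    os.makedirs(dst_dir, exist_ok=True)
    dst_path = os.path.join(dst_dir, '{}.pth.tar'.format(step))
    shutil.copy(src_path, dst_path)  # the file is renamed as part of the copy
    return dst_path
```

For the LibriTTS weights this would be called as `install_checkpoint('/content/drive/MyDrive/share/FastSpeech2/LibriTTS_800000.pth.tar', 'LibriTTS', 800000)` from inside the FastSpeech2 directory.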

Here I am running the LibriTTS version, which does multi-speaker English speech synthesis. There is also a Chinese version, AISHELL3; the readme.md explains it, so I will not go into it here.
Next, install the libraries listed in requirements.txt and unzip HiFiGAN, which serves as the decoder here.

!unzip -d /content/FastSpeech2/hifigan /content/FastSpeech2/hifigan/generator_universal.pth.tar.zip

Run it

!python3 synthesize.py --text "Want the stars and the sun, want the world to surrender, and want you by your side."\
  --speaker_id 0 --restore_step 800000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml
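
The three config paths and the restore step in this command all have to match the checkpoint that was installed. A minimal sketch that builds the same argument list from the dataset name, using only the flags shown above; `build_synthesize_cmd` is a hypothetical helper, not something the repository provides.

```python
def build_synthesize_cmd(text, dataset, restore_step, speaker_id=0):
    """Build the argument list for FastSpeech2's synthesize.py in single mode."""
    config_dir = 'config/{}'.format(dataset)
    return [
        'python3', 'synthesize.py',
        '--text', text,
        '--speaker_id', str(speaker_id),
        '--restore_step', str(restore_step),
        '--mode', 'single',
        '-p', '{}/preprocess.yaml'.format(config_dir),
        '-m', '{}/model.yaml'.format(config_dir),
        '-t', '{}/train.yaml'.format(config_dir),
    ]
```

The list can be passed to `subprocess.run(..., check=True, cwd='/content/FastSpeech2')`. Passing a list rather than a single string avoids shell quoting problems when the text contains quotes or commas.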

Output:
Removing weight norm…
Raw Text Sequence: Want the stars and the sun, want the world to surrender, and want you by your side
Phoneme Sequence: {W AA1 N T DH AH0 S T AA1 R Z AE1 N D DH AH0 S AH1 N sp W AA1 N T DH AH0 W ER1 L D T AH0 S ER0 EH1 N D ER0 sp AE1 N D W AA1 N T Y UW1 B AY1 Y AO1 R S AY1 D}
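
The phoneme sequence looks like ARPAbet symbols: vowels carry a trailing stress digit, and `sp` seems to mark a short pause where the comma was. This is my reading of the output, not something the log states. A minimal sketch that splits such a string into symbols and stress values; `parse_phonemes` is a hypothetical helper.

```python
def parse_phonemes(sequence):
    """Split a '{P1 P2 ...}' phoneme string into (symbol, stress) pairs.

    stress is the trailing digit as an int, or None when the token has
    no digit (consonants and the 'sp' marker).
    """
    tokens = sequence.strip().strip('{}').split()
    parsed = []
    for token in tokens:
        if token[-1].isdigit():
            parsed.append((token[:-1], int(token[-1])))
        else:
            parsed.append((token, None))
    return parsed
```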

It also outputs two files.

After downloading them locally, take a look: one is the mel spectrogram generated for this sentence, and the .wav file plays when you click on it. (I will reuse this wav file in the ASR example, haha.)
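
Before reusing the wav for recognition, it is worth checking its format. The ASR code later in this post assumes 16 kHz, 16-bit, mono audio, and a synthesized file is not guaranteed to match that. A minimal sketch using only the standard `wave` module; `describe_wav` is a hypothetical helper name.

```python
import wave


def describe_wav(path):
    """Return channel count, bits per sample, sample rate, and duration."""
    with wave.open(path, 'rb') as fin:
        frames = fin.getnframes()
        rate = fin.getframerate()
        return {
            'channels': fin.getnchannels(),
            'bits': fin.getsampwidth() * 8,
            'sample_rate': rate,
            'seconds': frames / float(rate),
        }
```

If the reported sample rate is not 16000, the file probably needs to be resampled before it is fed to a decoder that expects 16 kHz input.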

Speech recognition

Automatic Speech Recognition, ASR for short

pocketSphinx

I wanted to install it, but the installation failed with an error. Judging from other blogs, something else seems to need installing first, so I will set it aside for now.

wenet

https://github.com/wenet-e2e/wenet

pip install wenet

You can first download the model files from https://github.com/wenet-e2e/wenet/releases/download/
Alternatively, you can leave model_dir unspecified, in which case the model is downloaded automatically to C:/Users/Administrator/.wenet/

import wave
import wenet


def wav2text(test_wav, only_last=True):
    with wave.open(test_wav, 'rb') as fin:
        assert fin.getnchannels() == 1
        wav = fin.readframes(fin.getnframes())

    decoder = wenet.Decoder(lang='en', model_dir='.wenet/en')
    # We assume the wav is 16 kHz, 16-bit, and decode every 0.5 seconds
    interval = int(0.5 * 16000) * 2  # bytes per chunk: samples * 2 bytes per sample
    result = []
    for i in range(0, len(wav), interval):
        last = i + interval >= len(wav)  # True only for the final chunk
        chunk_wav = wav[i: min(i + interval, len(wav))]
        ans = decoder.decode(chunk_wav, last)
        result.append(ans)
    if only_last:
        return result[-1]
    return result


if __name__ == '__main__':
    test_wav = 'demo/demo.wav'
    text = wav2text(test_wav)
    print(text)
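
The chunking arithmetic in `wav2text` can be pulled out and checked on its own, without wenet. This is a minimal sketch under the same assumption of 16 kHz, 16-bit PCM. `iter_chunks` is a hypothetical helper that yields each chunk together with the "last" flag the decoder is given.

```python
def iter_chunks(pcm, sample_rate=16000, sample_width=2, seconds=0.5):
    """Yield (chunk, is_last) pairs covering `pcm` in fixed-duration pieces."""
    step = int(seconds * sample_rate) * sample_width  # bytes per chunk
    for start in range(0, len(pcm), step):
        end = min(start + step, len(pcm))
        yield pcm[start:end], end == len(pcm)
```

With this, the decoding loop reduces to `for chunk, last in iter_chunks(wav): result.append(decoder.decode(chunk, last))`. An empty buffer yields no chunks at all, so in that case `result[-1]` would still need a guard.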