ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

Last update: Dec 29, 2022

Overview

日本語は以下に続きます (Japanese follows)

English: This book is written in Japanese and primarily focuses on Japanese TTS. Some of the functionality in this codebase (e.g., the neural network implementations) can be used for other languages. However, we have not prepared any guide or code for non-Japanese TTS systems. We may extend the codebase to other languages in the future, but we cannot guarantee that we will.

Installation

pip install ttslearn

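Repository structure

ttslearn: The core speech synthesis library written for "Pythonで学ぶ音声合成". This is the package that pip install ttslearn installs. Beyond serving as sample code for the book, it can also be used as a general-purpose speech synthesis library.
notebooks: Source code for Chapters 4 through 10, in Jupyter notebook format.
hydra: Sample code for hydra, which is explained in Chapter 6.
recipes: The Japanese speech synthesis recipes explained in Chapters 6, 8, and 10. They include implementations of Japanese speech synthesis systems built on the JSUT corpus.
extra_recipes: Advanced speech synthesis recipes. They are not covered in the book, but the repository includes them as usage examples of the ttslearn library, with recipes that use the JSUT and JVS corpora.

For detailed documentation, see https://r9y9.github.io/ttslearn/.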

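License

The source code is released under the MIT license. You may use it for both commercial and non-commercial purposes. See the LICENSE file for details.

Terms of use for pretrained models

The releases page of this repository distributes pretrained models trained on the JSUT and JVS corpora. These pretrained models may be used for non-commercial purposes only. When using them, please also check the terms of use of each corpus.

In addition, the author accepts no liability whatsoever for any claims, damages, or other obligations arising from the use of the pretrained models.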

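Appendix

As an appendix, the specification of the full-context labels used for Japanese speech synthesis is summarized. See docs/appendix.pdf for details.

Contact

If you have questions about the book or the source code, please open a GitHub issue and we will respond as far as possible.

Errata

The errata for the book are collected at the link below.

Errata for the book

If you find a typo or other mistake that is not listed in the errata, please report it via a GitHub issue.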

Acknowledgments

Part of the Tacotron 2 source code is based on ESPnet. (thanks to @kan-bayashi)
Most of the advanced recipe implementations use kan-bayashi/ParallelWaveGAN.
Text processing for Japanese speech synthesis uses Open JTalk and its Python wrapper.

Links

Amazon: https://www.amazon.co.jp/dp/4295012270/
Impress book information: https://book.impress.co.jp/books/1120101073

Comments

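Shift direction of the output when computing the WaveNet loss
Hello, and thank you for your work!

I think there may be a mistake concerning the WaveNet loss function, and I would like to ask you to check it.

According to the end of Section 7.7, when computing the WaveNet loss:

"Note that the output is shifted by one step along the time axis to satisfy the constraint of an autoregressive model. If the loss is computed without shifting, WaveNet ends up predicting the speech sample at time t from the inputs up to time t, which defeats the original purpose. Note that the goal of training is to predict the speech sample at time t + 1 from the inputs up to time t. This issue is not specific to WaveNet; it is shared by other autoregressive models that use teacher forcing, so great care is needed when implementing them."

The corresponding source code, code 7.16 and code 7.17, is below.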

https://github.com/r9y9/ttslearn/blob/0fd4c04c11a2c8552198e39c0d517ef4c540b47d/notebooks/ch07_WaveNet.ipynb#L1233-L1234

https://github.com/r9y9/ttslearn/blob/0fd4c04c11a2c8552198e39c0d517ef4c540b47d/notebooks/ch07_WaveNet.ipynb#L1244

However, in code 8.11 of Chapter 8, the shift direction of the output is exactly the opposite.

https://github.com/r9y9/ttslearn/blob/0fd4c04c11a2c8552198e39c0d517ef4c540b47d/notebooks/ch08_Recipe-WaveNet.ipynb#L1300-L1301

The recipe source code uses the same, opposite, shift direction.

https://github.com/r9y9/ttslearn/blob/0fd4c04c11a2c8552198e39c0d517ef4c540b47d/recipes/wavenet/train_wavenet.py#L24-L27

I suspect that one of the two is wrong.

My understanding is that, with teacher forcing, x_hat[:, :, t] is predicted by causal convolutions from the speech samples up to x[:, t], so it should be compared with x[:, t + 1]. The version used from Chapter 8 onward (x_hat[:, :, :-1], x[:, 1:]) therefore looks correct to me.
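The alignment argument above can be checked on a toy example. The sketch below is not the WaveNet code from the book: predict_next and mse are hypothetical helpers, and the "model" is a trivial causal predictor that, at each time t, looks only at x[t] and guesses x[t] + 1. On a ramp signal this guess is exactly the next sample, so only the pairing of the prediction at time t with the target at time t + 1 gives zero loss.

```python
def predict_next(x):
    """Toy causal predictor (hypothetical): the output at time t uses only x[t]."""
    return [v + 1 for v in x]


def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    assert len(pred) == len(target)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


x = [0, 1, 2, 3, 4, 5]   # ramp signal: the true next sample is always x[t] + 1
x_hat = predict_next(x)  # x_hat[t] is the prediction made at time t

# Prediction at time t paired with the target at time t + 1
loss_shifted = mse(x_hat[:-1], x[1:])
# No shift: prediction at time t paired with the sample at time t itself
loss_unshifted = mse(x_hat, x)
# Opposite shift: prediction at time t + 1 paired with the sample at time t
loss_reversed = mse(x_hat[1:], x[:-1])

print(loss_shifted, loss_unshifted, loss_reversed)
```

With tensors shaped as in the question, the first pairing corresponds to (x_hat[:, :, :-1], x[:, 1:]). The sketch only illustrates the indexing; it says nothing about the loss values of a partially trained network.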

However, when I edited code 8.11 and actually ran it, the loss for x_hat[:, :, :-1], x[:, 1:] was larger than the loss for x_hat[:, :, 1:], x[:, :-1].

The loss values for the former are

5.5439348220825195 5.494748115539551 5.402365684509277 5.309176921844482 5.262940883636475 ...

and the loss values for the latter are

5.043774604797363 4.923819541931152 4.949016094207764 4.854518413543701 4.862161636352539 ...

So I am no longer sure which one is correct. Could you please check?
typo
opened by zzxiang 5
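Shape of the convolution input x during WaveNet training and inference
I apologize if this is the wrong place to ask.

There is something I do not understand when reading the WaveNet source code in Chapter 7.

During WaveNet training, the convolution input x has shape (B, out_channels, T). When the waveform is quantized to 8 bits (2^8 = 256 levels) with the μ-law algorithm, out_channels is 256.

# Convert the quantized discrete sequence to one-hot vectors
# (B, T) -> (B, T, out_channels) -> (B, out_channels, T)
x = F.one_hot(x, self.out_channels).transpose(1, 2).float()
# Upsample the conditioning features
c = self.upsample_net(c)
assert c.size(-1) == x.size(-1)
# Convert from the one-hot dimension to the hidden dimension
x = self.first_conv(x)

So my understanding is that, during training, the convolution runs along the time axis. As the book says on page 224,

"Using the ground-truth data at time t - 1 (Figure 7-15a) as the input at time t eases the difficulty of training. This method is called teacher forcing [64]."

On the other hand,

"At inference time no ground-truth data is available, so speech must be generated sequentially, one sample at a time."

During inference, the convolution input x has shape (1, out_channels).

outputs = []
# Initial value for autoregressive generation
current_input = torch.zeros(B, 1, self.out_channels).to(c.device)
current_input[:, :, int(mulaw_quantize(0))] = 1
# ...
# Generate sample by sample
for t in ts:
    # The input at time t is the output at time t-1
    if t > 0:
        current_input = outputs[-1]
    # Conditioning features at time t
    ct = c[:, t, :].unsqueeze(1)
    x = current_input
    x = self.first_conv.incremental_forward(x)
    # ...
    outputs += [x.data]

In other words, at inference time the convolution runs along the out_channels direction, that is, along the one-hot vector, which differs from training.

Is this understanding correct?

If so, why can convolution weights learned along the time axis be used along the one-hot vector direction?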
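One way to look at this question: a causal convolution over time can be evaluated one sample at a time if the layer keeps a buffer of past inputs. In the quoted inference code, current_input is created with shape (B, 1, out_channels), which can be read as a single time step. The sketch below is a minimal single-channel, pure-Python illustration of that idea and is not the ttslearn implementation; causal_conv_full and IncrementalCausalConv are hypothetical names. It shows that the same weights give the same outputs whether the whole sequence is processed at once or one sample per call.

```python
from collections import deque


def causal_conv_full(x, w, dilation=1):
    """Causal dilated convolution over a whole sequence (zero padding on the left)."""
    k_size = len(w)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(k_size):
            idx = t - (k_size - 1 - k) * dilation  # index of the past sample for tap k
            if idx >= 0:
                acc += w[k] * x[idx]
        y.append(acc)
    return y


class IncrementalCausalConv:
    """Same convolution, fed one sample per call, using a buffer of past inputs."""

    def __init__(self, w, dilation=1):
        self.w = list(w)
        self.dilation = dilation
        size = (len(self.w) - 1) * dilation + 1
        self.buffer = deque([0.0] * size, maxlen=size)  # starts as zero padding

    def step(self, x_t):
        self.buffer.append(x_t)  # the oldest sample drops out automatically
        buf = list(self.buffer)
        # Taps are spaced by the dilation; the last tap sees the newest sample
        return sum(wk * buf[k * self.dilation] for k, wk in enumerate(self.w))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [0.5, -1.0, 2.0]

full = causal_conv_full(x, w, dilation=2)
layer = IncrementalCausalConv(w, dilation=2)
incremental = [layer.step(v) for v in x]
print(full)
print(incremental)
```

In both paths the weights are applied along the time axis; the incremental path simply receives the time axis one step per call.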
opened by zzxiang 5
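Arrows and symbols in Figure 7-14
Regarding Figure 7-14 (p. 214):

1. The "upsampled conditioning features" go into a "1x1 Conv" and are then merged with the outputs of "tanh" and "\sigma", but in code 7.9 they appear to be merged with the inputs of "tanh" and "\sigma".

2. Every merge symbol, including the residual connection, is drawn as \otimes, but I think the symbol should be \oplus everywhere except where the outputs of "tanh" and "\sigma" merge (cf. the original WaveNet paper).

3. "1x1 Conv" and "1x1 conv" are capitalized differently inside and outside the "Dilated Conv1d Block", but I believe the code uses the same layer for both.

Incidentally, regarding item 1, can anything be said about whether it is better to add the conditioning features before or after the activation functions?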
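For reference, the original WaveNet paper writes the conditional gated activation as z = tanh(W_f * x + V_f * h) ⊙ σ(W_g * x + V_g * h), so the conditioning term is added before the nonlinearities, and the only element-wise product is between the tanh and sigmoid outputs. The following is a minimal sketch of that formula on plain Python lists. The function name and the four argument names are hypothetical: they stand for the already-computed convolution outputs and conditioning projections, and this is not code 7.9 from the book.

```python
import math


def gated_activation(filter_in, gate_in, cond_filter, cond_gate):
    """Element-wise tanh(f + c_f) * sigmoid(g + c_g).

    The conditioning terms are added to the inputs of tanh and sigmoid;
    the two branches are then combined by multiplication.
    """
    out = []
    for f, g, cf, cg in zip(filter_in, gate_in, cond_filter, cond_gate):
        sigmoid = 1.0 / (1.0 + math.exp(-(g + cg)))
        out.append(math.tanh(f + cf) * sigmoid)
    return out


z = gated_activation([0.0, 1.0], [0.0, 2.0], [0.0, -1.0], [0.0, 0.5])
print(z)
```

The residual and skip connections that follow this unit are additions, which is consistent with the \oplus reading in item 2.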
typo
opened by tomotakatakahashi 4
Transfer learning

Hello,

I went through all the recipes looking for a way to do transfer learning similar to what is described here:

transfer learning

without having to rely on similar (parallel) wav/text input, as in the multi-speaker example.

Thank you very much in advance for an answer.

Best regards

opened by dtfasteas 4
The Example in the ttslearn.util.example_audio_file documentation does not use ttslearn.util.example_audio_file
I am finding "Pythonで学ぶ音声合成" very interesting and am reading through it little by little.

I found what appears to be a minor mistake in the documentation, so I am sharing it. https://r9y9.github.io/ttslearn/latest/generated/ttslearn.util.example_audio_file.html#ttslearn.util.example_audio_file

Current

>>> fs, x = wavfile.read(pysptk.util.example_audio_file())

https://github.com/r9y9/ttslearn/blob/59c6f491ce205cab611e171054af43afcc6ca603/ttslearn/util.py#L91

Proposed fix

>>> fs, x = wavfile.read(ttslearn.util.example_audio_file())

Since this is the Example for ttslearn.util.example_audio_file, I propose changing pysptk to ttslearn.

I noticed this when I looked at the source code out of curiosity about how the example_audio_file function is implemented.
typo
opened by ftnext 3
Question: how to change the mini-batch size when running recipes/tacotron

While running stage 3 of the recipe in Chapter 10, Section 5 (Tacotron training), I got a CUDA out-of-memory error. I would like to work around it by reducing the mini-batch size. Could you tell me what I need to change? I would also appreciate hearing about any other ways to deal with this error.

Environment
Linux (Ubuntu) on my own machine via WSL2
Recipes are run step by step from the shell, not from JupyterLab
Windows 11
Ubuntu 20.04.3 LTS
Python 3.8.8
GPU: NVIDIA GeForce GTX 1650
GPU memory: 12 GB (dedicated GPU memory 4 GB, shared GPU memory 8 GB)

Error message: RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 4.00 GiB total capacity; 2.31 GiB already allocated; 0 bytes free; 2.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

If any other information is needed, please let me know and I will reply. Thank you.

opened by mikapote 3
Question about the book.

Hey @r9y9

I found your repository https://github.com/r9y9/wavenet_vocoder while searching online for a starting point for building something like what Respeecher offers (https://www.respeecher.com/). Specifically, I want to use the voice itself, rather than text, as the main input for speech synthesis, since a lot of extra information is lost when speech is simply transcribed to text.

I've also found your talk on breaking down the WaveGAN approach (https://www.youtube.com/watch?v=BZxqf-Wkhig&t=330s). I found that really helpful and insightful, so I wanted to thank you for that.

So, the question I wanted to ask is, would this book be a good starting point for trying to get a recipe/workflow for training a network on a speaker's voice, then using the user's input voice as a guide for the synthesized speech? Could you recommend a good starting point?

ParallelWaveGAN (https://kan-bayashi.github.io/ParallelWaveGAN/) seems to be the closest thing I could find to what I want, but it appears to be oriented more toward TTS, and I couldn't get voice conversion to work with anything other than the ground-truth samples from the training data.

Anyways, feel free to delete this, I just didn't know of a good way to contact you with the questions I have.

Thanks!

opened by mercuito 2
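Question: how to do transfer learning (fine-tuning) when training tacotron2 with ttslearn

Thank you for responding to my earlier issue. This question mainly concerns the content on p. 332. I wanted to build a new speech synthesis voice, but because my corpus is smaller than the JSUT corpus, I decided to try transfer learning. However, I could not figure out how to train with a saved checkpoint as the initial values, which is why I am asking. As you probably know, NVIDIA/tacotron2 reportedly lets you start from a pretrained model by passing an option to the training script. I would be grateful if you could tell me whether this is implemented as an option when running the recipe, or whether it is done some other way.

Environment: I am running the Chapter 10 notebooks on Google Colab Pro.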

opened by mikapote 2
Fix accent phrase border in pp_symbols

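In pp_symbols in openjtalk.py, the accent phrase boundary condition on line 151 is if a3 == 1 and a2_next == 1:, which does not distinguish consonants from vowels, so the boundary symbol '#' is also inserted between a consonant and a vowel. Shouldn't the condition also require the current phoneme to be a vowel, the moraic nasal, or a geminate consonant, that is, if a3 == 1 and a2_next == 1 and p3 in "aeiouAEIOUNcl":? (In practice, a boundary may never follow a geminate consonant.)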
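The difference between the two conditions can be sketched in isolation. The helper names below are hypothetical and the functions only wrap the two boolean expressions quoted in the issue; they are not the pp_symbols implementation, and the meaning of a3, a2_next, and p3 is taken from the issue text (p3 being the current phoneme).

```python
VOWEL_LIKE = "aeiouAEIOUNcl"  # vowels, devoiced vowels, moraic nasal "N", geminate "cl"


def is_border_original(a3, a2_next):
    """Condition quoted from line 151: fires for every phoneme in the mora."""
    return a3 == 1 and a2_next == 1


def is_border_proposed(a3, a2_next, p3):
    """Proposed condition: additionally require a vowel-like current phoneme."""
    return a3 == 1 and a2_next == 1 and p3 in VOWEL_LIKE


# A consonant inside the last mora of an accent phrase, e.g. "k" in "ka"
print(is_border_original(1, 1))       # True: '#' would be emitted after the consonant
print(is_border_proposed(1, 1, "k"))  # False: no '#' between consonant and vowel
print(is_border_proposed(1, 1, "a"))  # True: '#' after the vowel that ends the mora
```

Note that p3 in "aeiouAEIOUNcl" is a substring test, so it relies on p3 always being a single phoneme symbol; a tuple of symbols would be a stricter alternative.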
bug

opened by sucveria 2

ImportError from torch on from ttslearn.dnntts import DNNTTS

macOS (no GPU in this environment), Python 3.9.4

Steps to reproduce

$ pip install ttslearn
$ python
>>> from ttslearn.dnntts import DNNTTS
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/.../venv/lib/python3.9/site-packages/ttslearn/__init__.py", line 3, in <module>
    from . import util
  File "/Users/.../venv/lib/python3.9/site-packages/ttslearn/util.py", line 12, in <module>
    import torch
  File "/Users/.../venv/lib/python3.9/site-packages/torch/__init__.py", line 202, in <module>
    from torch._C import *  # noqa: F403
ImportError: dlopen(/Users/.../venv/lib/python3.9/site-packages/torch/_C.cpython-39-darwin.so, 2): Library not loaded: @loader_path/../.dylibs/libomp.dylib
  Referenced from: /Users/.../venv/lib/python3.9/site-packages/torch/lib/libtorch_cpu.dylib
  Reason: no suitable image found.  Did find:
	/Users/.../venv/lib/python3.9/site-packages/torch/lib/../.dylibs/libomp.dylib: cannot load 'libomp.dylib' (load command 0x80000034 is unknown)
	/Users/.../venv/lib/python3.9/site-packages/torch/lib/../.dylibs/libomp.dylib: cannot load 'libomp.dylib' (load command 0x80000034 is unknown)

Environment

$ pip freeze
antlr4-python3-runtime==4.9.3
appdirs==1.4.4
argcomplete==2.0.0
audioread==2.1.9
beautifulsoup4==4.11.1
certifi==2022.6.15
cffi==1.15.1
charset-normalizer==2.1.0
cycler==0.11.0
Cython==0.29.30
decorator==5.1.1
fastdtw==0.3.4
filelock==3.7.1
fonttools==4.34.4
gdown==4.5.1
h5py==3.7.0
hydra-core==1.2.0
idna==3.3
joblib==1.1.0
kaldiio==2.17.2
kiwisolver==1.4.4
librosa==0.9.2
llvmlite==0.38.1
matplotlib==3.5.2
nnmnkwii==0.1.1
numba==0.55.2
numpy==1.22.4
omegaconf==2.2.2
packaging==21.3
parallel-wavegan==0.5.5
Pillow==9.2.0
pooch==1.6.0
protobuf==3.20.1
pycparser==2.21
pyopenjtalk==0.2.0
pyparsing==3.0.9
PySocks==1.7.1
pysptk==0.1.21
python-dateutil==2.8.2
pyworld==0.3.0
PyYAML==6.0
requests==2.28.1
resampy==0.3.1
scikit-learn==1.1.1
scipy==1.8.1
six==1.16.0
SoundFile==0.10.3.post1
soupsieve==2.3.2.post1
tensorboardX==2.5.1
threadpoolctl==3.1.0
toml==0.10.2
torch==1.12.0
tqdm==4.64.0
ttslearn==0.2.2
typing_extensions==4.3.0
urllib3==1.26.10
xmltodict==0.13.0
yq==3.0.2

Workaround

$ pip install -U 'torch<1.11'
$ python
>>> from ttslearn.dnntts import DNNTTS

Pinning the torch version looks like it would address this. https://github.com/r9y9/ttslearn/blob/v0.2.2/setup.py#L37

opened by ftnext 1

[Typo] "$" is missing from pp_symbols
On p. 307, code 10.3 has

if e3 == 0: PP.append("")

but this is probably a typo for

if e3 == 0: PP.append("$")

The code in the repository appears to append "$" correctly (for example, https://github.com/r9y9/ttslearn/blob/87c6c77fd3352cac7ae7c66a5493e5af3324ba21/ttslearn/tacotron/frontend/openjtalk.py#L131 ).
typo
opened by tomotakatakahashi 1
CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.
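The kind of check described above can be sketched as follows. This is a minimal illustration of the idea, not the patch submitted in the pull request: is_within_directory and safe_extractall are hypothetical names, and the check only looks at member paths, so it does not cover symbolic or hard links inside the archive.

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target, once normalized, lies inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar, path="."):
    """Extract all members of an open TarFile, refusing path traversal.

    Every member name is checked before anything is written, so a single
    bad member aborts the whole extraction.
    """
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Blocked path traversal in tar member: {member.name}")
    tar.extractall(path)
```

A typical call would be with tarfile.open("corpus.tar.gz") as tar: safe_extractall(tar, "downloads"), where both names are placeholders. Recent Python versions also offer extraction filters through the filter argument of extractall, which may be preferable where available.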

opened by TrellixVulnTeam 0
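How to replace the JSUT corpus

Hello, and thank you in advance. To create an original voice, I recorded the basic5000 utterances (up to BASIC5000_1900) and would like to substitute them for the original audio. How should I change

stage -1: Download the corpus

if is_colab(): ! ./run.sh --stage -1 --stop-stage -1

to do this? My environment is Google Colab.

Addendum: the audio up to BASIC5000_1900 was recorded as

WAV format

Stereo

32-bit

44,100 Hz

Is it all right to continue recording this way, or should I re-record?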

opened by Oryzae-O 0
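The notebook for Chapter 10, "Implementing a Japanese Tacotron-based speech synthesis system", fails with an error.

My name is Kouda.

In cell 32 of the Chapter 10 notebook, I get the error below. Note: I am running this on a Jetson Orin.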

/tmp/ipykernel_11866/3661236322.py:13: FutureWarning: Pass orig_sr=48000, target_sr=16000 as keyword args. From version 0.10 passing these as positional arguments will result in an error x = librosa.resample(x, _sr, sr)

ValueError Traceback (most recent call last) File ~/ttslearn/venv/lib/python3.8/site-packages/scipy/signal/windows/_windows.py:2214, in get_window(window, Nx, fftbins) 2213 try: -> 2214 beta = float(window) 2215 except (TypeError, ValueError) as e:

ValueError: could not convert string to float: 'hanning'

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last) File ~/ttslearn/venv/lib/python3.8/site-packages/scipy/signal/windows/_windows.py:2232, in get_window(window, Nx, fftbins) 2231 try: -> 2232 winfunc = _win_equiv[winstr] 2233 except KeyError as e:

KeyError: 'hanning'

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last) Cell In [32], line 15 12 x = (x / 32768).astype(np.float64) 13 x = librosa.resample(x, _sr, sr) ---> 15 out_feats = logmelspectrogram(x, sr) 17 # 冒頭と末尾の非音声区間の長さを調整 18 assert "sil" in labels.contexts[0] and "sil" in labels.contexts[-1]

File ~/ttslearn/venv/lib/python3.8/site-packages/ttslearn/dsp.py:310, in logmelspectrogram(y, sr, n_fft, hop_length, win_length, n_mels, fmin, fmax, clip) 307 if n_fft is None: 308 n_fft = next_power_of_2(win_length) --> 310 S = librosa.stft( 311 y, n_fft=n_fft, hop_length=hop_length, win_length=win_length, window="hanning" 312 ) 314 fmin = 0 if fmin is None else fmin 315 fmax = sr // 2 if fmax is None else fmax

File ~/ttslearn/venv/lib/python3.8/site-packages/librosa/util/decorators.py:88, in deprecate_positional_args.._inner_deprecate_positional_args..inner_f(*args, **kwargs) 86 extra_args = len(args) - len(all_args) 87 if extra_args <= 0: ---> 88 return f(*args, **kwargs) 90 # extra_args > 0 91 args_msg = [ 92 "{}={}".format(name, arg) 93 for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:]) 94 ]

File ~/ttslearn/venv/lib/python3.8/site-packages/librosa/core/spectrum.py:204, in stft(y, n_fft, hop_length, win_length, window, center, dtype, pad_mode) 201 # Check audio is valid 202 util.valid_audio(y, mono=False) --> 204 fft_window = get_window(window, win_length, fftbins=True) 206 # Pad the window out to n_fft size 207 fft_window = util.pad_center(fft_window, size=n_fft)

File ~/ttslearn/venv/lib/python3.8/site-packages/librosa/util/decorators.py:88, in deprecate_positional_args.._inner_deprecate_positional_args..inner_f(*args, **kwargs) 86 extra_args = len(args) - len(all_args) 87 if extra_args <= 0: ---> 88 return f(*args, **kwargs) 90 # extra_args > 0 91 args_msg = [ 92 "{}={}".format(name, arg) 93 for name, arg in zip(kwonly_args[:extra_args], args[-extra_args:]) 94 ]

File ~/ttslearn/venv/lib/python3.8/site-packages/librosa/filters.py:1185, in get_window(window, Nx, fftbins) 1180 return window(Nx) 1182 elif isinstance(window, (str, tuple)) or np.isscalar(window): 1183 # TODO: if we add custom window functions in librosa, call them here -> 1185 return scipy.signal.get_window(window, Nx, fftbins=fftbins) 1187 elif isinstance(window, (np.ndarray, list)): 1188 if len(window) == Nx:

File ~/ttslearn/venv/lib/python3.8/site-packages/scipy/signal/windows/_windows.py:2234, in get_window(window, Nx, fftbins) 2232 winfunc = _win_equiv[winstr] 2233 except KeyError as e: -> 2234 raise ValueError("Unknown window type.") from e 2236 if winfunc is dpss: 2237 params = (Nx,) + args + (None,)

ValueError: Unknown window type.

Is something wrong with my environment, or with the versions I installed?

Thank you in advance for your help.
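The traceback shows the installed SciPy rejecting the window name "hanning" that ttslearn/dsp.py passes to librosa.stft. One possible workaround, assuming the only problem is that this legacy alias is no longer recognized, is to translate it to "hann" before the call. The helper below is a hypothetical sketch, not part of ttslearn, librosa, or SciPy.

```python
# Hypothetical mapping from legacy window names to names SciPy's get_window accepts
LEGACY_WINDOW_NAMES = {"hanning": "hann"}


def normalize_window_name(window):
    """Return a window spec with legacy string aliases replaced.

    Non-string specs (tuples, arrays, callables) are passed through untouched.
    """
    if isinstance(window, str):
        return LEGACY_WINDOW_NAMES.get(window, window)
    return window


print(normalize_window_name("hanning"))
```

In the failing call this would amount to passing window=normalize_window_name("hanning"), or simply window="hann", to librosa.stft. Installing an older SciPy release that still accepts the alias may be another option. The FutureWarning at the top of the log is separate and can be silenced by calling librosa.resample(x, orig_sr=_sr, target_sr=sr) with keyword arguments, as the message itself suggests.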

opened by koudah 1
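Transfer learning with ttslearn

Hello, I am mule-engineer13. I am studying speech synthesis using "Pythonで学ぶ音声合成" as a reference, and I have one question.

Is it possible to do transfer learning with ttslearn's tacotron2 and WaveNet vocoder? If so, I would appreciate it if you could explain how. Thank you very much.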

opened by mule-engineer13 2
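Generating full-context label (.lab) files from text (.txt)

I am masahiroview.

Thank you for answering my earlier question about the cmake error. https://github.com/r9y9/ttslearn/issues/38 Thanks to your help, I succeeded in building an acoustic model with tacotron2 + WaveNet vocoder using the demo you provided.

Next, I would like to prepare my own audio (.wav) and text scripts (.txt) and run speech synthesis with them. My Python knowledge is limited, and I am struggling to write a program that automatically generates full-context label files (.lab) from text files (.txt). If there is any demo code, could you please share it?

I am sorry to trouble you when you are busy. Thank you in advance.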
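As a possible starting point for this request, pyopenjtalk (already a dependency of ttslearn) provides extract_fullcontext, which returns the full-context labels for a piece of Japanese text as a list of strings. The sketch below wraps that call and writes one label per line. The function names and the file layout are assumptions for illustration. Labels produced this way carry no time alignment, so they may not be a drop-in replacement for the aligned .lab files that the book's recipes read.

```python
from pathlib import Path


def text_to_fullcontext(text):
    """Return full-context labels for Japanese text as a list of strings."""
    import pyopenjtalk  # imported lazily so the file helpers work without it

    return pyopenjtalk.extract_fullcontext(text)


def write_lab(labels, path):
    """Write one label per line to path (hypothetical layout, no timestamps)."""
    Path(path).write_text("\n".join(labels) + "\n", encoding="utf-8")


def convert_txt_to_lab(txt_path, lab_path):
    """Read a UTF-8 text file and write its full-context labels next to it."""
    text = Path(txt_path).read_text(encoding="utf-8").strip()
    write_lab(text_to_fullcontext(text), lab_path)
```

For example, convert_txt_to_lab("utt001.txt", "utt001.lab") would process a single utterance; both file names are placeholders.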

opened by masahiroview 2
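Installing ttslearn

Hello, I am masahiroview. Using the book "Pythonで学ぶ音声合成" as a reference, I am working on setting up a machine learning environment for speech synthesis. I ran into a problem while trying to train with the demo, so I am contacting you.

https://r9y9.github.io/ttslearn/latest/notebooks/ch10_Recipe-Tacotron.html I tried to set up an environment based on this demo, but I am unable to install ttslearn.

● Environment: built on an AWS SageMaker notebook.

● Install command and error message: !pip install ttslearn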

エラーメッセージーーーーーーーーーーーーーーーーーー Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com/ Collecting ttslearn Using cached ttslearn-0.2.2.tar.gz (295 kB) Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... done Collecting hydra-core>=1.1.0 Using cached hydra_core-1.2.0-py3-none-any.whl (151 kB) Requirement already satisfied: tqdm in /home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages (from ttslearn) (4.62.3) Collecting pyopenjtalk>=0.1.0 Using cached pyopenjtalk-0.2.0.tar.gz (1.5 MB) Installing build dependencies ... done Getting requirements to build wheel ... error error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully. │ exit code: 1 ╰─> [23 lines of output] Traceback (most recent call last): File "/usr/local/bin/cmake", line 5, in from cmake import cmake ModuleNotFoundError: No module named 'cmake' Traceback (most recent call last): File "/home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in main() File "/home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main json_out['return_val'] = hook(**hook_input['kwargs']) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 130, in get_requires_for_build_wheel return hook(config_settings) File "/tmp/pip-build-env-slvgpo17/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel return self._get_build_requires( File "/tmp/pip-build-env-slvgpo17/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 143, in _get_build_requires self.run_setup() File "/tmp/pip-build-env-slvgpo17/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 267, in run_setup super(_BuildMetaLegacyBackend, File "/tmp/pip-build-env-slvgpo17/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 158, in run_setup exec(compile(code, file, 'exec'), locals()) File "setup.py", line 154, in File "/home/ec2-user/anaconda3/envs/python3/lib/python3.8/subprocess.py", line 448, in check_returncode raise CalledProcessError(self.returncode, self.args, self.stdout, subprocess.CalledProcessError: Command '['cmake', '..', '-DHTS_ENGINE_INCLUDE_DIR=.', '-DHTS_ENGINE_LIB=dummy']' returned non-zero exit status 1. [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip. error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully. │ exit code: 1 ╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip. WARNING: You are using pip version 22.0.4; however, version 22.2.1 is available. You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command. ーーーーーーーーーーーーーーーーーーーーーーーーーーーー

Following the error message, I tried updating cmake, but that did not help.

● Other information that may be relevant:

Commands: !python --version !gcc -v !cmake --version

以下、ログ出力ーーーーーーーーーーーーーーーーーー Python 3.8.12 Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/7/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-libmpx --enable-libsanitizer --enable-gnu-indirect-function --enable-libcilkrts --enable-libatomic --enable-libquadmath --enable-libitm --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux Thread model: posix gcc version 7.3.1 20180712 (Red Hat 7.3.1-13) (GCC) cmake version 3.22.3 CMake suite maintained and supported by Kitware (kitware.com/cmake). ーーーーーーーーーーーーーーーーーーーーーーーーー

This is my first time asking a question on GitHub, so I apologize for any shortcomings. I am sorry to trouble you when you are busy, and I would appreciate a reply.

opened by masahiroview 6

Releases(v0.2.2)

v0.2.2(Jan 4, 2022)
v0.2.2 <2022-01-04>

#30: Fix typo in WaveNetTTS docs

#26: Fix accent phrase border in pp_symbols

#22: Fix wrong wavenet loss calculation (addresses #21)

#20: Fix: JSUT is re-downloaded on every run (fix all run.sh scripts)

#19: Enable Windows CI

#17: Add conv1d test to ensure forward/incremental_forward correctness

#14: windows: use expanduser instead of os.environ["HOME"]

#13: Fix: JSUT is re-downloaded on every run

#12: Fix #10: typo in the stft function (book p. 82, code 4.9)

#11: Add warning for streamlit online demo

Source code(tar.gz)
Source code(zip)
recipes.zip(114.21 KB)
v0.2.1(Aug 21, 2021)
v0.2.1 <2021-08-21>

pretrained: add PWG TTS models for common voice (ja)

pretrained: add HiFi-GAN based TTS models using JVS and JSUT corpus

Add HiFi-GAN configs for JVS and JSUT extra recipes

#7: Add script to generate ground-truth aligned (GTA) features

#5: [docker] Push docker image to Docker Hub

#4: [docker] fix docker build fail because no 'gcc' command

#2: [extra_recipes] Fix the suffix of the script; s/sh/bash/

#1: Add common voice jp recipes

Source code(tar.gz)
Source code(zip)
multspk_tacotron2_hifipwg_jvs24k.tar.gz(144.62 MB)
multspk_tacotron2_pwg_cv16k.tar.gz(96.72 MB)
multspk_tacotron2_pwg_cv24k.tar.gz(96.72 MB)
recipes.zip(114.21 KB)
tacotron2_hifipwg_jsut24k.tar.gz(143.64 MB)
v0.2.0(Aug 11, 2021)

The official public release https://book.impress.co.jp/books/1120101073
Source code(tar.gz)
Source code(zip)
dnntts.tar.gz(1.00 MB)
multspk_tacotron2_pwg_jvs16k.tar.gz(97.30 MB)
multspk_tacotron2_pwg_jvs24k.tar.gz(97.32 MB)
recipes.zip(113.69 KB)
tacotron2.tar.gz(97.81 MB)
tacotron2_pwg_jsut16k.tar.gz(96.23 MB)
tacotron2_pwg_jsut24k.tar.gz(96.33 MB)
wavenettts.tar.gz(11.81 MB)

Owner

Ryuichi Yamamoto

Speech Synthesis, Voice Conversion, Machine Learning

GitHub Repository https://r9y9.github.io/ttslearn/
