Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe NHV in the future.

Overview

Fast (GAN Based Neural) Vocoder

Chinese README

Todo

  • Submit demo
  • Support NHV

Discription

Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe include NHV in the future. Developed on BiaoBei dataset, you can modify conf and hparams.py to fit your own dataset and model.

Usage

  • Prepare data
    • write path of wav data in a file, for example: cd dataset && python3 biaobei.py
    • bash preprocess.sh <wav path file> <path to save processed data> dataset/audio dataset/mel
    • for example: bash preprocess.sh dataset/BZNSYP.txt processed dataset/audio dataset/mel
  • Train
    • command:
    bash train.sh \
        <GPU ids> \
        /path/to/audio/train \
        /path/to/audio/valid \
        /path/to/mel/train \
        /path/to/mel/valid \
        <model name> \
        <if multi band> \
        <if use scheduler> \
        <path to configuration file>
    
    • for example:
    bash train.sh \
    0 \
    dataset/audio/train \
    dataset/audio/valid \
    dataset/mel/train \
    dataset/mel/valid \
    hifigan \
    0 0 0 \
    conf/hifigan/light.yaml
    
  • Train from checkpoint
    • command:
    bash train.sh \
        <GPU ids> \
        /path/to/audio/train \
        /path/to/audio/valid \
        /path/to/mel/train \
        /path/to/mel/valid \
        <model name> \
        <if multi band> \
        <if use scheduler> \
        <path to configuration file> \
        /path/to/checkpoint \
        <step of checkpoint>
    
  • Synthesize
    • command:
    bash synthesize.sh \
        /path/to/checkpoint \
        /path/to/mel \
        /path/for/saving/wav \
        <model name> \
        /path/to/configuration/file
    

Acknowledgments

Comments
  • why set the L=30 ?

    why set the L=30 ?

    hello,I have some question, in the paper ,the shape of basis matrix is [32, 256] , but in the code ,the shape is [30, 256] . And according to the function "overlap_and_add" , output_size = (frames - 1) * frame_step + frame_length, if the L=30, I think it cannot match the real wave length ? for example, hop_len=256, mel.shape=[80, 140] , theoretically the output wave length is 140*256=35840. according to the code, the output wave length is 33600.

    Thanks in advance.

    opened by yingfenging 3
  • Link to Basis-MelGAN paper?

    Link to Basis-MelGAN paper?

    Hi Zhengxi, congrats on your paper's acceptance on Interspeech 2021!

    I got pretty interested in your paper while reading the abstract of Basis-MelGAN on the README, but I could not find any link to the paper. Though the Interspeech conference is only 2 months away, don't you have any plans on publishing the paper on arXiv in near future?

    opened by seungwonpark 2
  • Random start index in WeightDataset

    Random start index in WeightDataset

    At this line: https://github.com/xcmyz/FastVocoder/blob/a9af370be896b1096e746ce6489fb16fef8ca585/data/dataset.py#L97

    If the input mel size smaller than fix-length, the random raise issue, I have try except to pass these short audios, but I just wonder it is handle in collate.

    More than that, the segment size as I found in hifigan is 32, but in basic-melgan it (fix-length) is set to 140. Are there any difference between the 140 for biaobei and the one for LJspeech

    opened by v-nhandt21 0
  • can basis-melgan  be used as  unversial vocoder?

    can basis-melgan be used as unversial vocoder?

    I tried it for a single speaker dataset, rtf surprises me. Have you ever use basis-melgan for a multi-speaker dataset, or is it suitable for unseen speaker tts synthesis?

    opened by mayfool 0
  • Shape mismatch error on new dataset

    Shape mismatch error on new dataset

    Hi, thanks for your work!

    The frame rate of my dataset is 22050, and hop size of text2mel model is 256. I have changed hparams.py accordingly, but training results in an expcetion: (preprocessing was fine, anyway)

      File "/home/user/speechlab/FastVocoder-main/model/loss/loss.py", line 23, in forward
        assert est_source_sub_band.size(1) == wav_sub_band.size(1)
    

    I figured out that model inference still uses hop-size of 240. So how to make your code fully compatible with other datasets? it seems that the codes are somehow hardcoded for Biaobei dataset.

    opened by tekinek 1
  • Multiband Architecture

    Multiband Architecture

    Hi author, I have found the notes as "the generated audio has interference at a specific frequency" in this repo. I have encountered with the straight line at a specific frequency when developing similar multiband architecture, and I wonder if such phenomenon is the one you mentioned? And do you have some advice or solutions? Thanks. audio

    help wanted 
    opened by Rongjiehuang 6
Owner
Zhengxi Liu (刘正曦)
Interested in high performance neural vocoder and expressive TTS acoustic model. Member of DeepMist and developed MistGPU.
Zhengxi Liu (刘正曦)
Nateve compiler developed with python.

Adam Adam is a Nateve Programming Language compiler developed using Python. Nateve Nateve is a new general domain programming language open source ins

Nateve 7 Jan 15, 2022
Multilingual Emotion classification using BERT (fine-tuning). Published at the WASSA workshop (ACL2022).

XLM-EMO: Multilingual Emotion Prediction in Social Media Text Abstract Detecting emotion in text allows social and computational scientists to study h

MilaNLP 35 Sep 17, 2022
Auto translate textbox from Japanese to English or Indonesia

priconne-auto-translate Auto translate textbox from Japanese to English or Indonesia How to use Install python first, Anaconda is recommended Install

Aji Priyo Wibowo 5 Aug 25, 2022
The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Kay Savetz 60 Dec 25, 2022
Machine translation models released by the Gourmet project

Gourmet Models Overview The Gourmet project has released several machine translation models to translate low-resource languages. This repository conta

Edinburgh NLP 5 Dec 08, 2021
Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products.

Leah Pathan Khan 2 Jan 12, 2022
CDLA: A Chinese document layout analysis (CDLA) dataset

CDLA: A Chinese document layout analysis (CDLA) dataset 介绍 CDLA是一个中文文档版面分析数据集,面向中文文献类(论文)场景。包含以下10个label: 正文 标题 图片 图片标题 表格 表格标题 页眉 页脚 注释 公式 Text Title

buptlihang 84 Dec 28, 2022
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language mod

20.5k Jan 08, 2023
CCKS-Title-based-large-scale-commodity-entity-retrieval-top1

- 基于标题的大规模商品实体检索top1 一、任务介绍 CCKS 2020:基于标题的大规模商品实体检索,任务为对于给定的一个商品标题,参赛系统需要匹配到该标题在给定商品库中的对应商品实体。 输入:输入文件包括若干行商品标题。 输出:输出文本每一行包括此标题对应的商品实体,即给定知识库中商品 ID,

43 Nov 11, 2022
💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes Official PyTorch implementation and EmoCause evaluatio

Hyunwoo Kim 50 Dec 21, 2022
State of the art faster Natural Language Processing in Tensorflow 2.0 .

tf-transformers: faster and easier state-of-the-art NLP in TensorFlow 2.0 ****************************************************************************

74 Dec 05, 2022
Simple virtual assistant using pyttsx3 and speech recognition optionally with pywhatkit and pther libraries.

VirtualAssistant Simple virtual assistant using pyttsx3 and speech recognition optionally with pywhatkit and pther libraries. Third Party Libraries us

Logadheep 1 Nov 27, 2021
A Structured Self-attentive Sentence Embedding

Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR

Kaushal Shetty 488 Nov 28, 2022
KoBERTopic은 BERTopic을 한국어 데이터에 적용할 수 있도록 토크나이저와 BERT를 수정한 코드입니다.

KoBERTopic 모델 소개 KoBERTopic은 BERTopic을 한국어 데이터에 적용할 수 있도록 토크나이저와 BERT를 수정했습니다. 기존 BERTopic : https://github.com/MaartenGr/BERTopic/tree/05a6790b21009d

Won Joon Yoo 26 Jan 03, 2023
✔👉A Centralized WebApp to Ensure Road Safety by checking on with the activities of the driver and activating label generator using NLP.

AI-For-Road-Safety Challenge hosted by Omdena Hyderabad Chapter Original Repo Link : https://github.com/OmdenaAI/omdena-india-roadsafety Final Present

Prathima Kadari 7 Nov 29, 2022
Code for the paper "Flexible Generation of Natural Language Deductions"

Code for the paper "Flexible Generation of Natural Language Deductions"

Kaj Bostrom 12 Nov 11, 2022
Yet another Python binding for fastText

pyfasttext Warning! pyfasttext is no longer maintained: use the official Python binding from the fastText repository: https://github.com/facebookresea

Vincent Rasneur 230 Nov 16, 2022
Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

Translators - is a library which aims to bring free, multiple, enjoyable translation to individuals and students in Python

UlionTse 907 Dec 27, 2022
Basic yet complete Machine Learning pipeline for NLP tasks

Basic yet complete Machine Learning pipeline for NLP tasks This repository accompanies the article on building basic yet complete ML pipelines for sol

Ivan 20 Aug 22, 2022
Natural Language Processing at EDHEC, 2022

Natural Language Processing Here you will find the teaching materials for the "Natural Language Processing" course at EDHEC Business School, 2022 What

1 Feb 04, 2022