Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Last update: Dec 07, 2022

Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain.

This repo is the official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Audio samples are available here.

Dependencies

python 3.6.0
pytorch 1.4.0
pyyaml 5.4.1
numpy 1.19.5
librosa 0.8.0
soundfile 0.10.2
tensorboardX 2.1

Preprocess

What you need to prepare before running this project, and how to prepare it

We use ParallelWaveGAN as our vocoder and VCTK as our dataset.
To run this project, first install ParallelWaveGAN by following the instructions in the ParallelWaveGAN project.
Then prepare all the mel-spectrogram data the same way ParallelWaveGAN does.
Prepare the speaker_used.json file yourself, following the examples in ./data/80_train_speaker_used.json and ./data/fine_tune_speaker_used.json.
Prepare the feats.scp file by running ./convert_decode/convert_mel/get_scp.py.

We assume that your prepared mel-spectrograms are organized in a directory tree like this:

├── p225
│   ├── p225_001-feats.npy
│   ├── p225_004-feats.npy
│   ├── p225_005-feats.npy
│   ......
├── p226
│   ├── p226_001-feats.npy
│   ├── p226_003-feats.npy
│   ├── p226_004-feats.npy
│   ......
├── p227
│   ......
├── p228
│   ......
│   ...
│   ...
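As a rough illustration of how a directory tree like the one above could be turned into a listing file, the sketch below walks the speaker folders and writes one "utterance_id path" line per mel-spectrogram. This is not the repository's get_scp.py: the function names collect_feats and write_scp are hypothetical, and the Kaldi-style "id path" line format is an assumption, so check get_scp.py for the format the project actually expects.

```python
from pathlib import Path

SUFFIX = "-feats.npy"  # file naming taken from the tree above


def collect_feats(root):
    """Return sorted (utterance_id, path) pairs for root/<speaker>/*-feats.npy."""
    entries = []
    for speaker_dir in sorted(Path(root).iterdir()):
        if not speaker_dir.is_dir():
            continue  # skip stray files next to the speaker folders
        for npy in sorted(speaker_dir.glob("*" + SUFFIX)):
            utt_id = npy.name[: -len(SUFFIX)]  # p225_001-feats.npy -> p225_001
            entries.append((utt_id, str(npy)))
    return entries


def write_scp(entries, scp_path):
    """Write one 'utterance_id path' line per entry (assumed format)."""
    with open(scp_path, "w") as f:
        for utt_id, path in entries:
            f.write(f"{utt_id} {path}\n")
```

For example, write_scp(collect_feats("path/to/mels"), "feats.scp") would list every utterance found under the hypothetical folder path/to/mels.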

Training

Run the pretraining stage with bash run_main.sh. We use 80 speakers from the VCTK dataset, with all utterances from each speaker.

Fine Tuning

Run the fine-tuning stage with bash run_fine_tune.sh. We use 10 other speakers from the VCTK dataset, with only 1 utterance from each speaker.
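The one-utterance-per-speaker selection described above can be sketched as follows. This is only an illustration, not the logic inside run_fine_tune.sh: the function name pick_one_shot is hypothetical, the speaker ID is assumed to be the part of the utterance ID before the first underscore (as in p225_001), and taking the alphabetically first utterance is an arbitrary choice.

```python
def pick_one_shot(utt_ids, speakers):
    """Map each requested speaker to a single utterance ID.

    Assumes utterance IDs look like '<speaker>_<number>', e.g. 'p225_001'.
    The alphabetically first utterance of each speaker is kept.
    """
    chosen = {}
    for utt_id in sorted(utt_ids):
        speaker = utt_id.split("_")[0]
        if speaker in speakers and speaker not in chosen:
            chosen[speaker] = utt_id
    return chosen
```

Speakers that have no utterance in utt_ids are simply absent from the result, so it is worth comparing the returned keys against the requested speaker set.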

Inference

$ cd convert_decode/convert_mel
$ bash run_convert.sh

We generate one-shot voice conversion utterances between the 10 one-shot speakers, using their other, unseen utterances to perform the conversion.
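To make the scale of this evaluation concrete, the sketch below enumerates conversion pairs among 10 speakers. It assumes that every ordered (source, target) pair of distinct speakers is converted, which the text does not state explicitly, and the speaker names are placeholders rather than the actual VCTK IDs used.

```python
from itertools import permutations


def conversion_pairs(speakers):
    """Return every ordered (source, target) pair of distinct speakers."""
    return list(permutations(speakers, 2))


# Placeholder names; the real fine-tuning speakers are listed in
# ./data/fine_tune_speaker_used.json.
speakers = [f"spk{i}" for i in range(10)]
pairs = conversion_pairs(speakers)
print(len(pairs))  # 90 ordered pairs for 10 speakers
```

Under this assumption, each source speaker is converted to 9 targets, so 10 speakers yield 90 conversion directions.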
