Voice Conversion by CycleGAN (Voice Cloning/Voice Conversion): CycleGAN-VC3

Last update: Dec 24, 2022

Overview

CycleGAN-VC3-PyTorch

This code is a PyTorch implementation of the paper CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion, a notable work on voice conversion and voice cloning.

CycleGAN-VC3

Project Page

Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, CycleGAN-VC [3] and CycleGAN-VC2 [2] have shown promising results regarding this problem and have been widely used as benchmark methods. However, owing to the ambiguity of the effectiveness of CycleGAN-VC/VC2 for mel-spectrogram conversion, they are typically used for mel-cepstrum conversion even when comparative methods employ mel-spectrogram as a conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we discovered that their direct applications compromised the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates time-frequency adaptive normalization (TFAN). Using TFAN, we can adjust the scale and bias of the converted features while reflecting the time-frequency structure of the source mel-spectrogram. We evaluated CycleGAN-VC3 on inter-gender and intra-gender non-parallel VC. A subjective evaluation of naturalness and similarity showed that for every VC pair, CycleGAN-VC3 outperforms or is competitive with the two types of CycleGAN-VC2, one of which was applied to mel-cepstrum and the other to mel-spectrogram.

Figure 1. We developed time-frequency adaptive normalization (TFAN), which extends instance normalization [5] so that the affine parameters become element-dependent and are determined according to an entire input mel-spectrogram.
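The TFAN idea in Figure 1 can be sketched as follows. This is a minimal NumPy forward pass written under our own assumptions, not the repository's TFAN module. Features `x` of shape (channels, frequency, time) are instance-normalized per channel. The source mel-spectrogram is then resized to the feature map's size with nearest-neighbour indexing, and a small two-layer 3x3 convolutional network turns it into an element-wise scale γ and bias β. The resizing method, hidden width, kernel size, and random initialization are illustrative choices, and `tfan`, `init_params`, and the helper functions are hypothetical names.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    # x: (C, H, W); normalize each channel over its own time-frequency plane
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def resize_nearest(s, h, w):
    # s: (F, T) source mel-spectrogram -> (h, w) by nearest-neighbour indexing
    f, t = s.shape
    rows = np.arange(h) * f // h
    cols = np.arange(w) * t // w
    return s[rows][:, cols]


def conv3x3(inp, weight, bias):
    # 'same'-padded 3x3 cross-correlation; inp (Cin, H, W), weight (Cout, Cin, 3, 3)
    _, h, w = inp.shape
    padded = np.pad(inp, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], padded[:, i:i + h, j:j + w])
    return out + bias[:, None, None]


def tfan(x, src_mel, params):
    # Element-wise gamma/beta are computed from the (resized) source mel-spectrogram
    _, h, w = x.shape
    s = resize_nearest(src_mel, h, w)[None]  # (1, H, W)
    hidden = np.maximum(conv3x3(s, params["w_hidden"], params["b_hidden"]), 0.0)  # ReLU
    gamma = conv3x3(hidden, params["w_gamma"], params["b_gamma"])  # (C, H, W)
    beta = conv3x3(hidden, params["w_beta"], params["b_beta"])     # (C, H, W)
    return gamma * instance_norm(x) + beta


def init_params(channels, hidden=16, seed=0):
    # Hypothetical random initialization; gamma bias starts at 1, beta bias at 0
    rng = np.random.default_rng(seed)
    return {
        "w_hidden": rng.normal(0.0, 0.1, (hidden, 1, 3, 3)), "b_hidden": np.zeros(hidden),
        "w_gamma": rng.normal(0.0, 0.1, (channels, hidden, 3, 3)), "b_gamma": np.ones(channels),
        "w_beta": rng.normal(0.0, 0.1, (channels, hidden, 3, 3)), "b_beta": np.zeros(channels),
    }


rng = np.random.default_rng(1)
x = rng.normal(size=(8, 20, 32))    # hypothetical intermediate generator features
mel = rng.normal(size=(80, 128))    # hypothetical 80-bin, 128-frame source mel
y = tfan(x, mel, init_params(8))
print(y.shape)  # (8, 20, 32)
```

Because γ and β vary per channel, frequency bin, and time frame, the output carries a time-frequency pattern derived from the source mel-spectrogram. Plain instance normalization with learned affine parameters has only one scale and one bias per channel and cannot do this.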

This repository contains:

TFAN module code, which implements the TFAN module.
model code, which implements the model network.
an audio preprocessing script, which you can use to create a cache of training data.
training scripts, which train the model.
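The training scripts optimize a CycleGAN-style objective. As a rough illustration (not the repository's training code), the generator objective below combines the three losses used by CycleGAN-VC: an adversarial term, a cycle-consistency term, and an identity-mapping term. It is written in NumPy for readability, whereas real training would use PyTorch tensors so gradients can flow. The least-squares adversarial form, the weights `lambda_cyc=10` and `lambda_id=5`, the toy generators and discriminator, and all function names are assumptions. CycleGAN-VC2's additional two-step adversarial loss is omitted.

```python
import numpy as np


def lsgan_generator_loss(d_fake):
    # Generator wants the discriminator to score converted features as real (1)
    return np.mean((d_fake - 1.0) ** 2)


def lsgan_discriminator_loss(d_real, d_fake):
    # Discriminator pushes real scores towards 1 and converted scores towards 0
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)


def l1(a, b):
    return np.mean(np.abs(a - b))


def generator_objective(x_a, x_b, g_ab, g_ba, d_a, d_b, lambda_cyc=10.0, lambda_id=5.0):
    fake_b, fake_a = g_ab(x_a), g_ba(x_b)
    adv = lsgan_generator_loss(d_b(fake_b)) + lsgan_generator_loss(d_a(fake_a))
    # Cycle consistency: A -> B -> A and B -> A -> B should reconstruct the inputs
    cyc = l1(g_ba(fake_b), x_a) + l1(g_ab(fake_a), x_b)
    # Identity mapping: a sample already in the target domain should stay unchanged
    idt = l1(g_ab(x_b), x_b) + l1(g_ba(x_a), x_a)
    total = adv + lambda_cyc * cyc + lambda_id * idt
    return total, {"adv": adv, "cyc": cyc, "id": idt}


# Toy usage: stand-in "generators" and a "discriminator" (all hypothetical)
x_a = np.zeros((80, 64))  # speaker A: 80 mel bins x 64 frames
x_b = np.ones((80, 64))   # speaker B
g_ab = lambda m: m + 1.0
g_ba = lambda m: m - 1.0
d = lambda m: m.mean(axis=0)  # one score per frame
total, parts = generator_objective(x_a, x_b, g_ab, g_ba, d, d)
print(float(total))  # 11.0
```

The cycle-consistency term is what makes non-parallel training possible: the objective never compares an aligned A/B utterance pair, only each input with its own reconstruction.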


Requirement

pip install -r requirements.txt

Usage

Reference

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion. Paper, Project
CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion. Paper, Project
Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. Paper, Project
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Paper, Project, Code
Image-to-Image Translation with Conditional Adversarial Nets. Paper, Project, Code

Donation

If this project helps you reduce development time, you can buy me a cup of coffee :)

Alipay

WeChat Pay


Owner

Kun Ma
