Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)

Last update: Dec 28, 2022

Overview

ppg-vc

This repo implements several kinds of PPG-based VC models. Pretrained models are provided, and more models are on the way.

Notes:

The PPG model provided in conformer_ppg_model is based on a hybrid CTC-attention phoneme recognizer trained on LibriSpeech (960 hrs). PPGs have a frame shift of 10 ms and a dimensionality of 144. This model is very similar to the one used in this paper.
This repo uses HiFi-GAN V1 as the vocoder; the sampling rate of the synthesized audio is 24 kHz.
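
To make the numbers in the notes concrete, here is a small arithmetic sketch. It uses only the figures quoted above (10 ms frame shift, 144 dimensions, 24 kHz output); the function names are hypothetical and are not part of this repo, and edge padding of the real feature extractor is ignored.

```python
# Frame arithmetic for the figures quoted in the notes above.
# Function names are illustrative only; they do not exist in this repo.
PPG_FRAME_SHIFT_MS = 10      # from the notes
PPG_DIM = 144                # from the notes
VOCODER_SAMPLE_RATE = 24000  # from the notes (24 kHz synthesized audio)


def num_ppg_frames(duration_s: float) -> int:
    """Approximate number of PPG frames for an utterance (ignores edge padding)."""
    return round(duration_s * 1000) // PPG_FRAME_SHIFT_MS


def samples_per_frame(sample_rate: int = VOCODER_SAMPLE_RATE) -> int:
    """Audio samples covered by one frame shift at the given sampling rate."""
    return sample_rate * PPG_FRAME_SHIFT_MS // 1000


frames = num_ppg_frames(3.0)
print((frames, PPG_DIM))      # approximate PPG matrix shape for a 3-second utterance
print(samples_per_frame())    # samples per 10 ms at 24 kHz
```

Under these assumptions, a 3-second utterance yields roughly a 300 x 144 PPG matrix, and each 10 ms frame corresponds to 240 samples of 24 kHz audio.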

Highlights

Any-to-many VC
Any-to-any VC (a.k.a. few/one-shot VC)

How to use

Data preprocessing

Please run 1_compute_ctc_att_bnf.py to compute PPG features.
Please run 2_compute_f0.py to compute fundamental frequency.
Please run 3_compute_spk_dvecs.py to compute speaker d-vectors.
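
The three scripts above produce frame-level PPGs, frame-level F0, and an utterance-level speaker d-vector. The output formats of the scripts and the way the models combine these features are not described here, so the following is only a sketch of one common convention: truncate PPG and F0 to a shared length and represent pitch as log-F0 plus a voiced/unvoiced flag. The helper names and the toy data are hypothetical.

```python
import math


def log_f0_with_uv(f0_hz):
    """Map F0 values in Hz to (log-F0, voiced flag) pairs.

    Assumption: unvoiced frames are marked by F0 <= 0 and mapped to (0.0, 0.0).
    """
    return [(math.log(f), 1.0) if f > 0 else (0.0, 0.0) for f in f0_hz]


def align_features(ppg, f0_hz):
    """Truncate PPG and F0 sequences to their common length and pair them per frame."""
    n = min(len(ppg), len(f0_hz))
    pitch = log_f0_with_uv(f0_hz[:n])
    return [(ppg[t], pitch[t]) for t in range(n)]


# Toy data: 4 PPG frames (dimension shortened from 144 to 3 for readability)
# and 5 F0 values, so the extra F0 frame is dropped.
ppg = [[0.1, 0.7, 0.2]] * 4
f0 = [0.0, 110.0, 220.0, 0.0, 180.0]
frames = align_features(ppg, f0)
print(len(frames))
print(frames[0][1])
```

The d-vector is left out of the alignment because it is a single vector per speaker or utterance rather than a per-frame sequence.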

Training

Please refer to run.sh

Conversion

Please refer to test.sh

TODO

Upload pretrained models.

Citations

@ARTICLE{liu2021any,
  author={Liu, Songxiang and Cao, Yuewen and Wang, Disong and Wu, Xixin and Liu, Xunying and Meng, Helen},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling}, 
  year={2021},
  volume={29},
  number={},
  pages={1717-1728},
  doi={10.1109/TASLP.2021.3076867}
}

@inproceedings{Liu2018,
  author={Songxiang Liu and Jinghua Zhong and Lifa Sun and Xixin Wu and Xunying Liu and Helen Meng},
  title={Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={496--500},
  doi={10.21437/Interspeech.2018-1504},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1504}
}

Owner

Liu Songxiang
