This code is an unofficial implementation of HiFiSinger.

Overview

HiFiSinger

This code is an unofficial implementation of HiFiSinger. The algorithm is based on the following papers:

Chen, J., Tan, X., Luan, J., Qin, T., & Liu, T. Y. (2020). HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis. arXiv preprint arXiv:2009.01776.
Ren, Y., Ruan, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., & Liu, T. Y. (2019). Fastspeech: Fast, robust and controllable text to speech. Advances in Neural Information Processing Systems, 32, 3171-3180.
Yamamoto, R., Song, E., & Kim, J. M. (2020, May). Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6199-6203). IEEE.

Requirements

Please see the 'requirements.txt'.

Structure

Generator

  • In training, length regulator use target duration.

Discriminator

  • HiFiSinger uses Sub Frequency GAN(SF-GAN).
  • The frequency range of sampling is fixed and length range is randomized.

Used dataset

  • Code verification was conducted through a limited-sized, private Korean dataset.
  • Please report the information about any available open source dataset.
    • The set of midi files with syncronized lyric and high resolution vocal wave files

Hyper parameters

Before proceeding, please set the pattern, inference, and checkpoint paths in 'Hyper_Parameters.yaml' according to your environment.

  • Sound

    • Setting basic sound parameters.
  • Tokens

    • The number of Lyric token.
  • Max_Note

    • The highest note value for embedding.
  • Min/Max duration

    • Mel length which model use.
    • Min duration is used at pattern generating only.
  • Encoder

    • Setting the encoder.
  • Duration_Predictor

    • Setting for duration predictor
  • Decoder

    • Setting for decoder.
  • Discriminator

    • Setting for discriminator
    • In frequency range, frequency is the index of mel dimension.
      • The index must be equal or less than Sould.Mel_Dim.
  • Vocoder_Path

    • Setting the traced vocoder path.
    • To generate this, please check Here
  • Train

    • Setting the parameters of training.
  • Use_Mixed_Precision

  • Inference_Batch_Size

    • Setting the batch size when inference
  • Inference_Path

    • Setting the inference path
  • Checkpoint_Path

    • Setting the checkpoint path
  • Log_Path

    • Setting the tensorboard log path
  • Device

    • Setting which GPU device is used in multi-GPU enviornment.
    • Or, if using only CPU, please set '-1'. (But, I don't recommend while training.)

Generate pattern

  • There is no available open source dataset.

Inference file path while training for verification.

  • Inference_for_Training
    • There are two examples for inference.
    • It is midi file based script.

Run

Command

python Train.py -s 
  • -hp

    • The hyper paramter file path
    • This is required.
  • -s

    • The resume step parameter.
    • Default is 0.
Owner
Heejo You
Main focus: Psycholinguistics / Mechine learning / Deep learning
Heejo You
Implementation of the Swin Transformer in PyTorch.

Swin Transformer - PyTorch Implementation of the Swin Transformer architecture. This paper presents a new vision Transformer, called Swin Transformer,

597 Jan 03, 2023
Outlier Exposure with Confidence Control for Out-of-Distribution Detection

OOD-detection-using-OECC This repository contains the essential code for the paper Outlier Exposure with Confidence Control for Out-of-Distribution De

Nazim Shaikh 64 Nov 02, 2022
A PyTorch Lightning solution to training OpenAI's CLIP from scratch.

train-CLIP 📎 A PyTorch Lightning solution to training CLIP from scratch. Goal ⚽ Our aim is to create an easy to use Lightning implementation of OpenA

Cade Gordon 396 Dec 30, 2022
Single-Stage Instance Shadow Detection with Bidirectional Relation Learning (CVPR 2021 Oral)

Single-Stage Instance Shadow Detection with Bidirectional Relation Learning (CVPR 2021 Oral) Tianyu Wang*, Xiaowei Hu*, Chi-Wing Fu, and Pheng-Ann Hen

Steve Wong 51 Oct 20, 2022
Deep Image Search is an AI-based image search engine that includes deep transfor learning features Extraction and tree-based vectorized search.

Deep Image Search - AI-Based Image Search Engine Deep Image Search is an AI-based image search engine that includes deep transfer learning features Ex

139 Jan 01, 2023
LF-YOLO (Lighter and Faster YOLO) is used to detect defect of X-ray weld image.

This project is based on ultralytics/yolov3. LF-YOLO (Lighter and Faster YOLO) is used to detect defect of X-ray weld image. The related paper is avai

26 Dec 13, 2022
E2C implementation in PyTorch

Embed to Control implementation in PyTorch Paper can be found here: https://arxiv.org/abs/1506.07365 You will need a patched version of OpenAI Gym in

Yicheng Luo 42 Dec 12, 2022
PyGCL: A PyTorch Library for Graph Contrastive Learning

PyGCL is a PyTorch-based open-source Graph Contrastive Learning (GCL) library, which features modularized GCL components from published papers, standa

PyGCL 588 Dec 31, 2022
[ICLR'21] Counterfactual Generative Networks

This repository contains the code for the ICLR 2021 paper "Counterfactual Generative Networks" by Axel Sauer and Andreas Geiger. If you want to take the CGN for a spin and generate counterfactual ima

88 Jan 02, 2023
Permeability Prediction Via Multi Scale 3D CNN

Permeability-Prediction-Via-Multi-Scale-3D-CNN Data: The raw CT rock cores are obtained from the Imperial Colloge portal. The CT rock cores are sub-sa

Mohamed Elmorsy 2 Jul 06, 2022
C3DPO - Canonical 3D Pose Networks for Non-rigid Structure From Motion.

C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion By: David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedal

Meta Research 309 Dec 16, 2022
Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image

Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image This repository is an implementation of the method described in the following pap

21 Dec 15, 2022
Official code for paper Exemplar Based 3D Portrait Stylization.

3D-Portrait-Stylization This is the official code for the paper "Exemplar Based 3D Portrait Stylization". You can check the paper on our project websi

60 Dec 07, 2022
A repo that contains all the mesh keys needed for mesh backend, along with a code example of how to use them in python

Mesh-Keys A repo that contains all the mesh keys needed for mesh backend, along with a code example of how to use them in python Have been seeing alot

Joseph 53 Dec 13, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Microsoft 8.4k Jan 01, 2023
Checking fibonacci - Generating the Fibonacci sequence is a classic recursive problem

Fibonaaci Series Generating the Fibonacci sequence is a classic recursive proble

Moureen Caroline O 1 Feb 15, 2022
Noether Networks: meta-learning useful conserved quantities

Noether Networks: meta-learning useful conserved quantities This repository contains the code necessary to reproduce experiments from "Noether Network

Dylan Doblar 33 Nov 23, 2022
Code for Greedy Gradient Ensemble for Visual Question Answering (ICCV 2021, Oral)

Greedy Gradient Ensemble for De-biased VQA Code release for "Greedy Gradient Ensemble for Robust Visual Question Answering" (ICCV 2021, Oral). GGE can

21 Jun 29, 2022
Deep learning library featuring a higher-level API for TensorFlow.

TFLearn: Deep learning library featuring a higher-level API for TensorFlow. TFlearn is a modular and transparent deep learning library built on top of

TFLearn 9.6k Jan 02, 2023
Implementation of ICCV 2021 oral paper -- A Novel Self-Supervised Learning for Gaussian Mixture Model

SS-GMM Implementation of ICCV 2021 oral paper -- Self-Supervised Image Prior Learning with GMM from a Single Noisy Image with supplementary material R

HUST-The Tan Lab 4 Dec 05, 2022