Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

Related tags

Deep LearningLDNet
Overview

LDNet

Author: Wen-Chin Huang (Nagoya University) Email: [email protected]

This is the official implementation of the paper "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech". This is a model that takes an input synthetic speech sample and outputs the simulated human rating.

Results

Usage

Currently we support only the VCC2018 dataset. We plan to release the BVCC dataset in the near future.

Requirements

  • PyTorch 1.9 (versions not too old should be fine.)
  • librosa
  • pandas
  • h5py
  • scipy
  • matplotlib
  • tqdm

Data preparation

# Download the VCC2018 dataset.
cd data
./download.sh vcc2018

Training

We provide configs that correspond to the following rows in the above figure:

  • (a): MBNet.yaml
  • (d): LDNet_MobileNetV3_RNN_5e-3.yaml
  • (e): LDNet_MobileNetV3_FFN_1e-3.yaml
  • (f): LDNet-MN_MobileNetV3_RNN_FFN_1e-3_lamb4.yaml
  • (g): LDNet-ML_MobileNetV3_FFN_1e-3.yaml
python train.py --config configs/<config_name> --tag <tag_name>

By default, the experimental results will be stored in exp/<tag_name>, including:

  • model-<steps>.pt: model checkpoints.
  • config.yml: the config file.
  • idtable.pkl: the dictionary that maps listener to ID.
  • training_<inference_mode>: the validation results generated along the training. This file is useful for model selection. Note that the inference_mode in the config file decides what mode is used during validation in the training.

There are some arguments that can be changed:

  • --exp_dir: The directory for storing the experimental results.
  • --data_dir: The data directory. Default is data/vcc2018.
  • seed: random seed.
  • update_freq: This is very important. See below.

Batch size and update_freq

By default, all LDNet models are trained with a batch size of 60. In my experiments, I used a single NVIDIA GeForce RTX 3090 with 24GB mdemory for training. I cannot fit the whole model in the GPU, so I accumulate gradients for update_freq forward passes and do one backward update. Before training, please check the train_batch_size in the config file, and set update_freq properly. For instance, in configs/LDNet_MobileNetV3_FFN_1e-3.yaml the train_batch_size is 20, so update_freq should be set to 3.

Inference

python inference.py --tag LDNet-ML_MobileNetV3_FFN_1e-3 --mode mean_listener

Use mode to specify which inference mode to use. Choices are: mean_net, all_listeners and mean_listener. By default, all checkpoints in the exp directory will be evaluated.

There are some arguments that can be changed:

  • ep: if you want to evaluate one model checkpoint, say, model-10000.pt, then simply pass --ep 10000.
  • start_ep: if you want to evaluate model checkpoints after a certain steps, say, 10000 steps later, then simply pass --start_ep 10000.

There are some files you can inspect after the evaluation:

  • <dataset_name>_<inference_mode>.csv: the validation and test set results.
  • <dataset_name>_<inference_mode>_<test/valid>/: figures that visualize the prediction distributions, including;
    • <ep>_distribution.png: distribution over the score range (1-5).
    • <ep>_utt_scatter_plot_utt: utterance-wise scatter plot of the ground truth and the predicted scores.
    • <ep>_sys_scatter_plot_utt: system-wise scatter plot of the ground truth and the predicted scores.

Acknowledgement

This repository inherits from this great unofficial MBNet implementation.

Citation

If you find this recipe useful, please consider citing following paper:

@article{huang2021ldnet,
  title={LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech},
  author={Huang, Wen-Chin and Cooper, Erica and Yamagishi, Junichi and Toda, Tomoki},
  journal={arXiv preprint arXiv:2110.09103},
  year={2021}
}
Owner
Wen-Chin Huang (unilight)
Ph.D. candidate at Nagoya University, Japan. M.S. @ Nagoya University. B.S. @ National Taiwan University. RA at IIS, Academia Sinica, Taiwan.
Wen-Chin Huang (unilight)
Codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Contrast and Mix (CoMix) The repository contains the codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Backgroun

Computer Vision and Intelligence Research (CVIR) 13 Dec 10, 2022
Publication describing 3 ML examples at NSLS-II and interfacing into Bluesky

Machine learning enabling high-throughput and remote operations at large-scale user facilities. Overview This repository contains the source code and

BNL 4 Sep 24, 2022
[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

CC 4.4k Dec 27, 2022
Info and sample codes for "NTU RGB+D Action Recognition Dataset"

"NTU RGB+D" Action Recognition Dataset "NTU RGB+D 120" Action Recognition Dataset "NTU RGB+D" is a large-scale dataset for human action recognition. I

Amir Shahroudy 578 Dec 30, 2022
Semantic segmentation models, datasets and losses implemented in PyTorch.

Semantic Segmentation in PyTorch Semantic Segmentation in PyTorch Requirements Main Features Models Datasets Losses Learning rate schedulers Data augm

Yassine 1.3k Jan 07, 2023
NLP made easy

GluonNLP: Your Choice of Deep Learning for NLP GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that helps you l

Distributed (Deep) Machine Learning Community 2.5k Jan 04, 2023
[ICLR 2021] "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" by Wuyang Chen, Xinyu Gong, Zhangyang Wang

Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [PDF] Wuyang Chen, Xinyu Gong, Zhangyang Wang In ICLR 2

VITA 156 Nov 28, 2022
Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA)

Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA). Master's thesis documents. Bibliography, experiments and reports.

Erick Cobos 73 Dec 04, 2022
Convex optimization for fun and profit.

CFMM Optimal Routing This repository contains the code needed to generate the figures used in the paper Optimal Routing for Constant Function Market M

Guillermo Angeris 183 Dec 29, 2022
Unofficial TensorFlow implementation of Protein Interface Prediction using Graph Convolutional Networks.

[TensorFlow] Protein Interface Prediction using Graph Convolutional Networks Unofficial TensorFlow implementation of Protein Interface Prediction usin

YeongHyeon Park 9 Oct 25, 2022
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation This project hosts the code for implementing the DCT-MASK algorithms

Alibaba Cloud 57 Nov 27, 2022
PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi

PIKA: a lightweight speech processing toolkit based on Pytorch and (Py)Kaldi PIKA is a lightweight speech processing toolkit based on Pytorch and (Py)

336 Nov 25, 2022
Official PyTorch Implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning (NeurIPS 2021 Spotlight)

[NeurIPS 2021 Spotlight] HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning [Paper] This is Official PyTorch implementatio

42 Nov 01, 2022
Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The original code is written in keras.

CasRel-pytorch-reimplement Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The o

longlongman 170 Dec 01, 2022
Scalable Optical Flow-based Image Montaging and Alignment

SOFIMA SOFIMA (Scalable Optical Flow-based Image Montaging and Alignment) is a tool for stitching, aligning and warping large 2d, 3d and 4d microscopy

Google Research 16 Dec 21, 2022
NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

NCVX NCVX: A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning. Please check https://ncvx.org for detailed instruction

SUN Group @ UMN 28 Aug 03, 2022
Code for ECIR'20 paper Diagnosing BERT with Retrieval Heuristics

Bert Axioms This is the repository with the code for the Paper Diagnosing BERT with Retrieval Heuristics Required Data In order to run this code, you

Arthur Câmara 5 Jan 21, 2022
Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision

MLP-Mixer: An all-MLP Architecture for Vision This repo contains PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision. Usage : impo

Rishikesh (ऋषिकेश) 175 Dec 23, 2022
Gradient Inversion with Generative Image Prior

Gradient Inversion with Generative Image Prior This repository is an implementation of "Gradient Inversion with Generative Image Prior", accepted to N

MLLab @ Postech 25 Jan 09, 2023
A python library for face detection and features extraction based on mediapipe library

FaceAnalyzer A python library for face detection and features extraction based on mediapipe library Introduction FaceAnalyzer is a library based on me

Saifeddine ALOUI 14 Dec 30, 2022