Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

Last update: Dec 20, 2022

Overview

StrengthNet

https://arxiv.org/abs/2110.03156

Dependencies

Ubuntu 18.04.5 LTS

GPU: Quadro RTX 6000
Driver version: 450.80.02
CUDA version: 11.0

Python 3.5

tensorflow-gpu 2.0.0b1 (cudnn=7.6.0)
scipy
pandas
matplotlib
librosa

Environment set-up

For example,

conda create -n strengthnet python=3.5
conda activate strengthnet
pip install -r requirements.txt
conda install cudnn=7.6.0
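Before training, it can help to confirm that the packages listed under the dependencies can be imported from the new environment. The snippet below is a small sketch and is not part of the repository. The function name `missing_modules` is hypothetical. It assumes each package's import name matches the names listed above; `tensorflow-gpu` installs the `tensorflow` module. It avoids f-strings so that it also runs on Python 3.5.

```python
import importlib.util
import sys

# Import names of the packages listed in this README (assumed to match the pip names,
# except tensorflow-gpu, which provides the "tensorflow" module).
REQUIRED_MODULES = ("tensorflow", "scipy", "pandas", "matplotlib", "librosa")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python " + sys.version.split()[0])
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Run this inside the activated `strengthnet` environment. If anything is reported as missing, `pip install -r requirements.txt` probably did not complete. To check that TensorFlow 2.0 can see the GPU, call `tf.test.is_gpu_available()` from the same environment.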

Usage

Run python utils.py to extract features from the .wav files into .h5 files;
Run python train.py to train the CNN-BLSTM-based StrengthNet.
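The two steps above can be illustrated with a minimal sketch based on MOSNet, which is acknowledged below. It assumes MOSNet-style inputs: one 257-bin magnitude spectrogram per utterance (512-point FFT, hop of 256 samples, 16 kHz audio), stored in an .h5 file. It also assumes a MOSNet-style training objective. That objective adds an utterance-level squared error to a frame-level term that pulls every frame prediction toward the utterance target.

This README does not state the actual feature settings, .h5 layout, or loss used by utils.py and train.py, and they may differ. The function names, the `mag_sgram` dataset key and the `frame_weight` parameter are all hypothetical. `librosa` and `h5py` are imported only inside the file-handling helper, so the numeric parts need only NumPy.

```python
import numpy as np


def magnitude_spectrogram(signal, n_fft=512, hop_length=256):
    """Return a (frames, n_fft // 2 + 1) magnitude spectrogram computed with a Hann window.

    n_fft and hop_length are assumed MOSNet-style values, not confirmed for utils.py.
    """
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) < n_fft:
        # Zero-pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, n_fft - len(signal)), mode="constant")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length:i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).astype(np.float32)


def extract_to_h5(wav_path, h5_path, sr=16000):
    """Hypothetical helper: load one .wav file and store its spectrogram in an .h5 file."""
    import h5py     # assumed to be installed (e.g. alongside TensorFlow/Keras)
    import librosa  # listed as a dependency in this README

    signal, _ = librosa.load(wav_path, sr=sr)
    with h5py.File(h5_path, "w") as f:
        # The dataset key is hypothetical; the real utils.py may use another layout.
        f.create_dataset("mag_sgram", data=magnitude_spectrogram(signal))


def utterance_frame_loss(frame_scores, target, frame_weight=1.0):
    """MOSNet-style objective for one utterance (assumed, not confirmed for StrengthNet).

    The utterance score is the mean of the frame-level predictions. The loss is its
    squared error plus frame_weight times the mean squared error of each frame
    prediction against the same utterance-level target.
    """
    frame_scores = np.asarray(frame_scores, dtype=np.float64)
    utterance_term = (frame_scores.mean() - target) ** 2
    frame_term = np.mean((frame_scores - target) ** 2)
    return utterance_term + frame_weight * frame_term


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 s, 1 kHz test tone
    print(magnitude_spectrogram(tone).shape)  # (61, 257)
    print(utterance_frame_loss([1.0, 2.0, 3.0], target=2.0))
```

Averaging frame scores into one utterance score lets a BLSTM emit a prediction per frame while being trained against a single strength label per file. The frame-level term discourages individual frames from drifting far from that label.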

Evaluating new samples

Put the waveforms you wish to evaluate in a folder. For example, / /
Run python test.py --rootdir / /

This script evaluates all the .wav files in / / and writes the results to / / /StrengthNet_result_raw.txt.

By default, the output/strengthnet.h5 pretrained model is used.
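For batch evaluation, a few standard-library helpers can handle the input folder and the result file. This is a hedged sketch, and all function names are hypothetical. The README does not document the layout of StrengthNet_result_raw.txt, so `parse_scores` assumes each line holds a file name followed by a score, separated by whitespace, and skips anything else. `find_wav_files` assumes that only the top level of the folder is scanned.

```python
import os


def find_wav_files(rootdir):
    """Return sorted paths of the .wav files directly inside rootdir (non-recursive)."""
    return sorted(
        os.path.join(rootdir, name)
        for name in os.listdir(rootdir)
        if name.lower().endswith(".wav")
    )


def parse_scores(lines):
    """Map file name -> score from lines of the assumed form '<name> <score>'."""
    scores = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or incomplete line
        try:
            scores[parts[0]] = float(parts[-1])
        except ValueError:
            continue  # e.g. a header line without a numeric score
    return scores


def read_scores(result_path):
    """Read the result file written by test.py (format assumed, see parse_scores)."""
    with open(result_path) as f:
        return parse_scores(f)
```

For example, `parse_scores(["a.wav 0.42", "b.wav 0.90"])` returns `{'a.wav': 0.42, 'b.wav': 0.9}`. If the real file uses a different layout, only `parse_scores` needs to change.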

Citation

If you find this work useful in your research, please consider citing:

@misc{liu2021strengthnet,
      title={StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis}, 
      author={Rui Liu and Berrak Sisman and Haizhou Li},
      year={2021},
      eprint={2110.03156},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Resources

The ESD corpus is released by the HLT lab, NUS, Singapore.

The strength scores for the English samples of the ESD corpus are available here.

Acknowledgements:

MOSNet: https://github.com/lochenchou/MOSNet

Relative Attributes

License

This work is released under the MIT License (see the LICENSE file for details).

Owner

RuiLiu