Evaluation and Benchmarking of Speech Super-resolution Methods

Overview

Speech Super-resolution Evaluation and Benchmarking

What this repo do:

  • A toolbox for the evaluation of speech super-resolution algorithms.
  • Unify the evaluation pipline of speech super-resolution algorithms for a easier comparison between different systems.
  • Benchmarking speech super-resolution methods (pull request is welcome). Encouraging reproducible research.

I build this repo while I'm writing my paper for INTERSPEECH 2022: Neural Vocoder is All You Need for Speech Super-resolution. The model mentioned in this paper, NVSR, will also be open-sourced here.

Installation

Install via pip:

pip3 install ssr_eval

Please make sure you have already installed sox.

Quick Example

A basic example: Evaluate on a system that do nothing:

from ssr_eval import test 
test()
  • The evaluation result json file will be stored in the ./results directory: Example file
  • The code will automatically handle stuffs like downloading test sets.
  • You will find a field "averaged" at the bottom of the json file that looks like below. This field mark the performance of the system.
"averaged": {
        "proc_fft_24000_44100": {
            "lsd": 5.152331300436993,
            "log_sispec": 5.8051057146229095,
            "sispec": 30.23394207533686,
            "ssim": 0.8484425044157442
        }
    }

Here we report four metrics:

  1. Log spectral distance(LSD).
  2. Log scale invariant spectral distance [1] (log-sispec).
  3. Scale invariant spectral distance [1] (sispec).
  4. Structral similarity (SSIM).

⚠️ LSD is the most widely used metric for super-resolution. And I include another three metrics just in case you need them.


main_idea

Below is the code of test()

from ssr_eval import SSR_Eval_Helper, BasicTestee

# You need to implement a class for the model to be evaluated.
class MyTestee(BasicTestee):
    def __init__(self) -> None:
        super().__init__()

    # You need to implement this function
    def infer(self, x):
        """A testee that do nothing

        Args:
            x (np.array): [sample,], with model_input_sr sample rate
            target (np.array): [sample,], with model_output_sr sample rate

        Returns:
            np.array: [sample,]
        """
        return x

def test():
    testee = MyTestee()
    # Initialize a evaluation helper
    helper = SSR_Eval_Helper(
        testee,
        test_name="unprocessed",  # Test name for storing the result
        input_sr=44100,  # The sampling rate of the input x in the 'infer' function
        output_sr=44100,  # The sampling rate of the output x in the 'infer' function
        evaluation_sr=48000,  # The sampling rate to calculate evaluation metrics.
        setting_fft={
            "cutoff_freq": [
                12000
            ],  # The cutoff frequency of the input x in the 'infer' function
        },
        save_processed_result=True
    )
    # Perform evaluation
    ## Use all eight speakers in the test set for evaluation (limit_test_speaker=-1) 
    ## Evaluate on 10 utterance for each speaker (limit_test_nums=10)
    helper.evaluate(limit_test_nums=10, limit_test_speaker=-1)

The code will automatically handle stuffs like downloading test sets. The evaluation result will be saved in the ./results directory.

Baselines

We provide several pretrained baselines. For example, to run the NVSR baseline, you can click the link in the following table for more details.


Table.1 Log-spectral distance (LSD) on different input sampling-rate (Evaluated on 44.1kHz).

Method One for all Params 2kHz 4kHz 8kHz 12kHz 16kHz 24kHz 32kHz AVG
NVSR [Pretrained Model] Yes 99.0M 1.04 0.98 0.91 0.85 0.79 0.70 0.60 0.84
WSRGlow(24kHz→48kHz) No 229.9M - - - - - 0.79 - -
WSRGlow(12kHz→48kHz) No 229.9M - - - 0.87 - - - -
WSRGlow(8kHz→48kHz) No 229.9M - - 0.98 - - - - -
WSRGlow(4kHz→48kHz) No 229.9M - 1.12 - - - - - -
Nu-wave(24kHz→48kHz) No 3.0M - - - - - 1.22 - -
Nu-wave(12kHz→48kHz) No 3.0M - - - 1.40 - - - -
Nu-wave(8kHz→48kHz) No 3.0M - - 1.42 - - - - -
Nu-wave(4kHz→48kHz) No 3.0M - 1.42 - - - - - -
Unprocessed - - 5.69 5.50 5.15 4.85 4.54 3.84 2.95 4.65

Click the link of the model for more details.

Here "one for all" means model can process flexible input sampling rate.

Features

The following code demonstrate the full options in the SSR_Eval_Helper:

testee = MyTestee()
helper = SSR_Eval_Helper(testee, # Your testsee object with 'infer' function implemented
                        test_name="unprocess",  # The name of this test. Used for saving the log file in the ./results directory
                        test_data_root="./your_path/vctk_test", # The directory to store the test data, which will be automatically downloaded.
                        input_sr=44100, # The sampling rate of the input x in the 'infer' function
                        output_sr=44100, # The sampling rate of the output x in the 'infer' function
                        evaluation_sr=48000, # The sampling rate to calculate evaluation metrics. 
                        save_processed_result=False, # If True, save model output in the dataset directory.
                        # (Recommend/Default) Use fourier method to simulate low-resolution effect
                        setting_fft = {
                            "cutoff_freq": [1000, 2000, 4000, 6000, 8000, 12000, 16000], # The cutoff frequency of the input x in the 'infer' function
                        }, 
                        # Use lowpass filtering to simulate low-resolution effect. All possible combinations will be evaluated. 
                        setting_lowpass_filtering = {
                            "filter":["cheby","butter","bessel","ellip"], # The type of filter 
                            "cutoff_freq": [1000, 2000, 4000, 6000, 8000, 12000, 16000], 
                            "filter_order": [3,6,9] # Filter orders
                        }, 
                        # Use subsampling method to simulate low-resolution effect
                        setting_subsampling = {
                            "cutoff_freq": [1000, 2000, 4000, 6000, 8000, 12000, 16000],
                        }, 
                        # Use mp3 compression method to simulate low-resolution effect
                        setting_mp3_compression = {
                            "low_kbps": [32, 48, 64, 96, 128],
                        },
)

helper.evaluate(limit_test_nums=10, # For each speaker, only evaluate on 10 utterances.
                limit_test_speaker=-1 # Evaluate on all the speakers. 
                )

⚠️ I recommand all the users to use fourier method (setting_fft) to simulate low-resolution effect for the convinence of comparing between different system.

Dataset Details

We build the test sets using VCTK (version 0.92), a multi-speaker English corpus that contains 110 speakers with different accents.

  • Speakers used for the test set: p360, p361, p362, p363, p364, p374, p376, s5
  • For the remaining 100 speakers, p280 and p315 are omitted for the technical issues.
  • Other 98 speakers are used for training.

Citation

If you find this repo useful for your research, please consider citing:

@misc{liu2022neural,
      title={Neural Vocoder is All You Need for Speech Super-resolution}, 
      author={Haohe Liu and Woosung Choi and Xubo Liu and Qiuqiang Kong and Qiao Tian and DeLiang Wang},
      year={2022},
      eprint={2203.14941},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Reference

[1] Liu, Haohe, et al. "VoiceFixer: Toward General Speech Restoration with Neural Vocoder." arXiv preprint arXiv:2109.13731 (2021).

Owner
Haohe Liu (刘濠赫)
Haohe Liu (刘濠赫)
Plover-tapey-tape: an alternative to Plover’s built-in paper tape

plover-tapey-tape plover-tapey-tape is an alternative to Plover’s built-in paper

7 May 29, 2022
Source code for the paper "SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text" PACLIC 2021

Adversarial text generator Refer to "adversarial_text_generator"[https://github.com/quocnsh/SEPP_generator] project for generating adversarial texts A

0 Oct 05, 2021
KoRean based ELECTRA pre-trained models (KR-ELECTRA) for Tensorflow and PyTorch

KoRean based ELECTRA (KR-ELECTRA) This is a release of a Korean-specific ELECTRA model with comparable or better performances developed by the Computa

12 Jun 03, 2022
DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

Visual Learning Lab 143 Dec 22, 2022
Recurrent Variational Autoencoder that generates sequential data implemented with pytorch

Pytorch Recurrent Variational Autoencoder Model: This is the implementation of Samuel Bowman's Generating Sentences from a Continuous Space with Kim's

Daniil Gavrilov 347 Nov 14, 2022
Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system

Recommender-Systems Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system So the data

Yash Kumar 0 Jan 20, 2022
Training PSPNet in Tensorflow. Reproduce the performance from the paper.

Training Reproduce of PSPNet. (Updated 2021/04/09. Authors of PSPNet have provided a Pytorch implementation for PSPNet and their new work with support

Li Xuhong 126 Jul 13, 2022
[CVPR'21] DeepSurfels: Learning Online Appearance Fusion

DeepSurfels: Learning Online Appearance Fusion Paper | Video | Project Page This is the official implementation of the CVPR 2021 submission DeepSurfel

Online Reconstruction 52 Nov 14, 2022
Code release for NeuS

NeuS We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inpu

Peng Wang 813 Jan 04, 2023
Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs.

Lunar Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs. About Lunar can be modified to work

Zeyad Mansour 276 Jan 07, 2023
Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"

ICCV'21 Context-aware Scene Graph Generation with Seq2Seq Transformers Authors: Yichao Lu*, Himanshu Rai*, Cheng Chang*, Boris Knyazev†, Guangwei Yu,

Layer6 Labs 37 Dec 18, 2022
Research on Event Accumulator Settings for Event-Based SLAM

Research on Event Accumulator Settings for Event-Based SLAM This is the source code for paper "Research on Event Accumulator Settings for Event-Based

Robin Shaun 26 Dec 21, 2022
Official repository of DeMFI (arXiv.)

DeMFI This is the official repository of DeMFI (Deep Joint Deblurring and Multi-Frame Interpolation). [ArXiv_ver.] Coming Soon. Reference Jihyong Oh a

Jihyong Oh 56 Dec 14, 2022
LeetCode Solutions https://t.me/tenvlad

leetcode LeetCode Solutions groupped by common patterns YouTube: https://www.youtube.com/c/vladten Telegram: https://t.me/nilinterface Problems source

Vlad Ten 158 Dec 29, 2022
Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification

About subwAI subwAI - a project for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation

82 Jan 01, 2023
Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network

ild-cnn This is supplementary material for the manuscript: "Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neur

22 Nov 05, 2022
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

DLR-RM 4.7k Jan 01, 2023
Demo project for real time anomaly detection using kafka and python

kafkaml-anomaly-detection Project for real time anomaly detection using kafka and python It's assumed that zookeeper and kafka are running in the loca

Rodrigo Arenas 36 Dec 12, 2022
PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).

This is the original implementation of our paper, A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem (arXiv:1706.1

Zhengyao Jiang 1.5k Dec 29, 2022
Experiments with differentiable stacks and queues in PyTorch

Please use stacknn-core instead! StackNN This project implements differentiable stacks and queues in PyTorch. The data structures are implemented in s

Will Merrill 141 Oct 06, 2022