SelfRemaster: SSL Speech Restoration

Last update: Jan 07, 2023

Overview

SelfRemaster: Self-Supervised Speech Restoration

Official implementation of SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Demo

Audio samples
Audio effect transfer with Gradio + HuggingFace Spaces 🤗

Setup

Clone this repository: git clone https://github.com/Takaaki-Saeki/ssl_speech_restoration.git
Change into the repository directory: cd ssl_speech_restoration
Install the Python packages and download some pretrained models: ./setup.sh

Getting started

If you use the default Japanese corpora:
- Download JSUT Basic5000 and the JVS corpus.
- Downsample them to 22.05 kHz and place them under ./data as jsut_22k and jvs_22k.
- Place the simulated low-quality data under ./data as jsut_22k-low and jvs_22k-low.
Alternatively, you can use arbitrary datasets by modifying the config files.
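The README does not prescribe a resampling tool, and any band-limited resampler (for example sox or librosa's resample) is appropriate. As a dependency-free sketch, the Python below computes the reduced up/down factors for a rate change and downsamples a 16-bit mono WAV by linear interpolation. The function names are hypothetical and not part of this repository. Linear interpolation applies no anti-aliasing filter, so content above the new Nyquist frequency (11.025 kHz) will alias: treat this as an illustration of the rate conversion, not a production resampler.

```python
from __future__ import annotations

import array
import sys
import wave
from fractions import Fraction

TARGET_RATE = 22050  # 22.05 kHz, the rate the default corpora are expected in


def resample_factors(src_rate: int, dst_rate: int = TARGET_RATE) -> tuple[int, int]:
    """Return (up, down) with dst_rate / src_rate == up / down in lowest terms."""
    ratio = Fraction(dst_rate, src_rate)
    return ratio.numerator, ratio.denominator


def resample_linear(samples: list[float], src_rate: int, dst_rate: int = TARGET_RATE) -> list[float]:
    """Naive linear-interpolation resampler (hypothetical helper, no anti-aliasing filter)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        if j >= last:  # past the final input sample: hold its value
            out.append(float(samples[last]))
            continue
        frac = pos - j
        out.append(samples[j] * (1.0 - frac) + samples[j + 1] * frac)
    return out


def downsample_wav(in_path: str, out_path: str, dst_rate: int = TARGET_RATE) -> None:
    """Read a 16-bit mono WAV, resample it, and write a 16-bit mono WAV."""
    with wave.open(in_path, "rb") as r:
        if r.getnchannels() != 1 or r.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        src_rate = r.getframerate()
        pcm = array.array("h", r.readframes(r.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are stored little-endian on disk
    resampled = resample_linear(list(pcm), src_rate, dst_rate)
    out = array.array("h", (max(-32768, min(32767, round(x))) for x in resampled))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(out_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(dst_rate)
        w.writeframes(out.tobytes())
```

For instance, resample_factors(48000) returns (147, 320): a polyphase resampler would upsample a 48 kHz signal by 147 and decimate by 320 to reach 22.05 kHz.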

Training

You can choose between the MelSpec and SourceFilter models with the --config_path option (the ${feature} directory in the config path).
As shown in the paper, the MelSpec model yields higher quality.

First, split the data into train/val/test sets and dump them with the following command:

python preprocess.py --config_path configs/train/${feature}/ssl_jsut.yaml
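preprocess.py performs the split according to its config. To illustrate what a reproducible split involves, the sketch below sorts and shuffles utterance IDs with a fixed seed and partitions them. The function split_utterances, the 5% validation/test fractions, and the seed are hypothetical choices, not values taken from the repository.

```python
from __future__ import annotations

import random


def split_utterances(
    utt_ids: list[str], val_frac: float = 0.05, test_frac: float = 0.05, seed: int = 0
) -> dict[str, list[str]]:
    """Deterministically partition utterance IDs into train/val/test lists.

    The fractions and seed are illustrative defaults, not the repository's values.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")
    ids = sorted(set(utt_ids))  # sort first so the result ignores input order
    random.Random(seed).shuffle(ids)  # private RNG: global random state is untouched
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }
```

Because the IDs are sorted before being shuffled with a private random.Random(seed), the same corpus always yields the same partition regardless of the order in which files are listed.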

To perform self-supervised learning with dual learning, run the following command.

python train.py \
    --config_path configs/train/${feature}/ssl_jsut.yaml \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, refer to train.py.

Speech restoration

To perform speech restoration of the test data, run the following command.

python eval.py \
    --config_path configs/test/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, see eval.py.
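The preprocessing, training, and evaluation commands above can be chained from Python. The sketch below only assembles argument lists from the flags shown in this README (--config_path, --stage, --run_name, --ckpt_path) and runs them with subprocess. The helper names are hypothetical, melspec is the only ${feature} value spelled out in this README, and the checkpoint path must be supplied by you.

```python
from __future__ import annotations

import subprocess
import sys


def preprocess_cmd(feature: str) -> list[str]:
    """Argument list for the preprocessing (split + dump) step."""
    return [sys.executable, "preprocess.py",
            "--config_path", f"configs/train/{feature}/ssl_jsut.yaml"]


def train_cmd(feature: str, run_name: str, stage: str = "ssl-dual") -> list[str]:
    """Argument list for self-supervised training with dual learning."""
    return [sys.executable, "train.py",
            "--config_path", f"configs/train/{feature}/ssl_jsut.yaml",
            "--stage", stage,
            "--run_name", run_name]


def eval_cmd(feature: str, ckpt_path: str, run_name: str, stage: str = "ssl-dual") -> list[str]:
    """Argument list for restoring the test data with a trained checkpoint."""
    return [sys.executable, "eval.py",
            "--config_path", f"configs/test/{feature}/ssl_jsut.yaml",
            "--ckpt_path", ckpt_path,
            "--stage", stage,
            "--run_name", run_name]


def run(cmd: list[str]) -> None:
    """Echo and execute one step from the repository root."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, run(train_cmd("melspec", "ssl_melspec_dual")) issues the training command shown above. Building lists instead of shell strings avoids quoting problems with paths that contain spaces.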

Audio effect transfer

You can run a simple audio effect transfer demo, using a model pretrained on real data, with the following command:

python aet_demo.py

Alternatively, to customize the dataset or model, edit audio_effect_transfer.yaml and run the following command:

python aet.py \
    --config_path configs/test/melspec/audio_effect_transfer.yaml \
    --stage ssl-dual \
    --run_name aet_melspec_dual

For other options, see aet.py.

Pretrained models

See here.

Reproducing results

You can generate simulated low-quality data as in the paper with the following command.

python simulated_data.py \
    --in_dir ${input_directory (e.g., path to jsut_22k)} \
    --output_dir ${output_directory (e.g., path to jsut_22k-low)} \
    --corpus_type ${single-speaker corpus or multi-speaker corpus} \
    --deg_type lowpass
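The lowpass degradation removes high-frequency content to simulate band-limited recordings. The exact filter used by simulated_data.py is not described here, so as an assumption-labeled illustration the sketch below designs a Hamming-windowed sinc FIR lowpass filter and applies it in pure Python. The function names, the 101-tap length, and any cutoff you pass are hypothetical, not the paper's settings.

```python
from __future__ import annotations

import math


def lowpass_fir(cutoff_hz: float, sample_rate: int, num_taps: int = 101) -> list[float]:
    """Design a Hamming-windowed sinc lowpass FIR filter with unit DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps for a symmetric (linear-phase) filter")
    fc = cutoff_hz / sample_rate  # normalized cutoff in cycles per sample
    if not 0 < fc < 0.5:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize so a constant signal passes unchanged


def apply_fir(signal: list[float], taps: list[float]) -> list[float]:
    """Causal FIR filtering by direct convolution; output length equals input length."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j < 0:
                break
            acc += t * signal[i - j]
        out.append(acc)
    return out
```

Lower cutoffs simulate narrower-band recordings. Because the taps are symmetric, the filter has linear phase and simply delays the signal by (num_taps - 1) / 2 samples without phase distortion.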

Then download the pretrained model corresponding to the deg_type and run the following command:

python eval.py \
    --config_path configs/train/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

Citation

@article{saeki22selfremaster,
  title={{SelfRemaster}: {S}elf-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling},
  author={T. Saeki and S. Takamichi and T. Nakamura and N. Tanji and H. Saruwatari},
  journal={arXiv preprint arXiv:2203.12937},
  year={2022}
}


Reference

Owner

Takaaki Saeki

An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wheat Detection (2021).

Python implementation of the PatchMatch and PatchMatch Stereo algorithms

PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

Meta-TTS: Meta-Learning for Few-shot Speaker Adaptive Text-to-Speech

Nodule Generation Algorithm Baseline and template code for node21 generation track

Code for the paper: Hierarchical Reinforcement Learning With Timed Subgoals, published at NeurIPS 2021

CountDown to New Year and shoot fireworks

The codes of paper 'Active-LATHE: An Active Learning Algorithm for Boosting the Error exponent for Learning Homogeneous Ising Trees'

AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

WSDM 2022: Knowledge Enhanced Sports Game Summarization

Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

Repository for the paper: Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data

An Easy-to-use, Modular and Prolongable package of deep-learning based Named Entity Recognition Models.

Inference code for "StylePeople: A Generative Model of Fullbody Human Avatars" paper. This code is for the part of the paper describing video-based avatars.

Solution of Kaggle competition: Sartorius - Cell Instance Segmentation

Grounding Representation Similarity with Statistical Testing

Source code for Acorn, the precision farming rover by Twisted Fields

Source code for CAST - Crisis Domain Adaptation Using Sequence-to-sequence Transformers (Accepted to ISCRAM 2021, CorePaper).

An efficient implementation of GPNN

Medical-Image-Triage-and-Classification-System-Based-on-COVID-19-CT-and-X-ray-Scan-Dataset