SelfRemaster: SSL Speech Restoration

Overview

SelfRemaster: Self-Supervised Speech Restoration

Official implementation of SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Demo

Setup

  1. Clone this repository: git clone https://github.com/Takaaki-Saeki/ssl_speech_restoration.git
  2. CD into this repository: cd ssl_speech_restoration
  3. Install python packages and download some pretrained models: ./setup.sh

Getting started

  • If you use default Japanese corpora
    • Download JSUT Basic5000 and JVS Corpus
    • Downsample them to 22.05 kHz and Place them under data/ as jsut_22k and jvs_22k
    • Place simulated low-quality data under ./data as jsut_22k-low and jvs_22k-low
  • Or you can use arbitrary datasets by modifying config files

Training

You can choose MelSpec or SourFilter models with --config_path option.
As shown in the paper, MelSpec model is of higher-quality.

Firstly you need to split the data to train/val/test and dump them by the following command.

python preprocess.py --config_path configs/train/${feature}/ssl_jsut.yaml

To perform self-supervised learning with dual learning, run the following command.

python train.py \
    --config_path configs/train/${feature}/ssl_jsut.yaml \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, refer to train.py.

Speech restoration

To perform speech restoration of the test data, run the following command.

python eval.py \
    --config_path configs/test/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, see eval.py.

Audio effect transfer

You can run a simple audio effect transfer demo using a model pretrained with real data.
Run the following command.

python aet_demo.py

Or you can customize the dataset or model.
You need to edit audio_effect_transfer.yaml and run the following command.

python aet.py \
    --config_path configs/test/melspec/audio_effect_transfer.yaml \
    --stage ssl-dual \
    --run_name aet_melspec_dual

For other options, see aet.py.

Pretrained models

See here.

Reproducing results

You can generate simulated low-quality data as in the paper with the following command.

python simulated_data.py \
    --in_dir ${input_directory (e.g., path to jsut_22k)} \
    --output_dir ${output_directory (e.g., path to jsut_22k-low)} \
    --corpus_type ${single-speaker corpus or multi-speaker corpus} \
    --deg_type lowpass

Then download the pretrained model correspond to the deg_type and run the following command.

python eval.py \
    --config_path configs/train/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

Citation

@article{saeki22selfremaster,
  title={{SelfRemaster}: {S}elf-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling},
  author={T. Saeki and S. Takamichi and T. Nakamura and N. Tanji and H. Saruwatari},
  journal={arXiv preprint arXiv:2203.12937},
  year={2022}
}

Reference

Owner
Takaaki Saeki
Ph.D. Student @ UTokyo / Spoken Language Processing
Takaaki Saeki
The source code of the paper "Understanding Graph Neural Networks from Graph Signal Denoising Perspectives"

GSDN-F and GSDN-EF This repository provides a reference implementation of GSDN-F and GSDN-EF as described in the paper "Understanding Graph Neural Net

Guoji Fu 18 Nov 14, 2022
RID-Noise: Towards Robust Inverse Design under Noisy Environments

This is code of RID-Noise. Reproduce RID-Noise Results Toy tasks Please refer to the notebook ridnoise.ipynb to view experiments on three toy tasks. B

Thyrix 2 Nov 23, 2022
Learning What and Where to Draw

###Learning What and Where to Draw Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee This is the code for our NIPS 201

Scott Ellison Reed 337 Nov 18, 2022
一套完整的微博舆情分析流程代码,包括微博爬虫、LDA主题分析和情感分析。

已经将项目的关键文件上传,包含微博爬虫、LDA主题分析和情感分析三个部分。 1.微博爬虫 实现微博评论爬取和微博用户信息爬取,一天大概十万条。 2.LDA主题分析 实现文档主题抽取,包括数据清洗及分词、主题数的确定(主题一致性和困惑度)和最优主题模型的选择(暴力搜索)。 3.情感分析 实现评论文本的

182 Jan 02, 2023
Neural Scene Flow Prior (NeurIPS 2021 spotlight)

Neural Scene Flow Prior Xueqian Li, Jhony Kaesemodel Pontes, Simon Lucey Will appear on Thirty-fifth Conference on Neural Information Processing Syste

Lilac Lee 85 Jan 03, 2023
AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages

AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages This repository contains the code for the pa

Kelechi 40 Nov 24, 2022
Temporally Coherent GAN SIGGRAPH project.

TecoGAN This repository contains source code and materials for the TecoGAN project, i.e. code for a TEmporally COherent GAN for video super-resolution

Duc Linh Nguyen 2 Jan 18, 2022
Source code for CVPR 2020 paper "Learning to Forget for Meta-Learning"

L2F - Learning to Forget for Meta-Learning Sungyong Baik, Seokil Hong, Kyoung Mu Lee Source code for CVPR 2020 paper "Learning to Forget for Meta-Lear

Sungyong Baik 29 May 22, 2022
Position detection system of mobile robot in the warehouse enviroment

Autonomous-Forklift-System About | GUI | Tests | Starting | License | Author | 🎯 About An application that run the autonomous forklift paletization a

Kamil Goś 1 Nov 24, 2021
An open source library for face detection in images. The face detection speed can reach 1000FPS.

libfacedetection This is an open source library for CNN-based face detection in images. The CNN model has been converted to static variables in C sour

Shiqi Yu 11.4k Dec 27, 2022
Repository for XLM-T, a framework for evaluating multilingual language models on Twitter data

This is the XLM-T repository, which includes data, code and pre-trained multilingual language models for Twitter. XLM-T - A Multilingual Language Mode

Cardiff NLP 112 Dec 27, 2022
The official pytorch implemention of the CVPR paper "Temporal Modulation Network for Controllable Space-Time Video Super-Resolution".

This is the official PyTorch implementation of TMNet in the CVPR 2021 paper "Temporal Modulation Network for Controllable Space-Time VideoSuper-Resolu

Gang Xu 95 Oct 24, 2022
Mercury: easily convert Python notebook to web app and share with others

Mercury Share your Python notebooks with others Easily convert your Python notebooks into interactive web apps by adding parameters in YAML. Simply ad

MLJAR 2.2k Dec 27, 2022
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrai

Hugging Face 77.4k Jan 05, 2023
Official PyTorch implementation of the NeurIPS 2021 paper StyleGAN3

Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation of the NeurIPS 2021 paper Alias-Free Generative Adversarial Net

Eugenio Herrera 92 Nov 18, 2022
Dynamic Bottleneck for Robust Self-Supervised Exploration

Dynamic Bottleneck Introduction This is a TensorFlow based implementation for our paper on "Dynamic Bottleneck for Robust Self-Supervised Exploration"

Bai Chenjia 4 Nov 14, 2022
[ACM MM 2021] Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Diverse Image Inpainting with Bidirectional and Autoregressive Transformers Installation pip install -r requirements.txt Dataset Preparation Given the

Yingchen Yu 25 Nov 09, 2022
Public Models considered for emotion estimation from EEG

Emotion-EEG Set of models for emotion estimation from EEG. Composed by the combination of two deep-learing models learning together (RNN and CNN) with

Victor Delvigne 21 Dec 23, 2022
retweet 4 satoshi ⚡️

rt4sat retweet 4 satoshi This bot is the codebase for https://twitter.com/rt4sat please feel free to create an issue if you saw any bugs basically thi

6 Sep 30, 2022
CMP 414/765 course repository for Spring 2022 semester

CMP414/765: Artificial Intelligence Spring2021 This is the GitHub repository for course CMP 414/765: Artificial Intelligence taught at The City Univer

ch00226855 4 May 16, 2022