NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021 Accepted

Overview

NU-Wave — Official PyTorch Implementation

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
Junhyeok Lee, Seungu Han @ MINDsLab Inc., SNU

Paper(arXiv): https://arxiv.org/abs/2104.02321 (Accepted to INTERSPEECH 2021)
Audio Samples: https://mindslab-ai.github.io/nuwave

Official Pytorch+Lightning Implementation for NU-Wave.

Update: CODE RELEASED! README is DONE.

Requirements

Preprocessing

Before running our project, you need to download and preprocess dataset to .pt files

  1. Download VCTK dataset
  2. Remove speaker p280 and p315
  3. Modify path of downloaded dataset data:dir in hparameters.yaml
  4. run utils/wav2pt.py
$ python utils/wav2pt.py

Training

  1. Adjust hparameters.yaml, especially train section.
train:
  batch_size: 18 # Dependent on GPU memory size
  lr: 0.00003
  weight_decay: 0.00
  num_workers: 64 # Dependent on CPU cores
  gpus: 2 # number of GPUs
  opt_eps: 1e-9
  beta1: 0.5
  beta2: 0.999
  • If you want to train with single speaker, use VCTKSingleSpkDataset instead of VCTKMultiSpkDataset for dataset in dataloader.py. And use batch_size=1 for validation dataloader.
  • Adjust data section in hparameters.yaml.
data:
  dir: '/DATA1/VCTK/VCTK-Corpus/wav48/p225' #dir/spk/format
  format: '*mic1.pt'
  cv_ratio: (223./231., 8./231., 0.00) #train/val/test
  1. run trainer.py.
$ python trainer.py
  • If you want to resume training from checkpoint, check parser.
    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type =int,\
            required = False, help = "Resume Checkpoint epoch number")
    parser.add_argument('-s', '--restart', action = "store_true",\
            required = False, help = "Significant change occured, use this")
    parser.add_argument('-e', '--ema', action = "store_true",\
            required = False, help = "Start from ema checkpoint")
    args = parser.parse_args()
  • During training, tensorboard logger is logging loss, spectrogram and audio.
$ tensorboard --logdir=./tensorboard --bind_all

Evaluation

run for_test.py or test.py

$ python test.py -r {checkpoint_number} {-e:option, if ema} {--save:option}
or
$ python for_test.py -r {checkpoint_number} {-e:option, if ema} {--save:option}

Please check parser.

    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type =int,
                required = True, help = "Resume Checkpoint epoch number")
    parser.add_argument('-e', '--ema', action = "store_true",
                required = False, help = "Start from ema checkpoint")
    parser.add_argument('--save', action = "store_true",
               required = False, help = "Save file")

While we provide lightning style test code test.py, it has device dependency. Thus, we recommend to use for_test.py.

References

This implementation uses code from following repositories:

This README and the webpage for the audio samples are inspired by:

The audio samples on our webpage are partially derived from:

Repository Structure

.
├── Dockerfile
├── dataloader.py           # Dataloader for train/val(=test)
├── filters.py              # Filter implementation
├── test.py                 # Test with lightning_loop.
├── for_test.py             # Test with for_loop. Recommended due to device dependency of lightning
├── hparameter.yaml         # Config
├── lightning_model.py      # NU-Wave implementation. DDPM is based on ivanvok's WaveGrad implementation
├── model.py                # NU-Wave model based on lmnt-com's DiffWave implementation
├── requirement.txt         # requirement libraries
├── sampling.py             # Sampling a file
├── trainer.py              # Lightning trainer
├── README.md           
├── LICSENSE
├── utils
│  ├── stft.py              # STFT layer
│  ├── tblogger.py          # Tensorboard Logger for lightning
│  └── wav2pt.py            # Preprocessing
└── docs                    # For github.io
   └─ ...

Citation & Contact

If this repository useful for your research, please consider citing! Bibtex will be updated after INTERSPEECH 2021 conference.

@article{lee2021nuwave,
  title={NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling},
  author={Lee, Junhyeok and Han, Seungu},
  journal={arXiv preprint arXiv:2104.02321},
  year={2021}
}

If you have a question or any kind of inquiries, please contact Junhyeok Lee at [email protected]

Owner
MINDs Lab
MINDsLab provides AI platform and various AI engines based on deep machine learning.
MINDs Lab
Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing (CVPR 2018).

Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing (CVPR2018) By Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu and J

Zilong Huang 245 Dec 13, 2022
Universal Probability Distributions with Optimal Transport and Convex Optimization

Sylvester normalizing flows for variational inference Pytorch implementation of Sylvester normalizing flows, based on our paper: Sylvester normalizing

Rianne van den Berg 172 Dec 13, 2022
RADIal is available now! Check the download section

Latest news: RADIal is available now! Check the download section. However, because we are currently working on the data anonymization, we provide for

valeo.ai 55 Jan 03, 2023
Music library streaming app written in Flask & VueJS

djtaytay This is a little toy app made to explore Vue, brush up on my Python, and make a remote music collection accessable through a web interface. I

Ryan Tasson 6 May 27, 2022
An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym

gym-idsgame An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym gym-idsgame is a reinforcement learning environment for simulating at

Kim Hammar 29 Dec 03, 2022
Automatically align face images 🙃→🙂. Can also do windowing and warping.

Automatic Face Alignment (AFA) Carl M. Gaspar & Oliver G.B. Garrod You have lots of photos of faces like this: But you want to line up all of the face

Carl Michael Gaspar 15 Dec 12, 2022
Cross-media Structured Common Space for Multimedia Event Extraction (ACL2020)

Cross-media Structured Common Space for Multimedia Event Extraction Table of Contents Overview Requirements Data Quickstart Citation Overview The code

Manling Li 49 Nov 21, 2022
PyTorch implementation of the paper: Long-tail Learning via Logit Adjustment

logit-adj-pytorch PyTorch implementation of the paper: Long-tail Learning via Logit Adjustment This code implements the paper: Long-tail Learning via

Chamuditha Jayanga 53 Dec 23, 2022
Swapping face using Face Mesh with TensorFlow Lite

Swapping face using Face Mesh with TensorFlow Lite

iwatake 17 Apr 26, 2022
Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

Complete system for facial identity system. Include one-shot model, database operation, features visualization, monitoring

2 Dec 28, 2021
Paddle pit - Rethinking Spatial Dimensions of Vision Transformers

基于Paddle实现PiT ——Rethinking Spatial Dimensions of Vision Transformers,arxiv 官方原版代

Hongtao Wen 4 Jan 15, 2022
The code for SAG-DTA: Prediction of Drug–Target Affinity Using Self-Attention Graph Network.

SAG-DTA The code is the implementation for the paper 'SAG-DTA: Prediction of Drug–Target Affinity Using Self-Attention Graph Network'. Requirements py

Shugang Zhang 7 Aug 02, 2022
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai

Amazon Web Services - Labs 123 Dec 23, 2022
Text Generation by Learning from Demonstrations

Text Generation by Learning from Demonstrations The README was last updated on March 7, 2021. The repo is based on fairseq (v0.9.?). Paper arXiv Prere

38 Oct 21, 2022
E-RAFT: Dense Optical Flow from Event Cameras

E-RAFT: Dense Optical Flow from Event Cameras This is the code for the paper E-RAFT: Dense Optical Flow from Event Cameras by Mathias Gehrig, Mario Mi

Robotics and Perception Group 71 Dec 12, 2022
Additional code for Stable-baselines3 to load and upload models from the Hub.

Hugging Face x Stable-baselines3 A library to load and upload Stable-baselines3 models from the Hub. Installation With pip Examples [Todo: add colab t

Hugging Face 34 Dec 10, 2022
METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)

Nautilus-OCR The National Library of Luxembourg (BnL) started its first initiative in digitizing newspapers, with layout recognition and OCR on articl

National Library of Luxembourg 36 Dec 05, 2022
Repository for "Space-Time Correspondence as a Contrastive Random Walk" (NeurIPS 2020)

Space-Time Correspondence as a Contrastive Random Walk This is the repository for Space-Time Correspondence as a Contrastive Random Walk, published at

A. Jabri 239 Dec 27, 2022
Robust fine-tuning of zero-shot models

Robust fine-tuning of zero-shot models This repository contains code for the paper Robust fine-tuning of zero-shot models by Mitchell Wortsman*, Gabri

224 Dec 29, 2022
Implementation of Squeezenet in pytorch, pretrained models on Cifar 10 data to come

Pytorch Squeeznet Pytorch implementation of Squeezenet model as described in https://arxiv.org/abs/1602.07360 on cifar-10 Data. The definition of Sque

gaurav pathak 86 Oct 28, 2022