Code for "Unsupervised Source Separation via Bayesian inference in the latent domain"

Overview

LQVAE-separation

Code for "Unsupervised Source Separation via Bayesian inference in the latent domain"

Paper

Samples

GT Compressed Separated
Drums GT Compressed Drums Separated Drums
Bass GT Compressed Bass Separated Bass
Mix GT Compressed Mix Separated Mix

The separation is performed on a x64 compressed latent domain. The results can be upsampled via Jukebox upsamplers in order to increment perceptive quality (WIP).

Install

Install the conda package manager from https://docs.conda.io/en/latest/miniconda.html

conda create --name lqvae-separation python=3.7.5
conda activate lqvae-separation
pip install mpi4py==3.0.3
pip install ffmpeg-python==0.2.0
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
pip install -r requirements.txt
pip install -e .

Checkpoints

  • Enter inside script/ folder and create the folder checkpoints/ and the folder results/.
  • Download the checkpoints contained in this Google Drive folder and put them inside checkpoints/

Separation with checkpoints

  • Call the following in order to perform bs separations of 3 seconds starting from second shift of the mixture created with the sources in path_1 and path_2. The sources must be WAV files sampled at 22kHz.
    PYTHONPATH=.. python bayesian_inference.py --shift=shift --path_1=path_1 --path_2=path_2 --bs=bs
    
  • The default value for bs is 64, and can be handled by an RTX3080 with 16 GB of VRAM. Lower the value if you get CUDA: out of memory.

Training

LQ-VAE

  • The vqvae/vqvae.pyfile of Jukebox has been modified in order to include the linearization loss of the LQ-VAE (it is computed at all levels of the hierarchical VQ-VAE but we only care of the topmost level given that we perform separation there). One can train a new LQ-VAE on custom data (here data/train for train and data/test for test) by running the following from the root of the project
PYTHONPATH=. mpiexec -n 1 python jukebox/train.py --hps=vqvae --sample_length=131072 --bs=8 
--audio_files_dir=data/train/ --labels=False --train --test --aug_shift --aug_blend --name=lq_vae --test_audio_files_dir=data/test
  • The trained model uses the vqvae hyperparameters in hparams.py so if you want to change the levels / downsampling factors you have to modify them there.
  • The only constraint for training the LQ-VAE is to use an even number for the batch size, given its use of pairs in the loss.
  • Given that L_lin enforces the sum operation on the latent domain, you can use the data of both sources together (or any other audio data).
  • Checkpoints are save in logs/lq_vae (lq_vae is the name parameter).

Priors

  • After training the LQ-VAE, train two priors on two different classes by calling
PYTHONPATH=. mpiexec -n 1 python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=pior_source
 --audio_files_dir=data/source/train --test_audio_files_dir=data/source/test --labels=False --train --test --aug_shift
  --aug_blend --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000 --min_duration=24 --sample_length=1048576 
  --bs=16 --n_ctx=8192 --sample=True --sample_iters=1000 --restore_vqvae=logs/lq_vae/checkpoint_lq_vae.pth.tar
  • Here the data of the source is located in data/source/train and data/source/test and we assume the LQ-VAE has 3 levels (topmost level = 2).
  • The Transformer model is defined by the parameters of small_prior in hparams.py and uses a context of n_ctx=8192 codes.
  • The checkpoint path of the LQ-VAE trained in the previous step must be passed to --restore_vqvae
  • Checkpoints are save in logs/pior_source (pior_source is the name parameter).

Codebook sums

  • Before separation, the sums between all codes must be computed using the LQ-VAE. This can be done using the codebook_precalc.py in the script folder:
PYTHONPATH=.. python codebook_precalc.py --save_path=checkpoints/codebook_sum_precalc.pt 
--restore_vqvae=../logs/lq_vae/checkpoint_lq_vae.pth.tar` --raw_to_tokens=64 --l_bins=2048
--sample_rate=22050 --alpha=[0.5, 0.5] --downs_t=(2, 2, 2) --commit=1.0 --emb_width=64

Separation with trained checkpoints

  • Trained checkpoints can be given to bayesian_inference.py as following:
    PYTHONPATH=.. python bayesian_inference.py --shift=shift --path_1=path_1 --path_2=path_2 --bs=bs --restore_vqvae=checkpoints/checkpoint_step_60001_latent.pth.tar
    --restore_priors 'checkpoints/checkpoint_drums_22050_latent_78_19k.pth.tar' checkpoints/checkpoint_latest.pth.tar' --sum_codebook=checkpoints/codebook_precalc_22050_latent.pt
    
  • restore_priors accepts two paths to the first and second prior checkpoints.

Evaluation

  • In order to evaluate the pre-trained checkpoints, run bayesian_test.py after you have put the full Slakh drums and bass validation split inside data/bass/validation and data/drums/validation.

Future work

  • training of upsamplers for increasing the quality of the separations
  • better rejection sampling method (maybe use verifiers as in https://arxiv.org/abs/2110.14168)

Citations

If you find the code useful for your research, please consider citing

@article{mancusi2021unsupervised,
  title={Unsupervised Source Separation via Bayesian Inference in the Latent Domain},
  author={Mancusi, Michele and Postolache, Emilian and Fumero, Marco and Santilli, Andrea and Cosmo, Luca and Rodol{\`a}, Emanuele},
  journal={arXiv preprint arXiv:2110.05313},
  year={2021}
}

as well as the Jukebox baseline:

  • Dhariwal, P., Jun, H., Payne, C., Kim, J. W., Radford, A., & Sutskever, I. (2020). Jukebox: A generative model for music. arXiv preprint arXiv:2005.00341.
Owner
Michele Mancusi
PhD student in Computer Science @ La Sapienza University of Rome, MSc in Quantum Information @ La Sapienza University of Rome
Michele Mancusi
Gesture-Volume-Control - This Python program can adjust the system's volume by using hand gestures

Gesture-Volume-Control This Python program can adjust the system's volume by usi

VatsalAryanBhatanagar 1 Dec 30, 2021
Official code for Score-Based Generative Modeling through Stochastic Differential Equations

Score-Based Generative Modeling through Stochastic Differential Equations This repo contains the official implementation for the paper Score-Based Gen

Yang Song 818 Jan 06, 2023
Lolviz - A simple Python data-structure visualization tool for lists of lists, lists, dictionaries; primarily for use in Jupyter notebooks / presentations

lolviz By Terence Parr. See Explained.ai for more stuff. A very nice looking javascript lolviz port with improvements by Adnan M.Sagar. A simple Pytho

Terence Parr 785 Dec 30, 2022
A distributed deep learning framework that supports flexible parallelization strategies.

FlexFlow FlexFlow is a deep learning framework that accelerates distributed DNN training by automatically searching for efficient parallelization stra

528 Dec 25, 2022
Scripts and outputs related to the paper Prediction of Adverse Biological Effects of Chemicals Using Knowledge Graph Embeddings.

Knowledge Graph Embeddings and Chemical Effect Prediction, 2020. Scripts and outputs related to the paper Prediction of Adverse Biological Effects of

Knowledge Graphs at the Norwegian Institute for Water Research 1 Nov 01, 2021
🐦 Quickly annotate data from the comfort of your Jupyter notebook

🐦 pigeon - Quickly annotate data on Jupyter Pigeon is a simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort

Anastasis Germanidis 647 Jan 05, 2023
Code release for the paper “Worldsheet Wrapping the World in a 3D Sheet for View Synthesis from a Single Image”, ICCV 2021.

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image This repository contains the code for the following paper: R. Hu,

Meta Research 37 Jan 04, 2023
Deep deconfounded recommender (Deep-Deconf) for paper "Deep causal reasoning for recommendations"

Deep Causal Reasoning for Recommender Systems The codes are associated with the following paper: Deep Causal Reasoning for Recommendations, Yaochen Zh

Yaochen Zhu 22 Oct 15, 2022
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

PyTorch Large-Scale Language Model A Large-Scale PyTorch Language Model trained on the 1-Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perp

Ryan Spring 114 Nov 04, 2022
[CVPR 2021 Oral] Variational Relational Point Completion Network

VRCNet: Variational Relational Point Completion Network This repository contains the PyTorch implementation of the paper: Variational Relational Point

PL 121 Dec 12, 2022
Code for the preprint "Well-classified Examples are Underestimated in Classification with Deep Neural Networks"

This is a repository for the paper of "Well-classified Examples are Underestimated in Classification with Deep Neural Networks" The implementation and

LancoPKU 25 Dec 11, 2022
Code and results accompanying our paper titled Mixture Proportion Estimation and PU Learning: A Modern Approach at Neurips 2021 (Spotlight)

Mixture Proportion Estimation and PU Learning: A Modern Approach This repository is the official implementation of Mixture Proportion Estimation and P

Approximately Correct Machine Intelligence (ACMI) Lab 23 Dec 28, 2022
A collection of differentiable SVD methods and also the official implementation of the ICCV21 paper "Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?"

Differentiable SVD Introduction This repository contains: The official Pytorch implementation of ICCV21 paper Why Approximate Matrix Square Root Outpe

YueSong 32 Dec 25, 2022
My published benchmark for a Kaggle Simulations Competition

Lux AI Working Title Bot Please refer to the Kaggle notebook for the comment section. The comment section contains my explanation on my code structure

Tong Hui Kang 29 Aug 22, 2022
Code for the KDD 2021 paper 'Filtration Curves for Graph Representation'

Filtration Curves for Graph Representation This repository provides the code from the KDD'21 paper Filtration Curves for Graph Representation. Depende

Machine Learning and Computational Biology Lab 16 Oct 16, 2022
YOLOX Win10 Project

Introduction 这是一个用于Windows训练YOLOX的项目,相比于官方项目,做了一些适配和修改: 1、解决了Windows下import yolox失败,No such file or directory: 'xxx.xml'等路径问题 2、CUDA out of memory等显存不

5 Jun 08, 2022
A PyTorch implementation of " EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks."

EfficientNet A PyTorch implementation of EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. [arxiv] [Official TF Repo] Implemen

AhnDW 298 Dec 10, 2022
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Multipath RefineNet A MATLAB based framework for semantic image segmentation and general dense prediction tasks on images. This is the source code for

Guosheng Lin 575 Dec 06, 2022
Accelerating BERT Inference for Sequence Labeling via Early-Exit

Sequence-Labeling-Early-Exit Code for ACL 2021 paper: Accelerating BERT Inference for Sequence Labeling via Early-Exit Requirement: Please refer to re

李孝男 23 Oct 14, 2022
Data-Uncertainty Guided Multi-Phase Learning for Semi-supervised Object Detection

An official implementation of paper Data-Uncertainty Guided Multi-Phase Learning for Semi-supervised Object Detection

11 Nov 23, 2022