Official implementation of A cappella: Audio-visual Singing Voice Separation, from BMVC21

Last update: Oct 22, 2022

Overview

Y-Net

Official implementation of A cappella: Audio-visual Singing Voice Separation, British Machine Vision Conference 2021

Project page: ipcv.github.io/Acappella/
Paper: Arxiv, BMVC (not available yet)

Running a demo / Y-Net Inference

We provide simple functions to load models with pre-trained weights. Steps:

Clone the repo or download y-net>VnBSS>models (models can run as a standalone package)
Load a model:

from VnBSS import y_net_gr # or from models import y_net_gr 
model = y_net_gr(n=1)

Check a fully working demo:

Citation

@inproceedings{acappella,
    author    = {Juan F. Montesinos and
                 Venkatesh S. Kadandale and
                 Gloria Haro},
    title     = {A cappella: Audio-visual Singing Voice Separation},
    booktitle = {British Machine Vision Conference (BMVC)},
    year      = {2021},

}

Training / Using DEV code

Training

The most difficult part is preparing the dataset, as everything is built upon a very specific format.
To run training:
python run.py -m model_name --workname experiment_name --arxiv_path directory_of_experiments --pretrained_from path_pret_weights
You can inspect the argparse at default.py>argparse_default.
Possible model names are: y_net_g, y_net_gr, y_net_m, y_net_r, u_net, llcp
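
When launching several experiments, it can help to assemble the training command programmatically. The following is a minimal sketch and not part of the repository: the helper build_train_command is hypothetical, and only the flags and model names listed above are taken from this README. It assumes --pretrained_from can be omitted, as the default run described further down suggests.

```python
# Model names listed in this README.
MODEL_NAMES = ("y_net_g", "y_net_gr", "y_net_m", "y_net_r", "u_net", "llcp")


def build_train_command(model_name, workname, arxiv_path, pretrained_from=None):
    """Return the argument list for `python run.py ...` (hypothetical helper)."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"Unknown model name: {model_name!r}")
    cmd = [
        "python", "run.py",
        "-m", model_name,
        "--workname", workname,
        "--arxiv_path", arxiv_path,
    ]
    # Assumption: pre-trained weights are optional, so the flag is only
    # appended when a path is given.
    if pretrained_from is not None:
        cmd += ["--pretrained_from", pretrained_from]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root.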

Testing

Go to manuscript_scripts and replace the checkpoint paths in the testing scripts with yours.
Run: bash manuscript_scripts/test_gr_r.sh
Replace the paths in manuscript_scripts/auto_metrics.py with your experiment_directory path.
Run: python manuscript_scripts/auto_metrics.py to visualise results.

It's a complicated framework. HELP!

The best way to learn the framework is to debug it! Having runnable code helps you see input shapes, follow the dataflow and step through line by line. Download The circle of life demo with the files already processed; it acts as a dataset of 6 samples. You can download it from Google Drive (1.1 GB).

Unzip the file
Run python run.py -m y_net_gr (for example)

Everything has been configured to run by default this way.

The model

Each effective model is wrapped by an nn.Module which takes care of computing the STFT, computing the mask, returning the waveform, etc. This wrapper can be found at VnBSS>models>y_net.py>YNet. To get rid of it, you can simply inherit from the class, keep the minimum set of layers and keep the core_forward method, which is the inference step without the miscellanea.
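
The wrapper/core split described above can be pictured with a toy example. This is a simplified sketch in plain Python, not the repository's code: ToyWrapper, CoreOnly and the stand-in transforms are hypothetical, and only the method name core_forward comes from the text. The real YNet is an nn.Module operating on tensors.

```python
class ToyWrapper:
    """Toy stand-in for a wrapper such as YNet (hypothetical, not the real class)."""

    def preprocess(self, waveform):
        # Stand-in for the STFT: here we simply take magnitudes.
        return [abs(x) for x in waveform]

    def core_forward(self, features):
        # Stand-in for the network: predict a binary mask per element.
        return [1.0 if f > 0.5 else 0.0 for f in features]

    def postprocess(self, waveform, mask):
        # Stand-in for applying the mask and inverting the STFT.
        return [x * m for x, m in zip(waveform, mask)]

    def forward(self, waveform):
        features = self.preprocess(waveform)
        mask = self.core_forward(features)
        return self.postprocess(waveform, mask)


class CoreOnly(ToyWrapper):
    """Subclass that skips pre/post-processing and exposes only the core step."""

    def forward(self, features):
        return self.core_forward(features)
```

With this layout, the subclass receives already-computed features and returns the raw network output, which is the role the text assigns to core_forward.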

FAQs

How to change the optimizer's hyperparameters?
Go to config>optimizer.json
How to change clip duration, video framerate, STFT parameters or audio samplerate?
Go to config>__init__.py
How to change the batch size or the number of epochs?
Go to config>hyptrs.json
How to dump predictions from the training and test set?
Go to default.py. Modify DUMP_FILES (can be controlled at a subset level). The force argument skips the iteration-wise conditions and dumps every single network prediction.
Is tensorboard enabled?
Yes, you will find tensorboard records at your_experiment_directory/used_workname/tensorboard
Can I resume an experiment?
Yes, if you set exactly the same experiment folder and workname, the system will detect it and will resume from there.
I'm trying to resume but found AssertionError
If there is an exception before running the model
How to change the number of layers of U-Net?
U-Net is built dynamically given a list of layers per block, as shown in models>__init__.py, from outer to inner blocks.
How to modify the default network values?
The json file config>net_cfg.json overwrites any default configuration from the model.
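
The override behaviour in the last answer can be illustrated as follows. This is a minimal sketch of the general idea, assuming a flat key/value layout; the keys n_layers and dropout and the helper apply_overrides are hypothetical, and the actual structure of net_cfg.json may differ.

```python
import json


def apply_overrides(defaults, json_text):
    """Return a copy of `defaults` where values from the JSON text take precedence."""
    overrides = json.loads(json_text)
    merged = dict(defaults)   # copy, so the model defaults stay untouched
    merged.update(overrides)  # values from the JSON file win
    return merged


# Hypothetical defaults and file content, for illustration only.
model_defaults = {"n_layers": 6, "dropout": 0.0}
net_cfg_text = '{"dropout": 0.1}'

config = apply_overrides(model_defaults, net_cfg_text)
```

Keys absent from the JSON text keep their default value, while keys present in it replace the default.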

Owner

Juan F. Montesinos
