MusicYOLO framework uses the object detection model, YOLOx, to locate notes in the spectrogram.

Overview

MusicYOLO

MusicYOLO framework uses the object detection model, YOLOX, to locate notes in the spectrogram. Its performance on the ISMIR2014 dataset, MIR-ST500 dataset and SSVD dataset show that MusicYOLO significantly improves onset/offset detection compared with previous approaches.

Installation

Step1. Install pytorch.

conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch

Step1. Install YOLOX.

git clone [email protected]:xk-wang/MusicYOLO.git
cd MusicYOLO
pip3 install -U pip && pip3 install -r requirements.txt
pip3 install -v -e .  # or  python3 setup.py develop

Step2. Install apex.

# skip this step if you don't want to train model.
cd apex
pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" .

Step3. Install pycocotools.

pip3 install cython;
cd cocoapi/PythonAPI && pip3 install -v .

Inference

Download the pretrained musicyolo1 and musicyolo2 models described in our paper. Put these two models under the models folder. The models are stored in BaiduYun https://pan.baidu.com/s/1TbE36ydi-6EZXwxo5DwfLg?pwd=1234 code: 1234

SSVD & ISMIR2014

Step1. Download SSVD-v2.0 from https://github.com/xk-wang/SSVD-v2.0

Step2. Onset/offset detection (use musicyolo2.pth)

python3 tools/predict.py -f exps/example/custom/yolox_singing.py -c models/musicyolo2.pth --audiodir $SSVD_TEST_SET_PATH --savedir $SAVE_PATH --ext .flac --device gpu

Step3. Evaluate

python3 tools/note_eval.py --label $SSVD_TEST_SET_PATH --result $SAVE_PATH --offset

Similar process for ISMIR2014 dataset.

MIR-ST500

Since MIR-ST500 dataset is a mixture of vocals and accompaniments, we need to separate vocals and accompaniments with spleeter first. Besides, since the singing duration of each audio in MIR-ST500 dataset is too long, we will first cut each audio into short audios of about 35s for on/offset detection.

Step1. Audio source seperation

python3 tools/util/do_spleeter.py $MIR_ST500_DIR

Step2. Split audio

python3 tools/util/split_mst.py --mst_path $MST_TEST_VOCAL_PATH --dest_dir $SPLIT_PATH

Step3. Onset/offset detection (use musicyolo1.pth)

python3 tools/predict.py -f exps/example/custom/yolox_singing.py -c models/musicyolo1.pth --audiodir $SPLIT_PATH --savedir $SAVE_PATH --ext .wav --device gpu

Step4. Merge results

Because we split the MIR-ST500 test set audio earlier, the results are also splited. Here we merge the split results.

python3 tools/util/merge_res.py --audio_dir $SPLIT_PATH --origin_dir $SAVE_PATH --final_dir $MERGE_PATH

Step5. Evaluate

python3 tools/note_eval.py --label $MIR_ST500_TEST_LABEL_PATH --result $MERGE_PATH --offset

Train yourself

Download yolox-s weight from https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth . Put the model weight under models folder.

Train on SSVD (get musicyolo2)

Step1. Get SSVD train set

Download SSVD-v2.0 from https://github.com/xk-wang/SSVD-v2.0. Put the images folder under the datasets folder.

Step2. Train

python3 tools/train.py -f exps/example/custom/yolox_singing.py -d 1 -b 16 --fp16 -o -c models/yolox_s.pth

Train on MIR-ST500 (get musicyolo1)

Prepair note object detection dataset

Because there are a few audios for SSVD training set, we use Labelme software to annotate note object manually. There are a lot of data in MIR-ST500 training set, so we design a set of automatic annotation tools.

Step1. Audio source seperation

python3 tools/util/do_spleeter.py $MIR_ST500_TRAIN_DIR

Step2. Split audio

python3 tools/util/split_mst.py --mst_path $MIR_ST500_TRAIN_DIR --dest_dir $TRAIN_SPLIT_PATH

Step3. Automatic annotation

python3 tools/util/automatic_annotation.py --audiodir $TRAIN_SPLIT_PATH --imgdir $MST_NOTE_PATH

Step4. Automatic annotation

Divide the training set and validation set by yourself. We break up the images and divide them according to the ratio of 7:3 to get the training set and validation set. The images and annotations are put under $YOU_MIR_ST500_IMAGES folder.

Step4. Coco dataset format

The MIR-st500 note object detection dataset is organized in a format similar to the images folder in SSVD v2.0 dataset.

python3 tools/util/labelme2coco.py --annotationpath $YOU_MIR_ST500_IMAGES/train --jsonpath $IMAGE_DIR/train/_annotations.coco.json

python3 tools/util/labelme2coco.py --annotationpath $YOU_MIR_ST500_IMAGES/valid --jsonpath $IMAGE_DIR/valid/_annotations.coco.json

then put the MIR-ST500 note object detection dataset under the datasets folder like SSVD.

Train

the similar process like training on SSVD dataset.

Citation

 @article{yolox2021,
  title={YOLOX: Exceeding YOLO Series in 2021},
  author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian},
  journal={arXiv preprint arXiv:2107.08430},
  year={2021}
}

@inproceedings{musicyolo2022,
  title={A SIGHT-SINGING ONSET/OFFSET DETECTION FRAMEWORK BASED ON OBJECT DETECTION INSTEAD OF SPECTRUM FRAMES.},
  author={X. Wang, W. Xu, W. Yang and W. Cheng},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={},
  year={2022},
}
Owner
Xianke Wang
Stay hungry stay foolish!
Xianke Wang
AttGAN: Facial Attribute Editing by Only Changing What You Want (IEEE TIP 2019)

News 11 Jan 2020: We clean up the code to make it more readable! The old version is here: v1. AttGAN TIP Nov. 2019, arXiv Nov. 2017 TensorFlow impleme

Zhenliang He 568 Dec 14, 2022
Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22)

Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22) Ok-Topk is a scheme for distributed training with sparse gradients

Shigang Li 9 Oct 29, 2022
Reference implementation for Structured Prediction with Deep Value Networks

Deep Value Network (DVN) This code is a python reference implementation of DVNs introduced in Deep Value Networks Learn to Evaluate and Iteratively Re

Michael Gygli 55 Feb 02, 2022
ParaGen is a PyTorch deep learning framework for parallel sequence generation

ParaGen is a PyTorch deep learning framework for parallel sequence generation. Apart from sequence generation, ParaGen also enhances various NLP tasks, including sequence-level classification, extrac

Bytedance Inc. 169 Dec 22, 2022
[NeurIPS 2021] "Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems"

Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems Introduction Multi-agent control i

VITA 6 May 05, 2022
This is the code for Deformable Neural Radiance Fields, a.k.a. Nerfies.

Deformable Neural Radiance Fields This is the code for Deformable Neural Radiance Fields, a.k.a. Nerfies. Project Page Paper Video This codebase conta

Google 1k Jan 09, 2023
A scikit-learn compatible neural network library that wraps PyTorch

A scikit-learn compatible neural network library that wraps PyTorch. Resources Documentation Source Code Examples To see more elaborate examples, look

4.9k Jan 03, 2023
Neural Network Libraries

Neural Network Libraries Neural Network Libraries is a deep learning framework that is intended to be used for research, development and production. W

Sony 2.6k Dec 30, 2022
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

Shihua Huang 23 Jul 22, 2022
Trying to understand alias-free-gan.

alias-free-gan-explanation Trying to understand alias-free-gan in my own way. [Chinese Version 中文版本] CC-BY-4.0 License. Tzu-Heng Lin motivation of thi

Tzu-Heng Lin 12 Mar 17, 2022
An extremely simple, intuitive, hardware-friendly, and well-performing network structure for LiDAR semantic segmentation on 2D range image. IROS21

FIDNet_SemanticKITTI Motivation Implementing complicated network modules with only one or two points improvement on hardware is tedious. So here we pr

YimingZhao 54 Dec 12, 2022
Deep learning model, heat map, data prepo

deep learning model, heat map, data prepo

Pamela Dekas 1 Jan 14, 2022
An air quality monitoring service with a Raspberry Pi and a SDS011 sensor.

Raspberry Pi Air Quality Monitor A simple air quality monitoring service for the Raspberry Pi. Installation Clone the repository and run the following

rydercalmdown 24 Dec 09, 2022
Breast cancer is been classified into benign tumour and malignant tumour.

Breast cancer is been classified into benign tumour and malignant tumour. Logistic regression is applied in this model.

1 Feb 04, 2022
BraTs-VNet - BraTS(Brain Tumour Segmentation) using V-Net

BraTS(Brain Tumour Segmentation) using V-Net This project is an approach to dete

Rituraj Dutta 7 Nov 27, 2022
Implements an infinite sum of poisson-weighted convolutions

An infinite sum of Poisson-weighted convolutions Kyle Cranmer, Aug 2018 If viewing on GitHub, this looks better with nbviewer: click here Consider a v

Kyle Cranmer 26 Dec 07, 2022
A (PyTorch) imbalanced dataset sampler for oversampling low frequent classes and undersampling high frequent ones.

Imbalanced Dataset Sampler Introduction In many machine learning applications, we often come across datasets where some types of data may be seen more

Ming 2k Jan 08, 2023
Sketch-Based 3D Exploration with Stacked Generative Adversarial Networks

pix2vox [Demonstration video] Sketch-Based 3D Exploration with Stacked Generative Adversarial Networks. Generated samples Single-category generation M

Takumi Moriya 232 Nov 14, 2022
Code for "Hierarchical Skills for Efficient Exploration" HSD-3 Algorithm and Baselines

Hierarchical Skills for Efficient Exploration This is the source code release for the paper Hierarchical Skills for Efficient Exploration. It contains

Facebook Research 38 Dec 06, 2022
Codes for ACL-IJCNLP 2021 Paper "Zero-shot Fact Verification by Claim Generation"

Zero-shot-Fact-Verification-by-Claim-Generation This repository contains code and models for the paper: Zero-shot Fact Verification by Claim Generatio

Liangming Pan 47 Jan 01, 2023