The mini-MusicNet dataset

Overview

mini-MusicNet

A music-domain dataset for multi-label classification

Music transcription is sequence-to-sequence prediction problem: given an audio performance, we must predict a corresponding sequence of notes. If we ignore correlations in the sequence of notes, music transcription simplifies to a multi-label classification problem. Given an audio performance, we are tasked with predicting the set of notes present in an audio performance at a given time. The mini-MusicNet dataset is derived from the MusicNet dataset, providing a scaled-down, pre-processed subset of MusicNet suitable for multi-label classification.

This repository provides information for downloading and interacting with mini-MusicNet, as well as some algorithmic baselines for multi-label classification with mini-MusicNet.

About mini-MusicNet

Download. The mini-MusicNet dataset can be downloaded here. To follow the tutorial in the next section or run explore.ipynb, please download mini-MusicNet to the minimusic sub-directory of the root of this repository.

This dataset consists of n = 82,500 data points with d = 4,096 features and k = 128 binary labels per datapoint. Each data point is an approximately 9ms audio clip: these clips are sampled at regular intervals from the underlying MusicNet dataset. Each clip is normalized to amplitudes in [-1,1]. The label on a datapoint is a binary k-dimensional (multi-hot) vector that indicates the notes being performed at the center of the audio clip. We define train, validation, and test splits with n = 62,500, 10,000, and 10,000 data points respectively. The mini-MusicNet dataset can be acquired here. Alternatively, you can use construct.py to reconstruct mini-MusicNet from a copy of MusicNet.

Exploring mini-MusicNet

To get started, let's load and visualize the training data. The contents of this section are summarized in the explore.ipynb notebook.

import numpy as np
import matplotlib.pyplot as plt

Xtrain = np.load('minimusic/audio-train.npy')
Ytrain = np.load('minimusic/labels-train.npy')

fig, ax = plt.subplots(1, 2, figsize=(10,2))
ax[0].set_title('Raw acoustic features')
ax[0].plot(Xtrain[0])
ax[1].set_title('Fourier transform of the raw features')
ax[1].plot(np.abs(np.fft.rfft(Xtrain[0])[0:256])) # clip to 256 features for easier visualization

Now let's see how linear (ridge) regression performs on the raw audio features. We'll measure results using average precision.

from sklearn.metrics import average_precision_score

Xtest = np.load('minimusic/audio-test.npy')
Ytest = np.load('minimusic/labels-test.npy')

R = .001
beta = np.dot(np.linalg.inv(np.dot(Xtrain.T,Xtrain) + R*np.eye(Xtrain.shape[1])),np.dot(Xtrain.T,Ytrain))

print('Train AP:', round(average_precision_score(Ytrain, np.dot(Xtrain, beta), average='micro'), 2))
print('Test AP:', round(average_precision_score(Ytest, np.dot(Xtest, beta), average='micro'), 2))

Train AP: 0.19 Test AP: 0.04

That's not so great. We can do much better by transforming our audio wave to the Fourier domain.

Xtrainfft = np.abs(np.fft.rfft(Xtrain))
Xtestfft = np.abs(np.fft.rfft(Xtest))

R = .001
beta = np.dot(np.linalg.inv(np.dot(Xtrainfft.T,Xtrainfft) + R*np.eye(Xtrainfft.shape[1])),np.dot(Xtrainfft.T,Ytrain))

print('Train AP:', round(average_precision_score(Ytrain, np.dot(Xtrainfft, beta), average='micro'), 2))
print('Test AP:', round(average_precision_score(Ytest, np.dot(Xtestfft, beta), average='micro'), 2))

Train AP: 0.57 Test AP: 0.47

Finally, it can often be more revealing to look at a precision-recall curve, rather than the scalar average precision (the area under the P/R curve). Let's see what our full P/R curve looks like for ridge regression on Fourier features.

fig, ax = plt.subplots(1, 2, figsize=(10,4))
ax[0].set_title('Train P/R Curve')
plot_pr_curve(ax[0], Ytrain, np.dot(Xtrainfft, beta))
ax[1].set_title('Test P/R Curve')
plot_pr_curve(ax[1], Ytest, np.dot(Xtestfft, beta))

And that's enough to get us started! We hope that mini-MusicNet can be a useful resource for empirical work in multi-label classification.

References

For further information about MusicNet, or if you want to cite this work, please see:

@inproceedings{thickstun2017learning,
  author    = {John Thickstun and Zaid Harchaoui and Sham M. Kakade},
  title     = {Learning Features of Music from Scratch},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
}
Owner
John Thickstun
John Thickstun
This is the official implementation for "Do Transformers Really Perform Bad for Graph Representation?".

Graphormer By Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng*, Guolin Ke, Di He*, Yanming Shen and Tie-Yan Liu. This repo is the official impl

Microsoft 1.3k Dec 26, 2022
EFENet: Reference-based Video Super-Resolution with Enhanced Flow Estimation

EFENet EFENet: Reference-based Video Super-Resolution with Enhanced Flow Estimation Code is a bit messy now. I woud clean up soon. For training the EF

Yaping Zhao 19 Nov 05, 2022
OneShot Learning-based hotword detection.

EfficientWord-Net Hotword detection based on one-shot learning Home assistants require special phrases called hotwords to get activated (eg:"ok google

ANT-BRaiN 102 Dec 25, 2022
Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy

Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy Simplex Algorithm is a popular algorithm for linear programmi

Reda BELHAJ 2 Oct 12, 2022
Optimizaciones incrementales al problema N-Body con el fin de evaluar y comparar las prestaciones de los traductores de Python en el ámbito de HPC.

Python HPC Optimizaciones incrementales de N-Body (all-pairs) con el fin de evaluar y comparar las prestaciones de los traductores de Python en el ámb

Andrés Milla 12 Aug 04, 2022
Camera-caps - Examine the camera capabilities for V4l2 cameras

camera-caps This is a graphical user interface over the v4l2-ctl command line to

Jetsonhacks 25 Dec 26, 2022
Simple and Distributed Machine Learning

Synapse Machine Learning SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. Sy

Microsoft 3.9k Dec 30, 2022
Code to reproduce the results for Statistically Robust Neural Network Classification, published in UAI 2021

Code to reproduce the results for Statistically Robust Neural Network Classification, published in UAI 2021

1 Jun 02, 2022
Models, datasets and tools for Facial keypoints detection

Template for Data Science Project This repo aims to give a robust starting point to any Data Science related project. It contains readymade tools setu

girafe.ai 1 Feb 11, 2022
METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)

Nautilus-OCR The National Library of Luxembourg (BnL) started its first initiative in digitizing newspapers, with layout recognition and OCR on articl

National Library of Luxembourg 36 Dec 05, 2022
Official Implementation of "Tracking Grow-Finish Pigs Across Large Pens Using Multiple Cameras"

Multi Camera Pig Tracking Official Implementation of Tracking Grow-Finish Pigs Across Large Pens Using Multiple Cameras CVPR2021 CV4Animals Workshop P

44 Jan 06, 2023
Continual Learning of Electronic Health Records (EHR).

Continual Learning of Longitudinal Health Records Repo for reproducing the experiments in Continual Learning of Longitudinal Health Records (2021). Re

Jacob 7 Oct 21, 2022
Many Class Activation Map methods implemented in Pytorch for CNNs and Vision Transformers. Including Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM and XGrad-CAM

Class Activation Map methods implemented in Pytorch pip install grad-cam ⭐ Tested on many Common CNN Networks and Vision Transformers. ⭐ Includes smoo

Jacob Gildenblat 6.6k Jan 06, 2023
Contains supplementary materials for reproduce results in HMC divergence time estimation manuscript

Scalable Bayesian divergence time estimation with ratio transformations This repository contains the instructions and files to reproduce the analyses

Suchard Research Group 1 Sep 21, 2022
Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

PyTorch-High-Res-Stereo-Depth-Estimation Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch. Stereo dep

Ibai Gorordo 26 Nov 24, 2022
GyroSPD: Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices

GyroSPD Code for the paper "Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices" accepted at NeurIPS 2021. Re

Federico Lopez 12 Dec 12, 2022
Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization

Hybrid solving process for combinatorial optimization problems Combinatorial optimization has found applications in numerous fields, from aerospace to

117 Dec 13, 2022
利用python脚本实现微信、支付宝账单的合并,并保存到excel文件实现自动记账,可查看可视化图表。

KeepAccounts_v2.0 KeepAccounts.exe和其配套表格能够实现微信、支付宝官方导出账单的读取合并,为每笔帐标记类型,并按月份和类型生成可视化图表。再也不用消费一笔记一笔,每月仅需10分钟,记好所有的帐。 作者: MickLife Bilibili: https://spac

159 Jan 01, 2023
Dual Attention Network for Scene Segmentation (CVPR2019)

Dual Attention Network for Scene Segmentation(CVPR2019) Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang,and Hanqing Lu Introduction W

Jun Fu 2.2k Dec 28, 2022
PyTorch implementation of "Image-to-Image Translation Using Conditional Adversarial Networks".

pix2pix-pytorch PyTorch implementation of Image-to-Image Translation Using Conditional Adversarial Networks. Based on pix2pix by Phillip Isola et al.

mrzhu 383 Dec 17, 2022