An AI for Music Generation

Last update: Dec 31, 2022

Overview

MuseGAN

MuseGAN is a project on music generation. In a nutshell, we aim to generate polyphonic music of multiple tracks (instruments). The proposed models are able to generate music either from scratch, or by accompanying a track given a priori by the user.

We train the model on data collected from the Lakh Pianoroll Dataset to generate pop song phrases consisting of bass, drums, guitar, piano and strings tracks.

Sample results are available here.

Looking for a PyTorch version? Check out this repository.

Prerequisites

Below we assume the working directory is the repository root.

Install dependencies

Using pipenv (recommended)

Make sure pipenv is installed. (If not, simply run pip install pipenv.)

# Install the dependencies
pipenv install
# Activate the virtual environment
pipenv shell

Using pip

# Install the dependencies
pip install -r requirements.txt

Prepare training data

The training data is collected from the Lakh Pianoroll Dataset (LPD), a new multitrack pianoroll dataset.

# Download the training data
./scripts/download_data.sh
# Store the training data to shared memory
./scripts/process_data.sh

You can also download the training data manually (train_x_lpd_5_phr.npz).

As pianoroll matrices are generally sparse, we store only the indices of nonzero elements and the array shape in an npz file to save space, and restore the original array later. To save training data in this format, simply run np.savez_compressed("data.npz", shape=data.shape, nonzero=data.nonzero())
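
The save-and-restore round trip can be sketched as follows. This is a minimal illustration, not the repository's own loading code: the helper names save_sparse and load_sparse are hypothetical, and the restored array is assumed to be boolean (a binarized pianoroll).

```python
import os
import tempfile

import numpy as np


def save_sparse(path, data):
    """Store only the array shape and the indices of nonzero elements."""
    np.savez_compressed(path, shape=data.shape, nonzero=data.nonzero())


def load_sparse(path):
    """Rebuild the dense boolean array from the stored shape and indices."""
    with np.load(path) as f:
        data = np.zeros(tuple(f["shape"]), dtype=bool)
        # f["nonzero"] holds one row of indices per array dimension
        data[tuple(f["nonzero"])] = True
    return data


# Toy stand-in for a pianoroll (assumed layout: time steps x pitches x tracks)
original = np.zeros((4, 6, 2), dtype=bool)
original[0, 1, 0] = True
original[2, 3, 1] = True

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.npz")
    save_sparse(path, original)
    restored = load_sparse(path)

print(np.array_equal(original, restored))
```

Because only the nonzero indices are kept, any non-binary values (such as velocities) would be lost in this scheme; the sketch assumes they are not needed.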

Scripts

We provide several shell scripts to make managing experiments easier. (See here for detailed documentation.)

Below we assume the working directory is the repository root.

Train a new model

Run the following command to set up a new experiment with default settings.

# Set up a new experiment
./scripts/setup_exp.sh "./exp/my_experiment/" "Some notes on my experiment"

Then modify the configuration and model parameter files to match your experimental settings.

You can either train the model:

# Train the model
./scripts/run_train.sh "./exp/my_experiment/" "0"

or run the experiment (training + inference + interpolation):

# Run the experiment
./scripts/run_exp.sh "./exp/my_experiment/" "0"

Collect training data

Run the following command to collect training data from MIDI files.

# Collect training data
./scripts/collect_data.sh "./midi_dir/" "data/train.npy"

Use pretrained models

Download pretrained models
```
# Download the pretrained models
./scripts/download_models.sh
```
You can also download the pretrained models manually (pretrained_models.tar.gz).

You can either perform inference from a trained model:

# Run inference from a pretrained model
./scripts/run_inference.sh "./exp/default/" "0"

or perform interpolation from a trained model:

# Run interpolation from a pretrained model
./scripts/run_interpolation.sh "./exp/default/" "0"
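
As background on what latent-space interpolation involves, here is a minimal sketch. It is not the code behind run_interpolation.sh: it assumes the generator takes a latent vector as input, the function names lerp and slerp and the vector size of 128 are illustrative choices, and which scheme the script actually uses is not stated here.

```python
import numpy as np


def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors, t in [0, 1]."""
    return (1.0 - t) * z0 + t * z1


def slerp(z0, z1, t):
    """Spherical linear interpolation between two latent vectors."""
    # Angle between the two vectors, computed from their unit directions
    u0 = z0 / np.linalg.norm(z0)
    u1 = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    sin_omega = np.sin(omega)
    if np.isclose(sin_omega, 0.0):
        # Nearly parallel or antiparallel vectors: fall back to lerp
        return lerp(z0, z1, t)
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / sin_omega


# Two random latent vectors (hypothetical size) and five steps between them
rng = np.random.default_rng(0)
z_start = rng.standard_normal(128)
z_end = rng.standard_normal(128)

steps = np.linspace(0.0, 1.0, 5)
path = np.stack([slerp(z_start, z_end, t) for t in steps])
print(path.shape)
```

Each row of path would then be fed to a trained generator to produce one sample along the interpolation.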

Outputs

By default, samples are generated during training. You can disable this behavior by setting save_samples_steps to zero in the configuration file (config.yaml). The generated samples are stored in the following three formats by default.

.npy: raw numpy arrays
.png: image files
.npz: multitrack pianoroll files that can be loaded by the Pypianoroll package

You can disable saving in a specific format by setting the corresponding option (save_array_samples, save_image_samples or save_pianoroll_samples) to False in the configuration file.

The generated pianorolls are stored in .npz format to save space and processing time. You can use the following code to write them into MIDI files.

from pypianoroll import Multitrack

# Load the multitrack pianoroll from the .npz file
m = Multitrack('./test.npz')
# Write it out as a MIDI file
m.write('./test.mid')

Sample Results

Some sample results can be found in the ./exp/ directory. More samples can be downloaded from the following links.

sample_results.tar.gz (54.7 MB): sample inference and interpolation results
training_samples.tar.gz (18.7 MB): sample generated results at different steps

Papers

Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation
Hao-Wen Dong and Yi-Hsuan Yang
in Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), 2018.
[website] [arxiv] [paper] [slides(long)] [slides(short)] [poster] [code]

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Hao-Wen Dong,* Wen-Yi Hsiao,* Li-Chia Yang and Yi-Hsuan Yang (*equal contribution)
in Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018.
[website] [arxiv] [paper] [slides] [code]

MuseGAN: Demonstration of a Convolutional GAN Based Model for Generating Multi-track Piano-rolls
Hao-Wen Dong,* Wen-Yi Hsiao,* Li-Chia Yang and Yi-Hsuan Yang (*equal contribution)
in Late-Breaking Demos of the 18th International Society for Music Information Retrieval Conference (ISMIR), 2017. (two-page extended abstract)
[paper] [poster]

Owner

Hao-Wen Dong
