Emotion-conditioned music generation using a transformer-based model.

Last update: Nov 09, 2022

Overview

This is the official repository of EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation. The paper was accepted at the International Society for Music Information Retrieval Conference (ISMIR) 2021.

Note: We release the transcribed MIDI files. For the audio, due to copyright restrictions, we release only the YouTube IDs of the tracks and the timestamps of the clips. You can use an open-source crawler to obtain the audio files.

Use EMOPIA with MusPy

Install MusPy:

pip install muspy

Use it in your script

import muspy

emopia = muspy.EMOPIADataset("data/emopia/", download_and_extract=True)
emopia.convert()
music = emopia[0]
print(music.annotations[0].annotation)

You can get the label of the piece of music:

{'emo_class': '1', 'YouTube_ID': '0vLPYiPN7qY', 'seg_id': '0'}

emo_class: ['1', '2', '3', '4']
YouTube_ID: the YouTube ID of this piece of music
seg_id: the zero-based index of this clip among the clips taken from the same song.
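
As a minimal sketch of how these labels might be consumed, the helper below converts an annotation dictionary like the one above into typed fields and derives binary arousal and valence tags (1 = high, 2 = low). The quadrant layout (Q1: high valence, high arousal; Q2: low valence, high arousal; Q3: low valence, low arousal; Q4: high valence, low arousal) is assumed from the usual four-quadrant valence-arousal convention, and the name `parse_emopia_label` is hypothetical; it is not part of MusPy or this repository.

```python
# Hypothetical helper; not part of MusPy or the EMOPIA code base.
# Assumed quadrant layout on the valence-arousal plane:
#   Q1: high valence, high arousal   Q2: low valence, high arousal
#   Q3: low valence, low arousal     Q4: high valence, low arousal
HIGH_AROUSAL = {1, 2}
HIGH_VALENCE = {1, 4}


def parse_emopia_label(annotation):
    """Convert a raw annotation dict (string values) into typed fields."""
    quadrant = int(annotation["emo_class"])
    if quadrant not in (1, 2, 3, 4):
        raise ValueError(f"unexpected emo_class: {annotation['emo_class']!r}")
    return {
        "quadrant": quadrant,
        "youtube_id": annotation["YouTube_ID"],
        "segment": int(annotation["seg_id"]),  # zero-based clip index
        # Binary tags: 1 = high, 2 = low.
        "arousal_tag": 1 if quadrant in HIGH_AROUSAL else 2,
        "valence_tag": 1 if quadrant in HIGH_VALENCE else 2,
    }


label = parse_emopia_label(
    {"emo_class": "1", "YouTube_ID": "0vLPYiPN7qY", "seg_id": "0"}
)
print(label)
```

Keeping the quadrant sets as module-level constants makes the assumed layout easy to adjust if it differs from the one used in the paper.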

For more usage details, please refer to MusPy.

Emotion Classification

For the classification models and code, please refer to this repo.

Conditional Generation

Environment

Install PyTorch and fast-transformers:
- torch==1.7.0 (install the build that matches your CUDA version)
- pytorch-fast-transformers:
```
pip install --user pytorch-fast-transformers 
```
  or refer to the original repository.

Other requirements:

pip install -r requirements.txt

Usage

Inference

Download the checkpoints and put them into exp/

Manually:

By command (install gdown first: pip install gdown):

# baseline
gdown --id 1Q9vQYnNJ0hXBFwcxdWQgDNmzoW3MLl3h --output exp/baseline.zip

# no-pretrained transformer
gdown --id 1ZULJgBRu2Wb3jxFmGfAHP1v_tjoryFM7 --output exp/no-pretrained_transformer.zip

# pretrained transformer
gdown --id 19Seq18b2JNzOamEQMG1uarKjj27HJkHu --output exp/pretrained_transformer.zip
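
The commands above only download the archives. A minimal sketch for unpacking them with Python's standard `zipfile` module follows. It assumes each archive should be extracted directly into exp/; the function name `extract_checkpoints` is hypothetical and not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_checkpoints(exp_dir="exp"):
    """Extract every .zip archive found in exp_dir into exp_dir itself.

    Assumption: the checkpoint folders are expected directly under exp_dir.
    Returns the names of the archives that were extracted.
    """
    exp_path = Path(exp_dir)
    if not exp_path.is_dir():
        return []
    extracted = []
    for archive in sorted(exp_path.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(exp_path)
        extracted.append(archive.name)
    return extracted
```

Calling `extract_checkpoints()` from the repository root after the downloads finish would unpack all three archives in one pass.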

Inference options:

num_songs: the number of MIDI files to generate.
out_dir: the folder where the generated MIDI files are saved. If not specified, they are saved to exp/MODEL_YOU_USED/gen_midis/.
task_type: the conditioning task; it must match the task used during training.
- '4-cls' for four-class conditioning
- 'Arousal' for conditioning on arousal only
- 'Valence' for conditioning on valence only
- 'ignore' for no conditioning
emo_tag: the target emotion class.
- If task_type is '4-cls', emo_tag can be 1, 2, 3, or 4, corresponding to Q1, Q2, Q3, and Q4.
- If task_type is 'Arousal', emo_tag can be 1 (high arousal) or 2 (low arousal).
- If task_type is 'Valence', emo_tag can be 1 (high valence) or 2 (low valence).

Inference

python main_cp.py --mode inference --task_type 4-cls --load_ckt CHECKPOINT_FOLDER --load_ckt_loss 25 --num_songs 10 --emo_tag 1
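
The option rules listed above can be checked before launching a run. The sketch below validates the task_type and emo_tag combination and assembles the argument list for main_cp.py. The helper name `build_inference_command` is hypothetical, the `--out_dir` flag spelling is assumed from the option name, and omitting emo_tag for 'ignore' is also an assumption rather than documented behavior.

```python
import shlex

# Allowed emo_tag values per task_type, as listed in the inference options.
VALID_TAGS = {
    "4-cls": {1, 2, 3, 4},  # Q1..Q4
    "Arousal": {1, 2},      # 1 = high, 2 = low
    "Valence": {1, 2},      # 1 = high, 2 = low
    "ignore": {None},       # assumption: no emotion tag when unconditioned
}


def build_inference_command(task_type, ckpt_folder, ckpt_loss,
                            num_songs, emo_tag=None, out_dir=None):
    """Return the argv list for one inference run (hypothetical helper)."""
    if task_type not in VALID_TAGS:
        raise ValueError(f"unknown task_type: {task_type!r}")
    if emo_tag not in VALID_TAGS[task_type]:
        raise ValueError(f"emo_tag {emo_tag!r} is invalid for {task_type!r}")
    cmd = ["python", "main_cp.py", "--mode", "inference",
           "--task_type", task_type,
           "--load_ckt", ckpt_folder,
           "--load_ckt_loss", str(ckpt_loss),
           "--num_songs", str(num_songs)]
    if emo_tag is not None:
        cmd += ["--emo_tag", str(emo_tag)]
    if out_dir is not None:
        cmd += ["--out_dir", out_dir]  # flag spelling assumed
    return cmd


cmd = build_inference_command("4-cls", "CHECKPOINT_FOLDER", 25, 10, emo_tag=1)
print(shlex.join(cmd))
```

Returning a list rather than a string lets the result be passed directly to `subprocess.run` without shell quoting concerns.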

Train the model by yourself

Prepare the data by following the steps.
Training options:

exp_name: the name of the folder where the checkpoints will be saved.
data_parallel: whether to use data parallelism to speed up training (0: off, 1: on).
task_type: the conditioning task:
- '4-cls' for four-class conditioning
- 'Arousal' for conditioning on arousal only
- 'Valence' for conditioning on valence only
- 'ignore' for no conditioning
a. Train on EMOPIA only (the no-pretrained transformer in the paper):
```
  python main_cp.py --path_train_data emopia --exp_name YOUR_EXP_NAME --load_ckt none
```
b. Pre-train the transformer on AILabs17k:
```
  python main_cp.py --path_train_data ailabs --exp_name YOUR_EXP_NAME --load_ckt none --task_type ignore
```
c. Fine-tune the transformer on EMOPIA. For example, to fine-tune from the pre-trained model stored in 0309-1857 with loss = 30:
```
  python main_cp.py --path_train_data emopia --exp_name YOUR_EXP_NAME --load_ckt 0309-1857 --load_ckt_loss 30
```

Baseline

The baseline code is based on the work Learning to Generate Music with Sentiment.
According to its author, the model works best with a 4096-unit LSTM, but training then takes 12 days. Because of limited computational resources, we used 512 units instead of 4096.
To evaluate the baseline against our model, we expanded its target emotion classes from positive/negative to the four quadrants (4Q).

Authors

The paper is a collaboration with Joann, SeungHeon and Nabin. This repository is maintained by Joann and me.

License

The EMOPIA dataset is released under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). It is provided primarily for research purposes and may not be used for commercial purposes. When sharing results based on EMOPIA, any act that defames the original music owners is strictly prohibited.

The hand-drawn piano in the logo comes from Adobe Stock. The author is Burak. I purchased it under a standard license.

Cite the dataset

@inproceedings{EMOPIA,
         author = {Hung, Hsiao-Tzu and Ching, Joann and Doh, Seungheon and Kim, Nabin and Nam, Juhan and Yang, Yi-Hsuan},
         title = {{EMOPIA}: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation},
         booktitle = {Proc. Int. Society for Music Information Retrieval Conf.},
         year = {2021}
}


Owner

hung anna
