Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Last update: Dec 05, 2022

Overview

Text2Music Emotion Embedding

Reference

Emotion Embedding Spaces for Matching Music to Stories, ISMIR 2021 [paper]

-- Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, and Xavier Serra

@inproceedings{won2021emotion,
  title={Emotion embedding spaces for matching music to stories},
  author={Won, Minz and Salamon, Justin and Bryan, Nicholas J. and Mysore, Gautham J. and Serra, Xavier},
  booktitle={ISMIR},
  year={2021}
}

Requirements

conda create -n YOUR_ENV_NAME python=3.7
conda activate YOUR_ENV_NAME
pip install -r requirements.txt

Data

Collect the audio files of the AudioSet mood subset (link).
Read the audio files and store them in .npy format.
Other relevant data, including Alm's dataset (original link), the ISEAR dataset (original link), emotion embeddings, pretrained Word2Vec, and data splits, are all available here (link).
Unzip ttm_data.tar.gz and place the extracted data folder under text2music-emotion-embedding/.
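
The extraction step can also be scripted. The sketch below is one possible way to do it with Python's standard tarfile module. It assumes the archive unpacks to a top-level folder named data (as the data/pretrained/... path in the test command suggests); the function name extract_ttm_data is hypothetical and not part of the repository.

```python
import tarfile
from pathlib import Path


def extract_ttm_data(archive_path, repo_root):
    """Extract ttm_data.tar.gz into repo_root and return the data folder.

    Assumption: the archive unpacks to a top-level folder named "data".
    """
    repo_root = Path(repo_root)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Use the safer "data" extraction filter when this Python supports it.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(repo_root, filter="data")
        else:
            archive.extractall(repo_root)
    data_dir = repo_root / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"no 'data' folder found under {repo_root}")
    return data_dir


# Example call (both paths are placeholders):
# extract_ttm_data("ttm_data.tar.gz", "text2music-emotion-embedding")
```

If the archive turns out to use a different top-level folder name, the check at the end fails early instead of letting training start with a missing data folder.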

Training

Here is an example for training a metric learning model.

python3 src/metric_learning/main.py \
        --dataset 'isear' \
        --num_branches 3 \
        --data_path YOUR_DATA_PATH_TO_AUDIOSET

For more examples, check the bash files under the scripts folder.

Test

Here is an example of running the test.

python3 src/metric_learning/main.py \
        --mode 'TEST' \
        --dataset 'alm' \
        --model_load_path 'data/pretrained/alm_cross.ckpt' \
        --data_path 'YOUR_DATA_PATH_TO_AUDIOSET'

Pretrained three-branch metric learning models (alm_cross.ckpt and isear_cross.ckpt) are included in ttm_data.tar.gz. The test command above can be run as-is once the unzipped data folder is placed under text2music-emotion-embedding/.
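
To evaluate both pretrained checkpoints in turn, the test command can be assembled programmatically. The following is a minimal sketch: the flags are the ones shown in the test example above, while the helper name build_test_command and the location of isear_cross.ckpt (assumed to sit next to alm_cross.ckpt in data/pretrained/) are assumptions.

```python
from pathlib import Path


def build_test_command(dataset, data_path, repo_root="."):
    """Return the argument list for testing one pretrained checkpoint.

    Assumption: checkpoints are stored as data/pretrained/<dataset>_cross.ckpt.
    """
    checkpoint = Path("data") / "pretrained" / f"{dataset}_cross.ckpt"
    # Fail early if the checkpoint is not where the README example expects it.
    if not (Path(repo_root) / checkpoint).is_file():
        raise FileNotFoundError(f"missing checkpoint: {checkpoint}")
    return [
        "python3", "src/metric_learning/main.py",
        "--mode", "TEST",
        "--dataset", dataset,
        "--model_load_path", checkpoint.as_posix(),
        "--data_path", str(data_path),
    ]


# Example (run from the repository root; the data path is a placeholder):
# import subprocess
# for name in ("alm", "isear"):
#     subprocess.run(build_test_command(name, "YOUR_DATA_PATH_TO_AUDIOSET"), check=True)
```

Returning an argument list rather than a shell string avoids quoting issues when the AudioSet path contains spaces.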

Visualization

The embedding distribution of each model can be projected onto a 2-dimensional space. We used uniform manifold approximation and projection (UMAP) to visualize the distribution. UMAP is known to preserve more of the global structure than t-SNE.

Demo

Please try some examples produced by the three-branch metric learning model [Soundcloud].

License

Some License
