A simple project that separates a mixture of two clean voices into two individual voices.

Last update: Oct 30, 2022

Related tags

Text Data & NLP speech_separation_PIT

Overview

Speech Separation

Result example (click to hear the voices): mix || predicted voice 1 || predicted voice 2

Mix Spectrogram

Predicted Voice 1 Spectrogram

Predicted Voice 2 Spectrogram

1. Quick train

Step 1:

Download LibriMixSmall, extract it and move it to the root of the project.

Step 2:

./train.sh

Training takes only about 2-3 hours on a typical GPU. After each epoch, the predictions are written to the ./viz_outout folder.

2. Quick inference

./inference.sh

The result will be written to the ./viz_outout folder.

3. More detail

Input: the complex spectrogram computed from the raw mixed audio signal.
Output: a complex ratio mask (cRM) for each voice. The mask is applied to the mixture's complex spectrogram to obtain that voice's complex spectrogram, which is then converted back to a waveform.
Model: a simplified version of this implementation of the model described in the paper Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation.
Loss function: Permutation Invariant Training (PIT) loss and pairwise negative SI-SDR loss (the more state-of-the-art option).
Dataset: a small version of the LibriMix dataset, taken from LibriMixSmall.
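
To make the output and loss descriptions above more concrete, here is a minimal sketch of applying a complex ratio mask and of a permutation invariant negative SI-SDR loss. It is not the code from this repository. The function names (apply_crm, neg_si_sdr, pit_loss) are hypothetical, and it uses plain Python lists so it runs without dependencies, whereas real training code would operate on batched tensors.

```python
import itertools
import math


def apply_crm(mix_spec, crm):
    """Multiply each time-frequency bin of the mixture by its complex mask value.

    mix_spec and crm are equally shaped lists of lists of complex numbers.
    Complex multiplication scales the magnitude and rotates the phase of each bin.
    """
    return [[m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(crm, mix_spec)]


def neg_si_sdr(est, ref, eps=1e-8):
    """Negative scale-invariant SDR (in dB) between two equal-length signals."""
    est_mean = sum(est) / len(est)
    ref_mean = sum(ref) / len(ref)
    est = [e - est_mean for e in est]
    ref = [r - ref_mean for r in ref]
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [(dot / ref_energy) * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target) + eps
    noise_energy = sum(n * n for n in noise) + eps
    return -10.0 * math.log10(target_energy / noise_energy)


def pit_loss(ests, refs):
    """Lowest mean pairwise loss over all assignments of estimates to references.

    Returns (loss, perm), where perm[i] is the reference index matched
    to estimate i.
    """
    pairwise = [[neg_si_sdr(e, r) for r in refs] for e in ests]
    best_loss, best_perm = math.inf, None
    for perm in itertools.permutations(range(len(refs))):
        loss = sum(pairwise[i][p] for i, p in enumerate(perm)) / len(perm)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Calling pit_loss(ests, refs) with the two predicted waveforms and the two clean references returns the same loss whichever order the model emits the voices in, which is the point of permutation invariant training. With only two voices there are just two permutations to check, so the exhaustive search is cheap.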

4. Current problem

Because the dataset is kept small for fast training, the model overfits the training set somewhat. Training on a larger dataset will likely help. Some suggestions:

Use the original LibriMix dataset, which is much larger (around 60 times the size of the one I trained on).
Use this work to download much more in-the-wild data, and use datasets/VoiceMixtureDataset.py instead of the Libri one that I am currently using. P.S. I have trained with it, and it works too.
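
For readers who want to build their own training pairs from clean recordings, the sketch below shows one common way to create a two-voice mixture. It is only an illustration and makes no claim about what datasets/VoiceMixtureDataset.py actually does. The function name mix_two_voices and the gain2 parameter are hypothetical, and waveforms are assumed to be lists of floats in the range [-1, 1].

```python
def mix_two_voices(voice1, voice2, gain2=1.0):
    """Sum two clean waveforms into a mixture and return the matching targets.

    Both signals are trimmed to the shorter length. If the sum would clip,
    the mixture and both sources are scaled by the same factor, so the
    mixture still equals the sum of the two returned sources.
    """
    n = min(len(voice1), len(voice2))
    src1 = list(voice1[:n])
    src2 = [gain2 * s for s in voice2[:n]]
    mixture = [a + b for a, b in zip(src1, src2)]

    # Rescale only when the peak exceeds full scale.
    peak = max((abs(s) for s in mixture), default=0.0)
    scale = 1.0 / peak if peak > 1.0 else 1.0

    mixture = [scale * m for m in mixture]
    sources = [[scale * s for s in src1], [scale * s for s in src2]]
    return mixture, sources
```

The mixture is the model input and the two returned sources are the training targets. Varying gain2 per example is one simple way to expose the model to different relative loudness levels.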

Owner

vuthede
