Neural network based speaker embedder

Last update: Dec 29, 2022

What is deepaudio-speaker?

Deepaudio-speaker is a framework for training neural network based speaker embedders. It supports online audio augmentation thanks to torch-audiomentations. It includes, or will include, popular neural network architectures and loss functions used for speaker embedders.
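"Online" augmentation means each training example is perturbed on the fly, with fresh random parameters every time it is loaded, rather than precomputing an augmented copy of the dataset. As a minimal sketch of one common perturbation, additive noise at a random signal-to-noise ratio, here is a pure-Python version. The function names, the SNR range, and the probability `p` are hypothetical; this is not the torch-audiomentations API, which operates on batched tensors.

```python
import math
import random


def add_noise_at_snr(signal, noise, snr_db):
    """Mix `noise` into `signal` so that the result has the requested SNR in dB.

    Hypothetical helper for illustration only (not torch-audiomentations' API).
    """
    if not signal:
        return []
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as signal")
    noise = noise[: len(signal)]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        return list(signal)
    # Choose `scale` so that 10 * log10(signal_power / (scale**2 * noise_power)) == snr_db.
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(signal, noise)]


def random_augment(signal, noise, min_snr_db=5.0, max_snr_db=20.0, p=0.5, rng=random):
    """With probability `p`, add noise at an SNR drawn uniformly on every call."""
    if rng.random() >= p:
        return list(signal)
    return add_noise_at_snr(signal, noise, rng.uniform(min_snr_db, max_snr_db))
```

Because the SNR is re-drawn on every call, the model sees a different noisy version of the same utterance in each epoch.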

To make features such as mixed-precision training, multi-node training, and TPU training easy to use, I introduced PyTorch Lightning and Hydra into this framework (as pyannote-audio and openspeech do).
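Hydra builds the final configuration in two steps: an override like `model=ecapa` selects a whole config from a config group, and a dotted override like `model.channels=1024` then changes one field inside it. The toy re-implementation below only illustrates that idea; it is not Hydra's API, and the contents of `CONFIG_GROUPS` (including the default `channels` value) are hypothetical stand-ins for the YAML files a real project would ship.

```python
import copy

# Hypothetical config groups; in Hydra these would be YAML files such as conf/model/ecapa.yaml.
CONFIG_GROUPS = {
    "dataset": {"voxceleb2": {"name": "voxceleb2", "dataset_path": None}},
    "model": {"ecapa": {"name": "ecapa", "channels": 512}},  # hypothetical default
    "trainer": {"gpu": {"accelerator": "gpu", "devices": 1}},
}


def parse_value(raw):
    """Convert an override string to int, float, bool, or leave it as a string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw


def compose(overrides, groups=CONFIG_GROUPS):
    """Apply `group=option` selections first, then dotted `a.b=value` field overrides."""
    config = {}
    selections = [o for o in overrides if "." not in o.partition("=")[0]]
    fields = [o for o in overrides if "." in o.partition("=")[0]]
    for item in selections:
        key, _, raw = item.partition("=")
        if key in groups:
            # Copy so the shared group definitions are never mutated.
            config[key] = copy.deepcopy(groups[key][raw])
        else:
            config[key] = parse_value(raw)
    for item in fields:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return config
```

For example, `compose(["model=ecapa", "model.channels=1024"])` yields `{"model": {"name": "ecapa", "channels": 1024}}` in this sketch.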

Deepaudio-tts is coming soon.

Installation

conda create -n deepaudio python=3.8.5
conda activate deepaudio
conda install numpy cffi
conda install libsndfile=1.0.28 -c conda-forge
git clone https://github.com/deepaudio/deepaudio-speaker.git
cd deepaudio-speaker
pip install -e .

Get Started

Supported Datasets

VoxCeleb2

Download the VoxCeleb datasets and follow this script to obtain a directory structure like this:

/path/to/voxceleb/voxceleb1/dev/wav/id10001/1zcIwhmdeo4/00001.wav
/path/to/voxceleb/voxceleb1/test/wav/id10270/5r0dWxy17C8/00001.wav
/path/to/voxceleb/voxceleb2/dev/aac/id00012/21Uxsk56VDQ/00001.m4a
/path/to/voxceleb/voxceleb2/test/aac/id00017/01dfn2spqyE/00001.m4a
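With this layout, the speaker label of each utterance is the `idXXXXX` directory two levels above the audio file. A minimal sketch of collecting (speaker, file) pairs and mapping speakers to class indices with `pathlib`; the function names and default arguments are hypothetical and are not deepaudio-speaker's dataset code.

```python
from pathlib import Path


def list_utterances(root, subset="voxceleb2/dev/aac", pattern="*.m4a"):
    """Return sorted (speaker_id, path) pairs for root/subset/<speaker>/<video>/<utterance>."""
    base = Path(root) / subset
    pairs = []
    for path in base.glob(f"*/*/{pattern}"):
        # <speaker>/<video>/<file>: the speaker id is the grandparent directory name.
        speaker_id = path.parent.parent.name
        pairs.append((speaker_id, path))
    return sorted(pairs)


def speaker_to_index(pairs):
    """Map each speaker id to a contiguous integer class index, e.g. for a softmax-style loss."""
    speakers = sorted({speaker for speaker, _ in pairs})
    return {speaker: index for index, speaker in enumerate(speakers)}
```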

Training examples

Example 1: Train the ECAPA-TDNN model with fbank features on a GPU.

$ deepaudio-speaker-train  \
    dataset=voxceleb2 \
    dataset.dataset_path=/your/path/to/voxceleb2/dev/wav/ \
    model=ecapa \
    model.channels=1024 \
    feature=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=aamsoftmax
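The `criterion=aamsoftmax` override selects additive angular margin softmax. For cosine similarities cos θ_j between the embedding and each speaker's weight vector, true speaker y, margin m, and scale s, the per-sample loss is L = -log( e^{s·cos(θ_y + m)} / ( e^{s·cos(θ_y + m)} + Σ_{j≠y} e^{s·cos θ_j} ) ). A minimal single-sample sketch in plain Python; the default `margin` and `scale` values are hypothetical, not necessarily this framework's settings.

```python
import math


def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """AAM-softmax loss for one sample.

    cosines: cosine similarity between the embedding and each class weight vector.
    target:  index of the true speaker.
    """
    # Clamp for numerical safety before taking the arc cosine.
    cos_t = max(-1.0, min(1.0, cosines[target]))
    theta = math.acos(cos_t)
    logits = [scale * c for c in cosines]
    # Only the target logit gets the angular margin added to its angle.
    logits[target] = scale * math.cos(theta + margin)
    # Numerically stable log-sum-exp, then cross-entropy against the target.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]
```

With `margin=0.0` this reduces to ordinary softmax cross-entropy on scaled cosines. Production implementations usually also handle the case θ_y + m > π, where cos(θ_y + m) stops decreasing; the sketch omits that.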

Example 2: Extract speaker embeddings with a trained model.

Todo
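Since this example is still a TODO, no extraction command is documented here. As a hedged sketch of what is typically done with extracted embeddings, a speaker verification trial is commonly scored by the cosine similarity of two embedding vectors and accepted above a threshold. The function names and the threshold value are hypothetical; a real threshold would be tuned on a development set.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embeddings; higher suggests the same speaker."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Accept the trial if the score reaches a (hypothetical) threshold."""
    return cosine_score(a, b) >= threshold
```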

Model Architecture

ECAPA-TDNN: an unofficial implementation by @lawlict. See this link for more details.

ECAPA-TDNN: implemented by @joonson. See this link for more details.

ResNetSE34L: borrowed from voxceleb trainer.

ResNetSE34V2: borrowed from voxceleb trainer.

resnet101: proposed by BUT for speaker diarization. Note that the features used in this framework are different from those used with VB-HMM.
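Speaker embedders such as ECAPA-TDNN turn a variable-length sequence of frame-level features into a fixed-size utterance vector with an attentive statistics pooling layer. A simplified sketch, assuming one scalar attention score per frame supplied directly (ECAPA-TDNN's own version is channel- and context-dependent, and the scores come from a small learned network) and plain Python lists instead of tensors; the function name is hypothetical.

```python
import math


def attentive_stats_pooling(frames, scores):
    """Pool T frames of C features each into one 2*C-dimensional vector.

    frames: list of T frames, each a list of C floats.
    scores: list of T unnormalised attention scores, one per frame.
    Returns the attention-weighted mean followed by the weighted standard deviation.
    """
    # Softmax over time turns the scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    channels = len(frames[0])
    mean = [sum(w * f[c] for w, f in zip(weights, frames)) for c in range(channels)]
    second = [sum(w * f[c] ** 2 for w, f in zip(weights, frames)) for c in range(channels)]
    # Floor the variance so the square root stays well defined for constant channels.
    std = [math.sqrt(max(s - mu * mu, 1e-10)) for s, mu in zip(second, mean)]
    return mean + std
```

With equal scores this reduces to plain statistics pooling (unweighted mean and standard deviation over time).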

How to contribute to deepaudio-speaker

This is a personal project, so I don't have enough GPU resources to run many experiments. I appreciate any kind of feedback or contributions. Please feel free to open a pull request for small issues such as bug fixes or experiment results. If you have any questions, please open an issue.

Acknowledgements

I borrowed a lot of code from openspeech and pyannote-audio.
