VGGVox models for Speaker Identification and Verification trained on the VoxCeleb (1 & 2) datasets

Last update: Dec 27, 2022

Related tags

Overview

VGGVox models for speaker identification and verification

This directory contains code to import and evaluate the speaker identification and verification models pretrained on the VoxCeleb(1 & 2) datasets as described in the following papers (1 and 2):

[1] A. Nagrani*, J. S. Chung*, A. Zisserman, VoxCeleb: a large-scale speaker identification dataset, 
INTERSPEECH, 2017

[2] J. S. Chung*, A. Nagrani*, A. Zisserman, VoxCeleb2: Deep Speaker Recognition, 
INTERSPEECH, 2018

The models trained for verification map voice spectrograms to a compact Euclidean space where distances directly correspond to a measure of speaker similarity. Such embeddings can be used for tasks such as speaker verification, clustering and diarisation.

Prerequisites

[1] Matlab

[2] Matconvnet.

Installing

The easiest way to use the code in this repo is with the vl_contrib package manager. To install, follow these steps:

Install and compile matconvnet by following instructions here.
Run:

vl_contrib install VGGVox
vl_contrib setup VGGVox

You can then run the demo scripts provided to import and test the models. There are three short demo scripts. The first two scripts are for identification and verification models trained on VoxCeleb1. The third script imports and test a verification model trained on VoxCeleb2. These demos demonstrate how to evaluate the models directly on .wav audio files:

demo_vggvox_identif 
demo_vggvox_verif 
demo_vggvox_verif_voxceleb2

Models

The matconvnet models can also be downloaded directly using the following links:

Model trained for identification on VoxCeleb1

Model trained for verification on VoxCeleb1

Model trained for verification on VoxCeleb2 (this is a resnet based model)

Datasets

These models have been pretrained on the VoxCeleb (1&2) datasets. VoxCeleb contains over 1 million utterances for 7,000+ celebrities, extracted from videos uploaded to YouTube. The speakers span a wide range of different ethnicities, accents, professions and ages. The dataset can be downloaded directly from here.

Citation

If you use this code then please cite:

@InProceedings{Nagrani17,
  author       = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
  title        = "VoxCeleb: a large-scale speaker identification dataset",
  booktitle    = "INTERSPEECH",
  year         = "2017",
}


@InProceedings{Nagrani17,
  author       = "Chung, J.~S. and Nagrani, A. and Zisserman, A.",
  title        = "VoxCeleb2: Deep Speaker Recognition",
  booktitle    = "INTERSPEECH",
  year         = "2018",
}

Fixes

Note - since we take only the magnitude of the spectrogram, the matlab functions here to extract spectrograms provide mirrored spectrograms (along the freq axis). This has been fixed in later models where we chop the spectrograms in half before feeding them into the network.

VGGVox models for Speaker Identification and Verification trained on the VoxCeleb (1 & 2) datasets

Related tags

Overview

VGGVox models for speaker identification and verification

Prerequisites

Installing

Models

Datasets

Citation

Fixes

Owner

The official code of Anisotropic Stroke Control for Multiple Artists Style Transfer

PyImpetus is a Markov Blanket based feature subset selection algorithm that considers features both separately and together as a group in order to provide not just the best set of features but also the best combination of features

Make your own game in a font!

Motion and Shape Capture from Sparse Markers

Estimation of human density in a closed space using deep learning.

The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We significantly improve the systematic generalization of transformer models on a variety of datasets using simple tricks and careful considerations.

PyElastica is the Python implementation of Elastica, an open-source software for the simulation of assemblies of slender, one-dimensional structures using Cosserat Rod theory.

PyTorch implementation of Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose

ONNX-PackNet-SfM: Python scripts for performing monocular depth estimation using the PackNet-SfM model in ONNX

Distributed Asynchronous Hyperparameter Optimization better than HyperOpt.

This repository is for EMNLP 2021 paper: It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data

Automatic meme generation model using Tensorflow Keras.

Dataset and Code for the paper "DepthTrack: Unveiling the Power of RGBD Tracking" (ICCV2021), and "Depth-only Object Tracking" (BMVC2021)

A collection of inference modules for fastai2

Meta Learning for Semi-Supervised Few-Shot Classification

Demo code for ICCV 2021 paper "Sensor-Guided Optical Flow"

Multi-Joint dynamics with Contact. A general purpose physics simulator.

Official source code to CVPR'20 paper, "When2com: Multi-Agent Perception via Communication Graph Grouping"

🍷 Gracefully claim weekly free games and monthly content from Epic Store.

Single-Stage 6D Object Pose Estimation, CVPR 2020