SpeechNAS: Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Last update: May 20, 2022

Overview

speechnas

SpeechNAS-Better-Trade-off-between-Latency-and-Accuracy-for-Large-Scale-Speaker-Verification

ASRU 2021 IEEE Automatic Speech Recognition and Understanding

If this repository is useful to you, please cite our work properly. Thank you!

SpeechNAS-Better-Trade-off-between-Latency-and-Accuracy-for-Large-Scale-Speaker-Verification, ASRU 2021.

Environment

Set up the environment for the repository with:

PyTorch 1.7+
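The version requirement above can be checked before running anything else. The following is a minimal sketch, not part of the repository: the helper names `version_tuple` and `meets_minimum` are hypothetical, and the code only compares version strings, so it does not import PyTorch itself.

```python
import re


def version_tuple(version: str) -> tuple:
    """Extract the leading (major, minor) pair from a version string.

    Suffixes such as ".1" or "+cu110" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple = (1, 7)) -> bool:
    """Return True if `version` is at least `minimum` (default: 1.7)."""
    # Tuples compare element by element, so (1, 10) >= (1, 7) holds,
    # which a plain string comparison would get wrong.
    return version_tuple(version) >= minimum
```

With PyTorch installed, calling `meets_minimum(torch.__version__)` after `import torch` should indicate whether the installed build satisfies the 1.7+ requirement.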

Check configuration

Check configuration in ./config/

Inference

bash metric/metric_eer/auto_run.sh

The x-vector has recently become a successful and popular approach for speaker verification; it employs a time delay neural network (TDNN) and statistics pooling to extract speaker-characterizing embeddings from variable-length utterances. Improving on the x-vector is an active research area, and numerous neural networks have been elaborately designed based on it, e.g., extended TDNN (E-TDNN), factorized TDNN (F-TDNN), and densely connected TDNN (D-TDNN). In this work, we use neural architecture search (NAS) to identify optimal architectures from a TDNN-based search space, an approach we name SpeechNAS. Leveraging recent advances in speaker recognition, such as high-order statistics pooling, the multi-branch mechanism, D-TDNN, and angular additive margin softmax (AAM) loss with minimum hyper-spherical energy (MHE), SpeechNAS automatically discovers five network architectures, SpeechNAS-1 to SpeechNAS-5, with various numbers of parameters and GFLOPs on the large-scale text-independent speaker recognition dataset VoxCeleb1. Our best derived network achieves an equal error rate (EER) of 1.02% on the standard VoxCeleb1 test set, which surpasses previous TDNN-based state-of-the-art approaches by a large margin.
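For readers unfamiliar with the equal error rate (EER) metric quoted above, the sketch below shows one simple way to estimate it from verification scores. It is an illustration only and is not taken from the repository; the scripts under metric/metric_eer/ may compute it differently. The function name `equal_error_rate` and the assumption that higher scores mean "same speaker" are choices made for this example.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER and its threshold from two lists of scores.

    target_scores:    scores of trials where both utterances share a speaker.
    nontarget_scores: scores of trials with different speakers.
    A trial is accepted when its score is >= the threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap, eer_estimate, threshold)
    # Every observed score is a candidate threshold.
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Where the two rates do not cross exactly, report their midpoint.
            best = (gap, (far + frr) / 2, threshold)

    return best[1], best[2]
```

This brute-force scan is quadratic in the number of trials, which is fine for a small example but slow for a full VoxCeleb1 trial list; sorting the scores once and sweeping the threshold would be the usual optimization.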

Owner

Wentao Zhu
