Original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations"

Last update: Apr 30, 2022

Overview

Speaker-Embeddings-Correlation-Pooling

This is the original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations" by T. Stafylakis, J. Rohdin, and L. Burget (Interspeech 2021). The paper, which you may find here, is the result of a collaboration between Omilia - Conversational Intelligence and Brno University of Technology (BUT).

The code is written for TensorFlow 1 (TF1), but it should work with TF2 too. I only provide the code for creating the network, together with the required hyperparameters. The training hyperparameters we used can be found in the paper.

The code is well commented, at least the parts and (hyper-)parameters required for correlation pooling.

Apart from the experiments reported in the paper, the code allows the user to: (a) combine standard statistics pooling with correlation pooling, by concatenating the two pooling layers into a single one, and (b) extract correlation pooling from the outputs of all 4 internal ResNet blocks (aka stages) and concatenate them in the pooling layer.
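To make options (a) and (b) concrete, here is a minimal NumPy sketch. It is not the repository's TF code: the function names, the tensor layout (time, frequency, channels), and the choice to treat every time-frequency position as one sample are assumptions made for illustration, and the paper's channel reduction and frequency ranges are left out.

```python
import numpy as np


def stats_pooling(x):
    """Standard statistics pooling: mean and std over time.

    x has shape (T, F, C); the result has shape (2 * F * C,).
    """
    flat = x.reshape(x.shape[0], -1)  # (T, F*C)
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])


def correlation_pooling(x, eps=1e-8):
    """Channel-wise correlations, keeping the upper triangle only.

    x has shape (T, F, C); the result has shape (C * (C - 1) // 2,).
    Assumption: every (time, frequency) position counts as one sample.
    """
    num_channels = x.shape[-1]
    samples = x.reshape(-1, num_channels)  # (T*F, C)
    centered = samples - samples.mean(axis=0)
    normed = centered / (centered.std(axis=0) + eps)
    corr = normed.T @ normed / samples.shape[0]  # (C, C) correlation matrix
    # The diagonal is always 1 and the matrix is symmetric,
    # so only the strictly upper triangle carries information.
    rows, cols = np.triu_indices(num_channels, k=1)
    return corr[rows, cols]


def combined_pooling(last_stage, earlier_stages=()):
    """(a) stats + correlation pooling of the last stage, concatenated;
    (b) optionally append correlation pooling of earlier stages."""
    parts = [stats_pooling(last_stage), correlation_pooling(last_stage)]
    parts += [correlation_pooling(stage) for stage in earlier_stages]
    return np.concatenate(parts)


rng = np.random.default_rng(0)
last_stage = rng.standard_normal((50, 4, 6))  # hypothetical (T, F, C)
earlier = [rng.standard_normal((50, 8, 4))]   # one hypothetical earlier stage
pooled = combined_pooling(last_stage, earlier)
print(pooled.shape)  # (69,) = 48 stats + 15 correlations + 6 correlations
```

Concatenation keeps both kinds of statistics available to the embedding layer, at the cost of a wider pooling output.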

The code could be written more efficiently using tensor-only operators. However, to facilitate research, we have implemented it using lists of tensors, e.g. after merging frequency bins into frequency ranges. Despite this inefficiency, we observe no difference in training speed between correlation pooling and standard stats pooling.
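The trade-off between the two styles can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the input layout is assumed to be (time, frequency, channels), and frequency ranges are assumed to be equal-sized contiguous groups of bins.

```python
import numpy as np


def _upper_triangle_corr(samples, eps=1e-8):
    """Correlations between the columns of samples (N, C), upper triangle only."""
    centered = samples - samples.mean(axis=0)
    normed = centered / (centered.std(axis=0) + eps)
    corr = normed.T @ normed / samples.shape[0]
    rows, cols = np.triu_indices(samples.shape[1], k=1)
    return corr[rows, cols]


def corr_pooling_lists(x, num_ranges):
    """List-based version: one tensor per frequency range, pooled separately."""
    chunks = np.split(x, num_ranges, axis=1)  # list of (T, F/R, C) arrays
    pooled = [_upper_triangle_corr(chunk.reshape(-1, chunk.shape[-1]))
              for chunk in chunks]
    return np.concatenate(pooled)


def corr_pooling_tensor(x, num_ranges, eps=1e-8):
    """Tensor-only version: all frequency ranges handled in one batched op."""
    t, f, c = x.shape
    grouped = x.reshape(t, num_ranges, f // num_ranges, c)
    grouped = grouped.transpose(1, 0, 2, 3).reshape(num_ranges, -1, c)  # (R, N, C)
    centered = grouped - grouped.mean(axis=1, keepdims=True)
    normed = centered / (centered.std(axis=1, keepdims=True) + eps)
    corr = np.einsum("rnc,rnd->rcd", normed, normed) / grouped.shape[1]  # (R, C, C)
    rows, cols = np.triu_indices(c, k=1)
    return corr[:, rows, cols].reshape(-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8, 6))  # hypothetical (T, F, C)
from_lists = corr_pooling_lists(x, num_ranges=4)
from_tensor = corr_pooling_tensor(x, num_ranges=4)
print(from_lists.shape, np.allclose(from_lists, from_tensor))
```

The list-based version is easier to modify per range (for example, to use ranges of different widths), which is the kind of flexibility the text above refers to; the tensor-only version avoids the Python-level loop.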

Start with the file train_resnet.py, which creates the ResNet (with the pooling mechanism) and sets its parameters. All parameters are set so that you can reproduce our best-performing experiment (P7 in the paper).

So, try it and let us know what you get! Themos

Owner

Themos Stafylakis
