Unofficial implementation of the paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Last update: Jan 03, 2023

Introduction

This repository is an unofficial implementation of the SpeakerGAN paper. It was implemented by Mingming Huang ([email protected]) and Tiezheng Wang ([email protected]), with thanks to TongFeng for advice.

SpeakerGAN paper

SpeakerGAN: Speaker identification with conditional generative adversarial network, by Liyang Chen, Yifeng Liu, Wendong Xiao, Yingxue Wang, Haiyong Xie.

Usage

To train, test, or generate:

python speakergan.py

You may need to change the path to the VAD-preprocessed wav files.

Our results

acc: 94.27% on a randomly sampled test set.

acc: 93.21% on a test set sampled from a fixed start position.

using model file: model/49_D.pkl

acc: 98.44% training classification accuracy on real samples.

Our test accuracy is about 4% lower than the result reported in the paper. We have not been able to find the reason, and help is welcome!

Details of paper

The following are details about this paper.

================ input ==================

feature: fbank, 8000 Hz, 25 ms frame, 10 ms overlap. shape: (160, 64)
dataset: LibriSpeech train-clean-100, POI: 251
data preprocessing: VAD, mean and variance normalization, shuffled.
split: 60% train, 40% test.
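
The normalization and split listed above can be sketched in plain Python as follows. This is an illustration only, not the repository's code: the helper names `mean_var_normalize` and `split_train_test`, the choice to normalize each feature coefficient separately, and the fixed seed are all assumptions.

```python
import math
import random


def mean_var_normalize(frames, eps=1e-8):
    """Normalize each feature column to zero mean and unit variance.

    `frames` is a list of frames, each a list of filterbank values,
    for example 160 frames of 64 coefficients.
    """
    columns = list(zip(*frames))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    # eps keeps the division finite for constant columns.
    return [
        [(v - m) / (s + eps) for v, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def split_train_test(items, train_fraction=0.6, seed=0):
    """Shuffle a copy of `items` and split it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Whether the 60/40 split is applied per speaker or over the whole shuffled pool is not stated here, so the sketch simply splits whatever list it is given.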

================ model architecture ==================

dataflow: data -> feature extraction -> G & D
model architecture:

G: gated CNN, encoder-decoder, Huber loss + adversarial loss

D: ResNet blocks, temporal average pooling, FC, softmax, cross-entropy loss + adversarial loss
G: shuffler layer, GLU
D: ReLU
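
As a rough illustration of two of the building blocks named above, the sketch below shows a gated linear unit (GLU) and an average pooling step in plain Python. It assumes the common GLU convention of splitting the input in half and gating the first half with the sigmoid of the second, and it assumes the pooling in D averages over the time axis. The function names are hypothetical, and the repository's actual layers may differ.

```python
import math


def glu(values):
    """Gated linear unit: first half times sigmoid of the second half."""
    if len(values) % 2 != 0:
        raise ValueError("GLU needs an even number of values")
    half = len(values) // 2
    linear, gate = values[:half], values[half:]
    # a / (1 + exp(-b)) is a * sigmoid(b).
    return [a / (1.0 + math.exp(-b)) for a, b in zip(linear, gate)]


def average_over_time(frames):
    """Average a (time, channels) sequence into one vector of channel means."""
    return [sum(col) / len(col) for col in zip(*frames)]
```

The gate lets the network scale each linear output between 0 and its full value, while the pooling step turns a variable number of frames into one fixed-size vector for the FC layer.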

================ training ==================

lr: epochs 0-9, 0.0005 | epochs 9-49, 0.0002
L(d): λ1 = λ2 = 1
batch_size: 64
D_train steps / G_train steps = 4
Ladv Loss: Label smoothing, 1 -> 0.7 ~ 1.0, 0 -> 0 ~ 0.3
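
The training settings above can be sketched as three small helpers. This is a hedged illustration rather than the repository's training loop: the function names are hypothetical, the ranges in the lr line are assumed to be epochs with the switch at epoch 9, the smoothed labels are assumed to be drawn uniformly, and the D/G ratio is assumed to mean one G update after every four D updates.

```python
import random


def learning_rate(epoch):
    """Step schedule: 0.0005 at first, then 0.0002.

    The listed ranges overlap at 9; this sketch assumes the switch
    happens at epoch 9.
    """
    return 0.0005 if epoch < 9 else 0.0002


def smoothed_label(is_real, rng=random):
    """Draw a soft adversarial target: real in [0.7, 1.0], fake in [0.0, 0.3]."""
    return rng.uniform(0.7, 1.0) if is_real else rng.uniform(0.0, 0.3)


def update_schedule(num_batches, d_steps_per_g_step=4):
    """Yield 'D' for every batch and 'G' after every fourth D update."""
    for batch_index in range(num_batches):
        yield "D"
        if (batch_index + 1) % d_steps_per_g_step == 0:
            yield "G"
```

For example, `list(update_schedule(8))` gives four `"D"` entries, one `"G"`, four more `"D"` entries, and a final `"G"`.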

======== uncertainties or differences from the paper ========

weight and bias initialization: we use xavier_uniform and zeros.
PyTorch huber_loss: + 0.5 is needed to match the paper, but this is not implemented here.
for shorter wavs, the paper pads with zeros; we pad by repeating the features.
gated CNN architecture.
we use webrtcvad mode(3) for VAD preprocessing.
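
The two padding strategies for short utterances can be contrasted with a small sketch. It assumes that padding is appended at the end and that "padded with feature again" means cycling through the utterance's own frames from the start. Both function names are hypothetical and not taken from the repository.

```python
def pad_with_zeros(frames, target_len):
    """Pad a short feature sequence with all-zero frames, then crop."""
    width = len(frames[0])
    missing = max(0, target_len - len(frames))
    padding = [[0.0] * width for _ in range(missing)]
    return (list(frames) + padding)[:target_len]


def pad_by_repeating(frames, target_len):
    """Pad a short feature sequence by cycling through its own frames."""
    return [frames[i % len(frames)] for i in range(target_len)]
```

With the input shape listed earlier, `target_len` would be 160 frames. Repeating keeps every frame speech-like, whereas zero padding introduces silent frames the network has to learn to ignore.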
