A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

Overview

bbc-speech-segmenter: Voice Activity Detection & Speaker Diarization

A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

The x-vector-vad system is described in the paper; Ogura, M. & Haynes, M. (2021) X-vector-vad for Multi-genre Broadcast Speech-to-text. The paper has been submitted to 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) and is currently under review as of June 2021.

Quickstart

$ docker pull bbcrd/bbc-speech-segmenter

# Test

$ docker run -w /wrk -v `pwd`:/wrk bbcrd/bbc-speech-segmenter ./test.sh

# Segmentation help

$ docker run bbcrd/bbc-speech-segmenter ./run-segmentation.sh --help
usage: run-segmentation.sh [options] input.wav input.stm output-dir

options:
  --nj NUM                 Maximum number of CPU cores to use
  --stage STAGE            Start from this stage
  --cluster-threshold THR  Cluster stopping criteria. Default: -0.3
  --vad-threshold THR      Xvector classifier threshold. Lower the number the
                           more speech segments shall be returned at the
                           expense of accuracy. Default: 0.2
  --vad-method             Filter segments on an individual or segment basis.
                           Default: individual
  --no-vad                 Skip xvector vad stages. Default: false
  --help                   Print this message

# Run segmentation (VAD + diarisation), results are in output-dir/diarize.stm

$ docker run -v `pwd`:/data bbcrd/bbc-speech-segmenter \
  ./run-segmentation.sh /data/audio.wav /data/audio.stm /data/output-dir

$ cat output-dir/diarize.stm
audio 0 audio_S00004 3.750 10.125 <speech>
audio 0 audio_S00003 10.125 13.687 <speech>
audio 0 audio_S00004 13.688 16.313 <speech>
...

# Train x-vector classifier

$ docker run -w /wrk/recipe -v `pwd`:/wrk bbcrd/bbc-speech-segmenter \
  local/xvector_utils.py data/bbc-vad-train/reference.stm            \
  data/bbc-vad-train/xvectors.ark new_model.pkl

# Evaluate x-vector classifier

$ docker run -w /wrk/recipe -v `pwd`:/wrk bbcrd/bbc-speech-segmenter \
  local/xvector_utils.py evaluate data/bbc-vad-eval/reference.stm    \
  data/bbc-vad-eval/xvectors.ark model/xvector-classifier.pkl

Audio & STM file format

In order to run the segmentation script you need your audio in 16Khz Mono WAV format. You also need an STM file describing the segments you want to apply voice activity detection and speaker diarization to.

For more information on the STM file format see XVECTOR_UTILS.md.

# Convert audio file to 16Khz mono wav

$ ffmpeg audio.mp3 -vn -ac 1 -ar 16000 audio.wav

# Create STM file for input

$ DURATION=$(ffprobe -i audio.wav -show_entries format=duration -v quiet -of csv="p=0")
$ DURATION=$(printf "%0.2f\n" $DURATION)

$ FILENAME=$(basename audio.wav)

$ echo "${FILENAME%.*} 0 ${FILENAME%.*} 0.00 $DURATION <label> _" > audio.stm

$ cat audio.stm
audio 0 audio 0.00 60.00 <label> _

Use Docker image to run code in local checkout

# Bulid Docker image

$ docker build -t bbc-speech-segmenter .

# Spin up a Docker container in an interactive mode

$ docker run -it -v `pwd`:/wrk bbc-speech-segmenter /bin/bash

# Inside a Docker container

$ cd /wrk/

# Run test

$ ./test.sh
All checks passed

Training and evaluation

X-vector utility

xvector_utils.py can be used to train and evaluate x-vector classifier, as well as o extract and visualize x-vectors. For more detailed information, see XVECTOR_UTILS.md.

The documentation also gives details on file formats such as ARK, SCP or STM, which are required to use this tool.

Run x-vector VAD training

Two files are required for x-vector-vad training:

  • Reference STM file
  • X-vectors ARK file

For example, from inside the Docker container:

$ cd /wrk/recipe

$ python3 local/xvector_utils.py train \
  data/bbc-vad-train/reference.stm     \
  data/bbc-vad-train/xvectors.ark      \
  new_model.pkl

The model will be saved as new_model.pkl.

Run x-vector VAD evaluation

Three files are needed in order to run VAD evaluation:

  • Reference STM file
  • X-vectors ARK file
  • x-vector-vad classifier model

For example, from inside the Docker container:

$ cd /wrk/recipe

$ python3 local/xvector_utils.py evaluate \
  data/bbc-vad-eval/reference.stm        \
  data/bbc-vad-eval/xvectors.ark         \
  model/xvector-classifier.pkl

WebRTC baseline

The code for the baseline WebRTC system referenced in the paper is available in the directory recipe/baselines/denoising_DIHARD18_webrtc.

Request access to bbc-vad-train

Due to size restriction, only bbc-vad-eval is included in the repository. If you'd like access to bbc-vad-train, please contact Matt Haynes.

Authors

Owner
BBC
Open source code used on public facing services, internal services and educational resources.
BBC
[ICCV2021] Official Pytorch implementation for SDGZSL (Semantics Disentangling for Generalized Zero-Shot Learning)

Semantics Disentangling for Generalized Zero-shot Learning This is the official implementation for paper Zhi Chen, Yadan Luo, Ruihong Qiu, Zi Huang, J

25 Dec 06, 2022
Python implementation of O-OFDMNet, a deep learning-based optical OFDM system,

O-OFDMNet This includes Python implementation of O-OFDMNet, a deep learning-based optical OFDM system, which uses neural networks for signal processin

Thien Luong 4 Sep 09, 2022
Gesture Volume Control Using OpenCV and MediaPipe

This Project Uses OpenCV and MediaPipe Hand solutions to identify hands and Change system volume by taking thumb and index finger positions

Pratham Bhatnagar 6 Sep 12, 2022
Spontaneous Facial Micro Expression Recognition using 3D Spatio-Temporal Convolutional Neural Networks

Spontaneous Facial Micro Expression Recognition using 3D Spatio-Temporal Convolutional Neural Networks Abstract Facial expression recognition in video

Bogireddy Sai Prasanna Teja Reddy 103 Dec 29, 2022
custom pytorch implementation of MoCo v3

MoCov3-pytorch custom implementation of MoCov3 [arxiv]. I made minor modifications based on the official MoCo repository [github]. No ViT part code an

39 Nov 14, 2022
Ἀνατομή is a PyTorch library to analyze representation of neural networks

Ἀνατομή is a PyTorch library to analyze representation of neural networks

Ryuichiro Hataya 50 Dec 05, 2022
ML-Ensemble – high performance ensemble learning

A Python library for high performance ensemble learning ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framew

Sebastian Flennerhag 764 Dec 31, 2022
Visual odometry package based on hardware-accelerated NVIDIA Elbrus library with world class quality and performance.

Isaac ROS Visual Odometry This repository provides a ROS2 package that estimates stereo visual inertial odometry using the Isaac Elbrus GPU-accelerate

NVIDIA Isaac ROS 343 Jan 03, 2023
Repository for Driving Style Recognition algorithms for Autonomous Vehicles

Driving Style Recognition Using Interval Type-2 Fuzzy Inference System and Multiple Experts Decision Making Created by Iago Pachêco Gomes at USP - ICM

Iago Gomes 9 Nov 28, 2022
A pytorch implementation of Reading Wikipedia to Answer Open-Domain Questions.

DrQA A pytorch implementation of the ACL 2017 paper Reading Wikipedia to Answer Open-Domain Questions (DrQA). Reading comprehension is a task to produ

Runqi Yang 394 Nov 08, 2022
LVI-SAM: Tightly-coupled Lidar-Visual-Inertial Odometry via Smoothing and Mapping

LVI-SAM This repository contains code for a lidar-visual-inertial odometry and mapping system, which combines the advantages of LIO-SAM and Vins-Mono

Tixiao Shan 1.1k Dec 27, 2022
Long Expressive Memory (LEM)

Long Expressive Memory for Sequence Modeling This repository contains the implementation to reproduce the numerical experiments of the paper Long Expr

Konstantin Rusch 47 Dec 17, 2022
Video Contrastive Learning with Global Context

Video Contrastive Learning with Global Context (VCLR) This is the official PyTorch implementation of our VCLR paper. Install dependencies environments

143 Dec 26, 2022
Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm under Mixed Illumination

Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm under Mixed Illumination (ICCV 2021) Dataset License This work is l

DongYoung Kim 33 Jan 04, 2023
deep learning model that learns to code with drawing in the Processing language

sketchnet sketchnet - processing code generator can we teach a computer to draw pictures with code. We use Processing and java/jruby code paired with

41 Dec 12, 2022
AlgoVision - A Framework for Differentiable Algorithms and Algorithmic Supervision

NeurIPS 2021 Paper "Learning with Algorithmic Supervision via Continuous Relaxations"

Felix Petersen 76 Jan 01, 2023
Fast methods to work with hydro- and topography data in pure Python.

PyFlwDir Intro PyFlwDir contains a series of methods to work with gridded DEM and flow direction datasets, which are key to many workflows in many ear

Deltares 27 Dec 07, 2022
Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides

Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides Project | This repo is the officia

CVSM Group - email: <a href=[email protected]"> 33 Dec 28, 2022
A style-based Quantum Generative Adversarial Network

Style-qGAN A style based Quantum Generative Adversarial Network (style-qGAN) model for Monte Carlo event generation. Tutorial We have prepared a noteb

9 Nov 24, 2022
Image Classification - A research on image classification and auto insurance claim prediction, a systematic experiments on modeling techniques and approaches

A research on image classification and auto insurance claim prediction, a systematic experiments on modeling techniques and approaches

0 Jan 23, 2022