Python interface to the WebRTC Voice Activity Detector

Last update: Dec 22, 2022

py-webrtcvad

This is a Python interface to the WebRTC Voice Activity Detector (VAD). It is compatible with Python 2 and Python 3.

A VAD classifies a piece of audio data as being voiced or unvoiced. It can be useful for telephony and speech recognition.

The VAD that Google developed for the WebRTC project is reportedly one of the best available, being fast, modern and free.

How to use it

Install the webrtcvad module:
```
pip install webrtcvad
```
Create a Vad object:
```
import webrtcvad
vad = webrtcvad.Vad()
```
Optionally, set its aggressiveness mode, an integer from 0 to 3, where 0 is the least aggressive about filtering out non-speech and 3 is the most aggressive (you can also set the mode when you create the VAD, e.g. vad = webrtcvad.Vad(3)):
```
vad.set_mode(1)
```

Give it a short segment ("frame") of audio. The WebRTC VAD only accepts 16-bit mono PCM audio, sampled at 8000, 16000, 32000 or 48000 Hz. A frame must be either 10, 20, or 30 ms in duration:

```
# Run the VAD on 10 ms of silence. The result should be False.
sample_rate = 16000
frame_duration = 10  # ms
# Each 16-bit sample is 2 bytes, so one frame is 2 bytes per sample.
frame = b'\x00\x00' * int(sample_rate * frame_duration / 1000)
print('Contains speech: %s' % vad.is_speech(frame, sample_rate))
```
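
Longer recordings have to be cut into frames of a valid size before they reach the VAD. The following is a minimal sketch of that step, not part of the webrtcvad module: the helper names frame_byte_length and split_frames are hypothetical, and the code assumes the audio is already a bytes object of 16-bit mono PCM.

```python
def frame_byte_length(sample_rate, frame_duration_ms):
    """Return the size in bytes of one 16-bit mono PCM frame."""
    if sample_rate not in (8000, 16000, 32000, 48000):
        raise ValueError("unsupported sample rate: %r" % sample_rate)
    if frame_duration_ms not in (10, 20, 30):
        raise ValueError("unsupported frame duration: %r" % frame_duration_ms)
    samples_per_frame = sample_rate * frame_duration_ms // 1000
    return samples_per_frame * 2  # 2 bytes per 16-bit sample


def split_frames(pcm, sample_rate, frame_duration_ms):
    """Yield consecutive whole frames from a bytes object of PCM audio.

    A trailing partial frame, if any, is dropped.
    """
    size = frame_byte_length(sample_rate, frame_duration_ms)
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]
```

Each yielded frame can then be passed to vad.is_speech(frame, sample_rate). Dropping the trailing partial frame is a design choice made here for simplicity; padding it with silence would be another option.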

See example.py for a more detailed example that will process a .wav file, find the voiced segments, and write each one as a separate .wav.
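
The code below is not example.py. It is a rough sketch, using only the standard library, of two steps such a script needs: loading a .wav file while checking the format constraints listed above, and grouping per-frame decisions into time ranges. The names read_pcm_wave and voiced_runs are hypothetical, and the grouping is deliberately naive (no smoothing across frames), which is an assumption of this sketch rather than a description of what example.py does.

```python
import contextlib
import wave


def read_pcm_wave(path):
    """Read a .wav file and return (pcm_bytes, sample_rate).

    Raises ValueError if the file is not 16-bit mono PCM at a
    sample rate the VAD accepts.
    """
    with contextlib.closing(wave.open(path, 'rb')) as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sample_rate = wf.getframerate()
        if sample_rate not in (8000, 16000, 32000, 48000):
            raise ValueError("unsupported sample rate: %d" % sample_rate)
        return wf.readframes(wf.getnframes()), sample_rate


def voiced_runs(flags, frame_duration_ms):
    """Group per-frame True/False decisions into (start_ms, end_ms) runs."""
    flags = list(flags)
    runs = []
    start = None  # index of the first frame of the current voiced run
    for i, voiced in enumerate(flags):
        if voiced and start is None:
            start = i
        elif not voiced and start is not None:
            runs.append((start * frame_duration_ms, i * frame_duration_ms))
            start = None
    if start is not None:
        # The audio ended while still inside a voiced run.
        runs.append((start * frame_duration_ms, len(flags) * frame_duration_ms))
    return runs
```

The flags argument would typically be the results of calling vad.is_speech on successive frames of the same duration.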

How to run unit tests

To run unit tests:

```
pip install -e ".[dev]"
python setup.py test
```

History

2.0.10

Fixed memory leak. Thank you, bond005!

2.0.9

Improved example code. Added WebRTC license.

2.0.8

Fixed Windows compilation errors. Thank you, xiongyihui!

Owner

John Wiseman