Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is an unofficial TensorFlow implementation of the Keyword Spotting Transformer model. The model is trained on the 35-word Speech Commands dataset.

Paper: Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture

Download the dataset

To download the dataset and extract it into the data directory, run the following commands:

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
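
As a quick sanity check after extraction, the keyword folders can be listed from Python. This is only a sketch: the helper name list_keyword_classes is hypothetical and not part of this repository, and it assumes the archive was unpacked directly into ./data, with one folder per keyword plus a _background_noise_ folder that holds no keyword recordings.

```python
from pathlib import Path


def list_keyword_classes(data_dir):
    """Return the sorted names of the keyword folders under data_dir.

    Hypothetical helper, not part of the repository. Folders whose
    names start with an underscore (such as _background_noise_) are
    skipped, and plain files (README, file lists, the archive) are ignored.
    """
    root = Path(data_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.startswith("_")
    )
```

Calling list_keyword_classes("./data") after the steps above should return 35 names if the extraction completed.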

Set up virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model, run this command, replacing each ${...} placeholder with your own value:

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}
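
For reference, the flags above could be parsed along the following lines. This is a sketch under stated assumptions, not the code in train.py: the function name build_parser, the argument types, and the help strings are guesses based only on the flag names and descriptions in the command above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags of the training command.

    The types are assumptions: sizes and counts as int, learning rate
    and weight decay as float, directories as str.
    """
    parser = argparse.ArgumentParser(
        description="Train the Keyword Spotting Transformer (sketch)"
    )
    parser.add_argument("--data_dir", type=str, required=True,
                        help="Path to data directory")
    parser.add_argument("--logdir", type=str, required=True,
                        help="Path to log directory")
    parser.add_argument("--num_layers", type=int, required=True,
                        help="Number of sequential encoder layers")
    parser.add_argument("--d_model", type=int, required=True,
                        help="Dimension of the encoder layers")
    parser.add_argument("--num_heads", type=int, required=True,
                        help="Number of heads in multi head attention layer")
    parser.add_argument("--mlp_dim", type=int, required=True,
                        help="Dimension of mlp layers")
    parser.add_argument("--lr", type=float, required=True,
                        help="Learning rate")
    parser.add_argument("--weight_decay", type=float, required=True,
                        help="Weight decay")
    parser.add_argument("--batch_size", type=int, required=True,
                        help="Batch size")
    parser.add_argument("--epochs", type=int, required=True,
                        help="Number of epochs")
    parser.add_argument("--save_dir", type=str, required=True,
                        help="Directory to save the model weights")
    return parser
```

Marking every flag as required is a choice made for this sketch; the actual script may define defaults for some of them.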

To track your training metrics with TensorBoard, point it at the same log directory:

tensorboard --logdir ${Path to log directory}

Predicting keyword of audio file

To predict the keyword in an audio file, run:

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}
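
Clips in the Speech Commands dataset are one-second, 16 kHz, 16-bit mono WAV files, so an audio file passed for prediction most likely needs to match that format. The sketch below shows one way to check a file and pad or trim it to one second using only the Python standard library. The helper load_clip is hypothetical; the preprocessing that test.py actually performs is not documented here and may differ.

```python
import sys
import wave
from array import array

SAMPLE_RATE = 16000          # Speech Commands clips are sampled at 16 kHz
CLIP_SAMPLES = SAMPLE_RATE   # one second of audio per clip


def load_clip(path, target_len=CLIP_SAMPLES):
    """Read a 16-bit mono WAV file and return target_len floats in [-1, 1).

    Hypothetical helper, not part of the repository. Shorter clips are
    zero-padded at the end; longer clips are trimmed.
    """
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        if wav.getframerate() != SAMPLE_RATE:
            raise ValueError("expected a 16 kHz sample rate")
        frames = wav.readframes(wav.getnframes())

    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    samples = samples[:target_len]
    samples.extend([0] * (target_len - len(samples)))
    # Scale 16-bit integers to floats
    return [s / 32768.0 for s in samples]
```

A file that raises ValueError here would need to be resampled or converted before it resembles the training data.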

Owner

Intelligent Machines Limited
