Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is an unofficial TensorFlow implementation of the Keyword Spotting Transformer model. The model is trained on the 35-word Speech Commands dataset (v0.02).

Paper: Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture

Download the dataset

To download the dataset and extract it into ./data, run the following commands:

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir -p data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
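
Once the archive is extracted, each keyword should have its own subdirectory of WAV files. As a quick sanity check, the sketch below lists the keyword classes found under the data directory. The helper name `list_keywords` is hypothetical and not part of this repository; it assumes the standard Speech Commands layout, where directories starting with an underscore (such as `_background_noise_`) hold noise clips rather than a keyword.

```python
from pathlib import Path


def list_keywords(data_dir):
    """Return the sorted keyword class names found under data_dir.

    Hypothetical helper, not part of this repository. It assumes each
    keyword is a subdirectory and skips directory names starting with
    "_" (such as _background_noise_).
    """
    root = Path(data_dir)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and not p.name.startswith("_")
    )


if __name__ == "__main__":
    data_dir = Path("./data")
    if data_dir.is_dir():
        keywords = list_keywords(data_dir)
        print(f"{len(keywords)} keyword classes found")
    else:
        print("./data not found; download the dataset first")
```

With the v0.02 archive extracted as above, this should report 35 classes; a different count suggests an incomplete download or extraction.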

Set up a virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model, run the following command, replacing each ${...} placeholder with your own value:

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}
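
For illustration, the sketch below assembles the same command from a Python dictionary, which can be handy when scripting several runs. The flag names come from the command above; every value is an assumption. The `num_layers`, `d_model`, `num_heads`, and `mlp_dim` values are meant to follow the KWT-1 configuration described in the paper and should be checked against it, while the remaining values and paths are placeholders, not settings confirmed by this repository. `build_train_command` is a hypothetical helper.

```python
import shlex


def build_train_command(config, script="train.py"):
    """Turn a {flag: value} mapping into an argument list for train.py.

    Hypothetical helper: it only formats arguments and does not check
    that train.py accepts them.
    """
    cmd = ["python3", script]
    for flag, value in config.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


# Illustrative values only (assumptions, not defaults from this repository).
config = {
    "data_dir": "./data",
    "logdir": "./logs",
    "num_layers": 12,      # assumed KWT-1 depth
    "d_model": 64,         # assumed KWT-1 embedding dimension
    "num_heads": 1,        # assumed KWT-1 attention heads
    "mlp_dim": 256,        # assumed KWT-1 MLP dimension
    "lr": 0.001,           # placeholder
    "weight_decay": 0.1,   # placeholder
    "batch_size": 256,     # placeholder
    "epochs": 50,          # placeholder
    "save_dir": "./checkpoints",
}

if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(build_train_command(config)))
```

To launch training directly instead of printing the command, the returned list can be passed to `subprocess.run(cmd, check=True)` from the activated virtual environment.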

To track training metrics, launch TensorBoard on the same log directory:

tensorboard --logdir ${Path to log directory}

Predicting keyword of audio file

To predict the keyword spoken in an audio file, run:

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}

Owner

Intelligent Machines Limited
