Code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition"

Last update: Dec 01, 2022


Overview

SEW (Squeezed and Efficient Wav2vec)

The repo contains the code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition" by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q Weinberger, and Yoav Artzi.

Model Checkpoints

Pre-trained without supervision on LibriSpeech 960h

Model	Pre-training updates	Dataset	Checkpoint
W2V2-tiny	100K	LibriSpeech 960h	download
W2V2-small	100K	LibriSpeech 960h	download
W2V2-mid	100K	LibriSpeech 960h	download
W2V2-base	100K	LibriSpeech 960h	download
SEW-tiny	100K	LibriSpeech 960h	download
SEW-small	100K	LibriSpeech 960h	download
SEW-mid	100K	LibriSpeech 960h	download
SEW-D-tiny	100K	LibriSpeech 960h	download
SEW-D-small	100K	LibriSpeech 960h	download
SEW-D-mid	100K	LibriSpeech 960h	download
SEW-D-mid (k127)	100K	LibriSpeech 960h	download
SEW-D-base	100K	LibriSpeech 960h	download
SEW-D-base+	100K	LibriSpeech 960h	download
SEW-D-mid	400K	LibriSpeech 960h	download
SEW-D-mid (k127)	400K	LibriSpeech 960h	download
SEW-D-base+	400K	LibriSpeech 960h	download

Usage

Dependencies

The code has been tested with fairseq at commit 05255f9, DeBERTa at commit bf17ca4, and the following packages:

torch==1.8.0
torchaudio==0.8.0
tqdm==4.49.0
Hydra==2.5
hydra-core==1.0.4
fvcore==0.1.5.post20210330
omegaconf==2.0.5
einops==0.3.0
fire==0.2.1
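To confirm that an environment matches the pins above, a quick check along these lines may help. This is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_pins` are hypothetical, it needs Python 3.8+ for `importlib.metadata`, and it assumes that local build labels such as a `+cu111` suffix on torch can be ignored when comparing versions.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Version pins copied from the dependency list above.
PINS: Dict[str, str] = {
    "torch": "1.8.0",
    "torchaudio": "0.8.0",
    "tqdm": "4.49.0",
    "Hydra": "2.5",
    "hydra-core": "1.0.4",
    "fvcore": "0.1.5.post20210330",
    "omegaconf": "2.0.5",
    "einops": "0.3.0",
    "fire": "0.2.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_pins(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched.

    Assumption: a local version label after "+" (e.g. "1.8.0+cu111") is ignored;
    everything before it must match the pin exactly.
    """
    problems: Dict[str, str] = {}
    for dist, wanted in pins.items():
        found = lookup(dist)
        if found is None:
            problems[dist] = "not installed"
        elif found.split("+", 1)[0] != wanted:
            problems[dist] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # Print one line per missing or mismatched package.
    for dist, problem in check_pins(PINS).items():
        print(f"{dist}: {problem}")
```

An empty report means every pinned package is installed at the listed version.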

Apex

Please install NVIDIA's Apex with:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

wav2letter decoder

We currently decode with the wav2letter v0.2 Python bindings at commit 96f5f9d. Please install the Python bindings from https://github.com/flashlight/wav2letter/tree/96f5f9d3b41e01af0a031ee0d2604acd9ef3b1b0/bindings/python. The newest commit in the v0.2 branch, d5a93f0, leads to worse WER for the wav2vec 2.0 baselines.

Installation

git clone https://github.com/asappresearch/sew.git
cd sew 
pip install -e .

Pre-training

Pre-training SEW models

Run the following command, where $model_size can be tiny, small, or mid, and $ngpu is the number of GPUs you want to use.

bash scripts/pt-sew.sh $model_size $ngpu

Pre-training SEW-D models

bash scripts/pt-sew-d.sh $model_size $ngpu

where $model_size can be tiny, small, mid, mid-k127, base, or base+, and $ngpu is the number of GPUs.
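If you launch pre-training from a Python sweep script, a small wrapper can reject an invalid $model_size before any GPU job starts. The sketch below is hypothetical (`pretrain_command` is not part of the repository); it only validates against the sizes listed above and assembles the same `bash scripts/pt-sew.sh` or `bash scripts/pt-sew-d.sh` invocation.

```python
from typing import List

# Model sizes accepted by each pre-training script, as listed in this README.
PRETRAIN_SIZES = {
    "sew": ("tiny", "small", "mid"),
    "sew-d": ("tiny", "small", "mid", "mid-k127", "base", "base+"),
}


def pretrain_command(family: str, model_size: str, ngpu: int) -> List[str]:
    """Build the argv for scripts/pt-sew.sh or scripts/pt-sew-d.sh."""
    if family not in PRETRAIN_SIZES:
        raise ValueError(f"unknown model family {family!r}")
    if model_size not in PRETRAIN_SIZES[family]:
        raise ValueError(
            f"{model_size!r} is not a valid size for {family}; "
            f"choose from {PRETRAIN_SIZES[family]}"
        )
    if ngpu < 1:
        raise ValueError("ngpu must be a positive integer")
    # The script name follows the pattern scripts/pt-<family>.sh.
    return ["bash", f"scripts/pt-{family}.sh", model_size, str(ngpu)]
```

For example, running `subprocess.run(pretrain_command("sew-d", "mid", 8), check=True)` from the repository root is equivalent to `bash scripts/pt-sew-d.sh mid 8`.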

Fine-tuning

Run the following script to fine-tune a model with the hyperparameters from wav2vec 2.0.

bash scripts/ft-model.sh $pre_trained_model $split $ngpu

where $pre_trained_model is a W2V2, SEW, or SEW-D checkpoint, $split can be 10m, 1h, 10h, or 100h, and $ngpu is the number of GPUs.

We also provide an alternative set of hyperparameters that keeps all dropout rates the same as in pre-training; we found this setting to be more stable.

bash scripts/ft-model-stable.sh $pre_trained_model $split $ngpu
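Both fine-tuning scripts take the same three arguments, so one hypothetical helper can choose between them. This is a sketch for scripted sweeps, not part of the repository; `finetune_command` and its `stable` flag are illustrative names, and the split names come from the list above.

```python
from typing import List

# Labeled-data splits accepted by the fine-tuning scripts, per this README.
FT_SPLITS = ("10m", "1h", "10h", "100h")


def finetune_command(
    pre_trained_model: str, split: str, ngpu: int, stable: bool = False
) -> List[str]:
    """Build the argv for scripts/ft-model.sh or scripts/ft-model-stable.sh.

    stable=True selects the variant that keeps dropout rates the same as in
    pre-training.
    """
    if split not in FT_SPLITS:
        raise ValueError(f"split must be one of {FT_SPLITS}, got {split!r}")
    if ngpu < 1:
        raise ValueError("ngpu must be a positive integer")
    script = "scripts/ft-model-stable.sh" if stable else "scripts/ft-model.sh"
    return ["bash", script, pre_trained_model, split, str(ngpu)]
```

Keeping the split check in Python catches typos such as `10min` before a job is queued.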

If you run out of GPU memory, scale down dataset.max_tokens and scale up optimization.update_freq in scripts/ft-model.sh. For example, change these lines

  dataset.max_tokens=3200000 \
  optimization.update_freq="[$((8 / $ngpu))]" \

to

  dataset.max_tokens=1600000 \
  optimization.update_freq="[$((16 / $ngpu))]" \

This halves the per-GPU batch size and doubles the number of gradient-accumulation steps, so (when $ngpu divides 8) each update still covers the same total batch while using less GPU memory.
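The trade-off holds because the number of tokens per optimizer update is bounded by max_tokens × update_freq × ngpu. The sketch below reproduces that arithmetic, including the truncating integer division that bash performs in `$((8 / $ngpu))`; the helper names are hypothetical. It also assumes, as in fairseq's wav2vec 2.0 recipes, that a token is one raw audio sample at 16 kHz.

```python
def update_freq(base_freq: int, ngpu: int) -> int:
    """Mirror bash's $((base_freq / ngpu)), which truncates to an integer."""
    freq = base_freq // ngpu
    if freq < 1:
        raise ValueError("ngpu exceeds the base update frequency")
    return freq


def tokens_per_update(max_tokens: int, base_freq: int, ngpu: int) -> int:
    """Upper bound on tokens per optimizer update, summed over all GPUs."""
    return max_tokens * update_freq(base_freq, ngpu) * ngpu


def seconds_of_audio(tokens: int, sample_rate: int = 16_000) -> float:
    """Convert a token count to seconds, assuming one token per audio sample."""
    return tokens / sample_rate
```

With 8 GPUs, both settings give 25,600,000 tokens per update, about 1,600 seconds of audio under the 16 kHz assumption. With 3 GPUs, the truncation makes the original setting 19,200,000 tokens, so a GPU count that does not divide the base frequency quietly shrinks the effective batch.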

Evaluation

Please run this script to prepare the official LibriSpeech 4-gram language model.

bash scripts/prepare_librispeech_lm.sh $kenlm_build_bin

where $kenlm_build_bin is the folder that contains the KenLM build_binary executable file (e.g. /home/user/kenlm/build/bin).
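A common mistake is passing the KenLM source root instead of its build/bin folder. A pre-flight check like the one below can catch this before the script runs. It is a hypothetical helper, not part of the repository, and it only verifies that an executable named build_binary exists in the given folder.

```python
import os


def find_build_binary(kenlm_build_bin: str) -> str:
    """Return the path to KenLM's build_binary inside kenlm_build_bin.

    Raises FileNotFoundError if the file is missing or not executable.
    """
    path = os.path.join(os.path.expanduser(kenlm_build_bin), "build_binary")
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise FileNotFoundError(
            f"no executable build_binary in {kenlm_build_bin!r}; "
            "pass KenLM's build/bin directory"
        )
    return path
```

For example, `find_build_binary("~/kenlm/build/bin")` returns the full executable path when KenLM has been built there.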

Then run the following command to evaluate an ASR model checkpoint ($asr_checkpoint) on the LibriSpeech dev and test sets:

python tools/eval_w2v.py tunelm --subsets '["dev-clean", "dev-other", "test-clean", "test-other"]' --model $asr_checkpoint
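The --subsets value is a list literal passed as a single string, which is easy to mis-quote in a shell. When scripting evaluation from Python, one option is to serialize the list with `json.dumps` and pass the arguments as a list, so no shell quoting is involved. This is a sketch; `eval_command` is a hypothetical helper. `json.dumps` happens to produce the same `"a", "b"` spacing as the command above.

```python
import json
from typing import List, Sequence

# The four LibriSpeech evaluation subsets used in the command above.
LIBRISPEECH_EVAL_SUBSETS = ("dev-clean", "dev-other", "test-clean", "test-other")


def eval_command(
    asr_checkpoint: str, subsets: Sequence[str] = LIBRISPEECH_EVAL_SUBSETS
) -> List[str]:
    """Build the argv for `python tools/eval_w2v.py tunelm`."""
    return [
        "python", "tools/eval_w2v.py", "tunelm",
        # One argv element, so the list reaches the script unsplit.
        "--subsets", json.dumps(list(subsets)),
        "--model", asr_checkpoint,
    ]
```

Pass the result to `subprocess.run(..., check=True)` from the repository root. To evaluate a subset of the splits, supply a shorter `subsets` sequence.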



Releases

v0.0.1 (Sep 15, 2021)

First release.

Owner

ASAPP Research