Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode

Overview

🤗 Transformers Wav2Vec2 + PyCTCDecode

Introduction

This repo shows how 🤗 Transformers can be used in combination with kensho-technologies's PyCTCDecode & KenLM ngram as a simple way to boost word error rate (WER).

Included is a file to create an ngram with KenLM as well as a simple evaluation script to compare the results of using Wav2Vec2 with PyCTCDecode + KenLM vs. without using any language model.

Note: The scripts are written to be used on GPU. If you want to use a CPU instead, simply remove all .to("cuda") occurances in eval.py.

Installation

In a first step, one should install KenLM. For Ubuntu, it should be enough to follow the installation steps described here. The installed kenlm folder should be move into this repo for ./create_ngram.py to function correctly. Alternatively, one can also link the lmplz binary file to a lmplz bash command to directly run lmplz instead of ./kenlm/build/bin/lmplz.

Next, some Python dependencies should be installed. Assuming PyTorch is installed, it should be sufficient to run pip install -r requirements.txt.

Run evaluation

Create ngram

In a first step on should create a ngram. E.g. for polish the command would be:

./create_ngram.py --language polish --path_to_ngram polish.arpa

After the language model is created, one should open the file. one should add a </s> The file should have a structure which looks more or less as follows:

\data\        
ngram 1=86586
ngram 2=546387
ngram 3=796581           
ngram 4=843999             
ngram 5=850874              
                                                  
\1-grams:
-5.7532206      <unk>   0
0       <s>     -0.06677356                                                                            
-3.4645514      drugi   -0.2088903
...

Now it is very important also add a </s> token to the n-gram so that it can be correctly loaded. You can simple copy the line:

0 <s> -0.06677356

and change <s> to </s>. When doing this you should also inclease ngram by 1. The new ngram should look as follows:

\data\
ngram 1=86587
ngram 2=546387
ngram 3=796581
ngram 4=843999
ngram 5=850874

\1-grams:
-5.7532206      <unk>   0
0       <s>     -0.06677356
0       </s>     -0.06677356
-3.4645514      drugi   -0.2088903
...

Now the ngram can be correctly used with pyctcdecode

Run eval

Having created the ngram, one can run:

./eval.py --language polish --path_to_ngram polish.arpa

To compare Wav2Vec2 + LM vs. Wav2Vec2 + No LM on polish.

Results

Without tuning any hyperparameters, the following results were obtained:

Comparison of Wav2Vec2 without Language model vs. Wav2Vec2 with `pyctcdecode` + KenLM 5gram.
Fine-tuned Wav2Vec2 models were used and evaluated on MLS datasets.
Take a closer look at `./eval.py` for comparison

==================================================portuguese==================================================
polish - No LM - | WER: 0.3069742867206763 | CER: 0.06054530156286364 | Time: 58.04590034484863
polish - With LM - | WER: 0.2291299753434308 | CER: 0.06211174564528545 | Time: 191.65409898757935

==================================================spanish==================================================
portuguese - No LM - | WER: 0.18208286674132138 | CER: 0.05016682956422096 | Time: 114.61633825302124
portuguese - With LM - | WER: 0.1487761958086706 | CER: 0.04489231909945738 | Time: 429.78511357307434

==================================================polish==================================================
spanish - No LM - | WER: 0.2581272104769545 | CER: 0.0703088156033147 | Time: 147.8634352684021
spanish - With LM - | WER: 0.14927852292116295 | CER: 0.052034208044195916 | Time: 563.0732748508453

It can be seen that the word error rate (WER) is significantly improved when using PyCTCDecode + KenLM. However, the character error rate (CER) does not improve as much or not at all. This is expected since using a language model will make sure that words that are predicted are words that exist in the language's vocabulary. Wav2Vec2 without a LM produces many words that are more or less correct but contain a couple of spelling errors, thus not contributing to a good WER. Those words are likely to be "corrected" by Wav2Vec2 + LM leading to an improved WER. However a Wav2Vec2 already has a good character error rate as its vocabulary is composed of characters meaning that a "word-based" language model doesn't really help in this case.

Overall WER is probably the more important metric though, so it might make a lot of sense to add a LM to Wav2Vec2.

In terms of speed, adding a LM significantly reduces speed. However, the script is not at all optimized for speed so using multi-processing and batched inference would significantly speed up both Wav2Vec2 without LM and with LM.

Owner
Patrick von Platen
Patrick von Platen
The Agriculture Domain of ERPNext comes with features to record crops and land

Agriculture The Agriculture Domain of ERPNext comes with features to record crops and land, track plant, soil, water, weather analytics, and even trac

Frappe 21 Jan 02, 2023
JORLDY an open-source Reinforcement Learning (RL) framework provided by KakaoEnterprise

Repository for Open Source Reinforcement Learning Framework JORLDY

Kakao Enterprise Corp. 330 Dec 30, 2022
Implementation of "JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting"

JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting Pytorch implementation for the paper "JOKR: Joint Keypoint Repres

45 Dec 25, 2022
Photo2cartoon - 人像卡通化探索项目 (photo-to-cartoon translation project)

人像卡通化 (Photo to Cartoon) 中文版 | English Version 该项目为小视科技卡通肖像探索项目。您可使用微信扫描下方二维码或搜索“AI卡通秀”小程序体验卡通化效果。

Minivision_AI 3.5k Dec 30, 2022
An efficient toolkit for Face Stylization based on the paper "AgileGAN: Stylizing Portraits by Inversion-Consistent Transfer Learning"

MMGEN-FaceStylor English | 简体中文 Introduction This repo is an efficient toolkit for Face Stylization based on the paper "AgileGAN: Stylizing Portraits

OpenMMLab 182 Dec 27, 2022
Simple ONNX operation generator. Simple Operation Generator for ONNX.

sog4onnx Simple ONNX operation generator. Simple Operation Generator for ONNX. https://github.com/PINTO0309/simple-onnx-processing-tools Key concept V

Katsuya Hyodo 6 May 15, 2022
Bald-to-Hairy Translation Using CycleGAN

GANiry: Bald-to-Hairy Translation Using CycleGAN Official PyTorch implementation of GANiry. GANiry: Bald-to-Hairy Translation Using CycleGAN, Fidan Sa

Fidan Samet 10 Oct 27, 2022
PyTorch implementation of the TTC algorithm

Trust-the-Critics This repository is a PyTorch implementation of the TTC algorithm and the WGAN misalignment experiments presented in Trust the Critic

0 Nov 29, 2021
A python library to build Model Trees with Linear Models at the leaves.

A python library to build Model Trees with Linear Models at the leaves.

Marco Cerliani 212 Dec 30, 2022
A pytorch implementation of Paper "Improved Training of Wasserstein GANs"

WGAN-GP An pytorch implementation of Paper "Improved Training of Wasserstein GANs". Prerequisites Python, NumPy, SciPy, Matplotlib A recent NVIDIA GPU

Marvin Cao 1.4k Dec 14, 2022
Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022)

Blockwise Sequential Model Learning Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022) For ins

2 Jun 17, 2022
Code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition"

SEW (Squeezed and Efficient Wav2vec) The repo contains the code of the paper "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speec

ASAPP Research 67 Dec 01, 2022
Monify: an Expense tracker Program implemented in a Graphical User Interface that allows users to keep track of their expenses

💳 MONIFY (EXPENSE TRACKER PRO) 💳 Description Monify is an Expense tracker Program implemented in a Graphical User Interface allows users to add inco

Moyosore Weke 1 Dec 14, 2021
Pynomial - a lightweight python library for implementing the many confidence intervals for the risk parameter of a binomial model

Pynomial - a lightweight python library for implementing the many confidence intervals for the risk parameter of a binomial model

Demetri Pananos 9 Oct 04, 2022
This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

Ju He 307 Jan 03, 2023
KaziText is a tool for modelling common human errors.

KaziText KaziText is a tool for modelling common human errors. It estimates probabilities of individual error types (so called aspects) from grammatic

ÚFAL 3 Nov 24, 2022
Implementation for the paper 'YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs'

YOLO-ReT This is the original implementation of the paper: YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs. Prakhar Ganesh, Ya

69 Oct 19, 2022
a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

RNN-Playwrite a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LS

Arno Barton 1 Oct 29, 2021
A high-performance anchor-free YOLO. Exceeding yolov3~v5 with ONNX, TensorRT, NCNN, and Openvino supported.

YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities. For more details, please refer to our rep

7.7k Jan 06, 2023
MAg: a simple learning-based patient-level aggregation method for detecting microsatellite instability from whole-slide images

MAg Paper Abstract File structure Dataset prepare Data description How to use MAg? Why not try the MAg_lib! Trained models Experiment and results Some

Calvin Pang 3 Apr 08, 2022