End-to-end speech secognition toolkit

Overview

End-to-end speech secognition toolkit

This is an E2E ASR toolkit modified from Espnet1 (version 0.9.9).
This is the official implementation of paper:
Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI
This is also the official implementation of paper:
Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model
We achieve state-of-the-art results on two of the most popular results in Aishell-1 and AIshell-2 Mandarin datasets.
Please feel free to change / modify the code as you like. :)

Update

  • 2021/12/29: Release the first version, which contains all MMI-related features, including MMI training criteria, MMI Prefix Score (for attention-based encoder-decoder, AED) and MMI Alignment Score (For neural transducer, NT).
  • 2022/1/6: Release the word-level N-gram LM scorer.

Environment:

The main dependencies of this code can be divided into three part: kaldi, espnet and k2.

  1. kaldi is mainly used for feature extraction. To install kaldi, please follow the instructions here.
  2. Espnet is a open-source end-to-end speech recognition toolkit. please follow the instructions here to install its environment.
    2.1. Pytorch, cudatoolkit, along with many other dependencies will be install automatically during this process. 2.2. If you are going to use NT models, you are recommend to install a RNN-T warpper. Please run ${ESPNET_ROOT}/tools/installer/install_warp-transducer.sh
    2.3. Once you have installed the espnet envrionment successfully, please run pip uninstall espnet to remove the espnet library. So our code will be used.
    2.4. Also link the kaldi in ${ESPNET_ROOT}: ln -s ${KALDI-ROOT} ${ESPNET_ROOT}
  3. k2 is a python-based FST library. Please follow the instructions here to install it. GPU version is required.
    3.1. To use word N-gram LM, please also install kaldilm
  4. There might be some dependency conflicts during building the environment. We report ours below as a reference:
    4.1 OS: CentOS 7; GCC 7.3.1; Python 3.8.10; CUDA 10.1; Pytorch 1.7.1; k2-fsa 1.2 (very old for now)
    4.2 Other python libraries are in requirement.txt (It is not recommend to use this file to build the environment directly).

Results

Currently we have released examples on Aishell-1 and Aishell-2 datasets.

With MMI training & decoding methods and the word-level N-gram LM. We achieve results on Aishell-1 and Aishell-2 as below. All results are in CER%

Test set Aishell-1-dev Aishell-1-test Aishell-2-ios Aishell-2-android Aishell-2-mic
AED 4.73 5.32 5.73 6.56 6.53
AED + MMI + Word Ngram 4.08 4.45 5.26 6.22 5.92
NT 4.41 4.81 5.70 6.75 6.58
NT + MMI + Word Ngram 3.86 4.18 5.06 6.08 5.98

(example on Librispeech is not fully prepared)

Get Start

Take Aishell-1 as an example. Working process for other examples are very similar.
Prepare data and LMs

cd ${ESPNET_ROOT}/egs/aishell1
source path.sh
bash prepare.sh # prepare the data

split the json file of training data for each GPU. (we use 8GPUs)

python3 espnet_utils/splitjson.py -p 
   
     dump/train_sp/deltafalse/data.json

   

Training and decoding for NT model:

bash nt.sh      # to train the nueal transducer model

Training and decoding for AED model:

bash aed.sh     # or to train the attention-based encoder-decoder model

Several Hint:

  1. Please change the paths in path.sh accordingly before you start
  2. Please change the data to config your data path in prepare.sh
  3. Our code runs in DDP style. Before you start, you need to set them manually. We assume Pytorch distributed API works well on your machine.
export HOST_GPU_NUM=x       # number of GPUs on each host
export HOST_NUM=x           # number of hosts
export NODE_NUM=x           # number of GPUs in total (on all hosts)
export INDEX=x              # index of this host
export CHIEF_IP=xx.xx.xx.xx # IP of the master host
  1. Multiple choices are available during decoding (we take aed.sh as an example, but the usage of nt.sh is the same).
    To use the MMI-related scorers, you need train the model with MMI auxiliary criterion;

To use MMI Prefix Score (in AED) or MMI Alignment score (in NT):

bash aed.sh --stage 2 --mmi-weight 0.2

To use any external LM, you need to train them in advance (as implemented in prepare.sh)

To use word-level N-gram LM:

bash aed.sh --stage 2 --word-ngram-weight 0.4

To use character-level N-gram LM:

bash aed.sh --stage 2 --ngram-weight 1.0

To use neural network LM:

bash aed.sh --stage 2 --lm-weight 1.0

Reference

kaldi: https://github.com/kaldi-asr/kaldi
Espent: https://github.com/espnet/espnet
k2-fsa: https://github.com/k2-fsa/k2

Citations

@article{tian2021consistent,  
  title={Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI},  
  author={Tian, Jinchuan and Yu, Jianwei and Weng, Chao and Zhang, Shi-Xiong and Su, Dan and Yu, Dong and Zou, Yuexian},  
  journal={arXiv preprint arXiv:2112.02498},  
  year={2021}  
}  

@misc{tian2022improving,
      title={Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model}, 
      author={Jinchuan Tian and Jianwei Yu and Chao Weng and Yuexian Zou and Dong Yu},
      year={2022},
      eprint={2201.01995},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Authorship

Jinchuan Tian; [email protected] or [email protected]
Jianwei Yu; [email protected] (supervisor)
Chao Weng; [email protected]
Yuexian Zou; [email protected]

Owner
Jinchuan Tian
Graduate student @ Peking University, Shenzhen; Research intern @ Tencent AI LAB;
Single cell current best practices tutorial case study for the paper:Luecken and Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial"

Scripts for "Current best-practices in single-cell RNA-seq: a tutorial" This repository is complementary to the publication: M.D. Luecken, F.J. Theis,

Theis Lab 968 Dec 28, 2022
Plaything for Autistic Children (demo for PaddlePaddle/Wechaty/Mixlab project)

星星的孩子 - 一款为孤独症孩子设计的聊天机器人游戏 孤独症儿童是目前常常被忽视的一类群体。他们有着类似性格内向的特征,实际却受着广泛性发育障碍的折磨。 项目背景 这类儿童在与人交往时存在着沟通障碍,其特点表现在: 社交交流差,互动障碍明显 认知能力有限,被动认知 兴趣狭窄,重复刻板,缺乏变化和想象

Tianyi Pan 35 Nov 24, 2022
Creating Multi Task Models With Keras

Creating Multi Task Models With Keras About The Project! I used the keras and Tensorflow Library, To build a Deep Learning Neural Network to Creating

Srajan Chourasia 4 Nov 28, 2022
Bib-parser - Convenient script to parse .bib files with the ACM Digital Library like metadata

Bib Parser Convenient script to parse .bib files with the ACM Digital Library li

Mehtab Iqbal (Shahan) 1 Jan 26, 2022
Links to works on deep learning algorithms for physics problems, TUM-I15 and beyond

Links to works on deep learning algorithms for physics problems, TUM-I15 and beyond

Nils Thuerey 1.3k Jan 08, 2023
An SMPC companion library for Syft

SyMPC A library that extends PySyft with SMPC support SyMPC /ˈsɪmpəθi/ is a library which extends PySyft ≥0.3 with SMPC support. It allows computing o

Arturo Marquez Flores 0 Oct 13, 2021
NAS-Bench-x11 and the Power of Learning Curves

NAS-Bench-x11 NAS-Bench-x11 and the Power of Learning Curves Shen Yan, Colin White, Yash Savani, Frank Hutter. NeurIPS 2021. Surrogate NAS benchmarks

AutoML-Freiburg-Hannover 13 Nov 18, 2022
Object-aware Contrastive Learning for Debiased Scene Representation

Object-aware Contrastive Learning Official PyTorch implementation of "Object-aware Contrastive Learning for Debiased Scene Representation" by Sangwoo

43 Dec 14, 2022
Image-Scaling Attacks and Defenses

Image-Scaling Attacks & Defenses This repository belongs to our publication: Erwin Quiring, David Klein, Daniel Arp, Martin Johns and Konrad Rieck. Ad

Erwin Quiring 163 Nov 21, 2022
Object DGCNN and DETR3D, Our implementations are built on top of MMdetection3D.

Object DGCNN & DETR3D This repo contains the implementations of Object DGCNN (https://arxiv.org/abs/2110.06923) and DETR3D (https://arxiv.org/abs/2110

Wang, Yue 539 Jan 07, 2023
Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention

cosFormer Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention Update log 2022/2/28 Add core code License This

120 Dec 15, 2022
PyTorch implementation of Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose

Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose Release Notes The official PyTorch implementation of Neural View S

Angtian Wang 20 Oct 09, 2022
Paper: De-rendering Stylized Texts

Paper: De-rendering Stylized Texts Wataru Shimoda1, Daichi Haraguchi2, Seiichi Uchida2, Kota Yamaguchi1 1CyberAgent.Inc, 2 Kyushu University Accepted

CyberAgent AI Lab 55 Dec 18, 2022
PyTorch Implementation for Deep Metric Learning Pipelines

Easily Extendable Basic Deep Metric Learning Pipeline Karsten Roth ([email 

Karsten Roth 543 Jan 04, 2023
This is a JAX implementation of Neural Radiance Fields for learning purposes.

learn-nerf This is a JAX implementation of Neural Radiance Fields for learning purposes. I've been curious about NeRF and its follow-up work for a whi

Alex Nichol 62 Dec 20, 2022
PyTorch implementation of DARDet: A Dense Anchor-free Rotated Object Detector in Aerial Images

DARDet PyTorch implementation of "DARDet: A Dense Anchor-free Rotated Object Detector in Aerial Images", [pdf]. Highlights: 1. We develop a new dense

41 Oct 23, 2022
Unadversarial Examples: Designing Objects for Robust Vision

Unadversarial Examples: Designing Objects for Robust Vision This repository contains the code necessary to replicate the major results of our paper: U

Microsoft 93 Nov 28, 2022
A clean and extensible PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

A clean and extensible PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners A PyTorch re-implementation of Mask Autoencoder trai

Tianyu Hua 23 Dec 13, 2022
Python calculations for the position of the sun and moon.

Astral This is 'astral' a Python module which calculates Times for various positions of the sun: dawn, sunrise, solar noon, sunset, dusk, solar elevat

Simon Kennedy 169 Dec 20, 2022
Object Detection and Multi-Object Tracking

Object Detection and Multi-Object Tracking

Bobby Chen 1.6k Jan 04, 2023