This repository describes our reproducible framework for assessing self-supervised representation learning from speech.

Last update: Aug 24, 2022

Related tags

Text Data & NLP Interspeech2021

Overview

LeBenchmark: a reproducible framework for assessing SSL from speech

Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech, and have been notably successful at improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly carried out on ASR, using multiple heterogeneous experimental settings (most of them for English). This makes it difficult to compare SSL approaches objectively and to evaluate their impact on building speech systems.

In this repository, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR tasks (high and low resource) but also spoken language understanding, speech translation and emotion recognition. It also targets speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets.

Our pre-trained SSL models for French are available through this HuggingFace link: https://huggingface.co/LeBenchmark
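As an illustration only, the sketch below shows one way the released models might be used to extract speech representations with the HuggingFace transformers library. It is not taken from this repository: the model identifier, the 16 kHz sampling rate, and the helper name extract_features are assumptions, so check the hub page above for the actual model names and their recommended usage.

```python
# Minimal sketch, assuming the released checkpoints load with the
# wav2vec 2.0 classes of the `transformers` library.

# Assumed example identifier; see https://huggingface.co/LeBenchmark
# for the list of models that are actually published.
EXAMPLE_MODEL_ID = "LeBenchmark/wav2vec2-FR-7K-large"


def extract_features(waveform, model_id=EXAMPLE_MODEL_ID, sampling_rate=16000):
    """Return frame-level representations for a mono waveform.

    `waveform` is a 1-D float array of audio samples. The 16 kHz default
    is an assumption based on common wav2vec 2.0 practice.
    """
    # Imported lazily so that defining this helper does not require
    # torch/transformers to be installed or any model to be downloaded.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
    model = Wav2Vec2Model.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout

    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(inputs.input_values)

    # Tensor of shape (batch, frames, hidden_size)
    return outputs.last_hidden_state
```

Calling extract_features on a NumPy array holding one second of 16 kHz audio would download the checkpoint on first use and return a tensor with one row of features per frame. These features can then be fed to a downstream model for any of the benchmark tasks.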

Our benchmark tasks are available in the following directories:

ASR: Automatic Speech Recognition

SLU: Spoken Language Understanding

AER: Automatic Emotion Recognition

AST: Automatic Speech Translation

Detailed descriptions of the experiments and results are given in our paper: https://arxiv.org/pdf/2104.11462.pdf

(this page is still under construction)
