Recognition of 38 speech commands in Russian. Based on the Yandex Cup 2021 ML Challenge: ASR.

Last update: May 05, 2022

Speech_38_ru_commands

The program recognizes 38 Russian keywords spoken into a microphone. The supported keywords are:

дальше, вперед, назад, вверх, вниз, выше, ниже, домой, громче, тише, лайк, дизлайк, следующий, предыдущий, сначала, перемотай, выключи, стоп, хватит, замолчи, заткнись, останови, пауза, включи, смотреть, продолжи, играй, запусти, ноль, один, два, три, четыре, пять, шесть, семь, восемь, девять.

The model was prepared for the Yandex Cup 2021 ML Challenge: ASR competition, where it took 3rd place out of 54 participants with an accuracy score of 92.01.

Download the model from https://disk.yandex.ru/d/L053qF-0OPKlog

Example of running the program:

python speech_38_ru_commands.py --porog 1.2

Here 1.2 is the confidence threshold for a command. It can be set anywhere in the range 0.0 to 7.9999.
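As a rough illustration of how such a threshold option could be handled, the sketch below parses `--porog` with `argparse` and rejects values outside the documented range. It is not taken from `speech_38_ru_commands.py`. The helper names (`porog_value`, `build_parser`, `accept_command`), the default of 1.2, and the rule that a command is accepted when its score is at least the threshold are all assumptions made for the example.

```python
import argparse

# Documented range for the confidence threshold.
POROG_MIN = 0.0
POROG_MAX = 7.9999


def porog_value(text: str) -> float:
    """Parse the threshold and check that it lies in the documented range."""
    value = float(text)  # a ValueError here is reported by argparse as a usage error
    if not POROG_MIN <= value <= POROG_MAX:
        raise argparse.ArgumentTypeError(
            f"--porog must be between {POROG_MIN} and {POROG_MAX}, got {value}"
        )
    return value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single --porog option (the default is an assumption)."""
    parser = argparse.ArgumentParser(description="Recognize 38 Russian speech commands")
    parser.add_argument(
        "--porog",
        type=porog_value,
        default=1.2,
        help="confidence threshold for accepting a command (0.0 to 7.9999)",
    )
    return parser


def accept_command(score: float, porog: float) -> bool:
    """Hypothetical gating rule: accept a command only if its score reaches the threshold."""
    return score >= porog


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["--porog", "1.2"])
print(f"confidence threshold: {args.porog}")
```

Under this assumed rule, a higher threshold makes the program stricter: fewer utterances are accepted as commands, which reduces false triggers but can miss quietly or unclearly spoken words.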

Owner

Andrey
