Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Last update: Sep 07, 2022

Overview

This repository is derived from the NMTGMinor project at https://github.com/quanpn90/NMTGMinor
The SVCCA calculation is derived from https://github.com/nlp-dke/svcca

Powered by Mediaan.com

Speech Translation (ST) is the task of translating speech audio in a source language into text in a target language. This repository implements and experiments with different approaches to ST:

- Cascaded ST: a two-step pipeline of Automatic Speech Recognition (ASR) followed by Machine Translation (MT)
- Direct ST: models trained only on ST data
- (Main contribution) End-to-end ST with limited use of ST data: multi-modal models that leverage ASR and MT training data for the ST task
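The zero-shot idea can be illustrated at the data level. In multilingual MT, a target-language token tells one shared model which language to produce, which is what lets it serve directions it never saw paired in training. The sketch below assumes the same trick carries over to modalities: ASR pairs (audio to source-language transcript) and MT pairs (source text to target text) are mixed into one training set, and ST is requested at test time by combining audio input with the MT target tag, a combination absent from training. The `Example` record, the `<2de>`-style tag format, and the en/de language pair are hypothetical illustrations, not this repository's data format.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Example:
    modality: str  # "audio" or "text": which input type the encoder receives
    source: str    # path to an audio clip, or a source-language sentence
    tgt_tag: str   # target-language token telling the decoder what to produce
    target: str    # reference output ("" at inference time)


def lang_tag(lang: str) -> str:
    # Hypothetical tag format; multilingual NMT systems use various conventions.
    return f"<2{lang}>"


def build_training_set(
    asr_pairs: List[Tuple[str, str]],  # (audio path, transcript in src_lang)
    mt_pairs: List[Tuple[str, str]],   # (src_lang sentence, tgt_lang sentence)
    src_lang: str = "en",
    tgt_lang: str = "de",
) -> List[Example]:
    """Mix ASR and MT data into one multi-task set; no ST pairs are used."""
    data = [Example("audio", a, lang_tag(src_lang), t) for a, t in asr_pairs]
    data += [Example("text", s, lang_tag(tgt_lang), t) for s, t in mt_pairs]
    return data


def seen_directions(data: List[Example]) -> Set[Tuple[str, str]]:
    """(input modality, target tag) combinations observed during training."""
    return {(ex.modality, ex.tgt_tag) for ex in data}


def st_request(audio_path: str, tgt_lang: str = "de") -> Example:
    """Zero-shot ST query: audio input combined with the MT target tag."""
    return Example("audio", audio_path, lang_tag(tgt_lang), "")


if __name__ == "__main__":
    train = build_training_set(
        asr_pairs=[("clip_001.wav", "good morning")],
        mt_pairs=[("good morning", "guten Morgen")],
    )
    query = st_request("clip_002.wav")
    # The (audio, target-language) direction never occurs in training.
    print((query.modality, query.tgt_tag) in seen_directions(train))  # False
```

Under this framing, the cascaded approach would instead run two separately trained models in sequence, while the zero-shot model relies on a single shared model generalizing to the unseen input/output combination.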

The implementation is built on the Transformer architecture, which serves as the baseline model.
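The core operation of the Transformer is scaled dot-product attention, softmax(QKᵀ/√d_k)V (Vaswani et al., 2017). The NumPy sketch below is the textbook formulation, not code taken from NMTGMinor or this repository; the function name and the boolean-mask convention (True means "may attend") are illustrative assumptions.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V.

    q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v).
    mask: optional boolean array broadcastable to (n_q, n_k); True = keep.
    Returns the attended values (n_q, d_v) and the attention weights (n_q, n_k).
    """
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Masked positions get a huge negative score, so their weight becomes ~0.
        scores = np.where(mask, scores, -1e9)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Each output row is a weighted average of the value vectors, with weights that sum to 1 across the keys.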

High-level instructions for using the repo:

Run covost_data_preparation.py to download and preprocess the data.
Run the shell script of interest, adjusting the variables in the script if needed:
- run_translation_pipeline.sh for single-task models (ASR, MT, ST)
- cascaded_ST_evaluation.sh for evaluating cascaded ST with pretrained ASR and MT models
- run_translation_multi_modalities_pipeline.sh for multi-task, multi-modality models (including zero-shot)
- run_zeroshot_with_artificial_data.sh for zero-shot models using data augmentation
- run_bidirectional_zeroshot.sh for zero-shot models trained with additional data in the opposite direction
- run_fine_tunning.sh, run_fine_tunning_fromASR.sh for fine-tuning models on ST data, resulting in few-shot models
- modality_similarity_svcca.sh, modality_similarity_classifier.sh for measuring the similarity between text and audio representations
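The SVCCA similarity measurement can be sketched as follows, assuming each row of the two matrices is a fixed-size representation (for example, a mean-pooled encoder output) of the same utterance, once computed from its transcript and once from its audio. The sketch follows the general SVCCA recipe (SVD truncation to the directions that explain most of the variance, then CCA on the reduced subspaces, averaged into one score). It is not the code from the nlp-dke/svcca repository; the function names and the 0.99 variance threshold are illustrative assumptions.

```python
import numpy as np


def top_subspace(acts: np.ndarray, var_kept: float) -> np.ndarray:
    """SV step: center the activations and return an orthonormal basis of the
    top singular directions that together explain `var_kept` of the variance."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(energy, var_kept)) + 1, len(s))
    return u[:, :k]


def svcca(acts1: np.ndarray, acts2: np.ndarray, var_kept: float = 0.99) -> float:
    """Mean canonical correlation between two (n_samples, dim) matrices whose
    rows describe the same inputs (e.g. text vs. audio of one utterance)."""
    if acts1.shape[0] != acts2.shape[0]:
        raise ValueError("both matrices need one row per shared input")
    b1 = top_subspace(acts1, var_kept)
    b2 = top_subspace(acts2, var_kept)
    # CCA step: with orthonormal bases, the canonical correlations are the
    # singular values of b1^T b2 (cosines of the principal angles).
    corrs = np.linalg.svd(b1.T @ b2, compute_uv=False)
    return float(np.mean(np.clip(corrs, 0.0, 1.0)))
```

With hypothetical arrays `text_reprs` and `audio_reprs` of shape (n_utterances, d_model), `svcca(text_reprs, audio_reprs)` returns a score in [0, 1]. A score near 1 means the two representation subspaces are nearly aligned, i.e. the encoder represents the text and audio of the same content similarly; unrelated random representations score much lower.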

See notebooks/Repo_Instruction.ipynb for more details.

Owner

Tu Anh Dinh
