The implementation of Parameter Differentiation based Multilingual Neural Machine Translation

Last update: Dec 17, 2022

Overview

This repository contains the implementation of Parameter Differentiation based Multilingual Neural Machine Translation.

Requirements:

apex
fairseq
scikit-learn
pytorch

Process the data following the fairseq multilingual translation example: https://github.com/pytorch/fairseq/tree/main/examples/translation#multilingual-translation

Training:

data_path=   # data path (the variable name must match $data_path used below)
lang_pairs=  # comma-separated language pairs, e.g. en-de,de-en

fairseq-train $data_path \
    --task parameter_differentiation_task --lang-pairs $lang_pairs --encoder-langtok tgt \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --optimizer adam --lr 0.0015 --adam-betas '(0.9,0.98)' \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
    --arch parameter_differentiation_base_model \
    --max-tokens 8192 \
    --user-dir $PWD
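
The value passed to --lang-pairs is a comma-separated list of source-target pairs. As a minimal sketch, assuming an English-centric setup with a hypothetical language list (neither is specified by this repository), the string could be assembled like this:

```shell
# Hypothetical languages; replace with the ones present in your own data.
center=en
others="de fr zh"

lang_pairs=""
for lang in $others; do
    # Add both directions: center->lang and lang->center.
    # ${lang_pairs:+...} inserts the comma only when the string is non-empty.
    lang_pairs="${lang_pairs:+$lang_pairs,}$center-$lang,$lang-$center"
done

echo "$lang_pairs"
```

Each pair listed this way needs corresponding processed data under the data path.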

Decoding:

source_lang=  # source language code
target_lang=  # target language code
model_path=   # path to the trained model checkpoint
fairseq-generate $data_path --path $model_path \
    --task parameter_differentiation_task --lang-pairs $lang_pairs --encoder-langtok tgt \
    --beam 4 --lenpen 0.6 --remove-bpe sentencepiece \
    --source-lang $source_lang --target-lang $target_lang > result.$source_lang-$target_lang.txt
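
The result file written by fairseq-generate interleaves source, reference and hypothesis lines, with hypotheses on tab-separated lines that start with H-<id>. The following is a rough sketch of pulling the hypotheses out in sentence order. The file name and the sample content are placeholders for illustration only, and depending on the fairseq version the post-processed text may instead appear on D- lines:

```shell
# Hypothetical file name, following result.$source_lang-$target_lang.txt above.
result=result.en-de.txt

# Stand-in for real fairseq-generate output so the sketch runs on its own.
printf 'S-1\tsource one\nH-1\t-0.5\thyp one\nS-0\tsource zero\nH-0\t-0.3\thyp zero\n' > "$result"

# Keep hypothesis lines, strip the H- prefix, order by sentence id,
# and keep only the text column (field 3 onward).
grep '^H-' "$result" | sed 's/^H-//' | sort -n -k1,1 | cut -f3- > "$result.hyp"

cat "$result.hyp"
```

The resulting .hyp file has one hypothesis per line in the original sentence order, which is the layout most scoring tools expect.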

Owner

Qian Wang