"Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback"

Last update: Oct 21, 2022

bandit-nmt

THIS REPO DEMONSTRATES HOW TO INTEGRATE A POLICY GRADIENT METHOD INTO NMT. FOR A STATE-OF-THE-ART NMT CODEBASE, VISIT simple-nmt.

This is the code repository for our EMNLP 2017 paper "Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback". It implements the A2C algorithm on top of a neural encoder-decoder model and benchmarks the combination under simulated noisy rewards.
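As a rough illustration of the actor-critic idea named above, the sketch below computes A2C-style losses for one sampled translation in plain Python. It is not the repository's implementation: the function name `a2c_losses`, its arguments, and the choice of a single sentence-level reward shared by every time step are assumptions made for the example.

```python
from typing import List, Tuple


def a2c_losses(
    log_probs: List[float],
    values: List[float],
    reward: float,
) -> Tuple[float, float]:
    """Return (actor_loss, critic_loss) for one sampled translation.

    log_probs[t]: log-probability the actor gave to the sampled token at step t.
    values[t]:    the critic's estimate of the final reward at step t.
    reward:       one sentence-level score (hypothetical simulated feedback).
    """
    if len(log_probs) != len(values):
        raise ValueError("log_probs and values must have the same length")

    actor_loss = 0.0
    critic_loss = 0.0
    for log_p, value in zip(log_probs, values):
        # Advantage: how much better the outcome was than the critic expected.
        # It is treated as a constant when updating the actor.
        advantage = reward - value
        # Policy-gradient term: raise the log-probability of tokens that led
        # to a better-than-expected reward, lower it otherwise.
        actor_loss += -advantage * log_p
        # The critic is regressed toward the observed reward.
        critic_loss += (value - reward) ** 2
    return actor_loss, critic_loss


actor_loss, critic_loss = a2c_losses(
    log_probs=[-0.5, -1.0], values=[0.5, 0.5], reward=1.0
)
```

In a real model both losses would be computed on tensors so that gradients flow to the actor and critic parameters; the scalar version here only shows the arithmetic.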

Requirements:

Python 3.6
PyTorch 0.2

NOTE: as of Sep 16 2017, the code became 2x slower after I upgraded to PyTorch 0.2. This is a known issue and PyTorch is fixing it.

IMPORTANT: Set home directory (otherwise scripts will not run correctly):

> export BANDIT_HOME=$PWD
> export DATA=$BANDIT_HOME/data
> export SCRIPT=$BANDIT_HOME/scripts

Data extraction

Download pre-processing scripts

> cd $DATA/scripts
> bash download_scripts.sh

For German-English

> cd $DATA/en-de
> bash extract_data_de_en.sh

NOTE: train_2014 and train_2015 highly overlap. Please be cautious when using them for other projects.

Data should be ready in $DATA/en-de/prep

TODO: Chinese-English needs segmentation

Data pre-processing

> cd $SCRIPT
> bash make_data.sh de en

Pretraining

Pretrain both actor and critic

> cd $SCRIPT
> bash pretrain.sh en-de $YOUR_LOG_DIR

See scripts/pretrain.sh for more details.

Pretrain actor only

> cd $BANDIT_HOME
> python train.py -data $YOUR_DATA -save_dir $YOUR_SAVE_DIR -end_epoch 10

Reinforcement training

> cd $BANDIT_HOME

From scratch

> python train.py -data $YOUR_DATA -save_dir $YOUR_SAVE_DIR -start_reinforce 10 -end_epoch 100 -critic_pretrain_epochs 5

From a pretrained model

> python train.py -data $YOUR_DATA -load_from $YOUR_MODEL -save_dir $YOUR_SAVE_DIR -start_reinforce -1 -end_epoch 100 -critic_pretrain_epochs 5

Perturbed rewards

For example, to use a thumbs up/thumbs down reward:

> cd $BANDIT_HOME
> python train.py -data $YOUR_DATA -load_from $YOUR_MODEL -save_dir $YOUR_SAVE_DIR -start_reinforce -1 -end_epoch 100 -critic_pretrain_epochs 5 -pert_func bin -pert_param 1

See lib/metric/PertFunction.py for other types of perturbation functions.
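To make the idea of a coarse, thumbs up/thumbs down style reward concrete, here is a minimal sketch of one way to quantize a score in [0, 1] onto a small number of evenly spaced levels. This is an assumption for illustration, not the code in lib/metric/PertFunction.py: the name `quantize_reward`, the `num_bins` parameter, and the rounding rule are all hypothetical, and the mapping to `-pert_func bin -pert_param 1` is only a plausible reading.

```python
import math


def quantize_reward(score: float, num_bins: int) -> float:
    """Snap a score in [0, 1] to the nearest of num_bins + 1 even levels.

    With num_bins = 1 the only levels are 0.0 and 1.0, which behaves like a
    thumbs down / thumbs up signal.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # floor(x + 0.5) rounds halves upward, avoiding Python's banker's rounding.
    return math.floor(score * num_bins + 0.5) / num_bins


thumbs_up = quantize_reward(0.7, 1)    # rounds up to the top level
thumbs_down = quantize_reward(0.3, 1)  # rounds down to the bottom level
five_level = quantize_reward(0.26, 4)  # nearest of 0, 0.25, 0.5, 0.75, 1
```

Coarser levels discard information about translation quality, which is the kind of degradation a simulated human rater would introduce.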

Evaluation

> cd $BANDIT_HOME

On heldout sets (heldout BLEU):

> python train.py -data $YOUR_DATA -load_from $YOUR_MODEL -eval -save_dir .

On bandit set (per-sentence BLEU):

> python train.py -data $YOUR_DATA -load_from $YOUR_MODEL -eval_sample -save_dir .
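For readers unfamiliar with per-sentence BLEU, the following is a small self-contained sketch of a smoothed sentence-level BLEU score. It is an assumption-laden illustration rather than the metric code used by this repository: the function names, the add-one smoothing on n-grams of order 2 and above, and whitespace tokenization are choices made for the example, and the repository's scores may differ.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: List[str], ref: List[str], max_n: int = 4) -> float:
    """Smoothed BLEU for one hypothesis against one reference."""
    if not hyp or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all
            precision = matches / total
        else:
            # Add-one smoothing keeps short sentences from scoring zero.
            precision = (matches + 1) / (total + 1)
        log_precision += math.log(precision) / max_n
    # Brevity penalty punishes hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision)


reference = "the cat sat down".split()
exact_score = sentence_bleu(reference, reference)
short_score = sentence_bleu("the cat sat".split(), reference)
```

An exact match scores 1.0, while the truncated hypothesis is penalized only through the brevity penalty in this smoothing scheme.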

Owner

Khanh Nguyen
