Official codebase for "B-Pref: Benchmarking Preference-BasedReinforcement Learning" contains scripts to reproduce experiments.

Last update: Dec 20, 2022

Related tags

Deep Learning BPref

Overview

B-Pref

Official codebase for B-Pref: Benchmarking Preference-BasedReinforcement Learning contains scripts to reproduce experiments.

Install

conda env create -f conda_env.yml
pip install -e .[docs,tests,extra]
cd custom_dmcontrol
pip install -e .
cd custom_dmc2gym
pip install -e .
pip install git+https://github.com/rlworkgroup/[email protected]#egg=metaworld
pip install pybullet

Run experiments using GT rewards

SAC & SAC + unsupervised pre-training

Experiments can be reproduced with the following:

./scripts/[env_name]/run_sac.sh 
./scripts/[env_name]/run_sac_unsuper.sh

PPO & PPO + unsupervised pre-training

Experiments can be reproduced with the following:

./scripts/[env_name]/run_ppo.sh 
./scripts/[env_name]/run_ppo_unsuper.sh

Run experiments on irrational teacher

To design more realistic models of human teachers, we consider a common stochastic model and systematically manipulate its terms and operators:

teacher_beta: rationality constant of stochastic preference model (default: -1 for perfectly rational model)
teacher_gamma: discount factor to model myopic behavior (default: 1)
teacher_eps_mistake: probability of making a mistake (default: 0)
teacher_eps_skip: hyperparameters to control skip threshold (\in [0,1])
teacher_eps_equal: hyperparameters to control equal threshold (\in [0,1])

In B-Pref, we tried the following teachers:

Oracle teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Mistake teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0.1, teacher_eps_skip=0, teacher_eps_equal=0)

Noisy teacher: (teacher_beta=1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Skip teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0.1, teacher_eps_equal=0)

Myopic teacher: (teacher_beta=-1, teacher_gamma=0.9, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Equal teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0.1)

PEBBLE

Experiments can be reproduced with the following:

./scripts/[env_name]/[teacher_type]/[max_budget]/run_PEBBLE.sh [sampling_scheme: 0=uniform, 1=disagreement, 2=entropy]

PrefPPO

Experiments can be reproduced with the following:

./scripts/[env_name]/[teacher_type]/[max_budget]/run_PrefPPO.sh [sampling_scheme: 0=uniform, 1=disagreement, 2=entropy]

note: full hyper-paramters for meta-world will be updated soon!

Official codebase for "B-Pref: Benchmarking Preference-BasedReinforcement Learning" contains scripts to reproduce experiments.

Related tags

Overview

B-Pref

Install

Run experiments using GT rewards

SAC & SAC + unsupervised pre-training

PPO & PPO + unsupervised pre-training

Run experiments on irrational teacher

PEBBLE

PrefPPO

Owner

[ WSDM '22 ] On Sampling Collaborative Filtering Datasets

This repo contains the code for the paper "Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroimaging" that has been accepted to NeurIPS 2021.

Use graph-based analysis to re-classify stocks and to improve Markowitz portfolio optimization

CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution

Code accompanying "Adaptive Methods for Aggregated Domain Generalization"

A curated list of awesome projects and resources related fastai

Reference implementation for Structured Prediction with Deep Value Networks

GLM (General Language Model)

Simple tools for logging and visualizing, loading and training

Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker

Implementation of the GBST block from the Charformer paper, in Pytorch

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Code for the paper: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

'Aligned mixture of latent dynamical systems' (amLDS) for stimulus decoding probabilistic manifold alignment across animals. P. Herrero-Vidal et al. NeurIPS 2021 code.

Out-of-Distribution Generalization of Chest X-ray Using Risk Extrapolation

Medical Image Segmentation using Squeeze-and-Expansion Transformers

Seq2seq - Sequence to Sequence Learning with Keras

Python scripts form performing stereo depth estimation using the CoEx model in ONNX.

FG-transformer-TTS Fine-grained style control in transformer-based text-to-speech synthesis