Learning Domain Invariant Representations in Goal-conditioned Block MDPs

Last update: Apr 12, 2022

Beining Han, Chongyi Zheng, Harris Chan, Keiran Paster, Michael R. Zhang, Jimmy Ba

Summary: Deep Reinforcement Learning agents often face unanticipated environmental changes after deployment in the real world. These changes are often spurious and unrelated to the underlying problem, such as background shifts for visual input agents. Unfortunately, deep RL policies are usually sensitive to these changes and fail to act robustly against them. This resembles the problem of domain generalization in supervised learning. In this work, we study this problem for goal-conditioned RL agents. We propose a theoretical framework in the Block MDP setting that characterizes the generalizability of goal-conditioned policies to new environments. Under this framework, we develop a practical method PA-SkewFit (PASF) that enhances domain generalization.

@article{han2021learning,
  title={Learning Domain Invariant Representations in Goal-conditioned Block MDPs},
  author={Han, Beining and Zheng, Chongyi and Chan, Harris and Paster, Keiran and Zhang, Michael and Ba, Jimmy},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

Installation

Our code was adapted from rlkit and was tested on an Ubuntu 20.04 server.

These instructions assume that you have already installed the NVIDIA driver, Anaconda, and MuJoCo.

You'll need to get your own MuJoCo key if you want to use MuJoCo.

1. Create Anaconda environment

Install the included Anaconda environment

$ conda env create -f environment/pasf_env.yml
$ source activate pasf_env
(pasf_env) $ python
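Once the interpreter starts, a quick check that the environment resolved its key packages can save a failed run later. The sketch below is only an illustration: the module names `torch` and `mujoco_py` are assumptions based on the rlkit and MuJoCo dependencies mentioned above, and `check_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util


def check_modules(names):
    """Return {module name: True if the module can be found in this environment}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed module names; adjust to whatever environment/pasf_env.yml installs.
    for name, ok in check_modules(["torch", "mujoco_py"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

`find_spec` only locates a module without importing it, so this check does not trigger the mujoco-py build step.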

2. Download the goals

Download the goals from the following link and put them in (PASF DIR)/multiworld/envs/mujoco.

https://drive.google.com/drive/folders/1L9SYFADWmFzdP1c6wf2yo2WjOlXJh8Iu?usp=sharing

$ ls (PASF DIR)/multiworld/envs/mujoco
... goals ...
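The same check can be scripted, for example before launching a batch of runs. This is a minimal sketch: `has_goals` is a hypothetical helper, and it assumes only what the listing above shows, namely that an entry named `goals` appears under multiworld/envs/mujoco.

```python
from pathlib import Path


def has_goals(pasf_dir):
    """Return True if <pasf_dir>/multiworld/envs/mujoco contains an entry named 'goals'."""
    return (Path(pasf_dir) / "multiworld" / "envs" / "mujoco" / "goals").exists()


if __name__ == "__main__":
    import sys

    # Pass the repository root (PASF DIR) as the first argument; defaults to ".".
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print("goals found" if has_goals(root) else "goals missing")
```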

3. (Optional) Speed-up with GPU rendering

Note: GPU rendering for mujoco-py speeds up training a lot but consumes more GPU memory at the same time.

Check this issue:

Remember to apply these changes to the mujoco-py package inside your pasf_env.

Running Experiments

The following commands run the PASF learning curve (LC) experiments for the four tasks: Reach, Door, Push, and Pickup, respectively.

$ source activate pasf_env
(pasf_env) $ bash (PASF DIR)/bash_scripts/pasf_reach_lc_exp.bash
(pasf_env) $ bash (PASF DIR)/bash_scripts/pasf_door_lc_exp.bash
(pasf_env) $ bash (PASF DIR)/bash_scripts/pasf_push_lc_exp.bash
(pasf_env) $ bash (PASF DIR)/bash_scripts/pasf_pickup_lc_exp.bash

The bash scripts only set , , and with the exact values we used for LC. But you can play with other hyperparameters in python scripts under (PASF DIR)/experiment.
Training and evaluation environments are chosen in python scripts for each task. You can find the backgrounds in (PASF DIR)/multiworld/core/background and domains in (PASF DIR)/multiworld/envs/assets/sawyer_xyz.
Results are recorded in progress.csv under (PASF DIR)/data/ and variant.json contains configuration for each experiment.
We simply set random seeds as 0, 1, 2, etc., and run experiments with 6-9 different seeds for each task.
Error and output logs can be found in (PASF DIR)/terminal_log.
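Since each seed writes its own progress.csv, a learning curve is typically obtained by averaging one logged column across runs. The following is a rough sketch under stated assumptions, not the authors' plotting code: `read_column` and `mean_over_runs` are hypothetical helpers, the directory layout below (PASF DIR)/data is not assumed beyond "progress.csv files exist somewhere under it", and the column name must be taken from the header of your own progress.csv.

```python
import csv
from pathlib import Path
from statistics import mean


def read_column(csv_path, column):
    """Read one numeric column from a progress.csv file, skipping blank cells."""
    with open(csv_path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f) if row.get(column)]


def mean_over_runs(data_dir, column):
    """Average `column` row by row over every progress.csv found under data_dir."""
    paths = sorted(Path(data_dir).rglob("progress.csv"))
    runs = [read_column(p, column) for p in paths]
    runs = [r for r in runs if r]  # drop runs that lack the column or are empty
    if not runs:
        return []
    n = min(len(r) for r in runs)  # truncate to the shortest run
    return [mean(r[i] for r in runs) for i in range(n)]
```

Calling `mean_over_runs` on the data directory with a column name copied from the CSV header returns one averaged value per logged row. Point it at a single task's subdirectory to avoid mixing runs from different tasks. Runs are truncated to the shortest one so that unfinished seeds do not skew the tail of the curve.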

Questions

If you have any questions, comments, or suggestions, please reach out to Beining Han ([email protected]) and Chongyi Zheng ([email protected]).


Owner

Chongyi Zheng
