Safe Policy Optimization with Local Features

Last update: Jun 05, 2022

Overview

Safe Policy Optimization with Local Feature (SPO-LF)

This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization with Local Generalized Linear Function Approximations" which was presented in NeurIPS-21.

Installation

There is requirements.txt in this repository. Except for the common modules (e.g., numpy, scipy), our source code depends on the following modules.

Mandatory
- Gym-MiniGrid (https://github.com/maximecb/gym-minigrid)
- Hydra (https://github.com/facebookresearch/hydra)
- pymdptoolbox (https://github.com/sawcordwell/pymdptoolbox)
Optional
- GPy (https://github.com/SheffieldML/GPy)

We also provide Dockerfile in this repository, which can be used for reproducing our grid-world experiment.

Simulation configuration

We manage the simulation configuration using hydra. Configurations are listed in config.yaml. For example, the algorithm to run should be chosen from the ones we implemented:

sim_type: {safe_glm, unsafe_glm, random, oracle, safe_gp_state, safe_gp_feature, safe_glm_stepwise}

Grid World Experiment

The source code necessary for our grid-world experiment is contained in /grid_world folder. To run the simulation, for example, use the following commands.

cd grid_world
python main.py sim_type=safe_glm env.reuse_env=False

For the monte carlo simulation while comparing our proposed method with baselines, use the shell file, run.sh.

We also provide a script for visualization. If you want to render how the agent behaves, use the following command.

python main.py sim_type=safe_glm env.reuse_env=True

Safety-Gym Experiment

The source code necessary for our safety-gym experiment is contained in /safety_gym_discrete folder. Our experiment is based on safety-gym. Our proposed method utilize dynamic programming algorithms to solve Bellman Equation, so we modified engine.py to discrtize the environment. We attach modified safety-gym source code in /safety_gym_discrete/engine.py. To use the modified library, please clone safety-gym, then replace safety-gym/safety_gym/envs/engine.py using /safety_gym_discrete/engine.py in our repo. Using the following commands to install the modified library:

cd safety_gym
pip install -e .

Note that MuJoCo licence is needed for installing Safety-Gym. To run the simulation, use the folowing commands.

cd safety_gym_discrete
python main.py sim_idx=0

We compare our proposed method with three notable baselines: CPO, PPO-Lagrangian, and TRPO-Lagrangian. The baseline implementation depends on safety-starter-agents. We modified run_agent.py in the repo source code.

To run the baseline, use the folowing commands.

cd safety_gym_discrete/baseline
python baseline_run.py sim_type=cpo

The environment that agent runs on is generated using generate_env.py. We provide 10 50*50 environments. If you want to generate other environments, you can change the world shape in safety_gym_discrete.py, and running the following commands:

cd safety_gym_discrete
python generate_env.py

Citation

If you find this code useful in your research, please consider citing:

@inproceedings{wachi_yue_sui_neurips2021,
  Author = {Wachi, Akifumi and Wei, Yunyue and Sui, Yanan},
  Title = {Safe Policy Optimization with Local Generalized Linear Function Approximations},
  Booktitle  = {Neural Information Processing Systems (NeurIPS)},
  Year = {2021}
}

Safe Policy Optimization with Local Features

Related tags

Overview

Safe Policy Optimization with Local Feature (SPO-LF)

Installation

Simulation configuration

Grid World Experiment

Safety-Gym Experiment

Citation

Owner

Akifumi Wachi

Implementation of Heterogeneous Graph Attention Network

Pytorch implementation of Deep Recursive Residual Network for Super Resolution (DRRN)

A Python wrapper for Google Tesseract

Multi Camera Calibration

[ICCV'21] Neural Radiance Flow for 4D View Synthesis and Video Processing

From a body shape, infer the anatomic skeleton.

Implementation of UNet on the Joey ML framework

PyTorch implementation of Pointnet2/Pointnet++

SpineAI Bilsky Grading With Python

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Potato Disease Classification - Training, Rest APIs, and Frontend to test.

Official Repo of my work for SREC Nandyal Machine Learning Bootcamp

Code of our paper "Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning"

Domain Adaptation with Invariant RepresentationLearning: What Transformations to Learn?

Code for one-stage adaptive set-based HOI detector AS-Net.

Repository for the paper : Meta-FDMixup: Cross-Domain Few-Shot Learning Guided byLabeled Target Data

RodoSol-ALPR Dataset

Memoized coduals - Shows that it is possible to implement reverse mode autodiff using a variation on the dual numbers called the codual numbers

An attempt at the implementation of GLOM, Geoffrey Hinton's paper for emergent part-whole hierarchies from data

Deep learning based hand gesture recognition using LSTM and MediaPipie.