Official implementation of "Accelerating Reinforcement Learning with Learned Skill Priors", Pertsch et al., CoRL 2020

Overview

Accelerating Reinforcement Learning with Learned Skill Priors

[Project Website] [Paper]

Karl Pertsch1, Youngwoon Lee1, Joseph Lim1

1CLVR Lab, University of Southern California

This is the official PyTorch implementation of the paper "Accelerating Reinforcement Learning with Learned Skill Priors" (CoRL 2020).

Updates

  • [Mar 2021]: added an improved version of SPiRL with closed-loop skill decoder (see example commands)

Requirements

  • python 3.7+
  • mujoco 2.0 (for RL experiments)
  • Ubuntu 18.04

Installation Instructions

Create a virtual environment and install all required packages.

cd spirl
pip3 install virtualenv
virtualenv -p $(which python3) ./venv
source ./venv/bin/activate

# Install dependencies and package
pip3 install -r requirements.txt
pip3 install -e .

Set the environment variables that specify the root experiment and data directories. For example:

mkdir ./experiments
mkdir ./data
export EXP_DIR=./experiments
export DATA_DIR=./data

Finally, install our fork of the D4RL benchmark repository by following its installation instructions. It will provide both, the kitchen environment as well as the training data for the skill prior model in kitchen and maze environment.

Example Commands

To train a skill prior model for the kitchen environment, run:

python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/hierarchical --val_data_size=160

Results can be visualized using tensorboard in the experiment directory: tensorboard --logdir=$EXP_DIR.

For training a SPIRL agent on the kitchen environment using the pre-trained skill prior from above, run:

python3 spirl/rl/train.py --path=spirl/configs/hrl/kitchen/spirl --seed=0 --prefix=SPIRL_kitchen_seed0

Results will be written to WandB. Before running RL, create an account and then change the WandB entity and project name at the top of rl/train.py to match your account.

In both commands, kitchen can be replaced with maze / block_stacking to run on the respective environment. Before training models on these environments, the corresponding datasets need to be downloaded (the kitchen dataset gets downloaded automatically) -- download links are provided below. Additional commands for training baseline models / agents are also provided below.

Baseline Commands

  • Train Single-step action prior:
python3 spirl/train.py --path=spirl/configs/skill_prior_learning/kitchen/flat --val_data_size=160
  • Run Vanilla SAC:
python3 spirl/rl/train.py --path=spirl/configs/rl/kitchen/SAC --seed=0 --prefix=SAC_kitchen_seed0
  • Run SAC w/ single-step action prior:
python3 spirl/rl/train.py --path=spirl/configs/rl/kitchen/prior_initialized/flat_prior/ --seed=0 --prefix=flatPrior_kitchen_seed0
  • Run BC + finetune:
python3 spirl/rl/train.py --path=spirl/configs/rl/kitchen/prior_initialized/bc_finetune/ --seed=0 --prefix=bcFinetune_kitchen_seed0
  • Run Skill Space Policy w/o prior:
python3 spirl/rl/train.py --path=spirl/configs/hrl/kitchen/no_prior/ --seed=0 --prefix=SSP_noPrior_kitchen_seed0

Again, all commands can be run on maze / block stacking by replacing kitchen with the respective environment in the paths (after downloading the datasets).

Starting to Modify the Code

Modifying the hyperparameters

The default hyperparameters are defined in the respective model files, e.g. in skill_prior_mdl.py for the SPIRL model. Modifications to these parameters can be defined through the experiment config files (passed to the respective command via the --path variable). For an example, see kitchen/hierarchical/conf.py.

Adding a new dataset for model training

All code that is dataset-specific should be placed in a corresponding subfolder in spirl/data. To add a data loader for a new dataset, the Dataset classes from data_loader.py need to be subclassed and the __getitem__ function needs to be overwritten to load a single data sample. The output dict should include the following keys:

dict({
    'states': (time, state_dim)                 # state sequence (for state-based prior inputs)
    'actions': (time, action_dim)               # action sequence (as skill input for training prior model)
    'images':  (time, channels, width, height)  # image sequence (for image-based prior inputs)
})

All datasets used with the codebase so far have been based on HDF5 files. The GlobalSplitDataset provides functionality to read all HDF5-files in a directory and split them in train/val/test based on percentages. The VideoDataset class provides many functionalities for manipulating sequences, like randomly cropping subsequences, padding etc.

Adding a new RL environment

To add a new RL environment, simply define a new environent class in spirl/rl/envs that inherits from the environment interface in spirl/rl/components/environment.py.

Modifying the skill prior model architecture

Start by defining a model class in the spirl/models directory that inherits from the BaseModel or SkillPriorMdl class. The new model needs to define the architecture in the constructor (e.g. by overwriting the build_network() function), implement the forward pass and loss functions, as well as model-specific logging functionality if desired. For an example, see spirl/models/skill_prior_mdl.py.

Note, that most basic architecture components (MLPs, CNNs, LSTMs, Flow models etc) are defined in spirl/modules and can be conveniently reused for easy architecture definitions. Below are some links to the most important classes.

Component File Description
MLP Predictor Basic N-layer fully-connected network. Defines number of inputs, outputs, layers and hidden units.
CNN-Encoder ConvEncoder Convolutional encoder, number of layers determined by input dimensionality (resolution halved per layer). Number of channels doubles per layer. Returns encoded vector + skip activations.
CNN-Decoder ConvDecoder Mirrors architecture of conv. encoder. Can take skip connections as input, also versions that copy pixels etc.
Processing-LSTM BaseProcessingLSTM Basic N-layer LSTM for processing an input sequence. Produces one output per timestep, number of layers / hidden size configurable.
Prediction-LSTM RecurrentPredictor Same as processing LSTM, but for autoregressive prediction.
Mixture-Density Network MDN MLP that outputs GMM distribution.
Normalizing Flow Model NormalizingFlowModel Implements normalizing flow model that stacks multiple flow blocks. Implementation for RealNVP block provided.

Adding a new RL algorithm

The core RL algorithms are implemented within the Agent class. For adding a new algorithm, a new file needs to be created in spirl/rl/agents and BaseAgent needs to be subclassed. In particular, any required networks (actor, critic etc) need to be constructed and the update(...) function needs to be overwritten. For an example, see the SAC implementation in SACAgent.

The main SPIRL skill prior regularized RL algorithm is implemented in ActionPriorSACAgent.

Detailed Code Structure Overview

spirl
  |- components            # reusable infrastructure for model training
  |    |- base_model.py    # basic model class that all models inherit from
  |    |- checkpointer.py  # handles storing + loading of model checkpoints
  |    |- data_loader.py   # basic dataset classes, new datasets need to inherit from here
  |    |- evaluator.py     # defines basic evaluation routines, eg top-of-N evaluation, + eval logging
  |    |- logger.py        # implements core logging functionality using tensorboardX
  |    |- params.py        # definition of command line params for model training
  |    |- trainer_base.py  # basic training utils used in main trainer file
  |
  |- configs               # all experiment configs should be placed here
  |    |- data_collect     # configs for data collection runs
  |    |- default_data_configs   # defines one default data config per dataset, e.g. state/action dim etc
  |    |- hrl              # configs for hierarchical downstream RL
  |    |- rl               # configs for non-hierarchical downstream RL
  |    |- skill_prior_learning   # configs for skill embedding and prior training (both hierarchical and flat)
  |
  |- data                  # any dataset-specific code (like data generation scripts, custom loaders etc)
  |- models                # holds all model classes that implement forward, loss, visualization
  |- modules               # reusable architecture components (like MLPs, CNNs, LSTMs, Flows etc)
  |- rl                    # all code related to RL
  |    |- agents           # implements core algorithms in agent classes, like SAC etc
  |    |- components       # reusable infrastructure for RL experiments
  |        |- agent.py     # basic agent and hierarchial agent classes - do not implement any specific RL algo
  |        |- critic.py    # basic critic implementations (eg MLP-based critic)
  |        |- environment.py    # defines environment interface, basic gym env
  |        |- normalization.py  # observation normalization classes, only optional
  |        |- params.py    # definition of command line params for RL training
  |        |- policy.py    # basic policy interface definition
  |        |- replay_buffer.py  # simple numpy-array replay buffer, uniform sampling and versions
  |        |- sampler.py   # rollout sampler for collecting experience, for flat and hierarchical agents
  |    |- envs             # all custom RL environments should be defined here
  |    |- policies         # policy implementations go here, MLP-policy and RandomAction are implemented
  |    |- utils            # utilities for RL code like MPI, WandB related code
  |    |- train.py         # main RL training script, builds all components + runs training
  |
  |- utils                 # general utilities, pytorch / visualization utilities etc
  |- train.py              # main model training script, builds all components + runs training loop and logging

The general philosophy is that each new experiment gets a new config file that captures all hyperparameters etc. so that experiments themselves are version controllable.

Datasets

Dataset Link Size
Maze https://drive.google.com/file/d/1pXM-EDCwFrfgUjxITBsR48FqW9gMoXYZ/view?usp=sharing 12GB
Block Stacking https://drive.google.com/file/d/1VobNYJQw_Uwax0kbFG7KOXTgv6ja2s1M/view?usp=sharing 11GB

You can download the datasets used for the experiments in the paper with the links above. To download the data via the command line, see example commands here.

If you want to generate more data or make other modifications to the data generating procedure, we provide instructions for regenerating the maze and block stacking datasets here.

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{pertsch2020spirl,
    title={Accelerating Reinforcement Learning with Learned Skill Priors},
    author={Karl Pertsch and Youngwoon Lee and Joseph J. Lim},
    booktitle={Conference on Robot Learning (CoRL)},
    year={2020},
}

Acknowledgements

The model architecture and training code builds on a code base which we jointly developed with Oleh Rybkin for our previous project on hierarchial prediction.

We also published many of the utils / architectural building blocks in a stand-alone package for easy import into your own research projects: check out the blox python module.

Owner
Cognitive Learning for Vision and Robotics (CLVR) lab @ USC
Learning and Reasoning for Artificial Intelligence, especially focused on perception and action. Led by Professor Joseph J. Lim @ USC
Cognitive Learning for Vision and Robotics (CLVR) lab @ USC
A simple implementation of Kalman filter in Multi Object Tracking

kalman Filter in Multi-object Tracking A simple implementation of Kalman filter in Multi Object Tracking 本实现是在https://github.com/liuchangji/kalman-fil

124 Dec 29, 2022
Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way

HackerMath for Machine Learning “Study hard what interests you the most in the most undisciplined, irreverent and original manner possible.” ― Richard

Amit Kapoor 1.4k Dec 22, 2022
A Comparative Review of Recent Kinect-Based Action Recognition Algorithms (TIP2020, Matlab codes)

A Comparative Review of Recent Kinect-Based Action Recognition Algorithms This repo contains: the HDG implementation (Matlab codes) for 'Analysis and

Lei Wang 5 Oct 22, 2022
Stochastic Extragradient: General Analysis and Improved Rates

Stochastic Extragradient: General Analysis and Improved Rates This repository is the official implementation of the paper "Stochastic Extragradient: G

Hugo Berard 4 Nov 11, 2022
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation. (CVPR 2021)

GDR-Net This repo provides the PyTorch implementation of the work: Gu Wang, Fabian Manhardt, Federico Tombari, Xiangyang Ji. GDR-Net: Geometry-Guided

169 Jan 07, 2023
Layer 7 DDoS Panel with Cloudflare Bypass ( UAM, CAPTCHA, BFM, etc.. )

Blood Deluxe DDoS DDoS Attack Panel includes CloudFlare Bypass (UAM, CAPTCHA, BFM, etc..)(It works intermittently. Working on it) Don't attack any web

272 Nov 01, 2022
Supporting code for the paper "Dangers of Bayesian Model Averaging under Covariate Shift"

Dangers of Bayesian Model Averaging under Covariate Shift This repository contains the code to reproduce the experiments in the paper Dangers of Bayes

Pavel Izmailov 25 Sep 21, 2022
A simple editor for captions in .SRT file extension

WaySRT A simple editor for captions in .SRT file extension The program doesn't use any external dependecies, just run: python way_srt.py {file_name.sr

Gustavo Lopes 3 Nov 16, 2022
[CVPR21] LightTrack: Finding Lightweight Neural Network for Object Tracking via One-Shot Architecture Search

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search The official implementation of the paper LightTra

Multimedia Research 290 Dec 24, 2022
Collision risk estimation using stochastic motion models

collision_risk_estimation Collision risk estimation using stochastic motion models. This is a new approach, based on stochastic models, to predict the

Unmesh 7 Jun 26, 2022
The source code of CVPR 2019 paper "Deep Exemplar-based Video Colorization".

Deep Exemplar-based Video Colorization (Pytorch Implementation) Paper | Pretrained Model | Youtube video 🔥 | Colab demo Deep Exemplar-based Video Col

Bo Zhang 253 Dec 27, 2022
For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training.

LongScientificFormer For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training. Some code

Athar Sefid 6 Nov 02, 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation Where we are ? 12.27 目前和原论文仍有1%左右得差距,但已经力压很多SOTA了 ckpt__448_epoch_25.pth mIoU

zichengsaber 60 Dec 11, 2022
Covid-19 Test AI (Deep Learning - NNs) Software. Accuracy is the %96.5, loss is the 0.09 :)

Covid-19 Test AI (Deep Learning - NNs) Software I developed a segmentation algorithm to understand whether Covid-19 Test Photos are positive or negati

Emirhan BULUT 28 Dec 04, 2021
[NeurIPS 2021] "Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems"

Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems Introduction Multi-agent control i

VITA 6 May 05, 2022
A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography

A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography

ICT.MIRACLE lab 75 Dec 26, 2022
Perception-aware multi-sensor fusion for 3D LiDAR semantic segmentation (ICCV 2021)

Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation (ICCV 2021) [中文|EN] 概述 本工作主要探索一种高效的多传感器(激光雷达和摄像头)融合点云语义分割方法。现有的多传感器融合方法主要将点云投影

ICE 126 Dec 30, 2022
Simulation-based inference for the Galactic Center Excess

Simulation-based inference for the Galactic Center Excess Siddharth Mishra-Sharma and Kyle Cranmer Abstract The nature of the Fermi gamma-ray Galactic

Siddharth Mishra-Sharma 3 Jan 21, 2022
Some experiments with tennis player aging curves using Hilbert space GPs in PyMC. Only experimental for now.

NOTE: This is still being developed! Setup notes This document uses Jeff Sackmann's tennis data. You can obtain it as follows: git clone https://githu

Martin Ingram 1 Jan 20, 2022
Face Mask Detection system based on computer vision and deep learning using OpenCV and Tensorflow/Keras

Face Mask Detection Face Mask Detection System built with OpenCV, Keras/TensorFlow using Deep Learning and Computer Vision concepts in order to detect

Chandrika Deb 1.4k Jan 03, 2023