
Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning

Trevor Ablett*, Bryan Chan*, Jonathan Kelly (*equal contribution)

Poster at the NeurIPS 2021 Deep Reinforcement Learning Workshop


Adversarial Imitation Learning (AIL) is a technique for learning from demonstrations that helps remedy the distribution shift problem that occurs with Behavioural Cloning. Empirically, we found that for manipulation tasks, off-policy AIL can suffer from inefficient or stagnated learning. In this work, we resolve this by enforcing exploration of a set of easy-to-define auxiliary tasks, in addition to a main task.
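As background, one common way for off-policy AIL methods such as DAC to turn a learned discriminator into a reward is r(s, a) = log D(s, a) - log(1 - D(s, a)), where D(s, a) is the discriminator's estimated probability that the pair (s, a) came from the expert. The sketch below assumes this form purely for illustration. It is not necessarily the reward used in this repository, and `ail_reward` is a hypothetical name.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a discriminator logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def ail_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Hypothetical DAC-style reward: log D(s, a) - log(1 - D(s, a)).

    d_prob is the discriminator's probability that (s, a) is expert data.
    It is clamped to [eps, 1 - eps] so that neither logarithm sees zero.
    """
    d = min(max(d_prob, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)


# Expert-like pairs (D > 0.5) earn positive reward, others negative.
for logit in (-2.0, 0.0, 2.0):
    d = sigmoid(logit)
    print(f"logit={logit:+.1f}  D={d:.3f}  reward={ail_reward(d):+.3f}")
```

With D = sigmoid(logit), this reward reduces to the discriminator logit itself, which is why it is positive for expert-like pairs and negative otherwise.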

This repository contains the source code for reproducing our results.

Setup

We recommend that readers set up a virtual environment (e.g., virtualenv, conda, or pyenv). Please also use Python 3.7, as we have not tested any other Python versions. In the following, we assume the working directory is the directory containing this README:

.
├── lfgp_data/
├── liegroups/
├── manipulator-learning/
├── rl_sandbox/
├── README.md
└── requirements.txt

To install, simply clone and install with pip, which will automatically install all dependencies:

git clone git@github.com:utiasSTARS/lfgp.git && cd lfgp
pip install rl_sandbox/
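Since only Python 3.7 has been tested, a quick sanity check like the following can be run inside the activated environment before installing. This is a convenience sketch; `is_supported_python` is a hypothetical helper, not part of the repository.

```python
import sys


def is_supported_python(version_info=None) -> bool:
    """Return True if the interpreter is Python 3.7, the only tested version.

    version_info defaults to sys.version_info; pass a tuple to check other values.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == (3, 7)


if not is_supported_python():
    print(
        "Warning: lfgp has only been tested with Python 3.7; "
        f"found {sys.version_info.major}.{sys.version_info.minor}."
    )
```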

Environments

In this paper, we evaluated our method in the four environments listed below:

bring_0                  # bring blue block to blue zone
stack_0                  # stack blue block onto green block
insert_0                 # insert blue block into blue zone slot
unstack_stack_env_only_0 # remove green block from blue block, and stack blue block onto green block
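These names appear to be the values passed to `--main_task` in the commands below (e.g. `--main_task=stack_0`). As a small sketch, a wrapper script could check a name against this list before launching a long run. `ENVIRONMENTS` and `validate_main_task` are hypothetical names introduced here; only the environment names and descriptions come from the list above.

```python
# Hypothetical lookup table: names and descriptions copied from the list above.
ENVIRONMENTS = {
    "bring_0": "bring blue block to blue zone",
    "stack_0": "stack blue block onto green block",
    "insert_0": "insert blue block into blue zone slot",
    "unstack_stack_env_only_0": (
        "remove green block from blue block, and stack blue block onto green block"
    ),
}


def validate_main_task(name: str) -> str:
    """Return name unchanged if it is one of the evaluated environments."""
    if name not in ENVIRONMENTS:
        valid = ", ".join(sorted(ENVIRONMENTS))
        raise ValueError(f"Unknown main task {name!r}; expected one of: {valid}")
    return name


print(validate_main_task("stack_0"), "->", ENVIRONMENTS["stack_0"])
```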

Trained Models and Expert Data

The expert and trained LfGP models can be found at this Google Drive link. The zip file is 570 MB. All of our generated expert data is included, but we only include a single seed of each trained model to reduce the size.

The Data Directory

This subsection describes the directory structure assumed in the remainder of this README. The unzipped lfgp_data directory has the following structure:

.
├── lfgp_data/
│   ├── expert_data/
│   │   ├── unstack_stack_env_only_0-expert_data/
│   │   │   ├── reset/
│   │   │   │   ├── 54000_steps/
│   │   │   │   └── 9000_steps/
│   │   │   └── play/
│   │   │       └── 9000_steps/
│   │   ├── stack_0-expert_data/
│   │   │   └── (same as unstack_stack_env_only_0-expert_data)/
│   │   ├── insert_0-expert_data/
│   │   │   └── (same as unstack_stack_env_only_0-expert_data)/
│   │   └── bring_0-expert_data/
│   │       └── (same as unstack_stack_env_only_0-expert_data)/
│   └── trained_models/
│       ├── experts/
│       │   ├── unstack_stack_env_only_0/
│       │   ├── stack_0/
│       │   ├── insert_0/
│       │   └── bring_0/
│       ├── unstack_stack_env_only_0/
│       │   ├── multitask_bc/
│       │   ├── lfgp_ns/
│       │   ├── lfgp/
│       │   ├── dac/
│       │   ├── bc_less_data/
│       │   └── bc/
│       ├── stack_0/
│       │   └── (same as unstack_stack_env_only_0)
│       ├── insert_0/
│       │   └── (same as unstack_stack_env_only_0)
│       └── bring_0/
│           └── (same as unstack_stack_env_only_0)
├── liegroups/
├── manipulator-learning/
├── rl_sandbox/
├── README.md
└── requirements.txt
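For scripts that load the downloaded data, a small path helper can mirror this layout. This is a sketch that assumes lfgp_data/ sits in the repository root as shown; the function names are hypothetical, and the helpers only build paths without checking that they exist. Per the tree, play-based data is provided only for 9000 steps, while reset-based data is provided for both 54000 and 9000 steps.

```python
from pathlib import Path

# Hypothetical helpers mirroring the lfgp_data/ layout shown above.
DATA_ROOT = Path("lfgp_data")
TRAINED_VARIANTS = ("multitask_bc", "lfgp_ns", "lfgp", "dac", "bc_less_data", "bc")


def expert_data_dir(env: str, mode: str, steps: int, root: Path = DATA_ROOT) -> Path:
    """Expert data directory, e.g. .../stack_0-expert_data/reset/9000_steps."""
    if mode not in ("reset", "play"):
        raise ValueError("mode must be 'reset' or 'play'")
    return root / "expert_data" / f"{env}-expert_data" / mode / f"{steps}_steps"


def trained_model_dir(env: str, variant: str, root: Path = DATA_ROOT) -> Path:
    """Trained imitation-learning model directory, e.g. .../stack_0/lfgp."""
    if variant not in TRAINED_VARIANTS:
        raise ValueError(f"variant must be one of {TRAINED_VARIANTS}")
    return root / "trained_models" / env / variant


def expert_model_dir(env: str, root: Path = DATA_ROOT) -> Path:
    """Directory of the trained expert for an environment."""
    return root / "trained_models" / "experts" / env


print(expert_data_dir("stack_0", "reset", 9000))
print(trained_model_dir("stack_0", "lfgp"))
```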

Create Expert and Generate Expert Demonstrations

Readers can generate their own experts and expert demonstrations by executing the scripts in the rl_sandbox/rl_sandbox/examples/lfgp/experts directory: create_expert.py trains the expert, and create_expert_data.py generates the expert demonstrations. Note that training an expert is time-consuming and may take several days.

To create an expert, you can run the following command:

# Create a stack expert using SAC-X with seed 0. --gpu_buffer stores the replay buffer on the GPU.
# Run with --help to see all available options.
python rl_sandbox/rl_sandbox/examples/lfgp/experts/create_expert.py \
    --seed=0 \
    --main_task=stack_0 \
    --device=cuda \
    --gpu_buffer

A results directory will be created, containing TensorBoard logs, the experiment settings, a training progress file, model checkpoints, and a buffer checkpoint.

To generate play-based and reset-based expert data using a trained model, you can run the following commands:

# Generate play-based stack expert data with seed 1. The program halts as soon as either --num_episodes or --num_steps is reached.
# Run with --help to see all available options.
python rl_sandbox/rl_sandbox/examples/lfgp/experts/create_expert_data.py \
--model_path=data/stack_0/expert/state_dict.pt \
--config_path=data/stack_0/expert/sacx_experiment_setting.pkl \
--save_path=./test_expert_data \
--num_episodes=10 \
--num_steps=1000 \
--seed=1 \
--render

# Generate reset-based stack expert data with seed 1. Note that --num_episodes must be scaled by the number of tasks (i.e., num_episodes * num_tasks).
python rl_sandbox/rl_sandbox/examples/lfgp/experts/create_expert_data.py \
--model_path=data/stack_0/expert/state_dict.pt \
--config_path=data/stack_0/expert/sacx_experiment_setting.pkl \
--save_path=./test_expert_data \
--num_episodes=10 \
--num_steps=1000 \
--seed=1 \
--render \
--reset_between_intentions

The generated expert data will be stored under --save_path, in separate files int_0.gz, ..., int_{num_tasks - 1}.gz.
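The file naming and the reset-based episode scaling described above can be captured in two small helpers. This is a sketch: the function names are hypothetical, and the choice of 6 tasks is an assumption based on the multitask BC and LfGP commands below, which read int_0.gz through int_5.gz.

```python
# Hypothetical helpers based on the notes above: output files are named
# int_0.gz ... int_{num_tasks - 1}.gz, and reset-based collection needs
# --num_episodes scaled by the number of tasks.


def expert_data_files(save_path: str, num_tasks: int) -> list:
    """Expected per-intention files written under --save_path."""
    return [f"{save_path.rstrip('/')}/int_{i}.gz" for i in range(num_tasks)]


def reset_num_episodes(episodes_per_task: int, num_tasks: int) -> int:
    """Value to pass as --num_episodes for reset-based data."""
    return episodes_per_task * num_tasks


num_tasks = 6  # assumption: int_0.gz ... int_5.gz, as used by the training commands
print(reset_num_episodes(10, num_tasks))  # 60
print(expert_data_files("./test_expert_data", num_tasks)[2])  # ./test_expert_data/int_2.gz
```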

Training the Models with Imitation Learning

In the following, we assume that the expert data was generated as described in the previous section and is stored under test_expert_data. The training scripts run_*.py are located in the rl_sandbox/rl_sandbox/examples/lfgp directory. There are five run scripts, each corresponding to one of the compared methods (BC with less data has no separate script, since it differs only in the expert data used). The runs will be saved in the same results directory mentioned previously. The default hyperparameters specified in the scripts are listed in the appendix of the paper.
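The five scripts and the expert-data flag each one takes (as used in the commands in this section) can be summarized in a lookup table. This is a hypothetical sketch, not part of the repository; it assumes BC with less data reuses run_bc.py with a smaller dataset, as the paragraph above implies.

```python
# Hypothetical lookup table summarizing the training scripts in this section.
# "expert_path" scripts take a single file; "expert_paths" scripts take a
# comma-separated list with one file per intention.
SCRIPT_DIR = "rl_sandbox/rl_sandbox/examples/lfgp"

METHODS = {
    "bc": ("run_bc.py", "expert_path"),
    "multitask_bc": ("run_multitask_bc.py", "expert_paths"),
    "dac": ("run_dac.py", "expert_path"),
    "lfgp": ("run_lfgp.py", "expert_paths"),
    "lfgp_ns": ("run_lfgp_ns.py", "expert_paths"),
}


def _resolve(method: str) -> str:
    # BC with less data reuses the single-task BC script with a smaller dataset.
    return "bc" if method == "bc_less_data" else method


def script_for(method: str) -> str:
    """Path of the training script for a method (raises KeyError if unknown)."""
    script, _ = METHODS[_resolve(method)]
    return f"{SCRIPT_DIR}/{script}"


def expert_flag(method: str) -> str:
    """Command-line flag the script uses for its expert data."""
    _, flag = METHODS[_resolve(method)]
    return f"--{flag}"


for name in ("bc", "lfgp"):
    print(script_for(name), expert_flag(name))
```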

Behavioural Cloning (BC)

There are two scripts for single-task and multitask BC: run_bc.py and run_multitask_bc.py. You can run the following commands:

# Train a single-task BC agent to stack using reset-based data.
# NOTE: intention 2 is the main intention (i.e., the stack intention). The main intention has index 2 in all environments.
python rl_sandbox/rl_sandbox/examples/lfgp/run_bc.py \
--seed=0 \
--expert_path=test_expert_data/int_2.gz \
--main_task=stack_0 \
--render \
--device=cuda

# Train a multitask BC agent to stack using reset-based data.
python rl_sandbox/rl_sandbox/examples/lfgp/run_multitask_bc.py \
--seed=0 \
--expert_paths=test_expert_data/int_0.gz,\
test_expert_data/int_1.gz,\
test_expert_data/int_2.gz,\
test_expert_data/int_3.gz,\
test_expert_data/int_4.gz,\
test_expert_data/int_5.gz \
--main_task=stack_0 \
--render \
--device=cuda
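The multi-line --expert_paths argument only works if every continued line ends in a backslash; a missing one silently splits the command in two. As an alternative sketch, the argument list can be built in Python instead. `multitask_command` is a hypothetical helper; the flag names are copied from the commands in this section, and the 6-intention example mirrors the int_0.gz ... int_5.gz files used above.

```python
import shlex

# Hypothetical helper: builds the argument list for run_multitask_bc.py,
# run_lfgp.py or run_lfgp_ns.py without relying on shell line continuations.


def multitask_command(script, main_task, num_tasks, expert_dir="test_expert_data",
                      seed=0, device="cuda", render=False):
    # One file per intention, joined with commas and no spaces.
    expert_paths = ",".join(f"{expert_dir}/int_{i}.gz" for i in range(num_tasks))
    cmd = [
        "python", script,
        f"--seed={seed}",
        f"--expert_paths={expert_paths}",
        f"--main_task={main_task}",
        f"--device={device}",
    ]
    if render:
        cmd.append("--render")
    return cmd


cmd = multitask_command(
    "rl_sandbox/rl_sandbox/examples/lfgp/run_multitask_bc.py", "stack_0", num_tasks=6
)
# Print an equivalent single-line shell command (shlex.quote works on Python 3.7).
print(" ".join(shlex.quote(arg) for arg in cmd))
# To launch the run from Python instead: subprocess.run(cmd, check=True)
```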

Adversarial Imitation Learning (AIL)

There are three scripts for Discriminator-Actor-Critic (DAC), Learning from Guided Play (LfGP), and LfGP-NS (No Schedule): run_dac.py, run_lfgp.py, and run_lfgp_ns.py, respectively. You can run the following commands:

# Train a DAC agent to stack using reset-based data.
python rl_sandbox/rl_sandbox/examples/lfgp/run_dac.py \
--seed=0 \
--expert_path=test_expert_data/int_2.gz \
--main_task=stack_0 \
--render \
--device=cuda

# Train an LfGP agent to stack using reset-based data.
python rl_sandbox/rl_sandbox/examples/lfgp/run_lfgp.py \
--seed=0 \
--expert_paths=test_expert_data/int_0.gz,\
test_expert_data/int_1.gz,\
test_expert_data/int_2.gz,\
test_expert_data/int_3.gz,\
test_expert_data/int_4.gz,\
test_expert_data/int_5.gz \
--main_task=stack_0 \
--device=cuda \
--render

# Train an LfGP-NS agent to stack using reset-based data.
python rl_sandbox/rl_sandbox/examples/lfgp/run_lfgp_ns.py \
--seed=0 \
--expert_paths=test_expert_data/int_0.gz,\
test_expert_data/int_1.gz,\
test_expert_data/int_2.gz,\
test_expert_data/int_3.gz,\
test_expert_data/int_4.gz,\
test_expert_data/int_5.gz \
--main_task=stack_0 \
--device=cuda \
--render

Evaluating the Models

Readers can load trained agents and evaluate them with the evaluate.py script in the rl_sandbox/rl_sandbox/examples/eval_tools directory. Currently, only the LfGP agent is supplied, due to the space restrictions mentioned above.

# For single-task agents - DAC, BC
# To run single-task agent (e.g. BC)
python rl_sandbox/rl_sandbox/examples/eval_tools/evaluate.py \
--seed=1 \
--model_path=data/stack_0/il_agents/bc/state_dict.pt \
--config_path=data/stack_0/il_agents/bc/bc_experiment_setting.pkl \
--num_episodes=5 \
--intention=0 \
--render \
--device=cuda

# For multitask agents - SAC-X, LfGP, LfGP-NS, Multitask BC
# To run all intentions for multitask agents (e.g. SAC-X)
python rl_sandbox/rl_sandbox/examples/eval_tools/evaluate.py \
--seed=1 \
--model_path=data/stack_0/expert/state_dict.pt \
--config_path=data/stack_0/expert/sacx_experiment_setting.pkl \
--num_episodes=5 \
--intention=-1 \
--render \
--device=cuda

# To run only the main intention for multitask agents (e.g. LfGP)
python rl_sandbox/rl_sandbox/examples/eval_tools/evaluate.py \
--seed=1 \
--model_path=data/stack_0/il_agents/lfgp/state_dict.pt \
--config_path=data/stack_0/il_agents/lfgp/lfgp_experiment_setting.pkl \
--num_episodes=5 \
--intention=2 \
--render \
--device=cuda
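The --intention values in these commands follow a simple pattern: 0 for single-task agents, -1 to run every intention of a multitask agent, and 2 for the main intention of a multitask agent. The hypothetical helper below just encodes that pattern; the agent-type names are labels chosen for this sketch, not identifiers from the repository.

```python
# Hypothetical helper encoding the --intention values used in the commands above.
MAIN_INTENTION = 2  # the main intention has index 2 in all environments
SINGLE_TASK = {"bc", "dac"}
MULTITASK = {"sacx", "lfgp", "lfgp_ns", "multitask_bc"}


def intention_flag(agent: str, all_intentions: bool = False) -> str:
    """Return the --intention argument for evaluate.py."""
    if agent in SINGLE_TASK:
        return "--intention=0"
    if agent in MULTITASK:
        return "--intention=-1" if all_intentions else f"--intention={MAIN_INTENTION}"
    raise ValueError(f"unknown agent type {agent!r}")


print(intention_flag("bc"), intention_flag("lfgp"), intention_flag("sacx", True))
```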

Owner
STARS Laboratory
We are the Space and Terrestrial Autonomous Robotic Systems Laboratory at the University of Toronto