[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

Last update: Nov 21, 2022

Related tags

Overview

[ICLR 2021] RAPID: A Simple Approach for Exploration in Reinforcement Learning

This is the Tensorflow implementation of ICLR 2021 paper Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments. We propose a simple method RAPID for exploration through scroring the previous episodes and reproducing the good exploration behaviors with imitation learning.

The implementation is based on OpenAI baselines. For all the experiments, add the option --disable_rapid to see the baseline result. RAPID can achieve better performance and sample efficiency than state-of-the-art exploration methods on MiniGrid environments.

Cite This Work

@inproceedings{
zha2021rank,
title={Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments},
author={Daochen Zha and Wenye Ma and Lei Yuan and Xia Hu and Ji Liu},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=MtEE0CktZht}
}

Installation

Please make sure that you have Python 3.5+ installed. First, clone the repo with

git clone https://github.com/daochenzha/rapid.git
cd rapid

Then install the dependencies with pip:

pip install -r requirements.txt
pip install -e .

To run MuJoCo experiments, you need to have the MuJoCo license. Install mujoco-py with

pip install mujoco-py==1.50.1.68

How to run the code

The entry is main.py. Some important hyperparameters are as follows.

--env: what environment to be used
--num_timesteps: the number of timesteps to be run
--w0: the weight of extrinsic reward score
--w1: the weight of local score
--w2: the weight of global score
--sl_until: do the RAPID update until which timestep
--disable_rapid: use it to compare with PPO baseline
--log_dir: the directory to save logs

Reproducing the result of MiniGrid environments

For MiniGrid-KeyCorridorS3R2, run

python main.py --env MiniGrid-KeyCorridorS3R2-v0 --sl_until 1200000

For MiniGrid-KeyCorridorS3R3, run

python main.py --env MiniGrid-KeyCorridorS3R3-v0 --sl_until 3000000

For other environments, run

python main.py --env $ENV

where $ENV is the environment name.

Run MiniWorld Maze environment

Clone the latest master branch of MiniWorld and install it

git clone -b master --single-branch --depth=1 https://github.com/maximecb/gym-miniworld.git
cd gym-miniwolrd
pip install -e .
cd ..

Start training with

python main.py --env MiniWorld-MazeS5-v0 --num_timesteps 5000000 --nsteps 512 --w1 0.00001 --w2 0.0 --log_dir results/MiniWorld-MazeS5-v0

For server without screens, you may install xvfb with

apt-get install xvfb

Then start training with

xvfb-run -a -s "-screen 0 1024x768x24 -ac +extension GLX +render -noreset" python main.py --env MiniWorld-MazeS5-v0 --num_timesteps 5000000 --nsteps 512 --w1 0.00001 --w2 0.0 --log_dir results/MiniWorld-MazeS5-v0

Run MuJoCo experiments

Run

python main.py --seed 0 --env $env --num_timesteps 5000000 --lr 5e-4 --w1 0.001 --w2 0.0 --log_dir logs/$ENV/rapid

where $ENV can be EpisodeSwimmer-v2, EpisodeHopper-v2, EpisodeWalker2d-v2, EpisodeInvertedPendulum-v2, DensityEpisodeSwimmer-v2, or ViscosityEpisodeSwimmer-v2.

[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

Related tags

Overview

[ICLR 2021] RAPID: A Simple Approach for Exploration in Reinforcement Learning

Cite This Work

Installation

How to run the code

Reproducing the result of MiniGrid environments

Run MiniWorld Maze environment

Run MuJoCo experiments

Owner

Daochen Zha

A novel benchmark dataset for Monocular Layout prediction

An all-in-one application to visualize multiple different local path planning algorithms

A modular, research-friendly framework for high-performance and inference of sequence models at many scales

A framework for Quantification written in Python

A Pytorch implementation of "Manifold Matching via Deep Metric Learning for Generative Modeling" (ICCV 2021)

Memoized coduals - Shows that it is possible to implement reverse mode autodiff using a variation on the dual numbers called the codual numbers

Jaxtorch (a jax nn library)

Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation (ACM MM 2020)

Code for the CVPR2022 paper "Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity"

E2e music remastering system - End-to-end Music Remastering System Using Self-supervised and Adversarial Training

Block-wisely Supervised Neural Architecture Search with Knowledge Distillation (CVPR 2020)

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

Code for Massive-scale Decoding for Text Generation using Lattices

MonoRCNN is a monocular 3D object detection method for automonous driving

pytorch, hand(object) detect ,yolo v5，手检测

Do Smart Glasses Dream of Sentimental Visions? Deep Emotionship Analysis for Eyewear Devices

Official implementation of SynthTIGER (Synthetic Text Image GEneratoR) ICDAR 2021

A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation