Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms applied on Continuous Control Tasks

Overview

Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms applied on Continuous Control Tasks

This is the master thesis project by Giacomo Arcieri, written at the FZI Research Center for Information Technology (Karlsruhe, Germany).

Introduction

Model-Based Reinforcement Learning (MBRL) has recently become popular as it is expected to solve RL problems with fewer trials (i.e. higher sample efficiency) than model-free methods. However, it is not clear how much of the recent MBRL progress is due to improved algorithms or due to improved models. Hence, this work compares a set of mathematical methods that are commonly used as models for MBRL. This thesis aims to provide a benchmark to assess the model influence on RL algorithms. The evaluated models will be (deterministic) Neural Networks (NNs), ensembles of (deterministic) NNs, Bayesian Neural Networks (BNNs), and Gaussian Processes (GPs). Two different and innovative BNNs are applied: the Concrete Dropout NN and the Anchored Ensembling. The model performance is assessed on a large suite of different benchmarking environments, namely one OpenAI Gym Classic Control problem (Pendulum) and seven PyBullet-Gym tasks (MuJoCo implementation). The RL algorithm the model performance is assessed on is Model Predictive Control (MPC) combined with Random Shooting (RS).

Requirements

This project is tested on Python 3.6.

First, you can perform a minimal installation of OpenAI Gym with

git clone https://github.com/openai/gym.git
cd gym
pip install -e .

Then, you can install Pybullet-Gym with

git clone https://github.com/benelot/pybullet-gym.git
cd pybullet-gym
pip install -e .

Important: Do not use python setup.py install or other Pybullet-Gym installation methods.

Finally, you can install all the dependencies with

pip install -r requirements.txt

Important: There are a couple of changes to make in two Pybullet-Gym envs:

  1. There is currently a mistake in Hopper. This project uses HopperMuJoCoEnv-v0, but this env imports the Roboschool locomotor instead of the MuJoCo locomotor. Open the file
pybullet-gym/pybulletgym/envs/mujoco/envs/locomotion/hopper_env.py

and change

from pybulletgym.envs.roboschool.robots.locomotors import Hopper

with

from pybulletgym.envs.mujoco.robots.locomotors.hopper import Hopper
  1. Ant has obs_dim=111 but only the first 27 obs are important, the others are only zeros. If it is true that these zeros do not affect performance, it is also true they slow down the training, especially for the Gaussian Process. Therefore, it is better to delete these unimportant obs. Open the file
pybullet-gym/pybulletgym/envs/mujoco/robots/locomotors/ant.py

and set obs_dim=27 and comment or delete line 25

np.clip(cfrc_ext, -1, 1).flat

Project Description

Models

The models are defined in the folder models:

  • deterministicNN.py: it includes the deterministic NN (NN) and the deterministic ensemble (ens_NNs).

  • PNN.py: here the Anchored Ensembling is defined following this example. PNN defines one NN of the Anchored Ensembling. This is needed to define ens_PNNs which is the Anchored Ensembling as well as the model applied in the evaluation.

  • ConcreteDropout.py: it defines the Concrete Dropout NN, mainly based on the Yarin Gal's notebook, but also on this other project. First, the ConcreteDropout Layer is defined. Then, the Concrete Dropout NN is designed (BNN). Finally, also an ensemble of Concrete Dropout NNs is defined (ens_BNN), but I did not use it in the model comparison (ens_BNN is extremely slow and BNN is already like an ensemble).

  • GP.py: it defines the Gaussian Process model based on gpflow. Two different versions are applied: the GPR and the SVGP (choose by setting the parameter gp_model). Only the GPR performance is reported in the evaluation because the SVGP has not even solved the Pendulum environment.

RL algorithm

The model performance is evaluated in the following files:

  1. main.py: it is defined the function main which takes all the params that are passed to MB_trainer. Five MB_trainer are initialized, each with a different seed, which are run in parallel. It is also possible to run two models in parallel by setting the param model2 as well.

  2. MB_trainer.py: it includes the initialization of the env and the model as well as the RL training loop. The function play_one_step computes one step of the loop. The model is trained with the function training_step. At the end of the loop, a pickle file is saved, wich includes all the rewards achieved by the model in all the episodes of the env.

  3. play_one_step.py: it includes all the functions to compute one step (i.e. to choose one action): the epsilon greedy policy for the exploration, the Information Gain exploration, and the exploitation of the model with MPC+RS (function get_action). The rewards as well as the RS trajectories are computed with the cost functions in cost_functions.py.

  4. training_step.py: first the relevant information is prepared by the function data_training, then the model is trained with the function training_step.

  5. cost_functions.py: it includes all the cost functions of the envs.

Other two files are contained in the folder rewards:

  • plot_rewards.ipynb: it is the notebook where the model performance is plotted. First, the 5 pickles associated with the 5 seeds are combined in only one pickle. Then, the performance is evaluated with various plots.

  • distribution.ipynb: this notebook inspects the distribution of the seeds in InvertedDoublePendulum (Section 6.9 of the thesis).

Results

Our results show significant differences among models performance do exist.

It is the Concrete Dropout NN the clear winner of the model comparison. It reported higher sample efficiency, overall performance and robustness across different seeds in Pendulum, InvertedPendulum, InvertedDoublePendulum, ReacherPyBullet, HalfCheetah, and Hopper. In Walker2D and Ant it was no worse than the others either.

Authors should be aware of the differences found and distinguish between improvements due to better algorithms or due to better models when they present novel methods.

The figures of the evaluation are reported in the folder rewards/images.

Acknowledgment

Special thanks go to the supervisor of this project David Woelfle.

Owner
Giacomo Arcieri
Giacomo Arcieri
Instance-level Image Retrieval using Reranking Transformers

Instance-level Image Retrieval using Reranking Transformers Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021. Abstract Instance-level image retriev

UVA Computer Vision 87 Jan 03, 2023
Multi Agent Reinforcement Learning for ROS in 2D Simulation Environments

IROS21 information To test the code and reproduce the experiments, follow the installation steps in Installation.md. Afterwards, follow the steps in E

11 Oct 29, 2022
Learning to Map Large-scale Sparse Graphs on Memristive Crossbar

Release of AutoGMap:Learning to Map Large-scale Sparse Graphs on Memristive Crossbar For reproduction of our searched model, the Ubuntu OS is recommen

2 Aug 23, 2022
[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable

Unlearnable Examples Code for ICLR2021 Spotlight Paper "Unlearnable Examples: Making Personal Data Unexploitable " by Hanxun Huang, Xingjun Ma, Sarah

Hanxun Huang 98 Dec 07, 2022
Implementation of paper "Towards a Unified View of Parameter-Efficient Transfer Learning"

A Unified Framework for Parameter-Efficient Transfer Learning This is the official implementation of the paper: Towards a Unified View of Parameter-Ef

Junxian He 216 Dec 29, 2022
Official PyTorch implementation of "Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning" (AAAI 2021)

Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning Official PyTorch implementation of "Proxy Synthesis: Learning with Synthetic

NAVER/LINE Vision 30 Dec 06, 2022
Using CNN to mimic the driver based on training data from Torcs

Behavioural-Cloning-in-autonomous-driving Using CNN to mimic the driver based on training data from Torcs. Approach First, the data was collected from

Sudharshan 2 Jan 05, 2022
Tutoriais publicados nas nossas redes sociais para obtenção de dados, análises simples e outras tarefas relevantes no mercado financeiro.

Tutoriais Públicos Tutoriais publicados nas nossas redes sociais para obtenção de dados, análises simples e outras tarefas relevantes no mercado finan

Trading com Dados 68 Oct 15, 2022
Fantasy Points Prediction and Dream Team Formation

Fantasy-Points-Prediction-and-Dream-Team-Formation Collected Data from open source resources that have over 100 Parameters for predicting cricket play

Akarsh Singh 2 Sep 13, 2022
MMFlow is an open source optical flow toolbox based on PyTorch

Documentation: https://mmflow.readthedocs.io/ Introduction English | 简体中文 MMFlow is an open source optical flow toolbox based on PyTorch. It is a part

OpenMMLab 688 Jan 06, 2023
A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning

Orchard Dataset This repository contains the code used for generating the Orchard Dataset, as seen in the Multi-Hierarchical Reasoning in Sequences: S

Bill Pung 1 Jun 05, 2022
This repository contains tutorials for the py4DSTEM Python package

py4DSTEM Tutorials This repository contains tutorials for the py4DSTEM Python package. For more information about py4DSTEM, including installation ins

11 Dec 23, 2022
Retrieve and analysis data from SDSS (Sloan Digital Sky Survey)

Author: Behrouz Safari License: MIT sdss A python package for retrieving and analysing data from SDSS (Sloan Digital Sky Survey) Installation Install

Behrouz 3 Oct 28, 2022
ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection This repository contains implementation of the

Visual Understanding Lab @ Samsung AI Center Moscow 190 Dec 30, 2022
Import Python modules from dicts and JSON formatted documents.

Paker Paker is module for importing Python packages/modules from dictionaries and JSON formatted documents. It was inspired by httpimporter. Important

Wojciech Wentland 1 Sep 07, 2022
Node Editor Plug for Blender

NodeEditor Blender的程序化建模插件 Show Current 基本框架:自定义的tree-node-socket、tree中的node与socket采用字典查询、基于socket入度的拓扑排序 数据传递和处理依靠Tree中的字典,socket传递字典key TODO 增加更多的节点

Cuimi 11 Dec 03, 2022
Fast and Simple Neural Vocoder, the Multiband RNNMS

Multiband RNN_MS Fast and Simple vocoder, Multiband RNN_MS. Demo Quick training How to Use System Details Results References Demo ToDO: Link super gre

tarepan 5 Jan 11, 2022
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU

Isaac ROS DNN Inference Overview This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom mode

NVIDIA Isaac ROS 62 Dec 14, 2022
A hobby project which includes a hand-gesture based virtual piano using a mobile phone camera and OpenCV library functions

Overview This is a hobby project which includes a hand-gesture controlled virtual piano using an android phone camera and some OpenCV library. My moti

Abhinav Gupta 1 Nov 19, 2021
Experiments with Fourier layers on simulation data.

Factorized Fourier Neural Operators This repository contains the code to reproduce the results in our NeurIPS 2021 ML4PS workshop paper, Factorized Fo

Alasdair Tran 57 Dec 25, 2022