[ICML 2020] Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control

Overview

PG-MORL

This repository contains the implementation for the paper Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control (ICML 2020).

In this paper, we propose an evolutionary learning algorithm to compute a high-quality and dense Pareto solutions for multi-objective continuous robot control problems. We also design seven multi-objective continuous control benchmark problems based on Mujoco, which are also included in this repository. This repository also contains the code for the baseline algorithms in the paper.

teaser

Installation

Prerequisites

  • Operating System: tested on Ubuntu 16.04 and Ubuntu 18.04.
  • Python Version: >= 3.7.4.
  • PyTorch Version: >= 1.3.0.
  • MuJoCo : install mujoco and mujoco-py of version 2.0 by following the instructions in mujoco-py.

Install Dependencies

You can either install the dependencies in a conda virtual env (recomended) or manually.

For conda virtual env installation, simply create a virtual env named pgmorl by:

conda env create -f environment.yml

If you prefer to install all the dependencies by yourself, you could open environment.yml in editor to see which packages need to be installed by pip.

Run the Code

The training related code are in the folder morl. We provide the scripts in scrips folder to run our algorithm/baseline algorithms on each problem described in the paper, and also provide several visualization scripts in scripts/plot folder for you to visualize the computed Pareto policies and the training process.

Precomputed Pareto Results

While you can run the training code the compute the Pareto policies from scratch by following the training steps below, we also provide the precomputed Pareto results for each problem. You can download them for each problem separately in this google drive link and directly visualize them with the visualization instructions to play with the results. After downloading the precomputed results, you can unzip it, create a results folder under the project root directory, and put the downloaded file inside.

Benchmark Problems

We design seven multi-objective continuous control benchmark problems based on Mujoco simulation, including Walker2d-v2, HalfCheetah-v2, Hopper-v2, Ant-v2, Swimmer-v2, Humanoid-v2, and Hopper-v3. A suffix of -v3 indicates a three-objective problem. The reward (i.e. objective) functions in each problem are designed to have similar scales. All environments code can be found in environments/mujoco folder. To avoid conflicting to the original mujoco environment names, we add a MO- prefix to the name of each environment. For example, the environment name for Walker2d-v2 is MO-Walker2d-v2.

Train

The main entrance of the training code is at morl/run.py. We provide a training script in scripts folder for each problem for you to easily start with. You can just follow the following steps to see how to run the training for each problem by each algorithm (our algorithm and baseline algorithms).

  • Enter the project folder

    cd PGMORL
    
  • Activate the conda env:

    conda activate pgmorl
    
  • To run our algorithm on Walker2d-v2 for a single run:

    python scripts/walker2d-v2.py --pgmorl --num-seeds 1 --num-processes 1
    

    You can also set other flags as arguments to run the baseline algorithms (e.g. --ra, --moead, --pfa, --random). Please refer to the python scripts for more details about the arguments.

  • By default, the results are stored in results/[problem name]/[algorithm name]/[seed idx].

Visualization

  • We provide a script to visualize the computed/downloaded Pareto results.

    python scripts/plot/ep_obj_visualize_2d.py --env MO-Walker2d-v2 --log-dir ./results/Walker2d-v2/pgmorl/0/
    

    You can replace MO-Walker2d-v2 to your problem name, and replace the ./results/Walker2d-v2/pgmorl/0 by the path to your stored results.

    It will show a plot of the computed Pareto policies in the performance space. By double-click the point in the plot, it will automatically open a new window and render the simulation for the selected policy.

  • We also provide a script to help you visualize the evolution process of the policy population.

    python scripts/plot/training_visualize_2d.py --env MO-Walker2d-v2 --log-dir ./results/Walker2d-v2/pgmorl/0/
    

    It will plot the policy population (gray points) in each generation with some other useful information. The black points are the policies on the Pareto front, the green circles are the selected policies to be optimized in next generation, the red points are the predicted offsprings and the green points are the real offsprings. You can interact with the plot with the keyboard. For example, be pressing left/right, you can evolve the policy population by generation. You can refer to the plot scripts for the full description of the allowable operations.

Reproducibility

We run all our experiments on VM instances with 96 Intel Skylake vCPUs and 86.4G memory on Google Cloud Platform without GPU.

Acknowledgement

We use the implementation of pytorch-a2c-ppo-acktr-gail as the underlying PPO implementation and modify it into our Multi-Objective Policy Gradient algorithm.

Citation

If you find our paper or code is useful, please consider citing:

@inproceedings{xu2020prediction,
  title={Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control},
  author={Xu, Jie and Tian, Yunsheng and Ma, Pingchuan and Rus, Daniela and Sueda, Shinjiro and Matusik, Wojciech},
  booktitle={Proceedings of the 37th International Conference on Machine Learning},
  year={2020}
}
Explaining in Style: Training a GAN to explain a classifier in StyleSpace

Explaining in Style: Official TensorFlow Colab Explaining in Style: Training a GAN to explain a classifier in StyleSpace Oran Lang, Yossi Gandelsman,

Google 197 Nov 08, 2022
[LREC] MMChat: Multi-Modal Chat Dataset on Social Media

MMChat This repo contains the code and data for the LREC2022 paper MMChat: Multi-Modal Chat Dataset on Social Media. Dataset MMChat is a large-scale d

Silver 47 Jan 03, 2023
Official Code Release for "CLIP-Adapter: Better Vision-Language Models with Feature Adapters"

Official Code Release for "CLIP-Adapter: Better Vision-Language Models with Feature Adapters" Pipeline of CLIP-Adapter CLIP-Adapter is a drop-in modul

peng gao 157 Dec 26, 2022
Türkiye Canlı Mobese Görüntülerinde Profesyonel Nesne Takip Sistemi

Türkiye Mobese Görüntü Takip Türkiye Mobese görüntülerinde OPENCV ve Yolo ile takip sistemi Multiple Object Tracking System in Turkish Mobese with OPE

15 Dec 22, 2022
A copy of Ares that costs 30 fucking dollars.

Finalement, j'ai décidé d'abandonner cette idée, je me suis comporté comme un enfant qui été en colère. Comme m'ont dit certaines personnes j'ai des c

Bleu 24 Apr 14, 2022
Pytorch implementation of Straight Sampling Network For Point Cloud Learning (ICIP2021).

Pytorch code for SS-Net This is a pytorch implementation of Straight Sampling Network For Point Cloud Learning (ICIP2021). Environment Code is tested

Sun Ran 1 May 18, 2022
A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

OutliersSlidingWindows A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows" Dataset generatio

PaoloPellizzoni 0 Jan 05, 2022
PyTorch implementation of "PatchGame: Learning to Signal Mid-level Patches in Referential Games" to appear in NeurIPS 2021

PatchGame: Learning to Signal Mid-level Patches in Referential Games This repository is the official implementation of the paper - "PatchGame: Learnin

Kamal Gupta 22 Mar 16, 2022
Boundary-aware Transformers for Skin Lesion Segmentation

Boundary-aware Transformers for Skin Lesion Segmentation Introduction This is an official release of the paper Boundary-aware Transformers for Skin Le

Jiacheng Wang 79 Dec 16, 2022
Code for CVPR 2021 oral paper "Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts"

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts The rapid progress in 3D scene understanding has come with growing dem

Facebook Research 182 Dec 30, 2022
The source code for 'Noisy-Labeled NER with Confidence Estimation' accepted by NAACL 2021

Kun Liu*, Yao Fu*, Chuanqi Tan, Mosha Chen, Ningyu Zhang, Songfang Huang, Sheng Gao. Noisy-Labeled NER with Confidence Estimation. NAACL 2021. [arxiv]

30 Nov 12, 2022
Arquitetura e Desenho de Software.

S203 Este é um repositório dedicado às aulas de Arquitetura e Desenho de Software, cuja sigla é "S203". E agora, José? Como não tenho muito a falar aq

Fabio 7 Oct 23, 2021
PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data.

Anti-Backdoor Learning PyTorch Code for NeurIPS 2021 paper Anti-Backdoor Learning: Training Clean Models on Poisoned Data. Check the unlearning effect

Yige-Li 51 Dec 07, 2022
This is RFA-Toolbox, a simple and easy-to-use library that allows you to optimize your neural network architectures using receptive field analysis (RFA) and create graph visualizations of your architecture.

ReceptiveFieldAnalysisToolbox This is RFA-Toolbox, a simple and easy-to-use library that allows you to optimize your neural network architectures usin

84 Nov 23, 2022
Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥

TensorLayer is a novel TensorFlow-based deep learning and reinforcement learning library designed for researchers and engineers. It provides an extens

TensorLayer Community 7.1k Dec 27, 2022
利用python脚本实现微信、支付宝账单的合并,并保存到excel文件实现自动记账,可查看可视化图表。

KeepAccounts_v2.0 KeepAccounts.exe和其配套表格能够实现微信、支付宝官方导出账单的读取合并,为每笔帐标记类型,并按月份和类型生成可视化图表。再也不用消费一笔记一笔,每月仅需10分钟,记好所有的帐。 作者: MickLife Bilibili: https://spac

159 Jan 01, 2023
A fast and easy to use, moddable, Python based Minecraft server!

PyMine PyMine - The fastest, easiest to use, Python-based Minecraft Server! Features Note: This list is not always up to date, and doesn't contain all

PyMine 144 Dec 30, 2022
Learn the Deep Learning for Computer Vision in three steps: theory from base to SotA, code in PyTorch, and space-repetition with Anki

DeepCourse: Deep Learning for Computer Vision arthurdouillard.com/deepcourse/ This is a course I'm giving to the French engineering school EPITA each

Arthur Douillard 113 Nov 29, 2022
Developing your First ML Workflow of the AWS Machine Learning Engineer Nanodegree Program

Exercises and project documentation for the 3. Developing your First ML Workflow of the AWS Machine Learning Engineer Nanodegree Program

Simona Mircheva 1 Jan 13, 2022
Numba-accelerated Pythonic implementation of MPDATA with examples in Python, Julia and Matlab

PyMPDATA PyMPDATA is a high-performance Numba-accelerated Pythonic implementation of the MPDATA algorithm of Smolarkiewicz et al. used in geophysical

Atmospheric Cloud Simulation Group @ Jagiellonian University 15 Nov 23, 2022