Reinforcement learning framework and algorithms implemented in PyTorch.

Related tags

Deep Learningrlkit
Overview

RLkit

Reinforcement learning framework and algorithms implemented in PyTorch.

Implemented algorithms:

To get started, checkout the example scripts, linked above.

What's New

Version 0.2

04/25/2019

  • Use new multiworld code that requires explicit environment registration.
  • Make installation easier by adding setup.py and using default conf.py.

04/16/2019

  • Log how many train steps were called
  • Log env_info and agent_info.

04/05/2019-04/15/2019

  • Add rendering
  • Fix SAC bug to account for future entropy (#41, #43)
  • Add online algorithm mode (#42)

04/05/2019

The initial release for 0.2 has the following major changes:

  • Remove Serializable class and use default pickle scheme.
  • Remove PyTorchModule class and use native torch.nn.Module directly.
  • Switch to batch-style training rather than online training.
    • Makes code more amenable to parallelization.
    • Implementing the online-version is straightforward.
  • Refactor training code to be its own object, rather than being integrated inside of RLAlgorithm.
  • Refactor sampling code to be its own object, rather than being integrated inside of RLAlgorithm.
  • Implement Skew-Fit: State-Covering Self-Supervised Reinforcement Learning, a method for performing goal-directed exploration to maximize the entropy of visited states.
  • Update soft actor-critic to more closely match TensorFlow implementation:
    • Rename TwinSAC to just SAC.
    • Only have Q networks.
    • Remove unnecessary policy regualization terms.
    • Use numerically stable Jacobian computation.

Overall, the refactors are intended to make the code more modular and readable than the previous versions.

Version 0.1

12/04/2018

  • Add RIG implementation

12/03/2018

  • Add HER implementation
  • Add doodad support

10/16/2018

  • Upgraded to PyTorch v0.4
  • Added Twin Soft Actor Critic Implementation
  • Various small refactor (e.g. logger, evaluate code)

Installation

  1. Install and use the included Ananconda environment
$ conda env create -f environment/[linux-cpu|linux-gpu|mac]-env.yml
$ source activate rlkit
(rlkit) $ python examples/ddpg.py

Choose the appropriate .yml file for your system. These Anaconda environments use MuJoCo 1.5 and gym 0.10.5. You'll need to get your own MuJoCo key if you want to use MuJoCo.

  1. Add this repo directory to your PYTHONPATH environment variable or simply run:
pip install -e .
  1. (Optional) Copy conf.py to conf_private.py and edit to override defaults:
cp rlkit/launchers/conf.py rlkit/launchers/conf_private.py
  1. (Optional) If you plan on running the Skew-Fit experiments or the HER example with the Sawyer environment, then you need to install multiworld.

DISCLAIMER: the mac environment has only been tested without a GPU.

For an even more portable solution, try using the docker image provided in environment/docker. The Anaconda env should be enough, but this docker image addresses some of the rendering issues that may arise when using MuJoCo 1.5 and GPUs. The docker image supports GPU, but it should work without a GPU. To use a GPU with the image, you need to have nvidia-docker installed.

Using a GPU

You can use a GPU by calling

import rlkit.torch.pytorch_util as ptu
ptu.set_gpu_mode(True)

before launching the scripts.

If you are using doodad (see below), simply use the use_gpu flag:

run_experiment(..., use_gpu=True)

Visualizing a policy and seeing results

During training, the results will be saved to a file called under

LOCAL_LOG_DIR/
   
    /
    

    
   
  • LOCAL_LOG_DIR is the directory set by rlkit.launchers.config.LOCAL_LOG_DIR. Default name is 'output'.
  • is given either to setup_logger.
  • is auto-generated and based off of exp_prefix.
  • inside this folder, you should see a file called params.pkl. To visualize a policy, run
(rlkit) $ python scripts/run_policy.py LOCAL_LOG_DIR/
   
    /
    
     /params.pkl

    
   

or

(rlkit) $ python scripts/run_goal_conditioned_policy.py LOCAL_LOG_DIR/
   
    /
    
     /params.pkl

    
   

depending on whether or not the policy is goal-conditioned.

If you have rllab installed, you can also visualize the results using rllab's viskit, described at the bottom of this page

tl;dr run

python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/

to visualize all experiments with a prefix of exp_prefix. To only visualize a single run, you can do

python rllab/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/<folder name>

Alternatively, if you don't want to clone all of rllab, a repository containing only viskit can be found here. You can similarly visualize results with.

python viskit/viskit/frontend.py LOCAL_LOG_DIR/<exp_prefix>/

This viskit repo also has a few extra nice features, like plotting multiple Y-axis values at once, figure-splitting on multiple keys, and being able to filter hyperparametrs out.

Visualizing a goal-conditioned policy

To visualize a goal-conditioned policy, run

(rlkit) $ python scripts/run_goal_conditioned_policy.py
LOCAL_LOG_DIR/
   
    /
    
     /params.pkl

    
   

Launching jobs with doodad

The run_experiment function makes it easy to run Python code on Amazon Web Services (AWS) or Google Cloud Platform (GCP) by using this fork of doodad.

It's as easy as:

from rlkit.launchers.launcher_util import run_experiment

def function_to_run(variant):
    learning_rate = variant['learning_rate']
    ...

run_experiment(
    function_to_run,
    exp_prefix="my-experiment-name",
    mode='ec2',  # or 'gcp'
    variant={'learning_rate': 1e-3},
)

You will need to set up parameters in config.py (see step one of Installation). This requires some knowledge of AWS and/or GCP, which is beyond the scope of this README. To learn more, more about doodad, go to the repository, which is based on this original repository.

Requests for pull-requests

  • Implement policy-gradient algorithms.
  • Implement model-based algorithms.

Legacy Code (v0.1.2)

For Temporal Difference Models (TDMs) and the original implementation of Reinforcement Learning with Imagined Goals (RIG), run git checkout tags/v0.1.2.

References

The algorithms are based on the following papers

Offline Meta-Reinforcement Learning with Online Self-Supervision Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, Sergey Levine. arXiv preprint, 2021.

Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. Vitchyr H. Pong*, Murtaza Dalal*, Steven Lin*, Ashvin Nair, Shikhar Bahl, Sergey Levine. ICML, 2020.

Visual Reinforcement Learning with Imagined Goals. Ashvin Nair*, Vitchyr Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine. NeurIPS 2018.

Temporal Difference Models: Model-Free Deep RL for Model-Based Control. Vitchyr Pong*, Shixiang Gu*, Murtaza Dalal, Sergey Levine. ICLR 2018.

Hindsight Experience Replay. Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. NeurIPS 2017.

Deep Reinforcement Learning with Double Q-learning. Hado van Hasselt, Arthur Guez, David Silver. AAAI 2016.

Human-level control through deep reinforcement learning. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Nature 2015.

Soft Actor-Critic Algorithms and Applications. Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. arXiv preprint, 2018.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. ICML, 2018.

Addressing Function Approximation Error in Actor-Critic Methods Scott Fujimoto, Herke van Hoof, David Meger. ICML, 2018.

Credits

This repository was initially developed primarily by Vitchyr Pong, until July 2021, at which point it was transferred to the RAIL Berkeley organization and is primarily maintained by Ashvin Nair. Other major collaborators and contributions:

A lot of the coding infrastructure is based on rllab. The serialization and logger code are basically a carbon copy of the rllab versions.

The Dockerfile is based on the OpenAI mujoco-py Dockerfile.

The SMAC code builds off of the PEARL code, which built off of an older RLKit version.

Owner
Robotic AI & Learning Lab Berkeley
Robotic AI & Learning Lab Berkeley
Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"

Adversarial Neuron Pruning Purifies Backdoored Deep Models Code for NeurIPS 2021 "Adversarial Neuron Pruning Purifies Backdoored Deep Models" by Dongx

Dongxian Wu 31 Dec 11, 2022
Realtime_Multi-Person_Pose_Estimation

Introduction Multi Person PoseEstimation By PyTorch Results Require Pytorch Installation git submodule init && git submodule update Demo Download conv

tensorboy 1.3k Jan 05, 2023
A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2020 Links Doc

Sebastian Raschka 4.2k Jan 02, 2023
Discovering Dynamic Salient Regions with Spatio-Temporal Graph Neural Networks

Discovering Dynamic Salient Regions with Spatio-Temporal Graph Neural Networks This is the official code for DyReg model inroduced in Discovering Dyna

Bitdefender Machine Learning 11 Nov 08, 2022
Bravia core script for python

Bravia-Core-Script You need to have a mandatory account If this L3 does not work, try another L3. enjoy

5 Dec 26, 2021
Code and real data for the paper "Counterfactual Temporal Point Processes", available at arXiv.

counterfactual-tpp This is a repository containing code and real data for the paper Counterfactual Temporal Point Processes. Pre-requisites This code

Networks Learning 11 Dec 09, 2022
Using pretrained language models for biomedical knowledge graph completion.

LMs for biomedical KG completion This repository contains code to run the experiments described in: Scientific Language Models for Biomedical Knowledg

Rahul Nadkarni 41 Nov 30, 2022
This repository is an implementation of our NeurIPS 2021 paper (Stylized Dialogue Generation with Multi-Pass Dual Learning) in PyTorch.

MPDL---TODO This repository is an implementation of our NeurIPS 2021 paper (Stylized Dialogue Generation with Multi-Pass Dual Learning) in PyTorch. Ci

CodebaseLi 3 Nov 27, 2022
A PyTorch implementation of "Graph Classification Using Structural Attention" (KDD 2018).

GAM ⠀⠀ A PyTorch implementation of Graph Classification Using Structural Attention (KDD 2018). Abstract Graph classification is a problem with practic

Benedek Rozemberczki 259 Dec 05, 2022
Can we do Customers Segmentation using PHP and Unsupervized Machine Learning ? Yes we can ! 🤡

Customers Segmentation using PHP and Rubix ML PHP Library Can we do Customers Segmentation using PHP and Unsupervized Machine Learning ? Yes we can !

Mickaël Andrieu 11 Oct 08, 2022
Code for Boundary-Aware Segmentation Network for Mobile and Web Applications

BASNet Boundary-Aware Segmentation Network for Mobile and Web Applications This repository contain implementation of BASNet in tensorflow/keras. comme

Hamid Ali 8 Nov 24, 2022
A minimal TPU compatible Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

NeRF Minimal Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Result of Tiny-NeRF RGB Depth

Soumik Rakshit 11 Jul 24, 2022
A highly efficient and modular implementation of Gaussian Processes in PyTorch

GPyTorch GPyTorch is a Gaussian process library implemented using PyTorch. GPyTorch is designed for creating scalable, flexible, and modular Gaussian

3k Jan 02, 2023
AirCode: A Robust Object Encoding Method

AirCode This repo contains source codes for the arXiv preprint "AirCode: A Robust Object Encoding Method" Demo Object matching comparison when the obj

Chen Wang 30 Dec 09, 2022
This is a code repository for paper OODformer: Out-Of-Distribution Detection Transformer

OODformer: Out-Of-Distribution Detection Transformer This repo is the official the implementation of the OODformer: Out-Of-Distribution Detection Tran

34 Dec 02, 2022
Pytorch port of Google Research's LEAF Audio paper

leaf-audio-pytorch Pytorch port of Google Research's LEAF Audio paper published at ICLR 2021. This port is not completely finished, but the Leaf() fro

Dennis Fedorishin 80 Oct 31, 2022
Code for paper "Multi-level Disentanglement Graph Neural Network"

Multi-level Disentanglement Graph Neural Network (MD-GNN) This is a PyTorch implementation of the MD-GNN, and the code includes the following modules:

Lirong Wu 6 Dec 29, 2022
Deep functional residue identification

DeepFRI Deep functional residue identification Citing @article {Gligorijevic2019, author = {Gligorijevic, Vladimir and Renfrew, P. Douglas and Koscio

Flatiron Institute 156 Dec 25, 2022
Reverse engineering Rosetta 2 in M1 Mac

Project Champollion About this project Rosetta 2 is an emulation mechanism to run the x86_64 applications on Arm-based Apple Silicon with Ahead-Of-Tim

FFRI Security, Inc. 258 Jan 07, 2023
ppo_pytorch_cpp - an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022