An efficient framework for reinforcement learning.

Last update: Nov 30, 2022

Overview

rl: An efficient framework for reinforcement learning

Requirements
Introduction
PPO
Test

Requirements

name	version
Python	>=3.7
numpy	>=1.19
torch	>=1.7
tensorboard	>=2.5
tensorboardX	>=2.4
gym	>=0.18.3

Make sure your Python environment is activated before installing the following requirements.
pip install -U gym tensorboard tensorboardx

Introduction

Quick Start

Train CartPole-v0:
python demo.py
Train Pendulum-v0:
python demo.py --env_name Pendulum-v0 --target_reward -250.0
Use a recurrent neural network:
python demo.py --env_name Pendulum-v0 --target_reward -250.0 --use_rnn --log_dir Pendulum-v0_RNN
To monitor training, open a new terminal and start TensorBoard:
tensorboard --logdir=result
Then you can view the training information by visiting http://localhost:6006/ in your browser.

Structure

core/ Reinforcement Learning core module
- log.py logging
- ppo.py Proximal Policy Optimization algorithm
- network.py definition of actor and critic networks
env/ environment for multiprocessing
- test_env.py test environment
- vec_env.py wrapped vector environment
result/ training curves and models
demo.py demonstration

Proximal Policy Optimization

PPO is an on-policy and model-free reinforcement learning algorithm.

Components

Generalized Advantage Estimation (GAE)
Gated Recurrent Unit (GRU)
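
GAE can be sketched as follows. This is a minimal NumPy illustration of the standard estimator, not the code in core/ppo.py; the function name compute_gae and its arguments are hypothetical, and the defaults simply mirror the gamma and gae_lambda values listed in this README.

```python
import numpy as np


def compute_gae(rewards, values, last_value, dones, gamma=0.99, gae_lambda=0.95):
    """Return (advantages, returns) for one rollout of length T.

    rewards, values, dones: sequences of length T (hypothetical layout).
    last_value: value estimate of the state that follows the final step.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    advantages = np.zeros(len(rewards))
    next_value = last_value
    gae = 0.0
    # Walk the rollout backwards so each step can reuse the next step's estimate.
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        # One-step TD error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * gae_lambda * not_done * gae
        advantages[t] = gae
        next_value = values[t]

    returns = advantages + values
    return advantages, returns
```

With gae_lambda = 0 the advantage reduces to the one-step TD error; with gae_lambda = 1 it becomes the Monte Carlo return minus the value estimate.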

Hyperparameters

hyperparameter	note	value
env_num	number of parallel processes	16
chunk_len	BPTT chunk length for GRU	10
eps	clipping parameter	0.2
gamma	discount factor	0.99
gae_lambda	trade-off between TD and MC	0.95
entropy_coef	coefficient of entropy	0.05
ppo_epoch	number of epochs over each batch of collected data	5
adv_norm	normalize advantages	1 (True)
max_norm	gradient clipping (L2)	20.0
weight_decay	weight decay (L2)	1e-6
lr_actor	learning rate of actor network	1e-3
lr_critic	learning rate of critic network	1e-3
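
To show how eps and adv_norm enter the objective, here is a minimal NumPy sketch of the standard PPO clipped surrogate loss. It is an illustration under assumptions, not the repository's implementation (which presumably uses torch); the function name clipped_surrogate_loss and its arguments are hypothetical.

```python
import numpy as np


def clipped_surrogate_loss(new_logp, old_logp, advantages, eps=0.2, adv_norm=True):
    """Negative clipped surrogate objective, averaged over the batch."""
    adv = np.asarray(advantages, dtype=np.float64)
    if adv_norm:
        # Normalize advantages to zero mean and unit variance within the batch.
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)

    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = np.exp(np.asarray(new_logp, dtype=np.float64)
                   - np.asarray(old_logp, dtype=np.float64))
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    # Taking the minimum removes the incentive to push the ratio outside [1 - eps, 1 + eps].
    return -np.minimum(unclipped, clipped).mean()
```

In a full training loop this term would typically be combined with an entropy bonus weighted by entropy_coef and optimized for ppo_epoch passes over the same batch.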

Test Environment

A simple test environment for verifying that this algorithm works; you can also use it to check an algorithm you implemented yourself.
The logic is simple and takes little code.

Mechanism

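At every step the environment draws one number at random and returns it as a one-hot vector.
The action encodes a guess for the numbers drawn in the last 3 steps. If all three match, you get the full reward of 1; partial matches earn a proportional reward (1/3 or 2/3), as in the session below.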

>>> from env.test_env import TestEnv
>>> env = TestEnv()
>>> env.seed(0)
>>> env.reset()
array([1., 0., 0.], dtype=float32)
>>> env.step(9 * 0 + 3 * 0 + 1 * 0)
(array([0., 1., 0.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>> env.step(9 * 1 + 3 * 0 + 1 * 0)
(array([1., 0., 0.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>> env.step(9 * 0 + 3 * 1 + 1 * 0)
(array([0., 1., 0.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>> env.step(9 * 0 + 3 * 1 + 1 * 0)
(array([0., 1., 0.], dtype=float32), 0.0, False, {'str': 'Completely wrong.'})
>>> env.step(9 * 0 + 3 * 1 + 1 * 0)
(array([0., 0., 1.], dtype=float32), 0.6666666666666666, False, {'str': 'Partially correct.'})
>>> env.step(9 * 2 + 3 * 0 + 1 * 0)
(array([1., 0., 0.], dtype=float32), 0.3333333333333333, False, {'str': 'Partially correct.'})
>>> env.step(9 * 0 + 3 * 2 + 1 * 1)
(array([0., 0., 1.], dtype=float32), 1.0, False, {'str': 'Completely correct.'})
>>>
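
The action encoding and reward used in the session above can be sketched as follows. This is inferred from the transcript only, not taken from env/test_env.py: the helper names are hypothetical, and the sketch assumes the most recently observed number sits in the highest base-3 digit and that the reward is the fraction of matching digits.

```python
def encode_action(latest, previous, oldest, base=3):
    """Pack three guessed numbers into one action (most recent in the highest digit)."""
    return base * base * latest + base * previous + oldest


def decode_action(action, base=3):
    """Inverse of encode_action: return (latest, previous, oldest)."""
    return action // (base * base), (action // base) % base, action % base


def step_reward(action, latest, previous, oldest, base=3):
    """Fraction of the three digits that match the numbers actually observed."""
    guess = decode_action(action, base)
    target = (latest, previous, oldest)
    hits = sum(g == t for g, t in zip(guess, target))
    return hits / 3
```

For example, after observing 1, 2, 0 (oldest to newest), the fully correct action is encode_action(0, 2, 1), which is the 9 * 0 + 3 * 2 + 1 * 1 passed in the final step of the session.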

Convergence Reward

General (memoryless) RL algorithms converge to an average reward of about 55.5.
Thanks to the recurrent state, RNN-based RL algorithms can reach the goal of 100.0.
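
One plausible way to arrive at these two figures is sketched below. It rests on assumptions not stated in this README: an episode lasts 100 steps, the three numbers are drawn uniformly, the reward is the fraction of matched digits, and a memoryless agent knows only the latest number and guesses the other two at random. Start-of-episode effects are ignored.

```python
from fractions import Fraction

NUM_VALUES = 3       # numbers the environment can draw (0, 1, 2)
HISTORY_LEN = 3      # how many past numbers the action must reproduce
EPISODE_STEPS = 100  # assumption: inferred from the stated goal of 100.0


def expected_step_reward(remembered):
    """Expected per-step reward when `remembered` digits are known and the rest are guessed."""
    chance = Fraction(1, NUM_VALUES)
    expected_hits = remembered + (HISTORY_LEN - remembered) * chance
    return expected_hits / HISTORY_LEN


# A memoryless agent sees only the current observation.
memoryless_total = expected_step_reward(1) * EPISODE_STEPS
# A recurrent agent can remember all three numbers.
recurrent_total = expected_step_reward(3) * EPISODE_STEPS

print(float(memoryless_total))  # about 55.56
print(float(recurrent_total))   # 100.0
```

Under these assumptions the memoryless value is 500/9, which agrees with the figure of about 55.5 quoted above.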

2021, ICCD Lab, Dalian University of Technology. Author: Jingcheng Jiang.
