Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

Last update: Dec 31, 2022

Overview

PyTorch RL Minimal Implementations

This repository implements several reinforcement learning algorithms with the following characteristics:

Few dependencies: Only PyTorch and Gym are required, for building neural networks and testing the algorithms' performance, respectively.
Independent implementations: Each RL algorithm is implemented in its own file, which makes it easier to follow its process and to adapt it to other tasks.
Configurable extensions: Parameters and tools such as reward normalization, advantage normalization, TensorBoard, and tqdm are easy to configure.
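
As an illustration of the reward normalization mentioned above, here is a minimal sketch in plain Python. It assumes rewards are standardized with running statistics (Welford's algorithm); the README does not say how the repository normalizes rewards, and the class name `RunningRewardNormalizer` is hypothetical.

```python
import math


class RunningRewardNormalizer:
    """Hypothetical helper: standardize rewards with running mean and std."""

    def __init__(self, eps: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, reward: float) -> None:
        # Welford's online update of the mean and squared deviations.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; fall back to 1.0 with too few samples.
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, reward: float) -> float:
        return (reward - self.mean) / (self.std + self.eps)


normalizer = RunningRewardNormalizer()
for r in [1.0, 2.0, 3.0, 4.0]:
    normalizer.update(r)
print(normalizer.mean, normalizer.std, normalizer.normalize(2.5))
```

Running statistics avoid storing every past reward, which is why this form is common in on-policy training loops.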

RL Algorithms List

Name	Type	Estimator	Paper	File
Q-Learning	Value-based / Off policy	TD	Watkins et al. Q-Learning. Machine Learning, 1992.	q_learning.py
REINFORCE	Policy-based / On policy	MC	Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In NeurIPS, 2000.	reinforce.py
DQN	Value-based / Off policy	TD	Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015.	in progress
A2C	Actor-Critic / On policy	n-step TD	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.	a2c.py
A3C	Actor-Critic / On policy	n-step TD	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.	a3c.py
ACER	Actor-Critic / Off policy	GAE	Wang et al. Sample Efficient Actor-Critic with Experience Replay. In ICLR, 2017.	in progress
ACKTR	Actor-Critic / On policy	GAE	Wu et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In NeurIPS, 2017.	in progress
PPO	Actor-Critic / On policy	GAE	Schulman et al. Proximal Policy Optimization Algorithms. arXiv, 2017.	ppo.py
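
The Estimator column distinguishes Monte Carlo (MC), TD, n-step TD, and GAE targets. The sketch below shows how these four estimators differ, using plain Python lists so it runs without PyTorch. The function names are hypothetical, and the rollout is assumed to contain no episode termination; the repository's own files may organize this differently.

```python
from typing import List


def mc_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo returns: G_t = r_t + gamma * G_{t+1}, with G after the end = 0."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def td_target(reward: float, next_value: float, gamma: float, done: bool = False) -> float:
    """One-step TD target: r + gamma * V(s'), with no bootstrap at a terminal state."""
    return reward + (0.0 if done else gamma * next_value)


def n_step_targets(rewards: List[float], bootstrap_value: float, gamma: float) -> List[float]:
    """n-step TD targets for a rollout, bootstrapping from the value after the last step."""
    targets, g = [], bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
        targets.append(g)
    return targets[::-1]


def gae_advantages(rewards: List[float], values: List[float], bootstrap_value: float,
                   gamma: float, lam: float) -> List[float]:
    """GAE: A_t = delta_t + gamma * lam * A_{t+1},
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    advantages, adv, next_value = [], 0.0, bootstrap_value
    for r, v in zip(reversed(rewards), reversed(values)):
        delta = r + gamma * next_value - v
        adv = delta + gamma * lam * adv
        advantages.append(adv)
        next_value = v
    return advantages[::-1]


rewards, values = [1.0, 1.0, 1.0], [0.5, 0.5, 0.5]
print(mc_returns(rewards, gamma=0.5))
print(n_step_targets(rewards, bootstrap_value=2.0, gamma=0.5))
print(gae_advantages(rewards, values, bootstrap_value=2.0, gamma=0.5, lam=1.0))
```

With `lam=1.0`, GAE reduces to the n-step target minus the value estimate; with `lam=0.0`, it reduces to the one-step TD error.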

Quick Start

Requirements

pytorch
gym

tensorboard  # for summary writer
tqdm         # for progress bar
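
To check which of these packages are available before running a script, a small sketch like the following may help. It assumes the import names `torch`, `gym`, `tensorboard`, and `tqdm`, and it treats the last two as optional, which is a guess based on the comments above; `missing_packages` is a hypothetical helper, not part of the repository.

```python
from importlib.util import find_spec

REQUIRED = ("torch", "gym")          # PyTorch is imported as `torch`
OPTIONAL = ("tensorboard", "tqdm")   # assumed optional: logging and progress bar


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_packages(REQUIRED))
    print("missing optional:", missing_packages(OPTIONAL))
```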

Abstract Agent

Components / Parameters

Component	Description
policy	neural network model
gamma	discount factor of cumulative reward
lr	learning rate, e.g. `lr_actor`, `lr_critic`
lr_decay	decay rate used to schedule the learning rate
lr_scheduler	scheduler for the learning rate
coef_critic_loss	coefficient of critic loss
coef_entropy_loss	coefficient of entropy loss
writer	summary writer to record information
buffer	replay buffer to store historical trajectories
use_cuda	use the GPU
clip_grad	enable gradient clipping
max_grad_norm	maximum gradient norm used when clipping
norm_advantage	enable advantage normalization
open_tb	enable the summary writer
open_tqdm	enable the progress bar
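
The following sketch illustrates what `norm_advantage`, `max_grad_norm`, `coef_critic_loss`, and `coef_entropy_loss` typically control. It is written in plain Python under stated assumptions: advantages are standardized to zero mean and unit variance, gradients are rescaled by their global L2 norm, and the losses are combined with the usual actor-critic convention. The function names are hypothetical, and the repository's actual formulas may differ. In PyTorch, `torch.nn.utils.clip_grad_norm_` performs a comparable in-place clipping on parameter gradients.

```python
import math
from typing import List


def normalize_advantages(advantages: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize advantages to zero mean and (approximately) unit variance."""
    mean = sum(advantages) / len(advantages)
    var = sum((a - mean) ** 2 for a in advantages) / len(advantages)
    std = math.sqrt(var)
    return [(a - mean) / (std + eps) for a in advantages]


def clip_by_global_norm(grads: List[float], max_grad_norm: float) -> List[float]:
    """Rescale gradients so that their L2 norm does not exceed max_grad_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_grad_norm:
        return list(grads)
    scale = max_grad_norm / total_norm
    return [g * scale for g in grads]


def total_loss(actor_loss: float, critic_loss: float, entropy: float,
               coef_critic_loss: float, coef_entropy_loss: float) -> float:
    """Assumed convention: the entropy bonus is subtracted to encourage exploration."""
    return actor_loss + coef_critic_loss * critic_loss - coef_entropy_loss * entropy


print(normalize_advantages([1.0, 2.0, 3.0]))
print(clip_by_global_norm([3.0, 4.0], max_grad_norm=1.0))
print(total_loss(1.0, 2.0, 0.5, coef_critic_loss=0.5, coef_entropy_loss=0.01))
```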

Methods

Method	Description
preprocess_obs()	preprocess the observation before it is input into the neural network
select_action()	use the actor network to select an action based on the policy distribution
estimate_obs()	use the critic network to estimate the value of an observation
update()	update the parameters by computing losses and gradients
train()	set the neural network to training mode
eval()	set the neural network to evaluation mode
save()	save the model parameters
load()	load the model parameters
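
To show how these methods fit together, here is a minimal sketch of an agent interface with the same method names. The signatures are assumptions, since the README lists only names, and `save()` and `load()` are omitted for brevity. To keep the example free of PyTorch, the concrete subclass swaps the actor and critic networks for a tabular Q-learning table; `AbstractAgent` and `TabularQAgent` are hypothetical names.

```python
import random
from abc import ABC, abstractmethod


class AbstractAgent(ABC):
    """Hypothetical skeleton mirroring the method names in the table above."""

    def __init__(self, gamma: float = 0.99):
        self.gamma = gamma
        self.training = True

    def preprocess_obs(self, obs):
        return obs  # identity by default; subclasses may convert or scale

    @abstractmethod
    def select_action(self, obs): ...

    @abstractmethod
    def estimate_obs(self, obs): ...

    @abstractmethod
    def update(self, transition): ...

    def train(self):
        self.training = True

    def eval(self):
        self.training = False


class TabularQAgent(AbstractAgent):
    """Q-table stand-in for the neural networks (an assumption for illustration)."""

    def __init__(self, n_actions, gamma=0.99, lr=0.1, epsilon=0.1, seed=0):
        super().__init__(gamma)
        self.n_actions, self.lr, self.epsilon = n_actions, lr, epsilon
        self.q = {}
        self.rng = random.Random(seed)

    def _row(self, obs):
        return self.q.setdefault(obs, [0.0] * self.n_actions)

    def select_action(self, obs):
        row = self._row(self.preprocess_obs(obs))
        # Explore only in training mode; act greedily otherwise.
        if self.training and self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=row.__getitem__)

    def estimate_obs(self, obs):
        return max(self._row(self.preprocess_obs(obs)))

    def update(self, transition):
        obs, action, reward, next_obs, done = transition
        target = reward + (0.0 if done else self.gamma * self.estimate_obs(next_obs))
        row = self._row(obs)
        row[action] += self.lr * (target - row[action])


agent = TabularQAgent(n_actions=2, gamma=0.9, lr=0.5)
agent.update((0, 1, 1.0, 1, True))  # (obs, action, reward, next_obs, done)
agent.eval()
print(agent.select_action(0), agent.estimate_obs(0))
```

Keeping `train()` and `eval()` on the agent lets the same `select_action()` call switch between exploratory and greedy behavior, much as PyTorch modules switch modes.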

Update & To-do & Limitations

Update History

2021-12-09 ADD TRICK: norm_critic_loss in PPO
2021-12-09 ADD PARAM: coef_critic_loss, coef_entropy_loss, log_step
2021-12-07 ADD ALGO: A3C
2021-12-05 ADD ALGO: PPO
2021-11-28 ADD ALGO: A2C
2021-11-20 ADD ALGO: Q-Learning, REINFORCE

Owner

Gemini Light
