PyTorch implementations of deep reinforcement learning algorithms and environments

Overview

Deep Reinforcement Learning Algorithms with PyTorch

Travis CI contributions welcome

RL PyTorch

This repository contains PyTorch implementations of deep reinforcement learning algorithms and environments.

(To help you remember things you learn about machine learning in general write them in Save All and try out the public deck there about Fast AI's machine learning textbook.)

Algorithms Implemented

  1. Deep Q Learning (DQN) (Mnih et al. 2013)
  2. DQN with Fixed Q Targets (Mnih et al. 2013)
  3. Double DQN (DDQN) (Hado van Hasselt et al. 2015)
  4. DDQN with Prioritised Experience Replay (Schaul et al. 2016)
  5. Dueling DDQN (Wang et al. 2016)
  6. REINFORCE (Williams et al. 1992)
  7. Deep Deterministic Policy Gradients (DDPG) (Lillicrap et al. 2016 )
  8. Twin Delayed Deep Deterministic Policy Gradients (TD3) (Fujimoto et al. 2018)
  9. Soft Actor-Critic (SAC) (Haarnoja et al. 2018)
  10. Soft Actor-Critic for Discrete Actions (SAC-Discrete) (Christodoulou 2019)
  11. Asynchronous Advantage Actor Critic (A3C) (Mnih et al. 2016)
  12. Syncrhonous Advantage Actor Critic (A2C)
  13. Proximal Policy Optimisation (PPO) (Schulman et al. 2017)
  14. DQN with Hindsight Experience Replay (DQN-HER) (Andrychowicz et al. 2018)
  15. DDPG with Hindsight Experience Replay (DDPG-HER) (Andrychowicz et al. 2018 )
  16. Hierarchical-DQN (h-DQN) (Kulkarni et al. 2016)
  17. Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) (Florensa et al. 2017)
  18. Diversity Is All You Need (DIAYN) (Eyensbach et al. 2018)

All implementations are able to quickly solve Cart Pole (discrete actions), Mountain Car Continuous (continuous actions), Bit Flipping (discrete actions with dynamic goals) or Fetch Reach (continuous actions with dynamic goals). I plan to add more hierarchical RL algorithms soon.

Environments Implemented

  1. Bit Flipping Game (as described in Andrychowicz et al. 2018)
  2. Four Rooms Game (as described in Sutton et al. 1998)
  3. Long Corridor Game (as described in Kulkarni et al. 2016)
  4. Ant-{Maze, Push, Fall} (as desribed in Nachum et al. 2018 and their accompanying code)

Results

1. Cart Pole and Mountain Car

Below shows various RL algorithms successfully learning discrete action game Cart Pole or continuous action game Mountain Car. The mean result from running the algorithms with 3 random seeds is shown with the shaded area representing plus and minus 1 standard deviation. Hyperparameters used can be found in files results/Cart_Pole.py and results/Mountain_Car.py.

Cart Pole and Mountain Car Results

2. Hindsight Experience Replay (HER) Experiements

Below shows the performance of DQN and DDPG with and without Hindsight Experience Replay (HER) in the Bit Flipping (14 bits) and Fetch Reach environments described in the papers Hindsight Experience Replay 2018 and Multi-Goal Reinforcement Learning 2018. The results replicate the results found in the papers and show how adding HER can allow an agent to solve problems that it otherwise would not be able to solve at all. Note that the same hyperparameters were used within each pair of agents and so the only difference between them was whether hindsight was used or not.

HER Experiment Results

3. Hierarchical Reinforcement Learning Experiments

The results on the left below show the performance of DQN and the algorithm hierarchical-DQN from Kulkarni et al. 2016 on the Long Corridor environment also explained in Kulkarni et al. 2016. The environment requires the agent to go to the end of a corridor before coming back in order to receive a larger reward. This delayed gratification and the aliasing of states makes it a somewhat impossible game for DQN to learn but if we introduce a meta-controller (as in h-DQN) which directs a lower-level controller how to behave we are able to make more progress. This aligns with the results found in the paper.

The results on the right show the performance of DDQN and algorithm Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL) from Florensa et al. 2017. DDQN is used as the comparison because the implementation of SSN-HRL uses 2 DDQN algorithms within it. Note that the first 300 episodes of training for SNN-HRL were used for pre-training which is why there is no reward for those episodes.

Long Corridor and Four Rooms

Usage

The repository's high-level structure is:

├── agents                    
    ├── actor_critic_agents   
    ├── DQN_agents         
    ├── policy_gradient_agents
    └── stochastic_policy_search_agents 
├── environments   
├── results             
    └── data_and_graphs        
├── tests
├── utilities             
    └── data structures            

i) To watch the agents learn the above games

To watch all the different agents learn Cart Pole follow these steps:

git clone https://github.com/p-christ/Deep_RL_Implementations.git
cd Deep_RL_Implementations

conda create --name myenvname
y
conda activate myenvname

pip3 install -r requirements.txt

python results/Cart_Pole.py

For other games change the last line to one of the other files in the Results folder.

ii) To train the agents on another game

Most Open AI gym environments should work. All you would need to do is change the config.environment field (look at Results/Cart_Pole.py for an example of this).

You can also play with your own custom game if you create a separate class that inherits from gym.Env. See Environments/Four_Rooms_Environment.py for an example of a custom environment and then see the script Results/Four_Rooms.py to see how to have agents play the environment.

Owner
Petros Christodoulou
Petros Christodoulou
A hifiasm fork for metagenome assembly using Hifi reads.

hifiasm_meta - de novo metagenome assembler, based on hifiasm, a haplotype-resolved de novo assembler for PacBio Hifi reads.

44 Jul 10, 2022
Official repo for our 3DV 2021 paper "Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements".

Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy Paper. Pr

Yu Rong 41 Dec 13, 2022
TensorFlow (Python API) implementation of Neural Style

neural-style-tf This is a TensorFlow implementation of several techniques described in the papers: Image Style Transfer Using Convolutional Neural Net

Cameron 3.1k Jan 02, 2023
Reproduction of Vision Transformer in Tensorflow2. Train from scratch and Finetune.

Vision Transformer(ViT) in Tensorflow2 Tensorflow2 implementation of the Vision Transformer(ViT). This repository is for An image is worth 16x16 words

sungjun lee 42 Dec 27, 2022
Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai

Coursera-deep-learning-specialization - Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai: (i) Neural Networks an

Aman Chadha 1.7k Jan 08, 2023
A note taker for NVDA. Allows the user to create, edit, view, manage and export notes to different formats.

Quick Notetaker add-on for NVDA The Quick Notetaker add-on is a wonderful tool which allows writing notes quickly and easily anytime and from any app

5 Dec 06, 2022
Visualizing lattice vibration information from phonon dispersion to atoms (For GPUMD)

Phonon-Vibration-Viewer (For GPUMD) Visualizing lattice vibration information from phonon dispersion for primitive atoms. In this tutorial, we will in

Liangting 6 Dec 10, 2022
Official PyTorch implementation of paper: Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation (ICCV 2021 Oral Presentation)

SML (ICCV 2021, Oral) : Official Pytorch Implementation This repository provides the official PyTorch implementation of the following paper: Standardi

SangHun 61 Dec 27, 2022
An algorithm study of the 6th iOS 10 set of Boost Camp Web Mobile

알고리즘 스터디 🔥 부스트캠프 웹모바일 6기 iOS 10조의 알고리즘 스터디 입니다. 개인적인 사정 등으로 S034, S055만 참가하였습니다. 스터디 목적 상진: 코테 합격 + 부캠끝나고 아침에 일어나기 위해 필요한 사이클 기완: 꾸준하게 자리에 앉아 공부하기 +

2 Jan 11, 2022
LogAvgExp - Pytorch Implementation of LogAvgExp

LogAvgExp - Pytorch Implementation of LogAvgExp for Pytorch Install $ pip instal

Phil Wang 31 Oct 14, 2022
A Loss Function for Generative Neural Networks Based on Watson’s Perceptual Model

This repository contains the similarity metrics designed and evaluated in the paper, and instructions and code to re-run the experiments. Implementation in the deep-learning framework PyTorch

Steffen 86 Dec 27, 2022
Adaptive Attention Span for Reinforcement Learning

Adaptive Transformers in RL Official implementation of Adaptive Transformers in RL In this work we replicate several results from Stabilizing Transfor

100 Nov 15, 2022
NeuPy is a Tensorflow based python library for prototyping and building neural networks

NeuPy v0.8.2 NeuPy is a python library for prototyping and building neural networks. NeuPy uses Tensorflow as a computational backend for deep learnin

Yurii Shevchuk 729 Jan 03, 2023
UFT - Universal File Transfer With Python

UFT 2.0.0 UFT (Universal File Transfer) is a CLI tool , which can be used to upl

Merwin 1 Feb 18, 2022
I-BERT: Integer-only BERT Quantization

I-BERT: Integer-only BERT Quantization HuggingFace Implementation I-BERT is also available in the master branch of HuggingFace! Visit the following li

Sehoon Kim 139 Dec 27, 2022
ICON: Implicit Clothed humans Obtained from Normals

ICON: Implicit Clothed humans Obtained from Normals arXiv, December 2021. Yuliang Xiu · Jinlong Yang · Dimitrios Tzionas · Michael J. Black Table of C

Yuliang Xiu 1.1k Dec 30, 2022
Image Captioning using CNN and Transformers

Image-Captioning Keras/Tensorflow Image Captioning application using CNN and Transformer as encoder/decoder. In particulary, the architecture consists

24 Dec 28, 2022
A new version of the CIDACS-RL linkage tool suitable to a cluster computing environment.

Fully Distributed CIDACS-RL The CIDACS-RL is a brazillian record linkage tool suitable to integrate large amount of data with high accuracy. However,

Robespierre Pita 5 Nov 04, 2022
A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

MADGRAD Optimization Method A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization pip install madgrad Try it out! A best

Meta Research 774 Dec 31, 2022
StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation

StyleGAN2 with adaptive discriminator augmentation (ADA) — Official TensorFlow implementation Training Generative Adversarial Networks with Limited Da

NVIDIA Research Projects 1.7k Dec 29, 2022