A clean and robust PyTorch implementation of PPO for continuous action spaces.

Last update: Dec 16, 2022

PPO-Continuous-Pytorch

I found that existing implementations of PPO for continuous action spaces are either somewhat complicated or unstable.
This is a clean and robust PyTorch implementation of PPO for continuous action spaces. Here is the result:

All experiments were trained with the same hyperparameters.

Dependencies

gym==0.18.3
box2d==2.3.10
numpy==1.21.2
pytorch==1.8.1

How to use my code

Play with trained model

Run 'python main.py --write False --render True --Loadmodel True --ModelIdex 400'.
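
The command above passes booleans as the strings 'True' and 'False'. With argparse, a plain `type=bool` would treat the string 'False' as true, so flags like these usually need an explicit converter. The following is a minimal sketch of one way to parse them, not the actual contents of main.py: the helper `str2bool`, the function `build_parser`, and every default value except the Pendulum-v0 environment index are assumptions made for illustration.

```python
import argparse


def str2bool(value):
    """Convert strings such as 'True' / 'False' into real booleans (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "yes", "y", "1"):
        return True
    if lowered in ("false", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser for the flags named in this README; defaults are assumed."""
    parser = argparse.ArgumentParser(description="Sketch of the main.py flags")
    parser.add_argument("--write", type=str2bool, default=True)      # assumed default
    parser.add_argument("--render", type=str2bool, default=False)    # assumed default
    parser.add_argument("--Loadmodel", type=str2bool, default=False) # assumed default
    parser.add_argument("--ModelIdex", type=int, default=0)          # assumed default
    parser.add_argument("--EnvIdex", type=int, default=3)            # 3 = Pendulum-v0
    return parser


# Parse the same arguments as the "Play with trained model" command.
args = build_parser().parse_args(
    ["--write", "False", "--render", "True", "--Loadmodel", "True", "--ModelIdex", "400"]
)
```

Here `args.write` is a real `False` rather than the non-empty string 'False', which is what makes a check such as `if args.write:` behave as intended.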

Train from scratch

Run 'python main.py', where the default environment is Pendulum-v0.

Change Environment

If you want to train on a different environment, run 'python main.py --EnvIdex 0'.
--EnvIdex can be set to an integer from 0 to 5, where:
'--EnvIdex 0' for 'BipedalWalker-v3'
'--EnvIdex 1' for 'BipedalWalkerHardcore-v3'
'--EnvIdex 2' for 'LunarLanderContinuous-v2'
'--EnvIdex 3' for 'Pendulum-v0'
'--EnvIdex 4' for 'Humanoid-v2'
'--EnvIdex 5' for 'HalfCheetah-v2'
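
The index-to-environment mapping above can be sketched as a simple lookup. This is only an illustration of the table, not the code in main.py; the names `ENV_NAMES` and `env_name_from_index` are hypothetical.

```python
# Environment names in the order listed above; the list position is the --EnvIdex value.
ENV_NAMES = [
    "BipedalWalker-v3",
    "BipedalWalkerHardcore-v3",
    "LunarLanderContinuous-v2",
    "Pendulum-v0",
    "Humanoid-v2",
    "HalfCheetah-v2",
]


def env_name_from_index(env_index):
    """Return the environment name for an --EnvIdex value (hypothetical helper)."""
    if not 0 <= env_index < len(ENV_NAMES):
        raise ValueError(
            f"EnvIdex must be between 0 and {len(ENV_NAMES) - 1}, got {env_index}"
        )
    return ENV_NAMES[env_index]
```

The returned string is what would typically be handed to `gym.make(...)`; validating the index first gives a clearer error than an out-of-range list access.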

Visualize the training curve

You can use TensorBoard to visualize the training curve. Historical training curves are saved in '\runs'.
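
As a rough sketch, TensorBoard can be pointed at that directory with its standard `--logdir` option. The helper names `tensorboard_command` and `launch_tensorboard` below are hypothetical, and the sketch assumes the logs live in a folder named 'runs' relative to where the command is started.

```python
import shutil
import subprocess


def tensorboard_command(logdir="runs", port=None):
    """Build the TensorBoard command line for a log directory (hypothetical helper)."""
    command = ["tensorboard", "--logdir", logdir]
    if port is not None:
        command += ["--port", str(port)]
    return command


def launch_tensorboard(logdir="runs", port=None):
    """Start TensorBoard as a background process if the executable is installed."""
    if shutil.which("tensorboard") is None:
        raise RuntimeError("tensorboard executable not found on PATH")
    return subprocess.Popen(tensorboard_command(logdir, port))
```

Calling `launch_tensorboard()` from the repository root is equivalent to typing `tensorboard --logdir runs` in a terminal; TensorBoard then prints the local address to open in a browser.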

Hyperparameter Setting

For more details on the hyperparameter settings, see 'main.py'.

Owner

XinJingHao
