RL agent to play μRTS with Stable-Baselines3

Overview

Gym-μRTS with Stable-Baselines3/PyTorch

This repo contains an attempt to reproduce Gridnet PPO with invalid action masking algorithm to play μRTS using Stable-Baselines3 library. Apart from reproducibility, this might open access to a diverse set of well tested algorithms, and toolings for training, evaluations, and more.

Original paper: Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games.

Original code: gym-microrts-paper.

demo.gif

Install

Prerequisites:

  • Python 3.7+
  • Java 8.0+
  • FFmpeg (for video capturing)
git clone https://github.com/kachayev/gym-microrts-paper-sb3
cd gym-microrts-paper-sb3
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Note that I use newer version of gym-microrts compared to the one that was originally used for the paper.

Training

To traing an agent:

$ python ppo_gridnet_diverse_encode_decode_sb3.py

If everything is setup correctly, you'll see typicall SB3 verbose logging:

Using cpu device
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 2e+03    |
|    ep_rew_mean     | 0.0      |
| time/              |          |
|    fps             | 179      |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 2048     |
---------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.72e+03     |
|    ep_rew_mean          | -5.0         |
| time/                   |              |
|    fps                  | 55           |
|    iterations           | 2            |
|    time_elapsed         | 74           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 0.0056759235 |
|    clip_fraction        | 0.0861       |
|    clip_range           | 0.2          |
|    entropy_loss         | -5.65        |
|    explained_variance   | 0.412        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.024       |
|    n_updates            | 10           |
|    policy_gradient_loss | -0.00451     |
|    value_loss           | 0.00413      |
------------------------------------------

As soon as correctness of the implementation is verified, I will provide details on how to use RL Baselines3 Zoo for training and evaluations.

Implementational Caveats

A few notes / pain points regarding the implementation of the alrogithms, and the process of integrating it with stable-baselines3:

  • Gym does not ship a space for "array of multidiscrete" use case (let's be honest, it's not very common). But it gives an option for defining your space when necessary. A new space, when defined, is not easy to integrate into SB3. In a few different places SB3 raises NotImplementedError facing unknown space (example 1, example 2).
  • Seems like switching to fully rolled out MutliDiscrete space definition has a significant performance penalty. Still investigating if this can be improved.
  • Invalid masking is implemented by passing masks into observations from the wrapper (the observation space is replaced with gym.spaces.Dict to hold both observations and masks). By doing it this way, masks are now available for policy, and fit rollout buffer layout. Masking is implemented by setting logits into -inf (or to a rather small number).

Look for xxx(hack) comments in the code for more details.

Owner
Oleksii Kachaiev
Principal Software Engineer @ Riot, League of Legends Data/ML/AI. Research interests: human-level intelligence for RTS games and complex open world simulations.
Oleksii Kachaiev
Efficient 6-DoF Grasp Generation in Cluttered Scenes

Contact-GraspNet Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes Martin Sundermeyer, Arsalan Mousavian, Rudolph Triebel, Dieter

NVIDIA Research Projects 148 Dec 28, 2022
Face Identity Disentanglement via Latent Space Mapping [SIGGRAPH ASIA 2020]

Face Identity Disentanglement via Latent Space Mapping Description Official Implementation of the paper Face Identity Disentanglement via Latent Space

150 Dec 07, 2022
An algorithm study of the 6th iOS 10 set of Boost Camp Web Mobile

알고리즘 스터디 🔥 부스트캠프 웹모바일 6기 iOS 10조의 알고리즘 스터디 입니다. 개인적인 사정 등으로 S034, S055만 참가하였습니다. 스터디 목적 상진: 코테 합격 + 부캠끝나고 아침에 일어나기 위해 필요한 사이클 기완: 꾸준하게 자리에 앉아 공부하기 +

2 Jan 11, 2022
Rethinking Transformer-based Set Prediction for Object Detection

Rethinking Transformer-based Set Prediction for Object Detection Here are the code for the ICCV paper. The code is adapted from Detectron2 and AdelaiD

Zhiqing Sun 62 Dec 03, 2022
Bytedance Inc. 2.5k Jan 06, 2023
GUPNet - Geometry Uncertainty Projection Network for Monocular 3D Object Detection

GUPNet This is the official implementation of "Geometry Uncertainty Projection Network for Monocular 3D Object Detection". citation If you find our wo

Yan Lu 103 Dec 28, 2022
Character Controllers using Motion VAEs

Character Controllers using Motion VAEs This repo is the codebase for the SIGGRAPH 2020 paper with the title above. Please find the paper and demo at

Electronic Arts 165 Jan 03, 2023
Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV

Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV File YOLOv3 weight can be downloaded

Ngoc Quyen Ngo 2 Mar 27, 2022
To prepare an image processing model to classify the type of disaster based on the image dataset

Disaster Classificiation using CNNs bunnysaini/Disaster-Classificiation Goal To prepare an image processing model to classify the type of disaster bas

Bunny Saini 1 Jan 24, 2022
Run containerized, rootless applications with podman

Why? restrict scope of file system access run any application without root privileges creates usable "Desktop applications" to integrate into your nor

119 Dec 27, 2022
Graph Analysis From Scratch

Graph Analysis From Scratch Goal In this notebook we wanted to implement some functionalities to analyze a weighted graph only by using algorithms imp

Arturo Ghinassi 0 Sep 17, 2022
Automatic detection and classification of Covid severity degree in LUS (lung ultrasound) scans

Final-Project Final project in the Technion, Biomedical faculty, by Mor Ventura, Dekel Brav & Omri Magen. Subproject 1: Automatic Detection of LUS Cha

Mor Ventura 1 Dec 18, 2021
Dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

CaFM-pytorch ICCV ACCEPT Introduction of dataset VSD4K Our dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

96 Jul 05, 2022
Learning to Segment Instances in Videos with Spatial Propagation Network

Learning to Segment Instances in Videos with Spatial Propagation Network This paper is available at the 2017 DAVIS Challenge website. Check our result

Jingchun Cheng 145 Sep 28, 2022
Create images and texts with the First Order Generative Adversarial Networks

First Order Divergence for training GANs This repository contains code accompanying the paper First Order Generative Advesarial Netoworks The majority

Zalando Research 35 Dec 11, 2021
Malware Env for OpenAI Gym

Malware Env for OpenAI Gym Citing If you use this code in a publication please cite the following paper: Hyrum S. Anderson, Anant Kharkar, Bobby Fila

ENDGAME 563 Dec 29, 2022
Rotation Robust Descriptors

RoRD Rotation-Robust Descriptors and Orthographic Views for Local Feature Matching Project Page | Paper link Evaluation and Datasets MMA : Training on

Udit Singh Parihar 25 Nov 15, 2022
MISSFormer: An Effective Medical Image Segmentation Transformer

MISSFormer Code for paper "MISSFormer: An Effective Medical Image Segmentation Transformer". Please read our preprint at the following link: paper_add

Fong 22 Dec 24, 2022
Code for weakly supervised segmentation of a single class

SingleClassRL Implementation of weak single object segmentation from paper "Regularized Loss for Weakly Supervised Single Class Semantic Segmentation"

16 Nov 14, 2022
Analyzing basic network responses to novel classes

novelty-detection Analyzing how AlexNet responds to novel classes with varying degrees of similarity to pretrained classes from ImageNet. If you find

Noam Eshed 34 Oct 02, 2022