Selfplay In MultiPlayer Environments

Related tags

Deep LearningSIMPLE
Overview

Contributors Forks Stargazers Issues MIT License LinkedIn


Logo

Selfplay In MultiPlayer Environments
· Report Bug · Request Feature


Table of Contents

  1. About The Project
  2. Getting Started
  3. Tutorial
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgements


About The Project

SIMPLE Diagram

This project allows you to train AI agents on custom-built multiplayer environments, through self-play reinforcement learning.

It implements Proximal Policy Optimisation (PPO), with a built-in wrapper around the multiplayer environments that handles the loading and action-taking of opponents in the environment. The wrapper delays the reward back to the PPO agent, until all opponents have taken their turn. In essence, it converts the multiplayer environment into a single-player environment that is constantly evolving as new versions of the policy network are added to the network bank.

To learn more, check out the accompanying blog post.

This guide explains how to get started with the repo, add new custom environments and tune the hyperparameters of the system.

Have fun!


Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

Install Docker and Docker Compose to make use of the docker-compose.yml file

Installation

  1. Clone the repo
    git clone https://github.com/davidADSP/SIMPLE.git
    cd SIMPLE
  2. Build the image and 'up' the container.
    docker-compose up -d
  3. Choose an environment to install in the container (tictactoe, connect4, sushigo and butterfly are currently implemented)
    bash ./scripts/install_env.sh sushigo

Tutorial

This is a quick tutorial to allow you to start using the two entrypoints into the codebase: test.py and train.py.

TODO - I'll be adding more substantial documentation for both of these entrypoints in due course! For now, descriptions of each command line argument can be found at the bottom of the files themselves.


Quickstart

test.py

This entrypoint allows you to play against a trained AI, pit two AIs against eachother or play against a baseline random model.

For example, try the following command to play against a baseline random model in the Sushi Go environment.

docker-compose exec app python3 test.py -d -g 1 -a base base human -e sushigo 

train.py

This entrypoint allows you to start training the AI using selfplay PPO. The underlying PPO engine is from the Stable Baselines package.

For example, you can start training the agent to learn how to play SushiGo with the following command:

docker-compose exec app python3 train.py -r -e sushigo 

After 30 or 40 iterations the process should have achieved above the default threshold score of 0.2 and will output a new best_model.zip to the /zoo/sushigo folder.

Training runs until you kill the process manually (e.g. with Ctrl-C), so do that now.

You can now use the test.py entrypoint to play 100 games silently between the current best_model.zip and the random baselines model as follows:

docker-compose exec app python3 test.py -g 100 -a best_model base base -e sushigo 

You should see that the best_model scores better than the two baseline model opponents.

Played 100 games: {'best_model_btkce': 31.0, 'base_sajsi': -15.5, 'base_poqaj': -15.5}

You can continue training the agent by dropping the -r reset flag from the train.py entrypoint arguments - it will just pick up from where it left off.

docker-compose exec app python3 train.py -e sushigo 

Congratulations, you've just completed one training cycle for the game Sushi Go! The PPO agent will now have to work out a way to beat the model it has just created...


Tensorboard

To monitor training, you can start Tensorboard with the following command:

bash scripts/tensorboard.sh

Navigate to localhost:6006 in a browser to view the output.

In the /zoo/pretrained/ folder there is a pre-trained //best_model.zip for each game, that can be copied up a directory (e.g. to /zoo/sushigo/best_model.zip) if you want to test playing against a pre-trained agent right away.


Custom Environments

You can add a new environment by copying and editing an existing environment in the /environments/ folder.

For the environment to work with the SIMPLE self-play wrapper, the class must contain the following methods (expanding on the standard methods from the OpenAI Gym framework):

__init__

In the initiation method, you need to define the usual action_space and observation_space, as well as two additional variables:

  • n_players - the number of players in the game
  • current_player_num - an integer that tracks which player is currently active  

step

The step method accepts an action from the current active player and performs the necessary steps to update the game environment. It should also it should update the current_player_num to the next player, and check to see if an end state of the game has been reached.

reset

The reset method is called to reset the game to the starting state, ready to accept the first action.

render

The render function is called to output a visual or human readable summary of the current game state to the log file.

observation

The observation function returns a numpy array that can be fed as input to the PPO policy network. It should return a numeric representation of the current game state, from the perspective of the current player, where each element of the array is in the range [-1,1].

legal_actions

The legal_actions function returns a numpy vector of the same length as the action space, where 1 indicates that the action is valid and 0 indicates that the action is invalid.

Please refer to existing environments for examples of how to implement each method.

You will also need to add the environment to the two functions in /utils/register.py - follow the existing examples of environments for the structure.


Parallelisation

The training process can be parallelised using MPI across multiple cores.

For example to run 10 parallel threads that contribute games to the current iteration, you can simply run:

docker-compose exec app mpirun -np 10 python3 train.py -e sushigo 

Roadmap

See the open issues for a list of proposed features (and known issues).


Contributing

Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the GPL-3.0. See LICENSE for more information.


Contact

David Foster - @davidADSP - [email protected]

Project Link: https://github.com/davidADSP/SIMPLE


Acknowledgements

There are many repositories and blogs that have helped me to put together this repository. One that deserves particular acknowledgement is David's Ha's Slime Volleyball Gym, that also implements multi-agent reinforcement learning. It has helped to me understand how to adapt the callback function to a self-play setting and also to how to implement MPI so that the codebase can be highly parallelised. Definitely worth checking out!


Image Segmentation and Object Detection in Pytorch

Image Segmentation and Object Detection in Pytorch Pytorch-Segmentation-Detection is a library for image segmentation and object detection with report

Daniil Pakhomov 732 Dec 10, 2022
The code for the NeurIPS 2021 paper "A Unified View of cGANs with and without Classifiers".

Energy-based Conditional Generative Adversarial Network (ECGAN) This is the code for the NeurIPS 2021 paper "A Unified View of cGANs with and without

sianchen 22 May 28, 2022
[WACV 2020] Reducing Footskate in Human Motion Reconstruction with Ground Contact Constraints

Reducing Footskate in Human Motion Reconstruction with Ground Contact Constraints Official implementation for Reducing Footskate in Human Motion Recon

Virginia Tech Vision and Learning Lab 38 Nov 01, 2022
TFOD-MASKRCNN - Tensorflow MaskRCNN With Python

Tensorflow- MaskRCNN Steps git clone https://github.com/amalaj7/TFOD-MASKRCNN.gi

Amal Ajay 2 Jan 18, 2022
VOLO: Vision Outlooker for Visual Recognition

VOLO: Vision Outlooker for Visual Recognition, arxiv This is a PyTorch implementation of our paper. We present Vision Outlooker (VOLO). We show that o

Sea AI Lab 876 Dec 09, 2022
Code to reproduce the experiments from our NeurIPS 2021 paper " The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective"

Code To run: python runner.py new --save SAVE_NAME --data PATH_TO_DATA_DIR --dataset DATASET --model model_name [options] --n 1000 - train - t

Geoff Pleiss 5 Dec 12, 2022
Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"

ON-LSTM This repository contains the code used for word-level language model and unsupervised parsing experiments in Ordered Neurons: Integrating Tree

Yikang Shen 572 Nov 21, 2022
Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation

Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation This repository contains code and data f

Zoey Liu 0 Jan 07, 2022
Official code for the ICLR 2021 paper Neural ODE Processes

Neural ODE Processes Official code for the paper Neural ODE Processes (ICLR 2021). Abstract Neural Ordinary Differential Equations (NODEs) use a neura

Cristian Bodnar 50 Oct 28, 2022
A minimal TPU compatible Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

NeRF Minimal Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Result of Tiny-NeRF RGB Depth

Soumik Rakshit 11 Jul 24, 2022
DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data.

DWIPrep: A Robust Preprocessing Pipeline for dMRI Data DWIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data. The transp

Gal Ben-Zvi 1 Jan 09, 2023
Code for the paper "Implicit Representations of Meaning in Neural Language Models"

Implicit Representations of Meaning in Neural Language Models Preliminaries Create and set up a conda environment as follows: conda create -n state-pr

Belinda Li 39 Nov 03, 2022
Cross-Task Consistency Learning Framework for Multi-Task Learning

Cross-Task Consistency Learning Framework for Multi-Task Learning Tested on numpy(v1.19.1) opencv-python(v4.4.0.42) torch(v1.7.0) torchvision(v0.8.0)

Aki Nakano 2 Jan 08, 2022
Tensorflow python implementation of "Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos"

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos This repository is the official tensorflow python implementation

Yasamin Jafarian 287 Jan 06, 2023
DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency

[CVPR19] DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency (Oral paper) Authors: Kuang-Jui Hsu, Yen-Yu Lin, Yung-Yu Chuang PDF:

Kuang-Jui Hsu 139 Dec 22, 2022
Image augmentation library in Python for machine learning.

Augmentor is an image augmentation library in Python for machine learning. It aims to be a standalone library that is platform and framework independe

Marcus D. Bloice 4.8k Jan 07, 2023
Project repo for the paper SILT: Self-supervised Lighting Transfer Using Implicit Image Decomposition

SILT: Self-supervised Lighting Transfer Using Implicit Image Decomposition (BMVC 2021) Project repo for the paper SILT: Self-supervised Lighting Trans

6 Dec 04, 2022
GAN encoders in PyTorch that could match PGGAN, StyleGAN v1/v2, and BigGAN. Code also integrates the implementation of these GANs.

MTV-TSA: Adaptable GAN Encoders for Image Reconstruction via Multi-type Latent Vectors with Two-scale Attentions. This is the official code release fo

owl 37 Dec 24, 2022
FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics

FusionNet_Pytorch FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics Requirements Pytorch 0.1.11 Pyt

Choi Gunho 102 Dec 13, 2022
The story of Chicken for Club Bing

Chicken Story tl;dr: The time when Microsoft banned my entire country for cheating at Club Bing. (A lot of the details are from memory so I've recreat

Eyal 142 May 16, 2022