Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

Last update: Sep 16, 2022

Overview

Overcooked-AI

We aim to apply traditional offline reinforcement learning techniques to multi-agent algorithms.
In this repository, we implemented behavior cloning (BC), offline MADDPG, MADDPG+REM (MADDPG w/ REM), and MADDPG+BCQ (MADDPG w/ BCQ) in PyTorch. BCQ is currently a work in progress and is not fully implemented.

We collected a 0.5M multi-agent offline RL dataset and ran each of the compared methods on it. The data was collected with online MADDPG agents and includes exploration trajectories generated with OU (Ornstein-Uhlenbeck) noise. The experiments were run on the Asymmetric Advantages layout of the Overcooked environment.
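
For readers unfamiliar with OU noise, the following is a minimal sketch of an Ornstein-Uhlenbeck process of the kind commonly added to actions during online data collection. The class name `OUNoise` and the defaults for `theta`, `sigma`, and `dt` are illustrative assumptions, not values taken from this repository.

```python
import math
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise that reverts to `mu`."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        # All defaults here are assumptions chosen for illustration.
        self.size = size
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean, e.g. at the start of an episode."""
        self.state = [self.mu] * self.size

    def sample(self):
        """Advance one Euler-Maruyama step and return the new noise vector."""
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        self.state = [
            x
            + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

Because each sample depends on the previous one, consecutive perturbations are correlated, which tends to produce smoother exploration than independent Gaussian noise.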

We look forward to your contributions!

How to Run

Collect Offline Data

python train_online.py agent=maddpg save_replay_buffer=true

While the agents train for 0.5M steps, the trajectory replay buffer is dumped to the experiment/{date}/{time}_maddpg_{exp_name}/buffer folder.
Afterwards, replace the path in config/data/local.yaml with that output directory.
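
If several runs exist, locating the right buffer folder by hand can be tedious. The sketch below finds the most recently modified folder matching the pattern above. The helper name `latest_buffer_dir` is hypothetical and is not part of this repository; the resulting path still has to be copied into config/data/local.yaml manually.

```python
from pathlib import Path


def latest_buffer_dir(root="experiment"):
    """Return the most recently modified buffer folder under `root`, or None."""
    # Pattern mirrors experiment/{date}/{time}_maddpg_{exp_name}/buffer.
    candidates = [p for p in Path(root).glob("*/*_maddpg_*/buffer") if p.is_dir()]
    if not candidates:
        return None
    # Pick the folder with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling `latest_buffer_dir()` from the repository root returns a `Path` object, or `None` when no matching run has been saved yet.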

Download Dataset

Alternatively, you can use our pre-collected dataset, available at this link.
We provide 0.5M trajectories collected in the Asymmetric Advantages layout.
Download the dataset to your local machine and update the path in config/data/local.yaml accordingly.

Train Offline Models

Behavior Cloning

python train_bc.py agent=bc data=local
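
As background, behavior cloning treats the offline dataset as supervised data and fits each agent's policy to the logged actions. The sketch below computes the usual negative log-likelihood objective in plain Python, assuming discrete actions and a softmax policy. The function name `bc_loss` is hypothetical, and the loss actually used by train_bc.py may differ.

```python
import math


def bc_loss(logits_batch, actions):
    """Mean negative log-likelihood of dataset actions under a softmax policy."""
    total = 0.0
    for logits, action in zip(logits_batch, actions):
        # Numerically stable log-sum-exp: subtract the max logit first.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # -log softmax(logits)[action] = log_z - logits[action]
        total += log_z - logits[action]
    return total / len(actions)
```

Minimizing this quantity over the dataset pushes the policy toward the actions chosen by the data-collecting agents.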

Offline MADDPG (Vanilla)

python train_offline.py agent=maddpg data=local
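
In the offline setting, MADDPG computes the same critic target as online MADDPG, but every transition comes from the fixed dataset rather than from new environment steps. A minimal sketch of that target for a single logged transition follows. The function name, the callable-based interface, and the default `gamma` are assumptions made for illustration and do not reflect this repository's code.

```python
def maddpg_td_target(reward, done, next_obs, target_actors, target_critic, gamma=0.95):
    """TD target for one agent's centralized critic on a single logged transition."""
    # Each target actor sees only its own agent's next observation.
    next_actions = [actor(obs) for actor, obs in zip(target_actors, next_obs)]
    # The centralized critic sees every agent's observation and action.
    next_q = target_critic(next_obs, next_actions)
    # Bootstrapping is switched off on terminal transitions.
    return reward + gamma * (1.0 - float(done)) * next_q
```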

Offline MADDPG (w/ REM)

python train_offline.py agent=rem_maddpg data=local
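
REM (Random Ensemble Mixture) keeps several Q-value heads and trains on a random convex combination of them, with fresh mixture weights drawn for each update. The sketch below shows only the mixing step in plain Python. The function name `rem_mixture` is hypothetical, and how the rem_maddpg agent wires this into its critic is not shown here.

```python
import random


def rem_mixture(q_heads, rng=random):
    """Combine K Q-value estimates with random convex weights (REM-style)."""
    raw = [rng.random() for _ in q_heads]
    total = sum(raw)
    if total == 0.0:
        # Degenerate draw: fall back to uniform weights.
        weights = [1.0 / len(q_heads)] * len(q_heads)
    else:
        # Normalize so the weights are non-negative and sum to 1.
        weights = [w / total for w in raw]
    mixed = sum(w * q for w, q in zip(weights, q_heads))
    return mixed, weights
```

In a full training step, the same weights would typically be applied to both the current Q-estimate and the target Q-estimate before computing the TD error.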

Offline MADDPG (w/ BCQ) (WIP)

python train_offline.py agent=bcq_maddpg data=local
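
Since the BCQ agent is still a work in progress, the following is only a sketch of the general idea, using the discrete-action form of BCQ: actions that the behavior model considers unlikely are filtered out before the greedy choice. The function name, the `threshold` default, and the discrete-action assumption are all illustrative, and the repository may implement BCQ differently.

```python
def bcq_select_action(q_values, behavior_probs, threshold=0.3):
    """Pick the highest-Q action among those the behavior model considers likely."""
    max_prob = max(behavior_probs)
    # Keep actions whose probability, relative to the most likely action,
    # exceeds the threshold. The most likely action always passes when threshold < 1.
    allowed = [i for i, p in enumerate(behavior_probs) if p / max_prob > threshold]
    # Greedy choice restricted to the allowed set.
    return max(allowed, key=lambda i: q_values[i])
```

Restricting the choice this way keeps the learned policy close to actions that actually appear in the dataset, which limits errors from overestimated Q-values on unseen actions.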

Result

Graph

[Figure panels: Online | Offline (0.5M Data) | Offline (0.25M Data)]

Video

[Video panels: Online | BC | Offline w/ REM]

Owner

Baek In-Chang
