Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

Last update: Sep 16, 2022

Overview

Overcooked-AI

We aim to apply conventional offline reinforcement learning techniques to multi-agent algorithms.
In this repository, we implement behavior cloning (BC), offline MADDPG, MADDPG+REM (MADDPG w/ REM), and MADDPG+BCQ (MADDPG w/ BCQ) in PyTorch. BCQ is currently a work in progress and is not fully implemented.
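
To illustrate the idea behind REM (Random Ensemble Mixture), here is a minimal sketch of how several Q-value estimates can be mixed with random convex weights. The function names are hypothetical and the sketch uses plain Python floats rather than PyTorch tensors; how this repository wires the mixture into the MADDPG critic is an assumption not confirmed by the text above.

```python
import random


def sample_simplex_weights(num_heads, rng):
    """Draw random convex-combination weights (non-negative, summing to 1)."""
    # Assumption: draw each weight from Uniform(0, 1), then normalize.
    raw = [rng.random() for _ in range(num_heads)]
    total = sum(raw)
    return [w / total for w in raw]


def rem_q_value(head_q_values, weights):
    """Mix the per-head Q estimates into one value using the given weights."""
    return sum(w * q for w, q in zip(weights, head_q_values))
```

In a training loop, one would typically draw fresh weights for every update and use the same weights for both the current Q-value and the target Q-value.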

We collected a 0.5M multi-agent offline RL dataset and ran each of the compared methods on it. The data was collected with online MADDPG agents and includes exploration trajectories generated with OU noise. The experiments are run on the Asymmetric Advantages layout of the Overcooked environment.
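
For readers unfamiliar with OU (Ornstein-Uhlenbeck) noise, the following is a minimal standard-library sketch of the process. The class name and the default values of `theta`, `sigma`, and `dt` are assumptions chosen for illustration, not the settings used in this repository, and how the noise is combined with the agents' action outputs is not shown here.

```python
import math
import random


class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.size = size
        self.mu = mu          # long-run mean the noise reverts to
        self.theta = theta    # strength of the pull back toward mu
        self.sigma = sigma    # scale of the random perturbation
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean (e.g. at the start of an episode)."""
        self.state = [self.mu] * self.size

    def sample(self):
        """Advance one step: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""
        self.state = [
            x
            + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

Because each sample depends on the previous one, consecutive noise values are correlated, which tends to produce smoother exploration than independent Gaussian noise.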

We look forward to your contributions!

How to Run

Collect Offline Data

python train_online.py agent=maddpg save_replay_buffer=true

While the agents train for 0.5M steps, the trajectory replay buffer is dumped to your experiment/{date}/{time}_maddpg_{exp_name}/buffer folder.
After training, replace the path in config/data/local.yaml with this output directory.
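
If several runs have accumulated, a small helper such as the one below can list the candidate buffer directories to copy into config/data/local.yaml. This is only a sketch: the function name `find_buffer_dirs` is hypothetical, and the glob pattern simply mirrors the experiment/{date}/{time}_maddpg_{exp_name}/buffer layout described above.

```python
from pathlib import Path


def find_buffer_dirs(root="experiment"):
    """Return replay-buffer directories under root, sorted by path."""
    # Mirrors {date}/{time}_maddpg_{exp_name}/buffer beneath the root folder.
    pattern = "*/*_maddpg_*/buffer"
    return sorted(p for p in Path(root).glob(pattern) if p.is_dir())
```

For example, `for p in find_buffer_dirs(): print(p)` prints one path per run. Whether the last entry is the most recent run depends on the date and time format of the folder names, which is not specified here.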

Download Dataset

Alternatively, if you want to use our pre-collected dataset, download it from this link.
We provide 0.5M trajectories in the Asymmetric Advantages layout.
Download the dataset to your local machine and replace the path in config/data/local.yaml with its location.

Train Offline Models

Behavior Cloning

python train_bc.py agent=bc data=local

Offline MADDPG (Vanilla)

python train_offline.py agent=maddpg data=local

Offline MADDPG (w/ REM)

python train_offline.py agent=rem_maddpg data=local

Offline MADDPG (w/ BCQ) (WIP)

python train_offline.py agent=bcq_maddpg data=local

Result

Graph

Online	Offline (0.5M Data)	Offline (0.25M Data)

Video

Online	BC	Offline w/ REM

Owner

Baek In-Chang