Malmo Collaborative AI Challenge - Team Pig Catcher

Overview

The Malmo Collaborative AI Challenge - Team Pig Catcher

Approach

The challenge involves 2 agents who can either cooperate or defect. The optimal policy, based on stag hunt [1], depends on the policy of the other agent. Not knowing the other agent's policy, the optimal solution is then based on modelling the other agent's policy. Similarly, the challenge can be considered a sequential social dilemma [2], as goals could change over time.

By treating the other agent as part of the environment, we can use model-free RL, and simply aim to maximise the reward of our agent. As a baseline we take a DRL algorithm - ACER [3] - and train it against the evaluation agent (which randomly uses a focused or random strategy every episode).

We chose to approach this challenge using hierarchical RL. We assume there are 2 subpolicies, one for each type of partner agent. To do so, we use option heads [4], whereby the agent has shared features, but separate heads for different subpolicies. In this case, ACER with 2 subpolicies has 2 Q-value heads and 2 policy heads. To choose which subpolicy to use at any given time, the agent also has an additional classifier head that is trained (using an oracle) to distinguish which option to use. Therefore, we ask the following questions:

  • Can the agent distinguish between the two possible behaviours of the evaluation agent?
  • Does the agent learn qualitatively different subpolicies?

Unfortunately, due to technical difficulties and time restrictions, we were unable to successfully train an agent. Full results and more details can be found in our video.

Design Decisions

For our baseline, we implemented ACER [3] in PyTorch based on reference code [5, 6]. In addition, we augmented the state that the agent receives with the previous action, reward and a step counter [7]. Our challenge entry augments the agent with option heads [4], and we aim to distinguish the different policies of the evaluation agent.

We also introduce a novel contribution - a batch version of ACER - which increases stability. We sample a batch of off-policy trajectories, and then truncate them to match the smallest.

Instructions

Dependencies:

Firstly, build the Malmo Docker image. Secondly, enable running Docker as a non-root user.

Run ACER with OMP_NUM_THREADS=1 python pc_main.py. The code automatically opens up Minecraft (Docker) instances.

Discussion

Team Pig Catcher Discussion Video

References

[1] Game Theory of Mind
[2] Multi-agent Reinforcement Learning in Sequential Social Dilemmas
[3] Sample Efficient Actor-Critic with Experience Replay
[4] Classifying Options for Deep Reinforcement Learning
[5] ikostrikov/pytorch-a3c
[6] pfnet/ChainerRL
[7] Learning to Navigate in Complex Environments



This repository contains the task definition and example code for the Malmo Collaborative AI Challenge. This challenge is organized to encourage research in collaborative AI - to work towards AI agents that learn to collaborate to solve problems and achieve goals. You can find additional details, including terms and conditions, prizes and information on how to participate at the Challenge Homepage.

Join the chat at https://gitter.im/malmo-challenge/Lobby license


Notes for challenge participants: Once you and your team decide to participate in the challenge, please make sure to register your team at our Registration Page. On the registration form, you need to provide a link to the GitHub repository that will contain your solution. We recommend that you fork this repository (learn how), and provide address of the forked repo. You can then update your submission as you make progress on the challenge task. We will consider the version of the code on branch master at the time of the submission deadline as your challenge submission. Your submission needs to contain code in working order, a 1-page description of your approach, and a 1-minute video that shows off your agent. Please see the challenge terms and conditions for further details.


Jump to:

Installation

Prerequisites

Minimal installation

pip install -e git+https://github.com/Microsoft/malmo-challenge#egg=malmopy

or

git clone https://github.com/Microsoft/malmo-challenge
cd malmo-challenge
pip install -e .

Optional extensions

Some of the example code uses additional dependencies to provide 'extra' functionality. These can be installed using:

pip install -e '.[extra1, extra2]'

For example to install gym and chainer:

pip install -e '.[gym]'

Or to install all extras:

pip install -e '.[all]'

The following extras are available:

  • gym: OpenAI Gym is an interface to a wide range of reinforcement learning environments. Installing this extra enables the Atari example agents in samples/atari to train on the gym environments. Note that OpenAI gym atari environments are currently not available on Windows.
  • tensorflow: TensorFlow is a popular deep learning framework developed by Google. In our examples it enables visualizations through TensorBoard.

Getting started

Play the challenge task

The challenge task takes the form of a mini game, called Pig Chase. Learn about the game, and try playing it yourself on our Pig Chase Challenge page.

Run your first experiment

See how to run your first baseline experiment on the Pig Chase Challenge page.

Next steps

Run an experiment in Docker on Azure

Docker is a virtualization platform that makes it easy to deploy software with all its dependencies. We use docker to run experiments locally or in the cloud. Details on how to run an example experiment using docker are in the docker README.

Resources

Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)

Little Ball of Fur is a graph sampling extension library for Python. Please look at the Documentation, relevant Paper, Promo video and External Resour

Benedek Rozemberczki 619 Dec 14, 2022
Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

DistMIS Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation. DistriMIS Distributing Deep Learning Hyperparameter Tuning

HiEST 2 Sep 09, 2022
Use CLIP to represent video for Retrieval Task

A Straightforward Framework For Video Retrieval Using CLIP This repository contains the basic code for feature extraction and replication of results.

Jesus Andres Portillo Quintero 54 Dec 22, 2022
Optimizing synthesizer parameters using gradient approximation

Optimizing synthesizer parameters using gradient approximation NASH 2021 Hackathon! These are some experiments I conducted during NASH 2021, the Neura

Jordie Shier 10 Feb 10, 2022
[NeurIPS 2021] A weak-shot object detection approach by transferring semantic similarity and mask prior.

[NeurIPS 2021] A weak-shot object detection approach by transferring semantic similarity and mask prior.

BCMI 49 Jul 27, 2022
Python Jupyter kernel using Poetry for reproducible notebooks

Poetry Kernel Use per-directory Poetry environments to run Jupyter kernels. No need to install a Jupyter kernel per Python virtual environment! The id

Pathbird 204 Jan 04, 2023
Codes for “A Deeply Supervised Attention Metric-Based Network and an Open Aerial Image Dataset for Remote Sensing Change Detection”

DSAMNet The pytorch implementation for "A Deeply-supervised Attention Metric-based Network and an Open Aerial Image Dataset for Remote Sensing Change

Mengxi Liu 41 Dec 14, 2022
Mengzi Pretrained Models

中文 | English Mengzi 尽管预训练语言模型在 NLP 的各个领域里得到了广泛的应用,但是其高昂的时间和算力成本依然是一个亟需解决的问题。这要求我们在一定的算力约束下,研发出各项指标更优的模型。 我们的目标不是追求更大的模型规模,而是轻量级但更强大,同时对部署和工业落地更友好的模型。

Langboat 424 Jan 04, 2023
Omnidirectional Scene Text Detection with Sequential-free Box Discretization (IJCAI 2019). Including competition model, online demo, etc.

Box_Discretization_Network This repository is built on the pytorch [maskrcnn_benchmark]. The method is the foundation of our ReCTs-competition method

Yuliang Liu 266 Nov 24, 2022
code for Grapadora research paper experimentation

Road feature embedding selection method Code for research paper experimentation Abstract Traffic forecasting models rely on data that needs to be sens

Eric López Manibardo 0 May 26, 2022
DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition, TPAMI 2021

DVG-Face: Dual Variational Generation for HFR This repo is a PyTorch implementation of DVG-Face: Dual Variational Generation for Heterogeneous Face Re

52 Dec 30, 2022
Fuzzer for Linux Kernel Drivers

difuze: Fuzzer for Linux Kernel Drivers This repo contains all the sources (including setup scripts), you need to get difuze up and running. Tested on

seclab 344 Dec 27, 2022
Coursera - Quiz & Assignment of Coursera

Coursera Assignments This repository is aimed to help Coursera learners who have difficulties in their learning process. The quiz and programming home

浅梦 828 Jan 04, 2023
A python package to perform same transformation to coco-annotation as performed on the image.

coco-transform-util A python package to perform same transformation to coco-annotation as performed on the image. Installation Way 1 $ git clone https

1 Jan 14, 2022
Examples of how to create colorful, annotated equations in Latex using Tikz.

The file "eqn_annotate.tex" is the main latex file. This repository provides four examples of annotated equations: [example_prob.tex] A simple one ins

SyNeRCyS Research Lab 3.2k Jan 05, 2023
Clustering is a popular approach to detect patterns in unlabeled data

Visual Clustering Clustering is a popular approach to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a data

Tarek Naous 24 Nov 11, 2022
Official PyTorch implementation and pretrained models of the paper Self-Supervised Classification Network

Self-Classifier: Self-Supervised Classification Network Official PyTorch implementation and pretrained models of the paper Self-Supervised Classificat

Elad Amrani 24 Dec 21, 2022
Buffon’s needle: one of the oldest problems in geometric probability

Buffon-s-Needle Buffon’s needle is one of the oldest problems in geometric proba

3 Feb 18, 2022
Hitters Linear Regression - Hitters Linear Regression With Python

Hitters_Linear_Regression Kullanacağımız veri seti Carnegie Mellon Üniversitesi'

AyseBuyukcelik 2 Jan 26, 2022
Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Single Optical Path

Keyhole Imaging Code & Dataset Code associated with the paper "Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Singl

Stanford Computational Imaging Lab 20 Feb 03, 2022