Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

Overview

EfficientZero (NeurIPS 2021)

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

Environments

EfficientZero requires python3 (>=3.6) and pytorch (>=1.8.0) with the development headers.

We recommend to use torch amp (--amp_type torch_amp) to accelerate training.

Prerequisites

Before starting training, you need to build the c++/cython style external packages.

cd core/ctree
bash make.sh

The distributed framework of this codebase is built on ray.

Installation

As for other packages required for this codebase, please run pip install -r requirements.txt.

Usage

Quick start

  • Train: python main.py --env BreakoutNoFrameskip-v4 --case atari --opr train --amp_type torch_amp --num_gpus 1 --num_cpus 10 --cpu_actor 1 --gpu_actor 1 --force
  • Test: python main.py --env BreakoutNoFrameskip-v4 --case atari --opr test --amp_type torch_amp --num_gpus 1 --load_model --model_path model.p \

Bash file

We provide train.sh and test.sh for training and evaluation.

  • Train:
    • With 4 GPUs (3090): bash train.sh
  • Test: bash test.sh
Required Arguments Description
--env Name of the environment
--case {atari} It's used for switching between different domains(default: atari)
--opr {train,test} select the operation to be performed
--amp_type {torch_amp,none} use torch amp for acceleration
Other Arguments Description
--force will rewrite the result directory
--num_gpus 4 how many GPUs are available
--num_cpus 96 how many CPUs are available
--cpu_actor 14 how many cpu workers
--gpu_actor 20 how many gpu workers
--seed 0 the seed
--use_priority use priority in replay buffer sampling
--use_max_priority use the max priority for the newly collectted data
--amp_type 'torch_amp' use torch amp for acceleration
--info 'EZ-V0' some tags for you experiments
--p_mcts_num 8 set the parallel number of envs in self-play
--revisit_policy_search_rate 0.99 set the rate of reanalyzing policies
--use_root_value use root values in value targets (require more GPU actors)
--render render in evaluation
--save_video save videos for evaluation

Architecture Designs

The architecture of the training pipeline is shown as follows:

Some suggestions

  • To use a smaller model, you can choose smaller dim of the projection layers (Eg: 256/64) and the LSTM hidden layer (Eg: 64) in the config.
  • For GPUs with 10G memory instead of 20G memory, you can allocate 0.25 gpu for each GPU maker (@ray.remote(num_gpus=0.25)) in core/reanalyze_worker.py.

New environment registration

If you wan to apply EfficientZero to a new environment like mujoco. Here are the steps for registration:

  1. Follow the directory config/atari and create dir for the env at config/mujoco.
  2. Implement your MujocoConfig(BaseConfig) class and implement the models as well as your environment wrapper.
  3. Register the case at main.py.

Results

Evaluation with 32 seeds for 3 different runs (different seeds).

Citation

If you find this repo useful, please cite our paper:

@inproceedings{ye2021mastering,
  title={Mastering Atari Games with Limited Data},
  author={Weirui Ye, and Shaohuai Liu, and Thanard Kurutach, and Pieter Abbeel, and Yang Gao},
  booktitle={NeurIPS},
  year={2021}
}

Contact

If you have any question or want to use the code, please contact [email protected] .

Acknowledgement

We appreciate the following github repos a lot for their valuable code base implementations:

https://github.com/koulanurag/muzero-pytorch

https://github.com/werner-duvaud/muzero-general

https://github.com/pytorch/ELF

Owner
Weirui Ye
Weirui Ye
Example for AUAV 2022 with obstacle avoidance.

AUAV 2022 Sample This is a sample PX4 based quadrotor path planning framework based on Ubuntu 20.04 and ROS noetic for the IEEE Autonomous UAS 2022 co

James Goppert 11 Sep 16, 2022
Code for Temporally Abstract Partial Models

Code for Temporally Abstract Partial Models Accompanies the code for the experimental section of the paper: Temporally Abstract Partial Models, Khetar

DeepMind 19 Jul 13, 2022
Complex-Valued Neural Networks (CVNN)Complex-Valued Neural Networks (CVNN)

Complex-Valued Neural Networks (CVNN) Done by @NEGU93 - J. Agustin Barrachina Using this library, the only difference with a Tensorflow code is that y

youceF 1 Nov 12, 2021
ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

ONNX-GLPDepth - Python scripts for performing monocular depth estimation using the GLPDepth model in ONNX

Ibai Gorordo 18 Nov 06, 2022
I decide to sync up this repo and self-critical.pytorch. (The old master is in old master branch for archive)

An Image Captioning codebase This is a codebase for image captioning research. It supports: Self critical training from Self-critical Sequence Trainin

Ruotian(RT) Luo 1.3k Dec 31, 2022
Automatic Differentiation Multipole Moment Molecular Forcefield

Automatic Differentiation Multipole Moment Molecular Forcefield Performance notes On a single gpu, using waterbox_31ang.pdb example from MPIDplugin wh

4 Jan 07, 2022
Scalable machine learning based time series forecasting

mlforecast Scalable machine learning based time series forecasting. Install PyPI pip install mlforecast Optional dependencies If you want more functio

Nixtla 145 Dec 24, 2022
List of all dependencies affected by node-ipc malicious commit

node-ipc-dependencies-list List of all dependencies affected by node-ipc malicious commit as of 17/3/2022 - 19/3/2022 (timestamp) Please improve upon

99 Oct 15, 2022
Pytorch implementation of MalConv

MalConv-Pytorch A Pytorch implementation of MalConv Desciprtion This is the implementation of MalConv proposed in Malware Detection by Eating a Whole

Alexander H. Liu 58 Oct 26, 2022
Official implementation of Self-supervised Graph Attention Networks (SuperGAT), ICLR 2021.

SuperGAT Official implementation of Self-supervised Graph Attention Networks (SuperGAT). This model is presented at How to Find Your Friendly Neighbor

Dongkwan Kim 127 Dec 28, 2022
Grammar Induction using a Template Tree Approach

Gitta Gitta ("Grammar Induction using a Template Tree Approach") is a method for inducing context-free grammars. It performs particularly well on data

Thomas Winters 36 Nov 15, 2022
Use of Attention Gates in a Convolutional Neural Network / Medical Image Classification and Segmentation

Attention Gated Networks (Image Classification & Segmentation) Pytorch implementation of attention gates used in U-Net and VGG-16 models. The framewor

Ozan Oktay 1.6k Dec 30, 2022
This is implementation of AlexNet(2012) with 3D Convolution on TensorFlow (AlexNet 3D).

AlexNet_3dConv TensorFlow implementation of AlexNet(2012) by Alex Krizhevsky, with 3D convolutiional layers. 3D AlexNet Network with a standart AlexNe

Denis Timonin 41 Jan 16, 2022
Accelerated SMPL operation, commonly used in generate 3D human mesh, STAR included.

SMPL2 An enchanced and accelerated SMPL operation which commonly used in 3D human mesh generation. It takes a poses, shapes, cam_trans as inputs, outp

JinTian 20 Oct 17, 2022
Barbershop: GAN-based Image Compositing using Segmentation Masks (SIGGRAPH Asia 2021)

Barbershop: GAN-based Image Compositing using Segmentation Masks Barbershop: GAN-based Image Compositing using Segmentation Masks Peihao Zhu, Rameen A

Peihao Zhu 928 Dec 30, 2022
Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation in PyTorch

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Ima

Xuanchi Ren 86 Dec 07, 2022
yufan 81 Dec 08, 2022
Code for CVPR2021 "Visualizing Adapted Knowledge in Domain Transfer". Visualization for domain adaptation. #explainable-ai

Visualizing Adapted Knowledge in Domain Transfer @inproceedings{hou2021visualizing, title={Visualizing Adapted Knowledge in Domain Transfer}, auth

Yunzhong Hou 80 Dec 25, 2022
Predict halo masses from simulations via graph neural networks

HaloGraphNet Predict halo masses from simulations via Graph Neural Networks. Given a dark matter halo and its galaxies, creates a graph with informati

Pablo Villanueva Domingo 20 Nov 15, 2022
A Robust Unsupervised Ensemble of Feature-Based Explanations using Restricted Boltzmann Machines

A Robust Unsupervised Ensemble of Feature-Based Explanations using Restricted Boltzmann Machines Understanding the results of deep neural networks is

Johan van den Heuvel 2 Dec 13, 2021