Code for the Population-Based Bandits Algorithm, presented at NeurIPS 2020.

Last update: Nov 16, 2022

Population-Based Bandits (PB2)

Code for the Population-Based Bandits (PB2) Algorithm, from the paper Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits.

The framework combines ray (using rllib and tune) with GPy, and is heavily inspired by the ray tune pbt_ppo example.

NOTE: PB2 is included in the ray.tune library, which is the officially supported implementation. The link to the code is here, and the accompanying blog post is here.

Running the Code

To run the IMPALA experiment, use the command:

python run_impala.py

To run the PPO experiment, use the command:

python run_ppo.py

Config

Within each run script, several settings can be varied. You can choose the following:

- env_name: the environment ID, for example BreakoutNoFrameskip-v4.
- method: either pb2 or pbt (or asha for PPO).
- freq: the frequency of updating hyperparameters; we used 500,000 for IMPALA and 50,000 for PPO.
- seed: the random seed; we used 0, 1, 2, 3, 4, 5, 6, ... and plan to add more seeds.
- max: the maximum number of timesteps; we used 10,000,000 for IMPALA and 1,000,000 for PPO.
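
As an illustration of how these options fit together, here is a minimal sketch that exposes them as command-line flags with argparse. The flag spellings, the build_parser helper, and the choice of defaults are assumptions made for this example (the README only names the options and the values used); they are not taken from run_impala.py or run_ppo.py.

```python
import argparse

# Methods the README lists per algorithm; "asha" is mentioned for PPO only.
METHODS = {
    "impala": ("pb2", "pbt"),
    "ppo": ("pb2", "pbt", "asha"),
}

# Values the README reports using for each experiment.
REPORTED = {
    "impala": {"freq": 500_000, "max": 10_000_000},
    "ppo": {"freq": 50_000, "max": 1_000_000},
}


def build_parser(algo):
    """Hypothetical parser whose flag names mirror the options listed above."""
    if algo not in METHODS:
        raise ValueError(f"unknown algorithm: {algo!r}")
    parser = argparse.ArgumentParser(description=f"{algo} experiment options")
    # The default environment is only the example ID given in the README.
    parser.add_argument("--env_name", type=str, default="BreakoutNoFrameskip-v4")
    parser.add_argument("--method", choices=METHODS[algo], default="pb2")
    parser.add_argument("--freq", type=int, default=REPORTED[algo]["freq"])
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--max", type=int, default=REPORTED[algo]["max"])
    return parser


# Example: a PPO run with the ASHA baseline and seed 3.
args = build_parser("ppo").parse_args(["--method", "asha", "--seed", "3"])
```

Restricting --method per algorithm means an unsupported combination (such as asha with IMPALA) is rejected at parse time rather than partway through a long run.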

It should also be possible to adapt this code to run other ray tune schedulers. We used it for ASHA in our PPO experiments. We are also working to include a BOHB baseline.
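
One way such an adaptation could look is a small registry that maps a method name to a ray tune scheduler class and imports it only when needed. This is a sketch under assumptions: the SCHEDULER_PATHS table and both helper functions are hypothetical names for this example, and the repository's own scripts may select schedulers differently. The class paths refer to schedulers that ship with ray.tune.

```python
import importlib

# Method name -> dotted path of a scheduler class shipped with ray.tune.
# The table itself is an illustrative assumption, not the repo's code.
SCHEDULER_PATHS = {
    "pb2": "ray.tune.schedulers.pb2.PB2",
    "pbt": "ray.tune.schedulers.PopulationBasedTraining",
    "asha": "ray.tune.schedulers.ASHAScheduler",
}


def scheduler_path(method):
    """Return the dotted class path for a method name, or raise ValueError."""
    try:
        return SCHEDULER_PATHS[method]
    except KeyError:
        known = ", ".join(sorted(SCHEDULER_PATHS))
        raise ValueError(f"unknown method {method!r}; expected one of: {known}")


def load_scheduler_class(method):
    """Import and return the scheduler class (requires ray to be installed)."""
    module_name, _, class_name = scheduler_path(method).rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Adding another scheduler would then be a one-line change to the table, with the constructor arguments for each scheduler still supplied by the calling script.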

Please get in touch with any questions: jackph [at] robots [dot] ox [dot] ac [dot] uk

Citing PB2

Finally, if you found this repo useful, please consider citing us:

@inproceedings{NEURIPS2020_c7af0926,
 author = {Parker-Holder, Jack and Nguyen, Vu and Roberts, Stephen J},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
 pages = {17200--17211},
 publisher = {Curran Associates, Inc.},
 title = {Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits},
 url = {https://proceedings.neurips.cc/paper/2020/file/c7af0926b294e47e52e46cfebe173f20-Paper.pdf},
 volume = {33},
 year = {2020}
}

Owner

Jack Parker-Holder
