Active Offline Policy Selection With Python

Last update: Oct 15, 2022

Related tags

Overview

Active Offline Policy Selection

This is supporting example code for NeurIPS 2021 paper Active Offline Policy Selection by Ksenia Konyushkova*, Yutian Chen*, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas.

To simulate the active offline policy selection for a set of policies, one needs to provide a number of files. We provide the files for 76 policies on cartpole_swingup environemnt.

Sampled episodic returns for all policies on a number of evalauation episodes (full-reward-samples-dict.pkl), or a way of sampling a new episode of evaluation upon request for any policy. The file full-reward-samples-dict.pkl contains a dictionary that maps a policy by its string representation to a numpy.ndarray of of shape (5000,) (number of reward samples).
Off-policy evaluation score, such as fitted Q-evaluation (FQE) for all policies (ope_values.pkl). The file ope_values.pkl contains dictionary that maps policy info into OPE estimates. We provide FQE scores for the policies.
Actions that policies take on 1000 randomly sampled states from the offline dataset (actions.pkl). The file actions.pkl contains a dictionary with keys actions and policy_keys. actions is a list of 1000 ( number of states used to compute the kernel) elements of numpy.ndarray type of dimensionality 76x1 (number of policies by the dimensionality of the actions). policy_keys contains a dictionary mapping from string representation of a policy to the index of that policy in actions.

Installation

To set up the virtual environment, run the following commands. From within the active_ops directory:

python3 -m venv active_ops_env
source active_ops_env/bin/activate

pip install --upgrade pip
pip install -r requirements.txt

To run the demo with colab, enable the jupyter_http_over_ws extension:

jupyter serverextension enable --py jupyter_http_over_ws

Finally, start a server:

jupyter notebook \
  --NotebookApp.allow_origin='https://colab.research.google.com' \
  --port=8888 \
  --NotebookApp.port_retries=0

Usage

To run the code refer to Active_ops_experiment.ipynb colab notebook. Execute blocks of code one by one to reproduce the final plot. You can modify various parameters maked by @param to test various baselines in modified settings. This code loads the example of data for cartpole_environment provided in the data folder. Using this data, we reproduce the results of Figure 14 of the paper.

Citing this work

@inproceedings{konyushkovachen2021aops,
    title = "Active Offline Policy Selection",
    author = "Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas",
    booktitle = NeurIPS,
    year = 2021
}

Disclaimer

This is not an official Google product.

The datasets in this work are licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit [http://creativecommons.org/licenses/by/4.0/] (http://creativecommons.org/licenses/by/4.0/).

Active Offline Policy Selection With Python

Related tags

Overview

Active Offline Policy Selection

Installation

Usage

Citing this work

Disclaimer

Owner

DeepMind

Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers

(3DV 2021 Oral) Filtering by Cluster Consistency for Large-Scale Multi-Image Matching

an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset

Remote sensing change detection tool based on PaddlePaddle

Implementation of "GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings" in PyTorch

Optimizing Value-at-Risk and Conditional Value-at-Risk of Black Box Functions with Lacing Values (LV)

RNN Predict Street Commercial Vitality

This package implements the algorithms introduced in Smucler, Sapienza, and Rotnitzky (2020) to compute optimal adjustment sets in causal graphical models.

"NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search".

MLP-Numpy - A simple modular implementation of Multi Layer Perceptron in pure Numpy.

Attentive Implicit Representation Networks (AIR-Nets)

This is the official implementation of VaxNeRF (Voxel-Accelearated NeRF).

Library for converting from RGB / GrayScale image to base64 and back.

Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

A python script to dump all the challenges locally of a CTFd-based Capture the Flag.

This repository is based on Ultralytics/yolov5, with adjustments to enable rotate prediction boxes.

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting (ICCV, 2021)

code and data for paper "GIANT: Scalable Creation of a Web-scale Ontology"