batch-bandits

Implementation of popular bandit algorithms in batch environments.

Source code to our paper "The Impact of Batch Learning in Stochastic Bandits" accepted at the workshop on the Ecological Theory of Reinforcement Learning, NeurIPS 2021.

Overview

The repository provides an opportunuty to run simulations or replay logged datasets in sequential batch manner - sequential interaction with the environment when responses are grouped in batches and observed by the agent only at the end of each batch. Broadly speaking, sequential batch learning is a more generalized way of learning which covers both offline and online settings as special cases bringing together their advantages.

Framework

Two particularly useful versions of the multi-armed bandit problem are implemented: Stochastic Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB). The key feature of the project is that both versions support parameter batch_size - a certain period of time when the agent interacts with the environment "blindly". Despite the batch setting is a property of the environment, this limitation is considered from a policy perspective. With this, it is assumed that it is not the online agent who works with the batch environment, but the batch policy interacts with the online environment.

The project is built upon RL-GLue framework, which provides an interface to connect agents, environments, and experiment programs. Note, that MAB/rl_glue.py and CMAB/rl_glue.py were adapted to make batch interaction possible.

Implemented algorithms

Version	Algorithm	Comment
MAB	ε - greedy	-
MAB	Thompson Sampling	-
MAB	UCB	-
CMAB	LinTS	see link (and references therein) for more details
CMAB	LinUCB	see article for theoretical description
CMAB	Offline evaluator	policy evaluation technique; see article for theoretical quarantees

Implementation of popular bandit algorithms in batch environments.

Related tags

Overview

batch-bandits

Overview

Framework

Implemented algorithms

Owner

Danil Provodin

MMRazor: a model compression toolkit for model slimming and AutoML

Affine / perspective transformation in Pose Estimation with Tensorflow 2

Team nan solution repository for FPT data-centric competition. Data augmentation, Albumentation, Mosaic, Visualization, KNN application

MoveNetを用いたPythonでの姿勢推定のデモ

ViSD4SA, a Vietnamese Span Detection for Aspect-based sentiment analysis dataset

Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

DeepLab2: A TensorFlow Library for Deep Labeling

ExCon: Explanation-driven Supervised Contrastive Learning

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch Onnx and CoreML.

Calculates carbon footprint based on fuel mix and discharge profile at the utility selected. Can create graphs and tabular output for fuel mix based on input file of series of power drawn over a period of time.

Using Streamlit to host a multi-page tool with model specs and classification metrics, while also accepting user input values for prediction.

Source code of SIGIR2021 Paper 'One Chatbot Per Person: Creating Personalized Chatbots based on Implicit Profiles'

TensorFlow 2 AI/ML library wrapper for openFrameworks

Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral]

🔀 Visual Room Rearrangement

Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).

Informal Persian Universal Dependency Treebank

An essential implementation of BYOL in PyTorch + PyTorch Lightning

PyTorch implementation of a Real-ESRGAN model trained on custom dataset