Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Last update: Nov 07, 2022

Related tags

Overview

CQL-JAX

This repository implements Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX (FLAX). Implementation is built on top of the SAC base of JAX-RL.

Usage

Install Dependencies-

pip install -r requirements.txt
pip install "jax[cuda111]<=0.21.1" -f https://storage.googleapis.com/jax-releases/jax_releases.html

Run CQL-

python train_offline.py --env_name=hopper-expert-v0 --min_q_weight=5

Please use the following values of min_q_weight on MuJoCo tasks to reproduce CQL results from IQL paper-

Domain	medium	medium-replay	medium-expert
walker	10	1	10
hopper	5	5	1
cheetah	90	80	100

For antmaze tasks min_q_weight=10 is found to work best.

In case of Out-Of Memory errors in JAX, try running with the following env variables-

XLA_PYTHON_CLIENT_MEM_FRACTION=0.80 python ...
XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 python ...

Performance & Runtime

Returns are more or less same as the torch implementation and comparable to IQL-

Task	CQL(PyTorch)	CQL(JAX)	IQL
hopper-medium-v2	58.5	74.6	66.3
hopper-medium-replay-v2	95.0	92.1	94.7
hopper-medium-expert-v2	105.4	83.2	91.5
antmaze-umaze-v0	74.0	69.5	87.5
antmaze-umaze-diverse-v0	84.0	78.7	62.2
antmaze-medium-play-v0	61.2	14.2	71.2
antmaze-medium-diverse-v0	53.7	10.7	70.2
antmaze-large-play-v0	15.8	0.0	39.6
antmaze-large-diverse-v0	14.9	0.0	47.5

Wall-clock time averages to ~50 mins, improving over IQL paper's 80 min CQL and closing the gap with IQL's 20 min.

Task	CQL(JAX)	IQL
hopper-medium-v2	52	27
hopper-medium-replay-v2	54	30
hopper-medium-expert-v2	57	29

Time efficiency over the original torch implementation is more than 4 times.

For more offline RL algorithm implementations, check out the JAX-RL, IQL and rlkit repositories.

Citation

In case you use CQL-JAX for your research, please cite the following-

@misc{cqljax,
  author = {Suri, Karush},
  title = {{Conservative Q Learning in JAX.}},
  url = {https://github.com/karush17/cql-jax},
  year = {2021}
}

Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Related tags

Overview

CQL-JAX

Usage

Performance & Runtime

Citation

References

Owner

Karush Suri

Pytorch implementation for reproducing StackGAN_v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

Arxiv harvester - Poor man's simple harvester for arXiv resources

Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

PyTorch implementation of the wavelet analysis from Torrence & Compo

An algorithm study of the 6th iOS 10 set of Boost Camp Web Mobile

This project is the official implementation of our accepted ICLR 2021 paper BiPointNet: Binary Neural Network for Point Clouds.

Code for Universal Semi-Supervised Semantic Segmentation models paper accepted in ICCV 2019

Pytorch implementation of the unsupervised object discovery method LOST.

Offline Reinforcement Learning with Implicit Q-Learning

Code image classification of MNIST dataset using different architectures: simple linear NN, autoencoder, and highway network

Codebase for Amodal Segmentation through Out-of-Task andOut-of-Distribution Generalization with a Bayesian Model

Computational modelling of ray propagation through optical elements using the principles of geometric optics (Ray Tracer)

Pywonderland - A tour in the wonderland of math with python.

How to Leverage Multimodal EHR Data for Better Medical Predictions?

ArtEmis: Affective Language for Art

"Inductive Entity Representations from Text via Link Prediction" @ The Web Conference 2021

EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Codes for TIM2021 paper "Anchor-Based Spatio-Temporal Attention 3-D Convolutional Networks for Dynamic 3-D Point Cloud Sequences"

ROS support for Velodyne 3D LIDARs

The PyTorch implementation of Directed Graph Contrastive Learning (DiGCL), NeurIPS-2021