Addressing Function Approximation Error in Actor-Critic Methods

PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If you use our code or data, please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained using PyTorch 1.2 and Python 3.7.

Usage

The paper results can be reproduced by running:

./run_experiments.sh

Experiments on single environments can be run by calling:

python main.py --env HalfCheetah-v2

Hyper-parameters can be modified by passing different arguments to main.py. We include an implementation of DDPG (DDPG.py), which is not used in the paper, for easy comparison of hyper-parameters with TD3. This is not the implementation of "Our DDPG" used in the paper (see OurDDPG.py).
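When sweeping several environments it can help to build the main.py command lines programmatically. The sketch below is a hypothetical helper, not part of this repository: only the `--env` flag comes from this README, and any key passed in `extra_args` (for example a `seed` argument) is an assumption that must match an argument main.py actually defines. The environment names other than HalfCheetah-v2 are illustrative.

```python
import shlex
import sys
from typing import Dict, List, Optional


def build_command(env: str, extra_args: Optional[Dict[str, object]] = None) -> List[str]:
    """Build an argv list for main.py (hypothetical helper).

    Only --env is documented in the README; every key in extra_args is
    forwarded as "--key value" and is an assumption about main.py's arguments.
    """
    cmd = [sys.executable, "main.py", "--env", env]
    for key, value in (extra_args or {}).items():
        cmd += ["--" + key, str(value)]
    return cmd


def format_command(cmd: List[str]) -> str:
    """Shell-quote an argv list for display (works on Python 3.7, unlike shlex.join)."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Illustrative sweep; the environment list is an assumption.
    for env in ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2"]:
        cmd = build_command(env)
        print(format_command(cmd))
        # To actually launch the run, uncomment:
        # import subprocess; subprocess.run(cmd, check=True)
```

Building an argv list (rather than a single shell string) avoids quoting problems if the commands are later passed to `subprocess.run`.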

Algorithms that TD3 is compared against (PPO, TRPO, ACKTR, DDPG) can be found in the OpenAI Baselines repository.

Results

The code is no longer exactly representative of the code used in the paper: hyper-parameters and other minor details have been adjusted to improve performance. The learning curves are still the original results reported in the paper.

The learning curves reported in the paper are found under /learning_curves. Each learning curve is formatted as a NumPy array of 201 evaluations (shape (201,)), where each evaluation is the average total reward from running the policy for 10 episodes with no exploration. The first evaluation is of the randomly initialized policy network (unused in the paper). Evaluations are performed every 5000 time steps, over a total of 1 million time steps.
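Given that format, the time step of evaluation i is i × 5000, so index 0 is step 0 (the random policy) and index 200 is step 1,000,000. The sketch below pairs a curve with its time steps and, by default, drops the initial evaluation as the paper does. It uses a synthetic array because the individual file names under /learning_curves are not listed here; in practice you would substitute `np.load` on one of those files.

```python
import numpy as np

EVAL_FREQ = 5000   # time steps between evaluations (from the README)
NUM_EVALS = 201    # evaluations per curve, including the random initial policy


def curve_with_timesteps(curve, drop_initial=True):
    """Return (timesteps, rewards) for a (201,) learning curve.

    Index 0 is the randomly initialized policy, which the paper does not use,
    so it is dropped unless drop_initial is False.
    """
    curve = np.asarray(curve, dtype=float)
    if curve.shape != (NUM_EVALS,):
        raise ValueError("expected shape (%d,), got %s" % (NUM_EVALS, curve.shape))
    timesteps = np.arange(NUM_EVALS) * EVAL_FREQ  # 0, 5000, ..., 1000000
    if drop_initial:
        return timesteps[1:], curve[1:]
    return timesteps, curve


if __name__ == "__main__":
    # Synthetic stand-in for np.load("learning_curves/<file>.npy").
    fake = np.linspace(0.0, 1.0, NUM_EVALS)
    t, r = curve_with_timesteps(fake)
    print(t[0], t[-1], r.shape)  # 5000 1000000 (200,)
```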

Numerical results can be found in the paper or computed from the learning curves. Video of the learned agent can be found here.
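One way to derive summary numbers from several learning curves (for example, one per random seed) is to stack them and average across runs at each evaluation. The sketch below computes the per-evaluation mean and standard deviation and the maximum of the mean curve. These are illustrative statistics chosen for this example; they are not necessarily the exact statistic reported in the paper's tables, and how the curve files group into runs is an assumption.

```python
import numpy as np


def summarize_runs(curves):
    """Mean and population standard deviation across runs at each evaluation.

    curves: sequence of equally shaped 1-D arrays, one per run (e.g. per seed).
    Returns (mean, std), each with the shape of a single curve.
    """
    stacked = np.stack([np.asarray(c, dtype=float) for c in curves])  # (runs, evals)
    return stacked.mean(axis=0), stacked.std(axis=0)  # std uses ddof=0


def max_average_return(curves):
    """Highest value of the across-run mean curve (one illustrative summary)."""
    mean, _ = summarize_runs(curves)
    return float(mean.max())


if __name__ == "__main__":
    # Two synthetic runs; real curves would come from /learning_curves.
    runs = [np.array([0.0, 1.0, 2.0]), np.array([2.0, 3.0, 4.0])]
    mean, std = summarize_runs(runs)
    print(mean, std, max_average_return(runs))  # [1. 2. 3.] [1. 1. 1.] 3.0
```

Averaging across runs before taking the maximum reports the best point of the typical learning curve, rather than the single luckiest run.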

Bibtex

@inproceedings{fujimoto2018addressing,
  title={Addressing Function Approximation Error in Actor-Critic Methods},
  author={Fujimoto, Scott and Hoof, Herke and Meger, David},
  booktitle={International Conference on Machine Learning},
  pages={1582--1591},
  year={2018}
}

Owner

Scott Fujimoto
