Evolution Strategies in PyTorch

Overview

Evolution Strategies

This is a PyTorch implementation of Evolution Strategies.

Requirements

Python 3.5, PyTorch >= 0.2.0, numpy, gym, universe, cv2

What is this? (For non-ML people)

A large class of problems in AI can be described as "Markov Decision Processes," in which an agent takes actions in an environment and receives reward, with the goal of maximizing that reward. This is a very general framework that applies to many tasks, from learning to play video games to robotic control. For the past few decades, most people used Reinforcement Learning (that is, learning from trial and error) to solve these problems. In particular, an extension of the backpropagation algorithm from Supervised Learning, called the Policy Gradient, could train neural networks to solve them. Recently, OpenAI showed that black-box optimization of neural network parameters (that is, without the Policy Gradient or even Reinforcement Learning) can achieve results similar to state-of-the-art Reinforcement Learning algorithms, and can be parallelized much more efficiently. This repo is an implementation of that black-box optimization algorithm.
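For readers who prefer code, here is a minimal sketch of the black-box idea on a toy problem. It is not the code in this repo: the function names (es_step, toy_reward, train) and the toy reward are hypothetical, and subtracting the mean reward as a baseline is an assumption made here to reduce noise. The parameters n, sigma, and lr play the same roles as the batch size, perturbation scale, and learning rate discussed in this README.

```python
import numpy as np


def es_step(theta, reward_fn, rng, n=50, sigma=0.1, lr=0.05):
    """One ES update: perturb, evaluate, and move along the reward-weighted noise."""
    # Sample n Gaussian perturbations of the parameter vector.
    eps = rng.standard_normal((n, theta.size))
    # Evaluate each perturbed parameter vector (one "rollout" per perturbation).
    rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
    # Assumption: subtract the mean reward as a baseline to reduce variance.
    centered = rewards - rewards.mean()
    # Gradient estimate: noise vectors averaged with their centered rewards as weights.
    grad = eps.T @ centered / (n * sigma)
    # Gradient ascent, since we want to maximize reward.
    return theta + lr * grad


def toy_reward(params):
    """Hypothetical stand-in for an episode return, highest when params == target."""
    target = np.array([0.5, -0.3, 0.1])
    return -np.sum((params - target) ** 2)


def train(steps=300, seed=0):
    """Run a fixed number of ES updates starting from all-zero parameters."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    for _ in range(steps):
        theta = es_step(theta, toy_reward, rng)
    return theta


if __name__ == "__main__":
    print(train())
```

Note that reward_fn is only ever called, never differentiated. That is what makes the method "black-box", and it is also why each of the n evaluations can run on a separate core.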

Usage

model.py provides two neural networks: a small network meant for simple tasks with discrete observations and actions, and a larger Convnet-LSTM meant for Atari games.

Run python3 main.py --help to see all of the options and hyperparameters available to you.

Typical usage would be:

python3 main.py --small-net --env-name CartPole-v1

which will run the small network on CartPole, printing performance on every training batch. Default hyperparameters should be able to solve CartPole fairly quickly.

python3 main.py --small-net --env-name CartPole-v1 --test --restore path_to_checkpoint

which will render the environment and show the performance of the agent saved in the checkpoint. During training, a checkpoint is saved once per gradient update, always overwriting the previous file.

python3 main.py --env-name PongDeterministic-v4 --n 10 --lr 0.01 --useAdam

which will train on Pong and produce a learning curve similar to this one:

[Figure: learning curve]

This graph was produced after approximately 24 hours of training on a 12-core computer. I would expect that a more thorough hyperparameter search, and more importantly a larger batch size, would allow the network to solve the environment.

Deviations from the paper

  • I have not yet tried virtual batch normalization; instead I use the selu nonlinearity, which serves the same purpose with significantly less computational overhead. ES appears to train on Pong quite well even with relatively small batch sizes and selu.

  • I did not pass rewards between workers. Instead, every worker sends its reward to one master worker, which takes a gradient step and sends the new model back to the workers. If you have more cores than your batch size, OpenAI's method is probably more efficient, but if your batch size is larger than the number of cores, I think my method would be better.

  • I do not adaptively change the max episode length as recommended in the paper, although it is provided as an option. The reasoning is that doing so is most helpful when you are running many cores in parallel, whereas I was using at most 12. Moreover, capping the episode length can severely cripple the algorithm if reward is correlated with episode length, as we cannot learn from high-performing perturbations until most of the workers catch up (and they might not for a long time).

Tips

  • If you increase the batch size, n, you should increase the learning rate as well.

  • Feel free to stop training when you see that the unperturbed model is consistently solving the environment, even if the perturbed models are not.

  • During training you probably want to look at the rank of the unperturbed model within the population of perturbed models. Ideally some perturbation performs better than your unperturbed model (if this doesn't happen, you probably won't learn anything useful). This requires 1 extra rollout per gradient step, but as this rollout can be computed in parallel with the training rollouts, it does not add to training time. It does, however, leave one fewer CPU core for the training rollouts.

  • Sigma is a tricky hyperparameter to get right: higher values of sigma give a gradient estimate with less variance but more bias. At the same time, sigma controls the variance of our perturbations, so if we need a more varied population, it should be increased. It might be possible to adaptively change sigma based on the rank of the unperturbed model mentioned in the tip above. I tried a few simple heuristics based on this and found no significant performance increase, but it might be possible to do this more intelligently.

  • I found, as OpenAI did in their paper, that performance on Atari increased as I increased the size of the neural net.
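The rank of the unperturbed model mentioned in the tips above can be computed along these lines. This is only a sketch: the function name unperturbed_rank and the example returns are hypothetical, and the repo may compute or report the rank differently.

```python
def unperturbed_rank(unperturbed_return, perturbed_returns):
    """Return the 1-based rank of the unperturbed model (1 = best in the population)."""
    # Count the perturbed models that scored strictly higher than the unperturbed one.
    better = sum(1 for r in perturbed_returns if r > unperturbed_return)
    return better + 1


if __name__ == "__main__":
    returns = [21.0, 35.0, 18.0, 40.0]  # hypothetical returns of four perturbed models
    rank = unperturbed_rank(30.0, returns)
    print(f"unperturbed model ranks {rank} of {len(returns) + 1}")
```

A rank of 1 on every gradient step means no perturbation beat the unperturbed model, which is the situation the tip warns about.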

Your code is making my computer slow help

Short answer: decrease the batch size to the number of cores in your computer, and decrease the learning rate as well. This will most likely hurt the performance of the algorithm.

Long answer: If you want large batch sizes while also keeping the number of spawned threads down, I have provided an old version in the slow_version branch which allows you to do multiple rollouts per thread, per gradient step. This code is not supported, however, and it is not recommended that you use it.
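To illustrate what "multiple rollouts per thread" means, the sketch below divides a batch of n rollouts as evenly as possible among a fixed number of workers. It is not taken from the slow_version branch: the helper name split_rollouts is hypothetical, and defaulting to os.cpu_count() is an assumption.

```python
import os


def split_rollouts(n, workers=None):
    """Return how many rollouts each worker should run so that the counts sum to n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: default to one worker per CPU core reported by the OS.
    workers = workers or os.cpu_count() or 1
    # Never start more workers than there are rollouts to run.
    workers = min(workers, n)
    base, extra = divmod(n, workers)
    # The first `extra` workers take one additional rollout each.
    return [base + (1 if i < extra else 0) for i in range(workers)]


if __name__ == "__main__":
    # A batch of 40 rollouts on a 12-core machine.
    print(split_rollouts(40, 12))
```

With this kind of split, the number of spawned workers stays at the core count no matter how large the batch is, at the cost of running several rollouts back to back on each worker.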

Contributions

Please feel free to open GitHub issues or send pull requests.

License

MIT

Owner
Andrew Gambardella
Machine Learning DPhil (PhD) student at University of Oxford