Neural Dynamic Policies for End-to-End Sensorimotor Learning

Last update: Dec 11, 2022

In NeurIPS 2020 (Spotlight) [Project Website] [Project Video]

Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak
Carnegie Mellon University & Facebook AI Research

This is a PyTorch-based implementation of our NeurIPS 2020 paper on Neural Dynamic Policies for end-to-end sensorimotor learning. In this work, we embed dynamics structure into deep neural network-based policies by reparameterizing action spaces with differential equations. We propose Neural Dynamic Policies (NDPs), which make predictions in trajectory distribution space, in contrast to prior policy learning methods in which actions are represented in the raw control space. The embedded structure allows us to perform end-to-end policy learning under both reinforcement and imitation learning setups. If you find this work useful in your research, please cite:

  @inproceedings{bahl2020neural,
    Author = { Bahl, Shikhar and Mukadam, Mustafa and
    Gupta, Abhinav and Pathak, Deepak},
    Title = {Neural Dynamic Policies for End-to-End Sensorimotor Learning},
    Booktitle = {NeurIPS},
    Year = {2020}
  }
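
To make the idea of reparameterizing actions with a differential equation concrete, here is a minimal sketch, assuming a standard one-dimensional dynamic movement primitive (DMP) integrated with Euler steps. It is not the code in this repository: the function name dmp_rollout, the gains, the basis-function heuristics, and the example weights are all illustrative assumptions. The point is only that a policy can output a few parameters (basis weights and a goal) and let the differential equation turn them into a whole trajectory.

```python
import math


def dmp_rollout(y0, goal, weights, steps=200, dt=0.01,
                alpha=25.0, beta=6.25, alpha_x=1.0):
    """Euler-integrate a 1-D dynamic movement primitive (illustrative only).

    Transformation system: y'' = alpha * (beta * (goal - y) - y') + f(x)
    Canonical system:      x'  = -alpha_x * x, with x(0) = 1
    Forcing term:          f(x) = x * (goal - y0) * sum(w_i * psi_i) / sum(psi_i)
    """
    n = len(weights)
    # Place basis-function centers along the phase variable x in (0, 1].
    centers = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    # Heuristic widths: narrower bases for smaller centers, which lie closer together.
    widths = [n ** 1.5 / c for c in centers]

    y, yd, x = float(y0), 0.0, 1.0
    trajectory = [y]
    for _ in range(steps):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        total = sum(psi)
        forcing = 0.0
        if total > 1e-10:
            weighted = sum(w * p for w, p in zip(weights, psi))
            forcing = x * (goal - y0) * weighted / total
        ydd = alpha * (beta * (goal - y) - yd) + forcing
        yd += ydd * dt          # update velocity first
        y += yd * dt            # then position, using the new velocity
        x += -alpha_x * x * dt  # phase decays toward zero
        trajectory.append(y)
    return trajectory


# Hypothetical values standing in for what a policy network would predict.
predicted_weights = [50.0, -20.0, 10.0, 0.0, 5.0]
predicted_goal = 1.0
plan = dmp_rollout(y0=0.0, goal=predicted_goal, weights=predicted_weights)
print(len(plan), round(plan[-1], 3))
```

With all weights set to zero the forcing term vanishes and the trajectory simply converges to the goal; nonzero weights reshape the path taken on the way there.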

1) Installation and Usage

This code is based on PyTorch and requires MuJoCo 1.5 to run. To install and set up the code, run the following commands:

# Create a directory for data and add dependencies
cd neural-dynamic-polices; mkdir data/
git clone https://github.com/rll/rllab.git
git clone https://github.com/openai/baselines.git

# Create a virtual environment
conda create --name ndp python=3.5
source activate ndp

# Install requirements
pip install -r requirements.txt
# OR try
conda env create -f ndp.yaml
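
After installation, a quick check that the main dependencies are visible from the active environment can save a failed training run. The snippet below is a small sketch, not part of this repository. The import names torch and mujoco_py are assumptions (the latter is the usual Python binding for MuJoCo 1.5); confirm them against requirements.txt or ndp.yaml.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Assumed import names; confirm against requirements.txt or ndp.yaml.
required = ["torch", "mujoco_py"]
missing = missing_modules(required)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules were found.")
```

The check only looks the modules up without importing them, so it does not verify that MuJoCo itself is licensed or correctly installed.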

Training imitation learning:

cd neural-dynamic-polices
# name of the experiment
python main_il.py --name NAME

Training RL: run the script run_rl.sh. ENV_NAME is the environment (one of throw, pick, push, soccer, or faucet). ALGO-TYPE is the algorithm (dmp for NDPs, ppo for PPO [Schulman et al., 2017], and ppo-multi for the multistep actor-critic architecture we present in our paper).

sh run_rl.sh ENV_NAME ALGO-TYPE EXP_ID SEED

To visualize trained models/policies, call vis_policy.sh with exactly the same arguments used for training:

sh vis_policy.sh ENV_NAME ALGO-TYPE EXP_ID SEED
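
Because training and visualization must receive identical arguments, it can help to build both command lines from one tuple. The helper below is a hypothetical convenience wrapper, not something shipped with this code. The allowed environment and algorithm names are copied from the usage text above, and the EXP_ID and SEED values are placeholders.

```python
# Option names taken from the usage text above.
ENV_NAMES = ("throw", "pick", "push", "soccer", "faucet")
ALGO_TYPES = ("dmp", "ppo", "ppo-multi")


def build_command(script, env_name, algo_type, exp_id, seed):
    """Build the argument list for run_rl.sh or vis_policy.sh."""
    if env_name not in ENV_NAMES:
        raise ValueError("unknown ENV_NAME: {}".format(env_name))
    if algo_type not in ALGO_TYPES:
        raise ValueError("unknown ALGO-TYPE: {}".format(algo_type))
    return ["sh", script, env_name, algo_type, str(exp_id), str(seed)]


# Placeholder EXP_ID and SEED values, shared by both commands.
args = ("throw", "dmp", 1, 0)
train_cmd = build_command("run_rl.sh", *args)
vis_cmd = build_command("vis_policy.sh", *args)
print(" ".join(train_cmd))
print(" ".join(vis_cmd))
```

Either list can be passed to subprocess.run(cmd, check=True) from the repository root to launch the script.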

2) Other helpful pointers

3) Acknowledgements
