Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation (CoRL 2021)

Overview

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation

[Project website] [Paper]

This project is a PyTorch implementation of Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation, published in CoRL 2021.

Learning complex manipulation tasks in realistic, obstructed environments is a challenging problem due to hard exploration in the presence of obstacles and high-dimensional visual observations. Prior work tackles the exploration problem by integrating motion planning and reinforcement learning. However, the motion planner augmented policy requires access to state information, which is often not available in the real-world settings. To this end, we propose to distill the state-based motion planner augmented policy to a visual control policy via (1) visual behavioral cloning to remove the motion planner dependency along with its jittery motion, and (2) vision-based reinforcement learning with the guidance of the smoothed trajectories from the behavioral cloning agent. We validate our proposed approach on three manipulation tasks in obstructed environments and show its high sample-efficiency, outperforming state-of-the-art algorithms for visual policy learning.

Prerequisites

Installation

  1. Install Mujoco 2.0 and add the following environment variables into ~/.bashrc or ~/.zshrc.
# Download mujoco 2.0
$ wget https://www.roboti.us/download/mujoco200_linux.zip -O mujoco.zip
$ unzip mujoco.zip -d ~/.mujoco
$ mv ~/.mujoco/mujoco200_linux ~/.mujoco/mujoco200

# Copy mujoco license key `mjkey.txt` to `~/.mujoco`

# Add mujoco to LD_LIBRARY_PATH
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin

# For GPU rendering (replace 418 with your nvidia driver version)
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-418

# Only for a headless server
$ export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-418/libGL.so
  1. Download this repository and install python dependencies
# Install system packages
sudo apt-get install libgl1-mesa-dev libgl1-mesa-glx libosmesa6-dev patchelf libopenmpi-dev libglew-dev python3-pip python3-numpy python3-scipy

# Create out/ folder for saving/loading RL checkpoints
cd mopa-pd
mkdir out

# Create checkpoints/ folder for saving/loading BC-Visual checkpoints
mkdir checkpoints

# Install required python packages in your new env
pip install -r requirements.txt
  1. Install ompl
# Linux
sudo apt install libyaml-cpp-dev
sh ./scripts/misc/installEigen.sh #from the home directory # install Eigen
sudo apt-get install libboost-all-dev # install Boost C++ for ompl

# Mac OS
brew install libyaml yaml-cpp
brew install eigen

# Build ompl
git clone [email protected]:ompl/ompl.git ../ompl
cd ../ompl
cmake .
sudo make install

# if ompl-x.x (x.x is the version) is installed in /usr/local/include, you need to rename it to ompl
mv /usr/local/include/ompl-x.x /usr/local/include/ompl
  1. Build motion planner python wrapper
cd ./mopa-pd/motion_planners
python setup.py build_ext --inplace
  1. Configure wandb for tracking experiments (optional)
  • Sign up for a free account at https://app.wandb.ai/login?signup=true.
  • Open this file: config/__init__.py
  • Set wandb argument to True.
  • Add your username to the entity argument.
  • Add your project name to the project argument.
  1. Servers without a monitor (optional)

You may use the following code to create a virtual monitor for rendering.

# Run the next line for Ubuntu
$ sudo apt-get install xserver-xorg libglu1-mesa-dev freeglut3-dev mesa-common-dev libxmu-dev libxi-dev

# Configure nvidia-x
$ sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024

# Launch a virtual display
$ sudo /usr/bin/X :1 &

# Run a command with DISPLAY=:1
DISPLAY=:1 <command>

Available environments

SawyerPushObstacle-v0 SawyerLiftObstacle-v0 SawyerAssemblyObstacle-v0
Sawyer Push Sawyer Lift Sawyer Assembly

How to run experiments

Launch a virtual display (only for a headless server)

sudo /usr/bin/X :1 &

MoPA-RL

# train MoPA-RL policy
sh ./scripts/3d/assembly/mopa.sh 0 1234
sh ./scripts/3d/lift/mopa.sh 0 1234
sh ./scripts/3d/push/mopa.sh 0 1234

# evaluate MoPA-RL policy
sh ./scripts/3d/assembly/mopa_eval.sh 0 1234
sh ./scripts/3d/lift/mopa_eval.sh 0 1234
sh ./scripts/3d/push/mopa_eval.sh 0 1234

# generate MoPA-RL data for BC-Visual using trained MoPA-RL's checkpoint
sh ./scripts/3d/assembly/run_multiple_sh.sh
sh ./scripts/3d/lift/run_multiple_sh.sh
sh ./scripts/3d/push/run_multiple_sh.sh

BC-Visual

# pre-process MoPA-RL data
python util/state_img_preprocessing.py

cd rl # must be inside rl folder to execute the following commands

# bc_visual_args.py is the config for training and evaluating BC-Visual

# train BC-Visual 
python behavioral_cloning_visual.py

# evaluate BC-Visual
python evaluate_bc_visual.py

Baselines and Ours

  • Sawyer Push
###### Training
sh ./scripts/3d/push/bcrl_stochastic_two_buffers.sh 0 1234 # Ours
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_mopa.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_dr.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/push/bcrl_mopa_sota.sh 0 1234 # CoL 
sh ./scripts/3d/push/bcrl_sota.sh 0 1234 # CoL (w BC Smoothing) 
sh ./scripts/3d/push/mopa_asym.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/push/bcrl_stochastic_randweights.sh 0 1234 # Asym. SAC

###### Evaluation
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_eval.sh 0 1234 # Ours
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_mopa_eval.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/push/bcrl_stochastic_two_buffers_dr_eval.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/push/bcrl_mopa_sota_eval.sh 0 1234 # CoL
sh ./scripts/3d/push/bcrl_sota_eval.sh 0 1234 # CoL (w BC Smoothing)
sh ./scripts/3d/push/mopa_asym_eval.sh 0 1234  # MoPA Asym. SAC
sh ./scripts/3d/push/bcrl_stochastic_randweights_eval.sh 0 1234 # Asym. SAC
  • Sawyer Lift
###### Training
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers.sh 0 1234 # Ours
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_mopa.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_dr.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/lift/bcrl_mopa_sota.sh 0 1234 # CoL
sh ./scripts/3d/lift/bcrl_sota.sh 0 1234 # CoL (w BC Smoothing) 
sh ./scripts/3d/lift/mopa_asym.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/lift/bcrl_stochastic_randweights.sh 0 1234 # Asym. SAC

###### Evaluation
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_eval.sh 0 1234 # Ours
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_mopa_eval.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/lift/bcrl_stochastic_two_buffers_dr_eval.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/lift/bcrl_mopa_sota_eval.sh 0 1234 # CoL
sh ./scripts/3d/lift/bcrl_sota_eval.sh 0 1234 # CoL (w BC Smoothing)
sh ./scripts/3d/lift/mopa_asym_eval.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/lift/bcrl_stochastic_randweights_eval.sh 0 1234 # Asym. SAC
  • Sawyer Assembly
###### Training
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers.sh 0 1234 # Ours
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_mopa.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_dr.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/assembly/bcrl_mopa_sota.sh 0 1234 # CoL
sh ./scripts/3d/assembly/bcrl_sota.sh 0 1234 # CoL (w BC Smoothing)
sh ./scripts/3d/assembly/mopa_asym.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/assembly/bcrl_stochastic_randweights.sh 0 1234 # Asym. SAC

###### Evaluation
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_eval.sh 0 1234 # Ours
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_mopa_eval.sh 0 1234 # Ours (w/o BC Smoothing)
sh ./scripts/3d/assembly/bcrl_stochastic_two_buffers_dr_eval.sh 0 1234 # Ours (w DR)
sh ./scripts/3d/assembly/bcrl_mopa_sota_eval.sh 0 1234 # CoL
sh ./scripts/3d/assembly/bcrl_sota_eval.sh 0 1234 # CoL (w BC Smoothing) 
sh ./scripts/3d/assembly/mopa_asym_eval.sh 0 1234 # MoPA Asym. SAC
sh ./scripts/3d/assembly/bcrl_stochastic_randweights_eval.sh 0 1234 # Asym. SAC

Domain Randomization

To run experiments with domain randomized simulation, the following parameters can be set in config:

  • dr: set to True to train the model with domain randomization
  • dr_params_set: choose as per the training environment - ["sawyer_push, sawyer_lift, sawyer_assembly]
  • dr_eval: set to True for evaluating the domain randomization model

Directories

The structure of the repository:

  • rl: Reinforcement learning code
  • env: Environment code for simulated experiments (2D Push and all Sawyer tasks)
  • config: Configuration files
  • util: Utility code
  • motion_planners: Motion planner code from MoPA-RL
  • scripts: Scripts for all experiments

Log directories:

  • logs/rl.ENV.DATE.PREFIX.SEED:
    • cmd.sh: A command used for running a job
    • git.txt: Log gitdiff
    • prarms.json: Summary of parameters
    • video: Generated evaulation videos (every evalute_interval)
    • wandb: Training summary of W&B, like tensorboard summary
    • ckpt_*.pt: Stored checkpoints (every ckpt_interval)
    • replay_*.pt: Stored replay buffers (every ckpt_interval)

Trouble shooting

Mujoco GPU rendering

To use GPU rendering for mujoco, you need to add /usr/lib/nvidia-000 (000 should be replaced with your NVIDIA driver version) to LD_LIBRARY_PATH before installing mujoco-py. Then, during mujoco-py compilation, it will show you linuxgpuextension instead of linuxcpuextension. In Ubuntu 18.04, you may encounter an GL-related error while building mujoco-py, open venv/lib/python3.7/site-packages/mujoco_py/gl/eglshim.c and comment line 5 #include <GL/gl.h> and line 7 #include <GL/glext.h>.

Virtual display on headless machines

On servers, you don’t have a monitor. Use this to get a virtual monitor for rendering and put DISPLAY=:1 in front of a command.

# Run the next line for Ubuntu
$ sudo apt-get install xserver-xorg libglu1-mesa-dev freeglut3-dev mesa-common-dev libxmu-dev libxi-dev

# Configure nvidia-x
$ sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024

# Launch a virtual display
$ sudo /usr/bin/X :1 &

# Run a command with DISPLAY=:1
DISPLAY=:1 <command>

pybind11-dev not found

wget http://archive.ubuntu.com/ubuntu/pool/universe/p/pybind11/pybind11-dev_2.2.4-2_all.deb
sudo apt install ./pybind11-dev_2.2.4-2_all.deb

ERROR: GLEW initalization error: Missing GL version

This issue is most likely due to running on a headless server.

Solution 1:

sudo mkdir -p /usr/lib/nvidia-000

Then add this line to ~/.bashrc file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-000

Solution 2:

1. First import and call mujocopy_render_hack in main.py
2. Follow the instructions in "Virtual display on headless machines" section
3. When running a script, remember to add DISPLAY:=1 <command> 

/usr/bin/ld: cannot find -lGL

Source: https://stackoverflow.com/questions/33447653/usr-bin-ld-cannot-find-lgl-ubuntu-14-04

sudo rm /usr/lib/x86_64-linux-gnu/libGL.so 
sudo ln -s /usr/lib/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so 

References

Citation

If you find this useful, please cite

@inproceedings{liu2021mopa,
  title={Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation},
  author={I-Chun Arthur Liu and Shagun Uppal and Gaurav S. Sukhatme and Joseph J. Lim and Peter Englert and Youngwoon Lee},
  booktitle={Conference on Robot Learning},
  year={2021}
}

Authors

I-Chun (Arthur) Liu*, Shagun Uppal*, Gaurav S. Sukhatme, Joseph J. Lim, Peter Englert, and Youngwoon Lee at USC CLVR and USC RESL (*Equal contribution)

Owner
Cognitive Learning for Vision and Robotics (CLVR) lab @ USC
Learning and Reasoning for Artificial Intelligence, especially focused on perception and action. Led by Professor Joseph J. Lim @ USC
Cognitive Learning for Vision and Robotics (CLVR) lab @ USC
Official PyTorch implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

U-GAT-IT — Official PyTorch Implementation : Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Imag

Hyeonwoo Kang 2.4k Jan 04, 2023
Code for the CVPR 2021 paper "Triple-cooperative Video Shadow Detection"

Triple-cooperative Video Shadow Detection Code and dataset for the CVPR 2021 paper "Triple-cooperative Video Shadow Detection"[arXiv link] [official l

Zhihao Chen 24 Oct 04, 2022
ManipNet: Neural Manipulation Synthesis with a Hand-Object Spatial Representation - SIGGRAPH 2021

ManipNet: Neural Manipulation Synthesis with a Hand-Object Spatial Representation - SIGGRAPH 2021 Dataset Code Demos Authors: He Zhang, Yuting Ye, Tak

HE ZHANG 194 Dec 06, 2022
Measuring and Improving Consistency in Pretrained Language Models

ParaRel 🤘 This repository contains the code and data for the paper: Measuring and Improving Consistency in Pretrained Language Models as well as the

Yanai Elazar 26 Dec 02, 2022
toroidal - a lightweight transformer library for PyTorch

toroidal - a lightweight transformer library for PyTorch Toroidal transformers are of smaller size and lower weight than the more common E-I types. Th

MathInf GmbH 64 Jan 07, 2023
The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch Railway

Openspoor The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch

7 Aug 22, 2022
OpenL3: Open-source deep audio and image embeddings

OpenL3 OpenL3 is an open-source Python library for computing deep audio and image embeddings. Please refer to the documentation for detailed instructi

Music and Audio Research Laboratory - NYU 326 Jan 02, 2023
Custom implementation of Corrleation Module

Pytorch Correlation module this is a custom C++/Cuda implementation of Correlation module, used e.g. in FlowNetC This tutorial was used as a basis for

Clément Pinard 361 Dec 12, 2022
Activity image-based video retrieval

Cross-modal-retrieval Our approach is focus on Activity Image-to-Video Retrieval (AIVR) task. The compared methods are state-of-the-art single modalit

BCMI 75 Oct 21, 2021
implement of SwiftNet:Real-time Video Object Segmentation

SwiftNet The official PyTorch implementation of SwiftNet:Real-time Video Object Segmentation, which has been accepted by CVPR2021. Requirements Python

haochen wang 64 Dec 14, 2022
A modular application for performing anomaly detection in networks

Deep-Learning-Models-for-Network-Annomaly-Detection The modular app consists for mainly three annomaly detection algorithms. The system supports model

Shivam Patel 1 Dec 09, 2021
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

This is a release of our VIMPAC paper to illustrate the implementations. The pretrained checkpoints and scripts will be soon open-sourced in HuggingFace transformers.

Hao Tan 74 Dec 03, 2022
Official PyTorch implementation of "Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets" (ICLR 2021)

Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets This is the official PyTorch implementation for the paper Rapid Neural A

48 Dec 26, 2022
This repo is official PyTorch implementation of MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices(CVPRW 2021).

Github Code of "MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices" Introduction This repo is official PyTorch implementatio

Choi Sang Bum 203 Jan 05, 2023
AI pipelines for Nvidia Jetson Platform

Jetson Multicamera Pipelines Easy-to-use realtime CV/AI pipelines for Nvidia Jetson Platform. This project: Builds a typical multi-camera pipeline, i.

NVIDIA AI IOT 96 Dec 23, 2022
Image Deblurring using Generative Adversarial Networks

DeblurGAN arXiv Paper Version Pytorch implementation of the paper DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. Our netwo

Orest Kupyn 2.2k Jan 01, 2023
The official implementation of ICCV paper "Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds".

Box-Aware Tracker (BAT) Pytorch-Lightning implementation of the Box-Aware Tracker. Box-Aware Feature Enhancement for Single Object Tracking on Point C

Kangel Zenn 5 Mar 26, 2022
Codebase for testing whether hidden states of neural networks encode discrete structures.

structural-probes Codebase for testing whether hidden states of neural networks encode discrete structures. Based on the paper A Structural Probe for

John Hewitt 349 Dec 17, 2022
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

UNION Automatic Evaluation Metric described in the paper UNION: An UNreferenced MetrIc for Evaluating Open-eNded Story Generation (EMNLP 2020). Please

50 Dec 30, 2022