Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation

Overview

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation

Official PyTorch implementation for the paper

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation Rishabh Jangir*, Nicklas Hansen*, Sambaran Ghosal, Mohit Jain, and Xiaolong Wang

[arXiv], [Webpage]

Installation

GPU access with CUDA >=11.1 support is required. Install MuJoCo if you do not have it installed already:

  • Obtain a license on the MuJoCo website.
  • Download MuJoCo binaries here.
  • Unzip the downloaded archive into ~/.mujoco/mujoco200 and place your license key file mjkey.txt at ~/.mujoco.
  • Use the env variables MUJOCO_PY_MJKEY_PATH and MUJOCO_PY_MUJOCO_PATH to specify the MuJoCo license key path and the MuJoCo directory path.
  • Append the MuJoCo subdirectory bin path into the env variable LD_LIBRARY_PATH.

Then, the remainder of the dependencies can be installed with the following commands:

conda env create -f setup/conda.yml
conda activate lookcloser

Training

We provide training scripts for solving each of the four tasks using our method. The training scripts can be found in the scripts directory. Training takes approximately 16 hours on a single GPU for 500k timesteps.

Command: bash scripts/multiview.sh runs with the default arguments set towards training the reach environment with image observations with our crossview method.

Please take a look at src/arguments.py for detailed description of arguments and their usage. The different baselines considered in the paper can be run with little modification of the input arguments.

Results

We find that while using multiple views alone improves the sim-to-real performance of SAC, our Transformer-based view fusion is far more robust across all tasks.

sim-to-real results

See our paper for more results.

Method

Our method improves vision-based robotic manipulation by fusing information from multiple cameras using transformers. The learned RL policy transfers from simulation to a real robot, and solves precision-based manipulation tasks directly from uncalibrated cameras, without access to state information, and with a high degree of variability in task configurations.

method

Attention Maps

We visualize attention maps learned by our method, and find that it learns to relate concepts shared between the two views, e.g. when querying a point on an object shown the egocentric view, our method attends strongly to the same object in the third-person view, and vice-versa. attention

Tasks

Together with our method, we also release a set of four image-based robotic manipulation tasks used in our research. Each task is goal-conditioned with the goal specified directly in the image observations, the agent has no access to state information, and task configurations are randomly initialized at the start of each episode. The provided tasks are:

  • Reach: Reach a randomly positioned mark on the table with the robot's end-effector.
  • Push: Push a box to a goal position indicated by a mark on the table.
  • Pegbox: Place a peg attached to the robot's end-effector with a string into a box.
  • Hammerall: Hammer in an out-of-position peg; each episode, only one of four pegs are randomly initialized out-of-position.

tasks

Citation

If you find our work useful in your research, please consider citing the paper as follows:

@article{Jangir2022Look,
  title={Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation},
  author={ Rishabh Jangir and Nicklas Hansen and Sambaral Ghosal and Mohit Jain and Xiaolong Wang},
  booktitle={arXiv},
  primaryclass={cs.LG},
  year={2022}
}

License

This repository is licensed under the MIT license; see LICENSE for more information.

Owner
Rishabh Jangir
Robotics, AI, Reinforcement Learning, Machine Intelligence.
Rishabh Jangir
A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data

A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data Overview Clustering analysis is widely utilized in single-cell RNA-seque

AI-Biomed @NSCC-gz 3 May 08, 2022
Deep Face Recognition in PyTorch

Face Recognition in PyTorch By Alexey Gruzdev and Vladislav Sovrasov Introduction A repository for different experimental Face Recognition models such

Alexey Gruzdev 141 Sep 11, 2022
Create images and texts with the First Order Generative Adversarial Networks

First Order Divergence for training GANs This repository contains code accompanying the paper First Order Generative Advesarial Netoworks The majority

Zalando Research 35 Dec 11, 2021
Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P

Jingyun Liang 159 Dec 30, 2022
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.

Introduction This is an official implementation of CvT: Introducing Convolutions to Vision Transformers. We present a new architecture, named Convolut

Microsoft 408 Dec 30, 2022
Official code for article "Expression is enough: Improving traffic signal control with advanced traffic state representation"

1 Introduction Official code for article "Expression is enough: Improving traffic signal control with advanced traffic state representation". The code s

Liang Zhang 10 Dec 10, 2022
Solution of Kaggle competition: Sartorius - Cell Instance Segmentation

Sartorius - Cell Instance Segmentation https://www.kaggle.com/c/sartorius-cell-instance-segmentation Environment setup Build docker image bash .dev_sc

68 Dec 09, 2022
Code for "Adversarial Training for a Hybrid Approach to Aspect-Based Sentiment Analysis

HAABSAStar Code for "Adversarial Training for a Hybrid Approach to Aspect-Based Sentiment Analysis". This project builds on the code from https://gith

1 Sep 14, 2020
Multiple style transfer via variational autoencoder

ST-VAE Multiple style transfer via variational autoencoder By Zhi-Song Liu, Vicky Kalogeiton and Marie-Paule Cani This repo only provides simple testi

13 Oct 29, 2022
A curated list of awesome neural radiance fields papers

Awesome Neural Radiance Fields A curated list of awesome neural radiance fields papers, inspired by awesome-computer-vision. How to submit a pull requ

Yen-Chen Lin 3.9k Dec 27, 2022
Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286

Pytorch-DPPO Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286 Using PPO with clip loss (from https

Alexis David Jacq 163 Dec 26, 2022
Library for 8-bit optimizers and quantization routines.

bitsandbytes Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions. Paper -- V

Facebook Research 687 Jan 04, 2023
the code of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021)

RMA-Net This repo is the implementation of the paper: Recurrent Multi-view Alignment Network for Unsupervised Surface Registration (CVPR 2021). Paper

Wanquan Feng 205 Nov 09, 2022
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR2021)

NExT-QA We reproduce some SOTA VideoQA methods to provide benchmark results for our NExT-QA dataset accepted to CVPR2021 (with 1 'Strong Accept' and 2

Junbin Xiao 50 Nov 24, 2022
PyTorch for Semantic Segmentation

PyTorch for Semantic Segmentation This repository contains some models for semantic segmentation and the pipeline of training and testing models, impl

Zijun Deng 1.7k Jan 06, 2023
A Tensorfflow implementation of Attend, Infer, Repeat

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models This is an unofficial Tensorflow implementation of Attend, Infear, Repeat (AIR)

Adam Kosiorek 82 May 27, 2022
A PyTorch implementation of Learning to learn by gradient descent by gradient descent

Intro PyTorch implementation of Learning to learn by gradient descent by gradient descent. Run python main.py TODO Initial implementation Toy data LST

Ilya Kostrikov 300 Dec 11, 2022
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ User support: lambeq-su

Cambridge Quantum 315 Jan 01, 2023
A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17.

3d-pose-baseline This is the code for the paper Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little. A simple yet effective baseline for 3

Julieta Martinez 1.3k Jan 03, 2023