Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Overview

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation

License: MIT PWC

This repository is the pytorch implementation of our paper:

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation
Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira
International Conference on Robotics and Automation (ICRA), 2021

[Project Page] [arXiv] [GitHub]

Installation

Clone the current repository and required submodules:

git clone https://github.com/GT-RIPL/robo-vln
cd robo-vln
  
export robovln_rootdir=$PWD
    
git submodule init 
git submodule update

Habitat and Other Dependencies

Install robo-vln dependencies as follows:

conda create -n habitat python=3.6 cmake=3.14.0
cd $robovln_rootdir
python -m pip install -r requirements.txt

We use modified versions of Habitat-Sim and Habitat-API to support continuous control/action-spaces in Habitat Simulator. The details regarding continuous action spaces and converting discrete VLN dataset into continuous control formulation can be found in our paper. The specific commits of our modified Habitat-Sim and Habitat-API versions are mentioned below.

# installs both habitat-api and habitat_baselines
cd $robovln_rootdir/environments/habitat-lab
python -m pip install -r requirements.txt
python -m pip install -r habitat_baselines/rl/requirements.txt
python -m pip install -r habitat_baselines/rl/ddppo/requirements.txt
python setup.py develop --all
	
# Install habitat-sim
cd $robovln_rootdir/environments/habitat-sim
python setup.py install --headless --with-cuda

Data

Similar to Habitat-API, we expect a data folder (or symlink) with a particular structure in the top-level directory of this project.

Matterport3D

We utilize Matterport3D (MP3D) photo-realistic scene reconstructions to train and evaluate our agent. A total of 90 Matterport3D scenes are used for robo-vln. Here is the official Matterport3D Dataset download link and associated instructions: project webpage. To download the scenes needed for robo-vln, run the following commands:

# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/

Extract this data to data/scene_datasets/mp3d such that it has the form data/scene_datasets/mp3d/{scene}/{scene}.glb.

Dataset

The Robo-VLN dataset is a continuous control formualtion of the VLN-CE dataset by Krantz et al ported over from Room-to-Room (R2R) dataset created by Anderson et al. The details regarding converting discrete VLN dataset into continuous control formulation can be found in our paper.

Dataset Path to extract Size
robo_vln_v1.zip data/datasets/robo_vln_v1 76.9 MB

Robo-VLN Dataset

The dataset robo_vln_v1 contains the train, val_seen, and val_unseen splits.

  • train: 7739 episodes
  • val_seen: 570 episodes
  • val_unseen: 1224 episodes

Format of {split}.json.gz

{
    'episodes' = [
        {
            'episode_id': 4991,
            'trajectory_id': 3279,
            'scene_id': 'mp3d/JeFG25nYj2p/JeFG25nYj2p.glb',
            'instruction': {
                'instruction_text': 'Walk past the striped area rug...',
                'instruction_tokens': [2384, 1589, 2202, 2118, 133, 1856, 9]
            },
            'start_position': [10.257800102233887, 0.09358400106430054, -2.379739999771118],
            'start_rotation': [0, 0.3332950713608026, 0, 0.9428225683587541],
            'goals': [
                {
                    'position': [3.360340118408203, 0.09358400106430054, 3.07817006111145], 
                    'radius': 3.0
                }
            ],
            'reference_path': [
                [10.257800102233887, 0.09358400106430054, -2.379739999771118], 
                [9.434900283813477, 0.09358400106430054, -1.3061100244522095]
                ...
                [3.360340118408203, 0.09358400106430054, 3.07817006111145],
            ],
            'info': {'geodesic_distance': 9.65537166595459},
        },
        ...
    ],
    'instruction_vocab': [
        'word_list': [..., 'orchids', 'order', 'orient', ...],
        'word2idx_dict': {
            ...,
            'orchids': 1505,
            'order': 1506,
            'orient': 1507,
            ...
        },
        'itos': [..., 'orchids', 'order', 'orient', ...],
        'stoi': {
            ...,
            'orchids': 1505,
            'order': 1506,
            'orient': 1507,
            ...
        },
        'num_vocab': 2504,
        'UNK_INDEX': 1,
        'PAD_INDEX': 0,
    ]
}
  • Format of {split}_gt.json.gz
{
    '4991': {
        'actions': [
          ...
          [-0.999969482421875, 1.0],
          [-0.9999847412109375, 0.15731772780418396],
          ...
          ],
        'forward_steps': 325,
        'locations': [
            [10.257800102233887, 0.09358400106430054, -2.379739999771118],
            [10.257800102233887, 0.09358400106430054, -2.379739999771118],
            ...
            [-12.644463539123535, 0.1518409252166748, 4.2241311073303220]
        ]
    }
    ...
}

Depth Encoder Weights

Similar to VLN-CE, our learning-based models utilizes a depth encoder pretained on a large-scale point-goal navigation task i.e. DDPPO. We utilize depth pretraining by using the DDPPO features from the ResNet50 from the original paper. The pretrained network can be downloaded here. Extract the contents of ddppo-models.zip to data/ddppo-models/{model}.pth.

Training and reproducing results

We use run.py script to train and evaluate all of our baseline models. Use run.py along with a configuration file and a run type (either train or eval) to train or evaluate:

python run.py --exp-config path/to/config.yaml --run-type {train | eval}

For lists of modifiable configuration options, see the default task config and experiment config files.

Evaluating Models

All models can be evaluated using python run.py --exp-config path/to/config.yaml --run-type eval. The relevant config entries for evaluation are:

EVAL_CKPT_PATH_DIR  # path to a checkpoint or a directory of checkpoints
EVAL.USE_CKPT_CONFIG  # if True, use the config saved in the checkpoint file
EVAL.SPLIT  # which dataset split to evaluate on (typically val_seen or val_unseen)
EVAL.EPISODE_COUNT  # how many episodes to evaluate

If EVAL.EPISODE_COUNT is equal to or greater than the number of episodes in the evaluation dataset, all episodes will be evaluated. If EVAL_CKPT_PATH_DIR is a directory, one checkpoint will be evaluated at a time. If there are no more checkpoints to evaluate, the script will poll the directory every few seconds looking for a new one. Each config file listed in the next section is capable of both training and evaluating the model it is accompanied by.

Off-line Data Buffer

All our models require an off-line data buffer for training. To collect the continuous control dataset for both train and val_seen splits, run the following commands before training (Please note that it would take some time on a single GPU to store data. Please also make sure to dedicate around ~1.5 TB of hard-disk space for data collection):

Collect data buffer for train split:

python run.py --exp-config robo_vln_baselines/config/paper_configs/robovln_data_train.yaml --run-type train

Collect data buffer for val_seen split:

python run.py --exp-config robo_vln_baselines/config/paper_configs/robovln_data_val.yaml --run-type train 

CUDA

We use 2 GPUs to train our Hierarchical Model hierarchical_cma.yaml. To train the hierarchical model, dedicate 2 GPUs for training as follows:

CUDA_VISIBLE_DEVICES=0,1 python run.py --exp-config robo_vln_baselines/config/paper_configs/hierarchical_cma.yaml --run-type train

Models/Results From the Paper

Model val_seen SPL val_unseen SPL Config
Seq2Seq 0.34 0.30 seq2seq_robo.yaml
PM 0.27 0.24 seq2seq_robo_pm.yaml
CMA 0.25 0.25 cma.yaml
HCM (Ours) 0.43 0.40 hierarchical_cma.yaml
Legend
Seq2Seq Sequence-to-Sequence. Please see our paper on modification made to the model to match the continuous action spaces in robo-vln
PM Progress monitor
CMA Cross-Modal Attention model. Please see our paper on modification made to the model to match the continuous action spaces in robo-vln
HCM Hierarchical Cross-Modal Agent Module (The proposed hierarchical VLN model from our paper).

Pretrained Model

We provide pretrained model for our best Hierarchical Cross-Modal Agent (HCM). Pre-trained Model can be downloaded as follows:

Pre-trained Model Size
HCM_Agent.pth 691 MB

Citation

If you find this repository useful, please cite our paper:

@inproceedings{irshad2021hierarchical,
title={Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation},
author={Muhammad Zubair Irshad and Chih-Yao Ma and Zsolt Kira},
booktitle={Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year={2021},
url={https://arxiv.org/abs/2104.10674}
}

Acknowledgments

  • This code is built upon the implementation from VLN-CE
This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?”

This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?” Usage To replicate our results in Secti

Albert Webson 64 Dec 11, 2022
Facebook Research 605 Jan 02, 2023
A pre-trained language model for social media text in Spanish

RoBERTuito A pre-trained language model for social media text in Spanish READ THE FULL PAPER Github Repository RoBERTuito is a pre-trained language mo

25 Dec 29, 2022
TraSw for FairMOT - A Single-Target Attack example (Attack ID: 19; Screener ID: 24):

TraSw for FairMOT A Single-Target Attack example (Attack ID: 19; Screener ID: 24): Fig.1 Original Fig.2 Attacked By perturbing only two frames in this

Derry Lin 21 Dec 21, 2022
Code for `BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery`, Neurips 2021

This folder contains the code for 'Scalable Variational Approaches for Bayesian Causal Discovery'. Installation To install, use conda with conda env c

14 Sep 21, 2022
An unofficial PyTorch implementation of a federated learning algorithm, FedAvg.

Federated Averaging (FedAvg) in PyTorch An unofficial implementation of FederatedAveraging (or FedAvg) algorithm proposed in the paper Communication-E

Seok-Ju Hahn 123 Jan 06, 2023
Simple Linear 2nd ODE Solver GUI - A 2nd constant coefficient linear ODE solver with simple GUI using euler's method

Simple_Linear_2nd_ODE_Solver_GUI Description It is a 2nd constant coefficient li

:) 4 Feb 05, 2022
Differentiable Abundance Matching With Python

shamnet Differentiable Stellar Population Synthesis Installation You can install shamnet with pip. Installation dependencies are numpy, jax, corrfunc,

5 Dec 17, 2021
SCALoss: Side and Corner Aligned Loss for Bounding Box Regression (AAAI2022).

SCALoss PyTorch implementation of the paper "SCALoss: Side and Corner Aligned Loss for Bounding Box Regression" (AAAI 2022). Introduction IoU-based lo

TuZheng 20 Sep 07, 2022
A framework that constructs deep neural networks, autoencoders, logistic regressors, and linear networks

A framework that constructs deep neural networks, autoencoders, logistic regressors, and linear networks without the use of any outside machine learning libraries - all from scratch.

Kordel K. France 2 Nov 14, 2022
Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data based on Pytorch Framework

VFedPCA+VFedAKPCA This is the official source code for the Paper: Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-

John 9 Sep 18, 2022
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch

This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in Pytorch. Some of the code here will be included in upstream Pytorch eventually. The intenti

NVIDIA Corporation 6.9k Jan 03, 2023
Rethinking Nearest Neighbors for Visual Classification

Rethinking Nearest Neighbors for Visual Classification arXiv Environment settings Check out scripts/env_setup.sh Setup data Download the following fin

Menglin Jia 29 Oct 11, 2022
A programming language written with python

Kaoft A programming language written with python How to use A simple Hello World: c="Hello World" c Output: "Hello World" Operators: a=12

1 Jan 24, 2022
Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets

Raster Vision is an open source Python framework for building computer vision models on satellite, aerial, and other large imagery sets (including obl

Azavea 1.7k Dec 22, 2022
OpenMMLab Model Deployment Toolset

Introduction English | 简体中文 MMDeploy is an open-source deep learning model deployment toolset. It is a part of the OpenMMLab project. Major features F

OpenMMLab 1.5k Dec 30, 2022
Thermal Control of Laser Powder Bed Fusion using Deep Reinforcement Learning

This repository is the implementation of the paper "Thermal Control of Laser Powder Bed Fusion Using Deep Reinforcement Learning", linked here. The project makes use of the Deep Reinforcement Library

BaratiLab 11 Dec 27, 2022
Keep CALM and Improve Visual Feature Attribution

Keep CALM and Improve Visual Feature Attribution Jae Myung Kim1*, Junsuk Choe1*, Zeynep Akata2, Seong Joon Oh1† * Equal contribution † Corresponding a

NAVER AI 90 Dec 07, 2022
Implementation of the federated dual coordinate descent (FedDCD) method.

FedDCD.jl Implementation of the federated dual coordinate descent (FedDCD) method. Installation To install, just call Pkg.add("https://github.com/Zhen

Zhenan Fan 6 Sep 21, 2022
This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization

Spherical Gaussian Optimization This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization. This code has b

41 Dec 14, 2022