Code for the paper "Improving Vision-and-Language Navigation with Image-Text Pairs from the Web" (ECCV 2020)

Last update: Dec 14, 2022

Overview

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, and Dhruv Batra

Paper: https://arxiv.org/abs/2004.14973

Model Zoo

Pre-trained VLN-BERT weights from several pre-training configurations can be accessed through the following links:

	Pre-training Stages	Job ID	Val Unseen SR	URL
0	no pre-training	174631	30.52%	TBD
1	1	175134	45.17%	TBD
3	1 and 2	221943	49.64%	download
2	1 and 3	220929	50.02%	download
4	1, 2, and 3 (Full Model)	220825	59.26%	download
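
The "Val Unseen SR" column reports success rate on the val unseen split. As a rough illustration of what that number means, the sketch below computes a success rate from per-episode final distances to the goal. It assumes the 3 m success radius commonly used in R2R-style evaluation; the function name and the example distances are hypothetical and are not taken from this repository.

```python
from typing import Sequence

# Assumption: 3.0 m is the success radius commonly used in R2R-style evaluation.
SUCCESS_RADIUS_M = 3.0


def success_rate(final_distances_m: Sequence[float],
                 radius_m: float = SUCCESS_RADIUS_M) -> float:
    """Return the percentage of episodes that end strictly within radius_m of the goal."""
    if not final_distances_m:
        raise ValueError("need at least one episode")
    successes = sum(1 for d in final_distances_m if d < radius_m)
    return 100.0 * successes / len(final_distances_m)


# Hypothetical final distances (in metres) for four episodes; two end within 3 m.
example_distances = [0.5, 2.9, 3.1, 7.4]
print(f"SR = {success_rate(example_distances):.2f}%")  # prints "SR = 50.00%"
```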

Usage Instructions

Follow the instructions in INSTALL.md to set up this codebase. The instructions walk you through several steps, including preprocessing the Matterport3D panoramas by extracting image regions with a pretrained object detector.

Training

To perform stage 3 of pre-training, first download ViLBERT weights from here. Then, run:

python \
-m torch.distributed.launch \
--nproc_per_node=8 \
--nnodes=1 \
--node_rank=0 \
train.py \
--from_pretrained <path/to/vilbert_pytorch_model_9.bin> \
--save_name [pre_train_run_id] \
--num_epochs 50 \
--warmup_proportion 0.08 \
--cooldown_factor 8 \
--masked_language \
--masked_vision \
--no_ranking

To fine-tune VLN-BERT for the path selection task, run:

python \
-m torch.distributed.launch \
--nproc_per_node=8 \
--nnodes=1 \
--node_rank=0 \
train.py \
--from_pretrained <path/to/pytorch_model_50.bin> \
--save_name [fine_tune_run_id]
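
The two training commands above share the same launcher arguments and differ only in the checkpoint, the run name, and the stage 3 flags. A minimal sketch of a Python helper that assembles both command lines is shown below. The helper name `build_train_command` and the placeholder paths are hypothetical; only the flags already shown above are used.

```python
import shlex
import sys
from typing import List, Sequence


def build_train_command(from_pretrained: str,
                        save_name: str,
                        extra_args: Sequence[str] = (),
                        nproc_per_node: int = 8) -> List[str]:
    """Assemble the torch.distributed.launch command line for train.py."""
    return [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        "--nnodes=1",
        "--node_rank=0",
        "train.py",
        "--from_pretrained", from_pretrained,
        "--save_name", save_name,
        *extra_args,
    ]


# Flags that appear only in the stage 3 pre-training command.
STAGE3_ARGS = [
    "--num_epochs", "50",
    "--warmup_proportion", "0.08",
    "--cooldown_factor", "8",
    "--masked_language",
    "--masked_vision",
    "--no_ranking",
]

# Placeholder paths and run names; substitute your own.
pretrain_cmd = build_train_command(
    "path/to/vilbert_pytorch_model_9.bin", "pre_train_run_id", STAGE3_ARGS)
finetune_cmd = build_train_command(
    "path/to/pytorch_model_50.bin", "fine_tune_run_id")

print(shlex.join(pretrain_cmd))
print(shlex.join(finetune_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start the job.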

Evaluation

To evaluate a pre-trained model, run:

python test.py \
--split [val_seen|val_unseen] \
--from_pretrained <path/to/run_[run_id]_pytorch_model.bin> \
--save_name [run_id]

followed by:

python scripts/calculate-metrics.py <path/to/results_[val_seen|val_unseen].json>
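
To evaluate both splits in one go, the two steps above can be chained. The sketch below only builds the command lines. The helper name and the `results_dir` argument are hypothetical, since where `test.py` writes its results file is not stated here; adjust the path to match your setup.

```python
import shlex
from typing import List

SPLITS = ("val_seen", "val_unseen")


def build_eval_commands(model_path: str, run_id: str, results_dir: str) -> List[List[str]]:
    """For each split, return the test.py command followed by the metrics command."""
    commands: List[List[str]] = []
    for split in SPLITS:
        commands.append([
            "python", "test.py",
            "--split", split,
            "--from_pretrained", model_path,
            "--save_name", run_id,
        ])
        # Assumption: results_dir is the directory that holds results_<split>.json.
        commands.append([
            "python", "scripts/calculate-metrics.py",
            f"{results_dir}/results_{split}.json",
        ])
    return commands


# Placeholder arguments; substitute your own checkpoint, run ID, and results directory.
eval_commands = build_eval_commands("path/to/model.bin", "my_run", "path/to/results")
for cmd in eval_commands:
    print(shlex.join(cmd))
```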

Citation

If you find this code useful, please consider citing:

@inproceedings{majumdar2020improving,
  title={Improving Vision-and-Language Navigation with Image-Text Pairs from the Web},
  author={Arjun Majumdar and Ayush Shrivastava and Stefan Lee and Peter Anderson and Devi Parikh and Dhruv Batra},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}
