ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

Related tags

Deep Learningalfred
Overview

ALFRED

A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk,
Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox
CVPR 2020

ALFRED (Action Learning From Realistic Environments and Directives), is a new benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. Long composition rollouts with non-reversible state changes are among the phenomena we include to shrink the gap between research benchmarks and real-world applications.

For the latest updates, see: askforalfred.com

What more? Checkout ALFWorld – interactive TextWorld environments for ALFRED scenes!

Quickstart

Clone repo:

$ git clone https://github.com/askforalfred/alfred.git alfred
$ export ALFRED_ROOT=$(pwd)/alfred

Install requirements:

$ virtualenv -p $(which python3) --system-site-packages alfred_env # or whichever package manager you prefer
$ source alfred_env/bin/activate

$ cd $ALFRED_ROOT
$ pip install --upgrade pip
$ pip install -r requirements.txt

Download Trajectory JSONs and Resnet feats (~17GB):

$ cd $ALFRED_ROOT/data
$ sh download_data.sh json_feat

Train models:

$ cd $ALFRED_ROOT
$ python models/train/train_seq2seq.py --data data/json_feat_2.1.0 --model seq2seq_im_mask --dout exp/model:{model},name:pm_and_subgoals_01 --splits data/splits/oct21.json --gpu --batch 8 --pm_aux_loss_wt 0.1 --subgoal_aux_loss_wt 0.1

More Info

  • Dataset: Downloading full dataset, Folder structure, JSON structure.
  • Models: Training and Evaluation, File structure, Pre-trained models.
  • Data Generation: Generation, Replay Checks, Data Augmentation (high-res, depth, segementation masks etc).
  • Errata: Updated numbers for Goto subgoal evaluation.
  • THOR 2.1.0 Docs: Deprecated documentation from Ai2-THOR 2.1.0 release.
  • FAQ: Frequently Asked Questions.

SOTA Models

Open-source models that outperform the Seq2Seq baselines from ALFRED:

Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, Chen Sun
Paper, Code

MOCA: A Modular Object-Centric Approach for Interactive Instruction Following
Kunal Pratap Singh*, Suvaansh Bhambri*, Byeonghwi Kim*, Roozbeh Mottaghi, Jonghyun Choi
Paper, Code

Contact Mohit to add your model here.

Prerequisites

  • Python 3
  • PyTorch 1.1.0
  • Torchvision 0.3.0
  • AI2THOR 2.1.0

See requirements.txt for all prerequisites

Hardware

Tested on:

  • GPU - GTX 1080 Ti (12GB)
  • CPU - Intel Xeon (Quad Core)
  • RAM - 16GB
  • OS - Ubuntu 16.04

Leaderboard

Run your model on test seen and unseen sets, and create an action-sequence dump of your agent:

$ cd $ALFRED_ROOT
$ python models/eval/leaderboard.py --model_path <model_path>/model.pth --model models.model.seq2seq_im_mask --data data/json_feat_2.1.0 --gpu --num_threads 5

This will create a JSON file, e.g. task_results_20191218_081448_662435.json, inside the <model_path> folder. Submit this JSON here: AI2 ALFRED Leaderboard. For rules and restrictions, see the getting started page.

Rules:

  1. You are only allowed to use RGB and language instructions (goal & step-by-step) as input for your agents. You cannot use additional depth, mask, metadata info etc. from the simulator on Test Seen and Test Unseen scenes. However, during training you are allowed to use additional info for auxiliary losses etc.
  2. During evaluation, agents are restricted to max_steps=1000 and max_fails=10. Do not change these settings in the leaderboard script; these modifications will not be reflected in the evaluation server.
  3. Pick a legible model name for the submission. Just "baseline" is not very descriptive.
  4. All submissions must be attempts to solve the ALFRED dataset.
  5. Answer the following questions in the description: a. Did you use additional sensory information from THOR as input, eg: depth, segmentation masks, class masks, panoramic images etc. during test-time? If so, please report them. b. Did you use the alignments between step-by-step instructions and expert action-sequences for training or testing? (no by default; the instructions are serialized into a single sentence)
  6. Share who you are: provide a team name and affiliation.
  7. (Optional) Share how you solved it: if possible, share information about how the task was solved. Link an academic paper or code repository if public.
  8. Only submit your own work: you may evaluate any model on the validation set, but must only submit your own work for evaluation against the test set.

Docker Setup

Install Docker and NVIDIA Docker.

Modify docker_build.py and docker_run.py to your needs.

Build

Build the image:

$ python scripts/docker_build.py 

Run (Local)

For local machines:

$ python scripts/docker_run.py
 
  source ~/alfred_env/bin/activate
  cd $ALFRED_ROOT

Run (Headless)

For headless VMs and Cloud-Instances:

$ python scripts/docker_run.py --headless 

  # inside docker
  tmux new -s startx  # start a new tmux session

  # start nvidia-xconfig (might have to run this twice)
  sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024
  sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024

  # start X server on DISPLAY 0
  # single X server should be sufficient for multiple instances of THOR
  sudo python ~/alfred/scripts/startx.py 0  # if this throws errors e.g "(EE) Server terminated with error (1)" or "(EE) already running ..." try a display > 0

  # detach from tmux shell
  # Ctrl+b then d

  # source env
  source ~/alfred_env/bin/activate
  
  # set DISPLAY variable to match X server
  export DISPLAY=:0

  # check THOR
  cd $ALFRED_ROOT
  python scripts/check_thor.py

  ###############
  ## (300, 300, 3)
  ## Everything works!!!

You might have to modify X_DISPLAY in gen/constants.py depending on which display you use.

Cloud Instance

ALFRED can be setup on headless machines like AWS or GoogleCloud instances. The main requirement is that you have access to a GPU machine that supports OpenGL rendering. Run startx.py in a tmux shell:

# start tmux session
$ tmux new -s startx 

# start X server on DISPLAY 0
# single X server should be sufficient for multiple instances of THOR
$ sudo python $ALFRED_ROOT/scripts/startx.py 0  # if this throws errors e.g "(EE) Server terminated with error (1)" or "(EE) already running ..." try a display > 0

# detach from tmux shell
# Ctrl+b then d

# set DISPLAY variable to match X server
$ export DISPLAY=:0

# check THOR
$ cd $ALFRED_ROOT
$ python scripts/check_thor.py

###############
## (300, 300, 3)
## Everything works!!!

You might have to modify X_DISPLAY in gen/constants.py depending on which display you use.

Also, checkout this guide: Setting up THOR on Google Cloud

Citation

If you find the dataset or code useful, please cite:

@inproceedings{ALFRED20,
  title ={{ALFRED: A Benchmark for Interpreting Grounded
           Instructions for Everyday Tasks}},
  author={Mohit Shridhar and Jesse Thomason and Daniel Gordon and Yonatan Bisk and
          Winson Han and Roozbeh Mottaghi and Luke Zettlemoyer and Dieter Fox},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020},
  url  = {https://arxiv.org/abs/1912.01734}
}

License

MIT License

Change Log

14/10/2020:

  • Added errata for Goto subgoal evaluation.

28/10/2020:

  • Added --use_templated_goals option to train with templated goals instead of human-annotated goal descriptions.

26/10/2020:

  • Fixed missing stop-frame in Modeling Quickstart dataset (json_feat_2.1.0.zip).

07/04/2020:

  • Updated download links. Switched from Google Cloud to AWS. Old download links will be deactivated.

28/03/2020:

  • Updated the mask-interaction API to use IoU scores instead of max pixel count for selecting objects.
  • Results table in the paper will be updated with new numbers.

Contact

Questions or issues? Contact [email protected]

Owner
ALFRED
ALFRED
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

AugMix Introduction We propose AugMix, a data processing technique that mixes augmented images and enforces consistent embeddings of the augmented ima

Google Research 876 Dec 17, 2022
✔️ Visual, reactive testing library for Julia. Time machine included.

PlutoTest.jl (alpha release) Visual, reactive testing library for Julia A macro @test that you can use to verify your code's correctness. But instead

Pluto 68 Dec 20, 2022
Emotion classification of online comments based on RNN

emotion_classification Emotion classification of online comments based on RNN, the accuracy of the model in the test set reaches 99% data: Large Movie

1 Nov 23, 2021
A GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers.

torchsynth The fastest synth in the universe. Introduction torchsynth is based upon traditional modular synthesis written in pytorch. It is GPU-option

torchsynth 229 Jan 02, 2023
"MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction" (CVPRW 2022) & (Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB)

MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (CVPRW 2022) Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Z

Yuanhao Cai 274 Jan 05, 2023
Traductor de lengua de señas al español basado en Python con Opencv y MedaiPipe

Traductor de señas Traductor de lengua de señas al español basado en Python con Opencv y MedaiPipe Requerimientos 🔧 Python 3.8 o inferior para evitar

Jahaziel Hernandez Hoyos 3 Nov 12, 2022
Materials for my scikit-learn tutorial

Scikit-learn Tutorial Jake VanderPlas email: [email protected] twitter: @jakevdp gith

Jake Vanderplas 1.6k Dec 30, 2022
Code for the paper: Hierarchical Reinforcement Learning With Timed Subgoals, published at NeurIPS 2021

Hierarchical reinforcement learning with Timed Subgoals (HiTS) This repository contains code for reproducing experiments from our paper "Hierarchical

Autonomous Learning Group 21 Dec 03, 2022
An investigation project for SISR.

SISR-Survey An investigation project for SISR. This repository is an official project of the paper "From Beginner to Master: A Survey for Deep Learnin

Juncheng Li 79 Oct 20, 2022
A pytorch-version implementation codes of paper: "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation A pytorch-version implementation

11 Oct 08, 2022
E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

11 Nov 08, 2022
MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets)

MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets) Using mixup data augmentation as reguliraztion and tuning the hyper par

Bhanu 2 Jan 16, 2022
Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter Code and checkpoints for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling

274 Dec 06, 2022
TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning Authors: Yixuan Su, Fangyu Liu, Zaiqiao Meng, Lei Shu, Ehsan Shareghi, and Nig

Yixuan Su 79 Nov 04, 2022
SGoLAM - Simultaneous Goal Localization and Mapping

SGoLAM - Simultaneous Goal Localization and Mapping PyTorch implementation of the MultiON runner-up entry, SGoLAM: Simultaneous Goal Localization and

10 Jan 05, 2023
根据midi文件演奏“风物之诗琴”的脚本 "Windsong Lyre" auto play

Genshin-lyre-auto-play 简体中文 | English 简介 根据midi文件演奏“风物之诗琴”的脚本。由Python驱动,在此承诺, ⚠️ 项目内绝不含任何能够引起安全问题的代码。 前排提示:所有键盘在动但是原神没反应的都是因为没有管理员权限,双击run.bat或者以管理员模式

御坂17032号 386 Jan 01, 2023
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking (CVPR 2021) Pytorch implementation of the ArTIST motion model. In this repo

Fatemeh 38 Dec 12, 2022
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism This repository is the official PyTorch implementation of our AAAI-2022 paper, in

Jinglin Liu 803 Dec 28, 2022
Implementation of ML models like Decision tree, Naive Bayes, Logistic Regression and many other

ML_Model_implementaion Implementation of ML models like Decision tree, Naive Bayes, Logistic Regression and many other dectree_model: Implementation o

Anshuman Dalai 3 Jan 24, 2022
PyGCL: Graph Contrastive Learning Library for PyTorch

PyGCL: Graph Contrastive Learning for PyTorch PyGCL is an open-source library for graph contrastive learning (GCL), which features modularized GCL com

GCL: Graph Contrastive Learning Library for PyTorch 594 Jan 08, 2023