Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"

Overview

ICCV'21 Context-aware Scene Graph Generation with Seq2Seq Transformers

Authors: Yichao Lu*, Himanshu Rai*, Cheng Chang*, Boris Knyazev†, Guangwei Yu, Shashank Shekhar†, Graham W. Taylor†, Maksims Volkovs

  • * Denotes equal contribution
  • † University of Guelph / Vector Institute

Prerequisites and Environment

  • pytorch-gpu 1.13.1
  • numpy 1.16.0
  • tqdm

All experiments were conducted on a 20-core Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz and 4 NVIDIA V100 GPUs, each with 32GB of GPU memory.

Dataset

Visual Genome

Download it here and unzip it under the data folder. You should then see a vg folder containing the .json annotations expected by the dataloader used in this repo.
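As a quick sanity check, a minimal sketch (assuming the layout above) lists the annotation files that landed in data/vg:

import os

# Quick check that the annotations landed in the expected place.
vg_dir = "data/vg"
assert os.path.isdir(vg_dir), "expected a data/vg folder after unzipping"
print(sorted(f for f in os.listdir(vg_dir) if f.endswith(".json")))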

Visual Relation Detection

See the Visual Relation Detection part of the Images section below.

Images

Visual Genome

Create a folder for all images:

# ROOT=path/to/cloned/repository
cd $ROOT/data/vg
mkdir VG_100K

Download Visual Genome images from the official page. Unzip all images (part 1 and part 2) into VG_100K/. There should be a total of 108249 files.
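To verify the extraction, a quick count along these lines (path assumes the layout above):

import os

# Count extracted Visual Genome images; both zip parts go into the same folder.
image_dir = "data/vg/VG_100K"
print(len(os.listdir(image_dir)), "files (expected 108249)")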

Visual Relation Detection

Create the vrd folder under data:

# ROOT=path/to/cloned/repository
mkdir $ROOT/data/vrd
cd $ROOT/data/vrd

Download the original annotation json files from here and unzip json_dataset.zip in this folder. The images can be downloaded from here; unzip sg_dataset.zip to create an sg_dataset folder in data/vrd.
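If you prefer scripting the extraction, a minimal sketch (assuming both archives were downloaded into data/vrd):

import zipfile

# Extract the annotation and image archives in place.
for archive in ["data/vrd/json_dataset.zip", "data/vrd/sg_dataset.zip"]:
    with zipfile.ZipFile(archive) as z:
        z.extractall("data/vrd")

Next, run the preprocessing scripts: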

cd $ROOT
python tools/rename_vrd_with_numbers.py
python tools/convert_vrd_anno_to_coco_format.py

rename_vrd_with_numbers.py converts all non-jpg images (some are png or gif) to jpg and renames them in the {:012d}.jpg format (e.g., "000000000001.jpg"). It also writes new relationship annotation files alongside the original ones, mostly to simplify the dataloader. The mapping from the original filenames is stored in data/vrd/*_fname_mapping.json, where "*" is either "train" or "val".
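For example, to peek at a few entries of the mapping (the key-to-value orientation is an assumption here; inspect your file to confirm):

import itertools
import json

# Print the first three filename mappings, assumed original -> renamed.
with open("data/vrd/train_fname_mapping.json") as f:
    mapping = json.load(f)
print(dict(itertools.islice(mapping.items(), 3)))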

convert_vrd_anno_to_coco_format.py creates object detection annotations in COCO format from the new annotations generated above; the dataloader requires them during training.
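For reference, COCO-style detection annotations follow this general shape (standard COCO field names with illustrative values; the generated files may differ in detail):

# Standard COCO detection layout (illustrative values only).
coco_style = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels
        {"id": 1, "image_id": 1, "category_id": 3, "bbox": [120, 80, 60, 140]},
    ],
    "categories": [
        {"id": 3, "name": "person"},
    ],
}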

Pre-trained Object Detection Models

Download pre-trained object detection models here and unzip the archive under the root directory. Note: We do not include code for training object detectors. Please refer to the "(Optional) Training Object Detection Models" section in Large-Scale-VRD.pytorch for this.

Directory Structure

The final directories should look like:

|-- data
|   |-- detections_train.json
|   |-- detections_val.json
|   |-- new_annotations_train.json
|   |-- new_annotations_val.json
|   |-- objects.json
|   |-- predicates.json
|-- evaluation
|-- output
|   |-- pair_predicate_dict.dat
|   |-- train_data.dat
|   |-- valid_data.dat
|-- config.py
|-- core.py
|-- data_utils.py
|-- evaluation_utils.py
|-- feature_utils.py
|-- file_utils.py
|-- preprocess.py
|-- trainer.py
|-- transformer.py

Evaluating Pre-trained Relationship Detection models

DO NOT CHANGE anything in the provided config files (configs/xx/xxxx.yaml), even if you want to test with fewer or more than 8 GPUs. Use the environment variable CUDA_VISIBLE_DEVICES to control how many and which GPUs to use, and remove the --multi-gpu-test flag for single-GPU inference.
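The same restriction can also be applied from inside a Python entry point; a rough sketch (the variable must be set before CUDA is initialized):

import os

# Expose only the first GPU; must happen before torch touches CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # 1 when a single GPU is exposed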

Visual Genome

NOTE: Evaluating on the Visual Genome test set may require at least 64GB of RAM.

We use three evaluation metrics for Visual Genome (a minimal recall@K sketch follows the list):

  1. SGDET: predict all three labels (subject, predicate, object) and both boxes
  2. SGCLS: predict subject, object, and predicate labels given ground-truth subject and object boxes
  3. PRDCLS: predict predicate labels given ground-truth subject and object boxes and labels
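All three metrics are typically reported as recall@K over predicted triplets; here is an illustrative sketch of recall@K (not the evaluator shipped in evaluation/):

def recall_at_k(gt_triplets, pred_triplets, k):
    # pred_triplets is assumed sorted by descending confidence.
    top_k = set(pred_triplets[:k])
    hits = sum(1 for triplet in gt_triplets if triplet in top_k)
    return hits / max(len(gt_triplets), 1)

# Two of three ground-truth triplets appear in the top-2 predictions.
gt = [("man", "riding", "horse"), ("man", "wearing", "hat"), ("horse", "on", "grass")]
pred = [("man", "riding", "horse"), ("man", "wearing", "hat"), ("dog", "under", "tree")]
print(recall_at_k(gt, pred, k=2))  # ~0.667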

Training Scene Graph Generation Models

With the following commands, the training results (models and logs) will appear in $ROOT/Outputs/xxx/, where xxx is the name of the .yaml file used in the command, minus the ".yaml" extension. To test your trained models, run the test commands described above with --load_ckpt set to the path of your trained model.

Visual Relation Detection

To train our scene graph generation model on the VRD dataset, run

python preprocess.py
python trainer.py --num-encoder-layers 4 --num-decoder-layers 2 --nhead 4 --num-epochs 500 --learning-rate 1e-3
python preprocess_evaluation.py
python write_prediction.py
mv prediction.txt evaluation/vrd/
cd evaluation/vrd
python run_all_for_vrd.py prediction.txt

Visual Genome

To train our scene graph generation model on the VG dataset, download the json files from https://visualgenome.org/api/v0/api_home.html, put the extracted files under data and then run

python preprocess.py
python trainer.py --num-encoder-layers 4 --num-decoder-layers 2 --nhead 4 --num-epochs 2000 --learning-rate 1e-3
python preprocess_evaluation.py
python write_prediction.py
mv prediction.txt evaluation/vg/
cd evaluation/vg
python run_all.py prediction.txt

Acknowledgements

This repository builds on code from ContrastiveLosses4VRD by Ji Zhang and on the Neural-Motifs source code by Rowan Zellers.

Citation

If you find this code useful in your research, please cite the following paper:

@inproceedings{lu2021seq2seq,
  title={Context-aware Scene Graph Generation with Seq2Seq Transformers},
  author={Lu, Yichao and Rai, Himanshu and Chang, Jason and Knyazev, Boris and Yu, Guangwei and Shekhar, Shashank and Taylor, Graham W. and Volkovs, Maksims},
  booktitle={ICCV},
  year={2021}
}