Code repository for our paper "Learning to Generate Scene Graph from Natural Language Supervision" in ICCV 2021

Overview

Scene Graph Generation from Natural Language Supervision

This repository includes the Pytorch code for our paper "Learning to Generate Scene Graph from Natural Language Supervision" accepted in ICCV 2021.

overview figure

Top (our setting): Our goal is learning to generate localized scene graphs from image-text pairs. Once trained, our model takes an image and its detected objects as inputs and outputs the image scene graph. Bottom (our results): A comparison of results from our method and state-of-the-art (SOTA) with varying levels of supervision.

Contents

  1. Overview
  2. Qualitative Results
  3. Installation
  4. Data
  5. Metrics
  6. Pretrained Object Detector
  7. Pretrained Scene Graph Generation Models
  8. Model Training
  9. Model Evaluation
  10. Acknowledgement
  11. Reference

Overview

Learning from image-text data has demonstrated recent success for many recognition tasks, yet is currently limited to visual features or individual visual concepts such as objects. In this paper, we propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as scene graph. To bridge the gap between images and texts, we leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graph. Further, we design a Transformer-based model to predict these "pseudo" labels via a masked token prediction task. Learning from only image-sentence pairs, our model achieves 30% relative gain over a latest method trained with human-annotated unlocalized scene graphs. Our model also shows strong results for weakly and fully supervised scene graph generation. In addition, we explore an open-vocabulary setting for detecting scene graphs, and present the first result for open-set scene graph generation.

Qualitative Results

Our generated scene graphs learned from image descriptions

overview figure

Partial visualization of Figure 3 in our paper: Our model trained by image-sentence pairs produces scene graphs with a high quality (e.g. "man-on-motorcycle" and "man-wearing-helmet" in first example). More comparison with other models trained by stronger supervision (e.g. unlocalized scene graph labels, localized scene graph labels) can be viewed in the Figure 3 of paper.

Our generated scene graphs in open-set and closed-set settings

overview figure

Figure 4 in our paper: We explored open-set setting where the categories of target concepts (objects and predicates) are unknown during training. Compared to our closed-set model, our open-set model detects more concepts outside the evaluation dataset, Visual Genome (e.g. "swinge", "mouse", "keyboard"). Our results suggest an exciting avenue of large-scale training of open-set scene graph generation using image captioning dataset such as Conceptual Caption.

Installation

Check INSTALL.md for installation instructions.

Data

Check DATASET.md for instructions of data downloading.

Metrics

Explanation of metrics in this toolkit are given in METRICS.md

Pretrained Object Detector

In this project, we primarily use the detector Faster RCNN pretrained on Open Images dataset. To use this repo, you don't need to run this detector. You can directly download the extracted detection features, as the instruction in DATASET.md. If you're interested in this detector, the pretrained model can be found in TensorFlow 1 Detection Model Zoo: faster_rcnn_inception_resnet_v2_atrous_oidv4.

Additionally, to compare with previous fully supervised models, we also use the detector pretrained by Scene-Graph-Benchmark. You can download this Faster R-CNN model and extract all the files to the directory checkpoints/pretrained_faster_rcnn.

Pretrained Scene Graph Generation Models

Our pretrained SGG models can be downloaded on Google Drive. The details of these models can be found in Model Training section below. After downloading, please put all the folders to the directory checkpoints/. More pretrained models will be released. Stay tuned!

Model Training

To train our scene graph generation models, run the script

bash train.sh MODEL_TYPE

where MODEL_TYPE specifies the training supervision, the training dataset and the scene graph generation model. See details below.

  1. Language supervised models: trained by image-text pairs

    • Language_CC-COCO_Uniter: train our Transformer-based model on Conceptual Caption (CC) and COCO Caption (COCO) datasets
    • Language_*_Uniter: train our Transformer-based model on single dataset. * represents the dataset name and can be CC, COCO, and VG
    • Language_OpensetCOCO_Uniter: train our Transformer-based model on COCO dataset in open-set setting
    • Language_CC-COCO_MotifNet: train Motif-Net model with language supervision from CC and COCO datasets
  2. Weakly supervised models: trained by unlocalized scene graph labels

    • Weakly_Uniter: train our Transformer-based model
  3. Fully supervised models: trained by localized scene graph labels

    • Sup_Uniter: train our Transformer-based model
    • Sup_OnlineDetector_Uniter: train our Transformer-based model by using the object detector from Scene-Graph-Benchmark.

You can set CUDA_VISIBLE_DEVICES in train.sh to specify which GPUs are used for model training (e.g., the default script uses 2 GPUs).

Model Evaluation

To evaluate the trained scene graph generation model, you can reuse the commands in train.sh by simply changing WSVL.SKIP_TRAIN to True and setting OUTPUT_DIR as the path to your trained model. One example can be found in test.sh and just run bash test.sh.

Acknowledgement

This repository was built based on Scene-Graph-Benchmark for scene graph generation and UNITER for image-text representation learning.

We specially would like to thank Pengchuan Zhang for providing the object detector pretrained on Objects365 dataset which was used in our ablation study.

Reference

If you are using our code, please consider citing our paper.

@inproceedings{zhong2021SGGfromNLS,
  title={Learning to Generate Scene Graph from Natural Language Supervision},
  author={Zhong, Yiwu and Shi, Jing and Yang, Jianwei and Xu, Chenliang and Li, Yin},
  booktitle={ICCV},
  year={2021}
}
Owner
Yiwu Zhong
Ph.D. Student of Computer Science in University of Wisconsin-Madison
Yiwu Zhong
Easy and comprehensive assessment of predictive power, with support for neuroimaging features

Documentation: https://raamana.github.io/neuropredict/ News As of v0.6, neuropredict now supports regression applications i.e. predicting continuous t

Pradeep Reddy Raamana 93 Nov 29, 2022
[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

TransFuser This repository contains the code for the CVPR 2021 paper Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. If you find our

695 Jan 05, 2023
OSLO: Open Source framework for Large-scale transformer Optimization

O S L O Open Source framework for Large-scale transformer Optimization What's New: December 21, 2021 Released OSLO 1.0. What is OSLO about? OSLO is a

TUNiB 280 Nov 24, 2022
Deep learning algorithms for muon momentum estimation in the CMS Trigger System

Deep learning algorithms for muon momentum estimation in the CMS Trigger System The Compact Muon Solenoid (CMS) is a general-purpose detector at the L

anuragB 2 Oct 06, 2021
Azua - build AI algorithms to aid efficient decision-making with minimum data requirements.

Project Azua 0. Overview Many modern AI algorithms are known to be data-hungry, whereas human decision-making is much more efficient. The human can re

Microsoft 197 Jan 06, 2023
Bi-level feature alignment for versatile image translation and manipulation (Under submission of TPAMI)

Bi-level feature alignment for versatile image translation and manipulation (Under submission of TPAMI) Preparation Clone the Synchronized-BatchNorm-P

Fangneng Zhan 12 Aug 10, 2022
PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

Unbiased Teacher for Semi-Supervised Object Detection This is the PyTorch implementation of our paper: Unbiased Teacher for Semi-Supervised Object Detection

Facebook Research 366 Dec 28, 2022
Pytorch port of Google Research's LEAF Audio paper

leaf-audio-pytorch Pytorch port of Google Research's LEAF Audio paper published at ICLR 2021. This port is not completely finished, but the Leaf() fro

Dennis Fedorishin 80 Oct 31, 2022
Simple torch.nn.module implementation of Alias-Free-GAN style filter and resample

Alias-Free-Torch Simple torch module implementation of Alias-Free GAN. This repository including Alias-Free GAN style lowpass sinc filter @filter.py A

이준혁(Junhyeok Lee) 64 Dec 22, 2022
VGG16 model-based classification project about brain tumor detection.

Brain-Tumor-Classification-with-MRI VGG16 model-based classification project about brain tumor detection. First, you can check what people are doing o

Atakan Erdoğan 2 Mar 21, 2022
TrackTech: Real-time tracking of subjects and objects on multiple cameras

TrackTech: Real-time tracking of subjects and objects on multiple cameras This project is part of the 2021 spring bachelor final project of the Bachel

5 Jun 17, 2022
License Plate Detection Application

LicensePlate_Project 🚗 🚙 [Project] 2021.02 ~ 2021.09 License Plate Detection Application Overview 1. 데이터 수집 및 라벨링 차량 번호판 이미지를 직접 수집하여 각 이미지에 대해 '번호판

4 Oct 10, 2022
Video Instance Segmentation using Inter-Frame Communication Transformers (NeurIPS 2021)

Video Instance Segmentation using Inter-Frame Communication Transformers (NeurIPS 2021) Paper Video Instance Segmentation using Inter-Frame Communicat

Sukjun Hwang 81 Dec 29, 2022
Group-Free 3D Object Detection via Transformers

Group-Free 3D Object Detection via Transformers By Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong. This repo is the official implementation of "Group-

Ze Liu 213 Dec 07, 2022
Neural network-based build time estimation for additive manufacturing

Neural network-based build time estimation for additive manufacturing Oh, Y., Sharp, M., Sprock, T., & Kwon, S. (2021). Neural network-based build tim

Yosep 1 Nov 15, 2021
Realistic lighting in ursina!

Ursina Lighting Realistic lighting in ursina! If you want to have realistic lighting in ursina, import the UrsinaLighting.py in your project and use t

17 Jul 07, 2022
Code for CPM-2 Pre-Train

CPM-2 Pre-Train Pre-train CPM-2 此分支为110亿非 MoE 模型的预训练代码,MoE 模型的预训练代码请切换到 moe 分支 CPM-2技术报告请参考link。 0 模型下载 请在智源资源下载页面进行申请,文件介绍如下: 文件名 描述 参数大小 100000.tar

Tsinghua AI 136 Dec 28, 2022
competitions-v2

Codabench (formerly Codalab Competitions v2) Installation $ cp .env_sample .env $ docker-compose up -d $ docker-compose exec django ./manage.py migrat

CodaLab 21 Dec 02, 2022
Hypersearch weight debugging and losses tutorial

tutorial Activate tensorboard option Running TensorBoard remotely When working on a remote server, you can use SSH tunneling to forward the port of th

1 Dec 11, 2021
This repository includes code of my study about Asynchronous in Frequency domain of GAN images.

Exploring the Asynchronous of the Frequency Spectra of GAN-generated Facial Images Binh M. Le & Simon S. Woo, "Exploring the Asynchronous of the Frequ

4 Aug 06, 2022