Data and code for ICCV 2021 paper Distant Supervision for Scene Graph Generation.

Related tags

Deep LearningVisualDS
Overview

Distant Supervision for Scene Graph Generation

Data and code for ICCV 2021 paper Distant Supervision for Scene Graph Generation.

Introduction

The paper applies distant supervision to visual relation detection. The intuition of distant supervision is that possible predicates between entity pairs are highly dependent on the entity types. For example, there might be ride on, feed between human and horse in images, but it is less likely to be covering. Thus, we apply this correlation to take advantage of unlabeled data. Given the knowledge base containing possible combinations between entity types and predicates, our framework enables distantly supervised training without using any human-annotated relation data, and semi-supervised training that incorporates both human-labeled data and distantly labeled data. To build the knowledge base, we parse all possible (subject, predicate, object) triplets from Conceptual Caption dataset, resulting in a knowledge base containing 1.9M distinct relational triples.

Code

Thanks to the elegant code from Scene-Graph-Benchmark.pytorch. This project is built on their framework. There are also some differences from their settings. We show the differences in a later section.

The Illustration of Distant Supervision

alt text

Installation

Check INSTALL.md for installation instructions.

Dataset

Check DATASET.md for instructions of dataset preprocessing.

Metrics

Our metrics are directly adapted from Scene-Graph-Benchmark.pytorch.

Object Detector

Download Pre-trained Detector

In generally SGG tasks, the detector is pre-trained on the object bounding box annotations on training set. We directly use the pre-trained Faster R-CNN provided by Scene-Graph-Benchmark.pytorch, because our 20 category setting and their 50 category setting have the same training set.

After you download the Faster R-CNN model, please extract all the files to the directory /home/username/checkpoints/pretrained_faster_rcnn. To train your own Faster R-CNN model, please follow the next section.

The above pre-trained Faster R-CNN model achives 38.52/26.35/28.14 mAp on VG train/val/test set respectively.

Pre-train Your Own Detector

In this work, we do not modify the Faster R-CNN part. The training process can be referred to the origin code.

EM Algorithm based Training

All commands of training are saved in the directory cmds/. The directory of cmds looks like:

cmds/  
├── 20 
│   └── motif
│       ├── predcls
│       │   ├── ds \\ distant supervision which is weakly supervised training
│       │   │   ├── em_M_step1.sh
│       │   │   ├── em_E_step2.sh
│       │   │   ├── em_M_step2.sh
│       │   │   ├── em_M_step1_wclip.sh
│       │   │   ├── em_E_step2_wclip.sh
│       │   │   └── em_M_step2_wclip.sh
│       │   ├── semi \\ semi-supervised training 
│       │   │   ├── em_E_step1.sh
│       │   │   ├── em_M_step1.sh
│       │   │   ├── em_E_step2.sh
│       │   │   └── em_M_step2.sh
│       │   └── sup
│       │       ├── train.sh
│       │       └── val.sh
│       │
│       ├── sgcls
│       │   ...
│       │
│       ├── sgdet
│       │   ...

Generally, we use an EM algorithm based training, which means the model is trained iteratively. In E-step, we estimate the predicate label distribution between entity pairs. In M-step, we optimize the model with estimated predicate label distribution. For example, the em_E_step1 means the initialization of predicate label distribution, and in em_M_step1 the model will be optimized on the label estimation.

All checkpoints can be downloaded from MODEL_ZOO.md.

Preparation

Before running the code, you need to specify the current path as environment variable SG and the experiments' root directory as EXP.

# specify current directory as SG, e.g.:
export SG=~/VisualDS
# specify experiment directory, e.g.:
export EXP=~/exps

Weakly Supervised Training

Weakly supervised training can be done with only knowledge base or can also use external semantic signals to train a better model. As for the external semantic signals, we use currently popular CLIP to initialize the probability of possible predicates between entity pairs.

  1. w/o CLIP training for Predcls:
# no need for em_E_step1
sh cmds/20/motif/predcls/ds/em_M_step1.sh
sh cmds/20/motif/predcls/ds/em_E_step2.sh
sh cmds/20/motif/predcls/ds/em_M_step2.sh
  1. with CLIP training for Predcls:

Before training, please ensure datasets/vg/20/cc_clip_logits.pk is downloaded.

# the em_E_step1 is conducted by CLIP
sh cmds/20/motif/predcls/ds/em_M_step1_wclip.sh
sh cmds/20/motif/predcls/ds/em_E_step2_wclip.sh
sh cmds/20/motif/predcls/ds/em_M_step2_wclip.sh
  1. training for Sgcls and Sgdet:

E_step results of Predcls are directly used for Sgcls and Sgdet. Thus, there is no em_E_step.sh for Sgcls and Sgdet.

Semi-Supervised Training

In semi-supervised training, we use supervised model trained with labeled data to estimate predicate labels for entity pairs. So before conduct semi-supervised training, we should conduct a normal supervised training on Predcls task first:

sh cmds/20/motif/predcls/sup/train.sh

Or just download the trained model here, and put it into $EXP/20/predcls/sup/sup.

Noted that, for three tasks Predcls, Sgcls, Sgdet, we all use supervised model of Predcls task to initialize predicate label distributions. After the preparation, we can run:

sh cmds/20/motif/predcls/semi/em_E_step1.sh
sh cmds/20/motif/predcls/semi/em_M_step1.sh
sh cmds/20/motif/predcls/semi/em_E_step2.sh
sh cmds/20/motif/predcls/semi/em_M_step2.sh

Difference from Scene-Graph-Benchmark.pytorch

  1. Fix a bug in evaluation.

    We found that in previous evaluation, there are sometimes duplicated triplets in images, e.g. (1-man, ride, 2-horse)*3. We fix this small bug and use only unique triplets. By fixing the bug, the performance of the model will decrease somewhat. For example, the [email protected] of predcls task will decrease about 1~3 points.

  2. We conduct experiments on 20 categories predicate setting rather than 50 categories.

  3. In evaluation, weakly supervised trained model uses logits rather than softmax normalized scores for relation triplets ranking.

Owner
THUNLP
Natural Language Processing Lab at Tsinghua University
THUNLP
Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Regularized Greedy Forest Regularized Greedy Forest (RGF) is a tree ensemble machine learning method described in this paper. RGF can deliver better r

RGF-team 364 Dec 28, 2022
:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

R²SQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement

huybery 60 Dec 31, 2022
Tensorflow implementation of "Learning Deep Features for Discriminative Localization"

Weakly_detector Tensorflow implementation of "Learning Deep Features for Discriminative Localization" B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and

Taeksoo Kim 363 Jun 29, 2022
Pytorch implementation for the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering

Path-Generator-QA This is a Pytorch implementation for the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Common

Peifeng Wang 33 Dec 05, 2022
BERT model training impelmentation using 1024 A100 GPUs for MLPerf Training v1.1

Pre-trained checkpoint and bert config json file Location of checkpoint and bert config json file This MLCommons members Google Drive location contain

SAIT (Samsung Advanced Institute of Technology) 12 Apr 27, 2022
A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing.

AnimeGAN A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing. Randomly Generated Images The images are

Jie Lei 雷杰 1.2k Jan 03, 2023
Python Fanduel API (2021) - Lineup Automation

Southpaw is a python package that provides access to the Fanduel API. Optimize your DFS experience by programmatically updating your lineups, analyzin

Brandin Canfield 13 Jan 04, 2023
Memory-Augmented Model Predictive Control

Memory-Augmented Model Predictive Control This repository hosts the source code for the journal article "Composing MPC with LQR and Neural Networks fo

Fangyu Wu 1 Jun 19, 2022
The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021] Release Notes The offical PyTorch implementation of NeMo, p

Angtian Wang 76 Nov 23, 2022
Human Dynamics from Monocular Video with Dynamic Camera Movements

Human Dynamics from Monocular Video with Dynamic Camera Movements Ri Yu, Hwangpil Park and Jehee Lee Seoul National University ACM Transactions on Gra

215 Jan 01, 2023
A Comparative Framework for Multimodal Recommender Systems

Cornac Cornac is a comparative framework for multimodal recommender systems. It focuses on making it convenient to work with models leveraging auxilia

Preferred.AI 671 Jan 03, 2023
Use tensorflow to implement a Deep Neural Network for real time lane detection

LaneNet-Lane-Detection Use tensorflow to implement a Deep Neural Network for real time lane detection mainly based on the IEEE IV conference paper "To

MaybeShewill-CV 1.9k Jan 08, 2023
Reliable probability face embeddings

ProbFace, arxiv This is a demo code of training and testing [ProbFace] using Tensorflow. ProbFace is a reliable Probabilistic Face Embeddging (PFE) me

Kaen Chan 34 Dec 31, 2022
Face Depixelizer based on "PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models" repository.

NOTE We have noticed a lot of concern that PULSE will be used to identify individuals whose faces have been blurred out. We want to emphasize that thi

Denis Malimonov 2k Dec 29, 2022
Official PyTorch implementation for "Low Precision Decentralized Distributed Training with Heterogenous Data"

Low Precision Decentralized Training with Heterogenous Data Official PyTorch implementation for "Low Precision Decentralized Distributed Training with

Aparna Aketi 0 Nov 23, 2021
PyTorch implementation of Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction (ICCV 2021).

Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction Introduction This is official PyTorch implementation of Towards Accurate Alignment

TANG Xiao 96 Dec 27, 2022
ICSS - Interactive Continual Semantic Segmentation

Presentation This repository contains the code of our paper: Weakly-supervised c

Alteia 9 Jul 23, 2022
Convert ONNX model graph to Keras model format.

Convert ONNX model graph to Keras model format.

Grigory Malivenko 175 Dec 28, 2022
Apply Graph Self-Supervised Learning methods to graph-level task(TUDataset, MolculeNet Datset)

Graphlevel-SSL Overview Apply Graph Self-Supervised Learning methods to graph-level task(TUDataset, MolculeNet Dataset). It is unified framework to co

JunSeok 8 Oct 15, 2021
LAnguage Model Analysis

LAMA: LAnguage Model Analysis LAMA is a probe for analyzing the factual and commonsense knowledge contained in pretrained language models. The dataset

Meta Research 960 Jan 08, 2023