Official Implementation of Few-shot Visual Relationship Co-localization

Last update: Oct 13, 2022

Overview

VRC

Official implementation of the Few-shot Visual Relationship Co-localization (ICCV 2021) paper

project page | paper

Requirements

Use Python >= 3.8.5. Conda is recommended: https://docs.anaconda.com/anaconda/install/linux/
Use PyTorch 1.7.0 with CUDA 10.2.
Install the other requirements from requirements.txt.

To set up the environment:

# create new env vrc
$ conda create -n vrc python=3.8.5

# activate vrc
$ conda activate vrc

# install pytorch, torchvision
$ conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.2 -c pytorch

# install other dependencies
$ pip install -r requirements.txt
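
After installation, a quick sanity check can confirm that the interpreter and the PyTorch build match the versions listed under Requirements. The following is a minimal sketch and is not part of the repository; the helper names meets_minimum and report_environment are hypothetical, and the PyTorch import is guarded so the script still runs when PyTorch is not installed.

```python
import sys

MIN_PYTHON = (3, 8, 5)  # minimum Python version stated in the requirements


def meets_minimum(version, minimum=MIN_PYTHON):
    """Return True if a (major, minor, micro) version tuple is >= minimum."""
    return tuple(version[:3]) >= tuple(minimum)


def report_environment():
    """Collect interpreter and (if installed) PyTorch details as a dict."""
    info = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": meets_minimum(sys.version_info),
    }
    try:
        import torch  # optional: only reported when installed
    except ImportError:
        info["torch"] = None
    else:
        info["torch"] = torch.__version__
        info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated vrc environment. With the setup above, the torch entry would be expected to report 1.7.0 and cuda_build 10.2.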

Training

Preparing dataset

Download VG images from https://visualgenome.org/
Extract Faster R-CNN features of the VG images using data_preparation/vrc_extract_frcnn_feats.py. Please follow the instructions here.
Download the VrR-VG dataset from http://vrr-vg.com/ or Google Drive Link
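
Before running feature extraction, it can help to confirm that the VG images were downloaded and unpacked completely. The sketch below only counts image files under a directory. The function name count_images and the path data/visual_genome/images are hypothetical, since this README does not state where the images should be stored.

```python
from pathlib import Path


def count_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Count image files below root, searching subdirectories recursively."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Image directory not found: {root}")
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )


if __name__ == "__main__":
    # Hypothetical location; replace with wherever the images were extracted.
    image_root = Path("data/visual_genome/images")
    try:
        print(f"{count_images(image_root)} images under {image_root}")
    except FileNotFoundError as error:
        print(error)
```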

Training VR Encoder (VTransE)

Training parameters

To check or update the training, model, and dataset parameters, see VR_Encoder/configs.

To train VR Encoder:

$ python train_vr_encoder.py

Training VR Similarity Network (Relation Network)

Training parameters

To check or update the training, testing, model, and dataset parameters, see VR_SimilarityNetwork/configs.
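
To see which configuration files are available before editing them, the two config directories named in this README can be listed from the repository root. This is a minimal sketch; list_config_files is a hypothetical helper, and it makes no assumption about the file names or formats inside those directories.

```python
from pathlib import Path

# Config directories named in this README, relative to the repository root.
CONFIG_DIRS = ("VR_Encoder/configs", "VR_SimilarityNetwork/configs")


def list_config_files(repo_root, config_dirs=CONFIG_DIRS):
    """Map each config directory to the sorted file names it contains."""
    repo_root = Path(repo_root)
    listing = {}
    for relative in config_dirs:
        directory = repo_root / relative
        if directory.is_dir():
            listing[relative] = sorted(
                p.name for p in directory.iterdir() if p.is_file()
            )
        else:
            listing[relative] = []  # directory missing: report it as empty
    return listing


if __name__ == "__main__":
    for directory, names in list_config_files(".").items():
        print(directory)
        for name in names:
            print(f"  {name}")
```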

To train VR Similarity Network:

$ python SimilarityNetworkTrain.py

To train VR Similarity Network (w/ concat as VR Encoding):

$ python ConcatplusSimilarityNetworkTrain.py

To evaluate (set the evaluation settings in test_config.yaml):

$ python FullModelTest.py
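
The commands above can be chained into one run. The sketch below is not part of the repository: build_commands and run_pipeline are hypothetical helpers, the step order simply follows the order of the sections in this README, and it assumes each script is started from the directory that contains it, which this README does not specify. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Script names as given in this README, in the order the sections appear.
STEPS = (
    "train_vr_encoder.py",
    "SimilarityNetworkTrain.py",
    "FullModelTest.py",
)


def build_commands(steps=STEPS, python=sys.executable):
    """Return one argument list per step, using the given interpreter."""
    return [[python, script] for script in steps]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it unless dry_run is True."""
    for command in commands:
        print("$", " ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands(), dry_run=True)
```

To train the concat variant instead, pass a steps tuple that uses ConcatplusSimilarityNetworkTrain.py in place of SimilarityNetworkTrain.py.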

Cite

If you find this code/paper useful for your research, please consider citing.

@InProceedings{teotiaMMM2021,
  author    = "Teotia, Revant and Mishra, Vaibhav and Maheshwari, Mayank and Mishra, Anand",
  title     = "Few-shot Visual Relationship Co-Localization",
  booktitle = "ICCV",
  year      = "2021",
}

Acknowledgements

This repo uses https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark and scripts from https://github.com/facebookresearch/mmf for Faster R-CNN feature extraction.

Code provided by https://github.com/zawlin/cvpr17_vtranse and https://github.com/yangxuntu/vrd helped in implementing the VR Encoder.

Contact

For any clarification, comment, or suggestion, please create an issue or contact Revant, Vaibhav, or Mayank.
