Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)

Last update: Dec 23, 2022

Overview

Please consider citing our paper in your publications if the project helps your research.

@inproceedings{vision-language-transformer,
  title={Vision-Language Transformer and Query Generation for Referring Segmentation},
  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Installation

Environment:
- Python 3.6
- TensorFlow 1.15
- Other dependencies in requirements.txt
- spaCy model for embedding:

  python -m spacy download en_vectors_web_lg

Dataset preparation

- Put the folder of the COCO training set ("train2014") under data/images/.
- Download the RefCOCO dataset from here and extract it to data/. Then run the data preparation script under data/:
```
cd data
python data_process_v2.py --data_root . --output_dir data_v2 --dataset [refcoco/refcoco+/refcocog] --split [unc/umd/google] --generate_mask
```
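
The bracketed options in the command above are placeholders, so one concrete value must be chosen for each. The sketch below shows one way to check the expected inputs and assemble the command for each dataset/split pair. The pairing table is an assumption taken from the splits published with the RefCOCO API, not from this README, and the helper names `missing_inputs` and `build_command` are hypothetical. It is meant to be run from inside data/.

```python
import sys
from pathlib import Path

# Assumption: dataset/split pairings as published with the RefCOCO API.
# The README only lists the options, not which pairs are valid.
VALID_SPLITS = {
    "refcoco": ("unc", "google"),
    "refcoco+": ("unc",),
    "refcocog": ("umd", "google"),
}


def missing_inputs(data_root):
    """Return the expected input paths that are absent under data_root."""
    root = Path(data_root)
    expected = [root / "images" / "train2014", root / "data_process_v2.py"]
    return [p for p in expected if not p.exists()]


def build_command(dataset, split, data_root=".", output_dir="data_v2"):
    """Assemble the argument list for data_process_v2.py (run from data/)."""
    if split not in VALID_SPLITS.get(dataset, ()):
        raise ValueError(f"unsupported pair: {dataset}/{split}")
    return [
        sys.executable, "data_process_v2.py",
        "--data_root", data_root,
        "--output_dir", output_dir,
        "--dataset", dataset,
        "--split", split,
        "--generate_mask",
    ]


if __name__ == "__main__":
    # Report anything missing, then print one command per supported pair.
    for path in missing_inputs("."):
        print(f"missing: {path}")
    for dataset, splits in VALID_SPLITS.items():
        for split in splits:
            print(" ".join(build_command(dataset, split)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right. The script only prints by default so that nothing is generated before the paths have been checked.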

Evaluating

Download pretrained models & config files from here.
In the config file, set:
- evaluate_model: path to the pretrained weights
- evaluate_set: path to the dataset for evaluation

Run

python vlt.py test [PATH_TO_CONFIG_FILE]

Training

Pretrained Backbones: We use the backbone weights provided by MCN.

Note: we use the backbone that excludes all images that appear in the val/test splits of RefCOCO, RefCOCO+ and RefCOCOg.

Specify the hyperparameters, dataset path and pretrained weight path in the configuration file. Please refer to the examples under /config, or to the config files of our pretrained models.

Run

python vlt.py train [PATH_TO_CONFIG_FILE]

Acknowledgement

We borrowed a lot of code from MCN, keras-transformer, RefCOCO API and keras-yolo3. Thanks for their excellent work!
