Referring Video Object Segmentation

Overview

Awesome-Referring-Video-Object-Segmentation Awesome

Welcome to starts ⭐ & comments πŸ’Ή & sharing πŸ˜€ !!

- 2021.12.12: Recent papers (from 2021) 
- welcome to add if any information misses. 😎

Introduction

image

Referring video object segmentation aims at segmenting an object in video with language expressions.

Unlike the previous video object segmentation, the task exploits a different type of supervision, language expressions, to identify and segment an object referred by the given language expressions in a video. A detailed explanation of the new task can be found in the following paper.

Seonguk Seo, Joon-Young Lee, Bohyung Han, β€œURVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark”, European Conference on Computer Vision (ECCV), 2020:https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf

Impressive Works Related to Referring Video Object Segmentation (RVOS)

Cross-modal progressive comprehension for referring segmentation:https://arxiv.org/abs/2105.07175 image

Benchmark

The 3rd Large-scale Video Object Segmentation - Track 3: Referring Video Object Segmentation

Datasets

image

Refer-YouTube-VOS-datasets

  • YouTube-VOS:
wget https://github.com/JerryX1110/awesome-rvos/blob/main/down_YTVOS_w_refer.py
python down_YTVOS_w_refer.py

Folder structure:

${current_path}/
└── refer_youtube_vos/ 
    β”œβ”€β”€ train/
    β”‚   β”œβ”€β”€ JPEGImages/
    β”‚   β”‚   └── */ (video folders)
    β”‚   β”‚       └── *.jpg (frame image files) 
    β”‚   └── Annotations/
    β”‚       └── */ (video folders)
    β”‚           └── *.png (mask annotation files) 
    β”œβ”€β”€ valid/
    β”‚   └── JPEGImages/
    β”‚       └── */ (video folders)
    β”‚           └── *.jpg (frame image files) 
    └── meta_expressions/
        β”œβ”€β”€ train/
        β”‚   └── meta_expressions.json  (text annotations)
        └── valid/
            └── meta_expressions.json  (text annotations)
  • A2D-Sentences:

REPO:https://web.eecs.umich.edu/~jjcorso/r/a2d/

paper:https://arxiv.org/abs/1803.07485

image

Citation:

@misc{gavrilyuk2018actor,
      title={Actor and Action Video Segmentation from a Sentence}, 
      author={Kirill Gavrilyuk and Amir Ghodrati and Zhenyang Li and Cees G. M. Snoek},
      year={2018},
      eprint={1803.07485},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License: The dataset may not be republished in any form without the written consent of the authors.

README Dataset and Annotation (version 1.0, 1.9GB, tar.bz) Evaluation Toolkit (version 1.0, tar.bz)

mkdir a2d_sentences
cd a2d_sentences
wget https://web.eecs.umich.edu/~jjcorso/bigshare/A2D_main_1_0.tar.bz
tar jxvf A2D_main_1_0.tar.bz
mkdir text_annotations

cd text_annotations
wget https://kgavrilyuk.github.io/actor_action/a2d_annotation.txt
wget https://kgavrilyuk.github.io/actor_action/a2d_missed_videos.txt
wget https://github.com/JerryX1110/awesome-rvos/blob/main/down_a2d_annotation_with_instances.py
python down_a2d_annotation_with_instances.py
unzip a2d_annotation_with_instances.zip
#rm a2d_annotation_with_instances.zip
cd ..

cd ..

Folder structure:

${current_path}/
└── a2d_sentences/ 
    β”œβ”€β”€ Release/
    β”‚   β”œβ”€β”€ videoset.csv  (videos metadata file)
    β”‚   └── CLIPS320/
    β”‚       └── *.mp4     (video files)
    └── text_annotations/
        β”œβ”€β”€ a2d_annotation.txt  (actual text annotations)
        β”œβ”€β”€ a2d_missed_videos.txt
        └── a2d_annotation_with_instances/ 
            └── */ (video folders)
                └── *.h5 (annotations files) 

Citation:

@inproceedings{YaXuCaCVPR2017,
  author = {Yan, Y. and Xu, C. and Cai, D. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking},
  year = {2017}
}
@inproceedings{XuCoCVPR2016,
  author = {Xu, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Actor-Action Semantic Segmentation with Grouping-Process Models},
  year = {2016}
}
@inproceedings{XuHsXiCVPR2015,
  author = {Xu, C. and Hsieh, S.-H. and Xiong, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  poster = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D_poster.pdf},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Can Humans Fly? {Action} Understanding with Multiple Classes of Actors},
  url = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D.pdf},
  year = {2015}
}

image

downloading_script

mkdir jhmdb_sentences
cd jhmdb_sentences
wget http://files.is.tue.mpg.de/jhmdb/Rename_Images.tar.gz
wget https://kgavrilyuk.github.io/actor_action/jhmdb_annotation.txt
wget http://files.is.tue.mpg.de/jhmdb/puppet_mask.zip
tar -xzvf  Rename_Images.tar.gz
unzip puppet_mask.zip
cd ..

Folder structure:

${current_path}/
└── jhmdb_sentences/ 
    β”œβ”€β”€ Rename_Images/  (frame images)
    β”‚   └── */ (action dirs)
    β”œβ”€β”€ puppet_mask/  (mask annotations)
    β”‚   └── */ (action dirs)
    └── jhmdb_annotation.txt  (text annotations)

Citation:

@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}

image image image

Owner
Explorer
Explorer
Streaming over lightweight data transformations

Description Data augmentation libarary for Deep Learning, which supports images, segmentation masks, labels and keypoints. Furthermore, SOLT is fast a

Research Unit of Medical Imaging, Physics and Technology 256 Jan 08, 2023
The codes and models in 'Gaze Estimation using Transformer'.

GazeTR We provide the code of GazeTR-Hybrid in "Gaze Estimation using Transformer". We recommend you to use data processing codes provided in GazeHub.

65 Dec 27, 2022
SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging.

SweiNet SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging. SweiNet takes as in

Felix Jin 3 Mar 31, 2022
Pytorch Implementation of PointNet and PointNet++++

Pytorch Implementation of PointNet and PointNet++ This repo is implementation for PointNet and PointNet++ in pytorch. Update 2021/03/27: (1) Release p

Luigi Ariano 1 Nov 11, 2021
Tensorflow implementation of Semi-supervised Sequence Learning (https://arxiv.org/abs/1511.01432)

Transfer Learning for Text Classification with Tensorflow Tensorflow implementation of Semi-supervised Sequence Learning(https://arxiv.org/abs/1511.01

DONGJUN LEE 82 Oct 22, 2022
Educational API for 3D Vision using pose to control carton.

Educational API for 3D Vision using pose to control carton.

41 Jul 10, 2022
POT : Python Optimal Transport

POT: Python Optimal Transport This open source Python library provide several solvers for optimization problems related to Optimal Transport for signa

Python Optimal Transport 1.7k Dec 31, 2022
Model Quantization Benchmark

Introduction MQBench is an open-source model quantization toolkit based on PyTorch fx. The envision of MQBench is to provide: SOTA Algorithms. With MQ

500 Jan 06, 2023
NovelD: A Simple yet Effective Exploration Criterion

NovelD: A Simple yet Effective Exploration Criterion Intro This is an implementation of the method proposed in NovelD: A Simple yet Effective Explorat

29 Dec 05, 2022
TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplication)

Parameterization of Hypercomplex Multiplications (PHM) This repository contains the TensorFlow implementation of PHM (Parameterization of Hypercomplex

Aston Zhang 9 Oct 26, 2022
My solution for the 7th place / 245 in the Umoja Hack 2022 challenge

Umoja Hack 2022 : Insurance Claim Challenge My solution for the 7th place / 245 in the Umoja Hack 2022 challenge Umoja Hack Africa is a yearly hackath

Souames Annis 17 Jun 03, 2022
Based on the paper "Geometry-aware Instance-reweighted Adversarial Training" ICLR 2021 oral

Geometry-aware Instance-reweighted Adversarial Training This repository provides codes for Geometry-aware Instance-reweighted Adversarial Training (ht

Jingfeng 47 Dec 22, 2022
[ICCV 2021] A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

[ICCV 2021] A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

CodingMan 45 Dec 12, 2022
Neural Style and MSG-Net

PyTorch-Style-Transfer This repo provides PyTorch Implementation of MSG-Net (ours) and Neural Style (Gatys et al. CVPR 2016), which has been included

Hang Zhang 904 Dec 21, 2022
An official implementation of the paper Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers

Sequence Feature Alignment (SFA) By Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-jun Zha, Yonggang Wen, and Dacheng Tao This repository is an o

WangWen 79 Dec 24, 2022
CVPR '21: In the light of feature distributions: Moment matching for Neural Style Transfer

In the light of feature distributions: Moment matching for Neural Style Transfer (CVPR 2021) This repository provides code to recreate results present

Nikolai Kalischek 49 Oct 13, 2022
Explainable Zero-Shot Topic Extraction

Zero-Shot Topic Extraction with Common-Sense Knowledge Graph This repository contains the code for reproducing the results reported in the paper "Expl

D2K Lab 56 Dec 14, 2022
An open source implementation of CLIP.

OpenCLIP Welcome to an open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training). The goal of this repository is to enable

2.7k Dec 31, 2022
This is an implementation for the CVPR2020 paper "Learning Invariant Representation for Unsupervised Image Restoration"

Learning Invariant Representation for Unsupervised Image Restoration (CVPR 2020) Introduction This is an implementation for the paper "Learning Invari

GarField 88 Nov 07, 2022