Referring Video Object Segmentation

Overview

Awesome-Referring-Video-Object-Segmentation

Welcome to star ⭐, comment 💹, and share 😀 !!

- 2021.12.12: Added recent papers (from 2021)
- Contributions are welcome if any information is missing. 😎

Introduction


Referring video object segmentation aims at segmenting an object in a video given a natural-language expression.

Unlike conventional video object segmentation, this task exploits a different type of supervision, language expressions, to identify and segment the object referred to by the given expression in a video. A detailed explanation of the task can be found in the following paper.

Seonguk Seo, Joon-Young Lee, Bohyung Han, "URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark", European Conference on Computer Vision (ECCV), 2020: https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600205.pdf
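For intuition, an RVOS model maps a clip of frames plus a free-form expression to one binary mask per frame. The sketch below only illustrates this interface; the function and model names are hypothetical and not taken from any specific codebase.

def segment_referred_object(frames, expression, model):
    """Hypothetical RVOS interface: one binary mask per frame.

    frames:     list of H x W x 3 uint8 arrays (one video clip)
    expression: natural-language description of the target object,
                e.g. "a person riding a bike on the left"
    model:      any referring segmentation model exposing .predict()
    """
    masks = []
    for frame in frames:
        # the model grounds the expression in the frame and returns a
        # per-pixel foreground probability for the referred object
        prob = model.predict(frame, expression)  # H x W floats in [0, 1]
        masks.append(prob > 0.5)                 # binarize
    return masks                                 # list of H x W boolean masks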

Impressive Works Related to Referring Video Object Segmentation (RVOS)

Cross-modal progressive comprehension for referring segmentation: https://arxiv.org/abs/2105.07175

Benchmark

The 3rd Large-scale Video Object Segmentation Challenge - Track 3: Referring Video Object Segmentation

Datasets


Refer-YouTube-VOS dataset

  • YouTube-VOS:
# fetch the helper script (the raw file, not the GitHub HTML page) and download the dataset
wget https://raw.githubusercontent.com/JerryX1110/awesome-rvos/main/down_YTVOS_w_refer.py
python down_YTVOS_w_refer.py

Folder structure:

${current_path}/
└── refer_youtube_vos/
    ├── train/
    │   ├── JPEGImages/
    │   │   └── */ (video folders)
    │   │       └── *.jpg (frame image files)
    │   └── Annotations/
    │       └── */ (video folders)
    │           └── *.png (mask annotation files)
    ├── valid/
    │   └── JPEGImages/
    │       └── */ (video folders)
    │           └── *.jpg (frame image files)
    └── meta_expressions/
        ├── train/
        │   └── meta_expressions.json  (text annotations)
        └── valid/
            └── meta_expressions.json  (text annotations)
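With the folder structure above in place, expressions can be paired with frames and masks roughly as follows. This is a minimal sketch: the JSON field names ("videos", "expressions", "exp") follow the layout commonly used by Refer-YouTube-VOS loaders, so verify them against your copy of meta_expressions.json.

import json
import os

root = "refer_youtube_vos"  # dataset root shown above

# meta_expressions.json maps each video id to its referring expressions
with open(os.path.join(root, "meta_expressions", "train", "meta_expressions.json")) as f:
    meta = json.load(f)["videos"]

video_id = next(iter(meta))                      # pick an arbitrary video
for exp_id, exp in meta[video_id]["expressions"].items():
    print(video_id, exp_id, exp["exp"])          # the language expression

# frame images and mask annotations share the same video folder name
frame_dir = os.path.join(root, "train", "JPEGImages", video_id)
mask_dir = os.path.join(root, "train", "Annotations", video_id)
print(sorted(os.listdir(frame_dir))[:3], sorted(os.listdir(mask_dir))[:3])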
  • A2D-Sentences:

Repo: https://web.eecs.umich.edu/~jjcorso/r/a2d/

Paper: https://arxiv.org/abs/1803.07485


Citation:

@misc{gavrilyuk2018actor,
      title={Actor and Action Video Segmentation from a Sentence}, 
      author={Kirill Gavrilyuk and Amir Ghodrati and Zhenyang Li and Cees G. M. Snoek},
      year={2018},
      eprint={1803.07485},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License: The dataset may not be republished in any form without the written consent of the authors.

Downloads on the A2D page: README, Dataset and Annotation (version 1.0, 1.9GB, tar.bz), and Evaluation Toolkit (version 1.0, tar.bz).

mkdir a2d_sentences
cd a2d_sentences
# download and unpack the A2D videos and core annotations (~1.9 GB)
wget https://web.eecs.umich.edu/~jjcorso/bigshare/A2D_main_1_0.tar.bz
tar jxvf A2D_main_1_0.tar.bz
mkdir text_annotations

cd text_annotations
# sentence annotations released with A2D-Sentences
wget https://kgavrilyuk.github.io/actor_action/a2d_annotation.txt
wget https://kgavrilyuk.github.io/actor_action/a2d_missed_videos.txt
# helper script that fetches the instance-level annotations (use the raw file, not the GitHub HTML page)
wget https://raw.githubusercontent.com/JerryX1110/awesome-rvos/main/down_a2d_annotation_with_instances.py
python down_a2d_annotation_with_instances.py
unzip a2d_annotation_with_instances.zip
#rm a2d_annotation_with_instances.zip
cd ..

cd ..

Folder structure:

${current_path}/
└── a2d_sentences/
    ├── Release/
    │   ├── videoset.csv  (videos metadata file)
    │   └── CLIPS320/
    │       └── *.mp4     (video files)
    └── text_annotations/
        ├── a2d_annotation.txt  (actual text annotations)
        ├── a2d_missed_videos.txt
        └── a2d_annotation_with_instances/
            └── */ (video folders)
                └── *.h5 (annotation files)
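A small sketch of how these files can be inspected. The column names assumed for a2d_annotation.txt (video_id, instance_id, query) and the exact contents of the .h5 files are assumptions, so check them against your download.

import csv
import glob

import h5py  # pip install h5py

root = "a2d_sentences"

# a2d_annotation.txt links a video id and an instance id to a text query
with open(f"{root}/text_annotations/a2d_annotation.txt") as f:
    rows = list(csv.DictReader(f))
print(rows[0])

# each *.h5 file stores instance masks for one annotated frame;
# listing its keys shows what the file actually contains
h5_files = glob.glob(f"{root}/text_annotations/a2d_annotation_with_instances/*/*.h5")
with h5py.File(h5_files[0], "r") as h5:
    print(list(h5.keys()))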

Citation:

@inproceedings{YaXuCaCVPR2017,
  author = {Yan, Y. and Xu, C. and Cai, D. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking},
  year = {2017}
}
@inproceedings{XuCoCVPR2016,
  author = {Xu, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Actor-Action Semantic Segmentation with Grouping-Process Models},
  year = {2016}
}
@inproceedings{XuHsXiCVPR2015,
  author = {Xu, C. and Hsieh, S.-H. and Xiong, C. and {\bf Corso}, {\bf J. J.}},
  booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition}},
  datadownload = {http://web.eecs.umich.edu/~jjcorso/r/a2d},
  poster = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D_poster.pdf},
  tags = {computer vision, activity recognition, video understanding, semantic segmentation},
  title = {Can Humans Fly? {Action} Understanding with Multiple Classes of Actors},
  url = {http://web.eecs.umich.edu/~jjcorso/pubs/xu_corso_CVPR2015_A2D.pdf},
  year = {2015}
}


  • JHMDB-Sentences:

Downloading script:

mkdir jhmdb_sentences
cd jhmdb_sentences
# frame images from the original J-HMDB release
wget http://files.is.tue.mpg.de/jhmdb/Rename_Images.tar.gz
# sentence annotations released with the A2D-Sentences / JHMDB-Sentences paper
wget https://kgavrilyuk.github.io/actor_action/jhmdb_annotation.txt
# puppet (segmentation) masks from the original J-HMDB release
wget http://files.is.tue.mpg.de/jhmdb/puppet_mask.zip
tar -xzvf Rename_Images.tar.gz
unzip puppet_mask.zip
cd ..

Folder structure:

${current_path}/
└── jhmdb_sentences/
    ├── Rename_Images/  (frame images)
    │   └── */ (action dirs)
    ├── puppet_mask/  (mask annotations)
    │   └── */ (action dirs)
    └── jhmdb_annotation.txt  (text annotations)
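A minimal sketch for loading these files, assuming the per-video layout puppet_mask/<action>/<video>/puppet_mask.mat and that the masks are stored under the variable name 'part_mask' as in the original J-HMDB release; the layout of jhmdb_annotation.txt is also an assumption, so inspect a few lines first.

import glob

import scipy.io  # pip install scipy

root = "jhmdb_sentences"

# jhmdb_annotation.txt pairs each video with a sentence describing the actor
with open(f"{root}/jhmdb_annotation.txt") as f:
    print(f.readline().strip())

# puppet masks are per-video .mat files; 'part_mask' is assumed to hold an
# H x W x num_frames array of segmentation masks
mat_path = glob.glob(f"{root}/puppet_mask/*/*/puppet_mask.mat")[0]
mask = scipy.io.loadmat(mat_path)["part_mask"]
print(mask.shape)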

Citation:

@inproceedings{Jhuang:ICCV:2013,
title = {Towards understanding action recognition},
author = {H. Jhuang and J. Gall and S. Zuffi and C. Schmid and M. J. Black},
booktitle = {International Conf. on Computer Vision (ICCV)},
month = Dec,
pages = {3192-3199},
year = {2013}
}

