This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

Last update: May 03, 2022

Related tags

Deep Learning ObjProp

Overview

ObjProp

Introduction

This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

Installation

This repo is built using mmdetection. To install the dependencies, first clone the repository locally:

git clone https://github.com/anirudh-chakravarthy/objprop.git

Then, install PyTorch 1.1.0, torchvision 0.3.0, mmcv 0.2.12:

conda install pytorch==1.1.0 torchvision==0.3.0 -c pytorch
pip install mmcv==0.2.12

Then, install the CocoAPI for YouTube-VIS

conda install cython scipy
pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"

Training

First, download and prepare the YouTube-VIS dataset using the following instructions.

To train ObjProp, run the following command:

python3 tools/train.py configs/masktrack_rcnn_r50_fpn_1x_youtubevos_objprop.py

In order to change the arguments such as dataset directory, learning rate, number of GPUs, etc, refer to the following configuration file configs/masktrack_rcnn_r50_fpn_1x_youtubevos_objprop.py.

Inference

To perform inference using ObjProp, run the following command:

python3 tools/test_video.py configs/masktrack_rcnn_r50_fpn_1x_youtubevos_objprop.py [MODEL_PATH] --out [OUTPUT_PATH.json] --eval segm

A JSON file with the inference results will be saved at OUTPUT_PATH.json. To evaluate the performance, submit the result file to the evaluation server.

License

ObjProp is released under the Apache 2.0 license.

Citation

@article{Chakravarthy2021ObjProp,
  author = {Anirudh S Chakravarthy and Won-Dong Jang and Zudi Lin and Donglai Wei and Song Bai and Hanspeter Pfister},  
  title = {Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation},
  journal = {CoRR},
  volume = {abs/2111.07529},
  year = {2021},
  url = {https://arxiv.org/abs/2111.07529}
}

This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

Related tags

Overview

ObjProp

Introduction

Installation

Training

Inference

License

Citation

Owner

Anirudh S Chakravarthy

Replication Code for "Self-Supervised Bug Detection and Repair" NeurIPS 2021

Learning Lightweight Low-Light Enhancement Network using Pseudo Well-Exposed Images

An implementation for Neural Architecture Search with Random Labels (CVPR 2021 poster) on Pytorch.

[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable

Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian (CVPR 2022)

"3D Human Texture Estimation from a Single Image with Transformers", ICCV 2021

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

SmoothGrad implementation in PyTorch

This repository accompanies the ACM TOIS paper "What can I cook with these ingredients?" - Understanding cooking-related information needs in conversational search

A simple python library for fast image generation of people who do not exist.

Diagnostic tests for linguistic capacities in language models

🥇 LG-AI-Challenge 2022 1위 솔루션 입니다.

Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark

Creating predictive checklists from data using integer programming.

TextureGAN in Pytorch

Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition

SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements (CVPR 2021)

😮The official implementation of "CoNeRF: Controllable Neural Radiance Fields" 😮

GeDML is an easy-to-use generalized deep metric learning library

PyTorch code for the paper "FIERY: Future Instance Segmentation in Bird's-Eye view from Surround Monocular Cameras"