Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation.

Overview

Training Script for Reuse-VOS

This code implementation of CVPR 2021 paper : Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation.

Hard case (Ours, FRTM)

sample ours hard (Ours)

sample FRTM hard (FRTM)

Easy case (Ours, FRTM)

sample ours easy(Ours)

sample FRTM easy(FRTM)

Requirement

python package

  • torch
  • python-opencv
  • skimage
  • easydict

GPU support

  • GPU Memory >= 11GB (RN18)
  • CUDA >= 10.0
  • pytorch >= 1.4.0

Datasets

DAVIS

To test the DAVIS validation split, download and unzip the 2017 480p trainval images and annotations here.

/path/DAVIS
|-- Annotations/
|-- ImageSets/
|-- JPEGImages/

YouTubeVOS

To test our validation split and the YouTubeVOS challenge 'valid' split, download YouTubeVOS 2018 and place it in this directory structure:

/path/ytvos2018
|-- train/
|-- train_all_frames/
|-- valid/
`-- valid_all_frames/

Release

DAVIS

model Backbone Training set J & F 17 J & F 16 link
G-FRTM (t=1) Resnet18 Youtube-VOS + DAVIS 71.7 80.9 Google Drive
G-FRTM (t=0.7) Resnet18 Youtube-VOS + DAVIS 69.9 80.5 same pth
G-FRTM (t=1) Resnet101 Youtube-VOS + DAVIS 76.4 84.3 Google Drive
G-FRTM (t=0.7) Resnet101 Youtube-VOS + DAVIS 74.3 82.3 same pth

Youtube-VOS

model Backbone Training set G J-S J-Us F-S F-Us link
G-FRTM (t=1) Resnet18 Youtube-VOS 63.8 68.3 55.2 70.6 61.0 Google Drive
G-FRTM (t=0.8) Resnet18 Youtube-VOS 63.4 67.6 55.8 69.3 60.9 same pth
G-FRTM (t=0.7) Resnet18 Youtube-VOS 62.7 67.1 55.2 68.2 60.1 same pth

We initialize orignal-FRTM layers from official FRTM repository weight for Youtube-VOS benchmark. S = Seen, Us = Unseen

Target model cache

Here is the cache file we used for ResNet18 file

Run

Train

Open train.py and adjust the paths dict to your dataset locations, checkpoint and tensorboard output directories and the place to cache target model weights.

To train a network, run following command.

python train.py --name <session-name> --ftext resnet18 --dset all --dev cuda:0

--name is the name of save_dir name of current train --ftext is the name of the feature extractor, either resnet18 or resnet101. --dset is one of dv2017, ytvos2018 or all ("all" really means "both"). --dev is the name of the device to train on. --m1 is the margin1 for training reuse gate, and we use 1.0 for DAVIS benchmark and 0.5 for Youtube-VOS benchmark. --m2 is the margin2 for training reuse gate, and we use 0.

Replace "session-name" with whatever you like. Subdirectories with this name will be created under your checkpoint and tensorboard paths.

Eval

Open eval.py and adjust the paths dict to your dataset locations, checkpoint and tensorboard output directories and the place to cache target model weights.

To train a network, run following command.

python evaluate.py --ftext resnet18 --dset dv2017val --dev cuda:0

--ftext is the name of the feature extractor, either resnet18 or resnet101. --dset is one of dv2016val, dv2017val, yt2018jjval, yt2018val or yt2018valAll --dev is the name of the device to eval on. --TH Threshold for tau default= 0.7

The inference results will be saved at ${ROOT}/${result} . It is better to check multiple pth file for good accuracy.

Acknowledgement

This codebase borrows the code and structure from official FRTM repository. We are grateful to Facebook Inc. with valuable discussions.

Reference

The codebase is built based on following works

@misc{park2020learning,
      title={Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation}, 
      author={Hyojin Park and Jayeon Yoo and Seohyeong Jeong and Ganesh Venkatesh and Nojun Kwak},
      year={2020},
      eprint={2012.11655},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
HYOJINPARK
HYOJINPARK
Trax — Deep Learning with Clear Code and Speed

Trax — Deep Learning with Clear Code and Speed Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively us

Google 7.3k Dec 26, 2022
[CVPR 2022 Oral] Crafting Better Contrastive Views for Siamese Representation Learning

Crafting Better Contrastive Views for Siamese Representation Learning (CVPR 2022 Oral) 2022-03-29: The paper was selected as a CVPR 2022 Oral paper! 2

249 Dec 28, 2022
A repository with exploration into using transformers to predict DNA ↔ transcription factor binding

Transcription Factor binding predictions with Attention and Transformers A repository with exploration into using transformers to predict DNA ↔ transc

Phil Wang 62 Dec 20, 2022
A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

DeepFilterNet A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering. libDF contains Rust code used for dat

Hendrik Schröter 292 Dec 25, 2022
Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

[AAAI2022] Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics Overall pipeline of OCN. Paper Link: [arXiv] [AAAI

13 Nov 21, 2022
Simple image captioning model - CLIP prefix captioning.

CLIP prefix captioning. Inference Notebook: 🥳 New: 🥳 Our technical papar is finally out! Official implementation for the paper "ClipCap: CLIP Prefix

688 Jan 04, 2023
Code for the paper titled "Generalized Depthwise-Separable Convolutions for Adversarially Robust and Efficient Neural Networks" (NeurIPS 2021 Spotlight).

Generalized Depthwise-Separable Convolutions for Adversarially Robust and Efficient Neural Networks This repository contains the code and pre-trained

Hassan Dbouk 7 Dec 05, 2022
Advbox is a toolbox to generate adversarial examples that fool neural networks in PaddlePaddle、PyTorch、Caffe2、MxNet、Keras、TensorFlow and Advbox can benchmark the robustness of machine learning models.

Advbox is a toolbox to generate adversarial examples that fool neural networks in PaddlePaddle、PyTorch、Caffe2、MxNet、Keras、TensorFlow and Advbox can benchmark the robustness of machine learning models

AdvBox 1.3k Dec 25, 2022
Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

2 Jan 11, 2022
Learning Multiresolution Matrix Factorization and its Wavelet Networks on Graphs

Project Learning Multiresolution Matrix Factorization and its Wavelet Networks on Graphs, https://arxiv.org/pdf/2111.01940.pdf. Authors Truong Son Hy

5 Jun 28, 2022
"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

SOLQ: Segmenting Objects by Learning Queries This repository is an official implementation of the paper SOLQ: Segmenting Objects by Learning Queries.

MEGVII Research 179 Jan 02, 2023
This is the code of NeurIPS'21 paper "Towards Enabling Meta-Learning from Target Models".

ST This is the code of NeurIPS 2021 paper "Towards Enabling Meta-Learning from Target Models". If you use any content of this repo for your work, plea

Su Lu 7 Dec 06, 2022
Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR

Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR

Kai Zhang 2k Dec 31, 2022
Code for "Learning to Segment Rigid Motions from Two Frames".

rigidmask Code for "Learning to Segment Rigid Motions from Two Frames". ** This is a partial release with inference and evaluation code.

Gengshan Yang 157 Nov 21, 2022
Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)

Poplar Implementation of Bundle Adjustment using Gaussian Belief Propagation on Graphcore's IPU Implementation of CVPR 2020 paper: Bundle Adjustment o

Joe Ortiz 34 Dec 05, 2022
This is a collection of our NAS and Vision Transformer work.

AutoML - Neural Architecture Search This is a collection of our AutoML-NAS work iRPE (NEW): Rethinking and Improving Relative Position Encoding for Vi

Microsoft 828 Dec 28, 2022
The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

Deep High-Resolution Representation Learning for Human Pose Estimation (CVPR 2019) News [2020/07/05] A very nice blog from Towards Data Science introd

Leo Xiao 3.9k Jan 05, 2023
Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.

Spchcat Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi. Description spchcat is a command-line tool that read

Pete Warden 279 Jan 03, 2023
InsTrim: Lightweight Instrumentation for Coverage-guided Fuzzing

InsTrim The paper: InsTrim: Lightweight Instrumentation for Coverage-guided Fuzzing Build Prerequisite llvm-8.0-dev clang-8.0 cmake = 3.2 Make git cl

75 Dec 23, 2022
Fast (simple) spectral synthesis and emission-line fitting of DESI spectra.

FastSpecFit Introduction This repository contains code and documentation to perform fast, simple spectral synthesis and emission-line fitting of DESI

5 Aug 02, 2022