[CVPR'21] Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

Overview

IVOS-W

Paper

Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

Zhaoyun Yin, Jia Zheng, Weixin Luo, Shenhan Qian, Hanling Zhang, Shenghua Gao.

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[Preprint] [Supplementary Material] [Poster]

Getting Started

Create the environment

# create conda env
conda create -n ivosw python=3.7
# activate conda env
conda activate ivosw
# install pytorch
conda install pytorch=1.3 torchvision
# install other dependencies
pip install -r requirements.txt

We adopt MANet, IPN, and ATNet as the VOS algorithms. Please follow the instructions to install the dependencies.

git clone https://github.com/yuk6heo/IVOS-ATNet.git VOS/ATNet
git clone https://github.com/lightas/CVPR2020_MANet.git VOS/MANet
git clone https://github.com/zyy-cn/IPN.git VOS/IPN

Dataset Preparation

  • DAVIS 2017 Dataset
    • Download the data and human annotated scribbles here.
    • Place DAVIS folder into root/data.
  • YouTube-VOS Dataset
    • Download the YouTube-VOS 2018 version here.
    • Clean up the annotations following here.
    • Download our annotated scribbles here.

Create a DAVIS-like structure of YouTube-VOS by running the following commands:

python datasets/prepare_ytbvos.py --src path/to/youtube_vos --scb path/to/scribble_dir

Evaluation

For evaluation, please download the pretrained agent model and quality assessment model, then place them into root/weights and run the following commands:

python eval_agent_{atnet/manet/ipn}.py with setting={oracle/wild} dataset={davis/ytbvos} method={random/linspace/worst/ours}

The results will be stored in results/{VOS}/{setting}/{dataset}/{method}/summary.json

Note: The results may fluctuate slightly with different versions of networkx, which is used by davisinteractive to generate simulated scribbles.

Training

First, prepare the data used to train the agent by downloading reward records and pretrained experience buffer, place them into root/train, or generate them from scratch:

python produce_reward.py
python pretrain_agent.py

To train the agent:

python train_agent.py

To train the segmentation quality assessment model:

python generate_data.py
python quality_assessment.py

Citation

@inproceedings{IVOSW,
  title     = {Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild},
  author    = {Zhaoyuan Yin and
               Jia Zheng and
               Weixin Luo and
               Shenhan Qian and
               Hanling Zhang and
               Shenghua Gao},
  booktitle = {CVPR},
  year      = {2021}
}

LICENSE

The code is released under the MIT license.

Owner
SVIP Lab
ShanghaiTech Vision and Intelligent Perception Lab
SVIP Lab
PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

samplernn-pytorch A PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. It's based on the reference implem

DeepSound 261 Dec 14, 2022
object recognition with machine learning on Respberry pi

Respberrypi_object-recognition object recognition with machine learning on Respberry pi line.py 建立一支與樹梅派連線的 linebot 使用此 linebot 遠端控制樹梅派拍照 config.ini l

1 Dec 11, 2021
The official implementation of Equalization Loss for Long-Tailed Object Recognition (CVPR 2020) based on Detectron2

Equalization Loss for Long-Tailed Object Recognition Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, Junjie Yan ⚠️ We re

Jingru Tan 197 Dec 25, 2022
Neural Radiance Fields Using PyTorch

This project is a PyTorch implementation of Neural Radiance Fields (NeRF) for reproduction of results whilst running at a faster speed.

Vedant Ghodke 1 Feb 11, 2022
Exploring Relational Context for Multi-Task Dense Prediction [ICCV 2021]

Adaptive Task-Relational Context (ATRC) This repository provides source code for the ICCV 2021 paper Exploring Relational Context for Multi-Task Dense

David Brüggemann 35 Dec 05, 2022
Newt - a Gaussian process library in JAX.

Newt __ \/_ (' \`\ _\, \ \\/ /`\/\ \\ \ \\

AaltoML 0 Nov 02, 2021
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

In-Place Activated BatchNorm In-Place Activated BatchNorm for Memory-Optimized Training of DNNs In-Place Activated BatchNorm (InPlace-ABN) is a novel

1.3k Dec 29, 2022
A High-Performance Distributed Library for Large-Scale Bundle Adjustment

MegBA: A High-Performance and Distributed Library for Large-Scale Bundle Adjustment This repo contains an official implementation of MegBA. MegBA is a

旷视研究院 3D 组 336 Dec 27, 2022
Sky Computing: Accelerating Geo-distributed Computing in Federated Learning

Sky Computing Introduction Sky Computing is a load-balanced framework for federated learning model parallelism. It adaptively allocate model layers to

HPC-AI Tech 72 Dec 27, 2022
Official implementation of "Generating 3D Molecules for Target Protein Binding"

Generating 3D Molecules for Target Protein Binding This is the official implementation of the GraphBP method proposed in the following paper. Meng Liu

DIVE Lab, Texas A&M University 74 Dec 07, 2022
OverFeat is a Convolutional Network-based image classifier and feature extractor.

OverFeat OverFeat is a Convolutional Network-based image classifier and feature extractor. OverFeat was trained on the ImageNet dataset and participat

593 Dec 08, 2022
Joint Detection and Identification Feature Learning for Person Search

Person Search Project This repository hosts the code for our paper Joint Detection and Identification Feature Learning for Person Search. The code is

712 Dec 17, 2022
Контрольная работа по математическим методам машинного обучения

ML-MathMethods-Test Контрольная работа по математическим методам машинного обучения. Вычисление основных статистик, диаграмм и графиков, проверка разл

Stas Ivanovskii 1 Jan 06, 2022
Image-to-image translation with conditional adversarial nets

pix2pix Project | Arxiv | PyTorch Torch implementation for learning a mapping from input images to output images, for example: Image-to-Image Translat

Phillip Isola 9.3k Jan 08, 2023
PyTorch implementation of "Optimization Planning for 3D ConvNets"

Optimization-Planning-for-3D-ConvNets Code for the ICML 2021 paper: Optimization Planning for 3D ConvNets. Authors: Zhaofan Qiu, Ting Yao, Chong-Wah N

Zhaofan Qiu 2 Jan 12, 2022
A Repository of Community-Driven Natural Instructions

A Repository of Community-Driven Natural Instructions TLDR; this repository maintains a community effort to create a large collection of tasks and the

AI2 244 Jan 04, 2023
A diff tool for language models

LMdiff Qualitative comparison of large language models. Demo & Paper: http://lmdiff.net LMdiff is a MIT-IBM Watson AI Lab collaboration between: Hendr

Hendrik Strobelt 27 Dec 29, 2022
This project hosts the code for implementing the ISAL algorithm for object detection and image classification

Influence Selection for Active Learning (ISAL) This project hosts the code for implementing the ISAL algorithm for object detection and image classifi

25 Sep 11, 2022
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

Daft-Exprt - PyTorch Implementation PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis The

Keon Lee 47 Dec 18, 2022
Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019)

Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019) Introduction Official implementation of Adaptive Pyramid Context Network

21 Nov 09, 2022