Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Last update: Dec 14, 2022

Related tags

Overview

RMNet

This repository contains the source code for the paper Efficient Regional Memory Network for Video Object Segmentation.

Cite this work

@inproceedings{xie2021efficient,
  title={Efficient Regional Memory Network for Video Object Segmentation},
  author={Xie, Haozhe and 
          Yao, Hongxun and 
          Zhou, Shangchen and 
          Zhang, Shengping and 
          Sun, Wenxiu},
  booktitle={CVPR},
  year={2021}
}

Datasets

We use the ECSSD, COCO, PASCAL VOC, MSRA10K, DAVIS, and YouTube-VOS datasets in our experiments, which are available below:

Pretrained Models

The pretrained models for DAVIS and YouTube-VOS are available as follows:

RMNet for DAVIS (202 MB)
RMNet for YouTube-VOS (202 MB)

Prerequisites

Clone the Code Repository

git clone https://github.com/hzxie/RMNet.git

Install Python Denpendencies

cd RMNet
pip install -r requirements.txt

Build PyTorch Extensions

NOTE: PyTorch >= 1.4, CUDA >= 9.0 and GCC >= 4.9 are required.

RMNET_HOME=`pwd`

cd $RMNET_HOME/extensions/reg_att_map_generator
python setup.py install --user

cd $RMNET_HOME/extensions/flow_affine_transformation
python setup.py install --user

Precompute the Optical Flow

For the DAVIS dataset, the optical flows are computed by FlowNet2-CSS with the model pretrained on FlyingThings3D.
For the YouTube-VOS dataset, the optical flows are computed by RAFT with the model pretrained on Sintel.

Update Settings in `config.py`

You need to update the file path of the datasets:

__C.DATASETS                                     = edict()
__C.DATASETS.DAVIS                               = edict()
__C.DATASETS.DAVIS.INDEXING_FILE_PATH            = './datasets/DAVIS.json'
__C.DATASETS.DAVIS.IMG_FILE_PATH                 = '/path/to/Datasets/DAVIS/JPEGImages/480p/%s/%05d.jpg'
__C.DATASETS.DAVIS.ANNOTATION_FILE_PATH          = '/path/to/Datasets/DAVIS/Annotations/480p/%s/%05d.png'
__C.DATASETS.DAVIS.OPTICAL_FLOW_FILE_PATH        = '/path/to/Datasets/DAVIS/OpticalFlows/480p/%s/%05d.flo'
__C.DATASETS.YOUTUBE_VOS                         = edict()
__C.DATASETS.YOUTUBE_VOS.INDEXING_FILE_PATH      = '/path/to/Datasets/YouTubeVOS/%s/meta.json'
__C.DATASETS.YOUTUBE_VOS.IMG_FILE_PATH           = '/path/to/Datasets/YouTubeVOS/%s/JPEGImages/%s/%s.jpg'
__C.DATASETS.YOUTUBE_VOS.ANNOTATION_FILE_PATH    = '/path/to/Datasets/YouTubeVOS/%s/Annotations/%s/%s.png'
__C.DATASETS.YOUTUBE_VOS.OPTICAL_FLOW_FILE_PATH  = '/path/to/Datasets/YouTubeVOS/%s/OpticalFlows/%s/%s.flo'
__C.DATASETS.PASCAL_VOC                          = edict()
__C.DATASETS.PASCAL_VOC.INDEXING_FILE_PATH       = '/path/to/Datasets/voc2012/trainval.txt'
__C.DATASETS.PASCAL_VOC.IMG_FILE_PATH            = '/path/to/Datasets/voc2012/images/%s.jpg'
__C.DATASETS.PASCAL_VOC.ANNOTATION_FILE_PATH     = '/path/to/Datasets/voc2012/masks/%s.png'
__C.DATASETS.ECSSD                               = edict()
__C.DATASETS.ECSSD.N_IMAGES                      = 1000
__C.DATASETS.ECSSD.IMG_FILE_PATH                 = '/path/to/Datasets/ecssd/images/%s.jpg'
__C.DATASETS.ECSSD.ANNOTATION_FILE_PATH          = '/path/to/Datasets/ecssd/masks/%s.png'
__C.DATASETS.MSRA10K                             = edict()
__C.DATASETS.MSRA10K.INDEXING_FILE_PATH          = './datasets/msra10k.txt'
__C.DATASETS.MSRA10K.IMG_FILE_PATH               = '/path/to/Datasets/msra10k/images/%s.jpg'
__C.DATASETS.MSRA10K.ANNOTATION_FILE_PATH        = '/path/to/Datasets/msra10k/masks/%s.png'
__C.DATASETS.MSCOCO                              = edict()
__C.DATASETS.MSCOCO.INDEXING_FILE_PATH           = './datasets/mscoco.txt'
__C.DATASETS.MSCOCO.IMG_FILE_PATH                = '/path/to/Datasets/coco2017/images/train2017/%s.jpg'
__C.DATASETS.MSCOCO.ANNOTATION_FILE_PATH         = '/path/to/Datasets/coco2017/masks/train2017/%s.png'
__C.DATASETS.ADE20K                              = edict()
__C.DATASETS.ADE20K.INDEXING_FILE_PATH           = './datasets/ade20k.txt'
__C.DATASETS.ADE20K.IMG_FILE_PATH                = '/path/to/Datasets/ADE20K_2016_07_26/images/training/%s.jpg'
__C.DATASETS.ADE20K.ANNOTATION_FILE_PATH         = '/path/to/Datasets/ADE20K_2016_07_26/images/training/%s_seg.png'

# Dataset Options: DAVIS, DAVIS_FRAMES, YOUTUBE_VOS, ECSSD, MSCOCO, PASCAL_VOC, MSRA10K, ADE20K
__C.DATASET.TRAIN_DATASET                        = ['ECSSD', 'PASCAL_VOC', 'MSRA10K', 'MSCOCO']  # Pretrain
__C.DATASET.TRAIN_DATASET                        = ['YOUTUBE_VOS', 'DAVISx5']                    # Fine-tune
__C.DATASET.TEST_DATASET                         = 'DAVIS'

# Network Options: RMNet, TinyFlowNet
__C.TRAIN.NETWORK                                = 'RMNet'

Get Started

To train RMNet, you can simply use the following command:

python3 runner.py

To test RMNet, you can use the following command:

python3 runner.py --test --weights=/path/to/pretrained/model.pth

License

This project is open sourced under MIT license.

Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Related tags

Overview

RMNet

Cite this work

Datasets

Pretrained Models

Prerequisites

Clone the Code Repository

Install Python Denpendencies

Build PyTorch Extensions

Precompute the Optical Flow

Update Settings in `config.py`

Get Started

License

Owner

Haozhe Xie

CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator

Merlion: A Machine Learning Framework for Time Series Intelligence

Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding

Implementation of the paper "Language-agnostic representation learning of source code from structure and context".

This is the code for the paper "Jinkai Zheng, Xinchen Liu, Wu Liu, Lingxiao He, Chenggang Yan, Tao Mei: Gait Recognition in the Wild with Dense 3D Representations and A Benchmark. (CVPR 2022)"

Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

Python3 Implementation of (Subspace Constrained) Mean Shift Algorithm in Euclidean and Directional Product Spaces

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Asymmetric metric learning for knowledge transfer

10x faster matrix and vector operations

code for our ECCV-2020 paper: Self-supervised Video Representation Learning by Pace Prediction

The code for our paper submitted to RAL/IROS 2022: OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for LiDAR-Based Place Recognition.

PyTorch implementation of PP-LCNet: A Lightweight CPU Convolutional Neural Network

Yolo Traffic Light Detection With Python

Face recognition. Redefined.

Compressed Video Action Recognition

Pytorch implementation of RED-SDS (NeurIPS 2021).

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

This is the reference implementation for "Coresets via Bilevel Optimization for Continual Learning and Streaming"

Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

Related tags

Overview

RMNet

Cite this work

Datasets

Pretrained Models

Prerequisites

Clone the Code Repository

Install Python Denpendencies

Build PyTorch Extensions

Precompute the Optical Flow

Update Settings in config.py

Get Started

License

Owner

Haozhe Xie

CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator

Merlion: A Machine Learning Framework for Time Series Intelligence

Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding

Implementation of the paper "Language-agnostic representation learning of source code from structure and context".

This is the code for the paper "Jinkai Zheng, Xinchen Liu, Wu Liu, Lingxiao He, Chenggang Yan, Tao Mei: Gait Recognition in the Wild with Dense 3D Representations and A Benchmark. (CVPR 2022)"

Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

Python3 Implementation of (Subspace Constrained) Mean Shift Algorithm in Euclidean and Directional Product Spaces

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Asymmetric metric learning for knowledge transfer

10x faster matrix and vector operations

code for our ECCV-2020 paper: Self-supervised Video Representation Learning by Pace Prediction

The code for our paper submitted to RAL/IROS 2022: OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for LiDAR-Based Place Recognition.

PyTorch implementation of PP-LCNet: A Lightweight CPU Convolutional Neural Network

Yolo Traffic Light Detection With Python

Face recognition. Redefined.

Compressed Video Action Recognition

Pytorch implementation of RED-SDS (NeurIPS 2021).

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

This is the reference implementation for "Coresets via Bilevel Optimization for Continual Learning and Streaming"

Update Settings in `config.py`