Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021]

Related tags

Deep LearningDVM
Overview

Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021]


Paper: https://arxiv.org/abs/2104.11208

Introduction

Despite the significant progress made by deep learning in natural image matting, there has been so far no representative work on deep learning for video matting due to the inherent technical challenges in reasoning temporal domain and lack of large-scale video matting datasets. In this paper, we propose a deep learning-based video matting framework which employs a novel and effective spatio-temporal feature aggregation module (ST-FAM). As optical flow estimation can be very unreliable within matting regions, ST-FAM is designed to effectively align and aggregate information across different spatial scales and temporal frames within the network decoder. To eliminate frame-by-frame trimap annotations, a lightweight interactive trimap propagation network is also introduced. The other contribution consists of a large-scale video matting dataset with groundtruth alpha mattes for quantitative evaluation and real-world high-resolution videos with trimaps for qualitative evaluation. Quantitative and qualitative experimental results show that our framework significantly outperforms conventional video matting and deep image matting methods applied to video in presence of multi-frame temporal information.

Framework

framework

Dataset

We composite foreground images and videos onto high-resolution background videos to generate large-scale video matting training/testing dataset. Follow the steps to prepare the datasets. The structure is as the following.

DVM
  ├── fg
    ├── image
      ├── train
        ├── alpha
          ├── xxx.png
          ├── yyy.png
          ├── ...
        ├── fg
          ├── xxx.png
          ├── yyy.png
          ├── ...
      ├── test
        ├── alpha
          ├── xxx.png
          ├── yyy.png
          ├── ...
        ├── fg
          ├── xxx.png
          ├── yyy.png
          ├── ...
        ├── trimap
          ├── xxx.png
          ├── yyy.png
          ├── ...
    ├── video
      ├── train
        ├── 0000
          ├── a.mp4
          ├── f.mp4
        ├── ...
      ├── test
        ├── 0000
          ├── a.mp4
          ├── f.mp4
        ├── ...
  ├── bg
    ├── train
      ├── 0000.mp4
      ├── 0001.mp4
      ├── ...
    ├── test
      ├── 0000.mp4
      ├── 0001.mp4
      ├── ...
  1. Please contact Brian Price ([email protected]) for the Adobe Image Matting dataset.

  2. Put training fg/alpha images and testing fg/alpha/trimap images from Adobe dataset in the corresponding directories.

  3. Download training/testing videos and place them in the corresponding directories.

    Link: https://pan.baidu.com/s/1yBJr0SqsEjDToVAUb8dSCw Password: l9ck

  4. Use data/process.py to generate training/testing datasets. About 1T storage is needed.

Reference

If you find our work useful in your research, please consider citing:

@inproceedings{sun2021dvm,
  author    = {Yanan Sun and Guanzhi Wang and Qiao Gu and Chi-Keung Tang and Yu-Wing Tai}
  title     = {Deep Video Matting via Spatio-Temporal Alignment and Aggregation},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
}

Contact

If you have any questions or suggestions about this repo, please feel free to contact me ([email protected]).

Pytorch implementation of the paper "COAD: Contrastive Pre-training with Adversarial Fine-tuning for Zero-shot Expert Linking."

Expert-Linking Pytorch implementation of the paper "COAD: Contrastive Pre-training with Adversarial Fine-tuning for Zero-shot Expert Linking." This is

BoChen 12 Jan 01, 2023
Bayesian dessert for Lasagne

Gelato Bayesian dessert for Lasagne Recent results in Bayesian statistics for constructing robust neural networks have proved that it is one of the be

Maxim Kochurov 84 May 11, 2020
DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates

DeepMetaHandles (CVPR2021 Oral) [paper] [animations] DeepMetaHandles is a shape deformation technique. It learns a set of meta-handles for each given

Liu Minghua 73 Dec 15, 2022
The all new way to turn your boring vector meshes into the new fad in town; Voxels!

Voxelator The all new way to turn your boring vector meshes into the new fad in town; Voxels! Notes: I have not tested this on a rotated mesh. With fu

6 Feb 03, 2022
A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

Tensorpack is a neural network training interface based on TensorFlow. Features: It's Yet Another TF high-level API, with speed, and flexibility built

Tensorpack 6.2k Jan 09, 2023
scikit-learn inspired API for CRFsuite

sklearn-crfsuite sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides interface simlar to scikit-learn. sklearn_crfsuite.CRF i

417 Dec 20, 2022
Tree-based Search Graph for Approximate Nearest Neighbor Search

TBSG: Tree-based Search Graph for Approximate Nearest Neighbor Search. TBSG is a graph-based algorithm for ANNS based on Cover Tree, which is also an

Fanxbin 2 Dec 27, 2022
A quantum game modeling of pandemic (QHack 2022)

Contributors: @JongheumJung, @YoonjaeChung, @GyunghunKim Abstract In the regime of a global pandemic, leaders around the world need to consider variou

Yoonjae Chung 8 Apr 03, 2022
This is an official implementation for "Self-Supervised Learning with Swin Transformers".

Self-Supervised Learning with Vision Transformers By Zhenda Xie*, Yutong Lin*, Zhuliang Yao, Zheng Zhang, Qi Dai, Yue Cao and Han Hu This repo is the

Swin Transformer 529 Jan 02, 2023
Attention over nodes in Graph Neural Networks using PyTorch (NeurIPS 2019)

Intro This repository contains code to generate data and reproduce experiments from our NeurIPS 2019 paper: Boris Knyazev, Graham W. Taylor, Mohamed R

Boris Knyazev 242 Jan 06, 2023
Volumetric parameterization of the placenta to a flattened template

placenta-flattening A MATLAB algorithm for volumetric mesh parameterization. Developed for mapping a placenta segmentation derived from an MRI image t

Mazdak Abulnaga 12 Mar 14, 2022
Western-3DSlicer-Modules - Point-Set Registrations for Ultrasound Probe Calibrations

Point-Set Registrations for Ultrasound Probe Calibrations -Undergraduate Thesis-

Matteo Tanzi 0 May 04, 2022
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021) This repository is for BAAF-Net introduce

90 Dec 29, 2022
Python wrappers to the C++ library SymEngine, a fast C++ symbolic manipulation library.

SymEngine Python Wrappers Python wrappers to the C++ library SymEngine, a fast C++ symbolic manipulation library. Installation Pip See License section

136 Dec 28, 2022
PyTorch implementation for our paper Learning Character-Agnostic Motion for Motion Retargeting in 2D, SIGGRAPH 2019

Learning Character-Agnostic Motion for Motion Retargeting in 2D We provide PyTorch implementation for our paper Learning Character-Agnostic Motion for

Rundi Wu 367 Dec 22, 2022
Code for "Reconstructing 3D Human Pose by Watching Humans in the Mirror", CVPR 2021 oral

Reconstructing 3D Human Pose by Watching Humans in the Mirror Qi Fang*, Qing Shuai*, Junting Dong, Hujun Bao, Xiaowei Zhou CVPR 2021 Oral The videos a

ZJU3DV 178 Dec 13, 2022
PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

StyleSpeech - PyTorch Implementation PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation. Status (2021.06.13

Keon Lee 140 Dec 21, 2022
Semi-supervised Representation Learning for Remote Sensing Image Classification Based on Generative Adversarial Networks

SSRL-for-image-classification Semi-supervised Representation Learning for Remote Sensing Image Classification Based on Generative Adversarial Networks

Feng 2 Nov 19, 2021
An official implementation of the Anchor DETR.

Anchor DETR: Query Design for Transformer-Based Detector Introduction This repository is an official implementation of the Anchor DETR. We encode the

MEGVII Research 276 Dec 28, 2022
Virtual Dance Reality Stage: a feature that offers you to share a stage with another user virtually

Portrait Segmentation using Tensorflow This script removes the background from an input image. You can read more about segmentation here Setup The scr

291 Dec 24, 2022