Github project for Attention-guided Temporal Coherent Video Object Matting.

Related tags

Deep LearningTCVOM
Overview

Attention-guided Temporal Coherent Video Object Matting

This is the Github project for our paper Attention-guided Temporal Coherent Video Object Matting (arXiv:2105.11427). We provide our code, the supplementary material, trained model and VideoMatting108 dataset here. For the trimap generation module, please see TCVOM-TGM.

The code, the trained model and the dataset are for academic and non-commercial use only.

The supplementary material can be found here.

Table of Contents

VideoMatting108 Dataset

VideoMatting108 is a large video matting dataset that contains 108 video clips with their corresponding groundtruth alpha matte, all in 1080p resolution, 80 clips for training and 28 clips for validation.

You can download the dataset here. The total size of the dataset is 192GB and we've split the archive into 1GB chunks.

The contents of the dataset are the following:

  • FG: contains the foreground RGBA image, where the alpha channel is the groundtruth matte and RGB channel is the groundtruth foreground.
  • BG: contains background RGB image used for composition.
  • flow_png_val: contains quantized optical flow of validation video clips for calculating MESSDdt metric. You can choose not to download this folder if you don't need to calculate this metric. You can refer to the _flow_read() function in calc_metric.py for usage.
  • *_videos*.txt: train / val split.
  • frame_corr.json: FG / BG frame pair used for composition.

After decompressing, the dataset folder should have the structure of the following (please rename flow_png_val to flow_png):

|---dataset
  |-FG_done
  |-BG_done
  |-flow_png
  |-frame_corr.json
  |-train_videos.txt
  |-train_videos_subset.txt
  |-val_videos.txt
  |-val_videos_subset.txt

Models

Currently our method supports four different image matting methods as base.

  • gca (GCA Matting by Li et al., code is from here)
  • dim (DeepImageMatting by Xu et al., we use the reimplementation code from here)
  • index (IndexNet Matting by Lu et al., code is from here)
  • fba (FBA Matting by Forte et al., code is from here)
    • There are some differences in our training and the original FBA paper. We believe that there are still space for further performance gain through hyperparameter fine-tuning.
      • We did not use the foreground extension technique during training. Also we use four GPUs instead of one.
      • We used the conventional adam optimizer instead of radam.
      • We used mean instead of sum during loss computation to keep the loss balanced (especially for L_af).

The trained model can be downloaded here. We provide four different weights for every base method.

  • *_SINGLE_Lim.pth: The trained weight of the base image matting method on the VideoMatting108 dataset without TAM. Only L_im is used during the pretrain. This is the baseline model.
  • *_TAM_Lim_Ltc_Laf.pth: The trained weight of base image matting method with TAM on VideoMatting108 dataset. L_im, L_tc and L_af is used during the training. This is our full model.
  • *_TAM_pretrain.pth: The pretrained weight of base image matting method with TAM on the DIM dataset. Only L_im is used during the training.
  • *_fe.pth: The converted weight from the original model checkpoint, only used for pretraining TAM.

Results

This is the quantitative result on VideoMatting108 validation dataset with medium width trimap. The metric is averaged on all 28 validation video clips.

We use CUDA 10.2 during the inference. Using CUDA 11.1 might result in slightly lower metric. All metrics are calculated with calc_metric.py.

Method Loss SSDA dtSSD MESSDdt MSE*(10^3) mSAD
GCA+F (Baseline) L_im 55.82 31.64 2.15 8.20 40.85
GCA+TAM L_im+L_tc+L_af 50.41 27.28 1.48 7.07 37.65
DIM+F (Baseline) L_im 61.85 34.55 2.82 9.99 44.38
DIM+TAM L_im+L_tc+L_af 58.94 29.89 2.06 9.02 43.28
Index+F (Baseline) L_im 58.53 33.03 2.33 9.37 43.53
Index+TAM L_im+L_tc+L_af 57.91 29.36 1.81 8.78 43.17
FBA+F (Baseline) L_im 57.47 29.60 2.19 9.28 40.57
FBA+TAM L_im+L_tc+L_af 51.57 25.50 1.59 7.61 37.24

Usage

Requirements

Python=3.8
Pytorch=1.6.0
numpy
opencv-python
imgaug
tqdm
yacs

Inference

pred_single.py and pred_vmn.py automatically use all CUDA devices available. pred_test.py uses cuda:0 device as default.

  • Inference on VideoMatting108 validation set using our full model

    • python pred_vmd.py --model {gca,dim,index,fba} --data /path/to/VideoMatting108dataset --load /path/to/weight.pth --trimap {wide,narrow,medium} --save /path/to/outdir
  • Inference on VideoMatting108 validation set using the baseline model

    • python pred_single.py --dataset vmd --model {gca,dim,index,fba} --data /path/to/VideoMatting108dataset --load /path/to/weight.pth --trimap {wide,narrow,medium} --save /path/to/outdir
  • Calculating metrics

    • python calc_metric.py --pred /path/to/prediction/result --data /path/to/VideoMatting108dataset
    • The result will be saved in metric.json inside /path/to/prediction/result. Use tail to see the final averaged result.

  • Inference on test video clips

    • First, prepare the data. Make sure the workspace folder has the structure of the following:

      |---workspace
        |---video1
          |---00000_rgb.png
          |---00000_trimap.png
          |---00001_rgb.png
          |---00001_trimap.png
          |---....
        |---video2
        |---video3
        |---...
      
    • python pred_test.py --gpu CUDA_DEVICES_NUMBER_SPLIT_BY_COMMA --model {gca,vmn_gca,dim,vmn_dim,index,vmn_index,fba,vmn_fba} --data /path/to/workspace --load /path/to/weight.pth --save /path/to/outdir [video1] [video2] ...
      • The model parameter: vmn_BASEMETHOD corresponds to our full model, BASEMETHOD corresponds to the baseline model.
      • Without specifying the name of the video clip folders in the command line, the script will process all video clips under /path/to/workspace.

Training

PY_CMD="python -m torch.distributed.launch --nproc_per_node=NUMBER_OF_CUDA_DEVICES"
  • Pretrain TAM on DIM dataset. Please see cfgs/pretrain_vmn_BASEMETHOD.yaml for configuration and refer to dataset/DIM.py for dataset preparation.

    $PY_CMD pretrain_ddp.py --cfg cfgs/pretrain_vmn_index.yaml
  • Training our full method on VideoMatting108 dataset. This will load the pretrained TAM weight as initialization. Please see cfgs/vmd_vmn_BASEMETHOD_pretrained_30ep.yaml for configuration.

    $PY_CMD train_ddp.py --cfg /path/to/config.yaml
  • Training the baseline method on VideoMatting108 dataset without TAM. Please see cfgs/vmd_vmn_BASEMETHOD_pretrained_30ep_single.yaml for configuration.

    $PY_CMD train_single_ddp.py --cfg /path/to/config.yaml

Contact

If you have any questions, please feel free to contact [email protected].

Company clustering with K-means/GMM and visualization with PCA, t-SNE, using SSAN relation extraction

RE results graph visualization and company clustering Installation pip install -r requirements.txt python -m nltk.downloader stopwords python3.7 main.

Jieun Han 1 Oct 06, 2022
Graph Posterior Network: Bayesian Predictive Uncertainty for Node Classification (NeurIPS 2021)

Graph Posterior Network This is the official code repository to the paper Graph Posterior Network: Bayesian Predictive Uncertainty for Node Classifica

Maximilian Stadler 30 Dec 05, 2022
Official PyTorch Implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning (NeurIPS 2021 Spotlight)

[NeurIPS 2021 Spotlight] HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning [Paper] This is Official PyTorch implementatio

42 Nov 01, 2022
A computer vision pipeline to identify the "icons" in Christian paintings

Christian-Iconography A computer vision pipeline to identify the "icons" in Christian paintings. A bit about iconography. Iconography is related to id

Rishab Mudliar 3 Jul 30, 2022
Powerful and efficient Computer Vision Annotation Tool (CVAT)

Computer Vision Annotation Tool (CVAT) CVAT is free, online, interactive video and image annotation tool for computer vision. It is being used by our

OpenVINO Toolkit 8.6k Jan 01, 2023
A Closer Look at Reference Learning for Fourier Phase Retrieval

A Closer Look at Reference Learning for Fourier Phase Retrieval This repository contains code for our NeurIPS 2021 Workshop on Deep Learning and Inver

Tobias Uelwer 1 Oct 28, 2021
Official PyTorch Implementation for InfoSwap: Information Bottleneck Disentanglement for Identity Swapping

InfoSwap: Information Bottleneck Disentanglement for Identity Swapping Code usage Please check out the user manual page. Paper Gege Gao, Huaibo Huang,

Grace Hešeri 56 Dec 20, 2022
Deep Q-Learning Network in pytorch (not actively maintained)

pytoch-dqn This project is pytorch implementation of Human-level control through deep reinforcement learning and I also plan to implement the followin

Hung-Tu Chen 342 Jan 01, 2023
This is the code for CVPR 2021 oral paper: Jigsaw Clustering for Unsupervised Visual Representation Learning

JigsawClustering Jigsaw Clustering for Unsupervised Visual Representation Learning Pengguang Chen, Shu Liu, Jiaya Jia Introduction This project provid

DV Lab 73 Sep 18, 2022
Official PyTorch implementation of "BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation" (NeurIPS 2021)

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation Official PyTorch implementation of the NeurIPS 2021 paper Mingcong Liu, Qiang

onion 462 Dec 29, 2022
[NeurIPS 2020] Blind Video Temporal Consistency via Deep Video Prior

pytorch-deep-video-prior (DVP) Official PyTorch implementation for NeurIPS 2020 paper: Blind Video Temporal Consistency via Deep Video Prior TensorFlo

Yazhou XING 90 Oct 19, 2022
Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy

Creating a Linear Program Solver by Implementing the Simplex Method in Python with NumPy Simplex Algorithm is a popular algorithm for linear programmi

Reda BELHAJ 2 Oct 12, 2022
Learning-Augmented Dynamic Power Management

Learning-Augmented Dynamic Power Management This repository contains source code accompanying paper Learning-Augmented Dynamic Power Management with M

Adam 0 Feb 22, 2022
Neural Dynamic Policies for End-to-End Sensorimotor Learning

This is a PyTorch based implementation for our NeurIPS 2020 paper on Neural Dynamic Policies for end-to-end sensorimotor learning.

Shikhar Bahl 47 Dec 11, 2022
phylotorch-bito is a package providing an interface to BITO for phylotorch

phylotorch-bito phylotorch-bito is a package providing an interface to BITO for phylotorch Dependencies phylotorch BITO Installation Get the source co

Mathieu Fourment 2 Sep 01, 2022
Transformer in Vision

Transformer-in-Vision Recent Transformer-based CV and related works. Welcome to comment/contribute! Keep updated. Resource SCENIC: A JAX Library for C

Yong-Lu Li 1.1k Dec 30, 2022
FinGAT: A Financial Graph Attention Networkto Recommend Top-K Profitable Stocks

FinGAT: A Financial Graph Attention Networkto Recommend Top-K Profitable Stocks This is our implementation for the paper: FinGAT: A Financial Graph At

Yu-Che Tsai 64 Dec 13, 2022
Pretrained models for Jax/Haiku; MobileNet, ResNet, VGG, Xception.

Pre-trained image classification models for Jax/Haiku Jax/Haiku Applications are deep learning models that are made available alongside pre-trained we

Alper Baris CELIK 14 Dec 20, 2022
NeurIPS 2021, "Fine Samples for Learning with Noisy Labels"

[Official] FINE Samples for Learning with Noisy Labels This repository is the official implementation of "FINE Samples for Learning with Noisy Labels"

mythbuster 27 Dec 23, 2022
PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020).

Scaffold-Federated-Learning PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020). Environment numpy=

KI 30 Dec 29, 2022