Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Last update: Aug 21, 2022


Overview

Robust Video Matting (RVM)


Official repository for the paper Robust High-Resolution Video Matting with Temporal Guidance. RVM is specifically designed for robust human video matting. Unlike existing neural models that process frames as independent images, RVM uses a recurrent neural network to process videos with temporal memory. RVM can perform matting in real time on any video without additional inputs. It achieves 4K 76FPS and HD 104FPS on an NVIDIA GTX 1080 Ti GPU. The project was developed at ByteDance Inc.
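To make the difference between per-frame processing and recurrent processing concrete, here is a toy sketch in plain Python. It is not RVM's network (RVM's recurrent states come from learned recurrent units); `toy_recurrent_alpha` and its exponential-moving-average update are hypothetical stand-ins that only show how a state carried from frame to frame lets each output depend on earlier frames.

```python
from typing import List, Optional, Tuple


def per_frame_alpha(frames: List[float]) -> List[float]:
    """Independent processing: each output depends only on its own frame."""
    return [min(max(x, 0.0), 1.0) for x in frames]


def toy_recurrent_alpha(frame: float, state: Optional[float],
                        momentum: float = 0.5) -> Tuple[float, float]:
    """One hypothetical recurrent step: blend the current frame with memory.

    The state is an exponential moving average of past outputs, a toy
    stand-in for the learned recurrent states RVM carries between frames.
    """
    x = min(max(frame, 0.0), 1.0)
    if state is None:  # first frame: no memory yet
        new_state = x
    else:
        new_state = momentum * state + (1.0 - momentum) * x
    return new_state, new_state  # (output, state passed to the next frame)


def run_recurrent(frames: List[float]) -> List[float]:
    state = None  # analogous to `rec = [None] * 4` in the inference loop
    outputs = []
    for f in frames:
        out, state = toy_recurrent_alpha(f, state)  # cycle the state
        outputs.append(out)
    return outputs


# A noisy "alpha" signal that flickers on the third frame.
frames = [1.0, 1.0, 0.0, 1.0]
print(per_frame_alpha(frames))  # [1.0, 1.0, 0.0, 1.0]
print(run_recurrent(frames))    # [1.0, 1.0, 0.5, 0.75]
```

The per-frame version reproduces the flicker exactly, while the recurrent version's output on frame three is pulled toward what it remembered from earlier frames.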

News

[Aug 25 2021] Source code and pretrained models are published.
[Jul 27 2021] Paper is accepted by WACV 2022.

Showreel

Watch the showreel video (YouTube, Bilibili) to see the model's performance.

All footage in the video is available in Google Drive and Baidu Pan (code: tb3w).

Demo

Webcam Demo: Run the model live in your browser. Visualize recurrent states.
Colab Demo: Test our model on your own videos with free GPU.

Download

We recommend MobileNetV3 models for most use cases. ResNet50 models are the larger variant with small performance improvements. Our model is available in various inference frameworks. See the inference documentation for more instructions.

Framework	Download	Notes
PyTorch	rvm_mobilenetv3.pth rvm_resnet50.pth	Official weights for PyTorch. Doc
TorchHub	Nothing to Download.	Easiest way to use our model in your PyTorch project. Doc
TorchScript	rvm_mobilenetv3_fp32.torchscript rvm_mobilenetv3_fp16.torchscript rvm_resnet50_fp32.torchscript rvm_resnet50_fp16.torchscript	For inference on mobile, consider exporting int8 quantized models yourself. Doc
ONNX	rvm_mobilenetv3_fp32.onnx rvm_mobilenetv3_fp16.onnx rvm_resnet50_fp32.onnx rvm_resnet50_fp16.onnx	Tested on ONNX Runtime with CPU and CUDA backends. Provided models use opset 12. Doc, Exporter.
TensorFlow	rvm_mobilenetv3_tf.zip rvm_resnet50_tf.zip	TensorFlow 2 SavedModel. Doc
TensorFlow.js	rvm_mobilenetv3_tfjs_int8.zip	Run the model on the web. Demo, Starter Code
CoreML	rvm_mobilenetv3_1280x720_s0.375_fp16.mlmodel rvm_mobilenetv3_1280x720_s0.375_int8.mlmodel rvm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel rvm_mobilenetv3_1920x1080_s0.25_int8.mlmodel	CoreML does not support dynamic resolution; you can export other resolutions yourself. Models require iOS 13+. `s` denotes `downsample_ratio`. Doc, Exporter

All models are available in Google Drive and Baidu Pan (code: gym7).
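Because the CoreML exports are fixed to one input resolution, an app has to pick the file that matches its camera or video size. The sketch below parses the CoreML file names from the download table, assuming they follow a `rvm_<backbone>_<W>x<H>_s<ratio>_<precision>.mlmodel` pattern, and computes the reduced resolution each model works at, taken here as the input size multiplied by `s` (`downsample_ratio`). The helper names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# File names as listed in the CoreML row of the download table.
COREML_MODELS: List[str] = [
    "rvm_mobilenetv3_1280x720_s0.375_fp16.mlmodel",
    "rvm_mobilenetv3_1280x720_s0.375_int8.mlmodel",
    "rvm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel",
    "rvm_mobilenetv3_1920x1080_s0.25_int8.mlmodel",
]

# Assumed naming scheme: rvm_<backbone>_<W>x<H>_s<ratio>_<precision>.mlmodel
NAME_RE = re.compile(
    r"rvm_(?P<backbone>\w+?)_(?P<w>\d+)x(?P<h>\d+)_s(?P<s>[\d.]+)_(?P<prec>\w+)\.mlmodel"
)


def parse_coreml_name(name: str) -> Dict[str, object]:
    """Split a CoreML file name into input size, downsample_ratio and precision."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    w, h, s = int(m["w"]), int(m["h"]), float(m["s"])
    return {
        "name": name,
        "size": (w, h),
        "downsample_ratio": s,
        "precision": m["prec"],
        # Reduced resolution (input size times s) under the assumption above.
        "internal": (round(w * s), round(h * s)),
    }


def pick_coreml_model(width: int, height: int,
                      precision: str = "fp16") -> Optional[Dict[str, object]]:
    """Return the export whose fixed input size matches, or None if none does."""
    for info in map(parse_coreml_name, COREML_MODELS):
        if info["size"] == (width, height) and info["precision"] == precision:
            return info
    return None  # no dynamic resolution in CoreML: export this size yourself


hd = pick_coreml_model(1920, 1080)
print(hd["name"], hd["internal"])  # rvm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel (480, 270)
print(pick_coreml_model(1280, 720, "int8")["internal"])  # (480, 270)
print(pick_coreml_model(3840, 2160))  # None
```

Under this reading, both provided sizes end up working at 480x270 internally, which is consistent with the larger input being paired with the smaller `s`.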

PyTorch Example

Install dependencies:

pip install -r requirements_inference.txt

Load the model:

import torch
from model import MattingNetwork

model = MattingNetwork('mobilenetv3').eval().cuda()  # or "resnet50"
model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))

To convert videos, we provide a simple conversion API:

from inference import convert_video

convert_video(
    model,                           # The model, can be on any device (cpu or cuda).
    input_source='input.mp4',        # A video file or an image sequence directory.
    output_type='video',             # Choose "video" or "png_sequence"
    output_composition='output.mp4', # File path if video; directory path if png sequence.
    output_video_mbps=4,             # Output video mbps. Not needed for png sequence.
    downsample_ratio=None,           # A hyperparameter to adjust or use None for auto.
    seq_chunk=12,                    # Process n frames at once for better parallelism.
)
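`seq_chunk` groups consecutive frames so the model processes a short time sequence per call instead of a single frame, which generally improves GPU utilization at the cost of more memory. The plain-Python sketch below shows only the grouping step; `chunk_frames` is a hypothetical helper, not part of `inference.py`, and real frames would be tensors stacked along a time dimension rather than strings.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk_frames(frames: Iterable[T], seq_chunk: int) -> Iterator[List[T]]:
    """Yield consecutive groups of up to `seq_chunk` frames, preserving order.

    The last group may be shorter when the frame count is not a multiple of
    seq_chunk. Order matters because recurrent states flow from one chunk
    into the next.
    """
    if seq_chunk < 1:
        raise ValueError("seq_chunk must be >= 1")
    it = iter(frames)
    while True:
        chunk = list(islice(it, seq_chunk))
        if not chunk:
            return
        yield chunk


frames = [f"frame{i}" for i in range(30)]
sizes = [len(c) for c in chunk_frames(frames, seq_chunk=12)]
print(sizes)  # [12, 12, 6]
```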

Or write your own inference code:

import torch
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from inference_utils import VideoReader, VideoWriter

reader = VideoReader('input.mp4', transform=ToTensor())
writer = VideoWriter('output.mp4', frame_rate=30)

bgr = torch.tensor([.47, 1, .6]).view(3, 1, 1).cuda()  # Green background.
rec = [None] * 4                                       # Initial recurrent states.
downsample_ratio = 0.25                                # Adjust based on your video.

with torch.no_grad():
    for src in DataLoader(reader):                     # RGB tensor normalized to 0 ~ 1.
        fgr, pha, *rec = model(src.cuda(), *rec, downsample_ratio)  # Cycle the recurrent states.
        com = fgr * pha + bgr * (1 - pha)              # Composite to green background.
        writer.write(com)                              # Write frame.

writer.close()                                         # Flush the encoder and finalize the file.
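The compositing line in the loop is standard alpha blending: each output pixel is `fgr * pha + bgr * (1 - pha)`, where `pha` is the predicted alpha in [0, 1]. The plain-Python sketch below applies the same formula to a single RGB pixel with the green background used in the loop; `composite_pixel` is a hypothetical helper for illustration, whereas the real code applies the formula to whole tensors via broadcasting.

```python
from typing import Tuple

RGB = Tuple[float, float, float]

GREEN_BG: RGB = (0.47, 1.0, 0.6)  # same green as `bgr` in the loop


def composite_pixel(fgr: RGB, pha: float, bgr: RGB = GREEN_BG) -> RGB:
    """Blend a foreground color over a background using alpha `pha`."""
    if not 0.0 <= pha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return tuple(f * pha + b * (1.0 - pha) for f, b in zip(fgr, bgr))


print(composite_pixel((1.0, 0.0, 0.0), 1.0))  # (1.0, 0.0, 0.0): opaque foreground
print(composite_pixel((1.0, 0.0, 0.0), 0.0))  # (0.47, 1.0, 0.6): background only
```

Intermediate alpha values, as found along hair and motion-blurred edges, give a weighted mix of the two colors.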

The models and converter API are also available through TorchHub.

# Load the model.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3") # or "resnet50"

# Converter API.
convert_video = torch.hub.load("PeterL1n/RobustVideoMatting", "converter")

Please see the inference documentation for details on the downsample_ratio hyperparameter, more converter arguments, and more advanced usage.
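With `downsample_ratio=None`, the converter chooses a value automatically; the inference documentation describes the actual rule. The sketch below assumes one simple heuristic, shrinking the frame so its longer side is at most 512 pixels and never upscaling, and is not guaranteed to match the repository's implementation. `auto_ratio` and the `max_side=512` default are hypothetical.

```python
def auto_ratio(height: int, width: int, max_side: int = 512) -> float:
    """Hypothetical heuristic: shrink so the longer side is <= max_side.

    Returns 1.0 for frames that are already small enough (no upscaling).
    """
    if height <= 0 or width <= 0:
        raise ValueError("frame dimensions must be positive")
    return min(max_side / max(height, width), 1.0)


for h, w in [(1080, 1920), (2160, 3840), (480, 640), (240, 320)]:
    r = auto_ratio(h, w)
    print(f"{w}x{h}: ratio={r:.4f}, internal={round(w * r)}x{round(h * r)}")
```

Under this assumption, HD footage would be processed at about 512x288, close to the fixed 0.25 used for HD in the speed benchmarks.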

Training and Evaluation

Please refer to the training documentation to train and evaluate your own model.

Speed

Speed is measured with inference_speed_test.py for reference.

GPU	dtype	HD (1920x1080)	4K (3840x2160)
RTX 3090	FP16	172 FPS	154 FPS
RTX 2060 Super	FP16	134 FPS	108 FPS
GTX 1080 Ti	FP32	104 FPS	74 FPS

Note 1: HD uses downsample_ratio=0.25, 4K uses downsample_ratio=0.125. All tests use batch size 1 and frame chunk 1.
Note 2: GPUs older than the Turing architecture lack fast FP16 support, so the GTX 1080 Ti uses FP32.
Note 3: We only measure tensor throughput. The video conversion script provided in this repo is expected to be much slower, because it does not use hardware video encoding/decoding and does not perform tensor transfers on parallel threads. If you are interested in implementing hardware video encoding/decoding in Python, please refer to PyNvCodec.
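Note 3 attributes part of the conversion script's slowdown to running everything on one thread. A common remedy is a producer/consumer pipeline in which a background thread decodes frames into a bounded queue while the main thread runs inference. The sketch below uses only the standard library; `decode_frames` and `infer` are hypothetical stand-ins simulated with `time.sleep`, not functions from this repository, and hardware decoding is out of scope.

```python
import queue
import threading
import time
from typing import Iterator, List

_SENTINEL = object()  # marks the end of the stream


def decode_frames(n: int) -> Iterator[int]:
    """Hypothetical decoder: yields frame indices, simulating decode cost."""
    for i in range(n):
        time.sleep(0.002)  # pretend to decode a frame
        yield i


def infer(frame: int) -> int:
    """Hypothetical model call: simulates GPU work and returns a result."""
    time.sleep(0.002)
    return frame * 2


def producer(frames: Iterator[int], q: queue.Queue) -> None:
    for f in frames:
        q.put(f)  # blocks when the queue is full (back-pressure)
    q.put(_SENTINEL)


def run_pipeline(n: int, prefetch: int = 8) -> List[int]:
    q: queue.Queue = queue.Queue(maxsize=prefetch)
    t = threading.Thread(target=producer, args=(decode_frames(n), q), daemon=True)
    t.start()
    results = []
    while True:
        item = q.get()
        if item is _SENTINEL:
            break
        results.append(infer(item))  # overlaps with decoding of later frames
    t.join()
    return results


print(run_pipeline(5))  # [0, 2, 4, 6, 8]
```

In CPython this overlap pays off when the heavy work happens in native code that releases the GIL; the bounded queue keeps the decoder from running arbitrarily far ahead of inference.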

Project Members
