MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

Overview

Code for the following paper:

MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images
Benjamin Attal, Selena Ling, Aaron Gokaslan, Christian Richardt, James Tompkin
ECCV 2020

Figure: High-level overview of our approach.

See more at our project page.

If you use this code, please cite:

@inproceedings{Attal:2020:ECCV,
    author    = "Benjamin Attal and Selena Ling and Aaron Gokaslan and Christian Richardt and James Tompkin",
    title     = "{MatryODShka}: Real-time {6DoF} Video View Synthesis using Multi-Sphere Images",
    booktitle = "European Conference on Computer Vision (ECCV)",
    month     = aug,
    year      = "2020",
    url       = "https://visual.cs.brown.edu/matryodshka"
}

Note that our code is based on the code from the paper "Stereo Magnification: Learning View Synthesis using Multiplane Images" by Zhou et al. [1], and on the code from the paper "Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images" by Wang et al. [3]. Please also cite their work.

Setup

  • Create a conda environment from the matryodshka-gpu.yml file.
  • Run ./download_glob.sh to download the files needed for training and testing.
  • Download the dataset as described in the Replica dataset section below.

Training the model

See train.py to train the model.

  • To train with transform inverse regularization, use the --transform_inverse_reg flag.

  • To train with CoordNet, use the --coord_net flag.

  • To experiment with different losses (elpips or l2), use the --which_loss flag.

    • To train with spherical weighting on the loss maps, use the --spherical_attention flag.
  • To train with a graph convolutional network (GCN), use the --gcn flag. Note that the particular GCN architecture definition we use is from the Pixel2Mesh repository [3].

  • The current scripts support training on the Replica 360 and cubemap datasets and on the RealEstate10K dataset. Use --input_type to switch between these input types (ODS, PP, REALESTATE_PP).
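
Spherical weighting compensates for the fact that equirectangular images oversample the regions near the poles. The README does not spell out the exact weighting behind --spherical_attention, so the following is a minimal sketch of one common choice, assuming an equirectangular loss map of shape (H, W) whose rows span latitude from +90° (top) to -90° (bottom), with each row weighted by cos(latitude). The function names are hypothetical.

```python
import numpy as np


def latitude_weights(height: int) -> np.ndarray:
    """Per-row weights proportional to cos(latitude) at each row centre."""
    # Row centres map to latitudes in (-pi/2, pi/2); the top row is near +pi/2.
    rows = np.arange(height) + 0.5
    latitude = (0.5 - rows / height) * np.pi
    return np.cos(latitude)


def spherical_weighted_l2(pred: np.ndarray, target: np.ndarray) -> float:
    """L2 loss where each pixel is weighted by the solid angle of its row.

    pred, target: (H, W) or (H, W, C) equirectangular images.
    """
    sq_err = (pred - target) ** 2
    if sq_err.ndim == 3:
        sq_err = sq_err.mean(axis=2)  # average over channels -> (H, W)
    w = latitude_weights(sq_err.shape[0])[:, None]  # broadcast over columns
    # Normalise by the total weight so a uniform error e yields exactly e.
    return float((sq_err * w).sum() / (w.sum() * sq_err.shape[1]))
```

With this weighting, an error in a row near the poles contributes less to the loss than the same error at the equator, roughly matching each pixel's share of the sphere.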

See scripts/train/*.sh for some sample scripts.
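
When scripting many experiments, it can help to assemble the train.py command line programmatically from the flags described above. This is a sketch only: it assumes the boolean options are presence-only switches and that --which_loss and --input_type each take a single value. The helper build_train_command is hypothetical, and train.py (and the sample scripts) remain the authority on the actual argument definitions and any other required arguments.

```python
import shlex
from typing import List, Optional

VALID_LOSSES = {"elpips", "l2"}
VALID_INPUT_TYPES = {"ODS", "PP", "REALESTATE_PP"}


def build_train_command(
    input_type: str = "ODS",
    which_loss: Optional[str] = None,
    transform_inverse_reg: bool = False,
    coord_net: bool = False,
    spherical_attention: bool = False,
    gcn: bool = False,
) -> List[str]:
    """Return an argv list for train.py (assumes switch-style boolean flags)."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(f"unknown input_type: {input_type}")
    argv = ["python", "train.py", "--input_type", input_type]
    if which_loss is not None:
        if which_loss not in VALID_LOSSES:
            raise ValueError(f"unknown loss: {which_loss}")
        argv += ["--which_loss", which_loss]
    # Presence-only switches, one per documented option.
    for enabled, flag in [
        (transform_inverse_reg, "--transform_inverse_reg"),
        (coord_net, "--coord_net"),
        (spherical_attention, "--spherical_attention"),
        (gcn, "--gcn"),
    ]:
        if enabled:
            argv.append(flag)
    return argv


# Prints: python train.py --input_type ODS --which_loss elpips --transform_inverse_reg
print(shlex.join(build_train_command(which_loss="elpips", transform_inverse_reg=True)))
```

Validating the choice-valued flags up front catches typos before a long training job starts.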

Testing the model

See test.py for testing the model on the Replica 360 test set.

  • When testing on video frames, e.g. test_video_640x320, include on_video in the --test_type flag.
  • When testing on high-resolution images, include high_res in the --test_type flag.

See scripts/test/*.sh for sample scripts.

Evaluation

See eval.py for evaluating the model; it saves the metric scores to a JSON file. We evaluate our models on:

  • third-view reconstruction quality

    • See scripts/eval/*-reg.sh for a sample script.
  • frame-to-frame reconstruction differences on video sequences to evaluate the effect of transform inverse regularization on temporal consistency.

    • Include on_video when specifying the --eval_type flag.
    • See scripts/eval/*-video.sh for a sample script.
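
To illustrate the temporal-consistency measurement, here is a minimal sketch that computes the mean absolute difference between consecutive rendered frames and writes the scores to a JSON file. It is an assumption about what such a metric could look like, not the exact computation in eval.py; the function names and JSON keys are hypothetical.

```python
import json

import numpy as np


def frame_to_frame_differences(frames: np.ndarray) -> np.ndarray:
    """Mean absolute difference between each pair of consecutive frames.

    frames: array of shape (T, H, W, C) with T >= 2.
    Returns an array of length T - 1.
    """
    frames = frames.astype(np.float64)
    diffs = np.abs(frames[1:] - frames[:-1])
    # Flatten each difference image and average it to one score per pair.
    return diffs.reshape(diffs.shape[0], -1).mean(axis=1)


def save_scores(path: str, per_frame: np.ndarray) -> dict:
    """Write per-frame and mean scores to a JSON file and return them."""
    scores = {
        "frame_to_frame_mad": per_frame.tolist(),
        "mean_frame_to_frame_mad": float(per_frame.mean()),
    }
    with open(path, "w") as f:
        json.dump(scores, f, indent=2)
    return scores
```

A lower mean suggests less flicker, but camera and scene motion also change this number, so models with and without transform inverse regularization should be compared on the same video sequence.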

Pre-trained model

Download models pre-trained with and without transform inverse regularization by running ./download_model.sh. These models are also archived at the Brown library.

Replica dataset

We rendered a 360 dataset and a cubemap dataset for training from the Facebook Replica dataset [2]. This data is also archived at the Brown library. After downloading, you should have the following datasets:

  • train_640x320
  • test_640x320
  • test_video_640x320

You can also find the camera pose information that was used to render the training dataset. Each line of the txt file is formatted as follows:

camera_position_x camera_position_y camera_position_z ods_baseline target1_offset_x target1_offset_y target1_offset_z target2_offset_x target2_offset_y target2_offset_z target3_offset_x target3_offset_y target3_offset_z
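
The line format above can be parsed with a few lines of Python. This is a minimal sketch; the class and function names are hypothetical, and target_positions() assumes that each target offset is relative to the camera position, which the field names suggest but the README does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PoseRecord:
    camera_position: Vec3
    ods_baseline: float
    target_offsets: List[Vec3]  # three target offsets, in file order

    def target_positions(self) -> List[Vec3]:
        """Absolute target positions (assumes offsets are relative to the camera)."""
        cx, cy, cz = self.camera_position
        return [(cx + dx, cy + dy, cz + dz) for dx, dy, dz in self.target_offsets]


def parse_pose_line(line: str) -> PoseRecord:
    """Parse one whitespace-separated line of 13 numbers."""
    values = [float(v) for v in line.split()]
    if len(values) != 13:
        raise ValueError(f"expected 13 values, got {len(values)}")
    position = (values[0], values[1], values[2])
    baseline = values[3]
    # Targets 1-3 occupy columns 4-6, 7-9 and 10-12.
    offsets = [(values[i], values[i + 1], values[i + 2]) for i in (4, 7, 10)]
    return PoseRecord(position, baseline, offsets)


def load_poses(path: str) -> List[PoseRecord]:
    """Parse every non-empty line of a pose file."""
    with open(path) as f:
        return [parse_pose_line(line) for line in f if line.strip()]
```

For example, load_poses("poses.txt") (a hypothetical filename) returns one PoseRecord per rendered sample.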

We also have a fork of the Replica dataset codebase that can regenerate our data from scratch. It contains customized rendering scripts that output ODS, equirectangular, and cubemap-projection spherical imagery, along with the corresponding depth maps.

Note that the 360 dataset we release for download was rendered with an incorrect 90-degree camera rotation around the up vector and a horizontal flip. Data regenerated with the customized rendering scripts in our released code fork does not include this coordinate change. Models trained on either version should perform approximately the same.
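
For an equirectangular panorama, whose columns span 360 degrees of longitude, a rotation about the up vector is a circular shift of columns and a horizontal flip reverses the column order, so this kind of coordinate change can be undone purely in image space. The sketch below assumes a single equirectangular image (or depth map) and assumes the flip was applied after the rotation; the direction of the released 90-degree rotation is not specified, so the sign of the shift is also an assumption. It does not address how a horizontal mirror interacts with the left/right eye geometry of ODS stereo images. Function names are hypothetical.

```python
import numpy as np


def yaw_rotate(equirect: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate an equirectangular image about the vertical (up) axis.

    Columns span 360 degrees of longitude, so a yaw rotation is a circular
    column shift. The sign convention here is an assumption.
    """
    width = equirect.shape[1]
    shift = int(round(degrees / 360.0 * width))
    return np.roll(equirect, shift, axis=1)


def undo_rotate_then_flip(equirect: np.ndarray, degrees: float = 90.0) -> np.ndarray:
    """Invert a (rotate by `degrees`, then flip horizontally) transform.

    Assumes the flip came after the rotation; swap the order or negate
    `degrees` if your data disagrees.
    """
    unflipped = equirect[:, ::-1]  # a horizontal flip is its own inverse
    return yaw_rotate(unflipped, -degrees)
```

Because flipping and shifting do not commute, checking the result visually against a regenerated frame is a sensible way to confirm the order and sign.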

Exporting the model to ONNX

We export our model to ONNX by first converting the checkpoint into a .pb file, which is then converted to an .onnx file with the tf2onnx module. See export.py for exporting the model to a .pb file.

See scripts/export/model-name.sh for a sample script to run export.py, and scripts/export/pb2onnx.sh for a sample script to run pb-to-onnx conversion.
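
The pb-to-onnx step uses the tf2onnx command-line converter (python -m tf2onnx.convert). The sketch below builds that command from Python. The input and output tensor names and the default opset of 11 are placeholders (assumptions); the values actually used are in scripts/export/pb2onnx.sh and depend on the exported graph. The helper names are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence


def tf2onnx_command(
    pb_path: str,
    onnx_path: str,
    inputs: Sequence[str],
    outputs: Sequence[str],
    opset: int = 11,
) -> List[str]:
    """Build a `python -m tf2onnx.convert` command for a frozen .pb graph."""
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--graphdef", pb_path,
        "--inputs", ",".join(inputs),  # tensor names such as "name:0"
        "--outputs", ",".join(outputs),
        "--output", onnx_path,
        "--opset", str(opset),
    ]


def convert(pb_path: str, onnx_path: str, inputs: Sequence[str],
            outputs: Sequence[str], opset: int = 11) -> None:
    """Run the conversion; requires tensorflow and tf2onnx to be installed."""
    subprocess.run(tf2onnx_command(pb_path, onnx_path, inputs, outputs, opset), check=True)
```

For example, convert("model.pb", "model.onnx", ["input_tensor:0"], ["output_tensor:0"]) uses hypothetical tensor names; check=True makes a failed conversion raise instead of passing silently.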

Unity Application + ONNX to TensorRT Conversion

We are still working on releasing the real-time Unity application and onnx2trt conversion scripts. Please bear with us!

References

[1] Zhou, Tinghui, et al. "Stereo magnification: Learning view synthesis using multiplane images." arXiv preprint arXiv:1805.09817 (2018). https://github.com/google/stereo-magnification

[2] Straub, Julian, et al. "The Replica dataset: A digital replica of indoor spaces." arXiv preprint arXiv:1906.05797 (2019). https://github.com/facebookresearch/Replica-Dataset

[3] Wang, Nanyang, et al. "Pixel2Mesh: Generating 3D mesh models from single RGB images." Proceedings of the European Conference on Computer Vision (ECCV). 2018. https://github.com/nywang16/Pixel2Mesh

Owner
Brown University Visual Computing Group