MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

Overview

MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

Codes for the following paper:

MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images
Benjamin Attal, Selena Ling, Aaron Gokaslan, Christian Richardt, James Tompkin
ECCV 2020

High-level overview of approach.

See more at our project page.

If you use these codes, please cite:

@inproceedings{Attal:2020:ECCV,
    author    = "Benjamin Attal and Selena Ling and Aaron Gokaslan and Christian Richardt and James Tompkin",
    title     = "{MatryODShka}: Real-time {6DoF} Video View Synthesis using Multi-Sphere Images",
    booktitle = "European Conference on Computer Vision (ECCV)",
    month     = aug,
    year      = "2020",
    url       = "https://visual.cs.brown.edu/matryodshka"
}

Note that our codes are based on the code from the paper "Stereo Maginification: Learning View Synthesis using Multiplane Images" by Zhou et al. [1], and on the code from the paper "Pixel2mesh: Generating 3D Mesh Models from Single RGB Images." by Wang et al. [3]. Please also cite their work.

Setup

  • Create a conda environment from the matryodshka-gpu.yml file.
  • Run ./download_glob.sh to download the files needed for training and testing.
  • Download the dataset as in Section Replica dataset.

Training the model

See train.py for training the model.

  • To train with transform inverse regularization, use --transform_inverse_reg flag.

  • To train with CoordNet, use --coord_net flag.

  • To experiment with different losses (elpips or l2), use --which_loss flag.

    • To train with spherical weighting on loss maps, use --spherical_attention flag.
  • To train with graph convolution network (GCN), use --gcn flag. Note the particular GCN architecture definition we used is from the Pixel2Mesh repo [3].

  • The current scripts support training on Replica 360 and cubemap dataset and RealEstate10K dataset. Use --input_type to switch between these types of inputs (ODS, PP, REALESTATE_PP).

See scripts/train/*.sh for some sample scripts.

Testing the model

See test.py for testing the model with replica-360 test set.

  • When testing on video frames, e.g. test_video_640x320, include on_video in --test_type flag.
  • When testing on high-resolution images, include high_res in --test_type flag.

See scripts/test/*.sh for sample scripts.

Evaluation

See eval.py for evaluating the model, which saves the metric scores into a json file. We evaluate our models on

  • third-view reconstruction quality

    • See scripts/eval/*-reg.sh for a sample script.
  • frame-to-frame reconstruction differences on video sequences to evaluate the effect of transform inverse regularization on temporal consistency.

    • Include on_video when specifying the --eval_type flag.
    • See scripts/eval/*-video.sh for a sample script.

Pre-trained model

Download models pre-trained with and without transform inverse regularization by running ./download_model.sh. These can also be found here at the Brown library for archival purposes.

Replica dataset

We rendered a 360 and a cubemap dataset for training from the Facebook Replica Dataset [2]. This data can be found here at the Brown library for archival purposes. You should have access to the following datasets.

  • train_640x320
  • test_640x320
  • test_video_640x320

You can also find the camera pose information here that were used to render the training dataset. Each line of the txt fileach line of the txt file is formatted as below:

camera_position_x camera_position_y camera_position_z ods_baseline target1_offset_x target1_offset_y target1_offset_z target2_offset_x target2_offset_y target2_offset_z target3_offset_x target3_offset_y target3_offset_z

We also have a fork of the Replica dataset codebase which can regenerate our data from scratch. This contains customized rendering scripts that allow output of ODS, equirectangular, and cubemap projection spherical imagery, along with corresponding depth maps.

Note that the 360 dataset we release for download was rendered with an incorrect 90-degree camera rotation around the up vector and a horizontal flip. Regenerating the dataset from our released code fork with the customized rendering scripts will not include this coordinate change. The output model performance should be approximately the same.

Exporting the model to ONNX

We export our model to ONNX by firstly converting the checkpoint into a pb file, which then gets converted to an onnx file with the tf2onnx module. See export.py for exporting the model into .pb file.

See scripts/export/model-name.sh for a sample script to run export.py, and scripts/export/pb2onnx.sh for a sample script to run pb-to-onnx conversion.

Unity Application + ONNX to TensorRT Conversion

We are still working on releasing the real-time Unity application and onnx2trt conversion scripts. Please bear with us!

References

[1] Zhou, Tinghui, et al. "Stereo magnification: Learning view synthesis using multiplane images." arXiv preprint arXiv:1805.09817 (2018). https://github.com/google/stereo-magnification

[2] Straub, Julian, et al. "The Replica dataset: A digital replica of indoor spaces." arXiv preprint arXiv:1906.05797 (2019). https://github.com/facebookresearch/Replica-Dataset

[3] Wang, Nanyang, et al. "Pixel2mesh: Generating 3d mesh models from single rgb images." Proceedings of the European Conference on Computer Vision (ECCV). 2018. https://github.com/nywang16/Pixel2Mesh

Owner
Brown University Visual Computing Group
Brown University Visual Computing Group
Instance-wise Occlusion and Depth Orders in Natural Scenes (CVPR 2022)

Instance-wise Occlusion and Depth Orders in Natural Scenes Official source code. Appears at CVPR 2022 This repository provides a new dataset, named In

27 Dec 27, 2022
An air quality monitoring service with a Raspberry Pi and a SDS011 sensor.

Raspberry Pi Air Quality Monitor A simple air quality monitoring service for the Raspberry Pi. Installation Clone the repository and run the following

rydercalmdown 24 Dec 09, 2022
Brain tumor detection using Convolution-Neural Network (CNN)

Detect and Classify Brain Tumor using CNN. A system performing detection and classification by using Deep Learning Algorithms using Convolution-Neural Network (CNN).

assia 1 Feb 07, 2022
Code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".

Consistent Depth of Moving Objects in Video This repository contains training code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in

Google 203 Jan 05, 2023
Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System

Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System The possibilities to involve

Babu Kumaran Nalini 0 Nov 19, 2021
a basic code repository for basic task in CV(classification,detection,segmentation)

basic_cv a basic code repository for basic task in CV(classification,detection,segmentation,tracking) classification generate dataset train predict de

1 Oct 15, 2021
Proposed n-stage Latent Dirichlet Allocation method - A Novel Approach for LDA

n-stage Latent Dirichlet Allocation (n-LDA) Proposed n-LDA & A Novel Approach for classical LDA Latent Dirichlet Allocation (LDA) is a generative prob

Anıl Güven 4 Mar 07, 2022
Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph

NIRPS-ETC Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph February 2

Nolan Grieves 2 Sep 15, 2022
Explaining Deep Neural Networks - A comparison of different CAM methods based on an insect data set

Explaining Deep Neural Networks - A comparison of different CAM methods based on an insect data set This is the repository for the Deep Learning proje

Robert Krug 3 Feb 06, 2022
This project uses ViT to perform image classification tasks on DATA set CIFAR10.

Vision-Transformer-Multiprocess-DistributedDataParallel-Apex Introduction This project uses ViT to perform image classification tasks on DATA set CIFA

Kaicheng Yang 3 Jun 03, 2022
Official code for the paper "Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks".

Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks This repository contains the official code for the

Linus Ericsson 11 Dec 16, 2022
Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification

Fine-grainedImageClassification Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification We trained model here: lin

ZhenchaoTang 14 Oct 21, 2022
Alpha-Zero - Telegram Group Manager Bot Written In Python Using Pyrogram

✨ Alpha Zero Bot ✨ Telegram Group Manager Bot + Userbot Written In Python Using

1 Feb 17, 2022
PyTorch implementation of Graph Convolutional Networks in Feature Space for Image Deblurring and Super-resolution, IJCNN 2021.

GCResNet PyTorch implementation of Graph Convolutional Networks in Feature Space for Image Deblurring and Super-resolution, IJCNN 2021. The code will

11 May 19, 2022
iris - Open Source Photos Platform Powered by PyTorch

Open Source Photos Platform Powered by PyTorch. Submission for PyTorch Annual Hackathon 2021.

Omkar Prabhu 137 Sep 10, 2022
PyTorch implementation of Rethinking Positional Encoding in Language Pre-training

TUPE PyTorch implementation of Rethinking Positional Encoding in Language Pre-training. Quickstart Clone this repository. git clone https://github.com

Jake Tae 5 Jan 27, 2022
EfficientNetV2 implementation using PyTorch

EfficientNetV2-S implementation using PyTorch Train Steps Configure imagenet path by changing data_dir in train.py python main.py --benchmark for mode

Jahongir Yunusov 86 Dec 29, 2022
[CVPR'21] MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation

MonoRUn MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation. CVPR 2021. [paper] Hansheng Chen, Yuyao Huang, Wei Tian*

同济大学智能汽车研究所综合感知研究组 ( Comprehensive Perception Research Group under Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University) 96 Dec 10, 2022
Gas detection for Raspberry Pi using ADS1x15 and MQ-2 sensors

Gas detection Gas detection for Raspberry Pi using ADS1x15 and MQ-2 sensors. Description The MQ-2 sensor can detect multiple gases (CO, H2, CH4, LPG,

Filip Š 15 Sep 30, 2022
Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Hiroshechka Y 33 Dec 26, 2022