MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

Last update: Jan 04, 2023

Project Page | Paper

If you find our work useful for your research, please consider citing our paper:

@article{DBLP:journals/corr/abs-2104-13325,
  author    = {Zhenpei Yang and
               Zhile Ren and
               Qi Shan and
               Qixing Huang},
  title     = {{MVS2D:} Efficient Multi-view Stereo via Attention-Driven 2D Convolutions},
  journal   = {CoRR},
  volume    = {abs/2104.13325},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.13325},
  eprinttype = {arXiv},
  eprint    = {2104.13325},
  timestamp = {Tue, 04 May 2021 15:12:43 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-13325.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

✏️ Changelog

Nov 27 2021

Initial release. Note that our released code achieves better results than those reported in the initial arXiv pre-print. In addition, we include an evaluation on the DTU dataset. We will update our paper soon.

⚙️ Installation

The code is tested with CUDA 10.1. Please use the following commands to install the dependencies:

conda create --name mvs2d python=3.7
conda activate mvs2d

pip install -r requirements.txt

The folder structure should look like the following once you have downloaded all data and pretrained models. Download links are inside each dataset tab at the end of this README.

.
├── configs
├── datasets
├── demo
├── networks
├── scripts
├── pretrained_model
│   ├── demon
│   ├── dtu
│   └── scannet
├── data
│   ├── DeMoN
│   ├── DTU_hr
│   ├── SampleSet
│   ├── ScanNet
│   └── ScanNet_3_frame_jitter_pose.npy
├── splits
│   ├── DeMoN_samples_test_2_frame.npy
│   ├── DeMoN_samples_train_2_frame.npy
│   ├── ScanNet_3_frame_test.npy
│   ├── ScanNet_3_frame_train.npy
│   └── ScanNet_3_frame_val.npy
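
Before training or testing, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `EXPECTED_PATHS` and `find_missing` are hypothetical names, and the entries are copied from the tree above. It assumes you run it from the repository root.

```python
from pathlib import Path

# Entries copied from the folder tree above, relative to the repository root.
EXPECTED_PATHS = [
    "configs",
    "datasets",
    "demo",
    "networks",
    "scripts",
    "pretrained_model/demon",
    "pretrained_model/dtu",
    "pretrained_model/scannet",
    "data/DeMoN",
    "data/DTU_hr",
    "data/SampleSet",
    "data/ScanNet",
    "data/ScanNet_3_frame_jitter_pose.npy",
    "splits/DeMoN_samples_test_2_frame.npy",
    "splits/DeMoN_samples_train_2_frame.npy",
    "splits/ScanNet_3_frame_test.npy",
    "splits/ScanNet_3_frame_train.npy",
    "splits/ScanNet_3_frame_val.npy",
]


def find_missing(root="."):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing entries:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files and folders are present.")
```

If you only work with one dataset, the entries for the other datasets will be reported as missing; that is expected.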

🎬 Demo

After downloading the pretrained models for ScanNet, run the following command to make a prediction on a sample input.

python demo.py --cfg configs/scannet/release.conf

The results are saved as demo.png.

⏳ Training & Testing

We use 4 NVIDIA V100 GPUs for training. You may need to modify CUDA_VISIBLE_DEVICES and the batch size to accommodate your GPU resources.

ScanNet

Download

data 🔗 split 🔗 pretrained models 🔗 noisy pose 🔗

Training

First download and extract the ScanNet training data and split. Then run the following command to train our model.

bash scripts/scannet/train.sh

To train the multi-scale attention model, add --robust 1 to the training command in scripts/scannet/train.sh.

To train our model with noisy input pose, add --perturb_pose 1 to the training command in scripts/scannet/train.sh.

Testing

First download and extract data, split and pretrained models.

Then run:

bash scripts/scannet/test.sh

You should see output similar to the following:

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.059	0.016	0.026	0.157	0.084	0.964	0.995	0.999	0.108	0.079	0.856	0.974	0.996
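
For readers unfamiliar with the column names, the sketch below computes several of them using their conventional definitions from the depth-estimation literature. It is an illustration under that assumption, not the repository's evaluation code: `depth_metrics` is a hypothetical helper, and the remaining columns (log10, abs_diff, abs_diff_median, thre1, thre3, thre5) are left out because this README does not define them.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth-error metrics over paired lists of positive depths."""
    pairs = list(zip(pred, gt))
    n = len(pairs)

    # Mean absolute and squared error, each relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n

    # Root-mean-square error in linear and in log space.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)

    # Fraction of pixels whose depth ratio is below 1.25, 1.25^2, and 1.25^3.
    ratios = [max(p / g, g / p) for p, g in pairs]
    a1 = sum(r < 1.25 for r in ratios) / n
    a2 = sum(r < 1.25 ** 2 for r in ratios) / n
    a3 = sum(r < 1.25 ** 3 for r in ratios) / n

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "a1": a1,
        "a2": a2,
        "a3": a3,
    }


# Toy example: three pixels, the last one predicted at twice its true depth.
metrics = depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, lower is better for the error columns (abs_rel, sq_rel, rmse, rmse_log) and higher is better for the accuracy columns (a1, a2, a3).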

SUN3D/RGBD/Scenes11

Download

data 🔗 split 🔗 pretrained models 🔗

Training

First download and extract the DeMoN training data and split. Then run the following command to train our model.

bash scripts/demon/train.sh

Testing

First download and extract data, split and pretrained models.

Then run:

bash scripts/demon/test.sh

You should see output similar to the following:

dataset rgbd: 160

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.082	0.165	0.047	0.440	0.147	0.921	0.939	0.948	0.325	0.284	0.753	0.894	0.933

dataset scenes11: 256

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.046	0.080	0.018	0.439	0.107	0.976	0.989	0.993	0.155	0.058	0.822	0.945	0.979

dataset sun3d: 160

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.099	0.055	0.044	0.304	0.137	0.893	0.970	0.993	0.224	0.171	0.649	0.890	0.969

-> Done!

depth

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.071	0.096	0.033	0.402	0.127	0.938	0.970	0.981	0.222	0.152	0.755	0.915	0.963
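
The final "depth" row appears consistent with a sample-weighted average of the three per-dataset rows, using the counts printed next to each dataset name (160, 256, 160). This is an observation from the numbers above, not a statement about how the evaluation script computes it. A minimal check for the abs_rel column, with the hypothetical helper `weighted_mean`:

```python
# Sample counts and abs_rel values copied from the output above.
results = {
    "rgbd": (160, 0.082),
    "scenes11": (256, 0.046),
    "sun3d": (160, 0.099),
}


def weighted_mean(per_dataset):
    """Average the metric over datasets, weighting each by its sample count."""
    total = sum(count for count, _ in per_dataset.values())
    return sum(count * value for count, value in per_dataset.values()) / total


overall_abs_rel = weighted_mean(results)
print(round(overall_abs_rel, 3))  # prints 0.071, matching the "depth" row
```

Repeating the calculation for the other columns gives values within about 0.001 of the reported row, which is what you would expect when the inputs are rounded to three decimals.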

DTU

Download

data 🔗 eval data 🔗 pretrained models 🔗

Training

First download and extract the DTU training data. Then run the following command to train our model.

bash scripts/dtu/train.sh

Testing

First download and extract the DTU eval data and pretrained models.

The following command performs three steps together:

1. Generate depth predictions on the DTU test set.
2. Fuse the depth predictions into a final point cloud.
3. Evaluate the predicted point cloud.

Note that we re-implemented the original MATLAB evaluation of the DTU dataset in Python.

bash scripts/dtu/test.sh

You should see output similar to the following:

Acc 0.4051747996189477
Comp 0.2776021161518006
F-score 0.34138845788537414
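
In the sample output above, the value on the "F-score" line coincides with the arithmetic mean of Acc and Comp, the quantity that DTU benchmarks usually report as "overall". This is an observation from the printed numbers only; the sketch below simply reproduces the arithmetic and makes no claim about the evaluation script itself.

```python
# Values copied from the sample output above.
acc = 0.4051747996189477
comp = 0.2776021161518006

# Arithmetic mean of accuracy and completeness.
overall = (acc + comp) / 2
print(overall)
```

The result agrees with the printed value 0.34138845788537414 up to floating-point rounding.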

Acknowledgement

The fusion code for the DTU dataset builds heavily on PatchMatchNet.
