Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

Last update: Jan 04, 2023

Overview

IterMVS

Introduction

IterMVS is a novel learning-based multi-view stereo (MVS) method that combines high efficiency with competitive reconstruction quality. We propose a novel GRU-based estimator that encodes pixel-wise probability distributions of depth in its hidden state. Ingesting multi-scale matching information, our model refines these distributions over multiple iterations and infers depth and confidence. Extensive experiments on DTU, Tanks & Temples and ETH3D show the highest efficiency in both memory and run-time, and better generalization than many state-of-the-art learning-based methods.

If you find this project useful for your research, please cite:

@misc{wang2021itermvs,
      title={IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo}, 
      author={Fangjinhua Wang and Silvano Galliani and Christoph Vogel and Marc Pollefeys},
      year={2021},
      eprint={2112.05126},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Installation

Requirements

Python 3.6
CUDA 10.1

pip install -r requirements.txt

Reproducing Results

Download pre-processed datasets (provided by PatchmatchNet): DTU's evaluation set, Tanks & Temples and ETH3D benchmark. Each dataset is organized as follows:

root_directory
├──scan1 (scene_name1)
├──scan2 (scene_name2) 
      ├── images                 
      │   ├── 00000000.jpg       
      │   ├── 00000001.jpg       
      │   └── ...                
      ├── cams_1                   
      │   ├── 00000000_cam.txt   
      │   ├── 00000001_cam.txt   
      │   └── ...                
      └── pair.txt
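Before running an evaluation, it can help to confirm that a downloaded dataset matches the layout above. The following is a minimal sketch, not part of the repository: the function names check_scan_layout and check_dataset are hypothetical, and the check assumes that images use the .jpg extension and that every image has a camera file with the same eight-digit stem, as the tree suggests.

```python
from pathlib import Path


def check_scan_layout(scan_dir):
    """Return a list of problems found in one scan folder (empty if none)."""
    scan_dir = Path(scan_dir)
    problems = []

    # Assumption: images are stored as images/NNNNNNNN.jpg.
    images = sorted((scan_dir / "images").glob("*.jpg"))
    if not images:
        problems.append("no .jpg files in images/")

    # Each image NNNNNNNN.jpg should have a matching cams_1/NNNNNNNN_cam.txt.
    for image in images:
        cam_file = scan_dir / "cams_1" / f"{image.stem}_cam.txt"
        if not cam_file.is_file():
            problems.append(f"missing cams_1/{image.stem}_cam.txt")

    if not (scan_dir / "pair.txt").is_file():
        problems.append("missing pair.txt")
    return problems


def check_dataset(root_directory):
    """Map each scan folder name to its list of problems."""
    root = Path(root_directory)
    return {
        scan.name: check_scan_layout(scan)
        for scan in sorted(root.iterdir())
        if scan.is_dir()
    }
```

Calling check_dataset on the dataset root returns a dictionary keyed by scan name; a scan with an empty list passed every check.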

The camera file cam.txt stores the camera parameters, which include the extrinsic matrix, the intrinsic matrix, the minimum depth and the maximum depth:

extrinsic
E00 E01 E02 E03
E10 E11 E12 E13
E20 E21 E22 E23
E30 E31 E32 E33

intrinsic
K00 K01 K02
K10 K11 K12
K20 K21 K22

DEPTH_MIN DEPTH_MAX
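The camera file layout above can be read with a few lines of Python. This is an illustrative sketch rather than the loader used in the repository: parse_cam_file is a hypothetical name, and the parser assumes the file contains exactly the keyword "extrinsic", 16 numbers, the keyword "intrinsic", 9 numbers, and then the two depth values, separated by any whitespace. Tokens after the two depth values are ignored.

```python
def parse_cam_file(text):
    """Parse the contents of a *_cam.txt file into plain Python lists.

    Returns (extrinsic, intrinsic, depth_min, depth_max), where extrinsic
    is a 4x4 nested list and intrinsic is a 3x3 nested list.
    """
    tokens = text.split()
    # Assumed layout: 'extrinsic' + 16 numbers, 'intrinsic' + 9 numbers,
    # then DEPTH_MIN and DEPTH_MAX.
    if len(tokens) < 29 or tokens[0] != "extrinsic" or tokens[17] != "intrinsic":
        raise ValueError("unexpected camera file layout")

    ext = [float(t) for t in tokens[1:17]]
    intr = [float(t) for t in tokens[18:27]]

    # Reshape the flat lists into row-major matrices.
    extrinsic = [ext[r * 4:(r + 1) * 4] for r in range(4)]
    intrinsic = [intr[r * 3:(r + 1) * 3] for r in range(3)]
    depth_min, depth_max = float(tokens[27]), float(tokens[28])
    return extrinsic, intrinsic, depth_min, depth_max
```

To use it on a file, pass the file contents, for example parse_cam_file(open(path).read()), where path points to one of the files in cams_1.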

pair.txt stores the view selection result. For each reference image, the 10 best source views are stored in the file:

TOTAL_IMAGE_NUM
IMAGE_ID0                       # index of reference image 0 
10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 0 
IMAGE_ID1                       # index of reference image 1
10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 1 
...
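As an illustration of this format, the sketch below reads the contents of pair.txt into a dictionary. It is not the repository's own reader: parse_pair_file is a hypothetical name, and the parser assumes whitespace-separated tokens, integer image IDs, and numeric scores. It also strips anything after a "#" so that annotated templates like the one above are accepted.

```python
def parse_pair_file(text):
    """Parse pair.txt contents into {reference_id: [(source_id, score), ...]}."""
    # Drop '#' comments, then split everything into whitespace-separated tokens.
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#")[0].split())

    total = int(tokens[0])  # TOTAL_IMAGE_NUM
    pairs = {}
    pos = 1
    for _ in range(total):
        ref_id = int(tokens[pos])       # index of the reference image
        num_src = int(tokens[pos + 1])  # number of source views that follow
        pos += 2
        sources = []
        for _ in range(num_src):
            # Each source view is an (ID, SCORE) pair.
            sources.append((int(tokens[pos]), float(tokens[pos + 1])))
            pos += 2
        pairs[ref_id] = sources
    return pairs
```

The source views are kept in file order, so the first entry in each list is the first source view listed for that reference image.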

Evaluation on DTU:

For DTU's evaluation set, first download our processed camera parameters from here. Unzip the archive and, for every scan, replace the old camera files in the cams_1 folder with the new files.
In eval_dtu.sh, set DTU_TESTING to the root directory of the dataset and --outdir to the directory in which to store the reconstructed point clouds.
CKPT_FILE is the path to the checkpoint file (by default our pretrained model trained on DTU, checkpoints/dtu/model_000015.ckpt).
Test on GPU by running bash eval_dtu.sh. The code performs depth map estimation and depth fusion. The outputs are point clouds in ply format.
For quantitative evaluation, download SampleSet and Points from DTU's website. Unzip them and place the Points folder in SampleSet/MVS Data/. The structure looks like:

SampleSet
├──MVS Data
      └──Points

In evaluations/dtu/BaseEvalMain_web.m, set dataPath to the path of SampleSet/MVS Data/, plyPath to the directory that stores the reconstructed point clouds, and resultsPath to the directory in which to store the evaluation results. Then run evaluations/dtu/BaseEvalMain_web.m in MATLAB.

The results look like:

Acc. (mm)	Comp. (mm)	Overall (mm)
0.373	0.354	0.363

Evaluation on Tanks & Temples:

In eval_tanks.sh, set TANK_TESTING to the root directory of the dataset and --outdir to the directory in which to store the reconstructed point clouds.
CKPT_FILE is the path to the checkpoint file (by default our pretrained model trained on DTU, checkpoints/dtu/model_000015.ckpt). We also provide a pretrained model trained on BlendedMVS (checkpoints/blendedmvs/model_000015.ckpt).
Test on GPU by running bash eval_tanks.sh. The code performs depth map estimation and depth fusion. The outputs are point clouds in ply format.
For our detailed quantitative results on Tanks & Temples, please check the leaderboards (Tanks & Temples: trained on DTU, Tanks & Temples: trained on BlendedMVS).

Evaluation on ETH3D:

In eval_eth.sh, set ETH3D_TESTING to the root directory of the dataset and --outdir to the directory in which to store the reconstructed point clouds.
CKPT_FILE is the path to the checkpoint file (by default our pretrained model trained on DTU, checkpoints/dtu/model_000015.ckpt). We also provide a pretrained model trained on BlendedMVS (checkpoints/blendedmvs/model_000015.ckpt).
Test on GPU by running bash eval_eth.sh. The code performs depth map estimation and depth fusion. The outputs are point clouds in ply format.
For our detailed quantitative results on ETH3D, please check the leaderboards (ETH3D: trained on DTU, ETH3D: trained on BlendedMVS).

Evaluation on custom dataset:

We support preparing a custom dataset from COLMAP's results. The script colmap_input.py (modified from the MVSNet script) converts COLMAP's sparse reconstruction results into the same format as the datasets that we provide.
Test on GPU by running bash eval_custom.sh.

Training

DTU

Download the pre-processed DTU training set (provided by PatchmatchNet). The dataset is already organized as follows:

root_directory
├──Cameras_1
├──Rectified
└──Depths_raw

Download our processed camera parameters from here. Unzip all the camera folders into root_directory/Cameras_1.
In train_dtu.sh, set MVS_TRAINING to the root directory of the dataset and --logdir to the directory in which to store the checkpoints.
Train the model by running bash train_dtu.sh.

BlendedMVS

Download the dataset.
In train_blend.sh, set MVS_TRAINING to the root directory of the dataset and --logdir to the directory in which to store the checkpoints.
Train the model by running bash train_blend.sh.

Acknowledgements

Thanks to Yao Yao for open-sourcing his excellent work MVSNet. Thanks to Xiaoyang Guo for open-sourcing his PyTorch implementation of MVSNet, MVSNet-pytorch.

Owner

Fangjinhua Wang
