PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

Last update: Jan 04, 2023

Related tags

Deep Learning Neural-Scene-Flow-Fields

Overview

Neural Scene Flow Fields

PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

[Project Website] [Paper] [Video]

Dependency

The code is tested with Python3, Pytorch >= 1.6 and CUDA >= 10.2, the dependencies includes

configargparse
matplotlib
opencv
scikit-image
scipy
cupy
imageio.
tqdm

Video preprocessing

Download nerf_data.zip from link, an example input video with SfM camera poses and intrinsics estimated from COLMAP (Note you need to use COLMAP "colmap image_undistorter" command to undistort input images to get "dense" folder as shown in the example, this dense folder should include "images" and "sparse" folders).
Download single view depth prediction model "model.pt" from link, and put it on the folder "nsff_scripts".
Run the following commands to generate required inputs for training/inference:

    # Usage
    cd nsff_scripts
    # create camera intrinsics/extrinsic format for NSFF, same as original NeRF where it uses imgs2poses.py script from the LLFF code: https://github.com/Fyusion/LLFF/blob/master/imgs2poses.py
    python save_poses_nerf.py --data_path "/home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/"
    # Resize input images and run single view model
    python run_midas.py --data_path "/home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/" --input_w 640 --input_h 360 --resize_height 288
    # Run optical flow model (for easy setup and Pytorch version consistency, we use RAFT as backbond optical flow model, but should be easy to change to other models such as PWC-Net or FlowNet2.0)
    ./download_models.sh
    python run_flows_video.py --model models/raft-things.pth --data_path /home/xxx/Neural-Scene-Flow-Fields/kid-running/dense/ --epi_threhold 1.0 --input_flow_w 768 --input_semantic_w 1024 --input_semantic_h 576

Rendering from an example pretrained model

Download pretraind model "kid-running_ndc_5f_sv_of_sm_unify3_F00-30.zip" from link. Unzipping and putting it in the folder "nsff_exp/logs/kid-running_ndc_5f_sv_of_sm_unify3_F00-30/360000.tar".

Set datadir in config/config_kid-running.txt to the root directory of input video. Then go to directory "nsff_exp":

   cd nsff_exp

Rendering of fixed time, viewpoint interpolation

   python run_nerf.py --config configs/config_kid-running.txt --render_bt --target_idx 10

By running the example command, you should get the following result:

Rendering of fixed viewpoint, time interpolation

   python run_nerf.py --config configs/config_kid-running.txt --render_lockcam_slowmo --target_idx 8

By running the example command, you should get the following result:

Rendering of space-time interpolation

   python run_nerf.py --config configs/config_kid-running.txt --render_slowmo_bt  --target_idx 10

By running the example command, you should get the following result:

Training

In configs/config_kid-running.txt, modifying expname to any name you like (different from the original one), and running the following command to train the model:

    python run_nerf.py --config configs/config_kid-running.txt

The per-scene training takes ~2 days using 2 Nvidia V100 GPUs.

Several parameters in config files you might need to know for training a good model

N_samples: in order to render images with higher resolution, you have to increase number sampled points
start_frame, end_frame: indicate training frame range. The default model usually works for video of 1~2s. Training on longer frames can cause oversmooth rendering. To mitigate the effect, you can increase the capacity of the network by increasing netwidth (but it can drastically increase training time and memory usage).
decay_iteration: number of iteartion in initialization stage. Data-driven losses will decay every 1000*decay_iteration steps. It's usually good to match decay_iteration to the number of training frames.
no_ndc: our current implementation only supports reconstruction in NDC space, meaning it only works for forward-facing scene like original NeRF. But it should be not hard to adapt to euclidean space.
use_motion_mask, num_extra_sample: whether to use estimated coarse motion segmentation mask to perform hard-mining sampling during initialization stage, and how many extra samples during initialization stage.
w_depth, w_optical_flow: weight of losses for single-view depth and geometry consistency priors described in the paper
w_cycle: weights of scene flow cycle consistency loss
w_sm: weight of scene flow smoothness loss
w_prob_reg: weight of disocculusion weight regularization

Evaluation on the Dynamic Scene Dataset

Download Dynamic Scene dataset "dynamic_scene_data_full.zip" from link
Download pretrained model "dynamic_scene_pretrained_models.zip" from link, unzip and put them in the folder "nsff_exp/logs/"
Run the following command for each scene to get quantitative results reported in the paper:

   # Usage: configs/config_xxx.txt indicates each scene name such as config_balloon1-2.txt in nsff/configs
   python evaluation.py --config configs/config_xxx.txt

Note: you have to use modified LPIPS implementation included in this branch in order to measure LIPIS error for dynamic region only as described in the paper.

Acknowledgment

The code is based on implementation of several prior work:

License

This repository is released under the MIT license.

Citation

If you find our code/models useful, please consider citing our paper:

@article{li2020neural,
  title={Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes},
  author={Li, Zhengqi and Niklaus, Simon and Snavely, Noah and Wang, Oliver},
  journal={arXiv preprint arXiv:2011.13084},
  year={2020}
}

PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

Related tags

Overview

Neural Scene Flow Fields

Dependency

Video preprocessing

Rendering from an example pretrained model

Training

Evaluation on the Dynamic Scene Dataset

Acknowledgment

License

Citation

Owner

Zhengqi Li

Implementation of the "Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos" paper.

[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Styled Handwritten Text Generation with Transformers (ICCV 21)

Code for "Intra-hour Photovoltaic Generation Forecasting based on Multi-source Data and Deep Learning Methods."

Use unsupervised and supervised learning to predict stocks

Jittor implementation of PCT:Point Cloud Transformer

Implementation of " SESS: Self-Ensembling Semi-Supervised 3D Object Detection" (CVPR2020 Oral)

The official PyTorch implementation for NCSNv2 (NeurIPS 2020)

The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"

This repository comes with the paper "On the Robustness of Counterfactual Explanations to Adverse Perturbations"

Multi-Glimpse Network With Python

The original implementation of TNDM used in the NeurIPS 2021 paper (no longer being updated)

BADet: Boundary-Aware 3D Object Detection from Point Clouds (Pattern Recognition 2022)

ICCV2021 Oral SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

Implementation of association rules mining algorithms (Apriori|FPGrowth) using python.

Code and models for "Pano3D: A Holistic Benchmark and a Solid Baseline for 360 Depth Estimation", OmniCV Workshop @ CVPR21.

Deep Learning Package based on TensorFlow

ComputerVision - This repository aims at realized easy network architecture

The Python ensemble sampling toolkit for affine-invariant MCMC

Fast, flexible and fun neural networks.