PyTorch DepthNet Training on Still Box dataset

Overview

Project page

This code can replicate the results of our paper, published at UAVg-17. If you use this repo in your work, please cite us with the following BibTeX:

@Article{isprs-annals-IV-2-W3-67-2017,
AUTHOR = {Pinard, C. and Chevalley, L. and Manzanera, A. and Filliat, D.},
TITLE = {END-TO-END DEPTH FROM MOTION WITH STABILIZED MONOCULAR VIDEOS},
JOURNAL = {ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences},
VOLUME = {IV-2/W3},
YEAR = {2017},
PAGES = {67--74},
URL = {https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W3/67/2017/},
DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
}

[depthnet figure]

End-to-end depth from motion with stabilized monocular videos

  • This code shows how the purely translational movement of the camera can be leveraged to compute a very precise depth map, even at distances of more than 300 times the displacement.
  • Thus, for a camera movement of 30 cm (the nominal displacement used here), you can see as far as 100 m.

See our second paper for information about using this code on real videos with speed estimation:

Multi range Real-time depth inference from a monocular stabilized footage using a Fully Convolutional Neural Network

Click below for video

[youtube video]

DepthNet

DepthNet is a network designed to infer a depth map directly from a pair of stabilized images; a minimal usage sketch follows the list below.

  • No information is given about the movement direction
  • DepthNet is fully convolutional, which means it is completely robust to optical center shifts
  • This network only works for pinhole-like pictures
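Below is a minimal usage sketch, assuming a module path, checkpoint layout, and 6-channel input convention (the two frames concatenated channel-wise); these names are illustrative assumptions, not the repo's confirmed API.

import torch
from models import DepthNet  # assumed module path

model = DepthNet()
checkpoint = torch.load('DepthNet_elu_bn_512.pth.tar', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])  # assumed checkpoint layout
model.eval()

# two stabilized frames (1 x 3 x H x W each), concatenated channel-wise
frame1 = torch.rand(1, 3, 512, 512)
frame2 = torch.rand(1, 3, 512, 512)
pair = torch.cat([frame1, frame2], dim=1)  # 1 x 6 x 512 x 512

with torch.no_grad():
    depth = model(pair)  # dense depth map for the frame pair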

Still Box

[stillbox figure]

Still Box is a dataset created specifically for supervised training of depth map inference on stabilized aerial footage. It mimics typical drone footage of static scenes in which depth is impossible to infer from a single image, as shapes appear at all kinds of sizes and positions.

  • You can download it here
  • The dataset webpage also provides a tutorial on how to read the data

Training

Requirements

[sudo] pip3 install -r requirements.txt

If you want to log some outputs from the validation set with the --log-output option, you need the OpenCV Python bindings to convert depth to RGB with a rainbow colormap.

If you don't have OpenCV, grayscale images will be logged.
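For reference, a minimal sketch of such a conversion, assuming metric depth is normalized by a cap (the helper name and the 100 m cap are illustrative, not the repo's exact code):

import cv2
import numpy as np

def depth_to_rainbow(depth, max_depth=100.0):
    # normalize metric depth to [0, 255] before applying the colormap
    normalized = np.clip(depth / max_depth, 0.0, 1.0)
    as_u8 = (255 * normalized).astype(np.uint8)
    # OpenCV's built-in rainbow colormap returns a BGR image
    return cv2.applyColorMap(as_u8, cv2.COLORMAP_RAINBOW)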

Usage

Best results can be obtained by training on Still Box 64 and then fine-tuning successively up to the resolution you target. Here are the parameters used for the paper (note how learning rate and batch size change; training was done on a single GTX 980Ti).

python3 train.py -j8 --lr 0.01 /path/to/still_box/64/ --log-output --activation-function elu --bn
python3 train.py -j8 --lr 0.01 /path/to/still_box/128/ --log-output --activation-function elu --bn --pretrained /path/to/DepthNet64
python3 train.py -j8 --lr 0.001 /path/to/still_box/256/ --log-output --activation-function elu --bn -b64 --pretrained /path/to/DepthNet128
python3 train.py -j8 --lr 0.001 /path/to/still_box/512/ --log-output --activation-function elu --bn -b16 --pretrained /path/to/DepthNet256

Note: you can skip the 128 and 256 trainings if you don't have time; results will be only slightly worse. However, you need to do the 64 training first, as stated in our first paper. This might have something to do with either the size of the 64 dataset (in terms of scene count) or the fact that feature maps are reduced down to 1x1, making the last convolution an FC-equivalent operation.
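The FC-equivalence is easy to check: a 1x1 convolution applied to a 1x1 feature map behaves exactly like a linear layer. A small self-contained demonstration (channel sizes are arbitrary, not DepthNet's actual widths):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=512, out_channels=256, kernel_size=1)
fc = nn.Linear(in_features=512, out_features=256)

# copy the convolution weights into the linear layer
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(256, 512))
    fc.bias.copy_(conv.bias)

x = torch.randn(1, 512, 1, 1)  # a feature map reduced down to 1x1
print(torch.allclose(conv(x).flatten(1), fc(x.flatten(1)), atol=1e-6))  # True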

Pretrained networks

Best results were obtained with ELU for the depth activation (not mentioned in the original paper), along with BatchNorm.

Name                          Training set   Error (m)
DepthNet_elu_bn_64.pth.tar    64             4.65       Link
DepthNet_elu_bn_128.pth.tar   128            3.08       Link
DepthNet_elu_bn_256.pth.tar   256            2.29       Link
DepthNet_elu_bn_512.pth.tar   512            1.97       Link

All the networks have the same size and same structure.

Custom FOV and focal length

Every image in Still Box has a 90° FOV (field of view); the focal length (in pixels) is then respectively

  • 32px for 64x64 images
  • 64px for 128x128 images
  • 128px for 256x256 images
  • 256px for 512x512 images

Training is not flexible to focal length; for a custom focal length you will have to run a dedicated training.

If you need to use a custom focal length and FOV, you can simply crop and resize the pictures.

Say you have a picture of width w with an associated FOV fov. To get an equivalent picture from one of the datasets, first crop the Still Box pictures so that their FOV matches fov (cropping doesn't affect focal length in pixels), and then resize them to w. Note that DepthNet can take rectangular pictures as input.

cropped_w = still_box_w * tan(pi*fov/360)

We naturally recommend doing this operation offline; metadata from metadata.json won't need to be altered.
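As a sketch of this offline preprocessing (center crop, square images, PIL used for illustration): targeting fov = 60° from the 512x512 set gives cropped_w = 512 * tan(pi*60/360) ≈ 296px.

from math import pi, tan
from PIL import Image

def crop_and_resize(still_box_image, fov, w):
    # center-crop a 90°-FOV Still Box picture down to the target FOV,
    # then resize to the target width w (square output assumed here)
    W = still_box_image.width
    cropped_w = round(W * tan(pi * fov / 360))
    left = (W - cropped_w) // 2
    top = (still_box_image.height - cropped_w) // 2
    cropped = still_box_image.crop((left, top, left + cropped_w, top + cropped_w))
    return cropped.resize((w, w))

# e.g. result = crop_and_resize(Image.open('some_scene.jpg'), fov=60, w=320)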

With a pretrained DepthNet

If you can resize your test pictures, then thanks to its fully convolutional architecture DepthNet is flexible to FOV, as long as it stays below 90° (or the maximum FOV encountered during training). Referring back to our width w and FOV fov, with a network trained at a particular focal length f, the width to resize to is:

resized_w = 2*f*tan(pi*fov/360)

That way, you won't have to run a dedicated training or even download the Still Box dataset.
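As a quick sketch with hypothetical numbers: with the 512-trained network, f = 256px, so a 60° FOV video should be resized to 2 * 256 * tan(pi*60/360) ≈ 296px wide, whatever its original width.

from math import pi, tan

def width_for_pretrained(f, fov):
    # width to resize your frames to so that their focal length (in pixels)
    # matches the focal length f the pretrained network was trained with
    return round(2 * f * tan(pi * fov / 360))

print(width_for_pretrained(f=256, fov=60))  # -> 296, for the 512-trained network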


/!\ These equations are only valid for pinhole-equivalent cameras. Be sure to correct distortion before using DepthNet.

Testing Inference

The run_inference.py script lets you run inference on a folder of images and save the depth maps under different visualizations.

A simple Still Box scene of 512x512 pictures for testing can be downloaded here. Otherwise, any folder with a list of JPEG images will do, provided you follow the guidelines above.

python3 run_inference.py --output-depth --no-resize --dataset-dir /path/to/stub_box --pretrained /path/to/DepthNet512 --frame-shift 3 --output-dir /path/to/save/outputs

Visualise training

Training can be visualized via TensorBoard by launching this command in another terminal:

tensorboard --logdir=/path/to/DepthNet/Results

You can then access the board from any computer on the local network by browsing to machine_ip:6006, just as with a regular TensorBoard server. More info here.
