PyTorch DepthNet Training on Still Box dataset

Overview

DepthNet training on Still Box

Project page

This code can replicate the results of our paper that was published in UAVg-17. If you use this repo in your work, please cite us with the following bibtex :

@Article{isprs-annals-IV-2-W3-67-2017,
AUTHOR = {Pinard, C. and Chevalley, L. and Manzanera, A. and Filliat, D.},
TITLE = {END-TO-END DEPTH FROM MOTION WITH STABILIZED MONOCULAR VIDEOS},
JOURNAL = {ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences},
VOLUME = {IV-2/W3},
YEAR = {2017},
PAGES = {67--74},
URL = {https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W3/67/2017/},
DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
}

depthnet

End-to-end depth from motion with stabilized monocular videos

  • This code shows how the only translational movement of the camera can be leveraged to compute a very precise depth map, even at more than 300 times the displacement.
  • Thus, for a camera movement of 30cm (nominal displacement used here), you can see as far as 100m.

See our second paper for information about using this code on real videos with speed estimation

Multi range Real-time depth inference from a monocular stabilized footage using a Fully Convolutional Neural Network

Click Below for video

youtube video

DepthNet

DepthNet is a network designed to infer Depth Map directly from a pair of stabilized image.

  • No information is given about movement direction
  • DepthNet is Fully Convolutional, which means it is completely robust to optical center fault
  • This network only works for pinhole-like pictures

Still Box

stillbox

Still box is a dataset created specifically for supervised training of depth map inference for stabilized aerial footage. It tries to mimic typical drone footages in static scenes, and depth is impossible to infer from a single image, as shapes get all kinds of sizes and positions.

  • You can download it here
  • The dataset webpage also provides a tutorial on how to read the data

Training

Requirements

[sudo] pip3 install -r requirements.txt

If you want to log some outputs from the validation set with the --log-output option, you need openCV python bindings to convert depth to RGB with a rainbow colormap.

If you don't have opencv, grayscales will be logged

Usage

Best results can be obtained by training on still box 64 and then finetuned successively up to the resolution you target. Here are the parameters used for the paper (please note how learning rate and batch size are changed, training was done a single GTX 980Ti).

python3 train.py -j8 --lr 0.01 /path/to/still_box/64/ --log-output --activation-function elu --bn
python3 train.py -j8 --lr 0.01 /path/to/still_box/128/ --log-output --activation-function elu --bn --pretrained /path/to/DepthNet64
python3 train.py -j8 --lr 0.001 /path/to/still_box/256/ --log-output --activation-function elu --bn -b64 --pretrained /path/to/DepthNet128
python3 train.py -j8 --lr 0.001 /path/to/still_box/512/ --log-output --activation-function elu --bn -b16 --pretrained /path/to/DepthNet256

Note: You can skip 128 and 256 training if you don't have time, results will be only slightly worse. However, you need to do 64 training first as stated by our first paper. This might has something to do with either the size of 64 dataset (in terms of scene numbers) or the fact that feature maps are reduced down to 1x1 making last convolution a FC equivalent operation

Pretrained networks

Best results were obtained with elu for depth activation (not mentionned in the original paper), along with BatchNorm.

Name training set Error (m)
DepthNet_elu_bn_64.pth.tar 64 4.65 Link
DepthNet_elu_bn_128.pth.tar 128 3.08 Link
DepthNet_elu_bn_256.pth.tar 256 2.29 Link
DepthNet_elu_bn_512.pth.tar 512 1.97 Link

All the networks have the same size and same structure.

Custom FOV and focal length

Every image in still box is 90° of FOV (field of view), focal length (in pixels) is then respectively

  • 32px for 64x64 images
  • 64px for 128x128 images
  • 128px for 128x128 images
  • 256px for 512x512 images

Training is not flexible to focal length, and for a custom focal length you will have to run a dedicated training.

If you need to use a custom focal length and FOV you can simply resize the pictures and crop them.

Say you have a picture of width w with an associated FOV fov. To get equivalent from one of the datasets you can first crop the still box pictures so that FOV will match fov (cropping doesn't affect focal length in pixels), and then resize it to w. Note that DepthNet can take rectangular pictures as input.

cropped_w = w/tan(pi*fov/360)

we naturally recommend to do this operation offline, metadata from metadata.json won't need to be altered.

with pretrained DepthNet

If you can resize your test pictures, thanks to its fully convolutional architecture, DepthNet is flexible to fov, as long as it stays below 90° (or max FOV encountered during training). Referring back to our witdh w and FOV fov we get with a network trained with a particular focal length f the following width to resize to:

resized_w = f/2*tan(pi*fov/360)

That way, you won't have to make a dedicated training or even download the still box dataset


/!\ These equations are only valid with pinhole equivalent cameras. Be sure to correct distortion before using DepthNet

Testing Inference

The run_inference.py lets you run an inference on a folder of images, and save the depth maps in different visualizations.

A simple still box scene of 512x512 pictures for testing can be downloaded here. Otherwise, any folder with a list of jpg images will do, provided you follow the guidelines above.

python3 run_inference.py --output-depth --no-resize --dataset-dir /path/to/stub_box --pretrained /path/to/DepthNet512 --frame-shift 3 --output-dir /path/to/save/outputs

Visualise training

Training can be visualized via tensorboard by launching this command in another terminal

tensorboard --logdir=/path/to/DepthNet/Results

You can then access the board from any computer in the local network by accessing machine_ip:6006 from a web browser, just as a regular tensorboard server. More info here

Owner
Clément Pinard
PhD ENSTA Paris, Deep Learning Engineer @ ContentSquare
Clément Pinard
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021)

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021) PyTorch implementation of SnapMix | paper Method Overview Cite

DavidHuang 126 Dec 30, 2022
Download & Install mods for your favorit game with a few simple clicks

Husko's SteamWorkshop Downloader 🔴 IMPORTANT ❗ 🔴 The Tool is currently being rewritten so updates will be slow and only on the dev branch until it i

Husko 67 Nov 25, 2022
Model of an AI powered sign language interpreter.

TEXT AND SPEECH TO SIGN LANGUAGE. A web application which takes in text or live audio speech recording as input, converts and displays the relevant Si

Mark Gatere 4 Mar 30, 2022
A python-image-classification web application project, written in Python and served through the Flask Microframework

A python-image-classification web application project, written in Python and served through the Flask Microframework. This Project implements the VGG16 covolutional neural network, through Keras and

Gerald Maduabuchi 19 Dec 12, 2022
Tool which allow you to detect and translate text.

Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr

Damian Panek 176 Nov 28, 2022
Code for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning Pytorch Implementation for DisCo: Remedy Self-supervi

79 Jan 06, 2023
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021)

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021) An efficient PyTorch library for Point Cloud Completion.

Microsoft 119 Jan 02, 2023
Code for paper Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting

Decoupled Spatial-Temporal Graph Neural Networks Code for our paper: Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting.

S22 43 Jan 04, 2023
Answering Open-Domain Questions of Varying Reasoning Steps from Text

This repository contains the authors' implementation of the Iterative Retriever, Reader, and Reranker (IRRR) model in the EMNLP 2021 paper "Answering Open-Domain Questions of Varying Reasoning Steps

26 Dec 22, 2022
Object DGCNN and DETR3D, Our implementations are built on top of MMdetection3D.

Object DGCNN & DETR3D This repo contains the implementations of Object DGCNN (https://arxiv.org/abs/2110.06923) and DETR3D (https://arxiv.org/abs/2110

Wang, Yue 539 Jan 07, 2023
EqGAN - Improving GAN Equilibrium by Raising Spatial Awareness

EqGAN - Improving GAN Equilibrium by Raising Spatial Awareness Improving GAN Equilibrium by Raising Spatial Awareness Jianyuan Wang, Ceyuan Yang, Ying

GenForce: May Generative Force Be with You 149 Dec 19, 2022
Semi-Autoregressive Transformer for Image Captioning

Semi-Autoregressive Transformer for Image Captioning Requirements Python 3.6 Pytorch 1.6 Prepare data Please use git clone --recurse-submodules to clo

YE Zhou 23 Dec 09, 2022
Neon: an add-on for Lightbulb making it easier to handle component interactions

Neon Neon is an add-on for Lightbulb making it easier to handle component interactions. Installation pip install git+https://github.com/neonjonn/light

Neon Jonn 9 Apr 29, 2022
A smaller subset of 10 easily classified classes from Imagenet, and a little more French

Imagenette 🎶 Imagenette, gentille imagenette, Imagenette, je te plumerai. 🎶 (Imagenette theme song thanks to Samuel Finlayson) NB: Versions of Image

fast.ai 718 Jan 01, 2023
Discord bot for notifying on github events

Git-Observer Discord bot for notifying on github events ⚠️ This bot is meant to write messages to only one channel (implementing this for multiple pro

ilu_vatar_ 0 Apr 19, 2022
We present a regularized self-labeling approach to improve the generalization and robustness properties of fine-tuning.

Overview This repository provides the implementation for the paper "Improved Regularization and Robustness for Fine-tuning in Neural Networks", which

NEU-StatsML-Research 21 Sep 08, 2022
How the Deep Q-learning method works and discuss the new ideas that makes the algorithm work

Deep Q-Learning Recommend papers The first step is to read and understand the method that you will implement. It was first introduced in a 2013 paper

1 Jan 25, 2022
Simple codebase for flexible neural net training

neural-modular Simple codebase for flexible neural net training. Allows for seamless exchange of models, dataset, and optimizers. Uses hydra for confi

Jannik Kossen 7 Apr 05, 2022
DrQ-v2: Improved Data-Augmented Reinforcement Learning

DrQ-v2: Improved Data-Augmented RL Agent Method DrQ-v2 is a model-free off-policy algorithm for image-based continuous control. DrQ-v2 builds on DrQ,

Facebook Research 234 Jan 01, 2023
Train an RL agent to execute natural language instructions in a 3D Environment (PyTorch)

Gated-Attention Architectures for Task-Oriented Language Grounding This is a PyTorch implementation of the AAAI-18 paper: Gated-Attention Architecture

Devendra Chaplot 234 Nov 05, 2022