Code for the upcoming CVPR 2021 paper

Overview

The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth

Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel J. Brostow and Michael FirmanCVPR 2021

[Link to paper]

We introduce ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available.

  • Self-supervised: We train from monocular video only. No depths or poses are needed at training or test time.
  • Good depths from single frames; even better depths from short sequences.
  • Efficient: Only one forward pass at test time. No test-time optimization needed.
  • State-of-the-art self-supervised monocular-trained depth estimation on KITTI and CityScapes.

Overview

Cost volumes are commonly used for estimating depths from multiple input views:

Cost volume used for aggreagting sequences of frames

However, cost volumes do not easily work with self-supervised training.

Baseline: Depth from cost volume input without our contributions

In our paper, we:

  • Introduce an adaptive cost volume to deal with unknown scene scales
  • Fix problems with moving objects
  • Introduce augmentations to deal with static cameras and start-of-sequence frames

These contributions enable cost volumes to work with self-supervised training:

ManyDepth: Depth from cost volume input with our contributions

With our contributions, short test-time sequences give better predictions than methods which predict depth from just a single frame.

ManyDepth vs Monodepth2 depths and error maps

✏️ 📄 Citation

If you find our work useful or interesting, please cite our paper:

@inproceedings{watson2021temporal,
    author = {Jamie Watson and
              Oisin Mac Aodha and
              Victor Prisacariu and
              Gabriel Brostow and
              Michael Firman},
    title = {{The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth}},
    booktitle = {Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}

📈 Results

Our ManyDepth method outperforms all previous methods in all subsections across most metrics, whether or not the baselines use multiple frames at test time. See our paper for full details.

KITTI results table

👀 Reproducing Paper Results

To recreate the results from our paper, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.train \
    --data_path <your_KITTI_path> \
    --log_dir <your_save_path>  \
    --model_name <your_model_name>

Depending on the size of your GPU, you may need to set --batch_size to be lower than 12. Additionally you can train a high resolution model by adding --height 320 --width 1024.

For instructions on downloading the KITTI dataset, see Monodepth2

To train a CityScapes model, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.train \
    --data_path <your_preprocessed_cityscapes_path> \
    --log_dir <your_save_path>  \
    --model_name <your_model_name> \
    --dataset cityscapes_preprocessed \
    --split cityscapes_preprocessed \
    --freeze_teacher_epoch 5 \
    --height 192 --width 512

This assumes you have already preprocessed the CityScapes dataset using SfMLearner's prepare_train_data.py script. We used the following command:

python prepare_train_data.py \
    --img_height 512 \
    --img_width 1024 \
    --dataset_dir <path_to_downloaded_cityscapes_data> \
    --dataset_name cityscapes \
    --dump_root <your_preprocessed_cityscapes_path> \
    --seq_length 3 \
    --num_threads 8

Note that while we use the --img_height 512 flag, the prepare_train_data.py script will save images which are 1024x384 as it also crops off the bottom portion of the image. You could probably save disk space without a loss of accuracy by preprocessing with --img_height 256 --img_width 512 (to create 512x192 images), but this isn't what we did for our experiments.

💾 Pretrained weights and evaluation

You can download weights for some pretrained models here:

To evaluate a model on KITTI, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.evaluate_depth \
    --data_path <your_KITTI_path> \
    --load_weights_folder <your_model_path>
    --eval_mono

Make sure you have first run export_gt_depth.py to extract ground truth files.

And to evaluate a model on Cityscapes, run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> \
python -m manydepth.evaluate_depth \
    --data_path <your_cityscapes_path> \
    --load_weights_folder <your_model_path>
    --eval_mono \
    --eval_split cityscapes

During evaluation, we crop and evaluate on the middle 50% of the images.

We provide ground truth depth files HERE, which were converted from pixel disparities using intrinsics and the known baseline. Download this and unzip into splits/cityscapes.

🖼 Running on your own images

We provide some sample code in test_simple.py which demonstrates multi-frame inference. This predicts depth for a sequence of two images cropped from a dashcam video. Prediction also requires an estimate of the intrinsics matrix, in json format. For the provided test images, we have estimated the intrinsics to be equivalent to those of the KITTI dataset. Note that the intrinsics provided in the json file are expected to be in normalised coordinates.

Download and unzip model weights from one of the links above, and then run the following command:

python -m manydepth.test_simple \
    --target_image_path assets/test_sequence_target.jpg \
    --source_image_path assets/test_sequence_source.jpg \
    --intrinsics_json_path assets/test_sequence_intrinsics.json \
    --model_path path/to/weights

A predicted depth map rendering will be saved to assets/test_sequence_target_disp.jpeg.

👩‍⚖️ License

Copyright © Niantic, Inc. 2021. Patent Pending. All rights reserved. Please see the license file for terms.

Owner
Niantic Labs
Building technologies and ideas that move us
Niantic Labs
DeepDiffusion: Unsupervised Learning of Retrieval-adapted Representations via Diffusion-based Ranking on Latent Feature Manifold

DeepDiffusion Introduction This repository provides the code of the DeepDiffusion algorithm for unsupervised learning of retrieval-adapted representat

4 Nov 15, 2022
Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?" Install // Datasets // Experiments // Models // License // Reference Full video Offi

Toyota Research Institute - Machine Learning 364 Dec 27, 2022
Code for the paper: Fighting Fake News: Image Splice Detection via Learned Self-Consistency

Fighting Fake News: Image Splice Detection via Learned Self-Consistency [paper] [website] Minyoung Huh *12, Andrew Liu *1, Andrew Owens1, Alexei A. Ef

minyoung huh (jacob) 174 Dec 09, 2022
Some experiments with tennis player aging curves using Hilbert space GPs in PyMC. Only experimental for now.

NOTE: This is still being developed! Setup notes This document uses Jeff Sackmann's tennis data. You can obtain it as follows: git clone https://githu

Martin Ingram 1 Jan 20, 2022
A Human-in-the-Loop workflow for creating HD images from text

A Human-in-the-Loop? workflow for creating HD images from text DALL·E Flow is an interactive workflow for generating high-definition images from text

Jina AI 2.5k Jan 02, 2023
A Re-implementation of the paper "A Deep Learning Framework for Character Motion Synthesis and Editing"

What is This This is a simple re-implementation of the paper "A Deep Learning Framework for Character Motion Synthesis and Editing"(1). Only Sections

102 Dec 14, 2022
Repo for CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

CReST in Tensorflow 2 Code for the paper: "CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning" by Chen Wei, Ki

Google Research 75 Nov 01, 2022
A series of convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python.

imutils A series of convenience functions to make basic image processing functions such as translation, rotation, resizing, skeletonization, and displ

Adrian Rosebrock 4.3k Jan 08, 2023
Keyword2Text This repository contains the code of the paper: "A Plug-and-Play Method for Controlled Text Generation"

Keyword2Text This repository contains the code of the paper: "A Plug-and-Play Method for Controlled Text Generation", if you find this useful and use

57 Dec 27, 2022
Nested cross-validation is necessary to avoid biased model performance in embedded feature selection in high-dimensional data with tiny sample sizes

Pruner for nested cross-validation - Sphinx-Doc Nested cross-validation is necessary to avoid biased model performance in embedded feature selection i

1 Dec 15, 2021
A PyTorch Toolbox for Face Recognition

FaceX-Zoo FaceX-Zoo is a PyTorch toolbox for face recognition. It provides a training module with various supervisory heads and backbones towards stat

JDAI-CV 1.6k Jan 06, 2023
Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation.

Unified-EPT Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation. Installation Linux, CUDA=10.0,

29 Aug 23, 2022
code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022

Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022 News (03/16/2022) upload retrieval checkpoints finetuned on COCO and Flickr T

187 Jan 02, 2023
Pixel-level Crack Detection From Images Of Levee Systems : A Comparative Study

PIXEL-LEVEL CRACK DETECTION FROM IMAGES OF LEVEE SYSTEMS : A COMPARATIVE STUDY G

Manisha Panta 2 Jul 23, 2022
An implementation of chunked, compressed, N-dimensional arrays for Python.

Zarr Latest Release Package Status License Build Status Coverage Downloads Gitter Citation What is it? Zarr is a Python package providing an implement

Zarr Developers 1.1k Dec 30, 2022
UFPR-ADMR-v2 Dataset

UFPR-ADMR-v2 Dataset The UFPR-ADMRv2 dataset contains 5,000 dial meter images obtained on-site by employees of the Energy Company of Paraná (Copel), w

Gabriel Salomon 8 Sep 29, 2022
🔥 Cannlytics-powered artificial intelligence 🤖

Cannlytics AI 🔥 Cannlytics-powered artificial intelligence 🤖 🏗️ Installation 🏃‍♀️ Quickstart 🧱 Development 🦾 Automation 💸 Support 🏛️ License ?

Cannlytics 3 Nov 11, 2022
Prototype for Baby Action Detection and Classification

Baby Action Detection Table of Contents About Install Run Predictions Demo About An attempt to harness the power of Deep Learning to come up with a so

Shreyas K 30 Dec 16, 2022
v objective diffusion inference code for PyTorch.

v-diffusion-pytorch v objective diffusion inference code for PyTorch, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman). The

Katherine Crowson 635 Dec 30, 2022
unofficial pytorch implementation of RefineGAN

RefineGAN unofficial pytorch implementation of RefineGAN (https://arxiv.org/abs/1709.00753) for CSMRI reconstruction, the official code using tensorpa

xinby17 5 Jul 21, 2022