[3DV 2021] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Last update: Dec 30, 2022

This is the official implementation for the method described in

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Jiaxing Yan, Hong Zhao, Penghui Bu and YuSheng Jin.

3DV 2021 (arXiv pdf)

Setup

Assuming a fresh Anaconda distribution, you can install the dependencies with:

conda install pytorch=1.7.0 torchvision=0.8.1 -c pytorch
pip install tensorboardX==2.1
pip install opencv-python==3.4.7.28
pip install albumentations==0.5.2   # we use albumentations for faster image preprocessing

This project uses Python 3.7.8 and CUDA 11.4. The experiments were conducted on a single NVIDIA RTX 3090 GPU with an Intel Core i9-9900KF CPU.

We recommend using a conda environment to avoid dependency conflicts.

Prediction for a single image

You can predict scaled disparity for a single image with:

python test_simple.py --image_path images/test_image.jpg --model_name MS_1024x320
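The network output is a normalized disparity, not metric depth. As a minimal sketch, assuming this code keeps Monodepth2's convention (a sigmoid output in [0, 1] mapped into the depth range 0.1 to 100), the conversion could look like the following. The function name and the default range are taken from Monodepth2 and are assumptions here, not confirmed details of this repository.

```python
def disp_to_depth(disp, min_depth=0.1, max_depth=100.0):
    """Map a normalized disparity in [0, 1] to (scaled disparity, depth).

    Assumed convention (Monodepth2): disparity 0 maps to max_depth,
    disparity 1 maps to min_depth.
    """
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0):
        scaled, depth = disp_to_depth(d)
        print(f"disp={d:.1f} scaled_disp={scaled:.4f} depth={depth:.4f}")
```

For monocular models the resulting depth is only defined up to an unknown scale factor, so it should be treated as relative depth.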

On its first run, the test_simple.py command above downloads the MS_1024x320 pretrained model (272MB) into the models/ folder. We provide the following options for --model_name:

`--model_name`	Training modality	Resolution	Abs_Rel	Sq_Rel	$\delta<1.25$
`M_640x192`	Mono	640 x 192	0.105	0.769	0.892
`M_1024x320`	Mono	1024 x 320	0.102	0.734	0.898
`M_1280x384`	Mono	1280 x 384	0.102	0.715	0.900
`MS_640x192`	Mono + Stereo	640 x 192	0.102	0.752	0.894
`MS_1024x320`	Mono + Stereo	1024 x 320	0.096	0.694	0.908
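The three error columns in the table are the standard depth-estimation metrics. The sketch below shows how they are usually defined, using plain Python lists of paired ground-truth and predicted depths. It is an illustration of the definitions, not the evaluation code of this repository, and the function name is hypothetical.

```python
def depth_metrics(gt, pred):
    """Return (abs_rel, sq_rel, a1) for paired, strictly positive depths.

    abs_rel: mean of |gt - pred| / gt            (lower is better)
    sq_rel:  mean of (gt - pred)^2 / gt          (lower is better)
    a1:      fraction with max(gt/pred, pred/gt) < 1.25  (higher is better)
    """
    n = len(gt)
    pairs = list(zip(gt, pred))
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    a1 = sum(max(g / p, p / g) < 1.25 for g, p in pairs) / n
    return abs_rel, sq_rel, a1


if __name__ == "__main__":
    print(depth_metrics([10.0, 20.0], [10.0, 30.0]))
```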

KITTI training data

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip with:

cd kitti_data
unzip "*.zip"
cd ..

Splits

The train/test/validation splits are defined in the splits/ folder. By default, the code will train a depth model using Zhou's subset of the standard Eigen split of KITTI, which is designed for monocular training. You can also train a model using the new benchmark split or the odometry split by setting the --split flag.
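As a rough illustration of what a split file contains, the sketch below parses one entry. It assumes the Eigen-style split files follow the Monodepth2 layout of one sample per line with three whitespace-separated fields (sequence folder, frame index, camera side). The layout and the names `SplitEntry` and `parse_split_line` are assumptions for illustration; check the files in splits/ before relying on them.

```python
from typing import NamedTuple


class SplitEntry(NamedTuple):
    folder: str       # e.g. "2011_09_26/2011_09_26_drive_0022_sync"
    frame_index: int  # frame number within the sequence
    side: str         # "l" or "r" camera (assumed convention)


def parse_split_line(line):
    """Parse one split-file line of the assumed form '<folder> <index> <side>'."""
    folder, frame_index, side = line.split()
    return SplitEntry(folder, int(frame_index), side)


if __name__ == "__main__":
    print(parse_split_line("2011_09_26/2011_09_26_drive_0022_sync 473 r"))
```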

Training

Monocular training:

python train.py --model_name mono_model

Stereo training:

Our code defaults to using Zhou's subsampled Eigen training data. For stereo-only training we have to specify that we want to use the full Eigen training set.

python train.py --model_name stereo_model \
  --frame_ids 0 --use_stereo --split eigen_full

Monocular + stereo training:

python train.py --model_name mono+stereo_model \
  --frame_ids 0 -1 1 --use_stereo
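The three commands differ only in which views supervise the target frame. The sketch below summarizes that logic under the assumption that train.py behaves like Monodepth2: frame id 0 is the target, -1 and 1 are the temporally adjacent frames, --use_stereo adds the other camera of the stereo pair (written "s" here), and monocular training defaults to frame ids 0 -1 1. The helper name is hypothetical.

```python
def training_views(frame_ids, use_stereo):
    """Return (views, needs_pose_net) for a training configuration.

    Assumed Monodepth2 behavior: the stereo view is appended as "s", and a
    pose network is only needed when temporal neighbors are used.
    """
    views = list(frame_ids)
    if use_stereo:
        views.append("s")
    needs_pose_net = not (use_stereo and list(frame_ids) == [0])
    return views, needs_pose_net


if __name__ == "__main__":
    print("mono:       ", training_views([0, -1, 1], use_stereo=False))
    print("stereo:     ", training_views([0], use_stereo=True))
    print("mono+stereo:", training_views([0, -1, 1], use_stereo=True))
```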

Note: For high-resolution inputs such as 1024x320 and 1280x384, we use a lightweight setup for the pose encoder during training (ResNet18 at 640x192) to save memory. The following example command trains a model named M_1024x320:

python train.py --model_name M_1024x320 --num_layers 50 --height 320 --width 1024 --num_layers_pose 18 --height_pose 192 --width_pose 640
#             encoder     resolution                                     
# DepthNet   resnet50      1024x320
# PoseNet    resnet18       640x192
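To get a feel for the saving, the sketch below compares pixel counts of the depth and pose inputs. It also checks that each resolution is a multiple of 32, which is a requirement in Monodepth2 that is assumed, not confirmed, to carry over to this code. The helper names are hypothetical.

```python
# Resolutions listed in this README, as (width, height).
RESOLUTIONS = [(640, 192), (1024, 320), (1280, 384)]


def is_valid_resolution(width, height, multiple=32):
    """True if both sides are divisible by `multiple` (assumed constraint)."""
    return width % multiple == 0 and height % multiple == 0


def pixel_ratio(depth_res, pose_res):
    """How many times more pixels the depth input has than the pose input."""
    return (depth_res[0] * depth_res[1]) / (pose_res[0] * pose_res[1])


if __name__ == "__main__":
    for w, h in RESOLUTIONS:
        ratio = pixel_ratio((w, h), (640, 192))
        print(f"{w}x{h}: valid={is_valid_resolution(w, h)} ratio={ratio:.2f}")
```

With a 1024x320 depth input, the 640x192 pose input has roughly 2.7 times fewer pixels per frame.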

Finetuning a pretrained model

To finetune from an existing model, pass its weights folder to the training command with --load_weights_folder:

python train.py --model_name finetuned_mono --load_weights_folder ~/tmp/mono_model/models/weights_19

Other training options

Run python train.py -h (or look at options.py) to see the range of other training options, such as learning rates and ablation settings.

KITTI evaluation

To prepare the ground truth depth maps run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

...assuming that you have placed the KITTI dataset in the default location of ./kitti_data/.

The following example command evaluates the weights of a model named MS_1024x320:

python evaluate_depth.py --load_weights_folder ./log/MS_1024x320 --eval_mono --data_path ./kitti_data --eval_split eigen
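In Monodepth2, --eval_mono turns on per-image median scaling, because monocular predictions are only defined up to scale. Assuming evaluate_depth.py does the same here, the scaling step can be sketched as follows with plain Python lists. The function name is hypothetical.

```python
from statistics import median


def median_scale(pred, gt):
    """Rescale predicted depths so their median matches the ground truth.

    Returns the rescaled predictions and the scale ratio that was applied.
    """
    ratio = median(gt) / median(pred)
    return [p * ratio for p in pred], ratio


if __name__ == "__main__":
    scaled, ratio = median_scale([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
    print(scaled, ratio)
```

The error metrics are then computed on the rescaled predictions.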

Precomputed results

You can download our precomputed disparity predictions from the following links:

Training modality	Input size	`.npy` filesize	Eigen disparities
Mono	640 x 192	326M	Download 🔗
Mono	1024 x 320	871M	Download 🔗
Mono	1280 x 384	1.27G	Download 🔗
Mono + Stereo	640 x 192	326M	Download 🔗
Mono + Stereo	1024 x 320	871M	Download 🔗
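The listed file sizes are consistent with one float32 disparity map per test image at the network input resolution. The sketch below reproduces them under two assumptions that are not stated in this README: the Eigen test split has 697 images, and each pixel is stored as a 4-byte float. The small .npy header is ignored.

```python
EIGEN_TEST_IMAGES = 697   # assumed size of the Eigen test split
BYTES_PER_FLOAT32 = 4     # assumed storage type


def npy_size_bytes(width, height, n_images=EIGEN_TEST_IMAGES):
    """Approximate payload size of an (n_images, height, width) float32 array."""
    return n_images * height * width * BYTES_PER_FLOAT32


if __name__ == "__main__":
    for w, h in [(640, 192), (1024, 320), (1280, 384)]:
        size = npy_size_bytes(w, h)
        print(f"{w}x{h}: {size / 2**20:.1f} MiB ({size / 2**30:.3f} GiB)")
```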

References

Monodepth2 - https://github.com/nianticlabs/monodepth2
