[CVPR'21] MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation

Last update: Dec 10, 2022

Overview

MonoRUn

MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation. CVPR 2021. [paper] Hansheng Chen, Yuyao Huang, Wei Tian*, Zhong Gao, Lu Xiong. (*Corresponding author: Wei Tian.)

This repository is the PyTorch implementation for MonoRUn. The codes are based on MMDetection and MMDetection3D, although we use our own data formats. The PnP C++ codes are modified from PVNet.

Installation

Please refer to INSTALL.md.

Data preparation

Download the official KITTI 3D object dataset, including left color images, calibration files and training labels.

Download the train/val/test image lists [Google Drive | Baidu Pan, password: cj4u]. For training with LiDAR supervision, download the preprocessed object coordinate maps [Google Drive | Baidu Pan, password: fp3h].

Extract the downloaded archives according to the following folder structure. It is recommended to symlink the dataset root to $MonoRUn_ROOT/data. If your folder structure is different, you may need to change the corresponding paths in config files.

$MonoRUn_ROOT
├── configs
├── monorun
├── tools
├── data
│   ├── kitti
│   │   ├── testing
│   │   │   ├── calib
│   │   │   ├── image_2
│   │   │   └── test_list.txt
│   │   └── training
│   │       ├── calib
│   │       ├── image_2
│   │       ├── label_2
│   │       ├── obj_crd
│   │       ├── mono3dsplit_train_list.txt
│   │       ├── mono3dsplit_val_list.txt
│   │       └── trainval_list.txt

Run the preparation script to generate image metas:

cd $MonoRUn_ROOT
python tools/prepare_kitti.py

Train

cd $MonoRUn_ROOT

To train without LiDAR supervision:

python train.py configs/kitti_multiclass.py --gpu-ids 0 1

where --gpu-ids 0 1 specifies the GPU IDs. In the paper we use two GPUs for distributed training. The number of GPUs affects the mini-batch size. You may change the samples_per_gpu option in the config file to vary the number of images per GPU. If you encounter out of memory issue, add the argument --seed 0 --deterministic to save GPU memory.

To train with LiDAR supervision:

python train.py configs/kitti_multiclass_lidar_supv.py --gpu-ids 0 1

To view other training options:

python train.py -h

By default, logs and checkpoints will be saved to $MonoRUn_ROOT/work_dirs. You can run TensorBoard to plot the logs:

tensorboard --logdir $MonoRUn_ROOT/work_dirs

The above configs use the 3712-image split for training and the other split for validating. If you want to train on the full training set (train-val), use the config files with _trainval postfix.

Test

You can download the pretrained models:

kitti_multiclass.pth [Google Drive | Baidu Pan, password: 6bih] trained on KITTI training split
kitti_multiclass_lidar_supv.pth [Google Drive | Baidu Pan, password: nmdb] trained on KITTI training split
kitti_multiclass_lidar_supv_trainval.pth [Google Drive | Baidu Pan, password: hg2r] trained on KITTI train-val

To test and evaluate on the validation set using config at $CONFIG_PATH and checkpoint at $CPT_PATH:

python test.py $CONFIG_PATH $CPT_PATH --val-set --gpu-ids 0

To test on the test set and save detection results to $RESULT_DIR:

python test.py $CONFIG_PATH $CPT_PATH --result-dir $RESULT_DIR --gpu-ids 0

You can append the argument --show-dir $SHOW_DIR to save visualized results.

To view other testing options:

python test.py -h

Note: the training and testing scripts in the root directory are wrappers for the original scripts taken from MMDetection, which can be found in $MonoRUn_ROOT/tools. For advanced usage, please refer to the official MMDetection docs.

Demo

We provide a demo script to perform inference on images in a directory and save the visualized results. Example:

python demo/infer_imgs.py $KITTI_RAW_DIR/2011_09_30/2011_09_30_drive_0027_sync/image_02/data configs/kitti_multiclass_lidar_supv_trainval.py checkpoints/kitti_multiclass_lidar_supv_trainval.pth --calib demo/calib.csv --show-dir show/2011_09_30_drive_0027

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{monorun2021, 
  author = {Hansheng Chen and Yuyao Huang and Wei Tian and Zhong Gao and Lu Xiong}, 
  title = {MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation}, 
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 
  year = {2021}
}

[CVPR'21] MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation

Related tags

Overview

MonoRUn

Installation

Data preparation

Train

Test

Demo

Citation

Owner

同济大学智能汽车研究所综合感知研究组 ( Comprehensive Perception Research Group under Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University)

RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection

This repository attempts to replicate the SqueezeNet architecture and implement the same on an image classification task.

Code and Data for the paper: Molecular Contrastive Learning with Chemical Element Knowledge Graph [AAAI 2022]

Attention Probe: Vision Transformer Distillation in the Wild

Contains code for the paper "Vision Transformers are Robust Learners".

Tensorflow implementation of soft-attention mechanism for video caption generation.

[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

Graph Self-Supervised Learning for Optoelectronic Properties of Organic Semiconductors

Server files for UltimateLabeling

ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton (AAAI'22)

Fast RFC3339 compliant Python date-time library

Stacs-ci - A set of modules to enable integration of STACS with commonly used CI / CD systems

Official Matlab Implementation for "Tiny Obstacle Discovery by Occlusion-aware Multilayer Regression", TIP 2020

🇰🇷 Text to Image in Korean

The implementation for the SportsCap (IJCV 2021)

Get started with Machine Learning with Python - An introduction with Python programming examples

This repository contains the database and code used in the paper Embedding Arithmetic for Text-driven Image Transformation

Spectral Tensor Train Parameterization of Deep Learning Layers

[MedIA2021]MIDeepSeg: Minimally Interactive Segmentation of Unseen Objects from Medical Images Using Deep Learning