Project code for weakly supervised 3D object detectors using wide-baseline multi-view traffic camera data: WIBAM.

Last update: Aug 24, 2022

Overview

WIBAM (Work in progress)

Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data

3D object detector trained on NuScenes only.

3D object detector finetuned on the WIBAM dataset.

Description

This is the project code for WIBAM as presented in our paper:

WIBAM: Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data
Matthew Howe, Ian Reid, Jamie Mackenzie
In: British Machine Vision Conference (BMVC) 2021

The preprint paper is available here.

Accurate 7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users. In principle, this could be achieved by a single camera system that is capable of detecting the pose of each vehicle, but this would require a large, accurately labelled dataset from which to train the detector. Although large vehicle pose datasets exist (ostensibly developed for autonomous vehicles), we find training on these datasets inadequate. These datasets contain images from a ground-level viewpoint, whereas an ideal view for intersection observation would be elevated higher above the road surface. We develop an alternative approach using a weakly supervised method of fine-tuning 3D object detectors for traffic observation cameras, showing in the process that large existing autonomous vehicle datasets can be leveraged for pre-training. To fine-tune the monocular 3D object detector, our method utilises multiple 2D detections from overlapping, wide-baseline views and a loss that encodes the underlying geometric consistency. Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top performing monocular 3D object detectors on autonomous vehicle datasets. We present our training methodology, multi-view reprojection loss, and dataset.
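To make the idea of a multi-view reprojection loss more concrete, here is a minimal sketch in plain Python. It is not the loss implemented in this repo or described in the paper: it assumes a 7DoF pose given as centre, dimensions and yaw about a world z axis, a 3x4 projection matrix per camera, and 2D detections given as (x1, y1, x2, y2). All function names are hypothetical and chosen for illustration only.

```python
import math
from itertools import product


def box_corners_3d(center, dims, yaw):
    """Return the 8 corners of a 3D box in world coordinates.

    Assumes yaw is a rotation about the world z axis (illustrative choice).
    """
    cx, cy, cz = center
    length, width, height = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        dx, dy, dz = sx * length, sy * width, sz * height
        corners.append((cx + c * dx - s * dy, cy + s * dx + c * dy, cz + dz))
    return corners


def project(point, P):
    """Project a 3D world point to pixels with a 3x4 camera matrix P."""
    x, y, z = point
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    return u / w, v / w


def projected_box(center, dims, yaw, P):
    """Axis-aligned 2D box (x1, y1, x2, y2) enclosing the projected corners."""
    pts = [project(p, P) for p in box_corners_3d(center, dims, yaw)]
    us, vs = zip(*pts)
    return min(us), min(vs), max(us), max(vs)


def reprojection_loss(center, dims, yaw, cameras, detections):
    """Mean L1 error between projected boxes and 2D detections over all views."""
    total, count = 0.0, 0
    for P, det in zip(cameras, detections):
        pred = projected_box(center, dims, yaw, P)
        total += sum(abs(p - d) for p, d in zip(pred, det))
        count += 4
    return total / count
```

In this sketch, a pose predicted from one view is scored against 2D detections of the same vehicle in the other overlapping views, so no 3D labels are needed. The loss approaches zero only when the predicted box is consistent with every view's 2D detection.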

Additional information about my thesis

Link to ARSC video

Replicate my results

Please see the "how to run" section. Inference can be run on a single GPU (~8 GB VRAM). Training was done on either two Nvidia 3080s or two Nvidia V100s (minimum ~40 GB VRAM required).

Results

Citation

@article{WIBAM,
  title={Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data},
  author={Howe, Matthew and Reid, Ian and Mackenzie, Jamie},
  journal={32nd British Machine Vision Conference, BMVC 2021},
  year={2021}
}

Acknowledgements

This repo is a modified clone of CenterTrack (https://github.com/xingyizhou/CenterTrack). CenterTrack is built on CenterNet. Both codebases are released under the MIT License. Some CenterNet code comes from third parties under different licenses; please check the CenterNet repo for details. In addition, this repo uses py-motmetrics for MOT evaluation and nuscenes-devkit for nuScenes evaluation and preprocessing. See NOTICE for details. Please note the license of each dataset: most of the datasets used in this project are under non-commercial licenses.

This research has been supported through the Australian Government Research Training Program Scholarship. High performance compute resources used in this work were funded by the Australian Research Council via LE190100080.

Owner

Matthew Howe
