[ACM MM 2021] Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

Last update: Dec 13, 2022

Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation) [arXiv] [paper]

@inproceedings{hou2021multiview,
  title={Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)},
  author={Hou, Yunzhong and Zheng, Liang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia (MM ’21)},
  year={2021}
}

Overview

We release the PyTorch code for MVDeTr, a state-of-the-art multiview pedestrian detector. Its superior performance can be credited to transformer architectures, updated loss terms, and view-coherent data augmentations. MVDeTr is also efficient: it can be trained on a single RTX 2080 Ti. This repo also includes a simplified version of MVDet, which likewise runs on a single RTX 2080 Ti.

MVDeTr Code

This repo is dedicated to the code for MVDeTr.

Dependencies

This code uses the following libraries:

python
pytorch & torchvision
numpy
matplotlib
pillow
opencv-python
kornia
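
Before going further, it can help to confirm that these libraries are importable. The sketch below is a minimal check. It assumes the usual import names for each package (for example, pillow is imported as PIL and opencv-python as cv2). It only reports what is missing and does not check versions, since no version requirements are listed here.

```python
import importlib.util

# pip package name -> module name used in `import` statements
# (standard mappings, assumed rather than taken from the repo)
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "pillow": "PIL",
    "opencv-python": "cv2",
    "kornia": "kornia",
}


def missing_dependencies(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```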

Data Preparation

By default, all datasets are expected in ~/Data/. This project uses MultiviewX and Wildtrack.

Your ~/Data/ folder should look like this:

Data
├── MultiviewX/
│   └── ...
└── Wildtrack/
    └── ...
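
A quick way to catch path mistakes before training is to check that both dataset folders exist under the data root. This is a minimal sketch that only checks the two top-level folder names shown above. The folder contents are elided in this README ("..."), so they are not validated here.

```python
from pathlib import Path

# Dataset folder names expected under the data root, per the layout above.
EXPECTED_DATASETS = ("MultiviewX", "Wildtrack")


def check_data_root(root="~/Data"):
    """Return a dict mapping each expected dataset folder to whether it exists."""
    root_path = Path(root).expanduser()
    return {name: (root_path / name).is_dir() for name in EXPECTED_DATASETS}


if __name__ == "__main__":
    for name, ok in check_data_root().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```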

Code Preparation

Before running the code, go to multiview_detector/models/ops and run bash make.sh to build the deformable transformer ops (forked from Deformable DETR).
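
After the build step, one way to confirm that it succeeded is to check whether the compiled extension can be found from Python. The sketch assumes the extension keeps the module name MultiScaleDeformableAttention from the upstream Deformable DETR setup script. If this repo's fork renames it, adjust the constant.

```python
import importlib.util

# Module name of the compiled extension in upstream Deformable DETR;
# assumed (not confirmed) to be unchanged in this repo's fork.
EXTENSION_NAME = "MultiScaleDeformableAttention"


def deformable_ops_built(name=EXTENSION_NAME):
    """Return True if the compiled deformable-attention extension is importable."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if deformable_ops_built():
        print("Deformable attention ops found.")
    else:
        print("Ops not found: rerun the build step in multiview_detector/models/ops.")
```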

Training

To train the detector on each dataset, run the following:

python main.py -d wildtrack
python main.py -d multiviewx

Each run should automatically report evaluation results similar to those reported in the paper: 91.5% MODA on Wildtrack and 93.7% MODA on MultiviewX.
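
MODA (Multiple Object Detection Accuracy) is the CLEAR-MOT-style metric commonly used in multiview pedestrian detection. It is defined as MODA = 1 - (FP + FN) / N_gt. FP and FN are counted after matching predicted and ground-truth positions, and N_gt is the number of ground-truth objects. The sketch below computes MODA from those counts. The matching step, including its distance threshold, is left to the repo's own evaluation code and is not reproduced here.

```python
def moda(tp, fp, fn):
    """Multiple Object Detection Accuracy from matched detection counts.

    tp, fp, fn: true positives, false positives and false negatives summed
    over all evaluated frames. The number of ground-truth objects is tp + fn.
    """
    num_gt = tp + fn
    if num_gt == 0:
        raise ValueError("MODA is undefined without ground-truth objects")
    return 1.0 - (fp + fn) / num_gt


def precision_recall(tp, fp, fn):
    """Precision and recall from the same counts, often reported next to MODA."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up counts for illustration only (not results from the paper).
    print(f"MODA = {moda(tp=90, fp=5, fn=10):.1%}")  # MODA = 85.0%
```

MODA is not bounded below by zero. If FP + FN exceeds the number of ground-truth objects, the score becomes negative.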

Architectures

This repo supports multiple architecture variants. For MVDeTr, specify --world_feat deform_trans; for a fully convolutional architecture similar to MVDet, specify --world_feat conv.
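
The dataset argument and the variant flags from this README can be combined in a small launcher. This sketch is hypothetical in two ways:

- The preset names mvdetr and mvdet_like are labels introduced here.
- Pairing --world_feat conv with the MSE loss (--use_mse 1, described under Loss terms) is an assumption about how to approximate MVDet, not a configuration documented by the authors.

By default the launcher only prints the commands.

```python
import shlex
import subprocess

DATASETS = ("wildtrack", "multiviewx")

# Hypothetical preset names; the flag values are the ones listed in this README.
VARIANTS = {
    "mvdetr": ["--world_feat", "deform_trans", "--use_mse", "0"],
    "mvdet_like": ["--world_feat", "conv", "--use_mse", "1"],
}


def build_command(dataset, variant="mvdetr"):
    """Assemble the `python main.py ...` argument list for one training run."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    return ["python", "main.py", "-d", dataset, *VARIANTS[variant]]


def train_all(variant="mvdetr", dry_run=True):
    """Print (and, if dry_run is False, run) one training command per dataset."""
    commands = [build_command(d, variant) for d in DATASETS]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # Run from the repository root, where main.py lives.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    train_all("mvdetr")
```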

Loss terms

This repo supports multiple loss terms. For the focal loss variant used in MVDeTr, specify --use_mse 0; for the MSE loss used in MVDet, specify --use_mse 1.
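
To make the difference between the two options concrete, here is a pure-Python sketch of both losses on a flattened ground-plane heatmap. The focal variant shown is the CenterNet-style penalty-reduced focal loss with alpha = 2 and beta = 4. We assume this is close to what --use_mse 0 selects, but the repo's exact formulation, weighting and normalization may differ. Predictions are probabilities in (0, 1). Targets are Gaussian-smoothed heatmaps equal to 1 at annotated pedestrian locations.

```python
import math


def focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """CenterNet-style penalty-reduced focal loss over a flattened heatmap.

    pred:   predicted probabilities, same length as target
    target: Gaussian-smoothed ground truth, 1.0 at object centers
    """
    pos_loss, neg_loss, num_pos = 0.0, 0.0, 0
    for p, g in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if g == 1.0:
            # Positive location: down-weight easy positives (p close to 1).
            pos_loss += (1.0 - p) ** alpha * math.log(p)
            num_pos += 1
        else:
            # Negative location: (1 - g)^beta reduces the penalty near centers.
            neg_loss += (1.0 - g) ** beta * p ** alpha * math.log(1.0 - p)
    # Normalize by the number of objects (at least 1 to avoid division by zero).
    return -(pos_loss + neg_loss) / max(num_pos, 1)


def mse_loss(pred, target):
    """Plain mean squared error between predicted and target heatmaps."""
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    pred = [0.9, 0.1]
    target = [1.0, 0.0]
    print(round(focal_loss(pred, target), 6))  # 0.002107
    print(round(mse_loss(pred, target), 6))    # 0.01
```

The focal loss concentrates the gradient on hard locations and is normalized by the number of pedestrians. MSE instead averages uniformly over every cell of the heatmap, most of which are background.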

Augmentations

This repo supports view-coherent data augmentation, which applies affine transformations to the per-view inputs and then applies the inverse transformations to the per-view feature maps to maintain multiview coherency.
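
The coherence argument can be checked with a little homogeneous-coordinate arithmetic. If an input image is augmented with an affine matrix A, a pixel at p moves to A·p. Undoing the transformation on the feature map with the inverse of A returns it to p, so the original camera-to-ground projection still applies to every view. The sketch below demonstrates this for single points in pure Python. The function names and the scale/rotation/translation parameterization are illustrative assumptions. The repo itself applies the inverse to whole per-view feature maps rather than to individual points.

```python
import math


def affine_matrix(scale=1.0, angle_deg=0.0, tx=0.0, ty=0.0):
    """3x3 homogeneous matrix for scale, rotation (degrees) and translation."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]


def invert_affine(m):
    """Inverse of a 3x3 affine matrix (last row assumed to be [0, 0, 1])."""
    (a, b, tx), (d, e, ty), _ = m
    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("affine matrix is singular")
    # Invert the 2x2 linear part, then map the translation through it.
    ia, ib = e / det, -b / det
    id_, ie = -d / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [id_, ie, -(id_ * tx + ie * ty)],
            [0.0, 0.0, 1.0]]


def apply(m, point):
    """Apply a 3x3 affine matrix to a 2D point (x, y)."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


if __name__ == "__main__":
    aug = affine_matrix(scale=1.2, angle_deg=10.0, tx=15.0, ty=-8.0)
    pixel = (320.0, 240.0)                       # location in the original view
    moved = apply(aug, pixel)                    # where the augmentation puts it
    restored = apply(invert_affine(aug), moved)  # after undoing it on the features
    print(moved, restored)                       # restored ≈ (320.0, 240.0)
```

Equivalently, if H maps image pixels to the ground plane, then H composed with the inverse of A maps augmented pixels to the same ground location. Each view can therefore be augmented independently without breaking cross-view agreement.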

Pre-trained models

You can download the checkpoints at this link.

Owner

Yunzhong Hou
