This repository is an official implementation of the paper MOTR: End-to-End Multiple-Object Tracking with TRansformer.

Related tags

Deep LearningMOTR
Overview

MOTR: End-to-End Multiple-Object Tracking with TRansformer

PWC PWC

This repository is an official implementation of the paper MOTR: End-to-End Multiple-Object Tracking with TRansformer.

Introduction

TL; DR. MOTR is a fully end-to-end multiple-object tracking framework based on Transformer. It directly outputs the tracks within the video sequences without any association procedures.

Abstract. The key challenge in multiple-object tracking (MOT) task is temporal modeling of the object under track. Existing tracking-by-detection methods adopt simple heuristics, such as spatial or appearance similarity. Such methods, in spite of their commonality, are overly simple and insufficient to model complex variations, such as tracking through occlusion. Inherently, existing methods lack the ability to learn temporal variations from data. In this paper, we present MOTR, the first fully end-to-end multiple-object tracking framework. It learns to model the long-range temporal variation of the objects. It performs temporal association implicitly and avoids previous explicit heuristics. Built on Transformer and DETR, MOTR introduces the concept of “track query”. Each track query models the entire track of an object. It is transferred and updated frame-by-frame to perform object detection and tracking, in a seamless manner. Temporal aggregation network combined with multi-frame training is proposed to model the long-range temporal relation. Experimental results show that MOTR achieves state-of-the-art performance.

Main Results

Method Dataset Train Data MOTA IDF1 IDS URL
MOTR MOT16 MOT17+CrowdHuman Val 65.8 67.1 547 model
MOTR MOT17 MOT17+CrowdHuman Val 66.5 67.0 1884 model

Note:

  1. All models of MOTR are trained on 8 NVIDIA Tesla V100 GPUs.
  2. The training time is about 2.5 days for 200 epochs;
  3. The inference speed is about 7.5 FPS for resolution 1536x800;
  4. All models of MOTR are trained with ResNet50 with pre-trained weights on COCO dataset.

Installation

The codebase is built on top of Deformable DETR.

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend you to use Anaconda to create a conda environment:

    conda create -n deformable_detr python=3.7 pip

    Then, activate the environment:

    conda activate deformable_detr
  • PyTorch>=1.5.1, torchvision>=0.6.1 (following instructions here)

    For example, if your CUDA version is 9.2, you could install pytorch and torchvision as following:

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt
  • Build MultiScaleDeformableAttention

    cd ./models/ops
    sh ./make.sh

Usage

Dataset preparation

Please download MOT17 dataset and CrowdHuman dataset and organize them like FairMOT as following:

.
├── crowdhuman
│   ├── images
│   └── labels_with_ids
├── MOT15
│   ├── images
│   ├── labels_with_ids
│   ├── test
│   └── train
├── MOT17
│   ├── images
│   ├── labels_with_ids

Training and Evaluation

Training on single node

You can download COCO pretrained weights from Deformable DETR. Then training MOTR on 8 GPUs as following:

sh configs/r50_motr_train.sh

Evaluation on MOT15

You can download the pretrained model of MOTR (the link is in "Main Results" session), then run following command to evaluate it on MOT15 train dataset:

sh configs/r50_motr_eval.sh

For visual in demo video, you can enable 'vis=True' in eval.py like:

det.detect(vis=True)

Evaluation on MOT17

You can download the pretrained model of MOTR (the link is in "Main Results" session), then run following command to evaluate it on MOT17 test dataset (submit to server):

sh configs/r50_motr_submit.sh

Citing MOTR

If you find MOTR useful in your research, please consider citing:

@article{zeng2021motr,
  title={MOTR: End-to-End Multiple-Object Tracking with TRansformer},
  author={Zeng, Fangao and Dong, Bin and Wang, Tiancai and Chen, Cheng and Zhang, Xiangyu and Wei, Yichen},
  journal={arXiv preprint arXiv:2105.03247},
  year={2021}
}
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities.

Playground for CLIP-like models Demo Colab Link GradCAM Visualization Naive Zero-shot Detection Smarter Zero-shot Detection Captcha Solver Changelog 2

Kevin Zakka 101 Dec 30, 2022
[CVPR 2021] Forecasting the panoptic segmentation of future video frames

Panoptic Segmentation Forecasting Colin Graber, Grace Tsai, Michael Firman, Gabriel Brostow, Alexander Schwing - CVPR 2021 [Link to paper] We propose

Niantic Labs 44 Nov 29, 2022
This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

Skeleton Aware Multi-modal Sign Language Recognition By Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li and Yun Fu. Smile Lab @ Northeastern

Isen (Songyao Jiang) 128 Dec 08, 2022
An Inverse Kinematics library aiming performance and modularity

IKPy Demo Live demos of what IKPy can do (click on the image below to see the video): Also, a presentation of IKPy: Presentation. Features With IKPy,

Pierre Manceron 481 Jan 02, 2023
PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.

PyAF (Python Automatic Forecasting) PyAF is an Open Source Python library for Automatic Forecasting built on top of popular data science python module

CARME Antoine 405 Jan 02, 2023
Code release for BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images

BlockGAN Code release for BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images BlockGAN: Learning 3D Object-aware Scene Rep

41 May 18, 2022
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Ka

Tomas Jakab 87 Nov 30, 2022
Automatic Idiomatic Expression Detection

IDentifier of Idiomatic Expressions via Semantic Compatibility (DISC) An Idiomatic identifier that detects the presence and span of idiomatic expressi

5 Jun 09, 2022
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

TensorFlowOnSpark TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining salient features from the T

Yahoo 3.8k Jan 04, 2023
TransVTSpotter: End-to-end Video Text Spotter with Transformer

TransVTSpotter: End-to-end Video Text Spotter with Transformer Introduction A Multilingual, Open World Video Text Dataset and End-to-end Video Text Sp

weijiawu 66 Dec 26, 2022
Multi-Task Deep Neural Networks for Natural Language Understanding

New Release We released Adversarial training for both LM pre-training/finetuning and f-divergence. Large-scale Adversarial training for LMs: ALUM code

Xiaodong 2.1k Dec 30, 2022
The PASS dataset: pretrained models and how to get the data - PASS: Pictures without humAns for Self-Supervised Pretraining

The PASS dataset: pretrained models and how to get the data - PASS: Pictures without humAns for Self-Supervised Pretraining

Yuki M. Asano 249 Dec 22, 2022
Python script that allows you to automatically setup your Growtopia server.

AutoSetup Python script that allows you to automatically setup your Growtopia server. How To Use Firstly, install all the required modules that used i

Aspire 3 Mar 06, 2022
RepVGG: Making VGG-style ConvNets Great Again

RepVGG: Making VGG-style ConvNets Great Again (PyTorch) This is a super simple ConvNet architecture that achieves over 80% top-1 accuracy on ImageNet

2.8k Jan 04, 2023
Public implementation of "Learning from Suboptimal Demonstration via Self-Supervised Reward Regression" from CoRL'21

Self-Supervised Reward Regression (SSRR) Codebase for CoRL 2021 paper "Learning from Suboptimal Demonstration via Self-Supervised Reward Regression "

19 Dec 12, 2022
Joint deep network for feature line detection and description

SOLD² - Self-supervised Occlusion-aware Line Description and Detection This repository contains the implementation of the paper: SOLD² : Self-supervis

Computer Vision and Geometry Lab 427 Dec 27, 2022
Official implementation of the ICLR 2021 paper

You Only Need Adversarial Supervision for Semantic Image Synthesis Official PyTorch implementation of the ICLR 2021 paper "You Only Need Adversarial S

Bosch Research 272 Dec 28, 2022
Histocartography is a framework bringing together AI and Digital Pathology

Documentation | Paper Welcome to the histocartography repository! histocartography is a python-based library designed to facilitate the development of

155 Nov 23, 2022
PyTorch common framework to accelerate network implementation, training and validation

pytorch-framework PyTorch common framework to accelerate network implementation, training and validation. This framework is inspired by works from MML

Dongliang Cao 3 Dec 19, 2022
GANfolk: Using AI to create portraits of fictional people to sell as NFTs

GANfolk are AI-generated renderings of fictional people. Each image in the collection was created by a pair of Generative Adversarial Networks (GANs) with names and backstories also created with AI.

Robert A. Gonsalves 32 Dec 02, 2022