BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

Last update: Dec 29, 2022

Overview

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

By Likun Cai, Zhi Zhang, Yi Zhu, Li Zhang, Mu Li, Xiangyang Xue.

This repo is the official implementation of BigDetection. It is based on mmdetection and CBNetV2.

Introduction

We construct a new large-scale benchmark termed BigDetection. Our goal is to simply leverage the training data from existing datasets (LVIS, OpenImages and Object365) with carefully designed principles, and curate a larger dataset for improved detector pre-training. BigDetection dataset has 600 object categories and contains 3.4M training images with 36M object bounding boxes. We show some important statistics of BigDetection in the following figure.

Left: Number of images per category of BigDetection. Right: Number of instances in different object sizes.

Results and Models

BigDetection Validation

We show the evaluation results on BigDetection Validation. We hope BigDetection could serve as a new challenging benchmark for evaluating next-level object detection methods.

Method	mAP (bigdet val)	Links
YOLOv3	9.7	model/config
Deformable DETR	13.1	model/config
Faster R-CNN (C4)*	18.9	model
Faster R-CNN (FPN)*	19.4	model
CenterNet2*	23.1	model
Cascade R-CNN*	24.1	model
CBNetV2-Swin-Base	35.1	model/config

COCO Validation

We show the finetuning performance on COCO minival/test-dev. Results show that BigDetection pre-training provides significant benefits for different detector architectures. We achieve 59.8 mAP on COCO test-dev with a single model.

Method	mAP (coco minival/test-dev)	Links
YOLOv3	30.5/-	config
Deformable DETR	39.9/-	model/config
Faster R-CNN (C4)*	38.8/-	model
Faster R-CNN (FPN)*	40.5/-	model
CenterNet2*	45.3/-	model
Cascade R-CNN*	45.1/-	model
CBNetV2-Swin-Base	59.1/59.5	model/config
CBNetV2-Swin-Base (TTA)	59.5/59.8	config

Data Efficiency

We followed STAC and SoftTeacher to evaluate on COCO for different partial annotation settings.

Method	mAP (1%)	mAP (2%)	mAP (5%)	mAP (10%)
Baseline	9.8	14.3	21.2	26.2
STAC	14.0	18.3	24.4	28.6
SoftTeacher (ICCV 21)	20.5	26.5	30.7	34.0
Ours	25.3	28.1	31.9	34.1
	model	model	model	model

Notes

The models following * are implemented on another detection codebase Detectron2. Here we provide the pretrained checkpoints. The results can be reproduced following the installation of CenterNet2 codebase.
Most of models are trained for 8X schedule on BigDetection.
Most of pretrained models are finetuned for 1X schedule on COCO.
TTA denotes test time augmentation.
Pre-trained models of Swin Transformer can be downloaded from Swin Transformer for ImageNet Classification.

Getting Started

Requirements

Ubuntu 16.04
CUDA 10.2

Installation

# Create conda environment
conda create -n bigdet python=3.7 -y
conda activate bigdet

# Install Pytorch
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=10.2 -c pytorch

# Install mmcv
pip install mmcv-full==1.3.9 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html

# Clone and install
git clone https://github.com/amazon-research/bigdetection.git
cd bigdetection
pip install -r requirements/build.txt
pip install -v -e .

# Install Apex (optinal)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Data Preparation

Our BigDetection involves 3 datasets and train/val data can be downloaded from their official website (Objects365, OpenImages v6, LVIS v1.0). All datasets should be placed under $bigdetection/data/ as below. The synsets (total 600 class names) of BigDetection dataset can be downloaded here: bigdetection_synsets. Contact us with [email protected] to get access to our pre-processed annotation files.

bigdetection/data
└── BigDetection
    ├── annotations
    │   ├── bigdet_obj_train.json
    │   ├── bigdet_oid_train.json
    │   ├── bigdet_lvis_train.json
    │   ├── bigdet_val.json
    │   └── cas_weights.json
    ├── train
    │   ├── Objects365
    │   ├── OpenImages
    │   └── LVIS
    └── val

Training

To train a detector with pre-trained models, run:

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options load_from=<PRETRAIN_MODEL>

Pre-training

To pre-train a CBNetV2 with a Swin-Base backbone on BigDetection using 8 GPUs, run: (PRETRAIN_MODEL should be pre-trained checkpoint of Base-Swin-Transformer: model)

tools/dist_train.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_bigdet.py 8 \
    --cfg-options load_from=<PRETRAIN_MODEL>

To pre-train a Deformable-DETR with a ResNet-50 backbone on BigDetection, run:

tools/dist_train.sh configs/BigDetection/deformable_detr/deformable_detr_r50_16x2_8x_bigdet.py 8

Fine-tuning

To fine-tune a BigDetection pre-trained CBNetV2 (with Swin-Base backbone) on COCO, run: (PRETRAIN_MODEL should be BigDetection pre-trained checkpoint of CBNetV2: model)

tools/dist_train.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco.py 8 \
    --cfg-options load_from=<PRETRAIN_MODEL>

Inference

To evaluate a detector with pre-trained checkpoints, run:

tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT> <GPU_NUM> --eval bbox

BigDetection evaluation

To evaluate pre-trained CBNetV2 on BigDetection validation, run:

tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_bigdet.py \
    <BIGDET_PRETRAIN_CHECKPOINT> 8 --eval bbox

COCO evaluation

To evaluate COCO-finetuned CBNetV2 on COCO validation, run:

# without test-time-augmentation
tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco.py \
    <COCO_FINETUNE_CHECKPOINT> 8 --eval bbox mask

# with test-time-augmentation
tools/dist_test.sh configs/BigDetection/cbnetv2/htc_cbv2_swin_base_giou_4conv1f_adamw_20e_coco_tta.py \
    <COCO_FINETUNE_CHECKPOINT> 8 --eval bbox mask

Other configuration based on Detectron2 can be found at detectron2-probject.

Citation

If you use our dataset or pretrained models in your research, please kindly consider to cite the following paper.

@article{bigdetection2022,
  title={BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training},
  author={Likun Cai and Zhi Zhang and Yi Zhu and Li Zhang and Mu Li and Xiangyang Xue},
  journal={arXiv preprint arXiv:2203.13249},
  year={2022}
}

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

We thank the authors releasing mmdetection and CBNetv2 for object detection research community.

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

Related tags

Overview

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

Introduction

Results and Models

BigDetection Validation

COCO Validation

Data Efficiency

Notes

Getting Started

Requirements

Installation

Data Preparation

Training

Inference

Citation

Security

License

Acknowledgement

Owner

Nightmare-Writeup - Writeup for the Nightmare CTF Challenge from 2022 DiceCTF

A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB, or simply to separate onnx files to any size you want.

Pytorch implementation of the paper: "SAPNet: Segmentation-Aware Progressive Network for Perceptual Contrastive Image Deraining"

SOTR: Segmenting Objects with Transformers [ICCV 2021]

An open source machine learning library for performing regression tasks using RVM technique.

PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Optimizers-visualized - Visualization of different optimizers on local minimas and saddle points.

Digan - Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

The official start-up code for paper "FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark."

Deep Multimodal Neural Architecture Search

Tutorials and implementations for "Self-normalizing networks"

Python scripts for performing 3D human pose estimation using the Mobile Human Pose model in ONNX.

YOLTv4 builds upon YOLT and SIMRDWN, and updates these frameworks to use the most performant version of YOLO, YOLOv4

PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch.

Improving Non-autoregressive Generation with Mixup Training

This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".

Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR)

Implementation of: "Exploring Randomly Wired Neural Networks for Image Recognition"

DIP-football - A football video analyse system based on Yolov5, alphapose, Qt6

Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode