
Deep learning object detection

2022-04-23 05:30:00 【Mikawa】


Paper list from 2014 to 2020

[Figure: deep learning object detection history]

Milestones

[Figure: milestones]

Parts of an object detector

Input: Image, Patches, Image Pyramid
Backbones: VGG16, ResNet-50, SpineNet, EfficientNet-B0/B7, CSPResNeXt50, CSPDarknet53
Neck:
- Additional blocks: SPP, ASPP, RFB, SAM
- Path-aggregation blocks: FPN, PAN, NAS-FPN, Fully-connected FPN, BiFPN, ASFF, SFAM
Heads:
- Dense Prediction (one-stage):
  - RPN, SSD, YOLO, RetinaNet (anchor based)
  - CornerNet, CenterNet, MatrixNet, FCOS (anchor free)
- Sparse Prediction (two-stage):
  - Faster R-CNN, R-FCN, Mask R-CNN (anchor based)
  - RepPoints (anchor free)

Categories of detection methods

Object detection steps

One-Stage

Extracts features from all areas of the image, classifies the objects, and localizes their bounding boxes in a single stage.

Two-Stage

Generates category-independent region proposals, extracts a feature vector from each region proposal, then classifies the objects and predicts precise bounding boxes (followed by NMS).
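Both pipelines end by removing duplicate boxes with non-maximum suppression (NMS). The following is a minimal NumPy sketch of greedy NMS. It assumes boxes stored as `[x1, y1, x2, y2]` rows. The function names `iou` and `nms` are illustrative, and the default IoU threshold of 0.5 is an assumption, since detectors tune this value.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and each row of `boxes`; all boxes are [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much, repeat."""
    order = np.argsort(scores)[::-1]              # candidate indices, highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]   # survivors stay in score order
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU 81/119 ≈ 0.68
```

Production code usually calls a library routine such as `torchvision.ops.nms` instead. The loop above is only meant to make the greedy procedure explicit.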

Small object detection tricks

Framework for small object detection
- Multi-scale Feature Learning
Enhance the Receptive Fields (visual attention mechanisms)
Data Augmentation
- GAN-based Detection
- Flipping, cropping, rotating, scaling
Training Strategy
- Unsupervised object detection
- Weakly Supervised Object Detection
- Multi-Scale Training/Val/Test
- GPU acceleration
Context-based Detection
- Local context
- Global context
- Context interaction
Neural Architecture Search
- Stacking more pyramid networks
- Adding feature dimension
- Adopting high capacity architecture
Efficient post-processing methods
- Non-maximum suppression (NMS)
- Soft-NMS
Deformable convolutional networks
Multi-task joint learning and optimization
- Object detection
- Semantic segmentation
- Instance segmentation
- Edge detection
- Highlight detection
Establish small object datasets
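Among the post-processing methods listed, Soft-NMS differs from plain NMS in one respect: it decays the scores of overlapping boxes instead of discarding them. This helps when small or crowded objects overlap. Below is a minimal sketch of the Gaussian variant, s_i ← s_i · exp(-IoU(M, b_i)² / σ). It assumes `[x1, y1, x2, y2]` boxes, σ = 0.5, and a hypothetical pruning threshold of 0.001. The helper names are illustrative only.

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one box and each row of `boxes`, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS. Returns kept indices (selection order) and the decayed scores."""
    scores = scores.astype(float).copy()
    idxs = np.arange(len(scores))                 # boxes still competing
    keep = []
    while idxs.size > 0:
        top = idxs[np.argmax(scores[idxs])]       # current highest-scoring box M
        keep.append(int(top))
        idxs = idxs[idxs != top]
        if idxs.size == 0:
            break
        overlaps = box_iou(boxes[top], boxes[idxs])
        scores[idxs] *= np.exp(-(overlaps ** 2) / sigma)   # soft penalty, no hard cut
        idxs = idxs[scores[idxs] > score_threshold]        # prune near-zero scores
    return keep, scores

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
keep, new_scores = soft_nms(boxes, np.array([0.9, 0.8, 0.7]))
print(keep)  # [0, 2, 1]: box 1 survives with a reduced score instead of being removed
```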

Performance table

FPS (speed) depends on the hardware (e.g., CPU, GPU, RAM), so speeds reported in different papers cannot be compared on equal terms. A fair comparison would require measuring every model on hardware with equivalent specifications, which is very difficult and time-consuming.
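To compare speed fairly, run every model through the same measurement on the same machine. Below is a minimal timing sketch, assuming `infer` is any callable that processes one image per call. The function name and the warm-up and run counts are illustrative assumptions. GPU frameworks often run asynchronously, so pass a `sync` callable such as PyTorch's `torch.cuda.synchronize`; otherwise the timer may stop before the work finishes.

```python
import time

def measure_fps(infer, image, warmup=10, runs=100, sync=None):
    """Images per second for infer(image), timed after a number of warm-up calls.

    sync: optional callable that blocks until queued device work has finished;
    leave as None for purely synchronous (e.g. CPU-only) code.
    """
    for _ in range(warmup):          # warm-up: lazy initialisation, caches, autotuning
        infer(image)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        infer(image)
    if sync is not None:
        sync()                       # ensure all timed work has completed
    elapsed = time.perf_counter() - start
    return runs / elapsed

# Toy stand-in for a detector: summing a list.
print(measure_fps(lambda x: sum(x), list(range(1000)), warmup=2, runs=20))
```

If each call processes a batch of N images, multiply the result by N.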

Detector	COCO (mAP@IoU=0.5:0.95)	Published In
R-CNN	-	CVPR’14
Fast R-CNN	19.7	ICCV’15
Faster R-CNN	21.9	NIPS’15
YOLO v1	-	CVPR’16
SSD	31.2	ECCV’16
R-FCN	29.9	NIPS’16
FPN	36.2	CVPR’17
YOLO v2	-	CVPR’17
RetinaNet	39.1	ICCV’17
Mask R-CNN	39.8	ICCV’17
Soft-NMS	40.9	ICCV’17
YOLO v3	33.0	arXiv’18
RefineDet	41.8	CVPR’18
Cascade R-CNN	42.8	CVPR’18
RFBNet	-	ECCV’18
Softer-NMS	-	arXiv’18
SNIPER	43.5	NIPS’18
M2Det	44.2	AAAI’19
Libra R-CNN	43.0	CVPR’19
FSAF	44.6	CVPR’19
ExtremeNet	43.7	CVPR’19
CenterNet	45.1	ICCV’19
FreeAnchor	44.8	NeurIPS’19
CBNet	53.3	AAAI’20
YOLOv4	-	arXiv’20
ATSS	50.7	CVPR’20
Hit-Detector	41.4	CVPR’20
DetectoRS	54.7	arXiv’20

Performance on MS COCO

MS COCO detection evaluation metrics
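The COCO column in the performance table is AP averaged over ten IoU thresholds: 0.50, 0.55, ..., 0.95. Below is a simplified single-class, single-image sketch of that averaging. It loosely follows the greedy matching and 101-point interpolated precision used by `COCOeval` in pycocotools. It ignores crowd regions, area ranges, the per-image detection limit, and averaging over images and categories, so treat it as an illustration rather than a replacement for the official evaluator. The function names are hypothetical.

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), both as [x1, y1, x2, y2]."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def average_precision(dets, scores, gts, iou_thr):
    """101-point interpolated AP at one IoU threshold (one class, one image)."""
    order = np.argsort(-scores)                       # highest score first
    ious = iou_matrix(dets[order], gts)
    matched = np.zeros(len(gts), dtype=bool)
    tp = np.zeros(len(order))
    for i in range(len(order)):
        cand = np.where(~matched & (ious[i] >= iou_thr))[0]
        if cand.size:                                 # match the best unmatched ground truth
            j = cand[np.argmax(ious[i, cand])]
            matched[j] = True
            tp[i] = 1
    cum_tp = np.cumsum(tp)
    recall = cum_tp / len(gts)
    precision = cum_tp / np.arange(1, len(order) + 1)
    precision = np.maximum.accumulate(precision[::-1])[::-1]   # make non-increasing
    idx = np.searchsorted(recall, np.linspace(0, 1, 101), side="left")
    interp = [precision[k] if k < len(precision) else 0.0 for k in idx]
    return float(np.mean(interp))

def coco_style_ap(dets, scores, gts):
    """Mean AP over IoU thresholds 0.50:0.05:0.95."""
    return float(np.mean([average_precision(dets, scores, gts, t)
                          for t in np.linspace(0.5, 0.95, 10)]))

gt = np.array([[0, 0, 10, 10]], dtype=float)
print(coco_style_ap(np.array([[0, 0, 10, 7.2]]), np.array([0.9]), gt))  # 0.5
```

In the example, a detection with IoU 0.72 counts as a true positive at the five thresholds from 0.50 to 0.70 and as a miss at the other five. Its AP@IoU=0.5 is therefore 1.0, but its COCO-style average is only 0.5. This is why the strict averaged metric rewards precise localization.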

2014

[R-CNN] Rich feature hierarchies for accurate object detection and semantic segmentation | [CVPR’ 14] |[pdf] [official code - caffe] CNN

2015

[Fast R-CNN] Fast R-CNN | [ICCV’ 15] |[pdf] [official code - caffe] RoI
[Faster R-CNN, RPN] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks | [NIPS’ 15] |[pdf] [official code - caffe] [unofficial code - tensorflow] [unofficial code - pytorch] Region Proposal Network (RPN), NMS
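The Region Proposal Network in Faster R-CNN scores a fixed set of reference boxes (anchors) at every feature-map position. The paper uses 3 scales (128², 256², 512² pixels) and 3 aspect ratios (1:1, 1:2, 2:1), giving 9 anchors per position. Below is a minimal sketch of how such anchors can be generated and tiled over a feature map. The stride of 16 and the "centre of each cell" convention are assumptions for illustration, and the original Caffe code offsets anchors slightly differently. The function names are hypothetical.

```python
import numpy as np

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Base anchors centred at (0, 0) as [x1, y1, x2, y2].

    ratio = height / width; every anchor of a given scale has area scale**2.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Place a copy of every base anchor at the centre of each stride x stride cell."""
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    cx = (xs.ravel() + 0.5) * stride
    cy = (ys.ravel() + 0.5) * stride
    shifts = np.stack([cx, cy, cx, cy], axis=1)                  # (H*W, 4)
    return (shifts[:, None, :] + anchors[None, :, :]).reshape(-1, 4)

base = make_anchors()
print(base.shape, shift_anchors(base, 38, 50).shape)  # (9, 4) (17100, 4)
```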

2016

[YOLO v1] You Only Look Once: Unified, Real-Time Object Detection | [CVPR’ 16] |[pdf] [official code - c] One-stage
[SSD] SSD: Single Shot MultiBox Detector | [ECCV’ 16] |[pdf] [official code - caffe] [unofficial code - tensorflow] [unofficial code - pytorch] Multi-scale feature map, VGG16, NMS
[R-FCN] R-FCN: Object Detection via Region-based Fully Convolutional Networks | [NIPS’ 16] |[pdf] [official code - caffe] [unofficial code - caffe]
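SSD's multi-scale feature maps each get default boxes of a different scale. The paper spaces them linearly: s_k = s_min + (s_max - s_min)(k - 1)/(m - 1), with s_min = 0.2 and s_max = 0.9. Each box has width s_k·√a and height s_k/√a for aspect ratios a ∈ {1, 2, 3, 1/2, 1/3}, plus one extra square box of scale √(s_k·s_{k+1}). Below is a small sketch of those formulas. Choosing m = 6 is an assumption for illustration, and the caller supplies s_{k+1} for the last map. The function names are hypothetical.

```python
import math

def ssd_scales(m=6, s_min=0.2, s_max=0.9):
    """Default-box scale for each of m feature maps, relative to the input image size."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def default_box_sizes(s_k, s_next, aspect_ratios=(1.0, 2.0, 3.0, 1 / 2, 1 / 3)):
    """(width, height) of the default boxes on one feature map, relative to the image."""
    sizes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]
    extra = math.sqrt(s_k * s_next)          # additional box for aspect ratio 1
    sizes.append((extra, extra))
    return sizes

scales = ssd_scales()
print([round(s, 2) for s in scales])          # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
print(len(default_box_sizes(scales[0], scales[1])))   # 6 boxes per location
```

Lower (higher-resolution) feature maps get small boxes and deeper maps get large ones. This is how SSD covers objects of different sizes without an image pyramid.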

2017

[FPN] Feature Pyramid Networks for Object Detection | [CVPR’ 17] |[pdf] [unofficial code - caffe] Feature Pyramid Networks
[YOLO v2] YOLO9000: Better, Faster, Stronger | [CVPR’ 17] |[pdf] [official code - c] [unofficial code - caffe] [unofficial code - tensorflow] [unofficial code - tensorflow] [unofficial code - pytorch]
[RetinaNet] Focal Loss for Dense Object Detection | [ICCV’ 17] |[pdf] [official code - keras] [unofficial code - pytorch] [unofficial code - mxnet] [unofficial code - tensorflow] Focal Loss
[Mask R-CNN] Mask R-CNN | [ICCV’ 17] |[pdf] [official code - caffe2] [unofficial code - tensorflow] [unofficial code - tensorflow] [unofficial code - pytorch]
[Soft-NMS] Improving Object Detection With One Line of Code | [ICCV’ 17] |[pdf] [official code - caffe] Soft-NMS
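The Focal Loss introduced with RetinaNet down-weights the many easy background anchors of a one-stage detector: FL(p_t) = -α_t (1 - p_t)^γ log(p_t). Here p_t is the predicted probability of the true class, and the paper's default setting is γ = 2, α = 0.25. Below is a minimal per-anchor NumPy sketch. The clipping epsilon is an assumption added for numerical safety. As in the paper, a training loop would sum these values and divide by the number of positive anchors.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss per anchor.

    p: predicted foreground probabilities; y: 0/1 ground-truth labels (same shape).
    """
    p = np.clip(p, eps, 1.0 - eps)                   # avoid log(0)
    p_t = np.where(y == 1, p, 1.0 - p)               # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)   # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy negative (p = 0.01 for background) contributes almost nothing:
easy = focal_loss(np.array([0.01]), np.array([0]))[0]
ce = -np.log(0.99)                                   # plain cross-entropy for comparison
print(easy / ce)                                     # ≈ 7.5e-05, i.e. 0.75 * 0.01**2
```

With γ = 0 and α = 0.5, the expression reduces to half the ordinary cross-entropy. This is a convenient sanity check.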

2018

[YOLO v3] YOLOv3: An Incremental Improvement | [arXiv’ 18] |[pdf] [official code - c] [unofficial code - pytorch] [unofficial code - pytorch] [unofficial code - keras] [unofficial code - tensorflow]
[RefineDet] Single-Shot Refinement Neural Network for Object Detection | [CVPR’ 18] |[pdf] [official code - caffe] [unofficial code - chainer] [unofficial code - pytorch] Combine one-stage and two-stage
[Cascade R-CNN] Cascade R-CNN: Delving into High Quality Object Detection | [CVPR’ 18] |[pdf] [official code - caffe] Training Strategy
[RFBNet] Receptive Field Block Net for Accurate and Fast Object Detection | [ECCV’ 18] |[pdf] [official code - pytorch] Enhance the Receptive Fields
[Softer-NMS] Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection | [arXiv’ 18] |[pdf] Soft-NMS
[SNIPER] SNIPER: Efficient Multi-Scale Training | [NIPS’ 18] |[pdf] Training Strategy
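Cascade R-CNN trains a sequence of detection heads with increasing IoU thresholds for deciding which proposals count as foreground (0.5, 0.6, 0.7 in the paper). Below is a minimal sketch of just that labeling rule. It omits the key detail that each stage relabels the boxes already refined by the previous stage's regressor. The function names are hypothetical.

```python
import numpy as np

def pairwise_iou(a, b):
    """IoU between proposals a (N, 4) and ground truths b (M, 4), as [x1, y1, x2, y2]."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def assign_labels(proposals, gt_boxes, iou_threshold):
    """1 if a proposal's best IoU with any ground truth reaches the threshold, else 0."""
    best = pairwise_iou(proposals, gt_boxes).max(axis=1)
    return (best >= iou_threshold).astype(int)

CASCADE_THRESHOLDS = (0.5, 0.6, 0.7)      # one threshold per stage

gt = np.array([[0, 0, 10, 10]], dtype=float)
props = np.array([[0, 0, 10, 5.5], [0, 0, 10, 6.5], [0, 0, 10, 7.5]])  # IoU 0.55, 0.65, 0.75
for t in CASCADE_THRESHOLDS:
    print(t, assign_labels(props, gt, t).tolist())
# 0.5 [1, 1, 1] / 0.6 [0, 1, 1] / 0.7 [0, 0, 1]
```

Later stages see fewer but better-localized positives. The cascade compensates by feeding them boxes that the earlier stages have already tightened.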

2019

[M2Det] M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | [AAAI’ 19] |[pdf] [official code - pytorch] Multi-scale Feature Learning
[Libra R-CNN] Libra R-CNN: Balanced Learning for Object Detection | [CVPR’ 19] |[pdf] Training Strategy
[FSAF] Feature Selective Anchor-Free Module for Single-Shot Object Detection | [CVPR’ 19] |[pdf] Anchor-Free
[ExtremeNet] Bottom-up Object Detection by Grouping Extreme and Center Points | [CVPR’ 19] |[pdf] [official code - pytorch] Instance Segmentation
[CenterNet] CenterNet: Keypoint Triplets for Object Detection | [ICCV’ 19] |[pdf] Keypoint-based detector
[FreeAnchor] FreeAnchor: Learning to Match Anchors for Visual Object Detection | [NeurIPS’ 19] |[pdf] Learned anchor matching
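Anchor-free heads such as FSAF's anchor-free branch, or FCOS in the detector-parts list, skip anchors entirely. For a feature-map location, they predict the distances from that location to the four sides of the object box. Below is a minimal sketch of that encoding in the FCOS style (left, top, right, bottom) and its inverse. It is one common formulation, not FSAF's exact normalization, and the function names are hypothetical.

```python
import numpy as np

def ltrb_targets(points, box):
    """Distances (left, top, right, bottom) from each (x, y) point to box [x1, y1, x2, y2].

    A point is a candidate positive only if it lies inside the box (all distances > 0).
    """
    x, y = points[:, 0], points[:, 1]
    ltrb = np.stack([x - box[0], y - box[1], box[2] - x, box[3] - y], axis=1)
    inside = ltrb.min(axis=1) > 0
    return ltrb, inside

def decode(points, ltrb):
    """Invert the encoding: recover [x1, y1, x2, y2] from points and predicted distances."""
    return np.stack([points[:, 0] - ltrb[:, 0], points[:, 1] - ltrb[:, 1],
                     points[:, 0] + ltrb[:, 2], points[:, 1] + ltrb[:, 3]], axis=1)

pts = np.array([[30.0, 40.0], [5.0, 5.0]])
box = np.array([10.0, 20.0, 50.0, 60.0])
ltrb, inside = ltrb_targets(pts, box)
print(ltrb[0], inside)            # [20. 20. 20. 20.] [ True False]
print(decode(pts[:1], ltrb[:1]))  # [[10. 20. 50. 60.]]
```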

2020

[CBNet] CBNet: A Novel Composite Backbone Network Architecture for Object Detection | [AAAI’ 20] |[pdf] Composite Backbone Network
[YOLOv4] YOLOv4: Optimal Speed and Accuracy of Object Detection | [arXiv’ 20] |[pdf]
- Input: Mosaic data augmentation, Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT)
- BackBone: CSPDarknet53, Mish-activation, DropBlock regularization
- Neck: SPP block, PAN (path-aggregation block)
- Prediction: CIoU-loss, DIoU-NMS
[ATSS] Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection | [CVPR’ 20] |[pdf] Training Strategy (adaptive training sample selection)
[Hit-Detector] Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection | [CVPR’ 20] |[pdf] Neural Architecture Search
[DetectoRS] DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | [arXiv’ 20] |[pdf] Recursive Feature Pyramid, Switchable Atrous Convolution, Instance Segmentation
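The YOLOv4 entry lists CIoU loss for box regression. CIoU extends IoU with a center-distance penalty and an aspect-ratio term: L = 1 - IoU + ρ²/c² + αv. Here ρ is the distance between box centers and c is the diagonal of the smallest enclosing box. The remaining terms are v = (4/π²)(arctan(w_gt/h_gt) - arctan(w/h))² and α = v / ((1 - IoU) + v). Below is a minimal scalar sketch for one box pair. The epsilon guard and the function name are assumptions, and training code would vectorize it and keep gradients.

```python
import math

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two boxes given as [x1, y1, x2, y2]."""
    # Overlap term: plain IoU
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w * h + wg * hg - inter + eps)
    # Distance term: squared centre distance over squared enclosing-box diagonal
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) ** 2
            + ((pred[1] + pred[3]) - (gt[1] + gt[3])) ** 2) / 4
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss([0, 0, 10, 10], [20, 0, 30, 10]))  # ≈ 1.4: IoU 0, rho² 400, c² 1000
```

Unlike 1 - IoU, which is 1 for every pair of non-overlapping boxes, the distance term still separates near misses from far misses. It therefore gives a useful training signal before the boxes overlap.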

Survey

Recent advances in small object detection based on deep learning: A review [pdf]
A Survey of Deep Learning-based Object Detection [pdf]
Object Detection in 20 Years: A Survey [pdf]
Recent Advances in Deep Learning for Object Detection [pdf]