QueryDet: Cascading Sparse Query Accelerates Small Object Detection at High Resolution

2022-08-11 03:25:00 【AI Vision Network】

Paper: https://arxiv.org/abs/2103.09136

Code (open source): https://github.com/ChenhongyiYang/QueryDet-PyTorch

Computer Vision Research Institute column

Author: Edison_G

Although deep learning for general object detection has achieved great success in the past few years, the performance and efficiency of small object detection remain far from satisfactory.

Overview

The most common and effective way to improve small object detection is to use high-resolution images or feature maps. However, both approaches are computationally expensive, because the computational cost grows with the size of the images and feature maps.

To get the best of both worlds, the researchers propose QueryDet, which uses a novel query mechanism to accelerate the inference of feature-pyramid-based object detectors. The process consists of two steps, described below.

First, the rough locations of small objects are predicted on low-resolution features; then accurate detection results are computed using high-resolution features sparsely guided by those rough locations. This reaps the benefits of high-resolution feature maps while avoiding useless computation on background areas. On the popular COCO dataset, the method improves detection mAP by 1.0 and mAP-small by 2.0, and speeds up high-resolution inference by 3x on average. On the VisDrone dataset, which contains more small objects, the researchers set a new state of the art while gaining a 2.3x high-resolution speedup on average.

Background and Motivation

One way to improve small object detection is to preserve high-resolution features by scaling up the input image or reducing the CNN's downsampling rate, since both increase the effective resolution of the resulting feature maps. However, simply increasing the resolution of the feature maps incurs considerable computational cost. Several works [A unified multi-scale deep convolutional neural network for fast object detection] [Feature pyramid networks for object detection] [SSD: Single shot multibox detector] address this problem by reusing multi-scale feature maps from different CNN layers to build a feature pyramid. Objects of different scales are handled at different levels: large objects tend to be detected on high-level features, while small objects are usually detected on low-level ones. The feature pyramid paradigm saves the cost of maintaining high-resolution feature maps from the shallow to the deep layers of the backbone. Nevertheless, the computational complexity of the detection heads on low-level features is still huge.

For example, adding an extra pyramid level P2 to RetinaNet brings about 300% more computation (FLOPs) and memory cost in the detection head; as a result, the inference speed on an NVIDIA 2080Ti GPU drops severely, from 13.6 FPS to 4.85 FPS.
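The 300% figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below is an illustration, not a measurement from the paper. It assumes that pyramid level l has stride 2^l and that the cost of a shared detection head is proportional to the number of feature-map positions it visits. The function name relative_head_cost is hypothetical.

```python
def relative_head_cost(levels, base_level=3):
    """Head cost per pyramid level, in units of the cost on the base level.

    Assumption: level l has stride 2**l, so it has 4**(base_level - l) times
    as many positions as the base level, and head cost scales with positions.
    """
    return {level: 4.0 ** (base_level - level) for level in levels}


baseline = sum(relative_head_cost(range(3, 8)).values())  # heads on P3..P7
with_p2 = sum(relative_head_cost(range(2, 8)).values())   # heads on P2..P7

extra_percent = 100.0 * (with_p2 - baseline) / baseline
fps_slowdown = 13.6 / 4.85  # FPS figures quoted in the text

print(f"head cost P3-P7: {baseline:.3f}, P2-P7: {with_p2:.3f}")
print(f"extra head cost from adding P2: {extra_percent:.0f}%")  # 300%
print(f"end-to-end slowdown: {fps_slowdown:.2f}x")              # 2.80x
```

Under these assumptions P2 alone costs four times as much as P3, which is why a single extra level roughly quadruples the head cost.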

The researchers propose QueryDet, a simple and effective method that saves computation in the detection heads while improving performance on small objects. The motivation comes from two key observations:

1) Computation on low-level features is highly redundant. In most cases, the spatial distribution of small objects is very sparse: they occupy only a small part of the high-resolution feature maps, so a large amount of computation is wasted.

2) Feature pyramids are highly structured. Although small objects cannot be detected accurately on low-resolution feature maps, their existence and rough locations can still be inferred with high confidence.
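To make the first observation concrete, the sketch below counts how many cells of a high-resolution feature map overlap any object. The image size, the stride, and the three boxes are made-up values chosen for illustration; they are not taken from COCO or VisDrone. The helper covered_fraction is hypothetical.

```python
def covered_fraction(boxes, image_size, stride):
    """Fraction of feature-map cells that overlap at least one box.

    boxes: list of (x1, y1, x2, y2) in pixels, exclusive on the right/bottom.
    image_size: (width, height) in pixels.
    stride: downsampling factor of the pyramid level.
    """
    width, height = image_size
    cols, rows = width // stride, height // stride
    covered = set()
    for x1, y1, x2, y2 in boxes:
        # -(-a // b) is ceiling division, so partially covered cells count.
        for cy in range(y1 // stride, min(rows, -(-y2 // stride))):
            for cx in range(x1 // stride, min(cols, -(-x2 // stride))):
                covered.add((cx, cy))
    return len(covered) / (cols * rows)


# Hypothetical scene: three 24x24-pixel objects in a 1024x1024 image.
small_boxes = [(100, 100, 124, 124), (400, 300, 424, 324), (800, 640, 824, 664)]
fraction = covered_fraction(small_boxes, image_size=(1024, 1024), stride=4)
print(f"cells containing a small object: {fraction:.4%}")
```

In this toy scene well under 1% of the stride-4 cells contain an object, so a dense head spends almost all of its work on background.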

A natural way to exploit these two observations is to apply the detection head only at the spatial locations of small objects. This strategy requires finding the coarse locations of small objects cheaply and then computing sparsely on the desired feature maps.

In today's post, the researchers propose QueryDet, which is built on a novel query mechanism called Cascade Sparse Query (CSQ), as shown in the figure above. The rough locations of small objects (the queries) are predicted recursively on lower-resolution feature maps and used to guide computation on higher-resolution feature maps. With the help of sparse convolution, this significantly reduces the computational cost of the detection heads on low-level features while maintaining detection accuracy for small objects. Note that the proposed method targets spatial redundancy, so it is compatible with other acceleration methods such as lightweight backbones, model pruning, model quantization, and knowledge distillation.
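The saving from sparse computation can be illustrated with a toy example. The sketch below is a plain-Python stand-in for a sparse-convolution kernel, not the implementation used in the released code. It evaluates a single 3x3 convolution layer only at a given set of positions and checks that the results match a dense evaluation at those positions. The feature map, kernel, and key positions are hypothetical.

```python
def conv3x3_at(feature, kernel, y, x):
    """3x3 convolution response at one position (zero padding outside the map)."""
    rows, cols = len(feature), len(feature[0])
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < rows and 0 <= xx < cols:
                total += kernel[dy + 1][dx + 1] * feature[yy][xx]
    return total


def dense_head(feature, kernel):
    """Evaluate the layer at every position of the feature map."""
    return [[conv3x3_at(feature, kernel, y, x) for x in range(len(feature[0]))]
            for y in range(len(feature))]


def sparse_head(feature, kernel, key_positions):
    """Evaluate the layer only at the queried (y, x) positions."""
    return {(y, x): conv3x3_at(feature, kernel, y, x) for (y, x) in key_positions}


# Toy 8x8 single-channel feature map and an arbitrary kernel (both hypothetical).
feature = [[float((3 * y + 5 * x) % 7) for x in range(8)] for y in range(8)]
kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
keys = [(2, 2), (2, 3), (3, 2), (3, 3)]  # positions flagged by a coarser level

dense = dense_head(feature, kernel)
sparse = sparse_head(feature, kernel, keys)

# Same values at the queried positions, but 4 evaluations instead of 64.
matches = all(sparse[(y, x)] == dense[y][x] for (y, x) in keys)
print(f"outputs match: {matches}, positions evaluated: {len(sparse)} of 64")
```

This equivalence holds for a single layer reading dense input features. In a multi-layer sparse head, intermediate activations exist only at the active positions, so the behavior of a real sparse-convolution library may differ in detail.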

New Framework

Revisiting RetinaNet

RetinaNet has two parts: a backbone network with an FPN that outputs multi-scale feature maps, and two detection heads for classification and regression.

ResNet + FPN: extracts image features
Anchor: bounding-box search
Class subnet (Focal Loss): predicts the class
Box subnet: predicts bounding-box coordinates and size

The P3 head accounts for nearly half of the FLOPs, while the low-resolution features P4 to P7 cost only 15%. Therefore, extending the FPN to P2 for better small object performance is unaffordable: the high-resolution P2 and P3 would account for 75% of the total cost. The following describes how QueryDet reduces the computation on high-resolution features and speeds up inference.

Accelerating Inference by Sparse Query

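In FPN-based detector designs, small objects tend to be detected on high-resolution, low-level feature maps. However, since small objects are usually sparsely distributed in space, the dense computation paradigm on high-resolution feature maps is very inefficient. Inspired by this observation, the researchers propose a coarse-to-fine approach to reduce the computational cost of the low-level pyramid levels: first, the coarse locations of small objects are predicted on coarse feature maps, and then computation is concentrated on the corresponding locations of the fine feature maps. This process can be viewed as a query process: the coarse locations are the query keys, and the high-resolution features used to detect small objects are the query values; hence the method is called QueryDet. The whole process is shown in the following figure.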
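One step of this coarse-to-fine query can be sketched as follows. The sketch assumes that adjacent pyramid levels differ by a factor of 2 in resolution, so a coarse position (y, x) corresponds to the 2x2 block starting at (2y, 2x) on the next finer level. The score map, the threshold value 0.15, and the function names are hypothetical.

```python
def select_queries(score_map, sigma):
    """Return (y, x) positions whose small-object score exceeds the threshold sigma."""
    return [(y, x)
            for y, row in enumerate(score_map)
            for x, score in enumerate(row)
            if score > sigma]


def keys_on_finer_level(queries):
    """Map each coarse query (y, x) to its 2x2 block on the next finer level."""
    keys = set()
    for y, x in queries:
        for dy in (0, 1):
            for dx in (0, 1):
                keys.add((2 * y + dy, 2 * x + dx))
    return sorted(keys)


# Hypothetical 4x4 coarse score map: predicted probability that a small
# object is present at each position.
coarse_scores = [
    [0.02, 0.01, 0.03, 0.01],
    [0.01, 0.85, 0.10, 0.02],
    [0.03, 0.05, 0.02, 0.01],
    [0.01, 0.02, 0.01, 0.60],
]

queries = select_queries(coarse_scores, sigma=0.15)
keys = keys_on_finer_level(queries)

print(queries)  # [(1, 1), (3, 3)]
print(keys)     # 8 positions on the 8x8 finer map, out of 64
```

Only the returned key positions on the finer level need to go through the detection head. Applying the same step again to the finer level's own scores gives the cascade, and raising sigma trades recall of small objects for fewer positions to compute.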

Relationships with Related Work

Note that although the new method bears some similarity to two-stage object detectors that use an RPN, they differ in the following ways:

The new method computes only classification results in the coarse prediction, whereas an RPN computes both classification and regression.
An RPN is computed on the full feature maps at all levels, whereas QueryDet's computation is sparse and selective.
Two-stage approaches rely on operations such as RoIAlign or RoIPooling to align features with first-stage proposals; these are not used in the proposed approach, because the coarse prediction has no box output.

It is worth noting that the proposed method is compatible with FPN-based RPNs, so QueryDet can be incorporated into two-stage detectors to speed up proposal generation.

Experiments and Visualization

Comparison of accuracy (AP) and speed (FPS) of our QueryDet and the baseline RetinaNet on COCO mini-val set

Comparison of detection accuracy (AP) and speed (FPS) of our QueryDet and the baseline RetinaNet on VisDrone validation set

The speed and accuracy (AP and AR) trade-off for input images of different sizes on COCO and VisDrone. The trade-off is controlled by the query threshold σ. The leftmost marker (the ▲ marker) of each curve stands for the result when Cascade Sparse Query is not applied. QD stands for QueryDet and RN stands for RetinaNet

Visualization of the detection results and the query heatmap for small objects of our QueryDet on MS-COCO and VisDrone2018 datasets. We remove class labels for VisDrone2018 to better distinguish the small bounding boxes
