QueryDet: Cascading Sparse Query Accelerates Small Object Detection at High Resolution
2022-08-11 03:25:00 【AI Vision Network】
Paper: https://arxiv.org/abs/2103.09136
Code (open source): https://github.com/ChenhongyiYang/QueryDet-PyTorch
Computer Vision Research Institute column
Author: Edison_G
Although deep learning for general object detection has achieved great success over the past few years, the performance and efficiency of small object detection remain far from satisfactory.
01
Overview
The most common and effective way to improve small object detection is to use high-resolution images or feature maps. However, both approaches are computationally expensive, because the cost grows in proportion to the size of the image and of the feature maps.
To get the best of both worlds, the researchers propose QueryDet, which uses a novel query mechanism to accelerate the inference speed of feature-pyramid-based object detectors. The process consists of two steps, as shown below.
First, the coarse locations of small objects are predicted on low-resolution features. Then, accurate detection results are computed on high-resolution features, sparsely guided by those coarse locations. This not only reaps the benefits of high-resolution feature maps but also avoids useless computation on background regions. On the popular COCO dataset, the method improves detection mAP by 1.0 and mAP-small by 2.0, and speeds up high-resolution inference by 3x on average. On the VisDrone dataset, which contains more small objects, the researchers set a new state of the art while obtaining an average 2.3x high-resolution speedup.
02
Background and Motivation
Small object detection can be improved by scaling up the input image or by reducing the CNN's downsampling rate to preserve high-resolution features, since both increase the effective resolution of the resulting feature maps. However, simply increasing the resolution of the feature maps incurs considerable computational cost. Several works [A unified multi-scale deep convolutional neural network for fast object detection] [Feature pyramid networks for object detection] [SSD: Single shot multibox detector] address this problem by reusing multi-scale feature maps from different CNN layers to build a feature pyramid. Objects of different scales are handled at different levels: large objects tend to be detected on high-level features, while small objects are usually detected on low-level features. The feature pyramid paradigm saves the computational cost of maintaining high-resolution feature maps from shallow to deep layers in the backbone. Nevertheless, the computational complexity of the detection head on low-level features is still huge.
For example, adding an extra pyramid level P2 to RetinaNet brings about 300% more computation (FLOPs) and memory cost in the detection head; as a result, inference speed on an NVIDIA 2080Ti GPU drops sharply from 13.6 FPS to 4.85 FPS.
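The "about 300%" figure can be sanity-checked with a rough back-of-the-envelope calculation. The sketch below is an illustration only and rests on two assumptions that the article does not state: that P3 to P7 use the usual RetinaNet strides of 8 to 128 (with P2 at stride 4), and that the cost of the shared detection head is proportional to the number of feature-map positions it visits.

```python
from fractions import Fraction

def head_positions(strides):
    """Relative number of feature-map positions, summed over pyramid levels.

    A level with stride s has 1 / s**2 positions per input pixel.
    """
    return sum(Fraction(1, s * s) for s in strides)

# Assumed strides (not given in the article): P3..P7 = 8..128, P2 = 4.
baseline = head_positions([8, 16, 32, 64, 128])
with_p2 = head_positions([4, 8, 16, 32, 64, 128])

# Extra head work caused by adding P2, relative to the baseline head.
increase = with_p2 / baseline - 1
print(f"extra head computation: {float(increase):.0%}")

# Slowdown implied by the FPS numbers quoted in the article.
slowdown = 13.6 / 4.85
print(f"slowdown: {slowdown:.2f}x")
```

Under these assumptions P2 alone has four times as many positions as P3, which is why a single extra level dominates the cost of the head.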
The researchers propose QueryDet, a simple and effective method that saves detection-head computation while improving performance on small objects. The motivation comes from two key observations:
1) Computation on low-level features is highly redundant. In most cases, the spatial distribution of small objects is very sparse: they occupy only a small portion of the high-resolution feature map, so a large amount of computation is wasted.
2) Feature pyramids are highly structured. Although small objects cannot be detected accurately on low-resolution feature maps, their existence and rough locations can still be inferred with high confidence.
A natural way to exploit the two observations, illustrated in the figure above, is to apply the detection head only at the spatial locations of small objects. This strategy requires locating the rough positions of small objects at low cost and performing sparse computation on the desired feature maps.
In today's post, the researchers propose QueryDet, which is built on a novel query mechanism called Cascade Sparse Query (CSQ), as shown in the figure above. The coarse locations of small objects (the queries) are predicted recursively on lower-resolution feature maps and are used to guide computation on higher-resolution feature maps. With sparse convolution, the computational cost of the detection head on low-level features is significantly reduced, while detection accuracy for small objects is maintained. Note that the proposed method saves computation along the spatial dimension, so it is compatible with other acceleration methods such as lightweight backbones, model pruning, model quantization, and knowledge distillation.
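The recursive query idea can be sketched in a few lines of plain Python. This is a toy illustration, not the authors' implementation: the function names, the hand-written score maps, and the rule that one coarse position maps to the 2x2 block it covers on the next finer level (assuming a 2x resolution step between adjacent pyramid levels) are all assumptions made for the example.

```python
def to_finer_level(queries):
    """Map each coarse (y, x) position to the 2x2 block it covers one level finer."""
    keys = set()
    for y, x in queries:
        for dy in (0, 1):
            for dx in (0, 1):
                keys.add((2 * y + dy, 2 * x + dx))
    return sorted(keys)

def cascade(score_maps, sigma):
    """Run the query cascade over score maps ordered from coarse to fine.

    Only positions selected by the previous level are examined on each level.
    Returns the positions to process on the level finer than the last map.
    """
    height, width = len(score_maps[0]), len(score_maps[0][0])
    candidates = [(y, x) for y in range(height) for x in range(width)]
    for scores in score_maps:
        queries = [(y, x) for (y, x) in candidates if scores[y][x] > sigma]
        candidates = to_finer_level(queries)
    return candidates

# Hypothetical small-object scores on a 2x2 level and the following 4x4 level.
coarse = [[0.9, 0.1],
          [0.0, 0.2]]
fine = [[0.1, 0.8, 0.9, 0.9],
        [0.2, 0.1, 0.9, 0.9],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]

selected = cascade([coarse, fine], sigma=0.5)
print(selected)
```

In this toy input the high scores in the top-right corner of `fine` are never examined, because the coarse level did not query that region. That is the source of the savings, and also the reason the coarse prediction needs to have good recall.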
03
New Framework
Revisiting RetinaNet
RetinaNet has two parts: a backbone network with an FPN that outputs multi-scale feature maps, and two detection heads for classification and regression.
ResNet+FPN: extracts image features
Anchor: bounding-box search
Class subnet (Focal Loss): predicts categories
Box subnet: predicts bounding-box coordinates and size
The P3 head accounts for nearly half of the FLOPs, while the low-resolution features P4 to P7 cost only 15%. Therefore, if the FPN is extended to P2 for better small-object performance, the cost is unaffordable: the high-resolution P2 and P3 would account for 75% of the total cost. The following describes how QueryDet reduces computation on high-resolution features and improves inference speed.
Accelerating Inference by Sparse Query
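In FPN-based detector designs, small objects tend to be detected from high-resolution, low-level feature maps. However, because small objects are usually sparsely distributed in space, the dense computation paradigm on high-resolution feature maps is very inefficient. Inspired by this observation, the researchers propose a coarse-to-fine approach to reduce the cost of the low pyramid levels: first, the coarse locations of small objects are predicted on coarse feature maps; then, computation is concentrated on the corresponding locations of the fine feature maps. This process can be viewed as a query process: the coarse locations are the query keys, and the high-resolution features used to detect small objects are the query values. The proposed method is therefore called QueryDet. The whole pipeline of the method is shown in the following figure.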
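The key/value view can be made concrete with a small example. The sketch below is a simplification under stated assumptions: the "head" is a hypothetical per-position linear scorer (roughly a 1x1 convolution), whereas the actual detection heads use stacked convolutions evaluated with sparse convolution; the feature values and weights are made up for illustration.

```python
def dot(vec, weight):
    """Inner product of a feature vector and a weight vector."""
    return sum(f * w for f, w in zip(vec, weight))

def dense_head(features, weight):
    """Dense paradigm: score every position of the feature map."""
    return [[dot(vec, weight) for vec in row] for row in features]

def sparse_head(features, keys, weight):
    """Query paradigm: score only the positions listed in keys."""
    return {(y, x): dot(features[y][x], weight) for (y, x) in keys}

# Hypothetical 4x4 high-resolution feature map with 2 channels per position.
features = [[[y, x] for x in range(4)] for y in range(4)]
weight = [2, 3]

# Query keys, e.g. the 2x2 block selected by one coarse position.
keys = [(0, 0), (0, 1), (1, 0), (1, 1)]

dense = dense_head(features, weight)
sparse = sparse_head(features, keys, weight)

# Share of positions the sparse head actually visited.
work_fraction = len(sparse) / (len(features) * len(features[0]))
print(sparse, work_fraction)
```

With a per-position head the sparse result equals the dense result at the queried positions while touching only a quarter of the map here. With 3x3 convolutions the outputs near the border of the queried region also depend on neighboring features, which is a detail this sketch leaves out.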
Relationships with Related Work
Note that although the new method bears some resemblance to two-stage object detectors that use an RPN, they differ in the following ways:
The new method computes only classification results in the coarse prediction, whereas RPN computes classification and regression simultaneously.
RPN is computed on the full feature maps at all levels, whereas QueryDet's computation is sparse and selective.
Two-stage approaches rely on operations such as RoIAlign or RoIPooling to align features with first-stage proposals.
These operations are not used in the proposed approach, because the coarse prediction produces no box output. Notably, the proposed method is compatible with FPN-based RPN, so QueryDet can be incorporated into two-stage detectors to speed up proposal generation.
04
Experiments and Visualization
Comparison of accuracy (AP) and speed (FPS) of our QueryDet and the baseline RetinaNet on COCO mini-val set
Comparison of detection accuracy (AP) and speed (FPS) of our QueryDet and the baseline RetinaNet on VisDrone validation set
The speed and accuracy (AP and AR) trade-off with input images of different sizes on COCO and VisDrone. The trade-off is controlled by the query threshold σ. The leftmost marker (the ▲ marker) of each curve stands for the result when Cascade Sparse Query is not applied. QD stands for QueryDet and RN stands for RetinaNet
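To give a feel for how the query threshold σ steers this trade-off, the sketch below counts how many coarse positions become queries at several thresholds. The score map is invented for illustration and `kept_fraction` is a hypothetical helper, not part of the released code. Fewer queries mean less high-resolution computation, but also a higher chance of missing a small object.

```python
def kept_fraction(score_map, sigma):
    """Fraction of coarse positions whose score exceeds sigma and becomes a query."""
    scores = [s for row in score_map for s in row]
    return sum(s > sigma for s in scores) / len(scores)

# Made-up small-object scores on a 4x4 coarse feature map.
scores = [[0.05, 0.10, 0.40, 0.90],
          [0.02, 0.15, 0.60, 0.70],
          [0.01, 0.03, 0.20, 0.30],
          [0.00, 0.02, 0.05, 0.10]]

# Raising sigma shrinks the queried set monotonically.
fractions = [kept_fraction(scores, sigma) for sigma in (0.1, 0.3, 0.5)]
print(fractions)
```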
Visualization of the detection results and the query heatmap for small objects of our QueryDet on MS-COCO and VisDrone2018 datasets. We remove class labels for VisDrone2018 to better distinguish the small bounding boxes