Interpretation of the paper: GAN and detection network multi-task/SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network
2022-08-11 06:33:00 【pontoon】
1. Bottlenecks:
Small objects carry too little feature information to be reliably separated from the background. They are also usually low-resolution and blurry, so detection performance on them is mediocre
CNN-based detectors all rely on downsampling. As a result, small objects lose spatial location information, and the few object features that remain are largely drowned out by background features
2. Contributions of the paper:
A novel, unified, end-to-end multi-task generative adversarial network (MTGAN) for small object detection is proposed, which can be combined with any existing detector
In MTGAN, a generator network produces super-resolved images, and a multi-task discriminator network simultaneously distinguishes real high-resolution images from fake ones, predicts object categories, and refines bounding boxes. More importantly, the classification and regression losses are back-propagated to guide the generator toward super-resolved images that are easier to classify and localize.
Finally, the effectiveness of MTGAN for object detection is demonstrated: detection performance improves substantially over several state-of-the-art detectors, mainly on small objects
3. Solution:
(A) The input image to the overall network
(B) A baseline detector separates object regions from the background in the input image by cropping (comparable to extracting ROIs with an RPN). The crops are used to train the generator and discriminator, or serve as the ROIs at test time
(C) Positive and negative samples produced by the detector
(D) The generator is a super-resolution network that produces super-resolved images from low-resolution ones
(E) The discriminator is a multi-task network. It takes the super-resolved images from the generator (and, during training, real high-resolution images) as input, judges whether each image is real or fake, classifies it, and regresses its bounding box (equivalent to adding classification and regression branches to the original discriminator, which introduces the detection task)
Because the gradients of this multi-task discriminator are passed back to the generator, the generator is pushed toward images that are high-resolution and easy to classify and localize
The discriminator has three branches: the real/fake branch ends in a sigmoid, the classification branch ends in a softmax, and the regression branch outputs (x, y, w, h)
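The three output branches can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the helper names (`linear`, `discriminator_heads`), the `params` layout, and the use of a single fully connected layer per branch are assumptions made for illustration, and the shared feature vector stands in for whatever the backbone of the discriminator produces.

```python
import math

def sigmoid(z):
    """Squash a single logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn a list of logits into class probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(features, weights, bias):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def discriminator_heads(features, params):
    """Apply three hypothetical heads to a shared feature vector.

    params maps "adv", "cls" and "reg" to (weights, bias) pairs.
    """
    real_prob = sigmoid(linear(features, *params["adv"])[0])  # real vs. fake
    class_probs = softmax(linear(features, *params["cls"]))   # category scores
    box = linear(features, *params["reg"])                    # (x, y, w, h)
    return real_prob, class_probs, box
```

All three heads read the same features, so a loss on any of them sends gradients through the shared layers and, for generated inputs, back to the generator.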
Generator and discriminator network structure: (x5 represents a residual block with five layers of convolution)
Overall objective function (this is only the high-level form; it is broken down into individual terms later):
I^{LR} denotes the low-resolution image
I^{HR} denotes the high-resolution image
u denotes the category label
v denotes the bounding-box regression target
θ denotes the discriminator network parameters
w denotes the generator network parameters
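With these symbols, the high-level objective has the shape of a GAN minimax game in which the discriminator also sees the labels u and v. The following is a reconstruction under that assumption, not a verbatim copy of the paper's equation. The distribution names p^{train} and p^{G} are introduced here for illustration.

```latex
\arg\min_{w}\,\max_{\theta}\;
\mathbb{E}_{(I^{HR},u,v)\sim p^{\mathrm{train}}(I^{HR},u,v)}
  \bigl[\log D_{\theta}(I^{HR},u,v)\bigr]
+\mathbb{E}_{(I^{LR},u,v)\sim p^{G}(I^{LR},u,v)}
  \bigl[\log\bigl(1-D_{\theta}(G_{w}(I^{LR},u,v))\bigr)\bigr]
```

The discriminator D_θ maximizes both terms, while the generator G_w minimizes the second one by making its outputs hard to tell apart from real high-resolution images.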
Objective function details:
(1) MSE loss: minimizing the pixel-wise MSE pulls the generated image toward the real image, but the result tends to be blurry
(2) Adversarial loss: added to improve the reconstruction of fine detail by training the generator to fool the discriminator
(3) Classification loss
The two terms represent the probability that the generated image belongs to category u and the probability that the real image belongs to category u.
(4) Regression loss: SR denotes the generated super-resolved image; when u_i = 0 the sample is background and has no regression target
The regression loss uses the smooth L1 loss
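The four terms can be sketched in plain Python as follows. This is an illustrative reading of the descriptions above, not the authors' code. The function names, the flattened-list image representation, the averaging over all samples, and the smooth L1 threshold of 1 (the usual Fast R-CNN form) are assumptions.

```python
import math

EPS = 1e-12  # guards against log(0)

def mse_loss(sr, hr):
    """Pixel-wise mean squared error between flattened SR and HR images."""
    return sum((s - h) ** 2 for s, h in zip(sr, hr)) / len(sr)

def adversarial_loss(d_fake):
    """Generator-side term: mean of log(1 - D(G(I_LR))) over samples."""
    return sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)

def classification_loss(p_sr, p_hr):
    """Negative log-probability of the true class u, for SR and HR inputs."""
    pairs = list(zip(p_sr, p_hr))
    return sum(-(math.log(max(a, EPS)) + math.log(max(b, EPS)))
               for a, b in pairs) / len(pairs)

def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def regression_loss(labels, targets, pred_hr, pred_sr):
    """Smooth L1 over (x, y, w, h); background samples (u == 0) are skipped."""
    total = 0.0
    for u, v, t_hr, t_sr in zip(labels, targets, pred_hr, pred_sr):
        if u == 0:
            continue  # background has no box to regress
        for j in range(4):
            total += smooth_l1(t_hr[j] - v[j]) + smooth_l1(t_sr[j] - v[j])
    return total / len(labels)
```

Both the classification and regression terms are evaluated on the real and the generated image, which is what lets the detection losses shape the generator's output.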
Overall objective function: α, β and γ are the weights that balance the different terms (α = 0.001, β = γ = 0.01)
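How the weights enter the generator-side loss can be shown with a small sketch. The text does not say which weight goes with which term, so the pairing below (α for the adversarial term, β for classification, γ for regression, MSE unweighted) is an assumption, and `total_loss` is a hypothetical helper name.

```python
def total_loss(l_mse, l_adv, l_cls, l_reg,
               alpha=0.001, beta=0.01, gamma=0.01):
    """Weighted sum of the four loss terms (assumed weight-to-term pairing)."""
    return l_mse + alpha * l_adv + beta * l_cls + gamma * l_reg
```

With these values the pixel-wise MSE dominates, and the adversarial and detection terms act as comparatively small corrections on top of it.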
4. Experiment:
Experiments are run on the COCO dataset
GAN training is unstable at the start. To avoid poor local optima, an MSE-based SR network is trained first and used to initialize the generator network.
COCO minival subset
Column 1: real low-resolution images
Column 2: real high-resolution images
Column 3: generated high-resolution images
Ablation experiment:
Comparison of SOTA detection models:
Red: model prediction
Green: ground truth
The authors conclude that there is still a lot of room for improvement...