What makes training multi-modal classification networks hard?
2022-08-10 03:32:00 【Rainylt】
Paper: What makes training multi-modal classification networks hard?
CVPR 2020
One-sentence summary: during multi-modal training, an overfitting indicator computed from the training set and the validation set is used to adjust the loss weights of the different modalities, which addresses the imbalance problem in multi-modal training.
Even after that one-sentence summary many questions remain, so this paper is worth writing up as a note.
Multimodal fusion methods
(1) Early Fusion
Concatenate (or otherwise fuse) the modalities at the raw-input stage.
(2) Mid Fusion
Extract features per modality, concatenate them, pass them through a fusion module, and then through the classification head.
(3) Late Fusion
Extract features per modality, concatenate them, and feed them directly to the classification head.
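The three fusion points can be sketched with toy fully connected layers. This is only an illustration under stated assumptions: the layer sizes, the helper names (`linear`, `init`, `early_fusion`, `mid_fusion`, `late_fusion`) and the single-layer encoders are hypothetical and are not the networks used in the paper.

```python
import random

def linear(x, weight):
    """Bias-free linear layer; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def relu(x):
    return [max(0.0, v) for v in x]

def init(rng, n_out, n_in):
    """Random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def early_fusion(rgb, audio, enc, head):
    # Fuse at the input: concatenate raw inputs, then one encoder and one head.
    return linear(relu(linear(rgb + audio, enc)), head)

def mid_fusion(rgb, audio, enc_rgb, enc_audio, fuse, head):
    # Per-modality encoders, concat, an extra fusion layer, then the head.
    feats = relu(linear(rgb, enc_rgb)) + relu(linear(audio, enc_audio))
    return linear(relu(linear(feats, fuse)), head)

def late_fusion(rgb, audio, enc_rgb, enc_audio, head):
    # Per-modality encoders, concat, then the head directly.
    feats = relu(linear(rgb, enc_rgb)) + relu(linear(audio, enc_audio))
    return linear(feats, head)

rng = random.Random(0)
RGB_DIM, AUDIO_DIM, FEAT, CLASSES = 6, 4, 3, 5  # hypothetical sizes
rgb = [rng.random() for _ in range(RGB_DIM)]
audio = [rng.random() for _ in range(AUDIO_DIM)]

enc_rgb = init(rng, FEAT, RGB_DIM)
enc_audio = init(rng, FEAT, AUDIO_DIM)

early_logits = early_fusion(
    rgb, audio, init(rng, FEAT, RGB_DIM + AUDIO_DIM), init(rng, CLASSES, FEAT)
)
mid_logits = mid_fusion(
    rgb, audio, enc_rgb, enc_audio, init(rng, FEAT, 2 * FEAT), init(rng, CLASSES, FEAT)
)
late_logits = late_fusion(rgb, audio, enc_rgb, enc_audio, init(rng, CLASSES, 2 * FEAT))
```

The only structural difference between the mid and late variants here is the extra `fuse` layer between the concatenation and the classification head.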
What is the multimodal training imbalance problem?
The starting point is the authors' observation that, on video classification, the multi-modal model performs worse than the single-modal model.
In the paper's comparison (the figure is not reproduced here), A stands for audio and OF for optical flow.
The models being compared are similar. For example, A+RGB adds an audio encoder to the single-RGB model, concatenates the two feature vectors, and classifies them with the classifier. The single-RGB model simply passes RGB through its encoder and then through the classifier.
It seems that no transformer is added after the concat for fusion. Maybe a fusion module could solve this problem to some extent?
Why?
Two findings:
(1) The multi-modal model has higher training-set accuracy but lower validation-set accuracy.
(2) The Late Fusion model has almost twice as many parameters as the single-modal model.
=> The suspected problem is overfitting.
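The "almost twice as many parameters" observation follows from simple counting. Here is a rough sketch, assuming both modalities use encoders of the same size and a single fully connected classification head; the encoder size, feature dimension and class count below are made-up numbers, not figures from the paper.

```python
def linear_params(n_in, n_out):
    """Parameters of one fully connected layer (weights plus biases)."""
    return n_in * n_out + n_out

def single_modal_params(enc_params, feat_dim, num_classes):
    """One encoder followed by one classification head."""
    return enc_params + linear_params(feat_dim, num_classes)

def late_fusion_params(enc_params_per_modality, feat_dim, num_classes):
    """One encoder per modality, then a head over the concatenated features."""
    fused_dim = feat_dim * len(enc_params_per_modality)
    return sum(enc_params_per_modality) + linear_params(fused_dim, num_classes)

ENC = 30_000_000           # hypothetical encoder size
FEAT, CLASSES = 2048, 400  # hypothetical feature dimension and class count

single = single_modal_params(ENC, FEAT, CLASSES)
late = late_fusion_params([ENC, ENC], FEAT, CLASSES)
ratio = late / single
print(f"single={single:,} late={late:,} ratio={ratio:.3f}")
```

With the encoders dominating the count, the ratio lands just under 2 regardless of the exact head size.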
How to solve it?
First try the conventional remedies for overfitting:
1. Dropout
2. Pre-training
3. Early stopping
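As a reminder of what early stopping does, here is a minimal patience-based sketch. The function name `early_stopping_epoch`, the `patience` parameter and the validation curve are all hypothetical; the paper's exact stopping rule is not described in these notes.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the best epoch seen before training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop here
    return best_epoch

# Made-up validation curve with a temporary plateau before a later improvement.
val_curve = [1.00, 0.80, 0.70, 0.72, 0.75, 0.74, 0.60]
short_patience = early_stopping_epoch(val_curve, patience=3)
long_patience = early_stopping_epoch(val_curve, patience=5)
```

On this made-up curve a short patience stops on the plateau and misses the later improvement, which is one way a run can end up stopping too early.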
Then try the mid-fusion approaches:
1. Concatenate (convolutions extract the features, more convolutions follow the concat, and finally the classification head)
2. SE-style gating
3. Non-Local-style gating (that is, transformer-like attention)
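For intuition, SE-style gating rescales each feature channel by a value in (0, 1). The sketch below assumes the gate for one modality's channels is computed from a pooled "context" vector through a small bottleneck. How the paper actually wires the gate between modalities is not stated in these notes, so the `context` argument, the function names and the weights are all hypothetical.

```python
import math

def linear(x, weight):
    """Bias-free linear layer; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def se_gate(context, w_reduce, w_expand):
    """Squeeze-and-excitation style gate: bottleneck, ReLU, expand, sigmoid."""
    hidden = [max(0.0, h) for h in linear(context, w_reduce)]
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w_expand)]

def gated_features(features, context, w_reduce, w_expand):
    """Scale each channel of `features` by its gate value."""
    gate = se_gate(context, w_reduce, w_expand)
    return [f * g for f, g in zip(features, gate)]

# Toy example: 4 channels, bottleneck of size 2 (all values made up).
features = [0.5, 1.0, -0.3, 2.0]
context = [0.2, 0.1, 0.4, 0.3]
w_reduce = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
w_expand = [[1.0, -1.0], [0.5, 0.5], [-0.8, 0.3], [0.2, 0.9]]

gate = se_gate(context, w_reduce, w_expand)
out = gated_features(features, context, w_reduce, w_expand)
```

The gate has one value per channel, so it cannot reweight individual spatial, temporal or frequency positions.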
Comments:
1. Early stopping does not work at all; maybe it stops too early?
2. Pre-training is better than late-concat without pre-training, but it still cannot catch up with single-modal RGB; or maybe the pre-training itself was not very good?
3. Mid-concat gives some improvement, and it even does better than Non-Local, which I did not expect. Is it because Non-Local is attached to only one layer? Are there more convs after the mid-concat?
4. Dropout has some effect, since there is indeed overfitting; but was dropout not being used before?
5. SE is channel-level attention only, with no attention over space, time, or frequency, so its poor effect is understandable; adding it is about the same as not adding it.
This part requires looking at the details in the paper.
In other words, mid-concat is better than late-concat and dropout also helps, but neither improvement is large.
That dropout helps shows the network really is overfitting. With dropout the model finally exceeds the single-modal one, and mid-concat can exceed it too, so fusing earlier still seems necessary.
The method proposed in this paper
To address overfitting, the paper first proposes an indicator that measures the degree of overfitting:
Here * denotes the true (real-world) setting, which is approximated with the validation set.
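The indicator's formula did not survive in these notes. A minimal sketch follows, assuming the indicator is the overfitting-to-generalization ratio (OGR) as I understand it from the paper. Overfitting at checkpoint N is taken as O_N = L_val(N) - L_train(N). Between checkpoints N and N+n, the ratio is |O_{N+n} - O_N| divided by the drop in validation loss, with the validation loss standing in for the true loss L*. The function names are hypothetical and the loss values are made up.

```python
def overfitting(train_loss, val_loss):
    """Gap between validation loss (a proxy for the true loss L*) and training loss."""
    return val_loss - train_loss

def ogr(train_n, val_n, train_n_plus, val_n_plus):
    """Overfitting-to-generalization ratio between two checkpoints.

    delta_o: how much the overfitting gap grew.
    delta_g: how much the validation loss improved.
    """
    delta_o = overfitting(train_n_plus, val_n_plus) - overfitting(train_n, val_n)
    delta_g = val_n - val_n_plus
    if delta_g == 0:
        raise ValueError("validation loss did not change; OGR is undefined")
    return abs(delta_o / delta_g)

# Made-up losses: (train at N, val at N, train at N+n, val at N+n).
healthy = ogr(1.0, 1.2, 0.6, 1.0)      # gap grows about as fast as val improves
overfit = ogr(1.0, 1.2, 0.2, 1.1)      # train loss collapses, val barely moves
```

A larger value means that more of the training progress went into memorizing the training set rather than into improving validation loss.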
At the same time, building on Late Fusion, there are 3 branches, each with its own classification head:
The middle branch takes the concatenated features and adds a classification head; the two side branches are the per-modality classification heads. Each loss has a weight, and the weights are optimized using the indicator above:
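The three-head setup amounts to a weighted sum of per-head classification losses. Here is a small sketch, assuming softmax cross-entropy for every head; the head names (`rgb`, `audio`, `fused`), the logits and the weights are illustrative values only.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def blended_loss(head_logits, label, weights):
    """Weighted sum of the per-head losses: sum_k w_k * L_k."""
    return sum(
        weights[name] * cross_entropy(logits, label)
        for name, logits in head_logits.items()
    )

# Made-up logits for one example whose true class is 1.
head_logits = {
    "rgb": [0.2, 1.5, -0.3],
    "audio": [0.1, 0.4, 0.0],
    "fused": [0.0, 2.0, -1.0],
}
weights = {"rgb": 0.5, "audio": 0.2, "fused": 0.3}  # hypothetical weights
loss = blended_loss(head_logits, label=1, weights=weights)
```

Setting a head's weight to zero removes its loss from the sum, so a head with a small weight contributes little to the gradient.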
This finally gives the weight formula:
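The formula image itself is missing from these notes. As a hedged sketch, my reading of the paper's practical rule is that each head's weight is proportional to G_k / O_k^2, where G_k is that head's gain in validation loss and O_k is the growth of its overfitting gap over the same interval, with the weights normalized to sum to 1. Treat this as an assumption to check against the paper. The function name `estimate_loss_weights` and the numbers are made up.

```python
def estimate_loss_weights(delta_g, delta_o):
    """Weights proportional to G_k / O_k**2, normalized to sum to 1.

    delta_g[k]: improvement in validation loss for head k.
    delta_o[k]: growth of the overfitting gap (val loss minus train loss) for head k.
    This rule is an assumption based on my reading of the paper.
    """
    raw = {}
    for name in delta_g:
        if delta_o[name] == 0:
            raise ValueError(f"overfitting change for {name!r} is zero")
        raw[name] = delta_g[name] / (delta_o[name] ** 2)
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

# Made-up measurements for the three heads.
delta_g = {"rgb": 0.4, "audio": 0.1, "fused": 0.3}
delta_o = {"rgb": 0.2, "audio": 0.1, "fused": 0.4}
loss_weights = estimate_loss_weights(delta_g, delta_o)
```

Under this rule a head that generalizes well while overfitting little receives a large weight, and a head whose overfitting grows quickly is down-weighted quadratically.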
I did not fully understand the derivation, but in short, the authors assign a weight to each loss. So how is the final test (inference) done?