Paper explained - TransFG: A Transformer Architecture for Fine-Grained Recognition
2022-08-11 06:32:00 【pontoon】
This article applies transformers to fine-grained visual recognition.
Problem: transformers had not yet been applied to fine-grained image recognition.
Contributions: 1. ViT splits the input image into non-overlapping patches; the paper changes this to overlapping patches (which only counts as a trick).
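The overlapping split can be sketched as follows. This is a minimal numpy sketch, not the paper's code; the function name, image size, and patch size are illustrative. Setting the stride equal to the patch size recovers ViT's non-overlapping split, while a smaller stride gives the overlapping variant:

```python
import numpy as np

def split_overlapping_patches(img, patch_size, stride):
    """Slide a patch_size x patch_size window over the image with the given
    stride. stride == patch_size gives ViT's non-overlapping split;
    stride < patch_size gives TransFG's overlapping split."""
    h, w = img.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            patches.append(img[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)

img = np.arange(16 * 16).reshape(16, 16)
# Non-overlapping (ViT): (16 / 8)^2 = 4 patches.
vit_patches = split_overlapping_patches(img, patch_size=8, stride=8)
# Overlapping (TransFG): ((16 - 8) / 4 + 1)^2 = 9 patches.
overlap_patches = split_overlapping_patches(img, patch_size=8, stride=4)
```

With overlap, neighboring patches share pixels, so a discriminative region split across a patch boundary in ViT can still appear intact inside some patch.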
2. Part Selection Module
In plain terms, the input to the last layer differs from vanilla ViT: the attention weights of all layers before the last one (shown in the red box) are multiplied together, and the tokens with the largest resulting weights are selected and concatenated to form the input of layer L.
First of all, the output of layer L-1 is:
Z_{L-1} = [z_{L-1}^0; z_{L-1}^1; ...; z_{L-1}^N]
The attention weight of an earlier layer l is:
a_l = [a_l^0, a_l^1, ..., a_l^{K-1}]
where the subscript l ranges over (1, 2, ..., L-1).
Assuming there are K self-attention heads, the weight in each head i is:
a_l^i = [a_l^{i_0}, a_l^{i_1}, ..., a_l^{i_N}]
where the superscript i ranges over (0, 1, ..., K-1).
The weights of all layers before the last are multiplied together:
a_final = a_{L-1} · a_{L-2} · ... · a_1
Then, in each head, the token with the largest weight (positions A_1, A_2, ..., A_K) is selected as input to the last layer.
So after this processing, the input to layer L can be expressed as:
z_local = [z_{L-1}^{cls}; z_{L-1}^{A_1}; z_{L-1}^{A_2}; ...; z_{L-1}^{A_K}]
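The selection procedure can be sketched in numpy. This is an illustrative sketch under assumed shapes (random matrices stand in for learned attention maps); the layer count, head count, and hidden dimension are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, N = 4, 3, 8          # layers before the last, heads, patch tokens
T = N + 1                  # +1 for the classification (CLS) token

# Per-layer, per-head attention maps a_l of shape (T, T),
# rows normalized to sum to 1 like softmax outputs.
att = rng.random((L, K, T, T))
att = att / att.sum(-1, keepdims=True)

# Multiply the attention maps of all layers before the last, head by head.
a_final = att[0]
for l in range(1, L):
    a_final = np.matmul(att[l], a_final)   # (K, T, T)

# In each head, pick the patch token the CLS token attends to most.
cls_to_patches = a_final[:, 0, 1:]         # (K, N)
selected = cls_to_patches.argmax(-1) + 1   # token indices A_1 .. A_K

# Concatenate CLS with the selected tokens as the input to the last layer.
z = rng.random((T, 16))                    # hidden states z_{L-1}, dim 16
z_local = np.concatenate([z[:1], z[selected]], axis=0)
print(z_local.shape)                       # (K + 1, 16) -> (4, 16)
```

The product of row-stochastic attention matrices stays row-stochastic, so the CLS row of a_final can still be read as a distribution over tokens.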
From the model architecture diagram, the tokens marked with arrows in the red box are the selected ones, i.e., those with large weights after the multiplication. The blue boxes on the right show the image patches corresponding to the selected tokens.
3. Contrastive loss
The author notes that in fine-grained tasks the features of different categories are very similar, so cross-entropy loss alone is not enough to learn discriminative features. A contrastive loss is therefore added on top of cross-entropy; it uses cosine similarity, a measure of how similar two vectors are: the more similar the vectors, the larger the cosine similarity.
The purpose of this loss is to minimize the similarity of the classification tokens of different categories and to maximize the similarity of the classification tokens of the same category. The formula of the contrastive loss is as follows:
L_con = (1/N^2) Σ_i [ Σ_{j: y_i = y_j} (1 − sim(z_i, z_j)) + Σ_{j: y_i ≠ y_j} max(sim(z_i, z_j) − α, 0) ]
where sim(·, ·) is cosine similarity and α is a manually set margin constant.
So the overall loss function is:
L = L_cross + L_con
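A minimal numpy sketch of this contrastive term follows; the exact batch normalization details in the paper may differ slightly, and the toy vectors and margin value are illustrative. Same-class pairs are pulled together (cost 1 − sim), and different-class pairs are pushed apart only once their similarity exceeds the margin α:

```python
import numpy as np

def contrastive_loss(z, y, alpha=0.4):
    """Pull CLS tokens of the same class together; push different classes
    apart once their cosine similarity exceeds the margin alpha."""
    n = len(z)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T                          # pairwise cosine similarities
    same = y[:, None] == y[None, :]
    loss = np.where(same, 1.0 - sim, np.maximum(sim - alpha, 0.0))
    return loss.sum() / n**2

# Two near-identical class-0 tokens and one orthogonal class-1 token:
z = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
y = np.array([0, 0, 1])
print(round(contrastive_loss(z, y), 4))    # 0.0011
```

The small positive value comes entirely from the slight misalignment of the two class-0 tokens; the cross-class pairs already sit below the margin, so they contribute nothing.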
Experiments: compared with CNN and ViT baselines on several fine-grained classification datasets, TransFG achieves state-of-the-art (SOTA) results.