Paper Explained: TransFG: A Transformer Architecture for Fine-grained Recognition
2022-08-11 06:32:00 【pontoon】
This article covers an application of the Transformer in the fine-grained recognition field.
Problem addressed: the Transformer had not yet been applied to fine-grained visual classification.
Contributions: 1. The Vision Transformer splits the input image into non-overlapping patches; this paper changes the split so that adjacent patches overlap (arguably just a trick). A minimal sketch of the overlapping split is given below.
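A minimal sketch of the overlapping split, assuming PyTorch; the function name and the patch/stride values below are illustrative placeholders, not necessarily the paper's exact settings (with stride equal to the patch size it reduces to the standard non-overlapping ViT split):

```python
import torch

def split_patches(images, patch_size=16, stride=12):
    """Split images into patches; stride < patch_size makes them overlap."""
    # unfold extracts sliding blocks: (B, C * patch_size**2, num_patches)
    patches = torch.nn.functional.unfold(
        images, kernel_size=patch_size, stride=stride)
    # transpose to (B, num_patches, C * patch_size**2): one flat patch per token
    return patches.transpose(1, 2)

imgs = torch.randn(2, 3, 224, 224)
print(split_patches(imgs, 16, 16).shape)  # non-overlapping: (2, 196, 768)
print(split_patches(imgs, 16, 12).shape)  # overlapping:     (2, 324, 768)
```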
2.Part Selection Module
In plain terms, the input of the last layer differs from that of the vanilla Vision Transformer: the attention weights of all layers before the last (shown in the red box of the architecture figure) are multiplied together, and the tokens with the largest resulting weights are selected and concatenated to form the input of the L-th layer.
First of all, the output of layer L-1 (the original input to the last layer) is: $z_{L-1} = [z_{L-1}^{0}; z_{L-1}^{1}, z_{L-1}^{2}, \dots, z_{L-1}^{N}]$
The attention weights of a previous layer $l$ are: $a_l = [a_l^{1}, a_l^{2}, \dots, a_l^{K}]$, where the subscript $l$ ranges over $1, 2, \dots, L-1$.
Assuming there are $K$ self-attention heads, the weight vector of head $i$ (the attention of the classification token towards the $N$ patch tokens) is: $a_l^{i} = [a_l^{i_0}; a_l^{i_1}, a_l^{i_2}, \dots, a_l^{i_N}]$, where the superscript $i$ ranges over $1, 2, \dots, K$.
The attention weights of all layers before the last are multiplied together: $a_{final} = \prod_{l=1}^{L-1} a_l$
Then, in each head, the token with the largest weight in $a_{final}$ is selected; the indices of these $K$ tokens are denoted $A_1, A_2, \dots, A_K$.
So after processing, the input of the last layer can be expressed as: $z_{local} = [z_{L-1}^{0}; z_{L-1}^{A_1}, z_{L-1}^{A_2}, \dots, z_{L-1}^{A_K}]$
From the model architecture figure, the tokens marked with arrows in the red box are the ones selected, i.e., the tokens with large weights after the multiplication; the blue box on the right shows the image patches corresponding to the selected tokens. A code sketch of this module follows.
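Below is a minimal PyTorch sketch of the Part Selection Module as described above. The function name `part_select` and the tensor layout (batch, heads, tokens) are assumptions for illustration, not the authors' code:

```python
import torch

def part_select(hidden, attn_weights):
    """hidden: (B, N+1, D) tokens entering the last layer (z_{L-1}),
    index 0 being the classification token.
    attn_weights: list of L-1 tensors of shape (B, K, N+1, N+1),
    the self-attention maps of all layers before the last."""
    # multiply the attention maps of all previous layers together (a_final)
    a_final = attn_weights[0]
    for a in attn_weights[1:]:
        a_final = torch.matmul(a, a_final)            # (B, K, N+1, N+1)
    # attention of the classification token towards the N patch tokens
    cls_attn = a_final[:, :, 0, 1:]                   # (B, K, N)
    # per head, the index A_1..A_K of the largest weight (+1 skips the cls token)
    idx = cls_attn.argmax(dim=-1) + 1                 # (B, K)
    batch_idx = torch.arange(idx.size(0)).unsqueeze(-1)
    selected = hidden[batch_idx, idx]                 # (B, K, D)
    # concatenate the classification token with the selected tokens: z_local
    return torch.cat([hidden[:, :1], selected], dim=1)  # (B, K+1, D)
```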
3.Contrastive loss
The author notes that in the fine-grained field the features of different categories are very similar, so the cross-entropy loss alone is not enough to learn discriminative features. On top of the cross-entropy loss, a contrastive loss is added, built on cosine similarity (a measure of how similar two vectors are: the more similar the two vectors, the larger their cosine similarity).
The purpose of this loss is to minimize the similarity between classification tokens of different categories and maximize the similarity between classification tokens of the same category. The contrastive loss is:
$$L_{con} = \frac{1}{N^2} \sum_{i=1}^{N} \bigg[ \sum_{j: y_i = y_j} \big(1 - \cos(z_i, z_j)\big) + \sum_{j: y_i \neq y_j} \max\big(\cos(z_i, z_j) - \alpha,\, 0\big) \bigg]$$
where $\alpha$ is a manually set margin constant and $z_i$ is the classification token of sample $i$.
So the overall loss function is: $L = L_{cross} + L_{con}$
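A minimal PyTorch sketch of the contrastive term and the overall loss, assuming the classification tokens of a batch are stacked in `z`; the margin value 0.4 is an assumed placeholder for $\alpha$, not necessarily the paper's setting:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z, labels, alpha=0.4):
    """z: (N, D) classification tokens; labels: (N,) class labels."""
    n = z.size(0)
    z = F.normalize(z, dim=-1)                 # unit vectors -> dot = cosine
    sim = z @ z.t()                            # (N, N) pairwise cosine similarity
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-class mask
    pos = (1.0 - sim)[same].sum()              # pull same-class tokens together
    neg = (sim - alpha).clamp(min=0.0)[~same].sum()  # push different classes apart
    return (pos + neg) / n ** 2

# overall loss: L = L_cross + L_con
logits = torch.randn(8, 200)                   # e.g. a 200-class output head
z = torch.randn(8, 768)                        # classification tokens
labels = torch.randint(0, 200, (8,))
loss = F.cross_entropy(logits, labels) + contrastive_loss(z, labels)
```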
Experiments: TransFG is compared with CNN- and ViT-based methods on several fine-grained classification datasets and achieves SOTA results.