Understanding the Swin Transformer network architecture and its improved modules
2022-04-23 00:01:00 【liiiiiiiiiiiiike】
Swin-Transformer
Transformers are getting more and more popular. Personally, I find it amazing how well the matrix operations behind Transformer work for vision!!

How Swin-Transformer improves on ViT:
- SwinT builds hierarchical feature maps, in the same way a CNN does. This makes the backbone a convenient base for detection and segmentation tasks. ViT instead downsamples by 16x in a single step, and all later feature maps keep that same downsampling rate.
- SwinT introduces Windows Multi-head Self-Attention (W-MSA). At the 4x and 8x downsampling levels in the figure above, the feature map is divided into multiple non-overlapping regions (windows), and Multi-head Self-Attention is computed only inside each window, whereas ViT applies Multi-head Self-Attention globally over the whole feature map. The purpose is to reduce the amount of computation. W-MSA does save computation, but at the cost of information exchange between different windows. To address this shortcoming, the authors propose Shifted Windows Multi-head Self-Attention (SW-MSA), which lets information pass between adjacent windows!
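To make the saving concrete, here is a rough sketch of the complexity formulas given in the Swin Transformer paper (they are not stated in this post): global MSA costs 4hwC^2 + 2(hw)^2C, while W-MSA with window size M costs 4hwC^2 + 2M^2hwC. The function names and the sample sizes (a 56 x 56 feature map, C = 96, M = 7) are assumptions chosen for illustration.

```python
def msa_flops(h, w, c):
    """Approximate cost of global multi-head self-attention on an h x w x c map."""
    hw = h * w
    return 4 * hw * c * c + 2 * hw * hw * c


def wmsa_flops(h, w, c, m):
    """Approximate cost when attention is restricted to m x m windows."""
    hw = h * w
    return 4 * hw * c * c + 2 * m * m * hw * c


# Assumed example sizes: 56 x 56 feature map, 96 channels, 7 x 7 windows.
global_cost = msa_flops(56, 56, 96)
window_cost = wmsa_flops(56, 56, 96, 7)
print(global_cost, window_cost, round(global_cost / window_cost, 1))
```

The quadratic term in hw is what makes global attention expensive on large feature maps. With windows, that term becomes linear in hw for a fixed M.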
The network architecture of SwinT

- First, the image (H * W * C_in) is fed into the Patch Partition module, which splits it into blocks. This is a 4x downsampling: width and height are divided by 4 and the channels are multiplied by 16. The result then passes through the Linear Embedding layer, which is also implemented with a conv. Its main job is to project the channels to the embedding dimension C: (H/4, W/4, 16*C_in) -> (H/4, W/4, C). Note that this does not necessarily reduce the channel count: for an RGB input in Swin-T it maps 48 channels to 96.
- Then four stages build feature maps of different sizes. Stage 1 starts with the Linear Embedding layer, while the other three stages each start with a Patch Merging layer for downsampling. After that, every stage stacks repeated SwinT blocks. As (b) shows, there are two kinds of SwinT block, one using W-MSA and one using SW-MSA. Because the two are always used in pairs, the number of stacked blocks in a stage is always even.
- Finally, for a classification network, there is a Layer Norm layer, a global pooling layer, and an FC layer to produce the final output.
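The stage-by-stage shapes described above can be sketched as follows. This is only shape arithmetic, not a model. The function name, the 224 x 224 input, and the embedding dimension of 96 (the Swin-T value) are assumptions for illustration.

```python
def swin_stage_shapes(height, width, embed_dim=96, num_stages=4, patch_size=4):
    """Return the (H, W, C) feature-map shape produced by each stage."""
    # Stage 1: Patch Partition + Linear Embedding -> (H/4, W/4, C).
    h, w, c = height // patch_size, width // patch_size, embed_dim
    shapes = [(h, w, c)]
    # Stages 2..4: Patch Merging halves H and W and doubles the channels.
    for _ in range(num_stages - 1):
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((h, w, c))
    return shapes


for stage, shape in enumerate(swin_stage_shapes(224, 224), start=1):
    print(f"stage {stage}: {shape}")
```

For the assumed input this gives downsampling rates of 4x, 8x, 16x, and 32x, which is the CNN-like pyramid that makes the backbone easy to plug into detection and segmentation heads.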
Next, we explain Patch Merging, W-MSA, SW-MSA, and the relative position bias in detail. The MLP structure used in the SwinT block is the same as the one in ViT.
* Patch Merging in detail
Patch Merging is similar to the Focus structure in YOLOv5. Pixels are sampled with a stride of 2, so the pixels at the same position within each 2x2 neighborhood form one patch. The four resulting patches are concatenated along the depth direction, so width and height are divided by 2 and C is multiplied by 4. The result then passes through a Layer Norm layer, and finally an FC layer applies a linear map along the depth direction of the feature map: (H/2, W/2, C*4) -> (H/2, W/2, C*2).
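A minimal NumPy sketch of the steps just described, assuming an (H, W, C) layout with even H and W. The function name `patch_merging`, the random weight matrix standing in for the FC layer, and the Layer Norm without learnable scale and shift are simplifications for illustration, not the reference implementation.

```python
import numpy as np


def patch_merging(x, weight, eps=1e-5):
    """x: (H, W, C) feature map; weight: (4C, 2C) stand-in for the FC layer."""
    # Sample every other pixel at the four offsets of each 2x2 neighborhood.
    x0 = x[0::2, 0::2, :]
    x1 = x[1::2, 0::2, :]
    x2 = x[0::2, 1::2, :]
    x3 = x[1::2, 1::2, :]
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)  # (H/2, W/2, 4C)
    # Layer Norm over the channel dimension (no learnable parameters here).
    mean = merged.mean(axis=-1, keepdims=True)
    var = merged.var(axis=-1, keepdims=True)
    normed = (merged - mean) / np.sqrt(var + eps)
    # Linear map along depth: 4C -> 2C.
    return normed @ weight  # (H/2, W/2, 2C)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
weight = rng.standard_normal((12, 6))
out = patch_merging(x, weight)
print(out.shape)
```

No pixel is discarded in the merge: the spatial resolution halves, but the information moves into the channel dimension before the FC layer compresses it.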
* W-MSA in detail
The Windows Multi-head Self-Attention module is introduced to reduce the amount of computation. The idea is to divide the feature map into multiple windows, each containing many patches (pixels), and to let every patch take part in Multi-head Self-Attention only within its own window. Note: in W-MSA there is no information exchange between windows.
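The window split can be sketched with a reshape and a transpose. This is an illustrative NumPy version, assuming an (H, W, C) feature map whose height and width are multiples of the window size. The name `window_partition` is chosen here for the example.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) map into (num_windows, M*M, C) non-overlapping windows."""
    h, w, c = x.shape
    m = window_size
    # (H/M, M, W/M, M, C): separate the window index from the in-window offset.
    x = x.reshape(h // m, m, w // m, m, c)
    # Bring the two window indices together, then flatten each window.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)


x = np.arange(8 * 8).reshape(8, 8, 1)
windows = window_partition(x, 4)
print(windows.shape)
```

Self-attention is then applied to each of the `num_windows` groups independently, with each group treated as a short sequence of M*M tokens.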
* SW-MSA in detail
SW-MSA is proposed to fix the lack of information exchange between windows in W-MSA.

As shown in the figure above, W-MSA on the left is used in layer L and SW-MSA is used in layer L+1, since the SwinT blocks show that the two are used in pairs. Comparing the left and right figures, the windows have been shifted by M/2 pixels, where M is the window size. This solves the problem that information cannot be exchanged between different windows!!
The number of windows goes from 4 before the shift to 9 now!
To avoid computing attention on 9 windows, the authors use the method called "Efficient batch computation for shifted configuration": the feature map is cyclically shifted so that the windows in the right figure are regrouped into 4 windows! One problem is that different regions carry different information, and forcing them into one window would mix information that should stay separate. The authors' solution is a mask: when two pixels in the same new window do not come from the same original region, 100 is subtracted from their QK attention score. After softmax, the attention weight between these pixels is then practically 0. **Note:** after the computation, the data has to be shifted back to its original position.
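The cyclic shift and the mask can be sketched in NumPy as below. This is a simplified illustration under assumed sizes (an 8 x 8 map, window size 4, shift 2), not the authors' code. The function name `shifted_window_mask` is made up for the example, and the mask is written as a value added to the attention scores, so -100 here corresponds to the "subtract 100" above.

```python
import numpy as np


def shifted_window_mask(height, width, window_size, shift):
    """Return a (num_windows, M*M, M*M) mask: 0 within a region, -100 across regions."""
    m, s = window_size, shift
    # Label the regions on the already-shifted map: rows/cols that wrapped
    # around during the cyclic shift get labels different from their neighbors.
    region = np.zeros((height, width), dtype=np.int64)
    bands = (slice(0, -m), slice(-m, -s), slice(-s, None))
    label = 0
    for rows in bands:
        for cols in bands:
            region[rows, cols] = label
            label += 1
    # Split the label map into windows, one row of M*M labels per window.
    windows = region.reshape(height // m, m, width // m, m)
    windows = windows.transpose(0, 2, 1, 3).reshape(-1, m * m)
    # Pixel pairs with different labels come from different original regions.
    different = windows[:, :, None] != windows[:, None, :]
    return np.where(different, -100.0, 0.0)


feature = np.arange(64).reshape(8, 8)
shifted = np.roll(feature, shift=(-2, -2), axis=(0, 1))   # cyclic shift
mask = shifted_window_mask(8, 8, window_size=4, shift=2)  # add to QK scores
restored = np.roll(shifted, shift=(2, 2), axis=(0, 1))    # shift back afterwards
print(mask.shape, np.array_equal(restored, feature))
```

In this example the first window contains a single region, so its mask is all zeros, while the last window contains pieces of four different regions and gets the most masking.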
Model parameter configuration in detail

- win.sz 7 * 7 indicates the window size
- dim indicates the channel depth of the feature map (in other words, the vector length of each token)
- head indicates the number of heads in the multi-head attention module
Copyright notice
This article was written by [liiiiiiiiiiiiike]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204222358593552.html