Detailed explanation of VIT transformer
2022-08-09 20:44:00 【The romance of cherry blossoms】
1. ViT overall structure
Build a patch sequence from the image data
Divide the image into windows (patches), for example 9 of them, and flatten each window into a vector. For example, a window of size 10*10*3 is flattened into a 300-dimensional vector.
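The patch-building step can be sketched in NumPy as follows. This is a minimal illustration, not the original ViT code: the helper name `image_to_patches` and the 30*30*3 input size (chosen so that a 3*3 grid of 10*10*3 windows results) are assumptions made for the example.

```python
import numpy as np

def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patches and flatten each one."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    grid = image.reshape(rows, patch_size, cols, patch_size, c).transpose(0, 2, 1, 3, 4)
    # One flattened vector per patch, ordered left-to-right, top-to-bottom.
    return grid.reshape(rows * cols, patch_size * patch_size * c)

# Hypothetical 30x30 RGB image, so that 10x10 windows give a 3x3 grid.
image = np.arange(30 * 30 * 3, dtype=np.float32).reshape(30, 30, 3)
patches = image_to_patches(image, patch_size=10)
print(patches.shape)  # (9, 300)
```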
Positional encoding:
There are two ways to encode position. The first is one-dimensional encoding: the windows are numbered 1, 2, 3, 4, 5, 6, 7, 8, 9 in order. The second is two-dimensional encoding, which gives the (row, column) coordinates of each image window.
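For a 3*3 grid of windows, the two indexing schemes look like this. The function name `position_codes` is made up for the illustration, and the 2D coordinates are counted from 0 here, which is an assumption (the text does not say where the coordinates start).

```python
def position_codes(rows: int, cols: int):
    """Return the 1D index and the 2D (row, col) coordinate of every window."""
    # 1D: number the windows 1..rows*cols in reading order.
    one_d = list(range(1, rows * cols + 1))
    # 2D: keep the row and the column of each window separately.
    two_d = [(r, c) for r in range(rows) for c in range(cols)]
    return one_d, two_d

one_d, two_d = position_codes(3, 3)
print(one_d)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(two_d[4])  # (1, 1), the centre window
```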
Finally, a fully connected layer maps the image encoding and the positional encoding to an encoding that is easier for the model to compute with.
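A rough sketch of this mapping is shown below. It assumes the common ViT arrangement in which each flattened patch goes through a linear (fully connected) layer and a learned position embedding is then added. The embedding width of 64 and the random weights are placeholders, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, patch_dim, embed_dim = 9, 300, 64  # embed_dim is an arbitrary choice

# Stand-in for the 9 flattened 10*10*3 windows.
flat_patches = rng.standard_normal((num_patches, patch_dim))

# Fully connected layer: weight and bias would be learned during training.
weight = rng.standard_normal((patch_dim, embed_dim)) * 0.02
bias = np.zeros(embed_dim)

# One learned position vector per window (1D positional encoding).
pos_embed = rng.standard_normal((num_patches, embed_dim)) * 0.02

# Project each patch, then add its position vector.
tokens = flat_patches @ weight + bias + pos_embed
print(tokens.shape)  # (9, 64)
```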
So, what does the 0 code (patch 0) in the architecture diagram do?
The 0 code is the class token. It is generally added for image classification; image segmentation and object detection generally do not need it. It is mainly used for feature integration: it aggregates the feature vectors of all windows. For that reason, it can be inserted at any position in the sequence.
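The sketch below shows why the extra token can act as an aggregator. It prepends a class token to the patch tokens and runs a single, simplified self-attention step. Using the tokens directly as queries, keys and values (no learned projections, one head) is a simplification made for the example; the sizes and random values are placeholders.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
embed_dim = 64
patch_tokens = rng.standard_normal((9, embed_dim)) * 0.1  # stand-in for 9 encoded windows
cls_token = np.zeros(embed_dim)                           # learned in a real model

# Prepend the class token: the sequence grows from 9 to 10 tokens.
seq = np.concatenate([cls_token[None, :], patch_tokens], axis=0)

# Simplified self-attention: every token attends to every token.
weights = softmax(seq @ seq.T / np.sqrt(embed_dim))
out = weights @ seq

# Row 0 is the class token: a weighted mix of all windows,
# which is what a classification head would read.
cls_out = out[0]
print(seq.shape, cls_out.shape)  # (10, 64) (64,)
```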
2. Detailed explanation of the formula
3. The receptive field of multi-head attention
As shown in the figure, the vertical axis is the attention distance, which plays the same role as the receptive field of a convolution. With only one head, the receptive field can be either fairly small or fairly large. As the number of heads increases, the receptive field is generally fairly large, which shows that the Transformer extracts global features.
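One plausible way to compute an attention distance for a single head is to average the pixel distance between patches, weighted by the attention weights. The function below is an illustrative sketch under that assumption; the name `mean_attention_distance` and the 3*3 grid with 14-pixel patches are chosen only for the example.

```python
import numpy as np

def mean_attention_distance(attn: np.ndarray, grid: int, patch_size: int) -> float:
    """attn: (N, N) attention weights over N = grid*grid patches, rows summing to 1."""
    # Pixel coordinates of each patch position, in reading order.
    coords = np.array([(r, c) for r in range(grid) for c in range(grid)], dtype=float)
    coords *= patch_size
    # Pairwise Euclidean distance between patch positions, shape (N, N).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Expected distance per query patch, averaged over all query patches.
    return float((attn * dist).sum(axis=1).mean())

n = 9
local_head = np.eye(n)                  # each patch attends only to itself
global_head = np.full((n, n), 1.0 / n)  # each patch attends to all patches equally

d_local = mean_attention_distance(local_head, grid=3, patch_size=14)
d_global = mean_attention_distance(global_head, grid=3, patch_size=14)
print(d_local, d_global)
```

A head that looks only at nearby patches gets a small value (a small receptive field), while a head that spreads its attention over the whole image gets a large one.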
4. Position coding
Conclusion: positional encoding helps, but the choice of encoding makes little difference, so simply use the simplest one. 2D encoding (computing the row and column encodings separately and then summing them) is still no better than 1D, and adding a shared positional encoding to every layer does not help much either.
Of course, this is a classification task, where positional encoding may not matter much.
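The two variants compared above can be sketched as follows. The 2D version follows the description in the text (separate row and column encodings that are summed). The sizes and the random initialisation are placeholders for what would be learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols, dim = 3, 3, 8  # 3*3 windows, arbitrary embedding width

# 1D: one independent vector per window position.
pos_1d = rng.standard_normal((rows * cols, dim))

# 2D: one vector per row and one per column, summed for each window.
row_embed = rng.standard_normal((rows, dim))
col_embed = rng.standard_normal((cols, dim))
pos_2d = (row_embed[:, None, :] + col_embed[None, :, :]).reshape(rows * cols, dim)

print(pos_1d.shape, pos_2d.shape)  # (9, 8) (9, 8)
```

Both variants produce one vector per window, so they can be swapped without changing the rest of the model.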
5. Experimental results (/14 indicates the side length of the patch)

6.TNT: Transformer in Transformer
ViT only models the patches and ignores the finer details inside each patch.
The outer transformer divides the original image into windows and generates a feature vector for each window through image encoding and positional encoding.
The inner transformer further reorganizes each window of the outer transformer into multiple superpixels and recombines them into a new vector. For example, the outer transformer splits the image into 16*16*3 windows, and the inner transformer splits each window again into 4*4 superpixels, so the small window has size 4*4*48 and each superpixel integrates information from multiple channels. A fully connected layer then changes the size of the new vector so that the inner combined vector has the same size as the patch encoding, and the inner vector and the outer vector are added together.
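The shape bookkeeping of this step can be sketched as below. It only shows the reorganization, the fully connected layer and the final addition; the inner transformer blocks themselves are left out. The outer embedding width of 384 and the random weights are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.standard_normal((16, 16, 3))  # one window of the outer transformer

# Cut the window into a 4x4 grid of 4x4x3 sub-windows, then flatten each
# sub-window into a 48-dimensional superpixel: 16*16*3 -> 4*4*48.
sub = patch.reshape(4, 4, 4, 4, 3).transpose(0, 2, 1, 3, 4)
superpixels = sub.reshape(4, 4, 48)

# (The inner transformer would process the 16 superpixels here; omitted.)

# Fully connected layer: map the 768 inner values to the outer patch size.
outer_dim = 384  # hypothetical outer embedding width
fc_weight = rng.standard_normal((4 * 4 * 48, outer_dim)) * 0.02
inner_vector = superpixels.reshape(-1) @ fc_weight

# Add the inner vector to the outer patch encoding of the same window.
outer_vector = rng.standard_normal(outer_dim)  # stand-in for the outer encoding
combined = outer_vector + inner_vector
print(superpixels.shape, combined.shape)  # (4, 4, 48) (384,)
```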
Visualization of TNT's PatchEmbedding
The blue dots are the features extracted by TNT. The visualization shows that these features are more spread out and have larger variance, which makes them easier to separate: the features are more distinctive and their distribution is more diverse.
Experimental Results
Adding positional encoding to both the inner and the outer transformer gives the best results.