A general U-shaped transformer for image restoration
2022-04-23 06:00:00 【umbrellalalalala】
The paper was submitted to arXiv on June 6, 2021. Eformer (ICCV 2021) is an improvement built on top of Uformer, so the paper seems worth reading; here is a brief record.
Also published simultaneously on my Zhihu account of the same name.
1. Architecture design
The overall structure is shown in the figure from the paper.
Compared with an ordinary UNet, the difference lies in the LeWin Transformer block, which is also the main innovation of this work.
The so-called LeWin Transformer (locally-enhanced window Transformer) consists of W-MSA and LeFF (their composition is written out below):
- W-MSA: non-overlapping window-based multi-head self-attention, whose purpose is to reduce computational overhead (a traditional Transformer computes self-attention globally, which W-MSA does not);
- LeFF: a traditional Transformer uses a plain feed-forward network, which cannot make good use of local context; LeFF is adopted to capture local information.
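If I recall the paper correctly, the two sub-modules are composed in the usual pre-norm residual fashion inside each LeWin Transformer block:

$$
X' = \text{W-MSA}(\text{LN}(X)) + X, \qquad X'' = \text{LeFF}(\text{LN}(X')) + X'
$$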
Two innovations:
- the LeWin Transformer is proposed and introduced into a UNet;
- three skip-connection variants are explored.
2. Main module details
2.1 W-MSA
This is the biggest innovation of this work. (I was reminded that the Swin Transformer already has this.)
First, the C×H×W feature map X is split into N non-overlapping C×M×M patches (windows), where N = HW/M². Each patch is treated as M×M C-dimensional vectors, and these vectors are fed into W-MSA. In other words, X is split into N non-overlapping windows, and self-attention is computed within each window.
The authors note that although self-attention is computed only within one window, in the encoder stage of the UNet the feature maps are progressively downsampled, so computing self-attention on a window at a low-resolution stage corresponds to computing self-attention over a much larger receptive field at the resolution before downsampling.
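A minimal PyTorch sketch of this idea (my own illustration, not the released Uformer code; the relative position bias discussed next is omitted, and `nn.MultiheadAttention` stands in for the paper's attention module): split the feature map into M×M windows, run multi-head self-attention inside each window, then stitch the windows back.

```python
import torch
import torch.nn as nn

def window_partition(x, M):
    # (B, H, W, C) -> (B * H/M * W/M, M*M, C); H and W are assumed divisible by M
    B, H, W, C = x.shape
    x = x.reshape(B, H // M, M, W // M, M, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, M * M, C)

def window_reverse(windows, M, H, W):
    # inverse of window_partition: (B * num_windows, M*M, C) -> (B, H, W, C)
    B = windows.shape[0] // ((H // M) * (W // M))
    x = windows.reshape(B, H // M, W // M, M, M, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

class WindowAttention(nn.Module):
    """Multi-head self-attention applied independently inside each M x M window."""
    def __init__(self, dim, num_heads, M):
        super().__init__()
        self.M = M
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                               # x: (B, H, W, C)
        B, H, W, C = x.shape
        windows = window_partition(x, self.M)           # (B * N, M*M, C)
        out, _ = self.attn(windows, windows, windows)   # attention within each window
        return window_reverse(out, self.M, H, W)        # (B, H, W, C)

# usage: a 32x32 feature map with C=64 and window size M=8
# gives 16 windows of 64 tokens each
x = torch.randn(1, 32, 32, 64)
y = WindowAttention(dim=64, num_heads=4, M=8)(x)
print(y.shape)  # torch.Size([1, 32, 32, 64])
```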
Relative position encoding is adopted, so the attention computation within each window can be expressed as:
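(The formula image is not reproduced here; reconstructing it from the Swin/Uformer formulation, with $B$ the learnable relative position bias:)

$$
\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}} + B\right) V
$$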
The references cited for this relative position encoding, [48] and [41], are:
[48] Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. Self-attention with relative position representations. arXiv preprint arXiv:1803.02155, 2018.
[41] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030, 2021.
2.2 LeFF
LeFF was introduced in "Incorporating Convolution Designs into Visual Transformers", whose Convolution-enhanced image Transformer (CeiT) contains this design.
The essence is to take the N tokens (vectors) output by the self-attention module, rearrange them into a $\sqrt{N} \times \sqrt{N}$ "image", and apply a depth-wise convolution to it. After looking at the diagram given by the CeiT authors and then the one given by the Uformer authors, the idea is not hard to understand:
After each linear layer / convolution layer, a GELU activation function is applied.
(A quick search shows that depth-wise convolution serves to reduce the number of parameters and speed up computation.)
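A minimal sketch of LeFF following the description above (my own reading, not the authors' released implementation; the hidden dimension `hidden_dim` is an assumed hyperparameter):

```python
import torch
import torch.nn as nn

class LeFF(nn.Module):
    """Linear -> GELU -> reshape tokens to sqrt(N) x sqrt(N) -> 3x3 depth-wise conv
    -> GELU -> flatten back -> Linear -> GELU."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.linear1 = nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU())
        # groups=hidden_dim makes the 3x3 convolution depth-wise
        self.dwconv = nn.Sequential(
            nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3, padding=1, groups=hidden_dim),
            nn.GELU(),
        )
        self.linear2 = nn.Sequential(nn.Linear(hidden_dim, dim), nn.GELU())

    def forward(self, x):                                # x: (B, N, C), N a perfect square
        B, N, C = x.shape
        s = int(N ** 0.5)
        x = self.linear1(x)                              # (B, N, hidden)
        x = x.transpose(1, 2).reshape(B, -1, s, s)       # tokens -> sqrt(N) x sqrt(N) "image"
        x = self.dwconv(x)                               # local context via depth-wise conv
        x = x.reshape(B, -1, N).transpose(1, 2)          # back to tokens
        return self.linear2(x)                           # (B, N, C)

# usage: 64 tokens (an 8x8 window) with channel dimension 32
out = LeFF(dim=32, hidden_dim=128)(torch.randn(2, 64, 32))
print(out.shape)  # torch.Size([2, 64, 32])
```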
2.3 Three skip-connection variants
The UNet architecture has skip connections; in this work they pass the outputs of the encoder's Transformer blocks to the decoder. There are many ways these skip connections can convey the information, and the authors explore three:
- The first directly concatenates the encoder features;
- The second: each decoder stage has one upsampling layer and two Transformer blocks, where the first block uses self-attention and the second uses cross-attention;
- The third uses the concatenated information as the key and value of a cross-attention.
The authors find that the three perform similarly, with the first slightly better, so the first is used as Uformer's default setting (a rough sketch of this default variant follows).
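A rough sketch of the default concatenation-based variant (my own simplification with a made-up module name `ConcatSkipStage`; in the real model the fused features would then go through the decoder-stage LeWin Transformer blocks):

```python
import torch
import torch.nn as nn

class ConcatSkipStage(nn.Module):
    """Upsample decoder features, concatenate encoder features along channels,
    and project back to the stage's working dimension."""
    def __init__(self, dec_dim, enc_dim, out_dim):
        super().__init__()
        self.upsample = nn.ConvTranspose2d(dec_dim, dec_dim, kernel_size=2, stride=2)
        self.proj = nn.Conv2d(dec_dim + enc_dim, out_dim, kernel_size=1)

    def forward(self, dec_feat, enc_feat):     # (B, dec_dim, H/2, W/2), (B, enc_dim, H, W)
        up = self.upsample(dec_feat)           # (B, dec_dim, H, W)
        fused = torch.cat([up, enc_feat], dim=1)
        return self.proj(fused)                # then fed to the LeWin Transformer blocks

fused = ConcatSkipStage(dec_dim=128, enc_dim=64, out_dim=64)(
    torch.randn(1, 128, 16, 16), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```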
That covers the details of the Uformer architecture design; I won't look closely at the other contents.
3. Computational cost
Since the W-MSA in the LeWin Transformer focuses on reducing computational overhead, it is natural to look at the algorithm's complexity:
Given a feature map X of size C×H×W, traditional global self-attention has complexity $O(H^2 W^2 C)$; splitting it into M×M patches and performing self-attention within each patch gives $O(\frac{HW}{M^2} \cdot M^4 C) = O(M^2 H W C)$, which reduces the complexity.
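As a quick sanity check (my own numbers, not from the paper), plugging in H = W = 128 and M = 8:

$$
\frac{O(H^2 W^2 C)}{O(M^2 H W C)} = \frac{HW}{M^2} = \frac{128 \times 128}{8^2} = 256,
$$

so window attention is roughly 256 times cheaper at that resolution, and the saving grows linearly with the spatial size HW.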
4. Experimental results
The authors ran experiments on denoising, deraining, and deblurring.