Denoising paper reading: [RIDNet, ICCV'19] Real Image Denoising with Feature Attention
2022-04-23 05:59:00 【umbrellalalalala】
Also published synchronously on the Zhihu account of the same name.
I. Detailed explanation of the architecture parameters
This part starts with the architecture itself; the next part covers the motivation behind its key components.
[Figure: the overall RIDNet architecture (top) and a single EAM (bottom)]
The details of the network architecture are marked in the figure. The top half of the figure shows the overall architecture; the bottom half shows the architecture of a single EAM.
The input is a noisy image and the output is a noise-free image.
The authors divide the architecture into three modules (a minimal sketch follows the list):
- feature extraction: $f_0 = M_e(x)$, which has only one layer.
- feature learning residual module: $f_r = M_{fl}(f_0)$, composed of several EAMs.
- reconstruction module: $\hat{y} = M_r(f_r)$, which also has only one layer.
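Here is a minimal PyTorch sketch of this three-module layout. The class names, the simplified EAM placeholder, and the EAM count are illustrative assumptions on my part, not the authors' released code:

```python
import torch.nn as nn

class EAMPlaceholder(nn.Module):
    """Stand-in for one Enhanced Attention Module; the real EAM is more elaborate."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # local residual connection

class RIDNetSkeleton(nn.Module):
    """Feature extraction M_e, feature learning M_fl, reconstruction M_r."""
    def __init__(self, channels=64, num_eams=4):
        super().__init__()
        self.m_e = nn.Conv2d(3, channels, 3, padding=1)   # single-layer M_e
        self.m_fl = nn.Sequential(*[EAMPlaceholder(channels) for _ in range(num_eams)])
        self.m_r = nn.Conv2d(channels, 3, 3, padding=1)   # single-layer M_r

    def forward(self, x):
        f0 = self.m_e(x)     # f_0 = M_e(x)
        fr = self.m_fl(f0)   # f_r = M_fl(f_0)
        return self.m_r(fr)  # y_hat = M_r(f_r)
```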
The loss function is:
$L(w)=\frac{1}{N}\sum_{i=1}^{N}\|RIDNet(x_i)-y_i\|_1=\frac{1}{N}\sum_{i=1}^{N}|RIDNet(x_i)-y_i|$
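In PyTorch this is simply the built-in mean absolute error; a tiny sketch with stand-in tensors (the shapes here are arbitrary):

```python
import torch
import torch.nn as nn

criterion = nn.L1Loss()            # mean of |pred - target| over all elements
pred = torch.rand(4, 3, 80, 80)    # stand-in for RIDNet(x_i)
target = torch.rand(4, 3, 80, 80)  # stand-in for the ground truth y_i
loss = criterion(pred, target)
```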
Regarding kernel size: except for the thinner convolution in the figure above, which is $1 \times 1$, everything else is $3 \times 3$.
Regarding channels: almost all convolution layers have 64 channels, except for one layer in the following structure that downsamples the channel count to 4:
[Figure: the feature attention block]
This is channel-wise feature attention; here is what it looks like unfolded:
[Figure: the feature attention block, unfolded]
(Note that the figure above omits the soft-shrinkage; the authors apply it after $H_D$.)
For the $d$ in the figure above, the authors use 16, so the feature map after $H_D$ is the only one with 4 channels; every other layer has 64 channels.
⚠️ Note: the final multiplication is element-wise. Because the sizes differ, the $1 \times 1 \times c$ tensor must first be adaptively rescaled to $h \times w \times c$, where the expansion is done by replication.
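A minimal PyTorch sketch of this channel attention block: the reduction ratio $d=16$ and the soft-shrinkage after $H_D$ follow the text above, while the $H_U$ name and the sigmoid gate are my assumptions:

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Channel-wise feature attention: squeeze (g_p) -> bottleneck -> per-channel gate."""
    def __init__(self, channels=64, d=16):
        super().__init__()
        self.g_p = nn.AdaptiveAvgPool2d(1)                # h x w x c -> 1 x 1 x c
        self.h_d = nn.Conv2d(channels, channels // d, 1)  # reduce 64 -> 4 channels
        self.shrink = nn.Softshrink()                     # replaces SENet's ReLU, per the text
        self.h_u = nn.Conv2d(channels // d, channels, 1)  # restore 4 -> 64 channels
        self.gate = nn.Sigmoid()                          # per-channel weights in (0, 1)

    def forward(self, x):
        w = self.gate(self.h_u(self.shrink(self.h_d(self.g_p(x)))))
        # broadcasting replicates the 1x1xc weights to hxwxc ("adaptively rescaled")
        return x * w
```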
The paper uses a lot of notation and abbreviations; among them, ERB (enhanced residual block) refers to:
[Figure: the ERB (enhanced residual block) structure]
The thin convolution layer in the figure above is the only $1 \times 1$ convolution layer in the whole architecture; all other convolutions are $3 \times 3$.
The other notation and abbreviations are mostly marked in the figures; what is unmarked is simple enough not to need explanation.
II. Contributions & innovation
Here is a brief overview; the important parts are discussed in the next section.
The authors state their contributions as follows:
- theirs is the first model to use feature attention for denoising;
- for existing models, increasing depth may fail to improve performance and can cause vanishing gradients;
- this is a one-stage model (in contrast, CBDNet is a two-stage model): there is only a single denoising stage, whereas CBDNet has a noise-estimation stage followed by a denoising stage.
On the second point, that increasing depth does not increase performance, the authors also write:
simple cascading the residual modules will not achieve better performance.
By contrast, with the authors' method, increasing the number of EAM modules does improve performance; the EAM also incorporates the idea of residual learning.
III. Feature attention
[Figure: the feature attention structure]
(As a reminder, $g_p$ is global pooling.)
This is the same feature attention structure again. As explained in the first section, it reweights the different channels of the feature map. Of course, this cannot be counted as an original contribution of this work; the authors say they drew on:
Squeeze-and-excitation networks.
Of course, a small change was made: in the original design, $H_D$ is followed by a ReLU, whereas the authors follow it with soft-shrinkage.
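For reference, soft-shrinkage is available in PyTorch as nn.Softshrink; it zeroes values in $[-\lambda, \lambda]$ and shrinks everything else toward zero. A quick check ($\lambda = 0.5$ is just PyTorch's default, not a value from the paper):

```python
import torch
import torch.nn as nn

shrink = nn.Softshrink(lambd=0.5)  # y = x - 0.5 if x > 0.5; y = x + 0.5 if x < -0.5; else 0
x = torch.tensor([-1.0, -0.2, 0.0, 0.3, 1.2])
print(shrink(x))  # tensor([-0.5000,  0.0000,  0.0000,  0.0000,  0.7000])
```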
The authors' original wording is also quoted here:
The feature attention mechanism for selecting the essential features.
IV. Datasets, training details & ablation experiments
1. Training datasets:
- synthetic images: generated from BSD500, DIV2K, and MIT-Adobe FiveK;
- real-world images: from SIDD, Poly, and RENOIR.
(The paper does not say how the noisy images are synthesized; checking the code, it appears to simply add Gaussian noise. As for the real noisy images, looking up SIDD shows that it is a noisy-image dataset with ground truth.)
2. Data augmentation methods (a sketch follows this list):
- random rotation of 90°, 180°, 270°;
- horizontal flipping.
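A minimal sketch of these two augmentations in PyTorch (the function name and the 0.5 flip probability are my assumptions):

```python
import random
import torch

def augment(img: torch.Tensor) -> torch.Tensor:
    """Random 90/180/270-degree rotation plus random horizontal flip, for a (C, H, W) tensor."""
    k = random.randint(0, 3)                # number of 90-degree rotations (0 = no rotation)
    img = torch.rot90(img, k, dims=(1, 2))  # rotate in the H-W plane
    if random.random() < 0.5:
        img = torch.flip(img, dims=(2,))    # flip along the width axis
    return img
```

For paired denoising data, the same random transform must of course be applied to both the noisy patch and its ground truth.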
3. Test datasets:
- 4 real-noise image datasets: RNI15, DND, Nam, SIDD;
- 3 synthetic-noise image datasets: the widely used 12 classical images, plus BSD68 (68 images) in both color and grayscale.
4. Training details:
- batch size 32, patch size $80 \times 80$ (a patch-cropping sketch follows).
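For concreteness, a hedged sketch of cropping matching $80 \times 80$ patches from a noisy/clean pair (the helper name and pairing logic are assumptions; the post only gives the numbers):

```python
import random
import torch

def random_patch(noisy: torch.Tensor, clean: torch.Tensor, ps: int = 80):
    """Crop the same random ps x ps patch from a (C, H, W) noisy/clean pair."""
    _, h, w = noisy.shape
    top = random.randint(0, h - ps)   # inclusive upper bound keeps the patch in bounds
    left = random.randint(0, w - ps)
    return (noisy[:, top:top + ps, left:left + ps],
            clean[:, top:top + ps, left:left + ps])
```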
5. Ablation experiments:
- The best results come when all three skip connections (LSC, SSC, and LC) are used; these three connections are shown in the figure at the beginning of the article. Removing any one of them degrades performance. ⚠️ Note: without these connections, increasing the network depth will not improve performance.
- Using feature attention gives better results than not using it.
