Hands-on deep learning: ResNet

2022-08-06 12:32:00 【CV Small Rookie】

The defining feature of a residual network is the residual block, which can be written as a single formula:

$f(x) = x + g(x)$.

Why design a function this way? The following example answers that question.

Suppose there is a specific class of neural network architectures $F$, which includes the learning rate and other hyperparameter settings. For every $f \in F$ there exists some set of parameters (for example, weights and biases) that can be obtained by training on a suitable dataset. Now suppose $f^*$ is the function we really want to find. If $f^* \in F$, we can obtain it easily by training, but we are usually not that lucky. Instead, we try to find a function $f^*_F$ that is our best choice within $F$. For example, given a dataset with features $X$ and labels $y$, we can try to find it by solving the following optimization problem:

$f^*_F := \underset{f}{\arg\min}\ L(X, y, f) \quad \text{subject to} \quad f \in F$

When we construct such function classes, there are usually two cases: nested function classes and non-nested function classes.

With non-nested function classes, the fitting capacity keeps growing from $F_1$ to $F_6$, but the distance to $f^*$ does not necessarily shrink. With nested function classes, by contrast, each class extends the previous one, so enlarging the class can only bring the best available function closer to $f^*$, never farther away.
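
The difference between nested and non-nested classes can be sketched with a toy example in plain Python. Everything here is an assumption made for illustration: each "function class" is a small finite set of lines $a x + b$, the target is $f^*(x) = 2x + 0.5$, and the names `mse`, `best_loss`, and `grid` are hypothetical helpers, not part of the original article.

```python
# Toy illustration: a "function class" is a finite set of lines f(x) = a * x + b.

def mse(a, b, xs, ys):
    """Mean squared error of the line a * x + b on the data."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def best_loss(params, xs, ys):
    """Loss of the best function inside one class (the role of f*_F)."""
    return min(mse(a, b, xs, ys) for a, b in params)


def grid(a_values, b_values):
    """All (a, b) pairs from the given candidate values."""
    return [(a, b) for a in a_values for b in b_values]


# Data generated by the assumed target f*(x) = 2x + 0.5.
xs = [i / 10 for i in range(-10, 11)]
ys = [2.0 * x + 0.5 for x in xs]

# Nested: every class contains all functions of the previous class.
nested = [
    grid([0.0], [0.0]),
    grid([0.0, 1.0], [0.0, 1.0]),
    grid([0.0, 1.0, 2.0], [0.0, 0.5, 1.0]),
]

# Non-nested: classes grow in size but drop the earlier functions.
non_nested = [
    grid([1.5], [0.5]),
    grid([0.0, 3.0], [0.0, 1.0]),
    grid([-1.0, 4.0, 5.0], [2.0, 3.0, 4.0]),
]

nested_losses = [best_loss(p, xs, ys) for p in nested]
non_nested_losses = [best_loss(p, xs, ys) for p in non_nested]

print("nested:    ", nested_losses)
print("non-nested:", non_nested_losses)
```

In the nested sequence the best loss can never increase, because each minimum is taken over a superset of the previous candidates. In the non-nested sequence chosen here, the larger classes actually fit worse than the first one.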

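To address this problem, Kaiming He et al. proposed the residual network (ResNet). It won the ImageNet image recognition challenge in 2015 and profoundly influenced the design of later deep neural networks. The core idea of the residual network is that every additional layer should more easily contain the original function as one of its elements. This idea gave rise to the residual block.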

Residual blocks

On the left is a regular block; on the right is a residual block. The residual structure takes a path from the input and adds it directly to the output of the regular block. A regular block has to fit $f(x)$ directly with its convolutions, whereas the convolutional part of a residual block only needs to fit $f(x) - x$. This makes the network easier to optimize, and the input can propagate forward more easily through the skip connection. (Backpropagation benefits as well, which we come back to later.)
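
As a minimal sketch of this idea in plain Python (lists of floats rather than tensors), the block below adds the input to the output of a branch `g`. The function names `residual_block`, `zero_branch`, and `small_correction` are hypothetical and chosen only for illustration.

```python
def residual_block(x, g):
    """Return f(x) = x + g(x), element by element."""
    gx = g(x)
    return [xi + gi for xi, gi in zip(x, gx)]


def zero_branch(x):
    """A branch that has learned nothing: g(x) = 0."""
    return [0.0 for _ in x]


def small_correction(x):
    """A branch that learns only a small residual: g(x) = 0.1 * x."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]

# With g(x) = 0 the block reduces to the identity mapping.
identity_out = residual_block(x, zero_branch)

# Otherwise the branch only has to model the difference f(x) - x.
corrected_out = residual_block(x, small_correction)

print(identity_out)
print(corrected_out)
```

When the desired mapping is close to the identity, the branch only has to output values near zero, which is the sense in which fitting $f(x) - x$ is the easier target.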

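Let us look at what the residual blocks in ResNet contain. ResNet follows VGG's full 3 × 3 convolutional layer design. A residual block starts with 2 convolutional layers of size 3 × 3 that have the same number of output channels. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function. A cross-layer data path then skips these 2 convolutions and adds the input directly before the final ReLU activation. This design requires the output of the 2 convolutional layers to have the same shape as the input so that the two can be added. To change the number of channels, an extra 1 × 1 convolutional layer is needed to transform the input into the required shape before the addition, as shown in the figure below.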

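from torch import nn
from torch.nn import functional as F


class Residual(nn.Module):  #@save
    """Residual block: two 3x3 convolutions plus a skip connection."""
    def __init__(self, input_channels, num_channels,
                 use_1x1conv=False, strides=1):
        super().__init__()
        # Main branch: only the first 3x3 convolution may change the stride.
        self.conv1 = nn.Conv2d(input_channels, num_channels,
                               kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels,
                               kernel_size=3, padding=1)
        # Optional 1x1 convolution so the shortcut matches the main branch
        # in channel count and spatial size.
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        Y += X  # add the shortcut before the final ReLU
        return F.relu(Y)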
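
The shape requirement can be checked with plain arithmetic, without running PyTorch. The sketch below assumes square inputs and uses the standard convolution output-size formula with dilation 1; `conv_out` and `residual_shapes` are hypothetical helper names that mirror the kernel sizes, padding, and strides of the `Residual` class.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution (floor division, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def residual_shapes(in_channels, out_channels, size,
                    use_1x1conv=False, strides=1):
    """Return (main_branch_shape, shortcut_shape) as (channels, height, width)."""
    # Main branch: 3x3 conv with the given stride, then 3x3 conv with stride 1.
    s = conv_out(size, kernel=3, stride=strides, padding=1)
    s = conv_out(s, kernel=3, stride=1, padding=1)
    main = (out_channels, s, s)
    # Shortcut: identity, or a 1x1 conv (no padding) with the same stride.
    if use_1x1conv:
        t = conv_out(size, kernel=1, stride=strides, padding=0)
        shortcut = (out_channels, t, t)
    else:
        shortcut = (in_channels, size, size)
    return main, shortcut


# Same channels, stride 1: the input can be added as-is.
same = residual_shapes(3, 3, 6)

# More channels and stride 2: the 1x1 convolution reshapes the shortcut.
down = residual_shapes(3, 6, 6, use_1x1conv=True, strides=2)

# Without the 1x1 convolution the two branches no longer match.
mismatch = residual_shapes(3, 6, 6, use_1x1conv=False, strides=2)

for name, (main, shortcut) in [("same", same), ("down", down), ("mismatch", mismatch)]:
    print(name, main, shortcut, "addable" if main == shortcut else "not addable")
```

The third case shows why `use_1x1conv=True` is needed whenever the channel count or the stride changes: the addition `Y += X` is only defined when both shapes agree.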

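ResNet structure

Taking ResNet-18 as an example:

The first two layers of ResNet are the same as in GoogLeNet, introduced earlier: a 7 × 7 convolutional layer with 64 output channels and a stride of 2, followed by a 3 × 3 max-pooling layer with a stride of 2. The difference is that ResNet adds batch normalization after each convolutional layer.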

b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

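ResNet then uses 4 modules made up of residual blocks, and each module uses several residual blocks with the same number of output channels. The number of channels in the first module is the same as the number of input channels. Because a max-pooling layer with a stride of 2 has already been applied, the first module does not need to reduce the height and width. Each subsequent module doubles the number of channels of the previous module in its first residual block and halves the height and width. (These 4 modules are marked in the picture above.)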

def resnet_block(input_channels, num_channels, num_residuals,
                 first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            # First block of a later module: change the channel count
            # and halve the height and width.
            blk.append(Residual(input_channels, num_channels,
                                use_1x1conv=True, strides=2))
        else:
            # All other blocks keep the shape unchanged.
            blk.append(Residual(num_channels, num_channels))
    return blk

b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))
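
The doubling and halving rule can be written out as a short plan of (channels, height/width) per module. This is a sketch under an assumption that is not in the original code: a 224 × 224 input, which the stride-2 convolution and the stride-2 max-pooling reduce to 56 × 56 before the first module. The function `stage_plan` is a hypothetical helper.

```python
def stage_plan(first_channels=64, first_size=56, num_modules=4):
    """List (channels, spatial_size) at the output of each residual module."""
    plan = []
    channels, size = first_channels, first_size
    for i in range(num_modules):
        if i > 0:
            # Every module after the first doubles the channels
            # and halves the height and width.
            channels *= 2
            size //= 2
        plan.append((channels, size))
    return plan


plan = stage_plan()
for name, (channels, size) in zip(["b2", "b3", "b4", "b5"], plan):
    print(f"{name}: {channels} channels, {size} x {size}")
```

The channel counts 64, 128, 256, and 512 match the arguments passed to `resnet_block` for `b2` through `b5`.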

Finally, as in GoogLeNet, ResNet adds a global average pooling layer, followed by a fully connected output layer.

net = nn.Sequential(b1, b2, b3, b4, b5,
                    nn.AdaptiveAvgPool2d((1,1)),
                    nn.Flatten(), nn.Linear(512, 10))
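
The "18" in ResNet-18 can be recovered by counting the layers with weights in the network above. The sketch below uses plain Python and a counting convention stated as an assumption: the optional 1 × 1 shortcut convolutions and the batch normalization layers are not counted. The constant and function names are hypothetical.

```python
# Residual blocks per module, matching b2..b5 in the code above.
blocks_per_module = [2, 2, 2, 2]

CONVS_PER_BLOCK = 2  # the two 3x3 convolutions in each residual block
STEM_CONVS = 1       # the initial 7x7 convolution in b1
FC_LAYERS = 1        # the final fully connected layer


def count_weight_layers(blocks):
    """Count convolutional and fully connected layers, excluding 1x1 shortcuts."""
    return STEM_CONVS + CONVS_PER_BLOCK * sum(blocks) + FC_LAYERS


total_layers = count_weight_layers(blocks_per_module)
print(total_layers)  # 18
```

Changing the entries of `blocks_per_module` is the knob that makes the network deeper while keeping the same overall layout.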

Why ResNet can train very deep networks

In addition, the skip connection helps during both forward propagation and backpropagation.
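
One way to make this concrete is the short derivation below. It is a sketch for a single residual block with scalar input, offered as a common explanation rather than as the argument of the original illustration. The symbol $L$ stands for the training loss.

Forward pass, directly from the definition of the block:

$f(x) = x + g(x), \quad \text{so} \quad g(x) = 0 \Rightarrow f(x) = x$

Backward pass, by the chain rule:

$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial f} \cdot \frac{\partial f}{\partial x} = \frac{\partial L}{\partial f} \left( 1 + \frac{\partial g}{\partial x} \right)$

The constant 1 comes from the skip connection. Even when $\frac{\partial g}{\partial x}$ is close to zero, the gradient $\frac{\partial L}{\partial f}$ still reaches $x$ through this term, so stacking many residual blocks does not by itself shrink the gradient toward zero.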

Original site

Copyright notice
This article was written by [CV Small Rookie]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/218/202208061223498788.html
