Deriving the Gradients of CNN Parameters (Convolution Kernels) by Hand

2022-08-11 05:15:00 【种树家】

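Deriving CNN gradients by hand (how to understand the chain rule for convolutions)

How to differentiate with respect to convolution parameters puzzled me for a long time, especially how to differentiate with respect to the parameters of the earliest layers when several convolution layers are stacked. I never found material on this, so I worked it out myself and am sharing it here. If you spot a mistake, please point it out, and leave a comment if you have any questions.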

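Gradient with one kernel (one convolution layer)

Calling backward() without arguments in torch requires the final loss value to be a scalar. The analysis below uses L1 loss as the example. Because we want to define the input and the kernel ourselves, we use conv2d from torch.nn.functional, which makes the analysis easier. The code is: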

import torch
import torch.nn as nn
import torch.nn.functional as f
loss=nn.L1Loss()
x1=torch.tensor([[[[1,2,3,4],
                [4,3,2,1],
                [1,2,3,4],
                [4,3,2,1]]]],dtype=torch.float,requires_grad=True)
w1=torch.tensor([[[[1,1,1],
                [1,1,1],
                [1,1,1]]]],dtype=torch.float,requires_grad=True)
bias1=torch.tensor([1],requires_grad=True,dtype=torch.float)
print(x1.shape)
print(w1.shape)
c=f.conv2d(x1,w1,bias1,padding=1)
print(c)
zero=torch.zeros_like(c)
c=loss(c,zero)
c.backward()
print(w1.grad)

The output is:

C:\software\Anaconda3\python.exe "D:/SZ/runned code/test.py"
torch.Size([1, 1, 4, 4])
torch.Size([1, 1, 3, 3])
tensor([[[[11., 16., 16., 11.],
          [14., 22., 25., 18.],
          [18., 25., 22., 14.],
          [11., 16., 16., 11.]]]], grad_fn=<ThnnConv2DBackward>)
tensor([[[[1.3125, 1.8750, 1.5000],
          [1.8750, 2.5000, 1.8750],
          [1.5000, 1.8750, 1.3125]]]])

Process finished with exit code 0

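We define an input of shape [1,1,4,4] (the values are in the code) and a kernel of shape [1,1,3,3] whose entries are all 1. The input is convolved with padding 1, so the output keeps the same size as the input. First, what exactly is L1 loss? L1Loss is the mean absolute error: the mean of the absolute values of all entries of the error matrix. Here the loss is taken between the output and zero, and every output entry is positive, so the loss is simply the mean of all 16 entries of the output matrix, and each output entry receives a gradient of 1/16.
The figure below explains this in detail (drawn by hand). We first look at how to compute the gradient when there is only one w.
[Figure: hand-drawn derivation of the gradient for a single kernel w]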
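The single-kernel case can also be checked without autograd. The sketch below is plain Python rather than the code used in this post; the helper `x_pad` and the names `grad_y` and `grad_w` are made up for illustration. It assumes, as the printed output shows, that every entry of the convolution output is positive, so the mean absolute loss hands a gradient of 1/16 to each of the 16 output entries.

```python
# Hand computation of dL/dw for one 3x3 convolution with padding=1.
# Assumption: L = mean(|y|) and every y is positive, so dL/dy[i][j] = 1/16.

x = [[1, 2, 3, 4],
     [4, 3, 2, 1],
     [1, 2, 3, 4],
     [4, 3, 2, 1]]
H, W, K, P = 4, 4, 3, 1  # input height, input width, kernel size, padding


def x_pad(i, j):
    """Value of the zero-padded input at padded coordinates (i, j)."""
    r, c = i - P, j - P
    return x[r][c] if 0 <= r < H and 0 <= c < W else 0.0


# Upstream gradient of the loss with respect to each output entry.
grad_y = [[1.0 / (H * W)] * W for _ in range(H)]

# Each kernel entry w[a][b] multiplies x_pad[i + a][j + b] in output y[i][j],
# so dL/dw[a][b] = sum over (i, j) of dL/dy[i][j] * x_pad[i + a][j + b].
grad_w = [[sum(grad_y[i][j] * x_pad(i + a, j + b)
               for i in range(H) for j in range(W))
           for b in range(K)]
          for a in range(K)]

for row in grad_w:
    print(row)
```

Running it should reproduce the `w1.grad` values printed above. The centre entry, for instance, sees every input value exactly once, so it is the sum of all inputs (40) divided by 16, which is 2.5.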

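Gradient with two kernels (two convolution layers)

Next we look at how to differentiate with respect to both kernels when there are two of them.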

import torch
import torch.nn as nn
import torch.nn.functional as f
loss=nn.L1Loss()
x1=torch.tensor([[[[1,2,3,4],
                [4,3,2,1],
                [1,2,3,4],
                [4,3,2,1]]]],dtype=torch.float,requires_grad=True)
w1=torch.tensor([[[[1,1,1],
                [1,1,1],
                [1,1,1]]]],dtype=torch.float,requires_grad=True)
bias1=torch.tensor([1],requires_grad=True,dtype=torch.float)
bias2=torch.tensor([1],requires_grad=True,dtype=torch.float)
w2=torch.tensor([[[[1,1,1],
                [1,1,1],
                [1,1,1]]]],dtype=torch.float,requires_grad=True)
w3=torch.tensor([[[[1,1,1],
                [1,1,1],
                [1,1,1]]]],dtype=torch.float,requires_grad=True)
print(x1.shape)
print(w1.shape)
x2=f.conv2d(x1,w1,bias1,padding=1)
print(x2)
x3=f.conv2d(x2,w2,bias2,padding=1)
print(x3)
zero=torch.zeros_like(x3)
l=loss(x3,zero)
l.backward()
print(w1.grad)
print(w2.grad)

The output is:

C:\software\Anaconda3\python.exe "D:/SZ/runned code/test.py"
torch.Size([1, 1, 4, 4])
torch.Size([1, 1, 3, 3])
tensor([[[[11., 16., 16., 11.],
          [14., 22., 25., 18.],
          [18., 25., 22., 14.],
          [11., 16., 16., 11.]]]], grad_fn=<ThnnConv2DBackward>)
tensor([[[[ 64., 105., 109.,  71.],
          [107., 170., 170., 107.],
          [107., 170., 170., 107.],
          [ 71., 109., 105.,  64.]]]], grad_fn=<ThnnConv2DBackward>)
tensor([[[[ 9.3750, 12.5000, 10.6250],
          [12.5000, 15.6250, 12.5000],
          [10.6250, 12.5000,  9.3750]]]])
tensor([[[[10.5625, 13.2500, 10.5625],
          [13.2500, 16.6250, 13.2500],
          [10.5625, 13.2500, 10.5625]]]])

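The figure below explains this in detail (drawn by hand).
[Figure: hand-drawn derivation of the gradients for two stacked kernels w1 and w2]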
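The two-layer case adds one step: before the gradient of w1 can be computed, the gradient has to be pushed back from x3 to x2 through w2. The following plain-Python sketch spells that out. The helpers `pad_get`, `conv`, `grad_weight` and `grad_input` are hypothetical names introduced here, not part of torch, and the code again assumes all entries of x3 are positive (they are in the output above).

```python
# Chain rule through two 3x3 convolutions (padding=1), computed by hand.
x1 = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 4], [4, 3, 2, 1]]
w1 = [[1.0] * 3 for _ in range(3)]
w2 = [[1.0] * 3 for _ in range(3)]
b1 = b2 = 1.0
N, K, P = 4, 3, 1  # feature-map size, kernel size, padding


def pad_get(m, i, j):
    """Read map m at padded coordinates (i, j); outside the map is zero."""
    r, c = i - P, j - P
    return m[r][c] if 0 <= r < N and 0 <= c < N else 0.0


def conv(x, w, bias):
    """Forward: y[i][j] = bias + sum_{a,c} w[a][c] * x_pad[i+a][j+c]."""
    return [[bias + sum(w[a][c] * pad_get(x, i + a, j + c)
                        for a in range(K) for c in range(K))
             for j in range(N)] for i in range(N)]


def grad_weight(x, gy):
    """dL/dw[a][c] = sum_{i,j} gy[i][j] * x_pad[i+a][j+c]."""
    return [[sum(gy[i][j] * pad_get(x, i + a, j + c)
                 for i in range(N) for j in range(N))
             for c in range(K)] for a in range(K)]


def grad_input(w, gy):
    """dL/dx[p][q] = sum_{a,c} w[a][c] * gy[p-a+P][q-c+P], zero outside."""
    return [[sum(w[a][c] * pad_get(gy, p - a + 2 * P, q - c + 2 * P)
                 for a in range(K) for c in range(K))
             for q in range(N)] for p in range(N)]


x2 = conv(x1, w1, b1)
x3 = conv(x2, w2, b2)

g3 = [[1.0 / (N * N)] * N for _ in range(N)]  # dL/dx3, all x3 > 0 assumed
grad_w2 = grad_weight(x2, g3)                 # uses the input of layer 2
g2 = grad_input(w2, g3)                       # dL/dx2, pushed back through w2
grad_w1 = grad_weight(x1, g2)                 # uses the input of layer 1

print(grad_w1)
print(grad_w2)
```

With all-ones kernels, `g2` just counts how many 3x3 windows cover each position of x2 (4 at the corners, 6 on the edges, 9 inside), divided by 16. Weighting x1 by those counts is what produces the centre value 15.625 in `w1.grad`.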

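Gradient with three kernels (three convolution layers)

Next we look at how to differentiate with respect to the parameters of each layer when there are three convolution layers.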

import torch
import torch.nn as nn
import torch.nn.functional as f
loss=nn.L1Loss()
x1=torch.tensor([[[[1,2,3,4],
                [4,3,2,1],
                [1,2,3,4],
                [4,3,2,1]]]],dtype=torch.float,requires_grad=True)
w1=torch.tensor([[[[1,1,1],
                [1,1,1],
                [1,1,1]]]],dtype=torch.float,requires_grad=True)
bias1=torch.tensor([1],requires_grad=True,dtype=torch.float)
bias2=torch.tensor([1],requires_grad=True,dtype=torch.float)
bias3=torch.tensor([1],requires_grad=True,dtype=torch.float)
w2=torch.tensor([[[[1,1,1],
                [1,1,1],
                [1,1,1]]]],dtype=torch.float,requires_grad=True)
w3=torch.tensor([[[[1,1,1],
                [1,1,1],
                [1,1,1]]]],dtype=torch.float,requires_grad=True)
print(x1.shape)
print(w1.shape)
x2=f.conv2d(x1,w1,bias1,padding=1)
print(x2)
x3=f.conv2d(x2,w2,bias2,padding=1)
print(x3)
x4=f.conv2d(x3,w3,bias3,padding=1)
print(x4)
zero=torch.zeros_like(x4)
l=loss(x4,zero)
l.backward()

print(w1.grad)
print(w2.grad)
print(w3.grad)

The output is:

C:\software\Anaconda3\python.exe "D:/SZ/runned code/test.py"
torch.Size([1, 1, 4, 4])
torch.Size([1, 1, 3, 3])
tensor([[[[11., 16., 16., 11.],
          [14., 22., 25., 18.],
          [18., 25., 22., 14.],
          [11., 16., 16., 11.]]]], grad_fn=<ThnnConv2DBackward>)
tensor([[[[ 64., 105., 109.,  71.],
          [107., 170., 170., 107.],
          [107., 170., 170., 107.],
          [ 71., 109., 105.,  64.]]]], grad_fn=<ThnnConv2DBackward>)
tensor([[[[ 447.,  726.,  733.,  458.],
          [ 724., 1173., 1180.,  735.],
          [ 735., 1180., 1173.,  724.],
          [ 458.,  733.,  726.,  447.]]]], grad_fn=<ThnnConv2DBackward>)
tensor([[[[ 64.6875,  85.3125,  73.1250],
          [ 85.3125, 105.6250,  85.3125],
          [ 73.1250,  85.3125,  64.6875]]]])
tensor([[[[ 72.4375,  90.3125,  73.3750],
          [ 90.3125, 111.8750,  90.3125],
          [ 73.3750,  90.3125,  72.4375]]]])
tensor([[[[ 73.2500,  91.0625,  73.6875],
          [ 91.0625, 112.8750,  91.0625],
          [ 73.6875,  91.0625,  73.2500]]]])

[Figure: hand-drawn derivation of the gradients for three stacked kernels w1, w2 and w3]
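The same bookkeeping can be written with torch's own operators and compared against autograd. This is a sketch under assumptions, not the derivation from the figures: it relies on `F.conv_transpose2d` being the gradient of `F.conv2d` with respect to its input, and it computes each weight gradient by using the upstream gradient as a 4x4 "kernel" over the layer's input, a shortcut that only works because the batch size and channel count are both 1. A single shared bias tensor `b` and `x4.abs().mean()` (equivalent to L1Loss against zeros) are used for brevity.

```python
import torch
import torch.nn.functional as F

x1 = torch.tensor([[[[1., 2., 3., 4.],
                     [4., 3., 2., 1.],
                     [1., 2., 3., 4.],
                     [4., 3., 2., 1.]]]])
w1, w2, w3 = (torch.ones(1, 1, 3, 3, requires_grad=True) for _ in range(3))
b = torch.ones(1)  # shared bias value of 1, not trained in this sketch

# Forward pass through three convolution layers.
x2 = F.conv2d(x1, w1, b, padding=1)
x3 = F.conv2d(x2, w2, b, padding=1)
x4 = F.conv2d(x3, w3, b, padding=1)
loss = x4.abs().mean()  # same value as nn.L1Loss()(x4, zeros)
loss.backward()         # autograd reference: w1.grad, w2.grad, w3.grad

with torch.no_grad():
    # Gradient of the loss with respect to each feature map, back to front.
    g4 = torch.sign(x4) / x4.numel()
    g3 = F.conv_transpose2d(g4, w3, padding=1)
    g2 = F.conv_transpose2d(g3, w2, padding=1)
    # Weight gradient = layer input correlated with the gradient of its output
    # (valid in this form only for batch size 1 and a single channel).
    manual_w3 = F.conv2d(x3, g4, padding=1)
    manual_w2 = F.conv2d(x2, g3, padding=1)
    manual_w1 = F.conv2d(x1, g2, padding=1)

print(torch.allclose(manual_w1, w1.grad))
print(torch.allclose(manual_w2, w2.grad))
print(torch.allclose(manual_w3, w3.grad))
```

If the three comparisons agree, the earliest kernel's gradient really is obtained by passing the loss gradient back through w3 and then w2 before it meets the input x1.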
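From this we can draw the following conclusion:

We have thus derived the chain rule that determines how the parameters of a convolutional neural network are updated.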
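One way to write down the rule that the three experiments above follow is given here. The notation is an assumption introduced for this summary: layer $\ell$ maps $x^{(\ell)}$ to $x^{(\ell+1)}$ with kernel $w^{(\ell)}$, bias $b^{(\ell)}$ and padding $P$, $\delta^{(\ell)} = \partial L / \partial x^{(\ell)}$, any entry indexed outside a feature map counts as zero, and $n$ is the index of the final output, which has $N$ entries.

```latex
\begin{aligned}
x^{(\ell+1)}_{i,j} &= b^{(\ell)} + \sum_{a,c} w^{(\ell)}_{a,c}\, x^{(\ell)}_{i+a-P,\; j+c-P}
  && \text{(forward pass)} \\
\delta^{(n)}_{i,j} &= \frac{\operatorname{sign}\!\big(x^{(n)}_{i,j}\big)}{N}
  && \text{(L1 loss against zero)} \\
\frac{\partial L}{\partial w^{(\ell)}_{a,c}} &= \sum_{i,j} \delta^{(\ell+1)}_{i,j}\, x^{(\ell)}_{i+a-P,\; j+c-P}
  && \text{(kernel gradient)} \\
\frac{\partial L}{\partial b^{(\ell)}} &= \sum_{i,j} \delta^{(\ell+1)}_{i,j}
  && \text{(bias gradient)} \\
\delta^{(\ell)}_{p,q} &= \sum_{a,c} w^{(\ell)}_{a,c}\, \delta^{(\ell+1)}_{p-a+P,\; q-c+P}
  && \text{(pass the gradient back one layer)}
\end{aligned}
```

Applying the last line repeatedly, from the output back to layer $\ell+1$, and then the kernel-gradient line once, gives the gradient of any kernel, including the one in the first layer.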