PyTorch 20. Pytorch tips (continuously updated)
2022-04-23 07:29:00 【DCGJ666】
View the output details of each layer of the model
from torchsummary import summary
summary(your_model, input_size=(channels, H, W))
Set input_size according to the input size of your own network.
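For example, a minimal sketch using a torchvision ResNet-18 on the CPU (the model and input size here are arbitrary and only for illustration):
from torchvision import models
from torchsummary import summary

model = models.resnet18()                               # arbitrary example model
summary(model, input_size=(3, 224, 224), device='cpu')  # prints each layer's output shape and parameter count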
Gradient clipping
import torch.nn as nn
outputs = model(data)
loss = loss_fn(outputs, target)
optimizer.zero_grad()
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)
optimizer.step()
Parameters of nn.utils.clip_grad_norm_:
- parameters: an iterable of Tensors whose gradients will be normalized
- max_norm: the maximum allowed norm of the gradients
- norm_type: the type of norm to use; the default is 2
Expanding the dimensions of a single image
Using view()
import cv2
import torch
image = cv2.imread(img_path)
image = torch.tensor(image)
print(image.size())
img = image.view(1, *image.size())
print(img.size())
Using np.newaxis
import cv2
import numpy as np
image = cv2.imread(img_path)
print(image.shape)
img = image[np.newaxis, :, :, :]
print(img.shape)
Using unsqueeze()
import cv2
import torch
image = cv2.imread(img_path)
image = torch.tensor(image)
print(image.size())
img = image.unsqueeze(dim=0)
print(img.size())
One-hot encoding
In PyTorch, the cross-entropy loss function handles the labels internally, so you do not need to convert them to one-hot manually; when using MSE, however, you do need to convert the labels to one-hot yourself.
import torch.nn.functional as F
import torch
tensor = torch.arange(0, 5) % 3  # tensor([0, 1, 2, 0, 1])
one_hot = F.one_hot(tensor)
# Output :
# tensor([[1, 0, 0],
# [0, 1, 0],
# [0, 0, 1],
# [1, 0, 0],
# [0, 1, 0]])
F.one_hot infers the number of classes from the data and generates the corresponding one-hot encoding; we can also specify the number of classes explicitly:
tensor = torch.arange(0, 5) % 3 # tensor([0, 1, 2, 0, 1])
one_hot = F.one_hot(tensor, num_classes=5)
# Output :
# tensor([[1, 0, 0, 0, 0],
# [0, 1, 0, 0, 0],
# [0, 0, 1, 0, 0],
# [1, 0, 0, 0, 0],
# [0, 1, 0, 0, 0]])
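A small sketch of the point above (the logits and labels below are made up): CrossEntropyLoss takes integer class labels directly, while MSELoss needs one-hot float targets.
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 3)                 # arbitrary predictions: 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 0])        # integer class labels

ce = nn.CrossEntropyLoss()(logits, labels)                             # class indices are fine as-is
mse = nn.MSELoss()(logits, F.one_hot(labels, num_classes=3).float())   # MSE needs one-hot float targets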
Preventing GPU memory blow-up during model validation
There is no need to compute gradients when validating the model, so turn autograd off. This speeds things up and saves GPU memory; if you leave it on, the GPU memory may blow up.
with torch.no_grad():
    # code that runs the model for prediction
pass
As for why torch.cuda.empty_cache() is used: during PyTorch training, more and more unused temporary variables may accumulate, leading to out-of-memory errors.
PyTorch's caching allocator pre-allocates a certain amount of GPU memory; even if the tensors do not actually use all of it, that memory cannot be used by other applications. The allocation is first triggered by a CUDA memory access.
torch.cuda.empty_cache() releases the unused cached memory currently held by the caching allocator so that it can be used by other GPU applications. Note that this call does not release the memory occupied by live tensors.
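A small sketch illustrating the difference (requires a CUDA device; the tensor sizes are arbitrary):
import torch

x = torch.randn(1024, 1024, device='cuda')    # a live tensor: its memory is NOT released below
y = torch.randn(4096, 4096, device='cuda')
del y                                          # y's memory returns to the caching allocator, not to the driver
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())

torch.cuda.empty_cache()                       # hand the unused cached blocks back to the driver
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())  # reserved shrinks; allocated (x) stays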
Monitoring tools
sudo apt-get install htop      # install htop for monitoring CPU/memory
htop -d 1                      # -d sets the update delay in tenths of a second (here 0.1 s)
watch -n 0.1 nvidia-smi        # monitor GPU memory, refreshing every 0.1 s
Pytorch-Memory-Utils can be used to monitor GPU memory usage.
GPU memory usage
GPU memory usage = model parameters + intermediate variables produced during computation
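As a rough illustration of the first term, the parameter part can be estimated directly (a sketch; the model below is arbitrary and float32 storage, 4 bytes per element, is assumed):
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))   # arbitrary example model
num_params = sum(p.numel() for p in model.parameters())
print(num_params, 'parameters,', num_params * 4 / 1024 ** 2, 'MB in float32')  # 4 bytes per float32 element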
Ways to reduce GPU memory usage:
- Use in-place operations where possible (a small sketch of this and the next point follows this list)
- Use del to free intermediate variables as soon as they are no longer needed
- Reduce batch_size, avoid fully connected layers, and use downsampling more
- Each iteration introduces some temporary variables, which can make training slower and slower (roughly linearly); calling torch.cuda.empty_cache() periodically solves this problem
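A minimal sketch of the first two points (the layer sizes and the batch are arbitrary):
import torch
import torch.nn as nn

layer = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(inplace=True),    # in-place activation: overwrites the conv output instead of allocating a new tensor
)

x = torch.randn(8, 3, 224, 224)   # arbitrary batch
features = layer(x)
del x                             # drop the reference to the input as soon as it is no longer needed
loss = features.mean()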
Freezing the parameters of some layers
When loading a pre-trained model, we sometimes want to freeze the first few layers so that their parameters do not change during training.
First we need to know the name of each layer, which can be printed with the following code:
net = Network()  # get the custom network structure
for name, value in net.named_parameters():
    print('name: {0},\t grad: {1}'.format(name, value.requires_grad))
Suppose the first few layers look like this:
name: cnn.VGG_16.convolution1_1.weight, grad: True
name: cnn.VGG_16.convolution1_1.bias, grad: True
name: cnn.VGG_16.convolution1_2.weight, grad: True
name: cnn.VGG_16.convolution1_2.bias, grad: True
name: cnn.VGG_16.convolution2_1.weight, grad: True
name: cnn.VGG_16.convolution2_1.bias, grad: True
name: cnn.VGG_16.convolution2_2.weight, grad: True
name: cnn.VGG_16.convolution2_2.bias, grad: True
The trailing True means that the layer's parameters are trainable. Next, define a list of the parameters to freeze:
no_grad = [
    'cnn.VGG_16.convolution1_1.weight',
    'cnn.VGG_16.convolution1_1.bias',
    'cnn.VGG_16.convolution1_2.weight',
    'cnn.VGG_16.convolution1_2.bias'
]
Freeze them as follows:
net = Net.CTPN()  # get the network structure
for name, value in net.named_parameters():
    if name in no_grad:
        value.requires_grad = False
    else:
        value.requires_grad = True
Finally, when defining the optimizer, only pass in the parameters whose requires_grad is True, so that only those layers are updated:
optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=0.01)
Explicitly specify model.train() and model.eval()
Our models often contain submodules whose behavior differs between training and testing, such as the drop probability in dropout and the γ and β (and running statistics) in Batch Normalization. We therefore need to explicitly mark which phase we are in; in PyTorch this is done with model.train() and model.eval(). (Note that BN's running_mean and similar buffers are not nn.Parameter, so they cannot be frozen with requires_grad; you have to call .eval() on the BN module.)
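A minimal sketch of the usual pattern (the model and inputs below are arbitrary placeholders):
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.BatchNorm1d(32), nn.Dropout(0.5), nn.Linear(32, 2))

model.train()                          # training phase: dropout is active, BN updates its running statistics
out = model(torch.randn(16, 10))

model.eval()                           # evaluation phase: dropout is off, BN uses its stored running statistics
with torch.no_grad():
    out = model(torch.randn(16, 10))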
Using different learning rates for different layers
Take the following model as an example:
net = Network()  # get the custom network structure
for name, value in net.named_parameters():
    print('name: {}'.format(name))
# Output :
# name: cnn.VGG_16.convolution1_1.weight
# name: cnn.VGG_16.convolution1_1.bias
# name: cnn.VGG_16.convolution1_2.weight
# name: cnn.VGG_16.convolution1_2.bias
# name: cnn.VGG_16.convolution2_1.weight
# name: cnn.VGG_16.convolution2_1.bias
# name: cnn.VGG_16.convolution2_2.weight
# name: cnn.VGG_16.convolution2_2.bias
To set different learning rates for convolution1 and convolution2, first separate their parameters into two lists:
conv1_params = []
conv2_params = []
for name, params in net.named_parameters():
    if "convolution1" in name:
        conv1_params += [params]
    else:
        conv2_params += [params]

# Then do the following in the optimizer:
optimizer = optim.Adam(
    [
        {"params": conv1_params, 'lr': 0.01},
        {"params": conv2_params, 'lr': 0.001},
    ],
    weight_decay=1e-3
)
We split the model's parameters into two parts and put them in a list; each part corresponds to one dictionary above, and each dictionary sets its own learning rate. Options shared by both parts, such as weight_decay above, are placed outside the list as global settings.
You can also set a global learning rate outside the list: if a per-group learning rate is set inside a dictionary it is used, otherwise the global learning rate outside the list applies.
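Continuing the example above, a small sketch of that second approach:
# conv1_params keeps its own lr (0.01); conv2_params has no per-group lr, so it falls back to the global lr (0.001)
optimizer = optim.Adam(
    [
        {"params": conv1_params, "lr": 0.01},
        {"params": conv2_params},
    ],
    lr=0.001,
    weight_decay=1e-3,
)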
Using retain_graph
When back-propagating a loss in PyTorch, we call out.backward(); back-propagating the loss produces the gradients of the loss with respect to the learnable parameters. The signature of .backward() is:
backward(gradient=None, retain_graph=None, create_graph=False)
The parameter of interest here is retain_graph: if it is False or None, the computation graph that was built is released after back-propagation; if it is True, the graph is kept.
But the gradients have already been computed, so why keep the graph? Here is an example: in a generative adversarial network (GAN) you need to train one module, say the generator, and then the discriminator, so the whole network has two or more losses:
G_loss = ...
D_loss = ...

opt.zero_grad()                      # zero all gradients
D_loss.backward(retain_graph=True)   # keep the graph structure for later use
opt.step()                           # update parameters; only D is updated, since only D's gradients are non-zero

opt.zero_grad()                      # zero all gradients
G_loss.backward(retain_graph=False)  # do not keep the graph; it can now be released
                                     # the graph is rebuilt by the forward pass of the next iteration
opt.step()                           # update parameters; only G is updated, since only G's gradients are non-zero
In this way, multiple losses in the network can be trained step by step.
Copyright notice
This article was written by [DCGJ666]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204230611343622.html