Gradient accumulation in TensorFlow: accumulate gradients, then apply them in one update
2022-04-23 20:48:00 【NuerNuer】
My machine has a single 30-series graphics card with only 12 GB of video memory, so when running code I often cannot make batch_size as large as I would like. Gradient accumulation offers a way around this by effectively enlarging batch_size. This is easy to implement in PyTorch, but a little more involved in TensorFlow.
Here is the code, with explanations in the comments:
def train():
    ...
    ...
    # All trainable variables in the graph
    trainable_vars = tf.trainable_variables()
    # Select the variables to train (everything whose name does not contain 'VGG')
    vit_trainable_vars = [var for var in trainable_vars if 'VGG' not in var.name]  # both generate and vision_transformer #291
    print("************vars to train:", len(vit_trainable_vars))
    # In the graph, create a non-trainable all-zero variable with the same shape
    # as each variable to train; these hold the accumulated gradients
    accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in vit_trainable_vars]
    # Ops that reset the accumulated gradients to zero
    zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
    global_step = tf.Variable(1, trainable=False)
    # Define the optimization ops
    with tf.device('/gpu:0'):
        with tf.name_scope('train'):
            # train_step = tf.train.AdamOptimizer(learning_rate, 0.9, 0.999).minimize(loss, global_step=global_step)
            # Optimizer
            optimizer = tf.train.AdamOptimizer(learning_rate, 0.9, 0.999)
            # Compute the gradients; returns a list of (gradient, variable) pairs
            grads = optimizer.compute_gradients(loss, vit_trainable_vars)
            # Add the current gradients to the accumulators
            accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(grads)]
            # Update the variables using the accumulated gradients
            train_step = optimizer.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(grads)], global_step=global_step)
    ...
    iter = 0
    while True:
        ...
        ...
        iter += 1
        sess.run(accum_ops)  # runs every iteration, so two gradients are accumulated per update
        if iter % 2 == 0:
            ...
            sess.run(train_step, feed_dict={...})  # update the parameters once
            ...
            sess.run(zero_ops)  # reset the accumulated gradients to 0
        ...
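This computes the gradient twice, accumulates the results, and then applies them in a single update, which is roughly equivalent to doubling batch_size. Note that the accumulators hold the sum of the two gradients, not their average.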
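To make the "equivalent to a larger batch" claim concrete, here is a small sketch in plain Python (no TensorFlow). It assumes the loss is a mean over the batch, which the original code does not show, and it uses a hypothetical one-parameter model y = w * x; all names are illustrative. It compares the gradient of one batch of 4 samples with the sum of the gradients of two micro-batches of 2 samples each.

```python
# Toy check in plain Python: one-parameter model y = w * x with an MSE loss.
# Assumption: the loss is averaged over the batch (not confirmed by the post).

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Gradient of the full batch of 4 samples
full_grad = mse_grad(w, xs, ys)

# Two micro-batches of 2 samples each, accumulated by summation
accum = 0.0
for start in (0, 2):
    accum += mse_grad(w, xs[start:start + 2], ys[start:start + 2])

# The accumulated sum is twice the full-batch gradient; dividing by the
# number of accumulation steps recovers the full-batch gradient.
averaged = accum / 2
print(full_grad, accum, averaged)
```

Under that assumption, summing without dividing gives a gradient scaled by the number of accumulation steps. Adam's update is largely insensitive to a constant scaling of the gradient, so this may matter little here, but with plain SGD it would act like a larger learning rate unless the sum is divided by the number of steps.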
One thing to note: if we do not specify which variables to save, the newly created accumulator Variables are saved as well, which makes the saved model larger. Only the parameters of the original model should be saved. For example, this is what I use in practice:
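# `var` is the list of candidate variables; its definition is not shown here
# (presumably something like tf.global_variables())
var_to_save = [val for val in var if 'Adam' not in val.name and 'Variable_' not in val.name]
saver = tf.train.Saver(var_to_save, max_to_keep=None)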
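The name-based filter can be checked without TensorFlow. The sketch below applies the same rule to a list of hypothetical variable names (the real names depend on the graph). It relies on the TF 1.x convention that variables created without a name are called "Variable", "Variable_1", "Variable_2", and so on, and that Adam slot variables carry an "Adam" suffix.

```python
# Hypothetical variable names, for illustration only.
names = [
    "generator/conv1/kernel:0",         # model weight
    "generator/conv1/kernel/Adam:0",    # Adam first-moment slot
    "generator/conv1/kernel/Adam_1:0",  # Adam second-moment slot
    "Variable:0",                       # first unnamed tf.Variable
    "Variable_1:0",                     # second unnamed tf.Variable
    "Variable_2:0",                     # third unnamed tf.Variable
]

def keep(name):
    """Same rule as the filter above: drop Adam slots and 'Variable_*' names."""
    return 'Adam' not in name and 'Variable_' not in name

kept = [n for n in names if keep(n)]
print(kept)
```

In this sketch a variable named exactly "Variable:0" passes the filter, because its name has no underscore. If an accumulator happens to be the first unnamed variable in the graph, it would still be saved. Creating the accumulators with an explicit name (for example a hypothetical "accum_" prefix) and filtering on that prefix may be more robust.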
Copyright notice
This article was written by [NuerNuer]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204210545522575.html