[Note] Is a larger batch size always better?
2022-08-11 04:21:00 【Time.Xu】
No, a larger batch size is not always better.
It is tempting to think that a model always trains better with a larger batch size, for two reasons:
1. Each update sees more training data, so the descent direction is estimated more accurately and the training curve is smoother.
2. Training time goes down. For the same number of epochs, a larger batch size means fewer batches per epoch, so each epoch finishes faster, provided the hardware can process the larger batch in parallel.
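The second point is simple arithmetic, which a small sketch can make concrete. The dataset size of 50,000 and the helper name `batches_per_epoch` are assumptions chosen only for illustration; they do not come from the original note.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches (weight updates) in one epoch, counting a final partial batch."""
    return math.ceil(num_samples / batch_size)


# Hypothetical dataset size, used only to show the trend.
num_samples = 50_000
for batch_size in (1, 32, 256, 1024):
    print(f"batch_size={batch_size:5d} -> {batches_per_epoch(num_samples, batch_size)} batches per epoch")
```

Fewer batches per epoch also means fewer weight updates per epoch, which is one reason the speed-up does not automatically give a better model.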
However, a larger batch size has the following issues to be aware of:
1. Memory. Large batches may overflow system memory or GPU memory.
2. Generalization ability decreases. This is something I hadn't considered before. A batch size that is too large may hurt the accuracy of the network, because it reduces the randomness of gradient descent.
A smaller batch size produces more erratic, more random weight updates. This has two positive effects. First, it can help training "jump out" of a local minimum it has become stuck in. Second, it can help training settle into a "flatter" minimum, which usually indicates better generalization performance.
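The link between batch size and update randomness can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions: the per-sample gradients are synthetic numbers (a true gradient of 1.0 plus unit-variance noise), not gradients from a real network, and `minibatch_grad_std` is a hypothetical helper.

```python
import random
import statistics

random.seed(0)

# Synthetic per-sample gradients (an assumption for illustration only):
# a "true" gradient of 1.0 plus unit-variance noise for each sample.
per_sample_grads = [random.gauss(1.0, 1.0) for _ in range(10_000)]


def minibatch_grad_std(batch_size: int, trials: int = 2_000) -> float:
    """Standard deviation of the mini-batch gradient estimate over random batches."""
    estimates = [
        statistics.fmean(random.sample(per_sample_grads, batch_size))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


for batch_size in (1, 8, 64, 512):
    print(f"batch_size={batch_size:4d}  gradient std ~ {minibatch_grad_std(batch_size):.3f}")
```

The spread of the estimate shrinks roughly in proportion to one over the square root of the batch size, so small batches give noisier updates and large batches give nearly deterministic ones.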
How to select the batch size when training a neural network? - Zhihu (zhihu.com)
The link above (to be removed on request if it infringes) recommends:
- When there is enough computing power, choose a batch size of 32 or smaller.
- When computing power is limited, trade off efficiency against generalization, and still try to choose a smaller batch size.
- Late in training, if you want to squeeze out a little more performance (for example in the final stage of a paper experiment or a competition), a useful trick is to set the batch size to 1, that is, pure SGD, and slowly reduce the error.
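One way to read the last recommendation is as a batch-size schedule. The sketch below is only an illustration: the function name `batch_size_for_epoch`, the default of 32, and the choice of two final epochs are all assumptions, not values given by the linked answer.

```python
def batch_size_for_epoch(
    epoch: int,
    total_epochs: int,
    base_batch_size: int = 32,   # assumed default, following the "32 or smaller" advice
    fine_tune_epochs: int = 2,   # assumed number of final epochs trained with pure SGD
) -> int:
    """Return base_batch_size, except for the last fine_tune_epochs epochs, which use 1."""
    if epoch >= total_epochs - fine_tune_epochs:
        return 1
    return base_batch_size


schedule = [batch_size_for_epoch(epoch, total_epochs=10) for epoch in range(10)]
print(schedule)
```

In a real training loop the returned value would be used to rebuild the data loader at the start of each epoch; how long to stay at batch size 1 is something to tune per task.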