[Note] Is a bigger BatchSize always better?
2022-08-11 04:21:00 【Time.Xu】
A larger batch size is not always better.
It is common to assume that a larger batch size gives better training results, for the following reasons:
1. Each update uses more samples, so the estimated gradient (the descent direction) is more accurate and the training curve is smoother.
2. Shorter training time. For the same number of epochs, a larger batch size means fewer batches (iterations) per epoch, so training runs faster, provided the hardware can process each larger batch in parallel.
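To make the two points above concrete, here is a small sketch in plain Python (the post has no code of its own, so the language is my choice). It assumes a hypothetical set of per-sample gradients for a single scalar weight, drawn from a normal distribution, and shows two things: the number of updates per epoch is ceil(N / batch size), and the spread of the mini-batch gradient estimate shrinks as the batch grows. The function names are illustrative and do not come from any library.

```python
import math
import random
import statistics


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of mini-batches (parameter updates) needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def minibatch_gradient_spread(per_sample_grads, batch_size, trials=2000, seed=0):
    """Standard deviation of the mini-batch gradient estimate across random batches.

    Each trial draws `batch_size` per-sample gradients without replacement and
    averages them, which is exactly what a mini-batch gradient is.
    """
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.sample(per_sample_grads, batch_size))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Hypothetical per-sample gradients of one weight: the full-batch gradient
    # is about 1.0, and each individual sample adds Gaussian noise around it.
    grads = [1.0 + data_rng.gauss(0.0, 2.0) for _ in range(10_000)]
    for b in (1, 8, 32, 256):
        print(f"batch={b:4d}  steps/epoch={steps_per_epoch(len(grads), b):6d}  "
              f"grad std={minibatch_gradient_spread(grads, b):.3f}")
```

Under these assumptions the spread of the gradient estimate falls roughly as 1/sqrt(batch size) (quadrupling the batch roughly halves it), while the number of updates per epoch falls as 1/batch size. The same reduced noise is what the generalization argument below is about.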
However, a larger batch size also comes with issues to be aware of:
1. Memory. A large batch may exhaust host memory or GPU memory (out-of-memory errors).
2. Weaker generalization. This is something I had not considered before. A batch size that is too large can hurt the network's accuracy, because it reduces the randomness of gradient descent.
A smaller batch size produces noisier, more random weight updates. This has two benefits. First, it can help training "jump out" of local minima where it would otherwise get stuck; second, it tends to let training settle into a "flatter" minimum, which usually indicates better generalization.
How to choose the batch size when training a neural network? - Zhihu (zhihu.com)
The answer at the link above (to be removed on request in case of infringement) suggests:
- When computing power is sufficient, choose a batch size of 32 or less.
- When computing power is limited, trade off efficiency against generalization, and lean toward a smaller batch size.
- Near the end of training, if you want to squeeze out a bit more performance (for example, the final stretch of paper experiments or a competition), a useful trick is to set the batch size to 1, i.e., pure SGD, and slowly drive the error down.
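The batch-size-1 trick from the last tip can be sketched as follows. This is a minimal illustration under stated assumptions, not the linked answer's method: a toy 1-D linear regression with mean-squared-error loss, trained by mini-batch SGD in plain Python. The data generator, the learning rates, the epoch counts, and the choice to lower the learning rate for the batch-size-1 phase are all hypothetical choices of mine.

```python
import random


def make_data(n=1000, w_true=3.0, b_true=-1.0, noise=0.1, seed=0):
    """Hypothetical toy data: y = w_true * x + b_true + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [w_true * x + b_true + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def train(xs, ys, batch_size, lr=0.1, epochs=20, seed=0, init=(0.0, 0.0)):
    """Mini-batch SGD on y ~ w*x + b with MSE loss.

    batch_size=1 is pure SGD; batch_size=len(xs) is full-batch gradient descent.
    """
    rng = random.Random(seed)
    w, b = init
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new sample order every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]  # last batch may be smaller
            # Gradient of mean((w*x + b - y)^2) over the batch.
            gw = gb = 0.0
            for i in batch:
                err = w * xs[i] + b - ys[i]
                gw += 2.0 * err * xs[i]
                gb += 2.0 * err
            w -= lr * gw / len(batch)
            b -= lr * gb / len(batch)
    return w, b


def mse(xs, ys, w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs, ys = make_data()
    # Main phase: moderate batch size (assumed value).
    w, b = train(xs, ys, batch_size=32, lr=0.1, epochs=20)
    print("after batch 32:", mse(xs, ys, w, b))
    # Final phase: pure SGD (batch size 1) starting from the trained weights,
    # with a smaller learning rate (an assumption, not from the source).
    w, b = train(xs, ys, batch_size=1, lr=0.01, epochs=2, init=(w, b))
    print("after batch-1 fine-tuning:", mse(xs, ys, w, b))
```

On a convex toy problem like this one the final phase mostly just jitters around the optimum; any benefit from the trick in a real network is the linked answer's claim, not something this sketch demonstrates.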