Deep Learning [Chapter 2]
2022-08-11 01:41:00 【sweetheart7-7】
A practical guide to machine learning tasks
Note: if the loss on the training data is large and increasing the model complexity does not reduce it, the problem is most likely with optimization.
Several common ways to deal with overfitting:
- Reduce model complexity: choose a simpler, smoother model
- Add more training data
- Use fewer parameters, or share parameters
- Use fewer features
- Early stopping
- Regularization
- Dropout
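Of the remedies listed above, early stopping is the easiest to sketch in a few lines. The following is a minimal illustration, assuming the per-epoch validation losses are already available as a list; the function name `early_stopping_epoch` and the `patience` parameter are hypothetical choices for this example and do not come from the original notes.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses.

    Training is treated as stopped once the validation loss has failed to
    improve for `patience` consecutive epochs (an assumed stopping rule).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop here, keep the best checkpoint
    return best_epoch, len(val_losses) - 1  # rule never triggered: ran to the end


history = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
best, stopped = early_stopping_epoch(history, patience=3)
print(best, stopped)
```

In practice one would restore the parameters saved at `best_epoch` rather than the ones from the epoch where training stopped.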
How to pick the model that is most likely to perform well on unseen testing data
Set aside a validation set and use it to choose the model. N-fold cross-validation is commonly used to split the dataset and perform the validation.
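As a rough sketch of N-fold cross-validation, the snippet below splits a small synthetic dataset into folds and picks a polynomial degree by average validation error. The data, the candidate degrees, and the helper names `k_fold_indices` and `cross_validated_mse` are all assumptions made for illustration.

```python
import numpy as np


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]


def cross_validated_mse(x, y, degree, n_folds=3):
    """Average validation MSE of a polynomial fit of the given degree."""
    errors = []
    for train_idx, val_idx in k_fold_indices(len(x), n_folds):
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(errors))


# Synthetic data: a noisy straight line (assumed for the example).
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 30)
y = 2.0 * x + 0.5 + 0.1 * rng.standard_normal(30)

candidate_degrees = [1, 3, 5]
scores = {d: cross_validated_mse(x, y, d) for d in candidate_degrees}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

The model is chosen by its validation score only; the testing data is never touched during selection.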
What to do when a neural network fails to train
Optimization fails because…
Local minima and saddle points
The gradient is 0
To determine the shape of the loss function around $\theta = \theta'$, approximate it with a Taylor series expansion.
At a critical point, the gradient is 0.
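For reference, the second-order Taylor expansion that this argument relies on can be written as follows. The symbols $g$ (the gradient at $\theta'$), $H$ (the Hessian at $\theta'$) and $v = \theta - \theta'$ are notation assumed here.

```latex
L(\theta) \approx L(\theta') + (\theta - \theta')^{T} g
          + \frac{1}{2} (\theta - \theta')^{T} H (\theta - \theta')

% At a critical point g = 0, so with v = \theta - \theta':
L(\theta) \approx L(\theta') + \frac{1}{2} v^{T} H v
```

Near a critical point, the sign of $v^T H v$ therefore decides whether nearby points lie above or below $L(\theta')$.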
If $L(\theta)$ is greater than $L(\theta')$ for every other $\theta$ near $\theta'$, then $\theta'$ is a local minimum…
But we cannot try every possible $v$, so the condition is restated as follows:
A matrix $H$ (the Hessian) that satisfies $v^T H v > 0$ for every nonzero $v$ is called positive definite.
Property of a positive definite matrix: all of its eigenvalues are positive.
Example:
When the critical point is a saddle point, the Hessian can tell us which direction to update in.
Find an eigenvector that corresponds to a negative eigenvalue; moving along that direction decreases the loss.
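The eigenvalue test and the escape direction can be tried out numerically. The sketch below uses the toy loss $L(w_1, w_2) = (1 - w_1 w_2)^2$, chosen here as an assumption for illustration, whose gradient is zero at the origin. The helper names are hypothetical.

```python
import numpy as np


def loss(theta):
    w1, w2 = theta
    return (1.0 - w1 * w2) ** 2


def hessian(theta):
    # Analytic second derivatives of (1 - w1*w2)^2.
    w1, w2 = theta
    off_diag = 4.0 * w1 * w2 - 2.0
    return np.array([[2.0 * w2 ** 2, off_diag],
                     [off_diag, 2.0 * w1 ** 2]])


def classify_critical_point(H, tol=1e-8):
    """Classify a critical point from the eigenvalues of its Hessian.

    Returns (kind, direction); direction is an eigenvector with a negative
    eigenvalue when the point is a saddle, otherwise None.
    """
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    if np.all(eigvals > tol):
        return "local minimum", None
    if np.all(eigvals < -tol):
        return "local maximum", None
    if eigvals[0] < -tol and eigvals[-1] > tol:
        return "saddle point", eigvecs[:, 0]
    return "inconclusive", None  # some eigenvalues are (numerically) zero


theta_c = np.array([0.0, 0.0])  # the gradient of the toy loss is zero here
kind, escape_dir = classify_critical_point(hessian(theta_c))
print(kind, loss(theta_c), loss(theta_c + 0.1 * escape_dir))
```

For real networks the full Hessian is usually too expensive to compute, so this check is mainly useful for understanding rather than as a training recipe.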
Each point represents one network.
The vertical axis is the loss at the point where training stopped.
The horizontal axis is the ratio of positive eigenvalues to all eigenvalues at the point where training stopped.
So in high-dimensional spaces, most critical points are saddle points rather than local minima.
Batch and momentum
Batch
Why use batches: the parameters are updated once per batch.
With parallel computation, a larger batch size may finish one epoch faster.
However, the noise that comes with a small batch size may actually help optimization.
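To make "one update per batch" concrete, here is a small sketch that trains a one-parameter linear model for a single epoch with two different batch sizes and counts the updates. The model, the data, and the function names are assumptions made for this example.

```python
import numpy as np


def iterate_minibatches(n_samples, batch_size, rng):
    """Shuffle the sample indices and yield them one batch at a time."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


def train_one_epoch(w, x, y, batch_size, lr, rng):
    """One epoch of mini-batch gradient descent on mean((w*x - y)^2)."""
    n_updates = 0
    for idx in iterate_minibatches(len(x), batch_size, rng):
        residual = w * x[idx] - y[idx]
        grad = 2.0 * np.mean(residual * x[idx])  # gradient on this batch only
        w = w - lr * grad
        n_updates += 1
    return w, n_updates


rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = 3.0 * x  # the true parameter is 3.0

w_small, updates_small = train_one_epoch(0.0, x, y, batch_size=8, lr=0.1, rng=rng)
w_full, updates_full = train_one_epoch(0.0, x, y, batch_size=64, lr=0.1, rng=rng)
print(updates_small, w_small)
print(updates_full, w_full)
```

With 64 samples, a batch size of 8 gives eight noisy updates per epoch, while a batch size of 64 gives a single update computed from all the data.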
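A possible explanation for why small batches are better for training:
The loss function differs slightly from batch to batch, so the corresponding gradients differ as well. An update that is stuck on one batch's loss may still be able to move on the next batch.
Small batch sizes are also better for testing:
Local minima are not all equally good. A local minimum on a plain (a flat minimum) is better and one in a canyon (a sharp minimum) is worse, and large batch sizes tend to end up in canyon-like local minima.
Because the update direction with a small batch size is noisy, it is easier to jump out of sharp minima.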
Momentum
Plain gradient descent moves only in the direction opposite to the gradient at each update.
With momentum, each update moves along the sum of two terms: the negative of the current gradient, and the momentum (the movement of the previous step).
The momentum itself can be written as a weighted sum of all previous gradients.
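The update rule described above can be sketched as follows. The quadratic loss, the learning rate `lr`, and the momentum coefficient `lam` are assumptions chosen for illustration.

```python
import numpy as np


def gradient(theta):
    # Gradient of the assumed loss L(theta) = 0.5 * (theta_0^2 + 10 * theta_1^2).
    return np.array([1.0, 10.0]) * theta


def gd_with_momentum(theta0, lr=0.05, lam=0.9, n_steps=300):
    theta = np.array(theta0, dtype=float)
    movement = np.zeros_like(theta)  # no previous movement before the first step
    for _ in range(n_steps):
        g = gradient(theta)
        # New movement = scaled previous movement minus the scaled current gradient.
        movement = lam * movement - lr * g
        theta = theta + movement
    return theta


theta_final = gd_with_momentum([1.0, 1.0])
print(theta_final)
```

Unrolling the recursion shows that `movement` is a weighted sum of all past gradients, with older gradients discounted by powers of `lam`.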
Adaptive learning rate
When the loss stops decreasing, training is not necessarily stuck at a critical point (it is actually hard to reach a critical point).
When the learning rate is fixed, the two problems shown in the figure above can appear: oscillation, and progress that starts normally and then becomes very slow.
We need to modify the gradient descent formula so that the learning rate is small where the loss surface is steep and large where it is gentle.
Adagrad
In effect, if the gradient is large then σ is large, and when σ is large the learning rate becomes small.
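A minimal sketch of one Adagrad step, assuming the formulation in which σ is the root mean square of all gradients seen so far for each parameter. The function name, the toy gradients, and the small constant `eps` are assumptions for this example.

```python
import numpy as np


def adagrad_step(theta, grad, sum_sq, step_count, lr=0.5, eps=1e-8):
    """One Adagrad update; returns the new state and the per-parameter rate."""
    sum_sq = sum_sq + grad ** 2
    step_count += 1
    sigma = np.sqrt(sum_sq / step_count)  # RMS of all gradients so far
    effective_lr = lr / (sigma + eps)     # large sigma -> small learning rate
    theta = theta - effective_lr * grad
    return theta, sum_sq, step_count, effective_lr


theta = np.array([1.0, 1.0])
grad = np.array([1.0, 100.0]) * theta  # second coordinate is 100x steeper
theta_new, sum_sq, step_count, effective_lr = adagrad_step(
    theta, grad, np.zeros(2), 0
)
print(effective_lr, theta_new)
```

The steep coordinate gets a much smaller effective learning rate, so both coordinates end up moving by a comparable amount.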
RMSProp
RMSProp introduces α to control how much weight the newly computed gradient gets relative to the past gradients.
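The effect of α can be seen by tracking σ over a sequence of gradients. This sketch assumes the common formulation $\sigma_t = \sqrt{\alpha \sigma_{t-1}^2 + (1 - \alpha) g_t^2}$, in which the new gradient receives weight $1 - \alpha$, and compares it with a running root mean square over all past gradients. The gradient sequence and function names are made up for illustration.

```python
import numpy as np


def rmsprop_sigmas(grads, alpha=0.5):
    """Sigma under the assumed RMSProp rule (exponential moving average)."""
    sigmas, sigma_sq = [], None
    for g in grads:
        if sigma_sq is None:
            sigma_sq = g ** 2  # first step: sigma is just |g|
        else:
            sigma_sq = alpha * sigma_sq + (1.0 - alpha) * g ** 2
        sigmas.append(float(np.sqrt(sigma_sq)))
    return sigmas


def running_rms_sigmas(grads):
    """Sigma as the RMS of all gradients so far (every gradient weighted equally)."""
    sigmas, total = [], 0.0
    for t, g in enumerate(grads):
        total += g ** 2
        sigmas.append(float(np.sqrt(total / (t + 1))))
    return sigmas


# Gentle slope for 50 steps, then a sudden steep region.
grads = [0.1] * 50 + [10.0] * 5
rms = rmsprop_sigmas(grads, alpha=0.5)
avg = running_rms_sigmas(grads)
print(rms[-1], avg[-1])
```

Because old gradients are discounted, the RMSProp σ reacts within a few steps when the slope changes, whereas the equally weighted average adapts slowly.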
Adam: RMSProp + Momentum
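As a sketch of how the two ideas combine, the step below keeps a momentum-style average of gradients (`m`) and an RMSProp-style average of squared gradients (`v`), following the commonly published Adam update with bias correction. The toy loss and the hyperparameter values are assumptions for this example.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad        # momentum part
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # RMSProp part
    m_hat = m / (1.0 - beta1 ** t)              # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


def grad_fn(theta):
    # Gradient of the assumed loss 0.5 * (theta_0^2 + 100 * theta_1^2).
    return np.array([1.0, 100.0]) * theta


def loss_fn(theta):
    return 0.5 * float(np.sum(np.array([1.0, 100.0]) * theta ** 2))


theta = np.array([1.0, 1.0])
initial_loss = loss_fn(theta)
m, v = np.zeros(2), np.zeros(2)
first_step = None
for t in range(1, 51):
    theta, m, v = adam_step(theta, grad_fn(theta), m, v, t)
    if t == 1:
        first_step = theta.copy()
print(first_step, theta, loss_fn(theta))
```

On the first step both coordinates move by about `lr`, even though their gradients differ by a factor of 100.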
Introducing Adagrad
Make η a function of time: as training goes on, η (the learning rate) becomes smaller and smaller.
Warm up (a black-magic trick)
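A time-dependent learning rate with a warm up phase can be sketched as a simple function of the step index. The specific shapes used here (linear warm up followed by inverse-square-root decay), the function name, and the default values are assumptions for illustration only.

```python
def learning_rate(step, base_lr=0.1, warmup_steps=100):
    """Learning rate at a given step (step starts at 0)."""
    if step < warmup_steps:
        # Warm up: grow linearly from a small value up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay: shrink slowly as training goes on.
    return base_lr * (warmup_steps / (step + 1)) ** 0.5


schedule = [learning_rate(s) for s in range(300)]
print(schedule[0], schedule[99], schedule[299])
```

The rate rises during the first `warmup_steps` updates, peaks at `base_lr`, and then decreases for the rest of training.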
Momentum adds inertia from the history of past movement; RMS moderates the step size so that the updates become smoother.
Loss function
With only two classes, sigmoid is generally used (in that case sigmoid and softmax are equivalent); with more than two classes, softmax is used.
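The two-class equivalence can be checked numerically: the softmax probability of the first class equals the sigmoid of the difference between the two logits. The logit values below are arbitrary and chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


logits = np.array([1.3, -0.4])
p_softmax = softmax(logits)[0]
p_sigmoid = sigmoid(logits[0] - logits[1])
print(p_softmax, p_sigmoid)
```

Both expressions simplify to $1 / (1 + e^{-(z_1 - z_2)})$, which is why only one output is needed in the two-class case.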
Minimizing cross-entropy is equivalent to maximizing likelihood.
Using mean square error for a classification problem may leave training stuck at a critical point.
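One way to see the problem with mean square error is to compare the gradients of the two losses with respect to the logits when the prediction is confidently wrong. The sketch below assumes a softmax output layer and a two-class example with made-up logits.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def grad_mse_wrt_logits(z, y):
    """Gradient of sum((softmax(z) - y)^2) with respect to the logits z."""
    p = softmax(z)
    jacobian = np.diag(p) - np.outer(p, p)  # d p_i / d z_j
    return jacobian.T @ (2.0 * (p - y))


def grad_ce_wrt_logits(z, y):
    """Gradient of cross-entropy(softmax(z), y) with respect to the logits z."""
    return softmax(z) - y


z = np.array([-10.0, 10.0])  # the model strongly prefers class 1
y = np.array([1.0, 0.0])     # but the true class is class 0
g_mse = grad_mse_wrt_logits(z, y)
g_ce = grad_ce_wrt_logits(z, y)
print(np.linalg.norm(g_mse), np.linalg.norm(g_ce))
```

Here the loss is large under both criteria, but the mean square error gradient is almost zero, so gradient descent barely moves, while the cross-entropy gradient is still large.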