Deep Learning [Chapter 2]
2022-08-11 01:41:00 【sweetheart7-7】
General Guide to Machine Learning Tasks

Note: when the loss on the training data is large and increasing model complexity does not reduce it, the problem is most likely optimization rather than model capacity.
Common ways to deal with overfitting:
- Reduce model complexity: choose a simpler, smoother model
- Collect more training data
- Reduce the number of parameters, or share parameters
- Use fewer features
- Early stopping
- Regularization
- Dropout
To pick the model most likely to perform well on unseen testing data, hold out a validation set and use it to compare models; N-fold cross-validation is a common way to split the dataset for this.
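As a sketch of how N-fold cross-validation splits the data (a minimal NumPy version not tied to any framework; the function name `k_fold_split` is just illustrative):

```python
import numpy as np

def k_fold_split(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle once, then cut into k folds
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]                    # fold i is the held-out validation set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Example: 3-fold split of 9 samples.
splits = list(k_fold_split(9, 3))
```

Each candidate model is trained on k−1 folds and validated on the remaining one; the average validation loss over the k runs is then used to pick the model.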
What to Do When a Neural Network Cannot Be Trained
Optimization fails because…
Local minima and saddle points
At both kinds of point, the gradient is 0.

How do we judge the shape of the loss function around $\theta = \theta'$? It can be described by a Taylor series expansion.

At a critical point, the gradient is 0.
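Concretely, the second-order Taylor expansion of the loss around $\theta'$ is:

$$L(\theta) \approx L(\theta') + (\theta - \theta')^{T} g + \frac{1}{2}(\theta - \theta')^{T} H (\theta - \theta')$$

where $g$ is the gradient and $H$ the Hessian at $\theta'$. At a critical point $g = 0$, so the quadratic $H$ term determines the local shape.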

If $L(\theta) > L(\theta')$ for every other $\theta$ near $\theta'$, then $\theta'$ is a local minimum. But we cannot plug in every possible $v = \theta - \theta'$, so the condition is rephrased:
a Hessian matrix $H$ satisfying $v^T H v > 0$ for all $v$ is called positive definite.
A property of positive definite matrices: all eigenvalues are positive.

Example:

When the critical point is a saddle point, the Hessian can tell us which direction to update in.
Find an eigenvector whose eigenvalue is negative and move along that eigenvector's direction; going that way will reduce the loss.
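A toy NumPy sketch of this check: the loss $L(w) = w_1^2 - w_2^2$ and all names here are illustrative, not from the lecture.

```python
import numpy as np

def loss(w):
    return w[0]**2 - w[1]**2       # the origin is a critical point (gradient = 0)

# Hessian of the toy loss at the origin
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])

eigvals, eigvecs = np.linalg.eigh(H)
if np.all(eigvals > 0):
    kind = "local minimum"          # H positive definite
elif np.all(eigvals < 0):
    kind = "local maximum"
else:
    kind = "saddle point"           # eigenvalues of mixed sign

# Escape the saddle point: step along an eigenvector with a negative eigenvalue.
u = eigvecs[:, np.argmin(eigvals)]
theta = np.zeros(2)                 # stuck at the critical point
theta_new = theta + 0.1 * u         # moving along u lowers the loss
```

In practice the full Hessian is rarely computed for large networks, but the toy example shows why a negative eigenvalue always offers a descent direction.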




Each point represents one trained network.
The vertical axis is the loss when training stops.
The horizontal axis is the fraction of the Hessian's eigenvalues that are positive when training stops (the "minimum ratio").
In practice this ratio never gets close to 1, so in high-dimensional spaces most critical points are saddle points rather than local minima.
Batch and Momentum
Batch

Why use batches: each batch yields one parameter update, so the parameters are updated many times per epoch.


With parallel computation (e.g. on a GPU), a large batch size may actually finish one epoch faster.

But the noise that comes with a small batch size may give better optimization results.

A possible explanation of why small batches are better for training:
each batch corresponds to a slightly different loss function, so their gradients differ; when one batch's gradient vanishes, another batch's gradient can still move the parameters.

A small batch size is also better for testing:

Local minima also differ in quality: a local minimum on a flat "plain" is better, one in a sharp "canyon" is worse, and a large batch size tends to fall into the canyon-type minima.
Because a small batch size updates in a noisier, more random direction, it escapes sharp minima more easily.

Momentum

Plain gradient descent only moves in the direction opposite to the gradient at each update.

With momentum, each update moves along the sum of the current negative gradient and the momentum term (the direction of the previous step).

The momentum term is effectively a weighted sum of all previous movement directions.
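A minimal sketch of gradient descent with momentum on a toy quadratic (the hyperparameter values are arbitrary assumptions, not from the lecture):

```python
import numpy as np

def gd_momentum(grad, theta, lr=0.1, lam=0.9, steps=100):
    """Gradient descent with a momentum term m."""
    m = np.zeros_like(theta)
    for _ in range(steps):
        g = grad(theta)
        m = lam * m - lr * g   # m: decaying weighted sum of all past (negative) gradients
        theta = theta + m      # move by negative gradient plus previous movement
    return theta

# Toy loss L(w) = w^2 with gradient 2w; the minimum is at 0.
theta = gd_momentum(lambda w: 2.0 * w, np.array([5.0]))
```

Unrolling the loop shows that m at step t is lam^t-weighted over all earlier gradients, which is exactly the "sum of all previous directions" described above.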


Adapting the Learning Rate Automatically
When the loss stops decreasing, training is not necessarily stuck at a critical point — in practice it is hard to even get close to one.


When the learning rate is fixed, the two problems shown in the figure above can appear: oscillation, or progress that is normal at first and then extremely slow.
We therefore modify the gradient descent update so that the learning rate is small where the surface is steep and large where it is flat.
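The modified update gives each parameter its own denominator (writing $g_i^t$ for the gradient of parameter $\theta_i$ at step $t$):

$$\theta_i^{t+1} = \theta_i^t - \frac{\eta}{\sigma_i^t}\, g_i^t$$

where $\sigma_i^t$ summarizes that parameter's past gradients; Adagrad and RMSProp differ only in how $\sigma_i^t$ is computed.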


Adagrad
Intuitively: if the past gradients are large, σ is large, so the effective learning rate η/σ becomes small (and vice versa).
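In Adagrad, $\sigma$ is the root mean square of all past gradients:

$$\sigma_i^t = \sqrt{\frac{1}{t+1} \sum_{k=0}^{t} \left(g_i^k\right)^2}$$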

RMSProp
RMSProp introduces a weight α that controls how much the newly computed gradient contributes to σ relative to the past.
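One common form, with $0 < \alpha < 1$ trading off the old σ against the new gradient:

$$\sigma_i^t = \sqrt{\alpha \left(\sigma_i^{t-1}\right)^2 + (1-\alpha)\left(g_i^t\right)^2}$$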


Adam: RMSProp + Momentum

Learning rate scheduling — decay: make η a function of time, so that the learning rate η shrinks as training goes on.

Warm up (a "black-magic" trick): the learning rate first increases and then decreases over training.

Momentum adds inertia from the history of movement, while the RMS term σ moderates the step size and makes updates smoother.
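A minimal NumPy sketch of an Adam-style step combining both ideas (hyperparameters are the usual defaults; this is an illustration, not a reference implementation):

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum (m) plus RMSProp-style scaling (v)."""
    m = b1 * m + (1 - b1) * g        # momentum: moving average of gradients
    v = b2 * v + (1 - b2) * g**2     # RMS: moving average of squared gradients
    m_hat = m / (1 - b1**t)          # bias correction for the zero initialization
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize the toy loss L(w) = w^2 starting from w = 3.
theta, m, v = np.array([3.0]), np.zeros(1), np.zeros(1)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

Note how the numerator plays the momentum role and the denominator plays the σ role from the update rule above.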

Loss Functions




When there are only two classes, sigmoid is commonly used (in that case sigmoid and softmax are equivalent); with three or more classes, use softmax.
Minimizing cross-entropy is equivalent to maximizing likelihood.
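A quick NumPy check of the two-class equivalence mentioned above (the logit values are arbitrary examples):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# With two classes, softmax over logits (z1, z2) gives the same class-1
# probability as sigmoid applied to the logit difference z1 - z2.
z = np.array([2.0, -1.0])
p_softmax = softmax(z)[0]
p_sigmoid = sigmoid(z[0] - z[1])

# Cross-entropy loss when the true class is the first one
ce = -np.log(p_softmax)
```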

Using mean square error for classification may get stuck near a critical point, because its gradient can be very small even when the loss is still large; cross-entropy avoids this.
