Postgraduate Work Weekly
2022-08-09 16:50:00 【wangyunpeng33】
Learning goals (week 2):
Professor Andrew Ng's deep learning course "Neural Networks and Deep Learning"
- Neural network basics
"Machine Learning in Action" (《机器学习实战》)
- Get a preliminary understanding of the top ten machine learning algorithms, learn to call scikit-learn and its contrib libraries, and try to reproduce the algorithms once they feel familiar.
- Building on the video lessons and the earlier foundation, the plan focuses on supervised learning algorithms: logistic regression, decision trees, naive Bayes, and support vector machines, plus the dimensionality reduction method PCA.
Learning content:
Video course content
- Binary Classification
- Logistic Regression
- Gradient Descent
- Vectorization
- Vectorizing Logistic Regression's Gradient Computation
- Building a cat-identification neural network
First six chapters of "Machine Learning in Action"
Get familiar with the algorithm principles and run the sample code to aid understanding
Study time:
- 5.15-5.21
Learning output:
- Completed the Week 2 after-class exercises and the programming assignment on neural network basics
- 1 CSDN blog post
Binary classification
In a binary classification problem, the goal is to learn a classifier that takes the feature vector $x$ of an image as input and predicts whether the output $y$ is 1 or 0.
Logistic regression
Logistic regression is suited to binary classification problems. Given an input $x$ and parameters $w$ and $b$, it produces the output prediction
$$\hat{y}=\sigma(w^{T}x+b)$$
$$\sigma(z)=\frac{1}{1+e^{-z}}$$
Loss function (error function) $L$:
$$L(\hat{y},y)=-y\log(\hat{y})-(1-y)\log(1-\hat{y})$$
The loss function is defined on a single training sample and measures how well the algorithm performs on that one sample. To measure performance over the whole training set we define a cost function, which sums the loss over the $m$ samples and divides by $m$:
$$J(w,b)=\frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)},y^{(i)})=\frac{1}{m}\sum_{i=1}^{m}\left(-y^{(i)}\log\hat{y}^{(i)}-(1-y^{(i)})\log(1-\hat{y}^{(i)})\right)$$
So when training a logistic regression model, we need to find the $w$ and $b$ that make the cost function $J$ as small as possible.
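The prediction and cost formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `logistic_cost` and the toy values of `X`, `Y`, `w`, and `b` are made up for illustration and do not come from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(w, b, X, Y):
    """Mean cross-entropy cost J(w, b); X has shape (n_x, m), Y has shape (1, m)."""
    A = sigmoid(np.dot(w.T, X) + b)                       # predictions y_hat, shape (1, m)
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # per-sample loss L
    return float(np.mean(losses))                         # average over the m samples

# Hypothetical toy data: 2 features, 3 samples
X = np.array([[1.0, 2.0, -1.0],
              [0.5, -1.0, 2.0]])
Y = np.array([[1, 0, 1]])
w = np.zeros((2, 1))
b = 0.0

cost_at_zero = logistic_cost(w, b, X, Y)
print(cost_at_zero)
```

With all parameters at zero every prediction is 0.5, so the cost equals log 2 (about 0.693) whatever the labels are, which makes a handy sanity check before training.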
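Gradient Descent
On the training set, the parameters $w$ and $b$ are trained by minimizing the cost function $J(w,b)$.
Take a step in the direction of steepest descent and keep iterating until you reach the global optimum or somewhere close to it.
$\alpha$ is the learning rate, which controls the step size.
The smaller $\alpha$ is, the more slowly the cost function $J$ converges, while an $\alpha$ that is too large makes $J$ oscillate around the minimum.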
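The effect of the learning rate is easy to see on a one-dimensional example. The sketch below does not use the logistic cost; it swaps in the simple convex function J(w) = w² (an assumption chosen so the behaviour is easy to follow), and the function name `gradient_descent_1d` and the three values of `alpha` are illustrative.

```python
def gradient_descent_1d(alpha, w0=1.0, steps=20):
    """Run gradient descent on J(w) = w**2 and return every visited w."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w            # dJ/dw for J(w) = w**2
        w = w - alpha * grad      # gradient descent update
        path.append(w)
    return path

slow = gradient_descent_1d(alpha=0.01)   # tiny steps: slow convergence
good = gradient_descent_1d(alpha=0.4)    # moderate steps: fast convergence
wild = gradient_descent_1d(alpha=0.95)   # large steps: w overshoots and flips sign each step

print(abs(slow[-1]), abs(good[-1]), abs(wild[-1]))
print(wild[:4])
```

For this function each update multiplies w by (1 - 2α), so a small α shrinks w only slightly per step, while an α close to 1 makes w jump from one side of the minimum to the other.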
For example, here is the gradient descent algorithm for logistic regression with two features (weights $w_1$ and $w_2$) and $m$ samples.
Code flow
J = 0; dw1 = 0; dw2 = 0; db = 0;
for i = 1 to m
    z(i) = w1*x1(i) + w2*x2(i) + b;
    a(i) = sigmoid(z(i));
    J += -[y(i)*log(a(i)) + (1-y(i))*log(1-a(i))];
    dz(i) = a(i) - y(i);
    dw1 += x1(i)*dz(i);
    dw2 += x2(i)*dz(i);
    db += dz(i);
J /= m;
dw1 /= m;
dw2 /= m;
db /= m;
w1 = w1 - alpha*dw1
w2 = w2 - alpha*dw2
b = b - alpha*db
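The pseudocode above translates almost line for line into plain Python. This is a minimal sketch that keeps the explicit loop over samples; the function name `gradient_step`, the toy `samples`, and the choice of 200 iterations with `alpha=0.1` are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(w1, w2, b, samples, alpha):
    """One gradient descent step; samples is a list of (x1, x2, y) tuples."""
    m = len(samples)
    J = dw1 = dw2 = db = 0.0
    for x1, x2, y in samples:
        z = w1 * x1 + w2 * x2 + b
        a = sigmoid(z)
        J += -(y * math.log(a) + (1 - y) * math.log(1 - a))
        dz = a - y
        dw1 += x1 * dz
        dw2 += x2 * dz
        db += dz
    J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m
    # Return the updated parameters and the cost measured before the update
    return w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db, J

# Hypothetical toy data: the label is 1 when x1 + x2 > 0
samples = [(1.0, 2.0, 1), (2.0, 0.5, 1), (-1.0, -1.5, 0), (-2.0, 0.5, 0)]
w1 = w2 = b = 0.0
costs = []
for _ in range(200):
    w1, w2, b, J = gradient_step(w1, w2, b, samples, alpha=0.1)
    costs.append(J)

print(costs[0], costs[-1])
```

The first recorded cost is measured at w1 = w2 = b = 0, and the cost falls as the iterations proceed.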
Vectorization
When applying deep learning algorithms, explicit for loops make the code very inefficient. Vectorization lets your code get rid of these explicit for loops.
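In logistic regression you need to compute $z=w^Tx+b$.
A non-vectorized implementation loops over the features:
z = 0
for i in range(n_x):
    z += w[i] * x[i]
z += b
Vectorized implementation: z = np.dot(w, x) + b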
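A quick way to convince yourself that the two versions agree is to run both on the same random vectors. The sketch below assumes NumPy; the vector length `n_x = 1000`, the seed, and the value of `b` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x = 1000
w = rng.standard_normal(n_x)
x = rng.standard_normal(n_x)
b = 0.5

# Explicit loop over the features
z_loop = 0.0
for i in range(n_x):
    z_loop += w[i] * x[i]
z_loop += b

# Vectorized version: a single dot product
z_vec = np.dot(w, x) + b

print(np.isclose(z_loop, z_vec))
```

The two results agree up to floating-point rounding, which is why the comparison uses `np.isclose` rather than `==`.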
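Understand Python's broadcasting and how to construct vectors in Python-numpy
Two interesting questions worth recording
Q1: Consider the following two random arrays a and b:
a = np.random.randn(4, 3) # a.shape = (4, 3)
b = np.random.randn(3, 2) # b.shape = (3, 2)
c = a * b
The computation fails: * is element-wise multiplication, and shapes (4, 3) and (3, 2) do not match and cannot be broadcast against each other.
Q2: Consider the following code:
a = np.random.randn(3, 3)
b = np.random.randn(3, 1)
c = a * b
This triggers broadcasting: the single column of b is repeated 3 times so that b behaves like a (3, 3) array, and since * is element-wise multiplication, c.shape = (3, 3).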
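Both answers can be confirmed directly in NumPy. This is a small sketch; the variable names `q1_error`, `q1_matmul_shape`, and `same` are introduced here only for the demonstration.

```python
import numpy as np

# Q1: element-wise product of (4, 3) and (3, 2) cannot be broadcast
a = np.random.randn(4, 3)
b = np.random.randn(3, 2)
try:
    c = a * b
    q1_error = None
except ValueError as err:
    q1_error = err
print("Q1 raised:", q1_error)

# The matrix product, by contrast, is defined for these shapes
q1_matmul_shape = np.dot(a, b).shape

# Q2: (3, 3) * (3, 1) broadcasts b across the columns
a = np.random.randn(3, 3)
b = np.random.randn(3, 1)
c = a * b
print(c.shape)

# Row i of the result is row i of a scaled by b[i, 0]
same = np.allclose(c[1], a[1] * b[1, 0])
print(same)
```

If the matrix product was intended in Q1, `np.dot(a, b)` works and yields a (4, 2) array.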
Programming assignment this week
Build a simple neural network that can identify cats
This work references the article Logistic Regression with a Neural Network mindset in Kulbear's GitHub repository
The main steps and key code for building the neural network:
Define the model structure (for example, the number of input features)
Initialize the model parameters
Build sigmoid() and initialize the parameters w and b
import numpy as np

# Normalize the dataset
train_set_x = train_set_x_flatten / 255
test_set_x = test_set_x_flatten / 255

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s

def initialize_with_zeros(dim):
    w = np.zeros(shape=(dim, 1))
    b = 0
    # Use assertions to make sure the data has the expected form
    assert(w.shape == (dim, 1))  # w has shape (dim, 1)
    assert(isinstance(b, float) or isinstance(b, int))  # b is a float or an int
    return (w, b)
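- Loop:
3.1 Compute the current cost (forward propagation)
3.2 Compute the current gradient (backward propagation)
3.3 Update the parameters (gradient descent)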
def propagate(w, b, X, Y):
    m = X.shape[1]  # number of samples
    # Forward propagation
    A = sigmoid(np.dot(w.T, X) + b)  # activations: A = sigmoid(w^T X + b)
    cost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A)))  # cost J(w, b)
    # Backward propagation
    dw = (1 / m) * np.dot(X, (A - Y).T)  # partial derivative of J with respect to w
    db = (1 / m) * np.sum(A - Y)  # partial derivative of J with respect to b
    # Use assertions to make sure the data is correct
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    # Store dw and db in a dictionary
    grads = {
        "dw": dw,
        "db": db
    }
    return (grads, cost)
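Update the parameters with gradient descent, learning $w$ and $b$ by minimizing the cost function $J$.
For a parameter $\theta$, the update rule is $\theta=\theta-\alpha\,d\theta$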
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=True):
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        # Gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost every 100 iterations
        if (print_cost) and (i % 100 == 0):
            print("Iteration: %i , cost: %f" % (i, cost))
    params = {
        "w": w,
        "b": b}
    grads = {
        "dw": dw,
        "db": db}
    return (params, grads, costs)
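Once the parameters have been learned, a prediction step thresholds the sigmoid output at 0.5. The sketch below is self-contained rather than reusing the functions above: the names `train` and `predict`, the hyperparameters, and the synthetic two-feature data standing in for the cat images are all assumptions for illustration, not the assignment's actual code or results.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, Y, num_iterations=1000, learning_rate=0.1):
    """Vectorized gradient descent; X has shape (n_x, m), Y has shape (1, m)."""
    n_x, m = X.shape
    w, b = np.zeros((n_x, 1)), 0.0
    for _ in range(num_iterations):
        A = sigmoid(np.dot(w.T, X) + b)      # forward propagation
        dw = np.dot(X, (A - Y).T) / m        # backward propagation
        db = np.sum(A - Y) / m
        w = w - learning_rate * dw           # gradient descent update
        b = b - learning_rate * db
    return w, b

def predict(w, b, X):
    """Return 1 where the predicted probability exceeds 0.5, else 0."""
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(int)

# Synthetic data: the label is 1 when the two features sum to a positive number
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 200))
Y = (X[0:1] + X[1:2] > 0).astype(int)        # shape (1, 200)

w, b = train(X, Y)
predictions = predict(w, b, X)
accuracy = np.mean(predictions == Y)
print(accuracy)
```

The same `predict` call can be applied to a held-out test set to compare training and test accuracy.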
The actual test results after the model is built:
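Machine learning algorithms
Machine learning algorithms can be divided into three broad categories:
- Supervised learning algorithms: during training, a pattern (a function / learning model) is learned or built from the training dataset, and new instances are inferred from that pattern. They mainly include neural networks, support vector machines, k-nearest neighbors, naive Bayes, decision trees, and so on.
- Unsupervised learning algorithms: training has no explicit target, so you cannot know in advance what the result will be. The two common families are clustering and dimensionality reduction.
- Reinforcement learning algorithms: training is driven by decisions. The algorithm trains itself according to the success or failure of its outputs (decisions), and after being optimized over a large amount of experience it can give better predictions.
Basic machine learning algorithms:
- Linear Regression
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- Logistic Regression
- Decision Tree
- K-Means
- Random Forest
- Naive Bayes
- Dimensionality Reduction
- Gradient Boosting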
KNN (K-Nearest Neighbors): supervised, classification algorithm
KNN is the simplest effective algorithm for classifying data. Its advantages are that it can be used to fill in missing values and that it can handle nonlinear problems. Tuning comes down to choosing k: if k is too small, the model overfits easily. Its drawbacks are that it is very sensitive to the local structure of the data and that it is computationally expensive. The data must be normalized so that every feature lies in the same range, and because the distance to every training point has to be computed for each query, it is inefficient and very slow on large datasets. It is mainly used for datasets with few samples and few features. KNN suits relatively complex classification rules and is widely used in recommender systems.
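The role of normalization and of the choice of k can be seen in a few lines. The following is a minimal from-scratch sketch, assuming NumPy, Euclidean distance, and min-max scaling; the function names and the toy data (whose second feature has a much larger scale than the first) are made up for illustration.

```python
import numpy as np
from collections import Counter

def min_max_normalize(X):
    """Scale each feature column to [0, 1] so no feature dominates the distance."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins), mins, maxs

def knn_predict(x, X_train, y_train, k=3):
    """Majority vote among the k training points closest to x."""
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: two clusters, features on very different scales
X_train = np.array([[1.0, 1000.0], [1.2, 1100.0], [0.9, 900.0],
                    [3.0, 5000.0], [3.2, 5200.0], [2.9, 4800.0]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

X_norm, mins, maxs = min_max_normalize(X_train)
query = (np.array([3.1, 5100.0]) - mins) / (maxs - mins)   # scale the query the same way
label = knn_predict(query, X_norm, y_train, k=3)
print(label)
```

Every prediction computes the distance to all training points, which is the source of the cost on large datasets mentioned above.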
Decision Tree: supervised, probabilistic algorithm
The decision tree studied here is a classification tree that accepts only discrete features. The conditional entropy H(Label | feature) reflects how mixed the labels remain once that feature is known, so it helps us select features and choose the next node of the tree. A decision tree is built greedily and cannot take a global view, which makes it hard to tune. Its advantages are high interpretability and easy visualization. Its drawbacks are that it overfits easily (pruning helps avoid this), is hard to tune, and has modest accuracy. A decision tree can loosely be seen as a combination of several logistic regression models.
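The feature-selection step can be made concrete by computing the conditional entropy for each candidate feature and picking the one with the largest information gain. This is a minimal sketch using only the standard library; the function names and the five-sample toy dataset with two binary features are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    """H(Label | feature): entropy of the labels within each feature value, weighted by frequency."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())

# Hypothetical toy dataset: two discrete features and a yes/no label
labels    = ["yes", "yes", "no", "no", "no"]
feature_a = [1, 1, 1, 0, 0]
feature_b = [1, 1, 0, 1, 1]

base = entropy(labels)
h_a = conditional_entropy(feature_a, labels)
h_b = conditional_entropy(feature_b, labels)
gains = {"feature_a": base - h_a, "feature_b": base - h_b}   # information gain per feature
best = max(gains, key=gains.get)
print(gains, best)
```

The feature with the lowest conditional entropy leaves the labels least mixed, so it gives the largest information gain and becomes the next split. Because each split is chosen this way, one node at a time, the construction is greedy.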
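Naive Bayes
Naive Bayes is based on Bayes' theorem from probability theory and suits scenarios in which the features are independent of one another, for example predicting the type of a flower from the length and width of its petals. "Naive" means that the features are assumed to be independent of each other, which reduces the amount of data required. The method provides a way to estimate unknown probabilities from known values.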
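The independence assumption shows up in code as a simple sum of per-feature log-likelihoods. The sketch below uses the Gaussian variant of naive Bayes, which is one common choice for continuous features such as petal measurements and is an assumption here, not something the text specifies; the function names and the measurements are made up.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """For each class, store its prior and the per-feature mean and variance."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # A tiny constant keeps the variance away from zero
        model[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class with the highest log posterior (up to a shared constant)."""
    best_class, best_score = None, -np.inf
    for c, (prior, mean, var) in model.items():
        # Summing over features is exactly the "naive" independence assumption
        log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        score = np.log(prior) + log_likelihood
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Made-up petal (length, width) measurements in cm for two hypothetical flower types
X = np.array([[1.4, 0.2], [1.5, 0.3], [1.3, 0.2],
              [4.7, 1.4], [4.5, 1.5], [4.9, 1.6]])
y = np.array([0, 0, 0, 1, 1, 1])

model = fit_gaussian_nb(X, y)
pred_small = predict_gaussian_nb(model, np.array([1.6, 0.25]))
pred_large = predict_gaussian_nb(model, np.array([4.6, 1.5]))
print(pred_small, pred_large)
```

Only a mean and a variance per feature and class have to be estimated, which is why the method gets by with relatively little data.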
SVM (Support Vector Machine)
SVM looks for a hyperplane $w^Tx+b=0$ that separates the samples into two classes with the largest possible margin. The hyperplane is determined by only a few support vectors, which means it does not care about the fine details of the other points, so it does not overfit easily. With kernel functions it is suited to complex nonlinear problems. Among the algorithms that sit between simple methods and neural networks, SVM is the strongest. Its drawback is the high computational cost.
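The idea that only a few points determine the margin can be illustrated without training anything. The sketch below does not fit an SVM: it takes a hand-picked hyperplane (`w`, `b`) and made-up separable data, computes each point's signed distance to the hyperplane, and reads off the margin and the points that attain it. All names and values are assumptions for illustration.

```python
import numpy as np

# Hypothetical separable data with labels in {+1, -1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [4.0, 1.0],
              [0.0, 0.0], [-1.0, 1.0], [1.0, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# Hand-picked hyperplane w^T x + b = 0, i.e. the line x1 + x2 = 2
w = np.array([1.0, 1.0])
b = -2.0

# Signed distance of each point to the hyperplane; positive means correctly classified
distances = y * (X @ w + b) / np.linalg.norm(w)
margin = distances.min()
support = np.where(np.isclose(distances, margin))[0]   # points lying exactly on the margin

# Removing a point that is far from the boundary leaves the margin unchanged
margin_without_far_point = np.delete(distances, 1).min()

print(margin, support, margin_without_far_point)
```

The point at index 1 lies well inside its class region, so dropping it changes nothing, whereas the points in `support` are the ones that pin down how wide the margin can be.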
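Dimensionality Reduction
There are several reasons to reduce dimensionality: making data easier to use, lowering the computational cost of algorithms, removing noise, and making results easier to understand. Among these methods, principal component analysis (PCA) is suited to numerical data, and because NumPy's linalg module provides eig() to solve for eigenvalues and eigenvectors, PCA is convenient to implement.
The rough steps of PCA are: remove the mean, compute the covariance matrix, solve for the eigenvalues and eigenvectors with linalg.eig(), sort the eigenvalues, and transform the data into the space spanned by the eigenvectors of the N largest eigenvalues.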
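The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation; the function name `pca`, the `n_components` parameter, and the synthetic two-feature data (where the second feature is roughly twice the first) are assumptions.

```python
import numpy as np

def pca(X, n_components):
    """X has shape (m_samples, n_features); returns the projected data and the components."""
    X_centered = X - X.mean(axis=0)                  # 1. remove the mean
    cov = np.cov(X_centered, rowvar=False)           # 2. covariance matrix of the features
    eig_vals, eig_vecs = np.linalg.eig(cov)          # 3. eigenvalues and eigenvectors
    # The covariance matrix is symmetric, so its eigenvalues are real; drop any complex dtype
    eig_vals, eig_vecs = np.real(eig_vals), np.real(eig_vecs)
    order = np.argsort(eig_vals)[::-1]               # 4. sort eigenvalues, largest first
    components = eig_vecs[:, order[:n_components]]   # 5. keep the top-N eigenvectors
    return X_centered @ components, components       # 6. project the data

# Synthetic data lying close to the direction (1, 2)
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
X = np.column_stack([t, 2 * t + 0.1 * rng.standard_normal(200)])

X_reduced, components = pca(X, n_components=1)
print(X_reduced.shape)
print(components.ravel())
```

The recovered component points along (1, 2) up to sign and normalization, since that is the direction in which the synthetic data varies most.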