
3. Solving for θ with Gradient Descent

2022-04-23 14:40:00 Beyond proverb

1. Obtain the objective function J(θ) and solve for the θ that minimizes it

Find the minimum of the objective function by least squares. The least-squares objective is

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2}(X\theta - y)^T(X\theta - y)$$

Taking the gradient with respect to $\theta$:

$$\nabla_\theta J(\theta) = X^T(X\theta - y)$$

Setting the partial derivatives to 0 solves for the minimizing value of $\theta$, namely

$$\theta = (X^T X)^{-1} X^T y$$

2. Verifying that the objective is convex

Convexity can be established in several ways: by the definition, by the first-order condition, or by the second-order condition. Here we use the second-order condition, which relies on positive definiteness.
If the Hessian is positive semidefinite, the function is convex: it opens upward, and a positive semidefinite Hessian guarantees a minimum.
To apply the second-order condition, compute the Hessian matrix and read the function's convexity off its definiteness: if the Hessian is positive semidefinite, the function is convex; if the Hessian is positive definite, the function is strictly convex.

Hessian matrix: the Hessian (also written Hesse matrix) is the square matrix of second-order partial derivatives of a multivariate function; it describes the local curvature of the function.
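
As a concrete illustration of the second-order condition (a standard textbook example, added here for clarity, not from the original post): take $f(x_1, x_2) = x_1^2 + x_2^2$. Its Hessian is

$$H = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

Both eigenvalues equal 2 > 0, so H is positive definite and f is strictly convex: the upward-opening paraboloid has a unique minimum at the origin.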

3. The Hessian matrix

The Hessian is the symmetric matrix formed by the second-order partial derivatives of the objective function at the point x.
Positive definite: if the eigenvalues of A are all strictly positive, then A is positive definite.
Positive semidefinite: if the eigenvalues of A are all ≥ 0, then A is positive semidefinite.
Indefinite: A is neither positive definite nor positive semidefinite.

Taking the second derivative of the loss function $J(\theta)$ gives the Hessian $\nabla^2_\theta J(\theta) = X^T X$, which is necessarily positive semidefinite, because it is a matrix multiplied by its own transpose: for any vector $v$, $v^T X^T X v = \|Xv\|^2 \ge 0$.
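
As a quick numerical check (a minimal sketch, not part of the original post; it reuses the X_b design matrix constructed in the code further below), the eigenvalues of $X^T X$ can be inspected with NumPy:

import numpy as np

X = 2 * np.random.rand(100, 1)
X_b = np.c_[np.ones((100, 1)), X]

# Hessian of the least-squares loss (up to a constant factor): X^T X
H = X_b.T.dot(X_b)

# A symmetric matrix is positive semidefinite iff all eigenvalues are >= 0
eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues)                    # both values are non-negative
print(np.all(eigenvalues >= -1e-10))  # True (small tolerance for float error)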

4. Analytical solution

The closed-form (analytical) solution derived above:

$$\theta = (X^T X)^{-1} X^T y$$
A numerical solution is a value computed by some approximation procedure under given conditions; it satisfies the equation up to a specified accuracy. An analytical solution is a closed-form expression for the solution of the equation (such as the quadratic root formula), i.e. the exact solution, which satisfies the equation to arbitrary accuracy.
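
A small example of the distinction (added for illustration, not from the original article): the equation $x^2 = 2$ has the analytical solution $x = \sqrt{2}$, while a numerical method such as Newton's iteration only approaches it to within some tolerance:

import math

# Analytical solution: the exact closed-form expression sqrt(2)
x_exact = math.sqrt(2)

# Numerical solution: Newton's iteration x <- x - f(x)/f'(x) for f(x) = x^2 - 2
x = 1.0
for _ in range(10):
    x = x - (x * x - 2) / (2 * x)

print(x_exact)  # 1.4142135623730951
print(x)        # agrees with the analytical value to floating-point precision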

5. Gradient descent

This topic is covered much the same way as in other courses, so it is not repeated in detail here.

Gradient descent: an iterative method for approaching the optimal solution as quickly as possible, by stepping along the direction of steepest descent.

Procedure (a runnable sketch follows this list):
1. Initialize θ. Here θ is a set of parameters; random initialization works fine.
2. Compute the gradient grad.
3. Update: θ(t+1) = θ(t) - grad * learning_rate. The learning rate, usually denoted α, is a hyperparameter: if it is too large, the steps are too big and the iterates oscillate back and forth; if it is too small, many iterations are needed, which is time-consuming.
4. When grad < threshold, stop iterating: the method has converged. The threshold is also a hyperparameter.
Hyperparameters: parameters that must be passed in by the user; if none are given, default values are used.
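
Note that the code in the next section actually solves for θ with the normal equation rather than gradient descent. As a minimal sketch of the four-step procedure above (the learning rate, iteration cap, and threshold are illustrative choices, not values from the original post):

import numpy as np

# Same data-generating setup as the code below
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]

learning_rate = 0.1  # the hyperparameter alpha
threshold = 1e-6     # stopping hyperparameter for the gradient norm
m = 100              # number of samples

# Step 1: initialize theta randomly
theta = np.random.randn(2, 1)

# Steps 2-4: compute the gradient, update theta, stop when the gradient is small
for _ in range(10000):
    grad = 2 / m * X_b.T.dot(X_b.dot(theta) - y)  # gradient of the MSE loss
    theta = theta - learning_rate * grad
    if np.linalg.norm(grad) < threshold:
        break

print(theta)  # close to the normal-equation result, roughly [[4.], [3.]]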

6. Code implementation

Imports

import numpy as np
import matplotlib.pyplot as plt

Initialize sample data

# Randomly generate the feature X1; np.random.rand draws from a uniform distribution on [0, 1), scaled here to [0, 2)
X = 2 * np.random.rand(100, 1)
# Construct the true y column; np.random.randn(100, 1) adds the error term, drawn from the standard normal distribution
y = 4 + 3 * X + np.random.randn(100, 1)
# Prepend the bias column X0 = 1 to X1
X_b = np.c_[np.ones((100, 1)), X]
print(X_b)
""" [[1. 1.01134124] [1. 0.98400529] [1. 1.69201204] [1. 0.70020158] [1. 0.1160646 ] [1. 0.42502983] [1. 1.90699898] [1. 0.54715372] [1. 0.73002827] [1. 1.29651341] [1. 1.62559406] [1. 1.61745598] [1. 1.86701453] [1. 1.20449051] [1. 1.97722538] [1. 0.5063885 ] [1. 1.61769812] [1. 0.63034575] [1. 1.98271789] [1. 1.17275471] [1. 0.14718811] [1. 0.94934555] [1. 0.69871645] [1. 1.22897542] [1. 0.59516153] [1. 1.19071408] [1. 1.18316576] [1. 0.03684612] [1. 0.3147711 ] [1. 1.07570897] [1. 1.27796797] [1. 1.43159157] [1. 0.71388871] [1. 0.81642577] [1. 1.68275133] [1. 0.53735427] [1. 1.44912342] [1. 0.10624546] [1. 1.14697422] [1. 1.35930391] [1. 0.73655224] [1. 1.08512154] [1. 0.91499434] [1. 0.62176609] [1. 1.60077283] [1. 0.25995875] [1. 0.3119241 ] [1. 0.25099575] [1. 0.93227026] [1. 0.85510054] [1. 1.5681651 ] [1. 0.49828274] [1. 0.14520117] [1. 1.61801978] [1. 1.08275593] [1. 0.53545855] [1. 1.48276384] [1. 1.19092276] [1. 0.19209144] [1. 1.91535667] [1. 1.94012402] [1. 1.27952383] [1. 1.23557691] [1. 0.9941706 ] [1. 1.04642378] [1. 1.02114013] [1. 1.13222297] [1. 0.5126448 ] [1. 1.22900735] [1. 1.49631537] [1. 0.82234995] [1. 1.24810189] [1. 0.67549922] [1. 1.72536141] [1. 0.15290908] [1. 0.17069838] [1. 0.27173192] [1. 0.09084242] [1. 0.13085313] [1. 1.72356775] [1. 1.65718819] [1. 1.7877667 ] [1. 1.70736708] [1. 0.8037657 ] [1. 0.5386607 ] [1. 0.59842584] [1. 0.4433115 ] [1. 0.11305317] [1. 0.15295053] [1. 1.81369029] [1. 1.72434082] [1. 1.08908323] [1. 1.65763828] [1. 0.75378952] [1. 1.61262625] [1. 0.37017158] [1. 1.12323188] [1. 0.22165802] [1. 1.69647343] [1. 1.66041812]] """


# Solve for theta with the normal equation
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print(theta_best)
"""
[[3.9942692 ]
 [3.01839793]]
"""
# Create the test-set feature X1: two points, X1 = 0 and X1 = 2, to draw the fitted line
X_new = np.array([[0], [2]])
X_new_b = np.c_[(np.ones((2, 1))), X_new]
print(X_new_b)
y_predict = X_new_b.dot(theta_best)
print(y_predict)
"""
[[1. 0.]
 [1. 2.]]
[[ 3.9942692 ]
 [10.03106506]]
"""

Plotting

plt.plot(X_new, y_predict, 'r-')
plt.plot(X, y, 'b.')
plt.axis([0, 2, 0, 15])
plt.show()

[Figure: the sample points (blue dots) and the fitted regression line (red), with axes x ∈ [0, 2], y ∈ [0, 15]]

7. Complete code

import numpy as np
import matplotlib.pyplot as plt

# Randomly generate the feature X1; np.random.rand draws from a uniform distribution on [0, 1), scaled here to [0, 2)
X = 2 * np.random.rand(100, 1)
# Construct the true y column; np.random.randn(100, 1) adds the error term, drawn from the standard normal distribution
y = 4 + 3 * X + np.random.randn(100, 1)

# Prepend the bias column X0 = 1 to X1
X_b = np.c_[np.ones((100, 1)), X]
print(X_b)

# Solve for theta with the normal equation
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print(theta_best)

# Create the test-set feature X1: two points, X1 = 0 and X1 = 2, to draw the fitted line
X_new = np.array([[0], [2]]) 
X_new_b = np.c_[(np.ones((2, 1))), X_new]
print(X_new_b)
y_predict = X_new_b.dot(theta_best)
print(y_predict)

# Plot
plt.plot(X_new, y_predict, 'r-')
plt.plot(X, y, 'b.')
plt.axis([0, 2, 0, 15])
plt.show()

Copyright notice
This article was created by [Beyond proverb]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231436591095.html