
4.1 - Support Vector Machines

2022-08-11 07:47:00 · A big boa constrictor 6666


Recall the binary classification problem from the previous chapter:

  • Because the ideal loss function (the yellow curve in the figure on the right) is a step function, it cannot be minimized by gradient descent, so we approximate it, replacing $\delta$ with a surrogate loss $l$. Many different functions can be used for $l$, for example: square error, Sigmoid + square error, Sigmoid + cross entropy, and hinge loss (compared in the sketch below).

[Figures: the ideal (0/1) loss and the candidate surrogate loss functions, plotted against the margin $\hat y f(x)$]
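To make the comparison concrete, here is a minimal sketch (mine, not from the original post) that evaluates each of the surrogate losses above as a function of the margin $\hat y f(x)$, with labels $\hat y \in \{-1, +1\}$; the function names are my own.

```python
import numpy as np

def ideal_loss(m):          # 0/1 loss: a step function with no useful gradient
    return (m < 0).astype(float)

def square_loss(m):         # (y_hat * f(x) - 1)^2
    return (m - 1.0) ** 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_square_loss(m): # (sigma(y_hat * f(x)) - 1)^2
    return (sigmoid(m) - 1.0) ** 2

def sigmoid_ce_loss(m):     # Sigmoid + cross entropy: ln(1 + exp(-y_hat * f(x)))
    return np.log(1.0 + np.exp(-m))

def hinge_loss(m):          # max(0, 1 - y_hat * f(x))
    return np.maximum(0.0, 1.0 - m)

m = np.linspace(-3, 3, 7)   # a few sample margins
for name, fn in [("ideal", ideal_loss), ("square", square_loss),
                 ("sigmoid+square", sigmoid_square_loss),
                 ("sigmoid+CE", sigmoid_ce_loss), ("hinge", hinge_loss)]:
    print(f"{name:15s}", np.round(fn(m), 3))
```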

  • When there are outliers in the data, hinge loss often performs better than cross entropy (illustrated numerically below).

[Figures: effect of outliers on hinge loss vs. cross entropy]
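A hedged numeric illustration of this claim (the margin values are made up): once a point's margin reaches 1, its hinge loss is exactly 0 and the point stops influencing training, whereas cross entropy is never exactly 0 and keeps pulling on every point.

```python
import numpy as np

margins = np.array([-5.0, 0.0, 1.0, 3.0])   # -5.0 plays the outlier
hinge = np.maximum(0.0, 1.0 - margins)
ce    = np.log(1.0 + np.exp(-margins))
print("margin :", margins)
print("hinge  :", hinge)   # [6.    1.    0.    0.   ] -> easy points cost nothing
print("CE     :", ce)      # [5.007 0.693 0.313 0.049] -> never exactly zero
```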

1. Hinge Loss

  • Following the derivation in the figures below, the SVM loss function can be minimized by gradient descent; the derivation then rewrites this formulation into the common textbook form of the SVM. A minimal training sketch follows the figures.

[Figures: deriving gradient descent for the hinge loss, and rewriting the formulation into the standard SVM form]
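As a rough sketch of that idea (my own toy code, not the post's), a linear SVM can be trained by (sub)gradient descent on the average hinge loss plus an L2 penalty; the data, learning rate, and regularization strength below are all arbitrary.

```python
import numpy as np

# L(w, b) = (1/N) * sum_n max(0, 1 - y_n (w.x_n + b)) + lam * ||w||^2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)   # toy separable labels

w, b, lam, lr = np.zeros(2), 0.0, 0.01, 0.1
for _ in range(200):
    margins = y * (X @ w + b)
    active = margins < 1   # only points inside the margin contribute a subgradient
    grad_w = -(y[active, None] * X[active]).sum(axis=0) / len(X) + 2 * lam * w
    grad_b = -y[active].sum() / len(X)
    w -= lr * grad_w
    b -= lr * grad_b

print("train accuracy:", np.mean(np.sign(X @ w + b) == y))
```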

2. Kernel Method

  • Dual representation: in the SVM, $\alpha_n^*$ may be sparse, meaning that $\alpha_n^* = 0$ for some $x^n$; the $x^n$ with $\alpha_n^* \neq 0$ are the support vectors. These nonzero points alone determine the final model, which is also why outliers in the data have a hard time affecting the SVM (see the sparsity check after the figure).
  • Kernel function: $K(x^n, x)$ in the right figure is the kernel function, i.e. the inner product of $x^n$ and $x$.

[Figure: the dual representation and the kernel function]
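A quick way to see this sparsity, assuming scikit-learn is available (the dataset and the choice of C are arbitrary): after fitting, only a small subset of training points carries a nonzero coefficient.

```python
import numpy as np
from sklearn.svm import SVC   # assumes scikit-learn is installed

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("training points :", len(X))
print("support vectors :", len(clf.support_))     # typically far fewer than 200
print("dual coef shape :", clf.dual_coef_.shape)  # the nonzero alpha_n * y_n
```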

  • Kernel trick: when the loss function can be written in the form shown by the blue line in the left figure, we only need to compute $K(x^{n'}, x^n)$, without ever needing to know the vector $x$ explicitly. This is the benefit of the kernel method: it applies not only to the SVM but also to linear regression and logistic regression.
  • The derivation in the right figure shows that taking the inner product of $x$ and $z$ after a feature transformation is very tedious; with the kernel trick we skip the transformation and simply square the plain inner product of $x$ and $z$ (verified numerically below).

[Figures: the kernel trick, and expanding $(x \cdot z)^2$ into an explicit feature transform]
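A small numeric check of this point for 2-dimensional vectors: squaring the plain inner product gives the same number as explicitly transforming with $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ and then taking the inner product.

```python
import numpy as np

def phi(v):
    # explicit feature transform whose inner product matches (x.z)^2 in 2-D
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print((x @ z) ** 2)        # kernel trick: 1.0
print(phi(x) @ phi(z))     # explicit transform: also 1.0
```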

2.1 Radial Basis Function (RBF) Kernel

  • The more similar $x$ and $z$ are, the larger the kernel value: if $x = z$ the value is 1, and if $x$ and $z$ are completely different the value approaches 0.
  • From the derivation in the figure below it is easy to see that the RBF kernel implicitly operates in an infinite-dimensional feature space, so the model's capacity is very high and it overfits very easily. (A numeric check of the two properties above follows the figure.)
[Figure: expanding the RBF kernel into an infinite-dimensional feature transform]
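Both properties are easy to verify numerically; the sketch below assumes the kernel form $K(x,z)=\exp(-\tfrac{1}{2}\lVert x-z\rVert^2)$ used in the derivation.

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2), with gamma = 1/2
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 2.0])
print(rbf(x, x))           # identical inputs -> 1.0
print(rbf(x, x + 100.0))   # very different inputs -> ~0.0
```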

2.2 Sigmoid Kernel

  • As the left figure shows, applying the sigmoid kernel yields a network with a single hidden layer, in which the weights of each neuron are one data point and the number of neurons equals the number of support vectors (sketched after the figures).
  • The right figure explains how to directly design a kernel function $K(x, z)$ to replace $\Phi(x)$ and $\Phi(z)$, and how to use Mercer's theorem to check whether a proposed kernel function is valid.

[Figures: the sigmoid kernel as a one-hidden-layer network, and designing kernels via Mercer's theorem]
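A minimal sketch of this network interpretation, taking the sigmoid kernel to be $K(x,z)=\tanh(x \cdot z)$ (a simplified form; the support vectors and weights below are made-up numbers): $f(x) = \sum_n \alpha_n K(x^n, x)$ is a one-hidden-layer network whose $n$-th neuron has weights $x^n$.

```python
import numpy as np

support_vectors = np.array([[1.0, 0.5], [-0.3, 2.0]])  # one row per hidden neuron
alpha = np.array([0.7, -0.4])                          # output-layer weights

def f(x):
    hidden = np.tanh(support_vectors @ x)  # each neuron computes tanh(x_n . x)
    return alpha @ hidden                  # weighted sum of hidden activations

print(f(np.array([0.2, -1.0])))
```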

3. SVM-related Methods

  • SVR (Support Vector Regression): when the difference between the predicted value and the true value falls within a certain range, the loss is 0 (see the sketch at the end of this section).

  • Ranking SVM: used when the objects under consideration form a ranked list.

  • One-class SVM: it tries to put all positive examples in one class, with the negative examples scattered elsewhere.

  • The figure below summarizes the similarities between the SVM and deep learning.

[Figure: similarities between the SVM and deep learning]
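As referenced in the SVR bullet above, here is a minimal sketch of the epsilon-insensitive loss behind that idea (the value $\epsilon = 0.5$ and the predictions are assumptions for illustration): errors smaller than $\epsilon$ cost nothing.

```python
import numpy as np

def eps_insensitive_loss(pred, target, eps=0.5):
    # zero loss whenever |pred - target| <= eps, linear beyond that
    return np.maximum(0.0, np.abs(pred - target) - eps)

pred   = np.array([1.0, 1.4, 3.0])
target = np.array([1.2, 1.0, 1.0])
print(eps_insensitive_loss(pred, target))   # [0.  0.  1.5]
```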

Copyright notice: this article was created by [A big boa constrictor 6666]; when reposting, please include the original link: https://yzsam.com/2022/223/202208110650014890.html