
Principal Component Analysis - Applications of MATLAB in Mathematical Modeling (2nd Edition)

2022-08-09 17:18:00 【YuNlear】

Data Modeling and Its MATLAB Implementation (Part 2)

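As information technology develops and matures, every industry accumulates more and more data. Data modeling methods are therefore needed to extract useful information from massive, seemingly disordered datasets.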

Principal Component Analysis (PCA)

In mathematical modeling we often study problems involving many variables. When the variables are numerous and the relationships among them are complex, the analysis becomes considerably more difficult.

Principal component analysis (PCA) was proposed to address this problem. PCA is a mathematical dimensionality-reduction method: its main purpose is to recombine many original variables that are correlated with one another into a new set of mutually uncorrelated composite variables.

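The usual mathematical approach is to form linear combinations of the original variables and use them as new composite variables. The first linear combination chosen is called the first composite variable and denoted $F_1$; it should reflect as much of the information in the original variables as possible. Information is measured by variance, so $\mathrm{Var}(F_1)$ should be as large as possible: the larger it is, the more information $F_1$ contains. Hence $F_1$ should be the linear combination with the largest variance, taken over all combinations whose coefficient vectors have unit length (without this normalization the variance could be made arbitrarily large simply by scaling the coefficients).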

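If the first principal component cannot represent all of the information in the variables, a second principal component, a third principal component, and so on are needed, and information already captured by an earlier component should not reappear in a later one. Mathematically this is the requirement $\mathrm{Cov}(F_1,F_2)=0$: the second principal component $F_2$ is the unit-length linear combination with the largest variance among those uncorrelated with $F_1$ (here $\mathrm{Cov}(\cdot,\cdot)$ denotes covariance).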
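For reference, the two requirements above can be written as constrained optimization problems. This is a standard textbook formulation added for clarity rather than taken from the original text; $x=(x_1,\cdots,x_p)^T$ denotes the vector of original variables and $\Sigma$ its covariance matrix, so that $\mathrm{Var}(a^Tx)=a^T\Sigma a$ and $\mathrm{Cov}(a^Tx,a_1^Tx)=a^T\Sigma a_1$:

$$
\begin{aligned}
F_1&=a_1^{T}x,\qquad a_1=\arg\max_{a^{T}a=1}\ a^{T}\Sigma a,\\
F_2&=a_2^{T}x,\qquad a_2=\arg\max_{a^{T}a=1,\ a^{T}\Sigma a_1=0}\ a^{T}\Sigma a .
\end{aligned}
$$

The maximum of $a^T\Sigma a$ over unit vectors is the largest eigenvalue of $\Sigma$, attained at a corresponding unit eigenvector; under the additional constraint it is the second-largest eigenvalue, and so on. This is why the steps below compute eigenvalues and eigenvectors: once the data are standardized, $\Sigma$ is exactly the correlation coefficient matrix.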

Typical Steps of Principal Component Analysis

Step 1: Standardize the raw data.
Let the observation data matrix $X$ contain $n$ samples (rows) of $p$ variables (columns):
$X=\begin{bmatrix} x_{11}&x_{12}&\cdots&x_{1p}\\ x_{21}&x_{22}&\cdots&x_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ x_{n1}&x_{n2}&\cdots&x_{np} \end{bmatrix}$
The raw data are standardized as follows:
$x_{ij}^{*}=\frac{x_{ij}-\overline{x_j}}{\sqrt{\mathrm{Var}(x_j)}}\qquad(i=1,2,\cdots,n;\ j=1,2,\cdots,p)$
where
$\overline{x_j}=\frac{1}{n}\sum_{i=1}^{n}x_{ij},\qquad \mathrm{Var}(x_j)=\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\overline{x_j}\right)^2\qquad(j=1,2,\cdots,p).$
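A minimal MATLAB sketch of Step 1, assuming a small hypothetical data matrix `X` (5 samples, 3 variables) made up purely for illustration. It applies the standardization formula to every column at once; MATLAB's `std` uses the $n-1$ denominator by default, which matches $\mathrm{Var}(x_j)$ above.

```matlab
% Hypothetical example data: 5 samples (rows) x 3 variables (columns)
X = [2.5 2.4 1.2;
     0.5 0.7 0.3;
     2.2 2.9 1.1;
     1.9 2.2 0.9;
     3.1 3.0 1.6];
n = size(X, 1);

xbar = mean(X);   % 1-by-p row of column means
s    = std(X);    % 1-by-p row of sample standard deviations (n-1 denominator)

% x*_ij = (x_ij - mean_j) / std_j, with the row vectors copied to n rows by repmat
Z = (X - repmat(xbar, n, 1)) ./ repmat(s, n, 1);

% Every column of Z should now have mean 0 and variance 1
disp(mean(Z));
disp(var(Z));
```

repmat is used instead of implicit expansion so that the snippet also runs on MATLAB releases older than R2016b.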
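Step 2: Compute the sample correlation coefficient matrix.
The correlation coefficient matrix builds on the concept of covariance (for details, see the Baidu Baike article on covariance). The correlation coefficient of two variables is simply the covariance of their standardized values.
$R=\begin{bmatrix} r_{11}&r_{12}&\cdots&r_{1p}\\ r_{21}&r_{22}&\cdots&r_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ r_{p1}&r_{p2}&\cdots&r_{pp} \end{bmatrix}$
where
$r_{ij}=\mathrm{Cov}(x_i^{*},x_j^{*})=\frac{1}{n-1}\sum_{k=1}^{n}x_{ki}^{*}x_{kj}^{*}=\frac{\sum_{k=1}^{n}(x_{ki}-\overline{x_i})(x_{kj}-\overline{x_j})}{\sqrt{\sum_{k=1}^{n}(x_{ki}-\overline{x_i})^2\sum_{k=1}^{n}(x_{kj}-\overline{x_j})^2}}$
so that $r_{ii}=1$ and $r_{ij}=r_{ji}$.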
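The two forms of $r_{ij}$ can be checked numerically. This MATLAB sketch reuses the same hypothetical `X` for illustration, builds $R$ from the standardized data as $Z^TZ/(n-1)$, and compares it with MATLAB's built-in `corrcoef` applied to the raw data.

```matlab
% Hypothetical example data: 5 samples (rows) x 3 variables (columns)
X = [2.5 2.4 1.2; 0.5 0.7 0.3; 2.2 2.9 1.1; 1.9 2.2 0.9; 3.1 3.0 1.6];
n = size(X, 1);

% Step 1: standardize each column
Z = (X - repmat(mean(X), n, 1)) ./ repmat(std(X), n, 1);

% Step 2 via the formula: r_ij = sum_k z_ki * z_kj / (n - 1)
R_formula = (Z' * Z) / (n - 1);

% Built-in correlation matrix of the raw (unstandardized) data
R_builtin = corrcoef(X);

% The two should agree up to rounding error, and the diagonal should be all ones
disp(max(abs(R_formula(:) - R_builtin(:))));
disp(diag(R_formula)');
```

Because correlation is unaffected by standardization, corrcoef gives the same matrix whether it is applied to the raw data or to the standardized data.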
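Step 3: Compute the eigenvalues of the correlation coefficient matrix and the corresponding eigenvectors.
Eigenvalues, sorted in decreasing order: $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_p\ge0$ (each eigenvalue equals the variance of the corresponding principal component).
Unit-length eigenvectors: $a_i=(a_{i1},a_{i2},\cdots,a_{ip})^T,\ i=1,2,\cdots,p$, where $a_i$ corresponds to $\lambda_i$.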
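A minimal MATLAB sketch of Step 3, again on the hypothetical `X`. Since `eig` is not guaranteed to return eigenvalues in sorted order, the sketch sorts them in decreasing order explicitly and reorders the eigenvector columns to match.

```matlab
% Hypothetical example data: 5 samples (rows) x 3 variables (columns)
X = [2.5 2.4 1.2; 0.5 0.7 0.3; 2.2 2.9 1.1; 1.9 2.2 0.9; 3.1 3.0 1.6];
p = size(X, 2);

R = corrcoef(X);
R = (R + R') / 2;          % remove any rounding asymmetry before calling eig

[V, D] = eig(R);           % columns of V are unit-length eigenvectors
[lambda, idx] = sort(diag(D), 'descend');
A = V(:, idx);             % column i is a_i, paired with lambda(i)

% Checks: R*a_i = lambda_i*a_i, and the eigenvalues sum to trace(R) = p
disp(norm(R * A - A * diag(lambda)));
disp([sum(lambda) p]);
```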
Step 4: Select the important principal components and write out their expressions.
The $i$-th principal component is $F_i=a_{i1}x_1^{*}+a_{i2}x_2^{*}+\cdots+a_{ip}x_p^{*}$. PCA yields $p$ principal components, but because their variances decrease from the first to the last, in practice one usually does not keep all $p$ of them; instead, the first $k$ components are selected according to their cumulative contribution rate.
The contribution rate of a principal component is the share of the total variance accounted for by its variance (for a correlation matrix the total variance is $\sum_{j=1}^{p}\lambda_j=p$):
$\text{contribution rate of } F_i=\frac{\lambda_i}{\sum_{j=1}^{p}\lambda_j}\times100\%$
The larger the contribution rate, the more of the original variables' information the component contains. The cumulative contribution rate of the first $k$ components, $\sum_{i=1}^{k}\lambda_i\big/\sum_{j=1}^{p}\lambda_j$, should generally reach at least 80% or 85%, so that the composite variables retain most of the information in the original data.
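The selection rule takes only a few lines of MATLAB. The eigenvalues below are hypothetical values for a 4-variable correlation matrix (they sum to 4), chosen only to illustrate the arithmetic with an 85% threshold.

```matlab
% Hypothetical eigenvalues, already sorted in decreasing order
lambda = [2.2; 1.0; 0.5; 0.3];

contrib    = lambda / sum(lambda);   % contribution rate of each component
cumcontrib = cumsum(contrib);        % cumulative contribution rate

T = 0.85;                            % threshold: 85%
k = find(cumcontrib >= T, 1);        % smallest k whose cumulative rate reaches T

disp(100 * [contrib cumcontrib]);    % percentages, one row per component
disp(k);                             % displays 3
```

Here the cumulative rates are 55%, 80%, 92.5% and 100%, so the first two components fall short of 85% and three are kept.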
Step 5: Compute the principal component scores.
Substituting each sample's standardized data into the expressions of the $k$ retained principal components gives that sample's values on the new variables, i.e., its principal component scores:
$F=\begin{bmatrix} F_{11}&F_{12}&\cdots&F_{1k}\\ F_{21}&F_{22}&\cdots&F_{2k}\\ \vdots&\vdots&\ddots&\vdots\\ F_{n1}&F_{n2}&\cdots&F_{nk} \end{bmatrix},\qquad F_{ij}=a_{j1}x_{i1}^{*}+a_{j2}x_{i2}^{*}+\cdots+a_{jp}x_{ip}^{*}$
where $F_{ij}$ is the score of sample $i$ on principal component $j$ ($i=1,2,\cdots,n;\ j=1,2,\cdots,k$).
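Putting Steps 1 to 5 together, here is a minimal MATLAB sketch of the score computation on the hypothetical `X`; keeping $k=2$ components is an arbitrary choice for the example. The score matrix is the standardized data multiplied by the first $k$ eigenvectors, so each score column should have variance $\lambda_j$ and the columns should be mutually uncorrelated.

```matlab
% Hypothetical example data: 5 samples (rows) x 3 variables (columns)
X = [2.5 2.4 1.2; 0.5 0.7 0.3; 2.2 2.9 1.1; 1.9 2.2 0.9; 3.1 3.0 1.6];
n = size(X, 1);

Z = (X - repmat(mean(X), n, 1)) ./ repmat(std(X), n, 1);  % Step 1
R = corrcoef(Z);  R = (R + R') / 2;                        % Step 2
[V, D] = eig(R);                                           % Step 3
[lambda, idx] = sort(diag(D), 'descend');
A = V(:, idx);

k = 2;                        % Step 4: hypothetical choice of k
scores = Z * A(:, 1:k);       % Step 5: n-by-k matrix, scores(i,j) = F_ij

% Score variances match the eigenvalues; the off-diagonal covariance is ~0
disp([var(scores)' lambda(1:k)]);
disp(cov(scores));
```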

MATLAB Implementation of Principal Component Analysis

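The following MATLAB function, saved as PCA.m, performs principal component analysis. Its input A is the $n\times p$ data matrix (rows are samples, columns are variables) and T is the cumulative contribution threshold written as a fraction (for example 0.85 for 85%). It returns F, whose columns are the eigenvectors of the retained principal components, and new_score, the matrix of principal component scores.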

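```matlab
function [F,new_score]=PCA(A,T)
% PCA  Principal component analysis based on the correlation matrix.
%   A: n-by-p data matrix (rows = samples, columns = variables)
%   T: cumulative contribution threshold as a fraction, e.g. 0.85
%   F: p-by-k matrix whose columns are the retained eigenvectors
%   new_score: n-by-k matrix of principal component scores
[a,b]=size(A);
% Standardize the data (zero mean, unit variance in every column)
SA=zeros(a,b);
for i=1:b
    SA(:,i)=(A(:,i)-mean(A(:,i)))/std(A(:,i));
end
% Eigenvalues and eigenvectors of the correlation coefficient matrix
CM=corrcoef(SA);
CM=(CM+CM')/2;               % guard against rounding asymmetry
[V,D]=eig(CM);
% eig does not guarantee an order, so sort eigenvalues in decreasing order
[lambda,idx]=sort(diag(D),'descend');
V=V(:,idx);
% DS: column 1 = eigenvalue, column 2 = contribution rate,
%     column 3 = cumulative contribution rate
DS=zeros(b,3);
DS(:,1)=lambda;
DS(:,2)=lambda/sum(lambda);
DS(:,3)=cumsum(DS(:,2));
% Keep the smallest k whose cumulative contribution rate reaches T
% (default to all b components if rounding keeps the total just below T)
Com_num=b;
for K=1:b
    if DS(K,3)>=T
        Com_num=K;
        break;
    end
end
% Eigenvectors of the retained principal components
F=V(:,1:Com_num);
% Principal component scores
new_score=SA*F;
end
```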


Copyright notice
This article was written by [YuNlear]. Please include a link to the original article when reposting. Thank you.
https://yzsam.com/2022/221/202208091453041836.html