当前位置:网站首页>Mahalanobis distance
Mahalanobis distance
2022-08-05 14:33:00 【why why】
马氏距离(Mahalanobis distance)是由印度统计学家马哈拉诺比斯(P. C. Mahalanobis)提出的,表示点与一个分布之间的距离.它是一种有效的计算两个未知样本集的相似度的方法.与欧氏距离不同的是,它考虑到各种特性之间的联系,This article introduces the content related to Mahalanobis distance.
欧氏距离的缺点
Distance metrics are widely used in various disciplines,When the data is represented as a vector\overrightarrow{\mathbf{x} }=\left(x_{1}, x_{2}, \cdots, x_{n}\right)^{T}和\overrightarrow{\mathbf{y}}=\left(y_{1}, y_{2}, \cdots, y_{n}\right)^{T}时,The most intuitive distance metric is the Euclidean distance:
But this measure does not take into account the differences and correlations between dimensions,Different vectors have the same weight when measuring distance,This may interfere with the credibility of the results.
马氏距离
Measures the distance of a sample from a distribution,First normalize the samples and distribution to a multidimensional standard normal distribution and then measure the Euclidean distance
思想
- Rotate the variables according to the principal components,Eliminate correlations between dimensions
- Normalize vectors and distributions,Let each dimension be the same standard normal distribution
推导
- distributed byn个mdimensional vector characterization,即共n条数据,Each piece of data consists of onem维向量表示:
- X的均值为{\mu _X}
- X的协方差矩阵为:
- In order to eliminate the correlation between dimensions,通过一个m \times m的矩阵Q^T对XChange the coordinate table,Map the data to the new coordinate system,用Y表示:
At this point we expectQ^T的作用下,Y 的向量表示中,The different dimensions are independent of each other,此时Y The covariance matrix of should be a diagonal matrix(Except for diagonal elements,其余元素均为0).
- Y 的均值:u_{Y}=Q^{T} u_{X}
- Y 的协方差矩阵:
- 从这里可以发现,当 Q 是\Sigma_{X}When the matrix consists of the eigenvectors of ,\Sigma_{Y} Must be a diagonal matrix,And the value is the eigenvalue corresponding to each eigenvector.由于\Sigma_{X}是对称矩阵,Therefore, it can definitely be obtained by eigendecomposition Q ,且 Q 是正交矩阵.
- \Sigma_{Y}The meaning of the diagonal elements of YThe variance of each vector in ,So both are non-negative values,From this perspective, it can be shown that the eigenvalues of the covariance matrix are non-negative.
- 而且事实上The covariance matrix itself is positive semi-definite,The eigenvalues are all non-negative
- Unrelated and independent issues:
- Here we show that the correlation coefficient between the transformed vectors is 0,That is, the vectors are not correlated
- In fact independence is a stronger constraint than irrelevance,Irrelevance often does not lead to independence
- 但在under a Gaussian distribution,不相关和独立是等价的
Next we normalize the vector
- After we subtract the mean,Vector has become0A vector of means,The distance normalization is just a difference to change the variance to 1
- 在经历了Y=Q^TX变换后,YThe covariance matrix of has become a diagonal matrix,对角线元素为YThe variance of each dimension in the data,Then we just have to letYDivide each dimension data by the standard deviation of the dimension data.
- We will de-correlate、0均值化、The normalized data is recorded as Z:
- The Mahalanobis distance is the metric corrected vectorZto the distribution center(原点)的欧式距离:
参考资料
边栏推荐
猜你喜欢

Memory Management Architecture and Virtual Address Space Layout

C#员工考勤管理系统源码 考勤工资管理系统源码

Shell realizes automatic decompression of encrypted compressed files

Observation cloud product update|DCA web terminal is online; new global viewer auto refresh configuration; new global blacklist function; new custom function menu, etc.

【虚拟机数据恢复】Hyper-V虚拟化文件丢失,虚拟化服务器不可用的数据恢复案例

day14·私有化属性

软件测试之集成测试

NLP 论文领读|无参数机器翻译遇上对比学习:效率和性能我全都要!

全栈软件测试工程师技术涨薪进阶路径图(附资料)
![[CUDA study notes] What is GPU computing](/img/20/6c68dbba66904b58a20c143c0f576d.png)
[CUDA study notes] What is GPU computing
随机推荐
业务局部多线程
Taurus.MVC WebAPI 入门开发教程3:路由类型和路由映射。
Design of Fingerprint Time Attendance Machine + Host Computer Management Based on STM32 MCU
Redis - Talking about master-slave synchronization
HDD杭州站•ArkUI让开发更灵活
The memory problem is difficult to locate, that's because you don't use ASAN
自媒体人必看的9个网站,每一个都很实用,值得收藏
The power behind | Open up a new experience of intelligent teaching Huayun Data helps Tianchang Industrial School to build a new IT training room
20款短视频自媒体必备工具,让你的运营效率翻倍
选择排名靠前的期货公司开户
【CUDA学习笔记】什么是GPU计算
OpenHarmony像素单位(eTS)
Memory Management Architecture and Virtual Address Space Layout
The actual use of EOSJS in China Mobile Chain
荆棘与玫瑰:基础服务架构师的成⻓之路PPT
软件测试之对测试平台的看法
Matplotlib绘制直方图
LeetCode Question of the Day (1706. Where Will the Ball Fall)
umi3.5新特性之提速方案mfsu
拉格朗日乘数法