33 Basic Statistics - One Item Nonparametric Test
2022-08-09 03:33:00 【paper limit】
1.The purpose of the chi-square test and its basic idea
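The purpose of the chi-square test is to use the distribution of the sample data to test whether the population distribution agrees with an expected distribution or with some theoretical distribution. The null hypothesis is that the population distribution from which the sample comes does not differ significantly from the expected or theoretical distribution.
The basic idea of the chi-square test is as follows. Suppose a number of observations are drawn at random from a random variable $X$, and each observation falls into one of $k$ mutually disjoint subsets of the range of $X$. The observed frequencies in the $k$ subsets then follow a multinomial distribution, and as the sample size $n$ grows large, the test statistic built from these frequencies approximately follows a chi-square distribution. Based on this idea, a test of the population distribution of $X$ can start from an analysis of the observed frequencies.
Assuming that the null hypothesis holds, if the probability that a value of the variable falls into the $i$-th subset is $p_i$, the corresponding expected frequency is $np_i$, where $n$ is the sample size. The distribution of the expected frequencies represents the theoretical distribution under the null hypothesis, and a chi-square statistic can be used to test whether the actual distribution differs significantly from the expected one. A typical chi-square statistic is the Pearson statistic, defined as: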
$$X^2=\sum_{i=1}^k{\frac{(\text{observed frequency}_i-\text{expected frequency}_i)^2}{\text{expected frequency}_i}}$$
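Under the null hypothesis, $X^2$ approximately follows a chi-square distribution with $k-1$ degrees of freedom. The larger the value of $X^2$, the greater the gap between the observed frequency distribution and the expected distribution. SPSS computes $X^2$ automatically and derives the corresponding probability, the $p$-value, from the chi-square distribution.
If the $p$-value is less than the significance level, reject the null hypothesis and conclude that the population distribution differs significantly from the expected or theoretical distribution. Conversely, if the $p$-value is greater than the significance level, do not reject the null hypothesis and treat the population distribution as consistent with the expected or theoretical distribution.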
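As a rough illustration of the Pearson statistic described above, the sketch below computes $X^2$ by hand for a fair-die hypothesis. The observed counts are invented for illustration, and the function name `pearson_chi_square` is hypothetical (it is not part of SPSS or any library). Instead of a $p$-value, the sketch compares $X^2$ with the standard tabulated critical value for 5 degrees of freedom at the 0.05 level.

```python
# Pearson chi-square goodness-of-fit statistic, computed by hand.
# The die-roll counts below are invented purely for illustration.

def pearson_chi_square(observed, probs):
    """Return (X^2, degrees of freedom) for observed counts and null probabilities p_i."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)                      # sample size
    expected = [n * p for p in probs]      # expected frequencies n * p_i
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, len(observed) - 1           # k - 1 degrees of freedom

observed = [18, 22, 16, 25, 19, 20]        # hypothetical counts for faces 1..6
probs = [1 / 6] * 6                        # null hypothesis: the die is fair
x2, df = pearson_chi_square(observed, probs)

# Tabulated chi-square critical value for df = 5 at significance level 0.05.
CRITICAL_DF5_ALPHA_05 = 11.070
reject = x2 > CRITICAL_DF5_ALPHA_05

print(f"X^2 = {x2:.3f}, df = {df}, reject H0: {reject}")
```

Comparing against a critical value is equivalent to comparing the $p$-value with the significance level; a statistics package would normally report the $p$-value directly.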
Example: https://blog.csdn.net/snowdroptulip/article/details/78770088 . A fairly detailed example.
2.Binomial test
The purpose of the binomial test is to test whether the observed proportions of the two categories in the sample equal a given test proportion. The null hypothesis is that the population distribution from which the sample comes does not differ significantly from the specified binomial distribution.
The binomial test uses an exact test for small samples and an approximate test for large samples. The exact test computes the probability that the number of successes in $n$ trials is less than or equal to $x$, that is, $P\left\{ X\leqslant x \right\} =\sum_{i=0}^x{C_{n}^{i}p^iq^{n-i}}$, where $p$ is the test proportion and $q=1-p$. For large samples a $Z$ test statistic is used. Under the null hypothesis, $Z$ approximately follows a standard normal distribution and is defined as $Z=\frac{x\pm 0.5-np}{\sqrt{np\left( 1-p \right)}}$. This formula includes a continuity correction: add 0.5 when $x$ is less than $n/2$, and subtract 0.5 when $x$ is greater than $n/2$.
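The two formulas above can be sketched in a few lines of standard-library Python. The function names and the numbers ($n=20$, $p=0.5$, $x=6$) are made up for illustration. The text does not say what to do when $x$ equals $n/2$ exactly, so the sketch assumes no correction in that case.

```python
import math

def binom_cdf(x, n, p):
    """Exact P{X <= x} = sum_{i=0}^{x} C(n, i) p^i q^(n-i), with q = 1 - p."""
    q = 1.0 - p
    return sum(math.comb(n, i) * p**i * q ** (n - i) for i in range(x + 1))

def z_statistic(x, n, p):
    """Large-sample Z with the continuity correction described in the text."""
    if x < n / 2:
        corrected = x + 0.5
    elif x > n / 2:
        corrected = x - 0.5
    else:
        corrected = float(x)  # assumption: no correction when x == n/2
    return (corrected - n * p) / math.sqrt(n * p * (1 - p))

def std_normal_cdf(z):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical data: 6 successes in 20 trials, test proportion 0.5.
n, p, x = 20, 0.5, 6
exact = binom_cdf(x, n, p)          # exact lower-tail probability
z = z_statistic(x, n, p)            # normal approximation statistic
approx = std_normal_cdf(z)          # approximate lower-tail probability

print(f"exact P(X <= {x}) = {exact:.4f}, Z = {z:.4f}, approx = {approx:.4f}")
```

With a sample this small the exact probability is the one to rely on; the normal approximation is shown only to make the correction rule concrete.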
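3.Runs test
The nature of the runs test: first, the variable must be dichotomous, for example a gender variable, that is, a variable that takes only two values. Second, the purpose of the runs test is to determine whether the order of the observations is random. A run is a maximal stretch of consecutive identical values in the sequence. The runs test is the simplest method for judging randomness.
Therefore, in the one-sample test, the null hypothesis is that the sequence is random. The runs test for two independent samples tests whether the two samples come from populations with the same distribution; in that case the null hypothesis is that the population distributions of the two independent samples do not differ significantly.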
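For the one-sample case, a minimal sketch of counting runs might look like the following. The text above does not give a test statistic, so the normal approximation used here (the usual Wald-Wolfowitz mean and variance of the number of runs) is an addition, not something taken from the article. The function name `runs_test` and the sample sequence are hypothetical.

```python
import math

def runs_test(seq):
    """Count runs in a two-valued sequence and return (runs, mean, variance, z).

    mean and variance are those of the number of runs under the null
    hypothesis that the order of the observations is random.
    """
    values = sorted(set(seq))
    if len(values) != 2:
        raise ValueError("the sequence must contain exactly two distinct values")
    # A new run starts wherever two neighbouring observations differ.
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    n1 = sum(1 for v in seq if v == values[0])
    n2 = len(seq) - n1
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = (runs - mean) / math.sqrt(var)
    return runs, mean, var, z

# Hypothetical dichotomous sequence (for example, 1 = male, 0 = female).
seq = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
runs, mean, var, z = runs_test(seq)
print(f"runs = {runs}, expected = {mean:.2f}, z = {z:.3f}")
```

A value of `z` far from zero in either direction (too few or too many runs) would count as evidence against randomness; for short sequences like this one, exact tables are normally preferred over the normal approximation.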
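4.One-sample K-S test (Kolmogorov-Smirnov)
This method is a goodness-of-fit test. It compares the observed cumulative distribution function of a variable with a specified theoretical distribution, mainly the normal, uniform, and Poisson distributions, among others.
The null hypothesis of the one-sample K-S test is that the population distribution from which the sample comes does not differ significantly from the specified theoretical distribution.
The basic idea is as follows:
Assuming that the null hypothesis holds, first compute the theoretical cumulative probability $F(X)$ of each sample observation under the theoretical distribution. Next, compute the actual (empirical) cumulative probability $S(X)$ of each observation. Then compute the difference $D(X)$ between the actual and theoretical cumulative probabilities. Finally, with the observations sorted in ascending order and $S(X_0)=0$, compute the largest absolute difference in the sequence of differences, $D=\underset{1\leqslant i\leqslant n}{\max}\left( |S\left( X_i \right) -F\left( X_i \right) |,|S\left( X_{i-1} \right) -F\left( X_i \right) |\right)$. This $D$ is the K-S statistic.
In small samples, when the null hypothesis holds, the $D$ statistic follows the exact (finite-sample) Kolmogorov distribution. In large samples, when the null hypothesis holds, the statistic $\sqrt{n}D$ approximately follows the Kolmogorov distribution with distribution function $K(x)$. For $x\leqslant 0$, $K(x)=0$; for $x>0$, $K\left( x \right) =\sum_{j=-\infty}^{\infty}{\left( -1 \right) ^j\exp \left( -2j^2x^2 \right)}$.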
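The series for $K(x)$ can be evaluated numerically by truncating it. The sketch below is one way to do that, assuming 100 terms are enough; the function names and the truncation length are choices made here, not part of the original text. It uses the fact that the terms for $j$ and $-j$ are equal, so the sum is $1+2\sum_{j\geqslant 1}(-1)^j\exp(-2j^2x^2)$.

```python
import math

def kolmogorov_cdf(x, terms=100):
    """Truncated series for K(x) = sum_{j=-inf}^{inf} (-1)^j exp(-2 j^2 x^2).

    The j = 0 term is 1 and the terms for j and -j coincide, so only
    positive j are summed and doubled. Returns 0 for x <= 0.
    """
    if x <= 0:
        return 0.0
    tail = sum((-1) ** j * math.exp(-2.0 * j * j * x * x) for j in range(1, terms + 1))
    return 1.0 + 2.0 * tail

def ks_p_value_large_sample(d, n, terms=100):
    """Approximate p-value 1 - K(sqrt(n) * D), valid only for large n."""
    return 1.0 - kolmogorov_cdf(math.sqrt(n) * d, terms)

# K(1.358) is close to 0.95, which matches the familiar large-sample
# critical value of about 1.36 / sqrt(n) at the 0.05 level.
print(f"K(1.358) = {kolmogorov_cdf(1.358):.4f}")
```

The truncation is harmless for moderate and large $x$, where the terms decay very quickly; for $x$ very close to zero the series converges slowly and more terms would be needed.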
K-S test steps:
1)State the hypotheses
2)Compute the empirical distribution function and the theoretical distribution function from the sample data, and substitute them into
$D=\underset{1\leqslant i\leqslant n}{\max}\left( |S\left( X_i \right) -F\left( X_i \right) |,|S\left( X_{i-1} \right) -F\left( X_i \right) |\right)$
3)Look up the table to determine the critical value $D_n(\alpha)$
4)Make a decision
If the value of $D$ computed from the sample is greater than $D_n(\alpha)$, reject the null hypothesis. Otherwise the fit is considered satisfactory, that is, the sample is considered to come from the specified theoretical distribution.
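Steps 2 and 4 above can be sketched as follows, taking $S(X_i)=i/n$ for the sorted sample with $S(X_0)=0$ and assuming no tied observations. The data, the choice of a uniform distribution on $[0,1]$ as the theoretical distribution, and the function names are all hypothetical. The critical value in step 3 still has to come from a table, so it is passed in as an argument rather than computed.

```python
def ks_statistic(sample, cdf):
    """Return D = max_i max(|S(X_i) - F(X_i)|, |S(X_{i-1}) - F(X_i)|).

    The sample is sorted, S(X_i) = i / n is the empirical cumulative
    probability and S(X_0) = 0. Assumes there are no tied observations.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)                                   # theoretical F(X_i)
        d = max(d, abs(i / n - f), abs((i - 1) / n - f))
    return d

def uniform_cdf(x):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))

def ks_decision(d, critical_value):
    """Step 4: reject the null hypothesis when D exceeds the tabulated D_n(alpha)."""
    return d > critical_value

# Hypothetical sample tested against the uniform distribution on [0, 1].
sample = [0.7, 0.1, 0.9, 0.4]
d = ks_statistic(sample, uniform_cdf)
print(f"D = {d:.3f}")
```

Checking both $|S(X_i)-F(X_i)|$ and $|S(X_{i-1})-F(X_i)|$ matters because the empirical distribution function jumps at each observation, and the largest gap can occur just before the jump, as it does for the third observation in this sample.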