
33 Basic Statistics - One-Sample Nonparametric Tests

2022-08-09 03:33:00 【paper limit】

1.The purpose of the chi-square test and its basic idea

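The purpose of the chi-square test is to use the distribution of the sample data to test whether the population distribution agrees with an expected distribution or a given theoretical distribution. The null hypothesis is that the population from which the sample is drawn does not differ significantly from the expected or theoretical distribution.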
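The basic idea of the chi-square test is as follows. If $n$ observations are drawn at random from a random variable $X$, the observed frequencies with which they fall into $k$ mutually disjoint subsets of the range of $X$ follow a multinomial distribution. As the number of observations $n$ tends to infinity, the Pearson statistic built from these frequencies approximately follows a chi-square distribution. Based on this idea, a test of the population distribution of $X$ can start from an analysis of the observed frequencies.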
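Under the assumption that the null hypothesis holds, if the probability that a value of the variable falls in the $i$-th subset is $p_i$, the corresponding expected frequency is $n p_i$, where $n$ is the number of observations. The expected frequencies represent the theoretical distribution under the null hypothesis, and a chi-square statistic can be used to test whether the actual distribution differs significantly from the expected one. A typical chi-square statistic is the Pearson statistic, defined as:

$X^2=\sum_{i=1}^k{\frac{\left( \text{observed frequency}_i-\text{expected frequency}_i \right) ^2}{\text{expected frequency}_i}}$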

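Under the null hypothesis, $X^2$ approximately follows a chi-square distribution with $k - 1$ degrees of freedom. The larger the value of $X^2$, the greater the gap between the observed frequency distribution and the expected distribution. SPSS computes $X^2$ automatically and derives the corresponding $p$-value from the chi-square distribution.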
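If the $p$-value is less than the significance level, reject the null hypothesis and conclude that the population distribution differs significantly from the expected or theoretical distribution. Conversely, if the $p$-value is greater than the significance level, do not reject the null hypothesis: the data are consistent with the expected or theoretical distribution.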
Example: https://blog.csdn.net/snowdroptulip/article/details/78770088 (a fairly detailed worked example).
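
The Pearson statistic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the die-roll counts are hypothetical data, the function name `pearson_chi_square` is made up for this sketch, and the decision uses the tabulated 5% critical value for 5 degrees of freedom (11.070) instead of the $p$-value that SPSS reports.

```python
def pearson_chi_square(observed, probs):
    """Return (X^2, degrees of freedom) for observed counts and null probabilities p_i."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)                   # total number of observations
    expected = [n * p for p in probs]   # expected frequencies n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1      # k - 1 degrees of freedom


# Hypothetical data: 120 rolls of a die; null hypothesis: all faces equally likely.
observed = [22, 17, 20, 26, 22, 13]
probs = [1 / 6] * 6
x2, df = pearson_chi_square(observed, probs)

# Tabulated upper 5% point of the chi-square distribution with 5 degrees of freedom.
CRITICAL_5_DF_ALPHA_05 = 11.070
reject = x2 > CRITICAL_5_DF_ALPHA_05
print(x2, df, reject)
```

With these counts every expected frequency is 20, so $X^2 = 102/20 \approx 5.1$, which is below the critical value, and the null hypothesis is not rejected.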

2.Binomial test

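The purpose of the binomial test is to check whether the observed frequencies of the two categories in the sample equal a given test proportion. The null hypothesis is that the population from which the sample is drawn does not differ significantly from the specified binomial distribution.
The binomial test uses an exact method for small samples and an approximate method for large samples. The exact method computes the probability that the number of successes in $n$ trials is less than or equal to $x$, that is, $P\left\{ X\leqslant x \right\} =\sum_{i=0}^x{C_{n}^{i}p^iq^{n-i}}$, where $q = 1 - p$. For large samples a $Z$ test statistic is used; under the null hypothesis it approximately follows a standard normal distribution and is defined as $Z=\frac{x\pm 0.5-np}{\sqrt{np\left( 1-p \right)}}$. This formula includes a continuity correction: add 0.5 when $x$ is less than $n / 2$, and subtract 0.5 when $x$ is greater than $n / 2$.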
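
Both formulas are easy to evaluate directly. The sketch below is a minimal illustration, not SPSS's implementation: the function names and the example values ($x = 6$, $n = 20$, $p = 0.5$) are hypothetical, and the case $x = n/2$, which the text does not cover, is handled by applying no correction (an assumption).

```python
import math


def binomial_exact_lower(x, n, p):
    """Exact P{X <= x} for X ~ Binomial(n, p)."""
    q = 1.0 - p
    return sum(math.comb(n, i) * p**i * q**(n - i) for i in range(x + 1))


def binomial_z(x, n, p):
    """Z statistic with the continuity correction described in the text."""
    if x < n / 2:
        corrected = x + 0.5
    elif x > n / 2:
        corrected = x - 0.5
    else:
        corrected = x  # assumption: no correction when x == n / 2
    return (corrected - n * p) / math.sqrt(n * p * (1 - p))


# Hypothetical example: 6 successes in 20 trials, test proportion 0.5.
p_lower = binomial_exact_lower(6, 20, 0.5)
z = binomial_z(6, 20, 0.5)
print(p_lower, z)
```

Here the exact lower-tail probability is $60460 / 2^{20} \approx 0.0577$ and $Z = (6.5 - 10)/\sqrt{5} \approx -1.565$.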

3.Runs test

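Properties of the runs test: first, the variable must be dichotomous, that is, it takes only two values (gender, for example, or a variable coded with just two numbers). Second, the purpose of the runs test is to judge whether the order in which the observations occur is random. The runs test is one of the simplest ways to assess randomness.
Accordingly, in the one-sample test the null hypothesis is that the sequence is random. The runs test for two independent samples instead checks whether the two samples come from populations with the same distribution; its null hypothesis is that the population distributions of the two independent samples do not differ significantly.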
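
The text does not give a formula for the one-sample runs test, so the sketch below fills that in with the standard large-sample normal approximation (the Wald-Wolfowitz runs test): with $n_1$ and $n_2$ observations of each value, the number of runs $R$ has mean $2 n_1 n_2/(n_1+n_2) + 1$ and variance $2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2) / ((n_1+n_2)^2 (n_1+n_2-1))$ under randomness. The function name and the 0/1 sequence are hypothetical.

```python
import math


def runs_test(sequence):
    """Return (number of runs, Z) for a list with exactly two distinct values."""
    values = sorted(set(sequence))
    if len(values) != 2:
        raise ValueError("the sequence must contain exactly two distinct values")
    n1 = sum(1 for v in sequence if v == values[0])
    n2 = len(sequence) - n1
    # A run starts at the first element and wherever the value changes.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return runs, (runs - mean) / math.sqrt(var)


# Hypothetical dichotomous sequence (for example, 1 = male, 0 = female).
sequence = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]
runs, z = runs_test(sequence)
print(runs, z)
```

This sequence has 8 runs with $n_1 = n_2 = 7$, which equals the expected number of runs, so $Z = 0$ and there is no evidence against randomness. A large $|Z|$ (too few or too many runs) would argue against the null hypothesis.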

4.One-sample K-S test (Kolmogorov-Smirnov)

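This is a goodness-of-fit test: it compares the observed cumulative distribution function of a variable with a specified theoretical distribution, mainly the normal, uniform, and Poisson distributions.
The null hypothesis of the one-sample K-S test is that the population from which the sample is drawn does not differ significantly from the specified theoretical distribution.
The basic idea is as follows:
Under the assumption that the null hypothesis holds, first compute the theoretical cumulative probability $F (X)$ of each sample observation under the theoretical distribution. Next compute the empirical cumulative probability $S (X)$ of each observation. Then compute the difference $D (X)$ between the empirical and theoretical cumulative probabilities. Finally take the largest absolute difference, $D=\underset{1\leqslant i\leqslant n}{\max}\left( |S\left( X_i \right) -F\left( X_i \right) |,|S\left( X_{i-1} \right) -F\left( X_i \right) |\right) $, where the $X_i$ are sorted in increasing order and $S\left( X_0 \right) = 0$. This $D$ is the $K - S$ statistic.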
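In small samples, when the null hypothesis holds, the $D$ statistic follows the exact (finite-sample) Kolmogorov distribution. In large samples, when the null hypothesis holds, $\sqrt{n} D$ approximately follows the Kolmogorov distribution with distribution function $K(x)$: when $x \leqslant 0$, $K(x) = 0$; when $x > 0$, $K\left( x \right) =\sum_{j=-\infty}^{\infty}{\left( -1 \right) ^j\exp \left( -2j^2x^2 \right)}$ .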
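
The series for $K(x)$ converges quickly unless $x$ is close to zero, so a truncated sum is usually enough. The following is a rough sketch under stated assumptions: the function names are hypothetical, the sum is cut off at 100 terms, and values of $x$ below 0.2 are treated as giving $K(x) = 0$ because the true value there is smaller than $10^{-12}$. The large-sample $p$-value is then approximated as $1 - K(\sqrt{n} D)$.

```python
import math


def kolmogorov_cdf(x, max_terms=100, tol=1e-16):
    """Truncated series for K(x) = sum over all j of (-1)^j * exp(-2 j^2 x^2)."""
    if x < 0.2:
        # K(x) = 0 for x <= 0; for 0 < x < 0.2 the true value is below 1e-12
        # and the alternating series converges slowly, so return 0 (assumption).
        return 0.0
    total = 1.0  # the j = 0 term
    for j in range(1, max_terms + 1):
        term = math.exp(-2.0 * j * j * x * x)
        total += 2.0 * (-1) ** j * term  # the j and -j terms are equal
        if term < tol:
            break
    return min(1.0, max(0.0, total))


def ks_asymptotic_p_value(d, n):
    """Large-sample approximation of the p-value: 1 - K(sqrt(n) * D)."""
    return 1.0 - kolmogorov_cdf(math.sqrt(n) * d)


print(kolmogorov_cdf(1.36))  # close to 0.95
```

The value $K(1.36) \approx 0.95$ matches the familiar large-sample rule that $\sqrt{n} D$ above roughly 1.36 is significant at the 5% level.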
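$K - S$ test procedure:
1）State the null and alternative hypotheses
2）From the sample data, compute the empirical distribution function and the theoretical distribution function, then substitute them into
$D=\underset{1\leqslant i\leqslant n}{\max}\left( |S\left( X_i \right) -F\left( X_i \right) |,|S\left( X_{i-1} \right) -F\left( X_i \right) |\right) $
3）Look up the critical value $D_n(\alpha)$ in a table
4）Make a decision
If the $D$ computed from the sample satisfies $D>D_n(\alpha)$, reject the null hypothesis; otherwise the fit is considered satisfactory, that is, the sample is taken to come from the specified theoretical distribution.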
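
Steps 2 and 4 can be sketched as follows. This is an illustration only: the sample values are hypothetical, the theoretical distribution is assumed to be the standard normal, and the critical value is left as an argument that the reader takes from a K-S table in step 3.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Theoretical distribution function F(x) of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """D = max_i max(|S(X_i) - F(X_i)|, |S(X_{i-1}) - F(X_i)|) over the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # S(X_i) = i / n and S(X_{i-1}) = (i - 1) / n, with S(X_0) = 0.
        d = max(d, abs(i / n - f), abs((i - 1) / n - f))
    return d


def ks_reject(d, critical_value):
    """Step 4: reject the null hypothesis when D exceeds the tabulated critical value."""
    return d > critical_value


# Hypothetical sample tested against the standard normal distribution.
sample = [0.4, -1.2, 1.3, 0.1, -0.5]
d = ks_statistic(sample, normal_cdf)
print(d)
```

For this sample the largest gap occurs at the observation 0.4, where $S = 0.8$ and $F \approx 0.655$, giving $D \approx 0.145$. Checking both $S(X_i)$ and $S(X_{i-1})$ matters because the empirical distribution function jumps at each observation.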