Studentized range distribution

Parameters: k > 1, the number of groups; $\nu > 0$, the degrees of freedom
Support: $q \in (0, +\infty)$

In probability and statistics, the studentized range distribution is the continuous probability distribution of the studentized range of an i.i.d. sample from a normally distributed population.

Suppose that we take a sample of size n from each of k populations with the same normal distribution N(μ, σ²), and suppose that $\bar{y}_{\min}$ is the smallest of these sample means, $\bar{y}_{\max}$ is the largest of these sample means, and s² is the pooled sample variance from these samples. Then the following statistic has a studentized range distribution:

$$q = \frac{\bar{y}_{\max} - \bar{y}_{\min}}{s/\sqrt{n}}$$
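As a rough illustration, the statistic can be computed from raw data along the following lines. This is a minimal Python sketch rather than a reference implementation: the function name `studentized_range_statistic` and the example data are hypothetical, and it assumes equal group sizes, so that the pooled variance is the average of the group variances with ν = k(n − 1) degrees of freedom (an assumption made here, not stated in the text above).

```python
import math
import statistics


def studentized_range_statistic(groups):
    """Return (q, k, nu) for k equally sized groups of observations.

    Hypothetical helper: assumes every group has the same size n >= 2.
    """
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least two groups of equal size n >= 2")

    means = [statistics.fmean(g) for g in groups]
    # With equal group sizes the pooled variance is the mean of the
    # per-group sample variances.
    pooled_var = statistics.fmean([statistics.variance(g) for g in groups])
    nu = k * (n - 1)  # assumed degrees of freedom of the pooled variance

    q = (max(means) - min(means)) / math.sqrt(pooled_var / n)
    return q, k, nu


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
    # Means are 2, 3, 5 and every group variance is 1, so q = 3 * sqrt(3).
    print(studentized_range_statistic(data))
```

The resulting q would then be compared with a critical value of the studentized range distribution for the same k and ν.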

Definition

Probability density function

Differentiating the cumulative distribution function with respect to q gives the probability density function.

$$f_R(q;k,\nu) = \frac{\sqrt{2\pi}\,k\,(k-1)\,\nu^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}} \int_{0}^{\infty} s^{\nu}\,\varphi(\sqrt{\nu}\,s) \left[\int_{-\infty}^{\infty} \varphi(z+q\,s)\,\varphi(z)\,\left[\Phi(z+q\,s)-\Phi(z)\right]^{k-2}\,\mathrm{d}z\right]\mathrm{d}s$$

Note that in the outer part of the integral, the equation

$$\varphi(\sqrt{\nu}\,s)\,\sqrt{2\pi} = e^{-\nu s^{2}/2}$$

was used to replace an exponential factor.

Cumulative distribution function

The cumulative distribution function is given by[1]

$$F_R(q;k,\nu) = \frac{\sqrt{2\pi}\,k\,\nu^{\nu/2}}{\Gamma(\nu/2)\,2^{\nu/2-1}} \int_{0}^{\infty} s^{\nu-1}\,\varphi(\sqrt{\nu}\,s) \left[\int_{-\infty}^{\infty} \varphi(z)\,\left[\Phi(z+q\,s)-\Phi(z)\right]^{k-1}\,\mathrm{d}z\right]\mathrm{d}s$$
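The double integral can be evaluated numerically. The following Python sketch uses only the standard library and a composite Simpson rule; the function names, the truncation limits (±8 for z, 1 + 8/√ν for s) and the grid sizes are illustrative assumptions, not part of the cited algorithm. It rewrites the prefactor and outer weight as the density of s, computed through logarithms so that large ν does not overflow, and it is meant for moderate ν (roughly ν ≥ 1).

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def inner_term(r, k):
    """k times the inner integral: k * int phi(z) [Phi(z+r) - Phi(z)]^(k-1) dz."""
    if r <= 0.0:
        return 0.0
    return k * simpson(lambda z: phi(z) * (Phi(z + r) - Phi(z)) ** (k - 1),
                       -8.0, 8.0)


def outer_weight(s, nu):
    """nu^(nu/2) s^(nu-1) exp(-nu s^2 / 2) / (2^(nu/2-1) Gamma(nu/2)), for s > 0."""
    log_w = (0.5 * nu * math.log(nu) + (nu - 1.0) * math.log(s)
             - 0.5 * nu * s * s - (0.5 * nu - 1.0) * math.log(2.0)
             - math.lgamma(0.5 * nu))
    return math.exp(log_w)


def studentized_range_cdf(q, k, nu, n=200):
    """Approximate F_R(q; k, nu) by nested Simpson quadrature."""
    if q <= 0.0:
        return 0.0
    upper = 1.0 + 8.0 / math.sqrt(nu)  # assumed cut-off for the s integral

    def integrand(s):
        if s == 0.0:
            return 0.0  # the inner term vanishes at s = 0
        return outer_weight(s, nu) * inner_term(q * s, k)

    return simpson(integrand, 0.0, upper, n)


if __name__ == "__main__":
    # For k = 2 the statistic equals sqrt(2) * |t|, so the two-sided 5% point
    # of Student's t with 10 degrees of freedom (about 2.228) should map to
    # a probability close to 0.95.
    print(studentized_range_cdf(math.sqrt(2.0) * 2.228, k=2, nu=10))
```

A fixed grid is used here for simplicity; production code would normally use adaptive quadrature or a dedicated routine.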

Special cases

If k is 2 or 3,[2] the studentized range probability density function can be evaluated directly, where $\varphi(z)$ is the standard normal probability density function and $\Phi(z)$ is the standard normal cumulative distribution function. These expressions contain no ν: they are the densities of the range of k standard normal draws, that is, the case of infinitely many degrees of freedom.

$$f_R(q;k=2) = \sqrt{2}\,\varphi\left(q/\sqrt{2}\right)$$

$$f_R(q;k=3) = 6\sqrt{2}\,\varphi\left(q/\sqrt{2}\right)\left[\Phi\left(q/\sqrt{6}\right) - \tfrac{1}{2}\right]$$

When the number of degrees of freedom approaches infinity, the studentized range cumulative distribution function can be calculated for any k using the standard normal distribution.

$$F_R(q;k) = k\int_{-\infty}^{\infty} \varphi(z)\,\left[\Phi(z+q)-\Phi(z)\right]^{k-1}\,\mathrm{d}z = k\int_{-\infty}^{\infty} \left[\Phi(z+q)-\Phi(z)\right]^{k-1}\,\mathrm{d}\Phi(z)$$
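The closed forms for k = 2 and k = 3 can be checked against the infinite-degrees-of-freedom formula by differentiating it numerically. The sketch below is only a sanity check under stated assumptions: the function names, the integration limits of ±9, and the finite-difference step are arbitrary choices made for illustration.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simpson(f, a, b, n=400):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def limit_cdf(q, k):
    """F_R(q; k) for infinite degrees of freedom, by quadrature over z."""
    return k * simpson(lambda z: phi(z) * (Phi(z + q) - Phi(z)) ** (k - 1),
                       -9.0, 9.0)


def numeric_density(q, k, h=1e-3):
    """Central-difference approximation to the derivative of limit_cdf in q."""
    return (limit_cdf(q + h, k) - limit_cdf(q - h, k)) / (2.0 * h)


def density_k2(q):
    """Closed form for k = 2."""
    return math.sqrt(2.0) * phi(q / math.sqrt(2.0))


def density_k3(q):
    """Closed form for k = 3."""
    return (6.0 * math.sqrt(2.0) * phi(q / math.sqrt(2.0))
            * (Phi(q / math.sqrt(6.0)) - 0.5))


if __name__ == "__main__":
    for q in (0.5, 1.0, 2.0, 3.0):
        print(q,
              numeric_density(q, 2) - density_k2(q),
              numeric_density(q, 3) - density_k3(q))
```

The printed differences should be close to zero, limited mainly by the finite-difference step.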

Applications

Critical values of the studentized range distribution are used in Tukey's range test.[3]

The studentized range is used to calculate significance levels for results obtained by data mining, where one selectively seeks extreme differences in sample data, rather than only sampling randomly.

The studentized range distribution has applications to hypothesis testing and multiple comparisons procedures. For example, Tukey's range test and Duncan's new multiple range test (MRT), in which the sample x1, ..., xn is a sample of means and q is the basic test statistic, can be used as post-hoc analysis to test between which two group means there is a significant difference (pairwise comparisons) after rejecting the null hypothesis that all groups are from the same population (i.e. all means are equal) by the standard analysis of variance.[4]

Related distributions

When only the equality of two group means is in question (i.e. whether μ1 = μ2), the studentized range distribution is similar to the Student's t distribution, differing only in that the first takes into account the number of means under consideration, and the critical value is adjusted accordingly. The more means under consideration, the larger the critical value is. This makes sense since the more means there are, the greater the probability that at least some differences between pairs of means will be large due to chance alone.

Derivation

The studentized range distribution function arises from re-scaling the sample range R by the sample standard deviation s, since the studentized range is customarily tabulated in units of standard deviations, with the variable q = R/s. The derivation begins with a perfectly general form of the distribution function of the sample range, which applies to any sample data distribution.

In order to obtain the distribution in terms of the "studentized" range q, we will change variable from R to s and q. Assuming the sample data is normally distributed, the standard deviation s will be χ distributed. By further integrating over s we can remove s as a parameter and obtain the re-scaled distribution in terms of q alone.

General form

For any probability density function $f_X$, the range probability density $f_R$ is:[2]

$$f_R(r;k) = k\,(k-1)\int_{-\infty}^{\infty} f_X\left(t+\tfrac{1}{2}r\right) f_X\left(t-\tfrac{1}{2}r\right) \left[\int_{t-\frac{1}{2}r}^{t+\frac{1}{2}r} f_X(x)\,\mathrm{d}x\right]^{k-2}\,\mathrm{d}t$$

What this means is that we are adding up the probabilities that, given k draws from a distribution, two of them differ by r, and the remaining k − 2 draws all fall between the two extreme values. If we change variables to u, where $u = t - \tfrac{1}{2}r$ is the low end of the range, and define $F_X$ as the cumulative distribution function of $f_X$, then the equation can be simplified:

$$f_R(r;k) = k\,(k-1)\int_{-\infty}^{\infty} f_X(u+r)\,f_X(u)\,\left[F_X(u+r)-F_X(u)\right]^{k-2}\,\mathrm{d}u$$

We introduce a similar integral, and notice that differentiating under the integral sign gives

$$\frac{\partial}{\partial r}\left[k\int_{-\infty}^{\infty} f_X(u)\,\left[F_X(u+r)-F_X(u)\right]^{k-1}\,\mathrm{d}u\right] = k\,(k-1)\int_{-\infty}^{\infty} f_X(u+r)\,f_X(u)\,\left[F_X(u+r)-F_X(u)\right]^{k-2}\,\mathrm{d}u$$

which recovers the integral above,[a] so the last relation confirms

$$F_R(r;k) = k\int_{-\infty}^{\infty} f_X(u)\,\left[F_X(u+r)-F_X(u)\right]^{k-1}\,\mathrm{d}u = k\int_{-\infty}^{\infty} \left[F_X(u+r)-F_X(u)\right]^{k-1}\,\mathrm{d}F_X(u)$$

because for any continuous cdf

$$\frac{\partial F_R(r;k)}{\partial r} = f_R(r;k)$$
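Because the general form applies to any sample distribution, it can be tried on a case where the integrals have simple answers. The sketch below uses the unit exponential distribution, chosen here purely for illustration (it is not discussed in the text): substituting it into the formulas above gives $F_R(r;k) = (1-e^{-r})^{k-1}$ and $f_R(r;k) = (k-1)\,e^{-r}(1-e^{-r})^{k-2}$, which the quadrature should reproduce. The function names and the truncation of the integral at u = 40 are assumptions of the example.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def range_cdf(r, k, pdf, cdf, lo, hi):
    """F_R(r; k) = k * int f_X(u) [F_X(u+r) - F_X(u)]^(k-1) du over [lo, hi]."""
    return k * simpson(lambda u: pdf(u) * (cdf(u + r) - cdf(u)) ** (k - 1),
                       lo, hi)


def range_pdf(r, k, pdf, cdf, lo, hi):
    """f_R(r; k) = k (k-1) * int f_X(u+r) f_X(u) [F_X(u+r) - F_X(u)]^(k-2) du."""
    return k * (k - 1) * simpson(
        lambda u: pdf(u + r) * pdf(u) * (cdf(u + r) - cdf(u)) ** (k - 2),
        lo, hi)


def exp_pdf(x):
    """Density of the unit exponential distribution."""
    return math.exp(-x) if x >= 0.0 else 0.0


def exp_cdf(x):
    """Cumulative distribution function of the unit exponential distribution."""
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0


if __name__ == "__main__":
    r, k = 1.3, 4
    print(range_cdf(r, k, exp_pdf, exp_cdf, 0.0, 40.0),
          (1.0 - math.exp(-r)) ** (k - 1))
    print(range_pdf(r, k, exp_pdf, exp_cdf, 0.0, 40.0),
          (k - 1) * math.exp(-r) * (1.0 - math.exp(-r)) ** (k - 2))
```

Each printed pair should agree closely, which also illustrates that the density is the derivative of the distribution function in this case.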

Special form for normal data

The range distribution is most often used for confidence intervals around sample averages, which are asymptotically normally distributed by the central limit theorem.

In order to create the studentized range distribution for normal data, we first switch from the generic $f_X$ and $F_X$ to the distribution functions φ and Φ for the standard normal distribution, and change the variable r to s·q, where q is a fixed factor that re-scales r by the scaling factor s:

$$f_R(q;k) = s\,k\,(k-1)\int_{-\infty}^{\infty} \varphi(u+sq)\,\varphi(u)\,\left[\Phi(u+sq)-\Phi(u)\right]^{k-2}\,\mathrm{d}u$$

Choose the scaling factor s to be the sample standard deviation, so that q becomes the number of standard deviations wide that the range is. For normal data s is chi distributed[b] and the probability density function $f_S$ of the chi distribution is given by:

$$f_S(s;\nu)\,\mathrm{d}s = \begin{cases} \dfrac{\nu^{\nu/2}\,s^{\nu-1}\,e^{-\nu s^{2}/2}}{2^{\nu/2-1}\,\Gamma(\nu/2)}\,\mathrm{d}s & \text{for } 0 < s < \infty, \\[4pt] 0 & \text{otherwise}. \end{cases}$$
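A quick numerical check of this density is sketched below. It confirms that $f_S$ integrates to one and that s² has mean one, which is what one expects when s is measured in units of the true standard deviation (a standard property added here for illustration, not stated in the text). The function names, the cut-off 1 + 8/√ν, and the restriction to ν > 1 are assumptions of the sketch.

```python
import math


def scaled_chi_pdf(s, nu):
    """Density f_S(s; nu) from the text, via logarithms; intended for nu > 1."""
    if s <= 0.0:
        return 0.0  # for nu > 1 the density is zero at s = 0
    log_pdf = (0.5 * nu * math.log(nu) + (nu - 1.0) * math.log(s)
               - 0.5 * nu * s * s - (0.5 * nu - 1.0) * math.log(2.0)
               - math.lgamma(0.5 * nu))
    return math.exp(log_pdf)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def moment(power, nu):
    """Integral of s**power * f_S(s; nu) over an assumed range [0, 1 + 8/sqrt(nu)]."""
    upper = 1.0 + 8.0 / math.sqrt(nu)
    return simpson(lambda s: s ** power * scaled_chi_pdf(s, nu), 0.0, upper)


if __name__ == "__main__":
    for nu in (3, 10, 50):
        # Total probability and the mean of s squared; both should be near 1.
        print(nu, moment(0, nu), moment(2, nu))
```

Working with logarithms avoids overflow in $\nu^{\nu/2}$ and $\Gamma(\nu/2)$ when ν is large.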

Multiplying the distributions $f_R$ and $f_S$ and integrating to remove the dependence on the standard deviation s gives the studentized range distribution function for normal data:

$$f_R(q;k,\nu) = \frac{\nu^{\nu/2}\,k\,(k-1)}{2^{\nu/2-1}\,\Gamma(\nu/2)} \int_{0}^{\infty} s^{\nu}\,e^{-\nu s^{2}/2} \int_{-\infty}^{\infty} \varphi(u+sq)\,\varphi(u)\,\left[\Phi(u+sq)-\Phi(u)\right]^{k-2}\,\mathrm{d}u\,\mathrm{d}s$$

where

q is the width of the data range measured in standard deviations,
ν is the number of degrees of freedom for determining the sample standard deviation,[c] and
k is the number of separate averages that form the points within the range.

The equation for the pdf shown in the sections above comes from using

$$e^{-\nu s^{2}/2} = \sqrt{2\pi}\,\varphi(\sqrt{\nu}\,s)$$

to replace the exponential expression in the outer integral.

Notes

  a. ^ Technically, the relation is only true for points $u$ where $f_X(u+r) > 0$, which holds everywhere for normal data as discussed in the next section, but not for distributions whose support has an upper bound, like uniformly distributed data.
  b. ^ Note well the absence of "squared": the text refers to the χ distribution, not the χ² distribution.
  c. ^ Usually $\nu = n - 1$, where n is the total number of all datapoints used to find the averages that are the values in the range.

References

  1. ^ Lund, R.E.; Lund, J.R. (1983). "Algorithm AS 190: Probabilities and upper quantiles for the studentized range". Journal of the Royal Statistical Society. 32 (2): 204–210. JSTOR 2347300.
  2. ^ a b McKay, A.T. (1933). "A note on the distribution of range in samples of n". Biometrika. 25 (3): 415–420. doi:10.2307/2332292. JSTOR 2332292.
  3. ^ "StatsExamples | table of Q distribution critical values for alpha=0.05".
  4. ^ Pearson & Hartley (1970, Section 14.2)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Studentized_range_distribution&oldid=1082889807"