
Tukey's range test

From Wikipedia, the free encyclopedia
Statistical test for multiple comparisons
Not to be confused with Tukey mean-difference test.

Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSD (honestly significant difference) test,[1] is a single-step multiple comparison procedure and statistical test. It can be used to correctly interpret the statistical significance of the difference between means that have been selected for comparison because of their extreme values.

The method was initially developed and introduced by John Tukey for use in analysis of variance (ANOVA), and has usually been taught only in connection with ANOVA. However, the studentized range distribution used to determine the level of significance of the differences considered in Tukey's test has much broader application. It is useful for researchers who have searched their collected data for remarkable differences between groups, but then cannot validly determine how significant their discovered stand-out difference is using the standard distributions of other conventional statistical tests, for which the data must have been selected at random. Stand-out data is by definition not selected at random, but specifically chosen because it is extreme, so it needs a different, stricter interpretation, provided by the likely frequency and size of the studentized range. The modern practice of "data mining" is an example where it is used.

Development


The test was devised by John Tukey.[2] It compares all possible pairs of means and is based on a studentized range distribution (q), which is similar to the distribution of t from the t-test (see below).[3][full citation needed]

Tukey's test compares the means of every treatment to the means of every other treatment; that is, it applies simultaneously to the set of all pairwise comparisons

$$\mu_i - \mu_j,$$

and identifies any difference between two means that is greater than the expected standard error. The confidence coefficient for the set, when all sample sizes are equal, is exactly $1 - \alpha$ for any $\alpha$ with $0 \leq \alpha \leq 1$. For unequal sample sizes, the confidence coefficient is greater than $1 - \alpha$. In other words, the Tukey method is conservative when there are unequal sample sizes.
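As a small illustration of what "all pairwise comparisons" means in practice, the sketch below enumerates the pairs for a hypothetical set of treatment labels (the labels are made up for this example). With k treatments there are k(k − 1)/2 such comparisons.

```python
from itertools import combinations

# Hypothetical treatment labels; any hashable identifiers would do.
treatments = ["A", "B", "C", "D"]

# Every unordered pair (i, j) with i != j is one comparison mu_i - mu_j.
pairs = list(combinations(treatments, 2))

k = len(treatments)
print(f"{k} treatments -> {len(pairs)} pairwise comparisons")
for first, second in pairs:
    print(f"mu_{first} - mu_{second}")
```

The number of comparisons grows quadratically with k, which is why a correction for the whole set is needed.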

This test is often followed by the Compact Letter Display (CLD) statistical procedure to render the output of this test more transparent to non-statistician audiences.

Assumptions

  1. The observations being tested are independent within and among the groups.[citation needed]
  2. The subgroups associated with each mean in the test are normally distributed.[citation needed]
  3. There is equal within-subgroup variance across the subgroups associated with each mean in the test (homogeneity of variance).[citation needed]

The test statistic


Tukey's test is based on a formula very similar to that of the t-test. In fact, Tukey's test is essentially a t-test, except that it corrects for family-wise error rate.

The formula for Tukey's test is

$$q_s = \frac{\left|Y_A - Y_B\right|}{SE},$$

where $Y_A$ and $Y_B$ are the two means being compared, and SE is the standard error for the sum of the means. The value $q_s$ is the sample's test statistic. (The notation $|x|$ means the absolute value of $x$: the magnitude of $x$ with the sign set to +, regardless of the original sign of $x$.)

This $q_s$ test statistic can then be compared to a $q$ value for the chosen significance level $\alpha$ from a table of the studentized range distribution. If the $q_s$ value is larger than the critical value $q_\alpha$ obtained from the distribution, the two means are said to be significantly different at level $\alpha$, where $0 \leq \alpha \leq 1$.[3]
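The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the group sizes are equal, SE is taken as sqrt(MSE / n) (the convention matching most printed q tables), and the critical value 3.77 is an assumed table entry for α = 0.05, k = 3, df = 12 that should be checked against your own table or software. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pooled_variance(groups):
    """Within-group (error) mean square across all groups."""
    ss_within = 0.0
    for values in groups.values():
        m = sum(values) / len(values)
        ss_within += sum((x - m) ** 2 for x in values)
    df_error = sum(len(v) for v in groups.values()) - len(groups)
    return ss_within / df_error


def tukey_q_statistics(groups):
    """Return {(a, b): q_s} for every pair of equal-sized groups.

    Assumption: SE = sqrt(MSE / n), i.e. no sqrt(2) factor.
    """
    sizes = {len(v) for v in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()
    se = sqrt(pooled_variance(groups) / n)
    means = {name: sum(v) / n for name, v in groups.items()}
    return {(a, b): abs(means[a] - means[b]) / se
            for a, b in combinations(means, 2)}


# Made-up measurements: k = 3 groups, n = 5 each, so df = N - k = 12.
data = {
    "A": [10, 12, 11, 13, 14],
    "B": [11, 13, 12, 14, 15],
    "C": [15, 17, 16, 18, 19],
}
# Assumed table value for alpha = 0.05, k = 3, df = 12.
Q_CRIT = 3.77

q_values = tukey_q_statistics(data)
for pair, q_s in q_values.items():
    verdict = "significant" if q_s > Q_CRIT else "not significant"
    print(pair, round(q_s, 3), verdict)
```

The critical value is passed in as a constant on purpose, so the sketch does not depend on any particular statistics library.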

Since the null hypothesis for Tukey's test states that all means being compared are from the same population (i.e. $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$), the means should be normally distributed (according to the central limit theorem) with the same model standard deviation $\sigma$, estimated by the merged standard error, SE, for all the samples; its calculation is discussed in the following sections. This gives rise to the normality assumption of Tukey's test.

The studentized range (q) distribution


The Tukey method uses the studentized range distribution. Suppose that we take a sample of size $n$ from each of $k$ populations with the same normal distribution $N(\mu, \sigma^2)$, and suppose that $\bar{y}_{\min}$ is the smallest of these sample means, $\bar{y}_{\max}$ is the largest of these sample means, and $S^2$ is the pooled sample variance from these samples. Then the following random variable has a studentized range distribution:

$$q \equiv \frac{\bar{y}_{\max} - \bar{y}_{\min}}{S\sqrt{2/n}}$$

This definition of the statistic $q$ is the basis of the critical value $q_\alpha$ discussed below, which depends on three factors:

  • $\alpha$: the Type I error rate, or the probability of rejecting a true null hypothesis;
  • $k$: the number of sub-populations being compared;
  • $\mathrm{df}$: the number of degrees of freedom for each mean ($\mathrm{df} = N - k$, where $N$ is the total number of observations).
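One way to get a feel for this distribution is to simulate it. The following is a minimal Monte Carlo sketch, not a replacement for tables or library routines: it draws k samples of size n from a standard normal distribution, forms the studentized range, and reads off an empirical quantile. The function names and the `sqrt2_in_denominator` switch are inventions of this example; the switch selects between the denominator S·sqrt(2/n) used in the definition above and S/sqrt(n), the form without the sqrt(2) factor.

```python
import random
from math import sqrt


def simulate_q(k, n, reps, sqrt2_in_denominator=True, seed=0):
    """Return `reps` sorted draws of the studentized range statistic."""
    rng = random.Random(seed)
    scale = sqrt(2.0 / n) if sqrt2_in_denominator else sqrt(1.0 / n)
    draws = []
    for _ in range(reps):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        means = [sum(s) / n for s in samples]
        # Pooled variance S^2 with N - k = k * (n - 1) degrees of freedom.
        ss = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
        s_pooled = sqrt(ss / (k * (n - 1)))
        draws.append((max(means) - min(means)) / (s_pooled * scale))
    draws.sort()
    return draws


def empirical_quantile(sorted_draws, p):
    """Simple order-statistic estimate of the p-quantile."""
    index = min(len(sorted_draws) - 1, int(p * len(sorted_draws)))
    return sorted_draws[index]


# alpha = 0.05, k = 3 groups of n = 5, so df = N - k = 12.
draws = simulate_q(k=3, n=5, reps=20000, sqrt2_in_denominator=False)
q_crit_estimate = empirical_quantile(draws, 0.95)
print(round(q_crit_estimate, 2))
```

Because the estimate carries Monte Carlo noise, it should only be expected to land near a tabulated critical value for the same α, k, and df, not to reproduce it exactly.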

The distribution of $q$ has been tabulated and appears in many textbooks on statistics. In some tables the distribution of $q$ has been tabulated without the $\sqrt{2}$ factor. To determine which kind of table it is, we can compute the result for $k = 2$ and compare it to the result of the Student's t-distribution with the same degrees of freedom and the same $\alpha$. In addition, R offers a cumulative distribution function (ptukey) and a quantile function (qtukey) for $q$.
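The $k = 2$ check can be written out explicitly. As a sketch, assume equal group sizes $n$, let $t$ be the usual pooled two-sample t statistic, and let $t_{\alpha/2;\,\mathrm{df}}$ denote the upper $\alpha/2$ quantile of the t-distribution (notation introduced here for illustration). With only two means, the range is simply their absolute difference:

```latex
% Pooled two-sample t statistic with equal group sizes n
t = \frac{\bar{y}_1 - \bar{y}_2}{S\sqrt{2/n}}

% k = 2, table built on the definition that includes the sqrt(2) factor
q = \frac{\left|\bar{y}_1 - \bar{y}_2\right|}{S\sqrt{2/n}} = |t|
\quad\Longrightarrow\quad
q_{\alpha;\,2;\,\mathrm{df}} = t_{\alpha/2;\,\mathrm{df}}

% k = 2, table built without the sqrt(2) factor
q = \frac{\left|\bar{y}_1 - \bar{y}_2\right|}{S/\sqrt{n}} = \sqrt{2}\,|t|
\quad\Longrightarrow\quad
q_{\alpha;\,2;\,\mathrm{df}} = \sqrt{2}\;t_{\alpha/2;\,\mathrm{df}}
```

So if a table's $k = 2$ entry equals the two-sided t critical value, the table includes the factor; if it is $\sqrt{2}$ times larger, it does not.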

Confidence limits


The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least $1 - \alpha$ are

$$\bar{y}_{i\bullet} - \bar{y}_{j\bullet} \;\pm\; \frac{q_{\alpha;\,k;\,N-k}}{\sqrt{2}}\;\widehat{\sigma}_{\varepsilon}\,\sqrt{\frac{2}{n}}, \qquad i, j = 1, \ldots, k, \quad i \neq j.$$

Notice that the point estimator and the estimated variance are the same as those for a single pairwise comparison. The only difference between the confidence limits for simultaneous comparisons and those for a single comparison is the multiple of the estimated standard deviation.
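The confidence limits translate directly into code. The sketch below assumes equal group sizes and takes the critical value as an input supplied by the caller (from a table or from software); the summary statistics, the value 3.77, and the function name are illustrative assumptions, not values from the source.

```python
from itertools import combinations
from math import sqrt


def tukey_intervals(means, sigma_hat, n, q_crit):
    """Simultaneous intervals for all pairwise differences (equal n).

    means     : dict mapping group name -> sample mean
    sigma_hat : pooled standard deviation of the whole design
    n         : common group size
    q_crit    : critical value q_{alpha; k; N-k}, supplied by the caller
    """
    half_width = (q_crit / sqrt(2)) * sigma_hat * sqrt(2 / n)
    intervals = {}
    for a, b in combinations(means, 2):
        diff = means[a] - means[b]
        intervals[(a, b)] = (diff - half_width, diff + half_width)
    return intervals


# Made-up summary statistics for k = 3 groups of n = 5 (df = 12).
group_means = {"A": 12.0, "B": 13.0, "C": 17.0}
intervals = tukey_intervals(group_means, sigma_hat=sqrt(2.5), n=5, q_crit=3.77)
for pair, (low, high) in intervals.items():
    # An interval that excludes 0 corresponds to a significant difference.
    status = "excludes 0" if low > 0 or high < 0 else "contains 0"
    print(pair, round(low, 2), round(high, 2), status)
```

Every pair shares the same half-width here, which reflects the point made above: only the multiplier of the estimated standard deviation differs from a single comparison.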

Also note that the sample sizes must be equal when using the studentized range approach. $\widehat{\sigma}_{\varepsilon}$ is the standard deviation of the entire design, not just that of the two groups being compared. It is nevertheless possible to work with unequal sample sizes. In this case, one has to calculate the estimated standard deviation for each pairwise comparison, as formalized by Clyde Kramer in 1956; the procedure for unequal sample sizes is therefore sometimes referred to as the Tukey–Kramer method, which is as follows:

$$\bar{y}_{i\bullet} - \bar{y}_{j\bullet} \;\pm\; \frac{q_{\alpha;\,k;\,N-k}}{\sqrt{2}}\;\widehat{\sigma}_{\varepsilon}\,\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

where $n_i$ and $n_j$ are the sizes of groups $i$ and $j$ respectively. The degrees of freedom for the whole design is also applied.
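A short sketch of the Tukey–Kramer half-width follows. The critical value and pooled standard deviation are placeholder numbers chosen for illustration only, and the function name is hypothetical. With $n_i = n_j = n$ the expression reduces to the equal-sample-size formula.

```python
from math import sqrt


def tukey_kramer_half_width(q_crit, sigma_hat, n_i, n_j):
    """Half-width of the Tukey-Kramer interval for groups of size n_i, n_j."""
    return (q_crit / sqrt(2)) * sigma_hat * sqrt(1 / n_i + 1 / n_j)


# Placeholder inputs: in practice q_crit comes from a table or software
# and sigma_hat from the fitted design.
q_crit = 3.5
sigma_hat = 2.0

balanced = tukey_kramer_half_width(q_crit, sigma_hat, 6, 6)
unbalanced = tukey_kramer_half_width(q_crit, sigma_hat, 4, 8)
print(round(balanced, 3), round(unbalanced, 3))
```

For the same total of 12 observations in the pair, the unbalanced split gives a wider interval than the balanced one, because 1/n_i + 1/n_j is smallest when the two sizes are equal.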

Comparing ANOVA and Tukey–Kramer tests


Both ANOVA and Tukey–Kramer tests are based on the same assumptions. However, these two tests for $k$ groups (i.e. tests of the hypothesis $\mu_1 = \mu_2 = \dots = \mu_k$) may result in logical contradictions when $k > 2$, even if the assumptions do hold.

It is possible to generate a set of pseudorandom samples of strictly positive measure such that the hypothesis $\mu_1 = \mu_2$ is rejected at significance level $1 - \alpha > 0.95$ while $\mu_1 = \mu_2 = \mu_3$ is not rejected even at $1 - \alpha = 0.975$.[4]


References

  1. Lowry, Richard. "One-way ANOVA – independent samples". Vassar.edu. Archived from the original on 17 October 2008. Retrieved 4 December 2008.
    Also occasionally described as "honestly", see e.g.
    Morrison, S.; Sosnoff, J.J.; Heffernan, K.S.; Jae, S.Y.; Fernhall, B. (2013). "Aging, hypertension and physiological tremor: The contribution of the cardioballistic impulse to tremorgenesis in older adults". Journal of the Neurological Sciences. 326 (1–2): 68–74. doi:10.1016/j.jns.2013.01.016. PMID 23385002.
  2. Tukey, John (1949). "Comparing individual means in the Analysis of Variance". Biometrics. 5 (2): 99–114. doi:10.2307/3001913. JSTOR 3001913. PMID 18151955.
  3. Linton, L.R.; Harder, L.D. (2007). Lecture notes (Report). Biology 315: Quantitative biology. Calgary, AB: University of Calgary.
  4. Gurvich, V.; Naumova, M. (2021). "Logical contradictions in the one-way ANOVA and Tukey–Kramer multiple comparisons tests with more than two groups of observations". Symmetry. 13 (8): 1387. arXiv:2104.07552. Bibcode:2021Symm...13.1387G. doi:10.3390/sym13081387.

Further reading

  • Montgomery, Douglas C. (2013). Design and Analysis of Experiments (8th ed.). Wiley. Section 3.5.7.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Tukey%27s_range_test&oldid=1299739052"