Wilcoxon signed-rank test

From Wikipedia, the free encyclopedia
Not to be confused with the Wilcoxon rank-sum test.

The Wilcoxon signed-rank test is a non-parametric rank test for statistical hypothesis testing used either to test the location of a population based on a sample of data, or to compare the locations of two populations using two matched samples.[1] The one-sample version serves a purpose similar to that of the one-sample Student's t-test.[2] For two matched samples, it is a paired difference test like the paired Student's t-test (also known as the "t-test for matched pairs" or "t-test for dependent samples"). The Wilcoxon test is a good alternative to the t-test when the normal distribution of the differences between paired individuals cannot be assumed. Instead, it assumes a weaker hypothesis that the distribution of this difference is symmetric around a central value, and it aims to test whether this central value differs significantly from zero. The Wilcoxon test is a more powerful alternative to the sign test because it considers the magnitude of the differences, but it requires this moderately strong assumption of symmetry.

History


The test is named after Frank Wilcoxon (1892–1965) who, in a single paper, proposed both it and the rank-sum test for two independent samples.[3] The test was popularized by Sidney Siegel (1956) in his influential textbook on non-parametric statistics.[4] Siegel used the symbol T for the test statistic, and consequently, the test is sometimes referred to as the Wilcoxon T-test.

Test procedure


There are two variants of the signed-rank test. From a theoretical point of view, the one-sample test is more fundamental because the paired sample test is performed by converting the data to the situation of the one-sample test. However, most practical applications of the signed-rank test arise from paired data.

For a paired sample test, the data consists of a sample $(X_1, Y_1), \dots, (X_n, Y_n)$. Each data point in the sample is a pair of measurements. In the simplest case, the measurements are on an interval scale. Then they may be converted to real numbers, and the paired sample test is converted to a one-sample test by replacing each pair of numbers $(X_i, Y_i)$ by its difference $X_i - Y_i$.[5] In general, it must be possible to rank the differences between the pairs. This requires that the data be on an ordered metric scale, a type of scale that carries more information than an ordinal scale but may have less than an interval scale.[6]

The data for a one-sample test is a sample in which each observation is a real number: $X_1, \dots, X_n$. Assume for simplicity that the observations in the sample have distinct absolute values and that no observation equals zero. (Zeros and ties introduce several complications; see below.) The test is performed as follows (a code sketch is given after the steps):[7][8]

  1. Compute $|X_1|, \dots, |X_n|$.
  2. Sort $|X_1|, \dots, |X_n|$, and use this sorted list to assign ranks $R_1, \dots, R_n$: the rank of the smallest observation is one, the rank of the next smallest is two, and so on.
  3. Let $\operatorname{sgn}$ denote the sign function: $\operatorname{sgn}(x) = 1$ if $x > 0$ and $\operatorname{sgn}(x) = -1$ if $x < 0$. The test statistic is the signed-rank sum $T$:
     \[ T = \sum_{i=1}^{n} \operatorname{sgn}(X_i) R_i. \]
  4. Produce a $p$-value by comparing $T$ to its distribution under the null hypothesis.
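
The following is a minimal sketch of steps 1–3 in Python, assuming distinct nonzero observations; the sample values and the helper name signed_rank_sum are illustrative, not part of any standard library.

```python
# A sketch of the one-sample signed-rank statistic T; assumes distinct
# nonzero observations (no zeros or ties). The helper name is illustrative.
import numpy as np

def signed_rank_sum(x):
    """Signed-rank sum T = sum_i sgn(X_i) * R_i."""
    x = np.asarray(x, dtype=float)
    # Step 2: rank the absolute values (smallest gets rank 1).
    ranks = np.argsort(np.argsort(np.abs(x))) + 1
    # Step 3: attach signs and sum.
    return int(np.sum(np.sign(x) * ranks))

print(signed_rank_sum([1.8, -0.5, 2.6, -1.1, 0.9]))  # T = 4 - 1 + 5 - 3 + 2 = 7
```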

The ranks are defined so that $R_i$ is the number of $j$ for which $|X_j| \leq |X_i|$. Additionally, if $\sigma : \{1, \dots, n\} \to \{1, \dots, n\}$ is such that $|X_{\sigma(1)}| < \dots < |X_{\sigma(n)}|$, then $R_{\sigma(i)} = i$ for all $i$.

The signed-rank sum $T$ is closely related to two other test statistics. The positive-rank sum $T^+$ and the negative-rank sum $T^-$ are defined by[9]
\[ T^+ = \sum_{1 \leq i \leq n,\ X_i > 0} R_i, \qquad T^- = \sum_{1 \leq i \leq n,\ X_i < 0} R_i. \]
Because $T^+ + T^-$ equals the sum of all the ranks, which is $1 + 2 + \dots + n = n(n+1)/2$, these three statistics are related by:[9]
\[
\begin{aligned}
T^+ &= \frac{n(n+1)}{2} - T^- = \frac{n(n+1)}{4} + \frac{T}{2}, \\
T^- &= \frac{n(n+1)}{2} - T^+ = \frac{n(n+1)}{4} - \frac{T}{2}, \\
T &= T^+ - T^- = 2T^+ - \frac{n(n+1)}{2} = \frac{n(n+1)}{2} - 2T^-.
\end{aligned}
\]
Because $T$, $T^+$, and $T^-$ carry the same information, any of them may be used as the test statistic.
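
These identities are easy to check numerically; a short sketch on illustrative data:

```python
import numpy as np

x = np.array([1.8, -0.5, 2.6, -1.1, 0.9])
n = len(x)
ranks = np.argsort(np.argsort(np.abs(x))) + 1  # ranks of |x|, smallest = 1
t_plus = ranks[x > 0].sum()     # positive-rank sum T+ = 11
t_minus = ranks[x < 0].sum()    # negative-rank sum T- = 4
t = (np.sign(x) * ranks).sum()  # signed-rank sum T = 7
assert t == t_plus - t_minus
assert t_plus + t_minus == n * (n + 1) // 2  # = 15
```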

The positive-rank sum and negative-rank sum have alternative interpretations that are useful for the theory behind the test. Define the Walsh average $W_{ij}$ to be $\tfrac{1}{2}(X_i + X_j)$. Then:[10]
\[ T^+ = \#\{W_{ij} > 0 : 1 \leq i \leq j \leq n\}, \qquad T^- = \#\{W_{ij} < 0 : 1 \leq i \leq j \leq n\}. \]
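
The Walsh-average characterization of $T^+$ can be verified by brute force; a sketch on the same illustrative data:

```python
import numpy as np
from itertools import combinations_with_replacement

x = np.array([1.8, -0.5, 2.6, -1.1, 0.9])
ranks = np.argsort(np.argsort(np.abs(x))) + 1
t_plus = ranks[x > 0].sum()  # T+ = 11

# Count positive Walsh averages (x_i + x_j)/2 over all pairs with i <= j.
walsh_positive = sum(
    1 for i, j in combinations_with_replacement(range(len(x)), 2)
    if (x[i] + x[j]) / 2 > 0
)
assert walsh_positive == t_plus  # both equal 11
```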

Null and alternative hypotheses


One-sample test


The one-sample Wilcoxon signed-rank test can be used to test whether data comes from a symmetric population with a specified center (which corresponds to the median, mean, and pseudomedian).[11] If the population center is known, then it can be used to test whether data is symmetric about its center.[12]

To explain the null and alternative hypotheses formally, assume that the data consists of independent and identically distributed samples from a distribution $F$. If $F$ can be assumed symmetric, then the null and alternative hypotheses are the following:[13]

Null hypothesis $H_0$
$F$ is symmetric about $\mu = 0$.
One-sided alternative hypothesis $H_1$
$F$ is symmetric about $\mu < 0$.
One-sided alternative hypothesis $H_2$
$F$ is symmetric about $\mu > 0$.
Two-sided alternative hypothesis $H_3$
$F$ is symmetric about $\mu \neq 0$.

If in addition $\Pr(X = \mu) = 0$, then $\mu$ is a median of $F$. If this median is unique, then the Wilcoxon signed-rank sum test becomes a test for the location of the median.[14] When the mean of $F$ is defined, then the mean is $\mu$, and the test is also a test for the location of the mean.[7]

The hypothesis that the data are IID can be weakened. Each data point may be taken from a different distribution, as long as all the distributions are assumed to be continuous and symmetric about a common point $\mu_0$. The data points are not required to be independent as long as the conditional distribution of each observation given the others is symmetric about $\mu_0$.[15]
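
In practice the one-sample test is usually run through a library routine. SciPy's scipy.stats.wilcoxon tests symmetry about zero, so testing symmetry about another center $\mu_0$ amounts to shifting the data first; a sketch, with illustrative sample values:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.3, 6.2, 5.8, 4.9, 5.5, 6.0, 4.7])
mu0 = 5.0  # hypothesized center of symmetry (illustrative)

# wilcoxon tests H0: its argument is symmetric about zero,
# so shift by mu0 to test symmetry about mu0.
res = stats.wilcoxon(x - mu0, alternative="two-sided")
print(res.statistic, res.pvalue)
```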

Paired data test


Because the paired data test arises from taking paired differences, its null and alternative hypotheses can be derived from those of the one-sample test. In each case, they become assertions about the behavior of the differences $X_i - Y_i$.

Let $F(x, y)$ be the joint cumulative distribution of the pairs $(X_i, Y_i)$. If we assume that there exists a $\mu$ such that $X_i - Y_i$ is symmetric about $\mu$, the null and alternative hypotheses are:[16][17]

Null hypothesis $H_0$
The observations $X_i - Y_i$ are symmetric about $\mu = 0$.
One-sided alternative hypothesis $H_1$
The observations $X_i - Y_i$ are symmetric about $\mu < 0$.
One-sided alternative hypothesis $H_2$
The observations $X_i - Y_i$ are symmetric about $\mu > 0$.
Two-sided alternative hypothesis $H_3$
The observations $X_i - Y_i$ are symmetric about $\mu \neq 0$.

These can also be expressed more directly in terms of the original pairs:[18]

Null hypothesis $H_0$
The observations $(X_i, Y_i)$ are exchangeable, meaning that $(X_i, Y_i)$ and $(Y_i, X_i)$ have the same distribution. Equivalently, $F(x, y) = F(y, x)$.
One-sided alternative hypothesis $H_1$
For some $\mu < 0$, the pairs $(X_i, Y_i)$ and $(Y_i + \mu, X_i - \mu)$ have the same distribution.
One-sided alternative hypothesis $H_2$
For some $\mu > 0$, the pairs $(X_i, Y_i)$ and $(Y_i + \mu, X_i - \mu)$ have the same distribution.
Two-sided alternative hypothesis $H_3$
For some $\mu \neq 0$, the pairs $(X_i, Y_i)$ and $(Y_i + \mu, X_i - \mu)$ have the same distribution.

The null hypothesis of exchangeability can arise from a matched pair experiment with a treatment group and a control group. Randomizing the treatment and control within each pair makes the observations exchangeable. For an exchangeable distribution, $X_i - Y_i$ has the same distribution as $Y_i - X_i$, and therefore, under the null hypothesis, the distribution is symmetric about zero.[18]

Zeros and ties


In real data, it sometimes happens that there is an observation $X_i$ in the sample which equals zero or a pair $(X_i, Y_i)$ with $X_i = Y_i$. It can also happen that there are tied observations. This means that for some $i \neq j$, we have $X_i = X_j$ (in the one-sample case) or $X_i - Y_i = X_j - Y_j$ (in the paired sample case). This is particularly common for discrete data. When this happens, the test procedure defined above is usually undefined because there is no way to uniquely rank the data. (The sole exception is if there is a single observation $X_i$ which is zero and no other zeros or ties.) Because of this, the test statistic needs to be modified.

Zeros


Wilcoxon's original paper did not address the question of observations (or, in the paired sample case, differences) that equal zero. However, in later surveys, he recommended removing zeros from the sample.[19] Then the standard signed-rank test could be applied to the resulting data, as long as there were no ties. This is now called the reduced sample procedure.

Pratt[20] observed that the reduced sample procedure can lead to paradoxical behavior. He gives the following example. Suppose that we are in the one-sample situation and have the following thirteen observations:

0, 2, 3, 4, 6, 7, 8, 9, 11, 14, 15, 17, −18.

The reduced sample procedure removes the zero. To the remaining data, it assigns the signed ranks:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, −12.

This has a one-sided $p$-value of $55/2^{12}$, and therefore the sample is not significantly positive at any significance level $\alpha < 55/2^{12} \approx 0.0134$. Pratt argues that one would expect that decreasing the observations should certainly not make the data appear more positive. However, if the zero observation is decreased by an amount less than 2, or if all observations are decreased by an amount less than 1, then the signed ranks become:

−1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, −13.

This has a one-sided $p$-value of $109/2^{13}$. Therefore the sample would be judged significantly positive at any significance level $\alpha > 109/2^{13} \approx 0.0133$. The paradox is that, if $\alpha$ is between $109/2^{13}$ and $55/2^{12}$, then decreasing an insignificant sample causes it to appear significantly positive.

Pratt therefore proposed the signed-rank zero procedure. This procedure includes the zeros when ranking the observations in the sample. However, it excludes them from the test statistic, or equivalently it defines $\operatorname{sgn}(0) = 0$. Pratt proved that the signed-rank zero procedure has several desirable behaviors not shared by the reduced sample procedure:[21]

  1. Increasing the observed values does not make a significantly positive sample insignificant, and it does not make an insignificant sample significantly negative.
  2. If the distribution of the observations is symmetric, then the values of $\mu$ which the test does not reject form an interval.
  3. A sample is significantly positive, not significant, or significantly negative, if and only if it is so when the zeros are assigned arbitrary non-zero signs, if and only if it is so when the zeros are replaced with non-zero values which are smaller in absolute value than any non-zero observation.
  4. For a fixed significance threshold $\alpha$, and for a test which is randomized to have level exactly $\alpha$, the probability of calling a set of observations significantly positive (respectively, significantly negative) is a non-decreasing (respectively, non-increasing) function of the observations.

Pratt remarks that, when the signed-rank zero procedure is combined with the average rank procedure for resolving ties, the resulting test is a consistent test against the alternative hypothesis that, for all $i \neq j$, $\Pr(X_i + X_j > 0)$ and $\Pr(X_i + X_j < 0)$ differ by at least a fixed constant that is independent of $i$ and $j$.[22]

The signed-rank zero procedure has the disadvantage that, when zeros occur, the null distribution of the test statistic changes, so tables ofp-values can no longer be used.

When the data is on a Likert scale with equally spaced categories, the signed-rank zero procedure is more likely to maintain the Type I error rate than the reduced sample procedure.[23]

From the viewpoint of statistical efficiency, there is no perfect rule for handling zeros. Conover found examples of null and alternative hypotheses showing that neither Wilcoxon's nor Pratt's method is uniformly better than the other. When comparing a discrete uniform distribution to a distribution where probabilities linearly increase from left to right, Pratt's method outperforms Wilcoxon's. When testing a binomial distribution centered at zero to see whether the parameter of each Bernoulli trial is $\tfrac{1}{2}$, Wilcoxon's method outperforms Pratt's.[24]
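
Both zero-handling conventions appear in common software. For example, SciPy's scipy.stats.wilcoxon exposes them through its zero_method argument; a sketch using Pratt's thirteen observations from above:

```python
import numpy as np
from scipy import stats

d = np.array([0, 2, 3, 4, 6, 7, 8, 9, 11, 14, 15, 17, -18])  # Pratt's example

# Reduced sample procedure (Wilcoxon): discard zeros before ranking.
print(stats.wilcoxon(d, zero_method="wilcox", alternative="greater").pvalue)

# Signed-rank zero procedure (Pratt): rank the zeros, then drop them
# from the test statistic.
print(stats.wilcoxon(d, zero_method="pratt", alternative="greater").pvalue)
```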

Ties


When the data does not have ties, the ranks $R_i$ are used to calculate the test statistic. In the presence of ties, the ranks are not defined. There are two main approaches to resolving this.

The most common procedure for handling ties, and the one originally recommended by Wilcoxon, is called the average rank or midrank procedure. This procedure assigns numbers between 1 and $n$ to the observations, with two observations getting the same number if and only if they have the same absolute value. These numbers are conventionally called ranks even though the set of these numbers is not equal to $\{1, \dots, n\}$ (except when there are no ties). The rank assigned to an observation is the average of the possible ranks it would have if the ties were broken in all possible ways. Once the ranks are assigned, the test statistic is computed in the same way as usual.[25][26]

For example, suppose that the observations satisfy
\[ |X_3| < |X_2| = |X_5| < |X_6| < |X_1| = |X_4| = |X_7|. \]
In this case, $X_3$ is assigned rank 1, $X_2$ and $X_5$ are assigned rank $(2+3)/2 = 2.5$, $X_6$ is assigned rank 4, and $X_1$, $X_4$, and $X_7$ are assigned rank $(5+6+7)/3 = 6$. Formally, suppose that there is a set of observations all having the same absolute value $v$, that $k - 1$ observations have absolute value less than $v$, and that $\ell$ observations have absolute value less than or equal to $v$. If the ties among the observations with absolute value $v$ were broken, then these observations would occupy ranks $k$ through $\ell$. The average rank procedure therefore assigns them the rank $(k + \ell)/2$.
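
This midrank rule is what SciPy's scipy.stats.rankdata computes with method="average"; a sketch reproducing the example above, where the concrete absolute values are illustrative, chosen to satisfy the stated ordering:

```python
import numpy as np
from scipy.stats import rankdata

# Absolute values with |X3| < |X2| = |X5| < |X6| < |X1| = |X4| = |X7|.
abs_x = np.array([9.0, 3.0, 1.0, 9.0, 3.0, 5.0, 9.0])  # X1..X7 (illustrative)

ranks = rankdata(abs_x, method="average")
print(ranks)  # [6.  2.5 1.  6.  2.5 4.  6. ]
```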

Under the average rank procedure, the null distribution is different in the presence of ties.[27][28] The average rank procedure also has some disadvantages that are similar to those of the reduced sample procedure for zeros. It is possible that a sample can be judged significantly positive by the average rank procedure, but increasing some of the values so as to break the ties, or breaking the ties in any way whatsoever, results in a sample that the test judges to be not significant.[29][30] However, increasing all the observed values by the same amount cannot turn a significantly positive result into an insignificant one, nor an insignificant one into a significantly negative one. Furthermore, if the observations are distributed symmetrically, then the values of $\mu$ which the test does not reject form an interval.[31][32]

The other common option for handling ties is a tiebreaking procedure. In a tiebreaking procedure, the observations are assigned distinct ranks in the set $\{1, \dots, n\}$. The rank assigned to an observation depends on its absolute value and the tiebreaking rule. Observations with smaller absolute values are always given smaller ranks, just as in the standard rank-sum test. The tiebreaking rule is used to assign ranks to observations with the same absolute value. One advantage of tiebreaking rules is that they allow the use of standard tables for computing $p$-values.[33]

Random tiebreaking breaks the ties at random. Under random tiebreaking, the null distribution is the same as when there are no ties, but the result of the test depends not only on the data but on additional random choices. Averaging the ranks over the possible random choices results in the average rank procedure.[29] One could also report the probability of rejection over all random choices.[34] Random tiebreaking has the advantage that the probability that a sample is judged significantly positive does not decrease when some observations are increased.[35]

Conservative tiebreaking breaks the ties in favor of the null hypothesis. When performing a one-sided test in which negative values of $T$ tend to be more significant, ties are broken by assigning lower ranks to negative observations and higher ranks to positive ones. When the test makes positive values of $T$ significant, ties are broken the other way, and when large absolute values of $T$ are significant, ties are broken so as to make $|T|$ as small as possible. Pratt observes that when ties are likely, the conservative tiebreaking procedure "presumably has low power, since it amounts to breaking all ties in favor of the null hypothesis."[36]

The average rank procedure can disagree with tiebreaking procedures. Pratt gives the following example.[29] Suppose that the observations are:

1, 1, 1, 1, 2, 3, −4.

The average rank procedure assigns these the signed ranks

2.5, 2.5, 2.5, 2.5, 5, 6, −7.

This sample is significantly positive at the one-sided level $\alpha = 14/2^7$. On the other hand, any tiebreaking rule will assign the ranks

1, 2, 3, 4, 5, 6, −7.

At the same one-sided level $\alpha = 14/2^7$, this is not significant.

Two other options for handling ties are based around averaging the results of tiebreaking. In the average statistic method, the test statistic $T$ is computed for every possible way of breaking ties, and the final statistic is the mean of the tie-broken statistics. In the average probability method, the $p$-value is computed for every possible way of breaking ties, and the final $p$-value is the mean of the tie-broken $p$-values.[37]

Computing the null distribution


Computing $p$-values requires knowing the distribution of $T$ under the null hypothesis. There is no closed formula for this distribution.[38] However, for small values of $n$, the distribution may be computed exactly. Under the null hypothesis that the data is symmetric about zero, each $X_i$ is exactly as likely to be positive as it is negative. Therefore the probability that $T = t$ under the null hypothesis equals the number of sign combinations that yield $T = t$ divided by the number of possible sign combinations, $2^n$. This can be used to compute the exact distribution of $T$ under the null hypothesis.[39]

Computing the distribution of $T$ by considering all possibilities requires computing $2^n$ sums, which is intractable for all but the smallest $n$. However, there is an efficient recursion for the distribution of $T^+$.[40][41] Define $u_n(t^+)$ to be the number of sign combinations for which $T^+ = t^+$. This is equal to the number of subsets of $\{1, \dots, n\}$ which sum to $t^+$. The base cases of the recursion are $u_0(0) = 1$, $u_0(t^+) = 0$ for all $t^+ \neq 0$, and $u_n(t^+) = 0$ for all $t^+ < 0$ or $t^+ > n(n+1)/2$. The recursive formula is
\[ u_n(t^+) = u_{n-1}(t^+) + u_{n-1}(t^+ - n). \]
The formula is true because every subset of $\{1, \dots, n\}$ which sums to $t^+$ either does not contain $n$, in which case it is also a subset of $\{1, \dots, n-1\}$, or it does contain $n$, in which case removing $n$ from the subset produces a subset of $\{1, \dots, n-1\}$ which sums to $t^+ - n$. Under the null hypothesis, the probability mass function of $T^+$ satisfies $\Pr(T^+ = t^+) = u_n(t^+)/2^n$. The function $u_n$ is closely related to the integer partition function.[42]
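
The recursion translates directly into a dynamic program; a sketch, where the function name t_plus_null_distribution is illustrative:

```python
import numpy as np

def t_plus_null_distribution(n):
    """Null distribution of T+ for sample size n, via the recursion
    u_n(t) = u_{n-1}(t) + u_{n-1}(t - n)."""
    max_t = n * (n + 1) // 2
    u = np.zeros(max_t + 1, dtype=np.int64)
    u[0] = 1  # base case: u_0(0) = 1, u_0(t) = 0 otherwise
    for k in range(1, n + 1):
        shifted = np.zeros_like(u)
        shifted[k:] = u[:-k]  # u_{k-1}(t - k): subsets that contain k
        u = u + shifted
    return u / 2.0**n         # Pr(T+ = t) = u_n(t) / 2^n

pmf = t_plus_null_distribution(9)
print(pmf.sum())        # 1.0 (sanity check)
print(pmf[:19].sum())   # Pr(T+ <= 18), useful for one-sided p-values
```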

If $p_n(t^+)$ is the probability that $T^+ = t^+$ under the null hypothesis when there are $n$ observations in the sample, then $p_n(t^+)$ satisfies a similar recursion:[42]
\[ 2 p_n(t^+) = p_{n-1}(t^+) + p_{n-1}(t^+ - n) \]
with similar boundary conditions. There is also a recursive formula for the cumulative distribution function $\Pr(T^+ \leq t^+)$.[42]

For very large $n$, even the above recursion is too slow. In this case, the null distribution can be approximated. The null distributions of $T$, $T^+$, and $T^-$ are asymptotically normal with means and variances:[43]
\[
\begin{aligned}
\mathbf{E}[T^+] &= \mathbf{E}[T^-] = \frac{n(n+1)}{4}, \\
\mathbf{E}[T] &= 0, \\
\operatorname{Var}(T^+) &= \operatorname{Var}(T^-) = \frac{n(n+1)(2n+1)}{24}, \\
\operatorname{Var}(T) &= \frac{n(n+1)(2n+1)}{6}.
\end{aligned}
\]
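
These moments give a simple normal-approximation $p$-value; a sketch with a continuity correction of $\tfrac{1}{2}$, where the function name is illustrative:

```python
import math

def t_plus_normal_pvalue(t_plus, n):
    """Approximate one-sided Pr(T+ <= t_plus) by a normal distribution."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (t_plus + 0.5 - mean) / math.sqrt(var)     # continuity correction
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

print(t_plus_normal_pvalue(18, 9))  # compare against the exact pmf above
```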

Better approximations can be produced using Edgeworth expansions. Using a fourth-order Edgeworth expansion shows that:[44][45]
\[ \Pr(T^+ \leq k) \approx \Phi(t) + \phi(t) \left( \frac{3n^2 + 3n - 1}{10 n (n+1)(2n+1)} \right) (t^3 - 3t), \]
where
\[ t = \frac{k + \tfrac{1}{2} - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}. \]
The technical underpinnings of these expansions are rather involved, because conventional Edgeworth expansions apply to sums of IID continuous random variables, while $T^+$ is a sum of non-identically distributed discrete random variables. The final result, however, is that the above expansion has an error of $O(n^{-3/2})$, just like a conventional fourth-order Edgeworth expansion.[44]
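
The correction term is a one-line adjustment to the normal approximation above; a sketch implementing the displayed formula, with an illustrative function name:

```python
import math

def t_plus_edgeworth_pvalue(k, n):
    """Edgeworth-corrected approximation to Pr(T+ <= k)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    t = (k + 0.5 - mean) / sd
    phi = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)  # standard normal pdf
    Phi = 0.5 * (1 + math.erf(t / math.sqrt(2)))         # standard normal cdf
    correction = (3 * n**2 + 3 * n - 1) / (10 * n * (n + 1) * (2 * n + 1))
    return Phi + phi * correction * (t**3 - 3 * t)

print(t_plus_edgeworth_pvalue(18, 9))
```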

The moment generating function of $T$ has the exact formula:[46]
\[ M(t) = \frac{1}{2^n} \prod_{j=1}^{n} (1 + e^{jt}). \]

When zeros are present and the signed-rank zero procedure is used, or when ties are present and the average rank procedure is used, the null distribution of $T$ changes. Cureton derived a normal approximation for this situation.[47][48] Suppose that the original number of observations was $n$ and the number of zeros was $z$. The tie correction is
\[ c = \sum t^3 - t, \]
where the sum is over all the sizes $t$ of each group of tied observations. The expectation of $T$ is still zero, while the expectation of $T^+$ is
\[ \mathbf{E}[T^+] = \frac{n(n+1)}{4} - \frac{z(z+1)}{4}. \]
If
\[ \sigma^2 = \frac{n(n+1)(2n+1) - z(z+1)(2z+1) - c/2}{6}, \]
then
\[ \operatorname{Var}(T) = \sigma^2, \qquad \operatorname{Var}(T^+) = \sigma^2/4. \]
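
Cureton's corrected moments are simple arithmetic; a sketch, where the function name and the example group sizes are illustrative:

```python
def cureton_moments(n, z, tie_group_sizes):
    """Mean of T+ and variance of T under Cureton's zero/tie correction."""
    c = sum(t**3 - t for t in tie_group_sizes)  # tie correction
    mean_t_plus = n * (n + 1) / 4 - z * (z + 1) / 4
    var_t = (n * (n + 1) * (2 * n + 1) - z * (z + 1) * (2 * z + 1) - c / 2) / 6
    return mean_t_plus, var_t

# Example: n = 10 observations, one zero, and one group of two tied
# absolute differences.
print(cureton_moments(10, 1, [2]))
```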

Alternative statistics


Wilcoxon[49] originally defined the signed-rank statistic to be $\min(T^+, T^-)$. Early authors such as Siegel[6] followed Wilcoxon. This is appropriate for two-sided hypothesis tests, but it cannot be used for one-sided tests.

Instead of assigning ranks between 1 and $n$, it is also possible to assign ranks between 0 and $n - 1$. These are called modified ranks.[50] The modified signed-rank sum $T_0$, the modified positive-rank sum $T_0^+$, and the modified negative-rank sum $T_0^-$ are defined analogously to $T$, $T^+$, and $T^-$ but with the modified ranks in place of the ordinary ranks. The probability $p_2$ that the sum of two independent $F$-distributed random variables is positive can be estimated as $2 T_0^+ / (n(n-1))$.[51] When consideration is restricted to continuous distributions, this is a minimum variance unbiased estimator of $p_2$.[52]

Example

  i    x2,i   x1,i   sgn(x2,i − x1,i)   |x2,i − x1,i|
  1    125    110          1                 15
  2    115    122         −1                  7
  3    130    125          1                  5
  4    140    120          1                 20
  5    140    140          —                  0
  6    115    124         −1                  9
  7    140    123          1                 17
  8    125    137         −1                 12
  9    140    135          1                  5
 10    135    145         −1                 10

order by absolute difference:

  i    x2,i   x1,i   sgn   |x2,i − x1,i|   R_i   sgn · R_i
  5    140    140    —           0          —        —
  3    130    125    1           5         1.5      1.5
  9    140    135    1           5         1.5      1.5
  2    115    122   −1           7         3       −3
  6    115    124   −1           9         4       −4
 10    135    145   −1          10         5       −5
  8    125    137   −1          12         6       −6
  1    125    110    1          15         7        7
  7    140    123    1          17         8        8
  4    140    120    1          20         9        9

Here $\operatorname{sgn}$ is the sign function, $|\cdot|$ is the absolute value, and $R_i$ is the rank. Notice that pairs 3 and 9 are tied in absolute value. They would be ranked 1 and 2, so each gets the average of those ranks, 1.5.

\[ W = 1.5 + 1.5 - 3 - 4 - 5 - 6 + 7 + 8 + 9 = 9 \]
Since $|W| < W_{\mathrm{crit}}(\alpha = 0.05,\ n = 9,\ \text{two-sided}) = 15$, we fail to reject the null hypothesis $H_0$ that the median of the pairwise differences is zero. The $p$-value for this result is $0.6113$.
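
The example can be reproduced with SciPy, which drops the zero difference under zero_method="wilcox" and assigns average ranks to the ties; its p-value should be close to the 0.6113 reported above (a sketch):

```python
import numpy as np
from scipy import stats

x2 = np.array([125, 115, 130, 140, 140, 115, 140, 125, 140, 135])
x1 = np.array([110, 122, 125, 120, 140, 124, 123, 137, 135, 145])

res = stats.wilcoxon(x2, x1, zero_method="wilcox")  # paired test on x2 - x1
print(res.statistic, res.pvalue)
```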

Effect size

Main article: Mann–Whitney U test § Rank-biserial correlation

To compute an effect size for the signed-rank test, one can use the rank-biserial correlation.

If the test statistic T is reported, the rank correlation r is equal to the test statistic T divided by the total rank sum S, or r = T/S.[53] Using the above example, the test statistic is T = 9. The sample size of 9 has a total rank sum of S = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) = 45. Hence, the rank correlation is 9/45, so r = 0.20.

If the test statistic T is reported, an equivalent way to compute the rank correlation is with the difference in proportion between the two rank sums, which is the Kerby (2014) simple difference formula.[53] To continue with the current example, the sample size is 9, so the total rank sum is 45. T is the smaller of the two rank sums, so T is 3 + 4 + 5 + 6 = 18. From this information alone, the remaining rank sum can be computed, because it is the total sum S minus T, or in this case 45 − 18 = 27. Next, the two rank-sum proportions are 27/45 = 60% and 18/45 = 40%. Finally, the rank correlation is the difference between the two proportions (.60 minus .40), hence r = .20.
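
Both routes to the rank-biserial correlation are short computations; a sketch using the signed ranks from the example above:

```python
import numpy as np

signed_ranks = np.array([1.5, 1.5, -3, -4, -5, -6, 7, 8, 9])
n = len(signed_ranks)
S = n * (n + 1) / 2                      # total rank sum = 45

# Route 1: r = T / S with the signed-rank sum T.
r1 = signed_ranks.sum() / S              # 9 / 45 = 0.20

# Route 2 (Kerby simple difference): favorable minus unfavorable proportion.
t_plus = signed_ranks[signed_ranks > 0].sum()    # 27
t_minus = -signed_ranks[signed_ranks < 0].sum()  # 18
r2 = t_plus / S - t_minus / S            # 0.60 - 0.40 = 0.20
print(r1, r2)
```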

Software implementations

  • R includes an implementation of the test as wilcox.test(x, y, paired = TRUE), where x and y are vectors of equal length.[54]
  • ALGLIB includes an implementation of the Wilcoxon signed-rank test in C++, C#, Delphi, Visual Basic, etc.
  • GNU Octave implements various one-tailed and two-tailed versions of the test in the wilcoxon_test function.
  • SciPy includes an implementation of the Wilcoxon signed-rank test in Python.
  • Accord.NET includes an implementation of the Wilcoxon signed-rank test in C# for .NET applications.
  • MATLAB implements the test in the signrank function: [p,h] = signrank(x,y) returns the p-value and a test decision, where h = 1 indicates rejection of the null hypothesis and h = 0 indicates a failure to reject the null hypothesis at the 5% significance level.
  • Julia's HypothesisTests package includes the Wilcoxon signed-rank test as pvalue(SignedRankTest(x, y)).
  • SAS PROC UNIVARIATE includes the Wilcoxon signed-rank test in the "Tests for Location" output as "Signed Rank". Although this procedure calculates an S-statistic rather than a W-statistic, the resulting p-value can still be used for this test.[55] SAS PROC NPAR1WAY also offers many non-parametric tests and supports exact tests via Monte Carlo estimation.


References

  1. ^ Conover, W. J. (1999). Practical Nonparametric Statistics (3rd ed.). John Wiley & Sons. ISBN 0-471-16068-7. p. 350.
  2. ^ McDonald, John H. "Wilcoxon signed-rank test – Handbook of Biological Statistics". www.biostathandbook.com. Retrieved 2021-09-02.
  3. ^ Wilcoxon, Frank (Dec 1945). "Individual comparisons by ranking methods" (PDF). Biometrics Bulletin. 1 (6): 80–83. doi:10.2307/3001968. hdl:10338.dmlcz/135688. JSTOR 3001968.
  4. ^ Siegel, Sidney (2007) [1956]. Non-parametric Statistics for the Behavioral Sciences. New York: McGraw-Hill. pp. 75–83. ISBN 978-0-07-057348-2.
  5. ^ Conover, p. 352.
  6. ^ a b Siegel, p. 76.
  7. ^ a b Conover, p. 353.
  8. ^ Pratt, John W.; Gibbons, Jean D. (1981). Concepts of Nonparametric Theory. Springer-Verlag. ISBN 978-1-4612-5933-6. p. 148.
  9. ^ a b Pratt and Gibbons, p. 148.
  10. ^ Pratt and Gibbons, p. 150.
  11. ^ Conover, pp. 352–357.
  12. ^ Hettmansperger, Thomas P. (1984). Statistical Inference Based on Ranks. John Wiley & Sons. ISBN 0-471-88474-X. pp. 32, 50.
  13. ^ Pratt and Gibbons, pp. 146–147.
  14. ^ Hettmansperger, pp. 30–31.
  15. ^ Pratt and Gibbons, p. 155.
  16. ^ Conover, p. 354.
  17. ^ Hollander, Myles; Wolfe, Douglas A.; Chicken, Eric (2014). Nonparametric Statistical Methods (3rd ed.). John Wiley & Sons. ISBN 978-0-470-38737-5. pp. 39–41.
  18. ^ a b Pratt and Gibbons, p. 147.
  19. ^ Wilcoxon, Frank (1949). Some Rapid Approximate Statistical Procedures. American Cyanamid Co.
  20. ^ Pratt, J. (1959). "Remarks on zeros and ties in the Wilcoxon signed rank procedures". Journal of the American Statistical Association. 54 (287): 655–667. doi:10.1080/01621459.1959.10501526.
  21. ^ Pratt, p. 659.
  22. ^ Pratt, p. 663.
  23. ^ Derrick, B.; White, P. (2017). "Comparing Two Samples from an Individual Likert Question". International Journal of Mathematics and Statistics. 18 (3): 1–13.
  24. ^ Conover, William Jay (1973). "On Methods of Handling Ties in the Wilcoxon Signed-Rank Test". Journal of the American Statistical Association. 68 (344): 985–988. doi:10.1080/01621459.1973.10481460.
  25. ^ Pratt and Gibbons, p. 162.
  26. ^ Conover, pp. 352–353.
  27. ^ Pratt and Gibbons, p. 164.
  28. ^ Conover, pp. 358–359.
  29. ^ a b c Pratt, p. 660.
  30. ^ Pratt and Gibbons, pp. 168–169.
  31. ^ Pratt, pp. 661–662.
  32. ^ Pratt and Gibbons, p. 170.
  33. ^ Pratt and Gibbons, pp. 163, 166.
  34. ^ Pratt and Gibbons, p. 166.
  35. ^ Pratt and Gibbons, p. 171.
  36. ^ Pratt, p. 661.
  37. ^ Gibbons, Jean D.; Chakraborti, Subhabrata (2011). Nonparametric Statistical Inference (5th ed.). Chapman & Hall/CRC. ISBN 978-1-4200-7762-9. p. 194.
  38. ^ Hettmansperger, p. 34.
  39. ^ Pratt and Gibbons, pp. 148–149.
  40. ^ Pratt and Gibbons, pp. 148–149, 186–187.
  41. ^ Hettmansperger, p. 171.
  42. ^ a b c Pratt and Gibbons, p. 187.
  43. ^ Pratt and Gibbons, p. 149.
  44. ^ a b Kolassa, John E. (1995). "Edgeworth approximations for rank sum test statistics". Statistics and Probability Letters. 24 (2): 169–171. doi:10.1016/0167-7152(95)00164-H.
  45. ^ Hettmansperger, p. 37.
  46. ^ Hettmansperger, p. 35.
  47. ^ Cureton, Edward E. (1967). "The normal approximation to the signed-rank sampling distribution when zero differences are present". Journal of the American Statistical Association. 62 (319): 1068–1069. doi:10.1080/01621459.1967.10500917.
  48. ^ Pratt and Gibbons, p. 193.
  49. ^ Wilcoxon, p. 82.
  50. ^ Pratt and Gibbons, p. 158.
  51. ^ Pratt and Gibbons, p. 159.
  52. ^ Pratt and Gibbons, p. 191.
  53. ^ a b Kerby, Dave S. (2014). "The simple difference formula: An approach to teaching nonparametric correlation". Comprehensive Psychology. 3: article 11.IT.3.1. doi:10.2466/11.IT.3.1.
  54. ^ Dalgaard, Peter (2008). Introductory Statistics with R. Springer Science & Business Media. pp. 99–100. ISBN 978-0-387-79053-4.
  55. ^ "Wilcox signed-rank test: SAS instruction". www.stat.purdue.edu. Retrieved 2023-08-24.
