Together with the Lagrange multiplier test and the likelihood-ratio test, the Wald test is one of three classical approaches to hypothesis testing. An advantage of the Wald test over the other two is that it only requires the estimation of the unrestricted model, which lowers the computational burden as compared to the likelihood-ratio test. However, a major disadvantage is that (in finite samples) it is not invariant to changes in the representation of the null hypothesis; in other words, algebraically equivalent expressions of a non-linear parameter restriction can lead to different values of the test statistic.[5][6] That is because the Wald statistic is derived from a Taylor expansion,[7] and different ways of writing equivalent nonlinear expressions lead to nontrivial differences in the corresponding Taylor coefficients.[8] Another aberration, known as the Hauck–Donner effect,[9] can occur in binomial models when the estimated (unconstrained) parameter is close to the boundary of the parameter space (for instance, a fitted probability being extremely close to zero or one), which results in the Wald statistic no longer monotonically increasing in the distance between the unconstrained and constrained parameter.[10][11]
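The Hauck–Donner effect can be illustrated numerically. The following is a minimal sketch, assuming a one-sample binomial model parametrized on the logit scale and the null hypothesis logit(p) = 0; the helper name `wald_logit` and the sample sizes are illustrative assumptions, not taken from the cited sources.

```python
import math

def wald_logit(successes, trials, beta0=0.0):
    """Wald statistic for H0: logit(p) = beta0 (hypothetical helper).

    Assumes 0 < successes < trials so that the logit is finite.
    """
    p_hat = successes / trials
    beta_hat = math.log(p_hat / (1.0 - p_hat))          # MLE on the logit scale
    var_beta = 1.0 / (trials * p_hat * (1.0 - p_hat))   # inverse Fisher information
    return (beta_hat - beta0) ** 2 / var_beta

trials = 100
for successes in (50, 70, 90, 95, 99):
    # The estimate moves steadily away from the null, yet the statistic
    # eventually starts to fall as p_hat approaches 1.
    print(successes, round(wald_logit(successes, trials), 2))
```

In this sketch the statistic rises as the fitted probability moves from 0.5 to 0.9 and then declines again at 0.95 and 0.99, because the estimated variance on the logit scale grows faster than the squared distance from the null.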
Under the Wald test, the estimate $\hat{\theta}$ that was found as the maximizing argument of the unconstrained likelihood function is compared with a hypothesized value $\theta_0$. In particular, the squared difference $(\hat{\theta} - \theta_0)^2$ is weighted by the curvature of the log-likelihood function.
If the hypothesis involves only a single parameter restriction, then the Wald statistic takes the following form:
$$W = \frac{(\hat{\theta} - \theta_0)^2}{\operatorname{var}(\hat{\theta})}$$
which under the null hypothesis follows an asymptotic $\chi^2$-distribution with one degree of freedom. The square root of the single-restriction Wald statistic can be understood as a (pseudo) t-ratio that is, however, not actually t-distributed except for the special case of linear regression with normally distributed errors.[12] In general, it follows an asymptotic z distribution.[13]
$$\sqrt{W} = \frac{\hat{\theta} - \theta_0}{\operatorname{se}(\hat{\theta})}$$
where $\operatorname{se}(\hat{\theta})$ is the standard error (SE) of the maximum likelihood estimate (MLE), the square root of the variance. There are several ways to consistently estimate the variance matrix, which in finite samples leads to alternative estimates of standard errors and associated test statistics and p-values.[3]: 129 The validity of still getting an asymptotically normal distribution after plugging the MLE $\hat{\theta}$ into the SE relies on Slutsky's theorem.
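As a concrete illustration of the single-restriction statistic, here is a minimal sketch for a binomial proportion. It assumes the null hypothesis p = p0 and the plug-in variance estimate p̂(1 − p̂)/n; the function name `wald_test_proportion` and the data are made up for illustration.

```python
import math

def wald_test_proportion(successes, trials, p0):
    """Wald test of H0: p = p0 for a binomial proportion (illustrative helper)."""
    p_hat = successes / trials                 # unrestricted MLE
    var_hat = p_hat * (1.0 - p_hat) / trials   # plug-in estimate of var(p_hat)
    w = (p_hat - p0) ** 2 / var_hat            # Wald statistic W
    z = (p_hat - p0) / math.sqrt(var_hat)      # signed square root of W (pseudo t-ratio)
    # Asymptotic chi-square(1) p-value; identical to the two-sided normal p-value of z.
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, z, p_value

w, z, p_value = wald_test_proportion(successes=60, trials=100, p0=0.5)
print(f"W = {w:.3f}, z = {z:.3f}, p = {p_value:.4f}")
```

The p-value uses the identity that the upper tail of a chi-square variable with one degree of freedom at W equals erfc(√(W/2)), which avoids any dependency beyond the standard library.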
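The Wald test can be used to test a single hypothesis on multiple parameters, as well as to test jointly multiple hypotheses on single/multiple parameters. Let $\hat{\theta}_n$ be our sample estimator of P parameters (i.e., $\hat{\theta}_n$ is a $P \times 1$ vector), which is supposed to follow asymptotically a normal distribution with covariance matrix V, $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} N(0, V)$. The test of Q hypotheses on the P parameters is expressed with a $Q \times P$ matrix R:
$$H_0: R\theta = r, \qquad H_1: R\theta \neq r$$
The distribution of the test statistic under the null hypothesis is
$$(R\hat{\theta}_n - r)^{\top} \left[ R (\hat{V}_n / n) R^{\top} \right]^{-1} (R\hat{\theta}_n - r) / Q \;\xrightarrow{D}\; F(Q, n - P) \;\xrightarrow[n \to \infty]{D}\; \chi^2_Q / Q$$
which in turn implies
$$(R\hat{\theta}_n - r)^{\top} \left[ R (\hat{V}_n / n) R^{\top} \right]^{-1} (R\hat{\theta}_n - r) \;\xrightarrow[n \to \infty]{D}\; \chi^2_Q$$
where $\hat{V}_n$ is an estimator of the covariance matrix.[14]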
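What if the covariance matrix is not known a priori and needs to be estimated from the data? If we have a consistent estimator $\hat{V}_n$ of $V$ such that $V^{-1}\hat{V}_n$ has a determinant that is distributed $\chi^2_{n-P}$, then by the independence of the covariance estimator and the equation above, we have:
$$(R\hat{\theta}_n - r)^{\top} \left[ R (\hat{V}_n / n) R^{\top} \right]^{-1} (R\hat{\theta}_n - r) / Q \;\xrightarrow{D}\; F(Q, n - P)$$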
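The quadratic form for a linear hypothesis Rθ = r can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: the function name `wald_statistic`, the estimates, and the covariance matrix are all made up, and the covariance argument is taken to be the estimated covariance of the estimator itself (the matrix divided by n in the notation of the text).

```python
import numpy as np

def wald_statistic(theta_hat, cov_theta_hat, R, r):
    """Wald statistic for H0: R @ theta = r (illustrative helper).

    cov_theta_hat is the estimated covariance of theta_hat itself,
    i.e. V_hat / n in the notation of the text.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = R @ theta_hat - np.asarray(r, dtype=float)
    middle = R @ np.asarray(cov_theta_hat, dtype=float) @ R.T
    # Solve the linear system rather than forming an explicit inverse.
    return float(diff @ np.linalg.solve(middle, diff))

# Made-up numbers: P = 3 parameters, Q = 2 restrictions.
theta_hat = [1.2, 0.4, -0.3]
cov_theta_hat = np.array([[0.04, 0.01, 0.00],
                          [0.01, 0.09, 0.02],
                          [0.00, 0.02, 0.16]])
R = [[0, 1, 0],   # second parameter equals 0
     [0, 0, 1]]   # third parameter equals 0
r = [0, 0]

W = wald_statistic(theta_hat, cov_theta_hat, R, r)
Q = len(r)
# For Q = 2 the chi-square survival function has the closed form exp(-W / 2).
p_value = float(np.exp(-W / 2))
print(f"W = {W:.3f}, W/Q = {W / Q:.3f}, asymptotic p = {p_value:.3f}")
```

The closed-form p-value is specific to Q = 2; for other values of Q a chi-square (or, in finite samples, F) survival function from a statistics library would be needed.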
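In the standard form, the Wald test is used to test linear hypotheses that can be represented by a single matrix R. If one wishes to test a non-linear hypothesis of the form:
$$H_0: c(\theta) = 0, \qquad H_1: c(\theta) \neq 0$$
The test statistic becomes:
$$c(\hat{\theta}_n)^{\top} \left[ c'(\hat{\theta}_n) (\hat{V}_n / n) c'(\hat{\theta}_n)^{\top} \right]^{-1} c(\hat{\theta}_n) \;\xrightarrow{D}\; \chi^2_Q$$
where $c'(\hat{\theta}_n)$ is the derivative of c evaluated at the sample estimator. This result is obtained using the delta method, which uses a first-order approximation of the variance.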
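The delta-method statistic can be sketched in the same way as the linear case, with the Jacobian of c taking the place of R. The example below assumes a single restriction (the product of two parameters equals 1); the function name `wald_nonlinear`, the restriction, and the numbers are hypothetical.

```python
import numpy as np

def wald_nonlinear(c, jacobian, theta_hat, cov_theta_hat):
    """Delta-method Wald statistic for H0: c(theta) = 0 (illustrative helper).

    cov_theta_hat is the estimated covariance of theta_hat (V_hat / n).
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    c_val = np.atleast_1d(c(theta_hat))        # Q restrictions evaluated at the estimate
    J = np.atleast_2d(jacobian(theta_hat))     # Q x P derivative of c at the estimate
    middle = J @ np.asarray(cov_theta_hat, dtype=float) @ J.T
    return float(c_val @ np.linalg.solve(middle, c_val))

# Hypothetical restriction: theta_1 * theta_2 = 1.
def c(theta):
    return theta[0] * theta[1] - 1.0

def jacobian(theta):
    return np.array([theta[1], theta[0]])      # gradient of theta_1 * theta_2

theta_hat = [2.0, 0.6]
cov_theta_hat = np.diag([0.04, 0.01])

W = wald_nonlinear(c, jacobian, theta_hat, cov_theta_hat)
print(f"W = {W:.4f}")
```

Here the Jacobian is supplied analytically; a numerical derivative could be substituted when c is awkward to differentiate by hand.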
The fact that one uses an approximation of the variance has the drawback that the Wald statistic is not invariant to a non-linear transformation/reparametrisation of the hypothesis: it can give different answers to the same question, depending on how the question is phrased.[15][5] For example, asking whether R = 1 is the same as asking whether log R = 0; but the Wald statistic for R = 1 is not the same as the Wald statistic for log R = 0 (because there is in general no neat relationship between the standard errors of R and log R, so it needs to be approximated).[16]
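The R = 1 versus log R = 0 example can be checked numerically. The sketch below assumes an estimate of R with a given standard error and uses the delta-method approximation se(log R) ≈ se(R)/R; the function names and the numbers are illustrative only.

```python
import math

def wald_level(r_hat, se_r):
    """Wald statistic for H0: R = 1."""
    return (r_hat - 1.0) ** 2 / se_r ** 2

def wald_log(r_hat, se_r):
    """Wald statistic for the equivalent hypothesis H0: log R = 0."""
    se_log = se_r / r_hat   # delta method: se(log R) is approximately se(R) / R
    return math.log(r_hat) ** 2 / se_log ** 2

r_hat, se_r = 2.0, 0.5      # made-up estimate and standard error
w_level = wald_level(r_hat, se_r)
w_log = wald_log(r_hat, se_r)
print(f"W for R = 1: {w_level:.3f}; W for log R = 0: {w_log:.3f}")
```

With these inputs the two statistics differ markedly even though the hypotheses are logically identical, so a test at a fixed significance level could reach different conclusions depending on the parametrization.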
There are several reasons to prefer the likelihood ratio test or the Lagrange multiplier test to the Wald test:[18][19][20]
Non-invariance: As argued above, the Wald test is not invariant under reparametrization, while the likelihood ratio test will give exactly the same answer whether we work with R, log R or any other monotonic transformation of R.[5]
The other reason is that the Wald test uses two approximations (that we know the standard error or Fisher information and the maximum likelihood estimate), whereas the likelihood ratio test depends only on the ratio of likelihood functions under the null hypothesis and alternative hypothesis.
The Wald test requires an estimate using the maximizing argument, corresponding to the "full" model. In some cases, the model is simpler under the null hypothesis, so that one might prefer to use the score test (also called Lagrange multiplier test), which has the advantage that it can be formulated in situations where the variability of the maximizing element is difficult to estimate or computing the estimate according to the maximum likelihood estimator is difficult; e.g. the Cochran–Mantel–Haenszel test is a score test.[21]
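To make the comparison with the likelihood ratio test concrete, the following minimal sketch computes both statistics for a binomial proportion. It assumes the null hypothesis p = p0 and an interior estimate (neither zero nor all successes); the function name `wald_and_lr` and the data are illustrative.

```python
import math

def wald_and_lr(successes, trials, p0):
    """Wald and likelihood-ratio statistics for H0: p = p0 (illustrative helper).

    Assumes 0 < successes < trials so that both logarithms are finite.
    """
    p_hat = successes / trials
    wald = (p_hat - p0) ** 2 / (p_hat * (1.0 - p_hat) / trials)

    def loglik(p):
        # Binomial log-likelihood up to a constant that cancels in the ratio.
        return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

    lr = 2.0 * (loglik(p_hat) - loglik(p0))
    return wald, lr

wald, lr = wald_and_lr(successes=60, trials=100, p0=0.5)
print(f"Wald = {wald:.3f}, LR = {lr:.3f}")
```

The likelihood ratio statistic depends only on the likelihood values at the restricted and unrestricted estimates, so rewriting the model in terms of, say, the log-odds leaves it unchanged, whereas the Wald statistic would have to be recomputed with a new standard error.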
Davidson, Russell; MacKinnon, James G. (1993). "The Method of Maximum Likelihood: Fundamental Concepts and Notation". Estimation and Inference in Econometrics. New York: Oxford University Press. p. 89. ISBN 0-19-506011-3.
Yee, Thomas William (2022). "On the Hauck–Donner Effect in Wald Tests: Detection, Tipping Points, and Parameter Space Characterization". Journal of the American Statistical Association. 117 (540): 1763–1774. arXiv:2001.08431. doi:10.1080/01621459.2021.1886936.
Davidson, Russell; MacKinnon, James G. (1993). "The Method of Maximum Likelihood: Fundamental Concepts and Notation". Estimation and Inference in Econometrics. New York: Oxford University Press. p. 89. ISBN 0-19-506011-3.
Harrell, Frank E. Jr. (2001). "Section 9.3.1". Regression Modeling Strategies. New York: Springer-Verlag. ISBN 0387952322.
Engle, Robert F. (1983). "Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics". In Intriligator, M. D.; Griliches, Z. (eds.). Handbook of Econometrics. Vol. II. Elsevier. pp. 796–801. ISBN 978-0-444-86185-6.
Harrell, Frank E. Jr. (2001). "Section 9.3.3". Regression Modeling Strategies. New York: Springer-Verlag. ISBN 0387952322.
Collett, David (1994). Modelling Survival Data in Medical Research. London: Chapman & Hall. ISBN 0412448807.
Pawitan, Yudi (2001). In All Likelihood. New York: Oxford University Press. ISBN 0198507658.