Delta method

From Wikipedia, the free encyclopedia
Method in statistics

In statistics, the delta method is a method of deriving the asymptotic distribution of a random variable. It is applicable when the random variable being considered can be defined as a differentiable function of a random variable which is asymptotically Gaussian.

History

The delta method was derived from propagation of error, and the idea behind it was known in the early 20th century.[1] Its statistical application can be traced as far back as 1928 to T. L. Kelley.[2] A formal description of the method was presented by J. L. Doob in 1935.[3] Robert Dorfman also described a version of it in 1938.[4]

Univariate delta method

While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables X_n satisfying

\sqrt{n}\,[X_n - \theta] \ \xrightarrow{D}\ \mathcal{N}(0, \sigma^2),

where θ and σ² are finite-valued constants and \xrightarrow{D} denotes convergence in distribution, then

\sqrt{n}\,[g(X_n) - g(\theta)] \ \xrightarrow{D}\ \mathcal{N}\left(0, \sigma^2\,[g'(\theta)]^2\right)

for any function g such that g′(θ), its first derivative evaluated at θ, exists and is non-zero.

The intuition behind the delta method is that any such function g, over a "small enough" range, can be approximated by a first-order Taylor series, which is essentially a linear function. If the random variable is roughly normal, then a linear transformation of it is also normal. A small range is obtained by approximating the function around the mean when the variance is "small enough". When g is applied to a random variable such as a sample mean, the delta method tends to work better as the sample size increases, since the larger sample reduces the variance and the Taylor approximation is then applied over a smaller range of g around the point of interest.
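
As a concrete illustration of this limit (a minimal simulation sketch, not part of the original article; the exponential population, the choice g(x) = exp(x), and the NumPy setup are assumptions made here for illustration), one can compare the empirical variance of g(X_n) with the variance predicted by the delta method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: X_i ~ Exponential(1), so theta = E[X_i] = 1 and sigma^2 = Var(X_i) = 1.
theta, sigma2 = 1.0, 1.0
n, reps = 500, 20_000

# X_n is the sample mean; by the CLT, sqrt(n) * (X_n - theta) -> N(0, sigma^2).
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Transform with g(x) = exp(x); g'(theta) = exp(theta).
g_of_xbar = np.exp(xbar)
g_prime_theta = np.exp(theta)

# Delta-method prediction: Var(g(X_n)) is approximately sigma^2 * g'(theta)^2 / n.
predicted_var = sigma2 * g_prime_theta**2 / n
print(f"empirical variance of g(X_n): {g_of_xbar.var():.5f}")
print(f"delta-method approximation:   {predicted_var:.5f}")
```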

Proof in the univariate case

Demonstration of this result is fairly straightforward under the assumption that g(x) is differentiable in a neighborhood of θ and g′(x) is continuous at θ with g′(θ) ≠ 0. To begin, we use the mean value theorem (i.e. the first-order approximation of a Taylor series using Taylor's theorem):

g(X_n) = g(\theta) + g'(\tilde{\theta})(X_n - \theta),

where \tilde{\theta} lies between X_n and θ. Note that since X_n \xrightarrow{P} θ and |\tilde{\theta} − θ| < |X_n − θ|, it must be that \tilde{\theta} \xrightarrow{P} θ, and since g′ is continuous at θ, applying the continuous mapping theorem yields

g'(\tilde{\theta}) \ \xrightarrow{P}\ g'(\theta),

where \xrightarrow{P} denotes convergence in probability.

Rearranging the terms and multiplying by √n gives

\sqrt{n}\,[g(X_n) - g(\theta)] = g'(\tilde{\theta})\,\sqrt{n}\,[X_n - \theta].

Since

\sqrt{n}\,[X_n - \theta] \ \xrightarrow{D}\ \mathcal{N}(0, \sigma^2)

by assumption, it follows immediately from an appeal to Slutsky's theorem that

\sqrt{n}\,[g(X_n) - g(\theta)] \ \xrightarrow{D}\ \mathcal{N}(0, \sigma^2\,[g'(\theta)]^2).

This concludes the proof.

Proof with an explicit order of approximation

Alternatively, one can add one more step at the end to obtain the order of approximation:

\begin{aligned}
\sqrt{n}\,[g(X_n) - g(\theta)] &= g'(\tilde{\theta})\,\sqrt{n}\,[X_n - \theta] \\
&= \sqrt{n}\,[X_n - \theta]\left[g'(\tilde{\theta}) + g'(\theta) - g'(\theta)\right] \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + \sqrt{n}\,[X_n - \theta]\left[g'(\tilde{\theta}) - g'(\theta)\right] \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + O_p(1) \cdot o_p(1) \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + o_p(1)
\end{aligned}

This suggests that the error in the approximation converges to 0 in probability.
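
A small numerical sketch (the exponential sample mean and the choice g(x) = exp(x) below are illustrative assumptions, not part of the derivation) shows the remainder term √n[g(X_n) − g(θ)] − g′(θ)√n[X_n − θ] shrinking as n grows:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 1.0                      # mean of an Exponential(1) random variable

for n in (100, 1_000, 10_000, 100_000):
    # The mean of n iid Exponential(1) draws is Gamma(n, 1)/n; sample it directly.
    xbar = rng.gamma(n, 1.0, size=5_000) / n
    # Remainder: sqrt(n)[g(X_n) - g(theta)] - g'(theta) * sqrt(n)[X_n - theta], with g = exp.
    remainder = np.sqrt(n) * (np.exp(xbar) - np.exp(theta)) \
        - np.exp(theta) * np.sqrt(n) * (xbar - theta)
    print(f"n = {n:>7}: mean |remainder| = {np.mean(np.abs(remainder)):.4f}")
```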

Multivariate delta method

By definition, a consistent estimator B converges in probability to its true value β, and often a central limit theorem can be applied to obtain asymptotic normality:

\sqrt{n}\,(B - \beta) \ \xrightarrow{D}\ N(0, \Sigma),

where n is the number of observations and Σ is a (symmetric positive semi-definite) covariance matrix. Suppose we want to estimate the variance of a scalar-valued function h of the estimator B. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can estimate h(B) as

h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B - \beta)

which implies the variance of h(B) is approximately

\begin{aligned}
\operatorname{Var}\left(h(B)\right) &\approx \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot (B - \beta)\right) \\
&= \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta\right) \\
&= \operatorname{Var}\left(\nabla h(\beta)^T \cdot B\right) \\
&= \nabla h(\beta)^T \cdot \operatorname{Cov}(B) \cdot \nabla h(\beta) \\
&= \nabla h(\beta)^T \cdot \frac{\Sigma}{n} \cdot \nabla h(\beta)
\end{aligned}

One can use the mean value theorem (for real-valued functions of many variables) to see that this does not rely on taking a first-order approximation.

The delta method therefore implies that

\sqrt{n}\,\left(h(B) - h(\beta)\right) \ \xrightarrow{D}\ N\left(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)\right)

or in univariate terms,

\sqrt{n}\,\left(h(B) - h(\beta)\right) \ \xrightarrow{D}\ N\left(0, \sigma^2 \cdot \left(h'(\beta)\right)^2\right).
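
For instance, a minimal numerical sketch of the gradient formula above (the ratio h(B) = B_1/B_2, the particular Σ, and the assumption that B is exactly normal in the Monte Carlo check are all made up for illustration):

```python
import numpy as np

# Assumed setup: an estimator B of beta = (beta_1, beta_2) with covariance Sigma / n.
beta = np.array([2.0, 4.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
n = 1_000

# Scalar function of the estimator: h(B) = B_1 / B_2.
grad_h = np.array([1.0 / beta[1], -beta[0] / beta[1] ** 2])  # gradient of h at beta

# Delta-method approximation: Var(h(B)) is approximately grad_h' (Sigma / n) grad_h.
var_h = grad_h @ (Sigma / n) @ grad_h
print(f"delta-method Var(h(B)) = {var_h:.6e}")

# Monte Carlo check, assuming B ~ N(beta, Sigma / n) exactly.
rng = np.random.default_rng(2)
draws = rng.multivariate_normal(beta, Sigma / n, size=100_000)
print(f"simulated Var(h(B))    = {(draws[:, 0] / draws[:, 1]).var():.6e}")
```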

Example: the binomial proportion

Suppose X_n is binomial with parameters p ∈ (0, 1] and n. Since

\sqrt{n}\left[\frac{X_n}{n} - p\right] \ \xrightarrow{D}\ N(0, p(1-p)),

we can apply the delta method with g(θ) = log(θ) to see

\sqrt{n}\left[\log\left(\frac{X_n}{n}\right) - \log(p)\right] \ \xrightarrow{D}\ N\left(0, p(1-p)\,[1/p]^2\right)

Hence, even though for any finite n the variance of log(X_n/n) does not actually exist (since X_n can be zero), the asymptotic variance of log(X_n/n) does exist and is equal to

\frac{1-p}{np}.

Note that since p > 0, Pr(X_n/n > 0) → 1 as n → ∞, so with probability converging to one, log(X_n/n) is finite for large n.

Moreover, if p̂ and q̂ are estimates of different group rates from independent samples of sizes n and m respectively, then the logarithm of the estimated relative risk p̂/q̂ has asymptotic variance equal to

\frac{1-p}{p\,n} + \frac{1-q}{q\,m}.

This is useful for constructing a hypothesis test or a confidence interval for the relative risk.
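
A quick simulation sketch (the values of n and p below are illustrative assumptions) compares the sample variance of log(X_n/n) with the asymptotic value (1 − p)/(np):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 200, 0.3, 50_000

# X_n ~ Binomial(n, p); with these values X_n = 0 is astronomically unlikely,
# so log(X_n / n) is finite in practice even though its exact variance is undefined.
x = rng.binomial(n, p, size=reps)
log_prop = np.log(x / n)

print(f"simulated variance of log(X_n/n): {log_prop.var():.6f}")
print(f"asymptotic value (1-p)/(n p):     {(1 - p) / (n * p):.6f}")
```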

Alternative form

The delta method is often used in a form that is essentially identical to that above, but without the assumption that X_n or B is asymptotically normal. Often the only context is that the variance is "small". The results then just give approximations to the means and covariances of the transformed quantities. For example, the formulae presented in Klein (1953, p. 258) are:[5]

\begin{aligned}
\operatorname{Var}(h_r) &= \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)^2 \operatorname{Var}(B_i) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_r}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j) \\
\operatorname{Cov}(h_r, h_s) &= \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_i}\right) \operatorname{Var}(B_i) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j)
\end{aligned}

where h_r is the r-th element of h(B) and B_i is the i-th element of B.
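
A small sketch of these formulae (the two-component function h and the covariance values below are illustrative assumptions): written with the Jacobian J of h, Klein's sums are exactly the entries of J·Cov(B)·Jᵀ.

```python
import numpy as np

# Assumed estimator B with mean beta and a "small" covariance matrix.
beta = np.array([1.0, 2.0, 0.5])
cov_B = np.array([[0.010, 0.002, 0.001],
                  [0.002, 0.020, 0.003],
                  [0.001, 0.003, 0.015]])

# Two transformed quantities: h_1(B) = B_1 * B_2 and h_2(B) = B_1 / B_3.
# Jacobian of h = (h_1, h_2) evaluated at beta (rows are gradients).
J = np.array([[beta[1], beta[0], 0.0],
              [1.0 / beta[2], 0.0, -beta[0] / beta[2] ** 2]])

# Klein's sums (variance terms plus cross terms) are the entries of J cov_B J'.
cov_h = J @ cov_B @ J.T
print("Var(h_1), Var(h_2):", np.diag(cov_h))
print("Cov(h_1, h_2):     ", cov_h[0, 1])
```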

Second-order delta method

When g′(θ) = 0 the delta method cannot be applied. However, if g′′(θ) exists and is not zero, the second-order delta method can be applied. By the Taylor expansion,

n\,[g(X_n) - g(\theta)] = \frac{1}{2} n\,[X_n - \theta]^2 \left[g''(\theta)\right] + o_p(1),

so that the variance of g(X_n) depends on the moments of X_n up to the fourth moment.

The second-order delta method is also useful for obtaining a more accurate approximation of the distribution of g(X_n) when the sample size is small:

\sqrt{n}\,[g(X_n) - g(\theta)] = \sqrt{n}\,[X_n - \theta]\, g'(\theta) + \frac{1}{2} \frac{n\,[X_n - \theta]^2}{\sqrt{n}}\, g''(\theta) + o_p(1).

For example, when X_n follows the standard normal distribution, g(X_n) can be approximated as a weighted sum of a standard normal random variable and a chi-square random variable with one degree of freedom.
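
As an illustration of the g′(θ) = 0 case (a sketch; the choices θ = 0, σ² = 1 and g(x) = x² are assumptions made for this example), the limiting distribution is a scaled chi-square rather than a normal:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1_000, 50_000

# X_n is the mean of n iid N(0, 1) draws, so X_n ~ N(0, 1/n) exactly; theta = 0.
# With g(x) = x^2 we have g'(theta) = 0 and g''(theta) = 2, so the second-order
# delta method gives n [g(X_n) - g(theta)] -> (1/2) g''(theta) chi^2_1 = chi^2_1.
xbar = rng.normal(0.0, np.sqrt(1.0 / n), size=reps)
limit_draws = n * xbar**2

# chi^2_1 has mean 1 and variance 2; compare with the simulated statistic.
print(f"simulated mean:     {limit_draws.mean():.3f}  (chi^2_1 mean: 1)")
print(f"simulated variance: {limit_draws.var():.3f}  (chi^2_1 variance: 2)")
```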

Nonparametric delta method

A version of the delta method exists in nonparametric statistics. Let X_i ∼ F be independent and identically distributed random variables forming a sample of size n with empirical distribution function F̂_n, and let T be a functional. If T is Hadamard differentiable with respect to the Chebyshev metric, then

\frac{T(\hat{F}_n) - T(F)}{\widehat{\mathrm{se}}} \ \xrightarrow{D}\ N(0, 1)

where \widehat{\mathrm{se}} = \hat{\tau}/\sqrt{n} and \hat{\tau}^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{L}^2(X_i), with \hat{L}(x) = L_{\hat{F}_n}(\delta_x) denoting the empirical influence function for T. A nonparametric (1 − α) pointwise asymptotic confidence interval for T(F) is therefore given by

T(\hat{F}_n) \pm z_{\alpha/2}\,\widehat{\mathrm{se}}

where z_q denotes the q-quantile of the standard normal distribution. See Wasserman (2006), p. 19f., for details and examples.
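
As a concrete sketch (choosing T(F) to be the mean of F, whose empirical influence function is L̂(x) = x − X̄, is an assumption made here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=400)   # sample of size n from F
alpha = 0.05
z = 1.959963984540054                      # standard normal quantile for a 95% interval

# Plug-in estimate of the functional T(F) = mean of F.
t_hat = x.mean()

# Empirical influence function of the mean: L_hat(x_i) = x_i - T(F_hat).
L_hat = x - t_hat
tau_hat_sq = np.mean(L_hat**2)
se_hat = np.sqrt(tau_hat_sq / len(x))

print(f"T(F_hat) = {t_hat:.3f}")
print(f"95% confidence interval: ({t_hat - z * se_hat:.3f}, {t_hat + z * se_hat:.3f})")
```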

References

  1. Portnoy, Stephen (2013). "Letter to the Editor". The American Statistician. 67 (3): 190. doi:10.1080/00031305.2013.820668. S2CID 219596186.
  2. Kelley, Truman L. (1928). Crossroads in the Mind of Man: A Study of Differentiable Mental Abilities. pp. 49–50. ISBN 978-1-4338-0048-1.
  3. Doob, J. L. (1935). "The Limiting Distributions of Certain Statistics". Annals of Mathematical Statistics. 6 (3): 160–169. doi:10.1214/aoms/1177732594. JSTOR 2957546.
  4. Ver Hoef, J. M. (2012). "Who invented the delta method?". The American Statistician. 66 (2): 124–127. doi:10.1080/00031305.2012.687494. JSTOR 23339471.
  5. Klein, L. R. (1953). A Textbook of Econometrics. p. 258.
