M-estimator

From Wikipedia, the free encyclopedia
Class of statistical estimators

In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average.[1] Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators.[citation needed] However, M-estimators are not inherently robust, as is clear from the fact that they include maximum likelihood estimators, which are in general not robust. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation. The "M" initial stands for "maximum likelihood-type".

More generally, an M-estimator may be defined to be a zero of an estimating function.[2][3][4][5][6][7] This estimating function is often the derivative of another statistical function. For example, a maximum-likelihood estimate is the point where the derivative of the likelihood function with respect to the parameter is zero; thus, a maximum-likelihood estimator is a critical point of the score function.[8] In many applications, such M-estimators can be thought of as estimating characteristics of the population.

Historical motivation

Although the main concepts of robust statistics have been formally developed only in recent decades, precursors of robust M-estimators can be traced back to the early history of statistics. Galileo Galilei (1632) was among the first to argue that measurement errors required systematic treatment. Later, Roger Joseph Boscovich (1757) proposed an estimator based on absolute deviations, Daniel Bernoulli (1785) suggested iterative reweighting schemes, and Simon Newcomb (1886) experimented with mixtures of distributions for regression. By the late 19th century, Smith (1888) introduced what is now recognized as the first robust M-estimator, already resembling the modern formulation. A recent review by De Menezes (2021) collected, organized, classified, and reported tuning constants for an extensive set of M-estimators, providing a systematic overview of their properties and applications.[9]

The method of least squares is a prototypical M-estimator, since the estimator is defined as a minimum of the sum of squares of the residuals.

Another popular M-estimator is maximum-likelihood estimation. For a family of probability density functions $f$ parameterized by $\theta$, a maximum likelihood estimator of $\theta$ is computed for each set of data by maximizing the likelihood function over the parameter space $\{\theta\}$. When the observations are independent and identically distributed, an ML estimate $\hat{\theta}$ satisfies

$$\hat{\theta} = \arg\max_{\theta} \left( \prod_{i=1}^{n} f(x_i, \theta) \right)$$

or, equivalently,

$$\hat{\theta} = \arg\min_{\theta} \left( \sum_{i=1}^{n} -\log f(x_i, \theta) \right).$$

Maximum-likelihood estimators have optimal properties in the limit of infinitely many observations under rather general conditions, but may be biased and not the most efficient estimators for finite samples.
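
As a concrete numerical illustration, here is a minimal sketch of the minimization above; the normal location model with known unit variance and the use of SciPy's scalar optimizer are assumptions of this example, not part of the article.

```python
# Minimal sketch: ML estimation as minimization of the summed negative
# log-density. The N(theta, 1) model and scipy.optimize are illustrative choices.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=200)   # simulated i.i.d. sample

def neg_log_likelihood(theta):
    # sum_i -log f(x_i, theta) for f the N(theta, 1) density
    return np.sum(0.5 * (x - theta) ** 2 + 0.5 * np.log(2 * np.pi))

theta_hat = minimize_scalar(neg_log_likelihood).x
print(theta_hat, x.mean())   # for this model the MLE equals the sample mean
```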

Definition

In 1964, Peter J. Huber proposed generalizing maximum likelihood estimation to the minimization of

$$\sum_{i=1}^{n} \rho(x_i, \theta),$$

where ρ is a function with certain properties (see below). The solutions

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \rho(x_i, \theta)$$

are called M-estimators ("M" for "maximum likelihood-type"; Huber, 1981, page 43); other types of robust estimators include L-estimators, R-estimators and S-estimators. Maximum likelihood estimators (MLE) are thus a special case of M-estimators. With suitable rescaling, M-estimators are special cases of extremum estimators (in which more general functions of the observations can be used).

The function ρ, or its derivative, ψ, can be chosen in such a way as to give the estimator desirable properties (in terms of bias and efficiency) when the data are truly from the assumed distribution, and 'not bad' behaviour when the data are generated from a model that is, in some sense, close to the assumed distribution.
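
As a brief illustration of such a choice, the sketch below minimizes a Huber-type ρ for a location parameter; the tuning constant k = 1.345 and the use of scipy.optimize are assumptions of this example, not prescriptions from the text.

```python
# Sketch of a rho-type M-estimator of location with the Huber rho function:
# quadratic near zero (efficient for clean data), linear in the tails
# (resistant to outliers). The tuning constant k is an illustrative choice.
import numpy as np
from scipy.optimize import minimize_scalar

def huber_rho(r, k=1.345):
    return np.where(np.abs(r) <= k, 0.5 * r ** 2, k * (np.abs(r) - 0.5 * k))

x = np.array([1.1, 0.9, 1.3, 0.8, 1.0, 12.0])   # one gross outlier

theta_hat = minimize_scalar(lambda t: huber_rho(x - t).sum()).x
print(theta_hat, x.mean())   # the M-estimate is pulled far less toward 12.0 than the mean
```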

Types

M-estimators are solutions, θ, which minimize

$$\sum_{i=1}^{n} \rho(x_i, \theta).$$

This minimization can always be done directly. Often it is simpler to differentiate with respect to θ and solve for the root of the derivative. When this differentiation is possible, the M-estimator is said to be of ψ-type. Otherwise, the M-estimator is said to be of ρ-type.

In most practical cases, the M-estimators are of ψ-type.

ρ-type

For a positive integer $r$, let $(\mathcal{X}, \Sigma)$ and $(\Theta \subset \mathbb{R}^{r}, S)$ be measurable spaces, where $\theta \in \Theta$ is a vector of parameters. An M-estimator of ρ-type $T$ is defined through a measurable function $\rho: \mathcal{X} \times \Theta \to \mathbb{R}$. It maps a probability distribution $F$ on $\mathcal{X}$ to the value $T(F) \in \Theta$ (if it exists) that minimizes $\int_{\mathcal{X}} \rho(x, \theta)\, dF(x)$:

$$T(F) := \arg\min_{\theta \in \Theta} \int_{\mathcal{X}} \rho(x, \theta)\, dF(x)$$

For example, for the maximum likelihood estimator, $\rho(x, \theta) = -\log f(x, \theta)$, where $f(x, \theta) = \frac{\partial}{\partial x} F(x, \theta)$.

ψ-type

If $\rho$ is differentiable with respect to $\theta$, the computation of $\hat{\theta}$ is usually much easier. An M-estimator of ψ-type $T$ is defined through a measurable function $\psi: \mathcal{X} \times \Theta \to \mathbb{R}^{r}$. It maps a probability distribution $F$ on $\mathcal{X}$ to the value $T(F) \in \Theta$ (if it exists) that solves the vector equation:

$$\int_{\mathcal{X}} \psi(x, \theta)\, dF(x) = 0$$

$$\int_{\mathcal{X}} \psi(x, T(F))\, dF(x) = 0$$

For example, for the maximum likelihood estimator, $\psi(x, \theta) = \left( \frac{\partial}{\partial \theta^{1}} \log f(x, \theta), \dots, \frac{\partial}{\partial \theta^{r}} \log f(x, \theta) \right)^{\mathrm{T}}$, where $u^{\mathrm{T}}$ denotes the transpose of the vector $u$ and $f(x, \theta) = \frac{\partial}{\partial x} F(x, \theta)$.

Such an estimator is not necessarily an M-estimator of ρ-type, but if ρ has a continuous first derivative with respect to $\theta$, then a necessary condition for an M-estimator of ψ-type to be an M-estimator of ρ-type is $\psi(x, \theta) = \nabla_{\theta}\, \rho(x, \theta)$. The previous definitions can easily be extended to finite samples.

If the function ψ decreases to zero as $x \to \pm\infty$, the estimator is called redescending. Such estimators have some additional desirable properties, such as complete rejection of gross outliers.
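
A commonly cited redescending ψ is Tukey's biweight. The following minimal sketch shows how it maps large residuals back to zero; the conventional tuning constant c = 4.685 is an illustrative choice, not specified in the text.

```python
# Sketch of a redescending psi function (Tukey's biweight): it returns to
# zero for large residuals and hence rejects gross outliers entirely.
import numpy as np

def tukey_biweight_psi(r, c=4.685):
    inside = r * (1 - (r / c) ** 2) ** 2   # smooth response for moderate residuals
    return np.where(np.abs(r) <= c, inside, 0.0)

print(tukey_biweight_psi(np.array([0.5, 2.0, 4.0, 10.0])))
# the last entry (|r| > c, a gross outlier) is mapped exactly to 0
```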

Computation

For many choices of ρ or ψ, no closed-form solution exists and an iterative approach to computation is required. It is possible to use standard function optimization algorithms, such as Newton–Raphson. However, in most cases an iteratively re-weighted least squares fitting algorithm can be performed; this is typically the preferred method.

For some choices of ψ, specifically redescending functions, the solution may not be unique. The issue is particularly relevant in multivariate and regression problems. Thus, some care is needed to ensure that good starting points are chosen. Robust starting points, such as the median as an estimate of location and the median absolute deviation as a univariate estimate of scale, are common.
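
A minimal sketch of such an iteratively re-weighted least squares scheme for a univariate location estimate, started from the median and scaled by the median absolute deviation as suggested above, might look as follows; the Huber ψ, its tuning constant, and the fixed scale are illustrative assumptions.

```python
# Minimal IRLS sketch for a Huber-type location M-estimate. The starting
# point is the median; the scale is a MAD-based estimate held fixed during
# the iterations. Tuning constant k and tolerance are illustrative choices.
import numpy as np

def huber_psi(r, k=1.345):
    return np.clip(r, -k, k)

def m_estimate_location(x, k=1.345, tol=1e-8, max_iter=100):
    theta = np.median(x)                            # robust starting point
    scale = np.median(np.abs(x - theta)) / 0.6745   # MAD-based scale estimate
    for _ in range(max_iter):
        r = (x - theta) / scale
        safe_r = np.where(r == 0, 1.0, r)
        w = np.where(r == 0, 1.0, huber_psi(r, k) / safe_r)   # IRLS weights psi(r)/r
        theta_new = np.sum(w * x) / np.sum(w)
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return theta

x = np.array([1.1, 0.9, 1.3, 0.8, 1.0, 12.0])       # one gross outlier
print(m_estimate_location(x), np.mean(x))           # robust estimate vs. ordinary mean
```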

Concentrating parameters

In the computation of M-estimators, it is sometimes useful to rewrite the objective function so that the dimension of the parameter vector is reduced. The procedure is called "concentrating" or "profiling". Examples in which concentrating parameters increases computation speed include seemingly unrelated regressions (SUR) models.[10] Consider the following M-estimation problem:

$$(\hat{\beta}_n, \hat{\gamma}_n) := \arg\max_{\beta, \gamma} \sum_{i=1}^{N} q(w_i, \beta, \gamma)$$

Assuming the function q is differentiable, the M-estimator solves the first-order conditions:

$$\sum_{i=1}^{N} \frac{\partial}{\partial \beta}\, q(w_i, \beta, \gamma) = 0$$

$$\sum_{i=1}^{N} \frac{\partial}{\partial \gamma}\, q(w_i, \beta, \gamma) = 0$$

Now, if we can solve the second equation for γ in terms of $W := (w_1, w_2, \dots, w_N)$ and $\beta$, the second equation becomes:

$$\sum_{i=1}^{N} \frac{\partial}{\partial \gamma}\, q(w_i, \beta, g(W, \beta)) = 0$$

where g is some function to be found. Now, we can rewrite the original objective function solely in terms of β by substituting the function g in place of γ. As a result, the number of parameters is reduced.

Whether this procedure can be done depends on the particular problem at hand. However, when it is possible, concentrating parameters can facilitate computation to a great degree. For example, in estimating a SUR model of 6 equations with 5 explanatory variables in each equation by maximum likelihood, the number of parameters declines from 51 to 30.[10]
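
To illustrate the mechanics on a much smaller problem than the SUR example (the normal model below is an assumption of this sketch, not taken from the text), the variance parameter of a normal likelihood can be concentrated out in closed form, leaving an objective in the mean alone.

```python
# Sketch of "concentrating out" a nuisance parameter: for a normal model with
# unknown mean beta and variance gamma, the first-order condition in gamma
# gives gamma = mean((x - beta)^2), which is substituted back so the objective
# depends on beta alone. Model and optimizer are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=500)

def concentrated_neg_loglik(beta):
    gamma_hat = np.mean((x - beta) ** 2)           # g(W, beta) in closed form
    # normal negative log-likelihood with gamma replaced by g(W, beta)
    return 0.5 * len(x) * (np.log(gamma_hat) + 1.0 + np.log(2 * np.pi))

beta_hat = minimize_scalar(concentrated_neg_loglik).x
print(beta_hat, np.mean(x))    # the concentrated estimate agrees with the sample mean
```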

Despite its computational appeal, concentrating parameters is of limited use in deriving asymptotic properties of M-estimators.[11] The presence of W in each summand of the objective function makes it difficult to apply the law of large numbers and the central limit theorem.

Properties

Distribution

It can be shown that M-estimators are asymptotically normally distributed. As such, Wald-type approaches to constructing confidence intervals and hypothesis tests can be used. However, since the theory is asymptotic, it will frequently be sensible to check the distribution, perhaps by examining the permutation or bootstrap distribution.
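
A minimal sketch of such a bootstrap check follows, using a Huber-type location estimator as the M-estimator; the estimator, model, and resample count are illustrative choices, not mandated by the text.

```python
# Nonparametric bootstrap of a Huber-type location M-estimator to inspect its
# sampling distribution. Loss, tuning constant and 2000 resamples are
# illustrative assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

def huber_rho(r, k=1.345):
    return np.where(np.abs(r) <= k, 0.5 * r ** 2, k * (np.abs(r) - 0.5 * k))

def huber_location(sample):
    return minimize_scalar(lambda t: huber_rho(sample - t).sum()).x

rng = np.random.default_rng(2)
x = rng.standard_t(df=3, size=200)        # heavy-tailed sample

boot = np.array([huber_location(rng.choice(x, size=x.size, replace=True))
                 for _ in range(2000)])
print(boot.mean(), boot.std())            # roughly normal; check e.g. with a Q-Q plot
```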

Influence function

The influence function of an M-estimator of $\psi$-type is proportional to its defining $\psi$ function.

Let $T$ be an M-estimator of ψ-type, and let $G$ be a probability distribution for which $T(G)$ is defined. Its influence function IF is

$$\operatorname{IF}(x; T, G) = -\frac{\psi(x, T(G))}{\int \frac{\partial \psi(y, \theta)}{\partial \theta}\, f(y)\, \mathrm{d}y},$$

assuming the density function $f(y)$ exists. A proof of this property of M-estimators can be found in Huber (1981, Section 3.2).
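
As a numerical illustration of the formula above, consider the Huber ψ at the standard normal model, where $T(G) = 0$ by symmetry; the tuning constant and the use of SciPy for the integral are assumptions of this sketch.

```python
# Evaluating IF(x; T, G) = -psi(x, T(G)) / ∫ (d psi / d theta) f(y) dy for the
# Huber psi at the standard normal model, where T(G) = 0 by symmetry.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

k = 1.345
psi = lambda y: np.clip(y, -k, k)        # psi(y, theta) evaluated at theta = 0

# With psi(y, theta) = clip(y - theta, -k, k), d psi / d theta = -1 for
# |y - theta| < k and 0 otherwise, so the denominator is -P(|Y| < k) under N(0, 1).
denominator = -quad(norm.pdf, -k, k)[0]

influence = lambda x: -psi(x) / denominator
print(influence(0.5), influence(100.0))  # bounded: a gross error has bounded influence
```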

Applications

M-estimators can be constructed for location parameters and scale parameters in univariate and multivariate settings, as well as being used in robust regression.

Examples

Mean

Let $(X_1, \dots, X_n)$ be a set of independent, identically distributed random variables with distribution $F$.

If we define

$$\rho(x, \theta) = \frac{(x - \theta)^2}{2},$$

we note that this is minimized when $\theta$ is the mean of the $X$s. Thus the mean is an M-estimator of ρ-type, with this ρ function.

As this ρ function is continuously differentiable in $\theta$, the mean is thus also an M-estimator of ψ-type for $\psi(x, \theta) = \theta - x$.

Median

For estimating the median of $(X_1, \dots, X_n)$, we can instead define the ρ function as

$$\rho(x, \theta) = |x - \theta|,$$

and similarly, this ρ function is minimized when $\theta$ is the median of the $X$s.

While this ρ function is not differentiable in θ, a ψ-type M-estimator can still be defined from a subgradient of the ρ function; up to sign, it can be expressed as

$$\psi(x, \theta) = \operatorname{sgn}(x - \theta)$$

or, written as the set-valued subdifferential,

$$\psi(x, \theta) = {\begin{cases}\{-1\}, & {\text{if }} x - \theta < 0\\\{1\}, & {\text{if }} x - \theta > 0\\\left[-1, 1\right], & {\text{if }} x - \theta = 0.\end{cases}}$$
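
A brief numerical check of both examples, recovering the mean and the median by brute-force minimization of the two ρ objectives; the grid search is purely an illustrative device.

```python
# Verifying numerically that the rho functions above recover the sample mean
# and the sample median. Data and grid resolution are illustrative choices.
import numpy as np

x = np.array([2.0, 3.5, 4.0, 5.0, 100.0])
grid = np.linspace(x.min(), x.max(), 100001)

rho_mean = ((x[:, None] - grid) ** 2 / 2).sum(axis=0)   # rho(x, theta) = (x - theta)^2 / 2
rho_median = np.abs(x[:, None] - grid).sum(axis=0)      # rho(x, theta) = |x - theta|

print(grid[rho_mean.argmin()], x.mean())        # both near 22.9
print(grid[rho_median.argmin()], np.median(x))  # both near 4.0
```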

Sufficient conditions for statistical consistency

M-estimators are consistent under various sets of conditions. A typical set of assumptions is that the class of functions satisfies a uniform law of large numbers and that the maximum is well-separated. Specifically, let $M_n, M: \Theta \to \mathbb{R}$ denote the empirical and population objectives, respectively. M-estimation is consistent if, as $n \to \infty$,

$$\sup_{\theta \in \Theta} \left| M_n(\theta) - M(\theta) \right| \;\xrightarrow{p}\; 0$$

and, for every $\epsilon > 0$,

$$\sup_{\theta \,:\, d(\theta, \theta^{*}) \geq \epsilon} M(\theta) < M(\theta^{*}),$$

where $d: \Theta \times \Theta \to \mathbb{R}$ is a distance function and $\theta^{*}$ is the optimum.[12]

The uniform convergence constraint is not necessarily required; an alternative set of assumptions is to instead consider pointwise convergence (in probability) of the objective functions. Additionally, assume that each of the $M_n$ has a continuous derivative with exactly one zero, or has a derivative which is non-decreasing and is asymptotically of order $o_p(1)$. Finally, assume that the maximum $\theta^{*}$ is well-separated. Then M-estimation is consistent.[13]
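
A small simulation consistent with these results: under a symmetric heavy-tailed model, a Huber-type location M-estimate (an illustrative choice of estimator and model, not drawn from the text) approaches the population optimum as the sample size grows.

```python
# Sketch: Huber location M-estimates on growing samples from a symmetric
# Student-t distribution centred at 0, illustrating consistency. Model,
# tuning constant and optimizer are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

def huber_rho(r, k=1.345):
    return np.where(np.abs(r) <= k, 0.5 * r ** 2, k * (np.abs(r) - 0.5 * k))

rng = np.random.default_rng(3)
for n in (100, 1_000, 10_000, 100_000):
    x = rng.standard_t(df=3, size=n)                         # T(F) = 0 by symmetry
    theta_hat = minimize_scalar(lambda t: huber_rho(x - t).sum()).x
    print(n, theta_hat)                                      # drifts toward 0 as n grows
```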

References

  1. ^ Hayashi, Fumio (2000). "Extremum Estimators". Econometrics. Princeton University Press. ISBN 0-691-01018-8.
  2. ^ Godambe, Vidyadhar P., ed. (1991). Estimating Functions. Oxford Statistical Science Series. Vol. 7. New York: Clarendon Press, Oxford University Press.
  3. ^ Heyde, Christopher C. (1997). Quasi-likelihood and its application: A general approach to optimal parameter estimation. Springer Series in Statistics. New York: Springer-Verlag.
  4. ^ McLeish, D. L.; Small, Christopher G. (1988). The theory and applications of statistical inference functions. Lecture Notes in Statistics. Vol. 44. New York: Springer-Verlag.
  5. ^ Mukhopadhyay, Parimal (2004). An Introduction to Estimating Functions. Alpha Science International, Ltd.
  6. ^ Small, Christopher G.; Wang, Jinfang (2003). Numerical methods for nonlinear estimating equations. Oxford Statistical Science Series. Vol. 29. New York: Clarendon Press, Oxford University Press.
  7. ^ van de Geer, Sara A. (2000). Empirical Processes in M-estimation: Applications of empirical process theory. Cambridge Series in Statistical and Probabilistic Mathematics. Vol. 6. Cambridge: Cambridge University Press.
  8. ^ Ferguson, Thomas S. (1982). "An inconsistent maximum likelihood estimate". Journal of the American Statistical Association. 77 (380): 831–834. doi:10.1080/01621459.1982.10477894. JSTOR 2287314.
  9. ^ De Menezes, Diego Q. F. (2021). "A review on robust M-estimators for regression analysis". Computers & Chemical Engineering. 147 (1): 107254. doi:10.1016/j.compchemeng.2021.107254. S2CID 232328341.
  10. ^ a b Giles, D. E. (July 10, 2012). "Concentrating, or Profiling, the Likelihood Function".
  11. ^ Wooldridge, J. M. (2001). Econometric Analysis of Cross Section and Panel Data. Cambridge, Mass.: MIT Press. ISBN 0-262-23219-7.
  12. ^ van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge: Cambridge University Press.
  13. ^ van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge: Cambridge University Press.

Further reading

  • Andersen, Robert (2008). Modern Methods for Robust Regression. Quantitative Applications in the Social Sciences. Vol. 152. Los Angeles, CA: Sage Publications. ISBN 978-1-4129-4072-6.
  • Godambe, V. P. (1991). Estimating functions. Oxford Statistical Science Series. Vol. 7. New York: Clarendon Press. ISBN 978-0-19-852228-7.
  • Heyde, Christopher C. (1997). Quasi-likelihood and its application: A general approach to optimal parameter estimation. Springer Series in Statistics. New York: Springer. doi:10.1007/b98823. ISBN 978-0-387-98225-0.
  • Huber, Peter J. (2009). Robust Statistics (2nd ed.). Hoboken, NJ: John Wiley & Sons Inc. ISBN 978-0-470-12990-6.
  • Hoaglin, David C.; Mosteller, Frederick; Tukey, John W. (1983). Understanding Robust and Exploratory Data Analysis. Hoboken, NJ: John Wiley & Sons Inc. ISBN 0-471-09777-2.
  • McLeish, D. L.; Small, Christopher G. (1989). The theory and applications of statistical inference functions. Lecture Notes in Statistics. Vol. 44. New York: Springer. ISBN 978-0-387-96720-2.
  • Mukhopadhyay, Parimal (2004). An Introduction to Estimating Functions. Harrow, UK: Alpha Science International, Ltd. ISBN 978-1-84265-163-6.
  • Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. (2007). "Section 15.7. Robust Estimation". Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. ISBN 978-0-521-88068-8.
  • Serfling, Robert J. (2002). Approximation theorems of mathematical statistics. Wiley Series in Probability and Mathematical Statistics. Hoboken, NJ: John Wiley & Sons Inc. ISBN 978-0-471-21927-9.
  • Shapiro, Alexander (2000). "On the asymptotics of constrained local M-estimators". Annals of Statistics. 28 (3): 948–960. CiteSeerX 10.1.1.69.2288. doi:10.1214/aos/1015952006. JSTOR 2674061. MR 1792795.
  • Small, Christopher G.; Wang, Jinfang (2003). Numerical methods for nonlinear estimating equations. Oxford Statistical Science Series. Vol. 29. New York: Oxford University Press. ISBN 978-0-19-850688-1.
  • van de Geer, Sara A. (2000). Empirical Processes in M-estimation: Applications of empirical process theory. Cambridge Series in Statistical and Probabilistic Mathematics. Vol. 6. Cambridge, UK: Cambridge University Press. ISBN 978-0-521-65002-1.
  • Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press. pp. 55–79.
  • Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). San Diego, CA: Academic Press.

External links

  • M-estimators — an introduction to the subject by Zhengyou Zhang