Statistical model specification

From Wikipedia, the free encyclopedia

Part of the process of building a statistical model

In statistics, model specification is part of the process of building a statistical model: specification consists of selecting an appropriate functional form for the model and choosing which variables to include. For example, given personal income $y$ together with years of schooling $s$ and on-the-job experience $x$, we might specify a functional relationship $y=f(s,x)$ as follows:^[1]

$$\ln y=\ln y_{0}+\rho s+\beta _{1}x+\beta _{2}x^{2}+\varepsilon$$

where $\varepsilon$ is the unexplained error term, assumed to consist of independent and identically distributed Gaussian variables.
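As an illustration, the earnings equation above can be estimated by ordinary least squares once it is written as a linear model in the regressors $1$, $s$, $x$ and $x^{2}$. The following is a minimal sketch using NumPy; the function name `fit_mincer`, the synthetic data, and the parameter values are assumptions made up for this example and do not come from any real data set.

```python
import numpy as np

def fit_mincer(log_income, schooling, experience):
    """OLS fit of ln y = ln y0 + rho*s + beta1*x + beta2*x^2 + eps."""
    s = np.asarray(schooling, dtype=float)
    x = np.asarray(experience, dtype=float)
    # Design matrix columns: intercept (ln y0), s, x, x^2
    X = np.column_stack([np.ones_like(s), s, x, x**2])
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_income, dtype=float), rcond=None)
    return dict(zip(["ln_y0", "rho", "beta1", "beta2"], coef))

# Synthetic data: all parameter values below are hypothetical.
rng = np.random.default_rng(0)
n = 5000
s = rng.integers(8, 21, size=n).astype(float)   # years of schooling
x = rng.uniform(0.0, 40.0, size=n)              # years of experience
true = {"ln_y0": 9.0, "rho": 0.08, "beta1": 0.04, "beta2": -0.0007}
eps = rng.normal(0.0, 0.3, size=n)              # i.i.d. Gaussian error term
ln_y = (true["ln_y0"] + true["rho"] * s
        + true["beta1"] * x + true["beta2"] * x**2 + eps)

estimates = fit_mincer(ln_y, s, x)
print(estimates)
```

Because the simulated data satisfy the assumptions of the specification, the estimates should land close to the values used to generate them.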

The statistician Sir David Cox has said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".^[2]

Specification error and bias

Specification error occurs when the functional form or the choice of independent variables poorly represents relevant aspects of the true data-generating process. In particular, bias (the expected value of the difference between an estimated parameter and the true underlying value) occurs if an independent variable is correlated with the errors inherent in the underlying process. There are several possible causes of specification error; some are listed below.

An inappropriate functional form could be employed.
A variable omitted from the model may have a relationship with both the dependent variable and one or more of the independent variables (causing omitted-variable bias).^[3]
An irrelevant variable may be included in the model (although this does not create bias, it involves overfitting and so can lead to poor predictive performance).
The dependent variable may be part of a system of simultaneous equations (giving simultaneity bias).

Additionally, measurement errors may affect the independent variables: while this is not a specification error, it can create statistical bias.
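The omitted-variable case in the list above can be made concrete with a small simulation. The sketch below is illustrative only: the data-generating process, the coefficient values, and the helper `ols` are assumptions chosen for the example. A variable `z` affects `y` and is correlated with the regressor `x`; leaving `z` out shifts the estimated coefficient on `x` away from its true value.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for the design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 20000
z = rng.normal(size=n)                  # variable that will be omitted
x = 0.8 * z + rng.normal(size=n)        # regressor correlated with z
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)   # true slope on x is 2.0

ones = np.ones(n)
full = ols(np.column_stack([ones, x, z]), y)   # correctly specified model
short = ols(np.column_stack([ones, x]), y)     # z omitted

# In large samples the short-regression slope tends to
# 2.0 + 3.0 * cov(x, z) / var(x) = 2.0 + 3.0 * 0.8 / 1.64
expected_short_slope = 2.0 + 3.0 * 0.8 / (0.8**2 + 1.0)

print("slope with z included:", full[1])
print("slope with z omitted: ", short[1])
print("large-sample value:   ", expected_short_slope)
```

The gap between the two slopes does not shrink as the sample grows, which is what distinguishes bias from sampling noise.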

Note that all models will have some specification error. Indeed, in statistics there is a common aphorism that "all models are wrong". In the words of Burnham & Anderson,

"Modeling is an art as well as a science and is directed toward finding a good approximating model ... as the basis for statistical inference".^[4]

Detection of misspecification

The Ramsey RESET test can help test for specification error in regression analysis.
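One common form of the RESET test refits the regression with powers of the fitted values added as extra regressors and uses an F statistic to test whether those additions are jointly significant. The following is a minimal sketch of that idea using NumPy; the function `reset_f_statistic`, the choice of powers 2 and 3, and the simulated data are assumptions for illustration, not a reference implementation.

```python
import numpy as np

def reset_f_statistic(X, y, max_power=3):
    """F statistic for adding fitted^2 .. fitted^max_power to an OLS fit.

    Returns (F, numerator df, denominator df). X must already contain
    the intercept column.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    rss_restricted = np.sum((y - fitted) ** 2)

    # Augment the design matrix with powers of the fitted values.
    powers = np.column_stack([fitted ** p for p in range(2, max_power + 1)])
    X_aug = np.column_stack([X, powers])
    beta_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    rss_unrestricted = np.sum((y - X_aug @ beta_aug) ** 2)

    q = powers.shape[1]
    df_resid = n - k - q
    f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)
    return f_stat, q, df_resid

# Hypothetical data: the same linear model is fit to a linear and a quadratic truth.
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0.0, 4.0, size=n)
X = np.column_stack([np.ones(n), x])
noise = rng.normal(size=n)
y_linear = 1.0 + 2.0 * x + noise
y_quadratic = 1.0 + 2.0 * x + 1.5 * x**2 + noise

f_lin, q, df_resid = reset_f_statistic(X, y_linear)
f_quad, _, _ = reset_f_statistic(X, y_quadratic)
print("F when the linear form is correct:", f_lin)
print("F when a quadratic term is missing:", f_quad)
```

Under the null hypothesis of correct specification, the statistic would be compared with an F distribution with `q` and `df_resid` degrees of freedom; a large value, as in the second case, points to a misspecified functional form.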

In the example given above relating personal income to schooling and job experience, if the assumptions of the model are correct, then the least squares estimates of the parameters $\rho$, $\beta_{1}$ and $\beta_{2}$ will be efficient and unbiased. Hence specification diagnostics usually involve testing the first to fourth moments of the residuals.^[5]
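As a rough illustration of what examining the first four moments can look like, the sketch below computes the mean, variance, skewness and kurtosis of OLS residuals. The helper names `ols_residuals` and `residual_moments` and the simulated error distributions are assumptions for this example; formal tests built on these quantities are not shown.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def residual_moments(residuals):
    """Mean, variance, skewness and (non-excess) kurtosis of the residuals."""
    r = np.asarray(residuals, dtype=float)
    mean = r.mean()
    centered = r - mean
    variance = np.mean(centered ** 2)
    std = np.sqrt(variance)
    return {
        "mean": mean,
        "variance": variance,
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,
    }

rng = np.random.default_rng(3)
n = 100000
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])

# Gaussian errors: skewness near 0 and kurtosis near 3 are expected.
gauss = residual_moments(ols_residuals(X, 1.0 + 2.0 * x + rng.normal(size=n)))

# Skewed errors (centered exponential): the third and fourth moments depart
# from the Gaussian benchmarks.
skewed_noise = rng.exponential(1.0, size=n) - 1.0
skewed = residual_moments(ols_residuals(X, 1.0 + 2.0 * x + skewed_noise))

print(gauss)
print(skewed)
```

With an intercept in the model, the residual mean is zero by construction, so in practice the higher moments carry most of the diagnostic information.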

Model building

Building a model involves finding a set of relationships to represent the process that is generating the data. This requires avoiding all the sources of misspecification mentioned above.

One approach is to start with a model in general form that relies on a theoretical understanding of the data-generating process. Then the model can be fit to the data and checked for the various sources of misspecification, in a task called statistical model validation. Theoretical understanding can then guide the modification of the model in such a way as to retain theoretical validity while removing the sources of misspecification. But if it proves impossible to find a theoretically acceptable specification that fits the data, the theoretical model may have to be rejected and replaced with another one.

A quotation from Karl Popper is apposite here: "Whenever a theory appears to you as the only possible one, take this as a sign that you have neither understood the theory nor the problem which it was intended to solve".^[6]

Another approach to model building is to specify several different models as candidates, and then compare those candidate models to each other. The purpose of the comparison is to determine which candidate model is most appropriate for statistical inference. Common criteria for comparing models include the following: R², Bayes factor, and the likelihood-ratio test together with its generalization, relative likelihood. For more on this topic, see statistical model selection.
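Some of the criteria named above can be computed directly for Gaussian linear models. The sketch below compares two nested candidates by R², by the likelihood-ratio statistic, and by relative likelihood computed from AIC values as $\exp((\mathrm{AIC}_{\min}-\mathrm{AIC}_{i})/2)$. The helper `gaussian_ols_summary`, the candidate models, and the simulated data are assumptions for illustration; the Bayes factor is not covered.

```python
import numpy as np

def gaussian_ols_summary(X, y):
    """R^2, maximized Gaussian log-likelihood and AIC for an OLS fit."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    # Log-likelihood evaluated at the ML variance estimate rss / n.
    loglik = -0.5 * n * (np.log(2.0 * np.pi) + np.log(rss / n) + 1.0)
    k = p + 1  # regression coefficients plus the error variance
    return {"r2": 1.0 - rss / tss, "loglik": loglik, "aic": 2.0 * k - 2.0 * loglik}

# Hypothetical data with a genuine quadratic effect.
rng = np.random.default_rng(4)
n = 400
x = rng.uniform(0.0, 40.0, size=n)
y = 1.0 + 0.04 * x - 0.0007 * x**2 + rng.normal(0.0, 0.05, size=n)

ones = np.ones(n)
candidates = {
    "linear": gaussian_ols_summary(np.column_stack([ones, x]), y),
    "quadratic": gaussian_ols_summary(np.column_stack([ones, x, x**2]), y),
}

# Likelihood-ratio statistic for the nested pair (one extra parameter).
lr_stat = 2.0 * (candidates["quadratic"]["loglik"] - candidates["linear"]["loglik"])

# Relative likelihood of each candidate with respect to the lowest-AIC model.
aic_min = min(m["aic"] for m in candidates.values())
rel_lik = {name: float(np.exp((aic_min - m["aic"]) / 2.0))
           for name, m in candidates.items()}

for name, m in candidates.items():
    print(name, "R2 =", m["r2"], "AIC =", m["aic"], "rel. lik. =", rel_lik[name])
print("LR statistic:", lr_stat)
```

Note that R² never decreases when a regressor is added to a nested model with an intercept, so on its own it always favors the larger candidate; the likelihood-ratio statistic and the AIC-based relative likelihood account for the extra parameter.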

Notes

^[1] This particular example is known as the Mincer earnings function.
^[2] Cox, D. R. (2006), Principles of Statistical Inference, Cambridge University Press, p. 197.
^[3] "Quantitative Methods II: Econometrics", College of William & Mary.
^[4] Burnham, K. P.; Anderson, D. R. (2002), Model Selection and Multimodel Inference: A practical information-theoretic approach (2nd ed.), Springer-Verlag, §1.1.
^[5] Long, J. Scott; Trivedi, Pravin K. (1993). "Some specification tests for the linear regression model". In Bollen, Kenneth A.; Long, J. Scott (eds.). Testing Structural Equation Models. SAGE Publishing. pp. 66–110.
^[6] Popper, Karl (1972), Objective Knowledge: An evolutionary approach, Oxford University Press.
