Goodness of fit

From Wikipedia, the free encyclopedia
Metric for fit of statistical models

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or to test whether outcome frequencies follow a specified distribution (see Pearson's chi-square test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.

Fit of distributions

In assessing whether a given distribution is suited to a data set, the following tests and their underlying measures of fit can be used:

Regression analysis

In regression analysis, more specifically regression validation, the following topics relate to goodness of fit:

Categorical data

The following are examples that arise in the context of categorical data.

Pearson's chi-square test

Pearson's chi-square test uses a measure of goodness of fit that is the sum, over all bins, of the squared differences between observed and expected outcome frequencies (that is, counts of observations), each divided by the expectation:

$$\chi^{2} = \sum_{i=1}^{n} \frac{(O_{i} - E_{i})^{2}}{E_{i}}$$

where:

  • $O_i$ = an observed count for bin i
  • $E_i$ = an expected count for bin i, asserted by the null hypothesis.
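The statistic defined above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `pearson_chi_square` and the die-roll counts are hypothetical and chosen only to show the arithmetic.

```python
def pearson_chi_square(observed, expected):
    """Return the sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, tested against a fair-die null
# hypothesis that expects 10 rolls per face.
observed = [5, 8, 9, 8, 10, 20]
expected = [10.0] * 6
statistic = pearson_chi_square(observed, expected)
print(f"chi-square statistic: {statistic:.1f}")
```

Each bin contributes its own squared, scaled deviation, so a single badly fitting bin (here the last face) can dominate the total.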

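The expected frequency is calculated by:

$$E_{i} = \bigl(F(Y_{u}) - F(Y_{l})\bigr)\,N$$

where:

  • $F$ = the cumulative distribution function of the probability distribution being tested
  • $Y_u$ = the upper limit for bin i
  • $Y_l$ = the lower limit for bin i
  • $N$ = the sample size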

The resulting value can be compared with a chi-square distribution to determine the goodness of fit. The chi-square distribution has (k − c) degrees of freedom, where k is the number of non-empty bins and c is the number of estimated parameters (including location, scale, and shape parameters) for the distribution plus one. For example, for a 3-parameter Weibull distribution, c = 4.
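As a rough sketch of the two calculations just described, the snippet below derives expected counts from a cumulative distribution function and computes the degrees of freedom. The helper names `expected_counts` and `degrees_of_freedom`, the bin edges, and the observed counts are all assumptions made for illustration. The null distribution is a standard normal with fixed parameters, so no parameters are estimated.

```python
from statistics import NormalDist


def expected_counts(cdf, bin_edges, n):
    """E_i = (F(Y_u) - F(Y_l)) * N for each pair of consecutive bin edges."""
    return [(cdf(upper) - cdf(lower)) * n
            for lower, upper in zip(bin_edges, bin_edges[1:])]


def degrees_of_freedom(observed, n_estimated_params):
    """k - c, where k counts non-empty bins and c = estimated parameters + 1."""
    k = sum(1 for o in observed if o > 0)
    c = n_estimated_params + 1
    return k - c


# Hypothetical data: 100 observations sorted into four bins.
edges = [float("-inf"), -1.0, 0.0, 1.0, float("inf")]
observed = [18, 30, 36, 16]

# Null hypothesis: standard normal with fixed mean 0 and standard deviation 1.
expected = expected_counts(NormalDist(0.0, 1.0).cdf, edges, n=sum(observed))
dof = degrees_of_freedom(observed, n_estimated_params=0)
print([round(e, 2) for e in expected], dof)
```

Letting the outer bins extend to infinity makes the expected counts sum to the sample size, which keeps the observed and expected totals comparable.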

Binomial case

Further information: Binomial test

A binomial experiment is a sequence of independent trials, each of which results in one of two outcomes, success or failure. There are n trials, each with probability of success denoted by p. Provided that $np_i \gg 1$ for every i (where i = 1, 2, ..., k indexes the bins, $p_i$ is the probability of bin i, and $N_i$ is the observed count in bin i), then

$$\chi^{2} = \sum_{i=1}^{k} \frac{(N_{i} - np_{i})^{2}}{np_{i}} = \sum_{\mathrm{all\ bins}} \frac{(O - E)^{2}}{E}.$$

This has approximately a chi-square distribution with k − 1 degrees of freedom. The k − 1 degrees of freedom are a consequence of the restriction $\sum N_{i} = n$: there are k observed bin counts, but once any k − 1 of them are known, the remaining one is uniquely determined. Only k − 1 bin counts are therefore free to vary, which gives k − 1 degrees of freedom.
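A small worked case may make the degrees-of-freedom argument concrete. The sketch below assumes a hypothetical experiment of 100 coin flips with 60 heads, tested against p = 0.5. The function name `binomial_chi_square` is illustrative, not taken from any library.

```python
def binomial_chi_square(counts, probs):
    """Sum over bins of (N_i - n * p_i)^2 / (n * p_i), with n = sum of counts."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    return sum((n_i - n * p_i) ** 2 / (n * p_i)
               for n_i, p_i in zip(counts, probs))


# Hypothetical data: 60 successes and 40 failures in n = 100 trials.
counts = [60, 40]
probs = [0.5, 0.5]  # null hypothesis: success probability p = 0.5

statistic = binomial_chi_square(counts, probs)

# The counts must sum to n, so knowing one bin fixes the other:
# k = 2 bins leave k - 1 = 1 degree of freedom.
dof = len(counts) - 1
print(statistic, dof)
```

Here both bins contribute (10)² / 50 = 2, giving a statistic of 4 on one degree of freedom, which exceeds the conventional 5% critical value of about 3.84 for that distribution.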

G-test

G-tests are likelihood-ratio tests of statistical significance that are increasingly being used in situations where Pearson's chi-square tests were previously recommended.[7]

The general formula for G is

$$G = 2\sum_{i} O_{i} \cdot \ln\left(\frac{O_{i}}{E_{i}}\right),$$

where $O_i$ and $E_i$ are the same as for the chi-square test, $\ln$ denotes the natural logarithm, and the sum is taken over all non-empty bins. Furthermore, the total observed count should be equal to the total expected count:

$$\sum_{i} O_{i} = \sum_{i} E_{i} = N$$

where $N$ is the total number of observations.
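The G statistic can be sketched in the same style as the chi-square statistic. The function name `g_statistic` and the die-roll counts below are hypothetical. The sketch skips empty bins and checks that the observed and expected totals agree, as the text requires.

```python
import math


def g_statistic(observed, expected):
    """Return G = 2 * sum(O_i * ln(O_i / E_i)) over the non-empty bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if not math.isclose(sum(observed), sum(expected)):
        raise ValueError("observed and expected totals must be equal")
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected)
                     if o > 0)  # empty bins contribute nothing to the sum


# Hypothetical data: 60 rolls of a die against a fair-die null hypothesis.
observed = [5, 8, 9, 8, 10, 20]
expected = [10.0] * 6
g = g_statistic(observed, expected)
print(f"G statistic: {g:.3f}")
```

When every observed count equals its expected count, each logarithm is zero and G vanishes, just as the chi-square statistic does.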

G-tests have been recommended at least since the 1981 edition of the popular statistics textbook by Robert R. Sokal and F. James Rohlf.[8]

References

  1. Berk, Robert H.; Jones, Douglas H. (1979). "Goodness-of-fit test statistics that dominate the Kolmogorov statistics". Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. 47 (1): 47–59. doi:10.1007/BF00533250.
  2. Moscovich, Amit; Nadler, Boaz; Spiegelman, Clifford (2016). "On the exact Berk-Jones statistics and their p-value calculation". Electronic Journal of Statistics. 10 (2). arXiv:1311.3190. doi:10.1214/16-EJS1172.
  3. Liu, Qiang; Lee, Jason; Jordan, Michael (20 June 2016). "A Kernelized Stein Discrepancy for Goodness-of-fit Tests". Proceedings of the 33rd International Conference on Machine Learning. New York, New York, USA: Proceedings of Machine Learning Research. pp. 276–284.
  4. Chwialkowski, Kacper; Strathmann, Heiko; Gretton, Arthur (20 June 2016). "A Kernel Test of Goodness of Fit". Proceedings of the 33rd International Conference on Machine Learning. New York, New York, USA: Proceedings of Machine Learning Research. pp. 2606–2615.
  5. Zhang, Jin (2002). "Powerful goodness-of-fit tests based on the likelihood ratio" (PDF). J. R. Stat. Soc. B. 64 (2): 281–294. doi:10.1111/1467-9868.00337. Retrieved 5 November 2018.
  6. Vexler, Albert; Gurevich, Gregory (2010). "Empirical Likelihood Ratios Applied to Goodness-of-Fit Tests Based on Sample Entropy". Computational Statistics and Data Analysis. 54 (2): 531–545. doi:10.1016/j.csda.2009.09.025.
  7. McDonald, J.H. (2014). "G–test of goodness-of-fit". Handbook of Biological Statistics (Third ed.). Baltimore, Maryland: Sparky House Publishing. pp. 53–58.
  8. Sokal, R. R.; Rohlf, F. J. (1981). Biometry: The Principles and Practice of Statistics in Biological Research (Second ed.). W. H. Freeman. ISBN 0-7167-2411-1.

Further reading

  • Huber-Carol, C.; Balakrishnan, N.; Nikulin, M. S.; Mesbah, M., eds. (2002), Goodness-of-Fit Tests and Model Validity, Springer
  • Ingster, Yu. I.; Suslina, I. A. (2003), Nonparametric Goodness-of-Fit Testing Under Gaussian Models, Springer
  • Rayner, J. C. W.; Thas, O.; Best, D. J. (2009), Smooth Tests of Goodness of Fit (2nd ed.), Wiley
  • Vexler, Albert; Gurevich, Gregory (2010), "Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy", Computational Statistics & Data Analysis, 54 (2): 531–545, doi:10.1016/j.csda.2009.09.025