
Computational statistics, or statistical computing, is the study at the intersection of statistics and computer science, and refers to the statistical methods that are enabled by using computational methods. It is the area of computational science (or scientific computing) specific to the mathematical science of statistics. This area is developing rapidly. The view that the broader concept of computing must be taught as part of general statistical education is gaining momentum.[1]
As in traditional statistics, the goal is to transform raw data into knowledge,[2] but the focus lies on computer-intensive statistical methods, such as cases with very large sample sizes and non-homogeneous data sets.[2]
The terms 'computational statistics' and 'statistical computing' are often used interchangeably, although Carlo Lauro (a former president of the International Association for Statistical Computing) proposed making a distinction, defining 'statistical computing' as "the application of computer science to statistics", and 'computational statistics' as "aiming at the design of algorithm for implementing statistical methods on computers, including the ones unthinkable before the computer age (e.g. bootstrap, simulation), as well as to cope with analytically intractable problems" [sic].[3]
The term 'computational statistics' may also be used to refer to computationally intensive statistical methods including resampling methods, Markov chain Monte Carlo methods, local regression, kernel density estimation, artificial neural networks and generalized additive models.
Though computational statistics is widely used today, it has a relatively short history of acceptance in the statistics community. For the most part, the founders of the field of statistics relied on mathematics and asymptotic approximations in the development of computational statistical methodology.[4]
In 1908, William Sealy Gosset performed his now well-known Monte Carlo simulation, which led to the discovery of Student's t-distribution.[5] With the help of computational methods, he also produced plots of the empirical distributions overlaid on the corresponding theoretical distributions. The computer has revolutionized simulation and has made the replication of Gosset's experiment little more than an exercise.[6][7]
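To illustrate, here is a minimal sketch of such a replication in Python using NumPy; the seed, sample size, and number of repetitions are illustrative choices, not values from Gosset's original experiment.

```python
import numpy as np

# Sketch: replicate a Gosset-style simulation of the t-statistic.
# Draw many small samples from a normal population and compute
# t = (xbar - mu) / (s / sqrt(n)); the empirical distribution of t
# should match Student's t with n - 1 degrees of freedom.
rng = np.random.default_rng(seed=0)
mu, sigma, n, reps = 0.0, 1.0, 4, 10_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)          # sample standard deviation
t_stats = (xbar - mu) / (s / np.sqrt(n))

# A few empirical quantiles, to compare against t with 3 degrees of freedom.
print(np.quantile(t_stats, [0.05, 0.5, 0.95]))
```

For such small samples the empirical quantiles track Student's t rather than the normal distribution, which is what Gosset's experiment revealed.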
Later, scientists put forward computational ways of generating pseudo-random deviates, developed methods to convert uniform deviates into other distributional forms using the inverse cumulative distribution function or acceptance-rejection methods, and developed state-space methodology for Markov chain Monte Carlo.[8] One of the first efforts to generate random digits in a fully automated way was undertaken by the RAND Corporation in 1947. The tables produced were published as a book in 1955, and also as a series of punch cards.
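Both transformation techniques are short enough to sketch in Python. In the fragment below, the target distributions, seed, and envelope constant are illustrative assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
u = rng.uniform(size=100_000)

# Inverse-CDF (inverse transform) method: for an Exponential(lam)
# distribution, the inverse CDF is F^{-1}(u) = -ln(1 - u) / lam, so
# applying it to uniform deviates yields exponential deviates.
lam = 2.0
exp_deviates = -np.log(1.0 - u) / lam
print(exp_deviates.mean())  # should be close to 1 / lam = 0.5

# Acceptance-rejection: sample from Beta(2, 2), density f(x) = 6x(1 - x),
# using a Uniform(0, 1) proposal g with envelope constant M = 1.5,
# accepting x when a second uniform draw is below f(x) / (M * g(x)).
x = rng.uniform(size=100_000)
accept = rng.uniform(size=100_000) <= 6 * x * (1 - x) / 1.5
beta_deviates = x[accept]
print(beta_deviates.mean())  # should be close to 0.5
```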
By the mid-1950s, several articles and patents for random number generating devices had been proposed.[9] The development of these devices was motivated by the need to use random digits to perform simulations and other fundamental components of statistical analysis. One of the most well known of such devices is ERNIE, which produces random numbers that determine the winners of the Premium Bonds, a lottery bond issued in the United Kingdom. In 1958, John Tukey's jackknife was developed as a method to reduce the bias of parameter estimates in samples under nonstandard conditions.[10] It requires computers for practical implementation. By this point, computers had made many tedious statistical studies feasible.[11]
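The jackknife itself is simple enough to sketch in Python. In the illustration below, `jackknife_bias_corrected` is a hypothetical helper written for this example, and the choice of the plug-in variance as the biased estimator is an illustrative assumption.

```python
import numpy as np

def jackknife_bias_corrected(data, estimator):
    """Jackknife bias correction: combine the full-sample estimate with
    the mean of the n leave-one-out estimates."""
    n = len(data)
    theta_full = estimator(data)
    loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    bias = (n - 1) * (loo.mean() - theta_full)
    return theta_full - bias

rng = np.random.default_rng(seed=2)
data = rng.normal(0.0, 1.0, size=30)

# The plug-in variance (ddof=0) is biased downward; the jackknife
# removes the leading O(1/n) term of that bias.
biased = np.var(data)                            # divides by n
corrected = jackknife_bias_corrected(data, np.var)
print(biased, corrected)                         # corrected ~ np.var(data, ddof=1)
```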
Maximum likelihood estimation is used to estimate the parameters of an assumed probability distribution, given some observed data. It is achieved by maximizing a likelihood function so that the observed data is most probable under the assumed statistical model.
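As a minimal sketch of the idea, the Python fragment below numerically maximizes the log-likelihood of an exponential model using SciPy and compares the result to the closed-form estimator; the simulated data, seed, and search bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(seed=3)
data = rng.exponential(scale=2.0, size=1_000)   # true rate = 0.5

def neg_log_likelihood(rate):
    # Exponential log-likelihood: n * log(rate) - rate * sum(x);
    # negated because the optimizer minimizes.
    return -(len(data) * np.log(rate) - rate * data.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(result.x)              # numerical MLE
print(1.0 / data.mean())     # closed-form MLE, for comparison
```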
Monte Carlo is a statistical method that relies on repeated random sampling to obtain numerical results. The concept is to use randomness to solve problems that might be deterministic in principle. Monte Carlo methods are often used in physical and mathematical problems and are most useful when other approaches are difficult to apply. They are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.
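A classic minimal example of the numerical-integration use is estimating π by random sampling; the sketch below assumes NumPy and an arbitrary seed and sample count.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 1_000_000

# The fraction of uniform points in the unit square that fall inside
# the quarter circle approximates pi / 4.
x = rng.uniform(size=n)
y = rng.uniform(size=n)
inside = (x**2 + y**2) <= 1.0
print(4.0 * inside.mean())   # close to pi for large n
```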
The Markov chain Monte Carlo method creates samples from a continuous random variable, with probability density proportional to a known function. These samples can be used to evaluate an integral over that variable, such as its expected value or variance. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.
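One common way to realize this is the random-walk Metropolis algorithm, sketched below under illustrative assumptions: a target proportional to the standard normal density, a unit-scale proposal, and an arbitrary burn-in length.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

def unnormalized_density(x):
    # Target known only up to a constant: here proportional to N(0, 1).
    return np.exp(-0.5 * x**2)

# Random-walk Metropolis: propose x' = x + eps with a symmetric step,
# accept with probability min(1, f(x') / f(x)).
n_steps, x, samples = 50_000, 0.0, []
for _ in range(n_steps):
    proposal = x + rng.normal(scale=1.0)
    if rng.uniform() < unnormalized_density(proposal) / unnormalized_density(x):
        x = proposal
    samples.append(x)

samples = np.array(samples[5_000:])   # drop burn-in
print(samples.mean(), samples.var())  # expected value ~0, variance ~1
```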
The bootstrap is a resampling technique used to generate samples from an empirical probability distribution defined by an original sample of the population. It can be used to find a bootstrapped estimator of a population parameter. It can also be used to estimate the standard error of an estimator, as well as to generate bootstrapped confidence intervals. The jackknife is a related technique.[12]
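A minimal Python sketch of both uses might look like the following; the exponential sample, seed, and resample count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
data = rng.exponential(scale=1.0, size=200)   # the original sample

# Bootstrap: resample the data with replacement many times and
# recompute the statistic to approximate its sampling distribution.
n_boot = 10_000
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(n_boot)
])

print(boot_means.std(ddof=1))                  # bootstrap standard error
print(np.percentile(boot_means, [2.5, 97.5]))  # 95% percentile interval
```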