Bayesian probability

From Wikipedia, the free encyclopedia

For broader coverage of this topic, see Bayesian statistics.

Interpretation of probability

Bayesian probability (/ˈbeɪziən/ BAY-zee-ən or /ˈbeɪʒən/ BAY-zhən)^[1] is an interpretation of the concept of probability in which, instead of the frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation^[2] representing a state of knowledge^[3] or as the quantification of a personal belief.^[4]

The Bayesian interpretation of probability can be seen as an extension of propositional logic that enables reasoning with hypotheses;^[5]^[6] that is, with propositions whose truth or falsity is unknown. In the Bayesian view, a probability is assigned to a hypothesis, whereas under frequentist inference, a hypothesis is typically tested without being assigned a probability.

Bayesian probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the Bayesian probabilist specifies a prior probability. This prior is then updated to a posterior probability in the light of new, relevant data (evidence).^[7] The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.
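The prior-to-posterior calculation can be illustrated with a minimal Python sketch. It assumes a finite set of mutually exclusive, exhaustive hypotheses with a known likelihood P(E | H) for the observed evidence; the function name `update` and the coin example are hypothetical illustrations, not a standard API. Each posterior is the prior times the likelihood, divided by the evidence P(E), which is the sum of those products over all hypotheses.

```python
from fractions import Fraction


def update(prior, likelihood):
    """Return the posterior P(H | E) for each hypothesis H (hypothetical helper).

    prior:      dict mapping hypothesis -> P(H); values should sum to 1
    likelihood: dict mapping hypothesis -> P(E | H) for the observed evidence E
    """
    # Unnormalized posterior: P(E | H) * P(H)
    joint = {h: likelihood[h] * p for h, p in prior.items()}
    # Evidence P(E): sum over hypotheses of P(E | H) * P(H)
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the evidence has zero probability under the prior")
    return {h: j / evidence for h, j in joint.items()}


# Hypothetical example: is a coin fair, or biased towards heads (P(heads) = 3/4)?
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}
posterior = update(prior, likelihood_heads)
print(posterior)  # {'fair': Fraction(2, 5), 'biased': Fraction(3, 5)}
```

Exact fractions are used so that the arithmetic can be checked by hand: one observed head moves the probability of the "biased" hypothesis from 1/2 to 3/5.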

The term Bayesian derives from the 18th-century mathematician and theologian Thomas Bayes, who provided the first mathematical treatment of a non-trivial problem of statistical data analysis using what is now known as Bayesian inference.^[8]: 131 Mathematician Pierre-Simon Laplace pioneered and popularized what is now called Bayesian probability.^[8]: 97–98

Bayesian methodology

Bayesian methods are characterized by concepts and procedures as follows:

The use of random variables, or more generally unknown quantities,^[9] to model all sources of uncertainty in statistical models, including uncertainty resulting from lack of information (see also aleatoric and epistemic uncertainty).
The need to determine the prior probability distribution, taking into account the available (prior) information.
The sequential use of Bayes' theorem: as more data become available, the posterior distribution is calculated using Bayes' theorem; subsequently, this posterior becomes the next prior.
For the frequentist, a hypothesis is a proposition (which must be either true or false), so the frequentist probability of a hypothesis is either 0 or 1; in Bayesian statistics, a hypothesis whose truth value is uncertain can be assigned any probability between 0 and 1.
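The sequential use of Bayes' theorem can be illustrated with a minimal Python sketch. It assumes a hypothetical coin whose probability of heads θ is one of three candidate values, and tosses that are independent given θ; all names are illustrative. Updating one toss at a time, with each posterior serving as the next prior, yields exactly the same posterior as a single update on all the data at once.

```python
from fractions import Fraction

# Hypothetical hypotheses: candidate values of a coin's probability of heads.
THETAS = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]


def likelihood(theta, outcome):
    """P(outcome | theta) for a single toss; outcome is 1 (heads) or 0 (tails)."""
    return theta if outcome == 1 else 1 - theta


def normalize(weights):
    """Rescale nonnegative weights so that they sum to 1."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def sequential_posterior(prior, data):
    """Update one observation at a time; each posterior becomes the next prior."""
    current = dict(prior)
    for outcome in data:
        current = normalize({t: likelihood(t, outcome) * p for t, p in current.items()})
    return current


def batch_posterior(prior, data):
    """Update once, using the likelihood of the whole data set (independent tosses)."""
    weights = {}
    for t, p in prior.items():
        joint_likelihood = Fraction(1)
        for outcome in data:
            joint_likelihood *= likelihood(t, outcome)
        weights[t] = joint_likelihood * p
    return normalize(weights)


uniform_prior = {t: Fraction(1, len(THETAS)) for t in THETAS}
data = [1, 1, 0, 1]  # heads, heads, tails, heads
seq = sequential_posterior(uniform_prior, data)
bat = batch_posterior(uniform_prior, data)
print(seq == bat)  # True
print(seq[Fraction(3, 4)])  # 27/46
```

The equivalence relies on the observations being conditionally independent given θ; under that assumption the order and grouping of the data do not affect the final posterior.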

Objective and subjective Bayesian probabilities

Broadly speaking, there are two interpretations of Bayesian probability. For objectivists, who interpret probability as an extension of logic, probability quantifies the reasonable expectation that everyone (even a "robot") who shares the same knowledge should share in accordance with the rules of Bayesian statistics, which can be justified by Cox's theorem.^[3]^[10] For subjectivists, probability corresponds to a personal belief.^[4] Rationality and coherence allow for substantial variation within the constraints they pose; the constraints are justified by the Dutch book argument or by decision theory and de Finetti's theorem.^[4] The objective and subjective variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability.

History

Main article: History of statistics § Bayesian statistics

The term Bayesian derives from Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem in a paper titled "An Essay Towards Solving a Problem in the Doctrine of Chances".^[11] In that special case, the prior and posterior distributions were beta distributions and the data came from Bernoulli trials. It was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and used it to approach problems in celestial mechanics, medical statistics, reliability, and jurisprudence.^[12] Early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes).^[13] After the 1920s, "inverse probability" was largely supplanted by a collection of methods that came to be called frequentist statistics.^[13]
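In modern notation, Bayes' special case can be sketched as follows; the symbols are introduced here for illustration. Let θ be the success probability of the Bernoulli trials, give θ a Beta(a, b) prior, and observe s successes in n trials. The likelihood has the same functional form as the beta density, so the posterior is again a beta distribution. Laplace's uniform prior is the case a = b = 1, for which the posterior mean is (s + 1)/(n + 2), known as Laplace's rule of succession.

```latex
\begin{aligned}
p(\theta \mid s, n)
  &\propto \underbrace{\theta^{s}(1-\theta)^{n-s}}_{\text{likelihood}}
     \;\underbrace{\theta^{a-1}(1-\theta)^{b-1}}_{\text{prior}}
   = \theta^{a+s-1}(1-\theta)^{b+n-s-1}, \\
\theta \mid s, n &\sim \operatorname{Beta}(a+s,\; b+n-s), \\
\mathbb{E}[\theta \mid s, n] &= \frac{a+s}{a+b+n}
  \;\xrightarrow{\;a=b=1\;}\; \frac{s+1}{n+2}.
\end{aligned}
```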

In the 20th century, the ideas of Laplace developed in two directions, giving rise to objective and subjective currents in Bayesian practice. Harold Jeffreys' Theory of Probability (first published in 1939) played an important role in the revival of the Bayesian view of probability, followed by works by Abraham Wald (1950) and Leonard J. Savage (1954). The adjective Bayesian itself dates to the 1950s; the derived terms Bayesianism and neo-Bayesianism are 1960s coinages.^[14]^[15]^[16] In the objectivist stream, the statistical analysis depends only on the assumed model and the analysed data,^[17] and no subjective decisions need to be involved. In contrast, "subjectivist" statisticians deny the possibility of fully objective analysis for the general case.

In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and to an increasing interest in nonstandard, complex applications.^[18] While frequentist statistics remains strong (as demonstrated by the fact that much of undergraduate teaching is based on it^[19]), Bayesian methods are widely accepted and used, for example in the field of machine learning.^[20]
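To illustrate how Markov chain Monte Carlo sidesteps a central computational obstacle, the normalizing constant P(E), here is a minimal random-walk Metropolis sketch in Python. It assumes a hypothetical model chosen only because its answer is known: a uniform prior on a coin's probability of heads θ and 7 heads in 10 tosses, so the exact posterior is Beta(8, 4) with mean 8/12. The sampler evaluates only prior × likelihood; the step size, chain length, and burn-in are illustrative choices, not tuned recommendations.

```python
import math
import random


def log_unnormalized_posterior(theta, heads=7, tosses=10):
    """log[P(data | theta) * P(theta)] up to a constant, with a uniform prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return heads * math.log(theta) + (tosses - heads) * math.log(1.0 - theta)


def metropolis(log_target, start, n_samples, step=0.1, seed=0):
    """Random-walk Metropolis with Gaussian proposals theta' = theta + noise."""
    rng = random.Random(seed)
    theta = start
    current = log_target(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(theta));
        # the unknown normalizing constant cancels in this ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


draws = metropolis(log_unnormalized_posterior, start=0.5, n_samples=50_000)
kept = draws[5_000:]  # discard an initial burn-in period
estimate = sum(kept) / len(kept)
print(f"MCMC estimate: {estimate:.3f}; exact posterior mean: {8 / 12:.3f}")
```

Because only ratios of the target density are needed, the same loop works for models whose evidence integral has no closed form, which is the situation where MCMC made Bayesian computation practical.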

Justification

The use of Bayesian probabilities as the basis of Bayesian inference has been supported by several arguments, such as Cox axioms, the Dutch book argument, arguments based on decision theory, and de Finetti's theorem.

Axiomatic approach

Richard T. Cox showed that Bayesian updating follows from several axioms, including two functional equations and a hypothesis of differentiability.^[10]^[21] The assumption of differentiability or even continuity is controversial; Halpern found a counterexample based on his observation that the Boolean algebra of statements may be finite.^[22] Other axiomatizations have been suggested by various authors with the purpose of making the theory more rigorous.^[9]

Dutch book approach

Main article: Dutch book

Bruno de Finetti proposed the Dutch book argument based on betting. A clever bookmaker makes a Dutch book by setting the odds and bets so that the bookmaker profits, at the expense of the gamblers, regardless of the outcome of the event (a horse race, for example) on which the gamblers bet. A Dutch book is possible when the probabilities implied by the odds are not coherent.
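The arithmetic of a Dutch book can be sketched in Python, assuming a hypothetical three-horse race in which each bet costs its price per unit stake and pays 1 unit if that horse wins, and assuming the bookmaker's book ends up balanced, with one unit staked on each horse (possibly by different gamblers). The prices are the implied probabilities; because they sum to 11/10 rather than 1, the bookmaker gains 1/10 whichever horse wins.

```python
from fractions import Fraction


def bookmaker_profit(prices, stakes, winner):
    """Bookmaker's net gain once the race is decided (hypothetical helper).

    prices: outcome -> price per unit stake of a bet paying 1 unit if that outcome occurs
            (the implied probability of the outcome)
    stakes: outcome -> number of units staked on that outcome
    winner: the outcome that actually occurs
    """
    # Gamblers pay price * stake up front for every bet placed.
    income = sum(prices[o] * stakes[o] for o in prices)
    # Only bets on the winning outcome pay out, 1 unit per unit staked.
    payout = stakes[winner]
    return income - payout


# Hypothetical race; implied probabilities sum to 11/10, so they are not coherent.
prices = {"A": Fraction(1, 2), "B": Fraction(3, 10), "C": Fraction(3, 10)}
stakes = {"A": 1, "B": 1, "C": 1}  # one unit staked on each horse

profits = {winner: bookmaker_profit(prices, stakes, winner) for winner in prices}
print(profits)  # {'A': Fraction(1, 10), 'B': Fraction(1, 10), 'C': Fraction(1, 10)}
```

With coherent prices that sum to 1, the same balanced book yields a profit of exactly 0 for every outcome, so no sure gain is available.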

However, Ian Hacking noted that traditional Dutch book arguments did not specify Bayesian updating: they left open the possibility that non-Bayesian updating rules could avoid Dutch books. For example, Hacking writes:^[23]^[24] "And neither the Dutch book argument, nor any other in the personalist arsenal of proofs of the probability axioms, entails the dynamic assumption. Not one entails Bayesianism. So the personalist requires the dynamic assumption to be Bayesian. It is true that in consistency a personalist could abandon the Bayesian model of learning from experience. Salt could lose its savour."

In fact, there are non-Bayesian updating rules that also avoid Dutch books (as discussed in the literature on "probability kinematics"^[25] following the publication of Richard C. Jeffrey's rule, which is itself regarded as Bayesian^[26]). The additional hypotheses sufficient to (uniquely) specify Bayesian updating are substantial^[27] and not universally seen as satisfactory.^[28]
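Jeffrey's rule can be sketched in Python on a hypothetical finite joint distribution. When experience shifts the probabilities of a partition {E_i} to new values Q(E_i) without making any single cell certain, the rule keeps each conditional probability P(A | E_i) fixed and sets P_new(A) = Σ_i P(A | E_i) Q(E_i). When Q puts probability 1 on one cell E, this reduces to ordinary Bayesian conditioning on E. The weather example and function names below are illustrative assumptions.

```python
from fractions import Fraction


def jeffrey_update(joint, new_cell_probs):
    """Jeffrey conditioning (probability kinematics) on a finite joint distribution.

    joint: dict mapping (a, e) -> P(A = a, E = e), where E ranges over a partition
    new_cell_probs: dict mapping e -> Q(E = e), the revised probabilities of the cells
    Returns the revised joint P_new(a, e) = P(a | e) * Q(e).
    Assumes every cell e has P(E = e) > 0.
    """
    # Old marginal probabilities P(E = e) of the partition cells
    old_cell_probs = {}
    for (_, e), p in joint.items():
        old_cell_probs[e] = old_cell_probs.get(e, 0) + p
    # Keep each conditional P(a | e) fixed and reweight cell e by Q(e)
    return {(a, e): p / old_cell_probs[e] * new_cell_probs[e] for (a, e), p in joint.items()}


def marginal(joint, a_value):
    """P(A = a_value) under the given joint distribution."""
    return sum(p for (a, _), p in joint.items() if a == a_value)


# Hypothetical joint distribution of weather (A) and the look of the sky (E).
joint = {
    ("rain", "cloudy"): Fraction(3, 10), ("dry", "cloudy"): Fraction(2, 10),
    ("rain", "clear"): Fraction(1, 10), ("dry", "clear"): Fraction(4, 10),
}

# An uncertain glimpse of the sky shifts P(cloudy) from 1/2 to 4/5 without making it certain.
shifted = jeffrey_update(joint, {"cloudy": Fraction(4, 5), "clear": Fraction(1, 5)})
print(marginal(joint, "rain"), marginal(shifted, "rain"))  # 2/5 13/25

# With Q(cloudy) = 1 the rule reduces to ordinary conditioning: P(rain | cloudy) = 3/5.
certain = jeffrey_update(joint, {"cloudy": Fraction(1), "clear": Fraction(0)})
print(marginal(certain, "rain"))  # 3/5
```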

Decision theory approach

A decision-theoretic justification of the use of Bayesian inference (and hence of Bayesian probabilities) was given by Abraham Wald, who proved that every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.^[29] Conversely, under mild conditions (for example, when the Bayesian procedure for a proper prior is unique), every Bayesian procedure is admissible.^[30]
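What makes a procedure "Bayesian" in this decision-theoretic sense can be sketched in Python with hypothetical numbers: given a posterior distribution over states and a loss for each action in each state, the Bayes action minimizes posterior expected loss, and a Bayesian procedure makes this choice for every possible observation. The sketch illustrates only the definition, not Wald's theorem; the quality-control setting and the function name are assumptions.

```python
from fractions import Fraction


def bayes_action(posterior, loss):
    """Choose the action with the smallest posterior expected loss (hypothetical helper).

    posterior: dict mapping state -> posterior probability of that state
    loss: dict mapping action -> (dict mapping state -> loss incurred)
    Returns (best_action, its posterior expected loss).
    """
    expected_loss = {
        action: sum(posterior[s] * row[s] for s in posterior)
        for action, row in loss.items()
    }
    best = min(expected_loss, key=expected_loss.get)
    return best, expected_loss[best]


# Hypothetical quality-control problem: after inspection, an item is defective
# with posterior probability 1/5.
posterior = {"defective": Fraction(1, 5), "ok": Fraction(4, 5)}
loss = {
    "ship": {"defective": 10, "ok": 0},  # shipping a defective item is costly
    "scrap": {"defective": 1, "ok": 3},  # scrapping a good item wastes it
}
action, expected = bayes_action(posterior, loss)
print(action, expected)  # ship 2
```

Here shipping has posterior expected loss 2 and scrapping 13/5, so the Bayes action is to ship; a different posterior or loss table can reverse the decision.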

Personal probabilities and objective methods for constructing priors

Following the work on expected utility theory of Ramsey and von Neumann, decision theorists have accounted for rational behavior using a probability distribution for the agent. Johann Pfanzagl completed the Theory of Games and Economic Behavior by providing an axiomatization of subjective probability and utility, a task left uncompleted by von Neumann and Oskar Morgenstern: their original theory supposed, as a convenience, that all the agents had the same probability distribution.^[31] Pfanzagl's axiomatization was endorsed by Oskar Morgenstern: "Von Neumann and I have anticipated ... [the question whether probabilities] might, perhaps more typically, be subjective and have stated specifically that in the latter case axioms could be found from which could derive the desired numerical utility together with a number for the probabilities (cf. p. 19 of The Theory of Games and Economic Behavior). We did not carry this out; it was demonstrated by Pfanzagl ... with all the necessary rigor".^[32]

Ramsey and Savage noted that the individual agent's probability distribution could be objectively studied in experiments. Procedures for testing hypotheses about probabilities (using finite samples) are due to Ramsey (1931) and de Finetti (1931, 1937, 1964, 1970). Both Bruno de Finetti^[33]^[34] and Frank P. Ramsey^[34]^[35] acknowledge their debts to pragmatic philosophy, particularly (for Ramsey) to Charles S. Peirce.^[34]^[35]

The "Ramsey test" for evaluating probability distributions is implementable in theory, and has kept experimental psychologists occupied for half a century.^[36] This work demonstrates that Bayesian-probability propositions can be falsified, and so meet an empirical criterion of Charles S. Peirce, whose work inspired Ramsey. (This falsifiability criterion was popularized by Karl Popper.^[37]^[38])

Modern work on the experimental evaluation of personal probabilities uses the randomization, blinding, and Boolean-decision procedures of the Peirce–Jastrow experiment.^[39] Since individuals act according to different probability judgments, these agents' probabilities are "personal" (but amenable to objective study).

Personal probabilities are problematic for science and for some applications where decision-makers lack the knowledge or time to specify an informed probability-distribution (on which they are prepared to act). To meet the needs of science and of human limitations, Bayesian statisticians have developed "objective" methods for specifying prior probabilities.

Indeed, some Bayesians have argued that the prior state of knowledge defines the (unique) prior probability distribution for "regular" statistical problems; cf. well-posed problems. Finding the right method for constructing such "objective" priors (for appropriate classes of regular problems) has been the quest of statistical theorists from Laplace to John Maynard Keynes, Harold Jeffreys, and Edwin Thompson Jaynes. These theorists and their successors have suggested several methods for constructing "objective" priors, although it is not always clear how to assess the relative "objectivity" of the priors proposed under these methods.

Each of these methods contributes useful priors for "regular" one-parameter problems, and each prior can handle some challenging statistical models (with "irregularity" or several parameters). Each of these methods has been useful in Bayesian practice. Indeed, methods for constructing "objective" (alternatively, "default" or "ignorance") priors have been developed by avowed subjective (or "personal") Bayesians like James Berger (Duke University) and José-Miguel Bernardo (Universitat de València), simply because such priors are needed for Bayesian practice, particularly in science.^[40] The quest for "the universal method for constructing priors" continues to attract statistical theorists.^[40]

Thus, the Bayesian statistician needs either to use informed priors (using relevant expertise or previous data) or to choose among the competing methods for constructing "objective" priors.
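As one concrete instance of an "objective" construction, chosen here for illustration, Jeffreys' rule takes the prior proportional to the square root of the Fisher information I(θ), which makes the resulting prior invariant under reparameterization. For a single Bernoulli observation x with success probability θ, the derivation is short. The result, Beta(1/2, 1/2), differs from Laplace's uniform Beta(1, 1) prior, which shows how competing methods can yield different priors for the same problem.

```latex
\begin{aligned}
\ell(\theta) &= x\log\theta + (1-x)\log(1-\theta), \qquad x \in \{0, 1\}, \\
I(\theta) &= -\mathbb{E}\!\left[\frac{\partial^{2}\ell}{\partial\theta^{2}}\right]
  = \mathbb{E}\!\left[\frac{x}{\theta^{2}} + \frac{1-x}{(1-\theta)^{2}}\right]
  = \frac{1}{\theta} + \frac{1}{1-\theta}
  = \frac{1}{\theta(1-\theta)}, \\
\pi_{J}(\theta) &\propto \sqrt{I(\theta)} = \theta^{-1/2}(1-\theta)^{-1/2},
  \quad\text{i.e. a } \operatorname{Beta}\!\left(\tfrac12, \tfrac12\right) \text{ prior}.
\end{aligned}
```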

References

[1] "Bayesian". Merriam-Webster.com Dictionary. Merriam-Webster.
[2] Cox, R.T. (1946). "Probability, Frequency, and Reasonable Expectation". American Journal of Physics. 14 (1): 1–10. Bibcode:1946AmJPh..14....1C. doi:10.1119/1.1990764.
[3] Jaynes, E.T. (1986). "Bayesian Methods: General Background". In Justice, J. H. (ed.). Maximum-Entropy and Bayesian Methods in Applied Statistics. Cambridge: Cambridge University Press. CiteSeerX 10.1.1.41.1055.
[4] de Finetti, Bruno (2017). Theory of Probability: A critical introductory treatment. Chichester: John Wiley & Sons Ltd. ISBN 9781119286370.
[5] Hailperin, Theodore (1996). Sentential Probability Logic: Origins, Development, Current Status, and Technical Applications. London: Associated University Presses. ISBN 0934223459.
[6] Howson, Colin (2001). "The Logic of Bayesian Probability". In Corfield, D.; Williamson, J. (eds.). Foundations of Bayesianism. Dordrecht: Kluwer. pp. 137–159. ISBN 1-4020-0223-8.
[7] Paulos, John Allen (5 August 2011). "The Mathematics of Changing Your Mind [by Sharon Bertsch McGrayne]". Book Review. New York Times. Archived from the original on 2022-01-01. Retrieved 2011-08-06.
[8] Stigler, Stephen M. (March 1990). The history of statistics. Harvard University Press. ISBN 9780674403413.
[9] Dupré, Maurice J.; Tipler, Frank J. (2009). "New axioms for rigorous Bayesian probability". Bayesian Analysis. 4 (3): 599–606. CiteSeerX 10.1.1.612.3036. doi:10.1214/09-BA422.
[10] Cox, Richard T. (1961). The algebra of probable inference (Reprint ed.). Baltimore, MD; London, UK: Johns Hopkins Press; Oxford University Press [distributor]. ISBN 9780801869822.
[11] McGrayne, Sharon Bertsch (2011). The Theory that Would not Die. p. 10. https://archive.org/details/theorythatwouldn0000mcgr/page/10
[12] Stigler, Stephen M. (1986). "Chapter 3". The History of Statistics. Harvard University Press. ISBN 9780674403406.
[13] Fienberg, Stephen E. (2006). "When did Bayesian Inference become "Bayesian"?" (PDF). Bayesian Analysis. 1 (1): 1–40 (see p. 5). doi:10.1214/06-BA101. Archived from the original (PDF) on 10 September 2014.
[14] Harris, Marshall Dees (1959). "Recent developments of the so-called Bayesian approach to statistics". Agricultural Law Center. Legal-Economic Research. University of Iowa: 125 (fn. #52), 126. "The works of Wald, Statistical Decision Functions (1950) and Savage, The Foundation of Statistics (1954) are commonly regarded starting points for current Bayesian approaches."
[15] Annals of the Computation Laboratory of Harvard University. Vol. 31. 1962. p. 180. "This revolution, which may or may not succeed, is neo-Bayesianism. Jeffreys tried to introduce this approach, but did not succeed at the time in giving it general appeal."
[16] Kempthorne, Oscar (1967). The Classical Problem of Inference—Goodness of Fit. Fifth Berkeley Symposium on Mathematical Statistics and Probability. p. 235. "It is curious that even in its activities unrelated to ethics, humanity searches for a religion. At the present time, the religion being 'pushed' the hardest is Bayesianism."
[17] Bernardo, J.M. (2005). "Reference analysis". Bayesian Thinking - Modeling and Computation. Handbook of Statistics. Vol. 25. pp. 17–90. doi:10.1016/S0169-7161(05)25002-2. ISBN 9780444515391.
[18] Wolpert, R.L. (2004). "A conversation with James O. Berger". Statistical Science. 19: 205–218. doi:10.1214/088342304000000053.
[19] Bernardo, José M. (2006). A Bayesian mathematical statistics primer (PDF). ICOTS-7. Bern. Archived (PDF) from the original on 2022-10-09.
[20] Bishop, C.M. (2007). Pattern Recognition and Machine Learning. Springer.
[21] Smith, C. Ray; Erickson, Gary (1989). "From Rationality and Consistency to Bayesian Probability". In Skilling, John (ed.). Maximum Entropy and Bayesian Methods. Dordrecht: Kluwer. pp. 29–44. doi:10.1007/978-94-015-7860-8_2. ISBN 0-7923-0224-9.
[22] Halpern, J. (1999). "A counterexample to theorems of Cox and Fine" (PDF). Journal of Artificial Intelligence Research. 10: 67–85. doi:10.1613/jair.536. S2CID 1538503. Archived (PDF) from the original on 2022-10-09.
[23] Hacking (1967), Section 3, page 316.
[24] Hacking (1988), page 124.
[25] Skyrms, Brian (1 January 1987). "Dynamic Coherence and Probability Kinematics". Philosophy of Science. 54 (1): 1–20. CiteSeerX 10.1.1.395.5723. doi:10.1086/289350. JSTOR 187470. S2CID 120881078.
[26] Joyce, James (30 September 2003). "Bayes' Theorem". The Stanford Encyclopedia of Philosophy. stanford.edu.
[27] Fuchs, Christopher A.; Schack, Rüdiger (1 January 2012). "Bayesian Conditioning, the Reflection Principle, and Quantum Decoherence". In Ben-Menahem, Yemima; Hemmo, Meir (eds.). Probability in Physics. The Frontiers Collection. Springer Berlin Heidelberg. pp. 233–247. arXiv:1103.5950. doi:10.1007/978-3-642-21329-8_15. ISBN 9783642213281. S2CID 119215115.
[28] van Fraassen, Bas (1989). Laws and Symmetry. Oxford University Press. ISBN 0-19-824860-1.
[29] Wald, Abraham (1950). Statistical Decision Functions. Wiley.
[30] Bernardo, José M.; Smith, Adrian F.M. (1994). Bayesian Theory. John Wiley. ISBN 0-471-92416-4.
[31] Pfanzagl (1967, 1968).
[32] Morgenstern (1976), page 65.
[33] Galavotti, Maria Carla (1 January 1989). "Anti-Realism in the Philosophy of Probability: Bruno de Finetti's Subjectivism". Erkenntnis. 31 (2/3): 239–261. doi:10.1007/bf01236565. JSTOR 20012239. S2CID 170802937.
[34] Galavotti, Maria Carla (1 December 1991). "The notion of subjective probability in the work of Ramsey and de Finetti". Theoria. 57 (3): 239–259. doi:10.1111/j.1755-2567.1991.tb00839.x. ISSN 1755-2567.
[35] Dokic, Jérôme; Engel, Pascal (2003). Frank Ramsey: Truth and Success. Routledge. ISBN 9781134445936.
[36] Davidson et al. (1957).
[37] Thornton, Stephen (7 August 2018). "Karl Popper". Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University.
[38] Popper, Karl (2002) [1959]. The Logic of Scientific Discovery (2nd ed.). Routledge. p. 57. ISBN 0-415-27843-0. (Translation of the 1935 German original.)
[39] Peirce & Jastrow (1885).
[40] Bernardo, J. M. (2005). "Reference Analysis". In Dey, D.K.; Rao, C. R. (eds.). Handbook of Statistics (PDF). Vol. 25. Amsterdam: Elsevier. pp. 17–90. Archived (PDF) from the original on 2022-10-09.

Bibliography

Berger, James O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer Series in Statistics (Second ed.). Springer-Verlag. ISBN 978-0-387-96098-2.
Bessière, Pierre; Mazer, E.; Ahuacatzin, J.-M.; Mekhnacha, K. (2013). Bayesian Programming. CRC Press. ISBN 9781439880326.
Bernardo, José M.; Smith, Adrian F.M. (1994). Bayesian Theory. Wiley. ISBN 978-0-471-49464-5.
Bickel, Peter J.; Doksum, Kjell A. (2001) [1976]. Basic and selected topics. Mathematical Statistics. Vol. 1 (Second ed.). Pearson Prentice–Hall. ISBN 978-0-13-850363-5. MR 0443141. (Updated printing, 2007, of Holden-Day, 1976.)
Davidson, Donald; Suppes, Patrick; Siegel, Sidney (1957). Decision-Making: an Experimental Approach. Stanford University Press.
de Finetti, Bruno (1937). "La Prévision: ses lois logiques, ses sources subjectives" [Foresight: Its logical laws, its subjective sources]. Annales de l'Institut Henri Poincaré (in French). 7 (1): 1–68.
de Finetti, Bruno (September 1989) [1931]. "Probabilism: A critical essay on the theory of probability and on the value of science". Erkenntnis. 31. (Translation of de Finetti, 1931.)
de Finetti, Bruno (1964) [1937]. "Foresight: Its logical laws, its subjective sources". In Kyburg, H.E.; Smokler, H.E. (eds.). Studies in Subjective Probability. New York, NY: Wiley. (Translation of de Finetti, 1937, above.)
de Finetti, Bruno (1974–1975) [1970]. Theory of Probability: A critical introductory treatment. Translated by Machi, A.; Smith, AFM. Wiley. ISBN 0-471-20141-3, ISBN 0-471-20142-1, two volumes.
DeGroot, Morris (2004) [1970]. Optimal Statistical Decisions. Wiley Classics Library. Wiley. ISBN 0-471-68029-X.
Goertz, Gary; Mahoney, James (2012). A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences. Princeton University Press.
Hacking, Ian (December 1967). "Slightly more realistic personal probability". Philosophy of Science. 34 (4): 311–325. doi:10.1086/288169. JSTOR 186120. S2CID 14344339. (Partly reprinted in Gärdenfors, Peter; Sahlin, Nils-Eric (1988). Decision, Probability, and Utility: Selected Readings. Cambridge University Press. ISBN 0-521-33658-9.)
Hajek, A.; Hartmann, S. (2010) [2001]. "Bayesian Epistemology". In Dancy, J.; Sosa, E.; Steup, M. (eds.). A Companion to Epistemology (PDF). Wiley. ISBN 978-1-4051-3900-7. Archived from the original (PDF) on 2011-07-28.
Hald, Anders (1998). A History of Mathematical Statistics from 1750 to 1930. New York: Wiley. ISBN 978-0-471-17912-2.
Hartmann, S.; Sprenger, J. (2011). "Bayesian Epistemology". In Bernecker, S.; Pritchard, D. (eds.). Routledge Companion to Epistemology (PDF). Routledge. ISBN 978-0-415-96219-3. Archived from the original (PDF) on 2011-07-28.
"Bayesian approach to statistical problems". Encyclopedia of Mathematics. EMS Press. 2001 [1994].
Howson, C.; Urbach, P. (2005). Scientific Reasoning: The Bayesian approach (3rd ed.). Open Court Publishing Company. ISBN 978-0-8126-9578-6.
Jaynes, E.T. (2003). Probability Theory: The logic of science. Cambridge University Press. ISBN 978-0-521-59271-0.
McGrayne, S.B. (2011). The Theory that would not Die: How Bayes' rule cracked the Enigma code, hunted down Russian submarines, and emerged triumphant from two centuries of controversy. New Haven, CT: Yale University Press. ISBN 9780300169690. OCLC 670481486.
Morgenstern, Oskar (1978). "Some Reflections on Utility". In Schotter, Andrew (ed.). Selected Economic Writings of Oskar Morgenstern. New York University Press. pp. 65–70. ISBN 978-0-8147-7771-8.
Peirce, C.S.; Jastrow, J. (1885). "On Small Differences in Sensation". Memoirs of the National Academy of Sciences. 3: 73–83.
Pfanzagl, J. (1967). "Subjective Probability Derived from the Morgenstern-von Neumann Utility Theory". In Shubik, Martin (ed.). Essays in Mathematical Economics In Honor of Oskar Morgenstern. Princeton University Press. pp. 237–251.
Pfanzagl, J.; Baumann, V.; Huber, H. (1968). "Events, Utility and Subjective Probability". Theory of Measurement. Wiley. pp. 195–220.
Ramsey, Frank Plumpton (2001) [1931]. "Chapter VII: Truth and Probability". The Foundations of Mathematics and other Logical Essays. Routledge. ISBN 0-415-22546-9. (PDF of the chapter archived from the original on 2008-02-27.)
Stigler, S.M. (1990). The History of Statistics: The Measurement of Uncertainty before 1900. Belknap Press; Harvard University Press. ISBN 978-0-674-40341-3.
Stigler, S.M. (1999). Statistics on the Table: The history of statistical concepts and methods. Harvard University Press. ISBN 0-674-83601-4.
Stone, J.V. (2013). Bayes' Rule: A tutorial introduction to Bayesian analysis. England: Sebtel Press. "Chapter 1 of Bayes' Rule".
Winkler, R.L. (2003). Introduction to Bayesian Inference and Decision (2nd ed.). Probabilistic. ISBN 978-0-9647938-4-2. Updated classic textbook; Bayesian theory clearly presented.