Bayesian inference

From Wikipedia, the free encyclopedia
Method of statistical inference

Bayesian inference (/ˈbeɪziən/ BAY-zee-ən or /ˈbeɪʒən/ BAY-zhən)[1] is a method of statistical inference in which Bayes' theorem is used to calculate a probability of a hypothesis, given prior evidence, and update it as more information becomes available. Fundamentally, Bayesian inference uses a prior distribution to estimate posterior probabilities. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".

Introduction to Bayes' rule

A geometric visualisation of Bayes' theorem. In the table, the values 2, 3, 6 and 9 give the relative weights of each corresponding condition and case. The figures denote the cells of the table involved in each metric, the probability being the fraction of each figure that is shaded. This shows that $P(A\mid B)P(B)=P(B\mid A)P(A)$, i.e. $P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)}$. Similar reasoning can be used to show that $P(\neg A\mid B)=\frac{P(B\mid \neg A)P(\neg A)}{P(B)}$, etc.
Main article: Bayes' theorem
See also: Bayesian probability

Formal explanation

Contingency table

| Evidence \ Hypothesis | Satisfies hypothesis $H$ | Violates hypothesis $\neg H$ | Total |
|---|---|---|---|
| Has evidence $E$ | $P(H\mid E)\cdot P(E) = P(E\mid H)\cdot P(H)$ | $P(\neg H\mid E)\cdot P(E) = P(E\mid \neg H)\cdot P(\neg H)$ | $P(E)$ |
| No evidence $\neg E$ | $P(H\mid \neg E)\cdot P(\neg E) = P(\neg E\mid H)\cdot P(H)$ | $P(\neg H\mid \neg E)\cdot P(\neg E) = P(\neg E\mid \neg H)\cdot P(\neg H)$ | $P(\neg E) = 1 - P(E)$ |
| Total | $P(H)$ | $P(\neg H) = 1 - P(H)$ | 1 |

Bayesian inference derives the posterior probability as a consequence of two antecedents: a prior probability and a "likelihood function" derived from a statistical model for the observed data. Bayesian inference computes the posterior probability according to Bayes' theorem:

$$P(H\mid E)=\frac{P(E\mid H)\cdot P(H)}{P(E)},$$

where

  • $H$ stands for any hypothesis whose probability may be affected by data (called evidence below). Often there are competing hypotheses, and the task is to determine which is the most probable.
  • $P(H)$, the prior probability, is the estimate of the probability of the hypothesis $H$ before the data $E$, the current evidence, is observed.
  • $E$, the evidence, corresponds to new data that were not used in computing the prior probability.
  • $P(H\mid E)$, the posterior probability, is the probability of $H$ given $E$, i.e., after $E$ is observed. This is what we want to know: the probability of a hypothesis given the observed evidence.
  • $P(E\mid H)$ is the probability of observing $E$ given $H$ and is called the likelihood. As a function of $E$ with $H$ fixed, it indicates the compatibility of the evidence with the given hypothesis. The likelihood function is a function of the evidence, $E$, while the posterior probability is a function of the hypothesis, $H$.
  • $P(E)$ is sometimes termed the marginal likelihood or "model evidence". This factor is the same for all possible hypotheses being considered (as is evident from the fact that the hypothesis $H$ does not appear anywhere in the symbol, unlike for all the other factors) and hence does not factor into determining the relative probabilities of different hypotheses.
  • $P(E)>0$ (otherwise one would have $0/0$).

For different values of $H$, only the factors $P(H)$ and $P(E\mid H)$, both in the numerator, affect the value of $P(H\mid E)$ – the posterior probability of a hypothesis is proportional to its prior probability (its inherent likeliness) and the newly acquired likelihood (its compatibility with the new observed evidence).

In cases where $\neg H$ ("not $H$"), the logical negation of $H$, is a valid likelihood, Bayes' rule can be rewritten as follows:

$$\begin{aligned}P(H\mid E)&=\frac{P(E\mid H)\,P(H)}{P(E)}\\&=\frac{P(E\mid H)\,P(H)}{P(E\mid H)\,P(H)+P(E\mid \neg H)\,P(\neg H)}\\&=\frac{1}{1+\left(\frac{1}{P(H)}-1\right)\frac{P(E\mid \neg H)}{P(E\mid H)}}\end{aligned}$$

because

$$P(E)=P(E\mid H)\,P(H)+P(E\mid \neg H)\,P(\neg H)$$

and

$$P(H)+P(\neg H)=1.$$

This focuses attention on the term

$$\left(\frac{1}{P(H)}-1\right)\frac{P(E\mid \neg H)}{P(E\mid H)}.$$

If that term is approximately 1, then the probability of the hypothesis given the evidence, $P(H\mid E)$, is about $\tfrac{1}{2}$: the hypothesis is as likely as not. If that term is very small, close to zero, then $P(H\mid E)$ is close to 1, and the hypothesis, given the evidence, is quite likely. If that term is very large, much larger than 1, then the hypothesis, given the evidence, is quite unlikely. If the hypothesis (without consideration of the evidence) is unlikely, then $P(H)$ is small (but not necessarily astronomically small), $\tfrac{1}{P(H)}$ is much larger than 1, and the term can be approximated as $\frac{P(E\mid \neg H)}{P(E\mid H)\,P(H)}$, so the relevant probabilities can be compared directly to each other.

One quick and easy way to remember the equation is to use the rule of multiplication:

$$P(E\cap H)=P(E\mid H)\,P(H)=P(H\mid E)\,P(E).$$
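The theorem is straightforward to evaluate numerically. The following minimal Python sketch (the prior and likelihood values are hypothetical, chosen only for illustration) computes $P(H\mid E)$ by expanding $P(E)$ over $H$ and $\neg H$ as above:

```python
def posterior(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Return P(H|E) by Bayes' theorem, expanding P(E) over H and not-H."""
    evidence = lik_e_given_h * prior_h + lik_e_given_not_h * (1 - prior_h)
    return lik_e_given_h * prior_h / evidence

# Hypothetical numbers: prior belief 0.3; evidence is 4x as likely under H as under not-H.
print(posterior(prior_h=0.3, lik_e_given_h=0.8, lik_e_given_not_h=0.2))  # ~0.632
```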

Alternatives to Bayesian updating


Bayesian updating is widely used and computationally convenient. However, it is not the only updating rule that might be considered rational.

Ian Hacking noted that traditional "Dutch book" arguments did not specify Bayesian updating: they left open the possibility that non-Bayesian updating rules could avoid Dutch books. Hacking wrote:[2] "And neither the Dutch book argument nor any other in the personalist arsenal of proofs of the probability axioms entails the dynamic assumption. Not one entails Bayesianism. So the personalist requires the dynamic assumption to be Bayesian. It is true that in consistency a personalist could abandon the Bayesian model of learning from experience. Salt could lose its savour."

Indeed, there are non-Bayesian updating rules that also avoid Dutch books (as discussed in the literature on "probability kinematics") following the publication of Richard C. Jeffrey's rule, which applies Bayes' rule to the case where the evidence itself is assigned a probability.[3] The additional hypotheses needed to uniquely require Bayesian updating have been deemed to be substantial, complicated, and unsatisfactory.[4]

Inference over exclusive and exhaustive possibilities


If evidence is simultaneously used to update belief over a set of exclusive and exhaustive propositions, Bayesian inference may be thought of as acting on this belief distribution as a whole.

General formulation

Diagram illustrating the event space $\Omega$ in the general formulation of Bayesian inference. Although this diagram shows discrete models and events, the continuous case may be visualized similarly using probability densities.

Suppose a process is generating independent and identically distributed events $E_n,\ n=1,2,3,\ldots$, but the probability distribution is unknown. Let the event space $\Omega$ represent the current state of belief for this process. Each model is represented by event $M_m$. The conditional probabilities $P(E_n\mid M_m)$ are specified to define the models. $P(M_m)$ is the degree of belief in $M_m$. Before the first inference step, $\{P(M_m)\}$ is a set of initial prior probabilities. These must sum to 1, but are otherwise arbitrary.

Suppose that the process is observed to generate $E\in \{E_n\}$. For each $M\in \{M_m\}$, the prior $P(M)$ is updated to the posterior $P(M\mid E)$. From Bayes' theorem:[5]

$$P(M\mid E)=\frac{P(E\mid M)}{\sum_m P(E\mid M_m)\,P(M_m)}\cdot P(M).$$

Upon observation of further evidence, this procedure may be repeated.
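As a concrete sketch of one such update step over a discrete set of models (the three models and all numbers below are hypothetical, chosen only for illustration):

```python
# Hypothetical prior degrees of belief over three models (must sum to 1).
priors = {"M1": 0.5, "M2": 0.3, "M3": 0.2}
# Hypothetical conditional probabilities P(E | M) for the observed event E.
likelihoods = {"M1": 0.1, "M2": 0.4, "M3": 0.7}

def update(priors, likelihoods):
    """P(M|E) = P(E|M) P(M) / sum_m P(E|M_m) P(M_m)."""
    evidence = sum(likelihoods[m] * priors[m] for m in priors)
    return {m: likelihoods[m] * priors[m] / evidence for m in priors}

print(update(priors, likelihoods))  # {'M1': ~0.16, 'M2': ~0.39, 'M3': ~0.45}
```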

Multiple observations

For a sequence of independent and identically distributed observations $\mathbf{E}=(e_1,\dots,e_n)$, it can be shown by induction that repeated application of the above is equivalent to
$$P(M\mid \mathbf{E})=\frac{P(\mathbf{E}\mid M)}{\sum_m P(\mathbf{E}\mid M_m)\,P(M_m)}\cdot P(M),$$
where
$$P(\mathbf{E}\mid M)=\prod_k P(e_k\mid M).$$
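The equivalence can be checked numerically: applying the single-observation update repeatedly gives the same posterior as one batch update with the product of the likelihoods. A small sketch with hypothetical models and observations:

```python
import math

priors = {"M1": 0.5, "M2": 0.3, "M3": 0.2}
# Hypothetical per-observation likelihoods P(e_k | M) for three i.i.d. observations.
obs_liks = [
    {"M1": 0.1, "M2": 0.4, "M3": 0.7},
    {"M1": 0.2, "M2": 0.5, "M3": 0.3},
    {"M1": 0.6, "M2": 0.2, "M3": 0.1},
]

def update(prior, lik):
    evidence = sum(lik[m] * prior[m] for m in prior)
    return {m: lik[m] * prior[m] / evidence for m in prior}

# Sequential updating: the posterior after each observation becomes the next prior.
seq = priors
for lik in obs_liks:
    seq = update(seq, lik)

# Batch updating: a single step with P(E|M) = prod_k P(e_k|M).
batch = update(priors, {m: math.prod(l[m] for l in obs_liks) for m in priors})

assert all(abs(seq[m] - batch[m]) < 1e-12 for m in priors)
print(seq)
```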

Parametric formulation: motivating the formal description


By parameterizing the space of models, the belief in all models may be updated in a single step. The distribution of belief over the model space may then be thought of as a distribution of belief over the parameter space. The distributions in this section are expressed as continuous, represented by probability densities, as this is the usual situation. The technique is, however, equally applicable to discrete distributions.

Let the vector $\boldsymbol{\theta}$ span the parameter space. Let the initial prior distribution over $\boldsymbol{\theta}$ be $p(\boldsymbol{\theta}\mid \boldsymbol{\alpha})$, where $\boldsymbol{\alpha}$ is a set of parameters to the prior itself, or hyperparameters. Let $\mathbf{E}=(e_1,\dots,e_n)$ be a sequence of independent and identically distributed event observations, where all $e_i$ are distributed as $p(e\mid \boldsymbol{\theta})$ for some $\boldsymbol{\theta}$. Bayes' theorem is applied to find the posterior distribution over $\boldsymbol{\theta}$:

$$\begin{aligned}p(\boldsymbol{\theta}\mid \mathbf{E},\boldsymbol{\alpha})&=\frac{p(\mathbf{E}\mid \boldsymbol{\theta},\boldsymbol{\alpha})}{p(\mathbf{E}\mid \boldsymbol{\alpha})}\cdot p(\boldsymbol{\theta}\mid \boldsymbol{\alpha})\\&=\frac{p(\mathbf{E}\mid \boldsymbol{\theta},\boldsymbol{\alpha})}{\int p(\mathbf{E}\mid \boldsymbol{\theta},\boldsymbol{\alpha})\,p(\boldsymbol{\theta}\mid \boldsymbol{\alpha})\,d\boldsymbol{\theta}}\cdot p(\boldsymbol{\theta}\mid \boldsymbol{\alpha}),\end{aligned}$$
where
$$p(\mathbf{E}\mid \boldsymbol{\theta},\boldsymbol{\alpha})=\prod_k p(e_k\mid \boldsymbol{\theta}).$$
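When $\boldsymbol{\theta}$ is one-dimensional, the normalizing integral in the denominator can be approximated on a grid. The sketch below uses a toy Bernoulli model with a uniform prior and a hypothetical data set (not taken from the article):

```python
import numpy as np

# Hypothetical i.i.d. Bernoulli observations with unknown success probability theta.
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)              # uniform prior density p(theta | alpha)
likelihood = theta**data.sum() * (1 - theta)**(len(data) - data.sum())

unnormalized = likelihood * prior
posterior = unnormalized / (unnormalized.sum() * dtheta)   # approximate p(E | alpha)

print((theta * posterior).sum() * dtheta)  # posterior mean, ~0.7 for this data and prior
```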

Formal description of Bayesian inference


Definitions


Bayesian inference

$$P_X^y(A)=E(1_A(X)\mid Y=y)$$

Existence and uniqueness of the needed conditional expectation is a consequence of the Radon–Nikodym theorem. This was formulated by Kolmogorov in his famous book from 1933. Kolmogorov underlines the importance of conditional probability by writing "I wish to call attention to ... and especially the theory of conditional probabilities and conditional expectations ..." in the Preface.[8] The Bayes theorem determines the posterior distribution from the prior distribution. Uniqueness requires continuity assumptions.[9] Bayes' theorem can be generalized to include improper prior distributions such as the uniform distribution on the real line.[10] Modern Markov chain Monte Carlo methods have boosted the importance of Bayes' theorem, including cases with improper priors.[11]

Bayesian prediction

Bayesian theory calls for the use of the posterior predictive distribution to do predictive inference, i.e., to predict the distribution of a new, unobserved data point. That is, instead of a fixed point as a prediction, a distribution over possible points is returned. Only this way is the entire posterior distribution of the parameter(s) used. By comparison, prediction in frequentist statistics often involves finding an optimum point estimate of the parameter(s)—e.g., by maximum likelihood or maximum a posteriori estimation (MAP)—and then plugging this estimate into the formula for the distribution of a data point. This has the disadvantage that it does not account for any uncertainty in the value of the parameter, and hence will underestimate the variance of the predictive distribution.

In some instances, frequentist statistics can work around this problem. For example, confidence intervals and prediction intervals in frequentist statistics, when constructed from a normal distribution with unknown mean and variance, are constructed using a Student's t-distribution. This correctly estimates the variance, due to the facts that (1) the average of normally distributed random variables is also normally distributed, and (2) the predictive distribution of a normally distributed data point with unknown mean and variance, using conjugate or uninformative priors, has a Student's t-distribution. In Bayesian statistics, however, the posterior predictive distribution can always be determined exactly—or at least to an arbitrary level of precision when numerical methods are used.

Both types of predictive distributions have the form of a compound probability distribution (as does the marginal likelihood). In fact, if the prior distribution is a conjugate prior, such that the prior and posterior distributions come from the same family, it can be seen that both prior and posterior predictive distributions also come from the same family of compound distributions. The only difference is that the posterior predictive distribution uses the updated values of the hyperparameters (applying the Bayesian update rules given in the conjugate prior article), while the prior predictive distribution uses the values of the hyperparameters that appear in the prior distribution.
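For example, with a Beta prior on a Bernoulli/binomial success probability, both predictive distributions are Beta-Binomial compound distributions, differing only in their hyperparameters. A sketch with hypothetical hyperparameters and data, assuming SciPy (which provides `scipy.stats.betabinom`) is available:

```python
from scipy.stats import betabinom

a, b = 2.0, 2.0            # hypothetical Beta(a, b) prior on the success probability
successes, trials = 7, 10  # hypothetical observed data

# Conjugate update of the hyperparameters (see the conjugate prior section below).
a_post, b_post = a + successes, b + (trials - successes)

m = 5  # number of future trials to predict
prior_pred = [betabinom.pmf(k, m, a, b) for k in range(m + 1)]
post_pred = [betabinom.pmf(k, m, a_post, b_post) for k in range(m + 1)]

print(prior_pred)  # symmetric under the symmetric prior
print(post_pred)   # shifted toward larger counts after seeing 7 successes in 10 trials
```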

Mathematical properties


Interpretation of factor

$\frac{P(E\mid M)}{P(E)}>1\Rightarrow P(E\mid M)>P(E)$. That is, if the model were true, the evidence would be more likely than is predicted by the current state of belief. The reverse applies for a decrease in belief. If the belief does not change, $\frac{P(E\mid M)}{P(E)}=1\Rightarrow P(E\mid M)=P(E)$. That is, the evidence is independent of the model. If the model were true, the evidence would be exactly as likely as predicted by the current state of belief.

Cromwell's rule

Main article: Cromwell's rule

If $P(M)=0$ then $P(M\mid E)=0$. If $P(M)=1$ and $P(E)>0$, then $P(M\mid E)=1$. This can be interpreted to mean that hard convictions are insensitive to counter-evidence.

The former follows directly from Bayes' theorem. The latter can be derived by applying the first rule to the event "not $M$" in place of "$M$", yielding "if $1-P(M)=0$, then $1-P(M\mid E)=0$", from which the result immediately follows.

Asymptotic behaviour of posterior

Consider the behaviour of a belief distribution as it is updated a large number of times with independent and identically distributed trials. For sufficiently nice prior probabilities, the Bernstein–von Mises theorem gives that in the limit of infinite trials, the posterior converges to a Gaussian distribution independent of the initial prior under some conditions first outlined and rigorously proven by Joseph L. Doob in 1948, namely if the random variable in consideration has a finite probability space. The more general results were obtained later by the statistician David A. Freedman, who established in two seminal research papers in 1963[12] and 1965[13] when and under what circumstances the asymptotic behaviour of the posterior is guaranteed. His 1963 paper treats, like Doob (1949), the finite case and comes to a satisfactory conclusion. However, if the random variable has an infinite but countable probability space (i.e., corresponding to a die with infinitely many faces), the 1965 paper demonstrates that for a dense subset of priors the Bernstein–von Mises theorem is not applicable. In this case there is almost surely no asymptotic convergence. Later, in the 1980s and 1990s, Freedman and Persi Diaconis continued to work on the case of infinite countable probability spaces.[14] To summarise, there may be insufficient trials to suppress the effects of the initial choice, and especially for large (but finite) systems the convergence might be very slow.

Conjugate priors

Main article: Conjugate prior

In parameterized form, the prior distribution is often assumed to come from a family of distributions called conjugate priors. The usefulness of a conjugate prior is that the corresponding posterior distribution will be in the same family, and the calculation may be expressed in closed form.
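A sketch of such a closed-form update, for the conjugate case of a Normal prior on the mean of Normal data with known observation variance (all numbers are hypothetical):

```python
def normal_update(mu0, tau0_sq, data, sigma_sq):
    """Posterior mean and variance for a Normal(mu0, tau0_sq) prior on the mean
    of Normal observations with known variance sigma_sq."""
    n = len(data)
    post_precision = 1.0 / tau0_sq + n / sigma_sq          # precisions add
    post_mean = (mu0 / tau0_sq + sum(data) / sigma_sq) / post_precision
    return post_mean, 1.0 / post_precision

mean, var = normal_update(mu0=0.0, tau0_sq=10.0,
                          data=[1.8, 2.4, 2.1, 1.6, 2.3], sigma_sq=1.0)
print(mean, var)  # posterior concentrates near the sample mean 2.04
```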

Estimates of parameters and predictions

It is often desired to use a posterior distribution to estimate a parameter or variable. Several methods of Bayesian estimation select measurements of central tendency from the posterior distribution.

For one-dimensional problems, a unique median exists for practical continuous problems. The posterior median is attractive as a robust estimator.[15]

If there exists a finite mean for the posterior distribution, then the posterior mean is a method of estimation:[16]
$$\tilde{\theta}=\operatorname{E}[\theta]=\int \theta\,p(\theta\mid \mathbf{X},\alpha)\,d\theta$$

Taking a value with the greatest probability defines maximum a posteriori (MAP) estimates:[17]
$$\{\theta_{\text{MAP}}\}\subset \arg\max_\theta p(\theta\mid \mathbf{X},\alpha).$$

There are examples where no maximum is attained, in which case the set of MAP estimates is empty.

There are other methods of estimation that minimize the posterior risk (expected-posterior loss) with respect to a loss function, and these are of interest to statistical decision theory using the sampling distribution ("frequentist statistics").[18]

The posterior predictive distribution of a new observation $\tilde{x}$ (that is independent of previous observations) is determined by[19]
$$p(\tilde{x}\mid \mathbf{X},\alpha)=\int p(\tilde{x},\theta\mid \mathbf{X},\alpha)\,d\theta=\int p(\tilde{x}\mid \theta)\,p(\theta\mid \mathbf{X},\alpha)\,d\theta.$$
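These quantities can all be read off a discretized posterior. Continuing the hypothetical Bernoulli grid approximation from the parametric formulation above:

```python
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])       # hypothetical Bernoulli observations
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]
post = theta**data.sum() * (1 - theta)**(len(data) - data.sum())
post /= post.sum() * dtheta                      # normalized posterior density

post_mean = (theta * post).sum() * dtheta        # posterior mean
post_median = theta[np.searchsorted(np.cumsum(post) * dtheta, 0.5)]  # posterior median
theta_map = theta[np.argmax(post)]               # MAP estimate

# Posterior predictive probability that the next observation equals 1:
# the integral of p(x=1 | theta) p(theta | X) dtheta, which here equals the posterior mean.
p_next = post_mean
print(post_mean, post_median, theta_map, p_next)
```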

Examples


Probability of a hypothesis

Contingency table

| Cookie | Bowl #1 ($H_1$) | Bowl #2 ($H_2$) | Total |
|---|---|---|---|
| Plain, $E$ | 30 | 20 | 50 |
| Chocolate chip, $\neg E$ | 10 | 20 | 30 |
| Total | 40 | 40 | 80 |

$P(H_1\mid E) = 30/50 = 0.6$

Suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?

Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. Let $H_1$ correspond to bowl #1, and $H_2$ to bowl #2. It is given that the bowls are identical from Fred's point of view, thus $P(H_1)=P(H_2)$, and the two must add up to 1, so both are equal to 0.5. The event $E$ is the observation of a plain cookie. From the contents of the bowls, we know that $P(E\mid H_1)=30/40=0.75$ and $P(E\mid H_2)=20/40=0.5$. Bayes' formula then yields
$$\begin{aligned}P(H_1\mid E)&=\frac{P(E\mid H_1)\,P(H_1)}{P(E\mid H_1)\,P(H_1)+P(E\mid H_2)\,P(H_2)}\\&=\frac{0.75\times 0.5}{0.75\times 0.5+0.5\times 0.5}\\&=0.6\end{aligned}$$

Before we observed the cookie, the probability we assigned for Fred having chosen bowl #1 was the prior probability, $P(H_1)$, which was 0.5. After observing the cookie, we must revise the probability to $P(H_1\mid E)$, which is 0.6.
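The same arithmetic in a few lines of Python (a direct transcription of the calculation above):

```python
p_h1 = p_h2 = 0.5            # Fred picks either bowl with equal probability
p_e_given_h1 = 30 / 40       # probability of a plain cookie from bowl #1
p_e_given_h2 = 20 / 40       # probability of a plain cookie from bowl #2

p_e = p_e_given_h1 * p_h1 + p_e_given_h2 * p_h2
print(p_e_given_h1 * p_h1 / p_e)   # P(H1 | E) = 0.6
```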

Making a prediction

Example results for the archaeology example. This simulation was generated using $c=15.2$.

An archaeologist is working at a site thought to be from the medieval period, between the 11th and the 16th century. However, it is uncertain exactly when in this period the site was inhabited. Fragments of pottery are found, some of which are glazed and some of which are decorated. It is expected that if the site were inhabited during the early medieval period, then 1% of the pottery would be glazed and 50% of its area decorated, whereas if it had been inhabited in the late medieval period then 81% would be glazed and 5% of its area decorated. How confident can the archaeologist be in the date of inhabitation as fragments are unearthed?

The degree of belief in the continuous variable $C$ (century) is to be calculated, with the discrete set of events $\{GD, G{\bar D}, {\bar G}D, {\bar G}{\bar D}\}$ as evidence. Assuming linear variation of glaze and decoration with time, and that these variables are independent,

$$P(E=GD\mid C=c)=\left(0.01+\frac{0.81-0.01}{16-11}(c-11)\right)\left(0.5-\frac{0.5-0.05}{16-11}(c-11)\right)$$
$$P(E=G{\bar D}\mid C=c)=\left(0.01+\frac{0.81-0.01}{16-11}(c-11)\right)\left(0.5+\frac{0.5-0.05}{16-11}(c-11)\right)$$
$$P(E={\bar G}D\mid C=c)=\left((1-0.01)-\frac{0.81-0.01}{16-11}(c-11)\right)\left(0.5-\frac{0.5-0.05}{16-11}(c-11)\right)$$
$$P(E={\bar G}{\bar D}\mid C=c)=\left((1-0.01)-\frac{0.81-0.01}{16-11}(c-11)\right)\left(0.5+\frac{0.5-0.05}{16-11}(c-11)\right)$$

Assume a uniform prior of $f_C(c)=0.2$, and that trials are independent and identically distributed. When a new fragment of type $e$ is discovered, Bayes' theorem is applied to update the degree of belief for each $c$:
$$f_C(c\mid E=e)=\frac{P(E=e\mid C=c)}{P(E=e)}\,f_C(c)=\frac{P(E=e\mid C=c)}{\int_{11}^{16}P(E=e\mid C=c)\,f_C(c)\,dc}\,f_C(c)$$

A computer simulation of the changing belief as 50 fragments are unearthed is shown on the graph. In the simulation, the site was inhabited around 1420, or $c=15.2$. By calculating the area under the relevant portion of the graph for 50 trials, the archaeologist can say that there is practically no chance the site was inhabited in the 11th and 12th centuries, about 1% chance that it was inhabited during the 13th century, 63% chance during the 14th century and 36% during the 15th century. The Bernstein–von Mises theorem asserts here the asymptotic convergence to the "true" distribution because the probability space corresponding to the discrete set of events $\{GD, G{\bar D}, {\bar G}D, {\bar G}{\bar D}\}$ is finite (see above section on asymptotic behaviour of the posterior).
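A sketch of such a simulation is given below. It follows the formulas above, drawing random fragments for a site inhabited at $c=15.2$; the exact posterior probabilities will differ from run to run, and the grid over $c$ is only an approximation of the continuous integral:

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.linspace(11, 16, 501)          # century grid, 11th to 16th
dc = c[1] - c[0]
belief = np.full_like(c, 0.2)         # uniform prior density f_C(c) = 0.2

def p_glazed(cc):
    return 0.01 + (0.81 - 0.01) / (16 - 11) * (cc - 11)

def p_decorated(cc):
    return 0.50 - (0.50 - 0.05) / (16 - 11) * (cc - 11)

def likelihood(glazed, decorated, cc):
    pg = p_glazed(cc) if glazed else 1 - p_glazed(cc)
    pd = p_decorated(cc) if decorated else 1 - p_decorated(cc)
    return pg * pd

true_c = 15.2
for _ in range(50):                   # unearth 50 fragments
    glazed = rng.random() < p_glazed(true_c)
    decorated = rng.random() < p_decorated(true_c)
    belief *= likelihood(glazed, decorated, c)
    belief /= belief.sum() * dc       # renormalize the density

# Posterior probability that the site dates to the 15th century (c in [15, 16]).
print(belief[c >= 15].sum() * dc)
```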

In frequentist statistics and decision theory

A decision-theoretic justification of the use of Bayesian inference was given by Abraham Wald, who proved that every unique Bayesian procedure is admissible. Conversely, every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.[20]

Wald characterized admissible procedures as Bayesian procedures (and limits of Bayesian procedures), making the Bayesian formalism a central technique in such areas of frequentist inference as parameter estimation, hypothesis testing, and computing confidence intervals.[21][22][23] For example:

  • "Under some conditions, all admissible procedures are either Bayes procedures or limits of Bayes procedures (in various senses). These remarkable results, at least in their original form, are due essentially to Wald. They are useful because the property of being Bayes is easier to analyze than admissibility."[20]
  • "In decision theory, a quite general method for proving admissibility consists in exhibiting a procedure as a unique Bayes solution."[24]
  • "In the first chapters of this work, prior distributions with finite support and the corresponding Bayes procedures were used to establish some of the main theorems relating to the comparison of experiments. Bayes procedures with respect to more general prior distributions have played a very important role in the development of statistics, including its asymptotic theory." "There are many problems where a glance at posterior distributions, for suitable priors, yields immediately interesting information. Also, this technique can hardly be avoided in sequential analysis."[25]
  • "A useful fact is that any Bayes decision rule obtained by taking a proper prior over the whole parameter space must be admissible"[26]
  • "An important area of investigation in the development of admissibility ideas has been that of conventional sampling-theory procedures, and many interesting results have been obtained."[27]

Model selection

Main article: Bayesian model selection
See also: Bayesian information criterion

Bayesian methodology also plays a role in model selection, where the aim is to select one model from a set of competing models that represents most closely the underlying process that generated the observed data. In Bayesian model comparison, the model with the highest posterior probability given the data is selected. The posterior probability of a model depends on the evidence, or marginal likelihood, which reflects the probability that the data is generated by the model, and on the prior belief in the model. When two competing models are a priori considered to be equiprobable, the ratio of their posterior probabilities corresponds to the Bayes factor. Since Bayesian model comparison is aimed at selecting the model with the highest posterior probability, this methodology is also referred to as the maximum a posteriori (MAP) selection rule[28] or the MAP probability rule.[29]
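A sketch of such a comparison for two hypothetical models of coin-flip data, computing each model's marginal likelihood (by grid integration for the model with a free parameter) and the resulting Bayes factor:

```python
from math import comb
import numpy as np

heads, n = 8, 10                     # hypothetical data: 8 heads in 10 flips

# Model 1: a fair coin; no free parameter, so the evidence is just the likelihood.
evidence_m1 = comb(n, heads) * 0.5**n

# Model 2: unknown bias theta with a uniform prior; integrate the likelihood over theta.
theta = np.linspace(0.0, 1.0, 10001)
dtheta = theta[1] - theta[0]
lik = comb(n, heads) * theta**heads * (1 - theta)**(n - heads)
evidence_m2 = lik.sum() * dtheta     # uniform prior density is 1 on [0, 1]

bayes_factor = evidence_m2 / evidence_m1
print(evidence_m1, evidence_m2, bayes_factor)
# With equal prior model probabilities, the posterior odds equal the Bayes factor.
```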

Probabilistic programming

Main article: Probabilistic programming

While conceptually simple, Bayesian methods can be mathematically and numerically challenging. Probabilistic programming languages (PPLs) implement functions to easily build Bayesian models together with efficient automatic inference methods. This helps separate the model building from the inference, allowing practitioners to focus on their specific problems and leaving PPLs to handle the computational details for them.[30][31][32]
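As an illustration, a simple coin-bias model can be written in a few lines in a PPL. The sketch below uses the PyMC library (assuming its version 4+ API and hypothetical data of 7 heads in 10 flips); the library builds the model and runs an automatic MCMC sampler:

```python
import pymc as pm

with pm.Model():
    theta = pm.Beta("theta", alpha=1.0, beta=1.0)        # prior on the coin bias
    pm.Binomial("obs", n=10, p=theta, observed=7)        # likelihood of the observed data
    idata = pm.sample(1000, chains=2, random_seed=0)     # automatic posterior sampling

print(float(idata.posterior["theta"].mean()))  # close to the analytic posterior mean 8/12
```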

Applications


Statistical data analysis

See the separate Wikipedia entry on Bayesian statistics, specifically the statistical modeling section of that page.

Computer applications

Bayesian inference has applications in artificial intelligence and expert systems. Bayesian inference techniques have been a fundamental part of computerized pattern recognition techniques since the late 1950s.[33] There is also an ever-growing connection between Bayesian methods and simulation-based Monte Carlo techniques, since complex models cannot be processed in closed form by a Bayesian analysis, while a graphical model structure may allow for efficient simulation algorithms such as Gibbs sampling and other Metropolis–Hastings algorithm schemes.[34] Recently, Bayesian inference has gained popularity among the phylogenetics community for these reasons; a number of applications allow many demographic and evolutionary parameters to be estimated simultaneously.

As applied to statistical classification, Bayesian inference has been used to develop algorithms for identifying e-mail spam. Applications which make use of Bayesian inference for spam filtering include CRM114, DSPAM, Bogofilter, SpamAssassin, SpamBayes, Mozilla, XEAMS, and others. Spam classification is treated in more detail in the article on the naïve Bayes classifier.
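A minimal sketch of the idea underlying such filters (a toy naive Bayes scorer with made-up word probabilities; it is not the algorithm of any particular filter listed above):

```python
import math

# Hypothetical per-word likelihoods estimated from labelled training mail.
p_word_given_spam = {"winner": 0.20, "meeting": 0.01, "free": 0.15}
p_word_given_ham = {"winner": 0.01, "meeting": 0.10, "free": 0.02}
p_spam = 0.4  # hypothetical prior probability that a message is spam

def spam_posterior(words):
    """Naive Bayes: treat word occurrences as independent given the class."""
    log_spam = math.log(p_spam)
    log_ham = math.log(1 - p_spam)
    for w in words:
        log_spam += math.log(p_word_given_spam.get(w, 1e-3))
        log_ham += math.log(p_word_given_ham.get(w, 1e-3))
    m = max(log_spam, log_ham)  # stabilize the exponentials
    return math.exp(log_spam - m) / (math.exp(log_spam - m) + math.exp(log_ham - m))

print(spam_posterior(["winner", "free"]))   # close to 1: the message looks like spam
```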

Solomonoff's inductive inference is the theory of prediction based on observations; for example, predicting the next symbol based upon a given series of symbols. The only assumption is that the environment follows some unknown but computable probability distribution. It is a formal inductive framework that combines two well-studied principles of inductive inference: Bayesian statistics and Occam's razor.[35] Solomonoff's universal prior probability of any prefix $p$ of a computable sequence $x$ is the sum of the probabilities of all programs (for a universal computer) that compute something starting with $p$. Given some $p$ and any computable but unknown probability distribution from which $x$ is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of $x$ in optimal fashion.[36][37]

Bioinformatics and healthcare applications

Bayesian inference has been applied in different bioinformatics applications, including differential gene expression analysis.[38] Bayesian inference is also used in a general cancer risk model, called CIRI (Continuous Individualized Risk Index), where serial measurements are incorporated to update a Bayesian model which is primarily built from prior knowledge.[39][40]

Cosmology and astrophysical applications

The Bayesian approach has been central to recent progress in cosmology and astrophysical applications,[41][42] and extends to a wide range of astrophysical problems, including the characterisation of exoplanets (such as the fitting of the atmosphere of K2-18b[43]), parameter constraints with cosmological data,[44] and calibration in astrophysical experiments.[45]

In cosmology, it is often employed with computational techniques such as Markov chain Monte Carlo (MCMC) and nested sampling algorithms to analyse complex datasets and navigate high-dimensional parameter spaces. A notable application is to the Planck 2018 CMB data for parameter inference.[44] The six base cosmological parameters of the Lambda-CDM model are not predicted by theory, but rather fitted to cosmic microwave background (CMB) data under a chosen model of cosmology.[46] The Bayesian code for cosmology `cobaya`[47] sets up cosmological runs and interfaces cosmological likelihoods and Boltzmann codes,[48][49] which compute the predicted CMB anisotropies for any given set of cosmological parameters, with MCMC or nested samplers.

This computational framework is not limited to the standard model; it is also essential for testing alternative or extended theories of cosmology, such as theories with early dark energy[50] or modified gravity theories introducing additional parameters beyond Lambda-CDM. Bayesian model comparison can then be employed to calculate the evidence for competing models, providing a statistical basis to assess whether the data support them over the standard Lambda-CDM.[51]

In the courtroom

Main article: Jurimetrics § Bayesian analysis of evidence

Bayesian inference can be used by jurors to coherently accumulate the evidence for and against a defendant, and to see whether, in totality, it meets their personal threshold for "beyond a reasonable doubt".[52][53][54] Bayes' theorem is applied successively to all evidence presented, with the posterior from one stage becoming the prior for the next. The benefit of a Bayesian approach is that it gives the juror an unbiased, rational mechanism for combining evidence. It may be appropriate to explain Bayes' theorem to jurors in odds form, as betting odds are more widely understood than probabilities. Alternatively, a logarithmic approach, replacing multiplication with addition, might be easier for a jury to handle.

Adding up evidence

If the existence of the crime is not in doubt, only the identity of the culprit, it has been suggested that the prior should be uniform over the qualifying population.[55] For example, if 1,000 people could have committed the crime, the prior probability of guilt would be 1/1000.
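In odds form the successive updates become multiplications, or additions on a logarithmic scale. A sketch with a qualifying population of 1,000 and purely hypothetical likelihood ratios for three items of evidence:

```python
import math

prior_odds = 1 / 999                  # P(guilty) = 1/1000 over the qualifying population
# Hypothetical likelihood ratios P(evidence | guilty) / P(evidence | innocent).
likelihood_ratios = [50.0, 100.0, 4.0]

posterior_odds = prior_odds
for lr in likelihood_ratios:
    posterior_odds *= lr              # Bayes' theorem in odds form

# Equivalent "adding up evidence" formulation on the log scale.
log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
assert abs(math.exp(log_odds) - posterior_odds) < 1e-9

print(posterior_odds / (1 + posterior_odds))   # P(guilty | all evidence) ~ 0.952
```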

The use of Bayes' theorem by jurors is controversial. In the United Kingdom, a defence expert witness explained Bayes' theorem to the jury in R v Adams. The jury convicted, but the case went to appeal on the basis that no means of accumulating evidence had been provided for jurors who did not wish to use Bayes' theorem. The Court of Appeal upheld the conviction, but it also gave the opinion that "To introduce Bayes' Theorem, or any similar method, into a criminal trial plunges the jury into inappropriate and unnecessary realms of theory and complexity, deflecting them from their proper task."

Gardner-Medwin[56] argues that the criterion on which a verdict in a criminal trial should be based is not the probability of guilt, but rather the probability of the evidence, given that the defendant is innocent (akin to a frequentist p-value). He argues that if the posterior probability of guilt is to be computed by Bayes' theorem, the prior probability of guilt must be known. This will depend on the incidence of the crime, which is an unusual piece of evidence to consider in a criminal trial. Consider the following three propositions:

A – the known facts and testimony could have arisen if the defendant is guilty.
B – the known facts and testimony could have arisen if the defendant is innocent.
C – the defendant is guilty.

Gardner-Medwin argues that the jury should believe both A and not-B in order to convict. A and not-B implies the truth of C, but the reverse is not true. It is possible that B and C are both true, but in this case he argues that a jury should acquit, even though they know that they will be letting some guilty people go free. See also Lindley's paradox.

Bayesian epistemology


Bayesian epistemology is a movement that advocates for Bayesian inference as a means of justifying the rules of inductive logic.

Karl Popper and David Miller have rejected the idea of Bayesian rationalism, i.e. using Bayes' rule to make epistemological inferences:[57] It is prone to the same vicious circle as any other justificationist epistemology, because it presupposes what it attempts to justify. According to this view, a rational interpretation of Bayesian inference would see it merely as a probabilistic version of falsification, rejecting the belief, commonly held by Bayesians, that high likelihood achieved by a series of Bayesian updates would prove the hypothesis beyond any reasonable doubt, or even with likelihood greater than 0.

Other


Bayes and Bayesian inference

The problem considered by Bayes in Proposition 9 of his essay, "An Essay Towards Solving a Problem in the Doctrine of Chances", is the posterior distribution for the parameter a (the success rate) of the binomial distribution.[citation needed]

History

Main article: History of statistics § Bayesian statistics

The term Bayesian refers to Thomas Bayes (1701–1761), who proved that probabilistic limits could be placed on an unknown event.[64] However, it was Pierre-Simon Laplace (1749–1827) who introduced (as Principle VI) what is now called Bayes' theorem and used it to address problems in celestial mechanics, medical statistics, reliability, and jurisprudence.[65] Early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes[66]). After the 1920s, "inverse probability" was largely supplanted by a collection of methods that came to be called frequentist statistics.[66]

In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to objective and subjective currents in Bayesian practice. In the objective or "non-informative" current, the statistical analysis depends on only the model assumed, the data analyzed,[67] and the method assigning the prior, which differs from one objective Bayesian practitioner to another. In the subjective or "informative" current, the specification of the prior depends on the belief (that is, propositions on which the analysis is prepared to act), which can summarize information from experts, previous studies, etc.

In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and an increasing interest in nonstandard, complex applications.[68] Despite this growth in Bayesian research, most undergraduate teaching is still based on frequentist statistics.[69] Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning.[70]

References


Citations

  1. ^"Bayesian".Merriam-Webster.com Dictionary. Merriam-Webster.
  2. ^Hacking, Ian (December 1967). "Slightly More Realistic Personal Probability".Philosophy of Science.34 (4): 316.doi:10.1086/288169.S2CID 14344339.
  3. ^"Bayes' Theorem (Stanford Encyclopedia of Philosophy)". Plato.stanford.edu. Retrieved2014-01-05.
  4. ^van Fraassen, B. (1989)Laws and Symmetry, Oxford University Press.ISBN 0-19-824860-1.
  5. ^Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B. (2013).Bayesian Data Analysis, Third Edition. Chapman and Hall/CRC.ISBN 978-1-4398-4095-5.
  6. ^de Carvalho, Miguel; Page, Garritt; Barney, Bradley (2019)."On the geometry of Bayesian inference"(PDF).Bayesian Analysis.14 (4): 1013‒1036.doi:10.1214/18-BA1112.S2CID 88521802.
  7. ^Lee, Se Yoon (2021). "Gibbs sampler and coordinate ascent variational inference: A set-theoretical review".Communications in Statistics – Theory and Methods.51 (6):1549–1568.arXiv:2008.01006.doi:10.1080/03610926.2021.1921214.S2CID 220935477.
  8. ^Kolmogorov, A.N. (1933) [1956].Foundations of the Theory of Probability. Chelsea Publishing Company.
  9. ^Tjur, Tue (1980).Probability based on Radon measures. Internet Archive. Chichester [Eng.]; New York : Wiley.ISBN 978-0-471-27824-5.
  10. ^Taraldsen, Gunnar; Tufto, Jarle; Lindqvist, Bo H. (2021-07-24)."Improper priors and improper posteriors".Scandinavian Journal of Statistics.49 (3):969–991.doi:10.1111/sjos.12550.hdl:11250/2984409.ISSN 0303-6898.S2CID 237736986.
  11. ^Robert, Christian P.; Casella, George (2004).Monte Carlo Statistical Methods. Springer.ISBN 978-1-4757-4145-2.OCLC 1159112760.
  12. ^Freedman, DA (1963)."On the asymptotic behavior of Bayes' estimates in the discrete case".The Annals of Mathematical Statistics.34 (4):1386–1403.doi:10.1214/aoms/1177703871.JSTOR 2238346.
  13. ^Freedman, DA (1965)."On the asymptotic behavior of Bayes estimates in the discrete case II".The Annals of Mathematical Statistics.36 (2):454–456.doi:10.1214/aoms/1177700155.JSTOR 2238150.
  14. ^Robins, James; Wasserman, Larry (2000). "Conditioning, likelihood, and coherence: A review of some foundational concepts".Journal of the American Statistical Association.95 (452):1340–1346.doi:10.1080/01621459.2000.10474344.S2CID 120767108.
  15. ^Sen, Pranab K.; Keating, J. P.; Mason, R. L. (1993).Pitman's measure of closeness: A comparison of statistical estimators. Philadelphia: SIAM.
  16. ^Choudhuri, Nidhan; Ghosal, Subhashis; Roy, Anindya (2005-01-01). "Bayesian Methods for Function Estimation".Handbook of Statistics. Bayesian Thinking. Vol. 25. pp. 373–414.CiteSeerX 10.1.1.324.3052.doi:10.1016/s0169-7161(05)25013-7.ISBN 978-0-444-51539-1.
  17. ^"Maximum A Posteriori (MAP) Estimation".www.probabilitycourse.com. Retrieved2017-06-02.
  18. ^Yu, Angela."Introduction to Bayesian Decision Theory"(PDF).cogsci.ucsd.edu/. Archived fromthe original(PDF) on 2013-02-28.
  19. ^Hitchcock, David."Posterior Predictive Distribution Stat Slide"(PDF).stat.sc.edu.
  20. ^abBickel & Doksum (2001, p. 32)
  21. ^Kiefer, J.; Schwartz R. (1965)."Admissible Bayes Character of T2-, R2-, and Other Fully Invariant Tests for Multivariate Normal Problems".Annals of Mathematical Statistics.36 (3):747–770.doi:10.1214/aoms/1177700051.
  22. ^Schwartz, R. (1969)."Invariant Proper Bayes Tests for Exponential Families".Annals of Mathematical Statistics.40:270–283.doi:10.1214/aoms/1177697822.
  23. ^Hwang, J. T. & Casella, George (1982)."Minimax Confidence Sets for the Mean of a Multivariate Normal Distribution"(PDF).Annals of Statistics.10 (3):868–881.doi:10.1214/aos/1176345877.
  24. ^Lehmann, Erich (1986).Testing Statistical Hypotheses (Second ed.). (see p. 309 of Chapter 6.7 "Admissibility", and pp. 17–18 of Chapter 1.8 "Complete Classes"
  25. ^Le Cam, Lucien (1986).Asymptotic Methods in Statistical Decision Theory. Springer-Verlag.ISBN 978-0-387-96307-5. (From "Chapter 12 Posterior Distributions and Bayes Solutions", p. 324)
  26. ^Cox, D. R.; Hinkley, D.V. (1974).Theoretical Statistics. Chapman and Hall. p. 432.ISBN 978-0-04-121537-3.
  27. ^Cox, D. R.; Hinkley, D.V. (1974).Theoretical Statistics. Chapman and Hall. p. 433.ISBN 978-0-04-121537-3.)
  28. ^Stoica, P.; Selen, Y. (2004). "A review of information criterion rules".IEEE Signal Processing Magazine.21 (4):36–47.doi:10.1109/MSP.2004.1311138.S2CID 17338979.
  29. ^Fatermans, J.; Van Aert, S.; den Dekker, A.J. (2019). "The maximum a posteriori probability rule for atom column detection from HAADF STEM images".Ultramicroscopy.201:81–91.arXiv:1902.05809.doi:10.1016/j.ultramic.2019.02.003.PMID 30991277.S2CID 104419861.
  30. ^Bessiere, P., Mazer, E., Ahuactzin, J. M., & Mekhnacha, K. (2013). Bayesian Programming (1 edition) Chapman and Hall/CRC.
  31. ^Daniel Roy (2015)."Probabilistic Programming".probabilistic-programming.org. Archived fromthe original on 2016-01-10. Retrieved2020-01-02.
  32. ^Ghahramani, Z (2015)."Probabilistic machine learning and artificial intelligence".Nature.521 (7553):452–459.Bibcode:2015Natur.521..452G.doi:10.1038/nature14541.PMID 26017444.S2CID 216356.
  33. ^Fienberg, Stephen E. (2006-03-01)."When did Bayesian inference become "Bayesian"?".Bayesian Analysis.1 (1).doi:10.1214/06-BA101.
  34. ^Jim Albert (2009).Bayesian Computation with R, Second edition. New York, Dordrecht, etc.: Springer.ISBN 978-0-387-92297-3.
  35. ^Rathmanner, Samuel; Hutter, Marcus; Ormerod, Thomas C (2011)."A Philosophical Treatise of Universal Induction".Entropy.13 (6):1076–1136.arXiv:1105.5721.Bibcode:2011Entrp..13.1076R.doi:10.3390/e13061076.S2CID 2499910.
  36. ^Hutter, Marcus; He, Yang-Hui; Ormerod, Thomas C (2007). "On Universal Prediction and Bayesian Confirmation".Theoretical Computer Science.384 (2007):33–48.arXiv:0709.1516.Bibcode:2007arXiv0709.1516H.doi:10.1016/j.tcs.2007.05.016.S2CID 1500830.
  37. ^Gács, Peter; Vitányi, Paul M. B. (2 December 2010). "Raymond J. Solomonoff 1926-2009".CiteSeerX 10.1.1.186.8268.
  38. ^Robinson, Mark D & McCarthy, Davis J & Smyth, Gordon K edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Bioinformatics.
  39. ^"CIRI".ciri.stanford.edu. Retrieved2019-08-11.
  40. ^Kurtz, David M.; Esfahani, Mohammad S.; Scherer, Florian; Soo, Joanne; Jin, Michael C.; Liu, Chih Long; Newman, Aaron M.; Dührsen, Ulrich; Hüttmann, Andreas (2019-07-25)."Dynamic Risk Profiling Using Serial Tumor Biomarkers for Personalized Outcome Prediction".Cell.178 (3): 699–713.e19.doi:10.1016/j.cell.2019.06.011.ISSN 1097-4172.PMC 7380118.PMID 31280963.
  41. ^Trotta, Roberto (2017). "Bayesian Methods in Cosmology".arXiv:1701.01467 [astro-ph.CO].
  42. ^Staicova, Denitsa (2025)."Modern Bayesian Sampling Methods for Cosmological Inference: A Comparative Study".Universe.11 (2): 68.arXiv:2501.06022.Bibcode:2025Univ...11...68S.doi:10.3390/universe11020068.
  43. ^Madhusudhan, Nikku; Constantinou, Savvas; Holmberg, Måns; Sarkar, Subhajit; Piette, Anjali A. A.; Moses, Julianne I. (2025)."New Constraints on DMS and DMDS in the Atmosphere of K2-18 b from JWST MIRI".The Astrophysical Journal.983 (2): L40.arXiv:2504.12267.Bibcode:2025ApJ...983L..40M.doi:10.3847/2041-8213/adc1c8.
  44. ^abAghanim, N.; et al. (2020). "Planck 2018 results".Astronomy & Astrophysics.641: A6.arXiv:1807.06209.Bibcode:2020A&A...641A...6P.doi:10.1051/0004-6361/201833910.
  45. ^Anstey, Dominic; De Lera Acedo, Eloy; Handley, Will (2021)."A general Bayesian framework for foreground modelling and chromaticity correction for global 21 cm experiments".Monthly Notices of the Royal Astronomical Society.506 (2):2041–2058.arXiv:2010.09644.doi:10.1093/mnras/stab1765.
  46. ^Lewis, Antony; Bridle, Sarah (2002). "Cosmological parameters from CMB and other data: A Monte Carlo approach".Physical Review D.66 (10) 103511.arXiv:astro-ph/0205436.Bibcode:2002PhRvD..66j3511L.doi:10.1103/PhysRevD.66.103511.
  47. ^"Cobaya, a code for Bayesian analysis in Cosmology — cobaya 3.5.7 documentation".cobaya.readthedocs.io. Retrieved2025-07-23.
  48. ^"CAMB — Code for Anisotropies in the Microwave Background (CAMB) 1.6.1 documentation".camb.readthedocs.io. Retrieved2025-07-23.
  49. ^Lesgourgues, Julien (2011). "The Cosmic Linear Anisotropy Solving System (CLASS) I: Overview".arXiv:1104.2932 [astro-ph.IM].
  50. ^Hill, J. Colin; McDonough, Evan; Toomey, Michael W.; Alexander, Stephon (2020). "Early dark energy does not restore cosmological concordance".Physical Review D.102 (4) 043507.arXiv:2003.07355.Bibcode:2020PhRvD.102d3507H.doi:10.1103/PhysRevD.102.043507.
  51. ^Trotta, Roberto (2008). "Bayes in the sky: Bayesian inference and model selection in cosmology".Contemporary Physics.49 (2):71–104.arXiv:0803.4089.Bibcode:2008ConPh..49...71T.doi:10.1080/00107510802066753.
  52. ^Dawid, A. P. and Mortera, J. (1996) "Coherent Analysis of Forensic Identification Evidence".Journal of the Royal Statistical Society, Series B, 58, 425–443.
  53. ^Foreman, L. A.; Smith, A. F. M., and Evett, I. W. (1997). "Bayesian analysis of deoxyribonucleic acid profiling data in forensic identification applications (with discussion)".Journal of the Royal Statistical Society, Series A, 160, 429–469.
  54. ^Robertson, B. and Vignaux, G. A. (1995)Interpreting Evidence: Evaluating Forensic Science in the Courtroom. John Wiley and Sons. Chichester.ISBN 978-0-471-96026-3.
  55. ^Dawid, A. P. (2001)Bayes' Theorem and Weighing Evidence by Juries.Archived 2015-07-01 at theWayback Machine
  56. ^Gardner-Medwin, A. (2005) "What Probability Should the Jury Address?".Significance, 2 (1), March 2005.
  57. ^Miller, David (1994).Critical Rationalism. Chicago: Open Court.ISBN 978-0-8126-9197-9.
  58. ^Howson & Urbach (2005), Jaynes (2003)
  59. ^Cai, X.Q.; Wu, X.Y.; Zhou, X. (2009). "Stochastic scheduling subject to breakdown-repeat breakdowns with incomplete information".Operations Research.57 (5):1236–1249.doi:10.1287/opre.1080.0660.
  60. ^Ogle, Kiona; Tucker, Colin; Cable, Jessica M. (2014-01-01). "Beyond simple linear mixing models: process-based isotope partitioning of ecological processes".Ecological Applications.24 (1):181–195.Bibcode:2014EcoAp..24..181O.doi:10.1890/1051-0761-24.1.181.ISSN 1939-5582.PMID 24640543.
  61. ^Evaristo, Jaivime; McDonnell, Jeffrey J.; Scholl, Martha A.; Bruijnzeel, L. Adrian; Chun, Kwok P. (2016-01-01). "Insights into plant water uptake from xylem-water isotope measurements in two tropical catchments with contrasting moisture conditions".Hydrological Processes.30 (18):3210–3227.Bibcode:2016HyPr...30.3210E.doi:10.1002/hyp.10841.ISSN 1099-1085.S2CID 131588159.
  62. ^Gupta, Ankur; Rawlings, James B. (April 2014)."Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology".AIChE Journal.60 (4):1253–1268.Bibcode:2014AIChE..60.1253G.doi:10.1002/aic.14409.ISSN 0001-1541.PMC 4946376.PMID 27429455.
  63. ^Schütz, N.; Holschneider, M. (2011). "Detection of trend changes in time series using Bayesian inference".Physical Review E.84 (2) 021120.arXiv:1104.3448.Bibcode:2011PhRvE..84b1120S.doi:10.1103/PhysRevE.84.021120.PMID 21928962.S2CID 11460968.
  64. ^Stigler, Stephen (1982). "Thomas Bayes's Bayesian Inference".Journal of the Royal Statistical Society.145 (2):250–58.doi:10.2307/2981538.JSTOR 2981538.
  65. ^Stigler, Stephen M. (1986)."Chapter 3".The History of Statistics. Harvard University Press.ISBN 978-0-674-40340-6.
  66. ^abFienberg, Stephen E. (2006)."When did Bayesian Inference Become 'Bayesian'?".Bayesian Analysis.1 (1): 1–40 [p. 5].doi:10.1214/06-ba101.
  67. ^Bernardo, José-Miguel (2005). "Reference analysis".Handbook of statistics. Vol. 25. pp. 17–90.
  68. ^Wolpert, R. L. (2004). "A Conversation with James O. Berger".Statistical Science.19 (1):205–218.CiteSeerX 10.1.1.71.6112.doi:10.1214/088342304000000053.MR 2082155.S2CID 120094454.
  69. ^Bernardo, José M. (2006)."A Bayesian mathematical statistics primer"(PDF).Icots-7.
  70. ^Bishop, C. M. (2007).Pattern Recognition and Machine Learning. New York: Springer.ISBN 978-0-387-31073-2.
