An inductive logic is a system of reasoning that articulates how evidence claims bear on the truth of hypotheses. As with any logic, it does this via the evaluation of arguments. Each argument consists of premise statements and a conclusion statement. A logic employs principles and rules to evaluate the extent to which the truth of an argument’s premise statements supports the truth of its conclusion statement.
In a deductive logic the truth of the premises of a good argument guarantees the truth of its conclusion. Good deductive arguments are called deductively valid; their premises are said to logically entail their conclusions, where logical entailment means that every logically possible state of affairs that makes the premises true also makes the conclusion true. In an inductive logic the truth of the premises of a good argument supports the truth of its conclusion to some appropriate degree. That is, the truth of the argument’s premises provides an appropriate degree-of-support for the truth of its conclusion. These degrees-of-support are typically measured on a numerical scale. By analogy with the notion of deductive logical entailment, the notion of an appropriate inductive degree-of-support may be taken to mean something like this: among the logically possible states of affairs that make the premises true, the conclusion is true in proportion \(r\) of them.
This article explicates the inductive logic most widely studied by logicians and epistemologists in recent years. The logic employs conditional probability functions to represent the degree to which an argument’s premises support its conclusion. This approach is often called a Bayesian inductive logic, because a theorem of probability theory called Bayes’ Theorem plays a central role in articulating how evidence claims inductively support hypotheses.
Ultimately, any adequate inductive logic should provide a mechanism whereby evidence may legitimately refute false hypotheses and endorse true ones. That is, any legitimate inductive logic should provide at least a modest version of the most famous epistemological remark attributed to Sherlock Holmes:
When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth.
Although this remark overstates what an inductive logic can usually accomplish, the underlying idea is basically right. That is, a logic of evidential support aspires to endorse the following more modest principle:
When a rigorous body of evidence shows that all of the credible alternatives to a hypothesis are highly unlikely by comparison, then the remaining hypothesis, however initially implausible, must very probably be true.
This idea, that evidence comes to support the truth of a hypothesis by undermining its competitors, is central to the workings of a Bayesian logic of evidential support. This article will describe in some detail how this Bayesian inductive logic works.
Section 1 explicates the most important inference rules for a Bayesian inductive logic. These rules articulate how some probabilistic arguments may be combined to determine the degree to which evidence weighs for or against hypotheses (as expressed by other probabilistic arguments). Section 2 provides examples of the application of these inference rules.
This section lays out the fundamental elements of a probabilistic (Bayesian) inductive logic. We first develop appropriate notation and specify the logical axioms for the conditional probability functions. These conditional probability functions will be used to represent inductive arguments. Next we briefly describe the two most fundamental component arguments in the inference rules for Bayesian inductive inferences: (1) the evidential likelihoods, and (2) the prior plausibility assessments of hypotheses. Then we explicate four of the most important inference rules for this kind of inductive logic, rules that employ the probability values from likelihood arguments and the prior plausibility arguments to determine the probability values for arguments from evidential premises to hypotheses.
In the main body of this article we will forgo a discussion of the historical origins of probabilistic inductive logic. See the appendix Historical Origins and Interpretations of Probabilistic Inductive Logic for an overview of the origins, and for a brief summary of views about the nature of probabilistic inductive logic.
In a probabilistic argument, the degree to which a premise statement \(D\) supports the truth or falsehood of a conclusion statement \(C\) is expressed in terms of a conditional probability function \(P\). A formula of form \(P[C \mid D] = r\) expresses the claim that premise \(D\) supports conclusion \(C\) to degree \(r\), where \(r\) is a real number between 0 and 1. Notice that the conclusion \(C\) is placed on the left-hand side of the conditional probability expression, followed by the premise \(D\) on the right-hand side. This reverses the order of premise and conclusion employed in the standard expressions for deductive logical entailment, where the logical entailment of a conclusion \(C\) by premise \(D\) is usually represented by an expression of form \(D \vDash C\).
In applications of deductive logic the main challenge is to determine whether or not a logical entailment, \(D \vDash C\), holds for arguments consisting of premises \(D\) and conclusions \(C\). Similarly, the main challenge in a probabilistic inductive logic is to determine the appropriate values of \(r\) such that \(P[C \mid D] = r\) holds for arguments consisting of premises \(D\) and conclusions \(C\). The probabilistic formula \(P[C \mid D] = r\) may be read in either of two ways: literally, the probability of \(C\) given \(D\) is \(r\); but also, apropos the application of probability functions \(P\) to represent argument strengths, the degree to which \(C\) is supported by \(D\) is \(r\).
Throughout our discussion we use common logical notation for conjunctions, disjunctions, and negations. We use a dot between sentences, \((A \cdot B)\), to represent their conjunction (\(A\) and \(B\)); and we use a wedge between sentences, \((A \vee B)\), to represent their disjunction (\(A\) or \(B\)). Disjunction is taken to be inclusive: \((A \vee B)\) means that at least one of \(A\) or \(B\) is true. We use the not symbol \(\neg\) in front of a sentence to represent its negation: \(\neg C\) means it’s not the case that \(C\).
Here are standard logical axioms for conditional probabilities. They supply minimal rules for probabilistic support functions. That is, support functions should satisfy at least these axioms, and perhaps some additional rules as well.
Let \(L\) be a language of interest — i.e. any bit of language in which the inductive arguments of interest may be expressed — and let \(\vDash\) be the logical entailment relation for this language. A conditional probability function (i.e. a probabilistic support function) is a function \(P\) from pairs of statements of \(L\) to real numbers that satisfies (at least) the following axioms.
For all statements \(A\), \(B\), and \(C\) in \(L\):
These axioms do not presuppose that logically equivalent statements have the same probability. Rather, that can be proved from these axioms.
Axioms 1-4 should be clear enough as stated. Axiom 5 says that when \(C \vDash \neg(A \cdot B)\) (i.e. when \(C\) logically entails that \(A\) and \(B\) cannot both be true), the support-strength of \(C\) for their disjunction, \((A \vee B)\), must equal the sum of its support-strengths for each of them individually. The only exception to this additivity condition occurs when \(C\) supports every statement \(D\) to degree 1. That can happen, for example, when \(C\) is logically inconsistent, since (according to standard deductive logic) logically inconsistent statements must logically entail every statement \(D\).
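The additivity condition can be illustrated with a small toy model. The finite collection of "possible worlds" below, and the proportion-based support function defined over it, are illustrative assumptions of this sketch, not part of the formal axiomatization; the sketch merely shows additivity holding when the premise entails that \(A\) and \(B\) cannot both be true.

```python
# A toy finite model of conditional probability, illustrating the
# additivity condition (axiom 5) discussed above. The worlds and the
# proportion-based support function are illustrative assumptions.

from fractions import Fraction

# Each "world" assigns truth values to three atomic sentences A, B, C.
worlds = [
    {"A": a, "B": b, "C": c}
    for a in (True, False) for b in (True, False) for c in (True, False)
]

def P(conclusion, premise):
    """Support for `conclusion` given `premise`: the proportion of
    premise-worlds that are also conclusion-worlds."""
    premise_worlds = [w for w in worlds if premise(w)]
    if not premise_worlds:
        # Inconsistent premises support every statement to degree 1,
        # matching the exception noted above.
        return Fraction(1)
    hits = [w for w in premise_worlds if conclusion(w)]
    return Fraction(len(hits), len(premise_worlds))

A = lambda w: w["A"]
B = lambda w: w["B"]
C = lambda w: w["C"]

# Condition on a premise that entails A and B are not both true;
# then support for (A or B) equals the sum of the individual supports.
premise = lambda w: C(w) and not (A(w) and B(w))
lhs = P(lambda w: A(w) or B(w), premise)
rhs = P(A, premise) + P(B, premise)
assert lhs == rhs
```

Using exact `Fraction` arithmetic keeps the proportions free of floating-point noise, so the additivity identity can be checked with exact equality.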
The following four rules follow easily from axioms 2, 3, and 5:
These results are derived in the appendix, Axioms and Some Theorems for Conditional Probability. This appendix also includes an alternative way to axiomatize conditional probability, which draws on much weaker axioms to arrive at the same results (i.e. all the above axioms and theorems are derivable from these weaker axioms).
Axiom 6 expresses a fundamental relationship between conditional probabilities. Think of it like this. Call the collection of logically possible states of affairs where a statement \(C\) is true the \(C\) states. Consider the proportion \(p\) of \(C\) states that are also \(B\) states: \(P[B \mid C] = p\). A certain fraction \(f\) of those \((B \cdot C)\) states are also \(A\) states: \(P[A \mid (B \cdot C)] = f\). Then the proportion of the \(C\) states that are \((A \cdot B)\) states, \(P[(A \cdot B) \mid C]\), should be the fraction \(f\) of proportion \(p\), which is given by \(f \times p\). That is, the proportion of the \(C\) states that are \((A \cdot B)\) states should be the fraction of \((B \cdot C)\) states that are also \(A\) states, \(f\), of the proportion of \(C\) states that are \(B\) states, \(p\):
\[P[(A \cdot B) \mid C] = f \times p = P[A \mid (B \cdot C)] \times P[B \mid C].\]

From axiom 6, together with axioms 3 and 5, a simple form of Bayes’ Theorem follows: if \(P[B \mid C] \gt 0\), then
\[P[A \mid (B \cdot C)] = \dfrac{P[B \mid (A \cdot C)] \times P[A \mid C]}{P[B \mid C]}.\]

To see how Bayes’ Theorem can represent an inference rule governing the evidential support for a hypothesis, replace \(A\) by some hypothesis \(h\), replace \(B\) by some relevant body of evidence \(e\), and let \(c\) represent some appropriate conjunction of background and auxiliary conditions, including whatever experimental or observational conditions (a.k.a. initial conditions) may be required to link \(h\) to \(e\) (more about this below). Then the appropriate version of Bayes’ Theorem takes the following form: if \(P[e \mid c] \gt 0\), then
\[P[h \mid (e \cdot c)] = \dfrac{P[e \mid (h \cdot c)] \times P[h \mid c]}{P[e \mid c]}.\]

Thus, Bayes’ Theorem represents the way in which the strength of the evidential support for a hypothesis, \(P[h \mid (e \cdot c)]\), can be calculated from the strengths of three other probabilistic arguments: \(P[e \mid (h \cdot c)]\), \(P[h \mid c]\), and \(P[e \mid c]\). Stated this way, Bayes’ Theorem may not look much like an inference rule. So, let’s articulate more precisely how an equation like this may be construed as an inference rule. It represents a rule that draws on the strengths of three probabilistic arguments to infer the strength of a further argument. Thus, as an inference rule, Bayes’ Theorem may be expressed as follows:
Each of the inference rules for the inductive logic of evidential support presented in this article is based on this basic Bayesian idea. However, it usually turns out that the numerical value \(q\) of the strength of the argument \(P[e \mid c] = q\) is especially difficult to evaluate. So, the Bayesian inference rules provided throughout the remainder of this article will not depend on probabilistic arguments of the form \(P[e \mid c] = q\). Furthermore, the strengths \(s\) of arguments of form \(P[h \mid c] = s\) are often quite vague or indeterminate. This issue will receive special attention as we proceed.
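Before turning to rules that avoid the troublesome term \(P[e \mid c]\), the simple form of Bayes’ Theorem can be illustrated numerically. The disease/test numbers below are hypothetical, chosen only to make the arithmetic concrete.

```python
# Hypothetical numbers illustrating the simple form of Bayes' Theorem:
# h = a patient has a given disorder, e = a diagnostic test is positive,
# c = the background and testing conditions (implicit in the numbers).

p_h_given_c = 0.01     # P[h | c]: prior plausibility of the hypothesis
p_e_given_hc = 0.95    # P[e | h·c]: likelihood of the evidence if h is true
# P[e | c]: overall probability of the evidence, here taken to be
# 0.95 * 0.01 + 0.05 * 0.99 (a hypothetical false-positive rate of 5%).
p_e_given_c = 0.059

# Bayes' Theorem: P[h | e·c] = P[e | h·c] * P[h | c] / P[e | c]
p_h_given_ec = p_e_given_hc * p_h_given_c / p_e_given_c
print(round(p_h_given_ec, 3))  # → 0.161
```

Even with a highly sensitive test, the low prior keeps the posterior modest, which previews why prior plausibility assessments matter in the rules that follow.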
We now proceed to consider four basic rules of Bayesian inference for an inductive logic. Each of these rules follows from the above axioms. However, before getting into the rules themselves, we need to first investigate more carefully the two kinds of argumentative components that will be employed by each of these rules: \(P[e \mid (h \cdot c)] = r\) and \(P[h \mid c] = s\).
In nearly all applications of probabilistic inductive logic, the arguments of interest involve an assessment of the degree to which observable or detectable evidence \(e\) tells for or against a hypothesis and its competing alternatives. Let \(h_1\), \(h_2\), \(h_3\), …, etc., represent a collection of two or more competing alternative hypotheses. Hypotheses count as competing alternatives when they address the same subject matter, but disagree with regard to at least some claims about that subject matter. Thus, we take any two alternative hypotheses from the collection, \(h_i\) and \(h_j\), to be logically incompatible: \(\vDash \neg (h_i \cdot h_j)\) — i.e. it is logically true that \(\neg (h_i \cdot h_j)\).
The bearing of evidence on the probable truth or falsehood of a hypothesis can seldom, if ever, be assessed on the basis of evidential results alone. For one thing, the bearing of evidential results \(e\) on hypothesis \(h_j\) depends on the conditions under which the observations were made, or on how the experiment was set up and conducted. Let \(c\) represent (a conjunction of) statements that describe the observational or experimental conditions (sometimes called the initial conditions) that give rise to evidential results described by (a conjunction of) statements \(e\).
Furthermore, the bearing of evidential conditions and their outcomes, \((c \cdot e)\), on a hypothesis \(h_j\) will often depend on auxiliary hypotheses — e.g. auxiliary claims about how measuring devices produce outcomes relevant to \(h_j\) under conditions like \(c\). Let \(b\) represent the conjunction of all such auxiliary claims that connect each competing hypothesis, \(h_i\), \(h_j\), etc., to outcomes \(e\) of conditions \(c\). For example, suppose the various hypotheses propose alternative medical disorders that may be afflicting a particular patient. Conditions \(c\) may describe a body of medical tests performed on the patient (e.g. blood drawn and submitted to various specific tests), and \(e\) may state the precise outcomes of those tests (e.g. precise values for white cell count, blood sugar level, AFP level, etc.). However, descriptions of medical tests and their outcomes can only weigh for or against the presence of a disorder in light of auxiliary hypotheses about the ways in which each disorder \(h_j\) is likely to influence those test outcomes (e.g. how each possible medical disorder is likely to influence white cell counts, blood sugar levels, AFP levels, etc.). The expression \(b\), for background claims, represents the conjunction of such auxiliaries. (Many of the claims in \(b\) should themselves be subject to evidential support in contexts where they compete with alternative claims about their own subject matters. More on this later.)
A comprehensive assessment of the probable truth of a hypothesis should also depend on some body of plausibility considerations — on how much more (or less) plausible \(h_j\) is than alternatives \(h_i\), based on considerations prior to bringing the evidence to bear. A reasonable inductive logic should reflect the idea that extraordinary claims require extraordinary evidence. That is, a hypothesis that makes extraordinary claims requires exceptionally strong evidence to overcome its initial implausibility. So, it makes good sense that the logic should have a way to accommodate how much more or less plausible one hypothesis is than an alternative, prior to taking the evidence into account. For example, in diagnosing a medical disorder, it makes good sense to take into account how commonly (or rarely) each alternative disorder occurs within the most relevant sub-population to which the patient belongs. These are called the base rates of disorders in the relevant sub-population. We’ll soon see how such considerations figure into the inference rules of inductive logic. For the purpose of describing the logic, we also let symbol \(b\) represent the conjunction of whatever relevant plausibility considerations are brought to bear on the initial plausibilities of hypotheses, along with whatever relevant auxiliary hypotheses are employed.
Expressed in these terms, a primary objective of a probabilistic inductive logic is to assess the degree-of-support for (or against) each competing hypothesis \(h_j\) by a premise of form \((c \cdot e \cdot b)\), consisting of evidential condition \(c\) together with its observable outcome \(e\), in conjunction with relevant auxiliary hypotheses and plausibility claims \(b\). That is, the objective is to determine the numerical value \(t\) for a probabilistic argument of form \(P[h_j \mid c \cdot e \cdot b] = t\). This expression is usually called the posterior probability of hypothesis \(h_j\) on evidence \((c \cdot e)\), given background \(b\). Thus, the primary objective of the logic is to assess the values \(t\) of the posterior probabilities of such evidential arguments.
The most basic inference rule for the Bayesian logic of evidential support is comparative in nature. That is, this most basic rule does not directly provide values for individual posterior probabilities. Rather, it provides ratio comparisons of the posterior probabilities (the argument weights) for competing hypotheses.
Let \(h_i\) and \(h_j\) be any two distinct hypotheses from a list of competing alternatives. The comparative degrees-of-support for these two hypotheses are given by a numerical value \(q\) for the ratio of their posterior probabilities: \(P[h_i \mid c \cdot e \cdot b] / P[h_j \mid c \cdot e \cdot b] = q\). This ratio measures how much more (or less) strongly the premise \((c \cdot e \cdot b)\) supports \(h_i\) than it supports \(h_j\). The most basic rule for the logic states a direct way to calculate the values \(q\) for such ratios; and it does this without providing values for the individual posterior probabilities, \(P[h_i \mid c \cdot e \cdot b]\) and \(P[h_j \mid c \cdot e \cdot b]\), themselves. We’ll see how this works when we introduce the relevant inference rule, in the next subsection.
The inference rule for determining the value \(q\) of a posterior probability ratio draws on only two distinct kinds of probabilistic arguments:
1. The likelihoods of the evidence according to various hypotheses: A likelihood is a probabilistic argument of form \(P[e \mid h_k \cdot c \cdot b] = r\). It is a probabilistic argument from premises \((h_k \cdot c \cdot b)\) to a conclusion \(e\). This argument expresses what hypothesis \(h_k\) says about how likely it is that evidence claim \(e\) should be true when evidential conditions \(c\) and auxiliary claims stated within \(b\) are also true. Likelihoods express the empirical content of a hypothesis, what it says an observable part of the world is probably like. In order for two hypotheses, \(h_i\) and \(h_j\), to differ in empirical content (given \(b\)), there must be some possible evidential conditions \(c\) that have possible outcomes \(e\) on which the likelihoods for the two hypotheses disagree:
\(P[e \mid h_i \cdot c \cdot b] = r \neq s = P[e \mid h_j \cdot c\cdot b].\)
It turns out that Bayesian inductive inference rules don’t depend directly on the individual values of likelihoods, but only on the values \(v\) of ratios of likelihoods:
\(v = P[e \mid h_i \cdot c \cdot b] / P[e \mid h_j \cdot c \cdot b]\).
These likelihood ratios (a.k.a. Bayes Factors) represent how much more (or less) likely the evidential outcome \(e\) should be if hypothesis \(h_i\) is true than if alternative hypothesis \(h_j\) is true. They embody the means by which empirical content evidentially distinguishes between two competing hypotheses.
In many scientific contexts the exact values of individual likelihoods are calculable, often via some explicit statistical model on which the hypothesis together with auxiliaries, \((h_k \cdot b)\), draws. Clearly, in contexts where the exact values of likelihoods are calculable, exact values of these likelihood ratios are calculable as well. However, even in cases where the individual hypotheses, \(h_i\) and \(h_j\), provide somewhat vague or imprecise information regarding the values for individual likelihoods, it may be possible to assess reasonable estimates of upper and lower bounds on their likelihood ratios. We will see how such bounds on likelihood ratios may provide important evidential inputs for the inductive inference rules.
When the evidence consists of a collection of \(m\) distinct experiments or observations and their outcomes, \((c_1 \cdot e_1)\), \((c_2 \cdot e_2)\), …, \((c_m \cdot e_m)\), we use the term \(c\) to represent the conjunction of these experimental or observational conditions, \((c_1 \cdot c_2 \cdot \ldots \cdot c_m)\), and we use the term \(e\) to represent the conjunction of their respective outcomes, \((e_1 \cdot e_2 \cdot \ldots \cdot e_m)\). For notational convenience we may employ the term \(c^m\) to abbreviate the conjunction of the \(m\) experimental conditions, and we use the term \(e^m\) to abbreviate the corresponding conjunction of their outcomes. Given a specific hypothesis \(h_k\) together with relevant auxiliaries \(b\), the evidential outcomes of these distinct experiments or observations will usually be probabilistically independent of one another, and will also be independent of the experimental conditions for one another’s outcomes. In that case the likelihood \(P[e \mid h_k \cdot c \cdot b]\) decomposes into the following terms:

\[P[e \mid h_k \cdot c \cdot b] = P[e^m \mid h_k \cdot c^m \cdot b] = P[e_1 \mid h_k \cdot c_1 \cdot b] \times P[e_2 \mid h_k \cdot c_2 \cdot b] \times \ldots \times P[e_m \mid h_k \cdot c_m \cdot b].\]
Thus, when the likelihoods represent evidence that consists of a collection of \(m\) distinct probabilistically independent experiments (or observations) and their respective outcomes, the likelihood ratios may take the following form:
\[\begin{align}&\frac{P[e \mid h_i \cdot c \cdot b]}{P[e \mid h_j \cdot c \cdot b]} = \frac{P[e^m \mid h_i \cdot c^m \cdot b]}{P[e^m \mid h_j \cdot c^m \cdot b]} \\&~ = \frac{P[e_1 \mid h_i \cdot c_1 \cdot b]}{P[e_1 \mid h_j \cdot c_1 \cdot b]} \times \frac{P[e_2 \mid h_i \cdot c_2 \cdot b]}{P[e_2 \mid h_j \cdot c_2 \cdot b]} \times \ldots \times \frac{P[e_m \mid h_i \cdot c_m \cdot b]}{P[e_m \mid h_j \cdot c_m \cdot b]}.\end{align}\]

2. The prior plausibilities of hypotheses: A prior probability is a probabilistic argument for or against a hypothesis of form \(P[h_k \mid b]\) or \(P[h_k \mid c \cdot b]\), where the information carried by \(b\) or \((c \cdot b)\) does not contain the kinds of evidential outcomes \(e\) for which \(h_k\) expresses likelihoods. These probabilistic arguments need not be a priori arguments for hypothesis \(h_k\), as some have suggested. Nor need they merely express the subjective opinions of individual persons. Rather, the values for these arguments should represent an assessment of the plausibility of hypotheses based on a range of relevant considerations, including broadly empirical facts not captured by evidential likelihoods. For instance, such plausibility arguments may involve considerations of the simplicity of the hypothesis, whether it is overly ad hoc, whether it provides (or is at least consistent with) a reasonable causal mechanism, etc. Such considerations may be explicitly stated within statement \(b\). (This view on the nature of Bayesian probabilities, and especially the prior probabilities, most closely follows in the tradition of such Bayesians as Keynes, Jeffreys, and Jaynes. Alternatively, many Bayesians, in the tradition of Ramsey, de Finetti, and Savage, take all Bayesian probabilities, including the priors, to express individual subjective degrees of belief. However, the mathematical rules of the Bayesian logic itself do not in any way depend on the resolution of this issue regarding the conceptual nature of Bayesian probabilities. So we can set this issue aside here.)
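The decomposition of likelihood ratios over independent outcomes (point 1 above) can be sketched with a small numerical example. The two hypotheses and the data are hypothetical: \(h_i\) says a coin has heads-chance 0.5, \(h_j\) says 0.7, and each \(e_k\) reports the outcome of one independent toss (the toss conditions \(c_k\) are absorbed into the per-toss likelihoods).

```python
# Sketch: a likelihood ratio over m independent outcomes computed as a
# product of per-outcome ratios. The coin biases and toss record are
# hypothetical illustrations, not from the article.

from math import prod

outcomes = ["H", "T", "H", "H", "T", "T", "T", "T"]  # observed tosses

def likelihood(p_heads, outcome):
    """Per-toss likelihood P[e_k | h·c_k·b] for a coin with bias p_heads."""
    return p_heads if outcome == "H" else 1 - p_heads

# P[e^m | h_i·c^m·b] / P[e^m | h_j·c^m·b] as a product of per-outcome ratios:
ratio = prod(likelihood(0.5, o) / likelihood(0.7, o) for o in outcomes)
print(ratio)  # > 1, so this run of tosses favors the fair-coin hypothesis
```

With 3 heads and 5 tails the product works out to \((0.5/0.7)^3 \times (0.5/0.3)^5\), a ratio greater than 1, so the evidence here favors \(h_i\) over \(h_j\).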
In many contexts such initial plausibility assessments will not be well-represented by precise numerical values. However, it turns out that the inductive inference rules presented below need only draw on the values \(u\) for ratios of priors:
\[ u = P[h_i \mid c \cdot b] / P[h_j \mid c \cdot b].\]

These ratios represent how much more (or less) plausible hypothesis \(h_i\) is taken to be than alternative hypothesis \(h_j\), given their comparative simplicity, ad hocness, causal viability, etc., and including whatever broadly empirical factors are relevant to the specific field of inquiry to which these hypotheses belong.
Furthermore, such comparative plausibility assessments may often be too vague to be represented by precise numerical values. Rather, they will often be best represented by numerical intervals:
\[ u \ge P[h_i \mid c \cdot b] / P[h_j \mid c \cdot b] \ge v,\]

for real numbers \(u\) and \(v\).
One more point. Although the description of the observational/experimental conditions, embodied by \(c\), will not usually be relevant to the prior probability values (in the absence of outcome \(e\)), the probabilistic logic itself doesn’t automatically permit the dismissal of information that may be contained in \(c\). Rather, the logic requires that the relevance of \(c\) be specifically addressed. However, if, absent outcome \(e\), conditions \(c\) are equally relevant to \(h_i\) and \(h_j\), then the probabilistic logic permits \(c\) to be dropped, yielding comparative plausibility ratios of the following form:
\[u \ge P[h_i \mid b] / P[h_j \mid b] = P[h_i \mid c \cdot b] / P[h_j \mid c \cdot b] \ge v.\]

So, although the rules for inductive inferences described below will continue to include statements \(c\) within the prior probability arguments, the reader should keep in mind that \(c\) is usually not relevant to these arguments, and can be dropped from them.
The logic of evidential support combines the numerical values of these two kinds of factors to produce an assessment of the degree of support, \(P[h_k \mid c \cdot e \cdot b]\), for hypotheses. To see how this works, first return to the following form of Bayes’ Theorem, applied to each hypothesis \(h_k\):

\[P[h_k \mid c \cdot e \cdot b] = \frac{P[e \mid h_k \cdot c \cdot b] \times P[h_k \mid c \cdot b]}{P[e \mid c \cdot b]}.\]

The value of the term \(P[e \mid c \cdot b]\), which occurs in the denominator of this form of Bayes’ Theorem, is usually difficult (even impossible) to assess. So it is generally more useful to consider the comparative support of pairs of competing hypotheses by the evidence. Applying Bayes’ Theorem to each of a pair of hypotheses, \(h_i\) and \(h_j\), and then taking their ratio, produces the following formula for assessing their comparative support, via the ratio of their posterior probabilities:

\[\frac{P[h_i \mid c \cdot e \cdot b]}{P[h_j \mid c \cdot e \cdot b]} = \frac{P[e \mid h_i \cdot c \cdot b] \times P[h_i \mid c \cdot b]}{P[e \mid h_j \cdot c \cdot b] \times P[h_j \mid c \cdot b]}.\]

The following two sections explicate this Ratio Form of Bayes’ Theorem, and show how it captures the essential features of Bayesian inductive inference.
In this section and the next we look at two closely related versions of Bayes’ Theorem as it applies to competing hypotheses. The present section is devoted to the most elementary version, the Ratio Form of Bayes’ Theorem. Here it is.
Rule RB: Ratio Form of Bayes’ Theorem
Let \(h_1\), \(h_2\), …, be a list of two or more alternative hypotheses, alternatives in the sense that the conjunction of any two of them, \((h_i \cdot h_j)\), is logically inconsistent (i.e. no two of them can both be true): \(\vDash \neg (h_i \cdot h_j)\). Let \(c\) be observational or experimental conditions for which \(e\) is among the possible outcomes. And suppose \(b\) is a conjunction of relevant auxiliary hypotheses and plausibility considerations.
Let \(h_j\) be any hypothesis from the list for which both \(P[e \mid h_j \cdot c \cdot b] > 0\) and \(P[h_j \mid c \cdot b] > 0\).
Then \(P[h_j \mid c \cdot e \cdot b] > 0\), and for each \(h_i\) among the alternatives to \(h_j\),

\[ \frac{P[h_i \mid c \cdot e \cdot b]}{P[h_j \mid c \cdot e \cdot b]} = \frac{P[e \mid h_i \cdot c \cdot b]}{P[e \mid h_j \cdot c \cdot b]} \times \frac{P[h_i \mid c \cdot b]}{P[h_j \mid c \cdot b]}.\]
This ratio also provides an upper bound on \(P[h_i \mid c \cdot e \cdot b]\), since

\[P[h_i \mid c \cdot e \cdot b] \le \frac{P[h_i \mid c \cdot e \cdot b]}{P[h_j \mid c \cdot e \cdot b]}.\]
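As a minimal numerical sketch of Rule RB (all values hypothetical), the posterior ratio is simply the likelihood ratio times the prior ratio:

```python
# Rule RB as a computation: posterior ratio = likelihood ratio * prior ratio.
# The two input values are hypothetical illustrations.

likelihood_ratio = 0.05  # P[e | h_i·c·b] / P[e | h_j·c·b]: evidence favors h_j
prior_ratio = 10.0       # P[h_i | c·b] / P[h_j | c·b]: h_i starts 10x more plausible

posterior_ratio = likelihood_ratio * prior_ratio
# By the upper-bound observation above, P[h_i | c·e·b] <= posterior_ratio,
# so h_i's posterior probability is at most 0.5 despite its initial edge.
print(posterior_ratio)  # → 0.5
```

Note how the evidence (likelihood ratio of 1/20) overwhelms the 10-to-1 head start that \(h_i\) had in prior plausibility.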
This Ratio Form of Bayes’ Theorem is straightforwardly derivable from the above axioms for conditional probability functions.
In any application of Rule RB, the likelihood ratios carry the full import of the evidence \((c \cdot e)\). The evidence influences the evaluation of hypotheses in no other way. In many scientific contexts, each hypothesis (together with auxiliaries) provides a precise value for the likelihoods of evidence claims. In such cases the exact values for likelihood ratios can be calculated. Indeed, in any given epistemic context, RB is useful as a rule of inference for inductive logic only if, for each pair of hypotheses \(h_i\) and \(h_j\) in the context, the values of (or at least reasonable bounds on) their likelihood ratios are determinable or calculable.
In Rule RB, the only other factor that influences the value of the ratio of posterior probabilities is the ratio of their associated prior probabilities. And these ratios of priors play a central role. So, for Rule RB to be useful as a rule of inference for inductive logic, the values of these ratios of priors must be estimable or calculable — or, at least, credible upper and lower bounds on them must be assessable.
For some kinds of hypotheses, reasonably precise values for the individual prior probabilities may be available, so the numerical value for the ratio of priors may be calculated. However, in many epistemic contexts the prior probability values for individual hypotheses are vague and difficult to determine. In these contexts it will often be easier to assess the ratio of priors directly, since it represents an assessment of how much more (or less) plausible one hypothesis is than another. Indeed, an assessment of credible upper and lower bounds on comparative plausibilities suffices for the kinds of inductive inferences supplied by Rule RB. For, given a significant body of evidence, the associated likelihood ratios applied to wide bounds on the comparative prior plausibilities will often produce quite narrow bounds on the resulting ratios of posterior probabilities.
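The point about interval-valued priors can be made concrete with a small sketch (the numbers are hypothetical): even a very wide interval on the prior ratio collapses to a narrow interval on the posterior ratio once the likelihood ratio is small.

```python
# Sketch: wide bounds on the prior ratio, combined with a strong likelihood
# ratio, yield tight bounds on the posterior ratio. Numbers are hypothetical.

likelihood_ratio = 1e-4            # evidence strongly favors h_j over h_i
prior_ratio_bounds = (0.1, 100.0)  # v, u: h_i between 10x less and 100x more plausible

# Rule RB applied at each endpoint of the prior-ratio interval:
posterior_bounds = tuple(likelihood_ratio * r for r in prior_ratio_bounds)
print(posterior_bounds)  # both endpoints are small: h_i ends up improbable either way
```

Here a thousand-fold uncertainty in the prior ratio still leaves the posterior ratio pinned between 0.00001 and 0.01, so \(h_i\) comes out strongly disfavored under every admissible prior assessment.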
Notice that Rule RB implies that if either \(P[e \mid h_i \cdot c \cdot b] = 0\) or \(P[h_i \mid c \cdot b] = 0\), then \(P[h_i \mid c \cdot e \cdot b] = 0\).
When \(P[h_i \mid c \cdot e \cdot b] = 0\) is due to \(P[e \mid h_i \cdot c \cdot b] = 0\), we have an extended version of the notion of the falsification of a hypothesis. Falsification is usually associated with the deductive refutation of a hypothesis by evidence. That is, when \((h_i \cdot c \cdot b) \vDash e^*\), but the actual outcome \(e\) is logically incompatible with \(e^*\), it follows that \((h_i \cdot c \cdot b) \vDash \neg e\). Then, deductively, it also follows that \((c \cdot e \cdot b) \vDash \neg h_i\), and \(h_i\) is said to be falsified by \((c \cdot e)\), given \(b\).
Rule RB captures this idea, since when \((h_i \cdot c \cdot b) \vDash \neg e\), probability theory yields \(P[\neg e \mid h_i \cdot c \cdot b] = 1\), so \(P[e \mid h_i \cdot c \cdot b] = 0\), in which case rule RB yields \(P[h_i \mid c \cdot e \cdot b] = 0\). And, according to RB, \(P[e \mid h_i \cdot c \cdot b] = 0\) suffices for \(P[h_i \mid c \cdot e \cdot b] = 0\), from which it follows that \(P[\neg h_i \mid c \cdot e \cdot b] = 1\).
Rule RB goes further by showing how evidence may come to strongly refute a hypothesis \(h_i\), without fully falsifying it. Suppose now that both \(P[h_j \mid c \cdot b] > 0\) and \(P[h_i \mid c \cdot b] > 0\). Then, regardless of how plausible or implausible \(h_i\) is taken to be as compared to \(h_j\), provided that \(h_j\) isn’t way too implausible, if the body of evidence \(e\) is sufficiently unlikely on \(h_i\) as compared to \(h_j\), then Rule RB says that the posterior probability of \(h_i\) on that evidence must be extremely close to 0.
More formally, suppose that \(P[h_i \mid c \cdot b] / P[h_j \mid c \cdot b] \le K\), where \(K\) may be some very large number. This represents the idea that \(h_i\) is initially considered to be up to \(K\) times more plausible than \(h_j\). Let \(\epsilon\) be some extremely small number, as close to 0 as you wish. Then, according to Rule RB, to get the value of \(P[h_i \mid c \cdot e \cdot b]\) within \(\epsilon\) of 0, it suffices for the body of evidence to favor \(h_j\) over \(h_i\) strongly enough that \(P[e \mid h_i \cdot c \cdot b] \lt (\epsilon / K) \times P[e \mid h_j \cdot c \cdot b]\). That is, via Rule RB:
\[\begin{align}&\text{When }~ \frac{P[h_i \mid c \cdot b]}{P[h_j \mid c \cdot b]} \le K,~\text{ if }~ \frac{P[e \mid h_i \cdot c \cdot b]}{P[e \mid h_j \cdot c \cdot b]} \lt \frac{\epsilon}{K}, \\&\text{then }~ P[h_i \mid c \cdot e \cdot b] \lt \epsilon.\end{align}\]

If all but the most extremely implausible alternatives to hypothesis \(h_j\) become strongly refuted in this way by a body of evidence \((c \cdot e)\), then the posterior probability of \(h_j\), \(P[h_j \mid c \cdot e \cdot b]\), should approach 1. Thus may \(h_j\) become strongly supported by the evidence. The next rule will endorse this idea more fully.
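The strong-refutation bound just stated can be checked with hypothetical numbers: pick a large prior-plausibility cap \(K\), a small target \(\epsilon\), and a likelihood ratio just inside the \(\epsilon / K\) threshold.

```python
# Checking the strong-refutation bound above, with hypothetical numbers.
K = 1000.0      # h_i initially up to K times more plausible than h_j
epsilon = 0.01  # target bound on h_i's posterior probability

# If the likelihood ratio is below epsilon / K, then for any prior ratio
# up to K, the posterior ratio (= likelihood ratio * prior ratio) stays
# below epsilon; and P[h_i | c·e·b] is bounded by that posterior ratio.
likelihood_ratio = 0.9 * (epsilon / K)  # just inside the required bound
prior_ratio = K                         # worst case allowed by the constraint

posterior_upper_bound = likelihood_ratio * prior_ratio
assert posterior_upper_bound < epsilon
print(posterior_upper_bound)
```

Even with \(h_i\) granted the maximal thousand-fold head start in plausibility, a likelihood ratio below \(\epsilon / K\) forces its posterior probability under 0.01.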
Rule RB contributes to a more comprehensive inference rule, one that applies to collections of competing hypotheses. This more comprehensive rule employs the well-known probabilistic concept of odds. By definition, the odds of \(A\) given \(B\), written \(\Omega[A \mid B]\), is related to the probability of \(A\) given \(B\) by the formula: \[\Omega[A \mid B] = \frac{P[A \mid B]}{P[\neg A \mid B]}.\] However, for our purposes it will be more useful to employ the inverse ratio of the odds, the odds against \(A\) given \(B\): \[\Omega[\neg A \mid B] = \frac{P[\neg A \mid B]}{P[A \mid B]} = \frac{1 - P[A \mid B]}{P[A \mid B]}.\]From the definition of odds against, it follows that: \[P[A \mid B] = \frac{1}{1 + \Omega[\neg A \mid B]}.\]
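The two conversions just defined are simple enough to restate as code. This minimal Python sketch (the function names are ours) merely transcribes the formulas above and checks that they are inverses:

```python
def odds_against(p):
    # Omega[~A | B] computed from P[A | B]: (1 - P) / P
    return (1 - p) / p

def prob_from_odds_against(omega):
    # P[A | B] recovered from Omega[~A | B]: 1 / (1 + Omega)
    return 1 / (1 + omega)

p = 0.8
omega = odds_against(p)                      # (1 - .8)/.8 = 0.25
recovered = prob_from_odds_against(omega)    # 1/(1 + .25) = 0.8
assert abs(recovered - p) < 1e-12
```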
Here is how odds come into play in Bayesian inductive logic. Sum the ratio versions of Bayes’ Theorem, as given by Rule RB, over a range of alternatives to hypothesis \(h_j\). This yields the Odds Form of Bayes’ Theorem. And from that we can calculate the individual values of posterior probabilities.
Rule OB: Odds Form of Bayes’ Theorem
Let \(H\) = {\(h_1\), \(h_2\), …, \(h_n\)} be a collection of two or more alternative hypotheses (i.e. \(n \ge 2\)), where the conjunction of any two of them is logically inconsistent, \(\vDash \neg (h_i \cdot h_j)\). Let \(c\) be observational or experimental conditions for which \(e\) is among the possible outcomes. And suppose \(b\) is a conjunction of relevant auxiliary hypotheses and plausibility considerations.
Let \(h_j\) be any hypothesis from the list for which both \(P[h_j \mid c \cdot b] > 0\) and \(P[e \mid h_j \cdot c \cdot b] > 0\).
Then \(P[h_j \mid c \cdot e \cdot b] > 0\), and for each alternative \(h_i\) to \(h_j\),
\[\begin{align}\Omega[\neg h_j \mid c \cdot e \cdot b \cdot (h_i \vee h_j)] &=\frac{P[h_i \mid c \cdot e \cdot b]}{P[h_j \mid c \cdot e \cdot b]} \\ &= \frac{P[e \mid h_i \cdot c \cdot b]}{P[e \mid h_j \cdot c \cdot b]} \times \frac{P[h_i \mid c \cdot b]}{P[h_j \mid c \cdot b]}.\end{align}\]Furthermore,
\[\begin{align}\Omega[\neg h_j \mid& c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)] \\ &= \sum_{i = 1, i \ne j}^n \Omega[\neg h_j \mid c \cdot e \cdot b \cdot (h_i \vee h_j)] \\ &= \sum_{i = 1, i \ne j}^n \frac{P[e \mid h_i \cdot c \cdot b]}{P[e \mid h_j \cdot c \cdot b]} \times \frac{P[h_i \mid c \cdot b]}{P[h_j \mid c \cdot b]}.\end{align}\]Finally, the associated posterior probability of \(h_j\), the degree to which premise \((c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n))\) supports conclusion \(h_j\), is given by the formula
\[\begin{align}&P[h_j \mid c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)] \\&\quad = \frac{1}{1 + \Omega[\neg h_j \mid c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)]}.\end{align}\]Thus, Rule OB shows that the odds against a hypothesis, assessed against a finite collection of alternatives, depends only on the values of ratios of posterior probabilities, where each of these ratios entirely derives from the Ratio Form of Bayes’ Theorem, stated by Rule RB. The same goes for the posterior probability of a hypothesis, since its value entirely derives from the odds against it. Thus, the Ratio Form of Bayes’ Theorem captures the essential features of the Bayesian evaluation of hypotheses. It shows how the impact of evidence, captured by likelihood ratios, combines with comparative plausibility assessments of hypotheses, captured by ratios of prior probabilities, to provide a net assessment of the extent to which hypotheses are refuted or supported in a contest with their rivals.
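Here is a short Python sketch of Rule OB at work on three mutually exclusive hypotheses. The priors and likelihoods are made-up numbers for illustration only, and the result is cross-checked against the direct form of Bayes’ Theorem:

```python
# Hypothetical priors P[h_i | c.b] and likelihoods P[e | h_i.c.b]
priors = [0.5, 0.3, 0.2]
likes = [0.01, 0.2, 0.05]

j = 1  # evaluate the hypothesis with the largest likelihood

# Rule OB: odds against h_j is the sum, over its rivals, of
# (likelihood ratio) x (prior ratio)
omega = sum((likes[i] / likes[j]) * (priors[i] / priors[j])
            for i in range(len(priors)) if i != j)

# Posterior recovered from the odds against
posterior = 1 / (1 + omega)

# Cross-check with the direct form of Bayes' Theorem
direct = (likes[j] * priors[j]) / sum(l * p for l, p in zip(likes, priors))
assert abs(posterior - direct) < 1e-12
print(round(posterior, 6))   # 0.8
```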
We conclude this section with a comment about why the posterior odds and posterior probabilities provided by Rule OB usually need to be relativized to finite disjunctions of alternative hypotheses, \((h_1 \vee h_2 \vee \ldots \vee h_n)\).
First notice that in any specific epistemic context where the collection of \(n\) alternative hypotheses, \(\{h_1, h_2, \ldots, h_n\},\) consists of all possible alternatives about the subject matter at issue, and if background statement \(b\) says so (i.e. if \(b \vDash (h_1 \vee h_2 \vee \ldots \vee h_n)\)), then the explicit use of disjunctions of hypotheses can be dropped from the equations in Rule OB. For, in that context,
\[\Omega[\neg h_j \mid c \cdot e \cdot b] = \Omega[\neg h_j \mid c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)].\]However, in many epistemic contexts an investigator may not be aware of all possible alternative hypotheses or theories about the subject at issue. For instance, the medical community may not have identified every possible disorder or disease that may afflict a patient. Furthermore, in some contexts it may not even be possible to formulate all possible alternative hypotheses or theories — e.g. all possible alternative theories about the fundamental nature of space-time and the origin of the universe. In such cases, the best we can do is evaluate evidential support for (and against) those hypotheses we’ve formulated thus far, always keeping in mind that the list of alternatives might well be expanded to include additional alternatives.
Now, just one further point. Suppose that the list of \(n\) alternatives contains all alternative hypotheses that the relevant epistemic community has formulated so far, but other unidentified alternatives remain possible. Can we not appeal to the following Bayesian result to bypass the need to relativize to the disjunction of presently formulated alternative hypotheses? After all, this result is also a theorem of probability theory.
For \(P[e \mid h_j \cdot c \cdot b] > 0\) and \(P[h_j \mid c \cdot e \cdot b] > 0\),
\[\begin{align}\Omega[\neg h_j \mid c \cdot e \cdot b] \;=\; &\sum_{i = 1, i \ne j}^n \frac{P[e \mid h_i \cdot c \cdot b]}{P[e \mid h_j \cdot c \cdot b]} \times \frac{P[h_i \mid c \cdot b]}{P[h_j \mid c \cdot b]} \\ &+ \frac{P[(\neg h_1 \cdot \neg h_2 \cdot \ldots \cdot \neg h_n) \mid c \cdot e \cdot b]}{P[h_j \mid c \cdot e \cdot b]},\end{align}\]
where the final term is given by the equation,
\[\frac{P[(\neg h_1 \cdot \neg h_2 \cdot \ldots \cdot \neg h_n) \mid c \cdot e \cdot b]}{P[h_j \mid c \cdot e \cdot b]} = \frac{P[e \mid (\neg h_1 \cdot \neg h_2 \cdot \ldots \cdot \neg h_n) \cdot c \cdot b]}{P[e \mid h_j \cdot c \cdot b]} \times \frac{P[(\neg h_1 \cdot \neg h_2 \cdot \ldots \cdot \neg h_n) \mid c \cdot b]}{P[h_j \mid c \cdot b]}.\]
The problem with this idea is that it draws on likelihoods of form \(P[e \mid (\neg h_1 \cdot \neg h_2 \cdot \ldots \cdot \neg h_n) \cdot c \cdot b]\). Such likelihoods will almost never have explicitly determinable or calculable values. So, the values of \(\Omega[\neg h_j \mid c \cdot e \cdot b]\) and \(P[h_j \mid c \cdot e \cdot b]\) that derive from formulas that draw on this kind of likelihood must also fail to be determinable or calculable. So, this approach to sidestepping the relativization to \((h_1 \vee h_2 \vee \ldots \vee h_n)\) is at cross-purposes with the idea that an inductive logic should be couched in terms of usable rules of inductive inference.
Nevertheless, the calculable values of \(\Omega[\neg h_j \mid c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)]\) provided by Rule OB do entail explicit bounds on the values for the non-disjunctively-relativized posterior odds and posterior probabilities. For, the probabilistic logic entails the following relationships:
\[\Omega[\neg h_j \mid c \cdot e \cdot b] \ge \Omega[\neg h_j \mid c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)],\] and so
\[P[h_j \mid c \cdot e \cdot b] \le P[h_j \mid c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)].\]Thus, if the evidence pushes \(P[h_j \mid c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)]\) close to 0, then it also must push \(P[h_j \mid c \cdot e \cdot b]\) close to 0. However, although pushing \(P[h_i \mid c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)]\) close to 0 for all \((n-1)\) competitors of \(h_j\) results in the approach of \(P[h_j \mid c \cdot e \cdot b \cdot (h_1 \vee h_2 \vee \ldots \vee h_n)]\) to 1, it need not result in the approach of the non-disjunctively-relativized posterior \(P[h_j \mid c \cdot e \cdot b]\) to 1. For, some as yet unconsidered alternative hypothesis may well be able to do better than \(h_j\) on the currently available evidence \((c \cdot e \cdot b)\). The logic of Bayesian inference does not rule out this possibility.
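A tiny numeric illustration of these two inequalities (every number below is invented): any catch-all contribution from unformulated alternatives can only add to the odds against \(h_j\), so the relativized posterior is an upper bound on the unrelativized one.

```python
# Odds against h_j relative to the formulated alternatives (hypothetical value)
relativized_odds = 0.25

# Contribution of as-yet-unformulated alternatives; unknown in practice,
# but necessarily non-negative (value assumed here for illustration)
catch_all = 0.10

full_odds = relativized_odds + catch_all

p_relativized = 1 / (1 + relativized_odds)   # 0.8
p_full = 1 / (1 + full_odds)                 # strictly smaller

assert full_odds >= relativized_odds
assert p_full <= p_relativized
```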
This section specifies two additional inference rules for Bayesian inductive logic. They are specialized versions of Bayes’ Theorem — basically extended versions of rule OB. These two rules are especially useful in cases of interval estimation, where the evidence bears on whether the true hypothesis lies within some specific interval of alternative claims. The first of these two rules will be stated in terms of evidential support for disjunctions of hypotheses. The precise statement of this rule does not presuppose that the hypotheses it addresses lie within some interval of values; rather, it applies to the support for any finite disjunction of hypotheses. However, one of its important applications is to the evidential support of a disjunctive interval of alternative hypotheses. An example application to a disjunctive interval of alternative hypotheses is provided in Section 2.4.
The second rule applies to the support of competing hypotheses that range over continuous intervals of real numbers. For example, consider each hypothesis of form, “the chance of heads on tosses of this particular (possibly biased) coin is \(r\)”, where \(r\) must have some real number value between 0 and 1. Perhaps the true value of \(r\) for this particular coin is .72. However, the evidence won’t usually single out this exact chance hypothesis. Rather, the best we can usually do is use evidence to narrow down the interval within which the true value of \(r\) very probably resides (e.g. show that the posterior probability that \(r\) lies between .67 and .77 is .95, based on the evidence). The statement of this second interval estimation rule will closely resemble the statement of the first rule, but modifies it to apply to continuous intervals of values. An example is provided in Section 2.5.
The following rule provides lower bounds on the posterior probability of disjunctions of alternative hypotheses. It derives from the above axioms for conditional probabilities, with no additional suppositions beyond those explicitly stated in the rule itself. Although the statement of this rule is quite general, its most common application is to disjunctions of hypotheses about closely spaced numerical quantities.
Rule BE-D: Bayesian Estimation for Disjunctions of Alternative Hypotheses
Let \(H\) be a collection of \(z\) alternative hypotheses, \(z \ge 2\), where the conjunction of any two of them is logically inconsistent. Let \(c\) be observational or experimental conditions for which \(e\) describes one of the possible outcomes. And suppose \(b\) is a conjunction of relevant auxiliary hypotheses and plausibility considerations. For each hypothesis \(h_i\) in \(H\), let its prior probability be non-zero: \(P[h_i \mid c \cdot b] \gt 0\).
Choose any \(k\) hypotheses from collection \(H\), where each one of them, \(h_i\), has a likelihood value \(P[e \mid h_i \cdot c \cdot b] > 0\). Label these \(k\) hypotheses (in whatever order you wish) as \(\lsq h_1\rsq\), \(\lsq h_2\rsq\), \(\ldots\), \(\lsq h_k\rsq\). Then label all the remaining hypotheses in \(H\) (in whatever order you wish) as \(\lsq h_{k+1}\rsq\), \(\lsq h_{k+2}\rsq\), \(\ldots\), \(\lsq h_z\rsq\).
Given these labelings of hypotheses in \(H\), let \((h_1 \vee \ldots \vee h_k)\) represent the disjunction of the first \(k\) hypotheses chosen from \(H\), and \((h_{k+1} \vee \ldots \vee h_z)\) represent the disjunction of the remaining hypotheses from \(H\). The expression \((h_1 \vee \ldots \vee h_z)\) represents the disjunction of all hypotheses in \(H\). Furthermore, let’s take \(b\) to logically entail that one of the hypotheses in \(H\) is true — i.e. \(b\) logically entails the disjunction of all alternative hypotheses in \(H\): \(b \vDash (h_1 \vee \ldots \vee h_z)\). So, both \(P[(h_1 \vee \ldots \vee h_z) \mid c \cdot b] = 1\) and \(P[(h_1 \vee \ldots \vee h_z) \mid c \cdot e \cdot b] = 1\).
Then, the posterior probability of \((h_1 \vee \ldots \vee h_k)\) satisfies the following form of Bayes’ Theorem:
\[P[(h_1 \vee \ldots \vee h_k) \mid c \cdot e \cdot b] = \frac{\sum_{j = 1}^k P[e \mid h_j \cdot c \cdot b] \times P[h_j \mid c \cdot b]}{\sum_{i = 1}^z P[e \mid h_i \cdot c \cdot b] \times P[h_i \mid c \cdot b]}.\]
In cases where the values of all the prior probabilities, \(P[h_i \mid c \cdot b]\), are known, or can be closely approximated, this equation suffices to provide values for the argument strengths \(r\) of the posterior probabilities, \(P[(h_1 \vee \ldots \vee h_k) \mid c \cdot e \cdot b] = r\). But when no precise values of the priors are available, a useful estimate of bounds on the posterior probabilities may be derived as follows.
Let \(K\) be (your best estimate of) an upper bound on the ratios of prior probabilities, \(P[h_i \mid c \cdot b] / P[h_j \mid c \cdot b]\), for all \(h_j\) in \(\{h_1, h_2, \ldots, h_k\}\) and all \(h_i\) in \(\{h_{k+1}, h_{k+2}, \ldots, h_z\}\). That is, for whichever \(h_j\) in \(\{h_1, h_2, \ldots, h_k\}\) has the smallest value of \(P[h_j \mid c \cdot b]\), and for whichever \(h_i\) in \(\{h_{k+1}, h_{k+2}, \ldots, h_z\}\) has the largest value of \(P[h_i \mid c \cdot b]\), let \(K\) be a real number that is large enough that \(K \ge P[h_i \mid c \cdot b] / P[h_j \mid c \cdot b]\).
Then, \[\Omega[\neg (h_1 \vee \ldots \vee h_k) \mid c \cdot e \cdot b] \; \; \le \; \; K \times \left[\frac{1}{\frac{\sum_{j = 1}^k P[e \; \mid \; h_j \cdot c \cdot b]}{\sum_{i = 1}^z P[e \; \mid \; h_i \cdot c \cdot b]}} - 1 \right].\]
Thus, a lower bound on the associated posterior probability of \((h_1\vee \ldots \vee h_k)\) is given by the formula \[P[(h_1 \vee \ldots \vee h_k) \mid c \cdot e \cdot b] \; \; \ge \; \;\frac{1}{1 + K \times \left[\frac{1}{\frac{\sum_{j = 1}^k P[e \; \mid \; h_j \cdot c \cdot b]}{\sum_{i = 1}^z P[e \; \mid \; h_i \cdot c \cdot b]}} - 1 \right]}.\]
A few points about this rule are worth noting. First, notice that the term \(\sum_{j = 1}^k P[e \mid h_j \cdot c \cdot b] / \sum_{i = 1}^z P[e \mid h_i \cdot c \cdot b]\) is the ratio of the sum of the first \(k\) likelihoods to the sum of all the likelihoods for hypotheses in \(H\). So, although this rule applies to any collection \(H\) consisting of \(z\) alternative hypotheses, it is most usefully applied when each hypothesis \(h_j\) contained in the disjunction \((h_1 \vee h_2 \vee \ldots \vee h_k)\) has a greater likelihood value, \(P[e \mid h_j \cdot c \cdot b]\), than any of the other hypotheses in \(H\). This is usually the most interesting case in which a lower bound on the posterior probability, \(P[(h_1 \vee \ldots \vee h_k) \mid c \cdot e \cdot b]\), is assessed. For, when these \(k\) likelihoods yield a sum much greater than the sum of the likelihoods for the other hypotheses in \(H\), this ratio term may approach 1, which in turn drives the lower bound on the posterior probability, \(P[(h_1 \vee \ldots \vee h_k) \mid c \cdot e \cdot b]\), close to 1. We will see how this can happen in an example in Section 2.4.
Notice that when all the prior probabilities are equal, the value of \(K\) will be 1. In that case the final formula can be replaced by the equality, \[P[(h_1 \vee \ldots \vee h_k) \mid c \cdot e \cdot b] \; \; = \; \; \frac{\sum_{j = 1}^k P[e \mid h_j \cdot c \cdot b]}{\sum_{i = 1}^z P[e \mid h_i \cdot c \cdot b]}.\]
When each of the prior probabilities for the first \(k\) hypotheses is at least as large as any of the prior probabilities for the remaining \(z-k\) hypotheses, the value of \(K\) may be taken to be less than or equal to 1. In that case, the following version of the final formula holds: \[\begin{align}P[(h_1 \vee \ldots \vee h_k) \mid c \cdot e \cdot b] &\ge \frac{1}{1 + K \times \left[\frac{1}{\frac{\sum_{j = 1}^k P[e \; \mid \; h_j \cdot c \cdot b]}{\sum_{i = 1}^z P[e \; \mid \; h_i \cdot c \cdot b]}} - 1 \right]} \\ &\ge \frac{\sum_{j = 1}^k P[e \mid h_j \cdot c \cdot b]}{\sum_{i = 1}^z P[e \mid h_i \cdot c \cdot b]}.\end{align}\]
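The BE-D lower bound is easy to compute once the likelihoods are in hand. This Python sketch uses invented likelihood values for \(z = 5\) hypotheses, with the first \(k = 2\) carrying most of the likelihood, and an assumed prior-ratio bound \(K\):

```python
# Hypothetical likelihoods P[e | h_i.c.b] for z = 5 alternatives
likes = [0.30, 0.25, 0.02, 0.01, 0.01]
k = 2        # bound the posterior of (h_1 v h_2)
K = 1.5      # assumed upper bound on the relevant prior-probability ratios

# Ratio of the sum of the first k likelihoods to the sum of all of them
ratio = sum(likes[:k]) / sum(likes)

# Rule BE-D lower bound on P[(h_1 v ... v h_k) | c.e.b]
lower_bound = 1 / (1 + K * (1 / ratio - 1))

print(ratio, lower_bound)
```

With \(K = 1\) the bound would simply equal the likelihood ratio term; here \(K > 1\) pulls the bound somewhat lower, as the formula indicates.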
Derivations of the two Bayesian Estimation Rules, Rule BE-D and Rule BE-C (which will be described in the next subsection), are provided in the following appendix: Derivations of the Two Bayesian Estimation Rules, Rule BE-D and Rule BE-C.
A rule similar to BE-D applies to a continuous range of competing hypotheses. For example, the claim that “the chance \(r\) of heads on tosses of this coin lies between .63 and .81” consists of a continuous (disjunctive) interval of competing hypotheses. So, the statement of the following rule closely parallels the statement of Rule BE-D. An example of its application is provided in Section 2.5.
Rule BE-C: Bayesian Estimation for a Continuous Range of Alternative Hypotheses
Let \(H\) be a continuous region of alternative hypotheses \(h_q\), where \(q\) is a real number, and where the conjunction of any two of these hypotheses is logically inconsistent. Let \(c\) be observational or experimental conditions for which \(e\) describes one of the possible outcomes. And suppose \(b\) is a conjunction of relevant auxiliary hypotheses and plausibility considerations. For each point hypothesis \(h_q\) in \(H\), we take \(p[e \mid h_q \cdot c \cdot b]\) to be an appropriate likelihood.
Let \(p[h_q \mid c \cdot b]\) and \(p[h_q \mid c \cdot e \cdot b]\) be probability density functions on \(H\), where these two density functions are related as follows: \[p[h_q \mid c \cdot e \cdot b] \times P[e \mid c \cdot b] \;=\; p[e \mid h_q \cdot c \cdot b] \times p[h_q \mid c \cdot b].\]
We suppose throughout that prior probability density \(p[h_q \mid c\cdot b] > 0\) for all values of \(q\).
The prior probability that the true point hypothesis \(h_r\) lies within measurable region \(R\) is given by
\(P[h_R \mid c \cdot b] \; = \; \int_R p[h_r \mid c \cdot b] \;dr,\;\;\) where \(\; P[h_H \mid c \cdot b] \; = \; \int_H p[h_q \mid c\cdot b] \; dq \: =\: 1\).
The posterior probability that the true point hypothesis \(h_r\) lies within measurable region \(R\) is given by
\(P[h_R \mid c \cdot e \cdot b] \; = \; \int_R p[h_r \mid c \cdot e\cdot b] \; dr, \;\;\) where \(\;P[h_H \mid c \cdot e \cdot b] \; = \;\int_H p[h_q \mid c \cdot e \cdot b] \; dq \: =\: 1\).
Then, the posterior probability satisfies the following equation for each measurable region \(R\): \[\begin{align}P[h_R \mid c \cdot e \cdot b] &= \frac{\int_R p[e \mid h_r \cdot c \cdot b] \times p[h_r \mid c \cdot b] \; \; dr}{\int_H p[e \mid h_q \cdot c \cdot b] \times p[h_q \mid c \cdot b] \; \; dq}.\end{align}\]
In cases where a precise model of the prior probability density, \(p[h_q \mid c \cdot b]\), is available, this equation suffices to provide values for the posterior probabilities, \(P[h_R \mid c \cdot e \cdot b]\). However, when no precise model of the priors is available, bounds on the values of posterior probabilities may be evaluated in the following way.
Let \(K\) be (your best estimate of) an upper bound on the ratios of the probability density values, \(p[h_q \mid c \cdot b] / p[h_r \mid c \cdot b]\), for each \(h_r\) in region \(R\) and \(h_q\) in \((H-R)\). That is, for whichever \(h_r\) in \(R\) has the smallest value of \(p[h_r \mid c \cdot b]\), and for whichever \(h_q\) in \((H-R)\) has the largest value of \(p[h_q \mid c \cdot b]\), let \(K\) be a real number such that \(K \ge p[h_q \mid c \cdot b] / p[h_r \mid c \cdot b]\).
Then, \[\begin{align}\Omega[\neg h_R \mid c \cdot e \cdot b] & \; \le \; K \times \left[\frac{1}{\frac{\int_{R} \; p[e \;\mid\; h_r \cdot c \cdot b] \; \; dr}{\int_{H} \; p[e \;\mid\; h_q \cdot c \cdot b] \; \; dq}} - 1 \right].\end{align}\] Thus, a lower bound on the associated posterior probability of \(h_R\) is given by the formula \[P[h_R \mid c \cdot e \cdot b] \; \; \ge \; \; \frac{1}{1 + K \times \left[\frac{1}{\frac{\int_{R} \; p[e \;\mid\; h_r \cdot c \cdot b] \; \; dr}{\int_{H} \; p[e \;\mid\; h_q \cdot c \cdot b] \; \; dq}} - 1 \right]}.\]
In Bayesian statistics, interval hypotheses of this kind, on which posterior probabilities are assessed, are called credible intervals. The posterior probabilities of such intervals are usually calculated from prior probability distributions governed by explicitly known (or assumed) prior probability density functions. Often the assumed density function is given by \(p[h_q \mid c \cdot b] = 1\) over all \(h_q\) in \(H\), in which case the prior is said to have a flat distribution. When the prior is flat, the value of \(K = 1\), and the precise value of the posterior probability for region (interval) \(R\) is given by the formula, \[P[h_R \mid c \cdot e \cdot b] \; \; = \; \; \frac{\int_R p[e \mid h_r \cdot c \cdot b] \; \; dr}{\int_H p[e \mid h_q \cdot c \cdot b] \; \; dq}.\]
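For the coin example of Section 2.5, the flat-prior case of Rule BE-C can be approximated numerically. The data below (72 heads on 100 tosses) and the chosen interval are our own hypothetical illustration, not figures from the article, and the integrals are done with a crude midpoint rule:

```python
from math import comb

# Hypothetical data: e reports 72 heads on n = 100 tosses of the coin,
# where each point hypothesis h_q says the chance of heads is q.
n, heads = 100, 72

def likelihood(q):
    # Binomial likelihood p[e | h_q.c.b]
    return comb(n, heads) * q**heads * (1 - q)**(n - heads)

def integrate(f, lo, hi, steps=20_000):
    # Simple midpoint-rule numerical integration
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

# Flat prior (K = 1): the posterior probability of the interval
# R = [.67, .77] is the ratio of the integrated likelihoods.
posterior_R = integrate(likelihood, 0.67, 0.77) / integrate(likelihood, 0.0, 1.0)
print(posterior_R)
```

With more evidence (more tosses), the same interval’s posterior probability would sharpen toward 0 or 1, depending on where the true chance lies.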
Rule BE-C is closely related to the Bayesian Principle of Stable Estimation (Edwards, Lindman, Savage, 1963), but somewhat simpler and easier to apply. An example of its application is supplied in Section 2.5.
As already noted, the logical connection between hypotheses and the evidence expressed by the likelihoods often requires the mediation of auxiliary hypotheses. When competing hypotheses \(h_i\) and \(h_j\) draw on distinct, incompatible auxiliary hypotheses, \(a_i\) and \(a_j\), respectively, these auxiliaries cannot be collected into a common background claim \(b\). Rather, they must be evidentially evaluated along with (in conjunction with) the hypotheses that draw on them. In that case Rule RB applies as follows: \[ \frac{P[(h_i \cdot a_i) \mid c \cdot e \cdot b]}{P[(h_j \cdot a_j) \mid c \cdot e \cdot b]} = \frac{P[e \mid (h_i \cdot a_i) \cdot c \cdot b]}{P[e \mid (h_j \cdot a_j) \cdot c \cdot b]} \times \frac{P[(h_i \cdot a_i) \mid c \cdot b]}{P[(h_j \cdot a_j) \mid c \cdot b]}.\]
But when two competing hypotheses draw on the same auxiliaries \(a\), the logic treats them as “given” with regard to the comparative support of those hypotheses. To see how the probabilistic logic endorses this treatment, consider how Rule RB applies to a pair of hypotheses when each is conjoined to the same auxiliary (or conjunction of auxiliaries), \(a\). First notice that Rule RB applies to the comparative support for \((h_i \cdot a)\) versus \((h_j \cdot a)\) as expressed above. (Here we let \(d\) contain background and auxiliaries other than \(a\), so that the previous background claim \(b\) now consists of the conjunction \((a \cdot d)\)): \[ \frac{P[(h_i \cdot a) \mid c \cdot e \cdot d]}{P[(h_j \cdot a) \mid c \cdot e \cdot d]} = \frac{P[e \mid (h_i \cdot a) \cdot c \cdot d]}{P[e \mid (h_j \cdot a) \cdot c \cdot d]} \times \frac{P[(h_i \cdot a) \mid c \cdot d]}{P[(h_j \cdot a) \mid c \cdot d]}.\]
Consider the following probabilistically valid rule — Axiom 5 of the axioms for conditional probabilities:
\[P[(A \cdot B) \mid C] = P[A \mid B \cdot C] \times P[B \mid C].\]Applying this rule to each posterior probability in the previous ratio of posteriors yields
\[\begin{align}\frac{P[(h_i \cdot a) \mid c \cdot e \cdot d]}{P[(h_j \cdot a) \mid c \cdot e \cdot d]} &= \frac{P[h_i \mid a \cdot c \cdot e \cdot d] \times P[a \mid c \cdot e \cdot d]}{P[h_j \mid a \cdot c \cdot e \cdot d] \times P[a \mid c \cdot e \cdot d]} \\ &= \frac{P[h_i \mid c \cdot e \cdot (a \cdot d)]}{P[h_j \mid c \cdot e \cdot (a \cdot d)]}.\end{align}\]Similarly, applying this rule to each prior probability in the previous ratio of priors yields
\[\frac{P[(h_i \cdot a) \mid c \cdot d]}{P[(h_j \cdot a) \mid c \cdot d]} = \frac{P[h_i \mid a \cdot c \cdot d] \times P[a \mid c \cdot d]}{P[h_j \mid a \cdot c \cdot d] \times P[a \mid c \cdot d]} = \frac{P[h_i \mid c \cdot (a \cdot d)]}{P[h_j \mid c \cdot (a \cdot d)]}.\]Now, substituting these equal posterior ratios and equal prior ratios into the previous version of RB for \((h_i \cdot a)\) and \((h_j \cdot a)\) yields
\[ \frac{P[h_i \mid c \cdot e \cdot (a \cdot d)]}{P[h_j \mid c \cdot e \cdot (a \cdot d)]} = \frac{P[e \mid h_i \cdot c \cdot (a \cdot d)]}{P[e \mid h_j \cdot c \cdot (a \cdot d)]} \times \frac{P[h_i \mid c \cdot (a \cdot d)]}{P[h_j \mid c \cdot (a \cdot d)]}.\]Thus, when auxiliaries \(a\) are employed in common by competing hypotheses, they may be swept into a common collection of background claims \(b\) (i.e., becoming \((a \cdot d)\) in this example).
As with any logic, the logic of inductive support only tells us what a given collection of premises implies about various conclusions. It may well happen that auxiliary \(a\) together with the body of evidence \((c \cdot e)\) implies, via likelihood ratios, that hypothesis \(h_j\) is strongly supported over \(h_i\), \[ \frac{P[e \mid h_i \cdot c \cdot (a \cdot d)]}{P[e \mid h_j \cdot c \cdot (a \cdot d)]} \ll 1,\] whereas rival auxiliary \(a_r\) together with the same body of evidence may tell us, via likelihood ratios, that \(h_i\) is strongly supported over \(h_j\), \[ \frac{P[e \mid h_i \cdot c \cdot (a_r \cdot d)]}{P[e \mid h_j \cdot c \cdot (a_r \cdot d)]} \gg 1.\]
This ability to switch between auxiliaries to the benefit of one hypothesis over another seems epistemically dubious. Does the logic permit epistemic agents to simply employ whatever auxiliaries may best help support their own favorite hypotheses?
No, not exactly. As with any logic, only arguments that have true premises warrant their conclusions as true, or, for an inductive logic, as more or less probably true. So, if we can determine which of the alternative auxiliaries, \(a\) or \(a_r\), is true, then, provided the body of evidence \((c \cdot e)\) is also true, the problem would be solved. Our best assessment of which alternative hypothesis, \(h_j\) or \(h_i\), is most probably true should draw on premises (evidence and auxiliaries) that are themselves true. But how are we to determine which auxiliaries are true? By assessing their probable truth based on the body of evidence for and against them.
That is, the auxiliary hypotheses themselves are subject to evidence that may strongly support (the truth of) one of them over its rivals. Furthermore, this evidential support for the auxiliaries can, in turn, impact the support of hypotheses that draw on them. To see how this happens, consider again the two alternative auxiliaries (or alternative conjunctions of auxiliaries) \(a\) and \(a_r\). Suppose that a large body of evidence, \((c^* \cdot e^*)\), bears on \(a\) and its rivals, and that this body of evidence strongly supports \(a\) over each of them. In particular, suppose that according to Rule RB this body of evidence supplies very strong support for \(a\) over rival \(a_r\):
\[ \frac{P[a_r \mid c^* \cdot e^* \cdot d]}{P[a \mid c^* \cdot e^* \cdot d]} = \frac{P[e^* \mid a_r \cdot c^* \cdot d]}{P[e^* \mid a \cdot c^* \cdot d]} \times \frac{P[a_r \mid c^* \cdot d]}{P[a \mid c^* \cdot d]} = \epsilon,\]for some extremely small value of \(\epsilon\).
So, according to this body of evidence, \(a\) is much more likely to be true than \(a_r\). Intuitively, this provides good epistemic reason to employ \(a\) rather than \(a_r\) as premises in the evaluation of hypotheses \(h_j\) versus \(h_i\). When the evidence strongly supports one auxiliary hypothesis over an alternative, it makes good epistemic sense to draw on the most strongly supported auxiliary. Indeed, the Bayesian logic can be shown to reinforce this intuition in a sensible way. The following appendix works through the technical details of a theorem that establishes this claim.
An Epistemic Advantage of Drawing on Well-Supported Auxiliary Hypotheses
Bayesian inductive logic captures the structure of evidential support for all sorts of scientific hypotheses, ranging from simple diagnostic claims (e.g., “the patient is infected by the SARS-CoV-2 virus”) to complex scientific theories about the fundamental nature of the world, such as quantum theories and the theory of relativity. As we’ve seen, the logic is essentially comparative. The evaluation of a hypothesis depends on how strongly evidence supports it over rival hypotheses. In this section we consider several applications of this logic to the evidential evaluation of scientific hypotheses and theories.
We have seen that comparisons among the posterior probabilities of hypotheses depend on just two kinds of factors: (1) the likelihoods of evidential outcomes \(e\) according to each hypothesis \(h_k\), when conjoined with auxiliaries \(b\) and evidential initial conditions \(c\), \(P[e \mid h_k \cdot c \cdot b]\); and (2) the prior probability of each hypothesis, \(P[h_k \mid c \cdot b]\). The likelihoods capture what a hypothesis says about how evidential aspects of the world should turn out (if the hypothesis is true). The prior probabilities represent assessments of how plausible a hypothesis is taken to be on grounds not captured by the evidential likelihoods.
Plausibility assessments of hypotheses and theories always play an important, legitimate role in the sciences. Plausibility assessments are often backed by extensive arguments that may draw on forceful conceptual considerations together with broadly empirical claims not captured by the evidential likelihoods. Scientists often bring plausibility arguments to bear in assessing competing views. Although such arguments are usually far from decisive, they may bring the scientific community into widely shared agreement with regard to the implausibility of some logically possible alternatives. This seems to be the primary epistemic role of thought experiments. Consider, for example, the kinds of plausibility arguments that have been brought to bear on the various interpretations of quantum theory (e.g., those related to the measurement problem). These arguments go to the heart of conceptual issues that were central to the original development of the theory. Many of these issues were first raised by those scientists who made the greatest contributions to the development of quantum theory, in their attempts to get a conceptual hold on the theory and its implications.
Furthermore, given any body of evidence, it is easy enough to cook up logically possible alternative hypotheses that completely account for the evidence. These cooked up, ad hoc hypotheses may be constructed so as to logically entail all the known evidence, providing likelihood values equal to 1 for the totality of the available evidence. Although most of these cooked up hypotheses will be laughably implausible, and no scientist would give them a moment’s notice, the evidential likelihoods are unable to rule them out. Only plausibility considerations, represented via prior probabilities, provide a place for the inductive logic to bring such implausibility considerations to bear.
Among those hypotheses that are not laughably implausible, the contributions of prior plausibility assessments may be substantially “washed out” as a sufficiently strong body of evidence becomes available. Thus, provided the prior probability of a true hypothesis isn’t assessed to be too close to zero, the influence of the values of the prior probabilities will very probably fade away as evidence accumulates. Various Bayesian convergence results establish reasonable conditions for this to occur. So, it turns out that prior plausibility assessments play their most important role when the distinguishing evidence represented by the likelihoods remains weak. Some of the following examples illustrate this idea.
Newtonian Gravitation Theory (NGT) accounts for the “falling together” of massive bodies in terms of an attractive force between them, the force of gravity produced by those massive bodies. According to the General Theory of Relativity (GTR) there is no gravitational force between bodies as such. Rather, in the vicinity of massive bodies space-time is curved. That curvature in space-time causes the distance between massive objects to decrease as they follow these curved paths through space-time. One result of this difference between GTR and NGT is that they entail different paths for beams of light that pass near the surface of the Sun on their way to Earth.
GTR entails that the light of distant stars that passes very close to the surface of the Sun is deflected from a straight-line path. This deflection will make the star, as viewed from Earth, appear to be in a slightly different location than usual with respect to background stars whose light does not pass so close to the Sun’s surface. According to GTR, the predicted angle of deflection for a beam passing near the Sun’s surface is 1.75 arcsec (where 1 arcsec is an angle of 1/3600 of a degree).
If light has gravitational mass, then Newtonian Gravitation Theory also entails that the path of a light beam near the Sun’s surface will be deflected. But the predicted gravitational deflection is only .875 arcsec, half as much as predicted by General Relativity. On the other hand, if light has no gravitational mass, NGT entails that it will not be deflected at all by gravity near the Sun’s surface.
Einstein realized these differences in the paths of light predicted by GTR vs. NGT. His publication of GTR in 1915 predicted this kind of empirical distinction between GTR and NGT. In order to test this prediction, Arthur Eddington and Andrew Crommelin led two separate expeditions to observe the positions of stars near the edge of the Sun during a solar eclipse in 1919. Their measurements involved taking photographs of stars that appear near the Sun’s surface during the eclipse, and then measuring their apparent positions in those photographs as compared to other stars that appear further away from the Sun’s surface. The relative positions of those same stars were also photographed and measured in the night sky at another time of year, when the paths of their light were not influenced by travel near the surface of the Sun.
The hypotheses being tested by the evidence in this case are not themselves statistical in nature. However, the evidential likelihoods turn out to be probabilistic due to statistical error characteristics of the measuring devices.
The Eddington group measured a deflection of 1.61 arcsec, with an error of plus or minus .31 arcsec. The Crommelin group measured a deflection of 1.98 arcsec, with an error of plus or minus .12 arcsec. These error terms are due to inaccuracies in the measuring devices, such as irregularities in the photographic emulsions, and differences in the cameras and telescopes during the eclipse measurements as compared to the non-eclipse reference measurements of star positions at other times (e.g. due to temperature and configuration changes).
Let’s employ the following abbreviations: let \(h_G\) say that GTR is true (predicted deflection 1.75 arcsec); let \(h_N\) say that NGT is true and light has gravitational mass (predicted deflection .875 arcsec); let \(h_{N_0}\) say that NGT is true and light has no gravitational mass (predicted deflection 0 arcsec); let \(c_1\) and \(e_1\) describe the conditions and outcome of the Eddington measurement, and \(c_2\) and \(e_2\) the conditions and outcome of the Crommelin measurement; and let \(b\) contain the relevant background and auxiliary information.
In cases like this, the statistical error in the measurement outcome is taken to be normally distributed around the true value of the light deflection, expressed by the hypothesis. That is, the likelihood of the evidential outcome \(e\) for a hypothesis \(h_j\), given \(c \cdot b\), is calculated in terms of how far away, in terms of standard deviations for a normal distribution, the measured outcome lies from the value predicted by that hypothesis.
A well-known spreadsheet program can be used to calculate these values. It uses the following syntax to calculate the probability value due to a normal distribution for the region under the normal curve extending from the left of the curve up to point \(x\), given the mean of the normal distribution and its standard deviation, standard_dev:
\[\text{NORM.DIST}(x, mean, standard\_dev, \textit{TRUE})\] where the term \(\textit{TRUE}\) tells the function to calculate the cumulative distribution up to \(x\), instead of only calculating the value of the density function at \(x\). Using this spreadsheet program, the probability of getting a measured outcome value between \(m-v\) and \(m+v\) is calculated via the following formula: \[\begin{align}&\text{NORM.DIST}(m+v, mean, standard\_dev, \textit{TRUE}) \\&\quad - \text{NORM.DIST}(m-v, mean, standard\_dev, \textit{TRUE}).\end{align}\] For the experiment conducted by the Eddington group, the evidence consists of a measured deflection value of 1.61, accurate to no more than two decimal places. Thus, the measurement result lies in the interval between \((1.61-.005)\) and \((1.61+.005)\). This is the evidential outcome \(e_1\). Thus, the relevant evidential likelihoods may be calculated as follows:
\[\begin{align}&P[e_1 \mid h_G \cdot c_1 \cdot b]\ = \\&\qquad \text{NORM.DIST}(1.61 + 0.005, 1.75, .31, \textit{TRUE}) \\&\qquad\quad - \text{NORM.DIST}(1.61 - 0.005, 1.75, .31, \textit{TRUE}) \\&~=\ 1.16 \times 10^{-2}\end{align}\] \[\begin{align}&P[e_1 \mid h_N \cdot c_1 \cdot b] = \\&\qquad \text{NORM.DIST}(1.61 + 0.005, .875, .31, \textit{TRUE}) \\&\qquad\quad - \text{NORM.DIST}(1.61 - 0.005, .875, .31, \textit{TRUE}) \\&= 7.74 \times 10^{-4}\end{align}\]\[\begin{align}&P[e_1 \mid h_{N_0} \cdot c_1 \cdot b] = \\&\qquad \text{NORM.DIST}(1.61 + 0.005, 0, .31, \textit{TRUE}) \\&\qquad\quad - \text{NORM.DIST}(1.61 - 0.005, 0, .31, \textit{TRUE}) \\&= 1.79 \times 10^{-8}.\end{align}\]The likelihoods for the evidence from the Crommelin group, \((c_2\cdot e_2)\), may be calculated in a similar way.
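These spreadsheet calculations are easy to reproduce in code. The following is a minimal sketch using only the Python standard library (the function and variable names are our own; the half-width of .005 reflects the two-decimal-place accuracy of the reported measurement):

```python
from math import erf, sqrt

def norm_cdf(x, mean, sd):
    # cumulative normal distribution: the analogue of NORM.DIST(x, mean, sd, TRUE)
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def likelihood(measured, predicted, sd, half_width=0.005):
    # probability that the measured deflection falls in measured ± half_width,
    # given the deflection value predicted by the hypothesis
    return (norm_cdf(measured + half_width, predicted, sd)
            - norm_cdf(measured - half_width, predicted, sd))

# Eddington group: measured 1.61 arcsec, standard deviation .31
print(likelihood(1.61, 1.75, 0.31))    # h_G:   ≈ 1.16e-2
print(likelihood(1.61, 0.875, 0.31))   # h_N:   ≈ 7.74e-4
print(likelihood(1.61, 0.0, 0.31))     # h_N0:  ≈ 1.79e-8
```

The same function with standard deviation .12 reproduces the Crommelin-group likelihoods in the table below.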
The following table provides the likelihoods due to each hypothesis for each experiment. And it provides the resulting values for the corresponding likelihood ratios.
| \(e_k\) | \(e_1\) | \(e_2\) |
|---|---|---|
| \(P[e_k \mid h_G \cdot c_k \cdot b]\) | \(1.16 \times 10^{-2}\) | \(5.30\times 10^{-3}\) |
| \(P[e_k \mid h_N \cdot c_k \cdot b]\) | \(7.74 \times 10^{-4}\) | \(1.29 \times 10^{-20}\) |
| \(P[e_k \mid h_{N_0} \cdot c_k \cdot b]\) | \(1.79 \times 10^{-8}\) | \(2.53 \times 10^{-61}\) |
| \[\frac{P[e_k \mid h_N \cdot c_k \cdot b]}{P[e_k \mid h_G \cdot c_k \cdot b]}\] | \[6.67 \times 10^{-2}\] | \[2.43 \times 10^{-18}\] |
| \[\frac{P[e_k \mid h_{N_0} \cdot c_k \cdot b]}{P[e_k \mid h_G \cdot c_k \cdot b]}\] | \[1.54 \times 10^{-6}\] | \[4.77 \times 10^{-59}\] |
| \[\frac{P[e_k \mid h_G \cdot c_k \cdot b]}{P[e_k \mid h_N \cdot c_k \cdot b]}\] | \[1.50 \times 10^{1}\] | \[4.11 \times 10^{17}\] |
| \[\frac{P[e_k \mid h_G \cdot c_k \cdot b]}{P[e_k \mid h_{N_0} \cdot c_k \cdot b]}\] | \[6.48 \times 10^{5}\] | \[2.09 \times 10^{58}\] |
Table: Likelihoods and Likelihood Ratios
Clearly, \((c_1 \cdot e_1)\) provides overwhelming evidence against \(h_{N_0}\) as compared to \(h_G\), and strong evidence against \(h_N\) as compared to \(h_G\). And, \((c_2 \cdot e_2)\) also provides overwhelming evidence against both \(h_{N_0}\) and \(h_N\) as compared to \(h_G\).
As an illustration of how evidential support works in a medical setting, let’s consider the kind of evidence supplied by over-the-counter COVID-19 self-tests. Let \(h\) be the hypothesis that the subject of the test has COVID-19 on the day of testing; the alternative hypothesis, \(\neg h\), says that the subject does not have COVID-19 on the day of testing. Background/auxiliary conditions \(b\) state the sensitivity of the test (chance of a positive test result when disease is present) and the specificity of the test (chance of a negative test result when disease is not present). Most home-tests report sensitivity and specificity for test subjects who are already symptomatic — i.e. who already show any of the following symptoms: fever, fatigue, chills, myalgia (i.e. muscle pain), congestion, cough, loss of smell, shortness of breath, sore throat, nausea, diarrhea. In addition, a home-test is “administered appropriately” when the nasal swab is used as the test instructions specify, and the result is deposited on the supplied test strip as per instructions. For our purposes, all of this information is included in the background/auxiliary information, \(b\).
Consider a home-test with the following characteristics for symptomatic subjects: sensitivity = .94, specificity = .98. The sensitivity is the true positive rate (the chance of a positive test result when disease is present); so the false negative rate (the chance of a negative test result when disease is present) for this test is .06 = (1 - sensitivity). The specificity is the true negative rate (the chance of a negative test result when disease is not present); so the false positive rate (the chance of a positive test result when disease is not present) for this test is .02 = (1 - specificity).
Now, let’s suppose that an individual subject is tested. Condition \(c\) says that this subject is symptomatic and that the test is administered to the subject in the appropriate way (as specified in the instructions for the test). Let \(e\) say that the test result is positive (i.e. the test shows that a significant amount of the target antigen of the SARS-CoV-2 virus is detected); and let \(\neg e\) say that the test result is negative (i.e. the test shows that no significant amount of the target antigen of the SARS-CoV-2 virus is detected). What the test subject wants to know is the value of the posterior probabilities, \(P[h \mid c \cdot e \cdot b]\) and \(P[h \mid c \cdot \neg e \cdot b]\), that the subject has COVID-19, given the evidence of the positive result, \((c \cdot e)\), or the negative test result, \((c \cdot \neg e)\), taken together with the error rates of these tests as described in \(b\).
The values of these posterior probabilities depend on the following likelihoods, which come from applying the sensitivity and specificity statistics for the test to this individual test subject:
\[P[e \mid h \cdot c \cdot b] = .94, \text{ due to the }\textit{sensitivity},\] \[P[\neg e \mid \neg h \cdot c \cdot b] = .98, \text{ due to the }\textit{specificity}.\]As a result, we also have the following values:
\[P[\neg e \mid h \cdot c \cdot b] = .06, \text{ for the }\textit{false negative rate},\]\[P[e \mid \neg h \cdot c \cdot b] = .02, \text{ for the }\textit{false positive rate}.\]This provides the following likelihood ratios against disease (against \(h\)) for this test subject when the test result is positive, or negative, respectively: \[\frac{P[e \mid \neg h\cdot c\cdot b]}{P[e \mid h \cdot c\cdot b]} = .02/.94 = .0213\] \[\frac{P[\neg e \mid \neg h\cdot c\cdot b]}{P[\neg e \mid h\cdot c\cdot b]} = .98/.06 = 16.33.\]
The value of the posterior probability that the subject has COVID-19, given the evidence, depends on how plausible it is that the patient has COVID-19 on the day of the test prior to taking the test results into account, \(P[h \mid c \cdot b]\). In the context of medical diagnosis, this prior probability is usually assessed on the basis of the base rate for the disease in the patient’s risk group. Such information may be stated within the background information \(b\). Rule OB shows how to calculate the posterior probabilities from these values.
\[\begin{align}&\Omega[\neg h \mid c \cdot e \cdot b \cdot (h \vee \neg h)] = \frac{P[\neg h \mid c \cdot e \cdot b]}{P[h \mid c \cdot e \cdot b]} \\&\qquad = \frac{P[e \mid \neg h \cdot c \cdot b]}{P[e \mid h \cdot c \cdot b]} \times \frac{P[\neg h \mid c \cdot b]}{P[h \mid c \cdot b]}.\end{align}\]\[\begin{align}P[h \mid c \cdot e \cdot b] &= P[h \mid c \cdot e \cdot b \cdot (h \vee \neg h)] \\ &= \frac{1}{1 + \Omega[\neg h \mid c \cdot e \cdot b \cdot (h \vee \neg h)]}.\end{align}\]And similarly for \(P[h \mid c \cdot \neg e \cdot b]\).
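This odds calculation can be sketched in a few lines of code. The following stdlib-only illustration (the function name and boolean flag are our own conventions) converts a prior probability, sensitivity, and specificity into the posterior probability of disease:

```python
def posterior(prior, sensitivity, specificity, positive_result):
    # Rule OB: odds against h = likelihood ratio against h × prior odds against h
    if positive_result:
        lr_against = (1.0 - specificity) / sensitivity   # P[e|¬h·c·b] / P[e|h·c·b]
    else:
        lr_against = specificity / (1.0 - sensitivity)   # P[¬e|¬h·c·b] / P[¬e|h·c·b]
    odds_against = lr_against * (1.0 - prior) / prior
    # convert odds against h back into the probability of h
    return 1.0 / (1.0 + odds_against)

# Test Brand 1 (sensitivity .94, specificity .98) with prior P[h|c·b] = .01
print(round(posterior(0.01, 0.94, 0.98, True), 3))   # 0.322, positive result
print(round(posterior(0.01, 0.94, 0.98, False), 3))  # 0.001, negative result
```

These values match the first row of the table below; varying the `prior` argument reproduces the remaining rows.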
The table below shows how these posterior probabilities depend on the values of prior probabilities. The columns under “Test Brand 1” show the posterior probabilities for the test described above, the test that has sensitivity = .94 and specificity = .98. The columns under “Test Brand 2” show the posterior probabilities for a different, lower sensitivity test, a test that has sensitivity = .84 and specificity = .98.
| | Test Brand 1: Sensitivity = .94, Specificity = .98 | | Test Brand 2: Sensitivity = .84, Specificity = .98 | |
|---|---|---|---|---|
| \(P[h \mid c \cdot b]\) | \(P[h \mid c \cdot e \cdot b]\) | \(P[h \mid c \cdot \neg e \cdot b]\) | \(P[h \mid c \cdot e \cdot b]\) | \(P[h \mid c \cdot \neg e \cdot b]\) |
| .01 | .322 | .001 | .298 | .002 |
| .02 | .490 | .001 | .462 | .003 |
| .03 | .592 | .002 | .565 | .005 |
| .04 | .662 | .003 | .636 | .007 |
| .05 | .712 | .003 | .689 | .009 |
| .06 | .750 | .004 | .728 | .010 |
| .07 | .780 | .005 | .760 | .012 |
| .08 | .803 | .005 | .785 | .014 |
| .09 | .823 | .006 | .806 | .016 |
| .10 | .839 | .007 | .824 | .018 |
| .20 | .922 | .015 | .913 | .039 |
| .30 | .953 | .026 | .947 | .065 |
| .40 | .969 | .039 | .966 | .098 |
| .50 | .979 | .058 | .977 | .140 |
| .60 | .986 | .084 | .984 | .197 |
| .70 | .991 | .125 | .990 | .276 |
| .80 | .995 | .197 | .994 | .395 |
| .90 | .998 | .355 | .997 | .595 |
Table: Posterior Probabilities for COVID-19 Home Test Results
\(h\) = disease present; \(e\) = test result positive
When the precise values of the prior probabilities are unknown, but a reasonable range can be estimated, a resulting range of posterior probabilities may be calculated. Suppose we can be confident that the base-rate for COVID-19 among symptomatic members of the relevant population for the test subject is between .05 and .09. Then, when the subject is tested with Test Brand 1, the posterior probability that the subject has COVID-19, given a positive result, is, according to the table, \(.712 \le P[h \mid c\cdot e \cdot b] \le .823\). And the posterior probability that the subject has COVID-19, given a negative result, is \(.003 \le P[h \mid c \cdot \neg e \cdot b] \le .006\).
In many contexts the values of likelihoods may be vague or imprecise. Nevertheless, the evidence may still be capable of strongly supporting one hypothesis over another in a reasonably objective way. Here is an example.
Consider the following simple version of the continental drift hypothesis, \(h_2\): The land masses of Africa and South America were once joined, then split apart and have drifted to their current positions on Earth over the eons. Let’s compare this hypothesis to the older contractionist theory, \(h_1\): The continents have fixed positions on Earth, which they acquired when the Earth first formed, cooled, and contracted into its present configuration.
The evidence available for the drift hypothesis over the contractionist hypothesis during the first half of the 20th century included the following observations: (1) Upon careful examination, the east coast of South America fits the shape of the west coast of Africa extremely well. (2) When the coasts of South America and Africa are aligned as closely as possible, and the geology of the two continents is carefully examined, a number of geologic features align across the two continents (e.g. the Ghana mountain ranges align with mountain ranges in Brazil; the rock strata of the Karroo system of South Africa matches precisely with the Santa Catarina system in Brazil; etc.). (3) When the fossil record on both continents is carefully examined, a number of fossils of identical species have been discovered to have lived at the same time on both continents (e.g. Mesosaurus (land reptile, 286-258 million yrs. ago), Cynognathus (fresh water reptile, 250-240 million yrs. ago), Glossopteris (tree-sized fern, 299 million yrs. ago)); and none of these species could have crossed the Atlantic Ocean under their own power.
Let \(c\) represent the conjunction of all the specific methods used to collect the above evidence, and let \(e\) represent a detailed description of the precise results of all these investigations. (Here \(b\) expresses relevant scientific background knowledge, including the relevant knowledge of geology and evolutionary biology.) Consider the evidential likelihoods, \(P[e \mid h_1 \cdot c \cdot b]\) and \(P[e \mid h_2 \cdot c \cdot b]\). Although experts may be unable to specify anything like precise numerical values for these likelihoods, experts may readily agree that each of the above cited evidential observations is much more likely on the drift hypothesis than on the contraction hypothesis, and that they jointly constitute extremely strong evidence in favor of drift over contraction. On a Bayesian analysis this is due to the fact that, although these likelihoods do not have precise values, it is obvious to experts that the ratio of the likelihoods is pretty extreme, strongly favoring drift over contraction. That is,
\(P[e \mid h_2 \cdot c \cdot b] / P[e \mid h_1 \cdot c \cdot b]\) is very large, and its inverse, \(P[e \mid h_1 \cdot c \cdot b] / P[e \mid h_2 \cdot c \cdot b]\), is very nearly zero.
Thus, according to the Ratio Form of Bayes’ Theorem,
the ratio of posterior probabilities, \[P[h_1 \mid c \cdot e \cdot b] / P[h_2 \mid c \cdot e \cdot b],\] should be very close to 0, strongly supporting \(h_2\) over \(h_1\), unless the drift hypothesis is taken to be extremely implausible as compared to contraction on other grounds — i.e. unless \(P[h_1 \mid c \cdot b] / P[h_2 \mid c \cdot b]\) is extremely large due to other information (which may be listed within \(b\)).
Historically, the evidence described above was well-known during the first half of the 20th century. Nevertheless, most geologists largely dismissed the drift hypothesis until the 1960s. Apparently the strength of this evidence did not suffice to overcome non-evidential (though broadly empirical) considerations that made the drift hypothesis seem much less plausible than the traditional contractionist view. The chief difficulty was the apparent absence of a plausible mechanism for moving continents across the ocean floor. This difficulty was overcome when a plausible enough convection mechanism was articulated, and evidence favoring it was acquired.
We now turn to an example application of Rule BE-D.
Let ‘B’ represent the collection of all households in the United States during July, 2020. Let ‘A’ represent those households among them in which one or more dogs reside. What proportion of the \(B\)s are \(A\)s? Symbolically, for real number \(r\) between 0 and 1, let \(F(A,B)= r\) say that the frequency (i.e. proportion) of \(A\)s among \(B\)s is \(r\). So, we want to know, for what value of \(r\) does \(F(A,B)= r\) hold. Given that the number of households in the United States during July of 2020 was a little under \(z\) = 129 million (stated within the background and auxiliaries, \(b\)), there are in principle that many alternative hypotheses: \(F(A,B)=k/z\) for each integer \(k\) between 0 and 129 million.
Suppose a sample \(S\) consisting of \(n = 400\) of these households is randomly drawn from \(B\) (households present in the United States during July of 2020) with respect to whether or not they are \(A\) (households with dogs). This is the experimental condition, \(c\). And suppose that within sample \(S\), \(m = 248\) households report being in \(A\) (having one or more dogs in residence). So, \(F(A,S)= m/n = 248/400=.62\). This is the evidence \(e\).
The posterior probability of any specific hypothesis, \(P[F(A,B)=k/z \mid c \cdot F(A,S)=248/400 \cdot b]\), will be extremely small, even for \(F(A,B)=248/400=.62\). And in any case, we shouldn’t expect the value of \(F(A,B)\) to be exactly the value of \(F(A,S)\). Rather, what we may reasonably hope to determine is that some interval of values below and above the sample value .62 has a fairly high probability: e.g. \[P[.57 \le F(A,B) \le .67 \mid c \cdot F(A,S)=248/400 \cdot b] \ge .95.\] We will see how to determine such posterior probabilities via Rule BE-D.
Before proceeding, let’s settle on a few convenient notational conventions. To facilitate the statement of rule BE-D we pulled a particular list of hypotheses to the front of the queue, and listed them as \(h_1\) through \(h_k\). In the present example we diverge from this way of labeling hypotheses. Instead, we employ a notation that is more natural for the present example. We let each hypothesis in the set of alternatives \(H\) take the form \(F(A,B)=r_k\), where \(k\) now ranges from 0 through \(z\), and where we now define each \(r_k\) to abbreviate proportion \(k/z\) of the population \(B\). Furthermore, the main disjunction of hypotheses of interest now consists of those frequencies within some interval \([v,u]\) centered around the sample frequency \(F(A,S)=m/n\). Thus, the expression \(v \le F(A,B) \le u\) (for some specific values of \(v\) and \(u\)) represents the disjunction of hypotheses, \((F(A,B)=v \;\vee \ldots \vee\; F(A,B)=m/n \;\vee \ldots \vee\; F(A,B)=u)\), whose posterior probability we want to evaluate.
When a hypothesis states that the proportion of \(A\)s among \(B\)s is \(r_k\), the associated likelihood of drawing a sample proportion \(F(A,S)=m/n\) is given by the binomial distribution formula:
\[\begin{align}&P[F(A,S)=m/n \mid c \cdot F(A,B)=r_k \cdot b] \\ &\qquad = \frac{n!}{m!(n-m)!}\; r_k^m\; (1-r_k)^{n-m}.\end{align}\]Now, we apply the Bayesian Estimation rule BE-D as follows:
\[\begin{align}&P[v \le F(A,B) \le u \mid c \cdot F(A,S)=m/n \cdot b] \\&\qquad \ge \frac{1}{1 + K \times \left[\frac{1}{\frac{\sum_{j = v\cdot z}^{u\cdot z} P[e \; \mid \; h_j \cdot c \cdot b]}{\sum_{i = 1}^z P[e \; \mid \; h_i \cdot c \cdot b]}} - 1 \right]},\end{align}\]where the ratio of sums in the denominator is given by the formula, \[\frac{\sum_{j = v\cdot z}^{u\cdot z} P[e \mid h_j \cdot c \cdot b]}{\sum_{i = 1}^z P[e \mid h_i \cdot c \cdot b]} \; = \;\frac{\sum_{j = v\cdot z}^{u\cdot z}\; r_j^m\; (1-r_j)^{n-m}}{\sum_{i = 1}^z\; r_i^m\; (1-r_i)^{n-m}},\] where \((v\cdot z)\) and \((u\cdot z)\) are the appropriate integers for the endpoints of the interval \([v, u]\) (i.e. \((v\cdot z) /z = v\) and \((u\cdot z)/z = u\)).
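The individual binomial likelihoods appearing in these sums are easy to compute directly. A stdlib-only sketch for the sample described above (the function name is our own):

```python
from math import comb

def binomial_likelihood(m, n, r):
    # P[F(A,S)=m/n | c · F(A,B)=r · b] = n!/(m!(n-m)!) · r^m (1-r)^(n-m)
    return comb(n, m) * r**m * (1.0 - r)**(n - m)

# likelihood of finding 248 dog-owning households in a random sample of 400
print(binomial_likelihood(248, 400, 0.62))   # ≈ 0.041, even at the best-fitting r
print(binomial_likelihood(248, 400, 0.50))   # ≈ 3.6e-7, far smaller at r = .50
```

Notice that even the best-fitting specific hypothesis, \(r = .62\), receives only a small likelihood, which is why the rule sums likelihoods over an interval of hypotheses rather than relying on any single one.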
These large sums of binomial factors are difficult to calculate directly. Fortunately, they are closely approximated by a more easily calculable formula, that for the normalized Beta distribution. That is,
\[\begin{align}\frac{\sum_{j = v\cdot z}^{u\cdot z}\; r_j^m\; (1-r_j)^{n-m}}{\sum_{i = 1}^z\; r_i^m\; (1-r_i)^{n-m}} \; &\approxeq \; Beta[v,u \;:\; m+1,\; (n-m)+1] \\&=\; \frac{\int_{v}^u r^{m} (1-r)^{n-m} \; dr}{\int_{0}^1 s^m (1-s)^{n-m} \; ds}.\end{align}\]The values of this normalized Beta-distribution function may easily be computed using well-known mathematics and spreadsheet programs. For example, the version of this function supplied by one such spreadsheet program takes the form BETA.DIST(\(x\), \(\alpha\), \(\beta\), TRUE). It computes the value of the normalized beta distribution from 0 up to \(x\), where for our purposes \(\alpha = m+1\), \(\beta = (n-m)+1\). The input value TRUE tells the program to calculate the integral from 0 to \(x\) (whereas FALSE would tell the program to calculate the value of the density function at point \(x\)). Using this spreadsheet version of the function, we calculate the value of the normalized Beta-distribution between \(v\) and \(u\) by inputting the following formula:
\[\begin{align}\tag{$BD$} &\text{BETA.DIST}[u,\; m+1,\; (n-m)+1,\; \textit{TRUE}] \\&\quad - \text{BETA.DIST}[v,\; m+1,\; (n-m)+1,\; \textit{TRUE}].\end{align}\]For simplicity, we refer to the above formula as \(BD(u,v,m,n)\). So, to have the spreadsheet program compute a lower bound on the value of \(P[v\le F(A,B)\le u \mid c \cdot F(A,S)=m/n \cdot b]\) for specific values of \(m\), \(n\), \(v\), and \(u\), we need only input this formula with those values, together with a value for the upper bound \(K\) on ratios of prior probabilities:
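For readers without a spreadsheet at hand, the normalized Beta distribution can also be approximated by direct numerical integration. The following is a rough sketch (the grid size is an arbitrary choice, and the computation is done in log space so that large \(n\) does not underflow):

```python
from math import log, exp

def bd(u, v, m, n, steps=200_000):
    # BD(u, v, m, n): normalized Beta-distribution mass on [v, u], i.e.
    # ∫_v^u r^m (1-r)^(n-m) dr / ∫_0^1 r^m (1-r)^(n-m) dr,
    # approximated by a midpoint sum over a uniform grid
    dr = 1.0 / steps
    rs = [(i + 0.5) * dr for i in range(steps)]
    logw = [m * log(r) + (n - m) * log(1.0 - r) for r in rs]
    mx = max(logw)                        # rescale before exponentiating
    ws = [exp(lw - mx) for lw in logw]
    inside = sum(w for r, w in zip(rs, ws) if v <= r <= u)
    return inside / sum(ws)

def lower_bound(u, v, m, n, K):
    # lower bound on the posterior P[v ≤ F(A,B) ≤ u | c · F(A,S)=m/n · b]
    return 1.0 / (1.0 + K * (1.0 / bd(u, v, m, n) - 1.0))

# first entry of the table below: n = 400, m = 248, q = .05, K = 1
print(lower_bound(0.67, 0.57, 248, 400, K=1))   # ≈ .9614
```

Varying `K` and the interval endpoints reproduces the other entries of the table below.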
\[\frac{1}{1 + K\times\left(\frac{1}{BD(u,v,m,n)} - 1\right)}\]In many real cases it will be at least as initially plausible that the true frequency value lies within the region of interest between \(v\) and \(u\) as that it lies outside that region. In such cases the value of \(K\) must be less than or equal to 1. However, even when the upper bound \(K\) on the ratio of these priors is quite large, any moderately large sample size \(n\) will drive the posterior probability \(P[v \le F(A,B) \le u \mid c \cdot F(A,S)=m/n \cdot b]\) close to 1, for fairly narrow bounds \(v\) and \(u\). The following table, calculated via the Beta-distribution, illustrates this for both
\[P[F(A,B)=.62\pm .05\mid c \cdot F(A,S)=m/n=.62 \cdot b]\]and
\[P[F(A,B)=.62\pm .025\mid c \cdot F(A,S)=m/n=.62 \cdot b]\]over a range of different samples sizes \(n\), and over a wide rangeof values of \(K\).
| Size of sample \(S\) from \(B\) \(= n\), Number of \(A\)s in sample \(S\) \(= m\): \(m/n = .62\) throughout table | Where \(\frac{P[F(A,B)=s \mid c \cdot b]}{P[F(A,B)=r\mid c \cdot b]} \: \le \: K\) for all \(r\), \(s\) such that \(.62-q \le r \le .62+q\) and either \(s \lt .62-q\) or \(s \gt .62+q\), \(P[F(A,B)=.62\pm q\mid c \cdot F(A,S)=m/n \cdot b] \;\; \ge\) | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Prior Ratio K \(\downarrow\) | n \(\rightarrow\) (m) \(\rightarrow\) | 400 (248) | 800 (496) | 1600 (992) | 3200 (1984) | 6400 (3968) | 12800 (7936) | |
| 1 | q = .05 \(\rightarrow\) q = .025 \(\rightarrow\) | 0.9614 0.6982 | 0.9965 0.8554 | 1.0000 0.9608 | 1.0000 0.9964 | 1.0000 1.0000 | 1.0000 1.0000 | |
| 2 | q = .05 \(\rightarrow\) q = .025 \(\rightarrow\) | 0.9256 0.5364 | 0.9930 0.7474 | 0.9999 0.9246 | 1.0000 0.9929 | 1.0000 0.9999 | 1.0000 1.0000 | |
| 5 | q = .05 \(\rightarrow\) q = .025 \(\rightarrow\) | 0.8327 0.3163 | 0.9827 0.5420 | 0.9998 0.8306 | 1.0000 0.9825 | 1.0000 0.9998 | 1.0000 1.0000 | |
| 10 | q = .05 \(\rightarrow\) q =.025 \(\rightarrow\) | 0.7133 0.1879 | 0.9661 0.3717 | 0.9996 0.7103 | 1.0000 0.9656 | 1.0000 0.9996 | 1.0000 1.0000 | |
| 100 | q = .05 \(\rightarrow\) q = .025 \(\rightarrow\) | 0.1992 0.0226 | 0.7402 0.0559 | 0.9963 0.1969 | 1.0000 0.7371 | 1.0000 0.9962 | 1.0000 1.0000 | |
| 1,000 | q = .05 \(\rightarrow\) q = .025 \(\rightarrow\) | 0.0243 0.0023 | 0.2217 0.0059 | 0.9639 0.0239 | 1.0000 0.2190 | 1.0000 0.9637 | 1.0000 1.0000 | |
| 10,000 | q = .05 \(\rightarrow\) q = .025 \(\rightarrow\) | 0.0025 0.0002 | 0.0277 0.0006 | 0.7277 0.0024 | 0.9999 0.0273 | 1.0000 0.7261 | 1.0000 0.9999 | |
| 100,000 | q = .05 \(\rightarrow\) q = .025 \(\rightarrow\) | 0.0002 0.0000 | 0.0028 0.0001 | 0.2109 0.0002 | 0.9994 0.0028 | 1.0000 0.2096 | 1.0000 0.9994 | |
| 1,000,000 | q = .05 \(\rightarrow\) q = .025 \(\rightarrow\) | 0.0000 0.0000 | 0.0003 0.0000 | 0.0260 0.0000 | 0.9940 0.0003 | 1.0000 0.0258 | 1.0000 0.9943 | |
| 10,000,000 | q = .05 \(\rightarrow\) q = .025 \(\rightarrow\) | 0.0000 0.0000 | 0.0000 0.0000 | 0.0027 0.0000 | 0.9433 0.0000 | 1.0000 0.0026 | 1.0000 0.9457 | |
Table: Lower Bounds on Posterior Probability
\(P[F(A,B)=.62\pm q\mid c \cdot F(A,S)=m/n=.62 \cdot b]\),
for SampleS of Sizen Randomly Drawn fromB.
All probability entries in this table are accurate to four decimal places. Those entries of form ‘1.0000’ actually represent probability values that are a tiny bit less than 1.0000.
Notice that even when the bound on ratios of prior probabilities, \(K\), is extremely large, a sufficiently large sample size overcomes this disparity between prior probabilities. To illustrate the point, let’s focus on those hypotheses that lie in the interval \(F(A,B)=.62\pm .025\) (i.e. the interval \(.595 \le F(A,B) \le .645\)). In this context \(K\) is an upper bound on the ratios of all the prior probabilities, \[K \;\ge\; P[F(A,B)=r_i \mid c \cdot b] / P[F(A,B)=r_j \mid c \cdot b],\] such that \(r_j\) lies within the interval \(.62\pm .025\) and \(r_i\) lies outside the interval \(.62\pm .025\). For \(K = 1,000\) this means that some of the specific frequency hypotheses \(F(A,B)=k/z\) outside this interval (i.e. some hypotheses that either have \(k/z \lt .62-.025\) or have \(k/z \gt .62+.025\)) may have prior probabilities up to 1000 times larger than the priors of specific hypotheses within this interval. But no specific hypothesis outside the interval has a prior more than 1000 times larger than any hypothesis inside the interval. The table shows that even when the upper bound on these ratios of priors is this extreme, a large enough sample size, \(n = 6400\), results in a reasonably good lower bound on the posterior probability: \[P[F(A,B)=.62\pm .025\mid c \cdot F(A,S)=3968/6400 \cdot b] \; \ge \; .9637.\] And even for a really extreme value of this ratio of priors, \(K = 10,000,000\), a sample size of \(n = 12800\) results in a decent lower bound on the posterior: \[P[F(A,B)=.62\pm .025\mid c \cdot F(A,S)=7936/12800 \cdot b] \; \ge \; .9457.\]
Let’s consider a simple example of a statistical hypothesis about a collection of independent evidential outcomes. Suppose we possess a warped coin and want to determine its propensity for turning up heads when tossed in a standard unbiased way. Consider two hypotheses, \(h_{q}\) and \(h_{r}\), which say that the chances (or propensities) for the coin to come up heads when tossed are \(q\) and \(r\), respectively. Let \(c\) report that the coin is tossed \(n\) times in the normal way, and let \(e\) say that precisely \(m\) occurrences of heads result. Suppose that the outcomes of such tosses are probabilistically independent (as asserted by \(b\)). So, the respective likelihoods take the usual binomial form \[ P[e \mid h_{r}\cdot c \cdot b] = \frac{n!}{m! \times(n-m)!} \times r^m (1-r)^{n-m}. \]
Then, Rule RB yields the following formula, where the likelihood ratio is the ratio of the respective binomial terms:
\[ \frac{P[h_{q} \mid c\cdot e \cdot b]} {P[h_{r} \mid c\cdot e \cdot b]} = \frac{q^m (1-q)^{n-m}} {r^m (1-r)^{n-m}} \times \frac{P[h_{q} \mid c \cdot b]} {P[h_{r} \mid c \cdot b]} \]When, for instance, the coin is tossed \(n = 100\) times and comes up heads \(m = 72\) times, the evidence for hypothesis \(h_{1/2}\) as compared to \(h_{3/4}\) is given by the likelihood ratio
\[\frac{P [e \mid h_{1/2}\cdot c \cdot b]} {P [e \mid h_{3/4}\cdot c \cdot b]} = \frac{[(1/2)^{72}(1/2)^{28}]}{[(3/4)^{72}(1/4)^{28}]} = .000056269. \]Such evidence strongly refutes the \(h_{1/2}\) (fair-coin) hypothesis with respect to the \(h_{3/4}\) (coin biased toward heads at 3/4) hypothesis, provided that the assessment of prior plausibilities for these two hypotheses doesn’t make the latter hypothesis too extremely implausible to begin with. In this case, provided that \(h_{1/2}\) is initially no more than 100 times more plausible than \(h_{3/4}\) — i.e. provided that \(P[h_{1/2} \mid b] / P[h_{3/4} \mid b] \le 100\) — the resulting ratio of posterior probabilities must be less than or equal to .0056269: \[ \frac{P[h_{1/2} \mid c^{n}\cdot e^{n} \cdot b]} {P[h_{3/4} \mid c^{n}\cdot e^{n} \cdot b]} \le .000056269 \times 100 = .0056269 \]Notice, however, that this strong refutation of \(h_{1/2}\) is not absolute refutation. Additional evidence could shift the total proportion of heads outcomes back toward a value that favors \(h_{1/2}\).
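This likelihood ratio is easy to check numerically; the binomial coefficients cancel, since the same factor \(n!/(m!(n-m)!)\) appears in both likelihoods. A minimal sketch (function name is our own):

```python
def likelihood_ratio(p1, p2, m, n):
    # P[e | h_p1 · c · b] / P[e | h_p2 · c · b] for m heads in n tosses;
    # the shared binomial coefficient n!/(m!(n-m)!) cancels out of the ratio
    return (p1**m * (1.0 - p1)**(n - m)) / (p2**m * (1.0 - p2)**(n - m))

ratio = likelihood_ratio(0.5, 0.75, 72, 100)
print(ratio)         # ≈ 5.6269e-05
print(ratio * 100)   # ≈ .0056269, the bound on the posterior ratio when the
                     # prior ratio is at most 100
```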
In cases like this, where all the competing hypotheses lie within a continuous region, the Bayesian Estimation Rule BE-C provides another useful way to assess the evidential support for hypotheses. In the coin-tossing case, the relevant region of alternative hypotheses \(H\) is the class of all hypotheses of form \(h_{r}\), where each such hypothesis says that the chance of heads on each coin-toss is \(r\). So, when \(c\) says the coin is tossed \(n\) times, and \(e\) says these tosses produce precisely \(m\) occurrences of heads (and \(b\) says the tosses are independent and identically distributed), the individual likelihoods continue to take the binomial form: \[P[e \mid h_{r} \cdot c \cdot b] = \frac{n!}{m! \times(n-m)!} \times r^m (1-r)^{n-m}.\]
Let \(h[v,u]\) express the hypothesis that the propensity for tosses to land heads is some real number in the interval between \(v\) and \(u\). Then, applying Rule BE-C to this problem, our goal is to evaluate posterior probabilities of form \[\begin{align}P[h[v,u] \mid c \cdot e \cdot b] &= \int_v^u p[h_q \mid c \cdot e \cdot b] \; \; dq \\&\ge \frac{1}{1 + K \times \left[\frac{1}{\frac{\int_v^u r^m (1-r)^{n-m} \; \; dr}{\int_0^1 q^m (1-q)^{n-m} \; \; dq}} - 1 \right]},\end{align}\] where \(K\) is an upper bound on the ratios of values of the prior probability density functions, \[K \;\ge\; p[h_q \mid c \cdot b] / p[h_r \mid c \cdot b],\] when \(r\) lies within the interval between \(v\) and \(u\), and \(q\) lies outside this interval.
It turns out that the ratio \(\frac{\int_v^u r^m (1-r)^{n-m} \; \;dr}{\int_0^1 q^m (1-q)^{n-m} \; \; dq}\) in this equation is the very definition of the normalized Beta-distribution function (discussed earlier) applied to \(m\) positive outcomes in \(n\) trials. We can employ a well-known spreadsheet application to calculate values of the normalized Beta-distribution between specific values of \(v\) and \(u\), using the previously-defined formula \(BD(u,v,m,n)\).
Thus, we have the following formula for the lower bound on the posterior probability that the propensity for heads lies within an interval between bounds \(v\) and \(u\).
\[P[h[v,u] \mid c \cdot e \cdot b] \; \; \ge\frac{1}{1 + K\times\left(\frac{1}{BD(u,v,m,n)} - 1\right)}.\]Here are a few examples calculated via this formula. In each case, the values of \(v\) and \(u\) have been chosen to lie equal distances below and above .72, which we assume to be the proportion found in the sample, \(m/n = .72\). Each of the following posterior probabilities draws on specified values of \(m\) and \(n\), and a specified value for \(K\).
| \(K\) | \(n\) | \(m\) | posterior probabilities |
|---|---|---|---|
| 1 | 100 | 72 | \(P[h[.63,.81] \mid c \cdot e \cdot b] \; \; \gt .956\) \(P[h[.60,.84] \mid c \cdot e \cdot b] \; \; \gt .992\) |
| 10 | 100 | 72 | \(P[h[.59,.85] \mid c \cdot e \cdot b] \; \; \gt .959\) \(P[h[.56,.88] \mid c \cdot e \cdot b] \; \; \gt .994\) |
| 100 | 100 | 72 | \(P[h[.56,.88] \mid c \cdot e \cdot b] \; \; \gt .946\) \(P[h[.53,.91] \mid c \cdot e \cdot b] \; \; \gt .994\) |
| 1 | 1000 | 720 | \(P[h[.69,.75] \mid c \cdot e \cdot b] \; \; \gt .965\) \(P[h[.68,.76] \mid c \cdot e \cdot b] \; \; \gt .995\) |
| 10 | 1000 | 720 | \(P[h[.68,.76] \mid c \cdot e \cdot b] \; \; \gt .953\) \(P[h[.67,.77] \mid c \cdot e \cdot b] \; \; \gt .995\) |
| 100 | 1000 | 720 | \(P[h[.67,.77] \mid c \cdot e \cdot b] \; \; \gt .956\) \(P[h[.66,.78] \mid c \cdot e \cdot b] \; \; \gt .997\) |
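Entries in the first row of this table can be checked numerically. The sketch below approximates the normalized Beta ratio in Rule BE-C by a midpoint sum (the grid size is an arbitrary choice; for \(n = 100\) no log-space rescaling is needed) and applies the lower-bound formula with \(K = 1\):

```python
def beta_mass(v, u, m, n, steps=100_000):
    # ∫_v^u r^m (1-r)^(n-m) dr / ∫_0^1 r^m (1-r)^(n-m) dr via midpoint sums
    dr = 1.0 / steps
    weights = [((i + 0.5) * dr) ** m * (1.0 - (i + 0.5) * dr) ** (n - m)
               for i in range(steps)]
    inside = sum(w for i, w in enumerate(weights) if v <= (i + 0.5) * dr <= u)
    return inside / sum(weights)

def posterior_lower_bound(v, u, m, n, K):
    # lower bound on P[h[v,u] | c · e · b] from Rule BE-C
    return 1.0 / (1.0 + K * (1.0 / beta_mass(v, u, m, n) - 1.0))

# with K = 1, n = 100 tosses, m = 72 heads (first row of the table):
print(posterior_lower_bound(0.63, 0.81, 72, 100, K=1))  # ≈ .956
print(posterior_lower_bound(0.60, 0.84, 72, 100, K=1))  # ≈ .992
```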
Bayes’ Theorem | belief, formal representations of | Carnap, Rudolf | confirmation | epistemology: Bayesian | probability, interpretations of | statistics, philosophy of
Thanks to Alan Hájek, Jim Joyce, and Edward Zalta for many valuable comments and suggestions. The editors and author also thank Greg Stokley and Philippe van Basshuysen for carefully reading an earlier version of the entry and identifying a number of typographical errors.
The Stanford Encyclopedia of Philosophy is copyright © 2025 by The Metaphysics Research Lab, Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054