Stanford Encyclopedia of Philosophy

Bayes’ Theorem

First published Sat Jun 28, 2003; substantive revision Tue Sep 30, 2003

Bayes' Theorem is a simple mathematical formula used for calculating conditional probabilities. It figures prominently in subjectivist or Bayesian approaches to epistemology, statistics, and inductive logic. Subjectivists, who maintain that rational belief is governed by the laws of probability, lean heavily on conditional probabilities in their theories of evidence and their models of empirical learning. Bayes' Theorem is central to these enterprises both because it simplifies the calculation of conditional probabilities and because it clarifies significant features of the subjectivist position. Indeed, the Theorem's central insight — that a hypothesis is confirmed by any body of data that its truth renders probable — is the cornerstone of all subjectivist methodology.

1. Conditional Probabilities and Bayes' Theorem

The probability of a hypothesis H conditional on a given body of data E is the ratio of the unconditional probability of the conjunction of the hypothesis with the data to the unconditional probability of the data alone.

(1.1) Definition.
The probability of H conditional on E is defined as PE(H) = P(H & E)/P(E), provided that both terms of this ratio exist and P(E) > 0.[1]

To illustrate, suppose J. Doe is a randomly chosen American who was alive on January 1, 2000. According to the United States Center for Disease Control, roughly 2.4 million of the 275 million Americans alive on that date died during the 2000 calendar year. Among the approximately 16.6 million senior citizens (age 75 or greater) about 1.36 million died. The unconditional probability of the hypothesis that our J. Doe died during 2000, H, is just the population-wide mortality rate P(H) = 2.4M/275M = 0.00873. To find the probability of J. Doe's death conditional on the information, E, that he or she was a senior citizen, we divide the probability that he or she was a senior who died, P(H & E) = 1.36M/275M = 0.00495, by the probability that he or she was a senior citizen, P(E) = 16.6M/275M = 0.06036. Thus, the probability of J. Doe's death given that he or she was a senior is PE(H) = P(H & E)/P(E) = 0.00495/0.06036 = 0.082. Notice how the size of the total population factors out of this equation, so that PE(H) is just the proportion of seniors who died. One should contrast this quantity, which gives the mortality rate among senior citizens, with the "inverse" probability of E conditional on H, PH(E) = P(H & E)/P(H) = 0.00495/0.00873 = 0.57, which is the proportion of deaths in the total population that occurred among seniors.
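Definition (1.1) can be checked directly against these figures. The following sketch does so; the variable names are mine, not part of the article.

```python
# Worked check of the J. Doe example using the figures from the text.
population = 275_000_000
deaths = 2_400_000          # H: died during 2000
seniors = 16_600_000        # E: age 75 or greater
senior_deaths = 1_360_000   # H & E: seniors who died

p_H = deaths / population             # P(H) ≈ 0.00873
p_E = seniors / population            # P(E) ≈ 0.06036
p_H_and_E = senior_deaths / population

# Definition (1.1): PE(H) = P(H & E) / P(E)
p_H_given_E = p_H_and_E / p_E   # ≈ 0.082, the mortality rate among seniors
p_E_given_H = p_H_and_E / p_H   # ≈ 0.57, the share of deaths among seniors

print(round(p_H_given_E, 3), round(p_E_given_H, 2))
```

As the text notes, the population size cancels: p_H_given_E is simply senior_deaths/seniors.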

Here are some straightforward consequences of (1.1):

  • Probability. PE is a probability function.[2]
  • Logical Consequence. If E entails H, then PE(H) = 1.
  • Preservation of Certainties. If P(H) = 1, then PE(H) = 1.
  • Mixing. P(H) = P(E)PE(H) + P(~E)P~E(H).[3]

The most important fact about conditional probabilities is undoubtedly Bayes' Theorem, whose significance was first appreciated by the British cleric Thomas Bayes in his posthumously published masterwork, "An Essay Toward Solving a Problem in the Doctrine of Chances" (Bayes 1764). Bayes' Theorem relates the "direct" probability of a hypothesis conditional on a given body of data, PE(H), to the "inverse" probability of the data conditional on the hypothesis, PH(E).

(1.2) Bayes' Theorem.
PE(H) = [P(H)/P(E)] PH(E)

In an unfortunate, but now unavoidable, choice of terminology, statisticians refer to the inverse probability PH(E) as the "likelihood" of H on E. It expresses the degree to which the hypothesis predicts the data given the background information codified in the probability P.

In the example discussed above, the condition that J. Doe died during 2000 is a fairly strong predictor of senior citizenship. Indeed, the equation PH(E) = 0.57 tells us that 57% of the total deaths occurred among seniors that year. Bayes' theorem lets us use this information to compute the "direct" probability of J. Doe dying given that he or she was a senior citizen. We do this by multiplying the "prediction term" PH(E) by the ratio of the total number of deaths in the population to the number of senior citizens in the population, P(H)/P(E) = 2.4M/16.6M = 0.144. The result is PE(H) = 0.57 × 0.144 = 0.082, just as expected.
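The same calculation, run through form (1.2) of the theorem rather than the definition, looks like this (a minimal sketch; the names are mine):

```python
# Bayes' Theorem (1.2): PE(H) = [P(H)/P(E)] * PH(E)
p_H = 2.4 / 275      # P(H): population-wide mortality rate
p_E = 16.6 / 275     # P(E): proportion of seniors
likelihood = 1.36 / 2.4   # PH(E) ≈ 0.57, the "likelihood" of H on E

p_H_given_E = (p_H / p_E) * likelihood
print(round(p_H_given_E, 3))   # ≈ 0.082, matching the direct calculation
```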

Though a mathematical triviality, Bayes' Theorem is of great value in calculating conditional probabilities because inverse probabilities are typically both easier to ascertain and less subjective than direct probabilities. People with different views about the unconditional probabilities of E and H often disagree about E's value as an indicator of H. Even so, they can agree about the degree to which the hypothesis predicts the data if they know any of the following intersubjectively available facts: (a) E's objective probability given H, (b) the frequency with which events like E will occur if H is true, or (c) the fact that H logically entails E. Scientists often design experiments so that likelihoods can be known in one of these "objective" ways. Bayes' Theorem then ensures that any dispute about the significance of the experimental results can be traced to "subjective" disagreements about the unconditional probabilities of H and E.

When both PH(E) and P~H(E) are known, an experimenter need not even know E's probability to determine a value for PE(H) using Bayes' Theorem.

(1.3) Bayes' Theorem (2nd form).[4]
PE(H) = P(H)PH(E) / [P(H)PH(E) + P(~H)P~H(E)]

In this guise Bayes' theorem is particularly useful for inferring causes from their effects, since it is often fairly easy to discern the probability of an effect given the presence or absence of a putative cause. For instance, physicians often screen for diseases of known prevalence using diagnostic tests of recognized sensitivity and specificity. The sensitivity of a test, its "true positive" rate, is the fraction of times that patients with the disease test positive for it. The test's specificity, its "true negative" rate, is the proportion of healthy patients who test negative. If we let H be the event of a given patient having the disease, and E be the event of her testing positive for it, then the test's sensitivity and specificity are given by the likelihoods PH(E) and P~H(~E), respectively, and the "baseline" prevalence of the disease in the population is P(H). Given these inputs about the effects of the disease on the outcome of the test, one can use (1.3) to determine the probability of disease given a positive test. For a more detailed illustration of this process, see Example 1 in the Supplementary Document "Examples, Tables, and Proof Sketches".
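A sketch of this screening calculation via form (1.3) follows. The prevalence, sensitivity, and specificity values here are hypothetical, chosen only for illustration; they are not from the article's Example 1.

```python
# Disease screening via Bayes' Theorem, 2nd form (1.3).
prevalence = 0.01     # P(H): hypothetical baseline rate of the disease
sensitivity = 0.95    # PH(E): true-positive rate (hypothetical)
specificity = 0.90    # P~H(~E): true-negative rate (hypothetical)

false_positive = 1 - specificity   # P~H(E)

# (1.3): PE(H) = P(H)PH(E) / [P(H)PH(E) + P(~H)P~H(E)]
numerator = prevalence * sensitivity
denominator = numerator + (1 - prevalence) * false_positive
p_disease_given_positive = numerator / denominator

# ≈ 0.088: low despite the sensitive test, because the disease is rare
print(round(p_disease_given_positive, 3))
```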

2. Special Forms of Bayes' Theorem

Bayes' Theorem can be expressed in a variety of forms that are useful for different purposes. One version employs what Rudolf Carnap called the relevance quotient or probability ratio (Carnap 1962, 466). This is the factor PR(H, E) = PE(H)/P(H) by which H's unconditional probability must be multiplied to get its probability conditional on E. Bayes' Theorem is equivalent to a simple symmetry principle for probability ratios.

(1.4) Probability Ratio Rule.
PR(H, E) = PR(E, H)

The term on the right provides one measure of the degree to which H predicts E. If we think of P(E) as expressing the "baseline" predictability of E given the background information codified in P, and of PH(E) as E's predictability when H is added to this background, then PR(E, H) captures the degree to which knowing H makes E more or less predictable relative to the baseline: PR(E, H) = 0 means that H categorically predicts ~E; PR(E, H) = 1 means that adding H does not alter the baseline prediction at all; PR(E, H) = 1/P(E) means that H categorically predicts E. Since P(E) = PT(E) where T is any truth of logic, we can think of (1.4) as telling us that

The probability of a hypothesis conditional on a body of data is equal to the unconditional probability of the hypothesis multiplied by the degree to which the hypothesis surpasses a tautology as a predictor of the data.

In our J. Doe example, PR(H, E) is obtained by comparing the predictability of senior status given that J. Doe died in 2000 to its predictability given no information whatever about his or her mortality. Dividing the former "prediction term" by the latter yields PR(H, E) = PH(E)/P(E) = 0.57/0.06036 = 9.44. Thus, as a predictor of senior status in 2000, knowing that J. Doe died is more than nine times better than not knowing whether she lived or died.
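The symmetry asserted by (1.4) can be verified numerically. A sketch, using the exact fractions rather than the rounded figures in the text (so the ratio comes out at about 9.39 rather than 9.44; the variable names are mine):

```python
# Probability Ratio Rule (1.4): PR(H, E) = PR(E, H)
p_H = 2.4 / 275          # P(H): mortality rate
p_E = 16.6 / 275         # P(E): proportion of seniors
p_H_and_E = 1.36 / 275   # P(H & E): seniors who died

pr_H_given_E = (p_H_and_E / p_E) / p_H   # PR(H, E) = PE(H)/P(H)
pr_E_given_H = (p_H_and_E / p_H) / p_E   # PR(E, H) = PH(E)/P(E)

# The two ratios coincide, as (1.4) requires.
print(round(pr_H_given_E, 2), round(pr_E_given_H, 2))
```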

Another useful form of Bayes' Theorem is the Odds Rule. In the jargon of bookies, the "odds" of a hypothesis is its probability divided by the probability of its negation: O(H) = P(H)/P(~H). So, for example, a racehorse whose odds of winning a particular race are 7-to-5 has a 7/12 chance of winning and a 5/12 chance of losing. To understand the difference between odds and probabilities it helps to think of probabilities as fractions of the distance between the probability of a contradiction and that of a tautology, so that P(H) = p means that H is p times as likely to be true as a tautology. In contrast, writing O(H) = [P(H) − P(F)]/[P(T) − P(H)] (where F is some logical contradiction) makes it clear that O(H) expresses this same quantity as the ratio of the amount by which H's probability exceeds that of a contradiction to the amount by which it is exceeded by that of a tautology. Thus, the difference between "probability talk" and "odds talk" corresponds to the difference between saying "we are two thirds of the way there" and saying "we have gone twice as far as we have yet to go."
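The odds-probability conversion in the racehorse example is easy to mechanize. A small sketch (the helper names are mine):

```python
def odds(p):
    """O(H) = P(H)/P(~H)."""
    return p / (1 - p)

def prob(o):
    """Invert the odds: P(H) = O(H)/(1 + O(H))."""
    return o / (1 + o)

# 7-to-5 odds of winning correspond to a 7/12 chance of winning.
print(prob(7 / 5))    # 7/12 ≈ 0.583
print(odds(7 / 12))   # ≈ 1.4, i.e. 7-to-5
```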

The analogue of the probability ratio is the odds ratio OR(H, E) = OE(H)/O(H), the factor by which H's unconditional odds must be multiplied to obtain its odds conditional on E. Bayes' Theorem is equivalent to the following fact about odds ratios:

(1.5) Odds Ratio Rule.
OR(H, E) = PH(E)/P~H(E)

Notice the similarity between (1.4) and (1.5). While each employs a different way of expressing probabilities, each shows how its expression for H's probability conditional on E can be obtained by multiplying its expression for H's unconditional probability by a factor involving inverse probabilities.

The quantity LR(H, E) = PH(E)/P~H(E) that appears in (1.5) is the likelihood ratio of H given E. In testing situations like the one described in Example 1, the likelihood ratio is the test's true positive rate divided by its false positive rate: LR = sensitivity/(1 − specificity). As with the probability ratio, we can construe the likelihood ratio as a measure of the degree to which H predicts E. Instead of comparing E's probability given H with its unconditional probability, however, we now compare it with its probability conditional on ~H. LR(H, E) is thus the degree to which the hypothesis surpasses its negation as a predictor of the data. Once more, Bayes' Theorem tells us how to factor conditional probabilities into unconditional probabilities and measures of predictive power.

The odds of a hypothesis conditional on a body of data is equal to the unconditional odds of the hypothesis multiplied by the degree to which it surpasses its negation as a predictor of the data.

In our running J. Doe example, LR(H, E) is obtained by comparing the predictability of senior status given that J. Doe died in 2000 to its predictability given that he or she lived out the year. Dividing the former "prediction term" by the latter yields LR(H, E) = PH(E)/P~H(E) = 0.57/0.056 = 10.12. Thus, as a predictor of senior status in 2000, knowing that J. Doe died is more than ten times better than knowing that he or she lived.
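The Odds Ratio Rule (1.5) can likewise be checked on these figures: the likelihood ratio computed from the two "prediction terms" equals the factor by which conditioning on E multiplies H's odds. A sketch, with my own variable names:

```python
# Odds Ratio Rule (1.5): OE(H)/O(H) = PH(E)/P~H(E)
pop, dead, seniors, senior_dead = 275.0, 2.4, 16.6, 1.36   # millions

lik = senior_dead / dead                          # PH(E) ≈ 0.57
lik_neg = (seniors - senior_dead) / (pop - dead)  # P~H(E) ≈ 0.056
LR = lik / lik_neg                                # likelihood ratio ≈ 10

p_H_given_E = senior_dead / seniors               # PE(H)
odds_E = p_H_given_E / (1 - p_H_given_E)          # OE(H)
p_H = dead / pop
odds_H = p_H / (1 - p_H)                          # O(H)

# The two sides of (1.5) agree.
print(round(LR, 1), round(odds_E / odds_H, 1))
```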

The similarities between the "probability ratio" and "odds ratio" versions of Bayes' Theorem can be developed further if we express H's probability as a multiple of the probability of some other hypothesis H* using the relative probability function B(H, H*) = P(H)/P(H*). It should be clear that B generalizes both P and O since P(H) = B(H, T) and O(H) = B(H, ~H). By comparing the conditional and unconditional values of B we obtain the Bayes' Factor:

BR(H, H*; E) = BE(H, H*)/B(H, H*) = [PE(H)/PE(H*)] / [P(H)/P(H*)].

We can also generalize the likelihood ratio by setting LR(H, H*; E) = PH(E)/PH*(E). This compares E's predictability on the basis of H with its predictability on the basis of H*. We can use these two quantities to formulate an even more general form of Bayes' Theorem.

(1.6) Bayes' Theorem (General Form)
BR(H, H*; E) = LR(H, H*; E)

The message of (1.6) is this:

The ratio of probabilities for two hypotheses conditional on a body of data is equal to the ratio of their unconditional probabilities multiplied by the degree to which the first hypothesis surpasses the second as a predictor of the data.

The various versions of Bayes' Theorem differ only with respect to the functions used to express unconditional probabilities (P(H), O(H), B(H, H*)) and the likelihood term used to represent predictive power (PR(E, H), LR(H, E), LR(H, H*; E)). In each case, though, the underlying message is the same:

conditional probability = unconditional probability × predictive power

(1.2) – (1.6) are multiplicative forms of Bayes' Theorem that use division to compare the disparities between unconditional and conditional probabilities. Sometimes these comparisons are best expressed additively by replacing ratios with differences. The following table gives the additive analogue of each ratio measure.

Table 1
  • Probability Ratio: PR(H, E) = PE(H)/P(H)  |  Probability Difference: PD(H, E) = PE(H) − P(H)
  • Odds Ratio: OR(H, E) = OE(H)/O(H)  |  Odds Difference: OD(H, E) = OE(H) − O(H)
  • Bayes' Factor: BR(H, H*; E) = BE(H, H*)/B(H, H*)  |  Bayes' Difference: BD(H, H*; E) = BE(H, H*) − B(H, H*)

We can use Bayes' theorem to obtain additive analogues of (1.4) – (1.6), which are here displayed along with their multiplicative counterparts:

Table 2
  • (1.4) Ratio: PR(H, E) = PR(E, H) = PH(E)/P(E)  |  Difference: PD(H, E) = P(H)[PR(E, H) − 1]
  • (1.5) Ratio: OR(H, E) = LR(H, E) = PH(E)/P~H(E)  |  Difference: OD(H, E) = O(H)[OR(H, E) − 1]
  • (1.6) Ratio: BR(H, H*; E) = LR(H, H*; E) = PH(E)/PH*(E)  |  Difference: BD(H, H*; E) = B(H, H*)[BR(H, H*; E) − 1]

Notice how each additive measure is obtained by multiplying H's unconditional probability, expressed on the relevant scale, P, O or B, by the associated multiplicative measure diminished by 1.

While the results of this section are useful to anyone who employs the probability calculus, they have a special relevance for subjectivist or "Bayesian" approaches to statistics, epistemology, and inductive inference.[5] Subjectivists lean heavily on conditional probabilities in their theory of evidential support and their account of empirical learning. Given that Bayes' Theorem is the single most important fact about conditional probabilities, it is not at all surprising that it should figure prominently in subjectivist methodology.

3. The Role of Bayes' Theorem in Subjectivist Accounts of Evidence

Subjectivists maintain that beliefs come in varying gradations of strength, and that an ideally rational person's graded beliefs can be represented by a subjective probability function P. For each hypothesis H about which the person has a firm opinion, P(H) measures her level of confidence (or "degree of belief") in H's truth.[6] Conditional beliefs are represented by conditional probabilities, so that PE(H) measures the person's confidence in H on the supposition that E is a fact.[7]

One of the most influential features of the subjectivist program is its account of evidential support. The guiding ideas of this Bayesian confirmation theory are these:

  • Confirmational Relativity. Evidential relationships must be relativized to individuals and their degrees of belief.
  • Evidence Proportionism.[8] A rational believer will proportion her confidence in a hypothesis H to her total evidence for H, so that her subjective probability for H reflects the overall balance of her reasons for or against its truth.
  • Incremental Confirmation.[9] A body of data provides incremental evidence for H to the extent that conditioning on the data raises H's probability.

The first principle says that statements about evidentiary relationships always make implicit reference to people and their degrees of belief, so that, e.g., "E is evidence for H" should really be read as "E is evidence for H relative to the information encoded in the subjective probability P".

According to evidence proportionism, a subject's level of confidence in H should vary directly with the strength of her evidence in favor of H's truth. Likewise, her level of confidence in H conditional on E should vary directly with the strength of her evidence for H's truth when this evidence is augmented by the supposition of E. It is a matter of some delicacy to say precisely what constitutes a person's evidence,[10] and to explain how her beliefs should be "proportioned" to it. Nevertheless, the idea that incremental evidence is reflected in disparities between conditional and unconditional probabilities only makes sense if differences in subjective probability mirror differences in total evidence.

An item of data provides a subject with incremental evidence for or against a hypothesis to the extent that receiving the data increases or decreases her total evidence for the truth of the hypothesis. When probabilities measure total evidence, the increment of evidence that E provides for H is a matter of the disparity between PE(H) and P(H). When odds are used it is a matter of the disparity between OE(H) and O(H). See Example 2 in the supplementary document "Examples, Tables, and Proof Sketches", which illustrates the difference between total and incremental evidence, and explains the "base rate fallacy" that can result from failing to properly distinguish the two.

It will be useful to distinguish two subsidiary concepts related to total evidence.

  • The net evidence in favor of H is the degree to which a subject's total evidence in favor of H exceeds her total evidence in favor of ~H.
  • The balance of total evidence for H over H* is the degree to which a subject's total evidence in favor of H exceeds her total evidence in favor of H*.

The precise content of these notions will depend on how total evidence is understood and measured, and on how disparities in total evidence are characterized. For example, if total evidence is given in terms of probabilities and disparities are treated as ratios, then the net evidence for H is P(H)/P(~H). If total evidence is expressed in terms of odds and differences are used to express disparities, then the net evidence for H will be O(H) − O(~H). Readers may consult Table 3 (in the supplementary document) for a complete list of the possibilities.

As these remarks make clear, one can interpret O(H) either as a measure of net evidence or as a measure of total evidence. To see the difference, imagine that 750 red balls and 250 black balls have been drawn at random and with replacement from an urn known to contain 10,000 red or black balls. Assuming that this is our only evidence about the urn's contents, it is reasonable to set P(Red) = 0.75 and P(~Red) = 0.25. On a probability-as-total-evidence reading, these assignments reflect both the fact that we have a great deal of evidence in favor of Red (namely, that 750 of 1,000 draws were red) and the fact that we also have some evidence against it (namely, that 250 of the draws were black). The net evidence for Red is then the disparity between our total evidence for Red and our total evidence against Red. This can be expressed multiplicatively by saying that we have seen three times as many red draws as black draws, which is just to say that O(Red) = 3. Alternatively, we can use O(Red) as a measure of the total evidence by taking our evidence for Red to be the ratio of red to black draws, rather than the total number of red draws, and our evidence for ~Red to be the ratio of black balls to red balls, rather than the total number of black draws. While the decision whether to use O as a measure of total or net evidence makes little difference to questions about the absolute amount of total evidence for a hypothesis (since O(H) is an increasing function of P(H)), it can make a major difference when one is considering the incremental changes in total evidence brought about by conditioning on new information.

Philosophers interested in characterizing correct patterns of inductive reasoning and in providing "rational reconstructions" of scientific methodology have tended to focus on incremental evidence as crucial to their enterprise. When scientists (or ordinary folk) say that E supports or confirms H what they generally mean is that learning of E's truth will increase the total amount of evidence for H's truth. Since subjectivists characterize total evidence in terms of subjective probabilities or odds, they analyze incremental evidence in terms of changes in these quantities. On such views, the simplest way to characterize the strength of incremental evidence is by making ordinal comparisons of conditional and unconditional probabilities or odds.

(2.1) A Comparative Account of Incremental Evidence.
Relative to a subjective probability function P,
  • E incrementally confirms (disconfirms, is irrelevant to) H if and only if PE(H) is greater than (less than, equal to) P(H).
  • H receives a greater increment (or lesser decrement) of evidential support from E than from E* if and only if PE(H) exceeds PE*(H).

Both these equivalences continue to hold with probabilities replaced by odds. So, this part of the subjectivist theory of evidence does not depend on how total evidence is measured.

Bayes' Theorem helps to illuminate the content of (2.1) by making it clear that E's status as incremental evidence for H is enhanced to the extent that H predicts E. This observation serves as the basis for the following conclusions about incremental confirmation (which hold so long as 1 > P(H), P(E) > 0).

(2.1a)  If E incrementally confirms H, then H incrementally confirms E.
(2.1b)  If E incrementally confirms H, then E incrementally disconfirms ~H.
(2.1c)  If H entails E, then E incrementally confirms H.
(2.1d)  If PH(E) = PH(E*), then H receives more incremental support from E than from E* if and only if E is unconditionally less probable than E*.
(2.1e)  Weak Likelihood Principle. E provides incremental evidence for H if and only if PH(E) > P~H(E). More generally, if PH(E) > PH*(E) and P~H(~E) ≥ P~H*(~E), then E provides more incremental evidence for H than for H*.

(2.1a) tells us that incremental confirmation is a matter of mutual reinforcement: a person who sees E as evidence for H invests more confidence in the possibility that both propositions are true than in either possibility in which only one obtains.

(2.1b) says that relevant evidence must be capable of discriminating between the truth and falsity of the hypothesis under test.

(2.1c) provides a subjectivist rationale for the hypothetico-deductive model of confirmation. According to this model, hypotheses are incrementally confirmed by any evidence they entail. While subjectivists reject the idea that evidentiary relations can be characterized in a belief-independent manner — Bayesian confirmation is always relativized to a person and her subjective probabilities — they seek to preserve the basic insight of the H-D model by pointing out that hypotheses are incrementally supported by evidence they entail for anyone who has not already made up her mind about the hypothesis or the evidence. More precisely, if H entails E, then PE(H) = P(H)/P(E), which exceeds P(H) whenever 1 > P(E), P(H) > 0. This explains why scientists so often seek to design experiments that fit the H-D paradigm. Even when evidentiary relations are relativized to subjective probabilities, experiments in which the hypothesis under test entails the data will be regarded as evidentially relevant by anyone who has not yet made up his mind about the hypothesis or the data. The degree of incremental confirmation will vary among people depending on their prior levels of confidence in H and E, but everyone will agree that the data incrementally supports the hypothesis to at least some degree.

Subjectivists invoke (2.1d) to explain why scientists so often regard improbable or surprising evidence as having more confirmatory potential than evidence that is antecedently known. While it is not true in general that improbable evidence has more confirming potential, it is true that E's incremental confirming power relative to H varies inversely with E's unconditional probability when the value of the inverse probability PH(E) is held fixed. If H entails both E and E*, say, then Bayes' Theorem entails that the less probable of the two supports H more strongly. For example, even if heart attacks are invariably accompanied by severe chest pain and shortness of breath, the former symptom is far better evidence for a heart attack than the latter simply because severe chest pain is so much less common than shortness of breath.

(2.1e) captures one core message of Bayes' Theorem for theories of confirmation. Let's say that H is uniformly better than H* as a predictor of E's truth-value when (a) H predicts E more strongly than H* does, and (b) ~H predicts ~E more strongly than ~H* does. According to the weak likelihood principle, hypotheses that are uniformly better predictors of the data are better supported by the data. For example, the fact that little Johnny is a Christian is better evidence for thinking that his parents are Christian than for thinking that they are Hindu because (a) a far higher proportion of Christian parents than Hindu parents have Christian children, and (b) a far higher proportion of non-Christian parents than non-Hindu parents have non-Christian children.

Bayes' Theorem can also be used as the basis for developing and evaluating quantitative measures of evidential support. The results listed in Table 2 entail that all four of the functions PR, OR, PD and OD agree with one another on the simplest question of confirmation: Does E provide incremental evidence for H?

(2.2) Corollary.
Each of the following is equivalent to the assertion that E provides incremental evidence in favor of H: PR(H, E) > 1, OR(H, E) > 1, PD(H, E) > 0, OD(H, E) > 0.

Thus, all four measures agree with the comparative account of incremental evidence given in (2.1).
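This agreement is easy to illustrate numerically. The following sketch computes all four measures for a toy pair of values (my own, purely illustrative) in which conditioning raises H's probability, and checks that every measure registers confirmation as (2.2) states:

```python
def measures(p_H, p_H_given_E):
    """Return PR, OR, PD, OD for the given unconditional and
    conditional probabilities of H (toy helper, not from the article)."""
    odds = lambda p: p / (1 - p)
    PR = p_H_given_E / p_H                     # probability ratio
    OR = odds(p_H_given_E) / odds(p_H)         # odds ratio
    PD = p_H_given_E - p_H                     # probability difference
    OD = odds(p_H_given_E) - odds(p_H)         # odds difference
    return PR, OR, PD, OD

# E raises H's probability from 0.3 to 0.6: every measure says "confirms".
PR, OR, PD, OD = measures(0.3, 0.6)
print(PR > 1, OR > 1, PD > 0, OD > 0)   # all True
```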

Given all this agreement it should not be surprising that PR(H, E), OR(H, E) and PD(H, E) have all been proposed as measures of the degree of incremental support that E provides for H.[11] While OD(H, E) has not been suggested for this purpose, we will consider it for reasons of symmetry. Some authors maintain that one or another of these functions is the unique correct measure of incremental evidence; others think it best to use a variety of measures that capture different evidential relationships. While this is not the place to adjudicate these issues, we can look to Bayes' Theorem for help in understanding what the various functions measure and in characterizing the formal relationships among them.

All four measures agree in their conclusions about the comparative amount of incremental evidence that different items of data provide for a fixed hypothesis. In particular, they agree ordinally about the following concepts derived from incremental evidence:

  • The effective increment of evidence[12] that E provides for H is the amount by which the incremental evidence that E provides for H exceeds the incremental evidence that ~E provides for H.
  • The differential in the incremental evidence that E and E* provide for H is the amount by which the incremental evidence that E provides for H exceeds the incremental evidence that E* provides for H.

Effective evidence is a matter of the degree to which a person's total evidence for H depends on her opinion about E. When PE(H) and P~E(H) (or OE(H) and O~E(H)) are far apart, the person's belief about E has a great effect on her belief about H: from her point of view, a great deal hangs on E's truth-value when it comes to questions about H's truth-value. A large differential in incremental evidence between E and E* tells us that learning E increases the subject's total evidence for H by a larger amount than learning E* does. Readers may consult Table 4 (in the supplement) for quantitative measures of effective and differential evidence.

The second clause of (2.1) tells us that E provides more incremental evidence than E* does for H just in case the probability of H conditional on E exceeds the probability of H conditional on E*. It is then a simple step to show that all four measures of incremental support agree ordinally on questions of effective evidence and of differentials in incremental evidence.

(2.3) Corollary.
For any H, E* and E with positive probability, the following are equivalent:
  • E provides more incremental evidence than E* does for H
  • PR(H, E) > PR(H, E*)
  • OR(H, E) > OR(H, E*)
  • PD(H, E) > PD(H, E*)
  • OD(H, E) > OD(H, E*)

The four measures of incremental support can disagree over the comparative degree to which a single item of data incrementally confirms two distinct hypotheses. Example 3, Example 4, and Example 5 (in the supplement) show the various ways in which this can happen.

All the differences between the measures have ultimately to do with (a) whether the total evidence in favor of a hypothesis should be measured in terms of probabilities or in terms of odds, and (b) whether disparities in total evidence are best captured as ratios or as differences. Rows in the following table correspond to different measures of total evidence. Columns correspond to different ways of treating disparities.

Table 5: Four measures of incremental evidence
  • P = Total: Ratio PR(H, E) = PE(H)/P(H)  |  Difference PD(H, E) = PE(H) − P(H)
  • O = Total: Ratio OR(H, E) = OE(H)/O(H)  |  Difference OD(H, E) = OE(H) − O(H)

Similar tables can be constructed for measures of net evidence and measures of balances in total evidence. See Table 5A in the supplement.

We can use the various forms of Bayes' Theorem to clarify the similarities and differences among these measures by rewriting each of them in terms of likelihood ratios.

Table 6: The four measures expressed in terms of likelihood ratios
  • P = Total: Ratio PR(H, E) = LR(H, T; E)  |  Difference PD(H, E) = P(H)[LR(H, T; E) − 1]
  • O = Total: Ratio OR(H, E) = LR(H, ~H; E)  |  Difference OD(H, E) = O(H)[LR(H, ~H; E) − 1]

This table shows that there are two differences between each multiplicative measure and its additive counterpart. First, the likelihood term that appears in a given multiplicative measure is diminished by 1 in its associated additive measure. Second, in each additive measure the diminished likelihood term is multiplied by an expression for H's probability: P(H) or O(H), as the case may be. The first difference marks no distinction; it is due solely to the fact that the multiplicative and additive measures employ a different zero point from which to measure evidence. If we settle on the point of probabilistic independence PE(H) = P(H) as a natural common zero, and so subtract 1 from each multiplicative measure,[13] then equivalent likelihood terms appear in both columns.

The real difference between the measures in a given row concerns the effect of unconditional probabilities on relations of incremental confirmation. Down the right column, the degree to which E provides incremental evidence for H is directly proportional to H's probability expressed in units of P(T) or P(~H). In the left column, H's probability makes no difference to the amount of incremental evidence that E provides for H once PH(E) and either P(E) or P~H(E) are fixed.[14] In light of Bayes' Theorem, then, the difference between the ratio measures and the difference measures boils down to one question:

Does a given piece of data provide a greater increment of evidential support for a more probable hypothesis than it does for a less probable hypothesis when both hypotheses predict the data equally well?

The difference measures answer yes, the ratio measures answer no.

Bayes' Theorem can also help us understand the difference between rows. The measures within a given row agree about the role of predictability in incremental confirmation. In the top row the incremental evidence that E provides for H increases linearly with PH(E)/P(E), whereas in the bottom row it increases linearly with PH(E)/P~H(E). Thus, when probabilities measure total evidence what matters is the degree to which H exceeds T as a predictor of E, but when odds measure total evidence it is the degree to which H exceeds ~H as a predictor of E that matters.

The central issue here concerns the status of the likelihood ratio. While everyone agrees that it should play a leading role in any quantitative theory of evidence, there are conflicting views about precisely what evidential relationship it captures. There are three possible interpretations.

Table 7: Three interpretations of the likelihood ratio
Probability as total evidence reading
  • PR(H,E) measures incremental change in total evidence.
  • LR(H,E) measures incremental change in net evidence.
  • LR(H,H*;E) measures incremental change in the balance of evidence that E provides for H over H*.
Odds as total evidence reading
  • LR(H,E) measures incremental change in total evidence.
  • LR(H,E)2 measures incremental change in net evidence.
  • LR(H,H*;E)/LR(~H, ~H*;E) measures incremental change in the balance of evidence that E provides for H over H*.
"Likelihoodist" reading
  • Neither P nor O measures total evidence because evidential relations are essentially comparative; they always involve the balance of evidence.
  • LR(H,E) measures the balance of evidence that E provides for H over ~H.
  • LR(H,H*;E) measures the balance of evidence that E provides for H over H*.

On the first reading there is no conflict whatsoever between using probability ratios and using likelihood ratios to measure evidence. Once we get clear on the distinctions between total evidence, net evidence and the balance of evidence, we see that each of PR(H,E), LR(H,E) and LR(H,H*;E) measures an important evidential relationship, but that the relationships they measure are importantly different.

When odds measure total evidence neither PR(H,E) nor LR(H,H*;E) plays a fundamental role in the theory of evidence. Changes in the probability ratio for H given E only indicate changes in incremental evidence in the presence of information about changes in the probability ratio for ~H given E. Likewise, changes in the likelihood ratio for H and H* given E only indicate changes in the balance of evidence in light of information about changes in the likelihood ratio for ~H and ~H* given E. Thus, while each of the two functions can figure as one component in a meaningful measure of confirmation, neither tells us anything about incremental evidence when taken by itself.

The third view, "likelihoodism," is popular among non-Bayesian statisticians. Its proponents deny evidence proportionism. They maintain that a person's subjective probability for a hypothesis merely reflects her degree of uncertainty about its truth; it need not be tied in any way to the amount of evidence she has in its favor.[15] It is likelihood ratios, not subjective probabilities, which capture the "scientifically meaningful" evidential relations. Here are two classic statements of the position.

All the information which the data provide concerning the relative merits of two hypotheses is contained in the likelihood ratio of the hypotheses on the data. (Edwards 1972, 30)

The ‘evidential meaning’ of experimental results is characterized fully by the likelihood function… Reports of experimental results in scientific journals should in principle be descriptions of likelihood functions. (Birnbaum 1962, 272)

On this view, everything that can be said about the evidential import of E for H is embodied in the following generalization of the weak likelihood principle:

The "Law of Likelihood". If H implies that the probability of E is x, while H* implies that the probability of E is x*, then E is evidence supporting H over H* if and only if x exceeds x*, and the likelihood ratio, x/x*, measures the strength of this support. (Hacking 1965, 106-109; Royall 1997, 3)
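Read as an algorithm, the Law of Likelihood is a one-liner. The sketch below (Python; the numbers are purely illustrative) takes the probabilities x and x* that the rival hypotheses assign to E and returns the strength of support:

```python
def law_of_likelihood(x, x_star):
    """Return the likelihood ratio x/x*.

    A value > 1 means E supports H over H*, a value < 1 means E supports
    H* over H, and the magnitude measures the strength of that support.
    """
    if x_star == 0:
        raise ValueError("H* must assign E a positive probability")
    return x / x_star

# E.g., if H says E has probability 0.5 while H* says 0.125,
# then E favors H over H* with strength 0.5 / 0.125 = 4.
print(law_of_likelihood(0.5, 0.125))  # 4.0
```

Note that nothing in the computation mentions P(H), P(H*), or P(E), which is exactly the feature likelihoodists prize.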

The biostatistician Richard Royall is a particularly lucid defender of likelihoodism (Royall 1997). He maintains that any scientifically respectable concept of evidence must analyze the evidential impact of E on H solely in terms of likelihoods; it should not advert to anyone's unconditional probabilities for E or H. This is supposed to be because likelihoods are both better known and more objective than unconditional probabilities. Royall argues strenuously against the idea that incremental evidence can be measured in terms of the disparity between unconditional and conditional probabilities. Here is the gist of his complaint:

Whereas [LR(H,H*;E)] measures the support for one hypothesis H relative to a specific alternative H*, without regard either to the prior probabilities of the two hypotheses or to what other hypotheses might also be considered, the law of changing probability [as measured by PR(H,E)] measures support for H relative to a specific prior distribution over H and its alternatives... The law of changing probability is of limited usefulness in scientific discourse because of its dependence on the prior probability distribution, which is generally unknown and/or personal. Although you and I agree (on the basis of the law of likelihood) that given evidence supports H over H*, and H** over both H and H*, we might disagree about whether it is evidence supporting H (on the basis of the law of changing probability) purely on the basis of our different judgments of the prior probability of H, H*, and H**. (Royall 1997, 10-11, with slight changes in notation)

Royall's point is that neither the probability ratio nor the probability difference will capture the sort of objective evidence required by science because their values depend on the "subjective" terms P(E) and P(H), and not just on the "objective" likelihoods PH(E) and P~H(E).

Whether one agrees with this assessment will be a matter of philosophical temperament, in particular of one's willingness to tolerate subjective probabilities in one's account of evidential relations. It will also depend crucially on the extent to which one is convinced that likelihoods are better known and more objective than ordinary subjective probabilities. Cases like the one envisioned in the law of likelihood, where hypotheses deductively entail a definite probability for the data, are relatively rare. So, unless one is willing to adopt a theory of evidence with a very restricted range of application, a great deal will turn on how easy it is to determine objective likelihoods in situations where the predictive connection from hypothesis to data is itself the result of inductive inferences. However one comes down on these issues, though, there is no denying that likelihood ratios will play a central role in any probabilistic account of evidence.

In fact, the weak likelihood principle (2.1e) encapsulates a minimal form of Bayesianism to which all parties can agree. This is clearest when it is restated in terms of likelihoods.

(2.1e) The Weak Likelihood Principle. (expressed in terms of likelihood ratios)
If LR(H,H*;E) ≥ 1 and LR(~H, ~H*; ~E) ≥ 1, with one inequality strict, then E provides more incremental evidence for H than for H* and ~E provides more incremental evidence for ~H than for ~H*.

Likelihoodists will endorse (2.1e) because the relationships described in its antecedent depend only on inverse probabilities. Proponents of both the "probability" and "odds" interpretations of total evidence will accept (2.1e) because satisfaction of its antecedent ensures that conditioning on E increases H's probability and its odds strictly more than those of H*. Indeed, the weak likelihood principle must be an integral part of any account of evidential relevance that deserves the title "Bayesian". To deny it is to misunderstand the central message of Bayes' Theorem for questions of evidence: namely, that hypotheses are confirmed by data they predict. As we shall see in the next section, this "minimal" form of Bayesianism figures importantly in subjectivist models of learning from experience.

3. The Role of Bayes' Theorem in Subjectivist Models of Learning

Subjectivists think of learning as a process of belief revision in which a "prior" subjective probability P is replaced by a "posterior" probability Q that incorporates newly acquired information. This process proceeds in two stages. First, some of the subject's probabilities are directly altered by experience, intuition, memory, or some other non-inferential learning process. Second, the subject "updates" the rest of her opinions to bring them into line with her newly acquired knowledge.

Many subjectivists are content to regard the initial belief changes as sui generis and independent of the believer's prior state of opinion. However, as long as the first phase of the learning process is understood to be non-inferential, subjectivism can be made compatible with an "externalist" epistemology that allows for criticism of belief changes in terms of the reliability of the causal processes that generate them. It can even accommodate the thought that the direct effect of experience might depend causally on the believer's prior probability.

Subjectivists have studied the second, inferential phase of the learning process in great detail. Here immediate belief changes are seen as imposing constraints of the form "the posterior probability Q has such-and-such properties." The objective is to discover what sorts of constraints experience tends to impose, and to explain how the person's prior opinions can be used to justify the choice of a posterior probability from among the many that might satisfy a given constraint. Subjectivists approach the latter problem by assuming that the agent is justified in adopting whatever eligible posterior departs minimally from her prior opinions. This is a kind of "no jumping to conclusions" requirement. We explain it here as a natural result of the idea that rational learners should proportion their beliefs to the strength of the evidence they acquire.

The simplest learning experiences are those in which the learner becomes certain of the truth of some proposition E about which she was previously uncertain. Here the constraint is that all hypotheses inconsistent with E must be assigned probability zero. Subjectivists model this sort of learning as simple conditioning, the process in which the prior probability of each proposition H is replaced by a posterior that coincides with the prior probability of H conditional on E.

(3.1)Simple Conditioning
If a person with a "prior" such that 0 < P(E) < 1 has a learning experience whose sole immediate effect is to raise her subjective probability for E to 1, then her post-learning "posterior" for any proposition H should be Q(H) = PE(H).

In short, a rational believer who learns for certain that E is true should factor this information into her doxastic system by conditioning on it.
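On a finite space of "worlds," simple conditioning just zeroes out the worlds where E fails and renormalizes the rest. The sketch below (Python) reprises the J. Doe mortality example from Section 1, with worlds classified by whether Doe was a senior and whether Doe died; the two non-senior probabilities are derived from the Section 1 figures (0.00873 − 0.00495 deaths among non-seniors, with the remainder surviving).

```python
def simple_condition(prior, e):
    """Simple conditioning: replace P by Q(.) = P(. | E).

    prior: dict mapping worlds to probabilities (summing to 1)
    e: set of worlds at which the learned proposition E is true
    """
    p_e = sum(p for w, p in prior.items() if w in e)
    if p_e == 0:
        raise ValueError("cannot condition on a probability-zero proposition")
    return {w: (p / p_e if w in e else 0.0) for w, p in prior.items()}

prior = {("senior", "died"): 0.00495, ("senior", "lived"): 0.05541,
         ("junior", "died"): 0.00378, ("junior", "lived"): 0.93586}
senior = {("senior", "died"), ("senior", "lived")}   # the proposition E
q = simple_condition(prior, senior)
print(round(q[("senior", "died")], 3))   # 0.082, the PE(H) of Section 1
```

Learning that Doe was a senior raises the probability of his or her death from 0.00873 to 0.082, exactly as the conditional-probability calculation in Section 1 requires.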

Though useful as an ideal, simple conditioning is not widely applicable because it requires the learner to become absolutely certain of E's truth. As Richard Jeffrey has argued (Jeffrey 1987), the evidence we receive is often too vague or ambiguous to justify such "dogmatism." On more realistic models, the direct effect of a learning experience will be to alter the subjective probability of some proposition without raising it to 1 or lowering it to 0. Experiences of this sort are appropriately modeled by what has come to be called Jeffrey conditioning (though Jeffrey's preferred term is "probability kinematics").

(3.2)Jeffrey Conditioning
If a person with a prior such that 0 < P(E) < 1 has a learning experience whose sole immediate effect is to change her subjective probability for E to q, then her post-learning posterior for any H should be Q(H) = qPE(H) + (1 − q)P~E(H).

Obviously, Jeffrey conditioning reduces to simple conditioning when q = 1.
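In the same finite-worlds sketch (Python; the four-world prior is illustrative, not from the article), Jeffrey conditioning mixes the two conditional distributions with weights q and 1 − q, and setting q = 1 visibly recovers simple conditioning:

```python
def jeffrey_condition(prior, e, q):
    """Jeffrey conditioning: Q(H) = q*PE(H) + (1 - q)*P~E(H).

    prior: dict mapping worlds to probabilities; e: worlds where E is true;
    q: the new subjective probability for E.
    """
    p_e = sum(p for w, p in prior.items() if w in e)
    if not 0 < p_e < 1:
        raise ValueError("prior must satisfy 0 < P(E) < 1")
    return {w: (q * p / p_e if w in e else (1 - q) * p / (1 - p_e))
            for w, p in prior.items()}

# A toy four-world prior with P(E) = 0.5; experience shifts P(E) to 0.8.
prior = {"HE": 0.2, "H~E": 0.1, "~HE": 0.3, "~H~E": 0.4}
post = jeffrey_condition(prior, {"HE", "~HE"}, 0.8)
print(round(post["HE"] + post["~HE"], 3))   # new probability of E: 0.8
```

With q = 1 the worlds outside E get probability zero and those inside are renormalized by 1/P(E), which is just simple conditioning on E.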

A variety of arguments for conditioning (simple or Jeffrey-style) can be found in the literature, but we cannot consider them here.[16] There is, however, one sort of justification in which Bayes' Theorem figures prominently. It exploits connections between belief revision and the notion of incremental evidence to show that conditioning is the only belief revision rule that allows learners to correctly proportion their posterior beliefs to the new evidence they receive.

The key to the argument lies in marrying the "minimal" version of Bayesianism expressed in (2.1e) to a very modest "proportioning" requirement for belief revision rules.

(3.3)The Weak Evidence Principle
If, relative to a prior P, E provides at least as much incremental evidence for H as for H*, and if H is antecedently more probable than H*, then H should remain more probable than H* after any learning experience whose sole immediate effect is to increase the probability of E.

This requires an agent to retain his views about the relative probability of two hypotheses when he acquires evidence that supports the more probable hypothesis more strongly. It rules out obviously irrational belief revisions such as this: George is more confident that the New York Yankees will win the American League Pennant than he is that the Boston Red Sox will win it, but he reverses himself when he learns (only) that the Yankees beat the Red Sox in last night's game.

Combining (3.3) with minimal Bayesianism yields the following:

(3.4)Consequence
If a person's prior is such that LR(H,H*;E) ≥ 1, LR(~H, ~H*; ~E) ≥ 1, and P(H) > P(H*), then any learning experience whose sole immediate effect is to raise her subjective probability for E should result in a posterior such that Q(H) > Q(H*).

On the reasonable assumption that Q is defined on the same set of propositions over which P is defined, this condition suffices to pick out simple conditioning as the unique correct method of belief revision for learning experiences that make E certain. It picks out Jeffrey conditioning as the unique correct method when learning merely alters one's subjective probability for E. The argument for these conclusions makes use of the following two facts about probabilities.

(3.5)Lemma
If H and H* both entail E and P(H) > P(H*), then LR(H,H*;E) = 1 and LR(~H, ~H*; ~E) > 1.
(3.6)Lemma
Simple conditioning on E is the only rule for revising subjective probabilities that yields a posterior with the following properties for any prior such that P(E) > 0:
  1. Q(E) = 1.
  2. Ordinal Similarity. If H and H* both entail E, then P(H) ≥ P(H*) if and only if Q(H) ≥ Q(H*).

From here the argument for simple conditioning is a matter of using (3.4) and (3.5) to establish ordinal similarity. Suppose that H and H* entail E and that P(H) > P(H*). It follows from (3.5) that LR(H,H*;E) = 1 and LR(~H, ~H*; ~E) > 1. (3.4) then entails that any learning experience that raises E's probability must result in a posterior with Q(H) > Q(H*). Thus, Q and P are ordinally similar with respect to hypotheses that entail E. If we go on to suppose that the learning experience raises E's probability to 1, then (3.6) guarantees that Q arises from P by simple conditioning on E.

The case for Jeffrey conditioning is similarly direct. Since the argument for ordinal similarity did not depend at all on the assumption that Q(E) = 1, we have really established

(3.7)Corollary
• If H and H* entail E, then P(H) > P(H*) if and only if Q(H) > Q(H*).
• If H and H* entail ~E, then P(H) > P(H*) if and only if Q(H) > Q(H*).

So, Q is ordinally similar to P both when restricted to hypotheses that entail E and when restricted to hypotheses that entail ~E. Moreover, since dividing by positive numbers does not disturb ordinal relationships, it also follows that QE is ordinally similar to P when restricted to hypotheses that entail E, and that Q~E is ordinally similar to P when restricted to hypotheses that entail ~E. Since QE(E) = 1 = Q~E(~E), (3.6) then entails:

(3.8)Consequence
For every proposition H, QE(H) = PE(H) and Q~E(H) = P~E(H).

It is easy to show that (3.8) is necessary and sufficient for Q to arise from P by Jeffrey conditioning on E. Subject to the constraint Q(E) = q, it guarantees that Q(H) = qPE(H) + (1 − q)P~E(H).
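A quick numeric check of (3.8) is possible with an illustrative four-world prior (the numbers are mine, not the article's): after a Jeffrey shift of E's probability to q, conditioning the posterior Q on E recovers the same conditional distribution as conditioning the prior P on E.

```python
# Worlds are labeled by whether H and E hold at them.
prior = {"HE": 0.2, "H~E": 0.1, "~HE": 0.3, "~H~E": 0.4}
e = {"HE", "~HE"}                      # worlds where E is true
p_e = sum(prior[w] for w in e)         # P(E) = 0.5
q = 0.8                                # new probability for E

# Jeffrey update: scale worlds inside E by q/P(E), outside by (1-q)/P(~E).
Q = {w: (q * p / p_e if w in e else (1 - q) * p / (1 - p_e))
     for w, p in prior.items()}

# (3.8): QE(H) = PE(H). Here H is true at worlds "HE" and "H~E".
Q_E_of_H = Q["HE"] / (Q["HE"] + Q["~HE"])   # Q(H | E)
P_E_of_H = prior["HE"] / p_e                # P(H | E)
print(abs(Q_E_of_H - P_E_of_H) < 1e-12)     # True
```

The same check run on ~E confirms the other half of (3.8), so the Jeffrey posterior preserves both families of conditional probabilities, as the text asserts.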

The general moral is clear.

The basic Bayesian insight embodied in the weak likelihood principle (2.1e) entails that simple and Jeffrey conditioning on E are the only rational ways to revise beliefs in response to a learning experience whose sole immediate effect is to alter E's probability.

While much more can be said about simple conditioning, Jeffrey conditioning and other forms of belief revision, these remarks should give the reader a sense of the importance of Bayes' Theorem in subjectivist accounts of learning and evidential support. Though a mathematical triviality, the Theorem's central insight — that a hypothesis is supported by any body of data it renders probable — lies at the heart of all subjectivist approaches to epistemology, statistics, and inductive logic.

Bibliography

  • Armendt, B. 1980. "Is There a Dutch Book Argument for Probability Kinematics?", Philosophy of Science 47, 583-588.
  • Bayes, T. 1764. "An Essay Toward Solving a Problem in the Doctrine of Chances", Philosophical Transactions of the Royal Society of London 53, 370-418. [Facsimile available online: the original essay with an introduction by his friend Richard Price]
  • Birnbaum, A. 1962. "On the Foundations of Statistical Inference", Journal of the American Statistical Association 57, 269-326.
  • Carnap, R. 1962. Logical Foundations of Probability, 2nd edition. Chicago: University of Chicago Press.
  • Chihara, C. 1987. "Some Problems for Bayesian Confirmation Theory", British Journal for the Philosophy of Science 38, 551-560.
  • Christensen, D. 1999. "Measuring Evidence", Journal of Philosophy 96, 437-61.
  • Dale, A. I. 1989. "Thomas Bayes: A Memorial", The Mathematical Intelligencer 11, 18-19.
  • ----- 1999. A History of Inverse Probability, 2nd edition. New York: Springer-Verlag.
  • Earman, J. 1992. Bayes or Bust? Cambridge, MA: MIT Press.
  • Edwards, A. W. F. 1972. Likelihood. Cambridge: Cambridge University Press.
  • Glymour, C. 1980. Theory and Evidence. Princeton: Princeton University Press.
  • Hacking, I. 1965. Logic of Statistical Inference. Cambridge: Cambridge University Press.
  • Hájek, A. 2003. "Interpretations of the Probability Calculus", in the Stanford Encyclopedia of Philosophy (Summer 2003 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/sum2003/entries/probability-interpret/>
  • Hammond, P. 1994. "Elementary Non-Archimedean Representations of Probability for Decision Theory and Games," in P. Humphreys, ed., Patrick Suppes: Scientific Philosopher, vol. 1. Dordrecht: Kluwer Publishers, 25-62.
  • Harper, W. 1976. "Rational Belief Change, Popper Functions and Counterfactuals," in W. Harper and C. Hooker, eds., Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, vol. I. Dordrecht: Reidel, 73-115.
  • Hartigan, J. A. 1983. Bayes Theory. New York: Springer-Verlag.
  • Howson, C. 1985. "Some Recent Objections to the Bayesian Theory of Support", British Journal for the Philosophy of Science 36, 305-309.
  • Jeffrey, R. 1987. "Alias Smith and Jones: The Testimony of the Senses", Erkenntnis 26, 391-399.
  • ----- 1992. Probability and the Art of Judgment. New York: Cambridge University Press.
  • Joyce, J. M. 1999. The Foundations of Causal Decision Theory. New York: Cambridge University Press.
  • Kahneman, D. and Tversky, A. 1973. "On the Psychology of Prediction", Psychological Review 80, 237-251.
  • Kaplan, M. 1996. Decision Theory as Philosophy. Cambridge: Cambridge University Press.
  • Levi, I. 1985. "Imprecision and Indeterminacy in Probability Judgment", Philosophy of Science 53, 390-409.
  • Maher, P. 1996. "Subjective and Objective Confirmation", Philosophy of Science 63, 149-174.
  • McGee, V. 1994. "Learning the Impossible," in E. Eells and B. Skyrms, eds., Probability and Conditionals. New York: Cambridge University Press, 179-200.
  • Mortimer, H. 1988. The Logic of Induction, Ellis Horwood Series in Artificial Intelligence. New York: Halsted Press.
  • Nozick, R. 1981. Philosophical Explanations. Cambridge: Harvard University Press.
  • Renyi, A. 1955. "On a New Axiomatic Theory of Probability", Acta Mathematica Academiae Scientiarum Hungaricae 6, 285-335.
  • Royall, R. 1997. Statistical Evidence: A Likelihood Paradigm. New York: Chapman & Hall/CRC.
  • Skyrms, B. 1987. "Dynamic Coherence and Probability Kinematics", Philosophy of Science 54, 1-20.
  • Sober, E. 2002. "Bayesianism — its Scope and Limits", in Swinburne (2002), 21-38.
  • Spohn, W. 1986. "The Representation of Popper Measures", Topoi 5, 69-74.
  • Stigler, S. M. 1982. "Thomas Bayes' Bayesian Inference", Journal of the Royal Statistical Society, series A, 145, 250-258.
  • Swinburne, R. 2002. Bayes' Theorem. Oxford: Oxford University Press (published for the British Academy).
  • Talbott, W. 2001. "Bayesian Epistemology", in the Stanford Encyclopedia of Philosophy (Fall 2001 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/fall2001/entries/epistemology-bayesian/>
  • Teller, P. 1976. "Conditionalization, Observation, and Change of Preference", in W. Harper and C. A. Hooker, eds., Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science. Dordrecht: D. Reidel.
  • Van Fraassen, B. 1999. "A New Argument for Conditionalization", Topoi 18, 93-96.
  • Williamson, T. 2000. Knowledge and its Limits. Oxford: Oxford University Press.


Copyright © 2003 by
James Joyce <jjoyce@umich.edu>
