An inductive logic is a logic of evidential support. In a deductive logic, the premises of a valid deductive argument logically entail the conclusion, where logical entailment means that every logically possible state of affairs that makes the premises true must make the conclusion true as well. Thus, the premises of a valid deductive argument provide total support for the conclusion. An inductive logic extends this idea to weaker arguments. In a good inductive argument, the truth of the premises provides some degree of support for the truth of the conclusion, where this degree-of-support might be measured via some numerical scale. By analogy with the notion of deductive entailment, the notion of inductive degree-of-support might mean something like this: among the logically possible states of affairs that make the premises true, the conclusion must be true in (at least) proportion \(r\) of them—where \(r\) is some numerical measure of the support strength.
If a logic of good inductive arguments is to be of any real value, the measure of support it articulates should be up to the task. Presumably, the logic should at least satisfy the following condition:
Criterion of Adequacy (CoA):
The logic should make it likely (as a matter of logic) that as evidence accumulates, the total body of true evidence claims will eventually come to indicate, via the logic’s measure of support, that false hypotheses are probably false and that true hypotheses are probably true.
The CoA stated here may strike some readers as surprisingly strong. Given a specific logic of evidential support, how might it be shown to satisfy such a condition? Section 4 will show precisely how this condition is satisfied by the logic of evidential support articulated in Sections 1 through 3 of this article.
This article will focus on the kind of approach to inductive logic most widely studied by epistemologists and logicians in recent years. This approach employs conditional probability functions to represent measures of the degree to which evidence statements support hypotheses. Presumably, hypotheses should be empirically evaluated based on what they say (or imply) about the likelihood that evidence claims will be true. A straightforward theorem of probability theory, called Bayes’ Theorem, articulates the way in which what hypotheses say about the likelihoods of evidence claims influences the degree to which hypotheses are supported by those evidence claims. Thus, this approach to the logic of evidential support is often called a Bayesian Inductive Logic or a Bayesian Confirmation Theory. This article will first provide a detailed explication of a Bayesian approach to inductive logic. It will then examine the extent to which this logic may pass muster as an adequate logic of evidential support for hypotheses. In particular, we will see how such a logic may be shown to satisfy the Criterion of Adequacy stated above.
Sections 1 through 3 present all of the main ideas underlying the (Bayesian) probabilistic logic of evidential support. These three sections should suffice to provide an adequate understanding of the subject. Section 5 extends this account to cases where the implications of hypotheses about evidence claims (called likelihoods) are vague or imprecise. After reading Sections 1 through 3, the reader may safely skip directly to Section 5, bypassing the rather technical account in Section 4 of how the CoA is satisfied.
Section 4 is for the more advanced reader who wants an understanding of how this logic may bring about convergence to the true hypothesis as evidence accumulates. This result shows that the Criterion of Adequacy is indeed satisfied—that as evidence accumulates, false hypotheses will very probably come to have evidential support values (as measured by their posterior probabilities) that approach 0; and as this happens, a true hypothesis may very probably acquire evidential support values (as measured by its posterior probability) that approach 1.
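To give a feel for this convergence result, here is a toy numerical sketch of sequential Bayesian updating on accumulating evidence. The two rival hypotheses, the outcome probabilities, and the sample size are hypothetical choices made purely for illustration; they are not part of the theorem itself.

```python
import random

def update_posterior(prior, data, p_true, p_alt):
    """Sequentially update the posterior probability of hypothesis h on a
    stream of binary outcomes via Bayes' theorem, where h says each outcome
    is 1 with probability p_true, and the rival hypothesis says each
    outcome is 1 with probability p_alt."""
    post = prior
    for x in data:
        like_h = p_true if x else 1 - p_true
        like_alt = p_alt if x else 1 - p_alt
        post = like_h * post / (like_h * post + like_alt * (1 - post))
    return post

random.seed(0)
# Evidence generated according to the "true" hypothesis (1 w.p. 0.7).
data = [1 if random.random() < 0.7 else 0 for _ in range(500)]
print(update_posterior(0.5, data, p_true=0.7, p_alt=0.5))  # very nearly 1
```

As the evidence stream grows, the likelihood ratio in favor of the hypothesis that actually generated the data grows exponentially, so the false rival’s posterior is driven toward 0 and the true hypothesis’s posterior toward 1, just as the CoA requires.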
Let us begin by considering some common kinds of examples of inductive arguments. Consider the following two arguments:
Example 1. Every raven in a random sample of 3200 ravens is black. This strongly supports the following conclusion: All ravens are black.
Example 2. 62 percent of voters in a random sample of 400 registered voters (polled on February 20, 2004) said that they favor John Kerry over George W. Bush for President in the 2004 Presidential election. This supports with a probability of at least .95 the following conclusion: Between 57 percent and 67 percent of all registered voters favor Kerry over Bush for President (at or around the time the poll was taken).
This kind of argument is often called an induction by enumeration. It is closely related to the technique of statistical estimation. We may represent the logical form of such arguments semi-formally as follows:
Premise: In random sample \(S\) consisting of \(n\) members of population \(B\), the proportion of members that have attribute \(A\) is \(r\).
Therefore, with degree of support \(p\),
Conclusion: The proportion of all members of \(B\) that have attribute \(A\) is between \(r-q\) and \(r+q\) (i.e., lies within margin of error \(q\) of \(r\)).
Let’s lay out this argument more formally. The premise breaks down into three separate statements:[1]
| | Semi-formalization | Formalization |
| Premise 1 | The frequency (or proportion) of members with attribute \(A\) among the members of \(S\) is \(r\). | \(F[A,S] = r\) |
| Premise 2 | \(S\) is a random sample of \(B\) with respect to whether or not its members have \(A\). | Rnd[\(S,B,A\)] |
| Premise 3 | Sample \(S\) has exactly \(n\) members. | Size[\(S\)] \(= n\) |
| Therefore | with degree of support \(p\) | \(========\{p\}\) |
| Conclusion | The proportion of members of \(B\) that have attribute \(A\) lies between \(r-q\) and \(r+q\) (i.e., lies within margin of error \(q\) of \(r\)). | \(F[A,B] = r \pm q\) |
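For readers curious where the numbers in Example 2 come from, the usual normal approximation for sampling proportions relates the sample size \(n\), the margin of error \(q\), and the degree of support \(p\). This is a standard statistical sketch, not part of the inductive logic itself; the function name and the choice of \(z = 1.96\) for the .95 support level are illustrative assumptions.

```python
import math

def margin_of_error(r, n, z=1.96):
    """Approximate margin of error q for an observed sample proportion r
    from a random sample of size n, at roughly the .95 support level
    (z = 1.96 under the normal approximation)."""
    return z * math.sqrt(r * (1 - r) / n)

# Example 2: 62 percent of a random sample of 400 registered voters.
q = margin_of_error(0.62, 400)
print(round(q, 3))  # about 0.048, i.e., roughly a 5-point margin
```

A margin of about five percentage points around the observed 62 percent yields the 57-to-67-percent interval of Example 2, at roughly the .95 level.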
Any inductive logic that treats such arguments should address two challenges. (1) It should tell us which enumerative inductive arguments should count as good inductive arguments. In particular, it should tell us how to determine the appropriate degree \(p\) to which such premises inductively support the conclusion, for a given margin of error \(q\). (2) It should demonstrably satisfy the CoA. That is, it should be provable (as a metatheorem) that if a conclusion expressing the approximate proportion for an attribute in a population is true, then it is very likely that sufficiently numerous random samples of the population will provide true premises for good inductive arguments that confer degrees of support \(p\) approaching 1 for that true conclusion—where, on pain of triviality, these sufficiently numerous samples are only a tiny fraction of a large population. The supplement on Enumerative Inductions: Bayesian Estimation and Convergence shows precisely how a Bayesian account of enumerative induction may meet these two challenges.
Enumerative induction is, however, rather limited in scope. This form of induction is only applicable to the support of claims involving simple universal conditionals (i.e., claims of form ‘All \(B\)s are \(A\)s’) and claims about the proportion of an attribute in a population (i.e., claims of form ‘the frequency of \(A\)s among the \(B\)s is \(r\)’). But many important empirical hypotheses are not reducible to this simple form, and the evidence for these hypotheses is not composed of an enumeration of such instances. Consider, for example, the Newtonian Theory of Mechanics:
All objects remain at rest or in uniform motion unless acted upon by some external force. An object’s acceleration (i.e., the rate at which its motion changes from rest or from uniform motion) is in the same direction as the force exerted on it; and the rate at which the object accelerates due to a force is equal to the magnitude of the force divided by the object’s mass. If an object exerts a force on another object, the second object exerts an equal amount of force on the first object, but in the opposite direction to the force exerted by the first object.
The evidence for (and against) this theory is not gotten by examining a randomly selected subset of objects and the forces acting upon them. Rather, the theory is tested by calculating what this theory says (or implies) about observable phenomena in a wide variety of specific situations—e.g., ranging from simple collisions between small bodies to the trajectories of planets and comets—and then seeing whether those phenomena occur in the way that the theory says they will. This approach to testing hypotheses and theories is ubiquitous, and should be captured by an adequate inductive logic.
More generally, for a wide range of cases where inductive reasoning is important, enumerative induction is inadequate. Rather, the kind of evidential reasoning that judges the likely truth of hypotheses on the basis of what they say (or imply) about the evidence is more appropriate. Consider the kinds of inferences jury members are supposed to make, based on the evidence presented at a murder trial. The inference to probable guilt or innocence is based on a patchwork of evidence of various kinds. It almost never involves consideration of a randomly selected sequence of past situations when people like the accused committed similar murders. Or, consider how a doctor diagnoses her patient on the basis of his symptoms. Although the frequency of occurrence of various diseases when similar symptoms have been present may play a role, this is clearly not the whole story. Diagnosticians commonly employ a form of hypothesis evaluation—e.g., would the hypothesis that the patient has a brain tumor account for his symptoms? Are these symptoms more likely the result of a minor stroke? Or may some other hypothesis better account for the patient’s symptoms? Thus, a fully adequate account of inductive logic should explicate the logic of hypothesis evaluation, through which a hypothesis or theory may be tested on the basis of what it says (or “predicts”) about observable phenomena. In Section 3 we will see how a kind of probabilistic inductive logic called “Bayesian Inference” or “Bayesian Confirmation Theory” captures such reasoning. The full logical structure of such arguments will be spelled out in that section.
Perhaps the oldest and best understood way of representing partial belief, uncertain inference, and inductive support is in terms of probability and the equivalent notion of odds. Mathematicians have studied probability for over 350 years, but the concept is certainly much older. In recent times a number of other, related representations of partial belief and uncertain inference have emerged. Some of these approaches have found useful application in computer based artificial intelligence systems that perform inductive inferences in expert domains such as medical diagnosis. Nevertheless, probabilistic representations have predominated in such application domains. So, in this article we will focus exclusively on probabilistic representations of inductive support. A brief comparative description of some of the most prominent alternative representations of uncertainty and support-strength can be found in Supplement: Some Prominent Approaches to the Representation of Uncertain Inference.
The mathematical study of probability originated with Blaise Pascal and Pierre de Fermat in the mid-17th century. From that time through the early 19th century, as the mathematical theory continued to develop, probability theory was primarily applied to the assessment of risk in games of chance and to drawing simple statistical inferences about characteristics of large populations—e.g., to compute appropriate life insurance premiums based on mortality rates. In the early 19th century Pierre de Laplace made further theoretical advances and showed how to apply probabilistic reasoning to a much wider range of scientific and practical problems. Since that time probability has become an indispensable tool in the sciences, business, and many other areas of modern life.
Throughout the development of probability theory various researchers appear to have thought of it as a kind of logic. But the first extended treatment of probability as an explicit part of logic was George Boole’s The Laws of Thought (1854). John Venn followed two decades later with an alternative empirical frequentist account of probability in The Logic of Chance (1876). Not long after that the whole discipline of logic was transformed by new developments in deductive logic.
In the late 19th and early 20th century Frege, followed by Russell and Whitehead, showed how deductive logic may be represented in the kind of rigorous formal system we now call quantified predicate logic. For the first time logicians had a fully formal deductive logic powerful enough to represent all valid deductive arguments that arise in mathematics and the sciences. In this logic the validity of deductive arguments depends only on the logical structure of the sentences involved. This development in deductive logic spurred some logicians to attempt to apply a similar approach to inductive reasoning. The idea was to extend the deductive entailment relation to a notion of probabilistic entailment for cases where premises provide less than conclusive support for conclusions. These partial entailments are expressed in terms of conditional probabilities, probabilities of the form \(P[C \pmid B] = r\) (read “the probability of \(C\) given \(B\) is \(r\)”), where \(P\) is a probability function, \(C\) is a conclusion sentence, \(B\) is a conjunction of premise sentences, and \(r\) is the probabilistic degree of support that premises \(B\) provide for conclusion \(C\). Attempts to develop such a logic vary somewhat with regard to the ways in which they attempt to emulate the paradigm of formal deductive logic.
Some inductive logicians have tried to follow the deductive paradigm by attempting to specify inductive support probabilities solely in terms of the syntactic structures of premise and conclusion sentences. In deductive logic the syntactic structure of the sentences involved completely determines whether premises logically entail a conclusion. So these inductive logicians have attempted to follow suit. In such a system each sentence confers a syntactically specified degree of support on each of the other sentences of the language. Thus, the inductive probabilities in such a system are logical in the sense that they depend on syntactic structure alone. This kind of conception was articulated to some extent by John Maynard Keynes in his Treatise on Probability (1921). Rudolf Carnap pursued this idea with greater rigor in his Logical Foundations of Probability (1950) and in several subsequent works (e.g., Carnap 1952). (For details of Carnap’s approach see the section on logical probability in the entry on interpretations of the probability calculus, in this Encyclopedia.)
In the inductive logics of Keynes and Carnap, Bayes’ theorem, a straightforward theorem of probability theory, plays a central role in expressing how evidence comes to bear on hypotheses. Bayes’ theorem expresses how the probability of a hypothesis \(h\) on the evidence \(e\), \(P[h \pmid e]\), depends on the probability that \(e\) should occur if \(h\) is true, \(P[e \pmid h]\), and on the probability of hypothesis \(h\) prior to taking the evidence into account, \(P[h]\) (called the prior probability of \(h\)). (Later we’ll examine Bayes’ theorem in detail.) So, such approaches might well be called Bayesian logicist inductive logics. Other prominent Bayesian logicist attempts to develop a probabilistic inductive logic include the works of Jeffreys (1939), Jaynes (1968), and Rosenkrantz (1981).
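Bayes’ theorem itself is easy to illustrate numerically. The sketch below computes a posterior \(P[h \pmid e]\) from a prior \(P[h]\) and the two likelihoods \(P[e \pmid h]\) and \(P[e \pmid {\nsim}h]\), using the theorem of total probability for \(P[e]\); all of the numbers are hypothetical.

```python
def bayes_posterior(prior_h, like_e_given_h, like_e_given_not_h):
    """Posterior P[h|e] via Bayes' theorem, for a hypothesis h and its
    negation ~h, computing P[e] by the theorem of total probability:
    P[e] = P[e|h]P[h] + P[e|~h]P[~h]."""
    p_e = like_e_given_h * prior_h + like_e_given_not_h * (1 - prior_h)
    return like_e_given_h * prior_h / p_e

# Hypothetical values: P[h] = 0.3, P[e|h] = 0.9, P[e|~h] = 0.2.
posterior = bayes_posterior(0.3, 0.9, 0.2)
print(round(posterior, 3))  # 0.27 / (0.27 + 0.14), about 0.659
```

Notice how the evidence raises the hypothesis’s probability from a prior of 0.3 to a posterior of roughly 0.66 precisely because \(h\) makes the evidence much more likely than its negation does.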
It is now widely held that the core idea of this syntactic approach to Bayesian logicism is fatally flawed—that syntactic logical structure cannot be the sole determiner of the degree to which premises inductively support conclusions. A crucial facet of the problem faced by syntactic Bayesian logicism involves how the logic is supposed to apply in scientific contexts where the conclusion sentence is some scientific hypothesis or theory, and the premises are evidence claims. The difficulty is that in any probabilistic logic that satisfies the usual axioms for probabilities, the inductive support for a hypothesis must depend in part on its prior probability. This prior probability represents (arguably) how plausible the hypothesis is taken to be on the basis of considerations other than the observational and experimental evidence (e.g., perhaps due to various plausibility arguments). A syntactic Bayesian logicist must tell us how to assign values to these pre-evidential prior probabilities of hypotheses in a way that relies only on the syntactic logical structure of the hypothesis, perhaps based on some measure of syntactic simplicity. There are severe problems with getting this idea to work. Various kinds of examples seem to show that such an approach must assign intuitively quite unreasonable prior probabilities to hypotheses in specific cases (see the footnote cited near the end of Section 3.2 for details). Furthermore, for this idea to apply to the evidential support of real scientific theories, scientists would have to formalize theories in a way that makes their relevant syntactic structures apparent, and then evaluate theories solely on that syntactic basis (together with their syntactic relationships to evidence statements). Are we to evaluate alternative theories of gravitation, and alternative quantum theories, this way? This seems an extremely dubious approach to the evaluation of real scientific hypotheses and theories.
Thus, it seems that logical structure alone may not suffice for the inductive evaluation of scientific hypotheses. (This issue will be treated in more detail in Section 3, after we first see how probabilistic logics employ Bayes’ theorem to represent the evidential support for hypotheses as a function of prior probabilities together with evidential likelihoods.)
At about the time that the syntactic Bayesian logicist idea was developing, an alternative conception of probabilistic inductive reasoning was also emerging. This approach is now generally referred to as the Bayesian subjectivist or personalist approach to inductive reasoning (see, e.g., Ramsey 1926; De Finetti 1937; Savage 1954; Edwards, Lindman, & Savage 1963; Jeffrey 1983, 1992; Howson & Urbach 1993; Joyce 1999). This approach treats inductive probability as a measure of an agent’s degree-of-belief that a hypothesis is true, given the truth of the evidence. This approach was originally developed as part of a larger normative theory of belief and action known as Bayesian decision theory. The principal idea is that the strength of an agent’s desires for various possible outcomes should combine with her belief-strengths regarding claims about the world to produce optimally rational decisions. Bayesian subjectivists provide a logic of decision that captures this idea, and they attempt to justify this logic by showing that in principle it leads to optimal decisions about which of various risky alternatives should be pursued. On the Bayesian subjectivist or personalist account of inductive probability, inductive probability functions represent the subjective (or personal) belief-strengths of ideally rational agents, the kind of belief strengths that figure into rational decision making. (See the section on subjective probability in the entry on interpretations of the probability calculus, in this Encyclopedia.)
Elements of a logicist conception of inductive logic live on today as part of the general approach called Bayesian inductive logic. However, among philosophers and statisticians the term ‘Bayesian’ is now most closely associated with the subjectivist or personalist account of belief and decision. And the term ‘Bayesian inductive logic’ has come to carry the connotation of a logic that involves purely subjective probabilities. This usage is misleading since, for inductive logics, the Bayesian/non-Bayesian distinction should really turn on whether the logic gives Bayes’ theorem a prominent role, or the approach largely eschews the use of Bayes’ theorem in inductive inferences, as do the classical approaches to statistical inference developed by R. A. Fisher (1922) and by Neyman & Pearson (1967). Indeed, any inductive logic that employs the same probability functions to represent both the probabilities of evidence claims due to hypotheses and the probabilities of hypotheses due to those evidence claims must be a Bayesian inductive logic in this broader sense; because Bayes’ theorem follows directly from the axioms that each probability function must satisfy, and Bayes’ theorem expresses a necessary connection between the probabilities of evidence claims due to hypotheses and the probabilities of hypotheses due to those evidence claims.
In this article the probabilistic inductive logic we will examine is a Bayesian inductive logic in this broader sense. This logic will not presuppose the subjectivist Bayesian theory of belief and decision, and will avoid the objectionable features of the syntactic version of Bayesian logicism. We will see that there are good reasons to distinguish inductive probabilities from degree-of-belief probabilities and from purely syntactic logical probabilities. So, the probabilistic logic articulated in this article will be presented in a way that depends on neither of these conceptions of what the probability functions are. However, this version of the logic will be general enough that it may be fitted to a Bayesian subjectivist or Bayesian syntactic-logicist program, if one desires to do that.
All logics derive from the meanings of terms in sentences. What we now recognize as formal deductive logic rests on the meanings (i.e., the truth-functional properties) of the standard logical terms. These logical terms, and the symbols we will employ to represent them, are as follows:
The meanings of all other terms, the non-logical terms such as names and predicate and relational expressions, are permitted to “float free”. That is, the logical validity of deductive arguments depends neither on the meanings of the name and predicate and relation terms, nor on the truth-values of sentences containing them. It merely supposes that these non-logical terms are meaningful, and that sentences containing them have truth-values. Deductive logic then tells us that the logical structures of some sentences—i.e., the syntactic arrangements of their logical terms—preclude them from being jointly true of any possible state of affairs. This is the notion of logical inconsistency. The notion of logical entailment is inter-definable with it. A collection of premise sentences logically entails a conclusion sentence just when the negation of the conclusion is logically inconsistent with those premises.
An inductive logic must, it seems, deviate from the paradigm provided by deductive logic in several significant ways. For one thing, logical entailment is an absolute, all-or-nothing relationship between sentences, whereas inductive support comes in degrees-of-strength. For another, although the notion of inductive support is analogous to the deductive notion of logical entailment, and is arguably an extension of it, there seems to be no inductive logic extension of the notion of logical inconsistency—at least none that is inter-definable with inductive support in the way that logical inconsistency is inter-definable with logical entailment. Indeed, it turns out that when the unconditional probability of \((B\cdot{\nsim}A)\) is very nearly 0 (i.e., when \((B\cdot{\nsim}A)\) is “nearly inconsistent”), the degree to which \(B\) inductively supports \(A\), \(P[A \pmid B]\), may range anywhere between 0 and 1.
Another notable difference is that when \(B\) logically entails \(A\), adding a premise \(C\) cannot undermine the logical entailment—i.e., \((C\cdot B)\) must logically entail \(A\) as well. This property of logical entailment is called monotonicity. But inductive support is nonmonotonic. In general, depending on what \(A, B\), and \(C\) mean, adding a premise \(C\) to \(B\) may substantially raise the degree of support for \(A\), or may substantially lower it, or may leave it completely unchanged—i.e., \(P[A \pmid (C\cdot B)]\) may have a value much larger than \(P[A \pmid B]\), or may have a much smaller value, or it may have the same, or nearly the same value as \(P[A \pmid B]\).
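Nonmonotonicity is easy to exhibit on a toy probability measure. In the hypothetical example below (the propositions and weights are illustrative choices, not drawn from the text), \(B\) strongly supports \(A\), yet adding the premise \(C\) drives the support down to 0 (the familiar pattern with \(A\) = Tweety flies, \(B\) = Tweety is a bird, \(C\) = Tweety is a penguin).

```python
def P(event, given, worlds):
    """Conditional probability of `event` given `given`, computed over a
    list of (valuation, weight) pairs representing possible worlds."""
    joint = sum(w for v, w in worlds if event(v) and given(v))
    cond = sum(w for v, w in worlds if given(v))
    return joint / cond

# A hypothetical weighted space of worlds; keys are atomic propositions.
worlds = [
    ({'A': True,  'B': True,  'C': False}, 0.40),  # flying non-penguin bird
    ({'A': False, 'B': True,  'C': False}, 0.05),
    ({'A': False, 'B': True,  'C': True},  0.10),  # penguin: bird, flightless
    ({'A': True,  'B': False, 'C': False}, 0.05),
    ({'A': False, 'B': False, 'C': False}, 0.40),
]

p_A_given_B = P(lambda v: v['A'], lambda v: v['B'], worlds)
p_A_given_BC = P(lambda v: v['A'], lambda v: v['B'] and v['C'], worlds)
print(round(p_A_given_B, 3), round(p_A_given_BC, 3))  # 0.727 0.0
```

Here \(P[A \pmid B] \approx 0.73\) while \(P[A \pmid (C\cdot B)] = 0\): the added premise destroys the support rather than preserving it, which is exactly what monotonic deductive entailment forbids.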
In a formal treatment of probabilistic inductive logic, inductive support is represented by conditional probability functions defined on sentences of a formal language \(L\). These conditional probability functions are constrained by certain rules or axioms that are sensitive to the meanings of the logical terms (i.e., ‘not’, ‘and’, ‘or’, etc., the quantifiers ‘all’ and ‘some’, and the identity relation). The axioms apply without regard for what the other terms of the language may mean. In essence the axioms specify a family of possible support functions, \(\{P_{\beta}, P_{\gamma}, \ldots, P_{\delta}, \ldots \}\) for a given language \(L\). Although each support function satisfies these same axioms, the further issue of which among them provides an appropriate measure of inductive support is not settled by the axioms alone. That may depend on additional factors, such as the meanings of the non-logical terms (i.e., the names and predicate expressions) of the language.
A good way to specify the axioms of the logic of inductive support functions is as follows. These axioms are apparently weaker than the usual axioms for conditional probabilities. For instance, the usual axioms assume that conditional probability values are restricted to real numbers between 0 and 1. The following axioms do not assume this, but only that support functions assign some real numbers as values for support strengths. However, it turns out that the following axioms suffice to derive all the usual axioms for conditional probabilities (including the usual restriction to values between 0 and 1). We draw on these weaker axioms only to forestall some concerns about whether the support function axioms may assume too much, or may be overly restrictive.
Let \(L\) be a language for predicate logic with identity, and let ‘\(\vDash\)’ be the standard logical entailment relation—i.e., the expression ‘\(B \vDash A\)’ says “\(B\) logically entails \(A\)” and the expression ‘\(\vDash A\)’ says “\(A\) is a tautology”. A support function is a function \(P_{\alpha}\) from pairs of sentences of \(L\) to real numbers that satisfies the following axioms:
For all sentences \(A\), \(B\), \(C\), and \(D\):
This axiomatization takes conditional probability as basic, as seems appropriate for evidential support functions. (These functions agree with the more usual unconditional probability functions when the latter are defined—just let \(P_{\alpha}[A] = P_{\alpha}[A \pmid (D \vee{\nsim}D)]\). However, these axioms permit conditional probabilities \(P_{\alpha}[A \pmid C]\) to remain defined even when condition statement \(C\) has probability 0—i.e., even when \(P_{\alpha}[C \pmid (D\vee{\nsim}D)] = 0\).)
Notice that conditional probability functions apply only to pairs of sentences, a conclusion sentence and a premise sentence. So, in probabilistic inductive logic we represent finite collections of premises by conjoining them into a single sentence. Rather than say,
\(A\) is supported to degree \(r\) by the set of premises \(\{B_1\), \(B_2\), \(B_3\), …, \(B_n\}\),
we instead say that
\(A\) is supported to degree \(r\) by the conjunctive premise \((((B_1\cdot B_2)\cdot B_3)\cdot \ldots \cdot B_n)\),
and write this as
\[P[A \pmid (((B_1\cdot B_2)\cdot B_3)\cdot \ldots \cdot B_n)] = r.\]
The above axioms are quite weak. For instance, they do not say that logically equivalent sentences are supported by all other sentences to the same degree; rather, that result is derivable from these axioms (see result 6 below). Nor do these axioms say that logically equivalent sentences support all other sentences to the same degree; rather, that result is also derivable (see result 8 below). Indeed, from these axioms all of the usual theorems of probability theory may be derived. The following results are particularly useful in probabilistic logic. Their derivations from these axioms are provided in note 2.[2]
Let us now briefly consider each axiom to see how plausible it is as a constraint on a quantitative measure of inductive support, and how it extends the notion of deductive entailment. First notice that each degree-of-support function \(P_{\alpha}\) on \(L\) measures support strength with some real number values, but the axioms don’t explicitly restrict these values to lie between 0 and 1. It turns out that all support values must lie between 0 and 1, but this follows from the axioms, rather than being assumed by them. The scaling of inductive support via the real numbers is surely a reasonable way to go.
Axiom 1 is a non-triviality requirement. It says that the support values cannot be the same for all sentence pairs. This axiom merely rules out the trivial support function that assigns the same amount of support to each sentence by every sentence. One might replace this axiom with the following rule:
\[P_{\alpha}[(A\vee{\nsim}A) \pmid (A\vee{\nsim}A)] \ne P_{\alpha}[(A\cdot{\nsim}A) \pmid (A\vee{\nsim}A)].\]
But this alternative rule turns out to be derivable from axiom 1 together with the other axioms.
Axiom 2 asserts that when \(B\) logically entails \(A\), the support of \(A\) by \(B\) is as strong as support can possibly be. This comports with the idea that an inductive support function is a generalization of the deductive entailment relation, where the premises of deductive entailments provide the strongest possible support for their conclusions.
Axiom 3 merely says that \((B \cdot C)\) supports sentences to precisely the same degree that \((C \cdot B)\) supports them. This is an especially weak axiom. But taken together with the other axioms, it suffices to entail that logically equivalent sentences support all sentences to precisely the same degree.
Axiom 4 says that inductive support adds up in a plausible way. When \(C\) logically entails the incompatibility of \(A\) and \(B\), i.e., when no possible state of affairs can make both \(A\) and \(B\) true together, the degrees of support that \(C\) provides to each of them individually must sum to the support it provides to their disjunction. The only exception is in those cases where \(C\) acts like a logical contradiction and supports all sentences to the maximum possible degree (in deductive logic a logical contradiction logically entails every sentence).
To understand what axiom 5 says, think of a support function \(P_{\alpha}\) as describing a measure on possible states of affairs. Read each degree-of-support expression of form ‘\(P_{\alpha}[D \pmid E] = r\)’ to say that the proportion of states of affairs in which \(D\) is true among those states of affairs where \(E\) is true is \(r\). Read this way, axiom 5 then says the following. Suppose \(B\) is true in proportion \(q\) of all the states of affairs where \(C\) is true, and suppose \(A\) is true in fraction \(r\) of those states where \(B\) and \(C\) are true together. Then \(A\) and \(B\) should be true together in what proportion of all the states where \(C\) is true? In fraction \(r\) (the \((A\cdot B)\) part) of proportion \(q\) (the \(B\) portion) of all those states where \(C\) is true.
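Read as a measure over states of affairs in this way, axiom 5 is just the product rule, \(P_{\alpha}[(A\cdot B) \pmid C] = P_{\alpha}[A \pmid (B\cdot C)] \times P_{\alpha}[B \pmid C]\), and it can be checked exactly on any toy measure. The weighted states below are a hypothetical example; exact rational arithmetic keeps the identity exact rather than approximate.

```python
from fractions import Fraction as F

# A hypothetical measure over the eight truth-value assignments to (A, B, C).
states = [
    ((True,  True,  True),  F(2, 16)),
    ((True,  True,  False), F(1, 16)),
    ((True,  False, True),  F(3, 16)),
    ((True,  False, False), F(1, 16)),
    ((False, True,  True),  F(4, 16)),
    ((False, True,  False), F(2, 16)),
    ((False, False, True),  F(1, 16)),
    ((False, False, False), F(2, 16)),
]

def P(event, given):
    """Proportion of the measure where `event` holds, among states where
    `given` holds."""
    top = sum(w for s, w in states if event(s) and given(s))
    bot = sum(w for s, w in states if given(s))
    return top / bot

A = lambda s: s[0]
B = lambda s: s[1]
C = lambda s: s[2]

lhs = P(lambda s: A(s) and B(s), C)            # P[(A·B) | C]
rhs = P(A, lambda s: B(s) and C(s)) * P(B, C)  # P[A | (B·C)] × P[B | C]
print(lhs == rhs)  # True: the product rule holds exactly
```

On this measure \(P[B \pmid C] = 3/5\) and \(P[A \pmid (B\cdot C)] = 1/3\), so both sides come out to \(1/5\), just as axiom 5 requires.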
The degree to which a sentence \(B\) supports a sentence \(A\) may well depend on what these sentences mean. In particular it will usually depend on the meanings we associate with the non-logical terms (those terms other than the logical terms ‘not’, ‘and’, ‘or’, etc., the quantifiers, and identity), that is, on the meanings of the names, and the predicate and relation terms of the language. For example, we should want
\[P_{\alpha}[\textrm{George is not married} \pmid \textrm{George is a bachelor}] = 1,\]
given the usual meanings of ‘bachelor’ and ‘married’, since “all bachelors are unmarried” is analytically true—i.e. no empirical evidence is required to establish this connection. (In the formal language for predicate logic, if we associate the meaning “is married” with predicate term ‘M’, the meaning “is a bachelor” with the predicate term ‘B’, and take the name term ‘g’ to refer to George, then we should want \(P_{\alpha}[{\nsim}Mg \pmid Bg] = 1\), since \(\forall x (Bx \supset{\nsim}Mx)\) is analytically true on this meaning assignment to the non-logical terms.) So, let’s associate with each individual support function \(P_{\alpha}\) a specific assignment of meanings (primary intensions) to all the non-logical terms of the language. (However, evidential support functions should not presuppose meaning assignments in the sense of so-called secondary intensions—e.g., those associated with rigid designators across possible states of affairs. For, we should not want a confirmation function \(P_{\alpha}\) to make
\[P_{\alpha}[\textrm{This glass is full of H\(_2\)O} \pmid \textrm{This glass is full of water}] = 1,\]
since we presumably want the inductive logic to draw on explicit empirical evidence to support the claim that water is made of H\(_2\)O. Thus, the meanings of terms we associate with a support function should only be their primary intensions, not their secondary intensions.)
In the context of inductive logic it makes good sense to supplement the above axioms with two additional axioms. Here is the first of them:
Here is how axiom 6 applies to the above example, yielding \(P_{\alpha}[{\nsim}Mg \pmid Bg] = 1\) when the meaning assignment to non-logical terms associated with support function \(P_{\alpha}\) makes \(\forall x(Bx \supset {\nsim}Mx)\) analytically true. From axiom 6 (followed by results 7, 5, and 4) we have
\[\begin{align}
1 & = P_{\alpha}[\forall x(Bx \supset{\nsim}Mx) \pmid Bg] \\
& = P_{\alpha}[(Bg \cdot \forall x(Bx \supset{\nsim}Mx)) \pmid Bg] \\
& \le P_{\alpha}[{\nsim}Mg \pmid Bg] \\
& \le 1;
\end{align}\]
thus, \(P_{\alpha}[{\nsim}Mg \pmid Bg] = 1\). The idea behind axiom 6 is that inductive logic is about evidential support for contingent claims. Nothing can count as empirical evidence for or against non-contingent truths. In particular, analytic truths should be maximally supported by all premises \(C\).
One important respect in which inductive logic should follow the deductive paradigm is that the logic should not presuppose the truth of contingent statements. If a statement \(C\) is contingent, then some other statements should be able to count as evidence against \(C\). Otherwise, a support function \(P_{\alpha}\) will take \(C\) and all of its logical consequences to be supported to degree 1 by all possible evidence claims. This is no way for an inductive logic to behave. The whole idea of inductive logic is to provide a measure of the extent to which premise statements indicate the likely truth-values of contingent conclusion statements. This idea won’t work properly if the truth-values of some contingent statements are presupposed by assigning them support value 1 on every possible premise. Such probability assignments would make the inductive logic enthymematic by hiding significant premises in inductive support relationships. It would be analogous to permitting deductive arguments to count as valid in cases where the explicitly stated premises are insufficient to logically entail the conclusion, but where the validity of the argument is permitted to depend on additional unstated premises. This is not how a rigorous approach to deductive logic should work, and it should not be a common practice in a rigorous approach to inductive logic.
Nevertheless, it is common practice for probabilistic logicians to sweep provisionally accepted contingent claims under the rug by assigning them probability 1 (regardless of the fact that no explicit evidence for them is provided). This practice saves the trouble of repeatedly writing a given contingent sentence \(B\) as a premise, since \(P_{\gamma}[A \pmid B\cdot C]\) will equal \(P_{\gamma}[A \pmid C]\) whenever \(P_{\gamma}[B \pmid C] = 1\). Although this convention is useful, such probability functions should be considered mere abbreviations for proper, logically explicit, non-enthymematic, inductive support relations. Thus, properly speaking, an inductive support function \(P_{\alpha}\) should not assign probability 1 to a sentence on every possible premise unless that sentence is either (i) logically true, or (ii) an axiom of set theory or some other piece of pure mathematics employed by the sciences, or (iii) unless according to the interpretation of the language that \(P_{\alpha}\) presupposes, the sentence is analytic (and so outside the realm of evidential support). Thus, we adopt the following version of the so-called “axiom of regularity”.
Axioms 6 and 7 taken together say that a support function \(P_{\alpha}\) counts as non-contingently true, and so as not subject to empirical support, just those sentences that it assigns probability 1 on every possible premise.
Some Bayesian logicists have proposed that an inductive logic might be made to depend solely on the logical form of sentences, as is the case for deductive logic. The idea is, effectively, to supplement axioms 1–7 with additional axioms that depend only on the logical structures of sentences, and to introduce enough such axioms to reduce the number of possible support functions to a single uniquely best support function. It is now widely agreed that this project cannot be carried out in a plausible way. Perhaps support functions should obey some rules in addition to axioms 1–7. But it is doubtful that any plausible collection of additional rules can suffice to determine a single, uniquely qualified support function. Later, in Section 3, we will briefly return to this issue, after we develop a more detailed account of how inductive probabilities capture the relationship between hypotheses and evidence.
Axioms 1–7 for conditional probability functions merely place formal constraints on what may properly count as a degree of support function. Each function \(P_{\alpha}\) that satisfies these axioms may be viewed as a possible way of applying the notion of inductive support to a language \(L\) that respects the meanings of the logical terms, much as each possible truth-value assignment for a language represents a possible way of assigning truth-values to its sentences in a way that respects the meanings of the logical terms. The issue of which of the possible truth-value assignments to a language represents the actual truth or falsehood of its sentences depends on more than this. It depends on the meanings of the non-logical terms and on the state of the actual world. Similarly, the degree to which some sentences actually support others in a fully meaningful language must rely on something more than the mere satisfaction of the axioms for support functions. It must, at least, rely on what the sentences of the language mean, and perhaps on much more besides. But, what more? Perhaps a better understanding of what inductive probability is may provide some help by filling out our conception of what inductive support is about. Let’s pause to discuss two prominent views—two interpretations of the notion of inductive probability.
One kind of non-syntactic logicist reading of inductive probability takes each support function \(P_{\alpha}\) to be a measure on possible states of affairs. The idea is that, given a fully meaningful language (associated with support function \(P_{\alpha}\)), ‘\(P_{\alpha}[A \pmid B] = r\)’ says that among those states of affairs in which \(B\) is true, \(A\) is true in proportion \(r\) of them. There will not generally be a single privileged way to define such a measure on possible states of affairs. Rather, each of a number of functions \(P_{\alpha}\), \(P_{\beta}\), \(P_{\gamma}\), …, etc., that satisfy the constraints imposed by axioms 1–7 may represent a viable measure of the inferential import of the propositions expressed by sentences of the language. This idea needs more fleshing out, of course. The next section will provide some indication of how that might go.
Subjectivist Bayesians offer an alternative reading of the support functions. First, they usually take unconditional probability as basic, and take conditional probabilities as defined in terms of unconditional probabilities: the conditional probability ‘\(P_{\alpha}[A \pmid B]\)’ is defined as a ratio of unconditional probabilities:
\[P_{\alpha}[A \pmid B] = \frac{P_{\alpha}[A\cdot B]}{P_{\alpha}[B]}.\]
Subjectivist Bayesians take each unconditional probability function \(P_{\alpha}\) to represent the belief-strengths or confidence-strengths of an ideally rational agent, \(\alpha\). On this understanding ‘\(P_{\alpha}[A] = r\)’ says, “the strength of \(\alpha\)’s belief (or confidence) that \(A\) is true is \(r\)”. Subjectivist Bayesians usually tie such belief strengths to how much money (or how many units of utility) the agent would be willing to bet on \(A\) turning out to be true. Roughly, the idea is this. Suppose that an ideally rational agent \(\alpha\) would be willing to accept a wager that would yield (no less than) $\(u\) if \(A\) turns out to be true and would lose him $1 if \(A\) turns out to be false. Then, under reasonable assumptions about the agent’s desire for money, it can be shown that the agent’s belief strength that \(A\) is true should be
\[P_{\alpha}[A] = \frac{1}{(u+1)}. \]
And it can further be shown that any function \(P_{\alpha}\) that expresses such betting-related belief-strengths on all statements in agent \(\alpha\)’s language must satisfy axioms for unconditional probabilities analogous to axioms 1–5.[4] Moreover, it can be shown that any function \(P_{\beta}\) that satisfies these axioms is a possible rational belief function for some ideally rational agent \(\beta\). These relationships between belief-strengths and the desirability of outcomes (e.g., gaining money or goods on bets) are at the core of subjectivist Bayesian decision theory. Subjectivist Bayesians usually take inductive probability to just be this notion of probabilistic belief-strength.
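The betting relationship just described is easy to check computationally. The following sketch (the function names are ours, chosen for illustration) encodes the fair-bet condition that yields \(P_{\alpha}[A] = 1/(u+1)\), along with the ratio definition of conditional probability given above:

```python
def belief_strength_from_stakes(u: float) -> float:
    """Belief strength implied by willingness to win $u if A is true
    and lose $1 if A is false.  The fair-bet condition is that the
    expected gain is zero: p*u - (1 - p)*1 = 0, so p = 1/(u + 1)."""
    return 1.0 / (u + 1.0)

def conditional(p_a_and_b: float, p_b: float) -> float:
    """Conditional probability defined as a ratio of unconditionals:
    P[A | B] = P[A·B] / P[B]."""
    return p_a_and_b / p_b

# An agent indifferent at a $3 payoff against a $1 stake has belief
# strength 1/4 that A is true:
print(belief_strength_from_stakes(3.0))  # 0.25
```

A quick sanity check of the design: at \(p = 1/(u+1)\) the expected value of the wager, \(p \cdot u - (1-p) \cdot 1\), is exactly zero, which is why this \(p\) is taken to mark the agent's point of indifference.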
Undoubtedly real agents do believe some claims more strongly than others. And, arguably, the belief strengths of real agents can be measured on a probabilistic scale between 0 and 1, at least approximately. And clearly the inductive support of a hypothesis by evidence should influence the strength of an agent’s belief in the truth of that hypothesis—that’s the point of engaging in inductive reasoning, isn’t it? However, there is good reason for caution about viewing inductive support functions as Bayesian belief-strength functions, as we’ll see a bit later. So, perhaps an agent’s support function is not simply identical to his belief function, and perhaps the relationship between inductive support and belief-strength is somewhat more complicated.
In any case, some account of what support functions are supposed to represent is clearly needed. The belief function account and the logicist account (in terms of measures on possible states of affairs) are two attempts to provide this account. But let us put this interpretative issue aside for now. One may be able to get a better handle on what inductive support functions really are after one sees how the inductive logic that draws on them is supposed to work.
One of the most important applications of an inductive logic is its treatment of the evidential evaluation of scientific hypotheses. The logic should capture the structure of evidential support for all sorts of scientific hypotheses, ranging from simple diagnostic claims (e.g., “the patient is infected by HIV”) to complex scientific theories about the fundamental nature of the world, such as quantum mechanics or the theory of relativity. This section will show how evidential support functions (a.k.a. Bayesian confirmation functions) represent the evidential evaluation of scientific hypotheses and theories. This logic is essentially comparative. The evaluation of a hypothesis depends on how strongly evidence supports it over alternative hypotheses.
Consider some collection of mutually incompatible, alternative hypotheses (or theories) about a common subject matter, \(\{h_1, h_2 , \ldots \}\). The collection of alternatives may be very simple, e.g., {“the patient has HIV”, “the patient is free of HIV”}. Or, when the physician is trying to determine which among a range of diseases is causing the patient’s symptoms, the collection of alternatives may consist of a long list of possible disease hypotheses. For the cosmologist, the collection of alternatives may consist of several distinct gravitational theories, or several empirically distinct variants of the “same” theory. Whenever two variants of a hypothesis (or theory) differ in empirical import, they count as distinct hypotheses. (This should not be confused with the converse positivistic assertion that theories with the same empirical content are really the same theory. Inductive logic doesn’t necessarily endorse that view.)
The collection of competing hypotheses (or theories) to be evaluated by the logic may be finite in number, or may be countably infinite. No realistic language contains more than a countable number of expressions; so it suffices for a logic to apply to a countably infinite number of sentences. From a purely logical perspective the collection of competing alternatives may consist of every rival hypothesis (or theory) about a given subject matter that can be expressed within a given language—e.g., all possible theories of the origin and evolution of the universe expressible in English and contemporary mathematics. In practice, alternative hypotheses (or theories) will often be constructed and evidentially evaluated over a long period of time. The logic of evidential support works in much the same way regardless of whether all alternative hypotheses are considered together, or only a few alternative hypotheses are available at a time.
Evidence for scientific hypotheses consists of the results of specific experiments or observations. For a given experiment or observation, let ‘\(c\)’ represent a description of the relevant conditions under which it is performed, and let ‘\(e\)’ represent a description of the result of the experiment or observation, the evidential outcome of conditions \(c\).
The logical connection between scientific hypotheses and the evidence often requires the mediation of background information and auxiliary hypotheses. Let ‘\(b\)’ represent whatever background and auxiliary hypotheses are required to connect each hypothesis \(h_i\) among the competing hypotheses \(\{h_1, h_2 , \ldots \}\) to the evidence. Although the claims expressed by the auxiliary hypotheses within \(b\) may themselves be subject to empirical evaluation, they should be the kinds of claims that are not at issue in the evaluation of the alternative hypotheses in the collection \(\{h_1, h_2 , \ldots \}\). Rather, each of the alternative hypotheses under consideration draws on the same background and auxiliaries to logically connect to the evidential events. (If competing hypotheses \(h_i\) and \(h_j\) draw on distinct auxiliary hypotheses \(a_i\) and \(a_j\), respectively, in making logical contact with evidential claims, then the following treatment should be applied to the respective conjunctive hypotheses, \((h_{i}\cdot a_{i})\) and \((h_{j}\cdot a_{j})\), since these alternative conjunctive hypotheses will constitute the empirically distinct alternatives at issue.)
In cases where a hypothesis is deductively related to an outcome \(e\) of an observational or experimental condition \(c\) (via background and auxiliaries \(b\)), we will have either \(h_i\cdot b\cdot c \vDash e\) or \(h_i\cdot b\cdot c \vDash{\nsim}e\). For example, \(h_i\) might be the Newtonian Theory of Gravitation. A test of the theory might involve a condition statement \(c\) that describes the results of some earlier measurements of Jupiter’s position, and that describes the means by which the next position measurement will be made; the outcome description \(e\) states the result of this additional position measurement; and the background information (and auxiliary hypotheses) \(b\) might state some already well confirmed theory about the workings and accuracy of the devices used to make the position measurements. Then, from \(h_i\cdot b\cdot c\) we may calculate the specific outcome \(e\) we expect to find; thus, the following logical entailment holds: \(h_i\cdot b\cdot c \vDash e\). Then, provided that the experimental and observational conditions stated by \(c\) are in fact true, if the evidential outcome described by \(e\) actually occurs, the resulting conjoint evidential claim \((c\cdot e)\) may be considered good evidence for \(h_i\), given \(b\). (This method of theory evaluation is called the hypothetical-deductive approach to evidential support.) On the other hand, when from \(h_i\cdot b\cdot c\) we calculate some outcome incompatible with the observed evidential outcome \(e\), then the following logical entailment holds: \(h_i\cdot b\cdot c \vDash{\nsim}e\). In that case, from deductive logic alone we must also have that \(b\cdot c\cdot e \vDash{\nsim}h_i\); thus, \(h_i\) is said to be falsified by \(b\cdot c\cdot e\).
The Bayesian account of evidential support we will be describing below extends this deductivist approach to include cases where the hypothesis \(h_i\) (and its alternatives) may not be deductively related to the evidence, but may instead imply that the evidential outcome is likely or unlikely to some specific degree \(r\). That is, the Bayesian approach applies to cases where we may have neither \(h_i\cdot b\cdot c \vDash e\) nor \(h_i\cdot b\cdot c \vDash{\nsim}e\), but may instead only have \(P[e \pmid h_i\cdot b\cdot c] = r\), where \(r\) is some “entailment strength” between 0 and 1.
Before going on to describe the logic of evidential support in more detail, perhaps a few more words are in order about the background knowledge and auxiliary hypotheses, represented here by ‘\(b\)’. Duhem (1906) and Quine (1953) are generally credited with alerting inductive logicians to the importance of auxiliary hypotheses in connecting scientific hypotheses and theories to empirical evidence. (See the entry on Pierre Duhem.) They point out that scientific hypotheses often make little contact with evidence claims on their own. Rather, in most cases scientific hypotheses make testable predictions only relative to background information and auxiliary hypotheses that tie them to the evidence. (Some specific examples of such auxiliary hypotheses will be provided in the next subsection.) Typically auxiliaries are highly confirmed hypotheses from other scientific domains. They often describe the operating characteristics of various devices (e.g., measuring instruments) used to make observations or conduct experiments. Their credibility is usually not at issue in the testing of hypothesis \(h_i\) against its competitors, because \(h_i\) and its alternatives usually rely on the same auxiliary hypotheses to tie them to the evidence. But even when an auxiliary hypothesis is already well-confirmed, we cannot simply assume that it is unproblematic, or just known to be true. Rather, the evidential support or refutation of a hypothesis \(h_i\) is relative to whatever auxiliaries and background information (in \(b\)) is being supposed in the confirmational context. In other contexts the auxiliary hypotheses used to test \(h_i\) may themselves be among a collection of alternative hypotheses that are subject to evidential support or refutation.
Furthermore, to the extent that competing hypotheses employ different auxiliary hypotheses in accounting for evidence, the evidence only tests each such hypothesis in conjunction with its distinct auxiliaries against alternative hypotheses packaged with their distinct auxiliaries, as described earlier. Thus, what counts as a hypothesis to be tested, \(h_i\), and what counts as auxiliary hypotheses and background information, \(b\), may depend on the epistemic context—on what class of alternative hypotheses are being tested by a collection of experiments or observations, and on what claims are presupposed in that context. No statement is intrinsically a test hypothesis, or intrinsically an auxiliary hypothesis or background condition. Rather, these categories are roles statements may play in a particular epistemic context.
In a probabilistic inductive logic the degree to which the evidence \((c\cdot e)\) supports a hypothesis \(h_i\) relative to background and auxiliaries \(b\) is represented by the posterior probability of \(h_i\), \(P_{\alpha}[h_i \pmid b\cdot c\cdot e]\), according to an evidential support function \(P_{\alpha}\). It turns out that the posterior probability of a hypothesis depends on just two kinds of factors: (1) its prior probability, \(P_{\alpha}[h_i \pmid b]\), together with the prior probabilities of its competitors, \(P_{\alpha}[h_j \pmid b]\), \(P_{\alpha}[h_k \pmid b]\), etc.; and (2) the likelihood of evidential outcomes \(e\) according to \(h_i\) in conjunction with \(b\) and \(c\), \(P[e \pmid h_i\cdot b\cdot c]\), together with the likelihoods of these same evidential outcomes according to competing hypotheses, \(P[e \pmid h_j\cdot b\cdot c]\), \(P[e \pmid h_k\cdot b\cdot c]\), etc. We will now examine each of these factors in some detail. Following that we will see precisely how the values of posterior probabilities depend on the values of likelihoods and prior probabilities.
In probabilistic inductive logic the likelihoods carry the empirical import of hypotheses. A likelihood is a support function probability of form \(P[e \pmid h_i\cdot b\cdot c]\). It expresses how likely it is that outcome \(e\) will occur according to hypothesis \(h_i\) together with the background and auxiliaries \(b\) and the experimental (or observational) conditions \(c\).[5] If a hypothesis together with auxiliaries and experimental/observation conditions deductively entails an evidence claim, the axioms of probability make the corresponding likelihood objective in the sense that every support function must agree on its values: \(P[e \pmid h_i\cdot b\cdot c] = 1\) if \(h_i\cdot b\cdot c \vDash e\); \(P[e \pmid h_i\cdot b\cdot c] = 0\) if \(h_i\cdot b\cdot c \vDash{\nsim}e\). However, in many cases a hypothesis \(h_i\) will not be deductively related to the evidence, but will only imply it probabilistically. There are several ways this might happen: (1) hypothesis \(h_i\) may itself be an explicitly probabilistic or statistical hypothesis; (2) an auxiliary statistical hypothesis, as part of the background \(b\), may connect hypothesis \(h_i\) to the evidence; (3) the connection between the hypothesis and the evidence may be somewhat loose or imprecise, not mediated by explicit statistical claims, but nevertheless objective enough for the purposes of evidential evaluation. Let’s briefly consider examples of the first two kinds. We’ll treat case (3) in Section 5, which addresses the issue of vague and imprecise likelihoods.
The hypotheses being tested may themselves be statistical in nature. One of the simplest examples of statistical hypotheses and their role in likelihoods are hypotheses about the chance characteristic of coin-tossing. Let \(h_{[r]}\) be a hypothesis that says a specific coin has a propensity (or objective chance) \(r\) for coming up heads on normal tosses, and let \(b\) say that such tosses are probabilistically independent of one another. Let \(c\) state that the coin is tossed \(n\) times in the normal way; and let \(e\) say that on these tosses the coin comes up heads \(m\) times. In cases like this the value of the likelihood of the outcome \(e\) on hypothesis \(h_{[r]}\) for condition \(c\) is given by the well-known binomial formula:
\[P[e \pmid h_{[r]}\cdot b\cdot c] = \frac{n!}{m! \times (n-m)!} \times r^m (1-r)^{n-m}.\]
There are, of course, more complex cases of likelihoods involving statistical hypotheses. Consider, for example, the hypothesis that plutonium 233 nuclei have a half-life of 20 minutes—i.e., that the propensity (or objective chance) for a Pu-233 nucleus to decay within a 20 minute period is 1/2. The full statistical model for the lifetime of such a system says that the propensity (or objective chance) for that system to remain intact (i.e., to not decay) within any time period \(x\) is governed by the formula \(1/2^{x/\tau}\), where \(\tau\) is the half-life of such a system. Let \(h\) be a hypothesis that says that this statistical model applies to Pu-233 nuclei with \(\tau = 20\) minutes; let \(c\) say that some specific Pu-233 nucleus is intact within a decay detector (of some specific kind) at an initial time \(t_0\); let \(e\) say that no decay of this same Pu-233 nucleus is detected by the later time \(t\); and let \(b\) say that the detector is completely accurate (it always registers a real decay, and it never registers false-positive detections). Then, the associated likelihood of \(e\) given \(h\) and \(c\) is this: \(P[e \pmid h\cdot b\cdot c] = 1/2^{(t - t_0)/\tau}\), where the value of \(\tau\) is 20 minutes.
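Both of these direct inference likelihoods are straightforward to compute. The following sketch (function names are ours, chosen for illustration) implements the binomial formula and the half-life survival formula just stated:

```python
from math import comb

def binomial_likelihood(r: float, n: int, m: int) -> float:
    """P[e | h_[r]·b·c]: the chance of m heads on n probabilistically
    independent tosses of a coin whose propensity for heads is r,
    via the binomial formula n!/(m!(n-m)!) * r^m * (1-r)^(n-m)."""
    return comb(n, m) * r**m * (1 - r)**(n - m)

def survival_likelihood(t_minutes: float, half_life: float = 20.0) -> float:
    """P[e | h·b·c]: the chance that an intact Pu-233 nucleus remains
    undecayed for t minutes, on the model 1/2^(t/tau) with tau = 20."""
    return 0.5 ** (t_minutes / half_life)

print(binomial_likelihood(0.5, 10, 5))  # 5 heads in 10 fair tosses: ~0.246
print(survival_likelihood(40.0))        # intact after two half-lives: 0.25
```

Note that every support function agrees on these values; they are fixed by the statistical hypotheses themselves, not by any agent's inductive proclivities.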
An auxiliary statistical hypothesis, as part of the background \(b\), may be required to connect hypothesis \(h_i\) to the evidence. For example, a blood test for HIV has a known false-positive rate and a known true-positive rate. Suppose the false-positive rate is .05—i.e., the test tends to incorrectly show the blood sample to be positive for HIV in 5% of all cases where HIV is not present. And suppose that the true-positive rate is .99—i.e., the test tends to correctly show the blood sample to be positive for HIV in 99% of all cases where HIV really is present. When a particular patient’s blood is tested, the hypotheses under consideration are this patient is infected with HIV, \(h\), and this patient is not infected with HIV, \({\nsim}h\). In this context the known test characteristics function as background information, \(b\). The experimental condition \(c\) merely states that this particular patient was subjected to this specific kind of blood test for HIV, which was processed by the lab using proper procedures. Let us suppose that the outcome \(e\) states that the result is a positive test result for HIV. The relevant likelihoods, then, are \(P[e \pmid h\cdot b\cdot c] = .99\) and \(P[e \pmid {\nsim}h\cdot b\cdot c] = .05\).
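As a small illustration (the helper name is ours; the rates are the ones stated above), the two likelihoods can be read directly off the test's accuracy characteristics carried by \(b\); their ratio reflects how much more strongly a positive result is to be expected when the patient is infected than when not:

```python
# Test characteristics carried by the background/auxiliary information b:
TRUE_POSITIVE_RATE = 0.99   # P[e | h·b·c]: positive result when HIV present
FALSE_POSITIVE_RATE = 0.05  # P[e | ~h·b·c]: positive result when HIV absent

def likelihood_of_positive(hiv_present: bool) -> float:
    """Likelihood of the positive outcome e on each of the two hypotheses."""
    return TRUE_POSITIVE_RATE if hiv_present else FALSE_POSITIVE_RATE

# A positive outcome is about 19.8 times as probable on h as on ~h:
ratio = likelihood_of_positive(True) / likelihood_of_positive(False)
```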
In this example the values of the likelihoods are entirely due to the statistical characteristics of the accuracy of the test, which is carried by the background/auxiliary information \(b\). The hypothesis \(h\) being tested by the evidence is not itself statistical.
This kind of situation may, of course, arise for much more complex hypotheses. The alternative hypotheses of interest may be deterministic physical theories, say Newtonian Gravitation Theory and some specific alternatives. Some of the experiments that test this theory rely on somewhat imprecise measurements that have known statistical error characteristics, which are expressed as part of the background or auxiliary hypotheses, \(b\). For example, the auxiliary \(b\) may describe the error characteristics of a device that measures the torque imparted to a quartz fiber, where the measured torque is used to assess the strength of the gravitational force between test masses. In that case \(b\) may say that for this kind of device the measurement errors are normally distributed about whatever value a given gravitational theory predicts, with some specified standard deviation that is characteristic of the device. This results in specific values \(r_i\) for the likelihoods, \(P[e \pmid h_i\cdot b\cdot c] = r_i\), for each of the various gravitational theories, \(h_i\), being tested.
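A sketch of how such normally distributed error characteristics fix the likelihoods: all of the numbers below are invented for illustration, and for a continuous measurement the likelihood value comes from the error density at the observed value (or that density times the instrument's resolution, if one wants a genuine probability).

```python
from math import exp, pi, sqrt

def normal_density(x: float, mu: float, sigma: float) -> float:
    """Density of a measured value x when measurement errors are normally
    distributed about the predicted value mu with standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Hypothetical numbers: theory h_1 predicts a torque of 1.00 units, a rival
# h_2 predicts 1.05 units; the device's error s.d. (from b) is 0.02 units;
# the measured torque e is 1.01 units.
r_1 = normal_density(1.01, mu=1.00, sigma=0.02)  # likelihood value on h_1
r_2 = normal_density(1.01, mu=1.05, sigma=0.02)  # likelihood value on h_2
```

With these numbers the observed torque lies half a standard deviation from \(h_1\)'s prediction but two standard deviations from \(h_2\)'s, so \(r_1\) comes out much larger than \(r_2\).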
Likelihoods that arise from explicit statistical claims—either within the hypotheses being tested, or from explicit statistical background claims that tie the hypotheses to the evidence—are often called direct inference likelihoods. Such likelihoods should be completely objective. So, all evidential support functions should agree on their values, just as all support functions agree on likelihoods when evidence is logically entailed. Direct inference likelihoods are logical in an extended, non-deductive sense. Indeed, some logicians have attempted to spell out the logic of direct inferences in terms of the logical form of the sentences involved.[6] But regardless of whether that project succeeds, it seems reasonable to take likelihoods of this sort to have highly objective or intersubjectively agreed values.
Not all likelihoods of interest in confirmational contexts are warranted deductively or by explicitly stated statistical claims. In such cases the likelihoods may have vague, imprecise values, but values that are determinate enough to still underwrite an objective evaluation of hypotheses on the evidence. In Section 5 we’ll consider such cases, where no underlying statistical theory is involved, but where likelihoods are determinate enough to play their standard role in the evidential evaluation of scientific hypotheses. However, the proper treatment of such cases will be more easily understood after we have first seen how the logic works when likelihoods are precisely known (such as cases where the likelihood values are endorsed by explicit statistical hypotheses and/or explicit statistical auxiliaries). In any case, the likelihoods that relate hypotheses to evidence claims in many scientific contexts will have such objective values. So, although a variety of different support functions \(P_{\alpha}\), \(P_{\beta}\), …, \(P_{\gamma}\), etc., may be needed to represent the differing “inductive proclivities” of the various members of a scientific community, for now we will consider cases where all evidential support functions agree on the values of the likelihoods. For, the likelihoods represent the empirical content of a scientific hypothesis, what the hypothesis (together with experimental conditions, \(c\), and background and auxiliaries, \(b\)) says or probabilistically implies about the evidence. Thus, the empirical objectivity of a science relies on a high degree of objectivity or intersubjective agreement among scientists on the numerical values of likelihoods.
To see the point more vividly, imagine what a science would be like if scientists disagreed widely about the values of likelihoods. Each practitioner interprets a theory to say quite different things about how likely it is that various possible evidence statements will turn out to be true. Whereas scientist \(\alpha\) takes theory \(h_1\) to probabilistically imply that event \(e\) is highly likely, his colleague \(\beta\) understands the empirical import of \(h_1\) to say that \(e\) is very unlikely. And, conversely, \(\alpha\) takes competing theory \(h_2\) to probabilistically imply that \(e\) is very unlikely, whereas \(\beta\) reads \(h_2\) to say that \(e\) is extremely likely. So, for \(\alpha\) the evidential outcome \(e\) supplies strong support for \(h_1\) over \(h_2\), because
\[P_{\alpha}[e \pmid h_1\cdot b\cdot c] \gg P_{\alpha}[e \pmid h_2\cdot b\cdot c].\]
But his colleague \(\beta\) takes outcome \(e\) to show just the opposite, that \(h_2\) is strongly supported over \(h_1\), because
\[P_{\beta}[e \pmid h_2\cdot b\cdot c] \gg P_{\beta}[e \pmid h_1\cdot b\cdot c].\]
If this kind of situation were to occur often, or for significant evidence claims in a scientific domain, it would make a shambles of the empirical objectivity of that science. It would completely undermine the empirical testability of such hypotheses and theories within that scientific domain. Under these circumstances, although each scientist employs the same sentences to express a given theory \(h_i\), each understands the empirical import of these sentences so differently that \(h_i\) as understood by \(\alpha\) is an empirically different theory than \(h_i\) as understood by \(\beta\). (Indeed, arguably, \(\alpha\) must take at least one of the two sentences, \(h_1\) or \(h_2\), to express a different proposition than does \(\beta\).) Thus, the empirical objectivity of the sciences requires that experts should be in close agreement about the values of the likelihoods.[7]
For now we will suppose that the likelihoods have objective or intersubjectively agreed values, common to all agents in a scientific community. We mark this agreement by dropping the subscript ‘\(\alpha\)’, ‘\(\beta\)’, etc., from expressions that represent likelihoods, since all support functions under consideration are supposed to agree on the values for likelihoods. One might worry that this supposition is overly strong. There are legitimate scientific contexts where, although scientists should have enough of a common understanding of the empirical import of hypotheses to assign quite similar values to likelihoods, precise agreement on their numerical values may be unrealistic. This point is right in some important kinds of cases. So later, in Section 5, we will see how to relax the supposition that precise likelihood values are available, and see how the logic works in such cases. But for now the main ideas underlying probabilistic inductive logic will be more easily explained if we focus on those contexts where objective or intersubjectively agreed likelihoods are available. Later we will see that much the same logic continues to apply in contexts where the values of likelihoods may be somewhat vague, or where members of the scientific community disagree to some extent about their values.
An adequate treatment of the likelihoods calls for the introduction of one additional notational device. Scientific hypotheses are generally tested by a sequence of experiments or observations conducted over a period of time. To explicitly represent the accumulation of evidence, let the series of sentences \(c_1\), \(c_2\), …, \(c_n\), describe the conditions under which a sequence of experiments or observations are conducted. And let the corresponding outcomes of these observations be represented by sentences \(e_1\), \(e_2\), …, \(e_n\). We will abbreviate the conjunction of the first n descriptions of experimental or observational conditions by ‘\(c^n\)’, and abbreviate the conjunction of descriptions of their outcomes by ‘\(e^n\)’. Then, for a stream of n observations or experiments and their outcomes, the likelihoods take the form \(P[e^n \pmid h_{i}\cdot b\cdot c^{n}] = r\), for appropriate values of \(r\). In many cases the likelihood of the evidence stream will be equal to the product of the likelihoods of the individual outcomes:
\[P[e^n \pmid h_{i}\cdot b\cdot c^{n}] = P[e_1 \pmid h_i\cdot b\cdot c_1] \times \cdots \times P[e_n \pmid h_{i}\cdot b\cdot c_{n}].\]When this equality holds, the individual bits of evidence are said to be probabilistically independent on the hypothesis (together with auxiliaries). In the following account of the logic of evidential support, such probabilistic independence will not be assumed, except in those places where it is explicitly invoked.
The probabilistic logic of evidential support represents the net support of a hypothesis by the posterior probability of the hypothesis, \(P_{\alpha}[h_i \pmid b\cdot c^{n}\cdot e^{n}]\). The posterior probability represents the net support for the hypothesis that results from the evidence, \(c^n \cdot e^n\), together with whatever plausibility considerations are taken to be relevant to the assessment of \(h_i\). Whereas the likelihoods are the means through which evidence contributes to the posterior probability of a hypothesis, all other relevant plausibility considerations are represented by a separate factor, called the prior probability of the hypothesis: \(P_{\alpha}[h_i \pmid b]\). The prior probability represents the weight of any important considerations not captured by the evidential likelihoods. Any relevant considerations that go beyond the evidence itself may be explicitly stated within expression \(b\) (in addition to whatever auxiliary hypotheses \(b\) may contain in support of the likelihoods). Thus, the prior probability of \(h_i\) may depend explicitly on the content of \(b\). It turns out that posterior probabilities depend only on the values of evidential likelihoods together with the values of prior probabilities.
As an illustration of the role of prior probabilities, consider the HIV test example described in the previous section. What the physician and the patient want to know is the value of the posterior probability, \(P_{\alpha}[h \pmid b\cdot c\cdot e]\), that the patient has HIV, \(h\), given the evidence of the positive test, \(c\cdot e\), and given the error rates of the test, described within \(b\). The value of this posterior probability depends on the likelihood (due to the error rates) of this patient obtaining a true-positive result, \(P[e \pmid h\cdot b\cdot c] = .99\), and of obtaining a false-positive result, \(P[e \pmid {\nsim}h\cdot b\cdot c] = .05\). In addition, the value of the posterior probability depends on how plausible it is that the patient has HIV prior to taking the test results into account, \(P_{\alpha}[h \pmid b]\). In the context of medical diagnosis, this prior probability is usually assessed on the basis of the base rate for HIV in the patient’s risk group (i.e., whether the patient is an IV drug user, has unprotected sex with multiple partners, etc.). On a rigorous approach to the logic, such information and its risk-relevance should be explicitly stated within the background information \(b\). To see the importance of this information, consider the following numerical results (which may be calculated using the formula called Bayes’ Theorem, presented in the next section). If the base rate for the patient’s risk group is relatively high, say \(P_{\alpha}[h \pmid b] = .10\), then the positive test result yields a posterior probability value for his having HIV of \(P_{\alpha}[h \pmid b\cdot c\cdot e] = .69\). However, if the patient is in a very low risk group, say \(P_{\alpha}[h \pmid b] = .001\), then a positive test result only raises the posterior probability of his having an HIV infection to \(P_{\alpha}[h \pmid b\cdot c\cdot e] = .02\). This posterior probability is much higher than the prior probability of .001, but should not worry the patient too much.
This positive test result may well be due to the comparatively high false-positive rate for the test, rather than to the presence of HIV. This sort of test, with a false-positive rate as large as .05, is best used as a screening test; a positive result warrants conducting a second, more rigorous, less error-prone test.
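The arithmetic behind these posterior values can be checked in a few lines of Python. This is a sketch of the calculation only; the error rates (.99, .05) and the base rates (.10, .001) come from the example above, and the helper name `posterior` is ours:

```python
def posterior(prior_h, true_pos, false_pos):
    """Posterior probability of HIV given a positive test, via Bayes' Theorem:
    P[h | b.c.e] = P[e | h.b.c] * P[h | b] / P[e | b.c],
    with the expectedness P[e | b.c] expanded by the law of total probability
    over h and ~h."""
    expectedness = true_pos * prior_h + false_pos * (1 - prior_h)
    return true_pos * prior_h / expectedness

# High-risk group: prior .10 yields a posterior of about .69
print(posterior(0.10, 0.99, 0.05))   # 0.6875, the .69 cited in the text
# Low-risk group: prior .001 yields a posterior of only about .02
print(posterior(0.001, 0.99, 0.05))  # roughly 0.0194, the .02 cited in the text
```

The low-risk result makes vivid why the base rate matters: the .05 false-positive rate produces far more false alarms than the tiny prior produces true infections.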
More generally, in the evidential evaluation of scientific hypotheses and theories, prior probabilities represent assessments of non-evidential plausibility weightings among hypotheses. However, because the strengths of such plausibility assessments may vary among members of a scientific community, critics often brand such assessments as merely subjective, and take their role in Bayesian inference to be highly problematic. Bayesian inductivists counter that plausibility assessments play an important, legitimate role in the sciences, especially when evidence cannot suffice to distinguish among some alternative hypotheses. And, they argue, the epithet “merely subjective” is unwarranted. Such plausibility assessments are often backed by extensive arguments that may draw on forceful conceptual considerations.
Scientists often bring plausibility arguments to bear in assessing competing views. Although such arguments are seldom decisive, they may bring the scientific community into widely shared agreement, especially with regard to the implausibility of some logically possible alternatives. This seems to be the primary epistemic role of thought experiments. Consider, for example, the kinds of plausibility arguments that have been brought to bear on the various interpretations of quantum theory (e.g., those related to the measurement problem). These arguments go to the heart of conceptual issues that were central to the original development of the theory. Many of these issues were first raised by those scientists who made the greatest contributions to the development of quantum theory, in their attempts to get a conceptual hold on the theory and its implications.
Given any body of evidence, it is fairly easy to cook up a host of logically possible alternative hypotheses that make the evidence as probable as desired. In particular, it is easy to cook up hypotheses that logically entail any given body of evidence, providing likelihood values equal to 1 for all the available evidence. Although most of these cooked-up hypotheses will be laughably implausible, evidential likelihoods cannot rule them out. But the only factors other than likelihoods that figure into the values of posterior probabilities for hypotheses are the values of their prior probabilities; so only prior probability assessments provide a place for the Bayesian logic to bring important plausibility considerations to bear. Thus, the Bayesian logic can only give implausible hypotheses their due via prior probability assessments.
It turns out that the mathematical structure of Bayesian inference makes prior probabilities especially well-suited to represent plausibility assessments among competing hypotheses. For, in the fully fleshed out account of evidential support for hypotheses (spelled out below), it will turn out that only ratios of prior probabilities for competing hypotheses, \(P_{\alpha}[h_j \pmid b] / P_{\alpha}[h_i \pmid b]\), together with ratios of likelihoods, \(P[e \pmid h_j\cdot b\cdot c] / P[e \pmid h_i\cdot b\cdot c]\), play essential roles. The ratio of prior probabilities is well-suited to represent how much more (or less) plausible hypothesis \(h_j\) is than competing hypothesis \(h_i\). Furthermore, the plausibility arguments on which this comparative assessment is based may be explicitly stated within \(b\). So, given that an inductive logic needs to incorporate well-considered plausibility assessments (e.g., in order to lay low wildly implausible alternative hypotheses), the comparative assessment of Bayesian prior probabilities seems well-suited to do the job.
Thus, although prior probabilities may be subjective in the sense that agents may disagree on the relative strengths of plausibility arguments, the priors used in scientific contexts need not represent mere subjective whims. Rather, the comparative strengths of the priors for hypotheses should be supported by arguments about how much more plausible one hypothesis is than another. The important role of plausibility assessments is captured by such received bits of scientific wisdom as the well-known aphorism, extraordinary claims require extraordinary evidence. That is, it takes especially strong evidence, in the form of extremely high values for (ratios of) likelihoods, to overcome the extremely low pre-evidential plausibility values possessed by some hypotheses. In the next section we’ll see precisely how this idea works, and we’ll return to it again in Section 3.4.
When sufficiently strong evidence becomes available, it turns out that the contributions of prior plausibility assessments to the values of posterior probabilities may be substantially “washed out”, overridden by the evidence. That is, provided the prior probability of a true hypothesis isn’t assessed to be too close to zero, the influence of the values of the prior probabilities will very probably fade away as evidence accumulates. In Section 4 we’ll see precisely how this kind of Bayesian convergence to the true hypothesis works. Thus, it turns out that prior plausibility assessments play their most important role when the distinguishing evidence represented by the likelihoods remains weak.
One more point before moving on to the logic of Bayes’ Theorem. Some Bayesian logicists have maintained that posterior probabilities of hypotheses should be determined by syntactic logical form alone. The idea is that the likelihoods might reasonably be specified in terms of syntactic logical form; so if syntactic form might be made to determine the values of prior probabilities as well, then inductive logic would be fully “formal” in the same way that deductive logic is “formal”. Keynes and Carnap tried to implement this idea through syntactic versions of the principle of indifference—the idea that syntactically similar hypotheses should be assigned the same prior probability values. Carnap showed how to carry out this project in detail, but only for extremely simple formal languages. Most logicians now take the project to have failed because of a fatal flaw with the whole idea that reasonable prior probabilities can be made to depend on logical form alone. Semantic content should matter. Goodmanian grue-predicates provide one way to illustrate this point.[8] Furthermore, as suggested earlier, for this idea to apply to the evidential support of real scientific theories, scientists would have to assess the prior probabilities of each alternative theory based only on its syntactic structure. That seems an unreasonable way to proceed. Are we to evaluate the prior probabilities of alternative theories of gravitation, or for alternative quantum theories, by exploring only their syntactic structures, with absolutely no regard for their content—with no regard for what they say about the world? This seems an extremely dubious approach to the evaluation of real scientific theories. Logical structure alone cannot, and should not, suffice for determining reasonable prior probability values for real scientific theories. Moreover, real scientific hypotheses and theories are inevitably subject to plausibility considerations based on what they say about the world.
Prior probabilities are well-suited to represent the comparative weight of plausibility considerations for alternative hypotheses. But no reasonable assessment of comparative plausibility can derive solely from the logical form of hypotheses.
We will return to a discussion of prior probabilities a bit later. Let’s now see how Bayesian logic combines likelihoods with prior probabilities to yield posterior probabilities for hypotheses.
Any probabilistic inductive logic that draws on the usual rules of probability theory to represent how evidence supports hypotheses must be a Bayesian inductive logic in the broad sense. For, Bayes’ Theorem follows directly from the usual axioms of probability theory. Its importance derives from the relationship it expresses between hypotheses and evidence. It shows how evidence, via the likelihoods, combines with prior probabilities to produce posterior probabilities for hypotheses. We now examine several forms of Bayes’ Theorem, each derivable from axioms 1–5.
The simplest version of Bayes’ Theorem as it applies to evidence for a hypothesis goes like this:
Bayes’ Theorem: Simple Form
\[\begin{align*}P_{\alpha}[h_i \pmid e] &= \frac{P_{\alpha}[e \pmid h_i]\times P_{\alpha}[h_i]}{P_{\alpha}[e]}\end{align*}\]This equation expresses the posterior probability of hypothesis \(h_i\) due to evidence \(e\), \(P_{\alpha}[h_i \pmid e]\), in terms of the likelihood of the evidence on that hypothesis, \(P_{\alpha}[e \pmid h_i]\), the prior probability of the hypothesis, \(P_{\alpha}[h_i]\), and the simple probability of the evidence, \(P_{\alpha}[e]\). The factor \(P_{\alpha}[e]\) is often called the expectedness of the evidence. Written this way, the theorem suppresses the experimental (or observational) conditions, \(c\), and all background information and auxiliary hypotheses, \(b\). As discussed earlier, both of these terms play an important role in logically connecting the hypothesis at issue, \(h_i\), to the evidence \(e\). In scientific contexts the objectivity of the likelihoods, \(P_{\alpha}[e \pmid h_i\cdot b \cdot c]\), almost always depends on such terms. So, although the suppression of experimental (or observational) conditions and auxiliary hypotheses is a common practice in accounts of Bayesian inference, the treatment below, and throughout the remainder of this article, will make the role of these terms explicit.
The subscript \(\alpha\) on the evidential support function \(P_{\alpha}\) is there to remind us that more than one such function exists. A host of distinct probability functions satisfy axioms 1–5, so each of them satisfies Bayes’ Theorem. Some of these probability functions may provide a better fit with our intuitive conception of how the evidential support for hypotheses should work. Nevertheless, there are bound to be reasonable differences among Bayesian agents regarding the initial plausibility of a hypothesis \(h_i\). This diversity in initial plausibility assessments is represented by diverse values for prior probabilities for the hypothesis: \(P_{\alpha}[h_i]\), \(P_{\beta}[h_i]\), \(P_{\gamma}[h_i]\), etc. This usually results in diverse values for posterior probabilities for hypotheses: \(P_{\alpha}[h_i \pmid e]\), \(P_{\beta}[h_i \pmid e]\), \(P_{\gamma}[h_i \pmid e]\), etc. So it is important to keep the diversity among evidential support functions in mind.
Here is how the Simple Form of Bayes’ Theorem looks when terms for the experimental (or observational) conditions, \(c\), and the background information and auxiliary hypotheses \(b\) are made explicit:
Bayes’ Theorem: Simple Form with explicit Experimental Conditions, Background Information and Auxiliary Hypotheses
\[\tag{8}\begin{align} P_{\alpha}[h_i \pmid b\cdot c\cdot e]&=\frac{P[e \pmid h_i\cdot b \cdot c] \times P_{\alpha}[h_i \pmid b]}{P_{\alpha}[e \pmid b \cdot c]}\\ &\qquad\times\frac{P_{\alpha}[c \pmid h_i\cdot b]}{P_{\alpha}[c \pmid b]}\\[3ex]& =\frac{P[e \pmid h_i\cdot b\cdot c] \times P_{\alpha}[h_i \pmid b]}{P_{\alpha}[e \pmid b\cdot c]}\\[2ex]& \textrm{when }P_{\alpha}[c \pmid h_i\cdot b] =P_{\alpha}[c \pmid b].\end{align}\]This version of the theorem determines the posterior probability of the hypothesis, \(P_{\alpha}[h_i \pmid b\cdot c\cdot e]\), from the value of the likelihood of the evidence according to that hypothesis (taken together with background and auxiliaries and the experimental conditions), \(P[e \pmid h_i\cdot b\cdot c]\), the value of the prior probability of the hypothesis (on background and auxiliaries), \(P_{\alpha}[h_i \pmid b]\), and the value of the expectedness of the evidence (on background and auxiliaries and the experimental conditions), \(P_{\alpha}[e \pmid b\cdot c]\). Notice that in the factor for the likelihood, \(P[e \pmid h_i\cdot b\cdot c]\), the subscript \(\alpha\) has been dropped. This marks the fact that in scientific contexts the likelihood of an evidential outcome \(e\) on the hypothesis together with explicit background and auxiliary hypotheses and the description of the experimental conditions, \(h_i\cdot b\cdot c\), is usually objectively determinate. This factor represents what the hypothesis (in conjunction with background and auxiliaries) objectively says about the likelihood of possible evidential outcomes of the experimental conditions. So, all reasonable support functions should agree on the values for likelihoods. (Section 5 will treat cases where the likelihoods may lack this kind of objectivity.)
This version of Bayes’ Theorem includes a term that represents the ratio of the likelihood of the experimental conditions on the hypothesis and background information (and auxiliaries) to the “likelihood” of the experimental conditions on the background (and auxiliaries) alone: \(P_{\alpha}[c \pmid h_i\cdot b] / P_{\alpha}[c \pmid b]\). Arguably the value of this term should be 1, or very nearly 1, since the truth of the hypothesis at issue should not significantly affect how likely it is that the experimental conditions are satisfied. If various alternative hypotheses assign significantly different likelihoods to the experimental conditions themselves, then such conditions should more properly be included as part of the evidential outcome \(e\).
Both the prior probability of the hypothesis and the expectedness tend to be somewhat subjective factors in that various agents from the same scientific community may legitimately disagree on what values these factors should take. Bayesian logicians usually accept the apparent subjectivity of the prior probabilities of hypotheses, but find the subjectivity of the expectedness to be more troubling. This is due at least in part to the fact that in a Bayesian logic of evidential support the value of the expectedness cannot be determined independently of likelihoods and prior probabilities of hypotheses. That is, when, for each member of a collection of alternative hypotheses, the likelihood \(P[e \pmid h_j\cdot b\cdot c]\) has an objective (or intersubjectively agreed) value, the expectedness is constrained by the following equation (where the sum ranges over a mutually exclusive and exhaustive collection of alternative hypotheses \(\{h_1, h_2 , \ldots ,h_m , \ldots \}\), which may be finite or countably infinite):

\[P_{\alpha}[e \pmid b\cdot c] = \sum_j P[e \pmid h_j\cdot b\cdot c] \times P_{\alpha}[h_j \pmid b].\]
This equation shows that the values for the prior probabilities together with the values of the likelihoods uniquely determine the value for the expectedness of the evidence. Furthermore, it implies that the value of the expectedness must lie between the largest and smallest of the various likelihood values implied by the alternative hypotheses. However, the precise value of the expectedness can only be calculated this way when every alternative to hypothesis \(h_j\) is specified. In cases where some alternative hypotheses remain unspecified (or undiscovered), the value of the expectedness is constrained in principle by the totality of possible alternative hypotheses, but there is no way to figure out precisely what its value should be.
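The weighted-average character of the expectedness can be checked with a small numerical sketch. The likelihood and prior values below are hypothetical, chosen only for illustration:

```python
# Hypothetical objective likelihoods P[e | h_j.b.c] for a mutually exclusive,
# jointly exhaustive trio of hypotheses, and hypothetical priors P_a[h_j | b].
likelihoods = [0.8, 0.3, 0.1]
priors = [0.5, 0.3, 0.2]   # priors over an exhaustive set must sum to 1

# Expectedness P_a[e | b.c] as the prior-weighted average of the likelihoods.
expectedness = sum(l * p for l, p in zip(likelihoods, priors))
print(expectedness)        # approximately 0.51

# As the text notes, it must lie between the smallest and largest likelihoods.
assert min(likelihoods) <= expectedness <= max(likelihoods)
```

Dropping any one hypothesis from the lists (leaving the set non-exhaustive) makes the computed value an underestimate, which is the point about unspecified alternatives above.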
Troubles with determining a numerical value for the expectedness of the evidence may be circumvented by appealing to another form of Bayes’ Theorem, a ratio form that compares hypotheses one pair at a time:
Bayes’ Theorem: Ratio Form
\[\tag{9}\begin{align}\frac{P_{\alpha}[h_j \pmid b\cdot c\cdot e]}{P_{\alpha}[h_i \pmid b\cdot c\cdot e]}& =\frac{P[e \pmid h_j\cdot b\cdot c]}{P[e \pmid h_i\cdot b\cdot c]}\times\frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]}\\&\qquad\times\frac{P_{\alpha}[c \pmid h_j\cdot b]}{P_{\alpha}[c \pmid h_i\cdot b]}\\[2ex]& =\frac{P[e \pmid h_j\cdot b\cdot c]}{P[e \pmid h_i\cdot b\cdot c]} \times\frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]}\\[2ex]& \textrm{when }P_{\alpha}[c \pmid h_j\cdot b] =P_{\alpha}[c \pmid h_i\cdot b].\end{align}\]The clause \(P_{\alpha}[c \pmid h_j\cdot b] = P_{\alpha}[c \pmid h_i\cdot b]\) says that the experimental (or observation) condition described by \(c\) is as likely on \((h_i\cdot b)\) as on \((h_j\cdot b)\) — i.e., the experimental or observation conditions are no more likely according to one hypothesis than according to the other.[9]
This Ratio Form of Bayes’ Theorem expresses how much more plausible, on the evidence, one hypothesis is than another. Notice that the likelihood ratios carry the full import of the evidence. The evidence influences the evaluation of hypotheses in no other way. The only other factor that influences the value of the ratio of posterior probabilities is the ratio of the prior probabilities. When the likelihoods are fully objective, any subjectivity that affects the ratio of posteriors can only arise via subjectivity in the ratio of the priors.
This version of Bayes’ Theorem shows that in order to evaluate the posterior probability ratios for pairs of hypotheses, the prior probabilities of hypotheses need not be evaluated absolutely; only their ratios are needed. That is, with regard to the priors, the Bayesian evaluation of hypotheses only relies on how much more plausible one hypothesis is than another (due to considerations expressed within \(b\)). This kind of Bayesian evaluation of hypotheses is essentially comparative in that only ratios of likelihoods and ratios of prior probabilities are ever really needed for the assessment of scientific hypotheses. Furthermore, we will soon see that the absolute values of the posterior probabilities of hypotheses entirely derive from the posterior probability ratios provided by the Ratio Form of Bayes’ Theorem.
When the evidence consists of a collection of n distinct experiments or observations, we may explicitly represent this fact by replacing the term ‘\(c\)’ by the conjunction of experimental or observational conditions, \((c_1\cdot c_2\cdot \ldots \cdot c_n)\), and replacing the term ‘\(e\)’ by the conjunction of their respective outcomes, \((e_1\cdot e_2\cdot \ldots \cdot e_n)\). For notational convenience, let’s use the term ‘\(c^n\)’ to abbreviate the conjunction of the n experimental conditions, and the term ‘\(e^n\)’ to abbreviate the conjunction of their n respective outcomes. Relative to any given hypothesis \(h\), the evidential outcomes of distinct experiments or observations will usually be probabilistically independent of one another, and also independent of the experimental conditions for one another. In that case we have:
\[P[e^n \pmid h\cdot b\cdot c^n] = P[e_1 \pmid h\cdot b\cdot c_1] \times \cdots \times P[e_n \pmid h\cdot b\cdot c_n].\]When the Ratio Form of Bayes’ Theorem is extended to explicitly represent the evidence as consisting of a collection of n distinct experiments (or observations) and their respective outcomes, it takes the following form.
Bayes’ Theorem: Ratio Form for a Collection of n Distinct Evidence Claims
\[\tag{9*}\begin{align}\frac{P_{\alpha}[h_j \pmid b\cdot c^n \cdot e^n ] }{P_{\alpha}[h_i \pmid b\cdot c^n \cdot e^n ]}& =\frac{P [ e^n \pmid h_j\cdot b\cdot c^n ]}{P [ e^n \pmid h_i\cdot b\cdot c^n ]}\times\frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]}\\[2ex]&\qquad \times\frac{P_{\alpha} [ c^n \pmid h_j\cdot b]}{P_{\alpha} [ c^n \pmid h_i\cdot b]}\\[2ex]& =\frac{P [ e^n \pmid h_j\cdot b\cdot c^n ]}{P [ e^n \pmid h_i\cdot b\cdot c^n ]}\times\frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]}\\[2ex]&\textrm{when }P_{\alpha} [ c ^n \pmid h _j\cdot b ] =P_{\alpha} [ c ^n \pmid h _i\cdot b ].\end{align}\]Furthermore, when evidence claims are probabilistically independent of one another, we have
\[\tag{9**}\begin{align}\frac{P_{\alpha}[h_j \pmid b\cdot c^n \cdot e^n ] }{P_{\alpha}[h_i \pmid b\cdot c^n \cdot e^n ]}& =\frac{P[e_1 \pmid h_j\cdot b\cdot c_1]}{P[e_1 \pmid h_i\cdot b\cdot c_1]}\times \cdots \\[2ex]&\qquad \times\frac{P[e_n \pmid h_{j }\cdot b\cdot c_{ n}]}{P[e_n \pmid h_{i }\cdot b\cdot c_{ n}]}\times\frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]}.\end{align}\]Let’s consider a simple example of how the Ratio Form of Bayes’ Theorem applies to a collection of independent evidential events. Suppose we possess a warped coin and want to determine its propensity for heads when tossed in the usual way. Consider two hypotheses, \(h_{[p]}\) and \(h_{[q]}\), which say that the propensities for the coin to come up heads on the usual kinds of tosses are \(p\) and \(q\), respectively. Let \(c^n\) report that the coin is tossed n times in the normal way, and let \(e^n\) report that precisely m occurrences of heads have resulted. Supposing that the outcomes of such tosses are probabilistically independent (asserted by \(b\)), the respective likelihoods take the binomial form
\[P[e^n \pmid h_{[r]}\cdot b\cdot c^n] =\frac{n!}{m! \times(n-m)!}\times r^m (1-r)^{n-m},\]with \(r\) standing in for \(p\) and for \(q\), respectively. Then, Equation 9** yields the following formula, where the likelihood ratio is the ratio of the respective binomial terms:
\[\frac{P_{\alpha}[h_{[p]} \pmid b\cdot c^{n }\cdot e^{ n}]}{P_{\alpha}[h_{[q]} \pmid b\cdot c^{n }\cdot e^{ n}]}=\frac{p^m (1-p)^{n-m}}{q^m (1-q)^{n-m}}\times\frac{P_{\alpha}[h_{[p]} \pmid b]}{P_{\alpha}[h_{[q]} \pmid b]}\]When, for instance, the coin is tossed \(n = 100\) times and comes up heads \(m = 72\) times, the evidence for hypothesis \(h_{[1/2]}\) as compared to \(h_{[3/4]}\) is given by the likelihood ratio
\[\frac{P [ e^n \pmid h_{[1/2]}\cdot b\cdot c^n ]}{P [ e^n \pmid h_{[3/4]}\cdot b\cdot c^n ]} = \frac{[(1/2)^{72}(1/2)^{28}]}{[(3/4)^{72}(1/4)^{28}]} = .000056269. \]In that case, even if the prior plausibility considerations (expressed within \(b\)) make it 100 times more plausible that the coin is fair than that it is warped towards heads with propensity 3/4 — i.e., even if \(P_{\alpha}[h_{[1/2]} \pmid b] / P_{\alpha}[h_{[3/4]} \pmid b] = 100\) — the evidence provided by these tosses makes the posterior plausibility that the coin is fair only about 6/1000ths as plausible as the hypothesis that it is warped towards heads with propensity 3/4:
\[\frac{P_{\alpha}[h_{[1/2]} \pmid b\cdot c^{n}\cdot e^{n}]}{P_{\alpha}[h_{[3/4]} \pmid b\cdot c^{n}\cdot e^{n}]} = .0056269.\]Thus, such evidence strongly refutes the “fairness hypothesis” relative to the “3/4-heads hypothesis”, provided the assessment of prior plausibilities doesn’t make the latter hypothesis too implausible to begin with. Notice, however, that strong refutation is not absolute refutation. Additional evidence could reverse this trend towards the refutation of the fairness hypothesis.
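These numbers are easy to verify computationally. The following sketch implements the binomial formula from the text; the helper name `binomial_likelihood` is ours:

```python
from math import comb

def binomial_likelihood(r, n, m):
    """P[e^n | h_[r].b.c^n]: probability of exactly m heads in n
    probabilistically independent tosses with heads-propensity r."""
    return comb(n, m) * r**m * (1 - r)**(n - m)

n, m = 100, 72
# Likelihood ratio for h_[1/2] versus h_[3/4]; the binomial
# coefficients are identical in numerator and denominator, so they cancel.
likelihood_ratio = (binomial_likelihood(0.5, n, m)
                    / binomial_likelihood(0.75, n, m))
print(f"{likelihood_ratio:.9f}")   # prints 0.000056269

# Even a 100-to-1 prior plausibility ratio in favor of fairness is overwhelmed:
posterior_ratio = likelihood_ratio * 100
print(f"{posterior_ratio:.7f}")    # prints 0.0056269
```

Note that the whole calculation reduces to \(2^{100}/3^{72}\) after cancellation, which is why the binomial coefficients never matter for the comparison.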
This example employs repetitions of the same kind of experiment—repeated tosses of a coin. But the point holds more generally. If, as the evidence increases, the likelihood ratios
\[\frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]}\]approach 0, then the Ratio Forms of Bayes’ Theorem, Equations \(9*\) and \(9**\), show that the posterior probability of \(h_j\) must approach 0 as well, since
\[P_{\alpha}[h_j \pmid b\cdot c^{n}\cdot e^{n}] \le \frac{P_{\alpha}[h_j \pmid b\cdot c^{n}\cdot e^{n}]}{P_{\alpha}[h_i \pmid b\cdot c^{n}\cdot e^{n}]}.\]Such evidence comes to strongly refute \(h_j\), with little regard for its prior plausibility value. Indeed, Bayesian induction turns out to be a version of eliminative induction, and Equations \(9*\) and \(9**\) begin to illustrate this. For, suppose that \(h_i\) is the true hypothesis, and consider what happens to each of its false competitors, \(h_j\). If enough evidence becomes available to drive each of the likelihood ratios
\[\frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]}\]toward 0 (as n increases), then Equation \(9*\) says that each false \(h_j\) will become effectively refuted — each of their posterior probabilities will approach 0 (as n increases). As a result, the posterior probability of \(h_i\) must approach 1. The next two equations show precisely how this works.
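This eliminative dynamic can be illustrated with a tiny simulation. Everything here is an illustrative construction, not part of the text's formal apparatus: tosses are generated by a coin whose true heads-propensity is 3/4, and the posterior ratio of the false hypothesis \(h_{[1/2]}\) to the true one is updated one toss at a time (probabilistic independence lets the ratio factor per toss, as in Equation 9**):

```python
import random

random.seed(1)      # fixed seed for a reproducible illustrative run
p_true = 0.75       # the coin's actual heads-propensity (the true h_[3/4])

# Ratio of posteriors for the false h_[1/2] over the true h_[3/4],
# starting from equal priors (prior ratio = 1). Independence means the
# ratio picks up one likelihood-ratio factor per toss.
ratio = 1.0
for _ in range(500):
    heads = random.random() < p_true
    if heads:
        ratio *= 0.5 / 0.75   # per-toss likelihood ratio on a heads outcome
    else:
        ratio *= 0.5 / 0.25   # per-toss likelihood ratio on a tails outcome

# Very probably a vanishingly small number: the false hypothesis is
# effectively eliminated as evidence accumulates.
print(ratio)
```

A heads outcome multiplies the ratio by 2/3 while a tails outcome doubles it, but since heads arrive about three times as often, the downward factors dominate and the ratio is driven toward 0, just as the text describes.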
If we sum the ratio versions of Bayes’ Theorem in Equation \(9*\) over all alternatives to hypothesis \(h_i\) (including the catch-all alternative \(h_K\), if appropriate), we get the Odds Form of Bayes’ Theorem. By definition, the odds against a statement \(A\) given \(B\) is related to the probability of \(A\) given \(B\) as follows:
\[\Omega_{\alpha}[{\nsim}A \pmid B] = \frac{P_{\alpha}[{\nsim}A \pmid B]}{P_{\alpha}[A \pmid B]} = \frac{1 - P_{\alpha}[A \pmid B]}{P_{\alpha}[A \pmid B]}.\]This notion ofodds gives rise to the following version of Bayes’ Theorem:
Bayes’ Theorem: Odds Form
\[\tag{10}\begin{align} \Omega_{\alpha}[{\nsim}h_i \pmid b\cdot c^{n }\cdot e^{ n}]& = \sum_{j\ne i}\frac{P_{\alpha}[h_j \pmid b\cdot c^{n }\cdot e^{ n}]}{P_{\alpha}[h_i \pmid b\cdot c^{n }\cdot e^{ n}]}\\ &\qquad+\frac{P_{\alpha}[h_K \pmid b\cdot c^{n }\cdot e^{ n}]}{P_{\alpha}[h_i \pmid b\cdot c^{n }\cdot e^{ n}]}\\[2ex]& =\sum_{ j\ne i}\frac{P[e^n \pmid h_{j }\cdot b\cdot c^{ n}]}{P[e^n \pmid h_{i }\cdot b\cdot c^{ n}]}\times\frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]}\\[2ex]&\qquad +\frac{P_{\alpha}[e^n \pmid h_{K }\cdot b\cdot c^{ n}]}{P[e^n \pmid h_{i }\cdot b\cdot c^{ n}]}\times\frac{P_{\alpha}[h_K \pmid b]}{P_{\alpha}[h_i \pmid b]}\end{align}\]where the factor following the ‘+’ sign is only required in cases where a catch-all alternative hypothesis, \(h_K\), is needed.
Recall that when we have a finite collection of concrete alternative hypotheses available, \(\{h_1, h_2 , \ldots ,h_m\}\), but where this set of alternatives is not exhaustive (where additional, unarticulated, undiscovered alternative hypotheses may exist), the catch-all alternative hypothesis \(h_K\) is just the denial of each of the concrete alternatives, \(({\nsim}h_1\cdot{\nsim}h_2\cdot \ldots \cdot{\nsim}h_m)\). Generally, the likelihood of evidence claims relative to a catch-all hypothesis will not enjoy the same kind of objectivity possessed by the likelihoods for concrete alternative hypotheses. So, we leave the subscript \(\alpha\) attached to the likelihood for the catch-all hypothesis to indicate this lack of objectivity.
Although the catch-all hypothesis may lack objective likelihoods, the influence of the catch-all term in Bayes’ Theorem diminishes as additional concrete hypotheses are articulated. That is, as new hypotheses are discovered they are “peeled off” of the catch-all. So, when a new hypothesis \(h_{m+1}\) is formulated and made explicit, the old catch-all hypothesis \(h_K\) is replaced by a new catch-all, \(h_{K*}\), of form \(({\nsim}h_1\cdot{\nsim}h_2\cdot \ldots \cdot{\nsim}h_{m}\cdot{\nsim}h_{m+1})\); and the prior probability for the new catch-all hypothesis is gotten by diminishing the prior of the old catch-all: \(P_{\alpha}[h_{K*} \pmid b] = P_{\alpha}[h_K \pmid b] - P_{\alpha}[h_{m+1} \pmid b]\). Thus, the influence of the catch-all term should diminish towards 0 as new alternative hypotheses are made explicit.[10]
If increasing evidence drives towards 0 the likelihood ratios comparing each competitor \(h_j\) with hypothesis \(h_i\), then the odds against \(h_i\), \(\Omega_{\alpha}[{\nsim}h_i \pmid b\cdot c^{n}\cdot e^{n}]\), will approach 0 (provided that priors of catch-all terms, if needed, approach 0 as well, as new alternative hypotheses are made explicit and peeled off). And, as \(\Omega_{\alpha}[{\nsim}h_i \pmid b\cdot c^{n}\cdot e^{n}]\) approaches 0, the posterior probability of \(h_i\) goes to 1. This derives from the fact that the odds against \(h_i\) is related to its posterior probability by the following formula:
Bayes’ Theorem: General Probabilistic Form
\[\tag{11} P_{\alpha}[h_i \pmid b\cdot c^{n}\cdot e^n] = \frac{1}{1 + \Omega_{\alpha}[{\nsim}h_i \pmid b\cdot c^n\cdot e^{n}]}.\]The odds against a hypothesis depends only on the values of ratios of posterior probabilities, which derive entirely from the Ratio Form of Bayes’ Theorem. Thus, the individual value of the posterior probability of a hypothesis likewise depends only on such ratios, and the Ratio Form of Bayes’ Theorem captures all the essential features of the Bayesian evaluation of hypotheses. It shows how the impact of evidence (in the form of likelihood ratios) combines with comparative plausibility assessments of hypotheses (in the form of ratios of prior probabilities) to provide a net assessment of the extent to which hypotheses are refuted or supported via contests with their rivals.
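Equation (11) can be illustrated numerically. In a sketch like the following (the ratio values are invented for illustration), the odds against \(h_i\) are the sum of the posterior ratios of its competitors, and the posterior of \(h_i\) falls out of Equation (11):

```python
# Illustrative sketch of Equation (11):
#   P[h_i | b·c^n·e^n] = 1 / (1 + Ω[~h_i | b·c^n·e^n]),
# where the odds against h_i are the sum of the posterior ratios
# P[h_j|...]/P[h_i|...] over its competitors (catch-all included).
# The ratio values below are invented for illustration.

def posterior_from_ratios(posterior_ratios_vs_hi):
    """Posterior probability of h_i, given the posterior ratios of all
    competitors h_j against h_i."""
    odds_against_hi = sum(posterior_ratios_vs_hi)
    return 1.0 / (1.0 + odds_against_hi)

# Three competitors whose posterior ratios have been driven low by evidence:
ratios = [0.05, 0.02, 0.01]
post = posterior_from_ratios(ratios)
print(round(post, 4))   # 0.9259, i.e. 1/1.08
```

As the ratios approach 0, the sum (the odds against \(h_i\)) approaches 0 and the posterior of \(h_i\) approaches 1, exactly as the text describes.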
There is a result, a kind of Bayesian Convergence Theorem, that shows that if \(h_i\) (together with \(b\cdot c^n)\) is true, then the likelihood ratios
\[\frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]}\]comparing evidentially distinguishable alternative hypothesis \(h_j\) to \(h_i\) will very probably approach 0 as evidence accumulates (i.e., as n increases). Let’s call this result the Likelihood Ratio Convergence Theorem. When this theorem applies, Equation \(9^*\) shows that the posterior probability of a false competitor \(h_j\) will very probably approach 0 as evidence accumulates, regardless of the value of its prior probability \(P_{\alpha}[h_j \pmid b]\). As this happens to each of \(h_i\)’s false competitors, Equations 10 and 11 say that the posterior probability of the true hypothesis, \(h_i\), will approach 1 as evidence increases.[11] Thus, Bayesian induction is at bottom a version of induction by elimination, where the elimination of alternatives comes by way of likelihood ratios approaching 0 as evidence accumulates. Thus, when the Likelihood Ratio Convergence Theorem applies, the Criterion of Adequacy for an Inductive Logic described at the beginning of this article will be satisfied: As evidence accumulates, the degree to which the collection of true evidence statements comes to support a hypothesis, as measured by the logic, should very probably come to indicate that false hypotheses are probably false and that true hypotheses are probably true. We will examine this Likelihood Ratio Convergence Theorem in Section 4.[12]
A view called Likelihoodism relies on likelihood ratios in much the same way as the Bayesian logic articulated above. However, Likelihoodism attempts to avoid the use of prior probabilities. For an account of this alternative view, see Supplement: Likelihood Ratios, Likelihoodism, and the Law of Likelihood. For more discussion of Bayes’ Theorem and its application, see the entries on Bayes’ Theorem and on Bayesian Epistemology in this Encyclopedia.
Given that a scientific community should largely agree on the values of the likelihoods, any significant disagreement among them with regard to the values of posterior probabilities of hypotheses should derive from disagreements over their assessments of values for the prior probabilities of those hypotheses. We saw in Section 3.3 that the Bayesian logic of evidential support need only rely on assessments of ratios of prior probabilities—on how much more plausible one hypothesis is than another. Thus, the logic of evidential support only requires that scientists can assess the comparative plausibilities of various hypotheses. Presumably, in scientific contexts the comparative plausibility values for hypotheses should depend on explicit plausibility arguments, not merely on privately held opinions. (Formally, the logic may represent comparative plausibility arguments by explicit statements expressed within \(b\).) It would be highly unscientific for a member of the scientific community to disregard or dismiss a hypothesis that other members take to be a reasonable proposal with only the comment, “don’t ask me to give my reasons, it’s just my opinion”. Even so, agents may be unable to specify precisely how much more strongly the available plausibility arguments support a hypothesis over an alternative; so prior probability ratios for hypotheses may be vague. Furthermore, agents in a scientific community may disagree about how strongly the available plausibility arguments support a hypothesis over a rival hypothesis; so prior probability ratios may be somewhat diverse as well.
Both the vagueness of comparative plausibility assessments for individual agents and the diversity of such assessments among the community of agents can be represented formally by sets of support functions, \(\{P_{\alpha}, P_{\beta}, \ldots \}\), that agree on the values for the likelihoods but encompass a range of values for the (ratios of) prior probabilities of hypotheses. Vagueness and diversity are somewhat different issues, but they may be represented in much the same way. Let’s briefly consider each in turn.
Assessments of the prior plausibilities of hypotheses will often be vague—not subject to the kind of precise quantitative treatment that a Bayesian version of probabilistic inductive logic may seem to require for prior probabilities. So, it may seem that the kind of assessment of prior probabilities required to get the Bayesian algorithm going cannot be accomplished in practice. To see how Bayesian inductivists address this worry, first recall the Ratio Form of Bayes’ Theorem, Equation \(9^*\).
\[\frac{P_{\alpha}[h_j \pmid b\cdot c^{n}\cdot e^{n}]}{P_{\alpha}[h_i \pmid b\cdot c^{n}\cdot e^{n}]}=\frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]}\times\frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]}\]Recall that this Ratio Form of the theorem captures the essential features of the logic of evidential support, even though it only provides a value for the ratio of the posterior probabilities. Notice that the ratio form of the theorem easily accommodates situations where we don’t have precise numerical values for prior probabilities. It only depends on our ability to assess how much more or less plausible alternative hypothesis \(h_j\) is than hypothesis \(h_i\)—only the value of the ratio \(P_{\alpha}[h_j \pmid b] / P_{\alpha}[h_i \pmid b]\) need be assessed; the values of the individual prior probabilities are not needed. Such comparative plausibilities are much easier to assess than specific numerical values for the prior probabilities of individual hypotheses. When combined with the ratio of likelihoods, this ratio of priors suffices to yield an assessment of the ratio of posterior plausibilities,
\[\frac{P_{\alpha}[h_j \pmid b\cdot c^{n}\cdot e^{n}]}{P_{\alpha}[h_i \pmid b\cdot c^{n}\cdot e^{n}]}.\]Although such posterior ratios don’t supply values for the posterior probabilities of individual hypotheses, they place a crucial constraint on the posterior support of hypothesis \(h_j\), since
\[\begin{align}P_{\alpha}[h_j \pmid b\cdot c^{n}\cdot e^{n}] & \lt \frac{P_{\alpha}[h_j \pmid b\cdot c^{n}\cdot e^{n}]}{P_{\alpha}[h_i \pmid b\cdot c^{n}\cdot e^{n}]}\\& =\frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]}\times\frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]}\end{align}\]This Ratio Form of Bayes’ Theorem tolerates a good deal of vagueness or imprecision in assessments of the ratios of prior probabilities. In practice one need only assess bounds for these prior plausibility ratios to achieve meaningful results. Given a prior ratio in a specific interval,
\[q \le \frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]} \le r\]a likelihood ratio
\[\frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]} = \LR^n\]results in a posterior support ratio in the interval
\[(\LR^n\times q) \le \frac{P_{\alpha}[h_j \pmid b\cdot c^{n}\cdot e^{n}]}{P_{\alpha}[h_i \pmid b\cdot c^{n}\cdot e^{n}]} \le (\LR^n \times r).\](Technically each probabilistic support function assigns a specific numerical value to each pair of sentences; so when we write an inequality like
\[q \le \frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]} \le r\]we are really referring to a set of probability functions \(P_{\alpha}\), a vagueness set, for which the inequality holds. Thus, technically, the Bayesian logic employs sets of probabilistic support functions to represent the vagueness in comparative plausibility values for hypotheses.)
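The interval bound just displayed is easy to compute. In the sketch below (all numerical values are invented for illustration), even a quite vague prior ratio interval \([q, r]\) yields a decisive posterior verdict once the likelihood ratio is small:

```python
# Illustrative sketch: bounding the posterior ratio
# P[h_j|b·c^n·e^n] / P[h_i|b·c^n·e^n] given only an interval [q, r]
# for the prior ratio. All numbers are invented for illustration.

def posterior_ratio_bounds(q, r, likelihood_ratio):
    """The posterior ratio lies in [LR^n * q, LR^n * r]."""
    return likelihood_ratio * q, likelihood_ratio * r

q, r = 1.0, 10.0   # h_j judged between equally plausible and 10x as plausible as h_i
lr_n = 0.001       # likelihood ratio after n observations, strongly favoring h_i
lo, hi = posterior_ratio_bounds(q, r, lr_n)
print(lo, hi)      # ≈ 0.001 and 0.01: h_j is effectively refuted despite vague priors
```

This is the point of the passage: only bounds on prior ratios are needed, and shrinking likelihood ratios squeeze the whole posterior interval towards 0.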
Observe that if the likelihood ratio values \(\LR^n\) approach 0 as the amount of evidence \(e^n\) increases, the interval of values for the posterior probability ratio must become tighter as the upper bound (\(\LR^n\times r)\) approaches 0. Furthermore, the absolute degree of support for \(h_j\), \(P_{\alpha}[h_j \pmid b\cdot c^{n}\cdot e^{n}]\), must also approach 0.
This observation is really useful. For, it can be shown that when \(h_{i}\cdot b\cdot c^{n}\) is true and \(h_j\) is empirically distinct from \(h_i\), the continual pursuit of evidence is very likely to result in evidential outcomes \(e^n\) that (as n increases) yield values of likelihood ratios \(P[e^n \pmid h_{j}\cdot b\cdot c^{n}] / P[e^n \pmid h_{i}\cdot b\cdot c^{n}]\) that approach 0 as the amount of evidence increases. This result, called the Likelihood Ratio Convergence Theorem, will be investigated in more detail in Section 4. When that kind of convergence towards 0 for likelihood ratios occurs, the upper bound on the posterior probability ratio also approaches 0, driving the posterior probability of \(h_j\) to approach 0 as well, effectively refuting hypothesis \(h_j\). Thus, false competitors of a true hypothesis will effectively be eliminated by increasing evidence. As this happens, Equations 9* through 11 show that the posterior probability \(P_{\alpha}[h_i \pmid b\cdot c^{n}\cdot e^{n}]\) of the true hypothesis \(h_i\) approaches 1.
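The convergence behavior described here can be sketched with a toy simulation. The two hypotheses below are simple Bernoulli models with invented parameters; data are generated under the true hypothesis \(h_i\), and the likelihood ratio for the false competitor \(h_j\) collapses:

```python
# Illustrative toy simulation (not part of the theorem itself): two
# hypotheses assign different Bernoulli probabilities to each outcome;
# data are generated under the true hypothesis h_i, and the ratio
# P[e^n|h_j·b·c^n] / P[e^n|h_i·b·c^n] is computed. Parameters invented.

import random

random.seed(0)

p_i = 0.5   # probability h_i assigns to outcome "1" on each trial
p_j = 0.7   # probability the false competitor h_j assigns

def likelihood_ratio(outcomes):
    """Likelihood ratio of h_j over h_i for independent trials."""
    lr = 1.0
    for o in outcomes:
        lr *= (p_j if o else 1 - p_j) / (p_i if o else 1 - p_i)
    return lr

# Generate n = 500 outcomes under the true hypothesis h_i.
outcomes = [random.random() < p_i for _ in range(500)]
lr_n = likelihood_ratio(outcomes)
print(lr_n)   # a tiny value, far below 1: h_j's ratio has collapsed
```

This is exactly the eliminative pattern the passage describes: under the true hypothesis, the evidence very probably drives the false competitor’s likelihood ratio towards 0.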
Thus, the Bayesian logic of inductive support for hypotheses is a form of eliminative induction, where the evidence effectively refutes false alternatives to the true hypothesis. Because of its eliminative nature, the Bayesian logic of evidential support doesn’t require precise values for prior probabilities. It only needs to draw on bounds on the values of comparative plausibility ratios, and these bounds only play a significant role while evidence remains fairly weak. If the true hypothesis is assessed to be comparatively plausible (due to plausibility arguments contained in b), then plausibility assessments give it a leg-up over alternatives. If the true hypothesis is assessed to be comparatively implausible, the plausibility assessments merely slow down the rate at which it comes to dominate its rivals, reflecting the idea that extraordinary hypotheses require extraordinary evidence (or an extraordinary accumulation of evidence) to overcome their initial implausibilities. Thus, as evidence accumulates, the agent’s vague initial plausibility assessments transform into quite sharp posterior probabilities that indicate their strong refutation or support by the evidence.
When the various agents in a community widely disagree over the non-evidential plausibilities of hypotheses, the Bayesian logic of evidential support may represent this kind of diversity across the community of agents as a collection of the agents’ vagueness sets of support functions. Let’s call such a collection of support functions a diversity set. That is, a diversity set is just a set of support functions \(P_{\alpha}\) that cover the ranges of values for comparative plausibility assessments for pairs of competing hypotheses
\[q \le \frac{P_{\alpha}[h_j \pmid b]}{P_{\alpha}[h_i \pmid b]} \le r\]as assessed by the scientific community. But, once again, if accumulating evidence drives the likelihood ratios comparing various alternative hypotheses to the true hypothesis towards 0, the range of support functions in a diversity set will come to near agreement, near 0, on the values for posterior probabilities of false competitors of the true hypothesis. So, not only does such evidence firm up each agent’s vague initial plausibility assessment, it also brings the whole community into agreement on the near refutation of empirically distinct competitors of a true hypothesis. As this happens, the posterior probability of the true hypothesis may approach 1. The Likelihood Ratio Convergence Theorem implies that this kind of convergence to the truth should very probably happen, provided that the true hypothesis is empirically distinct enough from its rivals.
One more point about prior probabilities and Bayesian convergence should be mentioned before proceeding to Section 4. Some subjectivist versions of Bayesian induction seem to suggest that an agent’s prior plausibility assessments for hypotheses should stay fixed once-and-for-all, and that all plausibility updating should be brought about via the likelihoods in accord with Bayes’ Theorem. Critics argue that this is unreasonable. The members of a scientific community may quite legitimately revise their (comparative) prior plausibility assessments for hypotheses from time to time as they rethink plausibility arguments and bring new considerations to bear. This seems a natural part of the conceptual development of a science. It turns out that such reassessments of the comparative plausibilities of hypotheses pose no difficulty for the probabilistic inductive logic discussed here. Such reassessments may be represented by the addition or modification of explicit statements that modify the background information b. Such reassessments may result in (non-Bayesian) transitions to new vagueness sets for individual agents and new diversity sets for the community. The logic of Bayesian induction (as described here) has nothing to say about what values the prior plausibility assessments for hypotheses should have; and it places no restrictions on how they might change over time. Provided that the series of reassessments of (comparative) prior plausibilities doesn’t happen to diminish the (comparative) prior plausibility value of the true hypothesis towards zero (or, at least, doesn’t do so too quickly), the Likelihood Ratio Convergence Theorem implies that the evidence will very probably bring the posterior probabilities of empirically distinct rivals of the true hypothesis to approach 0 via decreasing likelihood ratios; and as this happens, the posterior probability of the true hypothesis will head towards 1.
(Those interested in a Bayesian account of Enumerative Induction and the estimation of values for relative frequencies of attributes in populations should see the supplement, Enumerative Inductions: Bayesian Estimation and Convergence.)
In this section we will investigate the Likelihood Ratio Convergence Theorem. This theorem shows that under certain reasonable conditions, when hypothesis \(h_i\) (in conjunction with auxiliaries in b) is true and an alternative hypothesis \(h_j\) is empirically distinct from \(h_i\) on some possible outcomes of experiments or observations described by conditions \(c_k\), then it is very likely that a long enough sequence of such experiments and observations c\(^n\) will produce a sequence of outcomes \(e^n\) that yields likelihood ratios \(P[e^n \pmid h_{j}\cdot b\cdot c^{n}] / P[e^n \pmid h_{i}\cdot b\cdot c^{n}]\) that approach 0, favoring \(h_i\) over \(h_j\), as evidence accumulates (i.e., as n increases). This theorem places an explicit lower bound on the “rate of probable convergence” of these likelihood ratios towards 0. That is, it puts a lower bound on how likely it is, if \(h_i\) is true, that a stream of outcomes will occur that yields likelihood ratio values against \(h_j\) as compared to \(h_i\) that lie within any specified small distance above 0.
The theorem itself does not require the full apparatus of Bayesian probability functions. It draws only on likelihoods. Neither the statement of the theorem nor its proof employ prior probabilities of any kind. So even likelihoodists, who eschew the use of Bayesian prior probabilities, may embrace this result. Given the forms of Bayes’ Theorem, 9*-11 from the previous section, the Likelihood Ratio Convergence Theorem further implies the likely convergence to 0 of the posterior probabilities of false competitors of a true hypothesis. That is, when the ratios \(P[e^n \pmid h_{j}\cdot b\cdot c^{n}] / P[e^n \pmid h_{i}\cdot b\cdot c^{n}]\) approach 0 for increasing n, the Ratio Form of Bayes’ Theorem, Equation 9*, says that the posterior probability of \(h_j\) must also approach 0 as evidence accumulates, regardless of the value of its prior probability. So, support functions in collections representing vague prior plausibilities for an individual agent (i.e., a vagueness set) and representing the diverse range of priors for a community of agents (i.e., a diversity set) will come to agree on the near 0 posterior probability of empirically distinct false rivals of a true hypothesis. And as the posterior probabilities of false competitors fall, the posterior probability of the true hypothesis heads towards 1. Thus, the theorem establishes that the inductive logic of probabilistic support functions satisfies the Criterion of Adequacy (CoA) suggested at the beginning of this article.
The Likelihood Ratio Convergence Theorem merely provides some sufficient conditions for probable convergence. But likelihood ratios may well converge towards 0 (in the way described by the theorem) even when the antecedent conditions of the theorem are not satisfied. This theorem overcomes many of the objections raised by critics of Bayesian convergence results. First, this theorem does not employ second-order probabilities; it says nothing about the probability of a probability. It only concerns the probability of a particular disjunctive sentence that expresses a disjunction of various possible sequences of experimental or observational outcomes. The theorem does not require evidence to consist of sequences of events that, according to the hypothesis, are identically distributed (like repeated tosses of a die). The result is most easily expressed in cases where the individual outcomes of a sequence of experiments or observations are probabilistically independent, given each hypothesis. So that is the version that will be presented in this section. However, a version of the theorem also holds when the individual outcomes of the evidence stream are not probabilistically independent, given the hypotheses. (This more general version of the theorem will be presented in a supplement on the Probabilistic Refutation Theorem, below, where the proof of both versions is provided.) In addition, this result does not rely on supposing that the probability functions involved are countably additive. Furthermore, the explicit lower bounds on the rate of convergence provided by this result mean that there is no need to wait for the infinitely long run before convergence occurs (as some critics seem to think).
It is sometimes claimed that Bayesian convergence results only work when an agent locks in values for the prior probabilities of hypotheses once-and-for-all, and then updates posterior probabilities from there only by conditioning on evidence via Bayes’ Theorem. The Likelihood Ratio Convergence Theorem, however, applies even if agents revise their prior probability assessments over time. Such non-Bayesian shifts from one support function (or vagueness set) to another may arise from new plausibility arguments or from reassessments of the strengths of old ones. The Likelihood Ratio Convergence Theorem itself only involves the values of likelihoods. So, provided such reassessments don’t push the prior probability of the true hypothesis towards 0 too rapidly, the theorem implies that the posterior probabilities of each empirically distinct false competitor will very probably approach 0 as evidence increases.[13]
To specify the details of the Likelihood Ratio Convergence Theorem we’ll need a few additional notational conventions and definitions. Here they are.
For a given sequence of n experiments or observations \(c^n\), consider the set of those possible sequences of outcomes that would result in likelihood ratios for \(h_j\) over \(h_i\) that are less than some chosen small number \(\varepsilon \gt 0\). This set is represented by the expression,
\[\left\{e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]} \lt \varepsilon \right\}.\]Placing the disjunction symbol ‘\(\vee\)’ in front of this expression yields an expression,
\[\vee \left\{ e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]} \lt \varepsilon \right\} ,\]that we’ll use to represent the disjunction of all outcome sequences \(e^n\) in this set. So,
\[\vee \left\{ e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]} \lt \varepsilon \right\} \]is just a particular sentence that says, in effect, “one of the sequences of outcomes of the first n experiments or observations will occur that makes the likelihood ratio for \(h_j\) over \(h_i\) less than \(\varepsilon\)”.
The Likelihood Ratio Convergence Theorem says that under certain conditions (covered in detail below), the likelihood of a disjunctive sentence of this sort, given that ‘\(h_{i}\cdot b\cdot c^{n}\)’ is true,
\[ P \left[\vee \left\{ e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^n]}{P[e^n \pmid h_i\cdot b\cdot c^{n}]} \lt \varepsilon \right\} \pmid h_{i}\cdot b\cdot c^{n}\right] ,\]must be at least \(1-(\psi /n)\), for some explicitly calculable term \(\psi\). Thus, the true hypothesis \(h_i\) probabilistically implies that as the amount of evidence, n, increases, it becomes highly likely (as close to 1 as you please) that one of the outcome sequences \(e^n\) will occur that yields a likelihood ratio \(P[e^n \pmid h_{j}\cdot b\cdot c^{n}] / P[e^n \pmid h_{i}\cdot b\cdot c^{n}]\) less than \(\varepsilon\); and this holds for any specific value of \(\varepsilon\) you may choose. As this happens, the posterior probability of \(h_i\)’s false competitor, \(h_j\), must approach 0, as required by the Ratio Form of Bayes’ Theorem, Equation 9*.
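For a small enough toy case, the probability of this disjunctive sentence can be computed exactly by enumeration. The sketch below uses a pair of Bernoulli hypotheses with invented parameters; it sums \(P[e^n \pmid h_i\cdot b\cdot c^n]\) over precisely those outcome sequences whose likelihood ratio falls below \(\varepsilon\):

```python
# Illustrative sketch: exactly computing
#   P[∨{e^n : P[e^n|h_j·b·c^n]/P[e^n|h_i·b·c^n] < ε} | h_i·b·c^n]
# for a toy pair of Bernoulli hypotheses. All parameters are invented.

from itertools import product

p_i, p_j = 0.5, 0.8   # per-trial probabilities of outcome "1" under h_i and h_j
n, eps = 10, 0.2

def seq_prob(seq, p):
    """Probability of an outcome sequence under independent trials."""
    prob = 1.0
    for o in seq:
        prob *= p if o else 1 - p
    return prob

# Sum P[e^n | h_i] over exactly those sequences whose likelihood ratio
# P[e^n | h_j] / P[e^n | h_i] falls below eps.
total = 0.0
for seq in product([0, 1], repeat=n):
    pi, pj = seq_prob(seq, p_i), seq_prob(seq, p_j)
    if pj / pi < eps:
        total += pi
print(round(total, 3))   # 0.623
```

Already at \(n = 10\) the true hypothesis assigns substantial probability to obtaining a strongly \(h_j\)-disfavoring outcome sequence; the theorem’s bound \(1-(\psi/n)\) guarantees that this probability heads to 1 as n grows.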
The term \(\psi\) in the lower bound of this probability depends on a measure of the empirical distinctness of the two hypotheses \(h_j\) and \(h_i\) for the proposed sequence of experiments and observations \(c^n\). To specify this measure we need to contemplate the collection of possible outcomes of each experiment or observation. So, consider some sequence of experimental or observational conditions described by sentences \(c_1,c_2 ,\ldots ,c_n\). Corresponding to each condition \(c_k\) there will be some range of possible alternative outcomes. Let \(O_{k} = \{o_{k1},o_{k2},\ldots ,o_{kw}\}\) be a set of statements describing the alternative possible outcomes for condition \(c_k\). (The number of alternative outcomes will usually differ for distinct experiments among those in the sequence \(c_1 ,\ldots ,c_n\); so, the value of w may depend on \(c_k\).) For each hypothesis \(h_j\), the alternative outcomes of \(c_k\) in \(O_k\) are mutually exclusive and exhaustive, so we have:
\[ P[o_{ku}\cdot o_{kv} \pmid h_j\cdot b\cdot c_{k}] = 0 \textrm{ for } u \ne v, \textrm{ and } \sum^{w}_{u = 1} P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] =1 . \]We now let expressions of form ‘\(e_k\)’ act as variables that range over the possible outcomes of condition \(c_k\)—i.e., \(e_k\) ranges over the members of \(O_k\). As before, ‘\(c^n\)’ denotes the conjunction of the first n test conditions, \((c_1\cdot c_2\cdot \ldots \cdot c_n)\), and ‘\(e^n\)’ represents possible sequences of corresponding outcomes, \((e_1\cdot e_2\cdot \ldots \cdot e_n)\). Let’s use the expression ‘E\(^n\)’ to represent the set of all possible outcome sequences that may result from the sequence of conditions c\(^n\). So, for each hypothesis \(h_j\) (including \(h_i)\), \(\sum_{e^n\in E^n} P[e^n \pmid h_{j}\cdot b\cdot c^{n}] = 1\).
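The normalization claim at the end of this paragraph can be checked mechanically for a small case. The sketch below (with invented outcome distributions, and outcomes treated as independent given the hypothesis) enumerates every sequence in E\(^n\) and confirms that the probabilities sum to 1:

```python
# Illustrative sketch: checking Σ_{e^n ∈ E^n} P[e^n | h_j·b·c^n] = 1
# for a toy hypothesis. Each condition c_k gets its own (invented)
# outcome distribution; outcome spaces may differ in size.

from itertools import product
from math import isclose

outcome_dists = [
    {"o11": 0.2, "o12": 0.8},                # two possible outcomes for c_1
    {"o21": 0.5, "o22": 0.3, "o23": 0.2},    # three possible outcomes for c_2
]

total = 0.0
for seq in product(*outcome_dists):          # iterates over all outcome sequences
    prob = 1.0
    for dist, outcome in zip(outcome_dists, seq):
        prob *= dist[outcome]
    total += prob
print(isclose(total, 1.0))   # True
```

Note that, as the text says, the number of alternative outcomes (the value of w) may differ from one condition to the next; the enumeration handles that automatically.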
Everything introduced in this subsection is mere notational convention. No substantive suppositions (other than the axioms of probability theory) have yet been introduced. The version of the Likelihood Ratio Convergence Theorem I’ll present below does, however, draw on one substantive supposition, although a rather weak one. The next subsection will discuss that supposition in detail.
In most scientific contexts the outcomes in a stream of experiments or observations are probabilistically independent of one another relative to each hypothesis under consideration, or can at least be divided up into probabilistically independent parts. For our purposes probabilistic independence of evidential outcomes on a hypothesis divides neatly into two types.
Definition: Independent Evidence Conditions:
(1) Condition-independence: a sequence of outcomes \(e^k\) is condition-independent of an additional condition \(c_{k+1}\), given \(h\cdot b\) together with its own conditions \(c^k\), just in case \(P[e^k \pmid h\cdot b\cdot c^{k}\cdot c_{k+1}] = P[e^k \pmid h\cdot b\cdot c^{k}]\).
(2) Result-independence: an outcome \(e_k\) is result-independent of the previous conditions together with their outcomes, given \(h\cdot b\) and its own condition \(c_k\), just in case \(P[e_k \pmid h\cdot b\cdot c_{k}\cdot(c^{k-1}\cdot e^{k-1})] = P[e_k \pmid h\cdot b\cdot c_{k}]\).
When these two conditions hold, the likelihood for an evidence sequence may be decomposed into the product of the likelihoods for individual experiments or observations. To see how the two independence conditions affect the decomposition, first consider the following formula, which holds even when neither independence condition is satisfied:
\[\tag{12} P[e^n \pmid h_{j}\cdot b\cdot c^{n}] = \prod^{n}_{k = 1} P[e_k \pmid h_{j}\cdot b\cdot c^n\cdot e^{k-1}] . \]When condition-independence holds, the likelihood of the whole evidence stream parses into a product of likelihoods that probabilistically depend only on the condition at hand together with past observation conditions and their outcomes. They do not depend on the conditions for other experiments whose outcomes are not yet specified. Here is the formula:
\[\tag{13} P[e^n \pmid h_{j}\cdot b\cdot c^{n}] = \prod^{n}_{k = 1} P[e_k \pmid h_{j}\cdot b\cdot c_k\cdot (c^{k-1}\cdot e^{k-1})] . \]Finally, whenever both independence conditions are satisfied we have the following relationship between the likelihood of the evidence stream and the likelihoods of individual experiments or observations:
\[\tag{14} P[e^n \pmid h_{j}\cdot b\cdot c^{n}] = \prod^{n}_{k = 1} P[e_k \pmid h_{j}\cdot b\cdot c_{k}] . \](For proofs of Equations 12–14 see Supplement: Immediate Consequences of Independent Evidence Conditions.)
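Equation (14) is just the product rule for independent outcomes, and a small numerical sketch makes the decomposition concrete (the per-experiment likelihood values below are invented for illustration):

```python
# Illustrative sketch of Equation (14): when both independence
# conditions hold, the likelihood of the whole evidence stream is the
# product of the per-experiment likelihoods P[e_k | h_j·b·c_k].
# The individual likelihood values are invented for illustration.

from math import prod, isclose

# P[e_k | h_j·b·c_k] for the observed outcome of each of n = 4 experiments:
per_experiment_likelihoods = [0.7, 0.4, 0.9, 0.55]

# Equation (14): P[e^n | h_j·b·c^n] = Π_k P[e_k | h_j·b·c_k]
stream_likelihood = prod(per_experiment_likelihoods)
print(round(stream_likelihood, 4))   # 0.1386
```

When result-independence fails for some chunk of the data, the text’s packaging strategy amounts to replacing several factors here with a single factor for the conjoined "extended experiment".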
In scientific contexts the evidence can almost always be divided into parts that satisfy both clauses of the Independent Evidence Conditions with respect to each alternative hypothesis. To see why, let us consider each independence condition more carefully.
Condition-independence says that the mere addition of a new observation condition \(c_{k+1}\), without specifying one of its outcomes, does not alter the likelihood of the outcomes \(e^k\) of other experiments \(c^k\). To appreciate the significance of this condition, imagine what it would be like if it were violated. Suppose hypothesis \(h_j\) is some statistical theory, say, for example, a quantum theory of superconductivity. The conditions expressed in \(c^k\) describe a number of experimental setups, perhaps conducted in numerous labs throughout the world, that test a variety of aspects of the theory (e.g., experiments that test electrical conductivity in different materials at a range of temperatures). An outcome sequence \(e^k\) describes the results of these experiments. The violation of condition-independence would mean that merely adding to \(h_{j}\cdot b\cdot c^{k}\) a statement \(c_{k+1}\) describing how an additional experiment has been set up, but with no mention of its outcome, changes how likely the evidence sequence \(e^k\) is taken to be. What \((h_j\cdot b)\) says via likelihoods about the outcomes \(e^k\) of experiments \(c^k\) differs as a result of merely supplying a description of another experimental arrangement, \(c_{k+1}\). Condition-independence, when it holds, rules out such strange effects.
Result-independence says that the description of previous test conditions together with their outcomes is irrelevant to the likelihoods of outcomes for additional experiments. If this condition were widely violated, then in order to specify the most informed likelihoods for a given hypothesis one would need to include information about volumes of past observations and their outcomes. What a hypothesis says about future cases would depend on how past cases have gone. Such dependence had better not happen on a large scale. Otherwise, the hypothesis would be fairly useless, since its empirical import in each specific case would depend on taking into account volumes of past observational and experimental results. However, even if such dependencies occur, provided they are not too pervasive, result-independence can be accommodated rather easily by packaging each collection of result-dependent data together, treating it like a single extended experiment or observation. The result-independence condition will then be satisfied by letting each term ‘\(c_k\)’ in the statement of the independence condition represent a conjunction of test conditions for a collection of result-dependent tests, and by letting each term ‘\(e_k\)’ (and each term ‘\(o_{ku}\)’) stand for a conjunction of the corresponding result-dependent outcomes. Thus, by packaging result-dependent data together in this way, the result-independence condition is satisfied by those (conjunctive) statements that describe the separate, result-independent chunks.[14]
The version of the Likelihood Ratio Convergence Theorem we will examine depends only on the Independent Evidence Conditions (together with the axioms of probability theory). It draws on no other assumptions. Indeed, an even more general version of the theorem can be established, a version that draws on neither of the Independent Evidence Conditions. However, the Independent Evidence Conditions will be satisfied in almost all scientific contexts, so little will be lost by assuming them. (And the presentation will run more smoothly if we side-step the added complications needed to explain the more general result.)
From this point on, let us assume that the following versions of the Independent Evidence Conditions hold.
Assumption: Independent Evidence Assumptions. For each hypothesis h and background b under consideration, we assume that the experiments and observations can be packaged into condition statements, \(c_1 ,\ldots ,c_k, c_{k+1},\ldots\), and possible outcomes in a way that satisfies the following conditions:
(1) Each sequence of possible outcomes \(e^k\) of a sequence of conditions \(c^k\) is condition-independent of any additional condition \(c_{k+1}\): \(P[e^k \pmid h\cdot b\cdot c^{k}\cdot c_{k+1}] = P[e^k \pmid h\cdot b\cdot c^{k}]\).
(2) Each possible outcome \(e_k\) of condition \(c_k\) is result-independent of the previous conditions together with their outcomes: \(P[e_k \pmid h\cdot b\cdot c_{k}\cdot(c^{k-1}\cdot e^{k-1})] = P[e_k \pmid h\cdot b\cdot c_{k}]\).
We now have all that is needed to begin to state the Likelihood Ratio Convergence Theorem.
The Likelihood Ratio Convergence Theorem comes in two parts. The first part applies only to those experiments or observations \(c_k\) within the total evidence stream \(c^n\) for which some of the possible outcomes have 0 likelihood of occurring according to hypothesis \(h_j\) but have non-0 likelihood of occurring according to \(h_i\). Such outcomes are highly desirable. If they occur, the likelihood ratio comparing \(h_j\) to \(h_i\) will become 0, and \(h_j\) will be falsified. So-called crucial experiments are a special case of this, where for at least one possible outcome \(o_{ku}\), \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 1\) and \(P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] = 0\). In the more general case \(h_i\) together with b says that one of the outcomes of \(c_k\) is at least minimally probable, whereas \(h_j\) says that this outcome is impossible—i.e., \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \gt 0\) and \(P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] = 0\). It will be convenient to define a term for this situation.
Definition: Full Outcome Compatibility. Let’s call \(h_j\) fully outcome-compatible with \(h_i\) on experiment or observation \(c_k\) just when, for each of its possible outcomes \(e_k\), if \(P[e_k \pmid h_{i}\cdot b\cdot c_{k}] \gt 0\), then \(P[e_k \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\). Equivalently, \(h_j\) fails to be fully outcome-compatible with \(h_i\) on experiment or observation \(c_k\) just when, for at least one of its possible outcomes \(e_k\), \(P[e_k \pmid h_{i}\cdot b\cdot c_{k}] \gt 0\) but \(P[e_k \pmid h_{j}\cdot b\cdot c_{k}] = 0\).
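For finite outcome spaces this definition is a simple predicate, which can be sketched as follows (the likelihood tables are invented for illustration):

```python
# Illustrative sketch of the "full outcome compatibility" predicate over
# finite outcome spaces. The likelihood tables are invented.

def fully_outcome_compatible(lik_i, lik_j):
    """h_j is fully outcome-compatible with h_i on c_k just when every
    outcome with positive likelihood under h_i also has positive
    likelihood under h_j."""
    return all(lik_j[o] > 0 for o, p in lik_i.items() if p > 0)

# Outcome likelihoods P[o | h·b·c_k] for a single experiment c_k:
lik_hi = {"o1": 0.6, "o2": 0.4, "o3": 0.0}
lik_hj = {"o1": 0.9, "o2": 0.0, "o3": 0.1}   # h_j says o2 is impossible

print(fully_outcome_compatible(lik_hi, lik_hj))   # False: h_j rules out o2, which h_i allows
```

On such an experiment an \(o_2\)-outcome would drive the likelihood ratio for \(h_j\) over \(h_i\) to 0, which is precisely the falsifying situation the first part of the theorem exploits.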
The first part of the Likelihood Ratio Convergence Theorem applies to that part of the total stream of evidence (i.e., that subsequence of the total evidence stream) on which hypothesis \(h_j\) fails to be fully outcome-compatible with hypothesis \(h_i\); the second part of the theorem applies to the remaining part of the total stream of evidence, that subsequence of the total evidence stream on which \(h_j\) is fully outcome-compatible with \(h_i\). It turns out that these two kinds of cases must be treated differently. (This is due to the way in which the expected information content for empirically distinguishing between the two hypotheses will be measured for experiments and observations that are fully outcome compatible; this measure of information content blows up (becomes infinite) for experiments and observations that fail to be fully outcome compatible.) Thus, the following part of the convergence theorem applies to just that part of the total stream of evidence that consists of experiments and observations that fail to be fully outcome compatible for the pair of hypotheses involved. Here, then, is the first part of the convergence theorem.
Likelihood Ratio Convergence Theorem 1—The Falsification Theorem:
Suppose that the total stream of evidence \(c^n\) contains precisely m experiments or observations on which \(h_j\) fails to be fully outcome-compatible with \(h_i\). And suppose that the Independent Evidence Conditions hold for evidence stream \(c^n\) with respect to each of these two hypotheses. Furthermore, suppose there is a lower bound \(\delta \gt 0\) such that for each \(c_k\) on which \(h_j\) fails to be fully outcome-compatible with \(h_i\),
—i.e., \(h_i\) together with \(b\cdot c_k\)says, withlikelihood at least as large as \(\delta\), that one of the outcomeswill occur that \(h_j\)says cannot occur. Then,
\[\begin{align}P \left[\vee \left\{ e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^n]}{P[e^n \pmid h_i\cdot b\cdot c^{n}]} = 0\right\} \pmid h_{i}\cdot b\cdot c^{n}\right]\\[2ex] = P\left[\vee \left\{ e^n : P[e^n \pmid h_{j}\cdot b\cdot c^{n}] = 0\right\} \pmid h_{i}\cdot b\cdot c^{n}\right]\\ \ge 1 - (1-\delta)^m,\end{align}\]
which approaches 1 for large m. (For proof see Proof of the Falsification Theorem.)
In other words, we only suppose that for each of m observations \(c_k\), \(h_i\) says observation \(c_k\) has at least a small likelihood \(\delta\) of producing one of the outcomes \(o_{ku}\) that \(h_j\) says is impossible. If the number m of such experiments or observations is large enough (or if the lower bound \(\delta\) on the likelihoods of getting such outcomes is large enough), and if \(h_i\) (together with \(b\cdot c^n)\) is true, then it is highly likely that one of the outcomes held to be impossible by \(h_j\) will actually occur. If one of these outcomes does occur, then the likelihood ratio for \(h_j\) as compared to \(h_i\) will become 0. According to Bayes’ Theorem, when this happens, \(h_j\) is absolutely refuted by the evidence—its posterior probability becomes 0.
The Falsification Theorem is quite commonsensical. First, notice that if there is a crucial experiment in the evidence stream, the theorem is completely obvious. That is, suppose for the specific experiment \(c_k\) (in evidence stream \(c^n)\) there are two incompatible possible outcomes \(o_{kv}\) and \(o_{ku}\) such that \(P[o_{kv} \pmid h_{j}\cdot b\cdot c_{k}] = 1\) and \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 1\). Then, clearly, \(P[\vee \{ o_{ku}: P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] = 0\} \pmid h_{i}\cdot b\cdot c_{k}] = 1\), since \(o_{ku}\) is one of the \(o_{ku}\) such that \(P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] = 0\). So, where a crucial experiment is available, the theorem applies with \(m = 1\) and \(\delta = 1\).
The theorem is equally commonsensical for cases where no crucial experiment is available. To see what it says in such cases, consider an example. Let \(h_i\) be some theory that implies a specific rate of proton decay, but a rate so low that there is only a very small probability that any particular proton will decay in a given year. Consider an alternative theory \(h_j\) that implies that protons never decay. If \(h_i\) is true, then for a persistent enough sequence of observations (i.e., if proper detectors can keep trillions of protons under observation for long enough), eventually a proton decay will almost surely be detected. When this happens, the likelihood ratio becomes 0. Thus, the posterior probability of \(h_j\) becomes 0.
It is instructive to plug some specific values into the formula given by the Falsification Theorem, to see what the convergence rate might look like. For example, the theorem tells us that if we compare any pair of hypotheses \(h_i\) and \(h_j\) on an evidence stream \(c^n\) that contains at least \(m = 19\) observations or experiments, where each has a likelihood \(\delta \ge .10\) of yielding a falsifying outcome, then the likelihood (on \(h_{i}\cdot b\cdot c^{n})\) of obtaining an outcome sequence \(e^n\) that yields likelihood-ratio
\[\frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}] }{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]} = 0,\]
will be at least as large as \((1 - (1-.1)^{19}) = .865\). (The reader is invited to try other values of \(\delta\) and m.)
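The arithmetic behind this example is easy to check directly. Here is a minimal sketch; the function name `falsification_bound` is ours, not from the text:

```python
# Hedged sketch: a small calculator for the Falsification Theorem's
# lower bound 1 - (1 - delta)**m on the probability (given h_i, b, c^n)
# of obtaining at least one outcome that h_j says is impossible.

def falsification_bound(delta: float, m: int) -> float:
    """Lower bound on the chance of falsifying h_j after m observations,
    each with probability >= delta (under h_i) of a falsifying outcome."""
    return 1 - (1 - delta) ** m

# The article's example: delta = 0.10, m = 19 gives roughly .865.
print(round(falsification_bound(0.10, 19), 3))  # → 0.865
```

Trying other values, as the text invites, just means calling the function with different `delta` and `m`; a crucial experiment corresponds to `falsification_bound(1.0, 1)`, which is 1.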
A comment about the need for and usefulness of such convergence theorems is in order, now that we’ve seen one. Given some specific pair of scientific hypotheses \(h_i\) and \(h_j\) one may directly compute the likelihood, given \((h_{i}\cdot b\cdot c^{n})\), that a proposed sequence of experiments or observations \(c^n\) will result in one of the sequences of outcomes that would yield low likelihood ratios. So, given a specific pair of hypotheses and a proposed sequence of experiments, we don’t need a general Convergence Theorem to tell us the likelihood of obtaining refuting evidence. The specific hypotheses \(h_i\) and \(h_j\) tell us this themselves. They tell us the likelihood of obtaining each specific outcome stream, including those that either refute the competitor or produce a very small likelihood ratio for it. Furthermore, after we’ve actually performed an experiment and recorded its outcome, all that matters is the actual ratio of likelihoods for that outcome. Convergence theorems become moot.
The point of the Likelihood Ratio Convergence Theorem (both the Falsification Theorem and the part of the theorem still to come) is to assure us in advance of considering any specific pair of hypotheses that if the possible evidence streams that test hypotheses have certain characteristics which reflect the empirical distinctness of the two hypotheses, then it is highly likely that one of the sequences of outcomes will occur that yields a very small likelihood ratio. These theorems provide finite lower bounds on how quickly such convergence is likely to occur. Thus, they show that the CoA is satisfied in advance of our using the logic to test specific pairs of hypotheses against one another.
The Falsification Theorem applies whenever the evidence stream includes possible outcomes that may falsify the alternative hypothesis. However, it completely ignores the influence of any experiments or observations in the evidence stream on which hypothesis \(h_j\) is fully outcome-compatible with hypothesis \(h_i\). We now turn to a theorem that applies to those evidence streams (or to parts of evidence streams) consisting only of experiments and observations on which hypothesis \(h_j\) is fully outcome-compatible with hypothesis \(h_i\). Evidence streams of this kind contain no possibly falsifying outcomes. In such cases the only outcomes of an experiment or observation \(c_k\) for which hypothesis \(h_j\) may specify 0 likelihoods are those for which hypothesis \(h_i\) specifies 0 likelihoods as well.
Hypotheses whose connection with the evidence is entirely statistical in nature will usually be fully outcome-compatible on the entire evidence stream. So, evidence streams of this kind are undoubtedly much more common in practice than those containing possibly falsifying outcomes. Furthermore, whenever an entire stream of evidence contains some mixture of experiments and observations on which the hypotheses are not fully outcome compatible along with others on which they are fully outcome compatible, we may treat the experiments and observations for which full outcome compatibility holds as a separate subsequence of the entire evidence stream, to see the likely impact of that part of the evidence in producing values for likelihood ratios.
To cover evidence streams (or subsequences of evidence streams) consisting entirely of experiments or observations on which \(h_j\) is fully outcome-compatible with hypothesis \(h_i\) we will first need to identify a useful way to measure the degree to which hypotheses are empirically distinct from one another on such evidence. Consider some particular sequence of outcomes \(e^n\) that results from observations \(c^n\). The likelihood ratio \(P[e^n \pmid h_{j}\cdot b\cdot c^{n}] / P[e^n \pmid h_{i}\cdot b\cdot c^{n}]\) itself measures the extent to which the outcome sequence distinguishes between \(h_i\) and \(h_j\). But as a measure of the power of evidence to distinguish among hypotheses, raw likelihood ratios provide a rather lopsided scale, a scale that ranges from 0 to infinity with the midpoint, where \(e^n\) doesn’t distinguish at all between \(h_i\) and \(h_j\), at 1. So, rather than using raw likelihood ratios to measure the ability of \(e^n\) to distinguish between hypotheses, it proves more useful to employ a symmetric measure. The logarithm of the likelihood ratio provides such a measure.
Definition: QI—the Quality of the Information.
For each experiment or observation \(c_k\), define the quality of the information provided by possible outcome \(o_{ku}\) for distinguishing \(h_j\) from \(h_i\), given b, as follows (where henceforth we take “logs” to be base-2):
\[\QI[o_{ku} \pmid h_i /h_j \pmid b\cdot c_k] = \log\left[\frac{P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]}{P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}]}\right].\]
Similarly, for the sequence of experiments or observations \(c^n\), define the quality of the information provided by possible outcome \(e^n\) for distinguishing \(h_j\) from \(h_i\), given b, as follows:
\[\QI[e^n \pmid h_i /h_j \pmid b\cdot c^n] = \log\left[\frac{P[e^n \pmid h_{i}\cdot b\cdot c^n]}{P[e^n \pmid h_j\cdot b\cdot c^{n}]}\right].\]
That is, QI is the base-2 logarithm of the likelihood ratio for \(h_i\) over that for \(h_j\).
So, we’ll measure the Quality of the Information an outcome would yield in distinguishing between two hypotheses as the base-2 logarithm of the likelihood ratio. This is clearly a symmetric measure of the outcome’s evidential strength at distinguishing between the two hypotheses. On this measure hypotheses \(h_i\) and \(h_j\) assign the same likelihood value to a given outcome \(o_{ku}\) just when \(\QI[o_{ku} \pmid h_i /h_j \pmid b\cdot c_k] = 0\). Thus, QI measures information on a logarithmic scale that is symmetric about the natural no-information midpoint, 0. This measure is set up so that positive information favors \(h_i\) over \(h_j\), and negative information favors \(h_j\) over \(h_i\).
Given the Independent Evidence Assumptions with respect to each hypothesis, it’s easy to show that the QI for a sequence of outcomes is just the sum of the QIs of the individual outcomes in the sequence:
\[\tag{15} \QI[e^n \pmid h_i /h_j \pmid b\cdot c^n] =\sum^{n}_{k = 1} \QI[e_k \pmid h_i /h_j \pmid b\cdot c_k]. \]
Probability theorists measure the expected value of a quantity by first multiplying each of its possible values by their probabilities of occurring, and then summing these products. Thus, the expected value of QI is given by the following formula:
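The additivity in Equation 15 can be checked numerically: under independence, sequence likelihoods factor into products, so the log of the likelihood ratio for the whole sequence equals the sum of the per-outcome logs. A minimal sketch, with toy likelihood values that are ours, purely for illustration:

```python
import math

# QI of a single outcome: base-2 log of its likelihood under h_i over h_j.
def qi(p_i: float, p_j: float) -> float:
    return math.log2(p_i / p_j)

# Likelihoods of three independent outcomes under h_i and under h_j.
p_under_hi = [0.5, 0.8, 0.3]
p_under_hj = [0.25, 0.4, 0.6]

# QI computed on the whole sequence (likelihoods multiply under the
# Independent Evidence Conditions), vs. QI summed outcome by outcome.
qi_sequence = math.log2(math.prod(p_under_hi) / math.prod(p_under_hj))
qi_summed = sum(qi(pi, pj) for pi, pj in zip(p_under_hi, p_under_hj))

print(abs(qi_sequence - qi_summed) < 1e-12)  # → True
```

The two quantities agree (up to floating-point noise) for any choice of positive likelihoods, which is exactly what Equation 15 asserts.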
Definition: EQI—the Expected Quality of the Information.
We adopt the convention that if \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0\), then the term \(\QI[o_{ku} \pmid h_i /h_j \pmid b\cdot c_k] \times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0\). This convention will make good sense in the context of the following definition because, whenever the outcome \(o_{ku}\) has 0 probability of occurring according to \(h_i\) (together with \(b\cdot c_k)\), it makes good sense to give it 0 impact on the ability of the evidence to distinguish between \(h_j\) and \(h_i\) when \(h_i\) (together with \(b\cdot c_k)\) is true. Also notice that the full outcome-compatibility of \(h_j\) with \(h_i\) on \(c_k\) means that whenever \(P[e_k \pmid h_{j}\cdot b\cdot c_{k}] = 0\), we must have \(P[e_k \pmid h_{i}\cdot b\cdot c_{k}] = 0\) as well; so whenever the denominator would be 0 in the term
\[\QI[o_{ku} \pmid h_i /h_j \pmid b\cdot c_k] = \log\left[\frac{P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]}{P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}]}\right],\]
the convention just described makes the term
\[\QI[o_{ku} \pmid h_i /h_j \pmid b\cdot c_k] \times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0.\]Thus the following notion is well-defined:
For \(h_j\) fully outcome-compatible with \(h_i\) on experiment or observation \(c_k\), define
\[\EQI[c_k \pmid h_i /h_j \pmid b] = \sum_u \QI[o_{ku} \pmid h_i /h_j \pmid b\cdot c_k] \times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}].\]
Also, for \(h_j\) fully outcome-compatible with \(h_i\) on each experiment and observation in the sequence \(c^n\), define
\[\EQI[c^n \pmid h_i /h_j \pmid b] = \sum_{e^n\in E^n} \QI[e^n \pmid h_i /h_j \pmid b\cdot c^n] \times P[e^n \pmid h_{i}\cdot b\cdot c^{n}]. \]
The EQI of an experiment or observation is the Expected Quality of its Information for distinguishing \(h_i\) from \(h_j\) when \(h_i\) is true. It is a measure of the expected evidential strength of the possible outcomes of an experiment or observation at distinguishing between the hypotheses when \(h_i\) (together with \(b\cdot c)\) is true. Whereas QI measures the ability of each particular outcome or sequence of outcomes to empirically distinguish hypotheses, EQI measures the tendency of experiments or observations to produce distinguishing outcomes. It can be shown that EQI tracks empirical distinctness in a very precise way. We return to this in a moment.
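The single-experiment EQI is straightforward to compute once the two outcome distributions are given. A hedged sketch, using the article's convention that terms with \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0\) contribute 0; the outcome probabilities are invented for illustration:

```python
import math

def eqi(p_under_hi, p_under_hj):
    """EQI[c_k | h_i/h_j | b] for one experiment with listed outcome
    likelihoods; assumes h_j is fully outcome-compatible with h_i."""
    total = 0.0
    for p_i, p_j in zip(p_under_hi, p_under_hj):
        if p_i == 0:  # convention: zero-probability outcomes drop out
            continue
        total += math.log2(p_i / p_j) * p_i
    return total

# A two-outcome experiment on which the hypotheses disagree sharply:
print(round(eqi([0.9, 0.1], [0.5, 0.5]), 3))
# And one on which they agree exactly — the EQI is 0:
print(eqi([0.7, 0.3], [0.7, 0.3]))  # → 0.0
```

The first experiment yields a positive EQI (about half a bit per trial), reflecting that its outcomes tend to distinguish the hypotheses; the second yields 0, as the two hypotheses are empirically indistinguishable on it.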
It is easily seen that the EQI for a sequence of observations \(c^n\) is just the sum of the EQIs of the individual observations \(c_k\) in the sequence:
\[\tag{16} \EQI[c^n \pmid h_i /h_j \pmid b] = \sum^{n}_{k=1} \EQI[c_k \pmid h_i /h_j \pmid b_{}]. \]
(For proof see Supplement: Proof that the EQI for \(c^n\) is the sum of the EQI for the individual \(c_k\).)
This suggests that it may be useful to average the values of the \(\EQI[c_k \pmid h_i /h_j \pmid b]\) over the number of observations n to obtain a measure of the average expected quality of the information among the experiments and observations that make up the evidence stream \(c^n\).
Definition: The Average Expected Quality of Information
For \(h_j\) fully outcome-compatible with \(h_i\) on each experiment and observation in the evidence stream \(c^n\), define the average expected quality of information, \(\bEQI\), from \(c^n\) for distinguishing \(h_j\) from \(h_i\), given \(h_i\cdot b\), as follows:
\[\bEQI[c^n \pmid h_i /h_j \pmid b] = \frac{1}{n} \times \sum^{n}_{k=1} \EQI[c_k \pmid h_i /h_j \pmid b].\]
It turns out that the value of \(\EQI[c_k \pmid h_i /h_j \pmid b_{}]\) cannot be less than 0; and it must be greater than 0 just in case \(h_i\) is empirically distinct from \(h_j\) on at least one outcome \(o_{ku}\)—i.e., just in case it is empirically distinct in the sense that \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \ne P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}]\), for at least one outcome \(o_{ku}\). The same goes for the average, \(\bEQI[c^n \pmid h_i /h_j \pmid b]\).
Theorem: Nonnegativity of EQI.
\(\EQI[c_k \pmid h_i /h_j \pmid b_{}] \ge 0\); and \(\EQI[c_k \pmid h_i /h_j \pmid b_{}] \gt 0\) if and only if for at least one of its possible outcomes \(o_{ku}\),
\[P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \ne P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}].\]
As a result, \(\bEQI[c^n \pmid h_i /h_j \pmid b] \ge 0\); and \(\bEQI[c^n \pmid h_i /h_j \pmid b] \gt 0\) if and only if at least one experiment or observation \(c_k\) has at least one possible outcome \(o_{ku}\) such that
\[P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \ne P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}].\]
(For proof, see Supplement: The Effect on EQI of Partitioning the Outcome Space More Finely—Including Proof of the Nonnegativity of EQI.)
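The nonnegativity claim can be sanity-checked numerically: EQI has the form of a relative entropy (each term weights a base-2 log ratio by \(h_i\)'s outcome probability), so it should never dip below 0 and should equal 0 exactly when the two likelihood distributions coincide. A hedged sketch over randomly invented distributions:

```python
import math
import random

def eqi(p_under_hi, p_under_hj):
    # Sum of p_i * log2(p_i / p_j), skipping zero-probability outcomes.
    return sum(p_i * math.log2(p_i / p_j)
               for p_i, p_j in zip(p_under_hi, p_under_hj) if p_i > 0)

def random_distribution(size):
    weights = [random.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

random.seed(0)
for _ in range(1000):
    p = random_distribution(5)
    q = random_distribution(5)
    assert eqi(p, q) >= 0           # never negative
    assert abs(eqi(p, p)) < 1e-12   # zero when h_i and h_j agree
print("EQI nonnegativity checks passed")
```

This is only a spot check over random cases, of course; the Supplement cited above gives the general proof.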
In fact, the more finely one partitions the outcome space \(O_{k} = \{o_{k1},\ldots ,o_{kv},\ldots ,o_{kw}\}\) into distinct outcomes that differ on likelihood ratio values, the larger EQI becomes.[15] This shows that EQI tracks empirical distinctness in a precise way. The importance of the Non-negativity of EQI result for the Likelihood Ratio Convergence Theorem will become clear in a moment.
We are now in a position to state the second part of the Likelihood Ratio Convergence Theorem. It applies to all evidence streams not containing possibly falsifying outcomes for \(h_j\) when \(h_i\) holds—i.e., it applies to all evidence streams for which \(h_j\) is fully outcome-compatible with \(h_i\) on each \(c_k\) in the stream.
Likelihood Ratio Convergence Theorem 2—The Probabilistic Refutation Theorem.
Suppose the evidence stream \(c^n\) contains only experiments or observations on which \(h_j\) is fully outcome-compatible with \(h_i\)—i.e., suppose that for each condition \(c_k\) in sequence \(c^n\), for each of its possible outcomes \(o_{ku}\), either \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0\) or \(P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\). In addition (as a slight strengthening of the previous supposition), for some \(\gamma \gt 0\) smaller than \(1/e^2\) (\(\approx .135\); where \(e\) is the base of the natural logarithm), suppose that for each possible outcome \(o_{ku}\) of each observation condition \(c_k\) in \(c^n\), either \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0\) or
\[\frac{P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}]}{P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]} \ge \gamma.\]
And suppose that the Independent Evidence Conditions hold for evidence stream \(c^n\) with respect to each of these hypotheses. Now, choose any positive \(\varepsilon \lt 1\), as small as you like, but large enough (for the number of observations n being contemplated) that the value of
\[\bEQI[c^n \pmid h_i /h_j \pmid b] \gt -\frac{(\log \varepsilon)}{n}.\]Then:
\[\begin{multline}P\left[\vee \left\{ e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]} \lt \varepsilon \right\} \pmid h_{i}\cdot b\cdot c^{n}\right]\\[2ex]\gt 1 - \frac{1}{n} \times \frac{(\log \gamma)^2}{(\bEQI[c^n \pmid h_i /h_j \pmid b] + (\log \varepsilon)/n)^2}\end{multline}\]
For \(\varepsilon = 1/2^m\) and \(\gamma = 1/2^q\), this formula becomes,
\[\begin{multline}P\left[\vee \left\{ e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^{n}]}{P[e^n \pmid h_{i}\cdot b\cdot c^{n}]} \lt 1/2^m\right\} \pmid h_{i}\cdot b\cdot c^{n}\right]\\ \gt 1 - \frac{1}{n} \times\frac{q^2}{(\bEQI[c^n \pmid h_i /h_j \pmid b] - (m/n) )^2} \end{multline}\]
(For proof see Supplement: Proof of the Probabilistic Refutation Theorem.)
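As with the Falsification Theorem, it is instructive to plug sample numbers into this bound. A hedged sketch in the \(\varepsilon = 1/2^m\), \(\gamma = 1/2^q\) form; the particular values of n, m, q, and the average EQI below are invented for illustration:

```python
def refutation_bound(n: int, m: int, q: int, avg_eqi: float) -> float:
    """Lower bound 1 - (1/n) * q**2 / (avg_eqi - m/n)**2 on the likelihood
    (under h_i·b·c^n) that the stream yields a likelihood ratio below
    1/2**m; the theorem requires avg_eqi > m/n."""
    if avg_eqi <= m / n:
        raise ValueError("theorem requires bEQI > m/n for this bound")
    return 1 - (1 / n) * q**2 / (avg_eqi - m / n) ** 2

# 400 observations, epsilon = 1/2**10, gamma = 1/2**3 (= .125 < 1/e**2),
# and an average EQI of 0.5 bits per observation:
print(round(refutation_bound(n=400, m=10, q=3, avg_eqi=0.5), 3))  # → 0.9
```

So under these (invented) conditions the chance of driving the likelihood ratio below \(1/2^{10}\) is at least about .9, and doubling n pushes the bound higher still, in line with the theorem's \(1/n\) factor.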
This theorem provides sufficient conditions for the likely refutation of false alternatives via exceedingly small likelihood ratios. The conditions under which this happens characterize the degree to which the hypotheses involved are empirically distinct from one another. The theorem says that when these conditions are met, according to hypothesis \(h_i\) (taken together with \(b\cdot c^n)\), the likelihood is near 1 that one of the outcome sequences \(e^n\) will occur for which the likelihood ratio is smaller than \(\varepsilon\) (for any value of \(\varepsilon\) you may choose). The likelihood of getting such an evidential outcome \(e^n\) is quite close to 1—i.e., no more than the amount
\[\frac{1}{n} \times \frac{(\log \gamma)^2}{\left(\bEQI[c^n \pmid h_i /h_j \pmid b] + \frac{(\log \varepsilon)}{n}\right)^2}\]
below 1. (Notice that this amount below 1 goes to 0 as n increases.)
It turns out that in almost every case (for almost any pair of hypotheses) the actual likelihood of obtaining such evidence (i.e., evidence that has a likelihood ratio value less than \(\varepsilon)\) will be much closer to 1 than this factor indicates.[16] Thus, the theorem provides an overly cautious lower bound on the likelihood of obtaining small likelihood ratios. It shows that the larger the value of \(\bEQI\) for an evidence stream, the more likely that stream is to produce a sequence of outcomes that yield a very small likelihood ratio value. But even if \(\bEQI\) remains quite small, a long enough evidence stream, n, of such low-grade evidence will, nevertheless, almost surely produce an outcome sequence having a very small likelihood ratio value.[17]
Notice that the antecedent condition of the theorem, that “either
\[P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0\]or
\[\frac{P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}]}{P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]} \ge \gamma,\]
for some \(\gamma \gt 0\) but less than \(1/e^2\) (\(\approx .135\))”, does not favor hypothesis \(h_i\) over \(h_j\) in any way. The condition only rules out the possibility that some outcomes might furnish extremely strong evidence against \(h_j\) relative to \(h_i\)—by making \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0\) or by making
\[\frac{P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] }{P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]}\]
less than some quite small \(\gamma\). This condition is only needed because our measure of evidential distinguishability, QI, blows up when the ratio
\[\frac{P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}]}{P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]}\]
is extremely small. Furthermore, this condition is really no restriction at all on possible experiments or observations. If \(c_k\) has some possible outcome sentence \(o_{ku}\) that would make
\[\frac{P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}]}{P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]} \lt \gamma\]
(for a given small \(\gamma\) of interest), one may disjunctively lump \(o_{ku}\) together with some other outcome sentence \(o_{kv}\) for \(c_k\). Then, the antecedent condition of the theorem will be satisfied, but with the sentence ‘\((o_{ku} \vee o_{kv})\)’ treated as a single outcome. It can be proved that the only effect of such “disjunctive lumping” is to make \(\bEQI\) smaller than it would otherwise be (whereas larger values of \(\bEQI\) are more desirable). If the too strongly refuting disjunct \(o_{ku}\) actually occurs when the experiment or observation \(c_k\) is conducted, all the better, since this results in a likelihood ratio
\[\frac{P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}]}{P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]}\]
smaller than \(\gamma\) on that particular evidential outcome. We merely failed to take this more strongly refuting possibility into account when computing our lower bound on the likelihood that refutation via likelihood ratios would occur.
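The claim that disjunctive lumping can only shrink \(\bEQI\) is easy to check numerically for a single experiment. A hedged sketch; the three-outcome likelihoods are invented, with outcome 0 playing the role of the "too strongly refuting" one (its ratio under \(h_j\) versus \(h_i\) is 0.05, well below a \(\gamma\) of 1/8):

```python
import math

def eqi(p_under_hi, p_under_hj):
    # Expected QI under h_i: sum of p_i * log2(p_i / p_j).
    return sum(p_i * math.log2(p_i / p_j)
               for p_i, p_j in zip(p_under_hi, p_under_hj) if p_i > 0)

p_hi = [0.40, 0.40, 0.20]
p_hj = [0.02, 0.38, 0.60]

# Lump outcomes 0 and 1 into one disjunction by adding their likelihoods.
p_hi_lumped = [0.40 + 0.40, 0.20]
p_hj_lumped = [0.02 + 0.38, 0.60]

fine, lumped = eqi(p_hi, p_hj), eqi(p_hi_lumped, p_hj_lumped)
print(fine > lumped)  # → True: lumping strictly reduces EQI here
```

This mirrors footnote [15]'s point from the other direction: coarser partitions of the outcome space yield smaller EQI, finer partitions larger EQI.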
The point of the two Convergence Theorems explored in this section is to assure us, in advance of the consideration of any specific pair of hypotheses, that if the possible evidence streams that test them have certain characteristics which reflect their evidential distinguishability, it is highly likely that outcomes yielding small likelihood ratios will result. These theorems provide finite lower bounds on how quickly convergence is likely to occur. Thus, there is no need to wait through some infinitely long run for convergence to occur. Indeed, for any evidence sequence on which the probability distributions are at all well behaved, the actual likelihood of obtaining outcomes that yield small likelihood ratio values will inevitably be much higher than the lower bounds given by Theorems 1 and 2.
In sum, according to Theorems 1 and 2, each hypothesis \(h_i\) says, via likelihoods, that given enough observations, it is very likely to dominate its empirically distinct rivals in a contest of likelihood ratios. The true hypothesis speaks truthfully about this, and its competitors lie. Even a sequence of observations with an extremely low average expected quality of information is very likely to do the job if that evidential sequence is long enough. Thus (by Equation 9*), as evidence accumulates, the degree of support for false hypotheses will very probably approach 0, indicating that they are probably false; and as this happens, (by Equations 10 and 11) the degree of support for the true hypothesis will approach 1, indicating its probable truth. Thus, the Criterion of Adequacy (CoA) is satisfied.
Up to this point we have been supposing that likelihoods possess objective or agreed numerical values. Although this supposition is often satisfied in scientific contexts, there are important settings where it is unrealistic, where hypotheses only support vague likelihood values, and where there is enough ambiguity in what hypotheses say about evidential claims that the scientific community cannot agree on precise values for the likelihoods of evidential claims.[18] Let us now see how the supposition of precise, agreed likelihood values may be relaxed in a reasonable way.
Recall why agreement, or near agreement, on precise values for likelihoods is so important to the scientific enterprise. To the extent that members of a scientific community disagree on the likelihoods, they disagree about the empirical content of their hypotheses, about what each hypothesis says about how the world is likely to be. This can lead to disagreement about which hypotheses are refuted or supported by a given body of evidence. Similarly, to the extent that the values of likelihoods are only vaguely implied by hypotheses as understood by an individual agent, that agent may be unable to determine which of several hypotheses is refuted or supported by a given body of evidence.
We have seen, however, that the individual values of likelihoods are not really crucial to the way evidence impacts hypotheses. Rather, as Equations 9–11 show, it is ratios of likelihoods that do the heavy lifting. So, even if two support functions \(P_{\alpha}\) and \(P_{\beta}\) disagree on the values of individual likelihoods, they may, nevertheless, largely agree on the refutation or support that accrues to various rival hypotheses, provided that the following condition is satisfied:

Directional Agreement Condition: for each possible outcome \(e_k\) of each condition \(c_k\), the likelihood ratio \(P_{\alpha}[e_k \pmid h_{j}\cdot b\cdot c_{k}] / P_{\alpha}[e_k \pmid h_{i}\cdot b\cdot c_{k}]\) is less than 1 just in case the corresponding \(P_{\beta}\) ratio is less than 1, is greater than 1 just in case the \(P_{\beta}\) ratio is greater than 1, and equals 1 just in case the \(P_{\beta}\) ratio equals 1.
When this condition holds, the evidence will support \(h_i\) over \(h_j\) according to \(P_{\alpha}\) just in case it does so for \(P_{\beta}\) as well, although the strength of support may differ. Furthermore, although the rate at which the likelihood ratios increase or decrease on a stream of evidence may differ for the two support functions, the impact of the cumulative evidence should ultimately affect their refutation or support in much the same way.
When likelihoods are vague or diverse, we may take an approach similar to that we employed for vague and diverse prior plausibility assessments. We may extend the vagueness sets for individual agents to include a collection of inductive support functions that cover the range of values for likelihood ratios of evidence claims (as well as cover the ranges of comparative support strengths for hypotheses due to plausibility arguments within b, as represented by ratios of prior probabilities). Similarly, we may extend the diversity sets for communities of agents to include support functions that cover the ranges of likelihood ratio values that arise within the vagueness sets of members of the scientific community.
This broadening of vagueness and diversity sets to accommodate vague and diverse likelihood values makes no trouble for the convergence to truth results for hypotheses. For, provided that the Directional Agreement Condition is satisfied by all support functions in an extended vagueness or diversity set under consideration, the Likelihood Ratio Convergence Theorem applies to each individual support function in that set. For, the proof of that convergence theorem doesn’t depend on the supposition that likelihoods are objective or have intersubjectively agreed values. Rather, it applies to each individual support function \(P_{\alpha}\). The only possible problem with applying this result across a range of support functions is that when their values for likelihoods differ, function \(P_{\alpha}\) may disagree with \(P_{\beta}\) on which of the hypotheses is favored by a given sequence of evidence. That can happen because different support functions may represent the evidential import of hypotheses differently, by specifying different likelihood values for the very same evidence claims. So, an evidence stream that favors \(h_i\) according to \(P_{\alpha}\) may instead favor \(h_j\) according to \(P_{\beta}\). However, when the Directional Agreement Condition holds for a given collection of support functions, this problem cannot arise. Directional Agreement means that the evidential import of hypotheses is similar enough for \(P_{\alpha}\) and \(P_{\beta}\) that a sequence of outcomes may favor a hypothesis according to \(P_{\alpha}\) only if it does so for \(P_{\beta}\) as well.
Thus, when the Directional Agreement Condition holds for all support functions in a vagueness or diversity set that is extended to include vague or diverse likelihoods, and provided that enough evidentially distinguishing experiments or observations can be performed, all support functions in the extended vagueness or diversity set will very probably come to agree that the likelihood ratios for empirically distinct false competitors of a true hypothesis are extremely small. As that happens, the community comes to agree on the refutation of these competitors, and the true hypothesis rises to the top of the heap.[20]
What if the true hypothesis has evidentially equivalent rivals? Their posterior probabilities must rise as well. In that case we are only assured that the disjunction of the true hypothesis with its evidentially equivalent rivals will be driven to 1 as evidence lays low its evidentially distinct rivals. The true hypothesis will itself approach 1 only if either it has no evidentially equivalent rivals, or whatever equivalent rivals it does have can be laid low by plausibility arguments of a kind that don’t depend on the evidential likelihoods, but only show up via the comparative plausibility assessments represented by ratios of prior probabilities.
Thanks to Alan Hájek, Jim Joyce, and Edward Zalta for many valuable comments and suggestions. The editors and author also thank Greg Stokley and Philippe van Basshuysen for carefully reading an earlier version of the entry and identifying a number of typographical errors.