Probability is the most important concept in modern science, especially as nobody has the slightest notion what it means. —Bertrand Russell, 1929 Lecture
(cited in Bell 1945, 587)
One regularly reads and hears probabilistic claims like the following:
But what do these statements mean? This may be understood as a metaphysical question about what kinds of things are probabilities, or more generally as a question about what makes probability statements true or false. At a first pass, various interpretations of probability answer this question, one way or another.
However, there is also a stricter usage: an ‘interpretation’ of a formal theory provides meanings for its primitive symbols or terms, with an eye to turning its axioms and theorems into true statements about some subject. In the case of probability, Kolmogorov’s axiomatization (which we will see shortly) is the usual formal theory, and the so-called ‘interpretations of probability’ usually interpret it. That axiomatization introduces a function ‘\(P\)’ that has certain formal properties. We may then ask ‘What is \(P\)?’. Several of the views that we will discuss also answer this question, one way or another.
Our topic is complicated by the fact that there are various alternative formalizations of probability. Moreover, as we will see, some of the leading ‘interpretations of probability’ do not obey all of Kolmogorov’s axioms, yet they have not lost their title for that. And various other quantities that have nothing to do with probability do satisfy Kolmogorov’s axioms, and thus are ‘interpretations’ of it in the strict sense: normalized mass, length, area, volume, and other quantities that fall under the scope of measure theory, the abstract mathematical theory that generalizes such quantities. Nobody seriously considers these to be ‘interpretations of probability’, however, because they do not play the right role in our conceptual apparatus.
Perhaps we would do better, then, to think of the interpretations as analyses of various concepts of probability. Or perhaps better still, we might regard them as explications of such concepts, refining them to be fruitful for philosophical and scientific theorizing (à la Carnap 1950, 1962).
However we think of it, the project of finding such interpretations is an important one. Probability is virtually ubiquitous. It plays a role in almost all the sciences. It underpins much of the social sciences — witness the prevalent use of statistical testing, confidence intervals, regression methods, and so on. It finds its way, moreover, into much of philosophy. In epistemology, the philosophy of mind, and cognitive science, we see states of opinion being modeled by subjective probability functions, and learning being modeled by the updating of such functions. Since probability theory is central to decision theory and game theory, it has ramifications for ethics and political philosophy. It figures prominently in such staples of metaphysics as causation and laws of nature. It appears again in the philosophy of science in the analysis of confirmation of theories, scientific explanation, and in the philosophy of specific scientific theories, such as quantum mechanics, statistical mechanics, evolutionary biology, and genetics. It can even take center stage in the philosophy of logic, the philosophy of language, and the philosophy of religion. Thus, problems in the foundations of probability bear at least indirectly, and sometimes directly, upon central scientific, social scientific, and philosophical concerns. The interpretation of probability is one of the most important such foundational problems.
Probability theory was a relative latecomer in intellectual history. To be sure, proto-probabilistic ideas concerning evidence and inference date back to antiquity (see Franklin 2001). However, probability’s mathematical treatment had to wait until the Fermat-Pascal correspondence, and their analysis of games of chance in 17th century France. Its axiomatization had to wait still longer, in Kolmogorov’s classic Foundations of the Theory of Probability (1933). Roughly, probabilities lie between 0 and 1 inclusive, and they are additive. More formally, let \(\Omega\) be a non-empty set (‘the universal set’). A field (or algebra) on \(\Omega\) is a set \(\mathbf{F}\) of subsets of \(\Omega\) that has \(\Omega\) as a member, and that is closed under complementation (with respect to \(\Omega)\) and union. Let \(P\) be a function from \(\mathbf{F}\) to the real numbers obeying:
Call \(P\) a probability function, and \((\Omega, \mathbf{F}, P)\) a probability space. This is Kolmogorov’s “elementary theory of probability”.
The non-negativity and normalization axioms are largely matters of convention, although it is non-trivial that probability functions take at least the two values 0 and 1, and that they have a maximal value (unlike various other measures, such as length, volume, and so on, which are unbounded). We will return to finite additivity at a number of points below.
We may now apply the theory to various familiar cases. For example, we may represent the results of tossing a single die once by the set \(\Omega = \{1, 2, 3, 4, 5, 6\}\), and we could let \(\mathbf{F}\) be the set of all subsets of \(\Omega\). Under the natural assignment of probabilities to members of \(\mathbf{F}\), we obtain such welcome results as the following:
and so on.
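The die example can be made concrete in a few lines of code. The following is a minimal sketch of the elementary theory for one die toss: `omega` is the universal set, `F` is the power set of `omega`, and `P` is the natural assignment, giving each event the fraction of possibilities in which it occurs. (The function and variable names are our own, chosen for illustration.)

```python
# A minimal sketch of Kolmogorov's elementary theory for one die toss.
# Omega is the universal set; F is the power set of Omega; P assigns
# each event its "natural" probability, |A| / |Omega|.
from itertools import chain, combinations
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def power_set(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

F = power_set(omega)

def P(event):
    return Fraction(len(event), len(omega))

# The welcome results under the natural assignment:
assert P(frozenset({1})) == Fraction(1, 6)        # P(die lands 1)
assert P(frozenset({2, 4, 6})) == Fraction(1, 2)  # P(die lands even)
assert P(frozenset(omega)) == 1                   # normalization
# Finite additivity for disjoint events:
assert P(frozenset({1, 2})) == P(frozenset({1})) + P(frozenset({2}))
```

Using `Fraction` rather than floating point keeps the probabilities exact, so additivity holds without rounding error.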
We could instead attach probabilities to members of a collection \(\mathbf{S}\) of sentences of a formal language, closed under (countable) truth-functional combinations, with the following counterpart axiomatization:
The bearers of probabilities are sometimes also called “events”, “outcomes”, or “propositions”, but the underlying formalism remains the same. More attention has been given to interpreting ‘\(P\)’ than to interpreting its bearers; we will be concerned with the former.
The elementary theory of probability suffices for most everyday applications of probability, and it will suffice for most of our discussion below. But more advanced treatments in mathematics, statistics, and science require more mathematical sophistication involving countably infinite extensions. (Readers less interested in the mathematical details may want to skip to “The conditional probability ...” three paragraphs below.) Now let us strengthen our closure assumptions regarding \(\mathbf{F}\), requiring it to be closed under complementation and countable union; it is then called a sigma field (or sigma algebra) on \(\Omega\). It is controversial whether we should strengthen finite additivity, as Kolmogorov does:
Kolmogorov comments that infinite probability spaces are idealized models of real random processes, and that he limits himself arbitrarily to only those models that satisfy countable additivity. This axiom is the cornerstone of the assimilation of probability theory to measure theory.
The conditional probability of \(A\) given \(B\) is then given by the ratio of unconditional probabilities:
\[ P(A\mid B) = \frac{P(A\cap B)}{P(B)},\text{ provided } P(B) \gt 0. \]
This is often taken to be the definition of conditional probability, although it should be emphasized that this is a technical usage of the term that may not align perfectly with a pretheoretical concept that we might have (see Hájek 2003). We recognize it in locutions such as “the probability that the die lands 1, given that it lands odd, is 1/3”, or “the probability that it will rain tomorrow, given that there are dark clouds in the sky tomorrow morning, is high”. It is the concept of the probability of something given or in the light of some piece of evidence or information. Indeed, some authors take conditional probability to be the primitive notion, and axiomatize it directly (e.g. Popper 1959b, Rényi 1970, van Fraassen 1976, Spohn 1986, and Roeper and Leblanc 1999).
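The ratio analysis can be checked directly against the die example. The sketch below (with illustrative names of our own) computes \(P(A\mid B)\) as \(P(A\cap B)/P(B)\) and recovers the locution “the probability that the die lands 1, given that it lands odd, is 1/3”:

```python
# The ratio analysis of conditional probability, for one die toss:
# P(lands 1 | lands odd) = P({1} ∩ odd) / P(odd) = (1/6) / (1/2) = 1/3.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & omega), len(omega))

def conditional(A, B):
    """P(A | B) as the ratio of unconditional probabilities."""
    if P(B) == 0:
        raise ValueError("P(A | B) is undefined unless P(B) > 0")
    return P(A & B) / P(B)

odd = {1, 3, 5}
assert conditional({1}, odd) == Fraction(1, 3)
```

Note that the guard clause mirrors the proviso \(P(B) \gt 0\): on the ratio analysis, conditional probability is simply undefined when the condition has probability zero.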
There are other formalizations that give up normalization; that give up countable additivity, and even additivity; that allow probabilities to take infinitesimal values (positive, but smaller than every positive real number); that allow probabilities to be imprecise — interval-valued, or more generally represented with sets of precise probability functions; and that treat probabilities comparatively rather than quantitatively. (See Fine 1974, Halpern 2003, Cozman 2016, Fine 2016, Hawthorne 2016, Lyon 2016.) For now, however, when we speak of ‘the probability calculus’, we will mean Kolmogorov’s approach, as is standard. See Hájek and Hitchcock (2016b) for a relatively non-technical introduction to it, intended for philosophers.
Given certain probabilities as inputs, the axioms and theorems allow us to compute various further probabilities. However, apart from the assignment of 1 to the universal set and 0 to the empty set, they are silent regarding the initial assignment of probabilities.[1] For guidance with that, we need to turn to the interpretations of probability. First, however, let us list some criteria of adequacy for such interpretations.
What criteria are appropriate for assessing the cogency of a proposed interpretation of probability? Of course, an interpretation should be precise, unambiguous, non-circular, and use well-understood primitives. But those are really prescriptions for good philosophizing generally; what do we want from our interpretations of probability, specifically? We begin by following Salmon (1966, 64), although we will raise some questions about his criteria, and propose some others. He writes:
Admissibility. We say that an interpretation of a formal system is admissible if the meanings assigned to the primitive terms in the interpretation transform the formal axioms, and consequently all the theorems, into true statements. A fundamental requirement for probability concepts is to satisfy the mathematical relations specified by the calculus of probability…
Ascertainability. This criterion requires that there be some method by which, in principle at least, we can ascertain values of probabilities. It merely expresses the fact that a concept of probability will be useless if it is impossible in principle to find out what the probabilities are…
Applicability. The force of this criterion is best expressed in Bishop Butler’s famous aphorism, “Probability is the very guide of life.”…
It might seem that the criterion of admissibility goes without saying. The word ‘interpretation’ is often used in such a way that ‘admissible interpretation’ is a pleonasm. Yet it turns out that the criterion is non-trivial, and indeed if taken seriously would rule out several of the leading interpretations of probability! As we will see, some of them fail to satisfy countable additivity; for others (certain propensity interpretations) the status of at least some of the axioms is unclear. Nevertheless, we regard them as genuine candidates. It should be remembered, moreover, that Kolmogorov’s is just one of many possible axiomatizations, and there is not universal agreement on which is ‘best’ (whatever that might mean). Indeed, Salmon’s preferred axiomatization differs from Kolmogorov’s.[2] Thus, there is no such thing as admissibility tout court, but rather admissibility with respect to this or that axiomatization. In any case, if we found an inadmissible interpretation (with respect to Kolmogorov’s axiomatization) that did a wonderful job of meeting the criteria of ascertainability and applicability, then we should surely embrace it.
So let us turn to those criteria. It is a little unclear in the ascertainability criterion just what “in principle” amounts to – it outruns what is practical or feasible – though perhaps some latitude here is all to the good. Most of the work will be done by the applicability criterion. We must say more (as Salmon indeed does) about what sort of a guide to life probability is supposed to be. Mass, length, area and volume are all useful concepts, and they are ‘guides to life’ in various ways (think how critical distance judgments can be to survival); moreover, they are admissible and ascertainable, so presumably it is the applicability criterion that will rule them out. Perhaps it is best to think of applicability as a cluster of criteria, each of which is supposed to capture something of probability’s distinctive conceptual roles; moreover, we should not require that all of them be met by a given interpretation. They include:
Non-triviality: an interpretation should make non-extreme probabilities at least a conceptual possibility. For example, suppose that we interpret ‘\(P\)’ as the truth function: it assigns the value 1 to all true sentences, and 0 to all false sentences. Then trivially, all the axioms come out true, so this interpretation is admissible. We would hardly count it as an adequate interpretation of probability, however, and so we need to exclude it. It is essential to probability that, at least in principle, it can take intermediate values. All of the interpretations that we will present meet this criterion, so we will discuss it no more.
Applicability to frequencies: an interpretation should render perspicuous the relationship between probabilities and (long-run) frequencies. Among other things, it should make clear why, by and large, more probable events occur more frequently than less probable events.
Applicability to rational beliefs: an interpretation should clarify the role that probabilities play in constraining the degrees of belief, or credences, of rational agents. Among other things, a rational agent who knows that one event is more probable than another will be more confident about the occurrence of the former event.
Applicability to rational decisions: an interpretation should make clear how probabilities figure in rational decision-making. This seems especially apposite for a ‘guide to life’.
Applicability to ampliative inferences: an interpretation will score bonus points if it illuminates the distinction between ‘good’ and ‘bad’ ampliative inferences, while explicating why both fall short of deductive inferences.
Applicability to science: an interpretation should illuminate paradigmatic uses of probability in science (for example, in quantum mechanics and statistical mechanics).
Perhaps there are further metaphysical desiderata that we might impose on the interpretations. For example, there appear to be connections between probability and modality. Events with positive probability can happen, even if they don’t. Some authors also insist on the converse condition that only events with positive probability can happen, although this is more controversial — see our discussion of ‘regularity’ in Section 3.3.4. (Indeed, in uncountable probability spaces this condition will require the employment of infinitesimals, and will thus take us beyond the standard Kolmogorov theory — ‘standard’ both in the sense of being the orthodoxy, and in its employment of standard, as opposed to ‘non-standard’, real numbers. See Skyrms 1980.) In any case, our list is already long enough to help in our assessment of the leading interpretations on the market.
Broadly speaking, there are arguably three main concepts of probability:
Some philosophers will insist that not all of these concepts are intelligible; some will insist that one of them is basic, and that the others are reducible to it. Moreover, the boundaries between these concepts are somewhat permeable. After all, ‘degree of confidence’ is itself an epistemological concept, and as we will see, it is thought to be rationally constrained both by evidential support relations and by attitudes to physical probabilities in the world. And there are intramural disputes within the camps supporting each of these concepts, as we will also see. Be that as it may, it will be useful to keep these concepts in mind. Sections 3.1 and 3.2 discuss analyses of concept (1), classical and logical/evidential probability; 3.3 discusses analyses of concept (2), subjective probability; 3.4, 3.5, and 3.6 discuss three analyses of concept (3), frequentist, propensity, and best-system interpretations.
The classical interpretation owes its name to its early and august pedigree. It was championed by de Moivre and Laplace, and inchoate versions of it may be found in the works of Pascal, Bernoulli, Huygens, and Leibniz. It assigns probabilities in the absence of any evidence, or in the presence of symmetrically balanced evidence. The guiding idea is that in such circumstances, probability is shared equally among all the possible outcomes, so that the classical probability of an event is simply the fraction of the total number of possibilities in which the event occurs. It seems especially well suited to those games of chance that by their very design create such circumstances — for example, the classical probability of a fair die landing with an even number showing up is 3/6. It is often presupposed (usually tacitly) in textbook probability puzzles.
Here is a classic statement by de Moivre:
[I]f we constitute a fraction whereof the numerator be the number of chances whereby an event may happen, and the denominator the number of all the chances whereby it may either happen or fail, that fraction will be a proper designation of the probability of happening. (1718; 1967, 1–2)

Laplace gives the best-known but slightly different formulation:
The theory of chances consists in reducing all events of the same kind to a certain number of equally possible cases, that is to say, to cases whose existence we are equally uncertain of, and in determining the number of cases favourable to the event whose probability is sought. The ratio of this number to that of all possible cases is the measure of this probability, which is thus only a fraction whose numerator is the number of favourable cases, and whose denominator is the number of all possible cases. (1814; 1999, 4)
We may ask a number of questions about this formulation. When are events of the same kind? Intuitively, ‘heads’ and ‘tails’ are equally likely outcomes of tossing a fair coin; but if their kind is ‘ways the coin could land’, then ‘edge’ should presumably be counted alongside them. The “certain number of equally possible cases” and “the number of all possible cases” are presumably finite numbers. What, then, of probabilities in infinite spaces? Apparently, irrational-valued probabilities such as \(1/\sqrt{2}\) are automatically eliminated, and thus theories such as quantum mechanics that posit them cannot be accommodated. (We will shortly see, however, that Laplace’s theory has been refined to handle infinite spaces.)
Who are “we”, who “are equally uncertain”? Different people may be equally undecided about different things, which suggests that Laplace is offering a subjectivist interpretation in which probabilities vary from person to person depending on contingent differences in their evidence. Yet he means to characterize the objective probability assignment of a rational agent in an epistemically neutral position with respect to a set of “equally possible” cases. But then the proposal risks sounding empty: for what is it for an agent to be “equally uncertain” about a set of cases, other than assigning them equal probability?
This brings us to one of the key objections to Laplace’s account. The notion of “equally possible” cases faces the charge of either being a category mistake (for ‘possibility’ does not come in degrees), or circular (for what is meant is really ‘equally probable’). The notion is finessed by the so-called ‘principle of indifference’, a coinage due to Keynes (although he was no friend of the principle): “if there is no known reason for predicating of our subject one rather than another of several alternatives, then relatively to such knowledge the assertions of each of these alternatives have an equal probability” (1921, 52–53). (The ‘principle of equal probability’ would be a better name.) Thus, it might be claimed, there is no circularity in the classical interpretation after all. However, this move may only postpone the problem, for there is still a threat of circularity, albeit at a lower level. We have two cases here: outcomes for which we have no evidence (“reason”) at all, and outcomes for which we have symmetrically balanced evidence. There is no circularity in the first case unless the notion of ‘evidence’ is itself probabilistic; but artificial examples aside, it is doubtful that the case ever arises. For example, we have a considerable fund of evidence on coin tossing from the results of our own experiments, the testimony of others, our knowledge of some of the relevant physics, and so on. In the second case, the threat of circularity is more apparent, for it seems that some sort of weighing of the evidence in favor of each outcome is required, and this seems to require a reference to probability. Indeed, the most obvious characterization of symmetrically balanced evidence is in terms of equality of conditional probabilities: given evidence \(E\) and possible outcomes \(O_1, O_2, \ldots, O_n\), the evidence is symmetrically balanced iff \(P(O_1\mid E) = P(O_2\mid E) = \ldots = P(O_n\mid E)\).
Then it seems that probabilities reside at the base of the interpretation after all. Still, it would be an achievement if all probabilities could be reduced to cases of equal probability. See Zabell (2016) for further discussion of the classical interpretation and the principle of indifference.
When the spaces are countably infinite, the spirit of the classical theory may be upheld by appealing to the information-theoretic principle of maximum entropy, a generalization of the principle of indifference championed by Jaynes (1968). Entropy is a measure of the lack of ‘informativeness’ of a probability function. The more concentrated is the function, the less is its entropy; the more diffuse it is, the greater is its entropy. For a discrete assignment of probabilities \(P = (p_1, p_2, \ldots)\), the entropy of \(P\) is defined as:
\[ -\sum_i p_i\log p_i \]
(For more explanation of this formula see the entry on Information.)
The principle of maximum entropy enjoins us to select from the family of all probability functions consistent with our background knowledge the function that maximizes this quantity. In the special case of choosing the most uninformative probability function over a finite set of possible outcomes, this is just the familiar ‘flat’ classical assignment discussed previously. Things get more complicated in the infinite case, since there cannot be a flat assignment over denumerably many outcomes, on pain of violating the standard probability calculus (with countable additivity). Rather, the best we can have are sequences of progressively flatter assignments, none of which is truly flat. We must then impose some further constraint that narrows the field to a smaller family in which there is an assignment of maximum entropy.[3] This constraint has to be imposed from outside as background knowledge, but there is no general theory of which external constraint should be applied when. See Seidenfeld (1986) for mathematical results regarding maximum entropy and a critique of it.
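The relationship between concentration and entropy, and the fact that the flat assignment maximizes entropy over a finite set of outcomes, can be verified numerically. The following sketch (our own illustration; the function name `entropy` is not from the text) computes \(-\sum_i p_i\log p_i\) for a few assignments over four outcomes:

```python
# Entropy of a discrete probability assignment: -sum_i p_i * log(p_i).
# Over n outcomes, the flat assignment (1/n each) maximizes entropy,
# with value log(n); more concentrated assignments score lower.
import math

def entropy(p):
    # Terms with p_i = 0 are conventionally taken to contribute 0.
    return -sum(x * math.log(x) for x in p if x > 0)

flat = [1/4] * 4                       # the classical 'flat' assignment
concentrated = [0.7, 0.1, 0.1, 0.1]    # more informative, lower entropy

assert entropy(flat) > entropy(concentrated)
assert abs(entropy(flat) - math.log(4)) < 1e-12
```

The convention that \(0 \log 0 = 0\) (handled by the `if x > 0` filter) is standard in information theory, since \(x \log x \to 0\) as \(x \to 0\).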
Let us turn now to uncountably infinite spaces. It is easy — all too easy — to assign equal probabilities to the points in such a space: each gets probability 0. Non-trivial probabilities arise when uncountably many of the points are clumped together in larger sets. If there are finitely many clumps, Laplace’s classical theory may be appealed to again: if the evidence bears symmetrically on these clumps, each gets the same share of probability.
Enter Bertrand’s paradoxes (1889). They all arise in uncountable spaces and turn on alternative parametrizations of a given problem that are non-linearly related to each other. Some presentations are needlessly arcane; length and area suffice to make the point. The following example (adapted from van Fraassen 1989) nicely illustrates how Bertrand-style paradoxes work. A factory produces cubes with side-length between 0 and 1 foot; what is the probability that a randomly chosen cube has side-length between 0 and 1/2 a foot? The classical interpretation’s answer is apparently 1/2, as we imagine a process of production that is uniformly distributed over side-length. But the question could have been given an equivalent restatement: A factory produces cubes with face-area between 0 and 1 square-feet; what is the probability that a randomly chosen cube has face-area between 0 and 1/4 square-feet? Now the answer is apparently 1/4, as we imagine a process of production that is uniformly distributed over face-area. This is already disastrous, as we cannot allow the same event to have two different probabilities (especially if this interpretation is to be admissible!). But there is worse to come, for the problem could have been restated equivalently again: A factory produces cubes with volume between 0 and 1 cubic feet; what is the probability that a randomly chosen cube has volume between 0 and 1/8 cubic-feet? Now the answer is apparently 1/8, as we imagine a process of production that is uniformly distributed over volume. And so on for all of the infinitely many equivalent reformulations of the problem (in terms of the fourth, fifth, … power of the length, and indeed in terms of every non-zero real-valued exponent of the length). What, then, is the probability of the event in question?
The paradox arises because the principle of indifference can be used in incompatible ways. We have no evidence that favors the side-length lying in the interval [0, 1/2] over its lying in [1/2, 1], or vice versa, so the principle requires us to give probability 1/2 to each. Unfortunately, we also have no evidence that favors the face-area lying in any of the four intervals [0, 1/4], [1/4, 1/2], [1/2, 3/4], and [3/4, 1] over any of the others, so we must give probability 1/4 to each. The event ‘the side-length lies in [0, 1/2]’ receives a different probability when merely redescribed. And so it goes, for all the other reformulations of the problem. We cannot meet any pair of these constraints simultaneously, let alone all of them.
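The incompatibility of the two parametrizations can be made vivid by simulation. In the sketch below (an illustration of our own), the same event — the cube’s side-length lying in [0, 1/2] — gets approximate probability 1/2 when production is modeled as uniform over side-length, but approximate probability 1/4 when modeled as uniform over face-area:

```python
# The cube factory: 'uniform over side-length' and 'uniform over
# face-area' are different distributions, so the one event
# 'side-length <= 1/2' (equivalently 'face-area <= 1/4') gets
# different probabilities under the two parametrizations.
import random

random.seed(0)
N = 100_000

# Model 1: side-length uniform on [0, 1].
sides = [random.random() for _ in range(N)]
p_by_length = sum(s <= 0.5 for s in sides) / N

# Model 2: face-area uniform on [0, 1]; 'side <= 1/2' is 'area <= 1/4'.
areas = [random.random() for _ in range(N)]
p_by_area = sum(a <= 0.25 for a in areas) / N

assert abs(p_by_length - 0.5) < 0.01   # approximately 1/2
assert abs(p_by_area - 0.25) < 0.01    # approximately 1/4
```

Of course the simulation does not resolve the paradox; it merely confirms that the two applications of the principle of indifference pick out genuinely different distributions over the same space.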
Jaynes attempts to save the principle of indifference and to extend the principle of maximum entropy to the continuous case, with his invariance condition: in two problems where we have the same knowledge, we should assign the same probabilities. He regards this as a consistency requirement. For any problem, we have a group of admissible transformations, those that change the problem into an equivalent form. Various details are left unspecified in the problem; equivalent formulations of it fill in the details in different ways. Jaynes’ invariance condition bids us to assign equal probabilities to equivalent propositions, reformulations of one another that are arrived at by such admissible transformations of our problem. Any probability assignment that meets this condition is called an invariant assignment. Ideally, our problem will have a unique invariant assignment. To be sure, things will not always be ideal; but sometimes they are, in which case this is surely progress on Bertrand-style problems.
And in any case, for many garden-variety problems such technical machinery will not be needed. Suppose I tell you that a prize is behind one of three doors, and you get to choose a door. This seems to be a paradigm case in which the principle of indifference works well: the probability that you choose the right door is 1/3. It seems implausible that we should worry about some reparametrization of the problem that would yield a different answer. To be sure, Bertrand-style problems caution us that there are limits to the principle of indifference. But arguably we must just be careful not to overstate its applicability.
How does the classical theory of probability fare with respect to our criteria of adequacy? Let us begin with admissibility. (Laplacean) classical probabilities obey non-negativity and normalization, but they are only finitely additive (de Finetti 1974). So they do not obey the full Kolmogorov probability calculus, but they provide an interpretation of the elementary theory.
Classical probabilities are ascertainable, assuming that the space of possibilities can be determined in principle. They bear a relationship to the credences of rational agents; the circularity concern, as we saw above, is that the relationship is vacuous, and that rather than constraining the credences of a rational agent in an epistemically neutral position, they merely record them.
Without supplementation, the classical theory makes no contact with frequency information. However the coin happens to land in a sequence of trials, the possible outcomes remain the same. Indeed, even if we have strong empirical evidence that the coin is biased towards heads with probability, say, 0.6, it is hard to see how the unadorned classical theory can accommodate this fact — for what now are the ten possibilities, six of which are favorable to heads? Laplace does supplement the theory with his Rule of Succession: “Thus we find that an event having occurred successively any number of times, the probability that it will happen again the next time is equal to this number increased by unity divided by the same number, increased by two units.” (1951, 19) That is:
\[ Pr(\text{success on } N+1^{\text{st}}\text{ trial}\mid N\text{ consecutive successes}) = \frac{N+1}{N+2} \]
Thus, inductive learning is possible — though not by classical probabilities per se, but rather thanks to this further rule. And we must ask whether such learning can be captured once and for all by such a simple formula, the same for all domains and events. We will return to this question when we discuss the logical interpretation below.
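The Rule of Succession is simple enough to compute directly. The sketch below (with an illustrative function name of our own) encodes \((N+1)/(N+2)\) and shows how the rule lets the probability of success climb with the observed run, which is precisely the inductive learning the unadorned classical theory lacks:

```python
# Laplace's Rule of Succession: after N consecutive successes,
# the probability of success on the (N+1)st trial is (N+1)/(N+2).
from fractions import Fraction

def rule_of_succession(n):
    return Fraction(n + 1, n + 2)

assert rule_of_succession(0) == Fraction(1, 2)    # no trials yet
assert rule_of_succession(9) == Fraction(10, 11)  # after 9 successes
# The probability approaches 1 as the run of successes lengthens:
assert rule_of_succession(1000) > Fraction(99, 100)
```

Note that the formula is the same for every domain and event type — which is exactly the feature questioned in the text.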
Science apparently invokes at various points probabilities that look classical. Bose-Einstein statistics, Fermi-Dirac statistics, and Maxwell-Boltzmann statistics each arise by considering the ways in which particles can be assigned to states, and then applying the principle of indifference to different subdivisions of the set of alternatives, Bertrand-style. The trouble is that Bose-Einstein statistics apply to some particles (e.g. photons) and not to others, Fermi-Dirac statistics apply to different particles (e.g. electrons), and Maxwell-Boltzmann statistics do not apply to any known particles. None of this can be determined a priori, as the classical interpretation would have it. Moreover, the classical theory purports to yield probability assignments in the face of ignorance. But as Fine (1973) writes:
If we are truly ignorant about a set of alternatives, then we are also ignorant about combinations of alternatives and about subdivisions of alternatives. However, the principle of indifference when applied to alternatives, or their combinations, or their subdivisions, yields different probability assignments (170).
This brings us to one of the chief points of controversy regarding the classical interpretation. Critics accuse the principle of indifference of extracting information from ignorance. Proponents reply that it rather codifies the way in which such ignorance should be epistemically managed — for anything other than an equal assignment of probabilities would represent the possession of some knowledge. Critics counter-reply that in a state of complete ignorance, it is better to assign imprecise probabilities (perhaps ranging over the entire [0, 1] interval), or to eschew the assignment of probabilities altogether.
Logical theories of probability retain the classical interpretation’s idea that probabilities can be determined a priori by an examination of the space of possibilities. However, they generalize it in two important ways: the possibilities may be assigned unequal weights, and probabilities can be computed whatever the evidence may be, symmetrically balanced or not. Indeed, the logical interpretation, in its various guises, seeks to encapsulate in full generality the degree of support or confirmation that a piece of evidence \(e\) confers upon a given hypothesis \(h\), which we may write as \(c(h, e)\). In doing so, it can be regarded also as generalizing deductive logic and its notion of implication, to a complete theory of inference equipped with the notion of ‘degree of implication’ that relates \(e\) to \(h\). It is often called the theory of ‘inductive logic’, although this is a misnomer: there is no requirement that \(e\) be in any sense ‘inductive’ evidence for \(h\). ‘Non-deductive logic’ would be a better name, but this overlooks the fact that deductive logic’s relations of implication and incompatibility are also accommodated as extreme cases in which the confirmation function takes the values 1 and 0 respectively. In any case, it is significant that the logical interpretation provides a framework for induction.
Early proponents of logical probability include Johnson (1921), Keynes (1921), and Jeffreys (1939/1998). However, by far the most systematic study of logical probability was by Carnap. His formulation of logical probability begins with the construction of a formal language. In (1950/1962) he considers a class of very simple languages consisting of a finite number of logically independent monadic predicates (naming properties) applied to countably many individual constants (naming individuals) or variables, and the usual logical connectives. The strongest (consistent) statements that can be made in a given language describe all of the individuals in as much detail as the expressive power of the language allows. They are conjunctions of complete descriptions of each individual, each description itself a conjunction containing exactly one occurrence (negated or unnegated) of each predicate of the language. Call these strongest statements state descriptions.
Any probability measure \(m(-)\) over the state descriptions automatically extends to a measure over all sentences, since each sentence is equivalent to a disjunction of state descriptions; \(m\) in turn induces a confirmation function \(c(-, -)\):
\[ c(h,e) = \frac{m(h \amp e)}{m(e)} \]

There are infinitely many candidates for \(m\), and hence \(c\), even for very simple languages. Carnap argues for his favored measure “\(m^*\)” by insisting that the only thing that significantly distinguishes individuals from one another is some qualitative difference, not just a difference in labeling. Call a structure description a maximal set of state descriptions, each of which can be obtained from another by some permutation of the individual names. \(m^*\) assigns each structure description equal measure, which in turn is divided equally among their constituent state descriptions. It gives greater weight to homogenous state descriptions than to heterogeneous ones, thus ‘rewarding’ uniformity among the individuals in accordance with putatively reasonable inductive practice. The induced \(c^*\) allows inductive learning from experience.
Consider, for example, a language that has three names, \(a\), \(b\) and \(c\), for individuals, and one predicate \(F\). For this language, the state descriptions are:
\[\begin{array}{crcrcr}1. & Fa &\amp& Fb &\amp& Fc \\ 2. & \neg Fa &\amp& Fb &\amp& Fc \\ 3. & Fa &\amp& \neg Fb &\amp& Fc \\ 4. & Fa &\amp& Fb &\amp& \neg Fc \\ 5. & \neg Fa &\amp& \neg Fb &\amp& Fc \\ 6. & \neg Fa &\amp& Fb &\amp& \neg Fc \\ 7. & Fa &\amp& \neg Fb &\amp& \neg Fc \\ 8. & \neg Fa &\amp& \neg Fb &\amp& \neg Fc \\ \end{array}\]

There are four structure descriptions:
\[\begin{align}\{1\}, &\text{ “Everything is }F\text{”;} \\ \{2, 3, 4\}, &\text{ “Two } F\text{s, one }\neg F\text{”;} \\ \{5, 6, 7\}, &\text{ “One } F\text{, two }\neg F\text{s”; and} \\ \{8\}, &\text{ “Everything is }\neg F\text{”.} \\ \end{align}\]

The measure \(m^*\) assigns numbers to the state descriptions as follows: first, every structure description is assigned an equal weight, 1/4; then, each state description belonging to a given structure description is assigned an equal part of the weight assigned to the structure description:
\[\begin{array}{llll}\textit{State description} & \textit{Structure description} & \textit{Weight} & \quad m^* \\ \left.\begin{array}{l}1.\ Fa.Fb.Fc \end{array}\right. & \text{I. Everything is } F & 1/4 & \quad 1/4 \\ \left.\begin{array}{l}2.\ \neg Fa.Fb.Fc\phantom{\neg} \\ 3.\ Fa.\neg Fb.Fc \\ 4.\ Fa.Fb.\neg Fc \end{array} \right\} & \text{II. Two } F\text{s, one }\neg F & 1/4 & \left\{\begin{array}{l}1/12 \\ 1/12 \\ 1/12 \end{array}\right. \\ \left.\begin{array}{l}5.\ \neg Fa.\neg Fb.Fc \\ 6.\ \neg Fa.Fb.\neg Fc \\ 7.\ Fa.\neg Fb.\neg Fc \end{array} \right\} & \text{III. One } F\text{, two }\neg F\text{s} & 1/4 & \left\{\begin{array}{l}1/12 \\ 1/12 \\ 1/12 \end{array}\right. \\ \left.\begin{array}{l}8.\ \neg Fa.\neg Fb.\neg Fc \end{array}\right. & \text{IV. Everything is } \neg F & 1/4 & \quad 1/4 \end{array}\]

Notice that \(m^*\) gives greater weight to the homogenous state descriptions 1 and 8 than to the heterogeneous ones. This will manifest itself in the inductive support that hypotheses can gain from appropriate evidence statements. Consider the hypothesis statement \(h = Fc\), true in 4 of the 8 state descriptions, with a priori probability \(m^*(h) = 1/2\). Suppose we examine individual “\(a\)” and find it has property \(F\) — call this evidence \(e\). Intuitively, \(e\) is favorable (albeit weak) inductive evidence for \(h\). We have: \(m^*(h \amp e) = 1/3\), \(m^*(e) = 1/2\), and hence
\[ c^*(h,e) = \frac{m^*(h \amp e)}{m^*(e)} = \frac{2}{3}. \]

This is greater than the a priori probability \(m^*(h) = 1/2\), so the hypothesis has been confirmed. It can be shown that in general \(m^*\) yields a degree of confirmation \(c^*\) that allows learning from experience.
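The worked example can be checked mechanically. Here is a minimal sketch in Python; the encoding (a state description as a triple of truth values for \(Fa\), \(Fb\), \(Fc\)) and all the names are mine, chosen just for illustration:

```python
from itertools import product
from fractions import Fraction

# The 8 state descriptions: triples of truth values for (Fa, Fb, Fc).
states = list(product([True, False], repeat=3))

# A structure description groups state descriptions that differ only by
# a permutation of names, i.e. those with the same number of F's.
def structure(state):
    return sum(state)

structures = {structure(s) for s in states}   # 4 structure descriptions

# m*: equal weight (1/4) to each structure description, divided equally
# among its constituent state descriptions.
m_star = {}
for s in states:
    same = [t for t in states if structure(t) == structure(s)]
    m_star[s] = Fraction(1, len(structures)) / len(same)

def m(prop):
    """Measure of a proposition, viewed as the set of states where it holds."""
    return sum(m_star[s] for s in states if prop(s))

def c(h, e):
    """Induced confirmation function c*(h, e) = m*(h & e) / m*(e)."""
    return m(lambda s: h(s) and e(s)) / m(e)

h = lambda s: s[2]   # hypothesis: Fc
e = lambda s: s[0]   # evidence:   Fa

# m*(h) = 1/2, and c*(h, e) = 2/3 > 1/2: e confirms h.
```

Running this recovers the values in the text: \(m^*(h) = 1/2\) and \(c^*(h, e) = 2/3\).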
Note, however, that infinitely many confirmation functions, defined by suitable choices of the initial measure, allow learning from experience. We do not yet have a reason to think that \(c^*\) is the right choice. Carnap claims nevertheless that \(c^*\) stands out for being simple and natural.
He later generalizes his confirmation function to a continuum of functions \(c_{\lambda}\). Define a family of predicates to be a set of predicates such that, for each individual, exactly one member of the set applies, and consider first-order languages containing a finite number of families. Carnap (1963) focuses on the special case of a language containing only one-place predicates. He lays down a host of axioms concerning the confirmation function \(c\), including those induced by the probability calculus itself, various axioms of symmetry (for example, that \(c(h, e)\) remains unchanged under permutations of individuals, and of predicates of any family), and axioms that guarantee undogmatic inductive learning, and long-run convergence to relative frequencies. They imply that, for a family \(\{P_n\},\) \(n = 1, \ldots, k\) \((k \gt 2){:}\)
\[\begin{align}c_{\lambda}(\text{individual } s+1 \text{ is } P_j,\ s_j \text{ of the first } &s \text{ individuals are } P_j) \\ &= \frac{s_j + \lambda/k}{s + \lambda}, \end{align}\]

where \(\lambda\) is a positive real number. The higher the value of \(\lambda\), the less impact evidence has: induction from what is observed becomes progressively more swamped by a classical-style equal assignment to each of the \(k\) possibilities regarding individual \(s + 1\).
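The role of \(\lambda\) is easy to see in code. A minimal sketch (the function name `c_lambda` is my own label, not Carnap’s):

```python
def c_lambda(s_j, s, k, lam):
    """Carnap's continuum of inductive methods: the probability that
    individual s+1 is P_j, given that s_j of the first s individuals
    were P_j, for a family of k predicates and parameter lam > 0."""
    return (s_j + lam / k) / (s + lam)

# lam near 0 approaches the 'straight rule' s_j/s (pure induction);
# lam = k recovers Laplace's rule of succession, (s_j + 1)/(s + k);
# very large lam swamps the evidence, approaching the uniform value 1/k.
```

So \(\lambda\) interpolates between letting observed frequencies dominate and the classical-style equal assignment.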
I turn to various objections to Carnap’s program that have been offered in the literature, noting that this remains an area of lively debate. (See Maher (2010) for rebuttals of some of these objections and for defenses of the program; see Fitelson (2006) for an overall assessment of the program.) Firstly, is there a correct setting of \(\lambda\), or, said another way, how ‘inductive’ should the confirmation function be? The concern here is that any particular setting of \(\lambda\) is arbitrary in a way that compromises Carnap’s claim to be offering a logical notion of probability. Also, it turns out that for any such setting, a universal statement in an infinite universe always receives zero confirmation, no matter what the (finite) evidence. Many find this counterintuitive, since laws of nature with infinitely many instances can apparently be confirmed. Earman (1992) discusses the prospects for avoiding the unwelcome result.
Significantly, Carnap’s various axioms of symmetry are hardly logical truths. Moreover, Fine (1973, 202) argues that we cannot impose further symmetry constraints that are seemingly just as plausible as Carnap’s, on pain of inconsistency. Goodman (1955) taught us: that the future will resemble the past in some respect is trivial; that it will resemble the past in all respects is contradictory. And we may continue: that a probability assignment can be made to respect some symmetry is trivial; that one can be made to respect all symmetries is contradictory. This threatens the whole program of logical probability.
Another Goodmanian lesson is that inductive logic must be sensitive to the meanings of predicates, strongly suggesting that a purely syntactic approach such as Carnap’s is doomed. Scott and Krauss (1966) use model theory in their formulation of logical probability for richer and more realistic languages than Carnap’s. Still, finding a canonical language seems to many to be a pipe dream, at least if we want to analyze the “logical probability” of any argument of real interest — either in science, or in everyday life.
Logical probabilities are admissible. It is easily shown that they satisfy finite additivity, and given that they are defined on finite sets of sentences, the extension to countable additivity is trivial. Given a choice of language, the values of a given confirmation function are ascertainable; thus, if this language is rich enough for a given application, the relevant probabilities are ascertainable. The whole point of the theory of logical probability is to explicate ampliative inference, although given the apparent arbitrariness in the choice of language and in the setting of \(\lambda\) — thus, in the choice of confirmation function — one may wonder how well it achieves this. The problem of arbitrariness of the confirmation function also hampers the extent to which the logical interpretation can truly illuminate the connection between probabilities and frequencies.
The arbitrariness problem, moreover, stymies any compelling connection between logical probabilities and rational credences. And a further problem remains even after the confirmation function has been chosen: if one’s credences are to be based on logical probabilities, they must be relativized to an evidence statement, \(e\). Carnap requires that \(e\) be one’s total evidence — the maximally specific information at one’s disposal, the strongest proposition of which one is certain. But perhaps learning does not come in the form of such ‘bedrock’ propositions, as Jeffrey (1992) has argued — maybe it rather involves a shift in one’s subjective probabilities across a partition, without any cell of the partition becoming certain. Then it may be that the strongest proposition of which one is certain is expressed by a tautology \(T\) — hardly an interesting notion of ‘total evidence’.[4]
In connection with the ‘applicability to science’ criterion, a point due to Lakatos is telling. By Carnap’s lights, the degree of confirmation of a hypothesis depends on the language in which the hypothesis is stated and over which the confirmation function is defined. But scientific progress often brings with it a change in scientific language (for example, the addition of new predicates and the deletion of old ones), and such a change will bring with it a change in the corresponding \(c\)-values. Thus, the growth of science may overthrow any particular confirmation theory. There is something of the snake eating its own tail here, since logical probability was supposed to explicate the confirmation of scientific theories.
We have seen that the later Carnap relaxed his earlier aspiration to find a unique confirmation function, allowing a continuum of such functions displaying a wide range of inductive cautiousness. Various critics of logical probabilities believe that he did not go far enough — that even his later systems constrain inductive learning beyond what is rationally required. This recalls the classic debate earlier in the 20th century between Keynes, a famous proponent of logical probabilities, and Ramsey, an equally famous opponent. Ramsey (1926; 1990) was skeptical of there being any non-trivial relations of logical probability: he said that he could not discern them himself, and that others disagree about them. This skepticism led him to formulate his enormously influential version of the subjective interpretation of probability, to be discussed shortly.
One might insist, however, that there are non-trivial probabilistic evidential relations, even if they are not logical. It may not be a matter of logic that the sun will probably rise tomorrow, given our evidence, yet there still seems to be an objective sense in which it probably will, given our evidence. In a crime investigation, there may be a fact of the matter of how strongly the available evidence supports the guilt of various suspects. This does not seem to be a matter of logic — nor of physics, nor of what anyone happens to think, nor of how the facts in the actual world turn out. It seems to be a matter, rather, of evidential probabilities.
More generally, Timothy Williamson (2000, 209) writes:
Given a scientific hypothesis \(h\), we can intelligibly ask: how probable is \(h\) on present evidence? We are asking how much the evidence tells for or against the hypothesis. We are not asking what objective physical chance or frequency of truth \(h\) has. A proposed law of nature may be quite improbable on present evidence even though its objective chance of truth is 1. That is quite consistent with the obvious point that the evidence bearing on \(h\) may include evidence about objective chances or frequencies. Equally, in asking how probable \(h\) is on present evidence, we are not asking about anyone’s actual degree of belief in \(h\). Present evidence may tell strongly against \(h\), even though everyone is irrationally certain of \(h\).
Williamson identifies one’s evidence with what one knows. However, one might adopt other conceptions of evidence, and one might even take evidential probabilities to link any two propositions whatsoever. Williamson maintains that evidential probabilities are not logical — in particular, they are not syntactically definable. He assumes an initial probability distribution \(P\), which “measures something like the intrinsic plausibility of hypotheses prior to investigation” (211). The evidential probability of \(h\) on total evidence \(e\) is then given by \(P(h \mid e)\).
Are evidential probabilities admissible? Williamson says that “\(P\) will be assumed to satisfy a standard set of axioms for the probability calculus” (211). So admissibility is built into the very specification of \(P\). Are they ascertainable? He writes:
What, then, are probabilities on evidence? We should resist demands for an operational definition; such demands are as damaging in the philosophy of science as they are in science itself. Sometimes the best policy is to go ahead and theorize with a vague but powerful notion. One’s original intuitive understanding becomes refined as a result, although rarely to the point of a definition in precise pretheoretic terms. That policy will be pursued here. (211)
This might be understood as rejecting ascertainability as a criterion of adequacy.
However, some authors are skeptical that there are such things as evidential probabilities — e.g. Joyce (2004). He also argues that there is more than one sense in which evidence tells for or against a hypothesis. Bacon (2014) allows that there are such things as evidential probabilities, but he argues that various puzzling results follow from Williamson’s account of them, in virtue of its identifying evidence with knowledge. Moreover, one may resist demands for an operational definition of evidential probabilities, while seeking some further understanding of them in terms of other theoretical concepts. For example, perhaps \(P(h \mid e)\) is the subjective probability that a perfectly rational agent with evidence \(e\) would assign to \(h\)? Williamson argues against this proposal; Eder (2023) defends it, and she offers several ways of interpreting evidential probabilities in terms of ideal subjective probabilities. If some such way is tenable, evidential probabilities would presumably enjoy whatever applicability such subjective probabilities have. This brings us to our next interpretation of probability.
Nearly a century before Ramsey, De Morgan wrote: “By degree of probability, we really mean, or ought to mean, degree of belief” (1847, 172). According to the subjective (or personalist or Bayesian) interpretation, probabilities are degrees of confidence, or credences, or partial beliefs of suitable agents. Thus, we really have many interpretations of probability here — as many as there are suitable agents. What makes an agent suitable? What we might call unconstrained subjectivism places no constraints on the agents — anyone goes, and hence anything goes. Various studies by psychologists are taken to show that people commonly violate the usual probability calculus in spectacular ways. (See, e.g., several articles in Kahneman et al. 1982.) We clearly do not have here an admissible interpretation (with respect to any probability calculus), since there is no limit to what degrees of confidence agents might have.
More promising, however, is the thought that the suitable agents must be, in a strong sense, rational. Following Ramsey, various subjectivists have wanted to assimilate probability to logic by portraying probability as “the logic of partial belief” (1926; 1990, 53 and 55). A rational agent is required to be logically consistent, now taken in a broad sense. These subjectivists argue that this implies that the agent obeys the axioms of probability (although perhaps with only finite additivity), and that subjectivism is thus (to this extent) admissible. Before we can present this argument, we must say more about what degrees of belief are.
Subjective probabilities have long been analyzed in terms of betting behavior. Here is a classic statement by de Finetti (1980):
Let us suppose that an individual is obliged to evaluate the rate \(p\) at which he would be ready to exchange the possession of an arbitrary sum \(S\) (positive or negative) dependent on the occurrence of a given event \(E\), for the possession of the sum \(pS\); we will say by definition that this number \(p\) is the measure of the degree of probability attributed by the individual considered to the event \(E\), or, more simply, that \(p\) is the probability of \(E\) (according to the individual considered; this specification can be implicit if there is no ambiguity). (62)
This boils down to the following analysis:
Your degree of belief in \(E\) is \(p\) iff \(p\) units of utility is the price at which you would buy or sell a bet that pays 1 unit of utility if \(E\), 0 if not \(E\).
The analysis presupposes that, for any \(E\), there is exactly one such price — let’s call this your fair price for the bet on \(E\). This presupposition may fail. There may be no such price — you may refuse to bet on \(E\) at all (perhaps unless coerced, in which case your genuine opinion about \(E\) may not be revealed), or your selling price may differ from your buying price, as may occur if your probability for \(E\) is imprecise. There may be more than one fair price — you may find a range of such prices acceptable, as may also occur if your probability for \(E\) is imprecise. For now, however, let us waive these concerns, and turn to an important argument that uses the betting analysis purportedly to show that rational degrees of belief must conform to the probability calculus (with at least finite additivity).
A Dutch book is a series of bets bought and sold at prices that collectively guarantee loss, however the world turns out. Suppose we identify your credences with your betting prices. Ramsey notes, and it can be easily proven (e.g., Skyrms 1984), that if your credences violate the probability calculus, then you are susceptible to a Dutch book — this is the Dutch Book Theorem. For example, suppose that you violate the additivity axiom by assigning \(P(A \cup B) \lt P(A) + P(B)\), where \(A\) and \(B\) are mutually exclusive. Then a cunning bettor could buy from you a bet on \(A \cup B\) for \(P(A \cup B)\) units, and sell you bets on \(A\) and \(B\) individually for \(P(A)\) and \(P(B)\) units respectively. He pockets an initial profit of \(P(A) + P(B) - P(A \cup B)\), and retains it whatever happens. Ramsey offers the following influential gloss: “If anyone’s mental condition violated these laws [of the probability calculus], his choice would depend on the precise form in which the options were offered him, which would be absurd.” (1990, 78) The Dutch Book argument concludes: rationality requires your credences to obey the probability calculus.
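The arithmetic of the example can be made concrete with illustrative numbers of my own choosing: suppose \(A\) and \(B\) are mutually exclusive and you set \(P(A) = P(B) = 0.4\) but \(P(A \cup B) = 0.6\), violating additivity. A minimal sketch of the bettor’s book:

```python
# Illustrative violation of finite additivity for exclusive A and B.
p_A, p_B, p_AB = 0.4, 0.4, 0.6   # p_AB < p_A + p_B

# You buy the bets on A and on B at your prices (each pays 1 if it wins),
# and sell the bet on A-or-B at your price (you pay 1 if it wins).
# Your net payoff in each possible world:
payoffs = {
    "A":       (1 - p_A) + (0 - p_B) + (p_AB - 1),
    "B":       (0 - p_A) + (1 - p_B) + (p_AB - 1),
    "neither": (0 - p_A) + (0 - p_B) + (p_AB - 0),
}
# In every case you lose p_A + p_B - p_AB = 0.2: a guaranteed loss.
```

Whichever world obtains, your loss is exactly the bettor’s initial profit, \(P(A) + P(B) - P(A \cup B)\).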
The argument is incomplete as it stands. As Hájek (2008, 2009b) observes, the Dutch Book Theorem leaves open the possibility that you are susceptible to a Dutch Book whether or not your credences violate the probability calculus — perhaps we are all susceptible? Equally important, and often neglected, is the converse theorem that establishes how you can avoid such a predicament. If your subjective probabilities conform to the probability calculus, then no Dutch book can be made against you (Kemeny 1955); your probability assignments are then said to be coherent. Williamson (1999) extends the Dutch Book argument to countable additivity: if your credences violate countable additivity, then you are susceptible to a Dutch book (with infinitely many bets). Conformity to the full probability calculus thus seems to be necessary and sufficient for coherence.[5] We thus have an argument that rational credences provide an interpretation of the full probability calculus, and thus an admissible interpretation. Note, however, that de Finetti — the arch subjectivist and proponent of the Dutch Book argument — was an opponent of countable additivity (e.g. in his 1974). See Hájek (2009b), Pettigrew (2020) and the entry on Dutch Book arguments for various objections to Dutch Book arguments for conformity to the probability calculus and for other putative norms on credences.
But let us return to the betting analysis of credences. It is an attempt to make good on Ramsey’s idea that probability “is a measurement of belief qua basis of action” (67). While he regards the method of measuring an agent’s credences by her betting behavior as “fundamentally sound” (68), he recognizes that it has its limitations.
The betting analysis gives an operational definition of subjective probability, and indeed it inherits some of the difficulties of operationalism in general, and of behaviorism in particular. For example, you may have reason to misrepresent your true opinion, or to feign having opinions that in fact you lack, by making the relevant bets (perhaps to exploit an incoherence in someone else’s betting prices). Moreover, as Ramsey points out, placing the very bet may alter your state of opinion. Trivially, it does so regarding matters involving the bet itself (e.g., you suddenly increase your probability that you have just placed a bet). Less trivially, placing the bet may change the world, and hence your opinions, in other ways. For example, betting at high stakes on the proposition ‘I will sleep well tonight’ may suddenly turn you into an insomniac! And then the bet may concern an event such that, were it to occur, you would no longer value the pay-off the same way. (During the August 11, 1999 solar eclipse in the UK, a man placed a bet that would have paid a million pounds if the world came to an end.)
These problems stem largely from taking literally the notion of entering into a bet on \(E\), with its corresponding payoffs. The problems may be avoided by identifying your degree of belief in a proposition with the betting price you regard as fair, whether or not you enter into such a bet; it corresponds to the betting odds that you believe confer no advantage or disadvantage to either side of the bet (Howson and Urbach 1993). At your fair price, you should be indifferent between taking either side.[6]
De Finetti speaks of “an arbitrary sum” as the prize of the bet on \(E\). The sum had better be potentially infinitely divisible, or else probability measurements will be precise only up to the level of ‘grain’ of the potential prizes. For example, a sum that can be divided into only 100 parts will leave probability measurements imprecise beyond the second decimal place, conflating probabilities that should be distinguished (e.g., those of a logical contradiction and of ‘a fair coin lands heads 8 times in a row’). More significantly, if utility is not a linear function of such sums, then the size of the prize will make a difference to the putative probability: winning a dollar means more to a pauper than it does to Bill Gates, and this may be reflected in their betting behaviors in ways that have nothing to do with their genuine probability assignments. De Finetti responds to this problem by suggesting that the prizes be kept small; that, however, only creates the opposite problem that agents may be reluctant to bother about trifles, as Ramsey points out.
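The granularity point is simple arithmetic: a fair coin landing heads 8 times in a row has probability \(1/2^8 = 1/256 \approx 0.0039\), so prices quoted in hundredths of the prize cannot distinguish it from a contradiction’s probability of 0. A sketch:

```python
p_contradiction = 0.0
p_eight_heads = (1 / 2) ** 8     # 1/256, about 0.0039

# A prize divisible into only 100 parts resolves prices only to
# hundredths, conflating these two distinct probabilities:
conflated = round(p_contradiction, 2) == round(p_eight_heads, 2)
```

Here `conflated` comes out `True`: both probabilities round to 0.00 at the grain of the prize.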
Better, then, to let the prizes be measured in utilities: after all, utility is infinitely divisible, and utility is a linear function of utility. While we’re at it, we should adopt a more liberal notion of betting. After all, there is a sense in which every decision is a bet, as Ramsey observed.
Utilities (desirabilities) of outcomes, their probabilities, and rational preferences are all intimately linked. The Port Royal Logic (Arnauld, 1662) showed how utilities and probabilities together determine rational preferences; de Finetti’s betting analysis derives probabilities from utilities and rational preferences; von Neumann and Morgenstern (1944) derive utilities from probabilities and rational preferences. And most remarkably, Ramsey (1926) (and later, Savage 1954 and Jeffrey 1966) derives both probabilities and utilities from rational preferences alone.
First, he defines a proposition to be ethically neutral — relative to an agent — if the agent is indifferent between the proposition’s truth and falsehood. The agent doesn’t care about the ethically neutral proposition as such — it may be a means to an end that he might care about, but it has no intrinsic value. (The result of a coin toss is typically like this for most of us.) Now, there is a simple test for determining whether, for a given agent, an ethically neutral proposition \(N\) has probability 1/2. Suppose that the agent prefers \(A\) to \(B\). Then \(N\) has probability 1/2 iff the agent is indifferent between the gambles:
\[\begin{align}& A \text{ if } N, B \text{ if not } \\ & B \text{ if } N, A \text{ if not}. \\ \end{align}\]

Ramsey assumes that it does not matter what the candidates for \(A\) and \(B\) are. We may assign arbitrarily to \(A\) and \(B\) any two real numbers \(u(A)\) and \(u(B)\) such that \(u(A) \gt u(B)\), thought of as the desirabilities of \(A\) and \(B\) respectively. Having done this for the one arbitrarily chosen pair \(A\) and \(B\), the utilities of all other propositions are determined.
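Why the test works can be seen from expected utilities. With illustrative utilities of my own choosing, \(u(A) = 1\) and \(u(B) = 0\) (any values with \(u(A) \gt u(B)\) would do), the two gambles have equal expected utility exactly when \(P(N) = 1/2\):

```python
u_A, u_B = 1.0, 0.0   # illustrative utilities, with u(A) > u(B)

def eu_gamble_1(p):   # "A if N, B if not", where p = P(N)
    return p * u_A + (1 - p) * u_B

def eu_gamble_2(p):   # "B if N, A if not"
    return p * u_B + (1 - p) * u_A

# Setting eu_gamble_1(p) = eu_gamble_2(p) gives
#   p(u_A - u_B) = (1 - p)(u_A - u_B),
# which, since u_A != u_B, forces p = 1/2.
```

So observed indifference between the gambles reveals that the agent’s probability for \(N\) is 1/2, whatever \(A\) and \(B\) are.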
Given various assumptions about the richness of the preference space, and certain ‘consistency assumptions’, he can define a real-valued utility function of the outcomes \(A\), \(B\), etc. — in fact, various such functions will represent the agent’s preferences. He is then able to define equality of differences in utility for any outcomes over which the agent has preferences. It turns out that ratios of utility-differences are invariant — the same whichever representative utility function we choose. This fact allows Ramsey to define degrees of belief as ratios of such differences. For example, suppose the agent is indifferent between \(A\) and the gamble “\(B\) if \(X\), \(C\) otherwise”. Then it follows from considerations of expected utility that her degree of belief in \(X\), \(P(X)\), is given by:
\[ P(X) = \frac{u(A) - u(C)}{u(B) - u(C)} \]

Ramsey shows that degrees of belief so derived obey the probability calculus (with finite additivity).
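A quick numerical check of the formula, with illustrative utilities of my own (chosen with \(u(B) \gt u(A) \gt u(C)\), so that the derived probability lies in \([0, 1]\)):

```python
u_A, u_B, u_C = 5.0, 8.0, 2.0   # illustrative values, u_B > u_A > u_C

# Ramsey's formula: P(X) = (u(A) - u(C)) / (u(B) - u(C)); here (5-2)/(8-2).
P_X = (u_A - u_C) / (u_B - u_C)

# Check: indifference requires the gamble "B if X, C otherwise" to have
# expected utility equal to u(A), and it does:
expected_utility = P_X * u_B + (1 - P_X) * u_C
```

With these numbers \(P(X) = 3/6 = 0.5\), and the gamble’s expected utility is indeed \(u(A) = 5\), as the indifference requires.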
Savage (1954) likewise derives probabilities and utilities from preferences among options that are constrained by certain putative ‘consistency’ axioms. For a given set of such preferences, he generates a class of utility functions, each a positive linear transformation of the other (i.e. of the form \(U_1 = aU_2 + b\), where \(a \gt 0\)), and a unique probability function. Together these are said to ‘represent’ the agent’s preferences, and the result that they do so is called a ‘representation theorem’. Jeffrey (1966) refines Savage’s approach. The result is a theory of decision according to which rational choice maximizes ‘expected utility’, a certain probability-weighted average of utilities. (See Buchak 2016 for more discussion.) Some of the difficulties with the behavioristic betting analysis of degrees of belief can now be resolved by moving to an analysis of degrees of belief that is functionalist in spirit. For example, according to Lewis (1986a, 1994a), an agent’s credences are represented by the probability function belonging to a utility function/probability function pair that best rationalizes her behavioral dispositions, rationality being given a decision-theoretic analysis. Representation theorems (in one form or another) underpin representation theorem arguments that rational agents’ credences obey the probability calculus: their preferences obey the requisite axioms, and thus their credences are representable that way. However, as well as being representable probabilistically, such agents’ credences are representable non-probabilistically; why should the probabilistic representation be privileged? See Zynda (2000), Hájek (2008), and Meacham and Weisberg (2011) for this and other objections to representation theorem arguments.
There is a deep issue that underlies all of these accounts of subjective probability. They all presuppose the existence of necessary connections between desire-like states and belief-like states, rendered explicit in the connections between preferences and probabilities. In response, one might insist that such connections are at best contingent, and indeed can be imagined to be absent. Think of an idealized Zen Buddhist monk, devoid of any preferences, who dispassionately surveys the world before him, forming beliefs but no desires. It could be replied that such an agent is not so easily imagined after all — even if the monk does not value worldly goods, he will still prefer some things to others (e.g., truth to falsehood).
Once desires enter the picture, they may also have unwanted consequences. Again, how does one separate an agent’s enjoyment or disdain for gambling from the value she places on the gamble itself? Ironically, a remark that Ramsey makes in his critique of the betting analysis seems apposite here: “The difficulty is like that of separating two different co-operating forces” (1990, 68). See Eriksson and Hájek (2007) for further criticism of preference-based accounts of credence.
The betting analysis makes subjective probabilities ascertainable to the extent that an agent’s betting dispositions are ascertainable. The derivation of them from preferences makes them ascertainable to the extent that his or her preferences are known. However, it is unclear that an agent’s full set of preferences is ascertainable even to himself or herself. Here a lot of weight may need to be placed on the ‘in principle’ qualification in the ascertainability criterion. The expected utility representation makes it virtually analytic that an agent should be guided by probabilities — after all, the probabilities are her own, and they are fed into the formula for expected utility in order to determine what it is rational for her to do. So the applicability to rational decision criterion is clearly met.
But do they function as a good guide? Here it is useful to distinguish different versions of subjectivism. Orthodox Bayesians in the style of de Finetti recognize no rational constraints on subjective probabilities beyond:
This is a permissive epistemology, licensing doxastic states that we would normally call crazy. Thus, you could assign probability 1 to this sentence ruling the universe, while upholding such extreme subjectivism.
Some subjectivists impose the further rationality requirement of regularity: anything that is possible (in an appropriate sense) gets assigned positive probability. It is advocated by authors such as Jeffreys (1939/1998), Kemeny (1955), Edwards et al. (1963), Shimony (1970), and Stalnaker (1970). It is meant to capture a form of open-mindedness and responsiveness to evidence. But then, perhaps unintuitively, someone who assigns probability 0.999 to this sentence ruling the universe can be judged rational, while someone who assigns it probability 0 is judged irrational. See, e.g., Levi (1978) for further opposition to regularity.
Probabilistic coherence plays much the same role for degrees of belief that consistency plays for ordinary, all-or-nothing beliefs. What an extreme subjectivist, even one who demands regularity, lacks is an analogue of truth, some yardstick for distinguishing the ‘veridical’ probability assignments from the rest (such as the 0.999 one above), some way in which probability assignments are answerable to the world. It seems, then, that the subjectivist needs something more.
And various subjectivists offer more. Having isolated the “logic” of partial belief as conformity to the probability calculus, Ramsey goes on to discuss what makes a degree of belief in a proposition reasonable. After canvassing several possible answers, he settles upon one that focuses on habits of opinion formation — “e.g. the habit of proceeding from the opinion that a toadstool is yellow to the opinion that it is unwholesome” (50). He then asks, for a person with this habit, what probability it would be best for him to have that a given yellow toadstool is unwholesome, and he answers that “it will in general be equal to the proportion of yellow toadstools which are in fact unwholesome” (1990, 91). This resonates with more recent proposals (e.g., van Fraassen 1984, Shimony 1988) for evaluating degrees of belief according to how closely they match the corresponding relative frequencies — in the jargon, how well calibrated they are. Since relative frequencies obey the axioms of probability (up to finite additivity), it is thought that rational credences, which strive to track them, should do so also.[7]
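The notion of calibration can be made concrete computationally. The sketch below is purely illustrative — the function name `calibration_table` and the toy forecast record are my own — but it shows the standard check: group an agent’s credence reports by their value, and compare each value with the relative frequency of truths among the propositions so rated.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (credence, outcome) pairs by credence and compare each credence
    with the relative frequency of truths among the cases it was assigned to."""
    buckets = defaultdict(list)
    for credence, happened in forecasts:
        buckets[credence].append(happened)
    return {c: sum(outcomes) / len(outcomes)
            for c, outcomes in sorted(buckets.items())}

# Hypothetical record: the agent says 0.1 on ten occasions (one comes true)
# and 0.8 on five occasions (four come true).
record = [(0.1, False)] * 9 + [(0.1, True)] + [(0.8, True)] * 4 + [(0.8, False)]
print(calibration_table(record))  # → {0.1: 0.1, 0.8: 0.8}: perfectly calibrated
```

A credence function is well calibrated to the extent that each entry on the right matches the credence on the left; the 0.999 assignment discussed above would show up here as a bucket whose relative frequency of truths is nowhere near 0.999.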
However, rational credences may strive to track various things. For example, we are often guided by the opinions of experts. We consult our doctors on medical matters, our weather forecasters on meteorological matters, and so on. Gaifman (1988) coins the terms “expert assignment” and “expert probability” for a probability assignment that a given agent strives to track: “The mere knowledge of the [expert] assignment will make the agent adopt it as his subjective probability” (193). This idea may be codified as follows:
\[\begin{align}\tag{Expert} &P(A\mid pr(A)=x) = x, \\ &\text{for all } x \text{ where this is defined}. \end{align}\]where ‘\(P\)’ is the agent’s subjective probability function, and ‘\(pr(A)\)’ is the assignment that the agent regards as expert. For example, if you regard the local weather forecaster as an expert on your local weather, and she assigns probability 0.1 to it raining tomorrow, then you may well follow suit:
\[ P(\textit{rain}\mid pr(\textit{rain}) = 0.1) = 0.1 \]More generally, we might speak of an entire probability function as being such a guide for an agent over a specified set of propositions. Van Fraassen (1989, 198) gives us this definition: “If \(P\) is my personal probability function, then \(q\) is an expert function for me concerning family \(F\) of propositions exactly if \(P(A \mid q(A) = x) = x\) for all propositions \(A\) in family \(F\).”
Let us define a universal expert function for a given rational agent as one that would guide all of that agent’s probability assignments in this way: an expert function for the agent concerning all propositions. Van Fraassen (1984, 1995a), following Goldstein (1983), argues that an agent’s future probability functions are universal expert functions for that agent. He enshrines this idea in his Reflection Principle, where \(P\) is the agent’s probability function and \(P_{t}\) is her function at a later time \(t\):
\[\begin{align}&P(A \mid P_t(A) = x) = x, \\ &\text{for all } t, A \text{ and } x \text{ for which this is defined.} \end{align}\]The principle encapsulates a certain demand for ‘diachronic coherence’ imposed by rationality. Van Fraassen defends it with a ‘diachronic’ Dutch Book argument (one that considers bets placed at different times), and by analogizing violations of it to the sort of pragmatic inconsistency that one finds in Moore’s paradox.
We may go still further. There may be universal expert functions for large classes of rational agents, and perhaps all of them. The Principle of Direct Probability regards the relative frequency function as a universal expert function for all rational agents; we have already seen the importance that proponents of calibration place on it. Let \(A\) be an event-type, and let \(\textit{relfreq}(A)\) be the relative frequency of \(A\) (in some suitable reference class). Then for any rational agent with probability function \(P\), we have (cf. Hacking 1965):
\[\begin{align}&P(A\mid \textit{relfreq}(A) = x) = x, \\ &\text{for all } A \text{ and for all } x \text{ where this is defined.} \end{align}\]Lewis (1980) posits a similar expert role for the objective chance function, \(\textit{ch}\), for all rational initial credences in his Principal Principle (here simplified[8]):
\[\begin{align}&C(A\mid \textit{ch}(A) = x) = x, \\ &\text{for all } A \text{ and for all } x \text{ where this is defined.} \end{align}\]‘\(C\)’ denotes the ‘ur’ credence function of an agent at the beginning of enquiry. This is an idealization that ensures that the agent does not have any “inadmissible” evidence that bears on \(A\) without bearing on the chance of \(A\). For example, a rational agent who somehow knows that a particular coin toss lands heads is surely not required to assign
\[ C(\text{heads} \mid \textit{ch}(\text{heads}) = \frac{1}{2}) = \frac{1}{2}. \]Rather, this conditional probability should be 1, since she has information relevant to the outcome ‘heads’ that trumps its chance. The other expert principles surely need to be suitably qualified – otherwise they face analogous counterexamples. Yet strangely, the Principal Principle is the only expert principle about which concerns about inadmissible evidence have been raised in the literature.
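The content of such an expert principle can be checked in a toy model. In the sketch below — the hypotheses, numbers, and function names are all illustrative, not anything from the literature — an agent is unsure whether a coin’s chance of heads is 0.3 or 0.7, and her joint credence over (chance hypothesis, outcome) pairs is built so that the Principal Principle holds by construction. Conditionalizing on a chance hypothesis then recovers exactly that chance.

```python
# Ur-credence over which chance hypothesis is true (illustrative numbers).
prior = {0.3: 0.5, 0.7: 0.5}           # credence that ch(heads) = x

# Joint credence over (chance, outcome), built per the Principal Principle:
# C(heads & ch=x) = C(ch=x) * x.
joint = {}
for ch, c in prior.items():
    joint[(ch, 'heads')] = c * ch
    joint[(ch, 'tails')] = c * (1 - ch)

def cond_heads(ch):
    """C(heads | ch(heads) = ch), by the ratio formula."""
    return joint[(ch, 'heads')] / (joint[(ch, 'heads')] + joint[(ch, 'tails')])

print(cond_heads(0.3))   # → 0.3, up to float rounding
print(cond_heads(0.7))   # → 0.7, up to float rounding
# Unconditionally, her credence in heads is the chance-weighted average:
print(sum(v for (ch, o), v in joint.items() if o == 'heads'))  # → 0.5
```

The other expert principles (Expert, Reflection, Direct Probability) have exactly the same shape, with the forecaster, the future self, or the relative frequency function in place of the chance hypotheses.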
I will say more about relative frequencies and chance shortly.
The ultimate expert, presumably, is the truth function — the function that assigns 1 to all the true propositions and 0 to all the false ones. Knowledge of its values should surely trump knowledge of the values assigned by human experts (including one’s future selves), frequencies, or chances. Note that for any putative expert \(q\),
\[\begin{align}&P(A\mid q(A) = x \,\cap\, A) = 1, \\ &\text{for all } A \text{ and for all } x \text{ where this is defined.} \end{align}\]— the truth of \(A\) overrides anything the expert might say. So all of the proposed expert probabilities above should really be regarded as defeasible. Joyce (1998) portrays the rational agent as estimating truth values, seeking to minimize a measure of distance between them and her probability assignments — that is, to maximize the accuracy of those assignments. Generalizing a theorem of de Finetti’s (1974), he shows that for any measure of distance that satisfies certain intuitive properties, any agent who violates the probability axioms could serve this epistemic goal better by obeying them instead, however the world turns out. In short, non-probabilistic credences are accuracy-dominated by probabilistic credences. This provides a “non-pragmatic” argument for probabilism (in contrast to the Dutch Book and representation theorem arguments) for finite domains. Nielsen (2023) extends a related accuracy argument by Predd et al. (2009), with different conditions on accuracy measures, to arbitrarily large domains.
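Accuracy domination is easy to exhibit in the smallest possible case. The sketch below uses the Brier score (squared distance from the truth-value vector) — one measure of the kind Joyce’s theorem covers, though his result is far more general — and a pair of incoherent credences in \(A\) and its negation; the numbers are my own illustrative choices.

```python
def brier(credences, truths):
    """Squared-distance inaccuracy of a credence vector from a truth-value vector."""
    return sum((c - t) ** 2 for c, t in zip(credences, truths))

# Incoherent credences in A and not-A (they sum to 1.2, violating additivity),
# and a coherent alternative.
incoherent = (0.6, 0.6)
coherent   = (0.5, 0.5)

for world in [(1, 0), (0, 1)]:        # A true; A false
    print(brier(incoherent, world), brier(coherent, world))
# In both worlds the coherent credences are strictly more accurate:
# roughly 0.52 vs 0.5 either way, so (0.6, 0.6) is accuracy-dominated.
```

However the world turns out, the agent with credences (0.6, 0.6) does worse than she would with (0.5, 0.5) — a two-proposition instance of the general dominance result.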
There are some unifying themes in these putative constraints on subjective probability. An agent’s degrees of belief determine her estimates of certain quantities: the values of bets, or the desirabilities of gambles more generally, or the probability assignments of various ‘experts’ — humans, relative frequencies, objective chances, or truth values. The laws of probability are then claimed to be constraints on these estimates: putative necessary conditions for minimizing her ‘losses’ in a broad sense, be they monetary, or measured by distances from the assignments of these experts.
We have been gradually adding more and more constraints on rational credences, putatively demanded by rationality. Recall that Carnap first assumed that there was a unique confirmation function, and then relaxed this assumption to allow a plurality of such functions. We now seem to be heading in the opposite direction: starting with the extremely permissive orthodox Bayesianism, we are steadily reducing the class of rationally permissible credence functions. So far the constraints that we have admitted have not been especially evidence-driven. Objective Bayesians maintain that a rational agent’s credences are largely determined by her evidence.
How large is “largely”? The lines of demarcation are not sharp, and subjective Bayesianism may be regarded as a somewhat indeterminate region on a spectrum of views that morph into objective Bayesianism. At one end lies an extreme form of subjective Bayesianism, according to which rational credences are constrained only by the probability calculus (and updating by conditionalization). At the other end of the spectrum lies an extreme form of objective Bayesianism, according to which rational probabilities are constrained to the point of uniqueness by one’s evidence — we may call this the Uniqueness Thesis. But both objective Bayesians and subjective Bayesians may adopt less extreme positions, and typically do. For example, Jon Williamson (2010) is an objective Bayesian, but not an extreme one. He adds to the probability calculus the constraints of being calibrated with evidence, and otherwise equivocating between basic outcomes, especially appealing to versions of maximum entropy. As such, his view is a descendant of the classical interpretation and its generalization due to Jaynes.
Gamblers, actuaries and scientists have long understood that relative frequencies bear an intimate relationship to probabilities. Frequency interpretations posit the most intimate relationship of all: identity. Thus, we might identify the probability of ‘heads’ on a certain coin with the number of heads in a suitable sequence of tosses of the coin, divided by the total number of tosses. A simple version of frequentism, which we will call finite frequentism, attaches probabilities to events or attributes in a finite reference class in such a straightforward manner:
the probability of an attribute A in a finite reference class B is the relative frequency of actual occurrences of A within B.
Thus, finite frequentism bears certain structural similarities to the classical interpretation, insofar as it gives equal weight to each member of a set of events, simply counting how many of them are ‘favorable’ as a proportion of the total. The crucial difference, however, is that where the classical interpretation counted all the possible outcomes of a given experiment, finite frequentism counts actual outcomes. It is thus congenial to those with empiricist scruples. It was developed by Venn (1876), who in his discussion of the proportion of births of males and females, concludes: “probability is nothing but that proportion” (p. 84, his emphasis).[9] Finite frequentism is often assumed, tacitly or explicitly, in statistics and in the sciences more generally.
Finite frequentism gives an operational definition of probability, and its problems begin there. For example, just as we want to allow that our thermometers could be ill-calibrated, and could thus give misleading measurements of temperature, so we want to allow that our ‘measurements’ of probabilities via frequencies could be misleading, as when a fair coin lands heads 9 out of 10 times. More than that, it seems to be built into the very notion of probability that such misleading results can arise. Indeed, in many cases, misleading results are guaranteed. Starting with a degenerate case: according to the finite frequentist, a coin that is never tossed, and that thus yields no actual outcomes whatsoever, lacks a probability for heads altogether; yet a coin that is never measured does not thereby lack a diameter. Perhaps even more troubling, a coin that is tossed exactly once yields a relative frequency of heads of either 0 or 1, whatever its bias. Or we can imagine a unique radioactive atom whose probabilities of decaying at various times obey a continuous law (e.g. exponential); yet according to finite frequentism, with probability 1 it decays at the exact time that it actually does, for its relative frequency of doing so is 1/1. Famous enough to merit a name of its own, this is the so-called ‘problem of the single case’. In fact, many events are most naturally regarded as not merely unrepeated, but in a strong sense unrepeatable — the 2020 presidential election, the final game of the 2019 NBA play-offs, the Civil War, Kennedy’s assassination, certain events in the very early history of the universe, and so on. Nonetheless, it seems natural to think of non-extreme probabilities attaching to some, and perhaps all, of them.
Worse still, some cosmologists regard it as a genuinely chancy matter whether our universe is open or closed (apparently certain quantum fluctuations could, in principle, tip it one way or the other), yet whatever it is, it is ‘single-case’ in the strongest possible sense.
The problem of the single case is particularly striking, but we really have a sequence of related problems: ‘the problem of the double case’, ‘the problem of the triple case’ … Every coin that is tossed exactly twice can yield only the relative frequencies 0, 1/2 and 1, whatever its bias … According to actual frequentism, it is an analytic truth that every coin that is tossed an odd number of times is biased. A finite reference class of size \(n\), however large \(n\) is, can only produce relative frequencies at a certain level of ‘grain’, namely \(1/n\). Among other things, this rules out irrational-valued probabilities; yet our best physical theories say otherwise. Furthermore, there is a sense in which any of these problems can be transformed into the problem of the single case. Suppose that we toss a coin a thousand times. We can regard this as a single trial of a thousand-tosses-of-the-coin experiment. Yet we do not want to be committed to saying that that experiment yields its actual result with probability 1.
The problem of the single case is that the finite frequentist fails to see intermediate probabilities in various places where others do. There is also the converse problem: the frequentist sees intermediate probabilities in various places where others do not. Our world has myriad different entities, with myriad different attributes. We can group them into still more sets of objects, and then ask with which relative frequencies various attributes occur in these sets. Many such relative frequencies will be intermediate; the finite frequentist automatically identifies them with intermediate probabilities. But it would seem that whether or not they are genuine probabilities, as opposed to mere tallies, depends on the case at hand. Bare ratios of attributes among sets of disparate objects may lack the sort of modal force that one might expect from probabilities. I belong to the reference class consisting of myself, the Eiffel Tower, the southernmost sandcastle on Santa Monica Beach, and Mt Everest. Two of these four objects are less than 7 feet tall, a relative frequency of 1/2; moreover, we could easily extend this class, preserving this relative frequency (or, equally easily, not). Yet it would be odd to say that my probability of being less than 7 feet tall, relative to this reference class, is 1/2, although it is perfectly acceptable (if uninteresting) to say that 1/2 of the objects in the reference class are less than 7 feet tall.
Some frequentists (notably Venn 1876, Reichenbach 1949, and von Mises 1957, among others), partly in response to some of the problems above, have gone on to consider infinite reference classes, identifying probabilities with limiting relative frequencies of events or attributes therein. Thus, we require an infinite sequence of trials in order to define such probabilities. But what if the actual world does not provide an infinite sequence of trials of a given experiment? Indeed, that appears to be the norm, and perhaps even the rule. In that case, we are to identify probability with a hypothetical or counterfactual limiting relative frequency. We are to imagine hypothetical infinite extensions of an actual sequence of trials; probabilities are then what the limiting relative frequencies would be if the sequence were so extended. We might thus call this interpretation hypothetical frequentism:
the probability of an attribute A in a reference class B is the value the limiting relative frequency of occurrences of A within B would be if B were infinite.
Note that at this point we have left empiricism behind. A modal element has been injected into frequentism with this invocation of a counterfactual; moreover, the counterfactual may involve a radical departure from the way things actually are, one that may even require the breaking of laws of nature. (Think what it would take for the coin in my pocket, which has only been tossed once, to be tossed infinitely many times — never wearing out, and never running short of people willing to toss it!) One may wonder, moreover, whether there is always — or ever — a fact of the matter of what such counterfactual relative frequencies are.
Limiting relative frequencies, we have seen, must be relativized to a sequence of trials. Herein lies another difficulty. Consider an infinite sequence of the results of tossing a coin, as it might be H, T, H, H, H, T, H, T, T, … Suppose for definiteness that the corresponding relative frequency sequence for heads, which begins 1/1, 1/2, 2/3, 3/4, 4/5, 4/6, 5/7, 5/8, 5/9, …, converges to 1/2. By suitably reordering these results, we can make the sequence converge to any value in [0, 1] that we like. (If this is not obvious, consider how the relative frequency of even numbers among positive integers, which intuitively ‘should’ converge to 1/2, can instead be made to converge to 1/4 by reordering the integers with the even numbers in every fourth place, as follows: 1, 3, 5, 2, 7, 9, 11, 4, 13, 15, 17, 6, …) To be sure, there may be something natural about the ordering of the tosses as given — for example, it may be their temporal ordering. But there may be more than one natural ordering. Imagine the tosses taking place on a train that shunts backwards and forwards on tracks that are oriented west-east. Then the spatial ordering of the results from west to east could look very different. Why should one ordering be privileged over others?
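The reordering trick with the integers can be verified directly. The short sketch below (the generator name `reordered` is mine) enumerates the positive integers with the next even number placed at every fourth position, exactly as in the parenthetical example, and tracks the running relative frequency of even numbers.

```python
def reordered():
    """Positive integers reordered so that every fourth entry is the next
    even number: 1, 3, 5, 2, 7, 9, 11, 4, 13, 15, 17, 6, ..."""
    odds = iter(range(1, 10**9, 2))
    evens = iter(range(2, 10**9, 2))
    i = 0
    while True:
        i += 1
        yield next(evens) if i % 4 == 0 else next(odds)

seq = reordered()
evens_seen = 0
N = 100_000
for _ in range(N):
    if next(seq) % 2 == 0:
        evens_seen += 1
freq = evens_seen / N
print(freq)  # → 0.25: the relative frequency of evens now converges to 1/4
```

Every integer still appears exactly once; only the ordering has changed, yet the limiting relative frequency of evens has moved from 1/2 to 1/4.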
A well-known objection to any version of frequentism is that relative frequencies must be relativized to a reference class. Consider a probability concerning myself that I care about — say, my probability of living to age 80. I belong to the class of males, the class of non-smokers, the class of philosophy professors who have two vowels in their surname, … Presumably the relative frequency of those who live to age 80 varies across (most of) these reference classes. What, then, is my probability of living to age 80? It seems that there is no single frequentist answer. Instead, there is my probability-qua-male, my probability-qua-non-smoker, my probability-qua-male-non-smoker, and so on. This is an example of the so-called reference class problem for frequentism (although it can be argued that analogues of the problem arise for the other interpretations as well[10]). And as we have seen in the previous paragraph, the problem is only compounded for limiting relative frequencies: probabilities must be relativized not merely to a reference class, but to a sequence within the reference class. We might call this the reference sequence problem.
The beginnings of a solution to this problem would be to restrict our attention to sequences of a certain kind, those with certain desirable properties. For example, there are sequences for which the limiting relative frequency of a given attribute does not exist; Reichenbach thus excludes such sequences. Von Mises (1957) gives us a more thoroughgoing restriction to what he calls collectives — hypothetical infinite sequences of attributes (possible outcomes) of specified experiments that meet certain requirements. Call a place-selection an effectively specifiable method of selecting indices of members of the sequence, such that the selection or not of the index \(i\) depends at most on the first \(i - 1\) attributes. Von Mises imposes these axioms:
Axiom of Convergence: the limiting relative frequency of any attribute exists.

Axiom of Randomness: the limiting relative frequency of each attribute in a collective \(\omega\) is the same in any infinite subsequence of \(\omega\) which is determined by a place selection.
The probability of an attribute \(A\), relative to a collective \(\omega\), is then defined as the limiting relative frequency of \(A\) in \(\omega\). Note that a constant sequence such as H, H, H, …, in which the limiting relative frequency is the same in any infinite subsequence, trivially satisfies the axiom of randomness. This puts some strain on the terminology — offhand, such sequences appear to be as non-random as they come — although to be sure it is desirable that probabilities be assigned even in such sequences. Be that as it may, there is a parallel between the role of the axiom of randomness in von Mises’ theory and the principle of maximum entropy in the classical theory: both attempt to capture a certain notion of disorder.
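A place selection can be made concrete with a finite simulation (collectives themselves are infinite, so this is only suggestive). In the sketch below, all names and parameters are my own: the rule ‘select index \(i\) just in case the outcome at \(i-1\) was heads’ depends only on earlier attributes, so it qualifies as a place selection, and for a pseudorandom sequence the relative frequency of heads in the selected subsequence stays close to that in the whole sequence, as the axiom of randomness demands.

```python
import random

random.seed(0)
n = 100_000
seq = [random.random() < 0.5 for _ in range(n)]   # True = heads

# Place selection: pick index i just in case the previous outcome was heads.
# Whether i is selected depends only on attributes before i, as required.
selected = [seq[i] for i in range(1, n) if seq[i - 1]]

freq_all = sum(seq) / len(seq)
freq_sel = sum(selected) / len(selected)
print(round(freq_all, 3), round(freq_sel, 3))  # both close to 0.5
```

A rule such as ‘select \(i\) just in case the outcome at \(i\) itself is heads’ would of course produce frequency 1 in the subsequence, but it is not a place selection: it peeks at the attribute being selected.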
Collectives are abstract mathematical objects that are not empirically instantiated, but that are nonetheless posited by von Mises to explain the stabilities of relative frequencies in the behavior of actual sequences of outcomes of a repeatable random experiment. Church (1940) renders precise the notion of a place selection as a recursive function. Nevertheless, the reference sequence problem remains: probabilities must always be relativized to a collective, and for a given attribute such as ‘heads’ there are infinitely many. Von Mises embraces this consequence, insisting that the notion of probability only makes sense relative to a collective. In particular, he regards single-case probabilities as nonsense: “We can say nothing about the probability of death of an individual even if we know his condition of life and health in detail. The phrase ‘probability of death’, when it refers to a single person, has no meaning at all for us” (11). Some critics believe that rather than solving the problem of the single case, this merely ignores it. And note that von Mises drastically understates the commitments of his theory: by his lights, the phrase ‘probability of death’ also has no meaning at all when it refers to a million people, or a billion, or any finite number — after all, collectives are infinite. More generally, it seems that von Mises’ theory has the unwelcome consequence that probability statements never have meaning in the real world, for apparently all sequences of attributes are finite.
Let us see how the frequentist interpretations fare according to our criteria of adequacy. Finite relative frequencies of course satisfy finite additivity. In a finite reference class, only finitely many events can occur, so only finitely many events can have positive relative frequency. In that case, countable additivity is satisfied somewhat trivially: all but finitely many terms in the infinite sum will be 0. Limiting relative frequencies violate countable additivity (de Finetti 1972, §5.22). Indeed, the domain of definition of limiting relative frequency is not even a field, let alone a sigma-field (de Finetti 1972, §5.8). So such relative frequencies do not provide an admissible interpretation of Kolmogorov’s axioms. Finite frequentism has no trouble meeting the ascertainability criterion, as finite relative frequencies are in principle easily determined. The same cannot be said of limiting relative frequencies. On the contrary, any finite sequence of trials (which, after all, is all we ever see) puts literally no constraint on the limit of an infinite sequence; still less does an actual finite sequence put any constraint on the limit of an infinite hypothetical sequence, however fast and loose we play with the notion of ‘in principle’ in the ascertainability criterion.
It might seem that the frequentist interpretations resoundingly meet the applicability to frequencies criterion. But finite frequentism meets it all too well, while hypothetical frequentism meets it in the wrong way. If anything, finite frequentism makes the connection between probabilities and frequencies too tight, as we have already observed. A fair coin that is tossed a million times is very unlikely to land heads exactly half the time; one that is tossed a million and one times is even less likely to do so! Facts about finite relative frequencies should serve as evidence, but not conclusive evidence, for the relevant probability assignments. Hypothetical frequentism fails to connect probabilities with finite frequencies. It connects them with limiting relative frequencies, of course, but again too tightly: for even in infinite sequences, the two can come apart. (A fair coin could land heads forever, even though it is highly unlikely to do so.) To be sure, science has much interest in finite frequencies, and indeed working with them is much of the business of statistics. Whether it has any interest in highly idealized, hypothetical extensions of actual sequences, and relative frequencies therein, is another matter. The criteria of applicability to rational beliefs and to rational decisions go much the same way. Such beliefs and decisions are guided by finite frequency information, but they are not guided by information about limits of hypothetical frequencies, since one never has such information. For much more extensive critiques of finite frequentism and hypothetical frequentism, see Hájek (1997) and Hájek (2009) respectively, and La Caze (2016).
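The claim that a fair coin is unlikely to land heads exactly half the time, and ever less likely as the number of tosses grows, is straightforward to compute. The function name below is mine; the probability of exactly \(n/2\) heads in \(n\) fair tosses is \(\binom{n}{n/2}/2^n\), which is zero outright when \(n\) is odd.

```python
from math import comb

def p_exactly_half(n):
    """Probability that a fair coin lands heads exactly n/2 times in n tosses
    (zero when n is odd, since n/2 is then not an integer)."""
    return comb(n, n // 2) / 2 ** n if n % 2 == 0 else 0.0

for n in [10, 100, 1000, 10000]:
    print(n, p_exactly_half(n))
# The probability shrinks roughly like 1/sqrt(n): about 0.246, 0.0796,
# 0.0252, 0.00798 — the fair coin almost never exhibits frequency exactly 1/2.
```

So the single most probable relative frequency is itself improbable, and for odd \(n\) (the ‘million and one’ case) a frequency of exactly 1/2 is impossible, which is why finite relative frequencies can only be evidence for, not constitutive of, the underlying probability.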
Like the frequency interpretations, propensity interpretations regard probabilities as objective properties of entities in the real world. Probability is thought of as a physical propensity, or disposition, or tendency of a given type of physical situation to yield an outcome of a certain kind, or to yield a long-run relative frequency of such an outcome.
While Popper (1957) is often credited as being the pioneer of propensity interpretations, we already find the key idea in the writings of Peirce (1910, 79–80): “I am, then, to define the meaning of the statement that the probability, that if a die be thrown from a dice box it will turn up a number divisible by three, is one-third. The statement means that the die has a certain ‘would-be’; and to say that the die has a ‘would-be’ is to say that it has a property, quite analogous to any habit that a man might have.” A man’s habit is a paradigmatic example of a disposition; according to Peirce, the die’s probability of landing 3 or 6 is an analogous disposition. We might think of various habits coming in different degrees, measuring their various strengths. Analogously, the die’s propensities to land various ways measure the strength of its dispositions to do so.
Peirce continues: “Now in order that the full effect of the die’s ‘would-be’ may find expression, it is necessary that the die should undergo an endless series of throws from the dice box”, and he imagines the relative frequency of the event-type in question oscillating from one side of 1/3 to the other. This again anticipates Popper’s view. But an important difference is that Peirce regards the propensity as a property of the die itself, whereas Popper attributes the propensity to the entire chance set-up of throwing the die.
Popper (1957) is motivated by the desire to make sense of single-case probability attributions that one finds in quantum mechanics — for example, ‘the probability that this radium atom decays in 1600 years is 1/2’. He develops the theory further in (1959a). For him, a probability \(p\) of an outcome of a certain type is a propensity of a repeatable experiment to produce outcomes of that type with limiting relative frequency \(p\). For instance, when we say that a coin has probability 1/2 of landing heads when tossed, we mean that we have a repeatable experimental set-up — the tossing set-up — that has a propensity to produce a sequence of outcomes in which the limiting relative frequency of heads is 1/2. With its heavy reliance on limiting relative frequency, this position risks collapsing into von Mises-style frequentism, according to some critics. Giere (1973), on the other hand, explicitly allows single-case propensities, with no mention of frequencies: probability is just a propensity of a repeatable experimental set-up to produce sequences of outcomes. This, however, creates the opposite problem to Popper’s: how, then, do we get the desired connection between probabilities and frequencies?
It is thus useful to follow Gillies (2000a, 2016) in distinguishing long-run propensity theories and single-case propensity theories:
A long-run propensity theory is one in which propensities are associated with repeatable conditions, and are regarded as propensities to produce in a long series of repetitions of these conditions frequencies which are approximately equal to the probabilities. A single-case propensity theory is one in which propensities are regarded as propensities to produce a particular result on a specific occasion (2000a, 822).
Hacking (1965) and Gillies offer long-run (though not infinitely long-run) propensity theories. Fetzer (1982, 1983) and Miller (1994) offer single-case propensity theories. So does Popper in a later work (1990), in which he regards propensities as “properties of the whole physical situation and sometimes of the particular way in which a situation changes” (17). Note that ‘propensities’ are categorically different things depending on which sort of theory we are considering. According to the long-run theories, propensities are tendencies to produce relative frequencies with particular values, but the propensities are not measured by the probability values themselves; according to the single-case theories, the propensities are measured by the probability values. According to Popper’s earlier view, for example, a fair die has a propensity — an extremely strong tendency — to land ‘3’ with long-run relative frequency 1/6. The small value of 1/6 does not measure this tendency. According to Giere, on the other hand, the die has a weak tendency to land ‘3’. The value of 1/6 does measure this tendency.
It seems that those theories that tie propensities to frequencies do not provide an admissible interpretation of the (full) probability calculus, for the same reasons that relative frequencies do not. It is prima facie unclear whether single-case propensity theories obey the probability calculus or not. To be sure, one can stipulate that they do so, perhaps using that stipulation as part of the implicit definition of propensities. Still, it remains to be shown that there really are such things — stipulating what a witch is does not suffice to show that witches exist. Indeed, to claim, as Popper does, that an experimental arrangement has a tendency to produce a given limiting relative frequency of a particular outcome, presupposes a kind of stability or uniformity in the workings of that arrangement (for the limit would not exist in a suitably unstable arrangement). But this is the sort of ‘uniformity of nature’ presupposition that Hume argued could not be known either a priori or empirically. Now, appeals can be made to limit theorems — so-called ‘laws of large numbers’ — whose content is roughly that under suitable conditions, such limiting relative frequencies almost certainly exist, and equal the single-case propensities. Still, these theorems make assumptions (e.g., that the trials are independent and identically distributed) whose truth again cannot be known, and must merely be postulated.
Part of the problem here, say critics, is that we do not know enough about what propensities are to adjudicate these issues. There is some property of this coin-tossing arrangement such that this coin would land heads with a certain long-run frequency, say. But as Hitchcock (2002) points out, “calling this property a ‘propensity’ of a certain strength does little to indicate just what this property is.” Said another way, propensity accounts are accused of giving empty accounts of probability, à la Molière’s ‘dormitive virtue’ (Sober 2000, 64). Similarly, Gillies objects to single-case propensities on the grounds that statements about them are untestable, and that they are “metaphysical rather than scientific” (825). Some might level the same charge even against long-run propensities, which are supposedly distinct from the testable relative frequencies.
This suggests that the propensity account has difficulty meeting the applicability to science criterion. Some propensity theorists (e.g., Giere) liken propensities to physical magnitudes such as electrical charge that are the province of science. But Hitchcock observes that the analogy is misleading. We can only determine the general properties of charge — that it comes in two varieties, that like charges repel, and so on — by empirical investigation. What investigation, however, could tell us whether or not propensities are non-negative, normalized, and additive? (See also Eagle 2004.)
More promising, perhaps, is the idea that propensities are to play certain theoretical roles, and that these place constraints on the way they must behave, and hence what they could be (in the style of the Ramsey/Lewis/‘Canberra plan’ approach to theoretical terms — see Lewis 1970 or Jackson 2000). The trouble here is that these roles may pull in opposite directions, overconstraining the problem. The first role, according to some, constrains them to obey the probability calculus (with finite additivity); the second role, according to others, constrains them to violate it.
On the one hand, propensities are said to constrain the degrees of belief, or credences, of a rational agent. Recall the ‘applicability to rational beliefs’ criterion: an interpretation should clarify the role that probabilities play in constraining the credences of rational agents. One such putative role for propensities is codified by Lewis’s ‘Principal Principle’. (See section 3.3.) The Principal Principle underpins an argument (Lewis 1980) that whatever they are, propensities must obey the usual probability calculus (with finite additivity). After all, it is argued, rational credences, which are guided by them, do.
On the other hand, Humphreys (1985) gives an influential argument that propensities do not obey Kolmogorov’s probability calculus. The idea is that the probability calculus implies Bayes’ theorem, which allows us to reverse a conditional probability:
\[ P(A\mid B) = \frac{P(B\mid A) \cdot P(A)}{P(B)} \]

Yet propensities seem to be measures of ‘causal tendencies’, and much as the causal relation is asymmetric, so these propensities supposedly do not reverse. Suppose that we have a test for an illness that occasionally gives false positives and false negatives. A given sick patient may have a (non-trivial) propensity to give a positive test result, but it apparently makes no sense to say that a given positive test result has a (non-trivial) propensity to have come from a sick patient. Thus, we have an argument that whatever they are, propensities must not obey the usual probability calculus. ‘Humphreys’ paradox’, as it is known, is really an argument against any formal account of propensities that has as a theorem:
however one understands these conditional probabilities. The argument has prompted Fetzer and Nute (in Fetzer 1981) to offer a “probabilistic causal calculus” that looks quite different from Kolmogorov’s calculus.[11] But one could respond more conservatively, as Lyon (2014) points out. For example, Rényi’s axiomatization of primitive conditional probabilities does not have (∗) as a theorem, and thus propensities may conform to it despite Humphreys’ argument. Nonetheless, Lyon offers “a more general problem for the propensity interpretation. There are all sorts of pairs of events that have no propensity relations between them, and all three axiom systems—Kolmogorov’s, Popper’s, and Rényi’s—will sometimes force there to be conditional probabilities between them. This is not an argument that there is no alternative axiom system that propensity theorists can adopt, but it is an argument that the three main contenders are not viable” (124).
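The forward and reverse conditional probabilities in the illness-test example are easy to compute explicitly. The following is a minimal sketch in Python; all of the numbers (prevalence, sensitivity, false-positive rate) are assumptions made purely for illustration, not taken from the text. It shows how Bayes’ theorem mechanically delivers the reversed probability that, on Humphreys’ view, resists a propensity reading:

```python
# Reversing a conditional probability via Bayes' theorem. All numbers
# below are illustrative assumptions.

def bayes_reverse(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# A = the patient is sick, B = the test comes back positive.
p_sick = 0.01             # assumed prevalence
p_pos_given_sick = 0.95   # the 'forward', propensity-like probability
p_pos_given_well = 0.05   # assumed false-positive rate

# Total probability of a positive result:
p_pos = p_pos_given_sick * p_sick + p_pos_given_well * (1 - p_sick)

# The 'reversed' probability that, per Humphreys, lacks a causal-tendency reading:
p_sick_given_pos = bayes_reverse(p_pos_given_sick, p_sick, p_pos)
print(round(p_sick_given_pos, 3))  # 0.161
```

Whatever one makes of the metaphysics, the calculus itself reverses the conditional probability without complaint; Humphreys’ point is that the number so obtained is hard to interpret as a causal tendency of the test result.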
Or perhaps all this shows that the notion of ‘propensity’ bifurcates: on the one hand, there are propensities that bear an intimate connection to relative frequencies and rational credences, and that obey the usual probability calculus (with finite additivity); on the other hand, there are causal propensities that behave rather differently. In that case, there would be still more interpretations of probability than have previously been recognized.
Traditionally, philosophers of probability have recognized five leading interpretations of probability—classical, logical, subjectivist, frequentist, and propensity. But recently, so-called best-system interpretations of chance have become increasingly popular and important. While they bear some similarities to frequentist accounts, they avoid some of frequentism’s major failings; and while they are sometimes assimilated to propensity accounts, they are really quite distinct. So they deserve separate treatment.
The best-system approach was pioneered by Lewis (1994b). His analysis of chance is based on his account of laws of nature (1973), which in turn refines an account due to Ramsey (1928/1990). According to Lewis, the laws of nature are the theorems of the best systematization of the universe—the true theory that best combines the theoretical virtues of simplicity and strength. These virtues trade off. It is easy for a theory to be simple but not strong, by saying very little; it is easy for a theory to be strong but not simple, by conjoining lots of disparate facts. The best theory balances simplicity and strength optimally—in short, it is the most economical true theory.
So far, there is no mention of chances. Now, we allow probabilistic theories to enter the competition. We are not yet in a position to speak of such theories as being true. Instead, let us introduce another theoretical virtue: fit. The more probable the actual history of the universe is by the lights of the theory, the better it fits that history. Now the theories compete according to how well they combine simplicity, strength, and fit. The theorems of the winning theory are the laws of nature. Some of these laws may be probabilistic. The chances are the probabilities that are determined by these probabilistic laws.
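The notion of fit can be illustrated with a toy ‘history’. In this sketch, the eight-toss sequence and the candidate chance values are assumptions chosen purely for illustration (Lewis’s official notion applies to the entire history of the universe): a theory’s fit is just the probability it assigns to the actual sequence of outcomes.

```python
# 'Fit' as the probability a theory assigns to the actual history.
# Toy history: eight coin tosses (an illustrative assumption).

from math import prod

history = "HHTHHTHH"  # the actual outcomes: 6 heads, 2 tails

def fit(p_heads, history):
    """Probability of the history under i.i.d. tosses with chance p_heads."""
    return prod(p_heads if outcome == "H" else 1 - p_heads
                for outcome in history)

# Two rival probabilistic theories, differing only in the chance of heads:
print(fit(0.5, history))   # 0.00390625
print(fit(0.75, history))  # ~0.0111: the 0.75 theory fits this history better
```

Of course, fit is only one virtue among three: a theory could buy still more fit with an ad hoc, toss-by-toss probability assignment, at a ruinous cost in simplicity.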
According to Lewis (1986b), intermediate chances are incompatible with determinism. Loewer (2004) agrees that intermediate propensities are incompatible with determinism, understanding those to be essentially dynamical: “they specify the degree to which one state has a tendency to cause another” (15). But he argues that chances are best understood along Lewisian best-system lines, and that there is no reason to limit them to dynamical chances. In particular, best-system chances may also attach to initial conditions: adding to the dynamical laws a probability assignment, or distribution, over initial conditions may provide a substantial gain in strength with relatively little cost in simplicity. Science furnishes important examples of deterministic theories with such initial-condition probabilities. Adding the so-called micro-canonical distribution to Newton’s laws (and the assumption that the distant past had low entropy) yields all of statistical mechanics; adding the so-called quantum equilibrium distribution to Bohm’s dynamical laws yields standard quantum mechanics. Indeed, this contact with actual science is one of the selling points of best-system analyses. See Schwarz (2016) for further selling points.
At first blush, best-system analyses seem to score well on our criteria of adequacy. They are admissible by definition: chances are determined by probabilistic laws (rather than by those expressed by some other formalism). One could in principle ascertain values of probabilities, since they supervene on what actually happens in the universe (though ‘in principle’ bears a heavy burden). Applicability to frequencies is secured through the role that ‘fit’ plays. Schwarz (2014) offers a proof of the Principal Principle, which could be taken to undergird the best-system analyses’ applicability to rational beliefs and rational decisions. And we have just mentioned the interpretation’s applicability to science.
This approach solves, or at least eases, some of frequentism’s problems. Progress can be made on the problem of the single case. The chances of a rare atom decaying in various time intervals may be determined by a more pervasive functional law, in which decay chances are given for a far wider range of atoms by plugging in a range of settings of some other magnitude (e.g., atomic number). And simplicity may militate in favour of this functional law being continuous, so even irrational-valued probabilities may be assigned. Moreover, bare ratios of attributes among sets of disparate objects will not qualify as chances if they are not pervasive enough, for then a theory that assigns them probabilities will lose too much simplicity without sufficient gain in strength.
However, some other problems for frequentism remain, and some new ones emerge, beginning with more basic problems for the Lewisian account of lawhood itself. Some of them are partly a matter of Lewis’s specific formulation. Critics (e.g. van Fraassen 1989) question the rather nebulous notion of “balancing” simplicity and strength, which are themselves somewhat sketchy. But arguably some technical story (e.g. information-theoretic) could be offered to precisify them. Lewis himself worries that the exchange rate for such balancing may depend partly on our psychology, in which case there is the threat that the laws themselves depend on our psychology, an unpalatable idealism about them. But he maintains that this threat is not serious as long as “nature is kind”, and one theory is so robustly the front-runner that it remains so under any reasonable standards for balancing. And again, perhaps technical tools can offer some objectivity here. (See section 4 for a gesture at such tools.)
More telling is the concern that simplicity is language-relative, and indeed that any theory can be given the simplest specification possible: simply abbreviate it as \(T\)! Lewis replies that a theory’s simplicity must be judged according to its specification in a canonical language, in which all of the predicates correspond to natural properties. Thus, ‘green’ may well be eligible, but ‘grue’ surely is not. (See Goodman 1955.) Our abbreviation, then, has to be unpacked in terms of such a language, in which its true complexity will be revealed. But this now involves a substantial metaphysical commitment to a distinction between natural and unnatural properties, one that various empiricists (e.g. van Fraassen 1989) find objectionable.
Further problems arise with the refinement to handle probabilistic laws. Again, some of them may be due to Lewis’s particular formulation. Elga (2004) observes that Lewis’s notion of fit is problematic in various infinite universes—think of an infinite sequence of tosses of a coin. Offhand, it seems that the particular infinite sequence that is actualized will be assigned probability zero by any plausible candidate theory that regards the probability of heads as intermediate and the trials as independent. Elga argues, moreover, that there are technical difficulties with addressing this problem with infinitesimal probabilities. However, perhaps we merely need a different understanding of ‘fit’—perhaps understood as ‘typicality’ (Elga), or perhaps one closer to that employed by statisticians with ‘chi-squared’ tests of goodness of fit (Schwarz 2014).
Hoefer (2007) modifies Lewis’s best-system account in light of some of these problems. Hoefer understands “best” as “best for us”, covering regularities that are of interest to us, using the language both of science and of daily life, without any special privilege bestowed upon natural properties. Moreover, the “best system” is now one of chances directly, rather than of laws. Thus, there may be chances associated with the punctuality of trains, for example, without any presumption that there are any associated laws. Hoefer follows Elga in understanding ‘fit’ as ‘typicality’. Strength is a matter of the size of the overall domain of the best system’s probability functions. Simplicity is to be understood in terms of elegant unification, and user-friendliness to beings like us. As a result, Hoefer embraces the agent-centric nature of chances in his sense, regarding as essential the credence-guiding role for them that is captured by the Principal Principle. This is how his account meets the ‘applicability to rational beliefs’ criterion.
However, some other problems for Lewis’s account may run deeper, threatening best-system analyses more generally, and symptomatic of the ghost of frequentism that still hovers behind such analyses. One problem for frequentism that we saw strikes at the heart of any attempt to reduce chances to properties of patterns of outcomes. Such outcomes may be highly misleading regarding the true chances, because of their probabilistic nature. This is most vivid for events that are single-case by any reasonable typing. Whether our universe turns out to be open or closed, plausibly that outcome is compatible with any underlying intermediate chance. The point generalizes, however pervasive the probabilistic pattern might be. Plausibly, a coin’s landing 9 heads out of 10 tosses is compatible with any underlying intermediate chance for heads; and so on. The pattern of outcomes that is instantiated may be a poor guide to the true chance. (See Hájek 2009 for further arguments against frequentism that carry over to best-system accounts.)
Another way of putting the concern is that best-system accounts mistake an idealized epistemology of chance for its metaphysics (though see Lewis’s insistence that this is not the case, in his 1994). Such accounts single out three theoretical virtues—and one may wonder why just those three—and reify the probabilities of a theory that displays the virtues to the highest degree. But a probabilistic world may be recalcitrant to even the best theorizing: nature may be unkind.
It should be clear from the foregoing that there is still much work to be done regarding the interpretations of probability. Each interpretation that we have canvassed seems to capture some crucial insight into a concept of it, yet falls short of doing complete justice to this concept. Perhaps the full story about probability is something of a patchwork, with partially overlapping pieces and principles about how they ought to relate. In that sense, the above interpretations might be regarded as complementary, although to be sure each may need some further refinement. My bet, for what it is worth, is that we will retain the distinct notions of physical, logical/evidential, and subjective probability, with a rich tapestry of connections between them.
There are further signs of the rehabilitation of classical and logical probability, and in particular the principle of indifference and the principle of maximum entropy, by authors such as Paris and Vencovská (1997), Maher (2000, 2001), Bartha and Johns (2001), Novack (2010), White (2010), and Pettigrew (2016). However, Rinard (2014) argues that the principle of indifference leads to incoherence even when imprecise probabilities are allowed. Eva (2019) resurrects the principle as a constraint on comparative probabilities of the form ‘I am more confident in \(p\) than in \(q\)’ or ‘I am equally confident in \(p\) and \(q\)’. This, in turn, showcases another recent trend: an increased interest in comparative probabilities.
Relevant here may also be advances in information theory and complexity theory. Information theory uses probabilities to define the information in a particular event, the degree of uncertainty in a random variable, and the mutual information between random variables (Shannon 1948, Shannon & Weaver 1949). This theory has been developed extensively to give accounts of complexity, optimal data compression and encoding (Kolmogorov 1965, Li and Vitanyi 1997, Cover and Thomas 2006; see the entry on information for more details). It is applied across the sciences, from its natural home in computer science and communication theory, to physics and biology. Interpreting information in these areas goes hand-in-hand with interpreting the underlying probabilities: each concept of probability has a corresponding concept of information. For example, Scarantino (2015) offers an account of ‘natural information’ in biology that is compatible with either a logical interpretation of probability or an objective Bayesian interpretation, while Kraemer (2015) offers one that rests on a finite frequency interpretation.
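Shannon’s basic quantities are simple functions of probabilities: the information (‘surprisal’) of an event with probability p is -log2(p), and the entropy of a random variable is its expected surprisal. A minimal sketch, with the example distributions assumed purely for illustration:

```python
# Shannon's basic measures, as functions of probabilities.
# The example distributions are illustrative assumptions.

from math import log2

def surprisal(p):
    """Information, in bits, of an event with probability p."""
    return -log2(p)

def entropy(dist):
    """Entropy H = sum of p * surprisal(p): expected information."""
    return sum(p * surprisal(p) for p in dist if p > 0)

print(surprisal(0.5))       # 1.0 -- a fair coin flip carries one bit
print(entropy([0.5, 0.5]))  # 1.0 -- maximal uncertainty for two outcomes
print(entropy([0.9, 0.1]))  # ~0.47 -- a biased coin is less uncertain
```

Whatever interpretation one gives the probabilities fed into these formulas is inherited by the resulting information measures, which is the sense in which each concept of probability has a corresponding concept of information.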
Information theory has also proved to be fruitful in the study of randomness (Kolmogorov 1965, Martin-Löf 1966), which obviously is intimately related to the notion of probability – see Eagle (2016), and the entry on chance versus randomness. Refinements of our understanding of randomness, in turn, should have a bearing on the frequency interpretations (recall von Mises’ appeal to randomness in his definition of a ‘collective’), and on propensity accounts (especially those that make explicit ties to frequencies). Given the apparent connection between propensities and causation adumbrated in Section 3.5, powerful causal modelling methods should also prove fruitful here. More generally, the theory of graphical causal models (also known as Bayesian networks) uses directed acyclic graphs to represent causal relationships in a system. (See Spirtes, Glymour and Scheines 1993, Pearl 2000, Woodward 2003.) The graphs and the probabilities of the system’s variables harmonize in accordance with the causal Markov condition, a sophisticated version of Reichenbach’s slogan “no correlation without causation”. (See the entry on causal models for more details.) Thus again, each understanding of probability has a counterpart understanding of causal networks.
Regarding best-system interpretations of chance, I noted that it is somewhat unclear exactly what ‘simplicity’ and ‘strength’ consist in, and exactly how they are to be balanced. Perhaps insights from statistics and computer science may be helpful here: approaches to statistical model selection, and in particular the ‘curve-fitting’ problem, that attempt to characterize simplicity, and its trade-off with strength — e.g., the Akaike Information Criterion (see Forster and Sober 1994), the Bayesian Information Criterion (see Kieseppä 2001), Minimum Description Length theory (see Rissanen 1999), and Minimum Message Length theory (see Wallace and Dowe 1999).
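The flavour of such trade-offs is easy to convey. The Akaike Information Criterion scores a model as AIC = 2k - 2 ln(L), where k is the number of fitted parameters (a penalty for complexity) and L the maximized likelihood (a reward for fit); lower scores are better. A minimal sketch, with the two models’ numbers assumed purely for illustration:

```python
# The Akaike Information Criterion: AIC = 2k - 2 ln(L), penalizing
# parameter count and rewarding fit. The models' figures below are
# illustrative assumptions.

def aic(k, log_likelihood):
    """Akaike Information Criterion; lower is better."""
    return 2 * k - 2 * log_likelihood

# A simple curve fits slightly worse; a complex curve fits slightly
# better but spends many more parameters doing so:
aic_simple = aic(k=2, log_likelihood=-100.0)   # 204.0
aic_complex = aic(k=10, log_likelihood=-98.0)  # 216.0
print(aic_simple < aic_complex)  # True: simplicity outweighs the extra fit
```

The fixed exchange rate of two log-likelihood units per parameter is one precise, if contestable, answer to the ‘balancing’ question that critics press against Lewis.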
Physical probabilities are becoming even more crucial to scientific inquiry. Probabilities are not just used to characterize the support given to scientific theories by evidence; they appear essentially in the content of the theories themselves. This has led to fertile philosophical ground interpreting the probabilities in such theories. For example, quantum mechanics has physical probabilities at the fundamental level. The interpretation of these probabilities is related to the interpretation of the theory itself (see the entry on philosophical issues in quantum theory). Statistical mechanics and evolutionary theory have non-fundamental objective probabilities. Are they genuine chances? How can we account for them? See Strevens (2003) and Lyon (2011) for discussion. However, Schwarz (2018) argues that these probabilities can and should be left uninterpreted. Loewer (2012, 2020) proposes that the Lewisian best system of our world is given by “the Mentaculus”—a complete probability map of the universe. This is Albert’s (2000) package of:
Another ongoing debate regarding physical probabilities concerns whether chance is compatible with determinism—see, e.g., Schaffer (2007), who is an incompatibilist, and Ismael (2009) and Loewer (2020), who are compatibilists. Handfield and Wilson (2014) argue that chance ascriptions are context-sensitive, varying according to the relevant “evidence base”. This captures the thought that in a deterministic universe, there is some sense in which all chances are extreme, while doing justice to other compatibilist usages of chance. See Frigg (2016) for an overview of this debate. Relatedly, an important approach to objective probability that has gained popularity involves the so-called method of arbitrary functions. Originating with Poincaré (1896), it is a mathematical technique for determining probability functions for certain systems with chaotic dynamical laws mapping input conditions to outcomes. Roughly speaking, the probabilities for the outcomes are relatively insensitive to the probabilities over the various initial conditions — think of how the probabilities of outcomes of spins of a roulette wheel apparently do not depend on how the wheel is spun, sometimes vigorously, sometimes feebly. See Strevens (2003, 2013) for detailed treatments of this approach.
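The insensitivity that the method of arbitrary functions exploits can be illustrated numerically. In this sketch, the ‘wheel’ dynamics (a fast winding map on spin speeds) and the two input densities over speeds are assumptions chosen purely for illustration; the point is that the winding washes out the difference between very different input distributions:

```python
# Method of arbitrary functions, in miniature: a 'roulette' outcome is
# red when the fractional part of (spin speed * 50) falls below 0.5,
# a toy stand-in for chaotic wheel dynamics. Two very different input
# densities over speeds (both illustrative assumptions) yield nearly
# the same outcome probability.

def outcome_red(speed):
    """Red iff the wound-up angle lands in the first half of its cycle."""
    return (speed * 50) % 1 < 0.5

def p_red(density, lo, hi, n=200_000):
    """Probability of red under an input density on [lo, hi] (grid sum)."""
    total = weight_sum = 0.0
    for i in range(n):
        s = lo + (hi - lo) * (i + 0.5) / n
        w = density(s)
        weight_sum += w
        total += w * outcome_red(s)
    return total / weight_sum

p_vigorous = p_red(lambda s: s - 10, 10, 20)  # ramped density, strong spins
p_feeble = p_red(lambda s: 1.0, 1, 2)         # uniform density, weak spins
print(abs(p_vigorous - p_feeble) < 0.01)      # True: both are close to 0.5
```

Because the dynamics cycles many times across any region where the input density varies appreciably, almost any reasonably smooth (‘arbitrary’) density delivers outcome probabilities near one half, which is the sense in which the outcomes do not depend on how the wheel is spun.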
The subjectivist theory of probability is also thriving—indeed, it has been the biggest growth area among all the interpretations, thanks to the burgeoning of formal epistemology in the last couple of decades. For each of the topics that I will briefly mention, I can only cite a few representative works.
Especially since Joyce (1998), accuracy arguments for various Bayesian norms have been influential. They include arguments for conditionalization (Greaves and Wallace 2006, Briggs and Pettigrew 2020), the Reflection Principle (Easwaran 2013), and the Principal Principle (Pettigrew 2016). However, Mahtani (2021) argues that the mathematical theorems that are invoked to support the accuracy approach do not justify probabilism. These lines of research continue to develop. And these norms themselves have received further attention—e.g. Schoenfield (2017) on conditionalization, and Hall (1994, 2004), Ismael (2008), and Briggs (2009) on the Principal Principle.
Yet for some problems, Bayesian modelling seems not to be sufficiently nuanced. A recently flourishing area has concerned modelling an agent’s self-locating credences, concerning who she is, or what time it is. The contents of such credences are usually taken to be richer than just propositions (thought of as sets of possible worlds); rather, they are finer-grained propositions (sets of centered worlds — see Lewis 1979). This in turn has ramifications for updating rules, in particular calling conditionalization into question—see Meacham (2008). The so-called Sleeping Beauty problem (Elga 2000) has generated much discussion in this regard. See Titelbaum (2012) for a comprehensive study and approach to such problems, Titelbaum (2016), and the entry on self-locating beliefs for a survey of the literature. These continue to be fertile areas of research.
On the other hand, there is another sense in which Bayesian modelling has been regarded as too nuanced. It seems to be psychologically unrealistic to portray humans (rather than ideally rational agents) as having degrees of belief that are infinitely precise real numbers. Thus, there have been various attempts to ‘humanize’ Bayesianism, and this line of research is gaining momentum. For example, there has been a flourishing study of imprecise probability and imprecise decision theory, in which credences need not be precise numbers—for example, they could be sets of numbers, or intervals. See http://www.sipta.org/ for up-to-date research in this area. This resonates with recent work on whether imprecise probabilities are rationally required—Hájek and Smithson (2012) and Isaacs, Hájek, and Hawthorne (2022) on the pro side, Schoenfield (2017) on the con side. The debate continues.
Nor is it plausible that humans obey all the theorems of the probability calculus—we are incoherent in all sorts of ways. The last couple of decades have also seen research on degrees of incoherence—measuring the extent of departures from obedience to the probability calculus—including Zynda (1996), Schervish, Seidenfeld and Kadane (2003), De Bona and Staffel (2017, 2018), and Staffel (2019). Lin (2013) sees traditional epistemology’s notion of belief as appropriate for humans who fall short of the Bayesian ideal, but who nevertheless may obey various doxastic norms that can be given Bayesian endorsement. He models everyday practical reasoning, with qualitative beliefs and desires, providing a qualitative decision theory and representation theorem. Easwaran (2016) takes humans to genuinely have all-or-nothing beliefs, but offers an instrumentalist justification for representing those beliefs with probabilities.
It is also a fact of life that humans disagree with each other. How should an agent modify her credences (if at all) when she disagrees on some claim with an epistemic peer—someone who has the same evidence as her, and whom she regards as equally good at evaluating that evidence? The literature on this topic is huge (see Kopec and Titelbaum (2016) for a survey, and the entry on disagreement), and it connects in important ways with the interpretations of probability. Intuitively, we feel that disagreement with an epistemic peer rationally calls for moving one’s opinion in the direction of theirs, since disagreement with a peer seems to be evidence that one has made a mistake in evaluating one’s initial evidence. As Kelly (2010) argues, this ‘conciliationist’ intuition appears to commit us to the evidential interpretation of probability, with the common evidence bestowing a unique probability on the disputed claim. (See Schoenfield 2014 and Titelbaum 2016 for dissent; for a defense of the Uniqueness Thesis more generally, see Horowitz and Dogramaci 2016.) The intuition also appears to commit us to probabilistic enkrasia: the view that our credences are beholden to our attitudes about evidential probabilities, in much the same way as the Principal Principle portrays our credences as beholden to our attitudes about chances. (See Christensen 2013 and Elga 2010 for versions of probabilistic enkrasia principles.) Let’s grant that disagreement with a peer about some claim is evidence that one has made a mistake regarding it. This should affect one’s opinion about it only if one’s attitude about the correct way to evaluate the evidence constrains one’s attitude about the claim. However, probabilistic enkrasia has been criticised (see Williamson 2014; Lasonen-Aarnio 2015).
We thus come back full circle to where we started. The classical and logical/evidential interpretations sought to capture an objective notion of probability that measures evidential support relations. Early proponents of the subjective interpretation gave us a highly permissive notion of rational credences, constrained only by the probability calculus. Less liberal subjectivists added further rationality constraints, with credences beholden to attitudes about physical probabilities, and to evidential probabilities—at an extreme, to the point of uniqueness. The three kinds of concepts of probability that we identified at the outset converge: epistemological, degrees of confidence, and physical. Future research will doubtless explore further the relationships between them—and how they provide guides to life.
Kyburg (1970) contains a vast bibliography of the literature on probability and induction pre-1970. Also useful for references before 1967 is the bibliography for “Probability” in the Macmillan Encyclopedia of Philosophy. Earman (1992) and Howson and Urbach (1993) have large bibliographies, and give detailed presentations of the Bayesian program. Hájek and Hitchcock (2021 [Other Internet Resources]) has a more recent and extensive annotated bibliography for all the interpretations of probability discussed in this entry. Skyrms (2000) is an excellent introduction to the philosophy of probability. Von Plato (1994) is more technically demanding and more historically oriented, with another extensive bibliography that has references to many landmarks in the development of probability theory in the last century. Fine (1973) is still a highly sophisticated survey of and contribution to various foundational issues in probability, with an emphasis on interpretations. More recent philosophical studies of the leading interpretations include Childers (2013), Gillies (2000b), Galavotti (2005), Huber (2019), and Mellor (2005). Hájek and Hitchcock (2016) is a collection of original survey articles on philosophical issues related to probability. Section IV includes chapters on most of the major interpretations of probability. It also includes coverage of the history of probability, Kolmogorov’s formalism and alternatives, and applications of probability in science and philosophy. Joyce (2011) is a thorough survey of subjective Bayesianism; Titelbaum (2022) is a wide-ranging and accessible introduction to Bayesian epistemology. Hájek and Lin (2017) canvass various respects of similarity and dissimilarity between Bayesian epistemology and traditional epistemology. Knauff and Spohn (2021) is a comprehensive open access handbook on many topics concerning rationality; the chapter by Hájek and Staffel (2021) elaborates on a number of issues raised in this entry’s discussion of subjective probability.
Eagle (2010) is a valuable anthology of many significant papers in the philosophy of probability, with detailed and incisive critical discussions. Billingsley (1995) and Feller (1968) are classic, rather advanced textbooks on the mathematical theory of probability. Ross (2013) is less advanced and has lots of examples.
Carnap, Rudolf | causal models | causation: probabilistic | chance: versus randomness | decision theory | disagreement | Dutch book arguments | epistemology: Bayesian | information | Laplace, Pierre Simon | logic: inductive | Popper, Karl | probability, in medieval and Renaissance philosophy | quantum theory: philosophical issues in | Ramsey, Frank | Reichenbach, Hans | self-locating beliefs | statistics, philosophy of
I thank Branden Fitelson, Matthias Hild, Christopher Hitchcock, Leon Leontyev, Ralph Miles, Wolfgang Schwarz, Teddy Seidenfeld, Glenn Shafer, Elliott Sober, Jeremy Strasser, and Jim Woodward for their many helpful comments, and especially Jim Joyce, who gave me very detailed and incisive feedback.
The Stanford Encyclopedia of Philosophy is copyright © 2023 by The Metaphysics Research Lab, Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054