Randomness, as we ordinarily think of it, exists when some outcomes occur haphazardly, unpredictably, or by chance. These latter three notions are all distinct, but all have some kind of close connection to probability. Notoriously, there are many kinds of probability: subjective probabilities (‘degrees of belief’), evidential probabilities, and objective chances, to name a few (Hájek 2012), and we might enquire into the connections between randomness and any of these species of probability. In this entry, we focus on the potential connections between randomness and chance, or physical probability. The ordinary way that the word ‘random’ gets used is more or less interchangeable with ‘chancy’, which suggests this Commonplace Thesis—a useful claim to target in our discussion:

(CT) Something is random iff it happens by chance.
The Commonplace Thesis, and the close connection between randomness and chance it proposes, appears also to be endorsed in the scientific literature, as in this example from a popular textbook on evolution (which also throws in the notion of unpredictability for good measure):
scientists use chance, or randomness, to mean that when physical causes can result in any of several outcomes, we cannot predict what the outcome will be in any particular case. (Futuyma 2005: 225)
Some philosophers are, no doubt, equally subject to this unthinking elision, but others connect chance and randomness deliberately. Suppes approvingly introduces
the view that the universe is essentially probabilistic in character, or, to put it in more colloquial language, that the world is full of random happenings. (Suppes 1984: 27)
However, a number of technical and philosophical advances in our understanding of both chance and randomness open up the possibility that the easy slide between chance and randomness in ordinary and scientific usage—a slide that would be vindicated by the truth of the Commonplace Thesis—is quite misleading. This entry will attempt to spell out these developments and clarify the differences between chance and randomness, as well as the areas in which they overlap in application. It will also aim to clarify the relationship of chance and randomness to other important notions in the vicinity, particularly determinism and predictability (themselves often subject to confusion).
There will be philosophically significant consequences if the Commonplace Thesis is incorrect, and if ordinary usage is misleading. For example, it is intuitively plausible that if an event is truly random it cannot be explained (if it happens for a reason, it isn’t truly random). It might seem then that the possibility of probabilistic explanation is undermined when the probabilities involved are genuine chances. Yet this pessimistic conclusion only follows under the assumption, derived from the Commonplace Thesis, that all chancy outcomes are random. Another interesting case is the role of random sampling in statistical inference. If randomness requires chance, then no statistical inferences on the basis of ‘randomly’ sampling a large population will be valid unless the experimental design involves genuine chance in the selection of subjects. But the rationale for random sampling may not require chance sampling—as long as our sample is representative, those statistical inferences may be reliable. But in that case, we’d be in a curious situation where random sampling wouldn’t have much to do with randomness, and whatever justification for beliefs based on random sampling that randomness is currently thought to provide would need to be replaced by something else.
A final case of considerable philosophical interest is the frequentist approach to objective probability, which claims (roughly) that the chance of an outcome is its frequency in an appropriate series of outcomes (Hájek 2012: §3.4). To avoid classifying perfectly regular recurring outcomes as chancy, frequentists like von Mises (1957) proposed to require that the series of outcomes should be random, without pattern or order. Frequentism may thus fall with the Commonplace Thesis: if there can be chancy outcomes without randomness, both will fail.
The Commonplace Thesis is central to all three examples. As it is widely accepted that probabilistic explanation is legitimate, that random sampling doesn’t need genuine chance (though it can help), and that frequentism is in serious trouble (Hájek 1997), there is already some pressure on the Commonplace Thesis. But we must subject it to closer examination to clarify whether these arguments do succeed, and what exactly it means to say of some event or process that it is random or chancy. Though developing further consequences of this kind is not the primary aim of this entry, it is hoped that what is said here may help to untangle these and other vexed issues surrounding chance and randomness.
To get clear on the connections and differences between chance and randomness, it would be good first to have some idea of what chance and randomness amount to. Interestingly, philosophical attention has focussed far more on chance than randomness. This may well be a consequence of the Commonplace Thesis. Whatever its source, we can appeal to a substantial consensus in the philosophical literature as to what kind of thing chance must be.
Carnap (1945) distinguished between two conceptions of probability, arguing that both were scientifically important. His ‘probability\(_1\)’ corresponds to an epistemic notion, nowadays glossed either (following Carnap himself) as evidential probability, or as credence or degree of belief. This is contrasted with Carnap’s ‘probability\(_2\)’, which is the concept of a non-epistemic objective kind of probability, better known as chance.
There are many philosophical accounts of what actually grounds chance, as part of the minor philosophical industry of producing ‘interpretations’—really, reductive analyses or explications—of probability. In this vein we have at least the frequency theory of Reichenbach (1949) and von Mises (1957) (and Carnap’s own explication of probability\(_2\) was in terms of frequencies), the propensity theory of Popper (1959) and Giere (1973), as well as many more recent accounts, notably Lewis’ (1994) ‘Best System’ account of chance (see also Loewer 2004). There is no agreement over which, if any, of these accounts are right; certainly both the accounts mentioned face difficulties in giving an adequate account of chance. The consensus mentioned earlier is not over what actually plays the role of chance, but rather on the constraints that determine what that role is.
There can be such a consensus because ‘chance’ is not a technical term, but is rather an ordinary concept deployed in fairly familiar situations (games of chance, complicated and unpredictable scenarios, large ensembles of similar events, etc.). There is widespread agreement amongst native speakers of English over when ‘chance’ applies to a particular case, and this agreement at least indicates that there is a considerable body of ordinary belief about chance. One needn’t take the deliverances of folk intuition as sacrosanct to recognise that this ordinary belief provides the starting point for philosophical accounts of chance. It may turn out that nothing fits the role picked out by these ordinary beliefs and their philosophical precisifications, yet even in that case we’d be inclined to conclude that chance doesn’t exist, not that our ordinary beliefs about what chance must be are incorrect.
Below, some of the theoretical principles that philosophers have extracted from commonplace beliefs about chance will be outlined. (In so doing, we freely use probabilistic notation and concepts; see the entry on interpretations of probability, Hájek 2012: §1, for the background probability theory required to understand these expressions.) Two such constraints have been widely accepted since the early days of the philosophy of probability. Firstly, it is required that the mathematics of chance should conform to some standard mathematical theory of probability, such as Kolmogorov’s 1933 axiomatisation of the probability calculus (or some recognisable variant thereof, like Popper’s axiomatisation of conditional probability). Secondly, chance should be objective: mind-independent and not epistemic or evidential. But a number of other constraints have been articulated and defended in the literature. (Schaffer 2007: §4 contains a useful discussion of these and other constraints on the chance role.) While these principles have been termed ‘things we know about chance’, this shouldn’t be taken to preclude our discovering that there is no such thing as chance—rather, the philosophical consensus is that if there is any such thing as chance, it will (more or less) fit these constraints.
Chance should regulate (that is, it is involved with norms governing) rational belief in line with Lewis’ Principal Principle (Lewis, 1980), or his New Principle (Lewis, 1994; Hall, 2004). Where \(C\) is a reasonable initial credence function, and \(E\) is the evidence, the Principal Principle (omitting some complications) is this:
(PP) \(C(p \mid \ulcorner\Ch(p) = x\urcorner \wedge E) = x\).
This principle says that rational initial credence should treat chance as an expert, deferring to it with respect to opinions about the outcome \(p\), by adopting the corresponding chances as your own conditional degrees of belief. The New Principle—adopted to deal with some problematic interactions between Lewis’ metaphysics and the PP—advocates deferring to chance in a somewhat different way. The New Principle, NP, suggests (more or less) that rational initial credence should treat chance as an expert with respect to the outcome \(p\), by adopting the conditional chances, on the admissible evidence, as your conditional degrees of belief on that same evidence, as in this principle (this claim corresponds to equation 3.9 in Hall 2004; see his discussion at pp. 102–5 for some important qualifications, and the connection to the formulation of the NP in Lewis 1994):
(Chance-Analyst) \(C(p \mid \ulcorner\Ch(p \mid E) = x\urcorner \wedge E) = x\).
Non-reductionist views about chance, which take chances to be independent fundamental features of reality, can follow PP. Reductionists, who take the values of chances to be fixed entirely by other features of reality (normally frequencies and symmetries, though chance is typically constrained by them in a quite indirect manner), are for technical reasons (to do with undermining; see supplement A.1) ordinarily forced to adopt NP, and Chance-Analyst, as norms on credence, though in many ordinary cases NP and PP give very similar recommendations. In either case, both formal principles give content to the intuitively plausible idea that chances should guide reasonable credence.[1]
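The way PP guides credence can be illustrated with a small calculation. The sketch below (a Python sketch; the three chance hypotheses and the prior over them are entirely hypothetical, chosen only for illustration) shows how deference to chance fixes unconditional credence in an outcome as the expectation of the possible chance values:

```python
# Sketch: PP-style deference to chance (hypothetical prior over three
# chance hypotheses; all numbers are purely illustrative).
priors = {0.3: 0.2, 0.5: 0.5, 0.7: 0.3}  # Ch(heads) = x  ->  C(Ch(heads) = x)

# PP: C(heads | Ch(heads) = x) = x.  By the law of total probability,
# unconditional credence in heads is the expectation of the chance.
credence_heads = sum(x * c for x, c in priors.items())
print(credence_heads)  # 0.3*0.2 + 0.5*0.5 + 0.7*0.3 = 0.52
```

The computation is just the law of total probability, with PP supplying each conditional credence.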
Chance should connect with possibility. Leibniz claimed that probability was a kind of ‘graded possibility’, and more recent authors have largely agreed. In particular, it seems clear that if an outcome has some chance of occurring, then it is possible that the outcome occurs. This intuition has been made precise in the Basic Chance Principle (BCP) (see supplement A.2 for further details on this principle):
Suppose \(x \gt 0\) and \(Ch_{tw}(A) = x\). Then \(A\) is true in at least one of those worlds \(w'\) that matches \(w\) up to time \(t\) and for which \(Ch_t (A) = x\). (Bigelow et al., 1993: 459)
But one needn’t accept precisely this version of the BCP to endorse the general thesis that chance and possibility must be linked—for other versions of this kind of claim, see Mellor (2000); Eagle (2011); and the ‘realization principle’ of Schaffer (2007: 124).
Chance should connect with actual frequencies, at least to the extent of permitting frequencies to be good evidence for the values of chances. This may be through some direct connection between chance and frequency, or indirectly through the influence of observed outcome frequencies on credences about chances via the Principal Principle (Lewis, 1980: 104–6). But chance should not be identified with frequency—since a fair coin can produce any sequence of outcomes, there is no possibility of identifying chance with observed frequency. (Though of course a fair coin is overwhelmingly likely to produce roughly even numbers of heads and tails when tossed often enough.) Moreover, there can be a chance for a kind of outcome even when there are very few instances of the relevant process that leads to that outcome, resulting in the actual frequencies being misleading or trivial (for example, if there is only a single actual outcome: Hájek 1997).
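The evidential bearing of frequencies on chances can be illustrated with a toy Bayesian calculation (a sketch; the discrete set of chance hypotheses, the uniform prior, and the observed counts are all hypothetical). Via the Principal Principle, each chance hypothesis fixes the likelihood of an observed run of tosses, so observed frequencies reshape credences about the chance without the chance being identified with the frequency:

```python
# Sketch: updating credences about the chance of heads on observed
# frequencies.  The hypotheses and prior are purely illustrative.
hypotheses = [0.3, 0.5, 0.7]
credence = {h: 1 / 3 for h in hypotheses}

# Observe 70 heads in 100 tosses.  Via PP, the hypothesis Ch(heads) = h
# assigns the data likelihood h**70 * (1 - h)**30.
heads, tails = 70, 30
likelihood = {h: h ** heads * (1 - h) ** tails for h in hypotheses}
total = sum(credence[h] * likelihood[h] for h in hypotheses)
posterior = {h: credence[h] * likelihood[h] / total for h in hypotheses}

# Credence now concentrates on the hypothesis nearest the observed
# frequency, without identifying the chance with that frequency.
print(max(posterior, key=posterior.get))  # 0.7
```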
When considering the connection between frequency and chance, not just any frequency will do. What is wanted is the frequency in relevantly similar trials, with the same kind of experimental setup. The relevance of frequencies in such trials is derived from the assumption that in such similar trials, the same chances exist: intra-world duplicate trials should have the same chances. This is closely related to the ‘stable trial principle’ (Schaffer, 2003: 37ff). Chances attach to the outcomes of trials, but the physical grounds of the chance lie in the physical properties of the trial device or chance setup.
More details on all of these principles can be found in this supplementary document:
Supplement A. Some Basic Principles About Chance
Chance, it is commonly said, is ‘single-case objective probability’. Philosophers haven’t been very clear on what is meant by ‘single-case’, and the terminology is slightly misleading, as it falsely suggests that perhaps multiple cases have less claim to their chances. The most minimal version of the claim is that, at least sometimes, an outcome can have a chance to be a result of an instance of a given kind of process, or trial, even though no other trials of that process occur. This is what we will mean by ‘single-case’ chance. (A stronger claim is that the chance of an outcome resulting from a given process is an intrinsic property of a single trial. The stronger claim is inconsistent with standard versions of the frequency theory, and indeed it can be difficult to see how chance and frequency might be connected if that stronger claim were true.) Some have claimed that single-case chance is no part of objective probability; for example, von Mises (1957: p. 11) remarks that the ‘concept of probability … applies only to problems in which either the same event repeats itself again and again, or a great number of uniform elements are involved at the same time’. However, this is a theoretical judgement on von Mises’ part, based on difficulties he perceived in giving an account of single-case chance; it is not a judgement derived from internal constraints on the chance role. And how could it be? For
like it or not, we have this concept [of single-case chance]. We think that a coin about to be tossed has a certain chance of falling heads, or that a radioactive atom has a certain chance of decaying within the year, quite regardless of what anyone may believe about it and quite regardless of whether there are any other similar coins or atoms. As philosophers we may well find the concept of objective chance troublesome, but that is no excuse to deny its existence, its legitimacy, or its indispensability. If we can’t understand it, so much the worse for us. (Lewis, 1980: 90)
A number of the constraints discussed above require the legitimacy of single-case chance. The objection to frequentist accounts of chance that the frequency might misrepresent the chance if the number of actual outcomes is too low apparently requires that there be non-trivial chances even for events resulting from a type of trial that occurs only very few times, and perhaps even can only occur very few times (Hájek 2009: 227–8). The strong connections between possibility and chance mooted by the BCP and variants thereof also require that there are single-case chances. For the BCP requires that, for every event with some chance, it is possible that the event has that same chance and occurs. As noted in Supplement A.2, this renders the chances relatively independent of the occurrent frequencies, which in turn requires single-case chance. For some single outcomes—for example, the next toss of a coin that is biased ⅔ towards heads—only very few assignments of credence are reasonable; in that case, we should, if we are rational, have credence of ⅔ in the coin landing heads. The rationality of this unequal assignment cannot be explained by anything like symmetry or indifference. Its rationality can be explained by the PP only if the single-case chance of heads on the next toss is ⅔. Moreover, the existence of this constraint on rational credence should have an explanation. Therefore the PP, if it is to play this necessary explanatory role, requires single-case chance.
It is the stable trial principle that has the closest connection with single-case chance, however. For in requiring that duplicate trials should receive the same chances, it is natural to take the chance to be grounded in the properties of that trial, plus the laws of nature. It is quite conceivable that the same laws could obtain even if that kind of trial has only one instance, and the very same chances should be assigned in that situation. But then there are well-defined chances even though that type of event occurs only once.
The upshot of this discussion is that chance is a process notion, rather than being entirely determined by features of the outcome to which the surface grammar of chance ascriptions assigns the chance. For if there can be a single-case chance of \(\frac{1}{2}\) for a coin to land heads on a toss even if there is only one actual toss, and it lands tails, then surely the chance cannot be fixed by properties of the outcome ‘lands heads’, as that outcome does not exist.[2] The chance must rather be grounded in features of the process that can produce the outcome: the coin-tossing trial, including the mass distribution of the coin and the details of how it is tossed, in this case, plus the background conditions and laws that govern the trial. Whether or not an event happens by chance is a feature of the process that produced it, not the event itself. The fact that a coin lands heads does not fix that the coin landed heads by chance, because if it was simply placed heads up, as opposed to tossed in a normal fashion, we have the same outcome not by chance. Sometimes features of the outcome event cannot be readily separated from the features of its causes that characterise the process by means of which it was produced. But the upshot of the present discussion is that even in those cases, whether an outcome happens by chance is fixed by the properties of the process leading up to it, the causal situation in which it occurs, and not simply by the fact that an event of a given character was the product of that process.[3]
Do chances exist? The best examples of probability functions that meet the principles about chance are those provided by our best physical theories. In particular, the probability functions that feature in radioactive decay and quantum mechanics have some claim to being chance functions. In orthodox approaches to quantum mechanics, some measurements of a system in a given state will not yield a result that represents a definite feature of that prior state (Albert 1992). So, for example, an \(x\)-spin measurement on a system in a determinate \(y\)-spin state will not yield a determinate result reflecting some prior state of \(x\)-spin, but rather has a 0.5 probability of resulting in \(x\)-spin \(= +1\), and a 0.5 probability of resulting in \(x\)-spin \(= -1\). That these measurement results cannot reflect any prior condition of the system is a consequence of various no-hidden-variables theorems, the most famous of which is Bell’s theorem (Bell 1964; see the entry on Bell’s theorem, Shimony 2009). Bell’s theorem shows that the probabilities predicted by quantum mechanics, and experimentally confirmed, for spin measurements on a two-particle entangled but spatially separated system cannot be equal to the joint probabilities of two independent one-particle systems. The upshot is that the entangled system cannot be represented as the product of two independent localised systems with determinate prior \(x\)-spin states. Therefore, there can be no orthodox local account of these probabilities of measurement outcomes as reflecting our ignorance of a hidden quality found in half of the systems, so that the probabilities are in fact basic features of the quantum mechanical systems themselves.[4]
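The quantitative core of Bell's theorem can be sketched in a few lines. For the singlet state, quantum mechanics predicts the correlation \(E(a,b) = -\cos(a-b)\) between spin measurement outcomes at analyser angles \(a\) and \(b\), whereas any local hidden-variable model must satisfy the CHSH inequality \(\lvert S \rvert \le 2\). With the standard choice of angles (an illustrative sketch, not a derivation):

```python
import math

# Sketch: the CHSH form of Bell's theorem.  For the singlet state,
# quantum mechanics predicts correlation E(a, b) = -cos(a - b) between
# spin measurements at analyser angles a and b; any local
# hidden-variable model requires |S| <= 2.
def E(a, b):
    return -math.cos(a - b)

a1, a2 = 0.0, math.pi / 2              # one observer's two settings
b1, b2 = math.pi / 4, 3 * math.pi / 4  # the other observer's two settings

S = E(a1, b1) - E(a1, b2) + E(a2, b1) + E(a2, b2)
print(abs(S))  # 2*sqrt(2), exceeding the local bound of 2
```

The quantum value \(2\sqrt{2} \approx 2.83\) exceeds the local bound, which is the sense in which no local account of these probabilities is available.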
The standard way of understanding this is that something—the process of measurement, on the Copenhagen interpretation, or spontaneous collapse on the GRW theory—induces a non-deterministic state transition, called collapse, into a state in which the system really is in a determinate state with respect to a given quality (though it was not previously). These transition probabilities are dictated entirely by the state and the process of collapse, which allows these probabilities to meet the stable trial principle. The models of standard quantum mechanics explicitly permit two systems prepared in identical states to evolve via collapse into any state which has a non-zero prior probability in the original state, which permits these probabilities to meet the BCP. And the no-hidden-variables theorems strongly suggest that there is no better information about the system to guide credence in future states than the chances, which makes these probabilities play the right role in the PP. These basic quantum probabilities governing state transitions seem to be strong candidates to be called chances.
The foregoing argument makes essential use of collapse. The existence of collapse as an alternative rule governing the evolution of the quantum state is controversial, and it is a scandal of quantum mechanics that we have no satisfactory understanding of why collapse (or measurement) should give rise to basic probabilities. That said, the existence of well-confirmed probabilistic theories which cannot plausibly be reduced to any non-probabilistic theory is some evidence that there are chances. (Though the Everettian (‘many-worlds’) program of generating quantum probabilities from subjective uncertainty, without basic chance in the present sense, has recently been gaining adherents—see Barrett 1999; Wallace 2007.) Indeed, it looks like the strongest kind of evidence that there are chances. For if our best physical theories did not feature probabilities, we should have little reason to postulate them, and little reason to take chances to exist. This will become important below (§5), when we discuss classical physics. The conventional view of classical physics, including statistical mechanics, is that it does not involve basic probability (because the state transition dynamics is deterministic), and accordingly is not a theory that posits chances (Loewer 2001).[5] Below, we will examine this view, as well as some recent challenges to it. But there is at least enough evidence from fundamental physics for the existence of chances for us to adopt it already, at this point, as a defensible assumption.
As mentioned in the introduction, some philosophers deliberately use ‘random’ to mean ‘chancy’. A random process, in their view, is one governed by chance in the sense of the previous section. This generates stipulative definitions like this one:
I group random with stochastic or chancy, taking a random process to be one which does not operate wholly capriciously or haphazardly but in accord with stochastic or probabilistic laws. (Earman 1986: 137)
This process conception of randomness is perfectly legitimate, if somewhat redundant. But it is not adequate for our purposes. It makes the Commonplace Thesis a triviality, and thereby neither interesting in itself nor apt to support the interesting conclusions some have drawn from it concerning explanation or experimental design. Moreover,
The invocation of a notion of process randomness is inadequate in another way, as it does not cover all cases of randomness. Take a clear case of process randomness, such as one thousand consecutive tosses of a fair coin. We would expect, with very high confidence, to toss at least one head. But as that outcome has some chance of not coming to pass, it counts as process random even when it does. This is at variance with what we would ordinarily say about such an outcome, which is not at all unexpected, haphazard, or unpredictable. We could search for some refinement of the notion of process randomness that would reserve the word ‘random’ for more irregular-looking outcomes. But a better approach, and the one we pursue in this entry, is to distinguish between randomness of the process generating an outcome (which we stipulate to amount to its being a chance process), and randomness of the product of that random process. In the case just envisaged, we have a random process, while the outcome ‘at least one head in 1000 tosses’ is not a random product.
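The figures behind this example are easily checked (a quick sketch, using exact rational arithmetic via Python's fractions module):

```python
from fractions import Fraction

# Sketch: the chance of 'no heads in 1000 fair tosses' is (1/2)**1000,
# so the process-random outcome 'at least one head' has chance
# 1 - (1/2)**1000 -- practically certain, hence not at all unexpected.
p_no_heads = Fraction(1, 2) ** 1000
p_at_least_one = 1 - p_no_heads

print(float(p_no_heads))  # about 9.33e-302
```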
The introduction of product randomness helps us make sense of some familiar uses of ‘random’ to characterise an entire collection of outcomes of a given repeated process. This is the sense in which a random sample is random: it is an unbiased representation of the population from which it is drawn—and that is a property of the entire sample, not each individual member. If a random sample is to do its job, it should be irregular and haphazard with respect to the population variables of interest. We should not be able to predict the membership of the sample to any degree of reliability by making use of some other feature of individuals in the population. (So we should not be able to guess at the likely membership of a random sample by using some feature like ‘is over 180cm tall’.) A random sample is one that is representative in the sense of being typical of the underlying population from which it is drawn, which means in turn that—in the ideal case—it will exhibit no order or pattern that is not exemplified in that underlying population.
While many random samples will be drawn using a random process, they need not be. For example, if we are antecedently convinced that the final digit of someone’s minute of birth is not correlated with their family income, we may draw a random sample of people’s incomes by choosing those whose birth minute ends in ‘7’, and that process of choice is not at all random. To be sure that our sample is random, we may wish to use random numbers to decide whether to include a given individual in the sample; to that end, large tables of random digits have been produced, displaying no order or pattern (RAND Corporation 1955). This other conception of randomness, as attaching primarily to collections of outcomes, has been termed product randomness.
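The contrast between the two selection procedures can be sketched as follows (the population data is wholly invented for illustration; only the structure of the two selection methods matters):

```python
import random

# Sketch: two ways of drawing a 'random sample' of incomes.  The
# population below is hypothetical, generated just to have something
# to select from.
population = [
    {"id": i, "birth_minute": (i * 7) % 60, "income": 30000 + (i * 13) % 50000}
    for i in range(1000)
]

# Non-chancy selection: take those whose birth minute ends in '7'.
# If that digit is uncorrelated with income, the sample can be random
# (representative) though the selection process involves no chance.
digit_sample = [p for p in population if p["birth_minute"] % 10 == 7]

# Chancy selection: let a (pseudo)random number generator choose,
# in the spirit of using a table of random digits.
rng = random.Random(0)  # seeded so the sketch is reproducible
chance_sample = rng.sample(population, k=len(digit_sample))

print(len(digit_sample), len(chance_sample))
```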
Product randomness also plays an important role in scientific inference. Suppose we encounter a novel phenomenon, and attempt to give a theory of it. All we have to begin with is the data concerning what happened. If that data is highly regular and patterned, we may attempt to give a deterministic theory of the phenomenon. But if the data is irregular and disorderly—random—we may offer only a stochastic theory. As we cannot rely on knowing whether the phenomenon is chancy in advance of developing a theory of it, it is extremely important to be able to characterise whether the data is random or not directly, without detouring through prior knowledge of the process behind it. We might think that we could simply do this by examination of the data—surely the lack of pattern will be apparent to the observer? (We may assume that patternlessness is good evidence for randomness, even if not entailed by it.) Yet psychological research has repeatedly shown that humans are poor at discerning patterns, seeing them in completely random data, and (for precisely the same reason, in fact) failing to see them in non-random data (Gilovich et al., 1985; Kahneman and Tversky, 1972; Bar-Hillel and Wagenaar, 1991; Hahn and Warren, 2009). So an objective account of the randomness of a sequence of outcomes is necessary for reliable scientific inference.
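Simple objective checks of the sort such an account systematises can be sketched as follows (the two statistics and the example string are illustrative only; genuine randomness test batteries use calibrated significance tests rather than bare statistics):

```python
# Sketch: two elementary objective statistics for a binary string --
# digit frequency and longest run of identical digits -- of the kind
# a randomness test battery automates.
def frequency(bits):
    return bits.count("1") / len(bits)

def longest_run(bits):
    best = cur = 1
    for prev, nxt in zip(bits, bits[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

patterned = "01" * 32
print(frequency(patterned), longest_run(patterned))  # 0.5 1
# The digit frequency looks perfectly unbiased, yet a longest run of 1
# across 64 digits is itself wildly atypical: the string is patterned,
# whatever a naive glance at its balance suggests.
```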
It might seem initially that giving a rigorous characterisation of disorder and patternlessness is a hopeless task, made even more difficult by the fact that we need to characterise it without using the notion of chance. (Otherwise we make CT trivial.) Yet a series of mathematical developments in the theory of algorithmic randomness, culminating in the early 1970s, showed that a satisfactory characterisation of randomness of a sequence of outcomes was possible. This notion has shown its theoretical fruitfulness not only in the foundations of statistics and scientific inference, but also in connection with the development of information theory and complexity theory. The task of this section is to introduce the mathematical approach to the definition of random sequences, just as we introduced the philosophical consensus on chance in the previous section. We will then be in a position to evaluate the Commonplace Thesis, when made precise using theoretically fruitful notions of chance and randomness.
The fascinating mathematics of algorithmic randomness is largely unknown to philosophers. For this reason, I will give a fairly detailed exposition in this entry. Various points of more technical interest have been relegated to this supplementary document:
Supplement B. Further Details Concerning Algorithmic Randomness
Most proofs will be skipped, or relegated to this supplementary document:
Supplement C. Proofs of Selected Theorems
Fuller discussions can be found in the cited references.
Throughout, the focus will be on a simple binary process, which has only two types of outcome, \(O = \{0,1\}\). (The theory of randomness for the outcome sequences of such a simple process can be extended to more complicated sets of outcomes, but there is much of interest even in the question: which binary sequences are product random?) A sequence of outcomes is an ordered collection of events, finite or infinite, such that each event is of a type in \(O\). So a sequence \(x = x_1 x_2 \ldots x_k \ldots\), where each \(x_i \in O\). The set of all infinite binary sequences of outcomes is known as the Cantor space. One familiar example of a process the outcomes of which form a Cantor space is an infinite sequence of independent flips of a fair coin, where 1 denotes heads and 0 tails. Notions from measure theory and computability theory are used in the discussion below; an elementary presentation of the mathematics needed can be found in supplement B.2.
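One elementary fact about this space is worth recording here: the set of sequences beginning with a given finite string \(w\) (a cylinder set) has Lebesgue measure \(2^{-\lvert w \rvert}\), matching the chance that a fair coin produces that initial segment. A minimal sketch:

```python
from fractions import Fraction

# Sketch: the Lebesgue measure of a cylinder set of the Cantor space
# (the set of infinite sequences extending a finite string w) is
# 2**(-len(w)), the chance of a fair coin producing that initial
# segment of outcomes.
def cylinder_measure(w):
    return Fraction(1, 2 ** len(w))

print(cylinder_measure("0101"))  # 1/16
```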
Perhaps counterintuitively, we begin with the case of infinite binary sequences. Which of these should count as random products of our binary process? Each individual infinite sequence, whether orderly or not, has measure zero under the standard (Lebesgue) measure over the Cantor space. We cannot determine whether an individual sequence is random by considering what fraction it constitutes of the set of all such sequences. But, intuitively, almost all such infinite sequences should be random and disorderly, and only a few will be orderly (an observation first due to Ville 1939). A typical infinite sequence is one without pattern; only exceptional cases have order to them. If the actual processes that generate the sequences are perfectly deterministic, it may be that a typical product of those processes is not random. But we are rather concerned to characterise which of all the possible sequences produced by any process whatsoever are random, and it seems clear that most of the ways an infinite sequence might be produced, and hence most of the sequences so produced, will be random. This fits with intuitive considerations:
We arrange in our thought, all possible events in various classes; and we regard as extraordinary those classes which include a very small number. In the game of heads and tails, if heads comes up a hundred times in a row, then this appears to us extraordinary, because the almost infinite number of combinations that can arise in a hundred throws are divided in regular sequences, or those in which we observe a rule that is easy to grasp, and in irregular sequences, that are incomparably more numerous. (Laplace 1826)
This fertile remark underscores both that random sequences should be unruly, and that they should be common. In the present framework: the set of non-random sequences should have measure zero, proportional to the set of all such sequences—correspondingly, the set of random sequences should have measure one (Dasgupta 2011: §3; Gaifman and Snir 1982: 534; Williams 2008: 407–11).
This helps, but not much. For there are many measure one subsets of the Cantor space, and we need some non-arbitrary way of selecting a privileged such subset. (The natural option, to take the intersection of all measure one subsets, fails, because the complement of the singleton of any specific sequence is measure one, so for each sequence there is a measure one set which excludes it; therefore the intersection of all measure one sets excludes every sequence, so is the empty set.) The usual response is to take the random sequences to be the intersection of all measure one subsets of the space which have ‘nice’ properties, and to give some principled delimitation of which properties are to count as ‘nice’ and why.
For example, if a sequence is genuinely random, we should expect that in the long run it would tend to have features we associate with the outputs of (independent, identically distributed trials of) a chancy process. The sequence should look as disorderly as if it were the expected product of genuine chance. This approach is known accordingly as the typicality approach to randomness. Typicality is normally defined with respect to a prior probability function, since what is a typical series of fair coin toss outcomes might not be a typical series of unfair coin toss outcomes (Eagle 2016: 447). In the present case, we use the Lebesgue measure, as it is the natural measure definable from the symmetries of the outcome space of the binary process itself.
A typical sequence should satisfy all of the various ‘properties of stochasticity’ (Martin-Löf 1966: 604). What are these properties? They include the property of large numbers, the claim that the limit frequency of a digit in a random sequence should not be biased to any particular digit. The (strong) law of large numbers is the claim that, with probability 1, an infinite sequence of independent, identically distributed Bernoulli trials will have the property of large numbers. If we concentrate on the sequence of outcomes as independently given mathematical entities, rather than as the products of a large number of independent Bernoulli trials, we can follow Borel’s (1909) characterisation of the strong law. Let \(S_n (x)\) be the number of 1s occurring in the first \(n\) places of sequence \(x\) (this is just \(\sum^{n}_{k=1}x_{k}\)), and let \(B\) be the set of infinite sequences \(x\) such that the limit of \(S_n (x)/n\) as \(n\) tends to infinity is \(\frac{1}{2}\). Borel’s theorem is that \(B\) has measure one; almost all infinite sequences are, in the limit, unbiased with respect to digit frequency.
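A quick empirical illustration, with Python’s pseudorandom generator standing in for genuine chance and a finite prefix standing in for an infinite sequence (both assumptions of the sketch, not part of the theory): the running frequency \(S_n(x)/n\) settles towards \(\frac{1}{2}\).

```python
import random

def relative_frequency_of_ones(bits):
    """Compute S_n(x)/n: the fraction of 1s among the first n outcomes."""
    return sum(bits) / len(bits)

random.seed(0)  # fixed seed, so the illustration is reproducible
# Simulate a long initial segment of independent fair-coin flips.
x = [random.randint(0, 1) for _ in range(100_000)]

# The running frequency drifts towards 1/2 as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_ones(x[:n]))
```

Of course no finite simulation establishes the limit claim; the theorem concerns the measure of the set \(B\) of infinite sequences.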
Clearly, the property of large numbers is a necessary condition for randomness of a sequence. It is not sufficient, however. Consider the sequence 10101010…. This sequence is not biased. But it is clearly not random either, as it develops in a completely regular and predictable fashion. So we need to impose additional constraints. Each of these constraints will be another property of stochasticity we should expect of a random sequence, including all other such limit properties of ‘unbiasedness’.
One such further property is Borel normality, also defined in that paper by Borel. A sequence is Borel normal iff each finite string of digits of a given length occurs with equal frequency in the sequence.[6] Borel proved that a measure one set of sequences in the Cantor space are Borel normal. Borel normality is a useful condition to impose on random sequences, as it has the consequence that there will be no predictable pattern to the sequence: for any string \(\sigma\) appearing multiple times in a sequence, it will as commonly be followed by a 1 as by a 0. This lack of predictability based on the previous elements of the sequence is necessary for genuine randomness. But again Borel normality is not sufficient for randomness. The Champernowne sequence (Champernowne 1933) is the sequence of digits in the binary representations of each successive non-negative integer:
\[011011100101110111\ldots\]This is Borel normal, but perfectly predictable, because there is a general law which states what the value of the sequence at each index will be—not because it can be predicted from prior elements of the sequence, but because it can be predicted from the index.
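Both sequences are easy to examine by computation. A minimal sketch (the function name is mine): the Champernowne sequence is generated by a trivial index-based rule, and its digit frequency drifts towards \(\frac{1}{2}\); the alternating sequence, by contrast, has frequency exactly \(\frac{1}{2}\) but visibly fails Borel normality for strings of length 2.

```python
def champernowne_bits(n_integers):
    """Concatenate the binary representations of 0, 1, 2, ...:
    the base-2 Champernowne sequence."""
    return "".join(format(i, "b") for i in range(n_integers))

seq = champernowne_bits(5000)
freq_1 = seq.count("1") / len(seq)

# The digit frequency drifts towards 1/2 (convergence is slow, since every
# binary numeral begins with a leading 1), yet each digit of the sequence is
# computable directly from its index.
print(len(seq), freq_1)

# Contrast: the alternating sequence 1010... has digit frequency exactly 1/2
# but the length-2 string '11' never occurs in it at all.
alternating = "10" * 1000
print("11" in alternating)
```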
We must impose another condition to rule out the Champernowne sequence. We could proceed, piecemeal, in response to various problem cases, successively introducing further stochastic properties, each of which is a necessary condition for randomness, eventually hoping to give a characterisation of the random sequences by aggregating enough of them together. Given the complex structure of the Cantor space, the prospects for success of such a cumulative approach seem dim. A more promising, bolder route is to offer one stochastic property that is by itself necessary and sufficient for randomness, the possession of which will entail the possession of the other properties we’ve mentioned (the property of large numbers, Borel normality, etc.).
The first detailed and sophisticated attempt at this bolder approach, defining randomness for a sequence by a single stochastic property, was by von Mises (1941, 1957). Suppose you were presented with any subsequence \(x_1 , \ldots , x_{n-1}\) of (not necessarily consecutive members of) a sequence, and asked to predict the value of \(x_n\). If the sequence were really random, then this information—the values of any previous members of the sequence, and the place of the desired outcome in the sequence—should be of no use to you in this task. To suppose otherwise is to suppose that there is an exploitable regularity in the random sequence; a gambler could, for example, bet reliably on their preferred outcome and be assured of a positive expected gain if they were in possession of this information. A gambling system selects points in a sequence of outcomes to bet on; a successful gambling system would be one where the points selected have a higher frequency of ‘successes’ than in the sequence as a whole, so that by employing the system one can expect to do better than chance. But the failure of gambling systems to make headway in games of chance suggests that genuinely random sequences of outcomes aren’t so exploitable. Von Mises, observing the empirical non-existence of successful gambling systems, makes it a condition of randomness for infinite sequences that they could not be exploited by a gambling system (his ‘Prinzip vom ausgeschlossenen Spielsystem’). The idea is that without what effectively amounts to a crystal ball, there is no way of selecting a biased subsequence of a random sequence.
Shorn of the inessential presentational device of selecting an outcome to bet on based on past outcomes, von Mises contends that it is a property of stochasticity that a random sequence should not be such that information about any initial subsequence \(x_1 x_2 \ldots x_{k-1}\) provides information about the contents of outcome \(x_k\). He formally implements this idea by defining a place selection as ‘the selection of a partial sequence in such a way that we decide whether an element should or should not be included without making use of the [value] of the element’ (von Mises 1957: 25). He then defines a random sequence as one such that every infinite subsequence selected by an admissible place selection retains the same relative digit frequencies as in the original sequence (so one cannot select a biased subsequence, indicating that this is a genuine property of stochasticity). In our case this will mean that every admissibly selected subsequence will meet the property of large numbers with equal frequency of 1s and 0s.[7] One way to characterise the resulting set of von Mises-random (vM-random) sequences is that it is the largest set which contains only infinite sequences with the right limit frequency and is closed under all admissible place selections. If the limit frequency of a digit is 1, as in the sequence \(111\ldots\), it is true that every admissible place selection determines a subsequence with the same limit frequency. Von Mises intends this result, for this is what a random sequence of outcomes of trials with probability 1 of obtaining the outcome 1 looks like. This sequence does not meet the property of large numbers, however. So we modify von Mises’ own condition, defining the vM-random sequences as the largest set of infinite sequences which have the limit frequency \(\frac{1}{2}\), and which is closed under all admissible place selections.
Von Mises’ original proposals were deliberately imprecise about what kinds of procedures count as admissible place selections. This imprecision did not concern him, as he was disposed to regard the ‘right’ set of place selections for any particular random collective as being fixed by context and not invariantly specifiable. But his explicit characterisation is subject to counterexamples. Since ‘any increasing sequence of natural numbers \(n_1 \lt n_2 \lt \ldots\) defines a corresponding selection rule, … given an arbitrary sequence of 0s and 1s … there is among the selection rules the one which selects the 1s of the given sequence, so the limit frequency is changed’ (Martin-Löf 1969b: 27). This clearly violates von Mises’ intentions, as he presumably intended that the place selections should be constructively specified, yet the notion of vM-randomness remains tantalisingly vague without some more concrete specification.
Such a specification arrived in the work of Church (1940), drawing on the then newly clarified notion of an effective procedure. Church observed that
To a player who would beat the wheel at roulette a system is unusable which corresponds to a mathematical function known to exist but not given by explicit definition; and even the explicit definition is of no use unless it provides a means of calculating the particular values of the function. … Thus a [gambling system] should be represented mathematically, not as a function, or even as a definition of a function, but as an effective algorithm for the calculation of the values of a function. (Church 1940: 133)
Church therefore imposes the condition that the admissible place selections should be, not arbitrary functions, but effectively computable functions of the preceding outcomes in a sequence. Formally, we consider (following Wald 1938) a place selection as a function \(f\) from an initial segment \(x_1 x_2 \ldots x_{i-1}\) of a sequence \(\sigma\) into \(\{0,1\}\), such that the selected subsequence \(\sigma' = \{x_i : f(x_1 \ldots x_{i - 1}) = 1\}\). Church’s proposal is that we admit only those place selections which are computable (total recursive) functions.[8] Church’s proposal applies equally to the sequences von Mises is concerned with, namely those with arbitrary non-zero outcome frequencies for each type of outcome; to get the random sequences, we again make the restriction to those normal binary sequences where the limit relative frequency of each outcome is \(\frac{1}{2}\) (these are sometimes called the Church stochastic sequences).
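To make this concrete, here is a minimal Python sketch of a computable place selection. The particular rule, keeping exactly those outcomes that immediately follow a 1, is an invented illustration, and a pseudorandom sequence stands in for a genuinely random one. Applied to a sequence that is random in the intended sense, the selected subsequence should show no change in the frequency of 1s.

```python
import random

def select_after_ones(bits):
    """A computable place selection: keep x_i exactly when the previous
    outcome was 1. The decision about position i uses only x_1 ... x_{i-1},
    never the value of x_i itself."""
    return [bits[i] for i in range(1, len(bits)) if bits[i - 1] == 1]

random.seed(1)
x = [random.randint(0, 1) for _ in range(200_000)]
sub = select_after_ones(x)

print(sum(x) / len(x))      # frequency of 1s in the whole sequence
print(sum(sub) / len(sub))  # frequency of 1s in the selected subsequence
```

Both frequencies come out close to \(\frac{1}{2}\); a rule that could push the second figure away from the first would amount to a gambling system.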
As Church points out, if we adopt the Church-Turing computable functions as the admissible place selections, it follows quickly that the set of admissible place selections is countably infinite. We may then show:
Theorem 1 (Doob-Wald). The set of random sequences forms a measure one subset of the Cantor space.
[Proof]
Thus von Mises’ conception of randomness was made mathematically robust (Martin-Löf 1969b). We can see that various properties of stochasticity follow from this characterisation. For example, we can show:
Corollary 1. Every von Mises-random sequence is Borel normal.
[Proof]
These successes for the approach to randomness based on the impossibility of gambling systems were, however, undermined by a theorem of Ville (1939):
Theorem 2 (Ville). For any countable set of place selections \(\{\phi_n\}\) (including the identity), there exists an infinite binary sequence \(x\) such that:
- for all \(n\), \(\lim_{m \rightarrow \infty}(\sum^{m}_{k=1}(\phi_n (x))_k)/m = \frac{1}{2}\); but
- for all \(m\), \((\sum^{m}_{k=1}x_{k})/m \gt \frac{1}{2}\).
That is, for any specifiable set of place selections, including the total recursive place selections proposed by Church as the invariantly appropriate set, there exist sequences which have the right limit relative frequencies to satisfy the strong law of large numbers (and indeed Borel normality), as do all their acceptably selected subsequences, but which are biased in all initial segments.[9]
Why should this be a problem for random sequences? The property of large numbers shows that almost all infinite binary sequences have limit digit frequency \(\frac{1}{2}\), but says nothing about how quickly this convergence happens or about the statistical properties of the initial segments. There certainly exist sequences that converge to \(\frac{1}{2}\) but where \(S_n (x)/n \gt \frac{1}{2}\) for all \(n\) (the sequence has the right mean but converges ‘from above’). In the ‘random walk’ model of our random sequences, where each element in the sequence is interpreted as a step left (if 0) or right (if 1) along the integer line, such a sequence would consist of a walk that (in the limit) ends up back at the origin but always (or even just eventually) stays to the right. Intuitively, such a sequence is not random.
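A toy illustration (this particular construction is mine, not Ville’s, and is itself transparently non-random): a single 1 followed by endless repetitions of 10 converges to frequency \(\frac{1}{2}\) from above, so the corresponding walk steps off to the right and never returns to the origin.

```python
def biased_from_above(n):
    """A single 1 followed by repetitions of '10': the digit frequency
    converges to 1/2 'from above', since every prefix contains strictly
    more 1s than 0s."""
    bits = [1]
    while len(bits) < n:
        bits += [1, 0]
    return bits[:n]

x = biased_from_above(10_001)

# Walk interpretation: 1 = one step right, 0 = one step left.
position = 0
min_position = float("inf")
for bit in x:
    position += 1 if bit else -1
    min_position = min(min_position, position)

print(sum(x) / len(x))  # overall frequency, very close to 1/2
print(min_position)     # the walk stays strictly right of the origin
```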
Such sequences do indeed violate at least one property of stochasticity, as it turns out that in a measure one set of sequences, \(S_n (x)/n\) will be above the mean infinitely many times, and below the mean infinitely many times. So stated, this is the law of symmetric oscillation (Dasgupta 2011: 13).[10] Since the law of symmetric oscillations holds for a measure one set of sequences, it is a plausible property of randomness (it is naturally of a family with other properties of stochasticity). Ville’s result shows that von Mises’ definition in terms of place selections cannot exactly characterise the random sequences, because it includes sequences that violate this law (and so don’t correspond to a genuinely random walk). Indeed, such sequences don’t even correspond to von Mises’ avowed aims. As Li and Vitányi say (2008: 54), ‘if you bet 1 all the time against such a sequence of outcomes, your accumulated gain is always positive’. As such, Ville-style sequences seem to permit successful gambling, despite the fact that they do not permit a system to be formulated in terms of place selections.
Von Mises and Church identified a class of sequences, those with limit frequencies invariant under recursive place selections, that satisfied a number of the measure one stochastic properties of sequences that are thought characteristic of randomness. But the class they identified was too inclusive. The next insight in this area was due to Martin-Löf (1966), who realised that rather than looking for another single property of sequences that would entail that the sequence met all the further conditions on randomness, it was simpler to adopt as a definition that a sequence is random iff the sequence has all the measure one properties of randomness that can be specified. Here again recursion theory plays a role, for the idea of a measure one property of randomness that can be specified is equivalent to the requirement that there be an effective procedure for testing whether a sequence violates the property. This prompts the following very bold approach to the definition of random sequences:
Martin-Löf Randomness:
A random sequence is one which cannot be effectively determined to violate a measure one randomness property (Downey and Hirschfeldt 2010: §5.2; Dasgupta 2011: §6.1; Porter 2016: 461–2).
Recalling the definition of effective measure zero from supplement B.2, Martin-Löf suggests that a random sequence is any that does not belong to any effective measure zero set of sequences, and thus belongs to every effective measure one set of sequences. An effective measure zero set of sequences will contain sequences which can be effectively determined to have a ‘special hallmark’ (for example, having a ‘1’ at every seventh place, or never having the string ‘000000’ occurring as a subsequence). It is part of von Mises’ insight that no random sequence will possess any of these effectively determinable special hallmarks: such hallmarks would permit exploitation as part of a gambling system. But Martin-Löf notices that all of the commonly used measure one properties of stochasticity are effective measure one. Any sequence which violates the property of large numbers, or the law of symmetric oscillations, etc., will do so on increasingly long initial subsequences. So the violation of any such property will also be a special hallmark of a non-random sequence, an indicator that the sequence which possesses it is an unusual one. Since the unusual properties of non-stochasticity in question are effective measure zero, we can therefore say that the random sequences are those which are not special in any effectively determinable way. To formalise this, Martin-Löf appeals to the language of significance testing. His main result is sometimes put as the claim that random sequences are those which pass all recursive significance tests for sequences (Schnorr 1971: §1)—they are never atypical enough to prompt us to reject the hypothesis that they are random. See supplement B.1.1 for more detail on this point.
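A crude finite analogue of a significance test (my own illustration, not Martin-Löf’s construction) is a frequency test: reject a sequence at level \(2^{-m}\) when its digit frequency deviates from \(\frac{1}{2}\) by more than a Hoeffding-style threshold, so that the rejected prefixes form a set of fair-coin probability at most \(2^{-m}\).

```python
import math
import random

def frequency_test_rejects(bits, m):
    """Reject (return True) at significance level 2**-m when the frequency
    of 1s deviates from 1/2 by more than a Hoeffding-style threshold.
    By Hoeffding's inequality, under the fair-coin measure the rejected
    sequences have probability at most 2 * exp(-2*n*t**2) = 2**-m."""
    n = len(bits)
    t = math.sqrt((m + 1) * math.log(2) / (2 * n))
    return abs(sum(bits) / n - 0.5) > t

random.seed(2)
fair = [random.randint(0, 1) for _ in range(10_000)]
biased = [1] * 10_000  # a sequence with an obvious 'special hallmark'

print(frequency_test_rejects(fair, m=20))
print(frequency_test_rejects(biased, m=20))
```

Passing this single test is of course far weaker than Martin-Löf randomness, which demands passing every such effective test at once.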
Note that the restriction to effective properties of sequences is crucial here. If we allowed, for example, the property of being identical to my favourite random sequence \(x\), that would define a test which the sequence \(x\) would fail, even though it is random. But it follows from our observations about von Mises randomness (which is still a necessary condition on randomness) that no effectively computable sequence is random (if it were, there would be a place selection definable from the algorithm that selected all the 1s in the sequence). So there is no effective test that checks whether a given sequence is identical to some random sequence.
The central result of Martin-Löf (1966) is the following:
Theorem 3 (Universal Tests and the Existence of Random Sequences). There is a universal test for ML-randomness; moreover, only a measure zero set of infinite binary sequences fails this test. So almost all such sequences are ML-random.
[Proof]
The universal test does define an effective measure one property, but (unlike normality, or having no biased admissible subsequences), it is far from a naturally graspable property. Nevertheless, Martin-Löf’s result does establish that there are random sequences that satisfy all the properties of stochasticity, and that in fact almost all binary sequences are random in that sense. Returning to Ville’s theorem 2, it can be shown that all ML-random sequences satisfy the law of symmetric oscillations (van Lambalgen 1987a: §3.3). Hence the kind of construction Ville uses yields vM-random sequences which are not ML-random. All the ML-random sequences have the right limit relative frequencies, since they satisfy the effective measure one property of large numbers. So Martin-Löf random sequences satisfy all the intuitive properties we would expect of a sequence produced by tosses of a fair coin, but are characterised entirely by reference to the effectively specifiable measure one sets of infinite sequences. We have therefore characterised random sequences entirely in terms of the explicit features of the product, and not of the process that may or may not lie behind the production of these sequences.
There are other accounts that develop and extend Martin-Löf’s account of randomness, in the same kind of framework, such as that of Schnorr (1971); for some further details, see supplement B.1.2.
For infinite binary sequences, the Martin-Löf definition in terms of effective tests is a robust and mathematically attractive notion. However, it seems to have the major flaw that it applies only to infinite binary sequences. (Since finiteness of a sequence is effectively positively decidable, and the set of all finite sequences is measure zero, every finite sequence violates an effective measure one randomness property.) Yet ordinarily we are happy to characterise even quite small finite sequences of outcomes as random. As mentioned above (§2), there is room for doubt about our ability to do so correctly, as we seem to be prone to mischaracterise sequences we are presented with, and perform poorly when asked to produce our own random sequences. However, there is nothing in this literature to suggest that we are fundamentally mistaken in applying the notion of randomness to finite sequences at all. So one might think this shows that the Martin-Löf approach is too restrictive.
Yet there is something in the idea of ML-randomness that we might apply profitably to the case of finite sequences. Since being generated by an effective procedure is a measure zero property of infinite sequences, given that there are only countably many effective procedures, it follows immediately that no ML-random sequence can be effectively produced. This fits well with the intuitive idea that random sequences don’t have the kind of regular patterns that any finite algorithm, no matter how complex, must exploit in order to produce an infinite sequence. This contrast between random sequences which lack patterns that enable them to be algorithmically generated, and non-random sequences which do exhibit such patterns, does not apply straightforwardly to the finite case, because clearly there is an effective procedure which enables us to produce any particular finite sequence of outcomes—simply to list those outcomes in the specification of the algorithm. But a related contrast does exist—between those algorithms which are simply crude lists of outcomes, and those which produce their outcomes by exploiting patterns and regularities in the outcome sequence. This leads us to the idea that finite random sequences, like their infinite cousins, are not able to be generated by an algorithm which exploits patterns in the outcome sequence. The outcomes in random sequences are thus patternless, or disorderly, in a way that is intuitively characteristic of random sequences.
Disorderly sequences, in the above sense, are highly incompressible. The best effective description we can give of such a sequence—one that would enable someone else, or a computer, to reliably reproduce it—would be to simply list the sequence itself. This feature allows us to characterise the random sequences as those which cannot be produced by a compact algorithm (compact with respect to the length of the target sequence, that is). Given that algorithms can be specified by a list of Turing machine instructions, we have some basic idea of how to characterise the length of an algorithm. We can then say that a random sequence is one such that the shortest algorithm which produces it is approximately (to be explained below) the same length as the sequence itself—no greater compression in the algorithm can be attained. This proposal, suggested by the work of Kolmogorov, Chaitin and Solomonoff (KCS), characterises randomness as the algorithmic or informational complexity of a sequence. Comprehensive surveys of complexity and the complexity-based approach to randomness are Li and Vitányi 2008 and Downey and Hirschfeldt 2010: Part I. (See also Chaitin 1975, Dasgupta 2011: §7, Downey et al. 2006: §§1–3, Earman 1986: 141–7, Kolmogorov 1963, Kolmogorov and Uspensky 1988, Smith 1998: ch. 9 and van Lambalgen 1995.)
If \(f\) is effectively computable—a recursive function—let us say that \(\delta\) is an \(f\)-description of a finite string \(\sigma\) iff \(f\) yields \(\sigma\) on input \(\delta\). We may define the \(f\)-complexity of a string \(\sigma\), \(C_f (\sigma)\), as the length of the shortest string \(\delta\) that \(f\)-describes \(\sigma\). If there is no such \(\delta\), let the \(f\)-complexity of \(\sigma\) be infinite. \(f\) is thus a decompression algorithm, taking the compressed description \(\delta\) back to the original string \(\sigma\). There are obviously many different kinds of decompression algorithm. One boring case is the identity function (the empty program), which takes each string to itself. The existence of this function shows that there are decompression algorithms \(f\) which have a finite \(f\)-complexity for any finite string. Any useful decompression algorithm will, however, yield an output string significantly longer than the input description, for at least some input descriptions.
One example is this algorithm: on input of a binary string \(\delta\) of length \(4n\), the algorithm breaks the input down into \(n\) blocks of 4, which it turns into an output sequence \(\sigma\), as follows. Given a block \(b_1 , \ldots , b_4\), it produces a block of the symbol contained in \(b_1\), the length of which is governed by the binary numeral \(b_2 b_3 b_4\). So the block 1101 produces a string of five 1s. The output sequence is obtained by concatenating the output of successive blocks in order. Every string \(\sigma\) can be represented by this algorithm, since the string \(\sigma '\) which involves replacing every 1 in \(\sigma\) by 1001, and every 0 by 0001, will yield \(\sigma\) when given as input to this algorithm. So this algorithm has finite complexity for any string. But this algorithm can do considerably better: if the original string, for example, is a string of fourteen 1s, it can be obtained by input of the description 11111111, which is little more than half the length. Indeed, as reflection on this algorithm shows, this algorithm can reconstruct an original string from a shorter description, for many strings, particularly if they contain reasonably long substrings of consecutive 1s or 0s.
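The decompression algorithm just described is simple enough to implement directly; the following sketch (the function name is mine) reproduces the worked examples.

```python
def decode(description):
    """Toy decompression algorithm from the text: read 4-bit blocks
    b1 b2 b3 b4 and emit the symbol b1 repeated int(b2 b3 b4, 2) times."""
    assert len(description) % 4 == 0
    output = []
    for i in range(0, len(description), 4):
        block = description[i:i + 4]
        symbol, run_length = block[0], int(block[1:], 2)
        output.append(symbol * run_length)
    return "".join(output)

# The block 1101 produces a string of five 1s.
print(decode("1101"))

# Encoding each bit of '101' as 1001 / 0001 recovers the original string.
print(decode("100100011001"))

# A run of fourteen 1s decompresses from an 8-bit description.
print(decode("11111111"))
```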
However, there is a limit on how well any algorithm can compress a string. If \(\lvert\sigma\rvert\) is the length of \(\sigma\), say that a string \(\sigma\) is compressed by \(f\) if there is an \(f\)-description \(\delta\) such that \(\sigma = f(\delta)\) and \(\lvert\delta\rvert \lt \lvert\sigma\rvert\). If a useful decompression algorithm is such that for some fixed \(k\), \(\lvert f(\delta)\rvert \ge \lvert\delta\rvert + k\), so that \(f\)-descriptions are at least \(k\) shorter than the sequence to be compressed, then it follows that very few strings usefully compress. For there are \(2^l\) strings \(\sigma\) such that \(\lvert\sigma\rvert = l\); so there are at most \(2^{l-k}\) \(f\)-descriptions; since \(f\) is a function, there are at most \(2^{l-k}\) compressible strings. As a proportion of all strings of length \(l\), then, there are at most \(2^{l-k}/2^{l} = (\frac{1}{2})^{k}\) compressible strings. This means that as the amount of compression required increases, the number of sequences so compressible decreases exponentially. Even for the most pitiful amount of compression, \(k = 1\), we see that at most half the strings of a given length can be compressed by any algorithm \(f\).
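The counting bound is easy to verify by brute force for a small length. The sketch below fixes an arbitrary decompression function (a pseudorandomly chosen one, which is immaterial: the bound depends only on counting descriptions, not on which function is used) and checks that at most a fraction \((\frac{1}{2})^k\) of the length-\(l\) strings receive a description \(k\) bits shorter.

```python
import itertools
import random

l, k = 10, 3
random.seed(0)

def all_strings(length):
    return ["".join(bits) for bits in itertools.product("01", repeat=length)]

# An arbitrary decompression function f: map every description of length
# l - k to some pseudorandomly chosen output string of length l.
descriptions = all_strings(l - k)
f = {d: "".join(random.choice("01") for _ in range(l)) for d in descriptions}

# The length-l strings that f compresses by k bits: at most one per description.
compressed = {f[d] for d in descriptions}
fraction = len(compressed) / 2**l

print(len(compressed), fraction)
```

However \(f\) is chosen, `compressed` can contain at most \(2^{l-k}\) strings, i.e. a fraction \((\frac{1}{2})^k\) of all \(2^l\) strings of length \(l\).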
So our interest must be in those decompression functions which do best overall. We might hope to say: \(f\) is better than \(g\) iff for all \(\sigma\), \(C_f (\sigma) \le C_g (\sigma)\). Unfortunately, no function is best in this sense, since for any given string \(\sigma\) with \(f\)-complexity \(\lvert\sigma\rvert - k\), we can design a function \(g\) as follows: on input 1, output \(\sigma\); on input \(0\delta\) (for any \(\delta\)), output \(f(\delta)\). (Generalising, we can add arbitrarily long prefixes of length \(m\) onto the inputs to \(g\) and have better-than-\(f\) compression for \(2^m\) sequences.) But we can define a notion of complexity near-superiority: \(f\) is complexity near-superior to \(g\) iff there is some constant \(k\) such that for any string, \(C_f (\sigma) \le C_g (\sigma) + k\). \(f\) is at least as good as \(g\), subject to some constant which is independent of the string in question. If \(f\) and \(g\) are both complexity near-superior to each other, for the same \(k\), we say they are complexity equivalent.
Kolmogorov (1965) showed that there is an optimal decompression algorithm:
Theorem 4 (Kolmogorov). There exists a decompression algorithm which is near-superior to any other program. Moreover, any such optimal algorithm is complexity equivalent to any other optimal algorithm (see also Chaitin 1966 and Martin-Löf 1969a).
[Proof]
Such a universal function Kolmogorov called asymptotically optimal (for as \(\lvert\sigma\rvert\) increases, the constant \(k\) becomes asymptotically negligible).
Choose some such asymptotically optimal function \(u\), and define the complexity (simpliciter) \(C(\sigma) = C_u (\sigma)\). Since \(u\) is optimal, it is near-superior to the identity function; it follows that there exists a \(k\) such that \(C(\sigma) \le \lvert\sigma\rvert + k\). On the other hand, we also know that the proportion of strings for which \(C(\sigma) \le \lvert\sigma\rvert - k\) is at most \((\frac{1}{2})^k\). We know therefore that all except a fraction \((\frac{1}{2})^k\) of the sequences of length \(n\) have a complexity within \(k\) of \(n\). As \(n\) increases, for fixed large \(k\), therefore, we see that almost all sequences have complexity of approximately their length. All this can be used to make precise the definition of randomness sketched above.
Kolmogorov random:
We say that a sequence \(\sigma\) is Kolmogorov-random iff \(C(\sigma) \approx \lvert\sigma\rvert\).
It follows from what we have just said that there exist random sequences for any chosen length \(n\), and that as \(n\) increases with respect to \(k\), random sequences come to be the overwhelming majority of sequences of that length.
Theorem 5. A random sequence of a given length cannot be effectively produced.
[Proof]
An immediate corollary is that the complexity function \(C\) is not a recursive function. If it were, for any \(n\), we could effectively compute \(C(\sigma)\) for any \(\sigma\) of length \(n\). By simply listing all such sequences, we could halt after finding the first \(\sigma\) for which \(C(\sigma) \ge n\). But then we could effectively produce a random sequence, contrary to theorem 5.
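Because \(C\) is not computable, practical illustrations must settle for computable upper bounds on it. A minimal sketch, using zlib compressed length as a stand-in (an assumption of this illustration, not part of the theory): a patterned string compresses well, while a haphazard one does not.

```python
import os
import zlib

def compressed_length(data):
    """Length in bytes of a zlib compression of data: a crude, computable
    upper-bound stand-in for Kolmogorov complexity, which is uncomputable."""
    return len(zlib.compress(data, level=9))

patterned = b"01" * 5000          # highly regular: a short program generates it
haphazard = os.urandom(10_000)    # almost certainly incompressible

print(len(patterned), compressed_length(patterned))
print(len(haphazard), compressed_length(haphazard))
```

No such bound certifies randomness: a string may have a short description that zlib, or any particular compressor, fails to find.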
The notion of Kolmogorov randomness fits well with the intuitions about the disorderliness of random sequences we derived from the Martin-Löf account. It also fits well with other intuitions about randomness—random sequences don’t have a short description, so there is no sense in which they are produced in accordance with a plan. As such, Kolmogorov randomness also supports von Mises’ intuitions about randomness being linked to the impossibility of gambling systems, as there will be no way of effectively producing a given random sequence of outcomes using a set of initially given data any smaller than the sequence itself. There is no way of predicting a genuinely random sequence in advance because no random sequence can be effectively produced, yet every predictable sequence of outcomes can (intuitively) be generated by specifying the way in which future outcomes can be predicted on the basis of prior outcomes. Moreover, because for increasing \(k\) the number of strings of length \(n\) which are random increases, and because for increasing \(n\) we can choose larger and larger \(k\), there is some sense in which the great majority of sequences are random; this matches well the desiderata in the infinite case that almost all sequences should be random. Finally, it can be shown that the Kolmogorov randomness of a sequence is equivalent to that sequence passing a battery of statistical tests, in the Martin-Löf sense—indeed, that the Kolmogorov random sequences are just those that pass a certain universal test of non-randomness (Martin-Löf 1969a: §2).[11]
The plain Kolmogorov complexity measure is intuitively appealing. Yet the bewildering variety of permitted \(f\)-descriptions includes many unmanageable encodings. In particular, for a given decompression algorithm \(f\), there are \(f\)-descriptions \(\gamma\) and \(\delta\) such that \(\delta = \gamma\unicode{x2040}\tau\), for some string \(\tau\). This is an inefficient encoding, because if \(\gamma\) can occur both as a code itself, and as the initial part of another code, then an algorithm cannot decode its input string ‘on the fly’ as soon as it detects a comprehensible input, but must wait until it has scanned and processed the entire input before beginning to decode it. An efficient coding, such that no acceptable input is an initial substring of another acceptable input, is called prefix-free (because no member is a prefix of any other member). A good example of an encoding like this is the encoding of telephone numbers: the telephone exchange can, on input of a string of digits that it recognises, immediately connect you; once an acceptable code from a prefix-free set has been input, no other acceptable code can follow it.
Prefix-free encodings are useful for a number of practical purposes, and they turn out to be useful in defining randomness also. (As we will see in §2.3, they are of special importance in avoiding a problem in the definition of infinite Kolmogorov random sequences.) The change is the natural one: we appeal, not to the plain complexity of a sequence in defining its randomness, but to the so-called prefix-free complexity (Downey and Hirschfeldt 2010: §2.5ff; Dasgupta 2011: §8).
To fix ideas, it is useful to have an example of a prefix-free encoding in mind. Suppose we have a string \(\sigma = x_1 \ldots x_k\) of length \(k\). This is the initial part of the string \(\sigma 1\), so if any string was an acceptable input, we would not have a prefix-free encoding. But if the code contained information about the length of the string encoded, we would know that the length of \(\sigma\), \(k\), is less than the length of \(\sigma 1\). We can make this idea precise as follows (using a code similar to that used to a different end in the proof of Theorem 4). Let the code of \(\sigma\) be the string \(1^{[\lvert\sigma\rvert]}0\sigma\)—that is, the code of a string consists of a representation of the length of the string, followed by a 0, followed by the string. This is clearly a prefix-free encoding.[12] This coding is not particularly efficient, but more compact prefix-free encodings do exist.
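To make the example concrete, here is a minimal Python sketch (mine, not the entry’s) of this code: the code of \(\sigma\) is \(\lvert\sigma\rvert\) ones, then a 0, then \(\sigma\); because the set of codes is prefix-free, a decoder can recover each string from a concatenated stream on the fly.

```python
def encode(s: str) -> str:
    """Prefix-free code of a binary string s: len(s) ones, a zero, then s."""
    return "1" * len(s) + "0" + s

def decode_stream(stream: str):
    """Decode a concatenation of codes on the fly, yielding each string."""
    i = 0
    while i < len(stream):
        # Count the leading 1s: this is the length of the next string.
        n = 0
        while stream[i] == "1":
            n += 1
            i += 1
        i += 1  # skip the separating 0
        yield stream[i:i + n]
        i += n

# Because the code is prefix-free, concatenated codes decode unambiguously,
# with no need to scan ahead.
msg = encode("101") + encode("0") + encode("1101")
print(list(decode_stream(msg)))  # → ['101', '0', '1101']
```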
The notion of prefix-free complexity is defined in exactly the same way as plain complexity, with the additional restriction that the allowable \(f\)-descriptions of a string, given a decompression function \(f\), must form a prefix-free set. With an appropriate choice of coding, we can get a set of \(f\)-descriptions which is monotonic with increasing length, i.e., if \(\lvert\gamma\rvert \lt \lvert\delta\rvert\) then \(\lvert f(\gamma)\rvert \lt \lvert f(\delta)\rvert\). Our definition goes much as before: the prefix-free Kolmogorov complexity of a string \(\sigma\), given a decompression function \(f\) with a prefix-free domain, denoted \(K_f (\sigma)\), is the length of the shortest \(f\)-description of \(\sigma\) (and infinite otherwise). Since \(1^{[\lvert\sigma\rvert]}0\sigma\) is a finite prefix-free code for \(\sigma\), we know there are at least some prefix-free decompression algorithms with finite \(K_f\) for every string. As before, we can show there exist better decompression algorithms than this one, and indeed, that there exists a universal prefix-free decompression algorithm \(u\), such that for every other algorithm \(m\) there is a \(k\) such that \(K_u (\sigma) \le K_m (\sigma) + k\), for all \(\sigma\) (Downey and Hirschfeldt 2010: §2.5). We define \(K(\sigma) = K_u (\sigma)\).
Since the set of prefix-free codes is a subset of the set of all possible codes, we should expect generally that \(C(\sigma) \le K(\sigma)\). On the other hand, we can construct a universal prefix-free algorithm \(u\) as follows. A universal Turing machine \(u'\) takes as input the Gödel number of the Turing machine we wish to simulate, and the input we wish to give to that machine. Let us concatenate these two inputs into a longer input string that is itself uniquely readable; and we then encode that longer string into our prefix-free encoding. The encoding is effectively computable, clearly, so we can chain a decoding machine together with our universal machine \(u'\); on input of an acceptable prefix-free string, the decoder will decompose it into the input string, we then decompose the input string into the Gödel number and the input, and run the machine \(u'\) on that pair of inputs. Depending on the particular encoding we choose, we may establish various bounds on \(K\); one obvious bound we have already established is that \(C(\sigma) \le K(\sigma) \lt C(\sigma) + 2\lvert\sigma\rvert\). By using a more efficient prefix-free coding of a \(u'\)-description, we can establish better bounds.[13] (Some more results on the connection between \(K\) and \(C\) are in Downey and Hirschfeldt 2010: §§3.1–3.2.)
With prefix-free complexity in hand, we may define:
Prefix-free Kolmogorov Randomness:
A string \(\sigma\) is Prefix-free Kolmogorov random iff \(K(\sigma) \ge \lvert\sigma\rvert\) (modulo an additive constant).
Again, there do exist prefix-free random sequences, since we know that there are plain random sequences, and given the greater length of a prefix-free encoding, we know that the prefix-free code of an ordinary random sequence will generally be longer than an arbitrary code of it, and thus random too. Indeed, there will be more prefix-free random sequences, because strings compress less effectively under \(K\) than \(C\). Yet \(K\) and \(C\) behave similarly enough that the success of plain Kolmogorov complexity at capturing our intuitions about randomness carries over to prefix-free Kolmogorov randomness, and the label ‘Kolmogorov random’ has come to be used generally to refer to prefix-free Kolmogorov random sequences.
Both plain and prefix-free Kolmogorov randomness provide satisfactory accounts of the randomness of finite sequences. One difficulty arises, as I suggested earlier, when we attempt to extend plain Kolmogorov randomness to the case of infinite sequences in the most obvious way, that is, by defining an infinite sequence as Kolmogorov random iff all finite initial segments are Kolmogorov random. It would then turn out that no infinite sequence is random. Why? Because of the following theorem, which shows that there is no sequence such that all of its initial segments are random:
Theorem 6 (Martin-Löf 1966). For any sufficiently long string, there will always be some fairly compressible initial segments. (See also Li and Vitányi 2008: §2.5.1 and Downey and Hirschfeldt 2010: §2.1.)
[Proof]
This dip in complexity of an initial subsequence will occur infinitely often in even a random infinite sequence, a phenomenon known as complexity oscillation (Li and Vitányi 2008: §2.5.1). This phenomenon means that ‘it is difficult to express a universal sequential test precisely in terms of \(C\)-complexity’ (Li and Vitányi 2008: 151), and the best that can be precisely done is to find upper and lower bounds, expressible in terms of ordinary Kolmogorov complexity, between which the set of ML-random sequences falls (Li and Vitányi 2008: §2.5.3).
However, the phenomenon of complexity oscillation does not pose as significant a problem for prefix-free Kolmogorov complexity. Complexity oscillation does arise, but in fact the inefficiency of prefix-free encodings is a benefit here: ‘\(K\) exceeds \(C\) by so much that the complexity of the prefix does not drop below the length of the prefix itself (for random infinite \(\omega\))’ (Li and Vitányi 2008: 221). That is, while the complexity of some initial segments dips down, it always remains greater than the length of the prefix. So it can be that, uniformly, when \(x\) is an infinite sequence, for any of its initial subsequences \(\sigma\), \(K(\sigma) \ge \lvert\sigma\rvert\). This suggests that we can extend prefix-free Kolmogorov randomness to the infinite case in the straightforward way: an infinite sequence \(x\) is prefix-free Kolmogorov random iff every finite initial subsequence is prefix-free Kolmogorov random.
With this definition in hand we obtain a very striking result. The class of infinite prefix-free Kolmogorov random sequences is certainly non-empty. Indeed: it is just the class of ML-random sequences!
Theorem 7 (Schnorr). A sequence is ML-random iff it is prefix-free Kolmogorov random.
[Proof]
Schnorr’s theorem is evidence that we really have captured the intuitive notion of randomness. Different intuitive starting points have generated the same set of random sequences. This has been taken to be evidence that ML-randomness, or equivalently (prefix-free) Kolmogorov randomness, is really the intuitive notion of randomness, in much the same way as the coincidence of Turing machines, Post machines, and recursive functions was taken to be evidence for Church’s Thesis, the claim that any one of these notions captures the intuitive notion of effective computability. Accordingly, Delahaye (1993) has proposed the Martin-Löf-Chaitin Thesis, that either of these definitions captures the intuitive notion of randomness. If this thesis is true, this undermines at least some sceptical contentions about randomness, such as the claim of Howson and Urbach (1993: 324) that ‘it seems highly doubtful that there is anything like a unique notion of randomness there to be explicated’.
There are some reasons to be suspicious of the Martin-Löf-Chaitin Thesis, despite the mathematically elegant convergence between these two mathematical notions. For one, there is quite a bit of intuitive support for accounts of randomness which do not make it primarily a property of sequences, and those other accounts are no less able to be made mathematically rigorous (see especially the ‘epistemic’ theories of randomness discussed in §6.2, as well as theories of randomness as indeterminism discussed in §7.2). The existence of other intuitive notions makes the case of randomness rather unlike the supposedly analogous case of Church’s Thesis, where no robust alternative characterisation of effective computability is available.
Even if we accept that randomness, like disorder, is at root a product notion, there are a number of candidates in the vicinity of the set identified by Schnorr’s theorem that might also deserve to be called the set of random sequences. Most obviously, there is Schnorr’s own conception of randomness (§2.1.2; supplement B.1.2). Schnorr (1971) suggests that, for technical and conceptual reasons, Schnorr randomness is to be preferred to Martin-Löf randomness as an account of the intuitive notion. While results that parallel the convergence of ML-randomness and Kolmogorov randomness have been given (Downey and Griffiths 2004), the relevant compressibility notion of randomness for Schnorr randomness was not known until quite recently, and is certainly less intuitively clear than Kolmogorov randomness. Moreover, since the set of ML-random sequences is a strict subset of the set of Schnorr random sequences, any problematic members of the former are equally problematic members of the latter; and of course there will be Schnorr random sequences which fail some Martin-Löf statistical test, which might lead some to reject the viability of Schnorr’s notion from the start.
Schnorr’s result showing the convergence between prefix-free Kolmogorov complexity and Martin-Löf randomness is very suggestive. As has become clear, the existence of other notions of randomness—including Schnorr randomness, as well as a number of other proposals (Li and Vitányi 2008: §2.5; Porter 2016: 464–6)—shows that we should be somewhat cautious in yielding to its suggestion.
This is especially true in light of a recent argument by Porter (2016: 469–70). He considers a certain schematic characterisation of computable functions, something like this: for every computable function \(f\) with property \(P_i\), \(f\) is differentiable at \(x\) iff \(x\) corresponds to a random\(_i\) sequence. It turns out that for each sense of randomness (ML-randomness, Schnorr randomness, computable randomness, etc.), there is some corresponding property of computable functions. Most importantly, none of these properties looks overwhelmingly more natural or canonical than the others. For example, computable functions of bounded variation are differentiable at the Martin-Löf random points, while nondecreasing computable functions are differentiable at the computably random points. The difficulty for the Martin-Löf-Chaitin Thesis is this: these results give us the notion of a typical sequence, with respect to the differentiability of various kinds of function. Unfortunately, these notions of typical sequence diverge from one another. Unlike Church’s Thesis, where all the notions of effective computability line up, here we have a case where the various notions of a typical sequence do not line up with each other (though there is significant overlap). Porter concludes that ‘no single definition of randomness can do the work of capturing every mathematically significant collection of typical points’ (Porter 2016: 471).
That conclusion may well be justified. But we can largely sidestep the dispute over whether there is a single precise notion of randomness that answers perfectly to our intuitive conception of random sequence. Kolmogorov-Martin-Löf randomness is a reasonable and representative exemplar of the algorithmic approach to randomness, and it overlaps almost everywhere with any other plausible definition of randomness. It is adopted here as a useful working account of randomness for sequences. None of the difficulties and problems I raise below for the connection between random sequences and chance turns in any material way on the details of which particular set of sequences gets counted as random (most are to do with the mismatch between the process notion of chance and any algorithmic conception of randomness, with differences amongst the latter being relatively unimportant). So while the observations below are intended to generalise to Schnorr randomness and other proposed definitions of random sequences, I will explicitly treat only KML randomness in what follows.
The notions of chance and randomness discussed and clarified in the previous two sections are those that have proved scientifically and philosophically most fruitful. Whatever misfit there may be between ordinary language uses of these terms and these scientific precisifications is made up for by the usefulness of these concepts. This is particularly so with the notion of randomness, chance in philosophers’ mouths being much closer to what we ordinarily take ourselves to know about chance. On these conceptions, randomness is fundamentally a product notion, applying in the first instance to sequences of outcomes, while chance is a process notion, applying in the single case to the process or chance setup which produces a token outcome. Of course the terminology in common usage is somewhat slippery; it’s not clear, for example, whether to count random sampling as a product notion, because of the connection with randomness, or as a process notion, because sampling is a process. The orthodox view of the process, in fact, is that it should be governed by a random sequence; we enumerate the population, and sample an individual \(n\) just in case the \(n\)-th outcome in a pre-chosen random sequence is 1. (Of course the sample thus selected may not be random in some intuitive sense; nevertheless, it will not be biased because of any defect in the selection procedure, but rather only due to bad luck.)
With these precise notions in mind, we can return to the Commonplace Thesis CT connecting chance and randomness. Two readings make themselves available, depending on whether we take single outcomes or sequences of outcomes to be primary:
\(\mathbf{CTa}\):
A sequence of outcomes happens by chance iff that sequence is random.

\(\mathbf{CTb}\):
An outcome happens by chance iff there is a random sequence of outcomes including it.
Given the standard probability calculus, any sequence of outcomes is itself an outcome (in the domain of the chance function defined over a \(\sigma\)-algebra of outcomes, as in standard mathematical probability); so we may without loss of generality consider only (CTb). But a problem presents itself if we consider chancy outcomes in possible situations in which only very few events ever occur. It may be that the events which do occur, by chance, are all of the same type, in which case the sequence of outcomes won’t be random. This problem is analogous to the ‘problem of the single case’ for frequency views of chance (Hájek 2012: §3.3), because randomness is, like frequency, a property of an outcome sequence. The problem arises because the outcomes may be too few or too orderly to properly represent the randomness of the entire sequence of which they are part (all infinite random sequences have at least some non-random initial subsequences). The most common solution in the case of frequentism was to opt for a hypothetical outcome sequence—a sequence of outcomes produced under the same conditions with a stable limit frequency (von Mises 1957: 14–5). Likewise, we may refine the Commonplace Thesis as follows:
\(\mathbf{RCT}\):
An outcome happens by chance iff, were the trial which generated that outcome repeated often enough under the same conditions, we would obtain a random sequence including the outcome (or of which the outcome is a subsequence).
Here the idea is that chancy outcomes would, if repeated often enough, produce an appropriately homogeneous sequence of outcomes which is random. If the trial is actually repeated often enough, this sequence should be the actual sequence of outcomes; the whole point of Kolmogorov randomness was to permit finite sequences to be random.
RCT is intuitively appealing, even once we distinguish process and product randomness in the way sketched above. It receives significant support from the fact that fair coins, tossed often enough, do in our experience invariably give rise to random sequences, and that the existence of a random sequence of outcomes is compelling evidence for chance. The truth of RCT explains this useful constraint on the epistemology of chance, since if we saw an actual finite random sequence, we could infer that the outcomes constituting that sequence happened by chance. However, in the next two sections, we will see that there are apparent counterexamples even to RCT, posing grave difficulties for the Commonplace Thesis. In §4 we will see a number of cases where there are apparently chancy outcomes without randomness, while in §5 we will see cases of apparent randomness where there is no chance involved.
A fundamental problem with RCT seems to emerge when we consider the fate of hypothetical frequentism as a theory of chance. For there seems to be no fact of the matter, in a case of genuine chance, as to what sequence of outcomes would result: as Jeffrey (1977: 193) puts it, ‘there is no telling whether the coin would have landed head up on a toss that never takes place. That’s what probability is all about.’ Many philosophers (e.g., Hájek 2009) have followed Jeffrey in scepticism about the existence and tractability of the hypothetical sequences apparently required by the right hand side of RCT. However, there is some reason to think that RCT will fare better than hypothetical frequentism in this respect. In particular, RCT does not propose to analyse chance in terms of these hypothetical sequences, so we can rely on the law of large numbers to guide our expectation that chance processes produce certain outcome frequencies, in the limit, with probability 1; this at least may provide some reason for thinking that the outcome sequences will behave as needed for RCT to turn out true. Even so, one might suspect that many difficulties for hypothetical frequentism will recur for RCT. However, these difficulties stem from general issues with merely possible evidential manifestations of chance processes, and have nothing specifically to do with randomness. The objections canvassed below are, by contrast, specifically concerned with the interaction between chance and randomness. So these more general potential worries will be set aside, though they should not be forgotten—they may even be, in the end, the most significant problem for RCT (if the right kind of merely possible sequence doesn’t exist, we must retreat to CTa or CTb and their problems).
It is possible for a fair coin—i.e., such that the chances of heads and tails are equal—to be tossed infinitely many times, and to land heads on every toss. An infinite sequence of heads has, on the standard probability calculus, zero chance of occurring. (Indeed, it has zero chance even on most non-standard views of probability: Williamson 2007.) Nevertheless, if such a sequence of outcomes did occur, it would have happened by chance—assuming, plausibly, that if each individual outcome happens by chance, the complex event composed by all of them also happens by chance. But in that case we would have an outcome that happened by chance, and yet the obvious suitable sequence of outcomes is not KML-random. This kind of example exploits the fact that while random sequences are a measure one set of possible outcome sequences of any process, chancy or otherwise, measure one does not mean every.
This counterexample can be resisted. For while it is possible that a fair coin lands only heads when tossed infinitely many times, it may not be that this all-heads outcome sequence is a suitable sequence. For if we consider the counterfactual involved in RCT—what would happen, if a fair coin were tossed infinitely many times—we would say: it would land heads about half the time. That is (on the standard, though not uncontroversial, Lewis-Stalnaker semantics for counterfactuals: Lewis 1973), though the all-heads outcome sequence is possible, it does not occur at any of the nearest possibilities in which a fair coin is tossed infinitely many times.
If we adopt a non-reductionist account of chance, this line of resistance is quite implausible. For there is nothing inconsistent on such views about a situation where the statistical properties of the occurrent sequence of outcomes and the chance diverge arbitrarily far, and it seems that such possibilities are just as close in relevant respects as those where the occurrent outcome statistics reflect the chances. In particular, as the all-heads sequence has some chance of coming to pass, there is (by BCP) a physical possibility sharing history and laws with our world in which all-heads occurs. This looks like a legitimately close possibility to our own.
Prospects look rather better on a reductionist view of chance (Supplement A.3). On such a view, we can say that worlds where an infinite sequence of heads does occur at some close possibility will look very different from ours; they differ in law or history from ours. In such worlds, the chance of heads is much closer to 1 (reflecting the fact that if a coin were tossed infinitely many times, it might well land heads on each toss)—the coin is not after all fair. The response, then, is that in any situation where the reductionist chance of heads really is 0.5, suitable outcome sequences in that situation or its nearest neighbours are in fact all unbiased with respect to outcome frequencies. That is to say, they at least satisfy the law of large numbers; and arguably they can be expected to meet other randomness properties also. So, on this view, there is no counterexample to RCT from the mere possibility of these kinds of extreme outcome sequences. This response depends on the success of the reductionist similarity metrics for chancy counterfactuals developed by Lewis (1979a) and Williams (2008); the latter construction, in particular, invokes a close connection between similarity and randomness. (Lewis’ original construction is criticised in Hawthorne 2005.)
However, we needn’t use such an extreme example to make the point. For the same phenomenon exists with almost any unrepresentative outcome sequence. A fair coin, tossed 1000 times, has a positive chance of landing heads more than 700 times. But any outcome sequence of 1000 tosses which contains more than 700 heads will be compressible (long runs of heads are common enough to be exploited by an efficient coding algorithm, and 1000 outcomes is long enough to swamp the constants involved in defining the universal prefix-free Kolmogorov complexity). So any such outcome sequence will not be random, even though it quite easily could come about by chance. The only way to resist this counterexample is to refuse to acknowledge that such a sequence of outcomes can be an appropriate sequence in RCT. This is implausible, for such sequences can be actual, and can be sufficiently long to avoid the analogue of the problem of the single case, certainly long enough for the Kolmogorov definition of randomness to apply. The only reason to reject such sequences as suitable is to save RCT, but that is clearly question-begging in this context. In this case of an unrepresentative finite sequence, even reductionism about chance needn’t help, because it might be that other considerations suffice to fix the chance, and so we can have a genuine fair chance but a biased and non-random outcome sequence.
Any given token event is an instance of many different types of trial:
It is obvious that every individual thing or event has an indefinite number of properties or attributes observable in it, and might therefore be considered as belonging to an indefinite number of different classes of things… (Venn 1876: 194)
Suppose Lizzie tosses a coin on Tuesday; this particular coin toss may be considered as a coin toss; a coin toss on a Tuesday; a coin toss by Lizzie; an event caused by Lizzie; etc. Each of these ways of typing the outcome gives rise to a different outcome sequence; some of these sequences may be random, while others are not. Each of these outcome sequences is unified by a homogeneous kind of trial; as such, they may all be suitable sequences to play a role in RCT.
This is not a problem if chance too is relative to a type of trial, for we may simply make the dependence on choice of reference class explicit in both sides of RCT. If chances were relative frequencies, it would be easy enough to see why chance is relative to a type of trial. But chances aren’t frequencies, and single-case chance is almost universally taken to be not only well-defined for a specific event, but unique for that event. We naturally speak of the chance that this coin will land heads on its next toss, with the chance taken to be a property of the possible outcome directly, and not mediated by some particular description of that outcome as an instance of this or that kind of trial. Moreover, for chance to play its role in the Principal Principle (§1 and Supplement A.1), there must be a unique chance for a given event that is to guide our rational credence, as we have only one credence in a particular proposition stating the occurrence of that event. (Indeed, the inability of frequentists to single out a unique reference class, the frequency in which is the chance, was taken to be a decisive objection to frequentism.) On the standard understanding of chance, then, there is a mismatch between the left and right sides of the RCT. And this gives rise to a counterexample to RCT, if we take an event with a unique non-trivial single-case chance, but such that at least one way of classifying the trial which produced it is such that the sequence of outcomes of all trials of that kind is not random. The trivial case might be this: a coin is tossed and lands heads. That event happened by chance, yet it is of the type ‘coin toss which lands heads’, and the sequence consisting of all outcomes of that type is not random.
The natural response—and the response most frequentists offered, with the possible exception of von Mises[14]—was to narrow the available reference classes. (As noted in Supplement A.3, many frequentists were explicit that chances were frequencies in repetitions of natural kinds of processes.) Salmon (1977) appeals to objectively homogeneous reference classes (those which cannot be partitioned by any relevant property into subclasses which differ in attribute frequency from the original reference class). Salmon’s proposal, in effect, is that homogeneous reference classes are random sequences, the evident circularity of which will hardly constitute a reply to the present objection. Reichenbach (1949: 374) proposed to ‘proceed by considering the narrowest class for which reliable statistics can be compiled’, which isn’t circular, but which fails to respond to the objection since it provides no guarantee that there will be only one such class. There may well be multiple classes which are equally ‘narrow’ and for which reliable statistics can be collected (Gillies 2000: 816). In the present context, this will amount to a number of sequences long enough to make for reliable judgements of their randomness or lack thereof.
This objection requires the chance of an event to be insensitive to reference class. Recently, Hájek (2007) has argued that no adequate conception of probability is immune to a reference class problem, so that this requirement cannot be met. (For a related view of relativised chance, though motivated by quite different considerations, see Glynn 2010.) However, as Hájek notes, this conclusion makes it difficult to see how chance could guide credence, and it remains an open question whether a relativised theory of chance that meets the platitudes concerning chance can be developed.
The two previous problems notwithstanding, many have found the most compelling cases of chance without randomness to be situations in which there is a biased chance process. A sequence of unfair coin tosses will have an unbalanced number of heads and tails, and such a sequence cannot be random. But such a sequence, and any particular outcome in that sequence, happens by chance.
That such sequences aren’t random can be seen by using both Martin-Löf- and Kolmogorov-style considerations. In the latter case, as we have already seen, a biased sequence will be more compressible than an unbiased sequence, if the sequence is long enough, because an efficient coding will exploit the fact that biased sequences will typically have longer subsequences of consecutive digits, and hence will not be random. In the former case, a biased sequence will violate at least one measure one property, on the standard Lebesgue measure on infinite binary sequences—in particular, a measure one subset of the Cantor space will be Borel normal (§2.1), but no biased sequence is Borel normal. So on the standard account of randomness, no sequences of outcomes of a biased chance process are random, but of course these outcomes happened by chance.
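The compressibility point can be checked empirically, with a general-purpose compressor standing in, very roughly, for Kolmogorov complexity. The following Python sketch (my illustration; zlib only gives a crude upper bound on complexity) packs 100,000 simulated tosses into bytes and compresses them:

```python
import random
import zlib

def flips_to_bytes(p: float, n: int, seed: int) -> bytes:
    """Pack n Bernoulli(p) coin flips (1 = heads) into a byte string."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < p else 0 for _ in range(n)]
    out = bytearray()
    for i in range(0, n, 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

fair = flips_to_bytes(0.5, 100_000, seed=1)    # 12,500 bytes
biased = flips_to_bytes(0.7, 100_000, seed=1)  # 12,500 bytes

# A fair sequence is (nearly) incompressible; a biased one compresses.
print(len(zlib.compress(fair, 9)), len(zlib.compress(biased, 9)))
```

The fair sequence comes out at roughly its original size, while the biased one comes out noticeably smaller, in line with the point that biased sequences are not incompressible.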
One response to this problem is to try and come up with a characterisation of randomness which will permit the outcomes of biased chances to be random. It is notable that von Mises’ initial characterisation of randomness was expressly constructed with this in mind—for him, a random sequence is one for which there is no admissible subsequence having a frequency differing from the frequency in the original sequence. This account is able to handle any value for the frequency, not only the case where each of two outcomes is equifrequent. Given that the Martin-Löf approach is a generalisation of von Mises’, it is not surprising that it too can be adapted to permit biased sequences to be random. Consider a binary process with outcome probabilities \((p, 1 - p)\). The law of large numbers in a general form tells us that a measure one set of sequences of independent trials of such a process will have limit frequencies of outcomes equal to \((p, 1 - p)\). This measure is not the standard Lebesgue measure, but rather a measure defined by the chance function in question. We can similarly re-interpret the other effective statistical tests of randomness. Drawing as we did above (§2.1.2) on the language of statistical testing, we can characterise the random sequences as those which are not significant with respect to the hypothesis that the outcome chances are \((p, 1 - p)\)—those which, as it were, conform to our prior expectations based on the underlying chances.
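The re-interpreted statistical tests can be illustrated with the simplest of them, a frequency (significance) test relative to hypothesised chances \((p, 1-p)\). This Python sketch is an illustration only; an actual Martin-Löf test is an effective family of such tests, not a single one:

```python
import math

def frequency_test(bits, p, z_max=4.0):
    """One crude statistical test: is the observed frequency of 1s
    consistent with independent trials of chance p? Returns True when
    the sequence passes (is not significant at the chosen threshold)."""
    n = len(bits)
    observed = sum(bits) / n
    stderr = math.sqrt(p * (1 - p) / n)  # std. error of the sample frequency
    z = abs(observed - p) / stderr
    return z <= z_max

seq = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1] * 1000  # 10,000 bits, 70% ones
print(frequency_test(seq, 0.7))  # passes relative to chance 0.7
print(frequency_test(seq, 0.5))  # fails relative to a fair process
```

Note that this periodic sequence passes the frequency test relative to \(p = 0.7\) while plainly failing other tests in the family, which is why a single test never suffices.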
To approach the topic of randomness of biased sequences through Kolmogorov complexity, suppose we are given—somehow—any computable probability measure \(\lambda\) over the set of infinite binary sequences (that is, the probability of a given sequence can be approximated arbitrarily well by a recursive function). A sequence \(\sigma\) is \(\lambda\)-incompressible iff for each \(n\), the Kolmogorov complexity of the length \(n\) initial subsequence of \(\sigma\) (denoted \(\sigma_n\)) is greater than or equal to \({-}\log_2 (\lambda(\sigma_n ))\). Where \(\lambda\) is the Lebesgue measure (supplement B.2), it follows that
\[{-}\log_2 (\lambda(\sigma_n )) = {-}\log_2 (1/2^n ) = n,\] so we get back the original definition of Kolmogorov randomness in that special case. With this generalised definition of Kolmogorov randomness in hand, it turns out that we can show a generalisation of Schnorr’s theorem (§2.3): a sequence \(\sigma\) is \(\lambda\)-incompressible iff \(\sigma\) is ML-random with respect to \(\lambda\).[15] In the framework of supplement B.1.1, a sequence is ML-random in this generalised sense iff the measure, under our arbitrary computable measure \(\lambda\), of the \(n\)th sequential significance test is no greater than \(1/2^n\). (There are some potential pitfalls, suggesting that perhaps the generalisation to an arbitrary computable measure is an overgeneralisation: see supplement B.1.3 for some details.)
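To see what the generalised bound demands, here is a small sketch (mine, not the entry’s) computing \({-}\log_2 (\lambda(\sigma_n))\) for a Bernoulli product measure: for the fair measure the threshold is exactly \(n\), while for a biased measure a string rich in the likelier digit has a lower threshold, so it can count as incompressible relative to that measure despite being compressible in the plain sense.

```python
import math

def neg_log2_bernoulli(sigma: str, p: float) -> float:
    """-log2 of the Bernoulli(p) product measure of the cylinder [sigma]:
    each '1' contributes -log2(p), each '0' contributes -log2(1 - p)."""
    ones = sigma.count("1")
    zeros = len(sigma) - ones
    return -(ones * math.log2(p) + zeros * math.log2(1 - p))

s = "1101011101"  # 7 ones, 3 zeros, n = 10
print(neg_log2_bernoulli(s, 0.5))  # uniform measure: exactly n
print(neg_log2_bernoulli(s, 0.7))  # biased measure: a lower threshold for this string
```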
While the above approach, with the modifications suggested in the supplement taken on board, does permit biased random sequences, it comes at a cost. While the Lebesgue measure is a natural one that is definable on the Cantor space of sequences directly, the generalisation of ML-randomness requires an independent computable probability measure on the space of sequences to be provided. While this may be done in cases where we antecedently know the chances, it is of no use in circumstances where the existence of chance is to be inferred from the existence of a random sequence of outcomes, in line with RCT—for every sequence there is some chance measure according to which it is random, which threatens to trivialise the inference from randomness to chance. As Earman (1986: 145) also emphasises, this approach to randomness seems to require essentially that the chanciness of the process producing the random sequence is conceptually prior to the sequence being random. This kind of process approach to randomness has some intuitive support, and we will return to it below (§6.2), but it risks turning RCT into an uninformative triviality. By contrast, the Lebesgue measure has the advantage of being intrinsically definable from the symmetries of the Cantor space, a feature other computable measures lack.
The main difficulty with the suggested generalisation to biased sequences lies in the simple fact that biased sequences, while they might reflect the probabilities in the process which produces them, simply don’t seem to be random in the sense of disorderly and incompressible. The generalisation above shows that we can define a notion of disorderliness that is relative to the probabilities underlying the sequence, but that is not intrinsic to the sequence itself independently of whatever measure we are considering. As Earman puts it (in slightly misleading terminology):
[T]here is a concept of randomness and a separable concept of disorder. The concept of disorder is an intrinsic notion; it takes the sequence at face value, caring nothing for genesis, and asks whether the sequence lacks pattern.… By contrast, the concept of randomness is concerned with genesis; it does not take the sequence at face value but asks whether the sequence mirrors the probabilities of the process of which it is a product. There is a connection between this concept of randomness and the concept of disorder, but it is not a tight one. (Earman 1986: 145)
As we might put it: Kolmogorov randomness is conceptually linked to disorderliness, and while we can gerry-rig a notion of ‘biased disorder’, that doesn’t really answer to what we already know about the incompressibility of disorderly sequences. While we might well regard a sequence featuring even numbers of heads and tails, but produced by successive tosses of an unfair coin, to be biased in some sense with respect to the underlying measure of the process behind it, it is still plausible that this unrepresentativeness of the sequence isn’t conceptually connected with disorder in any interesting sense. It is very intuitive, as this remark from Dasgupta suggests, to take that biasedness—the increased orderliness of the sequence—to contrast with randomness:
if for a sequence \(x\) this limiting frequency exists but is not equal to 1/2, then, in view of our underlying fair coin model, \(x\) would clearly be biased, not random. … Thus, it is natural to view this ‘stochastic law of unbiasedness’ as a ‘stochastic law of randomness’. (Dasgupta 2011: §3.2)
As the bias in a chance process approaches extremal values, it is very natural to reject the idea that the observed outcomes are random. (As an example, we may consider human behaviour—while people aren’t perfectly predictable, and apparently our behaviour doesn’t obey non-probabilistic psychological laws, nevertheless to say that people act randomly is incorrect.) Moreover, there is a relatively measure-independent notion of disorder or incompressibility of sequences, such that biased sequences really are less disorderly. We can define a measure-dependent notion of disorder for biased sequences only by ignoring the availability of better compression techniques that really do compress biased sequences more than unbiased ones. To generalise the notion of randomness, as proposed above, permits highly non-random sequences to be called random as long as they reflect the chances of highly biased processes. So there is at least some intuitive pull towards the idea that if randomness does bifurcate as Earman suggests, the best deserver of the name is Kolmogorov randomness in its original sense. But this will be a sense that contrasts with the natural generalisation of ML-randomness to deal with arbitrary computable probability measures, and similarly contrasts with the original sense of randomness that von Mises must have been invoking in his earliest discussions of randomness in the foundations of probability.
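The claim that better compression techniques really do compress biased sequences can be made quantitative via Shannon entropy: an optimal code for a Bernoulli(\(p\)) source needs only about \(H(p)\) bits per symbol, and \(H(p) \lt 1\) whenever \(p \ne 1/2\). A minimal sketch (the function name is ours):

```python
from math import log2

def binary_entropy(p):
    """Shannon entropy, in bits per symbol, of a Bernoulli(p) source."""
    if p in (0, 1):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

# An unbiased source needs a full bit per symbol; a biased one needs
# strictly less, which is why biased sequences really are compressible.
assert binary_entropy(0.5) == 1.0
assert binary_entropy(0.9) < 0.5   # roughly 0.469 bits per symbol
```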
In light of the above discussion, while there has been progress on defining randomness for biased sequences in a general and theoretically robust way, there remain difficulties in using that notion in defence of any non-trivial version of RCT, and difficulties in general with the idea that biased sequences can be genuinely disorderly. But the generalisation invoked here does give some succour to von Mises, for a robust notion of randomness for biased sequences is a key ingredient of his form of frequentism.
A further counterexample to RCT, related to the immediately previous one, is that randomness is indifferent to history, while chance is not: chance is history-dependent. The simplest way in which chance is history-dependent is when the conditions that may produce a certain event change over time:
Suppose you enter a labyrinth at 11:00 a.m., planning to choose your turn whenever you come to a branch point by tossing a coin. When you enter at 11:00, you may have a 42% chance of reaching the center by noon. But in the first half hour you may stray into a region from which it is hard to reach the center, so that by 11:30 your chance of reaching the center by noon has fallen to 26%. But then you turn lucky; by 11:45 you are not far from the center and your chance of reaching it by noon is 78%. At 11:49 you reach the center; then and forevermore your chance of reaching it by noon is 100%. (Lewis 1980: 91)
But there are more complicated types of history-dependence. In Lewis’ example, the property which changes to alter the chances is how close the agent is to the centre. But there are cases where the property which changes is a previous outcome of the very same process. Indeed, any process in which successive outcomes of repeated trials are not probabilistically independent will have this feature.
One example of chance without randomness involves an unbiased urn where balls are drawn without replacement. Each draw (with the exception of the last) is an event which happens by chance, but the sequence of outcomes will not be random (because the first half of the sequence will carry significant information about the composition of the second half, which may aid compressibility). But a more compelling example is found in stochastic processes in which the chances of future outcomes depend on past outcomes. One well-known class of such processes is Markov chains, which produce a discrete sequence of outcomes with the property that the value of an outcome is dependent on the value of the immediately prior outcome (but that immediately prior outcome screens off the rest of the history). A binary Markov chain might be the weather (Gates and Tong 1976): if the two possible outcomes are ‘sunny’ and ‘rainy’, it is plausible to suppose that whether tomorrow is rainy depends on whether today was rainy (a rainy day is more likely to be preceded by another rainy day); but knowing that today was rainy arguably makes yesterday’s weather irrelevant.
If a Markov chain is the correct model of a process, then even when the individual trial outcomes happen by chance, we should expect the entire sequence of repeated trials to be non-random. In the weather case just discussed, we should expect a sunny day to be followed by a sunny day, and a rainy day by a rainy one. In our notation, 11 and 00 should be more frequent than 10 or 01. But the condition of Borel normality, which all random sequences obey, entails that every finite sequence of outcomes of equal length should have equal frequency in the sequence. So no Borel normal sequence, and hence no random sequence, can model the sequence of outcomes of a Markov chain, even though each outcome happens by chance.
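The digram imbalance just described is easy to exhibit by simulation. The following sketch generates a persistent two-state weather chain and counts how often consecutive days agree (the persistence probability 0.8 is an arbitrary illustrative choice of ours, not a figure from Gates and Tong):

```python
import random

def markov_weather(n, p_persist=0.8, seed=1):
    """Simulate a two-state (0 = sunny, 1 = rainy) Markov chain in which
    tomorrow repeats today's weather with probability p_persist."""
    rng = random.Random(seed)
    state = 0
    out = []
    for _ in range(n):
        out.append(state)
        if rng.random() > p_persist:
            state = 1 - state
    return out

days = markov_weather(100_000)
pairs = list(zip(days, days[1:]))
same = sum(1 for a, b in pairs if a == b)
# Persistence makes the digrams 00 and 11 far more frequent than 01 and
# 10, violating Borel normality; so the sequence cannot be random.
print(same / len(pairs))   # close to 0.8, not the Borel-normal 0.5
```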
At least some non-random sequences satisfy many of the measure one properties required of random sequences. For example, the Champernowne sequence, consisting of the binary numerals for every non-negative integer listed consecutively (i.e., 011011100101110111…), is Borel normal. This sequence isn’t random, as initial subsequences of reasonable length are highly compressible. But it looks like it satisfies at least some desiderata for random sequences. This sequence is an attempt at producing a pseudorandom sequence—one that passes at least some statistical tests for randomness, yet can be easily produced. (The main impetus behind the development of pseudorandom number generators has been the need to efficiently produce numbers which are random for all practical purposes, for use in cryptography or statistical sampling.) Much better examples exist than the Champernowne sequence, which meet more stringent randomness properties.[16] One simple technique for generating pseudorandom sequences is a symbol shift algorithm (Smith 1998: 53). Given an initial ‘seed’ numeral \(s_1, s_2, \ldots, s_n\), the algorithm simply spits out the digits in order. Obviously this is useless if the seed is known, or can in some way be expected to be correlated with the events to which one is applying these pseudorandom numbers. But in practical applications, the seed is often chosen in a way that we do expect it to carry no information about the application (in simple computer pseudorandom number generators, the seed may be derived in some way from the time at which the seed is called for). With a finite seed, this sequence will obviously repeat after some period.
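The binary Champernowne sequence can be produced by a program only a few lines long, which is exactly why its initial segments are highly compressible. A sketch in Python:

```python
from itertools import count, islice

def champernowne_bits():
    """Binary Champernowne sequence: the binary numerals for
    0, 1, 2, 3, ... concatenated (0, 1, 10, 11, 100, ...)."""
    for n in count():
        yield from bin(n)[2:]

# The generator above is a complete description of the whole infinite
# sequence, so every initial segment is compressible to a few lines.
prefix = ''.join(islice(champernowne_bits(), 18))
print(prefix)   # 011011100101110111
```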
A symbol shift is the simplest possible function from seed to outcome sequence; better algorithms use a more complicated but still efficiently computable function of the seed to generate outcome sequences with a longer period, much longer than the length of the seed (e.g., the ‘Mersenne twister’ of Matsumoto and Nishimura 1998 has a period of \(2^{19937} - 1\)).
If the seed is not fixed, but is chosen by chance, we can have chance without randomness. For example, suppose the computer has a clock representing the external time; the time at which the algorithm is started may be used as a seed. But if it is a matter of chance when the algorithm is started, as it well may be in many cases, then the particular sequence that is produced by the efficient pseudorandom sequence generator algorithm will have come about by chance, but not be random (since there is a program which runs the same algorithm on an explicitly given seed; since the seed is finite, there will be such a program; and since the algorithm is efficient, the length before the sequence produced repeats will be longer than the code of the program plus the length of the seed, making the produced sequence compressible). Whether the seed is produced by chance or explicitly represented in the algorithm, the sequence of outcomes will be the same—one more way in which it seems that the chanciness of a sequence can vary while whether or not it is random remains constant. (Symbol shift dynamics also permit counterexamples to the other direction of RCT—see §5.2.)
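The point that the output is a fixed function of the seed, however the seed was chosen, is easy to exhibit concretely. Python’s standard `random` module is itself a Mersenne twister generator, so two runs from the same seed agree bit for bit:

```python
import random

# Python's `random` module uses the Mersenne twister (MT19937), so the
# entire output stream is a deterministic function of a short seed.
a = random.Random(42)
b = random.Random(42)
run_a = [a.getrandbits(1) for _ in range(1000)]
run_b = [b.getrandbits(1) for _ in range(1000)]

# Same seed, same 'random' bits: whether the seed was typed in or chosen
# by chance (e.g., from a clock), the produced sequence is the same, and
# it is compressible to the seed plus the generator's code.
assert run_a == run_b
```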
Much the same point could have been made, of course, with reference to any algorithm which may be fed an input chosen by chance, and so may produce an outcome by chance, but where the output is highly compressible. (One way in which pseudorandom sequence generators are nice in this respect is that they are designed to produce highly compressible sequences, though non-obviously highly compressible ones.) The other interesting thing about those algorithms which produce pseudorandom sequences is that they provide another kind of counterexample to the epistemic connection between chance and randomness. For our justification in thinking that a given sequence is random will be based on its passing only finitely many tests; we can be justified in believing a pseudorandom sequence to be random (in some respectable sense of justification, as long as justification is weaker than truth), and justified in making the inference to chance via RCT. But then we might think that this poses a problem for RCT to play the right role epistemically, even if it were true. Suppose one sees a genuinely random sequence and forms the justified belief that it is random. The existence of pseudorandom sequences entails that things might seem justificatorily exactly as they are and yet the sequence not be random. But such a scenario, arguably, defeats one’s knowing that the sequence is random, and thus defeats one’s knowing the sequence to have been produced by chance (and presumably undermines the goodness of the inference from randomness to chance).
The counterexamples to RCT offered in §§4.3–4.4 suggest strongly that the appeal of RCT depends on our curious tendency to take independent identically distributed trials, like the Bernoulli process of fair coin tossing, to be paradigms of chance processes. Yet when we widen our gaze to encompass a fuller range of chance processes, the appeal of the right-to-left direction of RCT is quite diminished. It is now time to examine potential counterexamples to the other direction of RCT. There are a number of plausible cases where a random sequence potentially exists without chance. Many of these cases involve interesting features of classical physics, which is apparently not chancy, and yet which gives rise to a range of apparently random phenomena. Unfortunately some engagement with the details of the physics is unavoidable in the following.
One obvious potential counterexample involves coin tossing. Some have maintained that coin tossing is a deterministic process, and as such entirely without chances, and yet one which produces outcome sequences we have been taking as paradigms of random sequences. This will be set aside until §7, where the claim that determinism precludes chance will be examined.
For many short sequences, even the most efficient prefix-free code will be no shorter than the original sequence (as prefix-free codes contain information about the length of the sequence as well as its content, if the sequence is very short the most efficient code may well be the sequence itself prefixed with its length, which will be longer than the sequence). So all short sequences will be Kolmogorov random. This might seem counterintuitive, but if randomness indicates a lack of pattern or repetition, then sequences which are too short to display pattern or repetition must be random. Of course it will not usually be useful to say that such sequences are random, mostly because in very short sequences we are unlikely to talk of the sequence at all, as opposed to talking directly about its constituent outcomes.
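The overhead of self-delimiting description can be illustrated with a toy prefix-free code (the unary length prefix used here is deliberately crude; real prefix-free codes have smaller, but still positive, overhead):

```python
def prefix_free_code(s):
    """A toy self-delimiting code: the length of s in unary 1s, a
    terminating 0, then s itself. No codeword is a prefix of another,
    so concatenated codewords can be unambiguously split apart."""
    return '1' * len(s) + '0' + s

for s in ['0', '01', '0110']:
    print(s, '->', prefix_free_code(s))

# Every codeword is strictly longer than the string it encodes, so very
# short strings are automatically incompressible relative to this code.
assert all(len(prefix_free_code(s)) > len(s) for s in ['0', '01', '0110'])
```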
Because of the modal aspect of RCT, for most processes there will possibly be a long enough sequence of outcomes to overcome any ‘accidental’ randomness due to actual brevity of the outcome sequence. But for events which are unrepeatable or seldom repeatable, even the merely possible suitable reference classes will be small. And such unrepeatable events do exist—consider the Big Bang which began our universe, or your birth (your birth, not the birth of a qualitatively indistinguishable counterpart), or the death of Ned Kelly. These events are all part of outcome sequences that are necessarily short, and hence these events are part of Kolmogorov random sequences. But it is implausible to say that all of these events happened by chance; no probabilistic theory need be invoked to predict or explain any of them. In the case of Kelly’s death, for example, while it may have been a matter of chance when he happened to die, it was not chance that he died, as it is (physically) necessary that he did. So there are random sequences—those which are essentially short—in which each outcome did not happen by chance.
The natural response is to reject the idea that short sequences are apt to be random. The right hand side of RCT makes room for this, for we may simply insist that unrepeatable events cannot be repeated often enough to give rise to an adequate sequence (whether or not the inadequate sequence they do in fact give rise to is random). The problem here is that we can now have chance without randomness, if there is a single-case unrepeatable chance event. Difficulties in fact seem unavoidable. If we consider the outcomes alone, either all short sequences are random or none of them are; there is no way to differentiate on the basis of any product-based notion between different short sequences. But as some single unrepeatable events are chancy, and some are not, whichever way we opt to go with respect to randomness of the singleton sequences of such events we will discover counterexamples to one direction or another of RCT.
The simple symbol shift dynamics in §4.5 had a finite seed, which allowed for chance without randomness. Yet there seem to be physical situations in which a symbol shift dynamics is an accurate way of representing the physical processes at work. One simple example might be a stretch and fold dynamics, of the kind common in chaos theory (Smith 1998: §4.2). The classic example is the baker’s transformation (Earman 1986: 167–8; Ekeland 1988: 50–9). We take a system the state of which at any one time is characterised by a point \((p, q)\) in the real unit square. We specify the evolution of this system over time as follows, letting \(\phi\) be the function governing the discrete evolution of the system over time (i.e., \(s_{t + 1} = \phi(s_t)\)):
\[\phi(p,q) = \begin{cases} (2p, q/2), & \text{if } 0 \le p \le \frac{1}{2} \\ (2p-1,(1+q)/2), & \text{if } \frac{1}{2} \le p \le 1 \end{cases}\] This corresponds to transforming the unit square to a rectangle twice as wide and half the height, chopping off the right half, and stacking it back on top to fill the unit square again. (That this transformation reminded mathematicians of baking says something about their unfamiliarity with the kitchen—similar transformations, where the right half is ‘folded’ back on top, are more realistic.) If we represent the coordinates \(p\) and \(q\) in binary, the transformation is this:
\[\phi (0.p_1 p_2\ldots, 0.q_1 q_2\ldots) = (0.p_2 p_3\ldots , 0.p_1 q_1 q_2\ldots).\] So this is a slight variant on a simple symbol shift, as the \(p\)-coordinate is a symbol shift to the right, while the \(q\)-coordinate is, in effect, a symbol shift to the left.[17]
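The equivalence between the geometric description and the symbol shift can be checked directly. A sketch in Python (using exact rationals to avoid floating-point error; the function names are ours):

```python
from fractions import Fraction

def baker(p, q):
    """One step of the baker's transformation on the unit square."""
    if p < Fraction(1, 2):
        return 2 * p, q / 2
    return 2 * p - 1, (1 + q) / 2

def baker_digits(p_digits, q_digits):
    """The same step on binary expansions: shift p's digits left by one,
    pushing the dropped leading digit onto the front of q."""
    return p_digits[1:], p_digits[:1] + q_digits

# p = 0.011 in binary = 3/8, q = 0.1 in binary = 1/2.
p, q = baker(Fraction(3, 8), Fraction(1, 2))
assert (p, q) == (Fraction(3, 4), Fraction(1, 4))    # 0.11 and 0.01 in binary
assert baker_digits('011', '1') == ('11', '01')      # the matching digit shift
```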
One important feature of this dynamics is that it is measure preserving, so that if \(X\) is a subset of the unit square, \(\mu(X) = \mu(\phi(X))\). (This is easily seen, as the symbol shift dynamics on the basic sets of infinite binary sequences is measure-preserving, and each coordinate can be represented as an infinite binary sequence.) Define the set \(L = \{(p, q) : 0 \le p \lt \frac{1}{2} \}\). We see that \((p, q) \in L\) iff \(p_1 = 0\). Since \(p\) can be represented by an infinite binary sequence, and a measure one set of infinite binary sequences is Borel normal, we see that almost all states of this system are such that, over time, \(\mu(\phi(s) \in L \mid s \in L) = \mu(\phi(s) \in L)\)—that is, whether the system is in \(L\) at \(t\) is probabilistically independent of its past history. Furthermore, \(\mu(L) = \mu(\overline{L})\). The behaviour of this system over time, with respect to the partition \(\{L, \overline{L}\}\), is therefore a Bernoulli process, exactly like a sequence of fair coin tosses—a series of independent and identically distributed repetitions of a chance process. If the RCT is true, then a system which behaves exactly like a chance process should have as its product a random sequence. So the sequence of outcomes for the baker’s transformation (under this partition) is a random sequence.
But unlike a sequence of fair coin tosses, supposing them to involve genuine chance, the baker’s transformation is entirely deterministic. Given a particular point \((p, q)\) as an initial condition, the state of the system at each time \(t\) is determined to be \(\phi^t (p, q)\). So while the product produced is random, as random as a genuine chance process, these outcomes do not happen by chance; given the prior state of the system, the future evolution is not at all a matter of chance. So we have a random sequence without chance outputs. (Indeed, given the symbol shift dynamics, the evolution of the system over time in \(\{L, \overline{L}\}\) simply recapitulates successive digits of the starting point.) To be perfectly precise, the trial in this case is sampling the system at a given time point, and seeing which cell of the coarse grained partition it is in at each time. This is a sequence of arbitrarily repeated trials which produces a random sequence; yet none of these outcomes happens by chance.[18] To put the point slightly differently: while the sequence of outcomes is random, there is a perfectly adequate theory of the system in question in which probability plays no role. And if probability plays no role, it is very difficult to see how chance could play a role, since there is no probability function which serves as norm for credences, governs possibility, or is non-trivial and shared between intrinsic duplicate trials. In short, no probability function that has the features required of chance plays a role in the dynamics of this system, and that seems strong reason for thinking there is no chance in this system.
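The parenthetical claim that the coarse-grained history recapitulates the digits of the starting point can be verified directly, since membership in \(L\) depends only on the current leading digit of \(p\). A sketch (Python, exact rationals; the ten-digit initial condition is an arbitrary illustrative choice):

```python
from fractions import Fraction

def baker_p(p):
    """The p-coordinate dynamics of the baker's map: the doubling map."""
    return 2 * p if p < Fraction(1, 2) else 2 * p - 1

def coarse_orbit(p, steps):
    """Record, at each step, whether the state lies in L (p < 1/2)."""
    out = []
    for _ in range(steps):
        out.append('0' if p < Fraction(1, 2) else '1')
        p = baker_p(p)
    return ''.join(out)

# The observed coarse-grained 'outcome sequence' is just the binary
# expansion of the initial condition: deterministic, no chance anywhere.
p0 = Fraction(0b0110111001, 2**10)   # the point 0.0110111001 in binary
assert coarse_orbit(p0, 10) == '0110111001'
```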
The baker’s transformation provides a simple model of deterministic macro-randomness—a system which has a \(\mu\)-measure-preserving temporal evolution, and produces a sequence of coarse grained states that have the Bernoulli property. It is a question of considerable interest whether there are physically more realistic systems which exhibit the same features. We may conceive of an \(n\)-particle classical (Newtonian) system as having, at each moment, a state that is characterised by a single point in a \(6n\)-dimensional state space (each point a \(6n\)-tuple characterising the position and momentum of each particle). The evolution of the system over time is characterised by its Hamiltonian, a representation of the energetic and other properties of the system. The evolution under the Hamiltonian is \(\mu\)-measure-preserving (by Liouville’s theorem), so it might be hoped that at least some systems could be shown to be Bernoulli too. Yet for closed systems, in which energy is conserved over time, this is not generally possible. Indeed, for closed systems it is not generally possible to satisfy even a very weak randomness property, ergodicity. A system is ergodic just in case, in the limit, with probability one, the amount of time a system spends in a given state is equal to the (standard) measure of state space that corresponds to that state (Earman 1986: 159–61; Sklar 1993: 60–3; Albert 2000: 59–60). While a Bernoulli system is ergodic, the converse entailment does not hold; if the system moves only slowly from state to state, it may be ergodic while the state at one time is strongly dependent on past history (Sklar 1993: 166–7). While ergodicity has been shown to hold of at least one physically interesting system (Yakov Sinai has shown that the motion of hard spheres in a box is ergodic, a result of great significance for the statistical mechanics of ideal gases), a great many physically interesting systems cannot be ergodic.
This is the upshot of the so-called KAM theorem, which says that for almost all closed systems in which there are interactions between the constituent particles, there will be stable subregions of the state space—regions of positive measure such that if a system is started in such a region, it will always stay in such a region (Sklar 1993: 169–94). Such systems obviously cannot be ergodic.
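The definition of ergodicity as agreement between time averages and the measures of regions of state space can be illustrated with the simplest standard example, an irrational rotation of the circle; this example also shows how weak ergodicity is, since the rotation's next state is fully determined by, and strongly dependent on, its current one. A sketch (Python; the choice of \(\sqrt{2} - 1\) as the rotation angle is an arbitrary irrational):

```python
import math

def rotation_time_average(alpha, steps):
    """Fraction of time the irrational rotation x -> x + alpha (mod 1)
    spends in [0, 1/2). For an ergodic system this time average should
    approach the measure of the region, here 1/2."""
    x, hits = 0.0, 0
    for _ in range(steps):
        if x < 0.5:
            hits += 1
        x = (x + alpha) % 1.0
    return hits / steps

avg = rotation_time_average(math.sqrt(2) - 1, 100_000)
# Time average matches the 'space average' 1/2, despite the dynamics
# being utterly orderly and history-dependent: ergodic, but far from
# Bernoulli.
assert abs(avg - 0.5) < 0.01
```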
The upshot of this for our discussion may be stated: ‘there are no physically realistic classical systems which exhibit even ergodicity, and so no physically realistic classical system can exhibit randomness. The baker’s transformation is a mathematical curiosity, but is not a genuine case of randomness without chance, since systems like it are not physically possible.’ This response is premature. There are physically interesting systems to which the KAM theorem does not apply. Open or dissipative systems, those which are not confined to a state space region of constant energy, are one much studied class, because such systems are paradigms of chaotic systems. The hallmarks of a chaotic dissipative system are two (Smith 1998: §1.5): the existence of an attractor, a region of state space toward which the trajectories of the system converge over time; and sensitive dependence on initial conditions, whereby arbitrarily close states diverge widely under the dynamics.
There are physically realistic classical systems which exhibit both of these features, of which the best known is perhaps Lorenz’s model of atmospheric convection (Smith 1998: §1.4; Earman 1986: 165). The combination of these two features permits very interesting behaviour—while the existence of an attractor means that over time the states of the system will converge to the region of the attractor, the sensitive dependence on initial conditions means that close states, at any time, will end up arbitrarily far apart. For this to happen the attractor must have a very complex shape (it will be a region of small measure, but most of the state space will be in the neighbourhood of the attractor). More importantly for our purposes, a system with these characteristics, supposing that the divergence under evolution of close states happens quickly enough, will yield behaviour close to Bernoulli—it will yield rapid mixing (Luzzatto et al. 2005).[20] Roughly, a system is mixing iff the presence of the system in a coarse state at one time is probabilistically independent of its presence in another coarse state at another time, provided there is a sufficient amount of time between the two times. This is weaker than Bernoulli (since the states of a Bernoulli system are probabilistically independent if there is any time between them), but still strong enough to plausibly yield a random sequence of outcomes from a coarse grained partition of the state space, sampled infrequently enough. So we do seem to have physically realistic systems that yield random behaviour without chance. (See also the systems discussed in Frigg 2004.)
Indeed, the behaviour of a chaotic system will be intuitively random in other ways too. The sensitive dependence on initial conditions means that, no matter how accurate our finite discrimination of the initial state of a given chaotic system is, there will exist states indiscriminable from the initial state (and so consistent with our knowledge of the initial state), but which would diverge arbitrarily far from the actual evolution of the system. No matter, then, how well we know the initial condition (as long as we do not have infinite powers of discrimination), there is another state the system could be in for all we know that will evolve to a discriminably different future condition. Since this divergence happens relatively quickly, the system cannot be reliably predicted. (Anecdotally, at least, Lorenz’s model of the weather seems borne out by our inability to reliably predict future weather even a few days from now.) Insofar as randomness and lack of reliable prediction go hand in hand, we have another reason for thinking there is product randomness here (§6.2).
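Sensitive dependence on initial conditions can be exhibited in a few lines with a standard chaotic toy model, the logistic map at parameter 4 (chosen here purely for convenience; it is not the Lorenz system): two initial conditions differing by less than any realistic measurement error diverge to macroscopically different states within a few dozen iterations.

```python
def logistic_orbit(x, steps, r=4.0):
    """Iterate the chaotic logistic map x -> r * x * (1 - x)."""
    out = []
    for _ in range(steps):
        out.append(x)
        x = r * x * (1 - x)
    return out

# Two initial conditions indiscriminable by any realistic measurement...
a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-12, 60)
assert abs(a[0] - b[0]) < 1e-11

# ...nonetheless diverge to a discriminably different condition: the
# tiny difference roughly doubles each step, so by step 40 or so the two
# orbits bear no useful relation to one another.
assert max(abs(x - y) for x, y in zip(a[40:], b[40:])) > 0.05
```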
Just as before, the classical physical theory underlying the dynamics of these chaotic systems is one in which probability does not feature. So we are able to give an adequate characterisation of the physical situation without appeal to any probabilities fit to play the chance role. Given well-enough behaved boundary conditions, this system is also deterministic (though see §5.3), and that may also be thought to preclude a role for non-trivial chances. So again we have randomness in the performance, though none of the outcomes happened by chance.
Two avenues of resistance to the line of thought in this section suggest themselves. The first is to maintain that, despite the truth of determinism for the baker’s transformation and classical physics (modulo worries to be addressed in the following section), there are still, or at least there may be, non-trivial chances in these theories. The proposal is that the possibility remains of deterministic chance, so that from the fact of determinism alone it does not follow that we have a counterexample to RCT. The outcomes may be determined to happen by the prior conditions, but (so the suggestion goes) they may still happen by chance. This radical proposal is discussed below, in §7. It should be noted, however, that even if it is true that deterministic chance is possible, that observation is far from establishing that the physical theories under discussion here are ones in which deterministic chance features. That some deterministic theories may have chances is no argument that all will, and particularly in very simple cases like the baker’s transformation, there doesn’t seem much point in an invocation of deterministic chance: chances will be trivial or redundant if classical physics is true. The second avenue of resistance is to claim that there really is chance here—it is chance of the initial conditions which is recapitulated in the random sequence of outcomes. While the Lebesgue measure over the set of initial conditions in our models is formally like a probability function, to assume it yields genuine chances is a fairly controversial thesis (for a contrary opinion, see Clark 1987). Other initial conditions could have obtained; still it seems wrong to think that (somehow) there was a chance process which terminated in the selection of the initial conditions which happened in fact to obtain at our world. Rather, that the initial conditions are what they are seems to be a brute fact.
If there are to be chances, then, they cannot be dynamical chances, the kind that is familiar from physics, and from our discussion in §1.2. Some recent arguments in favour of the possibility of chancy initial conditions are discussed in the following supplementary document:
Supplement D. Chance and Initial Conditions
But whether or not the idea of chancy initial conditions can be made to work, the fact remains that at most one outcome in the random sequence—the first one—happens by chance. The subsequent states do not, yet RCT is committed to there being (dynamical, transition) chances in those state transitions.
Despite the neat picture of classical determinism drawn in the previous section, it is well known that classical physics is not in fact deterministic. These cases of indeterminism do not undermine the applications of classical mechanics in the previous section. But classical indeterminism may provide problems of its own for RCT. Useful further material on the topic of this section can be found in the entry on causal determinism (Hoefer 2010: §4.1).
For present purposes, indeterminism occurs when the state of the system at one time does not uniquely fix the state the system will be in at some future time (see §7). To show indeterminism in the classical case, it suffices to give a state of some system at a given time and to specify two future states that are incompatible with each other yet each consistent with Newton’s laws of motion and the initial state.
To help us in this task, it is useful to note one fact about Newtonian mechanics: the laws are time-reversal invariant (Albert 2000: ch. 1). That is, for every lawful trajectory through the state space, there is another lawful trajectory which results from the first by mapping every instantaneous state in the first trajectory to its image state in which the particles have the same positions but the signs on the components of momentum are reversed, and running the trajectory in reverse order. These image states are those where the particles are in the same positions but moving in exactly the opposite direction. So for every lawful process, that process running backwards is also lawful. (If these trajectories are so lawful, why don’t we see them?—that is the deep question of thermodynamic asymmetry, discussed briefly in supplement D.)
Two examples serve to illustrate the possibility of classical indeterminism. A very elegant recent example is provided by ‘Norton’s dome’ (Norton 2003; 2008). A point mass is at rest at the apex of a dome (of certain shape) at \(t^*\). One obvious solution to Newton’s equations of motion is that it continues to do so for all moments \(t \gt t^*\). But, Norton points out, there is another solution: that while at \(t = t^*\) the mass is at rest, at every moment \(t \gt t^*\), the mass is moving in some direction. But this means the mass spontaneously moves off in some arbitrary direction at an arbitrary time.[21] Determinism is clearly violated: for some given time \(t^*\), there is a state at \(t'\) where the particle remains at the apex of the dome; and there are many incompatible states where at \(t'\) the particle is somewhere else on the surface of the dome. Nothing in the conditions at \(t^*\) fixes which of these many future states will be the one to eventuate. An easy way to understand the dome example is to consider its time reversal: a ball is given some initial velocity along the surface of the dome toward the apex. Too little, and it falls short; too great, and it overshoots. Just right, however, and the ball comes to rest precisely at the apex, and remains at rest. The time reversal of this system is the original dome example.
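For concreteness, the equation of motion on Norton's dome and its two families of solutions can be written down explicitly, following Norton (2003), with \(r\) the distance along the dome's surface from the apex and units chosen so the constants drop out:

```latex
% Newton's second law on the dome reduces to
\frac{d^2 r}{dt^2} = \sqrt{r},
% which admits both the trivial solution and a one-parameter family:
r(t) = 0 \ \text{for all } t,
\qquad
r(t) =
\begin{cases}
0 & t \le T,\\[2pt]
\tfrac{1}{144}\,(t - T)^4 & t \ge T,
\end{cases}
% for any T \ge t^*: the mass sits at the apex until the arbitrary time T
% and then slides off.  Check: r'' = (t-T)^2/12 = \sqrt{r}.
```

The excitation time \(T\) appears nowhere in the initial data, which is exactly why nothing at \(t^*\) fixes the future.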
A more exotic example involves ‘space invaders’ (Earman 1986: ch. III). These are particles that are at no spatial location at time \(t\), and therefore form no part of the state at time \(t\), but which travel in ‘from’ spatial infinity and by time \(t'\) have a location. We can see the example more clearly if we invoke time-reversal invariance. Consider two point particles, \(a\) and \(b\), at rest and in the vicinity of one another at \(t^*\). From \(t^*\) onwards, force is applied to \(a\) in such a way that the velocity of \(a\) away from \(b\) increases without bound. This is possible, because there is no upper bound on velocity in classical physics. Indeed, let the velocity of \(a\) increase fast enough that by some finite time \(t'\), \(a\) has no finite velocity, and is thus ‘at’ spatial infinity. The time-reversal of this system has the particle \(a\) having no location at \(t'\), but having a location and continuously decreasing velocity at every moment \(t \gt t'\), until it comes to rest at \(t^*\). This system violates determinism in the sense given above. The state at \(t'\) consists of a single particle \(b\) at rest. That state could be followed at \(t^*\) by the space-invaded state just described; or it could be followed at \(t^*\) by the incompatible state of \(b\) simply being at rest some more. Nothing in the laws rules out either of these transitions. Of course, the model isn’t particularly physically realistic—where does the force acting on \(a\) come from? But physically more realistic systems exhibiting the same general structure have been given; Earman mentions one due to Mather and McGehee (1975) involving four point particles which, through the forces they exert on each other in collisions, end up unboundedly far from one another in a finite time (see also Saari and Xia 1995).
While classical mechanics is thus indeterministic, it is importantly not chancy. There is no reason to think that we either need to or can assign a probability distribution over the possible future states in our indeterministic systems. Norton says this about his dome:
One might think that … we can assign probabilities to the various possible outcomes. Nothing in the Newtonian physics requires us to assign the probabilities, but we might choose to try to add them for our own conceptual comfort. It can be done as far as the direction of the spontaneous motion is concerned. The symmetry of the surface about the apex makes it quite natural for us to add a probability distribution that assigns equal probability to all directions. The complication is that there is no comparable way for us to assign probabilities for the time of the spontaneous excitation that respect the physical symmetries of solutions. Those solutions treat all candidate excitation times \(T\) equally. A probability distribution that tries to make each candidate time equally likely cannot be proper—that is, it cannot assign unit probability to the union of all disjoint outcomes.[22] Or one that is proper can only be defined by inventing extra physical properties, not given by the physical description of the dome and mass, Newton’s laws and the laws of gravitation, and grafting them unnaturally onto the physical system. (Norton 2003: 9–10)
The point Norton makes about the impossibility of a uniform distribution over a countable set of time intervals holds also for the time at which we might expect the space invader to occur in the second type of case. Thus, it seems, we have indeterminism without chance.
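The impossibility claim here is a standard fact about countably additive probability; a quick sketch (the notation is ours, not Norton's):

```latex
% Suppose the candidate excitation times form a countable set
% \{t_1, t_2, \ldots\}, and a probability measure P gives each the same
% value \epsilon.  Countable additivity then forces
P\Bigl(\bigcup_{i=1}^{\infty}\{T = t_i\}\Bigr)
  \;=\; \sum_{i=1}^{\infty} \epsilon
  \;=\;
  \begin{cases}
    0      & \text{if } \epsilon = 0,\\
    \infty & \text{if } \epsilon > 0,
  \end{cases}
% so the total can never be 1: no uniform distribution over countably
% many equally treated excitation times is proper.
```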
We can use these constructions to come up with counterexamples to RCT. Let a dome system be prepared, and left for a period of 5 seconds. If the ball remains at rest on the apex, call the outcome ‘0’. If the ball moves, and so is at the end of 5 seconds somewhere on the dome other than the apex, call the outcome ‘1’. Both outcomes are physically possible final states of the system after 5 seconds. If the system is repeatedly prepared in this state, it is physically possible to get a random sequence of outcomes of these repeated trials. Certainly the outcomes cannot be produced by some finite algorithm, since the indeterministic dynamics permits as physically possible every sequence of outcomes, including those that differ at some place from every algorithmically compressible sequence. In the infinite future case, it is physically possible for the system to produce every infinite binary sequence, but at most countably many of these are non-random. So it is physically possible for these setups to produce a random sequence of outcomes in the KML sense. But we then have randomness without a chance distribution over the outcomes. Randomness only requires two distinguishable possible outcomes and the possibility of production of arbitrary sequences of such outcomes. Chance requires two distinguishable outcomes, each of which has some chance. These cases show that chance and possibility come apart—there are cases where there are two possible outcomes of a process, neither of which has any chance at all (not even a chance of zero).
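The contrast between compressible and incompressible sequences that the KML notion builds on can be illustrated (not defined: Kolmogorov complexity is uncomputable) with an off-the-shelf compressor. This sketch is ours, not from the entry, and uses a pseudo-random generator as a stand-in for genuinely chancy outcomes:

```python
import random
import zlib

def compression_ratio(bits: str) -> float:
    """Compressed/raw length for a bit string: a crude, computable
    stand-in for Kolmogorov complexity (which is uncomputable)."""
    raw = bits.encode()
    return len(zlib.compress(raw, 9)) / len(raw)

n = 100_000
periodic = "01" * (n // 2)   # admits a very short description

rng = random.Random(0)       # pseudo-random stand-in for chancy outcomes
irregular = "".join(rng.choice("01") for _ in range(n))

# The periodic sequence compresses to a tiny fraction of its length;
# the irregular one does not (zlib cannot find the short program that
# generates it, though one exists, since it is only pseudo-random).
print(compression_ratio(periodic), compression_ratio(irregular))
```

The caveat in the final comment matters: a pseudo-random sequence is highly compressible in principle, so a practical compressor only gives one-sided evidence of non-randomness.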
The last objection draws on a remark made in §4.4. Consider a large finite urn full of black and white balls. The number of balls is sufficiently large that the sequence of outcomes of draws is long enough to be random. So suppose we make a random selection from this urn, drawing balls without replacement until the urn is empty. The resulting sequence of outcomes is, or at least could be, random—it had better be, since this sequence meets all the conditions to be a simple random sample from the population. (We could attach a number to each member of an experimental population, and give the \(n\)-th member a pill of an active substance (respectively, a placebo) if the \(n\)-th draw is black (white).) But in this process, the outcomes become less and less chancy as the number of balls diminishes. Since there are only finitely many balls, there will be a draw such that the chance of it being black is either 1 or 0, and so whatever outcome eventuates did not occur by chance. But then we have a random sequence that includes an outcome (drawing a black ball, say) which did not happen by chance, contrary to RCT.
One response is to say that this last outcome did happen by chance, since at the start of the drawing there was a positive chance that a white ball would be drawn last, and a positive chance that a black ball would be. This response neglects the time-dependence of chances. If there were \(n\) balls initially, and we let ‘Lastie’ name the ball that was in actual fact drawn last, then we can say: the chance that Lastie is drawn last was \(1/n\) initially, and after the \(m\)-th draw it was \(1/(n - m)\), until it reached 1 and remained there. At that last stage it was no longer a matter of chance whether Lastie would be drawn last; it is the only ball left in the urn. RCT maintains that a given outcome happens by chance iff it is part of a random sequence. At every time, the event of Lastie being drawn last is part of a random sequence. But then there is at least one time at which Lastie being drawn last is part of a random sequence but, at that time, it did not happen by chance. (Alternatively, of course, it could be maintained that even events with trivial chances happen by chance. But this would permit again the problem of biased sequences; a string of all heads tossed by a two-headed coin could then be random, which it is not.)
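The time-dependence of Lastie's chance is elementary conditional probability, and the \(1/(n-m)\) figure can be checked by simulation. A sketch under our own labels (ball 0 plays Lastie):

```python
import random

def chance_lastie(n: int, m: int) -> float:
    """After m draws, none of which removed Lastie, the chance that
    Lastie is drawn last from the n-ball urn is 1/(n - m)."""
    return 1 / (n - m)

def estimate(n: int, m: int, trials: int = 200_000) -> float:
    """Monte Carlo check: among random orderings of n balls in which
    ball 0 survives the first m draws, how often is it drawn last?"""
    rng = random.Random(1)
    balls = list(range(n))
    hits = total = 0
    for _ in range(trials):
        rng.shuffle(balls)
        if 0 not in balls[:m]:        # Lastie survived the first m draws
            total += 1
            hits += balls[-1] == 0    # ...and was in fact drawn last
    return hits / total

# With n = 10 and m = 4, both values are close to 1/6.
print(chance_lastie(10, 4), estimate(10, 4))
```

The analytic check: \(P(\text{last} \mid \text{not in first } m) = (1/n)\,/\,((n-m)/n) = 1/(n-m)\).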
The discussion in §§4–5 leaves RCT in a doubtful position. But it may be that the problems for RCT are due more to some defect in the theories of chance and randomness sketched in §§1–2. As noted earlier, there are alternative conceptions of chance and randomness that have some appeal and might perhaps save RCT. They won’t have much to say about the modal problems for merely possible random sequences mentioned when RCT was first introduced. But perhaps the other objections can be avoided. The problems for RCT arise fundamentally because of the split between product randomness and process chance. Closing this gap promises to aid RCT. The following two subsections will consider product conceptions of chance and process-based conceptions of randomness.
The frequency theory is a product conception of chance. An outcome-type has a chance, according to von Mises (1957), just in case it is part of a random sequence of outcomes in which that outcome type has a robust limit relative frequency. So chance can’t be separated from randomness; it in fact requires randomness. Moreover, since having a limit relative frequency of \(\frac{1}{2}\) is a measure one property of infinite binary sequences, all random sequences define collectives. (Those infinite binary sequences which do not converge on a limit frequency are non-random.) Yet the problems with frequentism as a theory of chance are well known (Hájek 1997; 2009; Jeffrey 1977)—we have come across some of them above—and saving RCT at the price of accepting frequentism has attracted few.
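The ‘measure one’ claim is in effect the strong law of large numbers; a quick simulated illustration (our sketch, with a pseudo-random fair coin standing in for a chancy one):

```python
import random

rng = random.Random(42)
heads = 0
for n in range(1, 100_001):
    heads += rng.random() < 0.5   # simulate one fair toss

# The relative frequency of heads settles ever closer to 1/2; by the
# strong law of large numbers, almost every (measure one) infinite
# sequence of fair tosses has limit relative frequency exactly 1/2.
print(heads / n)
```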
But the prospects are more promising for Humean views of chance (Lewis 1994; Loewer 2004), like the reductionist views discussed in supplements A.1, A.3, and D. These are a species of product conception of chance, for a possible world will not feature chances unless the best (simplest, best fitting, and most informative) description of what events occur in that world involves a probability function. Two worlds couldn’t differ in their chances without differing also somewhere in the events which occur. Chance thus supervenes on the actual outcomes in a given world, but not necessarily in a direct way—in the service of simplicity, some deviation from the actual frequency values may yield a description that is better overall. There is considerable debate over whether a Humean best systems account can explain all of the things we know about chance. Lewis thought that chance was the ‘big bad bug’ for his broadly Humean worldview (though he thought that the NP, discussed in supplement A.1, debugged Humean Supervenience: Lewis 1994), and there has been considerable debate over whether or not the PP or the BCP can be accounted for by the Humean, as the references in supplement A attest. Moreover, there are problems about whether the notion of a probability distribution ‘fitting’ a set of outcomes makes sense (Elga 2004). But suppose a best systems account can be made to work.
The role of simplicity in Humean views is important for randomness. For if a world contains a genuinely random sequence of outcomes, there will be no short description of that sequence. Those descriptions which do not attempt to describe what happens in all its particularity, but instead involve a probability distribution that makes the sequence a typical one with respect to that probability function, will be less informative but much shorter, and still fit just as well. So it seems that if a world contains a random sequence of outcomes, the best theory will be one which involves probabilities, and in virtue of being part of the best theory, those probabilities will be chances. That makes the right-to-left direction of RCT plausible. The other direction would hold if this route through simplicity was the only way in which chances could enter into the best system. Could there be a world with chances in which there was no random sequence? Here things are rather murkier for the Humean. For the hallmark of the best systems approach to chance is that, by involving simplicity, it avoids some of the problems for pure frequentism. In particular, an unrepresentative outcome sequence (a short one, or a highly biased one) need not force the chances to take the value of the actual frequencies. Suppose a world contains two intrinsic duplicate coins, one of which is tossed many times, landing heads about half the time; the other is tossed few times and lands tails every time. The second coin’s outcome sequence has a heads frequency of zero. It is a strength of the best systems account that we may still say the chance of heads was \(\frac{1}{2}\), because the coin is embedded in a world where another, similar, coin did have the appropriate outcome frequencies, and it is overall a simpler theory to say that both of these coins obeyed the same probabilistic laws than that the second one was an oasis of determinism.
But this case provides a problem for RCT—it looks like the second coin toss is not part of any random sequence of outcomes (since a few all-tails tosses is not random), but it has a chance.
We should not let even the partial success of the best systems analysis in preserving RCT sway us. The Humean account of chance is perfectly compatible with the existence of bias and with non-independent sequences of trials; RCT is not. The problem just mentioned arises even in the best circumstances for RCT, where there is at least one actual unbiased fair coin sequence. The existence of such a problem could have been predicted. The best systems account deviates from pure frequentism precisely in trying to accommodate the single-case process intuitions that, as we saw in §1, characterise chance. Every success in this direction brings this broadly product conception of chance closer to a process conception, and will therefore be a potential opportunity for counterexamples to RCT to emerge.
As mentioned at the beginning of §2, sometimes ‘random’ is used in a process sense. There have been some philosophical approaches to randomness which attempt to take this seriously, but which do not take it to be merely equivalent to ‘chancy’ and thus trivialise RCT. The most popular such approach is to connect randomness with indeterminism, and to defend RCT by arguing directly that indeterminism yields both chance and randomness. Prospects for that approach will be discussed in §7.
The next most discussed view of process randomness is an epistemic one. Random processes on this view are those whose outcomes we cannot know in advance; that is, random processes are unpredictable (Eagle 2005; Kyburg 1974: ch. 9).[23] Here is one clear expression of the view:
At the most basic level, we say an event is random if there is no way to predict its occurrence with certainty. Likewise, a random process is one for which we are not able to predict what happens next. (Frigg 2004: 430)
For this kind of view to yield the right results, we cannot count as ‘being able to predict a process’ if we merely guess rightly about its outcomes. For that reason, prediction must involve some notion of reasonableness; it must be rational for the agent to make the predictions they do. For example, Eagle (2005: 770) insists that it is the physical theory accepted by the predicting agent that makes certain posterior credences reasonable; simply guessing will not be reasonable, even if it’s correct.
This definition overlaps considerably with those definitions of randomness canvassed in §2. In particular, if a process is predictable, that will make available a winning betting strategy on the sequence of outcomes of that process, which cannot therefore be a KML-random sequence of outcomes. So unpredictability of the generating process is a necessary condition on KML-randomness of any concrete outcome sequence.
But unpredictability is not sufficient, for it may be that we cannot know everything true of the future outcomes of the process, and those truths might preclude a KML-random sequence. One way to see this draws on our discussion of chaotic dynamics. Let’s say that a system exhibits apparent dependence on initial conditions if states indiscriminable to us may end up arbitrarily far apart after some relatively short period of time. (Another definition, if knowledge obeys a margin for error principle, would be: a system exhibits apparent dependence on initial conditions if, for all we know, it is sensitively dependent on initial conditions.) Apparent dependence on initial conditions can obtain even if sensitive dependence on initial conditions does not. For there may well be a size \(v\) such that, under some partition of the set of states into regions of measure smaller than \(v\), any two points starting in the same cell of the partition evolve to future states which are close to one another—as long as \(v\) is small with respect to our abilities to discriminate. A system which is apparently but not sensitively dependent on initial conditions will be unpredictable to us, but there will exist an algorithm that, given some finite specification of the initial conditions, generates the future evolution of the system perfectly. (The key is that the finite specification must involve more precise data than we could know to characterise the system.) This sequence, if long enough, is not KML-random, but it is unpredictable.[24]
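Sensitive dependence, the notion apparent dependence is contrasted with, is easy to exhibit with the logistic map, a stock example from the chaos literature (the example and figures here are ours, purely illustrative):

```python
def logistic(x: float, r: float = 4.0) -> float:
    """One step of the logistic map x -> r*x*(1 - x); at r = 4 the map
    is chaotic on [0, 1]."""
    return r * x * (1 - x)

x, y = 0.3, 0.3 + 1e-10   # two states no real observer could discriminate
max_gap = 0.0
for _ in range(60):
    x, y = logistic(x), logistic(y)
    max_gap = max(max_gap, abs(x - y))

# The 1e-10 gap roughly doubles per iteration on average, so within a
# few dozen steps the two trajectories are macroscopically far apart.
print(max_gap)
```

An apparently but not sensitively dependent system, by contrast, would keep such indiscriminable states close together under a fine enough partition, yet still defeat prediction by observers limited to coarser discriminations.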
Because of the considerable overlap between the unpredictably generated sequences and the KML-random sequences (Frigg 2004: 431), many roles the latter can play will be played by the former too. Eagle (2005) further argues that the unpredictably generated sequences are a better fit to the theoretical role of randomness, and claims on that basis that randomness is unpredictability. One nice thing about this thesis is that, being a process notion, it directly connects with chance processes. We can thus directly evaluate the original thesis connecting chance and randomness, CT, in the form:
(CTU) A process is chancy iff it is not rationally predictable.
The left-to-right direction of CTU looks relatively secure when we attend just to independent and identically distributed trials. But when the trials are not independent, like the examples in §4.4, future outcomes can happen by chance, even if knowing the past states of the system does put one in a position to better predict the future outcomes. The right-to-left direction of CTU is in even worse straits. For if KML-randomness of a sequence entails its unpredictable generation, then every counterexample to RCT which involves KML-randomness without chance will also involve unpredictability without chance, and also constitute a counterexample to CTU. There is no succour to be found for defenders of RCT in this conception of randomness.
One last hope for the thesis that chancy outcomes are random comes from the connection of both notions with indeterminism. Consider this argument:
\(\mathbf{P1}\): An outcome happens by chance iff it is produced by an indeterministic process.

\(\mathbf{P2}\): A possible sequence of outcomes is random iff a repeated indeterministic process could produce all of the outcomes.

\(\mathbf{RCT}\): Therefore, an outcome happens by chance iff there is a possible random sequence of outcomes, produced by the same process, of which it is a member.
This argument is valid. If the premises are true, we have a direct argument for RCT. We have no direct response to the objections raised in §§4–5, but somehow, if this argument succeeds, those objections must miss the point. The premises have some initial plausibility (though P2 is dubious: surely a properly indeterministic process could produce any sequence of outcomes whatsoever, including many non-random sequences? We discuss this further in §7.2). The thesis that indeterminism is necessary and sufficient for chance has long been a popular claim. And randomness and indeterminism also seem to have a close connection. But to evaluate them, we will need to be more precise than we have been about determinism.
Earman-Montague Determinism:
A scientific theory is deterministic iff any two sequences of states in models of that theory which share some state at some time share every state at every time. A theory is indeterministic iff it is not deterministic; equivalently, if two systems can be in the same state at one time and evolve into distinct states. A system is (in)deterministic iff the theory which completely and correctly describes it (of which it is a model) is (in)deterministic. (Earman 1986; Montague 1974)
Determinism so stated is a supervenience thesis: as Schaffer (2007: 115) puts it, ‘A world \(w\) is deterministic iff: for all times \(t\) in \(w\), the total occurrent history of \(w\) supervenes on the occurrent state of \(w\) at \(t\) together with the laws of \(w\).’
With this in mind, we now evaluate the premises of this argument. There seem to be significant doubts about both directions of both premises. The upshot is that indeterminism offers little comfort for the defender of RCT.
It is very nearly a piece of philosophical orthodoxy that non-trivial objective chances require indeterminism. The view is seldom defended; even those who trouble to state the view explicitly (Lewis 1980: 120) don’t go to the further effort of justifying it, perhaps because it seems obvious. After all, under determinism, someone who knew the past and the laws would be in a position to know with certainty every future outcome. So our use of probability under determinism must be purely subjective, a side-effect of our (perhaps unavoidable) ignorance of the past or the laws. If this orthodoxy is correct, at least the left-to-right direction of P1 would be true.
However, there has recently been a considerable amount of work by philosophers defending the thesis that chance and determinism are consistent. Many of the topics dealt with above feature in their arguments. Loewer (2001) draws on the best systems analysis of chance to argue that, in worlds like ours (with entropy-increasing processes, and apparently fair coin tosses, etc.), the best description involves some probabilistic component which deserves to be called chance. Clark (1987) draws on how we use the phase space measure \(\mu\) to govern our expectations about the behaviour of Bernoulli and mixing systems in classical statistical mechanics, and argues (in effect) that this is an objective probability function despite the deterministic underlying physics. Various other proposals for deterministic chance have been developed (Eagle 2011; Glynn 2010; Hoefer 2007; Ismael 2009; Sober 2010). The general technique is to argue that there are probability distributions over outcomes that can play the chance role, even in impeccably deterministic theories. Many of these philosophers are sympathetic to reductionism about chance, which permits the compatibility of chance and determinism. For the fact that the entire history of a world supervenes on any moment of its history, as determinism states, apparently entails nothing one way or another about whether the best description of that entire history involves chances (this is related to the ‘no-connection’ argument of Schaffer 2007: 115). If there is such a thing as deterministic chance, however, P1 is false.
Anti-reductionists about chance have generally found these arguments less persuasive (Popper 1992; Black 1998; Weiner and Belnap 2006). In particular, it is in many ways hard to reconcile the BCP (and RP) with deterministic chance. Doing so will require that there is a physically possible world (sharing the same laws) which matches ours in occurrent facts up until \(t\) but diverges thereafter; but if determinism is true, such a divergence is not possible. If that world matches ours at any time, it matches it at all times. So, it seems, ‘only an incompatibilist function can fit the RP’ (Schaffer 2007: 130). In any case, the debate over deterministic ‘chance’, and over reductionism about chance more generally, is ongoing (see further the discussion in supplement A); the status of the left-to-right direction of P1 is at best unsettled.
The same cannot be said for the right-to-left direction. For the discussion in §5.3 showed that there are indeterministic theories without chances, those where indeterminism is secured by the existence of alternative future possibilities, but where those possibilities collectively do not permit or require a probability distribution over them. A more controversial class of cases of indeterminism without chances comes from those who reject universalism about chance: the thesis ‘that chance of truth applies to any proposition whatever’ (Lewis 1980: 91). If universalism is false, there may be indeterministic situations where the alternative outcomes are those to which chance does not apply. Von Mises rejected universalism, because he thought that chance applied properly only to ‘mass phenomena’; in an indeterministic world where the indeterministic process occurs only once, for example, the theory of probability does not apply. Hoefer (2007) also holds a view something like this, rejecting chances for those processes which don’t have stable enough outcome patterns.[25]
We could give an argument for P2 if it could be shown that, from the above definition of determinism, we could conclude (i) that only random sequences could occur under indeterminism, and (ii) that random sequences could only occur under indeterminism. However, both parts of this claim are problematic.
Theorem 8 (Humphreys 1978). There is a theory which is deterministic in the sense of Montague which has as a model a system which produces sequences which are random in the sense of von Mises/Church.
The proof of this theorem relies on the fact that a theory can be non-trivially deterministic without being computable. There is an arithmetically definable function which governs the evolution of the system over time (in Humphreys’s construction, the instantaneous state of the system is an arithmetically definable function of the time, which ensures that any two models agreeing at one time will agree at all times, guaranteeing determinism). But the function is not effectively computable, so no algorithm can produce the sequence of states that this system goes through.[26] The physical significance of such uncomputable functions is not clear (though see Pour-El and Richards 1983), but the possibility of a deterministic physics which features such equations of motion is enough to undermine a close connection between randomness and indeterminism. This shows that claim (ii) above is false.
Moreover, since we’ve already seen (§4.1) that it is possible for a chancy and indeterministic process to produce a non-random sequence of outcomes, we also have a counterexample to claim (i). Claim (i) could be saved if we made a more robust claim about what could happen under indeterminism. There is a sense in which, while it is possible that a fair coin could land heads an infinite number of times, it would not. That is, the counterfactual ‘If I tossed the coin an infinite number of times, it wouldn’t land all heads’ is apparently true. There is some question whether the counterfactual really is true; Lewis (1979a) and Williams (2008) argue that it is, while Hawthorne (2005) argues that it is not. But if it is, the way lies open to defend a modified counterfactual version of P2:
P\(2'\): A possible sequence of outcomes is random iff it is not the case that, were an indeterministic process to be repeated infinitely, it would not produce that sequence of outcomes.
But this is highly controversial; and the problem for claim (ii) would still stand.
If we are to accept this argument, then, we shall have to take P2 as an independent truth about randomness. Analyses of randomness as indeterminism, which take P2 to be analytic, have been given: to their detriment, if the foregoing observations are correct. Hellman (1978: 83) suggests that randomness is ‘roughly interchangeable with “indeterministic”’, while Ekeland (1988: 49) says ‘the main feature of randomness is some degree of independence from the initial conditions. … Better still, if one performs the same experiment twice with the same initial conditions, one may get two different outcomes’.
However, it has been argued that this view of randomness as indeterminism makes it difficult to understand many of the uses of randomness in science (Eagle 2005: §3.2). This view entails that random sampling, and random outcomes in chaotic dynamics, and random mating in population genetics, etc., are not in fact random, despite the plausibility of their being so. It does not apparently require fundamental indeterminism to have a randomised trial, and our confidence in the deliverances of such trials does not depend on our confidence that the trial design involved radioactive decay or some other fundamentally indeterministic process. Indeed, if Bohmians or Everettians are right (an open epistemic possibility), and quantum mechanics is deterministic, this view entails that nothing is actually random, not even the most intuitively compelling cases. This kind of view attributes to scientists a kind of error theory about many of their uses of the term ‘random’, but as yet the philosophical evidence adduced to convict scientists of this pervasive error is not compelling.
One reason for the continuing attractiveness of the thesis that randomness is indeterminism may be the fact that, until quite recently, there has been a tendency for philosophers and others to confuse unpredictability and indeterminism. Laplace’s original conception of determinism was an epistemic one:
[A]n intelligence which could comprehend all the forces by which nature is animated and the respective situation of all the [things which] compose it—an intelligence sufficiently vast to submit these data to analysis—it would embrace in the same formula the movements of the greatest bodies in the universe and those of the lightest atom; for it, nothing would be uncertain and the future, as well as the past, would be present to its eyes. (Laplace 1826: p. 4)
This kind of account still resonates with us, despite the fact that with the Montague-Earman definition we now have a non-epistemic characterisation of determinism. Since random sequences will, almost always, be unpredictable, it is clear why we might then take them to be indeterministic. But once we keep clear the distinction between predictability and determinism, we should be able to avoid this confusion (Bishop 2003; Schurz 1995; Werndl 2009).
From what we have seen, the Commonplace Thesis cannot be sustained. It would in many ways have been nice if chance of a process and randomness of its product had gone hand in hand—the epistemology of chance would be much aided if it invariably showed itself in random outputs, and we could have had a tight constraint on what outcomes to expect of a repeated chance process, to say nothing of the further interesting consequences the thesis may have for random sampling or probabilistic explanation, mentioned in the introduction. But the counterexamples to the thesis in §§4–5 show that it is false, even in its most plausible form. Various attempts to salvage the thesis, by appeal to non-standard accounts of chance or randomness, fail to give us a version of the thesis of much interest or bearing on the issues we had hoped it would illuminate. A final attempt to argue directly for the thesis from the connections between chance, randomness, and determinism also failed, though it does shed light on all three notions. It is safest, therefore, to conclude that chance and randomness, while they overlap in many cases, are separate concepts.
This is not to say that there is no link between KML-randomness and physical chance. The observation of a random sequence of outcomes is a defeasible incentive to inquire into the physical basis of the outcome sequence, and it provides at least a prima facie reason to think that a process is chancy (though recall §4.5). Moreover, if we knew that a process is chancy, we should expect (eventually, with high and increasing probability) a random sequence of outcomes. Conversely, a sequence of outcomes that appears predictable, compressible, and rule-governed will be strong disconfirming evidence for any hypothesis to the effect that the sequence was produced by chance alone. Hellman concludes
The link, then, between mathematical and physical randomness is epistemic and only that. Observations of mathematically non-random sequences can be used to decide when further explanation in terms of as yet undiscovered causal factors is wanting. But, in no sense is any notion of mathematical randomness serving as an explication for ‘ultimate physical randomness’, whatever that might be. (Hellman, 1978: 86)
Taking ‘mathematical randomness’ to be product randomness, and ‘physical randomness’ to mean process randomness (chanciness), this conclusion seems unavoidable.
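The epistemic role of compressibility noted above can be illustrated informally. The sketch below is not part of the entry's formal apparatus (KML-randomness concerns incompressibility in the idealised, Kolmogorov-complexity sense, which is uncomputable); it merely uses off-the-shelf `zlib` compression as a crude, practical stand-in: a rule-governed sequence compresses dramatically, while a pseudo-random one of the same length barely compresses at all.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; values near 1 suggest incompressibility."""
    return len(zlib.compress(data, 9)) / len(data)

random.seed(0)
patterned = bytes(range(256)) * 40   # rule-governed: a 256-byte pattern repeated 40 times
noisy = random.randbytes(10240)      # pseudo-random bytes of the same length (Python 3.9+)

# The patterned sequence compresses to a tiny fraction of its size;
# the noisy one stays essentially the same size (or slightly larger, from overhead).
print(round(compression_ratio(patterned), 3))
print(round(compression_ratio(noisy), 3))
```

Of course, as Hellman's point implies, passing such a compression check is only defeasible evidence of randomness: a pseudo-random generator, as used here, produces incompressible-looking output by an entirely deterministic rule.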
A parallel with the relationship between frequencies and chances is tempting and unsurprising. Relative frequencies are good but not infallible indicators of the chances, and the existence of outcome frequencies strictly between 0 and 1 is evidence that chance processes are involved in the production of those outcomes. But frequentism is implausible as a reductive account of chance. And it is no more plausible to think that chance is present iff random sequences of outcomes are.[27] An evidential and epistemic connection between chance and randomness falls well short of the conceptual connection proposed by the Commonplace Thesis with which we started.
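The sense in which relative frequencies are good but fallible indicators of chance can be made vivid with a simple simulation (a sketch for illustration only; the "true" chance value 0.5 is stipulated here, not something the frequencies themselves deliver):

```python
import random

random.seed(1)
CHANCE = 0.5  # stipulated true chance of heads, for illustration

def relative_frequency(n: int) -> float:
    """Relative frequency of heads in n simulated flips of a coin with the given chance."""
    heads = sum(random.random() < CHANCE for _ in range(n))
    return heads / n

# At small n the frequency can stray far from the chance; as trials accumulate
# it converges on the chance value, yet at every finite stage remains fallible.
for n in (10, 1000, 100_000):
    print(n, relative_frequency(n))
```

The point of the parallel in the text is that observed frequencies (like observed random-looking sequences) are evidence about chance, not constitutive of it: identifying chance with any finite-stage frequency, like identifying it with product randomness, gets the direction of the connection wrong.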
Bell’s Theorem | chaos | computability and complexity | determinism: causal | epistemology: Bayesian | Lewis, David | probability, interpretations of | statistical physics: philosophy of statistical mechanics | time: thermodynamic asymmetry in
Thanks to audiences at the Sigma Group at the LSE, Leeds HPS, and the first year seminar in Oxford, for comments on presentations of parts of this material, and to Alan Hájek, Chris Porter, and Fred Kroon for extensive and very helpful comments on a draft entry. (In particular, the argument of supplement A.2, note 4 is due to Hájek.) In revising the entry, I’ve been grateful to Chris Porter for further helpful comments and pointers to some of the recent literature. The ‘pluralist’ approach mentioned in supplement B.1.2 is due to him, as is the broad outline of the argument of B.1.3.
The Stanford Encyclopedia of Philosophy is copyright © 2023 by The Metaphysics Research Lab, Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054