“Probabilistic Causation” designates a group of theories that aim to characterize the relationship between cause and effect using the tools of probability theory. The central idea behind these theories is that causes change the probabilities of their effects. This article traces developments in probabilistic causation, including recent developments in causal modeling.
This entry surveys the main approaches to characterizing causation in terms of probability. Section 1 provides some of the motivation for probabilistic approaches to causation, and addresses a few preliminary issues. Section 2 surveys theories that aim to characterize causation in terms of probability-raising. Section 3 surveys developments in causal modeling. Section 4 covers probabilistic accounts of actual causation.
In this section, we will provide some motivation for trying to understand causation in terms of probabilities, and address a couple of preliminary issues.
According to David Hume, causes are invariably followed by their effects:
We may define a cause to be an object, followed by another, and where all the objects similar to the first, are followed by objects similar to the second. (1748: section VII)
Attempts to analyze causation in terms of invariable patterns of succession are referred to as “regularity theories” of causation. There are a number of well-known problems facing regularity theories, at least in their simplest forms, and these may be used to motivate probabilistic approaches to causation. Moreover, an overview of these difficulties will help to give a sense of the kinds of problem that any adequate theory of causation would have to solve.
(i) Imperfect Regularities. The first difficulty is that most causes are not invariably followed by their effects. For example, smoking is a cause of lung cancer, even though some smokers do not develop lung cancer. Imperfect regularities may arise for two different reasons. First, they may arise because of the heterogeneity of circumstances in which the cause arises. For example, some smokers may have a genetic susceptibility to lung cancer, while others do not; some non-smokers may be exposed to other carcinogens (such as asbestos), while others are not. Second, imperfect regularities may also arise because of a failure of physical determinism. If an event is not determined to occur, then no other event can be (or be a part of) a sufficient condition for that event. The success of quantum mechanics—and to a lesser extent, other theories employing probability—has shaken our faith in determinism. Thus it has struck many philosophers as desirable to develop a theory of causation that does not presuppose determinism.
The central idea behind probabilistic theories of causation is that causes change the probability of their effects; an effect may still occur in the absence of a cause or fail to occur in its presence. Thus smoking is a cause of lung cancer, not because all smokers develop lung cancer, but because smokers are more likely to develop lung cancer than non-smokers. This is entirely consistent with there being some smokers who avoid lung cancer, and some non-smokers who succumb to it.
(ii) Irrelevance. A condition that is invariably followed by some outcome may nonetheless be irrelevant to that outcome. Salt that has been hexed by a sorcerer invariably dissolves when placed in water (Kyburg 1965), but hexing does not cause the salt to dissolve. Hexing does not make a difference for dissolution. Probabilistic theories of causation capture this notion of making a difference by requiring that a cause make a difference for the probability of its effect.
(iii) Asymmetry. If A causes B, then, typically, B will not also cause A. Smoking causes lung cancer, but lung cancer does not cause one to smoke. One way of enforcing the asymmetry of causation is to stipulate that causes precede their effects in time. But it would be nice if a theory of causation could provide some explanation of the directionality of causation, rather than merely stipulate it. Some proponents of probabilistic theories of causation have attempted to use the resources of probability theory to articulate a substantive account of the asymmetry of causation.
(iv) Spurious Regularities. Suppose that a cause is regularly followed by two effects. Here is an example from Jeffrey (1969): Suppose that whenever the barometric pressure in a certain region drops below a certain level, two things happen. First, the height of the column of mercury in a particular barometer drops below a certain level. Shortly afterwards, a storm occurs. This situation is shown schematically in Figure 1. Then, it may well also be the case that whenever the column of mercury drops, there will be a storm. If so, a simple regularity theory would seem to rule that the drop of the mercury column causes the storm. In fact, however, the regularity relating these two events is spurious. The ability to handle such spurious correlations is probably the greatest source of attraction for probabilistic theories of causation.
Figure 1
In this sub-section, we will review some of the basics of the mathematical theory of probability, and introduce some notation. Readers already familiar with the mathematics of probability may wish to skip this section.
Probability is a function, P, that assigns values between zero and one, inclusive. Usually the arguments of the function are taken to be sets, or propositions in a formal language. The formal term for these arguments is ‘events’. We will here use the notation that is appropriate for propositions, with ‘\(\nsim\)’ representing negation, ‘&’ representing conjunction, and ‘\(\vee\)’ representing disjunction. Sometimes when there is a long conjunction, this is abbreviated by using commas instead of ampersands. The domain of a probability function has the structure of a field or a Boolean algebra. This means that the domain is closed under complementation and the taking of finite unions or intersections (for sets), or under negation, conjunction, and disjunction (for propositions). Thus if A and B are events in the domain of P, so are \({\nsim}A\), \(A \amp B\), and \(A \vee B\).
Some standard properties of probability are the following:
In addition to probability theory, the entry will use basic notation from set theory and logic. Sets will appear in boldface.
Some further definitions:
If \(\PP(B) = 0\), then the ratio in the definition of conditional probability is undefined. There are, however, a variety of technical developments that will allow us to define \(\PP(A \mid B)\) when \(\PP(B)\) is 0. We will ignore this problem here.
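To make the ratio definition concrete, here is a minimal sketch (the toy distribution and the helper names `prob` and `cond_prob` are illustrative assumptions, not from the entry) that computes conditional probabilities from a joint distribution over two binary events:

```python
# A toy joint distribution over two binary events A and B,
# given as the probabilities of the four conjunctions.
joint = {
    ("A", "B"): 0.2,
    ("A", "~B"): 0.1,
    ("~A", "B"): 0.3,
    ("~A", "~B"): 0.4,
}

def prob(event):
    """Marginal probability of a single event ('A', '~A', 'B', or '~B')."""
    return sum(p for outcome, p in joint.items() if event in outcome)

def cond_prob(a, b):
    """P(a | b) = P(a & b) / P(b); undefined (None) when P(b) = 0."""
    pb = prob(b)
    if pb == 0:
        return None
    pab = sum(p for outcome, p in joint.items()
              if a in outcome and b in outcome)
    return pab / pb

print(cond_prob("A", "B"))  # 0.2 / 0.5 = 0.4
```

The `None` branch mirrors the point in the text: when \(\PP(B) = 0\) the ratio is simply undefined.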
As a convenient shorthand, a probabilistic statement that contains only a variable or set of variables, but no values, will be understood as a universal quantification over all possible values of the variable(s). Thus if \(\bX = \{X_1 , \ldots ,X_m\}\) and \(\bY = \{Y_1 , \ldots ,Y_n\}\), we may write
\[\PP(\bX \mid \bY) = \PP(\bX)\] as shorthand for
\[\begin{align} \forall x_1 \ldots \forall x_m \forall y_1 \ldots \forall y_n & [\PP(X_1 =x_1 ,\ldots ,X_m =x_m \mid Y_1 =y_1 ,\ldots ,Y_n =y_n)\\ & = \PP(X_1 =x_1 ,\ldots ,X_m =x_m)]\end{align}\] (where the domain of quantification for each variable will be the range of the relevant random variable).
Causal relations are normally thought to be objective features of the world. If they are to be captured in terms of probability theory, then probability assignments should represent some objective feature of the world. There are a number of attempts to interpret probabilities objectively, the most prominent being frequency interpretations and propensity interpretations. Most proponents of probabilistic theories of causation have understood probabilities in one of these two ways. Notable exceptions are Suppes (1970), who takes probability to be a feature of a model of a scientific theory; and Skyrms (1980), who understands the relevant probabilities to be the subjective probabilities of a certain kind of rational agent.
It is common to distinguish between general, or type-level causation, on the one hand, and singular, token-level or actual causation, on the other. This entry adopts the terms general causation and actual causation. Causal claims usually have the structure ‘C causes E’. C and E are the relata of the causal claim; we will discuss causal relata in more detail in the next section. General causation and actual causation are often distinguished by their relata. General causal claims, such as “smoking causes lung cancer”, typically do not refer to particular individuals, places, or times, but only to event-types or properties. Singular causal claims, such as “Jill’s heavy smoking during the 2000s caused her to develop lung cancer”, typically do make reference to particular individuals, places, and times. This is an imperfect guide, however; for example, some theories of general causation to be discussed below take their causal relata to be time-indexed.
A related distinction is that general causation is concerned with a full range of possibilities, whereas actual causation is concerned with how events actually play out in a specific case. At a minimum, in claims of actual causation, “cause” functions as a success verb. The claim “Jill’s heavy smoking during the 2000s caused her to develop lung cancer” implies that Jill smoked heavily during the 2000s and that she developed lung cancer.
The theories to be discussed in Sections 2 and 3 below primarily concern general causation, while Section 4 discusses theories of actual causation.
A number of different candidates have been proposed for the relata of causal relations. The relata of actual causal relations are often taken to be events (not to be confused with events in the purely technical sense), although some authors (e.g., Mellor 2004) argue that they are facts. The relata of general causal relations are often taken to be properties or event-types. For purposes of definiteness, events will refer to the relata of actual causation, and factors will refer to the relata of general causation. These terms are not intended to imply a commitment to any particular view on the nature of the causal relata.
In probabilistic approaches to causation, causal relata are represented by events or random variables in a probability space. Since the formalism requires us to make use of negation, conjunction, and disjunction, the relata must be entities (or be accurately represented by entities) to which these operations can be meaningfully applied.
In some theories, the time at which an event occurs or a property is instantiated plays an important role. In such cases, it will be useful to include a subscript indicating the relevant time. Thus the relata might be represented by \(C_t\) and \(E_{t'}\). If the relata are particular events, this subscript is just a reminder; it adds no further information. For example, if the event in question is the opening ceremony of the Rio Olympic games, the subscript ‘8/5/2016’ is not necessary to disambiguate it from other events. In the case of properties or event-types, however, such subscripts do add further information. The time index need not refer to a date or absolute time. It could refer to a stage in the development of a particular kind of system. For example, exposure to lead paint in children can cause learning disabilities. Here the time index would indicate that it is exposure in children, that is, in the early stages of human life, that causes the effect in question. The time indices may also indicate relative times. Exposure to the measles virus causes the appearance of a rash approximately two weeks later. We might indicate this time delay by assigning exposure a time index of \(t = 0\), and rash an index of \(t = 14\) (for 14 days).
It is standard to assume that causes and effects must be distinct from one another. This means that they must not stand in logical relations or part-whole relations to one another. Lewis 1986a contains a detailed discussion of the relevant notion of distinctness. We will typically leave this restriction tacit.
Psillos 2009 provides an overview of regularity theories of causation. Lewis 1973 contains a brief but clear and forceful overview of problems with regularity theories. The entry for scientific explanation contains discussions of some of these problems.
Hájek and Hitchcock 2016b is a short introduction to probability theory geared toward philosophical applications. Billingsley 1995 and Feller 1968 are two standard texts on probability theory. The entry for interpretations of probability includes a brief introduction to the formalism of probability theory, and discusses the various interpretations of probability. Galavotti 2005 and Gillies 2000 are good surveys of philosophical theories of probability. Hájek and Hitchcock 2016a includes essays covering the major interpretations of probability.
The Introduction of Eells 1991 provides a good overview of the distinction between general and actual causation.
Bennett 1988 is an excellent discussion of facts and events in the context of causation. Ehring 2009 is a survey of views about causal relata. See also the entries for the metaphysics of causation, events, facts, and properties.
The theories canvassed in this section all develop the basic idea that causes raise the probability of their effects. These theories were among the leading theories of causation during the second half of the 20th century. Today, they have largely been supplanted by the causal modeling approaches discussed in Section 3.
The central idea that causes raise the probability of their effects can be expressed formally using conditional probability. C raises the probability of E just in case:
In words, the probability that E occurs, given that C occurs, is higher than the unconditional probability that E occurs. Alternately, we might say that C raises the probability of E just in case:
the probability that E occurs, given that C occurs, is higher than the probability that E occurs, given that C does not occur. These two formulations turn out to be equivalent in the sense that inequality \(\PR_1\) will hold just in case \(\PR_2\) holds. Some authors (e.g., Reichenbach 1956, Suppes 1970, Cartwright 1979) have formulated probabilistic theories of causation using inequalities like \(\PR_1\), others (e.g., Skyrms 1980, Eells 1991) have used inequalities like \(\PR_2\). This difference is mostly immaterial, but for consistency we will stick with (\(\PR_2\)). Thus a first stab at a probabilistic theory of causation would be:
PR has some advantages over the simplest version of a regularity theory of causation (discussed in Section 1.1 above). PR is compatible with imperfect regularities: C may raise the probability of E even though instances of C are not invariably followed by instances of E. Moreover, PR addresses the problem of relevance: if C is a cause of E, then C makes a difference for the probability of E. But as it stands, PR does not address either the problem of asymmetry, or the problem of spurious correlations. PR does not address the problem of asymmetry because probability-raising turns out to be symmetric: \(\PP(E \mid C) \gt \PP(E \mid {\nsim}C)\) if and only if \(\PP(C \mid E) \gt \PP(C \mid {\nsim}E)\). Thus PR by itself cannot determine whether C is the cause of E or vice versa. PR also has trouble with spurious correlations. If C and E are both caused by some third factor, A, then it may be that \(\PP(E \mid C) \gt \PP(E \mid {\nsim}C)\) even though C does not cause E. This is the situation shown in Figure 1 above. Here, C is the drop in the level of mercury in a barometer, and E is the occurrence of a storm. Then we would expect that \(\PP(E \mid C) \gt \PP(E \mid {\nsim}C)\). In this case, atmospheric pressure is referred to as a confounding factor.
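The symmetry of probability-raising can be checked numerically. The sketch below (the sampling scheme and names are my own, not the entry's) draws random joint distributions over two binary variables and confirms that C raises the probability of E exactly when E raises the probability of C:

```python
import random

random.seed(1)

def random_joint():
    """A random joint distribution over binary C and E, stored as
    probabilities of the four conjunctions, keyed by (C-value, E-value)."""
    w = [random.random() for _ in range(4)]
    s = sum(w)
    keys = [(True, True), (True, False), (False, True), (False, False)]
    return dict(zip(keys, [x / s for x in w]))

def raises(joint, target, given):
    """Does the `given` variable raise the probability of the `target`
    variable?  Indices: 0 for C, 1 for E."""
    def cond(t_val, g_val):
        pg = sum(p for k, p in joint.items() if k[given] == g_val)
        ptg = sum(p for k, p in joint.items()
                  if k[given] == g_val and k[target] == t_val)
        return ptg / pg
    return cond(True, True) > cond(True, False)

# In every random distribution, "C raises E" and "E raises C" agree.
symmetric = all(
    raises(j, target=1, given=0) == raises(j, target=0, given=1)
    for j in (random_joint() for _ in range(1000))
)
print(symmetric)  # True
```

Both directions reduce to the same inequality \(\PP(C \amp E) \gt \PP(C)\PP(E)\), which is why the agreement is exact rather than approximate.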
Hans Reichenbach’s The Direction of Time was published posthumously in 1956. In it, Reichenbach is concerned with the origins of temporally asymmetric phenomena, particularly the increase in entropy dictated by the second law of thermodynamics. In this work, he presents the first fully developed probabilistic theory of causation, although some of the ideas can be traced back to an earlier paper from 1925 (Reichenbach 1925).
Reichenbach introduced the terminology of screening off to describe a particular type of probabilistic relationship. If \(\PP(E \mid A \amp C) = \PP(E \mid C)\), then C is said to screen A off from E. When \(\PP(A \amp C) \gt 0\), this equality is equivalent to \(\PP(A \amp E \mid C) = \PP(A \mid C) \times \PP(E \mid C)\); i.e., A and E are probabilistically independent conditional upon C.
Reichenbach recognized that there were two kinds of causal structure in which C will typically screen A off from E. The first occurs when A causes C, which in turn causes E, and there is no other route or process by which A affects E. This is shown in Figure 2.
Figure 2
In this case, Reichenbach said that C is causally between A and E. We might say that C is an intermediate cause between A and E, or that C is a proximate cause of E and A a distal cause of E. For example, unprotected sex (A) causes AIDS (E) only by causing HIV infection (C). Then we would expect that among those already infected with HIV, those who became infected through unprotected sex would be no more likely to contract AIDS than those who became infected in some other way.
The second type of case that produces screening off occurs when C is a common cause of A and E, such as in the barometer example depicted in Figure 1. A drop in atmospheric pressure (C) causes both a drop in the level of mercury in a barometer (A) and a storm (E). (This notation is slightly different from one used earlier.) The atmospheric pressure will screen off the barometer reading from the weather: given that the atmospheric pressure has dropped, the reading of the barometer makes no difference for the probability of whether a storm will occur.
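A small numerical model of the barometer case may help; the specific probabilities below are invented for illustration. A and E are generated independently given the common cause C, so C both induces a spurious correlation between them and screens it off:

```python
# Hypothetical numbers.  C: pressure drop; A: barometer drop; E: storm.
p_c = 0.3
p_a_given = {True: 0.95, False: 0.05}   # P(A | C), P(A | ~C)
p_e_given = {True: 0.80, False: 0.10}   # P(E | C), P(E | ~C)

# Build the joint distribution, with A and E independent conditional on C.
joint = {}
for c in (True, False):
    pc = p_c if c else 1 - p_c
    for a in (True, False):
        pa = p_a_given[c] if a else 1 - p_a_given[c]
        for e in (True, False):
            pe = p_e_given[c] if e else 1 - p_e_given[c]
            joint[(c, a, e)] = pc * pa * pe

def prob(pred):
    return sum(p for outcome, p in joint.items() if pred(*outcome))

def cond(pred, given):
    return prob(lambda c, a, e: pred(c, a, e) and given(c, a, e)) / prob(given)

# A raises the probability of E overall (the spurious correlation) ...
p_e_a = cond(lambda c, a, e: e, lambda c, a, e: a)
p_e_not_a = cond(lambda c, a, e: e, lambda c, a, e: not a)
print(p_e_a > p_e_not_a)  # True

# ... but C screens A off from E: P(E | A & C) = P(E | C).
p_e_ac = cond(lambda c, a, e: e, lambda c, a, e: a and c)
p_e_c = cond(lambda c, a, e: e, lambda c, a, e: c)
print(abs(p_e_ac - p_e_c) < 1e-12)  # True
```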
Reichenbach used the apparatus of screening off to address the problem of spurious correlations. In our example, while a drop in the column of mercury (A) raises the probability of a storm (E) overall, it does not raise the probability of a storm when we further condition on the atmospheric pressure. That is, if A and E are spuriously correlated, then A will be screened off from E by a common cause. More specifically, suppose that \(C_t\) and \(E_{t'}\) are events that occur at times t and \(t'\) respectively. Then
Note the restriction of \(t''\) to times earlier than or simultaneous with the occurrence of \(C_t\). That is because causal intermediates between \(C_t\) and \(E_{t'}\) will often screen \(C_t\) off from \(E_{t'}\). In such cases we still want to say that \(C_t\) is a cause of \(E_{t'}\), albeit a distal or indirect cause.
Suppes (1970) independently offered an equivalent definition of causation, although his motivation for the no-screening-off condition was different from Reichenbach’s. Suppes extended the framework in a number of directions. While Reichenbach was interested in probabilistic causation primarily in connection with issues that arise within the foundations of statistical mechanics, Suppes was interested in defining causation within the framework of probabilistic models of scientific theories. For example, Suppes offers an extended discussion of causation in the context of psychological models of learning.
Reichenbach (1956) formulated a principle he dubbed the ‘Common Cause Principle’ (CCP). Suppose that events A and B are positively correlated, i.e., that
But suppose that neither A nor B is a cause of the other. Then Reichenbach maintained that there will be a common cause, C, of A and B, satisfying the following conditions:
When events A, B, and C satisfy these conditions, they are said to form a conjunctive fork. Conditions 5 and 6 follow from C being a cause of A and a cause of B. Conditions 2 and 3 stipulate that C and \({\nsim}C\) screen off A from B.
Conditions 2 through 6 mathematically entail condition 1. Reichenbach says that the common cause explains the correlation between A and B. The idea is that probabilistic correlations that are not the result of one event causing another are ultimately derived from probabilistic correlations that do result from a causal relationship.
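The entailment can be spot-checked numerically. In the sketch below (my own construction), a fork model is specified by \(\PP(C)\) and the conditional probabilities of A and B given C and \({\nsim}C\), with A and B independent conditional on each value of C (the screening-off conditions) and with C raising the probability of each effect; every such model then exhibits the positive correlation of condition 1:

```python
import random

random.seed(2)

def fork_correlation(pc, pa_c, pa_nc, pb_c, pb_nc):
    """P(A & B) - P(A)P(B) for a conjunctive-fork model in which A and B
    are independent conditional on C and conditional on ~C."""
    p_ab = pc * pa_c * pb_c + (1 - pc) * pa_nc * pb_nc
    p_a = pc * pa_c + (1 - pc) * pa_nc
    p_b = pc * pb_c + (1 - pc) * pb_nc
    return p_ab - p_a * p_b

violations = 0
for _ in range(1000):
    pc = random.uniform(0.01, 0.99)
    # C raises the probability of A and of B (conditions 5 and 6).
    pa_c, pa_nc = sorted((random.uniform(0.01, 0.99) for _ in range(2)),
                         reverse=True)
    pb_c, pb_nc = sorted((random.uniform(0.01, 0.99) for _ in range(2)),
                         reverse=True)
    if fork_correlation(pc, pa_c, pa_nc, pb_c, pb_nc) <= 0:
        violations += 1

print(violations)  # 0: the fork conditions always yield condition 1
```

Algebraically the correlation equals \(\PP(C)(1-\PP(C))\,[\PP(A \mid C)-\PP(A \mid {\nsim}C)]\,[\PP(B \mid C)-\PP(B \mid {\nsim}C)]\), which is positive whenever C raises the probability of both effects.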
Reichenbach’s definition of causation, discussed in Section 2.2 above, appeals to time order: it requires that a cause occur earlier than its effect. But Reichenbach also thought that the direction from causes to effects can be identified with a pervasive statistical asymmetry. Suppose that events A and B are correlated, and that C satisfies conditions 2–6 above, so that ACB forms a conjunctive fork. If C occurs earlier than A and B, and there is no event satisfying 2 through 6 that occurs later than A and B, then ACB is said to form a conjunctive fork open to the future. Analogously, if there is a later event satisfying 2 through 6, but no earlier event, we have a conjunctive fork open to the past. If an earlier event C and a later event D both satisfy 2 through 6, then ACBD forms a closed fork. Reichenbach’s proposal was that the direction from cause to effect is the direction in which open forks predominate. In our world, there are a great many forks open to the future, few or none open to the past. However, we shall see in section 3.6 below that conjunctive forks are not the best structures for identifying causal direction.
In the Reichenbach-Suppes definition of causation, the inequality \(\PP(E_{t'} \mid C_t) \gt \PP(E_{t'} \mid {\nsim}C_t)\) is necessary, but not sufficient, for causation. It is not sufficient, because it may hold in cases where \(C_t\) and \(E_{t'}\) share a common cause. Unfortunately, common causes can also give rise to cases where this inequality is not necessary for causation either. Suppose, for example, that smoking is highly correlated with living in the country: those who live in the country are much more likely to smoke as well. Smoking is a cause of lung cancer, but suppose that city pollution is an even stronger cause of lung cancer. Then it may be that smokers are, over all, less likely to suffer from lung cancer than non-smokers. Letting C represent smoking, B living in the country, and E lung cancer, \(\PP(E \mid C) \lt \PP(E \mid {\nsim}C)\). Note, however, that if we conditionalize on whether one lives in the country or in the city, this inequality is reversed: \(\PP(E \mid C \amp B) \gt \PP(E \mid {\nsim}C \amp B)\), and \(\PP(E \mid C \amp{\nsim}B) \gt \PP(E \mid {\nsim}C \amp{\nsim}B)\). Such reversals of probabilistic inequalities are instances of “Simpson’s Paradox”. The problem that Simpson’s paradox creates for probabilistic theories of causation was pointed out by Nancy Cartwright (1979) and Brian Skyrms (1980) at about the same time.
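The reversal can be reproduced with hypothetical numbers (the figures below are invented for illustration, not data): smoking raises the probability of lung cancer among country dwellers and among city dwellers alike, yet lowers it in the aggregate, because smokers are concentrated in the low-risk countryside:

```python
# C: smoking, B: country living, E: lung cancer.  All numbers hypothetical.
p_b_given_c = {True: 0.9, False: 0.1}   # smokers mostly live in the country

# P(E | C, B): cancer risk by smoking status and location.
p_e = {(True, True): 0.15, (False, True): 0.05,     # country
       (True, False): 0.40, (False, False): 0.30}   # polluted city

def p_e_given(c):
    """P(E | C = c), averaging over where such people live."""
    pb = p_b_given_c[c]
    return pb * p_e[(c, True)] + (1 - pb) * p_e[(c, False)]

# Smoking raises the probability of cancer in each background context ...
assert p_e[(True, True)] > p_e[(False, True)]    # holding country living fixed
assert p_e[(True, False)] > p_e[(False, False)]  # holding city living fixed

# ... yet lowers it overall: Simpson's paradox.
print(p_e_given(True), p_e_given(False))  # roughly 0.175 vs 0.275
```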
Cartwright and Skyrms sought to rectify the problem by replacing conditions (ii) and (iii) of Reich with the requirement that causes must raise the probabilities of their effects in various background contexts. Cartwright proposed the following definition:
Skyrms proposed a slightly weaker condition: a cause must raise the probability of its effect in at least one background context, and lower it in none. A background context is a conjunction of factors. When such a conjunction of factors is conditioned on, those factors are said to be “held fixed”. To specify what the background contexts will be, then, we must specify what factors are to be held fixed. In the previous example, we saw that the true causal relevance of smoking for lung cancer was revealed when we held country living fixed, either positively (conditioning on \(B\)) or negatively (conditioning on \({\nsim}B\)). This suggests that in evaluating the causal relevance of C for E, we need to hold fixed other causes of E, either positively or negatively. This suggestion is not entirely correct, however. Let C and E be smoking and lung cancer, respectively. Suppose D is a causal intermediary, say the presence of tar in the lungs. If C causes E exclusively via D, then D will screen C off from E: given the presence (absence) of tar in the lungs, the probability of lung cancer is not affected by whether the tar got there by smoking. Thus we will not want to hold fixed any causes of E that are themselves caused by C. Let us call the set of all factors that are causes of E, but are not caused by C, the set of independent causes of E. A background context for C and E will then be a maximal conjunction, each of whose conjuncts is either an independent cause of E, or the negation of an independent cause of E.
Note that the specification of factors that need to be held fixed appeals to causal relations, so the theory no longer offers a reductive analysis of causation. Nonetheless, the theory imposes probabilistic constraints upon possible causal relations in the sense that a given set of probability relations will be incompatible with at least some systems of causal relations. Note also that we have dropped the subscripts referring to times. Cartwright claimed that it is not necessary to appeal to the time order of events to distinguish causes from effects in her theory. That is because it will no longer be true in general that if C raises the probability of E in every relevant background context B, then E will raise the probability of C in every background context \(B'\). The reason is that the construction of the background contexts ensures that the background contexts relevant to assessing C’s causal relevance for E are different from those relevant to assessing E’s causal relevance for C. However, Davis (1988) and Eells (1991) both argue cogently that Cartwright’s account will still sometimes rule that effects bring about their causes.
Cartwright defined a cause as a factor that increases the probability of its effect in every background context. But it is easy to see that there are other possible probability relations between C and E. Eells (1991) proposes the following taxonomy:
\(C_t\) is causally relevant for \(E_{t'}\) if and only if it is a positive, negative, or mixed cause of \(E_{t'}\); i.e., if and only if \(t \lt t'\) and \(C_t\) is not causally neutral for \(E_{t'}\).
It should be apparent that when constructing background contexts for C and E one should hold fixed not only (positive) causes of E that are independent of \(C\), but also negative and mixed causes of E; in other words, one should hold fixed all factors that are causally relevant for E, except those for which C is causally relevant. This suggests that causal relevance, rather than positive causation, is the most basic metaphysical concept.
Eells’s taxonomy brings out an important distinction. It is one thing to ask whether C is causally relevant to E in some way; it is another to ask in which way C is causally relevant to E. To say that C causes E is then potentially ambiguous: it might mean that C is causally relevant to E; or it might mean that C is a positive cause of E. Probabilistic theories of causation can be used to answer both types of question.
Eells claims that general causal claims must be relativized to a population. A very heterogeneous population will include a great many different background conditions, while a homogeneous population will contain few. A heterogeneous population can always be subdivided into homogeneous subpopulations. It will often happen that C is a mixed cause of E relative to a population P, while being a positive cause, negative cause, or causally neutral for E in various subpopulations of P.
According to both Cart and Eells, a cause must raise the probability of its effect in every background context. This has been called the requirement of contextual-unanimity. Dupré (1984) raises the following counterexample to the contextual unanimity requirement. Suppose that there is a very rare gene that has the following effect: those that possess the gene have their chances of contracting lung cancer lowered when they smoke. In this scenario, there would be a background context in which smoking lowers the probability of lung cancer: thus smoking would not be a cause of lung cancer according to the contextual-unanimity requirement. Nonetheless, it seems unlikely that the discovery of such a gene would lead us to abandon the claim that smoking causes lung cancer.
Dupré suggests instead that we should deem C to be a cause of E if it raises the probability of E in a ‘fair sample’—a sample that is representative of the population as a whole. Mathematically, this amounts to the requirement that
where B ranges over the relevant background contexts. This is the same as requiring that C must raise the probability of E in a weighted average of background contexts, where each background context is weighted by the product of \(\PP(B)\) and the absolute value of
\[\PP(E \mid C \amp B) - \PP(E \mid {\nsim}C \amp B).\] Dupré’s account surely comes closer to capturing our ordinary use of causal language. Indeed, the inequality in Dupré is what one looks for in randomized trials. If one randomly determines which members of a population receive a treatment (C) and which do not \(({\nsim}C)\), then the distribution of background conditions B ought to be the same in both groups, and ought to reflect the frequency of these conditions in the population. Thus we would expect the frequency of E to be higher in the treatment group just in case inequality Dupré holds.
On the other hand, Eells’s population-relative formulation allows us to make more precise causal claims: in the population as a whole, smoking is a mixed cause of lung cancer; in the sub-population of individuals who lack the protective gene, smoking is a positive cause of lung cancer; in the sub-population consisting of individuals who possess the gene, smoking is a negative cause of lung cancer.
In any event, this debate does not really seem to be about the metaphysics of causation. As we saw in the previous section, causal relevance is really the basic metaphysical concept. The dispute between Dupré and Eells is really a debate about how best to use the word ‘cause’ to pick out a particular species of causal relevance. Dupré’s proposed usage will count as (positive) causes many things that will be mixed causes in Eells’s proposed usage. But there does not seem to be any underlying disagreement about which factors are causally relevant. (For defense of a similar position, see Twardy and Korb 2004.)
The program described in this section did much to illuminate the relationship between causation and probability. In particular, it helped us to better understand the way in which causal structure can give rise to probabilistic relations of screening off. However, despite the mathematical framework of the program, and points of contact with statistics and experimental methodology, this program did not give rise to any new computational tools, or suggest any new methods for detecting causal relationships. For this reason, the program has largely been supplanted by the causal modeling tools described in the next section.
The main works surveyed in this section are Reichenbach 1956, Suppes 1970, Cartwright 1979, Skyrms 1980, and Eells 1991. Williamson 2009 and Hitchcock 2016 are two further surveys that cover a number of the topics discussed in this section. The entries for Hans Reichenbach and Reichenbach’s Common Cause Principle include discussions of Reichenbach’s program and the status of his Common Cause Principle. Salmon (1984) contains an extensive discussion of conjunctive forks. The entry for Simpson’s paradox contains further discussion of some of the issues raised in Section 2.4.
The discussion of the previous section conveys some of the complexity of the problem of inferring causal relationships from probabilistic correlations. Fairly recently, a number of techniques have been developed for representing systems of causal relationships, and for inferring causal relationships from probabilities. The name ‘causal modeling’ is often used to describe the new interdisciplinary field devoted to the study of methods of causal inference. This field includes contributions from statistics, artificial intelligence, philosophy, econometrics, epidemiology, and other disciplines. Within this field, the research programs that have attracted the greatest philosophical interest are those of the computer scientist Judea Pearl and his collaborators, and of the philosophers Peter Spirtes, Clark Glymour, and Richard Scheines (SGS) and their collaborators. The most significant works of these authors are Pearl (2009) (first published in 2000), and Spirtes et al. (2000) (first published in 1993).
Every causal model involves a set of variables \(\bV\). The variables in \(\bV\) may include, for example, the education level, income, and occupation of an individual. A variable could be binary, its values representing the occurrence or non-occurrence of some event, or the instantiation or non-instantiation of some property. But as the example of income suggests, a variable could have multiple values or even be continuous.
A probabilistic causal model also includes a probability measure P. P is defined over propositions of the form \(X = x\), where X is a variable in \(\bV\) and x is a value in the range of X. P is also defined over conjunctions, disjunctions, and negations of such propositions. It follows that conditional probabilities over such propositions will be well-defined whenever the event conditioned on has positive probability. P is usually understood to represent some kind of objective probability.
Causal relationships among the variables in \(\bV\) are represented by graphs. We will consider two types of graphs. The first is the directed acyclic graph (DAG). A directed graph \(\bG\) on variable set \(\bV\) is a set of ordered pairs of variables in \(\bV\). We represent this visually by drawing an arrow from X to Y just in case \(\langle X, Y\rangle\) is in \(\bG\). Figure 3 shows a directed graph on variable set \(\bV = \{S, T, W, X, Y, Z\}\).
Figure 3
A path in a directed graph is a non-repeating sequence of arrows that have endpoints in common. For example, there is a path from X to Z, which we can write as \(X \leftarrow T \rightarrow Y \rightarrow Z\). A directed path is a path in which all the arrows align by meeting tip-to-tail; for example, there is a directed path \(S \rightarrow T \rightarrow Y \rightarrow Z\). A directed graph is acyclic, and hence a DAG, if there is no directed path from a variable to itself. The graph in Figure 3 is a DAG.
The relationships in the graph are often described using the language of genealogy. The variable X is a parent of Y just in case there is an arrow directed from X to Y. \(\PA(Y)\) will denote the set of all parents of Y. In Figure 3, \(\PA(Y) = \{T, W\}\). X is an ancestor of Y (and Y is a descendant of X) just in case there is a directed path from X to Y. However, it will be convenient to deviate slightly from the genealogical analogy and define ‘descendant’ so that every variable is also a descendant of itself. \(\DE(X)\) denotes the set of all descendants of X. In Figure 3, \(\DE(T) = \{T, X, Y, Z\}\).
An arrow from Y to Z in a DAG represents that Y is a direct cause of Z. Roughly, this means that the value of Y makes some causal difference for the value of Z, and that Y influences Z through some process that is not mediated by any other variable in \(\bV\). Directness is relative to a variable set. We will call the system of direct causal relations represented in a DAG such as Figure 3 the causal structure on the variable set \(\bV\).
A second type of graph that we will consider is the acyclic directed mixed graph (ADMG). An ADMG will contain double-headed arrows, as well as single-headed arrows. A double-headed arrow represents a latent common cause. A latent common cause of variables X and Y is a common cause that is not included in the variable set \(\bV\). For example, suppose that X and Y share a common cause L (Figure 4(a)). An ADMG on the variable set \(\bV = \{X, Y\}\) will look like Figure 4(b).
| (a) | (b) |
Figure 4
We only need to represent missing common causes in this way when they are closest common causes. That is, a graph on \(\bV\) should contain a double-headed arrow between X and Y when there is a variable L that is omitted from \(\bV\), such that if L were added to \(\bV\) it would be a direct cause of X and Y. Double-headed arrows do not give rise to “genealogical” relationships: in Figure 4(b), X is not a parent, ancestor, or descendant of Y.
In an ADMG, we expand the definition of a path to include double-headed arrows. Thus, \(X \leftrightarrow Y\) is a path in the ADMG shown in Figure 4(b). Directed path retains the same meaning, and a directed path cannot contain double-headed arrows. An ADMG cannot include a directed path from a variable to itself.
We will adopt the convention that both DAGs and ADMGs represent the presence and absence of both direct causal relationships and latent common causes. For example, the DAG in Figure 3 represents that T is a direct cause of Y, that T is not a direct cause of Z, and that there are no latent common causes of any variables.
We will be interested in a variety of problems that have a general structure. There will be a query concerning some causal feature of the system being investigated. A query may concern:
A given problem will also have a set of inputs. These fall into a variety of categories:
In realistic scientific cases, we never directly observe the true probability distribution P over a set of variables. Rather, we observe finite data that approximate the true probability when sample sizes are large enough and observation protocols are well-designed. Since our primary concern is with the philosophical issue of how probabilities determine or constrain causal structure, we will not address these important practical concerns. An answer to a query that can be determined from the true probabilities is said to be identifiable. For instance, if we can determine the correct DAG on a variable set \(\bV\) from the probability distribution on \(\bV\), the DAG is identifiable.
The most important principle connecting the causal structure on \(\bV\), as represented in a graph \(\bG\), and the probability distribution P on \(\bV\) is the Markov Condition (MC). Let us first consider the case where \(\bG\) is a DAG. Then P satisfies the Markov Condition (MC) relative to \(\bG\) if and only if it satisfies these three conditions:
| (MCScreening_off) | For every variable X in \(\bV\), and every set of variables \(\bY \subseteq \bV \setminus \DE(X)\), \(\PP(X \mid \PA(X) \amp \bY) = \PP(X \mid \PA(X))\). |
| (MCFactorization) | Let \(\bV = \{X_1, X_2 , \ldots ,X_n\}\). Then \(\PP(X_1, X_2 , \ldots ,X_n) = \prod_i \PP(X_i \mid \PA(X_i))\). |
| (MCd-separation) | Let \(X, Y \in \bV, \bZ \subseteq \bV \setminus \{X, Y\}\). Then \(\PP(X, Y \mid \bZ) = \PP(X \mid \bZ) \times \PP(Y \mid \bZ)\) if \(\bZ\) d-separates X and Y in \(\bG\) (explained below). |
These three conditions are equivalent when \(\bG\) is a DAG.
Let us take some time to explain each of these formulations.
MCScreening_off says that the parents of variable X screen X off from all other variables, except for the descendants of X. Given the values of the variables that are parents of X, the values of the variables in \(\bY\) (which includes no descendants of \(X\)) make no further difference to the probability that X will take on any given value.
MCFactorization tells us that once we know the conditional probability distribution of each variable given its parents, \(\PP(X_i \mid \PA(X_i))\), we can compute the complete joint distribution over all of the variables. This captures Reichenbach’s idea that probability relations between variables that are not related as cause and effect are nonetheless derived from probability relations between causes and effects.
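As an illustration of MCFactorization, the following sketch (with invented conditional probabilities) computes the joint distribution for a simple chain \(X \rightarrow Y \rightarrow Z\) as the product \(\PP(X)\PP(Y \mid X)\PP(Z \mid Y)\), and checks that the screening off required by MCScreening_off falls out:

```python
# Joint distribution for the chain X -> Y -> Z via MC_Factorization.
# All numerical probabilities are invented for the example.
from itertools import product

P_X = {0: 0.4, 1: 0.6}
P_Y_given_X = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}    # keys (y, x)
P_Z_given_Y = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.25, (1, 1): 0.75}  # keys (z, y)

joint = {}
for x, y, z in product([0, 1], repeat=3):
    joint[(x, y, z)] = P_X[x] * P_Y_given_X[(y, x)] * P_Z_given_Y[(z, y)]

# The factorized joint is a genuine probability distribution:
print(round(sum(joint.values()), 10))  # 1.0

# Screening off: P(Z=1 | X=x, Y=1) does not depend on x.
def cond_z(x, y):
    return joint[(x, y, 1)] / (joint[(x, y, 0)] + joint[(x, y, 1)])

print(round(cond_z(0, 1), 6), round(cond_z(1, 1), 6))  # 0.75 0.75
```
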
MCd-separation uses the graphical notion of d-separation, introduced by Pearl (1988). Let \(X, Y \in \bV, \bZ \subseteq \bV \setminus \{X, Y\}\). As noted above, a path from X to Y is a sequence of variables \(\langle X = X_1 , \ldots ,X_k = Y\rangle\) such that for each \(X_i\), \(X_{i+1}\), there is either an arrow from \(X_i\) to \(X_{i+1}\) or an arrow from \(X_{i+1}\) to \(X_i\) in \(\bG\). A variable \(X_i , 1 \lt i \lt k\) is a collider on the path just in case there is an arrow from \(X_{i-1}\) to \(X_i\) and from \(X_{i+1}\) to \(X_i\). That is, \(X_i\) is a collider on a path just in case two arrows converge on \(X_i\) in the path. \(\bZ\) d-separates X and Y just in case every path \(\langle X = X_1 , \ldots ,X_k = Y\rangle\) from X to Y contains at least one variable \(X_i\) such that either: (i) \(X_i\) is a collider, and no descendant of \(X_i\) (including \(X_i\) itself) is in \(\bZ\); or (ii) \(X_i\) is not a collider, and \(X_i\) is in \(\bZ\). MCd-separation states that d-separation is sufficient for conditional independence.
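The d-separation test just defined can be implemented directly. The sketch below (illustrative code, written for clarity rather than efficiency) enumerates the non-repeating paths between two variables and checks blocking conditions (i) and (ii):

```python
# A compact d-separation test for DAGs, following the definition in the text:
# Z d-separates X and Y iff every path contains a non-collider in Z, or a
# collider none of whose descendants (itself included) is in Z.
def descendants(G, x):
    seen, frontier = {x}, [x]
    while frontier:
        a = frontier.pop()
        for (p, q) in G:
            if p == a and q not in seen:
                seen.add(q)
                frontier.append(q)
    return seen

def paths(G, x, y, path=None):
    """All non-repeating paths from x to y in the skeleton of G."""
    path = path or [x]
    if x == y:
        yield path
        return
    nbrs = {q for (p, q) in G if p == x} | {p for (p, q) in G if q == x}
    for n in nbrs - set(path):
        yield from paths(G, n, y, path + [n])

def d_separated(G, x, y, Z):
    Z = set(Z)
    for path in paths(G, x, y):
        blocked = False
        for i in range(1, len(path) - 1):
            a, b, c = path[i - 1], path[i], path[i + 1]
            collider = (a, b) in G and (c, b) in G
            if collider and not (descendants(G, b) & Z):
                blocked = True      # condition (i)
            if not collider and b in Z:
                blocked = True      # condition (ii)
        if not blocked:
            return False
    return True

chain = {("X", "Y"), ("Y", "Z")}
collider = {("X", "Y"), ("Z", "Y")}
print(d_separated(chain, "X", "Z", set()))     # False: unconditionally dependent
print(d_separated(chain, "X", "Z", {"Y"}))     # True: Y screens off
print(d_separated(collider, "X", "Z", set()))  # True: colliders block paths
print(d_separated(collider, "X", "Z", {"Y"}))  # False: conditioning on a collider unblocks
```
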
Note that MC provides sufficient conditions for variables to be probabilistically independent, conditional on others, but no necessary conditions. The Markov Condition entails many of the same screening off relations as Reichenbach’s Common Cause Principle, discussed in Section 2.3 above. Here are some examples:
Figure 5
In Figure 5, MC implies that X screens Y off from all of the other variables, and that W screens Z off from all of the other variables. This is most easily seen from MCScreening_off. W also screens T off from all of the other variables, which is most easily seen from MCd-separation. MC does not imply that T screens Y off from Z (or indeed anything from anything). While Y and Z do have a common cause that screens them off (W), not all common causes screen them off (T does not have to), and not everything that screens them off is a common cause (X screens them off but is not a common cause).
Figure 6
In Figure 6, MC entails that X and Y will be unconditionally independent, but not that they will be independent conditional on Z. This is most easily seen from MCd-separation.
MC is not expected to hold for arbitrary sets of variables \(\bV\), even when the graph \(\bG\) accurately represents the causal relations among those variables. For example, MC will typically fail in the following kinds of case:
If there are latent common causes, we expect MCScreening_off and MCFactorization to fail if we apply them in a naïve way. For example, suppose that the true causal structure on \(\bV = \{X, Y, Z\}\) is shown by the ADMG in Figure 7.
Figure 7
Y is the only parent of Z shown in the graph, and if we try to apply MCScreening_off, it tells us that Y should screen X off from Z. However, we would expect X and Z to be correlated, even when we condition on Y, due to the latent common cause. The problem is that the graph is missing a relevant parent of Z, namely the omitted common cause. However, suppose that the probability distribution is such that if the latent cause L were added, the probability distribution over the expanded set of variables would satisfy MC with respect to the resulting DAG. Then it turns out that the probability distribution will still satisfy MCd-separation with respect to the ADMG of Figure 7. This requires us to expand the definition of d-separation to include paths with double-headed arrows. For instance, Z is a collider on the path \(Y \rightarrow Z \leftrightarrow X\) (since Z has two arrows pointing into it), but X is not a collider on the path \(Y \leftarrow X \leftrightarrow Z\). Thus we will say that a probability distribution P satisfies the Markov Condition relative to an ADMG just in case it satisfies MCd-separation.
Both SGS 2000 and Pearl 2009 contain statements of a principle called the Causal Markov Condition (CMC), but they mean different things. In Pearl’s formulation, CMC is just a statement of a mathematical theorem: Pearl and Verma (1991) prove that if each variable in \(\bV\) is a deterministic product of its parents in \(\bV\), together with an error term, and the errors are probabilistically independent of each other, then the probability distribution on \(\bV\) will satisfy MC with respect to the DAG \(\bG\). Pearl interprets this result in the following way: Macroscopic systems, he believes, are deterministic. In practice, however, we never have access to all of the causally relevant variables affecting a macroscopic system. But if we include enough variables in our model so that the excluded variables are probabilistically independent of one another, then our model will satisfy the MC, and we will have a powerful set of analytic tools for studying the system. Thus MC characterizes a point at which we have constructed a useful approximation of the complete system.
In SGS 2000, the CMC has more the status of an empirical posit. If \(\bV\) is a set of macroscopic variables that are well-chosen, meaning that they are free from the sorts of defects described in points (ii) and (iii) above; \(\bG\) is a graph representing the causal structure on \(\bV\); and P is the objective probability distribution resulting from this causal structure; then P can be expected to satisfy MC relative to \(\bG\). More precisely, P will satisfy all three versions of MC if \(\bG\) is a directed acyclic graph, and P will satisfy MCd-separation if \(\bG\) is an ADMG with double-headed arrows. SGS defend this empirical posit in two different ways:
Cartwright (1993, 2007: chapter 8) has argued that MC need not hold for genuinely indeterministic systems. Hausman and Woodward (1999, 2004) attempt to defend MC for indeterministic systems.
A causal model that comprises a DAG and a probability distribution that satisfies MC is called a causal Bayes net (CBN). A causal model incorporating an ADMG and a probability distribution satisfying MCd-separation is called a semi-Markov causal model (SMCM).
The MC states a sufficient condition but not a necessary condition for conditional probabilistic independence. As such, the MC by itself can never entail that two variables are conditionally or unconditionally dependent. The Minimality and Faithfulness Conditions are two principles that posit necessary conditions for probabilistic independence. The terminology comes from Spirtes et al. (2000). Pearl provides analogous conditions with different terminology.
(i) The Minimality Condition. Suppose that the acyclic directed graph \(\bG\) on variable set \(\bV\) satisfies MC with respect to the probability distribution P. The Minimality Condition asserts that no sub-graph of \(\bG\) over \(\bV\) also satisfies the Markov Condition with respect to P. (A subgraph of \(\bG\) is a graph over \(\bV\) that results from removing arrows from \(\bG\).) As an illustration, consider the variable set \(\{X, Y\}\), let there be an arrow from X to Y, and suppose that X and Y are probabilistically independent of each other according to probability function P. This graph would satisfy the MC with respect to P: none of the independence relations mandated by the MC are absent (in fact, the MC mandates no independence relations). But this graph would violate the Minimality Condition with respect to P, since the subgraph that omits the arrow from X to Y would also satisfy the MC. The Minimality Condition implies that if there is an arrow from X to Y, then X makes a probabilistic difference for Y, conditional on the other parents of Y. In other words, if \(\bZ = \PA(Y) \setminus \{X\}\), there exist \(\bz\), y, x, \(x'\) such that
\[\PP(Y = y \mid X = x \amp \bZ = \bz) \ne \PP(Y = y \mid X = x' \amp \bZ = \bz).\]
(ii) The Faithfulness Condition. The Faithfulness Condition says that all of the (conditional and unconditional) probabilistic independencies that exist among the variables in \(\bV\) are required by the MC. For example, suppose that \(\bV = \{X, Y, Z\}\). Suppose also that X and Y are unconditionally independent of one another, but dependent, conditional upon Z. (The other two variable pairs are dependent, both conditionally and unconditionally.) The graph shown in Figure 8 does not satisfy the Faithfulness Condition with respect to this distribution (colloquially, the graph is not faithful to the distribution). MC, when applied to the graph of Figure 8, does not imply the independence of X and Y. By contrast, the graph shown in Figure 6 above is faithful to the described distribution. Note that Figure 8 does satisfy the Minimality Condition with respect to the distribution; no subgraph satisfies MC with respect to the described distribution. In fact, the Faithfulness Condition is strictly stronger than the Minimality Condition.
Figure 8
The Faithfulness Condition implies that the causal influences of one variable on another along multiple causal routes do not ‘cancel’. In Figure 8, X influences Y along two different directed paths. If the effect of one path is to exactly undo the influence along the other path, then X and Y will be probabilistically independent. The Faithfulness Condition forbids such exact cancellation. This ‘no canceling’ condition seems implausible as a metaphysical or conceptual constraint upon the connection between causation and probabilities. For example, if one gene codes for the production of a particular protein, and suppresses another gene that codes for the same protein, the operation of the first gene will be independent of the presence of the protein. Cartwright (2007: chapter 6) and Andersen (2013) argue that violations of faithfulness are widespread.
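The possibility of cancellation is easy to exhibit numerically. In the following sketch (a linear-Gaussian model invented for the illustration), X affects Y directly and via Z, with the two coefficients chosen so that the routes exactly cancel:

```python
# A faithfulness violation by exact cancellation: X -> Z -> Y with weight +1,
# and X -> Y with weight -1, so Y = -x + (x + noise) + noise carries no
# information about X. The model and its coefficients are invented.
import random
random.seed(0)

n = 200_000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    z = x + random.gauss(0, 1)                    # X -> Z
    y = -1.0 * x + 1.0 * z + random.gauss(0, 1)   # the two routes cancel
    xs.append(x)
    ys.append(y)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
print(abs(cov) < 0.02)  # True: X and Y look independent despite two causal paths
```
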
The Faithfulness Condition is a methodological principle rather than a metaphysical principle. Given a distribution on \(\{X, Y, Z\}\) in which X and Y are independent, we should infer that the causal structure is that depicted in Figure 6, rather than Figure 8. This is not because Figure 8 is conclusively ruled out by the distribution, but rather because it is preferable to postulate a causal structure that implies the independence of X and Y rather than one that is merely consistent with independence.
The original hope of Reichenbach and Suppes was to provide a reduction of causation to probabilities. To what extent has this hope been realized within the causal modeling framework? Causal modeling does not offer a reduction in the traditional philosophical sense; that is, it does not offer an analysis of the form ‘X causes Y if and only if…’ where the right hand side of the biconditional makes no reference to causation. Instead, it offers a series of postulates about how causal structure constrains the values of probabilities. Still, if we have a set of variables \(\bV\) and a probability distribution P on \(\bV\), we may ask if P suffices to pick out a unique causal graph \(\bG\) on \(\bV\).
Pearl (1988: Chapter 3) proves the following theorem:
If
then it will be possible to uniquely identify \(\bG\) on the basis of P.
In many ways, this result successfully executes the sort of project described in Section 2 above. That is, making the same sorts of assumptions about time-indexing, and substantive assumptions about the connection between probability and causation, it establishes that it is possible to identify causal structure using probabilities.
If we don’t have information about time ordering, or other substantive assumptions restricting the possible causal structures among the variables in \(\bV\), then it will not always be possible to identify the causal structure from probability alone. In general, given a probability distribution P on \(\bV\), it is only possible to identify a Markov equivalence class of causal structures. This will be the set of all DAGs on \(\bV\) that (together with MC) imply all and only the conditional independence relations contained in P. The PC algorithm (SGS 2000: 84–85), named for its two creators (Peter Spirtes and Clark Glymour), is one algorithm that generates the Markov equivalence class for any given probability distribution.
Consider two simple examples involving three variables \(\{X, Y, Z\}\). Suppose our probability distribution has the following properties:
Then the Markov equivalence class is:
\[\begin{align}X \rightarrow Y \rightarrow Z\\X \leftarrow Y \leftarrow Z\\X \leftarrow Y \rightarrow Z\end{align}\]
We cannot determine from the probability distribution, together with MC and Faithfulness, which of these structures is correct.
On the other hand, suppose the probability distribution is as follows:
Then the Markov equivalence class is:
\[X \rightarrow Y \leftarrow Z\]
Note that the first probability distribution on \(\{X, Y, Z\}\) is that characterized by Reichenbach’s Common Cause Principle. The second distribution reverses the relations between X and Z: they are unconditionally independent and conditionally dependent. Contrary to Reichenbach, it is actually the latter pattern of dependence relations that is most useful for orienting the causal arrows in the graph. In the last causal structure shown, Y is a collider on the path from X to Z. MCd-separation implies that colliders give rise to distinctive conditional independence relations, while all three types of non-collider give rise to the same conditional independence relations. Many of the algorithms that have been developed for inferring causal structure from probabilities work by searching for colliders (see, e.g., SGS 2000: Chapter 5).
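The distinctive probabilistic signature of a collider can be seen in a small worked example (all numbers invented): with \(X \rightarrow Y \leftarrow Z\), X and Z are independent by construction, yet conditioning on the collider Y makes them dependent. Learning that one cause occurred ‘explains away’ the other:

```python
# A worked collider example, X -> Y <- Z, with invented probabilities in
# which Y tends to occur when either parent does.
from itertools import product

P_X = {0: 0.5, 1: 0.5}
P_Z = {0: 0.5, 1: 0.5}
P_Y1 = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}  # P(Y=1 | x, z)

joint = {}
for x, z, y in product([0, 1], repeat=3):
    py1 = P_Y1[(x, z)]
    joint[(x, z, y)] = P_X[x] * P_Z[z] * (py1 if y == 1 else 1 - py1)

def p_x1_given(y, z):
    """P(X=1 | Y=y, Z=z), computed from the joint."""
    return joint[(1, z, y)] / (joint[(0, z, y)] + joint[(1, z, y)])

# Conditional on Y = 1, learning that Z = 1 makes X = 1 less likely:
print(round(p_x1_given(1, 0), 3), round(p_x1_given(1, 1), 3))  # 0.889 0.543
```
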
The identifiability results discussed so far all assume that the correct causal graph is a DAG. However, it is common that latent variables will be present, and even more common that we might wish to allow for the possibility of latent variables (whether they are actually there or not). If we allow that the correct causal graph may contain double-headed arrows, we can still apply MCd-separation, and ask which graphs imply the same sets of conditional independence relations. The Markov equivalence class will be larger than it was when we did not allow for latent variables. For instance, given the last set of probability relations described above, the graph
\[X \rightarrow Y \leftarrow Z\]
is no longer the only one compatible with this distribution. The structure
\[X \leftrightarrow Y \leftrightarrow Z\]
is also possible, as are several others.
A conditional probability such as \(\PP(Y = y \mid X = x)\) gives us the probability that Y will take the value y, given that X has been observed to take the value x. Often, however, we are interested in predicting the value of Y that will result if we intervene to set the value of X equal to some particular value x. Pearl writes \(\PP(Y = y \mid \do(X = x))\) to characterize this probability. What is the difference between observation and intervention? When we merely observe the value that a variable takes, we are learning about the value of the variable when it is caused in the normal way, as represented in our causal model. Information about the value of the variable will also provide us with information about its causes, and about other effects of those causes. However, when we intervene, we override the normal causal structure, forcing a variable to take a value it might not have taken if the system were left alone. The value of the variable is determined completely by our intervention, the causal influence of the other variables being completely overridden. Graphically, we can represent the effect of this intervention by eliminating the arrows directed into the variables intervened upon. Such an intervention is sometimes described as ‘breaking’ those arrows.
A causal model can be used to predict the effects of such an intervention. Suppose we have a causal model in which the probability distribution P satisfies MC on the causal DAG \(\bG\) over the variable set \(\bV = \{X_1, X_2 ,\ldots ,X_n\}\). The most useful version of MC for thinking about interventions is MCFactorization (see Section 3.3), which tells us:
\[\PP(X_1, X_2 , \ldots ,X_n) = \prod_i \PP(X_i \mid \PA(X_i))\]
Now suppose that we intervene by setting the value of \(X_k\) to \(x_k\). The post-intervention probability \(\PP'\) is the result of altering the factorization as follows:
\[\PP'(X_1, X_2 , \ldots ,X_n) = \PP'(X_k) \times \prod_{i\ne k} \PP(X_i \mid \PA(X_i)),\]
where \(\PP'(X_k = x_k) = 1\). The conditional probabilities of the form \(\PP(X_i \mid \PA(X_i))\) for \(i \ne k\) remain unchanged by the intervention.
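The truncated factorization can be put to work in a few lines. The sketch below (all numbers invented) uses a model in which Z confounds X and Y (\(Z \rightarrow X\), \(Z \rightarrow Y\), \(X \rightarrow Y\)), and shows that the interventional probability \(\PP(Y = 1 \mid \do(X = 1))\) differs from the observational \(\PP(Y = 1 \mid X = 1)\):

```python
# Truncated factorization for Z -> X, Z -> Y, X -> Y: intervening on X
# replaces the factor P(X|Z) with a point mass, so Z is marginalized with
# its unconditional weights. Invented probabilities throughout.
P_Z = {0: 0.5, 1: 0.5}
P_X1_given_Z = {0: 0.2, 1: 0.8}
P_Y1_given_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # keys (x, z)

def p_y1_do_x(x):
    """P'(Y=1 | do(X=x)) = sum_z P(z) P(Y=1 | x, z)."""
    return sum(P_Z[z] * P_Y1_given_XZ[(x, z)] for z in (0, 1))

def p_y1_given_x(x):
    """Observational P(Y=1 | X=x), which also carries information about Z."""
    px = lambda z: P_X1_given_Z[z] if x == 1 else 1 - P_X1_given_Z[z]
    num = sum(P_Z[z] * px(z) * P_Y1_given_XZ[(x, z)] for z in (0, 1))
    den = sum(P_Z[z] * px(z) for z in (0, 1))
    return num / den

print(round(p_y1_do_x(1), 3), round(p_y1_given_x(1), 3))  # 0.65 0.8
```

Observing \(X = 1\) is evidence that \(Z = 1\), which independently promotes Y; the intervention breaks that back-door route, so the two probabilities come apart.
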
This treatment of interventions has been expanded in a number of directions. The ‘manipulation theorem’ (theorem 3.6 of SGS 2000) generalizes the formula to cover a much broader class of interventions, including ones that don’t break all the arrows into the variables that are intervened on. Pearl (2009: Chapter 3) develops an axiomatic system he calls the ‘do-calculus’ for computing post-intervention probabilities that can be applied to systems with latent variables.
Causal modeling is a burgeoning area of research. This entry has largely ignored work on computational methods, as well as applications of the tools discussed here. Rather, the focus has been on the conceptual underpinnings of recent programs in causal modeling, with special attention to the connection between causation and probability. It has also focused on what it is possible to learn about causation “in principle” on the basis of probabilities, while ignoring the practical problems of making causal inferences on the basis of finite data samples (which inevitably deviate from the true probabilities).
The entry on Causal Models covers all of the material in this section in greater detail. The most important works surveyed in this section are Pearl 2009 and Spirtes, Glymour, & Scheines 2000. Pearl 2010 is a short overview of Pearl’s program, and Pearl et al. 2016 is a longer overview. The latter, in particular, assumes relatively little technical background. Scheines 1997 and the Introduction of Glymour & Cooper 1999 are accessible introductions to the SGS program. Neapolitan 2004 is a textbook that treats Bayes nets in causal and noncausal contexts. Neapolitan & Jiang 2016 is a short overview of this topic. Hausman 1999, Glymour 2009, Hitchcock 2009, and Eberhardt 2017 are short overviews that cover some of the topics raised in this section. The entry on causation and manipulability contains extensive discussion of interventions, and some discussion of causal models.
Many philosophers and legal theorists have been interested in the relation of actual causation. This concerns the assignment of causal responsibility for an event, based on how events actually play out. For example, suppose that Billy and Suzy each throw a rock at a bottle, and that each has a certain probability of hitting and breaking it. As it happens, Suzy’s rock hits the bottle, and Billy’s doesn’t. As things actually happened, we would say that Suzy’s throw caused the bottle to shatter, while Billy’s didn’t. Nonetheless, Billy’s throw increased the probability that the bottle would shatter, and it would be identified as a cause by the theories described in sections 2 and 3. Billy’s throw had a tendency to shatter the bottle; it was a potential cause of the bottle shattering; it was the sort of thing that generally causes shattering; but it did not actually cause the bottle to shatter.
A number of authors have attempted to provide probabilistic analyses of actual causation. Some, such as Eells (1991: chapter 6), Kvart (1997, 2004), and Glynn (2011), pay careful attention to the way in which probabilities change over time. Some, such as Dowe (2004) and Schaffer (2001), combine probabilities with the resources of a process theory of causation. Some, such as Lewis (1986b), Menzies (1989), and Noordhof (1999), employ probabilities together with counterfactuals to analyze actual causation. And others, such as Beckers & Vennekens (2016), Fenton-Glynn (2017), Halpern (2016: Section 2.5), Hitchcock (2004a), and Twardy & Korb (2011), employ causal modeling tools similar to those described in Section 3. We will describe two of those theories—Lewis (1986b) and Fenton-Glynn (2017)—in more detail in sections 4.3 and 4.4 below.
In Section 2.5 above, we saw that Eells (1991) defines a variety of different ways in which C can be causally relevant for E. C can be a positive, negative, or mixed cause of E depending upon whether C raises, lowers, or leaves unchanged the probability of E in various background conditions \(B_i\). A natural suggestion is that (i) an actual cause of E is a type of positive cause of E; but (ii) for assessing actual causation, only the background condition that actually obtains is relevant. Putting these ideas together, we get:
As we shall see in the next section, this type of analysis is vulnerable to two types of counterexamples: cases where causes seem to lower (or leave unchanged) the probabilities of their effects; and cases where non-causes seem to raise the probabilities of events that are not their effects. Most of the theories mentioned in the previous section can be seen as attempts to improve upon AC1 to deal with these types of counterexample.
Actual causes can sometimes lower the probability of their effects in cases of preemption: Suppose that Billy and Suzy are aiming rocks at a bottle. Billy decides that he will give Suzy the opportunity to throw first; he will throw his rock just in case Suzy doesn’t throw hers. For mathematical convenience, we will assume that there is some small probability—0.1, say—that Billy does not faithfully execute his plan. Billy is a more accurate thrower than Suzy. If Billy throws his rock, there is a 90% chance that it will shatter the bottle; if Suzy throws, she has a 50% chance of success. Suzy throws her rock and Billy doesn’t; Suzy’s rock hits the bottle and smashes it. By throwing, Suzy lowered the probability of shattering from 81% (the probability that Billy would both throw and hit if Suzy hadn’t thrown) to 54.5% (accommodating the small probability that Billy will throw even if Suzy throws). Suzy’s throw preempts Billy’s throw: she prevents Billy from throwing, and substitutes her own, less reliable throw. Nonetheless, Suzy’s throw actually caused the bottle to shatter.
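The probabilities in this example can be checked directly (a small sketch of the arithmetic, using the numbers given in the text):

```python
# Preemption arithmetic: Billy throws with probability 0.9 if Suzy doesn't
# (and 0.1 if she does, failing to execute his plan), and hits with
# probability 0.9; Suzy hits with probability 0.5.
p_billy_hits = 0.9

# If Suzy doesn't throw: Billy throws (0.9) and hits (0.9).
p_shatter_no_suzy = 0.9 * p_billy_hits

# If Suzy throws: the bottle survives only if she misses (0.5) and Billy's
# unplanned throw (0.1) fails to hit.
p_shatter_suzy = 1 - (1 - 0.5) * (1 - 0.1 * p_billy_hits)

print(round(p_shatter_no_suzy, 3))  # 0.81
print(round(p_shatter_suzy, 3))     # 0.545
```

So Suzy's throw lowers the probability of shattering from 81% to 54.5%, exactly as the text reports.
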
Changing the example slightly gives us a case of a probability-raising non-cause. Suppose that Billy and Suzy throw their rocks simultaneously. As it happens, Suzy’s throw hits the bottle and Billy’s misses. Nonetheless, Billy’s throw increased the probability that the bottle would shatter from 50% (the probability that Suzy would hit) to 95% (the probability that at least one of them would hit). But Billy’s throw did not in fact cause the bottle to shatter. In the terminology of Schaffer (2001), Billy’s throw is a fizzler. It had the potential to shatter the bottle, but it fizzled out, and something else actually caused the bottle to break.
David Lewis is the best-known advocate of a counterfactual theory of causation. In Lewis 1973, he offered a counterfactual theory of causation under the assumption of determinism. Lewis 1986b presented a probabilistic extension to this counterfactual theory of causation.
Lewis defines a relation of causal dependence that is sufficient, but not necessary, for causation.
The counterfactual in (iii) is to be understood in terms of possible worlds: it says that in the nearest possible world(s) where C does not occur, the probability of E is less than or equal to y. (There needn’t be a single value that the probability would have been. It can take on different values in the closest possible worlds, as long as all of those values are less than or equal to y.) On this account, the relevant notion of ‘probability-raising’ is not understood in terms of conditional probabilities, but in terms of unconditional probabilities in different possible worlds.
Lewis defines causation (what we are calling “actual causation”) to be the ancestral of causal dependence; that is:
This definition guarantees that causation will be transitive: if C causes D, and D causes E, then C causes E. This modification is useful for addressing certain types of preemption. Consider the example from the previous section, where Suzy throws her rock, preempting Billy. We can interpolate an event D between Suzy’s throw, C, and the bottle’s shattering, E. Let D be the presence of Suzy’s rock on its actual trajectory, at some time after Billy has already failed to throw. If Suzy hadn’t thrown, D would have been much less likely. And if D hadn’t occurred, E would have been much less probable. Since D occurs after Billy has already declined to throw, if D hadn’t occurred, there would not have been any rock on a trajectory toward the bottle. Thus there is a chain of causal dependence from C to D to E.
Despite this success, it has been widely acknowledged (even by Lewishimself) that Lewis’s probabilistic theory has problems withother types of preemption, and with probability-raisingnon-causes.
Fenton-Glynn (2017) offers an analysis of actual causation that isbased on the definition of Halpern and Pearl (2005), who consider onlythe deterministic case. What follows here is a simplified version ofFenton-Glynn’s proposal, as one example of an analysis employingcausal models.
Let \(\bV\) be a set of time-indexed, binary variables, which weassume to include any common causes of variables in \(\bV\) (so thatthe correct causal graph on \(\bV\) is a DAG). Let \(*\) be an assignmentfunction that assigns to each variable \(X\) in \(\bV\) one of itspossible values. Intuitively, \(*\) identifies theactual valueof each variable. We will denote \(*(X)\) by \(x^*\), and \(x'\)will denote the non-actual value of \(X\). If \(\bX\) is a set ofvariables in \(\bV\), \(\bX\) = \(\bx^*\) will be a proposition stating thateach variable in \(\bX\) takes the actual value assigned by \(*\). Let Pbe a probability function on \(\bV\) representing objectiveprobability, which we assume to satisfy the Markov and MinimalityConditions (Sections3.3 and3.4 above). We also assume that P assigns positive probability to everypossible assignment of values to variables in \(\bV\).
Given the identifiability result described in Section 3.5 above, we can recover the correct causal graph \(\bG\) from the probability function P together with the time-indices of the variables. We can now use P and \(\bG\) to compute the effects of interventions, as described in Section 3.6 above. We now define actual causation as follows:
Intuitively, this is what is going on: if \(X = x^*\) is an actual cause of \(Y = y^*\), then there has to be at least one directed path from \(X\) to \(Y\). \(\bZ\) will consist of variables that lie along some (but not necessarily all) of these paths. (If \(X\) is a direct cause of \(Y\), then \(\bZ\) can be empty.) F-G requires that \(X = x^*\) raises the probability of \(Y = y^*\) in the sense that interventions that set \(X\) to \(x^*\) result in higher probabilities for \(Y = y^*\) than interventions that set \(X\) to \(x'\). Specifically, \(X = x^*\) must raise the probability of \(Y = y^*\) when we also intervene to set the variables in \(\bW\) to their actual values. \(\bW = \bw^*\) is like a background context of the sort discussed in Section 2.4, except that \(\bW\) may include some variables that are descendants of \(X\). Moreover, \(X = x^*\) must raise the probability of \(Y = y^*\) in conjunction with any combination of variables in \(\bZ\) being set to their actual values. The idea is that the probabilistic impact of \(X\) on \(Y\) is constrained to flow through the variables in \(\bZ\), and at every stage in the process, the value of the variables in \(\{X\} \cup \bZ\) must confer a higher probability on \(Y = y^*\) than the baseline probability that would have resulted if \(X\) had been set to \(x'\).
Let’s see how this account handles the problem cases from Section 4.2. For the example of preemption, we will use the following variables:

\(\ST_0\) = 1 if Suzy throws her rock, 0 otherwise;
\(\BT_1\) = 1 if Billy throws his rock, 0 otherwise;
\(\BS_2\) = 1 if the bottle shatters, 0 otherwise.
The subscripts indicate the relative times of the events, with larger numbers corresponding to later times. The actual values of the variables are \(\ST_0 = 1\), \(\BT_1 = 0\), and \(\BS_2 = 1\). The probabilities are:
\[\begin{align}
\PP(\BT_1 = 1 \mid \ST_0 = 1) &{} = .1 \\
\PP(\BT_1 = 1 \mid \ST_0 = 0) &{} = .9 \\[1ex]
\PP(\BS_2 = 1 \mid \ST_0 = 1 \amp \BT_1 = 1) &{} = .95\\
\PP(\BS_2 = 1 \mid \ST_0 = 1 \amp \BT_1 = 0) &{} = .5\\
\PP(\BS_2 = 1 \mid \ST_0 = 0 \amp \BT_1 = 1) &{} = .9\\
\PP(\BS_2 = 1 \mid \ST_0 = 0 \amp \BT_1 = 0) &{} = .01
\end{align}\]
(Note that we have added a small probability for the bottle to shatter due to some other cause, even if neither Suzy nor Billy throws a rock. This ensures that the probabilities of all assignments of values to the variables are positive.) The corresponding graph is shown in Figure 9.
Figure 9
Applying F-G, we can take \(\bW = \{\BT_1\}\) and \(\bZ = \varnothing\). We have:
\[\begin{align}
\PP(\BS_2 = 1 \mid \do(\ST_0 = 1) \amp \do(\BT_1 = 0)) &{} = .5\\
\PP(\BS_2 = 1 \mid \do(\ST_0 = 0) \amp \do(\BT_1 = 0)) &{} = .01
\end{align}\]
Holding fixed that Billy doesn’t throw, Suzy’s throw raises the probability that the bottle will shatter. Thus the conditions are met for \(\ST_0 = 1\) to be an actual cause of \(\BS_2 = 1\).
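Because both parents of \(\BS_2\) are set by intervention here, the post-intervention probabilities coincide with the corresponding entries of the conditional probability table for \(\BS_2\). This can be checked with a few lines of Python (the table and function names are ours):

```python
# Conditional probability table for BS_2 in the preemption example (Figure 9):
# keys are (ST_0, BT_1) value pairs, entries are P(BS_2 = 1 | ST_0, BT_1).
P_BS = {(1, 1): 0.95, (1, 0): 0.5, (0, 1): 0.9, (0, 0): 0.01}

def p_shatter_do(st, bt):
    """P(BS_2 = 1 | do(ST_0 = st), do(BT_1 = bt)).

    Since both parents of BS_2 are fixed by intervention, the interventional
    probability is just the matching conditional probability table entry.
    """
    return P_BS[(st, bt)]

# Holding fixed do(BT_1 = 0), Suzy's throw raises the shattering probability:
print(p_shatter_do(1, 0))  # 0.5
print(p_shatter_do(0, 0))  # 0.01
```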
To treat the case of fizzling from Section 4.2, let

\(\ST_0\) = 1 if Suzy throws her rock, 0 otherwise;
\(\BT_0\) = 1 if Billy throws his rock, 0 otherwise;
\(\SH_1\) = 1 if Suzy’s rock hits the bottle, 0 otherwise;
\(\BH_1\) = 1 if Billy’s rock hits the bottle, 0 otherwise;
\(\BS_2\) = 1 if the bottle shatters, 0 otherwise.
The actual values are \(\ST_0 = 1\), \(\BT_0 = 1\), \(\SH_1 = 1\), \(\BH_1 = 0\), and \(\BS_2 = 1\). The probabilities are:
\[\begin{align}
\PP(\SH_1 = 1 \mid \ST_0 = 1) &{} = .5\\
\PP(\SH_1 = 1 \mid \ST_0 = 0) &{} = .01\\[2ex]
\PP(\BH_1 = 1 \mid \BT_0 = 1) &{} = .9\\
\PP(\BH_1 = 1 \mid \BT_0 = 0) &{} = .01\\[2ex]
\PP(\BS_2 = 1 \mid \SH_1 = 1 \amp \BH_1 = 1) & {} = .998 \\
\PP(\BS_2 = 1 \mid \SH_1 = 1 \amp \BH_1 = 0) & {} = .95\\
\PP(\BS_2 = 1 \mid \SH_1 = 0 \amp \BH_1 = 1) & {} = .95 \\
\PP(\BS_2 = 1 \mid \SH_1 = 0 \amp \BH_1 = 0) & {} = .01
\end{align}\]
As before, we have assigned probabilities close to, but not equal to, zero and one for some of the possibilities. The graph is shown in Figure 10.
Figure 10
We want to show that \(\BT_0 = 1\) is not an actual cause of \(\BS_2 = 1\) according to F-G. We will show this by means of a dilemma: is \(\BH_1 \in \bW\) or is \(\BH_1 \in \bZ\)?
Suppose first that \(\BH_1 \in \bW\). Then, regardless of whether \(\ST_0\) and \(\SH_1\) are in \(\bW\) or \(\bZ\), we will need to have
\[\begin{align}
\PP(\BS_2 = 1 &\mid \do(\BT_0 = 1, \BH_1 = 0, \ST_0 = 1, \SH_1 = 1))\\
\mathbin{\gt} &\PP(\BS_2 = 1 \mid \do(\BT_0 = 0, \BH_1 = 0, \ST_0 = 1, \SH_1 = 1))
\end{align}\]
But in fact both of these probabilities are equal to .95. If we intervene to set \(\BH_1\) to 0, intervening on \(\BT_0\) makes no difference to the probability of \(\BS_2 = 1\).
So let us suppose instead that \(\BH_1 \in \bZ\). Then we will need to have
\[\begin{align}
\PP(\BS_2 = 1 &\mid \do(\BT_0 = 1, \BH_1 = 0, \ST_0 = 1, \SH_1 = 1))\\
\mathbin{\gt} &\PP(\BS_2 = 1 \mid \do(\BT_0 = 0, \ST_0 = 1, \SH_1 = 1))
\end{align}\]
This inequality is slightly different, since \(\BH_1 = 0\) does not appear in the second probability. Nonetheless we have
\[\PP(\BS_2 = 1 \mid \do(\BT_0 = 1, \BH_1 = 0, \ST_0 = 1, \SH_1 = 1)) = .95\]
and
\[\PP(\BS_2 = 1 \mid \do(\BT_0 = 0, \ST_0 = 1, \SH_1 = 1)) = .9505\]
(The second probability is a tiny bit larger, due to the very small probability that Billy’s rock will hit even if he doesn’t throw it.)
So regardless of whether \(\BH_1 \in \bW\) or \(\BH_1 \in \bZ\), condition F-G is not satisfied, and \(\BT_0 = 1\) is not judged to be an actual cause of \(\BS_2 = 1\). The key idea is that it is not enough for Billy’s throw to raise the probability of the bottle shattering; Billy’s throw together with what happens afterwards has to raise the probability of shattering. As things actually happened, Billy’s rock missed the bottle. Billy’s throw together with his rock missing does not raise the probability of shattering.
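Both horns of the dilemma can be checked numerically from the probability tables above. In the first horn \(\BH_1\) is set by intervention along with the other variables; in the second, \(\BH_1\) inherits its distribution from the intervention on \(\BT_0\) and is summed out. A sketch in Python (the table and function names are ours):

```python
# Tables from the fizzling example (Figure 10):
P_BH = {1: 0.9, 0: 0.01}                        # P(BH_1 = 1 | BT_0)
P_BS = {(1, 1): 0.998, (1, 0): 0.95,
        (0, 1): 0.95, (0, 0): 0.01}             # P(BS_2 = 1 | SH_1, BH_1)

def p_shatter(bt, sh, bh=None):
    """P(BS_2 = 1) under do(BT_0 = bt, SH_1 = sh), optionally also do(BH_1 = bh).

    Every parent of BS_2 is either intervened on directly or (in BH_1's case)
    has all of its own parents intervened on, so the interventional
    probability reduces to a sum over the conditional probability tables.
    """
    if bh is not None:                          # BH_1 is also set by intervention
        return P_BS[(sh, bh)]
    p_bh1 = P_BH[bt]                            # BH_1 inherited from do(BT_0 = bt)
    return p_bh1 * P_BS[(sh, 1)] + (1 - p_bh1) * P_BS[(sh, 0)]

# Horn 1: BH_1 in W, so both sides intervene with do(BH_1 = 0):
print(p_shatter(bt=1, sh=1, bh=0))   # 0.95
print(p_shatter(bt=0, sh=1, bh=0))   # 0.95  -- no probability raising
# Horn 2: BH_1 in Z, so the baseline sums BH_1 out:
print(p_shatter(bt=0, sh=1))         # ~0.9505 -- again no raising
```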
Note that this treatment of fizzling requires that we include variables for whether the rocks hit the bottle. If we try to model this case using just three variables, \(\BT\), \(\ST\), and \(\BS\), we will incorrectly judge that Billy’s throw is a cause of the bottle shattering. This raises the question of what is the “right” model to use, and whether we can know if we have included “enough” variables in our model. Fenton-Glynn (2017) includes some discussion of these tricky issues.
While this section describes some success stories, it is safe to say that no analysis of actual causation is widely believed to perfectly capture all of our pre-theoretic intuitions about hypothetical cases. Indeed, it is not clear that these intuitions form a coherent set, or that they are perfectly tracking objective features of the world. Glymour et al. (2010) raise a number of challenges to the general project of trying to provide an analysis of actual causation.
The anthologies Collins et al. 2004 and Dowe & Noordhof 2004 contain a number of essays on topics related to the discussion of this section. Hitchcock 2004b has an extended discussion of the problem posed by fizzlers. Hitchcock 2015 is an overview of Lewis’s work on causation. The entry for counterfactual theories of causation discusses Lewis’s work, and counterfactual theories of causation more generally.
Thanks to Frederick Eberhardt, Luke Fenton-Glynn, Clark Glymour, Judea Pearl, Richard Scheines, Elliott Sober, Jim Woodward, and the editors of the Stanford Encyclopedia of Philosophy for detailed comments, corrections, and discussion.