Russell’s paradox is a contradiction—a logicalimpossibility—of concern to the foundations of set theory andlogical reasoning generally. It was discovered by Bertrand Russell inor around 1901. In a letter to Gottlob Frege, Russell outlined theproblem as it affects Frege’s major workGrundgesetze derArithmetik. Frege responded with both dismay and admiration, andnever recovered from the blow dealt to his life’s work. Russellwas also alarmed by the extent to which the paradox threatened his ownproject. Struggling to solve the paradox during a period of intensivework and emotional upheaval, he made some of his most importantdiscoveries including his theory of denoting phrases and definitedescriptions, and his theory of types. Since then, the paradox hasprompted major advances in logic,set theory and thephilosophy and foundations of mathematics. Although Russell is justly famous for his type-theoretic solution, itis less well known that he wrestled for several years with othercandidate solutions. The paradox also inspired fruitful proposals fromgreat mathematicians such as David Hilbert, Ernst Zermelo, and Johnvon Neumann. It continues to inspire work in logic and foundations tothis day. In his introductory note to Russell’s letter publishedmore than fifty years ago, Jean van Heijenoort wrote that“Russell’s paradox has been leaven in modern logic andcountless works have dealt with it.” The leavening effect ofRussell’s Paradox has not diminished.
Central to any theory of sets is a statement of the conditions underwhich sets are formed. In addition to simply listing the members of aset, it was initially assumed that any well-defined condition (orprecisely specified property) could be used to determine a set. Forexample, if \(T\) is the property of being a teacup, then the set,\(S\), of all teacups might be defined as \(S = \{x: T(x)\}\), the setof all individuals, \(x\), such that \(x\) has the property of being\(T\). Even a contradictory property might be used to determine a set.For example, the property of being both \(T\) and not-\(T\) woulddetermine the empty set, the set having no members.
More precisely, naïve set theory assumes the so-called naïveor unrestricted comprehension principle that for any formula\(\phi(x)\) containing \(x\) as a free variable, there will exist theset \(\{x: \phi(x)\}\) whose members are exactly those objects thatsatisfy \(\phi(x)\). Thus, if the formula \(\phi(x)\) stands for“\(x\) is prime”, then \(\{x: \phi(x)\}\) will be the setof prime numbers. If \(\phi(x)\) stands for “\({\sim}(x =x)\)”, then \(\{x: \phi(x)\}\) will be the empty set.
But from the assumption of this principle, Russell’scontradiction follows. Let \(\phi(x)\) stand for \(x \in x\) and let\(R = \{x: {\sim}\phi(x)\}\). Then \(R\) is the set whose members areexactly those objects that are not members of themselves.
Is \(R\) a member of itself? If it is, then it must satisfy thecondition of not being a member of itself and so it is not. If it isnot, then it must not satisfy the condition of not being a member ofitself, and so it must be a member of itself. This is a contradiction,one that is a problem even for systems of logic weaker than theclassical system.
Although the reasoning leading to the paradox applies to set theory,it does not actually require any principle of set theory. Suppose that\(y\) is such that for all \(x\), \(x\in y \equiv x \notin x\). Thentake the case where \(x=y\) to get \(y\in y \equiv y\notin y\). Infact, the reasoning is entirely general and does not even require thatthe relation in question is \(\in\). This is already suggested byRussell’s example of the village barber who shaves all and onlyvillagers who don’t shave themselves. (This is emphasized inboth Levy 1979 and Kripke 2014.) The reasoning is an applied instanceof the following theorem of pure logic, labelled T269 in Kalish,Montague, and Mar 2000:
\[\tag{T269} {\sim}\exists y\forall x(Fxy\equiv{\sim}Fxx)\]We return to this generalization in section 5.
Some related paradoxes are discussed in the second chapter of theIntroduction to Whitehead and Russell (1910, 2nd edition,60–65), as well as in the entry onparadoxes and contemporary logic in this encyclopedia. A few of these are discussed later in thisentry.
Russell discovered the paradox during a period when similar antinomieswere being discovered. Cesare Burali-Forti, an assistant to GiuseppePeano, had discovered one such in 1897 when he noticed that since theset of ordinals is well-ordered, it too must have an ordinal. However,this ordinal must be both an element of the set of all ordinals andyet greater than every such element. Further, Cantor had proved in1882 that there is no greatest cardinal number (Cantor 1883), whichRussell found “paradoxical” since he believed that thereis a universal set containing everything and that this set had thegreatest cardinal number. Unlike Burali-Forti’s paradox andCantor’s “paradox,” Russell’s paradox does notinvolve either ordinals or cardinals, relying instead only on muchmore primitive notions.
Zermelo noticed a similar contradiction sometime between 1897 and1902, possibly anticipating Russell by some years (Ebbinghaus andPeckhaus 2007, 43–48; Tappenden 2013, 336), although Kanamoriconcludes that the discovery could easily have been as late as 1902(Kanamori 2009, 411). As Linsky points out, Zermelo’s argument,while similar to Russell’s, is best understood as one of acluster of arguments by Zermelo, Schröder and Cantor that“indeed anticipated” the mathematical argument Russelldeveloped but that turned out to be different in small but significantways from Russell’s argument (Linsky 2013, 11).
Russell appears to have made his discovery in the late spring of 1901,while studying Cantor’s argument and working on hisPrinciples of Mathematics (1903a). Exactly when the discoverytook place is not clear. The paper in which Cantor’s diagonalmethod is first discussed appeared in June (Russell 1901). Elsewhere,Russell confirms that he came across his paradox “in June1901” (1944, 13). Later he reports that the discovery took place“in the spring of 1901” (1959, 75). Still later he reportsthat he came across the paradox, not in June, but in May of that year(1969, 221). For more details see Grattan-Guinness 1978, Coffa 1979,and Lavine 1994. Further primary sources for Russell’s work inthe period 1900–1902 are collected in Moore 1994.
Russell discovered his paradox by examining Cantor’s proof thatthere is no largest cardinal number. The proof relies on a neat trickknown as “diagonalization” that was reinvented for a novelpurpose by Cantor himself. Here we discuss the method in generalterms. Then, in the next section andthe supplement, we turn to some of the details of Russell’s struggle withdiagonalization.
A common form of diagonalization involves anenumeration—a list of elements such as formulas, sets, or functions inwhich every element is associated with some natural number— inwhich an element’s place in the enumeration is exploited todefine a diagonal construction. For example, assuming an enumerationof one-place functions of natural numbers, \(f_1, f_2, f_3, \ldots\),it can be shown that the enumeration cannot be exhaustive: There mustbe a one-place function not in the enumeration. Define \(g(n)\) to be\(f_n(n)+1\). Then \(g\) cannot be in the enumeration, say, at the\(q\)th place, since then \(g(q)=g_q(q)+1\), a contradiction.Variations of this method are used throughout set theory andmetamathematics.
It’s easy to see why the method is called“diagonalization.” List the functions \(f\) asfollows:
\[\begin{array}{cccccc}F_0: & \boldsymbol{f_0(0)}, & f_0(1) & f_0(2), & f_0(3), & \ldots\\ F_1: & f_1(0), & \boldsymbol{f_1(1)} & f_1(2), & f_1(3), & \ldots\\ F_2: & f_2(0), & f_2(1) & \boldsymbol{f_2(2)}, & f_2(3), & \ldots\\ F_3: & f_3(0), & f_3(1) & f_3(2), & \boldsymbol{f_3(3)}, & \ldots\\ & & \vdots & & & \\ \end{array}\]The function \(g(n) = f_n (n) + 1\) is defined by raising the value of\(f_n\) along the bolded left-to-right diagonal by one. So \(g(n)\)differs from any \(f_n\) at \(f_n (n)\) and so cannot be in the list\(F_0, F_1, F_2\).... Cantor used this method to prove that there aremore real numbers than natural numbers, although both kinds of numbersare infinite in extent. Cantor’s method set off a firestorm ofactivity in set theory and metamathematics; and a more general kind ofdiagonal reasoning—perhaps not involving anenumeration—led Russell to his paradox as we will soon see.
A very basic principle involving diagonalization is the following:
Now consider the case in which \(X = Y\). Let \(f\) be the identitymapping \(I\), where \(I(x) = x\). The set defined by diagonalizationin this case is \(R_X = D_f = \{x \in X: x\not\in x\}\). (CL) impliesthat \(R_X \not\in Y\), and since \(X = Y\), \(R\not\in X\). Thus, forany set \(X\), \(R_X\)is never an element of \(X\). It follows thatthere cannot be a universal set. However, Russell was convinced thatthere must be a universal set, \(V\). His conclusion was, in effect,that \(R_V\), which is just \(R\), does not exist. If it did, it wouldbe both a member and a non-member of itself. This is a consistentposition, but modern ZF set theory has an axiom —the axiom offoundation, or regularity— that implies that no set is a memberof itself and hence that \(V\) and \(R_V\) coincide and so neither isa set.
It is a corollary of (CL) that there is no onto function with domain\(X\) and range thepower set of \(X\) (the set of allsubsets of \(X)\). This is a form of Cantor’s Theorem. If such afunction existed, then it would have \(D_f\) in its range (since\(D_f\) is a subset of \(X)\). By (CL), this is impossible. However,there is a one-to-one correspondence between \(X\) and a subset of thepower set of \(X\) – namely the family of all singletons ofelements of \(X\). This correspondence is a function into the powerset of \(X\). Since there is a function with domain \(X\) into thepower set of \(X\), but no function with domain \(X\) onto thepowerset of \(X\), the cardinal size of \(X\) isstrictlysmaller than that of the powerset of \(X\). This is the resultthat is more usually called Cantor’s theorem.
As Russell tells us in his (1919), it was after he applied the samekind of reasoning found in Cantor’s diagonal argument to a“supposed class of all imaginable objects” that he was ledto the contradiction. (Griffin, 2004) is a scholary account ofRussell’s discovery. What follows is a reconstruction of hisreasoning that continues to show the logical connection between hisparadox and diagonalization. How might he have discovered that therelation defined by diagonalization is \(R\)?
The comprehensive or universal set \(V\) that Russell was consideringcontains everything, including itself, as an element. (Strictlyspeaking, Russell’s theory concernedclasses ratherthan sets, but this distinction can be ignored for a moment.) Since\(V\) contains everything as an element, and since subsets are things,\(V\) should contain all its subsets as elements (Russell 1901, p.69). For this reason, Russell seems to have thought that it waspossible to map every subset of \(V\) to an element of \(V\). Thiswould be achieved by mapping every subset of \(V\) to itself. To dothis in a general way that also takes account of the elements of \(V\)that are not sets, he considered the following many-to-one mapping:\(f(x) = x\) if \(x\) is a set, \(f(x) =\) the singleton of \(x\) if\(x\) is not a set (Russell 1903a, §349). (The mapping ismany-to-one because both \(x\) and the singleton of \(x\) are mappedto the singleton of \(x\).) To simplify things, focus on the clausethat \(f(x) = x\) if \(x\) is a set. Let \(f\) be the identity mapping\(I\), where \(I(x) = x\), whose domain is the subset \(V^*\) (of\(V)\) that contains all and only sets, and whose codomain is thepowerset of \(V^*\). For every \(y\) in the codomain, there is an\(x\) in the domain such that \(f(x) = y\). So the codomain is therange of \(f\). So, by definition of ‘onto’, \(f\) is afunction from domain \(V^*\) onto the powerset of \(V^*\). Thiscontradicts Cantor’s theorem. Let us focus, then, on what mustbe omitted from the mapping: the diagonal set \(D_f\), which (CL)tells us is not in the range of \(f\).
Recall from 2.2 that the diagonal set \(D_f\) is \(\{x \in X: x\not\in f(x)\}\). In English, this is the set of all sets that are notmembers of the sets to which they are mapped by \(f\). Since \(f\)maps every set to itself, \(D_f\) is the set of all sets that are notmembers of themselves. More precisely,
\[\forall x \in V^* (x \in f(x) \equiv x \not\in D_f)\]implies:
\[\forall x (x \in x \equiv x \not\in D_f),\]since those sets not in \(D_f\) are members of the sets—themselves— to which they are mapped by \(f\). Hence
\[\forall x (x \not\in x \equiv x \in D_f).\]This because \(f\) maps every set to itself, so \(D_f\) is the set ofall sets that are not members of themselves. That is,
\[D_f = \{x: x \not\in x\}.\]By the reasoning in section 1, this implies Russell’s paradox:\(D_f \in D_f \equiv D_f \not\in D_f\). Since Russell remainedconvinced that \(V\) exists, he sought a principled way of ruling outfunctions that allow this diagonal argument to get started. This isdiscussed in 3. Then, in 4, the modern reaction to the argument isconsidered: that \(V\) does not exist.
Returning to the history of the paradox, it is worth noting that thecluster of arguments due to Russell, Zermelo and Schröder werethought to be of minor importance by everyone other than Russell untilit was realized how detrimental they were to Gottlob Frege’sfoundations for arithmetic.
Russell wrote to Frege with news of the paradox on June 16, 1902. (Forthe relevant correspondence, see Russell 1902 and Frege 1902, in vanHeijenoort 1967.) After he had expressed his great admiration forFrege’s work, Russell broke the devastating news, gently:
Let \(w\) be the predicate of being a predicate that cannot bepredicated of itself. Can \(w\) be predicated of itself? From eitheranswer follows its contradictory. We must therefore conclude that\(w\) is not a predicate. Likewise, there is no class (as a whole) ofthose classes which, as wholes, are not members of themselves. Fromthis I conclude that under certain circumstances a definable set doesnot form a whole. (1902, p. 125)
Russell’s letter arrived just as the second volume ofFrege’sGrundgesetze der Arithmetik (The Basic Lawsof Arithmetic, 1893, 1903) was in press. Frege’s letter inresponse contained a solution to the predicational version of theparadox described in Russell’s letter (see Landini 1993, Klement2010a). Immediately appreciating the difficulty that the paradox posedfor classes, Frege added to theGrundgesetze a hastilycomposed appendix discussing Russell’s discovery (more of whichin section 3). In the appendix Frege observes that the consequences ofRussell’s paradox are not immediately clear. For example:
Is it always permissible to speak of the extension of a concept, of aclass? And if not, how do we recognize the exceptional cases? Can wealways infer from the extension of one concept’s coinciding withthat of a second, that every object which falls under the firstconcept also falls under the second? These are the questions raised byMr Russell’s communication (1903, 127).
We will now discuss the first and third questions, before turning tothe second question in section 3.
Consider the first question raised by Frege. As Frege uses it in thequote above, “class” denotes an extension of a concept,something rather like what Russell above calls “a class aswhole” and what we today call a “set.” Further, aconcept or function to truth values is what in Frege’s systemcorresponds to a property. Thus the concept of having mammary glandsdistinguishes the class of mammals from reptiles, birds and otherliving organisms. In that case, it is permissible to speak of theclass of mammals. The concept of being both square and not square (orany other conjunction of contradictory concepts) distinguishes theempty class, and so on. The intuition that it is, in Frege’swords, “always permissible to speak of the extension of aconcept, of a class” (1903, 127) can be codified in a naïveor unrestricted comprehension principle saying that every concept maybe used to determine a corresponding class:
For every concept \(f\), there exists a class \(C_f\) such that forall objects \(x\), \(x\) is a member of \(C_f\) if and only if \(x\)falls under \(f\).
Note, however, that Frege does not adopt such a principle as an axiom(or “basic law”). Rather, as is explained below, it is aconsequence of his basic law V.
Exactly how naïve comprehension is stated depends on the theoryin question. In theories based on a first order formal language, andusing the set-theoretic \(\in\) rather than Frege’s own notionof membership, the principle states that
\[\tag{NC}\exists y \forall x [x \in y \equiv \phi(x)],\](Note that ‘\(y\)’ is not free in the formula‘\(\phi\)’, for if it were then \(\phi(x)\) could be \(x\not\in y\), and so \(x \in y \equiv x \not\in y\).) (NC) says,“There is a set \(y\) such that for any object \(x, x\) is anelement of \(y\) if and only if the condition expressed by \(\phi\)holds.” Where ‘\(\phi\)’ is the well-formed formula‘\(x \not\in\)x’, and \(y\) is the correspondingset, this implies Russell’s Paradox. Hence Russell’sdiagnosis in his letter to Frege (quoted above): “under certaincircumstances a definable set does not form a whole.”
Frege’s own notion of membership is that \(x \in y\) if and onlyif \(y\) is the extension of a concept under which \(x\) falls.Writing ‘\(\varepsilon f\)’ for the extension of a concept\(f\):
\[x \in y \equiv_{df} \exists f [y = \varepsilon f \wedge f(x)].\]Using this, it is possible to derive the principle corresponding to(NC) in Frege’s system:
\[\tag{NCF}\forall f \exists y \forall x[x \in y \equiv f(x)].\]First, however, recall Frege’s third question. Frege iscommitted to (NCF) by his basic law V about the identity of ranges ofvalues. As he wrote in response to Russell’s letter, afterexpressing his shock and dismay:
It seems accordingly that the transformation of the generality of anidentity into an identity of ranges of values (sect. 9 of myBasicLaws) is not always permissible, that my law V (sect. 20, p. 36)is false, and that my explanations in sect. 31 do not suffice tosecure a meaning for my combinations of signs in all cases. (1902,127)
This requires some explanation. For Frege, concepts or functions areincomplete entities in need of an argument or object to complete them.Thus, he identifies concepts not in isolation but based on how theybehave when completed by all their arguments i.e., based on whetherthey agree on or have the same value for every argument. If they doagree, then they are the same, which justifies writing‘\(\forall x[f(x) = g(x)\)]’. (Note that where it is nowcustomary to write ‘\(\equiv\)’ for material equivalenceFrege writes ‘=’, because when given arguments theexpressions standing for concepts denote truth values. From now on, wewill comport with modern practice and write ‘\(\equiv\)’.)This is what Frege calls “the generality of an identity”between concepts. Further, according to Frege, objects are unlikeconcepts in being complete (or self-standing); hence concepts are notobjects. However, Frege introduces objects corresponding to conceptsby the familiar mathematical method of transforming entities thathave something in common —standing in an equivalencerelation— into the thing they have in common. In this case, theentities are concepts. What they have in common is that they have thesame value for every argument – “the generality of anidentity.” This is the equivalence relation in question. Theobject abstracted from this equivalence relation is called the“range of values” of a concept, also called “thegraph of a function” – the set of pairs of arguments andvalues corresponding to the concept. The special case in which a rangeof values corresponds to a concept —or function to one of thetruth values— is called an “extension.” (For therest of this section, this is our preferred terminology.)Frege’s law V states that the extension of a concept \(f(~)\) isidentical to the extension of a concept \(g(~)\) if and only if\(f(~)\) and \(g(~)\) agree on their values for every argument, i.e.,if and only if \(\forall x[f(x) \equiv g(x)\)]. Thus “thetransformation of the generality of an identity into an identity ofranges of values”. Hence law V:
\[\tag{V} \varepsilon f = \varepsilon g \equiv \forall x[f(x) \equiv g(x)].\]This implies (NCF). First fix \(f\) and let \(y = \varepsilon f\). Toprove the left-to-right direction of (NCF), assume \(x \in y\). Then,by the definition of ‘\(\in\)’,
\[\exists g [y = \varepsilon g \wedge g(x)].\]So,
\[\varepsilon f = \varepsilon g.\]Hence, by (V),
\[\forall x[f(x) \equiv g(x)].\]Further, we have \(g(x)\), so \(f(x)\) as required.
To prove the right-to-left direction of (NCF), assume \(f(x)\).Because \(y = \varepsilon f\), Frege’s definition is satisfied,and so \(x \in y\) as required.
Note also that (NCF) implies (V). Assuming that \(\varepsilon f =\varepsilon g\),
\[\forall x[x \in \varepsilon f \equiv x \in \varepsilon g].\]But by (NCF),
\[\forall x[x \in \varepsilon f\equiv f(x)],\]and
\[\forall x[x \in \varepsilon g \equiv g(x)].\]So,
\[\forall x[f(x) \equiv g(x)].\](NCF) should not be confused with the purely first order condition wehave labelled (NC). T. Parsons (1987) proved that the first orderversion of (V) is consistent – it has a model in the naturalnumbers. But of course (NC) is inconsistent. So, in the first ordercontext, (V), now treated as a first order schema, does not imply(NC).
As Frege points out in his letter to Russell, the thing correspondingto \(R\) in Frege’s system is:is a concept whose extensiondoesn’t fall under it. This is described by the followingformula of second order logic:
\[\tag{RF} \exists f [x = \varepsilon f \wedge{\sim} f(x)].\]Recalling the proof of (NCF), abbreviate the formula labeled \((RF)\)by the formula ‘\(Rx\)’ and substitute this “Russellconcept” for \(f(x)\) in (NCF). Then it follows in Frege’ssystem that the extension of the concept \(R(~)\) falls under \(R(~)\)if and only if it does not: \(R(\varepsilon R) \equiv{\sim}R(\varepsilon R)\). Consider the extension of \(R, \varepsilon R\). If\(R(\varepsilon R)\) is true, then, by definition, there is a concept\(f\) such that \(\varepsilon R\) is the extension of \(f\) and\(f(\varepsilon R)\) is false. So, by law V, \(R(\varepsilon R)\) isfalse as well, because \(\varepsilon R\) is also the extension of\(f\) and \(\forall x[f(x) \equiv R(x)\)]. Therefore, if\(R(\varepsilon R)\) is true, then \(R(\varepsilon R)\) is false. So\(R(\varepsilon R)\) is false. Then there is a concept \(f\) such that\(\varepsilon R\) is the extension of \(f\) and \(f(\varepsilon R)\)is false. So, by definition, \(R(\varepsilon R)\) is true. Thiscontradicts the claim that \(R(\varepsilon R)\) is false.
(See 2.3–2.5 of the entry onFrege’s Theorem for more details. See also section 2.4.1 of the entry onGottlob Frege, and the papers in section II of Boolos 1998.)
Because of these problems, Frege eventually felt forced to abandonmany of his views about logic and mathematics. Even so, as Russellpoints out, Frege met the news of the paradox with remarkablefortitude:
As I think about acts of integrity and grace, I realise that there isnothing in my knowledge to compare with Frege’s dedication totruth. His entire life’s work was on the verge of completion,much of his work had been ignored to the benefit of men infinitelyless capable, his second volume was about to be published, and uponfinding that his fundamental assumption was in error, he respondedwith intellectual pleasure clearly submerging any feelings of personaldisappointment. It was almost superhuman and a telling indication ofthat of which men are capable if their dedication is to creative workand knowledge instead of cruder efforts to dominate and be known.(Quoted in van Heijenoort 1967), 127.)
Of course, Russell too was concerned about the consequences of thecontradiction. Upon learning that Frege agreed with him about thesignificance of the result, he immediately began writing an appendixfor his own soon-to-be-releasedPrinciples of Mathematics.Entitled “Appendix B: The Doctrine of Types,” the appendixrepresents Russell’s first attempt at providing a principledmethod for avoiding what soon was to become known as“Russell’s paradox.” We return to this below.
One early skeptic concerning (NC) was the originator of modern settheory, Georg Cantor. Even prior to Russell’s discovery, Cantorhad rejected (NC) in favor of what was, in effect, a distinctionbetween sets and classes, recognizing that some properties (such asthe property of being an ordinal) produced collections that weresimply too large to be sets. Thus, he thought that the entities thatare today called sets should be restricted (Cantor 1899). Various waysof implementing this line of response are discussed below. (Detailscan be found in Moore 1982, Hallett 1984, and Menzel 1984.) For themoment, note that Frege and Russell were responding to a paradox thatafflicts their notion of an extension or class as defined by (NC); itis not entirely clear that there was an analogous crisis for settheory as Cantor understood it. Rather, as Hilbert would insist, whatCantor’s theory needed was a precise axiomatization (Hilbert1904).
Recall Frege’s second question quoted above, How to recognizethe exceptions to (NCF)? Because (NCF) follows from (V) inFrege’s system, isolating the exceptions to the former requiresisolating the exceptions to the latter. His basic idea is that whenconsidering whether \(\varepsilon f = \varepsilon g\), one mustnot consider whether these extensions contain themselves;only then may one identify them when the corresponding concepts orfunctions agree on every (other) argument. While the proposal wasultimately unsuccessful, Frege’s strategy of isolating theproblematic instances and excluding them by restricting (V) is ofhistorical interest, because of the influence it appears to have hadon Russell. For this reason, Frege’s strategy andRussell’s related proposals are discussed in thesupplement on Frege’s Way Out.
Russell worked (with A. N. Whitehead) on various candidate solutionsto the paradox between 1902 and 1905. Early on, he toyed with an idealike Cantor’s that some properties produced collections that arenot legitimate sets. This is reflected in Russell’s distinctionbetween a class as one and a class as many, where the size of theformer needs to be restricted (1903a, 60, 102–3). Russell wouldlater return to theories that limit the sizes of legitimate sets andcomplain that such theories do not decide how big, or how high up inorder, is too big, or too high, to form a set. In particular, hecomplains that such theories do not decide which ordinals arelegitimate (Russell 1907, 44). Since this complaint began to beaddressed, subsequently, by axiomatizations beginning with that inZermelo 1908, and since some such axiomatizations can be justified bya coherent basic conception of the sort discussed in Shoenfield 1967and Boolos 1971, we put the complaint to one side.
Although Russell first introduced his theory of types in Appendix B ofhis 1903Principles of Mathematics, he recognized immediatelythat more work needed to be done, since his initial account seemed toresolve some but not all of the paradoxes. For example, the initialaccount is subject to Russell’s paradox of propositions. This isdiscussed later.
In a letter to Frege in May 1903, Russell suggests that “classesare entirely superfluous… this seems to me to avoid thecontradiction” (Russell 1903b). Russell means that there is noneed for his theory to make reference to classes in any sense of thatterm – no need for reference to extensions, or sets, or Fregeanfunctions. Because of this feature, Russell calls it a“no-classes theory.” However, this does not address thefact that Frege’s (NCF) can also be obtained from (V). Moreover,Russell’s suggestion still faces the question of what to doabout the predicational version of Russell’s paradox that hestated without reference to classes or extensions in the originalletter to Frege.
Here the reader should recall from section 2 the account of how todefine relations and functions via diagonalization. Russell’sanswer was to try various principled ways of restricting hisdefinitions to rule out diagonalization. See Russell 1994b, Landini1992, and Urquhart 1994, xxi–xxii. Much of his work in 1904 ison ruling out the relevant propositional functions using therestrictions of the so-called zigzag theory (Urquhart 1994, xxvi). Thetheory is so-called because given a propositional function \(\phi\)that purports to determine a class \(u\), diagonalization exploits“a certain zigzag quality,” that \(x \not\in u\) when\(\phi x\) or \(x \in u\) when \({\sim}\phi x\) (Russell, 1907, 38).This particular strategy shows the influence of Frege and is discussedalong with Frege’s “way out” in the aforementionedsupplement on Frege’s Way Out.
Russell’s revisionary ideas about denoting and propositions weredeveloped around 1904–5 while he worked on the zigzag theory.The paper in which this theory was summarized was submitted inNovember 1905, mere months after “On Denoting” wassubmitted in August after completion in late July 1905. SeeUrquhart’s introduction to Russell 1905 (p. 414). Indeed, thefirst recorded evidence of Russell’s view that
a complex is denoting or un-denoting, not in its own nature, butaccording to its position in another complex, i.e., a proposition(Russell 1904a, 126)
occurs in his notes on zigzag theories.
Concerning the implications of the new theory of denoting for theparadox, there were many false starts. Russell wrote (with hischaracteristic wit and understatement) to his wife on April14th 1904:
Alfred [Whitehead] and I had a happy hour yesterday, when we thoughtthe present King of France had solved the Contradiction; but it turnedout finally that the royal intellect was not quite up to that standard(quoted by Urquhart in (Urquhart 1994, xxxiii)).
This brings us to the relevant aspects of Russell’s theory ofdenoting phrases and definite descriptions (see Section 4 of the entryonBertrand Russell), which plays a role in all his proposed solutions to the paradoxfrom 1904 onward.
Definite descriptions (like ‘the smallest prime number’)are of fundamental importance to mathematics, in which one oftenspeaks of the unique entity satisfying a given condition (or two suchconditions in the case of equations). Russell’s theory is thatdefinite descriptions, like denoting phrases in general, have nomeaning in isolation —in the way that, say, the proper name‘Walter Scott’ does— but only in the context oflarger expressions in which they occur. This reflects the influence ofRussell’s mathematical education, which included the 19thcentury method for eliminating apparent reference to infinitesimalquantities by giving meaning not to phrases that purport to denotethem but to the larger expressions in which these occur. In the caseof definite descriptions, like the one occurring in
The present King of France is bald,
Russell’s analysis of them as complex quantifier phrases raisesthe question of whether or not the purportedly denoted entity exists.In the case of phrases that purport to denote classes, his analysisshows that there is simply no reason to say that classes exist.Rather, the analysis reveals that what are said to exist arepropositional functions. For example, the following statement thatpurports to be about the class of \(G\)s,
\(\{x: Gx\}\) is abstract,
is analyzed as something that can be rendered informally as:
There is a \(\phi\) such that for all \(x~Gx\) if and only if \(\phi x\amp \phi\) is abstract,
where \(\phi\) is a variable for propositional functions. (SeeRussell, 1910, 72, 188. For a formalization of this analysis in modernnotation, see section 9 of the entry on the notation ofPrincipia Mathematica. For further discussion of Russell’s theory of denoting phrasesand its role in the elimination of classes and other paradoxicalentities, see Urquhart 1994, Kaplan 2005, and Klement 2010a,b.) Ofcourse, eliminating classes does not address the predicational versionof Russell’s paradox. What is required, then, is a principlegoverning propositional functions that can block the paradox.Eventually, Russell would settle on thevicious circleprinciple, to which we return below. First, however, heconsidered another kind of no-classes theory – theaforementioned substitutional theory (Russell 1906, 1907).
The substitutional theory is officially debuted as the preferredversion of the no-classes theory in (Russell 1907). In a note added tothis paper on February 5th 1906, Russell records that he“now feel[s] hardly any doubt that the no-classes theory affordsthe complete solution of all the difficulties stated” (Russell1907). (This attitude may seem alien to the modern reader familiarwith model theory, which assumes some set theory thus giving thelatter a logically primary status. But this was not the attitude ofthe time.) Russell’s published description of the substitutionaltheory is extremely sketchy. Further, while a solution toRussell’s paradox can be gleaned from the theory, it does notrule out diagonalization entirely and so faces a paradox of its own.However, the theory played a large role in Russell’s thinkingand influenced his subsequent work on type theory, so aspiring Russellscholars should know something about it. The substitutional theory andits problems are discussed in more detail in thesupplement on Frege’s Way Out. See also Landini 1998, 2004, 2011, Grattan-Guiness 1974, Urquhart1986, Urquhart and Pelham 1995, Klement 2010a, and Galaugher 2013.
The substitutional theory led to type theory’s more matureexpression five years later in Russell’s 1908 article,“Mathematical Logic as Based on the Theory of Types,” andin the monumental work he co-authored withAlfred North Whitehead,Principia Mathematica (1910, 1912, 1913). Russell’s type theory thusappears in two versions: the “simple theory” of 1903 andthe “ramified theory” of 1908. Both versions have beencriticized for being too ad hoc to eliminate the paradox successfully.For a passionate defense of a version of simple type theory againstthis charge, see (Andrews 1986).
Russell’s basic idea is to avoid commitment to \(R\) (the set ofall sets that are not members of themselves) by arranging allsentences (or, more precisely, allpropositional functions, functions which give propositions as their values) into a hierarchy.It is then possible to refer to all objects for which a givencondition (or predicate) holds only if they are all at the same levelor of the same “type.”
The resulting solution to Russell’s paradox is motivated inlarge part by adoption of a principle that he viewed as aconstraint on logically possible theories of propositional functions:the so-calledvicious circle principle. The principle ineffect states that no propositional function can be defined prior tospecifying exactly those objects to which the function applies –the function’s domain. For example, before defining “\(x\)is a prime number,” one first needs to define the collection ofobjects that might satisfy it, namely the set \(N\) of naturalnumbers. As Whitehead and Russell explain,
An analysis of the paradoxes to be avoided shows that they all resultfrom a kind of vicious circle. The vicious circles in question arisefrom supposing that a collection of objects may contain members whichcan only be defined by means of the collection as a whole. Thus, forexample, the collection ofpropositions will be supposed tocontain a proposition stating that “all propositions are eithertrue or false.” It would seem, however, that such a statementcould not be legitimate unless “all propositions” referredto some already definite collection, which it cannot do if newpropositions are created by statements about “allpropositions.” We shall, therefore, have to say that statementsabout “all propositions” are meaningless. … Theprinciple which enables us to avoid illegitimate totalities may bestated as follows: “Whatever involvesall of acollection must not be one of the collection”; or, conversely:“If, provided a certain collection had a total, it would havemembers only definable in terms of that total, then the saidcollection has no total.” We shall call this the“vicious-circle principle,” because it enables us to avoidthe vicious circles involved in the assumption of illegitimatetotalities. (1910 [2nd edn], 37)
If Whitehead and Russell are right, it follows that nofunction’s domain includes any object presupposing the functionitself. As regards Russell’s paradox, since classes orextensions presuppose the functions that define them, nofunction’s domain includes any class or extension thatpresupposes that function. This blocks the construction of \(R\) bydiagonalization. It will be recalled, however, that the theory ofPrincipia Mathematica is another version of the no-classestheory, according to which classes are entirely superfluous. A moreimportant result of the vicious circle principle is, then, thatpropositional functions are arranged in a hierarchy of the kindRussell proposes. This suffices to restrict Frege’s
\[ \tag{NCF} \forall f \exists y \forall x[x \in y \equiv f(x)],\]based on the demand that no functions or relations of the level of\(f\) or higher fall under \(f(~)\).
As Gödel pointed out (in his 1944), Russell’s typerestriction is not well motivated in cases describing propositionalfunctions, or sets (like \(N)\), that are mind and languageindependent. As Quine put it, memorably, such descriptions are
not visibly more vicious than singling out an individual as the mosttypical Yale man on the basis of averages of Yale scores including hisown (1967, 243).
Note however that in the 1908 article, Russell reaches the conclusionthat his restriction should be dropped in the case of extensionalmathematics andonly applied when characterizing intensionalentities such as propositional functions (1908, 243). Theattitude taken inPrincipia is similar. This attitude isendorsed in (Church 1978), (Myhill 1979) and (Kripke 2011).
In response to Russell’s paradox,David Hilbert also expanded his program of building a consistent, axiomaticfoundation for mathematics that included an axiomatic foundation forquantificational logic and set theory (Peckhaus 2004). Underlying thisapproach was the idea of allowing the use of only finite, well-definedand intuitively constructible objects, together with rules ofinference deemed to be absolutely certain. A foundation based on thisidea, Hilbert thought, would be sufficiently definite and certain toensure that paradoxes could not arise.
Hilbert credited Cantor with seeing that some classes were simply toolarge to be sets and that any assumption to the contrary would lead toinconsistency. Hilbert’s demand was that Cantor’s settheory be given an axiomatic foundation (Hilbert 1904, p. 131). Thisis worth emphasizing because it was glossed over in a much more famouspaper, where Hilbert did not clearly distinguish Cantor’s workfrom that of Frege (Hilbert, 1925, p 375). Hilbert may also haveunderestimated the extent to which axiomatic set theory can bejustified by a coherent basic conception of the sort discussed inShoenfield 1967 and Boolos 1971.
Two more major figures who are worth mentioning introduced versions ofconstructivism, the basic idea of which was that one cannotassert the existence of a mathematical object unless one can define aprocedure for constructing it.
Henri Poincaré read Russell’s work and in his (1910) advocated a version ofthe vicious circle principle, which is presented not so much inresponse to Russell’s paradox as to Richard’s. Elsewhere(1909), Poincaré claims that Zermelo’s axiom ofseparation does not guarantee safety from paradox, because it does notrequire one to define a procedure for constructing a set prior toasserting its existence:
Zermelo has no qualms about talking about all the objects that arepart of a certain Menge, a Menge which also satisfies a certaincondition […] By laying down his Menge M in advance, he hasraised a wall of enclosure that stops any troublemakers who might comefrom outside. But he does not ask himself whether there might not betroublemakers from within (1909, 477).
For Poincaré, asserting the existence of an object cannot bejustified by intuition, which allows us tofollow rulesrather than torepresent objects such as sets. If theexistence of a set is to be asserted based on a legitimate use ofintuition, then it must be accompanied by a rule of construction. Inthe absence of such a rule, there is always the danger that intuitionwill lead to paradox.
Finally,Luitzen Brouwer developed a more permissive form of constructivism,intuitionism. Brouwer differed from Poincaré aboutintuition of objects, which Brouwer accepted as legitimate so long asthe object’s existence is not simply posited to fill a“gap” but is supported by a procedure for constructing itthat meets certain conditions. The demand for such a procedureunderlied his opposition to higher number classes and was partly aresponse to the classical infinitary perspective assumed by theexistence axioms of set theory and by the set-theoretic definition ofthe continuum. However, Brouwer’s dissertation (1907) alsocontains a critique of Cantorian set theory and a specific criticismof comprehension. Some of this is summarized in his Inaugural Lecture(1912), which also addresses comprehension and the axiom of separation(which he calls “the axiom of inclusion”). A theme of thelecture is that set theoretic axioms extrapolate without adequatejustification from the finite to the infinite. As in the case ofPoincaré, these criticisms are presented not so much inresponse to Russell’s paradox as to those of Richard andBurali-Forti. Note also that the derivation of Russell’s paradoxdoes not depend depends upon an instance of the principle of ExcludedMiddle, that either \(R\) is a member of \(R\) or it is not. This isdiscussed in section 4.
Russell’s paradox is often seen as a negative development– as bringing down Frege’sGrundgesetze and asone of the original conceptual sins leading to our expulsion fromCantor’s paradise (Hilbert 1925).W.V. Quine describes the paradox as an “antinomy” that “packsa surprise that can be accommodated by nothing less than a repudiationof our conceptual heritage” (1966, 11). Quine is referring tothe principle (NC) mentioned earlier. Despite Quine’s comment,it is possible to see Russell’s paradox in a more positivelight.
Later research has revealed that the paradox does not necessarilyshort circuit Frege’s derivation of arithmetic from logictogether with the Hume-Cantor Principle that the number of \(F\)s\(=\) the number of \(G\)s if and only if there is a one-to-onecorrespondence between the \(F\)s and the \(G\)s. If this is taken asa (non-logical) axiom, instead of a theorem derived from Frege’slaw V, the latter can simply be abandoned. (For details, see the entryonFrege’s theorem and foundations for arithmetic.) For critiques, see the papers in section II of (Boolos 1998), and(Burgess, 2005).
To take a different tack, Church gives an elegant formulation of thesimple theory of types that with the axioms of infinity and choice isan adequate framework for extant mathematics (Church, 1940). It hasalso proven fruitful even in areas removed from the foundations ofmathematics. (For details, see the entries onAlonzo Church and onChurch’s Type Theory.)
For some set theorists today, what the paradox highlights is thatthere is a problem with Frege’s and Russell’slogical notions of a class according to which it is specifiedby a formula that indicates a Fregean concept, a propositionalfunction or an attribute. Some, such as Lavine (1994) go further andargue that the paradox does not afflict Cantor’s combinatorialnotion of a set as an object formed simply by collecting its members.An intermediate position (McLarty, 1997) is that while the paradox wasa crisis for Frege and Russell, it was merely a source of problems andquestions for set theorists. This is why they concentrated on findingprecise axiomatizations that determine which sets exist. However,Frege and Russell were also seeking to state precise theories ofconcepts and propositional functions, respectively, if not directly ofsets. Moreover, it is still worth asking what the distinction betweenthe logical and combinatorial conceptions amounts to, because it isimpossible to formulate set theory without set abstracts which requireformulas for the definitions of sets.
In any case, the development of axiomatic (as opposed to naïve)set theories exhibits various ingenious and mathematically andphilosophically significant ways of dealing with Russell’sparadox. This paved the way for stunning results in themetamathematics of set theory. These results have includedGödel’s and Cohen’s theorems on the independence of theaxiom of choice and Cantor’s continuum hypothesis. So let us see, roughly, howsome of these methods – specifically, the so-called“untyped” methods – deal with Russell’sparadox.
Zermelo replaces (NC) with the following axiom schema of Separation(orAussonderungsaxiom):
\[\tag{ZA}\forall A \exists B \forall x (x \in B \equiv(x \in A \wedge \phi)).\]Note that 'B' is not free in '\(\phi\)'. (ZA)demands that in order to gain entry into \(B, x\) must be a member ofan existing set \(A\). As one might imagine, this requires a host ofadditional set-existence axioms, none of which would be required if(NC) had held up.
How does (ZA) avoid Russell’s paradox? One might think at firstthat it doesn’t. Let \(A\) be \(V\) —the whole universe ofsets— and \(\phi\) be \(x \not\in x\), a contradiction againappears to arise. But in this case, all the contradiction shows isthat \(V\) is not a set. All the contradiction shows is that“\(V\)” is an empty name (i.e., that it has no denotation,that \(V\) does not exist), since the ontology of Zermelo’ssystem consists solely of sets.
This same point can be made in yet another way, involving arelativized form of Russell’s argument. Let \(B\) be any set. By(ZA), the set \(R_B = \{x \in B: x \not\in x\}\) exists, but it cannotbe an element of \(B\). For if it is an element of \(B\), then it isreasonable to ask whether or not it is an element of \(R_B\); and itis if and only if it is not. Thus something, namely \(R_B\), is“missing” from each set \(B\). So again, \(V\) is not aset, since nothing can be missing from \(V\). But notice the followingsubtlety: unlike the previous argument involving the directapplication ofAussonderungs to \(V\), the present argumenthints at the idea that, while \(V\) is not a set, “\(V\)”is not an empty name. The next strategy for dealing withRussell’s paradox capitalizes on this hint.
John von Neumann’s (1925) untyped method for dealing withparadoxes, and with Russell’s paradox in particular, is simpleand ingenious. Von Neumann introduces a distinction between membershipand non-membership and, on this basis, draws a distinction betweensets and classes. An object is amember (simpliciter) if itis a member of some class; and it is anon-member if it isnot a member of any class. (Actually, von Neumann develops a theory offunctions, taken as primitive, rather than classes, whereincorresponding to the member/non-member distinction one has adistinction between an object that can be an argument of some functionand one that cannot. In its modern form, due to Bernays andGödel, it is a single-sorted theory of classes.)
Sets are then defined as members, and non-members are labeled“proper classes.” Only sets can be members of classes. Sofor example, the Russell class, \(R\), cannot be a member of anyclass, and hence is not a set but a proper class. If \(R\) is assumedto be an element of a class \(A\), then it follows from one of vonNeumann’s axioms that \(R\) is not equivalent to \(V\). But\(R\) is equivalent to \(V\), and hence not an element of \(A\). Thus,von Neumann’s method is closely related to the result statedabove about the set \(R_B\), for arbitrary \(B\). Von Neumann’smethod, while admired by the likes of Gödel and Bernays, has beenundervalued in recent years. (One notable exception is Burgess(2005)). See (Bernays, 1968), which summarizes his work on thisbeginning in the mid-1930s, and (Gödel, 1944).
Quine (1937) and (1967) similarly provide another untyped method (inletter if not in spirit) of blocking Russell’s paradox, and onethat is rife with interesting anomalies. Quine’s basic idea isto allow the universal set \(V\) and to introduce astratifiedcomprehension axiom. In effect, the axiom blocks circularity byintroducing a hierarchy (or stratification) that is similar to typetheory in some ways, and dissimilar in others. (Details can be foundin the entry onQuine’s New Foundations.) For the criticism that Quine’s theory compares unfavorably toZermelo’s, see Martin 1970 and Boolos 1971. For another approachthat allows the universal set, see Church 1974b.
Ackerman (1956) also distinguishes sets and classes, but in adifferent way than does Von Neumann. All objects of the theory areclasses (represented by upper case variables), which are individuatedby the usual principle of extensionality: \(\forall x (x \in U \equivx \in W) \rightarrow U = W\). The comprehension axiom for classesallows that there exists a class that contains all and only those setsthat satisfy \(\phi\), for any condition \(\phi\). Because classescontain only sets that satisfy \(\phi\), there is no longer a proof ofRussell’s paradox in the case where \(\phi\) is \(x \not\in x\).Consider a \(Y\) such that for all \(x, x \in Y \equiv x \not\in x\).By comprehension \(Y\) exists and is the class {\(x: x \not\in x\)}.Take the case where \(x = Y\) to get \(Y \in Y \equiv Y \not\in Y\).This shows that \(Y\) is not a set.
Some classes are also sets, so Ackerman introduces a second primitiverelation symbol (in addition to \(\in)\), the unary‘\(M\)’, where ‘\(M(x)\)’ means that \(x\) isa set. His comprehension axiom for sets is less restrictive than (ZA).It merely requires that sets are defined without reference to theclass of all sets. More precisely, if every class that satisfies\(\phi\) is a set, then there exists a set containing exactly thosesets that satisfy \(\phi\), so long as \(\phi\) does not involve \(M\)and all parameters in \(\phi\) are set parameters. Formally:
\[\tag{ACS}\forall x_1 ....x_n [\forall W (\phi(W) \rightarrow M(W)) \rightarrow \exists z \forall W (W \in z \equiv \phi(W))],\]where \(\phi\) is any condition that contains no occurrence of \(M\)and contains no parameters other than set parameters \(x_1 ,\ldots,x_n\). The consequent of the main conditional says that {\(W:\phi(W)\)} exists (by class comprehension) and is a set. Suppose thatthe restriction on \(M\) was lifted and \(\phi\) allowed to be \(M\wedge \phi\). Then, by (ACS), for any condition \(\phi\), the classof all \(W\) satisfying \(\phi\) would be a set. In particular, theclass {\(W: W \not\in W\)} would be a set and Russell’s paradoxwould follow. Now suppose that the restriction that \(\phi\) containsno parameters other than set parameters was lifted. Then classparameters in \(\phi\) would be allowed. Choosing \(W \in Y\) for\(\phi(W)\), would yield the following instance of (ACS):
\[\forall W (W \in Y \rightarrow M(W)) \rightarrow \exists z \forall W (W \in z \equiv W \in Y).\]It follows by generalization on \(Y\) that for any class, if all itsmembers are sets, then the class of its members exists (by classcomprehension) and is a set:
\[\forall Y \forall W (W \in Y \rightarrow M(W)) \rightarrow \exists z \forall W (W \in z \equiv W \in Y).\]It follows, by instantiation to the class \(\{x: \phi(x)\}\), where\(\phi\) is any condition, that every class that exists (by classcomprehension) is a set:
\[\forall W (W \in \{x: \phi(x)\} \rightarrow M(W)) \rightarrow \exists z \forall W (W \in z \equiv W \in \{x: \phi(x)\}).\]Taking \(x \not\in x\) for \(\phi\) yields Russell’s paradox.Clearly, then, considerable care has to be taken to avoid the paradox,if one takes Ackerman’s less restrictive approach to setcomprehension rather than that of limitation of size. Note also thatAckerman’s set theory is equivalent to ZF and that (ACS) allowsone to prove the so-called axiom of infinity as a theorem. See(Fraenkel, Bar-Hillel and Levy, 1973) for more discussion as well assection 5.2 of the entry onalternative axiomatic set theories.
In contrast to Zermelo’s, von Neumann’s, Ackerman’sand Quine’s strategies, which are in a sense purely settheoretic, there have also been attempts to avoid Russell’sparadox by altering the underlying logic. There have been many suchattempts, but one stands out as being both radical and, at the moment,somewhat popular (although not with set theoristsper se):this is theparaconsistent approach, which limits the overalleffect of an isolated contradiction on an entire theory. Classicallogic mandates that any contradiction trivializes a theory by makingevery sentence of the theory provable. This is because, in classicallogic, the following is a theorem:
\[\tag{Ex Falso Quodlibet} A \supset({\sim}A \supset B).\]Now, virtually the only way to avoid EFQ is to give up disjunctivesyllogism, that is, given the usual definitions of the connectives,modus ponens! So altering basic sentential logic in this wayis radical indeed. Unfortunately, even giving up EFQ is not enough toretain a semblance of (NC). One also has to give up the followingadditional theorem of basic sentential logic:
\[\tag{Contraction} (A \supset(A \supset B)) \supset(A \supset B).\]Using (Contraction) it can be argued that (NC) leads directly, notmerely to an isolated contradiction, but to triviality. (For theargument that this is so, see the entry onCurry’s paradox, section 2.2. Note too that it is not enough merely to retain the name“modus ponens”; it is the rule itself thatbecomes modified within non-traditional logics.) Thus it seems thatthe woes of (NC) are not confined to Russell’s paradox but alsoinclude a negation-free paradox due to Curry.
Another suggestion might be to conclude that the paradox depends uponan instance of the principle of Excluded Middle, that either \(R\) isa member of \(R\) or it is not. This is a principle that is rejectedby some non-classical approaches to logic, including intuitionism.However it is possible to formulate the paradox without appealing toExcluded Middle by relying instead upon the Law of Non-contradiction.Given the definition of \(R\) it follows that \(R \in R \equiv{\sim}(R\in R)\). So \(R \in R \supset{\sim}(R \in R)\). But it is also truethat \(R \in R \supset R \in R\). So \(R \in R \supset(R \in R\wedge{\sim}(R \in R))\). But by the Law of Non-contradiction\({\sim}(R \in R \wedge{\sim}(R \in R))\). So bymodustollens \({\sim}(R \in R)\). At the same time, since \(R \in R\equiv{\sim}(R \in R)\), it follows that \({\sim}(R \in R) \supset R\in R\), and thus that \(R \in R\). So both \(R \in R\) and itsnegation are deduced using only intuitionistically acceptable methods.See (Bell, 2004) for relevant discussion.
It seems, therefore, that proponents of non-classical logics cannotclaim to have preserved (NC) in any significant sense, other thanpreserving the purely syntactical form of the principle, and neitherintuitionism nor paraconsistency plus the abandonment of Contractionwill offer an advantage over the untyped solutions of Zermelo, vonNeumann, or Quine. (Further discussion can be found in Meyer, Routleyand Dunn 1979; Irvine 1992; Priest 2006, ch. 18; Weber 2010, 2012, andin the entries onCurry’s paradox (sec. 2.2) andparaconsistent logic (sec. 2.3).)
Russell’s paradox was not the only paradox that troubled Russelland, hence, not the only motivation for the type restrictions onefinds inPrincipia Mathematica. In his earlier work,ThePrinciples of Mathematics, Russell devotes a chapter to“the Contradiction” (Russell’s paradox), presentingit in several forms and dismissing several non-starter responses. Hethen signals that he will “shortly” discuss the doctrineof types. This doesn’t happen for several hundred pages, untilhe reaches the very end of the book, in Appendix B! There Russellpresents an incipient, simple theory of types, not the theory of typeswe find inPrincipia Mathematica. Why was the later theoryneeded? The reason is that in Appendix B Russell also presents anotherparadox which he thinks cannot be resolved by means of the simpletheory of types. This paradox concerns propositions, not classes, andit, together with the semantic paradoxes, led Russell to formulate hisramified version of the theory of types. For more on Russell’sdiscovery of the propositional paradox in 1902 see the primary sourcesand Urquhart’s introduction in (Russell 1994b). See also (Church1978) and (Deutsch 2022).
The propositional version of the paradox did not figure prominently inthe subsequent development of logic and set theory, but it sorelypuzzled Russell. For one thing, it seems to contradict Cantor’stheorem. Russell writes: “We cannot admit that there are moreranges [classes of propositions] than propositions” (1903, 527).The reason is that there seem to be easy, one to one correlationsbetween classes of propositions and propositions. For example, theclass \(m\) of propositions can be correlated with the propositionthat every proposition in \(m\) is true. This, together with afine-grained principle of individuation for propositions (asserting,for one thing, that if the classes \(m\) and \(n\) of propositionsdiffer, then any proposition about \(m\) will differ from anyproposition about \(n)\) leads to contradiction.
For some time, there was relatively little discussion of this paradox,although it played a key role in the development of Church’slogic of sense and denotation. While there are several set theories tochoose from, there is only one well-developed formal theory ofRussellian propositions, although such propositions are central to theviews of Millians and direct-reference theorists. One would think thatsuch a theory would be required for the foundations of semantics, ifnot for the foundations of mathematics. Thus, while one ofRussell’s paradoxes has led to the fruitful development of thefoundations of mathematics, there is more to be done in response tohis “other” paradox in the foundations of semantics. To besure, Church (1974a) and Anderson (1989) have attempted to develop aRussellian intensional logic based on the ramified theory of types.(See the entry onAlonzo Church.) But an argument can be made that the ramified theory is toorestrictive to serve as a foundation for the semantics of naturallanguage. There have also been some recent attempts to obtain thebeginnings of a Russellian intensional logic based on untyped settheories (Cantini 2004; Deutsch 2014). And there is now a growth ofinterest in a related topic, that of “structureprinciples” in metaphysics. See (Bacon 2023) and the citationstherein.
It is also worth noting that a number of seemingly purelyset-theoretical principles are actually (applied) instances oftheorems of pure logic (i.e., of first-order quantification theorywith identity). There is a (partial) list of these in Kalish,Montague, and Mar 2000. As mentioned earlier, Russell’s paradoxis an instance of T269 in this list:
\[\tag{T269} {\sim}\exists y \forall x (Fxy \equiv{\sim} Fxx).\]Reading the dyadic predicate letter “\(F\)” as “is amember of”, this says that it is not the case that there is a\(y\) such that for any \(x, x\) is a member of \(y\) if and only if\(x\) is not a member of \(x\). Does this mean that Russell’sparadox reduces to T269?
Certainly the proof of T269 distills the essence of Russell’sargument, its pattern of reasoning. But that pattern also underwritesan endless list of seemingly frivolous “paradoxes” such asthe famous paradox of the village barber who shaves all and only thosevillagers who do not shave themselves or, similarly, the paradox ofthe benevolent but efficient God who helps all and only those who donot help themselves.
How do these “pseudo paradoxes,” as they are sometimescalled, differ, if at all, from Russell’s paradox? The patternof reasoning is the same and the conclusion – that there is nosuch Barber, no such efficient God, no such set of non-self-memberedsets – is the same: such things simply don’t exist.(However, as von Neumann showed, it is not necessary to go quite thisfar. Von Neumann’s method instructs us not that such things as\(R\) do not exist, but just that we cannot say much about them,inasmuch as \(R\) and the like cannot fall into the extension of anypredicate that qualifies as a class.)
The standard answer to this question is that the difference lies inthe subject matter. Quine asks, “why does it [Russell’sparadox] count as an antinomy and the barber paradox not?”; andhe answers, “The reason is that there has been in our habits ofthought an overwhelming presumption of there being such a class but nopresumption of there being such a barber” (1966, 14). Even so,psychological talk of “habits of thought” is notilluminating. More to the point, Russell’s paradox sensiblygives rise to the question of what sets there are; but it is nonsenseto wonder, on such grounds as T269, what barbers or Gods thereare!
This verdict, however, is not quite fair to fans of the Barber or ofT269 generally. They will insist that the question raised by T269 isnot what barbers or Gods there are, but rather what non-paradoxicalobjects there are. This question is virtually the same as that raisedby Russell’s paradox itself. Thus, from this perspective, therelation between the Barber and Russell’s paradox is much closerthan many (following Quine) have been willing to allow (Salmon 2013,Kripke 2014).
Note that there is a first-order logical formula that bears the samerelation to the principle about the \(R_B\)’s that T269 bears toRussell’s paradox. It is the following:
\[\tag{T273} \forall z \forall y (\forall x [Fxy \equiv(Fxz \wedge{\sim} Fxx)] \supset{\sim} Fyz).\](We have taken the liberty of extending the numbering used in Kalish,Montague and Mar 2000 to T273.)
Of particular interest is the logical status of the principle called“Cantor’s Diagonal Lemma”. Recall (CL) fromabove:
As stated earlier, an immediate corollary is that there is no functionfrom domain \(X\) onto the powerset of \(X\) and this latter factunderpins Cantor’s theorem itself. In fact, it alone might welldeserve to be called “Cantor’s theorem.” Then theprinciple that is more usually called Cantor’s theorem—that the cardinality of \(X\) is always strictly less than thecardinality of the powerset of \(X\) —follows as an easycorollary. Yet there is indeed a formula of pure logic that distillsthe essence of (CL) It is the following:
\[{\sim}\exists y \forall x (x \in f(y) \equiv x \not\in f(x)).\]Substituting an arbitrary binary relation symbol \(P\) for epsilon, wehave:
\[{\sim}\exists y \forall x (P(x, f(y)) \equiv{\sim}P(x, f(x)).\]But this formula is merely a substitution instance of T269! There is,therefore, a very close purely logical relationship betweenRussell’s paradox in the form of T269 and Cantor’s theoremin the form of Cantor’s lemma, (CL). The latter is a simplegeneralization of the former obtained by substituting a functionletter \(f\) for the variables at certain points in T269. Similarly,one obtains T269 from the formula above by taking \(f\) to be theidentity function. As we have seen, Russell was led to his paradox bycontemplating Cantor’s theorem as applied to the universe \(V\).We now see that from a purely logical point of view, the two are evenmore closely related than anyone might have imagined.
But not all set-theoretic paradoxes are similarly related tofirst-order logical theorems. The Burali-Forti paradox is an example,since the notion of a well-ordering is not elementary; that is, it isnot first-order definable.
Russell’s paradox has never been passé, but recentlythere has been an explosion of interest in it by scholars involved inresearch in mathematical logic and in philosophical and historicalstudies of modern logic. Kripke (2014) argues that Gödel’s first incompleteness theorem is a pure-logical consequence of the inconsistency of (NC), fromwhich he proves non-constructively, i.e. without producing an example,that there is a true and unprovable sentence in formal systems ofarithmetic. No particular paradox is used in this argument. Kripkethen goes on to show how use of paradox in addition to pure logic canyield a constructive proof of incompleteness. Kripke uses theheterological paradox to get something like “Unprovable ofitself is unprovable of itself.” From this point of view, theGödel sentence makes an intelligible assertion that can bestated. Kripke takes the heterological paradox to be a version ofRussell’s paradox with membership replaced with satisfaction.Both are instances of (T269).
Moreover, a glance at the contents of the 2004 volumeOne HundredYears of Russell’s Paradox shows prominent mathematical andphilosophical logicians and historians of logic poring over theparadox, proposing new ways back into Cantor’s paradise, orother ways of resolving the issue. Further investigations includeradically new ways out of the dilemma posed by the paradox, such asplural quantification. (See Boolos 1984, McGee and Rayo 2000, Spencer2012, and Pruss and Rasmussen 2015). There are also new studies of thetheories of types (simple and ramified, and extensions thereof), newinterpretations of Russell’s paradox, and constructive theories,of Russell’s paradox of propositions and of his own attempt atan untyped theory (the substitution theory), and so forth. Morerecently, it has even been debated whether the connection of Cantorianarguments to Russell’s paradox casts doubt on Cantor’sparadise. See Whittle 2015 for the claim that it does and the responsein McGee 2015.
All of this reminds us that fruitful work can arise from the mostunlikely of observations. As Dana Scott has put it, “It is to beunderstood from the start that Russell’s paradox isnotto be regarded as a disaster. It and the related paradoxes show thatthe naïve notion of all-inclusive collections is untenable. Thatis an interesting result, no doubt about it” (1974,207).
How to cite this entry. Preview the PDF version of this entry at theFriends of the SEP Society. Look up topics and thinkers related to this entry at the Internet Philosophy Ontology Project (InPhO). Enhanced bibliography for this entryatPhilPapers, with links to its database.
Cantor, Georg |Frege, Gottlob |Frege, Gottlob: theorem and foundations for arithmetic | peano |Principia Mathematica |Quine, Willard Van Orman: New Foundations |Russell, Bertrand |self-reference |set theory: alternative axiomatic theories |set theory: early development |type theoryWhitehead, Alfred North set theories | Alonzo Church |
We’d like to thank the following people, who provided feedbackon various iterations of this entry: Haim Gaifman; Gary Ostertag fordiscussion of Russell; Carl Posy for discussion of Brouwer; DavidStump and Gerhard Heinzmann for discussion of Poincaré and helpwith our translation; and Tony Anderson for encouraging our approachto Cantor’s theorem via (CL). We'd like to also thank thefollowing people, who provided helpful comments on earlier versions ofthis entry: Ken Blackwell, Fred Kroon, Paolo Mancosu, Chris Menzel,Jim Robinson, Fred Spiessens, Richard Zach, and several anonymousreferees. Part of this entry was written with the support ofInstituto Investigaciones Filosoficas (UNAM), PAPIIT IG400422.
View this site from another server:
The Stanford Encyclopedia of Philosophy iscopyright © 2025 byThe Metaphysics Research Lab, Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054