Stanford Encyclopedia of Philosophy

Alternative Axiomatic Set Theories

First published Tue May 30, 2006; substantive revision Tue Sep 21, 2021

By “alternative set theories” we mean systems of set theory differing significantly from the dominant ZF (Zermelo-Fraenkel set theory) and its close relatives (though we will review these systems in the article). Among the systems we will review are typed theories of sets, Zermelo set theory and its variations, New Foundations and related systems, positive set theories, and constructive set theories. An interest in the range of alternative set theories does not presuppose an interest in replacing the dominant set theory with one of the alternatives; acquainting ourselves with foundations of mathematics formulated in terms of an alternative system can be instructive as showing us what any set theory (including the usual one) is supposed to do for us. The study of alternative set theories can dispel a facile identification of “set theory” with “Zermelo-Fraenkel set theory”; they are not the same thing.


1. Why Set Theory?

Why do we do set theory in the first place? The most immediately familiar objects of mathematics which might seem to be sets are geometric figures: but the view that these are best understood as sets of points is a modern view. Classical Greeks, while certainly aware of the formal possibility of viewing geometric figures as sets of points, rejected this view because of their insistence on rejecting the actual infinite. Even an early modern thinker like Spinoza could comment that it is obvious that a line is not a collection of points (whereas for us it may be hard to see what else it could be; Ethics, I.15, scholium IV, 96).

Cantor’s set theory (which we will not address directly here as it was not formalized) arose out of an analysis of complicated subcollections of the real line defined using tools of what we would now call topology (Cantor 1872). A better advertisement for the usefulness of set theory for foundations of mathematics (or at least one easier to understand for the layman) is Dedekind’s definition of real numbers using “cuts” in the rational numbers (Dedekind 1872) and the definition of the natural numbers as sets due to Frege and Russell (Frege 1884).

Most of us agree on what the theories of natural numbers, real numbers, and Euclidean space ought to look like (though constructivist mathematicians will have differences with classical mathematics even here). There was at least initially less agreement as to what a theory of sets ought to look like (or even whether there ought to be a theory of sets). The confidence of at least some mathematicians in their understanding of this subject (or in its coherence as a subject at all) was shaken by the discovery of paradoxes in “naive” set theory around the beginning of the twentieth century. A number of alternative approaches were considered then and later, but a single theory, the Zermelo-Fraenkel theory with the Axiom of Choice (ZFC), dominates the field in practice. One of the strengths of the Zermelo-Fraenkel set theory is that it comes with an image of what the world of set theory is (just as most of us have a common notion of what the natural numbers, the real numbers, and Euclidean space are like): this image is what is called the “cumulative hierarchy” of sets.

1.1 The Dedekind construction of the reals

In the nineteenth century, analysis (the theory of the real numbers) needed to be put on a firm logical footing. Dedekind’s definition of the reals (Dedekind 1872) was a tool for this purpose.

Suppose that the rational numbers are understood (this is of course a major assumption, but certainly the rationals are more easily understood than the reals).

Dedekind proposed that the real numbers could be uniquely correlated with cuts in the rationals, where a cut was determined by a pair of sets \((L, R)\) with the following properties: \(L\) and \(R\) are sets of rationals. \(L\) and \(R\) are both nonempty and every element of \(L\) is less than every element of \(R\) (so the two sets are disjoint). \(L\) has no greatest element. The union of \(L\) and \(R\) contains all rationals.

If we understand the theory of the reals prior to the cuts, we can say that each cut is of the form \(L = (-\infty, r) \cap \mathbf{Q}\), \(R = [r, \infty) \cap \mathbf{Q}\), where \(\mathbf{Q}\) is the set of all rationals and \(r\) is a unique real number uniquely determining and uniquely determined by the cut. It is obvious that each real number \(r\) uniquely determines a cut in this way (but we need to show that there are no other cuts). Given an arbitrary cut \((L, R)\), we propose that \(r\) will be the least upper bound of \(L\). The Least Upper Bound Axiom of the usual theory of the reals tells us that \(L\) has a least upper bound (\(L\) is nonempty and any element of \(R\) (which is also nonempty) is an upper bound of \(L\), so \(L\) has a least upper bound). Because \(L\) has no greatest element, its least upper bound \(r\) cannot belong to \(L\). Any rational number less than \(r\) is easily shown to belong to \(L\) and any rational number greater than or equal to \(r\) is easily shown to belong to \(R\), so we see that the cut we chose arbitrarily (and so any cut) is of the form \(L = (-\infty, r) \cap \mathbf{Q}\), \(R = [r, \infty) \cap \mathbf{Q}\).

A bolder move (given a theory of the rationals but no prior theory of the reals) is to define the real numbers as cuts. Notice that this requires us to have not only a theory of the rational numbers (not difficult to develop) but also a theory of sets of rational numbers: if we are to understand a real number to be identified with a cut in the rational numbers, where a cut is a pair of sets of rational numbers, we do need to understand what a set of rational numbers is. If we are to demonstrate the existence of particular real numbers, we need to have some idea what sets of rational numbers there are.

An example: when we have defined the rationals, and then defined the reals as the collection of Dedekind cuts, how do we define the square root of 2? It is reasonably straightforward to show that \((\{x \in \mathbf{Q} \mid x \lt 0 \vee x^2 \lt 2\}, \{x \in \mathbf{Q} \mid x \gt 0 \amp x^2 \ge 2\})\) is a cut and (once we define arithmetic operations) that it is the positive square root of two. When we formulate this definition, we appear to presuppose that any property of rational numbers determines a set containing just those rational numbers that have that property.
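The cut for the square root of 2 can be explored with exact rational arithmetic. The following is a minimal sketch, not part of Dedekind's presentation: the function names are illustrative assumptions, and the formula used in `larger_in_lower` is just one convenient way to witness that \(L\) has no greatest element.

```python
from fractions import Fraction


def in_lower(x: Fraction) -> bool:
    """Membership in L = {x in Q | x < 0 or x^2 < 2}."""
    return x < 0 or x * x < 2


def in_upper(x: Fraction) -> bool:
    """Membership in R = {x in Q | x > 0 and x^2 >= 2}."""
    return x > 0 and x * x >= 2


def larger_in_lower(x: Fraction) -> Fraction:
    """Given x in L, return a strictly larger rational that is still in L."""
    if x < 1:
        return Fraction(1)  # 1 is in L because 1^2 < 2
    # For x >= 1 with x^2 < 2, put y = (2x + 2) / (x + 2). Then
    # y - x = (2 - x^2) / (x + 2) > 0 and y^2 - 2 = 2(x^2 - 2) / (x + 2)^2 < 0.
    return (2 * x + 2) / (x + 2)
```

Every rational falls in exactly one of the two sets, and repeatedly applying `larger_in_lower` produces an increasing sequence inside \(L\) that never reaches a maximum.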

1.2 The Frege-Russell definition of the natural numbers

Frege (1884) and Russell (1903) suggested that the simpler concept “natural number” also admits analysis in terms of sets. The simplest application of natural numbers is to count finite sets. We are all familiar with finite collections with 1, 2, 3, … elements. Additional sophistication may acquaint us with the empty set with 0 elements.

Now consider the number 3. It is associated with a particular property of finite sets: having three elements. With that property it may be argued that we may naturally associate an object, the collection of all sets with three elements. It seems reasonable to identify this set as the number 3. This definition might seem circular (3 is the set of all sets with 3 elements?) but can actually be put on a firm, non-circular footing.

Define 0 as the set whose only element is the empty set. Let \(A\) be any set; define \(A + 1\) as the collection of all sets \(a \cup \{x\}\) where \(a \in A\) and \(x \not\in a\) (all sets obtained by adding a new element to an element of \(A\)). Then \(0 + 1\) is clearly the set we want to understand as 1, \(1 + 1\) is the set we want to understand as 2, \(2 + 1\) is the set we want to understand as 3, and so forth.

We can go further and define the set \(\mathbf{N}\) of natural numbers. 0 is a natural number and if \(A\) is a natural number, so is \(A + 1\). If a set \(S\) contains 0 and is closed under successor, it will contain all natural numbers (this is one form of the principle of mathematical induction). Define \(\mathbf{N}\) as the intersection of all sets \(I\) which contain 0 and contain \(A + 1\) whenever \(A\) is in \(I\) and \(A + 1\) exists. One might doubt that there is any inductive set, but consider the set \(V\) of all \(x\) such that \(x = x\) (the universe). There is a formal possibility that \(V\) itself is finite, in which case there would be a last natural number \(\{V\}\); one usually assumes an Axiom of Infinity to rule out such possibilities.
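The successor construction, and what happens when the universe is finite, can be seen in a toy model. The sketch below assumes a hypothetical universe of just three atoms and only considers sets of atoms, which is a drastic simplification of the universe \(V\) discussed above; the names `ATOMS`, `ZERO` and `succ` are illustrative.

```python
ATOMS = frozenset({"a", "b", "c"})  # assumed three-object universe

ZERO = frozenset({frozenset()})     # 0 is the set whose only element is the empty set


def succ(A):
    """A + 1: all sets a | {x} with a in A and x an object not in a."""
    return frozenset(a | {x} for a in A for x in ATOMS if x not in a)


ONE = succ(ZERO)    # all one-element sets of atoms
TWO = succ(ONE)     # all two-element sets of atoms
THREE = succ(TWO)   # the single three-element set, i.e. {ATOMS}
FOUR = succ(THREE)  # empty: no new object is left to add
```

In this toy universe `THREE` plays the role of the last natural number \(\{V\}\), and the next successor collapses to the empty set, which is the kind of degenerate behaviour an Axiom of Infinity is meant to exclude.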

2. Naive Set Theory

In the previous section, we took a completely intuitive approach to our applications of set theory. We assumed that the reader would go along with certain ideas of what sets are like.

What are the identity conditions on sets? It seems entirely in accord with common sense to stipulate that a set is precisely determined by its elements: two sets \(A\) and \(B\) are the same if for every \(x\), either \(x \in A\) and \(x \in B\) or \(x \not\in A\) and \(x \not\in B\):

\[ A = B \leftrightarrow \forall x(x \in A \leftrightarrow x \in B) \]

This is called the axiom of extensionality.

It also seems reasonable to suppose that there are things which are not sets, but which are capable of being members of sets (such objects are often called atoms or urelements). These objects will have no elements (like the empty set) but will be distinct from one another and from the empty set. This suggests the alternative weaker axiom of extensionality (perhaps actually closer to common sense),

\[ [\textrm{set}(A) \amp \textrm{set}(B) \amp \forall x(x \in A \leftrightarrow x \in B)] \rightarrow A = B \]

with an accompanying axiom of sethood

\[ x \in A \rightarrow \textrm{ set}(A) \]

What sets are there? The simplest collections are given by enumeration (the set {Tom, Dick, Harry} of men I see over there, or (more abstractly) the set \(\{-2, 2\}\) of square roots of 4). But even for finite sets it is often more convenient to give a defining property for elements of the set: consider the set of all grandmothers who have a legal address in Boise, Idaho; this is a finite collection but it is inconvenient to list its members. The general idea is that for any property \(P\), there is a set of all objects with property \(P\). This can be formalized as follows: For any formula \(P(x)\), there is a set \(A\) (the variable \(A\) should not be free in \(P(x)\)) such that

\[ \forall x(x \in A \leftrightarrow P(x)). \]

This is called the axiom of comprehension. If we have weak extensionality and a sethood predicate, we might want to say

\[ \exists A(\textrm{set}(A) \amp \forall x(x \in A \leftrightarrow P(x))) \]

The theory with these two axioms of extensionality and comprehension (usually without sethood predicates) is called naive set theory.

It is clear that comprehension allows the definition of finite sets: our set of men {Tom, Dick, Harry} can also be written \(\{x \mid x = \textit{Tom} \lor x = \textit{Dick} \lor x = \textit{Harry}\}\). It also appears to allow for the definition of infinite sets, such as the set \(\{x \in \mathbf{Q} \mid x \lt 0 \lor x^2 \lt 2\}\) mentioned above in our definition of the square root of 2.

Unfortunately, naive set theory is inconsistent. Russell gave the most convincing proof of this, although his was not the first paradox to be discovered: let \(P(x)\) be the property \(x \not\in x\). By the axiom of comprehension, there is a set \(R\) such that for any \(x\), \(x \in R\) iff \(x \not\in x\). But it follows immediately that \(R \in R\) iff \(R \not\in R\), which is a contradiction.
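Russell's argument does not depend on what membership "really" is: no binary relation whatsoever on any domain admits an element playing the role of \(R\). The following sketch checks this by brute force on small finite domains; treating a "set theory" as a finite domain with an arbitrary relation `E` is an assumption made purely for illustration, and the function names are hypothetical.

```python
from itertools import product


def has_russell_element(domain, E):
    """True if some r satisfies: for every x, (x E r) iff not (x E x)."""
    return any(
        all(((x, r) in E) == ((x, x) not in E) for x in domain)
        for r in domain
    )


def structures_with_russell_element(n):
    """Count the membership relations on {0, ..., n-1} that admit a Russell element."""
    domain = range(n)
    pairs = [(x, y) for x in domain for y in domain]
    count = 0
    for bits in product([False, True], repeat=len(pairs)):
        E = {p for p, b in zip(pairs, bits) if b}  # one candidate membership relation
        if has_russell_element(domain, E):
            count += 1
    return count
```

The count is zero for every domain size, because taking \(x = r\) in the defining condition always yields the contradiction described above.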

It must be noted that our formalization of naive set theory is an anachronism. Cantor did not fully formalize his set theory, so it cannot be determined whether his system falls afoul of the paradoxes (he did not think so, and there are some who agree with him now). Frege formalized his system more explicitly, but his system was not precisely a set theory in the modern sense: the most that can be said is that his system is inconsistent, for basically the reason given here, and a full account of the differences between Frege’s system and our “naive set theory” is beside the point (though historically certainly interesting).

2.1 The other paradoxes of naive set theory

Two other paradoxes of naive set theory are usually mentioned, the paradox of Burali-Forti (1897), which has historical precedence, and the paradox of Cantor. To review these other paradoxes is a convenient way to review as well what the early set theorists were up to, so we will do it. Our formal presentation of these paradoxes is anachronistic; we are interested in their mathematical content, but not necessarily in the exact way that they were originally presented.

Cantor in his theory of sets was concerned with defining notions of infinite cardinal number and infinite ordinal number. Consideration of the largest ordinal number gave rise to the Burali-Forti paradox, and consideration of the largest cardinal number gave rise to the Cantor paradox.

Infinite ordinals can be presented in naive set theory as isomorphism classes of well-orderings (a well-ordering is a linear order \(\le\) with the property that any nonempty subset of its domain has a \(\le\)-least element). We use reflexive, antisymmetric, transitive relations \(\le\) as our linear orders rather than the associated irreflexive, asymmetric, transitive relations \(\lt\), because this allows us to distinguish between the ordinal numbers 0 and 1 (Russell and Whitehead took the latter approach and were unable to define an ordinal number 1 in their Principia Mathematica).

There is a natural order on ordinal numbers (induced by the fact that of any two well-orderings, at least one will be isomorphic to an initial segment of the other) and it is straightforward to show that it is a well-ordering. Since it is a well-ordering, it belongs to an isomorphism class (an ordinal number!) \(\Omega\).

It is also straightforward to show that the order type of the natural order on the ordinals restricted to the ordinals less than \(\alpha\) is \(\alpha\): the order on \(\{0, 1, 2\}\) is of order type 3, the order on the finite ordinals \(\{0, 1, 2, \ldots \}\) is the first infinite ordinal \(\omega\), and so forth.

But then the order type of the ordinals \(\lt \Omega\) is \(\Omega\) itself, which means that the order type of all the ordinals (including \(\Omega\)) is “greater”; but \(\Omega\) was defined as the order type of all the ordinals and should not be greater than itself!

This paradox was presented first (Cantor was aware of it) and Cantor did not think that it invalidated his system.

Cantor defined two sets as having the same cardinal number if there was a bijection between them. This is of course simply common sense in the finite realm; his originality lay in extending it to the infinite realm and refusing to shy from the apparently paradoxical results. In the infinite realm, cardinal and ordinal number are not isomorphic notions as they are in the finite realm: a well-ordering of order type \(\omega\) (say, the usual order on the natural numbers) and a well-ordering of order type \(\omega + \omega\) (say, the order on the natural numbers which puts all odd numbers before all even numbers and puts the sets of odd and even numbers in their usual order) represent different ordinal numbers but their fields (being the same set!) are certainly of the same size. Such “paradoxes” as the apparent equinumerousness of the natural numbers and the perfect squares (noted by Galileo) and the one-to-one correspondence between the points on concentric circles of different radii, noted since the Middle Ages, were viewed as matter-of-fact evidence for equinumerousness of particular infinite sets by Cantor.
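The order of type \(\omega + \omega\) on the natural numbers mentioned above can be written down as an explicit comparison. This is a small sketch with hypothetical function names; a finite sample can only hint at the order type, since the interesting feature (0 having infinitely many predecessors) concerns the whole infinite set.

```python
def precedes(m: int, n: int) -> bool:
    """Strict order of type omega + omega: all odd numbers first, then all even numbers."""
    m_even, n_even = m % 2 == 0, n % 2 == 0
    if m_even != n_even:
        return n_even  # the odd number comes first
    return m < n       # same parity: usual order


def position_key(n: int):
    """Sort key inducing the same order as `precedes`."""
    return (n % 2 == 0, n)
```

For example, `sorted(range(8), key=position_key)` lists the odd numbers 1, 3, 5, 7 and then the even numbers 0, 2, 4, 6. In the full order every odd number precedes 0, whereas in the usual order 0 has no predecessors; the underlying set is the same in both cases.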

Novel with Cantor was the demonstration (1872) that there are infinite sets of different sizes according to this criterion. Cantor’s paradox, for which an original reference is difficult to find, is an immediate corollary of this result. If \(A\) is a set, define the power set of \(A\) as the set of all subsets of \(A\): \(\wp(A) = \{B \mid \forall x(x \in B \rightarrow x \in A)\}\). Cantor proved that there can be no bijection between \(A\) and \(\wp(A)\) for any set \(A\). Suppose that \(f\) is a bijection from \(A\) to \(\wp(A)\). Define \(C\) as \(\{a \in A \mid a \not\in f(a)\}\). Because \(f\) is a bijection there must be \(c\) such that \(f(c) = C\). Now we notice that \(c \in C \leftrightarrow c \not\in f(c) = C\), which is a contradiction.
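The diagonal argument can be checked exhaustively on a small finite set. The sketch below enumerates every function \(f\) from a three-element set \(A\) to \(\wp(A)\) and confirms that the diagonal set \(C\) is never a value of \(f\); the function names are illustrative, and the finite case is of course only a sanity check of an argument that works for arbitrary sets.

```python
from itertools import combinations, product


def powerset(A):
    """All subsets of A, as a list of frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def diagonal(A, f):
    """C = {a in A | a not in f(a)} for a function f given as a dict."""
    return frozenset(a for a in A if a not in f[a])


def diagonal_always_missed(A):
    """True if, for every f: A -> P(A), the diagonal set is outside the image of f."""
    elements = sorted(A)
    for images in product(powerset(A), repeat=len(elements)):
        f = dict(zip(elements, images))
        if diagonal(A, f) in images:
            return False
    return True
```

Since the diagonal set is always missed, no such \(f\) is onto \(\wp(A)\), and in particular none is a bijection.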

Cantor’s theorem just proved shows that for any set \(A\), there is a set \(\wp(A)\) which is larger. Cantor’s paradox arises if we try to apply Cantor’s theorem to the set of all sets (or to the universal set, if we suppose (with common sense) that not all objects are sets). If \(V\) is the universal set, then \(\wp(V)\), the power set of the universal set (the set of all sets) must have larger cardinality than \(V\). But clearly no set can be larger in cardinality than the set which contains everything!

Cantor’s response to both of these paradoxes was telling (and can be formalized in ZFC or in the related systems which admit proper classes, as we will see below). He essentially reinvoked the classical objections to infinite sets on a higher level. Both the largest cardinal and the largest ordinal arise from considering the very largest collections (such as the universe \(V\)). Cantor drew a distinction between legitimate mathematical infinities such as the countable infinity of the natural numbers (with its associated cardinal number \(\aleph_0\) and many ordinal numbers \(\omega, \omega + 1, \ldots, \omega + \omega, \ldots\)), the larger infinity of the continuum, and further infinities derived from these, which he called transfinite, and what he called the Absolute Infinite, the infinity of the collection containing everything and of such related notions as the largest cardinal and the largest ordinal. In this he followed St. Augustine (De Civitate Dei) who argued in late classical times that the infinite collection of natural numbers certainly existed as an actual infinity because God was aware of each and every natural number, but because God’s knowledge encompassed all the natural numbers their totality was somehow finite in His sight. The fact that his defense of set theory against the Burali-Forti and Cantor paradoxes was subsequently successfully formalized in ZFC and the related class systems leads some to believe that Cantor’s own set theory was not implicated in the paradoxes.

3. Typed Theories

An early response to the paradoxes of set theory (by Russell, who discovered one of them) was the development of type theory (see the appendix to Russell’s The Principles of Mathematics (1903) or Whitehead & Russell’s Principia Mathematica (1910–1913)).

The simplest theory of this kind, which we call TST (Théorie Simple des Types, from the French, following Forster and others) is obtained as follows. We admit sorts of object indexed by the natural numbers (this is purely a typographical convenience; no actual reference to natural numbers is involved). Type 0 is inhabited by “individuals” with no specified structure. Type 1 is inhabited by sets of type 0 objects, and in general type \(n + 1\) is inhabited by sets of type \(n\) objects.

The type system is enforced by the grammar of the language. Atomic sentences are equations or membership statements, and they are only well-formed if they take one of the forms \(x^{n} = y^{n}\) or \(x^{n} \in y^{n+1}\).
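The grammatical restriction on atomic sentences amounts to a simple check on type indices. The sketch below assumes a hypothetical representation of typed variables as name and level pairs; `Var`, `eq_well_formed` and `mem_well_formed` are illustrative names, not standard notation.

```python
from typing import NamedTuple


class Var(NamedTuple):
    name: str
    level: int  # the type index n of the variable


def eq_well_formed(x: Var, y: Var) -> bool:
    """x = y is well-formed only when both variables have the same type."""
    return x.level == y.level


def mem_well_formed(x: Var, y: Var) -> bool:
    """x in y is well-formed only when the type of y is one above the type of x."""
    return y.level == x.level + 1
```

In particular `mem_well_formed(v, v)` is false for every variable `v`, so a sentence of the shape \(x \in x\) is not even expressible in the language.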

The axioms of extensionality of TST take the form

\[ A^{n+1} = B^{n+1} \leftrightarrow \forall x^n (x^n \in A^{n+1} \leftrightarrow x^n \in B^{n+1}); \]

there is a separate axiom for each \(n\).

The axioms of comprehension of TST take the form (for any choice of a type \(n\), a formula \(\phi\), and a variable \(A^{n+1}\) not free in \(\phi\))

\[ \exists A^{n+1}\forall x^n (x^n \in A^{n+1} \leftrightarrow \phi) \]

It is interesting to observe that the axioms of TST are precisely analogous to those of naive set theory.

This is not the original type theory of Russell. Leaving aside Russell’s use of “propositional functions” instead of classes and relations, the system of Principia Mathematica (Whitehead & Russell 1910–1913), hereinafter PM, fails to be a set theory because it has separate types for relations (propositional functions of arity \(\gt 1\)). It was not until Norbert Wiener observed in 1914 that it was possible to define the ordered pair as a set (his definition of \(\langle x, y \rangle\) was not the current \(\{\{x\},\{x, y\}\}\), due to Kuratowski (1921), but \(\{\{\{x\},\varnothing \},\{\{y\}\}\}\)) that it became clear that it is possible to code relation types into set types. Russell frequently said in English that relations could be understood as sets of pairs (or longer tuples) but he had no implementation of this idea (in fact, he defined ordered pairs as relations in PM rather than the now usual reverse!). For a discussion of the history of this simplified type theory, see Wang 1970.
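Both codings of the ordered pair are easy to write down with nested finite sets. The following sketch uses Python `frozenset` values and small integers standing in for objects \(x\) and \(y\); the function names are illustrative. What makes either definition usable as a pair is that distinct pairs receive distinct codes, which the snippet lets one check on a small domain.

```python
def kuratowski(x, y):
    """The pair coded as {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def wiener(x, y):
    """The pair coded as {{{x}, emptyset}, {{y}}}."""
    return frozenset({
        frozenset({frozenset({x}), frozenset()}),
        frozenset({frozenset({y})}),
    })


def injective_on(pair, domain):
    """True if pair(x, y) takes distinct values on distinct (x, y) from domain."""
    codes = {pair(x, y) for x in domain for y in domain}
    return len(codes) == len(domain) ** 2
```

One difference relevant to typed theories is visible in the nesting: in Wiener's coding \(x\) and \(y\) sit at the same depth, three levels below the pair, so both components can be of one type \(n\) with the pair at type \(n + 3\).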

Further, Russell was worried about circularity in definitions of sets (which he believed to be the cause of the paradoxes) to the extent that he did not permit a set of a given type to be defined by a condition which involved quantification over the same type or a higher type. This predicativity restriction weakens the mathematical power of set theory to an extreme degree.

In Russell’s system, the restriction is implemented by characterizing a type not only by the type of its elements but by an additional integer parameter called its “order”. For any object with elements, the order of its type is higher than the order of the type of its elements. Further, the comprehension axiom is restricted so that the condition defining a set of a type of order \(n\) can contain parameters only of types with order \(\le n\) and quantifiers only over types with order \(\lt n\). Russell’s system is further complicated by the fact that it is not a theory of sets, as we noted above, because it also contains relation types (this makes a full account of it here inappropriate). Even if we restrict to types of sets, a simple linear hierarchy of types is not possible if types have order, because each type has “power set” types of each order higher than its own.

We present a typed theory of sets with predicativity restrictions (we have seen this in work of Marcel Crabbé, but it may be older). In this system, the types do not have orders, but Russell’s ramified type theory with orders (complete with relation types) can be interpreted in it (a technical result of which we do not give an account here).

The syntax of predicative TST is the same as that of the original system. The axioms of extensionality are also the same. The axioms of comprehension of predicative TST take the form (for any choice of a type \(n\), a formula \(\phi\), and a variable \(A^{n+1}\) not free in \(\phi\), satisfying the restriction that no parameter of type \(n + 2\) or greater appears in \(\phi\), nor does any quantifier over type \(n + 1\) or higher appear in \(\phi\))

\[ \exists A^{n+1}\forall x^n (x^n \in A^{n+1} \leftrightarrow \phi) \]

Predicative mathematics does not permit unrestricted mathematical induction. In impredicative type theory, we can define 0 and the “successor” \(A^+\) of a set just as we did above in naive set theory (in a given type \(n\)) and then define the set of natural numbers:

\[ \begin{aligned}\mathbf{N}^{n+1} = \{m^n \mid\forall A^{n+1}[[0^n \in A^{n+1} \amp \forall B^n (B^n \in A^{n+1} \rightarrow (B^+)^n \in A^{n+1})] \\ \rightarrow m^n \in A^{n+1}] \} \end{aligned} \]

Russell would object that the set \(\mathbf{N}^{n+1}\) is being “defined” in terms of facts about all sets \(A^{n+1}\): something is a type \(n + 1\) natural number just in case it belongs to all type \(n + 1\) inductive sets. But one of the type \(n + 1\) sets in terms of which it is being “defined” is \(\mathbf{N}^{n+1}\) itself. (Independently of predicativist scruples, one does need an Axiom of Infinity to ensure that all natural numbers exist; this is frequently added to TST, as is the Axiom of Choice.)

For similar reasons, predicative mathematics does not permit the Least Upper Bound Axiom of analysis (the proof of this axiom in a set theoretical implementation of the reals as Dedekind cuts fails for the same kind of reason).

Russell solved these problems in PM by adopting an Axiom of Reducibility which in effect eliminated the predicativity restrictions, but in later comments on PM he advocated abandoning this axiom.

Most mathematicians are not predicativists; in our opinion the best answer to predicativist objections is to deny that comprehension axioms can properly be construed as definitions (though we admit that we seem to find ourselves frequently speaking loosely of \(\phi\) as the condition which “defines” \(\{x \mid \phi \}\)).

It should be noted that it is possible to do a significant amount of mathematics while obeying predicativist scruples. The set of natural numbers cannot be defined in the predicative version of TST, but the set of singletons of natural numbers can be defined and can be used to prove some instances of induction (enough to do quite a bit of elementary mathematics). Similarly, a version of the Dedekind construction of the real numbers can be carried out, in which many important instances of the least upper bound axiom will be provable.

Type theories are still in use, mostly in theoretical computer science, but these are type theories of functions, with complexity similar to or greater than the complexity of the system of PM, and fortunately outside the scope of this study.

4. Zermelo Set Theory and Its Refinements

In this section we discuss the development of the usual set theory ZFC. It did not spring up full-grown like Athena from the head of Zeus!

4.1 Zermelo set theory

The original theory Z of Zermelo (1908) had the following axioms:

Extensionality: Sets with the same elements are equal. (The original version appears to permit non-sets (atoms) which all have no elements, much as in our discussion above under naive set theory.)

Pairing: For any objects \(a\) and \(b\), there is a set \(\{a, b\} = \{x \mid x = a \lor x = b\}\). (The original axiom also provided the empty set and singleton sets.)

Union: For any set \(A\), there is a set \(\cup A = \{x \mid \exists y(x \in y \amp y \in A)\}\). The union of \(A\) contains all the elements of elements of \(A\).

Power Set: For any set \(A\), there is a set \(\wp(A) = \{x \mid \forall y(y \in x \rightarrow y \in A)\}\). The power set of \(A\) is the set of all subsets of \(A\).

Infinity: There is an infinite set. Zermelo’s original formulation asserted the existence of a set containing \(\varnothing\) and closed under the singleton operation: \(\{\varnothing, \{\varnothing\}, \{\{\varnothing\}\}, \ldots \}\). It is now more usual to assert the existence of a set which contains \(\varnothing\) and is closed under the von Neumann successor operation \(x \mapsto x \cup \{x\}\). (Neither of these axioms implies the other in the presence of the other axioms, though they yield theories with the same mathematical strength.)

Separation: For any property \(P(x)\) of objects and any set \(A\), there is a set \(\{x \in A \mid P(x)\}\) which contains all the elements of \(A\) with the property \(P\).

Choice: For every set \(C\) of pairwise disjoint nonempty sets, there is a set whose intersection with each element of \(C\) has exactly one element.
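The two successor operations mentioned under Infinity in the list above generate visibly different sequences of sets from \(\varnothing\). The sketch below builds the first few terms of each with Python `frozenset` values; the helper names are illustrative, and only finitely many terms can be produced this way, whereas the axiom asserts a set containing all of them.

```python
EMPTY = frozenset()


def zermelo_succ(x):
    """Singleton operation: x goes to {x}."""
    return frozenset({x})


def von_neumann_succ(x):
    """Von Neumann successor: x goes to x union {x}."""
    return x | frozenset({x})


def iterate(succ, n):
    """Apply succ to the empty set n times."""
    x = EMPTY
    for _ in range(n):
        x = succ(x)
    return x
```

After \(n\) steps the von Neumann term has exactly \(n\) elements (all the earlier terms), while every Zermelo term after the first step has exactly one element.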

We note that we do not need an axiom asserting the existence of \(\varnothing\) (which is frequently included in axiom lists as it was in Zermelo’s original axiom set): the existence of any object (guaranteed by logic unless we use a free logic) along with separation will do the trick, and even if we use a free logic the set provided by Infinity will serve (the axiom of Infinity can be reframed to say that there is a set which contains all sets with no elements (without presupposing that there are any) and is closed under the desired successor operation).

Every axiom of Zermelo set theory except Choice is an axiom of naive set theory. Zermelo chose enough axioms so that the mathematical applications of set theory could be carried out and restricted the axioms sufficiently that the paradoxes could not apparently be derived.

The most general comprehension axiom of Z is the axiom of Separation. If we try to replicate the Russell paradox by constructing the set \(R' = \{x \in A \mid x \not\in x\}\), we discover that \(R' \in R' \leftrightarrow R' \in A \amp R' \not\in R'\), from which we deduce \(R' \not\in A\). For any set \(A\), we can construct a set which does not belong to it. Another way to put this is that Z proves that there is no universal set: if we had the universal set \(V\), we would have naive comprehension, because we could define \(\{x \mid P(x)\}\) as \(\{x \in V \mid P(x)\}\) for any property \(P(x)\), including the fatal \(x \not\in x\).
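The deduction that \(R' \not\in A\) is again independent of what the membership relation is. As a rough sketch, the code below treats every binary relation `E` on a small finite domain as a candidate membership relation (an assumption made only for illustration) and checks that whenever some element `r` behaves like \(\{x \in A \mid x \not\in x\}\) for an element `a`, that `r` is never a member of `a`. The function names are hypothetical.

```python
from itertools import product


def separation_witnesses(domain, E, a):
    """All r with: for every x, (x E r) iff (x E a and not x E x)."""
    return [
        r for r in domain
        if all(((x, r) in E) == (((x, a) in E) and ((x, x) not in E)) for x in domain)
    ]


def russell_subset_escapes(n):
    """True if, on every relation over {0, ..., n-1}, no witness r is a member of its a."""
    domain = range(n)
    pairs = [(x, y) for x in domain for y in domain]
    for bits in product([False, True], repeat=len(pairs)):
        E = {p for p, b in zip(pairs, bits) if b}
        for a in domain:
            if any((r, a) in E for r in separation_witnesses(domain, E, a)):
                return False
    return True
```

Unlike the naive case, witnesses do exist here (for the empty relation every element qualifies), so Separation is satisfiable; what the check confirms is only that the witness always falls outside \(A\).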

In order to apply the axiom of separation, we need to have some sets \(A\) from which to carve out subsets using properties. The other axioms allow the construction of a lot of sets (all sets needed for classical mathematics outside of set theory, though not all of the sets that even Cantor had constructed with apparent safety).

The elimination of the universal set seems to arouse resistance in some quarters (many of the alternative set theories recover it, and the theories with sets and classes recover at least a universe of all sets). On the other hand, the elimination of the universal set seems to go along with Cantor’s idea that the problem with the paradoxes was that they involved Absolutely Infinite collections: purported “sets” that are too large.

4.2 From Zermelo set theory to ZFC

Zermelo set theory came to be modified in certain ways.

The formulation of the axiom of separation was made explicit: “for each formula \(\phi\) of the first-order language with equality and membership, \(\{x \in A \mid \phi \}\) exists”. Zermelo’s original formulation referred more vaguely to properties in general (and Zermelo himself seems to have objected to the modern formulation as too restrictive).

The non-sets are usually abandoned (so the formulation of Extensionality is stronger) though ZFA (Zermelo-Fraenkel set theory with atoms) was used in the first independence proofs for the Axiom of Choice.

The axiom scheme of Replacement was added by Fraenkel to make it possible to construct larger sets (even \(\aleph_{\omega}\) cannot be proved to exist in Zermelo set theory). The basic idea is that any collection the same size as a set is a set, which can be logically formulated as follows: if \(\phi(x,y)\) is a functional formula [this means \(\forall x\forall y\forall z[(\phi(x,y) \amp \phi(x,z)) \rightarrow y = z]\)] and \(A\) is a set then there is a set \(\{y \mid \exists x \in A(\phi(x,y))\}\).

The Axiom of Foundation was added to support a definite conception of what the universe of sets is like. The idea of the cumulative hierarchy of sets is that we construct sets in a sequence of stages indexed by the ordinals: at stage 0, the empty set is constructed; at stage \(\alpha + 1\), all subsets of the set of stage \(\alpha\) sets are constructed; at a limit stage \(\lambda\), the union of all stages with index less than \(\lambda\) is constructed. Replacement is important for the implementation of this idea, as Z only permits one to construct sets belonging to the stages \(V_n\) and \(V_{\omega +n}\) for \(n\) a natural number (we use the notation \(V_{\alpha}\) for the collection of all sets constructed at stage \(\alpha\)). The intention of the Foundation Axiom is to assert that every set belongs to some \(V_{\alpha}\); the commonest formulation is the mysterious assertion that for any nonempty set \(A\), there is an element \(x\) of \(A\) such that \(x\) is disjoint from \(A\). To see that this is at least implied by the intended reading of Foundation, consider that there must be a smallest \(\alpha\) such that \(A\) meets \(V_{\alpha}\), and any \(x\) in \(A\) belonging to this \(V_{\alpha}\) will have elements (if any) only of smaller rank and so not in \(A\).
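The finite stages of the cumulative hierarchy are small enough to compute directly. The sketch below adopts the common indexing convention \(V_0 = \varnothing\), \(V_{n+1} = \wp(V_n)\) (an assumption about indexing; the prose above describes the stages informally), and also illustrates the disjointness formulation of Foundation by picking an element of least rank. All names are illustrative.

```python
from itertools import combinations


def powerset(s):
    """The set of all subsets of s, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def stage(n):
    """V_n for finite n, with V_0 empty and V_{k+1} the power set of V_k."""
    v = frozenset()
    for _ in range(n):
        v = powerset(v)
    return v


def rank(x):
    """Least n such that x is a subset of V_n."""
    return max((rank(y) + 1 for y in x), default=0)


def minimal_element(A):
    """An element of the nonempty set A that is disjoint from A (one of least rank)."""
    return min(A, key=rank)
```

The stages have 0, 1, 2, 4 and 16 elements for \(n = 0, \ldots, 4\), and the next stage already has \(2^{16}\) elements. An element of least rank in \(A\) cannot share a member with \(A\), since its members have strictly smaller rank.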

Zermelo set theory has difficulties with the cumulative hierarchy. The usual form of the Zermelo axioms (or Zermelo’s original form) does not prove the existence of \(V_{\alpha}\) as a set unless \(\alpha\) is finite. If the Axiom of Infinity is reformulated to assert the existence of \(V_{\omega}\), then the ranks proved to exist as sets by Zermelo set theory are exactly those which appear in the natural model \(V_{\omega +\omega}\) of this theory. Also, Zermelo set theory does not prove the existence of transitive closures of sets, which makes it difficult to assign ranks to sets in general. Zermelo set theory plus the assertion that every set belongs to a rank \(V_{\alpha}\) which is a set implies Foundation, the existence of expected ranks \(V_{\alpha}\) (not the existence of such ranks for all ordinals \(\alpha\) but the existence of such a rank containing each set which can be shown to exist), and the existence of transitive closures, and can be interpreted in Zermelo set theory without additional assumptions.

A reader who wants to examine models ofZermelo set theory which exhibit pathological properties in thisregard can consult Mathias (2001b).

The Axiom of Choice is an object of suspicion to some mathematicians because it is not constructive. It has become customary to indicate when a proof in set theory uses Choice, although most mathematicians accept it as an axiom. The Axiom of Replacement is sometimes replaced with the Axiom of Collection, which asserts, for any formula \(\phi(x,y)\):

\[ \forall x \in A\exists y(\phi(x,y)) \rightarrow \exists C\forall x \in A\exists y \in C(\phi(x,y)) \]

Note that \(\phi\) here does not need to be functional; if for every \(x \in A\), there are some \(y\)s such that \(\phi(x,y)\), there is a set such that for every \(x \in A\), there is \(y\) in that set such that \(\phi(x, y)\). One way to build this set is to take, for each \(x \in A\), all the \(y\)s of minimal rank such that \(\phi(x, y)\) and put them in \(C\). In the presence of all other axioms of ZFC, Replacement and Collection are equivalent; when the axiomatics is perturbed (or when the logic is perturbed, as in intuitionistic set theory) the difference becomes important. The Axiom of Foundation is equivalent to \(\in\)-Induction here but not in other contexts: \(\in\)-Induction is the assertion that for any formula \(\phi\):

\[ \forall x(\forall y \in x(\phi(y)) \rightarrow \phi(x)) \rightarrow \forall x\phi(x) \]

i.e., anything which is true of any set if it is true of all its elements is true of every set without exception.

4.3 Critique of Zermelo set theory

A common criticism of Zermelo set theory is that it is an ad hoc selection of axioms chosen to avoid paradox, and we have no reason to believe that it actually achieves this end. We believe such objections to be unfounded, for two reasons. The first is that the theory of types (which is the result of a principled single modification of naive set theory) is easily shown to be precisely equivalent in consistency strength and expressive power to Z with the restriction that all quantifiers in the formulas \(\phi\) in instances of separation must be bounded in a set; this casts doubt on the idea that the choice of axioms in Z is particularly arbitrary. The fact that the von Neumann-Gödel-Bernays class theory (discussed below) turns out to be a conservative extension of ZFC suggests that full ZFC is a precise formulation of Cantor’s ideas about the Absolute Infinite (and so not arbitrary). Further, the introduction of the Foundation Axiom identifies the set theories of this class as the theories of a particular class of structures (the well-founded sets) of which the Zermelo axioms certainly seem to hold (whether Replacement holds so evidently is another matter).

These theories are frequently extended with large cardinal axioms (the existence of inaccessible cardinals, Mahlo cardinals, weakly compact cardinals, measurable cardinals and so forth). These do not to us signal a new kind of set theory, but represent answers to the question as to how large the universe of Zermelo-style set theory is.

The choice of Zermelo set theory (leaving aside whether one goes on to ZFC) rules out the use of equivalence classes of equinumerous sets as cardinals (and so the use of the Frege natural numbers) or the use of equivalence classes of well-orderings as ordinals. There is no difficulty with the use of the Dedekind cut formulation of the reals (once the rationals have been introduced). Instead of the equivalence class formulations of cardinal and ordinal numbers, the von Neumann ordinals are used: a von Neumann ordinal is a transitive set (all of its elements are among its subsets) which is well-ordered by membership. The order type of a well-ordering is the von Neumann ordinal of the same length (the axiom of Replacement is needed to prove that every set well-ordering has an order type; this can fail to be true in Zermelo set theory, where the von Neumann ordinal \(\omega + \omega\) cannot be proven to exist but there are certainly well-orderings of this and longer order types). The cardinal number \(|A|\) is defined as the smallest order type of a well-ordering of \(A\) (this requires Choice to work; without choice, we can use Foundation to define the cardinal of a set \(A\) as the set of all sets equinumerous with \(A\) and belonging to the first \(V_{\alpha}\) containing sets equinumerous with \(A\)). This is one respect in which Cantor’s ideas do not agree with the modern conception; he appears to have thought that he could define at least cardinal numbers as equivalence classes (or at least that is one way to interpret what he says), although such equivalence classes would of course be Absolutely Infinite.
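The definition of a von Neumann ordinal can be checked mechanically for hereditarily finite sets. The sketch below is only an illustration under stated assumptions: sets are Python `frozenset` values, the helper names (`successor`, `von_neumann`, `is_transitive`, `is_von_neumann_ordinal`) are hypothetical, and because every such finite set is well-founded, "well-ordered by membership" reduces to checking that any two distinct elements are related by membership.

```python
def successor(x):
    """Return x together with x itself as a new element."""
    return x | frozenset([x])

def von_neumann(n):
    """Return the finite von Neumann ordinal n = {0, 1, ..., n-1}."""
    o = frozenset()
    for _ in range(n):
        o = successor(o)
    return o

def is_transitive(x):
    """Every element of x is also a subset of x."""
    return all(y <= x for y in x)

def is_von_neumann_ordinal(x):
    """Transitive, and any two distinct elements are comparable by membership.

    Assumption: x is a hereditarily finite frozenset, so membership on x is
    automatically well-founded and comparability suffices.
    """
    if not is_transitive(x):
        return False
    return all(a == b or a in b or b in a for a in x for b in x)

three = von_neumann(3)
singleton_of_one = frozenset([von_neumann(1)])  # {1}, which is not transitive
```

Here `three` has exactly three elements, namely the ordinals 0, 1 and 2, while `singleton_of_one` fails the test because its element 1 is not a subset of it.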

4.4 Weak variations and theories with hypersets

Some weaker subsystems of ZFC are used. Zermelo set theory, the system Z described above, is still studied. The further restriction of the axiom of separation to formulas in which all quantifiers are bounded in sets (\(\Delta_0\) separation) yields “bounded Zermelo set theory” or “Mac Lane set theory”, so called because it has been advocated as a foundation for mathematics by Saunders Mac Lane (1986). It is interesting to observe that Mac Lane set theory is precisely equivalent in consistency strength and expressive power to TST with the Axiom of Infinity. Z is strictly stronger than Mac Lane set theory; the former theory proves the consistency of the latter. See Mathias 2001a for an extensive discussion.

The set theory KPU (Kripke-Platek set theory with urelements, for which see Barwise 1975) is of interest for technical reasons in model theory. The axioms of KPU are the weak Extensionality which allows urelements, Pairing, Union, \(\Delta_0\) separation, \(\Delta_0\) collection, and \(\in\)-induction for arbitrary formulas. Note the absence of Power Set. The technical advantage of KPU is that all of its constructions are “absolute” in a suitable sense. This makes the theory suitable for the development of an extension of recursion theory to sets.

The dominance of ZFC is nowhere more evident than in the great enthusiasm and sense of a new departure found in reactions to the very slight variation of this kind of set theory embodied in versions of ZFC without the foundation axiom. It should be noted that the Foundation Axiom was not part of the original system!

We describe two theories out of a range of possible theories of hypersets (Zermelo-Fraenkel set theory without foundation). A source for theories of this kind is Aczel 1988.

In the following paragraphs, we will use the term “graph” for a relation, and “extensional graph” for a relation \(R\) satisfying

\[ (\forall y,z \in \textit{field}(R))[\forall x(xRy \equiv xRz) \rightarrow y = z]. \]

A decoration of a graph \(G\) is a function \(f\) with the property that \(f(x) = \{f(y) \mid yGx\}\) for all \(x\) in the field of \(G\). In ZFC, all well-founded relations have unique decorations, and non-well-founded relations have no decorations. Aczel proposed his Anti-Foundation Axiom: every set graph has a unique decoration. Maurice Boffa considered a stronger axiom: every partial, injective decoration of an extensional set graph \(G\) whose domain contains the \(G\)-preimages of all its elements can be extended to an injective decoration of all of \(G\).
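For a finite well-founded graph the unique decoration can be computed by recursion along the relation. The following is a minimal sketch under stated assumptions: the graph is given as a Python set of pairs `(y, x)` read as \(yGx\), the function name `decorate` is hypothetical, and a cycle is reported as an error, reflecting the situation in ZFC, where such graphs have no decoration (under Aczel's axiom they would have one, but it could not be represented by well-founded `frozenset` values).

```python
def decorate(edges):
    """Return the decoration f of a finite well-founded graph.

    edges is a set of pairs (y, x) read as "y G x", and f(x) = {f(y) | y G x}.
    Raises ValueError on a cycle, where no well-founded decoration exists.
    """
    field = {v for e in edges for v in e}
    preds = {x: [y for (y, z) in edges if z == x] for x in field}
    f, in_progress = {}, set()

    def value(x):
        if x in f:
            return f[x]
        if x in in_progress:
            raise ValueError("graph is not well-founded")
        in_progress.add(x)
        f[x] = frozenset(value(y) for y in preds[x])
        in_progress.discard(x)
        return f[x]

    for x in field:
        value(x)
    return f

# A three-node graph whose decoration yields the von Neumann ordinals 0, 1, 2.
f = decorate({("a", "b"), ("a", "c"), ("b", "c")})
```

On an extensional well-founded graph the decoration is injective; on a non-extensional one, distinct nodes with the same predecessors receive the same set.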

The Aczel system is distinct from the Boffa system in having fewer ill-founded objects. For example, the Aczel theory proves that there is just one object which is its own sole element, while the Boffa theory provides a proper class of such objects. The Aczel system has been especially popular, and we ourselves witnessed a great deal of enthusiasm for this subversion of the cumulative hierarchy. We are doubtless not the only ones to point this out, but we did notice and point out to others that at least the Aczel theory has a perfectly obvious analogue of the cumulative hierarchy. If \(A_{\alpha}\) is a rank, the successor rank \(A_{\alpha +1}\) will consist of all those sets which can be associated with graphs \(G\) with a selected point \(t\) with all elements of the field of \(G\) taken from \(A_{\alpha}\). The zero and limit ranks are constructed just as in ZFC. Every set belongs to an \(A_{\alpha}\) for \(\alpha\) less than or equal to the cardinality of its transitive closure. (It seems harder to impose rank on the world of the Boffa theory, though it can be done: the proper class of self-singletons is an obvious difficulty, to begin with!)

It is true (and has been the object of applications in computer science) that it is useful to admit reflexive structures for some purposes. The kind of reflexivity permitted by Aczel’s theory has been useful for some such applications. However, such structures are modelled in well-founded set theory (using relations other than membership) with hardly more difficulty, and the reflexivity admitted by Aczel’s theory (or even by a more liberal theory like that of Boffa) doesn’t come near the kind of non-well-foundedness found in genuinely alternative set theories, especially those with universal set. These theories are close variants of the usual theory ZFC, caused by perturbing the last axiom to be added to this system historically (although, to be fair, the Axiom of Foundation is the one which arguably defines the unique structure which the usual set theory is about; the anti-foundation axioms thus invite us to contemplate different, even if closely related, universal structures).

5. Theories with Classes

5.1 Class theory over ZFC

Even those mathematicians who accepted the Zermelo-style set theories as the standard (most of them!) often found themselves wanting to talk about “all sets”, or “all ordinals”, or similar concepts.

Von Neumann (who actually formulated a theory of functions, not sets), Gödel, and Bernays developed closely related systems which admit, in addition to the sets found in ZFC, general collections of these sets. (In Hallett 1984, it is argued that the system of von Neumann was the first system in which the Axiom of Replacement was implemented correctly [there were technical problems with Fraenkel’s formulation], so it may actually be the first implementation of ZFC.)

We present a theory of this kind. Its objects are classes. Among the classes we identify those which are elements as sets.

Axiom of extensionality: Classes with the same elements are the same.

Definition: A class \(x\) is a set just in case there is a class \(y\) such that \(x \in y\). A class which is not a set is said to be a proper class.

Axiom of class comprehension: For any formula \(\phi(x)\) which involves quantification only over all sets (not over all classes), there is a class \(\{x \mid \phi(x)\}\) which contains exactly those sets \(x\) for which \(\phi(x)\) is true.

The axiom scheme of class comprehension with quantification only over sets admits a finite axiomatization (a finite selection of formulas \(\phi\) (most with parameters) suffices) and was historically first presented in this way. It is an immediate consequence of class comprehension that the Russell class \(\{x \mid x \not\in x\}\) cannot be a set (so there is at least one proper class).

Axiom of limitation of size: A class \(C\) is proper if and only if there is a class bijection between \(C\) and the universe.

This elegant axiom is essentially due to von Neumann. A class bijection is a class of ordered pairs; there might be pathology here if we did not have enough pairs as sets, but other axioms do provide for their existence. It is interesting to observe that this axiom implies Replacement (a class which is the same size as a set cannot be the same size as the universe) and, surprisingly, implies Choice (the von Neumann ordinals make up a proper class essentially by the Burali-Forti paradox, so the universe must be the same size as the class of ordinals, and the class bijection between the universe and the ordinals allows us to define a global well-ordering of the universe, whose existence immediately implies Choice).

Although Class Comprehension and Limitation of Size appear to tell us exactly what classes there are and what sets there are, more axioms are required to make our universe large enough. These can be taken to be the axioms of Z (other than extensionality and choice, which are not needed): the sethood of pairs of sets, unions of sets, power sets of sets, and the existence of an infinite set are enough to give us the world of ZFC. Foundation is usually added. The resulting theory is a conservative extension of ZFC: it proves all the theorems of ZFC about sets, and it does not prove any theorem about sets which is not provable in ZFC. For those with qualms about choice (or about global choice), Limitation of Size can be restricted to merely assert that the image of a set under a class function is a set.

We have two comments about this. First, the mental furniture of set theorists does seem to include proper classes, though usually it is important to them that all talk of proper classes can be explained away (the proper classes are in some sense “virtual”). Second, this theory (especially the version with the strong axiom of Limitation of Size) seems to capture the intuition of Cantor about the Absolute Infinite.

A stronger theory with classes, but still essentially a version of standard set theory, is the Kelley-Morse set theory in which Class Comprehension is strengthened to allow quantification over all classes in the formulas defining classes. Kelley-Morse set theory is not finitely axiomatizable, and it is stronger than ZFC in the sense that it allows a proof of the consistency of ZFC.

5.2 Ackermann set theory

The next theory we present was actually embedded in the set theoretical proposals of Paul Finsler, which were (taken as a whole) incoherent (see the notes on Finsler set theory available in the Other Internet Resources). Ackermann later (and apparently independently) presented it again. It is to all appearances a different theory from the standard one (it is our first genuine “alternative set theory”) but it turns out to be essentially the same theory as ZF (and choice can be added to make it essentially the same as ZFC).

Ackermann set theory is a theory of classes in which some classes are sets, but there is no simple definition of which classes are sets (in fact, the whole power of the theory is that the notion of set is indefinable!)

All objects are classes. The primitive notions are equality, membership and sethood. The axioms are

Axiom of extensionality: Classes with the same elements are equal.

Axiom of class comprehension: For any formula \(\phi\), there is a class \(\{x \in V \mid \phi(x)\}\) whose elements are exactly the sets \(x\) such that \(\phi(x)\) (\(V\) here denotes the class of all sets). [But note that it is not the case here that all elements of classes are sets].

Axiom of elements: Any element of a set is a set.

Axiom of subsets: Any subset of a set is a set.

Axiom of set comprehension: For any formula \(\phi(x)\) which does not mention the sethood predicate and in which all free variables other than \(x\) denote sets, and which further has the property that \(\phi(x)\) is only true of sets \(x\), the class \(\{x \mid \phi \}\) (which exists by Class Comprehension since all suitable \(x\) are sets) is a set.

One can conveniently add axioms of Foundation and Choice to this system.

To see the point (mainly, to understand what Set Comprehension says) it is a good idea to go through some derivations.

The formula \(x = a \lor x = b\) (where \(a\) and \(b\) are sets) does not mention sethood, has only the sets \(a\) and \(b\) as parameters, and is true only of sets. Thus it defines a set, and Pairing is true for sets.

The formula \(\exists y(x \in y \amp y \in a)\), where \(a\) is a set, does not mention sethood, has only the set \(a\) as a parameter, and is true only of sets by the Axiom of Elements (any witness \(y\) belongs to the set \(a\), so \(y\) is a set, and \(x\) belongs to the set \(y\), so \(x\) is a set). Thus Union is true for sets.

The formula \(\forall y(y \in x \rightarrow y \in a)\), where \(a\) is a set, does not mention sethood, has only the set \(a\) as a parameter, and is true only of sets by the Axiom of Subsets. Thus Power Set is true for sets.

The big surprise is that this system proves Infinity. The formula \(x \ne x\) clearly defines a set, the empty set \(\varnothing\). Consider the formula

\[ \forall I\left[\varnothing \in I \amp \forall y(y \in I \rightarrow y\cup \{y\} \in I) \rightarrow x \in I\right] \]

This formula does not mention sethood and has no parameters (or just the set parameter \(\varnothing\)). The class \(V\) of all sets has \(\varnothing\) as a member and contains \(y \cup \{y\}\) if it contains \(y\) by Pairing and Union for sets (already shown). Thus any \(x\) satisfying this formula is a set, whence the extension of the formula is a set (clearly the usual set of von Neumann natural numbers). So Infinity is true in the sets of Ackermann set theory.

It is possible (but harder) to prove Replacement as well in the realm of well-founded sets (which can be the entire universe of sets if Foundation for classes is added as an axiom). It is demonstrable that the theorems of Ackermann set theory about well-founded sets are exactly the theorems of ZF (Lévy 1959; Reinhardt 1970).

We attempt to motivate this theory (in terms of the cumulative hierarchy). Think of classes as collections which merely exist potentially. The sets are those classes which actually get constructed. Extensionality for classes seems unproblematic. All collections of the actual sets could have been constructed by constructing one more stage of the cumulative hierarchy: this justifies class comprehension. Elements of actual sets are actual sets; subcollections of actual sets are actual sets; these do not seem problematic. Finally, we assert that any collection of classes which is defined without reference to the realm of actual sets, which is defined in terms of specific objects which are actual, and which turns out only to contain actual elements is actual. When one gets one’s mind around this last assertion, it can seem reasonable. A particular thing to note about such a definition is that it is “absolute”: the collection of all actual sets is a proper class and not itself an actual set, because we are not committed to stopping the construction of actual sets at any particular point; but the elements of a collection satisfying the conditions of set comprehension do not depend on how many potential collections we make actual (this is why the actuality predicate is not allowed to appear in the “defining” formula).

It may be a minority opinion, but we believe (after some contemplation) that the Ackermann axioms have their own distinctive philosophical motivation which deserves consideration, particularly since it turns out to yield basically the same theory as ZF from an apparently quite different starting point.

Ackermann set theory actually proves that there are classes which have non-set classes as elements; the difference between sets and classes provably cannot be as in von Neumann-Gödel-Bernays class theory. A quick proof of this concerns ordinals. There is a proper class von Neumann ordinal \(\Omega\), the class of all set von Neumann ordinals. We can prove the existence of \(\Omega + 1\) using set comprehension: if \(\Omega\) were the last ordinal, then “\(x\) is a von Neumann ordinal with a successor” would be a predicate not mentioning sethood, with no parameters (so all parameters sets), and true only of sets. But this would make the class of all set ordinals a set, and the class of all set ordinals is \(\Omega\) itself, which would lead to the Burali-Forti paradox. So \(\Omega + 1\) must exist, and is a proper class with the proper class \(\Omega\) as an element.

There is a meta-theorem of ZF called the Reflection Principle which asserts that any first-order assertion which is true of the universe \(V\) is also true of some set. This means that for any particular proof in ZF, there is a set \(M\) which might as well be the universe (because any proof uses only finitely many axioms). A suitable such set \(M\) can be construed as the universe of sets and the actual universe \(V\) can be construed as the universe of classes. The set \(M\) has the closure properties asserted in Elements and Subsets if it is a limit rank; it can be chosen to have as many of the closure properties asserted in Set Comprehension (translated into terms of \(M\)) as a proof in Ackermann set theory requires. This machinery is what is used to show that Ackermann set theory proves nothing about sets that ZF cannot prove: one translates a proof in Ackermann set theory into a proof in ZF using the Reflection Principle.

6. New Foundations and Related Systems

6.1 The definition of NF

We have alluded already to the fact that the simple typed theory of sets TST can be shown to be equivalent to an untyped theory (Mac Lane set theory, aka bounded Zermelo set theory). We briefly indicate how to do this: choose any map \(f\) in the model which is an injection with domain the set of singletons of type 0 objects and range included in type 1 (the identity on singletons of type 0 objects is an example). Identify each type 0 object \(x^0\) with the type 1 object \(f (\{x^0\})\); then introduce exactly those identifications between objects of different types which are required by extensionality: every type 0 object is identified with a type 1 object, and an easy meta-induction shows that every type \(n\) object is identified with some type \(n + 1\) object. The resulting structure will satisfy all the axioms of Zermelo set theory except Separation, and will satisfy all instances of Separation in which each quantifier is bounded in a set (this boundedness comes in because each instance of Comprehension in TST has each quantifier bounded in a type, which becomes a bounding set for that quantifier in the interpretation of Mac Lane set theory). It will satisfy Infinity and Choice if the original model of TST satisfies these axioms. The simplest map \(f\) is just the identity on singletons of type 0 objects, which will have the effect of identifying each type 0 object with its own singleton (a failure of foundation). It can be arranged for the structure to satisfy Foundation: for example, if Choice holds type 0 can be well-ordered and each element of type 0 identified with the corresponding segment in the well-ordering, so that type 0 becomes a von Neumann ordinal. (A structure of this kind will never model Replacement, as there will be a countable sequence of cardinals [the cardinalities of the types] which is definable and cofinal below the cardinality of the universe.) See Mathias 2001a for a full account.

Quine’s set theory New Foundations (abbreviated NF, proposed in 1937 in his paper “New Foundations for Mathematical Logic”), is also based on a procedure for identifying the objects in successive types in order to obtain an untyped theory. However, in the case of NF and related theories, the idea is to identify the entirety of type \(n + 1\) with type \(n\); the type hierarchy is to be collapsed completely. An obvious difficulty with this is that Cantor’s theorem suggests that type \(n + 1\) (being the “power set” of type \(n\)) should be intrinsically larger than type \(n\) (and in some senses this is demonstrably true).

We first outline the reason that Quine believed that it might be possible to collapse the type hierarchy. We recall from above:

We admit sorts of object indexed by the natural numbers (this is purely a typographical convenience; no actual reference to natural numbers is involved). Type 0 is inhabited by “individuals” with no specified structure. Type 1 is inhabited by sets of type 0 objects, and in general type \(n + 1\) is inhabited by sets of type \(n\) objects.

The type system is enforced by the grammar of the language. Atomic sentences are equations or membership statements, and they are only well-formed if they take one of the forms \(x^{n} = y^{n}\) or \(x^{n} \in y^{n+1}\).

The axioms of extensionality of TST take the form

\[ A^{n+1} = B^{n+1} \leftrightarrow \forall x^n (x^n \in A^{n+1} \leftrightarrow x^n \in B^{n+1}); \]

there is a separate axiom for each \(n\).

The axioms of comprehension of TST take the form (for any choice of a type \(n\), a formula \(\phi\), and a variable \(A^{n+1}\) not free in \(\phi\))

\[ \exists A^{n+1}\forall x^n (x^n \in A^{n+1} \leftrightarrow \phi) \]

It is interesting to observe that the axioms of TST are precisely analogous to those of naive set theory.

For any formula \(\phi\), define \(\phi^+\) as the formula obtained by raising every type index on a variable in \(\phi\) by one. Quine observes that any proof of \(\phi\) can be converted into a proof of \(\phi^+\) by raising all type indices in the original proof. Further, every object \(\{x^n \mid \phi \}^{n+1}\) that the theory permits us to define has a precise analogue \(\{x^{n+1} \mid \phi^{+}\}^{n+2}\) in the next higher type; this can be iterated to produce “copies” of any defined object in each higher type.

For example, the Frege definition of the natural numbers works in TST. The number \(3^2\) can be defined as the (type 2) set of all (type 1) sets with three (type 0) elements. The number \(3^3\) can be defined as the (type 3) set of all (type 2) sets with three (type 1) elements. The number \(3^{27}\) can be defined as the (type 27) set of all (type 26) sets with three (type 25) elements. And so forth. Our logic does not even permit us to say that these are a sequence of distinct objects; we cannot ask the question as to whether they are equal or not.
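The type-raising of the Frege number three can be made concrete in a toy model. The sketch below is an illustration only, resting on assumptions not made in the text: type 0 is taken to be a four-element set (in TST with Infinity it would be infinite), each higher type is the full power set of the one below, and the names `power` and `frege_three` are hypothetical.

```python
from itertools import combinations

def power(s):
    """All subsets of s, as frozensets: the next type up in the toy model."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )

def frege_three(lower_type):
    """The set of all three-element sets drawn from the given lower type."""
    return frozenset(frozenset(c) for c in combinations(lower_type, 3))

type0 = frozenset(range(4))   # assumed finite domain of individuals
type1 = power(type0)          # 16 objects of type 1

three_2 = frege_three(type0)  # a type 2 object: sets of three type 0 objects
three_3 = frege_three(type1)  # a type 3 object: sets of three type 1 objects
```

Python is untyped, so nothing stops one from writing `three_2 == three_3`; in TST that equation is simply not a well-formed sentence, which is the point made above.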

Quine suggested, in effect, that we tentatively suppose that \(\phi \equiv \phi^+\) for all \(\phi\); it is not just the case that if we can prove \(\phi\), we can prove \(\phi^+\), but that the truth values of these sentences are the same. It then becomes strongly tempting to identify \(\{x^n \mid \phi \}^{n+1}\) with \(\{x^{n+1} \mid \phi^{+}\}^{n+2}\), since anything we can say about these two objects is the same (and our new assumption implies that we will assign the same truth values to corresponding assertions about these two objects).

The theory NF which we obtain can be described briefly (but deceptively) as being the first-order untyped theory with equality and membership having the same axioms as TST but without the distinctions of type. If this is not read very carefully, it may be seen as implying that we have adopted the comprehension axioms of naive set theory,

\[ \exists A\forall x(x \in A \leftrightarrow \phi) \]

for each formula \(\phi\). But we have not. We have only adopted those axioms for formulas \(\phi\) which can be obtained from formulas of TST by dropping distinctions of type between the variables (without introducing any identifications between variables of different types). For example, there is no way that \(x \not\in x\) can be obtained by dropping distinctions of type from a formula of TST, without identifying two variables of different type. Formulas of the untyped language of set theory in which it is possible to assign a type to each variable (the same type wherever it occurs) in such a way as to get a formula of TST are said to be stratified. The axioms of NF are strong extensionality (no non-sets) and stratified comprehension.
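Whether a formula is stratified depends only on its atomic subformulas, so the condition can be checked mechanically. The following is a minimal sketch under stated assumptions: a formula is represented just by the list of its atoms, each a triple `("in", x, y)` for \(x \in y\) or `("eq", x, y)` for \(x = y\); bound variables are assumed to have been renamed apart, so each name stands for one variable; and the function name `stratify` is hypothetical. It propagates the constraints "equal types" and "one type higher" and reports a clash.

```python
from collections import defaultdict, deque

def stratify(atoms):
    """Try to assign a natural-number type to every variable.

    atoms is a list of triples (rel, x, y), rel being "in" or "eq".
    Returns a dict of types, or None if the formula is not stratified.
    """
    graph = defaultdict(list)
    for rel, x, y in atoms:
        if rel not in ("in", "eq"):
            raise ValueError("unknown relation: " + repr(rel))
        step = 1 if rel == "in" else 0
        graph[x].append((y, step))    # type(y) = type(x) + step
        graph[y].append((x, -step))   # type(x) = type(y) - step
    types = {}
    for start in graph:
        if start in types:
            continue
        types[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, step in graph[u]:
                want = types[u] + step
                if v not in types:
                    types[v] = want
                    queue.append(v)
                elif types[v] != want:
                    return None       # two incompatible types for v
    low = min(types.values(), default=0)
    return {v: t - low for v, t in types.items()}

russell = stratify([("in", "x", "x")])      # the atom of x ∉ x
universe = stratify([("eq", "x", "x")])     # x = x
chain = stratify([("in", "x", "y"), ("in", "y", "z")])
```

Negation and the other connectives do not affect stratification, which is why \(x \not\in x\) is represented here by the single atom \(x \in x\).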

Though the set \(\{x \mid x \not\in x\}\) is not provided by stratified comprehension, some other sets which are not found in any variant of Zermelo set theory are provided. For example, \(x = x\) is a stratified formula, and the universal set \(V = \{x \mid x = x\}\) is provided by an instance of comprehension. Moreover, \(V \in V\) is true.

All mathematical constructions which can be carried out in TST can be carried out in NF. For example, the Frege natural numbers can be constructed, and so can the set \(\mathbf{N}\) of Frege natural numbers. For example, the Frege natural number 1, the set of all one-element sets, is provided by NF.

6.2 The consistency problem for NF; the known consistent subsystems

No contradictions are known to follow from NF, but some uncomfortable consequences do follow. The Axiom of Choice is known to fail in NF: Specker (1953) proved that the universe cannot be well-ordered. (Since the universe cannot be well-ordered, it follows that the “Axiom” of Infinity is a theorem of NF: if the universe were finite, it could be well-ordered.) This might be thought to be what one would expect on adopting such a dangerous comprehension scheme, but this turns out not to be the problem. The problem is with extensionality.

Jensen (1969) showed that NFU (New Foundations with urelements), the version of New Foundations in which extensionality is weakened to allow many non-sets (as described above under naive set theory) is consistent, is consistent with Infinity and Choice, and is also consistent with the negation of Infinity (which of course implies Choice). NFU, which has the full stratified comprehension axiom of NF with all its frighteningly big sets, is weaker in consistency strength than Peano arithmetic; NFU + Infinity + Choice is of the same strength as TST with Infinity and Choice or Mac Lane set theory.

Some other fragments of NF, obtained by weakening comprehension rather than extensionality, are known to be consistent. NF\(_3\), the version of NF in which one accepts only those instances of the axiom of comprehension which can be typed using three types, was shown to be consistent by Grishin (1969).

NFP (predicative NF), the version of NF in which one accepts only instances of the axiom of comprehension which can be typed so as to be instances of comprehension of predicative TST (described above under type theories) was shown to be consistent by Marcel Crabbé (1982). He also demonstrated the consistency of the theory NFI in which one allows all instances of stratified comprehension in which no variable appears of type higher than that assigned to the set being defined (bound variables of the same type as that of the set being defined are permitted, which allows some impredicativity). One would like to read the name NFI as “impredicative NF” but one cannot, as it is more impredicative than NFP, not more impredicative than NF itself.

NF\(_3\)+Infinity has the same strength as second-order arithmetic. So does NFI (which has just enough impredicativity to define the natural numbers, and not enough for the Least Upper Bound Axiom). NFP is equivalent to a weaker fragment of arithmetic, but does (unlike NFU) prove Infinity: this is the only application of the Specker proof of the negation of the Axiom of Choice to a provably consistent theory. Either Union is true (in which case we readily get all of NF and Specker’s proof of Infinity goes through) or Union is not true, in which case we note that all finite sets have unions, so there must be an infinite set. NF\(_3\) has considerable interest for a surprising reason: it turns out that all infinite models of TST\(_3\) (simple type theory with three types) satisfy the ambiguity schema \(\phi \equiv \phi^+\) (of course this only makes sense for formulas with one or two types) and this turns out to be enough to show that for any infinite model of TST\(_3\) there is a model of NF\(_3\) with the same theory. NF\(_4\) is the same theory as NF (Grishin 1969), and we have no idea how to get a model of TST\(_4\) to satisfy ambiguity.

Very recently, Sergei Tupailo (2010) has proved the consistency of NFSI, the fragment of NF consisting of extensionality and those instances of Comprehension (\(\{x \in A \mid \phi \}\) exists) which are stratified and in which the variable \(x\) is assigned the lowest type. Tupailo’s proof is highly technical, but Marcel Crabbé pointed out that a structure for the language of set theory in which the sets are exactly the finite and cofinite collections satisfies this theory (so it is very weak). It should be noted that Tupailo’s model of NFSI satisfies additional propositions of interest not satisfied by the very simple model of Crabbé, such as the existence of each Frege natural number. It is of some interest whether this new fragment represents an independent way of getting a consistent fragment of NF. Note that NFU+NFSI is NF because NFSI has strong extensionality. Also, NFP+NFSI is NF because NFSI includes Union. The relationship of NFSI to NF\(_3\) has been clarified by Marcel Crabbé in 2016. Tupailo’s theory is shown not to be a fragment of Grishin’s, and thus represents a fourth known method of getting consistent fragments.

6.3 Mathematics in NFU + Infinity + Choice

Of these set theories, only NFU with Infinity, Choice and possibly further strong axioms of infinity (of which more anon) is really mathematically serviceable. We examine the construction of models of this theory and the way mathematics works inside this theory. A source for this development is Holmes 1998. (Rosser 1973 develops the foundations of mathematics in NF; it can be adapted to NFU fairly easily.)

A model of NFU can be constructed as follows. Well-known results of model theory allow the construction of a nonstandard model of ZFC (actually, a model of Mac Lane set theory suffices) with an external automorphism \(j\) which moves a rank \(V_{\alpha}\). We stipulate without loss of generality that \(j(\alpha) \lt \alpha\). The universe of our model of NFU will be \(V_{\alpha}\) and the membership relation will be defined as

\[ x \in_{NFU} y \equiv_{def} j(x) \in y \amp y \in V_{j(\alpha)+1} \]

(where \(\in\) is the membership relation of the nonstandard model). The proof that this is a model of NFU is not long, but it is involved enough that we refer the reader elsewhere. The basic idea is that the automorphism allows us to code the (apparent) power set \(V_{\alpha +1}\) of our universe \(V_{\alpha}\) into the “smaller” \(V_{j(\alpha)+1}\) which is included in our universe; the left over objects in \(V_{\alpha} - V_{j(\alpha)+1}\) become urelements. Note that \(V_{\alpha} - V_{j(\alpha)+1}\) is most of the domain of the model of NFU in a quite strong sense: almost all of the universe is made up of urelements (note that each \(V_{\beta +1}\) is the power set of \(V_{\beta}\), and so is strictly larger in size, and not one but many stages intervene between \(V_{j(\alpha)+1}\) (the collection of “sets”) and \(V_{\alpha}\) (the “universe”)). This construction is related to the construction used by Jensen, but is apparently first described explicitly in Boffa 1988.

In any model ofNFU, a structure which looks just like one ofthese models can be constructed in the isomorphism classes ofwell-founded extensional relations. The theory of isomorphism classesof well-founded extensional relations with a top element looks likethe theory of (an initial segment of) the usual cumulative hierarchy,because every set in Zermelo-style set theory is uniquely determinedby the isomorphism type of the restriction of the membership relationto its transitive closure. The surprise is that we not only see astructure which looks like an initial segment of the cumulativehierarchy: we also see an external endomorphism of this structurewhich moves a rank (and therefore cannot be a set), in terms of whichwe can replicate the model construction above and get aninterpretation ofNFU of this kind insideNFU! Theendomorphism is induced by the map \(T\) which sends the isomorphismtype of a relation \(R\) to the isomorphism type of \(R^{\iota} = \{\langle \{x\}, \{y\}\rangle \mid xRy\}\). There is no reason tobelieve that \(T\) is a function: it sends any relation \(R\) to arelation \(R^{\iota}\) which is one type higher in terms ofTST. It is demonstrable that \(T\) on the isomorphism typesof well-founded extensional relations is not a set function (we willnot show this here, but our discussion of the Burali-Forti paradoxbelow should give a good idea of the reasons for this). See Holmes(1998) for the full discussion.

This suggests that the underlying world view ofNFU, in spiteof the presence of the universal set, Frege natural numbers, and otherlarge objects, may not be that different from the world view ofZermelo-style set theory; we build models ofNFU in a certainway in Zermelo-style set theory, andNFU itself reflects thiskind of construction internally. A further, surprising result (Holmes2012) is that in models ofNFU constructed from a nonstandard\(V_{\alpha}\) with automorphism as above, the membership relation onthe nonstandard \(V_{\alpha}\) is first-order definable (in a veryelaborate way) in terms of the relation \(\in_{NFU}\); this is verysurprising, since it seems superficially as if all information aboutthe extensions of the urelements has been discarded in thisconstruction. But this turns out not to be the case (and this meansthat the urelements, which seem to have no internal information,nonetheless have a great deal of structure in these models).

Models of NFU can have a “finite” (but externally infinite) universe if the ordinal \(\alpha\) in the construction is a nonstandard natural number. If \(\alpha\) is infinite, the model of NFU will satisfy Infinity. If the Axiom of Choice holds in the model of Zermelo-style set theory, it will hold in the model of NFU.

Now we look at the mathematical universe according to NFU, rather than looking at models of NFU from the outside.

The Frege construction of the natural numbers works perfectly in NFU. If Infinity holds, there will be no last natural number and we can define the usual set \(\mathbf{N}\) of natural numbers just as we did above.

Any of the usual ordered pair constructions works in NFU. The usual Kuratowski pair is inconvenient in NF or in NFU, because the pair is two types higher than its projections in terms of TST. This means that functions and relations are three types higher than the elements of their domains and ranges. There is a type-level pair defined by Quine (1945; type-level because it is the same type as its projections) which is definable in NF and also on \(V_{\alpha}\) for any infinite ordinal \(\alpha\); this pair can be defined and used in NF and the fact that it is definable on infinite \(V_{\alpha}\) means that it can be assumed in NFU + Infinity that there is a type-level ordered pair (the existence of such a pair also follows from Infinity and Choice together). This would make the type displacement between functions and relations and elements of their domains and ranges just one, the same as the displacement between the types of sets and their elements. We will assume that ordered pairs are of the same type as their projections in the sequel, but we will not present the rather complicated definition of the Quine pair.
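The type count for the Kuratowski pair can be checked directly. As an illustrative calculation (the type index \(n\) and the notation \(\mathrm{type}(\cdot)\) are introduced here only for this purpose and are not part of the theory), suppose \(x\) and \(y\) are both of type \(n\), and let \(f\) be any set of Kuratowski pairs of such objects:

\[
\begin{gathered}
\mathrm{type}(\{x\}) = \mathrm{type}(\{x, y\}) = n + 1, \\
\mathrm{type}(\{\{x\}, \{x, y\}\}) = n + 2, \\
\mathrm{type}(f) = n + 3.
\end{gathered}
\]

With a type-level pair, by contrast, the pair of \(x\) and \(y\) stays at type \(n\) and \(f\) sits at type \(n + 1\).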

Once pairs are defined, the definition of relations and functionsproceeds exactly as in the usual set theory. The definitions ofintegers and rational numbers present no problem, and the Dedekindconstruction of the reals can be carried out as usual. We will focushere on developing the solutions to the paradoxes of Cantor andBurali-Forti inNFU, which give a good picture of the oddcharacter of this set theory, and also set things up nicely for abrief discussion of natural strong axioms of infinity forNFU. It is important to realize as we read the ways in whichNFU evades the paradoxes that this evasion is successful:NFU is known to be consistent if the usual set theory isconsistent, and close examination of the models ofNFU showsexactly why these apparent dodges work.

Two sets are said to be of the same cardinality just in case there is a bijection between them. This is standard. But we then proceed to define \(|A|\) (the cardinality of a set \(A\)) as the set of all sets which are the same size as \(A\), realizing the definition intended by Frege and Russell, and apparently intended by Cantor as well. Notice that \(|A|\) is one type higher than \(A\). The Frege natural numbers are the same objects as the finite cardinal numbers.

The Cantor theorem of the usual set theory asserts that \(|A| \lt |\wp(A)|\). This is clearly not true in NFU, since \(|V|\) is the cardinality of the universe and \(|\wp(V)|\) is the cardinality of the set of sets, and in fact \(|V| \gt \gt |\wp(V)|\) in all known models of NFU (there are many intervening cardinals in all such models). But \(|A| \lt |\wp(A)|\) does not make sense in TST: it is ill-typed. The correct theorem in TST, which is inherited by NFU, is \(|\wp_1 (A)| \lt |\wp(A)|\), where \(\wp_1 (A)\) is the set of one-element subsets of \(A\), which is at the same type as the power set of \(A\). So we have \(|\wp_1 (V)| \lt |\wp(V)|\): there are more sets than there are singleton sets. The apparent bijection \(x \mapsto \{x\}\) between \(\wp_1 (V)\) and \(V\) cannot be a set (and there is no reason to expect it to be a set, since it has an unstratified definition).
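The typing behind this can be spelled out as follows. As an illustration (the type index \(n\) and the notation \(\mathrm{type}(\cdot)\) are introduced here only for this calculation), let the elements of \(A\) have type \(n\), so that \(A\) itself has type \(n + 1\):

\[
\begin{gathered}
\mathrm{type}(|A|) = n + 2, \\
\mathrm{type}(\wp(A)) = \mathrm{type}(\wp_1 (A)) = n + 2, \\
\mathrm{type}(|\wp(A)|) = \mathrm{type}(|\wp_1 (A)|) = n + 3.
\end{gathered}
\]

So \(|A| \lt |\wp(A)|\) would compare objects of types \(n + 2\) and \(n + 3\), whereas \(|\wp_1 (A)| \lt |\wp(A)|\) compares two objects of type \(n + 3\).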

A set which satisfies \(|A| = |\wp_1 (A)|\) is called a cantorian set, since it satisfies the usual form of Cantor’s theorem. A set \(A\) which satisfies the stronger condition that the restriction of the singleton map to \(A\) is a set is said to be strongly cantorian (s.c.). Strongly cantorian sets are important because it is not necessary to assign a relative type to a variable known to be restricted to a strongly cantorian set, as it is possible to use the restriction of the singleton map and its inverse to freely adjust the type of any such variable for purposes of stratification. The strongly cantorian sets can be thought of as analogues of the small sets of the usual set theory.

Ordinal numbers are defined as equivalence classes of well-orderings under similarity. There is a natural order on ordinal numbers, and in NFU as in the usual set theory it turns out to be a well-ordering, and, as in naive set theory, a set! Since the natural order on the ordinal numbers is a set, it has an order type \(\Omega\) which is itself one of the ordinal numbers. Now in the usual set theory we prove that the order type of the restriction of the natural order on the ordinals to the ordinals less than \(\alpha\) is the ordinal \(\alpha\) itself; however, this is an ill-typed statement in TST, where, assuming a type-level ordered pair, the second occurrence of \(\alpha\) is two types higher than the first (it would be four types higher if the Kuratowski ordered pair were used). Since the ordinals are isomorphism types of relations, we can define the operation \(T\) on them as above.

The order type of the restriction of the natural order on the ordinals to the ordinals less than \(\alpha\) is the ordinal \(T^2(\alpha)\)

is an assertion which makes sense in TST and is in fact true in TST and so in NFU. We thus find that the order type of the restriction of the natural order on the ordinals to the ordinals less than \(\Omega\) is \(T^2 (\Omega)\), whence we find that \(T^2 (\Omega)\) (as the order type of a proper initial segment of the ordinals) is strictly less than \(\Omega\) (which is the order type of all the ordinals). Once again, the fact that the singleton map is not a function eliminates the “intuitively obvious” similarity between these orders. This also shows that \(T\) is not a function. \(T\) is an order endomorphism of the ordinals, though, whence we have \(\Omega \gt T^2 (\Omega) \gt T^4 (\Omega)\ldots\), which may be vaguely disturbing, though this “sequence” is not a set. A perhaps useful comment is that in the models of NFU described above, the action of \(T\) on ordinals exactly parallels the action of \(j\) on order types of well-orderings (\(j\) does not send NFU ordinals to ordinals, exactly, so this needs to be phrased carefully): the “descending sequence” already has an analogue in the sequence \(\alpha \gt j(\alpha) \gt j^2 (\alpha)\ldots\) in the original nonstandard model. Some have asserted that this phenomenon (that the ordinals in any model of NFU are not externally well-ordered) can be phrased as “NFU has no standard model”. We reserve judgement on this, though we do note that the theorem “the ordinals in any (set!) model of NFU are not well-ordered” is a theorem of NFU itself; note that NFU does not see the universe as a model of NFU (even though it is a set) because the membership relation is not a set relation (if it were, the singleton map certainly would be).

NFU + Infinity + Choice is a relatively weak theory: likeZermelo set theory it does not prove even that \(\aleph_{\omega}\)exists. As is the case with Zermelo set theory, natural extensions ofthis theory make it much stronger. We give just one example. The Axiomof Cantorian Sets is the deceptively simple statement (to which thereare no evident counterexamples) that “every cantorian set isstrongly cantorian”.NFU + Infinity + Choice +Cantorian Sets is a considerably stronger theory thanNFU +Infinity + Choice: in its theory of isomorphism types of well-foundedextensional relations with top element, the cantorian types with theobvious “membership” relation satisfy the axioms ofZFC + “there is an \(n\)-Mahlo cardinal” for eachconcrete \(n\). There is no mathematical need for the deviousinterpretation: this theory proves the existence of \(n\)-Mahlocardinals and supports all mathematical constructions at that level ofconsistency strength in its own terms without any need to refer to thetheory of well-founded extensional relations. More elaboratestatements about such properties as “cantorian” and“strongly cantorian” (applied to order types as well ascardinality) yield even stronger axioms of infinity.

Our basic claim aboutNFU + Infinity + Choice (and itsextensions) is that it is a mathematically serviceable alternative settheory with its own intrinsic motivation (although we have usedZermelo style set theory to prove its consistency here, the entiredevelopment can be carried out in terms ofTST alone: one canuseTST as meta-theory, show inTST that consistencyofTST implies consistency ofNFU, and use thisresult to amend one’s meta-theory toNFU, thusabandoning the distinctions between types). We do not claim that it isbetter thanZFC, but we do claim that it is adequate, andthat it is important to know that adequate alternatives exist; we doclaim that it is useful to know that there are different ways to foundmathematics, as we have encountered the absurd assertion that“mathematics is whatever is formalized inZFC”.

6.4 Critique of NFU

Like Zermelo set theory, NFU has advantages and disadvantages. An advantage, which corresponds to one of the few clear disadvantages of Zermelo set theory, is that it is possible to define natural numbers, cardinal numbers, and ordinal numbers in the natural way intended by Frege, Russell, and Whitehead.

Many but not all of the purported disadvantages ofNFU as aworking foundation for mathematics reduce to complaints bymathematicians used to working inZFC that “this is notwhat we are used to”. The fact that there are fewer singletonsthan objects (in spite of an obvious external one to onecorrespondence) takes getting used to. In otherwise familiarconstructions, one sometimes has to make technical use of thesingleton map or \(T\) operations to adjust types to getstratification. This author can testify that it is perfectly possibleto develop good intuition forNFU and work effectively withstratified comprehension; part of this but not all of it is a goodfamiliarity with how things are done inTST, as one also hasto develop a feel for how to use principles that subvertstratification.

As Sol Feferman has pointed out, one place where the treatments inNFU (at least those given so far) are clearly quite involvedare situations in which one needs to work with indexed families ofobjects. The proof of König’s Lemma of set theory in Holmes1998 is a good example of how complicated this kind of thing can getinNFU. We have a notion that the use of sets of “Quineatoms” (self-singletons) as index sets (necessarily for s.c.sets) might relieve this difficulty, but we haven’t proved thisin practice, and problems would remain for the noncantoriansituation.

The fact that “NFU has no standard models” (theordinals are not well-ordered in any set model ofNFU) is acriticism ofNFU which has merit. We observe, though, thatthere are other set theories in which nonstandard objects aredeliberately provided (we will review some of these below), and someof the applications of those set theories to “nonstandardanalysis” might be duplicated in suitable versions ofNFU. We also observe that strong principles which minimizethe nonstandard behavior of the ordinals turn out to give surprisinglystrong axioms of infinity inNFU; the nonstandard structureof the ordinals allows insight into phenomena associated with largecardinals.

Some have thought that the fact that NFU combines a universal set and other big structures with mathematical fluency in treating these structures might make it a suitable medium for category theory. Although we have some inclination to be partial to this class of set theories, we note that there are strong counterarguments to this view. It is true that there are big categories, such as the category of all sets (as objects) and functions (as the morphisms between them), the category of all topological spaces and homeomorphisms, and even the category of all categories and functors. However, the category of all sets and functions, for example, while it is a set, is not “cartesian closed” (a technical property which this category is expected to have): see McLarty 1992. Moreover, if one restricts to the s.c. sets and functions, one obtains a cartesian closed category, which is much more closely analogous to the category of all sets and functions over ZFC, and shares with it the disadvantage of being a proper class! Contemplation of the models only confirms the impression that the correct analogue of the proper class category of sets and functions in ZFC is the proper class category of s.c. sets and functions in NFU! There may be some applications for the big set categories in NFU, but they are not likely to prove to be as useful as some have optimistically suggested. See Feferman 2006 for an extensive discussion.

An important point is that there is a relativity of viewpoint here:theNFU world can be understood to be a nonstandard initialsegment of the world ofZFC (which could be arranged toinclude its entire standard part!) with an automorphism and theZFC world (or an initial segment of it) can be interpreted inNFU as the theory of isomorphism classes of well-foundedextensional relations with top (often restricted to its stronglycantorian part); these two theories are mutually interpretable, so thecorresponding views of the world admit mutual translation.

ZFC might be viewed as motivated by a generalization of thetheory of sets in extension (as generalizations of the notion offinite set, replacing the finite with the transfinite and the rejectedinfinite with the rejected Absolute Infinite of Cantor) while themotivation ofNFU can be seen as a correction of the theoryof sets as intensions (that is, as determined by predicates) which ledto the disaster of naive set theory. Nino Cocchiarella (1985) hasnoted that Frege’s theory of concepts could be saved if onecould motivate a restriction to stratified concepts (the abandonmentof strong extensionality is merely a return to common sense). But theimpression of a fundamental contrast should be tempered by theobservation that the two theories nonetheless seem to be looking atthe same universe in different ways!

7. Positive Set Theories

7.1 Topological motivation of positive set theory

We will not attempt an exhaustive survey of positive set theory; ouraim here is to motivate and exhibit the axioms of the strongest systemof this kind familiar to us, which is the third of the systems ofclassical set theory which we regard as genuinely mathematicallyserviceable (the other two beingZFC and suitable strongextensions ofNFU + Infinity + Choice).

A positive formula is a formula which belongs to the smallest class of formulas containing a false statement \(\bot\), all atomic membership and equality formulas and closed under the formation of conjunctions, disjunctions, universal and existential quantifications. A generalized positive formula is obtained if we allow bounded universal and existential quantifications (the additional strength comes from allowing \((\forall x \in A \mid \phi) \equiv \forall x(x \in A \rightarrow \phi)\); bounded existential quantification is positive in any case).
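The two definitions are syntactic and can be checked mechanically. The following is a minimal sketch only: the formula classes (`Bot`, `Atom`, `Not`, `BinOp`, `Quant`) and the function `is_positive` are hypothetical names introduced for illustration, and a bounded quantifier is represented by an assumed `bound` field naming the bounding set.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical mini-AST for formulas over membership and equality.

@dataclass(frozen=True)
class Bot:
    """The false statement."""

@dataclass(frozen=True)
class Atom:
    rel: str    # "in" or "eq"
    left: str
    right: str

@dataclass(frozen=True)
class Not:
    body: "Formula"

@dataclass(frozen=True)
class BinOp:
    op: str     # "and", "or", or "implies"
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Quant:
    kind: str   # "forall" or "exists"
    var: str
    body: "Formula"
    bound: Optional[str] = None  # if set, var ranges over members of bound

Formula = Union[Bot, Atom, Not, BinOp, Quant]

def is_positive(f: Formula, generalized: bool = False) -> bool:
    """Check membership in the class of (generalized) positive formulas."""
    if isinstance(f, (Bot, Atom)):
        return True
    if isinstance(f, BinOp):
        # Only conjunction and disjunction are allowed; implication is not.
        return (f.op in ("and", "or")
                and is_positive(f.left, generalized)
                and is_positive(f.right, generalized))
    if isinstance(f, Quant):
        # A bounded universal quantifier abbreviates an implication, so it
        # is admitted only in the generalized sense.
        if f.kind == "forall" and f.bound is not None and not generalized:
            return False
        return is_positive(f.body, generalized)
    return False  # negation is never positive

russell = Not(Atom("in", "x", "x"))        # x is not a member of x
self_member = Atom("in", "x", "x")         # x is a member of x
bounded = Quant("forall", "y", Atom("in", "y", "x"), bound="A")
```

Here `is_positive(russell)` is false in either sense, `is_positive(self_member)` is true, and `bounded` is accepted only when `generalized=True`.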

Positive comprehension is motivated superficially by an attack on one of the elements of Russell’s paradox (the negation): a positive set theory will be expected to support the axiom of extensionality (as usual) and the axiom of (generalized) positive comprehension: for any (generalized) positive formula \(\phi\), \(\{x \mid \phi \}\) exists.

We mention that we are aware that positive comprehension with theadditional generalization of positive formulas allowing one to includeset abstracts \(\{x \mid \phi \}\) (with \(\phi\) generalizedpositive) in generalized positive formulas is consistent, but turnsout not to be consistent with extensionality. We are not very familiarwith this theory, so have no additional comments to make about it; donotice that the translations of formulas with set abstracts in theminto first order logic without abstracts are definitely not positivein our more restricted sense, and so one may expect some kind oftrouble!

The motivation for the kinds of positive set theory we are familiar with is topological. We are to understand the sets as closed sets under some topology. Finite unions and intersections of closed sets are closed; this supports the inclusion of \(\{x \mid \phi \lor \psi \}\) and \(\{x \mid \phi \amp \psi \}\) as sets if \(\{x \mid \phi \}\) and \(\{x \mid \psi \}\) are sets. Arbitrary intersections of closed sets are closed: this supports our adoption of even bounded universal quantification (if each \(\{x \mid \phi(y)\}\) is a set, then \(\{x \mid \forall y\phi(y)\}\) is the intersection of all of these sets, and so should be closed, and \(\{x \in A \mid \forall y\phi(y)\}\) is also an intersection of closed sets and so should be closed). The motivation for permitting \(\{x \mid \exists y\phi(y)\}\) when each \(\{x \mid \phi(y)\}\) exists is more subtle, since infinite unions do not as a rule preserve closedness: the idea is that the set of pairs \((x, y)\) such that \(\phi(x, y)\) is closed, and the topology is such that the projection of a closed set is closed. Compactness of the topology suffices. Moreover, we now need to be aware that formulas with several parameters need to be considered in terms of a product topology.

An additional very powerful principle should be expected to hold in atopological model: for any class \(C\) whatsoever (any collection ofsets), the intersection of all sets which include \(C\) as a subclassshould be a set. Every class has a set closure.

We attempt the construction of a model of such a topological theory. To bring out an analogy with Mac Lane set theory and NF, we initially present a model built by collapsing TST in yet another manner.

The model of TST that we use contains one type 0 object \(u\). Note that this means that each type is finite. Objects of each type are construed as better and better approximations to the untyped objects of the final set theory. \(u\) approximates any set. The type \(n + 1\) approximant to any set \(A\) is intended to be the set of type \(n\) approximants of the elements of \(A\).

This means that we should be able to specify when a type \(n + 2\) set \(A^{n+2}\) refines a type \(n + 1\) set \(A^{n+1}\): each (type \(n + 1\)) element of \(A^{n+2}\) should refine a (type \(n\)) element of \(A^{n+1}\), and each element of \(A^{n+1}\) should be refined by one or more elements of \(A^{n+2}\). Along with the information that the type 0 object \(u\) is refined by both of the elements of type 1, this gives a complete recursive definition of the notion of refinement of a type \(n\) set by a type \(n + 1\) set. Each type \(n + 1\) set refines a unique type \(n\) set but may be refined by many type \(n + 2\) sets. (The “hereditarily finite” sets without \(u\) in their transitive closure are refined by just one precisely analogous set at the next higher level.) Define a general relation \(x \sim y\) on all elements of the model of set theory as holding when \(x = y\) (if they are of the same type) or if there is a chain of refinements leading from the one of \(x, y\) of lower type to the one of higher type.
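The recursion can be tried out on the first few types, which are small enough to enumerate. The sketch below is illustrative only: the names `U`, `powerset`, `build_types` and `refines` are introduced here, and the base case is read as "both type 1 sets refine \(u\)", an assumption about the intended direction that agrees with the claim that each type \(n + 1\) set refines a unique type \(n\) set.

```python
from itertools import combinations

U = "u"  # the single type 0 object

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def build_types(top):
    """types[n] lists the objects of type n, for n = 0..top."""
    types = [[U]]
    for _ in range(top):
        types.append(powerset(types[-1]))
    return types

def refines(hi, lo):
    """Does hi (of type n + 1) refine lo (of type n)?"""
    if lo == U:
        return True  # assumed base case: both type 1 sets refine u
    # Each element of hi refines some element of lo, and each element
    # of lo is refined by some element of hi.
    return (all(any(refines(a, b) for b in lo) for a in hi)
            and all(any(refines(a, b) for a in hi) for b in lo))

types = build_types(3)  # types 0..3 have 1, 2, 4 and 16 objects

# For each object of type n + 1, count the type n objects it refines.
refined_counts = [
    [sum(refines(hi, lo) for lo in types[n]) for hi in types[n + 1]]
    for n in range(3)
]
```

Every entry of `refined_counts` is 1, matching the uniqueness claim, while a set such as \(\{u\}\) is refined by several sets at the next type. Type 4 already has \(2^{16}\) objects, so the enumeration is not meant to be pushed much further.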

The objects of our first model of positive set theory are sequences \(s_n\) with each \(s_n\) a type \(n\) set and with \(s_{n+1}\) refining \(s_n\) for each \(n\). We say that \(s \in t\) when \(s_{n} \in t_{n+1}\) for all \(n\). It is straightforward to establish that if \(s_{n} \in t_{n+1}\) or \(s_{n} = t_{n}\) is false, then \(s_k \in t_{k+1}\) or (respectively) \(s_k = t_k\) is false for all \(k \gt n\). More generally, if \(s_m \sim t_n\) is false, then \(s_{m+k} \sim t_{n+k}\) is false for all \(k \ge 0\).

Formulas in the language of the typed theory with \(\in\) and \(\sim\)have a monotonicity property: if \(\phi\) is a generalized positiveformula and one of its typed versions is false, then any version ofthe same formula obtained by raising types and refining the values offree variables in the formula will continue to be false. It is nothard to see why this will fail to work if negation is allowed.

It is also not too hard to show that if all typed versions of ageneralized positive formula \(\phi\) in the language of the intendedmodel (with sequences \(s\) appearing as values of free variablesreplaced by their values at the appropriate types) are true, then theoriginal formula \(\phi\) is true in the intended model. The onedifficulty comes in with existential quantification: the fact that onehas a witness to \((\exists x.\phi(x))\) in each typed version doesnot immediately give a sequence witnessing this in the intended model.The tree property of \(\omega\) helps here: only finitely manyapproximants to sets exist at each level, so one can at each levelchoose an approximant refinements of which are used at infinitely manyhigher levels as witnesses to \((\exists x.\phi(x))\), then restrictattention to refinements of that approximant; in this way one gets notan arbitrary sequence of witnesses at various types but a“convergent” sequence (an element of the intendedmodel).

One then shows that any generalized positive formula \(\phi(x)\) has an extension \(\{x \mid \phi(x)\}\) by considering the sets of witnesses to \(\phi(x)\) in each type \(n\); these sets themselves can be used to construct a convergent sequence (with the proviso that some apparent elements found at any given stage may need to be discarded; one defines \(s_{n+1}\) as the set of those type \(n\) approximants which not only witness \(\phi(x)\) at the current type \(n\) but have refinements which witness \(\phi(x)\) at each subsequent type). The sequence of sets \(s\) obtained will be an element of the intended model and have the intended extension.

Finally, for any class of sequences (elements of the intended model) \(C\), there is a smallest set which contains all elements of \(C\): let \(c_{n+1}\) be the set of terms \(s_n\) of sequences \(s\) belonging to \(C\) at each type \(n\) to construct a sequence \(c\) which will have the desired property.

This theory can be made stronger by indicating how to pass totransfinite typed approximations. The type \(\alpha + 1\)approximation to a set will always be the set of type \(\alpha\)approximations; if \(\lambda\) is a limit ordinal, the type\(\lambda\) approximation will be the sequence \(\{s_{\beta} \}_{\beta\lt \lambda}\) of approximants to the set at earlier levels (so our“intended model” above is the set of type \(\omega\)approximations in a larger model).

Everything above will work at any limit stage except the treatment ofthe existential quantifier. The existential quantifier argument willwork if the ordinal stage at which the model is being constructed is aweakly compact cardinal. This is a moderately strong large cardinalproperty (for an uncountable cardinal): it implies, for example, theexistence of proper classes of inaccessibles and of \(n\)-Mahlocardinals for each \(n\).

So for each weakly compact cardinal \(\kappa\) (including \(\kappa = \omega\)) the approximants of level \(\kappa\) in the transfinite type theory just outlined make up a model of set theory with extensionality, generalized positive comprehension, and the closure property. We will refer to this model as the “\(\kappa\)-hyperuniverse”.

7.2 The system GPK\(^{+}_{\infty}\) of Olivier Esser

We now present an axiomatic theory which has the \(\kappa\)-hyperuniverses with \(\kappa \gt \omega\) as (some of its) models. This is a first-order theory with equality and membership as primitive relations. This system is called GPK\(^{+}_{\infty}\) and is described in Esser 1999.

Extensionality: Sets with the same elements are the same.

Generalized Positive Comprehension: For any generalized positive formula \(\phi\), \(\{x \mid \phi \}\) exists. (Notice that since we view the false formula \(\bot\) as positive we need no special axiom asserting the existence of the empty set.)

Closure: For any formula \(\phi(x)\), there is a set \(C\) such that \(x \in C \equiv [\forall y(\forall z(\phi(z) \rightarrow z \in y) \rightarrow x \in y)]\); \(C\) is the intersection of all sets which include all objects which satisfy \(\phi\): \(C\) is called the closure of the class \(\{x \mid \phi(x)\}\).

Infinity: The closure of the von Neumann ordinals is not an element of itself. (This excludes the \(\omega\)-hyperuniverse, in which the closure of the class of von Neumann ordinals has itself as an additional member.)

As one might expect, some of the basic concepts of this set theory aretopological (sets being the closed classes of the topology on theuniverse).

This set theory interprets ZF. This is shown by demonstrating first that the discrete sets (and more particularly the (closed) sets of isolated points in the topology) satisfy an analogue of Replacement (a definable function (defined by a formula which need not be positive) with a discrete domain is a set), and so an analogue of separation, then by showing that well-founded sets are isolated in the topology and the class of well-founded sets is closed under the constructions of ZF.

Not onlyZF but also Kelley-Morse class theory can beinterpreted; any definable class of well-founded sets has a closurewhose well-founded members will be exactly the desired members (itwill as a rule have other, non-well-founded members). Quantificationover these “classes” defines sets just as easily asquantification over mere sets in this context; so we get animpredicative class theory. Further, one can prove internally to thistheory that the “proper class ordinal” in the interpreted\(KM\) has the tree property, and so is in effect a weakly compactcardinal; this shows that this theory has considerable consistencystrength (for example, its version ofZF proves that there isa proper class of inaccessible cardinals, a proper class of\(n\)-Mahlos for each \(n\), and so forth): the use of large cardinalsin the outlined model construction above was essential.

The Axiom of Choice in any global form is inconsistent with thistheory, but it is consistent for all well-founded sets to bewell-orderable (in fact, this will be true in the models describedabove if the construction is carried out in an environment in whichChoice is true). This is sufficient for the usual mathematicalapplications.

Since ZF is entirely immersed in this theory, it is clearly serviceable for the usual classical applications. The Frege natural numbers are not definable in this theory (except for 0 and 1); it is better to work with the finite von Neumann ordinals. The ability to prove strong results about large cardinals using the properties of the proper class ordinal suggests that the superstructure of large sets can be used for mathematical purposes as well. Familiarity with techniques of topology of \(\kappa\)-compact spaces would be useful for understanding what can be done with the big sets in this theory.

With the negation of the Axiom of Infinity, we get the theory of the\(\omega\)-hyperuniverse, which is equiconsistent with second-orderarithmetic, and so actually has a fair amount of mathematicalstrength. In this theory, the class of natural numbers (considered asfinite ordinals) is not closed and acquires an extra element “atinfinity” (which happens to be the closure of the class ofnatural numbers itself). Individual real numbers can be coded (usingthe usual Dedekind construction, actually) but the theory of sets ofreal numbers will begin to look quite different.

7.3 Critique of positive set theory

One obvious criticism is that this theory is extremely strong, compared with the other systems given here. This could be a good thing or a bad thing, depending on one’s attitude. If one is worried about the consistency of a weakly compact, the level of consistency strength here is certainly a problem (though the theory of the \(\omega\)-hyperuniverse will stay around in any case). On the other hand, the fact that the topological motivation for set theory seems to work and yields a higher level of consistency strength than one might expect (“weakly compact” infinity following from merely uncountable infinity) might be taken as evidence that these are very powerful ideas.

The mathematical constructions that are readily accessible to this author are simply carried over from ZF or ZFC; the well-founded sets are considered within the world of positive set theory, and we find that they have exactly the properties we expect them to have from the usual viewpoint. It is rather nice that we get (fuzzier) objects in our set theory suitable to represent all of the usual proper classes; it is less clear what we can do with the other large objects than it is in NFU. A topologist might find this system quite interesting; in any event, topological expertise seems required to evaluate what can be done with the extra machinery in this system.

We briefly review the paradoxes: the Russell paradox doesn’t work because \(x \not\in x\) is not a positive formula; notice that \(\{x \mid x \in x\}\) exists! The Cantor paradox does not work because the proof of the Cantor theorem relies on an instance of comprehension which is not positive. \(\wp(V)\) does exist and is equal to \(V\). The ordinals are defined by a non-positive condition, and do not make up a set, but it is interesting to note that the closure \(\mathbf{CL}(On)\) of the class \(On\) of ordinals is equal to \(On \cup \{\mathbf{CL}(On)\}\); the closure has itself as its only unexpected element.

8. Logically and Philosophically Motivated Variations

In the preceding set theories, the properties of the usual objects of mathematics accord closely with their properties as “intuitively” understood by most mathematicians (or lay people). (Strictly speaking, this is not quite true in NFU + Infinity without the additional assumption of Rosser’s Axiom of Counting, but the latter axiom (“\(\mathbf{N}\) is strongly cantorian”) is almost always assumed in practice.)

In the first two classes of system discussed in this section, logical considerations lead to the construction of theories in which “familiar” parts of the world look quite different. Constructive mathematicians do not see the same continuum that we do, and if they are willing to venture into the higher reaches of set theory, they find a different world there, too. The proponents of nonstandard analysis also find it useful to look at a different continuum (and even different natural numbers) though they do see the usual continuum and natural numbers embedded therein.

It is not entirely clear that the final item discussed in this section, the multiverse view of set theory proposed by Joel Hamkins, should be described as a view of the world of set theory at all: it proposes that we should consider that there are multiple different concepts of set each of which describes its own universe (and loosely we might speak of the complex of universes as a “multiverse”), but at bottom it is being questioned whether there is properly a single world of set theory at all. But the tentative list of proposed axioms he gives for relationships between universes has some of the flavor of an alternative set theory.

8.1 Constructive set theory

There are a number of attempts at constructive (intuitionistic) theories of types and set theories. We will describe a few systems here, quite briefly as we are not expert in constructive mathematics.

An intuitionistic typed theory of sets is readily obtained by simply adopting the intuitionistic versions of the axioms of TST as axioms. An Axiom of Infinity would be wanted to ensure that an interpretation of Heyting arithmetic could be embedded in the theory; it might be simplest to provide type 0 with the primitives of Heyting arithmetic (just as the earliest versions of TST had the primitives of classical arithmetic provided for type 0). We believe that this would give a quite comfortable environment for doing constructive mathematics.

Daniel Dzierzgowski has gone so far as to study an intuitionistic version of NF constructed in the same way; all that we can usefully report here is that it is not clear that the resulting theory INF is as strong as NF (in particular, it is unclear whether INF interprets Heyting Arithmetic, because Specker’s proof of Infinity in NF does not seem to go through in any useful way) but the consistency problem for INF remains open in spite of the apparent weakness of the theory.

A more ambitious theory is IZF (intuitionistic ZF). An interesting feature of the development of IZF is that one must be very careful in one’s choice of axioms: some formulations of the axioms of set theory have (constructively deducible) consequences which are not considered constructively valid (such as Excluded Middle), while other (classically equivalent) formulations of the axioms appear not to have such consequences: the latter forms, obviously to be preferred for a constructive development of set theory, often are not the most familiar ones in the classical context.

A set of axioms which seems to yield a nontrivial system of constructive mathematics is the following:

Extensionality: in the usual ZF form.

Pairing, Union, Power Set, Infinity: in the usual ZF form.

Collection: We are not sure why this is often preferred in constructive set theory, as it seems to us less constructive than Replacement; but we have heard it said that Replacement is constructively quite weak.

\(\in\)-Induction: The induction on membership form is preferred for a highly practical reason: more usual formulations of Foundation immediately imply the Axiom of Excluded Middle!
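For concreteness, the scheme of \(\in\)-Induction can be written as follows. This is a sketch in the notation of this article, with one instance for each formula \(\phi\) (possibly with parameters):

\[
\forall x\,\bigl[(\forall y \in x\;\phi(y)) \rightarrow \phi(x)\bigr] \rightarrow \forall x\,\phi(x)
\]

Classically this is equivalent to the familiar statement of Foundation that every nonempty set has an \(\in\)-minimal element, but only the induction form avoids the unwanted consequence noted in the statement of the axiom.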

See Friedman 1973 and Other Internet Resources for further information about IZF.

As is often the case in constructive mathematics generally, very simple notions of classical set theory (such as the notion of an ordinal) require careful reformulation to obtain the appropriate definition for the constructive environment (and the formulations often appear more complicated than familiar ones to the classical eye). Being inexpert, we will not involve ourselves further in this. It is worth noting that IZF, like many but not all constructive systems, admits a double negation interpretation of the corresponding classical theory ZF; we might think of IZF as a weakened version of ZF from the classical standpoint, but in its own terms it is the theory of a larger, more complex realm in which a copy of the classical universe of set theory is embedded.

The theories we have described so far are criticized by some constructive mathematicians for allowing an unrestricted power set operation. A weaker system CZF (constructive ZF) has been proposed which does not have this operation (and which has the same level of strength as the weak set theory KPU without Power Set described earlier).

CZF omits Power Set. It replaces Foundation with \(\in\)-Induction for the same reasons as above. The axioms of Extensionality, Pairing, and Union are as in ordinary set theory. The axiom of Separation is restricted to bounded \((\Delta_0)\) formulas as in Mac Lane set theory or KPU.

The Collection axiom is replaced by two weaker axioms.

The Strong Collection axiom scheme asserts that if for every \(x \in A\) there is \(y\) such that \(\phi (x, y)\), then there is a set \(B\) such that for every \(x \in A\) there is \(y \in B\) such that \(\phi(x, y)\) (as in the usual scheme) but also for every \(y \in B\) there is \(x \in A\) such that \(\phi(x, y)\) (\(B\) doesn’t contain any redundant elements). The additional restriction is useful because of the weaker form of the Separation Axiom.
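Written out symbolically, the scheme just described reads as follows. This is our own transcription of the prose statement, with one instance for each formula \(\phi\), and with \(\wedge\) for conjunction:

\[
\forall A\,\Bigl[\forall x \in A\,\exists y\,\phi(x,y) \rightarrow \exists B\,\bigl(\forall x \in A\,\exists y \in B\,\phi(x,y) \;\wedge\; \forall y \in B\,\exists x \in A\,\phi(x,y)\bigr)\Bigr]
\]

The first conjunct is ordinary Collection; the second is the clause which excludes redundant elements from \(B\).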

The Subset Collection scheme can be regarded as containing a very weak form of Power Set. It asserts, for each formula \(\phi(x, y, z)\), that for every \(A\) and \(B\), there is a set \(C\) such that for each \(z\) such that \(\forall x \in A\,\exists y \in B\,[\phi(x, y, z)]\) there is \(R_z \in C\) such that for every \(x \in A\) there is \(y \in R_z\) such that \(\phi(x, y, z)\) and for every \(y \in R_z\) there is \(x \in A\) such that \(\phi(x, y, z)\) (this is the same restriction as in the Strong Collection axiom; notice that not only are images under the relation constructed, but the images are further collected into a set).
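Because the nesting of quantifiers in this scheme is easy to lose in prose, it may help to display it. The following is our own transcription of the statement above, with one instance for each formula \(\phi(x, y, z)\):

\[
\forall A\,\forall B\,\exists C\,\forall z\,\Bigl[\forall x \in A\,\exists y \in B\,\phi(x,y,z) \rightarrow \exists R_z \in C\,\bigl(\forall x \in A\,\exists y \in R_z\,\phi(x,y,z) \;\wedge\; \forall y \in R_z\,\exists x \in A\,\phi(x,y,z)\bigr)\Bigr]
\]

Note that the single set \(C\) is chosen before the parameter \(z\) is quantified; this is what collects the images \(R_z\) for all \(z\) into one set.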

The Subset Collection scheme is powerful enough to allow the construction of the set of all functions from a set \(A\) to a set \(B\) as a set (which suggests that the classical version of this theory is as strong as ZF, since the existence of the set of functions from \(A\) to \(\{0, 1\}\) is classically as strong as the existence of the power set of \(A\), and strong collection should allow the proof of strong separation in a classical environment).

This theory is known to be at the same level of consistency strength as the classical set theory KPU. It admits an interpretation in Martin-Löf constructive type theory (as IZF does not).

See Aczel (1978, 1982, 1986) for further information about this theory.

8.2 Set theory for nonstandard analysis

Nonstandard analysis originated with Abraham Robinson (1966), who noticed that the use of nonstandard models of the continuum would allow one to make sense of the infinitesimal numbers of Leibniz, and so obtain an elegant formulation of the calculus with fewer alternations of quantifiers.

Later exponents of nonstandard analysis observed that the constant reference to the model theory made the exposition less elementary than it could be; they had the idea of working in a set theory which was inherently “nonstandard”.

We present a system of this kind, a version of the set theory IST (Internal Set Theory) of Nelson (1977). The primitives of the theory are equality, membership, and a primitive notion of standardness. The axioms follow.

Extensionality, Pairing, Union, Power Set, Foundation, Choice: As in our presentation of ZFC above.

Separation, Replacement: As in our presentation of ZFC above, except that the standardness predicate cannot appear in the formula \(\phi\).

Definition: For any formula \(\phi\), the formula \(\phi^{\mathrm{st}}\) is obtained by replacing each quantifier over the universe with a quantifier over all standard objects (and each quantifier bounded in a set with a quantifier restricted to the standard elements of that set).

Idealization: There is a finite set which contains all standard sets.

Transfer: For each formula \(\phi(x)\) not mentioning the standardness predicate and containing no parameters (free variables other than \(x\)) except standard sets, \(\forall x\,\phi(x) \equiv \forall x\,(\mathrm{standard}(x) \rightarrow \phi(x))\).

Standardization: For any formula \(\phi(x)\) and standard set \(A\), there is a standard set \(B\) whose standard elements are exactly the standard elements \(x\) of \(A\) satisfying \(\phi(x)\).

Our form of Idealization is simpler than the usual version but has the same effect.
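The three axioms specific to this theory can be put side by side in symbols. The following is a sketch of our own: \(\mathrm{standard}(x)\) abbreviates the primitive standardness predicate, \(\mathrm{finite}(F)\) is the ordinary set-theoretic notion of finiteness, and the restrictions on \(\phi\) are those stated in the axioms above (none at all in the case of Standardization).

\[
\begin{aligned}
\text{Idealization:}\quad & \exists F\,\bigl[\mathrm{finite}(F) \wedge \forall x\,(\mathrm{standard}(x) \rightarrow x \in F)\bigr]\\
\text{Transfer:}\quad & \forall x\,\phi(x) \equiv \forall x\,(\mathrm{standard}(x) \rightarrow \phi(x))\\
\text{Standardization:}\quad & \forall A\,\Bigl[\mathrm{standard}(A) \rightarrow \exists B\,\bigl(\mathrm{standard}(B) \wedge \forall x\,[\mathrm{standard}(x) \rightarrow (x \in B \equiv (x \in A \wedge \phi(x)))]\bigr)\Bigr]
\end{aligned}
\]

Standardization says nothing about the nonstandard elements of \(B\); it fixes only which standard objects belong to it.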

Transfer immediately implies that any uniquely definable object (defined without reference to standardness) is in fact a standard object. So the empty set is standard, \(\omega\) is standard, and so forth. But it is not the case that all elements of standard objects are standard. For consider the cardinality of a finite set containing all standard objects; this is clearly greater than any standard natural number (usual element of \(\omega\)) yet it is equally clearly an element of \(\omega\). It turns out to be provable that every set all of whose elements are standard is a standard finite set.

Relative consistency of this theory with the usual set theory ZFC is established via familiar results of model theory. Working in this theory makes it possible to use the techniques of nonstandard analysis in an “elementary” way, without ever appealing explicitly to the properties of nonstandard models.

8.3 The multiverse view of set theory

We examine the theory of the set theoretic multiverse proposed by Joel David Hamkins, whose purpose is to address philosophical questions about independence results in standard set theory, but which when spelled out formally has some of the flavor of an alternative set theory. A set theoretic Platonist might say about the Continuum Hypothesis (CH) that, since there is “of course” a single universe of sets, CH is either true or false in that world, but that we cannot determine which of CH and \(\neg\)CH actually holds. Hamkins proposes as an alternative (taking the same realist standpoint as the classical Platonist, it must be noted) that there are many distinct concepts of set, which we may suppose for the moment all satisfy the usual axioms of ZFC, each concept determining its own universe of sets, and in some of these universes CH holds and in some it does not hold. He says further, provocatively, that in his view CH is a solved problem, because we have an excellent understanding of the conditions under which CH holds in \(a\) universe of sets (note the article used) and the conditions in which it does not hold. Even more provocatively, he argues that an “ideal” solution to the CH problem, in which a generally accepted axiom arises which causes most mathematicians to conclude that CH is “self-evidently” true or false (deciding the question in the usual sense), is now actually impossible, because set theorists are now very conversant with universes in which each alternative holds, and understand very well that neither alternative is “self-evidently” true (the force of his argument is really that the complementary conclusion that one of the alternatives is self-evidently false is now impossible to draw, because we are too well acquainted with actual “worlds” in which each alternative holds to believe that either is absurd).

We could write an entire essay on questions raised in our summary in the previous paragraph, but Hamkins has already done this in Hamkins 2012. Our aim here is to summarize the tentative axioms that Hamkins presents for the multiverse conception. This is not really a formal set of axioms, but it does have some of the qualities of an axiomatization of an alternative set theory. We note that the list of axioms presented here unavoidably presupposes more knowledge of advanced set theory than other parts of this article.

Realizability Principle: For any universe \(V\), if \(W\) is a model of set theory and definable or interpreted in \(V\), then \(W\) is a universe.

One thing to note here is that Hamkins is open to the idea that some universes may be models of theories other than ZFC (weaker theories such as Zermelo set theory or Peano arithmetic, or even different theories such as ZFA or NF/NFU). But it appears to be difficult philosophically to articulate exact boundaries for what counts as a “concept of set theory” which would define a universe. And this is fine, because there is no notion of “the multiverse” of universes as a completed totality here at all; this would amount to smuggling in the single Platonic universe again through the back door! Some of the axioms which follow do presume that the universes discussed are models of ZFC or very similar theories.

Forcing Extension Principle: For any universe \(V\) and any forcing notion \(P\) in \(V\), there is a forcing extension \(V[G]\), where \(G \subset P\) is \(V\)-generic.

This asserts that our forcing extensions are concretely real worlds. Hamkins discusses the metaphysical difficulties of the status of forcing extensions at length in Hamkins 2012.

Reflection Axiom: For every universe \(V\), there is a much taller universe \(W\) with an ordinal \(\theta\) for which \(V\) is elementarily equivalent to (or isomorphic to) \(W_{\theta}\), a level of the cumulative hierarchy in \(W\).

We quote Hamkins:

the principle asserts that no universe is correct about the height of the ordinals, and every universe looks like an initial segment of a much taller universe having the same truths. (2012: 438)

Here we are presuming that the universes we are talking about are models of ZFC or a ZFC-like theory.

Countability Principle: Every universe \(V\) is countable from the perspective of another, better universe \(W\).

This definitely has the flavor of an alternative set theory axiom! The model theoretic motivation is obvious: this amounts to taking Skolem’s paradox seriously. Hamkins notes that the Forcing Extension principle above already implies this, but it is clear in any case that his list of tentative axioms is intended to be neither independent nor complete.

Well-foundedness Mirage: Every universe \(V\) is ill-founded from the perspective of another, better universe.

Hamkins says that this may be the most provocative of all his axioms. He states that he intends this to imply that even our notion of natural numbers is defective in any universe: the collection of natural numbers as defined in any universe is seen to contain nonstandard elements from the standpoint of a further universe.

Reverse Embedding Axiom: For every universe \(V\) and every embedding \(j : V \rightarrow M\) in \(V\), there is a universe \(W\) and embedding \(h: W \rightarrow V\) such that \(j\) is the iterate of \(h\).

We merely quote this astonishing assertion, which says that for any elementary embedding of a universe \(V\) into a model \(M\) included in \(V\), our understanding of this embedding locally to \(V\) itself is seriously incomplete.

Absorption into L: Every universe \(V\) is a countable transitive model in another universe \(W\) satisfying \(V = L\).

We are used to thinking of the constructible universe \(L\) as a “restricted” universe. Here Hamkins turns this inside out (he discusses at length why this is a reasonable way to think in the paper Hamkins 2012).

We leave it to the reader who is interested to pursue this further.

9. Small Set Theories

It is commonly noted that set theory produces far more superstructure than is needed to support classical mathematics. In this section, we describe two miniature theories which purport to provide enough foundations without nearly as much superstructure. Our “pocket set theory” (motivated by a suggestion of Rudy Rucker) is just small; Vopenka’s alternative set theory is also “nonstandard” in its approach.

9.1 Pocket set theory

This theory is a proposal of ours, which elaborates on a suggestion of Rudy Rucker. We (and many others) have observed that of all the orders of infinity in Cantor’s paradise, only two actually occur in classical mathematical practice outside set theory: these are \(\aleph_0\) and \(c\), the infinity of the natural numbers and the infinity of the continuum. Pocket set theory is a theory motivated by the idea that these are the only infinities (Vopenka’s alternative set theory also has this property, by the way).

The objects of pocket set theory are classes. A class is said to be a set iff it is an element (as in the usual class theories over ZFC).

The ordered pair is defined using the usual Kuratowski definition, but without assuming that there are any ordered pairs. The notions of relation, function, bijection and equinumerousness are defined as usual (still without any assumptions as to the existence of any ordered pairs). An infinite set is defined as a set which is equinumerous with one of its proper subsets. A proper class is defined as a class which is not a set.

The axioms of pocket set theory are

Extensionality: Classes with the same elements are equal.

Class Comprehension: For any formula \(\phi\), there is a class \(\{x \mid \phi(x)\}\) which contains all sets \(x\) such that \(\phi(x)\). (Note that this is the class comprehension axiom of Kelley-Morse set theory, without any restrictions on quantifiers in \(\phi\).)

Infinite Sets: There is an infinite set; all infinite sets are the same size.

Proper Classes: All proper classes are the same size, and any class the same size as a proper class is proper.
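Since the proofs below turn on the exact form of these axioms, it may help to see them in symbols. The following is a sketch of our own: \(\mathrm{Set}(X)\), \(\mathrm{Inf}(X)\) (“\(X\) is equinumerous with one of its proper subsets”) and \(X \sim Y\) (“there is a bijection between \(X\) and \(Y\)”) are abbreviations introduced here for illustration only.

\[
\begin{aligned}
\text{Sethood:}\quad & \mathrm{Set}(X) \equiv \exists Y\,(X \in Y)\\
\text{Class Comprehension:}\quad & \forall x\,\bigl(x \in \{x \mid \phi(x)\} \equiv (\mathrm{Set}(x) \wedge \phi(x))\bigr)\\
\text{Infinite Sets:}\quad & \exists I\,(\mathrm{Set}(I) \wedge \mathrm{Inf}(I)) \;\wedge\; \forall I\,\forall J\,\bigl[(\mathrm{Set}(I) \wedge \mathrm{Inf}(I) \wedge \mathrm{Set}(J) \wedge \mathrm{Inf}(J)) \rightarrow I \sim J\bigr]\\
\text{Proper Classes:}\quad & \forall X\,\forall Y\,\bigl[\neg\mathrm{Set}(X) \rightarrow (\neg\mathrm{Set}(Y) \equiv X \sim Y)\bigr]
\end{aligned}
\]

The last line packages both halves of the Proper Classes axiom: from left to right it says that all proper classes are the same size, and from right to left that anything the same size as a proper class is proper.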

We cannot resist proving the main results (because the proofs are funny).

Empty Set: If the empty set were a proper class, then all proper classes would be empty. In particular, the Russell class would be empty. Let \(I\) be an infinite set. \(\{I\}\) would be a set, because it is not empty, and \(\{I,\{I\}\}\) would be a set (again because it is not empty). But \(\{I,\{I\}\}\) belongs to the Russell class (as a set with two elements, it cannot be either the Dedekind infinite \(I\) or the singleton \(\{I\}\)), contradicting the emptiness of the Russell class. So \(\varnothing\) is a set.

Singleton: If any singleton \(\{x\}\) is a proper class, then all singletons are proper classes, and the Russell class is a singleton. \(\{I, \varnothing \}\) is a set (both elements are sets, and the class is not a singleton) which cannot be a member of itself, and so is in the Russell class. But \(\varnothing\) is also in the Russell class; so the Russell class is not a singleton, and all singletons are sets.

Unordered Pair: The Russell class is not a pair, because it has three distinct elements \(\varnothing\), \(\{\varnothing\}\), \(\{\{\varnothing\}\}\); so, by the same reasoning as for singletons, all unordered pairs are sets.

Relations: All Kuratowski ordered pairs exist, so all definable relations are realized as set relations.

Cantor’s theorem (no set is the same size as the class of its subsets) and the Schröder-Bernstein theorem (if there are injections from each of two classes into the other, there is a bijection between them) have their standard proofs.

The Russell class can be shown to be the same size as the universe using Schröder-Bernstein: the injection from \(R\) into \(V\) is obvious, and \(V\) can be embedded into \(R\) using the map \(x \mapsto \{\{x\}, \varnothing \}\) (clearly no set \(\{\{x\}, \varnothing \}\) belongs to itself). So a class is proper iff it is the same size as the universe (limitation of size).
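The two facts about the map \(x \mapsto \{\{x\}, \varnothing\}\) used here (its values lie in \(R\), and it is injective) can be checked directly. The following is our own spelling-out of the argument; it uses only Extensionality and the existence of singletons and pairs established above, and makes no appeal to Foundation.

\[
\begin{aligned}
\{\{x\},\varnothing\} \in \{\{x\},\varnothing\} &\;\Rightarrow\; \{\{x\},\varnothing\} = \{x\} \ \text{ or } \ \{\{x\},\varnothing\} = \varnothing\\
\{\{x\},\varnothing\} = \{x\} &\;\Rightarrow\; \{x\} = x \ \text{ and } \ \varnothing = x \;\Rightarrow\; \{x\} = \varnothing\\
\{\{x\},\varnothing\} = \{\{y\},\varnothing\} &\;\Rightarrow\; \{x\} = \{y\} \;\Rightarrow\; x = y
\end{aligned}
\]

Both alternatives in the first line are impossible: the second because \(\{\{x\},\varnothing\}\) is not empty, and the first because, by the second line, it would make \(\{x\}\) empty. The third line uses \(\{x\} \neq \varnothing\) to match \(\{x\}\) with \(\{y\}\) rather than with \(\varnothing\).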

Define the von Neumann ordinals as classes which are strictly well-ordered by membership. Each finite ordinal can be proved to be a set (because it is smaller than its successor and is a subclass of the Russell class). The class of all ordinals is not a set (but is the last ordinal), for the usual reasons, and so is the same size as the universe, and so the universe can be well-ordered.

There is an infinite ordinal, because there is an ordinal which can be placed in one-to-one correspondence with one’s favorite infinite set \(I\). Since there is an infinite ordinal, every finite ordinal is a set and the first infinite ordinal \(\omega\) is a set. It follows that all infinite sets are countably infinite.

The power set of an infinite set \(I\) is not the same size as \(I\) by Cantor’s theorem, is certainly infinite, and so cannot be a set, and so must be the same size as the universe. It follows by usual considerations that the universe is the same size as \(\wp(\omega)\) or as \(\mathbf{R}\) (the set of real numbers, defined in any of the usual ways), and its “cardinal” is \(c\). Further, the first uncountable ordinal \(\omega_1\) is the cardinality of the universe, so the Continuum Hypothesis holds.
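The identifications made in this paragraph can be summarized in one line. This is a sketch only: we write \(|X|\) informally for the size of a class \(X\), and \(On\) for the class of all ordinals, which the theory itself treats as the last ordinal.

\[
c = |\wp(\omega)| = |V| = |On| = \aleph_1
\]

The second equality holds because \(\wp(\omega)\) is a proper class, and the third because \(On\) is a proper class. The last holds because every ordinal which is a set is finite or countable, so that \(On\) is itself the first uncountable ordinal \(\omega_1\).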

It is well-known that coding tricks allow one to do classical mathematics without ever going above cardinality \(c\): for example, the class of all functions from the reals to the reals is too large to be even a proper class here, but the class of continuous functions is of cardinality \(c\). An individual continuous function \(f\) might seem to be a proper class, but it can be coded as a hereditarily countable set by (for example) letting the countable set of pairs of rationals \(\langle p, q\rangle\) such that \(p \lt f(q)\) code the function \(f\). In fact, it is claimed that most of classical mathematics can be carried out using just natural numbers and sets of natural numbers (second-order arithmetic) or in even weaker systems, so pocket set theory (having the strength of third order arithmetic) can be thought to be rather generous.
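To see that nothing is lost by the coding of a continuous function \(f\) mentioned above, one can write down the code and the decoding explicitly. The following is a sketch; the name \(C_f\) is ours, and \(\mathbf{Q}\) denotes the rationals.

\[
\begin{aligned}
C_f &= \{\langle p, q\rangle \in \mathbf{Q} \times \mathbf{Q} \mid p \lt f(q)\}\\
f(q) &= \sup\{p \in \mathbf{Q} \mid \langle p, q\rangle \in C_f\} \quad \text{for } q \in \mathbf{Q}\\
f(x) &= \lim_{q \to x,\ q \in \mathbf{Q}} f(q) \quad \text{for } x \in \mathbf{R}
\end{aligned}
\]

The second line recovers the values of \(f\) at rational arguments from the countable set \(C_f\), and the third line, which is where continuity is used, recovers \(f\) everywhere else.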

We do remark that it is not necessarily the case that the hypothetical advocate of pocket set theory thinks that the universe is small; he or she might instead think that the continuum is very large…

9.2 Vopenka’s alternative set theory

Petr Vopenka has presented the following alternative set theory (1979).

The theory has sets and classes. The following axioms hold of sets.

Extensionality: Sets with the same elements are the same.

Empty set: \(\varnothing\) exists.

Successor: For any sets \(x\) and \(y\), \(x \cup \{y\}\) exists.

Induction: Every formula \(\phi\) expressed in the language of sets only (all parameters are sets and all quantifiers are restricted to sets) and true of \(\varnothing\) and true of \(x \cup \{y\}\) if it is true of \(x\) is true of all sets.

Regularity: Every nonempty set has an element disjoint from it.
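The Induction axiom is easier to parse when displayed. The following is our own transcription, with one instance for each formula \(\phi\) of the language of sets, all quantifiers ranging over sets:

\[
\bigl[\phi(\varnothing) \wedge \forall x\,\forall y\,(\phi(x) \rightarrow \phi(x \cup \{y\}))\bigr] \rightarrow \forall x\,\phi(x)
\]

This is induction along the generation of sets from \(\varnothing\) by repeated adjunction of single elements, which is exactly how the hereditarily finite sets are built up.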

The theory of sets appears to be the theory of \(V_{\omega}\) (the hereditarily finite sets) in the usual set theory!

We now pass to consideration of classes.

Existence of classes: If \(\phi(x)\) is any formula, then the class \(\{x \mid \phi(x)\}\) of all sets \(x\) such that \(\phi(x)\) exists. (The set \(x\) is identified with the class of elements of \(x\).) Note that Kuratowski pairs of sets are sets, and so we can define (class) relations and functions on the universe of sets much as usual.

Extensionality for classes: Classes with the same elements are equal.

Definition: A semiset is a subclass of a set. A proper class is a class which is not a set. A proper semiset is a subclass of a set which is not a set.

Axiom of proper semisets: There is a proper semiset.

A proper semiset is a signal that the set which contains it is nonstandard (recall that all sets seem to be hereditarily finite!).

Definition: A set is finite iff all of its subclasses are sets.

A finite set has standard size (the use of “finite” here could be confusing: all sets are nonstandard finite here, after all).

Definition: An ordering of type \(\omega\) is a class well-ordering which is infinite and all of whose initial segments are finite. A class is countable if it has an ordering of type \(\omega\).

An ordering of type \(\omega\) has the same length as the standard natural numbers. We can prove that there is such an ordering: consider the order on the finite (i.e., standard finite) von Neumann ordinals. There must be infinite von Neumann ordinals because there is a set theoretically definable bijection between the von Neumann ordinals and the whole universe of sets: any proper semiset can be converted to a proper semiset of a set of von Neumann ordinals.

Prolongation axiom: Each countable function \(F\) can be extended to a set function.

The Prolongation Axiom has a role similar to that of the Idealization Axiom in the “nonstandard” set theory IST above, though the analogy between the alternative set theory and IST is far from perfect.

Vopenka considers representations of superclasses of classes using relations on sets. A class relation \(R\) on a class \(A\) is said to code the superclass of inverse images of elements of \(A\) under \(R\). A class relation \(R\) on a class \(A\) is said to extensionally code this superclass if distinct elements of \(A\) have distinct preimages. He “tidies up” the theory of such codings by adopting the

Axiom of extensional coding: Every collection of classes which is codable is extensionally codable.

It is worth noting that this can be phrased in a way which makes no reference to superclasses: for any class relation \(R\), there is a class relation \(R'\) such that for any \(x\) there is \(x'\) with preimage under \(R'\) equal to the preimage of \(x\) under \(R\), and distinct elements of the field of \(R'\) have distinct preimages.

His notion of coding is more general: we can further code collections of classes by taking a pair \(\langle K, R\rangle\) where \(K\) is a subclass of the field of \(R\); clearly any collection of classes codable in this way can be extensionally coded by using the axiom in the form we give.

The final axiom is

Axiom of cardinalities: If two classes are uncountable, they are the same size.

This implies (as in pocket set theory) that there are two infinite cardinalities, which can be thought of as \(\aleph_0\) and \(c\), though in this context their behavior is less familiar than it is in pocket set theory. For example, the set of all natural numbers (as Vopenka defines it) is of cardinality \(c\), while there is an initial segment of the natural numbers (the finite natural numbers) which has the expected cardinality \(\omega\).

One gets the axiom of choice from the axioms of cardinalities and extensional codings; the details are technical. One might think that this would go as in pocket set theory: the order type of all the ordinals is not a set and so has the same cardinality as the universe. But this doesn’t work here, because the “ordinals” in the obvious sense are all nonstandard finite ordinals, which, from a class standpoint, are not well-ordered at all. However, there is a devious way to code an uncountable well-ordering using the axiom of extensional coding, and since its domain is uncountable it must be the same size as the universe.

This is a rather difficult theory. A model of the alternative set theory in the usual set theory is a nonstandard model of \(V_{\omega}\) of size \(\omega_1\) in which every countable external function extends to a function in the model. It might be best to suppose that this model is constructed inside \(L\) (the constructible universe) so that the axiom of cardinalities will be satisfied. The axiom of extensional coding follows from Choice in the ambient set theory.

The constructions of the natural numbers and the real numbers with which we started go much as usual, except that we get two kinds of natural numbers (the finite von Neumann ordinals in the set universe (nonstandard), and the finite von Neumann set ordinals (standard)). The classical reals can be defined as Dedekind cuts in the standard rationals; these are not sets, but any real can then be approximated by a nonstandard rational. One can proceed to do analysis with some (but not quite all) of the tools of the usual nonstandard analysis.

10. Double Extension Set Theory: A Curiosity

A recent proposal of Andrzej Kisielewicz (1998) is that the paradoxes of set theory might be evaded by having two different membership relations \(\in\) and \(\varepsilon\), with each membership relation used to define extensions for the other.

We present the axiomatics. The primitive notions of this theory are equality \((=)\) and the two flavors \(\in\) and \(\varepsilon\) of membership. A formula \(\phi\) is uniform if it does not mention \(\varepsilon\). If \(\phi\) is a uniform formula, \(\phi^*\) is the corresponding formula with \(\in\) replaced by \(\varepsilon\) throughout. A set \(A\) is regular iff it has the same extension with respect to both membership relations: \(x \in A \equiv x \varepsilon A\).

The comprehension axiom asserts that for any uniform formula \(\phi(x)\) in which all parameters (free variables other than \(x\)) are regular, there is an object \(A\), for which we use the notation \(\{x \mid \phi(x)\}\), such that \(\forall x ((x \in A \equiv \phi^*) \wedge (x \varepsilon A \equiv \phi))\).

The extensionality axiom asserts that for any \(A\) and \(B\), \(\forall x(x \in A \equiv x \varepsilon B) \rightarrow A = B\). Notice that any object to which this axiom applies is regular.

Finally, a special axiom asserts that any set one of whose extensions is included in a regular set is itself regular.
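To see how the two membership relations interact, consider the Russell-style term \(R = \{x \mid x \not\in x\}\); this is our own illustration, not part of Kisielewicz’s presentation. The formula \(x \not\in x\) is uniform and has no parameters, so comprehension applies, and instantiating \(x\) with \(R\) in the two halves of the comprehension axiom gives

\[
R \in R \;\equiv\; \neg (R \mathrel{\varepsilon} R), \qquad R \mathrel{\varepsilon} R \;\equiv\; \neg (R \in R).
\]

These two statements are equivalent to one another and are not contradictory: they say only that \(R\) belongs to exactly one of its own two extensions. In particular \(R\) is not regular, which is why it cannot in turn be used as a parameter in comprehension.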

This theory can be shown to interpret ZF in the realm of hereditarily regular sets. Formally, the proof has the same structure as the proof for Ackermann set theory. It is unclear whether this theory is actually consistent; natural ways to strengthen it (including the first version proposed by Kisielewicz) turn out to be inconsistent. It is also extremely hard to think about!

An example of the curious properties of this theory is that the ordinals under one membership relation are exactly the regular ordinals while under the other they are longer; this means that the apparent symmetry between the two membership relations breaks!

11. Conclusion

We have presented a wide range of theories here. The theories motivated by essentially different views of the realm of mathematics (the constructive theories and the theories which support nonstandard analysis) we set to one side. Similarly, the theories motivated by the desire to keep the universe small can be set to one side. The alternative classical set theories which support a fluent development of mathematics seem to be ZFC or its variants with classes (including Ackermann), NFU + Infinity + Choice with suitable strong infinity axioms (to get s.c. sets to behave nicely), and the positive set theory of Esser. Any of these is adequate for the purpose, in our opinion, including the one currently in use. There is no compelling reason for mathematicians to use a different foundation than ZFC; but there is a good reason for mathematicians who have occasion to think about foundations to be aware that there are alternatives; otherwise there is a danger that accidental features of the dominant system of set theory will be mistaken for essential features of any foundation of mathematics. For example, it is frequently said that the universal set (an extension which is actually trivially easy to obtain in a weak set theory) is an inconsistent totality; the actual situation is merely that one cannot have a universal set while assuming Zermelo’s axiom of separation.

Bibliography

  • Aczel, Peter, 1978, “The Type Theoretic Interpretation of Constructive Set Theory”, in A. MacIntyre, L. Pacholski, J. Paris (eds.), Logic Colloquium ’77, (Studies in Logic and the Foundations of Mathematics, 96), Amsterdam: North-Holland, pp. 55–66. doi:10.1016/S0049-237X(08)71989-X
  • –––, 1982, “The Type Theoretic Interpretation of Constructive Set Theory: Choice Principles”, in A.S. Troelstra and D. van Dalen (eds.), The L.E.J. Brouwer Centenary Symposium, (Studies in Logic and the Foundations of Mathematics, 110), Amsterdam: North-Holland, pp. 1–40. doi:10.1016/S0049-237X(09)70120-X
  • –––, 1986, “The Type Theoretic Interpretation of Constructive Set Theory: Inductive Definitions”, in Ruth Barcan Marcus, Georg J.W. Dorn, and Paul Weingartner (eds.), Logic, Methodology, and Philosophy of Science VII, (Studies in Logic and the Foundations of Mathematics, 114), Amsterdam: North-Holland, pp. 17–49. doi:10.1016/S0049-237X(09)70683-4
  • –––, 1988, Non-Well-Founded Sets (CSLI Lecture Notes, 14), Stanford: CSLI Publications.
  • St. Augustine, De Civitate Dei, Book 12, chapter 18.
  • Barwise, Jon, 1975, Admissible Sets and Structures: An Approach to Definability Theory, (Perspectives in Mathematical Logic, 7), Berlin: Springer-Verlag.
  • Boffa, M., 1988, “ZFJ and the Consistency Problem for NF”, Jahrbuch der Kurt Gödel Gesellschaft, Vienna, pp. 102–106.
  • Burali-Forti, C., 1897, “Una questione sui numeri transfiniti”, Rendiconti del Circolo matematico di Palermo, 11(1): 154–164. A correction appears in “Sulle classi ben ordinate”, Rendiconti del Circolo matematico di Palermo, 11(1): 260. It is not clear that Burali-Forti ever correctly understood his paradox. doi:10.1007/BF03015911 and doi:10.1007/BF03015919
  • Cantor, Georg, 1872, “Über die Ausdehnung eines Satzes aus der Theorie der trigonometrischen Reihen”, Mathematische Annalen, 5: 123–32.
  • –––, 1891, “Über eine elementare Frage der Mannigfaltigkeitslehre”, Jahresbericht der deutschen Mathematiker-Vereinigung, 1: 75–8.
  • Cocchiarella, Nino B., 1985, “Frege’s Double-Correlation Thesis and Quine’s Set Theories NF and ML”, Journal of Philosophical Logic, 14(1): 1–39. doi:10.1007/BF00542647
  • Crabbé, Marcel, 1982, “On the Consistency of an Impredicative Subsystem of Quine’s NF”, Journal of Symbolic Logic, 47(1): 131–36. doi:10.2307/2273386
  • –––, 2016, “NFSI is not included in NF3”, Journal of Symbolic Logic, 81(3): 948–950. doi:10.1017/jsl.2015.29
  • Dedekind, Richard, 1872, Stetigkeit und irrationale Zahlen, Braunschweig: Friedrich Vieweg und Sohn (second edition, 1892).
  • Esser, Olivier, 1999, “On the Consistency of a Positive Theory”, Mathematical Logic Quarterly, 45(1): 105–116. doi:10.1002/malq.19990450110
  • Feferman, Sol, 2006, “Enriched Stratified Systems for the Foundations of Category Theory” in Giandomenico Sica (ed.), What is Category Theory?, Milan: Polimetrica. [Feferman 2006 preprint available online (PDF)]
  • Frege, Gottlob, 1884, Die Grundlagen der Arithmetik, English translation by J.L. Austin, The Foundations of Arithmetic, Oxford: Blackwell, 1974.
  • Friedman, Harvey, 1973, “Some Applications of Kleene’s Methods for Intuitionistic Systems”, in A.R.D. Mathias and H. Rogers (eds.), Cambridge Summer School in Mathematical Logic, (Lecture Notes in Mathematics, 337), Berlin: Springer-Verlag, pp. 113–170. doi:10.1007/BFb0066773
  • Grishin, V.N., 1969, “Consistency of a Fragment of Quine’s NF System”, Soviet Mathematics Doklady, 10: 1387–1390.
  • Hallett, Michael, 1984, Cantorian Set Theory and Limitation of Size, Oxford: Clarendon, pp. 280–286.
  • Hamkins, Joel David, 2012, “The Set-Theoretic Multiverse”, Review of Symbolic Logic, 5(3): 416–449. doi:10.1017/S1755020311000359
  • Holmes, M. Randall, 1998, Elementary Set Theory with a Universal Set, (Cahiers du Centre de logique, 10), Louvain-la-Neuve: Academia. (See chapter 20 for the discussion of well-founded extensional relation types.) [Holmes 1998 revised and corrected version available online (PDF)]
  • –––, 2012, “The Usual Model Construction for NFU Preserves Information”, Notre Dame Journal of Formal Logic, 53(4): 571–580. doi:10.1215/00294527-1722764
  • Jensen, Ronald Bjorn, 1968, “On the Consistency of a Slight (?) Modification of Quine’s ‘New Foundations’”, Synthese, 19(1): 250–63. doi:10.1007/BF00568059
  • Kisielewicz, Andrzej, 1998, “A Very Strong Set Theory?”, Studia Logica, 61(2): 171–178. doi:10.1023/A:1005048329677
  • Kuratowski, Casimir [Kazimierz], 1921, “Sur la notion de l’ordre dans la Théorie des Ensembles”, Fundamenta Mathematicae, 2(1): 161–171. [Kuratowski 1921 available online]
  • Lévy, Azriel, 1959, “On Ackermann’s Set Theory”, Journal of Symbolic Logic, 24(2): 154–166. doi:10.2307/2964757
  • Mac Lane, Saunders, 1986, Mathematics, Form and Function, Berlin: Springer-Verlag.
  • Mathias, A.R.D., 2001a, “The Strength of Mac Lane Set Theory”, Annals of Pure and Applied Logic, 110(1–3): 107–234. doi:10.1016/S0168-0072(00)00031-2
  • –––, 2001b, “Slim Models of Zermelo Set Theory”, The Journal of Symbolic Logic, 66(2): 487–496. doi:10.2307/2695026
  • McLarty, Colin, 1992, “Failure of Cartesian Closedness in NF”, Journal of Symbolic Logic, 57(2): 555–6. doi:10.2307/2275291
  • Nelson, Edward, 1977, “Internal Set Theory, a New Approach to Nonstandard Analysis”, Bulletin of the American Mathematical Society, 83(6): 1165–1198. doi:10.1090/S0002-9904-1977-14398-X
  • Quine, W.V.O., 1937, “New Foundations for Mathematical Logic”, American Mathematical Monthly, 44(2): 70–80. doi:10.2307/2300564
  • –––, 1945, “On Ordered Pairs”, Journal of Symbolic Logic, 10(3): 95–96. doi:10.2307/2267028
  • Reinhardt, William N., 1970, “Ackermann’s Set Theory Equals ZF”, Annals of Mathematical Logic, 2(2): 189–249. doi:10.1016/0003-4843(70)90011-2
  • Robinson, Abraham, 1966, Non-standard Analysis, Amsterdam: North-Holland.
  • Rosser, J. Barkley, 1973, Logic for Mathematicians, second edition, New York: Chelsea.
  • Russell, Bertrand, 1903, The Principles of Mathematics, London: George Allen and Unwin.
  • Specker, Ernst P., 1953, “The Axiom of Choice in Quine’s ‘New Foundations for Mathematical Logic’”, Proceedings of the National Academy of Sciences of the United States of America, 39(9): 972–5. [Specker 1953 available online]
  • Spinoza, Benedict de, 1677, Ethics, reprinted and translated in A Spinoza Reader: the “Ethics” and Other Works, Edwin Curley (ed. and trans.), Princeton: Princeton University Press, 1994.
  • Tupailo, Sergei, 2010, “Consistency of Strictly Impredicative NF and a Little More …”, Journal of Symbolic Logic, 75(4): 1326–1338. doi:10.2178/jsl/1286198149
  • Vopěnka, Petr, 1979, Mathematics in the Alternative Set Theory, Leipzig: Teubner-Verlag.
  • Wang, Hao, 1970, Logic, Computers, and Sets, New York: Chelsea, p. 406.
  • Whitehead, Alfred North and Bertrand Russell, [PM] 1910–1913, Principia Mathematica, 3 volumes, Cambridge: Cambridge University Press.
  • Wiener, Norbert, 1914, “A Simplification of the Logic of Relations”, Proceedings of the Cambridge Philosophical Society, 17: 387–390.
  • Zermelo, Ernst, 1908, “Untersuchungen über die Grundlagen der Mengenlehre I”, Mathematische Annalen, 65: 261–281.

Copyright © 2021 by
M. Randall Holmes <rholmes@boisestate.edu>

The Stanford Encyclopedia of Philosophy is copyright © 2025 by The Metaphysics Research Lab, Department of Philosophy, Stanford University

Library of Congress Catalog Data: ISSN 1095-5054

