Proof theory is not an esoteric technical subject that was invented to support a formalist doctrine in the philosophy of mathematics; rather, it has been developed as an attempt to analyze aspects of mathematical experience and to isolate, possibly overcome, methodological problems in the foundations of mathematics. The origins of those problems, forcefully and sometimes contentiously formulated in the 1920s, are traceable to the transformation of mathematics in the nineteenth century: the emergence of abstract mathematics, its reliance on set theoretic notions, and its focus on logic in a broad, foundational sense. Substantive issues came to the fore already in the mathematical work and the foundational essays of Dedekind and Kronecker; they concerned the legitimacy of undecidable concepts, the existence of infinite mathematical objects, and the sense of non-constructive proofs of existential statements.
In an attempt to mediate between conflicting foundational positions, Hilbert shifted issues, already around 1900, from a mathematical to a vaguely conceived metamathematical level. That approach was rigorously realized in the 1920s, when he took advantage of the possibility of formalizing mathematics in deductive systems and investigated the underlying formal frames from a strictly constructive, “finitist” standpoint. Hilbert’s approach raised fascinating metamathematical questions—from semantic completeness through mechanical decidability to syntactic incompleteness; however, the hoped-for mathematical resolution of the foundational issues was not achieved. The failure of his finitist consistency program raised and deepened equally fascinating methodological questions. A broadened array of problems with only partial solutions has created a vibrant subject that spans computational, mathematical, and philosophical issues—with a rich history.
The main part of our article covers these exciting investigations for an expanded Hilbert Program through 1999—with special, detailed attention to results and techniques that by now can be called “classical” and are of continued interest. Newer, but still closely connected developments are sketched in Appendices: the proof theory of set theories in Appendix D, combinatorial independence results in Appendix E, and provably total functions in Appendix F. Here (infinitary) sequent calculi and suitable systems of ordinal notations are crucial proof theoretic tools. However, in section 4.2 we also discuss Gödel’s Dialectica Interpretation and some of its extensions as an alternative for obtaining relative consistency proofs, and in section 5.2.1 we describe the systematic attempt of completing the incomplete through recursive progressions. Both topics are analyzed further in Appendix C.2 and Appendix B, respectively. To complete this bird’s eye view of our article, we mention that the Epilogue, section 6, not only indicates further proof theoretic topics, but also some directions of current research that are connected to proof theory and of deep intrinsic interest. We have tried to convey the vibrancy of a subject that thrives on concrete computational and (meta-)mathematical work, but also invites and is grounded in general philosophical reflection.
Appendices
We have altogether six appendices that elaborate historical, mathematical and quite “technical” proof-theoretic matters:
Hilbert viewed the axiomatic method as the crucial tool for mathematics (and rational discourse in general). In a talk to the Swiss Mathematical Society in 1917, published the following year as Axiomatisches Denken (1918), he articulates his broad perspective on that method and presents it “at work” by considering, in detail, examples from various parts of mathematics and also from physics. Proceeding axiomatically is not just developing a subject in a rigorous way from first principles, but rather requires, for advanced subjects, their deeper conceptual organization and serves, for newer investigations, as a tool for discovery. In his talk Hilbert reflects on his investigations of the arithmetic of real numbers and of Euclidean geometry from before 1900. We emphasize the particular form of his axiomatic formulations; they are not logical formulations, but rather mathematical ones: he defines Euclidean space in a similar way as other abstract notions like group or field; that’s why we call it structural axiomatics.[1] However, Hilbert turns from the past and looks to the future, pointing out a programmatic direction for research in the foundations of mathematics; he writes:
To conquer this field [concerning the foundations of mathematics] we must turn the concept of a specifically mathematical proof itself into an object of investigation, just as the astronomer considers the movement of his position, the physicist studies the theory of his apparatus, and the philosopher criticizes reason itself.
He then asserts, “The execution of this program is at present still an unsolved task”. During the following winter term 1917–18, Hilbert—with the assistance of Paul Bernays—gave a lecture course entitled Prinzipien der Mathematik. Here modern mathematical logic is invented in one fell swoop and completes the shift from structural to formal axiomatics. This dramatic shift allows the constructive, elementary definition of the syntax of theories and, in particular, of the concept of proof in a formal theory. This fundamental insight underpins the articulation of the consistency problem and seems to open a way of proving, meta-mathematically, that no proof in a formal theory establishes a contradiction.
That perspective is formulated first in a talk Bernays presented in the fall of 1921, published as “Über Hilberts Gedanken zur Grundlegung der Mathematik” (1922). Starting with a discussion of structural axiomatics and pointing out its assumption of a system of objects that satisfies the axioms, he asserts this assumption contains “something so-to-speak transcendent for mathematics”. He raises the question, “which principled position should be taken with respect to it?” Bernays believes that it might be perfectly coherent to appeal to an intuitive grasp of the natural number sequence or even of the manifold of real numbers. However, that could not be an intuition in any primitive sense and would conflict with the tendency of the exact sciences to use only restricted means to acquire knowledge.
Under this perspective we are going to try whether it is not possible to give a foundation to these transcendent assumptions in such a way that only primitive intuitive knowledge is used. (Bernays 1922: 11)
Meaningful mathematics is to be based, Bernays demands, on primitive intuitive knowledge that includes, however, induction concerning natural numbers—both as a proof and definition principle. In the outline for the lectures Grundlagen der Mathematik to be given in the winter term 1921–22, Bernays discusses, a few weeks after his talk, “constructive arithmetic” and then the “broader formulation of constructive thought”:
Construction of the proofs, by means of which the formalization of the higher inferences is made possible and the consistency problem is becoming accessible in a general way.
Bernays concludes the outline by suggesting, “This would be followed by the development of proof theory”. The outline was sent to Hilbert on 17 October 1921 and it was followed strictly in the lectures of the following term—with only one terminological change: “constructive” in the above formulations is turned into “finitist”.[2]
Bernays’s notes of the 1921/22 lectures reflect the consequence of that change in attitude. They contain a substantial development of “finitist arithmetic” and “finitist logic” in a quantifier-free formalism. Finitist arithmetic involves induction and primitive recursion[3] from the outset, and the central metamathematical arguments all proceed straightforwardly by induction on proof figures. The third part of these lectures is entitled The grounding of the consistency of arithmetic by Hilbert’s new proof theory. Here we find the first significant consistency proof—from a finitist perspective.[4] The proof is sketched in Hilbert’s Leipzig talk (Hilbert 1923: 184) and was presented in a modified form during the winter term of 1922/23; in that form the proof is given in Ackermann 1925: 4–7. Ackermann’s article was submitted for publication in early 1924, and by then the proof had taken on a certain canonical form that is still found in the presentation of Hilbert and Bernays 1934: 220–230. Let us see what was achieved by following Ackermann’s concise discussion.
The proof is given in section II of Ackermann’s paper entitled, tellingly, The consistency proof before the addition of the transfinite axioms. Ackermann uses a logical calculus in axiomatic form that is taken over from Hilbert’s lectures and is discussed below in section 2. Here we just note that it involves two logical rules, namely, modus ponens and substitution (for individual, function and statement variables) in axioms. The non-logical axioms concern identity, zero and successor, and recursion equations that define primitive recursive functions. The first step in the argument turns the linear proof into a tree, so that any formula occurrence is used at most once as a premise of an inference (Auflösung in Beweisfäden); this is done in preparation for the second step, namely, the elimination of all (necessarily free) variables (Ausschaltung der Variablen); in the third step, the numerical value of the closed terms is computed (Reduktion der Funktionale). The resulting syntactic configuration, a Beweisfigur, now contains only numerical formulae that are built up from equations or inequations between numerals and Boolean connectives; these formulae can be effectively determined to be true or false. By induction on the Beweisfigur one shows that all its component formulae are true; thus, a formula like \(0\ne 0\) is not provable. The induction principle can be directly incorporated into these considerations when it is formulated as a rule for quantifier-free statements. That was not done in Ackermann’s discussion of the proof, but had been achieved already by Hilbert and Bernays in their 1922/23 lectures.
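The last step above rests on the fact that the truth value of a numerical formula—a Boolean combination of equations and inequations between numerals—can be computed effectively. The following small sketch (our own illustration, not Ackermann’s procedure; the tuple representation is invented for this example) makes that decidability concrete.

```python
# Evaluate "numerical formulae": Boolean combinations of equations and
# inequations between numerals. Formulae are nested tuples, e.g.
# ('imp', ('eq', 0, 0), ('ne', 0, 1)) for 0=0 -> 0≠1.

def eval_formula(f):
    """Return the truth value of a closed numerical formula."""
    op = f[0]
    if op == 'eq':
        return f[1] == f[2]
    if op == 'ne':
        return f[1] != f[2]
    if op == 'not':
        return not eval_formula(f[1])
    if op == 'and':
        return eval_formula(f[1]) and eval_formula(f[2])
    if op == 'or':
        return eval_formula(f[1]) or eval_formula(f[2])
    if op == 'imp':
        return (not eval_formula(f[1])) or eval_formula(f[2])
    raise ValueError(f"unknown connective: {op}")

# 0=0 -> (S0=S0 and not 0=S0), with the closed terms already computed:
example = ('imp', ('eq', 0, 0), ('and', ('eq', 1, 1), ('not', ('eq', 0, 1))))
print(eval_formula(example))  # True
```

Since every component formula of a Beweisfigur is of this decidable kind, the inductive argument that all of them are true is itself finitistically unobjectionable.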
These proof theoretic considerations are striking and important as they involve for the first time genuine transformations of formal derivations. Nevertheless, they are preliminary as they concern a quantifier-free theory that is a part of finitist mathematics and need not be secured by a consistency proof. What has to be secured is “transfinite logic” with its “ideal elements”, as Hilbert put it later. The strategy was direct and started to emerge already in 1921. First, introduce functional terms by the transfinite axiom[5]
\[A(a) \to A(\varepsilon x\ldot A(x))\] and define quantifiers by
\[ \exists x A(x) \leftrightarrow A(\varepsilon x\ldot A(x)) \] and
\[\forall x A(x) \leftrightarrow A(\varepsilon x\ldot \neg A(x)).\] Using the epsilon terms, quantifiers can now be eliminated from proofs in quantificational logic, thus transforming them into quantifier-free ones. Finally, the given derivation allows one, so it was conjectured, to determine numerical values for the epsilon terms. In his Leipzig talk of September 1922, published in 1923, Hilbert discussed this Ansatz for eliminating quantifiers and reducing the transfinite case to that of the quantifier-free theory. He presented the actual execution of this strategic plan only “for the simplest case” (in Hilbert 1923: 1143–1144). However, the talk was crucial in the development of proof theory and the finitist program: “With the structure of proof theory, presented to us in the Leipzig talk, the principled form of its conception had been reached”. That is how Bernays characterizes its achievement in his essay on Hilbert’s investigations of the foundations of arithmetic (1935: 204).
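The idea of “determining numerical values for the epsilon terms” can be illustrated, in the simplest case, by a toy program. This is a hypothetical sketch only—the function name, the representation of critical formulae, and the update policy are all invented here; it is not Hilbert’s or Ackermann’s actual substitution method. One starts with the trial value 0 for a single term \(\varepsilon x\ldot A(x)\) and corrects it whenever some critical formula \(A(n) \to A(\varepsilon x\ldot A(x))\) occurring in the derivation comes out false.

```python
# Toy epsilon substitution for ONE epsilon term with a decidable matrix A.
# 'witnesses' lists the numerals n substituted in critical formulae
# A(n) -> A(eps) of a given derivation.

def solve_epsilon(A, witnesses):
    """Return a value e for the epsilon term making every critical
    formula A(n) -> A(e) true (n ranging over 'witnesses')."""
    e = 0                          # initial trial substitution
    changed = True
    while changed:
        changed = False
        for n in witnesses:
            if A(n) and not A(e):  # critical formula A(n) -> A(e) fails
                e = n              # correct the substitution, start over
                changed = True
                break
    return e

# Example: A(x) holds iff x is even and greater than 3;
# the numerals 1, 7, 6 occur in the (imagined) derivation.
A = lambda x: x % 2 == 0 and x > 3
e = solve_epsilon(A, [1, 7, 6])
print(e, A(e))  # 6 True
```

In this single-term case the process visibly terminates after at most one correction per witness; the difficulty of the general Ansatz lies in nested epsilon terms, where updates can invalidate one another.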
Ackermann continued in section III of his 1925 at the very spot where Hilbert and Bernays had left off. His paper, submitted to Mathematische Annalen in March of 1924, and the corrective work he did in 1925 led to the conviction that the consistency of elementary arithmetic had been established. The corrective work had been done to address difficulties von Neumann had pointed out, but was not published by Ackermann; it was only presented in the second volume of Hilbert and Bernays 1939 (pp. 93–130).[6] Von Neumann’s own proof theoretic investigations, submitted to Mathematische Zeitschrift in July 1925, were published under the title Zur Hilbertschen Beweistheorie in 1927. Hilbert’s 1928 Bologna Lecture prominently took Ackermann’s and von Neumann’s work as having established the consistency of elementary arithmetic, the proof making use only of finitist principles. Let F be a theory containing exclusively such principles, like primitive recursive arithmetic PRA; the principles of PRA (defined in more detail in note 3) consist of the Peano axioms for zero and successor, the defining equations for all primitive recursive functions, and induction for atomic formulas. Now the significance of a consistency proof in F can be articulated as follows:
Theorem 1.1 Let T be a theory that contains a modicum of arithmetic and let A be a \(\Pi^0_1\)-statement, i.e., one of the form \(\forall x_1\ldots\forall x_n\,P(x_1,\ldots,x_n)\) with quantifiers ranging over naturals and P a primitive recursive predicate, i.e., a predicate with a primitive recursive characteristic function. If F proves the consistency of T and T proves A, then F proves A.
This theorem can be expressed and proved in PRA and ensures that a T-proof of a “real”, finitistically meaningful statement A leads to a finitistically valid statement. This point is made clear in Hilbert’s 1927 Hamburg lecture (Hilbert 1927). There he takes A to be the Fermat proposition and argues that if we had a proof of A in a theory containing “ideal” elements, a finitist consistency proof for that theory would allow us to transform that proof into a finitist one.
The belief that Ackermann and von Neumann had established the consistency of elementary arithmetic was expressed as late as December 1930 by Hilbert in his third Hamburg talk (Hilbert 1931a) and by Bernays in April 1931 in a letter to Gödel (see Gödel 2003: 98–103). Bernays asserts there that he has “repeatedly considered [Ackermann’s modified proof] and viewed it as correct”. He continues, referring to Gödel’s incompleteness results,
On the basis of your results one must now conclude, however, that that proof cannot be formalized within the system Z [of elementary number theory]; this must in fact hold true even if the system is restricted so that, of the recursive definitions, only those for addition and multiplication are retained. On the other hand, I don’t see at which place in Ackermann’s proof the formalization within Z should become impossible, …
At the end of his letter, Bernays mentions that Herbrand misunderstood him in a recent conversation on which Herbrand had reported in a letter to Gödel with a copy to Bernays. Not only had Herbrand misunderstood Bernays, but Bernays had also misunderstood Herbrand as to the extent of the latter’s consistency result that was to be published a few months later as Herbrand 1931. Bernays understood Herbrand as having claimed that he had established the consistency of full first-order arithmetic: Herbrand’s system is indeed a first-order theory with a rich collection of finitist functions, but it uses the induction principle only for quantifier-free formulae.[7] Gödel asserted in December 1933 that this theorem of Herbrand’s was even then the strongest result that had been obtained in the pursuit of Hilbert’s finitist program, and he formulated the result in a beautiful informal way as follows:
If we take a theory, which is constructive in the sense that each existence assertion made in the axioms is covered by a construction, and if we add to this theory the non-constructive notion of existence and all the logical rules concerning it, e.g., the law of the excluded middle, we shall never get into any contradiction. (Gödel 1933: 52)
Gödel himself had been much more ambitious in early 1930; his goal was then to prove the consistency of analysis! According to Wang (1981: 654), his idea was “to prove the consistency of analysis by number theory, where one can assume the truth of number theory, not only the consistency”. The plan for establishing the consistency of analysis relative to number theory did not work out; instead, Gödel found that sufficiently strong formal theories like Principia Mathematica and Zermelo-Fraenkel set theory are (syntactically) incomplete.
In 1931 Gödel published a paper (1931a) that showed that there are true arithmetic statements that cannot be proved in the formal system of Principia Mathematica, assuming PM to be consistent. His methods not only applied to PM but to any formal system that contains a modicum of arithmetic. A couple of months after Gödel had announced this result at a conference in Königsberg in September 1930, von Neumann and Gödel independently realized that a striking corollary could be drawn from the incompleteness theorem. Every consistent and effectively axiomatized theory that allows for the development of basic parts of arithmetic cannot prove its own consistency. This came to be known as the second incompleteness theorem. (For details on these theorems and their history see appendix A.4.) The second incompleteness theorem refutes the general ambitions of Hilbert’s program under the sole and very plausible assumption that finitist mathematics is contained in one of the formal theories of arithmetic, analysis or set theory. As a matter of fact, contemporary characterizations of finitist mathematics have elementary arithmetic as an upper bound.[8] In response to Gödel’s result, Hilbert attempted in his last published paper (1931b) to formulate a strategy for consistency proofs that is reminiscent of his considerations in the early 1920s (when thinking about the object theories as constructive) and clearly extends the finitist standpoint. He introduced a broad constructive framework that includes full intuitionist arithmetic and suggested extendibility of the new “method” to “the case of function variables and even higher sorts of variables”. He also formulated a new kind of rule that allowed the introduction of a new axiom \(\forall x A(x)\) as soon as all the numerical instances \(A(n)\) had been established by finitist proofs; in 1931a that is done for quantifier-free \(A(x)\), whereas in 1931b that is extended to formulae of arbitrary complexity.
The semi-formal calculi, which articulate the broader framework, are based on rules that reflect mathematical practice, but also define the meaning of logical connectives. Indeed, Hilbert’s reasons for taking them to be evidently consistent are expressed in a single sentence: “All transfinite rules and inference schemata are consistent; for they amount to definitions”. Adding the tertium non datur in the form
\[{\forall x A(x)} \lor {\exists x \neg A(x)}\] now yields the classical version of the theory, and it is that addition that has to be justified.[9] Hilbert’s problematic considerations for this new metamathematical step inspired Gentzen’s “Urdissertation” when he began working in late 1931 on a consistency proof for elementary arithmetic.[10]
As part of his “Urdissertation”, Gentzen had established by the end of 1932 the reduction of classical to intuitionist arithmetic, a result that had also been obtained by Gödel. Gentzen’s investigations led, finally in 1935, to his first consistency proof for arithmetic. In the background was a normal form theorem for intuitionist logic that will be discussed in the next section together with Gentzen’s actual dissertation and the special calculi he introduced there. Now we just formulate the Gentzen-Gödel result “connecting” classical first-order number theory PA with its intuitionist version HA. The non-logical principles of these theories aim at describing \(\fN\), the arguably most important structure in mathematics, namely, Dedekind’s simply infinite system \(\bbN\) together with zero, successor, addition, multiplication, exponentiation and the less-than relation:
\[{\fN}=(\bbN; 0^{\bbN}, S^{\bbN},+^{\bbN},\times^{\bbN},E^{\bbN},<^{\bbN}).\] They are formulated in the first-order language that has the relation symbols =, <, the function symbols S, +, \(\times\), E and the constant symbol 0. The axioms comprise the usual equations for zero, successor, addition, multiplication, exponentiation, and the less-than relation. In addition, the induction principle is given by the schema \[\tag{IND} F(0) \land \forall x[F(x)\to F(Sx)] \to \forall x F(x)\] for all formulae \(F(x)\) of the language. These principles together with classical logic constitute the theory of first order arithmetic or first order number theory, also known as Dedekind-Peano arithmetic, PA; together with intuitionist logic they constitute intuitionistic first order arithmetic, commonly known as Heyting arithmetic, HA.
Now we consider the syntactic translation \(\tau\) from the common language of PA and HA into its “negative” fragment: it replaces disjunctions \(A\lor B\) by their de Morgan equivalent \(\neg (\neg A\land \neg B)\) and existential statements \(\exists x A(x)\) by \(\neg \forall x \neg A(x)\). The reductive result obtained by Gentzen and Gödel in 1933 is now stated quite easily:
Theorem 1.2
PA proves the equivalence of A and \(\tau(A)\) for any formula A.
If PA proves A, then HA proves \(\tau(A)\).
For atomic sentences like \(1\ne 1\) the translation \(\tau\) is clearly the identity, and the provability of \(1\ne 1\) in PA would imply its provability in HA. Thus, PA is consistent relative to HA. This result is technically of great interest and had a profound effect on the perspective concerning the relationship between finitism and intuitionism: finitist and intuitionist mathematics were considered as co-extensional; this theorem showed that intuitionist mathematics is actually stronger than finitist mathematics. Thus, if the intuitionist standpoint is taken to guarantee the soundness of HA, then it guarantees the consistency of PA. The corresponding connection between classical and intuitionist logic had been established already by Kolmogorov (1925), who not only formalized intuitionist logic but also observed the translatability of classical into intuitionist logic. His work, though, seems not to have been noticed at the time or even in 1932, when Gentzen and Gödel established their result for classical and intuitionist arithmetic.
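Because \(\tau\) is defined by structural recursion on formulae, it is easy to render as a program. The following sketch uses an ad hoc tuple representation of formulae (our own choice, not from the text) and implements exactly the clauses given above: disjunction and existential quantification are rewritten, everything else is translated componentwise.

```python
# Negative translation τ on formulae represented as nested tuples:
# ('atom', s), ('not', A), ('and', A, B), ('imp', A, B),
# ('or', A, B), ('forall', x, A), ('exists', x, A).

def tau(f):
    op = f[0]
    if op == 'atom':
        return f                       # τ is the identity on atoms
    if op == 'not':
        return ('not', tau(f[1]))
    if op in ('and', 'imp'):
        return (op, tau(f[1]), tau(f[2]))
    if op == 'or':                     # A ∨ B  ↦  ¬(¬A ∧ ¬B)
        return ('not', ('and', ('not', tau(f[1])), ('not', tau(f[2]))))
    if op == 'forall':
        return ('forall', f[1], tau(f[2]))
    if op == 'exists':                 # ∃x A  ↦  ¬∀x ¬A
        return ('not', ('forall', f[1], ('not', tau(f[2]))))
    raise ValueError(f"unknown connective: {op}")

# τ(∃x P(x)) = ¬∀x ¬P(x):
print(tau(('exists', 'x', ('atom', 'P(x)'))))
```

Note that the program makes the remark above visible: on an atom such as \(1\ne 1\) the translation returns its input unchanged.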
The foundational discussion concerning extended “constructive” viewpoints is taken up in section 4. There, and throughout our paper, the concepts of “proof-theoretic reducibility” and “proof-theoretic equivalence” will play a central role. The connection between PA and HA is paradigmatic and leads to the notion of proof-theoretic reduction. Before we can furnish a precise definition, we should perhaps stress that many concepts can be expressed in the language of PRA (as well as PA) via coding, also known as Gödel numbering. Any finite object such as a string of symbols or an array of symbols can be coded via a single natural number in such a way that the string or array can be retrieved from the number when we know how the coding is done. Typical finite objects include formulae in a given language and also proofs in a theory. Talk about formulae or proofs can then be replaced by talking about predicates of numbers that single out the codes of formulae and proofs, respectively. We then say that the concepts of formula and proof have been arithmetized and thereby rendered expressible in the language of PRA.
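One classical coding (among many; the tiny alphabet below is our own illustrative choice) assigns each symbol a positive number \(c(s)\) and codes a string \(s_1\ldots s_k\) as \(2^{c(s_1)}\cdot 3^{c(s_2)}\cdots p_k^{c(s_k)}\), where \(p_i\) is the i-th prime; unique factorization guarantees that the string can be recovered from its code.

```python
# Prime-power Gödel numbering of symbol strings.

def nth_primes(k):
    """Return the first k primes by trial division."""
    primes, n = [], 2
    while len(primes) < k:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

SYMBOLS = ['0', 'S', '=', '(', ')', '+']          # a tiny sample alphabet
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}  # symbol -> positive number

def encode(string):
    g = 1
    for p, s in zip(nth_primes(len(string)), string):
        g *= p ** CODE[s]
    return g

def decode(g):
    """Recover the string from its code by successive factorization."""
    out, n = [], 2
    while g > 1:
        e = 0
        while g % n == 0:
            g //= n
            e += 1
        if e:
            out.append(SYMBOLS[e - 1])
        n += 1
    return out

# The formula "S0=0" gets the code 2^2 * 3^1 * 5^3 * 7^1 = 10500:
print(encode(['S', '0', '=', '0']))           # 10500
print(decode(10500))                          # ['S', '0', '=', '0']
```

Both directions of the coding are primitive recursive, which is what makes arithmetized syntax available inside PRA.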
Definition 1.3 Let \(\bT_1\), \(\bT_2\) be a pair of theories with languages \(\cL_1\) and \(\cL_2\), respectively, and let \(\Phi\) be a (primitive recursive) collection of formulae common to both languages. Furthermore, \(\Phi\) should contain the closed equations of the language of PRA.
We then say that \(\bT_1\) is proof-theoretically \(\Phi\)-reducible to \(\bT_2\), written \(\bT_1\leq_{\Phi}\bT_2\), if there exists a primitive recursive function f such that
\[\tag{1} \PRA\vdash \forall x \forall y\,[ {\rform_{\Phi}(x)} \land {\proof_{\bT_1}(y,x)} \rightarrow {\proof_{\bT_2}(f(y),x)}].\] Here \(\rform_{\Phi}\) and \(\proof_{\bT_i}\) are arithmetized formalizations of \(\Phi\) and the proof relation in \(\bT_i\), respectively, i.e., \(\rform_{\Phi}(x)\) expresses that x is the Gödel number of a formula in \(\Phi\) while \(\proof_{\bT_i}(y,x)\) expresses that y codes a proof in \(\bT_i\) of a formula with Gödel number x.
\(\bT_1\) and \(\bT_2\) are said to be proof-theoretically \(\Phi\)-equivalent, written \(\bT_1\equiv_{\Phi}\bT_2\), if \(\bT_1\leq_{\Phi}\bT_2\) and \(\bT_2\leq_{\Phi}\bT_1\).
The appropriate class \(\Phi\) is revealed in the process of reduction itself, so that in the statement of theorems we simply say that \(\bT_1\) is proof-theoretically reducible to \(\bT_2\) (written \(\bT_1\leq \bT_2\)) and \(\bT_1\) and \(\bT_2\) are proof-theoretically equivalent (written \(\bT_1 \equiv \bT_2\)), respectively. Alternatively, we shall say that \(\bT_1\) and \(\bT_2\) have the same proof-theoretic strength when \(\bT_1\equiv \bT_2\). In practice, if \(\bT_1\equiv \bT_2\) is shown via proof-theoretic means[11] this always entails that the two theories prove at least the same \(\Pi^0_2\) sentences (those of the complexity of the twin prime conjecture). The complexity of formulae of PRA is stratified as follows. The \(\Sigma^0_0\) and \(\Pi^0_0\) formulae are of the form \(R(t_1,\ldots,t_n)\) where R is a predicate symbol for an n-ary primitive recursive predicate. A formula is \(\Sigma^0_{k+1}\) (\(\Pi^0_{k+1}\)) if it is of the form
\[\exists y_1\ldots \exists y_m\,F(y_1,\ldots,y_m) \quad (\forall y_1\ldots \forall y_m\,F(y_1,\ldots,y_m))\] with \(F(y_1,\ldots,y_m)\) being of complexity \(\Pi^0_k\) (\(\Sigma^0_k\)). Thus the complexity of a formula is measured in terms of quantifier alternations. For instance, \(\Pi^0_2\)-formulae have two alternations starting with a block of universal quantifiers, or more explicitly they are of the shape
\[\forall x_1\ldots\forall x_n \exists y_1\ldots \exists y_m\,R(x_1,\ldots,x_n,y_1,\ldots,y_m)\] with R primitive recursive.
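The classification just described depends only on the alternating quantifier prefix over the primitive recursive matrix. The following sketch (an illustration of ours; the list encoding of quantifier blocks is invented for it) reads the \(\Sigma^0_k\)/\(\Pi^0_k\) class off such a prefix.

```python
# Σ/Π complexity from a prefix of alternating quantifier blocks,
# outermost first; e.g. ∀...∀ ∃...∃ R is ['forall', 'exists'].

def complexity(blocks):
    """Return ('Pi', k) or ('Sigma', k) for a prenex formula whose
    matrix is primitive recursive."""
    assert all(a != b for a, b in zip(blocks, blocks[1:])), \
        "adjacent blocks must alternate"
    if not blocks:
        return ('Sigma', 0)  # equally ('Pi', 0): a prim. rec. matrix
    return ('Pi' if blocks[0] == 'forall' else 'Sigma', len(blocks))

print(complexity(['forall', 'exists']))  # ('Pi', 2), the twin-prime shape
print(complexity(['exists']))            # ('Sigma', 1)
```

Counting only alternations, not individual quantifiers, matches the definition above: a block \(\forall x_1\ldots\forall x_n\) contributes a single step in the hierarchy.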
For the reduction of classical elementary number theory to its intuitionist version, Gödel and Gentzen used different logical calculi. Gödel used the system Herbrand had investigated in his 1931, whereas Gentzen employed the formalization of intuitionist arithmetic from Heyting 1930. For his further finitist investigations Gentzen introduced new calculi that were to become of utmost importance for proof theory: natural deduction and sequent calculi.
As we noted above, Gentzen had already begun in 1931 to be concerned with the consistency of full elementary number theory. As his logical framework he used what we now call natural deduction calculi. They evolved from an axiomatic calculus that had been used by Hilbert and Bernays since 1922 and that introduced an important modification of the calculus for sentential logic. The connectives \(\land\) and \(\lor\) are incorporated, and the axioms for these connectives are as follows:
\[\begin{align}A\land B & \to A \\ A\land B & \to B \\A & \to (B \to A\land B) \\A & \to A\lor B \\ B & \to A\lor B \\(A \to C) & \to ((B \to C) \to (A\lor B \to C))\end{align}\] Hilbert and Bernays introduced this new logical formalism for two reasons: (i) to be able to better and more easily formalize mathematics, and (ii) to bring out the understanding of logical connectives in methodological parallel to the treatment of geometric concepts in Foundations of geometry. The methodological advantages of this calculus are discussed in Bernays 1927: 10:
The starting formulae can be chosen in quite different ways. A great deal of effort has been spent, in particular, to get by with a minimal number of axioms, and the limit of what is possible has indeed been reached. However, for the purposes of logical investigations it is better to separate out, in line with the axiomatic procedure for geometry, different axiom groups in such a way that each of them expresses the role of a single logical operation.
Then Bernays lists four groups, namely, axioms for the conditional \(\to\), for \(\land\) and \(\lor\) as above, and for negation \(\neg\). The axioms for the conditional reflect not only logical properties, but also structural features as in the later sequent calculus (and in Frege’s Begriffsschrift, 1879).
\[\begin{align}A & \to (B \to A)\\(A \to (A \to B)) & \to (A \to B)\\(A \to (B \to C)) & \to (B \to (A \to C))\\(B \to C) & \to ((A \to B) \to (A \to C))\end{align}\] As axioms for negation one can choose:
\[\begin{align}A & \to (\neg A \to B)\\(A \to B) & \to ((\neg A \to B) \to B)\end{align}\] Hilbert formulates this logical system in Über das Unendliche and in his second Hamburg talk of 1927, but gives a slightly different formulation of the axioms for negation, calling them the principle of contradiction and the principle of double negation:
\[\begin{align}(A \to (B \land \neg B)) &\to \neg A\\\neg\neg A & \to A\end{align}\] Clearly, the axioms correspond directly to the natural deduction rules for these connectives, and one finds here the origin of Gentzen’s natural deduction calculi. Bernays had investigated rule-based calculi in his Habilitationsschrift (1918). However, in the given context, the simplicity of the metamathematical description of calculi seemed paramount, and in Bernays 1927 (p. 17) one finds the programmatic remark: “We want to have as few rules as possible, rather put much into axioms”.
Gentzen was led to a rule-based calculus with introduction and elimination rules for every logical connective. The truly distinctive feature of this new type of calculus was for Gentzen, however, making and discharging assumptions. This feature, he remarked, most directly reflects a crucial aspect of mathematical argumentation.[12] Here we formulate the distinctive rules that involve contradictions and go beyond minimal logic, which has I- and E-rules for all logical connectives. Intuitionist logic is obtained from minimal logic by adding the rule: from \(\perp\) infer any formula A, i.e., ex falso quodlibet. In the case of classical logic, if a proof of \(\perp\) is obtained from the assumption \(\neg A\), then infer A (and cancel the assumption \(\neg A\)). Gentzen discovered a remarkable fact for the intuitionist calculus, having observed that proofs can have peculiar detours of the following form: a formula is obtained by an I-rule and is then the major premise of the corresponding E-rule. For conjunction such a detour is depicted as follows:
\[\cfrac{\begin{array}{c}\vdots \\ A\end{array} \quad \begin{array}{c}\vdots \\ B\end{array}}{\cfrac{A\land B}{B}}\] Clearly, a proof of B is already contained in the given derivation. Proofs without detours are called normal, and Gentzen showed that any proof can be effectively transformed via “reduction steps” into a normal one.
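A single reduction step for the \(\land\)-detour depicted above can be sketched as a program on proof trees. The tuple encoding and rule names below are ad hoc choices of ours (not Gentzen’s notation): a tree is \((\textit{rule}, \textit{conclusion}, \textit{subproofs}\ldots)\), and an \(\land\)I immediately followed by \(\land\)E is replaced by the derivation of the selected conjunct.

```python
# One kind of reduction step: contract ∧I/∧E detours in a proof tree.
# Trees: ('ax', A) for an assumption/axiom, ('andI', C, pA, pB),
# ('andE1', C, p) and ('andE2', C, p) for the two ∧-eliminations.

def reduce_and_detour(proof):
    rule, conclusion, *subs = proof
    # First reduce inside the subproofs (innermost detours first).
    subs = [reduce_and_detour(s) for s in subs]
    if rule in ('andE1', 'andE2') and subs and subs[0][0] == 'andI':
        # subs[0] = ('andI', A∧B, proof_of_A, proof_of_B):
        # the detour collapses to the proof of the selected conjunct.
        return subs[0][2] if rule == 'andE1' else subs[0][3]
    return (rule, conclusion, *subs)

pa = ('ax', 'A')
pb = ('ax', 'B')
detour = ('andE2', 'B', ('andI', 'A∧B', pa, pb))
print(reduce_and_detour(detour))  # ('ax', 'B')
```

The full normalization theorem iterates such steps for all connectives and shows that the process terminates; the code above illustrates only the local move for conjunction.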
Theorem 2.1 (Normalization for intuitionist logic) A proof of A from a set of assumptions \(\Gamma\) can be transformed into a normal proof of A from the same set of assumptions.
Focusing on normal proofs, Gentzen then proved that the complexity of formulae in such proofs can be bounded by that of assumptions and conclusion.
Corollary 2.2 (Subformula property) If \(\cD\) is a normal proof of A from \(\Gamma\), then every formula in \(\cD\) is either a subformula of an element of \(\Gamma\) or of A.
As Gentzen recounts matters at the very beginning of his dissertation (1934/35), he was led by the investigation of the natural calculus to his Hauptsatz[13] when he could not extend the considerations to classical logic.
To be able to formulate it [the Hauptsatz] in a direct way, I had to base it on a particularly suitable logical calculus. The calculus of natural deduction turned out not to be appropriate for that purpose.
So, Gentzen focused his attention on sequent calculi that had been introduced by Paul Hertz and which had been the subject of Gentzen’s first scientific paper (1932).
In his thesis Gentzen introduced a form of the sequent calculus and his technique of cut elimination. As this is a tool of utmost importance in proof theory, an outline of the underlying ideas is given next. The sequent calculus can be generalized to so-called infinitary logics and is central for ordinal analysis. The Hauptsatz is also called the cut elimination theorem.
We use upper case Greek letters \(\Gamma,\Delta,\Lambda,\Theta,\Xi\ldots\) to range over finite lists of formulae. \(\Gamma\subseteq \Delta\) means that every formula of \(\Gamma\) is also a formula of \(\Delta\). A sequent is an expression \(\Gamma\Rightarrow \Delta\) where \(\Gamma\) and \(\Delta\) are finite sequences of formulae \(A_1,\ldots,A_n\) and \(B_1,\ldots, B_m\), respectively. We also allow for the possibility that \(\Gamma\) or \(\Delta\) (or both) are empty. The empty sequence will be denoted by \(\emptyset\). \(\Gamma \Rightarrow \Delta\) is read, informally, as \(\Gamma\) yields \(\Delta\) or, rather, the conjunction of the \(A_i\) yields the disjunction of the \(B_j\).
The logical axioms of the calculus are of the form
\[A \Rightarrow A \]
where A is any formula. In point of fact, one could limit this axiom to the case of atomic formulae A. We have structural rules of the form
\[\frac{\Gamma \Rightarrow \Delta}{\Gamma' \Rightarrow \Delta'}\qquad \textrm{if } \Gamma \subseteq \Gamma', \Delta \subseteq \Delta'. \]
A special case of the structural rule, known as contraction, occurs when the lower sequent has fewer occurrences of a formula than the upper sequent. For instance, \(A, \Gamma\Rightarrow\Delta, B\) follows structurally from \(A,A,\Gamma \Rightarrow \Delta,B,B\).
Now we list the rules for the logical connectives.
\[\begin{array}{cc} \textrm{Left} & \textrm{Right} \\\displaystyle \frac{\Gamma \Rightarrow {\Delta,A}}{{\neg A, \Gamma} \Rightarrow \Delta} & \displaystyle \frac{{B, \Gamma} \Rightarrow {\Delta}}{{\Gamma} \Rightarrow {\Delta, \neg B}}\\[2ex]\displaystyle \frac{{\Gamma} \Rightarrow {\Delta, A} \qquad {B,\Lambda} \Rightarrow {\Theta}}{{{A\rightarrow B},\Gamma, \Lambda} \Rightarrow {\Delta, \Theta}}& \displaystyle \frac{{A, \Gamma} \Rightarrow {\Delta, B}}{{\Gamma} \Rightarrow {\Delta, {A \rightarrow B}}}\\[2ex]\displaystyle \frac{{A, \Gamma} \Rightarrow {\Delta}}{{A\land B,\Gamma} \Rightarrow {\Delta}}\quad \frac{{B, \Gamma} \Rightarrow {\Delta}}{{A\land B,\Gamma} \Rightarrow {\Delta}} &\displaystyle \frac{{\Gamma} \Rightarrow {\Delta, A} \qquad {\Gamma} \Rightarrow {\Delta, B}} {{\Gamma} \Rightarrow {\Delta, {A\land B}}} \\[2ex]\displaystyle \frac{{A,\Gamma} \Rightarrow {\Delta} \qquad {B,\Gamma} \Rightarrow {\Delta}}{{{A\lor B},\Gamma} \Rightarrow {\Delta}} & \displaystyle \frac{{\Gamma} \Rightarrow {\Delta, A}}{{\Gamma} \Rightarrow {\Delta, {A\lor B}}} \quad \frac{{\Gamma} \Rightarrow {\Delta, B}}{{\Gamma} \Rightarrow {\Delta, {A\lor B}}} \end{array}\]
\[\begin{array}{lccclc}\forall L & \displaystyle \frac{F(t),\Gamma \Rightarrow \Delta}{\forall x F(x),\Gamma \Rightarrow \Delta} & &&\forall R & \displaystyle \frac{\Gamma \Rightarrow {\Delta, F(a)}}{\Gamma \Rightarrow {\Delta, \forall x F(x)}} \\[2ex]\exists L & \displaystyle \frac{F(a),\Gamma \Rightarrow \Delta}{\exists x F(x),\Gamma \Rightarrow \Delta} & &&\exists R & \displaystyle \frac{\Gamma \Rightarrow {\Delta, F(t)}}{\Gamma \Rightarrow {\Delta, \exists x F(x)}} \end{array}\]
In \(\forall L\) and \(\exists R\), t is an arbitrary term. The variable a in \(\forall R\) and \(\exists L\) is an eigenvariable of the respective inference, i.e., a is not to occur in the lower sequent.
Finally, we have the special Cut Rule
\[\frac{\Gamma \Rightarrow {\Delta, A} \qquad {A, \Lambda} \Rightarrow \Theta}{{\Gamma, \Lambda} \Rightarrow {\Delta, \Theta}} \tag*{Cut}\]
The formula A is called the cut formula of the inference.
In the rules for logical operations, the formulae highlighted in the premises are called the minor formulae of that inference, while the formula highlighted in the conclusion is the principal formula of that inference. The other formulae of an inference are called side formulae. A proof (also known as a deduction or derivation) \(\cD\) is a tree of sequents satisfying the conditions that (i) the topmost sequents of \(\cD\) are logical axioms and (ii) every sequent in \(\cD\) except the lowest one is an upper sequent of an inference whose lower sequent is also in \(\cD\). A sequent \(\Gamma \Rightarrow \Delta\) is deducible if there is a proof having \(\Gamma \Rightarrow \Delta\) as its bottom sequent.
The Cut rule differs from the other rules in an important respect. With the rules for introducing connectives, one sees that every formula that occurs above the line occurs below the line either directly, or as a subformula of a formula below the line. That is also true for the structural rules. (Here \(A(t)\) is counted as a subformula, in a slightly extended sense, of both \(\exists x\,A(x)\) and \(\forall x\,A(x)\).) But in the case of the Cut rule, the cut formula A disappears. Gentzen showed that rules with “vanishing formulas” can be eliminated.
Theorem 2.3 (Gentzen’s Hauptsatz) If a sequent \(\Gamma \Rightarrow \Delta\) is deducible, then it is also deducible without the Cut rule; the resulting proof is called cut-free or normal.
The secret to Gentzen’s Hauptsatz is the symmetry of left and right rules for all the logical connectives including negation. The proof of the cut elimination theorem is rather intricate as the process of removing cuts interferes with the structural rules. It is contraction that accounts for the high cost of eliminating cuts. Let \(\lvert \cD\rvert\) be the height of the deduction \(\cD\) and let \(rank(\cD)\) be the supremum of the lengths of cut formulae occurring in \(\cD\). Turning \(\cD\) into a cut-free deduction of the same end sequent results, in the worst case, in a deduction of height \(\cH(rank(\cD),\lvert\cD\rvert)\) where \(\cH(0,n)=n\) and \(\cH(k+1,n)=4^{\cH(k,n)}\), yielding hyper-exponential growth.
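The growth of this bound is easy to make concrete; here is a small Python sketch (the function name H is simply our rendering of \(\cH\)):

```python
def H(k: int, n: int) -> int:
    """Worst-case height bound for cut elimination:
    H(0, n) = n and H(k+1, n) = 4 ** H(k, n)."""
    for _ in range(k):
        n = 4 ** n
    return n

print(H(0, 3))  # 3
print(H(1, 3))  # 64
print(H(2, 3))  # 4**64, already a 39-digit number
```

Each additional level of cut rank exponentiates the height once more, which is exactly the hyper-exponential blow-up mentioned above.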
The sequent calculus we have been discussing allows the proof of classically, but not intuitionistically, correct formulae, for example, the law of excluded middle. An intuitionist version of the sequent calculus can be obtained by a very simple structural restriction: there can be at most one formula on the right hand side of the sequent symbol \(\Rightarrow\). The cut elimination theorem is also provable for this intuitionist variant. In either case, the Hauptsatz has an important corollary that parallels that of the Normalization theorem (for intuitionist logic) and expresses the subformula property.
Corollary 2.4 (Subformula property) If \(\cD\) is a cut-free proof of the sequent \(\Gamma\Rightarrow\Delta\), then all formulae in \(\cD\) are subformulae of elements in either \(\Gamma\) or \(\Delta\).
This Corollary has another direct consequence that explains the crucial role of the Hauptsatz for obtaining consistency proofs.
Corollary 2.5 (Consistency) A contradiction, i.e., the empty sequent \(\emptyset \Rightarrow \emptyset\), is not provable.
Proof: Assume that the empty sequent is provable; then, according to the Hauptsatz, it has a cut-free derivation \(\cD\). The previous corollary assures us that only empty sequents can occur in \(\cD\); but such a \(\cD\) does not exist, since every proof must contain axioms. \(\qed\)
The foregoing results are solely concerned with pure logic. Formal theories that axiomatize mathematical structures or serve as formal frameworks for developing substantial chunks of mathematics are based on logic but have additional axioms germane to their purpose. If they are of the latter kind, such as first-order arithmetic or Zermelo-Fraenkel set theory, they will assert the existence of mathematical objects and their properties. What happens when we try to apply the procedure of cut elimination to theories? Axioms are usually detrimental to this procedure. It breaks down because the symmetry of the sequent calculus is lost. In general, one cannot remove cuts from deductions in a theory T when the cut formula is an axiom of T. However, sometimes the axioms of a theory are of bounded syntactic complexity. Then the procedure applies partially, in that one can remove all cuts that exceed the complexity of the axioms of T. This gives rise to partial cut elimination, a very important tool in proof theory. For example, it can be used to analyze theories with restricted induction (such as fragments of PA; cf. Sieg 1985). It also works very well if the axioms of a theory can be presented as atomic intuitionist sequents (also called Horn clauses), yielding the completeness of Robinson’s resolution method (see Harrison 2009).
Using the Hauptsatz and its Corollary, Gentzen was able to capture all of the consistency results that had been obtained prior to 1934, including Herbrand’s, which Gödel in his 1933 had called “the most far-reaching” result. They had been obtained, at least in principle, for fragments of elementary number theory; in practice, Gentzen did not include the quantifier-free induction principle. Having completed his dissertation, Gentzen went back to investigate natural deduction calculi and obtained in 1935 his first consistency proof for full first-order arithmetic. He formulated, however, the natural calculus now in “sequent form”: instead of indicating the assumptions on which a particular claim depends by the undischarged ones in its proof tree, they are now attached to every node of the tree. The “sequents” that are being proved are of the form \(\Gamma \Rightarrow A\), where all the logical inferences are carried out on the right hand side. This proof was published only in 1974; it was subsequently analyzed most carefully in Tait 2015 and Buchholz 2015. It was due to criticism of Bernays and Gödel that Gentzen modified his consistency proof quite dramatically; he made use of transfinite induction, as will be discussed in detail in the next section. Here we just mention that Bernays extensively discussed transfinite induction in Grundlagen der Mathematik II. The main issue for Bernays was the question: is it still a finitist principle? Bernays did not discuss, however, two other aspects of Gentzen’s work, namely, the use of structural features of formal deductions in consistency proofs and the attempt to gain a constructive, semantic understanding of intuitionist arithmetic. The former became crucial for proof theoretic investigations; the latter influenced Gödel and his functional interpretation via computable functionals of finite type.[14] The two aspects together opened a new era for proof theory and mathematical logic with the goal of proving the consistency of analysis.
We will see how far current techniques lead us and what foundational significance one can attribute to them.
Cut elimination fails for first-order arithmetic (i.e., PA); not even partial cut elimination is possible, since the induction axioms have unbounded complexity. Gentzen, however, found an ingenious way of dealing with purported contradictions in arithmetic. In Gentzen 1938b he showed how to effectively transform an alleged PA-proof of an inconsistency (the empty sequent) in his sequent calculus into another proof of the empty sequent such that the latter gets assigned a smaller ordinal than the former. Ordinals are a central concept in set theory as well as in proof theory. To present Gentzen’s work we shall first discuss the notion of ordinal from a proof-theoretic point of view.
This is the first time we talk about the transfinite and ordinals in proof theory. Ordinals have become very important in advanced proof theory. The concept of an ordinal is a generalization of that of a natural number. The latter are used for counting finitely many things, whereas ordinals can also “count” infinitely many things. The concept is derived from that of an ordering \(\prec\) of a set X which arranges the objects of X in order, one after another, in such a way that for every predicate P on X there is always a first element of X with respect to \(\prec\) that satisfies P if there is at least one object in X satisfying P. Such an ordering is called a well-ordering of X. The usual less-than relation on \(\bbN\) is certainly a well-ordering. Here every number \(\ne 0\) is the successor of another number. If one orders the natural numbers \(>0\) in the usual way but declares that 0 is bigger than every number \(\ne 0\), one arrives at another ordering of \(\bbN\). Let’s call it \(\prec\). \(\prec\) is also a well-ordering of \(\bbN\). This time 1 is the least number with respect to \(\prec\). However, 0 plays a unique role: there are infinitely many numbers \(\prec 0\), and there is no number \(m\prec 0\) such that 0 is the next number after m. Such numbers are called limit numbers (with respect to \(\prec\)).
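This reordering is easy to make precise; the following Python sketch (our own illustration, with prec(m, n) expressing \(m\prec n\)) implements it:

```python
def prec(m: int, n: int) -> bool:
    """m ≺ n in the ordering 1, 2, 3, ..., 0: the usual order on the
    positive integers, but with 0 declared bigger than every other number."""
    if n == 0:
        return m != 0   # every nonzero number precedes 0
    if m == 0:
        return False    # 0 precedes no number
    return m < n        # otherwise the usual order

print(prec(1, 2))  # True: usual order among positive numbers
print(prec(7, 0))  # True: every positive number precedes 0
print(prec(0, 7))  # False: 0 is the top element
```

Under prec, 1 is the least element, and 0 is a limit: infinitely many numbers precede it, but none immediately.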
In order to be able to formulate Gentzen’s results from the end of section 3.3, we have to “arithmetize” the treatment of ordinals. Let us first state some precise definitions and a Cantorian theorem.
Definition 3.1 A non-empty set A equipped with a total ordering \(\prec\) (i.e., \(\prec\) is transitive, irreflexive, and
\[\forall x,y\in A\,[{x\prec y}\lor {x=y}\lor {y\prec x}])\]
is a well-ordering if every non-empty subset X of A contains a \(\prec\)-least element, i.e.,
\[(\exists u\in X)(\forall y\in X)[{u\prec y}\lor {u=y}].\]
The elements of a well-ordering \((A,\prec)\) can be divided into three types. Since A is non-empty there is a least element with respect to \(\prec\), which is customarily denoted by 0 or \(0_A\). Then there are elements \(a\in A\) such that there exists \(b\prec a\) but there is no c between b and a. These are the successor elements of A, with a being the successor of b. An element \(c\in A\) such that \(0\prec c\) and for all \(b\prec c\) there exists \(d\in A\) with \(b\prec d\prec c\) is said to be a limit element of A.
In set theory a set is called transitive just in case all its elements are also subsets. An ordinal in the set-theoretic sense is a transitive set that is well-ordered by the elementhood relation \(\in\). It follows that each ordinal is the set of its predecessors. According to the trichotomy above, there is a least ordinal (which is just the empty set) and all other ordinals are either successor or limit ordinals. The first limit ordinal is denoted by \(\omega\).
Fact 3.2 Every well-ordering \((A,\prec)\) is order isomorphic to a unique ordinal \((\alpha,\in)\).
Ordinals are traditionally denoted by lower case Greek letters \(\alpha,\beta,\gamma,\delta,\ldots\) and the relation \(\in\) on ordinals is notated simply by \(<\). If \(\beta\) is a successor ordinal, i.e., \(\beta\) is the successor of some (necessarily unique) ordinal \(\alpha\), we also denote \(\beta\) by \(\alpha'\). Another important fact is that for any family of ordinals \(\alpha_i\) for \(i\in I\) (I some set) there is a smallest ordinal, denoted by \(\sup_{i\in I}\alpha_i\), that is at least as big as every ordinal \(\alpha_i\).
The operations of addition, multiplication, and exponentiation can be defined on all ordinals by using case distinctions and transfinite recursion (on \(\alpha\)). The following states the definitions just to convey the flavor:
\[\begin{align} \beta+0 &= \beta & \beta+\alpha' &=(\beta+\alpha)' &\displaystyle \beta+\lambda &=\sup_{\xi<\lambda}(\beta+\xi)\\\beta\cdot 0 &= 0 & \beta\cdot \alpha' &=(\beta\cdot\alpha)+\beta &\displaystyle \beta\cdot \lambda &=\sup_{\xi<\lambda}(\beta\cdot\xi)\\\beta^ 0 &= 0' & \beta^{ \alpha'} &=\beta^{\alpha}\cdot \beta &\displaystyle \beta^\lambda &=\sup_{\xi<\lambda}(\beta^\xi)\end{align}\]
Here \(\lambda\) ranges over limit ordinals. However, addition and multiplication are in general not commutative.
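For instance, the clauses above yield
\[1+\omega=\sup_{n<\omega}(1+n)=\omega\ne\omega'=\omega+1 \quad\textrm{and}\quad 2\cdot\omega=\sup_{n<\omega}(2\cdot n)=\omega\ne\omega+\omega=\omega\cdot 2,\]
which exhibits the failure of commutativity already at \(\omega\).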
We are interested in representing specific ordinals \(\alpha\) as relations on \(\bbN\). In essence, Cantor defined the first ordinal representation system in 1897. Natural ordinal representation systems are frequently derived from structures of the form
\[\tag{2}{\frakA} = \langle\alpha,f_1,\ldots,f_n,<_{\alpha}\rangle\]
where \(\alpha\) is an ordinal, \(<_{\alpha}\) is the ordering of ordinals restricted to elements of \(\alpha\), and the \(f_i\) are functions
\[\tag{3}f_i:\underbrace{\alpha\times\cdots\times\alpha}_{k_i \textrm{ times}}\longrightarrow \alpha\]for some natural number \(k_i\).
\[ \bbA = \langle A,g_1,\ldots,g_n,\prec\rangle\]
is a computable (or recursive) representation of \(\frakA\) if the following conditions hold:
\(A\subseteq\bbN\) and A is a computable set.
\(\prec\) is a computable total ordering on A and the functions \(g_i\) are computable.
\(\frakA \cong\bbA\), i.e., the two structures are isomorphic.
Theorem 3.3 (Cantor 1897) For every ordinal \(\beta>0\) there exist unique ordinals \(\beta_0\geq\beta_1\geq\dots\geq\beta_n\) such that
\[\tag{4}\label{epsilon} \beta = \omega^{\beta_0}+\ldots+\omega^{\beta_n}.\]
The representation of \(\beta\) in (4) is called its Cantor normal form. We shall write \(\beta \mathbin{=_{\sCNF}} \omega^{\beta_0}+\cdots +\omega^{\beta_n}\) to convey that \(\beta_0\geq\beta_1\geq\dots\geq\beta_n\).
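To illustrate, coefficients are absorbed into repeated summands, so that, e.g.,
\[\omega^{\omega}\cdot 2+\omega^{3}+2 \mathbin{=_{\sCNF}} \omega^{\omega}+\omega^{\omega}+\omega^{3}+\omega^{0}+\omega^{0},\]
with the weakly decreasing sequence of exponents \(\omega,\omega,3,0,0\).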
The rather famous ordinal that emerged in Gentzen’s consistency proof of PA is denoted by \(\varepsilon_0\). It is the first ordinal \(\alpha>0\) such that \((\forall\beta<\alpha)\,\omega^{\beta}<\alpha\). \(\varepsilon_0\) can also be described as the least ordinal \(\alpha\) such that \(\omega^{\alpha}=\alpha\).
Ordinals \(\beta<\varepsilon_0\) have a Cantor normal form with exponents \(\beta_i<\beta\), and these exponents have Cantor normal forms with yet again smaller exponents. As this process must terminate, ordinals \(<\varepsilon_0\) can be coded by natural numbers. For instance, a coding function
\[{\Corner{\mathord{\,.\,}}}:\varepsilon_0\longrightarrow \bbN\]could be defined as follows:
\[{\Corner{\alpha}} = \left\{\begin{array}{ll} 0 & \textrm{if } \alpha=0\\ \langle{\Corner{\alpha_1}},\ldots,{\Corner{\alpha_n}}\rangle & \textrm{if } \alpha \mathbin{=_{\sCNF}} \omega^{\alpha_1}+\cdots+\omega^{\alpha_n} \end{array}\right.\]
where \(\langle k_1,\cdots,k_n\rangle\coloneqq 2^{k_1+1}\cdot\ldots\cdot p_n^{k_n+1}\) with \(p_i\) being the ith prime number (or any other coding of tuples). Further define:
\[\begin{align}A_0 &{} \coloneqq \textrm{range of }\ {\Corner{\mathord{.} }} \\{\Corner{\alpha}}\prec{\Corner{\beta}} &{} \mathbin{:\Leftrightarrow} \alpha<\beta \\ {\Corner{\alpha}} \mathop{\hat{+}} {\Corner{\beta}} &{} \coloneqq {\Corner{\alpha+\beta}} \\{\Corner{\alpha}} \mathop{\hat{\cdot}} {\Corner{\beta}} &{} \coloneqq {\Corner{\alpha\cdot\beta}} \\\hat{\omega}^{{\Corner{\alpha}}} &{} \coloneqq {\Corner{\omega^{\alpha}}}.\\\end{align}\]Then, using arrow notation \(x\mapsto f(x)\) for functions \(f\),
\[ {\langle\varepsilon_0,+,\cdot,\delta\mapsto \omega^{\delta},<\rangle}\cong {\langle A_0,\hat{+},\hat{\cdot},x\mapsto\hat{\omega}^{x},\prec\rangle}.\]
\(A_0,\hat{+},\hat{\cdot},x\mapsto\hat{\omega}^{x},\prec\) are primitive recursive. Finally, we can spell out the scheme PR-TI\((\varepsilon_0)\) in the language of PA:
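To make the coding tangible, here is a small Python sketch; the nested-list representation of Cantor normal forms is our own choice, and code implements \({\Corner{\alpha}}\) with the prime-power tuple coding above:

```python
def nth_prime(n: int) -> int:
    """The n-th prime, 1-indexed: p_1 = 2, p_2 = 3, ..."""
    count, cand = 0, 1
    while count < n:
        cand += 1
        if all(cand % d != 0 for d in range(2, int(cand ** 0.5) + 1)):
            count += 1
    return cand

# An ordinal < ε₀ is represented by the list of its Cantor normal form
# exponents (weakly decreasing), each exponent again such a list:
# 0 = [], 1 = ω^0 = [[]], ω = ω^1 = [[[]]], ω + 1 = ω^1 + ω^0 = [[[]], []].

def code(alpha) -> int:
    """⌜α⌝: 0 ↦ 0 and ω^{α₁}+…+ω^{αₙ} ↦ 2^{⌜α₁⌝+1} · ... · pₙ^{⌜αₙ⌝+1}."""
    result = 1 if alpha else 0
    for i, exponent in enumerate(alpha, start=1):
        result *= nth_prime(i) ** (code(exponent) + 1)
    return result

print(code([]))          # ⌜0⌝ = 0
print(code([[]]))        # ⌜1⌝ = 2^(0+1) = 2
print(code([[[]]]))      # ⌜ω⌝ = 2^(2+1) = 8
print(code([[[]], []]))  # ⌜ω+1⌝ = 2^(2+1) · 3^(0+1) = 24
```

The relation \(\prec\) and the operations \(\hat{+}\), \(\hat{\cdot}\) can then be defined on such codes by decoding, comparing or combining normal forms, and re-encoding; all of this is primitive recursive.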
\[\forall x\, {\left[\forall y\, \left({y\prec x} \to {P(y)}\right) \to {P(x)}\right]} \to {\forall x\, {P(x)}}\]
for all primitive recursive predicates P.
Given a natural ordinal representation system \(\langle A,\prec,\ldots\rangle\) of order type \(\tau\), let \(\PRA+\rTI_{\qf}(<\tau)\) be PRA augmented by quantifier-free induction over all initial (externally indexed) segments of \(\prec\). This is perhaps best explained via the representation system for \(\varepsilon_0\) given above. There one can take the initial segments of \(\prec\) to be determined by the Gödel numbers of the ordinals \(\omega_0\coloneqq 1\) and \(\omega_{n+1}\coloneqq \omega^{\omega_n}\), whose limit is \(\varepsilon_0\).
Definition 3.4 We say that a theory T has proof-theoretic ordinal \(\tau\), written \(\lvert T\rvert = \tau\), if T can be proof-theoretically reduced to \(\PRA+\rTI_{\qf}(<\tau)\), i.e.,
\[ T \mathbin{\equiv_{\Pi^0_2}} \PRA+\rTI_{\qf}(<\tau).\]
Unsurprisingly, the above notion has certain intensional aspects and hinges on the naturality of the representation system (for a discussion see Rathjen 1999a: section 2).
Gentzen’s consistency proof for PA employs a reduction procedure \(\cR\) on proofs P of the empty sequent together with an assignment ord of representations for ordinals to proofs such that \(\ord(\cR(P))< \ord(P)\). Here \(<\) denotes the ordering on ordinal representations induced by the ordering of the pertinent ordinals. For this purpose he needed representations for ordinals \(<\varepsilon_0\), where \(\varepsilon_0\) is the smallest ordinal \(\tau\) such that whenever \(\alpha<\tau\) then also \(\omega^{\alpha}<\tau\), with \(\alpha\mapsto \omega^{\alpha}\) being the function of ordinal exponentiation with base \(\omega\). Moreover, the functions \(\cR\) and ord and the relation \(<\) are computable (when viewed as acting on codes for the syntactic objects); in fact, they can be chosen to be primitive recursive. With \(g(n)= \ord(\cR^n(P))\), the n-fold iteration of \(\cR\) applied to P, one has \(g(0)>g(1)> g(2)> \ldots > g(n)\) for all n, which is absurd as the ordinals \(<\varepsilon_0\) are well-founded. Hence PA is consistent.
Gentzen’s proof, though elementary, was very intricate, and thus more transparent proofs were sought. As it turned out, the obstacles to cut elimination, inherent to PA, could be overcome by moving to a richer proof system, albeit in a drastic way: by going infinite. This richer system allows for proof rules with infinitely many premises.[15] The inference commonly known as the \(\omega\)-rule consists of the two types of infinitary inferences:
\[\begin{align}\frac{\Gamma \Rightarrow {\Delta, F(0)};\;\Gamma \Rightarrow {\Delta,F(1)};\; \ldots; \Gamma \Rightarrow {\Delta,F(n)};\;\ldots}{\Gamma \Rightarrow {\Delta,\forall x\,F(x)}}\tag*{\(\omega R\)}\\[1ex]\frac{{F(0),\Gamma} \Rightarrow {\Delta};\; {F(1), \Gamma} \Rightarrow {\Delta};\;\ldots; {F(n),\Gamma} \Rightarrow {\Delta};\ldots}{\exists x\,F(x), {\Gamma \Rightarrow \Delta} }\tag*{\(\omega L\)}\end{align}\]
The price to pay will be that deductions become infinite objects, i.e., infinite well-founded trees.
The sequent-style version of Peano arithmetic with the \(\omega\)-rule will be denoted by \(\PA_{\omega}\). \(\PA_{\omega}\) has no use for free variables. Thus free variables are discarded and all terms will be closed. All formulae of this system are therefore closed, too. The numerals are the terms \(\bar{n}\), where \(\bar{0}=0\) and \(\overline{n+1}=S\bar{n}\). We shall identify \(\bar{n}\) with the natural number n. All terms t of \(\PA_{\omega}\) evaluate to a numeral \(\bar{n}\).
\(\PA_{\omega}\) has all the inference rules of the sequent calculus except for \(\forall R\) and \(\exists L\). In their stead, \(\PA_{\omega}\) has the \(\omega R\) and \(\omega L\) inferences. The axioms of \(\PA_{\omega}\) are the following: (i) \(\emptyset\Rightarrow A\) if A is a true atomic sentence; (ii) \(B\Rightarrow \emptyset\) if B is a false atomic sentence; (iii) \(F(s_1,\ldots,s_n)\Rightarrow F(t_1,\ldots,t_n)\) if \(F(s_1,\ldots,s_n)\) is an atomic sentence and the \(s_i\) and \(t_i\) evaluate to the same numerals, respectively.
With the aid of the \(\omega\)-rule, each instance of the induction scheme becomes logically deducible, with an infinite proof tree. To describe the cost of cut elimination for \(\PA_{\omega}\), we introduce the measures of height and cut rank of a \(\PA_{\omega}\) deduction \(\cD\). We will notate this by
\[\cD \stile{\alpha}{k} \Gamma \Rightarrow \Delta.\]
The above relation is defined inductively following the buildup of the deduction \(\cD\). For the cut rank we need the definition of the length \(\lvert A\rvert\) of a formula:
\[\begin{align}\lvert A\rvert & =0 \textrm{ if } A \textrm{ is atomic; }\\\lvert \neg A_0\rvert & =\lvert A_0\rvert +1; \\\lvert A_0\mathbin{\Box} A_1\rvert & =\max(\lvert A_0\rvert,\lvert A_1\rvert)+1\end{align}\]
where \(\Box=\land,\lor,\to\); \(\lvert \exists x\,F(x)\rvert =\lvert \forall x\,F(x)\rvert =\lvert F(0)\rvert +1\).
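The length function is a straightforward recursion on the formula tree; a minimal Python sketch (with our own ad-hoc tuple representation of formulae) reads:

```python
# Formulae as nested tuples: ('atom', s), ('not', A),
# ('and' | 'or' | 'imp', A, B), and ('forall' | 'exists', A),
# where in the quantifier case A plays the role of the instance F(0).

def length(formula) -> int:
    """The length |A| of a formula as defined above."""
    tag = formula[0]
    if tag == 'atom':
        return 0
    if tag in ('not', 'forall', 'exists'):
        return length(formula[1]) + 1
    return max(length(formula[1]), length(formula[2])) + 1

atom = ('atom', '0=0')
print(length(atom))                          # 0
print(length(('not', atom)))                 # 1
print(length(('and', ('not', atom), atom)))  # 2
print(length(('forall', ('and', ('not', atom), atom))))  # 3
```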
Now suppose the last inference I of \(\cD\) is of the form
\[\frac{\begin{array}{c}\cD_0\\{\Gamma_0\Rightarrow \Delta_0}\end{array}\ \ldots\ \begin{array}{c}\cD_n\\{\Gamma_n\Rightarrow \Delta_n}\end{array}\ \ldots\ n<\tau } {\Gamma\Rightarrow \Delta} I\]
where \(\tau=1,2,\omega\) and the \(\cD_n\) are the immediate subdeductions of \(\cD\). If
\[{\cD_n \stile{\alpha_n}{k} \Gamma_n} \Rightarrow \Delta_n\]and \(\alpha_n<\alpha\) for all \(n<\tau\) then
\[{\cD \stile{\alpha}{k} \Gamma} \Rightarrow \Delta\]
provided that, in the case of I being a cut with cut formula A, we also have \(\lvert A\rvert <k\). We will write \({\PA_{\omega} \stile{\alpha}{k} \Gamma} \Rightarrow \Delta\) to convey that there exists a \(\PA_{\omega}\)-deduction \({\cD\stile{\alpha}{k} \Gamma}\Rightarrow \Delta\). The ordinal analysis of PA proceeds by first unfolding any PA-deduction into a \(\PA_{\omega}\)-deduction:
\[\tag{5}\label{einbett}\textrm{If } {\PA \vdash \Gamma}\Rightarrow \Delta \textrm{ then } {\PA_{\omega} \stile{\omega+m}{k} \Gamma} \Rightarrow \Delta\]
for some \(m,k<\omega\). The next step is to get rid of the cuts. It turns out that the cost of lowering the cut rank from \(k+1\) to k is an exponential with base \(\omega\).
Theorem 3.5 (Cut Elimination for \(\PA_{\omega}\)) If \({\PA_{\omega} \stile{\alpha}{k+1}\Gamma} \Rightarrow \Delta\), then
\[{\PA_{\omega} \stile{\omega^{\alpha}}{k} \Gamma} \Rightarrow \Delta.\]
As a result, if \({\PA_{\omega} \stile{\alpha}{n} \Gamma}\Rightarrow\Delta\), we may apply the previous theorem n times to arrive at a cut-free deduction \({\PA_{\omega} \stile{\rho}{0} \Gamma}\Rightarrow \Delta\) with \(\rho=\omega^{\omega^{\iddots^{\omega^{\alpha}}}}\), where the stack has height n. Combining this with the result from \((\ref{einbett})\), it follows that every sequent \(\Gamma\Rightarrow\Delta\) deducible in PA has a cut-free deduction in \(\PA_{\omega}\) of length \(<\varepsilon_0\). Ruminating on the details of how this result was achieved yields a consistency proof for PA from transfinite induction up to \(\varepsilon_0\) for elementary decidable predicates on the basis of finitist reasoning (as described below).
Gentzen did not deal explicitly with infinite proof trees in his second published proof of the consistency of PA (Gentzen 1938b). However, in the unpublished first consistency proof of Gentzen 1974 he aims at showing that a proof of a sequent in first-order arithmetic gives rise to a well-founded reduction tree; that tree can be identified with a cut-free proof in the sequent calculus with the \(\omega\)-rule. The infinitary version of PA with the \(\omega\)-rule was investigated by Schütte (1950). There remained the puzzle of how Gentzen’s work, which used an ingenious method of assigning ordinals to purported proofs of the empty sequent, relates to the infinitary approach. Much later work by Buchholz (1997) and others revealed an intrinsic connection between Gentzen’s assignment of ordinals to deductions in PA and the standard one to infinite deductions in \(\PA_{\omega}\). In the 1950s infinitary proof theory flourished in the hands of Schütte. He extended his approach to PA to systems of ramified analysis, to be discussed below in section 5.2.
One last remark about the use of ordinals: Gentzen showed that transfinite induction up to the ordinal \(\varepsilon_0\) suffices to prove the consistency of PA. Using the arithmetized formalization of the proof predicate (see above, after Definition 1.3) and taking k as the numeral denoting the Gödel number of the formula \(0=1\), we can express the consistency of PA, \(\Con (\PA)\), by the formula \(\forall x\,\neg\proof_{\PA}(x,k)\). To appreciate Gentzen’s result it is pivotal to note that he applied transfinite induction up to \(\varepsilon_0\) only for primitive recursive predicates (the latter principle was denoted above by PR-TI\((\varepsilon_0)\)). Otherwise, Gentzen’s proof used only finitist means. Hence, a more accurate formulation of Gentzen’s result is
\[\tag{6}\label{picture}\bF+\textrm{PR-TI}(\varepsilon_0) \vdash \Con (\PA),\]
where F, as above, contains only finitistically acceptable means. In his 1943 paper Gentzen also showed that this result is best possible, as PA proves transfinite induction up to any \(\alpha<\varepsilon_0\). So one might argue that the non-finitist part of PA is encapsulated in PR-TI\((\varepsilon_0)\) and therefore “measured” by \(\varepsilon_0\). \(\varepsilon_0\) is also the proof-theoretic ordinal of PA as specified in Definition 3.4. Gentzen hoped that results of this character could also be obtained for stronger mathematical theories, in particular for analysis. Hilbert’s famous second problem asked for a direct consistency proof of that mathematical theory. Gentzen wrote in 1938 that
the most important [consistency] proof of all in practice, that for analysis, is still outstanding. (1938a [Gentzen 1969: 236]).
He actually worked on a consistency proof for analysis, as letters (e.g., one to Bernays on 23.6.1935, translated in von Plato 2017: 240) and stenographic notes from 1945 (e.g., Gentzen 1945) show. Formally, “analysis” was identified already in Hilbert 1917/18 as a form of second order number theory \(\bZ_2\). Hilbert and Bernays developed mathematical analysis in their \(\bZ_2\) and to a much larger extent in a supplement of the second volume of Grundlagen der Mathematik. We take \(\bZ_2\) as given in the following way. Its language extends that of PA by an additional sort of variables \(X,Y,Z,\ldots\) that range over sets of numbers and the binary membership relation \(t\in X\). Its axioms are those of PA, but the principle of proof by induction is formulated as the second order induction axiom
\[\forall X(0\in X\land\forall x(x\in X\rightarrow S(x)\in X) \rightarrow \forall x(x\in X)).\]
Finally, the axiom schema of comprehension, CA, asserts that for every formula \({F}(u)\) of the language of \(\bZ_2\), there is a set \(X=\{u\mid {F}(u)\}\) having exactly those numbers u as members that satisfy \({F}(u)\). More formally,
\[\label{CA}\tag{\(\bCA\)}\exists X\forall u(u\in X\leftrightarrow{F}(u))\]
for all formulae \({F}(u)\) in which X does not occur. That \(\bZ_2\) is often called “analysis” is due to the realization that, via the coding of real numbers and continuous functions as sets of natural numbers, a good theory of the continuum can be developed from these axioms.
Modern analyses of “finitist mathematics” consider it as situated between PRA and PA. When arguing that Gödel’s second incompleteness theorem refutes Hilbert’s finitist program, von Neumann argued that finitist mathematics is included in PA and, if not there, undoubtedly in \(\bZ_2\). So it is quite clear that a consistency proof of \(\bZ_2\) would use non-finitist principles or that the pursuit of the consistency program would require an extension of the finitist standpoint. In the next section we discuss briefly a variety of extensions and elaborate two in greater detail.
According to Bernays, the reductive result due to Gödel and Gentzen, Theorem 1.2, has a dramatic impact on the work concerned with Hilbert’s program. It opened in a very concrete and precise way the finitist perspective to a broader “constructive” one. Hilbert himself had taken such a step in a much vaguer way in his last paper (Hilbert 1931b). Theorem 1.2 showed, after all, that PA is contained in HA via the negative translation. Since HA comprises just a fragment of Brouwer’s intuitionism, the consistency of PA is secured on the basis of the intuitionist standpoint. In a letter to Heyting of 25 February 1933, Gentzen suggested investigating the consistency of HA, since a consistency proof for classical arithmetic had not been given so far by finitist means. He then continued
If on the other hand, one admits the intuitionistic position as a secure basis in itself, i.e., as a consistent one, the consistency of classical arithmetic is secured by my result. If one wished to satisfy Hilbert’s requirements, the task would still remain of showing intuitionistic arithmetic consistent. This, however, is not possible by even the formal apparatus of classical arithmetic, on the basis of Gödel’s result in combination with my proof. Even so, I am inclined to believe that a consistency proof for intuitionistic arithmetic, from an even more evident position, is possible and desirable. (quoted and translated in von Plato 2009: 672)
Gödel took a very similar position in December of 1933 (Gödel 1995: 53). There he broadened the idea of a revised version of Hilbert’s program, allowing constructive means that go beyond the finitist ones without accepting fully fledged intuitionism; the latter he considered to be problematic, in particular on account of the impredicative nature of intuitionist implication. As to an extension of Hilbert’s position he wrote:
But there remains the hope that in future one may find other and more satisfactory methods of construction beyond the limits of the system A [capturing finitist methods], which may enable us to found classical arithmetic and analysis upon them. This question promises to be a fruitful field for further investigations.
In section 3.2 we described Gentzen’s considerations; in section 4.2 we discuss Gödel’s as developed in the late 1930s. In section 4.1 we sketch some other constructive positions.
A particularly appealing idea is to pursue Hilbert’s program relative to a constructive point of view and determine which parts of classical mathematics are demonstrably consistent relative to that standpoint (see Rathjen 2009 for pursuing this with regard to Martin-Löf type theory). As one would suspect, there are differing “schools” of constructivism and different layers of constructivism. Several frameworks for developing mathematics from such a point of view have been proposed. Some we will refer to in this article (arguably the most important) are: (a) arithmetical predicativism, (b) theories of higher type functionals, (c) Takeuti’s extended finitist standpoint, (d) Bishop-style constructive mathematics and explicit mathematics, (e) Martin-Löf type theory, and (f) constructive set theory.
At this point we will just give a very rough description of these foundational views. A few more details, especially about their place on a standard scale of theories and proof-theoretic ordinals, will be provided at the very end of section 5.3.
(a) Arithmetical Predicativism originated in the writings of Poincaré and Russell in response to the set-theoretic paradoxes. It is characterized by a ban on impredicative definitions. Whilst it accepts the completed infinite set of natural numbers, all other sets are required to be constructed out of them via an autonomous process of arithmetical definitions. A first systematic attempt at developing mathematics predicatively was made in Weyl’s 1918 monograph Das Kontinuum (Weyl 1918).
(b) Theories of higher type functionals comprise Gödel’s T and Spector’s extension of T via functionals defined by bar recursion. The basic idea goes back to Gödel’s 1938 lecture at Zilsel’s (Gödel 1995: 94). It was inspired by Hilbert’s 1926 Über das Unendliche, which considered a hierarchy of functionals over the natural numbers, not only of finite but also of transfinite type.
(c) To understand Takeuti’s finitist standpoint it is important to pinpoint the place where in a consistency proof à la Gentzen the means of PRA are exceeded. Gentzen’s proof employs a concrete ordering \(\prec\) of type \(\varepsilon_0\); it uses an assignment of ordinals to proofs and provides a reduction procedure on proofs such that any alleged proof of an inconsistency is reduced to another proof of an inconsistency which gets assigned a smaller element of the ordering. The ordering, the ordinal assignment and the reduction procedure are actually primitive recursive and the steps described so far can be carried out in a small fragment of PRA. The additional principle needed to infer the consistency of PA is the following:

\[\tag{\(*\)}\textrm{for every primitive recursive sequence } f \textrm{ of elements in the field of } \prec \textrm{ there is an } n \textrm{ such that } f(n+1)\not\prec f(n).\]
Takeuti refers to (*) as the accessibility of \(\prec\). Note that this is a weaker property than the well-foundedness of \(\prec\) which refers to arbitrary sequences. There is nothing special about the case of PA since any ordinal analysis of a theory T in the literature can be made to fit this format. Thus epistemologically (*) is the fulcrum in any such consistency proof. Takeuti’s central idea (1987, 1975) was that we can carry out Gedankenexperimente (thought experiments) on concretely given (elementary) sequences to arrive at the insight that (*) obtains.[16]
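The ordering \(\prec\) of type \(\varepsilon_0\) can be given quite concretely. The following Python sketch (ours, not taken from the literature discussed here) represents ordinals below \(\varepsilon_0\) in Cantor normal form and compares notations by a simple recursion, illustrating that the ordering whose accessibility is at issue is a decidable, primitive-recursive-style relation on finite notations:

```python
# A sketch (not from the article): ordinals below epsilon_0 in Cantor
# normal form.  A notation is a tuple of exponent notations listed in
# non-increasing order: () is 0, ((),) is omega^0 = 1, and
# (((),),) is omega^1 = omega.

def less(a, b):
    """Decidable comparison of two well-formed notations; the recursion
    mirrors lexicographic comparison of the CNF exponent lists."""
    for x, y in zip(a, b):
        if less(x, y):
            return True
        if less(y, x):
            return False
    return len(a) < len(b)

ZERO = ()
ONE = (ZERO,)      # omega^0 = 1
OMEGA = (ONE,)     # omega^1 = omega

assert less(ZERO, ONE) and less(ONE, OMEGA)
assert less((ONE, ZERO), (OMEGA,))   # omega + 1 < omega^omega
```

A statement like (*) then says: no primitive recursive function can produce an infinite `less`-descending chain of such notations.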
(d) Errett Bishop’s novel (informal) approach to constructive analysis (1967) made a great impression on mathematicians with constructive leanings. In it, he dealt with different kinds of mathematical objects (numbers, functions, sets) as if they were given by explicit presentations, each kind being equipped with its own germane “equality” relation conceived in such a way that operations on them would lead from representations to representations respecting the equality relation. An important ingredient that made Bishop’s constructivism workable is the systematic use of witnessing data as an integral part of what constitutes a mathematical object. For instance, a real number comes with a modulus of convergence while a function of real numbers comes equipped with a modulus of (uniform) continuity. In his explicit mathematics, Feferman (1975, 1979) aims (among other things) at formalizing the core of Bishop’s ontology. Explicit mathematics is a theory that describes a realm of concretely and explicitly given objects (a universe U of symbols) equipped with an operation \(\bullet\) of application in such a way that given two objects \(a,b\in U\), a may be viewed as a program which can be run on input b and may produce an output \(a\bullet b\in U\) or never halt (such structures are known as partial combinatory algebras or Schönfinkel algebras). Moreover, some of the objects of U represent sets of elements of U. The construction of new sets out of given sets is either done explicitly by elementary comprehension or by a process of inductive generation. If one also adds principles to the effect that every internal operation (given as \(\lambda x.a\bullet x\) for some \(a\in U\)) which is monotone on sets possesses a least fixed point one arrives at a remarkably strong theory (cf. Rathjen 1998, 1999b, 2002).
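The application structure just described can be illustrated with a toy model. In the Python sketch below (our illustration; a genuine partial combinatory algebra allows application to be undefined, which total Python functions cannot exhibit), K and S are Schönfinkel’s combinators and \(S\,K\,K\) behaves as the identity:

```python
# A sketch (not from the article): application in a combinatory algebra.
# Objects are curried Python functions; in a genuine partial combinatory
# algebra, app(a, b) may be undefined (a program that never halts).

def app(a, b):
    """Apply object a to object b (total in this toy model)."""
    return a(b)

K = lambda x: lambda y: x                                  # K x y = x
S = lambda x: lambda y: lambda z: app(app(x, z), app(y, z))  # S x y z = x z (y z)

# S K K acts as the identity combinator:
I = app(app(S, K), K)
assert app(I, 42) == 42
```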
(e) Martin-Löf type theory is an intuitionist theory of dependent types intended to be a full scale system for formalizing constructive mathematics. Its origins can be traced to Principia Mathematica, Hilbert’s Über das Unendliche, the natural deduction systems of Gentzen, taken in conjunction with Prawitz’s reduction procedures, and to Gödel’s Dialectica system. It incorporates inductively defined data types that, together with the vehicle of internal reflection via universes, endow it with considerable consistency strength.
(f) Constructive set theory (like the theories under (d) and (e)) sets out to develop a framework for the style of constructive mathematics of Bishop’s 1967 Foundations of Constructive Analysis, in which he carried out a development of constructive analysis, based on informal notions of constructive function and set, which went substantially further mathematically than anything done before by constructivists. Where Brouwer reveled in differences, Bishop stressed the commonalities with classical mathematics. What was novel about his work was that it could be read as a piece of classical mathematics as well.
The ‘manifesto’ of constructive set theory was most vividly expressed by Myhill:
… the argumentation of [Bishop 1967] looks very smooth and seems to follow directly from a certain conception of what sets, functions, etc. are, and we wish to discover a formalism which isolates the principles underlying this conception in the same way that Zermelo-Fraenkel set-theory isolates the principles underlying classical (nonconstructive) mathematics. We want these principles to be such as to make the process of formalization completely trivial, as it is in the classical case. (Myhill 1975: 347)
Despite first appearances, there are close connections between the approaches of (d)–(f). Constructive set theory can be interpreted in Martin-Löf type theory (due to Aczel 1978) and explicit mathematics can be interpreted in constructive set theory (see Griffor & Rathjen 1994, Theorem 3.9). Perhaps the closest fit between (e) and (f), giving back and forth interpretations, is provided by Rathjen & Tupailo 2006. Some concrete mathematical results are found at the end of section 5.3.
Among the proposals for extending finitist methods put forward in his 1938 lecture at Zilsel’s, Gödel appears to have favored the route via higher type functions. Details of what came to be known as the Dialectica interpretation were not published until 1958 (Gödel 1958) but the D-interpretation itself was arrived at by 1941. Gödel’s system \({T}\) axiomatizes a class of functions that he called the primitive recursive functionals of finite type. \({T}\) is a largely equational theory whose axioms are equations involving terms for higher type functionals with just a layer of propositional logic on top of that. In this way the quantifiers, problematic for finitists and irksome to intuitionists, are avoided. To explain the benefits of the D-interpretation we need to have a closer look at the syntax of \({T}\) and its principles.
Definition 4.1 \({T}\) has a many-sorted language in that each term is assigned a type. Types (symbols) are generated from 0 by the rule: If \(\sigma\) and \(\tau\) are types then so is \(\sigma\to\tau\). Intuitively the ground type 0 is the type of natural numbers. If \(\sigma\) and \(\tau\) are types that are already understood then \(\sigma\to\tau\) is a type whose objects are considered to be functions from objects of type \(\sigma\) to objects of type \(\tau\). In addition to variables \(x^{\tau},y^{\tau},z^{\tau},\ldots\) for each type \(\tau\), the language of \({T}\) has special constants 0, \(\rsuc\), \(\rK_{\sigma,\tau}\), \(\rS_{\rho,\sigma,\tau}\), and \(\rR_{\sigma}\) for all types \(\rho,\sigma,\tau\). The meaning of these constants is explained by their defining equations. \(\rK_{\sigma,\tau}\) and \(\rS_{\rho,\sigma,\tau}\) are familiar from combinatory logic which was introduced by Schönfinkel in 1924 and became more widely known through Curry’s work (1930). 0 plays the role of the first natural number while \(\rsuc\) embodies the successor function on objects of type 0. The constants \(\rR_{\sigma}\), called recursors, provide the main vehicle for defining functionals by recursion on \(\bbN\). Term formation starts with constants and variables, and if s and t are terms of type \(\sigma\to\tau\) and \(\sigma\), respectively, then \(s(t)\) is a term of type \(\tau\). To increase readability we shall write \(t(r,s)\) instead of \((t(r))(s)\) and \(t(r,s,q)\) instead of \((t(r,s))(q)\) etc. Also \(\rsuc(t)\) will be shortened to \(t'\). The defining axioms for the constants are the following:[17]

\[\begin{align*} \rK_{\sigma,\tau}(x,y) &= x\\ \rS_{\rho,\sigma,\tau}(x,y,z) &= x(z,y(z))\\ \rR_{\sigma}(a,f,0) &= a\\ \rR_{\sigma}(a,f,t') &= f(\rR_{\sigma}(a,f,t),t) \end{align*}\]
The axioms of \({T}\) consist of the above defining axioms, equality axioms and axioms for propositional logic. Inference rules are modus ponens and the induction rule
\[\textrm{from } A(0) \textrm{ and } A(x) \to A(x') \textrm{ conclude } A(t)\]for t of type 0 and x not in \(A(0)\).
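To see how the recursors operate at the ground type, the following Python sketch (ours, assuming the usual defining equations \(R(a,f,0)=a\) and \(R(a,f,n+1)=f(R(a,f,n),n)\)) defines addition from the successor:

```python
# A sketch (not from the article): the type-0 recursor of Goedel's T,
# following the defining equations R(a, f, 0) = a and
# R(a, f, n+1) = f(R(a, f, n), n).

def R(a, f, n):
    """Primitive recursion on the natural number n."""
    result = a
    for i in range(n):
        result = f(result, i)
    return result

def suc(n):
    """The successor function on type-0 objects."""
    return n + 1

def add(m, n):
    """Addition defined from the successor via the recursor."""
    return R(m, lambda prev, _: suc(prev), n)

assert add(3, 4) == 7
```

In \({T}\) itself the recursors exist at every type \(\sigma\), so functionals (not just numbers) may be defined by such recursions.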
The first step towards the D-interpretation of Heyting arithmetic in \({T}\) consists of associating to each formula A of arithmetic a syntactic translation \(A^D\) which is of the form

\[A^D \equiv \exists \vec{x}\,\forall \vec{y}\, A_D(\vec{x},\vec{y})\]
with \(A_D(\vec{x},\vec{y})\) being quantifier free. Thus \(A^D\) is not a formula of \({T}\) but of its augmentation via quantifiers \(\forall x^{\tau}\) and \(\exists y^{\tau}\) for all types \(\tau\). The translation proceeds by induction on the buildup of A. The cases where the outermost logical symbol of A is among \(\land\), \(\lor\), \(\exists x\), \(\forall x\) are rather straightforward. The crucial case occurs when A is an implication \(B\to C\). To increase readability we shall suppress the typing of variables. Let \(B^D\equiv \exists \vec{x}\, \forall \vec{y}\, B_D(\vec{x},\vec{y})\) and \(C^D\equiv {\exists \vec{u}\, \forall \vec{v}\, C_D(\vec{u},\vec{v})}\). Then one uses a series of judicious equivalences to bring the quantifiers in \(B^D\to C^D\) to the front and finally employs skolemization of existential variables as follows:
\[\begin{align}\tag{i} \exists \vec{x}\,\forall \vec{y} B_D(\vec{x},\vec{y})&\to \exists \vec{u}\,\forall \vec{v} C_D(\vec{u},\vec{v})\\\tag{ii} \forall \vec{x}[\,\forall \vec{y} B_D(\vec{x},\vec{y})&\to \exists \vec{u}\,\forall \vec{v} C_D(\vec{u},\vec{v})]\\\tag{iii} \forall \vec{x}\,\exists\vec{u}[\,\forall \vec{y} B_D(\vec{x},\vec{y})&\to \forall \vec{v} C_D(\vec{u},\vec{v})]\\\tag{iv} \forall \vec{x}\,\exists\vec{u}\,\forall \vec{v}[\,\forall \vec{y} B_D(\vec{x},\vec{y})&\to C_D(\vec{u},\vec{v})]\\\tag{v} \forall \vec{x}\,\exists\vec{u}\,\forall \vec{v}\,\exists \vec{y}[B_D(\vec{x},\vec{y})&\to C_D(\vec{u},\vec{v})]\\\tag{vi} \forall \vec{x}\,\exists\vec{u}\,\exists Y\,\forall \vec{v}[B_D(\vec{x},Y(\vec{v}))&\to C_D(\vec{u},\vec{v})]\\\tag{vii}\exists U\,\exists Z\,\forall \vec{x}\,\forall \vec{v}[B_D(\vec{x},Z(\vec{x},\vec{v}))&\to C_D(U(\vec{x}),\vec{v})].\end{align}\]\(A^D\) is then defined to be the formula in (vii). Note, however, that these equivalences are not necessarily justified constructively. Only (i) \(\Leftrightarrow\) (ii) and (iii) \(\Leftrightarrow\) (iv) hold constructively whereas (v) \(\Leftrightarrow\) (vi) and (vi) \(\Leftrightarrow\) (vii) are justified constructively only if one also accepts the axiom of choice for all finite types (ACft). Equivalences (ii) \(\Leftrightarrow\) (iii) and (iv) \(\Leftrightarrow\) (v) use a certain amount of classical logic known as the principle of independence of premise (IPft) and Markov’s principle (MPft) for all finite types, respectively. At this point \(A\mapsto A^D\) is just a syntactic translation. But amazingly it gives rise to a meaningful interpretation of HA in T.
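As a simple illustration of the translation (our example, under the clauses just described): if \(B(x,y)\) is quantifier free, then

\[(\forall x\,\exists y\,B(x,y))^D \equiv \exists Y\,\forall x\, B(x,Y(x)),\]

so a proof in HA of \(\forall x\,\exists y\,B(x,y)\) yields, by the theorem below, a term \(Y\) of \({T}\) computing a witness \(y\) from each \(x\).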
Theorem 4.2 (Gödel 1958) Suppose \(\cD\) is a proof of A in HA and \(A^D\equiv \exists \vec{x}\,\forall \vec{y}\, A_D(\vec{x},\vec{y})\). Then one can effectively construct a sequence of terms \(\vec t\) (from \(\cD\)) such that \({T}\) proves \(A_D(\vec t,\vec y\,)\).
If one combines the D-interpretation with the Kolmogorov-Gentzen-Gödel negative translation of PA into HA one also arrives at an interpretation of PA in \({T}\). Some interesting consequences of the latter are that the consistency of PA follows finitistically from the consistency of \({T}\) and that every total recursive function of PA is denoted by a term of \({T}\).
The three principles ACft, IPft and MPft which figured in the D-translation actually characterize the D-translation in the sense that over the quantifier extension of \({T}\) with intuitionistic logic, called \(\bHA^{\omega}\), they are equivalent to the schema
\[C\leftrightarrow C^D\]for all formulae C of that theory. Principles similar to the three above are also often validated in another type of computational interpretation of intuitionistic theories known as realizability. Thus, it appears that they are intrinsically related to computational interpretations of such theories.
A further pleasing aspect of Gödel’s interpretation is that it can be extended to stronger systems such as higher order systems and even to set theory (Burr 2000, Diller 2008). Moreover, it sometimes allows one to extract computational information even from proofs of specific classical theorems (see, e.g., Kohlenbach 2007). It behaves nicely with respect to modus ponens and thus works well for ordinary proofs that are usually structured via a series of lemmata. This is in contrast to cut elimination which often requires a computationally costly transformation of proofs.
Spector (1962) extended Gödel’s functional interpretation, engineering an interpretation of \(\bZ_2\) into T augmented via a scheme of transfinite recursion on higher type orderings. This type of recursion, called bar recursion, is conceptually related to Brouwer’s bar induction principle. (For a definition of bar induction and a presentation of Spector’s result see Appendix C.)
We described the system \(\bZ_2\) of second order arithmetic already at the end of section 3.2. It was viewed as the “next” system to be proved consistent—after first-order arithmetic PA had been shown to be. As we mentioned \(\bZ_2\) is also called “analysis”, because it allows the straightforward development of classical mathematical analysis after coding real numbers and continuous functions as sets of natural numbers. Indeed, Hermann Weyl showed in 1918 that a considerable portion of analysis can be developed in small fragments of \(\bZ_2\) that are actually conservative over PA. The idea of singling out the minimal fragment of \(\bZ_2\) required to expose a particular part of ordinary mathematics led in the 1980s to the research program of reverse mathematics. However, before discussing that program, we are going to describe proof-theoretic investigations of \(\bZ_2\) and its subsystems that have been a focal point until the very early 1980s.
After Gentzen, it was Gaisi Takeuti who worked on a consistency proof for \(\bZ_2\) in the late 1940s. He conjectured that Gentzen’s Hauptsatz not only holds for first order logic but also for higher order logic, also known as simple type theory, STT. This came to be known as Takeuti’s fundamental conjecture.[18] The particular sequent calculus he introduced was called a generalized logic calculus, GLC (Takeuti 1953). \(\bZ_2\) can be viewed as a subtheory of GLC. In the setting of GLC the comprehension principle CA is encapsulated in the right introduction rule for the existential second-order quantifier and the left introduction rule for the universal second-order quantifier. In order to display these rules the following notation is convenient. If \(F(U)\) and \(A(a)\) are formulae then \(F(\{v\mid A(v)\})\) arises from \(F(U)\) by replacing all subformulae \(t\in U\) of \(F(U)\) (with U indicated) by \(A(t)\). The rules for second order quantifiers can then be stated as follows:[19]
\[\begin{array}{lcclc}(\forall_2\,\bL) &\displaystyle \frac {{ F(\{v\mid A(v)\}) },\Gamma\Rightarrow \Delta}{\forall X\,{F(X),\Gamma} \Rightarrow \Delta }& & (\forall_2\,\rR) & \displaystyle\frac {\Gamma\Rightarrow {\Delta, F(U)}}{\Gamma\Rightarrow \Delta, {\forall X\,F(X) }} \\[2ex](\exists_2\,\bL) & \displaystyle \frac{F(U),\Gamma\Rightarrow \Delta}{{ \exists X\,F(X)},\Gamma\Rightarrow \Delta } & &(\exists_2\,\rR) & \displaystyle \frac{\Gamma\Rightarrow \Delta,F(\{ v\mid A(v)\})} {\Gamma\Rightarrow \Delta,{\exists X\,F(X) }} \end{array} \]To deduce an instance \(\exists X\,\forall x\,[x\in X \leftrightarrow A(x)]\) of CA just let \(F(U)\) be the formula \(\forall x\,[x\in U \leftrightarrow A(x)]\) and observe that
\[F(\{v\mid A(v)\})\equiv \forall x\,[A(x)\leftrightarrow A(x)], \]and hence
\[ \begin{gather}\vdots \\ \displaystyle \frac{\Gamma\Rightarrow \Delta, \forall x\,[A(x)\leftrightarrow A(x)]}{\Gamma\Rightarrow \Delta, \exists X\,\forall x\,[x\in X \leftrightarrow A(x)]}\tag{\(\exists_2\,\rR\)}\end{gather}\]As the deducibility of the empty sequent is ruled out if cut elimination holds for GLC (or just the fragment GLC2 corresponding to \(\bZ_2\)), Takeuti’s Fundamental Conjecture entails the consistency of \(\bZ_2\). However, note that it does not yield the subformula property as in the first-order case. After all, the minor formula \(F(\{v\mid A(v)\})\) in \((\exists_2\,\rR)\) and \((\forall_2\,\bL)\) may have a much higher (quantifier) complexity than the principal formula \(\exists X F(X)\) and \(\forall X F(X)\), respectively. Indeed, \(\exists X F(X)\) may be a proper subformula of \(A(x)\) which clearly exhibits the impredicative nature of these inferences and shows that they are strikingly different from those in predicative analysis, where the subformula property obtains.
In 1960a Schütte developed a semantic equivalent to the (syntactic) fundamental conjecture using partial or semi-valuations. He employed the method of search trees (or deduction chains) to show that a formula F that cannot be deduced in the cut-free system has a deduction chain without axioms which then gives rise to a partial valuation V assigning the value “false” to F. From the latter he inferred that the completeness of the cut-free system[20] is equivalent to the semantic property that every partial valuation can be extended to a total valuation (basically a Henkin model of STT). In 1966 Tait succeeded in proving cut-elimination for second order logic using Schütte’s semantic equivalent for that fragment. Soon afterwards, Takahashi (1967) and Prawitz (1968) independently proved for full classical simple type theory that every partial valuation extends to a total one, thereby establishing Takeuti’s fundamental conjecture. These results, though, were somewhat disappointing as they were obtained by highly non-constructive methods that provided no concrete method for eliminating cuts in a derivation. In contrast, Girard showed in 1971 that simple type theory not only allows cut-elimination but that there is also a terminating normalization procedure.[21] These are clearly very interesting results, but as far as instilling trust in the consistency of \(\bZ_2\) or STT is concerned, the cut elimination or termination proofs are just circular since they blatantly use the very comprehension principles formalized in these theories (and a bit more). To quote Takeuti:
My fundamental conjecture itself has been resolved in a sense by Motoo Takahashi and Dag Prawitz independently. However, their proofs rely on set theory, and so it cannot be regarded as an execution of Hilbert’s Program. (Takeuti 2003: 133)
Takeuti’s work on his conjecture instead focused on partial results. A major breakthrough that galvanized research in proof theory, especially ordinal-theoretic investigations, was made by him in 1967. In Takeuti 1967 he gave a consistency proof for \(\Pi^1_1\)-comprehension and thereby for the first time obtained an ordinal analysis of an impredicative theory. For this Takeuti vastly extended Gentzen’s method of assigning ordinals (ordinal diagrams, to be precise) to purported derivations of the empty sequent. It is worth quoting Takeuti’s own assessment of his achievements.
… the subsystems for which I have been able to prove the fundamental conjecture are the system with \(\Pi^1_1\) comprehension axiom and a slightly stronger system, that is, the one with \(\Pi^1_1\) comprehension axiom together with inductive definitions. […] I tried to resolve the fundamental conjecture for the system with the \(\Delta^1_2\) comprehension axiom within our extended version of the finite standpoint. Ultimately, our success was limited to the system with provably \(\Delta^1_2\) comprehension axiom. This was my last successful result in this area. (Takeuti 2003: 133)
The subsystems of \(\bZ_2\) that are alluded to in the above discussion are now to be described. We consider the axiom schema of \(\cC\)-comprehension for formula classes \(\cC\) which is given by
\[\tag*{\(\cC\Hy\bCA\)}\Gamma\Rightarrow \exists X\forall u(u\in X\leftrightarrow{F}(u))\]for all formulae \({F}\in\cC\) in which X does not occur. Natural formula classes are the arithmetical formulae, consisting of all formulae without second order quantifiers \(\forall X\) and \(\exists X\), and the \(\Pi^1_n\)-formulae, where a \(\Pi^1_n\)-formula is a formula of the form \(\forall X_1\ldots Q X_n\,A(X_1,\ldots,X_n)\) with \(\forall X_1\ldots Q X_n\) being a string of n alternating set quantifiers, commencing with a universal one, followed by an arithmetical formula \(A(X_1,\ldots,X_n)\). Note that in this notation the class of arithmetical formulae is denoted by \(\Pi^1_0\).
Also “mixed” forms of comprehension are of interest, e.g.,
\[\tag*{\(\Delta^1_n{\Hy}\bCA\)}\Gamma\Rightarrow \forall u\,[F(u)\leftrightarrow G(u)] \to \exists X\,\forall u\,[u\in X \leftrightarrow F(u)] \]where \(F(u)\) is in \(\Pi^1_n\) and \(G(u)\) in \(\Sigma^1_n\).
One also considers \(\Delta^1_n\) comprehension rules:
\[\frac{\emptyset \Rightarrow \forall u\,[F(u)\leftrightarrow G(u)]}{\Gamma\Rightarrow \exists X\,\forall u\,[u\in X \leftrightarrow F(u)]}\quad \quad\quad \begin{split}\textrm{if } & F(u)\in\Pi^1_n,\\& G(u)\in\Sigma^1_n\end{split}\tag*{\(\Delta^1_n{\Hy}\bCR\)} \]For each axiom scheme \(\mathbf{Ax}\) we denote by \((\mathbf{Ax})_0\) the theory consisting of the basic arithmetical axioms plus the scheme \(\mathbf{Ax}\). By contrast, \((\mathbf{Ax})\) stands for the theory \((\mathbf{Ax})_0\) augmented by the scheme of induction for all \(\cL_2\)-formulae. An example for these notations is the theory \((\bPi^1_1{\Hy}\bCA)_0\) which has the comprehension schema for \(\bPi^1_1\)-formulae.
In PA one can define an elementary injective pairing function on numbers, e.g., \((n,m)\coloneqq 2^n\times 3^m\). With the help of this function an infinite sequence of sets of natural numbers can be coded as a single set of natural numbers. The \(n^{th}\) section of a set of natural numbers U is defined by \(U_n\,\coloneqq \,\{m:\,(n,m)\in U\}\). Using this coding, we can formulate a form of the axiom of choice for formulae \({F}\) in \(\cC\) by
\[\tag*{\(\cC{\Hy}\bAC\)}\Gamma\Rightarrow \forall x\exists Y{F}(x,Y)\rightarrow\exists Y\forall x{F}(x,Y_x).\]The basic relations between the above theories are discussed in Feferman and Sieg 1981a.
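The coding of a sequence of sets by sections is entirely elementary; a small Python sketch (ours) of the pairing \((n,m)=2^n\times 3^m\) and of section decoding:

```python
# A sketch (not from the article): coding a sequence of sets of naturals
# as one set via the injective pairing (n, m) := 2^n * 3^m.

def pair(n, m):
    """The elementary injective pairing function."""
    return 2**n * 3**m

def section(U, n, bound=50):
    """The n-th section U_n = {m : (n, m) in U}; the search is bounded
    only because this is a finite sketch."""
    return {m for m in range(bound) if pair(n, m) in U}

# Code the sequence A_0 = {1, 2}, A_1 = {0} as a single set U:
U = {pair(0, 1), pair(0, 2), pair(1, 0)}
assert section(U, 0) == {1, 2}
assert section(U, 1) == {0}
```

The choice scheme above then asserts that a witnessing set \(Y\) for each \(x\) can be assembled into one set whose \(x^{th}\) section is that witness.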
A major stumbling block for proving Takeuti’s fundamental conjecture is that in \((\forall_2\bL)\) and \((\exists_2\rR)\) inferences the minor formula \(F(\{v\mid A(v)\})\) can have a much higher complexity than the principal (inferred) formula \(QX\,F(X)\). If, instead, one allowed these inferences only in cases where the ‘abstraction’ term \(\{v\mid A(v)\}\) had (in some sense) a lower complexity than \(QX\,F(X)\), cut elimination could be restored. To implement this idea, one introduces a hierarchy of sets (formally represented by abstraction terms) whose complexity is stratified by ordinal levels \(\alpha\), and a pertaining hierarchy of quantifiers \(\forall X^{\beta}\) and \(\exists X^{\beta}\) conceived to range over sets of levels \(<\beta\). This is the basic idea underlying the ramified analytic hierarchy. The problem of which ordinals could be used for the transfinite iteration led to the concept of autonomous progressions of theories. The general idea of progressions of theories is very natural and we shall consider it first before discussing the autonomous versions.
As observed earlier, Hilbert attempted to overcome the incompleteness of first-order arithmetic by introducing as axioms \(\Pi^0_1\)-statements all of whose instances had been finitistically proved (Hilbert 1931a). In a way he modified the concept of a “formal” theory by invoking finitist provability. Bernays, in his letter to Gödel of January 18, 1931 (Gödel 2003: 86–88), proposed a rule of a more general form. He indicated also that it would allow the elimination of the induction principle—in exchange for dealing with infinite proofs.
These considerations among others raised the issue of what constitutes a properly formal theory. Gödel paid very special attention to it when giving his Princeton Lectures in 1934. At the very end he introduced the general recursive functions. This class of number theoretic functions was shown to be co-extensional with Church’s λ-definable ones by Church and Kleene. In Church 1936 an “identification” of effective calculability and general recursiveness was proposed, what is usually called Church’s thesis. Turing, of course, proposed his machine computability for a very similar purpose and proved its equivalence to λ-definability in an appendix to his 1936. Church and Turing used their respective notion to establish the undecidability of first-order logic. For Gödel, this was the background for formulating the incompleteness theorems in “full generality” for all formal theories (containing a modicum of number or set theory); see the Postscriptum to the Princeton Lectures Gödel wrote in 1964:
In consequence of later advances, in particular of the fact that, due to A.M. Turing’s work, a precise and unquestionably adequate definition of the general concept of formal system can now be given, the existence of undecidable arithmetical propositions and the non-demonstrability of the consistency of a system in the same system can now be proved rigorously for every consistent formal system containing a certain amount of finitary number theory. (Gödel 1986: 369).
The first incompleteness theorem is proved for any such theory T, by explicitly producing an unprovable yet true statement \(G_\bT\). That formula can then be added to T making \(\bT+G_\bT\) a “less incomplete” theory. Von Neumann had already established the equivalence of \(G_\bT\) with the consistency statement for T, \(\Con (\bT)\); the latter expresses that there is no proof in T of a blatantly false statement such as \(0=1\). This gives then rise to an extension procedure leading from T to \(\bT'\), namely (R1) \(\bT'=\bT+G_\bT\).
Thus one might try to address the incompleteness of T by forming a sequence of theories \(\bT=\bT_0\subset \bT_1\subset\bT_2\subset\ldots\) where \(\bT_{i+1}=\bT_i'\) and to continue this into the transfinite. The latter can be achieved by letting \(\bT_{\lambda}=\bigcup_{\alpha<\lambda}\bT_{\alpha}\) for limit ordinals λ and \(\bT_{\alpha+1}=\bT_{\alpha}'\) for successor ordinals \(\alpha+1\). However, the consistency statement for \(\bT_{\lambda}\), thus the provability predicate for the theory, has to be expressed in the language of \(\bT_{\lambda}\), and one cannot simply use set theoretic ordinals. Furthermore, the extensions of T are all supposed to be formal theories, i.e., the axioms have to be enumerable by recursive functions. To deal with both issues at once, one has to deal with ordinals in an effective way.
That is what Turing did in his Princeton dissertation (1939) concerning what he called ordinal logics. There he considers two ways of achieving the effective representation of ordinals. The first way is via the set \(\rW\) of numbers e for recursive well-orderings \(\leq_e\), the second is provided by the class of Church-Kleene notations for ordinals (Church and Kleene 1936) that used expressions in the λ-calculus to describe ordinals. The latter approach was then modified in Kleene 1938 to an equivalent recursion-theoretic definition that uses numerical codes to denote countable ordinals and is known as Kleene’s \({\cO}\).
Definition 5.1 A computable or recursive function on the naturals is one that can be computed by a Turing machine. The program of a Turing machine M can be assigned a Gödel number \({\Corner{M}}\). For natural numbers \(e,n\), to convey that the Turing machine with Gödel number e computes a number m on input n, we use the notation \(\{e\}(n)\) for m.
Kleene uses \({\rsuc}(a)\coloneqq 2^a\) as notations for successor ordinals and \({\rlim}(e)\coloneqq 3\cdot 5^e\) for limit ordinals.
The class \({\cO}\) of ordinal notations, the partial ordering relation \(\relLTcO\) between such notations, and the ordinal \({\mid}{a}{\mid}\) denoted by \(a\in {\cO}\) are defined simultaneously as follows: \(1\in{\cO}\) and \({\mid}1{\mid}=0\); if \(a\in{\cO}\), then \({\rsuc}(a)\in{\cO}\), \(a\relLTcO{\rsuc}(a)\), and \({\mid}{\rsuc}(a){\mid}={\mid}a{\mid}+1\); if \(\{e\}\) is a total recursive function with \(\{e\}(n)\in{\cO}\) and \(\{e\}(n)\relLTcO\{e\}(n+1)\) for all \(n\), then \({\rlim}(e)\in{\cO}\), \(\{e\}(n)\relLTcO{\rlim}(e)\) for all \(n\), and \({\mid}{\rlim}(e){\mid}=\sup_{n}{\mid}\{e\}(n){\mid}\); finally, \(\relLTcO\) is transitive.
The first ordinal \(\tau\) such that there is no recursive well-ordering of order type \(\tau\) is usually denoted by \(\omega_1^{CK}\) in honor of Church and Kleene. It can be shown for the above definition of \({\cO}\) that the recursive ordinals are exactly those that have a notation in \({\cO}\).
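A small Python sketch (ours) of the notation machinery, using the standard convention that the notation 1 denotes the ordinal 0; limit notations \(3\cdot 5^e\) are not evaluated here, since doing so would require running the Turing machine coded by e:

```python
# A sketch (not from the article): Kleene-style ordinal notations with
# suc(a) = 2^a and lim(e) = 3 * 5^e; the notation 1 denotes 0.
# We only evaluate notations built by finitely many successor steps.

def suc(a):
    """The successor notation 2^a."""
    return 2**a

def ordinal_of(a):
    """|a| for notations not involving limits."""
    if a == 1:
        return 0
    # a = 2^b for a unique predecessor notation b
    b = 0
    while 2**b < a:
        b += 1
    assert 2**b == a, "not a successor notation"
    return ordinal_of(b) + 1

assert ordinal_of(1) == 0
assert ordinal_of(suc(1)) == 1       # 2 denotes the ordinal 1
assert ordinal_of(suc(suc(1))) == 2  # 4 denotes the ordinal 2
```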
When it comes to theories T, quite unlike other areas of logic (e.g., model theory), results such as those presented in this section depend not only on the set of axioms of T, but also on the way they are presented. When talking about a theory T we assume that T is given by a \(\Sigma^0_1\)-formula \(\psi(v_0)\) such that F is an axiom of T just in case \(\psi({\Corner{F}})\) holds; a \(\Sigma^0_1\)-formula is of the form \(\exists y_1\ldots\exists y_n\,R(y_1,\ldots y_n)\) with R primitive recursive. This consideration together with Kleene’s \({\cO}\) allows us to build a transfinite hierarchy of theories based on any suitable theory T. A consistency progression based on T is a primitive recursive function \(n\mapsto \psi_n\) that associates with every natural number n a \(\Sigma^0_1\)-formula \(\psi_n(v_0)\) that defines \(\bT_n\) such that PA proves: (i) \(\bT_0=\bT\); (ii) \(\bT_{{\rsuc}(n)}=\bT_n'\), and (iii) \(\bT_{{\rlim}(n)}=\bigcup_{x\in{\bbN}}\bT_{\{n\}(x)}\). So, finally we can formulate Turing’s completeness result.
Theorem 5.2 For any true \(\Pi^0_1\) sentence F a number \(a_{F}\in{\cO}\) can be constructed such that \({{\mid}\,{a_{F}}\,{\mid}}=\omega+1\) and \(\bT_{a_{F}}\vdash F\). Moreover, the function \(F\mapsto a_{F}\) is given by a primitive recursive function.
At first glance Turing’s theorem seems to provide some insight into the nature of true \(\Pi^0_1\)-statements. That this is an “illusion” is revealed by the analysis of its simple proof which is just based on the trick of coding the truth of F as a member of \({\cO}\). The proof also shows that the infinitely many iterated consistency axioms \(\Con (\bT_0),\Con (\bT_1),\ldots\) of \(\bT_{{{\rsuc}}({\rlim}(e))}\) are irrelevant for proving F. As it turns out, the reason why one has to go to stage \(\omega+1\) is simply that only at stage \(\omega\) a non-standard definition of the axioms of \(\bigcup_{n<\omega}\bT_n\) can be introduced. More details and other results on recursive progressions are discussed in Appendix B. Here let us just mention that one has considered other progressions based on various extension procedures \(\bT \mapsto \bT'\) that strengthen a given theory, notably:[22]

(R2) \(\bT'=\bT+\{\mathrm{Prov}_{\bT}(\Corner{F})\to F : F \textrm{ a sentence}\}\); (R3) \(\bT'=\bT+\{\forall x\,[\mathrm{Prov}_{\bT}(\Corner{F(\bar{x})})\to F(x)] : F(v) \textrm{ a formula}\}\).
(R2) is called an extension by the local reflection principle, whereas (R3) uses the uniform reflection principle. Feferman obtained in 1962 an amazing result that strengthens Turing’s result in a dramatic way.
Let \((\bT_a)_{a\in{\cO}}\) be a progression using the uniform reflection principle with PA as the base theory T. Then we have: for any true arithmetical sentence F there is an \(a\in{\cO}\) such that \(\bT_a\vdash F\). Moreover, \(a\in{\cO}\) can be chosen such that \({{\mid}\,{a}\,{\mid}}\leq \omega^{\omega^{\omega+1}}\).
For further discussion see Appendix B. Here we just note that the union of the \(\bT_a\) is no longer a formal theory.
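The successor step of such a progression can be pictured, at finite stages, by a small symbolic sketch. This is our illustration only: the strings "PA" and "Con(…)" are mere labels, and no actual provability or coding via \(\cO\) is modeled.

```python
# A purely symbolic sketch of the finite stages of a consistency
# progression: T_0 = PA, T_{n+1} = T_n + Con(T_n).
# The strings are labels only; no proof system is implemented.

def succ(theory):
    """Extend a theory (a tuple of axiom labels) by a consistency statement."""
    return theory + ("Con(" + " + ".join(theory) + ")",)

def stage(n):
    """Axiom labels of T_n."""
    t = ("PA",)
    for _ in range(n):
        t = succ(t)
    return t

for n in range(3):
    print(n, stage(n))
```

Already at the transfinite stages, as the text explains, the interest shifts from the individual axioms to how the whole sequence of theories is presented.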
In the foregoing progressions the ordinals remained external to the theory. Autonomous progressions of theories are the proper internalization of the general concept of progressions. In the autonomous case one is allowed to ascend to a theory \(\bT_a\) only if one has already shown in a previously accepted theory \(\bT_b\) that \(a\in{\cO}\). This idea of generating a hierarchy of theories via a boot-strapping process appeared for the first time in Kreisel 1960, where it was proposed as a way of characterizing finitism and predicativism in a mathematically precise way. In more formal terms, the starting point is a theory \(\bT_0\) which is accepted as correct and an extension procedure \(\bT\mapsto\bT'\) which is viewed as leading from a correct theory T to a more encompassing correct theory \(\bT'\). Moreover, the language of these theories is supposed to contain a formula \(\rAcc(x)\) such that provability of \(\rAcc(\bar{a})\) in a correct theory entails that \(a\in{\cO}\).[23] Kreisel singled out two autonomous progressions of theories \(\{{{\bF}}_a\}\) and \(\{{{\bR}}_a\}\) for finitism and predicativity, respectively, and determined the least upper bound of the \(\lvert a\rvert\) appearing in the first hierarchy to be the ordinal \(\varepsilon_0\), which is also the proof-theoretic ordinal of PA. The determination of the least upper bound for the predicative hierarchy \(\{{{\bR}}_a\}\) was achieved independently by Feferman (1964) and Schütte (1964, 1965). It turned out that this ordinal can be expressed in a notation system developed by Veblen that will be presented next.
As we saw above, ordinals below \(\varepsilon_0\) suffice for the proof-theoretic treatment of PA. For stronger theories, segments larger than \(\varepsilon_0\) have to be employed, requiring new normal forms akin to Cantor’s normal form. Ordinal representation systems utilized by proof theorists in the 1960s first emerged in a purely set-theoretic context more than 50 years earlier. In 1904 Hardy wanted to “construct” a subset of \(\bbR\) of size \(\aleph_{1}\), the first uncountable cardinal. His method was to represent countable ordinals via increasing sequences of natural numbers and then to correlate a decimal expansion with each such sequence. Hardy used two processes on sequences: (i) removing the first element to represent the successor; (ii) diagonalizing at limits. For example, if the sequence \(1,2,3,\ldots\) represents the ordinal 1, then \(2,3,4,\ldots\) represents the ordinal 2 and \(3,4,5,\ldots\) represents the ordinal 3, etc., while the ‘diagonal’ \(1,3,5,\ldots\) provides a representation of \(\omega\). In general, if \(\lambda=\lim_{n\in\bbN}\lambda_n\) is a limit ordinal with \(b_{n1},b_{n2},b_{n3},\ldots\) representing \(\lambda_n<\lambda\), then \(b_{11},b_{22},b_{33},\ldots\) represents λ. This representation, however, depends on the sequence chosen with limit λ. A sequence \((\lambda_n)_{n\in\bbN}\) with \(\lambda_n<\lambda\) and \(\lim_{n\in \bN}\lambda_n=\lambda\) is called a fundamental sequence for λ. Hardy’s two operations give explicit representations for all ordinals \(<\omega^2\).
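Hardy’s two operations are concrete enough to run. Below is a minimal sketch in Python (our choice of representation, not Hardy’s notation): an ordinal representative is an infinite increasing sequence of naturals, the successor operation drops the first element, and diagonalizing the family \(n\mapsto (n, n+1, n+2,\ldots)\) representing \(1,2,3,\ldots\) yields \(1,3,5,\ldots\), a representative of \(\omega\), exactly as in the example above.

```python
from itertools import count, islice

def seq(start):
    """The sequence start, start+1, start+2, ...; seq(n) represents the ordinal n."""
    return count(start)

def successor(s):
    """(i) Remove the first element to represent the successor ordinal."""
    next(s)
    return s

def diagonal(family):
    """(ii) Diagonalize: the n-th output is the n-th element of the n-th
    sequence (1-indexed), i.e., b_11, b_22, b_33, ... in the text's notation."""
    for n in count(1):
        yield next(islice(family(n), n - 1, None))

omega = diagonal(seq)                 # represents the ordinal omega
print(list(islice(omega, 5)))         # -> [1, 3, 5, 7, 9]
```

Applying `successor` repeatedly to `diagonal(seq)` then represents \(\omega+1, \omega+2,\ldots\), which is how the two operations together reach every ordinal \(<\omega^2\).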
Veblen in 1908 extended the initial segment of the countable ordinals for which fundamental sequences can be given effectively. The new tools he devised were the operations of derivation and transfinite iteration applied to continuous increasing functions on ordinals.
Definition 5.4 Let \({\ON}\) be the class of ordinals. A (class) function \(f:{\ON}\to{\ON}\) is said to be increasing if \(\alpha<\beta\) implies \(f(\alpha)<f(\beta)\) and continuous (in the order topology on \({\ON}\)) if
\[f(\lim_{\xi<\lambda}\alpha_{\xi})=\lim_{\xi<\lambda}f(\alpha_{\xi})\]
holds for every limit ordinal λ and increasing sequence \((\alpha_{\xi})_{\xi<\lambda}\). f is called normal if it is increasing and continuous.
The function \(\beta\mapsto \omega+\beta\) is normal, while \(\beta\mapsto \beta+\omega\) is not continuous at \(\omega\), since \(\lim_{\xi<\omega}(\xi+\omega)=\omega\) but \((\lim_{\xi<\omega}\xi)+\omega=\omega+\omega\).
Definition 5.5 The derivative \(f'\) of a function \(f:{\ON}\rightarrow {\ON}\) is the function which enumerates in increasing order the solutions of the equation \(f( \alpha )= \alpha\), also called the fixed points of f.
If f is a normal function, \(\{\alpha:\,f(\alpha)=\alpha\}\) is a proper class and \(f'\) will be a normal function, too.
Definition 5.6 Now, given a normal function \(f:\ON \rightarrow \ON\), define a hierarchy of normal functions as follows:
\[\begin{align}f_{0}&=f \\ f_{\alpha +1}& =f_{\alpha}'\\f_{\lambda}(\xi) &= \xi^{\textrm{th}} \textrm{ element of } \bigcap_{\alpha<\lambda} \left(\textrm{Range of } f_{\alpha}\right) \quad\mbox{ for }\lambda \textrm{ a limit ordinal}. \end{align}\]
In this way, from the normal function f we get a two-place function, \(\varphi_{f} ( \alpha , \beta )\coloneqq f_{\alpha} ( \beta)\). One usually discusses the hierarchy when \(f=\ell\), where \(\ell( \alpha )=\omega^{\alpha}\). The least ordinal \(\gamma>0\) closed under \(\varphi_{\ell}\), i.e., the least ordinal \(>0\) satisfying \((\forall\alpha,\beta<\gamma)\;\varphi_{\ell}(\alpha,\beta)<\gamma\), is called \(\Gamma_0\). It has a somewhat iconic status, in particular since Feferman and Schütte determined it to be the least ordinal ‘unreachable’ by certain predicative means expressed in terms of autonomous progressions of theories (defined in section 5.2).
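For orientation, the first levels of the hierarchy over \(\ell\) can be computed directly from Definitions 5.5 and 5.6; the following identities are standard:

\[\varphi_{\ell}(0,\beta)=\omega^{\beta},\qquad \varphi_{\ell}(1,\beta)=\varepsilon_{\beta},\qquad \varphi_{\ell}(2,0)=\min\{\xi \mid \varepsilon_{\xi}=\xi\},\]

since \(\ell'\) enumerates the fixed points of \(\xi\mapsto\omega^{\xi}\), i.e., the epsilon numbers, and \(\ell''\) in turn enumerates the fixed points of \(\xi\mapsto\varepsilon_{\xi}\). In particular \(\varphi_{\ell}(1,0)=\varepsilon_0\), the proof-theoretic ordinal of PA.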
Veblen extended this idea first to arbitrary finite numbers of arguments, but then also to transfinite numbers of arguments, with the proviso that in, for example, \(\Phi_{f} ( \alpha_{0} , \alpha_{1} , \ldots , \alpha_{\eta} )\), only a finite number of the arguments \(\alpha_{\nu}\) may be non-zero. Finally, Veblen singled out the ordinal \(E(0)\), where \(E(0)\) is the least ordinal \(\delta > 0\) which cannot be named in terms of functions \(\Phi_{\ell} ( \alpha_{0} , \alpha_{1} , \ldots , \alpha_{\eta} )\) with \(\eta < \delta\) and each \(\alpha_{ \gamma } < \delta\).
Though the “great Veblen number” (as \(E(0)\) is sometimes called) is quite an impressive ordinal, it does not furnish an ordinal representation sufficient for the task of analyzing a theory as strong as \(\Pi^1_1\)-comprehension. Of course, it is possible to go beyond \(E(0)\) and initiate a new hierarchy based on the function \(\xi\mapsto E(\xi)\), or even to consider hierarchies utilizing finite type functionals over the ordinals. Still, all these further steps amount to rather modest progress over Veblen’s methods. In 1950 Bachmann presented a new kind of operation on ordinals which dwarfs all hierarchies obtained by iterating Veblen’s methods. Bachmann builds on Veblen’s work, but his novel idea was the systematic use of uncountable ordinals to keep track of the functions defined by diagonalization. Let \({\Omega}\) be the first uncountable ordinal. Bachmann defines a set of ordinals \(\fB\) closed under successor such that with each limit \(\lambda\in \fB\) is associated an increasing sequence \(\langle \lambda[\xi]:\,\xi<\tau_{\lambda}\rangle\) of ordinals \(\lambda[\xi] \in \fB\) of length \(\tau_{\lambda}\leq\Omega\) and \(\lim_{\xi<\tau_{\lambda}}\lambda[\xi]=\lambda\). A hierarchy of functions \((\varphi^{\ssfB}_{\alpha})_{\alpha \in \fB}\) is then obtained as follows:
\[\tag{8}\label{bachmann} \begin{align} {\varphi^{\ssfB}_0 (\beta)} = \omega^{\beta} \quad& {\varphi^{\ssfB}_{\alpha+1}} = {\left(\varphi^{\ssfB}_{\alpha}\right)'}\\ \varphi^{\ssfB}_{\lambda} \textrm{ enumerates }& \bigcap_{\xi<\tau_{\lambda}}(\mbox{Range of }\varphi^{\ssfB}_{\lambda[\xi]})&\kern-12pt \textrm{if } \lambda \textrm{ is a limit with } \tau_{\lambda}<{\Omega} \\ \varphi^{\ssfB}_{\lambda}\textrm{ enumerates } & \{\beta< {\Omega} : \varphi^{\ssfB}_{\lambda[\beta]}(0)=\beta\}&\kern-10pt \textrm{if }\lambda \textrm{ is a limit with } \tau_{\lambda}={\Omega}. \end{align}\]
After the work of Bachmann, the story of ordinal representations becomes very complicated. Significant papers (by Isles, Bridge, Pfeiffer, Schütte, Gerber, to mention a few) involve quite horrendous computations to keep track of the fundamental sequences. Bachmann’s approach was also combined with uses of higher type functionals by Aczel and Weyhrauch. Feferman proposed an entirely different method for generating a Bachmann-type hierarchy of normal functions which does not involve fundamental sequences. Buchholz further simplified the systems and proved their computability. For details we recommend the preface to Buchholz et al. 1981.
The ordinal that Feferman and Schütte determined as the least upper bound for \(\{{{\bR}}_a\}\) is \(\Gamma_0\), the least non-zero ordinal closed under the Veblen function \(\alpha,\beta\mapsto\varphi_{\alpha}(\beta)\). This was a genuine proof-theoretic result with the tools coming ready-made from Schütte’s (1960b) monograph. There he had calculated the proof-theoretic ordinals of the \({{\bR}}_a\) as a function of \(\lvert a\rvert\), using cut elimination techniques for logics with infinitary rules (dubbed “semi-formal systems”). If \(\lvert a\rvert =\omega^{\alpha}\) then \(\lvert {{\bR}}_a\rvert =\varphi_{\alpha}(0)\). “Semi-formal” is a terminology employed by Schütte and refers to the fact that this proof system has rules with infinitely many premises, similar to the \(\omega\)-rule.
Definition 5.7 In the following we assume that the ordinals come from some representation system. The language of \(\bRA^*\) is an extension of that of first order arithmetic. For each ordinal \(\alpha\) and \(\beta>0\) it has free set variables \(U_0^\alpha,U^\alpha_1,U^\alpha_2,\ldots\) of level \(\alpha\) and bound set variables of level \(\beta\). The level, \({\textrm{lev}(A)}\), of a formula A of \(\bRA^*\) is defined to be the maximum of the levels of the set variables that occur in A. Expressions of the form \(\{x\mid A(x)\}\) with \(A(u)\) a formula will be called abstraction terms, their level being the same as that of the formula \(A(u)\).
The inference rules of \(\bRA^*\) comprise those of the sequent calculus with the exception of \((\forall R)\) and \((\exists L)\), which are replaced by those for the \(\omega\)-rule: \(\omega\)R and \(\omega\)L. Below \(\fP_{\beta}\) stands for the set of all abstraction terms with levels \(<\beta\). The rules for the set quantifiers are as follows:
\[\tag*{\({\forall_\beta\,\bL}\)}\frac{F(P),\Gamma\Rightarrow \Delta}{\forall X^\beta\,F(X^\beta),\Gamma\Rightarrow \Delta}\]
\[\tag*{\({\forall_\beta}\,\rR\)} \frac{\Gamma\Rightarrow \Delta, F(P)\ \textrm{ all } P\in{\fP}_{\beta}}{\Gamma\Rightarrow \Delta, \forall X^\beta\,F(X^\beta)} \]
\[\tag*{\({\exists_\beta}\,\bL\)}\frac{F(P),\Gamma\Rightarrow \Delta\ \textrm{ all } P\in{\fP}_{\beta}}{\exists X^\beta\,F(X^\beta),\Gamma\Rightarrow \Delta} \]
\[\tag*{\({\exists_\beta}\,\rR\)} \frac{\Gamma\Rightarrow \Delta, F(P)}{\Gamma\Rightarrow \Delta, \exists X^{\beta} F(X^{\beta})} \]
where in \(\forall_\beta\bL\) and \(\exists_\beta\rR\), P is an abstraction term of level \(<\beta\).
As per usual, the price one has to pay for rules with infinitely many premises is that derivations become infinite (well-founded) trees. The length of a derivation can then be measured by the ordinal rank associated with the tree. One also wants to keep track of the complexity of cuts in the derivation. The length we assign to a formula A, \(\lvert A\rvert\), measures its complexity. It is an ordinal of the form \(\omega\cdot \alpha+n\), where \(\alpha\) is the level of A and \(n<\omega\). One then defines a notion of derivability in \({{\bRA}}^*\),
\[\bRA^*\stile{\alpha}{\rho} \Gamma\Rightarrow \Delta\]
where \(\alpha\) majorizes the transfinite length of the derivation and \(\rho\) conveys that all cut formulae in the derivation have length \(<\rho\).
Cut elimination works smoothly for \(\bRA^*\). However, the price one has to pay can only be measured in terms of Veblen’s \(\varphi\) function. The optimal result is the so-called second cut elimination theorem.
Theorem 5.8 (Second Cut Elimination Theorem)
\[ \textrm{If } \bRA^* \stile{\alpha}{\rho+\omega^\nu} \Gamma\Rightarrow \Delta \textrm{ then } \bRA^* \stile{\varphi_{\nu}(\alpha)}{\rho} \Gamma\Rightarrow \Delta \]
It entails of course the special case that \(\bRA^* \stile{\alpha}{\omega^\nu} \Gamma\Rightarrow \Delta\) yields \(\bRA^* \stile{\varphi_{\nu}(\alpha)}{0} \Gamma\Rightarrow \Delta\), and thus, as the latter deduction is cut-free, all cuts can be removed. Several subtheories of \(\bZ_2\) can be interpreted in \(\bRA^*\), yielding upper bounds for their proof-theoretic ordinals via Theorem 5.8. Here is a selection of such results:[24]
Theorem 5.9
For the definitions of the theories in this theorem, see the end of section 5.1. To obtain the results about the theories in (iii) and (iv) it is somewhat easier to first reduce them to systems of the form \((\Pi^0_1{\Hy}\bCA)_{<\rho}\), as the latter have a straightforward interpretation in \({{\bRA}}^*\). Reductions of \((\Delta^1_1{\Hy}\bCR)\) to \((\Pi^0_1{\Hy}\bCA)_{<\omega^{\omega}}\) and of \((\Sigma^1_1{\Hy}\bAC)\) to \((\Pi^0_1{\Hy}\bCA)_{<\varepsilon_0}\) are due to Feferman (1964) and Friedman (1970), respectively.[25]
The investigation of such subsystems of analysis and the determined effort to establish their mathematical power led to a research program that was initiated by Friedman and Simpson some thirty years ago and dubbed Reverse Mathematics. The objective of the program is to investigate the role of set existence principles in ordinary mathematics.[26] The main question can be stated as follows:
Given a specific theorem \(\tau\) of ordinary mathematics, which set existence axioms are needed in order to prove \(\tau\)?
Central to the above is the reference to what is called ‘ordinary mathematics’. This concept, of course, doesn’t have a precise definition. Roughly speaking, by ordinary mathematics we mean mainstream, non-set-theoretic mathematics, i.e., the core areas of mathematics which make no essential use of the concepts and methods of set theory and do not essentially depend on the theory of uncountable cardinal numbers.
For many mathematical theorems \(\tau\), there is a weakest natural subsystem \({\bS}(\tau)\) of \(\bZ_2\) such that \({\bS}(\tau)\) proves \(\tau\). Very often, if a theorem of ordinary mathematics is proved from the weakest possible set existence axioms, the statement of that theorem will turn out to be provably equivalent to those axioms over a still weaker base theory. Moreover, it has turned out that \({\bS}(\tau)\) often belongs to a small list of specific subsystems of \(\bZ_2\) dubbed \(\RCA_0\), \(\WKL_0\), \(\ACA_0\), \(\ATR_0\), and \(({\bPi}^1_1{\Hy}{\bCA})_0\).[27] The systems are enumerated in increasing strength. The main set existence axioms of \(\RCA_0\), \(\WKL_0\), \(\ACA_0\), \(\ATR_0\), and \(({{\bPi}}^1_1{\Hy}{\bCA})_0\) are recursive comprehension, weak König’s lemma, arithmetical comprehension, arithmetical transfinite recursion, and \(\bPi_1^1\)-comprehension, respectively. \(\ACA_0\) is actually the same theory as \((\Pi^1_0{\Hy}\bCA)_0\). For exact definitions of all these systems and their role in reverse mathematics see Simpson 1999. Examples of mathematical statements provable in \(\RCA_0\) are the intermediate value theorem and the Baire category theorem. Reversals for \(\WKL_0\) are the Heine/Borel covering lemma and the local existence theorem for solutions of ordinary differential equations. Among the many reversals for \(\ACA_0\), \(\ATR_0\), and \((\bPi^1_1{\Hy}{\bCA})_0\) one finds the existence of maximal ideals in countable rings, Ulm’s theorem, and the Cantor-Bendixson theorem, respectively.
The proof-theoretic strength of \(\RCA_0\) is weaker than that of PA, while \(\ACA_0\) has the same strength as PA. To get a sense of scale, the strengths of the first four theories are best expressed via their proof-theoretic ordinals:
Theorem 5.10
\(\lvert (\bPi^1_1\Hy\bCA)_0\rvert\), however, eludes expression in the ordinal representations introduced so far. This will require the much stronger representation to be introduced in Definition 5.11.
There are important precursors of reverse mathematics. Weyl (1918) started to develop analysis using a minimalist foundation (that equates to \(\ACA_0\)), whilst Hilbert and Bernays (1939) developed analysis in second order arithmetic, whereby on closer inspection one sees that all they used is \((\bPi^1_1\Hy\bCA)_0\). The first theorem of genuinely reverse mathematics was established by Dedekind in his essay Stetigkeit und irrationale Zahlen (1872). It states that his continuity (or completeness) principle is equivalent to a well-known theorem of analysis, namely, that every bounded, monotonically increasing sequence has a limit. He emphasizes,
This theorem is equivalent to the principle of continuity, i.e., it loses its validity as soon as we assume a single real number not to be contained in the domain \(\cR\) [of all real numbers, i.e., of all cuts of rational numbers]. (1872 [1996: 778])
It is to bring out “the connection between the principle ofcontinuity and infinitesimal analysis” (1872 [1996: 779]).
Spector’s (1962) functional interpretation of \(\bZ_2\) via bar recursive functionals was of great interest to proof theory. However, it was not clear whether there was a constructive foundation of these functionals along the lines of hereditarily continuous functionals that can be represented by computable functions (akin to Kleene 1959; Kreisel 1959) which would make them acceptable on intuitionistic grounds. In 1963 Kreisel conducted a seminar the expressed aim of which was to assay the constructivity of Spector’s interpretation (see Kreisel 1963). Specifically, he asked whether an intuitionistic theory of monotonic inductive definitions, \(\bID^i_1(\text{mon})\), could model bar recursion, or even more specifically, formally capture a class of indices of representing functions of these functionals. In a subsequent report the seminar’s conclusion was summarized by Kreisel:
… the answer is negative by a wide margin, since not even bar recursion of type 2 can be proved consistent [from intuitionistically accepted principles]. (Kreisel in “Reports of the Seminar on the Foundations of Analysis, Stanford, Summer 1963”, as quoted in Feferman 1998: 223)
He not only introduced theories of one inductive definition but also of \(\nu\)-times transfinitely iterated inductive definitions, \(\bID_{\nu}\). Although it soon became clear that even the theories \(\bID_{\nu}\) couldn’t reach the strength of \(\bZ_2\) (in point of fact, such theories are much weaker than the fragment of \(\bZ_2\) based on \(\Pi^1_2\)-comprehension), they became the subject of proof-theoretic investigation in their own right and occupied the attention of proof theorists for at least another 15 years. One reason for this interest was surely that the intuitionistic versions corresponding to the accessible (i.e., well-founded) part of a primitive recursive ordering are immediately constructively appealing; a further reason was that they were thought to be more amenable to direct proof-theoretic treatments than fragments of \(\bZ_2\) or set theories.
We shall not give a detailed account of the formalization of these theories, but focus on the non-iterated case \(\bID_1\) and its intuitionistic version \(\bID_1^i\) to convey the idea. A monotone operator on \(\bbN\) is a map \(\Gamma\) that sends a set \(X\subseteq \bbN\) to a subset \(\Gamma(X)\) of \(\bbN\) and is monotone, i.e., \(X\subseteq Y\subseteq \bbN\) implies \(\Gamma(X) \subseteq \Gamma(Y)\). Owing to monotonicity, the operator \(\Gamma\) will have a least fixed point \(I_{\Gamma}\subseteq\bbN\), i.e., \(\Gamma(I_{\Gamma})=I_{\Gamma}\) and for every other fixed point X it holds that \(I_{\Gamma} \subseteq X\). Set-theoretically \(I_{\Gamma}\) is obtained by iterating \(\Gamma\) through the ordinals,
\[\begin{align}\Gamma^0&=\emptyset,\\\Gamma^1&=\Gamma(\Gamma^0),\\\Gamma^{\alpha}&=\Gamma\left(\bigcup_{\xi<\alpha}\Gamma^{\xi}\right).\end{align}\]
Monotonicity ensures (in set theory) that one finds an ordinal \(\tau\) such that \(\Gamma(\Gamma^{\tau})=\Gamma^{\tau}\), and the set \(\Gamma^{\tau}\) will be the least fixed point. If one adds a new 1-place predicate symbol P to the language of arithmetic, one can describe the so-called positive arithmetical operators. They are of the form
\[\Gamma_A(X)=\{n\in\bbN\mid A(n,X)\}\]
where \(A(x,P)\) is a formula of the language of PA augmented by P in which the predicate P occurs only positively.[28] The syntactic condition of positivity then ensures that the operator \(\Gamma_A\) is monotone. The language of \(\bID_1\) is an extension of that of PA. It contains a unary predicate symbol \(I_A\) for each positive arithmetical operator \(\Gamma_A\) and the axioms
\[\label{id1} \begin{align}\tag{Id1} \forall x\,(A(x,I_A)\rightarrow I_A(x))\\\tag{Id2} \forall x\,[A(x,F)\to F(x)]\to \forall x\,[I_A(x)\to F(x)]\end{align}\]
where in (Id2) \(F(x)\) is an arbitrary formula of \(\bID_1\) and \(A(x,F)\) arises from \(A(x,P)\) by replacing every occurrence of \(P(t)\) in the formula by \(F(t)\). Collectively these axioms assert that \(I_A\) is the least fixed point of \(\Gamma_A\), or more accurately, the least among all sets of natural numbers definable in the language of \(\bID_1\). \(\bID_1^i\) will be used to denote the intuitionistic version. Its subtheory \(\bID_{1}^{i}(\cO)\) is obtained by just adding the predicate symbol \(I_A\) and the pertaining axioms (Id1) and (Id2), where \(\Gamma_A\) is the operator that defines Kleene’s \(\cO\) (cf. Definition 5.1).
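The content of (Id1) and (Id2) can be illustrated with a toy monotone operator on a finite universe. The following sketch is our illustration only: the cutoff N is an artifact that makes the iteration of the previous paragraphs terminate, and the operator \(\Gamma(X)=\{0\}\cup\{n+2 \mid n\in X\}\) is chosen so that its least fixed point consists of the even numbers.

```python
# Toy illustration of a positive (hence monotone) operator and its least
# fixed point.  The universe is restricted to {0,...,N} so that iteration
# from the empty set reaches a fixed point in finitely many stages.
N = 20

def gamma(x):
    """Gamma(X) = {0} ∪ {n+2 : n in X}, cut off at N; clearly monotone."""
    return {0} | {n + 2 for n in x if n + 2 <= N}

def least_fixed_point(op):
    """Iterate op from the empty set: stages Gamma^0, Gamma^1, ... until stable."""
    stage = set()
    while True:
        nxt = op(stage)
        if nxt == stage:
            return stage
        stage = nxt

print(sorted(least_fixed_point(gamma)))   # -> [0, 2, 4, ..., 20]
```

On the infinite universe \(\bbN\) the iteration would instead close off at stage \(\omega\); in general, as the text notes, monotonicity only guarantees a closure ordinal \(\tau\), which is where the ordinal-theoretic content of the theories \(\bID_{\nu}\) enters.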
By a complicated passage through formal theories for choice sequences it was known that the theory \(\bID_{1}\) can be reduced to \(\bID_{1}^{i}(\cO)\). The first ordinal analysis for the theory \(\bID_{1}^{i}(\cO)\) was obtained by Howard (1972). Via the known proof-theoretical reductions this entailed also an ordinal analysis for \(\bID_{1}\). The proof-theoretic ordinal of \(\bID_{1}\) is the Bachmann-Howard ordinal, which is denoted by \({\psi_{\sOmega_{1}}(\varepsilon_{\Omega_1+1})}\) in the system of Definition 5.11.
As inductively defined sets can be the starting point of another inductive definition, the procedure of inductively defining predicates can be iterated along any well-ordering \(\nu\) in a uniform way. This leads to the theories \(\bID_{\nu}\) which allow one to formalize \(\nu\)-times iterated inductive definitions, where \(\nu\) stands for a primitive recursive well-ordering. If \(\nu\) is a well-ordering on constructive grounds, then the \(\nu\)-times iterated version of Kleene’s \(\cO\) also has a clear constructive meaning. As a result the formal theories \({{\bID}}^i_{\nu}(\cO)\) that embody this process are constructively justified. The topic of theories of iterated inductive definitions was flourishing at the 1968 conference on Intuitionism and Proof Theory in Buffalo (see Kino et al. 1970). One of the main proof-theoretic goals was to find a reduction of the classical theories \(\bID_{\nu}\) to their intuitionistic counterparts \({{\bID}}^i_{\nu}(\cO)\). This was all the more desirable because of known reductions of important fragments of second order arithmetic to theories of the former kind. Friedman (1970) had shown that the second order system with the \(\Sigma _2^1\)-axiom of choice can be interpreted in the system \(\hbox{($\Pi_1^1$-CA)}_{<{\varepsilon_0}}\) of less than \({\varepsilon_0}\)-fold iterated \(\Pi_1^1\)-comprehensions and Feferman (1970a) had shown that less than \(\nu\)-fold iterated \(\Pi_1^1\)-comprehensions could be interpreted in the system
\[\bID_{<\nu} \coloneqq \bigcup_{\alpha<\nu}\bID_{\alpha}\]
for \(\nu=\omega^{\gamma}\) with \(\gamma\) a limit. However, Zucker (1973) showed that there are definite obstacles to a straightforward reduction of the theories \(\bID_{\nu}\) for \(\nu > 1\) to their intuitionistic cousins. Sieg (1977) attacked the problem by a method adapted from Tait (1970), who had used cut elimination for an infinitary propositional logic with formulae indexed over constructive number classes to obtain a consistency proof for the second order arithmetic theory with the schema of \(\Sigma _2^1\) dependent choices. He achieved a reduction of \(\bID_{<\nu}\) to \(\bID_{<\nu}^i(\cO)\) for limit \(\nu\) by carrying out the proof theory for a system \(\PL_{\alpha}\) of propositional logic with infinitely long conjunctions and disjunctions indexed over the constructive number classes \(\cO_{\alpha}\) for \(\alpha<\nu\) inside \(\bID^i_{\alpha+1}(\cO)\). As \(\bID_{\alpha}\) can be reduced to \(\PL_{\alpha}\), this brought about the reduction. Ordinal analyses for theories of iterated inductive definitions, first for finite and then also for transfinite iterations, were obtained by Pohlers using Takeuti’s reduction procedure for \(\Pi_1^1\)-comprehension (see Pohlers 1975, 1977). Working independently, Buchholz (1977a) used a new type of rules, dubbed \(\Omega_{\mu+1}\)-rules, to recapture these results without use of Takeuti’s methods. These rules are an extension of the Ω-rule described in Definition C.3.
The ordinal representation systems encountered so far are sufficient neither for expressing the strength of theories of iterated inductive definitions nor that of the strongest of the standard systems of reverse mathematics, \((\Pi^1_1{\Hy}\bCA)_0\). Therefore we intersperse a brief account of how to proceed onwards, adumbrating the main ideas.
Bachmann’s bold move of using large ordinals to generate names for small ordinals was a very important idea. To obtain ordinal analyses of ever stronger theories one has to find new ways of defining ordinal representation systems that can encapsulate their strength. The latter goes hand in hand with the development of new cut elimination techniques that are capable of removing cuts in (infinitary) proof systems with strong reflection rules. Ordinal representations, however, appear to pose a considerable barrier to understanding books and articles in this research area. Nonetheless, we think that they are the best way to express the proof-theoretic strength of a theory, as they provide a scale by means of which one can get a grasp of how much stronger a theory \(S_1\) is than another theory \(S_2\) (rather than the bland statement that \(S_1\) is stronger than \(S_2\)).
As an example we will introduce an ordinal representation system which characterizes the theory \((\Pi^1_1{\Hy}\bCA)+{{\BI}}\), following Buchholz 1977b. It is based on certain ordinal functions \(\psi_{\sOmega_n}\) which are often called collapsing functions. The definition of these functions, that is, of the value \({\psi_{\sOmega_{n}}(\alpha)}\) at \(\alpha\), proceeds by recursion on \(\alpha\) and gets intertwined with the definition of sets of ordinals \(C^{{\Omega_{\omega}}}(\alpha,\beta)\), dubbed “Skolem hulls” since they are defined as the smallest structures closed under certain functions specified below.
Let \(\bbN^+\) be the natural numbers without 0. Below we shall assume that \(\Omega_n\) \((n\in\bbN^+)\) is a “large” ordinal and that \(\omega<\Omega_n <\Omega_{n+1}\). Their limit, \(\sup_{n\in\bbN^+}\Omega_n\), will be denoted by \(\Omega_{\omega}\).
Definition 5.11 By recursion on \(\alpha\) we define:
\[\begin{align}C^{\Omega_{\omega}}(\alpha,\beta) & = \left\{\begin{array}{l} \textrm{closure of } \beta\cup\{0,\Omega_{\omega}\}\cup\{\Omega_n\mid n\in\bbN^+\} \\\textrm{under: }\\ +, (\xi\mapsto \omega^{\xi})\\(\xi\longmapsto{\psi_{\sOmega_{n}}(\xi)})_{\xi<\alpha}\textrm{ for } n\in\bbN^+\end{array}\right. \\[1ex]{\psi_{\sOmega_{n}}(\alpha)}&= \min\{\rho<\Omega_n\mid\;C^{\Omega_{\omega}}(\alpha,\rho)\cap\Omega_n=\rho\}.\end{align}\]
At this point it is not clear whether \({\psi_{\sOmega_{n}}(\alpha)}\) will actually be defined for all \(\alpha\), since there might not exist a \(\rho<\Omega_n\) such that
\[C^{{\Omega_{\omega}}}(\alpha,\rho)\cap\Omega_n=\rho.\]
This is where the “largeness” of \(\Omega_n\) comes into play. One (easy) way of guaranteeing this consists in letting \(\Omega_n\) be the \(n^{th}\) uncountable regular cardinal, that is, \(\Omega_n\coloneqq \aleph_n\). However, such strong set-theoretic assumptions can be avoided. For instance, it suffices to let \(\Omega_n\) be the \(n^{th}\) recursively regular ordinal (which is a countable ordinal) (see Rathjen 1993a).
To get a better feel for what \(\psi_{\sOmega_{n}}\) is doing, note that if \(\rho={\psi_{\sOmega_{n}}(\alpha)}\), then \(\rho<\Omega_n\), and with \([\rho,\Omega_n)\) being the interval consisting of the ordinals \(\sigma\) with \(\rho\leq\sigma<\Omega_n\) one has
\[[\rho,\Omega_n)\cap C^{\Omega_{\omega}} (\alpha,\rho) =\emptyset\]
thus the order-type of the ordinals below \(\Omega_n\) which belong to the “Skolem hull” \(C^{{\Omega_{\omega}}}(\alpha,\rho)\) is \(\rho\). In more pictorial terms, \(\rho\) is said to be the \(\alpha^{th}\) collapse of \(\Omega_n\) since the order-type of \(\Omega_n\) viewed from within the structure \(C^{{\Omega_{\omega}}}(\alpha,\rho)\) is actually \(\rho\).
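As a first sample computation (a standard one for collapsing functions of this kind, assuming the closure conditions of Definition 5.11 exactly as stated): for \(\alpha=0\) no \(\psi\)-functions are available yet, and any term built from an uncountable parameter \(\Omega_n\) by \(+\) or \(\xi\mapsto\omega^{\xi}\) is again uncountable, so the countable members of \(C^{\Omega_{\omega}}(0,\rho)\) are exactly the ordinals generated from \(\rho\cup\{0\}\) by \(+\) and \(\xi\mapsto\omega^{\xi}\). Hence

\[{\psi_{\sOmega_{1}}(0)}=\varepsilon_0,\]

since \(\varepsilon_0\) is the least \(\rho\) closed under these two functions.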
The ordinal representation system we are after is provided by the set
\[C^{{\Omega_{\omega}}}(\varepsilon_{\Omega_{\omega}+1},0)\]
where \(\varepsilon_{\Omega_{\omega}+1}\) is the least epsilon number after \(\Omega_{\omega}\), i.e., the least ordinal \(\eta>\Omega_{\omega}\) such that \(\omega^{\eta}=\eta\). The proof-theoretic ordinal of \((\Pi^1_1{\Hy}\bCA)+{{\BI}}\) is \({\psi_{\sOmega_{1}}(\varepsilon_{\Omega_{\omega}+1})}\). Although the definition of the set \(C^{{\Omega_{\omega}}}(\varepsilon_{\Omega_{\omega}+1},0)\) and its ordering is set-theoretic, it turns out that it also has a purely primitive recursive definition which can be given in a fragment of PRA. Thus the set-theoretic presentation mainly serves the purpose of a “visualization” of an elementary well-ordering.
The pattern of definition exhibited in Definition 5.11 continues for stronger systems, albeit only as a basic template, since for theories beyond the level of \((\Delta^1_2{\Hy}\bCA)+{{\BI}}\) substantially new ideas are required. Analogies between large set-theoretic ordinals (cardinals) and recursively large ordinals on the one hand and ordinal representation systems on the other hand can be a fruitful source of inspiration for devising new representation systems. More often than not, hierarchies and structural properties that have been investigated in set theory and recursion theory on ordinals turn out to have proof-theoretic counterparts.
Using an extended version of the representation system from Definition 5.11 if \(\nu>\omega\),[29] the outcome of all the work on the theories of inductive definitions can be summarized by the following theorem.[30]
Theorem 5.12 For recursive \(\nu\),
\[\lvert \bID_{\nu}\rvert = \lvert \bID^i_{\nu}(\cO)\rvert = \psi_{\sOmega_\nu}(\varepsilon_{\sOmega_{\nu}+1}).\]
A generalized treatment of theories of iterated inductive definitions for arbitrary well-orderings and of autonomous iteration was carried out in Rathjen 1988, 2010. These theories are stronger than \((\Delta^1_2{\Hy}\bCA)\) if \(\nu\geq\varepsilon_0\).
Theorem 5.12 played an important role in determining the exact strength of some fragments of \(\bZ_2\). The major ordinal-theoretic results pertaining to subsystems of \(\bZ_2\) of the pre-1980 era are given in the next theorem.[31]
Theorem 5.13
The next challenge after \((\Delta^1_2{\Hy}\bCA)\) was posed by the theory \((\Delta^1_2{\Hy}\bCA)+{{\BI}}\). Its treatment not only required a considerably stronger ordinal representation system but also coincided with a shift away from \(\cL_2\) theories and theories of iterated inductive definitions to a direct proof-theoretic treatment of set theories. Pioneering work on the proof theory of set theories is mainly due to Jäger (1980, 1982). The analysis of \((\Delta^1_2{\Hy}\bCA)+{{\BI}}\) in Jäger & Pohlers 1982 provides a particularly nice application of this new approach, which has been called admissible proof theory, owing to its concern with theories extending Kripke-Platek set theory by axioms asserting the existence of admissible sets (for some details see Appendix D.1).
Theorem 5.14 \(\lvert(\Delta^1_2{\Hy}\bCA)+\BI\rvert =\psi_{\sOmega_1}(\varepsilon_{I+1})\)
The “I” in the foregoing notation is supposed to be indicative of “inaccessible cardinal”. Indeed, the easiest way to build an extended ordinal representation system sufficient for this task (modeled on Definition 5.11) is to add an inaccessible I, close the Skolem hulls under \(\xi\mapsto \Omega_\xi\) for \(\xi<I\), and introduce collapsing functions \(\psi_{\pi}\) for all \(\pi\) of either form I or \(\Omega_{\xi+1}\).
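In rough outline, the Skolem-hull construction just described can be sketched as follows. This is only a schematic rendering, glossing over the precise closure and collapsing clauses, which vary across the literature; it is not the exact definition used in the cited work:

\[\begin{aligned} B(\alpha) \;=\; & \text{ the least set of ordinals containing } 0 \text{ and } I \text{ and closed under } {+},\\ & \text{ under } \xi\mapsto\Omega_\xi \text{ for } \xi<I, \text{ and under } \xi\mapsto\psi_\pi(\xi) \text{ for } \xi<\alpha;\\ \psi_\pi(\alpha) \;=\; & \min\{\rho<\pi : B(\alpha)\cap\pi \subseteq \rho\}, \end{aligned}\]

where \(\pi\) ranges over I and the ordinals \(\Omega_{\xi+1}\). The terms generated in this way, suitably ordered, yield the extended representation system.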
The goal of giving an ordinal analysis of full second order arithmetic has not been attained yet. For many years \(\Pi^1_2\)-comprehension posed a formidable challenge, and the quest for its ordinal analysis attained something of a holy grail status (cf. Feferman 1989). At first blush it might be difficult to see why the latter comprehension is so much more powerful than \(\Delta^1_2\)-comprehension (plus \({{\BI}}\)). To get a sense for the enormous difference, it seems advisable to work in (admissible) set theory and consider a hierarchy of recursively large ordinal notions wherein these comprehension schemes correspond to the bottom and the top end of the scale, respectively. That is discussed in Appendix D. Here we just mention a few reductions to “constructive” frameworks. The reductions we have in mind underlie a broadened view of “constructivity”. Constructive theories of functions and sets that relate to Bishop’s constructive mathematics as theories like ZFC relate to Cantorian set theory have been proposed by Myhill, Martin-Löf, Feferman and Aczel. Among those are Feferman’s constructive theory of operations and classes, \(\bT_0\) in 1975 and 1979, Martin-Löf’s intuitionistic type theory (1984) and constructive set theory, especially Constructive Zermelo-Fraenkel Set Theory, CZF, the latter also combined with the regular extension axiom, REA. By employing an ordinal analysis for the set theory KPi, which is an extension of Kripke-Platek set theory via an axiom asserting that every set is contained in an admissible set (see Appendix D), it has been shown that KPi, and consequently \((\Delta^1_2{\Hy}\bCA)+{{\BI}}\), can be reduced to both of these theories:
Theorem 5.15 (Feferman 1975; Jäger 1983; Jäger & Pohlers 1982; Rathjen 1993b) \((\Delta^1_2{\Hy}\bCA)+\BI\), KPi, \(\bT_0\) and CZF + REA are proof-theoretically equivalent. In particular, these theories prove the same theorems in the negative arithmetic fragment.
Theorem 5.16 (Rathjen 1993b; Setzer 1998) The soundness of the negative arithmetic fragment of \((\Delta^1_2{\Hy}\bCA)+\BI\) and KPi is provable in Martin-Löf’s 1984 type theory.
A detailed account of these results has been given in section 3 ofRathjen 1999a.
Proof theory has become a large subject with many specialized branches that can be mathematically quite complex. So we have tried to present developments close to the main artery of its body, starting with its inception at the beginning of the twentieth century and ending with results from the close of the century. The theorems mentioned at the end of section 5 foreshadow investigations in the twenty-first century that are presented in Rathjen (2018) and concern relationships between Feferman’s systems of explicit mathematics and Martin-Löf type theories; the paper touches also on the completely new developments of homotopy type theory (see Awodey 2012). Some additional contemporary proof theoretic developments are described in Appendices D, E and F. The theme of Appendix E, the extraction of computational information from proofs in particular formal theories, is the central topic of Schwichtenberg and Wainer 2012. They deal with theories of arithmetic and analysis up to \((\Pi^1_1{\Hy}\bCA)_0\). Standard texts on proof theory covering ordinal analysis are Takeuti 1985 and Schütte 1977. An introductory textbook on ordinal analysis is Pohlers 2009, and a more recent one is Arai 2020. Finally, some new directions of proof theoretic work are taken in contributions to both Kahle & Rathjen 2015 and Jäger & Sieg 2018.
Let us also report on progress on the methodological issues the finitist consistency program was to address. First of all, due to quite a bit of important historical work, we have a much better grasp of the evolution of the program in the 1920s and of its roots in the development toward modern structuralist mathematics in the nineteenth century. The work of editing Hilbert’s unpublished lecture notes has opened a treasure trove of information.[32] The connections to the development in nineteenth century mathematics are hinted at in Appendix A, but they are explored in greater depth, for example, in Ferreirós 2008; Reck 2003, 2013; and the papers Sieg wrote on Dedekind with Morris (2018) and Schlimm (2005, 2014), respectively. The 2020 volume edited by Reck & Schiemer explores this connection in a rich way. Secondly, as to the properly methodological issues, we presented some broad considerations in section 4.1, namely, that consistency proofs should be given relative to “constructive” theories. We did not make any remarks about what is characteristic of a constructive perspective and why such a perspective has not only a mathematical, but also a philosophical point. There is, of course, a very rich literature. Let us point to some informative sources: van Atten (2007) as defending Brouwer’s views, Martin-Löf (1984) as exposing the philosophical motivation for his type theory, Feferman (1988, 2000) as discussing the foundational point of proof theory, Bernays (1976) as presenting crucial aspects of an informed philosophy of mathematics, and Sieg (2013, 2020) as explicating (the context for) his reductive structuralism.
Back to proof theory: we have to admit that we neglected some classical topics. One is the study of different proof systems and their relationships going back to Gentzen’s dissertation (1935). In their Basic Proof Theory, Troelstra and Schwichtenberg (2000) give an excellent selection, but some important calculi such as the Schütte proof systems are not covered (see, for example, Schütte 1960b, 1977). They also do not cover proof systems for temporal and modal logic, nor substructural logics.[33]
A second omission concerns Bounded Arithmetic, where feasibility issues are a central focus: one studies formal theories whose provably recursive functions form very small subclasses of the primitive recursive ones. Indeed, the class of the provably total functions of Buss’s base theory is that of the functions computable in polynomial time, and there was the hope that proof theoretic investigations might contribute novel results in complexity theory. A third omission concerns proof mining; that line of deep mathematical investigations using proof theoretic tools was initiated by Kreisel (1958, 1982) and Luckhardt (1989), but really perfected only by Kohlenbach (2007). We hinted at the work of his school at the very end of section 4.1.
Proof theory, as we described it, deals primarily with formal proofs or derivations. Hilbert aimed, however, as we pointed out in section 1, for a more general analysis of ordinary, informal mathematical proofs. For Gentzen in his 1936, “the objects of proof theory shall be the proofs carried out in mathematics proper” (p. 499). The aim of Hilbert and his collaborators was undoubtedly to achieve a deeper mathematical and conceptual understanding, but also to find general methods of proof construction in formal calculi. This is now being pursued in the very active area of using powerful computers for the interactive verification of proofs and programs as well as the fully automated search for proofs of mathematical theorems.[34] That can be pursued with a cognitive scientific purpose of modeling mathematical reasoning (see Sieg 2010, Ganesalingam & Gowers 2017, and Sieg & Derakhshan 2021). It is clearly in the spirit of Hilbert, who articulated matters in his second Hamburg talk of 1927 as follows:
The formula game … has, besides its mathematical value, an important general philosophical significance. For this formula game is carried out according to certain definite rules, in which the technique of our thinking is expressed. These rules form a closed system that can be discovered and definitively stated.
Then he continues with a provocative statement about the cognitive goal of proof theoretic investigations.
The fundamental idea of my proof theory is none other than to describe the activity of our understanding, to make a protocol of the rules according to which our thinking actually proceeds. (Hilbert 1927 [1967: 475])
It is clear to us, and it was clear to Hilbert, that mathematical thinking does not proceed in the strictly regimented ways imposed by an austere formal theory. Though formal rigor is crucial, it is not sufficient to shape proofs intelligibly or to discover them efficiently, even in pure logic. Recalling the principle that mathematics should solve problems “by a minimum of blind calculation and a maximum of guiding thought”, we have to investigate the subtle interaction between understanding and reasoning, i.e., between introducing concepts and proving theorems.
The translations in this paper are ours, unless we explicitly refer toan English edition.
epsilon calculus | Hilbert, David: program in the foundations of mathematics | logic: linear | mathematics: constructive | proof theory: development of | reasoning: automated | semantics: proof-theoretic | set theory: constructive and intuitionistic ZF | type theory
Some passages in this entry first appeared in W. Sieg, Hilbert’s Programs and Beyond, Oxford: Oxford University Press, 2013.
The Stanford Encyclopedia of Philosophy is copyright © 2024 by The Metaphysics Research Lab, Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054