Reasoning is the ability to make inferences, and automated reasoning is concerned with the building of computing systems that automate this process. Although the overall goal is to mechanize different forms of reasoning, the term has largely been identified with valid deductive reasoning as practiced in mathematics and formal logic. In this respect, automated reasoning is akin to mechanical theorem proving. Building an automated reasoning program means providing an algorithmic description of a formal calculus so that it can be implemented on a computer to prove theorems of the calculus in an efficient manner. Important aspects of this exercise involve defining the class of problems the program will be required to solve, deciding what language will be used by the program to represent the information given to it as well as new information inferred by the program, specifying the mechanism that the program will use to conduct deductive inferences, and figuring out how to perform all these computations efficiently. While basic research work continues in order to provide the necessary theoretical framework, the field has reached a point where automated reasoning programs are being used by researchers to attack open questions in mathematics and logic, provide important applications in computing science, solve problems in engineering, and find novel approaches to questions in exact philosophy.
A problem being presented to an automated reasoning program consists of two main items, namely a statement expressing the particular question being asked, called the problem's conclusion, and a collection of statements expressing all the relevant information available to the program—the problem's assumptions. Solving a problem means proving the conclusion from the given assumptions by the systematic application of rules of deduction embedded within the reasoning program. The problem-solving process ends when one such proof is found, when the program is able to detect the non-existence of a proof, or when it simply runs out of resources.
A first important consideration in the design of an automated reasoning program is to delineate the class of problems that the program will be required to solve—the problem domain. The domain can be very large, as would be the case for a general-purpose theorem prover for first-order logic, or be more restricted in scope, as in a special-purpose theorem prover for Tarski's geometry or the modal logic K. A typical approach in the design of an automated reasoning program is to provide it first with sufficient logical power (e.g., first-order logic) and then further demarcate its scope to the particular domain of interest defined by a set of domain axioms. To illustrate, EQP, a theorem-proving program for equational logic, was used to solve an open question in Robbins algebra (McCune 1997): Are all Robbins algebras Boolean? For this, the program was provided with the axioms defining a Robbins algebra:
\[\begin{align}\tag{A1} &x+y=y+x & \text{(commutativity)}\\\tag{A2} (&x+y)+z = x+ (y+z) & \text{(associativity)}\\\tag{A3} -(-(&x+y)+ -(x+-y))=x & \text{(Robbins equation)}\end{align}\] The program was then used to show that a characterization of Boolean algebra that uses Huntington's equation, \[-(-x + y) + -(-x + -y) = x,\] follows from the axioms. We should remark that this problem is non-trivial since deciding whether a finite set of equations provides a basis for Boolean algebra is undecidable, that is, it does not permit an algorithmic representation; also, the problem was attacked by Robbins, Huntington, Tarski and many of his students with no success. The key step was to establish that all Robbins algebras satisfy \[\exists x\exists y(x + y = x),\] since it was known that this formula is a sufficient condition for a Robbins algebra to be Boolean. When EQP was supplied with this piece of information, the program provided invaluable assistance by completing the proof automatically.
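The easy direction of the correspondence can be sanity-checked by brute force in a small model. The following sketch (illustrative only; the function names are ad hoc) verifies that the two-element Boolean algebra, reading \(+\) as join and \(-\) as complement, satisfies both the Robbins equation and Huntington's equation:

```python
from itertools import product

# Two-element Boolean algebra: + is "or", - is "not". (Illustrative check
# only; the hard direction -- Robbins implies Boolean -- is McCune's theorem.)
def plus(x, y): return x | y
def neg(x): return 1 - x

def robbins(x, y):
    # Robbins equation (A3): -(-(x+y) + -(x+-y)) = x
    return neg(plus(neg(plus(x, y)), neg(plus(x, neg(y))))) == x

def huntington(x, y):
    # Huntington's equation: -(-x+y) + -(-x+-y) = x
    return plus(neg(plus(neg(x), y)), neg(plus(neg(x), neg(y)))) == x

assert all(robbins(x, y) for x, y in product((0, 1), repeat=2))
assert all(huntington(x, y) for x, y in product((0, 1), repeat=2))
print("Robbins and Huntington equations hold in the 2-element Boolean algebra")
```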
A special-purpose theorem prover does not draw its main benefit from restricting its attention to the domain axioms but from the fact that the domain may enjoy particular theorem-proving techniques which can be hardwired—coded—within the reasoning program itself and which may result in a more efficient logic implementation. Much of EQP's success at settling the Robbins question can be attributed to its built-in associative-commutative inference mechanisms.
A second important consideration in the building of an automated reasoning program is to decide (1) how problems in its domain will be presented to the reasoning program; (2) how they will actually be represented internally within the program; and (3) how the solutions found—completed proofs—will be displayed back to the user. There are several formalisms available for this, and the choice is dependent on the problem domain and the underlying deduction calculus used by the reasoning program. The most commonly used formalisms include standard first-order logic, typed \(\lambda\)-calculus, and clausal logic. We take up clausal logic here and assume that the reader is familiar with the rudiments of first-order logic; for the typed \(\lambda\)-calculus the reader may want to check Church 1940. Clausal logic is a quantifier-free variation of first-order logic and has been the most widely used notation within the automated reasoning community. Some definitions are in order: A term is a constant, a variable, or a function whose arguments are themselves terms. For example, \(a, x, f(x)\), and \(h(c,f(z),y)\) are all terms. A literal is either an atomic formula, e.g. \(F(x)\), or the negation of an atomic formula, e.g. \({\sim}R(x,f(a))\). Two literals are complementary if one is the negation of the other. A clause is a (possibly empty) finite disjunction of literals \(l_1\vee \ldots \vee l_n\) where no literal appears more than once in the clause (that is, clauses can be alternatively treated as sets of literals). Ground terms, ground literals, and ground clauses have no variables. The empty clause, [ ], is the clause having no literals and, hence, is unsatisfiable—false under any interpretation. For example, \({\sim}R(a,b)\) and \(F(a)\vee {\sim}R(f(x),b) \vee F(z)\) are both clauses but only the former is ground.
The general idea is to be able to express a problem's formulation as a set of clauses or, equivalently, as a formula in conjunctive normal form (CNF), that is, as a conjunction of clauses.
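To make the definitions concrete, here is one minimal way such clauses might be encoded; the representation (tuples for terms and literals, frozensets for clauses) is an illustrative choice of this sketch, not a standard:

```python
# Illustrative encoding of clausal logic (the representation is an
# assumption of this sketch): a variable or constant is a string, a
# compound term or atom is (symbol, args), a literal is (sign, atom),
# and a clause is a frozenset of literals.
def fn(symbol, *args):
    # Build a compound term or atomic formula, e.g. fn("h", "c", fn("f", "z"), "y").
    return (symbol, args)

def lit(atom, positive=True):
    # A literal is an atom or its negation, e.g. lit(fn("R", "x"), False) for ~R(x).
    return (positive, atom)

def complementary(l1, l2):
    # Two literals are complementary if one negates the other's atom.
    return l1[1] == l2[1] and l1[0] != l2[0]

# F(a) v ~R(f(x), b) v F(z): a non-ground clause as a set of literals.
clause = frozenset({lit(fn("F", "a")),
                    lit(fn("R", fn("f", "x"), "b"), False),
                    lit(fn("F", "z"))})
# The empty clause [ ] has no literals and is unsatisfiable.
empty_clause = frozenset()

assert complementary(lit(fn("F", "a")), lit(fn("F", "a"), False))
```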
For formulas already expressed in standard logic notation, there is a systematic two-step procedure for transforming them into conjunctive normal form. The first step consists in re-expressing a formula as a semantically equivalent formula in prenex normal form, \((\Theta x_1)\ldots(\Theta x_n)\alpha(x_1 ,\ldots,x_n)\), consisting of a string of quantifiers \((\Theta x_1)\ldots(\Theta x_n)\) followed by a quantifier-free expression \(\alpha(x_1 ,\ldots ,x_n)\) called the matrix. The second step in the transformation first converts the matrix into conjunctive normal form by using well-known logical equivalences such as DeMorgan's laws, distribution, double-negation, and others; then, the quantifiers in front of the matrix, which is now in conjunctive normal form, are dropped according to certain rules. In the presence of existential quantifiers, this latter step does not always preserve equivalence and requires the introduction of Skolem functions whose role is to "simulate" the behaviour of existentially quantified variables. For example, applying the skolemizing process to the formula \[\forall x\exists y\forall z\exists u\forall v[R(x,y,v)\vee {\sim}K(x,z,u,v)]\] requires the introduction of one-place and two-place Skolem functions, \(f\) and \(g\) respectively, resulting in the formula \[\forall x\forall z\forall v[R(x,f(x),v)\vee {\sim}K(x,z,g(x,z),v)]\] The universal quantifiers can then be removed to obtain the final clause, \(R(x,f(x),v) \vee {\sim}K(x,z,g(x,z),v)\) in our example. The Skolemizing process may not preserve equivalence but it maintains satisfiability, which is enough for clause-based automated reasoning.
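The Skolemization step for a prenex prefix can be sketched as follows; the function and its string-based output are hypothetical simplifications that compute only the Skolem substitution, not the full clausal transformation:

```python
# Hedged sketch of Skolemization for a formula already in prenex form: each
# existentially quantified variable is replaced by a fresh Skolem function
# applied to the universal variables that precede it (a Skolem constant if
# none precede it). Names and output format are ad hoc to this sketch.
def skolemize(prefix):
    """prefix: list of ('forall'|'exists', var) pairs, outermost first.
    Returns a map from each existential variable to its Skolem term."""
    universals, subst, fresh = [], {}, iter("fghij")
    for quant, var in prefix:
        if quant == "forall":
            universals.append(var)
        else:
            name = next(fresh)
            subst[var] = f"{name}({', '.join(universals)})" if universals else name
    return subst

# The example in the text: forall x exists y forall z exists u forall v ...
prefix = [("forall", "x"), ("exists", "y"), ("forall", "z"),
          ("exists", "u"), ("forall", "v")]
print(skolemize(prefix))   # {'y': 'f(x)', 'u': 'g(x, z)'}
```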
Although clausal form provides a more uniform and economical notation—there are no quantifiers and all formulas are disjunctions—it has certain disadvantages. One drawback is the increase in the size of the resulting formula when transformed from standard logic notation into clausal form. The increase in size is accompanied by an increase in cognitive complexity that makes it harder for humans to read proofs written with clauses. Another disadvantage is that the syntactic structure of a formula in standard logic notation can be used to guide the construction of a proof but this information is completely lost in the transformation into clausal form.
A third important consideration in the building of an automated reasoning program is the selection of the actual deduction calculus that will be used by the program to perform its inferences. As indicated before, the choice is highly dependent on the nature of the problem domain and there is a fair range of options available: general-purpose theorem proving and problem solving (first-order logic, simple type theory), program verification (first-order logic), distributed and concurrent systems (modal and temporal logics), program specification (intuitionistic logic), hardware verification (higher-order logic), logic programming (Horn logic), constraint satisfaction (propositional clausal logic), mathematics (higher-order logic), computational metaphysics (higher-order modal logic), and others.
A deduction calculus consists of a set of logical axioms and a collection of deduction rules for deriving new formulas from previously derived formulas. Solving a problem in the program's problem domain then really means establishing a particular formula \(\alpha\)—the problem's conclusion—from the extended set \(\Gamma\) consisting of the logical axioms, the domain axioms, and the problem assumptions. That is, the program needs to determine whether \(\Gamma\) entails \(\alpha\), written \(\Gamma \vDash \alpha\). How the program goes about establishing this semantic fact depends, of course, on the calculus it implements. Some programs may take a very direct route and attempt to establish that \(\Gamma \vDash \alpha\) by actually constructing a step-by-step proof of \(\alpha\) from \(\Gamma\). If successful, this shows of course that \(\Gamma\) derives—proves—\(\alpha\), a fact we denote by writing \(\Gamma \vdash \alpha\). Other reasoning programs may instead opt for a more indirect approach and try to establish that \(\Gamma \vDash \alpha\) by showing that \(\Gamma \cup \{{\sim}\alpha \}\) is inconsistent which, in turn, is shown by deriving a contradiction, \(\bot\), from the set \(\Gamma \cup \{{\sim}\alpha \}\). Automated systems that implement the former approach include natural deduction systems; the latter approach is used by systems based on resolution, sequent deduction, and matrix connection methods.
Soundness and completeness are two (metatheoretical) properties of a calculus that are particularly important for automated deduction. Soundness states that the rules of the calculus are truth-preserving. For a direct calculus this means that if \(\Gamma \vdash \alpha\) then \(\Gamma \vDash \alpha\). For indirect calculi, soundness means that if \(\Gamma \cup \{{\sim}\alpha \} \vdash \bot\) then \(\Gamma \vDash \alpha\). Completeness in a direct calculus states that if \(\Gamma \vDash \alpha\) then \(\Gamma \vdash \alpha\). For indirect calculi, the completeness property is expressed in terms of refutations since one establishes that \(\Gamma \vDash \alpha\) by showing the existence of a proof, not of \(\alpha\) from \(\Gamma\), but of \(\bot\) from \(\Gamma \cup \{{\sim}\alpha \}\). Thus, an indirect calculus is refutation complete if \(\Gamma \vDash \alpha\) implies \(\Gamma \cup \{{\sim}\alpha \} \vdash \bot\). Of the two properties, soundness is the more desirable. An incomplete calculus indicates that there are entailment relations that cannot be established within the calculus. For an automated reasoning program this means, informally, that there are true statements that the program cannot prove. Incompleteness may be an unfortunate affair but lack of soundness is a truly problematic situation since an unsound reasoning program would be able to generate false conclusions from perfectly true information.
It is important to appreciate the difference between a logical calculus and its corresponding implementation in a reasoning program. The implementation of a calculus invariably involves making some modifications to the calculus and this results, strictly speaking, in a new calculus. The most important modification to the original calculus is the "mechanization" of its deduction rules, that is, the specification of the systematic way in which the rules are to be applied. In the process of doing so, one must exercise care to preserve the metatheoretical properties of the original calculus.
Two other metatheoretical properties of importance to automated deduction are decidability and complexity. A calculus is decidable if it admits an algorithmic representation, that is, if there is an algorithm that, for any given \(\Gamma\) and \(\alpha\), can determine in a finite amount of time the answer, "Yes" or "No", to the question "Does \(\Gamma \vDash \alpha\)?" A calculus may be undecidable, in which case one needs to determine which decidable fragment to implement. The time-space complexity of a calculus specifies how efficient its algorithmic representation is. Automated reasoning is made the more challenging because many calculi of interest are not decidable and have poor complexity measures, forcing researchers to seek tradeoffs between deductive power and algorithmic efficiency.
Of the many calculi used in the implementation of reasoning programs, the ones based on the resolution principle have been the most popular. Resolution is modeled after the chain rule (of which Modus Ponens is a special case) and essentially states that from \(p \vee q\) and \({\sim}q \vee r\) one can infer \(p \vee r\). More formally, let \(C - l\) denote the clause \(C\) with the literal \(l\) removed. Assume that \(C_1\) and \(C_2\) are ground clauses containing, respectively, a positive literal \(l_1\) and a negative literal \({\sim}l_2\) such that \(l_1\) and \({\sim}l_2\) are complementary. Then, the rule of ground resolution states that, as a result of resolving \(C_1\) and \(C_2\), one can infer \((C_1 - l_1) \vee (C_2 - {\sim}l_2)\):
\[\tag{ground resolution}\frac{C_1 \quad C_2}{(C_1 - l_1)\vee (C_2 - {\sim}l_2)}\] Herbrand's theorem (Herbrand 1930) assures us that the non-satisfiability of any set of clauses, ground or not, can be established by using ground resolution. This is a very significant result for automated deduction since it tells us that if a set \(\Gamma\) is not satisfied by any of the infinitely many interpretations, this fact can be determined in finitely many steps. Unfortunately, a direct implementation of ground resolution using Herbrand's theorem requires the generation of a vast number of ground terms, making this approach hopelessly inefficient. This issue was effectively addressed by generalizing the ground resolution rule to binary resolution and by introducing the notion of unification (Robinson 1965a). Unification allows resolution proofs to be "lifted" and conducted at a more general level; clauses only need to be instantiated at the moment they are to be resolved. Moreover, the clauses resulting from the instantiation process do not have to be ground instances and may still contain variables. The introduction of binary resolution and unification is considered one of the most important developments in the field of automated reasoning.
A unifier of two expressions—terms or clauses—is a substitution that, when applied to the expressions, makes them equal. For example, the substitution \(\sigma\) given by

\[\sigma := \{x \leftarrow b, y \leftarrow b, z \leftarrow f(a,b)\}\] is a unifier for

\[R(x,f(a,y))\] and

\[R(b,z)\] since when applied to both expressions it makes them equal:

\[\begin{aligned}R(x, f(a, y))\sigma & = R(b, f(a, b))\\& = R(b, z)\sigma\end{aligned}\] A most general unifier (mgu) produces the most general instance shared by two unifiable expressions. In the previous example, the substitution \(\{x \leftarrow b, y \leftarrow b, z \leftarrow f(a,b)\}\) is a unifier but not an mgu; however, \(\{x \leftarrow b, z \leftarrow f(a,y)\}\) is an mgu. Note that unification attempts to "match" two expressions, and this fundamental process has become a central component of most automated deduction programs, resolution-based and otherwise. Theory-unification is an extension of the unification mechanism that includes built-in inference capabilities. For example, the clauses \(R(g(a,b),x)\) and \(R(g(b,a),d)\) do not unify but they AC-unify, where AC-unification is unification with built-in associative and commutative rules such as \(g(a,b) = g(b,a)\). Shifting inference capabilities into the unification mechanism adds power but at a price: the mgu of two unifiable expressions may no longer be unique (there could actually be infinitely many), and the unification process becomes undecidable in general.
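Syntactic unification with occurs check (in the style of Robinson 1965a) can be sketched in a few lines. The term encoding below (strings for variables and constants, tagged tuples for compound terms, with the set of variable names fixed by the `VARS` convention) is an illustrative assumption of this sketch:

```python
# Hedged sketch of syntactic unification with occurs check. Terms: a
# variable is a string listed in VARS; any other string is a constant;
# a compound term or atom is (symbol, (arg, ...)).
VARS = {"x", "y", "z", "u", "v"}

def walk(t, s):
    # Follow variable bindings in substitution s to a representative term.
    while isinstance(t, str) and t in VARS and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    # Occurs check: does variable v occur in term t under substitution s?
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1])

def unify(t1, t2, s=None):
    """Return an mgu extending s, or None if t1 and t2 do not unify."""
    s = dict(s or {})
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if isinstance(t1, str) and t1 in VARS:
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if isinstance(t2, str) and t2 in VARS:
        return unify(t2, t1, s)
    if isinstance(t1, tuple) and isinstance(t2, tuple) and \
       t1[0] == t2[0] and len(t1[1]) == len(t2[1]):
        for a, b in zip(t1[1], t2[1]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

# R(x, f(a, y)) and R(b, z): the mgu binds x to b and z to f(a, y).
mgu = unify(("R", ("x", ("f", ("a", "y")))), ("R", ("b", "z")))
print(mgu)   # {'x': 'b', 'z': ('f', ('a', 'y'))}
```

Note that the computed substitution is the mgu \(\{x \leftarrow b, z \leftarrow f(a,y)\}\) from the text, not the more specific unifier \(\sigma\).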
Let \(C_1\) and \(C_2\) be two clauses containing, respectively, a positive literal \(l_1\) and a negative literal \({\sim}l_2\) such that \(l_1\) and \(l_2\) unify with mgu \(\theta\). Then,
\[\tag{binary resolution}\frac{C_1 \quad C_2}{(C_1 \theta - l_1 \theta)\vee (C_2 \theta - {\sim}l_2 \theta)}\] by binary resolution; the clause \((C_1\theta - l_1\theta) \vee (C_2\theta - {\sim}l_2\theta)\) is called a binary resolvent of \(C_1\) and \(C_2\).
If two or more literals occurring in a clause \(C\) share an mgu \(\theta\) then \(C\theta\) is a factor of \(C\). For example, in \(R(x,a) \vee {\sim}K(f(x),b) \vee R(c,y)\) the literals \(R(x,a)\) and \(R(c,y)\) unify with mgu \(\{x \leftarrow c, y \leftarrow a\}\) and, hence, \(R(c,a) \vee {\sim}K(f(c),b)\) is a factor of the original clause.
Let \(C_1\) and \(C_2\) be two clauses. Then, a resolvent obtained by resolution from \(C_1\) and \(C_2\) is defined as: (a) a binary resolvent of \(C_1\) and \(C_2\); (b) a binary resolvent of \(C_1\) and a factor of \(C_2\); (c) a binary resolvent of a factor of \(C_1\) and \(C_2\); or, (d) a binary resolvent of a factor of \(C_1\) and a factor of \(C_2\).
Resolution proofs, more precisely refutations, are constructed by deriving the empty clause [ ] from \(\Gamma \cup \{{\sim}\alpha\}\) using resolution; this will always be possible if \(\Gamma \cup \{{\sim}\alpha \}\) is unsatisfiable since resolution is refutation complete (Robinson 1965a). As an example of a resolution proof, we show that the set \(\{\forall x(P(x) \vee Q(x)), \forall x(P(x) \supset R(x)), \forall x(Q(x) \supset R(x))\}\), denoted by \(\Gamma\), entails the formula \(\exists xR(x)\). The first step is to find the clausal form of \(\Gamma \cup \{{\sim}\exists xR(x)\}\); the resulting clause set, denoted by \(S_0\), is shown in steps 1 to 4 in the refutation below. The refutation is constructed by using a level-saturation method: compute all the resolvents of the initial set, \(S_0\), add them to the set, and repeat the process until the empty clause is derived. (This produces the sequence of increasingly larger sets: \(S_0, S_1, S_2,\ldots)\) The only constraint that we impose is that we do not resolve the same two clauses more than once.
| Set | Step | Clause | Justification |
|---|---|---|---|
| \(S_0\) | 1 | \(P(x) \vee Q(x)\) | Assumption |
| | 2 | \({\sim}P(x) \vee R(x)\) | Assumption |
| | 3 | \({\sim}Q(x) \vee R(x)\) | Assumption |
| | 4 | \({\sim}R(a)\) | Negated conclusion |
| \(S_1\) | 5 | \(Q(x) \vee R(x)\) | Res 1 2 |
| | 6 | \(P(x) \vee R(x)\) | Res 1 3 |
| | 7 | \({\sim}P(a)\) | Res 2 4 |
| | 8 | \({\sim}Q(a)\) | Res 3 4 |
| \(S_2\) | 9 | \(Q(a)\) | Res 1 7 |
| | 10 | \(P(a)\) | Res 1 8 |
| | 11 | \(R(x)\) | Res 2 6 |
| | 12 | \(R(x)\) | Res 3 5 |
| | 13 | \(Q(a)\) | Res 4 5 |
| | 14 | \(P(a)\) | Res 4 6 |
| | 15 | \(R(a)\) | Res 5 8 |
| | 16 | \(R(a)\) | Res 6 7 |
| \(S_3\) | 17 | \(R(a)\) | Res 2 10 |
| | 18 | \(R(a)\) | Res 2 14 |
| | 19 | \(R(a)\) | Res 3 9 |
| | 20 | \(R(a)\) | Res 3 13 |
| | 21 | [ ] | Res 4 11 |
Although the resolution proof is successful in deriving [ ], it has some significant drawbacks. To start with, the refutation is too long as it takes 21 steps to reach the contradiction, [ ]. This is due to the naïve brute-force nature of the implementation. The approach not only generates too many formulas but some are clearly redundant. Note how \(R(a)\) is derived six times; also, \(R(x)\) has more "information content" than \(R(a)\) and one should keep the former and disregard the latter. Resolution, like all other automated deduction methods, must be supplemented by strategies aimed at improving the efficiency of the deduction process. The above sample derivation has 21 steps, but research-type problems demand derivations with thousands or hundreds of thousands of steps.
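The level-saturation procedure just illustrated can be sketched at the propositional level, run on the ground instances of the example's clauses; the encoding is an illustrative assumption, and no redundancy control beyond set membership is attempted:

```python
# Hedged, propositional-level sketch of level saturation. A literal is a
# (sign, atom) pair and a clause is a frozenset of literals.
def resolvents(c1, c2):
    # All binary resolvents of two clauses.
    res = set()
    for sign, atom in c1:
        comp = (not sign, atom)
        if comp in c2:
            res.add((c1 - {(sign, atom)}) | (c2 - {comp}))
    return res

def saturate(clauses):
    # Build S0, S1, S2, ... until [] is derived or no new clauses appear.
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                new |= resolvents(c1, c2)
        if frozenset() in new:
            return True          # empty clause: refutation found
        if new <= clauses:
            return False         # saturated without refutation
        clauses |= new

# Ground instances of the example: P v Q, ~P v R, ~Q v R, ~R.
s0 = [{(True, "P"), (True, "Q")},
      {(False, "P"), (True, "R")},
      {(False, "Q"), (True, "R")},
      {(False, "R")}]
print(saturate(s0))   # True
```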
The successful implementation of a deduction calculus in an automated reasoning program requires the integration of search strategies that reduce the search space by pruning unnecessary deduction paths. Some strategies remove redundant clauses or tautologies as soon as they appear in a derivation. Another strategy is to remove more specific clauses in the presence of more general ones by a process known as subsumption (Robinson 1965a). Unrestricted subsumption, however, does not preserve the refutation completeness of resolution and, hence, there is a need to restrict its applicability (Loveland 1978). Model elimination (Loveland 1969) can discard a sentence by showing that it is false in some model of the axioms. The subject of model generation has received much attention as a complementary process to theorem proving. The method has been used successfully by automated reasoning programs to show the independence of axiom sets and to determine the existence of discrete mathematical structures meeting some given criteria.
Instead of removing redundant clauses, some strategies prevent the generation of useless clauses in the first place. The set-of-support strategy (Wos, Carson & Robinson 1965) is one of the most powerful strategies of this kind. A subset \(T\) of the set \(S\), where \(S\) is initially \(\Gamma \cup \{{\sim}\alpha \}\), is called a set of support of \(S\) iff \(S - T\) is satisfiable. Set-of-support resolution dictates that the resolved clauses are not both from \(S - T\). The motivation behind set-of-support is that since the set \(\Gamma\) is usually satisfiable it might be wise not to resolve two clauses from \(\Gamma\) against each other. Hyperresolution (Robinson 1965b) reduces the number of intermediate resolvents by combining several resolution steps into a single inference step.
Independently co-discovered, linear resolution (Loveland 1970, Luckham 1970) always resolves a clause against the most recently derived resolvent. This gives the deduction a simple "linear" structure affording a straightforward implementation; yet, linear resolution preserves refutation completeness. Using linear resolution we can derive the empty clause in the above example in only eight steps:
| Step | Clause | Justification |
|---|---|---|
| 1 | \(P(x) \vee Q(x)\) | Assumption |
| 2 | \({\sim}P(x) \vee R(x)\) | Assumption |
| 3 | \({\sim}Q(x) \vee R(x)\) | Assumption |
| 4 | \({\sim}R(a)\) | Negated conclusion |
| 5 | \({\sim}P(a)\) | Res 2 4 |
| 6 | \(Q(a)\) | Res 1 5 |
| 7 | \(R(a)\) | Res 3 6 |
| 8 | [ ] | Res 4 7 |
With the exception of unrestricted subsumption, all the strategies mentioned so far preserve refutation completeness. Efficiency is an important consideration in automated reasoning and one may sometimes be willing to trade completeness for speed. Unit resolution and input resolution are two such refinements of linear resolution. In the former, one of the resolved clauses is always a unit clause, i.e. a single literal; in the latter, one of the resolved clauses is always selected from the original set to be refuted. Albeit efficient, neither strategy is complete. Ordering strategies impose some form of partial ordering on the predicate symbols, terms, literals, or clauses occurring in the deduction. Ordered resolution treats clauses not as sets of literals but as sequences—linear orders—of literals. Ordered resolution is extremely efficient but, like unit and input resolution, is not refutation complete. To end, it must be noted that some strategies improve certain aspects of the deduction process at the expense of others. For instance, a strategy may reduce the size of the proof search space at the expense of increasing, say, the length of the shortest refutations. A taxonomy and detailed presentation of theorem-proving strategies can be found in Bonacina 1999; for a discussion of the relative complexity (i.e. efficiency measures) of resolution see Buresh-Oppenheim & Pitassi 2003, and Urquhart 1987.
There are several automated reasoning programs that are based on resolution, or refinements of resolution. Otter (succeeded by Prover9) was a driving force in the development of automated reasoning (Wos, Overbeek, Lusk & Boyle 1984) but it has been superseded by more capable programs like Vampire (Voronkov 1995, Kovács & Voronkov 2013). Resolution also provides the underlying logico-computational mechanism for the popular logic programming language Prolog (Clocksin & Mellish 1981).
Hilbert-style calculi (Hilbert and Ackermann 1928) have been traditionally used to characterize logic systems. These calculi usually consist of a few axiom schemata and a small number of rules that typically include modus ponens and the rule of substitution. Although they meet the required theoretical requisites (soundness, completeness, etc.), proof construction in these calculi is difficult and does not reflect standard practice. It was Gentzen's goal "to set up a formalism that reflects as accurately as possible the actual logical reasoning involved in mathematical proofs" (Gentzen 1935). To carry out this task, Gentzen analyzed the proof-construction process and then devised two deduction calculi for classical logic: the natural deduction calculus, \(\mathbf{NK}\), and the sequent calculus, \(\mathbf{LK}\). (Gentzen actually designed NK first and then introduced LK to pursue metatheoretical investigations.) The calculi met his goal to a large extent while at the same time managing to secure soundness and completeness. Both calculi are characterized by a relatively larger number of deduction rules and a simple axiom schema. Of the two calculi, LK is the one that has been most widely used in implementations of automated reasoning programs, and it is the one that we will discuss first; NK will be discussed in the next section.
Although applications of the LK rules affect logic formulas, the rules are seen as manipulating not logic formulas themselves but sequents. Sequents are expressions of the form \(\Gamma \rightarrow \Delta\), where both \(\Gamma\) and \(\Delta\) are (possibly empty) sets of formulas. \(\Gamma\) is the sequent's antecedent and \(\Delta\) its succedent. Sequents can be interpreted thus: Let \(\mathcal{I}\) be an interpretation. Then,
\(\mathcal{I}\) satisfies the sequent \(\Gamma \rightarrow \Delta\) (written as: \(\mathcal{I} \vDash \Gamma \rightarrow \Delta\)) iff
either \(\mathcal{I} \not\vDash \alpha\) (for some \(\alpha \in \Gamma\)) or \(\mathcal{I} \vDash \beta\) (for some \(\beta \in \Delta\)).
In other words,
\(\mathcal{I} \vDash \Gamma \rightarrow \Delta\) iff \(\mathcal{I} \vDash (\alpha_1~ \amp \ldots \amp ~\alpha_n) \supset (\beta_1 \vee \ldots \vee \beta_m)\), where \(\alpha_1~ \amp \ldots \amp ~\alpha_n\) is the iterated conjunction of the formulas in \(\Gamma\) and \(\beta_1 \vee \ldots \vee \beta_m\) is the iterated disjunction of those in \(\Delta\).
If \(\Gamma\) is empty, the sequent asserts the validity of (the disjunction of) \(\Delta\); if \(\Delta\) is empty, it asserts the unsatisfiability of \(\Gamma\). An axiom of LK is a sequent \(\Gamma \rightarrow \Delta\) where \(\Gamma \cap \Delta \ne \varnothing\). Thus, the requirement that the same formula occur at each side of the \(\rightarrow\) sign means that the axioms of LK are valid, for no interpretation can then make all the formulas in \(\Gamma\) true and, simultaneously, make all those in \(\Delta\) false. LK has two rules per logical connective, plus one extra rule: the cut rule.
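This semantics is easy to check mechanically in the propositional case. The following sketch (atoms only, purely for illustration) tests a sequent against every interpretation of its atoms, confirming that axiom sequents are valid:

```python
from itertools import product

# Hedged sketch: a propositional sequent Gamma -> Delta (atoms only) is
# satisfied by interpretation i iff i falsifies some antecedent formula
# or verifies some succedent formula; it is valid iff every i satisfies it.
def satisfies(i, gamma, delta):
    return any(not i[a] for a in gamma) or any(i[b] for b in delta)

def valid(gamma, delta):
    atoms = sorted(set(gamma) | set(delta))
    return all(satisfies(dict(zip(atoms, vals)), gamma, delta)
               for vals in product((False, True), repeat=len(atoms)))

# An axiom sequent (a shared formula on both sides) is valid:
print(valid({"p", "q"}, {"p", "r"}))   # True
# p -> q is falsified by making p true and q false:
print(valid({"p"}, {"q"}))             # False
```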
| Axioms | Cut Rule |
|---|---|
| \(\Gamma \rightarrow \Delta\), where \(\Gamma \cap \Delta \ne \varnothing\) | \(\dfrac{\Gamma \rightarrow \Delta, \alpha \qquad \alpha, \Gamma \rightarrow \Delta}{\Gamma \rightarrow \Delta}\) |

| Antecedent Rules \((\Theta \rightarrow)\) | Succedent Rules \((\rightarrow \Theta)\) |
|---|---|
| \(\amp\rightarrow\): \(\dfrac{\alpha, \beta, \Gamma \rightarrow \Delta}{\alpha \amp \beta, \Gamma \rightarrow \Delta}\) | \(\rightarrow\amp\): \(\dfrac{\Gamma \rightarrow \Delta, \alpha \qquad \Gamma \rightarrow \Delta, \beta}{\Gamma \rightarrow \Delta, \alpha \amp \beta}\) |
| \(\vee \rightarrow\): \(\dfrac{\alpha, \Gamma \rightarrow \Delta \qquad \beta, \Gamma \rightarrow \Delta}{\alpha \vee \beta, \Gamma \rightarrow \Delta}\) | \(\rightarrow \vee\): \(\dfrac{\Gamma \rightarrow \Delta, \alpha, \beta}{\Gamma \rightarrow \Delta, \alpha \vee \beta}\) |
| \(\supset \rightarrow\): \(\dfrac{\Gamma \rightarrow \Delta, \alpha \qquad \beta, \Gamma \rightarrow \Delta}{\alpha \supset \beta, \Gamma \rightarrow \Delta}\) | \(\rightarrow \supset\): \(\dfrac{\alpha, \Gamma \rightarrow \Delta, \beta}{\Gamma \rightarrow \Delta, \alpha \supset \beta}\) |
| \(\equiv \rightarrow\): \(\dfrac{\alpha, \beta, \Gamma \rightarrow \Delta \qquad \Gamma \rightarrow \Delta, \alpha, \beta}{\alpha \equiv \beta, \Gamma \rightarrow \Delta}\) | \(\rightarrow \equiv\): \(\dfrac{\alpha, \Gamma \rightarrow \Delta, \beta \qquad \beta, \Gamma \rightarrow \Delta, \alpha}{\Gamma \rightarrow \Delta, \alpha \equiv \beta}\) |
| \({\sim}\rightarrow\): \(\dfrac{\Gamma \rightarrow \Delta, \alpha}{{\sim}\alpha, \Gamma \rightarrow \Delta}\) | \(\rightarrow{\sim}\): \(\dfrac{\alpha, \Gamma \rightarrow \Delta}{\Gamma \rightarrow \Delta, {\sim}\alpha}\) |
| \(\exists \rightarrow\): \(\dfrac{\alpha(a/x), \Gamma \rightarrow \Delta}{\exists x\alpha(x), \Gamma \rightarrow \Delta}\) | \(\rightarrow \exists\): \(\dfrac{\Gamma \rightarrow \Delta, \alpha(t/x), \exists x\alpha(x)}{\Gamma \rightarrow \Delta, \exists x\alpha(x)}\) |
| \(\forall \rightarrow\): \(\dfrac{\alpha(t/x), \forall x\alpha(x), \Gamma \rightarrow \Delta}{\forall x\alpha(x), \Gamma \rightarrow \Delta}\) | \(\rightarrow \forall\): \(\dfrac{\Gamma \rightarrow \Delta, \alpha(a/x)}{\Gamma \rightarrow \Delta, \forall x\alpha(x)}\) |
The sequents above a rule's line are called the rule's premises and the sequent below the line is the rule's conclusion. The quantification rules \(\exists \rightarrow\) and \(\rightarrow \forall\) have an eigenvariable condition that restricts their applicability, namely that \(a\) must not occur in \(\Gamma\), \(\Delta\) or in the quantified sentence. The purpose of this restriction is to ensure that the choice of parameter, \(a\), used in the substitution process is completely "arbitrary".
Proofs in LK are represented as trees where each node in the tree is labeled with a sequent, and where the original sequent sits at the root of the tree. The children of a node are the premises of the rule being applied at that node. The leaves of the tree are labeled with axioms. Here is the LK-proof of \(\exists xR(x)\) from the set \(\{\forall x(P(x) \vee Q(x)), \forall x(P(x) \supset R(x)), \forall x(Q(x) \supset R(x))\}\). In the tree below, \(\Gamma\) stands for this set:
Working backwards from the root, the tree unfolds as follows (the \(\vee \rightarrow\) step splits the search into two branches, and every leaf is an axiom):

| Step | Sequent | Justification |
|---|---|---|
| 1 | \(\Gamma \rightarrow \exists xR(x)\) | Root sequent |
| 2 | \(P(a) \vee Q(a), P(a) \supset R(a), Q(a) \supset R(a), \Gamma \rightarrow \exists xR(x)\) | From 1 by \(\forall \rightarrow\) (three times) |
| 3 | \(P(a) \vee Q(a), P(a) \supset R(a), Q(a) \supset R(a), \Gamma \rightarrow R(a), \exists xR(x)\) | From 2 by \(\rightarrow \exists\) |
| 4 | \(P(a), P(a) \supset R(a), Q(a) \supset R(a), \Gamma \rightarrow R(a), \exists xR(x)\) | From 3 by \(\vee \rightarrow\) (left branch) |
| 5 | \(Q(a), P(a) \supset R(a), Q(a) \supset R(a), \Gamma \rightarrow R(a), \exists xR(x)\) | From 3 by \(\vee \rightarrow\) (right branch) |
| 6 | \(P(a), Q(a) \supset R(a), \Gamma \rightarrow R(a), \exists xR(x), P(a)\) | From 4 by \(\supset \rightarrow\); axiom |
| 7 | \(R(a), P(a), Q(a) \supset R(a), \Gamma \rightarrow R(a), \exists xR(x)\) | From 4 by \(\supset \rightarrow\); axiom |
| 8 | \(Q(a), P(a) \supset R(a), \Gamma \rightarrow R(a), \exists xR(x), Q(a)\) | From 5 by \(\supset \rightarrow\); axiom |
| 9 | \(R(a), Q(a), P(a) \supset R(a), \Gamma \rightarrow R(a), \exists xR(x)\) | From 5 by \(\supset \rightarrow\); axiom |
In our example, all the leaves in the proof tree are labeled with axioms. This establishes the validity of \(\Gamma \rightarrow \exists xR(x)\) and, hence, the fact that \(\Gamma \vDash \exists xR(x)\). LK takes an indirect approach at proving the conclusion and this is an important difference between LK and NK. While NK constructs an actual proof (of the conclusion from the given assumptions), LK instead constructs a proof that proves the existence of a proof (of the conclusion from the assumptions). For instance, to prove that \(\alpha\) is entailed by \(\Gamma\), NK constructs a step-by-step proof of \(\alpha\) from \(\Gamma\) (assuming that one exists); in contrast, LK first constructs the sequent \(\Gamma \rightarrow \alpha\), which it then attempts to prove valid by showing that it cannot be made false. This is done by searching for a counterexample that makes (all the sentences in) \(\Gamma\) true and makes \(\alpha\) false: if the search fails then a counterexample does not exist and the sequent is therefore valid. In this respect, proof trees in LK are actually refutation proofs. Like resolution, LK is refutation complete: if \(\Gamma \vDash \alpha\) then the sequent \(\Gamma \rightarrow \alpha\) has a proof tree.
As it stands, LK is unsuitable for automated deduction and there are some obstacles that must be overcome before it can be efficiently implemented. The reason is, of course, that the statement of the completeness of LK only asserts, for each entailment relation, the existence of a proof tree, but a reasoning program has the more difficult task of actually having to construct one. Some of the main obstacles: First, LK does not specify the order in which the rules must be applied in the construction of a proof tree. Second, and as a particular case of the first problem, the premises of the rules \(\forall \rightarrow\) and \(\rightarrow \exists\) inherit the quantificational formula to which the rule is applied, meaning that the rules can be applied repeatedly to the same formula, sending the proof search into an endless loop. Third, LK does not indicate which formula must be selected next in the application of a rule. Fourth, the quantifier rules provide no indication as to what terms or free variables must be used in their deployment. Fifth, and as a particular case of the previous problem, the application of a quantifier rule can lead into an infinitely long tree branch because the proper term to be used in the instantiation never gets chosen. Fortunately, as we will hint at below, each of these problems can be successfully addressed.
Axiom sequents in LK are valid, and the conclusion of a rule is valid iff its premises are. This fact allows us to apply the LK rules in either direction, forwards from axioms to conclusion, or backwards from conclusion to axioms. Also, with the exception of the cut rule, all the formulas in a rule's premises are subformulas of those in its conclusion. For the purposes of automated deduction this is a significant fact and we would want to dispense with the cut rule; fortunately, the cut-free version of LK preserves its refutation completeness (Gentzen 1935). These results provide a strong case for constructing proof trees in a backwards fashion; indeed, by working this way a refutation in cut-free LK gets increasingly simpler as it progresses since subformulas are simpler than their parent formulas. Moreover, as far as propositional rules go, the new subformulas entered into the tree are completely dictated by the cut-free LK rules. Furthermore, assuming the proof tree can be brought to completion, branches eventually end up with atoms and the presence of axioms can be quickly determined. Another reason for working backwards is that the truth-functional fragment of cut-free LK is confluent in the sense that the order in which the non-quantifier rules are applied is irrelevant: if there is a proof, regardless of what you do, you will run into it! To bring the quantifier rules into the picture, things can be arranged so that all rules have a fair chance of being deployed: apply, as far as possible, all the non-quantifier rules before applying any of the quantifier rules. This takes care of the first and second obstacles, and it is not too difficult to see how the third one would now be handled. The fourth and fifth obstacles can be addressed by requiring that the terms to be used in the substitutions be suitably selected from the Herbrand universe (Herbrand 1930).
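For the propositional fragment, backward proof search in cut-free LK can be sketched directly: since the propositional rules are confluent in the sense just described, applying whichever rule is applicable first does not affect provability, so a greedy search suffices. The formula encoding below is an illustrative assumption of this sketch:

```python
# Hedged sketch of backward proof search in the propositional fragment of
# cut-free LK. Formulas: an atom is a string; compound formulas are tagged
# tuples ("not", a), ("and", a, b), ("or", a, b), ("imp", a, b).
def prove(gamma, delta):
    gamma, delta = set(gamma), set(delta)
    if gamma & delta:
        return True                       # axiom: shared formula
    for f in list(gamma):                 # apply an antecedent rule backwards
        if isinstance(f, tuple):
            g = gamma - {f}
            if f[0] == "not":
                return prove(g, delta | {f[1]})
            if f[0] == "and":
                return prove(g | {f[1], f[2]}, delta)
            if f[0] == "or":              # branches: both premises must hold
                return prove(g | {f[1]}, delta) and prove(g | {f[2]}, delta)
            if f[0] == "imp":
                return prove(g, delta | {f[1]}) and prove(g | {f[2]}, delta)
    for f in list(delta):                 # apply a succedent rule backwards
        if isinstance(f, tuple):
            d = delta - {f}
            if f[0] == "not":
                return prove(gamma | {f[1]}, d)
            if f[0] == "and":
                return prove(gamma, d | {f[1]}) and prove(gamma, d | {f[2]})
            if f[0] == "or":
                return prove(gamma, d | {f[1], f[2]})
            if f[0] == "imp":
                return prove(gamma | {f[1]}, d | {f[2]})
    return False                          # only atoms, no axiom: countermodel

# Ground version of the running example: P v Q, P -> R, Q -> R |- R.
print(prove({("or", "P", "Q"), ("imp", "P", "R"), ("imp", "Q", "R")}, {"R"}))  # True
```

When `prove` returns `False`, the atoms reached on the failing branch describe the counterexample the search was looking for.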
The use of sequent-type calculi in automated theorem proving was initiated by efforts to mechanize mathematics (Wang 1960). At the time, resolution captured most of the attention of the automated reasoning community, but during the 1970s some researchers started to further investigate non-resolution methods (Bledsoe 1977), prompting a fruitful and sustained effort to develop more human-oriented theorem proving systems (Bledsoe 1975, Nevins 1974). Eventually, sequent-type deduction gained momentum again, particularly in its re-incarnation as analytic tableaux (Fitting 1990). The method of deduction used in tableaux is essentially cut-free LK’s, with sets used in lieu of sequents.
Although LK and NK are both commonly labeled as “natural deduction” systems, it is the latter which better deserves the title due to its more natural, human-like approach to proof construction. The rules of NK are typically presented as acting on standard logic formulas in an implicitly understood context, but they are also commonly given in the literature as acting more explicitly on “judgements”, that is, expressions of the form \(\Gamma \vdash \alpha\) where \(\Gamma\) is a set of formulas and \(\alpha\) is a formula. This form is typically understood as making the metastatement that there is a proof of \(\alpha\) from \(\Gamma\) (Kleene 1962). Following Gentzen 1935 and Prawitz 1965, here we take the former approach. The system NK has no logical axioms, and provides an introduction and an elimination rule for each logical connective:
| Introduction Rules \((\Theta \mathbf{I})\) | Elimination Rules \((\Theta \mathbf{E})\) |
|---|---|
| \(\amp\)I \(\quad\dfrac{\alpha \qquad \beta}{\alpha \amp \beta}\) | \(\amp\)E \(\quad\dfrac{\alpha \amp \beta}{\alpha} \qquad \dfrac{\alpha \amp \beta}{\beta}\) |
| \(\vee\)I \(\quad\dfrac{\alpha}{\alpha \vee \beta} \qquad \dfrac{\beta}{\alpha \vee \beta}\) | \(\vee\)E \(\quad\dfrac{\alpha \vee \beta \qquad [\alpha\text{—}\gamma] \qquad [\beta\text{—}\gamma]}{\gamma}\) |
| \(\supset\)I \(\quad\dfrac{[\alpha\text{—}\beta]}{\alpha \supset \beta}\) | \(\supset\)E \(\quad\dfrac{\alpha \supset \beta \qquad \alpha}{\beta}\) |
| \(\equiv\)I \(\quad\dfrac{[\alpha\text{—}\beta] \qquad [\beta\text{—}\alpha]}{\alpha \equiv \beta}\) | \(\equiv\)E \(\quad\dfrac{\alpha \equiv \beta \qquad \alpha}{\beta} \qquad \dfrac{\alpha \equiv \beta \qquad \beta}{\alpha}\) |
| \({\sim}\)I \(\quad\dfrac{[\alpha\text{—}\bot]}{{\sim}\alpha}\) | \({\sim}\)E \(\quad\dfrac{[{\sim}\alpha\text{—}\bot]}{\alpha}\) |
| \(\exists\)I \(\quad\dfrac{\alpha(t/x)}{\exists x\alpha(x)}\) | \(\exists\)E \(\quad\dfrac{\exists x\alpha(x) \qquad [\alpha(a/x)\text{—}\beta]}{\beta}\) |
| \(\forall\)I \(\quad\dfrac{\alpha(a/x)}{\forall x\alpha(x)}\) | \(\forall\)E \(\quad\dfrac{\forall x\alpha(x)}{\alpha(t/x)}\) |
A few remarks: First, the expression [\(\alpha\)—\(\gamma\)] represents the fact that \(\alpha\) is an auxiliary assumption in the proof of \(\gamma\) that eventually gets discharged, i.e. discarded. For example, \(\exists\)E tells us that if in the process of constructing a proof one has already derived \(\exists x\alpha(x)\) and also \(\beta\) with \(\alpha(a/x)\) as an auxiliary assumption, then the inference to \(\beta\) is allowed. Second, the eigenparameter, \(a\), in \(\exists\)E and \(\forall\)I must be foreign to the premises, to the undischarged—“active”—assumptions, to the rule’s conclusion and, in the case of \(\exists\)E, to \(\exists x\alpha(x)\). Third, \(\bot\) is shorthand for two contradictory formulas, \(\beta\) and \({\sim}\beta\). Finally, NK is complete: If \(\Gamma \vDash \alpha\) then there is a proof of \(\alpha\) from \(\Gamma\) using the rules of NK.
As in LK, proofs constructed in NK are represented as trees with the proof’s conclusion sitting at the root of the tree, and the problem’s assumptions sitting at the leaves. (Proofs are also typically given as sequences of judgements, \(\Gamma \vdash \alpha\), running from the top to the bottom of the printed page.) Here is a natural deduction proof tree of \(\exists xR(x)\) from \(\forall x(P(x) \vee Q(x)), \forall x(P(x) \supset R(x))\) and \(\forall x(Q(x) \supset R(x))\):
\[
\dfrac{\dfrac{\forall x(P(x) \vee Q(x))}{P(a) \vee Q(a)}\,\forall\mathbf{E}
\qquad
\dfrac{\dfrac{\dfrac{\forall x(P(x) \supset R(x))}{P(a) \supset R(a)}\,\forall\mathbf{E} \qquad [P(a)]}{R(a)}\,\supset\!\mathbf{E}}{\exists xR(x)}\,\exists\mathbf{I}
\qquad
\dfrac{\dfrac{\dfrac{\forall x(Q(x) \supset R(x))}{Q(a) \supset R(a)}\,\forall\mathbf{E} \qquad [Q(a)]}{R(a)}\,\supset\!\mathbf{E}}{\exists xR(x)}\,\exists\mathbf{I}}
{\exists xR(x)}\,\vee\mathbf{E}
\]
As in LK, a forward-chaining strategy for proof construction is not well focused. So, although proofs are read forwards, that is, from leaves to root or, logically speaking, from assumptions to conclusion, that is not the way in which they are typically constructed. A backward-chaining strategy implemented by applying the rules in reverse order is more effective. Many of the obstacles that were discussed above in the implementation of sequent deduction are applicable to natural deduction as well. These issues can be handled in a similar way, but natural deduction introduces some issues of its own. For example, as suggested by the \(\supset\)-Introduction rule, to prove a goal of the form \(\alpha \supset \beta\) one could attempt to prove \(\beta\) on the assumption that \(\alpha\). But note that although the goal \(\alpha \supset \beta\) does not match the conclusion of any other introduction rule, it matches the conclusion of all elimination rules, and the reasoning program would need to consider those routes too. Similarly to forward-chaining, here there is the risk of setting goals that are irrelevant to the proof and that could lead the program astray. To wit: What prevents a program from entering the never-ending process of building, say, larger and larger conjunctions? Or, what is there to prevent an uncontrolled chain of backward applications of, say, \(\supset\)-Elimination? Fortunately, NK enjoys the subformula property in the sense that each formula entering into a natural deduction proof can be restricted to being a subformula of \(\Gamma \cup \Delta \cup \{\alpha\}\), where \(\Delta\) is the set of auxiliary assumptions made by the \({\sim}\)-Elimination rule. By exploiting the subformula property a natural deduction automated theorem prover can drastically reduce its search space and bring the backward application of the elimination rules under control (Portoraro 1998, Sieg & Byrnes 1996).
Further gains can be realized if one is willing to restrict the scope of NK’s logic to its intuitionistic fragment, where every proof has a normal form in the sense that no formula is obtained by an introduction rule and then eliminated by an elimination rule (Prawitz 1965).
Implementations of automated theorem proving systems using NK deduction have been motivated by the desire to have the program reason with precisely the same proof format and methods employed by the human user. This has been particularly true in the area of education, where the student is engaged in the interactive construction of formal proofs in an NK-like calculus working under the guidance of a theorem prover ready to provide assistance when needed (Portoraro 1994, Suppes 1981). Other, research-oriented, theorem provers true to the spirit of NK exist (Pelletier 1998) but are rare.
The name of the matrix connection method (Bibel 1981) is indicative of the way it operates. The term “matrix” refers to the form in which the set of logic formulas expressing the problem is represented; the term “connection” refers to the way the method operates on these formulas. To illustrate the method at work, we will use an example from propositional logic and show that \(R\) is entailed by \(P \vee Q, P \supset R\) and \(Q \supset R\). This is done by establishing that the formula
\[(P \vee Q) \amp (P \supset R) \amp (Q \supset R) \amp {\sim}R\] is unsatisfiable. To do this, we begin by transforming it into conjunctive normal form:
\[(P \vee Q) \amp ({\sim}P \vee R) \amp ({\sim}Q \vee R) \amp {\sim}R\] This formula is then represented as a matrix, one conjunct per row and, within a row, one disjunct per column:
| \(P\) | \( Q\) |
| \({\sim}P\) | \(R\) |
| \({\sim}Q\) | \(R\) |
| \({\sim}R\) |
The idea now is to explore all the possible vertical paths running through this matrix. A vertical path is a set of literals, one selected from each row of the matrix. The vertical paths:
| Path 1 | \(P, {\sim}P, {\sim}Q\) and \({\sim}R\) |
| Path 2 | \(P, {\sim}P, R\) and \({\sim}R\) |
| Path 3 | \(P, R, {\sim}Q\) and \({\sim}R\) |
| Path 4 | \(P, R, R\) and \({\sim}R\) |
| Path 5 | \(Q, {\sim}P, {\sim}Q\) and \({\sim}R\) |
| Path 6 | \(Q, {\sim}P, R\) and \({\sim}R\) |
| Path 7 | \(Q, R, {\sim}Q\) and \({\sim}R\) |
| Path 8 | \(Q, R, R\) and \({\sim}R\) |
A path is complementary if it contains two literals which are complementary. For example, Path 2 is complementary since it has both \(P\) and \({\sim}P\), but so is Path 6 since it contains both \(R\) and \({\sim}R\). Note that as soon as a partial path includes two complementary literals there is no point in pursuing any of its extensions, since they are all bound to be complementary as well. This typically allows for a large reduction in the number of paths to be inspected. In any event, all the paths in the above matrix are complementary, and this fact establishes the unsatisfiability of the original formula. This is the essence of the matrix connection method. The method can be extended to predicate logic, but this demands additional logical apparatus: Skolemization, variable renaming, quantifier duplication, complementarity of paths via unification, and simultaneous substitution across all matrix paths (Bibel 1981, Andrews 1981). Variations of the method have been implemented in reasoning programs in higher-order logic (Andrews 1981) and non-classical logics (Wallen 1990).
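The propositional core of the path-checking procedure is easy to sketch in code. The following Python fragment is our own illustration (the string encoding of literals is an assumption of the example): literals are strings with negation marked by `~`, and the matrix is given as a list of rows of the conjunctive normal form.

```python
from itertools import product

def complementary(path):
    """A path is complementary if it contains some literal and its negation."""
    lits = set(path)
    return any(('~' + l) in lits for l in lits if not l.startswith('~'))

def unsatisfiable(matrix):
    """The formula is unsatisfiable iff every vertical path is complementary."""
    return all(complementary(path) for path in product(*matrix))

# The matrix for (P v Q) & (~P v R) & (~Q v R) & ~R:
matrix = [['P', 'Q'], ['~P', 'R'], ['~Q', 'R'], ['~R']]
print(unsatisfiable(matrix))  # prints True: R follows from the assumptions
```

Note that this naive sketch enumerates all eight paths; an implementation closer to the method would abandon a partial path as soon as it becomes complementary.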
Equality is an important logical relation whose behavior within automated deduction deserves its own separate treatment. Equational logic and, more generally, term rewriting treat equations as rewrite rules, also known as reduction or demodulation rules. An equality statement like \(f(a) = a\) allows the simplification of a term like \(g(c,f(a))\) to \(g(c,a)\). However, the same equation also has the potential to generate an unboundedly large term: \(g(c,f(a)), g(c,f(f(a))), g(c,f(f(f(a)))),\ldots\). What distinguishes term rewriting from equational logic is that in term rewriting equations are used as unidirectional reduction rules, whereas equality works in both directions. Rewrite rules have the form \(t_1 \Rightarrow t_2\) and the basic idea is to look for terms \(t\) occurring in expressions \(e\) such that \(t\) unifies with \(t_1\) with unifier \(\theta\) so that the occurrence \(t_1\theta\) in \(e\theta\) can be replaced by \(t_2\theta\). For example, the rewrite rule \(x + 0 \Rightarrow x\) allows the rewriting of succ(succ(0) \(+ 0)\) as succ(succ(0)).
To illustrate the main ideas in term rewriting, let us explore an example involving symbolic differentiation (the example and ensuing discussion are adapted from Chapter 1 of Baader & Nipkow 1998). Let der denote the derivative with respect to \(x\), let \(y\) be a variable different from \(x\), and let \(u\) and \(v\) be variables ranging over expressions. We define the rewrite system:
\[\begin{align}\tag{R1} \der(x) & \Rightarrow 1\\\tag{R2} \der(y) & \Rightarrow 0\\\tag{R3} \der(u+v) & \Rightarrow \der(u) + \der(v)\\\tag{R4} \der( u \times v) & \Rightarrow (u \times \der(v)) + (\der(u) \times v)\end{align}\] Again, the symbol \(\Rightarrow\) indicates that a term matching the left-hand side of a rewrite rule should be replaced by the rule’s right-hand side. To see the differentiation system at work, let us compute the derivative of \(x \times x\) with respect to \(x\), \(\der(x \times x)\):
| \(\der(x \times x)\) | \(\Rightarrow\) | \((x \times \der(x)) + (\der(x) \times x)\) | by R4 | |
| \(\Rightarrow\) | \((x \times 1) + (\der(x) \times x)\) | by R1 | ||
| \(\Rightarrow\) | \((x \times 1) + (1 \times x)\) | by R1 |
At this point, since none of the rules (R1)–(R4) applies, no further reduction is possible and the rewriting process ends. The final expression obtained is called a normal form, and its existence motivates the following question: Is there an expression whose reduction process will never terminate when applying the rules (R1)–(R4)? Or, more generally: Under what conditions will a set of rewrite rules always stop, for any given expression, at a normal form after finitely many applications of the rules? This fundamental question is called the termination problem of a rewrite system, and we state without proof that the system (R1)–(R4) meets the termination condition.
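The reduction just performed can be reproduced with a small rewriting engine. The following Python sketch is our own encoding, not part of the source: terms are nested tuples, and the outermost-leftmost reduction order is an arbitrary choice among the orders the system permits.

```python
# A tiny term-rewriting engine for the differentiation system (R1)-(R4).
# Terms are tuples: ('der', t), ('+', t1, t2), ('*', t1, t2); the variables
# are the strings 'x' and 'y', and constants are numbers.

def rewrite_step(t):
    """Apply the first rule that matches, outermost-leftmost; None if none applies."""
    if isinstance(t, tuple):
        if t[0] == 'der':
            u = t[1]
            if u == 'x':                              # R1: der(x) => 1
                return 1
            if u == 'y':                              # R2: der(y) => 0
                return 0
            if isinstance(u, tuple) and u[0] == '+':  # R3
                return ('+', ('der', u[1]), ('der', u[2]))
            if isinstance(u, tuple) and u[0] == '*':  # R4
                return ('+', ('*', u[1], ('der', u[2])),
                             ('*', ('der', u[1]), u[2]))
        # Otherwise, try to rewrite a subterm, left to right.
        for i in range(1, len(t)):
            s = rewrite_step(t[i])
            if s is not None:
                return t[:i] + (s,) + t[i+1:]
    return None

def normal_form(t):
    """Rewrite until no rule applies; this terminates for (R1)-(R4)."""
    while True:
        s = rewrite_step(t)
        if s is None:
            return t
        t = s

print(normal_form(('der', ('*', 'x', 'x'))))
# ('+', ('*', 'x', 1), ('*', 1, 'x'))   i.e. (x * 1) + (1 * x), as in the text
```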
There is the possibility that, when reducing an expression, the rules of a rewrite system could be applied in more than one way. This is actually the case in the system (R1)–(R4), where in the reduction of \(\der(x \times x)\) we could have applied R1 first to the second sub-expression in \((x \times \der(x)) + (\der(x) \times x)\), as shown below:
| \(\der(x \times x)\) | \(\Rightarrow\) | \((x \times \der(x)) + (\der(x) \times x)\) | by R4 | |
| \(\Rightarrow\) | \((x \times \der(x)) + (1 \times x)\) | by R1 | ||
| \(\Rightarrow\) | \((x \times 1) + (1 \times x)\) | by R1 |
Following this alternative course of action, the reduction terminates with the same normal form as in the previous case. This fact, however, should not be taken for granted: A rewriting system is said to be (globally) confluent if and only if, independently of the order in which its rules are applied, every expression always ends up being reduced to its one and only normal form. It can be shown that (R1)–(R4) is confluent and, hence, we are entitled to say: “Compute the derivative of an expression” (as opposed to simply “a” derivative). Adding more rules to a system in an effort to make it more practical can have undesired consequences. For example, if we add the rule
\[\tag{R5} u+0 \Rightarrow u\] to (R1)–(R4), then we will be able to further reduce certain expressions but at the price of losing confluency. The following reductions show that \(\der(x + 0)\) now has two normal forms: the computation
| \(\der(x + 0)\) | \(\Rightarrow\) | \(\der(x) + \der(0)\) | by R3 | |
| \(\Rightarrow\) | \(1 + \der(0)\) | by R1 |
gives one normal form, and
| \(\der(x + 0)\) | \(\Rightarrow\) | \(\der(x)\) | by R5 | |
| \(\Rightarrow\) | 1 | by R1 |
gives another. Adding the rule
\[\tag{R6} \der(0) \Rightarrow 0\] would allow the further reduction of \(1 + \der(0)\) to \(1 + 0\) and then, by R5, to 1. Although the presence of this new rule actually increases the number of alternative reduction paths—\(\der(x + 0)\) can now be reduced in four possible ways—they all end up with the same normal form, namely 1. This is no coincidence, as it can be shown that (R6) actually restores confluency. This motivates another fundamental question: Under what conditions can a non-confluent system be made into an equivalent confluent one? The Knuth-Bendix completion algorithm (Knuth & Bendix 1970) gives a partial answer to this question.
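The loss and recovery of confluency can be checked mechanically by enumerating every normal form reachable from a term. In this Python sketch, which is our own illustration (the rule encodings are assumptions of the example), \(\der(x+0)\) has two normal forms under R1, R3 and R5, but only one once R6 is added:

```python
# Enumerate all normal forms of a term under a set of rewrite rules.
# Terms are nested tuples; each rule is a function returning a reduct or None.

def redexes(t, rules):
    """Yield every term reachable from t in one rewrite step, anywhere in t."""
    for rule in rules:
        s = rule(t)
        if s is not None:
            yield s
    if isinstance(t, tuple):
        for i in range(1, len(t)):
            for s in redexes(t[i], rules):
                yield t[:i] + (s,) + t[i+1:]

def normal_forms(t, rules):
    """The set of normal forms reachable from t (assumes termination)."""
    successors = list(redexes(t, rules))
    if not successors:
        return {t}
    return set().union(*(normal_forms(s, rules) for s in successors))

R1 = lambda t: 1 if t == ('der', 'x') else None
R3 = lambda t: ('+', ('der', t[1][1]), ('der', t[1][2])) \
    if isinstance(t, tuple) and t[0] == 'der' \
    and isinstance(t[1], tuple) and t[1][0] == '+' else None
R5 = lambda t: t[1] if isinstance(t, tuple) and t[0] == '+' and t[2] == 0 else None
R6 = lambda t: 0 if t == ('der', 0) else None

term = ('der', ('+', 'x', 0))                  # der(x + 0)
print(len(normal_forms(term, [R1, R3, R5])))   # 2: confluency lost
print(normal_forms(term, [R1, R3, R5, R6]))    # {1}: confluency restored
```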
Term rewriting, like any other automated deduction method, needs strategies to direct its application. Rippling (Bundy, Stevens & Harmelen 1993, Basin & Walsh 1996) is a heuristic with origins in inductive theorem proving that uses annotations to selectively restrict the rewriting process. The superposition calculus is a calculus of equational first-order logic that combines notions from first-order resolution and Knuth-Bendix ordered completion. Superposition is refutation complete (Bachmair & Ganzinger 1994) and is at the heart of a number of theorem provers, most notably the E equational theorem prover (Schulz 2004) and Vampire (Voronkov 1995). Superposition has been extended to higher-order logic (Bentkamp et al. 2021).
Mathematical induction is a very important technique of theorem proving in mathematics and computer science. Problems stated in terms of objects or structures that involve recursive definitions or some form of repetition invariably require mathematical induction for their solving. In particular, reasoning about the correctness of computer systems requires induction, and an automated reasoning program that effectively implements induction will have important applications.
To illustrate the need for mathematical induction, assume that a property \(\phi\) is true of the number zero and also that, if true of a number, then it is true of its successor. Then, with our deductive systems, we can deduce that for any given number \(n\), \(\phi\) is true of it, \(\phi(n)\). But we cannot deduce that \(\phi\) is true of all numbers, \(\forall x\phi(x)\); this inference step requires the rule of mathematical induction:
\[\tag{mathematical induction}\frac{\alpha(0)\quad \quad [\alpha(n)\text{—}\alpha(\textit{succ}(n))]}{\forall x \alpha(x)}\] In other words, to prove that \(\forall x\alpha(x)\) one proves that \(\alpha(0)\) is the case, and that \(\alpha(\textit{succ}(n))\) follows from the assumption that \(\alpha(n)\). The implementation of induction in a reasoning system presents very challenging search control problems. The most important of these is the ability to determine the particular way in which induction will be applied during the proof, that is, finding the appropriate induction schema. Related issues include selecting the proper variable of induction, and recognizing all the possible cases for the base and the inductive steps.
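As a standard illustration (not taken from the source), consider proving \(\forall x\,(0 + x = x)\) from the usual recursive definition of addition, \(m + 0 = m\) and \(m + \textit{succ}(n) = \textit{succ}(m+n)\), taking \(\alpha(x)\) to be \(0 + x = x\):

\[\begin{align}
&\text{Base: } \alpha(0), \text{ that is } 0 + 0 = 0, \text{ is an instance of } m + 0 = m.\\
&\text{Step: assume } \alpha(n), \text{ that is } 0 + n = n. \text{ Then } 0 + \textit{succ}(n) = \textit{succ}(0 + n) = \textit{succ}(n).\\
&\text{Hence } \forall x\,(0 + x = x) \text{ by the rule of mathematical induction.}
\end{align}\]

Even in this tiny example the prover must choose the induction variable and the schema (here, induction on the second argument of \(+\), matching the recursive definition); choosing the first argument instead would leave the step unprovable from the defining equations alone.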
The Boyer-Moore theorem prover (Boyer & Moore 1979) has been a most successful implementation of automated inductive theorem proving. In the spirit of Gentzen, Boyer and Moore were interested in how people prove theorems by mathematical induction. Their theorem prover is written in the functional programming language Lisp, which is also the language in which theorems are represented. For instance, to express the commutativity of addition the user would enter the Lisp expression (EQUAL (PLUS X Y) (PLUS Y X)). Everything defined in the system is a functional term, including its basic “predicates”: T, F, EQUAL X Y, AND X Y, NOT X, IF X Y Z, etc. Proofs are conducted by rewriting terms that possess recursive definitions, ultimately reducing the conclusion’s statement to the T predicate. The Boyer-Moore theorem prover has been used to check the proofs of some quite deep theorems (Boyer, Kaufmann & Moore 1995). Lemma caching, problem statement generalization, and proof planning are techniques particularly useful in inductive theorem proving (Bundy, Harmelen & Hesketh 1991).
ACL2 (Kaufmann & Moore 1996) is a successor of, and supersedes, the Boyer-Moore prover and is intended for large-scale hardware and software verification projects. Models of computer hardware and software can be formally represented as “state machines” consisting of “states” as mathematical objects and “transitions” as functions or relations on state objects. ACL2 uses an extension of an applicative subset of Common Lisp as the natural language in which to express such notions and where models and their properties can be derived logically (Boyer & Moore 1997). ACL2 supports the user in the building of proofs but its primary role is to prevent logical mistakes. In a quasi-“black box” fashion, “to be a good user of ACL2 you do not have to understand how the theorem prover works. You just have to understand how to interact with it” (Kaufmann & Moore 2024 [Other Internet Resources]) but, if desired, the user can easily command ACL2 to supply its full proof output for inspection. The authors of ACL2 share the interesting insight that ACL2 users rarely read successful proofs and tend to concentrate on subgoals in failed proofs; they then try to figure out how these goals could be proved and provide ACL2 with guidance to do so. Although ACL2 has been largely applied to the formal verification of microprocessor design (see the section 4.4 Formal Verification of Hardware), ACL2 has found many applications in other areas, of which we mention three here: the formalization of the Normalization Theorem in algebraic simplicial topology (Lambán, Martín-Mateos, Rubio & Ruiz-Reina 2012), the formal verification of an implementation of Buchberger’s algorithm for computing Gröbner bases of polynomial ideals (Medina-Bulo, Palomo-Lozano & Ruiz-Reina 2010), and a formalization of finite group theory up to an application of the Sylow theorems (Russinoff 2023).
Higher-order logic differs from first-order logic in that quantification over functions and predicates is allowed. The statement “Any two people are related to each other in one way or another” can be legally expressed in higher-order logic as \(\forall x\forall y\exists R\, R(x,y)\) but not in first-order logic. Higher-order logic is inherently more expressive than first-order logic and is closer in spirit to actual mathematical reasoning. For example, the notion of set finiteness cannot be expressed as a first-order concept. Due to its richer expressiveness, it should not come as a surprise that implementing an automated theorem prover for higher-order logic is more challenging than for first-order logic. This is largely due to the fact that unification in higher-order logic is more complex than in the first-order case: unifiable terms do not always possess a most general unifier, and higher-order unification is itself undecidable. Finally, given that higher-order logic is incomplete, there are always proofs that will be entirely out of reach for any automated reasoning program.
Methods used to automate first-order deduction can be adapted to higher-order logic. TPS (Andrews et al. 1996, Andrews et al. 2006) is a theorem proving system for higher-order logic that uses Church’s typed \(\lambda\)-calculus as its logical representation language and is based on a connection-type deduction mechanism that incorporates Huet’s unification algorithm (Huet 1975). As a sample of the capabilities of TPS, the program has proved automatically that a subset of a finite set is finite, the equivalence among several formulations of the Axiom of Choice, and Cantor’s Theorem that a set has more subsets than members. The latter was proved by the program by asserting that there is no onto function from individuals to sets of individuals, with the proof proceeding by a diagonal argument. HOL (Gordon & Melham 1993) is another higher-order proof development system primarily used as an aid in the development of hardware and software safety-critical systems. HOL is based on the LCF approach to interactive theorem proving (Gordon, Milner & Wadsworth 1979), and it is built on the strongly typed functional programming language ML. HOL, like TPS, can operate in automatic and interactive mode. Availability of the latter mode is welcomed since the most useful automated reasoning systems may well be those which place an emphasis on interactive theorem proving (Farmer, Guttman & Thayer 1993) and can be used as assistants operating under human guidance. (Harrison 2000) discusses the verification of floating-point algorithms and the non-trivial mathematical properties that are proved by HOL Light under the guidance of the user. Isabelle (Paulson 1994) is a generic, higher-order framework for rapid prototyping of deductive systems. Object logics can be formulated within Isabelle’s metalogic by using its many syntactic and deductive tools.
Isabelle also provides some ready-made theorem proving environments, including Isabelle/HOL, Isabelle/ZF and Isabelle/FOL, which can be used as starting points for applications and further development by the user (Paulson 1990, Nipkow & Paulson 2002). Isabelle/ZF has been used to prove equivalent formulations of the Axiom of Choice, formulations of the Well-Ordering Principle, as well as the key result about cardinal arithmetic that, for any infinite cardinal \(\kappa\), \(\kappa \cdot \kappa = \kappa\) (Paulson & Grabczewski 1996).
To help prove higher-order theorems and discharge goals arising in interactive proofs, the user can ask Isabelle/HOL to invoke external first-order provers through Sledgehammer (Paulson 2010), a subsystem aimed at combining the complementary capabilities of automated reasoning systems of different types, including SMT solvers (see the section on SAT Solvers; Blanchette et al. 2013). LEO-II (Benzmüller et al. 2015) is also a resolution-based automated theorem prover for higher-order logic that has been applied to a wide array of problems, most notably in the automation of Gödel’s ontological proof of God’s existence (see Section 4.6 Logic and Philosophy). LEO-II has been superseded by Leo-III, which implements a higher-order ordered paramodulation calculus that operates within a multi-agent blackboard architecture for parallel proof search; the architecture allows agents using Leo-III’s native proof calculus to run independently as well as, in a cooperative fashion, agents for external, specialized, first- and higher-order theorem provers and model finders (Benzmüller, Steen & Wisniewski 2017, Steen and Benzmüller 2021).
Non-classical logics (Haack 1978) such as modal logics, intuitionistic logic, multi-valued logics, autoepistemic logics, non-monotonic reasoning, commonsense and default reasoning, relevance logic, paraconsistent logic, and so on, have been increasingly gaining the attention of the automated reasoning community. One of the reasons has been the natural desire to extend automated deduction techniques to new domains of logic. Another reason has been the need to mechanize non-classical logics as an attempt to provide a suitable foundation for artificial intelligence. A third reason has been the desire to attack some problems that are combinatorially too large to be handled by paper and pencil. Indeed, some of the work in automated non-classical logic provides a prime example of automated reasoning programs at work. To illustrate, the Ackermann Constant Problem asks for the number of non-equivalent formulas in the relevance logic R. There are actually 3,088 such formulas (Slaney 1984) and the number was found by “sandwiching” it between a lower and an upper limit, a task that involved constraining a vast universe of \(20^{400}\) 20-element models in search of those models that rejected non-theorems in R. It is safe to say that this result would have been impossible to obtain without the assistance of an automated reasoning program.
There have been three basic approaches to automating the solving of problems in non-classical logic (McRobie 1991). One approach has been, of course, to try to mechanize the non-classical deductive calculi. Another has been to simply provide an equivalent formulation of the problem in first-order logic and let a classical theorem prover handle it. A third approach has been to formulate the semantics of the non-classical logic in a first-order framework where resolution or connection-matrix methods would apply. (Pelletier et al. 2017) describes an automated reasoning system for a paraconsistent logic that takes both “indirect” approaches, the translational and the truth-value approach, to prove its theorems.
Modal logics find extensive use in computing science as logics of knowledge and belief, logics of programs, and in the specification of distributed and concurrent systems. Thus, a program that automates reasoning in a modal logic such as K, K4, T, S4, or S5 would have important applications. With the exception of S5, these logics share some of the important metatheoretical results of classical logic, such as cut-elimination, and hence cut-free (modal) sequent calculi can be provided for them, along with techniques for their automation. Connection methods (Andrews 1981, Bibel 1981) have played an important role in helping to understand the source of redundancies in the search space induced by these modal sequent calculi and have provided a unifying framework not only for modal logics but for intuitionistic and classical logic as well (Wallen 1990). Current efforts to automate modal logic reasoning revolve around the translational approach mentioned above, namely to embed modal logic into classical logic and then use an existing automated reasoning system for the latter to prove theorems of the former. (Benzmüller & Paulson 2013) shows how to embed quantified modal logic into simple type theory, proves the soundness and completeness of the embedding, and demonstrates with simple experiments how existing higher-order theorem provers can be used to automate proofs in modal logic. The approach can be extended to higher-order modal logic as well (Benzmüller & Paleo 2015). As a matter of fact, embeddings in classical higher-order logic can be used as a means for universal reasoning (Benzmüller 2019); that is, the embedding provides a universal logical reasoning framework that uses classical higher-order logic as a metalogic in which various other classical and non-classical logics can be represented. Further evidence for the claim of universality is provided by the semantical embedding approach to free logic and category theory (Benzmüller & Scott 2020).
Embedding yields a number of practical benefits to automated deduction: universality, uniformity, expressiveness of notation and reasoning, and the ready availability of existing powerful automated theorem-proving tools that are known to be sound.
There are different ways in which intuitionistic logic can be automated. One is to directly implement the intuitionistic versions of Gentzen’s sequent and natural deduction calculi, LJ and NJ respectively. This approach inherits the stronger normalization results enjoyed by these calculi, allowing for a more compact mechanization than their classical counterparts. Another approach to mechanizing intuitionistic logic is to exploit its semantic similarities with the modal logic S4 and piggyback on an automated implementation of S4. Automating intuitionistic logic has applications in software development since writing a program that meets a specification corresponds to the problem of proving the specification within an intuitionistic logic (Martin-Löf 1982). A system that automated the proof construction process would have important applications in algorithm design but also in constructive mathematics. Nuprl (Constable et al. 1986) is a computer system supporting a particular mathematical theory, namely constructive type theory, whose aim is to provide assistance in the proof development process. The focus is on logic-based tools to support programming and on implementing formal computational mathematics. Over the years the scope of the Nuprl project has expanded from “proofs-as-programs” to “systems-as-theories”. Similar in spirit and based on the Curry-Howard isomorphism, the Coq system formalizes its proofs in the Calculus of Inductive Constructions, a \(\lambda\)-calculus with a rich system of types including dependent types (Coquand & Huet 1988, Coquand & Paulin-Mohring 1988). Like Nuprl, Coq is designed to assist in the development of mathematical proofs as well as computer programs from their formal specifications.
Logic programming, particularly represented by the language Prolog (Colmerauer et al. 1973), is probably the most important and widespread application of automated theorem proving. During the early 1970s, it was discovered that logic could be used as a programming language (Kowalski 1974). What distinguishes logic programming from other traditional forms of programming is that logic programs, in order to solve a problem, do not explicitly state how a specific computation is to be performed; instead, a logic program states what the problem is and then delegates the task of actually solving it to an underlying theorem prover. In Prolog, the theorem prover is based on a refinement of resolution known as SLD-resolution. SLD-resolution is a variation of linear input resolution that incorporates a special rule for selecting the next literal to be resolved upon; SLD-resolution also takes into consideration the fact that, in the computer’s memory, the literals in a clause are actually ordered, that is, they form a sequence as opposed to a set. A Prolog program consists of clauses stating known facts and rules. For example, the following clauses make some assertions about flight connections:
flight(toronto, london).
flight(london, rome).
flight(chicago, london).
flight\((X, Y)\) :– flight\((X, Z)\) , flight\((Z, Y)\).
The clause flight(toronto, london) is a fact while flight\((X, Y)\) :– flight\((X, Z)\), flight\((Z, Y)\) is a rule, written by convention as a reversed conditional (the symbol “:–” means “if”; the comma means “and”; terms starting in uppercase are variables). The former states that there is a flight connection between Toronto and London; the latter states that there is a flight between cities \(X\) and \(Y\) if, for some city \(Z\), there is a flight between \(X\) and \(Z\) and one between \(Z\) and \(Y\). Clauses in Prolog programs are a special type of Horn clauses having precisely one positive literal: Facts are program clauses with no negative literals while rules have at least one negative literal. (Note that in standard clause notation the program rule in the previous example would be written as flight\((X,Y) \vee{\sim}\)flight\((X,Z) \vee{\sim}\)flight\((Z,Y)\).) The specific form of the program rules effectively expresses statements of the form: “If these conditions over here are jointly met then this other fact will follow”. Finally, a goal is a Horn clause with no positive literals. The idea is that, once a Prolog program \(\Pi\) has been written, we can then try to determine if a new clause \(\gamma\), the goal, is entailed by \(\Pi\), \(\Pi \vDash \gamma\); the Prolog prover does this by attempting to derive a contradiction from \(\Pi \cup \{{\sim}\gamma\}\). We should remark that program facts and rules alone cannot produce a contradiction; a goal must enter into the process. Like input resolution, SLD-resolution is not refutation complete for first-order logic but it is complete for the Horn logic of Prolog programs. The fundamental theorem: If \(\Pi\) is a Prolog program and \(\gamma\) is the goal clause then \(\Pi \vDash \gamma\) iff \(\Pi \cup \{{\sim}\gamma\} \vdash [\,]\) by SLD-resolution (Lloyd 1984).
For instance, to find out if there is a flight from Toronto to Rome one asks the Prolog prover to see if the clause flight(toronto, rome) follows from the given program. To do this, the prover adds \({\sim}\)flight(toronto, rome) to the program clauses and attempts to derive the empty clause, \([\,]\), by SLD-resolution:
| 1 | flight(toronto, london) | Program clause |
| 2 | flight(london, rome) | Program clause |
| 3 | flight(chicago, london) | Program clause |
| 4 | flight\((X,Y) \vee{\sim}\)flight\((X,Z)\vee{\sim}\)flight\((Z,Y)\) | Program clause |
| 5 | \({\sim}\)flight(toronto, rome) | Negated conclusion |
| 6 | \({\sim}\)flight(toronto,\(Z) \vee{\sim}\)flight\((Z\),rome) | Res 5 4 |
| 7 | \({\sim}\)flight(london, rome) | Res 6 1 |
| 8 | \([ \,]\) | Res 7 2 |
The conditional form of rules in Prolog programs adds to their readability and also allows reasoning about the underlying refutations in a more friendly way: To prove that there is a flight between Toronto and Rome, flight(toronto, rome), unify this clause with the consequent flight\((X,Y)\) of the fourth clause in the program, which itself becomes provable if both flight(toronto,\(Z)\) and flight\((Z\),rome) can be proved. This can be seen to be the case under the substitution \(\{Z \leftarrow\) london\(\}\) since both flight(toronto, london) and flight(london, rome) are themselves provable. Note that the theorem prover not only establishes that there is a flight between Toronto and Rome but can also come up with an actual itinerary, Toronto-London-Rome, by extracting it from the unifications used in the proof.
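The backward-chaining search just described can be sketched in a few lines of Python. This is our own toy illustration of SLD-resolution over the flight program, not how a real Prolog engine is implemented; the clause encoding, the variable-renaming scheme, and the depth bound (which guards against the rule's left recursion) are all assumptions of the sketch.

```python
# Minimal sketch of Prolog-style SLD-resolution (backward chaining) over the
# flight program from the text.  Variables are strings starting in uppercase;
# this toy prover is an illustration only, not a real Prolog implementation.

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def walk(t, s):
    # Follow variable bindings in substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    return None  # two distinct constants

def rename(lit, n):
    # Give rule variables fresh names per invocation.
    pred, args = lit
    return (pred, tuple(a + '#' + str(n) if is_var(a) else a for a in args))

# The program: three facts and one rule, as in the text.
FACTS = [('flight', ('toronto', 'london')),
         ('flight', ('london', 'rome')),
         ('flight', ('chicago', 'london'))]
RULE = (('flight', ('X', 'Y')), [('flight', ('X', 'Z')), ('flight', ('Z', 'Y'))])

def solve(goals, s, depth=0):
    # The depth bound guards against the rule's left recursion.
    if depth > 8:
        return
    if not goals:
        yield s
        return
    (pred, args), rest = goals[0], goals[1:]
    for fp, fargs in FACTS:               # try facts first, as Prolog does
        if fp == pred:
            s2 = s
            for x, y in zip(args, fargs):
                s2 = s2 if s2 is None else unify(x, y, s2)
            if s2 is not None:
                yield from solve(rest, s2, depth + 1)
    head, body = RULE                     # then the rule, with fresh variables
    hp, hargs = rename(head, depth)
    if hp == pred:
        s2 = s
        for x, y in zip(args, hargs):
            s2 = s2 if s2 is None else unify(x, y, s2)
        if s2 is not None:
            yield from solve([rename(b, depth) for b in body] + rest, s2, depth + 1)

# Query: is there a flight from Toronto to Rome?
found = next(solve([('flight', ('toronto', 'rome'))], {}), None)
print(found is not None)  # True: the goal succeeds via Z = london
```

The substitution returned by the prover contains the binding of the rule's \(Z\) to london, which is exactly how a Prolog system recovers the itinerary Toronto-London-Rome from the refutation.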
There are at least two broad problems that Prolog must address in order to achieve the ideal of a logic programming language. Logic programs consist of facts and rules describing what is true; anything that is not provable from a program is deemed to be false. In regard to our previous example, flight(toronto, boston) is not true since this literal cannot be deduced from the program. The identification of falsity with non-provability is further exploited in most Prolog implementations by incorporating an operator, not, that allows programmers to explicitly express the negation of literals (or even subclauses) within a program. By definition, not \(l\) succeeds if the literal \(l\) itself fails to be deduced. This mechanism, known as negation-by-failure, has been the target of criticism. Negation-by-failure does not fully capture the standard notion of negation and there are significant logical differences between the two. Standard logic, including Horn logic, is monotonic, which means that enlarging an axiom set by adding new axioms simply enlarges the set of theorems derivable from it; negation-by-failure, however, is non-monotonic, and the addition of new program clauses to an existing Prolog program may cause some goals to cease to be theorems. A second issue is the control problem. Currently, programmers need to provide a fair amount of control information if a program is to achieve acceptable levels of efficiency. For example, a programmer must be careful with the order in which the clauses are listed within a program, or how the literals are ordered within a clause. Failure to do a proper job can result in an inefficient or, worse, non-terminating program. Programmers must also embed hints within the program clauses to prevent the prover from revisiting certain paths in the search space (by using the cut operator) or to prune them altogether (by using fail).
Last but not least, in order to improve their efficiency, many implementations of Prolog do not implement unification fully and bypass a time-consuming yet critical test—the so-called occurs-check—responsible for checking the suitability of the unifiers being computed. This results in an unsound calculus and may cause a goal to be entailed by a Prolog program (from a computational point of view) when in fact it should not be (from a logical point of view).
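The role of the occurs-check can be made concrete with a small sketch. The term representation and unifier below are our own illustration, assuming variables are uppercase strings and compound terms are tuples; the point is only to show the cyclic binding that is admitted when the check is skipped.

```python
# Sketch of unification with and without the occurs-check.  Terms are either
# variable strings ('X'), constants ('a'), or tuples like ('f', 'X'); the
# representation is ours, for illustration only.

def occurs(v, t, s):
    # Does variable v occur inside term t (under substitution s)?
    t = s.get(t, t) if isinstance(t, str) else t
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, s) for arg in t[1:])
    return False

def unify(a, b, s, occurs_check=True):
    if isinstance(a, str) and a in s: a = s[a]
    if isinstance(b, str) and b in s: b = s[b]
    if a == b:
        return s
    if isinstance(a, str) and a[0].isupper():
        if occurs_check and occurs(a, b, s):
            return None                     # X cannot soundly unify with f(X)
        return {**s, a: b}
    if isinstance(b, str) and b[0].isupper():
        return unify(b, a, s, occurs_check)
    if isinstance(a, tuple) and isinstance(b, tuple) \
            and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s, occurs_check)
            if s is None:
                return None
        return s
    return None

# Unifying X with f(X): with the occurs-check the attempt correctly fails;
# without it, a cyclic and logically unsound binding is produced.
print(unify('X', ('f', 'X'), {}))                      # None
print(unify('X', ('f', 'X'), {}, occurs_check=False))  # {'X': ('f', 'X')}
```

Skipping the check saves a traversal of the term on every variable binding, which is why most Prolog systems omit it by default; the price, as the text notes, is soundness.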
There are variations of Prolog intended to extend its scope. By implementing a model elimination procedure, the Prolog Technology Theorem Prover (PTTP) (Stickel 1992) extends Prolog into full first-order logic. The implementation achieves both soundness and completeness. Moving beyond first-order logic, \(\lambda\)Prolog (Miller & Nadathur 1988) bases the language on higher-order constructive logic.
The problem of determining the satisfiability of logic formulas has received much attention by the automated reasoning community due to its important applicability in industry. A propositional formula is satisfiable if there is an assignment of truth-values to its variables that makes the formula true. For example, the assignment \((P \leftarrow\) true, \(Q \leftarrow\) true, \(R \leftarrow\) false) does not make \((P \vee R) \amp{\sim}Q\) true but \((P \leftarrow\) true, \(Q \leftarrow\) false, \(R \leftarrow\) false) does and, hence, the formula is satisfiable. Determining whether a formula is satisfiable or not is called the Boolean Satisfiability Problem—\(\mathbf{SAT}\) for short—and for a formula with \(n\) variables SAT can be settled thus: Inspect each of the \(2^n\) possible assignments to see if there is at least one assignment that satisfies the formula, i.e. makes it true. This method is clearly complete: If the original formula is satisfiable then we will eventually find one such satisfying assignment; but if the formula is contradictory (i.e. non-satisfiable), we will be able to determine this too. Just as clearly, and particularly in this latter case, this search takes an exponential amount of time, and the desire to conceive more efficient algorithms is well justified, particularly because many computing problems of great practical importance such as graph-theoretic problems, network design, storage and retrieval, scheduling, program optimization, and many others (Garey & Johnson 1979) can be expressed as SAT instances, i.e. as the SAT question of some propositional formula representing the problem. Given that SAT is NP-complete (Cook 1971) it is very unlikely that a polynomial algorithm exists for it; however, this does not preclude the existence of sufficiently efficient algorithms for particular cases of SAT problems.
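The exhaustive method can be shown on the text's own example formula \((P \vee R) \amp{\sim}Q\). The Python encoding below is our illustration: one Boolean per variable, and all \(2^3\) assignments inspected.

```python
from itertools import product

# Brute-force SAT check for the formula (P v R) & ~Q from the text,
# inspecting all 2^n truth-value assignments.

def formula(P, Q, R):
    return (P or R) and not Q

sat_assignments = [dict(zip('PQR', vals))
                   for vals in product([True, False], repeat=3)
                   if formula(*vals)]

print(len(sat_assignments) > 0)   # True: the formula is satisfiable
# The text's witness (P=true, Q=false, R=false) is among the models:
print({'P': True, 'Q': False, 'R': False} in sat_assignments)
```

The same enumeration also decides unsatisfiability: if the list came back empty, the formula would be contradictory. The exponential cost of this completeness is exactly what motivates the DPLL-style procedures discussed next.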
The Davis-Putnam-Logemann-Loveland \((\mathbf{DPLL})\) algorithm was one of the first SAT search algorithms (Davis & Putnam 1960; Davis, Logemann & Loveland 1962) and is still considered one of the best complete SAT solvers; many of the complete SAT procedures in existence today can be considered optimizations and generalizations of DPLL. In essence, DPLL search procedures proceed by considering ways in which assignments can be chosen to make the original formula true. For example, consider the formula in CNF \[P \amp{\sim}Q \amp({\sim}P \vee Q \vee R) \amp(P \vee{\sim}S)\] Since \(P\) is a conjunct, but also a unit clause, \(P\) must be true if the entire formula is to be true. Moreover, the value of \({\sim}P\) does not contribute to the truth of \({\sim}P \vee Q \vee R\), and \(P \vee{\sim}S\) is true regardless of \(S\). Thus, the whole formula reduces to \[{\sim}Q \amp(Q \vee R)\] Similarly, \({\sim}Q\) must be true and the formula further reduces to \[R\] which forces \(R\) to be true. From this process we can recover the assignment \((P \leftarrow\) true, \(Q \leftarrow\) false, \(R \leftarrow\) true, \(S \leftarrow\) false) proving that the original formula is satisfiable. A formula may cause the algorithm to branch; the search through a branch reaches a dead end the moment a clause is deemed false—a conflicting clause—and all variations of the assignment that has been partially constructed up to this point can be discarded. To illustrate:
| 1 | \(R \amp(P \vee Q) \amp({\sim}P \vee Q) \amp({\sim}P\vee{\sim}Q)\) | Given |
| 2 | \((P \vee Q) \amp({\sim}P \vee Q) \amp({\sim}P\vee{\sim}Q)\) | By letting \(R \leftarrow\) true |
| 3 | \(Q \amp{\sim}Q\) | By letting \(P \leftarrow\) true |
| 4 | ? | Conflict: \(Q\) and \({\sim}Q\) cannot both be true |
| 5 | \((P \vee Q) \amp({\sim}P \vee Q) \amp({\sim}P\vee{\sim}Q)\) | Backtrack to (2): \(R \leftarrow\) true still holds |
| 6 | \({\sim}P\) | By letting \(Q \leftarrow\) true |
| 7 | true | By letting \({\sim}P\) be true, i.e.,\(P\leftarrow\) false |
Hence, the formula is satisfiable by the assignment \((P \leftarrow\) false, \(Q \leftarrow\) true, \(R \leftarrow\) true). DPLL algorithms are made more efficient by strategies such as term indexing (ordering the formula variables in an advantageous way), chronological backtracking (undoing work back to a previous branching point if the process leads to a conflicting clause), and conflict-driven learning (determining what information to keep and where to backtrack). The combination of these strategies results in a large pruning of the search space; for a more extensive discussion the interested reader is directed to Zhang & Malik 2002.
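The core of the procedure—unit propagation, splitting, and chronological backtracking via return from a failed branch—fits in a short sketch. The integer literal encoding (negative means negated) and the naive splitting heuristic are our simplifications; a production solver adds the indexing and learning strategies just mentioned.

```python
# A minimal DPLL sketch: unit propagation plus splitting, with chronological
# backtracking realized by recursion.  Clauses are sets of integer literals
# (negative = negated); the encoding is ours, for illustration.

def dpll(clauses, assignment):
    clauses = [set(c) for c in clauses]
    # Unit propagation: a unit clause forces its literal to be true.
    changed = True
    while changed:
        changed = False
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit:
            lit = next(iter(unit))
            assignment = {**assignment, abs(lit): lit > 0}
            new = []
            for c in clauses:
                if lit in c:
                    continue               # clause satisfied, drop it
                if -lit in c:
                    c = c - {-lit}         # literal falsified, shrink clause
                    if not c:
                        return None        # conflicting clause: dead end
                new.append(c)
            clauses, changed = new, True
    if not clauses:
        return assignment                  # every clause satisfied
    lit = next(iter(clauses[0]))           # split on some remaining literal
    for choice in (lit, -lit):
        result = dpll(clauses + [{choice}], assignment)
        if result is not None:
            return result
    return None                            # both branches conflicted

# The example above: R & (P v Q) & (~P v Q) & (~P v ~Q), with P=1, Q=2, R=3.
model = dpll([{3}, {1, 2}, {-1, 2}, {-1, -2}], {})
print(model)
```

On this input the sketch reproduces the table's outcome: the branch with \(P \leftarrow\) true dies in the conflict \(Q \amp{\sim}Q\), and backtracking yields the model with \(P\) false, \(Q\) true, \(R\) true.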
A quick back-of-the-envelope calculation reveals the staggering computing times of (algorithms for) SAT-type problems represented by formulas with as few as, say, 60 variables. To wit: A problem represented as a Boolean formula with 10 variables that affords a linear solution taking one hundredth of a second to complete would take just four hundredths and six hundredths of a second to complete if the formula had instead 40 and 60 variables respectively. In dramatic contrast, if the solution to the problem were exponential (say \(2^n)\) then the times to complete the job for 10, 40 and 60 variables would be respectively one thousandth of a second, 13 days, and 365 centuries. It is a true testament to the ingenuity of the automated reasoning community and the power of current SAT-based search algorithms that real-world problems with thousands of variables can be handled with reasonable efficiency. Küchlin & Sinz 2000 discuss a SAT application in the realm of industrial automotive product data management where 18,000 (elementary) Boolean formulas and 17,000 variables are used to express constraints on orders placed by customers. As another example, Massacci & Marraro 2000 discuss an application in logical cryptanalysis, that is, the verification of properties of cryptographic algorithms expressed as SAT problems. They demonstrate how finding a key with a cryptographic attack is analogous to finding a model—an assignment—for a Boolean formula; the formula in their application encodes the commercial version of the U.S. Data Encryption Standard (DES), with the encoding requiring 60,000 clauses and 10,000 variables.
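The envelope figures can be rederived by assuming roughly one microsecond per step, which makes \(2^{10}\) steps take about a millisecond as in the text; the step cost is our calibration, not a measured constant.

```python
# Reproducing the back-of-the-envelope figures for exponential 2^n running
# times, calibrated so that 2^10 steps take about one millisecond
# (roughly one microsecond per step -- an assumption of the illustration).

STEP_SECONDS = 1e-6
for n in (10, 40, 60):
    seconds = (2 ** n) * STEP_SECONDS
    print(n, seconds / 86400, "days")
# 2^40 steps come to about 13 days; 2^60 to roughly 365 centuries.
```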
Although SAT is conceptually very simple, its inner nature is not well understood—there are no criteria that can be generally applied to answer why one SAT problem is harder than another. It should then come as no surprise that algorithms that tend to do well on some SAT instances do not perform so well on others, and efforts are being spent in designing hybrid algorithmic solutions that combine the strengths of complementary approaches—see Prasad, Biere & Gupta 2005 for an application of this hybrid approach in the verification of hardware design.
Recent advances in SAT hybrid strategies coupled with supercomputing power have allowed a team of three computing scientists to solve the Boolean Pythagorean Triples Problem, a long-standing open question in Ramsey Theory: Can the set \(\{1, 2, \ldots\}\) of natural numbers be divided into two parts with no part containing a triple \((a, b, c)\) such that \(a^2 + b^2 = c^2\)? Heule, Kullmann & Marek 2016 proved that this cannot be done by showing that the set \(\{1, 2, \ldots, n\}\) can be so partitioned for \(n = 7824\) but that this is impossible for \(n \ge 7825\). Expressing this deceptively simple question as a SAT problem required close to 38,000 clauses and 13,000 variables, with about half of these going to represent that the problem is satisfiable when \(n = 7824\) and the other half to represent that it is not when \(n = 7825\); of the two, proving the latter was far more challenging since it demanded a proof of unsatisfiability, i.e. that no such partition exists. A naïve brute-force approach considering all \(2^{7825}\) possible two-part partitions was clearly out of the question and the problem was attacked by using “clever” algorithms within a multi-stage SAT-based framework for solving hard problems in combinatorics, consisting of five phases: Encode (encoding the problem as SAT formulas), Transform (optimizing the encoding using clause elimination and symmetry breaking techniques), Split (dividing the problem effectively into subproblems using splitting heuristics), Solve (searching for satisfying assignments or the lack thereof using fast processing), and Validate (validating the results of the earlier phases). Of special importance was the application of cube-and-conquer, a hybrid SAT strategy particularly effective for hard combinatorial problems.
The strategy combines look-ahead with conflict-driven clause-learning \((\mathbf{CDCL})\), with the former aiming to construct small binary search trees using global heuristics and the latter aiming to find short refutations using local heuristics.
After splitting the problem into \(10^6\) hard subproblems (known as “cubes”), these were handed down to 800 cores working in parallel on the Stampede supercomputer which, after 2 days of further splitting and CDCL clause-crunching, settled the question and delivered a 200-terabyte proof validating the work. After deservedly celebrating this significant accomplishment of automated reasoning, and after entertaining all the new applications that the enhanced SAT method would afford (particularly in the areas of hardware and software verification), we should then ask some questions that are of especial importance to mathematicians: Is there a more insightful way to establish this result that would involve more traditional and intellectually satisfying mathematical proof methods? Or, as far as increasing our understanding of a given field (combinatorics in this case), what is the value of settling a question when no human can inspect the proof and hence get no insight from it? Even the team responsible for the result admits that “the proofs of unsatisfiability coming from SAT solvers are, from a human point of view, a giant heap of random information (no direct understanding is involved)”. The conjecture has been settled but we basically have no underlying idea of what makes 7825 so special. Perhaps the real value to be drawn from these considerations is that they lead us to think about the deeper question: What is it about the structure of a specific problem that makes it amenable to standard mathematical treatment as opposed to requiring a mindless brute-force approach? While this question is being contemplated, SAT may provide the best line of attack on certain mathematical problems.
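The shape of the encoding is easy to see at toy scale: one Boolean per number in \(\{1, \ldots, n\}\) recording which part it goes in, and, for each Pythagorean triple, constraints forbidding a monochromatic triple. The brute-force sketch below (our illustration; feasible only for tiny \(n\), nothing like the \(n = 7825\) of the actual result) confirms that \(\{1, \ldots, 16\}\) can be so partitioned.

```python
from itertools import product

# Toy version of the Boolean Pythagorean Triples encoding: one Boolean per
# number in {1..n} (its part); no triple a^2 + b^2 = c^2 may be monochromatic.
# Brute force works only for tiny n -- the real problem needed a massive
# cube-and-conquer SAT effort.

N = 16
triples = [(a, b, c)
           for a in range(1, N + 1)
           for b in range(a, N + 1)
           for c in range(b, N + 1)
           if a * a + b * b == c * c]

def ok(colours):
    # colours[i] is the part assigned to the number i + 1.
    return all(colours[a - 1] != colours[b - 1] or colours[b - 1] != colours[c - 1]
               for a, b, c in triples)

satisfiable = any(ok(cs) for cs in product((0, 1), repeat=N))
print(triples)      # the four triples inside {1..16}
print(satisfiable)  # True: {1..16} can be split with no monochromatic triple
```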
The DPLL search procedure has been extended to quantified logic. MACE is a popular program based on the DPLL algorithm that searches for finite models of first-order formulas with equality. As an example (McCune 2001), to show that not all groups are commutative one can direct MACE to look for a model of the group axioms that also falsifies the commutation law or, equivalently, to look for a model of:
\[\begin{align}\tag{G1} &e\cdot x = x &\text{(left identity)} \\\tag{G2} & i(x)\cdot x = e &\text{(left inverse)} \\\tag{G3} & (x\cdot y)\cdot z = x \cdot(y \cdot z) &\text{(associativity)}\\\tag{DC} & a\cdot b \neq b\cdot a & \text{(denial of commutativity)}\end{align}\]MACE finds a six-element model of these axioms, where \(\cdot\) is defined as:
| \(\cdot\) | 0 | 1 | 2 | 3 | 4 | 5 |
| 0 | 0 | 1 | 2 | 3 | 4 | 5 |
| 1 | 1 | 0 | 4 | 5 | 2 | 3 |
| 2 | 2 | 3 | 0 | 1 | 5 | 4 |
| 3 | 3 | 2 | 5 | 4 | 0 | 1 |
| 4 | 4 | 5 | 1 | 0 | 3 | 2 |
| 5 | 5 | 4 | 3 | 2 | 1 | 0 |
and where \(i\) is defined as:
| \(x\) | 0 | 1 | 2 | 3 | 4 | 5 |
| \(i(x)\) | 0 | 1 | 2 | 4 | 3 | 5 |
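A model like this can be checked mechanically. The sketch below (our own verification script, not part of MACE) takes the multiplication table from the text, computes the left inverses directly from it, and confirms G1-G3 together with the denial of commutativity.

```python
# Mechanical check of the six-element model: verify left identity, left
# inverses, associativity, and that commutativity fails for some pair.
# The table is the one given in the text.

T = [[0, 1, 2, 3, 4, 5],
     [1, 0, 4, 5, 2, 3],
     [2, 3, 0, 1, 5, 4],
     [3, 2, 5, 4, 0, 1],
     [4, 5, 1, 0, 3, 2],
     [5, 4, 3, 2, 1, 0]]
D = range(6)
mul = lambda x, y: T[x][y]

left_identity = all(mul(0, x) == x for x in D)                      # G1, e = 0
inverses = {x: next(i for i in D if mul(i, x) == 0) for x in D}     # G2: i(x).x = e
associative = all(mul(mul(x, y), z) == mul(x, mul(y, z))
                  for x in D for y in D for z in D)                 # G3
noncommutative = any(mul(x, y) != mul(y, x) for x in D for y in D)  # DC witnessed

print(left_identity, associative, noncommutative)  # True True True
print(inverses)  # the left-inverse function i, computed from the table
```

The computed inverse function swaps 3 and 4 (the two elements of order three) and fixes the rest; the model is in fact the symmetric group on three letters, the smallest noncommutative group.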
This example also illustrates, once again, the benefits of using an automated deduction system: How long would it have taken a human researcher to come up with the above or a similar model? For more challenging problems, the program is being used as a practical complement to the resolution-based theorem prover Prover9 (formerly Otter), with Prover9 searching for proofs and MACE simultaneously looking for (counter)models. To find such models, MACE converts the first-order problem into a set of "flattened" clauses which, for increasing model sizes, are instantiated into propositional clauses and solved as a SAT problem. The method has been implemented in other automated reasoning systems as well, most notably in the Paradox model finder where the MACE-style approach has been enhanced by four additional techniques resulting in some significant efficiency improvements (Claessen & Sörensson 2003): term definitions (to reduce the number of variables in flattened clauses), static symmetry reduction (to reduce the number of isomorphic models), sort inference (to apply symmetry reduction at a finer level) and incremental SAT (to reuse search information between consecutive model sizes). The strategy of pairing the complementary capabilities of separate automated reasoning systems has been applied to higher-order logic too, as exemplified by Nitpick, a counterexample generator for Isabelle/HOL (Blanchette & Nipkow 2010). Brown 2013 describes a theorem proving procedure for higher-order logic that uses SAT-solving to do most of the work; the procedure is a complete, cut-free, ground refutation calculus that incorporates restrictions on instantiations and has been implemented in the Satallax theorem prover (Brown 2012).
An approach of great interest in solving SAT problems in first-order logic is Satisfiability Modulo Theories \((\mathbf{SMT})\), where the interpretation of symbols in the problem’s formulation is constrained by a background theory. For example, in linear arithmetic the function symbols are restricted to \(+\) and \(-\). As another example, in the extensional theory of arrays (McCarthy 1962) the array function read\((a, i)\) returns the value of the array \(a\) at index \(i\), and write\((a, i, x)\) returns the array identical to \(a\) but where the value of \(a\) at \(i\) is \(x\). More formally, the behavior of read and write is governed by the axioms \[\begin{align} &\textit{read}(\textit{write}(a, i, x), i) = x \\ &i \neq j \rightarrow \textit{read}(\textit{write}(a, i, x), j) = \textit{read}(a, j)\end{align}\]
In the context of these axioms, an SMT solver would attempt to establish the satisfiability (or, dually, the validity) of a given first-order formula, or thousands of formulas for that matter, such as
\[i - j = 1 \amp f(\textit{read}(\textit{write}(a, i, 2), j + 1)) = \textit{read}(\textit{write}(a, i, f(i - j + 1)), i)\] Ganzinger et al. 2004 discusses an approach to SMT called \(\mathbf{DPLL}(\mathbf{T})\) consisting of a general DPLL(X) engine that works in conjunction with a solver Solver\(_T\) for the background theory \(T\). Bofill et al. (2008) present the approach in the setting of the theory of arrays, where the DPLL engine is responsible for enumerating propositional models for the given formula whereas Solver\(_T\) checks whether these models are consistent with the theory of arrays. Their approach is sound and complete, and can be smoothly extended to multidimensional arrays.
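The array theory has a simple concrete model that makes the axioms and the example formula tangible. In the sketch below (our illustration; Python dicts stand in for arrays, and the witnesses \(i = 1\), \(j = 0\) and \(f(n) = n + 10\) are our choices), read and write satisfy McCarthy's axioms and the example conjunction comes out satisfiable.

```python
# A concrete model of the array operations: read/write over Python dicts,
# spot-checking McCarthy's axioms, then evaluating the example formula
# under i - j = 1 with arbitrarily chosen witnesses.

def read(a, i):
    return a.get(i)

def write(a, i, x):
    return {**a, i: x}

# McCarthy's axioms, spot-checked on a sample array:
a = {0: 'p', 1: 'q'}
assert read(write(a, 0, 'r'), 0) == 'r'         # read(write(a,i,x), i) = x
assert read(write(a, 0, 'r'), 1) == read(a, 1)  # i != j -> read(write(a,i,x), j) = read(a,j)

# The example formula with i = 1, j = 0 (so i - j = 1) and f(n) = n + 10:
f = lambda n: n + 10
arr = {0: 0, 1: 0}
i, j = 1, 0
lhs = f(read(write(arr, i, 2), j + 1))          # j + 1 = i, so this reads the 2
rhs = read(write(arr, i, f(i - j + 1)), i)      # reads back f(2)
print(i - j == 1 and lhs == rhs)  # True: the conjunction is satisfiable
```

Note how the constraint \(i - j = 1\) does the work: it forces \(j + 1\) and \(i\) to denote the same index, after which both sides reduce to \(f(2)\) by the first axiom. This interplay between the arithmetic theory and the array theory is exactly what a DPLL(T) engine coordinates.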
SMT is particularly successful in verification applications, most notably software verification. Having improved the efficiency of SAT solvers, the effort is now on designing more efficient SMT solvers (de Moura 2007). There is also the need to conduct a comprehensive comparison and potential consolidation of the techniques offered by the different SMT-based verification approaches, including bounded model checking, k-induction, predicate abstraction, and lazy abstraction with interpolants (Beyer, Dangl & Wendler 2018 and 2021).
To prove automatically even the simplest mathematical facts requires a significant amount of domain knowledge. As a rule, automated theorem provers lack such rich knowledge and attempt to construct proofs from first principles by the application of elementary deduction rules. This approach results in very lengthy proofs (assuming a proof is found) with each step being justified at a most basic logical level. Larger inference steps and a significant improvement in mathematical reasoning capability can be obtained, however, by having a theorem prover interact with a computer algebra system, also known as a symbolic computation system. A computer algebra system is a computer program that assists the user with the symbolic manipulation and numeric evaluation of mathematical expressions. For example, when asked to compute the improper integral
\[\int_0^\infty e^{-a^2 t^2}\cos(2bt)\,dt\] a competent computer algebra system would quickly reply with the answer
\[\frac{\sqrt{\pi}}{2a}{e^{-b^2/a^2}}\] Essentially, the computer algebra system operates by taking the input expression entered by the user and successively applying to it a series of transformation rules until the result no longer changes (see the section on Term Rewriting for more details). These transformation rules encode a significant amount of domain (mathematical) knowledge, making symbolic systems powerful tools in the hands of applied mathematicians, scientists, and engineers trying to attack problems in a wide variety of fields ranging from calculus and the solving of equations to combinatorics and number theory.
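The apply-until-fixpoint discipline can be sketched with a miniature rewrite system. The three rules and the tuple representation of terms below are our own toy illustration; a real computer algebra system has thousands of such rules and far more sophisticated matching.

```python
# Minimal sketch of a computer algebra system's rewrite loop: apply
# transformation rules until the expression stops changing.  Terms are nested
# tuples like ('+', 'a', 0); the three rules are illustrative identities only.

RULES = [
    (('+', 'x', 0), 'x'),   # x + 0 -> x
    (('*', 'x', 1), 'x'),   # x * 1 -> x
    (('*', 'x', 0), 0),     # x * 0 -> 0
]

def rewrite_once(term):
    # Try each rule at the root ('x' in a pattern matches any subterm),
    # then descend into subterms.
    for pat, rhs in RULES:
        if isinstance(term, tuple) and len(term) == 3 \
                and term[0] == pat[0] and term[2] == pat[2]:
            return term[1] if rhs == 'x' else rhs
    if isinstance(term, tuple):
        return tuple(rewrite_once(t) for t in term)
    return term

def simplify(term):
    # Iterate to a fixpoint: stop when a pass changes nothing.
    while True:
        new = rewrite_once(term)
        if new == term:
            return term
        term = new

print(simplify(('+', ('*', ('+', 'a', 0), 1), 0)))  # (a + 0) * 1 + 0  ->  a
```

Termination of such a loop is not automatic in general: it depends on the rule set being well behaved (here each rule strictly shrinks the term), a point taken up in the section on Term Rewriting.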
Problem solving in mathematics involves the interplay of deduction and calculation, with decision procedures being a reminder of the fuzzy division between the two; hence, the integration of deductive and symbolic systems, which we coin here as Deductive Computer Algebra (DCA), is bound to be a fruitful combination. Analytica (Bauer, Clarke & Zhao 1998) is a theorem prover built on top of Mathematica, a powerful and popular computer algebra system. Besides supplying the deductive engine, Analytica also extends Mathematica’s capabilities by defining a number of rewrite rules—more precisely, identities about summations and inequalities—that are missing in the system, as well as providing an implementation of Gosper’s algorithm for finding closed forms of indefinite hypergeometric summations. Equipped with this extended knowledge, Analytica can prove semi-automatically some nontrivial theorems from real analysis, including a series of lemmas directly leading to a proof of the Bernstein Approximation Theorem. Here is the statement of the theorem, simply to give the reader a sense of the level of mathematical richness we are dealing with:
Bernstein Approximation Theorem.
Let \(\text{I} = [0, 1]\) be the closed unit interval, \(f\) a real continuous function on I, and \(B_n(x, f)\) the \(n\)th Bernstein polynomial for \(f\) defined as \[B_n(x, f)= \sum_{k=0}^n \binom{n}{k} f(k/n)x^k(1-x)^{n-k}\] Then, on the interval I, the sequence of Bernstein polynomials for \(f\) converges uniformly to \(f\).
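The theorem is easy to witness numerically. The sketch below (our illustration; the test function \(f(x) = |x - 1/2|\) and the sampling grid are arbitrary choices) evaluates \(B_n(x, f)\) straight from the definition and shows the sup-norm error shrinking as \(n\) grows.

```python
from math import comb

# Numerical illustration of the Bernstein Approximation Theorem: B_n(x, f)
# computed directly from the definition, with the sup-norm error over a grid
# decreasing as n grows.  The choice of f is ours, for demonstration.

def bernstein(n, x, f):
    return sum(comb(n, k) * f(k / n) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)            # continuous on [0,1], not smooth at 1/2
grid = [i / 200 for i in range(201)]
err = {n: max(abs(bernstein(n, x, f) - f(x)) for x in grid) for n in (10, 100)}
print(err[100] < err[10])  # True: the approximation tightens uniformly
```

For a merely continuous \(f\) the convergence is slow (roughly like \(1/\sqrt{n}\) for this Lipschitz example), which is part of what makes a formal proof of uniform convergence, rather than a numerical check, genuinely informative.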
To be frank, the program is supplied with key information to establish the lemmas that lead to this theorem, but the amount and type of deductive work done by the program is certainly nontrivial. Clarke & Zhao 1994 provide examples of fully automated proofs using problems in Chapter 2 of Ramanujan’s Notebooks (Berndt 1985), including the following example that the reader is invited to try. Show that:
\[\sum_{k=n+1}^{A_r}\frac{1}{k}=r+2\left(\sum_{k=1}^r (r-k)(\sum_{j=A_{k-1}+1}^{A_k} \frac{1}{(3j)^3 -3j})\right) + 2r\phi(3, A_0)\]where \(A_0 =1\), \(A_{n+1}=3A_n +1\) and \(\phi(x,n)\) is Ramanujan’s abbreviation for
\[ \phi(x, n)=_{df} \sum_{k=1}^n \frac{1}{-(kx)+ k^3x^3}\]Analytica’s proof of this identity proceeds by simplifying both the left- and right-hand sides of the equality and showing that both sides reduce to the same expression, \(-H_n + H_{A_r}\). The simplification uses the added summation identities mentioned before as well as some elementary properties of the harmonic numbers,
\[ H_n=\sum_{k=1}^n\frac{1}{k} \]The resulting proof has 28 steps (some of which are nontrivial), taking about 2 minutes to find.
Kerber, Kohlhase & Sorge 1998 use the \(\Omega\)mega planning system as the overall way to integrate theorem proving and symbolic computation. In Harrison & Théry 1998, we find an example of the integration of a higher-order logic theorem proving system (HOL) with a computer algebra system (Maple).
Their great power notwithstanding, symbolic algebra systems do not enforce the same level of rigor and formality that is the essence of automated deduction systems. In fact, the mathematical semantics of some of the knowledge rules in most algebra systems is not entirely clear and is, in some cases, logically unsound (Harrison & Théry 1998). The main reason for this is an over-aggressiveness to provide the user with an answer in a timely fashion at whatever cost, bypassing the checking of required assumptions even if it means sacrificing the soundness of the calculation. (This is strongly reminiscent of most Prolog implementations, which bypass the so-called “occurs-check”, also abandoning logical soundness in the name of efficiency.) This serious problem opens the opportunity for a deduction system to provide a service to the computer algebra system: Use its deductive capabilities to verify that the computer algebra’s computational steps meet the required assumptions. There is a catch in this, however: For sufficiently large calculation steps, verifying is tantamount to proving and, to check these steps, the deduction system may well need the assistance of the very same system that is in need of verification! The solution to the soundness problem may then well require an extensive modification of the chosen symbolic algebra system to make it sound; an alternative approach is to develop a new system, entirely from scratch, in conjunction with the development of the automated theorem prover. In either case, the resulting combined deductive computer algebra system should display a much improved ability for automated mathematical reasoning.
Automated reasoning has reached the level of maturity where theorem proving systems and techniques are being used for industrial-strength applications. One such application area is the formal verification of hardware and software systems. The cost of defects in hardware can easily run into the millions. In 1994, the Pentium processor was shipped with a defect in its floating-point unit and the subsequent offer by Intel to replace the flawed chip (which was taken up only by a small fraction of all Pentium owners) cost the company close to $500 million. To guard against situations like this, the practice of testing chip designs is now considered insufficient and more formal methods of verification have not only gained large attention in the microprocessor industry but have become a necessity. The idea behind formal verification is to rigorously prove with mathematical certainty that the system functions as specified. Common applications in hardware design include formally establishing that the system functions correctly on all inputs, or that two different circuits are functionally equivalent.
Depending on the task at hand, one can draw from a number of automated formal verification techniques, including SAT solvers in propositional logic, symbolic simulation using binary decision diagrams (BDDs), model checking in temporal logic, or conducting proofs in higher-order logic. In the latter case, using an automated theorem prover like HOL—see Section 3.1—has shown to be invaluable in practice. Proof construction in a system like HOL proceeds semi-automatically with the user providing a fair amount of guidance as to how the proof should proceed: The user tries to find a proof while being assisted by the theorem prover which, on request, can either automatically fill in a proof segment or verify proof steps given to it. Although some of the techniques mentioned above provide decision procedures which higher-order logic lacks, higher-order logic has the advantage of being very expressive. The tradeoff is justified since proving facts about floating-point arithmetic requires the formalization of a large body of real analysis, including many elementary statements such as:
| |- | (!x. a <= x /\ x <= b ==> (f diffl (f' x)) x) /\ |
| f(a) <= K /\ | |
| f(b) <= K /\ | |
| (!x. a <= x /\ x <= b /\ (f'(x) = 0) ==> f(x) <= K)==> | |
| (!x. a <= x /\ x <= b ==> f(x) <=K) |
This statement from Harrison 2000, written in HOL, says that if a function \(f\) is differentiable with derivative \(f'\) in an interval \([a, b]\) then a sufficient condition for \(f(x) \le K\) throughout the interval is that \(f(x) \le K\) at the endpoints \(a, b\) and at all points of zero derivative. The result is used to determine error bounds when approximating transcendental functions by truncated power series. Conducting proofs in such a “painstakingly foundational system” (Harrison 2006) has some significant benefits. First, one achieves a high degree of assurance that the proofs are valid since (admittedly lengthy) they are composed of small error-free deductive steps. Second, the formalization of these elementary statements and intermediate results can be reused in other tasks or projects. For example, a library of formal statements and proven results in floating-point division can be reused when proving other results about floating-point algorithms for square roots or transcendental functions. To further illustrate, different versions of the square root algorithm for the Intel Itanium share many similarities and the proof of correctness for one version of the algorithm can be carried over to another version after minor tweaking of the proof. A third benefit of using a prover like HOL is, of course, that such lengthy proofs are carried out mechanically and are deductively certain; the likelihood of introducing a human error if they were carried out manually would be just as certain.
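A numerical instance shows what the HOL lemma buys. In the sketch below (our example; the function \(f(x) = x^3 - 3x\), the interval \([0, 2]\) and the bound \(K = 2\) are arbitrary choices, not Harrison's), checking the bound at the two endpoints and the single zero of \(f'\) suffices for the whole interval, which a dense sample then confirms.

```python
# Numerical instance of the HOL lemma: for differentiable f on [a, b], if
# f <= K at the endpoints and at every zero of f', then f <= K on all of
# [a, b].  Function, interval and bound are our illustrative choices.

def f(x):
    return x**3 - 3 * x        # f'(x) = 3x^2 - 3, which is zero at x = +-1

a, b, K = 0.0, 2.0, 2.0
checkpoints = [a, b, 1.0]      # endpoints plus the only critical point in [a, b]
assert all(f(x) <= K for x in checkpoints)

# The lemma then guarantees the bound everywhere; sample densely to confirm:
bound_holds = all(f(a + (b - a) * i / 1000) <= K for i in range(1001))
print(bound_holds)  # True
```

The value of the formal version is that the HOL statement covers every differentiable \(f\) and every interval at once, with machine-checked certainty, where the numeric check above only samples one instance.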
Formal verification frameworks have become an indispensable part of the microprocessor design process, as exemplified by the use of the ACL2 theorem prover at Centaur Technology (Hunt, Kaufmann, Moore and Slobodova 2017). ACL2 is used as an integrated programming and proof environment in the formal specification of microprocessors, formal models of hardware design and their associated proofs, and to support the design of other formal analysis tools. Uses of ACL2 to prove group-theoretic properties of elliptic curves (Russinoff 2017) echo the aforementioned work of Harrison 2000 within abstract mathematical domains in order to serve the needs of worldly microprocessor design.
Society is becoming increasingly dependent on software systems for critical services such as safety and security. Serious adverse effects of malfunctioning software include loss of human life, threats to security, unauthorized access to sensitive information, large financial losses, denial of critical services, and risk to safety. One way to increase the quality of critical software is to supplement traditional methods of testing and validation with techniques of formal verification. The basic approach to formal verification is to generate a number of conditions that the software must meet and to verify—establish—them by mathematical proof. As with hardware, automated formal verification (simply formal verification, hereafter) is concerned with discharging these proof obligations using an automated theorem prover.
The formal verification of security protocols is an almost ideal application of automated theorem proving in industry. Security protocols are small distributed programs aimed at ensuring that transactions take place securely over public networks. The specification of a security protocol is relatively small and well defined but its verification is certainly non-trivial. We have already mentioned in a previous section the use of SAT-based theorem provers in the verification of the U.S. Data Encryption Standard (DES). As another example, the Mondex “electronic purse” is a smart card electronic cash system that was originally developed by National Westminster Bank and subsequently sold to MasterCard International. Schmitt & Tonin 2007 describe a Java Card implementation of the Mondex protocol for which the security properties were reformulated in the Java Modeling Language (JML), following closely the original Z specification. Proof of correctness was conducted using the KeY tool (Beckert, Hähnle & Schmitt 2007), an interactive theorem proving environment for first-order dynamic logic that allows the user to prove properties of imperative and object-oriented sequential programs. This application of automated reasoning demonstrates, in the words of the authors, that “it is possible to bridge the gap between specification and implementation ensuring a fully verified result”.
Denney, Fischer & Schumann 2004 describe a system to automate the certification of safety properties of data-analysis aerospace software at NASA. Using Hoare-style program verification techniques, their system generates proof obligations that are then handled by an automated theorem prover. The process is not fully automated, however, since many of the obligations must be simplified first in order to improve the ability of the theorem prover to solve the proof tasks. For example, one such class of obligations makes a statement about a matrix, \(r\), that needs to remain symmetric after updates along its diagonal have been made, and has the form:
Original form:
\(\textit{symm}(r)\rightarrow\textit{symm}(\textit{diag-updates}(r))\)
Simplified form (when \(r\) is \(2\times 2\)):
\[\begin{align}
&(\forall i)(\forall j)(0 \le i, j \le 1 \rightarrow \textit{sel}(r, i, j) = \textit{sel}(r, j, i)) \rightarrow\\
&(\forall k)(\forall l)(0 \le k, l \le 1 \rightarrow\\
&\quad \textit{sel}(\textit{upd}(\textit{upd}(r, 1, 1, r_{11}), 0, 0, r_{00}), k, l) = \textit{sel}(\textit{upd}(\textit{upd}(r, 1, 1, r_{11}), 0, 0, r_{00}), l, k)))
\end{align}\]
Even after the simplification, current theorem provers find the proof task challenging. The task becomes intractable for larger matrices and numbers of updates (e.g. a \(6\times 6\) matrix with 36 updates) and further preprocessing and simplification of the obligation is required before the task eventually falls within the reach of state-of-the-art theorem provers. But it is worth remarking that proofs are found without using any specific features or configuration parameters of the theorem provers which would improve their chances at completing the proofs. This is important since the everyday application of theorem provers in industry cannot presuppose such deep knowledge of the prover from their users. The formal verification of software remains a demanding task but it is difficult to see how the certification of properties could happen without the assistance of automated deduction when one faces the humanly impossible task of establishing thousands of such obligations.
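To see what such an obligation asserts, here is a minimal sketch (ours, not NASA's system) that checks the \(2\times 2\) instance of the property by exhaustive enumeration over a small value domain; the names `sel` and `upd` are assumptions mirroring the select/update operations in the obligation above.

```python
# Brute-force check, for small integer matrices, that diagonal updates
# preserve symmetry. `sel`/`upd` mirror the obligation's array operations.
from itertools import product

def sel(r, i, j):
    return r[i][j]

def upd(r, i, j, v):
    s = [row[:] for row in r]  # functional update: copy, then assign
    s[i][j] = v
    return s

def symm(r, n):
    return all(sel(r, i, j) == sel(r, j, i) for i in range(n) for j in range(n))

# Exhaustively check the 2x2 instance over a tiny value domain.
n, vals = 2, range(3)
for a, b, c, d, r00, r11 in product(vals, repeat=6):
    r = [[a, b], [c, d]]
    if symm(r, n):
        updated = upd(upd(r, 1, 1, r11), 0, 0, r00)
        assert symm(updated, n)  # diagonal updates preserve symmetry
print("2x2 instance holds on the finite domain")
```

Of course, such finite enumeration is no substitute for the theorem prover's general proof; it only illustrates the shape of the claim.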
In the field of nuclear engineering, techniques of automated reasoning are deemed mature enough to assist in the formal verification of the safety-critical software responsible for controlling a nuclear power plant’s reactor protection systems (RPS). The RPS component of the digital control system of the APR-1400 nuclear reactor is specified using NuSCR, a formal specification language customized for nuclear applications (Yoo, Jee & Cha 2009). Model checking in computation tree logic is used to check the specifications for completeness and consistency. After this, nuclear engineers generate function block designs via a process of automatic synthesis and formally verify the designs, also using techniques of model checking, in linear temporal logic; the techniques are also used to verify the equivalence of the multiple revisions and releases of the design. These model-checking tools were implemented to make their use as easy and intuitive as possible, in a way that did not require a deep knowledge of the techniques, and used notations familiar to nuclear engineers. The use of automated reasoning tools not only helps the design engineers to establish the desired results but also raises the confidence of the government’s regulatory personnel who need to approve the RPS software before the reactor can be certified for operation.
Quantum computing is an emerging field at the intersection of physics and computer science. The field is expected to bring very significant practical applications and, given the nature of the quantum world, we can rest assured there will be no shortage of philosophical implications. These applications require a firm foundation, including the formalization and verification of quantum algorithms and results in quantum information theory. Aiming at this worthwhile objective, a number of results have already been formalized in Isabelle/HOL and added to its library so they can be made available for further work. After formalizing a number of concepts in quantum computing such as qubits, quantum states, quantum gates, entanglement, measurement, matrix representation of quantum circuits, and others, the work proceeds to the formalization of theorems and algorithms (Bordg, Lachnitt & He 2021), including:
Of notable mention is the fact that the formalization of the unfair quantum prisoner’s dilemma in Isabelle/HOL uncovered a flaw in the original “paper-and-pencil” publication which had gone undetected for many years. Under the more formal and strict framework that Isabelle/HOL demands, the so-called quantum “miracle move” (as defined in Eisert, Wilkens & Lewenstein 1999) was found to be of no advantage over a classical strategy. This error has now been rectified (Eisert, Wilkens & Lewenstein 2020) thus re-establishing the advantage of a quantum strategy. Further use of Isabelle/HOL in quantum computing includes the verification of quantum cryptographic protocols and the addition to Isabelle’s library of the formalization of the quantum Fourier transform, which will pave the way for more advanced quantum algorithms.
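The basic notions formalized in this line of work (states as vectors, gates as matrices) can be illustrated in miniature. The following sketch is ours, in plain Python rather than Isabelle/HOL; it represents a qubit as a complex 2-vector and checks two elementary facts about the Hadamard gate: it sends \(|0\rangle\) to an equal superposition, and it is self-inverse.

```python
# Toy illustration (not the Isabelle/HOL formalization) of qubits and gates:
# states are 2-vectors, gates are 2x2 matrices.
import math

H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]   # Hadamard gate

def apply(gate, state):
    return [sum(gate[i][j] * state[j] for j in range(2)) for i in range(2)]

ket0 = [1.0, 0.0]

plus = apply(H, ket0)                  # equal superposition (|0>+|1>)/sqrt(2)
back = apply(H, plus)                  # H is self-inverse: H(H|0>) = |0>

assert all(abs(p - 1 / math.sqrt(2)) < 1e-9 for p in plus)
assert abs(back[0] - 1.0) < 1e-9 and abs(back[1]) < 1e-9

# Measurement probabilities are squared amplitudes and must sum to 1.
probs = [a * a for a in plus]
assert abs(sum(probs) - 1.0) < 1e-9
print("Hadamard checks pass")
```

The formalized library proves such identities once and for all over arbitrary dimensions; the numerical check here only spot-tests one instance.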
In the spirit of Wos, Overbeek, Lusk & Boyle 1992, we pose the question: What do the following statements about different systems of formal logic and exact philosophy have in common?
We ask again, what do these results have in common? The answer is that each has been proved with the help of an automated reasoning program. Having disclosed the answer to this question prompts a new one: How much longer would it have taken to settle these open problems without the application of such an automated reasoning tool?
The strict implicational fragments of the logical systems S4 and S5 of modal logic are known as C4 and C5, respectively, and their Hilbert-style axiomatizations presuppose condensed detachment as their sole rule of inference. With insight from Kripke’s work, Anderson & Belnap (1962) published the first axiomatization of C4 using the following 3-axiom basis, where the Polish notation ‘Cpq’ stands for ‘\(p \rightarrow q\)’.
A question was posed sometime after: Is there a shorter such axiomatization for C4, using a 2-axiom basis or even a single axiom? Using the automated reasoning program Otter, the authors Ernst, Fitelson, Harris & Wos (2001) settled both questions in the affirmative. In fact, several 2-axiom bases were discovered, of which the following turned out to be the shortest:
Further rounds of automated reasoning work were rewarded with the discovery of a single axiom for C4; the axiom is 21 symbols long and it was also proved that it is the shortest such axiom:
To show that each of (2) and (3) is necessary and sufficient for (1), a circle of proofs was produced using the automated reasoning tool: (1) \(\Rightarrow\) (3) \(\Rightarrow\) (2) \(\Rightarrow\) (1). As for C5, its axiomatization was originally published in a paper by Lemmon, A. Meredith, D. Meredith, Prior & Thomas (1957) giving several 4-, 3-, 2- and 1-axiom bases for C5, including the following 3-axiom basis:
The publication also included the shortest known 2-axiom bases for C5 (actually two of them, containing 20 symbols each) but the shortest single axiom for C5, having 21 symbols, was later discovered by Meredith (Meredith & Prior 1964):
Applying automated reasoning strategies again, Ernst, Fitelson, Harris & Wos (2001) discovered several new bases, including the following 2-axiom basis of length 18 and six 1-axiom bases matching Meredith’s length of 21 (only one of these is given below):
To show that each of (6) and (7) is necessary and sufficient for (4),a circle of proofs was also produced with the theorem prover: (6)\(\Rightarrow\) (4) \(\Rightarrow\) (7) \(\Rightarrow\) (6).
A charming foray into combinatory logic is presented in Smullyan 1985 and Glickfeld & Overbeek 1986, where we learn about a certain enchanted forest inhabited by talking birds. Given any birds \(A\) and \(B\), if the name of bird \(B\) is spoken to bird \(A\) then \(A\) will respond with the name of some bird in the forest, \(AB\), and this response to \(B\) from \(A\) will always be the same. Here are some definitions about enchanted birds:
And here are two facts about this enchanted forest:
There have been rumors that every bird in the forest is fond of at least one bird, and also that there is at least one bird that is not fond of any bird. The challenge to the reader now is, of course, to settle these rumors using only F1 and F2, and the given definitions (B1)–(B3). Glickfeld & Overbeek 1986 do this in mere seconds with an automated reasoning system using paramodulation, demodulation and subsumption. For a more challenging problem, consider the additional definitions:
Smullyan challenges us to prove a most surprising thing about larks: Suppose we are not given any other information except that the forest contains a lark. Then, show that at least one bird in the forest must be egocentric! Below we give the salient steps in the proof found by the automated reasoning system, where ‘\(S(x, y)\)’ stands for ‘\(xy\)’ and where clauses (2) and (3) are, respectively, the definition of a lark and the denial of the theorem; numbers on the right are applications of paramodulation:

1   (x1 = x1)
2   (S(S(L, x1), x2) = S(x1, S(x2, x2)))
3   -(S(x1, x1) = x1)
6   (S(x1, S(S(L, S(x2, x2)), x2)) = S(S(L, x1), S(x2, x2)))   2 2
8   (S(x1, S(S(x2, x2), S(x2, x2))) = S(S(L, S(L, x1)), x2))   2 2
9   (S(S(S(L, L), x1), x2) = S(S(x1, x1), S(x2, x2)))   2 2
18  -(S(S(L, S(S(L, S(L, L)), x1)), x1) = S(S(L, S(x1, x1)), x1))   6 3 6 9 8 8
19  []   18 1
Closer inspection of the left and right hand sides of (18) under the application of unification revealed the discovery of a \(10\)-\(L\) bird, i.e. a 10-symbol bird expressed solely in terms of larks, which was a strong candidate for egocentricity. This discovery was exciting because the shortest egocentric \(L\)-bird known to Smullyan was of length 12. A subsequent run of the automated reasoning system produced a proof of this fact as well as another new significant bird: a possible egocentric \(8\)-\(L\) bird! A few more runs of the system eventually produced a 22-line proof (with terms with as many as 50 symbols, excluding commas and parentheses) of the fact that \(((LL)(L(LL)))(L(LL))\) is indeed egocentric. The natural questions to ask next are, of course, whether there are other \(8\)-\(L\) egocentric birds and whether there are shorter ones. The reader may want to attempt this with paper and pencil but, given that there are 429 such birds, it may be wiser to try it instead (or in conjunction) with an automated reasoning program; both approaches are explored in Glickfeld & Overbeek 1986. For a more formal, but admittedly less colorful, introduction to combinatory logic and lambda-conversion the reader is referred to Hindley & Seldin 1986.
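For readers who want to experiment, here is a small sketch (ours, not Glickfeld & Overbeek's reasoning system) of the lark's defining equation \((Lx)y = x(yy)\) as a rewrite rule on combinator terms; a single leftmost rewrite step is the basic move behind the paramodulation steps shown above.

```python
# Terms: the atom 'L' or a pair (f, a) meaning the application f a.
def step(t):
    """One leftmost application of the lark rule (L x) y -> x (y y);
    returns None when the term contains no redex."""
    if isinstance(t, tuple):
        f, a = t
        if isinstance(f, tuple) and f[0] == 'L':
            x = f[1]
            return (x, (a, a))       # (L x) y  ->  x (y y)
        s = step(f)                  # otherwise try to rewrite inside f ...
        if s is not None:
            return (s, a)
        s = step(a)                  # ... and then inside a
        if s is not None:
            return (f, s)
    return None

L = 'L'
# (L L) L rewrites to L (L L):
assert step(((L, L), L)) == (L, (L, L))

# One step on the 8-L bird ((LL)(L(LL)))(L(LL)) discussed above:
theta = (L, (L, L))                  # theta abbreviates L(LL)
B = (((L, L), theta), theta)
assert step(B) == ((L, (theta, theta)), theta)
```

Note that egocentricity \((BB = B)\) is an equational fact proved using the rule in both directions, so naive rewriting alone need not terminate; this sketch only exposes the rewrite relation itself.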
Formulas in the classical equivalential calculus are written using sentential variables and a two-place function symbol, \(e\), for equivalence. The calculus has two rules of inference, detachment (modus ponens) and substitution; the rules can be combined into the single rule of condensed detachment: Obtain \(t\theta\) from \(e(s,t)\) and \(r\) where \(s\theta = r\theta\) with mgu \(\theta\). The calculus can be axiomatized with the formulas:
\[\begin{align}\tag{E1}& e(x, x) &\text{(reflexivity)}\\\tag{E2}& e(e(x, y), e(y, x)) & \text{(symmetry)}\\\tag{E3}& e(e(x, y), e(e(y, z), e(x, z))) & \text{(transitivity)}\end{align}\]We can dispense with reflexivity since it is derivable from the other two formulas. This brings the number of axioms down to two and a natural question to ask is whether there is a single axiom for the equivalential calculus. In 1933, Łukasiewicz found three formulas of length eleven that each could act as a single axiom for the calculus—here’s one of them: \(e(e(x,y),e(e(z,y),e(x,z)))\)—and he also showed that no shorter single axiom existed. Over time, other single axioms also of length eleven were found and the list kept growing with additions by Meredith, Kalman and Peterson to a total of 14 formulas of which 13 were known to be single axioms and one with a yet undetermined status: the formula \(XCB = e(x, e(e(e(x, y), e(z, y)), z))\). (Actually, the list grew to 18 formulas but Wos, Winker, Veroff, Smith & Henschen 1983 reduced it to 14.) Resisting the intense study of various researchers, it remained an open question for many years whether the 14th formula, \(XCB\), was a single axiom for the equivalential calculus (Peterson 1977). One way to answer the question in the affirmative would be to show that at least one of the 13 known single axioms is derivable from \(XCB\) alone; another approach would be to derive from \(XCB\) the 3-axiom set (E1)–(E3). While Wos, Ulrich & Fitelson 2002 take shots at the former, their line of attack concentrates on the latter with the most challenging task being the proving of symmetry. Working with the assistance of a powerful automated reasoning program, Otter, they conducted a concerted, persistent and very aggressive assault on the open question. (Their article sometimes reads like a military briefing from the front lines!)
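The rule of condensed detachment can be implemented directly. Below is a small sketch (ours, for illustration only; no occurs check) with first-order unification over terms built from the symbol \(e\); two applications of the rule to (E3) and (E2) yield an instance of reflexivity (E1), illustrating why reflexivity is dispensable.

```python
# Terms: integers are variables; ('e', s, t) is the equivalence e(s, t).
import itertools

_fresh = itertools.count(100)            # supply of fresh variable names

def walk(t, s):
    while isinstance(t, int) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, int):
        return {**s, a: b}
    if isinstance(b, int):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def subst(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(x, s) for x in t[1:])
    return t

def rename(t, m):
    if isinstance(t, int):
        if t not in m:
            m[t] = next(_fresh)
        return m[t]
    return (t[0],) + tuple(rename(x, m) for x in t[1:])

def D(major, minor):
    """Condensed detachment: from e(s, t) and r with mgu(s, r) = theta, infer t theta."""
    major, minor = rename(major, {}), rename(minor, {})
    _, s, t = major
    theta = unify(s, minor, {})
    return subst(t, theta) if theta is not None else None

x, y, z = 0, 1, 2
E2 = ('e', ('e', x, y), ('e', y, x))                       # symmetry
E3 = ('e', ('e', x, y), ('e', ('e', y, z), ('e', x, z)))   # transitivity

r = D(D(E3, E2), E2)
# r has the form e(t, t): an instance of reflexivity e(x, x)
assert r[0] == 'e' and r[1] == r[2]
```

The same rule, driven by search strategies rather than a fixed derivation, is what Otter applies millions of times in the work described below.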
For simpler problems, proofs can be found by the reasoning program automatically; deeper and more challenging ones like the one at hand require the guidance of the user. The relentless application of the reasoning tool involved much guidance in the setting of lemmas as targets and the deployment of an arsenal of strategies, including the set of support, forward and backward subsumption, lemma adjunction, formula complexity, hints strategy, ratio strategy, term avoidance, level saturation, and others. After much effort and CPU time, the open question finally succumbed to the combined effort of man and machine and a 61-step proof of symmetry was found, followed by one for transitivity after 10 more applications of condensed detachment. Subsequent runs of the theorem prover using demodulation blocking and the so-called cramming strategy delivered shorter proofs. Here are the last lines of their 25-step proof which in this case proves transitivity first followed by symmetry:
| 123 | [hyper,51,106,122] | P(e(e(e(e(x,y),e(z,y)),z),x)). |
| 124 | [hyper,51,53,123] | P(e(e(e(e(e(e(e(x,y),e(z,y)),z),x),u),e(v,u)),v)). |
| 125 | [hyper,51,124,123] | P(e(e(e(x,y),x),y)). |
| 127 | [hyper,51,124,108] | P(e(e(e(e(x,e(e(e(x,y),e(z,y)),z)),e(e(e(e(e(u,v),e(w,v)),w),u),v6)),v7),e(v6,v7))). |
| 128 | [hyper,51,127,123] | P(e(e(x,y),e(e(y,z),e(x,z)))). |
| 130 | [hyper,51,128,125] | P(e(e(x,y),e(e(e(z,x),z),y))). |
| 131 | [hyper,51,128,130] | P(e(e(e(e(e(x,y),x),z),u),e(e(y,z),u))). |
| 132 | [hyper,51,131,123] | P(e(e(x,y),e(y,x))). |
With an effective methodology and a strategy that included the assistance of an automated reasoning program in a crucial way, the search for shortest single axioms for the equivalential calculus came to an end.
Fitelson & Zalta 2007, Oppenheimer & Zalta 2011, and Alama, Oppenheimer, & Zalta 2015 describe several applications of automated reasoning in computational metaphysics. By representing formal metaphysical claims as axioms and premises in an automated reasoning environment using programs like Prover9, Mace4, the E-prover system and Paradox, the logical status of metaphysical arguments is investigated. After the suitable formalization of axioms and premises, the model finder program Mace4 is used to help verify their consistency. Then, using Prover9, proofs are automatically generated for a number of theorems of the Theory of Plato’s Forms, twenty-five fundamental theorems of the Theory of Possible Worlds, the theorems described in Leibniz’s unpublished paper of 1690 and in his modal metaphysics, and a fully automated construction of Saint Anselm’s Ontological Argument. In the latter application, Saint Anselm is understood in Oppenheimer & Zalta 2011 as having found a way of inferring God’s existence from His mere being as opposed to inferring God’s actuality from His mere possibility. This allows for a formalization that is free of modal operators, involving an underlying logic of descriptions, three non-logical premises, and a definition of God. Here are two key definitions in the formalization, as inputted into Prover9, that helped express the concept of God:
Definition of none_greater:

all x (Object(x) -> (Ex1(none_greater,x) <-> (Ex1(conceivable,x) & -(exists y (Object(y) & Ex2(greater_than,y,x) & Ex1(conceivable,y)))))).

Definition of God:

Is_the(g,none_greater).
Part of the challenge when representing in Prover9 these and other statements from axiomatic metaphysics was to circumvent some of the prover’s linguistic limitations. For example, Prover9 does not have definite descriptions so statements of this kind as well as second-order concepts had to be expressed in terms of Prover9’s existing first-order logic. But the return is worth the investment since Prover9 not only delivered a proof of Ex1(e,g)—there is one and only one God—but does so with an added bonus. A close inspection of the output provides yet another example of an automated theorem prover “outreasoning” its users, revealing that some of the logical machinery is actually redundant: The proof can be constructed using only two of the logical theorems of the theory of descriptions (called “Theorem 2” and “Theorem 3” in their article), one of the non-logical premises (called “Premise 2”), and the definition of God. We cannot help but include here Prover9’s shorter proof, written in the more elegant notation of standard logic (from Oppenheimer & Zalta 2011):
| 1. | \({\sim}E!\iota x\phi_1\) | Assumption, for Reductio |
| 2. | \(\exists y(Gy\iota x\phi_1 \amp Cy)\) | from (1), by Premise 2 and MP |
| 3. | \(Gh\iota x\phi_1 \amp Ch\) | from (2), by \(\exists\)E, ‘\(h\)’ arbitrary |
| 4. | \(Gh\iota x\phi_1\) | from (3), by &E |
| 5. | \(\exists y(y = \iota x\phi_1)\) | from (4), by Theory of Descriptions, Theorem 3 |
| 6. | \(C\iota x\phi_1 \amp {\sim}\exists y(Gy\iota x\phi_1 \amp Cy)\) | from (5), by Theory of Descriptions, Theorem 2 |
| 7. | \({\sim}\exists y(Gy\iota x\phi_1 \amp Cy)\) | from (6), by &E |
| 8. | \(E!\iota x\phi_1\) | from (1), (2), (7), by Reductio |
| 9. | \(E!g\) | from (8), by the definition of ‘\(g\)’ |
In the same tradition as St. Anselm’s, Gödel also provided an ontological proof of God’s existence (Gödel 1970, Scott 1972). An important difference between the two is Gödel’s use of modal operators to represent metaphysical possibility and necessity and, of course, his use of symbolic logic for added reasoning precision. In his proof, Gödel begins by framing the concept of “positive property” using two axioms, and he introduces a definition stating that “A God-like being possesses all positive properties”. This is enough logical machinery to prove as a theorem the possibility of God’s existence, \(\Diamond \exists xG(x)\); three more axioms and two additional definitions allow Gödel to further his proof to establish not only that God exists, \(\exists xG(x)\), but that this is so by necessity, \(\Box \exists xG(x)\). Gödel’s proof is in the formalism of higher-order modal logic (HOML) using modal operators and quantification over properties. Gödel never published his proof but he shared it with Dana Scott who produced the version presented below, which is taken from (Benzmüller & Paleo 2014) along with its English annotation to aid the reader with its intended interpretation:
The proof has recently been analysed to an unprecedented degree of detail and precision by Benzmüller & Paleo 2014 with the help of automated theorem provers. A major challenge faced by these authors was the lack of a HOML-based theorem prover that could carry out the work but this was circumvented by embedding the logic into the classical higher-order logic (HOL) already offered by existing theorem provers like LEO-II, Satallax and the countermodel finder Nitpick. Details of the syntactic and semantic embedding are given in their paper and it consists of encoding HOML formulas as HOL predicates via mappings, expansions, and \(\beta\eta\)-conversions. The mapping associates HOML types \(\alpha\), terms \(s_{\alpha}\), and logical operators \(\theta\) with corresponding HOL “raised” types \(\lceil\alpha\rceil\), type-raised terms \(\lceil s_{\alpha}\rceil\), and type-raised logical operators \(\theta^{\bullet}\). If \(\mu\) and \(\omicron\) are, respectively, the types of individuals and Booleans then \(\lceil\mu\rceil = \mu\) and \(\lceil\omicron\rceil = \sigma\) where \(\sigma\) is shorthand for \(\iota \rightarrow \omicron\) with \(\iota\) as the type of possible worlds; as for function types, \(\lceil\beta \rightarrow \gamma\rceil = \lceil\beta\rceil\rightarrow\lceil\gamma\rceil\). For type-raised terms, \(\lceil s_\alpha \rceil\) is defined inductively on the structure of \(s_\alpha\) as the following example illustrates:
\[\begin{align}\lceil \exists_{(\mu\rightarrow\omicron)\rightarrow\omicron}X_\mu. g_{\mu\rightarrow \omicron}X\rceil & = \lceil \exists_{(\mu\rightarrow \omicron)\rightarrow \omicron}\rceil\lceil X_\mu . g_{\mu\rightarrow \omicron}X\rceil\\& = \lceil \exists_{(\mu\rightarrow\omicron)\rightarrow\omicron}\rceil\lceil X_\mu\rceil . \lceil g_{\mu\rightarrow \omicron}\rceil\lceil X\rceil\\& = \exists^{\bullet}_{\lceil (\mu\rightarrow\omicron)\rightarrow\omicron\rceil}X_{\lceil \mu \rceil} . g_{\lceil\mu\rightarrow\omicron\rceil}X\\& = \exists^{\bullet}_{(\mu\rightarrow \sigma)\rightarrow\sigma}X_\mu. g_{\mu\rightarrow \sigma} X\end{align}\]Type-raised logical connectives, \(\theta^{\bullet}\), are defined below where \(r\) is a new constant symbol in HOL associated with the accessibility relation of HOML:
\[\begin{align} \sim^{\bullet}_{\sigma\rightarrow\sigma} & = \lambda s_\sigma \lambda w_\iota \sim(sw)\\ \vee^{\bullet}_{\sigma\rightarrow\sigma\rightarrow\sigma} & = \lambda s_\sigma \lambda t_\sigma \lambda w_\iota (sw \vee tw) \\ \forall^{\bullet}_{(\alpha\rightarrow\sigma)\rightarrow\sigma} & = \lambda s_{\alpha\rightarrow\sigma}\lambda w_\iota \forall x_\alpha sxw \\ \Box^\bullet_{\sigma\rightarrow\sigma} & = \lambda s_\sigma \lambda w_\iota \forall u_\iota . ({\sim}(r_{\iota\rightarrow\iota\rightarrow\omicron} wu)\vee su) \end{align}\]The other connectives can be defined in the usual way. Validity is expressed as a \(\lambda\)-term, \(\lambda s_{\iota\rightarrow\omicron}\forall w_\iota sw\), that when applied to a term \(s_{\sigma}\) we write as \([s_{\sigma}]\). For example, under the embedding, proving in HOML the possibility of God’s existence, \(\Diamond_{\omicron\rightarrow \omicron}\exists_{(\mu \rightarrow\omicron)\rightarrow \omicron} X_{\mu} . g_{\mu \rightarrow\omicron}X\), is tantamount to proving its validity in HOL: \([\Diamond^{\bullet}_{\sigma \rightarrow\sigma}\exists^{\bullet}_{(\mu \rightarrow \sigma)\rightarrow \sigma} X_{\mu} . g_{\mu \rightarrow \sigma} X]\). To prove so, the type-raised HOL expression \([\Diamond^{\bullet}\exists^{\bullet}X_{\mu} . g_{\mu\rightarrow\sigma } X]\) is then encoded in the so-called THF0 syntax (Sutcliffe & Benzmüller 2010) prior to being fed, along with the above set of equality rules, to the provers that were used in completing the proof:
| thf(corC, conjecture, |
| (v |
| @(mdia |
| @(mexists_ind |
| @^[X: mu]: |
| (g @ X)))))). |
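The idea behind the embedding can be illustrated in miniature. The following sketch is ours (not the authors' HOL encoding): a "raised" formula is a function from possible worlds to booleans over a small finite frame, the connectives are defined exactly as in the equations above, and validity of the K axiom \(\Box(p \supset q) \supset (\Box p \supset \Box q)\) is checked by enumeration.

```python
# Toy possible-worlds embedding: raised formulas are world -> bool functions.
from itertools import product

worlds = [0, 1]
r = {(0, 0), (0, 1), (1, 1)}          # an arbitrary accessibility relation

def mnot(s):      return lambda w: not s(w)
def mor(s, t):    return lambda w: s(w) or t(w)
def mimp(s, t):   return mor(mnot(s), t)          # s -> t  as  ~s v t
def mbox(s):      # box s holds at w iff s holds at every r-accessible world
    return lambda w: all(s(u) for u in worlds if (w, u) in r)

def valid(s):                          # validity: true at every world
    return all(s(w) for w in worlds)

# Axiom K is valid for every valuation of p and q over this frame.
for pv, qv in product(product([False, True], repeat=2), repeat=2):
    p = lambda w, pv=pv: pv[w]
    q = lambda w, qv=qv: qv[w]
    K = mimp(mbox(mimp(p, q)), mimp(mbox(p), mbox(q)))
    assert valid(K)
print("K is valid on the frame")
```

The real embedding works at the level of HOL types and terms rather than finite enumeration, which is why it can support genuine proof rather than model checking; but the semantic clauses are the same.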
The proof in Benzmüller & Paleo 2014 is presented here, including the axioms and definitions as well as the derivation of its four main results—T1, C, T2, T3—all written in the type-decorated, type-raised higher-order logic notation resulting from the embedding. The proof steps are not fully expanded—note the presence of type-raised connectives—and the inferential moves are not broken down to lower levels of detail. Borrowing a phrase from Bertrand Russell (Urquhart 1994), this was done to spare the reader the “kind of nausea” that the fully detailed automated proof would cause:
| A1 | \([\forall^{\bullet}\varphi_{\mu\rightarrow \sigma} . p_{(\mu\rightarrow \sigma)\rightarrow\sigma} (\lambda X_{\mu} . {\sim}^{\bullet}(\varphi X)) \equiv^{\bullet}{\sim}^{\bullet}p\varphi]\) | Axiom |
| A2 | \([\forall^{\bullet}\varphi_{\mu\rightarrow \sigma} . \forall^{\bullet}\psi_{\mu \rightarrow \sigma} . (p_{(\mu \rightarrow\sigma)\rightarrow\sigma} \varphi \wedge^{\bullet} \Box^{\bullet}\forall^{\bullet}X_{\mu}.(\varphi X \supset^{\bullet}\psi X)) \supset^{\bullet}p\psi]\) | Axiom |
| T1 | \([\forall^{\bullet}\varphi_{\mu\rightarrow \sigma} . p_{(\mu\rightarrow \sigma)\rightarrow\sigma} \varphi \supset^{\bullet}\Diamond^{\bullet}\exists^{\bullet}X_{\mu}. \varphi X]\) | A1, A2 (in K) |
| D1 | \(g_{\mu\rightarrow\sigma} =\lambda X_{\mu}.\forall^{\bullet}\varphi_{\mu\rightarrow\sigma} . p_{(\mu\rightarrow \sigma)\rightarrow\sigma} \varphi\supset^{\bullet}\varphi X\) | Definition |
| A3 | \([p_{(\mu\rightarrow \sigma)\rightarrow \sigma} g_{\mu\rightarrow\sigma}]\) | Axiom |
| C | \([\Diamond^{\bullet}\exists^{\bullet}X_{\mu} . g_{\mu\rightarrow\sigma} X]\) | T1, D1, A3 (in K) |
| A4 | \([\forall^{\bullet}\varphi_{\mu\rightarrow \sigma} . p_{(\mu\rightarrow \sigma)\rightarrow\sigma} \varphi \supset^{\bullet}\Box^{\bullet}p\varphi]\) | Axiom |
| D2 | \(\ess_{(\mu\rightarrow \sigma)\rightarrow \mu \rightarrow\sigma} = \lambda \varphi_{\mu \rightarrow\sigma} . \lambda X_{\mu}. \varphi X \wedge^{\bullet} \forall^{\bullet}\psi_{\mu\rightarrow\sigma}. (\psi X \supset^{\bullet}\Box^{\bullet}\forall^{\bullet}Y_{\mu}. (\varphi Y \supset^{\bullet}\psi Y))\) | Definition |
| T2 | \([\forall^{\bullet}X_{\mu} . g_{\mu \rightarrow\sigma} X\supset^{\bullet}\ess_{(\mu\rightarrow \sigma)\rightarrow \mu\rightarrow\sigma} gX]\) | A1, D1, A4, D2 (in K) |
| D3 | \(\text{NE}_{\mu\rightarrow \sigma} = \lambda X_{\mu}.\forall^{\bullet}\varphi_{\mu\rightarrow\sigma}. (\ess \varphi X \supset^{\bullet}\Box^{\bullet}\exists^{\bullet}Y_{\mu}. \varphi Y)\) | Definition |
| A5 | \([p_{(\mu\rightarrow\sigma)\rightarrow\sigma}\text{NE}_{\mu\rightarrow\sigma}]\) | Axiom |
| T3 | \([\Box^{\bullet}\exists X_{\mu} . g_{\mu \rightarrow\sigma}X]\) | D1, C, T2, D3, A5 (in KB) |
Besides helping in the completion of the proof, the automated theorem provers were also very instrumental in the finding of some novel results. First, Gödel’s set of original assumptions was shown to be inconsistent by LEO-II by proving that self-difference becomes an essential property of every entity; a re-formulation of the definition of essence due to Dana Scott—this involved the addition of a missing conjunct, \(\varphi X\), in the definition—was shown by Nitpick to be consistent. Second, LEO-II and Satallax managed to prove C, T1 and T2 using only the logic system K and, moreover, Nitpick found a counter-model for T3 in K thus showing that more logical power is required to complete the rest of the proof. Third, using LEO-II and Satallax, it is shown that the logic system KB (system K with the Brouwer axiom) is sufficient to establish the necessity of God’s existence, \(\Box^{\bullet}\exists^{\bullet}X_{\mu} . g_{\mu \rightarrow\sigma}X\), which is a double win for automated reasoning: a gain in logical economy, and the deeper philosophical result of having effectively dismissed a major criticism against Gödel’s proof, namely his use of the stronger logic system S5. Fourth, the authors also prove in KB that:
\[[\forall^{\bullet}\varphi_{\mu\rightarrow\sigma} . \forall^{\bullet}X_{\mu} . (g_{\mu \rightarrow \sigma} X \supset^{\bullet} ({\sim}^{\bullet}(p_{(\mu \rightarrow \sigma)\rightarrow\sigma} \varphi) \supset^{\bullet} {\sim}^{\bullet}(\varphi X)))]\]as well as:
\[[\forall^{\bullet}X_{\mu} . \forall^{\bullet}Y_{\mu} . (g_{\mu \rightarrow \sigma} X \supset^{\bullet} (g_{\mu \rightarrow\sigma} Y \supset^{\bullet} X =^{\bullet} Y))],\]that is, that God is flawless and that monotheism holds, respectively. At this point, it would be fair to say that any of these results would be enough to vindicate the application of automated reasoning in exact philosophy. Now, for the bad news followed by good news: Fifth, the formula \(s_{\sigma} \supset^{\bullet} \Box^{\bullet}s_{\sigma}\) can also be formally derived, which is unfortunate since it implies that there are no contingent truths and that everything is determined, i.e. there is no free will. However, the issue has been addressed by follow-up work based on Fitting’s and Anderson’s variants of the ontological argument (Fuenmayor & Benzmüller 2017, Fitting 2002, Anderson 1990).
Abstract object theory (AOT) is a metaphysical theory of abstract objects (Zalta 1983). Abstract objects are the objects presupposed by scientific theories: numbers, natural laws, properties, states of affairs, possibilities, etc. AOT draws a fundamental distinction between ordinary objects defined as \(O!x =_{df} \Diamond E!x\) and abstract objects defined as \(A!x =_{df} \lnot\Diamond E!x\). AOT also provides two distinctive modes of predication: exemplification \((Fx\), more generally \(Fx_1 ...x_n)\) and encoding \((xF\), ‘\(x\) encodes \(F\)’, and more generally \(x_1 ...x_n F)\). AOT adds encoding to 2nd-order S5 quantified modal logic without identity, extended with definite descriptions \(\iota x \phi\), lambda expressions \(\lambda x_1 ...x_n \phi^*\) (where \(\phi^*\) means no encoding subformulas), and a free logic for complex terms (Zalta 1983, Zalta 1988). The key axioms of AOT are comprehension for abstract objects, \[\exists x(A!x \amp \forall F(xF \equiv \phi))\] with no free \(x\)’s in \(\phi\), and classical \(\lambda\)-conversion, \[[\lambda y_1 ...y_n \phi^*]x_1 ...x_n \equiv \phi^* (x_1 /y_1, ..., x_n /y_n)\] with no descriptions in \(\phi^*\). These imply comprehension for relations, \[\exists F^n \Box \forall x_1 ...x_n (F^n x_1 ...x_n \equiv \phi^*)\] with no descriptions in \(\phi^*\). Other principles include \[\begin{align}O!x &\rightarrow \Box \lnot\exists F xF\\ \Diamond xF &\rightarrow \Box xF\\ O!x \amp O!y & \rightarrow(x = y \rightarrow \Box \forall F(Fx \equiv Fy))\\ A!x \amp A!y & \rightarrow(x = y \rightarrow \Box \forall F(xF \equiv yF))\end{align}\] and \(\iota x(A!x \amp \forall F(xF \equiv \phi))\) always being well-defined. To give a sense of the expressive power and application of AOT, here are some examples of AOT’s ability to define metaphysical entities as abstract objects and derive interesting results (Zalta 2018):
Plato’s Forms (e.g. triangle):
\(\Phi_T =_{df}\iota x(A!x \amp \forall F(xF \equiv \Box \forall x(Tx \rightarrow Fx)))\)

Leibniz’s Concepts (e.g. Alexander):
\(c_a =_{df}\iota x(A!x \amp \forall F(xF \equiv Fa))\)

Frege Numbers:
\(0 =_{df}\iota x(A!x \amp \forall F(xF \equiv \lnot\exists yFy))\)
\(1 =_{df}\iota x(A!x \amp \forall F(xF \equiv \exists y(Fy \amp \forall z(Fz \rightarrow z = y))))\)
etc.

Truth Values:
\(\top =_{df}\iota x(A!x \amp \forall F(xF \equiv \exists p(p \amp F = [\lambda y\, p])))\)
\(\bot =_{df}\iota x(A!x \amp \forall F(xF \equiv \exists p(\lnot p \amp F = [\lambda y\, p])))\)

Situations and Possible Worlds:
\(\textit{Situation}(x) =_{df} \forall F(xF \rightarrow \exists p(F = [\lambda y\, p]))\)
\(s \vDash p =_{df} s[\lambda y\, p]\)
\(\textit{PossibleWorld}(x) =_{df} \Diamond \forall p((x \vDash p) \equiv p)\), from which one can derive Leibniz’s Principle that \(p\) is necessary if true in all worlds, \(\vdash \Box p \equiv \forall w(w \vDash p)\), and also Lewis’ Principle that for every way the world might be, there is a world which is that way, \(\vdash \Diamond p \equiv \exists w(w \vDash p)\)

Theoretical Mathematical Objects (e.g. the null set in ZF):
\(\varnothing_{ZF} =_{df}\iota x(A!x \amp \forall F(xF \equiv ZF \vDash F\varnothing))\)
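To make the comprehension and identity principles concrete, here is a toy, fully extensional model of our own devising (far simpler and weaker than the Aczel models used in the actual Isabelle/HOL embedding, and ignoring modality and hyperintensionality entirely): abstract objects are modeled as sets of properties and 'x encodes F' as membership, so comprehension and the identity condition for abstract objects hold by construction.

```python
# Toy extensional model of AOT's abstract objects over a tiny property stock.
from itertools import combinations

properties = ['P', 'Q', 'R']

def powerset(xs):
    return [frozenset(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]

# An abstract object is identified with the set of properties it encodes.
abstracts = powerset(properties)

def encodes(x, F):
    return F in x

# Comprehension: for any condition phi on properties there is an abstract
# object encoding exactly the properties satisfying phi.
def comprehension(phi):
    witness = frozenset(F for F in properties if phi(F))
    assert witness in abstracts
    assert all(encodes(witness, F) == phi(F) for F in properties)
    return witness

comprehension(lambda F: F != 'Q')      # e.g. "encodes everything but Q"

# Identity for abstract objects: same encoded properties => same object.
for x in abstracts:
    for y in abstracts:
        same_encoding = all(encodes(x, F) == encodes(y, F) for F in properties)
        assert same_encoding == (x == y)
print("comprehension and identity hold in the toy model")
```

The real difficulty, discussed next, is precisely what this toy model avoids: reconciling AOT's relational, hyperintensional setting with the functional type theory of a higher-order prover.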
AOT is under continuous development and for further details of the theory the reader is referred to one of its latest formulations (Zalta 2022). The computational analysis of AOT was pioneered by Fitelson and Zalta (Fitelson & Zalta 2007) using the first-order system Prover9. Conducting computational investigations of a higher-order theory like AOT using a first-order prover has inherent limitations and it would be preferable to work within the computational framework of a higher-order prover. AOT, however, is based on a logical foundation that is significantly different from classical higher-order logic and, ideally, one would want to work with a theorem prover for AOT itself. The downside to this is, of course, that one would need to build such a prover and this is no trivial task. But one can “approximate” such a system to a large extent by building instead a shallow semantic embedding (SSE) of AOT into an existing higher-order prover like e.g. Isabelle/HOL where the researcher can faithfully represent AOT’s axioms and deductive system (Benzmüller 2019, Kirchner 2021). In this setting, Isabelle/HOL acts as the metalogical framework for the SSE that provides the “custom” theorem prover for AOT. But there is always a trade-off, and building the embedding brings its own set of challenges. Key among these is that there are aspects of AOT which can be easily formulated in relational type theory but present a challenge when being re-formulated in the underlying functional type theory of Isabelle/HOL. For example, not every formula in AOT can be converted to a \(\lambda\)-term unless one is willing to face a contradiction! With some ingenuity, one can use Isabelle/HOL’s functional calculus to define some complex types to help build an Aczel model of AOT, and then interpret those offending \(\lambda\)-expressions in terms of the complex types, all in the context of a free logic.
The bottom line: every formula of AOT can then be expressed as a \(\lambda\)-term, but not all of these terms denote; hence, consistency is preserved.
Key aspects of the SSE of AOT in Isabelle/HOL include the model construction of the embedding using Aczel models, reproducing the syntax of AOT, extending Isabelle’s “outer” syntax (to deal with certain challenges of reasoning in AOT), representing an abstract semantics of AOT, specifying the logic of the Hilbert \(\varepsilon\) operator, representing the logic of the actuality operator, representing hyperintensionality, deriving the axiom system and deductive system, and other considerations—see Kirchner 2021 for details. Salient among these is the use of abstraction layers (Kirchner 2017), which play an important role in determining the derivability of a statement from the deductive system of the target theory (here, AOT). An abstraction layer is constructed by proving that the axioms and deduction rules of AOT are semantically valid in the SSE; after this, all subsequent reasoning (as conducted by, e.g., sledgehammer, Isabelle/HOL’s main tool for automated reasoning) is restricted to rely on the derived axioms and deduction rules themselves and may not refer to the underlying semantics. The work took about 25,000 lines of Isabelle/HOL: about 5,000 lines to build the required model structure and semantics as well as the syntax representation of AOT, and the remaining 20,000 for the logical reasoning in AOT. Under the embedding, computational explorations of AOT can be conducted in a more “native” fashion, as illustrated below in the 9-line proof in Isabelle/HOL notation that no object is both ordinary and abstract (Kirchner 2021):
AOT_theorem partition: ❬¬∃x (O!x & A!x)❭
proof(rule "raa-cor:2")
  AOT_assume ❬∃x (O!x & A!x)❭
  then AOT_obtain a where ❬O!a & A!a❭
    using "∃E"[rotated] by blast
  AOT_thus ❬p & ¬p❭ for p
    by (metis "&E"(1) "Conjunction Simplification"(2) "≡E"(1)
        "modus-tollens:1" "oa-contingent:2" "raa-cor:3")
qed
An additional 1,000 lines of such computational derivations lead to the result that there are distinct abstract objects which cannot be distinguished by exemplification: \(\exists x\exists y(A!x \amp A!y \amp x \ne y \amp \forall F(Fx \equiv Fy))\). A few more derivations yield a significant novel discovery which provides the analytical means to determine whether a \(\lambda\)-expression denotes in AOT: \([\lambda x\, \phi]\downarrow \equiv \Box \forall x\forall y(\forall F(Fx \equiv Fy) \rightarrow (\phi \equiv \phi(y/x)))\) with \(y\) not free in \(\phi\). And, as a corollary, \([\lambda x\, \phi]\downarrow \rightarrow \forall x\forall y(\forall F(Fx \equiv Fy) \rightarrow \Box(\phi \equiv \phi(y/x)))\) with \(y\) not free in \(\phi\). The proof of the latter in the context of the SSE takes the 20 lines in Isabelle/HOL given below:
AOT_theorem "kirchner-thm-cor:1":
  ❬[λx φ{x}]↓ → ∀x∀y(∀F([F]x ≡ [F]y) → □(φ{x} ≡ φ{y}))❭
proof(rule "→I"; rule GEN; rule GEN; rule "→I")
  fix x y
  AOT_assume ❬[λx φ{x}]↓❭
  AOT_hence ❬□∀x∀y (∀F ([F]x ≡ [F]y) → (φ{x} ≡ φ{y}))❭
    by (rule "kirchner-thm:1"[THEN "≡E"(1)])
  AOT_hence ❬∀x □∀y (∀F ([F]x ≡ [F]y) → (φ{x} ≡ φ{y}))❭
    using CBF[THEN "→E"] by blast
  AOT_hence ❬□∀y (∀F ([F]x ≡ [F]y) → (φ{x} ≡ φ{y}))❭
    using "∀E" by blast
  AOT_hence ❬∀y □(∀F ([F]x ≡ [F]y) → (φ{x} ≡ φ{y}))❭
    using CBF[THEN "→E"] by blast
  AOT_hence ❬□(∀F ([F]x ≡ [F]y) → (φ{x} ≡ φ{y}))❭
    using "∀E" by blast
  AOT_hence ❬□∀F([F]x ≡ [F]y) → □(φ{x} ≡ φ{y})❭
    using "qml:1"[axiom_inst] "vdash-properties:6" by blast
  moreover AOT_assume ❬∀F([F]x ≡ [F]y)❭
  ultimately AOT_show ❬□(φ{x} ≡ φ{y})❭ using "→E" "ind-nec" by blast
qed
After establishing further results about basic logical objects, restricted variables, the extended relation comprehension, and possible worlds, the computational exploration can then be redirected to Dedekind-Peano arithmetic, where its postulates for natural numbers are formally derived in a system free of mathematical primitive notions and mathematical axioms—Frege’s Theorem—thereby supporting the claim that AOT can provide a philosophically grounded basis for the objects of mathematics. The computational approach in Kirchner 2021 is guided by a proof outline previously given in Zalta 1999 but that now, reconstructed in Isabelle/HOL, produces derivations of the postulates in full detail and formality.
Postulate 1
AOT_theorem "0-n": ❬[ℕ]0❭
Postulate 2
AOT_theorem "0-pred": ❬¬∃n [ℙ]n 0❭
Postulate 3
AOT_theorem "no-same-succ": ❬∀n∀m∀k([ℙ]nk & [ℙ]mk → n = m)❭
Postulate 4
AOT_theorem "th-succ": ❬∀n∃!m [ℙ]nm❭
Postulate 5
AOT_theorem induction: ❬∀F([F]0 & ∀n∀m([ℙ]nm → ([F]n → [F]m)) → ∀n[F]n)❭
The computational explorations described above were done using the second-order fragment of AOT, but the SSE could be extended to the full higher-order logic of AOT (Kirchner 2021), where it could be applied to the analysis of theoretical mathematics. It is important to stress that embedding a target theory within the higher-order logic of an existing prover results in more than just a formalization of the theory: the SSE allows for the discovery of new results within the target theory, as exemplified above, as well as the study and further development of the target theory itself, such as placing the theory on firmer foundations, e.g., by avoiding known paradoxes (Zalta 2018, Kirchner 2021). The computational analysis of AOT described here can also be construed as yet another test of the concept of embedding theories, simple and complex alike, within the framework of a higher-order prover. It illustrates the power and convenience of the approach, and researchers in automated reasoning may want to seriously consider using an SSE in their theorem-proving efforts (Benzmüller 2019).
Leibniz’s dream was to have a characteristica universalis and calculus ratiocinator that would allow us to reason in metaphysics and morals in much the same way as we do in geometry and analysis; that is to say, to settle disputes between philosophers as accountants do: “To take pen in hand, sit down at the abacus and, having called in a friend if they want, say to each other: Let us calculate!” From the above applications of automated reasoning, one would agree with the researchers when they imply that these results achieve, to some extent, Leibniz’s goal of a computational metaphysics (Fitelson & Zalta 2007, Benzmüller & Paleo 2014).
Work in computational metaphysics has implications in other areas of philosophy, such as epistemology. An obvious example is our improved epistemological standing when errors in our reasoning are (computationally) detected and corrected. Also, proofs produced by automated reasoning systems can help us better understand complex arguments, and see more quickly the consequences of revising our theories by the introduction, or removal, of axioms—a sort of “what-if analysis”. To illustrate: in the desire to simplify the foundations of AOT, one can attempt the removal of constraints in the comprehension principle, but it can be shown that this move leads to a paradox in a non-trivial way (Kirchner 2021). Finding alternative axiom sets for a given theory can help reduce the epistemological load needed to prove meta-theoretical results such as soundness. In brief, “one of the great benefits of using computational techniques is that it enables us to see exactly what the commitments of our theories are” (Zalta 2018).
As a direct application in epistemology, a nonmonotonic theorem prover can provide the basis for a “computational laboratory” in which to explore and experiment with different models of artificial rationality; the theorem prover can be used to equip an artificial rational agent with an inference engine to reason and gain information about the world. In such procedural epistemology, a rational agent is defeasible (i.e., nonmonotonic) in the sense that new reasoning leads to the acceptance of new beliefs but also to the retraction of previously held beliefs in the presence of new information. At any given point in time, the agent holds a set of justified beliefs, but this set is open to revision and is in a continuous state of flux as further reasoning is conducted. This model better reflects our accepted notion of rationality than a model in which all the beliefs are warranted, i.e., beliefs that, once attained, are never retracted. Actually, a set of warranted beliefs can be seen as justified beliefs “in the limit”, that is, as the ultimate epistemic goal in the agent’s search for true knowledge about its world. (Pollock 1995) offers the following definition:
A set \(A\) is defeasibly enumerable iff there is an effectively computable function \(f\) such that for each \(n\), \(f(n)\) is a recursive set and the following two conditions hold: (1) if \(x \in A\) then there is an \(n\) such that \(x \in f(m)\) for every \(m > n\); (2) if \(x \notin A\) then there is an \(n\) such that \(x \notin f(m)\) for every \(m > n\).
To compare the concepts: if \(A\) is recursively enumerable then there is a sequence of recursive sets \(A_i\) such that each \(A_i\) is a subset of \(A\), with each \(A_i\) growing monotonically, approaching \(A\) in the limit. But if \(A\) is only defeasibly enumerable then the \(A_i\)’s still approach \(A\) in the limit but may not be subsets of \(A\), approaching \(A\) intermittently from above and below. The goal of the OSCAR Project (Pollock 1989) is to construct a general theory of rationality and implement it in an artificial computer-based rational agent. As such, the system uses a defeasible automated reasoner that operates according to the maxim that the set of warranted beliefs should be defeasibly enumerable. OSCAR has been in the making for some time, and automated nonmonotonic reasoning has also been used to extend its capabilities to reason defeasibly about perception and time, causation, and decision-theoretic planning (Pollock 2006).
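The contrast between monotone and defeasible approximation can be made concrete with a toy sketch in Python (the target set and retraction schedule are invented for illustration and are not from Pollock): the approximations overshoot (a belief later retracted) as well as undershoot before stabilizing on the target.

```python
# Toy illustration of defeasible enumerability: each approximation f(n) is a
# finite (hence recursive) set that approaches the target A in the limit but,
# unlike a monotone recursive enumeration, is not always a subset of A --
# it may contain non-members that are later retracted. The target set and
# the retraction schedule below are assumptions for illustration.

A = {0, 2, 4, 6, 8}                        # the target set (evens below 10)

def f(n):
    """n-th approximation: members trickle in; 5 is mistakenly believed early."""
    approx = {k for k in A if k <= n}      # undershoot: members arrive gradually
    if n < 7:
        approx.add(5)                      # overshoot: a belief later retracted
    return approx

def stabilizes(x, horizon=20):
    """Membership of x in f(n) eventually agrees with membership in A."""
    return all((x in f(n)) == (x in A) for n in range(10, horizon))

print([sorted(f(n)) for n in (2, 6, 10)])  # note 5 appears early, then vanishes
print(all(stabilizes(x) for x in range(10)))
```

Here f(6) is not a subset of A (it still contains 5), so the sequence approaches A from above and below, exactly the behavior that distinguishes defeasible from recursive enumeration.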
One of the main goals of automated reasoning has been the automation of mathematics. An early attempt at this was Automath (de Bruijn 1968), the first computer system used to check the correctness of proofs and whole books of mathematics, including Landau’s Grundlagen der Analysis (van Benthem Jutting 1977). Automath has been superseded by more modern and capable systems, most notably Mizar. The Mizar system (Trybulec 1979, Muzalewski 1993) is based on Tarski-Grothendieck set theory and, like Automath, consists of a formal language which is used to write mathematical theorems and their proofs. Once a proof is written in the language, it can be checked automatically by Mizar for correctness. Mizar proofs are formal but quite readable, can refer to definitions and previously proved theorems and, once formally checked, can be added to the growing Mizar Mathematical Library (MML) (Bancerek & Rudnicki 2003, Bancerek et al. 2018). As of June 2018, MML contained about 12,000 definitions and 59,000 theorems. The Mizar language is a subset of standard English as used in mathematical texts and is highly structured to ensure the production of rigorous and semantically unambiguous texts. Here is a sample proof in Mizar of the existence of a rational number \(x^y\) where \(x\) and \(y\) are irrational:
theorem T2:
  ex x, y st x is irrational & y is irrational & x.^.y is rational
proof
  set w = √2;
  H1: w is irrational by INT_2:44, T1;
  w > 0 by AXIOMS:22, SQUARE_1:84;
  then (w.^.w).^.w = w.^.(w*w) by POWER:38
    .= w.^.(w^2) by SQUARE_1:58
    .= w.^.2 by SQUARE_1:88
    .= w^2 by POWER:53
    .= 2 by SQUARE_1:88;
  then H2: (w.^.w).^.w is rational by RAT_1:8;
  per cases;
  suppose H3: w.^.w is rational;
    take w, w;
    thus thesis by H1, H3;
  suppose H4: w.^.w is irrational;
    take w.^.w, w;
    thus thesis by H1, H2, H4;
end;
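The chain of equalities at the heart of the Mizar proof, \((\sqrt{2}^{\sqrt{2}})^{\sqrt{2}} = \sqrt{2}^{\sqrt{2}\cdot\sqrt{2}} = \sqrt{2}^2 = 2\), can be checked numerically; a quick sketch (illustration only — the Mizar proof itself is purely formal, and floating-point agreement is of course no substitute for it):

```python
# Numeric sanity check of the equality chain used in the Mizar proof:
# (w^w)^w = w^(w*w) = w^2 = 2, for w = sqrt(2).
import math

w = math.sqrt(2)
lhs = (w ** w) ** w          # (sqrt2 ^ sqrt2) ^ sqrt2
rhs = w ** (w * w)           # sqrt2 ^ (sqrt2 * sqrt2) = sqrt2 ^ 2

print(math.isclose(lhs, 2.0))   # an irrational-to-irrational power, yet rational
print(math.isclose(rhs, 2.0))
```

The proof is non-constructive in an instructive way: it shows one of the two pairs \((\sqrt{2}, \sqrt{2})\) or \((\sqrt{2}^{\sqrt{2}}, \sqrt{2})\) works without deciding which, which is exactly the case split (`per cases`) in the Mizar text.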
Examples of proofs that have been checked by Mizar include the Hahn-Banach theorem, the Brouwer fixed-point theorem, Kőnig’s lemma, the Jordan curve theorem, and Gödel’s completeness theorem. Rudnicki (2004) discusses the challenges of formalizing Witt’s proof of the Wedderburn theorem: Every finite division ring is commutative. The theorem was formulated easily using the existing formalizations available in MML, but the proof demanded further entries into the library to formalize notions and facts from algebra, complex numbers, integers, roots of unity, cyclotomic polynomials, and polynomials in general. It took several months of effort to supply the missing material to the MML library but, once in place, the proof was formalized and checked correct in a matter of days. Clearly, a repository of formalized mathematical facts and definitions is a prerequisite for more advanced applications. The QED Manifesto (Boyer et al. 1994, Wiedijk 2007) has such an aim in mind and there is much work to do: Mizar has the largest such repository but even after 30 years of work “it is minuscule with respect to the body of established mathematics” (Rudnicki 2004). This last remark should be construed as a call to increase the effort toward this important aspect of the automation of mathematics.
Mizar’s goal is to assist the practitioner in the formalization of proofs and to help check their correctness; other systems aim at finding the proofs themselves. Geometry was a target of early automated proof-finding efforts. Chou (1987) proves over 500 geometry theorems using the algebraic approach offered by Wu’s method and the Gröbner basis method, representing hypotheses and conclusions as polynomial equations. Quaife (1992) provides another early effort to find proofs in first-order mathematics: over 400 theorems in Neumann-Bernays-Gödel set theory, over 1,000 theorems in arithmetic, a number of theorems in Euclidean geometry, and Gödel’s incompleteness theorems. The approach is best described as semi-automatic or “interactive”, with the user providing a significant amount of input to guide the theorem-proving effort. This is no surprise since, as one applies automated reasoning systems to richer areas of mathematics, the systems take on the role of proof assistants more than that of theorem provers. This is because in richer mathematical domains the systems need to reason about theories and higher-order objects, which in general takes them deeper into the undecidable. Interactive theorem proving is arguably the “killer” application of automated reasoning in mathematics, and much effort is being expended in the building of increasingly capable reasoning systems that can act as assistants to professional mathematicians. The proof assistant Isabelle/HOL provides the user with an environment in which to conduct proofs expressed in a structured, yet human-readable, higher-order logic language and which incorporates a number of facilities that increase the user’s productivity, automates proof-verification and proof-finding tasks, and provides a modular way for the user to build and manage theory hierarchies (Ballarin 2014).
More recently, using both automated and interactive theorem proving techniques, Quaife’s work in Tarskian geometry has been extended with the proving of additional theorems (some of which required Ph.D.-level proofs), including four challenge problems left unsolved by Quaife, and the derivation of Hilbert’s 1899 axioms for geometry from Tarski’s axioms (Beeson and Wos 2017).
Different proof assistants offer different capabilities, measured by their power at automating reasoning tasks, supported logic, object typing, size of mathematical library, and readability of input and output. A “canonical” proof which is not too trivial but not too complex either can be used as a baseline for system comparison, as done in (Wiedijk 2006), where the authors of seventeen reasoning systems are tasked with establishing the irrationality of \(\sqrt{2}\). The systems discussed are certainly more capable than this, and some have been used to assist in the formalization of far more advanced proofs such as Erdös-Selberg’s proof of the Prime Number Theorem (about 30,000 lines in Isabelle), the formalization of the Four Color Theorem (60,000 lines in Coq), and the Jordan Curve Theorem (75,000 lines in HOL Light). A milestone in interactive theorem proving was reached in 2012 when, after six years of effort and using the Coq proof assistant, Georges Gonthier and his team completed the formal verification of the 255-page proof of the Feit-Thompson theorem, also known as the Odd Order Theorem, a major step in the classification of finite simple groups. Other, more recent, successes include the resolution of Keller’s conjecture (Brakensiek et al. 2022), the formalization of metric spaces (Maggesi 2018), and the formalization and classification of finite fields (Chan and Norrish 2019).
The above notwithstanding, automated reasoning has had a small impact on the practice of doing mathematics, and there are a number of reasons given for this. One reason is that automated theorem provers are not sufficiently powerful to attempt the kind of problems mathematicians typically deal with; that their current power is, at best, at the level of first-year undergraduate mathematics and still far from leading-edge mathematical research. While it is true that current systems cannot prove entirely on their own problems at this level of difficulty, we should remember that the goal is to build reasoning systems so that “eventually machines are to be an aid to mathematical research and not a substitute for it” (Wang 1960). With this in mind, and while the automated reasoning community continues to try to meet the grand challenge of building increasingly powerful theorem provers, mathematicians can draw now some of the benefits offered by current systems, including assistance in completing proof gaps or formalizing and checking the correctness of proposed proofs. Indeed, the latter may be an application that could help address some real issues currently being faced by the mathematical community. Consider the announcement by Daniel Goldston and Cem Yildirim of a proof of the Twin Prime Conjecture where, although experts initially agreed that the proof was correct, an insurmountable error was found shortly after. Or, think about the case of Hales’ proof of the Kepler Conjecture, which asserts that no packing of congruent balls in Euclidean 3-space has density greater than the face-centered cubic packing. Hales’ proof consists of about 300 pages of text and a large number of computer calculations. After four years of hard work, the 12-person panel assigned by Annals of Mathematics to the task of verifying the proof still had genuine doubts about its correctness.
Thomas Hales, for one, took it upon himself to formalize his proof and have it checked by an automated proof assistant with the aim of convincing others of its correctness (Hales 2005b, in Other Internet Resources). His task was admittedly heavy but the outcome is potentially very significant to both the mathematical and automated reasoning communities. All eyes were on Hales and his formal proof as he announced the completion of the Flyspeck project (Hales 2014, in Other Internet Resources; Hales 2015), having constructed a formal proof of the conjecture using the Isabelle and HOL Light automated proof assistants: “In truth, my motivations for the project are far more complex than a simple hope of removing residual doubt from the minds of a few referees. Indeed, I see formal methods as fundamental to the long-term growth of mathematics.” (Hales 2006).
Church 1936a, 1936b and Turing 1936 imply the existence of theoremswhose shortest proof is very large, and the proof of the Four ColorTheorem in (Appel & Haken 1977), the Classification of SimpleGroups in (Gorenstein 1982), and the proof of the Kepler Conjecture in(Hales 2005a) may well be just samples of what is yet to come. As(Bundy 2011) puts it: “As important theorems requiring largerand larger proofs emerge, mathematics faces a dilemma: either thesetheorems must be ignored or computers must be used to assist withtheir proofs.”
The above remarks also counter another argument given for not using automated theorem provers: mathematicians enjoy proving theorems, so why let machines take away the fun? The answer to this is, of course, that mathematicians can have even more fun by letting the machine do the more tedious and menial tasks: “It is unworthy of excellent men to lose hours like slaves in the labour of calculation which could safely be relegated to anyone else if machines were used” (G. W. Leibniz, New Essays Concerning Human Understanding). If still not convinced, just consider the sobering prospect of having to manually check the 23,000 inequalities used in Hales’ proof!
Another reason given for the weak acceptance of automated reasoning by the mathematical community is that the programs are not to be trusted since they may contain bugs—software defects—and hence may produce erroneous results. Formally verifying automated reasoning programs will help ameliorate this, particularly in the case of proof checkers. Proving programs correct is no easy task, but the same is true about proving theorems in advanced mathematics: Gonthier proved correct the programs used in the formalization of his proof of the Four Color Theorem, but he spent far more effort formalizing all the graph theory that was part of the proof. Ironically enough, it turns out that at least in this case, and surely there are others, “it is actually easier to verify the correctness of the program than to verify the correctness of the pen-and-paper mathematics” (Wiedijk 2006). For theorem provers and model finders, a complementary strategy would be to verify the programs’ results as opposed to the programs themselves. Paraphrasing (Slaney 1994): it does not matter to the mathematician how many defects a program may have as long as the proof (or model) it outputs is correct. So, the onus is on the verification of results, whether produced by machine or human, and checking them by independent parties (where of course the effort may well use automated checkers) should increase the confidence in the validity of the proofs.
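The result-verification strategy — trusting the proof object rather than the prover that produced it — can be sketched with a toy independent checker for propositional resolution refutations (the proof format, clause encoding, and example below are hypothetical, for illustration only):

```python
# Toy independent proof checker: rather than trusting the prover, verify each
# step of the resolution refutation it outputs. Clauses are frozensets of
# literals; a negative literal is written "~p". The format is illustrative.

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def check_refutation(axioms, steps):
    """Each step (i, j, lit) resolves already-derived clauses i and j on lit.
    Returns True iff every step is a sound resolution inference and the
    final derived clause is the empty clause (a genuine refutation)."""
    derived = [frozenset(c) for c in axioms]
    for i, j, lit in steps:
        ci, cj = derived[i], derived[j]
        if lit not in ci or negate(lit) not in cj:
            return False                      # bogus step: reject the proof
        derived.append((ci - {lit}) | (cj - {negate(lit)}))
    return derived[-1] == frozenset()

# Prover's claim: q follows from p and p -> q, i.e. {p}, {~p,q}, {~q} is
# unsatisfiable. The checker replays its refutation step by step.
axioms = [["p"], ["~p", "q"], ["~q"]]
steps = [(0, 1, "p"),      # resolve {p} with {~p,q} -> {q}  (clause 3)
         (3, 2, "q")]      # resolve {q} with {~q}   -> {}   (clause 4)
print(check_refutation(axioms, steps))
```

The checker is a few lines long and independent of the (possibly buggy) prover, which is precisely why checking the output is often easier than verifying the prover itself.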
It is often argued that automated proofs are too long and detailed. That a proof can be expressed in more elementary steps is in principle very beneficial since this allows a mathematician to ask a proof assistant to justify its steps in terms of simpler ones. But proof assistants should also allow the opposite, namely to abstract detail and present results and their justifications using the higher-level concepts, language, and notation mathematicians are accustomed to. Exploiting the hierarchical structure of proofs, as done in (Denney 2006), is a step in this direction but more work along these lines is needed. Having the proof assistant work at the desired level of granularity provides more opportunity for insight during the proof discovery process. This is an important consideration since mathematicians are as interested in gaining understanding from their proofs as in establishing facts—more about this below.
(Bundy 2011) alludes to a deadlock that is preventing the wider adoption of theorem provers by the mathematical community: on the one hand, mathematicians need to use the proof assistants to build a large formal library of mathematical results. But, on the other hand, they do not want to use the provers since there is no such library of previously proved results they can build upon. To break the impasse, a number of applications are proposed, of which assisting the mathematician in the search for previously proved theorems is of particular promise. Indeed, a thoughtful reuse of library results can lead to concise proofs of non-trivial mathematical problems, as exemplified in the proving of some fundamental theorems of linear algebra (Aransay and Divasón 2017) and probability theory (Avigad, Hölzl and Serafin 2017). During its history, mathematics has accumulated a huge number of theorems and the number of mathematical results continues to grow dramatically. In 2010, Zentralblatt MATH covered about 120,000 new publications (Wegner 2011). Clearly, no individual researcher can be acquainted with all this mathematical knowledge, and it will be increasingly difficult to cope with one’s ever-growing area of specialty unless assisted with automated theorem-proving tools that can search in intelligent ways for previously proved results of interest. An alternative approach to this problem is for mathematicians to tap into each other’s knowledge as enabled in computational social systems like polymath and mathoverflow. The integration of automated reasoning tools into such social systems would increase the effectiveness of their collective intelligence by supporting “the combination of precise formal deductions and the more informal loose interaction seen in mathematical practice” (Martin & Pease 2013, in Other Internet Resources).
Due to real pressing needs from industry, some applications of automated reasoning in pure and applied mathematics are more of necessity than choice. After having worked on the formalization of some elementary real analysis to verify hardware-based floating point trigonometric functions, (Harrison 2006, Harrison 2000) mentions the further need to formalize more pure mathematics—italics are his—to extend his formalization to power series for trigonometric functions and basic theorems about Diophantine approximations. Harrison finds it surprising that “such extensive mathematical developments are used simply to verify that a floating point tangent function satisfies a certain error bound” and, from this remark, one would expect there to be other industrial applications that will demand more extensive formalizations.
Albeit not at the rate originally anticipated, automated reasoning is finding applications in mathematics. Of these, formal verification of proofs is of special significance since it not only provides a viable mechanism to check proofs that humans alone could not, but also has, as a side effect, the potential to redefine what it takes for a proof to be accepted as such. As the use of automated reasoning assistants becomes more widespread, one can envision their use following a certain methodical order: first, automated reasoning tools are used for theory exploration and discovery. Then, having identified some target problem, the practitioner works interactively with an automated assistant to find proofs and establish facts. Finally, an automated proof checker is used to check the correctness of all final proofs prior to their being submitted for publication and made available to the rest of the mathematical community via the creation of new entries in a repository of formalized mathematics. It is indeed a matter of time before the application of automated proof assistants becomes an everyday affair in the life of the mathematician; it is the grand challenge of the automated reasoning community to make it happen sooner rather than later.
Besides formal verification, or certification, another important aspect of a proof is the explanation it provides; that is, the reasons it gives as to why the given statement is actually true. This, needless to say, is an important source of insight, and for many mathematicians it may be the single most valuable aspect of a proof, as it allows them to gain a better understanding of the nature of the statement being established and of the mathematical theory and objects involved; moreover, the approach used in the proof has the potential of being applicable to the proving of other mathematical results. So when it comes to building theorem provers to be used by the mathematical community, perhaps there should be less emphasis on the provers as certifiers, i.e., as proof checkers, and more emphasis on the provers as proof solvers, i.e., as assistants in helping the mathematician complete proofs and explain the steps in doing so. Assuming that a prover were powerful enough to successfully attack proofs in the mathematician’s area of research, it would be very desirable, yet admittedly extremely challenging, if the prover were to also use the very same notation, methods, and techniques of proof-solving used by the mathematician herself. One could safely bet that mathematicians would be more receptive to this type of human-oriented theorem prover because the prover would work the same way the mathematician does. More mathematicians may initially approach such a prover simply out of sheer curiosity; that is, just to see how the prover would go about proving a previously established result. This would certainly be of much interest to students of mathematics since they may need to see how a given problem (say, from their textbook) has been solved and, by inspecting the proof, want to learn how to do it on their own.
Virtually all automated theorem provers nowadays go about building their proofs in ways far more akin to machines than to humans (e.g., resolution-driven, connection methods). There are some systems that, after establishing a fact, can present the higher-level steps in the proof in a form more amenable to humans, doing so by translating the machine-oriented steps into a human-readable format. The approach has merit but it has a significant limitation: how deep into the proof can one continue asking for an explanation before the translation breaks down? An alternative approach could address the situation head on: why not build a system that directly proves theorems the way mathematicians actually do? This is indeed a very tall order, but the question is not new and it has been taken up in various forms by a small minority of researchers going back to the early days of automated theorem proving, including Bledsoe 1977 (investigations into non-resolution methods), Boyer and Moore 1979 (induction by recursive term rewriting), Bundy et al. 1991 (automated inductive theorem proving), Clarke and Zhao 1994 (integration of theorem provers with symbolic algebra systems), Portoraro 1994 (automated advice to students building symbolic logic proofs), Portoraro 1998 (strategic proof construction the way teachers of symbolic logic do and teach it), Pelletier 1998 (building proofs in the predicate calculus the way humans do it), Beeson 2001 (proof generation using mathematical methods), Buchberger et al. 2016 (computer-assisted natural-style mathematics), and others. More recently, Ganesalingam and Gowers 2017 describe a theorem prover that solves and presents proofs of elementary problems in metric space theory, where the program’s approach is hard to distinguish from what a mathematician might prove and write. To illustrate, below is the program’s proof that the intersection of two open subsets of a metric space is itself open, given as it was both solved and written by the program:
Proof. Let \(x\) be an element of \(A\cap B\). Then \(x \in A\) and \(x \in B\). Therefore, since \(A\) is open, there exists \(\eta > 0\) such that \(u \in A\) whenever \(d(x, u) \lt \eta\) and since \(B\) is open, there exists \(\theta > 0\) such that \(v \in B\) whenever \(d(x, v) \lt \theta\). We would like to find \(\delta > 0\) s.t. \(y \in A\cap B\) whenever \(d(x, y) \lt \delta\). But \(y \in A\cap B\) if and only if \(y \in A\) and \(y \in B\). We know that \(y \in A\) whenever \(d(x, y) \lt \eta\) and that \(y \in B\) whenever \(d(x, y) \lt \theta\). Assume now that \(d(x, y) \lt \delta\). Then \(d(x, y) \lt \eta\) if \(\delta \le \eta\) and \(d(x, y) \lt \theta\) if \(\delta \le \theta\). We may therefore take \(\delta = \text{min}\{\eta , \theta \}\) and we are done.
Impressive as it may be, the authors acknowledge a number of shortcomings and many outstanding challenges in the building and presentation of such proofs, even for elementary problems in mathematics, and the reader is referred to their article for details. As we have already mentioned, building increasingly powerful provers that can attack problems in advanced areas of mathematical research, and doing so in a human-oriented fashion, is admittedly extremely challenging but also, we should add, a very worthwhile, promising, and rewarding line of research. The embrace of this kind of prover by mathematicians will have clear practical applications both in research and education. There will be philosophical implications too, especially the moment the provers are endowed with the additional ability to ask for assistance when getting stuck in a proof and, when this happens, we will be pressed to ask: who is interacting with whom, the human with the machine or vice versa? The experience of interacting with a theorem prover will have been extended to a new realm, becoming truly two-sided, more intimate and richer, more of a collaboration than an interaction: the dawn of collaborative theorem proving.
Since its inception, the field of automated theorem proving has had important applications in the larger field of artificial intelligence (AI). Automated deduction is at the heart of AI applications like logic programming (see Section 4.1 Logic Programming) where computation is equated with deduction; robotics and problem solving (Green 1969) where the steps to achieve goals are steps extracted from proofs; deductive databases (Minker et al. 2014) where factual knowledge is expressed as atomic clauses and inference rules, and new facts are inferred by deduction; expert systems (Giarratano & Riley 2004) where human expertise in a given domain (e.g. blood infections) is captured as a collection of IF-THEN deduction rules and where conclusions (e.g. diagnoses) are obtained by the application of the inference rules; and many others. An application of automated reasoning in AI which is bound to have deep philosophical implications is the increased use of BDI computational logics for describing the beliefs, desires, and intentions of intelligent agents and multi-agent systems (Meyer 2014) and, in particular, endowing future intelligent systems, such as decision-support systems or robots, with legal and ethical behaviour. Deontic logic can be automated for the task (Furbach et al. 2014) but given that there is no agreement on a universal system of deontic logic, ethics “code designers” need a way to experiment with the different deontic systems (i.e., to lay out axioms and see what conclusions follow from them) to help them identify the desired ethical code for the specific application at hand; Benzmüller et al. 2018 discusses an environment for this. If actual, physical, robots were to be used in these experiments, the term “deontic laboratory” would be quite descriptive albeit somewhat eerie.
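The expert-system idea described above—domain expertise as IF-THEN rules, conclusions obtained by applying those rules to known facts—can be sketched in a few lines of Python. The rules and facts below are illustrative toy data (loosely echoing the blood-infection domain mentioned), not drawn from any real system:

```python
# Minimal forward-chaining inference: each rule is (IF-part, THEN-part),
# where the IF-part is a set of facts that must all hold.
RULES = [
    ({"fever", "gram_negative_stain"}, "possible_bacteremia"),
    ({"possible_bacteremia", "compromised_host"}, "suggest_antibiotic"),
]

def forward_chain(facts, rules):
    """Repeatedly fire every rule whose IF-part is satisfied
    until no new fact can be derived (a fixed point)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

derived = forward_chain({"fever", "gram_negative_stain", "compromised_host"}, RULES)
print(sorted(derived))
```

Note that the second rule only fires because the first one has added `possible_bacteremia` to the fact base—chained deduction is what distinguishes such systems from a simple table lookup.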
Restricting the proof search space has always been a key consideration in the implementation of automated deduction, and traditional AI-approaches to search have been an integral part of theorem provers. The main idea is to prevent the prover from pursuing unfruitful reasoning paths. A dual aspect of search is to try to look for a previously proved result that could be useful in the completion of the current proof. Automatically identifying those results is no easy task and it becomes less easy as the size of the problem domain, and the number of already established results, grows. This is not a happy situation particularly in light of the growing trend to build large libraries of theorems such as the Mizar Problems for Theorem Proving (MPTP) (Urban et al. 2010, Bancerek & Rudnicki 2003) or the Isabelle/HOL mathematical library (Meng & Paulson 2008), so developing techniques for the discovery, evaluation, and selection of existing suitable definitions, premises and lemmas in large libraries of formal mathematics as discussed in (Kühlwein et al. 2012) is an important line of research.
Among many other methods, and in stark contrast to automated provers, mathematicians combine inductive heuristics with deductive techniques when attacking a problem. The former help them guide the proof-finding effort while the latter allow them to close proof gaps. And of course all this happens in the presence of the very large body of knowledge that the human possesses. For an automated prover, the analogous counterpart to the mathematician’s body of knowledge is a large library like MPTP. An analogous approach to using inductive heuristics would be to endow the theorem prover with inductive, data-driven, machine learning abilities. Urban & Vyskocil 2012 run a number of experiments to determine any gains that may result from such an approach. For this, they use MPTP and theorem provers like E and SPASS enhanced with symbol-based machine learning mechanisms. A detailed presentation and statistical results can be found in the above reference but in summary, and quoting the authors, “this experiment demonstrates a very real and quite unique benefit of large formal mathematical libraries for conducting novel integration of AI methods. As the machine learner is trained on previous proofs, it recommends relevant premises from the large library that (according to the past experience) should be useful for proving new conjectures.” Urban 2007 discusses MaLARea (a Machine Learner for Automated Reasoning), a meta-system that also combines inductive and deductive reasoning methods. MaLARea is intended to be used in large theories, i.e. problems with a large number of symbols, definitions, premises, lemmas, and theorems. The system works in cycles where results proved deductively in a given iteration are then used by the inductive machine-learning component to place restrictions on the search space for the next theorem-proving cycle.
Albeit simple in design, the first version of MaLARea solved 142 problems out of 252 in the MPTP Challenge, outperforming the more seasoned provers E (89 problems solved) and SPASS (81 problems solved). Machine learning premise-selection methods trained on the considerable amount of mathematical knowledge encoded in Flyspeck’s library of proofs, when combined with theorem provers, provide an AI-type system capable of proving a wide range of mathematical conjectures: Almost 40% of the 14,185 theorems can be proved automatically without any guidance from the user within 30 seconds on a 14-CPU workstation (Kaliszyk & Urban 2014). Machine learning techniques can also be applied successfully to the problem of selecting good heuristics in the building of first-order proofs (Bridge, Holden & Paulson 2014).
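The core of symbol-based premise selection can be conveyed with a toy sketch: represent each library fact by the set of symbols occurring in it, and rank candidate premises for a new conjecture by symbol overlap (here Jaccard similarity, a simple nearest-neighbour-style relevance score). The library entries below are invented placeholders, not actual MPTP content:

```python
# Toy "library": fact name -> set of symbols occurring in the fact.
LIBRARY = {
    "assoc_add": {"add", "eq"},
    "comm_add":  {"add", "eq"},
    "dist_mul":  {"mul", "add", "eq"},
    "comm_mul":  {"mul", "eq"},
    "le_refl":   {"le"},
}

def rank_premises(conjecture_symbols, library, k=3):
    """Return the k library facts most similar to the conjecture,
    scored by Jaccard similarity of their symbol sets."""
    def score(symbols):
        union = conjecture_symbols | symbols
        return len(conjecture_symbols & symbols) / len(union) if union else 0.0
    ranked = sorted(library, key=lambda name: score(library[name]), reverse=True)
    return ranked[:k]

# A conjecture mentioning mul, add and eq should pull in distributivity first.
print(rank_premises({"mul", "add", "eq"}, LIBRARY))
```

Real systems such as MaLARea replace this fixed similarity measure with a learner trained on which premises actually occurred in past successful proofs, but the input features—symbol occurrences—are of the same kind.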
The relationship between automated deduction and machine learning is reciprocal and the former has something to offer to the latter too. To mention one contribution, deep learning has become the technique of choice when it comes to applications in image recognition, language processing, and others, and there is theoretical evidence of its superiority over shallow learning. Such mathematical proofs can be formalized using theorem-proving systems such as Isabelle/HOL while at the same time contributing to the growth of their libraries with formalized results that can be used for further work aimed to secure the foundations of machine learning (Bentkamp, Blanchette & Klakow 2019).
Besides using large mathematical libraries, tapping into web-based semantic ontologies is another possible source of knowledge. Pease & Sutcliffe 2007 discuss ways of making the SUMO ontology suitable for first-order theorem proving and describe work on translating SUMO into TPTP. An added benefit of successfully reasoning over large semantic ontologies is that this promotes the application of automated reasoning to other fields of science. Tapping into its full potential, however, will require a closer alignment of methods from automated reasoning and artificial intelligence.
Automated reasoning is a mature yet still growing field that provides a healthy interplay between basic research and application. Automated deduction is being conducted using a multiplicity of theorem-proving methods, including resolution, sequent calculi, natural deduction, matrix connection methods, term rewriting, mathematical induction, and others. These methods are implemented using a variety of logic formalisms such as first-order logic, type theory and higher-order logic, clause and Horn logic, non-classical logics, and so on. Automated reasoning programs are being applied to solve a growing number of problems in formal logic, mathematics and computer science, logic programming, software and hardware verification, circuit design, exact philosophy, and many others. One of the results of this variety of formalisms and automated deduction methods has been the proliferation of a large number of theorem proving programs. To test the capabilities of these different programs, selections of problems have been proposed against which their performance can be measured (McCharen, Overbeek & Wos 1976, Pelletier 1986). The TPTP (Sutcliffe & Suttner 1998, Sutcliffe 2017) is a library of such problems that is updated on a regular basis. There is also a competition among automated theorem provers held regularly at the CADE conference (Pelletier, Sutcliffe & Suttner 2002; Sutcliffe 2016, in Other Internet Resources); the problems for the competition are selected from the TPTP library and include problems in clause normal form (CNF), first-order form (FOF), typed first-order form (TFF), monomorphic typed higher-order form (TH0), and others. There is a similar library and competition for SMT solvers (Barrett et al. 2013).
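To give a flavour of the first-order form (FOF) mentioned above, here is the classic Socrates syllogism written in TPTP’s `fof(name, role, formula).` syntax, where `! [X] :` is universal quantification and `=>` is implication (the formula names are illustrative, not taken from the TPTP library):

```
% Axioms and conjecture in TPTP first-order form (FOF).
fof(all_men_mortal,   axiom,      ! [X] : (man(X) => mortal(X))).
fof(socrates_man,     axiom,      man(socrates)).
fof(socrates_mortal,  conjecture, mortal(socrates)).
```

A TPTP-compliant prover reads such a file, negates the conjecture, and attempts to derive a contradiction from it together with the axioms.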
Initially, computers were used to aid scientists with their complex and often tedious numerical calculations. The power of the machines was then extended from the numeric into the symbolic domain where infinite-precision computations performed by computer algebra programs have become an everyday affair. The goal of automated reasoning has been to further extend the machine’s reach into the realm of deduction, where it can serve as a reasoning assistant, helping its users establish truth through proof.
The Stanford Encyclopedia of Philosophy is copyright © 2025 by The Metaphysics Research Lab, Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054