Artificial intelligence (AI) is the field devoted to building artificial animals (or at least artificial creatures that – in suitable contexts – appear to be animals) and, for many, artificial persons (or at least artificial creatures that – in suitable contexts – appear to be persons).[1] Such goals immediately ensure that AI is a discipline of considerable interest to many philosophers, and this has been confirmed (e.g.) by the energetic attempt, on the part of numerous philosophers, to show that these goals are in fact un/attainable. On the constructive side, many of the core formalisms and techniques used in AI come out of, and are indeed still much used and refined in, philosophy: first-order logic and its extensions; intensional logics suitable for the modeling of doxastic attitudes and deontic reasoning; inductive logic, probability theory, and probabilistic reasoning; practical reasoning and planning, and so on. In light of this, some philosophers conduct AI research and development as philosophy.
In the present entry, the history of AI is briefly recounted, proposed definitions of the field are discussed, and an overview of the field is provided. In addition, both philosophical AI (AI pursued as and out of philosophy) and philosophy of AI are discussed, via examples of both. The entry ends with some de rigueur speculative commentary regarding the future of AI.
The field of artificial intelligence (AI) officially started in 1956, launched by a small but now-famous DARPA-sponsored summer conference at Dartmouth College, in Hanover, New Hampshire. (The 50-year celebration of this conference, AI@50, was held in July 2006 at Dartmouth, with five of the original participants making it back.[2] What happened at this historic conference figures in the final section of this entry.) Ten thinkers attended, including John McCarthy (who was working at Dartmouth in 1956), Claude Shannon, Marvin Minsky, Arthur Samuel, Trenchard More (apparently the lone note-taker at the original conference), Ray Solomonoff, Oliver Selfridge, Allen Newell, and Herbert Simon. From where we stand now, into the start of the new millennium, the Dartmouth conference is memorable for many reasons, including this pair: one, the term ‘artificial intelligence’ was coined there (and has long been firmly entrenched, despite being disliked by some of the attendees, e.g., More); two, Newell and Simon revealed a program – Logic Theorist (LT) – agreed by the attendees (and, indeed, by nearly all those who learned of and about it soon after the conference) to be a remarkable achievement. LT was capable of proving elementary theorems in the propositional calculus.[3][4]
Though the term ‘artificial intelligence’ made its advent at the 1956 conference, certainly the field of AI, operationally defined (defined, i.e., as a field constituted by practitioners who think and act in certain ways), was in operation before 1956. For example, in a famous Mind paper of 1950, Alan Turing argues that the question “Can a machine think?” (and here Turing is talking about standard computing machines: machines capable of computing functions from the natural numbers (or pairs, triples, … thereof) to the natural numbers that a Turing machine or equivalent can handle) should be replaced with the question “Can a machine be linguistically indistinguishable from a human?” Specifically, he proposes a test, the “Turing Test” (TT) as it’s now known. In the TT, a woman and a computer are sequestered in sealed rooms, and a human judge, in the dark as to which of the two rooms contains which contestant, asks questions by email (actually, by teletype, to use the original term) of the two. If, on the strength of returned answers, the judge can do no better than 50/50 when delivering a verdict as to which room houses which player, we say that the computer in question has passed the TT. Passing in this sense operationalizes linguistic indistinguishability. Later, we shall discuss the role that TT has played, and indeed continues to play, in attempts to define AI. At the moment, though, the point is that in his paper, Turing explicitly lays down the call for building machines that would provide an existence proof of an affirmative answer to his question. The call even includes a suggestion for how such construction should proceed. (He suggests that “child machines” be built, and that these machines could then gradually grow up on their own to learn to communicate in natural language at the level of adult humans. This suggestion has arguably been followed by Rodney Brooks and the philosopher Daniel Dennett (1994) in the Cog Project. In addition, the Spielberg/Kubrick movie A.I.
is at least in part a cinematic exploration of Turing’s suggestion.[5]) The TT continues to be at the heart of AI and discussions of its foundations, as confirmed by the appearance of (Moor 2003). In fact, the TT continues to be used to define the field, as in Nilsson’s (1998) position, expressed in his textbook for the field, that AI simply is the field devoted to building an artifact able to negotiate this test. Energy supplied by the dream of engineering a computer that can pass TT, or by controversy surrounding claims that it has already been passed, is if anything stronger than ever, and the reader has only to do an internet search via the string
turing test passed
to find up-to-the-minute attempts at reaching this dream, and attempts (sometimes made by philosophers) to debunk claims that some such attempt has succeeded.
Returning to the issue of the historical record, even if one bolsters the claim that AI started at the 1956 conference by adding the proviso that ‘artificial intelligence’ refers to a nuts-and-bolts engineering pursuit (in which case Turing’s philosophical discussion, despite calls for a child machine, wouldn’t exactly count as AI per se), one must confront the fact that Turing, and indeed many predecessors, did attempt to build intelligent artifacts. In Turing’s case, such building was surprisingly well-understood before the advent of programmable computers: Turing wrote a program for playing chess before there were computers to run such programs on, by slavishly following the code himself. He did this well before 1950, and long before Newell (1973) gave thought in print to the possibility of a sustained, serious attempt at building a good chess-playing computer.[6]
From the perspective of philosophy, which views the systematic investigation of mechanical intelligence as meaningful and productive separate from the specific logicist formalisms (e.g., first-order logic) and problems (e.g., the Entscheidungsproblem) that gave birth to computer science, neither the 1956 conference nor Turing’s Mind paper comes close to marking the start of AI. This is easy enough to see. For example, Descartes proposed TT (not the TT by name, of course) long before Turing was born.[7] Here’s the relevant passage:
If there were machines which bore a resemblance to our body and imitated our actions as far as it was morally possible to do so, we should always have two very certain tests by which to recognise that, for all that, they were not real men. The first is, that they could never use speech or other signs as we do when placing our thoughts on record for the benefit of others. For we can easily understand a machine’s being constituted so that it can utter words, and even emit some responses to action on it of a corporeal kind, which brings about a change in its organs; for instance, if it is touched in a particular part it may ask what we wish to say to it; if in another part it may exclaim that it is being hurt, and so on. But it never happens that it arranges its speech in various ways, in order to reply appropriately to everything that may be said in its presence, as even the lowest type of man can do. And the second difference is, that although machines can perform certain things as well as or perhaps better than any of us can do, they infallibly fall short in others, by which means we may discover that they did not act from knowledge, but only for the disposition of their organs. For while reason is a universal instrument which can serve for all contingencies, these organs have need of some special adaptation for every particular action. From this it follows that it is morally impossible that there should be sufficient diversity in any machine to allow it to act in all the events of life in the same way as our reason causes us to act. (Descartes 1637, p. 116)
At the moment, Descartes is certainly carrying the day.[8] Turing predicted that his test would be passed by 2000, but the fireworks across the globe at the start of the new millennium have long since died down, and the most articulate of computers still can’t meaningfully debate a sharp toddler. Moreover, while in certain focussed areas machines out-perform minds (IBM’s famous Deep Blue prevailed in chess over Garry Kasparov, e.g.; and more recently, AI systems have prevailed in other games, e.g. Jeopardy! and Go, about which more will momentarily be said), minds have a (Cartesian) capacity for cultivating their expertise in virtually any sphere. (If it were announced to Deep Blue, or any current successor, that chess was no longer to be the game of choice, but rather a heretofore unplayed variant of chess, the machine would be trounced by human children of average intelligence having no chess expertise.) AI simply hasn’t managed to create general intelligence; it hasn’t even managed to produce an artifact indicating that eventually it will create such a thing.
But what about IBM Watson’s famous nail-biting victory in the Jeopardy! game-show contest?[9] That certainly seems to be a machine triumph over humans on their “home field,” since Jeopardy! delivers a human-level linguistic challenge ranging across many domains. Indeed, among many AI cognoscenti, Watson’s success is considered to be much more impressive than Deep Blue’s, for numerous reasons. One reason is that while chess is generally considered to be well-understood from the formal-computational perspective (after all, it’s well-known that there exists a perfect strategy for playing chess), in open-domain question-answering (QA), as in any significant natural-language processing task, there is no consensus as to what problem, formally speaking, one is trying to solve. Briefly, question-answering (QA) is what the reader would think it is: one asks a question of a machine, and gets an answer, where the answer has to be produced via some “significant” computational process. (See Strzalkowski & Harabagiu (2006) for an overview of what QA, historically, has been as a field.) A bit more precisely, there is no agreement as to what underlying function, formally speaking, question-answering capability computes. This lack of agreement stems quite naturally from the fact that there is of course no consensus as to what natural languages are, formally speaking.[10] Despite this murkiness, and in the face of an almost universal belief that open-domain question-answering would remain unsolved for a decade or more, Watson decisively beat the two top human Jeopardy! champions on the planet. During the contest, Watson had to answer questions that required not only command of simple factoids (Question 1), but also of some amount of rudimentary reasoning (in the form of temporal reasoning) and commonsense (Question 2):
Question 1: The only two consecutive U.S. presidents with the same first name.
Question 2: In May 1898, Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
While Watson is demonstrably better than humans in Jeopardy!-style quizzing (a new human Jeopardy! master could arrive on the scene, but as for chess, AI now assumes that a second round of IBM-level investment would vanquish the new human opponent), this approach does not work for the kind of NLP challenge that Descartes described; that is, Watson can’t converse on the fly. After all, some questions don’t hinge on sophisticated information retrieval and machine learning over pre-existing data, but rather on intricate reasoning right on the spot. Such questions may for instance involve anaphora resolution, which requires even deeper degrees of commonsensical understanding of time, space, history, folk psychology, and so on. Levesque (2013) has catalogued some alarmingly simple questions which fall in this category. (Marcus, 2013, gives an account of Levesque’s challenges that is accessible to a wider audience.) The other class of question-answering tasks on which Watson fails can be characterized as dynamic question-answering. These are questions for which answers may not be recorded in textual form anywhere at the time of questioning, or for which answers are dependent on factors that change with time. Two questions that fall in this category are given below (Govindarajulu et al. 2013):
Question 3: If I have 4 foos and 5 bars, and if foos are not the same as bars, how many foos will I have if I get 3 bazes which just happen to be foos?
Question 4: What was IBM’s Sharpe ratio in the last 60 days of trading?
Closely following Watson’s victory, in March 2016, Google DeepMind’s AlphaGo defeated one of Go’s top-ranked players, Lee Sedol, in four out of five games. This was considered a landmark achievement within AI, as it was widely believed in the AI community that computer victory in Go was at least a few decades away, partly due to the enormous number of valid sequences of moves in Go compared to that in chess.[11] While this is a remarkable achievement, it should be noted that, despite breathless coverage in the popular press,[12] AlphaGo, while indisputably a great Go player, is just that. For example, neither AlphaGo nor Watson can understand the rules of Go written in plain-and-simple English and produce a computer program that can play the game. It’s interesting that there is one endeavor in AI that tackles a narrow version of this very problem: In general game playing, a machine is given a description of a brand new game just before it has to play the game (Genesereth et al. 2005). However, the description in question is expressed in a formal language, and the machine has to manage to play the game from this description. Note that this is still far from understanding even a simple description of a game in English well enough to play it.
But what if we consider the history of AI not from the perspective of philosophy, but rather from the perspective of the field with which, today, it is most closely connected? The reference here is to computer science. From this perspective, does AI run back to well before Turing? Interestingly enough, the results are the same: we find that AI runs deep into the past, and has always had philosophy in its veins. This is true for the simple reason that computer science grew out of logic and probability theory,[13] which in turn grew out of (and is still intertwined with) philosophy. Computer science, today, is shot through and through with logic; the two fields cannot be separated. This phenomenon has become an object of study unto itself (Halpern et al. 2001). The situation is no different when we are talking not about traditional logic, but rather about probabilistic formalisms, also a significant component of modern-day AI: These formalisms also grew out of philosophy, as nicely chronicled, in part, by Glymour (1992). For example, in the one mind of Pascal was born a method of rigorously calculating probabilities, conditional probability (which plays a particularly large role in AI, currently), and such fertile philosophico-probabilistic arguments as Pascal’s wager, according to which it is irrational not to become a Christian.
That modern-day AI has its roots in philosophy, and in fact that these historical roots are temporally deeper than even Descartes’ distant day, can be seen by looking to the clever, revealing cover of the second edition (the third edition is the current one) of the comprehensive textbook Artificial Intelligence: A Modern Approach (known in the AI community as simply AIMA2e for Russell & Norvig, 2002).

Cover of AIMA2e (Russell & Norvig 2002)
What you see there is an eclectic collection of memorabilia that might be on and around the desk of some imaginary AI researcher. For example, if you look carefully, you will specifically see: a picture of Turing, a view of Big Ben through a window (perhaps R&N are aware of the fact that Turing famously held at one point that a physical machine with the power of a universal Turing machine is physically impossible: he quipped that it would have to be the size of Big Ben), a planning algorithm described in Aristotle’s De Motu Animalium, Frege’s fascinating notation for first-order logic, a glimpse of Lewis Carroll’s (1958) pictorial representation of syllogistic reasoning, Ramon Lull’s concept-generating wheel from his 13th-century Ars Magna, and a number of other pregnant items (including, in a clever, recursive, and bordering-on-self-congratulatory touch, a copy of AIMA itself). Though there is insufficient space here to make all the historical connections, we can safely infer from the appearance of these items (and here we of course refer to the ancient ones: Aristotle conceived of planning as information-processing over two-and-a-half millennia back; and in addition, as Glymour (1992) notes, Aristotle can also be credited with devising the first knowledge-bases and ontologies, two types of representation schemes that have long been central to AI) that AI is indeed very, very old. Even those who insist that AI is at least in part an artifact-building enterprise must concede that, in light of these objects, AI is ancient, for it isn’t just theorizing from the perspective that intelligence is at bottom computational that runs back into the remote past of human history: Lull’s wheel, for example, marks an attempt to capture intelligence not only in computation, but in a physical artifact that embodies that computation.[14]
AIMA has now reached its third edition, and those interested in the history of AI, and for that matter the history of philosophy of mind, will not be disappointed by examination of the cover of the third installment (the cover of the second edition is almost exactly like the first edition). (All the elements of the cover, separately listed and annotated, can be found online.) One significant addition to the cover of the third edition is a drawing of Thomas Bayes; his appearance reflects the recent rise in the popularity of probabilistic techniques in AI, which we discuss later.
One final point about the history of AI seems worth making.
It is generally assumed that the birth of modern-day AI in the 1950s came in large part because of and through the advent of the modern high-speed digital computer. This assumption accords with common sense. After all, AI (and, for that matter, to some degree its cousin, cognitive science, particularly computational cognitive modeling, the sub-field of cognitive science devoted to producing computational simulations of human cognition) is aimed at implementing intelligence in a computer, and it stands to reason that such a goal would be inseparably linked with the advent of such devices. However, this is only part of the story: the part that reaches back but to Turing and others (e.g., von Neumann) responsible for the first electronic computers. The other part is that, as already mentioned, AI has a particularly strong tie, historically speaking, to reasoning (logic-based and, in the need to deal with uncertainty, inductive/probabilistic reasoning). In this story, nicely told by Glymour (1992), a search for an answer to the question “What is a proof?” eventually led to an answer based on Frege’s version of first-order logic (FOL): a (finitary) mathematical proof consists in a series of step-by-step inferences from one formula of first-order logic to the next. The obvious extension of this answer (and it isn’t a complete answer, given that lots of classical mathematics, despite conventional wisdom, clearly can’t be expressed in FOL; even the Peano Axioms, to be expressed as a finite set of formulae, require SOL) is to say that not only mathematical thinking, but thinking, period, can be expressed in FOL. (This extension was entertained by many logicians long before the start of information-processing psychology and cognitive science – a fact some cognitive psychologists and cognitive scientists often seem to forget.)
Today, logic-based AI is only part of AI, but the point is that this part still lives (with help from logics much more powerful, but much more complicated, than FOL), and it can be traced all the way back to Aristotle’s theory of the syllogism.[15] In the case of uncertain reasoning, the question isn’t “What is a proof?”, but rather questions such as “What is it rational to believe, in light of certain observations and probabilities?” This is a question posed and tackled long before the arrival of digital computers.
So far we have been proceeding as if we have a firm and precise grasp of the nature of AI. But what exactly is AI? Philosophers arguably know better than anyone that precisely defining a particular discipline to the satisfaction of all relevant parties (including those working in the discipline itself) can be acutely challenging. Philosophers of science certainly have proposed credible accounts of what constitutes at least the general shape and texture of a given field of science and/or engineering, but what exactly is the agreed-upon definition of physics? What about biology? What, for that matter, is philosophy, exactly? These are remarkably difficult, maybe even eternally unanswerable, questions, especially if the target is a consensus definition. Perhaps the most prudent course we can manage here under obvious space constraints is to present in encapsulated form some proposed definitions of AI. We do include a glimpse of recent attempts to define AI in detailed, rigorous fashion (and we suspect that such attempts will be of interest to philosophers of science, and those interested in this sub-area of philosophy).
Russell and Norvig (1995, 2002, 2009), in their aforementioned AIMA text, provide a set of possible answers to the “What is AI?” question that has considerable currency in the field itself. These answers all assume that AI should be defined in terms of its goals: a candidate definition thus has the form “AI is the field that aims at building …” The answers all fall under a quartet of types placed along two dimensions. One dimension is whether the goal is to match human performance, or, instead, ideal rationality. The other dimension is whether the goal is to build systems that reason/think, or rather systems that act. The situation is summed up in this table:
| | Human-Based | Ideal Rationality |
| Reasoning-Based: | Systems that think like humans. | Systems that think rationally. |
| Behavior-Based: | Systems that act like humans. | Systems that act rationally. |
Four Possible Goals for AI According to AIMA
Please note that this quartet of possibilities does reflect (at least a significant portion of) the relevant literature. For example, philosopher John Haugeland (1985) falls into the Human/Reasoning quadrant when he says that AI is “The exciting new effort to make computers think … machines with minds, in the full and literal sense.” (By far, this is the quadrant that most popular narratives affirm and explore. The recent Westworld TV series is a powerful case in point.) Luger and Stubblefield (1993) seem to fall into the Ideal/Act quadrant when they write: “The branch of computer science that is concerned with the automation of intelligent behavior.” The Human/Act position is occupied most prominently by Turing, whose test is passed only by those systems able to act sufficiently like a human. The “thinking rationally” position is defended (e.g.) by Winston (1992). While it might not be entirely uncontroversial to assert that the four bins given here are exhaustive, such an assertion appears to be quite plausible, even when the literature up to the present moment is canvassed.
It’s important to know that the contrast between the focus on systems that think/reason versus systems that act, while found, as we have seen, at the heart of the AIMA texts, and at the heart of AI itself, should not be interpreted as implying that AI researchers view their work as falling all and only within one of these two compartments. Researchers who focus more or less exclusively on knowledge representation and reasoning are also quite prepared to acknowledge that they are working on (what they take to be) a central component or capability within any one of a family of larger systems spanning the reason/act distinction. The clearest case may come from the work on planning – an AI area traditionally making central use of representation and reasoning. For good or ill, much of this research is done in abstraction (in vitro, as opposed to in vivo), but the researchers involved certainly intend or at least hope that the results of their work can be embedded into systems that actually do things, such as, for example, execute the plans.
What about Russell and Norvig themselves? What is their answer to the What is AI? question? They are firmly in the “acting rationally” camp. In fact, it’s safe to say both that they are the chief proponents of this answer, and that they have been remarkably successful evangelists. Their extremely influential AIMA series can be viewed as a book-length defense and specification of the Ideal/Act category. We will look a bit later at how Russell and Norvig lay out all of AI in terms of intelligent agents, which are systems that act in accordance with various ideal standards for rationality. But first let’s look a bit closer at the view of intelligence underlying the AIMA text. We can do so by turning to Russell (1997). Here Russell recasts the “What is AI?” question as the question “What is intelligence?” (presumably under the assumption that we have a good grasp of what an artifact is), and then he identifies intelligence with rationality. More specifically, Russell sees AI as the field devoted to building intelligent agents, which are functions taking as input tuples of percepts from the external environment, and producing behavior (actions) on the basis of these percepts. Russell’s overall picture is this one:

The Basic Picture Underlying Russell’s Account of Intelligence/Rationality
Let’s unpack this diagram a bit, and take a look, first, at the account of perfect rationality that can be derived from it. The behavior of the agent in the environment \(E\) (from a class \(\bE\) of environments) produces a sequence of states or snapshots of that environment. A performance measure \(U\) evaluates this sequence; notice the box labeled “Performance Measure” in the above figure. We let \(V(f,\bE,U)\) denote the expected utility according to \(U\) of the agent function \(f\) operating on \(\bE\).[16] Now we identify a perfectly rational agent with the agent function:
\[\tag{1}\label{eq1}f_{\opt} = \argmax_f V(f,\bE,U)\]According to the above equation, a perfectly rational agent can be taken to be the function \(f_{\opt}\) which produces the maximum expected utility in the environment under consideration. Of course, as Russell points out, it’s usually not possible to actually build perfectly rational agents. For example, though it’s easy enough to specify an algorithm for playing invincible chess, it’s not feasible to implement this algorithm. What traditionally happens in AI is that programs that are – to use Russell’s apt terminology – calculatively rational are constructed instead: these are programs that, if executed infinitely fast, would result in perfectly rational behavior. In the case of chess, this would mean that we strive to write a program that runs an algorithm capable, in principle, of finding a flawless move, but we add features that truncate the search for this move in order to play within intervals of digestible duration.
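The gap between perfect and calculative rationality can be sketched concretely. What follows is a minimal, hypothetical illustration (the toy take-1-or-2-stones game and all function names are ours, not Russell’s): a minimax search that would be perfectly rational if run without limit, but whose depth bound truncates the search so that it answers in practical time.

```python
# A sketch of "calculative rationality": an in-principle-perfect minimax
# search, truncated by a depth bound. The game is a toy stand-in for chess;
# nim_moves, nim_utility, and nim_estimate are invented for illustration.

def minimax(state, depth, maximizing, moves, utility, estimate):
    """Depth-limited minimax: perfect play only if depth were unbounded."""
    ms = moves(state)
    if not ms:                      # terminal position
        return utility(state, maximizing)
    if depth == 0:                  # truncation: trade optimality for time
        return estimate(state)
    vals = (minimax(m, depth - 1, not maximizing, moves, utility, estimate)
            for m in ms)
    return max(vals) if maximizing else min(vals)

# Toy game: a pile of n stones; a player removes 1 or 2 per turn;
# whoever takes the last stone wins.
def nim_moves(n):
    return [n - k for k in (1, 2) if n - k >= 0]

def nim_utility(n, maximizing):
    # The player *to move* at an empty pile has just lost.
    return -1 if maximizing else +1

def nim_estimate(n):
    return 0  # know-nothing heuristic applied at the depth cutoff

# With ample depth the search is perfectly rational for this toy game:
print(minimax(4, 10, True, nim_moves, nim_utility, nim_estimate))  # → 1
```

With `depth=10` the bound never binds for a 4-stone pile, so the answer is the perfectly rational one; shrinking the bound until `nim_estimate` is consulted is exactly the truncation move described above for chess.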
Russell himself champions a new brand of intelligence/rationality for AI; he calls this brand bounded optimality. To understand Russell’s view, first we follow him in introducing a distinction: We say that agents have two components: a program, and a machine upon which the program runs. We write \(Agent(P, M)\) to denote the agent function implemented by program \(P\) running on machine \(M\). Now, let \(\mathcal{P}(M)\) denote the set of all programs \(P\) that can run on machine \(M\). The bounded optimal program \(P_{\opt,M}\) then is:
\[P_{\opt,M}=\argmax_{P\in\mathcal{P}(M)}V(\mathit{Agent}(P,M),\bE,U)\]You can understand this equation in terms of any of the mathematical idealizations for standard computation. For example, machines can be identified with Turing machines minus instructions (i.e., TMs are here viewed architecturally only: as having tapes divided into squares upon which symbols can be written, read/write heads capable of moving up and down the tape to write and erase, and control units which are in one of a finite number of states at any time), and programs can be identified with instructions in the Turing-machine model (telling the machine to write and erase symbols, depending upon what state the machine is in). So, if you are told that you must “program” within the constraints of a 22-state Turing machine, you could search for the “best” program given those constraints. In other words, you could strive to find the optimal program within the bounds of the 22-state architecture. Russell’s (1997) view is thus that AI is the field devoted to creating optimal programs for intelligent agents, under time and space constraints on the machines implementing these programs.[17]
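One way to make the bounded-optimality equation concrete is a brute-force toy (everything here – the three-percept “machine,” the environment class, and the utility function – is invented for illustration and is not from Russell 1997): enumerate every program that fits the machine’s constraints, score each induced agent by expected utility, and take the argmax.

```python
# A hedged sketch of bounded optimality: among all programs P that fit on
# a bounded machine M, pick the one whose agent maximizes expected utility.
import itertools

PERCEPTS = [0, 1, 2]   # what the environment can show the agent
ACTIONS  = [0, 1, 2]   # what the agent can do in response

# "Machine" constraint: a program is a lookup table assigning exactly one
# action per percept, so P(M) contains 3^3 = 27 programs.
def all_programs():
    for acts in itertools.product(ACTIONS, repeat=len(PERCEPTS)):
        yield dict(zip(PERCEPTS, acts))

# Environment class E: each environment presents one percept; utility U
# rewards echoing the percept, with environment 2 weighted double.
ENVIRONMENTS = [(0, 1.0), (1, 1.0), (2, 2.0)]   # (percept, weight)

def expected_utility(program):
    """V(Agent(P, M), E, U): weighted reward for matching the percept."""
    return sum(w * (1.0 if program[p] == p else 0.0)
               for p, w in ENVIRONMENTS)

# The argmax over P(M): the bounded-optimal program for this machine.
p_opt = max(all_programs(), key=expected_utility)
print(p_opt, expected_utility(p_opt))   # the identity table scores 4.0
```

For realistic machines \(\mathcal{P}(M)\) is astronomically large, so this exhaustive argmax is purely conceptual; the point of the sketch is only the shape of the optimization, not a feasible procedure.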
The reader must have noticed that in the equation for \(P_{\opt,M}\) we have not elaborated on \(\bE\) and \(U\) and how equation \eqref{eq1} might be used to construct an agent if the class of environments \(\bE\) is quite general, or if the true environment \(E\) is simply unknown. Depending on the task for which one is constructing an artificial agent, \(E\) and \(U\) would vary. The mathematical form of the environment \(E\) and the utility function \(U\) would vary wildly from, say, chess to Jeopardy!. Of course, if we were to design a globally intelligent agent, and not just a chess-playing agent, we could get away with having just one pair of \(E\) and \(U\). What would \(E\) look like if we were building a generally intelligent agent and not just an agent that is good at a single task? \(E\) would be a model of not just a single game or a task, but the entire physical-social-virtual universe consisting of many games, tasks, situations, problems, etc. This project is (at least currently) hopelessly difficult as, obviously, we are nowhere near to having such a comprehensive theory-of-everything model. For further discussion of a theoretical architecture put forward for this problem, see the Supplement on the AIXI architecture.
It should be mentioned that there is a different, much more straightforward answer to the “What is AI?” question. This answer, which goes back to the days of the original Dartmouth conference, was expressed by, among others, Newell (1973), one of the grandfathers of modern-day AI (recall that he attended the 1956 conference); it is:
AI is the field devoted to building artifacts that are intelligent, where ‘intelligent’ is operationalized through intelligence tests (such as the Wechsler Adult Intelligence Scale), and other tests of mental ability (including, e.g., tests of mechanical ability, creativity, and so on).
The above definition can be seen as fully specifying a concrete version of Russell and Norvig’s four possible goals. Though few are aware of this now, this answer was taken quite seriously for a while, and in fact underlay one of the most famous programs in the history of AI: the ANALOGY program of Evans (1968), which solved geometric analogy problems of a type seen in many intelligence tests. An attempt to rigorously define this forgotten form of AI (as what they dub Psychometric AI), and to resurrect it from the days of Newell and Evans, is provided by Bringsjord and Schimanski (2003) [see also e.g. (Bringsjord 2011)]. A sizable private investment has been made in the ongoing attempt, now known as Project Aristo, to build a “digital Aristotle”, in the form of a machine able to excel on standardized tests such as the AP exams tackled by US high school students (Friedland et al. 2004). (Vibrant work in this direction continues today at the Allen Institute for Artificial Intelligence.)[18] In addition, researchers at Northwestern have forged a connection between AI and tests of mechanical ability (Klenk et al. 2005).
In the end, as is the case with any discipline, to really know precisely what that discipline is requires you to, at least to some degree, dive in and do, or at least dive in and read. Two decades ago such a dive was quite manageable. Today, because the content that has come to constitute AI has mushroomed, the dive (or at least the swim after it) is a bit more demanding.
There are a number of ways of “carving up” AI. By far the most prudent and productive way to summarize the field is to turn yet again to the AIMA text, given its comprehensive overview of the field.
As Russell and Norvig (2009) tell us in the Preface of AIMA:
The main unifying theme is the idea of an intelligent agent. We define AI as the study of agents that receive percepts from the environment and perform actions. Each such agent implements a function that maps percept sequences to actions, and we cover different ways to represent these functions… (Russell & Norvig 2009, vii)
The basic picture is thus summed up in this figure:

Impressionistic Overview of an Intelligent Agent
The content of AIMA derives, essentially, from fleshing out this picture; that is, the above figure corresponds to the different ways of representing the overall function that intelligent agents implement. And there is a progression from the least powerful agents up to the more powerful ones. The following figure gives a high-level view of a simple kind of agent discussed early in the book. (Though simple, this sort of agent corresponds to the architecture of representation-free agents designed and implemented by Rodney Brooks, 1991.)

A Simple Reflex Agent
As the book progresses, agents get increasingly sophisticated, and the implementation of the function they represent thus draws from more and more of what AI can currently muster. The following figure gives an overview of an agent that is a bit smarter than the simple reflex agent. This smarter agent has the ability to internally model the outside world, and is therefore not simply at the mercy of what can at the moment be directly sensed.

A More Sophisticated Reflex Agent
There are seven parts to AIMA. As the reader passes through these parts, she is introduced to agents that take on the powers discussed in each part. Part I is an introduction to the agent-based view. Part II is concerned with giving an intelligent agent the capacity to think ahead a few steps in clearly defined environments. Examples here include agents able to successfully play games of perfect information, such as chess. Part III deals with agents that have declarative knowledge and can reason in ways that will be quite familiar to most philosophers and logicians (e.g., knowledge-based agents deduce what actions should be taken to secure their goals). Part IV of the book outfits agents with the power to handle uncertainty by reasoning in probabilistic fashion.[19] In Part V, agents are given a capacity to learn. The following figure shows the overall structure of a learning agent.

A Learning Agent
The final set of powers agents are given allows them to communicate. These powers are covered in Part VI.
Philosophers who patiently travel the entire progression of increasingly smart agents will no doubt ask, when reaching the end of Part VII, if anything is missing. Are we given enough, in general, to build an artificial person, or is there enough only to build a mere animal? This question is implicit in the following from Charniak and McDermott (1985):
The ultimate goal of AI (which we are very far from achieving) is to build a person, or, more humbly, an animal. (Charniak & McDermott 1985, 7)
To their credit, Russell & Norvig, in AIMA’s Chapter 27, “AI: Present and Future,” consider this question, at least to some degree.[20] They do so by considering some challenges to AI that have hitherto not been met. One of these challenges is described by R&N as follows:
[M]achine learning has made very little progress on the important problem of constructing new representations at levels of abstraction higher than the input vocabulary. In computer vision, for example, learning complex concepts such as Classroom and Cafeteria would be made unnecessarily difficult if the agent were forced to work from pixels as the input representation; instead, the agent needs to be able to form intermediate concepts first, such as Desk and Tray, without explicit human supervision. Similar concepts apply to learning behavior: HavingACupOfTea is a very important high-level step in many plans, but how does it get into an action library that initially contains much simpler actions such as RaiseArm and Swallow? Perhaps this will incorporate deep belief networks – Bayesian networks that have multiple layers of hidden variables, as in the work of Hinton et al. (2006), Hawkins and Blakeslee (2004), and Bengio and LeCun (2007). … Unless we understand such issues, we are faced with the daunting task of constructing large commonsense knowledge bases by hand, an approach that has not fared well to date. (Russell & Norvig 2009, Ch. 27.1)
While there have been some advances in addressing this challenge (in the form of deep learning or representation learning), this specific challenge is actually merely a foothill before a range of dizzyingly high mountains that AI must eventually somehow manage to climb. One of those mountains, put simply, is reading.[21] Despite the fact that, as noted, Part V of AIMA is devoted to machine learning, AI, as it stands, offers next to nothing in the way of a mechanization of learning by reading. Yet when you think about it, reading is probably the dominant way you learn at this stage in your life. Consider what you’re doing at this very moment. It’s a good bet that you are reading this sentence because, earlier, you set yourself the goal of learning about the field of AI. Yet the formal models of learning provided in AIMA’s Part V (which are all and only the models at play in AI) cannot be applied to learning by reading.[22] These models all start with a function-based view of learning. According to this view, to learn is almost invariably to produce an underlying function \(\ff\) on the basis of a restricted set of pairs
\[ \left\{\left\langle x_1, \ff(x_1)\right\rangle,\left\langle x_2, \ff(x_2)\right\rangle, \ldots, \left\langle x_n, \ff(x_n)\right\rangle\right\}.\]For example, consider receiving inputs consisting of 1, 2, 3, 4, and 5, and corresponding range values of 1, 4, 9, 16, and 25; the goal is to “learn” the underlying mapping from natural numbers to natural numbers. In this case, assume that the underlying function is \(n^2\), and that you do “learn” it. While this narrow model of learning can be productively applied to a number of processes, the process of reading isn’t one of them. Learning by reading cannot (at least for the foreseeable future) be modeled as divining a function that produces argument-value pairs. Instead, your reading about AI can pay dividends only if your knowledge has increased in the right way, and if that knowledge leaves you poised to be able to produce behavior taken to confirm sufficient mastery of the subject area in question. This behavior can range from correctly answering and justifying test questions regarding AI, to producing a robust, compelling presentation or paper that signals your achievement.
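The function-based view can be made concrete in a short sketch. The snippet below is a minimal illustration in Python (the hypothesis space and all names are our own illustrative assumptions, not any standard AI library): it searches a tiny space of candidate functions for one consistent with the argument-value pairs above, and the \(n^2\) hypothesis is the one that survives.

```python
# A minimal sketch of the function-based view of learning:
# given argument-value pairs <x, f(x)>, search a small hypothesis
# space for a function consistent with every pair. The hypothesis
# space here is an illustrative assumption.

pairs = [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

hypotheses = {
    "identity": lambda n: n,
    "double": lambda n: 2 * n,
    "square": lambda n: n ** 2,
    "cube": lambda n: n ** 3,
}

def learn(pairs, hypotheses):
    """Return the names of hypotheses consistent with every pair."""
    return [name for name, f in hypotheses.items()
            if all(f(x) == y for x, y in pairs)]

print(learn(pairs, hypotheses))  # only the 'square' hypothesis survives
```

Note that nothing like this pair-elimination procedure applies to learning by reading, which is precisely the point made above.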
Two points deserve to be made about machine reading. First, it may not be clear to all readers that reading is an ability that is central to intelligence. The centrality derives from the fact that intelligence requires vast knowledge. We have no other means of getting systematic knowledge into a system than to get it in from text, whether text on the web, text in libraries, newspapers, and so on. You might even say that the big problem with AI has been that machines really don’t know much compared to humans. That can only be because of the fact that humans read (or hear: illiterate people can listen to text being uttered and learn that way). Either machines gain knowledge by humans manually encoding and inserting knowledge, or by reading and listening. These are brute facts. (We leave aside supernatural techniques, of course. Oddly enough, Turing didn’t: he seemed to think ESP should be discussed in connection with the powers of minds and machines. See Turing, 1950.)[23]
Now for the second point. Humans able to read have invariably also learned a language, and learning languages has been modeled in conformity to the function-based approach adumbrated just above (Osherson et al. 1986). However, this doesn’t entail that an artificial agent able to read, at least to a significant degree, must have really and truly learned a natural language. AI is first and foremost concerned with engineering computational artifacts that measure up to some test (where, yes, sometimes that test is from the human sphere), not with whether these artifacts process information in ways that match those present in the human case. It may or may not be necessary, when engineering a machine that can read, to imbue that machine with human-level linguistic competence. The issue is empirical, and as time unfolds, and the engineering is pursued, we shall no doubt see the issue settled.
Two additional high mountains facing AI are subjective consciousness and creativity, and these are great challenges the field apparently hasn’t even come to grips with. Mental phenomena of paramount importance to many philosophers of mind and neuroscience are simply missing from AIMA. For example, consciousness is only mentioned in passing in AIMA, but subjective consciousness is the most important thing in our lives – indeed we only desire to go on living because we wish to go on enjoying subjective states of certain types. Moreover, if human minds are the product of evolution, then presumably phenomenal consciousness has great survival value, and would be of tremendous help to a robot intended to have at least the behavioral repertoire of the first creatures with brains that match our own (hunter-gatherers; see Pinker 1997). Of course, subjective consciousness is largely missing from the sister fields of cognitive psychology and computational cognitive modeling as well. We discuss some of these challenges in the Philosophy of Artificial Intelligence section below. For a list of similar challenges to cognitive science, see the relevant section of the entry on cognitive science.[24]
To some readers, it might seem in the very least tendentious to pointto subjective consciousness as a major challenge to AI that it has yetto address. These readers might be of the view that pointing to thisproblem is to look at AI through a distinctively philosophical prism,and indeed a controversial philosophical standpoint.
But as its literature makes clear, AI measures itself by looking to animals and humans and picking out in them remarkable mental powers, and by then seeing if these powers can be mechanized. Arguably the power most important to humans (the capacity to experience) is nowhere to be found on the target list of most AI researchers. There may be a good reason for this (no formalism is at hand, perhaps), but there is no denying that the state of affairs in question obtains, and that, in light of how AI measures itself, it’s worrisome.
As to creativity, it’s quite remarkable that the power we most praise in human minds is nowhere to be found in AIMA. Just as in (Charniak & McDermott 1985) one cannot find ‘neural’ in the index, ‘creativity’ can’t be found in the index of AIMA. This is particularly odd because many AI researchers have in fact worked on creativity (especially those coming out of philosophy; e.g., Boden 1994, Bringsjord & Ferrucci 2000).
Although the focus has been on AIMA, any of its counterparts could have been used. As an example, consider Artificial Intelligence: A New Synthesis, by Nils Nilsson. As in the case of AIMA, everything here revolves around a gradual progression from the simplest of agents (in Nilsson’s case, reactive agents), to ones having more and more of those powers that distinguish persons. Energetic readers can verify that there is a striking parallel between the main sections of Nilsson’s book and AIMA. In addition, Nilsson, like Russell and Norvig, ignores phenomenal consciousness, reading, and creativity. None of the three are even mentioned. Likewise, a recent comprehensive AI textbook by Luger (2008) follows the same pattern.
A final point to wrap up this section. It seems quite plausible to hold that there is a certain inevitability to the structure of an AI textbook, and the apparent reason is perhaps rather interesting. In personal conversation, Jim Hendler, a well-known AI researcher who is one of the main innovators behind the Semantic Web (Berners-Lee, Hendler, Lassila 2001), an under-development “AI-ready” version of the World Wide Web, has said that this inevitability can be rather easily displayed when teaching Introduction to AI; here’s how. Begin by asking students what they think AI is. Invariably, many students will volunteer that AI is the field devoted to building artificial creatures that are intelligent. Next, ask for examples of intelligent creatures. Students always respond by giving examples across a continuum: simple multi-cellular organisms, insects, rodents, lower mammals, higher mammals (culminating in the great apes), and finally human persons. When students are asked to describe the differences between the creatures they have cited, they end up essentially describing the progression from simple agents to ones having our (e.g.) communicative powers. This progression gives the skeleton of every comprehensive AI textbook. Why does this happen? The answer seems clear: it happens because we can’t resist conceiving of AI in terms of the powers of extant creatures with which we are familiar. At least at present, persons, and the creatures who enjoy only bits and pieces of personhood, are – to repeat – the measure of AI.[25]
Reasoning based on classical deductive logic is monotonic; that is, if \(\Phi\vdash\phi\), then for all \(\psi\), \(\Phi\cup\{\psi\}\vdash\phi\). Commonsense reasoning is not monotonic. While you may currently believe on the basis of reasoning that your house is still standing, if while at work you see on your computer screen that a vast tornado is moving through the location of your house, you will drop this belief. The addition of new information causes previous inferences to fail. In the simpler example that has become an AI staple, if I tell you that Tweety is a bird, you will infer that Tweety can fly, but if I then inform you that Tweety is a penguin, the inference evaporates, as well it should. Nonmonotonic (or defeasible) logic includes formalisms designed to capture the mechanisms underlying these kinds of examples. See the separate entry on logic and artificial intelligence, which is focused on nonmonotonic reasoning, and reasoning about time and change. It also provides a history of the early days of logic-based AI, making clear the contributions of those who founded the tradition (e.g., John McCarthy and Pat Hayes; see their seminal 1969 paper).
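The Tweety example can be sketched in a few lines of code. The following is a toy illustration only (the function and rule names are our own assumptions, not an implementation of any standard nonmonotonic formalism): the default “birds fly” is applied unless it is defeated by the more specific information that the bird is a penguin.

```python
# A toy sketch of defeasible inference (the Tweety example).
# The default rule "birds fly" yields a conclusion that is
# retracted when more specific, defeating information arrives.

def can_fly(facts):
    """Apply the default 'birds fly' unless a defeater is present."""
    if "penguin" in facts:      # specific knowledge defeats the default
        return False
    return "bird" in facts      # default conclusion drawn from 'bird'

print(can_fly({"bird"}))             # Tweety is a bird: infer flight
print(can_fly({"bird", "penguin"}))  # new info: the inference evaporates
```

Note the contrast with monotonic deduction: here, enlarging the set of premises destroys a previously drawn conclusion.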
The formalisms and techniques of logic-based AI have reached a level of impressive maturity – so much so that in various academic and corporate laboratories, implementations of these formalisms and techniques can be used to engineer robust, real-world software. We strongly recommend that readers who have an interest in learning where AI stands in these areas consult (Mueller 2006), which provides, in one volume, integrated coverage of nonmonotonic reasoning (in the form, specifically, of circumscription), and reasoning about time and change in the situation and event calculi. (The former calculus is also introduced by Thomason. In the second, timepoints are included, among other things.) The other nice thing about (Mueller 2006) is that the logic used is multi-sorted first-order logic (MSL), which has unificatory power that will be known to and appreciated by many technical philosophers and logicians (Manzano 1996).
We now turn to three further topics of importance in AI. They are: the agent-based scheme of logicist AI; common logic and the quest for interoperability between logic-based systems using different logics; and the technique of encoding down.
This trio is covered in order, beginning with the first.
Detailed accounts of logicist AI that fall under the agent-based scheme can be found in (Lenat 1983, Lenat & Guha 1990, Nilsson 1991, Bringsjord & Ferrucci 1998).[26] The core idea is that an intelligent agent receives percepts from the external world in the form of formulae in some logical system (e.g., first-order logic), and infers, on the basis of these percepts and its knowledge base, what actions should be performed to secure the agent’s goals. (This is of course a barbaric simplification. Information from the external world is encoded in formulae, and transducers to accomplish this feat may be components of the agent.)
To clarify things a bit, we consider, briefly, the logicist view in connection with arbitrary logical systems \(\mathcal{L}_{X}\).[27] We obtain a particular logical system by setting \(X\) in the appropriate way. Some examples: If \(X=I\), then we have a system at the level of FOL [following the standard notation from model theory; see e.g. (Ebbinghaus et al. 1984)]. \(\mathcal{L}_{II}\) is second-order logic, and \(\mathcal{L}_{\omega_1\omega}\) is a “small system” of infinitary logic (countably infinite conjunctions and disjunctions are permitted). These logical systems are all extensional, but there are intensional ones as well. For example, we can have logical systems corresponding to those seen in standard propositional modal logic (Chellas 1980). One possibility, familiar to many philosophers, would be propositional KT45, or \(\mathcal{L}_{KT45}\).[28] In each case, the system in question includes a relevant alphabet from which well-formed formulae are constructed by way of a formal grammar, a reasoning (or proof) theory, a formal semantics, and at least some meta-theoretical results (soundness, completeness, etc.). Taking off from standard notation, we can thus say that a set of formulas in some particular logical system \(\mathcal{L}_X\), \(\Phi_{\mathcal{L}_X}\), can be used, in conjunction with some reasoning theory, to infer some particular formula \(\phi_{\mathcal{L}_X}\). (The reasoning may be deductive, inductive, abductive, and so on. Logicist AI isn’t in the least restricted to any particular mode of reasoning.) To say that such a situation holds, we write \[ \Phi_{\mathcal{L}_X} \vdash_{\mathcal{L}_X} \phi_{\mathcal{L}_X} \]
When the logical system referred to is clear from context, or when we don’t care about which logical system is involved, we can simply write \[ \Phi \vdash \phi \]
Each logical system, in its formal semantics, will include objects designed to represent ways the world pointed to by formulae in this system can be. Let these ways be denoted by \(W^i_{\mathcal{L}_X}\). When we aren’t concerned with which logical system is involved, we can simply write \(W^i\). To say that such a way models a formula \(\phi\) we write \[ W^i \models \phi \]
We extend this to a set of formulas in the natural way: \(W^i\models\Phi\) means that all the elements of \(\Phi\) are true on \(W^i\). Now, using the simple machinery we’ve established, we can describe, in broad strokes, the life of an intelligent agent that conforms to the logicist point of view. This life conforms to the basic cycle that undergirds intelligent agents in the AIMA sense.
To begin, we assume that the human designer, after studying the world, uses the language of a particular logical system to give to our agent an initial set of beliefs \(\Delta_0\) about what this world is like. In doing so, the designer works with a formal model of this world, \(W\), and ensures that \(W\models\Delta_0\). Following tradition, we refer to \(\Delta_0\) as the agent’s (starting) knowledge base. (This terminology, given that we are talking about the agent’s beliefs, is known to be peculiar, but it persists.) Next, the agent ADJUSTS its knowledge base to produce a new one, \(\Delta_1\). We say that adjustment is carried out by way of an operation \(\mathcal{A}\); so \(\mathcal{A}[\Delta_0]=\Delta_1\). How does the adjustment process, \(\mathcal{A}\), work? There are many possibilities. Unfortunately, many believe that the simplest possibility (viz., \(\mathcal{A}[\Delta_i]\) equals the set of all formulas that can be deduced in some elementary manner from \(\Delta_i\)) exhausts all the possibilities. The reality is that adjustment, as indicated above, can come by way of any mode of reasoning – induction, abduction, and yes, various forms of deduction corresponding to the logical system in play. For present purposes, it’s not important that we carefully enumerate all the options.
The cycle continues when the agent ACTS on the environment, in an attempt to secure its goals. Acting, of course, can cause changes to the environment. At this point, the agent SENSES the environment, and this new information \(\Gamma_1\) factors into the process of adjustment, so that \(\mathcal{A}[\Delta_1\cup\Gamma_1]=\Delta_2\). The cycle of SENSES \(\Rightarrow\) ADJUSTS \(\Rightarrow\) ACTS continues to produce the life \(\Delta_0,\Delta_1,\Delta_2,\Delta_3,\ldots\) of our agent.
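This cycle can be rendered schematically in code. The sketch below is a bare illustration under simplifying assumptions of our own (knowledge bases are plain sets of formula-strings, the adjustment operator \(\mathcal{A}\) is a single toy inference rule, and acting on the environment is elided); in a real logicist agent, adjustment could be any mode of reasoning.

```python
# A schematic rendering of the logicist SENSE -> ADJUST -> ACT cycle.
# All names here are illustrative assumptions, not an established API.

def adjust(kb):
    """A[Delta]: one toy inference rule -- from 'bird(x)' infer 'flies(x)'."""
    new = {f.replace("bird", "flies") for f in kb if f.startswith("bird")}
    return kb | new

def run_agent(delta0, percept_stream):
    """Produce the agent's life Delta_0, Delta_1, Delta_2, ... from percepts."""
    life = [delta0]
    kb = adjust(delta0)            # A[Delta_0] = Delta_1
    life.append(kb)
    for gamma in percept_stream:   # SENSE new information Gamma_i
        kb = adjust(kb | gamma)    # ADJUST: A[Delta_i U Gamma_i] = Delta_i+1
        life.append(kb)            # (ACTing on the environment is elided)
    return life

life = run_agent({"bird(tweety)"}, [{"bird(opus)"}])
print(life[-1])  # final knowledge base of the agent's life
```

The returned list is precisely the "life" \(\Delta_0,\Delta_1,\Delta_2,\ldots\) described above, with each new percept set \(\Gamma_i\) folded in before adjustment.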
It may strike you as preposterous that logicist AI be touted as an approach taken to replicate all of cognition. Reasoning over formulae in some logical system might be appropriate for computationally capturing high-level tasks like trying to solve a math problem (or devising an outline for an entry in the Stanford Encyclopedia of Philosophy), but how could such reasoning apply to tasks like those a hawk tackles when swooping down to capture scurrying prey? In the human sphere, the task successfully negotiated by athletes would seem to be in the same category. Surely, some will declare, an outfielder chasing down a fly ball doesn’t prove theorems to figure out how to pull off a diving catch to save the game! Two brutally reductionistic arguments can be given in support of this “logicist theory of everything” approach towards cognition. The first stems from the fact that a complete proof calculus for just first-order logic can simulate all of Turing-level computation (Chapter 11, Boolos et al. 2007). The second justification comes from the role logic plays in foundational theories of mathematics and mathematical reasoning. Not only are foundational theories of mathematics cast in logic (Potter 2004), but there have been successful projects resulting in machine verification of ordinary non-trivial theorems; e.g., in the Mizar project alone around 50,000 theorems have been verified (Naumowicz and Kornilowicz 2009). The argument goes that if any approach to AI can be cast mathematically, then it can be cast in a logicist form.
Needless to say, such a declaration has been carefully considered by logicists beyond the reductionistic argument given above. For example, Rosenschein and Kaelbling (1986) describe a method in which logic is used to specify finite state machines. These machines are used at “run time” for rapid, reactive processing. In this approach, though the finite state machines contain no logic in the traditional sense, they are produced by logic and inference. Real robot control via first-order theorem proving has been demonstrated by Amir and Maynard-Reid (1999, 2000, 2001). In fact, you can download version 2.0 of the software that makes this approach real for a Nomad 200 mobile robot in an office environment. Of course, negotiating an office environment is a far cry from the rapid adjustments an outfielder for the Yankees routinely puts on display, but certainly it’s an open question as to whether future machines will be able to mimic such feats through rapid reasoning. The question is open if for no other reason than that all must concede that the constant increase in reasoning speed of first-order theorem provers is breathtaking. (For up-to-date news on this increase, visit and monitor the TPTP site.) There is no known reason why the software engineering in question cannot continue to produce speed gains that would eventually allow an artificial creature to catch a fly ball by processing information in purely logicist fashion.
Now we come to the second topic related to logicist AI that warrantsmention herein: common logic and the intensifying quest forinteroperability between logic-based systems using different logics.Only a few brief comments are offered.[29] Readers wanting more can explore the links provided in the course ofthe summary.
One standardization is through what is known as Common Logic (CL), and variants thereof. (CL is published as an ISO standard – ISO is the International Standards Organization.) Philosophers interested in logic, and of course logicians, will find CL to be quite fascinating. From an historical perspective, the advent of CL is interesting in no small part because the person spearheading it is none other than Pat Hayes, the same Hayes who, as we have seen, worked with McCarthy to establish logicist AI in the 1960s. (Though Hayes was not at the original 1956 Dartmouth conference, he certainly must be regarded as one of the founders of contemporary AI.) One of the interesting things about CL, at least as we see it, is that it signifies a trend toward the marriage of logics, and programming languages and environments. Another system that is a logic/programming hybrid is Athena, which can be used as a programming language, and is at the same time a form of MSL. Athena is based on formal systems known as denotational proof languages (Arkoudas 2000).
How is interoperability between two systems to be enabled by CL? Suppose one of these systems is based on logic \(L\), and the other on \(L'\). (To ease exposition, assume that both logics are first-order.) The idea is that a theory \(\Phi_L\), that is, a set of formulae in \(L\), can be translated into CL, producing \(\Phi_{CL}\), and then this theory can be translated into \(\Phi_{L'}\). CL thus becomes an interlingua. Note that what counts as a well-formed formula in \(L\) can be different from what counts as one in \(L'\). The two logics might also have different proof theories. For example, inference in \(L\) might be based on resolution, while inference in \(L'\) is of the natural deduction variety. Finally, the symbol sets will be different. Despite these differences, courtesy of the translations, desired behavior can be produced across the translation. That, at any rate, is the hope. The technical challenges here are immense, but federal monies are increasingly available for attacks on the problem of interoperability.
Now for the third topic in this section: what can be called encoding down. The technique is easy to understand. Suppose that we have on hand a set \(\Phi\) of first-order axioms. As is well-known, the problem of deciding, for arbitrary formula \(\phi\), whether or not it’s deducible from \(\Phi\) is Turing-undecidable: there is no Turing machine or equivalent that can correctly return “Yes” or “No” in the general case. However, if the domain in question is finite, we can encode this problem down to the propositional calculus. An assertion that all things have \(F\) is of course equivalent to the conjunction of \(Fa\), \(Fb\), and \(Fc\), as long as the domain contains only these three objects. So here a first-order quantified formula becomes a conjunction in the propositional calculus. Determining whether such conjunctions are provable from axioms themselves expressed in the propositional calculus is Turing-decidable, and in addition, in certain clusters of cases, the check can be done very quickly. Readers interested in encoding down to the propositional calculus should consult recent DARPA-sponsored work by Bart Selman. Please note that the target of encoding down doesn’t need to be the propositional calculus. Because it’s generally harder for machines to find proofs in an intensional logic than in straight first-order logic, it is often expedient to encode down the former to the latter. For example, propositional modal logic can be encoded in multi-sorted logic (a variant of FOL); see (Arkoudas & Bringsjord 2005). Prominent usage of such an encoding down can be found in a set of systems known as Description Logics, which are a set of logics less expressive than first-order logic but more expressive than propositional logic (Baader et al. 2003). Description logics are used to reason about ontologies in a given domain and have been successfully used, for example, in the biomedical domain (Smith et al. 2007).
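Encoding down over a finite domain can be illustrated concretely. In the toy sketch below (our own illustrative construction, not any actual SAT system), the universally quantified assertion that all things have \(F\) is grounded into propositional atoms over the domain \(\{a, b, c\}\), and entailment is then checked by brute-force truth tables, which is the decidable propositional problem the text describes.

```python
# A sketch of "encoding down": over the finite domain {a, b, c},
# the first-order assertion (forall x) F(x) becomes the propositional
# conjunction Fa & Fb & Fc, checkable by truth-table methods.
# The domain and helper names are illustrative assumptions.

from itertools import product

DOMAIN = ["a", "b", "c"]

def ground_forall(pred):
    """Encode (forall x) pred(x) as a list of propositional atoms."""
    return [pred + x for x in DOMAIN]   # e.g. ['Fa', 'Fb', 'Fc']

def entails(axioms, goal_atoms):
    """Truth-table check: do the axioms entail every goal atom?"""
    atoms = sorted(set(axioms) | set(goal_atoms))
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(v[a] for a in axioms) and not all(v[g] for g in goal_atoms):
            return False                # a countermodel exists
    return True

# With Fa, Fb, and Fc all given as axioms, the grounded universal follows:
print(entails(["Fa", "Fb", "Fc"], ground_forall("F")))
```

The truth-table loop is exponential in the number of atoms, but it always terminates, which is exactly the decidability gain that encoding down buys; modern SAT solvers are, of course, vastly faster in the well-behaved clusters of cases mentioned above.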
It’s tempting to define non-logicist AI by negation: an approach to building intelligent agents that rejects the distinguishing features of logicist AI. Such a shortcut would imply that the agents engineered by non-logicist AI researchers and developers, whatever the virtues of such agents might be, cannot be said to know that \(\phi\) – for the simple reason that, by negation, the non-logicist paradigm would have not even a single declarative proposition that is a candidate for \(\phi\). However, this isn’t a particularly enlightening way to define non-symbolic AI. A more productive approach is to say that non-symbolic AI is AI carried out on the basis of particular formalisms other than logical systems, and to then enumerate those formalisms. It will turn out, of course, that these formalisms fail to include knowledge in the normal sense. (In philosophy, as is well-known, the normal sense is one according to which if \(p\) is known, \(p\) is a declarative statement.)
From the standpoint of formalisms other than logical systems, non-logicist AI can be partitioned into symbolic but non-logicist approaches, and connectionist/neurocomputational approaches. (AI carried out on the basis of symbolic, declarative structures that, for readability and ease of use, are not treated directly by researchers as elements of formal logics, does not count. In this category fall traditional semantic networks, Schank’s (1972) conceptual dependency scheme, frame-based schemes, and other such schemes.) The former approaches, today, are probabilistic, and are based on the formalisms (Bayesian networks) covered below. The latter approaches are based, as we have noted, on formalisms that can be broadly termed “neurocomputational.” Given our space constraints, only one of the formalisms in this category is described here (and briefly at that): the aforementioned artificial neural networks.[30] Though artificial neural networks, with an appropriate architecture, could be used for arbitrary computation, they are almost exclusively used for building learning systems.
Neural nets are composed of units or nodes designed to represent neurons, which are connected by links designed to represent dendrites, each of which has a numeric weight.

A “Neuron” Within an Artificial Neural Network (from AIMA3e)
It is usually assumed that some of the units work in symbiosis with the external environment; these units form the sets of input and output units. Each unit has a current activation level, which is its output, and can compute, based on its inputs and weights on those inputs, its activation level at the next moment in time. This computation is entirely local: a unit takes account of only its neighbors in the net. This local computation is calculated in two stages. First, the input function, \(in_i\), gives the weighted sum of the unit’s input values, that is, the sum of the input activations multiplied by their weights:
\[in_i = \displaystyle\sum_j W_{ji} a_j\]In the second stage, the activation function, \(g\), takes the input from the first stage as argument and generates the output, or activation level, \(a_i\):
\[a_i = g(in_i) = g \left(\displaystyle\sum_j W_{ji}a_j\right)\]One common (and confessedly elementary) choice for the activation function (which usually governs all units in a given net) is the step function, which usually has a threshold \(t\) that sees to it that a 1 is output when the input is greater than \(t\), and that 0 is output otherwise. This is supposed to be “brain-like” to some degree, given that 1 represents the firing of a pulse from a neuron through an axon, and 0 represents no firing. A simple three-layer neural net is shown in the following picture.

A Simple Three-Layer Artificial Neural Network (from AIMA3e)
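The two-stage unit computation just described can be written out directly. The sketch below (function and parameter names are our own illustrative choices) computes the weighted sum \(in_i = \sum_j W_{ji} a_j\) and then applies a step activation with threshold \(t\):

```python
# A minimal sketch of a single unit's local computation: the input
# function in_i is the weighted sum of input activations, and the
# step activation g outputs 1 when in_i exceeds the threshold t.

def unit_output(weights, activations, t=0.5):
    """Compute a_i = g(sum_j W_ji * a_j) with a step activation g."""
    in_i = sum(w * a for w, a in zip(weights, activations))
    return 1 if in_i > t else 0

print(unit_output([0.6, 0.9], [1, 0]))  # weighted sum 0.6 > 0.5: fires (1)
print(unit_output([0.2, 0.9], [1, 0]))  # weighted sum 0.2 <= 0.5: silent (0)
```

A full network is then just such units wired in layers, with one layer's outputs serving as the next layer's input activations.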
As you might imagine, there are many different kinds of neural networks. The main distinction is between feed-forward and recurrent networks. In feed-forward networks like the one pictured immediately above, as their name suggests, links move information in one direction, and there are no cycles; recurrent networks allow for cycling back, and can become rather complicated. For a more detailed presentation, see the
Neural networks were fundamentally plagued by the fact that while they are simple and have theoretically efficient learning algorithms, when they are multi-layered, and thus sufficiently expressive to represent non-linear functions, they were very hard to train in practice. This changed in the mid-2000s with the advent of methods that exploit state-of-the-art hardware better (Rajat et al. 2009). The backpropagation method for training multi-layered neural networks can be translated into a sequence of repeated simple arithmetic operations on a large set of numbers. The general trend in computing hardware has favored algorithms that are able to do a large number of simple operations that are not that dependent on each other, versus a small number of complex and intricate operations.
Another key recent observation is that deep neural networks can be pre-trained first in an unsupervised phase where they are just fed data without any labels for the data. Each hidden layer is forced to represent the outputs of the layer below. The outcome of this training is a series of layers which represent the input domain with increasing levels of abstraction. For example, if we pre-train the network with images of faces, we would get a first layer which is good at detecting edges in images, a second layer which can combine edges to form facial features such as eyes, noses, etc., a third layer which responds to groups of features, and so on (LeCun et al. 2015).
Perhaps the best technique for teaching students about neural networks in the context of other statistical learning formalisms and methods is to focus on a specific problem, preferably one that seems unnatural to tackle using logicist techniques. The task is then to seek to engineer a solution to the problem, using any and all techniques available. One nice problem is handwriting recognition (which also happens to have a rich philosophical dimension; see e.g. Hofstadter & McGraw 1995). For example, consider the problem of assigning, given as input a handwritten digit \(d\), the correct digit, 0 through 9. Because there is a database of 60,000 labeled digits available to researchers (from the National Institute of Standards and Technology), this problem has evolved into a benchmark problem for comparing learning algorithms. It turns out that neural networks currently reign as the best approach to the problem, according to a recent ranking by Benenson (2016).
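The statistical, non-logicist flavor of such a solution can be conveyed with a nearest-neighbor classifier over tiny hand-made bitmaps; the 3×5 “digits” below are invented stand-ins for the real 28×28 images in the benchmark database:

```python
# Toy 3x5 bitmaps standing in for scanned digits (1 = ink, 0 = blank).
TRAINING = {
    0: [1,1,1, 1,0,1, 1,0,1, 1,0,1, 1,1,1],
    1: [0,1,0, 0,1,0, 0,1,0, 0,1,0, 0,1,0],
    7: [1,1,1, 0,0,1, 0,0,1, 0,0,1, 0,0,1],
}

def classify(image):
    """1-nearest-neighbor: return the label of the training bitmap that
    differs from `image` in the fewest pixels (Hamming distance)."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(TRAINING, key=lambda label: distance(TRAINING[label], image))

# A noisy "7" with one corrupted pixel is still classified correctly.
noisy_seven = [1,1,1, 0,0,1, 0,1,1, 0,0,1, 0,0,1]
print(classify(noisy_seven))  # -> 7
```

No rule about the shape of a “7” is ever written down; the label is recovered purely by statistical proximity to labeled examples, which is the spirit of the benchmark.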
Readers interested in AI (and computational cognitive science) pursued from an overtly brain-based orientation are encouraged to explore the work of Rick Granger (2004a, 2004b) and researchers in his Brain Engineering Laboratory and W. H. Neukom Institute for Computational Sciences. The contrast between the “dry”, logicist AI started at the original 1956 conference, and the approach taken here by Granger and associates (in which brain circuitry is directly modeled), is remarkable. For those interested in computational properties of neural networks, Hornik et al. (1989) address the general representation capability of neural networks independent of learning.
At this point the reader has been exposed to the chief formalisms in AI, and may wonder about heterogeneous approaches that bridge them. Is there such research and development in AI? Yes. From an engineering standpoint, such work makes irresistibly good sense. There is now an understanding that, in order to build applications that get the job done, one should choose from a toolbox that includes logicist, probabilistic/Bayesian, and neurocomputational techniques. Given that the original top-down logicist paradigm is alive and thriving (e.g., see Brachman & Levesque 2004, Mueller 2006), and that, as noted, a resurgence of Bayesian and neurocomputational approaches has placed these two paradigms on solid, fertile footing as well, AI now moves forward, armed with this fundamental triad, and it is a virtual certainty that applications (e.g., robots) will be engineered by drawing from elements of all three. Watson’s DeepQA architecture is one recent example of an engineering system that leverages multiple paradigms. For a detailed discussion, see the
Supplement on Watson’s DeepQA Architecture.
Google DeepMind’s AlphaGo is another example of a multi-paradigm system, although in a much narrower form than Watson. The central algorithmic problem in games such as Go or chess is to search through a vast space of sequences of valid moves. For most non-trivial games, it is not feasible to do this exhaustively. The Monte Carlo tree search (MCTS) algorithm gets around this obstacle by searching through an enormous space of valid moves in a statistical fashion (Browne et al. 2012). While MCTS is the central algorithm in AlphaGo, there are two neural networks which help evaluate states in the game and help model how expert opponents play (Silver et al. 2016). It should be noted that MCTS is behind almost all the winning submissions in general game playing (Finnsson 2012).
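The statistical core of MCTS can be conveyed by its simplest ancestor, flat Monte Carlo move evaluation: score each legal move by random playouts rather than exhaustive search. The game used below (a trivial take-away game, chosen for brevity) is our own illustration, not drawn from the AlphaGo work:

```python
import random

def random_playout(stones, my_turn):
    """Finish a take-away game (remove 1-3 stones per turn; taking the
    last stone wins) with uniformly random moves.  Return True iff 'we'
    -- the player flagged by my_turn -- take the last stone."""
    while True:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return my_turn          # the player who just moved wins
        my_turn = not my_turn

def best_move(stones, playouts=200):
    """Flat Monte Carlo move evaluation: estimate each legal move's win
    rate by random playouts, then pick the highest estimate.  No game
    tree is ever enumerated exhaustively."""
    scores = {}
    for move in range(1, min(3, stones) + 1):
        wins = 0
        for _ in range(playouts):
            if stones - move == 0:
                wins += 1           # taking the last stone wins outright
            else:                   # otherwise the opponent moves next
                wins += random_playout(stones - move, my_turn=False)
        scores[move] = wins / playouts
    return max(scores, key=scores.get)

random.seed(0)  # fixed seed so the example is reproducible
print(best_move(3))  # -> 3: taking all three stones wins immediately
```

Full MCTS adds a selectively grown search tree on top of such playouts (and AlphaGo further adds learned networks for state evaluation and move priors), but the statistical sampling of outcomes is the same basic idea.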
What, though, about deep, theoretical integration of the main paradigms in AI? Such integration is at present only a possibility for the future, but readers are directed to the research of some striving for such integration. For example: Sun (1994, 2002) has been working to demonstrate that human cognition that is on its face symbolic in nature (e.g., professional philosophizing in the analytic tradition, which deals explicitly with arguments and definitions carefully symbolized) can arise from cognition that is neurocomputational in nature. Koller (1997) has investigated the marriage between probability theory and logic. And, in general, the very recent arrival of so-called human-level AI is being led by theorists seeking to genuinely integrate the three paradigms set out above (e.g., Cassimatis 2006).
Finally, we note that cognitive architectures such as Soar (Laird 2012) and PolyScheme (Cassimatis 2006) are another area where integration of different fields of AI can be found. For example, one such endeavor striving to build human-level AI is the Companions project (Forbus and Hinrichs 2006). Companions are long-lived systems that strive to be human-level AI systems that function as collaborators with humans. The Companions architecture tries to solve multiple AI problems such as reasoning and learning, interactivity, and longevity in one unifying system.
As we noted above, work on AI has mushroomed over the past couple ofdecades. Now that we have looked a bit at the content that composesAI, we take a quick look at the explosive growth of AI.
First, a point of clarification. The growth of which we speak is not a shallow sort correlated with the amount of funding provided for a given sub-field of AI. That kind of thing happens all the time in all fields, and can be triggered by entirely political and financial changes designed to grow certain areas, and diminish others. Along the same line, the growth of which we speak is not correlated with the amount of industrial activity revolving around AI (or a sub-field thereof); for this sort of growth too can be driven by forces quite outside an expansion in the scientific breadth of AI.[31] Rather, we are speaking of an explosion of deep content: new material which someone intending to be conversant with the field needs to know. Relative to other fields, the size of the explosion may or may not be unprecedented. (Though it should perhaps be noted that an analogous increase in philosophy would be marked by the development of entirely new formalisms for reasoning, reflected in the fact that, say, longstanding philosophy textbooks like Copi’s (2004) Introduction to Logic are dramatically rewritten and enlarged to include these formalisms, rather than remaining anchored to essentially immutable core formalisms, with incremental refinement around the edges through the years.) But it certainly appears to be quite remarkable, and is worth taking note of here, if for no other reason than that AI’s near-future will revolve in significant part around whether or not the new content in question forms a foundation for new long-lived research and development that would not otherwise obtain.[32]
AI has also witnessed an explosion in its usage in various artifacts and applications. While we are nowhere near building a machine with the capabilities of a human, or one that acts rationally in all scenarios according to the Russell/Hutter definition above, algorithms that have their origins in AI research are now widely deployed for many tasks in a variety of domains.
A huge part of AI’s growth in applications has been made possible through the invention of new algorithms in the subfield of machine learning. Machine learning is concerned with building systems that improve their performance on a task when given examples of ideal performance on the task, or improve their performance with repeated experience on the task. Algorithms from machine learning have been used in speech recognition systems, spam filters, online fraud-detection systems, product-recommendation systems, etc. The current state-of-the-art in machine learning can be divided into three areas (Murphy 2013, Alpaydin 2014): supervised learning, unsupervised learning, and reinforcement learning.
In addition to being used in domains that are traditionally the ken of AI, machine-learning algorithms have also been used in all stages of the scientific process. For example, machine-learning techniques are now routinely applied to analyze large volumes of data generated from particle accelerators. CERN, for instance, generates a petabyte (\(10^{15}\) bytes) per second, and statistical algorithms that have their origins in AI are used to filter and analyze this data. Particle accelerators are used in fundamental experimental research in physics to probe the structure of our physical universe. They work by colliding larger particles together to create much finer particles. Not all such events are fruitful. Machine-learning methods have been used to select events which are then analyzed further (Whiteson & Whiteson 2009 and Baldi et al. 2014). More recently, researchers at CERN launched a machine-learning competition to aid in the analysis of the Higgs boson. The goal of this challenge was to develop algorithms that separate meaningful events from background noise given data from the Large Hadron Collider, a particle accelerator at CERN.
In the past few decades, there has been an explosion in data that does not have any explicit semantics attached to it. This data is generated by both humans and machines. Most of this data is not easily machine-processable; for example, images, text, video (as opposed to carefully curated data in a knowledge- or data-base). This has given rise to a huge industry that applies AI techniques to get usable information from such enormous data. This field of applying techniques derived from AI to large volumes of data goes by names such as “data mining,” “big data,” “analytics,” etc. This field is too vast to even moderately cover in the present article, but we note that there is no full agreement on what constitutes such a “big-data” problem. One definition, from Madden (2012), is that big data differs from traditional machine-processable data in that it is too big (for most of the existing state-of-the-art hardware), too quick (generated at a fast rate, e.g. online email transactions), or too hard. It is in the too-hard part that AI techniques work quite well. While this universe is quite varied, we use the Watson system later in this article as an AI-relevant exemplar. As we will see later, while most of this new explosion is powered by learning, it isn’t entirely limited to just learning. This bloom in learning algorithms has been supported by both a resurgence in neurocomputational techniques and probabilistic techniques.
One of the remarkable aspects of (Charniak & McDermott 1985) is this: The authors say the central dogma of AI is that “What the brain does may be thought of at some level as a kind of computation” (p. 6). And yet nowhere in the book is brain-like computation discussed. In fact, you will search the index in vain for the term ‘neural’ and its variants. Please note that the authors are not to blame for this. A large part of AI’s growth has come from formalisms, tools, and techniques that are, in some sense, brain-based, not logic-based. A paper that conveys the importance and maturity of neurocomputation is (Litt et al. 2006). (Growth has also come from a return of probabilistic techniques that had withered by the mid-70s and 80s. More about that momentarily, in the next “resurgence” section.)
One very prominent class of non-logicist formalism does make an explicit nod in the direction of the brain: viz., artificial neural networks (or, as they are often simply called, neural networks, or even just neural nets). (The structure of neural networks and more recent developments are discussed above.) Because Minsky and Papert’s (1969) Perceptrons led many (including, specifically, many sponsors of AI research and development) to conclude that neural networks didn’t have sufficient information-processing power to model human cognition, the formalism was pretty much universally dropped from AI. However, Minsky and Papert had only considered very limited neural networks. Connectionism, the view that intelligence consists not in symbolic processing, but rather non-symbolic processing at least somewhat like what we find in the brain (at least at the cellular level), approximated specifically by artificial neural networks, came roaring back in the early 1980s on the strength of more sophisticated forms of such networks, and soon the situation was (to use a metaphor introduced by John McCarthy) that of two horses in a race toward building truly intelligent agents.
If one had to pick a year at which connectionism was resurrected, it would certainly be 1986, the year Parallel Distributed Processing (Rumelhart & McClelland 1986) appeared in print. The rebirth of connectionism was specifically fueled by the back-propagation (backpropagation) algorithm over neural networks, nicely covered in Chapter 20 of AIMA. The symbolicist/connectionist race led to a spate of lively debate in the literature (e.g., Smolensky 1988, Bringsjord 1991), and some AI engineers have explicitly championed a methodology marked by a rejection of knowledge representation and reasoning. For example, Rodney Brooks was such an engineer; he wrote the well-known “Intelligence Without Representation” (1991), and his Cog Project, to which we referred above, is arguably an incarnation of the premeditatedly non-logicist approach. Increasingly, however, those in the business of building sophisticated systems find that both logicist and more neurocomputational techniques are required (Wermter & Sun 2001).[33] In addition, the neurocomputational paradigm today includes connectionism only as a proper part, in light of the fact that some of those working on building intelligent systems strive to do so by engineering brain-based computation outside the neural network-based approach (e.g., Granger 2004a, 2004b).
Another recent resurgence in neurocomputational techniques has occurred in machine learning. The modus operandi in machine learning is that given a problem, say recognizing handwritten digits \(\{0,1,\ldots,9\}\) or faces, from a 2D matrix representing an image of the digits or faces, a machine-learning or domain expert would construct a feature vector representation function for the task. This function is a transformation of the input into a format that tries to throw away irrelevant information in the input and keep only information useful for the task. Inputs transformed by \(\rr\) are termed features. For recognizing faces, irrelevant information could be the amount of lighting in the scene and relevant information could be information about facial features. The machine is then fed a sequence of inputs represented by the features and the ideal or ground-truth output values for those inputs. This converts the learning challenge from that of having to learn the function \(\ff\) from the examples: \(\left\{\left\langle x_1, \ff(x_1)\right\rangle,\left\langle x_2, \ff(x_2)\right\rangle, \ldots, \left\langle x_n, \ff(x_n)\right\rangle \right\}\) to having to learn from possibly easier data: \(\left\{\left\langle \rr(x_1), \ff(x_1)\right\rangle,\left\langle \rr(x_2), \ff(x_2)\right\rangle, \ldots, \left\langle \rr(x_n), \ff(x_n)\right\rangle \right\}\). Here the function \(\rr\) is the function that computes the feature vector representation of the input. Formally, \(\ff\) is assumed to be a composition of the functions \(\gg\) and \(\rr\). That is, for any input \(x\), \(\ff(x) = \gg\left(\rr\left(x\right)\right)\). This is denoted by \(\ff=\gg\circ \rr\). For any input, the features are first computed, and then the function \(\gg\) is applied. If the feature representation \(\rr\) is provided by the domain expert, the learning problem becomes simpler to the extent the feature representation takes on the difficulty of the task.
At one extreme, the feature vector could hide an easily extractable form of the answer in the input, and at the other extreme the feature representation could be just the plain input.
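The composition \(\ff = \gg\circ \rr\) can be made concrete with a toy sketch; the task, the feature function, and all numbers below are invented purely for illustration:

```python
# Toy illustration of f = g o r.  Inputs are (ink_pixels, lighting) pairs;
# the task f is "is this a heavily-inked digit?".  Lighting is irrelevant
# to the task, so the feature function r throws it away.

def r(x):
    """Feature representation: keep the ink count, drop the lighting."""
    ink, lighting = x
    return ink

def g(features):
    """The learned part: here, just a threshold on the extracted feature."""
    return features > 50

def f(x):
    """The target function, realized as the composition g o r."""
    return g(r(x))

print(f((80, 0.3)))  # -> True: heavy ink, regardless of dim lighting
print(f((10, 0.9)))  # -> False: light ink, regardless of bright lighting
```

Because \(\rr\) has already discarded the irrelevant lighting coordinate, the part left to learn, \(\gg\), is a trivial one-dimensional threshold; with the raw pair as input, the learner would have to discover the irrelevance of lighting on its own.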
For non-trivial problems, choosing the right representation is vital. For instance, one of the drastic changes in the AI landscape was due to Minsky and Papert’s (1969) demonstration that the perceptron cannot learn even the binary XOR function, but this function can be learnt by the perceptron if we have the right representation. Feature engineering has grown to be one of the most labor-intensive tasks of machine learning, so much so that it is considered to be one of the “black arts” of machine learning. The other significant black art of learning methods is choosing the right parameters. These black arts require significant human expertise and experience, which can be quite difficult to obtain without significant apprenticeship (Domingos 2012). A bigger issue still is that the task of feature engineering is just knowledge representation in a new skin.
Given this state of affairs, there has been a recent resurgence in methods for automatically learning a feature representation function \(\rr\); such methods potentially bypass a large part of the human labor that is traditionally required. Such methods are based mostly on what are now termed deep neural networks. Such networks are simply neural networks with two or more hidden layers. These networks allow us to learn a feature function \(\rr\) by using one or more of the hidden layers to learn \(\rr\). The general form of learning in which one learns from the raw sensory data without much hand-based feature engineering now has its own term: deep learning. A general and yet concise definition (Bengio et al. 2015) is:
Deep learning can safely be regarded as the study of models thateither involve a greater amount of composition of learned functions orlearned concepts than traditional machine learning does. (Bengio etal. 2015, Chapter 1)
Though the idea has been around for decades, recent innovations leading to more efficient learning techniques have made the approach more feasible (Bengio et al. 2013). Deep-learning methods have recently produced state-of-the-art results in image recognition (given an image containing various objects, label the objects from a given set of labels), speech recognition (from audio input, generate a textual representation), and the analysis of data from particle accelerators (LeCun et al. 2015). Despite impressive results in tasks such as these, minor and major issues remain unresolved. A minor issue is that significant human expertise is still needed to choose an architecture and set up the right parameters for the architecture; a major issue is the existence of so-called adversarial inputs, which are indistinguishable from normal inputs to humans but are computed in a special manner that makes a neural network regard them as different from similar inputs in the training data. The existence of such adversarial inputs, which remain stable across training data, has raised doubts about how well performance on benchmarks can translate into performance in real-world systems with sensory noise (Szegedy et al. 2014).
There is a second dimension to the explosive growth of AI: the explosion in popularity of probabilistic methods that aren’t neurocomputational in nature, in order to formalize and mechanize a form of non-logicist reasoning in the face of uncertainty. Interestingly enough, it is Eugene Charniak himself who can be safely considered one of the leading proponents of an explicit, premeditated turn away from logic to statistical techniques. His area of specialization is natural language processing, and whereas his introductory textbook of 1985 gave an accurate sense of his approach to parsing at the time (as we have seen, write computer programs that, given English text as input, ultimately infer meaning expressed in FOL), this approach was abandoned in favor of purely statistical approaches (Charniak 1993). At the AI@50 conference, Charniak boldly proclaimed, in a talk tellingly entitled “Why Natural Language Processing is Now Statistical Natural Language Processing,” that logicist AI is moribund, and that the statistical approach is the only promising game in town – for the next 50 years.[34]
The chief source of energy and debate at the conference flowed fromthe clash between Charniak’s probabilistic orientation, and theoriginal logicist orientation, upheld at the conference in question byJohn McCarthy and others.
AI’s use of probability theory grows out of the standard form ofthis theory, which grew directly out of technical philosophy andlogic. This form will be familiar to many philosophers, butlet’s review it quickly now, in order to set a firm stage formaking points about the new probabilistic techniques that haveenergized AI.
Just as in the case of FOL, in probability theory we are concerned with declarative statements, or propositions, to which degrees of belief are applied; we can thus say that both logicist and probabilistic approaches are symbolic in nature. Both approaches also agree that statements can either be true or false in the world. In building agents, a simplistic logic-based approach requires agents to know the truth-value of all possible statements. This is not realistic, as an agent may not know the truth-value of some proposition \(p\) due to either ignorance, non-determinism in the physical world, or just plain vagueness in the meaning of the statement. More specifically, the fundamental proposition in probability theory is a random variable, which can be conceived of as an aspect of the world whose status is initially unknown to the agent. We usually capitalize the names of random variables, though we reserve \(p,q,r, \ldots\) as such names as well. For example, in a particular murder investigation centered on whether or not Mr. Barolo committed the crime, the random variable \(Guilty\) might be of concern. The detective may be interested as well in whether or not the murder weapon – a particular knife, let us assume – belongs to Barolo. In light of this, we might say that \(\Weapon = \true\) if it does, and \(\Weapon = \false\) if it doesn’t. As a notational convenience, we can write \(weapon\) and \(\lnot weapon\) for these two cases, respectively; and we can use this convention for other variables of this type.
The kind of variables we have described so far are \(\mathbf{Boolean}\), because their \(\mathbf{domain}\) is simply \(\{true,false\}.\) But we can generalize and allow \(\mathbf{discrete}\) random variables, whose values are from any countable domain. For example, \(\PriceTChina\) might be a variable for the price of (a particular, presumably) tea in China, and its domain might be \(\{1,2,3,4,5\}\), where each number here is in US dollars. A third type of variable is \(\mathbf{continuous}\); its domain is either the reals, or some subset thereof.
We say that an atomic event is an assignment of particular values from the appropriate domains to all the variables composing the (idealized) world. For example, in the simple murder investigation world introduced just above, we have two Boolean variables, \(\Guilty\) and \(\Weapon\), and there are just four atomic events. Note that atomic events have some obvious properties. For example, they are mutually exclusive, exhaustive, and logically entail the truth or falsity of every proposition. Usually not obvious to beginning students is a fourth property, namely, any proposition is logically equivalent to the disjunction of all atomic events that entail that proposition.
Prior probabilities correspond to a degree of belief accorded to aproposition in the complete absence of any other information. Forexample, if the prior probability of Barolo’s guilt is \(0.2\),we write \[ P\left(\Guilty=true\right)=0.2 \]
or simply \(\P(guilty)=0.2\). It is often convenient to have anotation allowing one to refer economically to the probabilities ofall the possible values for a random variable. For example,we can write \[ \P\left(\PriceTChina\right) \]
as an abbreviation for the five equations listing all the possibleprices for tea in China. We can also write \[ \P\left(\PriceTChina\right)=\langle 1,2,3,4,5\rangle \]
In addition, as further convenient notation, we can write \(\mathbf{P}\left(\Guilty, \Weapon\right)\) to denote the probabilities of all combinations of values of the relevant set of random variables. This is referred to as the joint probability distribution of \(\Guilty\) and \(\Weapon\). The full joint probability distribution covers the distribution for all the random variables used to describe a world. Given our simple murder world, we have 20 atomic events summed up in the equation \[ \mathbf{P}\left(\Guilty, \Weapon, \PriceTChina\right) \]
The final piece of the basic language of probability theory corresponds to conditional probabilities. Where \(p\) and \(q\) are any propositions, the relevant expression is \(P\!\left(p \given q\right)\), which can be interpreted as “the probability of \(p\), given that all we know is \(q\).” For example, \[ P\left(guilty \ggiven weapon\right)=0.7 \]
says that if the murder weapon belongs to Barolo, and no otherinformation is available, the probability that Barolo is guilty is\(0.7.\)
Andrei Kolmogorov showed how to construct probability theory fromthree axioms that make use of the machinery now introduced, viz.,
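The three axioms, in the form in which they are standardly presented (e.g., in AIMA), are:

\[
\begin{aligned}
&\textbf{1.}\quad 0 \le P(p) \le 1 \text{ for every proposition } p;\\
&\textbf{2.}\quad P(\true) = 1 \text{ and } P(\false) = 0;\\
&\textbf{3.}\quad P(p \lor q) = P(p) + P(q) - P(p \land q).
\end{aligned}
\]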
These axioms are clearly at bottom logicist. The remainder ofprobability theory can be erected from this foundation (conditionalprobabilities are easily defined in terms of prior probabilities). Wecan thus say that logic is in some fundamental sense still being usedto characterize the set of beliefs that a rational agent can have. Butwhere does probabilisticinference enter the picture on thisaccount, since traditional deduction is not used for inference inprobability theory?
Probabilistic inference consists in computing, from observed evidence expressed in terms of probability theory, posterior probabilities of propositions of interest. For a good long while, there have been algorithms for carrying out such computation. These algorithms precede the resurgence of probabilistic techniques in the 1990s. (Chapter 13 of AIMA presents a number of them.) For example, given the Kolmogorov axioms, here is a straightforward way of computing the probability of any proposition, using the full joint distribution giving the probabilities of all atomic events: Where \(p\) is some proposition, let \(\alpha(p)\) be the disjunction of all atomic events in which \(p\) holds. Since the probability of a proposition (i.e., \(P(p)\)) is equal to the sum of the probabilities of the atomic events in which it holds, we have an equation that provides a method for computing the probability of any proposition \(p\), viz.,
\[ P(p) = \sum_{e_i\in\alpha(p)} P(e_i)\]
Unfortunately, there were two serious problems infecting this original probabilistic approach: One, the processing in question needed to take place over paralyzingly large amounts of information (enumeration over the entire distribution is required). And two, the expressivity of the approach was merely propositional. (It was, by the way, the philosopher Hilary Putnam (1963) who pointed out that there was a price to pay in moving to the first-order level. The issue is not discussed herein.) Everything changed with the advent of a new formalism that marks the marriage of probabilism and graph theory: Bayesian networks (also called belief nets). The pivotal text was (Pearl 1988). For a more detailed discussion, see the
Supplement on Bayesian Networks.
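Inference by enumeration in the two-variable murder world can be sketched directly. The four numbers in the full joint distribution below are invented for illustration, chosen so that the prior \(P(guilty)=0.2\) and the conditional \(P(guilty \given weapon)=0.7\) used in the text both come out right:

```python
# Full joint distribution over the four atomic events of the murder world.
# The probabilities are illustrative, chosen to agree with the prior
# P(guilty) = 0.2 and the conditional P(guilty | weapon) = 0.7 in the text.
JOINT = {
    ('guilty', 'weapon'):         0.14,
    ('guilty', 'not_weapon'):     0.06,
    ('not_guilty', 'weapon'):     0.06,
    ('not_guilty', 'not_weapon'): 0.74,
}

def prob(proposition):
    """P(p): the sum of the probabilities of the atomic events entailing p."""
    return sum(pr for event, pr in JOINT.items() if proposition in event)

def cond_prob(p, q):
    """P(p | q) = P(p and q) / P(q), computed by enumeration."""
    joint_pq = sum(pr for event, pr in JOINT.items()
                   if p in event and q in event)
    return joint_pq / prob(q)

print(prob('guilty'))                 # the prior from the text
print(cond_prob('guilty', 'weapon'))  # the conditional from the text
```

The enumeration over all atomic events is exactly what makes the naive method intractable at scale: a world with \(n\) Boolean variables has \(2^n\) atomic events, which is the problem Bayesian networks were introduced to tame.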
Before concluding this section, it is probably worth noting that, from the standpoint of philosophy, a situation such as the murder investigation we have exploited above would often be analyzed into arguments, and strength factors, not into numbers to be crunched by purely arithmetical procedures. For example, in the epistemology of Roderick Chisholm, as presented in his Theory of Knowledge (1966, 1977), Detective Holmes might classify a proposition like Barolo committed the murder as counterbalanced if he was unable to find a compelling argument either way, or perhaps probable if the murder weapon turned out to belong to Barolo. Such categories cannot be found on a continuum from 0 to 1, and they are used in articulating arguments for or against Barolo’s guilt. Argument-based approaches to uncertain and defeasible reasoning are virtually non-existent in AI. One exception is Pollock’s approach, covered below. This approach is Chisholmian in nature.
It should also be noted that there have been well-established formalisms for dealing with probabilistic reasoning as an instance of logic-based reasoning. E.g., the activity a researcher in probabilistic reasoning undertakes when she proves a theorem \(\phi\) about her domain (e.g. any theorem in (Pearl 1988)) is purely within the realm of traditional logic. Readers interested in logic-flavored approaches to probabilistic reasoning can consult (Adams 1996, Hailperin 1996 & 2010, Halpern 1998). Formalisms marrying probability theory, induction and deductive reasoning, placing them on an equal footing, have been on the rise, with Markov logic (Richardson and Domingos 2006) being salient among these approaches.
Probabilistic Machine Learning
Machine learning, in the sense given above, has been associated with probabilistic techniques. Probabilistic techniques have been associated with both the learning of functions (e.g. Naive Bayes classification) and the modeling of theoretical properties of learning algorithms. For example, a standard reformulation of supervised learning casts it as a Bayesian problem. Assume that we are looking at recognizing digits \([0{-}9]\) from a given image. One way to cast this problem is to ask what the probability is that the hypothesis \(H_x\): “the digit is \(x\)” is true, given the image \(d\) from a sensor. Bayes’ theorem gives us:
\[ P\left(H_x\ggiven d\right) = \frac{P\left(d\ggiven H_x\right)*P\left(H_x\right)}{P\left(d\right)}\]
\(P(d\given H_x)\) and \(P(H_x)\) can be estimated from the given training dataset. The hypothesis with the highest posterior probability is then given as the answer: \(\argmax_{x}P\left(d\ggiven H_x\right)*P\left(H_x\right)\). In addition to probabilistic methods being used to build algorithms, probability theory has also been used to analyze algorithms which might not have an overt probabilistic or logical formulation. For example, one of the central classes of meta-theorems in learning, probably approximately correct (PAC) theorems, are cast in terms of lower bounds on the probability that the mismatch between the induced/learnt function \(f_L\) and the true function \(f_T\) is less than a certain amount, given that the learnt function \(f_L\) works well for a certain number of cases (see Chapter 18, AIMA).
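The Bayesian recipe above reduces, computationally, to a single argmax over hypotheses; since \(P(d)\) is the same for every \(H_x\), it can be ignored. The prior and likelihood tables below are invented solely to illustrate the computation:

```python
# Hypotheses: the digit is 0, 1, or 2 (truncated to three for brevity).
# PRIOR[x] stands in for an estimate of P(H_x); LIKELIHOOD[x] stands in
# for P(d | H_x) for one fixed observed image d.  All numbers invented.
PRIOR      = {0: 0.3, 1: 0.5, 2: 0.2}
LIKELIHOOD = {0: 0.01, 1: 0.02, 2: 0.40}

def map_hypothesis(prior, likelihood):
    """Return argmax_x P(d | H_x) * P(H_x).  Dividing by P(d) is
    unnecessary, since it is constant across all hypotheses."""
    return max(prior, key=lambda x: likelihood[x] * prior[x])

print(map_hypothesis(PRIOR, LIKELIHOOD))  # -> 2 (0.40 * 0.2 beats the rest)
```

Note that the most probable digit a priori (1) loses to the digit the observed image most strongly supports (2): the likelihood term is doing the evidential work.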
From at least its modern inception, AI has always been connected to gadgets, often ones produced by corporations, and it would be remiss of us not to say a few words about this phenomenon. While there have been a large number of commercial in-the-wild success stories for AI and its sister fields, such as optimization and decision-making, some applications are more visible and have been thoroughly battle-tested in the wild. In 2014, one of the most visible such domains (one in which AI has been strikingly successful) is information retrieval, incarnated as web search. Another recent success story is pattern recognition. The state-of-the-art in applied pattern recognition (e.g., fingerprint/face verification, speech recognition, and handwriting recognition) is robust enough to allow “high-stakes” deployment outside the laboratory. As of mid-2018, several corporations and research laboratories have begun testing autonomous vehicles on public roads, with even a handful of jurisdictions making self-driving cars legal to operate. For example, Google’s autonomous cars have navigated hundreds of thousands of miles in California with minimal human help under non-trivial conditions (Guizzo 2011).
Computer games provide a robust test bed for AI techniques as they can capture important parts that might be necessary to test an AI technique while abstracting or removing details that might be beyond the scope of core AI research, for example, designing better hardware or dealing with legal issues (Laird and VanLent 2001). One subclass of games that has proven quite fruitful for commercial deployment of AI is real-time strategy games. Real-time strategy games are games in which players manage an army given limited resources. One objective is to constantly battle other players and reduce an opponent’s forces. Real-time strategy games differ from strategy games in that players plan their actions simultaneously in real-time and do not have to take turns playing. Such games have a number of challenges that are tantalizingly within the grasp of the state-of-the-art. This makes such games an attractive venue in which to deploy simple AI agents. An overview of AI used in real-time strategy games can be found in (Robertson and Watson 2015).
Some other ventures in AI, despite significant success, have been chugging along slowly, humbly, and quietly. For instance, AI-related methods have achieved triumphs in solving open problems in mathematics that have resisted any solution for decades. The most noteworthy instance of such a problem is perhaps a proof of the statement that “All Robbins algebras are Boolean algebras.” This was conjectured in the 1930s, and the proof was finally discovered by the Otter automatic theorem-prover in 1996 after just a few months of effort (Kolata 1996, Wos 2013). Sister fields like formal verification have also bloomed to the extent that it is now not too difficult to semi-automatically verify vital hardware/software components (Kaufmann et al. 2000 and Chajed et al. 2017).
Other related areas, such as (natural) language translation, still have a long way to go, but are good enough to let us use them under restricted conditions. The jury is out on tasks such as machine translation, which seems to require both statistical methods (Lopez 2008) and symbolic methods (España-Bonet 2011). Both methods now have comparable but limited success in the wild. A deployed translation system at Ford, developed for translating manufacturing process instructions from English to other languages, started out as a rule-based system with Ford- and domain-specific vocabulary and language. This system then evolved to incorporate statistical techniques along with rule-based techniques as it gained new uses beyond translating manuals, for example, lay users within Ford translating their own documents (Rychtyckyj and Plesco 2012).
AI’s great achievements mentioned so far have all been in limited, narrow domains. This lack of any success in the unrestricted general case has caused a small set of researchers to break away into what is now called artificial general intelligence (Goertzel and Pennachin 2007). The stated goals of this movement include shifting the focus again to building artifacts that are generally intelligent and not just capable in one narrow domain.
Computer Ethics has been around for a long time. In this sub-field, typically one would consider how one ought to act in a certain class of situations involving computer technology, where the “one” here refers to a human being (Moor 1985). So-called “robot ethics” is different. In this sub-field (which goes by names such as “moral AI,” “ethical AI,” “machine ethics,” “moral robots,” etc.) one is confronted with such prospects as robots being able to make autonomous and weighty decisions – decisions that might or might not be morally permissible (Wallach & Allen 2010). If one were to attempt to engineer a robot with a capacity for sophisticated ethical reasoning and decision-making, one would also be doing Philosophical AI, as that concept is characterized elsewhere in the present entry. There can be many different flavors of approaches toward Moral AI. Wallach and Allen (2010) provide a high-level overview of the different approaches. Moral reasoning is obviously needed in robots that have the capability for lethal action. Arkin (2009) provides an introduction to how we can control and regulate machines that have the capacity for lethal behavior. Moral AI goes beyond obviously lethal situations, and we can have a spectrum of moral machines. Moor (2006) provides one such spectrum of possible moral agents. An example of a non-lethal but ethically-charged machine would be a lying machine. Clark (2010) uses a computational theory of the mind, the ability to represent and reason about other agents, to build a lying machine that successfully persuades people into believing falsehoods. Bello & Bringsjord (2013) give a general overview of what might be required to build a moral machine, one of the ingredients being a theory of mind.
The most general framework for building machines that can reason ethically consists in endowing the machines with a moral code. This requires that the formal framework used for reasoning by the machine be expressive enough to receive such codes. The field of Moral AI, for now, is not concerned with the source or provenance of such codes. The source could be humans, and the machine could receive the code directly (via explicit encoding) or indirectly (reading). Another possibility is that the code is inferred by the machine from a more basic set of laws. We assume that the robot has access to some such code, and we then try to engineer the robot to follow that code under all circumstances while making sure that the moral code and its representation do not lead to unintended consequences. Deontic logics are a class of formal logics that have been studied the most for this purpose. Abstractly, such logics are concerned mainly with what follows from a given moral code. Engineering then studies the match of a given deontic logic to a moral code (i.e., is the logic expressive enough?), which has to be balanced with the ease of automation. Bringsjord et al. (2006) provide a blueprint for using deontic logics to build systems that can perform actions in accordance with a moral code. The role deontic logics play in the framework offered by Bringsjord et al. (which can be considered to be representative of the field of deontic logic for moral AI) can be best understood as striving towards Leibniz’s dream of a universal moral calculus:
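To give a concrete (if drastically simplified) flavor of the machinery, the sketch below is our own toy, not the Bringsjord et al. framework: a moral code maps action types to deontic statuses, and the agent checks candidate actions against the code before acting. All action names and statuses are invented for illustration.

```python
# Toy deontic check: a moral code assigns each action type a status,
# and the agent performs only actions the code does not forbid.
# Statuses and actions here are invented for illustration.
FORBIDDEN, PERMISSIBLE, OBLIGATORY = "forbidden", "permissible", "obligatory"

MORAL_CODE = {
    "deceive": FORBIDDEN,
    "assist": OBLIGATORY,
    "wander": PERMISSIBLE,
}

def sanctioned(action, code=MORAL_CODE):
    """An action may be performed iff the code does not forbid it
    (unlisted actions default to permissible)."""
    return code.get(action, PERMISSIBLE) != FORBIDDEN

def obligations(code=MORAL_CODE):
    """Actions the code obligates the agent to perform."""
    return sorted(a for a, s in code.items() if s == OBLIGATORY)

print(sanctioned("deceive"), obligations())  # → False ['assist']
```

A real deontic-logic framework would, of course, derive such verdicts from axioms rather than look them up, which is precisely where expressiveness must be traded against ease of automation.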
When controversies arise, there will be no more need for a disputation between two philosophers than there would be between two accountants [computistas]. It would be enough for them to pick up their pens and sit at their abacuses, and say to each other (perhaps having summoned a mutual friend): ‘Let us calculate.’
Deontic logic-based frameworks can also be used in a fashion that is analogous to moral self-reflection. In this mode, logic-based verification of the robot’s internal modules can be done before the robot ventures out into the real world. Govindarajulu and Bringsjord (2015) present an approach, drawing from formal-program verification, in which a deontic-logic based system could be used to verify that a robot acts in a certain ethically-sanctioned manner under certain conditions. Since formal-verification approaches can be used to assert statements about an infinite number of situations and conditions, such approaches might be preferred to having the robot roam around in an ethically-charged test environment and make a finite set of decisions that are then judged for their ethical correctness. More recently, Govindarajulu and Bringsjord (2017) use a deontic logic to present a computational model of the Doctrine of Double Effect, an ethical principle for moral dilemmas that has been studied empirically and analyzed extensively by philosophers.[35] The principle is usually presented and motivated via dilemmas using trolleys and was first presented in this fashion by Foot (1967).
While there has been substantial theoretical and philosophical work, the field of machine ethics is still in its infancy. There has been some embryonic work in building ethical machines. One recent such example would be Pereira and Saptawijaya (2016), who use logic programming and base their work in machine ethics on the ethical theory known as contractualism, set out by Scanlon (1982). And what about the future? Since artificial agents are bound to get smarter and smarter, and to have more and more autonomy and responsibility, robot ethics is almost certainly going to grow in importance. This endeavor might not be a straightforward application of classical ethics. For example, experimental results suggest that humans hold robots to different ethical standards than they expect from humans under similar conditions (Malle et al. 2015).[36]
Notice that the heading for this section isn’t Philosophy of AI. We’ll get to that category momentarily. (For now it can be identified with the attempt to answer such questions as whether artificial agents created in AI can ever reach the full heights of human intelligence.) Philosophical AI is AI, not philosophy; but it’s AI rooted in, and flowing from, philosophy. For example, one could engage, using the tools and techniques of philosophy, a paradox, work out a proposed solution, and then proceed to a step that is surely optional for philosophers: expressing the solution in terms that can be translated into a computer program that, when executed, allows an artificial agent to surmount concrete instances of the original paradox.[37] Before we ostensively characterize Philosophical AI of this sort courtesy of a particular research program, let us consider first the view that AI is in fact simply philosophy, or a part thereof.
Daniel Dennett (1979) has famously claimed not just that there are parts of AI intimately bound up with philosophy, but that AI is philosophy (and psychology, at least of the cognitive sort). (He has made a parallel claim about Artificial Life (Dennett 1998).) This view will turn out to be incorrect, but the reasons why it’s wrong will prove illuminating, and our discussion will pave the way for a discussion of Philosophical AI.
What does Dennett say, exactly? This:
I want to claim that AI is better viewed as sharing with traditional epistemology the status of being a most general, most abstract asking of the top-down question: how is knowledge possible? (Dennett 1979, 60)
Elsewhere he says his view is that AI should be viewed “as a most abstract inquiry into the possibility of intelligence or knowledge” (Dennett 1979, 64).
In short, Dennett holds that AI is the attempt to explain intelligence, not by studying the brain in the hopes of identifying components to which cognition can be reduced, and not by engineering small information-processing units from which one can build in bottom-up fashion to high-level cognitive processes, but rather by – and this is why he says the approach is top-down – designing and implementing abstract algorithms that capture cognition. Leaving aside the fact that, at least starting in the early 1980s, AI includes an approach that is in some sense bottom-up (see the neurocomputational paradigm discussed above, in Non-Logicist AI: A Summary; and see, specifically, Granger’s (2004a, 2004b) work, hyperlinked in text immediately above, as a specific counterexample), a fatal flaw infects Dennett’s view. Dennett sees the potential flaw, as reflected in:
It has seemed to some philosophers that AI cannot plausibly be so construed because it takes on an additional burden: it restricts itself to mechanistic solutions, and hence its domain is not the Kantian domain of all possible modes of intelligence, but just all possible mechanistically realizable modes of intelligence. This, it is claimed, would beg the question against vitalists, dualists, and other anti-mechanists. (Dennett 1979, 61)
Dennett has a ready answer to this objection. He writes:
But … the mechanism requirement of AI is not an additional constraint of any moment, for if psychology is possible at all, and if Church’s thesis is true, the constraint of mechanism is no more severe than the constraint against begging the question in psychology, and who would wish to evade that? (Dennett 1979, 61)
Unfortunately, this is acutely problematic; and examination of the problems throws light on the nature of AI.
First, insofar as philosophy and psychology are concerned with the nature of mind, they aren’t in the least trammeled by the presupposition that mentation consists in computation. AI, at least of the “Strong” variety (we’ll discuss “Strong” versus “Weak” AI below) is indeed an attempt to substantiate, through engineering certain impressive artifacts, the thesis that intelligence is at bottom computational (at the level of Turing machines and their equivalents, e.g., Register machines). So there is a philosophical claim, for sure. But this doesn’t make AI philosophy, any more than some of the deeper, more aggressive claims of some physicists (e.g., that the universe is ultimately digital in nature) make their field philosophy. Philosophy of physics certainly entertains the proposition that the physical universe can be perfectly modeled in digital terms (in a series of cellular automata, e.g.), but of course philosophy of physics can’t be identified with this doctrine.
Second, we now know well (and those familiar with the relevant formal terrain knew at the time of Dennett’s writing) that information processing can exceed standard computation, that is, can exceed computation at and below the level of what a Turing machine can muster (Turing-computation, we shall say). (Such information processing is known as hypercomputation, a term coined by philosopher Jack Copeland, who has himself defined such machines (e.g., Copeland 1998). The first machines capable of hypercomputation were trial-and-error machines, introduced in the same famous issue of the Journal of Symbolic Logic (Gold 1965; Putnam 1965). A new hypercomputer is the infinite time Turing machine (Hamkins & Lewis 2000).) Dennett’s appeal to Church’s thesis thus flies in the face of the mathematical facts: some varieties of information processing exceed standard computation (or Turing-computation). Church’s thesis, or more precisely, the Church-Turing thesis, is the view that a function \(f\) is effectively computable if and only if \(f\) is Turing-computable (i.e., some Turing machine can compute \(f\)). Thus, this thesis has nothing to say about information processing that is more demanding than what a Turing machine can achieve. (Put another way, there is no counter-example to CTT to be automatically found in an information-processing device capable of feats beyond the reach of TMs.) For all philosophy and psychology know, intelligence, even if tied to information processing, exceeds what is Turing-computational or Turing-mechanical.[38] This is especially true because philosophy and psychology, unlike AI, are in no way fundamentally charged with engineering artifacts, which makes the physical realizability of hypercomputation irrelevant from their perspectives.
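The flavor of a trial-and-error machine can be conveyed with a small simulation. This is our own illustration, and necessarily a truncation: the Gold/Putnam machines are defined over arbitrary computations run without bound, whereas the toy below cuts off at a finite budget. The key idea survives, though: such a machine is allowed to revise its output, and its “answer” is whatever its guesses converge to in the limit.

```python
# Sketch of a trial-and-error machine for a halting-style question:
# emit a guess after each simulated step; the machine's verdict is the
# limit of the guess sequence (here truncated at a finite budget).
def trial_and_error_halts(program_run, budget):
    """program_run: an iterator that is exhausted iff the simulated
    program halts. Returns the sequence of guesses emitted."""
    guesses = []
    for _ in range(budget):
        try:
            next(program_run)
            guesses.append(False)   # still running: keep guessing "no"
        except StopIteration:
            guesses.append(True)    # it halted: revise once, and for good
            break
    return guesses

# A program that halts after three steps: guesses stabilize on True.
print(trial_and_error_halts(iter(range(3)), budget=10))
# → [False, False, False, True]
```

Because the guesses are guaranteed to stabilize (at most one revision ever occurs), the limit is well-defined for every input, which is exactly what lets such machines “decide” the halting problem in the limit even though no Turing machine can decide it outright.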
Therefore, contra Dennett, to consider AI as psychology or philosophy is to commit a serious error, precisely because so doing would box these fields into only a speck of the entire space of functions from the natural numbers (including tuples therefrom) to the natural numbers. (Only a tiny portion of the functions in this space are Turing-computable.) AI is without question much, much narrower than this pair of fields. Of course, it’s possible that AI could be replaced by a field devoted not to building computational artifacts by writing computer programs and running them on embodied Turing machines. But this new field, by definition, would not be AI. Our exploration of AIMA and other textbooks provides direct empirical confirmation of this.
Third, most AI researchers and developers, in point of fact, are simply concerned with building useful, profitable artifacts, and don’t spend much time reflecting upon the kinds of abstract definitions of intelligence explored in this entry (e.g., What Exactly is AI?).
Though AI isn’t philosophy, there are certainly ways of doing real implementation-focussed AI of the highest caliber that are intimately bound up with philosophy. The best way to demonstrate this is to simply present such research and development, or at least a representative example thereof. While there have been many examples of such work, the most prominent example in AI is John Pollock’s OSCAR project, which stretched over a considerable portion of his lifetime. For a detailed presentation and further discussion, see the
Supplement on the OSCAR Project.
It’s important to note at this juncture that the OSCAR project, and the information processing that underlies it, are without question at once philosophy and technical AI. Given that the work in question has appeared in the pages of Artificial Intelligence, a first-rank journal devoted to that field, and not to philosophy, this is undeniable (see, e.g., Pollock 2001, 1992). This point is important because while it’s certainly appropriate, in the present venue, to emphasize connections between AI and philosophy, some readers may suspect that this emphasis is contrived: they may suspect that the truth of the matter is that page after page of AI journals are filled with narrow, technical content far from philosophy. Many such papers do exist. But we must distinguish between writings designed to present the nature of AI, and its core methods and goals, versus writings designed to present progress on specific technical issues.
Writings in the latter category are more often than not quite narrow, but, as the example of Pollock shows, sometimes these specific issues are inextricably linked to philosophy. And of course Pollock’s work is a representative example (albeit the most substantive one). One could just as easily have selected work by folks who don’t happen to also produce straight philosophy. For example, for an entire book written within the confines of AI and computer science, but which is epistemic logic in action in many ways, suitable for use in seminars on that topic, see (Fagin et al. 2004). (It is hard to find technical work that isn’t bound up with philosophy in some direct way. E.g., AI research on learning is all intimately bound up with philosophical treatments of induction, of how genuinely new concepts not simply defined in terms of prior ones can be learned. One possible partial answer offered by AI is inductive logic programming, discussed in Chapter 19 of AIMA.)
What of writings in the former category? Writings in this category, while by definition in AI venues, not philosophy ones, are nonetheless philosophical. Most textbooks include plenty of material that falls into this category, and hence they include discussion of the philosophical nature of AI (e.g., that AI is aimed at building artificial intelligences, and that’s why, after all, it’s called ‘AI’).
Recall that we earlier discussed proposed definitions of AI, and recall specifically that these proposals were couched in terms of the goals of the field. We can follow this pattern here: We can distinguish between “Strong” and “Weak” AI by taking note of the different goals that these two versions of AI strive to reach. “Strong” AI seeks to create artificial persons: machines that have all the mental powers we have, including phenomenal consciousness. “Weak” AI, on the other hand, seeks to build information-processing machines that appear to have the full mental repertoire of human persons (Searle 1997). “Weak” AI can also be defined as the form of AI that aims at a system able to pass not just the Turing Test (again, abbreviated as TT), but the Total Turing Test (Harnad 1991). In TTT, a machine must muster more than linguistic indistinguishability: it must pass for a human in all behaviors – throwing a baseball, eating, teaching a class, etc.
It would certainly seem to be exceedingly difficult for philosophers to overthrow “Weak” AI (Bringsjord and Xiao 2000). After all, what philosophical reason stands in the way of AI producing artifacts that appear to be animals or even humans? However, some philosophers have aimed to do in “Strong” AI, and we turn now to the most prominent case in point.
Without question, the most famous argument in the philosophy of AI is John Searle’s (1980) Chinese Room Argument (CRA), designed to overthrow “Strong” AI. We present a quick summary here and a “report from the trenches” as to how AI practitioners regard the argument. Readers wanting to further study CRA will find an excellent next step in the entry on the Chinese Room Argument and (Bishop & Preston 2002).
CRA is based on a thought-experiment in which Searle himself stars. He is inside a room; outside the room are native Chinese speakers who don’t know that Searle is inside it. Searle-in-the-box, like Searle-in-real-life, doesn’t know any Chinese, but is fluent in English. The Chinese speakers send cards into the room through a slot; on these cards are written questions in Chinese. The box, courtesy of Searle’s secret work therein, returns cards to the native Chinese speakers as output. Searle’s output is produced by consulting a rulebook: this book is a lookup table that tells him what Chinese to produce based on what is sent in. To Searle, the Chinese is all just a bunch of – to use Searle’s language – squiggle-squoggles. The following schematic picture sums up the situation. The labels should be obvious. \(O\) denotes the outside observers, in this case the Chinese speakers. Input is denoted by \(i\) and output by \(o\). As you can see, there is an icon for the rulebook, and Searle himself is denoted by \(P\).

The Chinese Room, Schematic View
Now, what is the argument based on this thought-experiment? Even if you’ve never heard of CRA before, you doubtless can see the basic idea: that Searle (in the box) is supposed to be everything a computer can be, and because he doesn’t understand Chinese, no computer could have such understanding. Searle is mindlessly moving squiggle-squoggles around, and (according to the argument) that’s all computers do, fundamentally.[39]
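The purely syntactic character of the rulebook is easy to make vivid in code. The toy below is our own illustration (the sample entries are invented, not Searle’s): the “room” maps input strings to output strings, and nothing in the program represents what any of the symbols mean.

```python
# A lookup-table "room": input squiggles in, output squiggles out.
# Nothing in the program represents what the symbols mean.
RULEBOOK = {
    "你好吗？": "我很好，谢谢。",     # "How are you?" -> "I'm fine, thanks."
    "你会说中文吗？": "当然会。",     # "Do you speak Chinese?" -> "Of course."
}

def room(card):
    """Return the rulebook's output for an input card, symbol-blind."""
    return RULEBOOK.get(card, "请再说一遍。")  # default: "please say it again"

print(room("你好吗？"))  # → 我很好，谢谢。
```

Whether scaling such symbol-shuffling up could ever constitute understanding is, of course, exactly what CRA and its critics dispute.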
Where does CRA stand today? As we’ve already indicated, the argument would still seem to be alive and well; witness (Bishop & Preston 2002). However, there is little doubt that at least among AI practitioners, CRA is generally rejected. (This is of course thoroughly unsurprising.) Among these practitioners, the philosopher who has offered the most formidable response out of AI itself is Rapaport (1988), who argues that while AI systems are indeed syntactic, the right syntax can constitute semantics. It should be said that a common attitude among proponents of “Strong” AI is that CRA is not only unsound, but silly, based as it is on a fanciful story (CR) far removed from the practice of AI – practice which is year by year moving ineluctably toward sophisticated robots that will once and for all silence CRA and its proponents. For example, John Pollock (as we’ve noted, philosopher and practitioner of AI) writes:
Once [my intelligent system] OSCAR is fully functional, the argument from analogy will lead us inexorably to attribute thoughts and feelings to OSCAR with precisely the same credentials with which we attribute them to human beings. Philosophical arguments to the contrary will be passé. (Pollock 1995, p. 6)
To wrap up discussion of CRA, we make two quick points, to wit:
Readers may wonder if there are philosophical debates that AI researchers engage in, in the course of working in their field (as opposed to when they might attend a philosophy conference). Surely, AI researchers have philosophical discussions amongst themselves, right?
Generally, one finds that AI researchers do discuss among themselves topics in philosophy of AI, and these topics are usually the very same ones that occupy philosophers of AI. However, the attitude reflected in the quote from Pollock immediately above is by far the dominant one. That is, in general, the attitude of AI researchers is that philosophizing is sometimes fun, but the upward march of AI engineering cannot be stopped, will not fail, and will eventually render such philosophizing otiose.
We will return to the issue of the future of AI in the final section of this entry.
Four decades ago, J.R. Lucas (1964) argued that Gödel’s first incompleteness theorem entails that no machine can ever reach human-level intelligence. His argument has not proved to be compelling, but Lucas initiated a debate that has produced more formidable arguments. One of Lucas’ indefatigable defenders is the physicist Roger Penrose, whose first attempt to vindicate Lucas was a Gödelian attack on “Strong” AI articulated in his The Emperor’s New Mind (1989). This first attempt fell short, and Penrose published a more elaborate and more fastidious Gödelian case, expressed in Chapters 2 and 3 of his Shadows of the Mind (1994).
In light of the fact that readers can turn to the entry on Gödel’s Incompleteness Theorems, a full review here is not needed. Instead, readers will be given a decent sense of the argument by turning to an online paper in which Penrose, writing in response to critics (e.g., the philosopher David Chalmers, the logician Solomon Feferman, and the computer scientist Drew McDermott) of his Shadows of the Mind, distills the argument to a couple of paragraphs.[40] Indeed, in this paper Penrose gives what he takes to be the perfected version of the core Gödelian case given in SOTM. Here is this version, verbatim:
We try to suppose that the totality of methods of (unassailable) mathematical reasoning that are in principle humanly accessible can be encapsulated in some (not necessarily computational) sound formal system \(F\). A human mathematician, if presented with \(F\), could argue as follows (bearing in mind that the phrase “I am \(F\)” is merely a shorthand for “\(F\) encapsulates all the humanly accessible methods of mathematical proof”): (A) “Though I don’t know that I necessarily am \(F\), I conclude that if I were, then the system \(F\) would have to be sound and, more to the point, \(F'\) would have to be sound, where \(F'\) is \(F\) supplemented by the further assertion “I am \(F\).” I perceive that it follows from the assumption that I am \(F\) that the Gödel statement \(G(F')\) would have to be true and, furthermore, that it would not be a consequence of \(F'\). But I have just perceived that “If I happened to be \(F\), then \(G(F')\) would have to be true,” and perceptions of this nature would be precisely what \(F'\) is supposed to achieve. Since I am therefore capable of perceiving something beyond the powers of \(F'\), I deduce that I cannot be \(F\) after all. Moreover, this applies to any other (Gödelizable) system, in place of \(F\).” (Penrose 1996, 3.2)
Does this argument succeed? A firm answer to this question is not appropriate to seek in the present entry. Interested readers are encouraged to consult four full-scale treatments of the argument (LaForte et al. 1998; Bringsjord and Xiao 2000; Shapiro 2003; Bowie 1982).
In addition to the Gödelian and Searlean arguments covered briefly above, a third attack on “Strong” AI (of the symbolic variety) has been widely discussed (though with the rise of statistical machine learning has come a corresponding decrease in the attention paid to it), namely, one given by the philosopher Hubert Dreyfus (1972, 1992), some incarnations of which have been co-articulated with his brother, Stuart Dreyfus (1987), a computer scientist. Put crudely, the core idea in this attack is that human expertise is not based on the explicit, disembodied, mechanical manipulation of symbolic information (such as formulae in some logic, or probabilities in some Bayesian network), and that AI’s efforts to build machines with such expertise are doomed if based on the symbolic paradigm. The genesis of the Dreyfusian attack was a belief that the critique of (if you will) symbol-based philosophy (e.g., philosophy in the logic-based, rationalist tradition, as opposed to what is called the Continental tradition) from such thinkers as Heidegger and Merleau-Ponty could be made against the rationalist tradition in AI. After further reading and study of Dreyfus’ writings, readers may judge whether this critique is compelling, in an information-driven world increasingly managed by intelligent agents that carry out symbolic reasoning (albeit not even close to the human level).
For readers interested in exploring philosophy of AI beyond what Jim Moor (in a recent address – “The Next Fifty Years of AI: Future Scientific Research vs. Past Philosophical Criticisms” – as the 2006 Barwise Award winner at the annual eastern American Philosophical Association meeting) has called “the big three” criticisms of AI, there is no shortage of additional material, much of it available on the Web. The last chapter of AIMA provides a compressed overview of some additional arguments against “Strong” AI, and is in general not a bad next step. Needless to say, Philosophy of AI today involves much more than the three well-known arguments discussed above, and, inevitably, Philosophy of AI tomorrow will include new debates and problems we can’t see now. Because machines, inevitably, will get smarter and smarter (regardless of just how smart they get), Philosophy of AI, pure and simple, is a growth industry. With every human activity that machines match, the “big” questions will only attract more attention.
If past predictions are any indication, the only thing we know today about tomorrow’s science and technology is that it will be radically different than whatever we predict it will be like. Arguably, in the case of AI, we may also specifically know today that progress will be much slower than what most expect. After all, at the 1956 kickoff conference (discussed at the start of this entry), Herb Simon predicted that thinking machines able to match the human mind were “just around the corner” (for the relevant quotes and informative discussion, see the first chapter of AIMA). As it turned out, the new century would arrive without a single machine able to converse at even the toddler level. (Recall that when it comes to the building of machines capable of displaying human-level intelligence, Descartes, not Turing, seems today to be the better prophet.) Nonetheless, astonishing though it may be, serious thinkers in the late 20th century have continued to issue incredibly optimistic predictions regarding the progress of AI. For example, Hans Moravec (1999), in his Robot: Mere Machine to Transcendent Mind, informs us that because the speed of computer hardware doubles every 18 months (in accordance with Moore’s Law, which has apparently held in the past), “fourth generation” robots will soon enough exceed humans in all respects, from running companies to writing novels. These robots, so the story goes, will evolve to such lofty cognitive heights that we will stand to them as single-cell organisms stand to us today.[41]
Moravec is by no means singularly Pollyannaish: Many others in AI predict the same sensational future unfolding on about the same rapid schedule. In fact, at the aforementioned AI@50 conference, Jim Moor posed the question “Will human-level AI be achieved within the next 50 years?” to five thinkers who attended the original 1956 conference: John McCarthy, Marvin Minsky, Oliver Selfridge, Ray Solomonoff, and Trenchard Moore. McCarthy and Minsky gave firm, unhesitating affirmatives, and Solomonoff seemed to suggest that AI provided the one ray of hope in the face of the fact that our species seems bent on destroying itself. (Selfridge’s reply was a bit cryptic. Moore returned a firm, unambiguous negative, and declared that once his computer is smart enough to interact with him conversationally about mathematical problems, he might take this whole enterprise more seriously.) It is left to the reader to judge the accuracy of such risky predictions as have been given by Moravec, McCarthy, and Minsky.[42]
The judgment of the reader in this regard ought to factor in the stunning resurgence, very recently, of serious reflection on what is known as “The Singularity” (denoted by us simply as S), the future point at which artificial intelligence exceeds human intelligence, whereupon immediately thereafter (as the story goes) the machines make themselves rapidly smarter and smarter and smarter, reaching a superhuman level of intelligence that, stuck as we are in the mud of our limited mentation, we can’t fathom. For extensive, balanced analysis of S, see Eden et al. (2013).
Readers unfamiliar with the literature on S may be quite surprised to learn the degree to which, among learned folks, this hypothetical event is not only taken seriously, but has in fact become a target for extensive and frequent philosophizing [for a mordant tour of the recent thought in question, see Floridi (2015)]. What arguments support the belief that S is in our future? There are two main arguments at this point: the familiar hardware-based one [championed by Moravec, as noted above, and again more recently by Kurzweil (2006)]; and the – as far as we know – original argument given by mathematician I. J. Good (1965). In addition, there is a recent and related doomsayer argument advanced by Bostrom (2014), which seems to presuppose that S will occur. Good’s argument, nicely amplified and adjusted by Chalmers (2010), who affirms the tidied-up version of the argument, runs as follows:
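In schematic form (our paraphrase; Chalmers’ own formulation attaches qualifiers such as “before long” and “absent defeaters” to each step, which we suppress here):

1. There will be AI.
2. If there is AI, there will be AI\(^+\).
3. If there is AI\(^+\), there will be AI\(^{++}\).

Therefore, there will be AI\(^{++}\).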
In this argument, ‘AI’ is artificial intelligence at the level of, and created by, human persons, ‘AI\(^+\)’ artificial intelligence above the level of human persons, and ‘AI\(^{++}\)’ super-intelligence constitutive of S. The key process is presumably the creation of one class of machine by another. We have added for convenience ‘HI’ for human intelligence; the central idea is then: HI will create AI, the latter at the same level of intelligence as the former; AI will create AI\(^+\); AI\(^+\) will create AI\(^{++}\); with the ascension proceeding perhaps forever, but at any rate proceeding long enough for us to be as ants outstripped by gods.
The argument certainly appears to be formally valid. Are its three premises true? Taking up such a question would fling us far beyond the scope of this entry. We point out only that the concept of one class of machines creating another, more powerful class of machines is not a transparent one, and neither Good nor Chalmers provides a rigorous account of the concept, which is ripe for philosophical analysis. (As to mathematical analysis, some exists, of course. It is for example well-known that a computing machine at level \(L\) cannot possibly create another machine at a higher level \(L'\). For instance, a linear-bounded automaton can’t create a Turing machine.)
The Good-Chalmers argument has a rather clinical air about it; the argument doesn’t say anything regarding whether machines in the AI\(^{++}\) category will be benign, malicious, or munificent. Many others gladly fill this gap with dark, dark pessimism. The locus classicus here is without question a widely read paper by Bill Joy (2000): “Why The Future Doesn’t Need Us.” Joy believes that the human race is doomed, in no small part because it’s busy building smart machines. He writes:
The 21st-century technologies – genetics, nanotechnology, and robotics (GNR) – are so powerful that they can spawn whole new classes of accidents and abuses. Most dangerously, for the first time, these accidents and abuses are widely within the reach of individuals or small groups. They will not require large facilities or rare raw materials. Knowledge alone will enable the use of them.
Thus we have the possibility not just of weapons of mass destruction but of knowledge-enabled mass destruction (KMD), this destructiveness hugely amplified by the power of self-replication.
I think it is no exaggeration to say we are on the cusp of the further perfection of extreme evil, an evil whose possibility spreads well beyond that which weapons of mass destruction bequeathed to the nation-states, on to a surprising and terrible empowerment of extreme individuals.[43]
Philosophers would be most interested in arguments for this view. What are Joy’s? Well, no small reason for the attention lavished on his paper is that, like Raymond Kurzweil (2000), Joy relies heavily on an argument given by none other than the Unabomber (Theodore Kaczynski). The idea is that, assuming we succeed in building intelligent machines, we will have them do most (if not all) work for us. If we further allow the machines to make decisions for us – even if we retain oversight over the machines – we will eventually depend on them to the point where we must simply accept their decisions. But even if we don’t allow the machines to make decisions, the control of such machines is likely to be held by a small elite who will view the rest of humanity as unnecessary – since the machines can do any needed work (Joy 2000).
This isn’t the place to assess this argument. (Having said that, the pattern pushed by the Unabomber and his supporters certainly appears to be flatly invalid.[44]) In fact, many readers will doubtless feel that no such place exists or will exist, because the reasoning here is amateurish. So then, what about the reasoning of professional philosophers on the matter?
Bostrom has recently painted an exceedingly dark picture of a possible future. He points out that the “first superintelligence” could have the capability
to shape the future of Earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition. If we now reflect that human beings consist of useful resources (such as conveniently located atoms) and that we depend on many more local resources, we can see that the outcome could easily be one in which humanity quickly becomes extinct. (Bostrom 2014, p. 416)
Clearly, the most vulnerable premise in this sort of argument is that the “first superintelligence” will indeed arrive. Here perhaps the Good-Chalmers argument provides a basis.
Searle (2014) thinks Bostrom’s book is misguided and fundamentally mistaken, and that we needn’t worry. His rationale is dirt-simple: Machines aren’t conscious; Bostrom is alarmed at the prospect of malicious machines who do us in; a malicious machine is by definition a conscious machine; ergo, Bostrom’s argument doesn’t work. Searle writes:
If the computer can fly airplanes, drive cars, and win at chess, who cares if it is totally nonconscious? But if we are worried about a maliciously motivated superintelligence destroying us, then it is important that the malicious motivation should be real. Without consciousness, there is no possibility of its being real.
The positively remarkable thing here, it seems to us, is that Searle appears to be unaware of the brute fact that most AI engineers are perfectly content to build machines on the basis of the AIMA view of AI we presented and explained above: the view according to which machines simply map percepts to actions. On this view, it doesn’t matter whether the machine really has desires; what matters is whether it acts suitably on the basis of how AI scientists engineer formal correlates to desire. An autonomous machine with overwhelming destructive power that non-consciously “decides” to kill doesn’t become just a nuisance because genuine, human-level, subjective desire is absent from the machine. If an AI can play the game of chess, and the game of Jeopardy!, it can certainly play the game of war. Just as it does little good for a human loser to point out that the victorious machine in a game of chess isn’t conscious, it will do little good for humans being killed by machines to point out that these machines aren’t conscious. (It is interesting to note that the genesis of Joy’s paper was an informal conversation with John Searle and Raymond Kurzweil. According to Joy, Searle didn’t think there was much to worry about, since he was (and is) quite confident that tomorrow’s robots can’t be conscious.[45])
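The percepts-to-actions view at issue here can be put in a minimal sketch (ours, not code from AIMA; the names and the toy two-location vacuum world below are purely illustrative). An agent is nothing more than a function from percept sequences to actions; the lookup table serves as a formal correlate to desire, with no consciousness anywhere presupposed.

```python
# Illustrative sketch (our own, not AIMA's code) of the agent abstraction
# discussed above: an agent is just a mapping from percept sequences to
# actions. The table below is a formal correlate to "desire".

def table_driven_agent(percept_sequence, table):
    """Return the action the table prescribes for the percept sequence."""
    return table.get(tuple(percept_sequence), "noop")

# A toy two-location vacuum world (locations "A" and "B"); each percept
# is a (location, status) pair.
ACTION_TABLE = {
    (("A", "dirty"),): "suck",
    (("A", "clean"),): "move-right",
    (("B", "dirty"),): "suck",
    (("B", "clean"),): "move-left",
}

if __name__ == "__main__":
    print(table_driven_agent([("A", "dirty")], ACTION_TABLE))  # suck
```

Whether the agent "wants" a clean floor is, on this engineering view, simply beside the point: it acts suitably, and that is all the view requires.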
There are some things we can safely say about tomorrow. Certainly, barring some cataclysmic events (nuclear or biological warfare, global economic depression, a meteorite smashing into Earth, etc.), we now know that AI will succeed in producing artificial animals. Since even some natural animals (mules, e.g.) can be easily trained to work for humans, it stands to reason that artificial animals, designed from scratch with our purposes in mind, will be deployed to work for us. In fact, many jobs currently done by humans will certainly be done by appropriately programmed artificial animals. To pick an arbitrary example, it is difficult to believe that commercial drivers won’t be artificial in the future. (Indeed, Daimler is already running commercials in which they tout the ability of their automobiles to drive “autonomously,” allowing human occupants of these vehicles to ignore the road and read.) Other examples would include: cleaners, mail carriers, clerical workers, military scouts, surgeons, and pilots. (As to cleaners, probably a significant number of readers, at this very moment, have robots from iRobot cleaning the carpets in their homes.) It is hard to see how such jobs are inseparably bound up with the attributes often taken to be at the core of personhood – attributes that would be the most difficult for AI to replicate.[46]
Andy Clark (2003) has another prediction: Humans will gradually become, at least to an appreciable degree, cyborgs, courtesy of artificial limbs and sense organs, and implants. The main driver of this trend will be that while standalone AIs are often desirable, they are hard to engineer when the desired level of intelligence is high. But to let humans “pilot” less intelligent machines is a good deal easier, and still very attractive for concrete reasons. Another related prediction is that AI would play the role of a cognitive prosthesis for humans (Ford et al. 1997; Hoffman et al. 2001). The prosthesis view sees AI as a “great equalizer” that would lead to less stratification in society, perhaps similar to how the Hindu-Arabic numeral system made arithmetic available to the masses, and to how the Gutenberg press contributed to literacy becoming more universal.
Even if the argument is formally invalid, it leaves us with a question – the cornerstone question about AI and the future: Will AI produce artificial creatures that replicate and exceed human cognition (as Kurzweil and Joy believe)? Or is this merely an interesting supposition?
This is a question not just for scientists and engineers; it is also a question for philosophers. This is so for two reasons. One, research and development designed to validate an affirmative answer must include philosophy – for reasons rooted in earlier parts of the present entry. (E.g., philosophy is the place to turn to for robust formalisms to model human propositional attitudes in machine terms.) Two, philosophers might well be able to provide arguments that answer the cornerstone question now, definitively. If a version of any of the three arguments against “Strong” AI alluded to above (Searle’s CRA; the Gödelian attack; the Dreyfus argument) is sound, then of course AI will not manage to produce machines having the mental powers of persons. No doubt the future holds not only ever-smarter machines, but new arguments pro and con on the question of whether this progress can reach the human level that Descartes declared to be unreachable.
Related entries: artificial intelligence: logic-based | causation: probabilistic | Chinese room argument | cognitive science | computability and complexity | computing: modern history of | connectionism | epistemology: Bayesian | frame problem | information technology: and moral values | language of thought hypothesis | learning theory, formal | linguistics: computational | mind: computational theory of | reasoning: automated | reasoning: defeasible | statistics, philosophy of | Turing test
Thanks are due to Peter Norvig and Prentice-Hall for allowing figures from AIMA to be used in this entry. Thanks are due as well to the many first-rate (human) minds who have read earlier drafts of this entry, and provided helpful feedback. Without the support of our AI research and development from both ONR and AFOSR, our knowledge of AI and ML would confessedly be acutely narrow, and we are grateful for the support. We are also very grateful to the anonymous referees who provided us with meticulous reviews in our reviewing round in late 2015 to early 2016. Special acknowledgements are due to the SEP editors and, in particular, Uri Nodelman for patiently working with us throughout and for providing technical and insightful editorial help.