Philosophy of linguistics is the philosophy of science as applied to linguistics. This differentiates it sharply from the philosophy of language, traditionally concerned with matters of meaning and reference.
As with the philosophy of other special sciences, there are general topics relating to matters like methodology and explanation (e.g., the status of statistical explanations in psychology and sociology, or the physics-chemistry relation in philosophy of chemistry), and more specific philosophical issues that come up in the special science at issue (simultaneity for philosophy of physics; individuation of species and ecosystems for the philosophy of biology). General topics of the first type in the philosophy of linguistics include:
Specific topics include issues in language learnability, language change, the competence-performance distinction, and the expressive power of linguistic theories.
There are also topics that fall on the borderline between philosophy of language and philosophy of linguistics: “linguistic relativity” (see the supplement on the linguistic relativity hypothesis in the Summer 2015 archived version of the entry on relativism), language vs. idiolect, speech acts (including the distinction between locutionary, illocutionary, and perlocutionary acts), the language of thought, implicature, and the semantics of mental states (see the entries on analysis, semantic compositionality, mental representation, pragmatics, and defaults in semantics and pragmatics). In these cases it is often the kind of answer given and not the inherent nature of the topic itself that determines the classification. Topics that we consider to be more in the philosophy of language than the philosophy of linguistics include intensional contexts, direct reference, and empty names (see the entries on propositional attitude reports, intensional logic, rigid designators, reference, and descriptions).
This entry does not aim to provide a general introduction to linguistics for philosophers; readers seeking that should consult a suitable textbook such as Akmajian et al. (2010) or Napoli (1996). For a general history of Western linguistic thought, including recent theoretical linguistics, see Seuren (1998). Newmeyer (1986) is useful additional reading for post-1950 American linguistics. Tomalin (2006) traces the philosophical, scientific, and linguistic antecedents of Chomsky’s magnum opus (1955/1956; published 1975), and Scholz and Pullum (2007) provide a critical review. Articles that have focused on the philosophical implications of generative linguistics include Ludlow (2011) and Rey (2020). For recent articles on the philosophy of linguistics more generally, Itkonen (2013) discusses various aspects of the field from its early Greek beginnings, Pullum (2019) details debates that have engaged philosophers from 1945 to 2015, and Nefdt (2019a) discusses connections with contemporary issues in the philosophy of science.
The issues we discuss have been debated with vigor and sometimes venom. Some of the people involved have had famous exchanges in the linguistics journals, in the popular press, and in public forums. To understand the sharp disagreements between advocates of the approaches it may be useful to have a sketch of the dramatis personae before us, even if it is undeniably an oversimplification.
We see three tendencies or foci, divided by what they take to be the subject matter, the approach they advocate for studying it, and what they count as an explanation. We characterize them roughly in Table 1.
| | Externalists | Emergentists | Essentialists |
| Primary phenomena | Actual utterances as produced by language users | Facts of social cognition, interaction, and communication | Intuitions of grammaticality and literal meaning |
| Primary subject matter | Language use; structural properties of expressions and languages | Linguistic communication, cognition, variation, and change | Abstract universal principles that explain the properties of specific languages |
| Aim | To describe attested expression structure and interrelations, and to predict properties of unattested expressions | To explain structural properties of languages in terms of general cognitive mechanisms and communicative functions | To articulate universal principles and provide explanations for deep and cross-linguistically constant linguistic properties |
| Linguistic structure | A system of patterns, inferrable from generally accessible, objective features of language use | A system of constructions that range from fixed idiomatic phrases to highly abstract productive types | A system of abstract conditions that may not be evident from the experience of typical language users |
| Values | Accurate modeling of linguistic form that accords with empirical data and permits prediction concerning unconsidered cases | Cognitive, cultural, historical, and evolutionary explanations of phenomena found in linguistic communication systems | Highly abstract, covering-law explanations for properties of language as inferred from linguistic intuitions |
| Children’s language | A nascent form of language, very different from adult linguistic competence | A series of stages in an ontogenetic process of developing adult communicative competence | Very similar to adult linguistic competence though obscured by cognitive, articulatory, and lexical limits |
| What is acquired | A grasp of the distributional properties of the constituents of expressions of a language | A mainly conventional and culturally transmitted system for linguistic communication | An internalized generative device that characterizes an infinite set of expressions |
Table 1. Three Approaches to the Study of Language
A broad and varied range of distinct research projects can be pursued within any of these approaches; one advocate may be more motivated by some parts of the overall project than others are. So the tendencies should not be taken as sharply honed, well-developed research programs or theories. Rather, they provide background biases for the development of specific research programs—biases which sometimes develop into ideological stances or polemical programs or lead to the branching off of new specialisms with separate journals. In the judgment of Phillips (2010), “Dialog between adherents of different approaches is alarmingly rare.”
The names we have given these approaches are just mnemonic tags, not descriptions. The Externalists, for example, might well have been called ‘structural descriptivists’ instead, since they tend to be especially concerned to develop models that can be used to predict the structure of natural language expressions. The Externalists have long been referred to by Essentialists as ‘empiricists’ (and sometimes Externalists apply that term to themselves), though this is misleading (see Scholz and Pullum 2006: 60–63): the ‘empiricist’ tag comes with an accusation of denying the role of learning biases in language acquisition (see Matthews 1984, Laurence and Margolis 2001), but that is no part of the Externalists’ creed (see e.g. Elman 1993, Lappin and Shieber 2007).
Emergentists are also sometimes referred to by Essentialists as ‘empiricists’, but they either use the Emergentist label for themselves (Bates et al. 1998, O’Grady 2008, MacWhinney 2005) or call themselves ‘usage-based’ linguists (Barlow and Kemmer 2002, Tomasello 2003) or ‘construction grammarians’ (Goldberg 1995, Croft 2001). Newmeyer (1991), like Tomasello, refers to the Essentialists as ‘formalists’, because of their tendency to employ abstractions, and to use tools from mathematics and logic.
Despite these terminological inconsistencies, we can look at what typical members of each approach would say about their vision of linguistic science, and what they say about the alternatives. Many of the central differences between these approaches depend on what proponents consider to be the main project of linguistic theorizing, and what they count as a satisfying explanation.
Many researchers—perhaps most—mix elements from each of the three approaches. For example, if Emergentists are to explain the syntactic structure of expressions by appeal to facts about the nature of the use of symbols in human communication, then they will presuppose a great deal of Externalist work in describing linguistic patterns, and those Externalists who work on computational parsing systems frequently use (at least as a starting point) rule systems and ‘structural’ patterns worked out by Essentialists. Certainly, there is no logical impediment to a researcher with one tendency simultaneously pursuing another; these approaches are only general centers of emphasis.
If one assumes, with the Externalists, that the main goal of a linguistic theory is to develop accurate models of the structural properties of the speech sounds, words, phrases, and other linguistic items, then the clearly privileged information will include corpora (written and oral)—bodies of attested and recorded language use (suitably idealized). The goal is to describe how this public record exhibits certain (perhaps non-phenomenal) patterns that are projectable.
American structural linguistics of the 1920s to 1950s championed the development of techniques for using corpora as a basis for developing structural descriptions of natural languages, although such work was really not practically possible until the widespread availability of cheap, powerful, and fast computers. André Martinet (1960: 1) notes that one of the basic assumptions of structuralist approaches to linguistics is that “nothing may be called ‘linguistic’ that is not manifest or manifested one way or another between the mouth of the speaker and the ears of the listener”. He is, however, quick to point out that “this assumption does not entail that linguists should restrict their field of research to the audible part of the communication process—speech can only be interpreted as such, and not as so much noise, because it stands for something else that is not speech.”
American structuralists—Leonard Bloomfield in particular—were attacked, sometimes legitimately and sometimes illegitimately, by certain factions in the Essentialist tradition. For example, it was perhaps justifiable to criticize Bloomfield for adopting a nominalist ontology as popularized by the logical empiricists. But he was later attacked by Essentialists for holding anti-mentalist views about linguistics, when it is arguable that his actual view was that the science of linguistics should not commit itself to any particular psychological theory. (He had earlier been an enthusiast for the mentalist and introspectionist psychology of Wilhelm Wundt; see Bloomfield 1914.)
Externalism continues to thrive within computational linguistics, where the American structuralist vision of studying language through automatic analysis of corpora has enjoyed a recrudescence, and very large, computationally searchable corpora are being used to test hypotheses about the structure of languages (see Sampson 2001, chapter 1, for discussion).
Emergentists aim to explain the capacity for language in terms of non-linguistic human capacities: thinking, communicating, and interacting. Edward Sapir expressed a characteristic Emergentist theme when he wrote:
Language is primarily a cultural or social product and must be understood as such… It is peculiarly important that linguists, who are often accused, and accused justly, of failure to look beyond the pretty patterns of their subject matter, should become aware of what their science may mean for the interpretation of human conduct in general. (Sapir 1929: 214)
The “pretty patterns” derided here are characteristic of structuralist analyses. Sociolinguistics, which is much closer in spirit to Sapir’s project, studies the influence of social and linguistic structure on each other. One particularly influential study, Labov (1966), examines the influence of social class on language variation. Other sociolinguists examine the relation between status within a group and linguistic innovation (Eckert 1989). This interest in variation within languages is characteristic of Emergentist approaches to the study of language.
Another kind of Emergentist, like Tomasello (2003), will stress the role of theory of mind and the capacity to use symbols to change conspecifics’ mental states as uniquely human preadaptations for language acquisition, use, and invention. MacWhinney (2005) aims to explain linguistic phenomena (such as phrase structure and constraints on long distance dependencies) in terms of the way conversation facilitates accurate information-tracking and perspective-switching.
Functionalist research programs generally fall within the broad tendency to approach the study of language as an Emergentist. According to one proponent:
The functionalist view of language [is] as a system of communicative social interaction… Syntax is not radically arbitrary, in this view, but rather is relatively motivated by semantic, pragmatic, and cognitive concerns. (Van Valin 1991, quoted in Newmeyer 1991: 4; emphasis in original)
And according to Russ Tomlin, a linguist who takes a functionalist approach:
Syntax is not autonomous from semantics or pragmatics… the rejection of autonomy derives from the observation that the use of particular grammatical forms is strongly linked, even deterministically linked, to the presence of particular semantic or pragmatic functions in discourse. (Tomlin 1990, quoted by Newmeyer 1991: 4)
The idea that linguistic form is autonomous, and more specifically that syntactic form (rather than, say, phonological form) is autonomous, is a characteristic theme of the Essentialists. And the claims of Van Valin and Tomlin to the effect that syntax is not independent of semantics and pragmatics might tempt some to think that Emergentism and Essentialism are logically incompatible. But this would be a mistake, since there are a large number of nonequivalent autonomy of form theses.
Even in the context of trying to explain what the autonomy thesis is, Newmeyer (1991: 3) talks about five formulations of the thesis, each of which can be found in some Essentialists’ writings, without (apparently) realizing that they are non-equivalent. One is the relatively strong claim that the central properties of linguistic form must not be defined with essential reference to “concepts outside the system”, which suggests that no primitives in linguistics could be defined in psychological or biological terms. Another takes autonomy of form to be a normative claim: that linguistic concepts ought not to be defined or characterized in terms of non-linguistic concepts. The third and fourth versions are ontological: one denies that central linguistic concepts should be ontologically reduced to non-linguistic ones, and the other denies that they can be. And in the fifth version the autonomy of syntax is taken to deny that syntactic patterning can be explained in terms of meaning or discourse functions.
For each of these versions of autonomy, there are Essentialists who agree with it. Probably the paradigmatic Essentialist agrees with them all. But Emergentists need not disagree with them all. Paradigmatic functionalists like Tomlin, Van Valin, and MacWhinney could in principle hold that the explanation of syntactic form, for example, will ultimately be in terms of discourse functions and semantics, but still accept that syntactic categories cannot be reduced to non-linguistic ones.
If Leonard Bloomfield is the intellectual ancestor of Externalism, and Sapir the father of Emergentism, then Noam Chomsky is the intellectual ancestor of Essentialism. The researcher with predominantly Essentialist inclinations aims to identify the intrinsic properties of language that make it what it is. For a huge majority of practitioners of this approach—researchers in the tradition of generative grammar associated with Chomsky—this means postulating universals of human linguistic structure, unlearned but tacitly known, that permit and assist children to acquire human languages. This generative Essentialism has a preference for finding surprising characteristics of languages that cannot be inferred from the data of usage, and are not predictable from human cognition or the requirements of communication.
Rather than being impressed with language variation, as are Emergentists and many Externalists, the generative Essentialists are extremely impressed with the idea that very young children of almost any intelligence level, and just about any social upbringing, acquire language to the same high degree of mastery. From this it is inferred that there must be unlearned features shared by all languages that somehow assist in language acquisition.
A large number of contemporary Essentialists who follow Chomsky’s teaching on this matter claim that semantics and pragmatics are not a central part of the study of language. In Chomsky’s view, “it is possible that natural language has only syntax and pragmatics” (Chomsky 1995: 26); that is, only “internalist computations and performance systems that access them”; semantic theories are merely “part of an interface level” or “a form of syntax” (Chomsky 1992: 223).
Thus, while Bloomfield understood it to be a sensible practical decision to assign semantics to some field other than linguistics because of the underdeveloped state of semantic research, Chomsky appears to think that semantics as standardly understood is not part of the essence of the language faculty at all. (In broad outline, this exclusion of semantics from linguistics comports with Sapir’s view that form is linguistic but content is cultural.)
Although Chomsky is an Essentialist in his approach to the study of language, excluding semantics as a central part of linguistic theory clearly does not follow from linguistic Essentialism (Katz 1980 provides a detailed discussion of Chomsky’s views on semantics). Today there are many Essentialists who do hold that semantics is a component of a full linguistic theory.
For example, many linguists today are interested in the syntax-semantics interface—the relationship between the surface syntactic structure of sentences and their semantic interpretation. This area of interest is generally quite alien to philosophers who are primarily concerned with semantics only, and it falls outside of Chomsky’s syntactocentric purview as well. Linguists who work in the kind of semantics initiated by Montague (1974) certainly focus on the essential features of language (most of their findings appear to be of universal import rather than limited to the semantic rules of specific languages). Useful works to consult to get a sense of the modern style of investigation of the syntax-semantics interface would include Partee (1975), Jacobson (1996), Szabolcsi (1997), Chierchia (1998), and Steedman (2000).
The discussion so far has been at a rather high level of abstraction. It may be useful to contrast the three tendencies by looking at how they each would analyze a particular linguistic phenomenon. We have selected the syntax of double-object clauses like Hand the guard your pass (also called ditransitive clauses), in which the verb is immediately followed by a sequence of two noun phrases, the first typically denoting a recipient and the second something transferred. For many such clauses there is an alternative way of expressing roughly the same thing: for Hand the guard your pass there is the alternative Hand your pass to the guard, in which the verb is followed by a single object noun phrase and the recipient is expressed after that by a preposition phrase with to. We will call these recipient-PP clauses.
Larson (1988) offers a generative Essentialist approach to the syntax of double-object clauses. In order to provide even a rough outline of his proposals, it will be very useful to be able to use tree diagrams of syntactic structure. A tree is a mathematical object consisting of a set of points called nodes between which certain relations hold. The nodes correspond to syntactic units; left-right order on the page corresponds to temporal order of utterance between them; and upward connecting lines represent the relation ‘is an immediate subpart of’. Nodes are labeled to show categories of phrases and words, such as noun phrase (NP); preposition phrase (PP); and verb phrase (VP). When the internal structure of some subpart of a tree is basically unimportant to the topic under discussion, it is customary to mask that part with an empty triangle. Consider a simple example: an active transitive clause like (A.i) and its passive equivalent (A.ii).
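The tree notion just described can be sketched as a small data structure. The following Python snippet is our illustration only, not part of the entry; the sample clause the guard checked my pass is a hypothetical stand-in for an active transitive clause. It shows how labeled nodes and left-to-right child order encode syntactic category and temporal order:

```python
# A minimal labeled, ordered tree: each node carries a category label
# (S, NP, VP, V) or a word, plus an ordered list of child nodes.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def leaves(self):
        # The left-to-right sequence of leaf labels corresponds to the
        # temporal order of utterance described in the text.
        if not self.children:
            return [self.label]
        return [w for child in self.children for w in child.leaves()]

# A hypothetical active transitive clause, "the guard checked my pass",
# with the subject NP and the VP as immediate subparts of the clause (S).
t1 = Node("S", [
    Node("NP", [Node("the"), Node("guard")]),
    Node("VP", [
        Node("V", [Node("checked")]),
        Node("NP", [Node("my"), Node("pass")]),
    ]),
])

print(" ".join(t1.leaves()))  # the guard checked my pass
```

Reading the leaves left to right recovers the spoken word order, while the nested `Node` objects record the ‘is an immediate subpart of’ relation.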
A tree structure for (A.i) is shown in (T1).

In analyses of the sort Larson exemplifies, the structure of an expression is given by a derivation, which consists of a sequence of successively modified trees. Larson calls the earliest ones underlying structures. The last (and least abstract) in the derivation is the surface structure, which captures properties relevant to the way the expression is written and pronounced. The underlying structures are posited in order to better identify syntactic generalizations. They are related to surface structures by a series of operations called transformations (which generative Essentialists typically regard as mentally real operations of the human language faculty).
One of the fundamental operations that a transformation can effect is movement, which involves shifting a part of the syntactic structure of a tree to another location within it. For example, it is often claimed that passive clauses have very much the same kinds of underlying structures as the synonymous active clauses, and thus a passive clause like (A.ii) would have an underlying structure much like (T1). A movement transformation would shift the guard toward the end of the clause (and add by), and another would shift my pass into the position before the verb. In other words, passive clauses look much more like their active counterparts in underlying structure.
In a similar way, Larson proposes that a double-object clause like (B.ii) has the same underlying structure as (B.i).
Moreover, he proposes that the transformational operation of deriving the surface structure of (B.ii) from the underlying structure of (B.i) is essentially the same as the one that derives the surface structure of (A.ii) from the underlying structure of (A.i).
Larson adopts many assumptions from Chomsky (1981) and subsequent work. One is that all NPs have to be assigned Case in the course of a derivation. (Case is an abstract syntactic property, only indirectly related to the morphological case forms displayed by nominative, accusative, and genitive pronouns. Objective Case is assumed to be assigned to any NP in direct object position, e.g., my pass in (T1), and Nominative Case is assigned to an NP in the subject position of a tensed clause, e.g., the guard in (T1).)
He also makes two specific assumptions about the derivation of passive clauses. First, Case assignment to the position immediately after the verb is “suppressed”, which entails that the NP there will not get Case unless it moves to some other position. (The subject position is the obvious one, because there it will receive Nominative Case.) Second, there is an unusual assignment of semantic role to NPs: instead of the subject NP being identified as the agent of the action the clause describes, that role is assigned to an adjunct at the end of the VP (the by-phrase in (A.ii); an adjunct is a constituent with an optional modifying role in its clause rather than a grammatically obligatory one like subject or object).
Larson proposes that both of these points about passive clauses have analogs in the structure of double-object VPs. First, Case assignment to the position immediately after the verb is suppressed; and since Larson takes the preposition to to be the marker of Case, this means in effect that to disappears. This entails that the NP after to will not get Case unless it moves to some other position. Second, there is an unusual assignment of semantic role to NPs: instead of the direct object NP being identified as the entity affected by the action the clause describes, that role is assigned to an adjunct at the end of the VP.
Larson makes some innovative assumptions about VPs. First, he proposes that in the underlying structure of a double-object clause the direct object precedes the verb, the tree diagram being (T2).
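As a toy illustration of movement (ours, not Larson's formalism), the following Python sketch derives a passive-like word order from an underlying active order by relocating the two NPs as just described. It operates on flat lists rather than trees, and auxiliary and participle morphology (e.g. was checked rather than checked) is deliberately ignored to keep the sketch minimal:

```python
# Toy movement transformation over a flat list of constituents
# (a deliberate simplification of the tree-based derivations in the
# text; real transformations operate on tree structure).
def passivize(subject, verb, obj):
    """Shift the object NP into preverbal (subject) position and the
    subject NP to the end of the clause inside a by-phrase."""
    return obj + verb + ["by"] + subject

# Underlying structure of a hypothetical active clause:
underlying = (["the", "guard"], ["checked"], ["my", "pass"])
print(" ".join(passivize(*underlying)))  # my pass checked by the guard
```

The two list relocations correspond to the two movements described above: one shifting the subject toward the end of the clause (with by added), the other shifting the object into the position before the verb.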

This does not match the surface order of words (showed my pass to the guard), but it is not intended to: it is an underlying structure. A transformation will move the verb to the left of my pass to produce the surface order seen in (B.i).
Second, he assumes that there are two nodes labeled VP in a double-object clause, and two more labeled V′, though there is only one word of the verb (V) category. (Only the smaller VP and V′ are shown in the partial structure (T2).)
What is important here is that (T2) is the basis for the double-object surface structure as well. To produce that, the preposition to is erased and an additional NP position (for my pass) is attached to the V′, thus:

The additional NP is assigned the affected-entity semantic role. The other NP (the guard) does not yet have Case; but Larson assumes that it moves into the NP position before the verb. The result is shown in (T4), where ‘e’ marks the empty string left where some words have been moved away:

Larson assumes that in this position the guard can receive Case. What remains is for the verb to move into a higher V position further to its left, to obtain the surface order:

The complete sequence of transformations is taken to give a deep theoretical explanation of many properties of (B.i) and (B.ii), including such things as what could be substituted for the two NPs, and the fact that there is at least rough truth-conditional equivalence between the two clauses.
The reader with no previous experience of generative linguistics will have many questions about the foregoing sketch (e.g., whether it is really necessary to have the guard after showed in (T3), then the opposite order in (T4), and finally the same order again in (T5)). We cannot hope to answer such questions here; Larson’s paper is extremely rich in further assumptions, links to the previous literature, and additional classes of data that he aims to explain. But the foregoing should suffice to convey some of the flavor of the analysis.
The key point to note is that Essentialists seek underlying symmetries and parallels whose operation is not manifest in the data of language use. For Essentialists, there is positive explanatory virtue in hypothesizing abstract structures that are very far from being inferrable from performance; and the posited operations on those structures are justified in terms of elegance and formal parallelism with other analyses, not through observation of language use in communicative situations.
Many Emergentists are favorably disposed toward the kind of construction grammar expounded in Goldberg (1995). We will use her work as an exemplar of the Emergentist approach. The first thing to note is that Goldberg does not take double-object clauses like (B.ii) to be derived alternants of recipient-PP structures like (B.i), the way Larson does. So she is not looking for a regular syntactic operation that can relate their derivations; indeed, she does not posit derivations at all. She is interested in explaining correlations between syntactic, semantic, and pragmatic aspects of clauses; for example, she asks this question:
How are the semantics of independent constructions related such that the classes of verbs associated with one overlap with the classes of verbs associated with another? (Goldberg 1995: 89)
Thus she aims to explain why some verbs occur in both the double-object and recipient-PP kinds of expression and some do not.
The fundamental notion in Goldberg’s linguistic theory is that of a construction. A construction can be defined very roughly as a way of structurally composing words or phrases—a sort of template—for expressing a certain class of meanings. Like Emergentists in general, Goldberg regards linguistic theory as continuous with a certain part of general cognitive psychological theory; linguistics emerges from this more general theory, and linguistic matters are rarely fully separate from cognitive matters. So a construction for Goldberg has a mental reality: it corresponds to a generalized concept or scenario expressible in a language, annotated with a guide to the linguistic structure of the expression.
Many words will be trivial examples of constructions: a single concept paired with a way of pronouncing and some details about grammatical restrictions (category, inflectional class, etc.); but constructions can be much more abstract and internally complex. The double-object construction, which Goldberg calls the Ditransitive Construction, is a moderately abstract and complex one; she diagrams it thus (p. 50):

This expresses a set of constraints on how to use English to communicate the idea of a particular kind of scenario. The scenario involves a ternary relation CAUSE-RECEIVE holding between an agent (agt), a recipient (rec), and a patient (pat). PRED is a variable that is filled by the meaning of a particular verb when it is employed in this construction.
The solid vertical lines downward from agt and pat indicate that for any verb integrated into this construction it is required that its subject NP should express the agent participant, and the direct object (OBJ2) should express the patient participant. The dashed vertical line downward from rec signals that the first object (OBJ) may express the recipient but it does not have to—the necessity of there being a recipient is a property of the construction itself, and not every verb demands that it be made explicit who the recipient is. But if there are two objects, the first is obligatorily associated with the recipient role: We sent the builder a carpenter can only express a claim about the sending of a carpenter over to the builder, never the sending of the builder over to where a carpenter is.
When a particular verb is used in this construction, it may have obligatory accompanying NPs denoting what Goldberg calls “profiled participants” so that the match between the participant roles (agt, rec, pat) is one-to-one, as with the verb hand. When this verb is used, the agent (‘hander’), recipient (‘handee’), and item transferred (‘handed’) must all be made explicit. Goldberg gives the following diagram of the “composite structure” that results when hand is used in the construction:

Because of this requirement of explicit presence,Hand him yourpass is grammatical, but*Hand him is not,and neither is*Hand your pass. The verbsend, on the other hand, illustrates the optional syntacticexpression of the recipient role: we can saySend a textmessage, which is understood to involve some recipient but doesnot make the recipient explicit.
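The contrast between hand and send can be caricatured in code. The sketch below is our own abstraction, not Goldberg's formalism: the `CONSTRUCTION_ROLES` set, the `VERB_PROFILES` table, and the `acceptable` function are hypothetical names we introduce for illustration. The idea is simply that a use of the construction is well formed only if every role the verb profiles as obligatory is expressed explicitly:

```python
# Roles contributed by the Ditransitive Construction itself.
CONSTRUCTION_ROLES = {"agt", "rec", "pat"}

# Hypothetical per-verb profiles: the roles each verb requires to be
# expressed explicitly when used in this construction.
VERB_PROFILES = {
    "hand": {"agt", "rec", "pat"},  # hander, handee, handed: all obligatory
    "send": {"agt", "pat"},         # the recipient may remain implicit
}

def acceptable(verb, expressed_roles):
    """A use is acceptable when every profiled role is expressed and
    nothing outside the construction's role set is expressed."""
    profiled = VERB_PROFILES[verb]
    return profiled <= set(expressed_roles) <= CONSTRUCTION_ROLES

print(acceptable("hand", {"agt", "rec", "pat"}))  # True:  Hand him your pass
print(acceptable("hand", {"agt", "rec"}))         # False: *Hand him
print(acceptable("send", {"agt", "pat"}))         # True:  Send a text message
```

Of course, Goldberg's account involves far more than role checking (semantic constraints, metaphorical extensions, and so on); the sketch captures only the obligatory-versus-optional contrast described in this paragraph.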
The R notation relates to the fact that particular verbs may expresseither aninstance of causing someone to receive something,as withhand, or ameans of causing someone toreceive something, as withkick: whatJoe kicked Bill theball means is that Joe caused Bill to receive the ball by meansof a kicking action.
Goldberg’s discussion covers many subtle ways in which thescenario communicated affects whether the use of a construction isgrammatical and appropriate. For example, there is something odd about?Joe kicked Bill the ball he was trying to kick toSam: the Ditransitive Construction seems best suited to cases ofvolitional transfer (rather than transfer as an unexpected side effectof a blunder). However, an exception is provided by a class of casesin which the transfer is not of a physical object but is onlymetaphorical:That guy gives me the creeps does not imply anyvolitional transfer of a physical object.
Metaphorical cases are distinguished from physical transfers in otherways as well. Goldberg notes sentences likeThe music lent theevent a festive air, wherethe music is subject of theverb lend despite the fact that music cannot literally lend anythingto anyone.
Goldberg discusses many topics such as metaphorical extension,shading, metonymy, cutting, role merging, and also presents variousgeneral principles linking meanings and constructions. One of theseprinciples, the No Synonymy Principle, says that no two syntacticallydistinct constructions can be both semantically and pragmaticallysynonymous. It might seem that if any two sentences are synonymous,pairs like this are:
Yet the two constructions cannot be fully synonymous, both semantically and pragmatically, if the No Synonymy Principle is correct. And to support the principle, Goldberg notes purported contrasts such as this:
There is a causation-as-transfer metaphor here, and it seems to be compatible with the double-object construction but not with the recipient-PP. So (in Goldberg’s view) the two are not fully synonymous.
It is no part of our aim here to provide a full account of the content of Goldberg’s discussion of double-object clauses. But what we want to highlight is that the focus is not on finding abstract elements or operations of a purely syntactic nature that are candidates for being essential properties of language per se. The focus for Emergentists is nearly always on the ways in which meaning is conveyed, the scenarios that particular constructions are used to communicate, and the aspects of language that connect up with psychological topics like cognition, perception, and conceptualization.
One kind of work that is representative of the Externalist tendency is nicely illustrated by Bresnan et al. (2007) and Bresnan and Ford (2010). Bresnan and her colleagues defend the use of corpora—bodies of attested written and spoken texts. One of their findings is that a number of types of expressions that linguists have often taken to be ungrammatical do in fact turn up in actual use. Essentialists and Emergentists alike have often, purely on the basis of intuition, asserted that sentences like John gave Mary a kiss are grammatical but sentences like John gave a kiss to Mary are not, as we see above with Goldberg’s (D)(ii). Bresnan and her colleagues find numerous occurrences of the latter sort on the World Wide Web, and conclude that they are not ungrammatical or even unacceptable, but merely dispreferred.
Bresnan and colleagues used a three-million-word collection of recorded and transcribed spontaneous telephone conversations known as the Switchboard corpus to study the double-object and recipient-PP constructions. They first annotated the utterances with indications of a number of factors that they thought might influence the choice between the double-object and recipient-PP constructions:
They also coded the verb meanings by assigning them to half a dozen semantic categories:
They then constructed a statistical model of the corpus: a mathematical formula expressing, for each combination of the factors listed above, the ratio of the probabilities of the double-object and the recipient-PP. (To be precise, they used the natural logarithm of the ratio of p to 1 − p, where p is the probability of a double-object or recipient-PP in the corpus being of the double-object form.) They then used logistic regression to fit this model to the data.
To determine how well the model generalized to unseen data, they divided the data randomly 100 times into a training set and a testing set, fit the model parameters on each training set, and scored its predictions on the unseen testing set. The average percentage of correct predictions on unseen data was 92%. All components of the model except number of the recipient NP made a statistically significant difference—almost all at the 0.001 level.
What this means is that, knowing only the presence or absence of the sort of factors listed above, they were reliably able to predict whether double-object or recipient-PP structures would be used in a given context, with 92% accuracy.
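The kind of model at issue can be sketched in miniature. The factor names and coefficient values below are invented for illustration (the actual model in Bresnan et al. 2007 has many more predictors, with coefficients estimated from the Switchboard annotations); the sketch only shows how a weighted sum of factor values, interpreted as the natural logarithm of p/(1 − p), is converted into a probability of the double-object form.

```python
import math

# Hypothetical coefficients for a few illustrative factors; the signs
# reflect the sort of tendencies reported (e.g., pronominal recipients
# favor the double object), but the numbers are made up.
coefficients = {
    "intercept": -0.5,
    "recipient_is_pronoun": 1.8,
    "theme_is_long": -2.1,
    "verb_class_transfer": 0.9,
}

def prob_double_object(features):
    """Logistic model: log(p / (1 - p)) is a weighted sum of factor
    values, so p = 1 / (1 + exp(-log_odds))."""
    log_odds = coefficients["intercept"] + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

# "Joe gave her the ball": pronominal recipient, short theme, transfer verb.
p = prob_double_object(
    {"recipient_is_pronoun": 1, "theme_is_long": 0, "verb_class_transfer": 1}
)
```

With these toy weights the pronominal-recipient, short-theme context comes out strongly favoring the double-object form (p ≈ 0.9), which is the shape of prediction the real model makes for each annotated context.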
The implication is that the two kinds of structure are not interchangeable: they are reliably differentiated by the presence of other factors in the texts in which they occur.
They then took the model they had generated for the telephone speech data and applied it to a corpus of written material: the Wall Street Journal corpus (WSJ), a collection of 1987–9 newspaper copy, only roughly edited. The main relevant difference with written language is that the language producer has more opportunity to reflect thoughtfully on how they are going to phrase things. It was reasonable to think that a model based on speech data might not transfer well. But instead the model had 93.5% accuracy. The authors conclude that “the model for spoken English transfers beautifully to written”. The main difference between the corpora was found to be a slightly higher probability of the recipient-PP structure in written English.
In a very thorough subsequent study, Bresnan and Ford (2010) show that the results also correlate with native speakers’ metalinguistic judgments of naturalness for sentence structures, with lexical decision latencies (speed of deciding whether the words in a text were genuine English words or not), and with a sentence completion task (choosing the most natural of a list of possible completions of a partial sentence). The results of these experiments confirmed that their model predicted participants’ performance.
Among the things to note about this work is that it was all done on directly recorded performance data: transcripts of people speaking to each other spontaneously on the phone in the case of the Switchboard corpus, stories as written by newspaper journalists in the case of WSJ, measured responses of volunteer subjects in a laboratory in the case of the psycholinguistic experiments of Bresnan and Ford (2010). The focus is on identifying the factors in linguistic performance that permit accurate prediction of future performance, and the methods of investigation have a replicability and checkability that is familiar in the natural sciences.
However, we should make it clear that the work is not some kind of close-to-the-ground collecting and classifying of instances. The models that Bresnan and her colleagues develop are sophisticated mathematical abstractions, very far removed from the records of utterance tokens. They claim that these models “allow linguistic theory to solve more difficult problems than it has in the past, and to build convergent projects with psychology, computer science, and allied fields of cognitive science” (Bresnan et al. 2007: 69).
It is important to see that the contrast we have drawn here is not just between three pieces of work that chose to look at different aspects of the phenomena associated with double-object sentences. It is true that Larson focuses more on details of tree structure, Goldberg more on subtle differences in meaning, and Bresnan et al. on frequencies of occurrence. But that is not what we are pointing to. What we want to stress is that we are illustrating three different broad approaches to language that regard different facts as likely to be relevant, and make different assumptions about what needs to be accounted for, and what might count as an explanation.
Larson looks at contrasts between different kinds of clause with different meanings and sees evidence of abstract operations affecting subtle details of tree structure, and of parallelism between derivational operations formerly thought distinct.
Goldberg looks at the same facts and sees evidence not for anything to do with derivations but for the reality of specific constructions—roughly, packets of syntactic, semantic, and pragmatic information tied together by constraints.
Bresnan and her colleagues see evidence that readily observable facts about speaker behavior and frequency of word sequences correlate closely with certain lexical, syntactic, and semantic properties of words.
Nothing precludes defenders of any of the three approaches from paying attention to any of the phenomena that the other approaches attend to. There is ample opportunity for linguists to mix aspects of the three approaches in particular projects. But in broad outline there are three different tendencies exhibited here, with stereotypical views and assumptions roughly as we laid them out in Table 1.
The complex and multi-faceted character of linguistic phenomena means that the discipline of linguistics has a whole complex of distinguishable subject matters associated with different research questions. Among the possible topics for investigation are these:
There is no reason for the whole discipline of linguistics to converge on a single subject matter, or to deny that the field can encompass a diverse range of subject matters. To give a few examples:
Most saliently of all, Harris’s student Chomsky reacted strongly against indifference toward the mind, and insisted that the principal subject matter of linguistics was, and had to be, a narrow psychological version of (i), and an individual, non-social, and internalized conception of (ii).
In the course of advancing his view, Chomsky introduced a number of novel pairs of terms into the linguistics literature: competence vs. performance (Chomsky 1965); ‘I-language’ vs. ‘E-language’ (Chomsky 1986); the faculty of language in the narrow sense vs. the faculty of language in the broad sense (the ‘FLN’ and ‘FLB’ of Hauser et al. 2002). Because Chomsky’s terminological innovations have been adopted so widely in linguistics, the focus of sections 2.1–2.3 will be to examine the use of these expressions as they were introduced into the linguistics literature and consider their relation to (i)–(vii).
Essentialists invariably distinguish between what Chomsky (1965) called competence and performance. Competence is what knowing a language confers: a tacit grasp of the structural properties of all the sentences of a language. Performance involves actual real-time use, and may diverge radically from the underlying competence, for at least two reasons: (a) an attempt to produce an utterance may be perturbed by non-linguistic factors like being distracted or interrupted, changing plans or losing attention, being drunk or having a brain injury; or (b) certain capacity limits of the mechanisms of perception or production may be overstepped.
Emergentists tend to feel that the competence/performance distinction sidelines language use too much. Bybee and McClelland put it this way:
One common view is that language has an essential and unique inner structure that conforms to a universal ideal, and what people say is a potentially imperfect reflection of this inner essence, muddied by performance factors. According to an opposing view…language use has a major impact on language structure. The experience that users have with language shapes cognitive representations, which are built up through the application of general principles of human cognition to linguistic input. The structure that appears to underlie language use reflects the operation of these principles as they shape how individual speakers and hearers represent form and meaning and adapt these forms and meanings as they speak. (Bybee and McClelland 2005: 382)
And Externalists are often concerned to describe and explain not only language structure, but also the workings of processing mechanisms and the etiology of performance errors.
However, every linguist accepts that some idealization away from the speech phenomena is necessary. Emergentists and Externalists are almost always happy to idealize away from sporadic speech errors. What they are not so keen to do is to idealize away from limitations on linguistic processing and the short-term memory on which it relies. Acceptance of a thoroughgoing competence/performance distinction thus tends to be a hallmark of Essentialist approaches, which take the nature of language to be entirely independent of other human cognitive processes (though of course capable of connecting to them).
The Essentialists’ practice of idealizing away from even psycholinguistically relevant factors like limits on memory and processing plays a significant role in various important debates within linguistics. Perhaps the most salient and famous is the issue of whether English is a finite-state language.
The claim that English is not accepted by any finite-state automaton can only be supported by showing that every grammar for English has center-embedding to an unbounded depth (see Levelt 2008: 20–23 for an exposition and proof of the relevant theorem, originally from Chomsky 1959). But even depth-3 center-embedding of clauses (a clause interrupting a clause that itself interrupts a clause) is in practice extraordinarily hard to process. Hardly anyone can readily understand even semantically plausible sentences like Vehicles that engineers who car companies trust build crash every day. And such sentences virtually never occur, even in writing. Karlsson (2007) undertakes an extensive examination of available textual material, and concludes that depth-3 center-embeddings are vanishingly rare, and no genuine depth-4 center-embedding has ever occurred at all in naturally composed text. He proposes that there is no reason to regard center-embedding as grammatical beyond depth 3 (and for spoken language, depth 2). Karlsson is proposing a grammar that stays close to what performance data can confirm; the standard Essentialist view is that we should project massively from what is observed, and say that depth-n center-embedding is fully grammatical for all n.
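The structure at issue can be made concrete with a small generator. The vocabulary below is made up (extending the example sentence just quoted), and the schema is simplified (a uniform relativizer that instead of alternating that/who): each added relative clause interrupts the clause above it, so a string with k stacked relatives contains k + 1 subject nouns followed by their k + 1 verbs in mirror-image order. Verifying that the nouns and verbs pair up correctly requires remembering how many clauses have been opened, with no fixed bound, which is exactly what a finite-state automaton cannot do.

```python
def center_embed(depth):
    """Return a clause with `depth` stacked center-embedded relative
    clauses, on the pattern of 'vehicles that engineers that car
    companies trust build crash'. Nouns are introduced left to right;
    their verbs are closed off right to left (innermost clause first)."""
    nouns = ["vehicles", "engineers", "car companies", "inspectors", "regulators"]
    verbs = ["crash", "build", "trust", "audit", "fund"]
    words = [nouns[0]]
    for i in range(1, depth + 1):
        words += ["that", nouns[i]]   # open relative clause i
    for i in range(depth, 0, -1):
        words.append(verbs[i])        # close clause i, innermost first
    words.append(verbs[0])            # the main-clause verb comes last
    return " ".join(words)

# center_embed(0) -> "vehicles crash"
# center_embed(2) -> "vehicles that engineers that car companies trust build crash"
```

The nested noun-verb matching is the same a^n b^n dependency pattern standardly used to show that a string set lies beyond finite-state (regular) description when n is unbounded, which is why the Essentialist projection to all depths n entails that English is not finite-state.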
Chomsky (1986) introduced into the linguistics literature two technical notions of a language: ‘E-language’ and ‘I-language’. He deprecates the former as either undeserving of study or as a fictional entity, and promotes the latter as the only scientifically respectable object of study for a serious linguistics.
Chomsky’s notion ‘E-language’ is supposed to suggest by its initial ‘E’ both ‘extensional’ (concerned with which sentences happen to satisfy a definition of a language rather than with what the definition says) and ‘external’ (external to the mind, that is, non-mental). The dismissal of E-language as an object of study is aimed at critics of Essentialism—many but not all of those critics falling within our categories of Externalists and Emergentists.
Extensional. First, there is an attempt to impugn the extensional notion of a language that is found in two radically different strands of Externalist work. Some Externalist investigations are grounded in the details of attested utterances (as collected in corpora), external to human minds. Others, with mathematical or computational interests, sometimes idealize languages as extensionally definable objects (typically infinite sets of strings) with a certain structure, independently of whatever device might be employed to characterize them. A set of strings of words either is or is not regular (finite-state), either is or is not recursive (decidable), etc., independently of forms of grammar statement. Chomsky (1986) basically dismissed both corpus-based work and mathematical linguistics simply on the grounds that they employ an extensional conception of language, that is, a conception that removes the object of study from having an essential connection with the mental.
External. Second, a distinct meaning based on ‘external’ was folded into the neologism ‘E-language’ to suggest criticism of any view that conceives of a natural language as a public, intersubjectively accessible system used by a community of people (often millions of them spread across different countries). Here, the objection is that languages as thus conceived have no clear criteria of individuation in terms of necessary and sufficient conditions. On this conception, the subject matter of interest is a historico-geographical entity that changes as it is transmitted over generations, or over mountain ranges. Famously, for example, there is a gradual valley-to-valley change in the language spoken between southeastern France and northwestern Italy such that each valley’s speakers can understand the next. But the far northwesterners clearly speak French and the far southeasterners clearly speak Italian. It is the politically defined geographical border, not the intrinsic properties of the dialects, that would encourage viewing this continuum as two different languages.
Perhaps the most famous quotation by any linguist is standardly attributed to Max Weinreich (1945): ‘A shprakh iz a dialekt mit an armey un flot’ (‘A language is a dialect with an army and navy’; he actually credits the remark to an unnamed student). The implication is that E-languages are defined in terms of non-linguistic, non-essential properties. Essentialists object that a scientific linguistics cannot tolerate individuating French and Italian in a way that is subject to historical contingencies of wars and treaties (after all, the borders could have coincided with a different hill or valley had some battle had a different outcome).
Considerations of intelligibility fare no better. Mutual intelligibility between languages is not a transitive relation, and sometimes the intelligibility relation is not even symmetric (smaller, more isolated, or less prestigious groups often understand the dialects of larger, more central, or higher-prestige groups when the converse does not hold). So these sociological facts cannot individuate languages either.
Chomsky therefore concludes that languages cannot be defined or individuated extensionally or mind-externally, and hence the only scientifically interesting conception of a ‘language’ is the ‘I-language’ view (see for example Chomsky 1986: 25; 1992; 1995 and elsewhere). Chomsky says of E-languages that “all scientific approaches have simply abandoned these elements of what is called ‘language’ in common usage” (Chomsky 1988: 37); and “we can define E-language in one way or another or not at all, since the concept appears to play no role in the theory of language” (Chomsky 1986: 26; in saying that it appears to play no role in the theory of language, he means that it plays no role in the theory he favours).
This conclusion may be bewildering to non-linguists as well as non-Essentialists. It is at odds with what a broad range of philosophers have tacitly assumed or explicitly claimed about language or languages: ‘[A language] is a practice in which people engage…it is constituted by rules which it is part of social custom to follow’ (Dummett 1986: 473); ‘Language is a set of rules existing at the level of common knowledge’ and these rules are ‘norms which govern intentional social behavior’ (Itkonen 1978: 122), and so on. Generally speaking, those philosophers influenced by Wittgenstein also take the view that a language is a social-historical entity. But the opposite view has become part of the conceptual underpinning of linguistics for many Essentialists.
Failing to have precise individuation conditions is surely not a sufficient reason to deny that an entity can be studied scientifically. ‘Language’ as a count noun in the extensional and socio-historical sense is vague, but this need not be any greater obstacle to theorizing about languages than is the vagueness of other terms for historical entities without clear individuation conditions, like ‘species’ and ‘individual organism’ in biology.
At least some Emergentist linguists, and perhaps some Externalists, would be content to say that languages are collections of social conventions, publicly shared, and some philosophers would agree (see Millikan 2003, for example, and Chomsky 2003 for a reply). Lewis (1969) explicitly defends the view that language can be understood in terms of public communications, functioning to solve coordination problems within a group (although he acknowledges that the coordination could be between different temporal stages of one individual, so language use by an isolated person is also intelligible; see the appendix “Lewis’s Theory of Languages as Conventions” in the entry on idiolects for further discussion of Lewis). What Chomsky calls E-languages, then, would be perfectly amenable to linguistic or philosophical study. Santana (2016) makes a similar argument in terms of scientific idealization. He argues that since all sciences idealize their targets, Chomsky needs to do more to show why idealizations concerning E-languages are illicit (see also Stainton 2014).
Chomsky (1986) introduced the neologism ‘I-language’ in part to disambiguate the word ‘grammar’. In earlier generative Essentialist literature, ‘grammar’ was (deliberately) ambiguous between (i) the linguist’s generative theory and (ii) what a speaker knows when they know a language. ‘I-language’ can be regarded as a replacement for Bever’s term ‘psychogrammar’ (see also George 1989): it denotes a mental or psychological entity (not a grammarian’s description of a language as externally manifested).
I-language is first discussed under the sub-heading of ‘internalized language’ to denote linguistic knowledge. Later discussion in Chomsky 1986 and 1995 makes it clear that the ‘I’ of ‘I-language’ is supposed to suggest at least three English words: ‘individual’, ‘internal’, and ‘intensional’. And Chomsky emphasizes that the neologism also implies a kind of realism about speakers’ knowledge of language.
Individual. A language is claimed to be strictly a property of individual human beings—not groups. The contrast is between the idiolect of a single individual, and a dialect or language of a geographical, social, historical, or political group. I-languages are properties of the minds of individuals who know them.
Internal. As generative Essentialists see it, your I-language is a state of your mind/brain. Meaning is internal—indeed, on Chomsky’s conception, an I-language
is a strictly internalist, individualist approach to language, analogous in this respect to studies of the visual system. If the cognitive system of Jones’s language faculty is in state L, we will say that Jones has the I-language L. (Chomsky 1995: 13)
And he clarifies the sense in which an I-language is internal by appealing to an analogy with the way the study of vision is internal:
The same considerations apply to the study of visual perception along lines pioneered by David Marr, which has been much discussed in this connection. This work is mostly concerned with operations carried out by the retina; loosely put, the mapping of retinal images to the visual cortex. Marr’s famous three levels of analysis—computational, algorithmic, and implementation—have to do with ways of construing such mappings. Again, the theory applies to a brain in a vat exactly as it does to a person seeing an object in motion. (Chomsky 1995: 52)
Thus, while the speaker’s I-language may be involved in performing operations over representations of distal stimuli—representations of other speakers’ utterances—I-languages can and should be studied in isolation from their external environments.
Although Chomsky sometimes refers to this narrow individuation of I-languages as ‘individual’, he clearly claims that I-languages are individuated in isolation from both speech communities and other aspects of the broadly conceived natural environment:
Suppose Jones is a member of some ordinary community, and J is indistinguishable from him except that his total experience derives from some virtual reality design; or let J be Jones’s Twin in a Twin-Earth scenario. They have had indistinguishable experiences and will behave the same way (in so far as behavior is predictable at all); they have the same internal states. Suppose that J replaces Jones in the community, unknown to anyone except the observing scientist. Unaware of any change, everyone will act as before, treating J as Jones; J too will continue as before. The scientist seeking the best theory of all of this will construct a narrow individualist account of Jones, J, and others in the community. The account omits nothing… (Chomsky 1995: 53–54)
This passage can also be seen as suggesting a radically intensionalistconception of language.
Intensional. The way in which I-languages are ‘intensional’ for Chomsky needs a little explication. The concept of intension is familiar in logic and semantics, where ‘intensional’ contrasts with ‘extensional’. The extension of a predicate like ‘blue’ is simply the set of all blue objects; the intension is the function that picks out in a given world the blue objects contained therein. In a similar way, the extension of a set can be distinguished from an intensional description of the set in terms of a function: the set of integer squares is {1, 4, 9, 16, 25, 36, …}, and the intension could be given in terms of the one-place function f such that f(n) = n × n. One difference between the two accounts of squaring is that the intensional one could be applied to a different domain (any domain on which the ‘×’ operation is defined: on the rationals rather than the integers, for example, the extension of the identically defined function is a different and larger set containing infinitely many fractions).
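The integer-squares example can be restated computationally: a function definition plays the role of the intension, and the set obtained by applying it to a particular domain is an extension. The sketch below simply re-expresses the example in the text, using a small finite slice of each domain.

```python
from fractions import Fraction

def square(n):
    """The intension: the one-place rule f(n) = n * n, stated
    independently of any particular domain of application."""
    return n * n

# One extension: the rule applied to the integers 1..6.
squares_of_integers = {square(n) for n in range(1, 7)}

# Another extension: the identically defined rule applied to the
# half-integers 1/2, 1, 3/2, ..., 6. The result is a strictly larger
# set that also contains fractions such as 1/4 and 9/4.
squares_of_halves = {square(Fraction(k, 2)) for k in range(1, 13)}
```

One rule, two extensions: the integer squares form a proper subset of the squares of the half-integers, which is Chomsky’s point that an intensional specification is finer-grained than the set it happens to pick out over a given domain.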
In an analogous way, a language can be identified with the set of all and only its expressions (regardless of what sort of object an expression is: a word sequence, a tree structure, a complete derivation, or whatever), which is the extensional view; but it can also be identified intensionally by means of a recipe or formal specification of some kind—what linguists call a grammar. Ludlow (2011) considers the first I (individual) to be the weakest link and thus the most expendable. He argues in its stead for a concept of a “Ψ-language” which allows for the possibility of the I-language relating to external objects either constitutively or otherwise.
In natural language semantics, an intensional context is one where substitution of co-extensional terms fails to preserve truth value (Scott is Scott is true, and Scott is the author of Waverley is true, but the truth of George knows that Scott is Scott doesn’t guarantee the truth of George knows that Scott is the author of Waverley, so knows that establishes an intensional context).
Chomsky claims that the truth of an I-language attribution is not preserved by substituting terms that have the same extension. That is, even when two human beings do not differ at all on what expressions are grammatical, it may be false to say that they have the same I-language. Where H is a human being, L is a language (in the informal sense), and R is the relation of knowing (or having, or using) that holds between a human being and a language, Chomsky holds, in effect, that R establishes an intensional context in statements of the theory:
[F]or H to know L is for H to have a certain I-language. The statements of the grammar are statements of the theory of mind about the I-language, hence structures of the brain formulated at a certain level of abstraction from mechanisms. These structures are specific things in the world, with their properties… The I-language L may be the one used by a speaker but not the I-language L′ even if the two generate the same class of expressions (or other formal objects) … L′ may not even be a possible human I-language, one attainable by the language faculty. (Chomsky 1986: 23)
The idea is that two individuals can know (or have, or use) different I-languages that generate exactly the same strings of words, and even give them exactly the same structures. This situation forms the basis of Quine’s (1972) infamous critique of the psychological reality of generative grammars (see Johnson 2015 for a solution in terms of invariance of ‘behaviorally equivalent’ grammar formalisms, to use Quine’s terminology; see also Nefdt 2021 for a similar resolution in terms of structural realism in the philosophy of science).
The generative Essentialist conception of an I-language is antithetical to Emergentist research programs. If the fundamental explanandum of scientific linguistics is how actual linguistic communication takes place, one must start by looking at both internal (psychological) and external (public) practices and conventions in virtue of which it occurs, and consider the effect of historical and geographic contingencies on the relevant underlying processes. That would not rule out ‘I-language’ as part of the explanans; but some Emergentists seem to be fictionalists about I-languages, in a sense analogous to the way that Chomsky is a fictionalist about E-languages. Emergentists do not see a child as learning a generative grammar, but as learning how to use a symbolic system for propositional communication. On this view grammars are mere artifacts that are developed by linguists to codify aspects of the relevant systems, and positing an I-language amounts to projecting the linguist’s codification illegitimately onto human minds (see, for example, Tomasello 2003).
The I-language concept brushes aside certain phenomena of interest to the Externalists, who hold that the forms of actually attested expressions (sentences, phrases, syllables, and systems of such units) are of interest for linguistics. For example, computational linguistics (work on speech recognition, machine translation, and natural language interfaces to databases) must rely on a conception of language as public and extensional; so must any work on the utterances of young children, or the effects of word frequency on vowel reduction, or misunderstandings caused by road sign wordings. At the very least, it might be said on behalf of this strain of Externalism (along the lines of Soames 1984) that linguistics will need careful work on languages as intersubjectively accessible systems before hypotheses about the I-language that purportedly produces them can be investigated.
It is a highly biased claim that the E-language concept “appears to play no role in the theory of language” (Chomsky 1986: 26). Indeed, the terminological contrast seems to have been invented not to clarify a distinction between concepts but to nudge linguistic research in a particular direction.
In Hauser et al. (2002) (henceforth HCF) a further pair of contrasting terms is introduced. They draw a distinction quite separate from the competence/performance and ‘I-language’/‘E-language’ distinctions: the “language faculty in the narrow sense” (FLN) is distinguished from the “language faculty in the broad sense” (FLB). According to HCF, FLB “excludes other organism-internal systems that are necessary but not sufficient for language (e.g., memory, respiration, digestion, circulation, etc.)” but includes whatever is involved in language, and FLN is some limited part of FLB (p. 1571). This is all fairly vague, but it is clear that FLN and FLB are both internal rather than external, and individual rather than social.
The FLN/FLB distinction apparently aims to address the uniqueness of one component of the human capacity for language rather than (say) the content of human grammars. HCF say (p. 1573) that “Only FLN is uniquely human”; they “hypothesize that most, if not all, of FLB is based on mechanisms shared with nonhuman animals”; and they say:
[T]he computations underlying FLN may be quite limited. In fact, we propose in this hypothesis that FLN comprises only the core computational mechanisms of recursion as they appear in narrow syntax and the mappings to the interfaces. (ibid.)
The components of FLB that HCF hypothesize are not part of FLN are the “sensory-motor” and “conceptual-intentional” systems. The study of the conceptual-intentional system includes investigations of things like the theory of mind; referential vocal signals; whether imitation is goal-directed; and the field of pragmatics. The study of the sensory-motor system, by contrast, includes “vocal tract length and formant dispersion in birds and primates”; learning of songs by songbirds; analyses of vocal dialects in whales and spontaneous imitation of artificially created sounds in dolphins; “primate vocal production, including the role of mandibular oscillations”; and “[c]ross-modal perception and sign language in humans versus unimodal communication in animals”.
It is presented as an empirical hypothesis that a core property of the FLN is “recursion”:
All approaches agree that a core property of FLN is recursion, attributed to narrow syntax… FLN takes a finite set of elements and yields a potentially infinite array of discrete expressions. This capacity of FLN yields discrete infinity (a property that also characterizes the natural numbers). (HCF, p. 1571)
HCF leave open exactly what the FLN includes in addition to recursion. It is not ruled out that the FLN incorporates substantive universals as well as the formal property of “recursion”. But whatever “recursion” is in this context, it is apparently not domain-specific in the sense of earlier discussions by generative Essentialists, because it is not unique to human natural language or defined over specifically linguistic inputs and outputs: it is the basis for humans’ grasp of the formal and arguably non-natural language of arithmetic (counting, and the successor function), and perhaps also navigation and social relations. It might be more appropriate to say that HCF identify recursion as a cognitive universal, not a linguistic one. And in that case it is difficult to see how the so-called ‘language faculty’ deserves that name: it is more like a faculty for cognition and communication.
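The idea of a finite set of elements yielding “a potentially infinite array of discrete expressions” can be illustrated schematically. The pairing operation below is a toy stand-in (loosely reminiscent of the binary combination generative syntacticians call Merge, though nothing in HCF's text is committed to this particular formalization): starting from a three-word lexicon, each round of combination produces new discrete expressions, and no round is the last to produce something new.

```python
def merge(x, y):
    """Toy binary combination: pair two expressions into a new expression."""
    return (x, y)

def generate(lexicon, rounds):
    """Close the lexicon under `merge` for the given number of rounds."""
    exprs = set(lexicon)
    for _ in range(rounds):
        exprs |= {merge(x, y) for x in exprs for y in exprs}
    return exprs

lexicon = {"the", "dog", "barked"}
sizes = [len(generate(lexicon, r)) for r in range(4)]
# sizes == [3, 12, 147, 21612]: strictly growing at every round, so the
# closure under the operation is infinite even though the lexicon is finite.
```

Each expression is a discrete, finitely built object, but the set of all of them has no largest member: the same sense in which the successor function gives the natural numbers their “discrete infinity”.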
This abandonment of linguistic domain-specificity contrasts very sharply with the picture that was such a prominent characteristic of the earlier work on linguistic nativism, popularized in different ways by Fodor (1983), Barkow et al. (1992), and Pinker (1994). And yet the HCF discussion of FLN seems to incline to the view that human language capacities have a uniquely human (though not uniquely linguistic) essence.
The FLN/FLB distinction provides earlier generative Essentialism with an answer (at least in part) to the question of what the singularity of the human language faculty consists in, and it does so in a way that subsumes many of the empirical discoveries of paleoanthropology, primatology, and ethnography that have been highly influential in Emergentist approaches as well as neo-Darwinian Essentialist approaches. A neo-Darwinian Essentialist like Pinker will accept that the language faculty involves recursion, but will also hold (with Emergentists) that human language capacities originated, via natural selection, for the purpose of linguistic communication.
Thus, over the years, those Essentialists who follow Chomsky closely have changed the term they use for their core subject matter from ‘linguistic competence’ to ‘I-language’ to ‘FLN’, and the concepts expressed by these terms are all slightly different. In particular, what they are counterposed to differs in each case.
The challenge for the generative Essentialist adopting the FLN/FLB distinction as characterized by HCF is to identify empirical data that can support the hypothesis that the FLN “yields discrete infinity”. That will mean answering the question: discrete infinity of what? HCF write that FLN “takes a finite set of elements and yields a potentially infinite array of discrete expressions” (p. 1571), which makes it clear that there must be a recursive procedure in the mathematical sense, perhaps putting atomic elements such as words together to make internally complex elements like sentences (“array” should probably be understood as a misnomer for ‘set’). But then they say, somewhat mystifyingly:
Each of these discrete expressions is then passed to the sensory-motor and conceptual-intentional systems, which process and elaborate this information in the use of language. Each expression is, in this sense, a pairing of sound and meaning. (HCF, p. 1571)
But the sensory-motor and conceptual-intentional systems are concrete parts of the organism: muscles and nerves and articulatory organs and perceptual channels and neuronal activity. How can each one of a “potentially infinite array” be “passed to” such concrete systems without it taking a potentially infinite amount of time? HCF may mean that for any one of the expressions that FLN defines as well-formed (by generating it) there is a possibility of its being used as the basis for a pairing of sound and meaning. This would be closer to the classical generative Essentialist view that the grammar generates an infinite set of structural descriptions; but it is not what HCF say.
At root, HCF is a polemical work intended to identify the view it promotes as valuable and all other approaches to linguistics as otiose.
In the varieties of modern linguistics that concern us here, the term “language” is used quite differently to refer to an internal component of the mind/brain (sometimes called internal language or I-language).… However, this biologically and individually grounded usage still leaves much open to interpretation (and misunderstanding). For example, a neuroscientist might ask: What components of the human nervous system are recruited in the use of language in its broadest sense? Because any aspect of cognition appears to be, at least in principle, accessible to language, the broadest answer to this question is, probably, “most of it.” Even aspects of emotion or cognition not readily verbalized may be influenced by linguistically based thought processes. Thus, this conception is too broad to be of much use. (HCF, p. 1570)
It is hard to see this as anything other than a claim that approaches to linguistics focusing on anything that could fall under the label ‘E-language’ are to be dismissed as useless.
Some Externalists and Emergentists actually reject the idea that the human capacity for language yields “a potentially infinite array of expressions”. It is often pointed out by empirically inclined computational linguists that in practice there will only ever be a finite number of sentences to be dealt with (though the people saying this may underestimate the sheer vastness of the finite set involved). And naturally, for those who do not believe there are generative grammars in speakers’ heads at all, it holds a fortiori that speakers do not have grammars in their heads generating infinite languages (see Nefdt 2019c for a scientific modeling perspective on the infinity postulate). Externalists and Emergentists tend to hold that the “discrete infinity” that HCF posit is more plausibly a property of the generative Essentialists’ model of linguistic competence, I-language, or FLN, than a part of the human mind/brain. This does not mean that non-Essentialists deny that actual language use is creative, or (of course) that they think there is a longest sentence of English. But they may reject the link between linguistic productivity or creativity and the mathematical notion of recursion (see Pullum and Scholz 2010).
HCF’s remarks about how FLN “yields” or “generates” a specific “array” assume that languages are clearly and sharply individuated by their generators. They appear to be committed to the view that there is a fact of the matter about exactly which generator is in a given speaker’s head. Emergentists tend not to individuate languages in this way, and may reject generative grammars entirely as inappropriately or unacceptably ‘formalist’. They are content with the notion that the common-sense concept of a language is vague, and it is not the job of linguistic theory to explain what a language is, any more than it is the job of physicists to explain what material is, or of biologists to explain what life is. Emergentists, in particular, are interested not so much in identifying generators or individuating languages as in exploring the component capacities that facilitate linguistic communication, and finding out how they interact.
Similarly, Externalists are interested in the linguistic structure of expressions, but have little use for the idea of a discrete infinity of them, a view that is not, and cannot be, empirically supported, unless one thinks of simplicity and elegance of theory as empirical matters. They focus on the outward manifestations of language, not on a set of expressions regarded as a whole language—at least not in any way that would give a language a definite cardinality. Zellig Harris, an archetypal Externalist, is explicit that the reason for not regarding the set of utterances as finite concerns the elegance of the resulting grammar: “If we were to insist on a finite language, we would have to include in our grammar several highly arbitrary and numerical conditions” (Harris 1957: 208). Infinitude, on his view, is an unimportant side consequence of setting up a sentence-generating grammar in an uncluttered and maximally elegant way, not a discovered property of languages (see Pullum and Scholz 2010 for further discussion).
Not all Essentialists agree that linguistics studies aspects of what is in the mind or aspects of what is human. There are some who do not see language as either mental or human, and certainly do not regard linguists as working on a problem within cognitive psychology or neurophysiology. The debate on the ontology of language has seen three major options emerging in the literature. Besides the mentalism of Chomskyan linguistics, Katz (1981), Katz and Postal (1991), and Postal (2003) proffered a platonistic alternative, and Devitt (2006) proposed nominalism.
However, the Katzian trichotomy is no longer a useful characterisation of the state of the art in linguistic ontology. For one thing, Katzian-style linguistic Platonism has very few if any extant adherents. One reason for this situation is that linguistic platonists attempt to restage the debate on the foundations and metaphysics of natural language within the philosophy of mathematics (see Katz 1996). But even if this move were legitimate, it would only have opened up a range of possibilities including nominalism (Field 1980; Azzouni 2004), structuralism (Hellman 1989; Shapiro 2007; Nefdt 2016), and forms of mentalism in the guise of intuitionism. For instance, while the view that linguistics can be seen as a branch of mathematics is often attributed to Richard Montague, it is unclear whether or not he endorsed a platonistic ontology. Devitt (2006: 26) describes the possibility of a ‘methodological platonism’ in the following manner:
It is often convenient to talk of objects posited by these theories as if they were types not tokens, as if they were Platonic objects, but this need be nothing more than a manner of speaking: when the chips are down the objects are part of the spatiotemporal physical world.
Devitt’s nominalism or ‘linguistic conception’ was not around at the time of the original Katzian tripartite analysis. He argues that linguistics is an empirical science which studies languages as they are spoken by linguistic communities and views sentences as ‘idealised tokens’. Devitt’s ‘linguistic view’ (as opposed to the ‘psychological view’ or Chomskyan mentalism) claims that grammars map onto behavioural output of language production, of which speakers are generally ignorant.
Katz took nominalism to have been refuted by Chomsky in his critiques of American structuralists in the 1960s. But, in Katz’s opinion, Chomsky had failed to notice that conceptualism was infected with many of the same faults as nominalism, because it too localized language spatiotemporally (in contingently existing, finite, human brains). Since contemporary Minimalist theories share in the earlier ontological commitment, Katz’s argument would presumably extend to them. Through an argument by elimination, Katz concluded that only platonism remained, and must be the correct view to adopt. But this is a false trichotomy, and besides predating Devitt’s more philosophically grounded nominalism, it also fails to take linguistic pluralism into account.
Recent adherents of pluralism are Stainton (2014) and Santana (2016). Santana (2016) argues in favour of a pluralistic ontology for natural language based on all of the major foundational approaches, including sociolinguistic ontology. His approach is thoroughly naturalistic in asking the ontological question through the lens of “what sort of roles the concept of language plays in linguistic theory and practice” (Santana, 2016: 501).
The first thing Santana does is to separate the discussion into two related questions, one scientific and the other metascientific, or ‘descriptive’ and ‘normative’ in his terms. He claims that “[l]anguage, the scientific concept, is thus descriptively whatever it is that linguists take as their primary object of study, and normatively whatever it is they should be studying” (Santana, 2016: 501). Eventually he advocates a union of various ontologies based on the ineliminable status of each perspective (in that way the opposite of Katz’s eliminative strategy).
Stainton (2014) similarly proposes a pluralistic ontology but with a more intersectional approach. His additional argument relates to how all of the views are indeed compatible. This argument is a response to an immediate objection along the lines of Postal (2003, 2009) as to the incompatibility of the various ontologies associated with mentalism, Platonism, physicalism and public language views. Stainton begins the pluralist apology in this way.
There is an obvious rebuttal on behalf of pluralism, namely that “the linguistic” is a complex phenomenon with parts that belong to distinct ontological categories. This shouldn’t surprise, since even “the mathematical” is like this: Two wholly physical dogs plus two other wholly physical dogs yields four dogs; there certainly is the mental operation of multiplying 26 by 84, the mental state of thinking about the square root of 7, and so on. (2014: 5)
His main argument against incompatibility, and in favour of intersection, is that the former rests on an equivocation of the terms ‘mental’, ‘abstract’ and even ‘physical’. Once the equivocation is cleared up, it is argued, hybrid ontological objects are licensed. The argument goes that appreciating the nuanced physical and mental and what he calls ‘abstractish’ nature of natural language will dissolve worries about ontological inconsistency and open the door for intersection. Consider some other members of this category of objects.
Indeed, our world is replete with such hybrid objects: psychocultural kinds (e.g. dining room tables, footwear, bonfires, people, sport fishing [...]); intellectual artifacts (college diplomas, drivers’ licenses, the Canadian dollar [...]); and institutions (MIT’s Department of Linguistics and Philosophy, Disneyworld [...]) (Stainton, 2014: 6).
Despite the decline in interest in the ontology of language itself, philosophers have recently embraced a subset of this debate in the philosophy of linguistic objects with a special focus on words. There is a recent debate in the philosophy of what Rey (2006, 2020) calls ‘Standard Linguistic Entities’ (SLEs), or tokens of word, sentence, morpheme, and phoneme types. Rey then defines a position called ‘physical tokenism’ or PT as the assumption that SLEs can be identified with physical (acoustic) spatio-temporal phenomena. He doesn’t think that SLEs share the same kind of existence as his trusty Honda. In fact he thinks that they are ‘intentional inexistents’ (borrowed from Brentano; see the SEP entry on Brentano’s Theory of Judgement, and also the section entitled ‘Intentional Inexistence’ in the entry on Intentionality) or purely intentional uses of the term ‘represents’ which denote fictions of a particular sort. Linguistic theory according to him is only committed to the intentional contents of things like nouns, verbs, verb phrases etc., where “an (intentional) content is whatever we understand x to be when we use the idiom ‘represent(ation of) x’ but there is no real x” (2006: 242).
There has been some theoretical work on the nature of entities like phrases and words in linguistics. For example, Ross (2010) argues that the concept of parts of speech is fuzzy. Similarly, Szabó (2015) rejects the idea that parts of speech should be identified by distributional analysis, as is common in syntax. Instead he offers a semantic approach based on predicate logic where the aim is to model the major lexical categories directly in terms of open class constants. This, he claims, results in a reduction of the gap between grammar and logic. So, for instance, nouns become not types corresponding to distributionally defined syntactic objects but rather open lexical constants used for reference, such that the semantic clause only needs to involve a universal quantifier and a variable specified in terms of reference. Verbs, on the other hand, are constants which purport to predicate (for more details, see Szabó 2015 and Nefdt 2020). For Haspelmath (2011) a central problem is the concept of wordhood. He identifies ten morphosyntactic criteria for words as the best possible candidates over seemingly inferior semantic or phonological options. He shows all of them to be wanting, with the result, among other things, that “the notion of lexical integrity is not well supported and should not be appealed to in explaining grammatical phenomena” (Haspelmath, 2011: 33). The very notion of wordhood, although intuitive and central, is unclear upon further scrutiny. Yet in linguistics there is continual hope for a resolution, that there is something more than essential inexistence at stake. Haspelmath thinks this is a vain hope, and attributes it to the influence of orthography on the thinking of linguistic researchers.
Philosophers have been traditionally interested in the metaphysics of SLEs with a special focus on the ontological status of words. Interestingly, this literature showcases variations on the foundational debates on the ontology of language. As Miller (2020) notes:
Words play various roles in our lives. Some insult, some inspire, and words are central to communication. The aim of an ontology of words is to determine what entities, if any, can play those roles and possess (or instantiate) these properties. (2)
However, the positions advocated are somewhat more nuanced than the original Katzian trichotomy suggests. They usually start with the problem of word individuation, expressed in the following manner:
Think of the following line: A rose is a rose is a rose. How many words are there in this line? If we were to count words themselves, not their instances, the answer is three: rose, is, and a. If we were to count the concrete instances we see on a piece of paper, the answer is eight. The line, however, can be taken as an abstract type; a sequence of shapes. (Irmak, 2019: 1140)
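Irmak’s counting puzzle is the type/token distinction made vivid, and it can be rendered computationally in a few lines. The whitespace tokenization and lowercasing below are simplifying assumptions of this sketch, not part of Irmak’s account:

```python
# Count tokens (concrete instances) versus types (the words themselves)
# in Irmak's example line.
line = "A rose is a rose is a rose"

tokens = line.lower().split()  # the concrete instances on the page
types = set(tokens)            # the words themselves

print(len(tokens))  # 8 tokens
print(len(types))   # 3 types: {'a', 'rose', 'is'}
```

Note that lowercasing quietly identifies “A” and “a” as the same word, which is itself a substantive assumption about word identity of exactly the kind the surrounding discussion puts in question.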
Further complications are introduced by reference to words like “color” and “colour”. Here the idea is that the phonological profile of a word is a guide to its identity. But this fails in other cases.
Now take the name MOHAMMED. Since Arabic does not notate vowels, the name has been transcribed in a wide variety of ways in English, and some of such transcriptions present important discrepancies: For example, “Mohammad” and “Mehmood.” Even if we know that they originated from the same source, the difference between the two forms is considerable, and intuitions about their being variants of the same name are less clear. What is the point up to which differences in spelling are consistent with word identity? (Gasparri, 2021: 594)
Word individuation goes beyond this initial characterisation, and it is not always clear how the many accounts deal with the more complex questions directly in their metaphysical pursuits. For instance, issues not usually mentioned in the literature, but which seem equally important, concern whether pitch in ‘They poured pitch all over the parking lot’ and ‘The players swarmed the pitch after they won the game’ are different words. What about words within different syntactic categories, such as ‘watch’ (time-telling device) and ‘watch’ (observe)? Are “ain’t” and “isn’t” and “aren’t” different words? What about simple cases of inflectional morphology such as ‘toy’ and ‘toys’?
According to Nefdt (2019b), the identity of a word is tied to its role in the sentence structure. In which case, “ain’t” and “isn’t” come out as the same word (at least in the singular use) but ‘watch’ (noun) and ‘watch’ (verb) do not. However, counterintuitively, his account might license the identity of words like ‘truck’ and ‘lorry’.
There are two strong but separate traditions which can both lay claim to being the ‘received position’. Within linguistics, the idea of a word as a LEXEME or mental dictionary entry is commonplace (with stipulations for senses, irregular forms, and selectional criteria). Most introductory textbooks assume something of this sort. In the philosophical literature, on the other hand, a mild or methodological version of platonism is often presupposed. This view has it that words can be separated into types and tokens, where the former lack specific spatiotemporal features and the latter instantiate these forms somehow.
The latter intuition seems to characterize most views on the ontology of words. Bromberger (1989) defines what he calls “the Platonic Relationship Principle”, or the principle that allows us to ‘impute properties to types after observing and judging some of their tokens’ (Bromberger 1989, 62). Bromberger (1989, 2011) represents the pinnacle of the classical philosophy of linguistics approach to these questions. In a more metaphysical mode, David Kaplan (1990, 2011) constructs a thoroughly physicalist proposal in which words are modelled in terms of stages and continuants:
I propose a quite different model according to which utterances and inscriptions are stages of words, which are the continuants made up of these interpersonal stages along with some more mysterious intrapersonal stages. (Kaplan 1990: 98)
For him, what individuates words is the intention of the user (see Cappelen (1999) for an objection to intentional accounts tout court). Unfortunately, fascinating as Kaplan’s proposal is, it does not attempt to reflect on linguistic theory directly. In fact, one major criticism of his view, courtesy of Hawthorne and Lepore (2011), is that it fails to account for uninstantiated word-types whose existence is guaranteed by derivational morphology, whether or not they’ve been tokened or baptized in the real world. Other notable accounts are Wetzel’s (2009) Platonism (see the SEP entry on abstract objects) and Szabó’s (1999) representational/nominalist view.
The philosophy of words has recently seen a resurgence in interest among philosophers, especially on the ontological issues. Miller (2021), for example, attempts to apply a bundle theory to the task of word individuation and identification. Irmak (2019) suggests that words are abstract artifacts (similarly to Katz and Wetzel) but insists that they are more akin to musical scores or works of fiction which have temporal components (“temporal abstracta”, as he calls them). Mallory (2020) advocates the position that words are not really objects in the ordinary sense. However, he opts for an action-theoretic approach in which tokens provide instructions for the performance of action-types, where our normal understanding of ‘word’ is to be identified with those types. His view is overtly naturalistic and focuses on the concept of words which is drawn from contemporary linguistic theory. Similarly, Nefdt (2019b) proffers a mathematical structuralist interpretation of SLEs in which the definition of words is continuous with the ontology of phrases and sentences. Here he follows Jackendoff (2018), who uses model-theoretic (Pullum 2013) or constraint-based grammar formalisms to argue for a continuum between words and linguistic rules. In other words, these latter two authors reject the idea that words are somehow sui generis entities in need of discontinuous explanation. Gasparri (2020) suggests pluralism is a more solid foundation for the ontology of words. He evaluates both “bundlistic” views such as Miller’s and causal-historical accounts such as Irmak’s before offering an alternative “view that there is a plurality of epistemically virtuous ways of thinking about the nature of words” (608).
These are of course complex issues, and they offer a lens through which to appreciate the erstwhile debate on the ontology of language but with a contemporary and more focused flavor. Not all of the authors who work on the philosophy of words consider the role of linguistic theory to be central. Hence their work might be related, but it does not quite qualify as the philosophy of linguistics, where this is viewed as a subfield of the philosophy of science. By contrast, we have focused on the authors who directly engage with linguistic theory in their accounts of the ontology of SLEs. There is also no clear mapping between the various ontological accounts mentioned here and the characterizations of linguistic theorizing in terms of Externalism, Emergentism and Essentialism. No particular metaphysical view unifies any of our three groupings. For example, not all Externalists incline toward nominalism; numerous Emergentists as well as most Essentialists take linguistics to be about mental phenomena; our Essentialists include Katz’s platonism alongside the Chomskyan ‘I-language’ advocates; and pluralists embrace aspects of all of the above.
Linguists’ conception of the components of the study of language contrasts with philosophers’ conceptions (even those of philosophers of language) in at least three ways. First, linguists are often intensely interested in small details of linguistic form in their own right. Second, linguists take an interest in whole topic areas like the internal structure of phrases, the physics of pronunciation, morphological features such as conjugation classes, lexical information about particular words, and so on—topics in which there is typically little philosophical payoff. And third, linguists are concerned with relations between the different subsystems of languages: the exact way the syntax meshes with the semantics, the relationship between phonological and syntactic facts, and so on.
With regard to form, philosophers broadly follow Morris (1938), a foundational work in semiotics, and to some extent Peirce (see SEP entry: Peirce, semiotics), in thinking of the theory of language as having three main components:
Linguists, by contrast, following both Sapir (1921) and Bloomfield (1933), treat the syntactic component in a more detailed way than Morris or Peirce, and distinguish between at least three kinds of linguistic form: the form of speech sounds (phonology), the form of words (morphology), and the form of sentences. (If syntax is about the form of expressions in general, then each of these would be an element of Morris’s syntax.)
Emergentists in general deny that there is a distinction between semantics and pragmatics—a position that is familiar enough in philosophy: Quine (1987: 211), for instance, holds that “the separation between semantics and pragmatics is a pernicious error.” And generally speaking, those theorists who, like the later Wittgenstein, focus on meaning as use will deny that one can separate semantics from pragmatics. Emergentists such as Paul Hopper & Sandra Thompson agree:
[W]hat is called semantics and what is called pragmatics are an integrated whole. (Hopper and Thompson 1993: 372)
Some Essentialists—notably Chomsky—also deny that semantics can be separated from pragmatics, but unlike the Emergentists (who think that semantics-pragmatics is a starting point for linguistic theory), Chomsky (as we noted briefly in section 1.3) denies that semantics and pragmatics can have any role in linguistics:
It seems that other cognitive systems—in particular, our system of beliefs concerning things in the world and their behavior—play an essential part in our judgments of meaning and reference, in an extremely intricate manner, and it is not at all clear that much will remain if we try to separate the purely linguistic components of what in informal usage or even in technical discussion we call ‘the meaning of [a] linguistic expression.’ (Chomsky 1979: 142)
Regarding the theoretical account of the relation between words or phrases and what speakers take them to refer to, Chomsky says, “I think such theories should be regarded as a variety of syntax” (Chomsky 1992: 223).
Not every Essentialist agrees with Chomsky on this point. Many believe that every theory should incorporate a linguistic component that yields meanings, in much the same way that many philosophers of language believe there to be such a separate component. Often, although not always, this component amounts to a truth-theoretic account of the values of syntactically-characterized sentences. This typically involves a translation of the natural language sentence into some representation that is “intermediate” between natural language and a truth-theory—perhaps an augmented version of first-order logic, or perhaps a higher-order intensional language. The Essentialists who study semantics in such ways usually agree with Chomsky in seeing little role for pragmatics within linguistic theory. But their separation of semantics from pragmatics allows them to accord semantics a legitimacy within linguistics itself, and not just in psychology or sociology.
Such Essentialists, as well as the Emergentists, differ in important ways from classical philosophical logic in their attitudes towards “the syntactic-semantic interface”, however. Philosophers of language and logic who are not also heavily influenced by linguistics tend to move directly—perhaps by means of a “semantic intuition” or perhaps from an intuitive understanding of the truth conditions involved—from a natural language sentence to its “deep, logical” representation. For example, they may move directly from (EX1) to (LF1):
And from there perhaps to a model-theoretic description of its truth-conditions. A linguist, on the other hand, would aim to describe how (EX1) and (LF1) are related. From the point of view of a semantically-inclined Essentialist, the question is: how should the syntactic component of linguistic theory be written so that the semantic value (or, “logical form representation”) can be assigned? From some Emergentist points of view, the question is: how can the semantic properties and communicative function of an expression explain its syntactic properties?
Matters are perhaps less clear with the Externalists—at least with those who identify semantic value with distribution in terms of neighboring words (there is a tradition stemming from the structuralists of equating synonymy with the possibility of substitution in all contexts without affecting acceptability).
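The structuralist substitution test just mentioned can be sketched computationally. In the toy fragment below, the corpus, the one-word context window, and the equivalence criterion are all illustrative assumptions, not any particular theorist’s method; two words count as distributionally equivalent only if they occur in exactly the same (left-neighbor, right-neighbor) contexts.

```python
# Toy sketch of distributional equivalence via substitution contexts.
# Corpus and context definition are illustrative assumptions.

corpus = [
    "the cat sleeps soundly",
    "the dog sleeps soundly",
    "the cat barks loudly",
]

def contexts(word):
    """Collect the (previous, next) word pairs surrounding each occurrence."""
    result = set()
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            if w == word:
                prev = words[i - 1] if i > 0 else "<s>"
                nxt = words[i + 1] if i < len(words) - 1 else "</s>"
                result.add((prev, nxt))
    return result

# 'cat' and 'dog' share the context ('the', 'sleeps'), but 'cat' also occurs
# in ('the', 'barks'), so contexts('cat') != contexts('dog'): the two words
# fail the full-substitutability test even in this tiny corpus.
```

Even here, ‘cat’ and ‘dog’ fail the test, which illustrates how demanding the criterion of substitution in all contexts is as a stand-in for synonymy.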
Matters are in general quite a bit more subtle and tricky than (EX1) might suggest. Philosophers have taken the natural language sentence (EX2) to have two logical forms, (LF2a) and (LF2b):
But for the linguist interested in the syntax-semantics interface, there needs to be some explanation of how (LF2a) and (LF2b) are associated with (EX2). This could be a matter of rules that derive (LF2a) and (LF2b) from the syntactic representation of (EX2), as some semantically-inclined Essentialists would propose, or of explaining the syntactic properties of (EX2) from facts about the meanings represented by (LF2a) and (LF2b), as some Emergentists might want. But that they should be connected up in some way is something that linguists would typically count as non-negotiable.
The strengths and limitations of different data gathering methods began to play an important role in linguistics in the early to mid-20th century. Voegelin and Harris (1951: 323) discuss several methods that had been used to distinguish Amerindian languages and dialects:
They note that the anthropological linguists Boas and Sapir (whom we take to be proto-Emergentists) used the ‘ask the informant’ method of informal elicitation, addressing questions “to the informant’s perception rather than to the data directly” (1951: 324). Bloomfield (the proto-Externalist), on the other hand, worked on Amerindian languages mostly by collecting corpora, with occasional use of monolingual elicitation.
The preferred method of Essentialists today is informal elicitation, including elicitation from oneself. Although the techniques for gathering data about speakers and their language use have changed dramatically over the past 60 or more years, the general strategies have not: data is still gathered by elicitation of metalinguistic judgments, collection of corpus material, or direct psychological testing of speakers’ reactions and behaviors. Different linguists will have different preferences among these techniques, but it is important to understand that data could be gathered in any of the three ways by advocates of any tendency. Essentialists, Emergentists, and Externalists differ as much on how data is interpreted and used as on their views of how it should be gathered.
A wide range of methodological issues about data collection have been raised in linguistics. Since gathering data by direct objective experimental testing of informants is a familiar practice throughout the social, psychological, medical, and biological sciences, we will say little about it here, focusing instead on these five issues about data:
The debate in linguistics over the use of linguistic intuitions (elicited metalinguistic judgments) as data, and how that data should be collected, has resulted in enduring, rancorous, often ideologically tinged disputes over the past 45 years. The disputes are remarkable, if only for their fairly consistently venomous tone.
At their most extreme, many Emergentists and some Externalists cast the debate in terms of whether linguistic intuitions should ever count as evidence for linguistic theorizing. And many Essentialists cast it in terms of whether anything but linguistic intuitions is ever really needed to support linguistic theorizing.
The debate focuses on the Essentialists’ notion of a mental grammar, since linguistic intuitions are generally understood to be a consequence of tacit knowledge of language. Emergentists who deny that speakers have innate domain-specific grammars (competence, I-languages, or FLN) have raised a diverse range of objections to the use of reports of intuitions as linguistic data – though Devitt (2006) offers an account of linguistic intuitions that does not base them on inferred tacit knowledge of competence grammars. The following passages are representative Emergentist critiques of ‘intuitions’ (elicited judgments):
Generative linguists typically respond to calls for evidence for the reality of their theoretical constructs by claiming that no evidence is needed over and above the theory’s ability to account for patterns of grammaticality judgments elicited from native speakers. This response is unsatisfactory on two accounts. First, such judgments are inherently unreliable because of their unavoidable meta-cognitive overtones… Second, the outcome of a judgment (or the analysis of an elicited utterance) is invariably brought to bear on some distinction between variants of the current generative theory, never on its foundational assumptions. (Edelman and Christiansen 2003: 60)
The data that are actually used toward this end in Generative Grammar analyses are almost always disembodied sentences that analysts have made up ad hoc, … rather than utterances produced by real people in real discourse situations… In diametric opposition to these methodological assumptions and choices, cognitive-functional linguists take as their object of study all aspects of natural language understanding and use… They (especially the more functionally oriented analysts) take as an important part of their data not disembodied sentences derived from introspection, but rather utterances or other longer sequences from naturally occurring discourse. (Tomasello 1998: xiii)
[T]he journals are full of papers containing highly questionable data, as readers can verify simply by perusing the examples in nearly any syntax article about a familiar language. (Wasow and Arnold 2005: 1484)
It is a common Emergentist objection that linguistic intuitions (taken to be reports of elicited judgments of the acceptability of expressions, not their grammaticality) are bad data points: not only are they not usage data (they are metalinguistic), but they are also linguists’ judgments about invented example sentences. On neither count would they be clear and direct evidence of language use and human communicative capacities—the subject matter of linguistics on the Emergentist view. A further objection is to their use by theorists to the exclusion of all other kinds of evidence. For example,
[Formal linguistics] continues to insist that its method for gathering data is not only appropriate, but is superior to others. Occasionally a syntactician will acknowledge that no one type of data is privileged, but the actual behavior of people in the field belies this concession. Take a look at any recent article on formal syntax and see whether anything other than the theorist’s judgments constitute the data on which the arguments are based. (Ferreira 2005: 372)
“Formal” is Emergentist shorthand for referring to generative linguistics. And it should be noted that the practice by Essentialists of collapsing various kinds of acceptability judgments under the single label ‘intuitions’ masks important differences. In principle there might be significant differences between the judgments of (i) linguists with a stake in what the evidence shows; (ii) linguists with experience in syntactic theory but no stake in the issue at hand; (iii) non-linguist native speakers who have been tutored in how to provide the kinds of judgments the linguist is interested in; and (iv) linguistically naïve native speakers.
Many Emergentists object to all four kinds of reports of intuitions on the grounds that they are not direct evidence of language use. For example, a common objection is based on the view that
[T]he primary object of study is the language people actually produce and understand. Language in use is the best evidence we have for determining the nature and specific organization of linguistic systems. Thus, an ideal usage-based analysis is one that emerges from observation of such bodies of usage data, called corpora.… Because the linguistic system is so closely tied to usage, it follows that theories of language should be grounded in an observation of data from actual uses of language. (Barlow and Kemmer 2002, Introduction)
But collections of linguists’ reports of their own judgments are also criticized by Emergentists as “arm-chair data collection,” or “data collection by introspection”. All parties tend to call this kind of data collection “informal”—though they all rely on either formally or informally elicited judgments to some degree.
On the other side, Essentialists tend to deny that usage data is adequate evidence by itself:
More than five decades of research in generative linguistics have shown that the standard generative methodology of hypothesis formation and empirical verification via judgment elicitation can lead to a veritable goldmine of linguistic discovery and explanation. In many cases it has yielded good, replicable results, ones that could not as easily have been obtained by using other data-gathering methods, such as corpus-based research… [C]onsider the fact that parasitic gap constructions…are exceedingly rare in corpora…. [T]hese distributional phenomena would have been entirely impossible to distill via any non-introspective, non-elicitation based data gathering method. Corpus data simply cannot yield such a detailed picture of what is licit and, more crucially, what is not licit for a particular construction in a particular linguistic environment. (den Dikken et al. 2007: 336)
And Essentialists often seem to deny that they are guilty of what the Emergentist claims they are guilty of. For example, Chomsky appears to be claiming that acceptability judgments are performance data, i.e. evidence of use:
Acceptability is a concept that belongs to the study of performance, whereas grammaticalness belongs to the study of competence… Like acceptability, grammaticalness is, no doubt, a matter of degree…but the scales of grammaticalness and acceptability do not coincide. Grammaticalness is only one of many factors that interact to determine acceptability. (Chomsky 1965: 11)
Chomsky means to deny that acceptability judgments are direct evidence of linguistic competence. But it does not follow from this that elicited acceptability judgments are direct evidence of language use.
And as for the charge of “arm-chair” collection methods, some Essentialists claim to have shown that such methods are as good as more controlled experimental methods. For example, Sprouse and Almeida report:
[W]e empirically assess this claim by formally testing all 469 (unique, US-English) data points from a popular syntax textbook (Adger 2003) using 440 naïve participants, two judgment tasks (magnitude estimation and yes–no), and three different types of statistical analyses (standard frequentist tests, linear mixed effects models, and Bayes factor analysis). The results suggest that the maximum discrepancy between traditional methods and formal experimental methods is 2%. This suggests that … the minimum replication rate of these 469 data points is 98%. (Sprouse and Almeida 2012: 609, abstract)
This can be read as defending either Essentialists’ consulting of their own intuitions simpliciter, or their self-consultation of intuitions on uncontroversial textbook cases only. The former is much more controversial than the latter.
One might also wonder whether an error rate of 2% really is appropriate for the primary data presented in an elementary textbook. If a geography textbook misidentified 2–3% of the rivers of the continental United States, or gave incorrect locations for them, or incorrectly reported their lengths, it would forfeit our trust. Analogous claims could be made about elementary textbooks in other fields: consider an English literature textbook that misidentified the authors of 2% of the books discussed, or their years of publication.
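The replication-rate arithmetic at issue here is easy to make concrete. The following sketch uses invented toy data (not Sprouse and Almeida’s actual materials or statistical analyses) to show how a set of informally reported judgments can be scored against the majority verdicts of experimental yes–no ratings:

```python
# Illustrative sketch only: estimate the replication rate between informally
# reported textbook judgments and experimental yes-no ratings.
# All item names, verdicts, and responses below are invented for illustration.

from collections import Counter

# Item id -> acceptability verdict claimed informally (e.g., in a textbook)
textbook_judgments = {"item1": True, "item2": False, "item3": True}

# Yes-no responses from hypothetical naive experimental participants
responses = {
    "item1": [True, True, True, False],    # majority "yes"
    "item2": [False, False, True, False],  # majority "no"
    "item3": [False, False, True, False],  # majority "no": fails to replicate
}

def majority(votes):
    """Return the majority verdict from a list of yes/no responses."""
    counts = Counter(votes)
    return counts[True] > counts[False]

def replication_rate(claimed, observed):
    """Fraction of items whose experimental majority matches the claim."""
    matches = sum(majority(observed[item]) == verdict
                  for item, verdict in claimed.items())
    return matches / len(claimed)

print(replication_rate(textbook_judgments, responses))  # 2 of 3 items agree
```

On this toy data the rate is 2/3; Sprouse and Almeida’s reported figure of at least 98% is the analogous fraction computed over 469 items, with proper statistical tests in place of a bare majority count.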
Finally, both parties to the debate engage in ad hominem attacks on their opponents. Here is one example of a classic ad hominem or tu quoque attack on Emergentists in defense of constructed examples by Essentialists:
[The charge made concerning “armchair data collection”] implies that there is something intrinsic to generative grammar that invites partisans of that framework to construct syntactic theories on the evidence of a single person’s judgments. Nothing could be farther from the truth. The great bulk of publications in cognitive and functional linguistics follow the same practice. Of course, rhetorically many of the latter decry the use of linguists’ own intuitions as data. For example, in … an important collections [sic] of papers in cognitive-functional linguistics, … only two contributors to the volume … present segments of natural discourse, neither filling even a page of text. All of the other contributors employ examples constructed by the linguists themselves. It is quite difficult to find any work in cognitive linguistics (and functional linguists are only slightly better) that uses multiple informants. It seems almost disingenuous … to fault generativists for what (for better or worse) is standard practice in the field, regardless of theoretical allegiance. (Newmeyer 2007: 395)
Clearly, the mere fact that some Emergentists may in practice have made use of invented examples in testing their theories does not tell against any cogent general objections they may have offered to such practice. What is needed is a decision on the methodological point, not just a cry of “You did it too!”.
Given each side’s intolerance of the other’s views, and the crosstalk present in these debates, it is tempting to think that Emergentism and Essentialism are fundamentally incompatible on what counts as linguistic data, since their differences are based on their different views of the subject matter of linguistics, and of what the phenomena and goals of linguistic theorizing are. There is no doubt that the opposing sides think that their respective views are incompatible. But this conclusion may well be too hasty. In what follows, we try to point to a way that the dispute could be ameliorated, if not adjudicated.
Essentialists who accept the competence/performance distinction of Chomsky (1965) traditionally emphasize elicited acceptability judgment data (although they need not reject data that is gathered using other methods). But as Cowart notes:
In this view, which exploits the distinction between competence and performance, the act of expressing a judgment of acceptability is a kind of linguistic performance. The grammar that a [generative Essentialist] linguistic theory posits in the head of a speaker does not exercise exhaustive control of judgments… While forming a sentence judgment, a speaker draws on a variety of cognitive resources… The resulting [acceptability] judgments could pattern quite differently than the grammaticality values we might like them to reflect. (Cowart 1997: 7)
The grammaticality of an expression, on the standard generative Essentialist view, is the status conferred on it by the competence state of an ideal speaker. But competence can never be exercised or used without potentially interfering performance factors like memory being exercised as well. This means that judgments about grammaticality are never really directly available to the linguist through informant judgments: they have to be inferred from judgments of acceptability (along with any other relevant evidence). Nevertheless, Essentialists do take acceptability judgments to provide fairly good evidence concerning the character of linguistic competence. In fact the use of informally gathered acceptability judgment data is a hallmark of post-1965 Essentialist practice.
It would be a mistake, however, to suppose that only Essentialists make use of such judgments. Many contemporary Externalists and Emergentists who reject the competence/performance distinction still use informally gathered acceptability judgments in linguistic theorizing, though perhaps not in theory testing. Emergentists tend to interpret experimentally gathered judgment data as performance data reflecting the interactions between learned features of communication systems and general learning mechanisms as deployed in communication. And Externalists use judgment data for corpus cleaning (see below).
It should be noted that sociolinguists and anthropological linguists (and we regard them as tending toward Emergentist views) often informally elicit informant judgments not only about acceptability but also about social and regional style and variation, and meaning. They may ask informants questions like, “Who would typically say that?”, or “What does X mean in context XYZ?”, or “If you can say WXY, can you say WXZ?” (see Labov 1996: 77).
A generative grammar gives a finite specification of a set of expressions. A psychogrammar, to the extent that it corresponds to a generative grammar, might be thought to equip a speaker to know (at least in principle) absolutely whether a string is in the language. However, elicited metalinguistic judgments are uncontroversially a matter of degree. A question arises concerning the scale on which these degrees of acceptability should be measured.
Linguists have implicitly worked with a scale of roughly half a dozen levels and types of acceptability, annotating them with prefixed symbols. The most familiar is the asterisk, originally used simply to mark strings of words as ungrammatical, i.e., as not belonging to the language at all. Other prefixed marks have gradually become current:
| prefix | approximate meaning |
| # | semantically anomalous: unacceptable in virtue of a bizarre meaning |
| % | subject to a ‘dialect’ split: judged grammatical only by some speakers |

Table 1: Prefixes used to mark types of unacceptability
But other annotations have been used to indicate a gradation in the extent to which some sentences are unacceptable. No scientifically validated or explicitly agreed meanings have been associated with these marks, but a tradition has slowly grown up of assigning prefixes such as those in Table 2 to signify degrees of unacceptability:
| prefix | approximate meaning |
| (no prefix) | acceptable and thus presumably grammatical |
| ? | of dubious acceptability, though probably grammatical |
| ?? | clearly unacceptable but possibly grammatical |
| ?* | unacceptable enough to suggest probable ungrammaticality |
| * | unacceptable enough to suggest clear ungrammaticality |
| ** | grossly unacceptable, suggesting extreme ungrammaticality |
Table 2: Prefixes used to mark levels of acceptability
Such markings are often used in a way that suggests an ordinal scale, i.e. a partial ordering that is silent on anything other than equivalence in acceptability or ranking in degree of unacceptability.
By contrast, Bard et al. (1996: 39) point out, it is possible to use interval scales, which additionally measure distance between ordinal positions. Interval scales of acceptability would measure relative distances between strings—how much more or less acceptable one is than another. Magnitude estimation is a method developed in psychophysics to measure subjects’ judgments of physical stimuli on an interval scale. Bard et al. (1996) adapted these methods to linguistic acceptability judgments, arguing that interval scales of measurement are required for testing theoretical claims that rely on subtle judgments of comparative acceptability. An ordinal scale of acceptability can represent one expression as being less acceptable than another, but cannot support quantitative questions about how much less. Many generative Essentialist theorists had been suggesting that violation of different universal principles led to different degrees of unacceptability. According to Bard et al. (34–35), because there may be “disproportion between the fineness of judgments people can make and the symbol set available for recording them” it will not suffice to use some fixed scale such as this one:
? < ?? < ?* < * < **
indicating absolute degrees of unacceptability. Degrees of relative unacceptability must be measured. This is done by asking the informant how much less acceptable one string is than another.
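The difference between such a procedure and a fixed ordinal scale can be sketched in a few lines. In the following illustration the sentences, the modulus value, and all the scores are invented, and real magnitude-estimation designs (following Bard et al. 1996) involve many informants and further transformation of the scores:

```python
# Toy magnitude-estimation scoring. The informant first assigns an arbitrary
# numeric score to a reference sentence (the "modulus"), then scores every
# other string in proportion to it. All numbers here are invented.

modulus_score = 50.0  # informant's free-choice score for the reference string

raw_scores = {  # the informant's proportional scores for two test strings
    "string A": 40.0,
    "string B": 5.0,
}

# Dividing by the modulus puts all judgments on a common relative scale,
# so one can ask *how much* less acceptable one string is than another --
# a question an ordinal scale of prefixes like ? < ?? < * cannot answer.
relative = {s: score / modulus_score for s, score in raw_scores.items()}

for sentence, r in sorted(relative.items(), key=lambda kv: -kv[1]):
    print(f"{r:.2f}  {sentence}")
```

On this toy data, string A comes out eight times as acceptable as string B (0.80 vs. 0.10 relative to the modulus). Whether informants can genuinely produce such ratio judgments is, of course, exactly what Sprouse (2011) questions below.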
Magnitude estimation can be used with both informal and experimental methods of data collection. And data that is measured using interval scales can be subjected to much more mathematically sophisticated tests and analyses than data measured solely by an ordinal scale.
It should be noted that the value of applying magnitude estimation to the judgment of acceptability has been directly challenged in two recent papers. Weskott and Fanselow (2011) and Sprouse (2011) both present critiques of Bard et al. (1996). Weskott and Fanselow compared magnitude estimation data to standard judgments on binary and 7-point scales, and claim that magnitude estimation does not yield more information than other judgment tasks, and moreover can produce spurious variance. And Sprouse, on the basis of recent formalizations of magnitude estimation in the psychophysics literature, presents experimental evidence that participants cannot make ratio judgments of acceptability (for example, a judgment that one sentence is precisely half as acceptable as another), which suggests that the magnitude estimation task probably provides the same interval-level data as other judgment tasks.
Part of the dispute over the reliability of informal methods of acceptability judgment elicitation and collection is between different groups of Essentialists. Experimentally trained psycholinguists advocate using and adapting various experimental methods that have been developed in the cognitive and behavioral sciences to collect acceptability judgments. And while the debate is often cast in terms of which method is absolutely better, a more appropriate question might be when one method is to be preferred to the others. Those inclined toward less experimentally controlled methods point out that there are many clear and uncontroversial acceptability judgments that do not need to be shown to be reliable. Advocates of experimental methods point out that many purportedly clear, uncontroversial judgments have turned out to be unreliable, and led to false empirical generalizations about languages. Both seem to be right in different cases.
Chomsky has frequently stated his view that the experimental data-gathering techniques developed in the behavioral sciences are neither used nor needed in linguistic theorizing. For example:
The gathering of data is informal; there has been little use of experimental approaches (outside of phonetics) or of complex techniques of data collection and data analysis of a sort that can easily be devised, and that are widely used in the behavioral sciences. The arguments in favor of this informal procedure seem to me quite compelling; basically, they turn on the realization that for the theoretical problems that seem most critical today, it is not at all difficult to obtain a mass of crucial data without use of such techniques. Consequently, linguistic work, at what I believe to be its best, lacks many of the features of the behavioral sciences. (Chomsky 1969: 56)
He also expressed the opinion that using experimental behavioral data collection methods in linguistics “would be a waste of time and energy” (1969: 81).
Although many Emergentists—the intellectual heirs of Sapir—would accept ‘ask-the-informant’ data, we might expect them to tend to accept experimental data-gathering methods that have been developed in the social sciences. There is little doubt that strict followers of the methodology preferred by Bloomfield in his later career would disapprove of ‘ask the informant’ methods. Charles Hockett remarked:
A language, as a set of habits, is a fragile thing, subject to minor modification in the slightest breeze of circumstance; this, indeed, is its great source of power. But this is also why the transformationalists (like the rest of us!), using themselves as informants, have such a hard time deciding whether certain candidates for sentencehood are really ‘in their dialect’ or not; and it is why Bloomfield, in his field work, would never elicit paradigms, for fear he would induce his informant to say something under the artificial conditions of talking with an outsider that he would never have said in his own everyday surroundings. (Hockett 1968: 89–90, fn. 31)
We might expect Bloomfield, having abandoned his earlier Wundtian psychological leanings, to be suspicious of any method that could be cast as introspective. And we might expect many contemporary Externalists to prefer more experimentally controlled methods too. (We shall see below that to some extent they do.)
Derwing (1973) was one early critic of Chomsky’s view (1969) that experimentally controlled data collection is useless; but it was nearly 25 years before systematic research into possible confounding variables in acceptability judgment data started being conducted on any significant scale. In the same year that Bard et al. (1996) appeared, Carson Schütze (1996) published a monograph with the following goal statement:
I aim to demonstrate…that grammaticality judgments and other sorts of linguistic intuition, while indispensable forms of data for linguistic theory, require new ways of being collected and used. A great deal is known about the instability and unreliability of judgments, but rather than propose that they be abandoned, I endeavor to explain the source of their shiftiness and how it can be minimized. (1996: 1)
In a similar vein, Wayne Cowart stated that he wanted to “describe a family of practical methods that yield demonstrably reliable data on patterns of sentence acceptability.” He observes that the stability and reliability of acceptability judgment collection is
complicated by the fact that there seems to be no consensus on how to gather judgments apart from a widespread tolerance for informal methods in which the linguist consults her own intuitions and those of the first handy informant (what we might call the “Hey, Sally” method). (Cowart 1997: 2)
Schütze also stresses the importance of using experimental methods developed in cognitive science:
[M]y claim is that none of the variables that confound metalinguistic data are peculiar to judgments about language. Rather they can be shown to operate in some other domain in a similar way. (This is quite similar to Valian’s (1982) claim that the data of more traditional psychological experiments have all the same problems that judgment data have.) (Schütze 1996: 14)
The above can be read as sympathetic to the Essentialist preference for elicited judgments.
Among the findings of Schütze and Cowart about informal judgment collection methods are these:
Although Schütze (1996) and Cowart (1997) are both critical of traditional Essentialist informal elicitation methods, their primary concern is to show how the claims of Essentialist linguistics can be made less vulnerable to legitimate complaints about informal data collection methods. Broadly speaking, they are friends of Essentialism. Critics of Essentialism have raised similar concerns in less friendly terms, but it is important to note that the debate over the reliability of informal methods is a debate within Essentialist linguistics as well.
Informal methods of collecting acceptability judgment data have often been described as excessively casual. Ferreira described the informal method this way:
An example sentence that is predicted to be ungrammatical is contrasted with some other sentence that is supposed to be similar in all relevant ways; these two sentences constitute a “minimal pair”. The author of the article provides the judgment that the sentence hypothesized to be bad is in fact ungrammatical, as indicated by the star annotating the example. But there are serious problems with this methodology. The example that is tested could have idiosyncratic properties due to its unique lexical content. Occasionally a second or third minimal pair is provided, but no attempt is made to consider the range of relevant extraneous variables that must be accounted for and held constant to make sure there isn’t some correlated property that is responsible for the contrast in judgments. Even worse, the “subject” who provides the data is not a naïve informant, but is in fact the theorist himself or herself, and that person has a stake in whether the sentence is judged grammatical or ungrammatical. That is, the person’s theory would be falsified if the prediction were wrong, and this is a potential source of bias. (Ferreira 2005: 372)
(It would be appropriate to read ‘grammatical’ and ‘grammaticality’ in Ferreira’s text as meaning ‘acceptable’ and ‘acceptability’.)
This critical characterization exemplifies the kind of method that Schütze and Cowart aimed to improve on. More recently, Gibson and Fedorenko describe the traditional informal method this way:
As has often been noted in recent years (Cowart, 1997; Edelman & Christiansen, 2003; Featherston, 2007; Ferreira, 2005; Gibson & Fedorenko, 2010a; Marantz, 2005; Myers, 2009; Schütze, 1996; Wasow & Arnold, 2005), the results obtained using this method are not necessarily generalisable because of (a) the small number of experimental participants (typically one); (b) the small number of experimental stimuli (typically one); (c) cognitive biases on the part of the researcher and participants; and (d) the effect of the preceding context (e.g., other constructions the researcher may have been recently considering). (Gibson and Fedorenko, 2013)
While some Essentialists have acknowledged these problems with the reliability of informal methods, others have, in effect, denied their relevance. For example, Colin Phillips (2010) argues that “there is little evidence for the frequent claim that sloppy data-collection practices have harmed the development of linguistic theories”. He admits that not all is epistemologically well in syntactic theory, but adds, “I just don’t think that the problems will be solved by a few rating surveys.” He concludes:
I do not think that we should be fooled into thinking that informal judgment gathering is the root of the problem or that more formalized judgment collection will solve the problems. (Phillips 2010: 61)
To suggest that informal methods are as fully reliable as controlled experimental ones would be a serious charge, implying that researchers like Bard, Robinson, Sorace, Cowart, Schütze, Gibson, Fedorenko, and others have been wasting their time. But Phillips actually seems to be making a different claim. He suggests first that informally gathered data has not actually harmed linguistics, and second that linguists are in danger of being “fooled” by critics who invent stories about unreliable data having harmed linguistics.
The harm that Phillips claims has not occurred relates to the charge that “mainstream linguistics” (he means the current generative Essentialist framework called ‘Minimalism’) is “irrelevant” to broader interests in the cognitive sciences, and has lost “the initiative in language study”. Of course, Phillips is right in a sense: one cannot ensure that experimental judgment collection methods will address every way in which Minimalist theorizing is irrelevant to particular endeavors (language description, language teaching, natural language processing, or broader questions in cognitive psychological research). But this claim does not bear on what Schütze (1996) and Cowart (1997) show about the unreliability of informal methods.
Phillips does not fully accept the view of Chomsky (1969) that experimental methods are useless for data gathering (he says, “I do not mean to argue that comprehensive data gathering studies of acceptability are worthless”). But his defense of informal methods of data collection rests on whether these methods have damaged Essentialist theory testing:
The critiques I have read present no evidence of the supposed damage that informal intuitions have caused, and among those who do provide specific examples it is rare to provide clear evidence of the supposed damage that informal intuitions have caused…
What I am specifically questioning is whether informal (and occasionally careless) gathering of acceptability judgments has actually held back progress in linguistics, and whether more careful gathering of acceptability judgments will provide the key to future progress.
Either Phillips is fronting the surprising opinion that generative theorizing has never been led down the wrong track by demonstrably unreliable data, or he is changing the subject. And unless clear criteria are established for what counts as “damage” and “holding back,” Phillips is not offering any testable hypothesis about data collection methodology. For example, Phillips discounts the observation of Schütze (1996) that conflicting judgments of relative unacceptability of violations of two linguistic universals held back the development of Government and Binding (GB), on the grounds that the two sets of conflicting judgments and their analyses “are now largely forgotten, supplanted by theories that have little to say about such examples.” But the fact that the proposed universals are discarded principles of UG is irrelevant to the effect that unreliable data once had on the (now largely abandoned) GB theory. A methodological concern cannot be dismissed on the basis of a move to a new theory that abandons the old theory but not its methods!
More recently, Bresnan (2007) argues that many theoretical claims have been supported by unreliable, informally gathered syntactic acceptability judgments. She observes:
Erroneous generalizations based on linguistic intuitions about isolated, constructed examples occur throughout all parts of the grammar. They often seriously underestimate the space of grammatical possibility (Taylor 1994, 1996, Bresnan & Nikitina 2003, Fellbaum 2005, Lødrup 2006, among others), reflect relative frequency instead of categorical grammaticality (Labov 1996, Lapata 1999, Manning 2003), overlook complex constraint interactions (Green 1971, Gries 2003) and processing effects (Arnon et al. 2005a, b), and fail to address the problems of investigator bias (Labov 1975, Naro 1980, Chambers 2003: 34) and social intervention (Labov 1996, Milroy 2001, Cornips & Poletto 2005). (Bresnan 2007: 301)
Her discussion supports the view that various highly abstract theoretical hypotheses have been defended through the use of generalizations based on unreliable data.
The debate over the harm that the acceptance of informally collected data has had on theory testing is somewhat difficult to understand for Essentialist, Externalist, and Emergentist researchers who have been trained in the methods of the cognitive and behavioral sciences. Why try to support one’s theories of universal grammar, or of the grammars of particular languages, by using data of questionable reliability?
One clue might be found in Culicover and Jackendoff (2010), who write:
[T]heoreticians’ subjective judgments are essential in formulating linguistic theories. It would cripple linguistic investigation if it were required that all judgments of ambiguity and grammaticality be subject to statistically rigorous experiments on naive subjects. (Culicover and Jackendoff 2010)
The worry is that use of experimental methods is so resource consumptive that it would impede the formulation of linguistic theories. But this changes the subject from the importance of using reliable data as evidence in theory testing to using only experimentally gathered data in theory formulation. We are not aware of anyone who has ever suggested that at the stage of hypothesis development or theory formulation the linguist should eschew intuition. Certainly Bard et al., Schütze, Cowart, Gibson & Fedorenko, and Ferreira say no such thing. The relevant issue concerns what data should be used to test theories, which is a very different matter.
We noted earlier that there are clear and uncontroversial acceptability judgments, and that these judgments are reliable data. The difficulty lies in distinguishing the clear, uncontroversial, and reliable data from what only appears to be clear, uncontroversial, and reliable to a research community at a time. William Labov, the founder of modern quantitative sociolinguistics, who takes an Emergentist approach, proposed in Labov (1975) a set of working methodological principles for adjudicating when experimental methods should be employed.
The Consensus Principle: If there is no reason to think otherwise, assume that the judgments of any native speaker are characteristic of all speakers.
The Experimenter Principle: If there is any disagreement on introspective judgments, the judgments of those who are familiar with the theoretical issues may not be counted as evidence.
The Clear Case Principle: Disputed judgments should be shown to include at least one consistent pattern in the speech community or be abandoned. If differing judgments are said to represent different dialects, enough investigation of each dialect should be carried out to show that each judgment is a clear case in that dialect. (Labov 1975, quoted in Schütze 1996: 200)
If we accept that ‘introspective judgments’ are acceptability judgments, then Labov’s rules of thumb are guides for when to deploy experimental methods, although they no doubt need refinement. It seems very likely that careful development of such methodological rules of thumb can serve to improve the reliability of linguistic data and to adjudicate methodological disputes of this kind, which seem largely independent of any particular approach to linguistics.
In linguistics, the goal of collecting corpus data is to identify and organize a representative sample of a written and/or spoken variety from which characteristics of the entire variety or genre can be induced. Concordances of word usage in linguistic context have long been used to aid in the translation and interpretation of literary and sacred texts of particular authors (e.g., Plato, Aristotle, Aquinas) and of particular texts (e.g., the Torah, the rest of the Old Testament, the Gospels, the Epistles). Formal textual criticism, the identification of antecedently existing oral traditions that were later redacted into Biblical texts, and author identification (e.g., figuring out which of the Epistles were written by Paul and which were probably not) began to develop in the late 19th century.
Computational methods for collecting, analyzing, and searching corpora have seen rapid development as computer memory has become less expensive and search and analysis programs have become faster. The first computer-searchable corpus of American English, the Brown Corpus, developed in the 1960s, contained just over one million word tokens. The British National Corpus (BNC) is a balanced corpus containing over 100 million words—a hundredfold size increase—of which 90% is written prose published from 1991 to 1994 and 10% is spoken English. Between 2005 and 2007, billion-word corpora were released for British English (ukWaC), German (deWaC), and Italian (itWaC)—a thousand times bigger than the Brown corpus. And the entire World Wide Web probably holds about a thousand times as much as that—around a trillion words. Thus corpus linguistics has gone from megabytes of data (∼10³ kB) to terabytes of data (∼10⁹ kB) in fifty years.
Just as a central issue concerning acceptability judgment data concerns its reliability as evidence for empirical generalizations about languages or idiolects, a central question concerning the collection of corpus data concerns whether or not it is representative of the language variety it purports to represent. Some linguists make the criterion of representativeness definitional: they call a collection of samples of language use a corpus only if it has been carefully balanced between different genres (conversation, informal writing, journalism, literature, etc.), regional varieties, or whatever.
But corpora are of many different kinds. Some are just very large compilations of text from individual sources such as newspapers of record or the World Wide Web—compilations large enough for the diversity in the source to act as a surrogate for representativeness. For example, a billion words of a newspaper, despite coming from a single source, will include not only journalists’ news reports and prepared editorials but also quoted speech, political rhetoric, humor columns, light features, theater and film reviews, readers’ letters, fiction items, and so on, and will thus provide examples of a much wider variety of styles than one might have thought.
Corpora are cleaned up through automatic or manual removal of such elements as numerical tables, typographical slips, spelling mistakes, markup tags, accidental repetitions (the the), larger-scale duplications (e.g., copies on mirror sites), boilerplate text (Opinions expressed in this email do not necessarily reflect…), and so on (see Baroni et al. 2009 for a fuller discussion of corpus cleaning).
The entire web itself can be used as a corpus to some degree, despite its constantly changing content, its multilinguality, its many tables and images, and its total lack of quality control; but when it is, the outputs of searches are nearly always cleaned by disregarding unwanted results. For example, Google searches are blind to punctuation, capitalization, and sentence boundaries, so search results for to be will unfortunately include irrelevant cases, such as where a sentence like Do you want to? happens to be followed by a sentence like Be careful.
Corpora can be annotated in ways that permit certain kinds of analysis and grammar testing. One basic kind of annotation is part-of-speech tagging, in which each word is labeled with its syntactic category. Another is lemmatization, which classifies the different morphologically inflected forms of a word as belonging together (goes, gone, going, and went belong with go, for example). A more thoroughgoing kind of annotation involves adding markup that encodes trees representing their structure; an example like That road leads to the freeway might be marked up as a Clause within which the first two words make up a Noun Phrase (NP), the last four constitute a Verb Phrase (VP), and so on, giving a structural analysis represented thus:
[Tree diagram: the structural analysis of That road leads to the freeway]
Such a diagram is isomorphic to (and the one shown was computed directly from) a labeled bracketing like this:
(Clause (NP (D ‘that’) (N ‘road’)) (VP (V ‘leads’) (PP (P ‘to’) (NP (D ‘the’) (N ‘freeway’)))))
and this in turn could be represented in a markup language like XML as:
<clause>
  <nounphrase>
    <determiner>that</determiner>
    <noun>road</noun>
  </nounphrase>
  <verbphrase>
    <verb>leads</verb>
    <prepphrase>
      <prep>to</prep>
      <nounphrase>
        <determiner>the</determiner>
        <noun>freeway</noun>
      </nounphrase>
    </prepphrase>
  </verbphrase>
</clause>
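Such representations are straightforward to manipulate programmatically. The following is a minimal illustrative sketch (our own, not drawn from any linguistics toolkit) of how a labeled bracketing of this kind, written without the quotation marks around words, can be parsed into a nested data structure; the function name and the (label, children) tuple representation are assumptions made for the example:

```python
def parse_bracketing(s):
    """Parse a labeled bracketing such as
    (Clause (NP (D that) (N road)) ...) into nested (label, children)
    tuples, where leaf words are plain strings."""
    # Tokenize: make parentheses separate tokens, then split on whitespace.
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]          # category label, e.g., Clause, NP, D
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())   # a sub-constituent
            else:
                children.append(tokens[pos])    # a leaf word
                pos += 1
        pos += 1                     # consume the closing ")"
        return (label, children)

    return parse_node()

tree = parse_bracketing(
    "(Clause (NP (D that) (N road)) "
    "(VP (V leads) (PP (P to) (NP (D the) (N freeway)))))")
```

Each non-terminal becomes a (label, children) pair, so tree[0] is 'Clause' and the first child is the NP spanning that road; the XML above encodes exactly the same nesting.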
A corpus annotated with tree structure is known as a treebank. Clearly, such a corpus is not a raw record of attested utterances at all; it is a combination of a collection of attested utterances together with a systematic attempt at analysing their structure. Whether the analysis is added manually or semi-automatically, it is ultimately based on native speaker judgments. (Treebanks are often developed by graduate student annotators tutored by computational linguists; naturally, consistency between annotators is an issue that needs regular attention. See Artstein and Poesio 2008 for discussion of the methodological issues.)
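Consistency between annotators is standardly quantified with chance-corrected agreement coefficients of the kind Artstein and Poesio survey. As a hedged sketch, here is one such coefficient, Cohen’s kappa, computed in plain Python; the part-of-speech labels and the two annotation sequences are invented for illustration:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e the agreement expected by chance
    given each annotator's own label frequencies."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical annotators tagging the same six words:
ann1 = ["D", "N", "V", "P", "D", "N"]
ann2 = ["D", "N", "V", "D", "D", "N"]
kappa = cohens_kappa(ann1, ann2)
```

Values near 1 indicate agreement well above chance; values near 0 indicate agreement at roughly the level chance alone would produce.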
One of the purposes of a treebank is to permit the further investigation of a language and the checking of further linguistic hypotheses by searching a large database of previously established analyses. It can also be used to test grammars, natural language processing systems, or machine learning programs.
Going beyond syntactic parse trees, it is possible to annotate corpora further, with information of a semantic and pragmatic nature. There is ongoing computational linguistic research aimed at discovering whether, for example, semantic annotation that is semi-automatically added might suffice for recognition of whether a product review is positive or negative (what computational linguists call ‘sentiment analysis’).
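By way of illustration only, the crudest baseline for sentiment analysis needs no semantic annotation at all: count hits against hand-made positive and negative word lists. The lexicons and example reviews below are invented, and the approach is far weaker than the semi-automatic semantic annotation just described:

```python
# Toy lexicon-based sentiment classifier; the word lists are invented
# for illustration and are not from any published sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "reliable", "love"}
NEGATIVE = {"bad", "poor", "terrible", "broken", "hate"}

def sentiment(review):
    # Lower-case, strip common punctuation, and count lexicon hits.
    words = [w.strip(".,!?;:") for w in review.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "unknown"

verdict1 = sentiment("This blender is great and very reliable")
verdict2 = sentiment("Arrived broken, and the manual is terrible")
```

Even this crude counting gets easy cases right while failing on negation, irony, and context, which is precisely why richer semantic annotation is an active research topic.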
Notice, then, that using corpus data does not mean abandoning or escaping from the use of intuitions about acceptability or grammatical structure: the results of a corpus search are generally filtered through the judgments of an investigator who decides which pieces of corpus data are to be taken at face value and which are just bad hits or irrelevant noise.
Difficult methodological issues arise in connection with the collection, annotation, and use of corpus data. For example, there is the issue of extremely rare expression tokens. Are they accurately recorded tokens of expression types that turn up only in consequence of sporadic errors and should be dismissed as irrelevant unless the topic of interest is performance errors? Are they due to errors in the compilation of the corpus itself, corresponding to neither accepted usage nor sporadic speech errors? Or are they perfectly grammatical but (for some extraneous reason) very rare, at least in that particular corpus?
Many questions arise about what kind of corpus is best suited to the research questions under consideration, as well as what kind of annotation is most appropriate. For example, as Ferreira (2005: 375) points out, some large corpora, insofar as they have not been cleaned of speech errors, provide relevant data for studying the distribution of speech disfluencies. In addition, probabilistic information about the relation between a particular verb and its arguments has been used to show that “verb-argument preferences [are] an essential part of the process of sentence interpretation” (Roland and Jurafsky 2002: 325): acceptability judgments on individual expressions do not provide information about the distribution of a verb and its arguments in various kinds of speech and writing. Studying conveyed meaning in context and identification of speech acts will require a kind of data that decontextualized acceptability judgments do not provide but semantically annotated corpora might.
Many Essentialists have been skeptical of the reliability of uncleaned, unanalyzed corpus data as evidence to support linguistic theorizing, because it is assumed to be replete with strings that any native speaker would judge unacceptable. And many Emergentists and Externalists, as well as some Essentialists, have charged that informally gathered acceptability judgments can be highly unreliable too. Both worries are apposite; but the former does not hold for adequately cleaned and analyzed corpora, and the latter does not hold for judgment data that has been gathered using appropriately controlled methods. In certain contested cases of acceptability, it will of course be important to use both corpus and controlled elicitation methods to cross-compare.
Notice that we have not in any way suggested that our three broad approaches to linguistics should differ in the kinds of data they use for theory testing: Essentialists are not limited to informal elicitation; nor are Emergentists and Externalists denied access to it. In matters of methodology, at least, there is in principle an open market—even if many linguists seem to think otherwise.
The three approaches to linguistic theorizing have at least something to say about how languages are acquired, or could in principle be acquired. Language acquisition has had a much higher profile since generative Essentialist work of the 1970s and 1980s gave it a central place on the agenda for linguistic theory.
Research into language acquisition falls squarely within the psychology of language; see the entry on language and innateness. In this section we do not aim to deal in detail with any of the voluminous literature on psychological or computational experiments bearing on language acquisition, or with any of the empirical study of language acquisition by developmental linguists, or the ‘stimulus poverty’ argument for the existence of innate knowledge about linguistic structure (Pullum and Scholz 2002). Our goals are merely to define the issue of linguistic nativism, set it in context, and draw morals for our three approaches from some of the mathematical work on inductive language learning.
The reader with prior acquaintance with the literature of linguistics will notice that we have not made reference to any partitioning of linguists into two camps called ‘empiricists’ and ‘rationalists’ (see e.g. Matthews 1984; Cowie 1999). We draw a different distinction relating to the psychological and biological prerequisites for first language acquisition. It divides nearly all Emergentists and Externalists from most Essentialists. It has often been confused with the classical empiricist/rationalist issue.
General nativists maintain that the prerequisites for language acquisition are just general cognitive abilities and resources. Linguistic nativists, by contrast, claim that human infants have access to at least some specifically linguistic information that is not learned from linguistic experience. Table 3 briefly sketches the differences between the two views.
| general nativists | linguistic nativists |
| Languages are acquired mainly through the exercise of defeasible inductive methods, based on experience of linguistic communication | Language cannot be acquired by defeasible inductive methods; its structural principles must to a very large degree be unlearned |
| The unlearned capacities that underpin language acquisition constitute a uniquely human complex of non-linguistic dispositions and mechanisms that also subserve other cognitive functions | In addition to various broadly language-relevant cognitive and perceptual capacities, language acquisition draws on an unlearned system of ‘universal grammar’ that constrains language form |
| Various non-human animal species may well have most or all of the capacities that humans use for language acquisition—though no non-human species seems to have the whole package, so interspecies differences are a matter of degree | There is a special component of the human mind which has the development of language as its key function, and no non-human species has anything of the sort, so there is a difference in kind between the abilities of humans and other animals |
Table 3: General and linguistic nativism contrasted
There does not really seem to be anyone who is a complete non-nativist: nobody really thinks that a creature with no unlearned capacities at all could acquire a language. That was the point of the much-quoted remark by Quine (1972: 95–96) about how “the behaviorist is knowingly and cheerfully up to his neck in innate mechanisms of learning-readiness”. Geoffrey Sampson (2001, 2005) is about as extreme an opponent of linguistic nativism as one can find, but even he would not take the failure of language acquisition in his cat to be unrelated to the cognitive and physical capabilities of cats.
The issue on which empirical research can and should be done is whether some of the unlearned prerequisites that humans enjoy have specifically linguistic content. For a philosophically oriented discussion of the matter, see chapters 4–6 of Stainton (2006). For extensive debate about “the argument from poverty of the stimulus”, see Pullum and Scholz (2002) together with the six critiques published in the same issue of The Linguistic Review and the responses to those critiques by Scholz and Pullum (2002).
Linguists have given considerable attention to considerations of in-principle learnability—not so much the course of language acquisition as tracked empirically (the work of developmental psycholinguists) but the question of how languages of the human sort could possibly be learned by any kind of learner. The topic was placed squarely on the agenda by Chomsky (1965); and a hugely influential mathematical linguistics paper by Gold (1967) has dominated much of the subsequent discussion.
Gold began by considering a reformulation of the standard philosophical problem of induction. The trouble with the question ‘Which hypothesis is correct given the totality of the data?’ is of course the one that Hume saw: if the domain is unbounded, no finite amount of data can answer the question. Any finite body of evidence will be consistent with arbitrarily many hypotheses that are not consistent with each other. But Gold proposed replacing the question with a very different one: Which tentative hypothesis is the one to pick, given the data provided so far, assuming a finite number of wrong guesses can be forgiven?
Gold assumed that the hypotheses, in the case of language learning, were generative grammars (or alternatively parsers; he proves results concerning both, but for brevity we follow most of the literature and neglect the very similar results on parsers). The learner’s task is conceived of as responding to an unending input data stream (ultimately complete, in that every expression eventually turns up) by enunciating a sequence of guesses at grammars.
Although Gold talks in developmental psycholinguistic terms about language learners learning grammars by trial and error, his extremely abstract proofs actually make no reference to the linguistic content of languages or grammars at all. The set of all finite grammars formulable in any given metalanguage is computably enumerable, so grammars can be systematically numbered. Inputs—grammatical expressions from the target language—can also be numerically encoded. We end up being concerned simply with the existence or non-existence of certain functions from natural number sequences to natural numbers.
A successful learner is one who uses a procedure that is guaranteed to eventually hit on a correct grammar. For single languages, this is trivial: if the target language is L and it is generated by a grammar G, then the procedure “Always guess G” does the job, and every language is learnable. What makes the problem interesting is applying it to classes of grammars. A successful learner for a class C is one who uses a procedure that is guaranteed to succeed no matter what grammar from C is the target and no matter what the data stream is like (as long as it is complete and contains no ungrammatical examples).
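To make the framework concrete, here is a small sketch (our own illustration, not Gold’s notation) of a learner that identifies in the limit, from text, any language in the class of all finite languages over a vocabulary: it simply conjectures the set of strings seen so far.

```python
def finite_class_learner(text):
    """Identification in the limit from text for the class of all
    *finite* languages: after each presentation, conjecture exactly
    the set of strings observed so far. Once every member of a finite
    target language has appeared, the guess is correct and never
    changes again."""
    seen = set()
    guesses = []
    for expression in text:
        seen.add(expression)
        guesses.append(frozenset(seen))
    return guesses

# A 'text' for the finite target language {"a", "ab", "abb"}:
# complete (every member eventually appears), with repetitions allowed.
target = {"a", "ab", "abb"}
text = ["a", "ab", "a", "abb", "ab", "a", "abb"]
guesses = finite_class_learner(text)
```

After the fourth presentation the learner’s conjecture equals the target and is never revised again, which is exactly what identification in the limit requires. The same strategy fails as soon as a single infinite language is added to the class, which is where Gold’s negative results begin.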
Gold’s work has interesting similarities with earlier philosophical work on inductive learning by Hilary Putnam (1963; it is not clear whether Gold was aware of this paper). Putnam gave an informal proof of a sort of incompleteness theorem for inductive regularity-learning devices: no matter what algorithm is used in a machine for inducing regularities from experience, and thus becoming able to predict events, there will always be some possible environmental regularities that will defeat it. (As a simple example, imagine an environment giving an unbroken sequence of presentations all having some property a. If there is a positive integer n such that after n presentations the machine will predict that presentation number n + 1 will also have property a, then the machine will be defeated by an environment consisting of n presentations of a followed by one with the incompatible property b—the future need not always resemble the past. But if on the other hand there is no such n, then an environment consisting of an unending sequence of a presentations will defeat it.)
Gold’s theorems are founded on certain specific idealizing assumptions about the language learning situation, some of which are intuitively very generous to the learner. The main ones are these:
The most celebrated of the theorems Gold proved (using some reasoning remarkably similar to that of Putnam 1963) showed that a language learner could be similarly hostage to malign environments. Imagine a learner being exposed to an endless and ultimately exhaustive sequence of presented expressions from some target language—Gold calls such a sequence a ‘text’. Suppose the learner does not know in advance whether the language is infinite, or is one of the infinitely many finite languages over the vocabulary V. Gold reasons roughly thus:
Leaping too soon to the conclusion that the target language is infinite will be disastrous, because there will be no way to retrench: no presented examples from a finite language Lk will ever conflict with the hypothesis that the target is some infinite superset of Lk.
The relevance of all this to the philosophy of linguistics is that the theorem just sketched has been interpreted by many linguists, psycholinguists, and philosophers as showing that humans could not learn languages by inductive inference based on examples of language use, because all of the well-known families of languages defined by different types of generative grammar have the crucial property of allowing grammars for every finite language and for at least some infinite supersets of them. But Gold’s paper has often been over-interpreted. A few examples of the resultant mistakes follow.
It’s not about underdetermination. Gold’s negative results are sometimes wrongly taken to be an unsurprising reflection of the underdetermination of theories by finite bodies of evidence (Hauser et al. 2002 seem to make this erroneous equation on p. 1577; so do Fodor and Crowther 2002, implicitly—see Scholz and Pullum 2002, 204–206). But the failure of text-identifiability for certain classes of languages is different from underdetermination in a very important way, because there are infinite classes of infinite languages that are identifiable from text. The first chapter of Jain et al. (1999) discusses an illustrative example (basically, it is the class containing, for all n > 0, the set of all strings with length greater than n). There are infinitely many others. For example, Shinohara (1990) showed that for any positive integer n the class of all languages generated by a context-sensitive grammar with not more than n rules is learnable from text.
It’s not about stimulus poverty. It has also sometimes been assumed that Gold is giving some kind of argument from poverty of the stimulus (there are signs of this in Cowie 1999, 194ff; Hauser et al. 2002, 1577; and Prinz 2002, 210). This is very clearly a mistake (as both Laurence and Margolis 2001 and Matthews 2007 note): in Gold’s text-learning scenario there is no stimulus poverty at all. Every expression in the language eventually turns up in the learner’s input.
It’s not all bad news. It is sometimes forgotten that Gold established a number of optimistic results as well as the pessimistic one about learning from text. Given what he called an ‘informant’ environment rather than a text environment, we see strikingly different results. An informant environment is an infinite sequence of presentations sorted into two lists, positive instances (expressions belonging to the target language) and negative instances (not in the language). Almost all major language-theoretic classes are identifiable in the limit from an informant environment (up to and including the class of all languages with a primitive recursive characteristic function, which comes close to covering any language that could conceivably be of linguistic interest), and all computably enumerable languages become learnable if texts are allowed to be sequenced in particular ways (see the results in Gold 1967 on ‘anomalous text’).
Gold did not give a necessary condition for a class to be identifiable in the limit from text, but Angluin (1980) later provided one (in a result almost but not quite obtained by Wexler and Hamburger 1973). Angluin showed that a class C is text-identifiable iff every language L in C has a finite “telltale” subset T such that if T is also a subset of some other language in C, that other language is not a proper subset of L. This condition precludes guessing too large a language. Once all the members of the telltale subset for L have been received as input, the learner can safely make L the current conjecture. The language to be identified must be either L or (if subsequent inputs include new sentences not in L) some larger language, but it cannot be a proper subset of L.
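For small finite classes of finite languages, Angluin’s condition can be checked by brute force. The sketch below (our own illustration) searches every subset of L for a telltale; with the three-language chain class shown, each language has one, so that class is text-identifiable. In Angluin’s theorem the languages may be infinite and the telltale must be finite; in this toy setting everything is finite, so the check is easy to run but only illustrative:

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of a finite set, as frozensets (the powerset)."""
    s = list(s)
    return (frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))

def has_telltale(L, C):
    """Does L have a 'telltale' subset T within class C?
    T must satisfy: no other language in C that includes T is a
    proper subset of L."""
    for T in subsets(L):
        if not any(T <= Lp and Lp < L for Lp in C if Lp != L):
            return True
    return False

# A chain class: {a} ⊂ {a,b} ⊂ {a,b,c}. The 'newest' element of each
# language serves as its telltale, so every language passes the test.
C = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "b", "c"})]
results = [has_telltale(L, C) for L in C]
```

For {a, b}, for instance, the singleton {b} works as a telltale: it is not contained in {a}, the only proper subset of {a, b} in the class.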
Johnson (2004) provides a useful review of several other misconceptions about Gold’s work; e.g., the notion that it might be the absence of semantics from the input that makes identification from text impossible (this is not the case).
Some generative Essentialists see a kind of paradox in Gold’s results—a reductio on one or more of the assumptions he makes about in-principle learnability. To put it very crudely, learning generative grammars from presented grammatical examples seems to have been proved impossible, yet children do learn their first languages, which for generative Essentialists means they internalize generative psychogrammars, and it is claimed to be an empirical fact that they get almost no explicit evidence about what is not in the language (Brown and Hanlon 1970 is invariably cited to support this). Contradiction. Gold himself suggested three escape routes from the apparent paradox:
All three of these paths have been subsequently explored. Path (1) appealed to generative Essentialists. Chomsky (1981) suggested an extreme restriction: that universal grammar permitted only finitely many grammars. This claim (for which Chomsky had little basis: see Pullum 1983) would immediately guarantee that not all finite languages are humanly learnable (there are infinitely many finite languages, so for most of them there would be no permissible grammar). Osherson and Weinstein (1984) even proved that under three fairly plausible assumptions about the conditions on learning, finiteness of the class of languages is necessary—that is, a class must be finite if it is to be identifiable from text. However, they also proved that this is not sufficient: there are very small finite classes of languages that are not identifiable from text, so it is logically possible for text-identification to be impossible even given only a finite number of languages (grammars). These two results show that Chomsky’s approach cannot be the whole answer.
Path (2) proposes investigation of children’s input with an eye to finding covert sources of negative evidence. Various psycholinguists have pursued this idea; see the entry on language and innateness in this encyclopedia, and (to cite one example) the results of Chouinard and Clark (2003) on hitherto unnoticed sources of negative evidence in the infant’s linguistic environment, such as parental corrections.
Path (3) suggests investigating the nature of children’s linguistic environments more generally. Making evidence available to the learner in some fixed order can certainly alter the picture quite radically (Gold proved that if some primitive-recursive generator controls the text it can in effect encode the identity of the target language so that all computably enumerable languages become identifiable from text). It is possible in principle that limitations on texts (or on learners’ uptake) might have positive rather than negative effects on learnability (see Newport 1988; Elman 1993; Rohde and Plaut 1999; and the entry on language and innateness).
Gold’s suggested strategy of restricting the pre-set class of grammars has been interpreted by some as a defense of rationalist rather than empiricist theories of language acquisition. For example, Wexler and Culicover state:
Empiricist theory allows for a class of sensory or peripheral processing mechanisms by means of which the organism receives data. In addition, the organism possesses some set of inductive principles or learning mechanisms… Rationalist theory also assumes that a learner has sensory mechanisms and inductive principles. But rationalist theory assumes that in addition the learner possesses a rich set of principles concerning the general nature of the ability that is to be learned. (Wexler and Culicover 1980: 5)
Wexler and Culicover claim that ‘empiricist’ learning mechanisms are both weak and general: not only are they ‘not related to the learning of any particular subject matter or cognitive ability’ but they are not ‘limited to any particular species’. It is of course not surprising that empiricist learning fails if it is defined in a way that precludes drawing a distinction between the cognitive abilities of humans and fruit flies.
Equating Gold’s idea of restricting the class of grammars with the idea of a ‘rationalist’ knowledge acquisition theory, Wexler and Culicover try to draw out the consequences of Gold’s paradigm for the Essentialist linguistic theory of Chomsky (1965). They show how a very tightly restricted class of transformational grammars could be regarded as text-identifiable under extremely strong assumptions (e.g., that all languages have the same innately known deep structures).
Matthews (1984) follows Wexler and Culicover’s lead and draws a more philosophically oriented moral:
The significance of Gold’s result becomes apparent if one considers that (i) empiricists assume that there are no constraints on the class of possible languages (besides perhaps that natural languages be recursively enumerable), and (ii) the learner employs a maximally powerful learning strategy—there are no strategies that could accomplish what that assumed by Gold cannot. These two facts, given Gold’s unsolvability result for text data, effectively dispose of the empiricist claim that there exists a ‘discovery procedure’. (1989: 60)
The actual relation of Gold’s results to the empiricism/rationalism controversy seems to us rather different. Gold’s paradigm looks a lot more like a formalization of so-called ‘rationalism’. The fixed class of candidate hypotheses (grammars) corresponds to what is given by universal grammar—the innate definition of the essential properties of language. What Gold actually shows, therefore, is not “the plausibility of rationalism” but rather the inadequacy of a huge range of rationalist theories: under a wide range of different choices of universal grammar, language acquisition appears to remain impossible.
Moreover, Matthews ignores (as most linguists have) the existence of large and interesting classes of languages that are text-identifiable.
Gold’s result, like Putnam’s earlier one, does show that a certain kind of trial-and-error inductive learning is insufficient to permit learning of arbitrary environmental regularities. There has to be some kind of initial bias in the learning procedure or in the data. But ‘empiricism’, the supposed opponent of ‘rationalism’, is not to be equated with a denial of the existence of learning biases. No one doubts that humans have inductive biases. To quote Quine again, “Innate biases and dispositions are the cornerstone of behaviorism, and have been studied by behaviorists” (1972: 95–96). As Lappin and Shieber (2007) stress, there cannot be such a thing as a learning procedure (or processing mechanism) with no biases at all.
The biases posited in Emergentist theories of language acquisition are found, at least in part, in the non-linguistic social and cognitive bases of human communication. And the biases of Externalist approaches to language acquisition are to be found in the distributional and stochastic structure of the learning input and the multitude of mechanisms that process that input and their interactions. All contemporary approaches to language acquisition have acknowledged Gold’s results, but those results do not by themselves vindicate any one of our three approaches to the study of language.
Gold’s explicit equation of acquiring a language withidentifying a generative grammar that exactly generates it naturallymakes his work seem relevant to generative Essentialists (though evenfor them, his results do not provide anything like a sufficient reasonfor adopting the linguistic nativist position). But another keyassumption, that nothing about the statistical structure of the inputplays a role in the acquisition process, is being questioned byincreasing numbers of Externalists, many of whom have used Bayesianmodeling to show that the absence of positive evidence can function asa powerful source of (indirect) negative evidence: learning can bedriven by what is not found as well as by what is (see e.g. Foraker etal. (2009)).
Most Emergentists simply reject the assumption that what is learned isa generative grammar. They see the acquisition task as a matter oflearning the details of an array of constructions (roughly,meaning-bearing ways of structurally composing words or phrases) andhow to use them to communicate. How such learning is accomplishedneeds a great deal of further study, but Gold’s paper did notshow it to be impossible.
Over the past three decades a large amount of work has been done ontopics to which the term ‘language evolution’ is attached,but there are in fact four distinct such topics:
Emergentists tend to regard any of the topics (a)–(d) as potentially relevant to the study of language evolution. Essentialists tend to focus solely on (c). Some Essentialists even deny that (a) and (b) have any relevance to the study of (c); for example:
There is nothing useful to be said about behavior or thought at the level of abstraction at which animal and human communication fall together… [H]uman language, it appears, is based on entirely different principles. This, I think, is an important point, often overlooked by those who approach language as a natural, biological phenomenon; in particular, it seems rather pointless, for these reasons, to speculate about the evolution of human language from simpler systems… (Chomsky 1968: 62)
Other generative Essentialists, like Pinker and Bloom (1990) and Pinker and Jackendoff (2005), seem open to the view that even the most elemental aspects of topic (b) can be directly relevant to the study of (c). This division among Essentialists reflects a division among their views about the role of adaptive explanations in the emergence of (b) and especially (c). For example:
We know very little about what happens when 10¹⁰ neurons are crammed into something the size of a basketball, with further conditions imposed by the specific manner in which this system developed over time. It would be a serious error to suppose that all properties, or the interesting properties of the structures that evolved, can be ‘explained’ in terms of ‘natural selection’. (Chomsky 1975: 59, quoted by Newmeyer 1998 and Jackendoff 2002)
The view expressed here that all (or even most) interesting properties of the language faculty are not adaptations conflicts with the basic explanatory strategy of evolutionary psychology found in the neo-Darwinian Essentialist views of Pinker and Bloom. Piattelli-Palmarini (1989), following Chomsky, adopts a fairly standard Bauplan critique of adaptationism. On this view the language faculty did not originate as an adaptation, but more plausibly “may have originally arisen for some purely architectural or structural reason (perhaps overall brain size, or the sheer duplication of pre-existing modules), or as a by product of the evolutionary pressures” (p. 19), i.e., it is a kind of Gouldian spandrel.
More recently, some Essentialist-leaning authors have rejected the view that no analogies and homologies between animal and human communication are relevant to the study of language. For example, in the context of commenting on Hauser et al. (2002), Tecumseh Fitch (2010) claims that “Although Language, writ large, is unique to our species, many (probably most) of the mechanisms involved in language have analogues or homologues in other animals.” However, the view that the investigation of animal communication can shed light on human language is still firmly rejected by some. For example, Bickerton (2007: 512) asserts that “nothing resembling human language could have developed from prior animal call systems.”
Bickerton offers the following simple argument for his view:
If any adaptation is unique to a species, the selective pressure that drove it must also be unique to that species; otherwise the adaptation would have appeared elsewhere, at least in rudimentary form. (2007: 514)
Thus, the mere fact that language is unique to humans is sufficient to rule out monkey and primate call systems as preadaptations for language. But, contra Bickerton, a neo-Darwinian like Jackendoff (2002) appeals to the work of Dunbar (1998), Power (1998), and Worden (1998) to provide a selectionist story which assumes that cooperation in hunting, defense (Pinker and Bloom 1990), and “‘social grooming’ or deception” were selective forces that operated on human ancestors to drive the increases in expressive power that distinguish human linguistic capacities and systems from non-human communication. Bickerton (2014), however, combines aspects of Essentialism, Emergentism, and Externalism, blending Minimalism, primatology, and cultural evolution into a more holistic account. He specifically tailors a niche construction theory to explain the emergence of displaced, discrete symbolization in a particular kind of primate, namely human beings. He thus allows for (a) and (b) to figure in an explanation of (c). This is something of a departure from his earlier positions.
Within the general Essentialist camp, language evolution has taken center stage since the inception of the Minimalist Program. An explanation of the evolution of language became one of the main theoretical driving forces behind linguistic theory and explanation. Again, the focus seems to have stayed largely on (c). Berwick and Chomsky explicitly state:
At some time in the very recent past, apparently sometime before 80,000 years ago, if we can judge from associated symbolic proxies, individuals in a small group of hominids in East Africa underwent a minor biological change that provided the operation Merge, an operation that takes human concepts as computational atoms and yields structured expressions that, systematically interpreted by the conceptual system, provide a rich language of thought. (2016: 87)
Such theories rely heavily on the possibility of the evolution of language being explained in terms of saltation or random mutation. This postulate has come under significant scrutiny (see Steedman 2017). Saltation views, however, rely on one of the core assumptions mentioned in the quote above, i.e. that language evolved circa 100,000 years ago. This central claim has recently been challenged by Everett (2017), who cites paleontological evidence from the alleged nautical abilities of Homo erectus to dismantle this timeline. If true, this would mean that language evolved around two million years ago, and random mutation need not be the only viable explanation, as many in the Essentialist framework assume (see Progovac 2015 for a particular gradualist account).
While generative Essentialists debate among themselves about the plausibility of adaptive explanations for the emergence of essential features of a modular language capacity, Emergentists are perhaps best characterized as seeking broad evolutionary explanations of the features of languages (topic (c)) and communicative capacities (topics (b) and (c)) conceived in non-essentialist, non-modular ways. And since they are committed to exploring non-modular views of linguistic capacities (topic (c)), the differences and similarities between (a) and (b) are potentially relevant to (c).
Primatologists like Cheney and Seyfarth, psychologists like Tomasello, anthropologists like Terrence Deacon, and linguists like Philip Lieberman share an interest in investigating the communicative, anatomical, and cognitive characteristics of non-human animals to identify biological differences between humans and other primates. In the following paragraph we discuss Cheney and Seyfarth (2005) as an example, but we could easily have chosen any of a number of other theorists.
Cheney and Seyfarth (2005) emphasize that non-human primates have a small, stimulus-specific repertoire of vocal productions that are not “entirely involuntary,” and this contrasts with their “almost open-ended ability to learn novel sound-meaning pairs” (p. 149). They also emphasize that vocalizations in monkeys and apes are used to communicate information about the vocalizer, not to provide information intended to “rectify false beliefs in others or instruct others” (p. 150). Non-human primate communication consists in the mainly involuntary broadcasting of the vocalizer’s current affective state. Moreover, although Cheney and Seyfarth recognize that the vervet monkey’s celebrated call system (Cheney and Seyfarth 1990) is “functionally referential” in context, their calls have no explicit meaning since they lack “any propositional structure”. From this they conclude:
The communication of non-human animals lacks three features that are abundantly present in the utterances of young children: a rudimentary ability to attribute mental states different from their own to others, the ability to generate new words, and lexical syntax. (2005: 151)
By ‘lexical syntax’ Cheney and Seyfarth mean a kind of semantic compositionality of characteristic vocalizations. If a vocalization (call) were to have lexical syntax, the semantic significance of the whole would depend on the relation of the structure of parts of the call to the structure of what they signify. The absence of ‘lexical syntax’ in call systems suggests that it is illegitimate to think of them as having anything like semantic structure at all.
Despite the rudimentary character of animal communication systems when compared with human languages, Cheney and Seyfarth argue that monkeys and apes exhibit at least five characteristics that are pre-adaptations for human communication:
It is, of course, controversial to claim that monkeys have rule-governed propositional social knowledge systems, as claimed in (iv) and (v). For instance, Tomasello’s (2008) ‘Cooperative Communication’ approach makes a case for primate intentional systems based not on their vocalizations but on their gestural systems. Therein he claims that “great ape gestural communication shares with human linguistic communication foundational aspects of its manner of functioning, namely, the intentional and flexible use of learned communicative signals” (2008: 21).
But Emergentists, Externalists, and Essentialists could all, in principle, agree that there are both unique characteristics of human communicative capacities and characteristics of such capacities that are shared with non-humans. For example, by the age of one, human infants can use direction of gaze and focus of attention to infer the referent of a speaker’s utterance (Baldwin and Moses 1994). By contrast, this sort of social referencing capacity in monkeys and apes is rudimentary. This suggests that a major component of humans’ capacity to infer a specific referent is lacking in non-humans.
Disagreements between the approaches might be due to the perceived significance of non-human communicative capacities and their relation to uniquely human ones.
We mentioned earlier that both early 20th-century linguistics monographs and contemporary introductory textbooks include discussions of historical linguistics, i.e., the branch that studies the history and prehistory of changes in particular languages, how they are related to each other, and how and why they change. Again, this topic is distinct from the emergence of language in hominoid species and concerns mostly the linguistic changes that have occurred over a much shorter period within languages.
The last decade has seen two kinds of innovations related to studying changes in particular languages. One, which we will call ‘linguistic phylogeny’, concerns the application of stochastic phylogenetic methods to investigate prehistoric population and language dispersion (Gray and Jordan 2000, Gray 2005, Atkinson and Gray 2006, Gray et al. 2009). These methods answer questions about how members of a family of languages are related to each other and dispersed throughout a geographic area. The second, which we will call the effects of transmission, examines how interpreted artificial languages (sets of signifier/signified pairs) change under a range of transmission conditions (Kirby et al. 2008, Kirby 2001, Hurford 2000), thus providing evidence about how the process of transmission affects the characteristics, especially the structure, of the transmitted interpreted system.
Russell Gray and his colleagues have taken powerful phylogenetic methods that were developed by biologists to investigate molecular evolution, and applied them to linguistic data in order to answer questions about the evolution of language families. For example, Gray and Jordan (2000) used a parsimony analysis of a large language data set to adjudicate between competing hypotheses about the speed of the spread of Austronesian languages through the Pacific. More recently, Greenhill et al. (2010) used a NeighbourNet analysis to evaluate the relative rates of change in the typological and lexical features of Austronesian and Indo-European. These results bear on hypotheses about the relative stability of typological features as against lexical features of those languages, and how far back in time that stability extends. If there were highly conserved typological and lexical features, then it might be possible to identify relationships between languages that date beyond the 8000 (plus or minus 2000) year limit that is imposed by lexical instability.
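The kind of parsimony analysis referred to here can be illustrated in miniature (the languages, cognate classes, and trees below are invented; this is not Gray and Jordan’s actual data or pipeline). The sketch implements the standard Fitch counting step: for one discrete character, say which cognate class each language uses for a given word, it computes the minimum number of state changes a candidate family tree requires, so that competing trees can be compared:

```python
# A toy version of the Fitch parsimony count used in parsimony-based
# phylogenetics. Trees are nested pairs; leaves are language names.

def fitch(tree, states):
    """Return (possible_states, min_changes) for a (sub)tree, given a
    mapping from leaf names to character states."""
    if isinstance(tree, str):          # a leaf: its observed state
        return {states[tree]}, 0
    (ls, lc) = fitch(tree[0], states)
    (rs, rc) = fitch(tree[1], states)
    inter = ls & rs
    if inter:                          # children agree: no new change
        return inter, lc + rc
    return ls | rs, lc + rc + 1        # disagreement: one more change

# Cognate class of one word in four hypothetical languages:
states = {"LangA": 1, "LangB": 1, "LangC": 2, "LangD": 2}
tree1 = (("LangA", "LangB"), ("LangC", "LangD"))  # groups 1|1 vs 2|2
tree2 = (("LangA", "LangC"), ("LangB", "LangD"))  # mixes the classes
print(fitch(tree1, states)[1], fitch(tree2, states)[1])  # 1 2
```

A real analysis sums such scores over hundreds of characters and searches the space of trees for the minimum-change tree; the stochastic methods Gray and colleagues favor replace the parsimony criterion with an explicit probabilistic model of character change, but the basic tree-scoring idea is the same.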
The computational and laboratory experiments of Kirby and his collaborators have shown that under certain conditions of iterated learning, any given set of signifier/signified pairs in which the mapping is initially arbitrary will change to exhibit a very general kind of compositional structure. Iterated learning has been studied in both computational and laboratory experiments by means of diffusion chains, i.e., sequences of learners. A primary characteristic of such sequences of transmission is that what is transmitted from learner to learner will change in an iterated learning environment, in a way that depends on the conditions of transmission.
The children’s game called ‘Telephone’ in the USA (‘Chinese Whispers’ in the UK) provides an example of diffusion chains under which what is transmitted is not stable. In a diffusion-chain learning situation, what a chain member has actually learned from an earlier member of the chain is presented as the input to the next learner, and what that learner has actually learned provides the input to the following learner. In cases where the initial learning task is very simple, i.e., where what is transmitted is simple and completely presented and the transmission channel is not noisy, what is transmitted is stable over iterated transmissions, even when the participants are young children or chimpanzees (Horner et al. 2006). That is, there is little change in what is transmitted over iterated transmissions. However, in cases where what is transmitted is only partially presented or very complex, or the channel is noisy, there is a decrease in the fidelity of what is transmitted across iterations, just as in the children’s game of Telephone.
What Kirby and colleagues show is that when the initial input to a diffusion chain is a reasonably complex set of arbitrary signifier/signified pairs, e.g. one in which 27 complex signals of 6 letters are randomly assigned to 27 objects varying on dimensions of color, kind of motion, and shape, what is transmitted becomes more and more compositional over iterated transmission. Here, ‘compositional’ is being used to refer to the high degree to which sub-strings of the signals come to be systematically paired with specific phenomenal sub-features of what is signified. The transmission conditions in these experiments were free of noise, and for each iteration of the learning task only half of the possible 27 signifier/signified pairs were presented to participants. Under this kind of transmission bottleneck a high degree of signifier/signified structure emerged.
A plausible interpretation of these results is that the developing structure of the collection of signs is a consequence of the repeated forced inference by participants from 14 signs and signifieds in the training set to the entire set of 27 pairs. A moral could be that iterated forced prediction of the sign/signified pairs in the entire set, on the basis of exposure to only about half of them, induced the development of a systematic, compositional structure over the course of transmission. It is reasonable to conjecture that this resulting structure reflects effects of human memory, not a domain-specific language module—although further work would be required to rule out many other competing hypotheses.
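The degree of compositional structure at issue here is often quantified, in the iterated-learning literature, as the correlation between pairwise distances among meanings and pairwise distances among the corresponding signals. The toy sketch below (both miniature languages are invented) computes such a score for a perfectly compositional mapping, where each meaning feature contributes one fixed letter to the signal, and for a random mapping:

```python
# Quantifying 'compositionality' as a meaning-distance/signal-distance
# correlation. Both toy languages below are invented for illustration.

import itertools
import random

random.seed(0)

meanings = list(itertools.product("RGB", "SC", "TQ"))  # 3*2*2 = 12
# Compositional language: each feature value maps to a fixed letter.
comp_lang = {m: "".join(m).lower() for m in meanings}
# Holistic language: each meaning gets an unrelated random signal.
rand_lang = {m: "".join(random.choices("abcdef", k=3)) for m in meanings}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def structure_score(lang):
    pairs = list(itertools.combinations(meanings, 2))
    meaning_d = [hamming(m1, m2) for m1, m2 in pairs]
    signal_d = [hamming(lang[m1], lang[m2]) for m1, m2 in pairs]
    return pearson(meaning_d, signal_d)

print(round(structure_score(comp_lang), 2))  # 1.0: fully compositional
print(round(structure_score(rand_lang), 2))  # near 0: no structure
```

In the compositional language, similar meanings are guaranteed to get similar signals, so the score is maximal; a random assignment shows no such relationship. Scores of this kind are what allow experimenters to say that transmitted languages become “more compositional” across generations.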
Thus Kirby and his colleagues focus on something very different from the prerequisites for language emergence. Linguistic nativists have been interested in how primates like us could have become capable of acquiring systems with the structural properties of natural languages. Kirby and his colleagues (while not denying that human cognitive evolution is of interest) are studying how languages evolve to be capable of being acquired by primates like us.
Lastly, language evolution has attracted a great deal of interdisciplinary work in recent times, allowing philosophers to contribute directly to this emerging field. The trends in the philosophical work have only loosely followed the Externalist, Emergentist, and Essentialist divisions we advocate here. Most philosophical work has focused on Emergentist conceptions of the evolution of linguistic meaning specifically.
Bar-On (2013) distinguishes between Gricean and Post-Gricean approaches to the evolution of language. The former requires an attribution of Gricean speaker meaning (‘nonnatural meaning’) to our languageless ancestors, which in turn seems to presuppose intentional actions governed by rationality. This task is as fraught as explaining the evolution of language itself. She thus proposes the latter, specifically a Post-Gricean (Origgi and Sperber 2000) approach which takes expressive communication (found widely in non-human animal species) as a basis for the emergence of linguistic meaning between signalers and receivers. She states:
Expressive communication, I will argue, exhibits features that foreshadow significant aspects of linguistic communication. In its domain, we can identify legitimate natural precursors of meaningful linguistic communication. (For present purposes, by ‘legitimate natural precursors’, I mean behavioral interactions that at least: a. can be found in the natural world; b. go beyond Tomasello’s mere ‘communicative displays’; c. do not depend on crediting the relevant creatures with language-like propositional thought or post-Gricean communicative intentions, and; d. exhibit features that foreshadow important semantic and pragmatic features of linguistic communication so in that sense are proto-semantic and proto-pragmatic.) (2013: 354)
Recent work in Evolutionary Game Theory has also lent credence to the emergence of signaling systems involving non-intentional states. Taking Lewis (1969) as a springboard, Skyrms (2010) investigates the structure of signaling behavior beyond the existence of mutual conventions. His framework starts from the most basic non-trivial cases and gradually introduces complexity (such as deception and the introduction of new signals). Skyrms’ account views propositional or semantic content as a special case of informational content, thereby reintroducing information theory as a tool for philosophers of language and linguistics interested in the emergence of linguistic communication and/or semantic meaning.
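The basic models Skyrms discusses are easy to simulate. The sketch below (parameter choices are illustrative, not taken from Skyrms’s text) runs a two-state Lewis signaling game in which sender and receiver learn by simple urn reinforcement; nothing in the model presupposes intentions or pre-existing conventions, yet communicative success typically climbs well above the 0.5 chance level:

```python
# A two-state Lewis signaling game with Roth-Erev-style urn
# reinforcement, in the spirit of Skyrms (2010). Illustrative only.

import random

random.seed(1)

states, signals, acts = [0, 1], [0, 1], [0, 1]
# Urn weights: sender[state][signal] and receiver[signal][act],
# all starting equal, i.e., with no built-in convention.
sender = [[1.0, 1.0] for _ in states]
receiver = [[1.0, 1.0] for _ in signals]

def draw(weights):
    return random.choices(range(len(weights)), weights=weights)[0]

def play_round(learn=True):
    state = random.choice(states)
    signal = draw(sender[state])
    act = draw(receiver[signal])
    success = (act == state)
    if learn and success:          # reinforce both urns on success
        sender[state][signal] += 1
        receiver[signal][act] += 1
    return success

for _ in range(10000):             # learning phase
    play_round()
# Evaluation phase: play without further learning.
success_rate = sum(play_round(learn=False) for _ in range(2000)) / 2000
print(success_rate)
```

Because only jointly successful sender/receiver choices get reinforced, an arbitrary but shared signal-to-state mapping tends to crystallize; which of the two possible signaling systems emerges depends on early chance events, which is Lewis’s point that the content of the convention is arbitrary even though having one is not.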
analysis | assertion | compositionality | defaults in semantics and pragmatics | descriptions | empiricism: logical | idiolects | innate/acquired distinction | innateness: and language | language of thought hypothesis | linguistics: computational | logic: intensional | mental representation | pragmatics | propositional attitude reports | reference | relativism | rigid designators
The authors are very grateful to the two SEP referees, Tom Wasow and William Starr, who provided careful reviews of our drafts; to Bonnie Webber and Zoltan Galsi for insightful comments and advice; and to Dean Mellow for some helpful corrections. BCS was the lead author of this article throughout the lengthy period of its preparation, and worked on it in collaboration with FJP and GKP until the post-refereeing revision was submitted at the end of April 2011. She died two weeks later, on May 14. FJP and GKP oversaw the few final corrections that were made when the HTML version was first published in September 2011.
The Stanford Encyclopedia of Philosophy is copyright © 2025 by The Metaphysics Research Lab, Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054