Stanford Encyclopedia of Philosophy

Associationist Theories of Thought

First published Tue Mar 17, 2015; substantive revision Sun Jul 13, 2025

Associationism is one of the oldest, and, in some form or another, most widely held theories of thought. Associationism has been the engine behind empiricism for centuries, from the British Empiricists through the Behaviorists and modern day Connectionists. Nevertheless, “associationism” does not refer to one particular theory of cognition per se, but rather a constellation of related though separable theses. What ties these theses together is a commitment to a certain arationality of thought: a creature’s mental states are associated because of some facts about its causal history, and having these mental states associated entails that bringing one of a pair of associates to mind will, ceteris paribus, ensure that the other also becomes activated.


1. What is Associationism?

Associationism is a theory that connects learning to thought based on principles of the organism’s causal history. Since its early roots, associationists have sought to use the history of an organism’s experience as the main sculptor of cognitive architecture. In its most basic form, associationism has claimed that pairs of thoughts become associated based on the organism’s past experience. So, for example, a basic form of associationism (such as Hume’s) might claim that the frequency with which an organism has come into contact with Xs and Ys in one’s environment determines the frequency with which thoughts about Xs and thoughts about Ys will arise together in the organism’s future.

Associationism’s popularity is in part due to how many different functions it can subserve. In particular, associationism can be used as a theory of learning (e.g., as in behaviorist theorizing), a theory of thinking (as in Jamesian “streams of thought”), a theory of mental structures (e.g., as in concept pairs), and a theory of the implementation of thought (e.g., as in connectionism). All these theories are separable, but share a related, empiricist-friendly core. As used here, a “pure associationist” will refer to one who holds associationist theories of learning, thinking, mental structure, and implementation. The “pure associationist” is a somewhat idealized position, one that no particular theorist may have ever held, but many have approximated to differing degrees (e.g., Locke 1690/1975; Hume 1738/1975; Thorndike 1911; Skinner 1953; Hull 1943; Churchland 1986, 1989; Churchland and Sejnowski 1990; Smolensky 1988; Elman 1991; Elman et al. 1996; McClelland et al. 2010; Rydell and McConnell 2006; Fazio 2007; Demeter 2021; Buckner 2023).

Outside of these core uses of associationism the movement has also been closely aligned with a number of different doctrines over the years: empiricism, behaviorism, anti-representationalism (i.e., skepticism about the necessity of representational realism in psychological explanation), gradual learning, and domain-general learning. All of these theses are dissociable from core associationist thought (see section 7). While one can be an associationist without holding those theses, some of those theses imply associationism more than others. These extra theses’ historical and sociological ties to associationism are strong, and so will be intermittently discussed below.

2. Associationism as a Theory of Mental Processes: The Empiricist Connection

Empiricism is a general theoretical outlook, which offers a theory of learning to explain as much of our mental life as possible. From the British empiricists through Skinner and the behaviorists (see the entry on behaviorism) the main focus has been arguing for the acquisition of concepts (for the empiricists’ “Ideas”, for the behaviorists “responses”) through learning. However, the mental processes that underwrite such learning are almost never themselves posited to be learned.[1] So winnowing down the number of mental processes one has to posit limits the amount of innate machinery with which the theorist is saddled. Associationism, in its original form as in Hume (1738/1975), was put forward as a theory of mental processes. Associationists attempt to answer the question of how many mental processes there are by positing, ideally, only a single mental process: the ability to associate ideas.[2]

Of course, thinkers execute many different types of cognitive acts, so if there is only one mental process, the ability to associate, that process must be flexible enough to accomplish a wide range of cognitive work. In particular, it must be able to account for learning and thinking. Accordingly, associationism has been utilized on both fronts. We will first discuss the theory of learning and then, after analyzing that theory and seeing what is putatively learned, we will return to the associationist theory of thinking.

3. Associationism as a Theory of Learning

In one of its senses, “associationism” refers to a theory of how organisms acquire concepts, associative structures, response biases, and even propositional knowledge. It is commonly acknowledged that associationism took hold after the publishing of John Locke’s Essay Concerning Human Understanding (1690/1975).[3] However, Locke’s comments on associationism were terse (though fertile), and did not address learning to any great degree. The first serious attempt to detail associationism as a theory of learning was given by Hume in the Treatise of Human Nature (1738/1975).[4] Hume’s associationism was, first and foremost, a theory connecting how perceptions (“Impressions”) determined trains of thought (successions of “Ideas”). Hume’s empiricism, as enshrined in the Copy Principle,[5] demanded that there were no Ideas in the mind that were not first given in experience. For Hume, the principles of association constrained the functional role of Ideas once they were copied from Impressions: if Impressions IM1 and IM2 were associated in perception, then their corresponding Ideas, ID1 and ID2, would also become associated. In other words, the ordering of Ideas was determined by the ordering of the Impressions that caused the Ideas to arise.

Hume’s theory then needs to analyze what types of associative relations between Impressions mattered for determining the ordering of Ideas. Hume’s analysis consisted of three types of associative relations: cause and effect, contiguity, and resemblance. If two Impressions instantiated one of these associative relations, then their corresponding Ideas would mimic the same instantiation.[6] For instance, if Impression IM1 was contemporaneous with Impression IM2, then (ceteris paribus) their corresponding Ideas, ID1 and ID2, would become associated.

As stated, Hume’s associationism was mostly a way of determining the functional profile of Ideas. But we have not yet said what it is for two Ideas to be associated (for that see section 4). Instead, one can see Hume’s contribution as introducing a very influential type of learning—associative learning—for Hume’s theory purports to explain how we learn to associate certain Ideas. We can abstract away from Hume’s framework of ideas and his account of the specific relations that underlie associative learning, and state the theory of associative learning more generally: if two representations of X and Y instantiate some associative relation, R, then those representations will become associated, so that future activations of X will tend to bring about activations of Y, and do so directly (i.e., without any intermediate computations). The associationist then has to explain what relation R amounts to. The Humean form of associative learning (where R is equated with cause and effect, contiguity, or resemblance) has been hugely influential, informing the accounts of those such as Jeremy Bentham, J.S. Mill, and Alexander Bain (see, e.g., the entries on John Stuart Mill and 19th Century Scottish Philosophy).[7]

Associative learning didn’t hit its stride until the work of Ivan Pavlov, which spurred the subsequent rise of the behaviorist movement in psychology. Pavlov introduced the concept of classical conditioning as a modernized version of associative learning. For Pavlov, classical conditioning was in part an experimental paradigm for teaching animals to learn new associations between stimuli. The general method of learning was to pair an unconditioned stimulus (US) with a novel stimulus. An unconditioned stimulus is just a stimulus that instinctively (i.e., without training) provokes a response in an organism. Since this response is not itself learned, the response is referred to as an “unconditioned response” (UR). In Pavlov’s canonical experiment, the US was a meat powder, as the smell of meat automatically brought about salivation (UR) in his canine subjects. The US is then paired with a neutral stimulus, such as a bell. Over time, the contiguity between the US and the neutral stimulus causes the neutral stimulus to provoke the same response as the US. Once the bell starts to provoke salivation, the bell has become a “conditioned stimulus” (CS) and the salivating, when prompted by the bell alone, a “conditioned response” (CR). The associative learning here is learning to form a new stimulus-response pair between the bell and the salivation.[8]
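The pairing dynamic just described can be pictured with a toy simulation. The learning rule, rate, and threshold below are illustrative assumptions for exposition, not Pavlov’s own quantitative model; they only show how repeated CS–US pairings can push a bond past the point where the CS alone provokes the CR:

```python
# Toy model of classical conditioning: repeated CS-US pairings
# strengthen a CS-response bond until the CS alone provokes the CR.

def condition(pairings, rate=0.2):
    """Return CS->response strength after a number of CS-US pairings.

    Strength grows toward 1.0 by a fixed fraction of the remaining
    distance on each pairing (an illustrative learning rule).
    """
    strength = 0.0
    for _ in range(pairings):
        strength += rate * (1.0 - strength)
    return strength

THRESHOLD = 0.5  # assumed strength needed for the CS to provoke the CR

# Before pairing, the bell (CS) does not provoke salivation (CR) ...
assert condition(0) < THRESHOLD
# ... but after enough bell + meat-powder pairings, it does.
assert condition(10) > THRESHOLD
```

On this sketch the bond grows gradually with contiguity alone, which is all the classical picture requires; nothing about the response itself changes, only which stimulus triggers it.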

Classical conditioning is a fairly circumscribed process. It is a “stimulus substitution” paradigm where one stimulus can be swapped for another to provoke a response.[9] However, the responses that are provoked are supposed to remain unchanged; all that changes is the stimulus that gets associated with the response. Thus, classical conditioning seemed to some to be too restrictive to explain the panoply of novel behavior organisms appear to execute.[10]

Edward Thorndike’s research with cats in puzzle boxes broadened the theory of associative learning by introducing the notion of consequences to associative learning. Thorndike expanded the notion of associative learning beyond instinctual behaviors and sensory substitution to genuinely novel behaviors. Thorndike’s experiments initially probed, e.g., how cats learned to lift a lever to escape the “puzzle boxes” (the forerunner of “Skinner boxes”) that they were trapped in. The cats’ behaviors, such as attempting to lift a lever, were not themselves instinctual behaviors like the URs of Pavlov’s experiments. Additionally, the cats’ behaviors were shaped by the consequences that they brought on. For Thorndike it was because lifting the lever caused the door to open that the cats learned the connection between the lever and the door. This new view of learning, operant conditioning (for the organism is “operating” on its environment), was not merely the passive learning of Pavlov, but a species-nonspecific, general, active theory of learning.

This research culminated in Thorndike’s famous “Law of Effect” (1911), the first canonical psychological law of associationist learning. It asserted that responses that are accompanied by the organism feeling satisfied will, ceteris paribus, make the response more likely to occur when the organism encounters the same situation, whereas responses that are accompanied with a feeling of discomfort to the animal will, ceteris paribus, make the response less likely to occur when the organism encounters the same situation.[11] The greater the positive or negative feelings produced, the greater the likelihood that the behavior will be evinced. To this Thorndike added the “Law of Exercise”, that responses to situations will, ceteris paribus, be more associated with those situations in proportion to the frequency of past pairings between situation and response. Thorndike’s paradigm was popularized and extended by B.F. Skinner (see, e.g., Skinner 1953) who stressed the notion not just of consequences but of reinforcement as the basis of forming associations. For Skinner, a behavior would get associated with a situation according to the frequency and strength of reinforcement that would arise as a consequence of the behavior.
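The Law of Effect can be sketched in the same toy style. The update rule and the magnitudes below are invented for illustration; the only claim carried over from the text is that satisfaction raises, and discomfort lowers, the likelihood of the response in the same situation, in proportion to the feeling’s strength:

```python
# Toy Law of Effect: a response followed by satisfaction becomes more
# likely in that situation; one followed by discomfort, less likely.

def update_likelihood(likelihood, feeling):
    """Nudge response likelihood toward 1 for positive feelings and
    toward 0 for negative ones, in proportion to their magnitude."""
    if feeling > 0:
        return likelihood + feeling * (1.0 - likelihood)
    return likelihood + feeling * likelihood  # negative feeling shrinks it

p = 0.5                          # initial chance of lifting the lever
p = update_likelihood(p, 0.3)    # escape succeeds: satisfaction
p_after_reward = p
p = update_likelihood(p, -0.3)   # a later attempt brings discomfort

assert p_after_reward > 0.5      # reward raised the likelihood
assert p < p_after_reward        # punishment lowered it again
```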

Since the days of Skinner, associative learning has come in many different variations. But what all varieties should share with their historical predecessors is that associative learning is supposed to mirror the contingencies in the world without adding additional structure (see section 9 for some examples of when supposedly associative theories smuggle in extra structure). The question of what contingencies associative learning detects (that is, one’s preferred analysis of what the associative relation R is), is up for debate and changes between theorists.

The final widely shared, though less central, property of associative learning concerns the domain generality of associative learning. Domain generality’s prevalence among associationists is due in large part to their traditional empiricist allegiances: excising domain-specific learning mechanisms constrains the number of innate mental processes one has to posit. Thus it is no surprise to find that both Hume and Pavlov assumed that associative learning could be used to acquire associations between any contents, regardless of the types of contents they were. For example, Pavlov writes,

Any natural phenomenon chosen at will may be converted into a conditioned stimulus. Any ocular stimulus, any desired sound, any odor, and the stimulation of any portion of the skin, whether by mechanical means or by the application of heat or cold never failed to stimulate the salivary glands. (Pavlov 1906: 615)

For Pavlov the content of the CS doesn’t matter. Any content will do, as long as it bears the right functional relationship in the organism’s learning history. In that sense, the learning is domain general—it matters not what the content is, just the role it plays (for more on this topic, see section 9.4).[12]

4. Associationism as a Theory of Mental Structure

As a theory of learning, associationism amounts to a constellation of related views that interpret learning as associating stimuli with responses (in operant conditioning), or stimuli with other stimuli (in classical conditioning), or stimuli with valences (in evaluative conditioning).[13] Associative learning accounts raise the question: when one learns to associate contents X and Y because, e.g., previous experiences with Xs and Ys instantiated R, how does one store the information that X and Y are associated?[14] A highly contrived sample answer to this question would be that a thinker learns an explicitly represented unconscious conditional rule that states “when a token of x is activated, then also activate a token of y”. Instead of such a highly intellectualized response, associationists have found a natural (though by no means necessary, see section 4.2) complementary view that the information is stored in an associative structure.

An associative structure describes the type of bond that connects two distinct mental states.[15] An example of such a structure is the associative pair salt/pepper.[16] The associative structure is defined, in the first instance, functionally: if X and Y form an associative structure, then, ceteris paribus, activations of mental state X bring about mental state Y and vice versa without the mediation of any other psychological states (such as an explicitly represented rule telling the system to activate a concept because its associate has been activated).[17] In other words, saying that two concepts are associated amounts to saying that there is a reliable, psychologically basic causal relation that holds between them—the activation of one of the concepts causes the activation of the other. So, saying that someone harbors the structure salt/pepper amounts to saying that activations of salt will cause activations of pepper (and vice versa) without the aid of any other cognitive states.
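The functional definition above can be made concrete with a minimal sketch. The class and its names are invented for illustration; the point it captures from the text is just that the bond is symmetric and unmediated: activating either member directly brings the other to mind, with no intervening rule or inference step:

```python
# A minimal sketch of associative structures: symmetric bonds such
# that activating either member directly activates the other.

class AssociativeStore:
    def __init__(self):
        self.bonds = {}  # concept -> set of direct associates

    def associate(self, x, y):
        """Form the bond x/y. Note the bond is symmetric by definition."""
        self.bonds.setdefault(x, set()).add(y)
        self.bonds.setdefault(y, set()).add(x)

    def activate(self, concept):
        """Return everything brought to mind by activating `concept`,
        with no mediating psychological states in between."""
        return {concept} | self.bonds.get(concept, set())

mind = AssociativeStore()
mind.associate("salt", "pepper")

# Activating SALT brings PEPPER along, and vice versa.
assert mind.activate("salt") == {"salt", "pepper"}
assert mind.activate("pepper") == {"pepper", "salt"}
```

Nothing in the store predicates anything of anything; it only records which activations cause which, which is exactly the contrast with propositional structure drawn next.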

Associative structures are most naturally contrasted with propositional structures. The key distinction is that ‘association’ denotes a causal relation among mental representations, where ‘predication’ expresses a relation between things in the world (or intentional contents that specify external relations). A pure associationist is opposed to propositional structures—strings of mental representations that express a proposition—because propositionally structured mental representations have structure over and above the mere associative bond between two concepts. Take, for example, the associative structure green/toucan. This structure does not predicate green onto toucans; it merely indicates that activating one of those concepts leads to the activation of the other. A pure associative theory rules out predication, for propositional structures aren’t merely strings of associations. Saying that someone has an associative thought green/toucan tells you something about the causal and temporal sequences of the activation of concepts in one’s mind; saying that someone has the thought that the toucan is green tells you that a person is predicating greenness of a particular toucan (see Fodor 2003: 91–94, for an expansion of this point).

Associative structures needn’t just hold between simple concepts. One might have reason to posit associative structures between propositional elements (see section 5) or between concepts and valences (see section 8). But none of the preceding is meant to imply that all structures are associative or propositional—there are other representational formats that the mind might harbor (e.g., analog magnitudes or iconic structures; see Camp 2007; Quilty-Dunn 2020). For instance, not all semantically related concepts are harbored in associative structures. Semantically related concepts may in fact also be directly associated (as in doctor/nurse) or they may not (as in horse/zebra; see Perea and Rosa 2002). The difference in structure is not just a theoretical possibility, as these different structures have different functional profiles: for example, conditioned associations appear to last longer than semantic associations do in subjects with dementia (Glosser and Friedman 1991).

4.1 Associative Symmetry

The analysis of associative structures implies that, ceteris paribus, associations are symmetric in their causal effects: if a thinker has a bond between salt/pepper, then salt should bring about pepper just as well as pepper brings about salt (for extensive discussion of the symmetry point see Quilty-Dunn and Mandelbaum 2019). But all else is rarely equal. For example, behaviorists such as Thorndike, Hull, and Skinner knew that the order of learning affected the causal sequence of recall: if one is always hearing “salt and pepper” then salt will be more poised to activate pepper than pepper to activate salt. So, included in the ceteris paribus clause in the analysis of associative structures is the idealization that the learning of the associative elements was equally well randomized in order.

Similarly, associative symmetry is violated when there are differing amounts of associative connections between the individual associated elements. For example, in the green/toucan case, most thinkers will have many more associations stemming from green than stemming from toucan. Suppose we have a thinker that only associates toucan with green, but associates green with a large host of other concepts (e.g., grass, vegetables, tea, kermit, seasickness, moss, mold, lantern, ireland, etc.). In this case one can expect that toucan will more quickly activate green than green will activate toucan, for the former bond will have its activation strength less weakened amongst other associates than the latter will.
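This dilution effect can be sketched with an illustrative idealization in which a fixed amount of activation is split evenly among a concept’s associates (the even split and the numbers are assumptions for exposition, not an empirical model):

```python
# Sketch of asymmetry from differing numbers of associates: a fixed
# quantity of activation is split evenly among a concept's associates,
# so a concept with many associates passes less to each of them.

bonds = {
    "toucan": ["green"],
    "green": ["toucan", "grass", "vegetables", "tea", "kermit",
              "seasickness", "moss", "mold", "lantern", "ireland"],
}

def activation_passed(source, target, total=1.0):
    """Share of `source`'s activation reaching `target` (even split)."""
    associates = bonds.get(source, [])
    return total / len(associates) if target in associates else 0.0

# TOUCAN activates GREEN at full strength, but GREEN's activation of
# TOUCAN is diluted among its ten associates.
assert activation_passed("toucan", "green") == 1.0
assert activation_passed("green", "toucan") == 0.1
```

The same symmetric bond thus behaves asymmetrically in practice, which is the point of the ceteris paribus clause above.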

4.2 Activation Maps of Associative Structure

An associative activation map (sometimes called a “spreading activation” map, Collins and Loftus 1975) is a mapping for a single thinker of all the associative connections between concepts.[18] There are many ways of operationalizing associative connections. In the abstract, a psychologist will attempt to probe which concepts (or other mental elements) activate which other concepts (or elements). Imagine a subject who is asked to say whether a string of letters constitutes a word or not, which is the typical goal given to subjects in a “lexical decision task”. If a subject has just seen the word “mouse”, we assume that the concept mouse was activated. If the subject is then quicker to say that, e.g., “cursor” is a word than the subject is to say that “toaster” is, then we can infer that cursor was primed, and is thus associatively related to mouse, in this thinker. Likewise, if we find that “rodent” is also responded to more quickly, then we know that rodent is associatively related to mouse. Using this procedure, one can generate an associative mapping of a thinker’s mind. Such a mapping would constitute a mapping of the associative structures one harbors. However, to be a true activation map—a true mapping of what concepts facilitate what—the mapping would also need to include information about the violations of symmetry between concepts.
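The inference from reaction times to an activation map can be sketched as follows. The reaction times, the baseline, and the priming margin are all invented numbers for illustration; the logic is just that of the lexical decision task described above:

```python
# Sketch of turning lexical-decision data into an activation map:
# words judged faster after the prime than a neutral baseline are
# inferred to be associates of the prime.

baseline_rt = 600  # assumed average "is this a word?" time, in ms

# Invented reaction times after the subject has just seen "mouse":
rt_after_mouse = {"cursor": 510, "rodent": 530, "toaster": 605}

def infer_associates(rts, baseline, margin=25):
    """Treat a speed-up beyond `margin` ms as evidence of priming."""
    return {word for word, rt in rts.items() if baseline - rt > margin}

activation_map = {"mouse": infer_associates(rt_after_mouse, baseline_rt)}

# "cursor" and "rodent" were primed by "mouse"; "toaster" was not.
assert activation_map["mouse"] == {"cursor", "rodent"}
```

As the text notes, a full activation map would also have to record asymmetries, e.g., by probing each direction of each pair separately.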

4.3 Relation Between Associative Learning and Associative Structures

The British Empiricists desired to have a thoroughgoing pure associationist theory, for it allowed them to lessen the load of innate machinery they needed to posit. Likewise, the behaviorists also tended to want a pure associationist theory (sometimes out of a similar empiricist tendency, other times because they were radical behaviorists like Skinner, who banned all discussion of mental representations). Pure associationists tend to be partial to a connection that Fodor (2003) refers to as “Bare-Boned Association”. The idea is that the current strength of an associative connection between X and Y is determined, ceteris paribus, by the frequency of the past associations of X and Y. As stated, Bare-Boned Association assumes that associative structures encode, at least implicitly, the frequency of past associations of X and Y, and the strength of that associative bond is determined by the organism’s previous history of experiencing Xs and Ys.[19] In other words, the learning history of past associations determines the current functional profile of the corresponding associative structures.[20]
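Bare-Boned Association can be given a toy rendering in which bond strength is a function of nothing but pairing frequency. The particular normalization (relative frequency over all experiences) is an illustrative assumption; what matters is that strength tracks frequency and only frequency:

```python
# Bare-Boned Association, sketched: the current strength of a bond
# between X and Y is fixed solely by the frequency of past X/Y pairings.

from collections import Counter

history = Counter()  # {x, y} -> number of past paired experiences

def experience(x, y):
    history[frozenset((x, y))] += 1

def strength(x, y):
    """Bond strength grows with, and only with, pairing frequency
    (here: relative frequency; the normalization is illustrative)."""
    total = sum(history.values())
    return history[frozenset((x, y))] / total if total else 0.0

for _ in range(8):
    experience("salt", "pepper")
for _ in range(2):
    experience("salt", "vinegar")

# The more frequently paired associates form the stronger bond.
assert strength("salt", "pepper") > strength("salt", "vinegar")
assert strength("salt", "mustard") == 0.0
```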

Although the picture sketched above, where associative learning eventuates in associative structure, is appealing for many, it is not forced upon one, as there is no a priori reason to bar any type of structure from arising from a particular type of learning. One may, for example, gain propositional structures from associative learning (see Mitchell et al. 2009 and Mandelbaum 2016 for arguments that this is more than a mere logical possibility). This can happen in two ways. In the first, one may gain an associative structure that has a proposition as one of its associates. Assume that every time one’s father came home he immediately made dinner. In such a case one might associate the proposition daddy is home with the concept dinner (that is, one might acquire: daddy is home/dinner). However, one might also just have a propositional structure result from associative learning. If every time one’s father came home he made dinner, then one might just end up learning if daddy is home then dinner will come soon, which is a propositional structure.

4.4 Extinction and Counterconditioning

There is a different, tighter relationship between associative learning and associative structures concerning how to modulate an association. Associative theorists, especially from Pavlov onward, have been clear on the functional characteristics necessary to modulate an already created association. There have been two generally agreed upon routes: extinction and counterconditioning. Suppose that, through associative learning, you have learned to associate a CS with a US. How do we break that association? Associationists have posited that one breaks an associative structure via two different types of associative learning (/unlearning). Extinction is the name for one such process. During extinction one decouples the external presentation of the CS and the US by presenting the CS without the US (and sometimes the US without the CS). Over time, the organism will learn to disconnect the CS and US.

Counterconditioning names a similar process to extinction, though one which proceeds via a slightly different method. Counterconditioning can only occur when an organism has an association between a mental representation and a valence, as acquired in an evaluative conditioning paradigm. Suppose that one associates ducks with a positive valence. To break this association via counterconditioning one introduces ducks not with a lack of positive valence (as would happen in extinction) but with the opposite valence, a negative valence. Over multiple exposures, the initial representation/valence association weakens, and is perhaps completely broken.[21]
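The contrast between the two routes can be sketched as two different updates on a signed association strength. The decay and subtraction rules, and their rates, are illustrative assumptions; the qualitative difference they depict is that extinction merely weakens a bond toward neutrality, while counterconditioning pushes a valenced bond toward the opposite valence:

```python
# The two agreed-upon routes for undoing an association, sketched as
# updates on a signed strength (positive = positive valence).

def extinguish(strength, trials, rate=0.3):
    """Extinction: present the CS without the US; the bond decays
    toward 0 but never changes sign."""
    for _ in range(trials):
        strength *= (1.0 - rate)
    return strength

def countercondition(strength, trials, rate=0.3):
    """Counterconditioning: pair the representation with the opposite
    valence, pushing a positive strength toward (and past) zero."""
    for _ in range(trials):
        strength -= rate
    return strength

ducks_positive = 1.0  # DUCKS starts with a positive valence

# Extinction weakens the bond but leaves it (weakly) positive ...
assert 0 < extinguish(ducks_positive, 10) < 0.1
# ... whereas counterconditioning can reverse the valence outright.
assert countercondition(ducks_positive, 5) < 0
```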

How successful extinction and counterconditioning are, and how they work, is the source of some controversy, and there is some reason to see both methods as highly ineffectual (Bouton 2004). Although the traditional view is that extinction breaks associative bonds, it is an open empirical question whether extinction proceeds by breaking the previously created associative bonds, or whether it proceeds by leaving that bond alone but creating new, more salient (and perhaps context-specific) associations between the CS and other mental states (Bouton 2002, Bendaña and Mandelbaum 2021). Additionally, reinstatement, the spontaneous reappearance of an associative bond after seemingly successful extinction, has been observed in many contexts (see, e.g., Dirikx et al. 2004 for reinstatement of fear in humans).[22]

One fixed point in this debate is that one reverses associative structures via these two types of associative learning/unlearning, and only via these two pathways. What one does not do is try to break an associative structure by using practical or theoretical reasoning. If you associate salt with pepper, then telling you that salt has nothing to do with pepper or giving you very good reasons not to associate the two (say, someone will give you $50,000 for not associating them) won’t affect the association. This much has at least been clear since Locke. In the Essay concerning Human Understanding, in his chapter “On the Association of Ideas” (chapter XXXIII) he writes,

When this combination is settled, and while it lasts, it is not in the power of reason to help us, and relieve us from the effects of it. Ideas in our minds, when they are there, will operate according to their natures and circumstances. And here we see the cause why time cures certain affections, which reason, though in the right, and allowed to be so, has not power over, nor is able against them to prevail with those who are apt to hearken to it in other cases. (2.33.13)

Likewise, say one has just eaten lutefisk and then vomited. The smell and taste of lutefisk will then be associated with feeling nauseated, and no amount of telling one that they shouldn’t be nauseated will be very effective. Say the lutefisk that made one vomit was covered in poison, so that we know that the lutefisk wasn’t the root cause of the sickness. Having this knowledge won’t dislodge the association. In essence, associative structures are functionally defined as being alterable based on counterconditioning, extinction, and nothing else. Thus, assuming one sees counterconditioning and extinction as types of associative learning, we can say that associative learning does not necessarily eventuate in associative structures, but associative structures can only be modified by associative learning.

5. Associative Transitions

So far we’ve discussed learning and mental structures, but have yet to discuss thinking. The pure associationist will want a theory that covers not just acquisition and cognitive structure, but also the transition between thoughts. Associative transitions are a particular type of thinking, akin to what William James called “The Stream of Thought” (James 1890). Associative transitions are movements between thoughts that are not predicated on a prior logical relationship between the elements of the thoughts that one connects. In this sense, associative transitions are contrasted with computational transitions as analyzed by the Computational Theory of Mind (Fodor 2001; Quilty-Dunn and Mandelbaum 2018, 2019; Quilty-Dunn et al. 2023; see the entry on Computational Theory of Mind). CTM understands inferences as truth-preserving movements in thought that are underwritten by the formal/syntactic properties of thoughts. For example, inferring the conclusion in modus ponens from the premises is possible just based on the form of the major and minor premise, and not on the content of the premises. Associative transitions are transitions in thought that are not based on the logico-syntactic properties of thoughts. Rather, they are transitions in thought that occur based on the associative relations among the separate thoughts.

Imagine an impure associationist model of the mind, one that contains both propositional and associative structures. A computational inference might be one such as inferring you are a g from the thoughts if you are an f, then you are a g, and you are an f. However, an associative transition is just a stream of ideas that needn’t have any formal, or even rational, relation between them, such as the transition from this coffee shop is cold to russia should annex idaho, without there being any intervening thoughts. This transition could be subserved merely by one’s association of idaho and cold, or it could happen because the two thoughts have tended to co-occur in the past, and their close temporal proximity caused an association between the two thoughts to arise (or for many other reasons). Regardless of the etiology, the transition doesn’t occur on the basis of the formal properties of the thoughts.[23]

According to this taxonomy, talk of an “associative inference” (e.g., Anderson et al. 1994; Armstrong et al. 2012) is a borderline oxymoron. The easiest way to give sense to the idea of an associative inference is for it to involve transitions in thought that began because they were purely inferential (as understood by the computational theory of mind) but then became associated over time. For example, at first one might make the modus ponens inference because a particular series of thoughts instantiates the modus ponens form. Over time the premises and conclusion of that particular token of a modus ponens argument become associated with each other through their continued use in that inference and now the thinker merely associates the premises with the conclusion. That is, the constant contiguity between the premises and the conclusion occurred because the inference was made so frequently, but the inference was originally made so frequently not because of the associative relations between the premises and conclusion, but because of the form of the thoughts (and the particular motivations of the thinker). This constant contiguity then formed the basis for an associative linkage between the premises and the conclusion.[24]

As was the case for associative structures, associative transitions in thought are not just a logical possibility. There are particular empirical differences associated with associative transitions versus inferential transitions (see section 6 of Quilty-Dunn et al. 2023). Associative transitions tend to move across different content domains, whereas inferential transitions tend to stay on a more focused set of contents. These differences have been seen to result in measurable differences in mood: associative thinking across topics bolsters mood when compared to logical thinking on a single topic (Mason and Bar 2012).

6. Associative Instantiation

The associationist position so far has been neutral on how associations are to be implemented. Implementation can be seen at a representational (that is, psychological) level of explanation, or at the neural level. A pure associationist picture would posit an associative implementation base at one, or both, of these levels.[25]

The most well-known associative instantiation base is a class of networks called Connectionist networks (see the entry on connectionism and section 10 below). Connectionist networks are sometimes pitched at the psychological level (see, e.g., Elman 1991; Elman et al. 1996; Smolensky 1988). This amounts to the claim that models of algorithms embedded in the networks capture the essence of certain mental processes, such as associative learning. Other times connectionist networks are said to be models of neural activity (“neural networks”). Connectionist networks consist in sets of nodes, generally input nodes, hidden nodes, and output nodes. Input nodes are taken to be analogs of sensory neurons (or sub-symbolic sensory representations), output nodes the analog of motor neurons (or sub-symbolic behavioral representations), and hidden nodes are stand-ins for all other neurons.[26] The network consists in these nodes being connected to each other with varying strengths. The topology of the connections gives one an associative mapping of the system, with the associative weights understood as the differing strengths of connections. On the psychological reading, these associations are functionally defined; on the neurological reading, they are generally understood to represent synaptic conductance (and are the analogs of dendrites).[27] Prima facie, these networks are purely associative and do not contain propositional elements, and the nodes themselves are not to be equated with single representational states (such as concepts; see, e.g., Gallistel and King 2009).
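The node-and-weight architecture just described can be illustrated with a minimal feedforward sketch. The weights below are arbitrary invented values (this is not a trained model), and the sigmoid activation is one common choice among many; the sketch only shows how activation spreads from input nodes through hidden nodes to output nodes along weighted connections:

```python
# A minimal connectionist network: input, hidden, and output nodes
# joined by weighted connections of varying strengths.

import math

def sigmoid(x):
    """Squash summed activation into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

# weights[i][j]: connection strength from node i in the lower layer
# to node j in the upper layer (arbitrary illustrative values)
w_input_hidden = [[0.8, -0.4],
                  [0.3, 0.9]]
w_hidden_output = [[1.2],
                   [-0.7]]

def layer(inputs, weights):
    """Propagate activation through one layer of weighted connections."""
    n_out = len(weights[0])
    return [sigmoid(sum(a * weights[i][j] for i, a in enumerate(inputs)))
            for j in range(n_out)]

def forward(inputs):
    hidden = layer(inputs, w_input_hidden)   # "all other neurons"
    return layer(hidden, w_hidden_output)    # "motor" output nodes

out = forward([1.0, 0.0])  # activate the first "sensory" node
assert len(out) == 1 and 0.0 < out[0] < 1.0
```

Note that no node here stands for a concept and no structure expresses a proposition; the topology of weighted connections is the associative map, which is what makes such networks a natural instantiation base for the pure associationist.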

However, a connectionist network can implement a classical Turing machine architecture (see, e.g., Fodor and McLaughlin 1990; Chalmers 1993). Many, if not most, adherents of classical computation, for example proponents of CTM, think that the brain is an associative network, one which implements a classical computational program. Some adherents of CTM do deny that the brain runs an associative network (see, e.g., Gallistel and King 2009, who appear to deny that there is any scientific level of explanation in which association is intimately involved), but they do so on separate empirical grounds and not because of any logical inconsistency with an associative brain implementing a classical mind.

When discussing an associative implementation base it is important to distinguish questions of associationist structure from questions of representational reality. Connectionists have often been followers of the Skinnerian anti-representationalist tradition (Skinner 1938). Because of the distributed nature of the nodes in connectionist networks, the networks have tended to be analyzed as associative stimulus/response chains of subsymbolic elements. However, the question of whether connectionist networks have representations which are distributed in patterns of activity throughout different nodes of the network, or whether connectionist networks are best understood as containing no representational structures at all, is orthogonal to both the question of whether the networks are purely associative or computational, and the question of whether the networks can implement classical architectures.

7. Relation between the Varieties of Association and Related Positions

These four types of associationism share a certain empiricist spiritual similarity, but are logically, and empirically, separable. The pure associationist who wants to posit the smallest number of domain-general mental processes will theorize that the mind consists of associative structures acquired by associative learning which enter into associative transitions and are implemented in an associative instantiation base. However, many hybrid views are available, and frequently different associationist positions become mixed and matched, especially once issues of empiricism, domain-specificity, and gradual learning arise. Below is a partial taxonomy of where some well-known theorists lie in terms of associationism and these other, often related doctrines.

Prinz (2002) and Karmiloff-Smith (1995) are examples of empiricist non-associationists. It is rare to find an associationist who is a nativist, but plenty of nativists have aspects of associationism in their own work. For example, even the arch-nativist Jerry Fodor maintains that intramodular lexicons contain associative structures (Fodor 1983). Similarly, there are many non-behaviorist (at least non-radical, analytic, or methodological behaviorist) associationists, such as Elman (1991), Smolensky (1988), Baeyens (De Houwer and Baeyens 2001), and modern day dual process theorists such as Evans and Stanovich (2013). It is quite difficult to find a non-associationist behaviorist, though Tolman approximates one (Tolman 1948). Elman and Smolensky also qualify as representationalist associationists, and Van Gelder (1995) as an anti-representationalist non-associationist. Karmiloff-Smith (1995) can be interpreted as, for some areas of learning, a proponent of gradual learning without being an associationist (some might also read contemporary Bayesian theorists, e.g., Tenenbaum et al. 2011 and Chater et al. 2006, as holding a similar position for some areas of learning). Rescorla (1988) and Heyes (2012) claim to be associationists who are pro step-wise, one-shot learning (though Rescorla sees his project as a continuation of the classical conditioning program, others see his data as grist for the anti-associationist, pro-computationalist mill; see Gallistel and King 2009; Quilty-Dunn and Mandelbaum 2019). Lastly, Tenenbaum and his contemporary Bayesian colleagues sometimes qualify as holding a domain-general learning position without it being associationist, though they are no foes of innate content, as they build many aspects of core cognition into their theoretical basis (see Tenenbaum et al. 2011; Carey 2009; Spelke 2022).[28]

8. Associationism in Social Psychology

Since the cognitive revolution, associationism’s influence has mostly died out in cognitive psychology and psycholinguistics. This is not to say that all aspects of associative theorizing are dead in these areas; rather, they have just taken on much smaller, more peripheral roles (for example, it has often been suggested that mental lexicons are structured, in part, associatively, which is why lexical decision tasks are taken to be facilitation maps of one’s lexicon). In other areas of cognitive psychology (for example, the study of causal cognition, see Gerstenberg et al. 2021), associationism is no longer the dominant theoretical paradigm, but vestiges of associationism still persist (see Shanks 2010 for an overview of associationism in causal cognition). Associationism is also still alive in the connectionist literature, as well as in the animal cognition tradition.

But the biggest contemporary stronghold of associationist theorizing resides in social psychology, an area which has traditionally been hostile to associationism (see, e.g., Asch 1962, 1969). The ascendance of associationism in social psychology has been a fairly modern development, and has caused a revival of associationist theories in philosophy (e.g., Gendler 2008). The two areas of social psychology that have seen the greatest renaissance of associationism are the implicit attitude and dual-process theory literatures. However, beginning in the late 2010s, social psychology has taken a critical look at associationist theories (e.g., Mann et al. 2019; Kurdi and Dunham 2021; Kurdi and Mandelbaum 2023).

8.1 Implicit Attitudes

Implicit attitudes are generally operationally defined as the attitudes tested on implicit tests such as the Implicit Association Test (Greenwald et al. 1998), the Affect Misattribution Procedure (Payne et al. 2005), the Sorted Paired Feature Task (Bar-Anan et al. 2009), and the Go/No-Go Association Task (Nosek and Banaji 2001). Implicit attitudes are contrasted with explicit attitudes, attitudes operationalized as the ones being probed when one gives an explicit response, such as a marking on a Likert scale, a feeling thermometer, or a free report. Such operationalizations leave open the question of whether there are any natural kinds to which explicit and implicit attitudes refer. In general, implicit attitudes are characterized as mental representations that are unavailable for explicit report and inaccessible to consciousness (Morris and Kurdi 2023; cf. Hahn et al. 2014; Berger 2020).

The default position among social psychologists is to treat implicit attitudes as if they are associations among mental representations (Fazio 2007), or among pairs of mental representations and valences. In particular, they treat implicit attitudes as associative structures which enter into associative transitions. Recently this issue has come under much debate. In an ever-expanding series of studies, De Houwer and his collaborators have sought to show that associative learning is, at base, relational, propositional contingency learning; i.e., that all putatively associative learning is in fact a nonautomatic learning process that generates and evaluates propositional hypotheses (Mitchell et al. 2009; De Houwer 2009, 2011, 2014, 2019; Hughes et al. 2019). Other researchers have also approached the question using learning as the entrance point to the debate, demonstrating that non-associative acquisition creates stronger attitudes than associative acquisition (Hughes et al. 2019). For example, one might demonstrate that learning through merely reading an evaluative statement creates a stronger implicit attitude than repeated associative exposures do (Kurdi and Banaji 2017, 2019; Mann et al. 2019). Other researchers have championed propositional models not based on learning, but instead based on how implicit attitudes change regardless of how they are acquired. For instance, Mandelbaum (2016) argued that logical/evidential interventions modulate implicit attitudes in predictable ways (e.g., double negations canceling each other out), while others have used diagnosticity to show that implicit attitudes update in a non-associationistic, propositional way (e.g., after reading a story about a man who broke into a building and appeared to ransack it, you learn that he jumped in to save people from a fire, and immediately change your opinion of the man from negative to positive; Mann and Ferguson 2015; Mann et al. 2017; Van Dessel et al. 2019).
(For more on implicit attitudes see the entry on implicit bias.) Perhaps the most probing work in this area has been that of Benedek Kurdi and colleagues, which has pitted associative against propositional models in both acquisition (Kurdi and Banaji) and change (Kurdi and Dunham 2021), finding very little work for associative models to accomplish.

8.2 Dual Process Theories

Associative structures and transitions are widely implicated in a particular type of influential dual-process theory. Though there are many dual-process theories in social psychology (see, e.g., the papers in Chaiken and Trope 1999, or the discussion in Evans and Stanovich 2013), the one most germane to associationism is also the most popular. It originates from work in the psychology of reasoning and is often also invoked in the heuristics and biases tradition (see, e.g., Kahneman 2011). It has been developed by many different psychological theorists (Sloman 1996; Smith and DeCoster 2000; Wilson et al. 2000; Evans and Stanovich 2013) and, in parts, taken up by philosophers too (see, e.g., Gendler 2008; Frankish 2009; see also some of the essays in Evans and Frankish 2009).

The dual-process strain most relevant to the current discussion posits two systems: one evolutionarily ancient intuitive system underlying unconscious, automatic, fast, parallel, and associative processing, the other an evolutionarily recent reflective system characterized by conscious, controlled, slow, “rule-governed” serial processes (see, e.g., Evans and Stanovich 2013). The ancient system, sometimes called “System 1”, is often understood to include a collection of autonomous, distinct subsystems, each of which is recruited to deal with distinct types of problems (see Stanovich 2011 for a discussion of “TASS—the autonomous set of systems”). Although theories differ on how System 1 interacts with System 2,[29] the theoretical core of System 1 is that its processing is essentially associative. As in the implicit attitude debate, dual-systems models have recently come under sustained critique (see Kruglanski 2013; Osman 2013; Mandelbaum 2016; De Houwer 2019), though they remain very popular.

9. Criticisms of Associationism

Associationism has been a dominant theme in mental theorizing for centuries. As such, it has garnered an appreciable amount of criticism.

9.1 Learning Curves

The basic associative learning theories imply, either explicitly or implicitly, slow, gradual learning of associations (Baeyens et al. 1995). The learning process can be summarized in a learning curve which plots the frequency (or magnitude) of the conditioned response as a function of the number of reinforcements (Gallistel et al. 2004: 13124). Mappings between CSs and USs are gradually built up over numerous trials (in the lab) or experiences (in the world). Gradual, slow learning has come under fire from a variety of areas (see section 9.3 and section 9.4.1). However, here we just focus on the behavioral data. In a series of works re-analyzing animal behavior, Gallistel (Gallistel et al. 2004; Gallistel and King 2009) has argued that although group-level learning curves do display the properties of being negatively accelerated and gradually developing, these curves are misleading because no individual’s learning curve has these properties. Gallistel has argued that learning for individuals is generally step-like, rapid, and abrupt. An individual’s move from a low level of responding to asymptotic responding is very quick. Sometimes, the learning is so quick that it is literally one-shot learning. For example, after analyzing multiple experiments on animal learning of spatial location, Gallistel writes,

The learning of a spatial location generally requires but a single experience. Several trials may, however, be required to convince the subject that the location is predictable from trial to trial. (Gallistel et al. 2004: 13130)

Gallistel argues that the reason the group learning curves look smooth and gradual is that there are large individual differences between subjects in terms of when the onset of the step-wise curves begins (Gallistel et al. 2004: 13125); in other words, different animals take different amounts of time for the learning to commence. The differences between individual subjects’ learning curves are determined by when the steps begin and not by the speed of the individual animal’s learning process. All individuals appear to show rapid rises in learning, but since each begins learning at a different time, when we average over the group, the rapid step-wise learning appears to look like slow, gradual learning (Gallistel et al. 2004: 13124).
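Gallistel’s averaging point can be illustrated with a small simulation (a sketch with made-up parameters, not a model of any particular dataset): each simulated subject jumps abruptly from a low response rate to asymptote at a random onset trial, yet the trial-by-trial group average rises gradually.

```python
import random

random.seed(0)

N_SUBJECTS = 200
N_TRIALS = 100

def individual_curve(onset, low=0.05, high=0.95):
    """Step-like learning: low responding before onset, asymptotic after."""
    return [low if t < onset else high for t in range(N_TRIALS)]

# Each subject has an abrupt, one-step curve with a random onset latency.
curves = [individual_curve(random.randint(10, 90)) for _ in range(N_SUBJECTS)]

# The group curve is the trial-by-trial average across subjects.
group = [sum(c[t] for c in curves) / N_SUBJECTS for t in range(N_TRIALS)]

# Every individual curve contains exactly one jump...
jumps = [sum(1 for t in range(1, N_TRIALS) if c[t] != c[t - 1]) for c in curves]
print(all(j == 1 for j in jumps))   # True

# ...but the averaged group curve creeps upward in many small increments.
increments = [group[t] - group[t - 1] for t in range(1, N_TRIALS)]
print(max(increments) < 0.1)        # True: no single large step survives averaging
```

The simulation makes vivid why group data alone cannot settle the gradual-versus-step-wise question: the smooth group curve is compatible with every individual learning abruptly.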

9.2 The Problem of Predication

The problem of predication is, at its core, a problem of how an associative mechanism can result in the acquisition of subject/predicate structures, structures which many theorists believe appear in language, thought, and judgment. The first major discussion of the problem appears in Kant (1781/1787), but variants of the basic Kantian criticism can be seen across the contemporary literature (see, e.g., Chomsky 1959; Fodor and Pylyshyn 1988; Fodor 2003; Mandelbaum 2013a; for the details of the Kantian argument see the entry on Kant’s Transcendental Argument).

For a pure associationist, association is “semantically transparent” (see Fodor 2003), in that it purports to add no additional structure to thoughts. When a simple concept X and a simple concept Y become associated, one acquires the associative structure X/Y. But X/Y has no additional structure on top of their contents. Knowing that X and Y are associated amounts to knowing a causal fact: that activating Xs will bring about the activation of Ys and vice versa. However, so the argument goes, some of our thoughts appear to have more structure than this: the thought birds fly predicates the property of flying onto birds. The task for the associationist is to explain how associative structures can distinguish a thinker who has a single (complex) thought birds fly from a thinker who conjoins two simple thoughts in an associative structure where one thought, birds, is immediately followed by another, fly. As long as the two simple thoughts are reliably causally correlated so that, for a thinker, activation of birds regularly brings about fly, then that thinker has the associative structure birds/fly. Yet it appears that the thinker hasn’t yet had the thought birds fly. The problem of predication is explaining how a purely associative mechanism could eventuate in complex thoughts. In Fodor’s terms, the problem boils down to how association, a causal relation among mental representations, can effect predication, a relation among intentional contents (Fodor 2003).

A family of related objections to associationism can be interpreted as variations on this theme. For example, problems of productivity, compositionality, and systematicity for associationist theorizing appear to be variants of the problem of predication (for more on these specific issues see the entries on the Language of Thought Hypothesis and on compositionality). If association doesn’t add any additional structure to the mental representations that get associated, then it is hard to see how it can explain the compositionality of thought, which relies on structures that specify relations among intentional contents. Compositionality requires that the meaning of a complex thought is determined by the meanings of its simple constituents along with their syntactic arrangement. The challenge to associationism is to explain how an associative mechanism can give rise to the syntactic structures necessary to distinguish a complex thought like birds fly from the temporal succession of two simple thoughts birds and fly. Since the compositionality of thought is posited to undergird the productivity of thought (thinkers’ abilities to think novel sentences of arbitrary lengths, e.g., green birds fly, giant green birds fly, cuddly giant green birds fly, etc.), associationism has problems explaining productivity.

Systematicity is the thesis that there are predictable patterns among the thoughts a thinker is capable of entertaining. Thinkers that can entertain thoughts of certain structures can always entertain distinct thoughts that have related structures. For instance, any thinker who can think a complex thought of the form “X transitive verb Y” can think “Y transitive verb X”.[30] Systematicity entails that we won’t find any thinker who can think only one of those two thoughts, in which case we could not find a person who could think audrey wronged max, but not max wronged audrey. Of course, these two thoughts have very different effects in one’s cognitive economy. The challenge for the associationist is to explain how the associative structure audrey/wronged/max can be distinguished from the structure max/wronged/audrey, while capturing the differences in those thoughts’ effects.

Associationists have had different responses to the problem. Some have denied that human thought is actually compositional, productive, and systematic, and some non-associationists have agreed with this critique. For example, Prinz and Clark claim “concepts do not compose most of the time” (2002: 62), and Johnson (2004) argues that the systematicity criterion is wrongheaded (see Aydede 1997 for extended discussion of these issues). Rumelhart et al. offer a connectionist interpretation of “schemata”, one which is intended to cover some of the phenomena mentioned in this section (Rumelhart et al. 1986). Others have worked to show that classical conditioning can indeed give rise to complex associative structures (Rescorla 1988). In defense of the associationist construal of complex associations, Rescorla writes,

Clearly, the animals had not simply coded the RH [complex] compound in terms of parallel associations with its elements. Rather they had engaged in some more hierarchical structuring of the situation, forming a representation of the compound and using it as an associate. (Rescorla 1988: 156)

Whether or not associationism has the theoretical tools to explain such complex compounds by itself is still debated (see, e.g., Fodor 2003; Mitchell 2009; Gallistel and King 2009; Quilty-Dunn and Mandelbaum 2019; Quilty-Dunn et al. 2023). Notably, recent work in deep learning suggests that connectionist models may be capable of exhibiting systematic compositional generalization. For example, Lake and Baroni (2023) found that neural networks trained through meta-learning—a process where models learn how to learn by training on a distribution of related tasks—can acquire human-like systematic generalization abilities, allowing them to correctly interpret novel combinations of familiar elements. While debate continues about whether such models truly capture the nature of human compositionality, these findings challenge the long-standing assumption that connectionist architectures cannot generalize systematically.

9.3 Word Learning

Multiple issues in the acquisition of the lexicon appear to cause problems for associationism. Some of the most well-known examples are reviewed below (for further discussion of word learning and associationism see Bloom 2000).

9.3.1 Fast Mapping

Children learn words at an incredible rate, acquiring around 6,000 words by age 6 (Carey 2010: 184). If gradual learning is the rule, then words too should be learned gradually across this time. However, this does not appear to be the case. Susan Carey discovered the phenomenon of “fast mapping”, the one-shot learning of a word (Carey 1978a, 1978b; Carey and Bartlett 1978). Her most influential example investigated children’s acquisition of “chromium” (a color word referring to olive green). Children were shown one of two otherwise identical objects, which differed only in color, and asked, “Can you get me the chromium tray, not the red one, the chromium one” (recited in Carey 2010: 2). All of the children handed over the correct tray at that time. When the children were later tested in differing contexts, more than half remembered the referent of “chromium”. These findings have been extended—for example, Markson and Bloom (1997) showed that they are not specific to the remembering of novel words, but also hold for novel facts.

Fast mapping poses two problems for associationism. The first is that the learning of a new word does not develop slowly, as would be predicted by proponents of gradual learning. The second is that in order for the word learning to proceed, the mind must have been aided by additional principles not given by the environment. Some of these principles, such as Markman’s (1989) taxonomic, whole object, and mutual exclusivity constraints, and Gleitman’s syntactic bootstrapping (Gleitman et al. 2005), imply that the mind does add structure to what is learned. Consequently, the associationist claim that learning is just a mapping of external contingencies without adding structure is imperiled.

Recent research complicates the critique that associationist models cannot account for fast mapping. For example, Wang et al. (2025) show that neural networks can develop one-shot word learning abilities through meta-learning—practicing word learning across many examples. Their models achieve efficient word learning without explicit structural constraints, using only human-scale child-directed language. However, this approach still requires training on the specific task of word learning itself, suggesting a middle ground: while pure associationism may be insufficient, structured associative learning through meta-learning might support fast mapping without requiring innate constraints.

9.3.2 Syntactic Category Learning

“Motherese”, the name of the type of language that infants generally hear, consists of simple sentences such as “Nora want a bottle?” and “Are you tired?”. These sentences almost always contain a noun and a verb. Yet the infant’s vocabulary massively over-represents nouns in the first 100 words or so, while massively under-representing verbs (never mind adjectives or adverbs, which almost never appear in the first 100 words infants produce; see, e.g., Goldin-Meadow, Seligman, and Gelman 1976). Even more surprising is that the over-representation of nouns relative to verbs holds even though

the incidence of each word (that is, the token frequency) is higher for the verbs than for the nouns in the common set used by mothers. (Snedeker and Gleitman 2004: 259, citing data from Sandhoffer, Smith, and Luo 2000)

Moreover, children hear a preponderance of determiners (“the” and “a”) but don’t produce them (Bloom 2000). These facts are not specific to English, but hold cross-culturally (see, e.g., Caselli et al. 1995). The disparity between the variation in the syntactic categories infants receive as input and those they produce as output is troublesome for associationism, insofar as associationism is committed to the learned structures (and the behaviors that follow from them) merely patterning what is given in experience.

9.4 Against the Contiguity Analysis of Associationism

Contiguity has been a central part of associationist analyses since the British Empiricists. In the experimental literature, the problem of figuring out the parameters needed for acquiring an association due to the contiguity of its relata has sometimes been termed the problem of the “Window of Association” (e.g., Gallistel and King 2009). Every associationist theory has to specify the temporal window within which two properties must be instantiated in order for those properties to become associated.[31] A related problem for contiguity theorists is that if the domain generality of associative learning is desired, then the window needs to be homogeneous across content domains. The late 1960s saw persuasive attacks on domain generality, as well as on the necessity and sufficiency of the contiguity criterion in general.

9.4.1 Against the Necessity of Contiguity

Research on “taste aversions” and “bait-shyness” revealed a variety of problems with contiguity in the associative learning tradition of classical conditioning. Garcia observed that a gustatory stimulus (e.g., drinking water or eating a hot dog), but not an audiovisual stimulus (a light and a sound), would naturally become associated with feeling nauseated. For instance, Garcia and Koelling (1966) paired an audiovisual stimulus (a light and a sound) with a gustatory stimulus (flavored water). The two stimuli were then paired with the rats receiving radiation, which made the rats nauseated. The rats associated the feeling of nausea with the water and not with the sound, even though the sound was contiguous with the water. Moreover, the delay between ingesting the gustatory stimulus and feeling nauseated could be quite long, with the feeling not coming on until 12 hours later (Roll and Smith 1972), and the organism needn’t even be conscious when the negative feeling arises (for a review, see Seligman 1970; Garcia et al. 1974). The temporal delay shows that the CS (the flavored water) needn’t be contiguous with the US (the feeling of nausea) in order for learning to occur, thus showing that contiguity isn’t necessary for associative learning.

Garcia’s work also laid bare the problems with the domain-general aspect of associationism. In the above study the rat was prepared to associate the nausea with the gustatory stimulus, but would not associate it with the audiovisual stimulus. However, if one changes the US from feeling nauseated to receiving shocks in perfect contiguity with the audiovisual and gustatory stimuli, then the rats will associate the shocks with the audiovisual stimulus but not with the gustatory stimulus. That is, rats are prepared to associate audiovisual stimuli with the shock but are contraprepared to associate the shocks with the gustatory stimulus. Thus, learning does not seem to be entirely domain-general (for similar content specificity effects in humans, see Baeyens et al. 1990).[32]

Lastly, “The Garcia effect” has also been used to show problems with the learning curve (see section 9.1). “Taste aversions” are the phenomena whereby an organism gets sick after ingesting a stimulus and the taste (or odor; Garcia et al. 1974) of that stimulus gets associated with the feeling of sickness. As anyone who has had food poisoning can attest, this learning can proceed in a one-shot fashion, and needn’t show a gradual rise over many trials (taste aversions have also been observed in humans; see, e.g., Bernstein and Webster 1980; Bernstein 1985; Logue et al. 1981; Rozin 1986).

9.4.2 Against the Sufficiency of Contiguity

Kamin’s famous blocking experiments (1969) showed that not all contiguous structures lead to classical conditioning. A rat that has already learned that CS1 predicts a US will not learn that a subsequent CS2 predicts the US, if the CS2 is always paired with the CS1. Suppose that a rat has learned that a light predicts a shock because of the constant contiguity of the light and shock. After learning this, the rat has a sound introduced which only arises in conjunction with the light and the shock. As long as the rat had previously learned that the light predicts the shock, it will not learn that the sound does (as can be seen on later trials that present the sound alone). In sum, having learned that the CS1 predicts the US blocks the organism from learning that the CS2 predicts the US.[33] So even though CS2 is perfectly contiguous with the US, the association between CS2 and the US remains unlearned, thus serving as a counterexample to the sufficiency of contiguity.[34]

Similarly, Rescorla (1968) demonstrated that a CS can appear only when the US appears and yet the association between them can still be unlearnable. If a tone is arranged to bellow only when there are shocks, but there are still shocks when there are no tones (that is, the CS only appears with the US, but the US sometimes appears without the CS), no associative learning between the CS and the US will occur. Instead, subjects (in Rescorla 1968, rats) will only learn a connection between the shock and the experimental situation—e.g., the room in which the experiment is carried out.

In large part because of the problems discussed in 9.4, many classical conditioning theorists gave up the traditional program. Some, like Garcia, appeared to give up the classical theoretical framework altogether (Garcia et al. 1974); others, such as Rescorla and Wagner, tried to usher the framework into the modern era (see Rescorla and Wagner 1972; Rescorla 1988), where conditioning is seen as sensitive to base rates and driven by informational pick-up.[35] The Rescorla-Wagner model, for example, proposes that learning occurs when there is a discrepancy between what is expected and what actually occurs—known as a prediction error (Rescorla and Wagner 1972). The model explains blocking as follows: once CS1 fully predicts the US, no prediction error occurs when CS2 is added, preventing new learning. It also accounts for the insufficiency of contiguity by showing that mere co-occurrence is less important than the information stimuli provide about outcomes. The model’s emphasis on prediction error has been influential beyond associative learning, informing computational models of dopamine function (Schultz et al. 1997) and contemporary reinforcement learning algorithms (Sutton and Barto 1998; see section 10). The shift from simple contiguity to prediction error illustrates a tension in the evolution of associationism: critics like Fodor and Gallistel argue that adding mechanisms like error correction effectively abandons associationism’s core commitment to parsimony, while defenders see such additions as necessary refinements that preserve the spirit of associative explanation. Whether this movement is interpreted as a substantive revision of classical conditioning (Rescorla 1988; Heyes 2012) or a wholesale abandoning of it (Gallistel and King 2009) is debatable.
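The Rescorla-Wagner update rule is simple enough to state and simulate. On each trial, the change in associative strength for every CS present is ΔV = αβ(λ − ΣV), where ΣV sums the strengths of all present CSs and λ is the maximum strength the US supports. The following minimal simulation (the learning-rate and trial-count values are illustrative choices, not drawn from any dataset) reproduces Kamin-style blocking:

```python
def rescorla_wagner_trial(V, present, lam, alpha_beta=0.3):
    """One trial: update each present CS by delta_V = alpha*beta*(lambda - sum_V)."""
    total = sum(V[cs] for cs in present)   # combined prediction of the US
    error = lam - total                    # prediction error
    for cs in present:
        V[cs] += alpha_beta * error

V = {"light": 0.0, "tone": 0.0}

# Phase 1: light alone is paired with the shock (lambda = 1) to near-asymptote.
for _ in range(30):
    rescorla_wagner_trial(V, ["light"], lam=1.0)

# Phase 2: light + tone compound, still followed by the shock.
for _ in range(30):
    rescorla_wagner_trial(V, ["light", "tone"], lam=1.0)

print(round(V["light"], 2))   # close to 1.0
print(round(V["tone"], 2))    # close to 0.0: blocked, despite perfect contiguity
```

Because the light already predicts the shock by the end of phase 1, the prediction error in phase 2 is nearly zero, so the tone acquires almost no associative strength: contiguity without informational value produces no learning.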

9.5 Coextensionality

The Rescorla experiment also demonstrates another problem in associative theorizing: the question of why some property is singled out as a CS as opposed to different, equally contemporaneously instantiated properties. Put a different way, one needs a principle to say what the “same situation” amounts to in generalizations such as Thorndike’s laws. For instance, if a CS and a US, say a tone and a shock, are perfectly paired so that they are either both present or both absent, the organism won’t associate the location where it received shocks (e.g., the experimental setting) with getting shocked; it will just associate the tone with the shocks. But in the condition where the US occurs without the CS, while the CS does not occur without the US, the organism will gain an association between the shocks and the location. However, in both cases the location is present on every trial.[36] In contrast to shocks, x-ray radiation, when used as a US, never appears to become associated with location, even if the two are always perfectly paired (Garcia et al. 1972).[37]

The problem of saying which properties become associated when multiple properties are coinstantiated sometimes goes by the name of the “Credit Assignment Problem” (see, e.g., Gallistel and King 2009, and below in section 10.2.3).[38] Some would argue that this problem is a symptom of a larger issue: trying to use extensional criteria to specify intentional content (see, e.g., Fodor 2003). Associationists need a criterion to specify which of the coextensive properties will in fact be learned, and which will not.

An additional worry stems from the observation that sometimes the lack of a property being instantiated is an integral component of what is learned. To deal with the problem of missing properties, contemporary associationists have introduced an important element to the theory: inhibition. For example, if a US and a CS each only appear when the other is absent, the organism will learn that a negative relationship holds between them; that is, the organism will learn that the absence of the CS predicts the US.[39] Here the CS becomes a “conditioned inhibitor” of the US. Inhibition, which uses associations as modulators and not just activators, is a central part of current associationist thinking. For example, in connectionist networks, inhibition is implemented by the activation of certain nodes inhibiting the activation of other nodes. Connection weights can be positive or negative, with a negative weight standing in for the inhibitory strength of the association.

Various solutions to the coextensionality problem have been proposed by associationists. Mackintosh (1975) developed a selective attention model in which organisms learn through experience which stimuli are most predictive of important outcomes and dynamically shift attention to these cues. Attention increases to stimuli that are better predictors than other available stimuli and decreases to poorer predictors. This selective attention process helps explain both apparently sudden learning (as attention rapidly shifts) and why only certain co-extensive properties become associated (because attention selectively focuses on the most predictive cues). Pearce’s (1987) configural theory offers another associationist solution. Rather than representing stimuli as separate elements that independently associate with outcomes, Pearce proposed that organisms form representations of entire stimulus configurations. These configural representations can then associate with outcomes, and similar configurations will produce generalization of responding proportional to their similarity. This approach addresses the problem of which co-occurring features become associated by treating stimulus combinations as unique representational wholes.

10. Associationism and Reinforcement Learning

Reinforcement learning (RL) is a computational approach to understanding how agents learn optimal behavior through interaction with their environment. At its core, RL can be seen as formalizing the core problem of associationist theories of learning: how an agent learns to select beneficial actions by associating stimuli and responses based on their experienced consequences. Unlike other machine learning frameworks such as supervised learning, which relies on labeled examples, or unsupervised learning, which finds patterns in unlabeled data, RL involves learning through direct interaction with an environment and feedback about chosen actions. This trial-and-error approach allows agents to discover behavior that maximizes cumulative reward over time, even when the relationship between actions and their long-term consequences is initially unknown.

There is a direct lineage between associationist theories of learning and the development of modern RL. As discussed in section 3, Thorndike’s Law of Effect proposed that organisms learn by repeating behaviors followed by positive outcomes and avoiding those followed by negative outcomes, with the strength of stimulus-response connections depending on the perceived outcomes of the response. This emphasis on trial-and-error learning and the gradual strengthening of successful stimulus-response associations directly influenced early artificial intelligence researchers. In 1948, Alan Turing described one of the earliest designs for implementing trial-and-error learning in a computer—a ‘pleasure-pain system’ whose initially random decisions when faced with an undetermined choice would be canceled by ‘pain’ stimuli and made permanent by ‘pleasure’ stimuli (Turing 1948). Several early mechanical learning devices were inspired by similar ideas. In 1951, Marvin Minsky built the Stochastic Neural Analog Reinforcement Calculator (SNARC)—one of the first artificial neural network machines—in close consultation with Skinner himself (Minsky 1952). SNARC implemented a form of RL using a network of 40 artificial synapses based on Hebbian principles. It simulated rats running through mazes, with each synapse maintaining a probability of signal propagation that could be modified by a manually delivered reward. When the simulated rat reached its goal, a mechanical system would strengthen the recently active connections based on operant conditioning. These early efforts to implement mechanical learning by trial-and-error through reinforcement helped establish ideas that would later be formalized in modern RL algorithms.

The development of RL theory has both operationalized and extended associationist principles. While it maintained the fundamental idea that learning occurs through experience-dependent modification of associations, RL also introduced more sophisticated learning mechanisms for addressing challenges that simple associationism struggled to explain, and incorporated insights from other fields such as optimal control theory. In what follows, we will review some of the key innovations introduced by RL, how they relate to the limitations of traditional associationist learning, and their broader philosophical implications.

10.1 An overview of RL

RL models intelligent behavior as an interactive process between an agent and its environment. The agent—which could be anything from a chess-playing program to a robot—learns through direct experience by perceiving aspects of its environment, taking actions, and receiving feedback about their consequences. The environment represents everything external to the agent with which it can interact but whose responses it cannot directly control. When the agent takes an action, the environment responds by transitioning to a new situation and providing evaluative feedback in the form of a reward signal that indicates how well the agent is progressing toward its goals.
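The perceive-act-feedback loop just described can be sketched in a few lines of code. The toy environment and names below (ToyEnv, step) are purely illustrative assumptions, not any standard RL library API: the agent observes a state, selects an action, and the environment answers with a new state and a reward.

```python
class ToyEnv:
    """A hypothetical two-state environment: action 1 moves to state 1, which pays off."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state = action  # the chosen action determines the next state
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward

env = ToyEnv()
total_reward = 0.0
for _ in range(5):
    state = env.state                 # perceive the current state
    action = 1                        # act (a fixed policy, for illustration)
    state, reward = env.step(action)  # environment responds with state + reward
    total_reward += reward
```

Everything the agent learns in RL is driven by this loop: the reward signal is the only feedback it receives about the quality of its choices.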

The agent’s perception of its environment at any moment is captured by the notion of state. States can be fully observable, where the agent has complete information about its current situation, or partially observable, where some relevant information remains hidden. For instance, in chess, the current board position is fully observable, while an agent exploring a maze can only observe part of the environment. The information available in the current state determines which actions are possible for the agent to take.

The agent’s interactions with its environment occur through actions, such as moving left or right in a maze or selecting moves in chess. After each action, the environment provides the agent with a reward signal—a scalar value that indicates the immediate desirability of the agent’s choice. This reward signal can be sparse (occurring infrequently) or dense (provided frequently), and may be positive or negative. It is both evaluative and sequential: it indicates the desirability of outcomes rather than prescribing correct actions, and the consequences of actions may only become apparent after multiple steps of interaction. The reward signal is fundamental to RL as it defines what constitutes success for the agent: it allows the agent to learn which actions are beneficial without requiring explicit instruction about optimal strategies.

Another important component of RL is the policy, which represents the agent’s strategy for selecting actions in different situations. More formally, it maps the agent’s “perceived” states (i.e., its observations of the environment) to actions, either deterministically (always choosing the same action in a given state) or probabilistically (selecting actions according to learned probabilities). The policy can be implemented through various methods, from simple lookup tables to sophisticated neural networks that can handle complex state representations.[40] The fundamental goal of RL is to discover a policy that maximizes the agent’s accumulated rewards over time.
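Both kinds of policy can be illustrated with a short sketch. The states, actions, and value numbers are hypothetical: a deterministic policy is just a lookup table from states to actions, while a probabilistic (here, epsilon-greedy) policy mostly picks the highest-valued action but occasionally explores at random.

```python
import random

# Deterministic policy: a fixed mapping from perceived states to actions.
deterministic_policy = {"start": "left", "corridor": "forward"}

def act_deterministic(state):
    return deterministic_policy[state]

def act_epsilon_greedy(state, q_values, epsilon=0.1, rng=random):
    """With probability epsilon, explore uniformly; otherwise exploit the best-valued action."""
    actions = list(q_values[state])
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[state][a])

# Illustrative learned action values for one state.
q = {"start": {"left": 0.2, "right": 0.7}}
```

With epsilon set to zero the epsilon-greedy policy becomes deterministic, always choosing "right" here; raising epsilon trades off exploitation for exploration.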

To make good decisions, the agent needs to evaluate not just immediate rewards but also the long-term consequences of its actions. This evaluation is captured by value functions, which estimate the total rewards the agent can expect to accumulate from a given state or state-action pair when following a particular policy. Value functions account for both immediate rewards and anticipated future rewards, with future rewards typically discounted to reflect their uncertainty and temporal distance. For example, when considering a move in chess, the value function helps the agent assess not just the immediate strength of its position but also its prospects for eventual victory. By learning accurate value functions, the agent can make decisions that optimize for long-term success rather than just immediate advantages.
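The discounting described above can be made precise with a small sketch. A value function estimates the expected discounted return, i.e., the sum of future rewards each weighted by a discount factor gamma between 0 and 1; the reward sequences below are illustrative.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# A delayed payoff is worth less than an immediate one of the same size:
immediate = discounted_return([1.0, 0.0, 0.0])  # reward now
delayed = discounted_return([0.0, 0.0, 1.0])    # same reward, two steps later
```

Here the immediate reward contributes its full value (1.0), while the same reward two steps in the future is discounted to gamma squared (0.81), reflecting its uncertainty and temporal distance.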

RL has proven remarkably successful across various domains of artificial intelligence, such as game playing and robotic control. For example, RL systems have surpassed human expertise in increasingly complex board games. Earlier RL systems mastered simpler games like backgammon (Tesauro 1994), while more recent approaches have achieved superhuman performance in chess and Go (Silver et al. 2016, 2018)—the latter being particularly significant given the game’s strategic complexity. This progress has extended to imperfect information games, with systems achieving expert-level performance in poker (Brown and Sandholm 2019) and multiplayer strategy games like StarCraft II (Vinyals et al. 2019). In arcade-style video games, RL agents have learned to play dozens of Atari games at human-level performance or better, using only raw pixel inputs and game scores as feedback (Mnih et al. 2015). In robotics, RL has enabled significant advances in both locomotion and manipulation tasks. Quadrupedal robots have learned to navigate difficult terrain and maintain balance (Lee et al. 2020), while robotic arms have mastered precise manipulation tasks such as in-hand object manipulation (OpenAI et al. 2019).

These achievements suggest that associative learning principles, when implemented in sophisticated computational systems, can give rise to behavior that appears goal-directed and strategic. During its match against Go champion Lee Sedol in March 2016, for example, AlphaGo made a surprising and decisive move (move 37) that no human player would have considered making.[41] This move has been widely discussed as evidence that RL training allows game-playing agents to come up with original strategic decisions that go beyond mimicking human play patterns. In fact, professional human Go players have since improved their own strategies by studying the decision-making process of RL-based Go-playing programs—including win probability calculations and expected optimal move sequences for different possible moves (Shin et al. 2021).

It should be noted, however, that some game-playing systems like AlphaGo combine neural networks with a traditional search algorithm to explore and evaluate possible move sequences before committing to actions. This hybrid architecture suggests that while RL is important for learning strategic patterns, the addition of explicit forward planning through search may be important for enabling creative problem-solving that goes beyond the training data. As such, these systems’ ability to produce original moves largely results from exploring vast possibility spaces within their specialized Go model rather than from the kind of flexible, generalizable reasoning that allows humans and some animals to creatively solve novel problems through understanding abstract causal principles (Halina 2021).

Pure RL methods also traditionally face several challenges. First, RL agents often require massive amounts of learning episodes to learn good policies. To achieve human-level performance on Atari video games, for example, Mnih et al. (2015) had to train their agent on 50 million frames—the equivalent of 38 days of playing time—for every single game. Second, RL agents typically need to be trained separately for each specific task, with limited ability to transfer knowledge between different problems; for instance, an agent trained to excel at one Atari game generally cannot perform well on other games without extensive retraining from scratch. Third, RL systems are often limited to relatively simple and constrained environments like games with well-defined rules and objectives, and have more difficulty handling the multidimensional and unstructured nature of real-world tasks. As we will see, recent research has made significant progress in addressing these challenges with more sophisticated RL methods. For example, RL systems can now achieve human-level performance on many Atari games with less than two hours of play (Schwarzer et al. 2023). Robotics has also made progress in applying RL to real-world tasks by bridging the so-called “sim-to-real” gap, allowing agents trained in simulation to transfer their skills to physical robots (Ju et al. 2022).

10.2 How RL extends classical associationism

RL shares associationism’s fundamental premise that learning occurs through an agent’s interactions with its environment based on its causal history. Just as associationism proposes that mental states become associated through experienced contingencies, RL algorithms learn by forming associations between states, actions, and rewards through repeated environmental interactions. However, RL provides a more precise computational framework for understanding how these associations form and influence behavior. In fact, modern RL algorithms extend associationism in ways that partially address some of the limitations of associationist theories of learning reviewed in section 9.

10.2.1 Prediction and control

Like associationism, RL addresses two fundamental aspects of learning: prediction (learning to anticipate future events) and control (learning appropriate behavioral responses). In associationist theories of learning, these correspond respectively to classical (Pavlovian) conditioning, where organisms learn predictive relationships between stimuli, and instrumental conditioning, where organisms learn to select actions based on their consequences. Reinforcement learning provides precise computational mechanisms that implement and extend both forms of associative learning (Sutton & Barto 2018).

For prediction learning, the RL method known as temporal-difference (TD) learning formalizes how agents learn to anticipate future events based on current stimuli (Sutton 1988). TD learning allows agents to learn value functions through direct interaction with an environment, without requiring a model of that environment. The key idea is that TD learning updates value estimates based on the difference between temporally successive predictions, rather than waiting for the final outcome. Specifically, TD learning uses the current reward and the estimated value of the next state to update the value estimate of the current state (a process called “bootstrapping”). This means that TD learning can learn online, updating estimates at each time step, rather than having to wait until the end of a learning episode (for related issues see section 10.2.6).
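The bootstrapping update can be written out in a minimal TD(0) sketch; the states and numbers are illustrative assumptions. The value of the current state is nudged toward the current reward plus the discounted value of the next state, so learning happens at each step rather than at the end of the episode.

```python
def td0_update(V, s, r, s_next, alpha=0.5, gamma=0.9):
    """Nudge V[s] toward r + gamma * V[s_next]; returns the TD error."""
    td_error = r + gamma * V[s_next] - V[s]  # temporal-difference error
    V[s] += alpha * td_error                 # alpha is the learning rate
    return td_error

V = {"A": 0.0, "B": 1.0}  # state B is already known to be valuable
err = td0_update(V, "A", r=0.0, s_next="B")
# A's value rises toward gamma * V(B), even though no reward occurred yet:
# the estimate of B is used to update the estimate of A (bootstrapping).
```

With these numbers, the TD error is 0.9 and V("A") moves from 0.0 to 0.45, illustrating how value propagates backwards from predictive states without waiting for the final outcome.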

Like classical conditioning, TD learning updates predictions when actual outcomes differ from expected ones. However, TD learning goes beyond simple stimulus-stimulus associations by incorporating mechanisms that can bridge temporal gaps between predictive cues and outcomes. This allows TD learning to account for phenomena like second-order conditioning, where previously conditioned stimuli can themselves act as reinforcers—which eluded simpler associationist models.

For control learning, RL implements the associationist principle that behaviors become associated with situations based on their consequences. However, rather than just forming simple stimulus-response associations, RL agents learn value functions that estimate the long-term cumulative reward expected from different actions in different situations. This provides a more sophisticated mechanism for behavioral control that can account for both habitual responses (through model-free learning of action values) and goal-directed behavior (through model-based planning, see section 10.2.6 below).

10.2.2 Beyond simple contiguity

A central tenet of classical associationism is that temporal contiguity—the close temporal proximity of stimuli or events—is necessary for forming associations. This assumption faced significant empirical challenges, particularly from phenomena like taste aversion learning, where organisms form strong associations despite long delays between stimuli and consequences (see section 9.4). RL provides several mechanisms that explain how learning can occur without strict temporal contiguity.

TD learning enables learning across temporal gaps by comparing predictions at successive time steps rather than waiting for final outcomes. Unlike classical associationism’s requirement for immediate temporal relationships, TD learning can propagate learning backwards through time by “bootstrapping” from intermediate predictions. This allows the system to bridge temporal gaps that posed problems for traditional associationist theories. Schultz et al. (1997) showed that dopamine neuron activity closely matches TD prediction errors, lending some credibility to TD learning as a biological learning mechanism mediated by dopamine signaling—although this hypothesis remains disputed (Namboodiri 2024).

Eligibility traces provide another mechanism for handling temporal gaps in learning. In classical conditioning, Hull’s notion of “stimulus trace” refers to a short-term memory of a conditioned stimulus that persists in the subject’s mind even after the physical stimulus has ended, allowing learning to occur despite gaps between the conditioned and unconditioned stimuli. In RL, eligibility traces serve as a distinct mechanism that tracks which states or stimuli were recently experienced and are therefore “eligible” for learning updates, without affecting behavioral responses, enabling more efficient learning across temporal delays. Eligibility traces thus create temporally extended records of past states and actions that can be updated when feedback eventually arrives, acting as a form of temporary memory that allows credit or blame to be assigned to events that occurred significantly earlier in time. This provides an additional computational mechanism for learning problems that are difficult to address under strict contiguity requirements.[42]
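A minimal, TD(lambda)-style sketch of this mechanism (with illustrative states and parameters): each visited state keeps a decaying "recency" trace, and when a prediction error finally arrives, every recently visited state is updated in proportion to its trace, so earlier events still receive credit.

```python
def update_with_traces(V, trace, visited, td_error, alpha=0.5, gamma=0.9, lam=0.8):
    """Mark `visited` as eligible, then credit all eligible states for the error."""
    trace[visited] = trace.get(visited, 0.0) + 1.0  # refresh trace for current state
    for s in trace:
        V[s] = V.get(s, 0.0) + alpha * td_error * trace[s]  # credit scales with recency
        trace[s] *= gamma * lam                              # traces fade over time

V, trace = {}, {}
update_with_traces(V, trace, "A", td_error=0.0)  # visit A; no error yet
update_with_traces(V, trace, "B", td_error=1.0)  # error arrives while in B...
# ...and A, visited one step earlier, also receives credit via its decayed trace.
```

After the second update, B gets the full-strength update (0.5) while A gets a smaller one (0.36) through its faded trace, showing how credit reaches back across the temporal gap.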

10.2.3 The credit assignment problem

Classical associationism faced what we called the “coextensionality problem” (section 9.5), also known in AI as the “credit assignment problem” (Minsky 1961): when multiple stimuli are present simultaneously, how does the system determine which ones should become associated with subsequent outcomes? This problem manifests both spatially (which of multiple concurrent stimuli matter) and temporally (which past events caused current outcomes). Modern RL provides computational solutions to tackle both of these credit assignment challenges.

TD learning addresses temporal credit assignment by propagating error signals backwards through time based on differences between successive predictions. When an outcome occurs, the system can update not just recent events but also states and actions from further in the past, weighted by their temporal distance through eligibility traces. This provides a principled mechanism for determining which past events contributed to current outcomes, though only for events within the system’s hypothesis space. TD learning doesn’t inherently solve the feature selection aspect of the coextensionality problem—distinguishing genuinely relevant features from spurious correlations. Modern RL typically addresses this through inductive biases that favor simpler hypotheses, though some recent approaches like causal RL attempt to directly identify genuine causal relationships (Bareinboim et al. 2024).

Value functions in RL help solve the simultaneous credit assignment problem by learning to predict the long-term consequences of different states and actions. Through experience, the system learns which aspects of the current situation are predictive of future outcomes, effectively determining which stimuli deserve credit for results. Insofar as TD learning is biologically plausible, this may help explain blocking effects in classical conditioning—when a stimulus fails to acquire associative strength because another stimulus already predicts the outcome perfectly.
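Blocking falls out of error-driven updating, which can be illustrated with a Rescorla-Wagner-style sketch (the one-step relative of TD learning); cue names and parameters are illustrative. All cues present on a trial share one prediction error, so once cue A fully predicts the outcome, there is almost no error left for cue B to absorb during compound training.

```python
def rw_trial(strengths, present_cues, outcome, alpha=0.3):
    """One error-driven update: all present cues share the prediction error."""
    prediction = sum(strengths[c] for c in present_cues)
    error = outcome - prediction      # how surprising was the outcome?
    for c in present_cues:
        strengths[c] += alpha * error

w = {"A": 0.0, "B": 0.0}
for _ in range(50):
    rw_trial(w, ["A"], outcome=1.0)        # phase 1: A alone predicts the outcome
for _ in range(50):
    rw_trial(w, ["A", "B"], outcome=1.0)   # phase 2: A+B compound, same outcome
# A's strength approaches 1.0; B stays near 0 because A leaves no error: blocking.
```

After phase 1 the prediction error on compound trials is already near zero, so B acquires almost no associative strength despite being perfectly paired with the outcome.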

10.2.4 Rapid and gradual learning

Traditional associationist theories of learning imply that associations could only be formed through slow, incremental strengthening via repeated exposure to stimulus pairings, in contrast with evidence that individual learning is often rapid and step-like. Modern RL offers a new perspective on this apparent tension between rapid and gradual learning. Rather than viewing them as competing accounts, some RL methods suggest that rapid learning capabilities can emerge from and depend upon slower learning processes. For example, in meta-reinforcement learning, a “slow” outer loop of learning gradually tunes the parameters of a neural network through extensive experience across many related tasks (Schweighofer & Doya 2003). This slow learning process shapes the network’s dynamics to implement a “fast” inner loop of learning that can rapidly adapt to new situations within a familiar task domain. The fast learning capabilities emerge precisely because the slow outer loop has discovered useful regularities and inductive biases that constrain and guide learning in new situations. This is behaviorally similar to how human subjects, after solving many puzzles of a certain type, become increasingly quick at solving new puzzles of the same kind—not because they memorized specific solutions, but because they’ve learned general problem-solving strategies for that domain. The success of meta-RL in modeling flexible behavior challenges the view that associative learning is inherently inflexible and unable to account for rapid adaptation.

Another method that leverages both gradual and fast learning is episodic RL, which draws inspiration from biological episodic memory systems—particularly the hippocampus’s role in memory consolidation through replay (Gershman & Daw 2017). Episodic RL combines traditional RL with an episodic memory system to improve learning efficiency and performance. It allows the agent to store past experiences as discrete episodes, typically represented as sets containing the state, action taken, reward received, and resulting next state. When the agent encounters a new situation, it can draw on past experiences to compute the value of possible actions based on the recorded action values for similar states. While the system can immediately leverage memories to inform decisions in new situations, the effectiveness of this process depends on having gradually learned appropriate representations that make meaningful similarity comparisons possible.[43] The rapid deployment of episodic memories thus builds upon slower processes that shape how experiences are encoded and compared.
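The episodic lookup can be sketched as follows, with a hypothetical store of episodes and a similarity kernel chosen for illustration: the value of an action in a new state is estimated as a similarity-weighted average of the rewards recorded for that action in stored episodes.

```python
import math

# Stored episodes: (state vector, action taken, reward received). Illustrative data.
episodes = [
    ((1.0, 0.0), "left", 1.0),
    ((0.9, 0.1), "left", 1.0),
    ((0.0, 1.0), "right", 0.0),
]

def similarity(s1, s2):
    """Gaussian kernel on squared Euclidean distance between state vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return math.exp(-d2)

def episodic_value(state, action):
    """Similarity-weighted average reward of stored episodes for this action."""
    num = den = 0.0
    for s, a, r in episodes:
        if a == action:
            w = similarity(state, s)
            num += w * r
            den += w
    return num / den if den else 0.0

# A state near the rewarded "left" episodes immediately inherits their value.
v_left = episodic_value((0.95, 0.05), "left")
```

Note that the same similarity computation applies to any state vectors whatsoever; what must be learned gradually is the representation (here, the raw vectors) over which similarity is computed.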

Episodic RL is typically combined with experience replay, which allows the agent to sample and replay past experiences during training, breaking the correlation between consecutive training samples and enabling the agent to learn more efficiently. This method was instrumental in training RL agents that match human-level performance at Atari games (Mnih et al. 2013, 2015). While basic experience replay samples past experiences at random, more advanced methods prioritize which experiences to replay based on their potential learning value (Schaul et al. 2016).
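A basic replay buffer of the kind just described can be sketched in a few lines; the class name and capacity are illustrative, not any particular library's API. Transitions are stored as (state, action, reward, next state) tuples and sampled uniformly at random during training, which breaks the correlation between consecutive samples.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; oldest experiences are evicted first."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw a random minibatch, decorrelating consecutive experiences."""
        return rng.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(10):
    buf.add(t, "a", float(t), t + 1)   # store ten sequential transitions
batch = buf.sample(4)                  # train on a decorrelated random batch
```

Prioritized variants replace the uniform `sample` with one that draws transitions in proportion to an estimate of how much can be learned from them.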

It has been suggested that meta-RL and episodic RL with experience replay could help explain how organisms can exhibit both rapid, one-shot learning in familiar domains while still requiring extensive experience to master entirely novel types of tasks (Botvinick et al. 2019).[44] Both methods suggest that associative learning can in principle operate simultaneously across multiple timescales, with slower processes laying the groundwork for faster forms of learning. Meta-RL in particular suggests that associative learning principles may play an important role not just in forming specific associations, but in shaping how organisms learn to learn (Sandbrink & Summerfield 2024). When organisms show increasingly rapid learning of new problems within a domain, this may reflect the gradual tuning of learning mechanisms themselves through meta-learning processes, rather than simple stimulus-response associations.

10.2.5 Content specificity

Connectionist models learn from specific input-output mappings in their training data through associative mechanisms. As such, they implement content-specific computations: computations that are faithful to content only because of the specific contents represented at input and output (Shea 2023). For example, a neural network trained to classify images might learn to map certain patterns of edges and textures to the label “dog”, but this tells us nothing about how it should classify images of cats or trees. Likewise, purely associative transitions in psychology are content-specific (Quilty-Dunn & Mandelbaum 2019). By contrast, a non-content-specific computation is a computational process that operates in the same way regardless of the particular content of the representations it takes as input. For example, rules of logical inference work the same way regardless of the specific concepts involved; as such, inferential transitions are non-content-specific.

The ability to perform non-content-specific computations allows for more flexible and generalizable processing, and is traditionally taken to elude connectionist models, including most RL systems. However, Shea (2023) argues that episodic RL systems implement non-content-specific computations. When an episodic RL system encounters a new state, it computes the similarity between that state and all previously stored episodes using the same algorithm, regardless of what specific states are being compared. This is a departure from classical associationism’s reliance on content-specific transitions, where the relationship between two states depends entirely on their specific contents and learning history.

This feature of episodic RL systems explains why they learn more flexibly and efficiently. They can adapt more quickly to new situations and avoid problems like catastrophic forgetting—where newly learned associations overwrite past learning episodes—that can plague simpler neural network architectures relying exclusively on content-specific transitions. It should be noted, however, that episodic RL still relies on similarity-based computations (using a similarity metric to compare vector-based representations), rather than inferential transitions that are sensitive to the constituent structure of representations. While the representation of past experiences in episodic RL may have some compositional structure, it normally lacks the kind of discrete constituent structure often taken to underlie more regimented mental transitions such as logical inference.[45]

10.2.6 Model-based RL

Another important distinction relevant to the reappraisal of associationism is that between model-free and model-based RL. In model-free RL, the agent learns directly from experience without building an explicit model of its environment. This is the typical RL setup we described, in which the agent learns a policy through trial-and-error and updates its estimates based on observed rewards and state transitions. By contrast, model-based RL involves learning an explicit model of the environment, including the transition probabilities between states and the reward function. The agent can then use this model to plan and make decisions (Daw et al. 2005).

The distinction between model-free and model-based RL reflects a fundamental trade-off between tractability and efficiency. Model-free RL is computationally inexpensive, but it is not very sample-efficient or flexible, as agents typically need very large numbers of interactions to learn optimal policies. Model-based methods are more sample-efficient and flexible, as the agent can use its model to simulate experiences and plan ahead without actually taking actions in the environment. However, they may struggle if the learned model is inaccurate or if the environment is too complex to model effectively. There is converging evidence that humans make use of both model-free and model-based RL to balance these computational trade-offs (Lake et al. 2017; Botvinick et al. 2019). On this view, model-based planning can take over from model-free learning to enable flexible adaptation to novel tasks, although with enough training certain skills acquired through model-based RL can become “habituated” as model-free routines to free up computational resources.
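The contrast can be made concrete with a minimal planning sketch. The toy model below is hand-specified for illustration (in genuine model-based RL it would itself be learned): given transition and reward knowledge, the agent evaluates an action by simulating ahead, rather than by slowly accumulating action values through real interactions as a model-free learner would.

```python
# A hypothetical model: state -> action -> (next_state, reward).
model = {
    "start": {"left": ("dead_end", 0.0), "right": ("corridor", 0.0)},
    "corridor": {"forward": ("goal", 1.0)},
    "dead_end": {},
    "goal": {},
}

def plan_value(state, depth, gamma=0.9):
    """Best discounted return reachable from `state` by simulated lookahead."""
    if depth == 0 or not model[state]:
        return 0.0
    return max(r + gamma * plan_value(s2, depth - 1, gamma)
               for s2, r in model[state].values())

# Two-step lookahead reveals that going right leads to the goal,
# without the agent ever acting in the environment.
v_start = plan_value("start", depth=2)
```

One simulated two-step lookahead already assigns "start" a value of 0.9 (the goal reward discounted once); a model-free learner would need repeated real trips through the corridor to propagate the same value back.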

Model-based RL goes beyond associative chaining by leveraging internal structured knowledge of the environment that encodes relationships between states, actions, and outcomes to plan ahead. Some model-based RL systems in AI have a hybrid architecture where the model is built-in rather than learned by a neural network. AlphaGo, for example, combines two neural network components—a “policy network” that selects moves and a “value network” that evaluates board positions—with Monte Carlo tree search (MCTS), a traditional search algorithm which uses the policy network to focus the search on promising moves (Silver et al. 2016). In this system, the model of the rules of Go is encoded as handcrafted features. By contrast, some model-based RL systems learn a model of the environment with a neural network. For example, Kaiser et al. (2024) achieved excellent sample-efficiency on Atari games by training a model-based RL system with a “world model”, consisting of a neural network that learns to predict future frames of the game and expected rewards given past frames and possible actions. This “world model” can then be used to simulate the game environment and allow the agent to learn optimal policies much more quickly.

Among the computational innovations introduced by modern RL, model-based methods are probably those that most clearly strain the bare notion of association inherited from classical associationism. On the one hand, model-based RL systems like Kaiser et al.’s (2024) Atari-playing neural network do fundamentally learn from associations between actions, observations, and rewards. On the other, it might be misleading to describe the resulting “world model” as containing unstructured pairings of representations. A fortiori, hybrid RL systems that rely on built-in rules, like AlphaGo, contain plenty of explicit structure. While the algorithmic innovations and behavioral success of RL do address some of the core limitations of associationist theories of learning, they also abandon the latter’s original commitment to simplicity.

Bibliography

  • Anderson, J., K. Spoehr, and D. Bennett, 1994, “A Study in Numerical Perversity: Teaching Arithmetic to a Neural Network”, in Neural Networks for Knowledge Representation and Inference, D. Levine and M. Aparicio IV (eds.), East Sussex: Psychology Press, pp. 311–335.
  • Armstrong, K., S. Kose, L. Williams, A. Woolard, and S. Heckers, 2012, “Impaired Associative Inference in Patients with Schizophrenia”, Schizophrenia Bulletin, 38(3): 622–629.
  • Asch, S., 1962, “A Problem in the Theory of Associations”, Psychologische Beitrage, (6): 553–563.
  • –––, 1969, “A Reformulation of the Problem of Association”, American Psychologist, 24(2): 92–102.
  • Aydede, M., 1997, “Language of Thought: The Connectionist Contribution”, Minds and Machines, 7(1): 57–101.
  • Baeyens, F., P. Eelen, O. Van den Bergh, and G. Crombez, 1990, “Flavor-Flavor and Color-Flavor Conditioning in Humans”, Learning and Motivation, 21(4): 434–455.
  • Baeyens, F., P. Eelen, and G. Crombez, 1995, “Pavlovian Associations are Forever: On Classical Conditioning and Extinction”, Journal of Psychophysiology, 9(2): 127–141.
  • Bain, A., 1855, The Senses and The Intellect, London: John W. Parker and Son.
  • Bar-Anan, Y., B. Nosek, and M. Vianello, 2009, “The Sorting Paired Features Task: A Measure of Association Strengths”, Experimental Psychology, 56(5): 329–343.
  • Bareinboim, E., J. Zhang, and S. Lee, 2024, “An Introduction to Causal Reinforcement Learning”, Technical Report R-65, CausalAI Lab, New York: Columbia University.
  • Bates, E. and B. MacWhinney, 1987, “Competition, Variation, and Language Learning”, in B. MacWhinney (ed.), Mechanisms of Language Acquisition, Hillsdale, N.J.: Lawrence Erlbaum Associates, pp. 157–193.
  • Bendana, J. and E. Mandelbaum, forthcoming, “The Fragmentation of Belief”, in D. Kindermann, C. Borgoni, and A. Onofri (eds.), The Fragmented Mind, Oxford: Oxford University Press.
  • Berger, J., 2020, “Implicit attitudes and awareness”, Synthese, 197(3): 1291–1312.
  • Bernstein, I. and M. Webster, 1980, “Learned Taste Aversions in Humans”, Physiology and Behavior, 25(3): 363–366.
  • Bernstein, I., 1985, “Learned Food Aversions in the Progression of Cancer and its Treatment”, in N. Braveman and P. Bronstein (eds.), Experimental Assessments and Clinical Applications of Conditioned Food Aversions, New York: New York Academy of Sciences, pp. 365–380.
  • Binz, M., I. Dasgupta, A.K. Jagadish, M. Botvinick, J.X. Wang, and E. Schulz, 2024, “Meta-Learned Models of Cognition”, Behavioral and Brain Sciences, 47: e147.
  • Botvinick, M., S. Ritter, J.X. Wang, Z. Kurth-Nelson, C. Blundell, and D. Hassabis, 2019, “Reinforcement Learning, Fast and Slow”, Trends in Cognitive Sciences, 23(5): 408–422.
  • Black, W. and W. Prokasy (eds.), 1972, Classical Conditioning II: Current Research and Theory, New York: Appleton-Century-Crofts.
  • Bloom, P., 2000, How Children Learn the Meanings of Words, Cambridge, MA: MIT Press.
  • Bouton, M., 2002, “Context, Ambiguity, and Unlearning: Sources of Relapse after Behavioral Extinction”, Biological Psychiatry, 52(10): 976–986.
  • –––, 2004, “Context and Behavioral Processes in Extinction”, Learning and Memory, 11(5): 485–494.
  • Brett, L., W. Hankins, and J. Garcia, 1976, “Prey-Lithium Aversions. III: Buteo hawks”, Behavioral Biology, 17(1): 87–98.
  • Brown, N. and T. Sandholm, 2019, “Superhuman AI for Multiplayer Poker”, Science, 365(6456): 885–890.
  • Buckner, C., 2023, From Deep Learning to Rational Machines: What the History of Philosophy Can Teach Us about the Future of Artificial Intelligence, Oxford: Oxford University Press.
  • Camp, E., 2007, “Thinking with Maps”, Philosophical Perspectives, 21(1): 145–182.
  • Carey, S., 1978a, “Less May Never Mean More”, in R. Campbell and P. Smith (eds.), Recent Advances in the Psychology of Language, New York: Plenum Press, pp. 109–132.
  • –––, 1978b, “The Child as Word Learner”, in J. Bresnan, G. Miller, and M. Halle (eds.), Linguistic Theory and Psychological Reality, Cambridge, MA: MIT Press, pp. 264–293.
  • –––, 2010, “Beyond Fast Mapping”, Language Learning and Development, 6(3): 184–205.
  • Carey, S. and E. Bartlett, 1978, “Acquiring a Single New Word”, Proceedings of the Stanford Child Language Conference, 15: 17–29.
  • Caselli, M.C., E. Bates, P. Casadio, J. Fenson, L. Fenson, L. Sanderl, and J. Weir, 1995, “A Cross-linguistic Study of Early Lexical Development”, Cognitive Development, 10(2): 159–199.
  • Chaiken, S. and Y. Trope (eds.), 1999, Dual-Process Theories in Social Psychology, New York: Guilford Press.
  • Chalmers, D., 1993, “Connectionism and Compositionality: Why Fodor and Pylyshyn Were Wrong”, Philosophical Psychology, 6(3): 305–319.
  • Chater, N., 2009, “Rational Models of Conditioning”, Behavioral and Brain Sciences, 32(2): 204–205.
  • –––, J. Tenenbaum, and A. Yuille, 2006, “Probabilistic Models of Cognition: Conceptual Foundations”, Trends in Cognitive Sciences, 10(7): 287–291.
  • Chomsky, N., 1959, “A Review of B.F. Skinner’s Verbal Behavior”, Language, 35(1): 26–58.
  • Churchland, P., 1986, “Some Reductive Strategies in Cognitive Neurobiology”, Mind, 95(379): 279–309.
  • –––, 1989, A Neurocomputational Perspective: The Nature of Mind and the Structure of Science, Cambridge, MA: MIT Press.
  • Churchland, P. and T. Sejnowski, 1990, “Neural Representation and Neural Computation”, Philosophical Perspectives, 4: 343–382.
  • Collins, A. and E. Loftus, 1975, “A Spreading-Activation Theory of Semantic Processing”, Psychological Review, 82(6): 407–428.
  • Danks, D., 2013, “Moving from Levels and Reduction to Dimensions and Constraints”, Proceedings of the 35th Annual Conference of the Cognitive Science Society, 35: 2124–2129.
  • De Houwer, J., 2009, “The Propositional Approach to Associative Learning as an Alternative for Association Formation Models”, Learning & Behavior, 37(1): 1–20.
  • –––, 2011, “Evaluative Conditioning: A Review of Procedure Knowledge and Mental Process Theories”, in T. Schachtman and S. Reilly (eds.), Associative Learning and Conditioning Theory: Human and Non-Human Applications, New York: Oxford University Press, pp. 399–416.
  • –––, 2014, “A Propositional Model of Implicit Evaluation”, Social and Personality Psychology Compass, 8(7): 342–353.
  • –––, 2018, “Propositional Models of Evaluative Conditioning”, Social Psychological Bulletin, 13(2): 1–21.
  • –––, 2019, “Moving Beyond System 1 and System 2: Conditioning, Implicit Evaluation, and Habitual Responding Might Be Mediated by Relational Knowledge”, Experimental Psychology, 66(4): 257–265.
  • De Houwer, J., S. Thomas, and F. Baeyens, 2001, “Association Learning of Likes and Dislikes: A Review of 25 years of Research on Human Evaluative Conditioning”, Psychological Bulletin, 127(6): 853–869.
  • Dehaene, S., 2011, The Number Sense: How the Mind Creates Mathematics, Oxford: Oxford University Press.
  • Demeter, T., 2021, “Fodor’s guide to the Humean mind”, Synthese, 199(1): 5355–5375. doi:10.1007/s11229-021-03028-4
  • Diaz, E., G. Ruis, and F. Baeyens, 2005, “Resistance to Extinction of Human Evaluative Conditioning Using a Between-Subjects Design”, Cognition and Emotion, 19(2): 245–268.
  • Dickinson, A., D. Shanks, and J. Evenden, 1984, “Judgment of Act-Outcome Contingency: The Role of Selective Attribution”, The Quarterly Journal of Experimental Psychology, 36(1): 29–50.
  • Dirikx, T., D. Hermans, D. Vansteenwegen, F. Baeyens, and P. Eelen, 2004, “Reinstatement of Extinguished Conditioned Responses and Negative Stimulus Valence as a Pathway to Return of Fear in Humans”, Learning and Memory, 11: 549–554.
  • Elman, J., 1991, “Distributed Representations, Simple Recurrent Networks, and Grammatical Structure”, Machine Learning, 7(2–3): 195–225.
  • Elman, J., E. Bates, M. Johnson, A. Karmiloff-Smith, D. Parisi, and K. Plunkett, 1996, Rethinking Innateness: A Connectionist Perspective on Development, Cambridge, MA: MIT Press.
  • Evans, G., 1982, The Varieties of Reference, J. McDowell (ed.), Oxford: Clarendon Press.
  • Evans, J. and K. Frankish (eds.), 2009, In Two Minds: Dual Processes and Beyond, Oxford: Oxford University Press.
  • –––, and K. Stanovich, 2013, “Dual-Process Theories of Higher Cognition: Advancing the Debate”, Perspectives on Psychological Science, 8(3): 223–241.
  • Fazio, R., 2007, “Attitudes as Object-Evaluation Associations of Varying Strength”, Social Cognition, 25(5): 603–637.
  • Festinger, L. and J. Carlsmith, 1959, “Cognitive Consequences of Forced Compliance”, The Journal of Abnormal and Social Psychology, 58(2): 203–210.
  • Field, A. and G. Davey, 1999, “Reevaluating Evaluative Conditioning: A Nonassociative Explanation of Conditioning Effects in the Visual Evaluative Conditioning Paradigm”, Journal of Experimental Psychology: Animal Behavior Processes, 25(2): 211–224.
  • Fodor, J., 1983, The Modularity of Mind, Cambridge, MA: MIT Press.
  • –––, 2001, The Mind Doesn’t Work that Way, Cambridge, MA: MIT Press.
  • –––, 2003, Hume Variations, Oxford: Clarendon Press.
  • Fodor, J. and B. McLaughlin, 1990, “Connectionism and the Problem of Systematicity: Why Smolensky’s Solution Doesn’t Work”, Cognition, 35(2): 183–204.
  • Fodor, J. and Z. Pylyshyn, 1988, “Connectionism and Cognitive Architecture: A Critical Analysis”, Cognition, 28(1–2): 3–71.
  • Frankish, K., 2009, “Systems and Levels: Dual-System Theories and the Personal-Subpersonal Distinction”, in Evans and Frankish 2009: pp. 89–107.
  • Gagliano, M., V. Vyazovsky, A. Borbely, M. Grimonprez, and M. Depczynski, 2016, “Learning by Association in Plants”, Scientific Reports, 6(38427): 1–8.
  • Gallistel, C., S. Fairhurst, and P. Balsam, 2004, “The Learning Curve: Implications of a Quantitative Analysis”, Proceedings of the National Academy of Sciences of the United States of America, 101(36): 13124–13131.
  • Gallistel, C. and A. King, 2009, Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience, West Sussex: Wiley Blackwell.
  • Garcia, J., 1981, “Tilting at the Paper Mills of Academe”, American Psychologist, 36(2): 149–158.
  • Garcia, J., R. Kovner, and K. Green, 1970, “Cue Properties vs Palatability of Flavors in Avoidance Learning”, Psychonomic Science, 20(5): 313–314.
  • Garcia, J., B. McGowan, and K. Green, 1972, “Biological Constraints on Conditioning II”, in Black and Prokasy 1972: pp. 3–27.
  • Garcia, J., W. Hankins, and K. Rusiniak, 1974, “Behavioral Regulation of the Milieu Interne in Man and Rat”, Science, 185(4154): 824–831.
  • Garcia, J. and R.A. Koelling, 1966, “Relationship of cue to consequence in avoidance learning”, Psychonomic Science, 4: 123–124.
  • Gendler, T., 2008, “Alief and Belief”, Journal of Philosophy, 105(10): 634–663.
  • Gleitman, L., K. Cassidy, R. Nappa, A. Papafragou, and J. Trueswell, 2005, “Hard Words”, Language Learning and Development, 1(1): 23–64.
  • Glosser, G. and R. Friedman, 1991, “Lexical but not Semantic Priming in Alzheimer’s Disease”, Psychology and Aging, 6(4): 522–527.
  • Goldin-Meadow, S., M. Seligman, and S. Gelman, 1976, “Language in the Two-Year Old”, Cognition, 4(2): 189–202.
  • Greenwald, A., D. McGhee, and J. Schwartz, 1998, “Measuring Individual Differences in Implicit Cognition: The Implicit Association Test”, Journal of Personality and Social Psychology, 74(6): 1464–1480.
  • Hahn, A., C. Judd, H. Hirsch, and I. Blair, 2014, “Awareness of Implicit Attitudes”, Journal of Experimental Psychology: General, 143(3): 1369–1392.
  • Heyes, C., 2012, “Simple Minds: A Qualified Defence of Associative Learning”, Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1603): 2695–2703.
  • Hughes, S., Y. Ye, P. Van Dessel, and J. De Houwer, 2019, “When people co-occur with good or bad events: Graded effects of relational qualifiers on evaluative conditioning”, Personality and Social Psychology Bulletin, 45(2): 196–208.
  • Hull, C., 1943, Principles of Behavior, New York: Appleton-Century-Crofts.
  • Hume, D., 1738, A Treatise of Human Nature, L.A. Selby-Bigge (ed.), 2nd ed., revised by P.H. Nidditch, Oxford: Clarendon Press, 1975.
  • James, W., 1890, The Principles of Psychology (Vol. 1), New York: Holt.
  • Johnson, K., 2004, “On the Systematicity of Language and Thought”, Journal of Philosophy, 101(3): 111–139.
  • Kahneman, D., 2011, Thinking, Fast and Slow, New York: Farrar, Straus and Giroux.
  • Kamin, L., 1969, “Predictability, Surprise, Attention, and Conditioning”, in B. Campbell and R. Church (eds.), Punishment and Aversive Behavior, New York: Appleton-Century-Crofts, pp. 279–296.
  • Kant, I., 1781/1787, Critique of Pure Reason, P. Guyer and A. Wood (eds.), New York: Cambridge University Press.
  • Karmiloff-Smith, A., 1995, Beyond Modularity: A Developmental Perspective on Cognitive Science, Cambridge, MA: MIT Press/Bradford Books.
  • Kruglanski, A., 2013, “Only One? The Default Interventionist Perspective as a Unimodel—Commentary on Evans & Stanovich”, Perspectives on Psychological Science, 8(3): 242–247.
  • Kurdi, B. and M. Banaji, 2017, “Repeated evaluative pairings and evaluative statements: How effectively do they shift implicit attitudes?”, Journal of Experimental Psychology: General, 146(2): 194–213.
  • –––, 2019, “Attitude change via repeated evaluative pairings versus evaluative statements: Shared and unique features”, Journal of Personality and Social Psychology, 116(5): 681–703.
  • Locke, J., 1690, An Essay Concerning Human Understanding, P.H. Nidditch (ed.), Oxford: Clarendon Press, 1975.
  • Logue, A., I. Ophir, and K. Strauss, 1981, “The Acquisition of Taste Aversion in Humans”, Behaviour Research and Therapy, 19(4): 319–333.
  • Luka, B. and L. Barsalou, 2005, “Structural facilitation: Mere exposure effects for grammatical acceptability as evidence for syntactic priming in comprehension”, Journal of Memory and Language, 52: 444–467.
  • Lycan, W., 1990, “The Continuity of the Levels of Nature”, in W. Lycan (ed.), Mind and Cognition: A Reader, Cambridge: Basil Blackwell, pp. 77–96.
  • Mandelbaum, E., 2013a, “Against Alief”, Philosophical Studies, 165(1): 197–211.
  • –––, 2013b, “Numerical Architecture”, Topics in Cognitive Science, 5(2): 367–386.
  • –––, 2016, “Attitude, Inference, Association: On the Propositional Structure of Implicit Attitudes”, Nous, 50(3): 629–658.
  • –––, 2017, “Seeing and Conceptualizing: Modularity and the Shallow Contents of Vision”, Philosophy and Phenomenological Research, 97(2): 267–283.
  • –––, 2019, “Troubles with Bayesianism: An Introduction to the Psychological Immune System”, Mind & Language, 34(2): 141–157.
  • Mann, T. and M. Ferguson, 2015, “Can we undo our first impressions? The role of reinterpretation in reversing implicit evaluations”, Journal of Personality and Social Psychology, 108(6): 823–849.
  • –––, 2017, “Reversing implicit first impressions through reinterpretation after a two-day delay”, Journal of Experimental Social Psychology, 68: 122–127.
  • Mann, T., B. Kurdi, and M. Banaji, 2019, “How effectively can implicit evaluations be updated? Using evaluative statements after aversive repeated evaluative pairings”, Journal of Experimental Psychology: General, doi:10.1037/xge0000701.
  • Markman, E., 1989, Categorization and Naming in Children: Problems of Induction, Cambridge, MA: MIT Press.
  • Markson, L. and P. Bloom, 1997, “Evidence Against a Dedicated System for Word Learning in Children”, Nature, 385(6619): 813–815.
  • Marr, D., 1982, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, NY: W.H. Freeman and Co.
  • Mason, M. and M. Bar, 2012, “The Effect of Mental Progression on Mood”, Journal of Experimental Psychology: General, 141(2): 217–221. doi:10.1037/a0025035
  • McClelland, J., M. Botvinick, D. Noelle, D. Plaut, T. Rogers, M. Seidenberg, and L. Smith, 2010, “Letting Structure Emerge: Connectionist and Dynamic Systems Approaches to Cognition”, Trends in Cognitive Sciences, 14(8): 348–356.
  • Minsky, M., 1963, “Steps toward Artificial Intelligence”, in E. Feigenbaum and J. Feldman (eds.), Computers and Thought, New York, NY: McGraw-Hill, pp. 406–450.
  • Mitchell, C., J. De Houwer, and P. Lovibond, 2009, “The Propositional Nature of Human Associative Learning”, Behavioral and Brain Sciences, 32(2): 183–246.
  • Mnih, V., K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, 2013, “Playing Atari with Deep Reinforcement Learning”, Neural Information Processing Systems 2013, Deep Learning Workshop. [Mnih et al. 2013 available online]
  • Mnih, V., K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, 2015, “Human-Level Control Through Deep Reinforcement Learning”, Nature, 518(7540): 529–533.
  • Nosek, B. and M. Banaji, 2001, “The Go/No-Go Association Task”, Social Cognition, 19(6): 625–66.
  • Osman, M., 2013, “A Case Study: Dual-Process Theories of Higher Cognition—Commentary on Evans & Stanovich”, Perspectives on Psychological Science, 8(3): 248–252.
  • Pavlov, I., 1906, “The Scientific Investigation of the Psychical Faculties or Processes in the Higher Animals”, Science, 24(620): 613–619.
  • –––, 1927, Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex, Oxford: Oxford University Press.
  • Payne, B., C. Cheng, O. Govorun, and B. Stewart, 2005, “An Inkblot for Attitudes: Affect Misattribution as Implicit Measurement”, Journal of Personality and Social Psychology, 89(3): 277–293.
  • Perea, M. and E. Rosa, 2002, “The Effects of Associative and Semantic Priming in the Lexical Decision Task”, Psychological Research, 66(3): 180–194.
  • Prinz, J., 2002, Furnishing the Mind: Concepts and their Perceptual Basis, Cambridge, MA: MIT Press.
  • –––, and A. Clark, 2004, “Putting Concepts to Work: Some Thoughts for the 21st Century”, Mind & Language, 19(1): 57–69.
  • Quilty-Dunn, J., 2020, “Perceptual Pluralism”, Nous, 1–41.
  • Quilty-Dunn, J. and E. Mandelbaum, 2018, “Inferential Transitions”, Australasian Journal of Philosophy, 96(3): 532–547.
  • –––, 2019, “Non-Inferential Transitions: Imagery and Association”, in T. Chan and A. Nes (eds.), Inference and Consciousness, New York: Routledge, pp. 151–171.
  • Rescorla, R., 1968, “Probability of Shock in the Presence and Absence of CS in Fear Conditioning”, Journal of Comparative and Physiological Psychology, 66(1): 1–5.
  • –––, 1988, “Pavlovian Conditioning: It’s Not What You Think It Is”, American Psychologist, 43(3): 151–160.
  • Rescorla, R. and A. Wagner, 1972, “A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Nonreinforcement”, in Black and Prokasy 1972, pp. 64–99.
  • Roll, D. and J. Smith, 1972, “Conditioned Taste Aversion in Anesthetized Rats”, in M. Hager and J. Seligman (eds.), Biological Boundaries of Learning, New York: Appleton-Century-Crofts, pp. 98–102.
  • Rozin, P., 1986, “One-Trial Acquired Likes and Dislikes in Humans: Disgust as a US, Food Predominance, and Negative Learning Predominance”, Learning and Motivation, 17(2): 180–189.
  • Rumelhart, D., P. Smolensky, J. McClelland, and G. Hinton, 1986, “Sequential Thought Processes in PDP Models”, in J. McClelland and D. Rumelhart (eds.), Parallel Distributed Processing Vol. 2: Explorations in the Microstructure of Cognition: Psychological and Biological Models, Cambridge, MA: MIT Press, pp. 7–57.
  • Rusiniak, K., W. Hankins, J. Garcia, and L. Brett, 1979, “Flavor-illness Aversions: Potentiation of Odor by Taste in Rats”, Behavioral and Neural Biology, 25(1): 1–17.
  • Rydell, R. and A. McConnell, 2006, “Understanding Implicit and Explicit Attitude Change: A Systems of Reasoning Analysis”, Journal of Personality and Social Psychology, 91(6): 995–1008.
  • Sandhofer, C., L. Smith, and J. Luo, 2000, “Counting Nouns and Verbs in the Input: Differential Frequencies, Different Kinds of Learning?”, Journal of Child Language, 27(3): 561–585.
  • Seligman, M., 1970, “On the Generality of the Laws of Learning”, Psychological Review, 77(5): 406–418.
  • Shanks, D., 2010, “Learning: From Association to Cognition”, Annual Review of Psychology, 61: 273–301.
  • Skinner, B., 1938, The Behavior of Organisms: An Experimental Analysis, Oxford: Appleton-Century.
  • –––, 1953, Science and Human Behavior, New York: Simon and Schuster.
  • Sloman, S., 1996, “The Empirical Case for Two Systems of Reasoning”, Psychological Bulletin, 119(1): 3–22.
  • Smith, E.R. and J. DeCoster, 2000, “Dual-Process Models in Social and Cognitive Psychology: Conceptual Integration and Links to Underlying Memory Systems”, Personality and Social Psychology Review, 4(2): 108–131.
  • Smith, J. and D. Roll, 1967, “Trace Conditioning with X-rays as an Aversive Stimulus”, Psychonomic Science, 9(1): 11–12.
  • Smolensky, P., 1988, “On the Proper Treatment of Connectionism”, Behavioral and Brain Sciences, 11(1): 1–23.
  • Snedeker, J. and L. Gleitman, 2004, “Why it is Hard to Label Our Concepts”, in D. Hall and S. Waxman (eds.), Weaving a Lexicon, Cambridge, MA: MIT Press, pp. 257–294.
  • Stanovich, K., 2011, Rationality and the Reflective Mind, New York: Oxford University Press.
  • Tenenbaum, J., C. Kemp, T. Griffiths, and N. Goodman, 2011, “How to Grow a Mind: Statistics, Structure, and Abstraction”, Science, 331(6022): 1279–1285.
  • Thorndike, E., 1911, Animal Intelligence: Experimental Studies, New York: Macmillan.
  • Todrank, J., D. Byrnes, A. Wrzesniewski, and P. Rozin, 1995, “Odors can Change Preferences for People in Photographs: A Cross-Modal Evaluative Conditioning Study with Olfactory USs and Visual CSs”, Learning and Motivation, 26(2): 116–140.
  • Tolman, E., 1948, “Cognitive Maps in Rats and Men”, Psychological Review, 55(4): 189–208.
  • Van Dessel, P., Y. Ye, and J. De Houwer, 2019, “Changing deep-rooted implicit evaluation in the blink of an eye: negative verbal information shifts automatic liking of Gandhi”, Social Psychological and Personality Science, 10(2): 266–273.
  • Van Gelder, T., 1995, “What Might Cognition Be, If not Computation?”, The Journal of Philosophy, 92(7): 345–381.
  • Vansteenwegen, D., G. Francken, B. Vervliet, A. De Clercq, and P. Eelen, 2006, “Resistance to Extinction in Evaluative Conditioning”, Journal of Experimental Psychology: Animal Behavior Processes, 32(1): 71–79.
  • Wilson, T., S. Lindsey, and T. Schooler, 2000, “A Model of Dual Attitudes”, Psychological Review, 107(1): 101–126.

Acknowledgments

Helpful feedback was received from Michael Brownstein, Cameron Buckner, Bryce Huebner, Zoe Jenkin, Xander Macswan, Griffin Pion, Jake Quilty-Dunn, Shaun Nichols, Soren Schlassa, and Susanna Siegel, who are hereby thanked for their efforts.

Copyright © 2025 by
Eric Mandelbaum <eric.mandelbaum@gmail.com>
Raphaël Millière <raphael.milliere@mq.edu.au>


The Stanford Encyclopedia of Philosophy iscopyright © 2025 byThe Metaphysics Research Lab, Department of Philosophy, Stanford University

Library of Congress Catalog Data: ISSN 1095-5054

