Notes to Associationist Theories of Thought

1. Empiricists who have wanted more than one type of learning mechanism have tended to be constructivists. The basic constructivist position is to posit a single mental process, the ability to associate ideas, and to construct new processes out of the single innate process (see Fodor 1983 for discussion). On pain of regress, no theorist, regardless of their orientation, can have every mental process be learned. Some must be innate, and some must be “built into the architecture” (e.g., not explicitly represented; see Quilty-Dunn and Mandelbaum 2018).

2. This characterization reflects what could be considered the original and strictest form of associationism, motivated primarily by its theoretical parsimony: the ability to explain complex mental phenomena with minimal innate machinery. Some later associationists, such as Pavlov (1927) and radical behaviorists like Skinner (1953), did largely maintain this commitment to a single associative process. While this single-process view represents the ideal that made associationism attractive to empiricists, it is worth noting that historical associationists often incorporated additional processes. For example, Hume (1738) included imagination and other faculties alongside association (see Fodor 2003 and Demeter 2021 for discussion); Bain (1855) allowed for discrimination and similarity as fundamental principles; and Thorndike (1913) included evaluative processes (‘satisfiers’ and ‘annoyers’) and motivational processes (‘readiness’). Modern associative theories further incorporate mechanisms like inhibitory versus excitatory associations and error-correction learning principles (Rescorla & Wagner 1972), and competitive learning mechanisms (Rumelhart & Zipser 1985), though the extent to which they still embody the original spirit of associationism can be debated.

3. That said, one can detect aspects of associationism in earlier writers, such as Descartes when discussing memory and Spinoza when discussing the emotions (see the entries on Descartes and on 17th and 18th Century Theories of Emotions).

4. Although Hume is generally acknowledged as laying the theoretical foundation of associationism, there is some evidence that Francis Hutcheson’s use of associations greatly influenced him. See the entry on Scottish Philosophy in the 18th Century.

5. “All our simple ideas in their first appearance are deriv’d from simple impressions, which are correspondent to them, and which they exactly represent” (T 1.1.1.7/4).

6. This is a bit of a loose formulation. Strictly speaking, Impressions themselves don’t instantiate any associative relation; rather, the contents of the Impressions do. For example, it isn’t that one’s Impression (understood as a vehicle of thought) of chickens resembles roosters; rather, it’s that the contents of one’s Impressions resemble one another. Presumably, all Impressions qua vehicles of thought resemble one another merely by being Impressions. What differs between Impressions is, e.g., whether the content they represent resembles other represented content. This distinction between vehicle and content is important for Hume’s overall architecture: it’s not the vehicle of the Impression that gets copied into an Idea, but rather the content of that vehicle. That said, the vast majority of associationist theories range over associated contents and not associated vehicles (even though there is no theoretical reason vehicles can’t be associated, and some reason to think they sometimes are; see, e.g., Luka and Barsalou 2005). To ease exposition, the distinction between vehicles and contents is elided in the main text except where it is important to distinguish them.

7. Although some contemporary associationist views still retain all three original Humean associative relations, the resemblance relation has come under the most scrutiny and is the least popular of the three. For discussions of the problem of the resemblance criterion, see Field and Davey (1999) and De Houwer (2009). In the canonical Rescorla-Wagner model (Rescorla and Wagner 1972), both contiguity and resemblance are superseded by the contingency requirement. However, allowing cause and effect (as such) to be analyzed as associative greatly complicates the theory, so that it is no longer clearly any simpler than computational or propositional theories.

8. A variation on classical conditioning is evaluative conditioning, where one tries to transfer the valence of the US onto the CS (see, e.g., De Houwer et al. 2001 for an overview). For instance, one might pair a favorable flavor (e.g., sugar) with a novel neutral face stimulus, in order to transfer the positive valence to the previously neutral face.

9. There are many different ways of construing the details of Pavlovian conditioning. For example, some would restrict the usage further by arguing that the US must be biologically significant, or widen the usage, as Rescorla does (see section 7). Some anti-associationists even believe that Pavlovian conditioning is real, but not predicated on associations (Mitchell et al. 2009).

10. Classical conditioning also had some consequences that were a bit unpalatable for empiricists: if all learning was to be explained as the forming of associative bonds between USs, CSs, and responses, then all of our learning had to bottom out in some behaviors that were preprogrammed to correspond to certain stimuli. In other words, certain instinctual patterns of behavior were innately set to be elicited by certain stimuli. Even more problematically, such instinctual patterns were apt to be species-specific, so not generalizable to humans. Most problematically, the theory just doesn’t seem true, as responses to CSs are often different from the responses to USs. When bells are swapped for food, dogs may still salivate, though not necessarily to the same degree as to the actual food (a fact which Pavlov himself knew; see Pavlov 1927, Lecture III). Moreover, if given the opportunity, dogs will try to eat the food but they won’t try to eat the bell.

11. Note how Thorndike does not hesitate to speak of mental states like satisfaction and dissatisfaction, as opposed to the most famous practitioner of operant conditioning, the radical behaviorist B.F. Skinner (see the behaviorism entry).

12. From this level of abstraction, Pavlov and Skinner were united. Here’s Garcia on Skinnerian learning:

Any stimulus applied immediately after the response which, by empirical test, would increase response production was deemed a reinforcer…The general procedures were said to be applicable to any and all reflexes, in any and all organisms. There was no need to concern ourselves with species differences, with brain differences, or with reinforcer differences. The payoff schedule’s the thing wherein we’d capture control of the organism. (Garcia 1981: 155)

13. Some even question whether evaluative conditioning is a true form of learning, or is instead a version of propositional learning; see De Houwer 2018, and section 8 below.

14. Talk of storage implies the existence of memory. For associationists, the use of memory has been, at times, a tricky issue. Since memory is a cognitive faculty, if associations need memory then one cannot be an associationist while also denying the existence of mental processes, or minds for that matter (yet another problem for the radical behaviorist position). Insofar as associative learning implies memory (and it seems to, even on ‘model-free’ learning models), some unintuitive conclusions may follow, as even plants have appeared to engage in associative learning (Gagliano et al. 2016).

15. Radical behaviorists such as Skinner (e.g., 1953) would deny this claim, but only because of their ontological objections to reifying mental states. Eliminativism about the mental is a different thesis from associationism, although the two fit together well (see section 6).

16. Hereafter we will use the forward slash to denote an associative bond between the entities on either side of the slash. Additionally, expressions written in small caps will be used to denote concepts, and we will assume that the concepts’ structural descriptions are given by the expressions. Thus RED BIRD is taken to be a complex concept consisting of two meaningful parts, the concept RED and the concept BIRD. However, BIRD will be assumed to be a simple concept with no semantically decomposable parts. The structural descriptions are stipulated for exegetical reasons and without commitment to the actual structure of the corresponding concepts.

17. The mediation parenthetical can get a bit complicated to state, for one might want to claim that, e.g., WRENCH and HAMMER are associated, even if the association is mediated via a link between those concepts and TOOL. In that case, it’s best to say that two concepts form a basic associative structure if the activation of one concept brings on the activation of another without there being any other mediating psychological variable.

18. This claim should be qualified in a few ways. First, the mapping might not be of a single thinker as a whole, but rather of a subsystem of a single thinker (such as their intramodular representation of their lexicon; see Fodor 1983). Secondly, the mapping needn’t be between concepts per se, and can instead be between mental representations on which, for some reason or another, one needn’t bestow the honorific “concepts” (because, for example, the mental representations are intramodular and thus not properly “general”; see Evans 1982).

19. “Experiencing Xs and Ys” generally means something such as “having formed representations of Xs and Ys based on their appearance in the ambient environment,” but needn’t necessarily mean that. If one just happened to keep thinking x followed by y for any reason, even though Xs and Ys weren’t given in experience, that too could change the associative strength of the x/y bond. Additionally, some theories allow “piggybacking” associations, that is, associations formed from activated propositional structures. For example, constantly having the propositional thought MOLLY OWNS A DOG could affect the associative bond between MOLLY and DOG (see Mandelbaum 2016 for discussion).

20. Although bare-boned associationism provides a good approximation of Hume and Pavlov, it doesn’t quite capture the full theory of those working in operant conditioning paradigms, for it doesn’t involve any notion of reinforcement, or updating one’s associative structure based on consequences. This isn’t accidental: how to square cognitive updating (i.e., association-based or belief-based updating) based on consequences with the spartan tenets of associationism has often been a point of difficulty (see, e.g., Festinger and Carlsmith 1959).

21. Curiously, it appears that extinction isn’t very effective in evaluative conditioning paradigms, though counterconditioning is (see De Houwer 2011 for many citations, such as Diaz et al. 2005 and Vansteenwegen et al. 2006).

22. Technically, reinstatement is the reappearance of the CR upon reexposure to the US after successful extinction, whereas spontaneous recovery is the name for the return of the associative pairing just due to the passage of time. Thus one is due to changes in spatial context, the other to changes in temporal context. Reinstatement and spontaneous recovery are related, and both provide difficulties for the traditional view of extinction.

23. In the example of associative transitions offered above, we used associations between propositions. But of course a pure associationist view would not allow propositional structures. It is thus a bit more difficult for a pure associationist to distinguish associative transitions from associative structures. For the pure associationist, all transitions are associative transitions among associative structures, for association is the only available mental process and associative structures the only available mental structure. Thus, for the pure associationist, the only possible difference between an associative structure and an associative transition is a contingent temporal one (where an associative structure is ideally contemporaneous whereas an associative transition unfolds over time).

24. The situation is similar to what arises in numerical cognition. When we are children, we may explicitly add 2 to 2 to get 4, but over time 2 + 2 = 4 becomes an associated string, more phonetic than arithmetic, similar to the truths of the multiplication table. After memorizing the multiplication table we don’t need to think to give the answer to 5 x 5. Compare 2 + 2 or 5 x 5 to calculating either 9 + 16 = 25 or 55 x 5. In these latter cases we don’t have any rote overlearning, so we generally have to calculate the responses instead of merely parroting stored ones. Evidence for the distinction between these two ways of answering numerical questions comes from patients who lose access to their faculty of numerical reasoning (Dehaene 2011; Mandelbaum 2013b). These people can’t do basic numerical tasks (e.g., tell you if 30 is between 20 and 40 on a number line, visually distinguish which sets have more members than others, longhand calculate arithmetical questions, etc.) but they can still answer previously memorized equations, like the multiplication tables. In this sense, knowledge of the multiplication tables is more similar to one’s knowledge of the state capitals (both just species of semantic memory) than it is to mathematical reasoning. In essence, answering via semantic memory is quicker and easier because the answers are associated with questions in a way most expressions of mathematical truths are not. Similarly, overlearned inferences may cease to be inferences and instead become associative strings because of the overlearning (e.g., perhaps this is true, for some people, of the old chestnut: All men are mortal, Socrates is a man, so Socrates is mortal).

25. The question of how many levels of explanation one allows in one’s cognitive architecture is wholly separate from the question of whether any of those architectures are associationistic. Generalizations here vary wildly from theorist to theorist. For example, many theorists, roughly following Marr (1982), assume there is just one algorithmic (psychological/representational) level which is then instantiated in a physical (neurological) level (see, e.g., Mitchell et al. 2009). Others generally assume that there are multiple psychological levels. For instance, Fodor writes, “psychological faculties at the nth level are typically implemented by psychological faculties at the n−1th level” (2003: 132; cf. Danks 2013).

26. In this context, “subsymbolic” just means that the node on its own has no semantic value. In other words, a single node wouldn’t represent any content.

27. What’s seen as structural from one vantage point is seen as functional from another (Lycan 1990). Dendrites are structural from the vantage of intentional psychology, but functional from the vantage of particle physics. Which, if any, of these glosses is meant to cover the Connectome is itself a further question (Mandelbaum 2022).

28. There are no domain-specific associationists because associative learning is incompatible with domain specificity. Domain specificity assumes different mental processes for different domains, and associative learning presupposes the same learning mechanism regardless of domain (Mandelbaum 2017, 2019).

29. For example, in a “default-interventionist” model System 2 processes are not always engaged, though they are in “parallel competitive” models (both models include the constant automatic engagement of System 1). See Evans and Stanovich 2013 for discussion.

30. Systematicity arguments aren’t about language, or even humans, per se. One can run the same style of argument for animals and learning: an animal that can learn that a is on top of b can learn that b is on top of a; an animal that can learn to associate a green light with a shock and a blue light with food can learn to associate a green light with food and a blue light with a shock. Of course there are well-known domain-specific constraints on this type of learning (see the Garcia effect discussion) but they are merely exceptions to the rule.

31. Gallistel and King (2009: 239) argue that there is no such window. Instead they argue that what matters for learning in place of contiguity is a ratio of the time between the presentation of the CS and the appearance of the US as compared to the time between different US presentations (in a given context). For example, speeding up the CS/US connection by a factor of two reduces the number of US presentations one needs by half.
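
The arithmetic in the example above can be made concrete with a small sketch. It assumes, purely for illustration, that the number of US presentations needed is directly proportional to the ratio of the CS-US interval to the US-US interval. The function name, the constant `k`, and the interval values are hypothetical and are not taken from Gallistel and King.

```python
def us_presentations_needed(cs_us_interval, us_us_interval, k=300.0):
    """Hypothetical proportional rule: k * (CS-US time / US-US time).

    k is an arbitrary illustrative constant, not an empirical estimate.
    """
    if cs_us_interval <= 0 or us_us_interval <= 0:
        raise ValueError("intervals must be positive")
    return k * cs_us_interval / us_us_interval


# Baseline: the US follows CS onset by 10 time units; USs are 100 units apart.
baseline = us_presentations_needed(cs_us_interval=10.0, us_us_interval=100.0)

# Halving the CS-US interval halves the ratio, and so halves the count.
faster = us_presentations_needed(cs_us_interval=5.0, us_us_interval=100.0)

print(baseline, faster)
```

Under this assumed rule only the ratio matters: stretching both intervals by the same factor leaves the required number of presentations unchanged, which is one way of seeing why no fixed contiguity window is involved.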

32. It appears that content specificity of associations needn’t just be based on innate dispositions. For example, in an evaluative conditioning paradigm using odors as USs and faces as CSs, the evaluative conditioning only commenced when the odors were interpreted as plausibly human (Todrank et al. 1995). But “plausibly human” included learned information (such as the odors associated with soap). When the odors were typically associated with objects and not humans, no learning transpired. Additionally, there appear to be content-specific differences in associative learning at a greater level of abstraction: there is evidence that negative US/CS pairings are learned more quickly, and form stronger bonds, than positive US/CS pairings (Rozin 1986; Baeyens et al. 1990).

33. Blocking has been observed in humans (see Dickinson et al. 1984) but one needn’t delve into the empirical literature to feel the pull of the phenomenon. Imagine you’ve eaten an orange and immediately have an allergic reaction. If in your next meal you eat an orange and an apple and have the allergic reaction, you will be less likely to think the apple caused the reaction than you would were you to have never experienced the allergic reaction after eating the orange.
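
The orange and apple case can be illustrated with an error-correction rule of the Rescorla-Wagner kind. The following is a minimal sketch under stated assumptions, not a model of the Dickinson et al. experiment: the function name, the learning rate `alpha`, the asymptote `lam`, and the trial counts are all illustrative choices.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Return a dict mapping each cue to its associative strength.

    Each trial is a pair (cues_present, us_present). All cues present on a
    trial share one prediction error: the outcome (lam if the US occurs,
    otherwise 0) minus the summed strength of the cues present.
    """
    strength = {}
    for cues, us_present in trials:
        prediction = sum(strength.get(cue, 0.0) for cue in cues)
        error = (lam if us_present else 0.0) - prediction
        for cue in cues:
            strength[cue] = strength.get(cue, 0.0) + alpha * error
    return strength


# Blocking condition: orange alone precedes the reaction, then orange + apple.
blocked = rescorla_wagner(
    [({"orange"}, True)] * 20 + [({"orange", "apple"}, True)] * 20
)

# Control condition: only the compound meals, with no prior orange-only meals.
control = rescorla_wagner([({"orange", "apple"}, True)] * 20)

print(round(blocked["apple"], 4), round(control["apple"], 4))
```

In the blocking condition the orange already predicts the reaction almost perfectly, so little prediction error is left for the apple to absorb; in the control condition the two cues split the available strength between them.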

34. More problematically for associationists, blocking doesn’t always occur, and when it fails to occur isn’t predictable by associative theory. For example, if a weak odor is paired with a strong taste and the pairing is followed by gastrointestinal distress, the taste magnifies the sensitivity of the odor as a signal (Rusiniak et al. 1979). Relatedly, if a hawk eats a black mouse and gets sick, the hawk won’t just avoid black mice but will avoid all mice. However, if the black mouse tastes different from a white mouse, then the hawk will continue to eat white mice even after black mice make it sick (Brett et al. 1976).

35. Oddly enough, evaluative conditioning does not seem as sensitive to base rates or as susceptible to “occasion setting” as classical conditioning is (see De Houwer et al. 2001).

36. The problem metastasizes depending on how one interprets “location”. For example, if the testing facility is in New Jersey, or on the east coast, or on Earth, or in the Milky Way, why isn’t that information also associated? Of course, the natural thing to say here is that the animal has the concept TESTING LOCATION but doesn’t have the concept NEW JERSEY. This response is unavailable to behaviorists, but not to associationists per se, though the latter still have to explain why these concepts remain unlearned.

37. The more one looks into how locational properties become associated, the more problems seem to mount. For example, if a rat has a strong preference for a particular drink but gets shocked while ingesting that drink, the rat will not change its preference for the flavor. Instead, the rat will just learn to avoid the drink when it encounters it in the experimental location. But when the rat is given a chance to ingest the drink anywhere else (e.g., back in its home cage) it will still continue to ingest the drink. Furthermore, in the case where the rat gets shocked while drinking the highly desirable flavor in the Skinner box on trial N, the rat will increase how much of the drink it takes in on trial N+1. This is a reasonable strategy, one that seems to indicate rational thought: assuming that one knows one is going to get shocked, one might as well take in as much as possible while getting shocked. For more on these points, see Garcia et al. (1970).

38. In other versions of the problem it is understood as the problem the organism faces in trying to figure out which of its behaviors produced the environmental change that interests the organism. It also appears in problems in Artificial Intelligence (see Minsky 1963).

39. For a pure associationist, one would phrase this as the organism learning to associate the lack of CS with the US. How the pure associationist analyzes the absence of a CS while using only associative structures can also be a tricky issue.

40. A policy can be understood as the agent’s strategy or decision-making rule that determines which actions to take in different situations. For example, in a simple maze navigation task, one policy might be “always turn right when facing a junction,” while a more sophisticated policy might be “choose the direction that leads toward unexplored areas first, then prefer paths that previously led to rewards.” In chess, a basic policy could specify “capture opponent pieces whenever possible,” while a more advanced policy might balance multiple considerations like “control the center of the board, protect valuable pieces, and develop a strong pawn structure.” These policies can be implemented in various ways: from explicit rules (as in the examples above), to probability distributions over possible actions (e.g., “in this position, move the knight with 70% probability and the bishop with 30% probability”), to neural networks that process complex state information and output action recommendations. The fundamental goal in RL is to discover or learn a policy that maximizes the agent’s accumulated rewards over time, regardless of how that policy is represented.
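
Two of the representations mentioned in this note, an explicit rule and a probability distribution over actions, can be sketched as follows. The state labels, action names, and function names are hypothetical; only the "turn right at a junction" rule and the 70%/30% weights come from the note.

```python
import random


def rule_policy(state):
    """Explicit rule: always turn right when facing a junction."""
    return "turn_right" if state == "junction" else "go_forward"


def stochastic_policy(state, rng):
    """Probability distribution over actions: knight 70%, bishop 30%.

    The state is ignored here for simplicity; a fuller policy would
    condition the weights on the position.
    """
    actions = ["move_knight", "move_bishop"]
    return rng.choices(actions, weights=[0.7, 0.3], k=1)[0]


# The rule-based policy is deterministic: same state, same action.
print(rule_policy("junction"))

# The stochastic policy is only characterized by its long-run frequencies.
rng = random.Random(0)
samples = [stochastic_policy("some_position", rng) for _ in range(10_000)]
knight_share = samples.count("move_knight") / len(samples)
print(knight_share)
```

Both functions have the same shape, a mapping from a state to an action, which is why the goal of maximizing accumulated reward can be stated independently of how the policy is represented.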

41. Move 37 is widely acknowledged as a move that defied human Go convention. In the documentary AlphaGo, David Silver, the lead researcher behind AlphaGo, reported that “the professional commentators almost unanimously said that not a single human player would have chosen move 37” (Kohs 2017). For example, Go champion Fan Hui commented: “When I see this move, for me, it’s just a big shock. What? Normally, humans, we never play this one because it’s bad. It’s just bad. We don’t know why.” (Kohs 2017). AlphaGo itself calculated only a 1/10,000 probability that a human player would make this move, showing just how dramatically it departed from established Go strategy and conventional human play patterns.

42. While reinforcement learning has historical roots in behaviorist traditions, mechanisms like eligibility traces ironically contradict behaviorism’s core anti-representational commitments, by effectively reintroducing the internal cognitive machinery that radical behaviorists sought to eliminate.
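
To see what kind of internal machinery an eligibility trace is, consider a minimal sketch of tabular TD(λ) with accumulating traces. The three-state episode, the parameter values, and the function name are illustrative assumptions rather than a reconstruction of any particular system discussed in the entry.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """Run one episode of tabular TD(lambda) with accumulating traces.

    episode is a list of (state, reward, next_state) steps, with
    next_state set to None on the final step. The traces dict is an
    internal, decaying record of which states were visited recently.
    """
    traces = {}
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values.get(next_state, 0.0)
        td_error = reward + gamma * next_value - values.get(state, 0.0)
        traces[state] = traces.get(state, 0.0) + 1.0
        for s in traces:
            # Credit each remembered state in proportion to its trace,
            # then let the trace decay.
            values[s] = values.get(s, 0.0) + alpha * td_error * traces[s]
            traces[s] *= gamma * lam
    return values


# A three-step episode in which the reward arrives only at the end.
episode = [("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 1.0, None)]
values = td_lambda_episode({}, episode)
print(values)
```

After a single episode the earliest state already receives some credit for the delayed reward, which is possible only because the learner carries a stored record of its recent past. With `lam=0.0` the traces vanish after each step and only the final state is updated.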

43. One might wonder whether this learning process needs to be gradual. The evidence does suggest that gradual exposure to different situations matters to get good representations that enable meaningful similarity comparisons between states (Botvinick et al. 2019). Importantly, the utility of episodic memories depends on accurate estimates of action values associated with those memories (Gershman & Daw 2017). These value estimates are refined over time through repeated interactions with the environment; if learning occurred too quickly, the stored values would be based on limited samples and noisy information, subsequently leading to suboptimal decisions when those memories are retrieved and used.

44. The concept of “domains” in meta-RL research typically refers to narrow families of related tasks sharing underlying structure (e.g., variations of maze configurations, bandit problems with different reward distributions, or similar game types). This narrowness contrasts with the broader domains where humans exhibit one-shot learning, such as language acquisition (“fast mapping”) or causal learning. Extending meta-RL to achieve human-like one-shot learning across broader domains remains an active research challenge, involving trade-offs between finding common representations and maintaining good discriminative power across domains.

45. Although similarity-based computations may limit episodic RL’s ability to capture overhypotheses (abstract principles that constrain hypothesis spaces), meta-learning offers a promising approach to address this limitation. Meta-learning systems can acquire inductive biases from experience by training on distributions of related tasks. Through this process, they can learn appropriate priors over hypothesis space (essentially acquiring overhypotheses) that facilitate rapid adaptation to new but related tasks. This allows meta-learned models to capture more abstract patterns across tasks than standard episodic RL systems. As Binz et al. (2024) show, meta-learning can produce approximately Bayes-optimal learning algorithms even when exact Bayesian inference would be computationally intractable, potentially bridging the gap between similarity-based and more structured forms of generalization (see also Wang 2021, and section 9.2 on how meta-learning helps with compositional generalization).


Library of Congress Catalog Data: ISSN 1095-5054