Stanford Encyclopedia of Philosophy

Theory and Observation in Science

First published Tue Jan 6, 2009; substantive revision Mon Jun 14, 2021

Scientists obtain a great deal of the evidence they use by collecting and producing empirical results. Much of the standard philosophical literature on this subject comes from 20th century logical empiricists, their followers, and critics who embraced their issues while objecting to some of their aims and assumptions. Discussions about empirical evidence have tended to focus on epistemological questions regarding its role in theory testing. This entry follows that precedent, even though empirical evidence also plays important and philosophically interesting roles in other areas including scientific discovery, the development of experimental tools and techniques, and the application of scientific theories to practical problems.

The logical empiricists and their followers devoted much of their attention to the distinction between observables and unobservables, the form and content of observation reports, and the epistemic bearing of observational evidence on theories it is used to evaluate. Philosophical work in this tradition was characterized by the aim of conceptually separating theory and observation, so that observation could serve as the pure basis of theory appraisal. More recently, the focus of the philosophical literature has shifted away from these issues, and their close association to the languages and logics of science, to investigations of how empirical data are generated, analyzed, and used in practice. With this shift, we also see philosophers largely setting aside the aspiration of a pure observational basis for scientific knowledge and instead embracing a view of science in which the theoretical and empirical are usefully intertwined. This entry discusses these topics under the headings below.

1. Introduction

Philosophers of science have traditionally recognized a special role for observations in the epistemology of science. Observations are the conduit through which the ‘tribunal of experience’ delivers its verdicts on scientific hypotheses and theories. The evidential value of an observation has been assumed to depend on how sensitive it is to whatever it is used to study. But this in turn depends on the adequacy of any theoretical claims its sensitivity may depend on. For example, we can challenge the use of a particular thermometer reading to support a prediction of a patient’s temperature by challenging theoretical claims having to do with whether a reading from a thermometer like this one, applied in the same way under similar conditions, should indicate the patient’s temperature well enough to count in favor of or against the prediction. At least some of those theoretical claims will be such that regardless of whether an investigator explicitly endorses them, or is even aware of them, her use of the thermometer reading would be undermined by their falsity. All observations and uses of observational evidence are theory laden in this sense (cf. Chang 2005, Azzouni 2004). As the example of the thermometer illustrates, analogues of Norwood Hanson’s claim that seeing is a theory laden undertaking apply just as well to equipment generated observations (Hanson 1958, 19). But if all observations and empirical data are theory laden, how can they provide reality-based, objective epistemic constraints on scientific reasoning?

Recent scholarship has turned this question on its head. Why think that theory ladenness of empirical results would be problematic in the first place? If the theoretical assumptions with which the results are imbued are correct, what is the harm of it? After all, it is in virtue of those assumptions that the fruits of empirical investigation can be ‘put in touch’ with theorizing at all. A number scribbled in a lab notebook can do a scientist little epistemic good unless she can recruit the relevant background assumptions to even recognize it as a reading of the patient’s temperature. But philosophers have embraced an entangled picture of the theoretical and empirical that goes much deeper than this. Lloyd (2012) advocates for what she calls “complex empiricism” in which there is “no pristine separation of model and data” (397). Bogen (2016) points out that “impure empirical evidence” (i.e. evidence that incorporates the judgements of scientists) “often tells us more about the world than it could have if it were pure” (784). Indeed, Longino (2020) has urged that “[t]he naïve fantasy that data have an immediate relation to phenomena of the world, that they are ‘objective’ in some strong, ontological sense of that term, that they are the facts of the world directly speaking to us, should be finally laid to rest” and that “even the primary, original, state of data is not free from researchers’ value- and theory-laden selection and organization” (391).

There is no widespread agreement among philosophers of science about how to characterize the nature of scientific theories. What is a theory? According to the traditional syntactic view, theories are collections of sentences couched in a logical language, which must then be supplemented with correspondence rules in order to be interpreted. Construed in this way, theories include maximally general explanatory and predictive laws (Coulomb’s law of electrical attraction and repulsion, and Maxwell’s equations of electromagnetism, for example), along with lesser generalizations that describe more limited natural and experimental phenomena (e.g., the ideal gas equations describing relations between temperatures and pressures of enclosed gases, and general descriptions of positional astronomical regularities). In contrast, the semantic view casts theories as the space of states possible according to the theory, or the set of mathematical models permissible according to the theory (see Suppe 1977). However, there are also significantly more ecumenical interpretations of what it means to be a scientific theory, which include elements of diverse kinds. To take just one illustrative example, Borrelli (2012) characterizes the Standard Model of particle physics as a theoretical framework involving what she calls “theoretical cores” that are composed of mathematical structures, verbal stories, and analogies with empirical references mixed together (196). This entry aims to accommodate all of these views about the nature of scientific theories.

In this entry, we trace the contours of traditional philosophical engagement with questions surrounding theory and observation in science that attempted to segregate the theoretical from the observational, and to cleanly delineate between the observable and the unobservable. We also discuss the more recent scholarship that supplants the primacy of observation by human sensory perception with an instrument-inclusive conception of data production and that embraces the intertwining of the theoretical and empirical in the production of useful scientific results. Although theory testing dominates much of the standard philosophical literature on observation, much of what this entry says about the role of observation in theory testing applies also to its role in inventing and modifying theories, and in applying them to tasks in engineering, medicine, and other practical enterprises.

2. Observation and data

2.1 Traditional empiricism

Reasoning from observations has been important to scientific practice at least since the time of Aristotle, who mentions a number of sources of observational evidence including animal dissection (Aristotle(a), 763a/30–b/15; Aristotle(b), 511b/20–25). Francis Bacon argued long ago that the best way to discover things about nature is to use experiences (his term for observations as well as experimental results) to develop and improve scientific theories (Bacon 1620, 49ff). The role of observational evidence in scientific discovery was an important topic for Whewell (1858) and Mill (1872) among others in the 19th century. But philosophers didn’t talk about observation as extensively, in as much detail, or in the way we have become accustomed to, until the 20th century, when logical empiricists transformed philosophical thinking about it.

One important transformation, characteristic of the linguistic turn in philosophy, was to concentrate on the logic of observation reports rather than on objects or phenomena observed. This focus made sense on the assumption that a scientific theory is a system of sentences or sentence-like structures (propositions, statements, claims, and so on) to be tested by comparison to observational evidence. It was assumed that the comparisons must be understood in terms of inferential relations. If inferential relations hold only between sentence-like structures, it follows that theories must be tested, not against observations or things observed, but against sentences, propositions, etc. used to report observations (Hempel 1935, 50–51; Schlick 1935). Theory testing was treated as a matter of comparing observation sentences describing observations made in natural or laboratory settings to observation sentences that should be true according to the theory to be tested. This was to be accomplished by using laws or lawlike generalizations along with descriptions of initial conditions, correspondence rules, and auxiliary hypotheses to derive observation sentences describing the sensory deliverances of interest. This makes it imperative to ask what observation sentences report.

According to what Hempel called the phenomenalist account, observation reports describe the observer’s subjective perceptual experiences.

… Such experiential data might be conceived of as being sensations, perceptions, and similar phenomena of immediate experience. (Hempel 1952, 674)

This view is motivated by the assumption that the epistemic value of an observation report depends upon its truth or accuracy, and that with regard to perception, the only thing observers can know with certainty to be true or accurate is how things appear to them. This means that we cannot be confident that observation reports are true or accurate if they describe anything beyond the observer’s own perceptual experience. Presumably one’s confidence in a conclusion should not exceed one’s confidence in one’s best reasons to believe it. For the phenomenalist, it follows that reports of subjective experience can provide better reasons to believe claims they support than reports of other kinds of evidence.

However, given the expressive limitations of the language available for reporting subjective experiences, we cannot expect phenomenalistic reports to be precise and unambiguous enough to test theoretical claims whose evaluation requires accurate, fine-grained perceptual discriminations. Worse yet, if experiences are directly available only to those who have them, there is room to doubt whether different people can understand the same observation sentence in the same way. Suppose you had to evaluate a claim on the basis of someone else’s subjective report of how a litmus solution looked to her when she dripped a liquid of unknown acidity into it. How could you decide whether her visual experience was the same as the one you would use her words to report?

Such considerations led Hempel to propose, contrary to the phenomenalists, that observation sentences report ‘directly observable’, ‘intersubjectively ascertainable’ facts about physical objects

… such as the coincidence of the pointer of an instrument with a numbered mark on a dial; a change of color in a test substance or in the skin of a patient; the clicking of an amplifier connected with a Geiger counter; etc. (ibid.)

That the facts expressed in observation reports be intersubjectively ascertainable was critical for the aims of the logical empiricists. They hoped to articulate and explain the authoritativeness widely conceded to the best natural, social, and behavioral scientific theories in contrast to propaganda and pseudoscience. Some pronouncements from astrologers and medical quacks gain wide acceptance, as do those of religious leaders who rest their cases on faith or personal revelation, and leaders who use their political power to secure assent. But such claims do not enjoy the kind of credibility that scientific theories can attain. The logical empiricists tried to account for the genuine credibility of scientific theories by appeal to the objectivity and accessibility of observation reports, and the logic of theory testing. Part of what they meant by calling observational evidence objective was that cultural and ethnic factors have no bearing on what can validly be inferred about the merits of a theory from observation reports. So conceived, objectivity was important to the logical empiricists’ criticism of the Nazi idea that Jews and Aryans have fundamentally different thought processes such that physical theories suitable for Einstein and his kind should not be inflicted on German students. In response to this rationale for ethnic and cultural purging of the German educational system, the logical empiricists argued that because of its objectivity, observational evidence (rather than ethnic and cultural factors) should be used to evaluate scientific theories (Galison 1990). In this way of thinking, observational evidence and its subsequent bearing on scientific theories are objective also in virtue of being free of non-epistemic values.

Ensuing generations of philosophers of science have found the logical empiricist focus on expressing the content of observations in a rarefied and basic observation language too narrow. The search for a suitably universal language as required by the logical empiricist program has come up empty-handed, and most philosophers of science have given up its pursuit. Moreover, as we will discuss in the following section, the centrality of observation itself (and pointer readings) to the aims of empiricism in philosophy of science has also come under scrutiny. However, leaving the search for a universal pure observation language behind does not automatically undercut the norm of objectivity as it relates to the social, political, and cultural contexts of scientific research. Pristine logical foundations aside, the objectivity of ‘neutral’ observations in the face of noxious political propaganda was appealing because it could serve as shared ground available for intersubjective appraisal. This appeal remains alive and well today, particularly as pernicious misinformation campaigns are again formidable in public discourse (see O’Connor and Weatherall 2019). If individuals can genuinely appraise the significance of empirical evidence and come to well-justified agreement about how the evidence bears on theorizing, then they can protect their epistemic deliberations from the undue influence of fascists and other nefarious manipulators. However, this aspiration must face subtleties arising from the social epistemology of science and from the nature of empirical results themselves. In practice, the appraisal of scientific results can often require expertise that is not readily accessible to members of the public without the relevant specialized training. Additionally, precisely because empirical results are not pure observation reports, their appraisal across communities of inquirers operating with different background assumptions can require significant epistemic work.

The logical empiricists paid little attention to the distinction between observing and experimenting and its epistemic implications. For some philosophers, to experiment is to isolate, prepare, and manipulate things in hopes of producing epistemically useful evidence. It had been customary to think of observing as noticing and attending to interesting details of things perceived under more or less natural conditions, or by extension, things perceived during the course of an experiment. To look at a berry on a vine and attend to its color and shape would be to observe it. To extract its juice and apply reagents to test for the presence of copper compounds would be to perform an experiment. By now, many philosophers have argued that contrivance and manipulation influence epistemically significant features of observable experimental results to such an extent that epistemologists ignore them at their peril. Robert Boyle (1661), John Herschel (1830), Bruno Latour and Steve Woolgar (1979), Ian Hacking (1983), Harry Collins (1985), Allan Franklin (1986), Peter Galison (1987), Jim Bogen and Jim Woodward (1988), and Hans-Jörg Rheinberger (1997) are some of the philosophers and philosophically-minded scientists, historians, and sociologists of science who gave serious consideration to the distinction between observing and experimenting. The logical empiricists tended to ignore it. Interestingly, the contemporary vantage point that attends to modeling, data processing, and empirical results may suggest a re-unification of observation and intervention under the same epistemological framework. When one no longer thinks of scientific observation as pure or direct, and recognizes the power of good modeling to account for confounds without physically intervening on the target system, the purported epistemic distinction between observation and intervention loses its bite.

2.2 The irrelevance of observation per se

Observers use magnifying glasses, microscopes, or telescopes to see things that are too small or far away to be seen, or seen clearly enough, without them. Similarly, amplification devices are used to hear faint sounds. But if to observe something is to perceive it, not every use of instruments to augment the senses qualifies as observational.

Philosophers generally agree that you can observe the moons of Jupiter with a telescope, or a heartbeat with a stethoscope. The van Fraassen of The Scientific Image is a notable exception, for whom to be ‘observable’ meant to be something that, were it present to a creature like us, would be observed. Thus, for van Fraassen, the moons of Jupiter are observable “since astronauts will no doubt be able to see them as well from close up” (1980, 16). In contrast, microscopic entities are not observable on van Fraassen’s account because creatures like us cannot strategically maneuver ourselves to see them, present before us, with our unaided senses.

Many philosophers have criticized van Fraassen’s view as overly restrictive. Nevertheless, philosophers differ in their willingness to draw the line between what counts as observable and what does not along the spectrum of increasingly complicated instrumentation. Many philosophers who don’t mind telescopes and microscopes still find it unnatural to say that high energy physicists ‘observe’ particles or particle interactions when they look at bubble chamber photographs—let alone digital visualizations of energy depositions left in calorimeters that are not themselves inspected. Their intuitions come from the plausible assumption that one can observe only what one can see by looking, hear by listening, feel by touching, and so on. Investigators can neither look at (direct their gazes toward and attend to) nor visually experience charged particles moving through a detector. Instead they can look at and see tracks in the chamber, in bubble chamber photographs, calorimeter data visualizations, etc.

In more contentious examples, some philosophers have moved to speaking of instrument-augmented empirical research as more like tool use than sensing. Hacking (1981) argues that we do not see through a microscope, but rather with it. Daston and Galison (2007) highlight the inherent interactivity of a scanning tunneling microscope, in which scientists image and manipulate atoms by exchanging electrons between the sharp tip of the microscope and the surface to be imaged (397). Others have opted to stretch the meaning of observation to accommodate what we might otherwise be tempted to call instrument-aided detections. For instance, Shapere (1982) argues that while it may initially strike philosophers as counter-intuitive, it makes perfect sense to call the detection of neutrinos from the interior of the sun “direct observation.”

The variety of views on the observable/unobservable distinction hints that empiricists may have been barking up the wrong philosophical tree. Many of the things scientists investigate do not interact with human perceptual systems as required to produce perceptual experiences of them. The methods investigators use to study such things argue against the idea—however plausible it may once have seemed—that scientists do or should rely exclusively on their perceptual systems to obtain the evidence they need. Thus Feyerabend proposed as a thought experiment that if measuring equipment were rigged up to register the magnitude of a quantity of interest, a theory could be tested just as well against its outputs as against records of human perceptions (Feyerabend 1969, 132–137). Feyerabend could have made his point with historical examples instead of thought experiments. A century earlier Helmholtz estimated the speed of excitatory impulses traveling through a motor nerve. To initiate impulses whose speed could be estimated, he implanted an electrode into one end of a nerve fiber and ran a current into it from a coil. The other end was attached to a bit of muscle whose contraction signaled the arrival of the impulse. To find out how long it took the impulse to reach the muscle he had to know when the stimulating current reached the nerve. But

[o]ur senses are not capable of directly perceiving an individual moment of time with such small duration …

and so Helmholtz had to resort to what he called ‘artificial methods of observation’ (Olesko and Holmes 1994, 84). This meant arranging things so that current from the coil could deflect a galvanometer needle. Assuming that the magnitude of the deflection is proportional to the duration of current passing from the coil, Helmholtz could use the deflection to estimate the duration he could not see (ibid). This sense of ‘artificial observation’ is not to be confused, e.g., with using magnifying glasses or telescopes to see tiny or distant objects. Such devices enable the observer to scrutinize visible objects. The minuscule duration of the current flow is not a visible object. Helmholtz studied it by cleverly concocting circumstances so that the deflection of the needle would meaningfully convey the information he needed. Hooke (1705, 16–17) argued for and designed instruments to execute the same kind of strategy in the 17th century.

It is of interest that records of perceptual observation are not always epistemically superior to data collected via experimental equipment. Indeed, it is not unusual for investigators to use non-perceptual evidence to evaluate perceptual data and correct for its errors. For example, Rutherford and Pettersson conducted similar experiments to find out if certain elements disintegrated to emit charged particles under radioactive bombardment. To detect emissions, observers watched a scintillation screen for faint flashes produced by particle strikes. Pettersson’s assistants reported seeing flashes from silicon and certain other elements. Rutherford’s did not. Rutherford’s colleague, James Chadwick, visited Pettersson’s laboratory to evaluate his data. Instead of watching the screen and checking Pettersson’s data against what he saw, Chadwick arranged to have Pettersson’s assistants watch the screen while, unbeknownst to them, he manipulated the equipment, alternating normal operating conditions with a condition in which particles, if any, could not hit the screen. Pettersson’s data were discredited by the fact that his assistants reported flashes at close to the same rate in both conditions (Stuewer 1985, 284–288).

When the process of producing data is relatively convoluted, it is even easier to see that human sense perception is not the ultimate epistemic engine. Consider functional magnetic resonance images (fMRI) of the brain decorated with colors to indicate magnitudes of electrical activity in different regions during the performance of a cognitive task. To produce these images, brief magnetic pulses are applied to the subject’s brain. The magnetic force coordinates the precessions of protons in hemoglobin and other bodily stuffs to make them emit radio signals strong enough for the equipment to respond to. When the magnetic force is relaxed, the signals from protons in highly oxygenated hemoglobin deteriorate at a detectably different rate than signals from blood that carries less oxygen. Elaborate algorithms are applied to radio signal records to estimate blood oxygen levels at the places from which the signals are calculated to have originated. There is good reason to believe that blood flowing just downstream from spiking neurons carries appreciably more oxygen than blood in the vicinity of resting neurons. Assumptions about the relevant spatial and temporal relations are used to estimate levels of electrical activity in small regions of the brain corresponding to pixels in the finished image. The results of all of these computations are used to assign the appropriate colors to pixels in a computer generated image of the brain. In view of all of this, functional brain imaging differs, e.g., from looking and seeing, photographing, and measuring with a thermometer or a galvanometer in ways that make it uninformative to call it observation. And similarly for many other methods scientists use to produce non-perceptual evidence.

The role of the senses in fMRI data production is limited to such things as monitoring the equipment and keeping an eye on the subject. Their epistemic role is limited to discriminating the colors in the finished image, reading tables of numbers the computer used to assign them, and so on. While it is true that researchers typically use their sense of sight to take in visualizations of processed fMRI data—or numbers on a page or screen for that matter—this is not the primary locus of epistemic action. Researchers learn about brain processes through fMRI data, to the extent that they do, primarily in virtue of the suitability of the causal connection between the target processes and the data records, and of the transformations those data undergo when they are processed into the maps or other results that scientists want to use. The interesting questions are not about observability, i.e. whether neuronal activity, blood oxygen levels, proton precessions, radio signals, and so on, are properly understood as observable by creatures like us. The epistemic significance of the fMRI data depends on their delivering us the right sort of access to the target, but observation is neither necessary nor sufficient for that access.

Following Shapere (1982), one could respond by adopting an extremely permissive view of what counts as an ‘observation’ so as to allow even highly processed data to count as observations. However, it is hard to reconcile the idea that highly processed data like fMRI images record observations with the traditional empiricist notion that calculations involving theoretical assumptions and background beliefs must not be allowed (on pain of loss of objectivity) to intrude into the process of data production. Observation garnered its special epistemic status in the first place because it seemed more direct, more immediate, and therefore less distorted and muddled than (say) detection or inference. The production of fMRI images requires extensive statistical manipulation based on theories about the radio signals, and a variety of factors having to do with their detection, along with beliefs about relations between blood oxygen levels and neuronal activity, sources of systematic error, and more. Insofar as the use of the term ‘observation’ connotes this extra baggage of traditional empiricism, it may be better to replace observation-talk with terminology that is more obviously permissive, such as that of ‘empirical data’ and ‘empirical results.’

2.3 Data and phenomena

Deposing observation from its traditional perch in empiricist epistemologies of science need not estrange philosophers from scientific practice. Terms like ‘observation’ and ‘observation reports’ do not occur nearly as much in scientific as in philosophical writings. In their place, working scientists tend to talk about data. Philosophers who adopt this usage are free to think about standard examples of observation as members of a large, diverse, and growing family of data production methods. Instead of trying to decide which methods to classify as observational and which things qualify as observables, philosophers can then concentrate on the epistemic influence of the factors that differentiate members of the family. In particular, they can focus their attention on what questions data produced by a given method can be used to answer, what must be done to use that data fruitfully, and the credibility of the answers they afford (Bogen 2016).

Satisfactorily answering such questions warrants further philosophical work. As Bogen and Woodward (1988) have argued, there is often a long road from obtaining a particular dataset, replete with idiosyncrasies born of unspecified causal nuances, to any claim about the phenomenon ultimately of interest to the researchers. Empirical data are typically produced in ways that make it impossible to predict them from the generalizations they are used to test, or to derive instances of those generalizations from data and non ad hoc auxiliary hypotheses. Indeed, it is unusual for many members of a set of reasonably precise quantitative data to agree with one another, let alone with a quantitative prediction. That is because precise, publicly accessible data typically cannot be produced except through processes whose results reflect the influence of causal factors that are too numerous, too different in kind, and too irregular in behavior for any single theory to account for them. When Bernard Katz recorded electrical activity in nerve fiber preparations, the numerical values of his data were influenced by factors peculiar to the operation of his galvanometers and other pieces of equipment, variations among the positions of the stimulating and recording electrodes that had to be inserted into the nerve, the physiological effects of their insertion, and changes in the condition of the nerve as it deteriorated during the course of the experiment. There were variations in the investigators’ handling of the equipment. Vibrations shook the equipment in response to a variety of irregularly occurring causes, ranging from random error sources to the heavy tread of Katz’s teacher, A.V. Hill, walking up and down the stairs outside of the laboratory. That’s a short list. To make matters worse, many of these factors influenced the data as parts of irregularly occurring, transient, and shifting assemblies of causal influences.

The effects of systematic and random sources of error are typically such that considerable analysis and interpretation are required to take investigators from data sets to conclusions that can be used to evaluate theoretical claims. Interestingly, this applies as much to clear cases of perceptual data as to machine produced records. When 19th and early 20th century astronomers looked through telescopes and pushed buttons to record the time at which they saw a star pass a crosshair, the values of their data points depended not only upon light from that star, but also upon features of perceptual processes, reaction times, and other psychological factors that varied from observer to observer. No astronomical theory has the resources to take such things into account.

Instead of testing theoretical claims by direct comparison to the data initially collected, investigators use data to infer facts about phenomena, i.e., events, regularities, processes, etc. whose instances are uniform and uncomplicated enough to make them susceptible to systematic prediction and explanation (Bogen and Woodward 1988, 317). The fact that lead melts at temperatures at or close to 327.5 °C is an example of a phenomenon, as are widespread regularities among electrical quantities involved in the action potential, the motions of astronomical bodies, etc. Theories that cannot be expected to predict or explain such things as individual temperature readings can nevertheless be evaluated on the basis of how useful they are in predicting or explaining phenomena. The same holds for the action potential as opposed to the electrical data from which its features are calculated, and the motions of astronomical bodies in contrast to the data of observational astronomy. It is reasonable to ask a genetic theory how probable it is (given similar upbringings in similar environments) that the offspring of a parent or parents diagnosed with alcohol use disorder will develop one or more symptoms the DSM classifies as indicative of alcohol use disorder. But it would be quite unreasonable to ask the genetic theory to predict or explain one patient’s numerical score on one trial of a particular diagnostic test, or why a diagnostician wrote a particular entry in her report of an interview with an offspring of one of such parents (see Bogen and Woodward 1988, 319–326).

Leonelli has challenged Bogen and Woodward’s (1988) claim that data are, as she puts it, “unavoidably embedded in one experimental context” (2009, 738). She argues that when data are suitably packaged, they can travel to new epistemic contexts and retain epistemic utility—it is not just claims about the phenomena that can travel; data travel too. Preparing data for safe travel involves work, and by tracing data ‘journeys,’ philosophers can learn about how the careful labor of researchers, data archivists, and database curators can facilitate useful data mobility. While Leonelli’s own work has often focused on data in biology, Leonelli and Tempini (2020) contains many diverse case studies of data journeys from a variety of scientific disciplines that will be of value to philosophers interested in the methodology and epistemology of science in practice.

The fact that theories typically predict and explain features of phenomena rather than idiosyncratic data should not be interpreted as a failing. For many purposes, this is the more useful and illuminating capacity. Suppose you could choose between a theory that predicted or explained the way in which neurotransmitter release relates to neuronal spiking (e.g., the fact that on average, transmitters are released roughly once for every 10 spikes) and a theory which explained or predicted the numbers displayed on the relevant experimental equipment in one, or a few, single cases. For most purposes, the former theory would be preferable to the latter, at the very least because it applies to so many more cases. The same goes for a theory that predicts or explains something about the probability of alcohol use disorder conditional on some genetic factor, or one that predicts or explains the probability of faulty diagnoses of alcohol use disorder conditional on facts about the training that psychiatrists receive. For most purposes, these would be preferable to a theory that predicted specific descriptions in a single particular case history.

However, there are circumstances in which scientists do want to explain data. In empirical research, getting a useful signal often requires that scientists deal with sources of background noise and confounding signals. This is part of the long road from newly collected data to useful empirical results. An important step on the way to eliminating unwanted noise or confounds is to determine their sources. Different sources of noise can have different characteristics that can be derived from and explained by theory. Consider the difference between ‘shot noise’ and ‘thermal noise,’ two ubiquitous sources of noise in precision electronics (Schottky 1918; Nyquist 1928; Horowitz and Hill 2015). ‘Shot noise’ arises in virtue of the discrete nature of a signal. For instance, light collected by a detector does not arrive all at once or in perfectly continuous fashion. Photons rain onto a detector shot by shot on account of being quanta. Imagine building up an image one photon at a time—at first the structure of the image is barely recognizable, but after the arrival of many photons, the image eventually fills in. In fact, the contribution of noise of this type goes as the square root of the signal. By contrast, thermal noise is due to non-zero temperature—thermal fluctuations cause a small current to flow in any circuit. If you cool your instrument (as very many precision experiments in physics do), then you can decrease thermal noise. Cooling the detector is not going to change the quantum nature of photons, though; simply collecting more photons will improve the signal to noise ratio with respect to shot noise. Thus, determining what kind of noise is affecting one’s data, i.e., explaining features of the data themselves that are idiosyncratic to the particular instruments and conditions prevailing during a specific instance of data collection, can be critical to eventually generating a dataset that can be used to answer questions about phenomena of interest. In using data that require statistical analysis, it is particularly clear that “empirical assumptions about the factors influencing the measurement results may be used to motivate the assumption of a particular error distribution”, which can be crucial for justifying the application of methods of analysis (Woodward 2011, 173).
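The square-root behavior of shot noise can be checked numerically. The sketch below (all function names and numbers are our own illustrative inventions, not drawn from any experiment discussed here) simulates Poisson-distributed photon counts and shows that quadrupling the number of exposures roughly doubles the signal-to-noise ratio of the accumulated count:

```python
import math
import random

def _poisson_small(rng, lam):
    """Draw from a Poisson distribution via Knuth's method (small lam only)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        p *= rng.random()
        k += 1
    return k - 1

def poisson(rng, lam):
    """Poisson draw for arbitrary lam, split into chunks to avoid exp()
    underflow (Poisson variables are additive in their rates)."""
    k = 0
    while lam > 30.0:
        k += _poisson_small(rng, 30.0)
        lam -= 30.0
    return k + _poisson_small(rng, lam)

def snr_of_total_counts(rate_per_exposure, n_exposures, trials=2000, seed=1):
    """Signal-to-noise ratio (mean / standard deviation) of the total
    photon count accumulated over n_exposures, estimated by Monte Carlo."""
    rng = random.Random(seed)
    lam = rate_per_exposure * n_exposures
    counts = [poisson(rng, lam) for _ in range(trials)]
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / (trials - 1)
    return mean / math.sqrt(var)

# Quadrupling the exposures roughly doubles the SNR, as expected if the
# noise grows only as the square root of the signal.
snr_100 = snr_of_total_counts(1.0, 100)   # expect a value near 10
snr_400 = snr_of_total_counts(1.0, 400)   # expect a value near 20
```

Since the standard deviation of a Poisson count equals the square root of its mean, the SNR scales as the square root of the number of collected photons, which is just the ‘collect more photons’ remedy described above.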

There are also circumstances in which scientists want to provide a substantive, detailed explanation for a particular idiosyncratic datum, and even circumstances in which procuring such explanations is epistemically imperative. Ignoring outliers without good epistemic reasons is just cherry-picking data, one of the canonical ‘questionable research practices.’ Allan Franklin has described Robert Millikan’s convenient exclusion of data he collected from observing the second oil drop in his experiments of April 16, 1912 (1986, 231). When Millikan initially recorded the data for this drop, his notebooks indicate that he was satisfied his apparatus was working properly and that the experiment was running well—he wrote “Publish” next to the data in his lab notebook. However, after he had later calculated the value for the fundamental electric charge that these data yielded, and found it aberrant with respect to the values he calculated using data collected from other good observing sessions, he changed his mind, writing “Won’t work” next to the calculation (ibid.; see also Woodward 2010, 794). Millikan not only never published this result, he never published why he failed to publish it. When data are excluded from analysis, there ought to be some explanation justifying their omission over and above lack of agreement with the experimenters’ expectations. Precisely because they are outliers, some data require specific, detailed, idiosyncratic causal explanations. Indeed, it is often in virtue of those very explanations that outliers can be responsibly rejected. Some explanation of data rejected as ‘spurious’ is required. Otherwise, scientists risk biasing their own work.

Thus, while in transforming data as collected into something useful for learning about phenomena, scientists often account for features of the data such as different types of noise contributions, and sometimes even explain the odd outlying data point or artifact, they simply do not explain every minute causal contribution to the exact character of a data set or datum in full detail. This is because scientists can neither discover such causal minutiae nor would their invocation be necessary for typical research questions. The fact that it may sometimes be important for scientists to provide detailed explanations of data, and not just claims about phenomena inferred from data, should not be confused with the dubious claim that scientists could ‘in principle’ detail every causal quirk that contributed to some data (Woodward 2010; 2011).

In view of all of this, together with the fact that a great many theoretical claims can only be tested directly against facts about phenomena, it behooves epistemologists to think about how data are used to answer questions about phenomena. Lacking space for a detailed discussion, the most this entry can do is to mention two main kinds of things investigators do in order to draw conclusions from data. The first is causal analysis carried out with or without the use of statistical techniques. The second is non-causal statistical analysis.

First, investigators must distinguish features of the data that are indicative of facts about the phenomenon of interest from those which can safely be ignored, and those which must be corrected for. Sometimes background knowledge makes this easy. Under normal circumstances investigators know that their thermometers are sensitive to temperature, and their pressure gauges, to pressure. An astronomer or a chemist who knows what spectrographic equipment does, and what she has applied it to, will know what her data indicate. Sometimes it is less obvious. When Santiago Ramón y Cajal looked through his microscope at a thin slice of stained nerve tissue, he had to figure out which, if any, of the fibers he could see at one focal length connected to or extended from things he could see only at another focal length, or in another slice. Analogous considerations apply to quantitative data. It was easy for Katz to tell when his equipment was responding more to Hill’s footfalls on the stairs than to the electrical quantities it was set up to measure. It can be harder to tell whether an abrupt jump in the amplitude of a high frequency EEG oscillation was due to a feature of the subject’s brain activity or an artifact of extraneous electrical activity in the laboratory or operating room where the measurements were made. The answers to questions about which features of numerical and non-numerical data are indicative of a phenomenon of interest typically depend at least in part on what is known about the causes that conspire to produce the data.

Statistical arguments are often used to deal with questions about the influence of epistemically relevant causal factors. For example, when it is known that similar data can be produced by factors that have nothing to do with the phenomenon of interest, Monte Carlo simulations, regression analyses of sample data, and a variety of other statistical techniques sometimes provide investigators with their best chance of deciding how seriously to take a putatively illuminating feature of their data.
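The shape of such an argument can be made concrete with a minimal sketch of one standard Monte Carlo technique, a permutation test (the data and function names below are entirely hypothetical): it estimates how often a difference as large as the one observed between two small samples would arise if the group labels carried no information at all.

```python
import random
from statistics import mean

def permutation_p_value(sample_a, sample_b, trials=10000, seed=0):
    """Estimate the probability that randomly relabeling the pooled data
    yields a difference of means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(sample_a) - mean(sample_b)
    pooled = list(sample_a) + list(sample_b)
    n = len(sample_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # destroy any real group structure
        if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
            hits += 1
    return hits / trials

# A clear separation between the groups yields a small p-value ...
p_signal = permutation_p_value([5.1, 5.3, 5.2, 5.4, 5.0],
                               [4.1, 4.0, 4.2, 4.3, 4.1])
# ... while samples drawn from the same population do not.
p_noise = permutation_p_value([5.1, 4.9, 5.0, 5.2, 4.8],
                              [5.0, 5.1, 4.9, 4.8, 5.2])
```

A small p-value here is evidence that the feature of the data is unlikely to have been produced by label-independent chance alone, which is precisely the question the investigator needs answered before taking the feature seriously.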

But statistical techniques are also required for purposes other than causal analysis. To calculate the magnitude of a quantity like the melting point of lead from a scatter of numerical data, investigators throw out outliers, calculate the mean and the standard deviation, etc., and establish confidence and significance levels. Regression and other techniques are applied to the results to estimate how far from the mean the magnitude of interest can be expected to fall in the population of interest (e.g., the range of temperatures at which pure samples of lead can be expected to melt).
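As a toy illustration of this kind of calculation (the readings and the outlier rule are invented for the example, not anyone’s actual data), one might screen out an aberrant reading and then report a mean with a confidence interval:

```python
import math
import statistics

def summarize(readings, mad_cutoff=3.5):
    """Discard outliers by a median-absolute-deviation rule, then report
    the mean and an approximate 95% confidence interval half-width."""
    med = statistics.median(readings)
    mad = statistics.median(abs(r - med) for r in readings)
    # 1.4826 * MAD estimates the standard deviation for normal data.
    limit = mad_cutoff * 1.4826 * mad
    kept = [r for r in readings if abs(r - med) <= limit]
    m = statistics.mean(kept)
    s = statistics.stdev(kept)
    half_width = 1.96 * s / math.sqrt(len(kept))  # normal approximation
    return m, half_width, kept

# Invented melting-point readings (degrees C); 329.9 is an apparent outlier.
readings = [327.3, 327.6, 327.4, 327.7, 327.5, 327.4, 329.9, 327.6]
m, hw, kept = summarize(readings)
# m comes out near 327.5, with a half-width of roughly 0.1
```

Note that, as the Millikan episode above illustrates, a mechanical rule like this one does not by itself justify the exclusion; it only flags the datum whose idiosyncratic causal history then needs to be accounted for.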

The fact that little can be learned from data without causal, statistical, and related argumentation has interesting consequences for received ideas about how the use of observational evidence distinguishes science from pseudoscience, religion, and other non-scientific cognitive endeavors. First, scientists are not the only ones who use observational evidence to support their claims; astrologers and medical quacks use it too. To find epistemically significant differences, one must carefully consider what sorts of data they use, where it comes from, and how it is employed. The virtues of scientific as opposed to non-scientific theory evaluation depend not only on its reliance on empirical data, but also on how the data are produced, analyzed, and interpreted to draw conclusions against which theories can be evaluated. Secondly, it does not take many examples to refute the notion that adherence to a single, universally applicable ‘scientific method’ differentiates the sciences from the non-sciences. Data are produced and used in far too many different ways to be treated informatively as instances of any single method. Thirdly, it is usually, if not always, impossible for investigators to draw conclusions to test theories against observational data without explicit or implicit reliance on theoretical resources.

Bokulich (2020) has helpfully outlined a taxonomy of various ways in which data can be model-laden to increase their epistemic utility. She focuses on seven categories: data conversion, data correction, data interpolation, data scaling, data fusion, data assimilation, and synthetic data. Of these categories, conversion and correction are perhaps the most familiar. Bokulich reminds us that even in the case of reading a temperature from an ordinary mercury thermometer, we are ‘converting’ the data as measured, which in this case is the height of the column of mercury, to a temperature (ibid., 795). In more complicated cases, such as processing the arrival times of acoustic signals in seismic reflection measurements to yield values for subsurface depth, data conversion may involve models (ibid.). In this example, models of the composition and geometry of the subsurface are needed in order to account for differences in the speed of sound in different materials. Data ‘correction’ involves common practices we have already discussed, like modeling and mathematically subtracting background noise contributions from one’s dataset (ibid., 796). Bokulich rightly points out that involving models in these ways routinely improves the epistemic uses to which data can be put. Data interpolation, scaling, and ‘fusion’ are also relatively widespread practices that deserve further philosophical analysis. Interpolation involves filling in missing data in a patchy data set, under the guidance of models. Data are scaled when they have been generated at a particular scale (temporal, spatial, energy) and modeling assumptions are recruited to transform them to apply at another scale. Data are ‘fused,’ in Bokulich’s terminology, when data collected in diverse contexts, using diverse methods, are combined or integrated together: for instance, when data from ice cores, tree rings, and the historical logbooks of sea captains are merged into a joint climate dataset. Scientists must take care in combining data of diverse provenance, and model new uncertainties arising from the very amalgamation of datasets (ibid., 800).
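Of these practices, interpolation is perhaps the easiest to sketch. In the toy example below (values and function name invented for illustration), gaps in a record are filled under the modeling assumption that the measured quantity varies linearly between neighboring measurements; a different model of the underlying process would license different fills.

```python
def fill_linear(series):
    """Fill None gaps by linear interpolation between the nearest measured
    neighbors. Assumes the first and last entries are measured values."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is None:
            lo = max(j for j in known if j < i)   # nearest measurement before
            hi = min(j for j in known if j > i)   # nearest measurement after
            frac = (i - lo) / (hi - lo)
            out[i] = series[lo] + frac * (series[hi] - series[lo])
    return out

# Three missing readings filled under the linearity assumption:
print(fill_linear([3.0, None, None, None, 11.0]))
# [3.0, 5.0, 7.0, 9.0, 11.0]
```

The epistemic point is that the filled values are only as good as the linearity assumption; the model, not the instrument, vouches for them.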

Bokulich contrasts ‘synthetic data’ with what she calls ‘real data’ (ibid., 801–802). Synthetic data are virtual, or simulated, data, and are not produced by physical interaction with worldly research targets. Bokulich emphasizes the role that simulated data can usefully play in testing and troubleshooting aspects of data processing that are to eventually be deployed on empirical data (ibid., 802). It can be incredibly useful for developing and stress-testing a data processing pipeline to have fake datasets whose characteristics are already known in virtue of having been produced by the researchers, and being available for their inspection at will. When the characteristics of a dataset are known, or indeed can be tailored according to need, the effects of new processing methods can be more readily traced than without. In this way, researchers can familiarize themselves with the effects of a data processing pipeline, and make adjustments to that pipeline in light of what they learn by feeding fake data through it, before attempting to use that pipeline on actual science data. Such investigations can be critical to eventually arguing for the credibility of the final empirical results and their appropriate interpretation and use.

Data assimilation is perhaps a less widely appreciated aspect of model-based data processing among philosophers of science, excepting Parker (2016; 2017). Bokulich characterizes this method as “the optimal integration of data with dynamical model estimates to provide a more accurate ‘assimilation estimate’ of the quantity” (2020, 800). Thus, data assimilation involves balancing the contributions of empirical data and the output of models in an integrated estimate, according to the uncertainties associated with these contributions.
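In the simplest scalar case, this balancing can be sketched as inverse-variance weighting, the rule at the core of Kalman-filter-style assimilation schemes (the framing and the numbers below are our own illustration, not Bokulich’s example):

```python
def assimilate(model_value, model_var, obs_value, obs_var):
    """Combine a model estimate and an observation, each with its own
    variance, into the minimum-variance weighted estimate.

    The gain shifts the estimate toward whichever input is more certain;
    the combined variance is smaller than either input variance."""
    gain = model_var / (model_var + obs_var)
    estimate = model_value + gain * (obs_value - model_value)
    combined_var = 1.0 / (1.0 / model_var + 1.0 / obs_var)
    return estimate, combined_var

# A model forecast of 15.0 (variance 4.0) and an observation of 13.0
# (variance 1.0): the result sits much closer to the more certain input.
est, var = assimilate(15.0, 4.0, 13.0, 1.0)
# est is approximately 13.4, var is 0.8
```

The weighting makes vivid the point in the quotation: neither the data nor the model dictates the assimilation estimate; their respective uncertainties determine how much each contributes.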

Bokulich argues that the involvement of models in these various aspects of data processing does not necessarily lead to better epistemic outcomes. Done wrong, integrating models and data can introduce artifacts and make the processed data unreliable for the purpose at hand (ibid., 804). Indeed, she notes that “[t]here is much work for methodologically reflective scientists and philosophers of science to do in sorting out cases in which model-data symbiosis may be problematic or circular” (ibid.).

3. Theory and value ladenness

Empirical results are laden with values and theoretical commitments. Philosophers have raised and appraised several possible kinds of epistemic problems that could be associated with theory- and/or value-laden empirical results. They have worried about the extent to which human perception itself is distorted by our commitments. They have worried that drawing upon theoretical resources from the very theory to be appraised (or its competitors) in the generation of empirical results yields vicious circularity (or inconsistency). They have also worried that contingent conceptual and/or linguistic frameworks trap bits of evidence like bees in amber, so that they cannot carry on their epistemic lives outside of the contexts of their origination, and that normative values necessarily corrupt the integrity of science. Do the theory- and value-ladenness of empirical results render them hopelessly parochial? That is, when scientists leave theoretical commitments behind and adopt new ones, must they also relinquish the fruits of the empirical research imbued with their prior commitments? In this section, we discuss these worries and responses that philosophers have offered to assuage them.

3.1 Perception

If you believe that observation by human sense perception is the objective basis of all scientific knowledge, then you ought to be particularly worried about the potential for human perception to be corrupted by theoretical assumptions, wishful thinking, framing effects, and so on. Daston and Galison recount the striking example of Arthur Worthington’s symmetrical milk drops (2007, 11–16). Working in 1875, Worthington investigated the hydrodynamics of falling fluid droplets and their evolution upon impacting a hard surface. At first, he had tried to carefully track the drop dynamics with a strobe light to burn a sequence of images into his own retinas. The images he drew to record what he saw were radially symmetric, with rays of the drop splashes emanating evenly from the center of the impact. However, when Worthington transitioned from using his eyes and capacity to draw from memory to using photography in 1894, he was shocked to find that the kind of splashes he had been observing were irregular splats (ibid., 13). Even curiouser, when Worthington returned to his drawings, he found that he had indeed recorded some unsymmetrical splashes. He had evidently dismissed them as uninformative accidents instead of regarding them as revelatory of the phenomenon he was intent on studying (ibid.). In attempting to document the ideal form of the splashes, a general and regular form, he had subconsciously downplayed the irregularity of individual splashes. If theoretical commitments, like Worthington’s initial commitment to the perfect symmetry of the physics he was studying, pervasively and incorrigibly dictated the results of empirical inquiry, then the epistemic aims of science would be seriously undermined.

Perceptual psychologists Bruner and Postman found that subjects who were briefly shown anomalous playing cards, e.g., a black four of hearts, reported having seen their normal counterparts, e.g., a red four of hearts. It took repeated exposures to get subjects to say the anomalous cards didn’t look right, and eventually, to describe them correctly (Kuhn 1962, 63). Kuhn took such studies to indicate that things don’t look the same to observers with different conceptual resources. (For a more up-to-date discussion of theory and conceptual perceptual loading see Lupyan 2015.) If so, black hearts didn’t look like black hearts until repeated exposures somehow allowed subjects to acquire the concept of a black heart. By analogy, Kuhn supposed, when observers working in conflicting paradigms look at the same thing, their conceptual limitations should keep them from having the same visual experiences (Kuhn 1962, 111, 113–114, 115, 120–1). This would mean, for example, that when Priestley and Lavoisier watched the same experiment, Lavoisier should have seen what accorded with his theory that combustion and respiration are oxidation processes, while Priestley’s visual experiences should have agreed with his theory that burning and respiration are processes of phlogiston release.

The example of Pettersson’s and Rutherford’s scintillation screen evidence (above) attests to the fact that observers working in different laboratories sometimes report seeing different things under similar conditions. It is plausible that their expectations influence their reports. It is plausible that their expectations are shaped by their training and by their supervisors’ and associates’ theory driven behavior. But as happens in other cases as well, all parties to the dispute agreed to reject Pettersson’s data by appealing to results that both laboratories could obtain and interpret in the same way without compromising their theoretical commitments. Indeed, it is possible for scientists to share empirical results, not just across diverse laboratory cultures, but even across serious differences in worldview. Much as they disagreed about the nature of respiration and combustion, Priestley and Lavoisier gave quantitatively similar reports of how long their mice stayed alive and their candles kept burning in closed bell jars. Priestley taught Lavoisier how to obtain what he took to be measurements of the phlogiston content of an unknown gas. A sample of the gas to be tested is run into a graduated tube filled with water and inverted over a water bath. After noting the height of the water remaining in the tube, the observer adds “nitrous air” (we call it nitric oxide) and checks the water level again. Priestley, who thought there was no such thing as oxygen, believed the change in water level indicated how much phlogiston the gas contained. Lavoisier reported observing the same water levels as Priestley even after he abandoned phlogiston theory and became convinced that changes in water level indicated free oxygen content (Conant 1957, 74–109).

A related issue is that of salience. Kuhn claimed that if Galileo and an Aristotelian physicist had watched the same pendulum experiment, they would not have looked at or attended to the same things. The Aristotelian’s paradigm would have required the experimenter to measure

… the weight of the stone, the vertical height to which it had been raised, and the time required for it to achieve rest (Kuhn 1962, 123)

and ignore radius, angular displacement, and time per swing (ibid., 124). These last were salient to Galileo because he treated pendulum swings as constrained circular motions. The Galilean quantities would be of no interest to an Aristotelian who treats the stone as falling under constraint toward the center of the earth (ibid., 123). Thus Galileo and the Aristotelian would not have collected the same data. (Absent records of Aristotelian pendulum experiments, we can think of this as a thought experiment.)

Interests change, however. Scientists may eventually come to appreciate the significance of data that had not originally been salient to them in light of new presuppositions. The moral of these examples is that although paradigms or theoretical commitments sometimes have an epistemically significant influence on what observers perceive or what they attend to, it can be relatively easy to nullify or correct for their effects. When presuppositions cause epistemic damage, investigators are often able to eventually make corrections. Thus, paradigms and theoretical commitments actually do influence saliency, but their influence is neither inevitable nor irremediable.

3.2 Assuming the theory to be tested

Thomas Kuhn (1962), Norwood Hanson (1958), Paul Feyerabend (1959) and others cast suspicion on the objectivity of observational evidence in another way by arguing that one cannot use empirical evidence to test a theory without committing oneself to that very theory. This would be a problem if it led to dogmatism, but assuming the theory to be tested is often benign and even necessary.

For instance, Laymon (1988) demonstrates the manner in which the very theory that the Michelson-Morley experiments are considered to test is assumed in the experimental design, but that this does not engender deleterious epistemic effects (250). The Michelson-Morley apparatus consists of two interferometer arms at right angles to one another, which are rotated in the course of the experiment so that, on the original construal, the path length traversed by light in the apparatus would vary according to alignment with or against the Earth’s velocity (carrying the apparatus) with respect to the stationary aether. This difference in path length would show up as displacement in the interference fringes of light in the interferometer. Although Michelson’s intention had been to measure the velocity of the Earth with respect to the all-pervading aether, the experiments eventually came to be regarded as furnishing tests of the Fresnel aether theory itself. In particular, the null results of these experiments were taken as evidence against the existence of the aether. Naively, one might suppose that whatever assumptions were made in the calculation of the results of these experiments, it should not be the case that the theory under the gun was assumed, nor that its negation was.

Before Michelson’s experiments, the Fresnel aether theory did not predict any sort of length contraction. Although Michelson assumed no contraction in the arms of the interferometer, Laymon argues that he could have assumed contraction, with no practical impact on the results of the experiments. The predicted fringe shift, calculated from the anticipated difference in the distance traveled by light in the two arms, is the same either way when higher order terms are neglected. Thus, in practice, the experimenters could assume either that the contraction thesis was true or that it was false when determining the length of the arms. Either way, the results of the experiment would be the same. After Michelson’s experiments returned no evidence of the anticipated aether effects, Lorentz-Fitzgerald contraction was postulated precisely to cancel out the expected (but not found) effects and save the aether theory. Morley and Miller then set out specifically to test the contraction thesis, and still assumed no contraction in determining the length of the arms of their interferometer (ibid., 253). Thus Laymon argues that the Michelson-Morley experiments speak against the tempting assumption that “appraisal of a theory is based on phenomena which can be detected and measured without using assumptions drawn from the theory under examination or from competitors to that theory” (ibid., 246).

Epistemological hand-wringing about the use of the very theory to be tested in the generation of the evidence to be used for testing seems to spring primarily from a concern about vicious circularity. How can we have a genuine trial, if the theory in question has been presumed innocent from the outset? While it is true that there would be a serious epistemic problem in a case where the use of the theory to be tested conspired to guarantee that the evidence would turn out to be confirmatory, this is not always the case when theories are invoked in their own testing. Woodward (2011) summarizes a tidy case:

For example, in Millikan’s oil drop experiment, the mere fact that theoretical assumptions (e.g., that the charge of the electron is quantized and that all electrons have the same charge) play a role in motivating his measurements or a vocabulary for describing his results does not by itself show that his design and data analysis were of such a character as to guarantee that he would obtain results supporting his theoretical assumptions. His experiment was such that he might well have obtained results showing that the charge of the electron was not quantized or that there was no single stable value for this quantity. (178)

For any given case, determining whether the theoretical assumptions being made are benign, or whether they straitjacket the results it will be possible to obtain, will require investigating the particular relationships between the assumptions and results in that case. When data production and analysis processes are complicated, this task can get difficult. But the point is that merely noting the involvement of the theory to be tested in the generation of empirical results does not by itself imply that those results cannot be objectively useful for deciding whether the theory to be tested should be accepted or rejected.

3.3 Semantics

Kuhn argued that theoretical commitments exert a strong influence on observation descriptions, and on what they are understood to mean (Kuhn 1962, 127ff; Longino 1979, 38–42). If so, proponents of a caloric account of heat won’t describe or understand descriptions of observed results of heat experiments in the same way as investigators who think of heat in terms of mean kinetic energy or radiation. They might all use the same words (e.g., ‘temperature’) to report an observation without understanding them in the same way. This poses a potential problem for communicating effectively across paradigms, and similarly, for attributing the appropriate significance to empirical results generated outside of one’s own linguistic framework.

It is important to bear in mind that observers do not always use declarative sentences to report observational and experimental results. Instead, they often draw, photograph, make audio recordings, etc., or set up their experimental devices to generate graphs, pictorial images, tables of numbers, and other non-sentential records. Obviously investigators’ conceptual resources and theoretical biases can exert epistemically significant influences on what they record (or set their equipment to record), which details they include or emphasize, and which forms of representation they choose (Daston and Galison 2007, 115–190, 309–361). But disagreements about the epistemic import of a graph, picture, or other non-sentential bit of data often turn on causal rather than semantical considerations. Anatomists may have to decide whether a dark spot in a micrograph was caused by a staining artifact or by light reflected from an anatomically significant structure. Physicists may wonder whether a blip in a Geiger counter record reflects the causal influence of the radiation they wanted to monitor, or a surge in ambient radiation. Chemists may worry about the purity of samples used to obtain data. Such questions are not, and are not well represented as, semantic questions to which semantic theory loading is relevant. Late 20th century philosophers may have ignored such cases and exaggerated the influence of semantic theory loading because they thought of theory testing in terms of inferential relations between observation and theoretical sentences.

Nevertheless, some empirical results are reported as declarative sentences. Looking at a patient with red spots and a fever, an investigator might report having seen the spots, or measles symptoms, or a patient with measles. Watching an unknown liquid dripping into a litmus solution, an observer might report seeing a change in color, a liquid with a pH of less than 7, or an acid. The appropriateness of a description of a test outcome depends on how the relevant concepts are operationalized. What justifies an observer in reporting having observed a case of measles according to one operationalization might require her to say no more than that she had observed measles symptoms, or just red spots, according to another.

In keeping with Percy Bridgman’s view that

… in general, we mean by a concept nothing more than a set of operations; the concept is synonymous with the corresponding set of operations (Bridgman 1927, 5)

one might suppose that operationalizations are definitions or meaning rules such that it is analytically true, e.g., that every liquid that turns litmus red in a properly conducted test is acidic. But it is more faithful to actual scientific practice to think of operationalizations as defeasible rules for the application of a concept, such that both the rules and their applications are subject to revision on the basis of new empirical or theoretical developments. So understood, to operationalize is to adopt verbal and related practices for the purpose of enabling scientists to do their work. Operationalizations are thus sensitive, and subject to change, on the basis of findings that influence their usefulness (Feest 2005).

Definitional or not, investigators in different research traditions may be trained to report their observations in conformity with conflicting operationalizations. Thus instead of training observers to describe what they see in a bubble chamber as a whitish streak or a trail, one might train them to say they see a particle track or even a particle. This may reflect what Kuhn meant by suggesting that some observers might be justified or even required to describe themselves as having seen oxygen, transparent and colorless though it is, or atoms, invisible though they are (Kuhn 1962, 127ff). To the contrary, one might object that what one sees should not be confused with what one is trained to say when one sees it, and therefore that talking about seeing a colorless gas or an invisible particle may be nothing more than a picturesque way of talking about what certain operationalizations entitle observers to say. Strictly speaking, the objection concludes, the term ‘observation report’ should be reserved for descriptions that are neutral with respect to conflicting operationalizations.

If observational data are just those utterances that meet Feyerabend’s decidability and agreeability conditions, the import of semantic theory loading depends upon how quickly, and for which sentences, reasonably sophisticated language users who stand in different paradigms can non-inferentially reach the same decisions about what to assert or deny. Some would expect enough agreement to secure the objectivity of observational data. Others would not. Still others would try to supply different standards for objectivity.

With regard to sentential observation reports, the significance of semantic theory loading is less pervasive than one might expect. The interpretation of verbal reports often depends on ideas about causal structure rather than the meanings of signs. Rather than worrying about the meaning of words used to describe their observations, scientists are more likely to wonder whether the observers made up or withheld information, whether one or more details were artifacts of observation conditions, whether the specimens were atypical, and so on.

Note that the worry about semantic theory loading extends beyond observation reports of the sort that occupied the logical empiricists and their close intellectual descendants. Combining results of diverse methods for making proxy measurements of paleoclimate temperatures in an epistemically responsible way requires careful attention to the variety of operationalizations at play. Even if no ‘observation reports’ are involved, the sticky question of how to usefully merge results obtained in different ways in order to satisfy one’s epistemic aims remains. Happily, the remedy for the worry about semantic loading in this broader sense is likely to be the same—investigating the provenance of those results and comparing the variety of factors that have contributed to their causal production.

Kuhn placed too much emphasis on the discontinuity between evidence generated in different paradigms. Even if we accept a broadly Kuhnian picture, according to which paradigms are heterogeneous collections of experimental practices, theoretical principles, problems selected for investigation, approaches to their solution, etc., connections between components are loose enough to allow investigators who disagree profoundly over one or more theoretical claims to nevertheless agree about how to design, execute, and record the results of their experiments. That is why neuroscientists who disagreed about whether nerve impulses consisted of electrical currents could measure the same electrical quantities, and agree on the linguistic meaning and the accuracy of observation reports including such terms as ‘potential’, ‘resistance’, ‘voltage’ and ‘current’. As we discussed above, the success that scientists have in repurposing results generated by others for different purposes speaks against the confinement of evidence to its native paradigm. Even when scientists working with radically different core theoretical commitments cannot make the same measurements themselves, with enough contextual information about how each conducts research, it can be possible to construct bridges that span the theoretical divides.

3.4 Values

One could worry that the intertwining of the theoretical and empirical would open the floodgates to bias in science. Human cognizing, both historical and present day, is replete with disturbing commitments including intolerance and narrow mindedness of many sorts. If such commitments are integral to a theoretical framework, or endemic to the reasoning of a scientist or scientific community, then they threaten to corrupt the epistemic utility of empirical results generated using their resources. The core impetus of the ‘value-free ideal’ is to maintain a safe distance between the appraisal of scientific theories according to the evidence on one hand, and the swarm of moral, political, social, and economic values on the other. While proponents of the value-free ideal might admit that the motivation to pursue a theory or the legal protection of human subjects in permissible experimental methods involve non-epistemic values, they would contend that such values ought not enter into the constitution of empirical results themselves, nor the adjudication or justification of scientific theorizing in light of the evidence (see Intemann 2021, 202).

As a matter of fact, values do enter into science at a variety of stages. Above we saw that ‘theory-ladenness’ could refer to the involvement of theory in perception, in semantics, and in a kind of circularity that some have worried begets unfalsifiability and thereby dogmatism. Like theory-ladenness, values can and sometimes do affect judgments about the salience of certain evidence and the conceptual framing of data. Indeed, on a permissive construal of the nature of theories, values can simply be understood as part of a theoretical framework. Intemann (2021) highlights a striking example from medical research where key conceptual resources include notions like ‘harm,’ ‘risk,’ ‘health benefit,’ and ‘safety.’ She refers to research on the comparative safety of giving birth at home and giving birth at a hospital for low-risk parents in the United States. Studies reporting that home births are less safe typically attend to infant and birthing parent mortality rates—which are low for these subjects whether at home or in hospital—but leave out of consideration rates of c-section and episiotomy, which are both relatively high in hospital settings. Thus, a value-laden decision about whether a possible outcome counts as a harm worth considering can influence the outcome of the study—in this case tipping the balance towards the conclusion that hospital births are more safe (ibid., 206).

Note that the birth safety case differs from the sort of cases at issue in the philosophical debate about risk and thresholds for acceptance and rejection of hypotheses. In accepting an hypothesis, a person makes a judgement that the risk of being mistaken is sufficiently low (Rudner 1953). When the consequences of being wrong are deemed grave, the threshold for acceptance may be correspondingly high. Thus, in evaluating the epistemic status of an hypothesis in light of the evidence, a person may have to make a value-based judgement. However, in the birth safety case, the judgement comes into play at an earlier stage, well before the decision to accept or reject the hypothesis is to be made. The judgement occurs already in deciding what is to count as a ‘harm’ worth considering for the purposes of this research.

The fact that values do sometimes enter into scientific reasoning does not by itself settle the question of whether it would be better if they did not. In order to assess the normative proposal, philosophers of science have attempted to disambiguate the various ways in which values might be thought to enter into science, and the various referents that get crammed under the single heading of ‘values.’ Anderson (2004) articulates eight stages of scientific research where values (‘evaluative presuppositions’) might be employed in epistemically fruitful ways. In paraphrase: 1) orientation in a field, 2) framing a research question, 3) conceptualizing the target, 4) identifying relevant data, 5) data generation, 6) data analysis, 7) deciding when to cease data analysis, and 8) drawing conclusions (Anderson 2004, 11). Similarly, Intemann (2021) lays out five ways “that values play a role in scientific reasoning” with which feminist philosophers of science have engaged in particular:

(1) the framing [of] research problems, (2) observing phenomena and describing data, (3) reasoning about value-laden concepts and assessing risks, (4) adopting particular models, and (5) collecting and interpreting evidence. (208)

Ward (2021) presents a streamlined and general taxonomy of four ways in which values relate to choices: as reasons motivating choices, as reasons justifying choices, as causal effectors of choices, or as goods affected by choices. By investigating the role of values in these particular stages or aspects of research, philosophers of science can offer higher-resolution insights than the bare observation that values are involved in science at all, and can untangle crosstalk.

Similarly, fine points can be made about the nature of the values involved in these various contexts. Such clarification is likely important for determining whether the contribution of certain values in a given context is deleterious or salutary, and in what sense. Douglas (2013) argues that the ‘value’ of internal consistency of a theory and of the empirical adequacy of a theory with respect to the available evidence are minimal criteria for any viable scientific theory (799–800). She contrasts these with the sort of values that Kuhn called ‘virtues,’ i.e. scope, simplicity, and explanatory power, which are properties of theories themselves, and unification, novel prediction and precision, which are properties a theory has in relation to a body of evidence (800–801). These are the sort of values that may be relevant to explaining and justifying choices that scientists make to pursue/abandon or accept/reject particular theories. Moreover, Douglas (2000) argues that what she calls “non-epistemic values” (in particular, ethical value judgements) also enter into decisions at various stages “internal” to scientific reasoning, such as data collection and interpretation (565). Consider a laboratory toxicology study in which animals exposed to dioxins are compared to unexposed controls. Douglas discusses researchers who want to determine the threshold for safe exposure. Admitting false positives can be expected to lead to overregulation of the chemical industry, while false negatives yield underregulation and thus pose greater risk to public health. The decision about where to set the unsafe exposure threshold, that is, the threshold for a statistically significant difference between experimental and control animal populations, involves balancing the acceptability of these two types of errors. According to Douglas, this balancing act will depend on “whether we are more concerned about protecting public health from dioxin pollution or whether we are more concerned about protecting industries that produce dioxins from increased regulation” (ibid., 568). That scientists do as a matter of fact sometimes make such decisions is clear. They judge, for instance, a specimen slide of a rat liver to be tumorous or not, and whether borderline cases should count as benign or malignant (ibid., 569–572). Moreover, in such cases, it is not clear that the responsibility of making such decisions could be offloaded to non-scientists.

Many philosophers accept that values can contribute to the generation of empirical results without spoiling their epistemic utility. Anderson’s (2004) diagnosis is as follows:

Deep down, what the objectors find worrisome about allowing value judgments to guide scientific inquiry is not that they have evaluative content, but that these judgments might be held dogmatically, so as to preclude the recognition of evidence that might undermine them. We need to ensure that value judgements do not operate to drive inquiry to a predetermined conclusion. This is our fundamental criterion for distinguishing legitimate from illegitimate uses of values in science. (11)

Data production (including experimental design and execution) is heavily influenced by investigators’ background assumptions. Sometimes these include theoretical commitments that lead experimentalists to produce non-illuminating or misleading evidence. In other cases they may lead experimentalists to ignore, or even fail to produce, useful evidence. For example, in order to obtain data on orgasms in female stumptail macaques, one researcher wired up females to produce radio records of orgasmic muscle contractions, heart rate increases, etc. But as Elisabeth Lloyd reports, “… the researcher … wired up the heart rate of the male macaques as the signal to start recording the female orgasms. When I pointed out that the vast majority of female stumptail orgasms occurred during sex among the females alone, he replied that yes he knew that, but he was only interested in important orgasms” (Lloyd 1993, 142). Although female stumptail orgasms occurring during sex with males are atypical, the experimental design was driven by the assumption that what makes features of female sexuality worth studying is their contribution to reproduction (ibid., 139). This assumption influenced experimental design in such a way as to preclude learning about the full range of female stumptail orgasms.

Anderson (2004) presents an influential analysis of the role of values in research on divorce. Researchers committed to an interpretive framework rooted in ‘traditional family values’ could conduct research on the assumption that divorce is mostly bad for spouses and any children that they have (ibid., 12). This background assumption, which is rooted in a normative appraisal of a certain model of good family life, could lead social science researchers to restrict the questions with which they survey their research subjects to ones about the negative impacts of divorce on their lives, thereby curtailing the possibility of discovering ways that divorce may have actually made the ex-spouses’ lives better (ibid., 13). This is an example of an epistemically detrimental influence that values can have on the nature of the results that research ultimately yields. In this case, the values in play biased the research outcomes to preclude recognition of countervailing evidence. Anderson argues that the problematic influence of values comes when research “is rigged in advance” to confirm certain hypotheses—when the influence of values amounts to incorrigible dogmatism (ibid., 19). “Dogmatism” in her sense is unfalsifiability in practice, “their stubbornness in the face of any conceivable evidence” (ibid., 22).

Fortunately, such dogmatism is not ubiquitous, and when it occurs it can often be corrected eventually. Above we noted that the mere involvement of the theory to be tested in the generation of an empirical result does not automatically yield vicious circularity—it depends on how the theory is involved. Furthermore, even if the assumptions initially made in the generation of empirical results are incorrect, future scientists will have opportunities to reassess those assumptions in light of new information and techniques. Thus, as long as scientists continue their work, there need be no time at which the epistemic value of an empirical result can be established once and for all. This should come as no surprise to anyone who is aware that science is fallible, but it is no grounds for skepticism. It can be perfectly reasonable to trust the evidence available at present even though it is logically possible for epistemic troubles to arise in the future. A similar point can be made regarding values (although cf. Yap 2016).

Moreover, while the inclusion of values in the generation of an empirical result can sometimes be epistemically bad, values properly deployed can also be harmless, or even epistemically helpful. As in the cases of research on female stumptail macaque orgasms and the effects of divorce, certain values can sometimes serve to illuminate the way in which other epistemically problematic assumptions have hindered potential scientific insight. By valuing knowledge about female sexuality beyond its role in reproduction, scientists can recognize the narrowness of an approach that only conceives of female sexuality insofar as it relates to reproduction. By questioning the absolute value of one traditional ideal for flourishing families, researchers can garner evidence that might end up destabilizing the empirical foundation supporting that ideal.

3.5 Reuse

Empirical results are most obviously put to epistemic work in their contexts of origin. Scientists conceive of empirical research, collect and analyze the relevant data, and then bring the results to bear on the theoretical issues that inspired the research in the first place. However, philosophers have also discussed ways in which empirical results are transferred out of their native contexts and applied in diverse and sometimes unexpected ways (see Leonelli and Tempini 2020). Cases of reuse, or repurposing, of empirical results in different epistemic contexts raise several interesting issues for philosophers of science. For one, such cases challenge the assumption that theory (and value) ladenness confines the epistemic utility of empirical results to a particular conceptual framework. Ancient Babylonian eclipse records inscribed on cuneiform tablets have been used to generate constraints on contemporary geophysical theorizing about the causes of the lengthening of the day on Earth (Stephenson, Morrison, and Hohenkerk 2016). This is surprising since the ancient observations were originally recorded for the purpose of making astrological prognostications. Nevertheless, with enough background information, the records as inscribed can be translated, the layers of assumptions baked into their presentation peeled back, and the results repurposed using resources of the contemporary epistemic context, the likes of which the Babylonians could hardly have dreamed.

Furthermore, the potential for reuse and repurposing feeds back on the methodological norms of data production and handling. In light of the difficulty of reusing or repurposing data without sufficient background information about the original context, Goodman et al. (2014) note that “data reuse is most possible when: 1) data; 2) metadata (information describing the data); and 3) information about the process of generating those data, such as code, all provided” (3). Indeed, they advocate for sharing data and code in addition to results customarily published in science. As we have seen, the loading of data with theory is usually necessary for putting that data to any serious epistemic use—theory-loading makes theory appraisal possible. Philosophers have begun to appreciate that this epistemic boon does not necessarily come at the cost of rendering data “tragically local” (Wylie 2020, 285, quoting Latour 1999). But it is important to note that the useful travel of data between contexts is significantly aided by foresight, curation, and management for that aim.

In light of the mediated nature of empirical results, Boyd (2018) argues for an “enriched view of evidence,” in which the evidence that serves as the ‘tribunal of experience’ is understood to be “lines of evidence” composed of the products of data collection and all of the products of their transformation on the way to the generation of empirical results that are ultimately compared to theoretical predictions, considered together with metadata associated with their provenance. Such metadata includes information about theoretical assumptions that are made in data collection, processing, and the presentation of empirical results. Boyd argues that by appealing to metadata to ‘rewind’ the processing of assumption-imbued empirical results, and then by re-processing them using new resources, the epistemic utility of empirical evidence can survive transitions to new contexts. Thus, the enriched view of evidence supports the idea that it is not despite the intertwining of the theoretical and empirical that scientists accomplish key epistemic aims, but often in virtue of it (ibid., 420). In addition, it makes explicit the epistemic value of metadata encoding the various assumptions that have been made throughout the course of data collection and processing.

The desirability of explicitly furnishing empirical data and results with auxiliary information that allows them to travel can be appreciated in light of the ‘objectivity’ norm, construed as accessibility to interpersonal scrutiny. When data are repurposed in novel contexts, they are not only shared between subjects, but can in some cases be shared across radically different paradigms with incompatible theoretical commitments.

4. The epistemic value of empirical evidence

One of the important applications of empirical evidence is its use in assessing the epistemic status of scientific theories. In this section we briefly discuss philosophical work on the role of empirical evidence in the confirmation and falsification of scientific theories, in ‘saving the phenomena,’ and in appraising the empirical adequacy of theories. However, further philosophical work ought to explore the variety of ways that empirical results bear on the epistemic status of theories and theorizing in scientific practice beyond these.

4.1 Confirmation

It is natural to think that, computability, range of application, and other things being equal, true theories are better than false ones, good approximations are better than bad ones, and highly probable theoretical claims are better than less probable ones. One way to decide whether a theory or a theoretical claim is true, close to the truth, or acceptably probable is to derive predictions from it and use empirical data to evaluate them. Hypothetico-Deductive (HD) confirmation theorists proposed that empirical evidence argues for the truth of theories whose deductive consequences it verifies, and against those whose consequences it falsifies (Popper 1959, 32–34). But laws and theoretical generalizations seldom if ever entail observational predictions unless they are conjoined with one or more auxiliary hypotheses taken from the theory they belong to. When the prediction turns out to be false, HD has trouble explaining which of the conjuncts is to blame. If a theory entails a true prediction, it will continue to do so in conjunction with arbitrarily selected irrelevant claims. HD has trouble explaining why the prediction does not confirm the irrelevancies along with the theory of interest.
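The two difficulties can be put schematically. The notation below is a sketch, not drawn from the source, with T a theory, A its auxiliary hypotheses, X an arbitrary irrelevant claim, and e an observational prediction:

```latex
% Duhem's problem: only the conjunction of theory and auxiliaries entails
% the prediction, so a failed prediction refutes the conjunction without
% indicating which conjunct is to blame:
(T \land A) \vDash e, \qquad
\neg e \;\Rightarrow\; \neg (T \land A) \;\equiv\; (\neg T \lor \neg A)

% Irrelevant conjunction: if T alone entails e, so does T conjoined with
% any irrelevant claim X, which HD then seems to confirm along with T:
T \vDash e \;\Rightarrow\; (T \land X) \vDash e
```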

Another approach to confirmation by empirical evidence is Inference to the Best Explanation (IBE). The idea is roughly that an explanation of the evidence that exhibits certain desirable characteristics with respect to a family of candidate explanations is likely to be the true one (Lipton 1991). On this approach, it is in virtue of their successful explanation of the empirical evidence that theoretical claims are supported. Naturally, IBE advocates face the challenges of defending a suitable characterization of what counts as the ‘best’ and of justifying the limited pool of candidate explanations considered (Stanford 2006).

Bayesian approaches to scientific confirmation have garnered significant attention and are now widespread in philosophy of science. Bayesians hold that the evidential bearing of empirical evidence on a theoretical claim is to be understood in terms of likelihood or conditional probability. For example, whether empirical evidence argues for a theoretical claim might be thought to depend upon whether it is more probable (and if so how much more probable) than its denial conditional on a description of the evidence together with background beliefs, including theoretical commitments. But by Bayes’ Theorem, the posterior probability of the claim of interest (that is, its probability given the evidence) is proportional to that claim’s prior probability. How to justify the choice of these prior probability assignments is one of the most notorious points of contention arising for Bayesians. If one makes the assignment of priors a subjective matter decided by epistemic agents, then it is not clear that they can be justified. Once again, one’s use of evidence to evaluate a theory depends in part upon one’s theoretical commitments (Earman 1992, 33–86; Roush 2005, 149–186). If one instead appeals to chains of successive updating using Bayes’ Theorem based on past evidence, one has to invoke assumptions that generally do not obtain in actual scientific reasoning. For instance, to ‘wash out’ the influence of priors a limit theorem is invoked wherein we consider very many updating iterations, but much scientific reasoning of interest does not happen in the limit, and so in practice priors hold unjustified sway (Norton 2021, 33).
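The dependence on priors can be read off Bayes’ Theorem directly. In standard notation (not in the original source), with H the theoretical claim and e a description of the evidence:

```latex
% Bayes' Theorem: the posterior P(H | e) scales linearly with the prior
% P(H), so two agents who assign different priors to H can draw different
% conclusions from the very same evidence.
P(H \mid e) = \frac{P(e \mid H)\, P(H)}{P(e)}
```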

Rather than attempting to cast all instances of confirmation based on empirical evidence as belonging to a universal schema, a better approach may be to ‘go local’. Norton’s material theory of induction argues that inductive support arises from background knowledge, that is, from material facts that are domain specific. Norton argues that, for instance, the induction from “Some samples of the element bismuth melt at 271°C” to “all samples of the element bismuth melt at 271°C” is admissible not in virtue of some universal schema that carries us from ‘some’ to ‘all’ but in virtue of matters of fact (Norton 2003). In this particular case, the fact that licenses the induction is a fact about elements: “their samples are generally uniform in their physical properties” (ibid., 650). This is a fact pertinent to chemical elements, but not to samples of material like wax (ibid.). Thus Norton repeatedly emphasizes that “all induction is local”.

Still, there are those who may be skeptical about the very possibility of confirmation or of successful induction. Insofar as the bearing of evidence on theory is never totally decisive, and insofar as there is no single trusty universal schema that captures empirical support, perhaps the relationship between empirical evidence and scientific theory is not really about support after all. Giving up on empirical support would not automatically mean abandoning any epistemic value for empirical evidence. Rather than confirm theory, the epistemic role of evidence could be to constrain, for example by furnishing phenomena for theory to systematize or to adequately model.

4.2 Saving the phenomena

Theories are said to ‘save’ observable phenomena if they satisfactorily predict, describe, or systematize them. How well a theory performs any of these tasks need not depend upon the truth or accuracy of its basic principles. Thus according to Osiander’s preface to Copernicus’ On the Revolutions, a locus classicus, astronomers “… cannot in any way attain to true causes” of the regularities among observable astronomical events, and must content themselves with saving the phenomena in the sense of using

… whatever suppositions enable … [them] to be computed correctly from the principles of geometry for the future as well as the past … (Osiander 1543, XX)

Theorists are to use those assumptions as calculating tools without committing themselves to their truth. In particular, the assumption that the planets revolve around the sun must be evaluated solely in terms of how useful it is in calculating their observable relative positions to a satisfactory approximation. Pierre Duhem’s Aim and Structure of Physical Theory articulates a related conception. For Duhem a physical theory

… is a system of mathematical propositions, deduced from a small number of principles, which aim to represent as simply, as completely, and as exactly as possible a set of experimental laws. (Duhem 1906, 19)

‘Experimental laws’ are general, mathematical descriptions of observable experimental results. Investigators produce them by performing measuring and other experimental operations and assigning symbols to perceptible results according to pre-established operational definitions (Duhem 1906, 19). For Duhem, the main function of a physical theory is to help us store and retrieve information about observables we would not otherwise be able to keep track of. If that is what a theory is supposed to accomplish, its main virtue should be intellectual economy. Theorists are to replace reports of individual observations with experimental laws and devise higher level laws (the fewer, the better) from which experimental laws (the more, the better) can be mathematically derived (Duhem 1906, 21ff).

A theory’s experimental laws can be tested for accuracy and comprehensiveness by comparing them to observational data. Let EL be one or more experimental laws that perform acceptably well on such tests. Higher level laws can then be evaluated on the basis of how well they integrate EL into the rest of the theory. Some data that don’t fit integrated experimental laws won’t be interesting enough to worry about. Other data may need to be accommodated by replacing or modifying one or more experimental laws or adding new ones. If the required additions, modifications or replacements deliver experimental laws that are harder to integrate, the data count against the theory. If the required changes are conducive to improved systematization, the data count in favor of it. If the required changes make no difference, the data don’t argue for or against the theory.

4.3 Empirical adequacy

On van Fraassen’s (1980) semantic account, a theory is empirically adequate when the empirical structure of at least one model of that theory is isomorphic to what he calls the “appearances” (45). In other words, when the theory “has at least one model that all the actual phenomena fit inside” (12). Thus, for van Fraassen, we continually check the empirical adequacy of our theories by seeing if they have the structural resources to accommodate new observations. We’ll never know that a given theory is totally empirically adequate, since for van Fraassen, empirical adequacy obtains with respect to all that is observable in principle to creatures like us, not all that has already been observed (69).

The primary appeal of dealing in empirical adequacy rather than confirmation is its appropriate epistemic humility. Instead of claiming that confirming evidence justifies belief (or boosted confidence) that a theory is true, one is restricted to saying that the theory continues to be consistent with the evidence as far as we can tell so far. However, if the epistemic utility of empirical results in appraising the status of theories is just to judge their empirical adequacy, then it may be difficult to account for the difference between adequate but unrealistic theories, and those equally adequate theories that ought to be taken seriously as representations. Appealing to extra-empirical virtues like parsimony may be a way out, but one that will not appeal to philosophers skeptical of the connection thereby supposed between such virtues and representational fidelity.

5. Conclusion

On an earlier way of thinking, observation was to serve as the unmediated foundation of science—direct access to the facts upon which the edifice of scientific knowledge could be built. When conflict arose between factions with different ideological commitments, observations could furnish the material for neutral arbitration and settle the matter objectively, in virtue of being independent of non-empirical commitments. According to this view, scientists working in different paradigms could at least appeal to the same observations, and propagandists could be held accountable to the publicly accessible content of theory- and value-free observations. Despite their different theories, Priestley and Lavoisier could find shared ground in the observations. Anti-Semites would be compelled to admit the success of a theory authored by a Jewish physicist, in virtue of the unassailable facts revealed by observation.

This version of empiricism with respect to science does not accord well with the fact that observation per se plays a relatively small role in many actual scientific methodologies, and the fact that even the most ‘raw’ data are often already theoretically imbued. The strict contrast between theory and observation in science is more fruitfully supplanted by inquiry into the relationship between theorizing and empirical results.

Contemporary philosophers of science tend to embrace the theory ladenness of empirical results. Instead of seeing the integration of the theoretical and the empirical as an impediment to furthering scientific knowledge, they see it as necessary. A ‘view from nowhere’ would not bear on our particular theories. That is, it is impossible to put empirical results to use without recruiting some theoretical resources. In order to use an empirical result to constrain or test a theory, it has to be processed into a form that can be compared to that theory. To get stellar spectrograms to bear on Newtonian or relativistic cosmology, they need to be processed—into galactic rotation curves, say. The spectrograms by themselves are just artifacts, pieces of paper. Scientists need theoretical resources in order to even identify that such artifacts bear information relevant for their purposes, and certainly to put them to any epistemic use in assessing theories.

This outlook does not render contemporary philosophers of science all constructivists, however. Theory mediates the connection between the target of inquiry and the scientific worldview; it does not sever it. Moreover, vigilance is still required to ensure that the particular ways in which theory is ‘involved’ in the production of empirical results are not epistemically detrimental. Theory can be deployed in experiment design, data processing, and the presentation of results in unproductive ways, for instance, in determining whether the results will speak for or against a particular theory regardless of what the world is like. Critical appraisal of the roles of theory is thus important for genuine learning about nature through science. Indeed, it seems that extra-empirical values can sometimes assist such critical appraisal. Instead of viewing observation as theory-free, and for that reason as furnishing the content with which to appraise theories, we might attend to the choices and mistakes that can be made in collecting and generating empirical results with the help of theoretical resources, and endeavor to make choices conducive to learning and to correct mistakes as we discover them.

Recognizing the involvement of theory and values in the constitution and generation of empirical results does not undermine the special epistemic value of empirical science in contrast to propaganda and pseudoscience. In cases where the influence of cultural, political, and religious values hinders scientific inquiry, it often does so by limiting or determining the nature of the empirical results. Yet, by working to make the assumptions that shape results explicit, we can examine their suitability for our purposes and attempt to restructure inquiry as necessary. When disagreements arise, scientists can attempt to settle them by appealing to the causal connections between the research target and the empirical data. The tribunal of experience speaks through empirical results, but it only does so via careful fashioning with theoretical resources.

Bibliography

  • Anderson, E., 2004, “Uses of Value Judgments in Science: A General Argument, with Lessons from a Case Study of Feminist Research on Divorce,” Hypatia, 19(1): 1–24.
  • Aristotle(a), Generation of Animals, in Complete Works of Aristotle (Volume 1), J. Barnes (ed.), Princeton: Princeton University Press, 1995, pp. 774–993.
  • Aristotle(b), History of Animals, in Complete Works of Aristotle (Volume 1), J. Barnes (ed.), Princeton: Princeton University Press, 1995, pp. 1111–1228.
  • Azzouni, J., 2004, “Theory, Observation, and Scientific Realism,” British Journal for the Philosophy of Science, 55(3): 371–92.
  • Bacon, F., 1620, Novum Organum with other parts of the Great Instauration, P. Urbach and J. Gibson (eds. and trans.), La Salle: Open Court, 1994.
  • Bogen, J., 2016, “Empiricism and After,” in P. Humphreys (ed.), Oxford Handbook of Philosophy of Science, Oxford: Oxford University Press, pp. 779–795.
  • Bogen, J., and Woodward, J., 1988, “Saving the Phenomena,” Philosophical Review, 97(3): 303–352.
  • Bokulich, A., 2020, “Towards a Taxonomy of the Model-Ladenness of Data,” Philosophy of Science, 87(5): 793–806.
  • Borrelli, A., 2012, “The Case of the Composite Higgs: The Model as a ‘Rosetta Stone’ in Contemporary High-Energy Physics,” Studies in History and Philosophy of Science (Part B: Studies in History and Philosophy of Modern Physics), 43(3): 195–214.
  • Boyd, N. M., 2018, “Evidence Enriched,” Philosophy of Science, 85(3): 403–21.
  • Boyle, R., 1661, The Sceptical Chymist, Montana: Kessinger (reprint of 1661 edition).
  • Bridgman, P., 1927, The Logic of Modern Physics, New York: Macmillan.
  • Chang, H., 2005, “A Case for Old-fashioned Observability, and a Reconstructive Empiricism,” Philosophy of Science, 72(5): 876–887.
  • Collins, H. M., 1985, Changing Order, Chicago: University of Chicago Press.
  • Conant, J.B. (ed.), 1957, “The Overthrow of the Phlogiston Theory: The Chemical Revolution of 1775–1789,” in J.B. Conant and L.K. Nash (eds.), Harvard Studies in Experimental Science (Volume I), Cambridge: Harvard University Press, pp. 65–116.
  • Daston, L., and P. Galison, 2007, Objectivity, Brooklyn: Zone Books.
  • Douglas, H., 2000, “Inductive Risk and Values in Science,” Philosophy of Science, 67(4): 559–79.
  • –––, 2013, “The Value of Cognitive Values,” Philosophy of Science, 80(5): 796–806.
  • Duhem, P., 1906, The Aim and Structure of Physical Theory, P. Wiener (tr.), Princeton: Princeton University Press, 1991.
  • Earman, J., 1992, Bayes or Bust?, Cambridge: MIT Press.
  • Feest, U., 2005, “Operationism in psychology: what the debate is about, what the debate should be about,” Journal of the History of the Behavioral Sciences, 41(2): 131–149.
  • Feyerabend, P.K., 1969, “Science Without Experience,” in P.K. Feyerabend, Realism, Rationalism, and Scientific Method (Philosophical Papers I), Cambridge: Cambridge University Press, 1985, pp. 132–136.
  • Franklin, A., 1986, The Neglect of Experiment, Cambridge: Cambridge University Press.
  • Galison, P., 1987, How Experiments End, Chicago: University of Chicago Press.
  • –––, 1990, “Aufbau/Bauhaus: logical positivism and architectural modernism,” Critical Inquiry, 16(4): 709–753.
  • Goodman, A., et al., 2014, “Ten Simple Rules for the Care and Feeding of Scientific Data,” PLoS Computational Biology, 10(4): e1003542.
  • Hacking, I., 1981, “Do We See Through a Microscope?,” Pacific Philosophical Quarterly, 62(4): 305–322.
  • –––, 1983, Representing and Intervening, Cambridge: Cambridge University Press.
  • Hanson, N.R., 1958, Patterns of Discovery, Cambridge: Cambridge University Press.
  • Hempel, C.G., 1952, “Fundamentals of Concept Formation in Empirical Science,” in Foundations of the Unity of Science (Volume 2), O. Neurath, R. Carnap, and C. Morris (eds.), Chicago: University of Chicago Press, 1970, pp. 651–746.
  • Herschel, J. F. W., 1830, Preliminary Discourse on the Study of Natural Philosophy, New York: Johnson Reprint Corp., 1966.
  • Hooke, R., 1705, “The Method of Improving Natural Philosophy,” in R. Waller (ed.), The Posthumous Works of Robert Hooke, London: Frank Cass and Company, 1971.
  • Horowitz, P., and W. Hill, 2015, The Art of Electronics, third edition, New York: Cambridge University Press.
  • Intemann, K., 2021, “Feminist Perspectives on Values in Science,” in S. Crasnow and K. Intemann (eds.), The Routledge Handbook of Feminist Philosophy of Science, New York: Routledge, pp. 201–15.
  • Kuhn, T.S., 1962, The Structure of Scientific Revolutions, Chicago: University of Chicago Press; reprinted 1996.
  • Latour, B., 1999, “Circulating Reference: Sampling the Soil in the Amazon Forest,” in Pandora’s Hope: Essays on the Reality of Science Studies, Cambridge, MA: Harvard University Press, pp. 24–79.
  • Latour, B., and Woolgar, S., 1979, Laboratory Life: The Construction of Scientific Facts, Princeton: Princeton University Press, 1986.
  • Laymon, R., 1988, “The Michelson-Morley Experiment and the Appraisal of Theories,” in A. Donovan, L. Laudan, and R. Laudan (eds.), Scrutinizing Science: Empirical Studies of Scientific Change, Baltimore: The Johns Hopkins University Press, pp. 245–266.
  • Leonelli, S., 2009, “On the Locality of Data and Claims about Phenomena,” Philosophy of Science, 76(5): 737–49.
  • Leonelli, S., and N. Tempini (eds.), 2020, Data Journeys in the Sciences, Cham: Springer.
  • Lipton, P., 1991, Inference to the Best Explanation, London: Routledge.
  • Lloyd, E.A., 1993, “Pre-theoretical Assumptions in Evolutionary Explanations of Female Sexuality,” Philosophical Studies, 69: 139–153.
  • –––, 2012, “The Role of ‘Complex’ Empiricism in the Debates about Satellite Data and Climate Models,” Studies in History and Philosophy of Science (Part A), 43(2): 390–401.
  • Longino, H., 1979, “Evidence and Hypothesis: An Analysis of Evidential Relations,” Philosophy of Science, 46(1): 35–56.
  • –––, 2020, “Afterward: Data in Transit,” in S. Leonelli and N. Tempini (eds.), Data Journeys in the Sciences, Cham: Springer, pp. 391–400.
  • Lupyan, G., 2015, “Cognitive Penetrability of Perception in the Age of Prediction – Predictive Systems are Penetrable Systems,” Review of Philosophy and Psychology, 6(4): 547–569. doi:10.1007/s13164-015-0253-4
  • Mill, J. S., 1872, System of Logic, London: Longmans, Green, Reader, and Dyer.
  • Norton, J., 2003, “A Material Theory of Induction,” Philosophy of Science, 70(4): 647–70.
  • –––, 2021, The Material Theory of Induction, http://www.pitt.edu/~jdnorton/papers/material_theory/Material_Induction_March_14_2021.pdf.
  • Nyquist, H., 1928, “Thermal Agitation of Electric Charge in Conductors,” Physical Review, 32(1): 110–13.
  • O’Connor, C., and J. O. Weatherall, 2019, The Misinformation Age: How False Beliefs Spread, New Haven: Yale University Press.
  • Olesko, K.M., and Holmes, F.L., 1994, “Experiment, Quantification and Discovery: Helmholtz’s Early Physiological Researches, 1843–50,” in D. Cahan (ed.), Hermann Helmholtz and the Foundations of Nineteenth Century Science, Berkeley: UC Press, pp. 50–108.
  • Osiander, A., 1543, “To the Reader Concerning the Hypothesis of this Work,” in N. Copernicus, On the Revolutions, E. Rosen (tr., ed.), Baltimore: Johns Hopkins University Press, 1978, p. XX.
  • Parker, W. S., 2016, “Reanalysis and Observation: What’s the Difference?,” Bulletin of the American Meteorological Society, 97(9): 1565–72.
  • –––, 2017, “Computer Simulation, Measurement, and Data Assimilation,” The British Journal for the Philosophy of Science, 68(1): 273–304.
  • Popper, K.R., 1959, The Logic of Scientific Discovery, K.R. Popper (tr.), New York: Basic Books.
  • Rheinberger, H. J., 1997, Towards a History of Epistemic Things: Synthesizing Proteins in the Test Tube, Stanford: Stanford University Press.
  • Roush, S., 2005, Tracking Truth, Cambridge: Cambridge University Press.
  • Rudner, R., 1953, “The Scientist Qua Scientist Makes Value Judgments,” Philosophy of Science, 20(1): 1–6.
  • Schlick, M., 1935, “Facts and Propositions,” in Philosophy and Analysis, M. Macdonald (ed.), New York: Philosophical Library, 1954, pp. 232–236.
  • Schottky, W. H., 1918, “Über spontane Stromschwankungen in verschiedenen Elektrizitätsleitern,” Annalen der Physik, 362(23): 541–67.
  • Shapere, D., 1982, “The Concept of Observation in Science and Philosophy,” Philosophy of Science, 49(4): 485–525.
  • Stanford, K., 2006, Exceeding Our Grasp: Science, History, and the Problem of Unconceived Alternatives, Oxford: Oxford University Press.
  • Stephenson, F. R., L. V. Morrison, and C. Y. Hohenkerk, 2016, “Measurement of the Earth’s Rotation: 720 BC to AD 2015,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 472: 20160404.
  • Stuewer, R.H., 1985, “Artificial Disintegration and the Cambridge-Vienna Controversy,” in P. Achinstein and O. Hannaway (eds.), Observation, Experiment, and Hypothesis in Modern Physical Science, Cambridge, MA: MIT Press, pp. 239–307.
  • Suppe, F. (ed.), 1977, The Structure of Scientific Theories, Urbana: University of Illinois Press.
  • Van Fraassen, B.C., 1980, The Scientific Image, Oxford: Clarendon Press.
  • Ward, Z. B., 2021, “On Value-Laden Science,” Studies in History and Philosophy of Science (Part A), 85: 54–62.
  • Whewell, W., 1858, Novum Organon Renovatum (Book II), in William Whewell: Theory of Scientific Method, R.E. Butts (ed.), Indianapolis: Hackett Publishing Company, 1989, pp. 103–249.
  • Woodward, J. F., 2010, “Data, Phenomena, Signal, and Noise,” Philosophy of Science, 77(5): 792–803.
  • –––, 2011, “Data and Phenomena: A Restatement and Defense,” Synthese, 182(1): 165–79.
  • Wylie, A., 2020, “Radiocarbon Dating in Archaeology: Triangulation and Traceability,” in S. Leonelli and N. Tempini (eds.), Data Journeys in the Sciences, Cham: Springer, pp. 285–301.
  • Yap, A., 2016, “Feminist Radical Empiricism, Values, and Evidence,” Hypatia, 31(1): 58–73.

Other Internet Resources

  • Confirmation, by Franz Huber, in the Internet Encyclopedia of Philosophy.

Copyright © 2021 by
Nora Mills Boyd <nboyd@siena.edu>
James Bogen


