Measurement is an integral part of modern science as well as of engineering, commerce, and daily life. Measurement is often considered a hallmark of the scientific enterprise and a privileged source of knowledge relative to qualitative modes of inquiry.[1] Despite its ubiquity and importance, there is little consensus among philosophers as to how to define measurement, what sorts of things are measurable, or which conditions make measurement possible. Most (but not all) contemporary authors agree that measurement is an activity that involves interaction with a concrete system with the aim of representing aspects of that system in abstract terms (e.g., in terms of classes, numbers, vectors, etc.). But this characterization also fits various kinds of perceptual and linguistic activities that are not usually considered measurements, and is therefore too broad to count as a definition of measurement. Moreover, if “concrete” implies “real”, this characterization is also too narrow, as measurement often involves the representation of ideal systems such as the average household or an electron at complete rest.
Philosophers have written on a variety of conceptual, metaphysical, semantic and epistemological issues related to measurement. This entry will survey the central philosophical standpoints on the nature of measurement, the notion of measurable quantity and related epistemological issues. It will refrain from elaborating on the many discipline-specific problems associated with measurement and focus on issues that have a general character.
Modern philosophical discussions about measurement—spanning from the late nineteenth century to the present day—may be divided into several strands of scholarship. These strands reflect different perspectives on the nature of measurement and the conditions that make measurement possible and reliable. The main strands are mathematical theories of measurement, operationalism, conventionalism, realism, information-theoretic accounts and model-based accounts. These strands of scholarship do not, for the most part, constitute directly competing views. Instead, they are best understood as highlighting different and complementary aspects of measurement. The following is a very rough overview of these perspectives:
These perspectives are in principle consistent with each other. While mathematical theories of measurement deal with the mathematical foundations of measurement scales, operationalism and conventionalism are primarily concerned with the semantics of quantity terms, realism is concerned with the metaphysical status of measurable quantities, and information-theoretic and model-based accounts are concerned with the epistemological aspects of measuring. Nonetheless, the subject domain is not as neatly divided as the list above suggests. Issues concerning the metaphysics, epistemology, semantics and mathematical foundations of measurement are interconnected and often bear on one another. Hence, for example, operationalists and conventionalists have often adopted anti-realist views, and proponents of model-based accounts have argued against the prevailing empiricist interpretation of mathematical theories of measurement. These subtleties will become clear in the following discussion.
The list of strands of scholarship is neither exclusive nor exhaustive. It reflects the historical trajectory of the philosophical discussion thus far, rather than any principled distinction among different levels of analysis of measurement. Some philosophical works on measurement belong to more than one strand, while many other works do not squarely fit any. This is especially the case since the early 2000s, when measurement returned to the forefront of philosophical discussion after several decades of relative neglect. This recent body of scholarship is sometimes called “the epistemology of measurement”, and includes a rich array of works that cannot yet be classified into distinct schools of thought. The last section of this entry will be dedicated to surveying some of these developments.
Although the philosophy of measurement formed as a distinct area of inquiry only during the second half of the nineteenth century, fundamental concepts of measurement such as magnitude and quantity have been discussed since antiquity. According to Euclid’s Elements, a magnitude—such as a line, a surface or a solid—measures another when the latter is a whole multiple of the former (Book V, def. 1 & 2). Two magnitudes have a common measure when they are both whole multiples of some magnitude, and are incommensurable otherwise (Book X, def. 1). The discovery of incommensurable magnitudes allowed Euclid and his contemporaries to develop the notion of a ratio of magnitudes. Ratios can be either rational or irrational, and therefore the concept of ratio is more general than that of measure (Michell 2003, 2004a; Grattan-Guinness 1996).
Aristotle distinguished between quantities and qualities. Examples of quantities are numbers, lines, surfaces, bodies, time and place, whereas examples of qualities are justice, health, hotness and paleness (Categories §6 and §8). According to Aristotle, quantities admit of equality and inequality but not of degrees, as “one thing is not more four-foot than another” (ibid. 6.6a19). Qualities, conversely, do not admit of equality or inequality but do admit of degrees, “for one thing is called more pale or less pale than another” (ibid. 8.10b26). Aristotle did not clearly specify whether degrees of qualities such as paleness correspond to distinct qualities, or whether the same quality, paleness, was capable of different intensities. This topic was at the center of an ongoing debate in the thirteenth and fourteenth centuries (Jung 2011). Duns Scotus supported the “addition theory”, according to which a change in the degree of a quality can be explained by the addition or subtraction of smaller degrees of that quality (2011: 553). This theory was later refined by Nicole Oresme, who used geometrical figures to represent changes in the intensity of qualities such as velocity (Clagett 1968; Sylla 1971). Oresme’s geometrical representations established a subset of qualities that were amenable to quantitative treatment, thereby challenging the strict Aristotelian dichotomy between quantities and qualities. These developments made possible the formulation of quantitative laws of motion during the sixteenth and seventeenth centuries (Grant 1996).
The concept of qualitative intensity was further developed by Leibniz and Kant. Leibniz’s “principle of continuity” stated that all natural change is produced by degrees. Leibniz argued that this principle applies not only to changes in extended magnitudes such as length and duration, but also to intensities of representational states of consciousness, such as sounds (Jorgensen 2009; Diehl 2012). Kant is thought to have relied on Leibniz’s principle of continuity to formulate his distinction between extensive and intensive magnitudes. According to Kant, extensive magnitudes are those “in which the representation of the parts makes possible the representation of the whole” (1787: A162/B203). An example is length: a line can only be mentally represented by a successive synthesis in which parts of the line join to form the whole. For Kant, the possibility of such synthesis was grounded in the forms of intuition, namely space and time. Intensive magnitudes, like warmth or colors, also come in continuous degrees, but their apprehension takes place in an instant rather than through a successive synthesis of parts. The degrees of intensive magnitudes “can only be represented through approximation to negation” (1787: A168/B210), that is, by imagining their gradual diminution until their complete absence.
Scientific developments during the nineteenth century challenged the distinction between extensive and intensive magnitudes. Thermodynamics and wave optics showed that differences in temperature and hue corresponded to differences in spatio-temporal magnitudes such as velocity and wavelength. Electrical magnitudes such as resistance and conductance were shown to be capable of addition and division despite not being extensive in the Kantian sense, i.e., not synthesized from spatial or temporal parts. Moreover, early experiments in psychophysics suggested that intensities of sensation such as brightness and loudness could be represented as sums of “just noticeable differences” among stimuli, and could therefore be thought of as composed of parts (see Section 3.3). These findings, along with advances in the axiomatization of branches of mathematics, motivated some of the leading scientists of the late nineteenth century to attempt to clarify the mathematical foundations of measurement (Maxwell 1873; von Kries 1882; Helmholtz 1887; Mach 1896; Poincaré 1898; Hölder 1901; for historical surveys see Darrigol 2003; Michell 1993, 2003; Cantù and Schlaudt 2013; Biagioli 2016: Ch. 4, 2018). These works are viewed today as precursors to the body of scholarship known as “measurement theory”.
Mathematical theories of measurement (often referred to collectively as “measurement theory”) concern the conditions under which relations among numbers (and other mathematical entities) can be used to express relations among objects.[2] In order to appreciate the need for mathematical theories of measurement, consider the fact that relations exhibited by numbers—such as equality, sum, difference and ratio—do not always correspond to relations among the objects measured by those numbers. For example, 60 is twice 30, but one would be mistaken in thinking that an object measured at 60 degrees Celsius is twice as hot as an object at 30 degrees Celsius. This is because the zero point of the Celsius scale is arbitrary and does not correspond to an absence of temperature.[3] Similarly, numerical intervals do not always carry empirical information. When subjects are asked to rank on a scale from 1 to 7 how strongly they agree with a given statement, there is no prima facie reason to think that the intervals between 5 and 6 and between 6 and 7 correspond to equal increments of strength of opinion. To provide a third example, equality among numbers is transitive [if (a=b & b=c) then a=c] but empirical comparisons among physical magnitudes reveal only approximate equality, which is not a transitive relation. These examples suggest that not all of the mathematical relations among numbers used in measurement are empirically significant, and that different kinds of measurement scale convey different kinds of empirically significant information.
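Two of these examples can be made concrete in a short sketch (my own illustration, not part of the entry): a tolerance-based comparison models the approximate equality revealed by empirical measurement, and a Kelvin conversion shows why Celsius ratios are not empirically significant. The tolerance value is an arbitrary stand-in for instrument resolution.

```python
# Illustrative sketch: approximate equality is not transitive.
def approx_equal(a, b, tol=1.0):
    """Empirical comparison: two magnitudes count as 'equal' within resolution tol."""
    return abs(a - b) <= tol

a, b, c = 0.0, 0.8, 1.6
print(approx_equal(a, b))  # True
print(approx_equal(b, c))  # True
print(approx_equal(a, c))  # False: a "equals" b and b "equals" c, but not a and c

# Ratios of Celsius readings are not empirically significant,
# because the Celsius zero point is arbitrary:
print(60 / 30)                        # 2.0 numerically, but not "twice as hot"
print((60 + 273.15) / (30 + 273.15))  # the ratio on the Kelvin scale, about 1.1
```

The point is not about floating-point arithmetic but about measurement itself: any instrument with finite resolution induces an equality relation that fails transitivity, which is one reason numerical structure outruns empirical structure.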
The study of measurement scales and the empirical information they convey is the main concern of mathematical theories of measurement. In his seminal 1887 essay, “Counting and Measuring”, Hermann von Helmholtz phrased the key question of measurement theory as follows:
[W]hat is the objective meaning of expressing through denominate numbers the relations of real objects as magnitudes, and under what conditions can we do this? (1887: 4)
Broadly speaking, measurement theory sets out to (i) identify the assumptions underlying the use of various mathematical structures for describing aspects of the empirical world, and (ii) draw lessons about the adequacy and limits of using these mathematical structures for describing aspects of the empirical world. Following Otto Hölder (1901), measurement theorists often tackle these goals through formal proofs, with the assumptions in (i) serving as axioms and the lessons in (ii) following as theorems. A key insight of measurement theory is that the empirically significant aspects of a given mathematical structure are those that mirror relevant relations among the objects being measured. For example, the relation “bigger than” among numbers is empirically significant for measuring length insofar as it mirrors the relation “longer than” among objects. This mirroring, or mapping, of relations between objects and mathematical entities constitutes a measurement scale. As will be clarified below, measurement scales are usually thought of as isomorphisms or homomorphisms between objects and mathematical entities.
Other than these broad goals and claims, measurement theory is a highly heterogeneous body of scholarship. It includes works that span from the late nineteenth century to the present day and endorse a wide array of views on the ontology, epistemology and semantics of measurement. Two main differences among mathematical theories of measurement are especially worth mentioning. The first concerns the nature of the relata, or “objects”, whose relations numbers are supposed to mirror. These relata may be understood in at least four different ways: as concrete individual objects, as qualitative observations of concrete individual objects, as abstract representations of individual objects, or as universal properties of objects. Which interpretation is adopted depends in large part on the author’s metaphysical and epistemic commitments. This issue will be especially relevant to the discussion of realist accounts of measurement (Section 5). Second, different measurement theorists have taken different stands on the kind of empirical evidence that is required to establish mappings between objects and numbers. As a result, measurement theorists have come to disagree about the necessary conditions for establishing the measurability of attributes, and specifically about whether psychological attributes are measurable. Debates about measurability have been highly fruitful for the development of measurement theory, and the following subsections will introduce some of these debates and the central concepts developed therein.
During the late nineteenth and early twentieth centuries several attempts were made to provide a universal definition of measurement. Although accounts of measurement varied, the consensus was that measurement is a method of assigning numbers to magnitudes. For example, Helmholtz (1887: 17) defined measurement as the procedure by which one finds the denominate number that expresses the value of a magnitude, where a “denominate number” is a number together with a unit, e.g., 5 meters, and a magnitude is a quality of objects that is amenable to ordering from smaller to greater, e.g., length. Bertrand Russell similarly stated that measurement is
any method by which a unique and reciprocal correspondence is established between all or some of the magnitudes of a kind and all or some of the numbers, integral, rational or real. (1903: 176)
Norman Campbell defined measurement simply as “the process of assigning numbers to represent qualities”, where a quality is a property that admits of non-arbitrary ordering (1920: 267).
Defining measurement as numerical assignment raises the question: which assignments are adequate, and under what conditions? Early measurement theorists like Helmholtz (1887), Hölder (1901) and Campbell (1920) argued that numbers are adequate for expressing magnitudes insofar as algebraic operations among numbers mirror empirical relations among magnitudes. For example, the qualitative relation “longer than” among rigid rods is (roughly) transitive and asymmetrical, and in this regard shares structural features with the relation “larger than” among numbers. Moreover, the end-to-end concatenation of rigid rods shares structural features—such as associativity and commutativity—with the mathematical operation of addition. A similar situation holds for the measurement of weight with an equal-arms balance. Here deflection of the arms provides ordering among weights and the heaping of weights on one pan constitutes concatenation.
Early measurement theorists formulated axioms that describe these qualitative empirical structures, and used these axioms to prove theorems about the adequacy of assigning numbers to magnitudes that exhibit such structures. Specifically, they proved that ordering and concatenation are together sufficient for the construction of an additive numerical representation of the relevant magnitudes. An additive representation is one in which addition is empirically meaningful, and hence also multiplication, division etc. Campbell called measurement procedures that satisfy the conditions of additivity “fundamental” because they do not involve the measurement of any other magnitude (1920: 277). Kinds of magnitudes for which a fundamental measurement procedure has been found—such as length, area, volume, duration, weight and electrical resistance—Campbell called “fundamental magnitudes”. A hallmark of such magnitudes is that it is possible to generate them by concatenating a standard sequence of equal units, as in the example of a series of equally spaced marks on a ruler.
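The structural claim here can be sketched in code (a toy model of my own, not drawn from the entry): if rods are modeled as sequences of equal unit segments, end-to-end concatenation mirrors numerical addition, and the count of units is an additive representation.

```python
# Toy model: a rod is a tuple of equal unit segments (the "standard sequence").
def concat(rod1, rod2):
    """End-to-end concatenation of two rods."""
    return rod1 + rod2

def length(rod):
    """Numerical assignment: the number of equal units the rod spans."""
    return len(rod)

a = ("unit",) * 3  # a rod three units long
b = ("unit",) * 5  # a rod five units long

# Additivity: the number assigned to a concatenation is the sum of the numbers.
assert length(concat(a, b)) == length(a) + length(b)

# Commutativity and associativity of concatenation mirror those of addition.
assert length(concat(a, b)) == length(concat(b, a))
assert concat(concat(a, b), a) == concat(a, concat(b, a))
```

The assertions encode exactly the structural features the early theorists pointed to: the operation among objects and the operation among numbers share associativity and commutativity, which is what makes the additive representation adequate.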
Although they viewed additivity as the hallmark of measurement, most early measurement theorists acknowledged that additivity is not necessary for measuring. Other magnitudes exist that admit of ordering from smaller to greater, but whose ratios and/or differences cannot currently be determined except through their relations to other, fundamentally measurable magnitudes. Examples are temperature, which may be measured by determining the volume of a mercury column, and density, which may be measured as the ratio of mass and volume. Such indirect determination came to be called “derived” measurement and the relevant magnitudes “derived magnitudes” (Campbell 1920: 275–7).
At first glance, the distinction between fundamental and derived measurement may seem reminiscent of the distinction between extensive and intensive magnitudes, and indeed fundamental measurement is sometimes called “extensive”. Nonetheless, it is important to note that the two distinctions are based on significantly different criteria of measurability. As discussed in Section 2, the extensive-intensive distinction focused on the intrinsic structure of the quantity in question, i.e., whether or not it is composed of spatio-temporal parts. The fundamental-derived distinction, by contrast, focuses on the properties of measurement operations. A fundamentally measurable magnitude is one for which a fundamental measurement operation has been found. Consequently, fundamentality is not an intrinsic property of a magnitude: a derived magnitude can become fundamental with the discovery of new operations for its measurement. Moreover, in fundamental measurement the numerical assignment need not mirror the structure of spatio-temporal parts. Electrical resistance, for example, can be fundamentally measured by connecting resistors in a series (Campbell 1920: 293). This is considered a fundamental measurement operation because it has a shared structure with numerical addition, even though objects with equal resistance are not generally equal in size.
The distinction between fundamental and derived measurement was revised by subsequent authors. Brian Ellis (1966: Ch. 5–8) distinguished among three types of measurement: fundamental, associative and derived. Fundamental measurement requires ordering and concatenation operations satisfying the same conditions specified by Campbell. Associative measurement procedures are based on a correlation of two ordering relationships, e.g., the correlation between the volume of a mercury column and its temperature. Derived measurement procedures consist in the determination of the value of a constant in a physical law. The constant may be local, as in the determination of the specific density of water from mass and volume, or universal, as in the determination of the Newtonian gravitational constant from force, mass and distance. Henry Kyburg (1984: Ch. 5–7) proposed a somewhat different threefold distinction among direct, indirect and systematic measurement, which does not completely overlap with that of Ellis.[4] A more radical revision of the distinction between fundamental and derived measurement was offered by R. Duncan Luce and John Tukey (1964) in their work on conjoint measurement, which will be discussed in Section 3.4.
The previous subsection discussed the axiomatization of empirical structures, a line of inquiry that dates back to the early days of measurement theory. A complementary line of inquiry within measurement theory concerns the classification of measurement scales. The psychophysicist S.S. Stevens (1946, 1951) distinguished among four types of scales: nominal, ordinal, interval and ratio. Nominal scales represent objects as belonging to classes that have no particular order, e.g., male and female. Ordinal scales represent order but no further algebraic structure. For example, the Mohs scale of mineral hardness represents minerals with numbers ranging from 1 (softest) to 10 (hardest), but there is no empirical significance to equality among intervals or ratios of those numbers.[5] Celsius and Fahrenheit are examples of interval scales: they represent equality or inequality among intervals of temperature, but not ratios of temperature, because their zero points are arbitrary. The Kelvin scale, by contrast, is a ratio scale, as are the familiar scales representing mass in kilograms, length in meters and duration in seconds. Stevens later refined this classification and distinguished between linear and logarithmic interval scales (1959: 31–34) and between ratio scales with and without a natural unit (1959: 34). Ratio scales with a natural unit, such as those used for counting discrete objects and for representing probabilities, were named “absolute” scales.
As Stevens notes, scale types are individuated by the families of transformations they can undergo without loss of empirical information. Empirical relations represented on ratio scales, for example, are invariant under multiplication by a positive number, e.g., multiplication by 2.54 converts from inches to centimeters. Linear interval scales allow both multiplication by a positive number and a constant shift, e.g., the conversion from Celsius to Fahrenheit in accordance with the formula °C × 9/5 + 32 = °F. Ordinal scales admit of any transformation function as long as it is monotonic and increasing, and nominal scales admit of any one-to-one substitution. Absolute scales admit of no transformation other than identity. Stevens’ classification of scales was later generalized by Louis Narens (1981, 1985: Ch. 2) and Luce et al. (1990: Ch. 20) in terms of the homogeneity and uniqueness of the relevant transformation groups.
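The two worked examples above can be checked directly (a sketch of my own; the function names are not Stevens' terminology): the inch-to-centimeter conversion is a permissible ratio-scale transformation and preserves ratios, while the Celsius-to-Fahrenheit conversion is a permissible interval-scale transformation that preserves equality of intervals but not ratios.

```python
def inches_to_cm(x):
    """Ratio-scale transformation: multiplication by a positive constant."""
    return x * 2.54

def celsius_to_fahrenheit(c):
    """Interval-scale transformation: positive scaling plus a constant shift."""
    return c * 9 / 5 + 32

# Ratio-scale information (ratios) survives the ratio transformation:
assert inches_to_cm(10) / inches_to_cm(5) == 10 / 5

# Interval-scale information (equality of intervals) survives the shift ...
c1, c2, c3 = 10, 20, 30
f1, f2, f3 = map(celsius_to_fahrenheit, (c1, c2, c3))
assert (f2 - f1) == (f3 - f2)   # equal Celsius intervals remain equal

# ... but ratios do not, which is why "twice as hot" is scale-relative:
assert f2 / f1 != c2 / c1
```

This is precisely the sense in which the constant shift destroys ratio information: it moves the zero point, and ratios are computed relative to zero.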
While Stevens’ classification of scales met with general approval in scientific and philosophical circles, its wider implications for measurement theory became the topic of considerable debate. Two issues were especially contested. The first was whether classification and ordering operations deserve to be called “measurement” operations, and accordingly whether the representation of magnitudes on nominal and ordinal scales should count as measurement. Several physicists, including Campbell, argued that classification and ordering operations did not provide a sufficiently rich structure to warrant the use of numbers, and hence should not count as measurement operations. The second contested issue was whether a concatenation operation had to be found for a magnitude before it could be fundamentally measured on a ratio scale. The debate became especially heated when it re-ignited a longer controversy surrounding the measurability of intensities of sensation. It is to this debate we now turn.
One of the main catalysts for the development of mathematical theories of measurement was an ongoing debate surrounding measurability in psychology. The debate is often traced back to Gustav Fechner’s (1860) Elements of Psychophysics, in which he described a method of measuring intensities of sensation. Fechner’s method was based on the recording of “just noticeable differences” between sensations associated with pairs of stimuli, e.g., two sounds of different intensity. These differences were assumed to be equal increments of intensity of sensation. As Fechner showed, under this assumption a stable linear relationship is revealed between the intensity of sensation and the logarithm of the intensity of the stimulus, a relation that came to be known as “Fechner’s law” (Heidelberger 1993a: 203; Luce and Suppes 2004: 11–2). This law in turn provides a method for indirectly measuring the intensity of sensation by measuring the intensity of the stimulus, and hence, Fechner argued, provides justification for measuring intensities of sensation on the real numbers.
Fechner’s claims concerning the measurability of sensation became the subject of a series of debates that lasted nearly a century and proved extremely fruitful for the philosophy of measurement, involving key figures such as Mach, Helmholtz, Campbell and Stevens (Heidelberger 1993a: Ch. 6 and 1993b; Michell 1999: Ch. 6). Those objecting to the measurability of sensation, such as Campbell, stressed the necessity of an empirical concatenation operation for fundamental measurement. Since intensities of sensation cannot be concatenated to each other in the manner afforded by lengths and weights, there could be no fundamental measurement of sensation intensity. Moreover, Campbell claimed that none of the psychophysical regularities discovered thus far are sufficiently universal to count as laws in the sense required for derived measurement (Campbell in Ferguson et al. 1940: 347). All that psychophysicists have shown is that intensities of sensation can be consistently ordered, but order by itself does not yet warrant the use of numerical relations such as sums and ratios to express empirical results.
The central opponent of Campbell in this debate was Stevens, whose distinction between types of measurement scale was discussed above. Stevens defined measurement as the “assignment of numerals to objects or events according to rules” (1951: 1) and claimed that any consistent and non-random assignment counts as measurement in the broad sense (1975: 47). In useful cases of scientific inquiry, Stevens claimed, measurement can be construed somewhat more narrowly as a numerical assignment that is based on the results of matching operations, such as the coupling of temperature to mercury volume or the matching of sensations to each other. Stevens argued against the view that relations among numbers need to mirror qualitative empirical structures, claiming instead that measurement scales should be regarded as arbitrary formal schemas and adopted in accordance with their usefulness for describing empirical data. For example, adopting a ratio scale for measuring the sensations of loudness, volume and density of sounds leads to the formulation of a simple linear relation among the reports of experimental subjects: loudness = volume × density (1975: 57–8). Such assignment of numbers to sensations counts as measurement because it is consistent and non-random, because it is based on the matching operations performed by experimental subjects, and because it captures regularities in the experimental results. According to Stevens, these conditions are together sufficient to justify the use of a ratio scale for measuring sensations, despite the fact that “sensations cannot be separated into component parts, or laid end to end like measuring sticks” (1975: 38; see also Hempel 1952: 68–9).
In the mid-twentieth century the two main lines of inquiry in measurement theory, the one dedicated to the empirical conditions of quantification and the one concerning the classification of scales, converged in the work of Patrick Suppes (1951; Scott and Suppes 1958; for historical surveys see Savage and Ehrlich 1992; Diez 1997a,b). Suppes’ work laid the basis for the Representational Theory of Measurement (RTM), which remains the most influential mathematical theory of measurement to date (Krantz et al. 1971; Suppes et al. 1989; Luce et al. 1990). RTM defines measurement as the construction of mappings from empirical relational structures into numerical relational structures (Krantz et al. 1971: 9). An empirical relational structure consists of a set of empirical objects (e.g., rigid rods) along with certain qualitative relations among them (e.g., ordering, concatenation), while a numerical relational structure consists of a set of numbers (e.g., real numbers) and specific mathematical relations among them (e.g., “equal to or bigger than”, addition). Simply put, a measurement scale is a many-to-one mapping—a homomorphism—from an empirical to a numerical relational structure, and measurement is the construction of scales.[6] RTM goes into great detail in clarifying the assumptions underlying the construction of different types of measurement scales. Each type of scale is associated with a set of assumptions about the qualitative relations obtaining among objects represented on that type of scale. From these assumptions, or axioms, the authors of RTM derive the representational adequacy of each scale type, as well as the family of permissible transformations making that type of scale unique. In this way RTM provides a conceptual link between the empirical basis of measurement and the typology of scales.[7]
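The homomorphism condition at the heart of the RTM definition can be illustrated in a minimal sketch (my own toy example; the rod names and numerical values are invented): a candidate scale is adequate only if the qualitative relation among objects holds exactly when the corresponding numerical relation holds among their assigned numbers.

```python
# Empirical relational structure: three rods with a qualitative ordering
# "at least as long as", given here as a set of ordered pairs.
objects = ["rod_a", "rod_b", "rod_c"]
at_least_as_long = {
    ("rod_c", "rod_b"), ("rod_b", "rod_a"), ("rod_c", "rod_a"),
    ("rod_a", "rod_a"), ("rod_b", "rod_b"), ("rod_c", "rod_c"),
}

# Candidate scale: a mapping from objects into the numbers
# (many-to-one in general, since distinct rods may be equally long).
scale = {"rod_a": 1.0, "rod_b": 2.5, "rod_c": 4.0}

def is_homomorphism(scale, relation, objects):
    """x R y holds empirically iff scale(x) >= scale(y) holds numerically."""
    return all(
        ((x, y) in relation) == (scale[x] >= scale[y])
        for x in objects
        for y in objects
    )

print(is_homomorphism(scale, at_least_as_long, objects))  # True
```

Real RTM representation theorems do far more than this check: they prove, from axioms on the empirical relations, that such a mapping exists and is unique up to the permissible transformations of the scale type. The sketch only shows what the mirroring condition amounts to for a single ordering relation.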
On the issue of measurability, the Representational Theory takes a middle path between the liberal approach adopted by Stevens and the strict emphasis on concatenation operations espoused by Campbell. Like Campbell, RTM accepts that rules of quantification must be grounded in known empirical structures and should not be chosen arbitrarily to fit the data. However, RTM rejects the idea that additive scales are adequate only when concatenation operations are available (Luce and Suppes 2004: 15). Instead, RTM argues for the existence of fundamental measurement operations that do not involve concatenation. The central example of this type of operation is known as “additive conjoint measurement” (Luce and Tukey 1964; Krantz et al. 1971: 17–21 and Ch. 6–7). Here, measurements of two or more different types of attribute, such as the temperature and pressure of a gas, are obtained by observing their joint effect, such as the volume of the gas. Luce and Tukey showed that by establishing certain qualitative relations among volumes under variations of temperature and pressure, one can construct additive representations of temperature and pressure, without invoking any antecedent method of measuring volume. This sort of procedure is generalizable to any suitably related triplet of attributes, such as the loudness, intensity and frequency of pure tones, or the preference for a reward, its size and the delay in receiving it (Luce and Suppes 2004: 17). The discovery of additive conjoint measurement led the authors of RTM to divide fundamental measurement into two kinds: traditional measurement procedures based on concatenation operations, which they called “extensive measurement”, and conjoint or “nonextensive” fundamental measurement. Under this new conception of fundamentality, all the traditional physical attributes can be measured fundamentally, as well as many psychological attributes (Krantz et al. 1971: 502–3).
Above we saw that mathematical theories of measurement are primarily concerned with the mathematical properties of measurement scales and the conditions of their application. A related but distinct strand of scholarship concerns the meaning and use of quantity terms. Scientific theories and models are commonly expressed in terms of quantitative relations among parameters, bearing names such as “length”, “unemployment rate” and “introversion”. A realist about one of these terms would argue that it refers to a set of properties or relations that exist independently of being measured. An operationalist or conventionalist would argue that the way such quantity-terms apply to concrete particulars depends on nontrivial choices made by humans, and specifically on choices that have to do with the way the relevant quantity is measured. Note that under this broad construal, realism is compatible with operationalism and conventionalism. That is, it is conceivable that choices of measurement method regulate the use of a quantity-term and that, given the correct choice, this term succeeds in referring to a mind-independent property or relation. Nonetheless, many operationalists and conventionalists adopted stronger views, according to which there are no facts of the matter as to which of several and nontrivially different operations is correct for applying a given quantity-term. These stronger variants are inconsistent with realism about measurement. This section will be dedicated to operationalism and conventionalism, and the next to realism about measurement.
Operationalism (or “operationism”) about measurement is the view that the meaning of quantity-concepts is determined by the set of operations used for their measurement. The strongest expression of operationalism appears in the early work of Percy Bridgman (1927), who argued that
we mean by any concept nothing more than a set of operations; the concept is synonymous with the corresponding set of operations. (1927: 5)
Length, for example, would be defined as the result of the operation of concatenating rigid rods. According to this extreme version of operationalism, different operations measure different quantities. Length measured by using rulers and by timing electromagnetic pulses should, strictly speaking, be distinguished into two distinct quantity-concepts labeled “length-1” and “length-2” respectively. This conclusion led Bridgman to claim that currently accepted quantity concepts have “joints” where different operations overlap in their domain of application. He warned against dogmatic faith in the unity of quantity concepts across these “joints”, urging instead that unity be checked against experiments whenever the application of a quantity-concept is to be extended into a new domain. Nevertheless, Bridgman conceded that as long as the results of different operations agree within experimental error it is pragmatically justified to label the corresponding quantities with the same name (1927: 16).[8]
Operationalism became influential in psychology, where it was well-received by behaviorists like Edwin Boring (1945) and B.F. Skinner (1945). Indeed, Skinner maintained that behaviorism is “nothing more than a thoroughgoing operational analysis of traditional mentalistic concepts” (1945: 271). Stevens, who was Boring’s student, was a key promoter of operationalism in psychology, and argued that psychological concepts have empirical meaning only if they stand for definite and concrete operations (1935: 517; see also Isaac 2017). The idea that concepts are defined by measurement operations is consistent with Stevens’ liberal views on measurability, which were discussed above (Section 3.3). As long as the assignment of numbers to objects is performed in accordance with concrete and consistent rules, Stevens maintained that such assignment has empirical meaning and does not need to satisfy any additional constraints. Nonetheless, Stevens probably did not embrace an anti-realist view about psychological attributes. Instead, there are good reasons to think that he understood operationalism as a methodological attitude that was valuable to the extent that it allowed psychologists to justify the conclusions they drew from experiments (Feest 2005). For example, Stevens did not treat operational definitions as a priori but as amenable to improvement in light of empirical discoveries, implying that he took psychological attributes to exist independently of such definitions (Stevens 1935: 527). This suggests that Stevens’ operationalism was of a more moderate variety than that found in the early writings of Bridgman.[9]
Operationalism met with initial enthusiasm by logical positivists, who viewed it as akin to verificationism. Nonetheless, it was soon revealed that any attempt to base a theory of meaning on operationalist principles was riddled with problems. Among such problems were the automatic reliability operationalism conferred on measurement operations, the ambiguities surrounding the notion of operation, the overly restrictive operational criterion of meaningfulness, and the fact that many useful theoretical concepts lack clear operational definitions (Chang 2009).[10] In particular, Carl Hempel (1956, 1966) criticized operationalists for being unable to define dispositional terms such as “solubility in water”, and for multiplying the number of scientific concepts in a manner that runs against the need for systematic and simple theories. Accordingly, most writers on the semantics of quantity-terms have avoided espousing an operational analysis.[11]
A more widely advocated approach admitted a conventional element to the use of quantity-terms, while resisting attempts to reduce the meaning of quantity terms to measurement operations. These accounts are classified under the general heading “conventionalism”, though they differ in the particular aspects of measurement they deem conventional and in the degree of arbitrariness they ascribe to such conventions.[12] An early precursor of conventionalism was Ernst Mach, who examined the notion of equality among temperature intervals (1896: 52). Mach noted that different types of thermometric fluid expand at different (and nonlinearly related) rates when heated, raising the question: which fluid expands most uniformly with temperature? According to Mach, there is no fact of the matter as to which fluid expands more uniformly, since the very notion of equality among temperature intervals has no determinate application prior to a conventional choice of standard thermometric fluid. Mach coined the term “principle of coordination” for this sort of conventionally chosen principle for the application of a quantity concept. The concepts of uniformity of time and space received similar treatments by Henri Poincaré (1898, 1902: Part 2). Poincaré argued that procedures used to determine equality among durations stem from scientists’ unconscious preference for descriptive simplicity, rather than from any fact about nature. Similarly, scientists’ choice to represent space with either Euclidean or non-Euclidean geometries is not determined by experience but by considerations of convenience.
Conventionalism with respect to measurement reached its most sophisticated expression in logical positivism. Logical positivists like Hans Reichenbach and Rudolf Carnap proposed “coordinative definitions” or “correspondence rules” as the semantic link between theoretical and observational terms. These a priori, definition-like statements were intended to regulate the use of theoretical terms by connecting them with empirical procedures (Reichenbach 1927: 14–19; Carnap 1966: Ch. 24). An example of a coordinative definition is the statement: “a measuring rod retains its length when transported”. According to Reichenbach, this statement cannot be empirically verified, because a universal and experimentally undetectable force could exist that equally distorts every object’s length when it is transported. In accordance with verificationism, statements that are unverifiable are neither true nor false. Instead, Reichenbach took this statement to express an arbitrary rule for regulating the use of the concept of equality of length, namely, for determining whether particular instances of length are equal (Reichenbach 1927: 16). At the same time, coordinative definitions were not seen as replacements, but rather as necessary additions, to the familiar sort of theoretical definitions of concepts in terms of other concepts (1927: 14). Under the conventionalist viewpoint, then, the specification of measurement operations did not exhaust the meaning of concepts such as length or length-equality, thereby avoiding many of the problems associated with operationalism.[13]
Realists about measurement maintain that measurement is best understood as the empirical estimation of an objective property or relation. A few clarificatory remarks are in order with respect to this characterization of measurement. First, the term “objective” is not meant to exclude mental properties or relations, which are the objects of psychological measurement. Rather, measurable properties or relations are taken to be objective inasmuch as they are independent of the beliefs and conventions of the humans performing the measurement and of the methods used for measuring. For example, a realist would argue that the ratio of the length of a given solid rod to the standard meter has an objective value regardless of whether and how it is measured. Second, the term “estimation” is used by realists to highlight the fact that measurement results are mere approximations of true values (Trout 1998: 46). Third, according to realists, measurement is aimed at obtaining knowledge about properties and relations, rather than at assigning values directly to individual objects. This is significant because observable objects (e.g., levers, chemical solutions, humans) often instantiate measurable properties and relations that are not directly observable (e.g., amount of mechanical work, more acidic than, intelligence). Knowledge claims about such properties and relations must presuppose some background theory. By shifting the emphasis from objects to properties and relations, realists highlight the theory-laden character of measurements.
Realism about measurement should not be confused with realism about entities (e.g., electrons). Nor does realism about measurement necessarily entail realism about properties (e.g., temperature), since one could in principle accept only the reality of relations (e.g., ratios among quantities) without embracing the reality of underlying properties. Nonetheless, most philosophers who have defended realism about measurement have done so by arguing for some form of realism about properties (Byerly and Lazara 1973; Swoyer 1987; Mundy 1987; Trout 1998, 2000). These realists argue that at least some measurable properties exist independently of the beliefs and conventions of the humans who measure them, and that the existence and structure of these properties provides the best explanation for key features of measurement, including the usefulness of numbers in expressing measurement results and the reliability of measuring instruments.
For example, a typical realist about length measurement would argue that the empirical regularities displayed by individual objects’ lengths when they are ordered and concatenated are best explained by assuming that length is an objective property that has an extensive structure (Swoyer 1987: 271–4). That is, relations among lengths such as “longer than” and “sum of” exist independently of whether any objects happen to be ordered and concatenated by humans, and indeed independently of whether objects of some particular length happen to exist at all. The existence of an extensive property structure means that lengths share much of their structure with the positive real numbers, and this explains the usefulness of the positive reals in representing lengths. Moreover, if measurable properties are analyzed in dispositional terms, it becomes easy to explain why some measuring instruments are reliable. For example, if one assumes that a certain amount of electric current in a wire entails a disposition to deflect an ammeter needle by a certain angle, it follows that the ammeter’s indications counterfactually depend on the amount of electric current in the wire, and therefore that the ammeter is reliable (Trout 1998: 65).
A different argument for realism about measurement is due to Joel Michell (1994, 2005), who proposes a realist theory of number based on the Euclidean concept of ratio. According to Michell, numbers are ratios between quantities, and therefore exist in space and time. Specifically, real numbers are ratios between pairs of infinite standard sequences, e.g., the sequence of lengths normally denoted by “1 meter”, “2 meters” etc. and the sequence of whole multiples of the length we are trying to measure. Measurement is the discovery and estimation of such ratios. An interesting consequence of this empirical realism about numbers is that measurement is not a representational activity, but rather the activity of approximating mind-independent numbers (Michell 1994: 400).
Realist accounts of measurement are largely formulated in opposition to strong versions of operationalism and conventionalism, which dominated philosophical discussions of measurement from the 1930s until the 1960s. In addition to the drawbacks of operationalism already discussed in the previous section, realists point out that anti-realism about measurable quantities fails to make sense of scientific practice. If quantities had no real values independently of one’s choice of measurement procedure, it would be difficult to explain what scientists mean by “measurement accuracy” and “measurement error”, and why they try to increase accuracy and diminish error. By contrast, realists can easily make sense of the notions of accuracy and error in terms of the distance between real and measured values (Byerly and Lazara 1973: 17–8; Swoyer 1987: 239; Trout 1998: 57). A closely related point is the fact that newer measurement procedures tend to improve on the accuracy of older ones. If choices of measurement procedure were merely conventional it would be difficult to make sense of such progress. In addition, realism provides an intuitive explanation for why different measurement procedures often yield similar results, namely, because they are sensitive to the same facts (Swoyer 1987: 239; Trout 1998: 56). Finally, realists note that the construction of measurement apparatus and the analysis of measurement results are guided by theoretical assumptions concerning causal relationships among quantities. The ability of such causal assumptions to guide measurement suggests that quantities are ontologically prior to the procedures that measure them.[14]
While their stance towards operationalism and conventionalism is largely critical, realists are more charitable in their assessment of mathematical theories of measurement. Brent Mundy (1987) and Chris Swoyer (1987) both accept the axiomatic treatment of measurement scales, but object to the empiricist interpretation given to the axioms by prominent measurement theorists like Campbell (1920) and Ernest Nagel (1931; Cohen and Nagel 1934: Ch. 15). Rather than interpreting the axioms as pertaining to concrete objects or to observable relations among such objects, Mundy and Swoyer reinterpret the axioms as pertaining to universal magnitudes, e.g., to the universal property of being 5 meters long rather than to the concrete instantiations of that property. This construal preserves the intuition that statements like “the size of x is twice the size of y” are first and foremost about two sizes, and only derivatively about the objects x and y themselves (Mundy 1987: 34).[15] Mundy and Swoyer argue that their interpretation is more general, because it logically entails all the first-order consequences of the empiricist interpretation along with additional, second-order claims about universal magnitudes. Moreover, under their interpretation measurement theory becomes a genuine scientific theory, with explanatory hypotheses and testable predictions. Building on this work, Jo Wolff (2020a) has recently proposed a novel realist account of quantities that relies on the Representational Theory of Measurement. According to Wolff’s structuralist theory of quantity, quantitative attributes are relational structures. Specifically, an attribute is quantitative if its structure has translations that form an Archimedean ordered group. Wolff’s focus on translations, rather than on specific relations such as concatenation and ordering, means that quantitativeness can be realized in multiple ways and is not restricted to extensive structures. It also means that being a quantity does not have anything special to do with numbers, as both numerical and non-numerical structures can be quantitative.
Information-theoretic accounts of measurement are based on an analogy between measuring systems and communication systems. In a simple communication system, a message (input) is encoded into a signal at the transmitter’s end, sent to the receiver’s end, and then decoded back (output). The accuracy of the transmission depends on features of the communication system as well as on features of the environment, i.e., the level of background noise. Similarly, measuring instruments can be thought of as “information machines” (Finkelstein 1977) that interact with an object in a given state (input), encode that state into an internal signal, and convert that signal into a reading (output). The accuracy of a measurement similarly depends on the instrument as well as on the level of noise in its environment. Conceived as a special sort of information transmission, measurement becomes analyzable in terms of the conceptual apparatus of information theory (Hartley 1928; Shannon 1948; Shannon and Weaver 1949). For example, the information that reading \(y_i\) conveys about the occurrence of a state \(x_k\) of the object can be quantified as \(\log \left[\frac{p(x_k \mid y_i)}{p(x_k)}\right]\), namely as a function of the decrease of uncertainty about the object’s state (Finkelstein 1975: 222; for alternative formulations see Brillouin 1962: Ch. 15; Kirpatovskii 1974; and Mari 1999: 185).
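The quantity \(\log \left[\frac{p(x_k \mid y_i)}{p(x_k)}\right]\) can be computed directly once a joint distribution over object states and instrument readings is given. The following sketch illustrates the idea in Python; the two-state object, the readings, and the joint probabilities are invented for illustration and do not come from the sources cited above:

```python
import math

# Hypothetical joint distribution p(state, reading) for a two-state object
# measured by a noisy instrument; the numbers are illustrative only.
p_joint = {
    ("x1", "y1"): 0.45, ("x1", "y2"): 0.05,
    ("x2", "y1"): 0.10, ("x2", "y2"): 0.40,
}

def p_state(x):
    # Marginal probability of the object's state x
    return sum(p for (xs, _), p in p_joint.items() if xs == x)

def p_reading(y):
    # Marginal probability of the instrument reading y
    return sum(p for (_, ys), p in p_joint.items() if ys == y)

def information_gain(x, y):
    """log[ p(x|y) / p(x) ]: how much reading y reduces uncertainty
    about state x (here in bits, using log base 2)."""
    p_conditional = p_joint[(x, y)] / p_reading(y)
    return math.log2(p_conditional / p_state(x))

print(information_gain("x1", "y1"))  # positive: reading y1 makes state x1 more likely
print(information_gain("x2", "y1"))  # negative: reading y1 makes state x2 less likely
```

A reading carries positive information about a state it makes more probable, negative information about one it makes less probable, and zero information when the reading and the state are statistically independent.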
Ludwik Finkelstein (1975, 1977) and Luca Mari (1999) suggested the possibility of a synthesis between Shannon-Weaver information theory and measurement theory. As they argue, both theories centrally appeal to the idea of mapping: information theory concerns the mapping between symbols in the input and output messages, while measurement theory concerns the mapping between objects and numbers. If measurement is taken to be analogous to symbol-manipulation, then Shannon-Weaver theory could provide a formalization of the syntax of measurement while measurement theory could provide a formalization of its semantics. Nonetheless, Mari (1999: 185) also warns that the analogy between communication and measurement systems is limited. Whereas a sender’s message can be known with arbitrary precision independently of its transmission, the state of an object cannot be known with arbitrary precision independently of its measurement.
Information-theoretic accounts of measurement were originally developed by metrologists — experts in physical measurement and standardization — with little involvement from philosophers. Independently of developments in metrology, Bas van Fraassen (2008: 141–185) has recently proposed a conception of measurement in which information plays a key role. He views measurement as composed of two levels: on the physical level, the measuring apparatus interacts with an object and produces a reading, e.g., a pointer position.[16] On the abstract level, background theory represents the object’s possible states on a parameter space. Measurement locates an object on a sub-region of this abstract parameter space, thereby reducing the range of possible states (2008: 164 and 172). This reduction of possibilities amounts to the collection of information about the measured object. Van Fraassen’s analysis of measurement differs from information-theoretic accounts developed in metrology in its explicit appeal to background theory, and in the fact that it does not invoke the symbolic conception of information developed by Shannon and Weaver.
Since the early 2000s a new wave of philosophical scholarship has emerged that emphasizes the relationships between measurement and theoretical and statistical modeling (Morgan 2001; Boumans 2005a, 2015; Mari 2005b; Mari and Giordani 2013; Tal 2016, 2017; Parker 2017; Miyake 2017). According to model-based accounts, measurement consists of two levels: (i) a concrete process involving interactions between an object of interest, an instrument, and the environment; and (ii) a theoretical and/or statistical model of that process, where “model” denotes an abstract and local representation constructed from simplifying assumptions. The central goal of measurement according to this view is to assign values to one or more parameters of interest in the model in a manner that satisfies certain epistemic desiderata, in particular coherence and consistency.
Model-based accounts have been developed by studying measurement practices in the sciences, and particularly in metrology. Metrology, officially defined as the “science of measurement and its application” (JCGM 2012: 2.2), is a field of study concerned with the design, maintenance and improvement of measuring instruments in the natural sciences and engineering. Metrologists typically work at standardization bureaus or at specialized laboratories that are responsible for the calibration of measurement equipment, the comparison of standards and the evaluation of measurement uncertainties, among other tasks. It is only recently that philosophers have begun to engage with the rich conceptual issues underlying metrological practice, and particularly with the inferences involved in evaluating and improving the accuracy of measurement standards (Chang 2004; Boumans 2005a: Chap. 5, 2005b, 2007a; Frigerio et al. 2010; Teller 2013, 2018; Riordan 2015; Schlaudt and Huber 2015; Tal 2016a, 2018; Mitchell et al. 2017; Mößner and Nordmann 2017; de Courtenay et al. 2019).
A central motivation for the development of model-based accounts is the attempt to clarify the epistemological principles underlying aspects of measurement practice. For example, metrologists employ a variety of methods for the calibration of measuring instruments, the standardization and tracing of units and the evaluation of uncertainties (for a discussion of metrology, see the previous section). Traditional philosophical accounts such as mathematical theories of measurement do not elaborate on the assumptions, inference patterns, evidential grounds or success criteria associated with such methods. As Frigerio et al. (2010) argue, measurement theory is ill-suited for clarifying these aspects of measurement because it abstracts away from the process of measurement and focuses solely on the mathematical properties of scales. By contrast, model-based accounts take scale construction to be merely one of several tasks involved in measurement, alongside the definition of measured parameters, instrument design and calibration, object sampling and preparation, error detection and uncertainty evaluation, among others (2010: 145–7).
According to model-based accounts, measurement involves interaction between an object of interest (the “system under measurement”), an instrument (the “measurement system”) and an environment, which includes the measuring subjects. Other, secondary interactions may also be relevant for the determination of a measurement outcome, such as the interaction between the measuring instrument and the reference standards used for its calibration, and the chain of comparisons that trace the reference standard back to primary measurement standards (Mari 2003: 25). Measurement proceeds by representing these interactions with a set of parameters, and assigning values to a subset of those parameters (known as “measurands”) based on the results of the interactions. When measured parameters are numerical they are called “quantities”. Although measurands need not be quantities, a quantitative measurement scenario will be supposed in what follows.
Two sorts of measurement outputs are distinguished by model-based accounts (JCGM 2012: 2.9 & 4.1; Giordani and Mari 2012: 2146; Tal 2013):
As proponents of model-based accounts stress, inferences from instrument indications to measurement outcomes are nontrivial and depend on a host of theoretical and statistical assumptions about the object being measured, the instrument, the environment and the calibration process. Measurement outcomes are often obtained through statistical analysis of multiple indications, thereby involving assumptions about the shape of the distribution of indications and the randomness of environmental effects (Bogen and Woodward 1988: 307–310). Measurement outcomes also incorporate corrections for systematic effects, and such corrections are based on theoretical assumptions concerning the workings of the instrument and its interactions with the object and environment. For example, length measurements need to be corrected for the change of the measuring rod’s length with temperature, a correction which is derived from a theoretical equation of thermal expansion. Systematic corrections involve uncertainties of their own, for example in the determination of the values of constants, and these uncertainties are assessed through secondary experiments involving further theoretical and statistical assumptions. Moreover, the uncertainty associated with a measurement outcome depends on the methods employed for the calibration of the instrument. Calibration involves additional assumptions about the instrument, the calibrating apparatus, the quantity being measured and the properties of measurement standards (Rothbart and Slayden 1994; Franklin 1997; Baird 2004: Ch. 4; Soler et al. 2013). Another component of uncertainty originates from vagueness in the definition of the measurand, and is known as “definitional uncertainty” (Mari and Giordani 2013; Grégis 2015). Finally, measurement involves background assumptions about the scale type and unit system being used, and these assumptions are often tied to broader theoretical and technological considerations relating to the definition and realization of scales and units.
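The two kinds of inference mentioned above, statistical analysis of repeated indications and a theory-derived systematic correction, can be sketched together in a few lines. The following example uses the standard linear thermal-expansion relation; the readings, temperatures, and expansion coefficient are illustrative assumptions, not values from any source cited here:

```python
import statistics

# Hypothetical repeated indications (in metres) from a steel measuring rod
# used at 30 °C, where the rod was calibrated at 20 °C. Values illustrative.
indications = [1.00012, 1.00009, 1.00011, 1.00010, 1.00013]
ALPHA_STEEL = 11.7e-6       # nominal linear expansion coefficient of steel, 1/°C
T_MEASURE, T_CALIB = 30.0, 20.0

# Statistical analysis of indications: mean and standard uncertainty of the
# mean, assuming independent, roughly normally distributed fluctuations.
mean_indication = statistics.mean(indications)
std_uncertainty = statistics.stdev(indications) / len(indications) ** 0.5

# Systematic correction from the theory of thermal expansion: at 30 °C the
# rod's graduations are spaced farther apart, so each indicated metre spans
# (1 + alpha * dT) true metres.
d_t = T_MEASURE - T_CALIB
outcome = mean_indication * (1 + ALPHA_STEEL * d_t)

print(f"outcome = {outcome:.6f} m, standard uncertainty = {std_uncertainty:.6f} m")
```

Even this toy example exhibits the structure the model-based accounts describe: the outcome depends not only on the raw indications but on statistical assumptions (independence, normality) and on a theoretical assumption (the linear expansion equation and the value of the coefficient), each carrying its own uncertainty.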
These various theoretical and statistical assumptions form the basis for the construction of one or more models of the measurement process. Unlike mathematical theories of measurement, where the term “model” denotes a set-theoretical structure that interprets a formal language, here the term “model” denotes an abstract and local representation of a target system that is constructed from simplifying assumptions.[17] The relevant target system in this case is a measurement process, that is, a system composed of a measuring instrument, objects or events to be measured, the environment (including human operators), secondary instruments and reference standards, the time-evolution of these components, and their various interactions with each other. Measurement is viewed as a set of procedures whose aim is to coherently assign values to model parameters based on instrument indications. Models are therefore seen as necessary preconditions for the possibility of inferring measurement outcomes from instrument indications, and as crucial for determining the content of measurement outcomes. As proponents of model-based accounts emphasize, the same indications produced by the same measurement process may be used to establish different measurement outcomes depending on how the measurement process is modeled, e.g., depending on which environmental influences are taken into account, which statistical assumptions are used to analyze noise, and which approximations are used in applying background theory. As Luca Mari puts it,
any measurement result reports information that is meaningful only in the context of a metrological model, such a model being required to include a specification for all the entities that explicitly or implicitly appear in the expression of the measurement result. (2003: 25)
Similarly, models are said to provide the necessary context for evaluating various aspects of the goodness of measurement outcomes, including accuracy, precision, error and uncertainty (Boumans 2006, 2007a, 2009, 2012b; Mari 2005b).
Model-based accounts diverge from empiricist interpretations of measurement theory in that they do not require relations among measurement outcomes to be isomorphic or homomorphic to observable relations among the items being measured (Mari 2000). Indeed, according to model-based accounts relations among measured objects need not be observable at all prior to their measurement (Frigerio et al. 2010: 125). Instead, the key normative requirement of model-based accounts is that values be assigned to model parameters in a coherent manner. The coherence criterion may be viewed as a conjunction of two sub-criteria: (i) coherence of model assumptions with relevant background theories or other substantive presuppositions about the quantity being measured; and (ii) objectivity, i.e., the mutual consistency of measurement outcomes across different measuring instruments, environments and models[18] (Frigerio et al. 2010; Tal 2017a; Teller 2018). The first sub-criterion is meant to ensure that the intended quantity is being measured, while the second sub-criterion is meant to ensure that measurement outcomes can be reasonably attributed to the measured object rather than to some artifact of the measuring instrument, environment or model. Taken together, these two requirements ensure that measurement outcomes remain valid independently of the specific assumptions involved in their production, and hence that the context-dependence of measurement outcomes does not threaten their general applicability.
Besides their applicability to physical measurement, model-based analyses also shed light on measurement in economics. Like physical quantities, values of economic variables often cannot be observed directly and must be inferred from observations based on abstract and idealized models. The nineteenth century economist William Jevons, for example, measured changes in the value of gold by postulating certain causal relationships between the value of gold, the supply of gold and the general level of prices (Hoover and Dowell 2001: 155–159; Morgan 2001: 239). As Julian Reiss (2001) shows, Jevons’ measurements were made possible by using two models: a causal-theoretical model of the economy, which is based on the assumption that the quantity of gold has the capacity to raise or lower prices; and a statistical model of the data, which is based on the assumption that local variations in prices are mutually independent and therefore cancel each other out when averaged. Taken together, these models allowed Jevons to infer the change in the value of gold from data concerning the historical prices of various goods.[19]
The ways in which models function in economic measurement have led some philosophers to view certain economic models as measuring instruments in their own right, analogously to rulers and balances (Boumans 1999, 2005c, 2006, 2007a, 2009, 2012a, 2015; Morgan 2001). Marcel Boumans explains how macroeconomists are able to isolate a variable of interest from external influences by tuning parameters in a model of the macroeconomic system. This technique frees economists from the impossible task of controlling the actual system. As Boumans argues, macroeconomic models function as measuring instruments insofar as they produce invariant relations between inputs (indications) and outputs (outcomes), and insofar as this invariance can be tested by calibration against known and stable facts. When such model-based procedures are combined with expert judgment, they can produce reliable measurements of economic phenomena even outside controlled laboratory settings (Boumans 2015: Chap. 5).
Another area where models play a central role in measurement is psychology. The measurement of most psychological attributes, such as intelligence, anxiety and depression, does not rely on homomorphic mappings of the sort espoused by the Representational Theory of Measurement (Wilson 2013: 3766). Instead, psychometric theory relies predominantly on the development of abstract models that are meant to predict subjects’ performance in certain tasks. These models are constructed from substantive and statistical assumptions about the psychological attribute being measured and its relation to each measurement task. For example, Item Response Theory, a popular approach to psychological measurement, employs a variety of models to evaluate the reliability and validity of questionnaires. Consider a questionnaire that is meant to assess English language comprehension (the “ability”), by presenting subjects with a series of yes/no questions (the “items”). One of the simplest models used to calibrate such questionnaires is the Rasch model (Rasch 1960). This model supposes a straightforward algebraic relation—known as the “log of the odds”—between the probability that a subject will answer a given item correctly, the difficulty of that particular item, and the subject’s ability. New questionnaires are calibrated by testing the fit between their indications and the predictions of the Rasch model and assigning difficulty levels to each item accordingly. The model is then used in conjunction with the questionnaire to infer levels of English language comprehension (outcomes) from raw questionnaire scores (indications) (Wilson 2013; Mari and Wilson 2014).
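The “log of the odds” relation the Rasch model supposes can be stated as log(P/(1−P)) = ability − difficulty, where P is the probability of a correct answer. Solving for P gives the logistic form sketched below; the ability and difficulty values in the usage lines are illustrative:

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: the log of the odds of a correct answer equals
    ability minus difficulty, so P = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When a subject's ability matches an item's difficulty, the odds are even:
print(rasch_probability(0.0, 0.0))   # 0.5
# A more able subject is more likely to answer the same item correctly:
print(rasch_probability(2.0, 0.0))   # ≈ 0.88
```

Calibration then amounts to finding the item difficulties (and subject abilities) that make these predicted probabilities fit the observed pattern of correct and incorrect responses.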
The sort of statistical calibration (or “scaling”) provided by Rasch models yields repeatable results, but it is often only a first step towards full-fledged psychological measurement. Psychologists are typically interested in the results of a measure not for their own sake, but for the sake of assessing some underlying and latent psychological attribute, e.g., English language comprehension. A good fit between item responses and a statistical model does not yet determine what the questionnaire is measuring. The process of establishing that a procedure measures the intended psychological attribute is known as “validation”. One way of validating a psychometric instrument is to test whether different procedures that are intended to measure the same latent attribute provide consistent results. Such testing belongs to a family of validation techniques known as “construct validation”. A construct is an abstract representation of the latent attribute intended to be measured, and
reflects a hypothesis […] that a variety of behaviors will correlate with one another in studies of individual differences and/or will be similarly affected by experimental manipulations. (Nunnally & Bernstein 1994: 85)
Constructs are denoted by variables in a model that predicts which correlations would be observed among the indications of different measures if they are indeed measures of the same attribute. Such models involve substantive assumptions about the attribute, including its internal structure and its relations to other attributes, and statistical assumptions about the correlation among different measures (Campbell & Fiske 1959; Nunnally & Bernstein 1994: Ch. 3; Angner 2008).
In recent years, philosophers of science have become increasingly interested in psychometrics and the concept of validity. One debate concerns the ontological status of latent psychological attributes. Denny Borsboom has argued against operationalism about latent attributes, and in favour of defining validity in a manner that embraces realism: “a test is valid for measuring an attribute if and only if a) the attribute exists, and b) variations in the attribute causally produce variations in the outcomes of the measurement procedure” (2005: 150; see also Hood 2009, 2013; Feest 2020). Elina Vessonen has defended a moderate form of operationalism about psychological attributes, and argued that moderate operationalism is compatible with a cautious type of realism (2019). Another recent discussion focuses on the justification for construct validation procedures. According to Anna Alexandrova, construct validation is in principle a justified methodology, insofar as it establishes coherence with theoretical assumptions and background knowledge about the latent attribute. However, Alexandrova notes that in practice psychometricians who intend to measure happiness and well-being often avoid theorizing about these constructs, and instead appeal to respondents’ folk beliefs. This defeats the purpose of construct validation and turns it into a narrow, technical exercise (Alexandrova and Haybron 2016; Alexandrova 2017; see also McClimans et al. 2017).
A more fundamental criticism leveled against psychometrics is that it dogmatically presupposes that psychological attributes can be quantified. Michell (2000, 2004b) argues that psychometricians have not made serious attempts to test whether the attributes they purport to measure have quantitative structure, and instead adopted an overly loose conception of measurement that disguises this neglect. In response, Borsboom and Mellenbergh (2004) argue that Item Response Theory provides probabilistic tests of the quantifiability of attributes. Psychometricians who construct a statistical model initially hypothesize that an attribute is quantitative, and then subject the model to empirical tests. When successful, such tests provide indirect confirmation of the initial hypothesis, e.g. by showing that the attribute has an additive conjoint structure (see also Vessonen 2020).
Several scholars have pointed out similarities between the ways models are used to standardize measurable quantities in the natural and social sciences. For example, Mark Wilson (2013) argues that psychometric models can be viewed as tools for constructing measurement standards in the same sense of “measurement standard” used by metrologists. Others have raised doubts about the feasibility and desirability of adopting the example of the natural sciences when standardizing constructs in the social sciences. Nancy Cartwright and Rosa Runhardt (2014) discuss “Ballung” concepts, a term they borrow from Otto Neurath to denote concepts with a fuzzy and context-dependent scope. Examples of Ballung concepts are race, poverty, social exclusion, and the quality of PhD programs. Such concepts are too multifaceted to be measured on a single metric without loss of meaning, and must be represented either by a matrix of indices or by several different measures depending on which goals and values are at play (see also Bradburn, Cartwright, & Fuller 2016, Other Internet Resources). Alexandrova (2008) points out that ethical considerations bear on questions about the validity of measures of well-being no less than considerations of reproducibility. Such ethical considerations are context sensitive, and can only be applied piecemeal. In a similar vein, Leah McClimans (2010) argues that uniformity is not always an appropriate goal for designing questionnaires, as the open-endedness of questions is often both unavoidable and desirable for obtaining relevant information from subjects.[20] The intertwining of ethical and epistemic considerations is especially clear when psychometric questionnaires are used in medical contexts to evaluate patient well-being and mental health. In such cases, small changes to the design of a questionnaire or the analysis of its results may result in significant harms or benefits to patients (McClimans 2017; Stegenga 2018, Chap. 8). These insights highlight the value-laden and contextual nature of the measurement of mental and social phenomena.
The development of model-based accounts discussed in the previous section is part of a larger, “epistemic turn” in the philosophy of measurement that occurred in the early 2000s. Rather than emphasizing the mathematical foundations, metaphysics or semantics of measurement, philosophical work in recent years tends to focus on the presuppositions and inferential patterns involved in concrete practices of measurement, and on the historical, social and material dimensions of measuring. The philosophical study of these topics has been referred to as the “epistemology of measurement” (Mari 2003, 2005a; Leplège 2003; Tal 2017a). In the broadest sense, the epistemology of measurement is the study of the relationships between measurement and knowledge. Central topics that fall under the purview of the epistemology of measurement include the conditions under which measurement produces knowledge; the content, scope, justification and limits of such knowledge; the reasons why particular methodologies of measurement and standardization succeed or fail in supporting particular knowledge claims; and the relationships between measurement and other knowledge-producing activities such as observation, theorizing, experimentation, modelling and calculation. In pursuing these objectives, philosophers are drawing on the work of historians and sociologists of science, who have been investigating measurement practices for a longer period (Wise and Smith 1986; Latour 1987: Ch. 6; Schaffer 1992; Porter 1995, 2007; Wise 1995; Alder 2002; Galison 2003; Gooday 2004; Crease 2011), as well as on the history and philosophy of scientific experimentation (Harré 1981; Hacking 1983; Franklin 1986; Cartwright 1999). The following subsections survey some of the topics discussed in this burgeoning body of literature.
A topic that has attracted considerable philosophical attention in recent years is the selection and improvement of measurement standards. Generally speaking, to standardize a quantity concept is to prescribe a determinate way in which that concept is to be applied to concrete particulars.[21] To standardize a measuring instrument is to assess how well the outcomes of measuring with that instrument fit the prescribed mode of application of the relevant concept.[22] The term “measurement standard” accordingly has at least two meanings: on the one hand, it is commonly used to refer to abstract rules and definitions that regulate the use of quantity concepts, such as the definition of the meter. On the other hand, the term “measurement standard” is also commonly used to refer to the concrete artifacts and procedures that are deemed exemplary of the application of a quantity concept, such as the metallic bar that served as the standard meter until 1960. This duality in meaning reflects the dual nature of standardization, which involves both abstract and concrete aspects.
In Section 4 it was noted that standardization involves choices among nontrivial alternatives, such as the choice among different thermometric fluids or among different ways of marking equal duration. These choices are nontrivial in the sense that they affect whether or not the same temperature (or time) intervals are deemed equal, and hence affect whether or not statements of natural law containing the term “temperature” (or “time”) come out true. Appealing to theory to decide which standard is more accurate would be circular, since the theory cannot be determinately applied to particulars prior to a choice of measurement standard. This circularity has been variously called the “problem of coordination” (van Fraassen 2008: Ch. 5) and the “problem of nomic measurement” (Chang 2004: Ch. 2). As already mentioned, conventionalists attempted to escape the circularity by positing a priori statements, known as “coordinative definitions”, which were supposed to link quantity-terms with specific measurement operations. A drawback of this solution is that it supposes that choices of measurement standard are arbitrary and static, whereas in actual practice measurement standards tend to be chosen based on empirical considerations and are eventually improved or replaced with standards that are deemed more accurate.
A new strand of writing on the problem of coordination has emerged in recent years, consisting most notably of the works of Hasok Chang (2001, 2004, 2007; Barwich and Chang 2015) and Bas van Fraassen (2008: Ch. 5; 2009, 2012; see also Padovani 2015, 2017; Michel 2019). These works take a historical and coherentist approach to the problem. Rather than attempting to avoid the problem of circularity completely, as their predecessors did, they set out to show that the circularity is not vicious. Chang argues that constructing a quantity-concept and standardizing its measurement are co-dependent and iterative tasks. Each “epistemic iteration” in the history of standardization respects existing traditions while at the same time correcting them (Chang 2004: Ch. 5). The pre-scientific concept of temperature, for example, was associated with crude and ambiguous methods of ordering objects from hot to cold. Thermoscopes, and eventually thermometers, helped modify the original concept and made it more precise. With each such iteration the quantity concept was re-coordinated to a more stable set of standards, which in turn allowed theoretical predictions to be tested more precisely, facilitating the subsequent development of theory and the construction of more stable standards, and so on.
How this process avoids vicious circularity becomes clear when we look at it either “from above”, i.e., in retrospect given our current scientific knowledge, or “from within”, by looking at historical developments in their original context (van Fraassen 2008: 122). From either vantage point, coordination succeeds because it increases coherence among elements of theory and instrumentation. The questions “what counts as a measurement of quantity X?” and “what is quantity X?”, though unanswerable independently of each other, are addressed together in a process of mutual refinement. It is only when one adopts a foundationalist view and attempts to find a starting point for coordination free of presupposition that this historical process erroneously appears to lack epistemic justification (2008: 137).
The new literature on coordination shifts the emphasis of the discussion from the definitions of quantity-terms to the realizations of those definitions. In metrological jargon, a “realization” is a physical instrument or procedure that approximately satisfies a given definition (cf. JCGM 2012: 5.1). Examples of metrological realizations are the official prototypes of the kilogram and the cesium fountain clocks used to standardize the second. Recent studies suggest that the methods used to design, maintain and compare realizations have a direct bearing on the practical application of concepts of quantity, unit and scale, no less than the definitions of those concepts (Riordan 2015; Tal 2016). The relationship between the definition and realizations of a unit becomes especially complex when the definition is stated in theoretical terms. Several of the base units of the International System (SI) — including the meter, kilogram, ampere, kelvin and mole — are no longer defined by reference to any specific kind of physical system, but by fixing the numerical value of a fundamental physical constant. The kilogram, for example, was redefined in 2019 as the unit of mass such that the numerical value of the Planck constant is exactly 6.62607015 × 10⁻³⁴ kg m² s⁻¹ (BIPM 2019: 131). Realizing the kilogram under this definition is a highly theory-laden task. The study of the practical realization of such units has shed new light on the evolving relationships between measurement and theory (Tal 2018; de Courtenay et al. 2019; Wolff 2020b).
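A short unit calculation shows how fixing the numerical value of a constant can define a unit. Since the meter and the second are themselves fixed (via the speed of light and the cesium hyperfine frequency), stipulating the value of the Planck constant leaves the kilogram as the only remaining unknown:

```latex
h = 6.62607015 \times 10^{-34}\ \mathrm{kg\,m^{2}\,s^{-1}}
\quad\Longrightarrow\quad
1\ \mathrm{kg} \;=\; \frac{h}{6.62607015 \times 10^{-34}}\ \mathrm{m^{-2}\,s}
```

Any experiment that relates a macroscopic mass to the Planck constant (such as a Kibble balance) can then serve as a realization of the kilogram, which is why realizing the unit is a theory-laden task.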
As already discussed above (Sections 7 and 8.1), theory and measurement are interdependent both historically and conceptually. On the historical side, the development of theory and measurement proceeds through iterative and mutual refinements. On the conceptual side, the specification of measurement procedures shapes the empirical content of theoretical concepts, while theory provides a systematic interpretation for the indications of measuring instruments. This interdependence of measurement and theory may seem like a threat to the evidential role that measurement is supposed to play in the scientific enterprise. After all, measurement outcomes are thought to be able to test theoretical hypotheses, and this seems to require some degree of independence of measurement from theory. This threat is especially clear when the theoretical hypothesis being tested is already presupposed as part of the model of the measuring instrument. To cite an example from Franklin et al. (1989: 230):
There would seem to be, at first glance, a vicious circularity if one were to use a mercury thermometer to measure the temperature of objects as part of an experiment to test whether or not objects expand as their temperature increases.
Nonetheless, Franklin et al. conclude that the circularity is not vicious. The mercury thermometer could be calibrated against another thermometer whose principle of operation does not presuppose the law of thermal expansion, such as a constant-volume gas thermometer, thereby establishing the reliability of the mercury thermometer on independent grounds. To put the point more generally, in the context of local hypothesis-testing the threat of circularity can usually be avoided by appealing to other kinds of instruments and other parts of theory.
A different sort of worry about the evidential function of measurement arises on the global scale, when the testing of entire theories is concerned. As Thomas Kuhn (1961) argues, scientific theories are usually accepted long before quantitative methods for testing them become available. The reliability of newly introduced measurement methods is typically tested against the predictions of the theory rather than the other way around. In Kuhn’s words, “The road from scientific law to scientific measurement can rarely be traveled in the reverse direction” (1961: 189). For example, Dalton’s Law, which states that the weights of elements in a chemical compound are related to each other in whole-number proportions, initially conflicted with some of the best known measurements of such proportions. It is only by assuming Dalton’s Law that subsequent experimental chemists were able to correct and improve their measurement techniques (1961: 173). Hence, Kuhn argues, the function of measurement in the physical sciences is not to test the theory but to apply it with increasing scope and precision, and eventually to allow persistent anomalies to surface that would precipitate the next crisis and scientific revolution. Note that Kuhn is not claiming that measurement has no evidential role to play in science. Instead, he argues that measurements cannot test a theory in isolation, but only by comparison to some alternative theory that is proposed in an attempt to account for the anomalies revealed by increasingly precise measurements (for an illuminating discussion of Kuhn’s thesis see Hacking 1983: 243–5).
Traditional discussions of theory-ladenness, like those of Kuhn, were conducted against the background of the logical positivists’ distinction between theoretical and observational language. The theory-ladenness of measurement was correctly perceived as a threat to the possibility of a clear demarcation between the two languages. Contemporary discussions, by contrast, no longer present theory-ladenness as an epistemological threat but take for granted that some level of theory-ladenness is a prerequisite for measurements to have any evidential power. Without some minimal substantive assumptions about the quantity being measured, such as its amenability to manipulation and its relations to other quantities, it would be impossible to interpret the indications of measuring instruments and hence impossible to ascertain the evidential relevance of those indications. This point was already made by Pierre Duhem (1906: 153–6; see also Carrier 1994: 9–19). Moreover, contemporary authors emphasize that theoretical assumptions play crucial roles in correcting for measurement errors and evaluating measurement uncertainties. Indeed, physical measurement procedures become more accurate when the model underlying them is de-idealized, a process which involves increasing the theoretical richness of the model (Tal 2011).
The acknowledgment that theory is crucial for guaranteeing the evidential reliability of measurement draws attention to the “problem of observational grounding”, which is an inverse challenge to the traditional threat of theory-ladenness (Tal 2016b). The challenge is to specify what role observation plays in measurement, and particularly what sort of connection with observation is necessary and/or sufficient to allow measurement to play an evidential role in the sciences. This problem is especially clear when one attempts to account for the increasing use of computational methods for performing tasks that were traditionally accomplished by measuring instruments. As Margaret Morrison (2009) and Wendy Parker (2017) argue, there are cases where reliable quantitative information is gathered about a target system with the aid of a computer simulation, but in a manner that satisfies some of the central desiderata for measurement such as being empirically grounded and backward-looking (see also Lusk 2016). Such information does not rely on signals transmitted from the particular object of interest to the instrument, but on the use of theoretical and statistical models to process empirical data about related objects. For example, data assimilation methods are customarily used to estimate past atmospheric temperatures in regions where thermometer readings are not available. Some methods do this by fitting a computational model of the atmosphere’s behavior to a combination of available data from nearby regions and a model-based forecast of conditions at the time of observation (Parker 2017). These estimations are then used in various ways, including as data for evaluating forward-looking climate models. Regardless of whether one calls these estimations “measurements”, they challenge the idea that producing reliable quantitative evidence about the state of an object requires observing that object, however loosely one understands the term “observation”.[23]
Two key aspects of the reliability of measurement outcomes are accuracy and precision. Consider a series of repeated weight measurements performed on a particular object with an equal-arms balance. From a realist, “error-based” perspective, the outcomes of these measurements are accurate if they are close to the true value of the quantity being measured—in our case, the true ratio of the object’s weight to the chosen unit—and precise if they are close to each other. An analogy often cited to clarify the error-based distinction is that of arrows shot at a target, with accuracy analogous to the closeness of hits to the bull’s eye and precision analogous to the tightness of spread of hits (cf. JCGM 2012: 2.13 & 2.15, Teller 2013: 192). Though intuitive, the error-based way of carving the distinction raises an epistemological difficulty. It is commonly thought that the exact true values of most quantities of interest to science are unknowable, at least when those quantities are measured on continuous scales. If this assumption is granted, the accuracy with which such quantities are measured cannot be known with exactitude, but only estimated by comparing inaccurate measurements to each other. And yet it is unclear why convergence among inaccurate measurements should be taken as an indication of truth. After all, the measurements could be plagued by a common bias that prevents their individual inaccuracies from cancelling each other out when averaged. In the absence of cognitive access to true values, how is the evaluation of measurement accuracy possible?
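The worry about a common bias can be illustrated with a small simulation. Everything here is a hypothetical construction (the true weight, the bias, and the spread are stipulated, which is precisely what real measurers cannot do): repeated readings share a systematic offset, so they agree closely with one another while their average converges on the wrong value.

```python
import random
import statistics

random.seed(42)

TRUE_WEIGHT = 100.0   # stipulated true value, in grams (unknowable in practice)
BIAS = 2.0            # shared systematic error, e.g. a miscalibrated balance
SPREAD = 0.5          # standard deviation of uncontrolled fluctuations

# Repeated measurements: precise (small spread) yet inaccurate (shared bias).
readings = [random.gauss(TRUE_WEIGHT + BIAS, SPREAD) for _ in range(1000)]

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)

# The readings agree closely with one another (high precision), but
# their average converges on the biased value, not the true one:
print(f"mean = {mean:.2f} g, spread = {stdev:.2f} g")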
In answering this question, philosophers have benefited from studying the various senses of the term “measurement accuracy” as used by practicing scientists. At least five different senses have been identified: metaphysical, epistemic, operational, comparative and pragmatic (Tal 2011: 1084–5). In particular, the epistemic or “uncertainty-based” sense of the term is metaphysically neutral and does not presuppose the existence of true values. Instead, the accuracy of a measurement outcome is taken to be the closeness of agreement among values reasonably attributed to a quantity given available empirical data and background knowledge (cf. JCGM 2012: 2.13 Note 3; Giordani & Mari 2012; de Courtenay and Grégis 2017). Thus construed, measurement accuracy can be evaluated by establishing robustness among the consequences of models representing different measurement processes (Basso 2017; Tal 2017b; Bokulich 2020; Staley 2020).
Under the uncertainty-based conception, imprecision is a special type of inaccuracy. For example, the inaccuracy of weight measurements is the breadth of spread of values that are reasonably attributed to the object’s weight given the indications of the balance and available background knowledge about the way the balance works and the standard weights used. The imprecision of these measurements is the component of inaccuracy arising from uncontrolled variations to the indications of the balance over repeated trials. Other sources of inaccuracy besides imprecision include imperfect corrections to systematic errors, inaccurately known physical constants, and vague measurand definitions, among others (see Section 7.1).
Paul Teller (2018) raises a different objection to the error-based conception of measurement accuracy. He argues against an assumption he calls “measurement accuracy realism”, according to which measurable quantities have definite values in reality. Teller argues that this assumption is false insofar as it concerns the quantities habitually measured in physics, because any specification of definite values (or value ranges) for such quantities involves idealization and hence cannot refer to anything in reality. For example, the concept usually understood by the phrase “the velocity of sound in air” involves a host of implicit idealizations concerning the uniformity of the air’s chemical composition, temperature and pressure, as well as the stability of units of measurement. Removing these idealizations completely would require adding an infinite amount of detail to each specification. As Teller argues, measurement accuracy should itself be understood as a useful idealization, namely as a concept that allows scientists to assess coherence and consistency among measurement outcomes as if the linguistic expression of these outcomes latched onto anything in the world. Precision is similarly an idealized concept, which is based on an open-ended and indefinite specification of what counts as repetition of measurement under “the same” circumstances (Teller 2013: 194).
Duhem, Pierre | economics: philosophy of | empiricism: logical | Helmholtz, Hermann von | Mach, Ernst | models in science | operationalism | physics: experiment in | Poincaré, Henri | quantum theory: philosophical issues in | Reichenbach, Hans | science: theory and observation in | scientific objectivity | Vienna Circle
The author would like to thank Stephan Hartmann, Wendy Parker, Paul Teller, Alessandra Basso, Sally Riordan, Jo Wolff, Conrad Heilmann and participants of the History and Philosophy of Physics reading group at the Department of History and Philosophy of Science at the University of Cambridge for helpful feedback on drafts of this entry. The author is also indebted to Joel Michell and Oliver Schliemann for useful bibliographical advice, and to John Wiley and Sons Publishers for permission to reproduce an excerpt from Tal (2013). Work on this entry was supported by an Alexander von Humboldt Postdoctoral Research Fellowship and a Marie Curie Intra-European Fellowship within the 7th European Community Framework Programme. Work on the 2020 revision of this entry was supported by an FRQSC New Academic grant, a Healthy Brains for Healthy Lives Knowledge Mobilization grant, and funding from the Canada Research Chairs program.
The Stanford Encyclopedia of Philosophy is copyright © 2023 by The Metaphysics Research Lab, Department of Philosophy, Stanford University
Library of Congress Catalog Data: ISSN 1095-5054