Philosophy of Science, 72 (December 2005) pp. 850–863. 0031-8248/2005/7205-0017$10.00 Copyright 2005 by the Philosophy of Science Association. All rights reserved.

Measurement Outside the Laboratory

Marcel Boumans†

The kinds of models discussed in this paper function as measuring instruments. We will concentrate on two necessary steps for measurement: (1) one should search for a mathematical representation of the phenomenon; (2) this representation of the phenomenon should cover an invariant relationship between properties of the phenomenon to be measured and observable associated attributes of a measuring instrument. Therefore, the measuring instrument should function as a nomological machine. However, invariant relationships are not necessarily ceteris paribus regularities, but could also occur when the influence of the environment is negligible. Then we are able to achieve accurate measurements outside the laboratory.

1. Introduction. Although phenomena are investigated by using observed data, they themselves are in general not directly observable. To 'see' them we need instruments, and to obtain numerical facts about the phenomena in particular we need measuring instruments. This view is a result of Woodward's (1989) account of the distinction between phenomena and data. According to Woodward, phenomena are relatively stable and general features of the world and therefore suited as objects of explanation and prediction. Data, that is, the observations playing the role of evidence for claims about phenomena, on the other hand, involve observational mistakes, are idiosyncratic, reflect the operation of many different causal factors, and are therefore unsuited for any systematic and generalizing treatment. Theories are not about observations—particulars—but about phenomena—universals. Woodward characterizes the contrast between data and phenomena in three ways.
In the first place, the difference between data and phenomena can be indicated in terms of the notions of error applicable to each. In the case of data, the notion of error involves observational mistakes, while in the case of phenomena, one worries whether one is detecting a real fact rather than an artifact produced by the peculiarities of one's instruments or detection procedures. A second contrast between data and phenomena is that phenomena are more 'widespread' and less idiosyncratic, less closely tied to the details of a particular instrument or detection procedure. A third way of thinking about the contrast between data and phenomena is that scientific investigation is typically carried on in a noisy environment, an environment in which the observations reflect the operation of many different causal factors.

The problem of detecting a phenomenon is the problem of detecting a signal in this sea of noise, of identifying a relatively stable and invariant pattern of some simplicity and generality with recurrent features—a pattern which is not just an artifact of the particular detection techniques we employ or the local environment in which we operate. (Woodward 1989, 396–397)

Underlying the contrast between data and phenomena is the idea that theories do not explain data, which typically will reflect the presence of a great deal of noise. Rather, an investigator first subjects the data to analysis and processing, or alters the experimental design or detection technique, in an effort to separate out the phenomenon of interest from extraneous background factors. "It is this extracted signal rather than the data itself which is then regarded as a potential object of explanation by theory" (397).

†To contact the author, please write to: Department of Economics, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, Netherlands; e-mail: m.j.boumans@uva.nl.
The kinds of models discussed in this paper function as detection instruments—more specifically, as measuring instruments. In measurement theory, measurement is the mapping of a property of the empirical world into a set of numbers. But how do we arrive at meaningful numbers? To attain numbers that will inform us about phenomena, we have to find appropriate mappings of the phenomena. This kind of mathematization is done by modeling the phenomena in a very specific way.

Theories are incomplete with respect to the facts about phenomena. Though theories explain phenomena, they often (particularly in the social sciences) do not have built-in application rules for mathematizing the phenomena. Moreover, theories do not have built-in rules for measuring the phenomena. For example, theories tell us that metals melt at a certain temperature, but not at which temperature (Woodward's example); or they tell us that capitalist economies give rise to business cycles, but not the duration of recovery. In practice, by mediating between theories and data, models may overcome this dual incompleteness of theories.

This paper will concentrate on two necessary steps for measurement (whether or not provided by theory): (1) one should search for a mathematical representation of the phenomenon; and (2) this representation of the phenomenon should cover an as far as possible invariant relationship between facts and data.

[Figure 1]

2. Mathematical Representation. The dominant measurement theory today is the representational theory of measurement. The core of this theory is that measurement is a process of assigning numbers to attributes or characteristics of the empirical world in such a way that the relevant qualitative empirical relations among these attributes or characteristics are reflected in the numbers themselves as well as in important properties of the number system.
In other words, measurement is conceived of as establishing a homomorphism¹ between a numerical and an empirical structure. In the formal representational theory this is expressed as:

Take a well-defined, non-empty class of extra-mathematical entities, X. Let there exist on that class a set of empirical relations R = {R₁, . . . , Rₙ}. Let us further consider a set of numbers N (in general a subset of the set of real numbers Re) and let there be defined on that set a set of numerical relations P = {P₁, . . . , Pₙ}. Let there exist a mapping M with domain X and a range in N, M: X → N, which is a homomorphism of the empirical relational system ⟨X, R⟩ and the numerical relational system ⟨N, P⟩. (Finkelstein 1975, 105)

This is diagrammatically represented in Figure 1, where xᵢ ∈ X and nᵢ ∈ N. Mapping M is a so-called 'scale of measurement'. Measurement theory is supposed to analyze the concept of a scale of measurement. It distinguishes various types of scale, describes their uses, and formulates the conditions required for the existence of scales of various types.²

The problem, however, is that the representational theory of measurement has turned too much into a pure mathematical discipline, leaving out the question of how the mathematical structures gain their empirical significance in actual measurement. The representational theory lacks concrete measurement procedures and devices. This problem of empirical significance is discussed by Heidelberger (1994a, 1994b), who argues for giving the representational theory a 'correlative interpretation'.

1. The term derives from the Greek omo, 'alike', and morphosis, 'to form' or 'to shape'. It denotes that the assignment M preserves the properties of the relational structure R.
2. See Savage and Ehrlich 1992 for a historical survey of measurement theory, and Finkelstein 1975 for a survey of the epistemological and logical foundations of measurement.
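The homomorphism condition above can be made concrete in a small sketch. Nothing here is from the paper: the objects, the hardness ordering, and the Mohs-like numbers are invented for illustration; the point is only that a mapping M counts as a scale of measurement when the empirical relation among objects is mirrored by a numerical relation among their images.

```python
# Illustrative sketch (not from the paper): a scale of measurement M as a
# homomorphism from an empirical relational system <X, R> into <N, P>.
# The empirical relation R is a hypothetical "harder than" ordering; the
# corresponding numerical relation P is the ordinary ">" on numbers.

# Empirical objects and the empirical relation R, given extensionally as
# ordered pairs (a, b) meaning "a is harder than b" -- assumed data.
X = {"talc", "quartz", "diamond"}
R = {("diamond", "quartz"), ("quartz", "talc"), ("diamond", "talc")}

# A candidate mapping M: X -> N (Mohs-like values, purely illustrative).
M = {"talc": 1, "quartz": 7, "diamond": 10}

def is_homomorphism(X, R, M):
    """M preserves R iff (a, b) is in R exactly when M[a] > M[b]."""
    return all(
        ((a, b) in R) == (M[a] > M[b])
        for a in X for b in X if a != b
    )

print(is_homomorphism(X, R, M))  # -> True: a valid scale for this ordering
```

Any strictly order-preserving reassignment of numbers would pass the same check, which is exactly why the representational theory by itself cannot single out one scale as the empirically significant one.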
In his plea for a correlative interpretation, Heidelberger traces the origins of the representational theory of measurement to Maxwell's method of using formal analogies. As Heidelberger observantly noted, a first glimpse of a representational theory of measurement appeared in Maxwell ([1855] 1965). In discussing his method of using analogies, the 'representational view' is expressed en passant: "Thus all the mathematical sciences are founded on relations between physical laws and laws of numbers, so that the aim of exact science is to reduce the problems of nature to the determination of quantities by operations with numbers" (Maxwell [1855] 1965, 156).

Helmholtz took up Maxwell's view and continued to think in this direction. Usually, Helmholtz 1887 is taken as the starting point of the development of the representational theory. The development since Helmholtz's seminal paper is well described elsewhere (see, for example, Michell 1993 and Savage and Ehrlich 1992) and will not be repeated here. But unfortunately, as Heidelberger emphasizes, the result of this development is that most followers of the representational theory today have adopted an operationalist interpretation. This operationalist interpretation is best illustrated by Stevens' dictum:

[M]easurement [is] the assignment of numerals to objects or events according to rule—any rule. Of course, the fact that numerals can be assigned under different rules leads to different kinds of scales and different kinds of measurements, not all of equal power and usefulness. Nevertheless, provided a consistent rule is followed, some form of measurement is achieved.
(Stevens 1959, 19)

By labeling the current interpretation of measurement an operationalist one, Heidelberger alluded not only to a strong version of operationalism, in which the terms of a theory are fixed by giving operational definitions, but also to a weaker one that says that a concept is quantitative if the operational rules that lead to a numerical value are fixed, whatever else the meaning of the concept might be.

The disadvantage of an operationalist interpretation is that it is much too liberal. As Heidelberger rightly argues, we could not then distinguish a theoretical determination of the value of a theoretical quantity from its actual measurement. A correlative interpretation does not have this disadvantage, because it refers to the handling of a measuring instrument. This interpretation of the representational theory of measurement was based on Fechner's correlational theory of measurement. Fechner had argued that the measurement of any attribute x generally presupposes a second, directly observable attribute y and a measurement apparatus A that can represent variable values of y in correlation to values of x. The correlation is such that when the states of A are arranged in the order of x they are also arranged in the order of y. The different values of y are defined by an intersubjective, determinate, and repeatable calibration of A. They do not have to be measured on their part.

The function that describes the correlation between x and y relative to A (underlying the measurement of x by y in A) is precisely what Fechner called the measurement formula. Normally, we try to construct (or find) a measurement apparatus which realizes a 1 : 1 correlation between the values of x and the values of y so that we can take the values of y as a direct representation of the value of x. (Heidelberger 1993, 146)³

To illustrate this, let us consider an example of temperature measurement.
We can measure temperature, x, by constructing a thermometer, A, that contains a mercury column whose length, y, is correlated with temperature. The measurement formula, the function describing the correlation between x and y, x = f(y), is determined by choosing the shape of the function f, e.g., linear, and by calibration. For example, the temperature of boiling water is fixed at 100, and of ice water at 0.

The correlative interpretation of measurement implies that the scales of measurement are a specific form of indirect scales, namely so-called associative scales. This terminology is from Ellis (1968), who adopted a conventionalist view on measurement. To see that measurement on the one hand requires empirical significance—Heidelberger's point—and on the other hand is conventional, we first take a closer look at direct measurement, and thereafter we will discuss Ellis' account of indirect measurement.

A direct measurement scale for a class of measurands is one based entirely on relations among that class and not involving the use of measurements of any other class. This type of scale is implied by the definition of the representational theory of measurement above; see Figure 1. Although direct measurement assumes direct observability—human perception without the aid of any instrument—of the measurand, we nevertheless need a standard to render an observation into a measurement. A standard is a "material measure, measuring instrument, reference material or measuring system intended to define, realize, conserve or reproduce a unit or one or more values of a quantity to serve as a reference" (International Vocabulary 1993, 45).

[Figure 2]

3. I have replaced the capitals Q and R in the original text by the lower case letters x and y, respectively, to make the discussion of the measurement literature uniform.
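The thermometer example can be sketched numerically. The column lengths below are invented for illustration; what the sketch shows is only the two-step structure of the measurement formula: a conventional choice of shape (linear) plus calibration at two fixed points.

```python
# Sketch of Fechner's 'measurement formula' for the thermometer example:
# temperature x is read off from mercury-column length y via x = f(y),
# with f chosen linear by convention and fixed by calibration.
# The column lengths are hypothetical numbers, not data from the paper.

L_ICE = 4.0    # column length (cm) in ice water, fixed at 0 degrees
L_BOIL = 24.0  # column length (cm) in boiling water, fixed at 100 degrees

def measurement_formula(y):
    """Linear f: maps a column length y to a temperature reading x."""
    return 100.0 * (y - L_ICE) / (L_BOIL - L_ICE)

# The calibration points are reproduced exactly; intermediate lengths are
# interpolated by the conventional (linear) choice of f.
print(measurement_formula(4.0))   # -> 0.0
print(measurement_formula(24.0))  # -> 100.0
print(measurement_formula(14.0))  # -> 50.0
```

Note that no prior measurement of y is needed here: the two fixed points and the linear interpolation are settled by calibration and convention, which is exactly the correlative point.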
This means that inserting a standard, s, into the physical state set (see Figure 2) should complete the above Figure 1.

However, there are properties, like temperature, for which it is not possible or convenient to construct satisfactory direct scales of measurement. Scales for the measurement of such properties can, however, be constructed, based on the relation of that property, x, to quantities yᵢ (where i = 1, . . . , n enumerates the different dimensions) with which it is associated and for which measurement scales have been defined. Such scales are termed indirect. Associative measurement depends on there being some quantity y associated with the property x to be measured, such that when things are arranged in the order of x, under specific conditions, they are also arranged in the order of y. Ellis defines an associative scale for the measurement of x by taking f(M(y)) as the measure of x, where M(y) is the measure of y on some previously defined scale, and f is any strictly monotonic increasing function. We have derived measurement if there exists an empirical law F = F(M₁(y₁), . . . , Mₙ(yₙ)) and if it is the case that whenever things are ordered in the order of x, they are also arranged in the order of F. Then we can define F(M₁(y₁), . . . , Mₙ(yₙ)) as a measure of x.

[Figure 3]

The measurement problem then is the choice of the associated property y and the choice of f (or F), which Ellis, following Mach, called the "choice of principle of correlation." For Ellis, the only kinds of considerations that should have any bearing on the choice of principle of correlation are considerations of mathematical simplicity (Ellis 1968, 95–96). But this is too much conventionalism; even Mach noted that whatever form one chooses, it still should have some empirical significance.

It is imperative to notice that whenever we apply a definition to nature we must wait to see if it will correspond to it.
With the exception of pure mathematics we can create our concepts at will, even in geometry and still more in physics, but we must always investigate whether and how reality correspond to these concepts. (Mach [1896] 1966, 185)

Associative measurement can be pictured as an extended version of direct measurement; see Figure 3. The association is indicated by f in Figure 3. An associative scale for the measurement of x is now defined by taking n = M(y) = M(f(x)).

According to Heidelberger (1993, 147), "Mach not only defended Fechner's measurement theory, he radicalized it and extended it into physics." To Mach, any establishment of an objective equality in science must ultimately be based on sensation, because it needs the reading (or at least the gauging) of a material device by an observer; see Figure 2.

The central idea of associative measurement, which stood at the center of Mach's philosophy of science, is that in measuring any attribute we always have to take into account its empirical lawful relation to (at least) another attribute. The distinction between fundamental [read: direct] and derived [read: indirect] measurement, at least in a relevant epistemological sense, is illusory. (Heidelberger 1994b, 11)

The difference between Ellis' associative measurement and Heidelberger's correlative measurement is that, according to Heidelberger, the mapping of y into numbers, M(y), is not the result of (direct) measurement, but is obtained by calibration (see the quotation from Heidelberger 1993, 146); moreover, a correlation necessarily involves an instrument, discussed in the next section. To determine the scale of the thermometer, no prior measurement of the expansion of the mercury column is required; by convention it is decided into how many equal parts the interval between two fixed points (melting point and boiling point) should be divided.
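Ellis' derived measurement, where a measure of x is defined by an empirical law F over previously scaled quantities, can also be sketched. The example (density derived from mass and volume) and all numbers are my own illustration, not from the paper.

```python
# Sketch of derived measurement in Ellis' sense: the measure of a property x
# (here: density) is defined as F(M1(y1), M2(y2)), a law over previously
# defined scales for the associated quantities y1 (mass) and y2 (volume).
# Objects and readings are invented for illustration.

def F(mass, volume):
    """Empirical law combining the measures of the associated quantities."""
    return mass / volume

# (mass in g, volume in cm^3): readings on previously defined scales.
objects = {
    "cork": (24.0, 100.0),
    "water": (100.0, 100.0),
    "iron": (787.0, 100.0),
}

# F(M1(y1), M2(y2)) then serves as the measure of density for each object.
density = {name: F(m, v) for name, (m, v) in objects.items()}

# Ordering the objects by F reproduces their empirical ordering by density.
print(sorted(density, key=density.get))  # -> ['cork', 'water', 'iron']
```

The contrast with the associative case is that here the yᵢ's are independently measured, while x itself is not; the discussion of invariance below applies to this type of measurement as well.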
In the same way, a clock continuously measures time, irrespective of its face. The face is the conventional part of time measurement and the moving of the hands the empirical determination of time. Heidelberger's interpretation shifts the emphasis from mapping M—the conventional part—to the empirical relationships described by f, and thus gives back to measurement the idea that it concerns concrete measurement procedures and devices, taking place in the domain of the physical state sets as a result of an interaction between X and Y; see Figure 3.

3. Invariance. Measurement, including the measuring instrument being used, is based on a correlative relation between the measurand, x, and the associated quantity, y. To gain a better understanding of measurement we must take a closer look at the nature of this correlative relation. The various authors refer to it in terms of an empirical lawful relationship, in the sense that "when things are arranged in the order of x, under certain specified conditions, they are also arranged in the order of y" (Ellis 1968, 90). However, it should not be considered a numerical law, because that would require independent measurements of both x and y. "For each of the variables in a law there must exist a measurement apparatus with a measurement formula before a law can be established and tested" (Heidelberger 1993, 146–147).⁴

To investigate what a 'lawful relation' means in the context of measurement, it is very useful to use Cartwright's view that a law of nature—a necessary regular association between properties—holds only relative to the successful repeated operation of a nomological machine:

a fixed (enough) arrangement of components, or factors, with stable (enough) capacities that in the right sort of stable (enough) environment will, with repeated operation, give rise to the kind of regular behaviour that we represent in our scientific laws. (Cartwright 1999, 50)

It shows why the empirical lawful relation on which the measurement is based and the measuring instrument are two sides of the same coin. The measuring instrument must function as a nomological machine to fulfill its task. This interconnection is affirmed by Heidelberger's use of "correlative relation" and "measuring instrument" as nearly synonymous, by Ellis' definition of a lawful relation as an arrangement under specific conditions, and by Finkelstein's observation that the "law of correlation" is "not infrequently less well established and less general, in the sense that it may be the feature of specially defined experimental apparatus and conditions" (Finkelstein 1975, 108).

The correlative interpretation of measurement takes account of a measuring system—a nomological machine—A, which results in a further expansion of the above figures; see Figure 4. A correlative scale for the measurement of x is then defined by taking n = M(y) = M(F(x, OC)), where M(y) is the measure of y on some previously defined scale, and F is the correlation between x and y that also involves other influences, indicated by OC:

y = F(x, OC),

where OC, an acronym for "other circumstances," is a collective noun for all the other factors that could have an influence on y and therefore need to be controlled for. However, outside the laboratory we can control the environment only to a certain extent.

4. In the case of derived measurement, there exist independent measurements of the yᵢ's, but not of x. The discussion below includes this type of measurement, too.
To gain more insight into how to deal with this problem of the (im)possibility of conditioning the circumstances with respect to measurement, the history of the standardization of the thermometer is helpful. Chang (2001) shows that standardization was closely linked to the dual measurement problem, namely the choice of the proper associated quantity and the choice of the principle of correlation, which he labels the "problem of nomic measurement":

1. We want to measure quantity X.
2. Quantity X is not directly observable by unaided human perception so we infer it from another quantity Y, which is directly observable.
3. For this inference we need a law that expresses X as a function of Y, X = f(Y).
4. The form of this function f cannot be discovered or tested empirically, because that would involve knowing the values of both Y and X, and X is the unknown variable that we are trying to measure. (Chang 2001, 251)

[Figure 4]

Although Chang (2001) discusses only one part of the measurement problem, namely the choice of the associated property—in this case the choice of the right thermometric fluid—his account also gives some hints about solving the problem of the choice of the most appropriate form of f.

Historically, there were three significant contenders: atmospheric air, mercury, and ethyl alcohol. At the end of the eighteenth century, it was generally believed that the mercury thermometer indicated the real degree of heat. But in the nineteenth century people started to question the accuracy of mercury thermometers. To choose among the three candidates, all kinds of experiments were suggested. The problem, however, was that the proposed experiments to settle the debate were based on theoretical assumptions about the kind of thermal expansion the fluid would show—the form of f. But to test these expansions one had to carry out measurements for which a thermometer was needed.
This circularity was avoided by Regnault's use of the principle of 'comparability': "If a type of thermometer is to be an accurate instrument, all thermometers of that type must agree with each other in their readings" (Chang 2001, 276).

Before Regnault's experiments, it was agreed that mercury thermometers were comparable. Regnault discovered that this was not true: the readings of mercury thermometers made with different types of glass, or even the same type of glass which had undergone different thermal treatments, could not be made to agree with each other. The failure of comparability due to the behavior of glass was not avoidable by specifying a certain type of glass as the standard glass. "To do so, one would have needed to specify and control the exact chemical composition of the glass, the process of manufacture, and the method of blowing the thermometer bulb" (Chang 2001, 278). But using gas instead of mercury seemed to be the answer, for the thermal expansion of gas was known to be so great that the expansion of the glass envelope was thereby made negligible. Restricting his attention to thermometers containing air, Regnault found that those made with different densities were quite comparable with each other, certainly better than the mercury thermometers. This result was also summarized by Mach, when discussing the experiments of Dulong and Petit:

One of the greatest advantages which a gas offers is its large expansion, and the resultantly great sensitivity of the thermometer. Also, because of this great expansion, the disturbing influence of the varying material of the vessels passes into the background. . . . [G]as expands 146 times as much as glass. The expansion has therefore only a small influence upon the apparent expansion of the gas, and its change with different sorts [of glass] is of negligible influence. . . .
The choice of material for the vessel, that is the individuality of the thermometer, can only disturb this relation insignificantly: thermometers become comparable to a high degree. (Mach [1896] 1966, 188)

If we want to apply the strategy of comparability in social science, we face the problem that we often cannot create a uniform environment in which to compare the different instruments, for instance in a laboratory, as Regnault did. But when we take a closer look at Regnault's method, we see that the essence of comparability is that it allows one to find an accurate measuring instrument, especially when one cannot control all circumstances. As Chang pointed out:

The requirement of comparability was not new with Regnault. It had been widely considered a basic requirement for reliability in thermometry for a long time. The term is more easily understood if we go back to its origin, namely when thermometers were notoriously unstandardized so the readings of different types of thermometers could not be meaningfully compared with one another. (Chang 2001, 276)

The readings of the thermometers were not comparable to each other because the materials from which the instruments were built were not of the same quality—there were no standards. This quality depended on, for example, the kind of glass or the density of the gas that was used for fabricating the instrument, but also on the craftsmanship of the instrument-maker, circumstances that were not controlled in a laboratory but were at the same time part of the setup in which the measurement took place. The strategy of comparability is to find a measuring instrument whose readings are least influenced by, or most independent of, the quality of the materials, or in more general terms, the circumstances one cannot control. In measurement, even in a laboratory, there are always circumstances one cannot control.
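How an instrument's readings can be made nearly independent of uncontrolled circumstances can be sketched with a first-order sensitivity calculation. The sensitivity numbers below are invented, loosely echoing Mach's remark that gas expands 146 times as much as glass; they are not measurements from the paper.

```python
# Sketch: to first order, a change in the reading y of an instrument
# y = F(x, OC) decomposes into a part driven by the measurand x and a part
# driven by the uncontrolled "other circumstances" OC. An accurate
# instrument keeps the second part negligible (ceteris neglectis).
# All sensitivities here are hypothetical.

def reading_change(dF_dx, dF_dOC, dx, dOC):
    """First-order change in y for changes dx in the measurand and dOC in
    the uncontrolled circumstances."""
    return dF_dx * dx + dF_dOC * dOC

# Two hypothetical thermometers with equal sensitivity to temperature x but
# different sensitivity to the circumstances OC (e.g. the glass envelope).
mercury = {"dF_dx": 1.0, "dF_dOC": 0.5}
gas = {"dF_dx": 1.0, "dF_dOC": 1.0 / 146.0}

# Hold the measurand fixed and let only the circumstances drift.
dx, dOC = 0.0, 1.0
spurious_mercury = reading_change(mercury["dF_dx"], mercury["dF_dOC"], dx, dOC)
spurious_gas = reading_change(gas["dF_dx"], gas["dF_dOC"], dx, dOC)

# The instrument closer to the ceteris neglectis condition registers almost
# nothing when only OC moves, so its readings stay comparable.
print(spurious_mercury, spurious_gas)
```

In this sketch the circumstances are not controlled at all; the accuracy comes entirely from the ratio of the two sensitivities, which is the point of the comparability strategy.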
A measuring instrument is accurate when it is designed, fabricated, and used in such a way that the influences of all these uncontrollable circumstances are negligible. For example, a gas thermometer is more accurate than a mercury thermometer, because the expansion of glass is negligible compared with the expansion of gas. Thus, the empirical relation between the expansion of the gas column and temperature is (more or less) invariant, and influenced only negligibly by other circumstances. In other words, a measuring device should be constructed and used such that it fulfills the ceteris neglectis condition. This latter condition can be clarified by the following equation:

Δy = ΔF(x, OC) = ∂F/∂x · Δx + ∂F/∂OC · ΔOC.

Suppose we care about the correlative relation between a property, x, to be measured and the associated quantity, y. The instrument should be constructed and used such that changes in y are sensitive to changes in x and at the same time insensitive to changes in the other circumstances, OC. This condition implies requirements on ∂F/∂x and ∂F/∂OC, namely that ∂F/∂OC must be negligible compared to ∂F/∂x, and, of course, that ∂F/∂x is not affected by changes in x: Δx has no influence on ∂F/∂x. If we can construct the instrument based on the ceteris neglectis condition, we do not have to worry about the extent to which the other circumstances are changing. They do not have to be controlled, as is assumed in the conventional ceteris paribus (ΔOC = 0) and ceteris absentibus (OC = 0) conditions.

The equation above clarifies the problem of comparability: the accuracy of the thermometer was dealt with by searching for the most adequate filling. This was done by selecting a filling y such that ∂F/∂OC is negligible compared to ∂F/∂x.

4. Conclusion. A measurement formula must be a representation of a lawful relationship. According to Cartwright, for lawful relationships we need stable environments—nomological machines.
So in her account, measuring instruments can fulfill their measurement task only when they are nomological machines. To build them we must be able to control the circumstances, which is possible (though always only to a certain extent) in a laboratory but highly problematic in the open air. However, invariant relationships are not always the result of ceteris paribus environments but could also occur because the influence of the environment is negligible; in other words, invariant relationships could also be ceteris neglectis regularities. With an instrument fulfilling this ceteris neglectis condition one is able to achieve accurate measurements outside the laboratory.

REFERENCES

Cartwright, Nancy (1999), The Dappled World: A Study of the Boundaries of Science. Cambridge: Cambridge University Press.
Chang, Hasok (2001), "Spirit, Air, and Quicksilver: The Search for the 'Real' Scale of Temperature", Historical Studies in the Physical and Biological Sciences 31: 249–284.
Ellis, Brian (1968), Basic Concepts of Measurement. Cambridge: Cambridge University Press.
Finkelstein, Ludwik (1975), "Fundamental Concepts of Measurement: Definition and Scales", Measurement and Control 8: 105–110.
Heidelberger, Michael (1993), "Fechner's Impact for Measurement Theory", Behavioral and Brain Sciences 16: 146–148.
——— (1994a), "Alternative Interpretationen der Repräsentationstheorie der Messung", in Georg Meggle and Ulla Wessels (eds.), Analyomen 1: Proceedings of the 1st Conference "Perspectives in Analytical Philosophy". Berlin and New York: Walter de Gruyter, 310–323.
——— (1994b), "Three Strands in the History of the Representational Theory of Measurement". Working paper. Berlin: Humboldt University.
Helmholtz, H. von (1887), "Zählen und Messen, erkenntnistheoretisch betrachtet", in Philosophische Aufsätze Eduard Zeller gewidmet. Leipzig: Fuess.
International Vocabulary of Basic and General Terms in Metrology (1993). Geneva: International Organization for Standardization.
Mach, Ernst ([1896] 1966), "Critique of the Concept of Temperature", in Brian Ellis, Basic Concepts of Measurement. Translated by M. J. Scott-Taggart and B. Ellis. Cambridge: Cambridge University Press, 183–196.
Maxwell, James Clerk ([1855] 1965), "On Faraday's Lines of Force", in W. D. Niven (ed.), The Scientific Papers of James Clerk Maxwell, vol. I. New York: Dover, 155–229.
Michell, Joel (1993), "The Origins of the Representational Theory of Measurement: Helmholtz, Hölder, and Russell", Studies in History and Philosophy of Science 24: 185–206.
Savage, C. Wade, and Philip Ehrlich (1992), "A Brief Introduction to Measurement Theory and to the Essays", in C. W. Savage and P. Ehrlich (eds.), Philosophical and Foundational Issues in Measurement Theory. Hillsdale, NJ: Lawrence Erlbaum Associates, 1–14.
Stevens, S. S. (1959), "Measurement, Psychophysics, and Utility", in C. West Churchman and Philburn Ratoosh (eds.), Measurement: Definitions and Theories. New York: Wiley, 18–63.
Woodward, James (1989), "Data and Phenomena", Synthese 79: 393–472.