Objectivity, Information, and Maxwell's Demon*

Steven Weinstein

* Thanks to Jos Uffink and Janneke van Lith-van Dis for helpful discussions.

Abstract

This paper examines some common measures of complexity, structure, and information, with an eye toward understanding the extent to which complexity or information content may be regarded as objective properties of individual objects. A form of contextual objectivity is proposed which renders the measures objective, and which largely resolves the puzzle of Maxwell's Demon.

An n number of possible languages use the same vocabulary; in some of them, the symbol library allows the correct definition a ubiquitous and lasting system of hexagonal galleries, but library is bread or pyramid or anything else, and these seven words which define it have another value. You who read me, are you sure of understanding my language? (Borges 1964, 57-58)

1 Introduction

In The Library of Babel, the Argentinian writer Jorge Luis Borges portrays a vast library filled with books having an identical format: four hundred ten pages, forty lines per page, and eighty characters per line.

The Library is total and ... its shelves register all the possible combinations of the twenty-odd orthographical symbols (a number which, though extremely vast, is not infinite): Everything: the minutely detailed history of the future, the archangels' autobiographies, the faithful catalogs of the Library, thousands and thousands of false catalogs, the demonstration of the fallacy of those catalogs, the demonstration of the fallacy of the true catalog, the Gnostic gospel of Basilides, the commentary on that gospel, the commentary on the commentary on that gospel, the true story of your death, the translation of every book in all languages, the interpolations of every book in all books. (Borges 1964, 52)

The very idea of such a collection is mind-boggling. The library clearly contains great wisdom, but it is hidden amongst sophistry, lies, and above all, gibberish. Indeed, most of the volumes will contain vast stretches of meaningless prose; meaningless, anyway, in any given language. One might wonder what portion of the roughly 2 x 10^1,800,000 books (assuming 25 orthographic symbols) are truly random.[1] This raises the question: what does it mean to say that a sequence of letters is random?

[1] Note that there are "only" around 10^79 atoms in the known universe!

Were Borges's library constructed from an alphabet of numerals, rather than letters, a typical page from a typical book might well look something like this:

Figure 1

This is an excerpt from A Million Random Digits, by The Rand Corporation (Rand Corporation 1955).

Is there really a fact of the matter as to whether a given sequence is random? Is randomness objective? More generally, can we quantify the complexity of a string of symbols, or a physical object?

2 Shannon information

The information theory of Claude Shannon (1948) provides a method for quantifying the amount of information produced by an information source. For a source emitting strings made up of an alphabet of n discrete symbols, the associated Shannon information is defined to be
    H = -\sum_{i=1}^{n} p_i \log p_i    (1)

where p_i is the probability of occurrence of the i'th symbol, and where the base of the logarithm is regarded as an arbitrary choice of units.[2] Sometimes called the entropy (because the expression for H is formally identical to the expression for the Gibbs entropy of statistical mechanics), this quantity is rightly said to represent the "surprisal" of the source. That is, it tells us how much new information we obtain when we receive an additional character.

[2] Here we are assuming that the probabilities are independent; that is, the probability of a given symbol does not depend on what symbol or symbols preceded it.

Thus if our alphabet has two symbols, X and Y, and the source is such that X and Y are equally probable, we find that the information associated with the source is H = -((1/2) log_2 (1/2) + (1/2) log_2 (1/2)) = 1. On the other hand, if the probability of X is 3/4 and the probability of Y is 1/4, then we find that H = -((3/4) log_2 (3/4) + (1/4) log_2 (1/4)) ≈ 0.81. The fact that the information content of the latter source is lower corresponds to the fact that there is less uncertainty in the value of the next symbol: it is more likely to be X than Y. The information is of course minimized if the probability is maximally concentrated on a single symbol; it is maximized if all symbols are equally likely. To the extent that one is inclined to regard a source with typical string XXXXX as "more ordered" than a source in which a typical string has an equal number of Xs and Ys, low information content can be said to correspond to a high degree of order or structure.
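The two example values above are easy to check numerically. The following is a minimal sketch (my own illustration, not part of the paper; the function name shannon_information is mine) that evaluates equation (1) in base 2 for a given list of symbol probabilities:

```python
import math

def shannon_information(probabilities, base=2):
    """Shannon information H of a source, per equation (1).

    `probabilities` is the distribution over the source's alphabet;
    the base of the logarithm fixes the units (base 2 gives bits).
    """
    assert abs(sum(probabilities) - 1.0) < 1e-9, "probabilities must sum to 1"
    # Terms with p = 0 contribute nothing, since p log p -> 0 as p -> 0.
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

print(shannon_information([0.5, 0.5]))    # equally probable X, Y: 1.0 bit
print(shannon_information([0.75, 0.25]))  # biased source: ~0.811 bits
```

As expected, the value is largest for the uniform distribution and shrinks as the probability concentrates on a single symbol.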
As noted, however, the Shannon information H quantifies the information associated with a source, rather than with any particular string emitted by that source. Thus there is no way to assign a distinctive information content to a particular string, such as XXXXX. Information content is a property of sources, or more abstractly, of statistical ensembles which represent the probabilistically weighted possible outputs of the source. (The same goes for the statistical-mechanical entropy of Gibbs.) Thus we can assign a Shannon information content to the Library of Babel, but we can't assign a distinctive information content to each of the individual books.

3 Algorithmic complexity and objectivity

The algorithmic complexity theory of Kolmogorov (1965), Solomonoff (1964) and Chaitin (1966) is an attempt to characterize the complexity of individual strings. In particular, it is claimed by some of its advocates and adherents to provide an objective characterization of randomness, where randomness is identified with maximal complexity. For example, Cover and Thomas (1991) enthuse,

This definition of complexity turns out to be universal, that is, computer independent, and is of fundamental importance. (p. 3)

and we find Dennett (1991) following suit, claiming that

A pattern exists in some data - is real - if there is a description of the data that is more efficient than the bit map, whether or not anyone can concoct it. (p. 34)

Dennett's use of "real" here, and his emphasis on "there is", suggest that he, too, has taken on board the idea that algorithmic complexity simpliciter is an objective property. We shall see that this is not the case.

The idea behind algorithmic complexity is simple: the complexity of a string of symbols is the length of the shortest program on a universal Turing machine (essentially a general-purpose computer) that will generate the string. Thus a string of 10^6 ones will generally have a very low algorithmic complexity insofar as one can generate it by a program which says, in essence, "print 10^6 ones and then stop." On the other hand, a string X_1 X_2 ... X_n is regarded as random (maximally complex) if the only way to generate it is with a program of the form "print 'X_1 X_2 ... X_n' and stop." In such a case, the length of the program is (for long strings) roughly the length of the string itself.

But this is all a bit quick, for the minimum length of the program required to generate a string depends on which universal Turing machine is available. In fact, for any finite string at all, one can always design a machine that will generate the string with a program length of 1! This is of course done by effectively building the description of the string into the machine table (the hardware). Thus the complexity of a particular string is only unambiguously defined with respect to a particular machine.

Now it is often pointed out in discussions of this limitation that the complexity of a given string S on one machine U_1 can vary by no more than a constant from the complexity on another machine U_2. Formally,

    C_{U_2}(S) \le C_{U_1}(S) + K(U_1, U_2)    (2)

where K(U_1, U_2) is a constant that encodes the length of the shortest program that will emulate U_1 on U_2 (so that one may run the U_1 program on U_2). If this were an equality, rather than an inequality, we would be in business, because this would mean that both machines would rank all strings of a given length the same in terms of their algorithmic complexity, differing only by the overall constant K. Indeed, this is how the expression above is sometimes (mis)understood. For example, Caves (1990) writes,

Algorithmic information is defined up to an additive constant, which depends on the choice of universal computer. (p. 92)

Were the above inequality an equality, this would be correct. But it isn't an equality, and so the claim isn't correct. Sometimes one sees a similar but more qualified claim along these lines:

The constant [K] in the theorem may be very large. For example, [U_1] may be a large computer with a large number of functions built into the system. The computer [U_2] can be a simple microprocessor. The simulation program will contain the details of the implementation of all these functions, in fact, all the software available on the large computer. The crucial point is that the length of this simulation program is independent of the length of x, the string to be compressed. For sufficiently long x, the length of this simulation program can be neglected, and we can discuss Kolmogorov complexity without talking about the constants. (Cover & Thomas 1991, 148-149)

This is almost, but not quite, correct. Given two different machines, most suitably long strings will not differ significantly in their complexity, though this is only because most strings are maximally complex.[3] But this is not true of all suitably long strings, for one might have an extremely long string which has a complexity of 1 on machine U_1 and a complexity of 1 + K on U_2. Furthermore, there will always be an infinite number of other machines which will assign an absolutely minimal complexity to any given string, no matter what the length.[4] The purportedly objective characterization of the complexity/randomness of a single string of symbols given by algorithmic information theory turns out to be objective only insofar as one specifies a Turing machine.

[3] See theorem 7.2.4 in Cover & Thomas (1991).

[4] To construct such a machine, simply modify an existing machine by adding an internal subroutine (a consecutively invoked sequence of internal states) which erases the tape and writes out the desired sequence if the initial string is a 1 followed by a blank.
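The machine-relativity just described can be illustrated in miniature by replacing Turing machines with different compression schemes, each of which supplies its own description length for the same string. The sketch below is my own illustration, not a computation of Kolmogorov complexity proper: zlib and bz2 stand in for two general-purpose machines, and the hypothetical custom_machine plays the role described in footnote 4, a machine with the target string wired into its "hardware."

```python
import bz2
import zlib

# One and the same string, "measured" by three different description systems.
target = b"01" * 50_000  # a highly regular string of 100,000 symbols

# Two general-purpose schemes: each reports the length of its own best description.
len_zlib = len(zlib.compress(target, level=9))
len_bz2 = len(bz2.compress(target, compresslevel=9))

# A machine with this very string built in: the one-byte program b"\x01" suffices,
# though the machine is useless for shortening any other string.
def custom_machine(program: bytes) -> bytes:
    if program == b"\x01":
        return b"01" * 50_000
    return program  # otherwise just echo the program verbatim

assert custom_machine(b"\x01") == target
len_custom = 1

print(len_zlib, len_bz2, len_custom)  # three different "complexities" for one string
```

The three numbers disagree, and the disagreement is not a fixed offset that washes out across all strings, which is just the point made above about inequality (2).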
Algorithmic complexity simpliciter is no more an objective property of a string than velocity is an objective property of a physical object. But "objective" has a variety of meanings (Lloyd 1995), and so it behooves me to spell out what I have in mind.

When I call C_{U_1}, the algorithmic complexity relative to a particular machine U_1, objective, I mean that it is unambiguous. It is thus objective insofar as two reasonable people could not differ on its value. Without specifying the machine, the context, a Mac user and a PC user might quite reasonably differ! What starts out as an ambiguous property, whose value depends on a certain context (the particular machine), is rendered objective by explicitly including the context and thereby removing the ambiguity.

The concept of algorithmic complexity is similar to the concept of velocity. Two people may reasonably differ on the velocity of an object, depending on their state of motion, or on what they presume to be the appropriate reference frame. Velocity is thus not an objective property of an object, insofar as its ambiguity (velocity relative to what?) allows reasonable people to differ over its value. But relative velocity, the velocity of an object relative to some specific frame of reference, is of course objective, because the ambiguity has been removed, the context specified.

4 Thermodynamic entropy

The concept of entropy arose in the context of thermodynamics, which is the study of physical systems having temperatures (no surprise here!), and thus well-defined equilibrium states. The state of a system in thermodynamics is given by specifying the temperature T and one or more state variables X, Y, ..., which are understood to be stable in time, and thus "in equilibrium." These variables typically come in pairs, and satisfy one or more equations of state (hence the name "state variable"). Thus for a so-called "ideal gas", we have the equation of state PV = NRT, where P and V are pressure and volume, N is the number of moles of gas, R is the gas constant, and T, of course, is the temperature. Neither N nor R is a state variable (R is not even a variable), but P and V are, as they correspond to quantities which can be manipulated.

The change in entropy of a thermodynamic system is defined by \Delta S := \int dQ/T (where the integral is taken over a reversible path in the space of state variables). The entropy itself is expressible (up to a constant) as a function of the state variables. For example, the entropy S of N moles of an ideal gas is given by

    S = N \log(V T^{3/2}) - N \log N.    (3)

Thus if one holds the volume of the N moles fixed and raises the temperature, the entropy increases. Or, if one maintains the temperature while allowing the volume to increase, the entropy increases.
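For concreteness, these two claims can be read directly off equation (3); the additive constant cancels in any difference, and the particular doubling of the volume in the second case is my own illustrative choice.

```latex
% (i) Volume held fixed, temperature raised from T_1 to T_2 > T_1:
\Delta S \;=\; N \log\!\bigl(V T_2^{3/2}\bigr) - N \log\!\bigl(V T_1^{3/2}\bigr)
         \;=\; \tfrac{3}{2}\, N \log\frac{T_2}{T_1} \;>\; 0 .
% (ii) Temperature held fixed, volume allowed to grow from V to 2V:
\Delta S \;=\; N \log\!\bigl(2V\, T^{3/2}\bigr) - N \log\!\bigl(V T^{3/2}\bigr)
         \;=\; N \log 2 \;>\; 0 .
```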
Changes in entropy are subject to the Second Law of thermodynamics. One version of this law reads[5]:

The entropy S of an isolated system cannot decrease.

[5] See Uffink (2001) for an excellent discussion of the surprisingly wide variety of formulations of the Second Law.

"Isolated system" here means a system which receives no energy from the environment. According to the Second Law, the only way to reduce the entropy of a thermodynamic system is to do work on it (for example, by pushing on a piston) or to drain heat from it (cool it).

Now suppose the system is used to do work. For example, suppose our system is compressed air in a cylinder, which is allowed to push on a piston and thereby lift a weight. The work done corresponds to the increase in potential energy of the weight. It follows from the First and Second Laws that the work done W is related to the change in entropy \Delta S by

    W = T \Delta S.    (4)

Thus work done by an isolated system means an increase in entropy. Conversely, an isolated system whose entropy is constant can do no work.

While changes in entropy tell us something about how much work we can extract from a system, entropy itself is often said to characterize the amount of disorder in a system. Thus a system whose entropy is maximal is often understood to be maximally disordered or randomized. Having reflected on the objectivity and contextuality of algorithmic complexity, the question naturally arises as to whether these are features of entropy as well.

Here is a reason to think not. Since the possible changes in entropy quantify our ability to extract work from a system, there should be an objective fact of the matter as to the difference in entropy between two states of the system. In other words, entropy should be objective, in the sense of unambiguous, up to an additive constant. The idea here is that whether or not one can extract work from a system by, for example, removing a partition, must be an objective property of the system.

Consider a system consisting of a box filled with gas, with a partition down the middle.[6] The temperature and pressure are identical on both sides of the partition, and the entropy should, as far as we know, be a function S(T, V). Now remove the partition. Is there a change in the entropy? It would seem not, and the equation above for the entropy of an ideal gas is in line with this. Removing the partition makes no real difference, and in particular there is no loss of ability to do work when the two sides mix. There was never any such ability to start with, as the initial and final states are both states of the same, maximal entropy.

[6] In what follows I draw heavily on the exposition of Jaynes (1992).

Now suppose someone else is watching, and that this someone else has special glasses. To this person, the initial state is one in which the gas on one side of the partition looks red, and that on the other side looks green. This person will use a different entropy function S(T, V, C) which includes "color", since this person can clearly see that there are two different gases. This person will certainly say that the entropy increases after the partition is removed.

At this point, it seems ambiguous as to whether the system has undergone a change in entropy following the removal of the partition. But one might think that the individual with more complete knowledge, the one aware of the red-green distinction, has the true entropy. Further evidence for this might be thought to come from the fact that someone who is aware of the distinction could use this information to do work by inserting a piston made of material which is transparent to the green gas and opaque to the red gas. Such a piston would be driven by the pressure of the red gas, and this could be harnessed to lift a weight (for example). So it would seem that the true entropy is the entropy which takes in all of the physical quantities governing the behavior of the gas.
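To put a number on the color-sighted observer's advantage, here is a back-of-the-envelope calculation (my own illustration, in conventional units with the gas constant R restored, and assuming n moles of gas on each side of the partition):

```latex
% Entropy of mixing assigned by the observer who can tell the gases apart
% (n moles of "red" and n moles of "green", each initially occupying volume V
% at temperature T, each ending up spread through the full volume 2V):
\Delta S_{\mathrm{mix}} \;=\; nR\ln\frac{2V}{V} \;+\; nR\ln\frac{2V}{V}
                        \;=\; 2\,nR\ln 2 .
% Maximum work extractable at temperature T, in the spirit of equation (4):
% the piston opaque only to the red gas is pushed through the box by the red
% gas alone, yielding nRT\ln 2; a second piston, opaque only to the green gas,
% yields the same again, so that
W \;=\; T\,\Delta S_{\mathrm{mix}} \;=\; 2\,nRT\ln 2 .
% For the observer without the glasses, \Delta S = 0 and no work is available.
```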
However compelling this idea may be, it is incorrect. There is no "true" entropy. Rather, there are, as Grad (1961) and Jaynes (1992) have argued, many different entropies.

Consider N moles of some liquid: it might be characterized simply by its temperature, pressure, and volume. But suppose that the molecules of the liquid have an electric dipole moment, as is the case with nitrobenzene.[7] In this case, one can change the polarization M of the liquid by manipulating an electric field E (this is how a Kerr cell works). Whereas before we had one equation of state f(T, P, V) = 0, there will now be two equations of state f(T, P, V, E) = 0 and g(T, M, E, P) = 0. The additional state variables mean that the entropy will now be a function S(T, V, M), rather than S(T, V), as before. Thus a sample of nitrobenzene which appears to be at maximum entropy with respect to V may or may not be at maximum entropy with respect to V and M. For instance, if there is no electric field, then the molecules will be unpolarized. If there is a (uniform) electric field, then they will tend to line up in the direction of the field. This configuration will have a lower entropy than the unpolarized liquid.

[7] This example derives from Jaynes (1965).

At this point, one might rightly point out that this is just the red-green example in a more realistic guise. One might further insist that this description, the one which invokes the polarization, yields the true entropy. But the problem is that one can introduce further state variables as well, not only by discovering new sorts of properties, but by manipulating the same properties in finer detail (for example, by applying different electric fields to different regions of the sample). This process is limited only by the fact that, at extremely small scales, the system ceases to display equilibrium behavior. We do not expect the polarization of a single molecule to have a stable value, as that molecule is vibrating due to thermal fluctuations.

In conclusion, then, the entropy depends not only on the substance, but on the description of the substance, and there is no most-fundamental description. Which description is appropriate depends on which variables one is able to control, or is interested in controlling. As is the case with algorithmic complexity and relative velocity, entropy can be contextualized, and entropy relative to a description can be considered an objective property of a physical system. But there is no most-fundamental description, just as there is no most-fundamental Turing machine or inertial reference frame.

5 Maxwell's demon and the objectivity of the Second Law

Let us now return to the Second Law. Given that entropy is only objective when contextualized, it would seem that the Second Law is in danger of losing its objectivity unless it is formulated in a way that incorporates this feature. This suggests the following reformulation:

The entropy S(T, X, Y, ...) of an isolated system cannot be reduced by manipulation of the state variables (X, Y, ...).

This version of the Second Law (akin to that of Jaynes (1992)) explicitly acknowledges the contextual aspect of entropy, its dependence on the thermodynamic description, and furthermore gives this physical content by making the connection between the description and the possible physical manipulations of the system. Perhaps the most important thing to note about this formulation of the Second Law is that it is immune to the threat of Maxwell's Demon (see figure 2).

Maxwell's demon is an imaginary creature who is able to take a box of gas at equilibrium and operate a trap door in a partition in the box, permitting (in one of the more popular versions) the fast molecules to go into one side and the slow ones to go into the other. This results in a decrease in entropy, in the sense that the pressure in one side goes up, and in fact one can use this entropy difference to do work. Coupled to a heat bath (an infinite source of heat to maintain the box of gas at constant temperature), this cycle can go on forever, creating what is called a "perpetual motion machine of the second kind."

Figure 2
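The demon's sorting protocol is easy to caricature in a toy simulation. The sketch below is my own illustration, not part of the paper: the molecule count, speed distribution, and dividing line between "fast" and "slow" are arbitrary choices, and mean squared speed serves as a crude stand-in for temperature. Molecules arrive at the trap door at random, but the demon lets only fast ones pass to the right and only slow ones pass to the left.

```python
import random

random.seed(0)
N = 2000
THRESHOLD = 1.0  # arbitrary dividing line between "slow" and "fast"

# Each molecule is [speed, side]; both sides start out statistically identical.
molecules = [[abs(random.gauss(0.0, 1.0)), random.choice("LR")] for _ in range(N)]

for _ in range(50_000):
    mol = random.choice(molecules)  # a molecule happens to arrive at the trap door
    speed, side = mol
    # The demon's rule: fast molecules may pass left-to-right,
    # slow molecules may pass right-to-left; everything else is blocked.
    if side == "L" and speed > THRESHOLD:
        mol[1] = "R"
    elif side == "R" and speed <= THRESHOLD:
        mol[1] = "L"

def mean_squared_speed(side):
    speeds = [s for s, where in molecules if where == side]
    return sum(s * s for s in speeds) / len(speeds)

# After many door operations the right side is "hot" and the left side "cold",
# even though no work has been done on the gas in this idealized accounting.
print("left: ", mean_squared_speed("L"))
print("right:", mean_squared_speed("R"))
```

Of course, nothing in this caricature settles the question the exorcism literature worries about, namely how to account for the demon itself; it merely illustrates the sorting described above.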
Much ink has been spilled attempting to "exorcise" the demon (see Leff & Rex 1990), as it has seemed to many (though apparently not to Maxwell) to be a threat to the universal validity of the Second Law.[8] Most exorcisms work on the assumption that the demon must do work in order to reduce the entropy, thereby ensuring no net gain of work, and thereby precluding a perpetual motion machine. These exorcisms are quite problematic (Earman & Norton 1998, 1999), but from the perspective of this paper, they are completely beside the point. If the Second Law is formulated properly, the demon doesn't violate the Second Law. The demon is operating on a scale at which the Second Law does not apply, for he is manipulating the system on a microscopic level. The Second Law, properly understood, is a law of thermodynamics, and it should come as no surprise that its domain extends only to thermodynamic variables. The demon interacts with the gas on a scale at which the properties it is manipulating are constantly fluctuating.

[8] The demon seems to have been proposed by Maxwell (1871) to make a point closely related to the point I am making in this paper. However, it was subsequently thought by many to pose a threat to the Second Law.

To those who resist this conclusion, I would simply point out that we are not surprised that manipulations described by quantum mechanics give rise to phenomena which could not be achieved by classical manipulations. In fact, this is the primary appeal of the field of quantum computation: manipulating systems on a sufficiently small scale may allow us to compute things in polynomial time that could only be computed in exponential time on a machine whose behavior is governed by classical mechanics (Nielsen & Chuang 2000).

To be sure, there remains a compelling question at the end of the day, which is, "Can one build a perpetual motion machine of the second kind?" However, if one could, it would not violate a properly formulated Second Law.

6 Conclusion

We have taken a brief tour through some measures of complexity, structure, and information. I have shown how two important measures, algorithmic complexity and thermodynamic entropy, can be thought of as objective properties of, respectively, strings of symbols and thermodynamic systems, only insofar as they are regarded as contextual. In one case, we contextualize to a Turing machine; in the other case, to a description. Once this is taken into account, the apparent paradox of Maxwell's demon disappears.

7 Acknowledgements

Figure 2 is taken from Order and Chaos, by Stanley W. Angrist and Loren G. Hepler, (c) 1967 by Basic Books, New York. Permission pending.

References

Borges, J. (1964), "The Library of Babel", in D. Yates & J. Irby, eds, Labyrinths: Selected Stories and Other Writings, New Directions, New York, pp. 51-58.
Caves, C. (1990), "Entropy and information: how much information is needed to assign a probability", in W. Zurek, ed., Complexity, Entropy and the Physics of Information, SFI Studies in the Sciences of Complexity, Vol. VIII, Addison-Wesley, Reading, pp. 91-113.

Chaitin, G. (1966), "On the length of programs for computing finite binary sequences", Journal of the ACM 13, 547-569.

Cover, T. & Thomas, J. (1991), Elements of Information Theory, John Wiley and Sons, New York.

Dennett, D. (1991), "Real patterns", Journal of Philosophy 88, 27-51.

Earman, J. & Norton, J. (1998), "Exorcist XIV: The wrath of Maxwell's demon, part I: from Maxwell to Szilard", Studies in the History and Philosophy of Modern Physics 29, 435-471.

Earman, J. & Norton, J. (1999), "Exorcist XIV: The wrath of Maxwell's demon, part II: from Szilard to Landauer and beyond", Studies in the History and Philosophy of Modern Physics 30, 1-40.

Grad, H. (1961), "The many faces of entropy", Communications on Pure and Applied Mathematics 14, 323-354.

Jaynes, E. (1965), Thermodynamics, unpublished, available at http://bayes.wustl.edu/etj/thermo.html.

Jaynes, E. (1992), "The Gibbs paradox", in Maximum-Entropy and Bayesian Methods, Kluwer, Dordrecht.

Kolmogorov, A. (1965), "Three approaches to the quantitative definition of information", Problems of Information Transmission 1, 4-7.

Leff, H. & Rex, A. (1990), Maxwell's Demon: Entropy, Information, Computing, Princeton University Press, Princeton, NJ.

Lloyd, E. (1995), "Objectivity and the double standard for feminist epistemologies", Synthese 104, 351-381.

Maxwell, J. (1871), Theory of Heat, Longmans, Green, and Co., London.

Nielsen, M. & Chuang, I. (2000), Quantum Computation and Quantum Information, Cambridge University Press, Cambridge.

Rand Corporation (1955), A Million Random Digits with 100,000 Normal Deviates, The Free Press, Glencoe, IL.

Shannon, C. (1948), "A mathematical theory of communication", Bell System Technical Journal 27, 379-423, 623-656.

Solomonoff, R. (1964), "A formal theory of inductive inference", Information and Control 7, 1-22, 224-254.

Uffink, J. (2001), "Bluff your way in the second law of thermodynamics", Studies in the History and Philosophy of Modern Physics 32, 305-394.