Philosophy of Science, 78 (December 2011) pp. 000–000. 0031-8248/2011/7805-0042$10.00
Copyright 2011 by The Philosophy of Science Association. All rights reserved.

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

1

A Lot of Data

Kent Johnson†

This article encourages the use of explicit methods in linguistics by attempting to
estimate the size of a linguistic data set. Such estimations are difficult because redundant
data can easily pad the data set. To address this, I offer some explicit operationalizations
of the data and their features. For linguistic data, negative associations do not indicate
true redundancy, and yet for many measures they can be mathematically impossible
to ignore. It is proven that this troublesome phenomenon has positive Lebesgue measure
and is monotonically increasing and that these two features hold robustly in four
different ways.

1. Introduction. Studying how evidence is produced and related to theory
is an important part of developing and maintaining a discipline. Fields such
as psychology, economics, biology, and chemistry have tenured appoint-
ments, conferences, societies, and journals (e.g., Psychometrica, Econome-
trica, Biometrika, Journal of Chemometrics) dedicated to the study and
improvement of their methods.

Attention to methods gets increasingly important as matters get more
complex and are less well developed. For example, the methods of some
parts of economics cry out for attention more so than those of some parts
of physics. This is not only because physics is an older discipline and has
had time for its methods to mature but also because economics often
deals with highly complicated phenomena laden with enormous uncer-
tainties. In such situations, little headway is made without using (and
studying, improving, etc.) explicit, typically quantitative, methods for re-
lating evidence to theory.

Whatever else theorizing in linguistics is, it is complex (even if human
language is built from just a few relatively simple structural items). Nearly
all currently active linguistic projects involve complicated, untamed, and

†To contact the author, please write to: Department of Logic and Philosophy of Science,
3151 SSPA, University of California, Irvine, Irvine, CA 92697-5100; e-mail: johnsonk@
uci.edu.

q1

mailto:http://www.lps.uci.edu/johnsonk/johnsonk@uci.edu/
mailto:http://www.lps.uci.edu/johnsonk/johnsonk@uci.edu/


2 KENT JOHNSON

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

uncharted regions of human language. Moreover, linguistics is a relatively
young discipline. Thus, linguistics would seem to be yet another field that
would benefit from the incorporation of explicit methods that themselves
can be studied, criticized, improved, and so forth. In this article I explore
this issue. In section 2, I consider a very basic issue: assessing the size of
a given linguistic data set. In section 3, I take an initial step toward
explicitly addressing this matter, by suggesting an operationalized char-
acterization of an expression type. In section 4, I consider how this op-
erationalization might help with our assessment. Structurally, the situation
is similar to familiar cases involving multivariate data sets, with one ex-
ception: the irrelevance of negative associations. This exception, I argue,
changes matters considerably. A theorem, proven in the appendix, shows
that these negative correlations cannot simply be ignored and that this
undesirable phenomenon is quite robust. I conclude in section 5.

The size of one’s data set is important; however, my broader goal is to
promote greater methodological research in linguistics. Currently, linguists
routinely assess large and diverse bodies of evidence almost entirely by
informal, holistic, “expert judgments.” Famously, however, in situations
vastly less complex than linguistics, expert judgments are much less re-
liable than they are typically assumed to be (e.g., Dawes 1979; Johnson
2009).

2. Target Issue: Individuating Linguistic Data. I turn now to the basic
matter of estimating the size of a linguistic data set. Such a notion is
employed, implicitly or explicitly, whenever one judges that a journal
article, research project, presentation, and so forth, used “a lot” of data,
“not enough” data, a “wide range” of “diverse” data, “more” data than
a rival hypothesis is founded on, and so on. Such comments are clearly
meant to summarize certain aspects of the evidence and to be part of the
overall normative judgment regarding the theory. If “a lot” of data were
used, how much was that? More than 15 data points? What were those
data points, anyway, and what makes any two of them (if there is more
than one) distinct from one another?

The amount of data used in the construction or confirmation of a theory
is an utterly fundamental matter across the empirical sciences, especially
in those areas where there is great uncertainty and complexity. I will largely
take it for granted that it is of similar importance to linguistics to have
some (explicit) means for estimating the size of a data set. In general, it
is hard to see how progress could be made toward an explicit linguistic
methodology if one cannot even say how large one’s sample is. (For
example, not knowing the size of one’s sample severely limits any further
analysis or assessment in most experimental designs.)

To begin, let us consider the general type of evidence that mainstream

q2


A LOT OF DATA 3

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

linguists typically employ in actual practice. Simplifying, we can assume
the data are linguistic expressions. They are not the psychological data
structures realized (under appropriate idealizations, etc.) inside speakers’
heads (e.g., Chomsky 1986, 25–26). Instead, they are expression types,
such as the type of which the physical inscription “the cat is on the mat”
is an instance. I will assume that such types can be unproblematically
individuated.

In asking how much data were employed by a given linguistic project,
a natural first thought would be to just count how many expression types
were offered in the project. For example, consider a linguist developing
a theory of control, illustrated by sentence 1a.

1a. Sue wants to win.
1b. Suei wants [PROi to win].

Control structures are noteworthy in that they contain a clause (here, to
win) that does not overtly contain a subject. However, in sentence 1a, the
subject of this lower clause can only be Sue. This and other such phe-
nomena have led linguists to posit a phonologically null but syntactically
and semantically active element, PRO, as the subject of the lower clause
(cf. 1b). PRO is “controlled” by Sue, thus mechanically and automatically
determining the correct interpretation of the sentence. Suppose that a
linguist develops a theory concerning a fragment of human language that
includes control structures, which focuses on the phenomenon of “partial
control,” illustrated in sentence 2a:

2a. The chair wanted to meet on Tuesday.
2b. Sue wanted to meet on Tuesday.
2c. The chair hoped to meet on Tuesday.

Sentence 2a is noteworthy in that its most natural interpretation is that
the chair wanted a group of people, only one of whom is the chair herself,
to meet on Tuesday. Thus, the chair only partially determines the subject
of the lower clause (i.e., the value of PRO; Landau 2000).

The central difficulty with determining how much evidence is used in
a theory is that new, redundant data are all too easy to generate. For
example, one gathers no new evidence for a theory by adding sentence
2b to a data set containing sentence 2a. Sentences 2a–2b are simply too
relevantly similar to count as distinct data points. Of course, sentences
2a–2b have different grammatical properties; for example:

3a. *Crazy old the chair wanted to meet on Tuesday.
3b. Crazy old Sue wanted to meet on Tuesday.

However, it is unlikely that such a difference would be relevant to a theory
of control.


4 KENT JOHNSON

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

What about sentence 2c? The extent to which this expression provides
the theory with something new may depend on the nature of the particular
theory. As sentences 4 and 5 suggest, sentences 2a and 2c are somewhat
different as regards the licensing of possible complement structures:

4a. The chair wanted the committee to meet on Tuesday.
4b. *The chair hoped the committee to meet on Tuesday.

5a. *The chair wanted the committee would meet on Tuesday.
5b. The chair hoped the committee would meet on Tuesday.

However, sentence 6 shows that they behave similarly in other respects,
so—depending on the details of the theory at hand—there may be some
amount of (relevant) redundancy present.

6a. It was wanted for the committee to meet on Tuesday.
6b. It was hoped for the committee to meet on Tuesday.

These examples show how a pair of expressions may exhibit some degree
of redundancy. Importantly, however, redundancy is a holistic affair, po-
tentially involving most or all of the data set. For example, if expressions
7a and 7b are already in the data set, then 7c adds no new information.
Similarly, depending on the nature of one’s project, expressions 7d and
7e may also be highly redundant with (7a–7b) collectively, although not
so much with either one individually.

7a. Sue crashed while PRO biking.
7b. Kim wants PRO to be recognized t.
7c. Sue crashed while PRO biking, and Kim wants PRO to be rec-

ognized t.
7d. Kim wants PRO to be recognized t while PRO biking.
7e. Kim went unnoticed while PRO wanting PRO to be recognized t.

In sum, an expression may exhibit some degree of redundancy with
respect to other elements of a data set. Any such redundancy will always
be relative to both a given data set and the particular theory at hand. I
will call this phenomenon the “problem of redundancy.” Since linguistic
data are used in the construction of a theory as well as its confirmation,
what properties of an expression are relevant to an assessment of redun-
dancy—for example, does it license direct objects with controlled com-
plements 4 or tensed clausal complements 5—may not be known a priori.
Instead, determining the relevant properties of expressions may be a mat-
ter of a “bootstrap” procedure (Glymour 1980) as the theory is developed
over time.

The problem of redundancy shows that determining whether one has
used “a lot of data” in the construction/confirmation of a linguistic theory

q3

q4


A LOT OF DATA 5

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

is a highly nontrivial matter. If redundancy is not addressed, then there
is no difference between any finite data set and a countably infinite one
(e.g., augment expression 2a–2b with DP wanted to meet on Tuesday, for
every DP). If there is no difference between finite and infinite data sets,
then all distinctions regarding the amount of data used collapse.

I suspect that some linguists would see the problem of redundancy as
not particularly serious. I also suspect that some linguists feel that as they
are constructing or evaluating a theory, they notice such correlations in
the behavior of the data and take this into account in an implicit and
intuitive manner. In the next two sections, I will attempt to render explicit
this purported practice and analyze it.

3. Operationalizing Theoretical Types of Expressions and Their Properties.
In linguistic theorizing, we want to relate the evidence of concrete ex-
pressions to the theoretical models that produce psychological expressions
(e.g., Chomsky 1986, 25–26). However, we have seen that redundancy
threatens to undermine, partially or completely, one of the most basic
features of a body of evidence, namely, its size. In this section, I outline
a strategy for operationally characterizing the relevant theoretical types
of expressions and the relevant properties of these types. Then, in section
4, I consider the problem of redundancy explicitly.

Two key factors motivate our operationalization. First, as sentences
2a–2b and 6 showed, the expressions used as evidence have a great deal
of structure, only some of which is relevant to a given project (since we
currently have only very partial theories of human language). More gen-
erally, a project on control is not likely to be concerned with the highly
detailed structures sketched in expressions 8a–8b. Instead, it is more likely
to focus on certain schematic aspects of structure that are hypothesized
to be those aspects relevant to control, similarly sketched in 8c:

8a. [ [ [ the chair]] [ wanted [ [ [ PRO [ to [′ ′ ′TP DP D T VP CP TP T VP
PRO [ meet [ on Tuesday]]]]]]]]]′V PP

8b. [ [ Sue] [ wanted [ [ [ PRO [ to [ PRO [′ ′ ′TP DP T VP CP TP T VP V
meet [ on Tuesday]]]]]]]]]PP

8c. [ DP [ Verb [ [ [ PRO [ to [ PRO′ ′TP T {F , . . . , F } VP CP TP T VP1 n
[ ]]]]]]]]′V

The structures in expression 8 are merely illustrative—different theories
would posit different structures. However, they show that part of ana-
lyzing linguistics is isolating those structural elements that are/are not
relevant to a given project. Because both 2a and 2b have the structure
given in expression 8, they count as the same type, and so adding 2b to
a data set that contains 2a should not increase the amount of evidence
considered.

q5

q6


6 KENT JOHNSON

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

TABLE 1. HYPOTHETICAL ORGANIZATION OF EXPRESSION TYPES.

Expression Type P1 P2 P3 . . . Pk

The chair wanted. . . 1 0 1 0
Susan wanted. . . 1 0 1 0
The chair hoped. . . 0 1 1 1
_ _ _ _ _
John tried. . . 0 0 0 1

Note.—Hypothetical explicit organization of nominally distinct expression types, with relevant prop-
erties. The first three properties correspond to expressions 4–6 above, namely, the ability to accept direct
object control, finite 0-complement, and for-IP, respectively. The final kth property concerns the ability
to have a complement of the form for-DP, as in *The chair wanted for a short meeting and The chair
hoped for a short meeting.

Of course, we do not have immediate access to the structures in ex-
pression 8; determining them is a big part of what linguistics is all about.
Here too, linguists often bootstrap into increasingly better theories: the
linguist uses current theory to hypothesize some relevant structure thought
to be shared by some expressions and thus individuates the data and
explores the results so as to arrive at a new (hopefully improved) theory.
This new theory is then part of the input that allows the hypothesis of
what the relevant types of expressions are that then allows for a new
individuation of the data, and so on.

I call theoretical items represented by expression 8c “(theoretical)
types.” The difficult task of determining what structure a theoretical type
contains belongs mainly to linguists. From the present perspective, how-
ever, we can assume that such concrete decisions for particular theories
have been made.

The second key fact is that we use expressions like those in sentences
4–6 to explore the nature of theoretical types. In that sense, sentences 4–
6 represent what I call “properties” of the types. For example, suppose
that the relevant expression types differentiate 2a and 2c—that is, the
different kinds of relevant structures that the theory posits are fine-grained
enough that one of them applies to 2a and a different one applies to 2c.
In such a case, sentence 4 tells us that the type represented with 2a allows
direct objects, but the one that 2c represents does not. Similarly, sentence
5 tells us that of the two, only the 2a type allows tensed clauses.

Eventually, in the simplest case, the linguist would posit n theoretical
types each examined from the perspective of k properties. The data can
then be represented by an matrix, where the th element containsn # k ij
a 1 if the ith expression type possesses the jth property, and a 0 otherwise
(or by some other coding system if greater discrimination is used). The
outcome of this stage of the analysis is represented by table 1: The ith
expression type can be operationally defined as the ith row of this matrix,
and the jth property can similarly be operationally defined as the jth
column. Thus, a theoretical type can also be (operationally) thought of

q7

q8


A LOT OF DATA 7

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

as an equivalence class of expressions all of whose members behave iden-
tically across the properties that the linguist has identified as relevant.
How reasonable these operationalizations are is largely an empirical mat-
ter, to be addressed mainly by linguists.

4. The Problem of Negative Correlations. In section 2, we saw that dis-
tinguishing various data points in a linguistic data set is neither trivial
nor an all-or-nothing affair. We need to recognize that there can be re-
dundancy in our data sets and that a nominally new addition to a data
set may not contribute all that much novel information. Moreover, we
saw that the redundancy of an expression may be spread across multiple
elements of the data set. We would like to somehow account for this
redundancy so as to estimate the true size of our data set.

Fortunately, the issue to which we have reduced our linguistic problem
is a familiar one. Intercorrelations in multivariate data sets are a bread-
and-butter issue for many fields, and there are numerous statistical tech-
niques for dealing with them. Because we now have explicit representa-
tions (from sec. 3) of the relevant aspects of the expressions under study,
we can make an assessment of redundancy by considering the amount of
“overlap” between the various pairs of theoretical types of expressions.
A natural first step would be to consider the correlations between the
various types, represented by the n row vectors described above. (In a
complete analysis, the k column vectors would be analyzed as well, as
there could easily be unwanted redundancy in the relevant properties of
the theoretical types. For simplicity’s sake, I ignore this matter here.) Such
correlations are often the inputs to various techniques for treating re-
dundancy (e.g., Jolliffe 2010).

Before turning to these techniques, however, one final empirical con-
sideration must be addressed. The relevant type of redundancy here is
only that which corresponds to “positive” correlation. That is, we only
care about the extent to which two (operationalized) expression types
exhibit the same behavior. In particular, the extent to which they are
negatively associated is the same, for the present estimation purposes, as
having no association at all. For example, across the right range of prop-
erties, pronouns and anaphors are highly negatively correlated. But this
high correlation does not suggest that they are redundant; rather, it shows
how importantly different they are.

Because negatively correlated theoretical types share no structural re-
dundancy, we might try to disregard them by treating the relevant the-
oretical types as independent and hence uncorrelated (i.e., having a cor-
relation of zero). At first, this idea seems simple and natural, and it does
work in two or three dimensions. Unfortunately, with more than three
expressions, there is no guarantee that this strategy will work: the resulting


8 KENT JOHNSON

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

set of correlations often cannot be simultaneously realized. A few remarks
may give a sense of why this is so.

When , there are possible correlations. Thus,nn 1 3 ( ) p n [(n � 1)/2] 1 n2
in fixing these correlations, we have equations of the formn( ) r(x ,i2

(where r is the correlation function), but only n (vectors of)x ) p rj ij
variables to work with. Thus, there is no guarantee that a solution to
these equations will always exist.

More can be said. Let , and for any set of vectors, letnp p ( ) n c p2
be the correlations between each pair of expression types,Ac , . . . , c S1 p

that is, each pair of k -dimensional vectors. (I assume some canonicalk
ordering of all sets of vectors and of the elements of .) The “nonnegativec
variant” of is the sequence that is just like except that the negativec c
correlations have been replaced with zeroes. Let us say that is bad if noc
set of n vectors could have the correlations of its nonnegative variant.
Finally, let is bad We then have the following robustness the-B p {c:c }
orem, proven in the appendix:

Theorem 1. (i) , and more generally, (ii) within , has pos-pB ( /0 � B
itive Lebesgue measure; (iii) is monotonically increasing in . Fur-B n
thermore, facts ii and iii: iv do not depend on any probability dis-
tributions, v do not depend on the operational characterization of
expression types given in section 3, vi hold for any choice of inner
product used as the measure of association (of possibly scaled data),
and vii have corresponding versions for other means of reducing
redundancy, such as singular value decomposition, that do not ob-
viously depend on pairwise associations.

The results just listed apply immediately to the position described at
the end of section 3. There I imagined a linguist saying that the problem
of redundancy can be dealt with by just noticing and keeping track of
the correlations between the data, and accounting for this accordingly, as
part of a holistic expert judgment. However, such a strategy simply will
not work for a very broad range of data sets and a similarly broad range
of measures of association. The reason why has nothing to do with a
linguist’s expertise; rather, it is a fact about what is mathematically pos-
sible.

5. Conclusion. Where does this leave us? We started off by isolating a
fundamental issue regarding the evaluation of linguistic theories, counting
data. Along the way we helped ourselves to many simplifications, finally
ending up with a task vastly simpler than what is routinely performed in
linguistic inference. Unfortunately, that simple task is often impossible.

The persistent adherent of informal, holistic expert judgment faces the


A LOT OF DATA 9

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

challenge of demonstrating the accuracy of this method. This may be
difficult, as tasks generally do not get easier as they get more complex,
and the simple task is not possible. Given the remarkable weakness of
such holistic expert judgments in simpler inferential situations, I am not
optimistic about these prospects. A more promising strategy, I submit, is
to view the present problem as encouragement to follow the other sciences,
and begin taking the difficult steps necessary to develop explicit linguistic
methods. A first step would be to find a way to estimate the size of a
data set, which requires some metric of association to factor out redun-
dancy. I believe common metrics like the correlation are inappropriate
for linguistics, which is good since we have just seen that they do not
work. But what appropriate metric(s) will work is only one of many
questions yet to be addressed.

Appendix: Proof of Theorem 1

Proof of i. Recall that our operationalized initial data were a set
of vectors in . We can rescale our data, settingkx , x R z p [(x �1 n ij ij

, where is the th element of , and and are the mean�¯ ¯x ) / ks ] x j x x si i ij i i i
and standard deviation of . Then the (Pearson) correlation coef-x i
ficient is given by

k
k¯ ¯� (x � x )(x � x )ih i jh jhp1Cov (x , x ) 1i jr p p p z z�ij ih jh
hp1s s k s si j i j (A1)

p (z , z ).i j

Thus, the correlation between and is also the usual inner productx xi j
between the standardizations and . Suppose, for example,z z n pi j

, and that the four vectors are related as in (A2a). The nonnegative4
variant of (A2a) is then given in (A2b):

a. r p �.4, r p .8, r p .1,12 13 14
r p .1, r p .8, r p .6.23 24 34

b. r p 0, r p .8, r p .1,12 13 14 (A2)
r p .1, r p .8, r p .6.23 24 34

Regardless of the nature of the original , we now show that no setxi
of four vectors can have the correlations in (A2b). Thez , . . . , z1 4
proofs below follow easily from some well-known results in matrix
analysis, which are covered in many standard textbooks (e.g., Horn
and Johnson 1985).

To begin, consider the correlation matrices for (A2a) and (A2b),


10 KENT JOHNSON

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

where the th entry of (A3a) corresponds to the correlation betweenij
and :x xi j

1 �.4 .8 .1 1 0 .8 .1
�.4 1 .1 .8 0 1 .1 .8

a. b. . (A3)
.8 .1 1 .6 .8 .1 1 .6( ) ( )
.1 .8 .6 1 .1 .8 .6 1

For , ; fix some unproblematic bijection between pn p 4 p p 6 f �
and the symmetric matrices whose diagonal elements are uni-n # n
formly 1 and whose off-diagonals are the p elements of the argument.
Thus, if and are the vectors corresponding to the (ordered) listsx y
in (A2a) and (A2b), then and . Let d be a functionf (x) p (a) f (y) p (b)
from matrices to matrices such that is exactly liken # n n # n d(M )
M, except that any negative entries in M have been replaced with
zeros. Thus, is then the “nonnegative variant” of M. Thus,d(M )

is a function from a set of correlations to the correlationd( f (x)) p (b)
matrix of its nonnegative variant.

By equation (A1), the th entry of equation (A3a) is also the innerij
product of and . Suppose for a moment that equation (A2b) isz zi j
a possible set of correlations. Then equation (A3b) is a Gram matrix,
that is, a matrix whose th entry is the inner product between twoij
vectors, for some fixed set of n vectors. A Gram matrix G is positive
semidefinite (PSD), i.e., a symmetric matrix G such that forn # n
all , . Importantly, all n eigenvalues of a PSD matrixn Tx � � x Gx ≥ 0
are real and nonnegative. The converses of these implications hold
as well, meaning that G is a Gram matrix if and only if (iff) it has
no negative eigenvalues. The smallest eigenvalues of (A3a) and (A3b)
are .039 and �0.062, respectively. Thus, there exist four vectors

such that (A2a) is their correlation matrix; no such vectorsz , . . . , z1 4
have (A3b) as their correlation matrix. Thus, . QED.B ( /0

Proof of ii. The eigenvalues of a square matrix are a continuous
function of the matrix’s components. Therefore, for any , therec � B
is an open ball of radius , centered at , such that forpD � � � 1 0 c
any : , and is PSD, but is not. To seepy � D y � [�1, 1] f (y) d( f (y))
this notice that there are open balls E, F, G (of , centered at ,p� ) x
such that iff the smallest eigenvalue of is nonnegative,y � E f (y)


A LOT OF DATA 11

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

iff the smallest eigenvalue of is negative, andy � F d( f (y)) G P
. Let . Clearly, . Thus, B has positivep(�1, 1) D p E ∩ F ∩ G D P B

Lebesgue measure.

Proof of iii. Pick any , and let be any n vectorsc � B x , . . . , x1 n
with correlation matrix . Pick any , and let N be thenf (c) x � �n�1
correlation matrix of . Clearly, N is PSD. Let andx , . . . , x , x l1 n n�1

be the smallest eigenvalues of and , respectively. Sincem d( f (c)) d(N )
, . But by the interlacing theorem for bordered matrices,c � B l ! 0
, and so . Thus, Lebesgue measure is preserved when wem ≤ l m ! 0

move to the set of bad matrices for , and any “new” regions′ p�nB �
of bad correlations will only increase the size of . QED.′B

Proof of iv and v. The discussion so far, while employing some tech-
niques widely used in statistics, has been purely algebraic. No use of
probability distributions, implicit or otherwise, has been made; this
proves iv. Moreover, no essential use was made of the operationalized
notion of an expression type from section 3. Any means of comparing
the basic evidential units of linguistic theorizing can establish the
same results, provided only that they ultimately determine a Gram
matrix. This yields v. QED.

Proof of vi. Pick any inner product , and any ; letA 7 , 7 S c � B
. There exist vectors such that , whereTf (c) p C z , . . . , z C p Z Z1 4

is the matrix composed of the zis. SinceZ p [z , . . . , z ] k # 41 4
is an inner product, , for some positiveTA 7 , 7 S Ax, yS p x Py k # k

definite P. Thus, is the Gram matrix for .TG p X PX X p x . . . x1 n
Since is positive definite, there exists a nonsingular suchP k # k Q
that . Since is nonsingular, there exists a suchTP p Q Q Q k # 4 R
that . Thus, .T T T T TQR p Z C p Z Z p (QR) QR p R Q QR p R PR
Since has a negative eigenvalue, it is not the Gram matrix withd(C )
respect to any inner product, including . Since is a con-A 7 , 7 S A 7 , 7 S
tinuous function of the components of its two input vectors, versions
of ii and iii follow for .A 7 , 7 S


12 KENT JOHNSON

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

It is worth noting that many measures of association can be rep-
resented as an inner product. For example, covariance is just a rescaling
of the correlation, and proportions of agreement can be represented
by setting if the ith element possesses the jth property, and�x p 1/ kij

otherwise. In this case, is positive when�x p � (1/ k) Ax, yS � [�1, 1]ij
and agree on most properties, 0 when they are evenly split, andx y

negative when they disagree on most properties. (In general, we may
assume that the vectors and are scaled so that they are unassociatedx y
iff .) QED.Ax, yS p 0

Proof of vii. Finally, it might be thought that we can avoid these
problems by moving from the restrictive case of square, symmetric
( ) matrices to the more general case of rectangular ( )n # n n # k
matrices, which is the form of our original data set. Techniques such
as the singular value decomposition are commonly used to eliminate
redundancy by operating directly on the data matrix, not on its cor-
relation matrix. Perhaps we could decompose some minor n # k
variant of our original data in a way that ignores the negative cor-
relations. The results above show that this is impossible. If a set of
correlations is bad, then no vectors will collectively realize the cor-
relations of its denegativized variant. In particular, no vectors in

will do this. Thus, there simply does not exist an appropriatekR
matrix, made up of such vectors, to decompose. QED.n # k n

REFERENCES

Chomsky, N. 1986. Knowledge of Language. Westport, CT: Praeger.
Dawes, R. 1979. “The Robust Beauty of Improper Linear Models in Decision Making.”

American Psychologist 34: 571–82.
Glymour, C. 1980. Theory and Evidence. Princeton, NJ: Princeton University Press.
Horn, R. A., and C. R. Johnson. 1985. Matrix Analysis. Cambridge: Cambridge University

Press.
Johnson, K. 2009. “The Need for Explicit Inferential Methods in Linguistics.” In Language

and Linguistics Emerging Trends, ed. C. R. Dreyer 193–208. New York: Nova.
Jolliffe, I. 2010. Principal Component Analysis. 2nd ed. New York: Springer.
Landau, I. 2000. Elements of Control: Structure and Meaning in Infinitival Constructions.

Dordrecht: Kluwer.


A LOT OF DATA 13

Wednesday Sep 28 2011 03:48 PM/PHOS10286/2011/78/5/kfoster2/acompton///ms editing
started/1002/use-graphics/narrow/default/

QUERIES TO THE AUTHOR

q1. Au: Abstract: Changed “This article motivates using explicit meth-
ods” to “This article encourages the use of explicit methods...”; intended
meaning kept?

q2. Au: In sentence “Famously, however, in situations...” changed “typ-
ically thought to be” to “typically assumed to be”; change okay?

q3. Au: The journal prefers to not use italics when the meaning is clear
from the context; “holisitic” changed accordingly.

q4. Au: In expressions 7b, 7c, and 7e, what does “t” indicate? I have
italized “t”; is it a variable?

q5. Au: In sentence “In linguistic theorizing...,” changed “partly or to-
tally” to “partially or completely”; change okay?

q6. Au: Changed “they show that part of doing linguistics” to “they
show that part of analyzing linguistics...”; intended meaning kept?

q7. Au: Edits made to sentence “This new theory is then...” for greater
clarity; please check that your meaning has been retained.

q8. Au: It is journal style to cite tables at least once in the run of text.
A reference to table 1 has been placed here; please revise as needed.