Patterns of local discourse coherence as a feature for authorship attribution

Vanessa Wei Feng and Graeme Hirst
University of Toronto, Canada

Correspondence: Graeme Hirst, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada M5S 3G4. E-mail: gh@cs.toronto.edu

Abstract

We define a model of discourse coherence based on Barzilay and Lapata's entity grids as a stylometric feature for authorship attribution. Unlike standard lexical and character-level features, it operates at a discourse (cross-sentence) level. We test it against and in combination with standard features on nineteen book-length texts by nine nineteenth-century authors. We find that coherence alone often performs as well as, and sometimes better than, standard features, though a combination of the two has the highest performance overall. We observe that despite the difference in levels, there is a correlation in the performance of the two kinds of features.

1 Introduction

Contemporary methods of authorship attribution and discrimination that are based on text classification algorithms invariably use low-level, within-sentence aspects of the text as stylometric features. In his recent survey, for example, Stamatatos (2009) lists twenty types of stylometric features that mostly involve character and word unigrams and n-grams, part-of-speech tags, and syntactic chunks and parse structures. Koppel, Schler, and Argamon (2009) adduce a similar set.

In this article, we experiment with a stylometric feature that, by contrast, is drawn from the discourse level, across sentences: patterns of local coherence. We look at the choice that a writer makes, when referring to an entity, as to which grammatical role in the sentence the reference will appear in, and how this choice changes over a sequence of sentences as the writer repeatedly refers to the same entity. We show that differences in the resulting patterns of reference and grammatical role are a powerful stylometric feature for identifying an author.

2 Barzilay and Lapata's discourse entity grid

The basis of our method is the discourse entity grid, introduced by Barzilay and Lapata (2008) ('B&L' hereafter) as a model of local discourse coherence. B&L's model is based on the assumption that a text naturally makes repeated reference to the elements of a set of entities that are central to its topic. It represents local coherence as a sequence of transitions, from one sentence to the next, in the grammatical role of these references. Consider, for example, this text:

(1) (a) Thus encouraged, Oliver tapped at the study door. (b) On Mr. Brownlow calling to him to come in, he found himself in a little back room, quite full of books, with a window, looking into some pleasant little gardens. (c) There was a table drawn up before the window, at which Mr. Brownlow was seated reading. (d) When he saw Oliver, he pushed the book away from him, and told him to come near the table, and sit down.[1]
The entity Oliver makes the following sequence of grammatical roles: it is the subject of tapped in (a); it is both the subject and object of found and the object of the prepositional phrase to him in (b); it does not appear in (c); and it is an object (twice, of saw and of told) in (d). As this example demonstrates, it is the referent entity itself that matters, not the form of the reference: Oliver, him, himself, and (not in this example) the boy are all the same entity, whereas the books of sentence (b) are not the same entity as the book of sentence (d).

When an entity appears in more than one role in a sentence, it is assigned only the 'highest-ranking' role (Barzilay and Lapata, 2008); subject outranks object, which in turn outranks all others. Thus in sentence (b), we assign (only) the subject role to Oliver. The sequence for the entity Oliver in this four-sentence text is thus [S S – O], where '–' indicates that the entity does not occur in the sentence; hence the sentence-to-sentence transitions made by this entity in the text are subject to subject ([S S]), subject to not-mentioned ([S –]), and not-mentioned to object ([– O]).

In the case of passive verbs, the surface-form subject is assumed to take the grammatical object role. In the example text, the table, which does not appear in the first two sentences, is construed in (c) as the grammatical object of the passive verb drawn up rather than as its subject,[2] and it then appears in a prepositional phrase in (d). Thus its sequence is [– – O X], where 'X' indicates a role that is neither subject nor object, and the transitions (in addition to the empty transition [– –]) are [– O] and [O X].

Because a complete sentence, possibly containing several clauses, is considered at a single time in B&L's model, a sentence may have several entities in subject and object roles in its different clauses, and even a simple clause may refer to more than one entity within a single grammatical role. It is also possible, of course, that an entity will be mentioned only once in the entire text (such as the gardens in example (1) above) and thus not contribute to local coherence at all. B&L consider an entity to be salient if it is mentioned in the text at least twice (or some other threshold), and they describe models in which transitions for salient and non-salient entities are treated separately.

More formally, we can represent a document d as an entity grid in which the columns represent the entities referred to in d, and the rows represent the sentences.[3] Each cell corresponds to the grammatical role of an entity in the corresponding sentence: subject (S), object (O), neither (X), or nothing (–). Each column is a complete sequence of roles. An example of an entity grid for the salient entities of example (1) is shown in Table 1.

Table 1. The grid for the salient entities of example (1)

      Oliver   Mr. Brownlow   Window   Table
(a)   S        –              –        –
(b)   S        S              X        –
(c)   –        S              X        O
(d)   O        S              –        X

B&L define a local transition as a sequence $\{S, O, X, -\}^n$, representing the occurrence and grammatical roles of an entity in n adjacent sentences. In our example above, we took n = 2 for transitions, but we could use higher values; for example, for n = 3, Oliver has the local transition [S – O] from sentences (b) to (d).
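To make the grid construction concrete, the following minimal Python sketch builds the grid of Table 1. It assumes that coreference resolution and parsing (the tools we use for this step are described in Section 4.1) have already reduced each sentence to a list of (entity, role) pairs; the function and variable names are ours, for illustration only, and not from any released implementation.

```python
# A minimal sketch of entity-grid construction. Assumes each sentence has
# already been reduced, by coreference resolution and dependency parsing,
# to a list of (entity, role) pairs with roles in {"S", "O", "X"}.

RANK = {"S": 2, "O": 1, "X": 0}  # subject outranks object, which outranks others

def build_entity_grid(sentences):
    """sentences: a list (one item per sentence) of lists of (entity, role)
    pairs. Returns {entity: list of roles, one per sentence, '-' if absent}."""
    entities = {e for sent in sentences for e, _ in sent}
    grid = {e: [] for e in entities}
    for sent in sentences:
        best = {}
        for entity, role in sent:
            # keep only the highest-ranking role for each entity in this sentence
            if entity not in best or RANK[role] > RANK[best[entity]]:
                best[entity] = role
        for e in entities:
            grid[e].append(best.get(e, "-"))
    return grid

# Hand-annotated roles for the salient entities in the four sentences of example (1)
sents = [
    [("Oliver", "S")],
    [("Oliver", "S"), ("Oliver", "O"), ("Oliver", "X"),
     ("Mr. Brownlow", "S"), ("window", "X")],
    [("table", "O"), ("window", "X"), ("Mr. Brownlow", "S")],
    [("Oliver", "O"), ("Oliver", "O"), ("Mr. Brownlow", "S"), ("table", "X")],
]
print(build_entity_grid(sents)["Oliver"])  # ['S', 'S', '-', 'O']
```

Run on example (1), this reproduces each column of Table 1.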
Clearly, these transition sequences can be extracted from the entity grid as contiguous subsequences in each column. For example, the entity Mr. Brownlow in Table 1 has a bigram transition [S S] from sentence (b) to (c).

For a given value of n, $4^n$ different local transitions are possible. We can count the number of times that each one occurs in a given text and thus compute the proportion of transitions that are of each type. We can interpret these proportions as probabilities: the probability that any randomly chosen transition of length n in the text is of the given type. This lets us encode the entity grid as a feature vector $\Phi(d) = (p_1(d), p_2(d), \ldots, p_m(d))$, where $p_t(d)$ is the probability of transition type t in the entity grid, computed as the number of occurrences of t in the entity grid of text d divided by the total number of transitions of length n in the entity grid, and m is the total number of transition types considered. For example, if we take transitions of length n = 2, then m = 16, and if n = 3, then m = 64. We have $\sum_{t=1}^{m} p_t(d) = 1$.
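Continuing the sketch above, the feature vector $\Phi(d)$ can be read off the grid by sliding a window of length n down each column and normalizing the counts; again, this is an illustrative reconstruction, not the authors' code.

```python
from collections import Counter
from itertools import product

def transition_probabilities(grid, n=2):
    """Compute Phi(d): the probability of each length-n transition type,
    counted over contiguous windows in every column of the entity grid."""
    counts = Counter()
    for roles in grid.values():              # one column per entity
        for i in range(len(roles) - n + 1):  # every length-n window
            counts[tuple(roles[i:i + n])] += 1
    total = sum(counts.values())
    types = list(product("SOX-", repeat=n))  # fixed order over all 4**n types
    if total == 0:
        return [0.0] * len(types)
    return [counts[t] / total for t in types]
```

For the grid of Table 1 with n = 2, this yields a 16-dimensional vector whose entries sum to 1, as required.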
B&L evaluated the use of entity grids as a model of local coherence by showing that they can be used to discriminate original news texts from random permutations of the same sentences, by learning a pairwise ranking preference between alternative renderings of a document based on the probability distribution of the entity-grid transitions. The model can also improve the performance of a text readability assessment system. Here, we will use entity grids in quite a different manner: not to measure the degree of local coherence but rather, assuming the presence of coherence, to look at the different ways in which it is achieved.

3 Local transitions as features for authorship attribution

Because the set of transition probabilities forms a feature vector $\Phi$ for a text, we can investigate it as a possible stylometric feature for authorship attribution. That is, we hypothesize that authors differ in the patterns of local transitions that they use (their tendency to use some types of local transitions rather than others) and that this forms part of an author's unconscious stylistic signature. We can test this hypothesis by seeing whether we can use these feature vectors to build a classifier to identify authorship.

Of course, we are not suggesting that local transitions by themselves are sufficient for high-accuracy authorship attribution. Rather, we are hypothesizing only that they carry information about authorship, which, if correct, is an interesting new stylometric observation; they could thus be a useful complement to lexical and syntactic features for attribution. In our experiments below, therefore, we put our emphasis on examining how much authorship information these features carry by themselves, in comparison with a simple baseline as well as with traditional lexical and lexico-syntactic features, and in combination with such features.

Any implementation of entity grids must deal with the question of how entities are identified and tracked through the text. Entity recognition and coreference resolution remain incompletely solved problems in computational linguistics. B&L experimented with two approximations: the use of an imperfect coreference resolution tool (Ng and Cardie, 2002) and a very simple string-matching algorithm. Unsurprisingly, the former gave better results (which was also our experience in our own earlier work with entity grids (Feng and Hirst, 2012)). Accordingly, we use a coreference tool here too (see Section 4.1 below).

4 Experiments

4.1 Data

We gathered nineteen works of nine nineteenth-century British and American novelists and essayists from Project Gutenberg; they are listed in Table 2. We split the texts at sentence boundaries into chunks of approximately 1,000 words, regarding each chunk as a separate document by that author; leftovers of fewer than 1,000 words were discarded. The imbalance between different authors in the number of documents for each is corrected by the sampling methods in our experiments (see Section 4.2 below).

Table 2. The data used in our experiments

Author                   Text                              Chunks
Anne Brontë              Agnes Grey                            78
                         The Tenant of Wildfell Hall          183
Jane Austen              Emma                                 183
                         Mansfield Park                       173
                         Sense and Sensibility                140
Charlotte Brontë         Jane Eyre                            167
                         The Professor                         92
James Fenimore Cooper    The Last of the Mohicans             156
                         The Spy                              103
                         Water Witch                          164
Charles Dickens          Bleak House                          383
                         Dombey and Son                       377
                         Great Expectations                   203
Ralph Waldo Emerson      The Conduct of Life                   67
                         English Traits                        68
Emily Brontë             Wuthering Heights                    126
Nathaniel Hawthorne      The House of the Seven Gables        106
Herman Melville          Moby Dick                            261
                         Redburn                               27

We applied coreference resolution to each document, using Reconcile 1.1 (Ng and Cardie, 2002). Reconcile is a learning-based end-to-end coreference resolution system that outputs the entities (noun phrases) and the coreference chains formed by these entities. It achieves F1 scores of 60 to 70 on several coreference benchmark datasets. We then obtained a dependency parse of each sentence of each document, using the Stanford dependency parser (de Marneffe, MacCartney, and Manning, 2006), to extract the grammatical role of each entity in the text. We could then construct an entity grid and the corresponding coherence feature vector for each document. We took n = 2, that is, only transition bigrams, so there are $4^2 = 16$ transition types. But we counted transitions separately for salient entities (those with at least two occurrences in a document) and for non-salient entities. In the latter case, only seven of the sixteen transition types (those in which the entity appears in at most one of the two sentences) can occur.

For each document, we also extracted a set of 208 low-level lexico-syntactic stylistic features: the frequencies of the 100 most frequent letter bigrams, of the 100 most frequent letter trigrams, and of the following eight types of function words: prepositions, pronouns, determiners, conjunctions, modal auxiliaries, auxiliary verbs (be, have, do), adverbs, and to.

4.2 Method

We conducted two sets of authorship attribution experiments: pairwise and one-versus-others. In the former, we select two authors and build a classifier that attempts to discriminate them, using either the coherence feature set, the lexico-syntactic feature set, or a combination of both. In the latter, we select one author and build a classifier that attempts to discriminate that author from all others in the dataset, again using one or both of the feature sets.

The classifier in all experiments was a neural network with one hidden layer. We chose this classifier because it is able to handle non-linear relations among features, and it outperformed some other classifiers, such as decision trees and support vector machines, in our development experiments. Each experiment used five-fold cross-validation, in which one-fifth of the data are chosen as test data and the training data are derived from the other four-fifths; the process is repeated for each one-fifth of the data in turn.
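The classifier was trained in Weka (see below); as a rough, non-authoritative analogue of the experimental loop, here is a scikit-learn sketch with a one-hidden-layer network and five-fold cross-validation. X and y (the feature matrix and author labels, as NumPy arrays) are assumed to be prepared already, and the hidden-layer size is our assumption; the paper does not report one.

```python
# Sketch of the evaluation loop: one-hidden-layer neural network,
# five-fold cross-validation. Not the authors' actual Weka setup.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier

def crossvalidate(X, y, hidden=32, folds=5):
    accuracies = []
    for train_idx, test_idx in StratifiedKFold(
            n_splits=folds, shuffle=True, random_state=0).split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        # In the paper, the training partition is oversampled at this point
        # to balance the classes (see the next paragraph and sketch below).
        clf = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=1000)
        clf.fit(X_tr, y_tr)
        accuracies.append(clf.score(X[test_idx], y[test_idx]))
    return float(np.mean(accuracies))
```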
To prevent class imbalance, the training data partitions are resampled (specifically, oversampled) in each iteration (Estabrooks, Jo, and Japkowicz, 2004) to obtain a balanced distribution in the training set between the two classes: between the two selected authors in the pairwise experiments, and between the selected author and all the others in the one-versus-others experiments. If the number of datapoints in one class is markedly fewer than that of the other, this procedure (implemented here by the Resample module of the Weka 3.6.8 toolkit (Hall et al., 2009)) replicates datapoints at random until the two classes are approximately equal in size. For pairwise classification, we also oversample the test set in the same way in order to set an appropriate baseline.
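A minimal sketch of this random oversampling step, mimicking the effect (though not the implementation) of Weka's Resample filter for the two-class case:

```python
import numpy as np

def oversample(X, y, seed=0):
    """Randomly replicate minority-class datapoints until the two classes
    are approximately equal in size."""
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    deficit = int(counts.max() - counts.min())
    # indices of minority-class datapoints to replicate, drawn with replacement
    extra = rng.choice(np.flatnonzero(y == minority), size=deficit, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]
```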
4.3 Results

4.3.1 Pairwise classification

The results of pairwise classification are shown in Table 3. For each pair of authors, we show macro-averaged classification accuracy for four conditions: using only coherence features, using only traditional lexico-syntactic features, using all features, and, as a baseline, always guessing the author that has the greater representation in the training data. Because of our resampling procedure, the baseline is always close to, but not exactly, 50%, and it will be less than 50% when the more frequent author in the training data is not the more frequent one in the test data. Significant differences between the conditions for each pair of authors are indicated by superscripts (p < .05 in all cases).

Table 3. Accuracy scores (%) of pairwise classification experiments

                     Austen     Charlotte  Cooper     Dickens    Emerson    Emily      Hawthorne  Melville
Anne
  Coherence          78.3 d     73.8 d     88.4 d     83.7 a,d   85.5 d     81.9 a,d   81.0 d     83.9 d
  Lexico-syntactic   79.4 d     77.7 d     98.1 b,d   78.4 d     90.6 b,d   71.5 d     85.3 d     90.7 b,d
  Combined           86.7 a,b,d 83.1 a,b,d 99.1 b,d   84.4 a,d   90.9 b,d   82.6 a,d   90.1 a,b,d 91.6 b,d
  Baseline           53.3       57.7       53.6       52.5       47.1       48.0       46.9       44.8
Austen
  Coherence                     78.9 d     82.3 d     75.5 d     74.9 d     83.5 d     71.9 d     78.6 d
  Lexico-syntactic              85.3 b,d   97.5 b,d   84.1 b,d   97.4 b,d   86.2 d     90.3 b,c,d 91.5 b,d
  Combined                      91.7 a,b,d 97.5 b,d   86.4 a,b,d 98.3 b,d   93.9 a,b,d 85.3 b,d   90.5 b,d
  Baseline                      47.2       49.3       53.6       45.4       45.2       46.1       46.4
Charlotte
  Coherence                                93.2 d     80.6 a,c,d 84.0 d     78.8 a,c,d 84.1 a,c,d 82.9 a,d
  Lexico-syntactic                         92.8 d     58.1 d     93.3 b,d   62.3 d     73.7 d     76.1 d
  Combined                                 97.2 a,b,d 77.3 a,d   93.0 b,d   73.4 a,d   77.2 d     83.8 a,d
  Baseline                                 53.1       52.8       44.7       47.1       48.1       45.6
Cooper
  Coherence                                           81.3 d     79.6 d     92.9 d     73.5 d     74.2 d
  Lexico-syntactic                                    92.8 b,d   87.1 b,d   93.3 d     81.0 b,d   84.9 b,d
  Combined                                            95.3 a,b,d 88.5 b,d   96.6 a,b,d 83.4 b,d   87.7 a,b,d
  Baseline                                            53.7       44.6       46.0       46.1       46.0
Dickens
  Coherence                                                      86.5 d     91.1 a,c,d 75.2 d     79.9 d
  Lexico-syntactic                                               87.3 d     77.8 d     74.3 d     83.2 b,d
  Combined                                                       94.3 a,b,d 87.3 a,d   77.6 a,d   85.7 a,b,d
  Baseline                                                       48.8       49.1       49.6       49.0
Emerson
  Coherence                                                                 97.5 d     70.9 d     75.6 d
  Lexico-syntactic                                                          97.5 d     80.6 b,d   89.9 b,d
  Combined                                                                  95.1 d     88.8 a,b,d 94.2 a,b,d
  Baseline                                                                  51.5       53.6       51.6
Emily
  Coherence                                                                            78.9 d     94.6 a,d
  Lexico-syntactic                                                                     88.3 b,d   86.0 d
  Combined                                                                             91.1 b,d   92.1 a,d
  Baseline                                                                             58.3       52.5
Hawthorne
  Coherence                                                                                       67.9 a,d
  Lexico-syntactic                                                                                58.5
  Combined                                                                                        72.2 a,d
  Baseline                                                                                        51.3

a Significantly better than lexico-syntactic features (p < .05).
b Significantly better than coherence features (p < .05).
c Significantly better than combined features (p < .05).
d Significantly better than baseline (p < .05).

As expected, the established lexico-syntactic stylometric features give accuracies that are significantly above the baseline, with one exception: Hawthorne versus Melville, where these features perform only seven percentage points above baseline, a difference that is not significant because we have relatively little data for Hawthorne. And, as we hypothesized, our coherence features also significantly exceed the baseline in all cases, showing that these features contain a considerable amount of information about authorship. The combined feature set also significantly exceeds the baseline in all cases.

However, there is no consistency or pattern as to the relative performance of the three feature sets. In some cases (denoted by a in the table), such as Dickens versus Anne Brontë, the coherence features outperform the lexico-syntactic features, whereas in others (denoted by b), such as Dickens versus Austen, the reverse is true. In a few cases (denoted by c), such as Charlotte Brontë versus Hawthorne, the coherence features also outperform the combined feature set, although the converse is more usual. It is perhaps notable that, apart from Hawthorne versus Melville, all the pairs in which coherence features are superior to lexico-syntactic features involve a Brontë sister versus Dickens, Hawthorne, Melville, or another Brontë sister.

Generally speaking, however, Table 3 suggests that coherence features perform well but usually not quite as well as lexico-syntactic features, and that the combined feature set usually performs best. We confirm this generalization by aggregating the results for all pairs of authors: taking all predictions for all author pairs as a single set, and reporting accuracy for each set of features. The results, in Table 4, show this ordering for accuracy, with significant differences at each step.

Table 4. Accuracy scores (%) of pairwise classification experiments, aggregated across all authors for each feature set

Feature set        Acc. (%)
Coherence          81.3 d
Lexico-syntactic   83.7 b,d
Combined           88.4 a,b,d
Baseline           49.8

a Significantly better than lexico-syntactic features (p < .05).
b Significantly better than coherence features (p < .05).
d Significantly better than baseline (p < .05).

4.3.2 One-versus-others classification

Unlike pairwise classification, where we are interested in the performance of both classes, in one-versus-others classification we are interested only in a single author class, and hence we can regard the problem as retrieval and report the results using the F1 score of each class under each condition.
The results for each author are shown in Table 5, and aggregated results are shown in Table 6. A pattern similar to that of pairwise classification is observed. All feature sets perform significantly better than baseline, and the combined feature set is, with one exception (Melville), significantly better than both others. The coherence features perform well, and significantly better than the lexico-syntactic features for Dickens and Charlotte Brontë; in addition, they perform notably better, albeit not significantly so, for Hawthorne and Emily Brontë. However, in aggregation, the lexico-syntactic features are significantly better than the coherence features, and the combined set is significantly better again.

Table 5. F1 scores of one-versus-others classification experiments

Author      Features           F1
Anne        Coherence          34.5 d
            Lexico-syntactic   40.9 b,d
            Combined           48.0 a,b,d
            Baseline           15.1
Austen      Coherence          39.1 d
            Lexico-syntactic   60.3 b,d
            Combined           65.9 a,b,d
            Baseline           26.3
Charlotte   Coherence          37.2 a,d
            Lexico-syntactic   25.2 d
            Combined           38.0 a,b,d
            Baseline           16.2
Cooper      Coherence          53.8 d
            Lexico-syntactic   73.3 b,d
            Combined           78.0 a,b,d
            Baseline           25.8
Dickens     Coherence          63.8 a,d
            Lexico-syntactic   61.3 d
            Combined           70.6 a,b,d
            Baseline           50.7
Emerson     Coherence          26.1 d
            Lexico-syntactic   51.9 b,d
            Combined           61.6 a,b,d
            Baseline            9.1
Emily       Coherence          29.5 d
            Lexico-syntactic   25.6 d
            Combined           32.7 a,b,d
            Baseline            7.9
Hawthorne   Coherence          17.3 d
            Lexico-syntactic   14.0 d
            Combined           21.0 a,b,d
            Baseline            7.2
Melville    Coherence          25.9 d
            Lexico-syntactic   38.2 b,d
            Combined           39.6 b,d
            Baseline           12.1

a Significantly better than lexico-syntactic features (p < .05).
b Significantly better than coherence features (p < .05).
d Significantly better than baseline (p < .05).

Table 6. F1 scores of one-versus-others classification experiments, aggregated across all authors for each feature set

Feature set        F1
Coherence          40.5 d
Lexico-syntactic   47.9 b,d
Combined           56.6 a,b,d
Baseline           20.0

a Significantly better than lexico-syntactic features (p < .05).
b Significantly better than coherence features (p < .05).
d Significantly better than baseline (p < .05).

5 Discussion

Our experiments show that entity-based local coherence, by itself, is informative enough to be able to classify texts by authorship almost as well as conventional lexico-syntactic information, even though it uses markedly fewer features. And the two types of information together perform better than either alone. This shows that local coherence does not just represent a subset of the same information as lexico-syntactic features, which is not a surprise, given that they focus on different aspects of the text at different levels.

On the other hand, given this point, we might expect that the performance of the two feature sets would be independent, and that there would be authorship discriminations that are difficult for one set of features but easy for the other. However, we did not find this; while each had cases in which it was significantly better than the other, the scores for the two feature sets were significantly correlated (pairwise task, $r = .3657$, df = 34, $p < .05$; one-versus-others task, $r = .7388$, df = 7, $p < .05$).
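For reference, the correlation reported here is an ordinary Pearson r over paired scores. A sketch of the computation, shown with only the first three coherence and lexico-syntactic accuracies from the Anne Brontë row of Table 3 purely for illustration (the reported statistics use all 36 author pairs):

```python
from scipy.stats import pearsonr

# First three Anne Bronte pairings from Table 3, for illustration only.
coherence_scores = [78.3, 73.8, 88.4]
lexico_scores = [79.4, 77.7, 98.1]
r, p = pearsonr(coherence_scores, lexico_scores)
print(f"r = {r:.4f}, p = {p:.4f}")
```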
Although in aggregation the combination of the lexico-syntactic and coherence feature sets outperformed both feature sets individually, in a handful of cases, such as Hawthorne versus Austen in Table 3, the combined features obtained lower accuracy than either the lexico-syntactic or the coherence features alone. We speculate that this is due to overfitting on the training data when using the combined feature set, which has a higher dimensionality than the other two.

6 Conclusion

We have shown that an author's presumably unconscious choice of transition types in entity-based local coherence is a stylometric feature. It forms part of an author's stylistic 'signature', and is informative in authorship attribution and discrimination. As a stylometric feature, it differs markedly from the syntactic, lexical, and even character-level features that typify contemporary approaches to the problem: it operates at the discourse level, above that of sentences. Nonetheless, the correlation that we found between the performance of this feature set and the lower-level feature set suggests that there is some latent relationship between the two, and this requires further investigation.

We took Barzilay and Lapata's model very much as they originally specified it. However, there are many possible variations to the model that bear investigation. Apart from the obvious parameter settings (the value of n, the threshold for salience), the ranking of grammatical roles when an entity occurs more than once in a sentence may be varied. In particular, we experimented with a variation that we called 'multiple-transition mode', in which the occurrences of an entity in adjacent sentences are paired individually in all combinations. For example, if a specific entity occurs three times, in the roles S, O, and X, in one sentence, and then twice, in the roles S and O, in the next, then we extract six transition bigrams ([S S], [O S], [X S], [S O], [O O], and [X O]) rather than just the one with the highest priority, [S S]. However, in some of our early experiments, this variation showed little difference in performance from Barzilay and Lapata's single-transition model, so we abandoned it, as it adds significant complexity to the model. But we suspect that it might be more useful for characterizing authors, such as Dickens, who tend to write very long sentences involving complicated interactions among entities within a single sentence.
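For concreteness, the pairing that multiple-transition mode performs is a simple cross product of the entity's role occurrences in the two sentences; a sketch (the function name is ours):

```python
from itertools import product

def multiple_transition_bigrams(roles_a, roles_b):
    """All cross-product transition bigrams between an entity's role
    occurrences in two adjacent sentences ('multiple-transition mode')."""
    return list(product(roles_a, roles_b))

print(multiple_transition_bigrams(["S", "O", "X"], ["S", "O"]))
# [('S', 'S'), ('S', 'O'), ('O', 'S'), ('O', 'O'), ('X', 'S'), ('X', 'O')]
```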
We see entity-based local coherence as possibly the first of many potential stylometric features at this level that may be investigated. These could include other aspects of repeated reference to entities. The present features consider only identical referents; but an author's use of other referential relationships, such as various forms of meronymy (the car … the wheels; the government … the minister), could also be investigated. More generally, the set of various kinds of relationships used in lexical chains (Morris and Hirst, 1991) could be tried, but a too-promiscuous set would simply result in almost everything being related to almost everything else. Rather, instead of holding the relationships constant, as in the present approach, one would look for inter-author differences in the patterns of variation in the relationships. Such differences could also be sought in the surface forms of entity coreference across sentences (the author's choice of definite description, name, or pronoun); these forms are conflated in the present approach. Understanding these kinds of discourse-level features better may bring new insight to this level of the stylistic analysis of text and the individuation of authors' style.

Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council of Canada.

References

Barzilay, R. and Lapata, M. (2008). Modeling local coherence: an entity-based approach. Computational Linguistics, 34(1): 1–34.

de Marneffe, M.-C., MacCartney, B., and Manning, C. D. (2006). Generating typed dependency parses from phrase structure parses. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), Genoa, 449–54.

Estabrooks, A., Jo, T., and Japkowicz, N. (2004). A multiple resampling method for learning from imbalanced data sets. Computational Intelligence, 20(1): 18–36.

Feng, V. W. and Hirst, G. (2012). Extending the entity-based coherence model with multiple ranks. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, 315–24.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA data mining software: an update. SIGKDD Explorations, 11(1): 10–18.

Koppel, M., Schler, J., and Argamon, S. (2009). Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology, 60(1): 9–26.

Morris, J. and Hirst, G. (1991). Lexical cohesion, the thesaurus, and the structure of text. Computational Linguistics, 17(1): 21–48.

Ng, V. and Cardie, C. (2002). Improving machine learning approaches to coreference resolution. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), Philadelphia, 104–11.

Stamatatos, E. (2009). A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology, 60(3): 538–56.

Notes

[1] Charles Dickens, Oliver Twist, chapter XIV.

[2] To be precise: table is the elided surface-form grammatical subject of the reduced relative clause of which the passive verb drawn up is the head.

[3] The remainder of this section is based in part on material from Feng and Hirst (2012).