

Stylometry and Collaborative
Authorship: Eddy, Lovecraft, and
‘The Loved Dead’

Alexander A. G. Gladwin, Matthew J. Lavin and Daniel M. Look

St. Lawrence University, Canton, NY, USA

Abstract
The authorship of the 1924 short story ‘The Loved Dead’ has been contested by
family members of Clifford Martin Eddy, Jr. and Sunand Tryambak Joshi, a
leading scholar on Howard Phillips Lovecraft. The authors of this article use
stylometric methods to provide evidence for a claim about the authorship of
the story and to analyze the nature of Eddy’s collaboration with Lovecraft.
Further, we extend Rybicki, Hoover, and Kestemont’s (Collaborative authorship:
Conrad, Ford, and rolling delta. Literary and Linguistic Computing, 2014; 29, 422–
31) analysis of stylometry as it relates to collaborations in order to reveal the
necessary considerations for employing a stylometric approach to authorial
collaboration.


1 Introduction

When ‘The Loved Dead’ was published in the May-June-July 1924 issue
of Weird Tales, credited to Clifford Martin Eddy, Jr. (C. M. Eddy,
Jr.), controversy followed. The issue was banned in at least
Indiana, if not nationwide (Joshi, 2010, p. 501); the magazine’s
editor, Farnsworth Wright, would remain wary of stories containing
socially contentious subject matter for years. ‘The Loved Dead’ is a
first-person narrative of a necrophiliac who is on the run from
authorities as he explains the roots of his predilection for
corpses. The material proved too explicit for audiences, and
Wright’s efforts to avoid another controversy led him in 1925 to
reject relatively innocuous stories such as ‘In the Vault’ by Eddy’s
friend Howard Phillips Lovecraft, better known as H. P. Lovecraft
(de Camp, 1975, p. 244).

Although controversy has continued to surround
the story, the focus is no longer on the subject
matter. Instead, the question of authorship has
defined the discourse surrounding ‘The Loved
Dead’, because Lovecraft—one of Weird Tales’
most popular contributors—is known to have
revised it. The extent of his revisions, though, re-
mains unclear.

Correspondence: Alexander A. G. Gladwin, 753 Franklin Ave., Columbus, OH 43205, United States. E-mail: aaggladwin@gmail.com

Digital Scholarship in the Humanities © The Author 2015. Published by Oxford University Press on behalf of EADH.
All rights reserved. For Permissions, please email: journals.permissions@oup.com

doi:10.1093/llc/fqv026

Digital Scholarship in the Humanities Advance Access published July 29, 2015

2 Biographical and historical
considerations

Because direct historical evidence is typically privileged over
computational analysis, we seek to exhaust these resources before
enacting a less conventional approach. However, the results are
not fruitful; when we consider the fact that Lovecraft ‘revised’ the
story, we find only more ambiguity. The term may imply copy-editing
or content suggestions, but Lovecraft often ghostwrote and
collaborated on stories while retaining the title of ‘revisionist’.
Lovecraft biographer Sunand Tryambak Joshi, or S. T. Joshi (2011),
identifies the author’s frequent downplaying of his input on
revised stories: he and his friend, Robert H. Barlow, cowrote ‘Till
A’the Seas’, but Lovecraft allowed it to be published exclusively in
Barlow’s name to encourage the young author (p. 10–11). In another
case, he took a two-sentence treatment by Zelia Bishop and turned it
into a 29,000-word novella, ‘The Mound’ (Joshi, 2010, p. 745); he
does not appear to have sought credit in its attempted publication.
Thus, the term ‘revision’ is separated from its denotation and even
connotations in the context of Lovecraft, and a careful suspicion of
declaring the story to be Eddy’s on that basis alone is warranted.
To Lovecraft, revision could mean copy-editing, minor rewrites,
major rewrites, or sole writing based on ideas. Current primary and
secondary claims do not clarify the extent of his contribution.

Additional historical details about the authorship complicate the
question further. In terms of first-hand accounts, Lovecraft
(1925/2005) does refer to the story in a letter to his aunt as ‘poor
Eddy’s “The Loved Dead”’ (p. 252). However, Eddy and Lovecraft were
friends; the latter visited the former on occasion, as both were
permanent residents of Providence, excepting Lovecraft’s stint in
New York City from 1924 to 1926 (Joshi, 2010, p. 466–8). If he was
willing to allow his friend Barlow and numerous others to take
credit for stories that he worked on, then his reference to the
story as Eddy’s is inconclusive as to its authorship.

However, in March 1935, Lovecraft (1935/1993) says in a letter to
Robert Bloch, ‘It may interest you to know that I revised the
now-notorious “Loved Dead” myself—practically re-writing the latter
half’ (p. 61). In contrast, Eddy’s wife, Muriel Eddy (1961/1998),
claims in her memoir about Lovecraft that he merely ‘read the
original manuscript and touched it up in places, with my husband’s
full sanction, but it was entirely the brain-child of my husband’
(p. 58–9). Joshi (2010) has called into question many of Muriel
Eddy’s claims in this piece, citing a lack of supporting evidence
and positing the possibility that she wished to capitalize on
Lovecraft’s fame (p. 464–5). There is no way to be certain in either
case, as Joshi himself is merely speculating, and there is no reason
to dismiss Muriel’s statement about ‘The Loved Dead’ out of hand.

Jim Dyer—Eddy’s grandson and head of Fenham
Publishing, which continues to release collections of
Eddy’s stories—further argues in favor of his grand-
father’s nigh-complete authorship: ‘There should be
no confusion regarding my Grandfather’s stories.
Lovecraft and my Grandfather would read their
stories aloud to each other and both would give
advice and suggestions. My grandfather’s stories
were written by him as any of the people who
knew both Lovecraft and my Grandfather could
attest to’ (personal communication, 21 February
2014). He echoes Muriel’s viewpoints, and the opin-
ion of Eddy’s family on the matter is consistent.

The historical evidence does not provide a clear,
single narrative. Joshi (2010) argues in favor of
Lovecraft, albeit without any concrete evidence:
‘There was. . . in all likelihood a draft written by
Eddy for this tale’, he says, ‘but the published ver-
sion certainly reads as if Lovecraft had written the
entire thing’ (p. 466). Joshi compares the ‘adjective-choked
prose’ to Lovecraft’s contemporary short story, ‘The Hound’.1
Most likely, Eddy wrote at least the plot, if not an entire draft.
The uncertainty lies in how much of the printed tale comes from
Eddy’s draft, and how much Lovecraft wrote or rewrote.

3 Primary tests

3.1 Background
To provide a more in-depth analysis, we look to stylometry, a field
that analyzes, per Holmes’ (1998) definition, implicit aspects of an
author’s writing style through statistical analysis. There are
numerous approaches and debates within the field about the most
accurate metrics (e.g. whether the focus should be on common words
such as articles, in order to measure subconscious stylistic
qualities, or on words that appear rarely, in order to find
authorial stamps), but the principle remains: authors have
distinctive qualities to their writing that can be uncovered and
statistically analyzed, which provides evidence regarding the
authorship of a disputed text.

We first provide a historical overview of stylo-
metric techniques, because many have come into
popularity only to be dismissed as a result of further
investigation. Morton (1978) focused on words that
appear only once in a text, i.e. hapax legomena, and
their positions in sentences. Smith (1987) revealed
several flaws in the process: small sample sizes, loose
statistical inferences, and improper data collection.
To question specific stylometric methods is import-
ant, because it pushes scholars to find more sound
and rigorous ones.

Several tests have yielded useful results, chief
among them being Mosteller and Wallace’s (1964)
testing of the Federalist Papers. They attempted to
analyze the texts through synonym pairs, such as the
selection of the word ‘big’ over ‘large’. However,
upon realizing that this method would not work
for the texts at hand, they altered their study to
focus on function words such as conjunctions and
articles, which have little meaning on their own but
reveal relationships in the structure of the sentence.

The Federalist Papers were written by John Jay, Alexander Hamilton,
and James Madison and published under a pseudonym. Scholars and
historians were certain of the authorship of seventy-three letters,
but the authorship of the remaining twelve was unclear. Mosteller
and Wallace used Bayesian statistics to test claims about which of
the three wrote the disputed papers. They discovered that the
measures were consistent with Madison, which suggested a high
probability that he was the author of all twelve, a claim agreed
upon by historians (Juola, 2006, p. 242). This case study suggests
that, if statistical rigor is maintained, reliable results can be
obtained. The important steps are procuring accurate measurements
and tests, applying them to suitable texts, and carefully
interpreting the results.

There have been only a small number of stylo-
metric analyses of texts where the authorship is col-
laborative. Rybicki et al. (2014) explore the stylistic
qualities of The Inheritors, Romance, and The Nature
of a Crime, all three of which are accredited to both
Ford Madox Ford and Joseph Conrad. Their method of testing is
particularly useful for differentiating authorship in texts that
were written in segments.

The publications that contained Lovecraft and
Eddy’s stories, such as Weird Tales, provide numer-
ous authors and stories that can serve as test cases
for collaborative authorship due to the literary con-
text of ‘hack writing’—a context that adds weight to
our inquiry into Lovecraft as revisionist. Usage of
the term ‘hack’, from hackney, to describe a laborer
or ‘drudge’ dates back to the 18th century, but the
concept of literary hackwork has its roots in the
history of popular periodicals, especially London’s
Grub Street (Cox and Mowatt, 2014). In the context
of the 19th- and early-20th-century United States,
literary hackwork was likewise linked to mass-
market periodicals and pulp publications. Because
the second half of the 19th century saw the rise of
the modern profession of authorship and with it a
professional discourse of trade books and period-
icals, there exist numerous articles defining hack-
work and discussing its merits. These sources tend
to agree that a hack ‘writes for pay, and, if he were
not paid, could not write’ (Lang, 1896, p. 15). To
succeed, a hack had to be ‘a sort of all-trades’ jack’
(Reeve, 1910, p. 55). In comparison with popular
notions of authorial identity and celebrity, literary
hackwork occurred behind the scenes and carried
with it a decreased emphasis on receiving credit.
Broadly speaking, hackwork included various
levels of unnamed collaborative labor, including
ghostwriting, revision, line editing, and adaptation.

Lovecraft in particular is a part of this context, although he
differs from the conventional image of a hack writer. He often
edited and ghostwrote for money, including a short story published
in the same issue of Weird Tales as ‘The Loved Dead’: ‘Imprisoned
with the Pharaohs’, originally titled ‘Under the Pyramids’, which
was accredited to famous magician Harry Houdini; Lovecraft received
$100 for his work (Joshi, 2010, p. 498–9). However, he was against
the idea of commercial writing (Joshi, 2010, p. 297), favoring the
image of the amateur who wrote out of personal passion and
expression. Still, he did revise, ghostwrite, and line edit, at
times for friends and at times for money due to his financial
destitution, which places him in this tradition.

Although Rybicki, Hoover, and Kestemont have
begun to study literary collaboration using compu-
tational tools, further investigation into the range of
literary collaborations of the modern period is called
for, an investigation for which hack writing is well
suited. The kinds of collaborations we seek to study
are notoriously opaque but tremendously common,
and crucial to the production of popular and liter-
ary texts. Successfully using authorship attribution
techniques with these kinds of collaborations would
have numerous implications for the study of litera-
ture and literary history. Potential fruits include: a
better understanding of how and when hackwork
took place; better information about the extent of
collaboration needed to affect an authorial signal in
a text; and, more aspirationally, methods for detect-
ing ghostwritten texts in a field of candidate texts.

3.2 Lexical richness
We initially attempted to measure variables of lex-
ical richness, which has a turbulent history in styl-
ometry and has largely faded from popular use.
Juola (2006) argues that such measurements have
not ‘been demonstrated to be sufficiently distin-
guishing or sufficiently accurate’, although he does
admit that they cannot be dismissed outright with
specific counterexamples (p. 240–1). Scholars have
found stability with certain variables, including
Look’s (2012) study of the works of Robert E.
Howard, which argues for the stability of the
Type-Token Ratio given certain constraints;
Holmes’ (1992) study of authorship in Mormon
scripture, which combines variables including the
hapax dislegomena ratio H, Honoré’s R, and Yule’s
K; and Grieve’s (2007) multifaceted approach to
lexical richness, which utilizes the aforementioned
and numerous other variables to predict the author-
ship of a text, achieving a success rate ranging from
59 to 77% when there are two potential authors.2

However, Tweedie and Baayen (1998) provide evidence toward the
instability of R and K in relation to the total token count of a
text, N. The usefulness of lexical richness measures appears to be
heavily predicated on having particularly apt test subjects, and
even then, a level of certainty reaching even 77% could be seen as
too low for predicting the author of a single disputed text.

We measured T, H, R, and K and performed principal component
analysis (PCA) to condense and visualize the data,3 because lexical
richness can provide insight into the style of a text, but our
results showed too little variation and were too muddled to use in
our test case. However, lexical richness can be useful for
stylometric analysis of collaborative authorship given particularly
apt test cases, so we leave this information for those wishing to
perform further testing on other subjects.
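
For readers who wish to repeat such measurements, the four variables can be sketched as follows. This is a minimal illustration that assumes the commonly cited definitions (T = V/N, H = V2/V, Honoré's R = 100 log N / (1 - V1/V), and Yule's K = 10^4 (Σ i²·V_i - N) / N²); the article does not print its formulas in this section, so the function name and the exact definitions are assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def lexical_richness(tokens):
    """Return T, H, R, and K for a list of word tokens (assumed definitions)."""
    n = len(tokens)                      # N: total token count
    freqs = Counter(tokens)              # word -> number of occurrences
    v = len(freqs)                       # V: number of distinct words
    spectrum = Counter(freqs.values())   # i -> number of words occurring i times
    v1, v2 = spectrum[1], spectrum[2]    # hapax legomena and hapax dislegomena

    t = v / n                            # Type-Token Ratio
    h = v2 / v                           # hapax dislegomena ratio
    # Honore's R is undefined when every distinct word is a hapax (v1 == v).
    r = 100 * math.log(n) / (1 - v1 / v) if v1 < v else float("inf")
    # Yule's K: 10^4 * (sum over i of i^2 * V_i, minus N) / N^2
    k = 1e4 * (sum(i * i * vi for i, vi in spectrum.items()) - n) / (n * n)
    return {"T": t, "H": h, "R": r, "K": k}


# Toy input for illustration only.
sample = "the cat and the dog and the bird".split()
scores = lexical_richness(sample)
```

Because R and K depend on N, as Tweedie and Baayen note, comparisons are only meaningful between samples of similar length.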

3.3 Latent features
Measurements of lexical richness can be useful, but
we prefer elements of style that can more consist-
ently allow us to distinguish between authors, and
such elements have been found in the study of the
parts of language that often fail to attract significant
attention: function words. Included in this umbrella
term are conjunctions, articles, prepositions, par-
ticles, and auxiliary verbs. The principle behind
measuring function word usage is that it reflects
an author’s structural stylistic impulse. Thus, mea-
suring the frequencies of the most common func-
tion words and words in general can provide
information about an author on a level that
cannot be easily imitated.

We will discuss two particularly useful tools,
both suggested by Burrows, that focus on function
words and common words across texts in general in
order to reveal the relationships among corpora.4

3.3.1 Function words

In the first method, Burrows (1989) outlines the
process of taking function word frequencies and
applying PCA by analyzing the speech of certain
characters in the works of Jane Austen, and then
comparing the works of different authors. Binongo
(2003) hones the technique in a process that will be
outlined here. His focus is the 15th book in the Oz
series, The Royal Book of Oz, which was published 2
years after the death of L. Frank Baum, the undis-
puted author of the first fourteen books. The con-
troversy is due to a statement in the 15th book’s
original publication that claims the text was written
by Baum and only ‘enlarged and edited’ by Ruth
Plumly Thompson (as cited in Binongo, 2003,
p. 10), who would go on to write the 16th to 33rd
entries. However, this claim has been disputed. The
most recent edition from 2001 recognizes
Thompson’s authorship, stirring the controversy.

Binongo performs an authorship attribution test
distilled from Burrows’ technique. He takes the first
thirty-three books in the Oz series—including four-
teen that are undisputedly by Baum, eighteen undis-
putedly by Thompson, and The Royal Book of Oz—
and treats these books as a single corpus. He re-
moves non-prose sections for ease of study. Then,
he obtains the proportion of each word in the
corpus—i.e. for every distinct word, he measures
the number of occurrences in the text, say w, and
calculates w / N. From this list, he records the top
fifty function words, excluding auxiliary verbs and
pronouns due to the former’s multiplicity of inflec-
tions and the latter’s dependence on factors such as
point of view.

Binongo then subdivides each text into blocks of
5,000 words in order to see variations within a book,
creating 223 text blocks; next, he calculates the pro-
portions of the top fifty function words within each
5,000 word subdivision. He then utilizes PCA to
distill the information from these measurements
into easily visualized data.
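
The block-and-project procedure described above can be sketched in a few lines. This is a hedged illustration, not Binongo's code: the short `FUNCTION_WORDS` list is a hypothetical stand-in for his fifty words, the helper names are invented, dropping a trailing partial block is an assumption, and the PCA here centers the columns without rescaling them, which is one possible choice among several.

```python
from collections import Counter

import numpy as np

# Hypothetical stand-in; the procedure described uses the fifty most frequent
# function words in the corpus, excluding auxiliary verbs and pronouns.
FUNCTION_WORDS = ["the", "and", "of", "a", "to", "in"]


def split_into_blocks(tokens, block_size=5000):
    """Cut a token list into consecutive full blocks (assumption: drop the remainder)."""
    n_blocks = len(tokens) // block_size
    return [tokens[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


def proportions(block, words=FUNCTION_WORDS):
    """Proportion w / N of each listed word within one block."""
    counts = Counter(block)
    return [counts[w] / len(block) for w in words]


def pca_scores(matrix, n_components=2):
    """Project rows (text blocks) onto the leading principal components."""
    x = np.asarray(matrix, dtype=float)
    x = x - x.mean(axis=0)               # center each word's column
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


# Toy texts and a tiny block size, for illustration only.
text_a = ("the cat and the dog " * 4).split()
text_b = ("a bird of a tree in a wood " * 3).split()
blocks = split_into_blocks(text_a, 10) + split_into_blocks(text_b, 10)
matrix = [proportions(b) for b in blocks]
scores = pca_scores(matrix)              # one row per block, one column per PC
```

Plotting the first column of `scores` against the second gives the kind of scatter in which blocks by different authors may separate along the first component.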

Binongo’s results are convincing (Fig. 1). When
he visualizes the data, works by Baum and those by
Thompson are divided along the x-axis. Binongo
notes that even the 14th Oz book, Glinda of Oz—
which was edited by Baum’s son from a rough
draft—falls distinctly in the cluster of Baum’s
works. The Royal Book of Oz, however, clusters
with Thompson’s work, lending evidence to the
claim that she is the likely author of the text. The
fact that the revised version of Baum’s work is not-
ably similar to his other works is useful because the
disputed text we will be considering, ‘The Loved
Dead’, is possibly just a light revision by
Lovecraft. In theory, if he had no greater hand in
the published version, we should obtain similar re-
sults in favor of Eddy.

This process provides visualizable data on the differences in style
between two authors by tracking subconscious stylistic qualities.
The utility is clear and will provide information about the
disputed text that we will consider, because a light revision by
one author of another’s text is unlikely to change the original
author’s basic grammatical structures.

3.3.2 Burrows’ Delta

The second method outlined by Burrows (2003) measures style based on
common words, including non-function words, and does not privilege
visualizations; rather, it yields a single value called Delta that
reflects the similarity of subsets of a corpus to a key subset.5 In
our study, that will mean treating ‘The Loved Dead’ as our key
subset, and comparing stories by Lovecraft and Eddy to it. The main
set will consist of all of these stories.
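
As a rough illustration of the calculation, Delta is commonly described as the mean absolute difference between the z-scores of the most frequent words in the key text and in an authorial subset. The sketch below follows that description under several assumptions that are not taken from the article: the main set is made up of the authorial stories only, a subset's profile is the mean of its stories' relative frequencies, words with zero variance are skipped, and the function names and toy data are invented.

```python
from collections import Counter
from statistics import mean, pstdev


def rel_freqs(tokens, words):
    """Relative frequency of each word in one token list."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in words]


def burrows_delta(test_tokens, subsets, n_top=30):
    """Delta of each authorial subset against the test text.

    subsets maps an author label to a list of token lists (one per story).
    """
    texts = [t for stories in subsets.values() for t in stories]
    pooled = Counter(tok for t in texts for tok in t)
    top = [w for w, _ in pooled.most_common(n_top)]

    # Mean and standard deviation of each top word across the main set.
    columns = list(zip(*(rel_freqs(t, top) for t in texts)))
    mu = [mean(c) for c in columns]
    sigma = [pstdev(c) for c in columns]
    keep = [i for i, s in enumerate(sigma) if s > 0]   # skip constant words

    def z(freqs):
        return [(freqs[i] - mu[i]) / sigma[i] for i in keep]

    z_test = z(rel_freqs(test_tokens, top))
    deltas = {}
    for author, stories in subsets.items():
        # Assumption: a subset's profile is the mean of its stories' frequencies.
        profile = [mean(c) for c in zip(*(rel_freqs(t, top) for t in stories))]
        deltas[author] = mean(abs(a - b) for a, b in zip(z_test, z(profile)))
    return deltas


# Toy data for illustration only; "A" and "B" are placeholder author labels.
subsets = {
    "A": ["the old house and the old man".split(),
          "the dark and the old sea".split()],
    "B": ["a dog ran to a cat".split(),
          "a man ran to a house".split()],
}
test_text = "the old and the dark house".split()
deltas = burrows_delta(test_text, subsets, n_top=5)
```

The subset with the smaller value in `deltas` is the one whose common-word usage is closer to the test text.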

The authorial subset that yields the smallest Delta
score will be the least unlike the key text in terms of
common word usage. Burrows emphasizes using the
phrase ‘least unlike’ in order to clarify that the
author with the lowest Delta score does not neces-
sarily have a similar style to the author of the dis-
puted text, but rather is closer in style than any
other author with a higher Delta score.
Fortunately for us, this will not be an issue, as ‘The Loved Dead’
is all but certainly the work of Lovecraft, Eddy, or both. The
purpose of using this test in conjunction with function word PCA is
that it provides another way to measure style as seen in common
words, and the results for both tests should thus coincide with
each other, but with a potentially different word set.

Fig. 1 Binongo’s results for function word PCA for Baum
and Thompson

3.3.3 Rolling Delta

Rybicki et al. (2014) use Rolling Delta in their study
of collaborative authorship, which applies the same
basic technique as Burrows’ Delta, but provides sev-
eral Delta scores for a single test text by setting two
numbers—a ‘rolling window’ and a step size—and
then ‘rolling’ through the text. The window is the
number of words that will be considered when cal-
culating Delta, and the step size is the increment by
which the index of the first word for the window
increases. So, for example, if the window size were
set to 3,000 and the step size 500, then that test would
start with a window containing the first 3,000 words
of the text, evaluate the Delta scores for the authorial
subsets, then measure another 3,000 word window
starting with the 501st word, followed by a 3,000
word window starting with the 1,001st, etc., until a
window contained the last word of the text.
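
The windowing in this example can be sketched as a small generator. The function name is hypothetical, indices are 0-based with an exclusive end, and shortening the final window so that it stops at the last word is an assumption of this sketch; it is not a description of how the Stylo package handles the end of a text.

```python
def rolling_windows(n_tokens, window=3000, step=500):
    """Yield (start, end) token indices, 0-based and end-exclusive."""
    start = 0
    while True:
        end = min(start + window, n_tokens)   # assumption: shorten the last window
        yield start, end
        if end == n_tokens:                   # this window contains the last word
            break
        start += step


# A hypothetical 4,200-word text with the window and step from the example.
spans = list(rolling_windows(4200, window=3000, step=500))
```

Here `spans` is `[(0, 3000), (500, 3500), (1000, 4000), (1500, 4200)]`: the second window starts at index 500, that is, with the 501st word.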

There are two other differences between Burrows’
Delta and Rolling Delta. First, while the Delta score is
weighted by the number of top words considered,
Rolling Delta is not. So, if we were to measure the
frequencies of the thirty top words, and one author
differed from the usage in the test text by 1 standard
deviation each time—and, in the case of Rolling
Delta, for each rolling window—then the Burrows’
Delta score would be 1, but the Rolling Delta scores
would be 30. Second, Rolling Delta uses ‘culling’,
which is the process of removing words that do not
appear in a certain percentage of the texts. If the culling
value is thirty, then only words that appear in at
least 30% of the texts will be considered when calculating
the Rolling Delta scores. This process is useful,
in particular for larger texts, because it removes
words that are used often, but only in a small
percentage of samples.
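
Both differences can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are invented, the z-scores are placed exactly one standard deviation apart to mirror the example above, and the culling helper implements the rule as stated in the text rather than the Stylo package's own code.

```python
def delta_mean(z_test, z_author):
    """Burrows' Delta: the mean absolute z-score difference."""
    return sum(abs(a - b) for a, b in zip(z_test, z_author)) / len(z_test)


def delta_sum(z_test, z_author):
    """Unweighted variant described for Rolling Delta: the plain sum."""
    return sum(abs(a - b) for a, b in zip(z_test, z_author))


def cull(words, texts, culling=30):
    """Keep words that occur in at least `culling` percent of the texts."""
    vocabularies = [set(t) for t in texts]
    needed = culling / 100 * len(texts)
    return [w for w in words if sum(w in v for v in vocabularies) >= needed]


# Thirty top words, each one standard deviation away from the test text.
z_test = [0.0] * 30
z_author = [1.0] * 30
weighted = delta_mean(z_test, z_author)     # 1.0
unweighted = delta_sum(z_test, z_author)    # 30.0

# Toy texts for illustration: 'tomb' occurs in only one of four texts.
texts = [["the", "ghoul"], ["the", "tomb"], ["the", "sea"], ["the", "ghoul"]]
kept = cull(["the", "ghoul", "tomb"], texts, culling=50)
```

With a culling value of 50, `kept` retains ‘the’ and ‘ghoul’ but drops ‘tomb’, which appears in only a quarter of the toy texts.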

4 Text selection and natural lan-
guage processing

Before any tests can be run, the data must be standardized. We
choose to work only with prose because that is the genre of the
disputed text. We want to have samples that are of the same
language form as each other and as the test text to avoid
unforeseen variables. Moreover, because ‘The Loved Dead’ is
fiction, we have decided to exclude essays and letters in case they
interfere with the language structures that we are attempting to
unearth.6 Non-English language sections, epigraphs, and chapter
numberings (e.g. ‘II’ or ‘Chapter 2’) have been removed.

Because we only have twelve Eddy stories to work
with (Table 1), we create a subset of Lovecraft’s
horror prose that contains twelve tales contempor-
ary to ‘The Loved Dead’ (Table 2).7

Contractions are not expanded in the Delta tests
as a result of Hoover’s (2004) findings that doing so
actually lowers accuracy, and are similarly not ex-
panded for the function word PCA test. Spellings
are not standardized in order to avoid altering the
data. Thus, even sections of the stories written in
dialect, e.g. Zadok Allen’s monologue in The
Shadow Over Innsmouth, are left intact, as they rep-
resent specific word choices by the authors.

The Lovecraft set is tested against Eddy in relation
to not only ‘The Loved Dead’, but also a text that we
know to be entirely by Lovecraft: ‘The Mound’,
which is not in our test sets due to its technical des-
ignation as a collaboration between him and Zelia
Bishop.8 Testing ‘The Mound’ provides base cases
that should yield strong results in favor of Lovecraft
if the tests are accurately capturing style.

We perform tokenization in Python using the Natural Language
Toolkit’s (NLTK) tokenizer, which creates a list wherein each item
is a word or piece of punctuation (http://www.nltk.org/).9 With the
exception of Rolling Delta, which was performed using the Stylo
package for the R programming language, developed by Eder, Rybicki,
and Kestemont
(https://sites.google.com/site/computationalstylistics/stylo), all
of the tests and calculations for this paper are written in Python
to optimize performance and collection.10

5 Validating our approach

To validate our method, we perform our tests in two
separate scenarios. First, we run our tests on authors

A. A. G. Gladwin et al.

6 of 18 Digital Scholarship in the Humanities, 2015

http://www.nltk.org/
https://sites.google.com/site/computationalstylistics/stylo
https://sites.google.com/site/computationalstylistics/stylo


that worked during the same time and/or in the
same genre as Lovecraft to ensure that we can dif-
ferentiate authors that share those qualities; second,
we run our tests on established revisions/collabor-
ations to see how our test results reflect the mixed
authorship.

5.1 Differentiating Lovecraft from
contemporaries
To explore how our tests interpret results for au-
thors working in similar time periods or genres, we
choose to test the Lovecraft set against a selection of
texts by three authors: Booth Tarkington, who
wrote during approximately the same period as
Lovecraft, but not in the horror, fantasy, or weird
fiction genres; Edgar Allan Poe, whose horror stories
greatly influenced Lovecraft (Joshi, 2011, p. 44), but
who wrote approximately a century earlier; and
Arthur Machen, who wrote before and during
Lovecraft’s lifetime and worked in the horror, fan-
tasy, and weird fiction genres. For each author, we
select a subset of his/her bibliography (Table 3), as
well as one story that will act as a test text in the
same way that we will use ‘The Mound’. For
Tarkington, we use The Magnificent Ambersons;
for Poe, ‘The Fall of the House of Usher’; and for
Machen, ‘The White People’.

5.1.1 Function word PCA

We note that only twenty-four top words could be
tested due to our smaller sample size and the con-
straints of our method of PCA.11 Thus, a lack of
clarity in our visualizations could indicate an
actual lack of distinction between the two authors
in terms of function word usage, or that we are not
measuring enough top words. This bolsters the im-
portance of using Burrows’ Delta, as we can select
any number of top words regardless of the number
of samples.

The Lovecraft and Tarkington sets cluster dis-
tinctly across the first PC with no overlap (Fig. 2).
Both ‘The Mound’ and The Magnificent Ambersons
cluster with their respective authors.

For the comparison between the Lovecraft and
Poe sets, the test does not differentiate the stories as
clearly (Fig. 3); ‘The Mound’ and ‘The Fall of the
House of Usher’ appear between the two clusters.
This discrepancy shows why it is especially import-
ant to follow up on unclear results with other tests
in order to check whether the lack of separation is
due to the constraints of the test or the texts
themselves.

The Lovecraft and Machen sets differentiate
clearly (Fig. 4). ‘The Mound’ and ‘The White
People’ are substantially closer in terms of function
word usage to the Lovecraft set and Machen set,
respectively.

Function word PCA does differentiate the au-
thors distinctly in most cases, with few cases of
‘mis-attribution’. The tests do not always demarcate
the differences clearly enough, which shows that we
should look more closely at the comparison either
by increasing the number of top word counts mea-
sured or looking to other tests, namely Burrows’
Delta.

Table 1 Eddy’s stories and token counts

Title                       Token count
An Arbiter of Destiny       2,698
Arhl-A of the Caves         3,601
Ashes                       3,182
The Better Choice           3,654
The Cur                     2,883
Deaf, Dumb, and Blind       4,640
Eterna                      3,278
The Ghost-Eater             3,842
Red Cap of the Mara         4,294
Sign of the Dragon          22,561
Souls and Heels             5,484
With Weapons of Stone       3,498

Table 2 Lovecraft’s stories and token counts

Title                       Token count
The Festival                3,611
Herbert West-Reanimator     11,894
The Hound                   2,942
Hypnos                      2,745
The Lurking Fear            8,050
The Music of Erich Zann     3,433
The Nameless City           4,926
The Outsider                2,556
The Picture in the House    3,317
The Rats in the Walls       7,819
The Temple                  5,337
The Terrible Old Man        1,126


5.1.2 Burrows’ Delta

Burrows’ Delta provides the most accurate results in
terms of predicting the authorship of our test texts
(Table 4).12 Although a Delta score for an authorial
subset on its own can reveal how much or little that
author’s usage of common words resembles that of
the test text’s, the number we will use in comparing
authors is the difference between the Delta scores.
Because a lower Delta score means an author’s style
is ‘less unlike’ that of the test text, the difference
between Delta scores reveals the extent to which
one author is stylistically closer.

Our results for the comparison of the Lovecraft and Tarkington sets
yield smaller Delta scores for Lovecraft when ‘The Mound’ is the
test text, and similarly smaller Delta scores for Tarkington when
The Magnificent Ambersons is the test text. The differences range
from an absolute value of 0.5 to over 0.8. Our results for Machen
show that Lovecraft’s style is closer to that of ‘The Mound’ than
Machen’s, and Machen’s is closer to that of ‘The White People’ than
Lovecraft’s.

Table 3 Tarkington, Poe, and Machen stories

Tarkington                  Poe                                   Machen
Alice Adams                 The Black Cat                         The Angels of Mons
The Beautiful Lady          The Cask of Amontillado               Far Off Things
The Conquest of Canaan      A Descent into the Maelström          A Fragment of Life
The Flirt                   The Facts in the Case of M. Valdemar  The Great God Pan
Gentle Julia                Hop-Frog                              The Great Return
The Gentleman from Indiana  Ligeia                                Hieroglyphics: A Note upon Ecstasy in Literature
Harlequin and Columbine     The Masque of the Red Death           The Hill of Dreams
Penrod                      The Murders in the Rue Morgue         The Inmost Light
Penrod and Sam              The Pit and the Pendulum              The Secret Glory
Seventeen                   The Purloined Letter                  The Terror
The Turmoil                 The Tell-Tale Heart                   The Three Impostors

Fig. 2 Function word PCA results for Lovecraft and Tarkington

Fig. 3 Function word PCA results for Lovecraft and Poe

Fig. 4 Function word PCA results for Lovecraft and Machen

Poe, in contrast, produces similar difficulties to
those seen with function word PCA. Although ‘The
Mound’ is closer to the Lovecraft set, ‘The Fall of the
House of Usher’ produces a difference in Delta scores
of approximately 0.05 in favor of Lovecraft. Thus, we
have a ‘false positive’, or a result that inaccurately
indicates the authorship of a disputed text. To test
whether Delta is actually unable to differentiate the style
of the two authors clearly, though, we return to
Burrows’ suggestion about the number of top
words chosen: although thirty is often enough to
differentiate, it is on the low end of the recommended
values.13 We therefore perform Delta tests with top
word counts ranging from 30 to 100 in increments
of ten words. The results clearly show that
usage of common words in ‘The Fall of the House
of Usher’ is closer to that of the Poe set (Fig. 5). In
fact, the more top words we use, the clearer that
result becomes. This suggests that if our results for
an unknown text yield differences in Delta scores that
are below 0.1, then we should further investigate the
scores by testing with more top words to ensure that
the result for thirty top words is not a misrepresen-
tation of the actual stylistic qualities being measured.
As a result, we will consider an author’s style to be
notably closer to that of the test text if the difference
between the authors’ Delta scores is 0.1 or greater.
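The check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors’ code: the function names (`relative_freqs`, `delta_scores`, `delta_difference_by_word_count`) are hypothetical, the Delta calculation follows the formula in Note 5, the mean and standard deviation are assumed to be taken across every text including the test text, and the standard library stands in for the NLTK and SciPy routines mentioned in Note 10.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens, vocab):
    """Share of the text taken up by each word in `vocab`."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]


def delta_scores(test_tokens, author_sets, n_top):
    """Return {author: Delta} for one test text (formula as in Note 5).

    `author_sets` maps an author name to a list of tokenized texts.
    Assumption: means and standard deviations are computed across every
    text under consideration, including the test text.
    """
    all_texts = [test_tokens] + [t for texts in author_sets.values() for t in texts]
    corpus_counts = Counter(token for text in all_texts for token in text)
    vocab = [word for word, _ in corpus_counts.most_common(n_top)]

    freqs = [relative_freqs(text, vocab) for text in all_texts]
    means = [mean(column) for column in zip(*freqs)]
    # A word used at the same rate everywhere has zero spread; avoid dividing by 0.
    spreads = [pstdev(column) or 1.0 for column in zip(*freqs)]

    def z_scores(tokens):
        return [(f - m) / s
                for f, m, s in zip(relative_freqs(tokens, vocab), means, spreads)]

    z_test = z_scores(test_tokens)
    scores = {}
    for author, texts in author_sets.items():
        # Average z-score of each word across the author's texts.
        z_author = [mean(column) for column in zip(*(z_scores(t) for t in texts))]
        scores[author] = mean(abs(k - a) for k, a in zip(z_test, z_author))
    return scores


def delta_difference_by_word_count(test_tokens, author_sets, first, second,
                                   word_counts=range(30, 101, 10), threshold=0.1):
    """For each top-word count, return (count, difference, is_notable).

    A positive difference means `first` has the smaller (closer) Delta score.
    """
    rows = []
    for n_top in word_counts:
        scores = delta_scores(test_tokens, author_sets, n_top)
        difference = scores[second] - scores[first]
        rows.append((n_top, difference, abs(difference) >= threshold))
    return rows
```

Each returned row pairs a top-word count with the signed difference and a flag for whether it meets the 0.1 margin, so a result that changes sign or stays below the margin across the range is easy to spot.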

5.1.3 Rolling Delta

The results for Rolling Delta are similar to those for Burrows’ Delta.14 Rolling Delta is useful for examining the stylistic qualities of several sections of the test text, which informs collaborative authorship attribution by potentially revealing segmentation. Our test texts, however, are not only unsegmented, but also non-collaborative, which means our results should resemble those we found for Burrows’ Delta.
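The windowing behind Rolling Delta, and the per-window summaries reported in this section, can be sketched as follows. This is an illustration under stated assumptions rather than the authors’ implementation: `rolling_windows` and `summarize_rolling` are hypothetical names, only full windows are used, and `score_fn` is a placeholder for any function that returns one Delta score per author for a window of tokens.

```python
from statistics import mean


def rolling_windows(tokens, window_size, step_size):
    """Yield overlapping windows of `window_size` tokens, advancing by `step_size`.

    Assumption: only full windows are produced (a text shorter than the
    window yields a single, shorter window).
    """
    last_start = max(len(tokens) - window_size, 0)
    for start in range(0, last_start + 1, step_size):
        yield tokens[start:start + window_size]


def summarize_rolling(tokens, score_fn, first, second,
                      window_size=5000, step_size=1000):
    """Summarize which author each window favors.

    `score_fn(window)` must return a dict with a Delta score for `first`
    and `second`; the smaller score is the closer author.
    """
    differences = []
    for window in rolling_windows(tokens, window_size, step_size):
        scores = score_fn(window)
        differences.append(scores[second] - scores[first])  # > 0 favors `first`

    favor_first = [d for d in differences if d > 0]
    favor_second = [-d for d in differences if d < 0]
    return {
        "windows": len(differences),
        "pct_first": 100 * len(favor_first) / len(differences),
        "avg_diff_first": mean(favor_first) if favor_first else 0.0,
        "avg_diff_second": mean(favor_second) if favor_second else 0.0,
    }
```

The default window and step sizes mirror the values given in Note 14; passing other values reproduces the smaller windows used for shorter test texts.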

In the comparisons between the Tarkington and Lovecraft sets, we see similar results to the ones from Burrows’ Delta. For ‘The Mound’, with the default window size, 100% of the windows yield smaller Delta scores for the Lovecraft set, with an average difference of 3.871. For Lovecraft, the Delta scores range from 3.242 to 11.320, and for Tarkington, they range from 8.627 to 17.138. When we use The Magnificent Ambersons as a test text, we see similarly clear results: 100% of the windows yield smaller Delta scores for Tarkington, with an average difference of 6.078 in favor of Tarkington. The values for Tarkington range from 5.221 to 11.483, and for Lovecraft, they range from 9.945 to 17.915.

The Rolling Delta results for comparing the Poe and Lovecraft sets similarly reflect our results for Burrows’ Delta. When we use ‘The Mound’ as the test text, the Lovecraft set has smaller Delta scores for 100% of the windows, with an average difference of 3.358 in favor of Lovecraft. The values for Lovecraft range from 3.367 to 12.183, and for Poe, from 7.441 to 14.635. With ‘The Fall of the House of Usher’ (for which we run all Rolling Delta tests with a window size of 1,500 and step size of 100), the results are not as decisive, but still indicate that Poe’s style is predominant: 72.4% of the windows yield smaller scores for Poe, with an average difference for those windows of 0.898. For the windows that favor Lovecraft, the average difference is 0.302. The Delta scores for Poe range from 4.039 to 8.986, and for Lovecraft they range from 3.74544 to 10.27372.

Table 4 Burrows’ Delta results for Lovecraft, Tarkington, Poe, and Machen

Test                               Author 1    Delta 1   Author 2     Delta 2   Diff
‘The Mound’                        Lovecraft   0.48089   Tarkington   0.98876   0.50787
The Magnificent Ambersons          Lovecraft   1.15699   Tarkington   0.29169   0.865306
‘The Mound’                        Lovecraft   0.41838   Poe          0.67259   0.25421
‘The Fall of the House of Usher’   Lovecraft   0.61295   Poe          0.66131   0.04836
‘The Mound’                        Lovecraft   0.4799    Machen       0.69305   0.21315
‘The White People’                 Lovecraft   1.41344   Machen       1.22569   0.18775

Fig. 5 Delta differences for Lovecraft and Poe

Although there is some misattribution here for the windows in ‘The Fall of the House of Usher’, we remind the reader that these scores are unweighted. Thus, for thirty top words, the average differences of 0.898 and 0.302 are weighted (divided by thirty) to 0.030 and 0.010. These small values indicate that the test does not clearly differentiate the styles of the potential authors for this text, and that the result should be considered alongside the results from the other tests or further testing.

For Machen, we again see reflections of our re-
sults for Burrows’ Delta. For ‘The Mound’, the
Lovecraft set has smaller Delta scores for 80.8% of
the windows, with an average difference for those
windows of 1.302 in favor of Lovecraft. For the win-
dows that yield smaller Delta scores for Machen, the
average difference is 0.315. The values for Lovecraft
range from 2.926 to 8.724, and for Machen, from
5.238 to 10.012. For ‘The White People’, 87.5% of
the windows have smaller Delta scores for Machen,
albeit with a smaller average difference of 0.618. For
the windows in favor of Lovecraft, the average dif-
ference is 0.293. The values for Machen range from
7.443 to 15.978, and for Lovecraft, from 7.616 to
16.095.

Taken as a whole, Rolling Delta favors the proper author in every comparison, and the cases of misattribution do not yield large differences in Delta scores. Moreover, Rolling Delta correctly identifies the true author for test texts as small as 4,000 words. Thus, it is sufficiently reliable and will help us test for a segmented collaboration in ‘The Loved Dead’.

5.2 Determining authorship in an established collaboration
To test an established collaboration, we return to
the field of ‘hack writing’, in this case focusing on
stories featuring the character Conan the Barbarian.
Starting in 1950 and ending in 1954, Gnome Press
published five books containing Robert E. Howard’s
original Conan stories. In 1955, a sixth book, Tales
of Conan, was published. However, the stories con-
tained in the sixth book were not original Conan
stories written by Howard. Rather, L. Sprague de
Camp took four existing non-Conan stories by
Howard, recast them as Conan stories, and changed
their titles, resulting in ‘The Road of the Eagles’, The
Flame Knife, ‘Hawks over Shem’, and The Blood-
Stained God.

As these works were originally penned by Howard and then heavily edited by de Camp (including changes to the setting, time period, and main characters), these stories are an example of a specific type of revision, one that borders on posthumous collaboration. Given our interest in the nature of revision versus collaboration, these texts will be compared to the nineteen Conan stories by Howard as well as sixteen non-Conan stories by de Camp (Table 5). For the collaborations, we will use ‘The Road of the Eagles’, The Flame Knife, and ‘Hawks over Shem’ as test texts.

Table 5 Howard and de Camp stories

Howard                                                       de Camp
Beyond the Black River           Red Nails                   The Ameba            The Hostage of Zir
Black Colossus                   Rogues in the House         The Clocks of Iraz   The Inspector’s Teeth
The Devil in Iron                The Scarlet Citadel         The Command          Little Green Men from Afar
Gods of the North                Shadows in the Moonlight    The Emperor’s Fan    The Merman
Hour of the Dragon               Shadows in Zamboula         The Gnarly Man       Nothing in the Rules
Jewels of Gwahlur                The Slithering Shadow       The Goblin Tower     Reward of Virtue
The People of the Black Circle   The Tower of the Elephant   The Guided Man       Two Yards of Dragon
The Phoenix on the Sword         The Vale of Lost Women      The Hardwood Pile    The Unbeheaded King
The Pool of the Black One        A Witch Shall Be Born
Queen of the Black Coast

5.2.1 Function word PCA

Function word PCA clearly divides the authors across the first principal component, and the test texts all cluster with Howard (Fig. 6).15 The analysis of function words reflects that heavily edited texts may retain stylistic similarities to the original author, namely in the use of function words; despite de Camp’s heavy editing of the texts, these three test texts clearly cluster with the Howard samples.
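The kind of projection behind a function word PCA plot can be sketched as below. This is a hedged illustration, not the authors’ pipeline: the helper names (`function_word_matrix`, `pca_scores`) are hypothetical, and NumPy’s singular value decomposition is swapped in for the Matplotlib ‘mlab’ routine described in Note 10. Texts are rows and function-word frequencies are columns, as in Note 11.

```python
import numpy as np


def function_word_matrix(texts, function_words):
    """Rows are texts, columns are relative frequencies of each function word."""
    rows = []
    for tokens in texts:
        rows.append([tokens.count(word) / len(tokens) for word in function_words])
    return np.array(rows)


def pca_scores(matrix, n_components=2):
    """Project standardized rows onto the leading principal components.

    Returns (scores, explained) where `scores` holds one coordinate row per
    text and `explained` is the share of variance for each kept component.
    """
    spread = matrix.std(axis=0)
    spread[spread == 0] = 1.0              # constant columns carry no signal
    z = (matrix - matrix.mean(axis=0)) / spread
    _, singular_values, components = np.linalg.svd(z, full_matrices=False)
    scores = z @ components[:n_components].T
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, explained[:n_components]
```

Plotting the first column of `scores` against the second gives a scatter of the kind shown in the figures; texts by one author cluster when their function-word habits are similar. The sign of each component is arbitrary, so only relative positions are meaningful.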

5.2.2 Burrows’ Delta

Burrows’ Delta corroborates our results from function word PCA, with all three test texts yielding smaller Delta scores for Howard than de Camp (Table 6). This reflects the fact that, in terms of common word usage, the test texts resemble Howard more closely than they do de Camp. Based on these tests, we can infer that even a heavily edited text can still bear stylistic similarities to the original author. Notably, both author sets yield small Delta scores, less than 0.5 for all of them, and Howard’s Delta scores for two of the stories are smaller by a margin greater than 0.2. These results imply that both author sets are closer in word usage to the test texts than the Tarkington, Poe, and Machen sets are to ‘The Mound’, or than the Lovecraft set is to those authors’ respective test texts.

5.2.3 Rolling Delta

The results for Rolling Delta similarly reflect the clear stylistic resemblance of the common word usage in the test texts to that of the works of Howard. For ‘Hawks over Shem’, 81.1% of the windows yield smaller Delta scores for Howard, with an average difference of 2.937, whereas the average difference for the windows in favor of de Camp is 1.559. The Delta scores for Howard range from 5.430 to 11.871 and from 6.207 to 17.053 for de Camp.

For The Flame Knife, 66.7% of the windows favor Howard with an average difference of 2.691. The windows in favor of de Camp have an average difference of 1.606. For Howard, the values range from 6.638 to 14.814, and from 5.652 to 16.188 for de Camp.

For ‘The Road of the Eagles’, 71.7% of the win-
dows favor Howard with an average difference of
3.609, compared to the average difference of 3.444
for the windows in favor of de Camp. For Howard,
the values range from 5.646 to 13.813, and from
5.430 to 18.023 for de Camp.

For each text, several windows are stylistically similar to de Camp; this could stem from the authors’ varying word usage, or reflect sections more heavily rewritten by de Camp. Regardless, the results indicate that even though heavy editing can shift common word usage toward the style of the editing author, the style of the original author can remain predominant.

6 Results for ‘The Loved Dead’

6.1 Function word PCA results
The Lovecraft and Eddy sets differentiate with
minor overlap, and ‘The Mound’ is correctly clus-
tered with Lovecraft’s texts; ‘The Loved Dead’ clus-
ters toward the rightmost edge of the Eddy set,
although we must again note that only 25 words
were tested here for the reasons mentioned in
Note 11 (Fig. 7).

These results would imply that ‘The Loved Dead’ is closer in style to Eddy in terms of common function word usage, although not as drastically as ‘The Mound’ is to Lovecraft. These results could further imply that this test struggles to differentiate the authors, although the ability to accurately categorize ‘The Mound’ so strongly would provide counter-evidence to that claim. Still, another test of common word usage that approaches the measurements differently and considers a less selective set is useful in corroborating or contradicting these findings, and so we look to Burrows’ Delta.

Fig. 6 Function word PCA results for Howard and de Camp

6.2 Burrows’ Delta results
Preliminary Burrows’ Delta tests on ‘The Mound’ with thirty top words and no subdivision clearly distinguish the two authors: Lovecraft’s set has a Delta score of 0.37722, and Eddy’s of 0.73271 (Table 7). The test captures large differences that indicate Lovecraft is stylistically closer in terms of common (function and non-function) word usage. The difference of 0.355 between the two authors is notable for Delta scores, and the low score for the Lovecraft set corroborates the picture that Lovecraft is stylistically closer to the author of ‘The Mound’ than Eddy is.

When ‘The Loved Dead’ is the test text, the results are less striking: Lovecraft’s set has a Delta score of 1.03259, and Eddy’s, 0.96894, resulting in a difference of 0.063651 in favor of the Eddy set. Nonetheless, the results show that Eddy is closer in common word usage than Lovecraft, although the margin falls below the 0.1 threshold established above.

One possible reason for the lack of differences as large as those seen for ‘The Mound’ is that ‘The Loved Dead’ is a significantly shorter text, at nearly one-seventh of the Lovecraft story’s token count. Thus, in order to find out how this difference compares to other Delta scores, we perform Burrows’ Delta with each of the Lovecraft and Eddy stories as the test text one at a time, tested against the Lovecraft and Eddy subsets. In each case, we remove the test text from the respective author subset. This allows us to see how many false positives are found in stories for which we know the author, and to find the range for the differences in Delta scores.

We visualize the results for these tests by plotting the difference between Delta scores for each story, with a negative difference reflecting a smaller Delta score for Lovecraft, and a positive difference reflecting a smaller Delta score for Eddy. Thus, a story that reads closer to Eddy will have a positive difference, while a story that reads closer to Lovecraft will have a negative difference. We find that there are no false positives among the Lovecraft stories, but two among the Eddy stories: ‘The Ghost-Eater’ and ‘Deaf, Dumb, and Blind’ (Fig. 8).
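The hold-one-out procedure and sign convention just described can be sketched as follows. The function name `leave_one_out_differences` and the `delta_fn` parameter are hypothetical; `delta_fn` is a placeholder for any routine that takes a test text and the remaining author subsets and returns one Delta score per author. This is an illustration of the procedure, not the authors’ code.

```python
def leave_one_out_differences(author_sets, delta_fn, negative_author, positive_author):
    """Test every known story against both subsets with that story removed.

    `author_sets` maps author -> {title: tokens}. The difference is
    Delta(negative_author) - Delta(positive_author), so it is negative when
    `negative_author` has the smaller score and positive otherwise.
    Returns a list of (title, true_author, difference, is_false_positive).
    """
    results = []
    for true_author, works in author_sets.items():
        for title, tokens in works.items():
            # Remove the test text from its own author's subset.
            held_out = {
                author: {t: toks for t, toks in texts.items() if t != title}
                for author, texts in author_sets.items()
            }
            scores = delta_fn(tokens, held_out)
            difference = scores[negative_author] - scores[positive_author]
            predicted = positive_author if difference > 0 else negative_author
            results.append((title, true_author, difference, predicted != true_author))
    return results
```

Calling it with `negative_author="Lovecraft"` and `positive_author="Eddy"` reproduces the convention used in the text: stories that read closer to Eddy come out positive, and any known story whose sign points to the other author is flagged as a false positive.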

The differences in Delta scores for these stories are smaller than those for most of the other tests, which implies that the common word usage is not nearly as distinguishing as it is for the other texts. To further investigate the false positives, we reapply the method we used when Delta scores for Lovecraft and Poe tested against ‘The Fall of the House of Usher’ yielded a false positive: we test Delta scores for word counts ranging from 30 to 100 in increments of ten. When we perform these tests, we see only confirmation that the styles of the stories are closer to Lovecraft, rather than the reversal in favor of the credited author that we saw with ‘The Fall of the House of Usher’ (Fig. 9). ‘Deaf, Dumb, and Blind’ in particular consistently yields a score that favors Lovecraft by a margin greater than 0.1.

Fig. 7 Function word PCA results for Lovecraft and Eddy

Table 6 Burrows’ Delta results for Howard and de Camp

Test                       Author 1   Delta 1   Author 2   Delta 2   Diff
‘Hawks over Shem’          Howard     0.25829   de Camp    0.33691   0.07862
The Flame Knife            Howard     0.23253   de Camp    0.46937   0.23684
‘The Road of the Eagles’   Howard     0.17150   de Camp    0.37429   0.20279

‘The Ghost-Eater’ and ‘Deaf, Dumb, and Blind’ are the only two stories that Delta fails to distinguish in our testing. The data for the function word PCA graphs reveal that the two rightmost Eddy values (i.e. those closest to the Lovecraft cluster, which implies they are less distinctly similar to Eddy in terms of function word usage) are ‘The Ghost-Eater’ and ‘Deaf, Dumb, and Blind’ (Fig. 7). This is notable because these are two stories on which Lovecraft is also known to have done revision work. In terms of ‘The Ghost-Eater’, Lovecraft (as cited in Derleth, 1966) claims in a 1923 letter to Muriel Eddy that he ‘made two or three minor revisions’, which Joshi (2010) finds agreeable, claiming that he ‘cannot detect much actual Lovecraft prose, unless he was deliberately altering his style’ (p. 466). In regard to ‘Deaf, Dumb, and Blind’, there is little in terms of Lovecraft’s own claims, although Joshi adds that, while Eddy only implies that Lovecraft fixed up the last paragraph, ‘in truth, the entire tale was probably revised, although again Eddy very likely had prepared a draft’ (p. 467). Although the margins for Delta scores with ‘The Ghost-Eater’ are relatively small, the differences for ‘Deaf, Dumb, and Blind’ are large enough to support the claim that Lovecraft’s style is as present as, if not more present than, Eddy’s.

Both stories are actually closer to Lovecraft’s style in terms of common word usage than ‘The Loved Dead’, and are the only two of the Eddy subset for which that is true. Thus, we have inadvertently found that these three stories are the most difficult to differentiate stylistically between Lovecraft and Eddy; therefore, the stories have measurable similarities to the former’s style. When we consider the Delta scores that strongly favored Howard over de Camp, the scores for ‘The Loved Dead’, ‘The Ghost-Eater’, and ‘Deaf, Dumb, and Blind’ could provide evidence toward claims that Lovecraft revised the stories substantially, because even de Camp’s extensive rewriting did not make his common word usage more predominant in the texts. For ‘The Loved Dead’, this would imply that Lovecraft rewrote at least portions of the story, and the small difference in scores could imply that he did indeed rewrite the second half.

When we apply the varying word count methodology to Delta tests for ‘The Mound’ and ‘The Loved Dead’, we find a corroboration of our previous findings for the two stories: ‘The Mound’ is definitively closer in style to the Lovecraft set, and ‘The Loved Dead’ is not significantly differentiated, as the author with the smaller Delta score changes with the number of top words considered, and the two Delta scores remain within 0.1 of each other (Fig. 10).

Fig. 8 Delta differences for Lovecraft and Eddy

Fig. 9 Delta differences for Lovecraft and Eddy

Table 7 Burrows’ Delta results for Lovecraft and Eddy

Test               Author 1    Delta 1   Author 2   Delta 2   Diff
‘The Mound’        Lovecraft   0.37722   Eddy       0.73271   0.35549
‘The Loved Dead’   Lovecraft   1.03259   Eddy       0.96894   0.063651

6.3 Rolling Delta results
Our Rolling Delta results largely reflect our
Burrows’ Delta results.16 For ‘The Mound’, 94.7%
of the windows have smaller Delta scores for the
Lovecraft set when compared to Eddy, with an aver-
age difference in Delta scores of 1.791. For the win-
dows that favor Eddy, the average difference is
0.186. The Delta scores for Lovecraft range from
3.301 to 10.301, and for Eddy they range from
4.530 to 12.008. For ‘The Loved Dead’, interestingly,
Eddy yields smaller Delta scores for 100% of the
windows, with an average difference of 1.996. The
values range from 5.624 to 7.771 for Eddy, and from
7.724 to 9.838 for Lovecraft.

To further investigate these results, we look into the effects of culling. Because ‘The Loved Dead’ is such a small text, and we are already working with shorter and fewer texts, the effects of culling can be severe. Even common words might not show up in a smaller text, and thus would be removed from the set due to our decision to set the culling rate to 100. When the culling rate is 100, we do not control the number of top words considered, which means the count can be below thirty. We therefore repeat the test with no culling and with 30, 50, and 100 top words.
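The effect of culling on the word list can be illustrated with a short sketch. It assumes, consistent with the description above, that a culling rate of 100 keeps only words occurring in every text; `cull_vocabulary` and `top_words` are hypothetical names, and the authors’ tool may differ in detail.

```python
from collections import Counter


def cull_vocabulary(texts, culling_rate=100):
    """Keep words that occur in at least `culling_rate` percent of the texts."""
    document_frequency = Counter()
    for tokens in texts:
        document_frequency.update(set(tokens))   # count each word once per text
    needed = len(texts) * culling_rate / 100
    return {word for word, n in document_frequency.items() if n >= needed}


def top_words(texts, n_top, culling_rate=0):
    """Most frequent words across `texts` that survive culling.

    With culling, fewer than `n_top` words may remain, which is why the
    number of words actually compared can fall below the requested count.
    """
    allowed = cull_vocabulary(texts, culling_rate)
    counts = Counter(tok for tokens in texts for tok in tokens if tok in allowed)
    return [word for word, _ in counts.most_common(n_top)]
```

With a culling rate of 0 every word is eligible and the list is limited only by `n_top`, which corresponds to the no-culling runs with a fixed number of top words.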

When we perform Rolling Delta with thirty top words (Fig. 11), our results seem in line with our findings in Burrows’ Delta. Sixty percent of the windows favor Lovecraft, with an average difference of 0.435, compared to the average difference of 0.311 for the windows that favor Eddy. The values for Lovecraft range from 14.016 to 15.669, and from 13.933 to 16.770 for Eddy. The windows are almost evenly split in terms of which author has a smaller Delta score, which reflects the fact that the test cannot clearly differentiate the styles of the authors. However, the windows with the largest difference in Delta scores are the last two, which, due to the window size, encapsulate the entire second half of the story. The differences for these windows are 1.249 and 1.470, which are greater than the average differences by a factor of approximately 4.

When we perform Rolling Delta with 50 (Fig. 12)
and 100 (Fig. 13) top words, the Lovecraft set gar-
ners a smaller Delta score for every window, with
respective average differences of 1.036 and 1.448.
The values for the tests with fifty words range
from 16.738 to 18.735 for Lovecraft and from
17.217 to 20.263 for Eddy. The values for the tests
with 100 words range from 21.848 to 23.864 for
Lovecraft and from 23.278 to 26.088 for Eddy. In
both cases, the last two windows yield the largest
difference in Delta scores. For Rolling Delta with
fifty top words, the differences for the last two win-
dows are 2.537 and 2.828—approximately twice the
average difference—and for 100 top words, the dif-
ferences for the last two windows are 2.406 and
2.732—again approximately twice the average
difference.

7 Conclusion

7.1 Interpretation of results
Our tests consistently reveal that ‘The Loved Dead’ does not bear an overwhelming stylistic similarity to Lovecraft or Eddy, which is consistent with the conjecture that the story is a collaboration. We performed lexical richness PCA to see if we could detect stylistic similarities in infrequently used words. However, these tests did not reveal anything regarding the authorship of ‘The Loved Dead’ and are known to be of questionable reliability in general. We performed function word PCA because the analysis of common word usage is more reliable, and this test indicated that ‘The Loved Dead’ is closer in terms of function word usage to Eddy’s style, although it does share similarities with Lovecraft’s.

Fig. 10 Delta differences for Lovecraft and Eddy

We selected Burrows’ Delta because it proved to be the most reliable in our preliminary tests, and because it allows us to look at most common word usage and not just function word usage. Again, our results showed that ‘The Loved Dead’ is stylistically closer to Eddy, but also has stylistic similarities to Lovecraft, which could imply that Lovecraft’s contribution to the story was a larger revision akin to de Camp’s revision of Howard’s stories. Further study on the relationship between the extent of a collaboration and the margin of Delta scores is warranted. Finally, we also tested with Rolling Delta to see if ‘The Loved Dead’ might be a segmented collaboration. While our results generally suggest that both authors’ styles are detectable in the whole story (again implying that Lovecraft likely rewrote sections of the text), Lovecraft’s is particularly noticeable in the story’s second half, which corroborates his epistolary claim.

The results for all four tests, when considered in tandem, suggest that Lovecraft likely edited ‘The Loved Dead’, perhaps extensively, but did not write the entirety or majority of the story as it appeared in Weird Tales. At most, it appears that his edits focused on the second half of the story, as he stated in his letters. This claim supports a middle ground between the claims made by Joshi and Eddy’s family. No definite claims can be made, as this information is only evidence and should not be considered an endpoint for scholarship on the matter, but the evidence certainly goes against any claim that either author is solely responsible for ‘The Loved Dead’. We have found that the same is true for ‘The Ghost-Eater’ and ‘Deaf, Dumb, and Blind’.

Fig. 11 Rolling Delta results for Lovecraft and Eddy

Fig. 12 Rolling Delta results for Lovecraft and Eddy

Our set of tests allows for a deeper understanding of the extent to which multiple authors might have contributed to a text. Although the tests do require interpretive work by the person performing the tests, the use of these four tests in tandem is useful because each provides unique insights while potentially corroborating the results found in the other three. Although the results of stylometric analyses cannot demonstrate causality, mathematical methods can describe features that correlate with authorial identity, which is particularly useful when historical evidence is not present. Further, we find that ‘hack writers’ of the pulp market provide particularly fruitful test cases of collaborative authorship. Future analyses of collaborative authorship might extend our findings by applying our methods to other collaborative case studies, comparing our methods with machine-learning methods, or studying how ‘collaborative proportion’ affects Delta scores.

Fig. 13 Rolling Delta results for Lovecraft and Eddy

References
Binongo, J. N. G. (2003). Who wrote the 15th book of Oz? An application of multivariate analysis to authorship attribution. Chance, 16(2): 9–17.

Burrows, J. (1989). ‘An ocean where each kind . . .’: statistical analysis and some major determinants of literary style. Computers and the Humanities, 23(4/5): 309–21.

Burrows, J. (2003). Questions of authorship: attribution and beyond: a lecture delivered on the occasion of the Roberto Busa Award ACH-ALLC 2001, New York. Computers and the Humanities, 37(1): 5–32.

Cox, H. and Mowatt, S. (2014). Revolutions from Grub Street: A History of Magazine Publishing in Britain. Oxford: Oxford UP.

de Camp, L. S. (1975). Lovecraft: A Biography. London: New English Library.

Derleth, A. W. (1966). [Introduction]. In Divers Hands and H. P. Lovecraft (Author), The Dark Brotherhood and Other Pieces. Sauk City, WI: Arkham House, pp. ix–x.

Eddy, M. (1998). The gentleman from Angell Street. In Cannon P. H. (ed.), Lovecraft Remembered. Sauk City, WI: Arkham House, pp. 49–64. (Original work published 1961)

Grieve, J. (2007). Quantitative authorship attribution: an evaluation of techniques. Literary and Linguistic Computing, 22(3): 251–70.

Holmes, D. I. (1992). A stylometric analysis of Mormon scripture and related texts. Journal of the Royal Statistical Society. Series A (Statistics in Society), 155(1): 91–120.

Holmes, D. I. (1998). The evolution of stylometry in humanities scholarship. Literary and Linguistic Computing, 13(3): 111–7.

Honoré, A. (1979). Some simple measures of richness of vocabulary. Association for Literary and Linguistic Computing Bulletin, 7: 172–7.

Hoover, D. L. (2004). Delta prime? Literary and Linguistic Computing, 19(4): 477–95.

Joshi, S. T. (1981). A note on the texts [Foreword]. In Joshi S. T. (ed.) and H. P. Lovecraft (Author), The Horror in the Museum. New York: Del Rey, Reprint edn., pp. xv–xviii.

Joshi, S. T. (2010). I Am Providence: The Life and Times of H. P. Lovecraft. New York: Hippocampus.

Joshi, S. T. (2011). [Introduction]. In Joshi S. T. (ed.) and H. P. Lovecraft (Author), The Crawling Chaos and Others: The Annotated Revisions and Collaborations of H. P. Lovecraft, Vol. 1. Welches, OR: Arcane Wisdom, pp. 7–17.

Juola, P. (2006). Authorship attribution. Foundations and Trends in Information Retrieval, 1(3): 233–334.

Kuiper, S. and Sklar, J. (2012). Principal component analysis: stock market values. In Practicing Statistics: Guided Investigations for the Second Course, 1st edn. Upper Saddle River, NJ: Pearson, pp. 332–68.

Lang, A. (1896). In defense of the literary hack. Current Literature, 20: 15.

Look, D. M. (2012). Statistics in the Hyborian Age: an introduction to stylometry. In Prida J. (ed.), Conan Meets the Academy: Multidisciplinary Essays on the Enduring Barbarian. Jefferson, NC: McFarland, pp. 103–22.

Lovecraft, H. P. (1993). Letters to Robert Bloch. S. T. Joshi and D. E. Schultz (Eds.). West Warwick, RI: Necronomicon Press.

Lovecraft, H. P. (2005). Letter to Lillian D. Clark [13 December 1925]. In Joshi S. T. and Schultz D. E. (eds), Letters from New York, vol. 2. Lovecraft Letters. San Francisco: Night Shade, p. 252.

Morton, A. Q. (1978). Literary Detection. New York: Scribners.

Mosteller, F. and Wallace, D. L. (1964). Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley.

Reeve, J. K. (1910). Practical Authorship. Ridgewood, NJ: Editor Company.

Rybicki, J., Hoover, D., and Kestemont, M. (2014). Collaborative authorship: Conrad, Ford, and rolling delta. Literary and Linguistic Computing, 29: 422–31.

Smith, M. W. A. (1987). Hapax legomena in prescribed positions: an investigation of recent proposals to resolve problems of authorship. Literary and Linguistic Computing, 2: 145–52.

Tweedie, F. J. and Baayen, R. H. (1998). How variable may a constant be? Measures of lexical richness in perspective. Computers and the Humanities, 32: 323–52.

Yule, G. U. (1944). The Statistical Study of Literary Vocabulary. Cambridge: Cambridge University Press.

Notes
1 This opinion seems to have evolved over time; writing three decades prior, Joshi (1981) states, ‘the two authors probably contributed equally’ (p. xvii).

2 $T = V/N$, where $V$ is the number of distinct words in a text and $N$, the total number of words. $H = V_2/V$, where $V_2$ is the number of words appearing exactly twice in a text. $R = \frac{100 \log(N)}{1 - V_1/V}$, a value first suggested and defined in Honoré (1979). $K = \frac{10^4 \left( \sum_i i^2 V_i - N \right)}{N^2}$, where $V_i$ is the number of words appearing $i$ times; this value was first suggested and defined in Yule (1944).

3 The process for performing PCA is outlined by Kuiper
and Sklar (2012).

4 A corpus is defined as any collection of texts. In our case, we have three subsets: ‘The Loved Dead’, Lovecraft texts, and Eddy texts.

5 The mathematics behind calculating Delta are described more completely in Burrows (2003), but are outlined in short here. A main set (or, to use a familiar term, corpus) is created from all of the texts under consideration. Any number of the most frequent words between 30 and 150 are found, and Burrows separates homographs such as ‘before’, which can be either a conjunction or preposition, although the practice is not necessary. He then finds the percentage that each of these words takes up for each text of the corpus, and standardizes these percentages based on the mean and standard deviation for that word across the texts. A Delta score for a subset is the average, over the words being considered, of the absolute difference between the key text’s z-score for a word and the subset’s average z-score for that word across all the texts in that subset. Thus, if we have a key text $k$ and author subset $a$, the Delta score for $a$ for a list of $n$ words is:

\[ \mathrm{Delta}(a) = \frac{\sum_{i=1}^{n} \left| z_{ki} - z_{ai} \right|}{n}. \]

6 The discarded draft of The Shadow Over Innsmouth, a
purposefully experimental piece, is not included, as it
was a deliberate attempt by Lovecraft to depart from his
usual style (Joshi, 2010, p. 791). ‘The Thing in the
Moonlight’, which was adapted from notes in one of
Lovecraft’s letters, is also not included, as he did not
write the published piece (Joshi, 2010, p. 697).

7 We initially ran tests with this set and a larger set of all
of Lovecraft’s horror stories, which excludes his prose
poems, humorous stories, collaborations, ghost-written
projects, and so-called ‘Dream Cycle’ stories. All of the
results were largely identical and did not add insight
into our testing.

8 ‘The Mound’ contains 29,019 tokens and ‘The Loved
Dead’, 3,909.

9 Tokenization involves three major steps. First, we remove apostrophes, semicolons, commas, and periods. One consequence of this process is that contractions can be confused with homographs, e.g. ‘can’t’ and ‘cant’; however, this does not skew our results significantly. Second, we tokenize the texts with NLTK’s tokenization function. Third, we remove any remaining punctuation or leftover odd characters such as HTML tags from the tokenized list. The remaining list consists of words only, which allows us to perform the frequency tests.

10 We find the values used to calculate T, H, R, and K, as well as word frequencies, using methods in NLTK’s FreqDist class, which creates a frequency distribution of the tokens in a text. We utilize the NumPy package and SciPy’s ‘stats’ package. The former allows us to store data in a matrix format for use in PCA computation, and the latter allows us to standardize values as z-scores (http://www.scipy.org). We perform PCA using Matplotlib’s ‘mlab’ module, which mimics MATLAB functions (http://www.matplotlib.org).

11 PCA, in our case, treats texts as rows and measurements as columns, and the tool we use to run PCA (Matplotlib’s ‘mlab’ module) requires that there be more rows than columns.

12 We have chosen to perform these tests using only two authors at a time in order to most closely emulate the testing procedure we will use for ‘The Loved Dead’.

13 We use thirty as the default word count because it is, generally, sufficient. It also avoids any potential complications from including too many top words, e.g. incorporating words that are used commonly in only one set. This is the reason that, when the Delta scores are close to each other, we try a range of values between 30 and 100: it gives us a complete picture of the differences in common word usages between the sets, accounting for the potential errors associated with the extremes of our word count range.

14 We run this test at a culling rate of 100. Unless otherwise stated, the window size is 5,000 and the step size is 1,000.

15 Two de Camp samples are not visible in the selected window. They carried such highly positive values for the first principal component that their inclusion made the samples on the graph more difficult to discern.

16 We use a window size of 1,500 words and a step size of 100 words for tests with ‘The Loved Dead’, as that is the low end of the range stated by Burrows for Delta.
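The tokenization steps in Note 9 and the lexical richness measures in Note 2 can be sketched together as follows. This is a hedged illustration, not the authors’ code: `tokenize` and `lexical_richness` are hypothetical names, plain whitespace splitting stands in for the NLTK tokenizer, a regular-expression filter stands in for the final clean-up step, and the natural logarithm is assumed for R.

```python
import math
import re
from collections import Counter

# A token counts as a word if it is made of letters, optionally hyphenated.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize(text):
    """Three-step tokenization loosely following Note 9."""
    # Step 1: remove apostrophes, semicolons, commas, and periods.
    stripped = re.sub(r"['’;,.]", "", text)
    # Step 2: split into tokens (whitespace splitting stands in for NLTK).
    tokens = stripped.split()
    # Step 3: drop anything that is not a plain word, e.g. leftover tags.
    return [token for token in tokens if WORD.fullmatch(token)]


def lexical_richness(tokens):
    """Return T, H, R, and K as defined in Note 2."""
    n = len(tokens)                                  # N: total words
    spectrum = Counter(Counter(tokens).values())     # spectrum[i] = V_i
    v = sum(spectrum.values())                       # V: distinct words
    v1, v2 = spectrum[1], spectrum[2]
    return {
        "T": v / n,
        "H": v2 / v,
        # R is undefined when every word occurs exactly once (V_1 = V).
        "R": 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf,
        "K": 1e4 * (sum(i * i * vi for i, vi in spectrum.items()) - n) / n ** 2,
    }
```

The contraction issue mentioned in Note 9 is visible here: after the first step, ‘can’t’ becomes ‘cant’ and is counted as that word.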
