

Qu’est-ce qu’un texte numérique?—
A new rationale for the digital
representation of text

Joris J. van Zundert

Royal Netherlands Academy of Arts and Sciences, The Netherlands

Tara L. Andrews

University of Vienna, Austria

Abstract
In this article we aim to provide a minimally sufficient theoretical framework to
argue that it is time for a re-conception of the notion of text in the field of digital
textual scholarship. This should allow us to reconsider the ontological status of
digital text, and that will ground future work discussing the specific analytical
affordances offered by digital texts understood as digital texts. Following from the
argument of Suzanne Briet regarding documentation, referring to Eco’s under-
standing of ‘infinite semiosis’, and accounting for the reciprocal effects between
carrier technology and meaning observed by McLuhan, we argue that the func-
tions of document and text are realized primarily by their fluid nature and by the
dynamic character of their interpretation. To define the purpose of textual schol-
arship as a ‘stabilisation’ of text is therefore fallacious. The delusive focus on
‘stability’ and discrete ‘philological fact’ gives rise to a widespread belief in textual
scholarship that digital texts can be treated simply as representations of print or
manuscript texts. On the contrary—digital texts are texts in and of themselves in
numerous digital models and data structures which may include, but is not
limited to, text meant for graphical display on a screen. We conclude with the
observation that philological treatment of these texts demands an adequate digi-
tal and/or computational literacy.


In 1951 Suzanne Briet asked the question ‘Qu’est-ce
que la documentation?’, ruminating on what it
is that documentation does, what constitutes a
document, and what does not. She departed from
a linguist–philosophical definition of ‘document’—
‘Tout indice concret ou symbolique, conservé, ou
enregistré, aux fins de représenter, de reconstituer
ou de prouver un phénomène ou physique ou intel-
lectuel.’ (Briet, 1951)—and ultimately proposed a
new understanding of the concept of document,
one much more fluid than the writings on paper
that we usually associate with the term.

Une étoile est-elle un document? Un galet roulé
par un torrent est-il un document? Un animal
vivant est-il un document? Non. Mais sont des
documents les photographies et les catalogues
d’étoiles, les pierres d’un musée de minéralogie,
les animaux catalogués et exposés dans un Zoo.
(Briet, 1951)

A rock on the ground, for example, may not have
information communication significance. The same
stone as a part of a museum’s geological collection,
on the other hand, may document the type of rock

Correspondence: Joris J. van Zundert, Huygens Institute for the History of the
Netherlands, Royal Netherlands Academy of Arts and Sciences, Amsterdam,
The Netherlands. E-mail: joris.van.zundert@huygens.knaw.nl

Digital Scholarship in the Humanities, Vol. 32, Supplement 2, 2017. © The Author 2017. Published by Oxford University
Press on behalf of EADH. All rights reserved. For Permissions, please email: journals.permissions@oup.com


doi:10.1093/llc/fqx039 Advance Access published on 4 August 2017

Downloaded from https://academic.oup.com/dsh/article-abstract/32/suppl_2/ii78/4065124
by guest
on 09 November 2017



found in a certain geological layer or area. Her most
famous example, that of an antelope, becomes
documentation of its species as soon as it is cap-
tured by an explorer and housed in a zoo; even its
corpse can be preserved and thus maintained as a
document even after its death. Briet in this way ex-
panded our notion of the ontological and epistemo-
logical status of the concept of ‘document’.

For Briet, documentation ‘was a scientific activity
of the greatest importance’ (Hearns Bishop, 2003, p.
12)—a positive and unifying force, a foundational
work for science, and an inscription technology
that allowed knowledge to be codified, connected,
and spread (Bede, 2007). Most salient to our argu-
ment is that documentation is situated; it is an act of
interpretation made from a particular cultural and
historic context (Hearns Bishop, 2003, pp. 12–13).
Thus, when Briet’s antelope becomes a document
and becomes a part of the document we call tax-
onomy, it is also a constituent part of a specific cul-
turally induced worldview. The antelope document
inscribes the meaning of an antelope according to a
certain specific culture. Briet evokes the sheer prolific
power of documentation to establish these meanings:
‘In our age [. . .] the least event, scientific or political,
once it has been brought into public knowledge im-
mediately becomes weighted down under a ‘‘vest-
ment of documents’’’: a new sub-species of
antelope inspires a newspaper item, is described in
various scientific articles, a specimen gets loaned to
an exhibition, a taxonomic description is made, etc.
(Briet, 1951, 2006; Hearns Bishop, 2003, p. 12).

Briet thus advocated a shift in understanding of
what a document is, urging her readers to focus on
its function rather than the form in which one nor-
mally expects to find it. If we take such a purely
functional view, then a document is anything that,
on the material level, is used by humans to commu-
nicate information to other humans.

Briet’s understanding is thus that the concept of
document is fluid. Documentation is not merely prolific,
it is also transformative. Each act of documentation
sprouts yet more documents, which we now understand
can be of any kind, and in doing so transforms the
inscription of knowledge that the source document
contained. Remarking on Briet in this respect, Bishop
notes that her perspective is still important today
‘because her theories allow us to view a wide variety of
information objects in terms of their relationships’.
These relationships are transformative: documentation
‘is a surrogate artifact, it is an interpretation of the
artifact. In effect, a new ‘artifact’ in the form of docu-
mentation is created to serve as a surrogate for the
artifact’ (Hearns Bishop, 2003, p. 15).

If, thanks to Briet, the concept of ‘document’ is a
fluid one, then so must be the concept of ‘text’;
while text is a feature commonly found in docu-
ments—indeed documents, in the broad under-
standing that Briet defined them, are the carriers
of text—text is no more bound by the document
than documentation is bound by the text it carries.
However, that does not mean that text and docu-
ment do not influence each other. The relationship
between medium and message and between technol-
ogy and meaning is reciprocal (McLuhan, 2003
(1964)). As Alan Kay (1993) put it: ‘I had a very
McLuhanish feeling about media and environments:
that once we’ve shaped tools, in his words, they turn
around and ‘‘reshape us.’’’ The means of documen-
tation in part shapes (or contributes to the shaping
of) the interpretation that a document inscribes.
A taxonomy for instance inclines toward enforce-
ment of its categories: the decision to put a
specimen in a certain category depends not only
on the judgment of, e.g., a biologist but also on
the trade-off to be made between the convenience
of the existence of a category that more or less fits
and the effort it takes to create one that might per-
haps be more fitting, as well as the ramifications that
this new category might have for the overall struc-
ture of the taxonomy. The medium of any text tends
likewise to influence the shaping of the text—in-
scription in stone demands austerity, and a digital
text editor invites prolixity.

The medium shapes the text; in the same vein
then the medium shapes to an extent the meaning
of the text. The prolific nature of documentation
Briet pointed to is also the prolific nature of infor-
mation: cultures use media of any kind to prolifer-
ate information. The medial shaping of text is
causally connected to the remediation that happens
when we push information onto a new medium to
communicate, store, transform, or analyze it (Bolter
and Grusin, 2000). But text is volatile and stable in



equal measure: even while the medium shapes a
text, so that text can be conveyed across multiple
media, it can still be recognized and meaningful as
the same text. Text thus adapts with great ease to
any new medium—from oral to pictorial to
inscribed to time- or digitally based storage
media—and will permeate nearly any medium
within a short amount of time. Text is constantly
moved, copied, translated, paraphrased, re-written,
re-contextualized, and re-mixed. Each occurrence of
any of these acts produces a new text, the result of
the act, distinct from the text or texts that went into
the act, and yet recognizable nevertheless as ‘the
same text’. This is the textual condition, as it was
defined by Jerome McGann (1991). We can under-
stand text as a stable entity, but the harder we try to
stabilize it, the more stubbornly it refuses to be
bolted down.

Although philologists perceive the unwillingness
of texts to be pinned down as ‘philological fact’ as a
problem (McGann, 2013, van Zundert, 2016, pp.
375–60), for Umberto Eco this is their most salient
feature, as it allows for ‘infinite semiosis’ (Eco,
1981). Eco argues that the sign has its roots in
omens: natural phenomena that could be inter-
preted as predictors of natural events. Dark clouds
forebode rain, and smoke above the forest signals a
fire. This dynamic aspect of inference and interpret-
ation is pivotal for the sign function. Only well after
the invention of writing does an identity function
begin to be attributed to signs as words; this process
results from a unification of a theory of signs and
one of language that, according to Eco, finds its
culmination in the work of Augustine. In written
language, denotation, i.e. the assertion that a word (or
sign) uniquely refers to some real-world antecedent,
becomes strongly foregrounded. According to Eco:
‘problems derive from the fact that contemporary
theories of sign have been dominated by a linguistic
model, and a wrong one at that [. . .] where signs are
conceived of as being intentionally emitted and con-
ventionally coded, linked by a bi-conditional bond
to their definition, subject to analysis in terms of
lesser articulatory components, and syntagmatically
disposed according to a linear sequence’ (Eco, 1981,
p. 39). He refers to C. S. Peirce to argue that a sign’s
primary function is not to signify some identity with

a real-world phenomenon; that is, a word as a sign is
not strongly linked to one single meaning. Instead, a
sign works through inference, or interpretation.
Reading, interpretation, and understanding do not
operate by scientific induction or deduction to
make predictions about the meaning of a sign.
Rather, these are abductive processes: given a
word, a reader instantly hypothesizes about possible
meanings of that word and of its relation to words
in its context, based on her own tacit knowledge.
Reading a text is therefore not the decoding of a
sequence of identity relations from text to real-
world objects and events but the construction of
meaning through a process, executed by the
reader, of structuring hypotheses.

It is this phenomenon of abduction-based reasoning
about the meaning of signs in a text that
drives the infinite process of interpretation and
reinterpretation to which readers subject a text.
This same phenomenon can be said to underlie
Barthes’ (1975) concept of ‘writerly’ text: as a
reader reads, she is constructing a new ‘cognitive’
text from a sequence of words. This text is ‘writerly’
because the reader is in a sense re-writing the text—
she is both re-constructing its meaning and there-
fore also adorning and changing its meaning be-
cause the hypothetical or abductive way of
decoding signs allows for—or rather, is the direct
cause of—variation in interpretation. This variation
would not be possible if words (and signs) really
only existed as a unique referential relationship of
identity.

Thus we cannot read but by interpretation. The
text-as-signs is the inscription on the page, on the
screen, or on the disk, but that is in a sense the least
enticing aspect of text. It could be argued that the
serialization of a text as words on paper or as bits in
electronic storage exists merely as an affordance for
the reader to hypothesize about the meaning of the
‘actual’ text that is being cognitively constructed as
she reads. It is clear that this new ‘writerly’ and
mental representation of the text is an ephemeral
one, existing only in the mind of the reader. It is
moreover a text that is ontologically different from
what might be recorded in signs on the page. In
order for its meaning to be fully realized, a text
must thus undergo an ontological shift, from



existence as signs on paper to existence as a cogni-
tive representation in a reader’s mind. How could
we say that what is in the mind of the reader is ‘not’
a text?

Where the act of documenting creates a surrogate
of a phenomenon or artifact, the act of reading pro-
duces a cognitive surrogate of a text that is itself
already a surrogate: the signs on paper that repre-
sent a cognitive text that once existed in an author’s
mind. To exist as communicated meaning, the text
has undergone not one but two shifts in its onto-
logical status, and arguably has become three texts
along the way—and yet it is said that both author
and reader have acted upon the same text.

Documentation codifies and inscribes how a cer-
tain culture understands its world. As Briet showed,
the volatility of the concept of document and the
prolific nature of documentation are fundamental
to this function. Without a rapid proliferation of
copies, surrogates, derivatives, and remediations,
documentation fails in one of its major purposes.
Text inscribes information that codifies some
understanding about the world. Its carriage by
these rapidly proliferating documents makes text
volatile with respect to form, medium, and, following
McLuhan (2003 (1964)), meaning. The abductive
process of interpreting and understanding
texts, which according to Eco we cannot escape,
makes their meaning yet more volatile. Moreover,
to achieve its purpose of communicating some
meaning, the volatility of texts—and quite possibly
that of documents too—must also ‘fundamentally’
encompass a negotiation of boundaries between
modes of being, that is an ontological shift from
being as signs-in-a-medium toward being as cogni-
tive representation.

Thus, the functions of document and text are
realized through dynamic processes. The idea
that the purpose of textual scholarship is to
‘stabilize’ a text is therefore an audacious one. Even as the
philologist works to stamp an authority on a par-
ticular version of a text, the text itself replicates cog-
nitively with every exemplar considered, and its
meaning shifts with its medium. The end result, of
course, is a new text which can claim to be a faithful
representation of the cognitive text of the editor,
informed by the texts of the exemplars. This new

text may stake a claim to supplant prior editions,
but it is very difficult to argue that it supplants any
of the exemplars, and even its claim to authority
over prior editions can be questioned. Essentially,
a new set of signs has sprung into existence that
will produce yet more texts, each of which may be
just as prolific as its siblings and ancestors. Thus in
the quest for authority and stabilization of a text,
the philologist cannot help but have a multiplicative
effect.

In the time before the rise of digital scholarly
editions, the sheer audacity of this multiplication
in service to authoritative stability was not so
clear. For as long as the system of print production
and mass distribution of books has endured within
textual scholarship, the authority of the particular
inscribed version of a text that the philologist
seeks to impose has been amplified by the
authority inherent in that version’s gaining access to
the academically and commercially controlled channels of
distribution and replacing older versions on the
shelves of libraries and bookstores.

The scholar who, on the other hand, goes on to
make a digital edition drives the proliferation of the
ontological status of text even farther, and farther
perhaps than he or she realizes. Peter Shillingsburg
has argued that even a simple digital transcription
cannot but be an imprecise and often erroneous
representation of a written text. The questions he
raises defy simple answers:

But especially transcription—even ‘text only’
transcriptions—involves interpretation (Is it
an i or e? Is it underlined or crossed out? Is
the obscured letter a k or a t? Should that
upright have been crossed as a t or is it an
l?—were the bushes lopped or topped?). And
these questions about the text can be multi-
plied if one asks what is the meaning of
underlining or italics (Is it for emphasis or
to indicate a foreign word, title of a book, or
name of a ship?). And so, it is asked, can the
surrogate be unmediated, representing exactly
the original, such that the user need not see
the physical document? And if transcription is
always interpretive, is all the interpretive ana-
lysis by transcribers of a piece? Is it futile to
distinguish levels of intervention so that the


decision about the e/i or t/l, the decision not
to include crossed out words, the decision
about the emphasis/ship’s name, and the de-
cision to add links to related documents just a
continuum of editorial intervention from
minimal to unlimited? (Shillingsburg, 2014,
p. 164)

Just as Eco argues that contemporary theories of
sign have been dominated by a linguistic model
foregrounding the identity between sign and refer-
ent, leaving little space for the more fundamental
operation of inference for formation of meaning,
we argue that contemporary theories in textual
scholarship have been dominated by a foreground-
ing of the document and the sign as purely discrete
(insofar as they are able to be typeset) attributes of
text. Shillingsburg’s statements show how even at
the most intimate level of the glyph philological
editing is seen as a process of abstraction and cre-
ation of discrete surrogates of what is in reality a
multi-attributed material representation of text.

Moreover, his discussion of markup code, and his
claim that it ‘tends to interfere with repurposing’ of
a text, indicates that textual scholars are working in
a medium whose properties and qualities they often
do not yet fully grasp. It is clear, as Shillingsburg
argues, that digital transcriptions are not simply
‘the text’ itself. Even less so are digital editions—
that is to say, digital versions of texts that go
beyond simple transcription, or digital presentation
of print-based editions. These are texts whose
medium lends them qualities that defy translation
to the physical or print medium, as observed by
Sahle (2008). Notwithstanding the claim to repre-
sentation of a handwritten or print text that these
editions generally make, these digital texts are not
simple surrogates or stand-in artifacts for originals
(Hearns Bishop, 2003, p. 15) nor are they merely

philological evidence (cf. for instance Greetham,
1994, pp. 2, 42, 43). To the degree that we can sum-
marize Briet’s argument as ‘infinite proliferation of
the document’ and Eco’s argument as ‘infinite pro-
liferation of meaning’, we should regard these texts
fully as texts in their own right.

Despite the risks associated with the use of a
medium that is not fully understood, Shillingsburg
advises us to ‘just do it’—to go forth and create the
editions, to create a text which may be intended to
represent a physical text but is nevertheless new, and
digital. The scholar who takes his advice is con-
fronted with the peculiar fluidity, perhaps the
agency1 even, of the text that she has created. This
is a feature of the digital medium in which we have
chosen to work—one of its qualities that we, as yet,
only dimly understand.

The digital medium can be transformative and
discrete in its ambiguity. ASCII art is an excellent
example of this (e.g. Fig. 1). What exactly is the text
here? How should it be read, and how should it be
represented if it is remediated? This is, however, not
to say that text must be digital to be fluid, or that
the potential for such double entendre resides pecu-
liarly in the digital. The commingling of linguistic
sign and visual image that is usually expressed as
ASCII art today is in fact age old, as can be seen
in Fig. 2. This kind of glyphic art stresses not only
the transmedial or fluid nature of text; it also points
to its continuous character even in this discrete
medium. That continuity of interpretation, certainly
present within artistic expression before the compu-
tational age, becomes even more pronounced in
some Internet memes.

Or consider emojis—in essence iconography but
now registered in the writing system called Unicode,
pinned down in meaning by a standard but infin-
itely variable in interpretation depending on the
style of implementation of that standard. This is

Fig. 1 ASCII Art, produced by the authors using the online Text to ASCII Art Generator (patorjk.com/software/taag/)



fluid transmediality at its finest—emojis are pictor-
ial in nature and defined as glyphs, while ASCII art
is composed of glyphs and used to produce some-
thing pictorial. The transmedial, fluid, and continu-
ous nature of text and therefore of ‘textuality’ is now
far more obvious than it ever was. The digital en-
vironment amplifies the continuity of these natures,
just as the digital environment seems to amplify all
problematic qualities of text (O’Donnell, 2015).
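
The contrast between what the standard pins down and what it leaves open can be made concrete with a small sketch. The Python fragment below is an illustration added here, not part of the authors' argument, and the choice of this particular emoji is arbitrary. It shows the parts of an emoji that Unicode fixes: a code point, a character name, and a byte serialization. How the glyph is actually drawn is decided by whichever font or platform renders it, and that is where the variability in interpretation enters.

```python
import unicodedata

# One emoji, chosen arbitrarily for illustration.
emoji = "\U0001F600"

# What the standard fixes: a code point, a name, and an encoding.
code_point = f"U+{ord(emoji):04X}"
name = unicodedata.name(emoji)
utf8_bytes = emoji.encode("utf-8")

print(code_point, name, utf8_bytes.hex(" "))

# What the standard does not fix: the drawing. Printing the character
# hands it to the local font, so its appearance differs between systems.
print(emoji)
```

The first three values are identical on every conforming system; only the last line of output depends on the style in which the standard has been implemented.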

Our claim of course is not that text is fluid. This
is well known (cf. Bryant, 2002, Levy, 1994). We also
consider it well established by now that the very
concept of text is fluid. What we claim here is that
the digitally enabled humanities have, for the most
part, fallen into a habit of considering digital
texts as mere digitized surrogates of non-digitally
inscribed texts, that is, as documents. The height of
sophistication of digital publishing among pub-
lishers still tends to be to offer a digital publication,
meaning a PDF or an ePub version of a book—even
while the idea of so-called ‘born digital’ texts (files
produced by text editors, blog posts, tweets, and so
forth) has found traction throughout the world. Or,
to put another analytic frame on this, for the most
part digital textual scholarship seems to be stuck in
a paradigm of remediation. Bolter and Grusin con-
tend that all mediation is remediation, that is all
new media express themselves through the encap-
sulation of older media. ‘Each new medium is jus-
tified because it fills a lack or repairs a fault in its
predecessor, because it fulfills the unkept promise of
an older medium’ (Bolter and Grusin, 2000, p. 60).
Indeed, the rhetoric surrounding the digital

scholarly edition is revolutionary, whereas closer in-
spection of the practice reveals little novelty; most
digital editions seem to be dutiful remediations of
print publications (Karlsson and Malm, 2004).

When scholars speak of a ‘digital text’ what they
usually have in mind is the visible rendering of a
digitally inscribed text, which usually takes a form
visually very similar to a physical text, allowing the
option of re-inscription on, e.g., paper. The on-
screen display of the digital representation is tech-
nically an interface to the digitally inscribed text,
but from the ontological perspective of the scholar,
it is ‘really’ an interface to a real or potential phys-
ical text. This conforms to the assertion by Bolter
and Grusin that ‘digital media can never reach [a]
state of transcendence, but will instead function in a
constant dialectic with earlier media’ (Bolter and
Grusin, 2000, pp. 49–50). Yet perhaps the most sig-
nificant remediation of text is not occurring at the
level of the graphical interface. There may certainly
be a mediation—in the original Marxist analytic
sense of the process of negotiating a balance of
power between social groups—between scholars
producing conventional print editions and those
creating digital editions. But the more important,
yet less apparent, remediation is a similar renegoti-
ation of what text is, between those scholars who
understand digital text as the visualization in a
graphical interface and those scholars and pro-
grammers who write, work with, and experience
the digital code and models of text of which
these visualizations are merely screen-oriented
representations.

Representation for reading purposes only
scratches the surface of what a digital text is. As
soon as scholars set out to apply the first computa-
tional analyses to text and to create the first digital
editions, text started to flow into the digital envir-
onment. As it did, it brought its textual condition to
the digital environment, even as properties of the
digital were imparted on the nature of text. The
people who worked within the digital environment
began to create a particular category of text that was
digital in nature. Programming languages, for in-
stance, applied algebraic and textual constructs, so
that they could be more easily read and applied
(anon, 1954). These executable texts were then

Fig. 2 Fragment of British Library, MS Harley 647, f.11r.
© British Library Board, reproduced with permission of
the British Library Board


used to model databases, which in turn were used to
model other texts into these databases (Jones, 2016).
Computational linguists began compiling vast cor-
pora of texts, using textual tags to annotate them,
making them distinctly ‘different’ from the physical
texts they were derived from. All these textual con-
structs imported into the digital environment
became products of their idiosyncratic environ-
ment, defined foremost by their ‘digital’ properties.

Our claim is that these texts belong to a distinct
ontological category. They are true digital objects
with inalienable digital properties. Even a plain-
text transcription is not a mere imitation of a real-
world text, but should be considered as a text in and
of itself. Until now we have not usually considered
these texts as being texts in their true digital form.
But what happens if we broaden our perspective to
accept all of these as texts: databases, XML files in
their XML form, source code in its legible form as
well as in the form of the results of its execution,
whatever visual form those results may take (cf. for
instance Fig. 3)? From this perspective, and with the
help of hindsight, it becomes clear that the history
of digital textual scholarship has been by and large
one of ‘patching’ the perceived inadequacies of digi-
tal text to allow it to function more like ‘normal’
physical text—thereby inadvertently misunder-
standing and disregarding the digital nature and ex-
istence of digital text.
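
A minimal sketch may help to make this broadened perspective tangible. The Python fragment below is an illustration added here under stated assumptions: the record and its field names (`witness`, `lines`, `n`, `text`) are hypothetical and do not reproduce the JSON encoding shown in Fig. 3. It serializes a small transcription record and shows that the result is itself a sequence of characters that can be read, copied, and re-interpreted like any other text.

```python
import json

# Hypothetical transcription record; the field names are illustrative
# assumptions, not a schema used by the authors.
transcription = {
    "witness": "A",
    "lines": [{"n": 1, "text": "the bushes were topped"}],
}

# Serializing the record yields an ordinary string of characters.
serialized = json.dumps(transcription, ensure_ascii=False, indent=2)
print(serialized)

# Reading that string back reconstructs the same structure.
restored = json.loads(serialized)
```

The serialized form carries braces, keys, and quotation marks that the physical source never had; on the argument made here, these are properties of a digital text in its own right rather than noise around a representation.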

The simplest form of digital text is arguably the
string—a linear series of binary signals that encode
characters according to some predefined table. Its
origin is connected to the physical and technical
requirements of telegraphs and earlier signal transfer
technologies such as semaphores (Petzold, 2000).
The ability to regard information as a unidimensional
stream of discrete dichotomic bits was essential
to the work of both Turing (1937) and Shannon
(1948). They and we have nonetheless been aware all
along that a linear series of characters can never
capture the multidimensional properties of text. It
cannot represent structure, semantics, relations, or
perspectives internal or external to the text. Because
of this, the computational string was ‘patched’ to
become a data structure, initially with typesetting
codes to instruct printing machines on how to pro-
duce typographically beautified texts (Goldfarb,
1996). Markup and hyperlinks were invented at a
later stage, patching the string to allow for more
multidimensional connections within and between
digital texts. Markup in the form of HTML arguably
became the most preferred of these patches, along-
side XML in general and, in humanistic/scholarly
contexts, TEI in particular. These ‘patching’ tech-
nologies were developed to allow digital text to
behave more like analogue text. They helped to re-
mediate what we knew about the properties of ‘real-
world’ text in the digital environment.
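
The two layers described above can be sketched briefly in Python. This is an illustration under stated assumptions, not a reconstruction of any system cited here: ASCII stands in for 'some predefined table', and the sample sentence and the element names `line` and `del` are hypothetical. The first function lays a string out as the linear series of bits that the table prescribes; the second lists the tree that results once the same kind of string has been 'patched' with markup.

```python
import xml.etree.ElementTree as ET


def to_bits(text: str, encoding: str = "ascii") -> str:
    """Serialize text as the linear series of bits its character table prescribes."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def outline(element: ET.Element, depth: int = 0) -> list[str]:
    """List the elements of a parsed markup tree, indented by nesting depth.

    Only the text that directly opens each element is shown; text that
    follows a child element (its tail) is left out for brevity.
    """
    rows = [f"{'  ' * depth}{element.tag}: {(element.text or '').strip()!r}"]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


# The string: one dimension, one bit pattern per character.
plain = "topped"
bits = to_bits(plain)

# The patched string: markup segments the same kind of sequence into a hierarchy.
marked_up = "<line>the bushes were <del>lopped</del> topped</line>"
tree = ET.fromstring(marked_up)

print(bits)
print("\n".join(outline(tree)))
```

Even in this small case the hierarchy records something the bare string cannot, namely that one word stands in a different relation to the line than its neighbours do.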

Fig. 3 Examples of four texts: a so-called plain text, a JSON encoding of a manuscript transcription, the text of a
JavaScript source code document, and a graph depiction of textual markup



Arguably the most advanced ‘patch’ we have
come up with so far is the knowledge graph. The
graph as an interface to a real-world text has been
gaining currency (Andrews and Macé, 2013; Dekker
et al., 2015; Schmidt and Colomb, 2009; van
Zundert and Andrews, 2016). As a model for the
representation of the multidimensionality of text,
the graph model takes us far beyond the limitations
of the linearity of the string. It also takes us well
beyond the limitations of the hierarchy of the
string-segmented-into-a-tree that is markup. This
has considerable advantages for the digital ‘representation’
of real-world text, especially where that
representation must encompass multidimensional
aspects such as ambiguity, narrative structure, variance,
and annotation.
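
As a rough illustration of how a graph can hold variance that neither a string nor a single tree accommodates comfortably, consider the following sketch of a variant graph. It assumes two hypothetical witnesses, A and B, that are already aligned token by token; real collation software has to compute that alignment, which is not attempted here. The function names build_variant_graph and read_witness are our own, chosen for illustration.

```python
from collections import defaultdict

# Hypothetical, pre-aligned witnesses (illustrative data only).
witnesses = {
    "A": ["the", "quick", "fox"],
    "B": ["the", "brown", "fox"],
}

def build_variant_graph(witnesses):
    """Map each edge (source, target) to the set of witnesses that follow it."""
    edges = defaultdict(set)
    for siglum, tokens in witnesses.items():
        # Nodes are (position, token) pairs, so shared readings share a node.
        path = ["#START"] + list(enumerate(tokens)) + ["#END"]
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)].add(siglum)
    return edges

def read_witness(edges, siglum):
    """Recover one witness by following only the edges labelled with its siglum."""
    node, tokens = "#START", []
    while node != "#END":
        node = next(dst for (src, dst), sigla in edges.items()
                    if src == node and siglum in sigla)
        if node != "#END":
            tokens.append(node[1])
    return tokens

graph = build_variant_graph(witnesses)
print(read_witness(graph, "A"))
print(read_witness(graph, "B"))
```

Both witnesses coexist in one structure: where they agree they share nodes and edges, and where they diverge the graph simply branches, without privileging either reading.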

Nevertheless there is a further, more fundamental
step beyond re-representation that should be taken.
This is a step that was in fact already taken when
text crossed into the digital medium and a new
ontological category was created, but we as textual
scholars have failed to acknowledge it. As long as we
keep treating digital texts merely as ‘models’ of text,
and moreover as models whose sole purpose is to
serve as digital re-representations of analogue texts,
we deny these models their ontological status of
actually ‘being-a-text’ in and of themselves. This is
what we claim: the graph, the
database, and the JSON-LD file that now are
regularly created and maintained to function as
data structures for the representation of text are in
fact texts, and they should be considered as that: as
texts.

That most scholars do not regard the idiosyncratic
aspects of databases, graphs, text files, and so forth
as properties of a kind and category of text in its
own right is a consequence of the fact that digital
text production is still very firmly rooted
in a representational philosophy. Almost all
digital text production is geared toward recreating,
within a digital environment, in a familiar guise, the
comfortable and familiar aspects of continuous and
fluid texts-in-the-physical-world. Even while we
have used digital text in this way, the properties of
these digital ‘versions’, which is to say the digital
properties of these texts in their own right, were
unintentionally neglected. Graphs, markup, and
strings seen solely as representations of text-in-the-real-world
will always strike us as inadequate on
some level. As Shillingsburg (2014) argued, it is not possible
to make a perfect translation or copy of an analogue
text into the digital realm. Kirschenbaum (2008) has
convincingly shown that digital texts are physical too
and that we must acknowledge their materiality.
Along the way he confirms that our digital models
have a full claim to the status of texts, for they too are
material texts—ones that require machine mediation
to be read, and have therefore a different sort of
materiality, but nevertheless still material and still
‘texts’.

In 2013 Jerome McGann, apparently driven to
despair about the perceived volatility and ephem-
erality of digital texts, argued that textual scholars
should regroup toward the philological–physical
fact of the glyph on paper (McGann, 2013). We
argue here that scholarship should rather venture
in the opposite direction, embracing digital texts
for what they are: texts adorned with properties
that are both inalienably textual and inalienably
digital. David Berry (2014) argues for the need to
critically examine digital objects such as digital in-
formation streams, now that these objects increas-
ingly help to constitute contemporary society and
culture (cf. also Jones, 2014). We would add to this
the argument that code and digital data structures
are included among the digital texts that increas-
ingly constitute contemporary cultural artifacts
and scholarship. These texts are thus worthy of
our philological consideration. We call attention
here to their ontological and epistemological status
and import within textual scholarship.

We should indeed go even further: where Berry
(and others) call for the consideration of the ‘sur-
face’ or the ‘interfaces’ of these data structures as
digital objects in themselves, we contend that the
data structures and models are themselves the ob-
jects worthy of our scholarly scrutiny. These are,
after all, texts in themselves. When a scholar has
modeled the semantics, the structure, or for that
matter any characteristics of a text in a database,
and she has added some logic or style sheets to
depict a visualization of those characteristics onto
the ‘canvas’ of a computer screen, then that depiction
may be the representation of the text of some
physical exemplar. In the process, however, new
cognitive texts and new documentary evidence of
these, in the form of those very database models
and style sheets, were created. Not only the visual-
ization but also the digital objects that produce the
visualization have become documents and texts in
themselves.

The fluidity of document and the infinite semi-
osis of text cause a proliferation of documents and
texts that each have inalienable unique properties
that may be bound to the specific materiality and
medium of the document and the text. In a schol-
arly context it is negligent not to acknowledge these
idiosyncratic properties, and to regard them as mere
inconvenient and unsatisfying incongruencies be-
tween the physical print text, the digital representa-
tion, and the digital model. These incongruencies
are what make digital texts texts in their own
right, and they point toward the differing onto-
logical status of digital and print text. These texts
cannot be and, in fact, actively resist being identical.
Purporting that they are, or can be, and that they are
only representations of physical texts, and nothing
more, is epistemologically shortsighted. None of the
texts we produce can have an inherent scholarly pri-
macy over the others, simply on the basis of its
form—the print text says things that the TEI encod-
ing does not, the TEI encoding says things that the
JSON does not; the JSON says things that the graph
does not; and saliently: vice versa. They are all texts,
and their forms are intrinsically bound up in the
expression of their essence.
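
That each form says things the others do not can be sketched with two small encodings of one line. Both encodings are hypothetical and were written for this illustration; the element and key names are assumptions. The sketch shows only that the TEI-like form records where a deletion sits within the line, while the JSON form records typed values that the markup holds as undifferentiated character data.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same line (illustrative data only).
tei_like = '<l n="1" damaged="false">Sing, <del>O</del><add>o</add> goddess</l>'
json_like = '{"n": 1, "damaged": false, "text": "Sing, o goddess"}'

line = ET.fromstring(tei_like)
record = json.loads(json_like)

# The markup records the position of a deletion; the JSON record does not.
deleted = [el.text for el in line.iter("del")]
has_deletion_in_json = "del" in record

# The JSON distinguishes a number and a boolean; XML attributes are all strings.
xml_types = {key: type(value).__name__ for key, value in line.attrib.items()}
json_types = {key: type(value).__name__ for key, value in record.items()}

print(deleted, has_deletion_in_json)
print(xml_types)
print(json_types)
```

Neither encoding is a degraded copy of the other. Each makes its own commitments about what the line is.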

One reason that scholars have paid less attention
to digital data structures and information models as
texts in their own right may be that digital texts
require their own specific literacy to be read and
written. Digital structures and objects are texts
that contain programming code, or require pro-
gramming code to be created, analyzed, visualized,
etc. That is, these texts are made up in part of signs
whose meanings scholars will recognize from other
sorts of texts (e.g. characters, words, and syntactic
and semantic structures), but they also consist of
signs still rather alien to scholars without program-
ming experience, such as string denotations, punc-
tuation semantics, variables, loops, and subroutine
statements. These signs, innate to the realm of code
and computation, require a different, additional literacy
to be fully understood and interpreted. The
abilities to code and encode are necessary prerequisites,
but computational literacy goes beyond learning
the syntax and semantics of a particular
programming language. As Annette Vee noted:
‘But, unfortunately, when “literacy” is connected
to programming, it is often in unsophisticated
ways: literacy as limited to reading and writing
text; literacy divorced from social or historical con-
text; literacy as an unmitigated form of progress’
(Vee, 2013, p. 43). Vee argues that literacy refers
to a set of skills without which one is no longer
able to navigate one’s world. Code and digital
texts as technologies are not yet infrastructurally
critical to textual scholarship. However, the text-
ual scholar who does want to engage with digital
texts as ‘digital’ texts requires a specific literacy.
Epigraphical literacy, codicological literacy, and
computational literacy are essential in the under-
standing, respectively, of a stone inscription, of a
medieval manuscript, and of a digital text, each
one in its specific mode of being.
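
The kinds of signs listed above can be pointed out in a few lines of code. The following fragment is a sketch written for this purpose, with an arbitrary example phrase; the comments name the signs (string denotation, punctuation semantics, variable, loop, subroutine) that a reader without programming experience would need to learn to read.

```python
def count_words(text):            # subroutine statement: 'def' names reusable logic
    counts = {}                   # variable: a name bound to an (empty) mapping
    for word in text.split():     # loop; the colon opens a block (punctuation semantics)
        word = word.strip(".,;").lower()
        counts[word] = counts.get(word, 0) + 1
    return counts

# String denotation: the quotation marks delimit the text, they are not part of it.
line = "To be, or not to be"
result = count_words(line)
print(result)
```

The words inside the string remain legible to any reader; the signs around them are what the additional literacy is about.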

Vee’s argument is the most recent in a discourse
spanning at least four decades, which includes inter
alia Stephen Ramsay (2011), John Unsworth (2002),
Friedrich Kittler (1993), and Donald Knuth (1984)—
a discourse that puts forth the argument that working
with digital texts requires some proficiency in coding,
and that this proficiency is easily recognizable as lit-
eracy: reading and writing, but of a different kind.
Roots of the argument can be traced back to the
work of Adele Goldberg and Alan Kay, who were
involved with the creation of Smalltalk, which can
be regarded as the mother of all object-oriented pro-
gramming languages. Kay and Goldberg were spe-
cifically interested in how programming could be
taught, an experience that profoundly influenced
Goldberg’s thinking on literacy, convincing her ‘that
literacy should involve computing-based technologies
and the expectation that our knowledge and skills will
continually change, rather than define literacy as
being pencil/paper/book-based’ (Goldberg, 2010,
p. 24). However, literacy (be it computational or writ-
ten-language literacy) cannot be reduced to the skills
of reading and writing. Kay’s sobering observation
was that, after 30 years of experience, the success of
teaching computing literacy still depended on the
‘“hacker phenomenon”, that, for any given pursuit,
a particular 5% of the population will jump into it
naturally, while the 80% or so who can learn it in
time do not find it at all natural’ (Kay, 1993, p. 81).
More salient however is another observation he
makes: ‘The connection to literacy was painfully
clear. It isn’t enough to just learn to read and write.
There is also a literature that renders ideas. Language
is used to read and write about them, but at some
point the organization of ideas starts to dominate
mere language abilities’. That is, literacy does not
only consist of the basic skills of reading and writing
a certain set of symbols. Following Eco, interpretation
and understanding come from tacit knowledge-based
inference. Reading, writing, and thus also coding are
about fluency of words and of symbols, whereas the
fluency we need is a fluency in ideas and concepts. In
the case of coding literacy, this means an experienced
understanding of basic algorithms, coding constructs,
and programming patterns, and it is a literacy that
requires a number of years in training and experience,
rather than a few months.

It is hard for scholars who lack this literacy to
conceive of code and data structures as just another
semiotics, another meaningful way to express texts.
It is clear that being non-literate in code and encod-
ing makes it extremely hard to appreciate ‘digital’
texts as what they are essentially: texts. What is then
left is the mere use of code and data structures as
another tool for representational approaches, for the
depiction of a print or manuscript text in a digitized
guise mimicking the exemplar as closely as possible.
We lose sight of the fact that there are ‘native’ digital
ways of looking at and working with digital texts, of reading
and writing them, when we remain within
our representational philosophical confines. That
limited understanding not only leads us to concentrate
almost exclusively on standards for representing
texts; it also prevents us from investigating
the textual nature of the digital text.

Just as Briet argued for the epistemological status of
the rock as document, we should grant the proper
ontological and epistemological status to the digital
objects that we have so far used merely for textual
representation. Just as a rock can be a document, a
serialization or a piece of source code is certainly a text.

References
Andrews, T.L. and Macé, C. (2013). Beyond the tree of
texts: Building an empirical model of scribal variation
through graph analysis of texts and stemmata. Literary
and Linguistic Computing, 28(4), 504–21.

anon (1954). Preliminary Report: Specifications for the
IBM Mathematical FORmula TRANSlating System,
FORTRAN. New York, NY: International Business
Machines Corporation.
http://www.computerhistory.org/collections/catalog/102679231
(accessed 13 November 2016).

Barthes, R. (1975). S/Z: An Essay. New York, NY: Hill and
Wang.

Bede, M.W. (2007). What is documentation? English
translation of the classic French text, by Suzanne Briet
(Lanham, MD: Scarecrow, 2006). College and Research
Libraries, 68, 199–200.

Berry, D.M. (2014). Critical Theory and the Digital. New
York, NY; London; New Delhi etc.: Bloomsbury Academic.

Bolter, J.D. and Grusin, R. (2000). Remediation:
Understanding New Media. Cambridge, MA: MIT Press.

Briet, S. (1951). Qu’est-ce que la Documentation? Paris: Édit.

Briet, S. (2006). What is Documentation? English
Translation of the Classic French Text. Lanham, MD:
Scarecrow Press. http://ella.slis.indiana.edu/~roday/briet.htm
(accessed 19 October 2015).

Bryant, J. (2002). The Fluid Text: A Theory of Revision and
Editing for Book and Screen. University of Michigan Press.
http://books.google.nl/books?id=1w4wpOdPbu4C.

Dekker, R., van Hulle, D., Middell, G., Neyt, V. and van
Zundert, J. (2015). Computer supported collation of
modern manuscripts: CollateX and the Beckett digital
manuscript project. Literary and Linguistic Computing,
30(3), 452–70.

Eco, U. (1981). The theory of signs and the role of the
reader. The Bulletin of the Midwest Modern Language
Association, 14(1), 35–45.

Goldberg, A. (2010). Oral history of Adele Goldberg.
http://archive.computerhistory.org/resources/access/text/2013/05/102701984-05-01-acc.pdf
(accessed 8 November 2016).

Goldfarb, C.F. (1996). The roots of SGML—A personal
recollection. http://www.sgmlsource.com/history/roots.htm
(accessed 26 August 2014).

Greetham, D. (1994). Textual Scholarship: An Introduction.
New York & London: Garland Publishing Inc.

Hearns Bishop, M. (2003). Briet’s antelope: Some
thoughts on Suzanne Briet (1894-1989) and conserva-
tion documentation. WAAC Newsletter, 25(1), 12–16.

Jones, S.E. (2014). The Emergence of the Digital
Humanities. New York, NY; London: Routledge.

Jones, S.E. (2016). Roberto Busa, S.J., and the Emergence of
Humanities Computing: The Priest and the Punched
Cards. New York, NY; London: Routledge, Taylor &
Francis Group.

Karlsson, L. and Malm, L. (2004). Revolution or remedi-
ation? A study of electronic scholarly editions on the
web. HUMAN IT, 7(1), 1–46.

Kay, A.C. (1993). The early history of Smalltalk. ACM
SIGPLAN Notices, 28(3), 69–95.

Kirschenbaum, M. (2008). Mechanisms: New Media and
the Forensic Imagination. Cambridge, MA: MIT Press.

Kittler, F. (1993). Es gibt keine Software. In Draculas
Vermächtnis. Leipzig: Reclam Verlag, pp. 225–42.

Knuth, D.E. (1984). Literate programming. The Computer
Journal, 27(1), 97–111.

Levy, D.M. (1994). Fixed or fluid? Document stability and
new media. In ECHT 94 Proceedings of the 1994 ACM
European Conference on Hypermedia Technology.
Edinburgh; New York, NY: ACM Press, pp. 24–31.
https://pdfs.semanticscholar.org/f340/b718af6a208e47c143e4d25c1da9dcaf9f23.pdf
(accessed 15 March 2017).

Manovich, L. (2013). Software Takes Command, Vol. 5.
New York, NY; London; New Delhi etc.: Bloomsbury
Academic.

McGann, J. (1991). The Textual Condition. Princeton:
Princeton University Press.

McGann, J. (2013). Philology in a new key. Critical
Inquiry, 39(2), 327–46.

McLuhan, M. (2003). Understanding Media: The
Extensions of Man (Critical Edition). In Gordon, W.
T. (ed.) (First published 1964). Berkeley: Gingko Press.

O’Donnell, D.P. (2015). A first law of humanities computing?
Blog. http://people.uleth.ca/~daniel.odonnell/Blog/the-first-law-of-humanities-computing
(accessed 17 June 2016).

Petzold, C. (2000). Code: The Hidden Language of Computer
Hardware and Software. Redmond: Microsoft Press.

Ramsay, S. (2011). Reading Machines: Toward an
Algorithmic Criticism (Topics in the Digital
Humanities). Chicago: University of Illinois Press.

Sahle, P. (2008). About “a catalog of: Digital scholarly
editions”. http://www.digitale-edition.de/vlet-about.html
(accessed 11 November 2016).

Schmidt, D. and Colomb, R. (2009). A data structure for
representing multi-version texts online. International
Journal of Human-Computer Studies, 67(6), 497–514.

Shannon, C.E. (1948). A mathematical theory of communication.
The Bell System Technical Journal, 27(3), 379–423.

Shillingsburg, P. (2014). From physical to digital textuality:
Loss and gain in literary projects. CEA Critic, 76(2),
158–68.

Turing, A.M. (1937). On computable numbers, with an
application to the Entscheidungsproblem. Proceedings of
the London Mathematical Society, 42(1), 230–65.

Unsworth, J. (2002). What is humanities computing and
what is not? In Braungart, G., Gendolla, P. and
Jannidis, F. (eds), Jahrbuch für Computerphilologie, 4.
http://computerphilologie.digital-humanities.de/jg02/unsworth.html
(accessed 8 July 2013).

van Zundert, J.J. (2016). Author, editor, engineer—Code
& the rewriting of authorship in scholarly editing.
Interdisciplinary Science Reviews, 40(4), 349–75.

van Zundert, J.J. and Andrews, T.L. (2016). Apparatus
vs. graph: New models and interfaces for text. In
Hadler, F. and Haupt, J. (eds), Interface Critique.
Kaleidogramme. Berlin: Kulturverlag Kadmos.

Vee, A. (2013). Understanding computer programming as
a literacy. LiCS, 1(2), 42–64.

Woolgar, S. and Cooper, G. (1999). Do artefacts have
ambivalence? Moses’ bridges, Winner’s bridges and
other urban legends in S&TS. Social Studies of Science,
29(3), 433–49.

Note
1. Following the debate surrounding the potential politics
and agency of artifacts (cf. Woolgar and Cooper,
1999), some form of agency for documents and
(thus) texts can be assumed. Although we would not
attribute direct agency to texts, artifacts (and thus
documents and texts) may effectuate a ‘deferred’
agency of, for instance, an author. This notion of
deferred agency relates to poetic notions such as catharsis
and, e.g., Bertolt Brecht’s ideas on theater as a
political forum. Among others, Manovich (2013), Berry
(2014), and van Zundert (2016) argue that ideas on
such deferred agency are in fact very relevant to
software code as a recent form of text.
