The Small-World of "Le Petit Prince": revisiting the word frequency distribution

D. Gamermann*(1), C. Moret-Tatay(2), E. Navarro-Pardo(3), and P. Fernández de Córdoba Castellá(4)

(1) Department of Physics, Universidade Federal do Rio Grande do Sul (UFRGS) - Instituto de Física, Av. Bento Gonçalves 9500 - Caixa Postal 15051 - CEP 91501-970 - Porto Alegre, RS, Brasil.
(2) Departamento de Neuropsicobiología, Metodología y Psicología Social - Facultad de Psicología, Magisterio y Ciencias de la Educación, Sede de San Juan Bautista, Universidad Católica de Valencia San Vicente Mártir - Calle Guillem de Castro 175, 46008 - Valencia, Spain.
(3) Department of Developmental and Educational Psychology - Faculty of Psychology, Universitat de València, Av. Blasco Ibáñez 21, 46010 - Valencia, Spain.
(4) Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, Camino de Vera s/n, 46022 - Valencia, Spain.

*danielg@if.ufrgs.br

March 6, 2018

Abstract

Many complex systems are naturally described through graph theory, and different kinds of systems described as networks share certain important characteristics. One of these features is the so-called scale-free distribution of node connectivity, which means that the degree distribution of the network's nodes follows a power law. Scale-free networks are usually referred to as small-world because the average distance between their nodes does not scale linearly with the size of the network, but logarithmically. Here we present a mathematical analysis in linguistics: the word frequency distribution for different translations of "Le Petit Prince" into different languages. Comparison of word association networks with random networks makes evident the discrepancy between the random Erdős-Rényi model for graphs and real-world networks.

Key words: Small-world, word frequency, Zipf's law

Many objects of study in different interdisciplinary fields find a natural mathematical description as graphs. A graph is simply an object formed by two different sets: a set of nodes and a set of edges connecting these nodes. For many decades the mathematical study of graphs was guided by the Erdős-Rényi model for random graphs (Erdős & Rényi, 1960). In this model a (random) graph is constructed from a set of $N$ nodes by connecting or not each of the $N(N-1)/2$ pairs of nodes with a probability $p$. A random graph will, therefore, have on average $pN(N-1)/2$ links, and the degree distribution of its nodes will follow a Poisson distribution. Another characteristic of random graphs is that their size (average node distance) scales linearly with the number of nodes in the graph. As graph theory started being applied to many real systems, such as metabolic or protein networks, neural networks, the Internet, social networks and food chains, among many others (Rives & Galitski, 2003; Haykin, 1994; Pastor-Satorras et al., 2001; Crucitti et al., 2003), a discrepancy between these real-world graphs and the random Erdős-Rényi graphs became evident.
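To make the Erdős-Rényi construction concrete, the following is a minimal sketch (ours, not from the paper; it assumes Python with numpy, which the authors do not mention) that builds a random adjacency matrix and checks that the number of links and the mean degree agree with the expected values $pN(N-1)/2$ and $p(N-1)$:

```python
import numpy as np

def erdos_renyi(n, p, rng):
    """Adjacency matrix of a random graph: each of the n(n-1)/2
    pairs of nodes is connected independently with probability p."""
    m = np.zeros((n, n), dtype=int)
    rows, cols = np.triu_indices(n, k=1)        # all pairs i < j
    mask = rng.random(rows.size) < p
    m[rows[mask], cols[mask]] = 1
    return m + m.T                              # symmetric: undirected graph

rng = np.random.default_rng(0)
n, p = 2000, 0.005
adj = erdos_renyi(n, p, rng)
degrees = adj.sum(axis=1)
print(adj.sum() // 2, p * n * (n - 1) / 2)      # links vs. expected p*N*(N-1)/2
print(degrees.mean(), p * (n - 1))              # mean degree vs. expected p*(N-1)
```

The degree histogram of such a matrix is sharply peaked around $p(N-1)$, in contrast with the heavy-tailed distributions discussed next.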
The node degree distribution in real-world graphs does not follow a Poisson distribution; instead it follows a power-law distribution, and such graphs have become known as scale-free. As a consequence, the average distance between two nodes in such networks grows slowly with the number of nodes $N$ in the network, a characteristic known as small-world behavior (Amaral et al., 2000).

It has been observed that the word frequency distribution in a language also follows a scale-free distribution, and many explanations for this phenomenon have been given. In linguistics, this observation is known as Zipf's law. It states that the proportion of words $P$ (in a text, for example) with a given frequency $k$ follows a power law: $P(k) \sim k^{-\gamma}$, where $\gamma$ is generally a number between 2 and 3. This law shows that few words have a very high frequency and, conversely, many words have a low frequency. A particularly appealing explanation for this comes from statistical mechanics, where one minimizes an energy function based on the balance between the efforts of the speaker and the listener, defined in terms of word frequency and ambiguity, as shown in Cancho & Solé (2003).

One traditional way to examine differences between languages is through variables such as frequency, morphological complexity, evolution and cultural transmission. All these aspects can be related in a complex adaptive system (Beckner et al., 2009). In particular, the word frequency effect is a classical effect in cognitive psychology characterized by its robustness: high-frequency words are recognized more quickly and remembered better (Sternberg & Powell, 1983). Therefore, a large body of research has employed word frequency as a proxy for word difficulty (Dufau et al., 2011; Esteves et al., 2015; Moreno-Cid et al., 2015; Moret-Tatay & Perea, 2011a,b; Navarro-Pardo et al., 2013; Perea, Moret-Tatay & Carreiras, 2011; Perea, Comesaña, Soares & Moret-Tatay, 2012; Perea, Gatt, Moret-Tatay & Fabri, 2012; Perea, Moret-Tatay & Gómez, 2011). According to Breland (1996), the logic of this is that low-frequency words are more difficult because they appear less often in print. Moreover, van Heuven et al. (2014) proposed the Zipf scale as a better standardized measure of word frequency.

Given the ease with which word counts can be collected at the present time, a useful tool in contrastive linguistics is a lexical corpus of a language: a large collection of texts in electronic form, supplemented by linguistic annotation, which has become an important tool in linguistic studies. Not surprisingly, several statistical and psycholinguistic databases in several languages have been developed for this purpose (Coltheart, 1981; Davis, 2005). However, according to Perea et al. (2013) and Yap et al. (2011), variables other than word frequency might be involved in word recognition, such as the number of contexts in which a word appears.

In the present work we focus on the analysis of a single linguistic material ("Le Petit Prince" by Saint-Exupéry) in several different languages. To this purpose, we studied statistical properties of the text and of networks (graphs) associated with it. For each language we studied the word frequency distribution on one hand, and on the other we constructed different networks by word association. For each network we built, we evaluated its main properties, such as its average clustering coefficient, average node distance and degree distribution.
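Before moving on, a toy illustration of Zipf's law (ours, not part of the paper; it assumes numpy): drawing samples whose values appear with probability proportional to $k^{-\gamma}$ shows the characteristic imbalance between few frequent and many rare values.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
gamma = 2.2                          # exponent in the typical Zipf range (2, 3)
draws = rng.zipf(gamma, size=15000)  # value k is drawn with prob. ~ k**-gamma

counts = Counter(draws)
for k in (1, 2, 10, 100):
    print(k, counts.get(k, 0))       # many 1's and 2's, very few large values
# counts[1] / counts[2] should be close to 2**gamma (about 4.6 here)
```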
In the next section we present the methodology we used and the mathematics behind our analyses; in the Results section we describe our findings; and in the Conclusions section we summarize the main aspects of our results and give a brief overview.

1 Methods

1.1 Materials

The text of "Le Petit Prince" was obtained from the Internet in eight different languages: Spanish, English, Dutch, Greek, Basque, Italian, Portuguese and (of course) French. In order to analyze the text, Python scripts were written. The code was run on a computer with an i7 quad-core processor and 8 GB of RAM. The script first loads the whole text into memory. It then uses punctuation to slice the text into sentences, and afterwards removes all punctuation and numerals (0, 1, 2, ...) from the raw text. The remaining strings, separated by blank spaces, are identified as the different words. As an example, below are the first 300 characters of the French text:

Antoine de Saint-Exupéry LE PETIT PRINCE 1943 PREMIER CHAPITRE Lorsque j'avais six ans j'ai vu, une fois, une magnifique image, dans un livre sur la Forêt Vierge qui s'appelait « Histoires Vécues ». Ça représentait un serpent boa qui avalait un fauve. Voilà la copie du dessin. On disai

Through our scripts, the extract above becomes the list of words: antoine, de, saint, exupéry, le, petit, prince, premier, chapitre, lorsque, j, avais, six, ans, j, ai, vu, une, fois, une, magnifique, image, dans, un, livre, sur, la, forêt, vierge, qui, s, appelait, histoires, vecues, ça, représentait, un, serpent, boa, qui, avalait, un, fauve, voilà, la, copie, du, dessin, on, disait.

Once the Python script has transformed the whole text into a raw list of words (15612 in the case of the French text), it counts the number of different words (2600 in the French text) and also the number of times each single word is repeated in the text. For the construction of the networks, we will link words based on their relative distance in the text. For this, one needs to keep track of the sentences into which the text is divided and of the words appearing in each sentence. So our script actually first creates a list of sentences, by slicing the text whenever it finds a punctuation symbol, and after that a list of single words, by slicing the sentences at their blank spaces.

1.2 Analysis

The word frequency distribution $P(k)$ is a function that, for each natural number $k$, tells how many words appeared in the text $k$ times. In the case of the French text, for example, 1516 different words appeared only once ($P(1) = 1516$); one of these is the word "réjouir", which appears in the whole text only once. On the other hand, the word "et" was the fifth most frequent word, appearing 306 times ($k = 306$), and it is the only word that appeared this number of times; consequently $P(306) = 1$. The most frequent word was the article "le", which appeared 465 times and is the only word appearing 465 times in the text ($P(465) = 1$).

Typically, for a text, many words appear only a few times, while a few words are repeated constantly throughout the text. As a consequence, the function $P(k)$ is a decreasing function. A mathematical function that often fits $P(k)$ in a text is the power-law distribution:

$P(k) = A k^{-\gamma}$,   (1)
$\log(P(k)) = \log(A) - \gamma \log(k)$,   (2)

where $A$ is a proportionality constant that can be evaluated from the total number of words. The fact that the frequency distribution follows a power-law (or scale-free) distribution is known as Zipf's law.
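A minimal sketch of the pipeline described above (our reconstruction, not the authors' original script; plain Python 3, with a hypothetical file name):

```python
import re
from collections import Counter

def split_sentences(text):
    """Slice the text wherever a sentence-ending punctuation symbol occurs."""
    return [s for s in re.split(r"[.!?;:]+", text) if s.strip()]

def split_words(sentence):
    """Remove remaining punctuation and numerals, then split at blank spaces."""
    cleaned = re.sub(r"[^\w\s]|[\d_]", " ", sentence)
    return cleaned.lower().split()

text = open("le_petit_prince_fr.txt", encoding="utf-8").read()  # hypothetical file
sents = [split_words(s) for s in split_sentences(text)]
all_words = [w for s in sents for w in s]

freq = Counter(all_words)          # k: how many times each word occurs
P = Counter(freq.values())         # P(k): how many words occur exactly k times
print(len(all_words), len(freq), P[1])  # the paper reports 15612, 2600, 1516
```

Which punctuation marks count as sentence boundaries is our guess; the paper only says that punctuation is used to slice the text.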
Note from equation (2) that, in a log-log plot, the distribution will follow a straight line. For real texts, the tail (large values of $k$) of the $P(k)$ distribution will be very noisy, because only a handful of large values of $k$ will be populated, and then only by a single word each. In figure 1 we show the function $P(k)$ (in logarithmic scale) for the French text. One can clearly see the noise in the right tail.

[Figure 1: Word frequency distribution P(k) for the French text, plotted as log10 of the number of words versus log10 of the frequency; note the noisy right tail.]

In order to fit the distribution while avoiding the noisy tail, one can use the right-cumulative distribution:

$P_c(k) = \int_k^\infty P(k')\,dk' = \frac{A}{\gamma - 1} k^{-(\gamma - 1)}$,   (3)
$\log(P_c(k)) = \log\left(\frac{A}{\gamma - 1}\right) - (\gamma - 1)\log(k)$.   (4)

In figure 2 one can see the distribution $P_c(k)$ (in logarithmic scale) for the French text. This curve is much smoother than the raw $P(k)$ distribution, and it is always decreasing.

[Figure 2: Word frequency cumulative distribution Pc(k) for the French text, log10 of the number of words versus log10 of the frequency.]

From equations (2) and (4) it is clear that the plot of $\log(P)$ or $\log(P_c)$ versus $\log(k)$ will follow a straight line if the distribution $P(k)$ follows the power law in equation (1). So, by fitting lines to the empirical data collected from the texts, one can determine the parameters $A$ and $\gamma$. The parameter $A$ divided by $\gamma - 1$ is just the total number of different words in the text; one can see this by noticing that $P_c(1)$ equals the total number of different words.

Apart from measuring and fitting the word frequency distribution, we analyzed networks of word associations built from the texts. In order to build a network from the text in each language, we set each word as a node, and we built two different networks by following two different rules to set the links between words. In the first network, we define a link between two words if they appear side by side in at least one sentence in the text. In the second network, a link is defined between two words if there is exactly one word between the two in at least one sentence in the text. In figure 3 we show examples of the two networks based on a single sentence in the text: "My drawing was not a picture of a hat!"

[Figure 3: Example of the two networks built from this sentence; Network 1 on the left and Network 2 on the right.]

An important structure for analyzing a graph is its adjacency matrix. This is a symmetric $N \times N$ matrix, where $N$ is the number of nodes in the graph, whose elements $M_{ij}$ are equal to one if there is a link between nodes $i$ and $j$ and zero otherwise. From this matrix, one can directly obtain the degree (number of neighbors or connections) of any given node in the graph:

$k_i = \sum_{j=1}^{N} M_{ij}$.

The number of nodes (words) in each network constructed from the texts may be less than the total number of different words in the whole text, because we remove non-connected components (sets of nodes from which it is not possible to reach the rest of the network by following the links) from the graphs. For each network we performed three analyses: we fitted a power law to its degree distribution, and we calculated the average clustering coefficient and the average distance between two nodes. The fitting of a power law follows the same steps used to fit the word frequencies (but now looking at the degree of each node in the network).
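Continuing the sketch (again ours, under the same assumptions), both rules reduce to linking words that sit a fixed number of positions apart within a sentence; the pruning of non-connected components is omitted for brevity:

```python
import numpy as np

def build_network(sents, gap):
    """Adjacency matrix linking two words that appear `gap` positions
    apart in at least one sentence (gap=1 gives Network 1, gap=2 gives
    Network 2). Non-connected components are not removed here."""
    vocab = sorted({w for s in sents for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)), dtype=int)
    for s in sents:
        for a, b in zip(s, s[gap:]):
            if a != b:  # avoid self-loops from repeated words
                m[index[a], index[b]] = m[index[b], index[a]] = 1
    return vocab, m

vocab, m1 = build_network(sents, gap=1)  # `sents` from the previous sketch
degrees = m1.sum(axis=1)                 # k_i = sum_j M_ij
print(m1.sum() // 2, degrees.max())      # number of links, largest degree
```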
The clustering coefficient of a node is given by (Ravasz & Barabási, 2003):

$C_i = \frac{2 E_i}{k_i (k_i - 1)}$,   (5)

where $k_i$ is the degree of node $i$ and $E_i$ is the number of connections between the neighbors of node $i$. The average clustering $\bar{C}$ of a network can then be calculated straightforwardly as the average value of the $C_i$'s over all nodes in the network.

The distance between two nodes is defined as the minimum number of links one has to traverse in order to travel from one node to the other. The average distance over all of the $N(N-1)/2$ different pairs of nodes in each network was calculated using Dijkstra's algorithm (Dijkstra, 1959) via the PyNetMet package (Gamermann et al., 2014). The average of the distances over every pair is the network's average distance $\bar{d}$.

We compared the average clustering and average distance of every network with results from random networks. For this purpose, for each network, we built an ensemble of twenty random networks with the same number of nodes and the same number of links, but with random topology. The input for a network is its adjacency matrix $M$, so for building a random network we use the following algorithm:

(1) Start with an $N \times N$ matrix where all elements are zero. (One has here $N$ nodes and zero links, $\ell = 0$, between them.)
(2) While the number of links $\ell$ is less than the desired number of links in the network, repeat:
(2.1) Choose two different random integers ($i$ and $j$) between 1 and $N$.
(2.2) If $M_{ij}$ is zero, change $M_{ij}$ and $M_{ji}$ to one and increase the number of links by one unit ($\ell \to \ell + 1$).
(3) Check whether any node ($i$) has been left unconnected. If so, randomly choose a node ($j$) to connect it to, and randomly break an existing connection of node $j$.
(4) Repeat step (3) until no node is left unconnected.

Steps (3) and (4) are actually optional, but throughout our calculations we have chosen to work with fully connected graphs. This algorithm returns a randomly generated adjacency matrix representing a connected network with a predefined number of nodes and links. Using this algorithm, for each network obtained from a text, we generate an ensemble of twenty random networks with the same numbers of nodes and links. For each random network in the ensemble the average clustering and average distance are calculated, and then the average inside each ensemble is evaluated.

2 Results

In figure 4 the distributions for all eight languages are superimposed in log-log scale, showing their tendency to follow a straight line. In figure 5 the distribution for each individual language is shown with the best line fitted using the least-squares method; the title of each plot gives the fitted equation. In table 1 we show the values of $\gamma$, $A/(\gamma - 1)$, the total number of words and the $\chi^2$/dof of the best fit for each language. The value of $\chi^2$ (minimized by the least-squares method) is calculated as:

$\chi^2 = \sum_{k=1}^{k_{max}} \frac{\left(\log(P_c(k)) - \log(P_{c_{obs},k})\right)^2}{\epsilon_k^2}$,   (6)

where $P_{c_{obs},k}$ is the observed value of the right-cumulative distribution of words at frequency $k$, $\epsilon_k$ is the error associated with $\log(P_{c_{obs},k})$, and the sum runs over all $k$'s for which $P_{obs,k}$ is different from zero [1]. Since $P_{c_{obs},k}$ is an absolute frequency, the error associated with it is its square root and, therefore, the logarithmic [2] error is $\epsilon_k = \frac{1}{\ln(10)\sqrt{P_{c_{obs},k}}}$.

[Figure 4: Cumulative word frequency distributions for all eight texts (Spanish, English, Dutch, Basque, Greek, Italian, Portuguese and French), superimposed in log10-log10 scale.]
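The line fit of equations (4) and (6) can be sketched as follows (our reconstruction; `P` is the frequency-of-frequencies counter from the earlier sketch, and numpy's polyfit with weights $1/\epsilon_k$ minimizes exactly the $\chi^2$ of eq. (6)):

```python
import numpy as np

def fit_power_law(P):
    """Weighted least-squares line through log10 Pc(k) vs log10 k.
    P maps a frequency k to the number of words with that frequency;
    only k's with P(k) > 0 enter the fit, as in the text."""
    ks = np.array(sorted(P))
    # right-cumulative distribution: Pc(k) = sum over k' >= k of P(k')
    pc = np.array([sum(v for k, v in P.items() if k >= k0) for k0 in ks],
                  dtype=float)
    x, y = np.log10(ks), np.log10(pc)
    eps = 1.0 / (np.log(10) * np.sqrt(pc))   # error of the log10 of a count
    slope, intercept = np.polyfit(x, y, 1, w=1.0 / eps)
    return 1.0 - slope, 10.0 ** intercept    # slope = -(gamma - 1), so gamma
                                             # and A/(gamma-1) are recovered

gamma, n_words = fit_power_law(P)
print(gamma, n_words)  # for French, Table 1 reports roughly 1.95 and 2116
```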
The results of the network analysis can be found in tables 2 and 3. In figure 6 we show, for Network 1 constructed from the Portuguese text, its degree distribution, the best line fitted to it, and the degree distribution of a random network with the same number of nodes and links ($N = 2424$ and $\ell = 6175$). From this figure, one can clearly see the difference between the distribution obtained from a "real" network (a power-law distribution) and the one obtained from a completely random network (a Poisson distribution). In a power-law distribution there is an appreciable probability of observing nodes with a degree much bigger than the average, while in a Poisson distribution this probability drops to zero very fast.

[1] Note that $P_{c_{obs},k}$ is the right-cumulative distribution, so if $P_{obs,k}$ is zero for a given value of $k$, then $P_{c_{obs},k}$ remains constant for all subsequent $k$'s until reaching a new $k$ where $P_{obs,k}$ is not zero; therefore these points would not bring any new information to the analysis.
[2] In all our equations, log is the base-10 logarithm and ln is the natural (base-e) logarithm.

[Figure 5: Cumulative word frequency distribution for each text with the best fitted line. The fitted equations are: French, log10(Pc) = -0.951297 log10(k) + 3.325550; Greek, log10(Pc) = -1.040888 log10(k) + 3.393484; Basque, log10(Pc) = -1.216154 log10(k) + 3.485592; Dutch, log10(Pc) = -0.995365 log10(k) + 3.337836; English, log10(Pc) = -0.937050 log10(k) + 3.327242; Italian, log10(Pc) = -1.056252 log10(k) + 3.362926; Portuguese, log10(Pc) = -1.067550 log10(k) + 3.364569; Spanish, log10(Pc) = -1.080361 log10(k) + 3.370859.]

[Figure 6: Degree distribution for Network 1 obtained from the Portuguese text, with the fitted line, compared with a random network of the same size.]

Table 1: Summary of the fits.

Language    | # words | A/(γ-1) | γ    | χ²/dof
------------|---------|---------|------|-------
SPANISH     | 2801    | 2348.87 | 2.08 | 0.078
ENGLISH     | 2098    | 2124.43 | 1.94 | 0.041
DUTCH       | 2375    | 2176.89 | 2.00 | 0.040
BASQUE      | 3226    | 3059.09 | 2.22 | 0.016
GREEK       | 2951    | 2474.48 | 2.04 | 0.063
ITALIAN     | 2689    | 2306.35 | 2.06 | 0.045
PORTUGUESE  | 2607    | 2315.10 | 2.07 | 0.031
FRENCH      | 2600    | 2116.17 | 1.95 | 0.112

The properties calculated for the two types of networks (1 and 2) are very similar, but they differ significantly from the properties calculated for the random networks. The average node distance in the random networks is, on average, around two units larger than in the language networks, and its standard deviation is much smaller in the random case. The second interesting difference between random and language networks is the average clustering coefficient, which is very close to zero in the case of random networks. In language networks, words tend to form clusters because of the language structure (they share context or grammatical or semantic function, for example), and this feature is reflected in the clustering coefficient calculated from eq. (5).
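For completeness, a sketch of eq. (5) applied to the adjacency matrices built earlier (ours; the convention $C_i = 0$ for nodes with fewer than two neighbors is our assumption, as the paper does not specify it):

```python
import numpy as np

def average_clustering(m):
    """Average of C_i = 2 E_i / (k_i (k_i - 1)) over all nodes, where
    E_i is the number of links among the neighbors of node i."""
    cs = []
    for i in range(m.shape[0]):
        neigh = np.flatnonzero(m[i])
        k = len(neigh)
        if k < 2:
            cs.append(0.0)  # assumed convention for k_i < 2
            continue
        e = m[np.ix_(neigh, neigh)].sum() // 2  # each link is counted twice
        cs.append(2.0 * e / (k * (k - 1)))
    return float(np.mean(cs))

print(average_clustering(m1))  # `m1` from the network-building sketch
```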
3 Conclusions

Here we have presented a mathematical analysis in linguistics: the word frequency effect for different translations of the same book ("Le Petit Prince") into eight different languages. The interest of these studies is that the occurrence of words in sentences reflects the language's organization. Apart from the word frequency distribution, we also performed analyses of different networks built from word associations in the text and compared these to random networks.

As expected, the word frequency presented a scaling law. The results suggest small differences in language volume (the number of words needed for the same material); in particular, the γ parameter varied slightly across the different languages. Moreover, our study shows how different languages tend to differ slightly in formal aspects. Comparison of the word association networks with random networks makes evident the discrepancy between the random Erdős-Rényi model for graphs and real-world networks. A real network follows a specific design principle, and therefore its nodes are connected in an organized way. This becomes evident from the clustering coefficient, which has a high value for networks 1 and 2 but is very close to zero for the random networks. Another interesting difference between the real and random networks is the observation of the small-world effect in the real networks: their average node distance is much smaller than in the random networks.

Table 2: Network 1 parameters for the different languages. N is the number of nodes, ℓ is the number of links, γ is the parameter obtained by fitting a power law to the degree distribution of the nodes, C̄ is the average clustering, and d̄ is the average node distance. Parameters with a subscript R refer to the averages over the random networks, and the uncertainties shown are the standard deviations of the calculated averages (in the case of C̄_R and d̄_R, it is the standard deviation within the ensemble, not the average standard deviation within networks).

Language    | N    | ℓ    | γ     | C̄             | d̄             | C̄_R           | d̄_R
------------|------|------|-------|---------------|---------------|---------------|---------------
SPANISH     | 2705 | 6912 | 2.223 | 0.203 ± 0.343 | 3.240 ± 0.416 | 0.002 ± 0.000 | 4.988 ± 0.015
ENGLISH     | 1950 | 6770 | 2.260 | 0.248 ± 0.358 | 3.026 ± 0.379 | 0.004 ± 0.001 | 4.123 ± 0.006
DUTCH       | 2236 | 7048 | 2.201 | 0.294 ± 0.440 | 3.156 ± 0.413 | 0.003 ± 0.001 | 4.388 ± 0.006
BASQUE      | 3100 | 7017 | 2.481 | 0.069 ± 0.219 | 3.915 ± 0.657 | 0.001 ± 0.000 | 5.408 ± 0.021
GREEK       | 2745 | 6990 | 2.273 | 0.210 ± 0.349 | 3.287 ± 0.494 | 0.002 ± 0.000 | 5.005 ± 0.013
ITALIAN     | 2559 | 6566 | 2.258 | 0.153 ± 0.302 | 3.363 ± 0.446 | 0.002 ± 0.000 | 4.946 ± 0.014
PORTUGUESE  | 2311 | 5786 | 2.240 | 0.198 ± 0.365 | 3.292 ± 0.442 | 0.002 ± 0.000 | 4.945 ± 0.020
FRENCH      | 2230 | 6004 | 2.327 | 0.207 ± 0.362 | 3.231 ± 0.391 | 0.002 ± 0.001 | 4.737 ± 0.017

Finally, one can conclude that these results show how different languages tend to differ slightly in formal aspects when the context is controlled. In particular, these results are of interest to other applied fields. Bear in mind that, in recent decades, cognitive psychology has paid particular attention to factors influencing the recognition of printed words, i.e., frequency, familiarity, word length and age of acquisition, among others, according to Andrews (2006). There remain some underlying empirical questions regarding the measurement of word frequency for different languages, from printed materials to even subtitles. Even if more research is needed here, the comparison between these sources is beyond the scope of this study.
Here, we offer a comparison employing different translations of the same printed material in different languages. This allows us to compare differences of word frequency in the same context. Regarding this topic, Perea et al. (2013) and Yap et al. (2011) stated that other variables must play a role in frequency, such as the number of contexts in which a word appears. This corresponds with the nature of our results. Furthermore, some researchers (van Heuven et al., 2014) proposed the Zipf scale as a better standardized measure of word frequency, also giving examples of printed words with various Zipf values. Those authors claimed that the alternative Zipf scale presented in their work is better suited for research in word recognition. Here, we follow the same logic. Thus, these results might offer some insights into the role of the word frequency effect for printed words, but more research in this field is necessary.

Table 3: Network 2 parameters for the different languages. N is the number of nodes, ℓ is the number of links, γ is the parameter obtained by fitting a power law to the degree distribution of the nodes, C̄ is the average clustering, and d̄ is the average node distance. Parameters with a subscript R refer to the averages over the random networks, and the uncertainties shown are the standard deviations of the calculated averages (in the case of C̄_R and d̄_R, it is the standard deviation within the ensemble, not the average standard deviation within networks).

Language    | N    | ℓ    | γ     | C̄             | d̄             | C̄_R           | d̄_R
------------|------|------|-------|---------------|---------------|---------------|---------------
SPANISH     | 2682 | 6418 | 2.233 | 0.262 ± 0.518 | 3.413 ± 0.644 | 0.002 ± 0.001 | 5.164 ± 0.017
ENGLISH     | 1927 | 6499 | 2.277 | 0.332 ± 0.513 | 3.129 ± 0.506 | 0.003 ± 0.000 | 4.167 ± 0.009
DUTCH       | 2218 | 6577 | 2.213 | 0.370 ± 0.611 | 3.145 ± 0.560 | 0.003 ± 0.001 | 4.515 ± 0.010
BASQUE      | 3035 | 6064 | 2.439 | 0.157 ± 0.416 | 3.792 ± 0.948 | 0.001 ± 0.000 | 5.784 ± 0.024
GREEK       | 2703 | 6266 | 2.321 | 0.221 ± 0.481 | 3.438 ± 0.803 | 0.002 ± 0.000 | 5.250 ± 0.018
ITALIAN     | 2537 | 6203 | 2.283 | 0.163 ± 0.367 | 3.478 ± 0.654 | 0.002 ± 0.001 | 5.068 ± 0.019
PORTUGUESE  | 2260 | 5064 | 2.285 | 0.232 ± 0.476 | 3.425 ± 0.792 | 0.002 ± 0.000 | 5.230 ± 0.016
FRENCH      | 2191 | 5290 | 2.298 | 0.202 ± 0.447 | 3.366 ± 0.712 | 0.002 ± 0.001 | 5.007 ± 0.015

Acknowledgment

We would like to thank Thomas Irvin for his invaluable help and comments.

References

Amaral, L. A., Scala, A., Barthelemy, M. & Stanley, H. E. (2000), 'Classes of small-world networks', Proc. Natl. Acad. Sci. U.S.A. 97(21), 11149–11152.

Andrews, S. (2006), 'All about words: A lexicalist perspective on reading', in From Inkmarks to Ideas: Current Issues in Lexical Processing, p. 318.

Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Croft, W., Ellis, N. C., Holland, J., Ke, J., Larsen-Freeman, D. & Schoenemann, T. (2009), 'Language is a complex adaptive system: Position paper', Language Learning 59(s1), 1–26.

Breland, H. M. (1996), 'Word frequency and word difficulty: A comparison of counts in four corpora', Psychological Science 7, 96–99.

Cancho, R. F. & Solé, R. V. (2003), 'Least effort and the origins of scaling in human language', Proceedings of the National Academy of Sciences 100(3), 788–791.

Coltheart, M. (1981), 'The MRC psycholinguistic database', The Quarterly Journal of Experimental Psychology 33(4), 497–505.

Crucitti, P., Latora, V., Marchiori, M. & Rapisarda, A. (2003), 'Efficiency of scale-free networks: error and attack tolerance', Physica A: Statistical Mechanics and its Applications 320, 622–642.
Davis, C. J. (2005), 'N-Watch: A program for deriving neighborhood size and other psycholinguistic statistics', Behavior Research Methods 37(1), 65–70.

Dijkstra, E. (1959), 'A note on two problems in connexion with graphs', Numerische Mathematik 1(1), 269–271. URL: http://dx.doi.org/10.1007/BF01386390

Dufau, S., Duñabeitia, J. A., Moret-Tatay, C., McGonigal, A., Peeters, D., Alario, F.-X., Balota, D. A., Brysbaert, M., Carreiras, M., Ferrand, L. et al. (2011), 'Smart phone, smart science: how the use of smartphones can revolutionize research in cognitive science', PLoS ONE 6(9), e24974.

Erdős, P. & Rényi, A. (1960), 'On the evolution of random graphs', in Publications of the Mathematical Institute of the Hungarian Academy of Sciences, pp. 17–61.

Esteves, C. S., Oliveira, C. R., Moret-Tatay, C., Navarro-Pardo, E., Carli, G. A. D., Silva, I. G., Irigaray, T. Q. & Argimon, I. I. d. L. (2015), 'Phonemic and semantic verbal fluency tasks: normative data for elderly Brazilians', Psicologia: Reflexão e Crítica 28(2), 350–355.

Gamermann, D., Montagud, A., Jaime Infante, R., Triana, J., Urchueguía, J. & Fernández de Córdoba, P. (2014), 'PyNetMet: Python tools for efficient work with networks and metabolic models', Computational and Mathematical Biology 3, 1–11.

Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, 1st edn, Prentice Hall PTR, Upper Saddle River, NJ, USA.

Moreno-Cid, A., Moret-Tatay, C., Irigaray, T. Q., Argimon, I. I., Murphy, M., Szczerbinski, M., Martínez-Rubio, D., Beneyto-Arrojo, M. J., Navarro-Pardo, E. & Fernández, P. (2015), 'The role of age and emotional valence in word recognition: An ex-Gaussian analysis', Studia Psychologica 57(2), 83–94.

Moret-Tatay, C. & Perea, M. (2011a), 'Do serifs provide an advantage in the recognition of written words?', Journal of Cognitive Psychology 23(5), 619–624.

Moret-Tatay, C. & Perea, M. (2011b), 'Is the go/no-go lexical decision task preferable to the yes/no task with developing readers?', Journal of Experimental Child Psychology 110(1), 125–132.

Navarro-Pardo, E., Navarro-Prados, A. B., Gamermann, D. & Moret-Tatay, C. (2013), 'Differences between young and old university students on a lexical decision task: Evidence through an ex-Gaussian approach', The Journal of General Psychology 140(4), 251–268.

Pastor-Satorras, R., Vazquez, A. & Vespignani, A. (2001), 'Dynamical and correlation properties of the internet', Phys. Rev. Lett. 87(25), 258701.

Perea, M., Comesaña, M., Soares, A. P. & Moret-Tatay, C. (2012), 'On the role of the upper part of words in lexical access: Evidence with masked priming', The Quarterly Journal of Experimental Psychology 65(5), 911–925.

Perea, M., Gatt, A., Moret-Tatay, C. & Fabri, R. (2012), 'Are all Semitic languages immune to letter transpositions? The case of Maltese', Psychonomic Bulletin & Review 19(5), 942–947.

Perea, M., Moret-Tatay, C. & Carreiras, M. (2011), 'Facilitation versus inhibition in the masked priming same-different matching task', The Quarterly Journal of Experimental Psychology 64(10), 2065–2079.

Perea, M., Moret-Tatay, C. & Gómez, P. (2011), 'The effects of interletter spacing in visual-word recognition', Acta Psychologica 137(3), 345–351.

Perea, M., Soares, A. P. & Comesaña, M. (2013), 'Contextual diversity is a main determinant of word identification times in young readers', Journal of Experimental Child Psychology 116(1), 37–44.

Ravasz, E. & Barabási, A. L. (2003), 'Hierarchical organization in complex networks', Physical Review E 67(2), 026112.
Rives, A. W. & Galitski, T. (2003), 'Modular organization of cellular networks', Proc. Natl. Acad. Sci. U.S.A. 100(3), 1128–1133.

Sternberg, R. J. & Powell, J. S. (1983), 'Comprehending verbal comprehension', American Psychologist 38(8), 878.

van Heuven, W. J., Mandera, P., Keuleers, E. & Brysbaert, M. (2014), 'SUBTLEX-UK: A new and improved word frequency database for British English', The Quarterly Journal of Experimental Psychology 67(6), 1176–1190.

Yap, M. J., Tan, S. E., Pexman, P. M. & Hargreaves, I. S. (2011), 'Is more always better? Effects of semantic richness on lexical decision, speeded pronunciation, and semantic classification', Psychonomic Bulletin & Review 18(4), 742–750.