Submitted 15 December 2018
Accepted 22 August 2019
Published 16 September 2019

Corresponding author
Anwitaman Datta,
anwitaman@ntu.edu.sg

Academic editor
Diego Amancio

Additional Information and
Declarations can be found on
page 19

DOI 10.7717/peerj-cs.220

Copyright
2019 Oggier et al.

Distributed under
Creative Commons CC-BY 4.0

OPEN ACCESS

A split-and-transfer flow based entropic
centrality
Frédérique Oggier1,*, Silivanxay Phetsouvanh2 and Anwitaman Datta2,*

1 Division of Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
2 School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
* These authors contributed equally to this work.

ABSTRACT
The notion of entropic centrality measures how central a node is in terms of how
uncertain the destination of a flow starting at this node is: the more uncertain the
destination, the more well connected and thus central the node is deemed. This
implicitly assumes that the flow is indivisible, and at every node, the flow is transferred
from one edge to another. The contribution of this paper is to propose a split-and-
transfer flow model for entropic centrality, where at every node, the flow can actually be
arbitrarily split across choices of neighbours. We show how to map this to an equivalent
transfer entropic centrality set-up for the ease of computation, and carry out three case
studies (an airport network, a cross-shareholding network and a Bitcoin transactions
subnetwork) to illustrate the interpretation and insights linked to this new notion of
centrality.

Subjects Data Mining and Machine Learning, Data Science
Keywords Entropy, Centrality, Information flow

INTRODUCTION
Centrality is a classical measure used in graph theory and network analysis to identify
important vertices. The meaning of ‘‘important’’ depends on the nature of the problem
analyzed, e.g., hubs in networks, spreaders of a disease, or influencers in social networks.
Commonly used centrality measures include: the degree centrality which is the degree (or
in-degree/out-degree) of the vertex depending on whether the graph is directed, possibly
normalized to get the fraction of vertices a given vertex is connected to; the closeness
centrality which is the reciprocal of the sum of the shortest path distances from a given
vertex to all others, typically normalized, and indicates how close a given vertex is to all
other vertices in the network; the betweenness centrality which is the sum of the fraction
of all pairs of shortest paths that pass through it, indicating the extent to which a given
vertex stands between other vertex pairs (see e.g., Estrada, 2011 for a survey of different
centrality measures and how centralities fit into the more general framework of complex
networks). These were extended to weighted graphs, though at the risk of changing the
interpretation of the measure, e.g., one may use weighted degrees instead of degrees, but
this measure does not count the number of neighbors anymore (see e.g., Opsahl, Agneessens
& Skvoretz, 2010 for a discussion on using the above cited centrality measures for weighted
graphs). Another way to determine centrality is to assign as centrality a (scaled) average

How to cite this article Oggier F, Phetsouvanh S, Datta A. 2019. A split-and-transfer flow based entropic centrality. PeerJ Comput. Sci.
5:e220 http://doi.org/10.7717/peerj-cs.220

mailto:anwitaman@ntu.edu.sg
https://peerj.com/academic-boards/editors/
https://peerj.com/academic-boards/editors/
http://dx.doi.org/10.7717/peerj-cs.220
http://creativecommons.org/licenses/by/4.0/
http://creativecommons.org/licenses/by/4.0/
http://doi.org/10.7717/peerj-cs.220


of the centralities of the neighbors. This is the idea behind eigenvector centrality discussed
by Newman (2009), which was already debated by Bonacich (1972), who later generalized
it to alpha centrality (Bonacich & Lloyd, 2001). Alpha centrality introduces an additive
exogenous term, which accounts for an influencing factor which does not depend on the
network structure. Though Katz centrality (Katz (1953)) relies on the idea that importance
is measured by weighted numbers of walks from the vertex in question to other vertices
(where longer walks have less weights than short ones), it turns out that the alpha centrality
and Katz centrality differ by a constant term. With these three centralities, a highly central
vertex with many links tends to endorse all its neighbors which in turn become highly
central. However one could argue that the inherited centrality should be diluted if the
central vertex is too magnanimous in the sense that it has too many neighbors. This is
solved by Page Rank centrality, which is based on the PageRank algorithm developed
by Page et al. (1999). Iannelli & Mariani (2018) proposed ViralRank as a new centrality
measure, defined to be the average random walk effective distance to and from all the other
nodes in the network. This measure is meant to identify influencers for global contagion
processes. Benzi & Klymko (2015) showed that a parameterized random walk model can
capture the behavior of a gamut of centrality measures, including degree centrality (walks
of length one) and eigenvector based centrality models (considered as infinite walks), which
contain the eigenvector and Katz centralities as particular cases. This parameterized model
helps explain and interpret the high rank correlation observed among degree centrality and
eigenvector based centralities. Schoch, Valente & Brandes (2017) argues that the role of the
network structure itself should not be underestimated when looking at correlations among
centralities.

Notwithstanding this high rank correlation among centrality measures, each measure
captures the vertex importance subject to a certain interpretation of importance, which is
a key rationale behind studying different centrality models in different contexts. A seminal
work by Borgatti (2005) looked at which notion of centrality is best suited given a scenario,
by characterizing the scenario as a flow circulating over a network: a typology of the flow
process is given across two dimensions, the type of circulation (parallel/serial duplication,
transfer) and the flow trajectories (geodesics, paths, trails, or walks): a flow may be based
on transfer, where an item or unit flows in an indivisible manner (e.g., package delivery), or
by serial replication, in which both the node that sends the item and the one that receives
it have the item (e.g., one-to-one gossip), or parallel duplication, where an item can be
transmitted in parallel through all outgoing edges (e.g., epidemic spread). It was shown
for example that betweenness is best suited for geodesics and transfer, while eigenvector
based centralities should be used for walks and parallel duplication. Indeed, betweenness
is based on shortest paths, suggesting a target to be reached as fast as possible, and
thus fitting transfer. Using Katz’s intuition, eigenvector based centralities count possible
unconstrained walks, and they are consistent with a scenario where every vertex influences
all of its neighbors simultaneously, which is consistent with parallel deduplication. This
flow characterization is of interest for this work, since we will be looking at a case where a
flow is actually not just transferred, but also split among outgoing edges, with the possibility
to partly remain at any node it encounters. This scenario could typically be motivated by

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 2/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


financial transactions, which are transferred, not duplicated. However when transferred,
the flow of money is not indivisible. Based on Borgatti’s typology, a measure of centrality
for transfer should be based on paths rather than eigenvectors. This is indeed the approach
that we will explore.

Our starting point is the notion of entropic centrality as proposed by Tutzauer (2007).
A (directed) graph G= (V,E) with vertex set V and edge set E is built whose edges are
unweighted. To define the centrality of u∈V , the probability pu,v that a random walk
constrained to not revisit any vertex (thus, only forming paths) starting at u terminates at v
is computed. To model the stoppage of flow/walk at any vertex, an edge to itself (self-loop)
is added. The process of computing pu,v is thus to consider a constrained random walk
to start at node u, and at every node w encountered in the path, to choose an outgoing
edge uniformly at random among the edges leading to unvisited nodes (or choosing the
self-loop to terminate the walk). Then the entropic centrality CH(u) of u is defined to be

CH(u)=−
∑
v∈V

pu,vlog2pu,v. (1)

This notion of entropic centrality was adapted in Nikolaev, Razib & Kucheriya (2015) to
fit a Markov model, where instead of paths, unconstrained random walks are considered, for
computational efficiency. In general, how to compute centrality at scale is an interesting
direction of study in its own right, e.g., Fan, Xu & Zhao (2017), but this is somewhat
orthogonal to the emphasis of the current work.

In this work we revisit and generalize the original concept of entropic centrality to make
it more flexible. To do so, we first interpret the ‘‘transfer’’ centrality proposed in Tutzauer
(2007) as having (1) an underlying graph, where every edge has a probability which is
that of being chosen uniformly at random among the other outgoing edges of a given
vertex, and (2) an indivisible flow which starts at a vertex u, and follows some path where
the probability to choose an edge at every vertex in this path is given by the probability
attached to this edge, taking into account unvisited neighbors, to reach v. Since the flow is
indivisible, the self-loop represents the probability for this flow to stop at a given vertex.

In our generalization, we similarly assume that we have (1) an underlying graph, only
now the probability attached to each edge depends on the scenario considered and could be
arbitrary, (2) the flow used to measure centrality can split among neighbors, by specifying
which subsets it goes to with which probability, at every vertex it encounters (as per a
flow in the traditional network analysis sense, flow conservation applies, meaning that
the amount of flow that goes out of u is the same amount of flow that reaches all of its
neighbors). Again, a self-loop is an artifact introduced to capture the effect of the flow
on vertices, even if none of the flow actually remains in the vertex (As in Nikolaev, Razib
& Kucheriya, 2015, a zero probability would otherwise render zero contribution to the
entropic centrality calculation). While the underlying phenomenon may have self-loops,
they may or not be directly used to determine the self-loops needed for the mathematical
model. This should be determined based on the scenario being modeled.

The above motivates the notion of a split-and-transfer entropic centrality. Since
propagation of flow is an indicator of spread over the network, we will also consider a scaled

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 3/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


version of entropic centrality, where a multiplicative factor is introduced to incorporate
additional information, which may suggest an a priori difference of importance among the
vertices, for instance, if the data suggests that some vertices handle volume of goods much
larger than other vertices.

The contributions of this work are to (1) introduce the above framework for split-and-
transfer entropic centrality, (2) show in ‘The transfer entropic centrality’ that transfer
centrality can be easily extended to consider arbitrary probabilities on graph edges and
(3) prove that computing the split-and-transfer entropic centrality can be reduced to
transfer entropic centrality over a graph with suitable equivalent edge probabilities (which
is crucial from a practicality perspective), as shown in Proposition 1 of ‘The transfer
entropic centrality’. Studies that showcase and explore our technique are provided in
‘Case Studies’: (i) a cross-shareholding network representing portfolio diversification, that
illustrates the versatility of our framework (ii) a subgraph of wallet addresses from the
Bitcoin network, which originally motivated the study of split-and-transfer flows, and (iii)
an airport network. Comparisons with other standard centralities (alpha, Katz, betweenness
and PageRank) are given, showing that the entropic centrality captures different features.

THE NOTION OF SPLIT-AND-TRANSFER ENTROPIC
CENTRALITY
The transfer entropic centrality
Consider the network shown on Fig. 1A and assume that the probability of an indivisible
flow going from one vertex to another is uniform at random (including the option to
remain at the current vertex). For a flow starting at v1, there is then a probability

1
4 to go

to v4, and a probability
1
2 to continue to v5, so the probability to go from v1 to v5 following

the path (v1,v4,v5) is
1
8. But since it is also possible to reach v5 from v1 using v3 instead, an

event of probability 18, we have that the probability pv1,v5 for an indivisible flow to start at
v1 and stop at v5 is pv1,v5 =

1
4. Similarly, we compute pv1,v1,pv1,v2,pv1,v3 and pv1,v4, and the

transfer entropic centrality CH(u) of u=v1 is CH(v1)=
3
4log24+

2
8log2(8)=2.25 by (1).

For a point of comparison, on the right of the same figure, we change the probability to
go out of v1, such that the edge (v1,v2) is chosen with a probability

1
2, while the probability

is 16 for using the edges to the other vertices (including a probability
1
6 that the flow just stays

at v1 itself). The resulting probabilities are provided on Fig. 1B. There is no complication
in computing CH(v1) using (1) with non-uniform probabilities. This reduces slightly
the centrality of v1, which is consistent with the interpretation of entropic centrality: the
underlying notion of entropy is a measure of uncertainty (Tutzauer, 2007), the uncertainty
of the final destination of a flow, knowing that it started at a given vertex. Imagine the
most extreme case where the edge (v1,v2) is chosen with a probability 1, then even though
v1 has three potential outgoing neighbors, two of them are used with probability 0, so the
centrality of v1 would reduce considerably, as expected, since there is no uncertainty left
regarding the destination of a flow at v1.

The notion of transfer entropic centrality captured by (1) assumes that there is no vertex
repetition in the paths taken by the flow. Figure 2 illustrates this hypothesis. Again for

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 4/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


v1

v3v2 v4

v5

pv1,v2 =
1
4

pv1,v1 =
1
4

pv1,v3 =
1
8

pv1,v4 =
1
8

pv1,v5 =
1
4

CH(v1) = 2.25.

1
4

1
1
2

1
2

1

1
4

1
4

1
4

1
2

1
2

(a)

v1

v3v2 v4

v5

pv1,v2 =
1
2

pv1,v1 =
1
6

pv1,v3 =
1

12

pv1,v4 =
1
12

pv1,v5 =
1
6

CH(v1) ≈ 1.959.

1
6

1
1
2

1
2

1

1
2

1
6

1
6

1
2

1
2

(b)

Figure 1. The transfer entropic centrality CH(v1) of v1 is computed using (1), for a uniform edge
distribution (the choice of an edge at a given vertex is chosen uniformly at random among choices of
unvisited neighbors) in (1a), and for a non-uniform distribution in (1b).

reaches all of its neighbors). Again, a self-loop is an artifact introduced to capture the effect of the flow92
on vertices, even if none of the flow actually remains in the vertex (As in (Nikolaev et al. (2015)), a zero93
probability would otherwise render zero contribution to the entropic centrality calculation). While the94
underlying phenomenon may have self-loops, they may or not be directly used to determine the self-loops95
needed for the mathematical model. This should be determined based on the scenario being modeled.96

The above motivates the notion of a split-and-transfer entropic centrality. Since propagation of flow97
is an indicator of spread over the network, we will also consider a scaled version of entropic centrality,98
where a multiplicative factor is introduced to incorporate additional information, which may suggest an a99
priori difference of importance among the vertices, for instance, if the data suggests that some vertices100
handle volume of goods much larger than other vertices.101

The contributions of this work are to (1) introduce the above framework for split-and-transfer entropic102
centrality, (2) show in Subsection 2.1 that transfer centrality can be easily extended to consider arbitrary103
probabilities on graph edges and (3) prove that computing the split-and-transfer entropic centrality104
can be reduced to transfer entropic centrality over a graph with suitable equivalent edge probabilities105
(which is crucial from a practicality perspective), as shown in Proposition 1 of Subsection 2.1. Studies106
that showcase and explore our technique are provided in Section 3: (i) a cross-shareholding network107
representing portfolio diversification, that illustrates the versatility of our framework (ii) a subgraph of108
wallet addresses from the Bitcoin network, which originally motivated the study of split-and-transfer109
flows, and (iii) an airport network. Comparisons with other standard centralities (alpha, Katz, betweenness110
and PageRank) are given, showing that the entropic centrality captures different features.111

2 THE NOTION OF SPLIT-AND-TRANSFER ENTROPIC CENTRALITY112
2.1 The Transfer Entropic Centrality113
Consider the network shown on Figure 1a and assume that the probability of an indivisible flow going114
from one vertex to another is uniform at random (including the option to remain at the current vertex).115
For a flow starting at v1, there is then a probability

1
4 to go to v4, and a probability

1
2 to continue to v5, so116

the probability to go from v1 to v5 following the path (v1,v4,v5) is
1
8 . But since it is also possible to reach117

v5 from v1 using v3 instead, an event of probability
1
8 , we have that the probability pv1,v5 for an indivisible118

flow to start at v1 and stop at v5 is pv1,v5 =
1
4 . Similarly, we compute pv1,v1 , pv1,v2 , pv1,v3 and pv1,v4 , and119

the transfer entropic centrality CH(u) of u = v1 is CH(v1) =
3
4 log2 4 +

2
8 log2(8) = 2.25 by (1).120

For a point of comparison, on the right of the same figure, we change the probability to go out of v1,121
such that the edge (v1,v2) is chosen with a probability

1
2 , while the probability is

1
6 for using the edges to122

the other vertices (including a probability 16 that the flow just stays at v1 itself). The resulting probabilities123
are provided on Figure 1b. There is no complication in computing CH(v1) using (1) with non-uniform124
probabilities. This reduces slightly the centrality of v1, which is consistent with the interpretation of125

3/14
PeerJ Comput. Sci. reviewing PDF | (CS-2018:12:33449:4:0:NEW 13 Aug 2019)

Manuscript to be reviewedComputer Science

Figure 1 The transfer entropic centrality CH (v1) of v1 is computed using (1), for a uniform edge distri-
bution (the choice of an edge at a given vertex is chosen uniformly at random among choices of unvis-
ited neighbors) in (A), and for a non-uniform distribution in (B).

Full-size DOI: 10.7717/peerjcs.220/fig-1

v1

v3v2 v4

v5

pv1,v2 =
1
4

pv1,v1 =
1
4

pv1,v3 =
1
8

pv1,v4 =
1
8

pv1,v5 =
1
4

CH(v1) = 2.25.

1
1
2

1
3

1

1
3

1
2

1
3

1
4

1
4

1
4

1
4

(a)

v1

v3v2 v4

v5

pv1,v2 =
1
2

pv1,v1 =
1
6

pv1,v3 =
1

12

pv1,v4 =
1
6

1
5

pv1,v5 =
1
6

4
5 +

1
6

1
2

CH(v1) ≈ 2.0795.

1
2

1
2

1
6

1

1
6

1
2

2
3

1
6

1
2

1
6

1
6

(b)

Figure 2. An example of transfer centrality involving already visited neighbors. If probabilities are
uniform at random (2a), they are scaled according to the number of unvisited neighbors. If not (2b), they
are scaled proportionally to the existing probabilities.

entropic centrality: the underlying notion of entropy is a measure of uncertainty (Tutzauer (2007)), the126
uncertainty of the final destination of a flow, knowing that it started at a given vertex. Imagine the127
most extreme case where the edge (v1,v2) is chosen with a probability 1, then even though v1 has three128
potential outgoing neighbors, two of them are used with probability 0, so the centrality of v1 would reduce129
considerably, as expected, since there is no uncertainty left regarding the destination of a flow at v1.130

The notion of transfer entropic centrality captured by (1) assumes that there is no vertex repetition131
in the paths taken by the flow. Figure 2 illustrates this hypothesis. Again for the centrality of v1, a flow132
leaves v1, it can go to either v2, v3 or v4. When reaching v4, the flow cannot go back to v1, since v1133
is already visited (and going back would not give a path anymore), there the probabilities to stay at v4134
and to go to v5 from v4 are modified. On the left, when probabilities are uniform, since now only two135
outgoing edges of v4 are available, namely edges (v4,v4) and (v4,v5), each is assigned a probability of

1
2 .136

On the right, when probabilities are not uniform, we distribute the probability of going to some visited137
vertex proportionally to the rest of the available edges. Since 46 is going to v5 while

1
6 is staying at v4, we138

have 4 and 1 out of 5 respectively leaving and staying, thus obtaining the renormalized probabilities as139
4
6 +

4
5

1
6 =

4
5 and

1
6 +

1
5

1
6 =

1
5 .140

The examples of Figures 1 and 2 illustrate diverse cases of indivisible flow. By definition of indivis-141
ibility, the choice of an edge at a vertex u corresponds to choosing subsets containing one vertex only142
in the list of all subsets of neighbors. We can thus set a probability 0 to all subsets which contain more143
than one vertex. Therefore, the definition of entropic centrality in (1), with or without uniform edge144
probabilities, are particular cases of the proposed split-and-transfer framework, that we discuss next.145

2.2 The Split-and-Transfer Entropic Centrality146
Consider the network of Figure 3 depicting a seller v1 whose direct customers are v2,v3,v4. Say we147
further know that when v1 distributes a new batch of items, he does so to either customers {v2,v3} or148
{v3,v4}, and in fact, the pair {v3,v4} is preferred (they receive 2/3 of the batches, versus 1/3 for the149
group {v2,v3}). Furthermore, in the first case, v2 receives a higher volume than v3 (say 2/3 of the batch150
goes to v2), while for the second case, v4 takes 3/4 of the batch shared with v3. Once v3,v4 obtain the151
items, they typically keep half for themselves, and distribute the other half to v5.152
To compute the centrality of v1, we consider a divisible flow starting at u = v1 which can split among153
different paths instead of following one. To model the choice of splitting among possible neighbors,154
we first define a probability q(x) over the set Eu = { {v1}, {v2}, {v3}, {v4}, {v1,v2}, {v1,v3}, {v1,v4},155
{v2,v3}, {v2,v4}, {v3,v4}, {v1,v2,v3}, {v1,v2,v4}, {v1,v3,v4}, {v2,v3,v4}, {v1,v2,v3,v4} } such that,156
for our example, q({v2,v3}) = 13 , q({v3,v4}) =

2
3 , and q(x) = 0 for other choices of x (in contrast to157

(Oggier et al. (2018c)) where it was chosen to be uniformly at random). This represents the fact that 1/3158
of the time, v1 sends the goods to the pair {v2,v3} (as shown in Figure 3a), while for the rest of the time,159

4/14
PeerJ Comput. Sci. reviewing PDF | (CS-2018:12:33449:4:0:NEW 13 Aug 2019)

Manuscript to be reviewedComputer Science

Figure 2 An example of transfer centrality involving already visited neighbors. If probabilities are uni-
form at random (A), they are scaled according to the number of unvisited neighbors. If not (B), they are
scaled proportionally to the existing probabilities.

Full-size DOI: 10.7717/peerjcs.220/fig-2

the centrality of v1, a flow leaves v1, it can go to either v2, v3 or v4. When reaching v4, the
flow cannot go back to v1, since v1 is already visited (and going back would not give a
path anymore), there the probabilities to stay at v4 and to go to v5 from v4 are modified.

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 5/21

https://peerj.com
https://doi.org/10.7717/peerjcs.220/fig-1
https://doi.org/10.7717/peerjcs.220/fig-2
http://dx.doi.org/10.7717/peerj-cs.220


v1

v3v2 v4

v5

0

1
1
2

1
2

1

0

1
2

1
2

2
3

1
3

(a)

v1

v3v2 v4

v5

0

1
1
2

1
2

1

0

1
2

1
2

3
4

1
4

(b)

u

v3v2 v4

v5

0

1
1
2

1
2

1

2
9

5
18

1
2

1
2

1
2

(c)

Figure 3. An example of split-and-transfer entropic centrality: on (3a) in red, the event corresponding
to choosing {v2,v3}, in (3b), the event {v3,v4}. The probabilities pu,v are computed by summing over
both events, weighted by the respective event probability: pv1,v2 =

1
3 (

2
3 ) =

2
9 , pv1,v4 =

2
3 (

3
8 ) =

1
4 ,

pv1,v3 =
1
3 (

1
6 )+

2
3 (

1
8 ), pv1,v5 =

1
3 (

1
6 )+

2
3 (

1
8 +

3
8 ). This gives CH(v1) ≈ 1.9076.

it sends it to the pair {v3,v4} (shown in Figure 3b). We compute the path probabilities for each event, for160
q({v2,v3}) = 13 and for q({v3,v4}) =

2
3 accordingly.161

We have further information: when v1 deals with {v2,v3}, there is a bias of 23 for v2 compared to
1
3162

for v3, and the bias is of
3
4 for v4 in the other case. The corresponding probabilities are attached to the163

edges {(v1,v2),(v1,v3)} and {(v1,v3),(v1,v4)} respectively (shown in Figure 3c). Now that the edge164
probabilities are defined, we can compute the path probabilities. For example, from v1 to v5, we sum up165
the path probabilities for both events, weighted by the respective event probability: 13 (

1
6 )+

2
3 (

1
8 +

3
8 ).166

We next provide a general formula. We let a flow start at a vertex whose centrality we wish to compute,167
and at some point of the propagation process, a part fu of the flow reaches u. Let Nu be the neighborhood168
of interest given fu, that is, the set of outgoing neighbors which have not yet been visited by the flow.169
Every outgoing edge (u,v) of u exactly corresponds to some outgoing neighbor v, so in what follows,170
we may refer to either one or the other. Let Eu denote the set of possible outgoing edge subsets (where171
every edge (u,v) is represented by v the neighbor). We attach a possibly distinct probability q(x) to every172
choice x in Eu. Then ∑x∈Eu q(x) = 1.173

Every x in Eu corresponds to a set of edges (u,v) for v a neighbor. We further attach a weight ωx(u,v)174
to every edge in x, with the constraint that ∑(u,v)∈x ωx(u,v) = fu. For example, we could choose all edges175
with equal weight, that is ωx(u,v) =

fu
i for every (u,v) in x containing i edges, to instantiate the special176

case where the flow is uniformly split among all edges.177
For a given node u, we compute the expected flow from u to a chosen neighbor v. Every such choice

of x comes with a probability q(x), and every edge (u,v) in x has a weight ωx(u,v), which sums up to

fuv = ∑
x∈Eu,v

q(x)ωx(u,v), (2)

where Eu,v contains the sets in Eu themselves containing v.178

Example 1. Consider the running example, with u = v1. The set of neighbors of u is Nu = {u,v2,v3,v4}.179
We assign the following probabilities: q({u}) = q1, q({v2}) = q2, q({v3}) = q3, q({v4}) = q4, q({u,v2}) =180
q5, q({u,v3}) = q6, q({u,v4}) = q7, q({v2,v3}) = q8, q({v2,v4}) = q9, q({v3,v4}) = q10, q({u,v2,v3}) =181
q11, q({u,v2,v4}) = q12, q({u,v3,v4}) = q13, q({v2,v3,v4}) = q14, q({u,v2,v3,v4}) = q15, with ∑15i=1 qi =182
1. We write down explicitly the terms involved in the sum (2) for two nodes, v2 and v3:183

fu,v2 = q2 fu + q5ω{u,v2}(u,v2)+ q8ω{v2,v3}(u,v2)+ q9ω{v2,v4}(u,v2)+ q11ω{u,v2,v3}(u,v2)
+q12ω{u,v2,v4}(u,v2)+ q14ω{v2,v3,v4}(u,v2)+ q15ω{u,v2,v3,v4}(u,v2).

fu,v3 = q3 fu + q6ω{u,v3}(u,v3)+ q8ω{v2,v3}(u,v3)+ q10ω{v3,v4}(u,v3)+
q11ω{u,v2,v3}(u,v3)+ q13ω{u,v3,v4}(u,v3)+ q14ω{v2,v3,v4}(u,v3)+ q15ω{u,v2,v3,v4}(u,v3).

5/14
PeerJ Comput. Sci. reviewing PDF | (CS-2018:12:33449:4:0:NEW 13 Aug 2019)

Manuscript to be reviewedComputer Science

Figure 3 An example of split-and-transfer entropic centrality: on (A) in red, the event corresponding
to choosing {v2,v3}, in (B), the event {v3,v4}. The probabilities pu,v are computed by summing over both
events, weighted by the respective event probability: pv1,v2 =

1
3 (

2
3 )=

2
9 , pv1,v4 =

2
3 (

3
8 )=

1
4 , pv1,v3 =

1
3 (

1
6 )+

2
3 (

1
8 ), pv1,v5 =

1
3 (

1
6 )+

2
3 (

1
8 +

3
8 ). This gives CH (v1)≈1.9076.

Full-size DOI: 10.7717/peerjcs.220/fig-3

On the left, when probabilities are uniform, since now only two outgoing edges of v4 are
available, namely edges (v4,v4) and (v4,v5), each is assigned a probability of

1
2. On the

right, when probabilities are not uniform, we distribute the probability of going to some
visited vertex proportionally to the rest of the available edges. Since 46 is going to v5 while
1
6 is staying at v4, we have 4 and 1 out of 5 respectively leaving and staying, thus obtaining
the renormalized probabilities as 46 +

4
5
1
6 =

4
5 and

1
6 +

1
5
1
6 =

1
5.

The examples of Figs. 1 and 2 illustrate diverse cases of indivisible flow. By definition of
indivisibility, the choice of an edge at a vertex u corresponds to choosing subsets containing
one vertex only in the list of all subsets of neighbors. We can thus set a probability 0 to all
subsets which contain more than one vertex. Therefore, the definition of entropic centrality
in (1), with or without uniform edge probabilities, are particular cases of the proposed
split-and-transfer framework, that we discuss next.

The split-and-transfer entropic centrality
Consider the network of Fig. 3 depicting a seller v1 whose direct customers are v2,v3,v4.
Say we further know that when v1 distributes a new batch of items, he does so to either
customers {v2,v3} or {v3,v4}, and in fact, the pair {v3,v4} is preferred (they receive 2/3 of
the batches, versus 1/3 for the group {v2,v3}). Furthermore, in the first case, v2 receives a
higher volume than v3 (say 2/3 of the batch goes to v2), while for the second case, v4 takes
3/4 of the batch shared with v3. Once v3,v4 obtain the items, they typically keep half for
themselves, and distribute the other half to v5.

To compute the centrality of v1, we consider a divisible flow starting at u=v1 which can
split among different paths instead of following one. To model the choice of splitting among
possible neighbors, we first define a probability q(x) over the set Eu ={{v1}, {v2}, {v3},
{v4}, {v1,v2}, {v1,v3}, {v1,v4}, {v2,v3}, {v2,v4}, {v3,v4}, {v1,v2,v3}, {v1,v2,v4}, {v1,v3,v4},
{v2,v3,v4}, {v1,v2,v3,v4}} such that, for our example, q({v2,v3})=

1
3, q({v3,v4})=

2
3, and

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 6/21

https://peerj.com
https://doi.org/10.7717/peerjcs.220/fig-3
http://dx.doi.org/10.7717/peerj-cs.220


q(x)=0 for other choices of x (in contrast to Oggier, Silivanxay & Datta (2018) where it
was chosen to be uniformly at random). This represents the fact that 1/3 of the time, v1
sends the goods to the pair {v2,v3} (as shown in Fig. 3A), while for the rest of the time, it
sends it to the pair {v3,v4} (shown in Fig. 3B). We compute the path probabilities for each
event, for q({v2,v3})=

1
3 and for q({v3,v4})=

2
3 accordingly.

We have further information: when v1 deals with {v2,v3}, there is a bias of
2
3 for v2

compared to 13 for v3, and the bias is of
3
4 for v4 in the other case. The corresponding

probabilities are attached to the edges {(v1,v2),(v1,v3)} and {(v1,v3),(v1,v4)} respectively
(shown in Fig. 3C). Now that the edge probabilities are defined, we can compute the path
probabilities. For example, from v1 to v5, we sum up the path probabilities for both events,
weighted by the respective event probability: 13(

1
6)+

2
3(

1
8 +

3
8).

We next provide a general formula. We let a flow start at a vertex whose centrality
we wish to compute, and at some point of the propagation process, a part fu of the flow
reaches u. Let Nu be the neighborhood of interest given fu, that is, the set of outgoing
neighbors which have not yet been visited by the flow. Every outgoing edge (u,v) of u
exactly corresponds to some outgoing neighbor v, so in what follows, we may refer to either
one or the other. Let Eu denote the set of possible outgoing edge subsets (where every edge
(u,v) is represented by v the neighbor). We attach a possibly distinct probability q(x) to
every choice x in Eu. Then

∑
x∈Euq(x)=1.

Every x in Eu corresponds to a set of edges (u,v) for v a neighbor. We further attach
a weight ωx(u,v) to every edge in x, with the constraint that

∑
(u,v)∈xωx(u,v)= fu. For

example, we could choose all edges with equal weight, that is ωx(u,v)=
fu
i for every (u,v)

in x containing i edges, to instantiate the special case where the flow is uniformly split
among all edges.

For a given node u, we compute the expected flow from u to a chosen neighbor v. Every
such choice of x comes with a probability q(x), and every edge (u,v) in x has a weight
ωx(u,v), which sums up to

fuv =
∑
x∈Eu,v

q(x)ωx(u,v), (2)

where Eu,v contains the sets in Eu themselves containing v.
Example 1 Consider the running example, with u=v1. The set of neighbors of u is
Nu ={u,v2,v3,v4}. We assign the following probabilities: q({u})=q1, q({v2})=q2,
q({v3})=q3, q({v4})=q4, q({u,v2})=q5, q({u,v3})=q6, q({u,v4})=q7, q({v2,v3})=q8,
q({v2,v4})=q9, q({v3,v4})=q10, q({u,v2,v3})=q11, q({u,v2,v4})=q12, q({u,v3,v4})=q13,
q({v2,v3,v4})=q14, q({u,v2,v3,v4})=q15, with

∑15
i=1qi =1. We write down explicitly the

terms involved in the sum (2) for two nodes, v2 and v3:

fu,v2 =q2fu+q5ω{u,v2}(u,v2)+q8ω{v2,v3}(u,v2)+q9ω{v2,v4}(u,v2)+q11ω{u,v2,v3}(u,v2)

+q12ω{u,v2,v4}(u,v2)+q14ω{v2,v3,v4}(u,v2)+q15ω{u,v2,v3,v4}(u,v2).

fu,v3 =q3fu+q6ω{u,v3}(u,v3)+q8ω{v2,v3}(u,v3)+q10ω{v3,v4}(u,v3)

+q11ω{u,v2,v3}(u,v3)+q13ω{u,v3,v4}(u,v3)+q14ω{v2,v3,v4}(u,v3)+q15ω{u,v2,v3,v4}(u,v3).

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 7/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


Then fu,u+fu,v2 +fu,v3 +fu,v4 = fu
∑15

i=1qi = fu. By setting q8 =
1
3 and ω{v2,v3}(u,v2)= fu

2
3,

we find fu,v2 = fu
2
9. Also, adding up q10 =

2
3 and ω{v2,v3}(u,v3)= fu

1
3, ω{v3,v4}(u,v3)= fu

1
4,

we find fu,v3 = fu
1
9 + fu

1
6 = fu

5
18. Similarly fu,v4 = fu

1
2 and indeed fu

2
9 + fu

5
18 + fu

1
2 = fu.

We repeat the computations for fv3,v5 and fv4,v5. For that, we need to know what is fv3
and fv4, but in this case, since both v3 and v4 only have one incoming edge, we have that
fv3 = fv1,v3 and fv4 = fv1,v4:

fv3,v5 =
1
2fv3 = fu

1
2

5
18,fv4,v5 =

1
2fv4 = fu

1
2
1
2,fv5 = fv3,v5+fv4,v5 = fu

7
18.

It is true that by setting fu =1, we have fu,v2 =
2
9 =pu,v2 as computed in Fig. 3, but this

is true because pv2,v2 =1. If we consider v3 instead, we find fu,v3 =
5
18 =2pu,v3, this is

because we have computed what reaches v3, but since v3 has an outgoing edge, we need to
distinguish what stays from what continues. Notice that by setting fu=1 and fv3 = fv4 =1,
we get

fu,v2 =
2
9,fu,v3 =

5
18,fu,v4 =

1
2,fv3,v5 =

1
2,fv4,v5 =

1
2.

We then assign to edge (vi,vj) the probability fvi,vj (with fu=1) as reported on Fig. 3A.

The property of flow conservation observed in the example holds true in general,
which we shall prove next. Indeed, when v varies in Nu, the sets Eu,v appearing in the
summation

∑
v∈Nu

∑
x∈Eu,v q(x)ωx(u,v) may intersect, so for each choice x, one can gather

all the Eu,v that contains x. For this x, we find a term in the above sum of the form
q(x)

∑
(u,v)∈xωx(u,v)=q(x)fu. Then∑

v∈Nu

∑
x∈Eu,v

q(x)ωx(u,v)=
∑
x∈Eu

q(x)fu= fu.

This shows that the flow from u to v is conserved over all the neighbors v ∈Nu given fu.
Thus, by setting fu=1, the quantity

fuv =
∑
x∈Eu,v

q(x)ωx(u,v)

becomes a probability, and in fact, putting this probability on the edge (u,v) in the context
of the transfer entropic centrality gives the same result as the above computations using
the split-and-transfer model, as in fact already illustrated on the figure in Example 1, since
the probabilities displayed on the edges of the graph have been computed in this manner.
We summarized what we computed in the proposition below.
Proposition 1 The split-and-transfer entropic centrality CH,p(u) of a vertex u is given by

CH,p(u)=−
∑
v∈V

quvlog2(quv)

where quv =
∑

x∈Eu,v q(x)ωx(u,v) is computed from (2) with fu = 1 and the usual
convention that 0· log20=0 is assumed. The index p in CH,p emphasizes the dependency
on the choice of the probability distribution p. Then we have CH,unif when p is uniform as a
particular case.

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 8/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


1We thank the authors of Dastkhan &
Gharneh (2016) for sharing their data
with us.

We thus showed that the split-and-transfer entropic centrality is equivalent to a transfer
entropic centrality, assuming the suitable computation of edge probabilities.

The notion of split-and-transfer entropic centrality characterizes the spread of a flow
starting at u through the graph. Now two vertices may have the same spread, but one vertex
may be dealing with an amount of goods much larger than the other. In order to capture
the scale of a flow, we also propose a scaled version of the above entropy.
Definition 1 The scaled split-and-transfer entropic centrality is accordingly given by
F(fu)CH,p(u) where F is a scaling function.

As a corollary, the computational complexity of this centrality measure is the same as
that of Tutzauer (2007), namely that of a depth first search, i.e., O(|V |+|E|) (Migliore,
Martorana & Sciortino, 1990). When the graph becomes large and some probability become
negligible, a natural heuristic of setting them to 0 is used.

The scaling function F may depend on the nature of the underlying real world
phenomenon being modeled by the graph, with F(fu)= fu being a simple default possible
choice (other standard choices are F(fu)=

√
fu or F(fu)= log(fu)). We use the default

choice in the example below.
Example 2 Continuing with the same example, we use the edge probabilities as obtained
in Example 1 to compute the transfer entropic centrality from Definition 1. The
scaled entropic centralities of u= v1 and v3 are simply fuCH,p(v1)≈ fu1.9076 and
fv3CH,p(v3)= fv3(

1
2log2(2)+

1
2log2(2))= fv3. Without the scaling factor, CH,p is a measure of

spread, and it makes sense that CH,p(v1)>CH,p(v3). However if v1 is actually distributing
some items in overall small amounts, while v3 is not only getting this item from v1 but also
producing it and furthermore sending it only to v5 but in large amounts, then the scaling
factor could be used to refine the analysis and account for this extra information. From
the moment fv3 ≥1.9076fv1, v3 will be deemed more important than v1 as per the scaled
entropic centrality measure.

CASE STUDIES
Shareholding in Tehran Stock Market
We next consider 131 companies from the Tehran Stock Market, as listed in Appendix A
of Dastkhan & Gharneh (2016).1 They form the vertices of a cross-shareholding network
of companies which have shares of other companies. There is an edge between i and j if
company i belongs to the investment portfolio of company j, i.e., j owns some share of i.
Therefore the in-degree of node j is the number of companies that belong to the investment
portfolio of company j. Conversely, the out-degree of node j is the number of companies
that are shareholders of j. Edges are weighted, edge (i,j) has for weight the percentage of
shares that company j has from company i. We will consider this graph, shown in Fig. 4,
and the graph with reverse edge directionality.

Nodes highlighted in green in Fig. 4 have one edge with weight more than 0.5, meaning
that more than 50% of their shares are owned by another company, otherwise they are
in grey. Nodes highlighted in vermillion, superseding the other coloring, have the highest
in-degrees, which means that they own shares of many other companies. They are nodes

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 9/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


Figure 4 Cross-shareholding network of Tehran Stock Market companies: Vermillion and sky blue
nodes have respectively the highest in- and out-degrees. Nodes circled in sky blue respectively vermillion
have the highest entropic centrality under the current and reverse edge directionality.

Full-size DOI: 10.7717/peerjcs.220/fig-4

121 (Adashare), 85 (NIKX1), 123 (Oilcopen), 42 (SA3A1), 119 (tamin org), and 118
(government). Nodes highlighted in sky blue have the highest out-degrees, which means
that their shares are owned by many other companies (but possibly in small amounts).
They are nodes 110 (BMEX1), 2 (CH12), 21 (FO041), 41 (GD021), 3 (GO02) with degree
7 and 69 (PFAX1), 1 (MADN), 20 (MS022) and 62 (PK061) with degree 8.

We next assign probabilities to edges: we use the edge weight, and fix the edge probability
to be inversely proportional to its weight. Self-loops have a natural interpretation. Since
the outgoing edges of node j indicate the companies that are shareholders of j, the self-loop
refers to j still owning some of its own shares, and the amount is 100% minus what the
other companies own (share ownerships with negligible amounts were not taken into
account in the data set, so self-loops absorb these portions).

Table 1 lists the seven nodes with the highest entropic centrality. The interpretation of
entropic centrality here is that we are looking at the nodes whose shares are ‘‘most diversely
owned’’ in terms of their shares being owned by different companies, whose shares are in
turn themselves owned by others. The economic fortunes of the company whose centrality
is looked at thus also affects those of the other companies, and the entropic centrality
thus indicates the impact a particular company’s economic performance would have on
the rest. We immediately see that this centrality measure is different from out-degree

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 10/21

https://peerj.com
https://doi.org/10.7717/peerjcs.220/fig-4
http://dx.doi.org/10.7717/peerj-cs.220


Table 1 Vertices with the highest entropic centrality, their scaled entropic centrality, where fu is the market size in percentage (their relative
ranking is marked with subscript), their out-degrees (in the above part) or in-degrees (in the below part, for the graph with reverse edge direc-
tionality) and neighbors with the weight of the connecting edges. Bold face indicates a high degree. Ranks with respect to alpha/Katz (AK) with
α=1, PageRank (PR) weighted (W) and unweighted (U), and betweenness (B).

no ID CH,p fu fuCH,p Out Neighbors A/K PR(U/W) B (W)

23 ARFZ1 3.1990 0.7153 2.28823 6 (1, 0.181), (2, 0.353), 1/1 1/1 39
(5, 0.045), (43, 0.045)
(41, 0.05), (85, 0.04)

3 GO02 3.0205 0.9635 2.91031 7 (1, 0.224),(21, 0.099) 3/3 69/69 3
(82, 0.011),(7, 0.037)
(42, 0.028), (43, 0.377)
(121, 0.185)

109 IPAR1 2.9680 0.6874 2.04026 5 (96, 0.2),(101, 0.0963) 22/22 7/4 39
(123, 0.173),
(97, 0.078), (38, 0.139)

59 PRDS1 2.9541 0.7951 2.34884 5 (88, 0.011), (57, 0.668), 14/14 18/26 39
(118, 0.054), (55, 0.05)
(82, 0.011)

65 PK3A1 2.8857 0.6267 1.80847 2 (57, 0.42), (55, 0.206) 71/71 30/12 39

5 KNRX 2.8817 0.9415 2.71312 3 (1, 0.8876),(3, 0.0209) 26/23 67/37 31
(83, 0.033)

116 PRSX1 2.8273 0.8086 2.28615 6 (96, 0.648), (97, 0.049) 8/7 12/9 39
(88, 0.071), (41, 0.011)
(129, 0.01), (79, 0.017)

no ID CH,p fu in Neighbors

118 gov 5.3434 9.8 40 (85,0.1737)
119 tamin org 4.6989 5.7647 31 (42, 0.0305)
121 Adashare 4.4777 6.7827 17
88 BTEJ1 4.2916 0.7143 9 (85, 0.474)
123 Oilcopen 3.9154 3.5699 22
83 TMEL1 3.8099 0.4849 9
85 NIKX1 3.7693 0.9722 17

centrality. We can however look at how they relate, by considering the role of out-degrees
not only on the nodes but also on their neighbors. We observe that only node 3 has one
of the highest out-degrees, however, nodes 23 and 116 still have high out-degrees, but also
are connected to neighbors which have high out-degrees, in fact, node 23 which has the
highest entropic centrality has three high out-degree neighbors. For nodes 109 and 59,
they still have a relatively high out-degree. For 5, it has a fairly low out-degree, but out
of the 3 neighbors, two have high degrees themselves. The case of node 65 is particularly
interesting, since it has only two neighbors, namely 55 and 57. Neither of 55 nor 57 has
a high centrality individually, but they together provide node 65 a conduit to a larger

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 11/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


subgraph than the individual transit nodes themselves do, illustrating the secondary effects
of flow propagation.

The cross-shareholding network of Tehran Stock Market was analyzed in Dastkhan &
Gharneh (2016), where a closeness centrality ranking is shown to be almost identical to the
degree based centrality one. The entropic centrality ranking in contrast manages to capture
a different dynamics, by involving the spread of influence via flow propagation, together
with a quantitative edge differentiation.

The above approaches ignore any other information such as the market size share of
the organizations. Arguably, between two organizations with identical position in the
graph, the one with larger market size may be deemed to have larger influence on the
other nodes. This is modeled by the scaled entropic centrality CH(u)fu, where fu is the
market size in percentage. We notice that this indeed creates a distinct relative ranking
(indicated by subscript in Table 1, for example, ARFZ1 is ranked 3rd as per weighted
entropic centrality). Particularly, among the top seven companies as per CH(u), we see that
only PRDS1 continues to be in the same (fourth) rank. KNRX has the largest change in
ranking, up from sixth to second. While the scaled entropic centrality ranking of the top
two nodes are congruent with the ranking based solely on the scale factor (market size),
we see that ARFZ1, which would be ranked 5th by market size, and first by solely network
effect, is ranked 3rd when both factors are taken into account.

We compare the entropic centrality CH,p with respect to the alpha, Katz and PageRank
centralities (using the reverse edge direction). Note that the unweighted graph has for
largest eigenvalue λ1 '2.99715780 (so

1
λ1
'0.3336494). The ranking results for the most

central entities from the Tehran stock exchange, and the overall Kendall tau rank correlation
coefficients Kendall (1938) are reported in Tables 1 and 2 respectively. The Kendall tau
coefficient indicates the rank correlation among a pair or ranked lists (see Schoch, Valente
& Brandes, 2017 for a discussion on why Kendall tau is preferred to Pearson). The entity
23 is outstandingly central with respect to all metrics but weighted betweenness. The alpha
and Katz centralities yield very similar results, but they rank the entity 3 as third instead
of second. They rank second the entity 20. A likely explanation could be that that 20
actually has a higher out-degree (and thus a higher in-degree in the reversed edge network)
than entity 3. The top 7 most central entities have mostly a 0 betweenness (ranked 39),
and are mostly ranked pretty low with respect to both versions of PageRank, weighted
and unweighted. The most central entity for the weighted betweenness is 85, which is
one of the most central with respect to in-degree (it has an in-degree of 18). Then 111
and 76 are second respectively for the unweighted and weighted PageRank. Entity 111
has out-neighbors 88,81,127,94,112 which become in-neighbors in the reversed edge
graph, neither 111 itself nor its neighbors stand out by their degrees, however 76 has for
in-neighbors in the reversed edge graph 72,73,130,85,118,76, and both 118 and 85 are very
central with respect to in-degree, making it easier to explain why it is ranked high. Note
that the assortativity coefficient of this graph is≈−0.01521584, so this is a non-assortative
graph, where high degree nodes do not particularly connect to neither high nor low degree
nodes.

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 12/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


Table 2 Kendall rank correlation coefficient τκ across the centralities for the Tehran stock exchange.

CH,p Alpha PageRank (UW) PageRank (W) Katz
(α = 0.1) α = 0.85 α = 0.85 (α = 0.1)

CH,p 0 0.2259 0.3185 0.3326 0.2252
alpha 0.2259 0 0.3171 0.3787 0.0047
PageRank(UW) 0.3185 0.3171 0 0.1317 0.316
PageRank(W) 0.3326 0.3787 0.1317 0 0.3778
Katz 0.2252 0.0047 0.316 0.3778 0

The Kendall rank correlation confirms that the entropic centrality differs from the other
metrics not only to decide the most central vertices but also overall. The Kendall coefficient
for unweighted betweenness has not been reported since only 38 vertices have a non-zero
betweenness. This shows that the graph considered is far from being strongly connected.
Overall, comparison points illustrate that the entropic centrality CH,p provides a new
perspective not captured by the other algorithms.

We next consider the same graph but where edge directionality is reversed. A node
becomes central if it owns diverse companies, which themselves may in turn own various
companies. Since owning shares could be used to influence an organization’s management,
the entropic centrality based on reverse edges is thus a proxy indicator of how much control
a specific entity has over the other entities in the market. Organizations with very high
entropic centralities using either sense of edge direction could then be seen as probable
candidates causing structural risks—be it by being ‘too big to fail’, or having too much
control over significant portions of the market for it to be fairly competitive.

The nodes with highest entropic centrality are shown in Table 1. The most important
one is the government: we expect it to be one of the most important players in Iran when
it comes to owning shares in other companies (and yet not to appear in the list when
the original edge direction is considered). We see a higher correlation between entropic
centrality and in-degree than there was between entropic centrality and out-degree in
Table 1. Among the seven most central nodes, five of them are having one of the highest
in-degrees, the two most central nodes have themselves one high degree neighbor. Then
node 88 has a fairly low in-degree, but it is connected to node 85.

In summary, the case study of the Tehran stock exchange network exhibits three
important intertwined aspects of our model. Firstly, it is flexible. It seamlessly captures
the effect of relationships, considered either in a binary fashion (just the structure), or
quantified with relative strengths (the skew in strength of the relationships), while it can
also accommodate information which is intrinsic to the node but somewhat disentangled
from the network (used as a scale factor). Second, reversing the edge direction gives a dual
perspective. Finally, we see that we obtain different results and corresponding insights,
based on which variations of the model is applied for a specific study. Naturally, figuring
it out the best variation is done on a case-by-case basis.

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 13/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


Figure 5 A subgraph of the Bitcoin subgraph, which comprises only addresses that have non-zero en-
tropic centrality. Those in red are listed in Table 3, with the highest entropic centrality.

Full-size DOI: 10.7717/peerjcs.220/fig-5

A bitcoin subgraph
Our final case study is a subgraph of the Bitcoin wallet address network derived from
Bitcoin transaction logs (see Fig. 5). Bitcoin is a cryptocurrency (Nakamoto, 2008), and
transactions (buy and sell) among users of this currency are stored and publicly available in
a distributed ledger called blockchain. User identities are unknown, but each user has one
or many wallet addresses, that are identifiers in every transaction. Then one transaction
record amalgamates the wallet addresses of possibly several payers and payment receivers,
together with the transaction amounts.

To be more precise, consider two Bitcoin transactions T1 and T2. The transaction T1 has n
inputs, from wallet addresses A1,...,An, of amounts i11,...,i1n respectively. The outputs, of
amounts o11,...,o1m go to wallet addresses C1,...,Cm respectively. The sum of inputs equals
the sum of outputs and any transaction fees (say τ1), i.e., |T1|=

∑n
k=1i1k =τ1+

∑m
l=1o1l.

For the sake of simplicity, we will ignore the transaction fees (i.e., consider τ1 =0). A
similar setting holds for transaction T2, where the same wallet address A1 appears again
as part of the inputs, together with some wallet addresses B2,...,Bn′ which may or not

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 14/21

https://peerj.com
https://doi.org/10.7717/peerjcs.220/fig-5
http://dx.doi.org/10.7717/peerj-cs.220


Table 3 Addresses with highest entropic centrality in the Bitcoin subgraph above (with the respective relative ranks as per other centralities—
alpha/Katz with α = 0.1 (AK), PageRank (PR), weighted (W) and unweighted (U)) and with highest centrality when edges have reverse direc-
tionality below.

Address CH,p fu Out fuCH AK PR (U/W)

3CD1QW6fjgTwKq3Pj97nty28WZAVkziNom 8.6633 0.0473 2807 652 1 1/1
38PjB1ghFrD9UQs7HV5n15Wt1i3mZP8Wke 5.7214 0.1961 382 481 3 14/14
3Eab4nDg6WJ5WR1uvWQirtMzWaA34RQk9s 5.4339 0.1778 568 514 2 13/15
3MYqQJ5LbDe9U3drsaDprKxWobVZA3UgAw 5.3316 0.9270 2 609 4 2/455
38mMQxz4knqfmecjLW3atdygfWxvvnJfg7 5.3316 0.9268 2 3 4 2/3
33XZf8Ys9sbqnAKynA4yBckyzwN3SEZaU7 5.3316 0.9254 2 9 4 2/10
3P4C7jpF1oxHgxqt4VgMRcCBEV3YEpaDUm 5.3316 0.9224 2 7 4 2/8
3Fp5ejYY8FsJ6Y3kb377VRjJunTeUVYsuq 5.3316 0.8966 2 2 4 2/3
3Q9SPyCN95szQUoQYgAHKgdhC3YnRsrFrW 5.3316 0.8928 2 8 4 2/8
38A6nGSMR59WHVnj9gaJ2Cm62y9kFE318i 5.3316 0.8908 2 5 4 2/6
3Ce7jUQn2RH5Ysdb4VvShoYymZLpkcqaAA 5.3316 0.8877 2 10 4 2/11
364qbSJFhwkBgZnMuhmUHdczpaZNS2PmE6 5.3316 0.8832 2 1 4 2/2
3KDgKr3qov4Ws5WPnaA2RHjcE1N2UeVYs3 5.3316 0.8619 2 4 4 2/5
1NxaBCFQwejSZbQfWcYNwgqML5wWoE3rK4 5.3316 0.1175 2 6 4 2/7

Address CH,p fu in

38PjB1ghFrD9UQs7HV5n15Wt1i3mZP8Wke 7.6477 171.359 218
3Eab4nDg6WJ5WR1uvWQirtMzWaA34RQk9s 7.5583 175.022 196
15hWpb3m5VXdn9KVsS4rSMnrQQJLhXVyN4 5.2649 7.504 17
1C7PDYzjRDqomyywDHEqx9huYoYQoGYgdV 4.9876 3.949 31
1zksVRSDUuX2E5mMNvvbA9C4esfnvVdfA 4.4176 0.494 2

intersect with A2,...,An. By design, Bitcoin transactions do not retain an association as to
which specific inputs within a transaction are used to create specific outputs.

Suppose one would like to create a derived address network from some extract of the
Bitcoin logs of transactions, that is a graph whose nodes are Bitcoin wallet addresses, and
edges are directed and weighted. There should be an edge from address u to address v
if there is at least one transaction where some amount of Bitcoin is going from u to v.
However as explained above, it is not always possible to disambiguate the input–output
pairs. If the input amounts are particularly mutually distinct, and so are the output
amounts, and there are input–output amounts that match closely, one might be able to
make reasonable guess about matching a specific input to a specific output. In general,
in absence of such particular information, one heuristic is to model the input–output
association probabilistically. A common heuristic (Kondor et al., 2014) is to consider that
based on transaction T1 there is an edge from A1 to each of C1,...,Cm. The same holds for
transaction T2. Thus in the derived address network, there will be an edge from A1 to each
of the C1,...,Cm,D1,...,Dm′. If some outputs X,...,Z are in common to both transaction
outputs, there is a single edge between A1 and each of the addresses X,...,Z.

The derived address network gives us the graph to be analyzed, whose vertices are
wallet addresses and edges are built as above. Originally, a given wallet address is sending

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 15/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


Bitcoins to possibly different output wallet addresses within one transaction, and the same
wallet address may be involved in different transactions, with possible reoccurences of the
same output addresses (this is the case of A1 which is an input to both transactions T1
and T2 and X,...,Z appear as output transactions in both). In the split-and-transfer flow
model, we can incorporate this information into the derived address network by assigning
the probabilities q({C1,...,Cm})=

i11
i11+i21

and q({D1,...,Dm′})=
i21

i21+i22
with which the

respective subsets {C1,...,Cm} and {D1,...,Dm′} of the set {C1,...,Cm,D1,...,Dm′} of
neighbors of A1 are used. Other choices for q(x) are possible, the rationale for this specific
choice is to use a probability that is proportional to the amount of Bitcoin injected by A1
in each of the transactions.

Edge weights in the derived address network are computed as follows. Let |T1|=
∑m

k=1o1k
and |T2|=

∑m′
k=1o2k denote the total amounts involved in each of the transactions. For an

edge between A1 and Cl, which happens in T1, it is ωC1,...,Cm(A1,Cl)=
o1l
|T1|

, while for an
edge between A1 and Dl, which happens in T2, it is ωD1,...,Dm′(A1,Dl)=

o2l
|T2|

. We thus have

m∑
l=1

ωC1,...,Cm(A1,Cl)=
m∑
l=1

o1l
|T1|
=1,

m′∑
l=1

ωD1,...,Dm′(A1,Dl)=
m′∑
l=1

o2l
|T2|
=1.

If some node pairs, and thus edges, repeat across transactions (for example, A1 to X,...,Z in
our example), these edge weights should cumulate in the overall derived address network.
This is captured by the formula (2) which is here instantiated as

fA1,X =q({C1,...,Cm})ωC1,...,Cm(A1,X)+q({D1,...,Dm′})ωD1,...,Dm′(A1,X)

=
i11

i11+i21

o1x
|T1|
+

i21
i11+i21

o2x
|T2|

where in transaction T1 the output to address X is o1x, while it is o2x in transaction T2.
In a departure from previous works that derive the address network in a manner

explained above Kondor et al. (2014), our graph model is able to retain the information
that subsets of edges co-occur, or not, as displayed above. For that reason, the Bitcoin
address network is a natural candidate (and in fact, part of the inspiration) for the general
flow model with arbitrary splits and transfers as considered in this paper, where individual
flows may go through a subset of outgoing edges.

For our experiments, we choose a Bitcoin subgraph appearing in the investigation
of wallet addresses involved in extorting victims of Ashley-Madison data breach (see
Oggier, Phetsouvanh & Datta, 2018a for accessing the data). It is obtained by extracting
a subgraph of radius 4 (if the graph were undirected) around the wallet address
1G52wBtL51GwkUdyJNYvMpiXtqaGkTLrMv. While we would like to emphasize that
we use here this Bitcoin graph to explore the entropic centrality model, it may still be worth
mentioning that one identified suspect node from another of our study (Phetsouvanh,
Oggier & Datta, 2018), namely node 15hWpb3m5VXdn9KVsS4rSMnrQQJLhXVyN4, has
high enough entropic centrality to be listed (see Table 3 below) as a top entropic centrality
node. Thus, the entropic centrality analysis can be used as a tool to identify nodes of
interest, and to create a shortlist of nodes to be investigated further in detail, in the context
of Bitcoin forensics.

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 16/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


Table 4 Kendall rank correlation coefficient τκ across the centrality algorithms for the Bitcoin sub-
graph dataset (excluding 3926 nodes which all had an entropic centrality score of zero).

CH,p fuCH,p Alpha PageRank PageRank Katz
(α = 0.1) (UW) (W) (α = 0.1)

CH,p 0 0.2331 0.2359 0.3421 0.3956 0.2342
fuCH,p 0.2331 0 0.2135 0.3152 0.4337 0.2118
alpha 0.2359 0.2135 0 0.2652 0.4949 0.0032
PageRank(UW) 0.3421 0.3152 0.2652 0 0.4299 0.2652
PageRank(W) 0.3956 0.4337 0.4949 0.4299 0 0.4965
Katz 0.2342 0.2118 0.0032 0.2652 0.4965 0

Tables 3 and 4 compare the entropic centrality CH,p with other centralities. With
respect to scaled entropic centrality, there is a large variation in the weightages
associated with the edges, which has a significant impact on the relative rankings
between scaled/unscaled entropic centralities. With respect to weighted betweenness,
only three addresses are relevant, they are, with their respective in- and out-degree,
3Eab4nDg6WJ5WR1uvWQirtMzWaA34RQk9s (ranked 1, in-degree: 196, out-degree:
568), 38PjB1ghFrD9UQs7HV5n15Wt1i3mZP8Wke (ranked 2, in-degree: 218, out-degree:
382), and 3CD1QW6fjgTwKq3Pj97nty28WZAVkziNom (in-degree: 14, out-degree: 2807).
The other addresses are ranked 69 (corresponding to a betweenness of 0). The graph has
for largest eigenvalue λ1'7.1644140 and

1
λ1
'0.139578. As with the previous cases, alpha

and Katz centralities are very close to each other, they also agree more closely with CH,p on
the most central addresses, but Table 4 shows that this is not the case in general. The trends
shown by the Kendall rank correlation coefficient is similar to previous cases: there are
more dissimilarities between PageRank and entropic centralities than between alpha/Katz
and entropic centralities, but overall, entropic centralities give a different view point, as
would be expected by extrapolating Borgatti’s view point. The assortativity coefficient of
the illustrated Bitcoin subgraph is≈−0.11914239, suggesting a slight disassortativity. This
is easily explained as an artefact of the way the subgraph was extracted (a small radius
around a node), yielding a couple of hubs with nodes connected only to them (leaves). In
this example, these leaves are having an entropic centrality influenced by having these hubs
as their first neighbors.

As a last scenario, we consider the small network of Maine airports, with their
connecting flights, for a total of 55 airports (see Oggier, Phetsouvanh & Datta, 2018b
for accessing the data.). We created the network based on flights involving passenger
for the period of January 2018 as per the data obtained from the United States
Department of Transportation Bureau of Transportation Statistics website (https:
//www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=292).

In Table 5, we synopsize the Kendall’s tau coefficient τκ. The lower the value of this
coefficient, the closer (similar) two ranked lists are. We see that CH,unif produces results
which are very similar to alpha and Katz centralities, but CH,p yields a reasonably distinct
result instead. Furthermore, results from both PageRank applied to both weighted and
unweighted graphs are most distinct both with respect to entropic centralities, as well

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 17/21

https://peerj.com
https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=292
https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=292
http://dx.doi.org/10.7717/peerj-cs.220


Table 5 Kendall rank correlation coefficient τκ across the centrality algorithms for the airports net-
work data.

Uniform Non-uniform Alpha PageRank PageRank Katz
(α = 0.1) (UW) (W) (α = 0.1)

uniform entropic 0 0.1730 0.0242 0.2087 0.2484 0.0242
non-uniform entropic 0.1730 0 0.1555 0.2228 0.2962 0.1555
alpha 0.0242 0.1555 0 0.2114 0.2390 0
PageRank(UW) 0.2087 0.2228 0.2114 0 0.2713 0.2114
PageRank(W) 0.2484 0.2962 0.2390 0.2713 0 0.2390
Katz 0.0242 0.1555 0 0.2114 0.2390 0

as the other existing centralities explored in our experiments. The assortative coefficient
is ≈−0.71478751, this is thus a disassortative network. Indeed it contains two airports
that serve as hubs, and small airports connected to it (or important airports whose edges
have been cut when extracting the specific subgraph). In terms of entropic centrality, this
corresponds to having small airports inheriting the influence of being connected to hubs.

CONCLUSIONS
In this paper, we studied the concept of entropic centrality proposed by Tutzauer (2007),
which originally determined the importance of a vertex based on the extent of dissemination
of an indivisible flow originating at it, by considering the uncertainty in determining its
destination. We extended this concept to model divisible flows, which better reflect certain
real world phenomenon, for instance, flows of money. In fact, one of the motivating
scenarios that prompted us to study this model was to study the network induced by
Bitcoin transactions—though, in the course of the work, and to validate the model, we
also identified and analyzed other use cases, with arbitrary divisions of the flow. A previous
work which considered only equitable divisions of the flow was shown to be a special case
of the general model studied in this paper.

The flow based entropic centrality model bears in spirit some similarity with eigenvector
based centrality measures in the sense that the importance of vertex node is determined by
taking into account a transitive effect, namely, connections to a central vertex contributes
to increase the centrality. We thus compared our approach with several representatives of
this family, specifically alpha centrality, PageRank and Katz centrality. We observed that
alpha and Katz centralities are closer to entropic centralities than PageRank (in terms of
Kendall tau distance), but they are still fairly different. This could be extrapolated from
the view point of Borgatti (2005), which advocates to use path based centrality for transfer
type of flow, and not eigenvector based centralities, which are best suited for duplication.
This indicates that the new entropic centrality provides novelty not only in the principled
manner in which it captures the phenomenon of divisible flows, but also in terms of the
results and associated insights obtained from it.

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 18/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220


ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The work of Phetsouvanh Silivanxay was supported by a NTU Singapore scholarship for
doing his PhD. The funders had no role in study design, data collection and analysis,
decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors:
NTU Singapore scholarship.

Competing Interests
Anwitaman Datta is an Academic Editor for PeerJ.

Author Contributions
• Frédérique Oggier conceived and designed the experiments, performed the experiments,
analyzed the data, prepared figures and/or tables, performed the computation work,
authored or reviewed drafts of the paper, approved the final draft.
• Silivanxay Phetsouvanh performed the experiments, prepared figures and/or tables,
performed the computation work, approved the final draft.
• Anwitaman Datta conceived and designed the experiments, analyzed the data, prepared
figures and/or tables, performed the computation work, authored or reviewed drafts of
the paper, approved the final draft.

Data Availability
The following information was supplied regarding data availability:

The Tehran stock exchange dataset came from Dastkhan H, Gharneh NS. 2016.
Determination of systematically important companies with cross-shareholding network
analysis: a case study from an emerging market. International Journal of Financial Studies
4(3):13 DOI 10.3390/ijfs4030013 and the authors can be contacted at nshams@aut.ac.ir for
the dataset.

The Bitcoin dataset from the Bitcoin transactions log is available at Oggier, Frederique
Elise; Phetsouvanh, Silivanxay; Datta, Anwitaman, 2018, ‘‘A 4571 node directed weighted
Bitcoin address subgraph’’, 10.21979/N9/IEPBXV, DR-NTU (Data), V1.

The Maine airport network dataset is available at Oggier, Frederique Elise; Phetsouvanh,
Silivanxay; Datta, Anwitaman, 2018, ‘‘Maine airport network in January 2018’’, Available
at 10.21979/N9/WM0K5W, DR-NTU (Data), V1.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/
peerj-cs.220#supplemental-information.

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 19/21

https://peerj.com
https://doi.org/10.3390/ijfs4030013
mailto:nshams@aut.ac.ir
https://doi.org/10.21979/N9/IEPBXV
https://doi.org/10.21979/N9/WM0K5W
http://dx.doi.org/10.7717/peerj-cs.220#supplemental-information
http://dx.doi.org/10.7717/peerj-cs.220#supplemental-information
http://dx.doi.org/10.7717/peerj-cs.220


REFERENCES
Benzi M, Klymko C. 2015. On the limiting behavior of parameter-dependent network

centrality measures. SIAM Journal on Matrix Analysis and Applications 36(2):686706.
Bonacich P. 1972. Factoring and weighting approaches to status scores and clique

identification. Journal of Mathematics Sociology 2:113–120
DOI 10.1080/0022250X.1972.9989806.

Bonacich P, Lloyd P. 2001. Eigenvector-like measures of centrality for asymmetric
relations. Social Networks 23(3):191–201 DOI 10.1016/S0378-8733(01)00038-7.

Borgatti SP. 2005. Centrality and network flow. Social Networks 27:55–71
DOI 10.1016/j.socnet.2004.11.008.

Dastkhan H, Gharneh NS. 2016. Determination of systematically important companies
with cross-shareholding network analysis: a case study from an emerging market.
International Journal of Financial Studies 4(3):1–17.

Estrada E. 2011. The structure of complex networks: theory and applications. Oxford:
Oxford University Press.

Fan R, Xu K, Zhao J. 2017. A GPU-based solution for fast calculation of the be-
tweenness centrality in large weighted networks. PeerJ Computer Science 3:e140
DOI 10.7717/peerj-cs.140.

Iannelli F, Mariani MS, Sokolov IM. 2018. Influencers identification in complex
networks through reaction-diffusion dynamics. Physical Review E 98:062302.

Katz L. 1953. A new status index derived from sociometric analysis. Psychometrika
18(1):39–43.

Kendall MG. 1938. A new measure of rank correlation. Biometrika 30(1–2):81–93
DOI 10.1093/biomet/30.1-2.81.

Kondor D, Pósfai M, Csabai I, Vattay G. 2014. Do the rich get richer? an empirical
analysis of the bitcoin transaction network. PLOS ONE 9(2):e86197.

Migliore M, Martorana V, Sciortino F. 1990. An algorithm to find all paths be-
tween two nodes in a graph. Journal of Computational Physics 87:231–236
DOI 10.1016/0021-9991(90)90235-S.

Nakamoto S. 2008. Bitcoin: a peer-to-peer electronic cash system.
Newman MEJ. 2009. Mathematics of networks. In: The New Palgrave dictionary of eco-

nomics. London: Palgrave Macmillan UK DOI 10.1057/978-1-349-95121-5_2565-1.
Nikolaev AG, Razib R, Kucheriya A. 2015. On efficient use of entropy centrality for

social network analysis and community detection. Social Networks 40:154–162.
Oggier F, Phetsouvanh S, Datta A. 2018a. A 4571 node directed weighted Bitcoin

address subgraph. In: Dataverse. DOI 10.21979/N9/IEPBXV.
Oggier F, Phetsouvanh S, Datta A. 2018b. Maine airport network in January 2018. In:

Dataverse. DOI 10.21979/N9/WM0K5W.
Oggier F, Silivanxay P, Datta A. 2018. Entropic centrality for non-atomic flow network.

In: International symposium on information theory and its applications (ISITA).
Piscataway: IEEE.

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 20/21

https://peerj.com
http://dx.doi.org/10.1080/0022250X.1972.9989806
http://dx.doi.org/10.1016/S0378-8733(01)00038-7
http://dx.doi.org/10.1016/j.socnet.2004.11.008
http://dx.doi.org/10.7717/peerj-cs.140
http://dx.doi.org/10.1093/biomet/30.1-2.81
http://dx.doi.org/10.1016/0021-9991(90)90235-S
http://dx.doi.org/10.1057/978-1-349-95121-5_2565-1
http://dx.doi.org/10.21979/N9/IEPBXV
http://dx.doi.org/10.21979/N9/WM0K5W
http://dx.doi.org/10.7717/peerj-cs.220


Opsahl T, Agneessens F, Skvoretz J. 2010. Node centrality in weighted networks:
generalizing degree and shortest paths. Social Networks 32(3):245–251.

Page L, Brin S, Motwani R, Winograd T. 1999. The PageRank citation ranking: bringing
order to the web. Stanford: Stanford InfoLab.

Phetsouvanh S, Oggier F, Datta A. 2018. EGRET: extortion graph exploration techniques
in the bitcoin network. In: IEEE ICDM workshop on data mining in networks
(DaMNet). Piscataway: IEEE.

Schoch D, Valente TW, Brandes U. 2017. Correlations among centrality indices and a
class of uniquely ranked graphs. Social Networks 50:46–54.

Tutzauer F. 2007. Entropy as a measure of centrality in networks characterized by path-
transfer flow. Social Networks 29(2):249–265.

Oggier et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.220 21/21

https://peerj.com
http://dx.doi.org/10.7717/peerj-cs.220