key: cord-1011115-68647zyz
authors: Merbis, Wout; Lodato, Ivano
title: Logistic growth on networks: exact solutions for the SI model
date: 2021-09-08
journal: Physical review. E
DOI: 10.1103/physreve.105.044303
sha: 9028fcd263eda0f387ff8ed92fc62b2f5023b6cc
doc_id: 1011115
cord_uid: 68647zyz

The SI model is the most basic of all compartmental models used to describe the spreading of information through a population. Despite its apparent simplicity, the analytic solution of this model on networks is still lacking. We address this problem here, using a novel formulation inspired by the mathematical treatment of many-body quantum systems. This allows us to organize the time-dependent expectation values for the state of individual nodes in terms of contributions from subgraphs of the network. We compute these contributions systematically and find a set of symmetry relations among subgraphs of differing topologies. We use our novel approach to compute the spreading of information on three different sample networks. The exact solution, which matches with Monte-Carlo simulations, visibly departs from the mean-field results.

The dynamical processes of information spreading through a population via individual interaction is ubiquitous in Nature and society [1] [2] [3] [4] [5] [6] [7] , and their importance is demonstrated by the recent COVID-19 pandemic [8] [9] [10] . Typically, there are two approaches to formulate spreading processes: a collective approach based on deterministic compartmental models governing the state of the population as a whole [1, [11] [12] [13] and a more detailed approach aimed at describing the stochastic individual interactions [14] [15] [16] . Various approximation and numerical schemes have been used to solve compartmental models of epidemic spreading, and in certain cases, exact results are known [17] [18] [19] [20] . These models are effective and often give a good estimate of the temporal evolution of the system. However, they are insensitive to the connectivity properties of the individuals within the population. To take this into account, stochastic models of epidemic spreading on networks have been proposed and studied extensively over the past decades [14, 16, [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] . Nonetheless, the presence of dynamical correlations and the non-linear nature of spreading processes make mean-field approximations and numerical simulations a necessity, while exact solutions in network epidemiology remain a rare treat [33] [34] [35] [36] .

In this paper we present a general framework to describe the spreading process on a network, which allows for exact analytic solutions. We focus on the most basics and yet relevant [37, 38] of all spreading processes, the si system. The nodes of the network are either susceptible (s) or infected (i) and, by a Poisson process, infected nodes can infect their susceptible neighbors. While the compartmental model for this system is solved by the * w.merbis@uva.nl † ivano.lodato@gmail.com Verhulst logistic function [39] , to the best of our knowledge, exact solutions to the stochastic network model are still unavailable. Here we intend to fill this gap, while simultaneously introducing a general procedure applicable to other, more complex, spreading processes. Our exact treatment of the si system relies on the formulation of [40] , inspired by many-body quantum systems. Time-dependent expectation values of target nodes (sinks), given an initial configuration of infected nodes (sources), can be obtained as a diagrammatic expansion over subgraphs of the network. Each subgraph represents a contribution to the spreading process from the sources to the sinks. We present a systematic way to compute these contributions as functions of time and furthermore, we find a set of symmetry relations among subgraphs of different topology, which simplifies the computational task. We illustrate our methods by analytically solving the spreading dynamics on three sample networks. The result differs visibly from the individual-based mean-field approximation, but agrees with Monte-Carlo simulations.

Consider an unweighted and undirected network of N nodes, each of which can be in one of two states: s or i. These states can be used as an orthonormal basis for a two-dimensional (2D) vector space, such that [41] :

The network as a whole can be in 2 N possible configurations C ∈ {s, i} N , each of which has an associated vector |C constructed as the tensor product of N individual state vectors (1)

where each state |i = {|s , |i }. The probabilities p(C, t) for the network to be in configuration C at time t are the arXiv:2109.03530v2 [cond-mat.stat-mech] 15 Feb 2022 components of a 2 N -dimensional probability vector |ρ(t)

This vector has unit L 1 -norm by normalization of probability, which we will denote as 1|ρ(t) = 1, where 1| = N i=1 (1 1) denotes the flat state. The si spreading is a Markovian process and hence the probability vector evolves in time by a master equation:

The infinitesimal generator Q(A) is a 2 N -dimensional matrix containing the possible transitions induced between neighboring states in a network with adjacency matrix A. It can be constructed from tensor products of 2D matrices, which act as linear operators on the individual state vectors. In the case of the si-model, it is given by [40] :

The 2 N -dimensional matrices P i i and q j s→i are constructed by taking the tensor product of N −1 2D identity matrices with the projection operator P i inserted at site i and the infinitesimal stochastic matrix q s→i inserted at site j, respectively, and:

Each term in Q(A) performs the following local operations on the individual nodes i, j connected by an edge: P i checks whether node i is infected, by projecting it to the infected basis vector |i . The matrix q s→i simultaneously acts on node j and maps the state |s to the infinitesimal stochastic state |i − |s . This state with vanishing L 1 norm subtracts probability for the node j to remain in the susceptible state and adds probability of it being infected. Our goal is to compute the probability for a certain sink node i to be infected at time t, given a fixed configuration of initially infected source nodes at time t = 0.

We will show in section IV below how this quantity can be expressed as a sum over subgraph diagrams with specific node configurations. Terms where any node other than i are in the infinitesimal stochastic state |i −|s will vanish due to the contraction with the flat state 1|. Therefore, the function i i (t) will only receive contributions from terms where no node other than i is in an infinitesimal stochastic state. We can hence identify the sinks of the spreading process with nodes in an infinitesimal stochastic state. Before proceeding to the diagrammatic expansion of (7), we will discuss how to this formula is derived from a dynamical partition function.

In this section, we will give an expression for the dynamical partition function [42, 43] in the ensemble average over all possible trajectories of the spreading process. We will use this to derive equation (7), as well as expressions for higher-order moments. The partition function is defined as the moment generating function for the observables I i (t), which tracks whether node i is infected at time t.

Where here I i (t) = 1 if node i is infected at time t and I i (t) = 0 otherwise. The dynamical partition function depends on a set of N (dual) variables {s i }, which serve as chemical potentials for the observables I i (t). The expectation value i i1 (t) · · · i in (t) for the joint probability of nodes i 1 , . . . , i n to be infected at time t is obtained from (8) by derivation with respect to s i1 , . . . , s in , followed by setting all {s i } = 0:

The dynamical partition function (8) can be computed by summing e i siI i (C) p(C, t) over all microscopic configurations C ∈ {s, i} N , where I i (C) gives 1 only if node i is infected in the configuration C. In the vector notation |ρ(t) = C p(C, t)|C , we can express e i siI i (C) as a linear operator T ({s i }) which places a factor e si for each infected node i in the configuration C. This operator is a 2 N -dimensional diagonal matrix with the following product form:

(|s s| + e si |i i|) .

(10) Now (8) can be written as the L 1 -norm of the contraction of T ({s i }) with the probability vector |ρ(t) :

By plugging (11) in (9), one obtains the expression

Specifically, the probability for individual nodes i to be infected at time t is (7) . From the dynamical partition function (11) it also follows that all higher moments of I i (t) are equal to the expectation value i i (t) , since:

Here we have used the fact that the projection operator is idempotent, i.e. P 2 i = P i . This implies that knowledge of the single node expectation values is sufficient to compute all higher-moments for the single nodes. Hence, the variance for individual nodes is:

For this reason we will focus in the remainder of the text on computing the single node expectation values (7) .

The expectation value for the prevalence in the whole network i(t) is obtained by averaging (7) over all nodes

This can be obtained from a dynamical partition function Z I (s, t) by setting s i = s/N for all i in (8):

By using the link between (8) and (16), it is also possible to express the variance of the prevalence in the ensemble average over trajectories as:

(18) Unlike with the variance for individual nodes (14) , the total variance depends on the joint probability for nodes i and j to be infected simultaneously i i i j (t) = 1|P i i P j i |ρ(t) for all node pairs. In the diagrammatic expansion, these terms receive contributions from diagrams where both i and j are infinitesimal stochastic nodes (sinks). Higher-order moments are also obtainable from this formalism, but in this paper we will focus only on single node expectation values.

We will now present a systematic procedure for computing the single node expectation values (7) . Its expression depends on the time-dependent probability vector |ρ(t) . By the master equation (4) we can express it as |ρ(t) = exp(tQ)|C 0 , where the initial state vector |C 0 corresponds to the initial configuration of sources and susceptible nodes. The probability vector can be expanded as:

Here henceforth, we adopt a graphical notation to represent the action of the operators P i i , q j s→i appearing in the infinitesimal generator Q(A). We suppress displaying any susceptible nodes, as they have not been acted upon by either of the operators; we denote infected nodes as white circles, and sinks as blue ones:

Any source node will be displayed as a crossed-out white node: . The local operators P i and q s→i in (6) act on the colored and susceptible nodes as:

In addition, the operator Q adds a directed edge pointing from i to j as specified by the adjacency matrix A ij . The expansion (19) can now be organized as a perturbative series of directed diagrams with an increasing number of edges. When starting from an initial configuration with a single source node, the most general diagrammatic expansion for subgraphs with up to three edges reads:

Where the dots denote diagrams with more than three edges. Here each diagram G graphically represents a vector a G [n]|G . The combinatorial coefficient a G [n] counts the number of ways G can be constructed by n applications of Q on the initial configuration, up to a sign discussed below. The vector |G is created as the tensor product of the displayed nodes states, times all the suppressed susceptibles |s . Subgraph diagrams of the displayed topology can appear in the network A with a multiplicity, which is accounted for by the sum over sub-graphs (s.g.) in the above equation. Depending on the network A some of the diagrams might not appear in the expansion, i.e. their multiplicity is zero.

To compute the sum over n in (22) we must obtain the coefficients a G [n] for each diagram. To solve this combinatorial problem we consider the action of Q on an arbitrary diagrams state vector |G . This defines four rules from which diagrams with m + 1 edges are constructed from 'parent diagrams' with m edges. Two rules govern the addition of new nodes to the diagram: Rule 1: If q s→i acts on a new susceptible node and P i on a white node, a new sink is attached to the white node by a directed edge.

Rule 2: If q s→i acts on a new susceptible node and P i on a blue node, the latter node is turned white and a new sink is attached to it.

In addition, internal edges between non-susceptible nodes are added to the diagrams by two more rules:

Rule 3: When q s→i acts on a blue node and P i on a white node, an edge is added from the latter to the former node and a minus sign appears Rule 4: When q s→i acts on a blue node and P i acts on a blue node, this node is turned white and a minus sign appears

Diagrammatically, examples of these rules are:

When following the third rule, the operators P i and q s→i could act on a node pair which are already connected. This operation does not change the number of edges in the diagram, only the sign of the coefficient. It implies that a G [n] receives a contribution −c a G [n − 1], where c is the number of edges ending in sinks. Together with the contributions from parent diagrams p G [n − 1] we can derive a recursive relation for a G [n]:

with

The graph of parental relations between diagrams with up to four nodes. Each diagram contribution as a function of t can be computed by performing the integral (31). The parental contribution pG(t) for each diagram is obtained by the sum of the parent diagrams times edge weights, αGH , denoted inside the squares.

Here P (G) denotes the set of parents of the graph G and α GH gives the multiplicity and sign of the corresponding parental relationship. The set P (G) is obtained by tracing the rules (23)-(26) backwards, and contain only diagrams with one less edge ending a blue node. The collection of all possible diagrams can be thought of as a weighted directed graph of diagrams, where edges represent the child→parent relationships and the edge weights correspond to the α GH in (28) . We display this graph of diagrams explicitly for all possible diagrams with four nodes in figure 1. Equation (27) can be solved iteratively. To this end, it is convenient to express each diagram as a function of t by performing the sum over n in (22) . If we define

for each diagram, the recursion relation (27) becomes a differential equation

with the initial condition a G (0) = 0, except when G = , for which a ⊗ (0) = 1. The solution for a diagram G is given by

So any diagrams contribution to the probability vector (22) can be explicitly computed from knowledge of the contributions of parent diagrams. As the parent diagrams necessarily have one edge less, we can compute all contributions systematically by starting from the smallest possible diagram: a single source node. The expectation value i i (t) in (7) is then obtained by summing the contributions a G (t) for all subgraphs G with fixed source, where the node i is the only sink in G.

Let us present and work out a simple example: the complete graph with three nodes K 3 . We suppose that one of the three nodes is the initially infected source node. There are then four terms in the expansion of (22):

To obtain the contributions corresponding to each of these diagrams we can explicitly compute the integral (31), following the graph of parental relationships in figure 1. The first diagram only receives contributions from the single source node. We will denote its contribution as a function of t by the diagram itself, appended by (t):

Here we have used the initial condition a ⊗ (t) = 1. The next two diagrams in (32) can be derived from this result as

The final diagram receive parent contributions from both these diagrams, with a relative sign due to (25) and (26):

The infected expectation value for each of the two initially susceptible nodes receives contributions only from subgraphs where the node of interest is blue. This can now be computed from the above contributions as:

While the loop contribution is a strictly negative function, it is balanced by two positive contributions corresponding to the two ways to reach the sink node in the loop diagram. Only the sum over all three diagrams corresponds to a probability and is hence bounded between 0 and 1. Using equations (15) and (18) it is straightforward to compute the prevalence and variance for the K 3 network.

For any diagram, a G (t) will be monotonically increasing (or decreasing) and converge to a positive (or negative) integer for t → ∞. Hence, each diagram possesses a sign, which determines whether its contribution is strictly positive or negative. This sign can be derived by the following argument: consider a diagram G with n nodes (of which n i initial infected) and m edges. The action of the generator Q(A) creates a directed edge between two nodes, i → j and may change their state through the action of the operators P i and q s→i . The in-degree k in i for node i counts the number of times this node has been acted upon with q s→i . There is no sign change when q s→i acts on a susceptible node, but after the node is turned blue any consequent action of q s→i on the node produces a minus sign. If the node is ultimately projected upon by P i i , no further sign changes occur. Hence the sign contribution from each white or blue node is (−) k in i −1 . The total sign of the diagram G is then the product over all blue and white nodes, which can be written as

Here we have used the fact that initially infected nodes have no incoming edges, so the sum over all in-degrees of white and blue nodes equals the total number of edges in the diagram, m.

In the above derivation, we are assuming that the subgraph G has only single directed edges. For larger graphs, diagrams with double directed edges can appear, as there could be several paths from sources to sinks through a particular edge. The first example of a double directed diagram is:

The double directed edge implies that some of the parents of this diagram have the arrow pointing one way, while other parents have it in the other direction. In this case, when computing the sign of the diagram, the double directed edges should be counted only once, and hence m in (38) represents the total number of edges of the undirected diagram.

The expectation value i i (t) receives contributions from all subgraphs where node i is the only sink. This sum is, by construction, bounded between 0 and 1, even though the individual terms may not be. In fact, the individual contributions a G (t) to |ρ(t) do not correspond to a probability. Some diagrams have strictly positive contributions and others have strictly negative contributions. In appendix A we prove that in the limit of infinite time, the diagrams contributions converge to a finite integer value k G :

where n o denotes the number of valid orientations the double directed arrows in the diagram G can take on.

(For example, n o = 2 for (39) as the double directed arrow can point both up or down.) Since (40) can take on any integer value, it is clear that only the final sum over all contributing diagrams is bounded as a probability; the individual contributions will conspire to balance negative contributions with positive ones. The number of subgraphs contributing to the expansion (22) grows quickly with the number of edges. However, not all diagrams will give independent and new contributions. We have found and proven a set of symmetry relations and decomposition rules on the diagrams which we will now discuss.

The dynamics of the si system gives rise to a number of relations between the contributions of different diagrams. For notational simplicity, each diagram below will now immediately represent its function a G (t).

A. Reverse the flow: sources ↔ sinks

The first relation states that a G (t) is equal, up to a sign, to a G (t), where G is obtained from G by transforming all of its n i sources into sinks and all of its n b sinks into sources.

Here −→ s.g. denotes any arbitrary directed subgraph consisting solely out of white nodes. This relation states that the flow of information from source to sink is invariant under changing the direction of the flow, up to a possible sign. Some of the first non-trivial instances of this relation are

and:

The invariance under reversal of the flow of the diagram can be understood from the way the diagrams are constructed. For each edge in the diagram, a single operation of A ij P i i q j s→i must have been applied to construct the edge from i to j [44] . If we replace ↔ , then the diagram of the same topology is constructed by applying A ij P j i q i s→i to the node pairs (i, j), creating the same edges in the opposite direction. Since the total number of operations to create this inverted diagram is unchanged, the only difference between the two contributions can be a sign. The sign arises because to obtain the flipped diagram the operator q s→i may act on a different number of blue nodes. Specifically, from (38) , by reversing the flow the contribution gets a minus sign if the difference (n i − n b ) is odd.

An extension of the above rule exists for diagrams whose only parents are related by the above symmetry relation. For instance, since the parents of the first two diagrams below satisfy the relation (42) , it follows that:

where the last diagram is obtained from the second by again reversing the flow. This relation generalizes to any diagram consisting out of a n node chain with an arbitrary subgraph g in the middle. For any k ≤ n, we have diagrammatically:

Here g represents an arbitrary graph consisting solely out of white nodes.

The second relation states that, for a graph G with s sources of degree one, each connected to distinct nodes, a G (t) = a G (t) where G is obtained from G by merging all sources into a single source of degree s:

By the first relation (41) this implies that also s sinks can be merged into one, now producing the sign (−1) s−1 :

In the above equalities, the subgraph g can in principle contain other sources or white nodes, but it must contain at least one sink; contrarily, the subgraphg must contain at least one source but it can contain other white or blue nodes.

Two elementary examples of this rule are:

A consequence of this relation is that diagrams containing multiple branches from the source node can be factorized into the product of the branches. For instance, the last diagram above can be further decomposed by separating the two branches from the source node, resulting in two disconnected diagrams. The diagram contribution thus factorizes into the contributions from the branches:

In general, any diagram containing multiple branches from the source node factorizes into the product of the branches. Graphically:

Here B 1 , . . . , B n are n mutually disconnected subgraphs containing at least one blue node. The proof follows immediately by separating the source into n sources. As the spreading pathways on disconnected components correspond to independent events, the contribution factorizes into the product of the branches contributions.

A final relation involves diagrams with a single source of degree one. It is graphically:

Here g represents a diagram with c edges to sinks, and p g represents its parent contribution. The last diagram on the right hand side is obtained from the original diagram by moving the source node along its only edge. An explicit example of this rule is:

In appendix B we provide a proof of this symmetry relationship.

Using combinations of the symmetry relations, many diagrams containing a single source can be expressed in terms of (sums or products of) simpler diagrams. For instance, any tree diagram can be decomposed completely in terms of only chain diagrams by using (51) and (52). The contribution from chains of any length d is given by [40] :

where Γ(d, t) is the upper incomplete Gamma function

Any diagram which can be mapped to a tree by separating sinks can consequently also be decomposed in terms of chain diagrams. One example of such decomposition is:

Here the first equality is obtaining by separating the sink. The second line follows from (52) and on the third line (53) has been used, together with the observation (51) that branches emanating from a single source factorize into the product of the branches.

Due to the large number of subgraphs contributing to (22) , explicit computation becomes increasingly prohibitive. For this reason, we have created a Python package [49] that, given an input network with specified sources and sinks, computes all contributing subgraphs, their parental relations (28) and the integrals (31) analytically. Here we present explicitly the exact solutions for the prevalence (15) on three small sample networks in figure 2. We suppose that the node marked dark red is a source and compute the expectation values (7) for all of the other nodes. Our algorithm, as detailed in appendix C, computes for each graph all contributing diagrams, finds their parent contributions and constructs the graph of parental relationships. This allows us to integrate (31) starting from the smallest subgraph. The normalized sum over all diagrams with a single sink node gives the prevalence, which we plot as solid black lines in figure 2. [45] with 15 nodes and 20 edges (top), a randomly generated Newman-Watts-Strogatz small-world network [46, 47] with 9 nodes and 16 edges (center) and a random Erdös-Rényi graph [48] of 12 nodes and 17 edges (bottom). In each graph, the dark red node is the initial infected source. The solid black line gives the analytic result from summing all contributing diagrams, and agrees well with the numerical simulations (black dots, averaged over 60 000 runs). The green dashed line shows the individual based mean-field approximation and is in poor agreement with the exact result.

Our analytical results (black line) are in perfect agreement with Monte-Carlo simulations (black dots). These were performed by initializing the networks in the same initial state as in the analytic computation. Then, the Markovian dynamics is simulated by taking small time-steps where susceptible nodes connected to infected nodes are infected with a small infection probability τ . As this is the only parameter in the model, τ effectively plays the role of the time-step ∆t for a dimensionless time. We choose the value of τ = 0.002, small enough to guarantee that at most a single node will flip its state in any one time-step. We simulate n = 60 000 spreading trajectories over the same network and record the infection averages. The difference between the simulated average and the analytical formula is of the order of σ/ √ n with σ the standard deviation of the simulated runs, as shown in figure 3 . This is a good indication that the averages converge to the exact result in the limit of infinite simulation runs. Note also that analytic and Monte-Carlo results clearly depart from individual-based mean field approximation (green dashed line), which gives an overestimation of the prevalence for finite t, as can be seen in figure 2 .

Besides the average expectation value for the full graph, we have also obtained the expectation value for individual nodes in the network. In figure 4 we plot the analytical result (solid line), the Monte-Carlo averages (dots) and the individual based mean-field approximations (dashed line) for three selected nodes in each network. The mean-field approximations are obtained by standard methods (see for instance [7] ). The nodes in the network are assumed to be statistically independent, such that i i s j = i i s j at each moment in time. The individual-based mean-field approximation then consists out of the coupled set of ordinary differential equations (ODEs):

Here n i i (t) is the (mean-field) probability of node i to be infected at time t. This set of equations is easily integrated numerically using a standard ODE integration routine (odeint). The approximation can by improved by including the dynamics for pairs and triples, closing the equations at higher orders [50] , though our exact approach captures all correlations present in the network.

In this paper, we have presented a general formalism which allows for obtaining analytic solutions for the expectation values, as well as higher momenta, for the si spreading process on general network. We have decomposed the time-dependent probability vector into a sum over subgraph diagrams, each of which can be computed systematically. We uncovered a set of symmetry relations, which relate contributions from diagrams among each other. Our methods are implemented numerically in [44] and give results which match with Monte-Carlo simulations for several small sample networks of differing topologies.

Even for the relatively small networks analyzed in the previous section, the explicit computation still involves performing thousands of integrals. It is a difficult task to determine exactly how the number of contributing diagrams scales with system size, as this depends strongly on the network topology. In general, the addition of a new edge could potentially double the number of contributing subgraphs, which will then scale exponentially with the number of edges. Using our formalism, one can track only the necessary diagrams contributing to the spreading and systematically organize their contributions by number of edges. In fact, since any diagram with m edges will start contributing at the order t m in the expansion (22) , only the smallest contributing subgraphs need be taken into account for an early time estimate of the spreading process. Furthermore, the uncovered symmetry relations and combinations thereof further reduce the number of independent diagrams to be computed. For example, for the three sample networks in figure 2 , the number of inequivalent diagrams, includ-ing all parents are 9314 (top), 21 376 (center) and 6055 (bottom), out of which only 2274, 1082 and 358 can not be reduced to smaller diagrams using the symmetry relations presented. More symmetry relations may exist, which could further reduce the number of independent diagrams needed to compute the expectations values analytically. For instance, it would be interesting to study relations among contributions of diagrams with double directed edges.

The formalism presented in this letter can be applied to different compartmental models of epidemiology and other out-of-equilibrium stochastic systems on networks [4, 16, 51] . It would be interesting to determine under which conditions exact solutions of such models can be found. We suspect this to be the case for any system where the graph of diagrams is acyclic, such that any diagrams contribution can be traced back to the smallest diagram of a single source.

We will now prove that the contribution a G (t) of each diagram G reaches asymptotically a constant integer value k G = 0. As a first step, we explicitly check that the function of t for the unique network with m = 1 edge converges to a constant (integer) value. This is trivial since the expectation value for a one-edge network takes on the form 1 − e −t which converges to 1 for t → ∞. Next we use the inductive hypothesis that all diagrams G m−1 with m − 1 edges converge to a non-zero constant for t → ∞, i.e:

such that lim t→+∞ f Gm−1 (t) = 0 and k Gm−1 = 0. Now we want to show, using the above hypothesis, that all diagrams with m edges converge to a non-zero constant asymptotically. Let us consider the integral 

where α GK is the multiplicity and relative sign of the (m − 1)-edged parent diagram K and k K denotes the limiting value of that parent. Now we can separate two contributions:

Note that the first term contains a finite contribution when t → ∞. The second term converges to 0 for t → ∞. To see this, consider:

The asymptotic limit of e −ct F (t) is evaluated as:

where to obtain the second equality we used L'Hôpital's rule and in the second line the inductive assumption (A1) was used. In addition, lim t→∞ e −ct F (0) = 0, since F (0) can only be a constant. Hence we have shown that all diagrams, regardless of the number of edges, will converge asymptotically to a non-zero constant k G , such that

Next, we prove that k G is a non-zero integer, i.e. k G ∈ Z. First, the one-edge diagram G 1 has asymptotic value k G1 = 1. Let us assume (inductive hypothesis) that all diagrams with m − 1 single-directed edges have limit ±1.

If no double-directed edges are present the number of parents will always be equal to the number of edges going into the blue nodes in the original diagram, c. To see this, consider one such diagrams G m with m edges: one can unequivocally obtain its parents G m−1 ∈ P (G m ) by systematically removing one of the edges connecting to one of the sinks. As there are c such edges, there will be c parents. Finally, the sign of each parent contribution to (A7) is equal to the sign of G, sign(α GK k K ) = sign(G). This is due to the fact that parents K with signs opposing G will transition into G by one of the rules which induce a sign change. This sign is then contained in α GK and the resulting product α GK k K will have the same sign as G. By induction, then, for the m-edged diagram G, we have that K∈P (G) α GK k K = sign(G)c and hence (A7) gives:

If double directed edges exist in a diagram G D with c edges to blue nodes, then there could be more than c parents. The reason for this is that each direction on the double directed edge will correspond to different parent diagrams. We can then write down all possible diagrams with the same topology as G D , but where the D double directed edges take on a given orientation, consistent with the graphs source and sink configuration. The number n o of these diagrams is bounded n o ≤ 2 D , since certain double-arrows may exist only conditionally to the existence of other double arrows. Now, each diagram among these n o will have no double-directed edges and c edges to blue nodes, and hence, c parents. Each of these parents will have limit sign(G D ) by the above argument, so finally:

Hence any diagram will, regardless of its topology, converge to a non-zero integer. This integer corresponds to the number of possible orientations of the double directed edges in the diagram (or equal 1 if there are no double directed edges). For example, the first diagram with a double directed edge (39) converges to −2, since its sign (38) is negative and there are two possible orientations of the double directed edge (up or down). Each choice of orientation has 2 parents with limit −1, resulting in a total of 4 parents. The diagram has c = 2, such that (A7) gives −2.

Appendix B: Proof of (52)

Here we will provide a proof of the 'cutting off sources' symmetry relation, presented in section VI C. For notational convenience, we will use the functions a G + (t), p G + (t) and a G (t) to denote the diagrams contributions. Here G + is the graph obtained from G by extending the source by one edge. Equation (52) of the main text then reads:

Let us first start by considering a G + (t), recalling the definition of the contribution as

Here m is the number of edges of the diagram G. The coefficients a G + [n] count the number of ways the graph G + can be constructed by applying Q n times on the initial configuration. The difference between the diagrams G + and G is the presence of a chain of length one emanating from the source. Hence to construct the diagram G + , first the chain must be made, and then the diagram G can be created. So for the operator Q to generate G + in n > m steps, it must be applied at least once to create the chain, and at most n − 1 times to create the m edges of G. The total number of ways to construct G + is then given by the sum over all ways to construct the chain in 1 ≤ i ≤ n − m steps times the number of ways to construct G in n − i steps, such that:

Now, the chain of length one G 1 satisfies the recursion relation

with the boundary conditions that a G1 [n = 1] = 1. This implies that a G1 [n] = (−1) n+1 so that equation (B3) becomes

Consider now the sum: ] or equivalently:

Next, we multiply this equation by t n /n! and sum over n. The right hand side immediately turns into the corresponding contributions as a function of t. The left hand side can instead be expressed as the time derivative of a G + (t):

Here we have used the dynamical equation (30) in the main text to obtain the last equality. A simple rewriting now reproduces the equation (B1) for a generic graph G. One can also obtain a relation between a G + (t) and a G (t) independent of the parents p G + (t) from solving the differential equation in the first line of (B8) as

Appendix C: Analytical methods and code

To compute the exact solution for the expectation value in a sample network we developed a numerical code in Python for the integration of each diagram, which combines the network functionality of NetworkX with the symbolic manipulation and integration functionality of SymPy. The algorithm to obtain the exact solution for i i (t) in (7) consists out of three basic steps. First, we create a list of contributing diagrams as NetworkX Di-Graphs by collecting all (labelled) subgraph configurations where the node i is blue. Secondly, the graph of diagrams GG for each inequivalent diagram in the list is constructed by finding the parents of all diagrams, starting from the largest subgraph down to the initial conditions. Grandparents are added iteratively and each diagram is stored as a node in GG. The multiplicities α GH of topologically inequivalent parents are stored as the edge weights of GG. Finally, a G (t) for each diagram in GG is obtained by integrating eqn. (31) . This is done starting from the smallest diagram up to the largest contributing diagram. The result is then the sum over all diagrams contributions obtained in the first step. The pseudo-code for the algorithm to compute the expectation value is given in Algorithm 1. It relies on a number of functions which we will expand upon below. ContributingDiagrams(G, s, i) computes a list of all labelled subgraph configurations which contribute to the expectation value for the node i to be infected at time t, given that node s was initially infected. Output: list of parent diagrams parents 4: for b ∈ diagram such that state[b] = blue do 5: neighbors ← list of neighbors of b 6: for i ∈ neighbors do if k i out = 0 then If there are no outgoing edges, the neighbor i must be blue 9: state[i] = blue 10: append diagram to parents 11: if n i out ∈ n i in then All outgoing edges are double directed: the neighbor can be both white and blue 12: fix directed edges of diagram and append to parents 13: if state[i] = source then Sources can not change state to blue 14: state[i] = blue 15: fix directed edges of diagram and append to parents 16: else There is at least one single directed outgoing edge and the neighbor must stay white 17: fix directed edges of diagram and append to parents 18: return parents made to include only actual parents in the list of parent diagrams, as some coloring of nodes can not be obtained from the forward action of Q on the initial conditions. For instance, it is impossible to have a diagram with two neighboring blue nodes (connected by an edge). Similarly, it is not possible to have a branch in the diagram without a blue node, as there would be no path from source to sink in that branch.

To determine whether a candidate parent of a given diagram can actually exist, we exploit the directed edges representing the flow from sources to sinks. Specifically, we must check that for each neighbor i of a sink, after removal of their edge, other outgoing edges from i exist. If so, the node i cannot be blue and must be white (sinks cannot have outgoing edges). If, on the other hand, the neighboring node has only incoming edges, it must be turned blue, since the flow of information must end in a sink. Finally, there is a scenario where the node i can be both blue or white. This happens when the node has only double directed edges as outgoing edges, and so the set of out-neighbors of i, n i out is contained within the set of in-neighbors n i in . In that case, both the diagrams with node i white and blue are valid parents. It could also be that some paths are no longer possible after removing the edge to a blue node, or after turning a white node blue. Therefore, the direction of the edges for the parent diagrams should be re-evaluated in each case. The pseudo-code for the algorithm which constructs all physical parents of any arbitrary diagram is given in Algorithm 3.

Our code is freely available in the online public repository [49] , along with a list of contributions a G (t) for all 80 332 diagrams with up to 10 edges. The repository also includes several Jupyter notebooks where the example graphs of the main text are computed using our algorithms. These notebooks contain the exact analytical expressions for the infection prevalence plotted in Fig. 2 of the main text. ACKNOWLEDGMENTS WM wishes to thank Clelia de Mulatier for useful discussions and acknowledges the support from the NWO Klein grant awarded to NWA route 2 in 2020.

Infectious diseases of humans

Exploring complex networks

Statistical mechanics of complex networks

Complex networks: Structure and dynamics

Dynamical processes on complex networks

Statistical physics of social dynamics

Covid-19: The unreasonable effectiveness of simple models

Covid-19 superspreading suggests mitigation by social network modulation

Inefficiency of sir models in forecasting covid-19 epidemic: a case study of isfahan

A contribution to the mathematical theory of epidemics

Modeling infectious diseases in humans and animals

Epidemic spreading in scale-free networks

Spread of epidemic disease on networks

Epidemic processes in complex networks

The explicit series solution of sir and sis epidemic models

Exact analytical solutions of the susceptible-infected-recovered (sir) epidemic model and of the sir model with equal death and birth rates

Exact solution to a dynamic sir model

Near-exact explicit asymptotic solution of the sir model well above the epidemic threshold

Infection dynamics on scalefree networks

Epidemic dynamics in finite size scale-free networks

Epidemic threshold in structured scale-free networks

Absence of epidemic threshold in scale-free networks with degree correlations

Virus spread in networks

Thresholds for epidemic spreading in networks

The n-intertwined sis epidemic network model

Solving the dynamic correlation problem of the susceptible-infected-susceptible model on networks

Unification of theoretical approaches for epidemic spreading on complex networks

Thermodynamic efficiency of contagions: a statistical mechanical analysis of the sis epidemic model

Effective approach to epidemic containment using link equations in complex networks

Coevolution spreading in complex networks

Exact epidemic models on graphs using graph-automorphism driven lumping

Exact deterministic representation of markovian sir epidemics on networks with and without loops

Susceptible-infectedsusceptible epidemics on the complete graph and the star graph: Exact analysis

Time-dependent solution of the nimfa equations around the epidemic threshold

Si epidemic model applied to covid-19 data in mainland china

Predicting the speed of epidemics spreading in networks

Notice sur la loi que la population suit dans son accroissement

Exact epidemic models from a tensor product formulation

We use a bra-ket notation for column vectors a| and row vectors |b

Chaos, scattering and statistical mechanics

Thermodynamic formalism for systems with markov dynamics

We assume the underlying network topology has only undirected edges, such that A ij = A ji

Cumulated social roles: The duality of persons and their algebras

Renormalization group analysis of the small-world network model

Collective dynamics of 'small-world'networks

On the evolution of random graphs

Exact solutions of the SI model on networks

Binary-state dynamics on complex networks: Pair approximation and beyond

Algorithm 1 Compute expectation value 1: procedure ComputeExpVal(G, s, i) 2: Input: graph G, source node s, sink node i 3: Output: Expectation value i i (t) 4: contributions ← ContributingDiagrams(G, s, i) 5: GG.nodes ← inequivalent diagrams from contributions 6: for all diagram ∈ GG.nodes do Run from largest to smallest diagram 7: parents ← GetParents(diagram) 8: Append parents to GG.nodes 9: for all p ∈ parents do 10: add edge (diagram, p) to GG.edges with weight w = Sign(diagram) * Sign(p) 11: a(t) ← 1 for the initial configuration s 12: for all diagram ∈ GG.nodes do Run from smallest to largest diagram 13: p(t) ← sum over w * a(t) for all neighbors p of diagram 14: c ← total degree of blue nodes in diagram 15: a(t) ← e −ct t 0 e c s p(s)ds 16: result ← sum over a(t) for each diagram ∈ contributions 17: return result Algorithm 2 Collect contributing diagrams 1: procedure ContributingDiagrams(G, s, i)

Input: graph G, source node s, sink node i 3:Output: list of contributing diagrams contributions 4:contributions ← Digraph ← edges 6: for diagram ∈ contributions do 7:for edge ∈ diagram do if diagram = connected then 10: for all j ∈ diagram do 11: if path(s, i) through j then 12: remove node j Removes nodes not in the in-component of i 13: add diagram to contributions Only add each unique configuration once 14: return contributions GetParents(diagram) gives a list of parent diagrams which transition into diagram by applying the dynamical rules of the si system. Sign(diagram) simply computes the sign (38) of a diagram and is needed if the dynamical rules generate a sign difference between the parent and the child diagram, for instance by closing a loop on a blue node.

The list of contributing diagrams is constructed as follows. First, the graph G is initialized by defining the states of the source node s and the sink node i. Then all possible paths from s to i are stored in a list and a directed diagram is built from these paths, with only white nodes in between s and i. The resulting diagram is an acyclic directed graph containing all possible paths from s to i corresponding to the largest possible diagram contributing to the expectation value. From this diagram, edges are removed systematically such that the resulting graph is connected and all remaining nodes stay within the in-component of the blue node i. If any node is not in the in-component of the node i, it is removed along with all edges connected to it. Each resulting connected diagram is added to the list of contributing diagrams. The pseudo-code for this algorithm is provided in Algorithm 2.

In order to integrate any give diagram, one needs to recover the parents of that diagram. This can be done by tracing the rules of the si model backwards. The parents of any given diagram necessarily have one edge less connected to a blue node. They can thus be obtained by systematically removing single edges connected to the sinks, while leaving all other edges intact. However, the neighbor connected to the removed edge might change state from white to blue, as the forward action of Q can project blue nodes into white ones. Special care must be