key: cord-0043385-mzyihmct authors: Castellano, Claudio; Pastor-Satorras, Romualdo title: Relevance of backtracking paths in recurrent-state epidemic spreading on networks date: 2018-11-27 journal: nan DOI: 10.1103/physreve.98.052313 sha: 1cd2cdfe419f2c9826ae74ec3cdb2beb740d5ed0 doc_id: 43385 cord_uid: mzyihmct The understanding of epidemics on networks has greatly benefited from the recent application of message-passing approaches, which allow us to derive exact results for irreversible spreading (i.e., diseases with permanent acquired immunity) in locally treelike topologies. This success has suggested the application of the same approach to recurrent-state epidemics, for which an individual can contract the epidemic and recover repeatedly. The underlying assumption is that backtracking paths (i.e., an individual is reinfected by a neighbor he or she previously infected) do not play a relevant role. In this paper we show that this is not the case for recurrent-state epidemics since the neglect of backtracking paths leads to a formula for the epidemic threshold that is qualitatively incorrect in the large size limit. Moreover, we define a modified recurrent-state dynamics which explicitly forbids direct backtracking events and show that this modification completely upsets the phenomenology. The study of epidemic processes [1] has been one of the fundamental levers in the development of modern network science [2] . Research in this field has been based on the consideration of several simplified epidemic models [3] , the most fundamental being the susceptible-infected-susceptible (SIS) and the susceptible-infected-removed (SIR) models. In both of them, a susceptible (healthy) individual, in contact through an edge with an infected individual, contracts the disease at rate β. In the SIS model infected nodes become spontaneously susceptible again at rate μ, while in the SIR model infected individuals become removed and cannot contract the disease again. This variation results in a different collective behavior: SIS allows for a long-lived (infinitely long in the thermodynamic limit of infinite network size) steady state with a finite fraction of infected individuals, while the SIR model proceeds by epidemic outbreaks of finite duration that reach a given fraction of individuals in the population. Both these models are characterized by the presence of an epidemic threshold λ c for the control parameter, defined as the ratio between the infection and the healing rates, λ = β/μ. In the SIS model, the region λ λ c corresponds to the healthy phase, in which epidemic episodes die out exponentially fast; for λ > λ c , on the other hand, a steady state with a finite fraction of infected individuals is observed. For the SIR model, on the other hand, the healthy phase corresponds to outbreaks affecting an infinitesimally small fraction of the population, while for λ > λ c the final fraction of removed individuals becomes finite. Initial analytical understanding of epidemic processes on networks was based on the so-called heterogeneous * Corresponding author: claudio.castellano@roma1.infn.it mean-field (HMF) approach [4] , which assumes that the network is annealed [5] and topologically uncorrelated [6] . Under these assumptions, both the SIR and the SIS processes exhibit a threshold λ HMF c inversely proportional to the second moment q 2 of the degree distribution P (q ) characterizing the topology of the network [2] . For a distribution scaling as a power law for a large degree, P (q ) ∼ q −γ , as empirically observed in many contexts [7] , this second moment scales, for γ 3, as q 2 ∼ q 3−γ max , where q max is the maximum degree in the network [8] . In this regime γ 3, the maximum degree diverges in the limit of infinite network size, leading to a vanishingly small epidemic threshold. For γ > 3, instead, both the second moment and the threshold attain a finite value in the thermodynamic limit. For the SIS model, a more refined approach, the quenched mean-field (QMF) theory [9] [10] [11] [12] , predicts an epidemic threshold λ QMF c = 1/ A M , where A M is the largest eigenvalue (LEV) of the adjacency matrix A ij representing the structure of connections in the network [13] . The observation that, in general networks with a power-law degree distribution, A M diverges with network size [14, 15] leads to a vanishing threshold for any value of the degree exponent γ , in contradiction to HMF theory. The validity of QMF theory was numerically checked [16] and later extended by taking into account long-distance reinfection events between high degree nodes [17] . For the case of the SIR model, a new theoretical framework was recently proposed based on the message-passing (MP) approach [18] . This approach builds on the observation that a node i that has been infected by a neighbor node j has absolutely no chance to reinfect j . This amounts to disregarding, in the epidemic evolution, the possibility of backtracking paths allowing for the mutual reinfection of pairs of connected nodes. A theory based on the mapping of the SIR model onto bond percolation [19] leads to an epidemic threshold for the SIR model given by the inverse of the largest eigenvalue H M of the Hashimoto, or nonbacktracking, matrix [20, 21] . This matrix has also been applied to the definition of centrality measures in networks [22] and to the detection of communities [23] . The validity of the MP method for percolation and SIR dynamics has been numerically confirmed [24] . In a recent paper, Shrestha et al. [25] applied the same MP approach to the SIS model, arguing that backtracking paths lead to "echo chamber" effects (successive reinfection events between neighboring nodes) and those effects should be disregarded as somehow pathological. Application of the MP approach to the SIS model provides a set of differential equations that can be solved to obtain the evolution of the prevalence (density of infected individuals) as a function of time and leads to an epidemic threshold again inversely proportional to the largest eigenvalue H M of the nonbacktracking matrix. Based on numerical simulations of the SIS process on a variety of small real and synthetic networks, the authors of Ref. [25] conclude that their MP approach provides an accurate estimate of the evolution of prevalence with time, as it takes properly into account dynamical correlations (neglected in both the HMF and QMF approaches), as well as a better bound for the epidemic threshold. The results of Ref. [25] are in striking contradiction to previously published literature, and the physical picture emerging from it. In particular, Ref. [26] pointed out the existence of different mechanisms triggering the onset of the extended and long-lasting outbreaks, highlighting the crucial role played in some cases by the subset composed by the node with the largest degree (the hub) and its immediate neighbors. This star graph alone is sometimes able to self-sustain the epidemic and spread it to the rest of the system: Backtracking events, with the repeated mutual reinfection among the hub and the leaves of the star graph, are at the heart of this phenomenology. The physical picture introduced in Ref. [26] has been confirmed elsewhere [16, 27] . A clarification of the apparent contradiction with respect to Ref. [25] is therefore needed. In this paper we show that, even though in some cases a MP approach might provide a good approximation of the value of the prevalence, it does not account properly for the position of the epidemic threshold in power-law distributed networks, especially in the case of very large networks. Comparing the predictions of QMF and MP theories, as given by the inverse of the largest eigenvalues of the adjacency and Hashimoto matrices, with each other and with direct large scale numerical simulations of the SIS model on real and uncorrelated synthetic networks, we show that MP theory does not capture the correct behavior of the epidemic threshold, particularly for large values of the degree exponent γ . We also observe that the MP prediction for the threshold is not a bound for the true value, while the scaling of the threshold with network size is instead more accurately described by QMF theory. Moreover, we consider a modified SIS model in which backtracking is hindered and show that its behavior radically differs from the original SIS model. Our work shows that backtracking paths, at odds with the claims in Ref. [25] , do play a fundamental role in the dynamics of the SIS model and are indeed at the core of the vanishing threshold observed asymptotically for uncorrelated networks with a power-law distributed degree. The quenched mean-field theory predicts the SIS threshold is inversely proportional to the largest eigenvalue A M of the adjacency matrix. The message-passing approach gives an analogous formula with the only difference being that the matrix for which the largest eigenvalue must be calculated is the Hashimoto matrix. In order to visualize the comparison between both predictions and also see what happens when N is not large, we evaluate numerically the largest eigenvalue of the adjacency and Hashimoto matrices. For the adjacency matrix, A M is calculated by applying a simple power iteration strategy [28] . In the case of the Hashimoto matrix, H M is evaluated by using the Ihara-Bass determinant formula [22] , i.e., by computing the largest eigenvalue of the 2N × 2N matrix where A is the adjacency matrix; I is the identity matrix; D is a diagonal matrix, with D ii = q i , the degree of node i; and 0 is the null matrix. The largest eigenvalue of the matrix M is again computed using the power iteration method. To compare the theoretical predictions with actual values of the SIS threshold, we compute the latter using the lifespan method [17, 29] . In this approach, simulations start with only the hub being infected. For each run, one keeps track of the coverage c, defined as the fraction of different nodes that have been infected at least once. In an infinite network, this quantity is vanishing for λ < λ c , while it tends asymptotically to 1 in the active region of the phase diagram. In finite networks, one can set a coverage threshold c t and consider all runs that reach c c t as endemic. The average lifespan T , where averaging is restricted only to nonendemic runs, plays the role of a susceptibility: The position of the threshold is estimated as the value of λ for which T reaches a peak. In our simulations we choose c t = 1/2. We start our analysis considering the set of 109 real networks described in Ref. [30] . This set represents a collection of networks of widely varying size, heterogeneity, and level of topological correlations [6, 31] (see Ref. [15] ). In Fig. 1 we present a comparison of the numerically estimated values of the LEVs A M and H M , presented as a scatterplot. As we can see from Fig. 1 , the difference between the two LEVs is minimal, being almost equal with the exception of the networks with a low level of heterogeneity, as measured by the factor q 2 / q . Thus, only for small values of q 2 / q is the LEV of the Hashimoto matrix noticeably smaller than A M . It is interesting to notice that this behavior is not clearly correlated with network size (data not shown) since some homogeneous networks with large size show marked differences in their LEVs. Because of the similarity among the two predictions it is difficult to assess which of the two better reproduces numerical SIS thresholds. We present such a comparison in Fig. 2 , where we plot the numerical SIS threshold λ c , computed for all networks of the data set considered (except the three [30] . The color of the symbols encodes the network heterogeneity as measured by the factor q 2 / q . largest), as a function of the QMF and MP predictions. As we can see from Fig. 2 , the relative accuracy of the two theoretical predictions is practically the same. The MP theory seems to be marginally better in some cases, but this is a consequence of the fact that the MP prediction (contrary to the QMF prediction) is not a strict bound on the actual threshold. While the consideration of real networks is undoubtedly interesting, it suffers from the problem that real networks are topologically complex, being rife with correlations, clustering, and nontrivial community structures [2] , which are not taken into account in theoretical approaches. For this reason, we now turn to the analysis of synthetic uncorrelated networks, lacking all those topological complications. In the case of uncorrelated networks, Chung and collaborators [14] provide an expression for the LEV of the adjacency matrix that can be safely recast [15, 32] as For the Hashimoto matrix instead, the value of the LEV can be approximated by [22] [23] [24] H M ≈ q 2 / q − 1. It is immediately noticeable that the two expressions are practically the same if the max in Eq. (2) is given by the second argument and q 2 / q is sufficiently large. For powerlaw distributed networks of size N this always occurs in the large N limit, provided γ < 5/2. On the other hand, it is clear that the two expressions give a qualitatively different result when √ q max is largest in Eq. (2), i.e., for γ > 5/2 and large N [10, 32] . To investigate what happens also for moderate network size, we numerically compute the eigenvalues A M and H M for uncorrelated synthetic power-law networks generated using the uncorrelated configuration model (UCM) [33] , with minimum degree q min = 3 and a maximum degree q max = N 1/2 for γ 3 (to avoid degree correlations [34] ) and q max = N 1/(γ −1) for γ > 3 (to avoid large sample-to-sample fluctuations [35] ). In Fig. 3 we present a scatterplot of the largest eigenvalue of the Hashimoto matrix as a function of A M for various degree exponents γ and network sizes ranging from N = 10 3 to 10 7 . As expected, for γ 5/2 the largest eigenvalues of both matrices attain essentially the same value in the limit of large (3), we know that, asymptotically, the former reaches a constant value, while the latter diverges. This represents a strong contradiction between the two theories: While QMF predicts a vanishing threshold for all γ , MP gives a finite threshold for any γ > 5/2. Notice also that for small values of N the two eigenvalues are not equal but also not very different. This analysis of synthetic networks is in agreement with our observations on real ones. Highly heterogeneous networks have large and practically identical LEVs. On the other hand, for networks with low heterogeneity, the LEVs are dissimilar, with H M in general smaller than A M . In order to compare the validity of the two theoretical approaches with respect to the epidemic threshold of the SIS model, in Fig. 4 we plot the inverse numerical epidemic threshold, estimated from the lifespan method, and the inverse of the QMF and MP predictions as a function of network size on uncorrelated networks generated with the UCM algorithm for different values of γ . From Fig. 4 it turns out that for γ = 2.2 the predictions of the two theories essentially coincide and reproduce numerical results extremely well. Indeed, the MP prediction seems, in this case, to work slightly better than QMF. This is due to the additional term of −1 in Eq. (3) with respect to Eq. (2). This term accounts for dynamical correlations that in general prevent a node from infecting the node that transmitted the disease to it in the first place, as the latter is, with high probability, still infected [17] . The opposite occurs, on the other hand, for γ > 5/2. In the case γ = 2.8, both the inverse numerical SIS threshold and the QMF and MP predictions diverge in the large size limit, but the slope of the growth of the SIS threshold is better captured by QMF. For γ = 3.2, the MP prediction tends to a constant value, while the inverse SIS threshold grows as a power law in the same fashion as the inverse QMF prediction. Notice also that while the QMF prediction is an upper bound for the inverse threshold, this is clearly not the case for the MP prediction. We conclude that while both theories perform extremely well for heterogeneous networks with small values of γ , for less heterogeneous networks they are both inaccurate. However, QMF captures the fundamental feature that the threshold vanishes in the large size limit, while the neglect of backtracking events has the consequence that MP is qualitatively off target. In order to further clarify the relevance of backtracking paths in the SIS dynamics, we consider a modified SIS-like dynamics, in which such paths are strongly depressed. First of all, let's point out that if backtracking is fully ruled out, no steady state is ever possible. Indeed, in that case, each infection event practically "removes" the corresponding edge, which is not available any longer for transmitting the infection. In the long run all edges are removed, and the absorbing state with all nodes in state S is the only possible asymptotic configuration. Therefore, we consider a modified SIS dynamics where backtracking events are prohibited but only until other infection events take place. Our model, which we dub nonbacktracking SIS (NBSIS) dynamics, is defined in terms of the SIS dynamics, with the following addition: If at a certain time node j is infected by node i and then i becomes susceptible, j cannot transmit the epidemic back to node i before some other neighbor m of i (m = j ) reinfects it. An illustration of this type of dynamics is provided in Fig. 5 . The central node is initially infected; it transmits the disease to two of its neighbors and then recovers. The two neighbors cannot retransmit the infection back to it. However, they can transmit the infection to other neighbors of theirs that can, in their turn, reinfect the central node. With this recipe, local echo chamber effects are strongly depressed. This small variation can have important effects even in simple network structures. In the case of a star network, composed of a hub of degree q connected to q leaves of degree 1, it is trivial to observe that the NBSIS model does not allow a steady state. Indeed, leaves can be infected only by the hub; therefore, once a leaf has been infected, it cannot be infected by any other node, and thus, after it spontaneously recovers, it has no chance to be reinfected again. The NBSIS epidemic dies out in a few time steps after its onset. This behavior is in contrast to the SIS dynamics on star networks, which can sustain long-lived steady states as soon as λ > 1/ √ q [36] . The same lack of an active state is observed in generic tree networks [2] when the infection starts at a single node. In the case of generic networks, again, a lack of steady state is observed in the limit of large λ. Indeed, in the limit λ → ∞, the SIS model quickly reaches a clear steady state in which the density of infected nodes tends to 1. In the NBSIS model, on the other hand, if λ → ∞, the initial infected seed infects all its nearest neighbors in the initial time steps. These infected neighbors cannot reinfect the seed due to the nonbacktracking condition, and therefore, the seed, once recovered, cannot become infected again and thus becomes effectively removed from the network. The neighbors of the seed experience a similar fate, after infecting all their neighbors. Therefore, in this limit the NBSIS dynamics lacks an active (infected) phase and quickly decays to the absorbing (healthy) phase. In Fig. 6 we show that the lack of an active state occurs for any finite value of λ in the NBSIS, which shows instead a transient that eventually decays into the absorbing state. As we can observe in Fig. 6(a) , for γ > 5/2 the density of infected individuals tends to zero for a sufficiently large time interval, whose length decreases as λ is increased, in stark contrast to what is expected for a system with a truly active state above its critical point. Moreover, as Fig. 6(b) shows, this time interval does not depend on the system size, revealing that for any value of λ the only stable state is the absorbing one. In the case γ < 5/2 [ Fig. 6(c) ] we observe some differences: The initial plateau is followed by an intermediate regime with a decay slower than exponential. For fixed λ and increasing network size N [ Fig. 6(d) ], the decay toward the absorbing state occurs over temporal scales that grow with N . Still, the most important features are the same: There is no steady state, and the temporal scale over which the epidemic disappears (weakly) decreases with λ [37] . Overall, this phenomenology can be attributed to the suppression of backtracking paths for the infection. In all cases no steady state is sustained for asymptotically long times. Notice that the way the absorbing state is reached for finite networks is completely different from what happens for SIS. In the latter case the absorbing state is reached from the steady state by a random fluctuation; for NBSIS, instead, ρ(t ) is pushed toward zero by a deterministic drift. The different decays exhibited depending on γ can be traced back to different topological features of the networks. It has been shown that for γ > 5/2 the epidemic transition in SIS is triggered by the hub (the node with the largest degree q max ) and its direct neighbors, which singlehandedly can keep alive the dynamics for very large periods of time by repeatedly reinfecting each other. The elimination of backtracking paths imposed by the NBSIS model forbids these reinfection events and thus completely destroys this triggering mechanism. On the other hand, for γ < 5/2 the transition is triggered by a different subset of nodes, identified with the K core of the maximum index [26] . This subgraph is composed of nodes with a fairly high number of mutual interconnections. It therefore provides many possible paths through which infection can propagate from one node to another, circumventing the cancellation of direct backtracking paths. Although these alternative paths are not sufficient to sustain the epidemic indefinitely, they permit the slow decay toward the absorbing state. We can obtain a further confirmation of this picture by reinterpreting the NBSIS model in terms of coupled SIS and dynamic bond percolation processes [38] . In this sense, when a node i infects nodes j for the first time, this dynamic step blocks the edge pointing from j to i for further infections until i is again infected by a node m = j . Accordingly, an infection event from i to j can unblock an edge from k = j , previously blocked due to an infection event at a past time from k to j . Thus, the NBSIS dynamics behave as a SIS model in which edges are blocked and unblocked in a percolationlike process due to infection events between pairs of nodes. Notice that this percolation process is intrinsically different from the standard one [38] since the blocking is dynamic and correlated. In this framework, it is easy to see that a steady active state can be achieved only when the density of blocked edges reaches a steady state of not very large value. In Fig. 7 we plot the normalized density of blocked edges η(t ) and the prevalence ρ(t ) as a function of time for different values of γ and λ. We observe that during the transient state the density of blocked states is also approximately constant over time. However, the pseudo-steady state eventually ends, and the density of blocked states resumes growth and attains, when the absorbing state is reached, values close to η = 1/2, which corresponds to each edge in the network being blocked in Message-passing methods are a powerful tool that can be successfully applied to a variety of dynamical models on complex networks. They rely on the neglect of echo chambers or backtracking events, in which a set of nodes can influence each other repeatedly. Dynamical paths are strictly nonbacktracking for models such as percolation or the SIR epidemic spreading model, and therefore, it is natural that a MP approach provides the correct description in such situations. Such a description provides the thresholds in terms of the largest eigenvalue of the Hashimoto or nonbacktracking matrix. In other situations, such as the one represented by the SIS model, backtracking paths represent a crucial ingredient of the dynamics. They are at the core of the mechanism that keeps epidemics active in power-law distributed networks with a low level of heterogeneity (i.e., for a large degree exponent), in which repeated reinfection events between the hub and its nearest neighbors are able to keep the infection alive for large intervals of time and to propagate it to the rest of the network. Disallowing such backtracking paths leads inevitably to the elimination of the steady state of the SIS model. In this case, a better description is provided by quenched mean-field theory, predicting the threshold is the inverse of the largest eigenvalue of the adjacency matrix of the network. In the present paper we have reconsidered the claims made in Ref. [25] concerning the applicability of the MP method to the description of the SIS dynamics. We have shown that, while no big difference exists for many real networks, in synthetic uncorrelated ones the threshold predictions of MP and QMF theories are widely different in the limit of large networks for mildly heterogeneous networks (for γ > 5/2), in agreement with theoretical expectations. Direct simulations of the SIS dynamics show that in this same regime, the numerical threshold of the SIS process is in better agreement with the QMF prediction than with the MP one, which is not a bound for the true threshold. While we have demonstrated that MP does not provide a better estimate of the SIS threshold than QMF, in Ref. [25] it is also argued that MP provides a better approximation of the probability that individual nodes are infected at a given time. In Fig. 8 we clarify this issue by comparing Monte Carlo estimates of the prevalence of the SIS process as a function of time on the karate club network (the same as considered in [25] ) with the QMF and MP predictions obtained from direct numerical integration of the corresponding nonlinear differential equations. Details are provided in the figure caption. We observe here that, away from the critical point, QMF theory provides a very good approximation of the steady-state prevalence measured over surviving runs, that is, over runs that, at given time t, have not yet fallen into the absorbing state. On the other hand, MP theory appears closer to the steady-state prevalence computed averaging over all runs, including those already in the absorbing state. Regarding the evolution of the prevalence at short times, we can see that MP appears slightly closer to numerical results than QMF theory. Close to the critical point, at short times QMF and MP practically coincide. In the steady state, QMF provides a very good match to the steady state of surviving rules, while MP predicts a steady state that does not match any of the simulations results. Considering that for all finite networks the dynamics fall, sooner or later, into the absorbing state, information about the critical probabilities is to be estimated using only surviving runs [39] . It is therefore clear that QMF provides a better estimate of the critical properties of the SIS model. In order to better understand the role of backtracking paths, we have analyzed a SIS-like model in which nonbacktracking paths are suppressed to some extent. Numerical simulations of this model show that it does not possess a steady state, at odds with SIS behavior, signaling again that the exclusion of backtracking paths cannot provide an accurate description of the SIS dynamics. The consideration of backtracking paths can improve theories for the SIS dynamics, as they can take into account dynamical correlation effects [35] ; see Ref. [40] for an early attempt. However, as already stated in Ref. [18] , the development of a message-passing method appropriate for SIS still remains an outstanding open problem. Networks: An Introduction Infectious Diseases in Humans 22nd International Symposium on Reliable Distributed Systems (SRDS'03) The symmetric adjacency matrix takes values A ij = 1 if nodes i and j are connected by an undirected edge and A ij = 0 otherwise Proc. Natl. Acad. Sci. USA Automorphic Forms and Geometry of Arithmetic Varieties Proc. Natl. Acad. Sci. USA Matrix Computations Proceedings IEEE INFOCOM 2005: 24th Annual Joint Conference of the IEEE Computer and Communications Societies Nonequilibrium Phase Transitions in Lattice Models We acknowledge financial support from the Spanish MINECO, under Project No. FIS2016-76830-C2-1-P. R.P.-S. acknowledges additional financial support from ICREA Academia, funded by the Generalitat de Catalunya.