key: cord-018208-sc8j1ate
authors: Qu, Bo; Wang, Huijuan
title: The Accuracy of Mean-Field Approximation for Susceptible-Infected-Susceptible Epidemic Spreading with Heterogeneous Infection Rates
date: 2016-11-09
journal: Complex Networks & Their Applications V
DOI: 10.1007/978-3-319-50901-3_40
sha:
doc_id: 18208
cord_uid: sc8j1ate

Epidemic spreading over a network has been studied for years with mean-field approaches, both in the homogeneous case, where each node can be infected by an infected neighbor at the same rate, and in the heterogeneous case, where the infection rates differ between pairs of nodes. Whether mean-field approaches accurately describe the epidemic spreading has been discussed for the homogeneous case but not for the heterogeneous case. In this paper, we explore if and under what conditions the mean-field approach performs well when the infection rates are heterogeneous. In particular, we employ the Susceptible-Infected-Susceptible (SIS) model and compare the average fraction of infected nodes in the metastable state, where the fraction of infected nodes remains stable for a long time, obtained by continuous-time simulation and by the mean-field approximation. We concentrate on an individual-based mean-field approximation called the N-Intertwined Mean-Field Approximation (NIMFA), an advanced approach that takes the underlying network topology into account. Moreover, for the heterogeneity of the infection rates, we consider not only independent and identically distributed (i.i.d.) infection rates but also infection rates correlated with the degrees of the two end nodes. We conclude that NIMFA is generally more accurate when the prevalence of the epidemic is higher. Given the same effective infection rate, NIMFA is less accurate when the variance of the i.i.d. infection rate or the correlation between the infection rate and the nodal degree leads to a lower prevalence.
Moreover, given the same actual prevalence, NIMFA performs better 1) when the variance of the i.i.d. infection rates is smaller (while the average is unchanged) and 2) when the correlation between the infection rate and the nodal degree is positive. Our work suggests the conditions under which the mean-field approach, in particular NIMFA, approximates the SIS epidemic with heterogeneous infection rates more accurately.

By considering the system components as nodes and the interactions or relations between them as links, networks have been used to describe biological, social and communication systems. On such networks or complex systems, viral spreading models have been used to describe processes such as epidemic spreading and information propagation [8, 10, 13, 20]. The Susceptible-Infected-Susceptible (SIS) model is one of the most studied models. In the SIS model, each infected node infects each of its susceptible neighbors with an infection rate β, and each infected node recovers with a recovery rate δ. Both processes are independent Poisson processes. The ratio $\tau = \beta/\delta$ is called the effective infection rate, and when τ is larger than the epidemic threshold $\tau_c$, the epidemic spreads out with a nonzero fraction of infected nodes in the metastable state. The average fraction $y_\infty$ of infected nodes in the metastable state, ranging in [0, 1], indicates how severe the influence of the virus is: the larger the fraction $y_\infty$ is, the more severely the network is infected. In this paper, we concentrate on deriving the average fraction $y_\infty$ of infected nodes in the metastable state. Although continuous-time Markov theory can be used to obtain the exact value of $y_\infty$, the number of states grows exponentially with the network size, so the exact solution is infeasible for a large network [12]. Hence, the derivation of $y_\infty$ mostly relies on mean-field theoretical approaches.
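The SIS dynamics described above (infection along each susceptible-infected link at rate β, recovery of each infected node at rate δ, all as independent Poisson processes) can be illustrated with a minimal Gillespie-style sketch. This is only an illustration under stated assumptions, not the simulation code used in this paper: the function name `simulate_sis`, the adjacency-dictionary input and the choice to return the infected fraction at a fixed time `t_max` are all hypothetical.

```python
import random

def simulate_sis(adj, beta, delta, t_max, init_frac=0.1, seed=0):
    """Gillespie-style SIS simulation on an undirected graph (illustrative only).

    adj maps each node to the set of its neighbours. Returns the fraction
    of infected nodes at time t_max (0.0 if the epidemic died out earlier).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    n = len(nodes)
    # Assumption: a random fraction init_frac of the nodes starts infected.
    infected = set(rng.sample(nodes, max(1, int(init_frac * n))))
    t = 0.0
    while infected:
        # Every susceptible-infected link transmits at rate beta.
        si_links = [(i, j) for i in infected for j in adj[i] if j not in infected]
        rate_inf = beta * len(si_links)
        rate_rec = delta * len(infected)   # every infected node recovers at rate delta
        total = rate_inf + rate_rec
        if total == 0.0:
            break                          # nothing can happen any more
        t += rng.expovariate(total)        # exponential waiting time to the next event
        if t > t_max:
            break
        if rng.random() * total < rate_inf:
            _, j = rng.choice(si_links)    # infection event: node j becomes infected
            infected.add(j)
        else:
            infected.remove(rng.choice(sorted(infected)))  # recovery event
    return len(infected) / n
```

Averaging the returned fraction over many independent runs (different seeds) gives a rough estimate of the prevalence; a careful estimate of the metastable state would also need a sufficiently long `t_max`.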
The first approach to study the SIS model in complex networks is a degree-based mean-field (DBMF) theory, also called the heterogeneous mean-field (HMF) approximation, proposed by Pastor-Satorras et al. [14], which assumes that all nodes with the same degree are statistically equivalent, i.e. the infection probabilities of those nodes are the same. An individual-based mean-field (IBMF) approximation of the SIS model, called the N-Intertwined Mean-Field Approximation (NIMFA), was then introduced [19] with the only assumption that the states of neighboring nodes are statistically independent. NIMFA, which takes the network topology into account, turns out to be more precise than the DBMF approximation on different types of networks for the classic SIS model with homogeneous infection rates [7]. However, as discussed in [4, 15, 22], the infection rates could be heterogeneous, i.e. the infection rates between different pairs of nodes could differ. The accuracy of NIMFA with heterogeneous infection rates has not yet been discussed.

In this paper, we explore the influence of heterogeneous infection rates on the precision of NIMFA. In particular, we compare the average fraction $y_\infty$ of infected nodes as a function of the effective infection rate τ computed by NIMFA with that obtained by continuous-time simulations of the exact SIS model when the infection rates are heterogeneous but the recovery rate is the same for all nodes. With heterogeneous infection rates, the effective infection rate τ refers to the average infection rate divided by the recovery rate. We set the average infection rate to 1 and tune the recovery rate δ to control the effective infection rate τ. We consider both independent and identically distributed (i.i.d.) and correlated heterogeneous infection rates in different network topologies.
The N-Intertwined Mean-Field Approximation (NIMFA) is so far one of the most accurate approximations of the SIS model that takes into account the influence of the network topology. For the classic SIS model with the homogeneous infection rate β and recovery rate δ, the single governing equation for a node i in NIMFA is

\[ \frac{dv_i(t)}{dt} = -\delta v_i(t) + \beta \bigl(1 - v_i(t)\bigr) \sum_{j=1}^{N} a_{ij} v_j(t) \tag{1} \]

where $v_i(t)$ is the infection probability of node i at time t, and $a_{ij} = 1$ or 0 denotes whether or not there is a link between node i and node j. The governing equation (1) can be extended to the heterogeneous case:

\[ \frac{dv_i(t)}{dt} = -\delta v_i(t) + \bigl(1 - v_i(t)\bigr) \sum_{j=1}^{N} \beta_{ij} a_{ij} v_j(t) \tag{2} \]

where $\beta_{ij} = \beta_{ji}$ is the infection rate between node i and node j. In matrix form, with $V(t) = [v_1(t), v_2(t), \ldots, v_N(t)]^T$, the steady state is defined by

\[ \frac{dV(t)}{dt} = \Bigl( \bigl(I - \mathrm{diag}(v_i(t))\bigr)(A \circ B) - \delta I \Bigr) V(t) = 0 \tag{3} \]

where A is the N × N adjacency matrix with elements $a_{ij}$, I is the N × N identity matrix, $\mathrm{diag}(v_i(t))$ is the diagonal matrix with elements $v_1(t), v_2(t), \ldots, v_N(t)$, B is the infection rate matrix with elements $\beta_{ij}$, and $A \circ B$ is the element-wise product with elements $a_{ij}\beta_{ij}$. The trivial, i.e. all-zero, solution of (3) indicates the absorbing state where all nodes are susceptible. A non-zero solution $V_\infty$ of (3), if it exists, points to the existence of a metastable state with a non-zero fraction of infected nodes. Otherwise, the fraction of infected nodes in the metastable state is regarded as zero, i.e. the metastable state does not exist. In this paper we are interested in the metastable state.

We keep the average infection rate at 1 and tune the recovery rate δ to control the effective infection rate τ. In the case of i.i.d. heterogeneous infection rates, we aim to explore how the accuracy of NIMFA changes when the variance of the infection rate varies. We choose an infection-rate distribution that is frequently observed in the real world and, importantly, whose variance is tunable at a fixed mean, so that we can systematically explore how the accuracy of NIMFA changes with the broadness of the i.i.d. infection rate. We consider the log-normal distribution, for which we can keep the mean unchanged and tune the variance over a large range.
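As a hedged illustration of the two ingredients just described, the sketch below draws symmetric i.i.d. log-normal infection rates with mean 1 and a chosen variance, and then solves the NIMFA steady state numerically. It relies on the fact that setting the time derivative of the heterogeneous governing equation to zero gives $v_i = s_i/(s_i + \delta)$ with $s_i = \sum_j \beta_{ij} a_{ij} v_j$, which is iterated here from the all-infected state. The function names, the fixed-point scheme and the tolerances are assumptions made for this example; the paper only states that the steady-state equation is solved numerically.

```python
import numpy as np

def lognormal_rate_matrix(n, variance, rng):
    """Symmetric n x n matrix of i.i.d. log-normal rates with mean 1 (illustrative)."""
    sigma2 = np.log(1.0 + variance)   # Var[B] = exp(sigma^2) - 1 when E[B] = 1
    mu = -0.5 * sigma2                # E[B] = exp(mu + sigma^2 / 2) = 1
    upper = np.triu(rng.lognormal(mu, np.sqrt(sigma2), size=(n, n)), k=1)
    return upper + upper.T            # beta_ij = beta_ji, zero diagonal

def nimfa_steady_state(A, B, delta, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration v_i <- s_i / (s_i + delta), s = (A o B) v (assumed scheme)."""
    W = A * B                         # element-wise product: rates on existing links only
    v = np.ones(A.shape[0])           # start from the all-infected state
    for _ in range(max_iter):
        s = W @ v
        v_new = s / (s + delta)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v
```

The NIMFA estimate of the average fraction of infected nodes is then the mean of the returned vector, e.g. `nimfa_steady_state(A, B, delta).mean()`. Below the epidemic threshold the iteration converges to the all-zero solution.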
The log-normal distribution [18] $B \sim \text{Log-}\mathcal{N}(\beta; \mu, \sigma)$, whose probability density function (PDF) for β > 0 is

\[ f_B(\beta) = \frac{1}{\beta \sigma \sqrt{2\pi}} \exp\!\left( -\frac{(\ln \beta - \mu)^2}{2\sigma^2} \right) \tag{4} \]

has a power-law tail for a large range of β provided σ is sufficiently large. The log-normal distribution has been widely observed in the real world, where interaction frequencies between nodes are usually considered as infection rates. Wang et al. [21] find that, by employing log-normally distributed infection rates, their epidemic model can accurately fit the infection data of the 2003 SARS outbreak; we also find that the infection rates in an airline network follow the log-normal distribution [15]. In [15], we find that, if the epidemic does not die out, the larger the variance of the i.i.d. infection rate is, the smaller the average fraction $y_\infty$ of infected nodes is. We will show that this conclusion can explain how the accuracy of NIMFA changes with the variance of the i.i.d. infection rates at a given effective infection rate τ.

For correlated heterogeneous infection rates, we build a correlated infection-rate scenario and a reference one. In the correlated infection-rate scenario, we assume

\[ \beta_{ij} = c\,(d_i d_j)^{\alpha} \tag{5} \]

where $d_i$ and $d_j$ are the degrees of node i and node j respectively, c is selected so that the average infection rate is 1, and α indicates the correlation strength. As discussed in [17], such a correlation between the infection rate and the nodal degree is motivated by real-world datasets. In this case, the infection rate of each link is determined by the given network topology and α. For the reference scenario, we shuffle the infection rates of all the links as generated in the first scenario and redistribute them randomly to all the links. In this way, we keep the distribution of infection rates but effectively remove the correlation between the infection rates and nodal degrees. For simplicity, we call this reference scenario the uncorrelated infection-rate scenario. Though the i.i.d.
infection rates are also uncorrelated, in the i.i.d. case we can tune the variance of the infection rate while keeping the type of distribution and the mean of the infection rates. In the uncorrelated infection-rate scenario of this paper, however, the distribution of the infection rate changes with the parameter α, hence the variance of the heterogeneous infection rates cannot be systematically tuned.

A positive α > 0 (or negative α < 0) indicates a positive (or negative) correlation between infection rates and nodal degrees. Values of α that are too large in magnitude may not be realistic. For example, [3, 9, 11] suggest that α is around 0.5 or 0.8 in their datasets. Hence, we select α = −0.25, −0.5, −1 for the negative correlation and α = 0.25, 0.5, 1 for the positive correlation. Different values of α also offer the possibility to explore how NIMFA performs with different correlation strengths. In this paper, we aim to understand how the correlation influences the accuracy of NIMFA by comparing the average fraction $y_\infty$ of infected nodes obtained by NIMFA and by simulations of the exact SIS model. In [17], we explored the influence of the correlation between the infection rate and the nodal degree on the prevalence of the epidemic, which can be used to partially explain the conclusions in this paper.

As in our previous work [15, 17], we perform continuous-time simulations of the SIS model. We consider both the scale-free (SF) and Erdős-Rényi (ER) models to represent different network topologies. The SF model has been used to capture the scale-free nature of the degree distribution in real-world networks such as the Internet [5] and the World Wide Web [1]:

\[ \Pr[D = d] \propto d^{-\lambda}, \qquad d = d_{\min}, d_{\min} + 1, \ldots, d_{\max} \]

where $d_{\min}$ is the smallest degree, $d_{\max}$ is the degree cutoff, and λ > 0 is the exponent characterizing the broadness of the distribution [2]. In real-world networks, the exponent λ is usually in the range [2, 3], thus we set the exponent λ = 2.5 in this paper.
We further employ the smallest degree $d_{\min} = 2$, the natural degree cutoff $d_{\max} = N^{1/(\lambda - 1)}$ as in [6], and the size N = 1000. Hence, the average degree is approximately 4. The distribution of the degree of a random node in an ER network is binomial:

\[ \Pr[D = d] = \binom{N-1}{d} p^{d} (1 - p)^{N-1-d} \]

where p is the probability that a link exists between any two nodes.

Given a network topology and a recovery rate δ, we carry out 100 iterations. In each iteration, the networks are constructed as described above and the infection rates are generated as described in Sections 2.2 and 2.3. Initially, 10% of the nodes are randomly infected. Then the infection and recovery processes of the SIS model are simulated until the system reaches the metastable state, where the fraction of infected nodes is nonzero and unchanged for a long time if the epidemic spreads out, or the fraction is zero if the epidemic dies out. The average fraction $y_\infty$ of infected nodes is obtained over the 100 iterations (whether or not the epidemic dies out).

In this section, we first explore the accuracy of NIMFA with i.i.d. infection rates, and particularly how NIMFA performs when the variance Var[B] of the infection rate B varies. Then we explore the influence of correlated infection rates on NIMFA.

We aim to understand the precision of NIMFA under different effective infection rates, different variances of infection rates and different network topologies: we set the average infection rate to 1 and tune the recovery rate δ to control the effective infection rate τ; we change the variance of the infection rates, which follow the log-normal distribution; we consider both ER and SF networks to represent different topologies. For each value of the variance of the infection rate, we obtain the average fraction $y_\infty$ of infected nodes as a function of the effective infection rate τ for NIMFA by numerically solving (3) and compare it with that obtained by continuous-time simulations. As shown in Fig. 1a, no matter what the variance of the infection rate is, the curve of $y_\infty$ vs.
τ obtained by NIMFA for ER networks is close to that obtained by simulations when the actual prevalence of the epidemic is high, i.e. the effective infection rate τ is large. In order to quantify the difference between the two curves obtained by NIMFA and simulations, we define the variable ζ(τ) in (6), where $y_{\infty,N}(\tau) > 0$ and $y_{\infty,S}(\tau) > 0$ denote the average fraction of infected nodes obtained by NIMFA and by simulations respectively. The larger the value of ζ(τ) is, the less accurate NIMFA is at the corresponding τ. In Fig. 1b, the plot of ζ vs. τ is shown for ER networks. We find that, for a given effective infection rate τ, NIMFA becomes less accurate when the variance of the i.i.d. heterogeneous infection rates increases. This observation can be explained to a large extent by 1) our finding in Fig. 1a that NIMFA is more accurate when the prevalence is higher, and 2) the fact that, given an effective infection rate τ, a smaller variance of the i.i.d. infection rates leads to a higher prevalence [15]. We observe the same in SF networks; the figures, which can be found in [16], are not shown here due to the page limit.

We further explore how the variance of the infection rates influences the accuracy of NIMFA if the actual prevalence $y_{\infty,S}(\tau)$ of the epidemic is similar. We plot the variable ζ in (6) as a function of the actual average fraction of infected nodes obtained by simulations in Fig. 2. We find that, at the same prevalence, the difference ζ in (6) is larger if the variance of the infection rate is larger, as shown in Fig. 2b for SF networks, although this is less evident for ER networks in Fig. 2a. Hence, a higher heterogeneity, i.e. a larger variance, of the i.i.d. infection rates tends to reduce the accuracy of NIMFA further. Overall, we conclude that the prevalence of the epidemic mainly affects the accuracy of NIMFA, i.e. the higher the prevalence is, the more accurate NIMFA tends to be, and given the same prevalence, a larger variance of the i.i.d.
infection rates tends to lower the accuracy of NIMFA.

In this subsection, we aim to understand how the correlation between the infection rate and the nodal degree as shown in (5) influences the accuracy of NIMFA. We first employ ER networks as an example and discuss the case when the correlation is positive. Afterwards we explore the influence of the negative correlation. As mentioned in Section 2.3, we build the scenario of uncorrelated infection rates as a reference to study the influence of the correlation between the infection rate and the nodal degree: we shuffle the infection rates of all the links as generated in the scenario of correlated infection rates and redistribute them randomly to all the links.

As shown in Fig. 3a, we compare the difference ζ between NIMFA and simulations in the scenarios of uncorrelated and correlated infection rates for both α = 0.25 and α = 1, and find that ζ is smaller in the scenario of correlated infection rates, i.e. at a given effective infection rate τ, NIMFA is more accurate when the correlation between the infection rate and the nodal degree is positive than in the scenario of uncorrelated infection rates. The observations are also consistent with our conclusion that NIMFA is more accurate when the prevalence is higher. When the effective infection rate τ is small, the positive correlation tends to increase the average fraction of infected nodes [17], and thus the accuracy of NIMFA. When τ is large, the positive correlation may slightly lower the average fraction $y_\infty$ of infected nodes, but the prevalence in both scenarios is high, i.e. NIMFA is relatively accurate, and the difference in the accuracy of NIMFA between the two scenarios is not obvious. As the correlation strength α increases in Fig. 3b, the difference ζ decreases at a given τ. That is to say, NIMFA tends to be more accurate when the positive correlation becomes stronger.
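The construction of the correlated scenario and its shuffled reference can be sketched as follows. The sketch assumes that relationship (5) takes the product form $\beta_{ij} = c\,(d_i d_j)^{\alpha}$, with c fixed by normalizing the mean link rate to 1; the function names and the dense-matrix representation are hypothetical choices for this example.

```python
import numpy as np

def correlated_rates(A, alpha):
    """Per-link rates c * (d_i * d_j)**alpha, normalised to mean 1 (assumed form of (5))."""
    d = A.sum(axis=1).astype(float)       # nodal degrees (float so negative alpha works)
    iu, ju = np.nonzero(np.triu(A, k=1))  # each undirected link listed once
    raw = (d[iu] * d[ju]) ** alpha
    return iu, ju, raw / raw.mean()       # dividing by the mean plays the role of c

def shuffled_rates(rates, rng):
    """Reference scenario: same rate values, randomly reassigned to the links."""
    return rng.permutation(rates)

def to_matrix(n, iu, ju, rates):
    """Symmetric infection-rate matrix B with beta_ij = beta_ji on the links."""
    B = np.zeros((n, n))
    B[iu, ju] = rates
    B[ju, iu] = rates
    return B
```

Because the shuffled scenario reuses exactly the same multiset of rates, the two scenarios share the rate distribution and differ only in how the rates are placed relative to the nodal degrees.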
We further consider the influence of the positive correlation on the accuracy of NIMFA when the prevalence is the same. The plots of the difference ζ as a function of the average fraction $y_\infty$ of infected nodes are shown in Fig. 3c and Fig. 3d. Given the prevalence of the epidemic, the positive correlation is more likely to increase the precision of NIMFA, and the stronger the correlation is, the more accurate NIMFA is. We observe the same on SF networks (not shown here).

Regarding the influence of the negative correlation between the infection rate and the nodal degree on the accuracy of NIMFA, we compare the variable ζ in the correlated and uncorrelated infection-rate scenarios with α = −1 for both ER and SF networks, as shown in Fig. 4a. We find that, in general, the negative correlation significantly decreases the accuracy of NIMFA when the effective infection rate τ is small but may slightly increase it when τ is large. Moreover, NIMFA becomes less accurate when the negative correlation is stronger, as shown in Fig. 4b. As mentioned in Section 2.3, the negative correlation tends to decrease the prevalence when the effective infection rate τ is small and to increase it when τ is large. Hence, the influence of the prevalence on the precision of NIMFA could largely explain our observations here.

When the prevalence of the epidemic is the same, the influence of the negative correlation on NIMFA's accuracy is shown in Fig. 4c and Fig. 4d. We find that, in general, 1) NIMFA is less accurate with the negative correlation than in the uncorrelated scenario, especially when the prevalence is low, as shown in Fig. 4c; 2) NIMFA becomes even less accurate if the negative correlation becomes stronger, as shown in Fig. 4d.

In this section, we choose a real-world airline network as an example to illustrate how its heterogeneous infection rates affect the accuracy of NIMFA for SIS epidemics on the network.
In the airline network, the nodes are the airports, a link between two nodes indicates that there is at least one flight between the two airports, and the infection rate along a link is the number of flights between the two airports. We construct this network and its infection rates from the OpenFlights dataset. As shown in [17], the airline network possesses roughly a power-law degree distribution. The heterogeneous infection rates from the dataset are normalized by their average so that the average is 1. We compare the difference ζ between NIMFA and the simulations of the exact SIS model in three scenarios: 1) the network is equipped with its normalized original heterogeneous infection rates as given in the dataset (correlated); 2) the network is equipped with the normalized original infection rates, randomly shuffled (uncorrelated); 3) the network is equipped with a constant infection rate equal to 1 (homogeneous). The original heterogeneous infection rate between a pair of nodes is approximately correlated with the degrees of the two nodes according to relationship (5), and the parameter α ≈ 0.14 indicates a positive correlation [17].

We show the difference ζ as a function of the effective infection rate τ in Fig. 5a for the three scenarios defined above. We find that NIMFA is generally more accurate when the effective infection rate τ is larger, i.e. the prevalence of the epidemic is high. The variable ζ is smaller in the scenario of homogeneous infection rates than in the scenario of uncorrelated infection rates at any effective infection rate. This is because infection rates with a non-zero variance tend to decrease the prevalence, and thus lower the accuracy of NIMFA at a given effective infection rate τ. Comparing the difference ζ in the scenarios of correlated and uncorrelated infection rates shows that NIMFA is more accurate with the positive correlation. Furthermore, Fig. 5b shows that, given the same actual prevalence, i.e.
the average fraction $y_\infty$ of infected nodes obtained by simulations, NIMFA is more accurate 1) in the homogeneous scenario than in the uncorrelated scenario, and 2) in the correlated scenario than in the uncorrelated scenario. All the observations agree with our previous observations and explanations about how the heterogeneous infection rate influences the accuracy of NIMFA in network models.

In this paper, we study how heterogeneous infection rates affect the accuracy of NIMFA, an advanced mean-field approximation of the SIS model that takes the underlying network topology into account. By comparing NIMFA with continuous-time simulations of the exact SIS model at a given effective infection rate τ, we find that the prevalence of the epidemic could largely characterize the accuracy of NIMFA, which is reflected in two aspects: 1) NIMFA is generally more accurate when the effective infection rate τ is larger, i.e. the prevalence of the epidemic is higher; 2) when the variance of the i.i.d. infection rates or the correlation between the infection rate and the nodal degree decreases the prevalence at a given τ, NIMFA tends to become less accurate as well. Moreover, we also explore the influence of the heterogeneous infection rates on the accuracy of NIMFA at a given prevalence, i.e. when the average fraction $y_\infty$ of infected nodes obtained by simulations is given. Regarding the i.i.d. heterogeneous infection rates, the accuracy of NIMFA tends to decrease as the variance of the infection rates increases. In the scenario of correlated infection rates, the positive correlation between the nodal degree and the infection rate is more likely to increase the accuracy of NIMFA, whereas the negative correlation tends to lower the accuracy, especially when the effective infection rate τ is small. Note that we discuss the conditions under which NIMFA is accurate, but the cases where NIMFA is far from the simulations are still unexplored.
Our work sheds light on the conditions under which the mean-field approximation of the SIS model with heterogeneous infection rates is accurate.

References

[1] Internet: Diameter of the world-wide web
[2] Emergence of scaling in random networks
[3] The architecture of complex weighted networks
[4] Slow epidemic extinction in populations with heterogeneous infection rates
[5] The fractal properties of internet
[6] Resilience of the internet to random breakdowns
[7] Susceptible-infected-susceptible model: A comparison of n-intertwined and heterogeneous mean-field approximations
[8] Epidemics on interconnected lattices. EPL (Europhysics Letters) 105
[9] Statistical analysis of airport network of china
[10] Epidemics in interconnected small-world networks
[11] Minimum spanning trees of weighted scale-free networks
[12] Epidemic processes in complex networks
[13] Epidemic dynamics and endemic states in complex networks
[14] Epidemic spreading in scale-free networks
[15] SIS epidemic spreading with heterogeneous infection rates
[16] The accuracy of mean-field approximation for susceptible-infected-susceptible epidemic spreading
[17] SIS epidemic spreading with correlated heterogeneous infection rates
[18] Performance analysis of communications networks and systems
[19] Virus spread in networks
[20] Effect of the interconnected network structure on the epidemic threshold
[21] Modelling the spreading rate of controlled communicable epidemics through an entropy-based thermodynamic model
[22] Epidemic spreading in weighted networks: an edge-based mean-field solution