PrEF: Percolation-based Evolutionary Framework for the diffusion-source-localization problem in large networks
Authors: Yang Liu, Xiaoqi Wang, Xi Wang, Zhen Wang, Jürgen Kurths
Date: 2022-05-16

Abstract: We assume that the state of a number of nodes in a network can be investigated if necessary, and study what configuration of those nodes could facilitate a better solution for the diffusion-source-localization (DSL) problem. In particular, we formulate a candidate set which is guaranteed to contain the diffusion source, and propose a method, the Percolation-based Evolutionary Framework (PrEF), to minimize such a set. One can then conduct a more intensive investigation on only a few nodes to target the source. To achieve that, we first demonstrate that there are similarities between the DSL problem and the network immunization problem. We find that the minimization of the candidate set is equivalent to the minimization of the order parameter if we view the observer set as the removal node set. Hence, PrEF is developed based on network percolation and an evolutionary algorithm. The effectiveness of the proposed method is validated on both model and empirical networks under varied circumstances. Our results show that the developed approach achieves a much smaller candidate set than the state of the art in almost all cases. Meanwhile, our approach is also more stable, i.e., it has similar performance irrespective of varied infection probabilities, diffusion models, and outbreak ranges. More importantly, our approach might provide a new framework to tackle the DSL problem in extremely large networks.

There has recently been an enormous amount of interest in the diffusion-source-localization (DSL) problem on networks, which aims to find the real source of an ongoing or finished diffusion [1, 2, 3]. Two specific scenarios are epidemics and misinformation, both of which can be well modeled by networks. Being one of the biggest enemies of global health, infectious diseases can cause rapid population declines or species extinction [4], from the Black Death (probably bubonic plague), which is estimated to have caused the death of as much as one third of the population of Europe between 1346 and 1350 [5], to the present COVID-19 pandemic, which might result in the largest global recession in history, with climate change continuing to exacerbate the spread of diseases and increase the probability of global epidemics [6, 7]. In this context, the study of the DSL problem can help administrations make policies to prevent future outbreaks and hence save lives and resources. Regarding misinformation, as one of the biggest threats to our societies, it can cause great damage to global health and the economy and further weaken public trust in governments [8]. Examples include the ongoing COVID-19 pandemic, where fake news circulating on online social networks (OSNs) has made people believe that the virus can be killed by drinking salty water and bleach [9], and the 'Barack Obama was injured' rumor, which wiped out $130 billion in stock value [10]. In this circumstance, localizing the real source plays an important role in containing misinformation fundamentally. Our main contributions are:

• Percolation-based evolutionary framework.
We propose a percolation-based evolutionary framework to solve the DSL problem, which builds on a network percolation process and potentially works for a range of scenarios.

• Extensive evaluation on synthetic and empirical networks. We evaluate the proposed method on two synthetic networks and four empirical networks drawn from different real-world scenarios, with sizes of up to over 800,000 nodes. Results show that our method is more effective, efficient, and stable than the state-of-the-art approaches, and is also capable of handling large networks.

2 Related Work

DSL approaches. Shah and Zaman first studied the DSL problem with a single diffusion source and introduced the rumor centrality method, which counts the distinct diffusion configurations of each node [1, 16]. They considered the Susceptible-Infected (SI) model and concluded that a node is the source with higher probability if it has a larger rumor centrality. Following that, Dong et al. also investigated the SI model and proposed a better approach based on the local rumor center, generalized from the rumor centrality, given that a priori distribution is known for each node being the rumor source [17]. Similarly, Zhu and Ying investigated the problem under the Susceptible-Infected-Recovered (SIR) model and found that the Jordan center can be used to characterize such a probability [12]. Wang et al. showed that the detection probability can be boosted if multiple diffusion snapshots are observed [13], which can be viewed as integrating the information of the diffusion time to some extent. Indeed, if time stamps or other additional information are known, the corresponding method usually works better [2, 3, 18]. In short, almost all existing methods study the DSL problem on either simple networks (such as tree-like networks or model networks) or small empirical networks. Hence, their performance might be questioned in real and complex scenarios, such as networks having many cycles [19, 20].

Network immunization. The network immunization problem aims to find a key group of nodes whose removal fragments a given network to the greatest extent, which has been proved to be an NP-hard problem [21]. In general, approaches for coping with this problem can be summarized into four categories. The first obtains the key group by strategies such as randomly selecting nodes from the network, usually called local-information-based methods [22, 23]. Since the network topology does not have to be precisely known, these methods can be quite useful in some scenarios. By contrast, when the network topology is known, the second category is usually much more effective: it draws the key group by directly ranking nodes based on measurements such as degree centrality, eigenvector centrality, PageRank, and betweenness centrality [20]. More concretely, these methods first calculate the importance of each node using centralities and choose the top-ranked nodes as the key group. The third category takes the same strategy but heuristically updates the importance of nodes in the remaining network after the most important node is removed; the key group eventually consists of all removed nodes [21]. The last category obtains the key group in indirect ways [24, 25]. For instance, refs. [26, 27] achieve the goal by tackling the feedback vertex set problem.
We assume that a diffusion ζ occurs on a network G(V, E), where V and E are the corresponding node set and edge set, respectively. Letting e_uv ∈ E be the edge between nodes u and v, we define the nearest neighbor set of u as Γ(u) = {v | e_uv ∈ E}. Denoting by c_i a connected component and by |c_i| its size (i.e., the number of nodes in c_i), the largest connected component (LCC), c_max, is defined as the component that consists of the majority of nodes, that is, c_max = argmax_{c_i} |c_i|. Now assuming that G′ is the remaining network of G after the removal of a fraction q of nodes and their incident edges, the corresponding size of the LCC of G′, |c_max|, is a monotonically decreasing function of q. This function is also known as the order parameter G(q) = |c_max|/n, where n = |V| is the number of nodes in G. According to percolation theory [28], when q is large enough, say larger than the critical threshold q_c, the probability that a randomly selected node belongs to the LCC approaches zero. In other words, if q > q_c, there is no giant connected component in G′.

The diffusion ζ is generally associated with four factors: the network structure G, the diffusion source v_s, the dynamic model M, and the corresponding time t, written ζ(G, v_s, M, t). Regarding M, we particularly consider the Susceptible-Infected-Recovered (SIR) model [29] as an example to explain the proposed method; more models are discussed in the result section. For nodes of G governed by the SIR model, the states are susceptible, infected, or recovered. As t → t + 1, an infected node u has an infection probability β_uv (or a time interval τ_uv = 1/β_uv) to transmit the information or virus, say ς, to each susceptible neighbor v ∈ Γ(u). Meanwhile, it recovers with a recovery probability γ_u (or after a duration τ_u = 1/γ_u). Recovered nodes stay in the recovered state, and ς cannot pass through a recovered node.

Now supposing that a group of nodes O ⊆ V is chosen as observers, whose states can be investigated if necessary, we study what configuration of O could facilitate better solutions for the diffusion-source-localization (DSL) problem [1, 2, 3]. In particular, we assume that a node u ∈ O records the relative infection time stamp t_u once it gets infected. Besides, we also consider that part of O, say O_d, has the ability to record the infection direction, that is, a node u ∈ O_d can also show us the node v if v transmits ς to u. Based on these assumptions, for a diffusion ζ triggered by an unknown diffusion source v_s at time t_0, the DSL problem aims to design an estimator σ(G, O) that gives us the inferred source v̂_s = σ(G, O) satisfying v̂_s = argmax_{v∈V} P(O|v), where P(O|v) is the probability that we observe O given ζ(G, v, M, t). Obviously, the state of a node u ∈ O is governed by all parameters of ζ, but M and t are unknown. Hence, with high probability v̂_s differs from the real source v_s in most scenarios [2, 3, 12]. Therefore, the error distance h(v̂_s, v_s), i.e., the hop distance between v̂_s and v_s, is commonly used to verify the performance of an estimator. Usually, an estimator is said to be better than another if it yields a smaller error distance. But here we ask: what should we do next after we obtain an estimator with a small error distance? In other words, can the estimator help us find the diffusion source more easily?
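Before turning to that question, here is a minimal Python sketch (ours, not the authors' code) that makes the order parameter G(q) = |c_max|/n concrete; it assumes networkx, and the helper name order_parameter is hypothetical. For an ER network with ⟨k⟩ ≈ 4, percolation theory predicts a random-removal threshold near q_c = 1 − 1/⟨k⟩ = 0.75, which the printed values should roughly reproduce.

```python
import random
import networkx as nx

def order_parameter(G, removed):
    """G(q): fraction of nodes in the largest connected component
    of G after deleting the nodes in `removed`."""
    H = G.copy()
    H.remove_nodes_from(removed)
    if H.number_of_nodes() == 0:
        return 0.0
    c_max = max(nx.connected_components(H), key=len)
    return len(c_max) / G.number_of_nodes()

n = 10_000
G = nx.gnp_random_graph(n, 4.0 / n, seed=1)  # ER network with <k> ~ 4
nodes = list(G.nodes())
random.seed(1)
random.shuffle(nodes)                        # a random removal sequence S

for q in (0.2, 0.5, 0.75, 0.9):
    removed = nodes[: int(n * q)]
    print(f"q = {q:.2f}  G(q) = {order_parameter(G, removed):.3f}")
```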
Indeed, after acquiring v̂_s, one can conduct a more intensive search on the neighborhood of v̂_s to detect v_s. In this case, a small error distance generally corresponds to a small search region. However, due to the heterogeneity of contact patterns in most real-world networks [30], even a small region (i.e., a small error distance) might be associated with a lot of nodes. Therefore, it would be more practical in real scenarios if the estimator gives us a candidate set V_c guaranteed to satisfy v_s ∈ V_c. Hence, we formulate the goal that this paper aims to achieve as

min_O |V_c|, subject to v_s ∈ V_c, (1)

where |V_c| is the size of V_c. Intuitively, Eq. (1) is built on the assumption that the smaller the candidate set, the lower the cost of the intensive search. In general, V_c should be finite and guaranteed by a finite O even for an infinite G; otherwise, the cost would be infinite, since the intensive search would have to be carried out on an infinite population.

Let V_r be the removal node set and V_o = V \ V_r the rest. Likewise, we write the component cover α(u) of a specific node u ∈ V_r as

α(u) = ⋃_{v ∈ Γ(u) ∩ V_o} c_i(v),

where c_i(v) represents the component that node v belongs to. Denoting t_v the time stamp at which node v gets infected (assumed infinite if v is still susceptible) and O* = {u | u = argmin_{v∈O} t_v} the set of earliest-infected observers, the proposed approach is developed based on the following observation.

Observation 1. Viewing the observer set O as the removal node set V_r and letting V_c consist of O* together with the component covers α(u), u ∈ O*, we then have v_s ∈ V_c for sure.

Example 1. Considering Fig. 1(a) and (b), the boundary of the connected component that the diffusion source belongs to is {1, 2}. Example 2. Regarding Fig. 1(b), α(1) consists of all nodes covered by the shadows. Example 3. With respect to Fig. 1(c), V_c comprises node 1, node 2, the diffusion source, and the node adjacent to the source.

We now consider the generation of the observer set O (or equivalently V_r). For a given network G(V, E) constructed by the configuration model [31], letting ⟨k⟩ and ⟨k²⟩ accordingly be the first and second moments of the corresponding degree sequence, we then have

Lemma 1. G has a GCC if and only if ⟨k²⟩/⟨k⟩ > 2,

where GCC represents a connected component whose size is proportional to the network size n. Now suppose that V_r consists of nodes randomly chosen from G, and let q = |V_r|/n represent the fraction of removed nodes. Apparently, such removal changes the degree sequence of the remaining network (i.e., subnetwork G′; see also Fig. 1(b) as an example). Assuming that a node v is shared by both G and G′, the probability that its degree k (in G) decreases to a specific degree k′ (in G′) is

P(k → k′) = C(k, k′) (1 − q)^{k′} q^{k−k′},

where C(k, k′) is the binomial coefficient (note that each node has probability q of being removed). Letting p_k denote the degree distribution of G, we then have the new degree distribution p′_{k′} of G′ as

p′_{k′} = Σ_{k ≥ k′} p_k C(k, k′) (1 − q)^{k′} q^{k−k′},

and hence the corresponding first and second moments

⟨k′⟩ = (1 − q)⟨k⟩ and ⟨k′²⟩ = (1 − q)²⟨k²⟩ + q(1 − q)⟨k⟩,

respectively. Since G is constructed using the configuration model, its edges are independent of each other. That is, each edge of G shares the same probability of connecting to v ∈ V_r. In other words, the removal of v removes each of its edges with the same probability, and hence G′ can be viewed as a special network that is also constructed by the configuration model. Thus, Lemma 1 can be used to determine whether a GCC exists in G′, and we reach

q_c = 1 − 1/(⟨k²⟩/⟨k⟩ − 1),

where q_c is the critical threshold of q, that is: i) if q < q_c, with high probability there is a GCC in G′; ii) if q > q_c, with high probability there is no GCC in G′ [32, 33].
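The threshold just derived can be evaluated directly from a degree sequence. Below is a small illustrative sketch (ours; the function name critical_threshold is hypothetical) that applies q_c = 1 − 1/(⟨k²⟩/⟨k⟩ − 1) to an ER-like sequence and to a heavy-tailed one, matching the two regimes discussed next.

```python
import numpy as np
import networkx as nx

def critical_threshold(degrees):
    """Random-removal threshold q_c = 1 - 1/(<k^2>/<k> - 1)
    for a configuration-model network with the given degree sequence."""
    k = np.asarray(degrees, dtype=float)
    kappa = (k**2).mean() / k.mean()
    return 1.0 - 1.0 / (kappa - 1.0)

# ER-like sequence: <k^2> = <k>(<k> + 1) should give q_c = 1 - 1/<k>
G = nx.gnp_random_graph(100_000, 4.0 / 100_000, seed=1)
degrees = [d for _, d in G.degree()]
print(f"ER estimate: q_c = {critical_threshold(degrees):.3f} (theory: 0.750)")

# Heavy-tailed sequence (exponent ~ 2.5): kappa grows with n, so q_c -> 1
zipf = np.random.default_rng(1).zipf(2.5, size=100_000)
print(f"SF estimate: q_c = {critical_threshold(zipf):.3f}")
```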
For random networks (such as Erdős–Rényi (ER) networks [34]), ⟨k²⟩ = ⟨k⟩(⟨k⟩ + 1) gives us q_c = 1 − 1/⟨k⟩, which indicates that q_c is usually less than 1 and increases as G becomes denser. But for heterogeneous networks, say p_k ∼ k^{−λ}, the second moment ⟨k²⟩ = Σ_k k² p_k ∼ Σ_k k^{2−λ} diverges if λ < 3 (most empirical networks have 2 < λ < 3 [20]), which means that q_c approaches 1. From the above analysis, we have the following conclusions for the case that the observer set O is randomly chosen. For random networks, q_c < 1 indicates that one can always find a proper O to achieve a finite V_c; usually, the denser the network, the larger the required observer set O. But for heterogeneous networks, q_c → 1 means that this goal could only be achieved by putting almost all nodes into the observer set O.

We further consider the case that O consists of hubs [35], i.e., the nodes that have the most connections in a particular network. Specifically, for a network G(V, E), we first define a sequence S over V and assume that each element of S is uniquely associated with a node in G. Then, letting S(i) = u and S(j) = v with i < j if k_u ≥ k_v, where k_u represents the degree of node u, we have O = {S(i), i ∈ [1, ⌊nq + 0.5⌋]}. For heterogeneous networks under such removal of hubs, a threshold q_c < 1 can be achieved and obtained by numerically solving the implicit equation of ref. [36],

q_c^{(2−λ)/(1−λ)} − 2 = (2 − λ)/(3 − λ) · k_min · (q_c^{(3−λ)/(1−λ)} − 1),

where k_min is the minimum degree. For V_c, however, we cannot obtain an explicit equation to indicate whether it is finite. But we can roughly show that there exists q_c < 1 that gives us a finite V_c for networks generated by the configuration model with degree distribution p_k ∼ k^{−λ}. Supposing that the size of the LCC of G′ is proportional to n^b with b < 1, then |V_c| ≲ k_max · n^b gives us the largest possible size of V_c, where k_max is the maximum degree of G (assumed here to be unique). In the mentioned case, k_max = k_min n^{1/(λ−1)} holds, and hence the fraction |V_c|/n ∼ n^{b + 1/(λ−1) − 1} approaches 0 as n → ∞ if 2 < λ < 3 (again, the case that we are particularly interested in), which indicates that one can always find some proper value of b > 0 satisfying b < (λ − 2)/(λ − 1) for a given λ. Note that, as for random removal, each edge of G shares the same probability of being removed. Hence, both the size of the LCC and the number of connections that v_max has with V_o decrease as q increases, where v_max represents the node whose degree is k_max. But for networks having k_max ∼ n, such as star-shaped networks, V_c is still infinite when q is finite. In this case, the information of only the infection time stamps of nodes in O is apparently not enough. However, if the infection direction is also recorded, then the central node as the unique observer is sufficient for the DSL problem in those star-shaped networks. Note that most existing DSL methods also fail in star-shaped networks. Hence, we further assume that part of O, say O_d, can also show the infection direction. Obviously, |V_c| is a monotonically decreasing function of |O_d| for a specific q. In particular, if O_d = O, then the size of V_c is bounded by the size of the LCC. In general, a finite V_c for heterogeneous networks can be achieved by carefully choosing O. In other words, the configuration of O plays a fundamental role in the suppression of V_c. Associating O with the sequence S (e.g., random removal can be viewed as removal over a random sequence S), our goal is now to acquire a better S that gives us a smaller V_c with regard to a specific q.
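Rather than solving the implicit equation of ref. [36], the hub-removal threshold can also be estimated empirically by scanning the degree-ranked sequence S. The sketch below (ours, assuming networkx; targeted_threshold is a hypothetical helper, and the chunk-wise scan is only a shortcut for speed) returns the smallest q at which the LCC of the remaining network drops to at most δn.

```python
import networkx as nx

def targeted_threshold(G, delta=0.01, step=0.005):
    """Smallest removed fraction q at which the LCC of the remaining
    network holds no more than delta*n nodes, removing hubs first."""
    n = G.number_of_nodes()
    S = sorted(G.nodes(), key=G.degree, reverse=True)  # hub-first sequence
    H = G.copy()
    chunk = max(1, int(step * n))                      # coarse scan for speed
    for i in range(0, n, chunk):
        H.remove_nodes_from(S[i:i + chunk])
        if len(H) == 0:
            return 1.0
        if len(max(nx.connected_components(H), key=len)) <= delta * n:
            return min(i + chunk, n) / n
    return 1.0

G = nx.barabasi_albert_graph(20_000, 2, seed=1)        # heterogeneous network
print(f"empirical hub-removal threshold: q_c ~ {targeted_threshold(G):.3f}")
```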
Besides, since the number of components that v_max connects to is usually difficult to measure, especially for real-world networks, here we choose to achieve the containment of V_c by curbing the LCC of G′ (see also Eq. (11)), which coincides with the suppression of the order parameter G(q), i.e.,

min_S q_c(S), (12)

where q_c(S) = argmin_q {G(q) ≤ δ}, F(S) = Σ_q G(q), and δ is a given parameter (such as δ = 0.01). Eq. (12) is also known as the network immunization problem, which aims to contain epidemics by isolating as few nodes as possible [33], and a lot of methods have been proposed to cope with it [21, 24, 25, 27, 37, 38, 39, 40]. Here we particularly consider the approach based on the evolutionary framework (AEF) to construct S, since it achieves the state of the art in most networks. We first introduce several auxiliary variables to ease the description of AEF. Consider a given network G(V, E) and the corresponding sequence S. Further, letting p_max = |LCC|/n of G′, we define the critical subsequence S_c as the subsequence satisfying p_max ≤ δ for G′(S⊥_i) and p_max > δ for G′((S_c, S⊥_i)). Note that F is scaled by n, namely, the size of the studied network G.

The core of AEF is the relationship-related (RR) strategy, which works by repeatedly pruning the whole sequence S. Specifically, per iteration T, RR keeps a new sequence S′ (i.e., S ← S′) if F(S′) < F(S) (or q_c(S′) < q_c(S)), where S′ is obtained through the following steps. 1) Let j = n, S′ ← S, and G′(V′, E′) be the subnetwork of G that consists of all nodes in V′ = {S′(z), z ∈ [j, n]} and the associated edges E′ = {e_uv, ∀u, v ∈ V′}. 2) Construct the candidate set for position j by randomly choosing ∆′ times from {S′(i), i ∈ [max(j − r′ × n, 1), j]}, where ∆′ ∈ [1, ∆] and r′ ∈ (0, r] are randomly generated per iteration, and ∆ and r are given parameters. 3) Choose from the candidates the node u for which the RR selection condition holds. Then, for a specific sequence S regarding a given network G, F_i is independent of F_j if i ≠ j (i.e., either i > j or j > i holds); that is, in such a case, the order of nodes in S_i has no effect on F_j (Observation 2).

AEF is developed based on RR and Observation 2. Specifically, at T_p, a random integer j(T_p) ∈ [π_1, π_2] is generated, where π_1 and π_2 are two given boundaries. Then, for all subsequences S_i, ∀i ∈ [1, h], RR with the optimization of F is conducted if δ is unknown; otherwise, S_c is optimized by RR with the minimization of q_c. Hence, the containment of the coarse candidate set V′_c (obtained via Observation 1) has been achieved (see also Eq. (4)), based on which existing DSL approaches can be used to further acquire the candidate set V_c ⊂ V′_c (see also Eq. (1)). Here, since our goal is to have a framework that can effectively cope with the DSL problem in large-scale networks, we choose to propose the following approach. Let O′ be the effective periphery node set of V′_c, i.e., the infected observers on the boundary of V′_c. Letting t_min = min_{u∈O′} t_u, we first refine the time stamps by t_u ← t_u − t_min. Then, a Reverse-Influence-Sampling (RIS) [41]-like strategy is conducted to infer the source v̂_c, which works as follows. 1) Let Λ = {} and G″(V′, E″) be the reverse network of G′(V′, E′), satisfying |E″| ≡ |E′| and e_uv ∈ E″ if e_vu ∈ E′. 2) Randomly choose a node u ∈ O′ and let t′_u = t′_0 + t_u, where t′_0 is randomly generated from [0, t_0] and t_0 is a given parameter. 3) View u as the source, let it transmit ς to one of its randomly chosen neighbors, and then let it recover. 4) Repeat such transmission for t′_u steps and denote the latest infected node by v. 5) Let Λ = Λ ∪ {v}. 6) Repeat 2)–5) T_Λ times.
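A minimal sketch of steps 1)–6) and the frequency-based estimate that follows (ours, not the authors' implementation): it assumes the remaining network is given as a reverse adjacency dict G_rev, that observers maps each observer to its refined time stamp, and that one random reverse step per time unit approximates the transmission; ris_estimate is a hypothetical helper name.

```python
import random
from collections import Counter

def ris_estimate(G_rev, observers, t0_max, n_samples=100_000, seed=1):
    """Reverse-influence-sampling sketch: walk each sampled observer's
    time stamp backwards through the reverse network G_rev and return
    candidate sources ranked by frequency theta(v)."""
    rng = random.Random(seed)
    obs = list(observers)
    counts = Counter()                       # Lambda, kept as a multiset
    for _ in range(n_samples):
        u = rng.choice(obs)                  # step 2: random observer
        t = rng.randint(0, t0_max) + observers[u]
        v = u
        for _ in range(t):                   # steps 3-4: walk t steps back
            if not G_rev.get(v):
                break
            v = rng.choice(G_rev[v])
        counts[v] += 1                       # step 5: record the endpoint
    return counts.most_common()              # theta-ranked candidates

# toy usage: a path a -> b -> c with observers b and c; 'a' should dominate
G_rev = {"c": ["b"], "b": ["a"], "a": []}
print(ris_estimate(G_rev, {"b": 1, "c": 2}, t0_max=1, n_samples=1_000)[:3])
```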
Using θ(v) to represent the frequency of a node v ∈ Λ, we then estimate

v̂_c = argmax_{v∈Λ} θ(v).

The candidate set V_c (see also Eq. (1)) is finally acquired by simply considering a few layers of neighbors of v̂_c. In tandem with this approach of drawing V_c from V′_c, we name the framework the Percolation-based Evolutionary Framework (PrEF) for the diffusion-source-localization problem. Note that other strategies, such as existing DSL methods, can also be developed or integrated into PrEF to acquire V_c based on V′_c.

Competitors. We mainly compare the proposed approach with the Jordan-Center (JC) method [12], which generally achieves comparable results in most cases [11, 15, 18]. For JC, the corresponding candidate set V_c is constructed based on the associated node rank, since a neighbor-based strategy usually results in a much larger V_c. Meanwhile, the case where O consists of hubs is also considered as a baseline, denoted Hubs_s. Besides, since most current DSL approaches do not work for large networks, we also verify the performance of the proposed method by comparing it with approaches from the network immunization field, including collective influence (CI) [21], the min-sum and reverse-greedy (MSRG) strategy [27], and the FINDER (FInding key players in Networks through DEep Reinforcement learning) method [25].

Settings. JC considers all infected and recovered nodes to achieve the source localization. PrEF is conducted with ∆ = 50, r = 1, T = 20, π_1 = 1, π_2 = ⌊0.1n⌋, T_p = 5,000 for networks of n ≤ 10^5, T_p = 2,500 for 10^5 < n ≤ 10^6, and T_p = 500 for n > 10^6 (AEF), and T_Λ = 10^6 (RIS). Besides, we use PrEF(R_d) to denote PrEF with a specific R_d. In addition, for each network, q_c is obtained at G(q_c) ≈ 0.005 of AEF.

Diffusion models. SIR_1: β_uv = β_1 and γ_u = γ_1, ∀u, v ∈ V. SIR_2: β_uv ∈ [β_0, β_1] and γ_u = 0, ∀u, v ∈ V, i.e., the Susceptible-Infected (SI) model [29]. SIR_3: β_uv ∈ [β_0, β_1] and γ_u = 1, ∀u, v ∈ V, i.e., the Independent Cascade (IC) model [42, 43]. Letting n(t, I) and n(t, R) accordingly be the numbers of nodes in the infected and recovered states at time t of a particular diffusion ζ(G, v_s, M, t), we generate a DSL sample by the following process. 1) A node v_s ∈ V is randomly chosen as the diffusion source to trigger ζ. 2) ζ is terminated at the moment when (n(t, I) + n(t, R))/n ≥ ε, where ε is the outbreak range rate. Note that (n(t, I) + n(t, R))/n might be much larger than ε if the infection probability is large.

Evaluation metric. We mainly consider the fraction of the candidate set (see also Eq. (1)),

φ = |V_c|/n,

to evaluate the performance of the proposed method. In what follows, φ is the mean over 1,000 independent realizations unless otherwise specified. Besides, we use φ(·) to denote the φ of a specific approach, e.g., φ(PrEF) represents the φ of PrEF.
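For illustration, here is a sketch of the sample-generation protocol and the metric φ under SIR_1 (ours; generate_dsl_sample and phi are hypothetical helpers, and the diffusion may die out before reaching ε, in which case one would regenerate the sample in practice).

```python
import random
import networkx as nx

def generate_dsl_sample(G, beta, gamma, eps, seed=None):
    """Steps 1-2: pick a random source, run discrete-time SIR_1 until the
    outbreak covers at least eps*n nodes; return (source, touched nodes)."""
    rng = random.Random(seed)
    n = G.number_of_nodes()
    v_s = rng.choice(list(G.nodes()))        # step 1: random source
    infected, recovered = {v_s}, set()
    while infected and (len(infected) + len(recovered)) / n < eps:
        new_inf, new_rec = set(), set()
        for u in infected:                   # one synchronous sweep at time t
            for v in G.neighbors(u):
                if v not in infected and v not in recovered:
                    if rng.random() < beta:
                        new_inf.add(v)
            if rng.random() < gamma:
                new_rec.add(u)
        infected = (infected - new_rec) | new_inf
        recovered |= new_rec
    return v_s, infected | recovered

def phi(V_c, n):
    """Evaluation metric: fraction of the candidate set, phi = |V_c|/n."""
    return len(V_c) / n

G = nx.watts_strogatz_graph(5_000, 4, 0.1, seed=1)
v_s, outbreak = generate_dsl_sample(G, beta=0.3, gamma=0.2, eps=0.1, seed=1)
print(f"source {v_s}; outbreak covers {len(outbreak) / 5_000:.3f} of the network")
print(f"e.g., a 50-node candidate set gives phi = {phi(range(50), 5_000):.3f}")
```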
Networks. We consider both model networks (including the Erdős–Rényi (ER) network [34] and the scale-free (SF) network [30]) and empirical networks (including the Power Grid (PG) network [19], the Scottish-cattle-movement (SCM) network [37], the loc-Gowalla (LOCG) network [44], and the web-Google (WG) network [45]). We choose the PG network since it is widely used to evaluate the performance of DSL approaches; the rest are all highly associated with the DSL problem. In particular, the SCM network is a network of Scottish cattle movements, on which the study of the DSL problem plays an important role in food security [46]. Besides, the LOCG network is a location-based online social network, and the WG network is a network of the Google web whose nodes represent web pages and whose edges are hyperlinks among them. The study of the latter two can potentially contribute to the containment of misinformation. The basic information regarding these networks can be found in Table 1.

Results. We first fix q to verify the performance of PrEF over a varied infection probability β_1. Indeed, if the diffusion is symmetric, JC is an effective estimator (see Figs. 2a and 2b when β_1 is large). But its effectiveness sharply decreases as β_1 decreases. By contrast, PrEF performs steadily over the whole range of β_1 and is much better than JC when β_1 is small, e.g., φ(PrEF(1)) = 0.0004 versus φ(JC) = 0.0721 at β_1 = 0.1 in the SF network. Besides, PrEF(0) apparently works better in the ER network than in the SF network, which indicates that k_max might have an impact on the effectiveness of PrEF(0), since the SF network has a much larger k_max. To further demonstrate this, we also consider two empirical networks: the Power Grid network (with k_max = 19) and the Scottish network (with k_max = 3667). As shown in Figs. 3c and 3d, even Hubs_s is more effective than JC in the Power Grid network, but JC, Hubs_s, and PrEF(0) all fail in the Scottish network. In contrast, PrEF(1) works extremely well in both cases. Further considering the fraction of the candidate set φ as a function of the outbreak range rate ε (Fig. 3), PrEF(0) and PrEF(1) still have stable performance while φ(JC) rapidly increases with ε.

We further evaluate the performance of PrEF under different q by comparing it with CI, MSRG, and FINDER on the two large networks. From the results shown in Fig. 4, we draw the following conclusions: i) φ → 1 when q → 0, which is in accordance with our previous discussion, i.e., G(q) → 1 when q → 0; ii) a method that has better performance for R_d = 0 usually also works better for R_d = 1; iii) for a specific q, PrEF always has a much smaller φ than CI, MSRG, and FINDER, especially on the WG network. Indeed, the size of the observer set O, the value of R_d (see also Fig. 5), and the strategy generating O all play fundamental roles in minimizing φ. In particular, PrEF has the best performance over almost the whole range of q. Besides, results in Fig. 6 further demonstrate that the proposed method is also stable against varied diffusion models.

Aiming at the development of a state-of-the-art approach to cope with the DSL problem in large networks, the PrEF method has been developed based on network percolation and evolutionary computation, which can effectively narrow the search region of the diffusion source. Specifically, we have found that the DSL problem is to a degree equivalent to the network immunization problem if immune nodes are viewed as observers, and hence it can be tackled in a similar scheme. In particular, we have demonstrated that the search region is bounded by the LCC if the direction information of the diffusion is known, regardless of the network structure. But for the case that only the time stamp is recorded, both the LCC and the largest degree have an impact on the search region. We have also conducted extensive experiments to evaluate the performance of the proposed method. Results show that our method is much more effective, efficient, and stable compared to existing approaches.
References

Detecting sources of computer viruses in networks: theory and experiment
Locating the source of diffusion in large-scale networks
EPA: Exoneration and prominence based age for infection source identification
Climate warming and disease risks for terrestrial and marine biota
Mathematical models in population biology and epidemiology
Climate change and human health: risks and responses. World Health Organization
Global health 2035: a world converging within a generation
A survey of fake news: Fundamental theories, detection methods, and opportunities
Demystifying the myths about COVID-19 infection and its societal importance
Can 'fake news' impact the stock market
Information source finding in networks: Querying with budgets
Information source detection in the SIR model: A sample-path-based approach
Rumor source detection with multiple observations: Fundamental limits and algorithms
Identifying propagation sources in networks: State-of-the-art and comparative studies
Inferring the origin of an epidemic with a dynamic message-passing algorithm
Rumor centrality: a universal source detector
Rooting out the rumor culprit from suspects
Information sources estimation in time-varying networks
Collective dynamics of 'small-world' networks
Influence maximization in complex networks through optimal percolation
Efficient immunization strategies for computer networks and populations
Local immunization strategy based on the scores of nodes
Generalized network dismantling
Finding key players in complex networks through deep reinforcement learning
Identifying optimal targets of network attack by belief propagation
Network dismantling
Introduction to percolation theory
Modeling infectious diseases in humans and animals
Emergence of scaling in random networks
A critical point for random graphs with a given degree sequence
Resilience of the internet to random breakdowns
Network science
On random graphs I
Error and attack tolerance of complex networks
Breakdown of the internet under intentional attack
Immunization and targeted destruction of networks using explosive percolation
Optimization of targeted node set in complex networks under percolation and selection
Framework of evolutionary algorithm for investigation of influential nodes in complex networks
Dismantle large networks through deep reinforcement learning
Maximizing social influence in nearly optimal time
Talk of the network: A complex systems look at the underlying process of word-of-mouth
Maximizing the spread of influence through a social network
Friendship and mobility: user movement in location-based social networks
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
Modelling vaccination strategies against foot-and-mouth disease