key: cord-355393-ot7hztyk
authors: Yuan, Peiyan; Tang, Shaojie
title: Community-based immunization in opportunistic social networks
date: 2015-02-15
journal: Physica A: Statistical Mechanics and its Applications
DOI: 10.1016/j.physa.2014.10.087
sha: 
doc_id: 355393
cord_uid: ot7hztyk

Abstract Immunizing important nodes has been shown to be an effective solution to suppress the epidemic spreading. Most studies focus on the globally important nodes in a network, but neglect the locally important nodes in different communities. We claim that given the temporal community feature of opportunistic social networks (OSN), this strategy has a biased understanding of the epidemic dynamics, leading us to conjecture that it is not “the more central, the better” for the implementation of control strategy. In this paper, we track the evolution of community structure and study the effect of community-based immunization strategy on epidemic spreading. We first break the OSN traces down into different communities, and find that the community structure helps to delay the outbreak of epidemic. We then evaluate the local importance of nodes in communities, and show that immunizing nodes with high local importance can remarkably suppress the epidemic. More interestingly, we find that high local importance but non-central nodes play a big role in epidemic spreading process, removing them improves the immunization efficiency by 25% to 150% at different scenarios.

With the rapid pervasive of a new generation of smart devices, it is possible and necessary to disseminate content in networks by exploiting the human mobility and intermittent device-to-device contacts. Networks with such intermittent device-to-device contacts are generally called delay-tolerant [1] or opportunistic [2] . Until recently, a variety of opportunistic social networks (OSN) have been studied, such as pocket switched networks [3] , publish/subscribe systems [4, 5] , and human contact networks [6] . Many interesting phenomena have been observed, including the heavy-tailed distribution of contact times and node degree [7, 8] , the small world phenomenon [9] , the dynamics of epidemic spreading [10, 11] , the high clustering of aggregated social contact statistics [12] , etc.

The social contact feature makes nodes in OSN vulnerable to infection. Therefore, when an infectious disease appears in a population (e.g., the severe acute respiratory syndrome (SARS) and the H7N9 virus), designing effective immunization strategies has become very important. To achieve this goal, several immunization strategies have been recently developed, ranging from ring immunization [13] to targeted immunization [14] [15] [16] [17] [18] . The targeted strategy has shown to be more effective than the ring strategy to delay the outbreak of epidemic. The basic idea of targeted immunization is that it first ranks importance of nodes and then removes them, from highest importance to lowest, to observe their impact on epidemic spreading speed. The importance of nodes is generally measured by node's degree, closeness or betweenness centrality in the network [19] . To further improve the immunization efficiency, the authors of Ref. [20] suggested that node importance should be recalculated after every step of node removal.

Previous work mainly concentrates on the global measures of nodes in a network, while ignoring the local importance of nodes in different groups. A number of recent studies indicate that opportunistic networks have a high clustered property [10, 21] and show a temporal community structure [22] . Networks with properties at the level of community are quite different from their properties at the entire network level [23] . Studies thus paying more attention to the whole network topology and neglecting temporal community structure may miss many interesting features. For example, in real world it is found that people have different average number of contacts in different social cliques. The same person in one clique may be sociable, having many contacts with others, while in another clique he/she may be more taciturn. Such contact behaviors have also been seen in opportunistic social networks, where people are more frequently to contact their family or friends, while they meet accidentally with strangers [24] . If one tried to characterize such a network by statistics of the mean number of contacts a person has, one would be missing the features of the network, such as the dynamics of epidemic, resulting in a biased understanding of the epidemic spreading process. As shown in Fig. 1 , suppose an infectious disease occurs in a community C 1 , and unfortunately, Alice is infected. In order to delay the epidemic spreading, Bob should be protected first, because of his close contact to Alice and other people. In this situation, although Carl has a high social status in the whole network, the infectious disease would break out in the community and then spread to the entire network if he was immunized first.

To this end, we investigate the evolution of community structure in opportunistic social networks, and analyze the effect of community-based immunization strategy on epidemic spreading. We make the following contributions.

We find that the spreading speed of epidemic within one community is faster than that across different communities. This result is encouraging as it indicates that the outbreak of epidemic could be delayed, if one could further break down the OSN traces into lots of small communities by removing some special nodes.

We observe that the most efficient immunization strategy on epidemic spreading is to remove nodes with high local importance in communities. More interestingly, we find that high local importance but non-central nodes contribute more to epidemic spreading process, leading us to conjecture that it is not ''the more central, the better'' for the implementation of control strategy.

We study the role of nodes' local importance in epidemic spreading, experimentally and analytically. We do so not only by excluding nodes with high local importance from OSN traces but also by developing an analytic model which formally characterizes the relationship between nodes' local importance and the community cohesion. We find that the community cohesion is heavily dependent on local importance of nodes. Removing locally important nodes sharply lowers the community cohesion, and thus helps to suppress the epidemic.

We organize the remainder of this paper as follows. Section 2 reviews related works. Section 3 introduces the OSN traces and network model. The next two sections present our solutions to evaluate node's local importance and cluster nodes, respectively. We analyze the effect of community-based immunization on epidemic spreading in Section 6. Finally, in Section 7 we conclude our paper.

Immunization strategies in general can be classified into two categories: the ring strategy [13] and targeted strategy [14] [15] [16] [17] [18] . The targeted immunization strategy, in which nodes playing an important role in the network are removed first, has shown to be very effective [16] . Pastor-Satorras et al. [14] first studied the targeted immunization strategy and found that immunizing nodes with strong connectivity is more effective than that of randomly selected nodes. Holme et al. considered a more challenging scenario, where only neighborhood information of nodes can be used. They observed that the most efficient strategy is to iteratively immunize the neighbor of nodes with big out-degree [15] . Subsequently, Zhang et al. [17] developed a more precise model by taking the immunization willingness of individuals into account. In addition, Schneider [18] tested the role of node's betweenness centrality in epidemic spreading, and obtained the similar result.

Recently, the concept of node importance has been introduced to the fields of routing and message diffusion in opportunistic social networks. For example, the authors of Refs. [25, 12, 26] exploited the node importance to make routing decisions. They found that forwarding messages to nodes with high centrality could increase the message delivery ratio. Similarly, the authors of Refs. [21, 22] observed that nodes wandering from one community to another (such nodes thus have a high contact frequency with others in the network) contributed more to message diffusion.

All of the above works focus on the central nodes in a network, they evaluated node's importance either by logical centrality metric or by physical contact behavior among nodes. Our work supplements the previous results, exploring the role [4, 5] , multicasting [27] and location of rumor source [28] .

We use real and synthetic traces in this study, both of them have their own advantages and can be complementary to each other. The former helps to observe the real epidemic spreading behavior; the latter provides an opportunity to study the dynamics of epidemic spreading in a large scale. The details of each kind of trace are discussed below.

Real trace (NCSU). The NCSU trace [29] is taken by twenty students who live in a campus dormitory. Every week, these students carried Garmin GPS 60CSx handheld receivers, which are WAAS (Wide Area Augmentation System) capable with a position accuracy of better than three meters 95% of the time, for their daily regular activities, and altogether thirty-five trajectories were gathered from 2006-08-26 to 2006-11-16. The GPS receivers perform a device discovery every 10 s to record their current positions. To reduce GPS errors, the authors of Ref. [29] calibrate a position at every 30 s by averaging three samples over the past thirty-second period. According to the position information in each trajectory, we assume that two nodes have a contact if their distance is less than 150 m, a realistic range for WiFi transmissions.

Synthetical trace (SLAW). Considering the limitation on scalability of real trace, we need a proper mobility model that can characterize the nature of human mobility, to investigate the dynamics of epidemic spreading, both in width and depth. Although many random mobility models, such as Random Walk and Random Way Point, have been widely used in opportunistic social networks for evaluating routing performance or even the epidemic dynamics [30, 31] , they cannot reflect the main features of human mobility, including the truncated power-law flights and pause-times, the heterogeneously bounded mobility areas of different nodes, etc. The recent work [32] proposed a new mobility model called SLAW (Self-similar Least Action Walk) that can produce synthetical trace incorporating various features of human mobility. We therefore use it to produce synthetical human traces for investigation on epidemic spreading. The simulation area is 2000 m × 2000 m 2 , the number of nodes is varying from 100 to 500 with a step 100. Table 1 summarizes all SLAW model parameters (for detail meanings of these parameters, please refer to Ref. [32] ).

Traditional methods to model opportunistic social network mainly include time expanded graph and binary graph. The time expanded graph catches each snapshot of the original network, that is, new edges connecting a node and its copy are added at the next snapshot with the edges having been existed in the last snapshot. It hence incurs to the scalability issue [33] . On the other hand, the binary graph can alleviate the storage overhead. It however neglects the duration of each contact and its decayed age. This method only characterizes the time-varying network at a coarse-grained level [12] .

Considering these facts, we model an OSN as a Decayed Aggregation Graph DAG = (V , E), where V denotes the set of nodes (|V | = n) and E denotes the set of edges. Let W (t) = (w uv (t)) n×n denote its adjacency matrix and N uv (t) = {(on i , off i ), i = 1, 2, . . . , N} denote the contact series between nodes u and v in the interval [0, t], where the tuple (on i , off i ) denotes the start moment and end moment of the ith contact respectively, and N is the number of contacts. We formulate the contact strength between nodes (i.e., the value of w uv (t)) as a Decayed Sum Problem [34] . Decayed Sum Problem includes two components. The first one is weighted function f (i), and the second one is decayed function g(T − off i ), as shown in the following equation.

Definition 1 (Decayed Sum). Given the contact series N uv (t), the goal is to estimate the decayed sum at any current time T

where f (i) = off i − on i denotes the ith contact duration (i.e., the weighted part of Decayed Sum Problem) and g(T − off i ) denotes the decayed part. We set g(T − off i ) = e −(T −off i ) as the inter-contact time between nodes generally obeys an exponential decay in OSN [35] . Hence, Eq. (1) can be reformulated as

(2)

We next analyze the space complexity of DAG. Exact tracking of w uv (T ) needs Θ(N) storage bits. Considering scalability issue (in general, N ≫ n), we should further reduce the storage overhead while keeping the same calculation precision.

Let 

From Theorem 1, each node only carries a single counter to exactly track the contact strength between itself and any other node, which forms the row vector w u of matrix W . We use this matrix to cluster nodes in the next section.

This paper mainly concentrates on the local importance of nodes and its effect on epidemic spreading. As discussed in Sections 1 and 2, the local importance of nodes reflects their social status in a community, rather than in the whole network. 1 Traditional solutions for evaluating node importance mainly include the node degree, closeness and betweenness. All of them are not applicable, due to the unknown number of neighbors (for degree measure) and vulnerable end-to-end path (for closeness and betweenness measures) in opportunistic social networks. For example, if we use the betweenness to measure node importance, we first need to collect the shortest paths for each node pairs in an offline way, and then count the number of times that each node appears in these shortest paths. Furthermore, the three methods have a bias towards a global measure of nodes. To deal with these issues, we use the technology of Principal Component Analysis (PCA) [36] to evaluate node's local importance, which provides an online method for evaluating node importance.

Principal component analysis is a powerful tool to extract relevant information from a data set by filtering noise and redundant data. This relevant information reveals the hidden, simplified structures underlying the data set. We generalize the principle of PCA as follows.

Suppose that a node u has built the matrix W (please refer to Section 3.2), and the matrix W has been centralized (i.e.,

subtract the corresponding mean from each column). Let C W = W T W /(n − 1) denote the covariance matrix of W . Let us further diagonalize the C W as

where Λ = diag(1, 2, . . . , n) and P is a normalized orthogonal matrix. Let x i be the eigenvectors of C W and λ i the corresponding eigenvalues, and λ 1 ≥ λ 2 ≥ · · · λ n . We can see from Fig. 2 that the row vector α u (α u1 , α u2 , . . . , α un ) Fig. 2 . The spectral space of W and its vector representation.

The main notations used in the paper.

Notation Explanation w u

The row vector of matrix W C W

The covariance matrix of W P k+1

The noise components of W P k

The principal components of W P

The eigenvector decomposition of C W W k

The dimensionality reduction matrix of W

The distribution of node u in the n-dimensional spectral space

The noise distribution of α u denotes the distribution of node u in the n-dimensional spectral space, and the column vector x i (α 1i , α 2i , . . . , α ni ) denotes the coordinates of all of the nodes in the ith dimension of the spectral space. In addition, once we get the orthogonal matrix P, we generally select the top k-dimensional spectral space (x 1 , x 2 , . . . , x k ) as the principal component of W , since the corresponding top k eigenvalues dominate the spectral graph features [37] . Algorithm 1 describes the above computation process and Table 2 lists the main notations used in the paper. Mathematically, let Λ k denote the diag(1, 2, . . . , k) and the matrix P k = (x 1 , x 2 , . . . , x k ). Let α + u represent (|α u1 | , |α u2 | , . . . , |α ui | , . . . , |α uk |), where |α ui | denotes the absolute value of α ui , we have Lemma 1. For a given decayed aggregation graph DAG with k communities, the matrix P k is the projection matrix, elements of the vector α + u are the projected values of node u in such k communities.

Proof. Let W k denote the dimensionality reduction matrix of W and the matrix C W k be the covariance matrix of W k . Based on the theory of PCA, C W k should been diagonalized as well, we have

On the other hand, from Eq. (4), we get

Replace Λ k with Eq. (5) and C W with W T W /(n − 1), respectively, we have Multiply both sides by (n − 1) and use the substitution of P T

Hence, we conclude that the matrix P k is the projection matrix. Let c i denote the principal direction of the ith eigenvector x i . Based on the singular value decomposition (SVD) of W k [38] , both c i and x i are satisfying:

After some algebra, we obtain x i = W T k c i /λ 

Proof. From Lemma 1, we know that the projection length of node u in community i is |α ui |, and from the spectral graph theory [37] , it has been shown that the eigenvalue λ i indicates the strength of community i in a graph. Hence, we get

Obviously, the L i u is mathematically equivalent to |α ui | if we ignore the factor λ i (considering the fact that local importance of each node in community i has the common factor λ i ). We specifically use the above equation to denote node's local importance, this is mainly because the product of the two parts (|α ui | and λ i ) has a special physical significance, it reflects the contribution of community i to node u's global importance G u , that is,

With the above two equations, 2 we evaluate the local and global importance of each node, and plot their empirical distributions in Fig. 3 , where the subgraphs correspond to the case in which the local importance of nodes is measured with respect to the largest community ( Fig. 3(a) ), the second ( Fig. 3(b) ), the third (Fig. 3(c) ) and the fourth largest community (Fig. 3(d) ), respectively. We observe that, for most nodes, the global importance shows a strong correlation with the local importance. Furthermore, the correlation decreases with decreasing community size, as small communities play a weak role in a graph. Another interesting observation is that the two social metrics of nodes have different increasing rates, which result in some central nodes having a relatively low importance in a community, and vice versa. In Section 6, we study their effects on epidemic spreading.

Cutting a graph into small clusters has been studied widely. We use the k-means, one of the most well-known clustering algorithms [39] , to detect the temporal community structure. The advantage of the k-means algorithm compared to other methods such as CNM [40] and k-clique [41] is that it does not need to know the neighbor relationship between nodes, it only requires the adjacent matrix of a weighted graph such as the DAG, while the CNM and k-clique are more appropriate to a binary graph. In addition, based on the technology of PCA discussed above, it is confident to determine the number of communities, the initial elements for each community and the termination condition, three issues strongly affecting the performance of k-means. We next discuss how to detect the community structure based on the refined k-means.

(1) Determining k, the number of communities: PCA provides a roadmap to reduce a confusing data set to a lower dimension that retains the main features of the original data set. The rationale behind this is that the eigenvalues of a network, play a big role in many important graph features. It has been shown that the maximum degree, clique number, and even the randomness of a graph are all related to λ 1 . In general, we select the top k eigenvectors to denote the main structures of the graph, where the value of k satisfies

In this paper, we set R = 0.85, an experiential value 3 belonging to the default interval [0.7, 0.9] [37] .

(2) Excluding the noise nodes: In opportunistic social networks, there commonly exist some nodes with few contacts to other nodes. We call them noise nodes in this study. Excluding the noise nodes has little effect on epidemic spreading speed. Furthermore, it helps to reduce the problem size and algorithm complexity. We now discuss how to use PCA to identify the noise nodes.

PCA divides a network into two different parts: (1) the principal components P k , and (2) the opposite P k+1 , where the P k+1 = (x k+1 , x k+2 , . . . , x n ), as shown in Fig. 2 . We call the latter noise components of the network. Accordingly, we divide the row vector α u by α 1,k u (α u1 , α u2 , . . . , α uk ) and α k+1,n u (α u,k+1 , α u,k+2 , . . . , α un ), the signal and noise of the node u. From the theory of PCA, if node u is dominated by its noise components, that is, α u is dominated by the α k+1,n u , we can exclude this node from the graph. We use signal-to-noise ratio to identify which component dominates a node.

. The SNR is the ratio of signal energy over that of noise.

From Theorem 2, we know that the node u's local importance relative to community i is |α ui | λ i , which is also the amplitude of node u's signal in the ith dimensional spectral space. Hence, the signal energy e u signal of node u can be presented as e u signal =  i∈ [1,k] (λ i |α ui |) 2 =  i∈ [1,k] (λ i α ui ) 2 , and the noise strength e u noise is equal to  j∈[k+1,n] (λ j α uj ) 2 . Based on Definition 2, we call node u a noise node if its SNR u satisfies SNR u < 1.

(3) Determining the initial elements for each community: After we have ascertained the number of communities and excluded the noise nodes, the next step is to determine the initial centroid m i (i = 1, 2, . . . , k) for each community C i . We select the node u, s.t. max |α ui | (u = 1, 2, . . . , n) for each eigenvector x i , as the initial node of community i, and set m i = α u . Algorithm 2 describes this procedure. v ← 1 {Tracking who is the maximum} 6: for u = 2 to n do 7: if |α ui | > maxValue ∧ u is not a noise node then (11) where n i is the number of nodes belonging to C i . k-means is characterized by minimizing the sum of squared errors, It has been shown that the standard iterative method to k-means suffers seriously from the local minima problem, because of the greedy nature of the update strategy. Fortunately, Theorem 3 guarantees the PCA-based k-means is immune to this problem.

Ref. [42] ). Minimizing J is equivalent to maximizing trace(P T C W P) (please refer to Eq. (19) of Ref. [42] ), and max trace(P T C W P) = λ 1 + λ 2 + · · · + λ k .

In other words, the PCA-based k-means has reached the optimal performance once we cluster all of the non-noise nodes for the first time.

(5) Clustering nodes: For any node u, we compute the distance between itself and the centroid m i , dist(α u , m i ), and select i, s.t. min dist(α u , m i ) (i = 1, 2, . . . , k) as the community node u belongs to, where, dist(α u , m i ) = θ (u, i) = arccos α u m i T ∥α u ∥ 2 ∥m i ∥ 2 and θ(u, i) denotes the angle between α u and m i . After node u joins the community i, we update the centroid m i by Eq. (11) so as to select the next node. Algorithm 3 describes the clustering procedure. 

Updating m i 8: end for 6. Results and analysis

We define the temporal community structure as a series of snapshots of communities underlying the traces. We take a snapshot every 120 s for the communities. Compared to the five hours duration of experiment, the snapshot interval is chosen to be relatively small so as to obtain a detailed view of the community evolution and to make an unbiased understanding of epidemic spreading as much as possible. Fig. 4 plots the number of communities hidden behind the traces at different snapshots, where the term ''S(number)'' denotes the SLAW trace with different number of nodes. We observe that the topology is volatile over time. At NCSU, the number of communities varies from 4 to 10 with a mean of 7.3. The number of communities at S(100) varies between 8 and 16 with a mean of 11.7. Table 4 summarizes the statistics of community structure for all the traces. Fig. 5 shows size of the top 4 communities at the SLAW trace with 500 nodes. We find that each community is not stable over time as well. The size of the largest community varies between 13% and 35% during the experiment, and 11%-20% of the nodes belong to the second largest community. The third and fourth largest community are much closer, varying from 9% to 17% and from 8% to 15%, respectively. In summary, the top 4 communities cover almost 60% nodes. Other smaller communities share the rest nodes.

These results suggest that nodes in opportunistic social networks do not belong to a single, stable community. Instead, the network is made of many temporal clusters. We analyze their effects on epidemic spreading in the next section. 

Having shown that the temporal communities are built on node's social contacts, it is significant to understand the role of these communities in epidemic spreading. To this end, we record the epidemic spreading time within one community and that across different communities. Table 3 summarizes the results. We find that the spreading speed of epidemic within one community is faster than that from one community to another. At NCSU, the mean spreading time within one community is 19 min, and 45 min between communities. The SLAW traces show similar phenomena, especially at S(300), where the inter-community spreading time is almost four times of that of intra-community. This is mainly because there exist many small communities at this scenario (as shown in Table 4 , the S(300) trace has the most number of communities and the community structure changes dramatically). The small communities are more effective to refrain the epidemic spreading than the big ones. This result is encouraging as it indicates that the outbreak of epidemic could be delayed, if one could break down the OSN traces into lots of small communities by removing some special nodes.

Previous work has suggested that removing central nodes is an effective way to delay the epidemic outbreak. This conclusion is somewhat inconsistent with our aforementioned results, since some central nodes have relatively low importance in communities and removing them does little damage to the community structure. Therefore, we conjecture that it may be inefficient to suppress the epidemic spreading by removing central nodes. To validate this, we classify nodes into different categories according to their global and local importance. Specifically, we select as High lOcal imporTance nodes (HOT) with the top 10% nodes that have the highest local importance with respect to a community, and the rest as Low lOcal imporTance nodes (LOT). In addition, we define a central node as a node that belongs to the top 10% nodes with the highest importance in the whole network, the remaining nodes are called non-central nodes (similar selection/definition has been used in Ref. [43] ). Our basic idea here is to understand the role of each kind of node in epidemic spreading.

We first calculate the epidemic spreading time for each trace including all nodes. We then repeat the same experiment by removing each of the four node categories (we remove the same amount of nodes for each category in order to make a fair comparison). Fig. 6 presents the results. The y-axis denotes the immunization efficiency compared to the base case (the case including all nodes), i.e., higher bar means higher efficiency. The first and counter-intuitive phenomenon is that HOT nodes, instead of the central, play a big role in epidemic spreading (as shown in Fig. 6(a) ). Compared to removing the central nodes, the immunization efficiency increases 50% on average when the HOT nodes are removed. More interestingly, we observe that non-central HOT nodes are responsible for most of the epidemic spreading in opportunistic social networks ( Fig. 6(b) and (c) ). Removing them improves the immunization efficiency by 25%-150% at all traces. In contrast, removing central LOT nodes shows a more limited improvement. This phenomenon experimentally validates that it is not ''the more central, the better'' for the implementation of control strategy. We next explore the reason behind this phenomenon. 

Section 6.2 indicates that the spreading speed of epidemic heavily depends on the temporal communities. As a result, even though HOT nodes have on average lower global importance than the central, they do great damage to the community structure when removed, and thus help to suppress the epidemic. The goal of this section is to formally characterize the relationship between the cohesive community and node's importance. We generally use the community density to denote its cohesion. Let D(C i ) represent density of community i, we have the following theorem (see Appendix for proof).

We use this equation to evaluate the impact of removing nodes on community density, and plot the changing of community density in Fig. 7 , where the Y -axis denotes the ratio of community density after some nodes are removed from a community over the initial density of that community. We find that community density is heavily dependent on node's importance. However, there exists a large difference of impact between the local and global importance of nodes. Removing in rank order the most important to the least nodes in a community leads to a faster decline in community cohesion. In contrast, removing first the central nodes will shrink the community but with a slower speed. Taken together, HOT nodes appear to be crucial for implementing the immunization strategy, because of their large impact on community structure when removed.

In this paper, we improve our understanding of immunization strategy on epidemic spreading in opportunistic social networks. We observe that a temporal community structure helps to control the epidemic spreading. This phenomenon is encouraging as it indicates that the outbreak of epidemic could be delayed, if we could further break down the OSN traces into lots of small communities by removing some special nodes. Motivated by this observation, we separate nodes into different behavioral classes from a community viewpoint. We show that HOT nodes can remarkably suppress the epidemic spreading when removed. More interestingly, we find that non-central HOT nodes are responsible for most of the epidemic spreading. These results reveal a counter-intuitive conclusion: it is not ''the more central, the better'' for the implementation of control strategy. For any t ∈ T 2 , since t ̸ = off i (note that off i ∈ T 1 and T 1 ∩ T 2 ≡ ∅), we have h(t) = 0. Hence, = h(T ) + e −1 w uv (T − 1).

We first give the following lemma.

Proof. From the clustering process mentioned above, we know that the centroid m i (m 1i , m 2i , . . . , m ni ) can approximately represent the line formed by nodes within the ith community (please refer to Eq. (11) p. 7). On the other hand, the virtual centroid vector m i should be close to eigenvector x i . This is mainly because m i ≈m i =   u⊂C i α ui  /n i , as α ui is the dominant part of α u . Hence,m i locates in the line formed by the eigenvector x i . We get the conclusion as different eigenvectors are linearly independent.

We now prove Eq. (9). Let variable E u denote the event measuring global importance of node u (i.e., G u ) in the whole network. Let variable E i u denote the event that measures node u's local importance in a community i. We have P(E u ) = P(measuring G u in the whole network) = P(measuring G u across all communities)

P(E i u )/ * from the Lemma 2 * /.

That is, measuring node u's global importance is equal to first measuring its local importance in different communities, and then put all the components together.

Proof of Theorem 4. Consider the division of an unweighted graph U into k non-overlapped communities C 1 , C 2 , . . . , C k . Let v i = (v 1i , v 2i , . . . , v ni ) be the index vector of community C i , and v ui is equal to 1 if node u belongs to C i and 0 otherwise.

For community C i , its density can be expressed as D(C i ) = number of edges in C i number of nodes in C i = v T

DTN: an architectural retrospective

Opportunities in opportunistic computing

Pocket switched networks and human mobility in conference environments

Socially-aware routin for publish-subscribe in delay-tolerant mobile ad hoc networks

Supporting cooperative caching in disruption tolerant networks

Data delivery properties of human contact networks

Impact of human mobility on opportunistic forwarding algorithms

Exploiting social interactions in mobile systems

Small-world behavior in time-varying graphs

A reaction-diffusion model for epidemic routing in sparsely connected manets

Networks of strong ties

Bubble rap: Social-based forwarding in delay-tolerant networks

Ring vaccination

Immunization of complex networks

Efficient local strategies for vaccination and network attack

Finding a better immunization strategy

Hub nodes inhibit the outbreak of epidemic under voluntary vaccination

Suppressing epidemics with a limited amount of immunization units

Centrality in social networks conceptual clarification

Tearing down the Internet

Overlapping communities in dynamic networks: their detection and mobile applications

Proceedings of the Thirteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing

Finding community structure in networks using the eigenvectors of matrices

Impact of strangers on opportunistic routing performance

Social network analysis for routing in disconnected delay-tolerant manets

2010 Proceedings IEEE INFOCOM

Multicasting in delay tolerant networks: a social network perspective

Finding rumor sources on random graphs

On the Levy-walk nature of human mobility

Efficient routing in intermittently connected mobile networks: the multiple-copy case

Performance modeling of epidemic routing

SLAW: Self-similar least-action human walk

Time-aggregated graphs for modeling spatio-temporal networks

Maintaining time-decaying stream aggregates

Power law and exponential decay of intercontact times between mobile devices

Principal Component Analysis

Spectral Graph Theory

Community detection in graphs

Finding community structure in very large networks

Uncovering the overlapping community structure of complex networks in nature and society

He, K -means clustering via principal component analysis

Inferring social ties across heterogenous networks

Matrix Perturbation Theory

We acknowledge the support of the National Natural Science Foundation of China under Grant Nos. U1404602, U1304607, the Science and Technology Foundation of Henan Educational Committee under Grant Nos. 14A520031, 14A520068. We also wish to thank the CRAWDAD archive project for making the DTNs traces available to research community.

Proof of Theorem 1. Let us split the interval [0, T ] into two disjoined parts T 1 and T 2 , where