key: cord-0229480-zs8ttra7
authors: Kumar, Vineet; Krackhardt, David; Feld, Scott
title: Interventions with Inversity in Unknown Networks Can Help Regulate Contagion
date: 2021-05-18
journal: nan
DOI: nan
sha: 8e9b8596b16fac1f8c01387cb3184c6c9224fad6
doc_id: 229480
cord_uid: zs8ttra7

Network intervention problems often benefit from selecting a highly-connected node to perform interventions using these nodes, e.g. immunization. However, in many network contexts, the structure of network connections is unknown, leading to a challenge. We develop and examine the mathematical properties of two distinct informationally light strategies, a novel global strategy and local strategy, that yield higher degree nodes in virtually any network structure. We further identify a novel network property called Inversity, whose sign determines which of the two strategies, local or global, will be most effective for a network. We demonstrate that local and global strategies obtain a several-fold improvement in node degree relative to a random selection benchmark for generated and real networks (including contact, affiliation and online networks). In some networks, they achieve a 100-fold improvement. We show how these new strategies can be used to control contagion of an epidemic spreading across a set of village networks, finding that the strategies developed here require far fewer ($<50%$) nodes to be immunized, relative to the random strategy baseline. Prior research has typically used the complete network structure to choose nodes for optimal seeding. The relevant network is often costly to collect, and is privacy-invasive, requiring knowing each person's network neighbors, and might not be possible to obtain for time-sensitive interventions. Our interventions are less invasive of individual privacy, since each selected node only needs to nominate some network neighbors for intervention, while mathematically guaranteed to provide better connected nodes.

Network-based interventions are of crucial importance in any setting where an individual's choice or action has an indirect impact on others. There are a wide range of network intervention applications. Consider: (a) A new infectious disease is spreading through a large population. We want to minimize the number of infected individuals by inoculating using a new vaccine; however, we only have a limited number of doses to administer. (b) We have a limited number of free samples of new product to distribute to consumers, so they can share information through word of mouth, and we would like to maximize the number of consumers who receive word of mouth. (c) We would like to identify virally spreading contagion (informational or biological) as quickly as possible by choosing individuals as observation stations (or for contact tracing).

Although seemingly distinct, these problems (a)-(c) represent a class of network interventions [1] in which we benefit from identifying more central or highly connected individuals in the network. 1 However, the challenge is that we do not have access to the relevant network structure. In application (a), having the Facebook network structure might not be

The Friendship Paradox, which our interventions are based on, is colloquially stated as "your friends have more friends than you" [4, 5] . 2 The intuition for why the friendship paradox helps obtain well-connected nodes is this: there are few well-connected hubs in real networks, and since they are connected to many other nodes (by definition), obtaining a friend (or neighbor) of a random node is likely to result in a hub with greater likelihood, compared to the case of randomly selecting nodes. We establish that the friendship paradox is actually not just one statement, but a set of distinct claims (All theorems and proofs are in Supplement §S.A). First, we find an impossibility, i.e. the individual-level friendship paradox cannot hold for all individuals in a network (Theorem S1). In practice for real networks, it can hold for a large proportion of nodes in the network (Figs. S1 and S2 in Supplement §S.C). Second, we demonstrate that in contrast to this individual view, the average number of friends of friends across the network can be characterized in two ways using the local and global mean defined below. Third, we find that both local and global means are greater than the mean degree of the network, and these means are related through a novel network characteristic we call inversity.

Local and Global Means We formally characterize the two distinct but related network properties deriving from the friendship paradox relating to the "average number of friends of friends." Denote a network (see Table S1 for full notation) as an undirected graph G = (V, E) with V the set of nodes and E the set of edges (e ij ∈ {0, 1} denoting absence or presence of a connection between i and j), D i refers to the degree of node i, and N (i) the set of i's neighbors. We specify the local mean as:

The global mean is defined as the ratio of the total number of friends of friends to the total number of friends in the network, consistent with [4] :

The above means arise from differently weighting the average degree across friends. Both means above are consistent with the notion of "average number of friends of friends," although they are distinct network properties (see Fig. 1 for an example and detailed explanation). The global mean was theoretically investigated earlier and found to be greater than the average degree and is independent of the local structure of connections, given node degrees (Theorem S2). In contrast, local mean has distinct properties that depend on local network structure (i.e. who is connected to whom). However, it does share the property with the global mean that it is greater than the mean degree (Theorem S3).

The global mean is greater when there is higher variation across nodes in terms of degree (variance), whereas local mean is greater when we have higher variation across edges, i.e. when edges connect nodes of very dissimilar degree (D) Global Mean. Global mean is the ratio of the total number of friends of friends to the total number of friends. The total number of friends of friends contributed by a is 5. Similarly, b contributes 5, c contributes 5 and d contributes 3 friends of friends. Thus, the total number of friends of friends (i.e. the nodes in red color) are (5 + 5 + 5 + 3) = 18.

number of friends is represented by the nodes in gray, (2 + 2 + 3 + 1) = 8.

(e.g. with a hub and spoke network). More specifically, the global mean is invariant to rewiring the network while keeping the degree distribution the same, whereas the local mean is impacted by the rewiring (Theorem S6). We term these means local or global since the former depends on the local structure (who is connected to whom), whereas the latter only depends on the global network properties (degree distribution).

We identify network structures that result in a greater divergence between these means and the average degree ( Figure  S3 in §S.D). We also examine a number of questions about the relationship between the local and global mean, including whether one of the means is always greater than the other, whether they exhibit correlated variation away from the mean degree ( §S.D).

The above formulation of local and global mean suggests distinct intervention strategies. We illustrate random, local and global strategies to choose a "seed" node in the network beginning with an initial randomly chosen node ( Table  1 ). Observe that with the local strategy, the number of seed nodes is fixed, whereas it is probabilistic under the global strategy. For the local strategy, by construction, the expected degree of the obtained seed is equal to the local mean. For the global strategy, we prove that the expected degree of chosen nodes is equal to the global mean (Theorem S5).

Though the local and global strategies appear to be similar in the sense that we are choosing friends of randomly chosen individuals, the crucial distinction lies in whether we are choosing one random friend or whether we are choosing among each friend. Table 1 details the algorithms to obtain k seeds in a network of size N k. This impacts their relative effectiveness as examined in §5 and §6. Step Details 0 Fix p ∈ (0, 1] (only used for Global strategy in Step 2G).

Repeat Steps 1-2 below until at least k seeds are present in the seed set S.

Draw a random node r uniformly from set of nodes, V . In Example Network, Nodes 10, 18 and 12 (in black) are drawn for (R), (L) and (G) strategies respectively. 2

Depending on the strategy Random (R), Local (L) or Global (G), do the following: 2R (Random): Add r to the seed set S.

In Example Network, add node 10 to the seed set. 2L (Local): Obtain a node s chosen with uniform probability from r's friends, i.e. s ∈ Nr. Add the friend s to the seed set S. In Example Network, one of node 18's friends, node 14 (in red), is chosen at random. Add node 14 to the seed set. 2G (Global): For each of r's friends, s ∈ Nr: With probability p (0 < p ≤ 1), add s to the seed set S. In Example Network, each of node 12's friends, nodes 1, 2, 8 and 9 (in green), are added probabilistically (with probability p) to the seed set. Implementation: For each s ∈ Nr, draw from an independent uniformly distributed random variable zs ∼ U [0, 1]. If zs < p, add s to the seed set S.

Note: With Random and Local strategies, we will obtain exactly k nodes in the seed set S. With the global strategy we might obtain more than k nodes in the seed set. In such a case, we select k nodes at random from the seed set S without replacement. Since both local and global strategies can be used for interventions, we next characterize their relative effectiveness. We identify and define a novel network property, Inversity, that determines when the local mean is greater than the global mean. This property captures all local network information related to the local mean and is scale-invariant, i.e. independent of the size or density of the network. We find that the sign of inversity helps us determine which of the local mean or the global mean is higher for any given network. We show how inversity is related to but distinct from degree assortativity (in Supplement §S5).

Inversity is a correlation-based metric that relates the global and local means for any network is obtained as follows. 

We can then connect (see Theorem S4) the local and global means with inversity and the degree distribution κ m = i∈V D m i as:

where Ψ is a positive function of the degree distribution.

Therefore, if inversity is known, we don't need the entire degree distribution to obtain the local mean. Rather, four moments of the degree distribution are sufficient for that purpose. Inversity captures the local information on imbalances in degree of nodes across edges, whereas the moments of the degree distribution represent global information about the network. Inversity ρ has a critical role in determining whether the local or global mean is larger for a network; specifically, ρ < 0 indicates the global mean is higher than the local mean, whereas ρ > 0 indicates the reverse, implying that knowing inversity can help us determine which strategy to use. Even computing inversity is information-light, requiring only the 2k distribution, which represents the degrees of nodes at the termini of each edge, rather than the entire network [9] .

To identify how much of an improvement over the random strategy is possible, and how this varies across a variety of generated and real networks, we examine the relative effectiveness of strategies, with the random strategy as the baseline and characterize leverage as the improvement another strategy can obtain in terms of expected degree. Leverage for strategy s on network G is defined as λ s (G) = µs(G) µ D (G) for s ∈ {L, G} (since the random strategy obtains the mean degree in expectation, the leverage for R is 1 and it serves as a baseline). A star (or hub-spoke) network obtains the highest possible leverage (see Theorem S7).

Generated Networks: Networks generated from a number of commonly used generative mechanisms are used to assess a number of structural features with regard to the friendship paradox. We examine 3 generative mechanisms for networks: (a) Erdos-Renyi (ER) [10] , (b) Scale Free (SF) [11] and (c) Small World (SW) (Fig. 2) [12] .

We find that for ER networks, at very low density (edge probability), the leverage is very low because most edges connect nodes that have a degree of 1. As density increases, we obtain more variation in degrees, and local leverage increases. However, beyond an edge probability of p = 0.05, leverage decreases as the density of the network decreases. Local leverage thus forms a non-monotonic pattern with ER networks. For SF networks, rather than density or edge probability, we initially examine leverage as the network becomes more centralized (as γ increases above 1, very high degree nodes have a lower probability of occurring). We find that as γ increases from 1 to 2, the leverage increases, but then decreases beyond 2. For SW networks, unlike in the ER and SF networks, leverage is monotonically decreasing with number of neighbors (or density), and is monotonically increasing with rewiring probability. In addition, in Figure  S4 ( §S.E), we show how leverage varies with network size, and find that larger networks typically obtain higher leverage for scale-free (SF).

Real Networks: We examine the range of real networks detailed in §S.B. First, observing the local strategy (Fig.  4A ), we find that for all networks, as expected, the friendship paradox strategies are at least as good as the random strategy. Second, for networks like Twitter (OS4) or Internet Topology (C1), the leverage can be as high as 100. Thus, obtaining a friend of a random node will provide a 100-fold increase in the expected degree of a chosen node. Third, we observe that both local and global leverage (Figs. 4A and 4B) are higher for nodes when average degree is intermediate, i.e. not too low or high. Some networks like the CA Roads network (I3) have very little degree variation and local and global strategies are relatively less effective. Finally, we examine when local and global strategies make a relative To examine whether different generative models result in more or less leverage for the friendship paradox strategies, we examine networks from 3 generative processes. A sample of 1,000 networks was used for each of the models. (A) Erdos-Renyi (ER) networks generated with edge probabilities, p ∈ [0.05, 0.95], and size ranging from N=50 to N=1000 nodes. We find that local leverage is highest for the lowest edge probabilities, and leverage converges to 1 as the networks become more dense. Overall, ER networks do not achieve high leverage with local and global strategies. (B) Static Scale Free (SF) or Barabasi-Albert networks with scale-free parameter γ ∈ [1, 6] . For these networks, observe that the leverage spans a wider range, e.g. for γ = 2, the samples range from leverage of 1 to over 40. The mean leverage is non-monotonic in terms of γ, increasing when γ < 2 and decreasing for γ > 2. The distribution of leverage across the samples also displays decreasing variance when γ > 2. At very high levels of γ ≈ 6, the local mean converges to the mean degree. Overall, we find that SF networks do achieve high leverage with local and global strategies, and intermediate levels of the gamma parameter obtain highest leverage. With small world (SW) or Watts-Strogatz networks, we have two parameters. First is the number of neighbors each node is connected to initially, n. The edges are then rewired with a specified probability, pr. First, in panel (C), we find that with a small number of neighbors, the leverage distribution is quite spread out, and there is a substantial leverage effect. However, as we begin to create very dense networks, both the mean and the variance of the leverage distribution leverage diminish substantially. Second, we examine the impact of rewiring probability on the leverage distribution in panel (D). We find that with lower rewiring probabilities, say pr = 0.05, the leverage distribution is closer to 1, whereas with a higher rewiring probabilities, the distributions feature increased variance as well as higher mean leverage. Overall, SW networks result in moderate levels of leverage for local and global strategies. difference (Fig. 4B ). We find that the highest ratio of local to global mean is for Twitter network (OS4), whereas the lowest ratio (indicating that global strategy has a higher expected mean degree) is shown by Flickr (OS2), both of which belong to the same category of online social networks. Citation networks tend to have higher global mean, whereas for Infrastructure networks, both strategies seem to work just as well.

We demonstrate an application comparing different strategies to control simple contagion spreading through a network. There are a number of models of contagion, and they can be parametrized several ways. However, remarkably most models of contagion can be characterized by a single parameter termed the epidemic threshold. If the ratio of infection to that of recovery is lower than the epidemic threshold, then the epidemic is contained and will die out, whereas if the ratio is above the threshold, then it could turn into an epidemic. The epidemic threshold is shown to be a function of both the network and the virus propagation model (VPM). The epidemic threshold of a network is characterized as the inverse of the greatest (first) eigenvalue of the adjacency matrix A of the network, denoted as below (details in §S.G):

For virtually any VPM, networks with higher epidemic thresholds are less likely to have an epidemic outbreak. The threshold in an undirected network is shown to be proportional to the inverse of the largest eigenvalue for a wide range of VPMs, including SIR, SEIR, etc. models that have been commonly used for modeling infectious diseases [13] . Nodes are selected for immunization or treatment using each of the intervention strategies (random, local and global). We then examine how the epidemic threshold changes as a function of the proportion of nodes vaccinated (removed), for each strategy.

We examine data in the India villages networks from [14] , who collected detailed full census data on the social networks of 75 villages in southern India. The social networks are captured at two different levels of aggregation, at the level of individuals and of households. Details of the network dataset are provided in §S.B.

We find that networks can have either positive or negative inversity depending on how nodes and edges are defined. When nodes as defined as individuals, we find that the networks have negative inversity, whereas if the nodes are defined as households, the inversity values of the resulting networks are mostly positive (Fig. 3) . Thus, a household-based intervention might use the local strategy, and the individual-based intervention might use the global strategy. Figure 3 illustrates the inversity values across the 75 villages separately for individual and household networks. Overall, we find that networks obtained from similar underlying relationships can result in dramatically different inversity characteristics, implying different interventions (local or global) would be better suited.

Epidemic Threshold and Immunization Strategies: Our first goal is to identify how the epidemic threshold τ changes as we immunize nodes from the network G. While immunizing (or removing) any node from the network is likely to increase the epidemic threshold, immunizing well-connected nodes is likely to prove especially beneficial. We examine the effectiveness of the three strategies (Random, Local and Global) in identifying which nodes to immunize from the network.

We evaluate the impact of the immunization strategies on the epidemic threshold of a number of real network datasets. First, we examine the data from N = 75 village social networks in India (see [14] ). This dataset is especially useful in our analysis since the villages are relatively isolated, implying they can be evaluated separately.

In Figure 5 , we evaluate the eigen threshold (τ ) for the networks in the Indian villages data set. First, for the household networks, we find that the local and global strategies obtain a significantly higher epidemic threshold. The difference in threshold between the random strategies and the friendship paradox strategies increases with the proportion of nodes immunized. For individual networks, a similar pattern obtains, but here we find that the global strategy obtains the highest epidemic threshold across all immunization levels, and the difference in thresholds between local and global strategies also increases with the proportion of immunized nodes. This broadly signifies that it is helpful to know which among the global or local strategies to use, and the sign of inversity helps us in making this decision. 

In te rn et : Epidemic Thresholds with Immunization in India Village Networks. Higher thresholds imply an outbreak is more likely to die out. The dark lines represent the mean values, and the shaded regions are the 95% confidence intervals. We examine 3 strategies (Random, Local and Global) to choose nodes to immunize. The proportion of nodes immunized ranges from 1% -75%. In both household and individual networks, we find that the friendship paradox strategies obtain higher thresholds than random, for the same proportion of nodes immunized. For instance, in the household networks, to achieve a threshold τ = 0.15, the random strategy needs to have about 50% of nodes immunized, but the local and global strategies require less than half of that, at around 25%. For the household networks (left panel), we find that the local strategy is better than the global strategies especially at higher levels of removal. However, for individual networks, we find that the global strategy obtains greater thresholds than local.

Epidemic Outcomes and Immunization Strategies: We deploy the epidemic propagation models on 75 village networks from India using an SIR virus propagation model (details in §S.G). Define I it ∈ {0, 1} as an indicator of whether an individual i is infected at time t. We evaluate epidemics on the following aspects:

• Proportion Infected at Peak = 1 N max t ( i I it ): Since epidemics increase in intensity and eventually die down, an important characteristic is to measure the proportion of the population who are infected at the peak of the epidemic. This directly impacts important decisions like hospital capacity planning etc. • Proportion Ever Infected = 1 N i max t (I it ): The proportion of the population that was ever infected by the disease is important since it represents the total spread of the disease in the population. It could also represent the number of people who might have immunity to future recurrences of the disease. • Total Suffering: 1

Here, the total suffering metric captures not just how many infections occur, but also the length of the infections. This represents the proportion of individual-period combinations with an infection.

In Figure 6 , we evaluate epidemic outcomes using the networks of Indian villages. First, for both household and individual level networks, we find that strategies based on the friendship paradox, i.e. the global and local strategies perform better than the random strategy. Second, for household networks, the local strategy performs relatively better than the global strategy for each of the epidemic characteristics detailed above. In contrast, for individual-level village networks, we find that the global strategy diminishes the severity of epidemic spread as measured by each of the above characteristics to a greater extent. Thus, while it may be beneficial to use either strategy, understanding the role of inversity (as in Figure 3 ) helps determine which of the friendship paradox strategies, i.e. local or global ould result in better epidemic outcomes. We also examine the outcomes for a network of Facebook users, and find that the local strategy achieves better outcomes on all of the above metrics (see Fig. S6 in §S.H). Table table: vpmsimulation for parameters of simulation. The top 3 panels represent outcomes for household networks, and the bottom 3 panels for individual-level networks. All outcomes are density plots. We plot 3 outcomes: (a) the proportion of population infected at the peak, (b) proportion of population that was ever infected, and (c) total suffering. The x-axis represent proportions and the y-axis represent density. We plot the outcomes for 3 strategies: (R)andom, (L)ocal and (G)lobal. The dashed vertical lines represent the means for the 3 strategies. A strategy with a density plot to the left of another is "better" in terms of reducing the severity of the epidemic. Thus, for household networks, the local strategy (in red) is better than the global, which in turn is better than the random strategy. This ordering is the same for all 3 outcomes. For the individual networks, however, the global strategy is "bettter," for all 3 outcomes.

We show that with unknown networks, the friendship paradox can be leveraged to obtain such individuals with minimal informational requirements. We identify intervention strategies (local and global) that have theoretical guarantees on obtaining better-connected individuals. With both generated random networks and real networks, our results show the value of using the local and global strategies to obtain highly connected nodes. In the vast majority of networks, we obtain at least double the average degree, and some networks show increases of several hundred-folds in node degree. We expect the advantages of speed of implementation, generality of application areas for these privacy-sensitive and informationally-light strategies to provide an important tool for network interventions in unknown structures.

Formally, the network graph G = (V, E) is comprised of a set of N individual nodes and a set of undirected edges E. Each element of E is a pair of nodes, (i, j) indicates an edge (connection) with e ij ∈ {0, 1}. We also define the directed edge setÊ including both (i, j) and (j, i) as distinct elements ofÊ corresponding to an undirected edge i ↔ j.

We detail the table of notation in Table S1 . 

The basic idea of the friendship paradox can be expressed as "your friends have more friends than you." We examine the degree to which the friendship paradox holds for individual nodes, or the individual friendship paradox. We find in the result below that it cannot hold for all nodes, but can hold for an arbitrarily high proportion (< 1) of nodes.

Theorem S1. The friendship paradox statement that "your friends have more friends than you" cannot hold for all nodes in a network. Also, the statement can hold for all nodes, except one.

Proof. Consider a connected network where not all degrees are identical (if all are identical, the statement cannot hold). There must be at least one node that has the highest degree D max and which is connected to at least one node with a lower degree. If not, then the connected network is comprised entirely of highest (identical) degree nodes, thus contradicting the initial statement. If the highest degree node is connected to a lower degree node, then the average friends of friends of the highest degree node must be lower than D max . Thus the statement cannot hold for all nodes.

To show the second part that it can hold for all nodes except one, consider the star (hub and spoke) network, where all of the nodes except the central node have fewer friends than their friends do.

Theorem S2. [Feld 1991 ] For a network G = (V, E) with degree mean µ D and variance σ 2 D , the global mean of friends of friends is

Proof. (as given in Feld, 1991) .

For any general network G = (V, E) with mean degree µ D , the local mean of friends is given by

where D i is the degree of node i, and e ij ∈ {0, 1} indicates a connection between i and j. 

Rewriting the expression for µ L in terms of the connections (edges) between individuals, we obtain:

Note that what we characterize as the local mean defined as above was examined by others including [4] etc. and was independently shown to be greater than the mean degree by us ( [15] ) and others (including by Christian Borgs & Jennifer Chayes in an online comment to an article by [16] , and by [17] ). However, the properties of the local mean htave not been formally examined and characterized.

Theorem S4. Define the m-th moment of the degree distribution by κ m = 1 N i∈V D m i . The local and global means are connected by the following relationship involving the inversity ρ and the -1,1,2, and 3rd moments of the degree distribution as follows:

Proof. Define the moments of the degree distribution as: κ m = 1 Next, we detail the mean and variance of the distributions. First, we consider the means. The mean of the origin

Next, consider the variances. The variance of the origin distribution (O) is computed as:

Next, we express the corresponding variance of the inverse destination degree distribution (ID), σ 2 ID . Again, recall that 1 Di does not appear just once, but D i times. Therefore, we have:

We next turn to the inversity and based on the definition we connect it to the local and global means and the degree distribution.

Finally, substituting µ D = κ 1 and the expressions for the variances, we obtain:

Theorem S5. The expected degree of nodes chosen by global strategy is the global mean.

Proof. To determine the expected degree of a node chosen by the global strategy: Choose M = 1 node initially, (say X). With probability q, choose each neighbor of X. For a node k with degree D k , the probability of being chosen by this process is the first step when any of k's friends is chosen as the initial node, and the second step is k being chosen with probability q. This probability is p k = 1 N D k × q = qD k N . The expected degree of a chosen "seed" node is then the degree-weighted probability: Proof. First, observe that the degree distribution is unaffected by the change, and therefore the global mean (which only depends on mean and variance of the degree distribution) is also unaffected, i.e. µ G (G) = µ G (G ). Recall that the local

Since between G and G the degrees of all nodes are the same, and all edges are the same except the two rewired edges, we can write the difference between the local means the local means as:

The last inequality follows from the ordering of the node degrees. Note that we actually only require the conditions

Lemma S1. Given a connected network with |V | = N > 3 nodes and any non-degenerate degree distribution. To achieve maximum local mean among all networks satisfying the given degree distribution, the nodes with maximum and minimum degree must be connected to each other.

Proof. We prove this by contradiction. Let the network G = (V, E) have the maximum local mean for the specified degree distribution. Label a and z as the nodes with minimum and maximum degrees in our network. These degrees must be different (D a = D z ) in a non-degenerate distribution. Assume a and z are not connected to each other.

There must be a highest degree node z connected to a node y that satisfies the following conditions: (1) y is not directly connected to a, i.e. (a, y) / ∈ E and (2) D a < D y < D z , Note that (1) must be satisfied since D z ≥ D a , and (2) must be satisfied since a and z are lowest and highest degree nodes. Now, we can find a neighbor of a,

Choose a neighbor x of z that is not connected to b. x must exist, otherwise b and z would have the same degree, contradicting the assumption that lowest and highest degree nodes are not connected. Observe that we can increase the local mean by rewiring the network to G by connecting (a, z) and (b, x) in place of (a, b) and (x, z) as in Theorem S6

above. Thus, network G that we started with could not have had the maximum local mean, and we have a contradiction.

Thus, the statement of the theorem must hold.

Theorem S7. If the degree distribution is unconstrained, the star network maximizes the local leverage λ L = µ L µ D .

Proof. Let δ and ∆ be the minimum and maximum degree. Define f (x, y) = x y + y x . Without loss of generality, assume that x ≥ y. First, observe that f (x + 1, y − 1) > f (x, y). To prove this, we can express

where the last inequality follows from the assumption x > y. Thus, the maximum value of f (x, y) when δ ≤ x, y ≤ ∆

Observe that the ratio of local mean to mean degree can be expressed as

Thus, for each edge, the maximum value of D i D j + D j D i from above is bounded by ∆ δ + δ ∆ , and the maximum local leverage is λ max

Observe that expression is maximum when the highest degree node is ∆ = N − 1 and is connected to a lowest degree node of degree δ = 1, which implies a star network. Therefore, no network can have higher local leverage than the star network.

We use a wide variety of real networks to determine properties and illustrate of the networks as it relates to the interventions detailed in the paper. We use data from two repositories.

The networks are selected across several categories (Affiliation, Face-to-face Social, Online Social, Computer, Infrastructure and Biological networks), and span a wide range in network characteristics like size and density (Table  S2 ). These networks also vary widely in terms of their size, from a low of 25 to networks with millions of nodes (e.g. Youtube). All network data was obtained from the Koblenz Network Collection [18] . We examine these real networks on a number of dimensions, the number of nodes, edges and the variation in the degree distribution. 

In addition, we also use data from N = 75 villages in India made publicly available (see [14] for details). The summary statistics for those village household networks are detailed in Table S3 . 

A basic view of the friendship paradox is developed by plotting the average number of friends (degree) of individual nodes' "friends" on the vertical axis against the average degree (Fig. S1, Fig. S2 ). For example, in the Contact (In person Social) network, we see a deep blue region above and to the left of the 45 • line. Although present across all networks, the pattern is most prominent in the WWW (Google) or Twitter (Online Social) network. Observe also that in the Road Network, only ∆ = 37% of nodes have a higher average number of friends of friends than their own degree.

We illustrate this "individual friendship paradox" using a scatterplot of the node degree versus the average friend degree in Figure S2 . Nodes that have a higher degree than their average friends do are colored red, whereas nodes that have lower degree are colored blue. Across most real networks, we observe that the blues vastly outnumber the reds. Relatedly, there are several nodes with low degrees whose friends on average have a high degree. For a specific node degree, the probability that a node with a lower (or identical) degree is chosen by the sampling strategy for random sampling (gray), local FoF sampling (red) and global FoF sampling (green). Across all networks, for lower degrees, the random sampling curve is to the left of the local and global FoF curves.

In several networks, global FoF is to the left and higher than local FoF (e.g. Contact), whereas in others, it is to the right (e.g. Flickr).

We illustrate the practical impact of the distinction between the two means, with the following questions : (a) Is the Local Mean always greater (or smaller) than the Global Mean? (b) Can both means be relatively high (or low)? (c) What network (sub)structures result in a high Local Mean or Global Mean?

We examine four illustrative network structures (Fig. S3 ) to answer these questions and to understand the differences between the two means. We find that both local and global mean can be much greater than the mean degree, and between these two means, either of them can be greater than the other. Especially noteworthy is the difference between the Local and Global panels for network (Fig. S3C) : the Local mean is equivalent to the mean degree, because each node is equally weighted in terms of w L i . However, the Global mean is higher for this network since it assigns a higher weight w G i to higher degree nodes.

Global (B) Two Central Hubs with Spokes: Each central hub is connected to 7 nodes. The mean degree is lowest in this network. However, local mean is substantially higher than the global mean, and is higher than the mean degree across all networks (a)-(d). In local panel, we see that the weight of central hubs has increased, whereas the corresponding weight for the low degree "spoke" nodes has decreased. In the global panel, the node weights are proportional to degree. (C) Heavy Core with Attached Cycle: The global mean is substantially higher than the local mean (and mean degree). Here, we see in the local panel that the weight of each of the nodes has not changed, and all nodes have the same weight. However, in the global panel, we see that the high degree nodes in the complete graph has higher weight compared to the original network, whereas the weights for the nodes in the 2-cycle are lower than in the original network. (D) Heavy Core with Pendants: Both the local and global mean are substantially higher than mean degree. In the local panel, the edges connecting core nodes to other nodes (both core and pendant) have a relatively low weight, and are not displayed. Figure S4 : Local Leverage Density in Generated Networks from three different generative models, and spans the parameter space. A sample of 1,000 networks was used for each of the models with size varying between 100 and 10,000 nodes. (A) Erdos-Renyi (ER) networks generated with edge probabilities, p ∈ [0.05, 0.95]. We find that as the size increases, the leverage decreases. In all cases, for ER networks, we find relatively low local leverage. (B) Static Scale Free (BA or Barabasi Albert) networks with scale-free parameter γ ∈ [1, 6] . For these networks, we find that leverage increases as the size of the network increases, with networks of size 10,000 having an average leverage of over 7. (C) With small world (Watts-Strogatz) networks, the network size does not seem to materially impact leverage. e.g. using assortativity in place of inversity could result in using a global strategy when local may be more appropriate.

We detail below several examples of virus propagation models being used for characterizing the transmission and spread of diseases. These compartmental build upon the early work of Kermack and McKendrick [22] . Thus, all individuals in a population (in our case, the nodes in a network) are in one of the states, either susceptible (S) or infected (I).

Based on the viral propagation, they can move to other states like Exposed (E), Recovered (R), or Deceased (D). For example, the SIR model involves individuals being in one of three states, (S), (I) or (R) and transitioning between the states probabilistically. Typically, the vast majority of nodes are present in the susceptible state (S), in which they might contrast the disease. The exposed state (E) is used to indicate a node that has been exposed to the disease, but could be asymptomatic during an incubation period and is not capable of infecting others. In contrast, the infected state (I) indicates a node that is capable of infecting others. The (R) recovered state implies permanent immunity.

There are further extensions possible, e.g. adding infants who have maternal antibodies (state M) that provide passive immunity. See [23] or [24] for an overview and survey of these models. These models have been extensively used in epidemiological studies to characterize disease dynamics as detailed in Table S4 , including measles, influenza and COVID-19.

There has been recent notable work that aims to characterize the epidemic thresholds of these compartmental models with disease transmission over a network [25, 13] . The critical idea is that the epidemic threshold of a network can be characterized as the inverse of the greatest (first) eigenvalue of the adjacency matrix A of the network, denoted as:

Eigenvalue λ 1 termed the spectral radius characterizes the connectivity of the network graph. Thus, networks that have higher connectivity or λ 1 are more likely to allow a contagion different paths to grow into an epidemic, whereas in networks with low connectivity, the epidemic is more likely to die out.

While there have been a number of epidemic thresholds for specific network generating processes (e.g. small world), the generality of the result above is valuable since it allows: (a) any arbitrary network, without placing restrictions on its topology or structure, (b) a wide range of compartmental models like SIS, SIR and others detailed in Table S4 typically used to model infectious disease. Consider a SIR model for illustration, the results also hold for the other models. The model is parametrized by two rates: β is the probability of an infected node infecting a susceptible node in a given time period, and δ is the probability at which an infected node recovers (or is cured) during the period. If time is continuous, β and δ can be viewed as the rates of infection and recovery. In either case, R 0 is defined as R 0 = β δ .

The epidemic threshold τ is defined as follows [25] :

There are a few observations relevant here. First, the critical value of epidemic threshold is a function of the adjacency matrix A of the network topology (structure) G. Second, a network topology with a higher epidemic threshold is less likely to have an epidemic. Third, interventions like immunizing nodes or reducing the number of connections (edges) can increase the threshold τ (A) so that infections are more likely to die out. [27] , Swine Flu H1N1 [28] , Ebola [29] SEIR Chicken Pox [30] , SARS [31] , COVID-19 [32] SIRD COVID-19 ([33])

Note: The states refer to (S)usceptible, (I)nfectious, (R)ecovered / (R)emoved, (E)xposed, (D)eceased

We begin with a seed set of 1% of the nodes being infected, and evaluate epidemic outcomes using the SIR model. All the nodes in the network that are not infected or recovered are susceptible (S) to the infection. Each infected node can transmit an infection in each period probabilistically to each of its neighbors. The probability of an infection is P transmit = β. Thus, a node can become infected (I) from contact with any of its neighbors. In each period, an infected node can be cured or recovered (R) probabilistically, with the likelihood P cure = δ. Recovered nodes cannot be reinfected and cannot transmit infections.

The process of immunizing (or vaccinating) a set of nodes involves choosing a proportion of nodes (5%, or 10% or 20%) and ensuring that these nodes do not transmit any disease. The nodes for immunization are chosen based on three strategies: random, local and global.

The parameters used in the simulation of the epidemic are detailed in Table S5 . Table table: vpmsimulation for parameters of simulation. All outcomes are density plots. We plot 3 outcomes: (a) the proportion of population infected at the peak, (b) proportion of population that was ever infected, and (c) total suffering. The x-axis represent proportions and the y-axis represent density. We plot the outcomes for 3 strategies: (R)andom, (L)ocal and (G)lobal. The dashed vertical lines represent the means for the 3 strategies. We find that for the Facebook network, the Local strategy is better for all outcomes than the Global, which in turn is better than the Random strategy.

In Figure S6 , we examine the epidemic propagation characteristics on the Facebook network [34] using the same parameters as detailed in Table S5 . For the Facebook network, we find that an epidemic's outcomes are better when using the local strategy compared to the global strategy, which in turn in are better than the random strategy.

Complex contagions and the weakness of long ties

The spread of behavior in an online social network experiment

Why your friends have more friends than you do

What makes you think you're so popular? self-evaluation maintenance and the subjective side of the "friendship paradox

Generalized friendship paradox in networks with tunable degree-attribute correlation

Generalized friendship paradox in complex networks: The case of scientific collaboration

Friendship paradox redux: Your friends are more interesting than you

Quantifying randomness in real networks

On random graphs, i

Emergence of scaling in random networks

Collective dynamics of 'small-world' networks

Michalis Faloutsos, Nicholas Valler, and Christos Faloutsos. Got the flu (or mumps)? check the eigenvalue! arXiv preprint

The diffusion of microfinance

Graph theoretical models of structural leverage in marketing

Friends you can count on. The New York Times

The friendship paradox and systematic biases in perceptions and social norms

Konect: the koblenz network collection

Assortative mixing in networks

Why social networks are different from other types of networks

Assortativity and leadership emerge from anti-preferential attachment in heterogeneous networks

Applications of mathematics to medical problems

The Kermack-McKendrick epidemic model revisited

The mathematics of infectious diseases

Epidemic thresholds in real networks

The entomological inoculation rate and plasmodium falciparum infection in african children

The dynamics of measles in sub-saharan africa

Clustering model for transmission of the sars virus: application to epidemic control and risk assessment

A simple mathematical model for ebola in africa

Estimation of the contact rate in a seasonal seir model: application to chickenpox incidence in france

Transmission dynamics of the etiological agent of sars in hong kong: impact of public health interventions

The effect of control strategies to reduce social mixing on outcomes of the covid-19 epidemic in wuhan, china: a modelling study

Chinese and italian covid-19 outbreaks can be correctly described by a modified sird model. medRxiv

On the evolution of user interaction in Facebook

To examine this question, we generate 1, 000 networks using different generative methods as above. We find that assortativity and inversity are not guaranteed to have opposite signs (Fig. S5A). Therefore, the sign of assortativity cannot be used to determine whether the local or global mean is greater for a network, unlike with inversity. All 3 network generating processes create networks with the same sign for both metrics (detail in Fig. S5B). Example networks for the case of same sign assortativity and inversity are illustrated (Fig. S5C, S5D)