key: cord-0333298-fchbel8f
authors: Correia, Rion Brattig; Barrat, Alain; Rocha, Luis M.
title: The metric backbone preserves community structure and is a primary transmission subgraph in contact networks
date: 2022-02-04
journal: bioRxiv
DOI: 10.1101/2022.02.02.478784
sha: 5247c726dd01e1ba2d221715c979240c7b3122fe
doc_id: 333298
cord_uid: fchbel8f

The structure of social networks strongly affects how different phenomena spread in human society, from the transmission of information to the propagation of contagious diseases. It is well-known that heterogeneous connectivity strongly favors spread, but a precise characterization of the redundancy present in social networks and its effect on the robustness of transmission is still lacking. This gap is addressed by the recently introduced metric backbone, a subgraph that is sufficient to compute all shortest paths of weighted graphs. We show that the metric backbones of nine contact networks obtained from proximity sensors in a variety of social contexts are generally very small, 49% of the original graph for one and ranging from about 6% to 20% for the others. This reflects a surprising amount of redundancy and reveals that shortest paths on these networks are very robust to random attacks and failures. We show that the metric backbone preserves the community structures of the full contact networks and is a primary subgraph in epidemic transmission. The metric backbone, thus, is a principled graph reduction technique that does not affect shortest path connectivity for any nodes in a network. It is an important subgraph to characterize epidemic spread, the robustness of social networks, and any communication dynamics that depend on complex network shortest paths.

No shortest path is a ected by the metric backbone. In particular, even the distance between nodes directly linked by a removed edge is not a ected as the shortest path between these nodes is then an indirect path via other nodes (edges not on the backbone break the triangle inequality; see [13] for details). 2 For -= 0.2 the disparity filter backbone of the airport network is composed of these proportions of edges and nodes, where is the p-value threshold for the null model-derived normalized-edge-weight distribution. For -= 0.05, the disparity filter backbone is composed of 17% of the original edges but also removes 34% of the original nodes (see [18] for details).

Here we present the metric backbones of nine networks obtained by measuring the contacts between pairs of individuals (using wearable sensors) in a variety of social settings [32, 33, 34] . Such networks are relevant in behavioral studies [35] and as input to data-driven numerical simulations of epidemic spread [36, 9, 34] . We already know that the metric backbone preserves all shortest paths in these networks and removes edges truly redundant for this purpose [13] . Here we show that in contact networks from various social contexts, the metric backbones: (a) are much smaller than the original networks (which have a lot of redundancy), (b) preserve community structure, and (c) are primary subgraphs for simple transmission dynamics.

Social contact networks are built by recording when individuals meet. The data we consider (see section 5) describe close proximity events in populations with a finite temporal resolution, i.e., in successive time windows of approximately 20 seconds [37, 32, 33, 34] . These data can be represented as graphs, R(X), in which each node x i oe X represents a person in a population X, and an edge exists between two nodes if the corresponding individuals have been in close contact at least once. The number of time windows in which individuals x i and x j were in close proximity is denoted by an entry r ij in the graph adjacency matrix. The diagonal entries of this matrix, r ii , denote the total number of time windows in which individual x i was in a close social interaction with any other individual in the population X: r ii = q xj oeX:j" =i r ij . A normalized transformation of this data, used to quantify the strength of association between individuals, can be obtained via the Jaccard measure [38, 39] :

where p ij oe [0, 1] denotes a proximity of two individuals, with p ij = 0 for individuals x i and x j have no contact, and p ij = 1 when they are both active in the same time windows and always in contact; naturally, p ii = 1. The proximity graph, P (X), can be produced by measures other than Jaccard's as long as the proximity strength is symmetrical and proportional to the intensity of the interaction [27] . Two additional forms of normalizing interactions are described in SM, section S1. Because computing shortest paths requires a measure of length, rather than proximity, for the edges we use distance graphs D(X) obtained via the nonlinear map Ï 3 :

where the resulting distance weights are symmetrical and inversely proportional to the intensity of social interaction, with d ij = +OE for individuals x i and x j who have no contact, d ij = 0 when they are simultaneously active and always in contact, and d ii = 0. The metric backbone [13] of a distance graph D(X) is defined as its invariant subgraph B(X) in the computation of the graph's metric closure D T (X). The metric closure is the graph obtained after computing the shortest paths between all pairs of nodes of the distance graph and replacing the original distance edges d ij with the length of the shortest path between x i and x j . Length is computed by summing the edge weights in the (shortest) path via " indirect (non-repeating) nodes, where " is no larger than the diameter of the

The edge weights of the metric backbone graph are given 3 Other maps are possible, but without loss of generality we use the simplest nonlinear isomorphism possible between proximity and distance graphs [27] . 4 The metric closure is one of infinite possible distance closures that are isomorphic to transitive closures in generalized probabilistic or fuzzy metric spaces. In this case, the topological space is defined by the algebraic structure ([0, +OE], min, +), where ([0, +OE], min) and ([0, +OE], +) are monoids that enforce shortest paths with lengths computed by summing edge weights.

There are many other ways to compute length besides summing edges [27] that lead to many other meaningful distance backbones [13] . However, the metric closure, or the All Pairs Shortest Paths (APSP) problem [40] is the most common in network science and typically computed via Dijkstra's algorithm [41] .

where b ij = +OE means that there is no direct edge between x i and x j in the distance backbone graph. The edge weights of D(X) that do not change after computation of the metric closure D T (X) are called metric because they obey the triangle inequality:

The edge weights that become smaller with the metric closure break the triangle inequality in D(X) and are not included on the backbone. When an edge d ij of D(X) breaks the triangle inequality, it means that the length of at least one indirect path between x i and x j is shorter than the direct distance:¸i j = d T ij < d ij . These are known as semi-metric edges [25] and do not contribute to any shortest path [27] . Thus, metric edges alone define the backbone and are su cient to compute the closure, as shown in Simas et al. [13] .

The proportion of semi-metric edges is, therefore, the proportion of edges in graph D(X) that are not necessary to compute shortest-paths. This measure of edge redundancy is given by:

Similarly, the proportion of metric edges in graph D(X) is the relative size of its metric backbone B(X):

It follows that · = 1 ≠ ‡. Because distance graphs are symmetric (d ij = d ji ) and edges are nondirected, in formulae 5 and 6 we count each edge only once and do not tally reflexive edges, d ii . This means we tally only the lower diagonal entries of the adjacency matrix: d ij : i > j. The metric closure, computation of all pairs shortest paths (APSP), induces a topological distortion [27] of the original graph obtained from the multivariate associations observed in the social contact data, whereby semi-metric edges are made to conform to the triangle inequality (eq. (4)). However, only the semi-metric edges get distorted; the metric edges and the metric backbone they compose remain invariant in the APSP. Therefore, ‡ also denotes the proportion of edges topologically distorted by the metric closure, whereas · denotes the proportion of edges topologically invariant under the metric closure. A measure of semi-metric edge distortion between nodes x i and x j is then obtained via the ratio of the direct distance over the shortest indirect path length.:

If an edge d ij is metric, s ij = 1, meaning there is no distortion. If an edge d ij is semi-metric, s ij > 1, meaning there is distortion; the larger the value over 1, the more the edge breaks the triangle inequality, and thus, the more distorted it is in the metric closure. While semi-metric edges are redundant for shortest paths and have null betweenness centrality, their distortion (the distribution of s) varies widely. This in turn a ects the robustness of shortest paths to edge removals [13] (see fig. 4 and section 4).

We have computed the metric backbone for nine di erent contact networks from a variety of social contexts, from an elementary school in Utah (USA) to an art exhibit in Dublin (Ireland). As shown in table 1, these networks vary widely in the number of nodes and edges, yet the metric backbone typically comprises a small proportion of edges. Our metric backbone analysis is exemplified by the contact network collected by the SocioPatterns collaboration [37] over a period of 4 days in 2013 in a High School (Fr-HS) in Marseilles, France [46] . A similar analysis of all other networks is provided in Supplemental Materials (SM). The Fr-HS contact network was built from 188,508 contact records among |X| = 327 students enrolled in distinct préparatoires classes, which are designed for college-bound students in the last two years of their high school studies. This student body had been largely separated from other high school students (i.e., di erent building and lunch), forming an almost closed population with few contacts with the outside world, at least during school days. Nine classes were organized in four di erent specializations: "MP" focus on mathematics and physics (3 classes), "PC" on physics and chemistry (2 classes), "PSI" on engineering (1 class), and "BIO" on biology (3 classes). The total student participation in the study was 86.3% [46] .

The distance graph D(X) and its metric backbone B(X) have been obtained for the Fr-HS data via eqs. (2) and (3) in section 2 and are depicted in fig. 1 , with other processing details described in section 5.1.

As shown in table 1, the Fr-HS metric backbone comprises only · ¥ 10% of the edges of the original network. Interestingly, in addition to preserving all shortest paths by design [13] , the metric backbone also preserves and highlights the community structure of the original social network. This is clearly seen in fig. 1A -C when we compare its center (B) and right (C) panels. The metric backbone depicted in fig. 1B the nodes are placed according to the computation of the ForceAtlas2 algorithm [47] (in Gephi [48] ) on the original distance graph (shown in fig. 1A ). In contrast, in fig. 1C the nodes are placed after recomputing the same algorithm using only the backbone subgraph. Interestingly, we observe that the community structure remains largely unaltered with respect to the fig. 1B panel.

Similar results have been observed for all other networks in table 1 (see section S2), with the interesting exception of the American high school (US-HS) contact network. In this case, the original distance graph is devoid of any obvious social structure. After edge removal, however, four distinct communities that likely correspond to the four grades typical of American high school education can clearly be seen (see fig. 1D -F). In this larger network (|X| = 788), removing the ‡ ¥ 92% of edges that are redundant reveals a clear community structure in the metric backbone graph. Moreover, while the metric backbone comprises only · ¥ 8% of the edges, it also preserves all of the original shortest paths.

It is useful to quantify how well the metric backbone preserves or highlights the social organization of contact networks beyond visual inspection. While visual representation of networks is an important part of social network analysis, especially for small-or mid-size networks, visual representations and inspections become increasingly di cult as the number of nodes and edges grows. Indeed, it is well known that the task of clustering of nodes into communities is extremely challenging [49] .

We have developed and used a set of measures to compare the community structure of each distance graph using its backbone and same-size threshold and random subgraphs (see section 5.2 for details). Because all measures support the same conclusions, in the main text we present the results for the bidirectional similarity The Fr-HS dataset includes m = 9 classes in the metadata (see fig. 1A -C), while the Louvain community detection algorithm [50] identifies m = 10 communities in the original distance graph, showing the additional very small community splitting from one of the MP classes, as shown in fig. 2A . In other words, the community structure of the social contact data represented by the distance graph, as captured by Louvain, is very similar to the metalabel communities. Indeed, y AB = 0.88 measures a high amount of bidirectional similarity between the two module partitions 5 . This confirms that the social contacts of students tend to occur more frequently within their assigned classes and a few small groups of students interact more with students from other classes/modules, though almost always within the same specializations, as shown in fig. 2A .

We do not have access to a "gold standard," such as an independent survey of the observed cohorts, that shows the true community structure of the contact networks we analyse. We only have access to either the metalabels provided in the data (e.g., the classes each student of the Fr-HS network is enrolled in), or the social organization uncovered by community detection algorithms, such as the Louvain algorithm applied to the original network. It is important to note that neither is guaranteed to perfectly characterize the true social organization. While the metalabels denote organizational roles (e.g., classes and professions), individuals may contact members of other (metalabel) groups more than others within their own group due to personal relationships outside of their organizational roles. Additionally, the original network may contain spurious contact edges, especially if the definition of contact from the sensors' properties do not correspond to close proximity, and if contacts with very short duration can be registered. The social organization of the original network as captured by community detection algorithms may thus be confounded by many contact edges that do not in reality denote a strong social link. Indeed, the US-HS network discussed above is one such case, as shown in fig. 1D -F. Therefore, to provide evidence that the metric backbone preserves a correct picture of the organization of contact networks, we compare its community structure with same-size subgraphs obtained by thresholding or randomly removing edges 6 .

In the case of the Fr-HS network, it is clear from table 2 that the metric backbone preserves the community structure of the original distance graph as well as that of its metalabel partition better than the same-size threshold and random subgraphs. The measure of bidirectional modularity similarity of the metric backbone (y AB = 0.74), with only about 10% of the edges, shows that it preserves the Louvain-community structure of the original distance graph much better than the same-size threshold subgraph (y AB = 0.61) and better than a sample of 100 same-size random subgraphs (y AB = 0.37 ± .02). The di erence can also be visualized as in fig. 2 , where it is clear that the metric backbone breaks into fewer modules (m = 17, B) than do the threshold (m = 27, C) and random (m = 47.19 ± 3.79, D) subgraphs. In the case of the threshold graph this is likely due to deletion of (weak) bridges within and between modules that are nonetheless required to preserve shortest paths [13] . It is also clear that random deletion not only breaks community structure more, it also mixes up the social organization, with students from di erent specializations being grouped together in small modules more than as observed with the metric backbone or threshold subgraph. While there is some variation across di erent social networks, as detailed in SM, the metric backbone typically preserves the community structure of the original distance graph better than same-size threshold and random backbones for the various similarity measures defined in section 5.2.

A similar behavior occurs with the metalabel partition, which is also better captured by the metric backbone (y AB = 0.68) than the same-size threshold subgraph (y AB = 0.55) and the sample of 100 same-size random subgraphs (y AB = 0.38±.02). However, as mentioned above, it is not clear that metalabel partitions capture the true social organization, especially in several of the networks analyzed in SM where metalabels are not always expected to form social contact modules in the observed contexts (e.g., teachers in classroom settings or in-patients in a hospital). Still, in school settings like that of the Fr-HS network, it is clear that threshold and random subgraphs break the metalabel partition into many more Louvain-communities than does the metric backbone, and the random subgraphs mix the metalabel partition much more. Interestingly, with the French Primary School network (Fr-PS) we observe that the Louvain-communities of the metric backbone can be closer to the metalabel partition than the Louvain-communities of the original distance graph (see section S2.1), which suggests that the metric backbone can in some cases filter out noisy or redundant contact data, as was also observed in the US-HS network ( fig. 1D-F ).

Networks are the supporting structure of various types of dynamical processes, ranging from the synchronization of oscillators to the spread of information or infectious diseases [2] . How these processes are altered when they occur on a subgraph, with respect to how they would unfold on the whole network, is a natural question when devising a filtering strategy that retains only a subgraph of the original network [51, 20] . We thus explore this issue in the metric backbone.

As many processes are based on di usion phenomena, we focus on simple and paradigmatic contagion processes typically used to simulate the spread of infectious diseases in a population. Moreover, because our focus is on how a spreading process occurring on a whole network di ers from that on the metric backbone (i.e., we are not interested in recovering dynamics) we consider the simplest such model, the Susceptible-Infected (SI) model. In this model, each node can be in only one of two states: S (susceptible) or I (infected 6 We do not compare community structure to other backbone methods, such as the disparity filter backbone, [18] because they can remove nodes (see section 1) which would result in a di erent universe X to partition into communities. Additionally, distance backbones such as the metric backbone are parameter-free, so we compare its community structure to threshold and random graphs obtained by removing exactly the same number of edges as removed in forming the metric backbone ( ‡(D).|d i>j |) in a parameter-free manner. A comparison and discussion of the advantages of distance backbones in regard to alternative backbone subgraphs is already available [13] . and infectious). A susceptible node, x j , in contact with an infectious one, x i , along an edge of (proximity) weight p ij can become infectious at rate -.p ij (i.e., with probability -.p ij .dt in each time interval dt) 7 . Once infected, nodes remain in that state.

Each simulation starts with an infectious seed node chosen at random and ends when no contagion event is possible (either because all nodes are in state I or because the remaining S nodes have no link with the infectious nodes). In order to characterise the spread, we do not focus on the final size (number of nodes reached by the spread, which is always close to the total number of nodes as they do not recover) but rather on the velocity of the spread [52] . More specifically, we compute the time needed by the process to reach either half of the nodes (t 1/2 ) or all the nodes (t 1 ) [52, 53, 54, 55] . In addition to comparing spreading times on the original network with those on the metric backbone we also, as in the section 3.1 analysis, consider two baselines: subgraphs obtained by thresholding the weights and by randomly removing edges 8 . For a fair comparison, we need to contrast the various subgraphs using the same number of edges. However, in many cases the threshold and random subgraphs are composed of several disconnected components though they have the same number of edges as the metric backbone because only the metric backbone guarantees that connectivity and shortest path distribution are preserved [13] . Therefore, we have devised the following procedure: Starting from the whole network, remove a fraction of the semi-metric edges edges randomly to yield a subgraph composed of the metric backbone plus a percentage ‰ of semi-metric edges (similar to the procedure in Presigny et al. [51] ).

For this study, we have performed n r = 10 runs of the SI model starting from a random seed and computed t 1/2 and t 1 averaged over these runs and over 100 realizations of the set of randomly selected semi-metric edges that are not removed. We then built threshold and random subgraphs with exactly the same number of edges as the metric backbone. We performed exactly the same procedure in each case to obtain a series of intermediate subgraphs between the original graph and the final threshold or random subgraphs of the same size as the metric backbone. We randomly removed a fraction 1 ≠ ‰ of edges that did 7 The infection probability uses the proximity graph, given by eq. (1) in section 2, rather than the isomorphic distance graph D(X) used to define the metric backbone. This probability can naturally be defined in terms of distance weights as -.(1/d ij ≠ 1).dt 8 As already noted in section 3.1, we do not compare to other backbone methods because they can remove nodes in addition to edges or introduce parameters that would complicate the transmission study pursued. A comparison and discussion of the advantages of distance backbones in regard to alternative backbone subgraphs is available [13] .

not belong to the threshold or to the random subgraph. Figure 3 depicts t 1/2 and t 1 as a function of ‰ for the various cases: The rightmost point corresponds to the whole network, D(X), and the leftmost to the metric backbone, B(X), shown in red; corresponding same-size threshold and random subgraphs shown in blue and green, respectively. In all cases, as edges are removed from the original network, the time to infection or half-infection increases as fewer paths are available to transmit the disease. However, it is quite clear that the infection on the metric backbone propagates faster and remains much closer to the infection times observed for the full original network than do the infection times observed on the baseline subgraphs. This is especially striking for the time to full infection · 1 , but also observed for time to half infection · 1/2 . Moreover, as ‰ was decreased we frequently obtained disconnected threshold and random subgraphs, for which t 1 is by definition infinite as not all nodes could be reached, and t 1/2 is finite only if the seed node existed in a connected component that contained at least half of the nodes. The fraction of connected networks for both baselines is indicated by blue and green bars in fig. 3 . It is clear that this fraction becomes rather small well before we reach the size of the metric backbone (‰ = 0). Indeed, by ‰ = 20%, only 5% and 20% of connected networks exist in the threshold and random baseline ensembles, respectively. Naturally, the values of t 1 and t 1/2 shown are only computed for connected networks, thus simulations for subgraphs of the same size as the metric backbone (|B(X)|) were not possible for both threshold and random baseline ensembles; in the case of the threshold baseline, even subgraphs with |B(X)| + ‰ = 10% were not connected. This means that we cannot find threshold and random subgraphs of the same size as the metric backbone that can sustain propagation to the whole (or even half) of the original network. In contrast, time to full (or half) infection is always finite in the case of the backbone-and any intermediate case between it and the full network as ‰ varies-as it preserves the connectivity (and shortest-path distribution) of the original network.

Altogether the SI simulations show that the metric backbone is a primary subgraph for simple transmission dynamics. Furthermore, considering that the threshold subgraph is built by removing the weaker edges (low proximity or large distance), it follows that there are weak links in the metric backbone that are removed from the network in the threshold subgraph but which are essential for transmission. These weak edges are potentially good candidates for interventions aiming at reducing transmission while minimizing reduction of social contact.

It is only in weighted graphs-where weights discriminate and characterize the degree of association between nodes as proximity or its isomorphic distance-that the concept of metric backbone is meaningful because the backbone of non-weighted (regular) graphs is the graph itself. In weighted graphs the metric backbone (and its generalized distance backbone [13] ) is a very useful construct because, unlike other graph reduction techniques, the backbone graph B(X) is guaranteed to preserve all bridges, connectivity, and distribution of shortest paths of the original graph D(X), and is defined in a non-parametric manner from simple algebraic principles. Thus, the metric backbone of a distance graph is unique, typically small, and provides a principled graph reduction technique that preserves shortest paths.

We have analyzed the metric backbones of nine contact networks collected in a variety of social settings and recorded based on the interactions between pairs of individuals via wearable sensors. It was already known that the metric backbone, by design, preserves network connectivity and all shortest paths after removal of all redundant edges for that purpose [13] . Here we have shown that the metric backbones of social contact networks are also much smaller than their associated original networks (large redundancy in shortest-path calculation), as previously observed in biological, technological, and knowledge networks [13] . Built-in redundancy is a hallmark of complex systems, allowing them natural protection against perturbations [56, 57] . In the social networks analyzed here, the small backbones reveal this in that their distribution of shortest paths is very robust to random removal of edges. Specifically, the proportion of semi-metric edges ( ‡) ranges from 52 to 94%, with eight of the nine networks observing ‡ > 80% redundancy. This means that all shortest paths can be computed in these eight networks with fewer than · = 20% of the edges. Indeed, five networks have metric backbones composed of · AE 10% of their original edges (see table 1 ).

The only network that has a fairly large metric backbone ( ‡ = 48 ± 9%) pertains to a context where we do not expect a typical social organization: an exhibit in the Dublin Science Gallery (Ir-Ex). In a museum setting, most visitors pass by alone or in small disconnected groups, thus a large fraction of (weak) edges are required to preserve shortest-path connectivity. Indeed, as shown in fig. 4 , the distribution of semi-metric distortion reveals a social organization of almost random encounters. The values of semi-metric distortion are quite low (s ij ¥ 2) with a small variation range (see also fig. S18 ). This means that the distance graph is very close to being entirely metric, as previously observed in random fluctuation phenomena that occurs in an physical (Euclidean) space [13] . In such cases, the backbone is expected to comprise 50% of the graph, but the (near 50%) semi-metric edges are almost all very close to metric (low distortion). Because shortest paths reveal the strength of social connectedness and the likelihood of transmission phenomena on these networks, we have focused on analyzing how both are captured by the metric backbone. Our results show that in comparison to same-size threshold and random subgraphs, the metric backbone best preserves community structure (section 3.1) and is a primary subgraph for simple transmission dynamics (section 3.2). Additionally, the metric backbone clearly helps in the identification and visualization of the social community structure of the network, especially when the original graph is too dense to be visually inspected (e.g., fig. 1D -F).

Thresholding is often used to decrease the number of edges in densely connected networks. The common assumption is that edges with small proximity (large distance) weights are spurious or simply noise. However, as we show here in all networks studied, the set of weak edges includes those that do in fact contribute to the computation of shortest paths and preserve connectivity and are, thus, important to keep communities connected and sustain spreading phenomena. Thresholding a network can isolate communities, creating disconnected components where dynamical processes, such as epidemic spread, cannot be sustained. Indeed, our epidemic spread simulations on the nine social networks (section 3.2) show that threshold or random subgraphs much larger than the metric backbone frequently lead to disconnected graphs, but the metric backbone preserves connectivity and shortest paths. The importance of weak edges has long been explored in social networks [17] . Our analysis, thus, suggests that weak edges on the metric backbone are potentially good candidates for interventions that aim at reducing transmission while minimizing reduction of social contact in communities.

It is worth also considering the role of semi-metric edges, those not on the backbone (section 3.2), as their properties show that the distance backbone methodology is not exclusively a network reduction technique. While all (semi-metric) edges with s ij > 1 have null betweeness centrality, the distribution of semi-metric edge distortion (eq. (7)) typically varies widely, often spanning many scales [13] . Indeed, the distribution of s ij is informative about the robustness of shortest-paths on a given network. If there are many semi-metric edges with low values of s, then the removal of metric edges from the backbone is likely to have a small e ect on the distribution of shortest-paths because alternative paths with low distortion are likely to exist. In contrast, if there are high values of s, removing some edges on the backbone is likely to result in a large disruption of the shortest-path distribution. Thus, while the size of the backbone captures how robust the distribution of shortest paths is to a random attack on the whole network, the distribution of s captures how robust the backbone itself is to attack.

This knowledge is relevant when devising intervention strategies to disrupt spreading phenomena on social networks. If the backbone is robust, with low central tendency of s and a more homogeneous distribution, then removing edges from it (hindering specific social connections) will have little impact on shortest paths; this is the case of the Exhibit (Ir-Ex) and Workspace (Fr-Wo) networks as shown in fig. 4 (see also figs. S15 and S18). If, on the other hand, the distribution of s is heterogeneous with a large central tendency and variation, then targeting specific edges on the backbone would result in a large disruption to shortest paths. The latter is the case of all school contact networks we have analyzed, especially the French (Fr-HS) and US (US-HS) high school networks as shown in fig. 4 (see also SM for s distributions of all school networks). For instance, the median semi-metric distortion of the US-HS network iss ij ¥ 30. This means that 50% of the (semi-metric) edges not on the backbone break the triangle inequality by a factor of at least 30. In other words, they refer to social distances that are at least 30 times shorter via indirect paths on the backbone. Thus, deleting edges from this backbone is likely to result in a very large impact on the distribution of shortest paths. Because the backbone is typically very small (for the US-HS · (D) = 7.84%), such intervention strategies are easier to design on it than on the whole network, which reveals the importance of targeting the backbone in containing epidemic spread.

Here we have studied social contact networks statically and left considering attack interventions to the network or backbone for future work. We also intend to include other distance backbones in upcoming studies. Indeed, the metric backbone generalizes to any measure of path length [13] , not just the summation of distance edges as in the standard shortest paths of the metric backbone. Thus, other distance backbones based on ultra-metric, euclidean, and di usion distances are likely to be relevant for both understanding the organization of and transmission phenomena on social networks. Taken together, our results indicate that the parameter-free metric backbone in particular and distance backbones in general can be widely applicable to network science problems, especially when shortest paths and distance or proximity measures are relevant in explaining properties of the system.

The datasets range in date from 2009 to 2015 and were collected from a number of di erent social environments, such an elementary school [34] , a primary school [45] , two high schools [46, 33] , a hospital ward [42] , a workplace [44] , a scientific conference [43] , and a museum exhibit [43] . The data were collected using proximity sensors worn by individuals. Most of the resulting datasets provide a contact record data file, where each row consists of the anonymized IDs of individuals and the time that they met someone, with a time resolution of approximately 20 seconds. Some datasets also contain additional metadata. For instance, in the primary school, class and type (i.e., student or teacher) are provided for each contact ID. Details for each dataset and their analysis are provided in SM, section S2.

Several measures are available to quantify how much the metric backbone is able to preserve the social organization captured by the original network and to compare the value with the social organization captured by null model subgraphs obtained by removing the same number of edges. Given two distance graphs A(X) and B(X) defined on the same set of nodes X from contact or proximity data as described in section 2, we consider the partitions of X to be the community structure of each graph. Specifically, graph A(X) is partitioned into a set of m A nonempty modules A i , and graph B(X) is partitioned into a set of m B nonempty modules B j 9

The module partitions we consider are either the metalabels of the original networks or the communities found by the Louvain algorithm [50] of a target distance graph. The goal is then to compare how similar the module partition of a distance graph is to the partition of its backbone-or threshold or random subgraph of the same size.

We consider various measures for computing similarity between modularity partitions of the same set of nodes, which is an old problem in classification [58] . A straightforward way to obtain a value that captures bidirectional similarity of modularity between two distance graphs is to use a normalized measure based on the Jaccard proximity [38] between all pairs of modules with one from each distance graph:

This measures how similar partitions {A 1 · · · A m A } and {B 1 · · · B m B } of X are to each other and varies in the unit interval (y AB oe [0, 1]). The larger the value, the more similar the partitions. We also consider a directional similarity based on the Jaccard proximity to compare each module in a partition to its most similar counterpart in the other partition (J AaeB ) and vice-versa (J BaeA ) as follows:

These dual measures also vary in the unit interval (J AaeB , J BaeA oe [0, 1]). The larger the value of J AaeB the more each module in partition {A 1 · · · A m A } has a similar counterpart module in partition {B 1 · · · B m B }, and vice versa for J BaeA . A measure of modularity dispersion, intuitively, yields a comparison opposite to those provided in eqs. (8) and (9):

H(A i ) and H(B j ) denote the Shannon entropy of probability distributions associated with the dispersion of each module of one partition into all modules of the other partition: For comparison, in addition to the measures in formulae 8 to 10, we also compute the CluSim [61, 62] measure of modularity similarity, which considers module "nestedness," and the Adjusted Rand Index [58] . All computed measures are shown in the respective dataset details in SM section S2. Data Availability Statement. Computations presented in the analysis were performed in Python. Source code is made available on Github at https://github.com/rionbr/socialbackbone. The distanceclosure Python package used to compute the metric backbones can be found at https://github.com/rionbr/distanceclosure. The SocioPatterns datasets analyzed can be found in http://www.sociopatterns.org/datasets. The Toth et al. [34] and Salathé et al. [33] datasets can be found in their respective publications.

Author Contributions. All authors designed and conducted the research, and contributed to the final manuscript. RBC developed the Python code, performed all backbone computations and visualizations, and drafted the manuscript. AB designed and performed the epidemic spread simulations. LMR developed the metric backbone methodology and the new measures of community similarity employed. He also took a leading role in writing and editing the final manuscript.

Epidemic processes in complex networks. Reviews of modern physics

Dynamical processes on complex networks

Twenty years of network science

Complex systems: A survey

Mining social media data for biomedical signals and health-related behavior

The social symbiome framework: Linking genes-to-global cultures in public health using network science. Handbook of Applied System Science

Modeling the Worldwide Spread of Pandemic Influenza: Baseline Case and Containment Interventions

Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: Theory and simulations

Mitigation of infectious disease at school: targeted class closure vs school closure

The geometry of evolution

Canalization and control in automata networks: body segmentation in Drosophila melanogaster

Control of complex networks requires both structure and dynamics

The distance backbone of complex networks

The e ective graph reveals redundancy, canalization, and control pathways in biochemical regulation and signaling

Information filtering in complex weighted networks

A comparative analysis of link removal strategies in real complex weighted networks

The Strength of Weak Ties

Extracting the multiscale backbone of complex weighted networks

Extracting h-Backbone as a Core Structure in Weighted Networks

Resistance for Pandemics: Mobility Network Sparsification for High-Fidelity Epidemic Simulation. 2021

Statistically Validated Networks in Bipartite Complex Systems

Information di usion backbones in temporal networks

A Pólya urn approach to information filtering in complex networks

The structured backbone of temporal social ties

Soft Computing Agents: A New Perspective for Dynamic Information Systems

Proximity and Semi-metric Networks for a Collaborative and Recommender Web Service

Distance closures on complex networks

Commentary: Semi-Metric Topology of the Human Connectome: Sensitivity and Specificity to Autism and Major Depressive Disorder. Frontiers in Neuroscience

Semimetric analysis of the functional brain network: Relationship with familial risk for psychotic disorder

The Shortest Path is Not Always a Straight Line: Leveraging Semi-metricity in Graph Analysis

Computational Fact Checking from Knowledge Networks

Dynamics of Personto-Person Interactions from Distributed RFID Sensor Networks

A high-resolution human contact network for infectious disease transmission

The role of heterogeneity in contact timing and duration in network models of influenza spread in schools

Gender homophily from spatial behavior in a primary school: A sociometric study

Simulation of an SEIR infectious disease model on the dynamic contact network of conference attendees

SocioPatterns: data-driven social dynamics and human activity

Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des

Explorations in Automatic Thesaurus Discovery

All Pairs Shortest Paths using Bridging Sets Retangular Matrix Multiplication

A note on two problems in connexion with graphs

Estimating Potential Infection Transmission Routes in Hospital Wards Using Wearable Proximity Sensors

What's in a crowd? Analysis of face-to-face behavioral networks

Can co-location be used as a proxy for face-to-face contacts?

High-Resolution Measurements of Face-to-Face Contact Patterns in a Primary School

Contact Patterns in a High School: A Comparison between Data Collected Using Wearable Sensors, Contact Diaries and Friendship Surveys

ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software

An Open Source Software for Exploring and Manipulating Networks

Community detection in graphs

Fast unfolding of communities in large networks

Building surrogate temporal network data from observed backbones

Velocity and Hierarchical Spread of Epidemic Outbreaks in Scale-Free Networks

Small But Slow World: How Network Topology and Burstiness Slow Down Spreading

Immunization strategies for epidemic processes in time-varying contact networks

Di usion on Networked Systems Is a Question of Time Or Structure

Measures of degeneracy and redundancy in biological networks

Pathway redundancy and protein essentiality revealed in the Saccharomyces cerevisiae interaction networks

Comparing partitions

Fuzzy sets and fuzzy logic

Neural Networks and Fuzzy Systems

Element-centric clustering comparison unifies overlaps and hierarchy

CluSim: a python package for calculating clustering similarity. The Journal of Open Sourrce Software

(AB). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.