key: cord-0004406-qf32hk7p
authors: Lee, Eun; Emmons, Scott; Gibson, Ryan; Moody, James; Mucha, Peter J.
title: Concurrency and reachability in treelike temporal networks
date: 2019-12-10
journal: nan
DOI: 10.1103/physreve.100.062305
sha: ff2cb5878147570cad8b5eb117099fe750dc26a6
doc_id: 4406
cord_uid: qf32hk7p

Network properties govern the rate and extent of various spreading processes, from simple contagions to complex cascades. Recently, the analysis of spreading processes has been extended from static networks to temporal networks, where nodes and links appear and disappear. We focus on the effects of accessibility, whether there is a temporally consistent path from one node to another, and reachability, the density of the corresponding accessibility graph representation of the temporal network. The level of reachability thus inherently limits the possible extent of any spreading process on the temporal network. We study reachability in terms of the overall levels of temporal concurrency between edges and the structural cohesion of the network agglomerating over all edges. We use simulation results and develop heterogeneous mean-field model predictions for random networks to better quantify how the properties of the underlying temporal network regulate reachability.

Social networks are woven together by temporal contacts organized according to various structural details which together form the substrate of infectious dynamics, determining the impacts of spreading diseases and viral information flows. Compared to the extensive literature modeling spreading dynamics on static networks, we lack a thorough understanding of the particular effects of the temporal properties of network contacts. Meanwhile, recent studies have found that diverse temporal contact features, such as distributions of intercontact times, temporal correlation in intercontact times, and birth and death of nodes and links, may have very different impacts on the dynamics of spreading processes [1] [2] [3] [4] [5] [6] [7] [8] .

Concurrency, broadly defined as "relationships that overlap in time" [9] , is one of the key elements affecting the extent and speed of disease spreading. Concurrency is a longstanding concept in epidemiology and has been considered in diverse contexts, including for understanding the epidemic potential of HIV [9] [10] [11] [12] . Some studies have applied the concept as a property proportional to an average contact rate or an average degree in unit time [11, 12] or as a link density of a reachable network converted from pair-to-pair contact patterns [10] . In another recent study, concurrency was considered as the number of links of an individual in unit time within a generative temporal activity model [13] . Whatever the particular definition of concurrency, the general idea is that increased concurrency increases the density of the effective network structure over which an infection is transmitted, resulting in a larger number of alternative paths between nodes, thus increasing the potential for greater spread through the population. As such, the general conclusion that higher concurrency increases the potential for epidemic spread seems to be trivial. However, the detailed mechanism of this increase is important to understand and to quantify to assess the impact of concurrency in a particular temporal network setting.

Motivated by previous work [9] , we consider the reachability of the temporal contact network over which transmission can occur. Reachability is the density of the accessibility graph that includes an edge from node i to node j if and only if there is a temporally consistent path originating at i that can reach j in the underlying temporal contact network. That is, reachability quantifies the maximum possible impact of the infectious spreading by quantifying the fraction of node pairs that can be accessible via temporally consistent paths (see, e.g., [9, [14] [15] [16] ). Reachability is a useful metric not only because it measures the maximal substrate of infectious spreading, but also because it indicates how much temporal continuity can be ignored when one uses an aggregated static network to analyze infection dynamics [15] . The accessibility of a directed node pair quantifies whether there is any path along which a chain of edges can causally propagate the infection. Notions of accessibility and reachability have been suspected to be crucial elements for explaining the diversity of possible epidemic outbreak sizes [17] and have been applied to measure epidemic spreading potential of nodes in an animal trade network [18] . Similarly, Williams and Musolesi [19] applied a concept of reachability to measure the robustness of spatiotemporal networks. By directly measuring the maximal extent of temporally consistent paths, reachability controls the maximal extent of any general spreading dynamics on the temporal network, thus providing a benchmark for the overall potential for infection.

To separate the influences of the temporal and structural details, we focus as in [9] on two critical properties of the temporal network: temporal concurrency and structural cohesion. Temporal concurrency is defined here as the fraction of pairs of links that overlap in time. Meanwhile, structural cohesion measures the effective connectedness in the underlying topology of the time-aggregated network, ignoring the temporal details of the individual edges. A good measure of structural cohesion should embody the notion that highly cohesive networks should be difficult to separate, i.e., by node or edge removal, into separate components. As such, we employ the definition of structural cohesion as the average number of node-independent paths between two nodes [20] , as applied to the network that includes all edges over the total time period studied. We emphasize that structural cohesion is more than a simple function of edge density; rather, it is influenced by the organizational patterns of the connections. In particular, one can observe different amounts of structural cohesion even while keeping the total link density constant.

By deliberately separating the structural cohesion measurement from that for temporal concurrency, we explore the role of each and the interplay between them in affecting reachability. Pairing numerical calculations with an approximate model we develop here, we examine the roles of these temporal network properties, observing in particular how structural cohesion directly affects the description of the use of detours to find temporally consistent paths between node pairs. Our approximate model focuses on networks that are treelike in the sense of having low structural cohesion, in an effort to develop and assess the accuracy of model approximations for the level of reachability in random temporal networks. (We refer the interested reader to Ref. [21] for further discussion of what it might mean for a network to be treelike in this sense.)

We start with detailed definitions of temporal concurrency and structural cohesion in Sec. II A, continue to describe the methods for constructing our synthetic and sampled empirical networks in Secs. II B and II C, respectively, and provide specific quantitative details for numerically computing reachability in Sec. II D. We then develop our model approximation for reachability in Sec. III. In Sec. IV we compare numerical measurements and the approximation for reachability on synthetic trees, Erdős-Rényi networks, and configuration model realizations with exponential degree distributions, before continuing on to the empirical examples studied previously in [9] . We conclude with a discussion in Sec. V about the effect of temporal concurrency on the reachability and limits of the presently developed approximation, along with possible future directions for improvement.

The ease with which disease spreads on a network is typically increased in the presence of multiple diverse alternative paths between nodes. In a temporal network with many links overlapping in time, concurrency increases the number of such paths that are temporally consistent, possibly accelerating spread and increasing total outbreak size even without increasing the number of contacts in the network. To study the role of the structural and temporal connectedness, we separately consider the impacts of the structural cohesion and temporal concurrency, following the approach in [9] (summarized above and presented in detail below).

We emphasize that throughout this study we distinguish three related network representations describing the pattern of interaction: (1) the full temporal contact network [ Fig. 1(a) ], which we assume is undirected; (2) the static aggregated network that includes all links that ever appear [ Fig. 1 To measure the structural cohesion κ , we consider only the static aggregated network representation including all edges that are ever present in the specified temporal network. Within the aggregated network, we seek the number of nodeindependent paths κ (i, j) available between nodes i and j. We employ the shortest path approximation of [22] to numerically calculate κ (i, j) and then average over all pairs of nodes

where N is the size of the network. To measure temporal concurrency C, we consider here the single-interval case where the link between nodes i and j (if present) has a single specified starting time s(i, j) and persists for duration d (i, j). For simplicity, we will assume that start times and durations are each independent and identically distributed across the edges that ever appear in the aggregated network during the selected total time period T . That is, in particular, the timings of the edges emanating from a given node are necessarily independent of each other. As such, the temporal concurrency of edges associated with a given node, measuring the probability that there are such edges overlapping in time, becomes by this independent and identically distributed assumption equivalent to the probability that any randomly selected pair of links overlaps in time. We can thus select two randomly selected links with start times and durations denoted by s 1 , s 2 and d 1 , d 2 , respectively. Without loss of generality, let s 2 s 1 . The probability that these two edges overlap in time is then simply the probability that the duration of the first edge is larger than the difference between starting times d 1 > s 2 − s 1 . If we let p(d ) be the probability distribution function of durations and I (s) be the probability distribution function of start times, the concurrency under these simplifying assumptions becomes

For the simulated temporal network data studied here, we further simplify the above expressions for temporal concurrency by assuming specific probability distribution functions for the start times and durations of edges, inherently nondimensionalizing the timescale of the two distributions so as to work easily within a one-parameter model for modifying temporal concurrency. In so doing, we emphasize that the temporal concurrency and the subsequent calculation of reachability depends only on the orderings of start and end times, not the total amounts of time involved in those overlaps. We take a uniform distribution for edge start times I (s) = 1/T , s ∈ [0, T ], and draw durations from an exponential distribu-

We emphasize that changing the decay rate of the exponential in p(d ) is unnecessary, since doing so is nondimensionally equivalent to a change in T for calculating concurrency and reachability. That is, the range T has inherently become a nondimensional ratio of the underlying timescales of the distributions of start times to that of edge durations.

The complementary cumulative distribution function of edge durations, which we notate by D(t ), then simplifies to D(t ) = ∞ t e −τ dτ = e −t and the concurrency of the temporal network under these timings can be rewritten as

We note that C ≈ 2/T for T 1. Meanwhile, taking the series expansion of the exponential, we obtain that the temporal concurrency for T 1 approaches the value 1 like

. = 0.736. Importantly, our structural cohesion definition depends only on the topology of the aggregated time-independent network, ignoring start and end times and including all edges that ever exist during the time period. In contrast, the temporal concurrency depends only on the distributions of start times and edge durations, independent of the network topology.

To explore the effect of temporal concurrency and its interplay with structural cohesion, we will examine reachability with a model approximation based on the assumption that the networks are locally treelike. We thus start by confirming the analysis on balanced and unbalanced tree networks, which have only one node independent path for each node pair. In the balanced tree networks, each node has r successors except the leaves that are at distance h from the root.

We generate unbalanced tree networks by rewiring the balanced trees, ensuring that they maintain the same numbers of nodes and edges. In our rewiring, we choose a random edge (i, j) from the set of edges {E}. Removing this edge separates the network into two components: One includes i and the other includes j. We then choose a random node v from the component containing node j and connect i to v. In so doing, we ensure at each step that we maintain a tree structure without cycles. We continue this process φ|E| times, where the rewiring fraction used in our work here is φ = 0.1; that is, we rewired 10% of the links.

We generated connected components from Erdős-Rényi (ER) networks using the gnp_random_graph function in the NetworkX PYTHON package [23] , Generating 100 ER networks initially from N = 120 nodes and connection probability p = 0.017 yielded the largest connected components of size N = 93.92 ± 8.6 and average degree k ≈ 2. For p = 0.025, with the same initial size, the largest connected components have an average size of N = 113 ± 3.1 and k ≈ 3.

We further compare the reachability on the ER networks with randomly generated graphs with exponential degree distributions, as described in the Appendix. In particular, we observe that the average structural cohesion for an exponential degree distribution graph is typically smaller than for an ER network with the same mean degree. We then subsequently rewire the ER and exponential degree networks to a desired matched structural cohesion, to clarify the comparison being considered (see the Appendix).

To connect our results to the previous work of [9] , we used the same sampled collaboration networks studied there, which were extracted by four-step random walks from collaboration networks [24] . In particular, we consider the same four examples that were highlighted in Fig. 3 and Appendix 2 of [9] , having similar sizes to one another but different structural cohesion. These selected networks capture low average numbers of partners and skewed degree distributions, both of which are typical in sexual contact patterns, making them useful for testing the impact of temporal concurrency in the context of a spreading infection [9] .

For each of our four different classes of aggregated network structure (trees, ER networks, exponential degree distributions, and empirical examples), we randomly generate the temporal information for each edge, i.e., start times and durations. We note that we treat all of our synthetic networks as single-interval temporal networks, where each edge is present for the entirety of the duration after its start time, as drawn from the selected distributions. Given the start time s and duration d of an edge, its end time is of course = s + d.

As described above, we consider start times drawn from a uniform distribution s ∈ [0, T ], with durations following an exponential distribution p(d ) = e −d . As such, T is effectively a dimensionless time, which we vary in the range T ∈ [0, 20].

Given the specific temporal contact information of every link in the temporal contact network, we directly evaluate the average reachability as the density of the accessibility graph. Direct contacts like (A, B) in Fig. 1 (a) immediately carry over into the accessibility graph, along with additional ordered pairs like (A, C) and (D, C) in Fig. 1 (c). For example, an infection starting from D at t = 1 can reach C either by directly infecting B or by infecting A who then infects B, and then by B infecting C during their (later in time) contact. However, an infection seeded at C cannot reach A or D because of the absence of temporally consistent paths, since the (B, C) link does not appear until after all of the other edges have ended.

The role of concurrency as a potential enhancer of reachability is immediately apparent in this small toy example if we vary the start time of the (B, C) edge: If that start time were before t = 4, then the ordered pair (C, D) would also be in the accessibility graph, so an infection seeded at C can spread further than with the timings indicated in the figure. Similarly, if that start time were before t = 3, the ordered pair (C, A) would also be accessible.

We describe the unweighted accessibility graph through its adjacency matrix R with elements R i j = 1 when there is a temporally consistent path from node i to node j; otherwise R i j = 0, as shown in Fig. 1 (e). To quantify an average accessibility across the whole network, we calculate the density R of the accessibility graph

and we call this quantity R the reachability of the temporal network.

To numerically evaluate the reachability from temporal network data, we represent the essential temporal information into layers of contacts corresponding to edge end times that have been sorted in ascending order, as depicted in Fig. 2 . The process of generating the temporal layers is as follows.

(1) Sort edges in {E} by end time w in ascending order, where w is the end time of edge l w , w ∈ [0, E − 1] is the sorted index of edges, and E is the total number of edges Fig. 1 that overlap with the disappearance of the wth link (counting from 0, T 3 not shown). If multiple edges end at the same time, the accessibility calculation proceeds equivalently with either a separate but identical matrix for each edge ending at that time or a single matrix corresponding to that time. E = |E|. For example, l 0 is the edge with the earliest end time, whereas l E −1 is the last edge to end. (Breaking ties between identical end times is unimportant for calculating reachability, except insofar as it can be used to speed up the calculation by indexing the smaller number of distinct end times, under an appropriate change of notation.)

(2) Construct the wth temporal layer matrix T w by including edge l w and all other edges l w with w > w that are present just before the end time w . That is, T w includes l w and all l w satisfying both s w < w and w w . (3) By repeating step 2, the full set of temporal layer matrices T 0 , T 1 , . . . , T E −2 , T E −1 may be prepared.

(4) Multiply the matrix exponentials of each temporal matrix T w to obtain Fig. 2 , satisfying the step 2 conditions above. One can similarly determine the adjacency matrices corresponding to the end time of each edge and multiply the matrix exponentials to evaluate the accessibility matrix as described in step 4. Once we binarize the accessibility matrix R, the reachability R is obtained by averaging the off-diagonal elements of R. In the example in Figs. 1 and 2, the reachability is R = 10 12 .

= 0.83. The matrix exponentials in step 4 above provide a simply expressed formula indicating the connected components within each individual temporal layer. As such, multiplying the matrix exponentials for any set of consecutive temporal layers yields (after binarizing) the reachable network associated with that combination of layers. However, in practice for larger temporal networks, it is significantly more efficient computationally to instead directly calculate the connected components of T w and replace the matrix exponential with the binary indicator matrix whose elements specify whether the corresponding pair of nodes is together in the same component at that time. Similarly, to save memory overhead, steps 3 and 4 can be trivially combined to consider only one temporal layer at a time. For even larger networks whose adjacency matrices must be represented as sparse matrices to fit in memory, the corresponding accessibility graph could instead be constructed one row at a time, updating the running average of the density R to calculate the overall reachability.

We seek to approximate the reachability in terms of some minimal temporal and structural information necessary to accurately describe the essential relationships. Using simulated timings on random graphs, we immediately observe that the overall density and level of cohesion are insufficient for describing the needed structural effects. Specifically, we consider simulated timings on random networks with exponential degree distributions and ER graphs that have been rewired to target specific cohesion values as described in the Appendix. The results in Fig. 10 show reachability versus concurrency for rewired ER and exponential degree distribution graphs both with k = 3 and κ = 1.7, demonstrating clear differences. As such, we desire to more accurately approximate the reachability versus concurrency relationship using more structural information. Motivated by Fig. 10 and modeling successes for other network problems (see, e.g., [21, 25, 26] and citations therein), we consider approximations developed in terms of the underlying degree distribution.

We develop a heterogeneous mean-field model for reachability specified in terms of the degree distribution p(k), temporal concurrency, and structural cohesion. In so doing, we implicitly assume that the underlying aggregated network is sufficiently locally treelike (see, e.g., [21] ). Assuming that the essential network structure is (at least largely) dictated by the degree distribution p(k), we proceed to develop models for the effects of the temporal concurrency of edges.

As above, let I (t ) be the probability of edge starting times and D(t ) be the probability of an edge duration larger than t. We seek the probability P(s, ) that a given chain of edges is temporally consistent given that the first edge has start time s. By definition, P(s, 1) = 1, since any edge considered in isolation is temporally consistent with itself. Assuming all edge start times and durations are independent and identically distributed, the recursive equation for P(s, ) is developed by considering whether the start time of the next edge in the chain is before or after s:

The first integral accounts for the possibility that the next edge in the chain has a start time before the first edge, distributed according to I (t ), but duration long enough to be concurrent with the first edge, with probability D(s − t ). Importantly, we then assess that this next edge is the first edge in a chain of − 1 edges that are temporally consistent with probability P(s, − 1), as opposed to P(t, − 1), to account for the requirement that all of the edges on the remaining chain still need to be concurrent or appear after the original edge start time s. In contrast, the second integral directly measures the contribution from the next edge starting at time t > s, after the start time of the first edge, along with the probability that an edge starting at time t is part of a chain of − 1 edges that are temporally consistent. The second integral here ends at time T since by our definition

Given P(s, ), we can then determine the probability W (T ) that a randomly selected chain of edges of path length is temporally consistent, by averaging over the I (s) distribution for the start time of the first edge,

where we have here explicitly noted the remaining dependence on T for temporal consistency along path length . We note in particular that the initialization P(s, 1) = 1 yields W 1 (T ) = 1 for all T ; that is, a path of length 1 is necessarily temporally consistent.

Motivated by Moody and Benton [9] , specifically, inspired by their observations about the role of structural cohesion as measured by the average number of node-independent shortest paths, the development of our model approximation proceeds by restricting attention to the node-independent shortest paths between a node pair (i, j). This treatment, taking the assumption of locally treelike structure to an extreme, allows us to treat the probabilities along each nodeindependent path independently. However, in doing so, we recognize that we will undoubtedly fail to take into account all potential detours around temporally inconsistent parts of these paths. Nevertheless, as we will see below, this approach appears to be relatively accurate for small enough structural cohesion and particularly so at low levels of concurrency, presumably because the probabilities W (T ) drop off quickly with increasing under such conditions.

As above, we continue to denote the number of nodeindependent shortest paths between nodes i and j by κ (i, j). We index those paths by q ∈ {1, . . . , κ(i, j)} and identify the length of path q by (q). We then seek the probability that path q is temporally consistent, which we write as W (q) (T ). By the definition of reachability, an ordered node pair is accessible if there is at least one temporally consistent path from the one node to the other; so we only need to exclude the case that there are no temporally consistent paths between the nodes. Continuing to assume that we can reasonably consider only the node-independent paths, the probability ρ(i, j) that at least one of these κ (i, j) node-independent shortest paths under consideration between i and j is temporally consistent follows simply by independence:

That is, given the computation [22] that separately identifies the number and length of node-independent shortest paths for each node pair (i, j) in the aggregated network, Eq. (7) gives us the probability of accessibility between the pair, as restricted along these node-independent paths. In other words, the corresponding R(i, j) element of the accessibility matrix becomes 1 with probability ρ(i, j). The expected density of the accessibility graph (the off-diagonal parts of the accessibility matrix) under our approximation thus becomes

By our construction, ρ(i, j) = ρ( j, i), though this does not require corresponding similarity in the elements of R. We also note that we would ideally consider edge-independent paths, which are by definition at least as numerous as the nodeindependent ones. However, given the observed relationship between structural cohesion and the degrees of node pairs in our random graph results in Fig. 9 in the Appendix, we expect that the typical numbers of edge-independent shortest paths should not on average be much greater than the nodeindependent ones in these random cases.

The above calculation ofR requires detailed knowledge of the number and lengths of the node-independent paths between each (i, j) pair. That is, for all intents and purposes, we need the entire structure of the aggregated network upon which to calculate these quantities. However, in many network survey conditions, the available information is much more tightly constrained. It can be particularly beneficial under such settings to model outcomes at the level of heterogeneous mean-field theories that use only the degree distribution of the network (see, e.g., [26, 27] ). Since such models are typically derived from locally treelike assumptions (see, e.g., [21] ), we find it reasonable to consider how we might similarly extend our treelike assumptions above.

Given a distribution of path lengths p( ) to be considered as independent candidate paths between a randomly selected pair of nodes, the joint probability that a selected path has length and is temporally consistent is given by p( )W (T ). Summing over possible path lengths, we compute the probabilityρ that a path from this set is temporally consistent,

where L is the largest path length in the distribution p( ). Then the probability that at least one of κ (i, j) independent paths between nodes i and j is temporally consistent simplifies tõ

Notably, using the probabilityρ is this way decouples the considered probabilities along each path from all other possible properties of importance of nodes i and j (e.g., their degrees). Again, in making this calculation we have made the (rather strong) assumption that we considered only independent paths.

To model κ (i, j), we note that while an exact analytical measure of the structural cohesion appears to be prohibitively difficult, a trivial application of Menger's theorem [28] requires the maximum number of node-independent paths between nodes i and j to be bounded by the minimum degree of the pair min(k i , k j ), where k i and k j indicate the degrees of the two nodes. We observe that this upper bound yields a good approximation for the average cohesion κ in our random graphs, as observed in Fig. 9 . That said, we note by way of contrast that the four empirical networks from [9] that we study have much lower cohesions (1.61, 1.34, 1.07, and 1.06) than bounded by this relationship to node degrees (3.18, 3.39, 2.10, and 2.01, respectively). Moreover, in a true tree we require κ (i, j) = 1 for all node pairs by definition.

By assuming κ (i, j) ≈ min(k i , k j ) and substituting the approximation into Eq. (10),ρ(i, j) depends only on the node degrees k i and k j , along with the path length distribution p( ) under consideration. The resulting approximation of reachability, denoted byR to distinguish it from theR calculation of the preceding section, then becomes

where p(k) is the degree distribution and we have not bothered to correct the O(1/N ) contribution inR corresponding to pairing a node with itself. We again emphasize that we have assumed the structural and temporal details of our temporal networks are independent of one another. Therefore, for instance, there are no correlations between node degrees and all of the temporal details absorbed into the W (T ) terms. The only remaining structural contributions in theR approximation are from (i) the empirically observed p(k) degree distribution, (ii) the selected model for κ (i, j) as discussed above, and (iii) the selected distribution of path lengths p( ) to obtainρ in Eq. (9) .

In theory, one could continue by way of approximating p( ) in terms of the degree distributions [29, 30] . In particular, we note that in going in this direction one is more likely to be able to employ some model for the distribution of geodesic shortest path lengths, as opposed to that for node-independent shortest paths. Similarly, if the path length distribution is to be sampled by some manner, it may be more likely to get a reasonable sample of the geodesic paths versus the nodeindependent ones. To explore the effect of potentially using the geodesic shortest path length distribution instead of the node-independent path length distribution, we below consider both possibilities by direct use of the empirically observed path length distributions in each network, usingR s to represent the approximation obtained using the shortest path length distribution andR n for the model using the node-independent path length distribution.

We numerically examine the relationship between temporal concurrency and reachability on different families of networks, comparing with our model approximations. The reachability approximationR from Sec. III A uses the specific structural information of numbers and lengths of nodeindependent paths between each node pair. In contrast, theR s andR n approximations from Sec. III B employ path length distributions over the whole network, using the distributions of shortest paths and of node-independent paths, respectively. We confirm the results for trees (balanced and unbalanced). We then test the calculation on Erdős-Rényi networks at two different densities and on four empirical networks highlighted in [9] .

We numerically evaluate the reachability and our approximation, varying the temporal concurrency C on balanced and randomly unbalanced tree networks with two approximations. Since, by definition, a tree provides only a single node-independent path between a node pair, we accordingly set κ (i, j) = 1 in Eqs. (7) and (10) in calculatingR and R, respectively. Similarly, because the node-independent and geodesic shortest path distributions are thus identical, we note thatR n =R s on a tree.

In Fig. 3 the approximations accurately describe the typical increase in reachability with increasing temporal concurrency for both the balanced and unbalanced trees. We specifically note that the concurrency values plotted here are the expected for balanced and unbalanced tree networks, respectively. Navy lines display the simulated reachability with 100 synthetic balanced and unbalanced networks. Green dashed lines represent the approxima-tionR relying on the numbers and lengths of paths between each node pair. The dashed red line shows the approximated reachability with the node-independent path length distributionR n (which is equivalent toR s on a tree). Error bars indicate standard deviations. value given a specified time interval T . The results on the unbalanced trees include different network realizations as obtained by the rewiring described in Sec. II C. We observe a very slight gap between the approximationsR andR n for the unbalanced trees, compared to that of the balanced trees, though the predictions are still well within the standard deviation of the data. We hypothesize that the greater heterogeneity in the path length distribution p( ) in the unbalanced tree network may be a possible cause of this difference. The result confirms that the approximations accurately estimate the reachability for tree networks using only the path length and degree distributions p( ) and p(k), which is of course expected in this setting.

We next consider the reachability on sparse Erdős-Rényi networks with k ≈ 2 and 3, going in with the assumption that these random graphs will typically have sufficiently locally treelike structure [31] . As shown in Fig. 4(a) , the approximationR that includes the specific path length information between node pairs in the network largely underestimates the reachability. If anything, we should not be surprised thatR underestimates the true value of R like this, since the calculation leading tō R only considers reachability along node-independent paths. As such, the increased error made byR in increasing from k ≈ 2 to 3 is expected, though the size of the resulting error emphasizes the apparent importance of available detours around these paths even for these small mean degrees. We note in particular that theR approximation is quite good at very low concurrency C, where the node-independent shortest paths presumably have greater dominance because longer paths along detours become even more unlikely to maintain temporal consistency. However, the limiting behavior of the approximation in the R → 1 approach as C → 1 is clearly incorrect. We hypothesize that the behavior in this limit is possibly controlled by temporal inconsistency of key edgeto-edge transitions important along many paths, which is not an effect considered in the approximation.

We also include the approximationsR n andR s in Fig. 4 . We note thatR n is very similar toR here, indicating only modest change in the jump in the approximation obtained using full path length information for each node pair (R) versus a single path length distribution p( ) across all nodeindependent shortest paths (R n ). The additional gap betweeñ R n andR s is due to replacing the path length distribution empirically obtained over all node-independent shortest paths with the geodesic shortest path distribution, yielding shorter paths which are slightly more likely to be temporally consistent. Thus,R s slightly overestimates the reachability at very low temporal concurrency.

We additionally investigate the effect of a heavy-tailed duration distribution on ER networks with k = 2 and 3, as well as on the balanced tree with r = 3 and h = 4 (see Fig. 5 ). Supplementing our above results using exponentially decaying duration distributions, we consider the power-law distribution p(d ) ∝ d −α with α = 2.25 so that these distributions are heavy tailed with an exponent only a little larger in magnitude than that needed to ensure existence of a theoretical mean. Indeed, we note that the variance of this distribution is infinite, though of course the edge durations of a given finite synthetic temporal network do have an empirical variance. Comparing results on these same synthetic networks, we observe that neither the average nor the standard deviations of the reachability appear to be significantly different between the two duration distributions (exponential and power law). That said, we acknowledge that these similarities may be in part due to finite-size effects given these small networks or could also be due to the absence of correlations in the edge timings on these synthetic networks. Further investigation of such effects might be valuable directions for future research.

To further understand the limitations of our approximations, we explore the reachability frequency of node pairs according to their degrees in ER networks with k ≈ 2 (again restricting attention to the exponentially decaying duration distributions). We directly measure how many node pairs with given degrees are reachable out of the total number of reachable node pairs

where k and k represent the sets of nodes having degree k and k , respectively. We measure f across 100 ER networks with k ≈ 2 for low [C . = 0.1 in Fig. 6(a) ] and high temporal = 0.97 in Fig. 6(b) ]. Perhaps remarkably, we observe an only very small shift in f between these two figures, but the shift in the distribution that is apparent indicates that a larger fraction of the reachable pairs for high concurrency involve the degree-one nodes. That this should be the case makes intuitive sense in that the reachability of the degree-one nodes should be more suppressed at low concurrency than that for higher-degree nodes.

Noting that most node pairs are reachable at C . = 0.97, we make this observation more explicit by also computing the relative reachability frequencyf (k i , k j ) between degree pairs, defined asf

where H is the reachable matrix of the corresponding static network obtained by aggregating the temporal contacts. Because we only consider largest connected components in our numerical experiments, H (i, j) = 1 and the sum in the denominator merely counts the number of such pairs given the selected degrees. Figures 6(c) and 6(d) show the relative reachability frequency for low (C . = 0.1) and high (C . = 0.97) temporal concurrency, respectively. In particular, we confirm in Fig. 6(d) that almost all pairs are reachable, withf ≈ 1 for all degree values. In contrast, for C . = 0.1, the low-low-degree node pairs are much less likely to be reachable, as seen in Fig. 6(c) . Meanwhile, even at low concurrency, we see that high-high-degree node pairs are already quite likely to be FIG. 7 . Fit of κ for the reachability of ER network largest connected components with k = 3. The circles with error bars and the gray dashed line represent the numerical reachability andR s approximation with the shortest path length distribution, respectively. The dotted line showsR s recalculated with constant κ (i, j) = 1.5, while the solid curve uses κ (i, j) = 3.5 as the exponent in Eq. (10) for all node pairs. reachable, with nearly 30% of (k i , k j ) = (5, 5) node pairs being reachable in this setting. We note in looking at Fig. 6 (c) that there are very few degree-6 nodes in these networks, so the apparent drop-off inf for these cases is due to averaging over a small number of such cases.

The increasing errors in our model predictions at higher concurrency are directly because of the increasing importance of the neglected detours around the node-independent shortest paths. Recalling the explicit role of the number κ (i, j) of such paths between nodes i and j in our approximations, we ask whether the relationship between reachability and concur-rency observed numerically might be captured by assuming some other effective values for κ. In Fig. 7 we continue to consider reachability on the k ≈ 3 ER networks. Focusing for this figure only onR s approximations built from p(k) degree distributions and p( ) distributions of geodesic shortest paths, we reproduce here our regularR s approximation using κ (i, j) = min(k i , k j ) from Fig. 4(b) . This approximation overestimates the reachability at low concurrency because the p( ) distribution of geodesic shortest paths are shorter on average than the full set of node-independent shortest paths (the latter used in ourR n approximations). As seen in Fig. 7 , this overestimate at low concurrency can also be at least partially corrected for by decreasing the effective cohesion used in the approximation formulas to κ = 1.5. (For comparison, the average structural cohesion of the underlying ER networks is κ . = 2.08.) Of perhaps greater interest, we see in Fig. 7 that the underestimated reachability at large C appears to be corrected for at this level of modeling by choosing an effective cohesion value of κ = 3.5, yielding a good approximation over the range 0.5 C 1. We believe that identifying such effective cohesion values as modeled from other network features (as opposed to curve fitting here) may be an interesting direction for future work, as a means of extending the range of validity of our tree-based approximations.

We examined reachability versus concurrency on four sampled empirical networks that were highlighted in the previous work of [9] . Example networks (i) and (ii) have low structural cohesion, κ . = 1.06 and κ . = 1.07, respectively, while example networks (iii) and (iv) have relatively higher structural cohesion, κ . = 1.34 and κ . = 1.61, respectively. When the network structure is treelike in the sense of cohesion κ being near 1, all three of our model approximations plotted in Fig. 8 with the numerically calculated reachability. In accord with our other results above, we see that ourR approximation reasonably captures the low-concurrency limiting behavior in Figs. 8(a) and 8(b), and while it necessarily underestimates the level of reachability throughout, the deviation from the true reachability curves at low structural cohesion [Figs. 8(a) and 8(b) ] are not as large as at higher cohesion [Figs. 8(c) and 8(d) ]. Moreover, we see that much of this underestimate is effectively corrected in this case by the other modeling steps introduced by theR n andR s approximations, again particularly so at lower values of cohesion.

We also note here that theR n approximation overestimates the reachability in the low-concurrency regime in Figs. 8(a) and 8(b), unlike the above-observed behavior for ER graphs. This occurs because the way we constructed the empirical distribution p( ) of the node-independent shortest paths for this calculation here counted multiple short paths between nearby nodes. This counting yields on average shorter paths that then overestimate the reachability at small concurrency.

We investigated the overall level of reachability in temporal networks, considering the effects of temporal concurrency and its interplay with network structure, including structural cohesion. We developed a sequence of approximations for reachability based on strong (and potentially inaccurate) assumptions of locally treelike networks. We then compared our approximations to numerical results for simulated edge timings on a variety of types of networks. In networks that are treelike in the sense of low structural cohesion, our approximation agrees well with the numerically computed reachability, particularly so for small concurrency. At larger structural cohesion and/or larger concurrency, the importance of having many possible nonindependent paths is not captured by our existing approximations.

We further explored the effects in our different model approximations using different levels of detailed network information. OurR model approximation uses the observed number κ (i, j) and lengths of the node-independent shortest paths between each node pair (i and j) to describe the empirical node-independent path length distribution p( ). That is, in theR model, these quantities are directly calculated from the full network information.

In contrast, ourR model uses the estimated value κ (i, j) ≈ min(k i , k j ) as calculated from the observed p(k) degree distribution. That is, the minimum degree of two nodes i and j bounds the maximum number of node-independent paths between those nodes, and we observed that this bound often appears to be relatively tight in our networks studied here. Within thisR model framework, we then usedR n to represent results obtained using the measured distribution of nodeindependent path lengths p( ), whereasR s represents results using the measured geodesic shortest paths to approximate the p( ) distribution. Comparing theR n andR s model results indicates the potential level of sensitivity in the model from the details of the path length distribution p( ). It would be interesting in future work to assess this sensitivity under model approximations for either of these path length distributions (cf. the empirical distributions used here).

Whereas our present approximations are more accurate at small temporal concurrency, productive future work might focus on the limiting behavior as C → 1. Specifically, our approximation correctly captures R = 1 at C = 1, but the manner of approach as C → 1 is noticeably incorrect compared to the simulated temporal network measurements, unless we artificially select an increased cohesion value as in Fig. 7 . Given the relatively simple shape of the reachability versus concurrency curves, it is perhaps possible that a theory that is only correct in capturing the limiting C → 1 behavior of reachability might be matched or otherwise combined with our present model to better approximate reachability over the whole C ∈ [0, 1] interval.

The temporal networks studied here are inherently assumed to have edges with a single start time and nonzero duration. In contrast, there are other temporal network data sets that are in the form of lists of contacts that are present in each temporal snapshot, with the possibility that a given pair of nodes might be in contact at one time, not connected at a later instant, and then connected again at a still later moment. In such settings, it may be useful to distinguish the notion of instantaneous concurrency, counting concurrently appearing links at a contact time, from an interval concurrency calculated over all temporally interleaved links. That is, this notion of interval concurrency is equivalent to that obtained assuming that every edge is present for the full duration from its first appearance to its last. We studied the differences between instantaneous concurrency and interval concurrency, and the effect of these representations on reachability, for a variety of example networks in Ref. [32] . We found in that study that the change between the contact and interval representations barely affects the reachability of the sample empirical networks studied there. We believe this result emphasizes the importance of the start and end times, with the interevent contact time details being less important for setting the level of reachability in these cases. It will be worthwhile to further investigate the possible effects of substituting contact data with interval duration representations, especially as it impacts models for infection spread on temporal networks.

We believe the present study, focused on the role of temporal concurrency and structural cohesion in determining reachability, further emphasizes the need to better understand the interplay between the temporal and topological aspects in networks. We hope this study can give more insights for the diversity and predictability of outbreak sizes [17] , measurements of the epidemic potential of nodes [18] , and the estimation of the robustness of temporal networks [19] . With a more complete integrated picture of this interplay, it may be possible in the future to identify different immunization strategies for outbreaks on empirical temporal networks in terms of their estimated structural and temporal properties. For example, such models could then be used to help predict possible benefits obtainable from targeting hub nodes in the underlying contact network versus individual-level or population-level interventions to decrease concurrency. To complement our tests on ER networks, we additionally consider networks with exponential degree distributions that have been rewired to match the structural cohesion of ER networks having the same mean degree. We construct a network with an exponential degree distribution using the configuration_model function in the NetworkX package in PYTHON, which follows steps described in [33] . A degree sequence {k i } for nodes i = 1, . . . , N is generated by independent draws from the given distribution p(k) = k −1 e −k/ k , where k is the desired mean degree. We used the largest connected component from the generated network. We removed self-loops and multiedges and only accepted the resulting network if the mean degree was within 0.1 of the desired k . We note in particular that this procedure does not properly sample the space of simple configuration model graphs without self-loops and multiedges [34] . However, for our present purpose of using these networks as random examples, we do not rely on obtaining a proper sampling of the space. We have not shown figures here exploring our approximations for these exponential degree distribution networks, since they are qualitatively similar to that discussed for ER networks in the main text, in particular having better accuracy at small C.

We note that the exponential degree distribution networks as generated to this point of the procedure have natural levels of structural cohesion that are different from ER networks with the same mean degree, as shown in Fig. 9 . Because of the important role of structural cohesion in the present work, we seek to remove this difference between the exponential degree and ER networks. In Fig. 9 (a) we see that the observed structural cohesion κ in these random graphs is very close to their upper bounds given by averaging over min(k i , k j ), except at small mean degrees k . In Fig. 9 (b) we see that there is very little finite-size effect in the observed structural cohesion values on these graphs. [As an aside, we note that the empirical degree distributions in the largest connected component are generally slightly right shifted from the imposed degree distribution before restriction to the largest connected component. This shift thereby increases the upper bound for structural cohesion obtained by averaging over min(k, k ).]

To tune networks to a desired structural cohesion, specifically, to make networks with ER and exponential degree distributions but with the same structural cohesion, we rewire the links as follows (see, e.g., [35, 36] ). We randomly choose two links (i, j) and (i ¡ c, j ¡ c). If cutting these links does not break the network up into multiple components, we cut these links and then replace them with either (i, i ¡ c) and ( j, j ¡ c) or (i, j ¡ c) and (i ¡ c, j). In so doing, we reject new candidate edges that generate multiedges or self-loops and then select the pair of edges that make the new structural cohesion closest to the FIG. 10. Reachability R versus temporal concurrency C for random graphs with different degree distributions. The ER origin graphs are Erdős-Rényi graphs with mean degree k = 3, yielding an empirically expected cohesion κ = 2 (magenta line). The ER graphs are with mean degree k = 3 rewired to obtain expected cohesion κ = 1.7 (red line). The EXP origin graphs here are exponential degree distributions with the same k = 3, yielding an empirically expected cohesion κ = 1.6 (green line). The EXP denotes rewired graphs having exponential degree distribution, yielding expected cohesion κ = 1.7 as described in the Appendix. The filled area shows the standard deviation. desired value. If neither rewiring option successfully moves the cohesion closer to the target value, the original cut edges are restored. By this method, the degree distribution remains constant while the degree-degree correlation and the structural cohesion change. We repeat this rewiring process until either the target value of structural cohesion is obtained (to within a tolerance here of 0.025) or if the target is not achieved within E rewires then the process is restarted with a new random graph generated from the distribution. Figure 10 demonstrates the reachability of rewired ER and exponential degree distribution networks with the same average degree ( k = 3) and structural cohesion ( κ = 1.7). Even though the mean degree and structural cohesions of these random graphs are the same, the relationship between reachability and concurrency is noticeably different in the figure. This observation further motivates the development of our approximations in the main text in terms of degree distributions and path lengths.

Fast approximation algorithms for finding node-independent paths in networks

Networks: An Introduction

Temporal Network Theory

Research reported in this paper was supported by the Eunice Kennedy Shriver National Institute of Child Health 062305-10