key: cord-0966201-utg8vi7x
authors: Simard, Frédéric
title: Evaluating metrics in link streams
date: 2021-06-04
journal: Soc Netw Anal Min
DOI: 10.1007/s13278-021-00759-7
sha: b0b0c5928c0cd21dc3de06bca0d33b1af0ce5f3f
doc_id: 966201
cord_uid: utg8vi7x

We seek to understand the topological and temporal nature of temporal networks by computing the distances, latencies and lengths of shortest fastest paths. Shortest fastest paths offer interesting insights about connectivity that were unknowable until recently. Moreover, distances and latencies tend to be computed by separate algorithms. We developed four algorithms that each compute all those values efficiently as a contribution to the literature. Two of those methods compute metrics from a fixed source temporal node. The other two, as a significant contribution to the literature, compute the metrics between all pairs of source and destination temporal nodes. The methods are also grouped by whether they work on paths with delays or not. Proofs of correctness for our algorithms are presented as well as bounds on their temporal complexities as functions of temporal network parameters. Experimental results show the algorithms presented perform well against the state of the art and terminate in decent time on real-world datasets. One purpose of this study is to help develop algorithms to compute centrality functions on temporal networks such as the betweenness centrality and the closeness centrality.

Network science has been greatly influenced in recent years by temporal networks. Researchers in various fields have observed that real data varies over time and that static networks are insufficient to capture the full extent of some phenomenon. Different models of temporal networks have been suggested, among which the Link Streams of Latapy et al. (2018) that captures the network evolution in continuous time. As is the case with other forms of networks, the notions of paths and distances are fundamental to the study of link streams. Kempe et al. (2002) mention the use of time-respecting paths to study temporal networks. They further mention applications to epidemiology, in which one would seek information about the spread of a virus in a population. Human interactions can also be analyzed with temporal networks as has been observed by Tang et al. (2010a) and the link stream framework can help advance those studies by allowing edges to have durations. Although online social networks can be thought to vary in discrete time, with tweets and retweets on Twitter for example, in real social networks the interactions have durations which are important to take into account in order to have an accurate description of the data. In practice, studies have emerged from the SocioPatterns Collaboration that includes datasets on face-to-face contacts (see Cattuto et al. 2010; SocioPatterns 2021) with temporal labels. Those datasets are valuable tools to more accurately investigate aspects of social networks such as homophily (Stehlé et al. 2013 ) and epidemics (Moinet et al. 2018 ). Latapy et al. developed the notion of shortest fastest paths in their link stream model as a type of paths that gather together the temporal as well as the structural information of a link stream. A shortest fastest path is one that is shortest among the fastest paths between two endpoints. This type of path is used to define a betweenness centrality and it appears other distance-based centrality functions could be so defined as well. provide an explicit algorithm to compute this centrality in a link stream and the algorithms in the current manuscript can help speed up this process by A short version of this text was presented at the International Conference on Advances in Social Networks Analysis and Mining (ASONAM '19) , in Vancouver, Canada (Simard 2019b) computing the metrics used in their method. A social network can thus be analyzed through different perspectives: using the distance to measure how the connectivity of a group varies over time, the latency to measure how quickly an information can spread into a group of people and the length of a shortest fastest path to measure how efficiently this information is relayed. Note also how the time a shortest path starts and ends influences the information it can spread. Shortest fastest paths describe a natural notion of communication efficiency: when a viral rumour (such as a piece of disinformation) spreads over a network it can spread quickly and those actors on short fastest paths from the source to any receiver can be considered as efficient spreaders.

Therefore, shortest and shortest fastest paths not only enable one to compute Latapy et al.'s betweenness centrality over link streams, but also quantify the reachability of different nodes over different times. Many authors noted how important this task is given it can help determine how information spreads over a social network, for example what person or group of people most likely initiated a campaign over an online social network, but also what individuals should be restricted in the event of a pandemic. The COVID-19 pandemic leads to a trove of online misinformation and there is ongoing research to study the phenomenon (see for example Cinelli et al. 2020) . Similarly, epidemic spreading over temporal networks is still being investigated, for example in the recent work of Ciaperoni et al. (2020) , and some authors integrate centrality functions in their studies.

Our main contributions in this work are two groups of algorithms that compute metrics of shortest (fastest) paths in a link stream. Two algorithms in the first group work from a single source, while the two in the latter work from multiple sources. The two sets of algorithms are split on whether they assume paths have strictly positive or null delays. All methods return the lengths of shortest paths, the lengths of shortest fastest paths as well as pairs of starting and arrival times of temporal (fastest) paths. Some of that information is summarized as reachability triples (a, b, c) (see Definition 4.1) such that for every fixed source s and node v, a is a largest starting time from s to (b, v) and c is the distance from (a, s) to (b, v) . There are three major novel aspects in this work. First, we compute multiple metrics at once, whereas many other authors devise separate algorithms for the same tasks. Second, we compute lengths of shortest fastest paths, which is a novel metric in the literature. Finally, we present algorithms that work from multiple sources. This has not been considered recently in the literature and one has to go back to the work of Xuan et al. (2003) to find similar algorithms that work with multiple sources. In short, our major contributions are:

1. To provide four algorithms each of which compute multiple known metrics; 2. To design algorithms that compute the lengths of shortest fastest paths; 3. To design algorithms that compute metrics from multiple sources.

A major impact of this study is the ability to efficiently compute values required to compute the betweenness centrality of a temporal node in a link stream. It also makes it faster to compute shortest paths between large numbers of nodes in link streams. The novel ability to compute reachability triples for all destinations (and sources) is relevant due to the importance of shortest paths in the analysis of temporal networks. We think our algorithms are also simple yet powerful enough that they could be extended to other metrics such as arrival times of foremost paths and lengths of shortest foremost paths. This seems to hold in particular if the temporal dimension is the first argument optimized over, such as in shortest fastest paths or shortest foremost paths. Our algorithms are evaluated on datasets taken from the konect library of networks (Kunegis 2019) . A description of the datasets used can be found in Appendix B.

General definitions are presented in Sect. 2, followed by a state of the art in Sect. 3. Then, we present our main methods in Sect. 4, experiments in Sect. 5 and we conclude in Sect. 6.

Most definitions are taken from Latapy et al. (2018) . A link stream L is a tuple L = (T, V, E) where T ⊆ ℝ is a set of time instants, V is a finite set of nodes (vertices) and E ⊆ T × V ⊗ V is a set of links (edges). Here, V ⊗ V denotes the set of unordered pairs of vertices and we write uv ∈ V ⊗ V . We say an element (t v , v) ∈ T × V is a temporal vertex.

An edge of E is a tuple (t, uv) . Given an interval I ⊆ T , we write (I, uv) ⊆ E , instead of I × {uv} ⊆ E , to mean all edges (t, uv) such that t ∈ I are in E. We say an edge (I, uv) ⊆ E is maximal if there exists no other edge (J, uv) ⊆ E such that I ⊂ J . We say a maximal edge ([a, b] , uv) ⊆ E starts at a, ends on b and has duration b − a . Given the simple edge (a, uv), we write dur(a, uv) = b − a for the duration of the edge ([a, b], uv) . We let be the set of event times of T,

A maximal edge, as well as and × V are illustrated on the link stream of Fig. 1 The time-induced graph G t induced by a time t ∈ T is defined as G t = (V, {uv;(t, uv) 

We say that such a path starts at t 0 , arrives at t k , has length k + 1 and duration t k − t 0 . We write ( , u) ⇝ ( , v) to mean that there exists a path from ( , u) to ( , v) and say ( , v) is reachable from ( , u) . We also call t 0 a starting time and t k an arrival time from ( , u) to ( , v) . Each path between two fixed temporal nodes ( , u) and ( , v) defines a pair of starting time and associated arrival time. On the link stream of Fig. 1 , two paths are illustrated: the green one P 1 = (0, d, c), (1, c, b), (3, b, a) and the red one P 2 = (0, d, c), (2, c, b) , (3, b, a) . Both have the same starting and arrival times from (0, d) to (3, a), namely times 0 and 3. Both paths are fastest. We can also say s is a starting time from a temporal node ( , u) ∈ T × V to a node v ∈ V , in which case there exists some time t ∈ T such that s is the starting time of a path from ( , u) to (t, v) . Same goes for the arrival times.

We say a path P is shortest if it has minimal length and call its length the distance from ( , u) to ( , v) , written d (( , u) , ( , v) ) . Similarly, P is fastest if it has minimal duration, in which case this duration is called the latency from ( , u) to ( , v) and is written l(( , u), ( , v) ) . Note that if ( , u) ⇝ ( , v) , there exists at least one pair of starting time and arrival time (s, a) such that l(( , u), ( , v)) = a − s . Then, we say s is a latest starting time and a an earliest arrival time. Finally, P is called shortest fastest if it has minimal length among the set of fastest paths from ( , u) to ( , v) . We call its length the sf-metric from ( , u) to ( , v) and write it d f (( , u) , ( , v) ) . In general, this is not a distance as it does not respect the triangular inequality and is only a premetric, a simple counterexample is shown in Fig. 2 . On the same figure are drawn a shortest path, two fastest paths and a unique shortest fastest path.

This work is related to the study of Wu et al. (2014) and our algorithms can be applied in the same contexts as their shortest and fastest paths methods. The main contribution of the present work is to compute sf-metrics, as well as distances and latencies, in a single pass over a dataset. Separately, Wu et al.'s fastest and shortest paths methods are insufficient to compute centralities such as the betweenness of a link stream, while an algorithm combining them to produce sf-metrics is not efficient because it requires iterating multiple times over the dataset. Meanwhile, our methods iterate only once over the dataset to produce the three metrics and are suitable for studying different aspects of a link stream. We also output information on the starting and arrival times of shortest (fastest) paths. This study was instigated as a first step in computing Latapy et al.'s betweenness centrality defined in Latapy et al. (2018) and computed in .

More recently, Himmel et al. (2019) devised a generic algorithm to compute optimal paths in temporal graphs from one source node to all other destinations. Their algorithm is generic in the sense that it can compute different types of optimal paths such as shortest, fastest and foremost. Their main algorithm computes those paths separately. They do consider a method to combine some optimization criteria, such as fastest and shortest; however, this is done through a linear combination of optimization objectives. This is in contrast to the present work, which is focused on a bilevel optimization approach (see Colson et al. 2007) : the criterion that a path be fastest can never be violated, at the possible expense of paths not being shortest. We also present algorithms from multiple sources, which is not the case in their work. Brunelli et al. The shortest path from (1, g) to (9, a) (both encircled) is drawn in green. The two fastest paths are drawn in red and in blue. The sole shortest fastest path is the red one. Observe that, d f ((1, g), (9, a)) = 3 > d f ((1, g) , (9, f )) + d f ((9, f ), (9, a)) = 2 (color figure online) (2021) took a similar approach to Himmel et al. and devised a generic algorithm to compute Pareto optimal paths to generalize multiple criteria (shortest paths, foremost paths, etc.). Again, their method only works from a single source. Moreover, they present the temporal complexity of their method but do not derive the explicit complexities when their method is applied to specific problems. In particular, it is not clear what complexity their method would achieve on the task of computing lengths of shortest fastest paths and how a concrete implementation would fare against our programs. In 2019, Li et al. (2019) improved on the work of Wu et al. by using dynamic programming approaches to compute shortest paths, fastest paths and (restricted) earliest-arrival paths. Although interesting, this work comes with the same restrictions that we observed in the work of Wu et al. Furthermore, this work is also close to Tang et al. (2010b) since these authors define a betweenness centrality on temporal networks in terms of fastest shortest paths. Whether to use fastest shortest or shortest fastest paths (or any other type of path that combines temporal and structural information) depends on what information one wants to emphasize and on the context of the study. Shortest and fastest paths were also studied by Xuan et al. (2003) and we were inspired by their all-pairs fastest path method to develop Algorithm 2 and 4. The latter are relevant to compute some centralities because metrics between all pairs of (temporal) nodes may be required. To our knowledge, Xuan et al.'s method is the only of its kind to return latencies between all pairs of nodes. More recently, Casteigts et al. (2015) adopted the same strategies as Xuan et al. for computing shortest and fastest paths in a distributed way. Casteigts et al. (2012) also offer a survey of temporal networks that includes many applications of shortest and fastest paths. In particular, such paths can be used to study the reachability of a temporal node from another. It appears from that survey that either the distance or the latency is often used as a temporal metric to evaluate how well a temporal node can communicate with another. In this regard, the sf-metric can be used as another temporal function since it combines the temporal as well as the structural information into a single map. Note that the notion of foremost paths (or journeys) is also used by some authors (such as Casteigts et al. 2015) to study temporal reachability. A foremost path only has minimal arrival time, while its starting time is unconstrained. This type of path is also useful in many studies and we expect that our algorithms can be extended to those cases to output lengths of shortest foremost paths. Finally, Casteigts et al. (2020) and Thejaswi and Gionis (2020) studied another type of temporal paths called restless, such that one cannot wait more that a prescribed amount of time on each node. This is an interesting constraint that we have not considered. However, we expect that our methods can also be extended to fit these constraints.

Finally, observe that the link stream framework is also close to the Time-Varying Graphs framework of Casteigts et al. (2012) . Thus, all results presented in this paper carry to this other framework as well. We did not intend to survey the different models of temporal graphs present in the literature, as this is out of the scope of this work. To the best of our knowledge, there are two major models that allow edges to have duration and allow time to flow continuously: the Link Stream of Latapy et al. and the Time-Varying Graph of Casteigts et al. Other models, such as temporal networks (see Holme and Saramäki 2012), evolving graphs (Ferreira 2004 ) and temporal-event graphs (Mellor 2017 ) mostly focus either on discrete timesteps or on zero transmission delay. Our algorithms also work on those models. The time-dependent network model mentioned by Brunelli et al. (2021) is slightly more general than the link stream we use since it allows the transmission delay over the edges to follow a non-constant function.

The full implementations of the algorithms presented here, in C++, can be found online (Simard 2019b) . We present here two main methods, Algorithms 1 and 2 that compute the distances, latencies and sf-metrics from one source event node to all other event nodes. Algorithm 2 builds on the first method to compute those values for all pairs of event nodes. Section 4.4 also presents Algorithm 3 that was derived from Algorithm 1. This new method works with positive delays, a case that is often considered in the literature. Algorithm 4 also works with positive delays and is derived from Algorithm 2. We focus on the first two algorithms.

We present some small results that lead the way to those algorithms. The strategy for all methods is the same: we compute the distances from any temporal node (s

, then this distance is the sf-metric from the former to the latter temporal node. Otherwise, since we iterate chronologically over , this latency must have been computed at a time earlier than t v and is saved in memory.

All algorithms described in this section are presented in Appendix A in order not to disrupt the reading.

The algorithms we present compute what we call reachability triples that contain information about the lengths of shortest paths from one temporal node to another as well as the starting and arrival times of those paths. Definition 4.1 (Reachability triples) Let (t s , s) be an event node. If there exists a shortest path of length l from (t s , s) to the event node (t y , y) that starts on a largest starting time t ∈ , then we say (t, t y , l) is a reachability triple from (t s , s) to y.

In the following, we write R v for the dictionary of reachability triples from a fixed source event node to any node v. In order to reduce the cost of operations in R v , we assume this dictionary is implemented in such a way that R v holds keys

. This makes it easier to search for starting times s v .

All algorithms compute distances from largest starting times only. Those distances are contained in dictionaries R v for each v ∈ V as part of reachability triples. Note that if a link stream reduces to a network, that is if the set of time instants T is a singleton, then each R v will contain the usual distances from a fixed source to v. The temporal nature of a link stream forces us to take starting and arrival times into account when looking for shortest paths. Moreover, reachability triples could also be defined without the constraint that starting times are largest; however, the algorithms would not be as efficient because the dictionaries would grow larger with | |.

Lemma 4.2, due to Wu et al. (2014) , states that shortest paths are prefix-shortest. We say a path

Let (t s , s) and (t, v) be two temporal nodes. We define the outer distance from (t s , s)

, when t s = t . Lemma 4.3 suggests it suffices to compute distances in induced graphs G t for any time t to deduce the distances between two temporal nodes.

Let (t s , s) be a source temporal node and (t y , y) be a temporal node reachable from the source by a non-empty shortest path. Then, there exists t s ≤ t ≤ t y and a connected component C of G t such that Proof Let P = (t 1 , u 1 , u 2 ), … , (t n , u n , u n+1 ) be a non-empty shortest path from (t s , s) to (t y , y) . Then, t y ≥ t n ≥ t 1 ≥ t s and u 1 = s, u n+1 = y . There exist non-empty subpaths in P of the f o r m

largest number of elements. By Lemma 4.2, the prefix of P from (t s , s) to (t − j , u j ) is shortest and its length is d((t s , s), (t − j , u j )) . Moreover, the subpath of P from (t j , u k+1 ) to (t y , y) must also be shortest with length d((t j , u k+1 ), (t y , y)) . Finally, since P is shortest and the two subpaths formed by P⧵Q are shortest, Q must also be a shortest path. Then, Q has length d((t j , u j ), (t j , u k+1 )) . The result follows by letting t = t j and C = {u j , u j+1 , … , u k+1 } be a connected component of G t . ◻

In this section, we present Algorithm 1 that computes the distances (from largest starting times), latencies and sf-metrics from a source event node (t s , s) to all other reachable event nodes. This algorithm mixes iterations on the induced graphs G t for each time t ∈ with an all-pairs distances method on their connected components. Recall that if s * is the largest starting time from the source (t s , s) to some temporal node

. This length is computed with Lemma 4.3 by using the outer distances saved in memory as well as the all-pairs distance method on G t . Thus, when we iterate over all pairs (s v , d v ) of starting time and outer distance from the source to (t, v), we can deduce the duration and the length of the shortest fastest paths from the source to (t, v) . This method uses a set D that is assumed sorted in lexicographic order. Sorting D helps lower the temporal complexity, but is not fundamental to understand the algorithm.

In all algorithms, we assume the dictionaries use self-balanced binary trees in order to obtain logarithmic worst-case complexities with simple structures. In our implementations, we used hash tables and heaps (see Remark 4.7) to lower the running times. Furthermore, all operations on dictionaries such as min and max are well-defined whenever the relevant keys are present in the dictionaries. We assume the absence of a key is properly handled.

Before proving that Algorithm 1 is correct, let us go through a small example in order to build intuition. Other algorithms are highly similar.

Example 4.5 Consider again the link stream of Fig. 2 . Suppose the source is again (1, g), t = 7 and C = {a, b, c} . Thus, Algorithm 1 will look for shortest (fastest) paths that can reach temporal nodes (7, a), (7, b) and (7, c) . The unique largest starting time from the source to C at time 7 is s v = 4 . This time is given by the greatest key in R u for any u ∈ C . Then, we iterate over the outer distances from (4, g) to (7, v) for each v ∈ C . Note how the time of the source has changed from 1 to 4. By definition, and since the link stream is discrete, outer distances are given as the distances from (4, g) to (6, v) for each v ∈ C . Thus, we find outer distances 2 from (4, g) to (7, c) and 3 from (4, g) to (7, b) . Node a is discovered at time 7 and its outer distance does not exist before that. Finally, combining the outer distances with the distances inside the graph induced by C at time 7, we find that the distance from (4, g) to (7, c) is 2, 3 from (4, g) to (7, b) and also 3 from (4, g) to (7, a) . This last distance is given by the combination between the outer distance from (4, g) to (7, c) and the distance in C from (7, c) to (7, a) . Since node a is discovered first at time 7, that is its first arrival time from (1, g) is 7, then the latency from (1, g) to (7, a) is l ((1, g) , (7, a)) = 7 − 4 = 3 and the distance from (1, g) to (7, a) is the sf-metric from the former to the latter. Proposition 4.6 Algorithm 1 correctly computes the latencies and sf-metrics from a source event node to all other reachable event nodes as well as the set of dictionaries

-When = 1 , we iterate only on time t s and the result is clear. -Suppose the result holds for all k < . Let (t 1 , … , t −1 ) be the times previously iterated over on Line 3 and t the current time. By the induction hypothesis, by time t −1 , all values of R w , for all w ∈ V , are correctly updated. Let C v be the connected component of G t containing v. If s ∈ C v , then the result follows as in the case with = 1 . Then, suppose s ∉ C v . Since each R v is correctly updated up to time t −1 for each reachable v ∈ V , D contains triples (−s w , d w , w) for each w ∈ C v that have been visited prior to t −1 from the source from a starting time s w . The set D contains the largest starting time s w from the source to (t , w) . Then, either t + s w = l (t s , s), (t , w) or this latency is given by some f y), (t , w) ) . The sequence of distances

is non-increasing for each u ∈ V because each element is minimal. Thus, since w ∈ C v , in particular this lemma holds with t i = t and C i = C v . Then, By the induction hypothesis, the outer distance

Then, using d x and the dictionary d ′ returned by the all-pairs distances algorithm on Line 6, the expression above reduces to d((−s w , s), (t , w)) = min x∈C v d x + d � [(x, w) ] . In the last equation, the intermediary node x ∈ C v over which the minimum is taken is irrelevant. If y ∈ U x , then the distance from the source to y is the same as the distance from the source to x. Thus, it holds that:

Thus, when we iterate on the element (−s w , d w , w) from D, we construct the set U w of nodes at distance d w from (−s w , s) at time t . The last equation is thus used to insert into R w the right triple (−s w , t , d((−s w , s), (t , w))) for each w ∈ C v . When we have iterated over all of D, all dictionaries R v are correct at time t . Finally, it suffices to observe that once f [(t , w)] is updated with its final value, then by definition the update of d [(t , w) ] on Line 24 yields the sf-metric from (t s , s) to (t , w) for each w.

◻ Proof of complexity Let us write n ∶= |V|, m t ∶= |E t | and ∶= | | . On each time t ∈ {t 0 ∈ ;t 0 ≥ t s } , we first look up the connected components of G t , which requires at most O n + m t operations. On each component C of G t , we run an all-pairs distances method, which makes at most O n 2 + |C|m t operations. Observe that the O n 2 factor is for the initialization of a V × V array, which can be done once for the whole link stream. In order to see if a node has been reached at time t, it suffices at all previous times i to write distance d uv in d [u, v] as (i, d uv ).

For each node v ∈ V , the list in R v [−s u ] contains at most elements since there can be at most as many pairs in R v [−s u ] as there are arrival times on v. The same goes for the number of keys

We only need one distance in R v [−s u ] , thus one distance per node, and D can be constructed with at most O(|C|) operations for all v ∈ C . Inserting and removing an element from R v [−s u ] takes at most O(log ) operations. The costliest operation is finding the value of d min , which involves searching in the set U of size |C| . This value is recomputed for every source u ∈ C , destination w ∈ C and the set U ⊆ C changes with every iteration. The for loop on Line 14 will make at most O |C| 2 (|C| + log | |) .

The for loop over G t will make at most

Recall that ∑ t∈ m t = �E � . Summing over all times in , we deduce a complexity of O nm + n 2 log + n 3 . The final complexity follows from the fact that E ⊆ × V ⊗ V . ◻

Computing the minimum distance between all pairs of nodes in a connected component C, while accounting for the outer distances is tricky and leads to the factor of |V| 3 . We do not expect this can be much reduced. Indeed, in the worst case all nodes of a time-induced graph G t are reachable from the source from different starting times and are in the same connected component. Thus, |V| distances would have to be updated. Following Eq. (1), in order to update the distance from the source to any node w, we need to look at the distances from the source to any reachable node of G t as well as the distance from those nodes to w. In the worst case, this amounts to running a single-source shortest path method on G t from w. Doing this |V| times on each induced graphs leads to a lower bound of |V||E | .

with the same starting time s, that is R v [s] can grow linearly with | | . This is the simplest implementation of this dictionary that contains exactly all information. Since we only query for the largest starting time and the smallest distance (given that starting time), a more convenient way to implement R v is with max and min heaps. Thus, R v can be a maxheap while R v [s v ] , for any key s v , would be a min-heap of

is the last one. This defines the interval [a 1 v , a 2 v ] during which shortest paths of length d v are known which summarizes with one pair | | amount of information. This would remove the logarithmic factor from the complexity of Algorithm 1, which in practice leads to significant improvement. However, since the data structure is different, accessors to R v would have to be modified. Thus, if one specifically needs to know exactly

when a path of length d v arrives on v from (s v , s) , then this structure would be inadequate.

Proof This follows from Proposition 4.6 and Remark 4.7. ◻

We use the sets V, E and as parameters to evaluate the temporal complexities of our algorithms. These appear as natural choices since indicates how the temporal dimension affects the number of operations while E is a surrogate for E, which is in general uncountable.

Suppose is finite and starts on some time a. Algorithm 2 returns a set of dictionaries of sf-metrics D uv for each pair of nodes (u, v) ∈ V 2 of dictionary D uv [s uv ] = (a uv , d uv ) such that l (s uv , u), (a uv , v) = a uv − s uv and d ((s uv , u) , (a uv , v)) = d uv . During its execution, it updates a dictionary D 0 such that

This dictionary helps in computing D and in constructing R v from any source. It also returns a set of dictionaries F uv of latencies. Let us show that D 0 [u, v] [t v ] holds correct reachability triples from (a, u) to (t v , v) for any two nodes u, v and time t v . Thus, let us fix those three variables. Let us show this by induction on ∶= |{t ∈ ;a ≤ t ≤ t v }|.

O |V| 3 | | log | | operations.

-If = 1 , then either u and v are in the same connected component C of G t v or not. This part is clear. -Suppose the result holds for any k < . Let (t 1 , … , t −1 ) be the sequence of times previously iterated over. Let C v be the connected component containing v at time t . If u ∈ C v , then we argue as in the first case and the result follows. Otherwise, by the induction hypothesis, there must exist a largest starting time s v from u to (t , v) that can be found in SA [u, w] [t −1 ] , for some w ∈ C v since each such node w is connected to v. Observe that SA [u, v] [t −1 ] contains pairs of largest starting time and arrival time from u to (t −1 , v) . Observe also that t is again an arrival time on v. Thus, it suffices to compute the distance from (s v , u) to (t , v) to obtain a reachability triple (s v , t , d v ) from (a, u) to v. We argue as in the proof of Algorithm 1 that Algorithm 2 returns this distance d v . The update D [u, v] [t ][s * ] ← d * again follows the same reasoning as before.

◻ Proof of complexity Again, let n ∶= |V|, m t ∶= |E t | and ∶= | | . The costliest operations occur in the for loop starting on Line 11. There are at most keys on each SA uv , for any u, v ∈ V . For any t ∈ and u, v ∈ V , the size of SA uv [t] is upper-bounded by since the starting time is maximal. Thus, at most O(log ) operations are required. Finding the largest starting time s v requires in the worst case O |C v | log operations.

Dictionary D 0 uv [t] , for any u, v ∈ V and t ∈ , has a size at most 2 , thus the loop over C v (on Line 16) to find d min requires at most O |C v | log operations.

Meanwhile, inserting into SA uv takes at most O(log ) operations. The for loop on Line 11 thus makes at most:

operations. This loop is itself repeated for all connected components C ⊆ V(G t ) , which in turn yields:

operations. Thus, this method should make at most O n 2 + nm t + O n 3 log operations in the worst case on each time t. This number of operations is repeated at most times. Observe again that O nm ⊂ O n 3 . ◻

Observe that Algorithm 1 needs to be called |V| times in order to deduce the lengths of all shortest fastest paths from any source to any destination, since it discovers all starting times from each source. The multiple-sources algorithm is empirically faster when the desired output is the set of sfmetrics from all sources to all destinations. The temporal complexities of both methods are affected mostly by the induced graphs G t . In Sect. 4.4, we will see that complexities decrease drastically on cases such as -paths with > 0 since we can remove the dependency on those time-induced graphs. The complexity could also be reduced slightly with the use of heaps; however, this would make accessing information more difficult.

In the literature on temporal walks, it is commonly assumed that a path has a transmission delay and we call this a -path. This is the case with Wu et al. whose algorithm we wish to compare Algorithm 1 against since their shortest path procedure is the most efficient known in temporal networks. Transmission delays are natural when modeling the spread of information and the time required for spreading is commensurable with the time interval during which the network is observed. A -p a t h i n a l i n k s t r e a m i s a p a t h (t 1 , u 1 , u 2 ), … , (t n , u n , u n+1 ) such that t i ≥ t i−1 + for all 1 < i ≤ n and some ∈ ℝ + . 1 We call the delay and note that the usual path corresponds to a 0-path. An example -path is shown in Fig. 3 from (0, d) to (2, a) with = 1 2 . Note that with this choice of , the subpath from (1, c) to (2, a) is the unique path from the former to the latter temporal node. When > 0 , it is not necessary to iterate over connected components, since all nodes of a component do not communicate with each other, and we can simplify Algorithm 1 and 2 in order to reduce their numbers of operations. Thus, we present Algorithm 3 and 4 that are deduced from Algorithm 1 and 2 and assume > 0 . Their correctness and temporal complexities follow from the same arguments used in Proposition 4.6 and 4.9.

Observe that since paths have delays, we must ensure an edge appears long enough to be crossed given that delay. Proposition 4.10 When > 0 , Algorithm 3 computes the latencies and sf-metrics from a source event node to all reachable event nodes as well as the set of dictionaries R v , for all v ∈ V , in at most O |V| + |E | log | | operations.

Proof Correctness follows from the same reasoning as in Proposition 4.6. For the complexity, we first initialize a dictionary of |V| elements, then we iterate over all edges of E and perform logarithmic searches on lists that grow with | |.

Note that, by the same argument as for Algorithm 1, the complexity can be lowered to O |V| + |E | by using a combination of max and min heaps.

Finally, in Algorithm 3, the dictionaries d and f are implemented such that the keys are nodes and values are pairs (t, k) such that t is the time value k is computed at that node. For

, then the latency from the source to (t, v) is f v . This enables us to sort dictionaries by time.

We also extended Algorithm 2 to the case of > 0 and devised Algorithm 4. This method is also guaranteed by the proof of Algorithm 2.

Proof Correctness follows from Proposition 4.9. For complexity, initialization alone of the dictionaries requires O |V| 2 operations. Let (t, uv) be an edge of E . Then, searching in the dictionaries for each values takes at most O(log | |) operations when they are implemented as balanced trees. This is repeated for all nodes w ∈ V⧵{v} and all edges (t, uv) ∈ E .

We present some experiments to highlight the running times of our algorithms. In the first one, we compare Algorithm 3 with the single-source shortest path method from Wu et al. Algorithm 3 ( SSMD ) acts as a surrogate for Algorithm 1 ( SSMD ) since Wu et al. designed their method to work on paths with strictly positive delays. Moreover, these authors evaluated their method from a small set of source nodes on large datasets and we follow the same procedure. In a second experiment, we compared the running times of our two methods for null delays on synthetic link streams. We also compared Algorithm 2 ( MSMD ) and Algorithm 4 ( MSMD ) on synthetic link streams. Finally, we compared Algorithm 3 and 4 on the same task on some datasets.

Algorithm 2 was inspired by Xuan et al.'s fastest paths method that does not return distances. Comparing the two methods would be unfair against ours.

Remark 5.1 (Experimental setup) All experiments were run on a single machine with 2.6 GHz Intel Core i7 processor and 16 Gb of RAM. All methods were implemented in C++ with standard libraries, including Wu et al.'s method. We implemented standard approaches to compute connected components and all pairs distances in graphs. The full code can be found online (Simard 2019b). We ran experiments on link streams of various sizes, as measured with |V| , | | and |E | . We used the same datasets as Wu et al. and added some, randomly chose 200 different source nodes from each and ran both methods one after the other. The full results (in s) can be found in Table 1 . The running times of Wu et al.'s method are comparable to those of SSMD . Our method does more operations, since it must compute latencies as well and ensure the distances correspond to the sf-metrics, so this is encouraging and unexpected. All datasets are heterogenous, which explains the variability in running times and we have not yet pinpointed any hidden link stream parameter that would precisely explain this variability, including the measure used for the visualizations of Fig. 4 .

The datasets are only used as benchmarks. They all describe discrete temporal networks and can be found as part of the konect library of networks (Kunegis 2013) . Only the values of the parameters |V|, | | and |E | are extracted in Table 1 since only these were required for our experiments.

A short description of the datasets used follows in Appendix B. Figure 4 shows the temporal evolution of some network measures on two of the datasets used: Arxiv-Hepph and Wikiconflict. On each, we sampled 0.1% of the dataset to build a sublink stream. On each sublink stream and on each induced graph G t , we evaluated the density of G t and normalized the values to [0, 1]. We chose this measure as an indicator of the temporal evolution of each dataset. It shows how varied the datasets are. Even though in both cases we observe strong fluctuations in the density, in the Wikiconflict dataset the density appears stratified. Meanwhile, in the Arxiv-Hepph dataset this density is mostly low and takes more distinct values in the range [0, 0.4], which hints that its connectivity varies more than in the other dataset. The Wikiconflict therefore seems to have groups of edges of similar sizes appear at regular times, while in the other dataset there is a less obvious pattern in the appearances of edges. (Simard 2019a) showed SSMD to be unstable on some datasets compared with Wu et al. This can be explained by the data structure used to implement R . Note that R v [s v ] , for any node v and starting time s v , can grow as O(| |) which can be large. Thus, even logarithmic search in this structure can be costly. Notice also that we only ever need the largest key of R v and the smallest distance of R v [s v ] . Thus, we reimplemented dictionary R with (max/min)-heaps as explained in Remark 4.7. This provides constant time access to the largest/smallest element. This is a choice of implementation to increase the performance of this method and all results presented here that specifically test the performance of SSMD used that implementation. The new method is considerably faster and is comparable with Wu et al.'s method. We did not reimplement the latter's method using heaps because at each step they remove dominated elements which naturally tends to make their structures smaller. 

Algorithms SSMD and MSMD were run on a set of randomly generated link streams of size |V| ranging from 100 to 170, with increments of 10. Although the link streams are small in scale, the running times are significant since we compute the distances from every source to every destination. The link streams were constructed by generating Erdös-Renyi graphs G(n, p), with n = |V| and p = 0.7 . Then, on each edge (u, v), we drew a time instant t ∈ {0, 1, … , 7} uniformly at random and added both directed edges (t, uv) and (t, vu) to E. In this case, edges have no duration and the time instants are integers: this helps ensure the size of is fixed and small, so the running times scale only with |V| and |E |. Figure 5a shows the running times of each algorithms on a link stream with a fixed number of nodes. We observe that, as the number of nodes involved increases, the amount of time taken by SSMD grows faster than that of MSMD . This gives clear indication that this latter method is faster than the former. In terms of scale, the MSMD method manages a link stream of 170 nodes and about 20,000 edges in less than 3 s. Its counterpart takes more than 25 min for the same calculations.

Since MSMD is more scalable than SSMD , we generated a new set of link streams, again with the same process as before, with time instants drawn uniformly at random in the set {0, … , 10} while the duration of an edge (t, uv) is drawn uniformly at random in the subset {0, … , 10 − t} . In that case, the size of barely varies. We let |V| ∈ {10, 20, … , 190, 200} . The results are summarized in Fig. 5b . We fitted, with the statistical software R (R Core Team 2013), a linear model on the runtime of MSMD as function of |E | in order to extrapolate the runtime of this method for larger values of link stream parameters. Since the number of nodes and event times are low compared to the number of edges, the trend is linear in the number of edges. The fit is reasonable and this is sufficient to illustrate the scaling trend. Extrapolating, we obtain the values shown with the blue line. The trend does not suggest the method is at this point scalable to big link streams as in Fig. 4 Visualization of the temporal evolution of the density on sublink streams of two datasets, Arxiv-Hepph and Wikiconflict. Each sublink stream was constructed by sampling 0.1% of the dataset. On each sublink stream, we evaluated on each induced graph G t the density of G t . Values are normalized to [0, 1] general the sizes of V and would also grow to be much larger than in this experiment.

We ran algorithms MSMD and MSMD side-by-side on synthetic link streams to see how well the latter performs against the former. It is expected that MSMD would be faster. However, instead of looking at the trend on link streams with varying number of nodes, we investigated this trend against the density of the stream. Thus, we constructed link streams as before from graphs G(n, p), while varying the parameter p instead of n. The density of the resulting link stream varies with p. The number of nodes is fixed to 200 and edges have positive integer duration, to allow duration while keeping the cost of computation low.

Experimentally, the density of each time-induced graph of the link stream affects the performance of MSMD against MSMD . This can be seen in Fig. 6 : as the value of p increases, after a threshold around p ≈ 0.63 , MSMD takes longer to complete than MSMD . This suggests that when many edges appear at the same time, it becomes advantageous to compute all-pairs distances in the time-induced graphs and reuse them on each connected components. Note that these do not solve exactly the same problem since one works with > 0 while the other does not. However, if the choice of is a modeling parameter, 2 this can be interesting to take into account. It is not clear why the running time of MSMD is decreasing for 

Algorithm MSMD can terminate in reasonable time on some datasets, as opposed to MSMD . In order to probe this method further, we tested how long it would take for this method to terminate on some datasets against SSMD . We compared both methods on selected datasets on which SSMD took less than 10 s to finish (as observed in Table 1 ). This way, we could expect it to finish in a decent amount of time from all sources. We ran SSMD from all sources alongside MSMD . For the same task, it is faster to run the specialized method MSMD . However, note that both methods could finish in a short amount of time on real-world datasets, which is positive. Statistics on the running times of both methods are summarized in Table 3 . 

In this paper, we presented different algorithms to compute metrics between pairs of event nodes. As opposed to similar known algorithms, those methods return all metrics at once in a single pass over the dataset. Moreover, the starting and arrival times of shortest paths are returned, which is valuable information to compute, for example, the betweenness centrality of temporal nodes.

Algorithm SSMD works from a fixed source and is suitable when not all pairwise metrics are required. Our experiments show that SSMD is comparable to the state-of-theart method to compute distances from a source node to all other nodes. These results improve on previous ones (Simard 2019a) and make use of a more efficient data structure. Even though some experiments are advantageous to our methods, we do not claim they are in general faster than the state of the art. However for the task at hand of computing all metrics at once on a link stream, we think we can fairly conclude that our methods are usable in practice. The experiment in Sect. 5.1 was designed to illustrate if SSMD could be used in real-world setting. Thus, comparing it to another shortest path method is relevant since it does not do significantly more operations that this type of algorithm. One could surely implement the shortest path algorithm of Wu et al. to make it faster than SSMD .

In practice, MSMD has finished its task faster than its counterpart on synthetic link streams. Since the link streams used were smaller than what we would expect from realworld instances, we extrapolated the running times produced by MSMD . At this point, scalability is an issue, when = 0 , and we could not expect to run this method on realistic link streams and obtain results in a reasonable amount the time. Thus, in order to speed up the computation time, we suggest studying how to lessen the amount of operations in either methods by skipping some temporal nodes and extrapolating the distances. Also, finding ways not to have to recompute the connected components and the all-pairs distances at every time would also be helpful in improving both methods.

Both algorithms SSMD and MSMD , when > 0 , could finish their task on some datasets in a decent amount of time. Algorithm MSMD took less than 30 s to finish all tasks, even on datasets of almost 100, 000 edges. Thus, in that case, realistic datasets of decent size could be handled by our methods.

In light of our experiments as well as the recent literature (see Himmel et al. 2019) , it would be interesting to have a lower bound on the complexity of computing our metrics. Since shortest paths methods seem to be based on the same arguments as shortest paths algorithms in graphs, it might also be relevant to deduce a lower bound in terms of the latter. That is, how much slower is it to compute distances in a link stream as opposed to computing distances in a graph? Can this complexity be expressed in terms of natural parameters of the link stream such as V, and E ?

Aside from scalability, another limitation of this study lies in the ordering of the objective functions we chose to optimize. Namely, we compute lengths of shortest fastest paths. If one were to require lengths of fastest shortest paths, our methods would need to be redesigned. Moreover, the multitude of possible combinations of optimal paths to compute (foremost paths, shortest foremost paths, etc.) is not all considered in this work. Brunelli et al. tackled this limitation by building a more general framework for optimal paths. We believe our methods can be modified to compute some other types of paths combining temporal and structural information hierarchically, such as shortest foremost paths. In turn, those paths can be used to compute other centralities than the betweenness centrality or to investigate different topics such as reachability. In Algorithm 3, for example, whenever we consider an edge (t, uv) with u ≠ s , we have access to the last arrival time a u from the source at time s u . If we instead considered the first such arrival time, then we could decide if the path from (s u , s) to (t, v) that involves the edge (t, uv) is restless (see Casteigts et al. 2020) or not and update the relevant dictionaries accordingly, say of restless reachability triples. edia. org/ wiki/ Main_ Page. Each edge connects users to other users with whom they are in editing conflicts. 24. youtube-growth A social network. Extracted from the website youtu be. com. Edges connect users with their friends on the platform.

Extracted from the website digg. com. Edges connect two users when one replied to the other

Extracted from the 2016 Democratic National Committee email leak of the american Democratic Party. Edges connect two people if one sent an email to the other

Represents admin elections in the English Wikipedia. Edges connect two users if one voted for or against the other

The website seems deprecated

Data are extracted from a portion of the website Facebook faceb ook

Edges describe friendship connections on the website flickr. com. 13. lastfm band An interaction network between users and bands. From the website last. fm. Edges connect users to bands they listened to. 14. lastfm song An interaction network between users and songs listened

Edges connect people to threads in the mailing list they contributed to. 16. mit A human contact network

Extracted from the website twitt er. com. Edges connect users to the tags they used in tweets. 19. prosper loans An interaction network. Extracted from the website prosp er. com. Edges connect people (lenders) to other people (borrowers) to whom they lent money

Edges connect users when one replied to another

sociopatterns hyper A human contact network. Edges represent face-to-face contacts of more than 20 s

sociopatterns infect A human contact network. Edges represent face-to-face contacts of more than 20 s

On computing Pareto optimal paths in weighted time-dependent networks

Time-varying graphs and dynamic networks

Shortest, fastest, and foremost broadcast in dynamic networks

Dynamics of person-to-person interactions from distributed RFID sensor networks

Relevance of temporal cores for epidemic spread in temporal networks

The covid-19 social media infodemic

An overview of bilevel optimization

Building a reference combinatorial model for MANETs

Efficient computation of optimal temporal walks under waiting-time constraints

Temporal networks

Connectivity and inference problems for temporal networks

Konect: the koblenz network collection

The konect project

Stream graphs and link streams for the modeling of interactions over time

Accelerating minimum temporal paths query based on dynamic programming

The temporal event graph

Effect of risk perception on epidemic spreading in temporal networks

R: a language and environment for statistical computing. R Foundation for Statistical Computing

On computing distances and latencies in Link Streams

Computing betweenness centrality in link streams

SocioPatterns (2021) Sociopatterns collaboration. www. socio patte rns. org. Accessed

Gender homophily from spatial behavior in a primary school: a sociometric study

Characterising temporal distance and reachability in mobile and online social networks

Analysing information flows and key mediators through temporal centrality metrics

Restless reachability in temporal graphs

Path problems in temporal graphs

Computing shortest, fastest, and foremost journeys in dynamic networks

The authors would like to thank the Professors Paola Flocchini, Jean-Lou De Carufel and Nicola Santoro for their valuable insights.Funding This work was supported by the Fonds de Recherche Québécois Nature et Technologies (FRQNT) (Grant No. 197014).

Code availability Implementations of the four algorithms developed in this work are available online: Simard (2019b) . The datasets used are also available on the web: Kunegis (2019) .

Datasets with runtime less than 10 s are in bold 

The following datasets were used in the experiments. More documentation can be found online (Kunegis 2019) . Datasets are part of the KONECT project (Kunegis 2013) . Descriptions below are found in the directories of the datasets.1. arxiv-hepph A co-citation network from the website arXiv arxiv. org of scientific papers. Papers are taken from the high energy physics phenomenology (hep-ph) section. Two papers are linked if they are both cited by another paper, with the timestamp indicating the date of the latter's publication.