key: cord-0780940-pasyb39x authors: Yu, Pei-Duo; Tan, Chee Wei; Fu, Hung-Lin title: Epidemic Source Detection in Contact Tracing Networks: Epidemic Centrality in Graphs and Message-Passing Algorithms date: 2022-01-18 journal: IEEE Journal on Selected Topics in Signal Processing DOI: 10.1109/jstsp.2022.3153168 sha: cbf1f6ec22a4ce2b9a5520b3878353ab819fc20b doc_id: 780940 cord_uid: pasyb39x We study the epidemic source detection problem in contact tracing networks modeled as a graph-constrained maximum likelihood estimation problem using the susceptible-infected model in epidemiology. Based on a snapshot observation of the infection subgraph, we first study finite degree regular graphs and regular graphs with cycles separately, thereby establishing a mathematical equivalence in maximal likelihood ratio between the case of finite acyclic graphs and that of cyclic graphs. In particular, we show that the optimal solution of the maximum likelihood estimator can be refined to distances on graphs based on a novel statistical distance centrality that captures the optimality of the nonconvex problem. An efficient contact tracing algorithm is then proposed to solve the general case of finite degree-regular graphs with multiple cycles. Our performance evaluation on a variety of graphs shows that our algorithms outperform the existing state-of-the-art heuristics using contact tracing data from the SARS-CoV 2003 and COVID-19 pandemics by correctly identifying the superspreaders on some of the largest superspreading infection clusters in Singapore and Taiwan. The COVID-19 coronavirus pandemic has revealed severe deficiencies in public health protection [1] . As the COVID-19 disease is highly contagious and wide-ranging with long incubation periods (transmission rate of 3-5 persons within 6 feet), it becomes necessary to track down all the infected persons and their recent contacts once an outbreak has occurred. It is often necessary to account for the initial source of the outbreak (e.g., identification of superspreaders [2] , [3] ) so that public health is resilient against further outbreaks or to understand the underlying cause of secondary transmissions. Public health authorities worldwide employ contact tracing to address these problems [1] , [4] - [10] . In general, manual contact tracing is a complex and tedious process: Once a person has been diagnosed as infected, public health authorities fan out to trace the recent contacts of this person for the purpose of monitoring or quarantine. This process repeats if one of those contacts exhibits symptoms until all the contacts who have been exposed are out of circulation. The COVID-19 coronavirus epidemic has overwhelmed most contact tracing capabilities due to the speed and scale of infection [7] . For contact tracing to be efficient, especially to identify superspreaders in recurrent large-scale outbreaks during the COVID-19 pandemic, efficient and scalable contact tracing algorithms need to be developed [1] , [4] - [10] . Recently, there are public health protection schemes that employ mobile software technologies to use wireless signals (e.g., Bluetooth) to collect data on social contact connectivity for digital contact tracing in epidemiology [4] , [10] , [11] . These data lead to contact tracing networks that are essentially large graphs generated by stochastic processes. From an epidemiology perspective, all the infected persons in a contact tracing graph are potential candidates for tracking purposes as well as identification of Patient Zero or superspreaders [2] , [3] . A fundamental question in digital contact tracing is how to unravel stochastic spreading processes to find the initial outbreak source quickly, accurately and reliably with high confidence by exploiting the topological and statistical properties of contact tracing networks. The spreading of epidemics and rumors share much in common as stochastic processes in mathematical epidemiology [12] , [13] . Formulating the source detection as solving a maximum likelihood estimation problem was first studied in [14] using a network centrality called the rumor centrality to optimally solve a special case of degree-regular tree graph with countably infinite number of vertices and assuming a SI (susceptible-infectious) spreading model. There were a number of problem extensions subsequently, e.g., random increasing trees in [15] , probabilistic sampling in [16] , star graph topology in [17] , multiple sources or observations in [18] - [20] , Markov chain Monte Carlo based algorithms in [21] and probabilistic characterization of infection network boundaries in [22] . Source estimation for other spreading models include the Susceptible-Infected-Recovered model in [23] and a random branching process for irregular trees in [15] , [24] . Related work on proactive network protection against epidemics include [25] - [27] , e.g., a maximum likelihood algorithm in [27] that complements the work in [26] . How to optimally place observations concerning the efficiency and the cost and design message-passing algorithms were studied in [25] . Probabilistic inference methods that use machine learning techniques like graph neural networks and deep learning for epidemiology have been studied in [1] , [10] , [28] , [29] . To the best of our knowledge, there is no prior work that solves the problem in [14] when the network has cycles or arXiv:2201.06751v2 [cs.SI] 25 Feb 2022 finite graph boundaries even under the SI spreading model. This more general problem is computationally challenging due to the presence of irregular vertices (vertices without susceptible neighbors) and cycles. Irregular vertices are practical for modeling actual graph connectivity or users who are quarantined after being infected. In addition, the presence of cycles cannot be ignored. In essence, unlike the tree graphs, the cycles allow the stochastic spreading process to spread through multiple alternate paths, thus increasing the likelihood of those vertices on the cycle. As such, the presence of cycles and irregular vertices introduce nontrivial irregular effects that significantly shape the way that virus spread as well as the performance of detecting the original source on the infected graph. To be exact, existing algorithms in the literature, e.g., [14] , [30] - [32] , are no longer optimal even with the presence of a single cycle in degree-regular pseudo-tree graphs. In this paper, we address the outstanding issues of epidemic source detection for the general cases when there are irregular vertices and cycles. In particular, we demonstrate that the likelihood ratios between vertices of the infection graph provide insights to locating the most likely source, which leads to a new network centrality connecting graphtheoretic structures with estimation performance and enables low-complexity algorithms using the same spreading model as those studied in the literature [14] , [16] , [30] , [31] , [33] . The main contributions are summarized as follows: • We consider a general network (i.e., finite large network with cycles) for epidemic source detection, and analytically characterize how graph distances between vertices in an infection graph and a single cycle or irregular vertex affect the likelihood of each vertex being the source. • For a finite size degree-regular tree, we characterize the globally optimal maximum likelihood estimator based on graph distances and propose a scalable message-passing algorithm to compute this solution in linear time. • For a degree-regular graph with cycles, we prove that its epidemic center can be equal to the epidemic center of any of its spanning trees. We propose a polynomial-time algorithm to find the epidemic center of a graph with a single cycle by characterizing the optimal maximum likelihood estimator based on the graph distances. • We combine the results in finite-size degree regular graphs and unicyclic graphs and propose a polynomialtime distance-based algorithm to find the source estimator. For synthetic data, we conduct simulations on two finite-size regular graphs with cycles which are grid graphs and circulant graphs. We evaluate the performance by comparing the results with the BFS heuristic rumor centrality [14] , showing that the error (number of hops) of our source estimator is at least 50% smaller than BFS rumor center. Using real-world contact tracing data (due to SARS-CoV 2003 and COVID-19 pandemic), we apply our algorithm on some of the largest superspreading infection clusters in Singapore and Taiwan, correctly identifying the superspreader in this cluster. We model a contact network by an undirected graph G = (V, E), where the set of vertices V represents the vertices in the underlying network, and the set of edges E represents the links between the vertices. We shall assume that V is countably finite (this is the crucial departing point from the previous assumption of infinite graph in the literature [14] , [18] , [30] - [32] ). In this paper, we use the Susceptible-Infectious (SI) model in [12] , [13] , [34] to model virus spreading. Vertices that infected by the virus are called infected vertices and otherwise they are susceptible vertices. The spreading is initiated by a single vertex v ∈ V that we call the source. Once a vertex is infected, it stays infected and can in turn infect its susceptible neighbors. The virus can be spread from vertex i to vertex j if and only if there is an edge between them (i.e., (i, j) ∈ E). Let τ ij be the spreading time from i to j, which are random variables that are independently and exponentially distributed with parameter λ (without loss of generality, let λ = 1). Let S denote the set of all susceptible vertices that have at least one infected neighbor, i.e., those vertices in S might be infected in the near future. In the real world there are some people that are more likely to be infected by the virus, and some are more likely to spread the virus to others. We can assume that each person has two parameters say R i and R s which are corresponding to the rate of being infected and the rate of spreading the virus to others respectively. Due to the memoryless property of the exponential distribution. We have the fact that each newly infected vertex v is randomly chosen from S with the probability that where each u is an infected neighbor of v, R v i denote the infected rate of v and R u s denote the spreading rate of u. Hence, the probability of a vertex v a being infected in the next time period is defined as where u a and u represents each infected neighbor of v a and v respectively. Now, we have a random spreading model over an underlying finite graph G. We can view G as a contact network [35] - [38] where nodes represent individuals and edges represent social contacts such as two individuals have been in the same place or in close contact (within about 6 feet). Contact tracing is a process to identify the people who have been in contact with an infected individual [10] . Hence, an epidemic contact tracing network can be seen as a connected subgraph of a human contact network containing infected people. Let G n be a subgraph of order n of G, that models a snapshot observation of the spreading when there are n infected vertices, i.e., G n is a contact tracing network in G and |G n | = n. In the following, we shall call G n an infected subgraph of G for brevity. We denote the actual source in G n as v . The epidemic source detection problem is thus to find v given this observation of G n . For example, in Fig. 1 the infected I WE SUMMARIZE THE MAIN RESULTS OF THIS PAPER WITH A COMPARISON WITH PRIOR ART IN [14] IN THIS TABLE. ASSUME THE UNDERLYING NETWORK G IS A DEGREE-REGULAR TREE. WE DENOTE THE IRREGULAR VERTEX AS v ir AND THE EPIDEMIC CENTER AS vc. NOTE THAT, THE CASE THAT Gn CONTAINING AN IRREGULAR VERTEX CAN BE TREATED MATHEMATICALLY AS A SPECIAL CASE OF Gn CONTAINING A SINGLE CYCLE, SINCE AN IRREGULAR VERTEX CAN BE TREATED AS A SIZE ONE CYCLE WHICH IS ON THE BOUNDARY OF THE GRAPH. THEREFORE, THE LOGICAL FLOW OF THIS PAPER IS SUCH THAT WE FIRST CONSIDER THE IRREGULAR EFFECT CAUSED BY AN IRREGULAR VERTEX AND THEN ANALYSIS TO GRAPHS WITH subgraph are those vertices labeled from one 1 to 6, i.e., V (G n ) = {v 1 , v 2 , . . . , v 6 }, and the underlying network G is the whole graph including those dotted-line vertices and edges. Since we assume that the graph topology of G n is the only given information, i.e., two parameters R i and R s are unknown for all vertices in G n . Hence, we shall assume that R i = R s = 1 for all vertices and we have a simplified infection which implies that the probability of v a being infected is proportional to the number of its infected neighbors. Note that, when G is a tree network, this spreading model is equivalent to the one considered in [14] . However, the probability defined in (1) can be used to analyze more general cases such as graphs with cycles. In this paper, we aim to solve the epidemic source detection problem which is defined as follows: where P (G n |v) is the likelihood function assuming v is the source, and G is an almost d-regular graph with some irregular vertices, i.e., vertices with degree not equal to d. The maximum likelihood estimator for the epidemic source is the vertex v with the maximum P (G n |v) [14] . In this paper, we assume that all irregular vertices have degree less than d. Definition II.1. For a given infection subgraph G n over the underlying graph G,v is an maximum likelihood estimator for the epidemic source in G n , i.e., P (G n |v) = max vi∈Gn P (G n |v i ). In the remaining part of this section, we review the maximum likelihood estimation problem of this epidemic source in the simplest case: regular-tree networks. By Bayes' theorem, P (G n |v) is the probability that v is the actual epidemic infection source that leads to observing G n . Now, let σ i be the possible spreading order starting from v, and let M (v, G n ) be the collection of all σ i when v is the source in G n . Then, we have In particular, for a d-regular tree, we have [14] : . Now, if the spreading has not reached the irregular vertices, then P (σ i |v) = P (σ j |v) for all σ i , σ j ∈ M (v, G n ). By combining (4) and (5), we have If we treat the rooted tree as a partially ordered set, then a spreading order is a linear extension corresponding to this poset. Hence, the quantity |M (v, G n )| is actually the number all linear extensions of the poset G n rooting at v. The authors in [14] called |M (v, G n )| rumor centrality, which is crucial to solving the maximum likelihood estimation for degree-regular trees. We denote the vertex having the maximum |M (v, G n )| among all vertices in G n as v c , that is In particular, v c is called the rumor center when G n is a tree in [14] and the authors in [39] , [40] established its equivalence to the graph distance center and tree centroid. When G n is a general graph, rumor center is only defined on the BFS spanning tree of G n . To avoid confusion, in this paper we call the quantity |M (v, G n )| epidemic centrality and v c epidemic center for a general graph. Definition II.2. For a graph G and a vertex v ∈ G, the distance centrality of v can be defined as where d(u, v) is the shortest path distance, i.e., the number of edges along the shortest path from u to v. The distance center of a graph is the vertex with the smallest distance centrality. Fig. 1 . Example of G as a 3-regular tree except v 5 and Gn as a subtree with an irregular vertex v ir = v 5 . The maximum likelihood estimatev is v 1 , moreover, the likelihood of v 5 is also greater than v 2 . While a naive application of the rumor centrality in [30] , i.e., the rumor center vc of Gn, yields both v 1 and v 2 . III. TREES WITH A SINGLE DEGREE-ONE IRREGULAR VERTEX In this section we study the effect on the maximum likelihood due to an irregular vertex in a regular tree network. Assuming that the original underlying network is a regular tree, this irregular vertex is a vertex with different number of neighborhoods compared to other vertices. For example, in a regular tree G of bounded size, if the infected subgraph G n contains a leaf v of G, then v is an irregular vertex in G since it has no other neighbors except its parent vertex. This models a person under quarantine in a contact tracing network with fewer neighborhoods than other people. We consider the case where the degree of the irregular vertex is less than that of all other vertices. Note that a degreeone irregular vertex is a leaf of both G and G n . Now, assume that the infection subgraph G n only has a single irregular vertex denoted as v ir , where deg(v ir ) < d in G. Take Fig. 1 for example, we have v ir = v 5 since deg(v ir ) = 2 and other vertices are of degree 3 in G. In the following, we study how v ir affects the maximum-likelihood estimation performance. In particular, we compare this single irregular vertex special case with a naive prediction that assumes an underlying infinite graph. This illustrates that ignoring the irregular effect in the finite graph ultimately leads to a wrong estimate and thus requires an in-depth analysis and new epidemic source detection algorithm design for the general case of finite graphs. A. Impact of the Irregularity On P (G n |v) Example 1. Let G be a infinite 3-regular tree except v 5 has a different degree, and G 6 = is a subgraph of G shown in Fig. 1 . Consider P (G 6 |v 1 ) and with a spreading order Had v 5 not been the irregular vertex, then P (σ|v 1 ) = (1/3)·(1/4)·(1/5)· (1/6) · (1/7). This demonstrates that the order at which the virus spreads to the irregular vertex v 5 is important when computing P (σ|v 1 ). In particular, P (G 6 |v 1 ) ≈ 0.0149. We also have P (G 5 |v 2 ) ≈ 0.0114. Now, observe that v 1 and v 2 are two vertices with the largest epidemic centrality among all vertices in G 6 , but P (G 6 |v 1 ) > P (G 6 |v 2 ), and thusv = v. Note that, the likelihood of v 5 being the source is also greater than that of v 2 even v 2 has a greater epidemic centrality. Example 1 reveals some interesting properties of irregular effects due to even a single irregular vertex: • P (σ i |v) increases with how soon the irregular vertex appears in σ i (as ordered from left to right of σ i ). • When there is at least one irregular vertex in G n , then This means that P (σ i |v) is no longer a constant for each i, and depends on the position of the irregular vertex in each spreading order. We proceed to compute P (σ i |v) as follows. For brevity of notation, let v ir be the irregular vertex and let is the set of all the spreading orders starting from v and with v ir at the kth position, and its size is the combinatorial object of interest: Let D be the distance (in terms of number of hops) from v to v ir . Then we have Now, (8) This decomposition allows us to handle the irregular effect due to the different position of the irregular vertex in each spreading order. For example, in Table II we list out all m vir v (G n , k) for each possible starting vertex v and the position k of v ir . Let P vir v (G n , k) be the corresponding probability for each k. We can rewrite P (G n |v) for the case with single irregularity as: Thus, solving (3) means finding the vertexv that solves Since where the first factor of P vir v (G n , k) in (11) is the probability that k vertices are infected once the virus reaches the irregular vertex, i.e., v ir is the kth vertex infected in G n , and the second factor is the probability that all remaining n − k vertices are infected thereafter. On the other hand, the value of m vir v (G n , k) in (7) is dependent on the network topology, and thus there is no closed-form expression in general (though when G n is a line, a closed-form expression for m vir v (G n , k) is given in (12)). We now use a special case, line graph, to demonstrate how an irregular vertex and network topology affect the probability P (G n |v). Suppose G is a finite degree-regular tree and G n is a line graph with a single irregular vertex due to the bounded size of G. Without loss of generality, suppose n is odd (to ensure a unique v c ) and n = 2t+1 for some t. We label all the vertices in G n from 1 to 2t+1 and assume that v 2t+1 is the end vertex, i.e.,irregular vertex with degree 1. To compute P (G n |v i ) for v i ∈ G n , from (9) and (11), we already have P vir vi (G n , k), so we need to compute m vir vi (G n , k). The enumeration of m vir vi (G n , k) can be accomplished in polynomial-time complexity with a path-counting message-passing algorithm (see, e.g., Chapter 16 in [41] ). In particular, we have a closed-form expression for m vir v (G n , k) given by: when i = n, leading to an analytical formula for P (G n |v i ): where P vir vi (G n , k) is given in (11) . In (13), we suppose that n is odd. Using (13), let us numerically compute P (G n |v i ) for all v i in Fig. 2 , where G is a 4-regular tree and G n is a line graph with a single irregular vertex v ir = v n as boundary for different values of n = 7, 8, 9, 10 . The x-axis is the vertex v i where i = 1, 2, . . . , 10, and the y-axis plots P (v i = v |G n ). Fig. 2 illustrates that the influence due to the irregular vertex on P (v i = v |G n ) dominates that of the epidemic center when n = 7, 8, 9. However, the situation reverses when n = 10 (i.e., the epidemic center on P (G n |v i ) is dominant thereafter). at one end of the line graph, then there is a constant j such that P (G n |v c ) > P (G n |v ir ) when n > j. Remark: When n increases, i.e., the distance between v c and v ir increases, then the location ofv in G n converges to the neighborhood of the epidemic center. Example 2. To verify Theorem 1, we plot P (G n |v i ) for a line graph G n with G being a finite 4-regular graph in Fig. 2 . Clearly, we have j = 9. Theorem 1 implies that, for any d-regular underlying graph, when G n is a line graph with a single irregular vertex, the influence of the irregular vertex v ir on P (v i |G n ) decreases monotonically as n grows. In fact, this reduces to the special case in [30] , when n goes to infinity asymptotically, i.e.,v is the ML estimation of the source. Example 1 reveals that in addition to the spreading order, the distance (number of hops) between the irregular vertex and v also affects the likelihood probability P (G n |v). We can conclude that there are two factors affect the likelihood of v being the source, one is the distance d(v, v ir ) and the other one is the epidemic centrality , then the above observation may lead us to P (G n |v a ) > P (G n |v b ). We formalize this optimality result that characterizes the probabilistic inference performance between any two vertices and the location ofv in G n with a single irregular vertex v ir in the next theorem. Lemma 1. Let G be a tree of size n, for any three vertices say v a , v b and v ir , if one of the following conditions is satisfied for all possible k. Remark: Lemma 1 applies for any positive degree of v ir . We can verify Lemma 1 by Fig. 1 which satisfies the first condition of Lemma 1. Hence, the partial sum of |M (v 1 , G 6 )| is always greater or equal to the partial sum of |M (v 2 , G 6 )|. Then, the maximum likelihood estimatorv with maximum probability P (G n |v) is located on the path from the v c to v ir . The above theorem can be immediately deduced from Lemma 1. In addition, we can leverage Lemma 1 in the case when deg(v ir ) > d to reduce the search space of the optimization problem (3). We can conclude only that the ML estimator is on the path from v c to v ir , which is illustrated in Fig. 3 , thus narrowing down the search for the ML estimator to a vertex on this path. In this section, we consider the case when G n has more than a one irregular vertex (naturally, this also means d > 2 in G ruling out the trivial case of G being a line). The key insight from the single irregular vertex analysis still holds: Once the virus reaches an irregular vertex in G,v can be located near this very first infected irregular vertex. In addition, the algorithm design approach is to decompose the graph into subtrees to narrow the search for the maximum-likelihood estimate solution. To better understand the difficulty of solving the general case, we start with a special case: The entire finite underlying network is infected, i.e., G n = G, then P (G n |v) = 1/n for each vertex in G n , as each vertex is equally likely to spread the virus to all the other vertices in G to yield G n = G. In this case, P (G n |v) is exactly the minimum detection probability. So the bound of P (G n |v) given in previous study are not suitable for the case with irregular vertices. Therefore, when simulating the virus spreading in a network, we will set an upper bound n/k of the number of irregular vertices where k is some integer greater than 1, once the number of irregular vertices in G n reaches to n/k , then we will stop the spreading process. In Section III, we have shown that, when G n is sufficiently large, the effect of the single irregular vertex on P (v|G n ) for each vertex v on the line graph G n is dominated by v c . Now, we study the effect of multiple degree-one irregular vertices on a class of graphs whose topology is richer than the line graph in Section III. In particular, as shown in Fig. 4 , we add degree-one irregular vertices to v 2t , so that when G is d-regular, then there will be at most d − 1 degree-one irregular vertices in G n . We call this the broom graph. We can compute P (v|G n ) by extending the result in Section III. Let P , we can now compute the probability P (v i |G n ) by going through all possible {h 1 , h 2 , . . . , h k }. Fig. 5 shows that even though there are five irregular vertices, the effect of v c on P (v|G n ) eventually dominates that of the irregular vertices as n grows from 37 to 39. These result implies that: When there are more degreeone irregular vertices in G n , n needs to be sufficiently large to offset the effect of irregular vertices, i.e., for the transition phenomenon to take place. For other d and n in the broom graph, as shown in the proof of Theorem 1, we can prove this in the same way to conclude that, if we fix the number of irregular vertex, the probability P (v c |G n ) will be greater than P (v ir |G n ) when n is large enough. We propose a message-passing algorithm to findv on the finite regular tree G by leveraging the key insights derived in the previous sections. We summarize these features as follows: 1) If there is only a single irregular vertex v ir in G n , then v is located on the path from v c to v ir . 2) If G n = G, then for all v i ∈ G n , P (G n |v i ) = 1/n. 3) If G n has q degree-one irregular vertices, then there exists an n such that, if n > n , then P (G n |v c ) > max 1≤i≤k {P (G n |v ui )}. Furthermore, n increases as q increases. 4) If two vertices v 1 and v 2 are on the symmetric position of G n , then P (G n |v 1 ) = P (G n |v 2 ). For example, u 1 , u 2 , . . . , u k are topologically symmetric in Fig. 4 . In particular, Feature 1 is the optimality result related to decomposing G n into subtrees to search forv. The subtree t M L in G n corresponds to first finding the decomposed subtree containing v c and the likelihood estimate needed for Theorem 2 to apply. Then, Features 3 and 4 identifyv on a subtree t M L of G n as Theorem 2 only pinpoints the relative position ofv. Algorithm 1 first finds the epidemic center of G n and then determines the number of irregular vertices corresponding to each branch of the epidemic center v c . The final step is to collect vertices on the subtree wherev is, leading to a subtree of G n denoted as t M L . Observe that each step requires O(n) computational time complexity and t M L in a graph with multiple irregular vertices is akin to the path from v c to the irregular vertex in an infected subgraph with a single irregular vertex in Section III. Finally, we obtain a set κ containing the parent vertices of the leaves of t M L and v c . Algorithm 1 Message-passing algorithm to computev for G n with multiple irregular vertices Input: G n , κ = {} Step 1: Compute v c of G n . Step 2: Choose v c as the root of a tree and use a messagepassing algorithm to count the number of irregular vertices on each branch of this rooted tree. Step Next, we use the example in Fig. 6 to illustrate Algorithm 1. Let G 19 be the network in Fig. 6 with the six degreeone irregular vertices depicted as more shaded. Suppose v c is determined by the end of Step 1. Then, Step 2 enumerates the number of irregular vertices at each branch of the subtrees connected to v c , and these numbers are then passed iteratively from the leaves to v c . These messages correspond to the numerical value on the edges in Fig. 6 . The message in Step 2 is an upward (leaf-to-root) message. Step 3 is a message passing procedure from v c back to the leaves, which is a downward message, and the message is the maximum of number of irregular vertices in each branch. For example, the message from v c to child(v c ) is max{1, 2, 3} which is 3. Lastly, the second part of Step 3 collects those vertices whose upward message = downward message. For example, the left hand-side child of v c is first added to t M L , and then v t is added to t M L , and finally, the two leaves on the left hand-side is added to t M L . Observe that t M L must be connected. We simulate the virus spreading on finite regular tree networks and general tree networks with |G| = 1000 and |G n | = 100. Each vertex in a general tree network is not larger than a positive integer d m . The construction of G is a random branching process in which we start with a single vertex v 1 , and then randomly pick an integer, say i, from 1 to d m − 1 to be the number of children of v 1 , and then to assign v 2 to v i+1 to be the neighborhood of v 1 . Recursively applying these steps generates a finite tree G with five thousand vertices whose maximum degree is not larger than d m . We simulate a thousand times the spread of the virus on G by picking v , the true source, uniformly on G, and compare the average performance of Algorithm 1, a naive heuristic that simply uses the rumor centrality (RC) [14] , Jordan centrality (JC) [23] , and a spectral based method called Dynamical Age (DA) [42] . To fairly compare these algorithms, when Algorithm 1 yields a set with |κ| vertices, then other methods find a set of |κ| vertices having the top |κ| maximum score for all v of G n . Obviously, the size of the solution set |κ| depends on the topology of G n in each run of the simulation, and thus is not a constant in general over that thousand times. To quantify the performance of these two algorithms, let us define the error function of a vertex set η: This is the smallest number of hops between v and the nearest vertex in the set η. As shown in Table IV , we can observe that the number of vertices in κ is surprisingly small, moreover, |κ| is decreasing as d m increases. V. PSEUDO-TREES WITH A CYCLE Definition V.1. A pseudo-tree is a connected graph with equal number of vertices and edges, i.e., a tree plus an edge that creates a cycle. In this section, we consider the special case where G is a degree-regular graph, and G n has only a single cycle, i.e., G n is a pseudo-tree. We denote the cycle as C h where h is the size (number of vertices on the cycle) of the cycle. Here, we call those vertices on C h cycle vertices. Assume v . G 6 is an infected subgraph with a single cycle C 3 containing three cycle vertices v 1 , v 2 and v 3 . We can partition G 6 into three subtrees say is a cycle vertex, then we define t v to be the subtree rooted at v in G n . Take Fig. 7 for example, t v1 is the subtree that contains v 1 , v 4 and v 7 . In this section, we study how a cycle affects the probability P (v|G n ) when G n contains a cycle C h . To generalize the analysis in [14] , we should intuitively assume that the probability of being infected is proportional to the number of infected neighborhoods. With this assumption, the analysis in [14] will not change, but we can consider the case that two infected vertices have a common susceptible neighborhood, i.e., there is a cycle in G n . A. Impact of a Single Cycle On P (G n |v) Example 3. Consider the infected subgraph G 6 ⊂ G as shown in Fig. 7 , where G 6 = {v 1 , v 2 , v 3 , v 4 , v 5 , v 7 } and there is a 3cycle in G 6 . Consider a spreading order We have P (σ 1 |v 4 ) = (1/3) · (1/4) · (2/5) · (1/4) · (1/5) = 2/1200. Note that when v 1 and v 2 are infected, v 3 has two infected neighborhoods which implies that the probability of v 3 be infected in the next time period is twice higher than v 5 , v 7 and v 8 . In particular, there are three possible values for P (σ i |v 4 ) as shown in Table V , for all σ i ∈ M (v 4 , G 6 ). Moreover, we observe that the denominators are different in Table V , due to sharing a common neighbor in the presence of a cycle. We call this property the cycle effect. Example 3 reveals some interesting properties of the cycle effect due to a single cycle: 1) P (σ i |v) increases with how soon the last cycle vertex appears in σ i (as ordered from left to right of σ i ). For example, the last cycle vertex on σ 1 is v 2 , and is v 3 on σ 2 and σ 3 . 2) When there is a cycle in G n , then P (G n |v) is no longer proportional to |M (v, G n )|. 3) For each σ i , there are actually two corresponding permitted spreading orders due to the cycle. The first property shows that P (σ i |v) is dependent on the position of the last cycle vertex in each spreading order. Note that the cycle effect is similar to the irregular effect, and the main difference lies in that all cycle vertices may cause the cycle effect instead of only one irregular vertex may cause the irregular effect. We proceed to compute P (σ i |v) as follows. For brevity of notation, let v l denote the last cycle vertex. Definition V.2. We let the distance from a vertex v to the cycle C h denoted by d(v, C h ) be defined by the minimum value of distances from v to all cycle vertices on C h . That is, Take Fig. 7 for example, let Remark: For each σ ∈ |M (v, G n )|, v l can be any vertex on From previous observations, we have since the position of v l on the spreading order ranges from d(v, C h ) + h to n − t v l + 1. For example, in Table V , we can see that v l = v 2 is the 4th element on σ 1 and v l = v 3 is the 6th element on σ 3 whence the order 4 comes from d(v 4 , C 3 )+h = 1 + 3, and the order 6 comes from n − t v l + 1 = 6 − 1 + 1. Finally, the multiplication with 2 is due to the third property. Now, we can rewrite P (G n |v) for G n with a cycle as: and our goal is to find the vertexv that achieves P (G n |v) = max vi∈Gn P (G n |v i ). Since P (G n |v) is not proportional to |M (v, G n )|, we should compute P (G n |v) by considering each part m v l v (G n , k) and their corresponding probability The first factor in (18) is the probability that k vertices are infected where the kth infected vertex is v l , and the second factor is the probability of that all remaining n − k vertices being infected thereafter. The −1 in the denominator of the second factor and the coefficient 2 at the front are due to the common neighbor in a cycle. Note that multiplying by 2 at the front makes no difference when computing P (G n |v) for each v ∈ G n . From (16) , we see that the number of spreading orders and the corresponding position of v l affect P (G n |v). In this section, we focus on computing |M (v, G n )|. To compute |M (v, G n )|, we can leverage the message-passing algorithm in [14] if G n is a tree. Observe that for each infected vertex in G n , it is infected by one of its infected neighbors Fig. 8. C h is constructed by v 1 , v 2 , ..., v h , and t i is a subtree rooted at v i . (even if it has two infected neighbors), so the actual infecting route is a spanning tree of G n instead of a graph with cycle. Hence, the number of all spreading orders on a graph G n with a cycle can be computed as where T i is the spanning tree of G n , for i = 1, 2, ..., h. If G n contains a C h , then the time complexity of computing Since G n has h spanning trees, and for each spanning tree, we can use the messagepassing algorithm in [14] that has O(n) time complexity. In this section, we propose a theorem and a lemma to characterize the location of epidemic center in G n . Instead of computing |M (v, G n )| for all v ∈ G n in each spanning tree, we leverage some analytical results to find v c . Let t i = t vi be defined as above, and we slightly abuse the notation of the subtree size |t i | as t i . Theorem 3. Let G be a degree regular graph, and G n be a subgraph of G with a single cycle C h = {v 1 , v 2 , ..., v h }. The epidemic center v c of G n satisfies one of the following condition: 1) Each connected component of G n \{v c } is of size less or equal to n/2. 2) v c is a cycle vertex and t i ≤ n/2 for i = 1, 2, ..., h. Remark: Note that v being epidemic center of G n does not mean that each connected component of G n \{v} is of size less or equal to n/2. (Had G n been a tree, then this is true [14] .) However, condition 1 is the sufficient condition of a vertex being the epidemic center even in a general graph. We can locate v c of G n by the first condition in Theorem 3, however, if the first condition is not satisfied, then v c is on the cycle. In the following, we proceed to consider the case that if v c is a cycle vertex. Let T j denote the spanning tree of G n which is constructed by G n \(v j , v j+1 ), for j = 1, 2, ..., h − 1 and T h = G n \(v h , v 1 ). Note that (v h , v 1 ) and (v j , v j + 1) for j = 1, 2, ..., h − 1 are cycle edges of C h . Proposition 1. Let v i be a cycle vertex and assume that |M (v i , T p )| = r, where r and p are integers and 1 ≤ p ≤ h. Then for 1 ≤ q ≤ h, we have where T i p,j is the subtree T i j of the spanning tree T p . The ratio of |M (v i , T p )|/|M (v j , T p )| is proportional to their branch size in T p if v i and v j are adjacent. Now, for the same vertex v i , but in different spanning tree say T p and T x , we can also derive the ratio |M (v i , T p )|/|M (v i , T q )| from (20) . Hence, if we assume |M (v 1 , T 1 )| = r, then we can derive |M (v, T )| for all v ∈ C h and T is a spanning tree of G n in terms of r and t i , where t i is the subtree size as shown in Fig. 8 . The time complexity to find v c is O(n + h 2 ) in the worst case, where h is the size of the cycle in G n . By Theorem 3 and Proposition 1, we conclude that if G n contains a C 3 , then v c of G n is either a vertex that satisfies the first condition or a vertex v i on C 3 with t i = max 1≤j≤3 t j . Lastly, we combine Lemma 1 and Theorem 3 to characterize the location of the ML estimator on regular pseudo-tree. Theorem 4. Let G and G n be defined as in Theorem 3. The optimal solution to (3) is either on the path from the epidemic center of G n to the cycle or on the cycle. Besides, Theorem 4 generalizes the results in [43] , [44] . WITH CYCLES This section proposes a novel distance-based algorithm to solve the epidemic source detection problem on a finite degree regular graph with cycles. From Theorem 2, we can deduce that the likelihood of a vertex is greater if its distance to those irregular vertices and cycles is smaller. Hence, the maximum likelihood estimator should lie on the smallest induced subgraph containing three specific vertices: the epidemic center, the vertex closest to all cycles, and the vertex closest to all irregular vertices. Note that a degree on irregular vertex can be treated mathematically as a size one cycle. Hence we combine the irregular effect and the cycle effect in our algorithm. Definition VI.1. We say is a cycle is a minimum cycle if there is no path between any two non-consecutive cycle vertices except the path along the cycle. 18 Fig. 9 . The infected subgraph G 10 contains ten nodes colored in grey. The epidemic centralities of v 8 and v 10 are the same, however, v 10 is the MLE of the true source in G 10 . Since the irregular effect caused by a small cycle is greater than that of large cycle. Since a vertex v can be contained in multiple different-size cycles, we only take the minimum cycle that contains v into consideration in our algorithm. Let C(v) denote the size of the minimum cycle containing v. If v is not in any cycle and deg(v) > 1, then we set C(v) = ∞, otherwise C(v) = 1. Note that when deg(v) = 1, v is regarded as a size 1 cycle. Base on Theorem 2,4, we heuristically define the weight w v of a vertex v as Since we define the distance center to be the vertex with minimum distance centrality (cf. Definition II.2), we can design a weight such that the location of the ML estimator tends to be close to vertices with "small weights". This is also motivated by the fact that the likelihood of a vertex v being the source is greater if v has a larger epidemic centrality and is closer to those irregular vertices or cycles (cf. Theorem 2 and Theorem 4). The definition of SDC(v, G n ) is a distancebased centrality. Furthermore, the definition of w v reveals that the irregular effect caused by a small cycle is greater than that caused by a large cycle which can be observed from Table V and (16). This definition implies that a vertex within a smaller cycle has a smaller weight which contributes "more" to SDC(v, G) while a vertex not in any cycle has weight 1 which contributes "less" to SDC(v, G). Fig. 9 illustrates such an example of a regular graph G and the infected subgraph G 10 containing two different-sizes cycles, say C 3 and C 4 . Note that, v 10 and v 8 have the same epidemic centrality, however, we have P (G 10 |v 10 ) > P (G 10 |v 8 ) since v 10 is closer to the smaller cycle than v 8 . Definition VI.2. Given a d-regular graph G and vertex v of G, we define the statistical distance centrality of v, SDC(v, G) as the summation of the weighted distance from v to all other vertices in G. Hence, the statistical distance centrality of v in G is defined by The vertex v s with the minimum value for SDC(v, G) is called the statistical distance center. Algorithm 2 is based on the idea of message passing. Let lv(v) denote the level of v in a BFS tree. In Step 2, for a given root v r , we start a message-passing procedure in a BFS traversal to send a downward message containing level Algorithm 2 Statistical Distance-based Contact Tracing (SCT) Input: G n Step 1: For each vertex v, compute the size C(v) of the minimum cycle containing v, and set w v = C(v) C(v)+1 . Step 2: For each vertex v, compute SDC(v, G n ). Step 3: Letv = argmin v∈G SDC(v, G). information to other vertices in the BFS tree. Upon receipt of this information, each leaf v l sends back an upward message containing w v l · lv(v l ) to its parent. Each internal vertex v in sends an upward message, containing the summation of all message from its children plus w vin · lv(v in ), to its parent. In the following, we provide a time complexity analysis of Algorithm 2. For Step 1, the worst case time complexity is O(|C min | · |E(G n )|) [45] , where |C min | is the number of all minimum cycles in G n . Since G is a d-regular graph, each vertex in G n is contained in at most d minimum cycles which implies C min ≤ d · n. The worst case time complexity for the Step 2 in the Algorithm 2 is O(n 3 ), since the BFS traversal for each vertex takes O(n + |E(G n )|). Hence, the worst case time complexity of Algorithm 2 is O(d · n 3 ). In comparison with the BFS heuristic approach in [14] , it applies the BFS traversal for each vertex and compute their epidemic centrality which ends up with worst time complexity O(n 3 ). We provide simulation results on different finite graphs with cycles, the first two simulations are conducted on synthetic graphs such as finite size grid graph and circulant graphs. Both synthetic graphs are regular graphs with cycles except those vertices on the boundary of the grid graph. In synthetic graphs, we first simulate the virus spreading in a given network based on the model described in Section II, to construct the infected subgraph G n . Then, we apply Algorithm SCT to compute the source estimator. We conduct the other four experiments on real-world SARS-CoV2003 and COVID-19 contact tracing networks in Singapore and Taiwan. If we can identify the connection between any two confirmed cases in real-world contact tracing networks, we denote the connection (or contact) as an edge. However, when the number of confirmed cases is too large to record details of contact information, we can only have information about the geographical footprint for some confirmed cases. In this situation, we also denote those visited places as vertices, and we add an edge between a confirmed case and a place if the confirmed case had visited the place. Since G is unknown in practice and contact tracing networks are infected subgraphs G n , we assume that G is a regular graph with a few irregular vertices, and apply Algorithm SCT to the contact tracing networks to compute the source estimator. We use the graph distance from the actual source to the estimator to evaluate its performance. 1) Grid Graph: Disease spreading on grid graph is often considered under different spreading rules and models [34] , [46] . Hence, we select grid graph to be one of the testing synthetic networks. Simulation results are in Table VI and Fig. 10 . Comparing the error distribution (in the number of hops) between Algorithm SCT and the BFS heuristic [14] in a finite grid graph with |G| = 10000 and |Gn| = 150. In particular, the rate of the correct detection, i.e., error = 0, is 12.1% for Algorithm SCT and 2.6% for the BFS heuristic. one of the error distributions is in Fig. 10 . We can observe from Table VI that the statistical distance based algorithm outperforms the BFS heuristic in [14] . Moreover, the average error is increasing as the number of irregular vertices is increasing which again reveals the fact that the likelihood is evened out to those irregular vertices. 2) Circulant Graph: A circulant graph G = (N ; S) is a class of graphs which can be defined by its vertex set V (G) = Z and a set S. The edge set is defined by Note that a circulant graph G is connected if and only if S generates the integer group Z N and we only consider the connected circulant graph. In the simulation, we fix the size of N and S and randomly choose integers from the interval [1, N/2) to form the set S. 3) SARS-CoV2003 Contact Tracing Network in Taiwan: We reconstruct the contact tracing network data of SARS- CoV2003 Taiwan from a graph, which indicates Potential bridges among hospitals and households, in [47] . In the original data, there are four types of nodes which represent the confirmed case, suspected case, hospital, and area, respectively. Since cities or countries provide no information for personal contact, we delete all area nodes from the original data. In addition, we also delete all the nodes that represent suspected cases. We apply Algorithm SCT on this infected network and correctly identify the first place, Taipei Municipal Heping Hospital (now Taipei City Hospital Heping Branch), of cluster infection in April 2003 in Taiwan. In addition, the BFS heuristic approach chooses the red vertex, which represents a confirmed case (not the first case) who had been to Taipei Municipal Heping Hospital. The network graph is shown in Fig. 12 , and the orange vertex is the statistical distance center representing the Taipei Municipal Heping Hospital. Fig. 13 . Each vertex is a cluster (place), and the number on each vertex is the total amount of infected people who have visited the place. On April 3, four places form a cycle, and all vertices are on the cycle. Algorithm SCT suggests that the source estimator is WTG where the first case in this subgraph comes from. On April 10, the epidemic center is S11 due to the link between S11 and STL. Tracing Network in Singapore, 2020 Mar-Apr: The contact tracing network is an unconnected network due to the asymptomatic carriers, so we focus on the largest connected subgraph (cluster), including several worker dormitories and a construction site. We apply Algorithm SCT on subgraphs of the contact tracing network in Singapore provided in [48] . In our computation, each vertex represents either an infected person or a place where the person had visited. An edge between two vertices implies that either a person has visited a place or two places have at least one common visitor. Here we omit the edge of person-person contact since most of the contact history can only be traced back to a place, not a single person. Hence, we treat each person-vertex as a leaf vertex connecting to a place. The first massive outbreak occurred at the beginning of April and peaked on April 20. Hence, we consider the infected subgraph after April 1. We define the source in the connected subgraph to be the first case in this connected subgraph. As far as we know, Case 655 attaching to Westlite Toh Guan (WTG) is the first case found in this subgraph on March 26. On April 3, the connected subgraph formed a 4-cycle, which is shown in Fig. 13 . We can apply Algorithm SCT, which shows that WTG is the source estimator in this subgraph. After April 10, S11 Dorm (S11) becomes the new epidemic center due to the link between STL and S11 Dorm (S11). Note that S11 contains the second earliest case in this cluster and becomes the largest cluster in Singapore, which has more than two thousand cases confirmed in the middle of May. 5) COVID-19 Contact Tracing Network in Taiwan, 2021 Feb and 2021 May: We conduct experiments on two cluster infection in Taiwan recently. The first cluster infection originated from a northern Taiwan hospital on February. This network contains 18 tractable domestic cases. The reason we select this networked data is that the contact network is public information provided by Central Epidemic Command Center in Taiwan [49] and the relation between cases in this network is clearly defined. Note that if we apply the BFS heuristic, then both case 838 and case 856 have the same possibility to be the source estimator. However, case 838 is the vertex with maximum statistical distance centrality, i.e., Algorithm SCT correctly identifies the first domestic case. The second cluster infection is the latest cluster infection found at the beginning of May 2021. As the source of this cluster is unknown, we let the source be the person with the earliest symptom onset. We collect the data before May 14,2021 from [49] , and apply the 2-mode network model [47] to this cluster. Each vertex in this graph is either a workplace or a confirmed case. Both Algorithm SCT and the BFS heuristic identify the workplace, of the first case in this network. The contact tracing network is shown in Fig. 15 , and the red vertex is the source estimator determined by both algorithms. 6) Simulation Results on Other Networks: In addition to the rumor centrality approach, we have selected two other approaches, Dynamical Age [42] and Jordan Centrality [23] , to compare to Algorithm SCT. We use the average distanceerror to measure the performance of each algorithm. The simulation results are shown in Table VIII . In each simulation, we repeat the following process: generate G n , find estimators and compute errors for five hundred times in each type of network. All datasets in Table VIII are available at [50] , [51] or can be generated by networkx [52] . Note that SCT works well in circulant graph and grid graph, since SCT is designed to solve the maximum likelihood estimation problem on finite regular graph with cycles. In addition to the selected algorithms, there are other approaches such as PTVA [26] and gradient maximum likelihood algorithm (GMLA) [27] perform well on source detection problem. However, PTVA and GMLA are uncomparable with the methods listed in Table VIII . Since PTVA and GMLA require additional information such as relative time stamps observed by observers. Moreover, to fairly compare all algorithms, the number of the observers and their location in the network also need to be carefully considered which is out of scope of this article. We present an optimal contact tracing algorithm for epidemic source detection that finds application in identification of superspreaders in an epidemic (e.g., the COVID-19 pandemic). Our algorithm design leverages a statistical distance centrality based on solving a graph-constrained maximum likelihood problem for contact tracing graphs with irregular vertices and cycles. As a high-dimensional statistical problem, epidemic source detection requires a careful understanding of the interaction between stochastic spreading processes and topological features like graph distances, cycles and finite boundaries, allowing us to resolve an open problem of finite degree-regular graphs with cycles in [14] . Performance analysis demonstrated that our algorithm was near-optimal in correctly identifying superspreaders at some of the largest superspreading infection clusters using data from the SARS-CoV 2003 and COVID-19 pandemics in Singapore and Taiwan. It will be most interesting to study computational epidemiology through the lens of "network centrality as statistical inference" to design robust contact tracing algorithms using mathematical and data-driven methods (e.g., machine learning techniques like deep learning, see [1] , [10] , [29] ) that can analyze past epidemic behaviors to fight against newly emerging epidemics. We would like to acknowledge collaborations and interactions on this topic with H. Vincent Poor, Cheng-Shang Chang, Guanrong Chen, Siya Chen and Weng Kee Wong. In this proof, we use the fact that P (v c |G n ) > P ve vc (G n , n) · m ve vc (G n , n), and consider the ratio between the lower bound of P (v c |G n ) and P (v e |G n ) to simplify the proof. Proof. Let G n be the infected subgraph. Without loss of generality, we assume n = 2t + 1 and d ≥ 3. For v e , we have P (v e |G n ) = m ve ve (G n , 1) · P ve ve (G n , 1) = 1 · P ve ve (G n , 1) . For v c , it is simpler to consider the last term of (9) only, that is, P ve vc (G n , n) · m ve vc (G n , n). Note that m ve vc (G n , n) = |M (v e , G n−1 )| where G n−1 = G n \ {v e }. We have P ve vc (G n , n) · m ve vc (G n , n) . Now, let us consider the ratio given by = c 1 · 2 (n − 1) · B( n−1 2 , n−1 2 ) where c 1 and c 2 are some positive values with respect to d. The approximation is given by using Stirling's formula. The above result shows that the ratio becomes larger than 1 when d is fixed, and n is sufficiently large enough. This leads to P (v c |G n ) P (v e |G n ) > P ve vc (G n , n) · m ve vc (G n , n) P (v e |G n ) > 1, On the other hand, we consider the case when v a is on the path from v b to v ir , i.e., {v a , v b } ∈ E(G) and d(v a , v ir ) = d(v b , v ir ) + 1. For brevity, we denote the distance from v a to v ir as D, and relabel the vertices on the path from v p to v e as u 0 to u D+1 which are shown in Fig. A-B . Again, we consider the ratio of m m vir va (G n , i) ≥ k i=D+1 m vir v b (G n , i). Proof. Let D = d(v c , v ir ) and we relabel vertices on the path from v c to v ir as u 1 , u 2 , . . . , u D+1 . We can partition V (G n ) as follows For any vertex v in V (t vir vc ) other than v ir . If d(v, v ir ) > 1, then there is a neighbor of v, say u, such that d(v, v ir ) > d(u, v ir ). Since M (u, G n ) > M (v, G n ), by the condition 1 in Lemma 1, we can conclude that Since the probability P (G n |v) can be computed as follows and P vir v (G n , i) is a decreasing with respect to i. We can conclude that P (G n |u) > P (G n |v). This procedure can be continued until we reach to v ir . By the same argument, we can prove the case when v is in other parts with the remaining uncomparable vertices being on the path from v c to v ir . Proof. Let v ∈ G n and v is not a cycle vertex and each connected component of G n \{v} is of size less or equal to n/2. Consider any given spanning tree T j of G n , assume that v is not the rumor center of T j . Then, there is a vertex u such that t v u > t u v on T j . Since t v u + t u v = n, we can conclude that the size of the connected component of G n \{v} containing u is greater than n/2, which contradicts to the assumption. Hence, we have the fact that v is the rumor center on each spanning tree T j , for j = 1, 2, . . . , h, and v is also the rumor center of G n from (19) . An overview of healthcare data analytics with applications to the COVID-19 pandemic On the identification of superspreaders for infectious disease Super-spreaders in infectious diseases Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing A high-resolution human contact network for infectious disease transmission The effectiveness of contact tracing in emerging epidemics The effectiveness of backward contact tracing in networks Digital proximity tracing on empirical contact networks for pandemic control Bidirectional contact tracing could dramatically improve COVID-19 control Predicting infectiousness for proactive contact tracing Epimap: Towards quantifying contact networks for understanding epidemiology in developing countries The Mathematical Theory of Infectious Diseases and its Applications. Griffin Epidemics and rumours Rumors in a network: Whos's the culprit? Rumor source detection for rumor spreading on random increasing trees Rumor source detection under probabilistic sampling Maximum likelihood rumor source detection in a star network Identifying infection sources and regions in large networks Spotting culprits in epidemics: How many and which ones? Rooting out rumor sources in online social networks: The value of diversity from multiple observations Temporally agnostic rumor-source detection A probabilistic characterization of the rumor graph boundary in rumor source detection Information source detection in the SIR model: A sample-path-based approach Rumor source obfuscation on irregular trees Averting cascading failures in networked infrastructures: Poset-constrained graph algorithms Locating the source of diffusion in large-scale networks Fast and accurate detection of spread source in large complex networks Deep learning of contagion dynamics on complex networks Identifying the superspreader in proactive backward contact tracing by deep learning Rumor centrality: a universal source detector Rooting out the rumor culprit from suspects Rumor source detection with multiple observations: Fundamental limits and algorithms Confidence sets for the source of a diffusion in regular trees The Art of Mathematics: Coffee Time in Memphis A high-resolution human contact network for infectious disease transmission The effects of evolutionary adaptations on spreading processes in complex networks Modeling and analysis of the spread of COVID-19 under a multiplestrain model with mutations A time-dependent SIR model for COVID-19 with undetectable infected persons Optimal detection of influential spreaders in online social networks Medians and peripherians of trees Information Theory, Inference and Learning Algorithms Predicting the sources of an outbreak with a spectral technique Rumor source detection in unicyclic graphs Rumor source detection in finite graphs with boundary effects by message-passing algorithms IEEE/ACM Int. Conf. on Advances in Social Networks Analysis and Mining An efficient algorithm for enumerating chordless cycles and chordless paths Random disease on the square grid Social network analysis for contact tracing Singapore ministry of health Taiwan centers for disease control KONECT -The Koblenz Network Collection SNAP Datasets: Stanford large network dataset collection He received his Ph.D. degree at the Department of Computer Science, City University of Hong Kong, Hong Kong. Currently, he is an Assistant Professor at the Department of Applied Mathematics in Chung Yuan Christian University His research interests are in wireless networks, optimization and distributed machine learning. He is an IEEE Communications Society Distinguished Lecturer and has been an Associate Editor of IEEE Before retirement, he was a Professor with the Department of Applied Mathematics, National Chiao Tung University, for 32 years. Currently, he is a fellow of the Institute of Combinatorics and Its Applications, and an Emeritus Professor of National Yang Ming Chiao Tung University when n is sufficiently large. We first prove Lemma 2 and Lemma 1, which are tools to help us prove our main result.Let v m denote the vertex on the path from v a to v b . Then we can express M (v a , G) asMoreover, we can leverage the factProof. To prove Lemma 1, we first consider the second condition, i.e., the case when d(v a , v ir ) = d(v b , v ir ) = 1.Note that in this case, v a is not on the shortest path from v b to v ir and vice versa. We consider the ratio of m vir va (G n , i) to m vir v b (G n , i) for all possible i. Note thatvir ) are fixed when G n is given, we haveHence, the ratio