key: cord-0036904-gpl0nabv
authors: Li, Xiang; Wang, Jian-Bo; Li, Cong
title: Towards Identifying and Predicting Spatial Epidemics on Complex Meta-population Networks
date: 2017-10-05
journal: Temporal Network Epidemiology
DOI: 10.1007/978-981-10-5287-3_6
sha: 5b71c9914fdb6fc166a662f04c180e485b4b05de
doc_id: 36904
cord_uid: gpl0nabv

In the past decade, the network science community has witnessed huge advances in the threshold theory, prediction and control of epidemic dynamics on complex networks. While along with the understanding of spatial epidemics on meta-population networks achieved so far, more challenges have opened the door to identify, retrospect, and predict the epidemic invasion process. This chapter reviews the recent progress towards identifying susceptible-infected compartment parameters and spatial invasion pathways on a meta-population network as well as the minimal case of two-subpopulation version, which may also extend to the prediction of spatial epidemics as well. The artificial and empirical meta-population networks verify the effectiveness of our proposed solutions to the concerned problems. Finally, the whole chapter concludes with the outlook of future research.

feedback-loop analysis to stability and control of categories of systems and subjects, whatever large-scale or simply structured, linear or nonlinear, low dimensional or extremely high dimensional. The communications among humans and machines in the eyes of Norbert Wiener in 1950s were generally assumed as point-to-point or neglected as regularly structured in the scope of classic graph theory [2, 3] . Afterwards, Erdős and Rényi extended the graph description with uncertainty and randomness, and proposed the random graph theory in 1960s [4] . In the following decades the flourishing information and communication techniques have pushed the whole human society to a networking village of today, while the understanding of dominant yet hidden connectivity patterns of the communications among humans and machines were not revisited until recently.

The discovery of small-world and scale-free features in 1998-1999 has been verified in ubiquitous complex networks [5, 6] , which have attracted the world-wide attention to the new emergence of network science. The popular concerns cover not only the topological complexity of a large-scale complex network system but also the interdependence between the infrastructure and the collective performance of such networks [7] [8] [9] [10] . Typically, from the viewpoint of system and control, the precise mathematic description and appropriate models of a complex network play a significant role to achieve the desirable performance in return. However, in the situations of large-scale spatial prevalence of diseases in human populations, for example, such a solution may be infeasible if the availability of accurate data collections is far from sufficiently satisfactory.

Nevertheless the global outbreaks of prevalent infectious diseases in recent decades have led to great social, economic, and public health loss [11] [12] [13] [14] , which is partially due to the urbanization process and, in particular, the wide-establishment of long-distance public transportation networks (e.g., world-wide air-line web) and urban public commuting systems (e.g., subway and metro networks) to facilitate the dissemination of pathogens accompanied with passengers [15, 16] . Academia has witnessed that prediction and control of epidemic dynamics in networks as a flourishing research topic with interdisciplinary approaches [17] [18] [19] [20] . However, more challenging problems arising from the epidemic prevalence on a meta-population network have not received adequate attentions, such as identifying the parameters of epidemic network systems and the epidemic invasion pathways on a meta-population network, which, ignored previously, certainly play important roles in evaluating the intensity of outbreak of epidemics among human patches/populations.

Assume the seed of a disease/virus as the input signal to the whole human population system, and the observed patient samples as the system output. Then, the spatial invasion of the disease inside the human population is obscure as a black box to be identified, and this system combines many factors such as human mobility patterns (commuting and long-distance traveling) and mathematical epidemiology as well. Therefore, identifying such an epidemic process with the interplay of complex networks and the human population is a challenge to public health-care administrative agency when predicting the large-scale spatial prevalence of a disease and announcing counter strategies. the infection data. In a large-scale meta-population network, the complex pattern of pathways challenges the methodology to identify the epidemic invasion pathways in a meta-population network.

In this chapter, we review our series of work in recent years [33] [34] [35] [36] [37] on identifying parameters of the susceptible-infected model and spatial invasion pathways on a meta-population network as well as the minimal case of two-subpopulation version, which may also extend to the prediction of spatial epidemics as well. The remainder of this chapter is arranged as follows. Section 6.2 gives the detailed description of preliminaries. Section 6.3 introduces the parameter identification of epidemic models on a meta-population network. Section 6.4 contributes the inference of epidemic invasion pathways in a meta-population network with both methodologically and example verifications. In Sect. 6.5, extending the steps of the previous sections, the prediction of spatial epidemic transmission comes with several feasible methods. Finally, Sect. 6.6 concludes the whole chapter with outlook in future research.

A meta-population network, which was originated from the meta-population model proposed by Richard Levins [38] to explore spatial ecology, embeds public transportation networking systems to model and uncover nontrivial patterns of spatial prevalence of global infectious diseases in the past years [15, 31, 32, 39, 40] . In this section, we introduce the meta-population network model and the susceptibleinfected (SI) compartment epidemic dynamic as well. In this chapter, we consider the discrete-time dynamics.

The well-known susceptible-infected (SI) compartment model ( Fig. 6.1) , which is the simplest version in the epidemic compartment family, generally describes the early stage of prevalence of viruses/pathogen [24, 25, 41] , especially in the situation of non-recovery. In such a population, the states of individuals are stratified into two compartments (classes): susceptible to the infection of the pathogen; and infected by the pathogen. Generally, we assume that all individuals are homogenously mixing in the population. The state transition of an individual between two compartments Susceptible Infected is governed by the following reaction process: When a susceptible individual meets (i.e., has the contacts) with an infected individual in a unit time, the susceptible individual will be infected with an infection rateˇ.

Before describing a general meta-population network, we first introduce a minimal meta-population network containing two subpopulations (labelled as 1 and 2 as shown in Fig. 6 .2) with SI epidemic compartments. We assume the infection process evolves as a discrete-time system, and subpopulation 1 is infected initially (In this case of simulation, we assume 1 individual is infected among all 10,000 individuals in subpopulation 1). During each time step, the reaction takes place in each subpopulation if it contains two classes of individuals (susceptible and infected). Denote p 12 (p 21 ) the diffusion rate of individuals transferring from subpopulation 1 to 2 (2 to 1), which are often not symmetric, i.e., p 12 ¤ p 21 . Besides, an individual in subpopulation 1 (2) chooses jumping to subpopulation 2 (1) at diffusion rate p 12 (p 21 ), i.e., the so-called diffusion process. Therefore, the probability an individual stays in subpopulation 1 (2) is 1 p 12 (1 p 21 ). Therefore, without considering the diffusion of new increment of infected individuals after reaction, the whole evolution dynamics is described as 8 

after reaction from t to tC1. The second and third terms of RHS in Eq. (6.1) represent the diffusion of infected individuals in the diffusion process. As mentioned above, we do not consider the diffusion of new increment of infected individuals after reaction in this case. Besides, the evolution of susceptible individuals is similar with the infected individuals.

Extending the minimal version as two subpopulations to the general case of a metapopulation network, we divide the whole population (generally, such a population covers a large-scale spatial region of a country or the whole world) into a number of subpopulations. In a meta-population network, a subpopulation is connected with others via a public transportation network, e.g., the air-line web, the high-way web to form the backbone of such a meta-population network. A subpopulation as a node in the network contains a number of individuals homogeneously mixed, and individuals travel between two subpopulations (nodes) via the public transportation means (edge) with some (fixed) diffusion rate. All edges are directed.

With the SI dynamics, the disease propagates in subpopulations and spreads among neighbouring subpopulations via the reaction-diffusion process in a unit time, as illustrated in Fig. 6 subpopulation i at time t, where S i .t/ is the number of susceptible individuals, and I i .t/ is the number of infected individuals of subpopulation i at time t, respectively. Therefore, the intra-population epidemic dynamics in subpopulation i is governed by the SI model. Per unit time, the risk of infection of a susceptible individual within subpopulation i is characterized by i .t/ DˇI i .t/=N i .t/ during the reaction process. Denote the probability that an individual (S or I) of subpopulation i moves to its neighbouring subpopulation j as diffusion rate p ij , which describes the interpopulation mobility dynamics. The symbol of diffusion rate 0 Ä p ij D

where w ij is the number of individuals moving from subpopulation i to j per unit

Therefore, if we do not consider the diffusion of new increment of infected individuals after the reaction process, the evolution of an infected subpopulation i is described as follows:

which is investigated in Sect. 6.4. When we consider the diffusion of new increment of infected individuals after the reaction, the evolution is described by

where R I j .t/ is the increment of I j .t/ after the reaction from t to t C 1. We give the extensive investigation of the dynamics given by Eq. (6.3) in Sect. 6.5. We now discuss the individual mobility operator to handle the presence of stochasticity and independence of individual mobility, where the number of successful migration of individuals between adjacent subpopulations is quantified by a binomial or a multinomial process, respectively. If the focal subpopulation i only has one neighbouring subpopulation j, the number of individuals in a given compartment X (X 2 fS; Ig and P X X i D N i ) transferred from i to j per unit time, T ij .X i /, is generated from a binomial distribution with probability p ij representing the diffusion rate and the number of trials X i , i.e.,

where 1 p ij denotes the probability of an individual staying in subpopulation i. If the focal subpopulation i has multiple neighbouring subpopulations j 1 ; j 2 ; : : : ; j k , with k representing i's degree, the numbers of individuals in a given compartment X moving from i to j 1 ; j 2 ; : : : ; j k are generated from a multinomial distribution with probabilities p ij 1 ; p ij 2 ; : : : ; p ij k representing the diffusion rates on the edges emanated from subpopulation i and the number of trails X i , i.e.,

where integer`2 OE1; k, term 1 P`p ij`d enotes the probability of an individual staying in subpopulation i.

The epidemic parameters of a networked meta-population include the infection rate and diffusion rate, which play an important role in the SI dynamics, while the stochastic epidemic dynamics and the limit of available data make such an identification task more difficult. In this section, we review the method to identify both parameters for a two-subpopulation network and an estimation of infection rate for a general network version.

We first describe one realization of the invasion process evolving as follows. At the beginning, subpopulation 1 has been initialized with one infected individual in this case. When time evolves, the number of infected individuals I 1 .t/ of subpopulation 1 increases due to the SI reaction dynamics in this subpopulation. The epidemic arrival time (EAT) is defined as the first arrival time of infected individuals from an infected subpopulation moving to a neighbouring susceptible subpopulation. To address the EAT, some infected individual(s) will move (diffuse) to subpopulation 2, which finally succeed in infecting subpopulation 2. Therefore, recording the infection data (the number of infected individuals in subpopulation i at time t, i.e., I i .t/; i D 1; 2) of each subpopulation as the available data, we need to identify the unknown infection rateˇand diffusion rate p 12 .

At the early stage of epidemic dynamics, we can approximate S i .t/ N i .t/; i D 1; 2 (I i .0/ N i .0/) and therefore simplify Eq. (6.1) as We now discuss how to identify diffusion rate p 12 . Repeat the invasion of subpopulation 2 from subpopulation 1 until we record the epidemic arrival time to subpopulation 2, i.e., the disease/virus finally lands in subpopulation 2 and starts the local infection. We investigate the period from the initial time (t D 0) to the epidemic arrival time (t EAT ) that the first H individuals from subpopulation 1 invade subpopulation 2. From t EAT 1 to t EAT , we get 8 <

:

The likelihood function about the first H infected individuals from subpopulation 1 traveling to subpopulation 2 at time t EAT is P.

where t EAT 1 D Á, Á.Á 1/ is an integer. If there are s (s 1) rounds of repeated realizations of invasion processes, the joint likelihood function is given by

where s is the number of rounds of repeated simulation realizations of epidemic invasion processes. Take the logarithm of Eq. (6.9), the joint likelihood function yields L.P/ D ln.P.H f1g ; t 1 I H f2g ; t 2 I I H fsg ; t s //: Therefore, by means of the maximum likelihood estimation, we have dL.P/ dp 12 D 1 p 12 1 . 12 represents the estimation of diffusion rate p 12 .

Mathematically, the estimation of diffusion rates requires the availability of a large number of epidemic realizations for a given meta-population network. However, the availability of such repeated data for emergent infectious diseases is rather limited in reality. Therefore, the estimation of diffusion rates in the general case of a metapopulation network is infeasible due to the computational complexity and the limit of available data, which generally can be alternatively obtained from the statistics of public transportation section. The estimation of infection rateˇin the general case of a meta-population network is addressed here. Summing the number of infected individuals in Eq. (6.3) over all subpopulations

The term I i .t C 1/ I i .t/ fluctuates around its mathematical expectation, and we have the approximation asˇ

Thus, given all recorded times t 1 ; t 2 ; : : : ; t m 0 , the infection rate Ǒ is estimated as

where X > represents the transposition of X, and X D OEI. 

In this subsection, we only illustrate the identification performance of estimating diffusion rate p 12 on a two-subpopulation SI model as an example. A more general case (in the sense of an arbitrary number of subpopulations) of example of identification performance of infection rateˇwill be investigated in Sect. 6.5. In the two-subpopulation case, statistic information of p 12 is embedded in the surveillance infection data of the two subpopulations during the epidemic invasion process. As shown in Fig. 6 .4, the estimation of p 12 approaches the real value if the number of realizations increase, and the estimation error jO p 12 p 12 j is less than 5% of p 12 .

Finally the estimation of p 12 as O p 12 tends to the real value. 

During a real spatial cascade of an infectious disease, the spatial invasion pathways are the collection of directed transmission paths of an infectious disease rooted in the infected source subpopulation invading their susceptible neighbouring subpopulations. Actually, no one can predict such spatial invasion pathways to suppress the spreading processes at its infant prevalence. With the data availability of epidemic arrival time (EAT), i.e., the first invasion time discussed in the previous section, we may infer the patterns of invasion pathways.

Suppose one subpopulation is initially infected containing several infected individuals. As time evolves, the infected individuals of the seed subpopulation travel to the neighbouring subpopulations and try to infect their individuals. The successful invasion brings more invaded subpopulations with the cascade of infections. Therefore, the focus of interest is that when a subpopulation is invaded/infected by its m.m 2/ infected neighbours with the available EAT data, how can we infer the culprit(s) and identify the invasion pathways in such a cascade infection? In the concerned situation, we assume that the surveillance infection data (the number of infected individuals of each subpopulation at each time t) is available as well as the topology of the meta-population network (including diffusion rates).

We categorize all candidates of invasion pathways via the so-called invasion partition (INP) into four types of invasion cases (INCs), as shown in Fig. 6 .5. An invasion case contains two sets, i.e., S and I. Subpopulations which are not infected at t EAT 1 but infected at t EAT are put in set S, and their neighbours which are infected at t EAT 1 are put in set I. All four types of invasion cases are defined.

(i) I 7 ! S: In this case, both I and S only have one subpopulation. That is to say, a susceptible subpopulation is infected at t EAT by the first arrival of infected individual(s) from its unique neighbouring infected subpopulation at t EAT 1, and this infected subpopulation has no other newly infected neighbours at t EAT . (ii) I 7 ! nS.n > 1/: In this case, I contains one infected subpopulation, and S contains n.n > 1/ subpopulations. That is to say, an infected subpopulation simultaneously infects its n.n > 1/ susceptible neighbours, each of which has only one infected neighbouring subpopulation. (iii) mI 7 ! S.m > 1/: In this case, I consists of m.m > 1/ subpopulations, and S only contains one single subpopulation. That is to say, a susceptible subpopulation is infected by the first arrival of infected individual(s) coming from its m.m > 1/ infected neighbouring subpopulation, which has no other newly infected neighbours at this time. (iv) mI 7 ! nS.m; n > 1/: In this case, sets S and I both contain more than one subpopulation. The edges from I to S form a connected subgraph. Each previously susceptible subpopulation in S is infected by the new arrival of infected individual(s) from at least one of the m infected subpopulations in I. Each subpopulation in I has no other newly infected neighbours except the susbpopulations in S at this time. Figure 6 .5 illustrates such four types of invasion cases as I 7 ! S, mI 7 ! nS.n > 1/, mI 7 ! S.m > 1/ and mI 7 ! nS.m; n > 1/. Besides, we define the directed edges from infected subpopulation i in I to susceptible subpopulations in S as invasion edges, which are the candidates of invasion pathways. Therefore, we define a decomposition procedure invasion partition (INP) to achieve the task of dividing subpopulations and edges into such invasion cases. As summarized in Algorithm 1, we propose a heuristic algorithm to achieve the INP task.

1: At an epidemic arrival time, collect all newly infected subpopulations as initial S and their previously infected neighbours as I; 2: Start with an arbitrary element S i in set S, to compose the initial S ; 3: Find all neighbors of S i in set I to compose the set I ; 4: For each new member in I , find its new neighbours in the S to update S if any; 5: For each new member in S , find its new neighbours in the I to update I if any; 6: Repeat the above two steps until we cannot find any new neighbours in S and I, we get an invasion case consisting of I and S , then update the S and I; 7: Repeat steps 2-6 to get new invasion cases until there are no elements in S.

We further classify the observability of a subpopulation and an edge. Observability of a subpopulation is defined by comparing the number of infected individuals of subpopulation i at time t EAT 1 and t EAT , which reflects the information held for the inference of relevant invasion pathway. Observability of an directed edge emanated from an infected subpopulation can be defined by the types of subpopulations it connects to. (i) Observable Subpopulation: From t EAT 1 to t EAT , subpopulation i is an observable subpopulation if it experiences one of the following three state transitions. The first is S i ! I i , which indicates that this subpopulation has been infected (for the first time) during this period by infected individuals (because I i .t/ is available). The second is I i ! S i . We know how many infected individuals diffused from this subpopulation in this case. The third is S i ! S i . This case represents subpopulation i keeps its susceptible status. Here the observability of a subpopulation indicates the diffusion information of this subpopulation. Figure 6 .6a-c illustrate the above cases. Together with the observability of a subpopulation, the directed edges emanated from an infected subpopulation (here denoted i) in set I can be classified into three types, i.e., observable edges, partially observable edges and unobservable edges:

(i) Observable Edges: Any directed edge from i to observable subpopulation j whose transition is S j ! S j or I j ! S j from t EAT 1 to t EAT . This edge implies no infected hosts move from i.

(ii) Partially Observable Edges: If an directed edge emanated from infected subpopulation i to a partially observable subpopulation, the edge is partially observable. (iii) Unobservable Edges: If infected subpopulation i connects with an unobservable subpopulation, this directed edge from i is an unobservable edge.

We now consider to accurately identify the invasion pathways. Among the four types of invasion cases (INCs), since the two types of INCs (I 7 ! S and I 7 ! nS, n 2) have the unique invasion edge(s) from the neighboring infected subpopulation, the invasion pathways therefore are easy to identify accurately. We only need concern the other two types of INCs, i.e., mI 7 ! S and mI 7 ! nS. 

According to the definition of observability, in an INC, the number of local infected individuals in an partially observable source i will be decreased by I i .t EAT 1/ I i .t EAT / due to the movement of infected individuals. If the subpopulations j in the neighbourhood of i only experience the transition of S j ! S j or I j ! S j from t EAT 1 to t EAT , they do not to receive the infected individuals from subpopulation i. Therefore, the newly infected subpopulation S 1 is the only destination for those infected individuals departing from the partially observable sources. Since m 0 Ä m, the second condition guarantees that Eq. (6.13) only has a unique solution, which corresponds to the accurate identification of invasion pathways of this invasion case. Define D ffH i1 ji 2 Y 1 g; : : : ; fH in ji 2 Y n gg as a potential solution for the mI 7 ! nS, if is subject to the following two conditions: (i)

(6.14)

where H ik . 0/ is the number of infected hosts invading subpopulation S k from I i at t EAT ; (ii) For any H ik , we have

Suppose an mI 7 ! nS has M potential solutions, and j D ffH .j/ i1 ji 2 Y 1 g; : : : ; fH .j/ in ji 2 Y n gg .1 Ä j Ä M/ represents one of the solutions. Given some specific prerequisites (as the conditions of Theorem 2), Eq. (6.14) has a unique solution, which implies that the invasion pathway(s) can be identified accurately. Theorem 2 elucidates this scenario.

P m iD1 I i .t EAT / D P n kD1 H k . Proof Since the number of infected individuals in the partially observable subpopulation i reduces at time t EAT , i.e., I i .t EAT / < I i .t EAT 1/, I i .t EAT / > 0, it is inevitable that a few infected individuals diffuse away from subpopulation i. Occurring the state transitions of S j ! S j or I j ! S j from t EAT 1 to t EAT , subpopulations j in the neighbourhood of i (excluding the new infected subpopulation j) cannot receive infected individuals. Therefore, the only possible destination for those infected individuals is subpopulation S k in S.

The conditions E in Ä nCm and P m iD1

have the unique solution D ffH i1 ji 2 Y 1 g; : : : ; fH in ji 2 Y n gg. The reason is that rank ( 

Thus the invasion pathway(s) of this mI 7 ! nS.m; n > 1/ can be identified accurately. u t

Now we are in the position to construct the whole framework of identifying invasion pathways, namely, the invasion pathways identification (IPI) algorithm as summarized as below.

(i) Invasion partition: T whole invasion pathways is defined as the whole invasion pathways of an invasion process. where "opt" represents the optimal solution via dynamic programming. (ii) Accurate identification: For the two cases of I 7 ! S, I 7 ! nS, it is easy to reach the accurate identification of invasion pathways. In the other two cases of mI 7 ! S; mI 7 ! nS, we first evaluate whether mI 7 ! S or mI 7 ! nS can be accurately identified or not. If yes, Theorems 1 and 2 work out the accurate identification. (iii) Identification of potential invasion pathways: If accurate identification is not feasible, we propose an efficient optimization method based on the maximum likelihood estimation to identify the most likely invasion pathways. We define the maximum likelihood (ML) estimator as

where P.a i jINC i / is the likelihood of uncovering the potential pathway a i , supposing the actual pathway is a i . Therefore, we evaluate P.a i jINC i / and choose the maximal likelihood one as a i from all potential pathways a i 2 INC i . (iv) The whole spatial invasion pathways can be reconstructed by assembling all invasion cases chronologically.

Therefore, in the situations where accurate identification of invasion pathways is not feasible, e.g., the conditions of Theorems 1 and 2 are not satisfied, Eqs. (6.13) and (6.14) may have a number of potential solutions which correspond to a set of potential invasion pathways. Therefore, we propose the identification algorithm to infer the most likely pathways among all potential invasion pathways. Herein we unify mI 7 ! S.m > 1/ and mI 7 ! nS.m > 1; n > 1/ as mI 7 ! nS.m > 1; n 1/.

Denote .H .j/ kk / the transfer estimator of infected subpopulation I k in I, k 2 Y k . Here the transfer estimator is used to estimate the diffusion likelihood if I k diffuses H .j/ kk infected individuals to S k . Thus, the likelihood of potential solution j of an INC mI 7 ! nS.m > 1; n 1/ is presented by

where M represents the number of solution j . We now consider the events from t EAT 1 to t EAT , and give some definitions. We assume an infected subpopulation I i in I emanates k i edges in total, among which there are i .1 Ä i Ä n/ invasion edge(s) labeled as 1; 2; : : : ; i with the corresponding diffusion rates p ; 2 OE1; i , is an integer. We suppose H ii infected hosts invade its neighbouring subpopulations in the subset fY i D i g at t EAT . Assume there are`i unobservable and partially observable edges, labelled as 1 C i ; : : : ;`i C i . Along each unobservable or partially observable edge, the traveling rate is p`,`2 OE1;`i, and x`infected hosts leave I i . Accordingly, in total Á i D P`x`i nfected individuals leave I i through the unobservable and partially observable edges. Now there remain k i `i i observable edges, labelled as i C i C 1; : : : ; k i . Along each observable edge, the diffusion rate is p @ , integer @ 2 OE`i C i C 1; k i , and x @ infected individuals leave I i . With probability p i D 1 P p P`p` P @ p @ , an infected individual keeps staying at subpopulation I i . There are x i infected individuals staying in subpopulation I i with probability p i . Because I i connects the unobservable and partially observable infected subpopulations, we obtain P`x`C x i D Á 0 . Therefore, we have the transfer likelihood estimator of I i in the following three parts.

It is difficult to estimate whether and how many infected hosts move to which neighbours due to : : : ; l C I x @ ; p @ ; @ D OEl C C 1; l C C 2; : : : ; kI x i ; p i /: (6.18) With the definition of observable edges, the transfer likelihood estimator is simplified as u D

Given an I ! S observable subpopulation I i , the infected individuals H i D fH ii j D 1; 2; : : : ; g moved out of subpopulation I i to S i are all from the term of I i .t EAT /. Therefore, its transfer likelihood estimator is derived as

where i D fH 00 ii j D 1; : : : ; g denote the infected individuals departing from I i .t EAT /. We then have the transfer likelihood estimator in the following two cases.

The transfer likelihood estimator is pu D

Here, D P H 00 ii .0 Ä Ä I i .t EAT //, which represents the sum of infected individuals travelling from subpopulation I i to S i . For a given , we need to enumerate all possible sets H 00 i D fH 00 ii j jj D 1; : : : ; g to calculate the pu . Case 2:

Similar to Case 1, we should enumerate all possible permutations of H 00 i D fH 00 ii j jj D 1; : : : ; g for a fixed . Therefore, in this case we have the transfer likelihood estimator of I i as

where P 1 and P 2 are the same as those in Eq. (6.21). According to Eq. If the number of the first arrival infected individuals H ij 3, multiple potential solutions may correspond to the same set of potential pathway(s). In this case, we merge the transfer likelihood of all potential solutions of this INC if they belong to the same invasion pathways. Then we find out the most likely invasion pathways corresponding to the maximum likelihood.

After identifying the potential invasion pathways, the whole invasion pathway T whole invasion pathways can be reconstructed chronologically by assembling all INCs. Finally, we depict the IPI algorithm explicitly with the pseudocodes as outlined in Algorithm 2.

1: Inputs: the time series of infection data I i .t/ and topology of network G 2: Find all EAT data 3: for each EAT 4: Invasion partition to find out the I 7 ! S , I 7 ! nS, mI 7 ! S and mI 7 ! nS. 5: for each mI 7 ! S or mI 7 ! nS 6:

if it satisfies conditions of Th 1 or Th 2 7:

Compute the unique invasion pathway 8:

else It does not satisfy conditions of Th 1 or Th 2 9:

Find all M potential solutions j 10:

Compute the P. j jINC mI7 !S / or P. j jINC mI7 !nS / 11:

Merge the P. j jINC mI7 !S / or P. j jINC mI7 !nS / of j corresponding to same pathway(s) 12:

end if 13: end for 14: Find invasion pathway a mI7 !S or a mI7 !nS that maximize P. j jINC mI7 !S / or P. j jINC mI7 !nS / 15: end for 16: Reconstruct the whole invasion pathways (T) by assembling each invasion case chronologically

We now evaluate the identifiability of invasion pathways of all invasion cases. Denote the likelihood corresponding to the most likely pathways for a given invasion case. Therefore we have Definition 2 tells that the bigger . / and the smaller entropy S , the easier to identify the epidemic invasion pathways for an invasion case.

We illustrate the performance of our proposed IPI algorithm to identify the invasion pathways, with the maximal connected component of the American airports network (AAN, Fig. 6.7) to form a meta-population network. Note that the data to construct the AAN was collected from the U.S. demographic statistical data and domestic air transportation [35, 43] . Here, the AAN is a weighted and directed graph having V D 404 nodes (airports) and E D 6480 weighted and directed edges representing flight routes. The weight of edge E ij is defined as diffusion rate p ij D hw ij i hN i i , where hw ij i is the daily amount of passengers of the flight from i to j, hN i i is the population of serving areas [43] of airport i. The average degree of the AAN is hki 16, and the range of degree k is [1, 158] . The range of distributions of hw ij i and p ij is OE1; 9100 and OE7:4 10 8 ; 0:03, respectively. The range of distribution of hN i i is OE6100; 1:907 10 7 , and the total population of the AAN is N total 0:243 10 9 , i.e., approximately the whole population of the United States of America. Therefore, the AAN as the sample of a meta-population network shows high heterogeneity of connectivity patterns, traffic capacities as well as the population distribution [43] .

To verify the performance of the proposed IPI algorithm, we select three methods [15, 31, 32] as the benchmark for comparison, which generate the shortest path trees or minimum spanning trees of a meta-population network. In more detail, [31] generates the average-arrival-time-based (ARR) shortest path tree, and [15] generates the effective-distance-based (EFF) most probable paths, and [32] generates the Monte-Carlo-Maximum-Likelihood-based (MCML) most likely epidemic invasion tree.

We define the identifying accuracy as the ratio of the number of correctly identified invasion pathways by each method to the number of true invasion pathways. We also compute the accuracy of accumulative INCs of mI 7 ! S and mI 7 ! nS, which is defined as the ratio of the number of correctly identified invasion pathways by each method to the number of true invasion pathways in this INC. Besides, we also make the comparison of the identification accuracy at the early stage of epidemic dynamics, which is defined as the period when the first 50 subpopulation have been infected. In the top and middle panels of Fig. 6 .8, we observe the whole identification accuracy and the early-stage identification accuracy, while the bottom panel of Fig. 6 .8 presents the early and whole accumulative identification accuracy of mI 7 ! S and mI 7 ! nS through 20 independent realizations on the AAN, respectively. Here the whole identification accuracy means the identification accuracy of whole meta-population network has been infected. The seed subpopulation in all such independent realizations is set as the Sun Valley Airport in Bullhead City, Arizona. We clearly observe that the IPI algorithm is more accurate at identifying the invasion pathways than other benchmark methods. We then visualize the identified invasion pathways (the lower panel of Fig. 6 .9) during the early stage of a realization compared with the actual invasion pathways (the upper panel of Fig. 6.9 ). The weights (diffusion rates) of invasion edges are shown by the thicknesses of lines, and arrows represent the directions of invasions. We observe that most of the invasion pathways are correctly identified to form the invasive backbone of this realization of an epidemic dynamics, while there still exist some wrongly identified pathways in some INCs, indicating the necessity of defining the identifiability of an INC. We finally examine the identifiability of an invasion case. Figure 6 .10 shows the entropy and identifiability of wrongly identified mI 7 ! S of 20 independent realizations on the AAN. The smaller the identifiability of an invasion case is, the more prone it is to be wrongly identified. The identifiability depicts the wrongly identified mI 7 ! S more reasonably than the likelihoods entropy. The frequency of identifiability of INCs descends obviously, but that of the likelihood entropy of INCs does not clearly ascend. This statistical result indicates that the identifiability … has a better performance to distinguish whether an invasion case is difficult to identify or not than the distinction performance of the likelihood entropy, and also tells that why some invasion cases are easy to identify, whose … are more than 0.5, and why some invasion cases are difficult to identify, whose … are much less than 0.5. Here 0.5 is an empirical value.

As the final part of this chapter, we now move a step further to predict the early stage of an epidemic transmission. Suppose the epidemic process starts from the patient 0 subpopulation. This subpopulation invades and infects its neighbours, and the cascading transmission proceeds. At the early epidemic stage, the time series of the number of infected individuals in each subpopulation I i .t/ (i.e., the infection data) is recorded. Assume the topology of the meta-population network (including population sizes and diffusion rates, as Sect. 6.4) and the time series of the recorded infection data I i .t/ until time t are available, and the focus of interest in this section is to predict which subpopulations will be infected at time step tC1. We consider the SI model with the diffusion of new increment of infected individuals after reaction (see Sect. 6.2 Eq. (6.3)).

The growth of infected individuals in an infected subpopulation is governed by the infected rateˇ, while the diffusion process is ruled by the parameters of multinomial distribution. We first identify the infection rateˇby using the method in Sect. 6.3.2, then estimate the increment R I i .t/ of I i .t/ of subpopulation i after the reaction from t to t C 1.

To keep the population balance of each subpopulation, we assume hw ij i D hw ji i, i.e., hN i .t/p ij i D hN j .t/p ji i, where w ij is the number of individuals that have moved from subpopulation i to subpopulation j in a unit time (e.g., a day). Thus we have hN j .t/i D hN j .t C 1/i. At the early stage, N j .t/ S j .t/, and N j .t/ is included in the population information of each subpopulation of meta-population network. Therefore, we estimate R I j .t/ by

Next we give the algorithm predicting n.n 1/ subpopulations infected from t to t C 1 during the diffusion process. At time step t, all susceptible subpopulations having at least one infected neighbouring subpopulation comprise set S. We discuss the two cases of n D 1 and n > 1 in the following, and Algorithm 3 presents the pseudocode for the prediction algorithm.

(i) n D 1;

In this case, there is only one susceptible subpopulation infected at time t C 1. The likelihood L i .t C 1/ that subpopulation i in set S is infected at time t C 1 is derived as where m is the number of infected neighbouring subpopulations of i at time step t. We label infected neighbouring subpopulations of i as 1; 2; : : : ; m.

Accordingly, the most likely infected subpopulation O v is predicted as

where L j D 1 L j .

(ii) n 2;

The most likely n.n 2/ infected subpopulations in S can be predicted as Note that the above method only presents the most likely infected subpopulations at the next time step. Generally, the number of possibly infected subpopulations increases sharply during the epidemic dynamics. In this case, the likelihood of the most likely infected subpopulation may be very small. Therefore, we shall rank the likelihoods and investigate the top ranking subpopulations, which help us to judge which subpopulations are prone to be infected. Let P i D L i Q j¤i;j2S L j in Eq. (6.29). We define the infected likelihood vector fP 1 ; P 2 ; : : : ; P Z g of all Z candidate subpopulations in set S, where P i is the likelihood the susceptible subpopulation i gets infected in the next time step as Eq. (6.29), i D 1; 2; : : : ; Z. Then we define the infected likelihood entropy E as

This entropy tells the extent of prediction difficulty at each time step. The smaller E , the easier the prediction.

This time we select an artificial meta-population network as the simulation example of spatial epidemic prediction. We generate a scale-free network with the BA model [6] , then design the diffusion rate of each edge. Note that empirically the diffusion rates [44] of air transportation networks depend on the degree of the nodes. We define the diffusion rate from node i to node j as

where b ij stands for the elements of the adjacency matrix (b ij D 1 if i connects to j, and b ij D 0 otherwise), C is a constant (C is assumed as available, and set as 0.005), and O Â is a parameter. We assume that parameter Â follows the Gaussian distribution

2ı 2 / for each subpopulation. By setting constant C and computing the population of each subpopulation at equilibrium, the polynomial regression is employed to evaluate parameters O Â and ı 2 based on the empirical rule of T 0 kˇ0;ˇ0 ' 1:5˙0:1, (where T 0 D P l w jl , andˇ0 is approximately linear with O Â (observed in simulations). Assume O Â D a 0ˇ0 C b 0 , we can obtain O Â , where a 0 ; b 0 are parameters). Therefore we can determine the diffusion rate p ij along each edge. We set the whole BA meta-population network having 404 nodes (subpopulation), and fix hki D 16 .m 0 0 D 9; m 0 D 8/ as the average degree of the BA meta-population network. The initial size of each subpopulation is N 1 D N 2 D D N N D 6 10 5 , and the total population of the whole meta-population network is N total D 6 10 5 404 D 2:424 10 8 .

As illustrated in Fig. 6 .11, the estimation ofˇis close to the actual infection rate. We compare our prediction algorithm with the randomization prediction, i.e., we randomly choose a susceptible subpopulation in S as the most likely infected subpopulation at the next time step. Ranking distance is defined as the difference of rank of likelihood L .t C 1/ between the investigated two subpopulations i and j. In Fig. 6 .12, "RankError" means the ranking distance of the corresponding infected likelihood between the predicted candidate and the actual infected subpopulation. "RandError" means the ranking distance of the corresponding infected likelihood between the randomly selected candidate and the actual infected subpopulation. As shown in Fig. 6 .12, the subpopulations predicted by our algorithm are closer to the actual infected subpopulations at the next time step compared with those randomly selected subpopulations.

We further investigate why the accurate prediction of the infected subpopulation is difficult to achieve. At time step t, if any new subpopulation(s) will be infected in this realization at the next time step, t C 1 is called the prediction time. As shown in Fig. 6 .13, we observe that the number of possible infected candidates Z increases sharply, and the infected likelihood entropy also increases (generally 

As only a snapshot of the emergent frontier in the exciting network science, some latest advances on identification and prediction of epidemic meta-population networks have been introduced in this chapter. The future steps along this line may involve the following aspects: (1) The adaptiveness of humans deserves sufficient respect when facing the modelling, analyses and prediction of a large-scale spatial pandemic situation, and an appropriately designed role with the feedback-loop of human adaptiveness into such a complex networking system will be much appreciated.

(2) The power of Big Data and cloud computing may help embed highresolution records of human behavioural dynamics (including mobility, interaction and other non-private profiles) into the study. Nevertheless, abuse of data should be carefully avoided. (3) The verification even for the prediction of an infectious process requires the precise control means and public strategy in the viewpoints of not only mathematical results but also implementations in practice. Finally comes the end of this chapter, which may still stands at the beginning of the long journey in this exciting and challenging direction.

Cybernetics: or Control and Communication in the Animal and the Machine

Graph Theory with Applications

Introduction to Graph Theory

On the evolution of random graphs

Collective dynamics of 'small-world' networks

Emergence of scaling in random networks

Statistical mechanics of complex networks

Complex Networks: Theories and Applications

Networks: An Introduction

Introduction to Complex Networks: Models, Structures and Dynamics

Modeling Infectious Diseases in Humans and Animals

Infectious Diseases of Humans: Dynamics and Control

Modeling infectious disease dynamics in the complex landscape of global health

Engineering a global response to infectious diseases

The hidden geometry of complex, network-driven contagion phenomena

Globalization, climate change, and human health

Epidemic processes in complex networks

Propagation Dynamics on Complex Networks: Models, Methods and Stability Analysis

A Data-driven inference algorithm for epidemic pathways using surveillance reports in 2009 outbreak of influenza A (H1N1)

Forecast and control of epidemics in a globalized world

On identifiability of nonlinear ODE models and applications in viral dynamics

Inferring networks of diffusion and influence

Robust reconstruction of complex networks from sparse data

Rumors in a network: who's the culprit?

Rumor source detection with multiple observations: fundamental limits and algorithms

Discovering network behind infectious disease outbreak

Spatial dynamics of the 1918 influenza pandemic in England, Wales and the United States

Inferring epidemic network topology from surveillance data

Inferring plasmodium vivax transmission networks from tempo-spatial surveillance data

Inferring disease transmission networks at a metapopulation level

Global disease spread: statistics and estimation of arrival times

Multiscale mobility networks and the spatial spreading of infectious diseases

On estimating spatial epidemic parameters of a simplified metapopulation model

Inferring spatial transmission of epidemics in networked metapopulations

Identifying spatial invasion of pandemics on metapopulation networks via anatomizing arrival history

Predicting spatial transmission at the early stage of epidemics on a networked metapopulation

Towards identifying epidemic processes with interplay between complex networks and human populations

Some demographic and genetic consequences of environmental heterogeneity for biological control

A mathematical model for the global spread of influenza

Spatial epidemiology of networked metapopulation: an overview

A dynamic model of bovine tuberculosis spread and control in Great Britain

Money circulation, trackable items, and the emergence of universal human mobility patterns

Evolution of scaling emergence in largescale spatial epidemic spreading

The architecture of complex weighted networks