key: cord-0015052-g3pyi2l6
authors: Singer, Benjamin J.; Thompson, Robin N.; Bonsall, Michael B.
title: The effect of the definition of ‘pandemic’ on quantitative assessments of infectious disease outbreak risk
date: 2021-01-28
journal: Sci Rep
DOI: 10.1038/s41598-021-81814-3
sha: e57816fbd1e89dc3fb878d1a2bfcb28c7754ce40
doc_id: 15052
cord_uid: g3pyi2l6

In the early stages of an outbreak, the term ‘pandemic’ can be used to communicate about infectious disease risk, particularly by those who wish to encourage a large-scale public health response. However, the term lacks a widely accepted quantitative definition. We show that, under alternate quantitative definitions of ‘pandemic’, an epidemiological metapopulation model produces different estimates of the probability of a pandemic. Critically, we show that using different definitions alters the projected effects of key parameters—such as inter-regional travel rates, degree of pre-existing immunity, and heterogeneity in transmission rates between regions—on the risk of a pandemic. Our analysis provides a foundation for understanding the scientific importance of precise language when discussing pandemic risk, illustrating how alternative definitions affect the conclusions of modelling studies. This serves to highlight that those working on pandemic preparedness must remain alert to the variability in the use of the term ‘pandemic’, and provide specific quantitative definitions when undertaking one of the types of analysis that we show to be sensitive to the pandemic definition.

International health organisations such as the WHO have not provided any formal definitions of the term 'pandemic' , and the WHO no longer uses it as an official status of any outbreak 25, 31 . It would, however, be hasty to dismiss the importance of the term on these grounds. Although the WHO no longer uses the term 'pandemic' officially, the WHO Director-General drew attention to their use of the term as recently as March 2020, to describe the status of the COVID-19 outbreak 32 . The Director-General cited "alarming levels of inaction" as one of the reasons to use the term, along with the caveat that "describing the situation as a pandemic does not change WHO's assessment of the threat posed by this virus". The WHO's use of the term was of interest to the public, receiving extensive press coverage [33] [34] [35] . The term 'pandemic' clearly continues to be important to indicate serious risk during disease outbreaks.

Regardless of the extent to which the pandemic definitions currently in use do or do not agree, they are all qualitative in nature, using descriptions such as "very wide area" and "large number of people". Perhaps as a result of this, many quantitative studies on pandemics do not make use of a quantitative definition of a pandemic, but instead focus on causally related concepts, such as sustained transmission 19 , or emergence of novel viruses 36 . Others treat the spread of a pathogen at a pandemic level as a context in which to study transmission dynamics, without paying special attention to how those dynamics might lead to a pandemic as distinct from an epidemic or a more limited outbreak [37] [38] [39] . In this paper, we examine the effects of alternative pandemic definitions on the analysis of key epidemiological questions. The results provide a foundation for deciding the appropriate quantitative definition of 'pandemic' in a given context.

We use a metapopulation model to investigate the effects of pandemic definition on the results of a quantitative assessment of the probability of a pandemic. Metapopulation models are commonly applied to pathogens that spread between regions of the world, and so are appropriate for modelling pandemics [40] [41] [42] [43] [44] [45] . We represent states of our metapopulation model as states of a Markov chain, allowing us to calculate the probability of a pandemic directly, as opposed to simulating many stochastic outbreaks and recording the proportion which result in pandemics. We explore two different kinds of pandemic definition, following Morens et al. 2009 24 , specifically:

• the family of transregional definitions, where a pandemic is defined as an outbreak in which the number of regions experiencing epidemics meets or exceeds some threshold number n. We refer to specific transregional definitions as n-region transregional definitions, e.g. a 3-region transregional definition.

• the interregional definition, where a pandemic is defined as an outbreak in which two or more non-adjacent regions experience epidemics.
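As a minimal sketch of how these two kinds of definition can be checked against a model outcome, the Python below encodes each as a predicate on the set of regions that experienced epidemics. The function names, the adjacency-set representation, and the 4-region star example are illustrative assumptions, not part of the study's method.

```python
def is_transregional_pandemic(epidemic_regions, n):
    """n-region transregional definition: at least n regions have epidemics."""
    return len(set(epidemic_regions)) >= n


def is_interregional_pandemic(epidemic_regions, adjacency):
    """Interregional definition: some pair of epidemic regions is non-adjacent.

    `adjacency` maps each region to the set of regions it is directly
    connected to (an assumed representation of the network).
    """
    regions = sorted(set(epidemic_regions))
    for i, a in enumerate(regions):
        for b in regions[i + 1:]:
            if b not in adjacency[a]:
                return True
    return False


# Hypothetical 4-region star network: region 0 is the hub, 1-3 are leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
hub_and_leaf = {0, 1}          # the only pair is adjacent
hub_and_two_leaves = {0, 1, 2}  # leaves 1 and 2 are non-adjacent
```

In this example, epidemics in the hub and one leaf satisfy the 2-region transregional definition but not the interregional one, whereas epidemics in the hub and two leaves satisfy both.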

Note that these definitions require a specific sense of 'region'. These regions could be countries, or they could be larger or smaller than individual countries, from counties to health zones to WHO regions. Our metapopulation model (detailed in the Methods section below) can be used to model regions of any size. We have chosen not to include definitions with criteria relating to the number of people infected or killed, instead of, or in addition to, geographical extension. Extension is the only universal factor in pandemic definitions, and so is the focus of the current study 24. Three questions that help form public health policy at the beginning of an outbreak are:

• Would interventions restricting travel reduce the risk of a pandemic?

• Does a portion of the population have pre-existing immunity, and does this affect the risk of a pandemic?

• How is the risk of a pandemic affected by regional differences in transmission?

Using our metapopulation model, we explore how changing the pandemic definition does or does not affect our answers to these questions. We show that the precise definition of a pandemic used in modelling studies can (but does not always) affect the inferred risk. The predicted effects of travel restrictions, the influence of pre-existing immunity, and the impact of regional differences in transmission can all vary when alternative definitions of 'pandemic' are used. This demonstrates clearly the need to consider carefully the pandemic definition used to assess the risk from an invading pathogen. This is necessary for clear communication of public health risk.

Travel rates. One important question about pandemic risk is what effect inter-regional travel rates have on the probability of a pandemic occurring 16, 17, 46, 47. Here we model epidemics occurring in regions connected on a network in which the connections and their weighting can be set at fixed values representing the rates of travel between regions. We consider simple networks that can illustrate the effects of our different pandemic definitions: namely, the star network, in which one central region is connected to all others with equal weighting and the non-central regions lack any other connections, and the fully connected network, in which each region is connected to every other with equal weighting. Figure 1 illustrates that the connectivity of the full network is much higher than that of the star network. Using the star network allows us to make the distinction between adjacent and non-adjacent regions, thus allowing us to distinguish between transregional and interregional pandemic definitions. Unless otherwise stated, all figures in the current study are generated with a transmission rate of β = 0.28 per day, a recovery rate of µ = 0.14 per day, and an inter-regional travel rate of $2 \times 10^{-4}$ per day. The transmission and recovery rates correspond to a within-region basic reproduction number ($R_0 = \beta/\mu$) of 2. These values are within the plausible range for both seasonal and pandemic influenza, and as such they can be used to simulate a plausible pathogen of pandemic potential 38. We further assume an initial population of 1000 susceptible individuals in each region, and that the outbreak is seeded by a single infectious individual in one region. In the full network, all regions are equivalent, so we seed the outbreak in a single arbitrary region. In the star network, we take the average probability of a pandemic over outbreaks seeded in each region.
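The default parameters above can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the epidemic probability uses the standard stochastic SIR (branching-process) result that an outbreak seeded by $I(0)$ infectious individuals fails to take off with probability $(1/R_0)^{I(0)}$ when $R_0 > 1$.

```python
beta = 0.28  # transmission rate per day (value from the text)
mu = 0.14    # recovery rate per day (value from the text)


def basic_reproduction_number(beta, mu):
    """Within-region R0 for SIR dynamics: transmission rate over recovery rate."""
    return beta / mu


def major_epidemic_probability(r0, initial_infected=1):
    """Probability that the seed cases cause an epidemic in a single region."""
    if r0 <= 1:
        return 0.0
    return 1.0 - (1.0 / r0) ** initial_infected


r0 = basic_reproduction_number(beta, mu)
p_epidemic = major_epidemic_probability(r0)
```

With a single seed and $R_0 = 2$, the probability of an epidemic in the seeded region is $1 - 1/R_0 = 0.5$, which bounds the pandemic probability from above under any definition.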
Using a model with ten regions allows us to test a range of different transregional definitions of a pandemic. The pandemic probability under a range of n-region transregional definitions for a 10-region network with a variety of travel rates is shown in Fig. 2. An n-region transregional definition effectively provides a threshold number n: if at least n regions experience epidemics, the outbreak is counted as a pandemic, and otherwise it is not. Thus we indicate the different possible n-region definitions through their threshold numbers in Figs. 2, 5, and 6.

Figure 2. Pandemic probability for a range of between-region travel rates and a range of transregional pandemic definitions. The "pandemic threshold number" refers to the minimum number of regions that must experience epidemics before a pandemic is declared. The pandemic probability is, in general, sensitive to the pandemic definition used, but the degree of sensitivity depends on network structure and travel rates. (a) Pandemic probability for a star network. The pandemic probability is, in general, highly sensitive to the pandemic definition used. (b) Pandemic probability for a fully connected network. The sensitivity of the pandemic probability to the pandemic definition used is significantly reduced at high travel rates.

The 1-region transregional definition merges the definitions of 'pandemic' and 'epidemic' in an implausible way, but it is included in these figures for comparison. The comparison between the pandemic probability according to the 2-region definition and according to the 10-region definition shows the difference between pandemic definitions that are satisfied by any transregional transmission and definitions that are satisfied only by truly global spread. For the star network, or for the fully connected network with low travel rates, there is a marked difference between the probability of either of these definitions being satisfied. However, for the fully connected network at medium or high rates of travel, if the pathogen invades the initial region successfully, then it will go on to spread globally. As such, the probability of a pandemic nears the maximum of 0.5 (i.e. $1 - 1/R_0$) at all thresholds. For any definition, the probability of a pandemic increases with the connectivity of the network, and with travel rates across the network.
We can also explore the difference in pandemic probability between the transregional and interregional definitions, which make use of a distinction between adjacent and non-adjacent regions. This is shown for a 10-region star network in Fig. 3a , in which we consider the 2-region transregional and 2-region interregional definitions. We choose a star network as it is one of the simplest network types in which there are adjacent and non-adjacent regions. There is a difference between the 2-region interregional and transregional definitions, but the difference is much smaller than that between the 2-region interregional and 10-region (global) definition, and reduces as travel rates increase. In the case of a fully connected network, all regions are essentially adjacent to each other, so we compare only the 2-region transregional and global definitions. We find that the definitions are clearly distinct for low travel rates, but as the travel rate increases the difference between the likelihood of a pathogen causing an epidemic in one region and the likelihood of it causing epidemics in all regions disappears. This is due to the fact that the pathogen can be introduced into any population from any other.

In this section we have shown that, when a pandemic is defined in terms of which regions experience epidemics of a disease, different definitions can produce very different estimates of the pandemic probability at low connectivity or travel rates, but have a much smaller effect at high connectivity and travel rates. In the supplementary information, we illustrate that effects due to network structure are mostly due to the difference in motility between the full network and the star network, although topology still plays an important role.

Cross-immunity. Some pathogens with pandemic potential, such as pandemic influenza, have a prior history of infecting humans. Newly emerged pathogens with no history of infecting humans are less likely than these established pathogens to encounter regions where susceptible individuals have partial immunity to infection. Established pathogens may encounter individuals with partial immunity acquired from infections with previously circulating strains, i.e. cross-immunity 48, 49. It can be important in responding to an outbreak to consider whether any individuals might have existing immunity. We can therefore investigate the interaction between immunity generated by prior exposure and pandemic definition by examining how cross-immunity affects our calculation of the pandemic probability on a network.

We modelled the spread of a pathogen over a ten-region network with no cross-immunity initially, where the initial infected individual could originate in any region. We only included cases where at least one region experienced an epidemic of this initial pathogen. To simulate the emergence of a strain with higher pandemic potential, we then introduced a second pathogen with a higher transmission rate of β = 0.42 per day (corresponding to a basic reproduction number of 3), to which infection with the initial pathogen conferred some degree of partial immunity to infection. The strength of this immunity is written as α. See the Methods section for details of how cross-immunity is incorporated into our modelling framework. We defined a pandemic as occurring when all ten regions experienced epidemics of the second pathogen, and repeated the model for two values of the level of cross-immunity at a variety of between-region travel rates. The results are presented in Fig. 4. First, increasing cross-immunity decreases the probability of a pandemic. Second, the presence of cross-immunity changes how pandemic probability scales with travel rates. In general, the pandemic probability increases faster with travel when the level of cross-immunity is low, except when it reaches a point of saturation as in Fig. 4b.

Figure 5 shows the simultaneous effects of different n-region transregional pandemic definitions and the degree of cross-immunity in determining the pandemic probability. Here we fix the travel rate at $2.0 \times 10^{-4}$ per day. In the full network there is a distinct transition from higher risk to lower risk, as cross-immunity approaches one. However, in the star network there is, on average, less circulation of the initial pathogen, so the effect of cross-immunity is less dramatic.
Increased cross-immunity can also increase the difference in risk for different pandemic definitions-for the fully connected network, when cross-immunity exceeds α = 0.5 , differences in probability between different thresholds become visible that are much smaller at lower values. This suggests that the probability that an outbreak will develop into a pandemic may be more sensitive to the exact pandemic definition for outbreaks of pathogens that encounter pre-existing immunity than for pathogens which encounter only fully susceptible populations. However, this effect is not seen for the star network, in which the low connectivity of the network results in larger differences in probability between different thresholds even at low levels of cross-immunity.

A topic of great concern during a pandemic is heterogeneity in risk between different countries or regions 50, 51. Cross-immunity can create one kind of heterogeneity, since it is common for previous exposure to a pathogen to differ between regions 52. Another kind of heterogeneity is that due to different public health interventions. Here we ignore cross-immunity and instead examine a heterogeneous fully connected network of ten regions, five of which have a higher rate of transmission of the pathogen than the other five. This can be thought of as an approximation to the difference between poor regions with a relative lack of public health interventions, and wealthy regions with well-funded public health organisations and increased access to healthcare. The level of heterogeneity was defined as the ratio of the transmission rate in the higher-transmission regions to the transmission rate in the lower-transmission regions. The average transmission rate across all regions was kept fixed at β = 0.28 per day, corresponding to a basic reproduction number of 2. The simultaneous effects of heterogeneity and the pandemic definition in determining the pandemic probability are illustrated in Fig. 6.
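To make the heterogeneity parameter concrete, the sketch below recovers the two transmission rates from a given ratio while holding the average fixed. It assumes that "average" means the arithmetic mean over the ten regions; the function name and default arguments are illustrative.

```python
def heterogeneous_rates(mean_beta, ratio, n_high=5, n_low=5):
    """Return (beta_high, beta_low) such that beta_high / beta_low == ratio
    and the arithmetic mean over all regions equals mean_beta (assumed)."""
    beta_low = mean_beta * (n_high + n_low) / (n_high * ratio + n_low)
    beta_high = ratio * beta_low
    return beta_high, beta_low


# Example: a heterogeneity level of 3 with the default mean of 0.28 per day.
beta_high, beta_low = heterogeneous_rates(0.28, 3.0)
mu = 0.14  # recovery rate per day (value from the text)
r0_high, r0_low = beta_high / mu, beta_low / mu
```

Under this assumption, a heterogeneity level of 3 gives rates of 0.42 and 0.14 per day, so the lower-transmission regions sit at a basic reproduction number of 1, where local epidemics can no longer take off.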

The row for the 1-region definition shows how the risk of any outbreak varies with the changing basic reproduction number of the pathogen in the region in which it emerges. More complex effects can be seen for higher n-region definitions, especially the 10-region definition, where, at high levels of heterogeneity, even pathogens emerging in higher-transmission regions are prevented from spreading globally due to the low chance of epidemics in lower-transmission regions. Thus the probability of a pandemic under a 10-region definition increases and then decreases with increasing heterogeneity. In the supplementary information, we show that this increasing-decreasing effect exists in networks of different sizes and structures. It appears at different thresholds in different networks. No corresponding effect exists for a pathogen emerging in a lower-transmission region, where increasing heterogeneity always decreases the chance of a pandemic, however it is defined.

In this study, we have developed a theoretical framework to estimate the probability of a pandemic, as detailed in the Methods section below. We use a Markov chain technique based on SIR dynamics to model the spread of a pathogen. The results of this modelling framework reveal in which situations the definition of 'pandemic' has a strong effect on the calculated pandemic risk and in which situations it does not. The models also illustrate the effects of differing epidemiological parameters on the pandemic risk under different definitions, and how these effects interact with each other.

Returning to the three epidemiological questions posed in the introduction, our results show how the answers can depend on our definition of a pandemic, and on key population and pathogen parameters. The first question was "Would interventions restricting travel reduce the risk of a pandemic?" In Fig. 2, we see that reductions in travel rates always reduce risk in a network with low connectivity, where travel occurs mainly through a central hub. However, in a highly connected network with high travel rates, travel would have to be suppressed very strongly to change the probability of a pandemic substantially, under most definitions. This accords with previous findings regarding the effectiveness of restricting travel 53. Additionally, in the highly connected network, changing the definition of a pandemic makes little difference to the pandemic probability, for high enough values of the travel rate. Figure 3 further illustrates the effects of different definitions. Changing the pandemic definition can sometimes greatly alter the estimated probability of a pandemic, as seen in Fig. 3a between the yellow line, representing the 2-region transregional definition, and the purple line, representing the 10-region transregional definition. The effect on the pandemic risk of reducing travel rates also differs substantially between these two definitions. However, there are situations where changing the definition does not significantly change the pandemic probability, as seen in the same figure between the yellow line and the dashed green line, representing the 2-region interregional definition. Both the estimated risk and the effect of reducing travel are very similar in these two cases. So, while some changes in definition do not cause a large change in quantitative analyses of the risk of a pandemic, others may significantly alter both our point estimates and the predicted effects of key parameters.
Figure 3b shows that this may depend on the values of those key parameters themselves. For low travel rates, the pandemic probability is very different for the two illustrated definitions, but at high travel rates the pandemic probabilities for the two definitions converge.

The second question was "Does a portion of the population have pre-existing immunity, and does this affect the risk of a pandemic?" The presence of immunity can significantly alter the results discussed in the paragraphs above. In Fig. 5b, the leftmost column is equivalent to the column from Fig. 2b in which the travel rate is $2.0 \times 10^{-4}$ per day, but with a higher transmission rate of β = 0.42 per day. However, as cross-immunity increases, a marked difference in the pandemic probability between different definitions becomes visible. This shows that the conclusion that precise pandemic definitions are of reduced importance in a highly connected network with high travel rates is context sensitive: if the population has high immunity, differences between definitions re-emerge.

The third question was "How is the risk of a pandemic affected by differences between regions?" In Fig. 6, we examined how heterogeneous transmission rates in different regions affect the pandemic probability. Many pathogens have higher transmission rates in lower income countries, and novel pathogens are more likely to emerge in low income countries 50, 51, 54. Putting these two facts together, we see that pathogens are most likely to emerge in countries in which they have higher transmission rates. Motivated by this, we compared the scenarios of emergence in a higher-transmission and lower-transmission region, finding that pandemic definition makes a larger difference for diseases emerging in a higher-transmission region. In particular, when the pandemic definition requires many countries to experience epidemics to qualify an outbreak as a pandemic, including countries with lower transmission rates, we see striking non-linearity in the relationship between heterogeneity and the pandemic probability. For these definitions, as the difference in transmission rates between higher- and lower-transmission regions increases, the pandemic probability increases initially, before decreasing. This initial rise is due to the enhanced spread between high-transmission regions increasing the importation rate to low-transmission regions. This result implies that, when the mean value of the transmission rate is fixed, a small gap in the effectiveness of public health infrastructure between wealthy and poor regions puts all regions at greater risk, while a larger gap protects wealthier regions while the risk for poor regions continues to increase.

To illustrate this concept, consider the contrasting examples of Ebola and COVID-19. The 2014 outbreak of Ebola virus followed the pattern of high incidence in low income countries but low incidence in high income countries. The virus spread through several low-income African countries but was effectively contained when introduced to high-income countries [55] [56] [57] . In this case, high-income countries had the capacity to prevent a pandemic from taking hold, being able to quickly isolate and treat symptomatic individuals. This generated high heterogeneity in transmission, corresponding to the right side of Fig. 6a , with low-income countries at high risk and high-income countries at low risk. In contrast, high-income countries have not been able to escape the pandemic of COVID-19, in part due to asymptomatic and presymptomatic transmission of SARS-CoV-2 allowing it to evade surveillance and public health measures 58, 59 . This has led to more similar transmission rates between different countries, corresponding to the left side of Fig. 6a , where risk is more uniform between regions and therefore between pandemic definitions.

In our analyses, we use a metapopulation modelling framework. Metapopulation models are widely used in pandemic modelling [40] [41] [42] [43] [44] [45] . Our novel Markov chain approach allows us to calculate pandemic probabilities directly, without requiring large numbers of simulations to generate an approximation. We expect our overall conclusion, that the effects of key parameters on pandemic risk depend on the pandemic definition, to hold irrespective of the underlying modelling framework. Future studies could replicate our analyses using different models and modelling approaches, such as metapopulation models with additional epidemiological complexity 43, 45, 60, 61 or the widely used global epidemic and mobility (GLEaM) model [62] [63] [64] . Exploring how our quantitative results vary for different modelling frameworks in the field of mathematical epidemiology 14, 16, [65] [66] [67] is a target for further investigation.

Other future work using our modelling framework could address the role of pandemic definitions in quantifying the effects of additional epidemiological parameters on pandemic risk, such as use of different types of travel (e.g. within-country transport or international flights) 45, 68, 69 , the rate of nosocomial infections 70 , or age structure 71 . Our metapopulation modelling framework is generally applicable, and this framework could be extended to represent outbreaks of many different specific pathogens emerging in various locations. An important factor for response planning is the timescale over which outbreaks develop into pandemics. The duration of the initial phase of outbreaks has been a subject of previous study 72 , as has the overall duration of outbreaks 10, [73] [74] [75] [76] . In theory, Markov chain models could be used to assess the time for a local epidemic to develop into a pandemic, and we leave this as an avenue for further work.

In summary, we have developed a novel modelling framework for estimating the pandemic risk. We have applied this framework to assess the pandemic risk in a range of different scenarios, and have interpreted the results under a variety of pandemic definitions. We have found that certain relationships, such as the effect of heterogeneity in transmission between regions on the risk of a pandemic, are highly dependent on the definition of 'pandemic' used, while others, such as the effect of high travel rates on pandemic risk in a highly connected network, are not. This work provides a foundation for improved communication about pandemic risk, by highlighting the contexts in which pandemic definitions need to be provided in quantitative detail. In general, we contend that, when assessing the risk that an outbreak will develop into a pandemic, the precise pandemic definition used for a given analysis should be considered and stated clearly. Future work could investigate the effects of alternative definitions in more detailed epidemiological models, and extend this framework to investigate different dynamical features of pandemics.

We have combined standard epidemiological modelling techniques with a novel Markov chain treatment of metapopulation dynamics to produce a method for calculating the probabilities of epidemics and pandemics in a network of population regions. At each step of this chain, we resolve information about which regions may experience epidemics. The order in which the status of any given region is resolved does not necessarily match the order in which the given epidemics occur in calendar time. A benefit of our model is that we can calculate the probabilities of different final outcomes directly, without requiring large numbers of stochastic simulations to estimate these values. This comes at the cost that temporal information is not represented explicitly in our model: we focus on the pandemic probability, accounting for all possible ways that a pandemic could occur, rather than estimating the possible times at which epidemics could occur in different regions or the timescale over which an outbreak will develop into a pandemic (see Discussion).

We model the transmission of a pathogen through $n$ regions labelled $P_1, P_2, P_3, \ldots, P_n$. Each region $P_j$ has associated with it some intra-region pathogen transmissibility $\beta_j$, disease recovery rate $\mu_j$, and population size $N_j$. From these quantities it is possible to calculate a region-specific basic reproduction number $R_{0,j}$. This can be fixed across all regions for a particular pathogen, or allowed to vary from region to region to reflect local epidemiological differences.

First let us consider the spread of the pathogen in a single region, using well-established results of stochastic Susceptible-Infected-Recovered (SIR) models. If a region $P_j$ contains an initial number of infected individuals $I_j(0)$, then in the stochastic SIR model, the probability that these individuals do not cause an epidemic in $P_j$ is $(1/R_{0,j})^{I_j(0)}$ when $R_{0,j} \geq 1$, and 1 otherwise 17. We also define the final size of an epidemic $R_j(\infty)$ (not to be confused with $R_{0,j}$) as the number of recovered individuals in $P_j$ at the end of the epidemic. This equals the total number of individuals in $P_j$ who become infected at any time, and is given by the solution of the final-size equation 77.
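For reference, the standard final-size relation for a closed SIR epidemic can be written as $R(\infty) = N - S(0)\exp(-R_0 R(\infty)/N)$. The sketch below solves this form numerically by bisection. It assumes $S(0) = N - I(0)$ with no initially recovered individuals, and the function name and tolerance are illustrative choices; the exact form used in the study may be written differently.

```python
import math


def final_size(n, r0, initial_infected=1, tol=1e-9):
    """Solve R = N - S0 * exp(-R0 * R / N) for R in (0, N) by bisection.

    Assumes S0 = N - initial_infected (no one is initially recovered).
    """
    s0 = n - initial_infected

    def residual(r):
        return r - n + s0 * math.exp(-r0 * r / n)

    # residual(0) = -initial_infected < 0 and residual(N) = S0 * exp(-R0) > 0,
    # and the residual is convex, so there is exactly one root in (0, N).
    lo, hi = 0.0, float(n)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Default scenario from the text: 1000 individuals per region, R0 = 2, one seed.
r_inf = final_size(n=1000, r0=2.0)
```

For the default parameters this gives a final size of roughly 80% of the regional population, which is the pool of individuals who could carry infection to other regions.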

Infected individuals are assumed to travel from region $P_j$ to region $P_m$ at a rate $\lambda_{jm}$. We seek the probability that infected individuals travelling from $P_j$ will not cause an epidemic in $P_m$, in the case where initially infected individuals in $P_m$ do not cause an epidemic in $P_m$ (including the case where there are no initially infected individuals in $P_m$). This is equal to the probability that $i$ infected individuals migrate from $P_j$ to $P_m$, multiplied by the probability that this number of individuals fails to cause a major epidemic, summed over possible values of $i$. The minimum value of $i$ is the case where no infected individuals migrate, and the maximum value is the case where all individuals in $P_j$ that become infected at any point migrate. This gives us an expression for $q_{jm}$, the conditional probability that, if $P_j$ experiences an epidemic and $P_m$ does not experience an epidemic due to a source of infected individuals other than $P_j$, $P_m$ does not experience an epidemic.

This approximation is valid when the number of infected individuals that travel between regions is much smaller than the size of the regions.
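The sum over the number of migrants described above can be sketched numerically. The code below is one possible reading, not the study's own expression: it assumes that each of the $R_j(\infty)$ individuals infected in $P_j$ migrates to $P_m$ independently with some probability `p_travel` (a hypothetical parameter, which could for instance be taken as the travel rate divided by the sum of the travel and recovery rates), so that the number of migrants is binomial. Under that assumption the sum collapses to a closed form, which the second function evaluates.

```python
import math


def q_explicit(final_size, p_travel, r0_dest):
    """Sum over i migrants of P(i migrate) * P(i migrants cause no epidemic).

    Assumes a binomial number of migrants (an assumption of this sketch).
    Intended for final sizes up to roughly a thousand, beyond which the
    binomial coefficients no longer fit in a float.
    """
    fail_one = min(1.0 / r0_dest, 1.0)  # one migrant fails to start an epidemic
    total = 0.0
    for i in range(final_size + 1):
        p_i = (math.comb(final_size, i)
               * p_travel ** i
               * (1.0 - p_travel) ** (final_size - i))
        total += p_i * fail_one ** i
    return total


def q_closed_form(final_size, p_travel, r0_dest):
    """Binomial theorem applied to the sum in q_explicit."""
    fail_one = min(1.0 / r0_dest, 1.0)
    return (1.0 - p_travel * (1.0 - fail_one)) ** final_size


# Illustrative values: final size 797, travel rate 2e-4 and recovery rate 0.14 per day.
p_travel = 2e-4 / (2e-4 + 0.14)
q_sum = q_explicit(797, p_travel, 2.0)
q_cf = q_closed_form(797, p_travel, 2.0)
```

The closed form makes the dependence on travel explicit: the probability that the destination escapes an epidemic decays geometrically with the final size of the source epidemic.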

We assume that infected individuals travelling from a region P j cannot cause an epidemic in a neighbouring region P m if P j does not itself experience an epidemic. Then computing the value of q jm for every pair of populations P j and P m gives us sufficient information to determine the probability of any particular set of regions connected on a network experiencing epidemics so long as there are no interactions between different groups of migrants arriving in a region, and the total numbers of migrants in any region remains very small relative to the region's size. If these assumptions hold, we can imagine the regions on a network with weighted directed edges, where the weight of the edge directed from region P j to region P m is q jm .

To determine how the final probabilities of epidemics depend on the pairwise probabilities $q_{jm}$, we use a Markov chain. The states of this Markov chain assign one of three states to each region: N (for neutral), where it is not yet determined whether the region will experience an epidemic; E (for epidemic), where it is determined that the region will experience an epidemic but it is not yet determined in which further regions it will cause epidemics; and T (for terminal), where it is determined that the region will experience an epidemic and in which further regions it will cause epidemics due to onward transmission. As our model does not explicitly represent dynamical processes occurring over time, these states should not be interpreted as actual states of infection and recovery within regions, but rather as bookkeeping devices for the role of various regions in determining the spread of the pathogen through the network.

Suppose we have a network connecting $n$ regions. In the initial state, each region where the initially infected individuals have caused an epidemic is in state E, and all the other regions are in state N. The global state of the network is simply the product of the states of the individual regions. We can then define a transition matrix $T$ that acts on the global state. The elements of this matrix are denoted $t_{x_1 x_2 \ldots x_n \to y_1 y_2 \ldots y_n}$.

where $x_j$ is the state (N, E, or T) of region $P_j$ before the transition, and $y_j$ is its state afterwards. The expression inside the first set of square brackets ensures that the only acceptable transitions for any given region are N → E and E → T, and requires that all epidemic regions in the initial state must be terminated in the transition (this prevents double-counting of possible transmission paths). The expression inside the second set of square brackets gives the probability of each N → E transition, and the expression inside the final set of square brackets gives the probability of each N → N transition, given the regions that are in state E before the transition. Note that these transitions do not represent a dynamical process: the order of transitions in this model does not necessarily correspond to the order in which regions experience epidemics. Instead, the transitions are simply stages in the exploration of different routes and outcomes of the disease spreading process.

The initial probability of each global state $z_1 z_2 \ldots z_n$ (where $z_i \in \{N, E, T\}$) is given by:

where $Q_j = \min\left((1/R_{0,j})^{I_j(0)}, 1\right)$ is the probability that the initial population of infective individuals does not cause an epidemic in region $P_j$. Essentially, no region can begin in state T, and the probability of each initial global state is given by the product of the probabilities of each region being in the corresponding initial regional state.
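The product structure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, and global states are represented as tuples of the letters "N" and "E" (state "T" never appears initially).

```python
from itertools import product


def no_epidemic_probability(r0: float, initial_infected: int) -> float:
    """Q_j = min((1 / R0) ** I(0), 1); equals 1 when R0 <= 1 or I(0) = 0."""
    if r0 <= 1.0:
        return 1.0  # also avoids overflow of (1 / r0) ** initial_infected
    return min((1.0 / r0) ** initial_infected, 1.0)


def initial_distribution(r0s, initial_infecteds):
    """Map each initial global state (tuple of 'N'/'E') to its probability."""
    qs = [no_epidemic_probability(r, i) for r, i in zip(r0s, initial_infecteds)]
    dist = {}
    for state in product("NE", repeat=len(qs)):
        prob = 1.0
        for z, q in zip(state, qs):
            # A region starts neutral with probability Q_j, epidemic otherwise.
            prob *= q if z == "N" else 1.0 - q
        if prob > 0.0:  # keep only reachable initial states
            dist[state] = prob
    return dist
```

For example, with two regions where only the first has a single initial case and $R_{0}=2$, the call `initial_distribution([2.0, 3.0], [1, 0])` assigns probability 0.5 to each of the states `("N", "N")` and `("E", "N")`.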

In this system, all states in which no region is epidemic are absorbing, and in each transition at least one epidemic state must become terminal. This means that the system must reach an absorbing state in at most $n$ transitions, since at least one region becomes terminal in each transition, and a fully terminal state is absorbing. So the final probability vector $p_{\mathrm{final}}$ is given by $p_{\mathrm{final}} = T^n p_{\mathrm{initial}}$, with $T$ as the transition matrix and $p_{\mathrm{initial}}$ as the vector whose elements are defined by Eq. (5). This final vector gives the probabilities of each configuration of the metapopulation, with regions in state N never experiencing an epidemic, and regions in state T experiencing an epidemic at some point.
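The bookkeeping above can be illustrated without building the full transition matrix, by propagating a mapping from global states to probabilities for $n$ steps. The sketch below is an assumption-laden illustration rather than the authors' implementation: it takes the probability that a neutral region stays neutral to be the product of $q_{jm}$ over all regions currently in state E (which relies on the stated assumption of no interactions between groups of migrants), and the names `step` and `final_distribution` are hypothetical.

```python
from itertools import product
from math import prod


def step(dist, q):
    """Apply one transition to a {global_state: probability} mapping.

    q[j][m] is the pairwise probability that an epidemic in region j
    does not cause an epidemic in region m.
    """
    new_dist = {}
    for state, p_state in dist.items():
        sources = [j for j, s in enumerate(state) if s == "E"]
        if not sources:  # no epidemic regions left: absorbing state
            new_dist[state] = new_dist.get(state, 0.0) + p_state
            continue
        neutral = [m for m, s in enumerate(state) if s == "N"]
        # Assumed independence across sources: N stays N with prod of q[j][m].
        stay = [prod(q[j][m] for j in sources) for m in neutral]
        for flips in product((False, True), repeat=len(neutral)):
            nxt = ["N" if s == "N" else "T" for s in state]  # E -> T, T stays T
            p = p_state
            for m, p_stay, becomes_e in zip(neutral, stay, flips):
                if becomes_e:
                    nxt[m] = "E"
                    p *= 1.0 - p_stay
                else:
                    p *= p_stay
            if p > 0.0:
                key = tuple(nxt)
                new_dist[key] = new_dist.get(key, 0.0) + p
    return new_dist


def final_distribution(initial, q):
    """Propagate the initial distribution for n steps (n = number of regions)."""
    dist = dict(initial)
    for _ in range(len(q)):
        dist = step(dist, q)
    return dist
```

With two regions, an epidemic certain in the first, and `q[0][1] = 0.25`, the final distribution places probability 0.25 on `("T", "N")` and 0.75 on `("T", "T")`. Enumerating global states grows exponentially with the number of regions, so this form is only practical for small networks.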

Cross-Immunity. The model described above can incorporate certain epidemiological details, such as heterogeneity of population parameters, but is restricted to treating quite simple disease dynamics. In this section we expand the model to treat pathogens that give those who overcome infection cross-protection against future strains of that pathogen. This is necessary to be able to investigate how pre-existing immunity changes how pandemic definitions affect the results of our model.

We first describe the spread of a pathogen strain X using the methods above, introducing a superscript X to the relevant parameters to mark the strain, e.g. $R^X_0$, $R^X(\infty)$, and $p^X_{\mathrm{final}}$. We assume that infection with pathogen X confers cross-immunity $\alpha$ to a second strain of the pathogen, which we call Y. In each population $P_j$ we can define an effective basic reproductive number for Y in the case that $P_j$ has experienced an epidemic of X, which we call $R^Y_{e,j}$.

This expression simply multiplies the basic reproductive number by the effective number of susceptible individuals given the prevalence of cross-immunity in the population. It is through this expression that cross-immunity enters the model; the parameter $\alpha$ does not otherwise appear in what follows. We can write down an equation for the expected total number of individuals in $P_j$ infected in an epidemic of Y in analogy to Eq. (1). In the case where there has been no previous epidemic of X in $P_j$, the expected epidemic size is the solution $R^Y_{j,\mathrm{noX}}(\infty)$ of the equation analogous to Eq. (1).

The function $\delta$ is defined by

$$\delta\big((a, b, \ldots, z), (A, B, \ldots, Z)\big) = \begin{cases} 1 & \text{if } (a, b, \ldots, z) = (A, B, \ldots, Z), \\ 0 & \text{otherwise.} \end{cases}$$

In the case where there has been a previous epidemic of X in $P_j$, the expected epidemic size is the solution $R^Y_{j,X}(\infty)$ of
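To make the role of cross-immunity concrete, the sketch below computes an effective reproductive number and an expected final size under two loudly stated assumptions that are not confirmed by the text: that the effective reproductive number scales the basic reproductive number by the fraction of the population left susceptible, $1 - \alpha x$, where $x$ is the fraction infected in the X epidemic; and that the final size follows the standard SIR final-size relation $z = s_0 (1 - e^{-R_0 z})$, which may differ in detail from Eq. (1). All function and parameter names are hypothetical, and sizes are expressed as fractions of the population rather than counts.

```python
from math import exp


def effective_r0(r0_y: float, alpha: float, fraction_infected_x: float) -> float:
    """Assumed form: R0 of Y times the fraction still susceptible to Y."""
    return r0_y * (1.0 - alpha * fraction_infected_x)


def final_size_fraction(r0: float, susceptible_fraction: float = 1.0,
                        tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Solve z = s0 * (1 - exp(-r0 * z)) by fixed-point iteration.

    Starting from z = s0 the iterates decrease towards the largest
    solution, which is 0 when r0 * s0 <= 1 (no major epidemic).
    """
    z = susceptible_fraction
    for _ in range(max_iter):
        z_next = susceptible_fraction * (1.0 - exp(-r0 * z))
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


# Example: cross-immunity 0.5 after an X epidemic that infected 60% of P_j.
s0 = 1.0 - 0.5 * 0.6
size_after_x = final_size_fraction(2.0, susceptible_fraction=s0)
size_without_x = final_size_fraction(2.0)
```

Under these assumptions a prior epidemic of X lowers both the effective reproductive number and the expected size of a subsequent epidemic of Y.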

We assume that individuals infected with Y travel at the same rate as individuals infected with X. We then define the pairwise probabilities of transmission of Y between populations in analogy to Eq. (2). That is,

where $R^Y_{c,m} = R^Y_{0,m}$ when $P_m$ has not experienced a previous epidemic of X, $R^Y_{c,m} = R^Y_{e,m}$ when $P_m$ has experienced a previous epidemic of X, $R^Y_{j,b}(\infty) = R^Y_{j,\mathrm{noX}}(\infty)$ when $P_j$ has not experienced a previous epidemic of X, and $R^Y_{j,b}(\infty) = R^Y_{j,X}(\infty)$ when $P_j$ has experienced a previous epidemic of X. These expressions for $q^Y_{jm}$ can be substituted for $q_{jm}$ in Eq. (3) to yield a transition matrix for modelling the spread of Y, which we will call $T^Y(s_1 s_2 \ldots s_n)$, where $s_j$ is the final state (either N or T) of the X outbreak in $P_j$. We find the initial probabilities of each state with regard to Y, $p^Y_{\mathrm{initial}}$, in analogy to Eq. (5). To find the overall probability of each combination of epidemics of Y in various populations given a prior probability of each combination of epidemics of X (given by $p^X_{\mathrm{final}}(s_1 s_2 \ldots s_n)$ defined in Eq. (6)), we sum over the possible values of $(s_1 s_2 \ldots s_n)$, weighted by their probability.
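The weighted sum over outcomes of the X outbreak amounts to a mixture of conditional distributions. A minimal sketch follows, in which distributions are dictionaries from global states to probabilities; the function name and the callable `y_final_given_x` (standing in for whatever procedure produces the final distribution of Y for a given X outcome) are hypothetical.

```python
def mix_over_x_outcomes(p_x_final, y_final_given_x):
    """Combine conditional Y distributions, weighted by X outcome probabilities.

    p_x_final: {x_outcome: probability}, with x_outcome a tuple of 'N'/'T'.
    y_final_given_x: callable returning {y_state: probability} for an x_outcome.
    """
    combined = {}
    for x_outcome, weight in p_x_final.items():
        for y_state, prob in y_final_given_x(x_outcome).items():
            combined[y_state] = combined.get(y_state, 0.0) + weight * prob
    return combined
```

Because each conditional distribution sums to one and the weights sum to one, the combined distribution also sums to one.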

Code is available on the Open Science Framework at https://osf.io/z52te/.

Received: 4 September 2020; Accepted: 29 December 2020 


The authors declare no competing interests.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-021-81814-3.

Correspondence and requests for materials should be addressed to B.J.S.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.