key: cord-0305325-dp4qv77q
authors: Southall, Emma; Tildesley, Michael J.; Dyson, Louise
title: Prospects for detecting early warning signals in discrete event sequence data: application to epidemiological incidence data
date: 2020-04-02
journal: bioRxiv
DOI: 10.1101/2020.04.02.021576
sha: dae510987d1be3e60f2a773f95c8f35e3f66ff15
doc_id: 305325
cord_uid: dp4qv77q

Early warning signals (EWS) identify systems approaching a critical transition, where the system undergoes a sudden change in state. For example, monitoring changes in variance or autocorrelation offers a computationally inexpensive method which can be used in real time to assess when an infectious disease transitions to elimination. EWS have the potential not only to monitor infectious diseases, but also to inform control policies to aid disease elimination. Previously, potential EWS have been identified for prevalence data; however, the prevalence of a disease is often not known directly. In this work we identify EWS for incidence data, the standard data type collected by the Centers for Disease Control and Prevention (CDC) or World Health Organization (WHO). We show, through several examples, that EWS calculated on simulated incidence time series data exhibit vastly different behaviours from those previously studied on prevalence data. In particular, the variance displays a decreasing trend on the approach to disease elimination, contrary to that expected from critical slowing down theory; this could lead to unreliable indicators of elimination when calculated on real-world data. We derive analytical predictions which can be generalised for many epidemiological systems, and we support our theory with simulated studies of disease incidence. Additionally, we explore EWS calculated on the rate of incidence over time, a property which can be extracted directly from incidence data.
We find that although incidence might not exhibit typical critical slowing down properties before a critical transition, the rate of incidence does, presenting a promising new data type for the application of statistical indicators.

Author summary

The threat posed by infectious diseases has a huge impact on our global society. It is therefore critical to monitor infectious diseases as new data become available during control campaigns. One obstacle in observing disease emergence or elimination is understanding what influences noise in the data and how this fluctuates when near to zero cases. The standard data type collected is the number of new cases per day/month/year, but mathematical modellers often focus on data such as the total number of infectious people, due to its analytical properties. We have developed a methodology to monitor the standard type of data to inform whether a disease is approaching emergence or elimination. We have shown computationally how fluctuations change as disease data get closer to a tipping point, and our insights highlight how these observed changes can be strikingly different when calculated on different types of data.

One of the greatest challenges in society today is the burden of infectious diseases, affecting public health and economic stability all over the world. Infectious diseases disproportionately affect individuals in poverty, with millions of those suffering daily from diseases that are considered eradicable. The potential for eradicating diseases such as polio, guinea worm, measles, mumps or rubella is immense (International Task Force for Disease Elimination [31]). Even where effective vaccines or treatments exist, disease [...] it was later abandoned in 1969 due to funding shortages and drug resistance [26], leading to re-emergence of disease in Europe [27].
Assessing when a disease is close enough to elimination to die out without further intervention, thus prompting the end of a control campaign, is a problem of global economic importance. If campaigns are stopped prematurely, the disease can resurge and set control efforts back by decades. Conversely, the threat posed by newly emerging diseases such as SARS, Ebola or the recent coronavirus outbreak (COVID-19) strains available resources, places restrictions on global movement and disrupts the world's most vulnerable societies. Identifying which newly emerging diseases will present a global threat, and which will never cause a widespread epidemic, is of critical importance.

To overcome the challenges of identifying disease elimination or emergence, numerous studies have suggested the use of early warning signals (EWS) [3-8]. EWS are statistics that may be derived from data and that change in a predictable way on the approach to a critical threshold. In epidemiology this threshold is commonly described as the point at which the basic reproduction number, $R_0$, passes through $R_0 = 1$. A system with $R_0$ increasing through 1 describes an emerging or endemic disease, whereas $R_0$ decreasing through 1 results in disease elimination. We seek to find EWS to identify when a disease is approaching such a transition. We may identify such statistics using critical slowing down (CSD) theory, which indicates the imminent approach of a threshold, arising from increasing recovery times from perturbations as a system approaches a critical transition [1, 2]. This increase in recovery time occurs because, as the stability of a steady state changes, such as from a disease-free state to emergence or from an endemic state to elimination, the dominant eigenvalue of the steady state passes through zero.
Since the eigenvalue also determines the relaxation time of the system, this recovery time therefore increases as we approach a critical transition.

EWS offer the ability to anticipate a critical transition indirectly in real-world noisy time series data, by observing, for example, increasing variance or autocorrelation in the fluctuations around the steady state [2, 9]. Statistical indicators offer a computationally inexpensive and efficient method for assessing the status of an infectious disease, presenting a simple mechanism for disease surveillance and monitoring of control policies.

The development of EWS is an active area of research in many fields, identifying the statistical signatures of abrupt shifts in many dynamical systems. Studies have applied EWS to historical data or laboratory experiments where a tipping point is known [1, 13, 23]; developed methods for using spatial variation [17, 18]; explored the effects of detrending [8, 15]; used ensembles of multiple EWS [12, 13, 30]; and developed understanding of the limitations of EWS [16, 19, 20].

March 27, 2020

Discrepancies in statistical signatures have been discovered in a variety of historical datasets known to be going through a critical transition: from climate systems to stock markets, to applications with ecological field data [16, 22, 23]. These studies observed unexpected characteristic traits of common EWS, such as a decreasing trend in variance or standard deviation, leading to a discussion on the robustness of indicators. It is therefore highly important to understand analytically how EWS are expected to change on the approach to a critical transition for different data types, to avoid any misleading results.
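As a worked illustration of why variance and autocorrelation are expected to rise under critical slowing down, consider the simplest linear model of fluctuations about a steady state, an Ornstein-Uhlenbeck process. This is an illustrative assumption, not a model taken from the text: the symbols $\kappa$ (the recovery rate, so that $-\kappa$ plays the role of the dominant eigenvalue) and $\sigma$ (a constant noise intensity) are introduced here only for this sketch.

```latex
% Linear fluctuations with recovery rate \kappa > 0 and constant noise \sigma
d\zeta_t = -\kappa\,\zeta_t\,dt + \sigma\,dW_t

% Stationary variance and lag-\tau autocorrelation
\operatorname{Var}(\zeta) = \frac{\sigma^{2}}{2\kappa},
\qquad
\operatorname{Corr}(\zeta_t,\zeta_{t+\tau}) = e^{-\kappa\tau}

% As the eigenvalue approaches zero (\kappa \to 0^{+}):
\operatorname{Var}(\zeta) \to \infty,
\qquad
\operatorname{Corr}(\zeta_t,\zeta_{t+\tau}) \to 1
```

Both signals increase only because $\sigma$ is held fixed here; if the noise intensity itself shrinks on the approach to the transition, the variance need not increase, which is one way the expected trend can fail for a given data type.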
The initial development of EWS in epidemiology focused on prevalence data, producing analytical solutions and numerically testing the capabilities of statistical indicators of emergence and elimination of infectious diseases [5-8]. Analysis of computer simulations of well-studied epidemiological systems has highlighted challenges such as seasonality [6] [...] conclude that the mean, variance and coefficient of variation (CV) are poor indicators since they are sensitive to reporting errors and insensitive to differences between transmission and recovery rates.

In this paper, we advance the current literature to describe generalised signatures of statistical indicators for incidence data on the approach to a threshold, highlighting the differences between EWS descriptors of incidence and prevalence. Strikingly, our results demonstrate that although EWS of emergence exhibit an increasing variance, a trait [...] demonstrate that variance instead decreases on the approach to a disease elimination transition. We find that although incidence data does not undergo the transcritical bifurcation traditionally considered by CSD theory, time series trends are nevertheless a valuable tool to predict disease elimination. The discrepancy between prevalence and incidence on the one hand, and elimination and emergence on the other, could lead to problems in detecting thresholds if the differences are not clearly understood.

We introduce an analytical theory from stochastic processes to address why variance in incidence decreases for disease elimination. We study multiple other indicators of [...] (SDEs) that describe the analytical behaviour of prevalence in these systems. Derivations of the analytical results and calculations of each statistic can be found in the supporting text (S1 Appendix).
We present our analysis for incidence data and [...] We verify our analytical results for prevalence and incidence with simulated studies, and compare the contrasting results between prevalence and incidence. We measure the change in trend of multiple statistical indicators using the Kendall-tau score, which gives an indication of an increasing or decreasing trend.

We begin with a simple example of a system that is approaching elimination from an existing endemic state of $I$. We consider an SIS model where the effective contact rate $\beta$ acts as the control parameter. Effective reduction of $\beta$ can be induced by public health campaigns (such as washing hands or improving food hygiene) and through social distancing (such as school closure). Decreasing $\beta(t)$ in time slowly forces $R_0 = \beta(t)/\gamma$ through the critical transition at $R_0 = 1$. The model transitions are shown in the following schematic with transition probabilities given in Table 1, [...] where $\beta(t)$ changes slowly in time, given by, [...] can be separated using the linear noise approximation [28]. The corresponding SDE for $\zeta$ defined for the SIS is given by [8], [...]

Model 2: SIS with increasing vaccination coverage (births and deaths)

We consider an SIS model where a proportion of susceptible individuals are vaccinated and gain immunity to the disease. By increasing the proportion of individuals [...] to push the system through the critical transition at $R_0 = 1$. We interpret the dynamics of the fluctuations of these SDEs with a two-dimensional Fokker-Planck equation (see supplementary text S1 Appendix and Table 1 for transition rates), [...] where $\zeta_1$ defines the fluctuations about the susceptibles ($\psi = S/N$) and $\zeta_2$ defines the fluctuations about the infecteds ($\phi = I/N$), and, [...] In both cases it is assumed that the fluctuations can be separated linearly from the steady state.
Many statistics of this system can be described by the covariance matrix $\Theta$, given by [...] In particular, the variance of the fluctuations about the infectious steady state is given by $\Theta_{22}$.

Finally, we consider the SIS model with external infection, which has been used to investigate EWS in prevalence and in incidence [4, 5]. We demonstrate how our analytical results compare for this system, and illustrate differences when applied to [...] and we consider the model in a stochastic formulation, with transition probabilities given in Table 1. Disease emergence is driven by increasing the effective contact rate $\beta(t)$ over time, which slowly increases $R_0$ through the critical transition at $R_0 = 1$, [...] The fluctuations $\zeta$ about the steady state are an extension of those in Model 1 to include the external force of infection and satisfy, [...] found in the supplementary text (S1 Appendix).

One limitation of this methodology is its inability to extend to other systems. This derivation would need to be computed again for each specific example. In particular, if one wanted to consider a simpler system with no external forcing by setting $\nu = 0$, this result would become redundant, prompting our search for generic EWS that can describe all systems.

A counting process can be used as a generalised theory to understand the dynamics of the number of new events over a period of time. In particular, a diverse range of data types can be described by a counting process, and this motivates us to characterise how statistics of such processes behave on the approach to a critical transition. Incidence (the number of new cases, $N_t$) is a counting process, which is known to be described by a non-homogeneous Poisson process [...] where the integral approximation holds for $\Delta t$ sufficiently small. In the supporting text (S2 Fig) we demonstrate that, for our parameters, this approximation works well for $\Delta t$ up to 3.
We can derive EWS in disease incidence aggregated over a time interval $\Delta t$ (e.g. daily, weekly, biweekly cases) using the well-known central moments of the Poisson distribution: [...] Prior work from O'Dea et al. [4] has also incorporated under-reporting using a negative binomial distribution; this can be included in this model when the rate $\lambda(t)$ is gamma distributed. A common form of this force of infection is $\lambda(t) = \beta(t)S(t)I(t)/N$, so that $\lambda(t)$ depends on the prevalence of infection, $I(t)$. For Model 1, $\beta(t)$ is a function of time, whereas in Model 2 $\beta(t) = \beta_0$ is fixed. Infection can also be increased in other ways, such as an external force of infection (Model 3), $\lambda(t) = \beta(t)S(t)I(t)/N + \nu S(t)$, that is typically used to describe zoonotic spillover events or as an approximation for human migration.

Rate of Incidence Theory

We consider the rate of incidence (or the rate of the Poisson process) $\lambda(t) = T(I+1|I)$, which can be described dynamically with an SDE. Our analysis shows that the critical transition of the rate of the Poisson process corresponds to that of prevalence models (e.g. at $R_0 = 1$) and, importantly, exhibits behaviours predicted by CSD.

We investigate here calculating statistics on the rate of incidence (RoI) and its potential to be used as an EWS for disease transitions. [...] In particular, by considering the time derivative of $\lambda_t$ we can conclude that the fixed points of the rate of incidence can be described by the transcritical bifurcation at $R_0 = 1$. We find that the stability of the fixed points of $\lambda_t$ also corresponds to that of $I$, as expected.

We describe the fluctuations, $\omega$, about the steady state of $\lambda_t = \beta(N-I)I/N$ using the linear noise approximation (LNA). We are interested in statistics calculated on the fluctuations about the rate of incidence, to develop new indicators of disease elimination.
We derive the resulting analytical solution for $\omega$ using Itô's change of variable formula (details in supporting text: S1 Appendix) to approximate $\omega$ with the following Gaussian process: [...] In particular, the changing behaviour of the variance of the rate of incidence as the system approaches disease elimination can be calculated from the SDE (Eqn. 25), [...]

Model 2

If we consider models where there is population-level immunity, then $\lambda_t = \beta SI/N$ and we can no longer reduce the dimension of incoming transitions using $S = N - I$. This can be seen in Model 2 (SIS with increasing rate of vaccination); in particular, the prevalence analysis of these systems presented in the Methods section results in a multivariable [...] We again use Itô's change of variable formula for the multivariable system (which will depend on $\zeta_1$ and $\zeta_2$) to approximate $\omega$. This leads to an SDE which depends on the description of $\zeta_1$ and $\zeta_2$ (Eqn. 9). In particular, we are interested in statistics of the rate of incidence, such as the variance, which can be simplified in terms of the original covariance matrix $\Theta$ (Eqn. 12) and the mean-field equations of infectious ($\phi$) and susceptible ($\psi$) individuals, $\omega \approx \beta(\phi\zeta_1 + \psi\zeta_2)$, [...]

Following this, we can describe the fluctuations about the rate of incidence ($\omega$) using the LNA. In particular, [...] As previously, we can derive statistics of $\omega$ from the solution of this SDE. In particular, since the SDE is linear in $\omega$, we can describe $\omega$ as a Gaussian variable with mean zero and variance given by the solution to the following ODE, [...]

[...] was reduced from $\beta_0 = 1$ to 0, slowly forcing $R_0$ from 5 to 0. In Model 2, the rate of vaccination was increased from $p_0 = 0$ to 1, slowly forcing $R_0$ from 5 to 0. In Model 3, the transmission parameter $\beta$ was increased from $\beta_0 = 0.12$ to 0.24 so that the basic reproduction number increases from $R_0 \approx 0.6$ to $\approx 1.2$.
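The kind of simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a fixed-step (tau-leaping) approximation rather than an exact Gillespie algorithm, assumes a linear decline of $\beta(t)$ from $\beta_0 = 1$ to 0, takes $\gamma = 0.2$ so that $R_0$ starts at 5, and picks the population size `N`, the duration `t_end` and the step `dt` arbitrarily. The function name `simulate_sis` is hypothetical. It records prevalence $I$, incidence (new cases per unit time) and the rate of incidence $\lambda_t = \beta(N-I)I/N$ computed from prevalence.

```python
import numpy as np

def simulate_sis(N=1000, beta0=1.0, gamma=0.2, t_end=500, dt=0.1, seed=0):
    """Tau-leaping sketch of a stochastic SIS model with slowly decreasing beta.

    Assumed schedule: beta(t) = beta0 * (1 - t / t_end).
    Returns arrays of length t_end: prevalence, incidence and rate of incidence.
    """
    rng = np.random.default_rng(seed)
    steps = int(round(1.0 / dt))                  # sub-steps per unit of time
    I = int(round(N * (1.0 - gamma / beta0)))     # start at the endemic steady state
    prevalence = np.zeros(t_end, dtype=int)
    incidence = np.zeros(t_end, dtype=int)
    roi = np.zeros(t_end)
    for k in range(t_end):
        # Record the state at the start of each unit-time interval.
        prevalence[k] = I
        roi[k] = beta0 * (1.0 - k / t_end) * (N - I) * I / N
        for j in range(steps):
            beta = beta0 * (1.0 - (k + j * dt) / t_end)
            lam = beta * (N - I) * I / N          # infection rate T(I+1 | I)
            # Poisson numbers of events in dt, capped so that 0 <= I <= N.
            new_inf = min(rng.poisson(lam * dt), N - I)
            new_rec = min(rng.poisson(gamma * I * dt), I)
            I += new_inf - new_rec
            incidence[k] += new_inf               # new cases in [k, k + 1)
    return prevalence, incidence, roi

prevalence, incidence, roi = simulate_sis()
```

With these assumed parameters $R_0 = \beta(t)/\gamma$ crosses 1 at $t = 400$, so statistics computed on the three returned series before that time can be compared across data types.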
A drawback of using the rate of incidence (RoI) as a measure of disease elimination is the need to develop methods to extract this rate from incidence data. In our simulation study, we calculate the RoI in two ways. Firstly, using simulations of [...] (1)). We illustrate how EWS change over time, and how accurate the theory is at predicting these trends. In particular, we are interested in the trends over time and whether these time series properties of EWS are the same for prevalence and incidence data.

The Kendall-tau score gives a measure of an increasing or decreasing trend of each statistic over the time series. We use the measure to evaluate whether a statistic corresponds to an increasing or decreasing trend and compare this for different data types (prevalence, incidence and RoI). The Kendall-tau score is defined following [25], with $N$ the number of time points, in terms of concordant and discordant pairs: two points in the time series $(t_1, x_{t_1})$ and $(t_2, x_{t_2})$ with $t_1 < t_2$ are said to be a concordant pair if $x_{t_1} < x_{t_2}$, and a discordant pair if $x_{t_1} > x_{t_2}$. If the two points are equal ($x_{t_1} = x_{t_2}$) then the pair is neither concordant nor discordant.

We calculate each statistic on a moving window (size 50) for each detrended simulation, and then evaluate the Kendall-tau score. We compare the Kendall-tau scores calculated on simulations going through a critical transition with those from null simulations, and we then calculate receiver operating characteristic (ROC) curves by considering the null model to be negative and the other models to be positive. We compare the performance of each model statistic using the area under the curve (AUC). Good statistics have an AUC close to 1 or 0, since this indicates that the statistic performs far better than classification by chance.

Variance

Variance is one of the most intuitive statistical indicators.
As a system approaches a critical transition, the time taken to recover from small perturbations increases, as described by critical slowing down theory. This can be observed in the fluctuations about the steady state, which on the approach to a critical transition take longer to return and consequently vary far more, defining the increasing nature of variance as an early warning signal.

We evaluate analytical solutions of the variance in prevalence using the derived SDE for each model (Model 1: Eqn. 4, Model 2: Eqn. 9, Model 3: Eqn. 15). We compare this to theoretical solutions of the variance in incidence, using the transition rates for each model (Table 1) to compute the rate of the Poisson process $\lambda$. Figure 1 [...]

We observe that variance in prevalence simulations (Figure 1 a(ii), b(ii) and c(ii)) increases on the approach to the critical transition, as predicted by critical slowing down. In comparison, the variance in incidence decreases before the critical transition for all disease elimination models (Model 1, Fig. 1 a(i), and Model 2, Fig. 1 b(i)) and increases similarly to prevalence for the disease emergence model (Model 3, Fig. 1 c(i)). The decrease contradicts critical slowing down theory, which predicts that deviations from the steady state return increasingly slowly on the approach to a transition, so that variance should increase.

As expected from our Poisson process analysis, the variance of this system should be the same as the mean of the system. Therefore, for disease elimination models, we should expect a decreasing variance (along with a decreasing mean) when calculated on incidence data, in contrast to an increasing variance with prevalence data. Likewise, with disease emergence models we expect an increasing variance to correspond to the increasing mean. This demonstrates that our analysis of incidence has successfully predicted the time-varying variance for these different systems.
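The Poisson argument above can be checked numerically with a short sketch. It draws incidence counts directly from a Poisson distribution whose rate decreases linearly (an assumed schedule, chosen only for illustration, rather than a rate generated by one of the epidemic models), confirms that the ensemble variance tracks the ensemble mean, and then computes the variance on a moving window of size 50 for one detrended realisation together with its Kendall-tau score. The tau used here is the simple concordant-minus-discordant form over $N(N-1)/2$ pairs, which is assumed to match the definition in [25]; the helper names `kendall_tau` and `rolling_variance` are hypothetical.

```python
import numpy as np

def kendall_tau(x):
    """Kendall tau of x against time: (concordant - discordant) / (n(n-1)/2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diff = x[None, :] - x[:, None]        # diff[i, j] = x[j] - x[i]
    upper = np.triu_indices(n, k=1)       # all pairs with i < j
    # sign is +1 for concordant, -1 for discordant, 0 for ties
    return np.sign(diff[upper]).sum() / (n * (n - 1) / 2)

def rolling_variance(x, window=50):
    """Sample variance over each trailing window of the given size."""
    x = np.asarray(x, dtype=float)
    return np.array([x[i - window:i].var(ddof=1)
                     for i in range(window, len(x) + 1)])

rng = np.random.default_rng(1)
T = 400
lam = np.linspace(160.0, 10.0, T)             # assumed decreasing rate of incidence
counts = rng.poisson(lam, size=(2000, T))     # 2000 independent realisations

ens_mean = counts.mean(axis=0)
ens_var = counts.var(axis=0, ddof=1)
ratio = ens_var / ens_mean                    # close to 1 for Poisson counts

# Detrend one realisation with the ensemble mean, then track the variance trend.
residual = counts[0] - ens_mean
tau = kendall_tau(rolling_variance(residual, window=50))
```

A negative `tau` here corresponds to the decreasing trend in incidence variance described for the elimination models; reversing `lam` so that the rate increases would give the emergence case.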
Rate of incidence

We have observed that incidence data does not approach a critical transition as described by critical slowing down theory. Consequently, we demonstrated in Figure 1 that the variance of incidence does not necessarily increase on the approach to a critical transition. A new approach for working with incidence-type data is to consider the rate of incidence, $\lambda(t) = T(I+1|I)$, for which we have derived the dynamical SDE for each model (see Methods).

The analytical variance in the rate of incidence is presented in Fig. 2 (orange line). We find that the theoretical analysis supports the simulation studies and provides us with an understanding of how statistical indicators calculated in RoI data change on the approach to a critical transition. We observe an increasing variance in the rate of incidence before a critical transition; a time series trend which, if exhibited in real-world data, could be used to anticipate disease tipping points.

We present results calculated in RoI simulations using the two methods, "true" and "approximated" RoI, in Fig. 2. The first method uses prevalence data ("true", purple line) and corresponds well with the analytical solution (orange line) for all models, and the latter method (smoothing incidence data, "approximated", blue line) fits particularly well for Model 3 (Fig. 2(c)). However, it follows some time-varying properties of the variance less closely for Models 1 and 2 (Fig. 2(a) and (b) respectively). Although [...] increasing variance on the approach to the critical transition. In particular, the "approximated" RoI (blue line), which can be implemented in practice from incidence-type data, captures this property in all models.

In Fig. 2(a) and (b) we observe that the analytical prediction fits well with the stochastic simulations of $\beta SI/N$ (purple line) for Model 1 and Model 2 respectively.
This demonstrates that this theory approximates the behaviour of the system well. Indeed, we observe that approximating the rate of incidence by smoothing Gillespie simulations of new cases and then calculating the variance of this quantity ("approximated" RoI) predicts a similar increasing behaviour before the critical transition, shown in Fig. 2(a) (blue line). This corresponds to the same peak as the analytical prediction and "true" simulations. However, it fails to capture the magnitude of the behaviour earlier on in the dynamics.

An area that still needs to be addressed with this methodology of smoothing new case data is determining a suitable window size. A poor choice could result in misleading EWS when used in practice. In the supporting text (S1 Fig), we demonstrate that if the disease is approaching elimination at a slower rate, both methods ("true" and "approximated") converge to the analytical solution. We chose parameters such that Model 1 approaches disease elimination at the same rate as Model 3 approaches disease emergence ($R_0$ changes from 1.2 to 0, $\beta_0^{\{1\}} = 0.24$). If the system changes slowly enough, it will be approximately ergodic, such that the moving average resembles the mean incidence. Thus the "approximated" method will be closer to the "true" solution. In comparison, the faster a system changes over time, the wider the range of incidence values across the moving window, resulting in a lower mean over the window, which can be seen in Fig. 2(a),(b); although the statistic will be more pronounced at the threshold.

We find that for Model 2 (Fig. 2(b)) the general trend of the variance is less pronounced at the critical transition than observed for Model 1. We observe that the analytical solution (Fig. 2(b) orange line) and true stochastic simulations (Fig. 2(b) purple line) only slightly increase before the critical transition, implying this trend would be difficult to detect in real-world data. In particular, the Kendall-tau score, which can be an indication of an increasing trend, is negative (decreasing, $\tau = -1$) for Model 2, whilst for Models 1 and 3 we find that $\tau = 0.987$ and $\tau = 1$ respectively. However, we observe that the "approximated" simulations of the rate of incidence (Fig. 2(b) blue line) exhibit similar properties to Model 1. We observe that the early-stage dynamics of this method have not predicted the expected behaviour of the analytical solution. It can be noted that $R_0$ decreases at the same rate as in Model 1, suggesting that this can be a result of the approximation when $R_0$ is not slowly changing. Due to this approximation, it can be observed that the variance of new cases does therefore increase before the critical transition (blue line).

In Fig. 2(c) we observe that both measurements of the variance of $\lambda_t$ calculated on stochastic simulations of Model 3 have closely followed the analytical solution of variance. As expected, the true stochastic simulations (Fig. 2(c) purple line) follow the theory closely, supporting that this derivation of $\omega$ is correct. More interestingly, calculating the variance of the rate of incidence directly from simulations of new cases ($N_t$, Fig. 2(c) blue line) has performed far better than when presented in Model 1 (Fig. 2(a)). For Model 3, we observe that the variance of the rate of incidence increases before the critical threshold, where the infectious disease emerges with outbreak dynamics similar to prevalence for this model. We further found that the early dynamics of the "approximated" RoI simulations represent the true behaviour of the variance.
This result may be due to $R_0$ increasing more slowly in Model 3 than the rate at which it decreases in Model 1, satisfying the ergodic condition.

In this section, we investigate the potential of identifying an epidemiological transition using five commonly implemented early warning signals: variance, coefficient of variation (CV), skewness, kurtosis and lag-1 autocorrelation (AC(1)). Exploration of each EWS follows similarly to variance, as analysed above theoretically and numerically for prevalence, incidence and rate of incidence. Time series analyses for these indicators can be found in the supporting text (S1 Appendix).

Here, we quantify these time series trends for each statistical indicator using the Kendall-tau score as a measure of an overall increasing or decreasing trend. We present in Fig. 3 the predictive power of each statistical indicator using its time-changing trend to classify simulations as either extinct (Ext simulations), emerging (Emg simulations) or null simulations (Fix simulations). We calculate the area under the ROC curve (AUC) by comparing the Kendall-tau score calculated over each time series up to two end points: before the critical transition ($t_1$) and after the critical transition ($t_2$), which gives an overall score of the true/false positive rate.

The AUC score gives a predictive measure between different indicators, which we use to assess their performances. The closer the AUC is to 0.5, the worse the statistic's performance at anticipating a critical transition. This is analogous to randomly selecting simulations that are the null and disease elimination (Model 1, Fig. 3(a)) or disease emergence (Model 3, Fig. 3(b)). In particular, skewness is a poor indicator because of its inability to identify disease elimination with any type of disease data it is applied to (rate of incidence, incidence and prevalence).
Identifying emergence with skewness in prevalence or RoI data (red and orange bars respectively) is also very poor, and its predictive ability is only slightly increased with incidence (green bars).

A score close to 1 indicates nearly perfect sensitivity and specificity. For each EWS, we assume that an increasing trend represents a disease going through a critical transition. As a result, an AUC score of 1 informs us that the indicator is increasing and that it is possible to identify all Ext/Emg simulations when compared to the null simulations by its increasing trend. Notably, in Fig. 3 the coefficient of variation calculated on all types of disease data (rate of incidence, incidence and prevalence), and for both Models 1 and 2, exhibits a near perfect ability to identify the increasing trend.

An AUC score of 0 demonstrates that the time series trend is instead decreasing and as such it does not correspond to the predetermined prediction. A perfectly diagnosed decreasing indicator when compared to the null model will result in zero sensitivity under these conditions and an AUC score of 0. Fig. 3 highlights which indicators are in some cases increasing (AUC close to one), decreasing (AUC close to zero) or are poor indicators (AUC close to 0.5). In particular, as discussed in the previous section, variance always increases prior to disease emergence (Fig. 3(b)). However, for disease elimination (Model 1: Fig. 3(a) and Model 2: S11 Fig) results are substantially different when we compare variance calculated in rate of incidence and prevalence (orange and red bars respectively) with incidence (green bars). For RoI and prevalence data types, the statistical signature is an increasing variance with an AUC near 1. This is in contrast to incidence, where the trend is decreasing with an AUC near 0. However, the results for variance (both increasing and decreasing) are highly predictive ($|\mathrm{AUC} - 0.5| \approx 0.5$). Thus, if a system is not known or there is difficulty in determining the type of data, incorrect conclusions could be drawn when interpreting the time series trend.

For each ROC curve, we measured the AUC, which is an indication of how predictive each indicator is by its ability to distinguish between elimination simulations and the null model. A score closer to 0.5 signifies the worst performance (random diagnosis). We evaluate the Kendall-tau score up to before the critical transition ($t_1 = 390$) and after the critical transition ($t_2 = 450$), which gives an indication of whether the EWS is increasing or decreasing. A score of 1 demonstrates that it is possible to identify all Ext simulations when compared to null simulations by its increasing trend (i.e. perfect sensitivity, true positive rate). A score of 0 means that there is zero sensitivity and instead the simulations are decreasing.

While studies of EWS on incidence-type data have been growing in recent years, theoretical exploration of how these indicators change on the approach to a critical transition has been neglected. In this paper, we have shown that the typical trends of EWS that precede a critical transition are exhibited in prevalence-type data but do not always exist in incidence-type data. In particular, we have focused our investigation on the trend of variance over time as an infectious disease system approaches a tipping point.

Prior work has shown that variance in incidence increases on the approach towards disease emergence. However, our work highlights that this property is not a result of critical slowing down theory as first expected. We have shown it is a consequence of the counting process that can approximate incidence-type data. As such, we demonstrated that the variance in incidence is expected to follow the mean in incidence.
In particular, the variance will increase on the approach to disease emergence, but will notably decrease before a disease elimination threshold. We applied these findings to two systems of disease elimination and verified that the variance of incidence exhibits a decreasing trend on the approach, following the behaviour of the mean and contradicting critical slowing down theory.

Therefore, it is highly recommended to understand analytically how EWS change on the approach to a critical transition in order to avoid misleading results. The generalised theory of a counting process can be applied to many other systems outside the scope of epidemiology where we would expect a decreasing variance preceding a critical transition. Potential applications include the observation of animals through camera traps, disease surveillance sampling in wildlife or movements in stock prices, which are all examples of incidence-type data. Notably, a substantial number of studies on ecosystem data, climate data and financial data have observed inconsistencies in statistical indicators [16, 22, 23, 29]. Although we found the Poisson process to be overdispersed in the context of epidemiology, it provides a broad framework which can be extended to many other infectious disease systems using the incoming transition probabilities into the infectious class.

We proposed extracting the rate of incidence (RoI), or intensity of the Poisson process, from incidence-type data to illustrate that utilising CSD, such as observing an increasing variance, requires suitable data which undergo a bifurcation. In particular, we have shown that the critical threshold in the RoI corresponds with that of prevalence and, as expected, we demonstrated that the trend in variance in RoI does increase before an imminent epidemiological transition.
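The AUC scores referred to throughout this discussion compare a trend statistic for each transition simulation against the same statistic for null simulations. A hedged sketch of that scoring is given below, assuming Kendall's tau-a as the trend statistic and the pairwise (Mann-Whitney) form of the AUC. The synthetic indicator series, slope, noise level, series length and function names are hypothetical and are not taken from the study.

```python
import random


def kendall_tau(series):
    """Kendall tau-a between the time index and the series values."""
    n = len(series)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            score += (diff > 0) - (diff < 0)  # +1 if concordant, -1 if discordant
    return score / (n * (n - 1) / 2)


def auc(test_scores, null_scores):
    """Fraction of (test, null) pairs in which the test score is larger; ties count one half."""
    wins = 0.0
    for s in test_scores:
        for r in null_scores:
            if s > r:
                wins += 1.0
            elif s == r:
                wins += 0.5
    return wins / (len(test_scores) * len(null_scores))


def synthetic_indicator(slope, rng, length=50):
    """Hypothetical indicator series: a linear trend plus unit Gaussian noise."""
    return [slope * t + rng.gauss(0.0, 1.0) for t in range(length)]


rng = random.Random(1)
null_tau = [kendall_tau(synthetic_indicator(0.0, rng)) for _ in range(30)]
up_tau = [kendall_tau(synthetic_indicator(0.1, rng)) for _ in range(30)]
down_tau = [kendall_tau(synthetic_indicator(-0.1, rng)) for _ in range(30)]

auc_up = auc(up_tau, null_tau)      # indicator with an increasing trend
auc_down = auc(down_tau, null_tau)  # indicator with a decreasing trend
print(f"AUC (increasing) = {auc_up:.2f}, AUC (decreasing) = {auc_down:.2f}")
```

With this convention an indicator that reliably increases scores near 1, one that reliably decreases scores near 0, and both are equally informative in the sense |AUC − 0.5| ≈ 0.5.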
We applied five early warning signals to simulated datasets comprising the three discussed types: prevalence, incidence and rate of incidence. The simulated data we have investigated represent perfect reporting, or the "best case scenario". It is often the case that underreporting reduces the detectability of signals in real-world data. The work we have presented here can be extended to include a gamma distributed intensity λ. Using a gamma distributed rate of incidence will account for reporting errors as described by O'Dea et al.

Overall, our study suggests that a robust indicator is one that shares a highly predictive time series trait (|AUC − 0.5| ≈ 0.5) amongst all three data types, even with inconsistent trends (increasing or decreasing). Therefore, we suggest that variance and coefficient of variation are overall good indicators due to their high predictive power in all cases. Coefficient of variation is a robust indicator for disease elimination since the trend is similar between different types of data (Fig. 3(a) and S11 Fig). However, discrepancies appear when considering the opposite disease threshold, as shown with disease emergence (Fig. 3(b)), where CV has a decreasing trend and performs less well with disease prevalence data.

In contrast, we found that kurtosis and AC(1) are not robust indicators. Although kurtosis and AC(1) have a predictive trend with prevalence data, this is not typically the data which is readily available. In particular, kurtosis is highly predictive (with a decreasing trend) in prevalence data on the approach to disease elimination (Fig. 3(a)) and fairly predictive with a decreasing trend in the case of prevalence with emergence (Fig. 3(b)); it is a poor indicator for all other types of data. Likewise, although AC(1) has a clear increasing trend for prevalence data in elimination systems (Fig. 3(a), S11 Fig), its trend is less predictive for incidence and RoI data. Additionally, the trend is not distinct for any dataset when considering an emergence transition, so there is potential for this indicator to be used incorrectly. Cases where an EWS is poor for some types of data but good for others could lead to misleading judgements about a system, and such indicators are therefore not robust.

These findings support prior work on prevalence and initial work from O'Dea et al. and Brett et al. with incidence-type data. Our analytical exploration of incidence has indicated a new data source, RoI, which can be extracted from incidence time series. A potentially powerful tool would be to compute variance and CV indicators with different types of data (incidence, rate of incidence and prevalence) and ensemble these. An ensemble or combination of multiple statistical indicators was suggested by Drake & Griffen [13]; it has been applied by Kefi et al. [30] to case studies with the same data type and a combination of EWS, to help distinguish between different critical transitions, and has also successfully detected transitions using an ensemble of different time series data [12]. This suggests a potential approach in which a single metric is derived from a combination of indicators calculated on multiple time series with different trends, such as we have observed with incidence and RoI, to give a more pronounced indication of disease transitions.

Additionally, further work would be to include a heterogeneous ensemble as suggested by O'Dea et al. [4], whereby all parameters are sampled randomly for each realisation rather than being equal. This will lead to more realistic results, as each parameter sample represents time series data from a different location, as suggested by studies on spatial statistics, a promising method for addressing limited data [8, 17, 18].
Comparatively, we have shown here that computing the statistics on a homogeneous ensemble, although unrealistic, returns exact stochastic behaviours of the system, and we used this to verify the simulated study against the theory.

In conclusion, there is tremendous potential for using early warning signals to provide evidence on our progress towards elimination and to inform public health policies. We have indicated that by monitoring simple statistics over time it is possible to observe disease emergence and elimination, which with further development offers a promising solution for an automated system that can update time series statistics in real time as new data become available. This would be particularly useful for emerging diseases, where EWS could be used to prompt early detection and aid rapid responses. Our paper has provided insight into how statistics behave for different types of infectious disease data, where we considered suitable data which could be incorporated into such a monitoring system. We have examined how closely observed time series results agree between different data types, a necessary exploration for the development of EWS before they can impact decision making. We reported that some indicator traits are inconsistent across data types and that some EWS differ significantly between disease thresholds: elimination and emergence. Knowledge of the type of data which has been collected is imperative to avoid misleading judgements in response to time series trends. Our work has provided analytical evidence to understand why results differ, improving our ability to monitor EWS for infectious disease transitions.

Early-warning signals for critical
Anticipating epidemic transitions with imperfect data.
PLoS Computational Biology
Disentangling reporting and disease transmission. Theoretical Ecology
Theory of early warning signals of disease emergence and leading indicators of elimination. Theoretical Ecology
Forecasting infectious disease emergence subject to seasonal forcing
Leading indicators of mosquito-borne disease elimination. Theoretical Ecology
The problem of detrending when analysing potential indicators of disease elimination
Rising variance: a leading indicator of ecological transition. Ecology Letters
Early warnings of regime shifts: a whole-ecosystem experiment
climate tipping points from critical slowing down: comparing methods to improve robustness
Including trait-based early warning signals helps predict population collapse
Early warning signals of extinction in deteriorating environments
Statistical indicators and state-space
Methods for detecting early
Early warning signals of ecological transitions: methods for spatial patterns
Catastrophic regime shifts in ecosystems: linking
Trends in Ecology & Evolution
Factors influencing the
How one might miss early warning signals of critical transitions in time series data: A systematic study of two major currency pairs
Critical slowing down as an early warning signal for financial crises
Early warnings of regime shifts: a whole-ecosystem experiment
Modeling infectious diseases in humans and animals
A New Measure of Rank Correlation
Some lessons for the future from the Global Malaria Eradication Programme (1955-1969)
Malaria resurgence: a systematic review and assessment of its causes. Malaria Journal
Stochastic processes in physics and chemistry
Lack of critical slowing down suggests that financial meltdowns are not critical transitions, yet rising variability could signal systemic risk
Early warning signals also precede non-catastrophic transitions
Recommendations of the international task force for disease eradication