key: cord-0003343-hp2d68su
authors: Cazelles, Bernard; Champagne, Clara; Dureau, Joseph
title: Accounting for non-stationarity in epidemiology by embedding time-varying parameters in stochastic models
date: 2018-08-15
journal: PLoS Comput Biol
DOI: 10.1371/journal.pcbi.1006211
sha: 4836e3bd00d59bf1af3b4f5000cf917c71655658
doc_id: 3343
cord_uid: hp2d68su
The spread of disease through human populations is complex. The characteristics of disease propagation evolve with time, as a result of a multitude of environmental and anthropic factors, this non-stationarity is a key factor in this huge complexity. In the absence of appropriate external data sources, to correctly describe the disease propagation, we explore a flexible approach, based on stochastic models for the disease dynamics, and on diffusion processes for the parameter dynamics. Using such a diffusion process has the advantage of not requiring a specific mathematical function for the parameter dynamics. Coupled with particle MCMC, this approach allows us to reconstruct the time evolution of some key parameters (average transmission rate for instance). Thus, by capturing the time-varying nature of the different mechanisms involved in disease propagation, the epidemic can be described. Firstly we demonstrate the efficiency of this methodology on a toy model, where the parameters and the observation process are known. Applied then to real datasets, our methodology is able, based solely on simple stochastic models, to reconstruct complex epidemics, such as flu or dengue, over long time periods. Hence we demonstrate that time-varying parameters can improve the accuracy of model performances, and we suggest that our methodology can be used as a first step towards a better understanding of a complex epidemic, in situation where data is limited and/or uncertain.
As our world becomes more and more globalized, infectious disease poses an everincreasing threat to human health. The multitude of environmental and behavioral factors, which account for the spread of infectious diseases, are ever-evolving and thus infectious diseases propagation is complex.
In the face of this complexity, mathematical models offer valuable tools to study the dynamics of epidemic diseases. Developing adequate statistical and mathematical tools, that take account of the time-varying nature of the different mechanisms responsible for
Our world constantly faces the threat of emerging and re-emerging diseases and it has been shown that this has intensified over the past 50 years. This intensification is due, in part, to climate change, urbanization and globalization [1] meaning that infectious diseases remain a constant and unpredictable threat to human health. Numerous factors contribute to the propagation of an infectious disease. These include increased human connectivity, limited availability of economic resources for adequate intervention, increasing antimicrobial resistance, evolution of the dominant strains and increasing parasite and vector resistance to the most widely used drugs and insecticides, etc. A key factor in this huge complexity is non-stationarity [2] , meaning that the characteristics of the dynamical epidemiological processes evolve with time. Thus, the mechanisms of transmission are uncertain, making it difficult to obtain quantitative predictions. One of the classic aspects of non-stationarity, is the seasonality of epidemiological dynamics, linked to environment and climate [3] [4] but the environmental variability can shape the disease propagation in unforeseeable ways on small and large spatial scales [5] [6] [7] [8] . Intervention and control may also modify the course of an epidemic. A less well-described but equally important cause of non-stationarity is linked to social cycles, e.g. school terms, religious holidays and agricultural cycles [9] [10] [11] [12] [13] . Research increasingly focuses on the effect of behavioral change in the presence of epidemiological risk as a source of non-stationarity [14] [15] [16] . Societal responses and changing human behavior play an important role in our connected society. Thus, during an epidemic, depending on the availability of information on the disease, people exhibit a variety of behaviors including anxiety and social distancing that might greatly influence the course of an epidemic.
For all of the above reasons, the spread of pathogens through human populations can be complex and hard to predict. In the face of this complexity, mathematical models offer valuable tools to study the dynamics of epidemic diseases, in order to synthesize information to understand observed epidemiological patterns and to test different hypothesis on the underlying key mechanisms [17] . Moreover, mathematical models play a crucial role in infectious disease prevention by assessing the impact of different control measures, e.g. vaccination strategies [18] [19] .
Nonetheless, there are very few, if indeed any, cases where modelers can access all the necessary information to reliably predict the course of an epidemic. This is particularly the case when we consider the non-stationarity features of epidemics and their transient nature poses a challenging problem for modeling. Further to this, different hypothesis must be formulated. In the case of influenza, for example, some researchers have suggested using a quantitative relationship between climatic variables and the effective transmission rate [20] . Another recent example illustrates non-stationarity in epidemiology. Between November 2010 and February 2011, despite a low level of population susceptibility, an unexpected third wave of infection by the H1N1pdm09 pandemic virus was observed in the United Kingdom. Using a compartmental mathematical model of influenza transmission, this third wave was explained, by a substantial increase in the transmissibility of the H1N1pdm09 virus [21] . It has been proposed that this modification of the transmissibility was caused by the virus evolution with a better adaptation to the human host, or by climatic factors, namely the very cold weather experienced in the United Kingdom at that time, or by a combination of these factors [21] .
To tackle the problem of non-stationarity in epidemiology, some approaches use a linear function to reconstruct the effective reproduction number (average number of secondary cases per primary case, R eff ). Wallinga and Teunis [22] proposed a generic method that requires only case incidence data and the distribution of the serial interval (the time between the onset of symptoms in a primary case and the onset of symptoms of secondary cases) to estimate R eff over the course of an epidemic. This approach has been improved by numerous authors and applied to real time estimation of R eff [23] [24] [25] . Other authors estimated using mathematical models R eff for each season [26, 27] . To calculate the time-varying infection rate the reconstructed time series of R eff derived from the notification data can be used [28] .
However, the time evolution of R eff by definition depends not only on the time evolution of the epidemiological parameters but also on the number of susceptibles. More complex approaches have therefore been proposed. These approaches use semi-mechanistic models that incorporate the known compartmental structure of disease transmission but do not specify the form of the transmission rate equation that is estimated based on the data. In an early paper, the force of infection is estimated by neuronal network or kernel regression [29] . Now it is more common to use B-spline [30] [31] [32] [33] . An alternative approach is to use diffusion models driven by fractional Brownian motion to model time-varying parameter of major epidemiological significance [34] [35] [36] . The models developed assign diffusion processes to the time-varying parameters embedded in a state-space framework. With the Kalman filter, the timeevolution of some key parameters (average transmission rate, mean incubation rate, and basic reproduction rate) were estimated during the course of the HIV/AIDS epidemics in the Paris region [34] [35] . Dureau et al. [36] generalized this approach using a Bayesian framework with an adjusted adaptive particle Markov chain Monte Carlo algorithm (PMCMC), but only applied to the transmission rate, for short epidemics, with application to the 2009 pandemic flu. Very recently, an algorithm relying on robustly estimating the time-varying infection rate, based on the method of the unknown input observers from control theory, has been proposed [37] . Similarly, an approach for the reconstruction of time-dependent transmission rates, by projecting onto a finite subspace, spanned by Legendre polynomials, has been introduced [38] .
In our previous works [34-36], we have introduced an approach for reconstructing the time evolution of some key parameters with just the weak hypothesis according to which they follow a basic stochastic process. The parameter time evolution is estimated solely based on observations of the incidence or the prevalence. Here, we propose to expand this approach to recurrent epidemics over time periods longer than just one season. The underlying idea of this approach is to capture unknown influences by considering time-varying parameters. As with other semi-mechanistic approaches, the key advantage of this approach, for the parameter dynamics, is that it is data-driven, and thus the shape of change does not need to be specified beforehand. We applied our framework both to a toy model, where parameters and observation process are known, and to two real data sets. This allows us to demonstrate that this datadriven approach is very effective for tackling the non-stationarity of recurrent epidemics, even with long time series. It has other benefits too. For instance, with limited access to information, it can capture unknown influences. By so doing, and by analyzing the parameter time evolution, this framework allows a more thorough analysis of the different influences, facilitating their introduction in more complex models with pertinent hypotheses based on observations.
Our approach is based on three main components: an epidemiological model embedded in a state-space framework, a diffusion process for each time-varying parameter and an up-to-date Bayesian inference technique based on adaptive PMCMC.
The main advantage of the state-space framework is the use of two sets of equations, the first set describes the propagation of the disease in the population and the second is for the observation process. This allows for consideration of unknowns and uncertainty both in the epidemiological mechanisms and in the partial observation of the disease:
( _ xðtÞ ¼ gðt; xðtÞ; y 0 ðtÞ; uðtÞÞ yðtÞjxðtÞ $ f ðhðxðtÞÞ; y 0 ðtÞÞ ð1Þ
The first equation is for the epidemiological model, with x(t) representing the state variables (for instance, S(t) the susceptibles, I(t) the infectious and R(t) the removed for the classical SIR model) and θ'(t) the epidemiological parameters. The second is the observational process defined by probabilistic law f and a reporting rate on transformation of some state variable h(x(t)) because we may not be able to directly measure all state variables but just some or a function of them. In these equations, y(t) are partial observations of x(t), u(t) is the process noise describing different form of stochasticity and the observational noise is included in f. In our applications, h(x(t)) will be the cumulative sum of new cases over the observation time step, that is generally the quantity observed by Public Health systems.
Considering the time-varying parameters θ(t) as a subset of θ'(t), we make the assumption that they evolve more or less randomly and do not follow a defined mathematical function. In the absence of prior information the use of diffusion motion allows us to impose few restrictions on the evolution of θ(t). We consider that they follow a continuous diffusion process (a discrete diffusion process was used in [35]):
where σ is the volatility of the Brownian process (dB(t)) and will be estimated during the fitting process. The use of a Brownian process can be viewed as a weak hypothesis for the imposed motion of θ(t) and the volatility σ being a regularized factor. Intuitively, the higher the values of σ the larger the changes in θ(t). The logarithm transformation avoids negative values which have no biological meaning. When prior knowledge on θ(t) is available this Brownian process can be modified to account for a drift in (2) (see [36] ).
For the time-varying parameter, we focus on the parameter of the force of infection classically defined as:
with β(t) the transmission rate usually defined by a sinusoidal function. The control or the behavior modification can also be taken into account:
We start our demonstration by showing that it is possible to reconstruct both the trajectory of a SIRS model (SIRS stands for Susceptibles, Infectious, Removed and Susceptibles again) and that of the sinusoidal transmission rate. In this example, the trajectory of each variable has been simulated with a model for which all the parameters were known. Moreover we also knew the observation process that has generated the data, a Poisson law for the incidence with an observation rate equal to 1. incidence and the transmission rate highlighting the potential of the method. The parameter estimations are in perfect agreement with the values used to generate the observations and the estimation process has correctly converged (S1 and S2 Figs). This clearly demonstrates the feasibility of accurately ascertaining the time evolution of the transmission rate and correctly estimating the R eff (see Fig 2) . It is worth emphasizing that the SIRS model is a complicated example for different reasons. First, even with a constant transmission rate the SIRS model can generate oscillations (damped oscillations, see [17, 44] ). Secondly, the model trajectories are not very sensitive, a modification of ± 10% can induce minor modifications of the trajectories that are inside or near the 95% CI of our inferences (Fig 2) . Moreover in this example we have used initial conditions outside the attractor of the dynamics to generate transients that appear more realistic for real applications, but are more complex to reconstruct. The robustness of our approach has also been tested: (i) using long time series and initial conditions near the attractor ( Fig 3A and S3A Fig); (ii) modifying the number of inferring parameters (S4-S6 Figs), for instance estimating just the volatility parameter (S7 and S8 Figs); (iii) considering the possibility of not using the transformation log in the diffusion process (S9 and S10 Figs) and (iv) using a true β(t) with 2 or 3 periodic components (Fig 3B and 3C and S3B and S3C Fig) .
We have also explored the performance of our approach by comparing their inferences to those of the true model. The re-estimation of the true model on its own data is displayed in S11-S13 Figs. Table 1 presents indices of the goodness-of-fit of the true model and models with time-varying β(t) with different number of parameters inferred. As expected, the error on β(t) is smaller when the true equation is used (Table 1 ). However, regarding the estimated incidence, the true model and our approach give similar results both in terms of mean and variance (Table 1 ). It could be argued that the price of the flexibility of our approach is a greater variability in some of the trajectory estimations (Table 1) . Nevertheless the average dynamics are always estimated correctly.
As misspecification is an important problem (e.g. [45] ) we have also compared the performance of our approach to those of a misspecified seasonal SIRS model. We have thus used the example of a sinusoidal β(t) with two periodic components (see Fig 3B) and computed the indices of the goodness-of-fit of the true model with the SIRS model with 1 year sinusoidal β(t) and with our time-varying periodic β(t). The results clearly show that our approach performed better than the misspecified model for the three trajectories analyzed, Incidence, β and R eff ( Table 2) . Once again the price of the flexibility of our approach is a greater variability in some of the trajectory estimations. However this is preferable to a large error in the median trajectories as occurred in those observed with the misspecified model ( Table 2) .
Our methodology is also applicable to other more complex or simpler tasks. For instance, it can follow the time evolution of a parameter describing the availability of susceptibles, ε S (t) (Fig 4 and S14 and S15 Figs). Fig 4 shows the accurate reconstruction of the trajectory of the incidence and also of the trajectory of ε S (t) that shifted at a given time point and decreased slightly thereafter. This highlights once again the potential of our approach as it is never easy to estimate a discontinuous dynamic with a continuous process (2).
In previous works, the dynamics of influenza in Israel have been analyzed using a discrete deterministic SIRS model and weekly data from Israel's Maccabi health maintenance organization [20, 46] . To describe the seasonality of this recurrent epidemic, the authors used a linear model between the transmission rate and local climatic variables, daily temperature and relative humidity [20, 46] . We have re-analyzed their dataset (but limited to 1998-2003 due to a modification in the reporting) to reconstruct the time evolution of β(t). Our results (
Our last example is on dengue in Cambodia. Again the idea is to relax the assumption of a sinusoidal β(t) in a SEIR model. Monthly data from the capital Phnom Penh [47] , for which the meteorological data is available from the international airport, was used. We can accurately describe the 12 year time series and reconstruct the time evolution of β(t) (Fig 6 and S17 Fig) . Our results stress that the β(t) oscillations are more complex than a simple sinusoidal function. Sometimes bi-modality occurs over one season. In general one observes a fast growth of β(t) and a slow decrease. Moreover the amplitude of the β(t) varies from year to year, perhaps depending on the fluctuations in the mosquito population and in the environment. Interestingly the peak in β(t) appears 1 to 2 months before the incidence peak. This delay can be explained by the extrinsic incubation period and might be used in a warning system.
To explain the β(t) oscillations we have explored the potential effects of local and global climatic variables using wavelet decomposition [48] as one of our main underlying hypotheses is non-stationarity. We observed very significant coherency between β(t) and climate for the local climate for the seasonal mode (Fig 7 and S18-S20 Figs) and also for the 2-3 year components with global climatic variable (S21 Fig). Thus, the rhythm of β(t) can be explained perfectly by climatic factors. Nevertheless, again mainly due to large non-stationarity, by using solely one or two climatic variables we are able to correctly describe dengue evolution in the short-term (Fig 7C, red area) but not over a large time period (Fig 7C, blue area) . This reflects Table 1 . Comparison of goodness-of fit indices for different models and different numbers of parameters inferred. The indices are computed on Incidence, β and R eff trajectories: RMSE: root mean square error using the median; MAPE: maximum absolute percentage error using the median; MIQR: mean inter-quartile range. The parameter values used are in the captions of the figures. For comparison purposes we used a stochastic version of the SIRS model with sinusoidal β and for all figures the observation process is applied to the inferred incidence trajectory.
Sinusoidal β
Figure S11 Fig Fig 1 S4 Fig S7 Fig # of the complexity of such a disease where the ecology of the vectors, the environment, the climate, the immune status of the human population and its behavior are all involved. This large nonstationarity association between dengue and climatic factors has recently been demonstrated using statistical models (dynamic generalized linear models) and data from a medium-sized city in Colombia [49] . The authors showed that dengue cases correlate with climatic variables (temperature, rainfall, solar radiation and relative humidity) but these correlations change over time, some intervals showing a positive association, while in others the association is negative [49] . The non-stationarity association between dengue and climate may be explained by the fact that a climatic variable has different effects depending on the biological cycle of the pathogen or of the vector. Moreover the effects of one climatic variable can also depend on other climatic variables potentially enhancing the non-stationarity association.
As there remain numerous uncertainties during the course of each epidemic, we are increasingly aware of the importance of developing adequate statistical and mathematical tools. Such tools need to take account of the time-varying nature of the underlying ecological and biological mechanisms as well as social and behavioral influences involved in an epidemic. Because of this, time-varying parameters modeled with a diffusion process, that track epidemiological patterns and update the key parameters according to data appear to be a worthwhile approach. Indeed developing a more complex model would be difficult considering the relative paucity of available data. We propose a flexible modeling framework that encompasses time-varying aspects of the epidemic. It does this via diffusion process equations for time-varying parameters and also considers uncertainty associated with key parameters and data. This data-driven framework for time-varying parameters has been coupled with simple stochastic models and a robust Bayesian procedure for inference. To test its efficiency, our proposed methodology was first applied to a toy model and then to real epidemiological examples.
Our results clearly demonstrate the potential of our framework. Firstly, our methodology was able to accurately reconstruct both the incidence and the sinusoidal transmission rate of a simple SIRS model just based on noisy observations (Figs 1-4 and S4 ,S5,S7,S9 and S14 Figs). Based on these reconstructions one can also closely estimate R eff which is one of the key relevant epidemiological parameters. Our results also highlight the flexibility of our developed methodology. It can reconstruct the time evolution of a shifting parameter (ε S (t), see . The comparison using goodness-of-fit indices with the inferred true model allows us to highlight the fact that our methodology performs as well for the observed incidence. Its flexibility results in greater variability in some other trajectories mainly β(t) and R eff (Table 1) . Moreover, in the absence of knowledge of the true evolution of the transmission rate, our approach appears to capture the dynamic observed more accurately than a misspecified model (Table 2) . Secondly, applied to real datasets, our framework is able, based solely on simple stochastic models, to reconstruct complex epidemics such as flu or dengue over long time periods (Figs 5 and 6 ). In such cases, the reconstruction of the time evolution of the transmission rate clearly stresses that, on real datasets, it is difficult to assimilate the dynamic of this parameter as a simple sinusoidal function. It is more irregular in amplitude and sometimes multi-modal over one season.
Considering the paucity of information available regarding the complexity of the mechanisms involved during an epidemic, describing and fitting a full model for a given transmissible disease is always challenging. Our data-driven methodology can be used as a first step towards a better understanding of a complex epidemic, where data is limited or lacks certainty. Indeed most of the unknowns and uncertainties can be put into time-varying parameters. The interactions between both the human behavior and complex environmental drivers. In a recent paper [50] , the authors reviewed evidence of interactions between seasonal influenza virus and other pathogens (bacteria or virus). They concluded that it is important to incorporate these different coinfecting pathogens in models of seasonal flu in order to get a better estimate of the burden of influenza. Our framework could be an alternative to the development of complex models with all the potential interactions between pathogens and to estimate the strength of the interactions. After reconstructing the time evolution of the transmission rate the statistical association between the coinfecting pathogens and the transmission rate could be tested. This screening may facilitate the construction of more complex models that could incorporate only the most significant coinfecting pathogens in the seasonal flu model.
Our methodology also has other advantages. Taking account of the simplicity of the model used, and the fact that weak hypotheses on the dynamics of the time-varying parameters have been included, our proposed methodology can retrospectively test the impact of interventions. This has previously been done in the case of HIV epidemics [34-35], where it was hypothesized that the reduction in the transmissibility was due to a modification of the sexual behavior in the population and the increase in the seropositive period duration due to the introduction of the first antiviral treatments. Evaluation of interventions has also been done recently in the case of the Ebola epidemic in West Africa [51] . The relative simplicity of our methodology is also suitable for short-term predictions and it can then easily be used to predict an epidemic in real time. Starting with a given estimated state defining the system, the fitting process can be run again each time new data is available and the new posteriors are used for new predictions [36] . This can inform public health decisions and indeed has been done recently to great effect in the case of the Ebola epidemics in West Africa [52] .
A major challenge in model fitting is the reliability of data collected and also the non-identifiability of the mechanistic models that always have very rich dynamical behavior. The question of identifiability is too often avoided in epidemiological models applied to a topical Public Health issues. There is, however, considerable literature on this subject (e.g. [41,42,53-55]). Identifiability is not evident even for a simple seasonal SIR model [56] . To solve this problem one can fit a combination of parameters or fix some of them (the population size for instance) [57] . In our applications there is a clear limitation due to practical non-identifiability of reporting rate and initial conditions. To fix these problems we have used informative priors (see Method). Using informative priors or fixing some parameters gives very similar results (compare Fig 1 and S4-S7 Figs) . Related to this is the misspecification of models [45] . In our cases, as with other semi-mechanistic models the time-varying parameter methodology captures some of the information in the data but not in the mechanistic part of the model. If the model is misspecified due to lack of precision, it compensates for it and the dynamics of β(t) will drive improvements in the model to make it more complex and realistic ( Table 2 ). If the model is misspecified to the extent that it creates mechanisms that do not exist, the reconstructed β(t) would compensate for these effects but it will be harder to interpret.
In this work we have used simple mechanistic models. The proposed methodology is not limited to simple models. For instance, a two-strain dengue model has also been tested. In this case the main problem was linked to the unavailability of both seroprevalence and incidence for each strain. Indeed, one of the major difficulties with these multi-strain models is the identification of the initial conditions (e.g. [58] ). Nevertheless it is worth emphasizing that the Bayesian inference method used in our framework, PMCMC, the approximation of the likelihood is limited for a large number of parameters and/or equations [59] . In such cases testing other methodologies like ABC [60, 61] is advisable.
It is always difficult to fit complex models with rich behaviors based on very limited information. In this regard we agree with Metcalf et al. [62] who stressed that nowadays we need seroprevalence studies to quantify the immunological status of the population, because in most cases the magnitude of the outbreak is difficult to evaluate without precise seroprevalence data.
To tackle the uncertainty and the non-stationarity of epidemics, our methodology, although it appears non-standard, makes important progress towards a better understanding of the mechanisms responsible for disease propagation. We believe that, should it form part of the development of the next predictive tools for Public Health, it will make a significant contribution to improving the understanding and control of infectious diseases in our increasingly uncertain world.
where S, I and R are the susceptibles, the infectious and the removed respectively, the transmission rate β(t) = β 0 .(1 + β 1 sin(2πt/365+2πϕ), 1/α is the average duration of immunity, γ is the recovery rate and μ is the recruitment or mortality rate. In (5) , C is the number of new cases, then h(x(t)) is the cumulative sum of C(t) over the observation time step, 7 days. With this model R eff (t) = β(t).S(t)/(N.γ). The parameter values are in the caption of Fig 1. For the fit of our simulated data, Gaussian priors are used for epidemiological parameters (α and γ). Initially non-informative priors were used for the volatility σ, the reporting rate ρ and the initial conditions γ(0) by β(0) but to reduce problems linked to practical non-identifiability materialized by correlation between some estimates, informative priors were used for ρ (see S1 Fig) . Some other simulations have been done fixing β(0) and ρ or just inferring σ (see S4-S8 Figs). SIRS flu model. For analyzing Israel flu data we have used a continuous SIRS model identical to (5), we simply added imported infectious people i in the force of infection:
The initial guess values for the parameters are from [46] . In this example the observational process is a Negative-Binomial law with an over-dispersion parameter equal to 0.05 and the reporting rate ρ = 0.15 [46] . For the fit, Gaussian priors are used for epidemiological parameters (i, α, γ) and non-informative priors for the volatility σ and the initial conditions (S(0), I(0)). S16 Fig displays the priors and the posteriors. SEIR one strain dengue model. To describe the dengue epidemics, taking account of the available data for Phnom Penh, we have fitted a one strain model using a SEIR model:
where E is for infected but not yet infectious, β(t) is the transmission rate, 1/δ is the average duration of the latent period, γ is the recovery rate and μ is the recruitment or mortality rate. The initial guess for parameter values comes from the literature [63] . In this example the observational process is a Negative-Binomial law with an over-dispersion parameter equal to 0.05 and the reporting rate ρ has been estimated using a narrow Gaussian prior. Non-informative priors are used for the volatility σ, initial condition for infected E(0) and imported infectious people i. Gaussian priors are used for other parameters and initial conditions. When E(0) is fitted, I(0) is estimated as a steady-state value I(0) = δ.E(0)/(γ+μ). S17 Fig. displays the priors and the posteriors.
Stochastic framework. Due to the use of a diffusion Eq (2) for the dynamic of the timevarying parameters, the stochastic versions of the previous models have been fitted. Thus the models are considered in a stochastic framework in which the compartments are discrete and the number of reactions occurring in a time interval dt is approximated by a multinomial distribution. It is fully described in [64, 65] .
SMC algorithm as implemented in SSM [65] . In a model with n observations and J particles. L is the model likelihood p(y 1:n |θ). W ðjÞ k is the weight and x ðjÞ k is the state associated to particle j at iteration k.
for (j = 1:J) do 5.
Sample x ðjÞ kþ1 from pðx kþ1 jx ðjÞ k ; yÞ 6.
Set a ðjÞ ¼ pðy kþ1 jx ðjÞ kþ1 ; yÞ 7. end for 8.
Set W ðjÞ kþ1 ¼ a ðjÞ = P J l¼1 a ðlÞ and L ¼ L 1 J P J l¼1 a ðlÞ 9.
Resample ðx ðjÞ 0:kþ1 Þ j¼1:J from W ðjÞ kþ1 . 10. end for Estimation with particle Markov Chain Monte Carlo (PMCMC). Since the epidemiological propagation models are considered in a stochastic framework, their likelihood is intractable and it is estimated with particle filtering methods (Sequential Monte Carlo, SMC). With a given set of parameters, the SMC algorithm reconstructs sequentially the trajectory of the state variables and the time-varying parameters, and computes the associated likelihood. Firstly, the distribution of the initial conditions of the system is approximated with a sample of particles. Then, at each iteration, the particles are projected according to the propagation model up to the next observation point, they receive a weight reflecting the quality of their prediction compared to the observation and the total likelihood is updated. A resampling step using the weights is performed before the next iteration, in order to discard the trajectories associated with low weight particles.
In order to estimate the parameters of the system, the particle filter is embedded in a Markov Chain Monte Carlo framework, leading to the PMCMC algorithm [43] . More precisely, the likelihood estimated by SMC is used in a Metropolis Hasting scheme (particle marginal Metropolis Hastings) [43] . The proposal distribution is a Gaussian whose co-variance matrix is adapted following the framework described in [65] .
The starting point of the MCMC chain is initialized using optimal values obtained from the KSimplex algorithm on a large number of parameter sets. Then, a pre-adaptation of the proposal co-variance matrix is performed with Kalman MCMC (KMCMC). Each time the idea relies on less computationally intense algorithms in order to facilitate the exploration of parameter space. But as we use stochastic models we approximate the likelihood using the extended Kalman filter both in the simplex algorithm (KSimplex) [65] and in the MCMC (KMCMC) [36] . Then the adaptive PMCMC is executed on the output of the KMCMC with 100000 iterations and 10000 particles for the final figures. For instance, the results such as those of Fig 1 take less than 24 hours, on a blade server from PowerEdge M-Series with 40 processor cores.
PMCMC algorithm as implemented in SSM [65] . In a model with n observations and J particles. q(.|θ (i) ) is the transition kernel 1. Initialize θ (0) 2. Using SMC algorithm, computepðy 1:n jy ð0Þ Þ and sample x ð0Þ 0:n frompðx 0:n jy 1:n ; y ð0Þ Þ 3. for (i = 0:N−1) do 4.
Sample θ Ã from q(.|θ (i) ) 5.
Using SMC algorithm, compute Lðy Ã Þ ¼pðy 1:n jy Ã Þ and sample x Ã 0:n from pðx 0:n jy 1:n ; y Ã Þ
Accept θ Ã (and x Ã 0:n ) with probability 1^L ðy Ã Þpðy Ã Þqðy ðiÞ jy Ã Þ Lðy ðiÞ Þpðy ðiÞ Þqðy Ã jy ðiÞ Þ
If accepted, θ (i+1) = θ Ã and x ðiþ1Þ In order to assess convergence of the chain, the visual inspection of the chains (e.g. S2 Fig or S8 Fig) was complemented by diagnosis provided in the Coda package in R [66] . Due to the large computational cost of the algorithm, we did not run multiple independent chains, rather we relied on diagnosis using one MCMC chain and testing its stationarity: Geweke diagnosis [67] and Heidelberger and Welch's diagnosis [68] . The results are presented in S1 and S2 Tables.
Among the various approaches developed to study nonstationary data, wavelet analysis is probably the most efficient. In particular, this method gives us the possibility of investigating and quantifying the evolution in time of the periodic components of a time series (see [69] ). Wavelets constitute a family of functions derived from a single function, the ''mother wavelet", C a,τ (t), that can be expressed as the function of two parameters, one for the time position τ, and the other for the scale of the wavelets a, related to the frequency. More explicitly, wavelets are defined as:
The wavelet transform of a time series x(t) with respect to a chosen mother wavelet is performed as follows:
where Ã denotes the complex conjugate form. The wavelet transform W x (a,τ) represents the contribution of the scale a to the signal at different time positions τ. The computation of the wavelet transform is done along the signal x(t) simply by increasing the parameter τ over a range of scales a until all coherent structures within the signal can be identified. Here, as mother wavelet, we have used the Morlet wavelet [69] .
With the wavelet approach, we can estimate the repartition of variance at different scale a and different time location τ. This is known as the wavelet power spectrum: S x (a,τ) = | W x (a,τ) | 2 .
An important point with the continuous wavelet is that the relationship between the wavelet frequency f 0 and the wavelet scale a can be derived analytically. For the Morlet wavelet this relationship is given by:
p Then when f 0 = 2π, the wavelet scale a is inversely related to the frequency, f % 1/a. This greatly simplifies the interpretation of the wavelet analysis and one can replace, on all equations, the scale a by the frequency f or the period 1/f.
To determine the statistical relationship between two time series, wavelet coherence can be computed (e.g. [48, 70] ):
where the angle brackets around terms indicate smoothing in both time and frequency, W x (f,τ) is the wavelet transform of series x(t), W y (f,τ) is the wavelet transform of series y(t), and W x,y (f,τ) is the cross-wavelet spectrum. The values of wavelet coherence are between 0 < R x,y (f,τ) < 1. The wavelet coherency is equal to 1 when there is a perfect linear relation at particular time and scale between the two signals, and equal to 0 if x(t) and y(t) are independent.
To complement this, phase analysis can be used to characterise the association between signals (e.g. [48, 70] ). The phase difference provides information on the sign of the relationship (i.e., in phase or out of phase) and can be computed, for complex mother wavelet, with the wavelet transform W x (f,τ) as: Wavelet coherence [48, 70] between the reconstructed β(t) and monthly rainfall. In (B), (C) and (D) the colors code for low values in white to high values in dark red. The dashed lines show the 95% CI computed with adapted bootstrappes [71] , in (C) the 90% and the 95% CI have been plotted. (E) The evolution of the phase of the two time series computed based on wavelet decomposition for the seasonal mode, blue dashed line for the normalized β(t) red dashed line for the normalized monthly rainfall and black dotted line for their phase difference. The graph on the right shows the distribution of the phase differences.
Global trends in emerging infectious diseases
Infectious diseases, climate influences, and nonstationarity
Seasonality and the dynamics of infectious diseases
Seasonality of infectious diseases
Nonstationary influence of El Nino on the synchronous dengue epidemics in Thailand
Cholera threat to humans in Ghana is influenced by both global and regional climatic variability
Forcing versus feedback: epidemic malaria and monsoon rains in northwest India
Identifying climate drivers of infectious disease dynamics: Recent advances and challenges ahead
Closure of schools during an influenza pandemic. The Lancet Infectious Diseases
School closure and mitigation of pandemic (H1N1) 2009, Hong Kong. Emerging Infectious Diseases
Contact, travel, and transmission: The impact of winter holidays on influenza dynamics in the United States
Global public health implications of a mass gathering in Mecca, Saudi Arabia during the midst of an influenza pandemic
Rural-urban gradient in seasonal forcing of measles transmission in Niger
Modelling the influence of human behaviour on the spread of infectious diseases: A review
Behavioural change models for infectious disease transmission: A systematic review (2010-2015)
The contribution of social behavior to the transmission of influenza A in a human population
Modeling infectious diseases in humans and animals
Modeling infectious disease dynamics in the complex landscape of global health
Opportunities and challenges in modeling emerging infectious diseases. Science
Multiannual forecasting of seasonal influenza dynamics reveals climatic and evolutionary drivers
Increased transmissibility explains the third wave of infection by the 2009 H1N1 pandemic virus in England
Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures
Real-time estimates in early detection of SARS. Emerging Infectious Diseases
The ideal reporting interval for an epidemic to objectively interpret the epidemiological time course
A new framework and software to estimate time-varying reproduction numbers during epidemics
Inference for nonlinear dynamical systems
Measuring the path toward malaria elimination
Estimating the attack ratio of dengue epidemics under time-varying force of infection using aggregated notification data
Noise and nonlinearity in measles epidemics: combining mechanistic and statistical approaches to population modeling
Inapparent infections and cholera dynamics
Malaria in Northwest India: Data analysis via partially observed stochastic differential equation models driven by Lévy noise
The mathematics of infectious diseases
Model distinguishability and inference robustness in mechanisms of cholera transmission and loss of immunity
Modelling seasonal influenza: the role of weather and punctuated antigenic drift
Can Human Movements Explain Heterogeneous Propagation of Dengue Fever in Cambodia?
Time-dependent spectral analysis of epidemiological time-series with wavelets
Bayesian dynamic modeling of time series of dengue disease case counts
Influenza interaction with cocirculating pathogens and its impact on surveillance, pathogenesis, and epidemic profile: A key role for mathematical modelling
Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model
Temporal changes in Ebola transmission in Sierra Leone and implications for control requirements: a real-time modelling study
Numerical parameter identifiability and estimability: Integrating identifiability, estimability, and optimal sampling design
New results for identifiability of nonlinear systems
Global identifiability of nonlinear models of biological systems
The structural identifiability of the susceptible infected recovered model with seasonal forcing
The structural identifiability of susceptible-infective-recovered type epidemic models with incomplete immunity and birth targeted vaccination
Structure in the variability of the basic reproductive number (R 0 ) for Zika epidemics in the Pacific islands
Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems
Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems
Approximate Bayesian Computation
Use of serological surveys to generate key insights into the changing global landscape of infectious disease
Dengue modeling in rural Cambodia: statistical performance versus epidemiological relevance
Time Series Analysis via Mechanistic Models
SSM: Inference for time series analysis with State Space Models
Coda: convergence diagnosis and output analysis for
Evaluating the accuracy of sampling-based approaches to calculating posterior moments
Simulation run length control in the presence of an initial transient
A practical guide to wavelet analysis
Wavelet analysis of ecological time series
Wavelet analysis in ecology and epidemiology: impact of statistical tests
A dipole mode in the tropical Indian Ocean
BC would like to express his gratitude to the "Institut Ecologie et Environnement" of CNRS for facilitating his research activity thanks to a "Délégation CNRS" for 2 years in the URA 3012 "Hosts, Vectors and Infectious Agents". "Lovely thanks" to Una Ni Mhaoldhomhnaigh who has smoothed our "French English".