key: cord-0565529-2pbd0lla authors: Chkrebtii, Oksana A.; Garc'ia, Yury E.; Capistr'an, Marcos A.; Noyola, Daniel E. title: Inference for stochastic kinetic models from multiple data sources for joint estimation of infection dynamics from aggregate reports and virological data date: 2017-10-27 journal: nan DOI: nan sha: fbc98fa08153732607da9ab3555fdf24319d0309 doc_id: 565529 cord_uid: 2pbd0lla Before the current pandemic, influenza and respiratory syncytial virus (RSV) were the leading etiological agents of seasonal acute respiratory infections (ARI) around the world. In this setting, medical doctors typically based the diagnosis of ARI on patients' symptoms alone and did not routinely conduct virological tests necessary to identify individual viruses, limiting the ability to study the interaction between multiple pathogens and to make public health recommendations. We consider a stochastic kinetic model (SKM) for two interacting ARI pathogens circulating in a large population and an empirically-motivated background process for infections with other pathogens causing similar symptoms. An extended marginal sampling approach, based on the linear noise approximation to the SKM, integrates multiple data sources and additional model components. We infer the parameters defining the pathogens' dynamics and interaction within a Bayesian model and explore the posterior trajectories of infections for each illness based on aggregate infection reports from six epidemic seasons collected by the state health department and a subset of virological tests from a sentinel program at a general hospital in San Luis Potos'{i}, M'{e}xico. We interpret the results and make recommendations for future data collection strategies. Before the current pandemic, influenza and respiratory syncytial virus (RSV) were the leading etiological agents of seasonal acute respiratory infections (ARI) around the world. In this setting, medical doctors typically based the diagnosis of ARI on patients' symptoms alone and did not routinely conduct virological tests necessary to identify individual viruses, limiting the ability to study the interaction between multiple pathogens and to make public health recommendations. We consider a stochastic kinetic model (SKM) for two interacting ARI pathogens circulating in a large population and an empirically-motivated background process for infections with other pathogens causing similar symptoms. An extended marginal sampling approach, based on the linear noise approximation to the SKM, integrates multiple data sources and additional model components. We infer the parameters defining the pathogens' dynamics and interaction within a Bayesian model and explore the posterior trajectories of infections for each illness based on aggregate infection reports from six epidemic seasons collected by the state health department and a subset of virological tests from a sentinel program at a general hospital in San Luis Potosí, México. We interpret the results and make recommendations for future data collection strategies. 1. Introduction. Inference on mathematical models of epidemic dynamics has become a powerful tool in the understanding and assessment of disease outbreaks (Huppert and Katriel (2013) ; Siettos and Russo (2013) ; Star and Moghadas (2010) ). Such models are predominantly stochastic, reflecting the inherently random nature of a large number of human interactions which enable infections to spread and individuals to change their infection status. The probabilities of discrete transitions from one infection state to another are defined up to a set of unknown parameters which may be inferred from observed data. The most widely used models are variations on the "Susceptible-(Exposed-)Infected-Recovered" (SIR/SEIR) formulation which describes the temporal evolution of the number of individuals in each infection state at a given time. A variety of strategies have been developed to incorporate process-specific demographic stochasticity in this compartmental model. For example, Dukic, Lopes and Polson (2012) model process stochasticity by an additive white noise process on the growth rate of the infectious population computed from states that evolve according to the deterministic compartmental dynamics described above. In a different approach, Farah et al. (2014) assume additive process noise on the infection states of a deterministic SEIR model. Another approach is taken by Shrestha, King and Rohani (2011) by modeling infection state counts as multinomial processes with probabilities of inclusion obtained by first solving the ODE corresponding to the compartmental model and then solving for the transition probabilities as functions of current states. Another popular class of models, considered in this paper, probabilistically models individual interactions and their outcomes directly. This approach is based on stochastic kinetic modeling (e.g., Wilkinson (2006) ), a first-principles interpretation of SIR dynamics inspired by chemical mass-action kinetics where molecules interact and change state according to specified transition probabilities. This approach realistically captures inherent stochasticity in infection states because it naturally models individual-level transitions from one infection state to another as stochastic processes incorporating assumptions about both the disease and the interactions. However, estimation for such models poses a unique set of challenges. For a small population or one where state transitions can be observed directly, posterior inference on the SKM parameters can exploit closed-form expressions for full conditional distributions (Boys, Wilkinson and Kirkwood (2008) ; Choi and Rempala (2011) ), but even for moderately sized populations, individual interactions and transitions become increasingly difficult to simulate and marginalize over. Furthermore, since data typically consists of observed infection counts rather than individual transition times, computation of the likelihood becomes computationally infeasible. To resolve this issue, a common strategy is to construct the likelihood based on a large-population limit via diffusion approximation of the SKM (van Kampen (1992) ), also known as the linear noise approximation (LNA). The LNA has been a popular approach for inference for stochastic reaction networks in epidemiology and systems biology (e.g., Fearnhead, Giagos and Sherlock (2014) ; Fintzi, Wakefield and Minin (2020) ; Golightly, Henderson and Sherlock (2015) ; Golightly and Wilkinson (2011) ; Komorowski et al. (2009) ) and has been successfully implemented within a Bayesian hierarchical modeling framework (Finkenstädt et al. (2013) ; Hey et al. (2015) ; Whitaker et al. (2017) ). Challenges in applying this approach to realistic data scenarios include integrating multiple data types that depend on the unknown infection states, empirical modeling of disease states that are not described by the mechanistic model, and efficient posterior sampling of model parameters. The contribution of the present work is to simultaneously address these challenges to enable joint inference and retrospective assessment of the dynamics of two interacting pathogens from both aggregated infection reports and a subset of virological data collected in the state of San Luis Potosí, México. The ability to utilize multiple data sources allows estimation of parameters involved in the interaction of different pathogens that cause similar symptoms. Indeed, to our knowledge our approach is the first to directly recover cross-interference parameters in a SKM for two pathogens within a Bayesian model. We additionally identify and address practical challenges of working with such models, including posterior sampling strategies that are well suited to these settings. The results of this analysis can play an important role in epidemiological studies of the burden of influenza on morbidity and mortality at local, national, or regional levels. Most current estimates of influenza-associated morbidity and mortality do not take into account the contribution of RSV to excess mortality, due mainly to the scarcity of virological surveillance information. Our findings provide evidence that maintaining a sentinel program, by administering virological tests at a clinical referral center, allows for reliable tracking of virus circulation time at the population level at a relatively low cost. Our analysis shows that even a relatively small number of such virological tests, when administered to all the patients in a particular group, can help to provide information about patterns of pathogen circulation over time. The article is organized as follows. The motivating application and the data are described in detail in Section 2. Section 3 constructs a SKM for the evolution of individual infection states of the acute respiratory pathogens influenza and RSV and formulates its behaviour in the large-population limit as a hidden Markov model. A Bayesian model is then presented relating aggregated infection counts and a subset of virological tests to the SKM for infection states of the two pathogens augmented by a background model component representing infection by other pathogens. The inferential strategy is described in Section 4. Section 5 describes and interprets the results of our retrospective analysis for six individual epidemic seasons, using predictive residual analysis to assess model performance and to guide future modeling efforts. Finally, Section 6 discusses the feasibility of our approach, summarizes our findings, and offers some perspectives on future work. 2. Motivating application. ARI are infections of the upper and lower respiratory tract caused by multiple etiological agents, most frequently, adenovirus, influenza A and B, parainfluenza, RSV, and rhinovirus. An important public health concern around the world, ARIs are responsible for substantial mortality and morbidity (Thompson et al. (2003) ; Troeger et al. (2018) ), mainly affecting children under five and adults over 65 years of age (Kuri-Morales et al. (2006) ). Understanding of the underlying mechanisms of spread, transmission, and, importantly, cross-interference between these pathogens aids policy makers in assessing public health strategies and decision-making (Huppert and Katriel (2013) ). Although different viruses are responsible for ARI, a substantial part of the burden of ARI in most regions has been due to influenza and RSV (Chan et al. (2014) ; Chaw et al. (2016) ; Velasco-Hernández et al. (2015)). The interaction and temporal dynamics of these pathogens are complex. Evidence suggests that influenza and RSV are seasonally related (Bloom-Feshbach et al. (2013) ; Mangtani et al. (2006) ) and circulate at similar times of the year in some temperate zones (Bloom-Feshbach et al. (2013); Velasco-Hernández et al. (2015) ). Because of their interaction and interference, these infections do not usually reach their epidemic peaks simultaneously (Ånestad (1982; 1987) , Ånestad and Nordbo (2009) ), with peak times typically differing by less than one month (Bloom-Feshbach et al. (2013) ). In the clinical setting it is difficult to determine which pathogen may be responsible for a patient's ARI because of their overlapping circulation times and similar symptoms. Furthermore, laboratory tests necessary for identification of the virus are not conducted in most patients (Chan et al. (2014) ). Our analysis is based on data from epidemic seasons 2003-2004 through 2008-2009 , beginning in the first week of August and ending in the last week of July of the following year. Specifically, we use data on weekly aggregated morbidity reports and virological time series obtained in the state of San Luis Potosí, México. Morbidity reports consist of weekly reports of ARI that required medical attention 1 at all community clinics and hospitals in the state that were reported to the State Health Service Epidemiology Department. The virological data was obtained from records of the Virology Laboratory (Facultad de Medicina, Universidad Autónoma de San Luis Potosí, San Luis Potosí, México) and consisted of weekly counts of positive tests for RSV and influenza as well as tests which were negative for both pathogens. These virological tests were performed on all eligible pediatric patients (children under five years of age) admitted with lower respiratory infections (Gómez-Villa et al. (2008) ; Vizcarra-Ugalde et al. (2016) ) to Hospital Central "Dr. Ignacio Morones Prieto". These virological tests are unlikely to reflect only small-scale outbreaks, since Hospital Central "Dr. Ignacio Morones Prieto" admits patients from all regions of the state. Though the number of virological tests is small relative to the total number of aggregated ARI reports, the systematic sampling of patients year-round, irrespective of the identification of an epidemic by any given virus, provides information on the period of circulation of influenza and RSV. As in our data, most virological tests for RSV are performed on samples from children (Amini et al. (2019) ; Meskill et al. (2017) ) and may be assumed to reflect the sequential infection patterns in a community (Fleming et al. (2015) ). Indeed, available data suggest that, when SIR model with two pathogens. X kl represents the number of individuals in immunological status k for pathogen 1 and status l for pathogen 2. Labels above the arrows represent the reaction rates for each reaction type. influenza and RSV are present in community outbreaks, they tend to affect all age groups simultaneously (Dowell et al. (1996) 3. Modeling. This section describes a SKM of influenza and RSV dynamics as well as its diffusion approximation in the large-population limit, required to compute the likelihood of reported infection data and virological tests. We then construct a Bayesian model that relates the governing equations, including a background infection component, to the aggregated ARI reports and the virological test data. 3.1. Stochastic kinetic model of a two-pathogen system. Stochasticity appears in biological systems due to their discrete nature and the occurrence of environmental and demographic events. In the case of disease dynamics, the occurrence of events such as interactions between individuals that constitute exposure can be reasonably described as stochastic. Therefore, it is reasonable to model this stochasticity directly in the individual transitions, in contrast to indirectly modeling their aggregate behavior or perturbing a deterministic compartmental model. Stochastic kinetic or chemical master equation modeling (Allen (2008) ; Wilkinson (2011)) is a mathematical formulation of Markovian stochastic processes which describe the evolution of the probability distribution of finding the system in a given state at a specified time (Gillespie (2007) ; Thomas, Matuschek and Grima (2012) ). To model the relationship between influenza and RSV (henceforward called pathogens 1 and 2, respectively) during a single year, we consider a closed population of size Ω, assumed to be well mixed and homogeneously distributed, where the individuals interacting in a fixed region can make any of R possible transitions. The stochastic "Susceptible-Infected-Recovered" (SIR) model with two pathogens (Adams and Boots (2007) ; Kamo and Sasaki (2002) ; Vasco, Wearing and Rohani (2007) ) is described by eight compartments, corresponding to distinct immunological statuses. Denote by X kl (t) the number of individuals at time t Baseline transmission rate for influenza β 2 Baseline transmission rate for RSV σ 1 Cross-immunity or enhancement for influenzaσ 2 Cross-immunity or enhancement for RSV in immunological status k ∈ {S, I, R} for pathogen 1 and immunological status l ∈ {S, I, R} for pathogen 2. We omit the state X II from the model because, although simultaneous infection by both viruses is biologically possible, clinical studies have found that the frequency of coinfection with RSV and influenza is usually low (Meskill et al. (2017) ). A graphical representation of this model is provided in Figure 1 . A list of reaction rate parameters is provided in Table 1 . The constants β 1 and β 2 represent the contact transmission rate, which describes the flow of individuals from the susceptible group to a group infected with pathogen 1 and 2, respectively. In the context of ARI, the average recovery time is known to be relatively stable and lasts for approximately seven days (Fiore et al. (2008) ). Therefore, the rate, γ, at which infected individuals recover (move from infected to temporary immunity in the recovered category) is 1/7 days −1 . Since the population is relatively stable over the years under study, we set the birth rate equal to the death rate µ in our transition model. We also assume an average life expectancy of 1/µ = 70 years (World Health Organization (2017)). Variables λ 1 (t) and λ 2 (t) represent the proportion of the population infected with pathogens 1 and 2, respectively, at a given time t. Finally, to describe the interaction between influenza and RSV, we use the cross-immunity or crossenhancement parameters σ 1 and σ 2 . Cross-immunity for influenza is present when 0 < σ 1 < 1, indicating that the presence of RSV inhibits infection with influenza. A value of σ 1 = 0 confers complete protection against influenza while a value of σ 1 = 1 confers no protection, and a value of σ 1 > 1 represents increasing degree of cross-enhancement, indicating that the presence of RSV enhances infection with influenza (Adams and Boots (2007)). The definition of σ 2 is analogous. Further customization of the transition mechanism is possible within this stochastic dynamical model (e.g., Wilkinson (2006) ) and is desirable when additional information about the system dynamics is available. Examples include considering the effect of vaccination programs, seasonal forcing, or the presence of specific additional pathogens in circulation. We discuss these in the section on future work. We next make the following standard assumptions about the random vector of infection states Allen (2008) ; Gillespie (2007)). For an infinitesimally small time increment, ∆t, the probability of the occurrence of an event of type j in the interval (t, t + ∆t] is a j (x)∆t, where a j (x) denotes the propensity function for the reaction at state X = x. Under these conditions, and assuming a well-mixed population, the probability mass function p t describing the probability of being in state X = x at time t evolves according to the Kolmogorov forward equation (chemical master equation, or CME), where v j (t) are stoichiometric vectors whose elements in {−1, 0, 1} describe the addition or subtraction of individuals from a particular compartment of x. Values of a j (x) and v j (t) for the two-pathogen SIR model are provided in Table 2 . The large-population approximation to this system characterizes the distribution of the Markov process X(t), t ∈ [0, T ] as the sum of a deterministic term Ωφ(t), φ : [0, T ] → R + dim(X) and a stochastic term Ω 1/2 ξ, where ξ is a Gaussian process with meanξ(t) : [0, T ] → R + dim(X) and covariance C(t) : [0, T ] → R dim(X)×dim(X) . Equivalently, we can write, The Appendix explains this large-population approximation and defines the quantities φ,ξ, C as solutions to ODE initial value problems involving the propensities a j and stoichiometric matrix V for the system. Section 3.2 describes how this approximation is used to model aggregated data on ARI reports. That is, observations are made on the total number G X(t) = X IS (t) + X IR (t) + X SI (t) + X RI (t) of reported infections from influenza and RSV at M discrete observation time points t 1 , . . . , t M . The following distributional fact follows from (2) and the standard assumption (e.g., Fearnhead, Giagos and Sherlock (2014)) of a zero initial drift term,ξ(0) = 0, so thatξ(t) = 0 for all t ∈ [0, T ]. Thus, where θ is a vector of the SKM parameters, defined in Section 3.1, augmented with unknown initial conditions X(0), The aggregate number of ARI cases in the San Luis Potosí data also includes infections by viruses other than influenza and RSV. Although these may be responsible for a significant fraction of all ARI cases, influenza and RSV are the two viruses that drive the epidemic fluctuations observed during the winter outbreaks in each year. Other viruses in circulation are included in the model by augmenting the state vector X by a background process corresponding to ARI infection that is not caused by influenza or RSV. In contrast to infection states for influenza and RSV, this background state D, which represents the number of individuals infected with a different pathogen, is modeled empirically. We assume that the number of background infections in a given week and the previous week are positively correlated and that the variance of the process error for D lies between the variance of the infection state X (proportional to Ω, as in (2)) and the variance of the observation error (proportional to Ω 2 ). This choice ensures that the background process is more variable than X but is not flexible enough to overfit the data. A positive drift term, proportional to Ω, reflects the expected growth in background cases. This suggests the lagged background model, where η i ∼ N (0, Ω 3/2 κ) are independent, zero-mean, stochastic noise terms, c, ν, and κ are unknown positive constants, and D 1 ∼ N (Ωc, Ω 3/2 κ) is the distribution of the initial background state. Discrepancy models, such as this one, must both be flexible enough to capture the unmodeled signal but structured enough to guard against overfitting (e.g., Brynjarsdóttir and O'Hagan (2014) ). In our model this is accomplished by setting the process error variance Ω 3/2 κ to be below that of the observation error but above that of X (our choice of κ = c 0 makes these variances directly comparable). Furthermore, the integration of the additional virological test data in the analysis helps to constrain the background model. Observed aggregate counts are assumed subject to independent Normal observation errors ε i with mean zero and variance Ω 2 Σ. We make the additional assumption that a fixed proportion, r ∈ [0, 1], of all individuals infected with ARI seek consultation. Therefore, the observed aggregated reports at observation times t 1 , . . . , t M are modeled as The justification for modeling the total number of influenza and RSV infections directly via the Normal model, obtained via the LNA, is computational (Fearnhead, Giagos and Sherlock (2014) ; Golightly, Henderson and Sherlock (2015) ) by exploiting the availability of closedform Bayesian updates over the states. Modeling counts, via a discrete distribution centered at the LNA, would require an additional layer of sampling at each observation location for each Markov chain Monte Carlo (MCMC) iteration which would quickly become computationally infeasible. Due to symmetry in the two-pathogen SIR model (Figure 1 ), conditioning on aggregate counts (6) alone leaves uncertainty about the circulation patterns of individual pathogen infection trajectories. Our analysis shows that the inclusion of the virological data, described in Section 2, can aid in disaggregating the dynamics of the two pathogens. Denote by N 1,i , N 2,i , N 3,i the number of virological laboratory tests that identified influenza, RSV, and neither pathogen, respectively, for children under five years of age admitted with an ARI during week i at Hospital Central. The total number of virological tests is related to the number of infected individuals in the population as follows. According to the 2010 census, there were Ω c = 266,761 children under five years of age in San Luis Potosí out of a total population of Ω = 2,585,518 (Consejo Nacional de Población, México). Furthermore, available data suggests that approximately a proportion r h = 0.0005 of children with reported ARI in the state are admitted to Hospital Central and are, therefore, eligible for a virological test. Thus, we model the expected number of virological tests that were positive for influenza, RSV, and The variability in the true proportion of the infected population admitted to Hospital Central, motivates the use of the variance-inflated model, where v is an unknown variance inflation parameter. The variance of (7) is m k,i (1 + 1/v) and is controlled by the parameter v which will be estimated along with the other model components. Parameters defining the discrepancy model (5) and observation processes (6) and (7) are listed in Table 3 . In the following we denote by τ the subset of these parameters which are unknown: τ = (c, ν, r, v, Σ). Furthermore, we assume that both the SKM parameters θ and the discrepancy and observation model parameters τ vary across epidemic years. This is due to factors, such as the circulation of different strains of ARI in different years, which are difficult to accurately model. The rationale for this assumption is further explained in the Discussion. 3.3. Prior probability model for unknown parameters. Prior distributions on the model and auxiliary parameters are obtained by expert elicitation and by enforcing physical constraints. For example, ensuring that initial conditions on the states represent the entire population requires that elements of the unknown initial state vector X(0)/Ω lie on the simplex, motivating the use of a Dirichlet prior. The reporting proportion r ∈ (0, 1) and the small positive lag coefficient ν for the background process are assigned uniform prior distributions on (0, 1). The remaining parameters are bounded below by zero and are, therefore, assigned Gamma priors. The shape and scale hyperparameters for the Gamma prior on the transmission rates β p , p = 1, 2 are chosen to be 20 and 3, respectively, to yield a reasonable prior mean of 60 days −1 and a large spread, reflecting our uncertainty about these quantities. The shape and scale for the Gamma prior on the cross-interference parameters σ p , p = 1, 2, are chosen to be 10 and 0.1, respectively. This choice places a relatively large prior weight on the region around the neutral case σ p = 1, p = 1, 2. The same hyperparameters are chosen for the variance inflation factor v so that the prior mean corresponds to a relatively small spread in the counts of virological test data. The prior on the error variance Σ is Gamma with shape parameter 1 and scale parameter 0.01, ensuring a moderately sized prior mean on the scale of the population and substantial posterior mass near zero. The state prior covariance, C 0 = c 0 I dim(X),dim(X) , is assumed to be diagonal with prior variance c 0 and assigned a moderate value of 0.01. 4. Inferential methodology. Consider inference on a general partially-observed SKM with states X and model parameters θ. A Markov chain Monte Carlo (MCMC) algorithm, targeting the posterior π(θ | y), requires the expensive evaluation of the marginal likelihood p(y | θ) at every iteration given θ. An approximate but more efficient evaluation of the marginal likelihood can be obtained via a linear filter under the LNA. The inference problem can then be formulated as a hidden Markov model (Fearnhead, Giagos and Sherlock (2014) ; Finkenstädt et al. (2013) ; Golightly, Henderson and Sherlock (2015) ). In this scenario the system equation for the hidden state (2) is where F is the zero dim(X) × dim(X) matrix and H(t i ) = Ωφ(t i ). Here, φ(t i ) and C(t i ) are obtained by integrating equations (11) and (17) from time t i−1 to t i with the initial condition φ(t i ) and C(t i ), respectively (to generalize this formulation for the case of a nonzero drift termξ, see Finkenstädt et al. (2013) ). A linear Gaussian observation process allows the use of the computationally efficient linear filter. However, in our applied problem the observation process (6) and (7) is more complex, incorporating an additional non-Gaussian data source and a background process. Section 4.1 outlines an extended sampling approach to incorporate additional data and model components into this framework. 4.1. Extended marginal sampling approach based on LNA. Algorithm 1 summarizes an extended sampling procedure to be performed at each iteration of an MCMC targeting π a (θ, τ, x 1:M , d 1:M | y 1:M , n 1:M ) given sampled values of (θ, τ ). We will use the shorthand notation x 1:i to denote the vector x at times t 1 through t i . Incorporating the background process (5) within this framework is straightforwardly done by augmenting equation (8) to include infection states with ARI other than influenza and RSV. Formulation of this problem as a hidden Markov model enables the use of a backward recursion to compute the smoothing distribution over the epidemic state (e.g., Touloupou, Finkenstädt and Spencer (2019) ). Once this is available, a likelihood-free sampling step over the states allows us to compute the likelihood component for the virological test data N . Assuming conditional independence between Y and N , given the infection states, background, and parameters, the posterior density, is proportional to the prior multiplied by the likelihood components (2), (5), (7), and the approximate marginal likelihood p a (y 1:M | θ, τ ) obtained by alternating forecast and analysis steps (see, e.g., West and Harrison (1997) , and Algorithm 1). The forecast and analysis steps used to obtain p a (y 1:M | θ, τ ) provide the necessary information to obtain the smoothing distribution over x 1:M and d 1:M via backward recursion. Conditioning on a sample from this distribution, we can then evaluate p(n 1:M | x 1:M , d 1:M , y 1:M , θ, τ ). This approach has the additional advantage of allowing visualization of the posterior samples over the infection states and background. Results. This section describes the results of the analysis of data from San Luis Potosí. Retrospective joint inference for the dynamics of influenza and RSV is carried out independently for each epidemic year due to the presence of yearly variability between circulating strains of ARI infections, as explained in the Discussion. pa(y 1 | θ, τ ) = N (y 1 ; rG m 0 , r 2G Ṽ 0G + Ω 2 Σ). The analysis distribution at time t = 1 is p(z 1 | y 1 ) = N (z 1 ;m 1 ,Ṽ 1 ), wherẽ Retain the previously computed quantitiesm i ,Ṽ i , pa(y 1:i | θ, τ ). Integrate equations (11) and (17) from time t i to t i+1 using as initial conditions φ i = m i /Ω and C i = V i /Ω, to obtain φ i+1 and . Backward recursion, together with the approximation used to obtain the analysis distribution above, leads to a smoothing distribution forZ 1:M equal to the approximate analysis distribution, 5.1. Computational implementation. Posterior sampling was conducted by embedding the extended marginal sampling algorithm 1 within a parallel tempering Markov chain Monte Carlo sampler (PTMCMC, Geyer (1991)) over (θ, τ, x 1:M , d 1:M ). PTMCMC is well suited to the present setting, because it can effectively explore challenging posterior densities, with features such as ripples and ridges which occur when different combinations of parameters lead to similar model fits. Five hundred thousand iterations of PTMCMC were obtained for each year using four parallel chains each. Due to strong posterior correlations between groups of parameters in our model, the parameters were sampled in three distinct blocks. The first two blocks consisted of the model parameters β 1 , β 2 , σ 1 , σ 2 , and parameters defining the discrepancy and observation processes, Σ, r, c, v, ν. Proposal covariance adaptation was performed for both blocks, during the first half of each chain, which was discarded as burn- in. The last block consisted of the initial states X SS (0), X IS (0), X RS (0), X SI (0), X RI (0), X SR (0), X IR (0), X RR (0). For this block, only the proposal variances were adapted, because both the sign and magnitude of posterior correlations between these quantities vary across the parameter space. Convergence was diagnosed by visual inspection of trace plots and correlation plots. Inference based on data from San Luis Potosí, México. This section presents and interprets the results for retrospective joint inference for the dynamics of influenza and RSV from observed aggregate ARI counts and auxiliary virological testing data from San Luis Potosí, México. Maximum a-posteriori (MAP) estimates and 95% credible intervals for the model parameters are provided in Tables 1 through 6 of the Supplementary Material (Chkrebtii et al. (2021) ). Summaries of the marginal posterior distribution over ARI reports are shown in Figures 2 and 3 . Figure 2 illustrates marginal posterior summaries over disaggregated infection states (influenza, RSV, and background). In all the years examined, except 2006-07, a single pathogen, either influenza or RSV, peaks first, followed by a peak associated with the other pathogen. Regardless of which peaks first, in all six years the posterior trajectories agree with the patterns suggested by the virological test data. Specifically, in all years, except 2003-04 and 2005-06, RSV cases peaks first, followed by influenza, with infections subsiding by the end of the epidemic year. In 2005-06, it appears that a small initial outbreak of RSV coincides with the main outbreak of influenza which is also consistent with the virological test data for that year. Secondary peaks that sometimes appear in the number of infections by a given pathogen may be due to local outbreaks of the ARI or ones caused by a different strain of the pathogen. The flexibility of our model to capture multiple epidemic peaks is useful and will be further considered in light of recent literature in the Discussion. The background model, which describes discrepancy due to other ARI infections as well as some degree of model misspecification, appears to suggest possible over-estimation of the number of infected individuals at the beginning of each epidemic season, due to the inability of the SKM to reproduce the observed dynamics starting from a small number of initial infections. This effect may be due to exogenous factors, including the fact that the beginning of the epidemic season (August) coincides with the beginning of the school year, when students come into more frequent contact with one another. Figure 3 aggregates these summaries to allow for visual comparison with the data. Posterior predictive plots are provided in the Supplementary Material as an additional tool to assess coverage. Figure 3 suggests that the model is not flexible enough to account for very steep local peaks in infection. These peaks could be due to exogenous factors that were not accounted for by the model, such as school schedules, weather, or more than one strain of influenza and RSV in circulation. In particular, the year 2008-09 was characterized by the outbreak of a second, highly virulent strain of influenza A (H1N1) which may be responsible for the poorer model fit. In contrast, during the years 2004-05 and 2006-07, the combination of the ARI dynamics and background process track the data quite well. Indeed, both of these years exhibit slowly-changing ARI dynamics, which are consistent with our process model, and the lack of sharp peaks means that the background process is able to capture most of the remaining variability. Estimates and credible intervals for cross-interference parameters σ 1 and σ 2 are shown in the left panel of Figure 4 . Our results suggest that the degree of cross-immunity or crossenhancement between influenza and RSV changes from year to year. The Discussion section elaborates on why this is expected, given the circulation of different strains, and why these parameters do not necessarily vary smoothly from year to year. Nevertheless, some interest- ing patterns emerge. Evidence is assessed though the coverage of the 95% posterior credible intervals for the parameters σ 1 and σ 2 . Cross-immunity for influenza is detected in epidemic years 2004-05 and 2006-07, meaning that prior infection with the circulating RSV strain may have inhibited infection with influenza, though there is little evidence of this effect in the opposite direction. In Figure 2 this effect manifests in the lack of a defined peak in influenza infections following the peak in RSV infections. These relatively small influenza outbreaks may be insufficient to determine influenza's cross-interference effect for RSV, as evidenced by the fact that, in both years, σ 2 has approximately equal posterior probability of lying on either side of 1. Evidence of cross-enhancement for influenza, due to a prior infection with RSV, was found in epidemic years 2005-06, 2007-08, and 2008-09 . Additionally, evidence for cross-enhancement for RSV was found in 2007-08, and 2008-09. However, results were inconclusive for the cross-interference outcome for RSV in 2005-06. Finally, epidemic season 2003-04 shows evidence of cross-enhancement for RSV from a prior influenza infection, but the magnitude of the effect in the other direction is inconclusive. This is expected, as this is the only year where the influenza peak preceded the peak in RSV and both peaks were well separated, resulting in a lack of RSV cases early in the season to inform the cross-interference parameter. Contact transmission rates describe the degree of infectiousness of a given ARI strain. Their estimates and credible intervals are shown in the right panel of Figure 4 . We can additionally investigate the relative sizes of the transmission rates for the particular strains of influenza and RSV in circulation. Assessment of this relationship is based on posterior probability of the difference between β 1 and β 2 . We find evidence that RSV is more contagious than influenza in all years except 2003-04 and 2005-06, where evidence suggests a more infectious influenza strain. This pattern is also clearly discernible in Figure 2 , where 2003-04 and 2005-06 are the only years studied where an influenza peak precedes a peak in RSV infections. The importance of enabling the estimation of cross-interference and transmission rates and, consequently, disaggregating pathogen dynamics is highlighted in the Discussion section. Our analysis suggests that novel sentinel programs, like the one implemented at Hospital Central, provide a powerful tool to jointly estimate features of these epidemics. 5.3. Model Assessment. The model fit was assessed by analyzing predictive residuals and interpreting any deviations from their theoretical distribution. Results of the residual analysis are shown in Figures 3 through 5 of the Supplementary Material. A posterior sample of 100 step-ahead predicted states were transformed, via the observation process (i.e., rescaled and aggregated), and subtracted from the aggregated ARI reports data to obtain a sample from the residual posterior distribution (Figure 3 in the Supplementary Material, gray circles). The mean residuals are roughly centered around zero and are approximately normally distributed, as illustrated in Figure 4 in the Supplementary Material. However, some evidence of a pattern in the residuals warrants further investigation. For all six years there is a noticeable peak or drop of the residuals around the 20th week of the epidemic season. This again suggests that the two-pathogen SKM is not flexible enough to produce large epidemic peaks. The 20th week of the season coincides with the winter holidays and subsequent return of children to school. The increased social contact during this time violates the assumption of a constantly well-mixed population in the SKM which likely accounts for this model discrepancy. The background term is intentionally not flexible enough to capture these peaks to avoid overfitting the data. Indeed, empirical autocorrelation plots for the mean residuals, shown in Figure 5 in the Supplementary Material, detect statistically significant correlations over small lags which are not captured by the background process. The posterior variance of the predictive residuals is much greater during 2008-09 than the previous years, and the pattern in the residuals is more pronounced, including a very large peak around week 40 of the epidemic season. This spike corresponds directly to the the timing of the emergence of the influenza A (H1N1)pdm09 strain of influenza, in addition to the strain already in circulation. As expected, our parameteric model, which explicitly accounts for only two pathogens, is not able to capture the dynamics of an additional major outbreak. The Discussion section outlines ways in which these issues can be accounted for in the model, subject to the availability of appropriate data. 6. Discussion. We have presented an analysis of aggregated ARI report data and virological test data collected in the state of San Luis Potosí, México. A Bayesian model was constructed based on a large-population (LNA) approximation to a stochastic kinetic SIR model for the dynamics for two ARI pathogens, influenza and RSV. In order to integrate multiple data sources and an empirical discrepancy model representing infections with other pathogens, we extended the existing marginal sampling approach based on the LNA. The inclusion of the additional virological data allowed us to jointly model disease dynamics of influenza and RSV in San Luis Potosí, to recover cross-interference parameters, and to disaggregate the trajectories of individual pathogens. The challenge of sampling from the posterior distribution over the model parameters was addressed by employing a population MCMC algorithm. Overall, our approach is general and can be applied to the problem of inference on other stochastic kinetic models as well as models of more than two interacting epidemics, when additional data sources are available to recover their individual dynamics. 6.1. Joint inference and cross-interference. Joint modeling of the two pathogens enables estimation of cross-interference parameters and related qualitative features, such as the order and location of epidemic peaks of influenza and RSV in a given epidemic year. This is important because of the central role that cross-interference plays in shaping the evolutionary and epidemiological dynamics of multistrain pathogens (Gjini et al. (2016) ; Zhang et al. (2013) ). Changes in cross-interference are known to give rise to complex temporal dynamics in disease incidence and create oscillating time series from which it is difficult to estimate parameters that govern the underlying epidemiological processes (Bhattacharyya et al. (2015) ; Reich et al. (2013) ). Studies about influenza-RSV interaction are predominantly qualitative (Bhattacharyya et al. (2015) ; Gröndahl et al. (2013) ; Pinky and Dobrovolny (2016) ). Though crossimmunities have been estimated for dengue (Reich et al. (2013) ) and influenza and pneumococcal pneumonia ), to our knowledge, there have been no quantitative studies of cross-interference between influenza and RSV. Furthermore, to our knowledge, our work is the first to estimate cross-interference parameters from an SKM. Pathogen fitness is determined by processes that occur both within and between hosts (Bashey (2015); Mideo, Alizon and Day (2008) ; van Baalen and Sabelis (1995)). Many factors can drive cross-interference patterns between these viruses, such as vaccination programs (Martcheva, Bolker and Holt (2007) ; Rohani (1999) ) and geographical location (Bloom-Feshbach et al. (2013) ). Furthermore, the strain that is most successful within a host is not necessarily the one that can best be transmitted between hosts (Mideo, Alizon and Day (2008) ). In-vitro within-host interaction studies (Mideo, Alizon and Day (2008) ) suggest that in cells coinfected with both pathogens, influenza blocks RSV infection (Shinjoh et al. (2000) ). Observational studies by Ånestad (1987) ; Míguez, Iftimi and Montes (2016); Nishimura et al. (2005) suggest a similar pattern. We note that our results do not contradict these studies, since we model between-rather than within-host interactions, focusing on the disease spread at the level of the host population. 6.2. Prediction. The model formulated in Section 3, and the extended marginal sampling approach presented in Section 4.1 can, in principle, be applied over time domains of any length. However, in the context of our motivating problem, estimating the model for each epidemic year separately is consistent with the understanding of the way that different strains of influenza and RSV circulate from year to year. Parameters defining the circulation and interaction dynamics of influenza and RSV differ from strain to strain of the virus, and different strains of each virus circulate in different years. Specifically, RSV is classified into two major groups, A and B, each of which contains multiple variants (Anderson et al. (1985) ; Mufson et al. (1988); Venter et al. (2001) ). Of the five years studied by Peret et al. (1998) , each year saw a shift in the predominant genotype or subtype of RSV so that no single genotype or subtype was dominant for more than one of the five years. Peret et al. (1998) hypothesized that newly introduced strains are better able to evade previously induced immunity in the population and, consequently, either circulate more efficiently or are more pathogenic. Similarly, influenza viruses continually change, due to antigenic drift, which requires vaccine reformulation before each annual epidemic (Ferguson, Galvani and Bush (2003) ) and antigenic shift (Fiore et al. (2008) ; Rambaut et al. (2008) ; Zambon et al. (2001) ) which can lead to the emergence of new influenza epidemics. All of these effects combine to create unique conditions in each epidemic year. We note that sometimes such changes can occur within a single epidemic year, such as with the emergence of the global influenza A (H1N1)pdm09 pandemic in April 2009. Indeed, we observed that the model fitted using data from the epidemic year 2008-2009 was only partially able to capture the dynamics of this second epidemic wave of influenza in the latter part of the year. 6.3. Model building. In future work we plan to address several questions. First, we believe that accounting for vaccination effects and seasonal forcing will yield additional insight into the interaction of these pathogens. Specifically, one important seasonal factor identified was the possible effect of holidays and school schedules on the system dynamics. This factor is difficult to incorporate directly into a SKM which assumes a well-mixed population. We hypothesize that this additional stochasticity may be introduced through a discrepancy process within the system equations, defining the LNA. Second, it may be realistic in some years to model multiple strains of influenza and RSV. Indeed, the emergence of additional strains, such as in April of 2009, is not suitably captured by our model in which parameters are essentially shared between strains of a single pathogen within each year. Therefore, modeling such strains as separate pathogens may yield better understanding of the circulation of ARI in the state. In order to fit these models, it will be critical to obtain new sources of data, such as vaccination reports, local climate variables, or virological tests for new strains, to identify additional model components. 6.4. Conclusion. The ability to understand the dynamics of infectious diseases is crucial for timely assessment of epidemics (Abat et al. (2016) ). While infectious disease outbreaks are usually caused by a specific pathogen, symptoms caused by respiratory viruses are nonspecific, and do not allow for accurate diagnosis of the etiology of illness (including ARI in general as well as pneumonia). Therefore, virological and bacteriological methods are required to identify the agent responsible for a patient's disease. However, laboratory testing methods are not always available, may be costly, and, in some instances, have low sensitivity or specificity. As a result, laboratory evaluation of all respiratory infections is not feasible, and the role of specific pathogens cannot be fully ascertained. These limitations are also present when assessing epidemics at a population level. Although seasonal ARI epidemics tend to be dominated by one virus (such as influenza), the contribution of a second pathogen (such as RSV) needs to be considered when analyzing aggregated data (Charu et al. (2011) ). Failing to consider these dynamics may result in overestimation of the effects of a specific virus as well as biased parameter estimates when modeling epidemics. Unfortunately, systematic data describing the epidemiological behavior during several consecutive seasons for some viruses, such as RSV, is not available in many countries ). In the present study we used influenza and RSV data, collected systematically and prospectively during a six-year period, together with ARI reports that are routinely collected by health-care authorities; this allowed us to analyze the contribution of these two pathogens to ARI epidemics. While influenza surveillance data is increasingly available worldwide, systematically obtained year-round RSV virological data from middle-and low-income countries is scarce, and no such data is currently available for México (aside that from San Luis Potosí). Overall, our study underscores the relevance of considering more than one pathogen when assessing ARI epidemics. The availability of two-pathogen models could have important applications not only during interpandemic winter seasons but also during the emergence of pandemic viruses, such as influenza A (H1N1)pdm09 virus and SARS-CoV-2. A large-population approximation of the CME (1) is given by the van Kampen expansion which can then be computed via the Linear Noise Approximation (LNA) (van Kampen (1992) ). For large Ω the system states X can be expressed as the sum of a deterministic term φ : [0, T ] → R + dim(X) and a stochastic term ξ, Assuming constant average concentration, the magnitude of the stochastic component will increase as the square root of population size. Let V = (v 1 , . . . , v R ) be a dim(X) × R stoichiometric matrix that describes changes in the population size, due to each of the R reactions. The time evolution of the term of order Ω 1/2 (van Kampen (1992)), φ i (t) = lim Ω,X−→∞ X i (t)/Ω, is governed by the ODE initial value problem, where φ 0 = X(0)/Ω. The full expressions for the macroscopic equations for the twopathogen model in the first line of (11) are The stochastic process ξ is governed by the Itô diffusion equation, , where A(t) = ∂V a(φ(t))/∂φ(t), B(t) = V diag{a(φ(t))}V , and W (t) denotes the dim(X)-dimensional Wiener process (Gillespie (2007) ; van Kampen (1992) ). Expressions for both matrices in the two-pathogen model are provided in Table 4 . For fixed or Gaussian initial conditions, the SDE in (13) can be solved analytically. The solution is a Gaussian process with meanξ and covariance C (van Kampen (1992)), that is, whereξ(t) and C(t) are obtained (see van Kampen (1992, pp. 244-258) ) by solving the ODE initial value problem, where Φ(t) is the evolution or fundamental matrix (Grimshaw (1993) ) determined by the matrix equation, A standard choice (e.g., Fearnhead, Giagos and Sherlock (2014) ) is to setξ(0) = 0, from which it follows thatξ(t) = 0 for all t ∈ [0, T ]. The covariance C is obtained by solving (17) dC(t) dt = C(t)A(t) T + A(t)C(t) + B(t), t ∈ (0, T ], C(0) = C 0 . It follows from (10) and (14) that the transition densities of X(t) are given by (2). Posterior predictive summaries for aggregated ARI reports from San Luis Potosí, México. The aggregated data, measured from August to July of the following year, is represented by black dots. Gray bands represent point-wise 95% prediction intervals and the gray dashed line shows a maximum a-posteriori predictions. Interference between outbreaks of respiratory syncytial virus and influenza virus infection Surveillance of respiratory viral infections by rapid immunofluorescence diagnosis, with emphasis on virus interference Interference between outbreaks of respiratory viruses Traditional and syndromic surveillance of infectious diseases and pathogens The influence of immune cross-reaction on phase structure in resonant solutions of a multi-strain seasonal SIR model An introduction to stochastic epidemic models Respiratory syncytial virus contributes to more severe respiratory morbidity than influenza in children < 2 years during seasonal influenza peaks Antigenic characterization of respiratory syncytial virus strains with monoclonal antibodies Within-host competitive interactions as a mechanism for the maintenance of parasite diversity Cross-immunity between strains explains the dynamical pattern of paramyxoviruses Latitudinal variations in seasonal activity of influenza and respiratory syncytial virus (RSV): A global comparative review Bayesian inference for a discretely observed stochastic kinetic model Learning about physical parameters: The importance of model discrepancy. Inverse Probl A robust parameter estimation method for estimating disease burden of respiratory viruses Mortality burden of the A/H1N1 pandemic in Mexico: A comparison of deaths and years of life lost to seasonal influenza Burden of influenza and respiratory syncytial virus infection in pregnant women and infants under 6 months in Mongolia: A prospective cohort study Supplement to "Inference for stochastic kinetic models from multiple data sources for joint estimation of infection dynamics from aggregate reports and virological data Inference for discretely observed stochastic kinetic networks with applications to epidemic modeling Respiratory syncytial virus is an important cause of community-acquired lower respiratory infection among hospitalized adults Tracking epidemics with Google Flu Trends data and a state-space SEIR model Bayesian emulation and calibration of a dynamic epidemic model for A/H1N1 influenza Inference for reaction networks using the linear noise approximation Ecological and immunological determinants of influenza evolution Quantifying intrinsic and extrinsic noise in gene transcription using the linear noise approximation: An application to single cell data A linear noise approximation for stochastic epidemic models fit to partially observed incidence counts Prevention and control of influenza recommendations of the advisory committee on immunization practices Modelling estimates of the burden of Respiratory Syncytial virus infection in adults and the elderly in the United Kingdom Markov Chain Monte Carlo maximum likelihood Stochastic simulation of chemical kinetics How direct competition shapes coexistence and vaccine effects in multi-strain pathogen systems Delayed acceptance particle MCMC for exact inference in stochastic kinetic models Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo Effect of an infection control program on the frequency of nosocomial viral respiratory infections Nonlinear Ordinary Differential Equations: Applied Mathematics and Engineering Science Texts 2. Routledge The 2009 pandemic influenza A(H1N1) coincides with changes in the epidemiology of other viral pathogens causing acute respiratory tract infections in children Respiratory syncytial virus in healthy adults: The cost of a cold A stochastic transcriptional switch model for single cell imaging data Mathematical modelling and prediction in infectious disease epidemiology The effect of cross-immunity and seasonal forcing in a multi-strain epidemic model Bayesian inference of biochemical kinetic parameters using the linear noise approximation Mortalidad en México por influenza y neumonía The association of respiratory syncytial virus infection and influenza with emergency admissions for respiratory disease in London: An analysis of routine surveillance data Vaccine-induced pathogen strain replacement: What are the mechanisms? Prevalence of co-infection between respiratory syncytial virus and influenza in children Linking within-and between-host dynamics in the evolutionary epidemiology of infectious diseases Temporal association between the influenza virus and respiratory syncytial virus (RSV): RSV as a predictor of seasonal influenza Respiratory syncytial virus epidemics: Variable dominance of subgroups A and B strains among children The source of respiratory syncytial virus infection in infants: A household cohort study in rural Kenya The clinical features of respiratory syncytial virus: Lower respiratory tract infection after upper respiratory tract infection due to influenza virus Australian influenza surveillance report Circulation patterns of genetically distinct group A and B strains of human respiratory syncytial virus in a community Coinfections of the respiratory tract: Viral competition for resources The genomic and epidemiological dynamics of human influenza A virus Interactions between serotypes of dengue highlight epidemiological impact of cross-immunity Opposite patterns of synchrony in sympatric disease metapopulations In vitro growth profiles of respiratory syncytial virus in the presence of influenza virus Statistical inference for multi-pathogen systems Identifying the interaction between influenza and pneumococcal pneumonia using incidence data Mathematical modeling of infectious disease dynamics Global mortality estimates for the 2009 influenza pandemic from the GLaMOR Project: A modeling study The role of mathematical modelling in public health planning and decision making Intrinsic noise analyzer: A software package for the exploration of stochastic biochemical kinetics using the system size expansion Mortality associated with influenza and respiratory syncytial virus in the United States Scalable Bayesian inference for coupled hidden Markov and semi-Markov models Estimates of the global, regional, and national morbidity, mortality, and aetiologies of lower respiratory infections in 195 countries, 1990-2016: A systematic analysis for the Global Burden of Disease Study The dynamics of multiple infection and the evolution of virulence Stochastic Processes in Physics and Chemistry Tracking the dynamics of pathogen interactions: Modeling ecological and immune-mediated processes in a two-pathogen single-host system Superinfection between influenza and RSV alternating patterns in San Luis Potosí state Genetic diversity and molecular epidemiology of respiratory syncytial virus over four consecutive seasons in South Africa: Identification of new subgroup A and B genotypes Intensive care unit admission and death rates of infants admitted with respiratory syncytial virus lower respiratory tract infection in México Bayesian Forecasting and Dynamic Models Bayesian inference for diffusiondriven mixed-effects models Stochastic Modelling for Systems Biology Stochastic Modelling for Systems Biology Coherence of influenza surveillance data across different sources and age groups Contribution of influenza and respiratory syncytial virus to community cases of influenza-like illness: An observational study Co-circulation of influenza A virus strains and emergence of pandemic via reassortment: The role of crossimmunity Pairwise marginal contours for parameters defining the SKM based on data from San Luis Potosí, México for the years Pairwise marginal contours for parameters defining the SKM based on data from San Luis Potosí, México for the years 2005-06 (left) and Pairwise marginal contours for parameters defining the SKM based on data from San Luis Potosí, México for the years 2007-08 (left) and corner.py: Scatterplot matrices in Python Acknowledgements. The authors thank the following people for helpful comments and suggestions: Grzegorz A. Rempala (Mathematical Biosciences Institute, The Ohio State University) and Leticia Ramirez (Centro de Investigación en Matemáticas). The authors also wish to thank The Ohio State University and Centro de Investigación en Matemáticas (CIMAT) for making this collaboration possible. Finally, we thank the anonymous reviewers and Associate Editor for invaluable comments and suggestions.Funding. This research was supported in part by the Mathematical Biosciences Institute (MBI) and the National Science Foundation under grant DMS 1440386.