key: cord-0821214-3bpkiwd7
authors: Bayati, Basil S.
title: Deterministic analysis of extrinsic and intrinsic noise in an epidemiological model
date: 2016-05-13
journal: Phys Rev E
DOI: 10.1103/physreve.93.052124
sha: e8521fd293492e0dfecdb6cd2163f2e4abbd5852
doc_id: 821214
cord_uid: 3bpkiwd7

We couple a stochastic collocation method with an analytical expansion of the canonical epidemiological master equation to analyze the effects of both extrinsic and intrinsic noise. It is shown that, depending on the distribution of the extrinsic noise, the master equation yields quantitatively different results compared to using the expectation of the distribution for the stochastic parameter. This difference arises from the nonlinear terms in the master equation, and we show that the deviation away from the expectation of the extrinsic noise scales nonlinearly with the variance of the distribution. The method presented here converges linearly with respect to the number of particles in the system and exponentially with respect to the order of the polynomials used in the stochastic collocation calculation. This makes the method presented here more accurate than standard Monte Carlo methods, which suffer from slow, nonmonotonic convergence. In epidemiological terms, the results show that extrinsic fluctuations should be taken into account since they affect the speed of disease outbreaks and that the gamma distribution should be used to model the basic reproductive number.

Stochastic processes are used to model complex physical phenomena that range from astronomy [1] to epidemiology [2]. An important example is stochastic chemical kinetics, which describes the time evolution of chemically reacting systems by taking into account the fact that molecules are discrete entities that exhibit randomness in their dynamical behavior. A master equation [3, 4] can be used to model this probabilistic process.
The number of variables in this equation is large for all but the simplest systems, so analytical or direct numerical integration methods are usually impractical. Alternatively, Monte Carlo samples of the stochastic process can be generated numerically via stochastic simulation algorithms (SSAs) [5]. Two lines of work have provided methods that overcome the difficulty of solving the master equation explicitly without resorting to Monte Carlo simulations: (1) truncating a power-series expansion of the master equation, with methods that include the Kramers-Moyal expansion [6, 7], the $\Omega$ expansion [3, 8, 9], the Wentzel-Kramers-Brillouin (WKB) approximation [10], moment closure methods [11], and distribution approximations [12], and (2) truncating the number of states in the master equation [13-15]. Here we have used the $\Omega$ expansion so that we can obtain an analytical expression for the master equation and therefore analyze the influence of extrinsic noise in the model.

The effect of extrinsic noise in biological models has recently attracted interest at the cellular [16, 17] and population [18] scales, since the parameters of these models represent the environment of the underlying system. For example, in cellular systems, the rate parameters may represent the binding rate of two proteins in the cytoplasm. At the population level, intrinsic and extrinsic noise were recently studied in influenza [19].

Uncertainty quantification is the process of deterministically computing the effect of input uncertainty in equations of interest. The stochastic collocation method [20, 21] consists of an expansion that is related to polynomial chaos [22]. These methods have the desirable property of exponential convergence rates and therefore require comparatively few evaluations of the model equations.
There has not yet been a systematic study of the effect of both extrinsic and intrinsic fluctuations in the susceptible-infectious-recovered (SIR) master equation. In a previous publication [23] we showed that high-order approximations of the intrinsic noise in the master equation yield quantitatively different solutions. Here we examine the role of extrinsic noise on the master equation. We couple a high-order $\Omega$ expansion of the master equation with a stochastic collocation method and analyze various sources of uncertainty modeled with uniform, beta, and gamma distributions. We show that, depending on the distribution of the extrinsic noise, the master equation yields quantitatively different results compared to using the expectation of the distribution for the parameter. The use of a gamma distribution is based on an empirical study of the basic reproductive number in severe acute respiratory syndrome (SARS) [18]. In epidemiological terms, the results show that extrinsic fluctuations should be taken into account since they affect the speed of disease outbreaks and that the gamma distribution should be used to model the basic reproductive number.

The analysis presented here is notably not based on Monte Carlo simulations and therefore does not suffer from a slow, nonmonotonic convergence rate. Rather, the method converges linearly with respect to the number of molecules and exponentially with respect to the order of the polynomials used in the stochastic collocation calculation.

We note that there are restrictions on the applicability of the method as well as assumptions in the underlying model. The restrictions are that (1) the stochastic process must be a so-called $L^2$ random variable, meaning that it must have a finite second moment [22], and (2) the error term arising from the $\Omega$ expansion of the master equation is inversely proportional to the system size, so the approximation may be inaccurate for small systems.
For example, a process with extrinsic noise from a Cauchy distribution would violate the $L^2$ assumption. We also make assumptions on the validity of the underlying model. Since the underlying equations rely on a compartmental model for the species, we assume that the system is well mixed such that mass action kinetics are valid. Moreover, we assume that the system is memoryless and subject to exponential waiting times between reaction events.

In Sec. II we discuss how to compute the average population of a stochastic process over time. In Sec. III we show the results using various distributions for the extrinsic noise. Additionally, we compare the results of the method presented here to a classical Monte Carlo simulation. In Sec. IV we conclude by highlighting the importance of various sources of noise in epidemiological models and the effect they can have on simulations.

The objective of the method presented here is to formulate an efficient way of calculating the expected population of a stochastic process over time. Here we assume that the stochastic process has intrinsic noise owing to the discrete nature of the system and has extrinsic noise that is governed by a known distribution. We also assume that the distribution of the extrinsic noise has a finite variance. In this section we will formulate a way to compute the average of the intrinsic noise by approximating a master equation. In the next section we will describe how to compute the expectation of the extrinsic noise without resorting to Monte Carlo sampling.

We will consider an elementary nonlinear system, the canonical susceptible-infectious-recovered (SIR) model [24], which is the foundation of more detailed models that include age-dependent and spatially dependent processes, namely,

$$S + I \xrightarrow{\beta} 2I, \tag{1}$$

$$I \xrightarrow{\kappa} R, \tag{2}$$

where the reproductive number is defined as $R_0 \equiv \beta/\kappa$ and $S$, $I$, and $R$ denote the susceptible, infectious, and recovered persons, respectively.
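This process models the event in which an infectious person comes into contact with a susceptible person at a rate $\beta$ and results in two infectious persons, i.e., $S + I \xrightarrow{\beta} 2I$, or equivalently $S \xrightarrow{\beta I} I$. Let $n(\Omega,t)$ be a vector denoting the number of susceptible, infectious, and recovered persons. Then, the following differential equations are valid as $\Omega \uparrow \infty$:

$$\frac{d\zeta_1}{dt} = -\beta\,\zeta_1 \zeta_2, \tag{3}$$

$$\frac{d\zeta_2}{dt} = \beta\,\zeta_1 \zeta_2 - \kappa\,\zeta_2, \tag{4}$$

$$\frac{d\zeta_3}{dt} = \kappa\,\zeta_2, \tag{5}$$

where there is convergence to a concentration $\zeta(t) \equiv \lim_{\Omega \uparrow \infty} \Omega^{-1} n(\Omega,t)$ and $|\zeta(t)|_1 = 1$. The corresponding multivariate master equation is

$$\frac{\partial P(n,t)}{\partial t} = \frac{\beta}{\Omega}\left(L_1^{+1} L_2^{-1} - 1\right) n_1 n_2\, P(n,t) + \kappa\left(L_2^{+1} L_3^{-1} - 1\right) n_2\, P(n,t), \tag{6}$$

where $P(n,t)$ denotes the probability of being in state $n$ at time $t$, the discrete shift operator is $L_i^{\mu}(P) \equiv P(n_i + \mu, n_{\setminus i}, t)$, $n_{\setminus i}$ denotes the vector $n$ excluding the $i$th element, and $\Omega \equiv |n|_1$.

We follow van Kampen [3, 8, 9] and define the following ansatz: $n(t) = \Omega\,\zeta(t) + \Omega^{1/2}\,\xi$, where $\xi$ is an unknown random variable, and we subsequently define $\Pi(\xi,t;\beta,\kappa) \equiv P(\Omega\,\zeta(t) + \Omega^{1/2}\,\xi, t; \beta,\kappa) = P(n,t;\beta,\kappa)$. The discrete shift operators are expanded by means of a power series [3, 8, 9]:

$$L_i^{\mu} \approx \sum_{k=0}^{\rho} \frac{\mu^k}{k!}\,\Omega^{-k/2}\,\frac{\partial^k}{\partial \xi_i^k}, \tag{8}$$

where $\mu \in \{-1, +1\}$ and $\rho$ is the order of the expansion. Inserting the ansatz and shift operators [Eq. (8)] into Eq. (6) and then collecting terms in decreasing powers of $\Omega$ yield the desired power-series equation. Using the computer algebra system Mathematica, we expanded Eq. (6) to $O(\Omega^{-2})$. The particular form of the functions $F_{-j/2}(\cdot)$ that appear in this expansion is provided in the Supplemental Material [25]. We note that the lowest-order approximation is the classical Fokker-Planck equation [3], which also justifies the ansatz.

In order to obtain the set of nonlinear differential equations for the evolution of the moments of the distribution $\Pi(\xi,t)$, we first define the moments, namely,

$$\langle \theta(\xi) \rangle_t \equiv \int \theta(\xi)\,\Pi(\xi,t)\,d\xi, \qquad \theta(\xi) \equiv \xi_1^{k_1}\,\xi_2^{k_2}\,\xi_3^{k_3},$$

where the powers $k_1$, $k_2$, and $k_3$ determine the moments. The equations for the moments are found by multiplying both sides of the expansion by $\theta(\xi)$ and integrating over the domain:

$$\frac{d}{dt}\langle \theta(\xi) \rangle_t = \int \theta(\xi)\,\frac{\partial \Pi(\xi,t)}{\partial t}\,d\xi,$$

which therefore requires repeated integration by parts on the terms kept in the expansion.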
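The continuum rate equations for the concentration $\zeta(t)$ referred to above can be integrated with a few lines of code. The following is a minimal sketch, assuming the standard mass-action SIR form (infection term $\beta \zeta_1 \zeta_2$, recovery term $\kappa \zeta_2$) and a classical fourth-order Runge-Kutta step; the function names, the step size, and the initial condition are illustrative choices and are not taken from the paper, which used Mathematica.

```python
def sir_rhs(zeta, beta, kappa):
    """Right-hand side of the continuum SIR rate equations (concentrations)."""
    s, i, r = zeta
    infection = beta * s * i   # S + I -> 2I
    recovery = kappa * i       # I -> R
    return (-infection, infection - recovery, recovery)


def rk4_step(zeta, dt, beta, kappa):
    """Advance the concentrations by one classical Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = sir_rhs(zeta, beta, kappa)
    k2 = sir_rhs(shifted(zeta, k1, dt / 2), beta, kappa)
    k3 = sir_rhs(shifted(zeta, k2, dt / 2), beta, kappa)
    k4 = sir_rhs(shifted(zeta, k3, dt), beta, kappa)
    return tuple(
        z + dt / 6 * (a + 2 * b + 2 * c + d)
        for z, a, b, c, d in zip(zeta, k1, k2, k3, k4)
    )


def integrate_sir(zeta0, beta, kappa, t_end, dt=0.01):
    """Return the list of (t, zeta) pairs from t = 0 to t_end."""
    steps = round(t_end / dt)
    zeta, path = tuple(zeta0), [(0.0, tuple(zeta0))]
    for n in range(1, steps + 1):
        zeta = rk4_step(zeta, dt, beta, kappa)
        path.append((n * dt, zeta))
    return path


if __name__ == "__main__":
    # beta = 3/2 and kappa = 1 follow the results section; the initial
    # condition (99% susceptible, 1% infectious) is an assumption.
    path = integrate_sir((0.99, 0.01, 0.0), beta=1.5, kappa=1.0, t_end=20.0)
    t_peak, zeta_peak = max(path, key=lambda p: p[1][1])
    print(f"peak infectious fraction {zeta_peak[1]:.4f} at t = {t_peak:.2f}")
```

Because the three right-hand sides sum to zero, the total concentration stays at one, which gives a quick sanity check on any implementation.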
The set of nonlinear equations that govern the evolution of the moments was derived analytically. Let $\langle \xi \rangle_t \equiv (\langle \xi_1 \rangle_t, \langle \xi_2 \rangle_t, \langle \xi_3 \rangle_t)^T$, i.e., the first moments of the stochastic process; these provide the first-moment corrections to the solution of the continuum reaction rate equations. We note that in the lowest-order approximation, the Fokker-Planck equation, $\langle \xi \rangle_t \equiv 0$ [3].

The objective of the stochastic collocation method in this study is to evaluate the integral of $\langle \theta(\xi,\chi,\kappa) \rangle_t\,\rho(\chi)$ over the domain of the distribution [Eq. (12)], which represents the average over both the extrinsic and intrinsic noise; here $\rho(\chi)$ represents the distribution of the extrinsic noise. In a typical Monte Carlo method, $\chi$ would be repeatedly sampled from $\rho(\chi)$, the differential equation solved numerically, and the solutions then averaged over all of the samples. After the $\Omega$ expansion, the equations for the moments [Eq. (13)] depend on $\Omega$, $\beta$, and $\kappa$, where $\beta$ is a random parameter sampled from the distribution $\rho(\chi)$, $\kappa$ is a constant, and the functions $G_{-j/2}(\cdot)$ that appear in them are provided in the Supplemental Material.

Assume that there exists a set of polynomials $\{\Phi_i\}_{i=0}^{\infty}$ that are orthogonal with respect to $\rho(\chi)$, i.e.,

$$\int \Phi_i(\chi)\,\Phi_j(\chi)\,\rho(\chi)\,d\chi \propto \delta_{i,j},$$

where the Kronecker delta function $\delta_{i,j} = 1$ if $i = j$ and $\delta_{i,j} = 0$ if $i \neq j$. The integral weights $w_k$ for the approximation [Eq. (15)] are the Gauss quadrature weights associated with $\Phi_P$. They are expressed in terms of $A_P$, the coefficient of $\chi^P$ in $\Phi_P(\chi)$; the derivative $\Phi_P' = d\Phi_P/d\chi$; and the roots $x_k$, where $x_k$ denotes the $k$th root of the polynomial $\Phi_P(x)$, i.e., $x = (x_1, \ldots, x_P) = \Phi_P^{-1}(0)$, and $P$ denotes the order of the polynomial chaos approximation. Then the integration of the random variable representing the extrinsic noise is approximated to order $P$ by

$$\int \langle \theta(\xi,\chi,\kappa) \rangle_t\,\rho(\chi)\,d\chi \approx \sum_{k=1}^{P} w_k\,\langle \theta(\xi,x_k,\kappa) \rangle_t, \tag{16}$$

with an error that decays exponentially in $P$ at a rate set by a constant $C$ that is independent of $P$ (see [26] for the exponential convergence rate of collocation methods). The right-hand side of (12) is evaluated by substituting $x_k$ for $\beta$ and then solving the moment equations numerically using NDSolve in Mathematica to obtain $\langle \theta(\xi,x_k,\kappa) \rangle_t$.
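The collocation step described above can be illustrated for a uniformly distributed parameter, for which the orthogonal polynomials are the Legendre polynomials. The sketch below is an illustration under stated assumptions rather than the paper's Mathematica code: it computes Gauss-Legendre nodes by Newton iteration, uses the standard Legendre weight formula $w_k = 2/[(1 - x_k^2)\,P_n'(x_k)^2]$, and replaces the moment equations with a hypothetical stand-in model, `early_growth`, the linearized early-time growth $I_0\,e^{(\beta s_0 - \kappa)t}$. All function names are invented for this example.

```python
import math


def legendre(n, x):
    """Return (P_n(x), P_n'(x)) via the three-term recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return 1.0, 0.0
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp


def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for k in range(1, n + 1):
        x = math.cos(math.pi * (k - 0.25) / (n + 0.5))  # initial guess
        for _ in range(100):                              # Newton iteration
            p, dp = legendre(n, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def expect_uniform(f, a, b, order):
    """Approximate E[f(beta)] for beta ~ Uniform(a, b) by collocation."""
    nodes, weights = gauss_legendre(order)
    total = 0.0
    for x, w in zip(nodes, weights):
        beta = 0.5 * (b - a) * x + 0.5 * (a + b)  # map node to [a, b]
        total += 0.5 * w * f(beta)                # the 0.5 * w sum to one
    return total


def early_growth(beta, t, kappa=1.0, i0=0.01, s0=1.0):
    """Stand-in model: linearized early-time infectious fraction."""
    return i0 * math.exp((beta * s0 - kappa) * t)


if __name__ == "__main__":
    averaged = expect_uniform(lambda b: early_growth(b, 2.0), 1.0, 2.0, 5)
    at_mean = early_growth(1.5, 2.0)
    print(f"E[f(beta)] = {averaged:.6f}, f(E[beta]) = {at_mean:.6f}")
```

For this convex stand-in model the averaged output exceeds the output at the mean parameter, which mirrors the qualitative observation that using only the expectation of the distribution understates the speed of the outbreak.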
Note that the set of orthogonal polynomials $\{\Phi_i\}_{i=0}^{\infty}$ with respect to the distribution $\rho(\chi)$ need not be known a priori and can be determined on the fly by an orthogonalization procedure. That said, for certain distributions $\rho(\chi)$ the polynomials are known. For the uniform, beta, and gamma distributions used here, the associated orthogonal polynomials are Legendre, Jacobi, and Laguerre polynomials, respectively [22].

Here we analyze the role of a stochastic reproductive number and its effect on the canonical SIR model. The distribution that the reproductive number follows will be the uniform, beta, or gamma distribution. The uniform and beta distributions were chosen since these distributions are often used when limited or no information about the parameters is available. We also use a gamma distribution that is based on experimental results of the basic reproductive number in SARS [18]. We compare the results of a model with both extrinsic and intrinsic noise to both a model with only intrinsic noise and a model without any noise included. We show that the effect of the extrinsic noise can be just as large as that of the intrinsic noise and quantify this difference by computing a mean-squared deviation.

We first analyze the effect of a uniform distribution for the random parameter $\beta$, i.e., $\beta \sim \rho(\chi)$, where

$$\rho(\chi) = \begin{cases} \dfrac{1}{b-a}, & a \le \chi \le b, \\ 0, & \text{otherwise}, \end{cases} \tag{17}$$

where $a = 1$ and $b = 2$ denote the domain of the input distribution. Throughout the results section we will use $E[\rho(\chi)] = 3/2$, the total population $\Omega = 100$, and $\kappa = 1$. Shown in the left panel in Fig. 1 is the result along with the solution of the deterministic model and the $\Omega$ expansion without extrinsic noise where a constant is used for the input parameter, namely, $\beta = E[\rho(\chi)] = 3/2$. It can be seen that using the expectation of the distribution is insufficient to capture the time evolution of the system, since the progression of the disease is slower than in a model with extrinsic noise. We note, however, that the steady-state solutions are similar.

We next analyzed the beta distribution for the stochastic parameter [Eq. (18)]. This distribution approaches the uniform distribution as $\alpha$ and $\beta$ approach zero. Shown in the right panel in Fig. 1 is a simulation using $\alpha = \beta = 100$, i.e., a comparatively small variance around $E[\rho(\chi)] = 3/2$. It can be seen that the simulation is essentially equivalent to using a constant parameter. This is to be expected since the intrinsic noise becomes a factor when the reproductive number is near unity. Since a beta distribution centered around 3/2 has a very small probability near $R_0 = 1$, no noticeable difference is discernible.

To analyze the influence of the extrinsic noise on the infectious species, we will define an error metric between two approximations for the infectious persons ($i = 2$) [Eq. (19)]: the squared difference $\left(\langle \xi_2 \rangle_t - \langle \xi_2 \rangle_t^{(S)}\right)^2$ averaged over time, where $\langle \xi_2 \rangle_t$ represents the infectious population of a model using only intrinsic noise and $\langle \xi_2 \rangle_t^{(S)}$ represents the infectious population of a model using both extrinsic and intrinsic noise. Additionally, we have let $\beta = \alpha$. This norm represents the time average of the squared distance between the two models. Figure 2 shows this norm plotted against the value used for both $\alpha$ and $\beta$ in the distribution for the model $\langle \xi_2 \rangle_t^{(S)}$. Note that the deviation increases as the variance of the input distribution increases. Additionally, we observe a nonlinear dependence on the variance of the input distribution.

FIG. 2 (caption fragment): [...] represents the integration over time, where $\alpha = \beta$ are the parameters used in the beta distribution. Note that the variance of the beta distribution is inversely proportional to the value of $\alpha$ and $\beta$. The integrand of the norm is the squared difference between the infectious species of a simulation using extrinsic noise and a simulation using the expectation of the extrinsic noise.

Experimental studies of diseases have shown that the reproductive number can be modeled effectively with a negative binomial distribution [18].
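One way to see why the deviation grows with the variance of the input distribution is a second-order Taylor argument. The following is a heuristic sketch, not a derivation from the paper: $f(\beta)$ stands for any smooth model output (for example, the infectious fraction at a fixed time), and $\bar\beta$ and $\sigma^2$ denote the mean and variance of the extrinsic noise; these symbols are introduced here only for illustration.

```latex
\begin{align}
  f(\beta) &= f(\bar\beta) + f'(\bar\beta)\,(\beta - \bar\beta)
            + \tfrac{1}{2} f''(\bar\beta)\,(\beta - \bar\beta)^2 + \cdots, \\
  \mathbb{E}[f(\beta)] - f(\bar\beta)
           &\approx \tfrac{1}{2} f''(\bar\beta)\,\sigma^2, \\
  \bigl(\mathbb{E}[f(\beta)] - f(\bar\beta)\bigr)^2
           &\approx \tfrac{1}{4} f''(\bar\beta)^2\,\sigma^4 .
\end{align}
```

The linear term vanishes on averaging, so the leading correction is proportional to the curvature of the model in the parameter. Under this approximation a squared-difference metric would scale like $\sigma^4$ for small variances, which is consistent with a nonlinear dependence on the variance; for a model that is linear in $\beta$ the correction would vanish.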
Here we have used the continuous analog of the negative binomial distribution, namely, the gamma distribution [Eq. (20)], where $\alpha$ is the shape parameter and $\beta$ is the rate parameter. We used $\alpha = 0$ and $\beta = 2/3$ and have plotted the results in Fig. 3. We note that the initial outbreak occurs much more quickly and also subsides more quickly. Importantly, the steady state of the solution is considerably different, as the susceptible population never decreases beyond the recovered population. This difference when using a gamma distribution for $R_0$ is notable since both the time-dependent and time-independent dynamics differ even for the simplest SIR model. Moreover, we have included the standard continuum equations in Fig. 3 to emphasize the effects of both sources of noise.

In this section we compare the results obtained in Sec. III C with a Monte Carlo simulation. We use the stochastic simulation algorithm [5] to simulate a stochastic trajectory over time. Let $N$ denote the total number of Monte Carlo samples. A random variate for the extrinsic noise is drawn for each Monte Carlo sample: $\beta_n \sim \rho(x)$ for $n = 1, \ldots, N$. The propensities (unscaled probabilities for each reaction) are defined as follows: $a_1 = \beta_n S I$ and $a_2 = \kappa I$ for Eqs. (1) and (2). At each time step in the stochastic simulation algorithm, two random variables govern the evolution of the system, namely, a reaction index and a time step. The reaction index is sampled from a pointwise distribution for the propensities, i.e., $P(j = l) = a_l/(a_1 + a_2)$, and a time step $\tau$ is sampled from an exponential distribution with a mean of $1/(a_1 + a_2)$. The system is updated by executing the reaction with index $j$ and incrementing the system time by $\tau$. Figure 4 shows the fraction of infectious individuals for 100 trajectories of a stochastic simulation along with the mean over time using a total of $N = 2 \times 10^4$ samples. We used a gamma distribution for $\rho(x)$ defined in Eq. (20) with $\alpha = 0$ and $\beta = 2/3$.
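The Monte Carlo procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for Fig. 4: the infection propensity is scaled by the system size ($a_1 = \beta S I/\Omega$ with $S$ and $I$ as counts), the gamma parameters are read as an exponential distribution with mean 3/2 (shape 1, scale 3/2 in Python's `gammavariate` convention), and the initial condition of one infectious person is arbitrary. The function names are hypothetical.

```python
import random


def ssa_sir(beta, kappa, s, i, r, t_end, rng):
    """Simulate one SIR trajectory with the stochastic simulation algorithm.

    Returns a list of (time, S, I, R) tuples, one per reaction event.
    """
    omega = s + i + r                 # total population (system size)
    t, path = 0.0, [(0.0, s, i, r)]
    while i > 0:
        a1 = beta * s * i / omega     # infection propensity, S + I -> 2I
        a2 = kappa * i                # recovery propensity,  I -> R
        a0 = a1 + a2
        t += rng.expovariate(a0)      # waiting time with mean 1 / a0
        if t > t_end:
            break
        if rng.random() * a0 < a1:    # reaction 1 with probability a1 / a0
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        path.append((t, s, i, r))
    return path


def infectious_at(path, t):
    """Number of infectious individuals at time t (piecewise constant)."""
    current = path[0][2]
    for time, _, i, _ in path:
        if time > t:
            break
        current = i
    return current


def mean_infectious_fraction(n_samples, t, kappa=1.0, omega=100, i0=1, seed=0):
    """Monte Carlo mean of I(t) / omega with beta redrawn for every sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Assumed reading of the gamma parameters: shape 1, mean 3/2.
        beta = rng.gammavariate(1.0, 1.5)
        path = ssa_sir(beta, kappa, omega - i0, i0, 0, t, rng)
        total += infectious_at(path, t) / omega
    return total / n_samples


if __name__ == "__main__":
    for t in (1.0, 2.0, 4.0):
        print(t, mean_infectious_fraction(2000, t))
```

Each call draws a fresh $\beta_n$ before simulating, so the spread across trajectories combines the extrinsic parameter noise with the intrinsic noise of the reaction events.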
We note that the fluctuations are due to the noise arising both from sampling a distribution for the parameter $\beta$ and from the stochastic simulation algorithm itself. The mean of the processes over time is well captured by the $\Omega$ expansion coupled with a stochastic collocation method. Both the approximate method derived here and the Monte Carlo simulation show accelerated disease progression compared with the continuum equations.

We performed an analytical expansion of the epidemiological master equation, which was then coupled to a stochastic collocation method to analyze the extrinsic noise of a random reproductive number. To analyze the role of extrinsic noise in a susceptible-infectious-recovered model, we used the uniform, beta, and gamma distributions. While the difference was small in the case of a uniform or beta distribution, the gamma distribution caused the infection to peak earlier in time and therefore caused the number of infectious individuals to go to zero earlier than in the continuum model and in stochastic models without extrinsic noise. This is of importance to the public health community since a faster disease progression observed in empirical data may lead to an erroneous estimation of the reproductive number in a continuum model without extrinsic noise. We showed that the deviation away from the expectation of the extrinsic noise scaled nonlinearly with the variance of the input distribution. In epidemiological terms, the results imply that both intrinsic and extrinsic fluctuations should be taken into account since they may affect the speed of disease outbreaks.

The numerical methods presented here converge linearly with respect to the number of molecules in the system and exponentially with respect to the order of the polynomials used in the stochastic collocation calculation. This is in contrast to standard Monte Carlo methods, which suffer from a slow, nonmonotonic convergence rate. Future work may involve the analysis of multiple sources of extrinsic noise.
Efficient numerical methods, such as the Smolyak sparse-grid construction [27, 28], could be used for a stochastic collocation method.

[1] Stochastic problems in physics and astronomy
[2] Epidemics and rumours-A survey
[3] Stochastic Processes in Physics and Chemistry
[4] Stochastic Methods: A Handbook for the Natural and Social Sciences
[5] Exact stochastic simulation of coupled chemical reactions
[6] Brownian motion in a field of force and the diffusion model of chemical reactions
[7] Stochastic processes and statistical physics
[8] A power series expansion of the master equation
[9] The expansion of the master equation
[10] WKB versus generalized van Kampen system-size expansion: The stochastic logistic equation
[11] A study of the accuracy of moment-closure approximations for stochastic chemical kinetics
[12] Distribution approximations for the chemical master equation: Comparison of the method of moments and the system size expansion
[13] The finite state projection algorithm for the solution of the chemical master equation
[14] Solving chemical master equations by adaptive wavelet compression
[15] An adaptive wavelet method for the chemical master equation
[16] Noise-based switches and amplifiers for gene expression
[17] Intrinsic and extrinsic contributions to stochasticity in gene expression
[18] Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures
[19] Single-cell analysis and stochastic modeling unveil large cell-to-cell variability in influenza A virus infection
[20] Comparison of nonintrusive polynomial chaos and stochastic collocation methods for uncertainty quantification
[21] Stochastic approaches to uncertainty quantification in CFD simulations
[22] The Wiener-Askey polynomial chaos for stochastic differential equations
[23] Influence of high-order nonlinear fluctuations in the multivariate susceptible-infectious-recovered master equation
[24] A contribution to the mathematical theory of epidemics
[25] See Supplemental Material for Mathematica code that was used to expand the multivariate master equation, generate the set of moment equations, compute the orthogonal polynomials and weights for the extrinsic noise, numerically solve the moment equations
[27] On the Smolyak cubature error for analytic functions
[28] Numerical integration using sparse grids

The author thanks Bill and Melinda Gates for their active support of this work and their sponsorship through the Global Good Fund.