key: cord-0584487-45j82ars
authors: Leitao, 'Alvaro; V'azquez, Carlos
title: A stochastic $theta$-SEIHRD model: adding randomness to the COVID-19 spread
date: 2020-10-29
journal: nan
DOI: nan
sha: 09e326bbe2b4e9128763c2cb9105ac7a8b0a2221
doc_id: 584487
cord_uid: 45j82ars

In this article we mainly extend the deterministic model developed in [10] to a stochastic setting. More precisely, we incorporated randomness in some coefficients by assuming that they follow a prescribed stochastic dynamics. In this way, the model variables are now represented by stochastic process, that can be simulated by appropriately solve the system of stochastic differential equations. Thus, the model becomes more complete and flexible than the deterministic analogous, as it incorporates additional uncertainties which are present in more realistic situations. In particular, confidence intervals for the main variables and worst case scenarios can be computed.

The coronavirus disease 2019, renamed as COVID-19, is an infectious disease produced by the virus SARS-CoV-2, which was declared pandemic by the World Health Organization (WHO) in March 11, 2020 . This disease has posed many novel scientific challenges, ranging from contagious patterns, medical treatments or vaccine developments, to data analytics, spread modelling or evolution forecast. The research on all of these topics has been intensive in the last few months, as so will be in the near future. Thus, many disciplines from interconnected fields, like biosciences, mathematical and statistical modelling or artificial intelligence, will be involved and they will play an important role to rapidly overcome this exceptional emergency situation.

From the mathematical modelling point of view, epidemiological compartmental models have been often used to understand and analyze the behaviour of the contagious diseases, with the article [12] being the pioneering work. These models provide useful tools to make predictions about the future evolution of the epidemic and control its propagation. In the literature of epidemic modelling, many examples of general-purpose models have been proposed (see [8] , for a review), each of them accounting for some specific characteristics of the diseases, like the well-known examples SIR (Susceptible-Infected-Recovered), SEIR (Susceptible-Exposed-Infected-Recovered) or SIS (Susceptible-Infected-Susceptible) models.

Typically, the compartmental models are originally formulated following a deterministic approach, i.e., in terms of a system of Ordinary Differential Equations (ODEs), although usually they are readily extended to an stochastic version. There are two common approaches to include stochasticity into a deterministic model, relying on either the Continuous Time Markov Chain (CTMC) or Stochastic Differential Equations (SDEs), see [1] and the references therein. Here, we will focus on the latter. The stochastic models allow to capture many kinds of circumstances including uncertainty that may influence the compartments dynamics: behavioural effects, public interventions, seasonal patterns, environmental factors, etc. Furthermore, the solution of a deterministic model is given by a set of functions of time uniquely dependent on the initial data, while the solution of the stochastic model is a set of stochastic processes, containing much more information than the deterministic analogous. In fact, at each time instant we can exploit the information provided by the probability distribution associated to the underlying random variable. An statistical analysis can therefore be performed, producing useful outcomes like quartiles or worst case scenarios.

In this work we present an stochastic extension of the deterministic compartmental model in [10] , that has been proposed as an ad hoc model for the COVID-19 disease. Unlike to the classical models (SIR, SEIR, SIS), this model is adapted to the specific characteristics of the COVID-19, taking into account, besides the usual factors, the undetected infectious cases, the hospitalized population or the deaths, for example. Thus, the so-called θ-SEIHRD model proposed in [10] was developed as a very general and sophisticated model (based on a previously introduced model [11] ) in order to be able to study the spread of the disease worldwide. In the same work, the authors proposed a simplified version which reduces the mathematical complexity towards a more tractable model, but yet preserving the ability to capture the most relevant features. Other recent studies addressing the mathematical or statistical modelling in the context of the COVID-19 include [14, 21, 20, 17] , for example. As the literature focusing on COVID-19 modelling is increasing a lot during last months and weeks, it results quite difficult for the authors to select a list of the more relevant papers more recently arising in the topic, surely some of them being developed in parallel to this contribution.

As mentioned before, we will follow the SDE approach where the randomness is typically achieved by incorporating a Brownian motion, also known as Wiener process, to the ODEs of the deterministic system. There are two common ways of addressing this stochastic extension. On the one hand, an arbitrary random noise can be added to some of the equations in the ODE system, thus transforming them into SDEs. On the other hand, one (or more) of the existing model parameters can be perturbed, meaning that it (they) becomes a random variable or a stochastic process. Although both approaches have been well investigated, recent examples are [7, 15] for the random noise approach and [6, 5, 2] for the parameter perturbation, here we prefer the second alternative. When adding random noise to an equation, it often comes with an extra parameter to control the level of volatility, which might have no biological interpretation. However, in practice, the uncertainty will have impact on a particular model component, typically represented by a model parameter. Therefore, a randomly perturbed parameter can be reasonably explained in terms of the variability produced by the source of the considered uncertainty.

In this work, we will consider a randomly perturbed disease contact rates. The approach proposed here is however conceptually different than the ones typically found in the literature, aiming to deal with some of the observed inconsistencies in the COVID-19 context. As a natural choice, many stochastic SIR-like models rely on a normally distributed perturbation for the disease contact rate parameters (see [6, 2] , for example). However, this approach allows negative values for the rates, which is biologically non-sensical, potentially appearing when the rate is close to zero, as it is seen in practice for the COVID-19 [16] . This issue is overcome in [5] , by employing an exponential Ornstein-Uhlenbeck (OU) process to model the contact rate. Although this process indeed ensures positiveness, it presents another undesirable feature: an increasing variance in time. When analysing the disease transmission dynamics, there is not an objective evidence of more variability in the disease contact rates in the long-term [19] . Actually, one can expect more control of the disease spread patterns over the course of the epidemic. For all the reasons mentioned above, we propose to model the contact disease rates by means of the so-called CIR process, named after its authors Cox, Ingersoll and Ross [3] . The CIR process is widely employed to simulate the evolution of interest rates in the quantitative finance framework. In some sense, the interest rates in computational finance and the disease contact rates in epidemiology present a rather similar behaviour: positiveness, controlled variability and long-term stability.

The article is organized as follows. In Section 2 we introduce the proposed mathematical model. Section 3 is describes the numerical methods for solving the model. Section 4 contains the obtained results and their numerical and statistical analysis. Finally, Section 5 points out some conclusions and possible future research lines.

As mentioned in the previous introduction, we base our approach in the model in [10] , which provides an advanced extension of the classical SIR-like models that adds to the usual compartments in the SEIR model the very specific ones for COVID-19: undetected infectious, hospitalized and deaths. In [10] , the authors first presented a very sophisticated, well motivated and general model, with a significant amount of free parameters allowing an enormous flexibility, including several territories. In order to make the model more practically tractable, the authors also proposed a simplified version, by imposing several assumptions: single territory, regular natality/mortality is neglected, and no imported/exported cases. Furthermore, some predefined forms for the open parameters of the model are adopted.

Thus, the simplified θ-SEIHRD model proposed in [10] is therefore given by

Note that the last three equations of the system (1) are uncoupled with previous ones, so that the expression of their solution can be obtained by

once the first six ones have been solved in a first stage. Thus, only the first six coupled equations need to be solved. The system (1) is completed with the initial data S(t 0 ),

The model includes a number of open parameters, the model parameters, which are often determined either from the available data/literature or by calibration. In the following, a detailed explanation of each parameter is provided:

• Efficiency of the control measures m E , m I , m Iu , m HR , m HD ∈ [0, 1](%). Here, only one control measure is assumed (mobility restrictions, for example), implemented at date λ 1 and represented by a decaying function

with the parameter κ 1 ∈ [0, 0.2] entering in the calibration procedure. The generalization to more/individualized control measures is straightforward.

• The fatality rate ω(t) ∈ [ω, ω] ⊂ [0, 1], for which the following form is proposed

with ω and ω being the fatality rate limits with and without control measures, respectively.

As ω ≥ ω, its value is defined in terms of a displacement w.r.t the lower limit, i.e. ω = ω + δ ω , whose values are calibrated.

• The fraction of infected individuals which are detected and documented by the authorities, θ. Assuming that all deaths are detected i.e. θ ∈ [ω, 1] and it changes in time,

with θ, θ, λ 1 , λ 2 inferred from the data.

• The disease contact rates β E , β I , β Iu , β HR , β HD ∈ R + . The parameter β I is assumed to be given, calibrated to the available data, and a relation between it and the rest of the contact rates exists,

where β I = C u β I , with C E , C H (t) and C u ∈ [0, 1]. Parameters C E and C u are also obtained by calibration, while C H (t) is determined by employing the data of the infection transmissions within hospitals, usually available, as

with α H being the percentage of cases (healthcare workers) infected by individuals in compartments H R or H D .

• The compartment transition rates γ E , γ I , γ Iu , γ HR , γ HD ∈ (0, +∞). They are constructed in accordance to the recent literature, based on the average duration (in days) of an individual in each infectious compartment, denoted by d E , d I , d Iu , d HR and d HD . Lets assume that d Iu = d HR and d HD = d HR + δ R , δ R > 0. Thus,

represents the decrease of the duration of d I due to the application of control measures at time t, and d g is the maximum number of days that d I can be decreased due to the control measures. The parameter δ R is included in the calibration.

Our aim is to introduce stochasticity to the simplified θ-SEIHRD model above. For this purpose, to keep the interpretability of the model, we will add some randomness to a set of parameters. In our approach, we will add randomness on the disease contact rates, β's. More precisely, as in the simplified deterministic version proposed in [10] all the β's depend on β I , as indicated in (2), it is sufficient to add uncertainty in the model parameter β I by turning it into a stochastic process, i.e a random variable at each time instant. Thus, we start by writing the model in terms of β I . From the Equation (2), we have the following relations:

. By using the previous notation, the simplified θ-SEIHRD model in (1) can be rewritten as,

where

We now incorporate a stochastic component into the model by, as mentioned, by replacing the constant parameter β I of the deterministic model by a stochastic process.

Lets then assume that, instead of considering a constant value β I , the disease contact rate in compartment I follows a newly introduced stochastic processβ I (t). In order to preserve the positiveness in the parameter definition (imposed by its biological interpretation), we choose the well-known CIR process [3] . The main advantage of the CIR process is that it theoretically ensures the spacial states to be non-negative. Further, the CIR process is a mean-reverting process meaning that, in the long term, the dynamics of the process tend to a prescribed value (the average). This can have a biological interpretation since, at the early stages of the disease, the contacts between people are less controlled, so more volatile. In the mid-long term, the individual contacts and the disease spread patterns are more studied and the variability in its estimation ranges is reduced. Accordingly, the dynamics ofβ I read

where ν βI is the mean reverting speed, µ βI is the long-term average, σ βI is the volatility and dW (t) is a Brownian motion increment. Therefore, the corresponding system of SDEs governing the stochastic θ-SEIHRD model is given by,

with M (t) as defined in Equation (4). Although only one source of randomness is introduced, the solution of the whole system becomes a set of stochastic processes, due to the dependence of the remaining equations on the first equations for S and E, and their own dependence oñ β I . Note also that we could add more time dependency toβ I both in the deterministic model (which is not the case in [10] ) as well as in the stochastic version (by considering either ν βI , µ βI or σ βI time dependent). Note that our approach could be generalized to a setting where some of the parameters were independent, in this case we would consider each one as a Gaussian random variable and possible correlations between some or all of them. In this more general setting, a certain number of different (possibly correlated) Brownian motion processes would come into place.

For a set of constant initial data S(0), E(0), I(0), I u (0), H R (0), H D (0), R d (0), R u (0), D(0) and β I (0), the system of SDEs (6) has a unique strong solution. This follows from the fact that the coefficients of the SDEs are locally Lipchitz continuous and the initial condition is a constant value (see [13] , for example).

Although in the stochastic θ-SEIHRD model (6) the solution is a finite set of stochastic processes, we maintain the same notation we used for the finite set of real valued functions representing the solution of the deterministic θ-SEIHRD model.

As the system of SDEs (6) is nonlinear, it is not possible to obtain a closed-form expression for the solution, as it also happens with the deterministic version. Therefore, the use of numerical methods for solving (6) becomes mandatory. Here, we adopt the following strategy. First, we perform a simulation of the dynamics ofβ I (t), in accordance with the CIR process. After, we solve the resulting ODE system for each simulated path ofβ I (t). In this way, we obtain a set of random walks for each stochastic process representing a model variable.

The CIR process is a well-studied dynamics often employed in computational finance (see [18] for example), which satisfies the SDE in Equation (5) . From the mathematical point of view, one of its relevant features is that the underlying distribution is known analytically, relying on the non-central chi-squared distribution. Thus, given two time points, s and t, s < t, the conditional distribution ofβ I defined in Equation (5) reads

and χ 2 (a, b) is the non-central chi-squared distribution with a degrees of freedom and noncentrality parameter b. This form the basis for an exact simulation scheme, which can be used to obtain realizations ofβ I . Given a set of m + 1 time points, {t i } m 0 , where the solution will be computed, we have, for i = 0, . . . , m − 1,

with the constant parameter d = 4ν β I µ β I σ 2

β I and some initial valueβ I (t 0 ) =β I (0). By employing this scheme, we generate n simulated discrete sample paths ofβ I . Once the sample paths ofβ I have been obtained, we numerically solve n ODE systems, one for each of these paths. For this purpose, we choose the explicit Runge-Kutta method of order 5(4), known as RK45, RKDP or Dormand-Prince method, see [4] . Note that the time discretization needs to include the time points used in the simulation ofβ I . Alternatively, the whole system in Equation (6) can be solved by the Euler-Maruyama scheme (see [13] , for example), which is the stochastic version of the classical Euler scheme for solving ODE systems. Although this choice would provide faster solutions, it involves discretization errors, giving rise to some lack of precision and robustness.

In order to illustrate the potential of the stochastic version of θ-SEIHRD model, in Figures 1, 2 and 3 we present the solution of the deterministic model versus a number of possible scenarios (simulations) obtained by the stochastic model. We can clearly observe that the outcomes by the stochastic model provide much more information about the future evolution of each model compartment, while the deterministic one seems to be somehow an averaged version of the stochastic formulation. This gives just an initial insight of the potential of the newly introduced model, which will be completed in the next section with a more detailed analysis.

The experiments have been conducted in a computer system with the following characteristics: CPU Intel Core i7-4720HQ 2.6GHz, 16GB RAM memory and GPU GeForce GTX 970M. The numerical codes have been implemented in Python programming language. We consider a equally spaced time grid, i.e. ∆t := t i+1 − t i , ∀i, with time step ∆t = 1 6 (around 4 hours). In this section, we perform a numerical and statistical study of the proposed stochastic model. As mentioned, the solution of the system of SDEs in (6) is a set of stochastic processes, meaning that we can "extract" a random variable at each time point. Doing so, not only a single value (like for the deterministic case), but also some statistics can be provided given a prescribed time. In this work, we consider the mean, the interquartile interval, [Q 1 , Q 3 ], with Here we take advantage of the experiments conducted in [10] , referred to the case of China. In Tables 1 and 2 between the ones extracted from the literature or by experience, and the ones obtained by a calibration procedure in [10] , respectively. The initial data is given by S(t 0 ) = N − 1, E(t 0 ) = 1 The period of convalescence.

Fraction of the infected people hospitalized. and

We firstly test the evolution of the model variables. Thus, we extract some simulation-based statistics at a couple of time instants, the 8th February (inflection point) and the 29th March (final point). Furthermore, the impact of different levels of uncertainty is also reported by considering several representative values for σ βI , reflecting situations of no uncertainty (σ βI = 0), low uncertainty (σ βI = 0.1) and high uncertainty (σ βI = 0.5). In all cases, the long-term mean of the perturbation is set to the calibrated value of β I for the deterministic model, i.e. µ βI = β I . The mean reverting speed parameter is chosen as ν βI = 1, thus representing a regular (not too high, not too low) reversion speed. In Table 3 , the obtained results are presented. We can clearly observe the significant impact of the uncertainty in the disease evolution. In the case of higher uncertainty, the number of infections, in average, is almost doubled and, the worst case scenario multiplies this value by six. Even when a lower volatility is considered, the increment of cases and deaths in the worst scenario becomes important, up to 50%. Next to the previous experiment, in Figure 4 we present the histograms for the model variable I(t) to give an insight of the impact of the uncertainty in the infection evolution. First, we clearly observe that the produced distribution is skewed with a fatter right tail. Secondly, the bigger the volatility of processβ I (t), i.e. σ βI , is, the fatter the right tail becomes, thus indicating more 8th February, 2020 (t = 69) σ βI = 0 σ βI = 0. 

In [10] , the authors proposed a set of possible outputs that can be useful for the authorities to plan the resources allocation, like the number of clinical beds, for example, among other indicators. These outputs are:

• The cumulative number of COVID-19 cases, c m (t), at time t:

θ(s)γ I (s)I(s)ds.

• The cumulative number of deaths due to the COVID-19, at time t: d m (t) = D(t).

• The basic reproduction number, R 0 , and the effective reproduction number, R e (t), at time t, where R 0 = R e (t 0 ), and

• Hospitalized people, Hos(t), at time t:

where p(t) is the fraction of people in compartment H R that are hospitalized and C o is the period of convalescence.

• Maximum number of hospitalized people in the interval [t 0 , t]:

Hos(τ ).

• The number of individuals infected by others belonging to compartments E, I u and H = H R + H D :

(m HR (s)β HR H R (s) + m HD (s)β HD H D (s)) S(s) N ds, respectively.

We therefore perform a similar statistical analysis as before, although now reporting the model outputs. The results are shown in Table 4 . Again, it is clear that the uncertainty can significantly affect the disease evolution, and it should be taken into consideration. For example, the people requiring hospitalization may vary from around 4500 with no stochasticity included, to around 30000 including randomness.

As previously pointed out, one of the major advantages of the θ-SEIHRD model is that it accounts for important aspects of the COVID-19 pandemic, directly affecting the population or the healthcare systems. Particularly, the evolution of some curves, infected, hospitalized and deaths, is typically reported by the authorities, in both cumulative and daily fashion. In Figures  5 and 6 , we show the model outcomes for these specific curves, considering two uncertainty levels, σ βI = 0.1 and σ βI = 0.5, respectively. Again, we present the mean, the interquartile interval and the worst case scenario. From this experiment, an interesting observation can be extracted. Looking at the different patterns in the restults w.r.t. the volatility parameter, we can see that an increasing uncertainty pushes the mean close to the third quartile, Q 3 , meaning that the disease evolves, in average, according to the 75% worst scenario. This fact gives an insight of how important the randomness can be and why it is crucial to include it in the modelling.

We have extended the model developed in [10] by incorporating randomness to some relevant coefficients and we have shown the importance of considering this uncertainty. In this way, approach, which allows not only to compute confidence intervals for these variables in this new random setting but also to obtain the worst case scenario. The information about the model variables in this worst case scenario allows to develop more conservative policies in the actions against the results of the COVID-19, as for example to plan larger health resources to take care of a larger number of people requiring hospitalization at different levels. The differences between the deterministic and stochastic models in terms of the information contained in the output variables has been clearly illustrated in the previous section. However, we also understand that research in this line can be extended. A fist possible extension comes from making the parameters independent among each other and use different stochastic processes to characterize their dynamics. In this first approach, as in [10] , we consider that all parameters depend on β I . Also, it seems possible to incorporate randomness to the more recent model in [9] that mainly incorporate a new compartment of persons in quarantine (Q) to model the situation in certain countries like Italy.

A primer on stochastic epidemic models: Formulation, numerical simulation, and analysis

A stochastic differential equation SIS epidemic model with two independent Brownian motions

A theory of the term structure of interest rates

A family of embedded Runge-Kutta formulae

Capturing the timevarying drivers of an epidemic using stochastic dynamical systems

A stochastic differential equation SIS epidemic model

Dynamics analysis of a nonlinear stochastic SEIR epidemic system with varying population size

The mathematics of infectious diseases

A simple but complex enough θ−SIR type model to be used with (COVID-19) real data. Application to the case of Italy

Mathematical modeling of the spread of the coronavirus disease 2019 (COVID-19) taking into account the undetected infections. The case of China

Be-CoDiS: A mathematical model to predict the risk of human diseases spread between countries-validation and application to the 2014-2015 Ebola virus disease epidemic

A contribution to the mathematical theory of epidemics

Numerical Methods for Stochastic Differential Equations

Nikos Bosse, and Stefan Flasche. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases

Analysis and numerical simulations of a stochastic SEIQR epidemic system with quarantine-adjusted incidence and imperfect vaccination

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)

Mitigation strategies for COVID-19: lessons from the K-SEIR model

Mathematical Modeling and Computation in Finance

World Health Organization. Coronavirus disease (COVID-2019) situation reports

Stochastic modelling of the COVID-19 epidemic

Real-time forecasts of the COVID-19 epidemic in China from february 5th to february 24th

This work was funded by Xunta de Galicia grant ED431C2018/033, including FEDER funding. Both authors also acknowledge the support received from the Centro de Investigación de Galicia "CITIC", funded by Xunta de Galicia and the European Union (European Regional Development Fund-Galicia 2014-2020 Program), by grant ED431G 2019/01.Alvaro Leitao acknowledges the financial support from the Spanish Ministry of Science, Innovation and Universities, through the Juan de la Cierva-formación 2017 (FJC17) grant in the framework of the national programme for R&D 2013-2016.