key: cord-0921159-tihzd2np authors: GAGLIONE, DOMENICO; BRACA, PAOLO; MILLEFIORI, LEONARDO MARIA; SOLDI, GIOVANNI; FORTI, NICOLA; MARANO, STEFANO; WILLETT, PETER K.; PATTIPATI, KRISHNA R. title: Adaptive Bayesian Learning and Forecasting of Epidemic Evolution–Data Analysis of the COVID-19 Outbreak date: 2020-09-30 journal: IEEE Access DOI: 10.1109/access.2020.3019922 sha: 0d770ac3f5315afe69894bad812bc5229c2735be doc_id: 921159 cord_uid: tihzd2np Since the beginning of 2020, the outbreak of a new strain of Coronavirus has caused hundreds of thousands of deaths and put under heavy pressure the world’s most advanced healthcare systems. In order to slow down the spread of the disease, known as COVID-19, and reduce the stress on healthcare structures and intensive care units, many governments have taken drastic and unprecedented measures, such as closure of schools, shops and entire industries, and enforced drastic social distancing regulations, including local and national lockdowns. To effectively address such pandemics in a systematic and informed manner in the future, it is of fundamental importance to develop mathematical models and algorithms to predict the evolution of the spread of the disease to support policy and decision making at the governmental level. There is a strong literature describing the application of Bayesian sequential and adaptive dynamic estimation to surveillance (tracking and prediction) of objects such as missiles and ships; and in this article, we transfer some of its key lessons to epidemiology. We show that we can reliably estimate and forecast the evolution of the infections from daily — and possibly uncertain — publicly available information provided by authorities, e.g., daily numbers of infected and recovered individuals. The proposed method is able to estimate infection and recovery parameters, and to track and predict the epidemiological curve with good accuracy when applied to real data from Lombardia region in Italy, and from the USA. In these scenarios, the mean absolute percentage error computed after the lockdown is on average below 5% when the forecast is at 7 days, and below 10% when the forecast horizon is 14 days. Beginning in early December 2019, Chinese health authorities have been detecting and monitoring an increasing number of pneumonia cases in the city of Wuhan, a province of Hubei. The pneumonia, later named COVID-19, is caused by a new strain of Coronavirus, and is technically referred to as the severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) [1] . As of August 23, 2020, more than 23 million people worldwide have been infected, and over 800 thousand have died. In March 2020, a series of events have pushed many governments to take extraordinary social measures. These events include the lack of effective cures and vaccines, the exponentially increasing number of individuals requiring recovery in intensive care units, and the announcement by the World Health Organization (WHO) of a Coronavirus pandemic on March 11, 2020 [2] . The adopted measures included closure of schools, universities, shops, industries, public and cultural places, the prohibition of mass gatherings, travel bans, and extreme social distancing, including local and national lockdowns. The main aim of these measures is to slow down the infection rate and alleviate the pressure on healthcare systems, in order to ensure care to all individuals stricken by the virus. Indeed, after the adoption of these measures, most countries have seen a decrease in the daily numbers of infected individuals. In order to prevent another exponential rise in infections as the restrictive measures are progressively relaxed, it is of crucial importance to develop mathematical models and algorithms to track and forecast the evolution of the infection with acceptable accuracy, which can help authorities to make informed and timely decisions. Improving our ability to model and forecast is also of paramount importance to better address future pandemic outbreaks [3] . The algorithm proposed in this article builds on the concept of compartmental epidemiological models, which assume that a given population is divided into a fixed number of compartments. Each compartment represents an epidemic state that an individual can occupy. The flow dynamics from one compartment to another are modeled as a set of stochastic differential equations that we discretize according to the discrete nature of the available data, i.e., daily update on the number of infected, recovered, dead, etc. In the standard SIR model, proposed in the pioneering study on mathematical theory of epidemics by Kermack and McKendrick [4] , it is assumed that the entire population, e.g., of a city, a region, or a nation, is constant and divided into three compartments (population subgroups), namely, susceptible (S), infected (I), and recovered (R) individuals. Moreover, it is assumed that an infected individual infects a susceptible one at a given rate β [5] . Once infected, the individual is removed from the compartment of susceptibles and enters the infected compartment. Each infected person runs through the course of the disease, and eventually is removed from the number of those who are still infected either by recovery or death, thus exiting the system at "recovery" rate γ; the recovered people are considered permanently immune. 1 The ratio β/γ is called the contact ratio, and represents the mean number of people the infected individual comes in contact with. The SIR model is simple, yet very successful and useful in practice. Over the years, several more sophisticated extensions have been proposed to account for more compartments and other salient aspects of the epidemics. For example, a person who comes in contact with an infected individual and contracts the infection might not develop the symptoms immediately but only with a certain delay, called incubation period; in the case of COVID-19, this delay is around 3-15 days with a median of 5.2 days [6] . The SEIR model accounts for this circumstance by adding a further compartment that represents exposed -but not yet contagious -people [7] , [8] . That is, susceptible individuals who contract the virus, pass to the exposed compartment (E) before evident symptoms appear and the person is confirmed as infected. The SEIRQ model is a further extension that also accounts for quarantined people [9] . Restriction measures are directly taken into account in the recently proposed SIR-X model [10] that, introducing an additional mechanism, removes susceptibles from the transmission process when the measures become effective. A critical epidemiological characteristic for the pandemic potential of an emergent respiratory virus is represented by the undocumented, but infectious, cases. In contrast with the documented infectious cases, they often experience mild, limited, or no symptoms at all, and therefore, since they are generally not tested, remain undetected. These are the so-called asymptomatic cases in the context of COVID-19 pandemic. Based on their contagiousness and numbers, they can expose a far greater portion of the population to the virus than would otherwise occur. Li et al. [11] present a model-inference framework to estimate the contagiousness and proportion of undocumented infections in China before and after the lockdown in Wuhan. Most of the compartmental models described so far consider the disease spread inside a unique and single population: a city, a region, a nation. In contrast, metapopulation models add a further spatial dimension, by interpreting the population as a network of multiple spatially separated subpopulations (nodes), e.g., multiple cities in the same region; the connections from one subpopulation to another are represented by movements ("diffusion") of persons. Such interconnections represent contacts such as commuting to work, second homes, or national and international travels. In such a scenario, the diffusion of the infection is not only caused by the contacts among susceptibles and infectious people within each subpopulation, but also by the spatial interactions among the different subpopulations [12] - [15] . Li et al. [11] utilize a stochastic metapopulation model to simulate the spatiotemporal dynamics among 375 Chinese cities. The spatial spread of COVID-19 across cities is captured by the daily number of travelers from a city to another during the Spring Festival before the lockdown. Chinazzi et al. [16] model both the domestic (within Wuhan) and the international spread of the Coronavirus epidemic. The effects of the travel bans imposed in the city of Wuhan and the international travel ban adopted by several countries in early February 2020 are estimated. To model the international spread of the COVID-19 outbreak, the authors employ the stochastic global epidemic and mobility model. This metapopulation model is integrated with real-world data and relies on a network wherein each node represents a subpopulation located near major transportation hubs, e.g., airports; there are more than 3200 subpopulations, in roughly 200 different countries and territories. The degree of connection among subpopulations is represented by the number of people traveling daily among them. Within each subpopulation, there exist four states of the compartmental model, i.e., susceptible, latent (similar to exposed), infectious, and removed. The model generates an ensemble of possible epidemic scenarios described by the number of newly generated infections, time of disease onset in each subpopulation, and the number of traveling infection carriers. Most of the aforementioned epidemic models assume that relevant model parameters, e.g., the infection rate β and the recovery rate γ, are time-invariant, and several approaches have been proposed in the literature for tuning or estimating them [17] - [19] . However, the sudden imposition of restriction measures -and their subsequent relaxations -means that a static stochastic model is inappropriate. Moreover, even in the absence of dramatic restriction measures, there is no doubt that a time-varying model for the key epidemic parameters would better reflect the ground-truth. 2 The main contribution of this article is to propose a Bayesian sequential learning and forecasting framework of the epidemic curve based on the data that authorities provide on a daily basis, e.g., number infected and number recovered. We leverage our recent research on unknown covariance matrix estimation [20] and self-tuning multisensor multitarget tracking [21] . Indeed, similarly to the target tracking problem, where the objective is to automatically detect the time instants when a target sharply maneuvers in order to improve the overall tracking performance, here we aim to closely track the epidemic curve and the model parameters in order to provide reliable and accurate forecast of the contagion. Adapting ideas and tools from those works, our approach to Bayesian sequential learning and forecasting of epidemic evolution is as follows. First, the model parameters are assumed to take on values from rich but prespecified finite sets; their time-evolution is modeled by Markov chains. Second, in order to capture the effects of mitigation strategies (e.g., mobility restriction, lockdown, wearing masks, and social distancing), the marginal posterior distributions of both the variable states (number of infected and recovered people) and model parameters (infection rate β and recovery rate γ), are calculated at each time by means of recursive prediction and update formulae. Finally, we develop an efficient implementation of the proposed method based on mixture models and provide a concrete example of application using the stochastic SIR model. The proposed method is validated on real datasets acquired during the recent COVID-19 outbreak in the Lombardia region, Italy, and in the USA. As we shall see, even adopting the simple stochastic SIR model, we obtain superior forecast accuracy when compared to prediction algorithms that use time-invariant parameter models. The approach developed in this article is general enough to be applied to more-sophisticated stochastic epidemiological models [8] - [10] , including more-complex metapopulation models [11] , [16] ; these extensions are left for future investigations. The remainder of the paper is organized as follows. In Section II, we describe a general Bayesian adaptive framework that can be tailored to any discrete-time epidemiological model, and in Section III we propose an implementation thereof based on the use of mixture models. In Section IV, we develop the mixture-based Bayesian sequential approach in the context of the stochastic SIR model. Section V presents results using synthetic as well as real data. Finally, in Section VI we provide some conclusions and possible directions for future investigations. Vectors are denoted by boldface lower-case letters (e.g., a), matrices by boldface upper-case letters (e.g., A), and sets by calligraphic letters (e.g., ). The transpose is written as (·) T . We write diag(a 1 , … , a N ) for an N × N diagonal matrix with diagonal entries a 1 , … , a N , I N for the N × N identity matrix, 1 N for the N-dimensional vector of all ones, and 0 for the zero vector. [ ⋅ ] denotes statistical expectation, and ( ⋅ ) refers to both the probability density function (pdf) of a continuous random variable or vector and the probability mass function (pmf) of a discrete random variable or vector; the difference will be clear from the context. (μ, C) indicates a Gaussian distributed vector with mean μ and covariance matrix C, and (a, b) represents a uniformly distributed variable between a and b. Finally, (x; μ, C) refers to a multivariate Gaussian pdf of random vector x with mean μ and covariance matrix C. We present a sequential Bayesian framework that, at each time interval, jointly computes the posterior distribution of S unknown time-varying states and of M unknown time-varying parameters. These unknown quantities are inferred at times t k , with k ∈ {1, 2, …}, using noisy observations (e.g., information on the number of infected, discharged COVID-19 patients from the hospitals, dead). We assume that the time interval Δ t ≜ t k -t k-1 between consecutive observations is one day, unless otherwise stated. We denote x k ≜ [x 1, k , …, x S, k ] T ∈ ⊆ ℝ S as the state vector comprising the S epidemic states x s,k at time t k (e.g., numbers of infected and recovered individuals), and θ k ≜ [θ 1, k , …, θ M, k ] T ∈ ⊆ ℝ M as the parameter vector comprising the M model parameters θ m,k at time t k (e.g., infection and recovery rates). The objective of the proposed algorithm is twofold: to estimate, at each time k, the epidemic state vector x k and the model parameter vector θ k ; and, at a fixed time k, to forecast the epidemic evolution up to time k + K with associated uncertainty in the form of prediction variance. Both tasks are based on the past and present observations. Hereafter, Section II-A describes the dynamic and observation models, and Section II-B and Section II-C present the proposed Bayesian sequential estimation and forecasting tasks, respectively. The reader who is already familiar with dynamic estimation of a hybrid state, that is, comprising the state of the dynamic model and nuisance parameters [20] - [22] , might skip ahead to Section II-C. The dynamic model that describes the evolution of the epidemic is formally expressed as where u k is a random vector -whose dimension depends on S, M, and f(·) -with known distribution modeling the stochastic variation of the epidemic state in the time interval Δ t . Note that function f(·) might embed additional known (either time-varying or time-invariant) parameters. We assume that, conditioned on θ k , θ k-1 , and x k-1 , the state vector x k is independent of the previous states and parameters, that is, Given appropriate initial conditions, the pdf in (2) is fully determined by the dynamic model in (1) and the statistics of u k . Moreover, we assume that the parameter vector evolves according to a first-order Markov model fully described by the transition pdf (θ k | θ k − 1 ), assumed known. From these assumptions, it follows that the adopted Bayesian framework is a hierarchical Markov model (an event-driven dynamic process): firstly, the parameter vector evolves according to the Markov model described by (θ k | θ k − 1 ); then the state vector evolves, given the current parameter vector as well as the previous state and parameter vectors, according to (1) . Furthermore, it is easy to verify that that is, the joint evolution of x k and θ k follows a first-order Markov model. The observation vector at time k is denoted by z k ∈ ℝ B , and consists of up-to-date information on the state of the epidemic at time k. To take into account the randomness unavoidably present in real-word measurements, it is assumed that this information is uncertain, i.e., affected by noise (e.g., due to data collection errors, biases, holidays), and is modeled as where v k is a random vector with known distribution and whose dimension depends on S, M, B, and h(·). The model in (3) and the statistics of v k determine the likelihood (z k | x k , θ k ). For convenience, we define the vector z 1 The basic principles of Bayesian sequential estimation are now recalled. The reader is referred to [23] for further details. In the Bayesian setting, the estimation of state and parameters amounts to calculating the posterior pdf (x k | z 1: k ) of the state vector x k , and the posterior pdf (θ k | z 1: k ) of the parameter vector θ k , respectively. The minimum mean square estimators (MMSEs) of x k and θ k are given by [23, Ch. 4] x k and respectively. 3 We further note that, using the law of total probability, the pdf (x k | z 1: k ) can be expressed as Thus, the estimation problem boils down to calculation of the posterior pdfs (x k | θ k , z 1: k ) and (θ k | z 1: k ). In the following, we show how they can be obtained sequentially through the implementation of recursive update and prediction steps. 1) UPDATE STEP-Let us assume that we know the pdf of the parameter vector at time k given the observations up to time k -1, i.e., (θ k | z 1: k − 1 ), and the pdf of the state vector at time k given the parameter vector at time k and the observations up to time k -1, i.e., (x k | θ k , z 1: k − 1 ). Then, when a new observation z k becomes available, the parameter vector pdf and the state vector pdf are updated through Bayes' rule as and 3 The superscript E stands for estimate, and is used to distinguish the estimate from the forecast, later identified by the superscript F. respectively, where the last equality of (8) exploits the assumption that the observation at time k is conditionally independent of all the previous observations, given the state and parameter vectors at time k, i.e., (z k | x k , θ k , z 1: k − 1 ) = (z k | x k , θ k ). Using the same assumption and the law of total probability, the pdf (z k | θ k , z 1: k − 1 ) appearing in (7) is calculated as 2) PREDICTION STEP-In the prediction step, we assume that the posterior pdf (θ k | z 1: k ) in (7) and the posterior pdf (x k | θ k , z 1: k ) in (8) are known, and derive the pdfs (θ k + 1 | z 1: k ) and (x k + 1 | θ k + 1 , z 1: k ). The former, i.e., the pdf of the predicted parameter vector at time k + 1, is obtained using the law of total probability and the Markovian assumption as follows: Analogously, using the law of total probability, the pdf of the predicted state vector at time k + 1 is given by The first term within the integral (11), i.e., (x k + 1 | θ k + 1 , θ k , z 1: k ), is calculated using the law of total probability and assuming that x k+1 is conditionally independent of z 1:k given θ k+1 , θ k , and x k , i.e., (x k + 1 | θ k + 1 , θ k , x k , z 1: k ) = (x k + 1 | θ k + 1 , θ k , x k ). Thus, The last step follows from the Markov property of the parameter vector. Indeed, the pdf (x k | θ k + 1 , θ k , z 1: k ) can be calculated as The second term within the integral (11), i.e., (θ k | θ k + 1 , z 1: k ), is obtained using Bayes' rule and exploiting again the Markov property of the parameter vector, that is, where (θ k + 1 | z 1: k ) is given by (10). As the model we consider is nonlinear, the forecast of the epidemic evolution is assessed numerically through a methodology known as ensemble forecasting [24] - [28] , that consists of generating a collection of possible evolutions of the epidemic -given the state and parameter vectors estimated so far -and provide a single mean forecast with associated uncertainty. Let us assume that the latest available observation is z k , and that the posterior (6)) are known; the proposed ensemble forecasting approach is a Monte-Carlo technique that samples the posterior distribution of x k and θ k , and evolves these sampled initial vectors up to time k + K, where K is the forecast horizon. Specifically, let x k ( j) and θ k ( j) be the j th state vector and parameter vector samples extracted from (x k | z 1: k ) and (θ k | z 1: k ), respectively, where j ∈ ≜ {1, …, J} and J is the ensemble size. The sampled state and parameter vectors are then allowed to evolve 4 according to the state vector forecast transition distribution and the parameter vector forecast transition distribution , respectively, for k′ ∈ {k + 1, … , k + K}. We observe that these transition distributions can be equal to the transition distributions used within the Bayesian sequential estimation procedure described in the previous section, that is, , respectively, or can be suitably devised to improve the forecast performance. Finally, defining the ensemble state matrix as the mean of the epidemic state and model parameter at any time step k′ ∈ {k + 1, … , k + K} can be calculated as sample means of X k′ and Θ k′ , respectively; that is [29] , Higher order moments, such as sample covariance matrices, can also be computed [29] . This section describes a mixture model implementation -similar to the approach proposed in [20] -of the Bayesian sequential estimation procedure presented in Section II-B. For computational efficiency, the first step is the discretization of the parameter vector θ k = [θ 1,k , … , θ M,k ] T , such that each element θ m,k , m ∈ {1, … , M}, takes on values from a We note that all the expressions in Section II-B remain valid, provided that integrals ∫ dθ are replaced with summations Σ θ ; e.g., the MMSE estimator of θ k in (5) is rewritten as 4 By "evolving" is meant a simple procedure of recursively (in time) generating random variables according to a transition distribution; colloquially, this is a process of "rolling the dice". The key aspect of the formulation is to model the pdf (x k | θ k , z 1: k − 1 ) as a mixture of N components, that is, are weight and pdf of the n th component, respectively. The auxiliary variable N k ∈ {1, … , N} models the switch between the N mixture components; hereafter, for notational convenience, we will simply use n to denote that N k takes on the value n, i.e., N k = n. In the next two subsections, exploiting the development presented in Section II-B1 and Section II-B2, we provide the expressions for the sequential update and prediction of the state and parameter vectors according to the mixture model. The posterior pdf (x k | θ k , z 1: k ) appearing in (8) can be written through the law of total probability as The updated weight w k | k where the update coefficient can be derived from (9) assuming the conditional independence of the observation z k from the previous observations z 1:k-1 and the specific mixand n, given x k and θ k , i.e., (z k | n, That is, Using the same assumption, the posterior pdf of the n th mixture component is calculated from (8) . (20) Then, using (7) with integrals replaced by summations, the posterior pmf of the parameter vector is obtained as where, through the law of total probability, (z k | θ k , z 1: k − 1 ) can be calculated as Here, we recognize the update coefficient . Therefore, the posterior pmf in (21) can be finally recast as The pmf of the predicted parameter vector at time k + 1, i.e., (θ k + 1 | z 1: k ), is simply obtained by inserting (22) into (10), where the integral is replaced by summation. To derive the pdf of the predicted state vector at time k + 1, i.e., (x k + 1 | θ k + 1 , z 1: k ), instead, we consider the discrete version of (11), that is, and insert therein (12) and (13) Finally, by using (17) into (23), the latter can be recast as where the pdf of the n th component, i.e., ( and the predicted weights are defined as We note that in the prediction step the number of mixture components increases from N to N × D, thus a suitable merging/pruning criterion is required to avoid the exponential growth of the computational complexity [30] , [31] . Remark: We aim to provide a general framework for adaptive Bayesian estimation of epidemic evolution, here implemented via an efficient mixture model. Nonetheless, the same nonlinear problem could have been approached and implemented differently, for example by using an extended or unscented Kalman filter (EKF or UKF), or by means of sequential Monte Carlo (SMC) methods, e.g., particle filters [32] . These are not alternate to our approach, but can be possibly combined, as noted later in Section IV-B. However, we also observe that direct use of the aforementioned techniques -if not adequately tailored to the specific problem -may lead to poor or unexpected results. For example, the EKF approximation of the nonlinear behaviour of the system through local linearization, might fail in the presence of strong nonlinearities, leading to unreliable estimates or even to divergence. The UKF can potentially provide higher-order estimation accuracy using the unscented transform, but this usually has the effect of simply delaying the unavoidable divergence that will still happen in the case of severe process or measurement nonlinearities. On the other hand, SMC methods generally provide reliable numerical approximations to sequential nonlinear estimation problems. However, in real-world applications, where the system also depends on unknown time-varying parameters to be inferred from uncertain data, conventional particle filters could fail to detect and track the change in parameters, quickly leading to implementation issues, such as sample impoverishment. In addition, high performance of particle methods comes at the expense of increased computational demands. Another class of methods is based on system identification/machine learning (ML) techniques. Canonical approaches here include expectation-maximization, variational Bayes methods, and the variety of nonlinear auto-regressive models with external inputs, recurrent neural networks, long short-term memory networks [33] - [36] . The non-parametric ML methods suffer from lack of explainability and causal reasoning needed for policy decisions in pandemics. The SIR model [4] , [5] subdivides the population of a community into three interacting groups: susceptible, infectious, and recovered individuals. The interactions are governed by the infection rate, usually denoted by β, that is the average rate at which an infected individual can infect a susceptible one, and by the recovery rate, generally called γ. Let P be the total population size 5 ; s k , i k , and r k be the normalized (to P) number of susceptible, infectious, and recovered individuals at time k, such that s k + i k + r k = 1; and β k and γ k be the infection and recovery rates at time k. The discrete-time stochastic SIR system of equations is expressed as [17] , [37] Since s k and i k determine r k , we define the state vector as x k ≜ [s k , i k ] T , and the parameter vector as θ k ≜ [β k , γ k ] T ; hence, we have S = 2 and M = 2. The dynamic model of the epidemic is then expressed as (cf. (1)) where 5 Note that authors generally refer to the population size as N [4] , [5] , [7] - [13] . From (26) it then follows that the state transition pdf (cf. (2)) is independent of the current parameter vector θ k , i.e., ( where We observe the (uncertain) normalized number of infected and of recovered individuals at each time k. Therefore, B = 2, and the observation model (cf. (3)) is where v k ∼ (0, R(x k )) models the observation uncertainty, and The covariance matrix R(x k ) depends on the state vector at time k: since we assume that the observation "noise" accrues from the sum of uncertainties of each individual epidemic state, the variances are linear in the number of infected and recovered individuals, respectively. Hence, we define where R c is a constant diagonal matrix. Thus, the likelihood is independent of θ k , i.e., (z k | x k , θ k ) = (z k | x k ), and distributed according to Given the Gaussian nature of the dynamic and observation models, we adopt a Gaussian mixture implementation of the Bayesian sequential estimation. That is, we assume that the pdf of the n th mixture component in (16) is Gaussian with mean and covariance When the observation z k is available, mean, covariance matrix, and weight are updated. Specifically, the weight of the n th mixand is updated as in (18) through the coefficient ; this, in turn, is calculated recalling that the likelihood in (29) is independent of θ k , and inserting (29) and (30) into (19) , that is, where we made the approximation R(x k ) ≈ R(x k | k − 1 (n, θ k ) ). We observe that the last step would be an equality -rather than an approximation -if the observation covariance matrix was independent of the state x k . Then, the updated pdf of the n th Gaussian component is obtained by inserting (29), (30) , and (31) into (20) , and is equal to [30] (x k | n, θ k , z 1: where This is similar to a standard Kalman update, per mixture element, with essential difference that the measurement noise covariance is state-dependent. Let us now consider the evolution of the mixands according to the dynamic model. The n th predicted weight is computed using (25) where, assuming that infection rate β k and recovery rate γ k evolve independently, the parameter vector transition pmf can be written as ; the marginal transition pmfs (β k + 1 | β k ) and (γ k + 1 | γ k ) are specified later in this section. The pdf of the n th predicted mixand is obtained by recalling that the transition pdf in (27) is independent of the current parameter vector θ k , and inserting (27) and (32) into (24) . This yields Given the nonlinearity of the dynamic model (26), the integral (33) cannot be computed explicitly. A viable alternative is to approximate the pdf (x k + 1 | n, θ k + 1 , θ k , z 1: k ) = (x k + 1 | n, θ k , z 1: k ) as a Gaussian via moment matching, that is, The computation of x k + 1 | k (n, θ k ) and of C k + 1 | k (n, θ k ) by moment matching is detailed in Appendix A. An alternative method to solve integral (33) is via the unscented transformation used within the UKF. As described in Section III, the parameter vector θ k = [β k , γ k ] T is discretized for computational efficiency. Specifically, β k ∈ 1 = {ϑ 1 (1) , …, ϑ D 1 (1) } and (2) }. For concreteness, we assume that m (m = 1 or 2) is an ordered set, i.e., such that for any j, ℓ ∈ {1, … , D m } with j < ℓ, we have ϑ j [P β ] j, ℓ ≜ (β k = ϑ j (1) | β k − 1 = ϑ ℓ (1) ), (35) and 2) ) . (36) We note that ∑ j = 1 D 1 [P β ] j, ℓ = 1 and that ∑ j = 1 D 2 [P γ ] j, ℓ = 1. Eventually, according to (4) and (6), and replacing the integral in (6) with the summation, the MMSE estimates of the normalized numbers of susceptible and infectious are and respectively; and, according to (15) , the MMSE estimates of the infection and recovery rates are and respectively. Note that the estimates for the parameters β k E and γ k E are updated automatically based on the attractiveness (measured in terms of the relative likelihoods) of the estimates that assume them. Finally, the prior pmfs of the infection and recovery rates at time k = 0 are set to (β 0 ) = (β 0 ; β 0 , σ β 2 ) and (γ 0 ) = (γ 0 ; γ¯0, σ γ 2 ); the prior pdf of the n th Gaussian component at time k = 0 is (x 0 | n, θ 0 ) = (x 0 | n) = (x 0 ; x 0 (n) , C 0 (n) ). A detailed statement of the proposed Gaussian mixture filter for the Bayesian estimation of the epidemic evolution with the stochastic SIR model is provided in Algorithm 1. The forecasting is as described in Section II-C. Let us assume that z k is the most recent available observation, and that the posterior pdf (x k | z 1: k ) and pmf (θ k | z 1: k ) are known. The j th sample state vector extracted from the posterior pdf ( j ∈ , and evolves according to the state vector forecast transition distribution ( j) ; we assume that this forecast transition distribution coincides with that used within the sequential Bayesian estimation procedure (cf. (27)), i.e., we assume that the sampled state vector x k ( j) evolves according to the dynamic model in (26) . Concerning the parameter vectors, in order to obtain samples θ k the infinite set -rather than the discrete finite set -, the posterior pmf (θ k | z 1: k ) is approximated with a suitable continuous distribution; here, (θ k | z 1: k ) is approximated with a bivariate Gaussian pdf and samples θ k ( j) are extracted from it. Then, these sampled parameter vectors are allowed to evolve according to the parameter vector forecast transition where we assumed the infection and recovery rates to change independently. We recall that the inverse of the recovery rate expresses the average time that an individual takes to move from the group of infected (I) to the group of recovered (R) people; in our model, the latter includes both those discharged from hospitals, and those for whom the infection was fatal. Even though it is likely that the recovery rate will change during the forecast period (due to, e.g., reporting delays or the application of different criteria used to declare an individual recovered), there are no prior information that would suggest when and how this will happen; it is therefore reasonable to assume the recovery rate to be constant and deterministic during the forecast period. This equals to set the recovery rate forecast transition distribution to where δ(·) is the Dirac delta. The infection rate, instead, models the interaction between people, and it is therefore affected by the restriction measures. Therefore, once its time evolution is captured, it is reasonable to assume -in the absence of further knowledgethat it keeps the same trend linearly. That is, the infection rate samples β k , j ∈ , are assumed to evolve according to Appendix B provides details on the estimation of the slope β Eventually, defining the ensemble state and parameter matrices as X k = [s k , ı k ] and 14)) The steps of the proposed forecasting algorithm with stochastic SIR model are detailed in Algorithm 2. We present numerical results obtained with the sequential estimation and forecasting algorithm described in Section IV. In Section V-A the algorithm is applied to synthetic data, while real data from the recent COVID-19 outbreak are considered in Section V-B. The effectiveness of the proposed algorithm is validated in two simulated epidemic scenarios involving a community of P = 10 6 individuals. The simulations span 80 days, during which the infection rate changes as shown in Fig. 1 . The variations of the infection rate model the effects of the restriction measures established by the authorities: in the first scenario, the epidemic outbreak is controlled by long-term soft restriction measures that cause a slow, yet consistent, decrease in the infection rate; in the second scenario, an initial strict lockdown is then followed by a relaxation of the restriction measures that leads to a slight increase in the infection rate. The recovery rate is fixed and set to γ = 0.1. The initial state of the epidemic and C 0 (n) = C 0 = I 2 i 0 , respectively, where ε 1 (n) ∼ ( − i 0 ∕ 5, i 0 ∕ 5) and ε 2 (n) ∼ ( − r 0 ∕ 5, r 0 ∕ 5). Finally, the observation noise covariance matrix in (28) is R c = 50 As for the forecasting, the ensemble size is J = 2 × 10 4 , and the minimum and maximum numbers of points used to estimate the slope of the infection rate are L min = 5 and L max = 14, respectively. Fig. 2 shows the infection and recovery rates estimated in the first scenario over the 80 days, along with their 90% confidence intervals. Analogously, Fig. 3 shows the estimated infection and recovery rates in the second scenario. The results demonstrate the capability of proposed Bayesian sequential estimation algorithm to closely follow the time variation of the infection rates even in the presence of abrupt fluctuations, as well as to accurately estimate the recovery rate. In turn, the accuracy of the proposed algorithm allows one to reliably forecast the epidemic evolution. Fig. 4 presents the estimation and forecast on the infection rate and of the number of infected in the first scenario; we assume that the latest available observation is on day k = 44 -so that the estimation stops on this day -, and the forecast is up to day k = 80. The forecast of the number of infected individuals well represents the evolution of the epidemic, suggesting a peak between days 55 and 65. Furthermore, we observe how both the true infection rate and the true number of infected is always enveloped within the 90% confidence interval, showing the high reliability of the proposed algorithm. Finally, in Fig. 5 , we show the forecast of the epidemic evolution in the second scenario. Here, the estimation is performed up to day k = 57; the capability of the proposed algorithm to accurately estimate the large variation in the infection rate and forecast its future average value, allows one to forecast the evolution of the number infected, even though a further small variation of the infection rate will start at k = 60. This section presents the results obtained with the proposed estimation and forecasting algorithm when applied to real data obtained from the recent COVID-19 outbreak. The focus is on two very different areas in terms of population and interactions: Lombardia region in Italy, and the USA. in Italy are made available from Protezione Civile on a daily basis [38] . This includes many entries, both nationwide and per region, as the total number of cases, total number of current Here, we focus on the data from Lombardia region, the centre of Italy's COVID-19 outbreak, whose population is P ≈ 10 7 people. We used the normalized (to P) total number of current positive cases as number of infected i k , and the normalized sum of number of discharged patients and number of deaths as the number of recovered r k . These are reported in Fig. 6 and refer to the period between February 24, 2020, and June 30, 2020. The figure also shows the beginning of the lockdown established by the Italian government on March 8, 2020. Furthermore, we observe that, on May 6, the number of infected and number of recovered individuals present large steps, which hardly reflect physical reality. These steps are due to the fact that the numbers reported on May 6 include not only data referring to that day, but also data collected on previous days, and, erroneously, not reported in the correct day [38] . The setting of the Bayesian sequential estimation and forecasting algorithm is as described in Section V-A, except that the smallest and largest values used for the discretization of the infection rate are Table 1 presents the mean absolute percentage errors (MAPEs) calculated for each forecast and for different forecast horizons, that is, 3, 7, and 14 days. We note that the forecasts made on April 13, 18, and 23, follow the future observations well, with an average MAPE below 3% at a forecast horizon of 7 days. On April 28 and May 3 the forecasts are not reliable, since the future observations are not contained within the 90% confidence interval. However, this poor performance is related to the inaccurate data provided later on May 6; indeed, the next forecasts made from May 8, to June 7, present again low MAPEs, with an average of 3.49 %, 4.24 %, and 6.1 %, at forecast horizons of 3, 7, and 14 days, respectively. Neglecting the forecasts whose horizon includes May 6 (marked with an asterisk in Table 1 ), the average MAPE from April 13, to June 7, is 3.6% for forecasts at 7 days, and below 6% when the forecast horizon is 14 days. The proposed algorithm is compared with two alternative curve-fitting approaches. The first one, hereafter named SIR-fit, employs a nonlinear least squares fitting algorithm that, using the number of infected and recovered individuals, computes the best 6 time-invariant infection and recovery rates of the deterministic SIR model. These best rates are then used to forecast the evolution of the epidemic. The second curve-fitting approach follows the same methodology applied on a more-sophisticated recently proposed generalized SEIR (GSEIR) model [39] , for this reason hereafter called GSEIR-fit. The GSEIR model consists of seven compartments -three more compartments than those in the standard SEIR model, i.e., insusceptible, quarantined, and death -and six parameters. Table 2 compares the average MAPEs obtained with the proposed algorithm, the SIR-fit, and the GSEIR-fit. The comparison is made averaging the MAPEs over two different time intervals. The first interval is from March 4, i.e., the 10th day since the beginning of the data collection, to June 16; the second interval is from April 1, i.e., approximately three weeks after the lockdown, to June 16. The proposed algorithm clearly outperforms the SIR-fit for all the forecast horizons. The GSEIR-fit, instead, presents a single lower average MAPE over the interval from March 4 to June 16 when the forecast horizon is 14 days; however, when the interval from April 1 to June 16 is considered, the proposed algorithm outperforms the GSEIR-fit in all the cases. This confirms the benefit of sequentially estimating the time-varying model parameters, in order to have reliable and accurate forecasts. 2) UNITED STATES OF AMERICA-Since the beginning of the COVID-19 epidemic outbreak, the Johns Hopkins University (JHU) has tracked the evolution of the contagion and made the collected data publicly available [40] , [41] . The repository includes the total number of cases, the number of deaths, and the number of discharged COVID-19 patients from the hospitals from the USA and other countries at different levels of details, i.e., for the country as a whole and, when available, for single states and regions. Here, we use the overall dataset from the USA, whose population is P ≈ 329.8 · 10 6 people. As for the experiment made on the dataset from Lombardia, the normalized (to P) sum of number of discharged patients and number of deaths is used as the number of recovered r k ; the normalized number of infected i k is then given by the normalized difference between the total number of cases and the number of recovered. These are reported in Fig. 9 and refer to the period between March 1, 2020, and July 31, 2020. The setting of the Bayesian sequential estimation and forecasting algorithm is unchanged, except that for the discretization and initialization of the parameters, and the initialization of the epidemic state. Specifically, the smallest and largest values used for the discretization of epidemic is given by the normalized numbers of susceptible, infected, and recovered on March 1, that are s 0 = 1 -i 0 -r 0 , i 0 = 22/P, and r 0 = 8/P. Fig. 10 shows the estimated infection and recovery rates. From the second half of March and through April the infection rate decreases, presumably due to the restriction measures established by each single State. Here, we cannot mark a specific date as the beginning of the lockdown; nevertheless, it is reasonable to assume that about three out of four US citizens were under some form of lockdown by early April [42] . Around April 30, the estimated recovery rate shows a slight increase followed by an abrupt decrease. On that day, 35 thousand new recovered (i.e., hospital releases plus deaths) individuals were reported against a decrease of infected individuals of only 6 thousand (cf. Fig. 9) ; this results in a sudden increase of the recovery rate. Large numbers of new recovered individuals are also reported on May 22 and July 4, that are, 53 thousand and 104 thousand, respectively; however, these are better balanced by the numbers of people leaving the infected group, that are, 29 thousand and 58 thousand, respectively, thus not significantly affecting the estimates of the infection and recovery rates. Overall, the estimated recovery rate is roughly 0.01, which translates into an average number of 100 days that an individual takes to move from the group of infected (I) to the group of recovered (R). Although the recovery duration seems overestimated, it is worth highlighting that this is an aggregate estimate of the recovery rate from multiple States, which therefore suffers from multiple different reporting delays, as well as from the different criteria used to declare an individual as fully recovered. It underscores the need for the USA to provide timely and consistent data, similar to that just analysed in Lombardia, if public health policy is to be driven by reliable estimation and prediction. Forecasts of the epidemic evolution evaluated every five days in the time period between May 6 and June 30 are reported in Fig. 11 . Table 3 presents the MAPEs calculated for each forecast and for different forecast horizons, that is, 3, 7, and 14 days. We note that the forecasts made on May 6, 11, 16, and 21, present the highest MAPEs; these forecasts, however, are negatively affected by the abrupt decrease of the recovery rate that follows April 30, as described above. The forecasts made from May 26, to June 30, instead, present MAPEs that are always below 3% at forecast horizons of 3 and 7 days, and always below 4% when the forecast horizon is 14 days. Overall, the results in Table 3 confirm those obtained with the data from Lombardia: the average MAPE for forecast horizons of 3, 7, and 14 days, is, respectively, 2.35 %, 3.03 %, and 4.16 %. Table 4 compares the average MAPEs obtained with the proposed algorithm, with those obtained with the SIR-fit and the GSEIRfit curve-fitting approaches. The proposed algorithm consistently outperforms both the curve-fitting approaches, in both the considered time intervals. Moreover, we observe a significant improvement of the MAPE for the proposed algorithm when considering the interval from April 1 to July 17, compared to the interval from March 10 to July 17, which again confirms the benefit of sequentially estimating the time-varying model parameters in pursuance of reliable and accurate forecasts. The proposed analysis presents some limitations that may lead to future extensions and these are worth exploring. First, the considered stochastic SIR model may be broadened to include the fraction of undocumented (or asymptomatic) and quarantined infected individuals. Undocumented infections usually include mild or asymptomatic cases that go undetected and, hence, based on their proportion and contagiousness, can potentially increase the spread of the disease. The portion of undocumented infectious cases is suspected to be a critical epidemiological characteristic that is not easy to quantify. Most of the available evidence on asymptomatic SARS-CoV-2 infections, reviewed and summarised in [43] for different circumscribed cohorts, suggests that this is a significant factor in the fast progression of the COVID-19 pandemic. However, the difficulty in quantification of undocumented cases is largely due to the imperfection of the data available, which does not accurately reflect a large, representative sample of the general population. Moreover, in order to distinguish asymptomatic and presymptomatic cases, longitudinal data -that is, repeated observations of the individuals over time -should be available. Another possible extension may be to separate people who are confirmed infected and home-quarantined into a dedicated epidemic compartment. In addition, the recovered compartment typical of the SIR model may be separated into two distinct recovery and death compartments in the detection phase so that the available data on reported cases can be taken into account separately. Furthermore, the considered stochastic SIR model describes the spread of the disease inside a single and confined population, e.g., a city, a region (Lombardia, Italy), a nation (USA). However, the approach presented in this article could be extended to more complicated metapopulation models, which introduce a further spatial travel/diffusion dimension in the dynamic and observation models. In particular, the population could be represented as a network of multiple spatially separated subpopulations nodes, such as multiple cities in the same region or nation. The interconnections among different populations represents the diffusion of people, thus contributing to the disease spread among the different subpopulations. All these limitations can be addressed in future studies because they mainly affect data analysis, and do not restrict the application and the effectiveness of the proposed approach to learning and forecasting the evolution of the critical epidemiological characteristics. The recent worldwide epidemic outbreak, due to a new strain of Coronavirus, has intensified research into novel mathematical models and algorithms that are able to reliably estimate and predict the epidemiological curve of the infection. In this article, we proposed a Bayesian sequential estimation and forecasting algorithm that, based on the information that authorities provide on a daily basis, is able to estimate the state of the epidemic and the parameters of the underlying model, as well as to forecast the evolution of the epidemiological curve. We developed an efficient implementation of the above-mentioned Bayesian framework, specifically tailored to the stochastic SIR model of pandemic evolution. The proposed algorithm is validated using synthetic data simulating two epidemic scenarios, and on real data acquired during the recent COVID-19 outbreak in the Lombardia region, Italy, and in the USA. Results show that the mean absolute percentage error computed after the lockdown is on average below 5% when the forecast is at 7 days, and below 10% when the forecast horizon is 14 days. Moreover, the described Bayesian framework outperforms curve-fitting approaches that use deterministic epidemiological models (e.g., SIR and GSEIR), particularly when a clear change of model parameters occur, e.g., a decrease of the infection rate following the lockdown. Finally, accurate and timely data collection, especially on recovered individuals, hospitalizations, intensive care unit admissions, and intubations, is essential for reliable model-based decisions. There exists an enormous amount of very recent literature related to the forecast of COVID-19 pandemic evolution, part of which has been reviewed at the beginning of this article. The analysis of this literature makes clear the effectiveness of model-based approaches, over less structured data-centric methodologies. In this respect, one lesson learned by the present study is that accurate epidemic modeling requires accurate estimation of time-varying parameters, such as the infection rate β. This is obviously true in the presence of abrupt changes of the underlying physical situation (e.g., adoption of drastic countermeasures) but, more interestingly, it is by no means limited to these extreme situations. One consequence is that, once the epidemic is under control, small variations in the estimated β may be used as a sensible proxy for the incipient detection of possible pandemic recrudescence. This section reports the expressions, derived by moment matching, used to compute x k + 1 | k (n, θ k ) be mean and covariance matrix of the n th posterior mixand (x k | n, θ k , z 1: k ). We aim to approximate the n th predicted mixand (x k + 1 | n, θ k , z 1: k ) to be Gaussian with mean and covariance matrix To simplify the notation, we hereafter omit the index n and the dependence on the parameter vector θ k . Following simple algebraic calculations, the components of the mean vector are obtained as We note that the products ı k | k s k | k induce nonlinearity in their predicted values, and hence require extra calculation for the associated uncertainties; indeed, the components of the covariance matrix are Σ s, k + 1 2 = s k | k 2 + Σ s, k 2 + β k 2 Δ t 2 s k | k 2 ı k | k 2 + s k | k 2 Σ i, k 2 + ı k | k 2 Σ s, k 2 + (1 + 2ρ k 2 )Σ s, k 2 Σ i, k 2 +4ρ k s k | k ı k | k Σ s, k Σ i, k, − 2β k Δt ı k | k (s k | k 2 + Σ s, k 2 ) + 2ρ k s k | k Σ s, k Σ i, k + P −1 β k Δt(s k | k ı k | k + ρ k Σ s, k Σ i, k ) − s k + 1 | k 2 , and ρ k + 1 Σ s, k + 1 Σ i, k + 1 = (1 − γ k Δ t − P −1 β k Δt)(s k | k ı k | k + ρ k, n Σ s, k Σ i, k ) − (1 − γ k Δt)β k Δt s k | k (ı k | k 2 + Σ i, k 2 ) +2ρ k ı k | k Σ s, k Σ i, k + β k Δt ı k | k (s k | k 2 + Σ s, k 2 ) +2ρ k s k | k Σ s, k Σ i, k − β k 2 Δ t 2 s k | k 2 ı k | k 2 + s k | k 2 Σ i, k 2 + ı k | k 2 Σ s, k 2 + (1 + 2ρ k 2 )Σ s, k 2 Δ i, k 2 +4ρ k s k | k ı k | k Σ s, k Σ i, k − s k + 1 | k ı k + 1 | k . Let us assume that the infection rate in the time interval [k -L, k] is linearly changing according to the following model L + 1 most recent MMSE estimates of the infection rate, that is, The estimator β . L, k is unbiased, and its variance is where σ ω, L 2 is an estimate of σ ω 2 , i.e., the variance of the noise term ω k,ℓ , given by In practice, the slope β . k is piecewise constant, that is, constant over subintervals of the entire interval [k -L, k]. Therefore, we aim to find the value L ★ , L min ⩽ L ★ ⩽ L max , -hence, the subinterval [k -L ★ , k] -such that the linear model with constant slope in (41) is a valid assumption for the time evolution of the infection rate. Let L be the candidate value for L ★ , and let us define the auxiliary variable y L, k as It is easy to verify that, if the linear model (41) The threshold T H is obtained as where P fa is the probability to reject hypothesis H 0 when it is true and F χ 1 is the inverse cumulative density function of a central chi-square variable with one degree of freedom. In order to select L ★ , we employ the following iterative procedure: if with L hypothesis H 0 is accepted, and with L − 1 hypothesis H 0 is rejected, then L ⋆ = L; otherwise, L is decreased and the test is repeated. This iterative procedure starts with L = L max . Note that, alternatively, the threshold could be obtained by fixing the probability to accept hypothesis H 0 when it is false; in such a case, the non-centrality parameter ν is approximated with the numerator of (42). Evolution of the infection rate in the simulated scenarios. Estimation and forecasting, respectively in solid and dashed lines, of (top) the infection rate and (bottom) the number of infected individuals in the second scenario; the superscripts E and F stand for estimate and forecast, respectively. The estimation is up to k = 57 (marked by a vertical dotted line), and the forecast is up to k = 80. The shaded areas represent the 90% confidence interval. Numbers of infected and recovered (i.e., hospital releases plus deaths) individuals in Lombardia, Italy, from February 24, 2020, to June 30, 2020 (data from Protezione Civile [38] ). The vertical dashed line indicates March 8, 2020, the beginning of the lockdown. The large steps on May 6 are due to an inaccurate reporting of the data, as explained in Section V-B1. Estimation and forecasting, respectively in solid and dashed lines, of the number of infected individuals in Lombardia, Italy (legend is reported in the bottom-right corner image; the superscript E stands for estimate, and the superscript F stands for forecast). The date corresponding to the end of the estimation and the beginning of the forecast is marked by a vertical dotted line (the leftmost vertical dashed line marks March 8, the beginning of the lockdown). In all the cases, the forecast horizon is June 30. The shaded area represents the 90% confidence interval. The poor forecasts made on April 28, and May 3, relate to the inaccurate data later provided on May 6, as explained in Section V-B1. Estimation and forecasting, respectively in solid and dashed lines, of the number of infected individuals in the USA (legend is reported in the bottom-right corner image; the superscript E stands for estimate, and the superscript F stands for forecast). The date corresponding to the end of the estimation and the beginning of the forecast is marked by a vertical dotted line. In all the cases, the forecast horizon is July 31. The shaded area represents the 90% confidence interval. The poor forecasts made on May 11 and 16, relate to the abrupt decrease of the estimated recovery rate that follows April 30 (cf. Fig. 10 Mean absolute percentage errors (MAPEs) of the forecasts of the epidemic evolution in Lombardia, Italy, performed at different dates and calculated for different forecast horizons, that is, 3, 7, and 14 days. The asterisk means that the forecast performed at a given date and with a given forecast horizon includes May 6, when inaccurate numbers of infected and recovered individuals were reported. The average (last row) does not take into account these cases. His research interests include statistical signal processing with emphasis on state estimation, data fusion, and multisensor multitarget tracking. He was a recipient of the Best Student Paper Award at the he joined the NATO Science & Technology Organization Centre for Maritime Research and Experimentation (CMRE), where he is currently a Senior Scientist with the Research Department and a Project Manager of the Data Knowledge and Operational Effectiveness (DKOE) Program. Furthermore, he leaded a number of research projects funded by the EU Horizon 2020 programme, by the U.S. Office of Naval Research (ONR), and by DRDC. He conducts research in the general area of statistical signal processing with emphasis on detection and estimation theory, wireless sensor network, multi-agent algorithms, target tracking and data fusion, adaptation and learning over graphs, and radar (sonar) signal processing. He has coauthored more than 100 publications in international scientific journals and conference proceedings His research interests include proactive decision support, uncertainty quantification, smart manufacturing, autonomy, knowledge representation, and optimization-based learning and inference. A common theme among these applications is that they are characterized by a great deal of uncertainty, complexity, and computational intractability. He is an Elected Fellow of the Connecticut Academy of Science and Engineering. He was selected by the IEEE Systems, Man, and Cybernetics (SMC) Society as the Outstanding Young Engineer of 1984, and received the Centennial Key to the Future Award. He was a co-recipient of the Andrew P. Sage Award for the Best SMC Transactions Paper in 1999, the Barry Carlton Award for the Best AES Transactions Paper in 2000, the 2002 and 2008 NASA Space Act Awards for A Comprehensive Toolset for Model-based Health Monitoring and Diagnosis, and Real-time Update of Fault-Test Dependencies of Dynamic Systems: A Comprehensive Toolset for Model-Based Health Monitoring and Diagnostics, and the Author manuscript; available in PMC Novel Coronavirus-China Coronavirus Disease (COVID-19) Pandemic Modeling infectious disease dynamics in the complex landscape of global health A contribution to the mathematical theory of epidemics Modeling infectious epidemics The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application Analysis and control of an SEIR epidemic system with nonlinear transmission rate The effectiveness of quarantine of Wuhan city against the corona virus disease 2019 (COVID-19): A well-mixed SEIR model analysis Evaluation and prediction of the COVID-19 variations at different input population and quarantine strategies, a case study in guangdong province, China Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China Sub-stantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2) Metapopulation epidemic models. A survey Analysis and control of epidemics: A survey of spreading processes on complex networks Population flow drives spatio-temporal distribution of COVID-19 in China The effect of human mobility and control measures on the COVID-19 epidemic in China Author manuscript Monitoring and prediction of an epidemic outbreak using syndromic observations Comparison of the performance of particle filter algorithms applied to tracking of a disease epidemic Sequential Monte Carlo filtering estimation of Ebola progression in West Africa Multi-class random matrix filtering for adaptive learning Self-tuning algorithms for multisensor-multitarget tracking using belief propagation Tracking Data Fusion: A Handbook Algorithms An Introduction to Signal Detection Estimation Roots of ensemble forecasting Ensemble forecasting Ensemble size: How suboptimal is less than infinity? Prediction of infectious disease epidemics via weighted density ensembles Real-time forecasting of epidemic trajectories using computational dynamic ensembles The ensemble Kalman filter: Theoretical formulation and practical implementation Gaussian sum particle filtering A look at Gaussian mixture reduction algorithms Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches Pattern Recognition and Machine Learning A primer on stochastic epidemic models: Formulation, numerical simulation, and analysis Epidemic analysis of COVID-19 in China by dynamical modeling An interactive Web-based dashboard to track COVID-19 in real time COVID-19 Data Repository by the Center for Systems Science and Engineering Coronavirus: Three Out of Four Americans Under Some Form of Lockdown Prevalence of asymptomatic SARS-CoV-2 infection Average mean absolute percentage errors (MAPEs) of the forecasts of the epidemic evolution in Lombardia, Italy, obtained with the proposed algorithm, and with the SIR-fit and GSEIR-fit curve-fitting approaches, for different forecast horizons Author manuscript; available in PMC Average mean absolute percentage errors (MAPEs) of the forecasts of the epidemic evolution in the USA, obtained with the proposed algorithm, and with the SIR-fit and GSEIR-fit curve-fitting approaches, for different forecast horizons Author manuscript; available in PMC σ 1, k′ − 1 ( j) P −1 β k′ − 1 ( j) s k′ − 1 ( j) ı k′ − 1 ( j) 10 :Numbers of infected and recovered (i.e., hospital releases plus deaths) individuals in the USA, from March 1, 2020, to July 31, 2020 (data from JHU [41] ).