key: cord-0232045-clje3b2r authors: Ghanam, Ryad; Boone, Edward L.; Abdel-Salam, Abdel-Salam G. title: SEIRD Model for Qatar Covid-19 Outbreak: A Case Study date: 2020-05-26 journal: nan DOI: nan sha: e4a31943d67c43f1b0e533ee264bdf2ca68f2a04 doc_id: 232045 cord_uid: clje3b2r The Covid-19 outbreak of 2020 has required many governments to develop mathematical-statistical models of the outbreak for policy and planning purposes. This work provides a tutorial on building a compartmental model using Susceptibles, Exposed, Infected, Recovered and Deaths status through time. A Bayesian Framework is utilized to perform both parameter estimation and predictions. This model uses interventions to quantify the impact of various government attempts to slow the spread of the virus. Predictions are also made to determine when the peak Active Infections will occur. Coronavirus Disease (COVID-) (Wu et al. ( ); Rezabakhsh, Ala, and Khodaei ( )) is a severe pandemic a fecting the whole world with a fast spreading regime, requiring to perform strict precautions to keep it under control. As there is no cure and target treatment yet, establishing those precautions become inevitable. These limitations (Giuliani, et al. ( )) can be listed as social distancing, closure of businesses and schools and travel prohibitions (Chinazzi et al. ( )). Corona Virus is a new human Betacoronavirus that uses densely glycosylated spike protein to penetrate host cells. The COVID-belongs to the same family classi cation with Nidovirales, viruses that use a nested set of mRNAs to replicate and it further falls under the subfamily of alpha, beta, gamma and delta Co-Vis. The virus that causes COVID-belongs to the Betacoronavirus B lineage and has a close relationship with SARS species. It is a novel virus since the monoclonal antibodies do not exhibit a high degree of binding to SARS-CoV-. Replication of the viral RNA occurs when RNA polymerase binds and re-attaches to multiple locations (McIntosh ( ); Fisher and Heyman ( )). Cases of COVID-started in December when a strange condition was reported in Wuhan, China. This virus has a global mortality rate of . %, which makes it more severe in relation to u. The elderly who have other pre-existing illnesses are succumbing more to the COVID-. People with only mild symptoms recover within to days, while those with conditions such as pneumonia or severe diseases take weeks to recover. The recovery percentage of patients, for example, in China stands at %. The recovery percentage rate of COVID-is expected to hit % (WHO ( )). The virus has spread from China to other countries and territories across the globe. From Wuhan, Hubei province, the virus spread to Mainland China, Thailand, Japan, South Korea, Vietnam, Singapore, Italy, Iran, and other countries. The State of Qatar was one of the countries that were a fected by the COVID-spreading and the rst infected case was reported on th of February and it could be considered the nd highest in the Arab World with the number of con rmed cases , as of May , . For e fectively specifying such security measures, it is essential to have a real-time monitoring system of the infection, recovery and death rates. Develop, implement and deploy a data-driven forecasting model for use by stakeholders in the State of Qatar to deal with the Covid-pandemic. The model will focus on infected, deaths and recovered as those are the only data available at this time. This document is organized in the following manner. In Section the SEIRD that is employed is de ned. Next, Section introduces the data available and gives description. Then shows how interventions are incorporated into the model. The Let S(t) be the number of people Susceptible at time t, E(t) be the number of people Exposed at time t, I (t) be the number of Infected at time t, R(t) be the cumulative number of recovered at time t and D(t) be the cumulative number of Deaths at time t. This can be modeled with the following system of ordinary di ferential equations: where α is the transmission rate from Susceptibles to Exposed, β is the rate at which Exposed become Infected, γ is the rate at which Infected become recovered and η is the mortality rate for those Infected. Notice that, this model formulation makes several key assumptions: . Immigration, emigration, natural mortality and births are negligible over the time frame and hence are not in the model. . Once a person is in the Infected group, they are quarantined and hence they do not mix with the Susceptible population. . The Recovered and Deaths compartments are for those who rst are infected. There is no compartment for those Exposed who do not become sick (Infected) and Recover on their own. Traditional analysis would include a steady state analysis, however, in this case the dynamics of the short term is of interest. Hence, this work does not address any steady state or equilibrium concerns. This work is concerned with tting the model given in ( ) to the Covid-data concerning the State of Qatar during the outbreak and using the model for forecasting several possible scenarios. The Johns Hopkins Covid-github site includes for every country for each day the cumulative number of con rmed infections, cumulative number of recovered and the cumulative number of deaths for each day starting January . The data for Qatar was obtained. Notice that in model ( ) the Recovered and Death states are cumulative as once one enters the compartment their is no exit. However, the Infected compartment has transitions from Exposed and to Recovered and Deaths. Hence the data provided for con rmed infections is cumulative and included both Recovered and Deaths and will need to be removed from this compartment's data. Let CI (t) be the Con rmed Infections at time t and let Infected I (t) be de ned as: For clarity the term "Active Infections" will be used to denote this derived variable versus the cumulative Infected provided in the data. Figure shows the plots of the Active Infections, Recovered and Deaths data for Qatar for the days since February . Notice that the Active Infections are very low until around day when there is large jump due to increased testing. The Active Infections then seems to plateau for until day , after which there is extreme growth in Active Infections. There seems to be a similar pattern for the Recovered with a delay showing the time of infection before recovery. The plot for Deaths shows no deaths until day and then a steady increase in Deaths for the remaining days. The State of Qatar, prepared an excellent exible plan for risk management, grounded on national risk assessment, taking account of the global risk assessment done by WHO, focuses on reinforce capacities to reduce or eliminate the health risks from COVID-. Embed complete emergency risk management strategy in the health sector. Furthermore, Enabling and promoting , closed all parks and public beaches to curb the spread of coronavirus. On March , (day ), the Ministry of Commerce and Industry decided to temporarily close all restaurants, cafes, food outlets, and food trucks at the main public era. Also, the Ministry of Commerce and Industry decided to close all the unnecessary business on March , (day ) Hamad Medical Corporation ( ) and MPH-Qatar ( ). These interventions taken by the government change the dynamics of the system and hence need to be incorporated into the model. The next section details how we introduce interventions both from the government and interventions guided by the data. In Figure , one can see the jump at day and a plateau until day . The model needs to be able to handle interventions made by the Government of the State of Qatar. The main parameter that policy can in uence is α, the rate of transmission from Exposed to Susceptible. One way to implement this the use of indicator functions W k (t) de ned as: where t k is the time where the k th intervention is taken and index k = , , .., K . For each intervention there needs to be a change to the value of α, denoted α k , that captures the impact of the intervention. Let the vector This formulation gives the following transitions rates between S(t) and E(t): which will require the following constraints due to the fact that α(t) > for all t: Let be the set de ned by the constraints above. In addition to changes in infection rates α, impulse functions can be used to model dramatic one time shifts in transitions between states. A Dirac delta function de ned by ). This can be integrated in the model to capture spikes in the number of cases. In our case the State of Qatar data shows exhibits this type of behavior at day where one can clearly see a large jump in the number of infections. This is incorporated into the model presented by a Dirac delta function, δ(t − τ), in transition rate between Exposed and Infected, which is coupled with a coe cient to β A to capture the impact of the jump. Due to the complexity of the model the Bayesian inferential framework is chosen. Recall, Bayes formula is given by (Bayes and Price, ) : where π(θ|D) is the posterior probability distribution for the parameters θ given the data D, π(θ) is the prior distribution of θ and L(D|θ) is the likelihood of the data given θ. In order to specify the likelihood of the model in equation ( ) the model modi ed to model the mean abundance in each compartment and is given by: and D(t), respectively and the parameters have the same de nition as provided in the system given in equation ( ). Since there is no data for S(t) and E(t) these compartments will be latent variables and will not directly factor into the likelihood. The likelihood for I (t), R(t) and D(t) are given by: To specify the prior distributions for α, β A , β, γ and η one must incorporate the following constraints α > , β > , γ > and η > . Hence the following prior distributions are set: where C ( ) is an indicator function takes the value if α ∈ . This serves to truncate the normal distribution in order to keep α in the feasible range of values. The likelihood and prior distributions speci cations lead to the following posterior distribution when a = and σ = : The posterior distribution does not lend to any analytic solution, hence Markov chain Monte Carlo (MCMC) techniques will be used to sample from the posterior distribution (Gelman et al., ) . Speci cally Metropolis-Hastings sampler is used to obtain samples from the posterior distribution (Gilks, Richardson, and Spiegelhalter, ) and (Albert, (@) . To tune the sampler a series of short chains were generated and analyzed for convergence and adequate acceptance rates. These initial short chains were discarded as "burn-in" samples. The tuned sampler was used to generate , samples from π(α, β A , β, γ, η|D) and trace plots were visually examined for convergence and deemed to be acceptable. All inferences will be made from these , samples. The model and sampling algorithm is custom programmed in the R statistical programming language version . . . The computation takes approximately seconds using a AMD A -. GHz processor with GB of RAM to obtain , samples from the posterior distribution. For more on statistical inference see Wackerly, Mendenhall, and Schea fer ( ), Casella and Berger ( ), and Berger ( ). To apply the model the following initial conditions are speci ed: S( ) = , , , E( ) = , I ( ) = , R( ) = and D( ) = . Here S( ) is the current population of the State of Qatar, I ( ), R( ) and D( ) are obtained directly from the data. The choice of E( ) was used as it a minimal value that would allow the disease to spread but not so large as to make the spread rapid. Several values of E( ) were explored and the value of was found to have the best t. Furthermore, model interventions were placed at days t = , t = , t = , t = and t = with an Dirac delta impulse at time τ = . Table shows the means, standard deviations and the . %, . % and . % quantiles for the model parameters based on the , samples from the posterior distribution. Notice that, α = . × − and α = − . × − are very close in magnitude with di ferent signs indicating that the rst intervention drastically reduced the transmission rate. Similarly one can see that the second and third interventions α = . × − and α = − . × − essentially are of the same magnitude with di ferent signs which when added resulting in a very low transmission rate. However, α = . × − is a small increase with a moderate decrease in α = − . × − which still leaves a nal transmission rate of K k= α k ≈ . × − . Of particular note is the mean mortality rate η = . ≈ / which means that about in , people perish from the disease each day, which is quite low. Also note that the mean infection (con rmed) rate is β = . ≈ / . which corresponds to about in . exposed people become con rmed each day. The quantile intervals provide a % credible interval for the parameters and can be used to obtain a range of reasonable parameter values. For example for the parameter β the interval is ( . , . ) meaning that the probability that β is between ( . , . ) is . . This can be used to create an interval for the risk interpretations as between / . ≈ . and / . ≈ . Exposed people are con rmed as infected each day. This also gives insight into how many people who may be in the population who are Exposed and may be infectious but do not yet exhibit symptoms. Recall that β A is associated with the Dirac delta function for impulse to model the jump in transition rate from Exposed to Infected at day . Notice that β A ≈ . means that there is a one time in ux of approximately . % of the people Exposed moved to the Infected compartment. Hence the increased testing captured many of the Exposed people. By adding this to the natural Exposed to Infected rate of β = . one obtains the one time transmission rate of β + β A = . + . = . corresponding to a total of approximately . % of Exposed being con rmed as Infected. Leaving the remaining approximately . % of Exposed people still interacting with the Susceptible population. While many of the parameters do not lend well to the traditional H : θ = hypothesis testing as they must be positive. We can conduct simple hypothesis tests on the α parameters to look for signi cant changes due to interventions using contrasts. Speci cally the sequential contrasts of α − α , α − α , α − α , α − α and α − α . These contrasts quantify the changes that in transmission rate from Susceptible to Exposed due to the interventions and are what policy makers want to see. Furthermore, they want a statistical test on whether or not the intervention performed in a statistically signi cant manner. This can be done by simply subtracting the MCMC samples to generate the contrast of interest. Using these subtracted samples one can look at the mean, standard deviation, quantiles and the proportion of samples above , P(> ). Table shows these quantities for the contrasts listed above. Notice that the intervention at day reduced the transmission rate by approximately . × − ) which is considerable and the proportion of samples above was . indicating a statistically signi cant change due to the intervention. The intervention taken at day , α − α , actually increased the transmission rate, where the intervention taken at day , α − α , then reduced the transmission rate. Similarly the other two interventions increased and then decreased the transmission rates, respectively. Furthermore, all interventions be deemed statistically signi cant since P(> ) is either . or . indicating signi cance. The model formulation also allows for the individual transmission rates to be computed by simply summing up the α k through to the desired time point. Table gives the mean, standard deviation and (Q . , Q . , Q . ) for the transmission rate of Exposed to Infected across each time interval. This is done by simply add the corresponding MCMC samples. This is another perspective on how the transmission rate changes across the time frame. Notice that all of the transmission rates are positive which is required by the model speci cation. Also notice that the mean transmission rates vary in orders of magnitude from . × − to . × − . One interesting point that should be made is the highest transmission rate is at the beginning and the lowest transmission rate is at the end. This is evidence that the interventions that the Qatari government has ultimately reduced the transmission rate. To assess the t of the model the posterior predictive distribution was used and is given by: Using the samples , samples from the posterior distribution , samples were generated from the posterior predictive distribution. At each time t the median, . and . quantiles were obtained to form a posterior predictive interval. Figure shows the model ts for Active Infections, Recovered and Deaths with posterior predictive bands. Notice that, the model does quite well at tting the dynamics of the Active Infections including the jump at day and captures the plateau and the exponential growth after the plateau as well. The Recovered model does ts well as does the Deaths data. To assess the explained variance a pseudo-R was formed using the median from the posterior predictive distribution at each time as the point estimates. This resulted in a pseudo-R of . which indicates the tted model explains approximately . % of the variance in the data. Based on this the model is deemed to t well. It should be noted that standard data splitting procedures for model validation are di cult in this scenario as removing values from the system may cause unstable behavior. To assess the model performance predictive performance is utilized with days from May and May used as a test set. Using the samples from the posterior distribution the posterior predictive distribution was computed for each day of the test set and % posterior predictive intervals were created using the . % and . % quantiles. The test data is then compared to the posterior predictive intervals for each of the endpoints. Another view of predictive performance is to examine pseudo-predictive-R which compares the predicted values with the actual values for the test set. This calculation leads to a pseudo-predictive-R ≈ . which is slightly lower than the pseudo-R associated with the t of the model to the data in the training set but is still very high. Of course several other measures of predictive performance exist however this is the easiest to understand as it measures the amount of variation explained by the predictions across the test set. This work has demonstrated how to build a SEIRD model for the Covid-outbreak in the State of Qatar, include interventions, estimate model parameters and generate posterior predictive intervals using a Bayesian framework. Furthermore, the model is able to treat the Susceptible and Exposed compartment as latent variables, as no data is observed about them other than approximate initial values. The model ts the data quite well with a pseudo-R ≈ . and predicts reasonably well with pseudo-predictive-R ≈ . . One can also note that in the model de nition, no immigration, emigration, natural births and natural mortality were not included and based on the high psuedo-R would have a negligible e fect on t. Furthermore, the model did not contain compartments for those who recovered without being con rmed infections. As this was not observed one can only speculate on the impact that additional data would have on the model t, however it would be very small. The modeling paradigm is quite exible for modeling the Covid-data as it easily incorporates interventions into the system and can quantify the impact of the intervention. Furthermore, using simple di ferences the model can be used to predict new infections as well. Figure shows the plots of the new infections with predictive bands based on . % and . % based on , samples from the posterior prediction distribution at each time point. Notice that the model does well at capturing the jump at day and the bands capture most of the data. The drop beginning at the intervention at day provides for the drop in daily infection rates. Another use of the model may be for long term predictions. While this is extrapolation it does provide policy makers a tool for planning, provided nothing changes, i.e. no interventions are taken. It also allows policy makers to see the possible long term e fects of their decisions. Figure shows the long-term predictions of the model if no other interventions are made past May . Notice that the predictions do eventually decrease across the future time frame. Notice the width of the predictive bands for times farther in the future. This re ects the uncertainty associated with extrapolating into the future. However, one item that we can calculate from this is a % predictive interval for the peak infection time. By simply recording the maximum value for each of the predictive distribution trajectories from the MCMC samples one can obtain a distribution of the time for the maximum. In this case this gives the % predictive interval for the maximum to be ( , ). This means that the peak infection time will be between day ( June ) and day ( August ) of the outbreak given that no other interventions or process changes occur. Fig. also shows this interval given by the dark dashed vertical lines. The width of the interval quanti es the uncertainty about where the maximum active infections will occur. Since the width of the interval is days, this indicates that there is a large amount of uncertainty on when the number of active infections will begin to decline. Future work could be to add an overdispersion parameter into the model to allow for the more accurate capture of uncertainty. Furthermore, one can perform simulation studies to better understand how the model may perform under various scenarios. Feature selection methods could be employed to select where the interventions should be placed as well as other forms of interventions could be included in the model. Another possibility to address any deviations from the standard model a semi-parametric technique could be studied as well. Quantiles from , samples from the posterior predictive distribution. Dark dashed vertical lines give the % predictive interval for the maximum active infections. Australian Health Sector Emergency Response Plan for Novel Coronavirus (COVID-) Qatar National Preparedness and Response Plan for Communicable Diseases. Doha, Qatar. Primary Healthcare Corporation COVID-Strategic Preparedness and Response Plan Operational Planning Guidelines to Support Country Preparedness and Response An Essay towards solving a Problem in the Doctrine of Chance. By the late Rev. Mr. Bayes, communicated by Mr Statistical Inference, nd Edition Markov chain Monte Carlo in Practice Bayesian Computation with R: Second Edition Statistical Decision Theory and Bayesian Analysis, Second Edition The Principles of Quantum Mechanics, Fourth Edition The authors would like to acknowledge to the State of Qatar and the Ministry of Health for the daily updates and additional data. In addition the authors would like thank Virginia Commonwealth University in Qatar and Qatar University for supporting this e fort.