key: cord-0913581-2qjae0x7
authors: Mbuvha, Rendani; Marwala, Tshilidzi
title: Bayesian inference of COVID-19 spreading rates in South Africa
date: 2020-08-05
journal: PLoS One
DOI: 10.1371/journal.pone.0237126
sha: af98f7f9021b218e66167538c6868ed415d567f4
doc_id: 913581
cord_uid: 2qjae0x7

The Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has highlighted the need for performing accurate inference with limited data. Fundamental to the design of rapid state responses is the ability to perform epidemiological model parameter inference for localised trajectory predictions. In this work, we perform Bayesian parameter inference using Markov Chain Monte Carlo (MCMC) methods on the Susceptible-Infected-Recovered (SIR) and Susceptible-Exposed-Infected-Recovered (SEIR) epidemiological models with time-varying spreading rates for South Africa. The results find two change points in the spreading rate of COVID-19 in South Africa as inferred from the confirmed cases. The first change point coincides with state enactment of a travel ban and the resultant containment of imported infections. The second change point coincides with the start of a state-led mass screening and testing programme which has highlighted community-level disease spread that was not well represented in the initial largely traveller based and private laboratory dominated testing data. The results further suggest that due to the likely effect of the national lockdown, community level transmissions are slower than the original imported case driven spread of the disease.

The first reported case of the novel coronavirus (SARS-CoV-2) in South Africa was announced on 5 March 2020, following the initial manifestation of the virus in Wuhan China in December 2019 [1] [2] [3] . Due to its further spread and the severity of its associated clinical outcomes, the disease was subsequently declared a pandemic by the World Health Organisation (WHO) on 11 March 2020 [1, 2] . In South Africa, by 26 April 2020, 4546 people had been confirmed to have been infected by the coronavirus with 87 fatalities [4] .

Numerous states have attempted to minimise the growth in number of COVID-19 infections [1, 5, 6] . These attempts are largely based on non-pharmaceutical interventions (NPIs) aimed at separating the infectious population from the susceptible population [1] .

These initiatives aim to strategically reduce the increase in infections to a level where their healthcare systems stand a chance of minimising the number of fatalities [1] . Some of the critical indicators for policymaker response planning include projections of the infected population, estimates of health care service demand and whether current containment measures are effective [1] . As the pandemic develops in a rapid and varied manner in most countries, calibration of epidemiological models based on available data can prove to be [7] . This difficulty is further escalated by the high number of asymptomatic cases and the limited testing capacity [1, 2] .

A fundamental issue when calibrating localised models is inferring parameters of compartmental models such as susceptible-infectious-recovered (SIR) and the susceptible-exposedinfectious-recovered (SEIR) that are widely used in infectious disease projections. In the view of public health policymakers, a critical aspect of projecting infections is the inference of parameters that align with the underlying trajectories in their jurisdictions. The spreading rate is a parameter of particular interest which is subject to changes due to voluntary social distancing measures and government-imposed contact bans.

The uncertainty in utilising these models is compounded by the limited data in the initial phases and the rapidly changing dynamics due to rapid public policy changes.

To address these complexities, we utilise the Bayesian Framework for the inference of epidemiological model parameters in South Africa. The Bayesian framework allows for both incorporation of prior knowledge and principled embedding of uncertainty in parameter estimation.

In this work we combine Bayesian inference with the compartmental SEIR and SIR models to infer time varying spreading rates that allow for quantification of the impact of government interventions in South Africa.

Compartmental models are a class of models that is widely used in epidemiology to model transitions between various stages of disease [1, 8, 9] . We now introduce the Susceptible-Exposed-Infectious-Recovered (SEIR) and the related Susceptible-Infectious-Recovered (SIR) compartmental models that have been dominant in COVID-19 modelling literature [1, 5, 6, 10] .

The Susceptible-Exposed-Infectious-Recovered Model. The SEIR is an established epidemiological model for the projection of infectious diseases. The SEIR models the transition of individuals between four stages of a condition, namely:

• being susceptible to the condition,

• being infected and in incubation • having the condition and being infectious to others and • having recovered and built immunity for the disease.

The SEIR can be interpreted as a four-state Markov chain which is illustrated diagrammatically in Fig 1. The SEIR relies on solving the system of ordinary differential equations below representing the analytic trajectory of the infectious disease [1] .

Where S is the susceptible population, I is the infected population, R is the recovered population and N is the total population where N = S + E + I + R. λ is the transmission rate, σ is the rate at which individuals in incubation become infectious, and μ is the recovery rate. 1/σ and 1/μ therefore, become the incubation period and contagious period respectively. We also consider the Susceptible-Infectious-Recovered (SIR) model which is a subclass of the SEIR model that assumes direct transition from the susceptible compartment to the infected (and infectious) compartment. The SIR is represented by three coupled ordinary differential equations rather than the four in the SEIR. Fig 2 depicts the three states of the SIR model.

The basic reproductive number R 0 . The basic reproductive number (R 0 ) represents the mean number of additional infections created by one infectious individual in a susceptible population. According to the latest available literature, without accounting for any social distancing policies the R 0 for COVID-19 is between 2 and 3.5 [2, 6, 10, 11] . R 0 can be expressed in terms of λ and μ as:

Extensions to the SEIR and SIR models. We use an extended version of the SEIR and SIR models of [6] that incorporates some of the observed phenomena relating to COVID-19. First we include a delay D in becoming infected (I new ) and being reported in the confirmed case statistics, such that the confirmed reported cases CR t at some time t are in the form [6] :

We further assume that the spreading rate λ is time-varying rather than constant with change points that are affected by government interventions and voluntary social distancing measures. 

We follow the framework of [6] to perform Bayesian inference for model parameters on the South African COVID-19 data. The Bayesian framework allows for the posterior inference of parameters which updates prior beliefs based on a data-driven likelihood. The posterior inference is governed by Bayes theorem as follows:

Where P(W|D, M) is the posterior distribution of a vector of model parameters (W) given the model(M) and observed data(D), P(D|W, M) is the data likelihood and P(D) is the evidence.

The likelihood. The Likelihood indicates the probability of observing the reported case data given the assumed model. In our study, we adopt the Student-T distribution as the Likelihood as suggested by [6] . Similar to a Gaussian likelihood, the Student-T likelihood allows for parameter updates that minimise discrepancies between the predicted and observed reported cases.

Priors. Parameter prior distributions encode some prior subject matter knowledge into parameter estimation. In the case of epidemiological model parameters, priors incorporate literature based expected values of parameters such as recovery rate(μ), spreading rate(λ), change points based on policy interventions etc.

The prior settings for the model parameters are listed in Table 1 . We follow [6] by selecting LogNormal distributions for λ and σ such that the initial mean basic reproductive number is 3.2 which is consistent with literature [2, 5, 6, 10, 12] . We set a LogNormal prior for the σ such that the mean incubation period is five days. We use the history of government interventions to set priors on change points in the spreading rate. The priors on change-points include 19/ 03/2020 when a travel ban and school closures were announced, and 28/03/2020 when a national lockdown was enforced. We keep the priors for the Lognormal distributions of the spreading rates after the change points weakly-informative by setting the same mean as λ 0 and higher variances across all change points. This has the effect of placing greater weight on the data driven likelihood. Similar to [6] we adopt weakly-informative Half-Cauchy priors for the initial conditions for the infected and exposed populations.

Markov Chain Monte Carlo (MCMC). Given that the closed-form inference of the posterior distributions on the parameters listed in Table 1 is infeasible, we make use of Markov Chain Monte Carlo to sample from the posterior. Monte Carlo methods approximate solutions [6, 10] . In this work, we explore inference using Metropolis-Hastings (MH), Slice Sampling and No-U-Turn Sampler (NUTS).

Metropolis Hastings (MH). MH is one of the simplest algorithms for generating a Markov Chain which converges to the correct stationary distribution. The MH generates proposed samples using a proposal distribution. A new parameter state W t � is accepted or rejected probabilistically based on the posterior likelihood ratio:

A common proposal distribution is a symmetric random walk obtained by adding Gaussian noise to a previously accepted parameter state. Random walk behaviour of such a proposal typically results in low sample acceptance rates. While sample acceptance is guaranteed with slice sampling, a large slice window can lead to computationally inefficient sampling while a small window can lead to poor mixing.

Hybrid Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS). Metropolis-Hastings (MH) and slice sampling tend to exhibit excessive random walk behaviour-where the next state of the Markov Chain is randomly proposed from a proposal distribution [13] [14] [15] . This results in low proposal acceptance rates and small effective sample sizes.

HMC proposed by [16] reduces random walk behaviour by adding auxiliary momentum variables to the parameter space [15] . HMC creates a vector field around the current state using gradient information, which assigns the current state a trajectory towards a high probability next state [15] . The dynamical system formed by the model parameters W and the auxiliary momentum variables p is represented by the Hamiltonian H(W, p) written as follows [15, 16] :

Where M(W) is the negative log-likelihood of the posterior distribution in Eq 7, also referred to as the potential energy. K(p) is the kinetic energy defined by the kernel of a Gaussian with a covariance matrix M [17] :

The trajectory vector field is defined by considering the parameter space as a physical system that follows Hamiltonian Dynamics [15] . The dynamical equations governing the trajectory of the chain are then defined by Hamiltonian equations at a fictitious time t as follows [16] :

In practical terms, the dynamical trajectory is discretised using the leapfrog integrator. In the leapfrog integrator to reach the next point in the path, we take half a step in the momentum direction, followed by a full step in the direction of the model parameters-then ending with another half step in the momentum direction.

Due to the discretising errors arising from leapfrog integration a Metropolis acceptance step is then performed in order to accept or reject the new sample proposed by the trajectory [15, 18] . In the Metropolis step the parameters proposed by the HMC trajectory w � are accepted with the probability [16] :

Pðw � jD; a; b; HÞ PðwjD; a; b; HÞ

Algorithm 1 shows the pseudo-code for the HMC where � is a discretisation stepsize. The leapfrog steps are repeated until the maximum trajectory length L is reached. The HMC algorithm has multiple parameters that require tuning for efficient sampling, such as the step size and the trajectory length. In terms of trajectory length, a trajectory length that is too short leads to random walk behaviour similar to MH. While a trajectory length that is too long results in a trajectory that inefficiently traces back.

The stepsize is also a critical parameter for sampling, small stepsizes are computationally inefficient leading to correlated samples and poor mixing while large stepsizes compound discretisation errors leading to low acceptance rates. Tuning these parameters requires multiple time consuming trial runs.

NUTS automates the tuning of the leapfrog stepsize and trajectory length. In NUTS the stepsize is tuned during an initial burn-in phase by targeting particular levels of sample acceptance. The trajectory length is tuned by iteratively adding steps until either the chain starts to trace back (U-turn) or the Hamiltonian explodes (becomes infinite). We use the samplers described above to calibrate the SEIR and SIR models on daily new cases and cumulative cases data for South Africa up to and including 20 April 2020 provided by Johns Hopkins University's Center for Systems Science and Engineering(CSSE) [3] .

SIR and SEIR model parameter inference was performed using confirmed cases data up to and including 20 April 2020 and MCMC samplers described in the methodology section. Each of the samplers are run such that 5000 samples are drawn with 1000 burn-in and tuning steps. We use leave-one-out(LOO) cross-validation error of [19] to evaluate the goodness of fit of each model. Table 2 shows the LOO validation errors of the various models. It can be seen that the SIR model with two change points as the best model fit with the lowest mean LOO of 448.00. The 

SEIR model with two change points showed a mean LOO of 459.94. We note that [6] similarly finds that the SIR model displayed superior goodness of fit to the SEIR on German data. We now further present detailed results of the SIR and SEIR models with inference using NUTS, the trace plots from these models indicating stationarity in the sampling chains are provided in S2 and S5 Figs. The trace plots for the SIR and SEIR models using MH are provided in S3 and S6 Figs, while similar trace plots for slice sampling are provided in S4 and S7 Figs. The trace plots largely indicate that the NUTS sampler displays greater agreement between parallel chains thus lower rhat values. 

Time-varying spread rates allow for inference of the impact of various state and societal interventions on the spreading rate. Fig 5 shows the fit and projections based on SIR models with zero, one and two change points. As can be seen from the plot the two change point model best captures the trajectory in the development of new cases relative to the zero and one change point models. The superior goodness of fit of the two change point model is also illustrated in Table 2 . The fit and projections showing similar behaviour on the SEIR model with various change points are shown in Fig 6. 

The mean reporting delay time in days was found to be 6.848 (CI [5.178, 8.165 ]), literature suggests this delay includes both the incubation period and the test reporting lags. The posterior distribution incubation period from the SEIR model in Fig 7 yields The inference of parameters is dependent on the underlying testing processes that generate the confirmed case data. The effect of the mass screening and testing campaign was to change the underlying confirmed case data generating process by widening the criteria of those eligible for testing. While initial testing focused on individuals that either had exposure to known cases or travelled to known COVID-19 affected countries, mass screening and testing further introduced detection of community level transmissions which may contain undocumented contact and exposure to COVID-19 positive individuals.

We have performed Bayesian parameter inference of the SIR and SEIR models using MCMC and publicly available data as at 20 April 2020. The resulting parameter estimates fall in-line with the existing literature in-terms of mean baseline R 0 (before government action), mean incubation time and mean infectious period [2, 5, 6, 10] .

We find that initial government action that mainly included a travel ban, school closures and stay-home orders resulted in a mean decline of 80% in the spreading rate. Further government action through mass screening and testing campaigns resulted in a second trajectory change point. This latter change point is mainly driven by the widening of the population eligible for testing, from travellers (and their known contacts) to include the generalised community who would have probably not afforded private lab testing which dominated the initial data. This resulted in an increase of R 0 to 1.304. The effect of mass screening and testing can also be seen in Fig 9 indicating a mean increase in daily tests preformed from 1639 to 4374.

The second change point illustrates the possible existence of multiple pandemics, as suggested by [20] . Thus testing after 28 March is more indicative of community-level transmissions that were possibly not as well documented in-terms of contact tracing and isolation relative to the initial imported infection driven pandemic. This is also supported by the documented increase in public laboratory testing (relative to private) past this change point, suggesting health care access might also play a role in the detection of community-level infections [21] .

We have utilised a Bayesian inference framework to infer time-varying spreading rates of COVID-19 in South Africa. The time-varying spreading rates allow us to estimate the effects of government actions on the dynamics of the pandemic.

The results indicate a decrease in the mean spreading rate of 60%, which mainly coincides with the containment of imported infections, school closures and stay at home orders.

The results also indicate the emergence of community-level infections which are increasingly being highlighted by the mass screening and testing campaign. The development of the community level transmissions (R 0 � 1.3041 (CI[0.887, 1.7748])) of the pandemic at the time of publication appears to be slower than that of the initial traveller based pandemic (R 0 � 3.278 (CI[2.715, 3.73])).

A future improvement to this work could include extensions to regional and provincial studies as current data suggests varied spreading rates both regionally and provincially. As more government interventions come to play priors on more change points might also be necessary. 

On Data-Driven Management of the COVID-19 Outbreak in South Africa. medRxiv

COVID-19)

An interactive web-based dashboard to track COVID-19 in real time. The Lancet infectious diseases

Coronavirus disease (COVID-19) case data-South Africa

Impact of nonpharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand

Inferring COVID-19 spreading rates and potential change points for case number forecasts

An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China

Compartmental Models in Epidemiology

An introduction to compartmental modeling for the budding infectious disease modeler

Fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the SARS-CoV-2 epidemic

Feasibility of controlling 2019-nCoV outbreaks by isolation of cases and contacts. medRxiv

Detecting Suspected Epidemic Cases Using Trajectory Big Data

Bayesian Automatic Relevance Determination for Feature Selection in Credit Default Modelling

Sampling Techniques in Bayesian Finite Element Model Updating

Automatic Relevance Determination Bayesian Neural Networks for Credit Card Default Modelling

Bayesian learning via stochastic dynamics

Bayesian learning for neural networks

MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo

Practical Bayesian Model Evaluation Using Leave-One-out Cross-Validation and WAIC

SA's COVID-19 Pandemic Trends and Next Steps

Update on COVID-19