key: cord-0485815-vcyhx7sp
authors: Karlen, Dean
title: Characterizing the spread of CoViD-19
date: 2020-07-14
journal: nan
DOI: nan
sha: e9a29b9192e642d09b502557a9343d53e404d410
doc_id: 485815
cord_uid: vcyhx7sp

Since the beginning of the epidemic, daily reports of CoViD-19 cases, hospitalizations, and deaths from around the world have been publicly available. This paper describes methods to characterize broad features of the spread of the disease, with relatively long periods of constant transmission rates, using a new population modeling framework based on discrete-time difference equations. Comparative parameters are chosen for their weak dependence on model assumptions. Approaches for their point and interval estimation, accounting for additional sources of variance in the case data, are presented. These methods provide a basis to quantitatively assess the impact of changes to social distancing policies using publicly available data. As examples, data from Ontario and German states are analyzed using this framework. German case data show a small increase in transmission rates following the relaxation of lock-down rules on May 6, 2020. By combining case and death data from Germany, the mean and standard deviation of the time from infection to death are estimated.

The CoViD-19 epidemic gained world-wide attention in March 2020 as the number of cases began to rise rapidly.

For most people, this was their first experience with an epidemic and public health measures enacted to reduce social contact. While these measures caused major disruptions in daily life, they eventually reduced the rate of growth in case numbers. Several regions saw localized outbreaks, particularly in long term care facilities and meat processing facilities. After a few months, social distancing regulations began to be relaxed, with the intention that viral spread would kept at a manageable level.

While this describes the general experience in many western nations, there are significant differences seen within some nations, in their timelines and growth rates. This paper presents methods to characterize differences in the spread of CoViD-19 that can be deduced from publicly available data. This information can be useful for assessing the current situation and to identify successful strategies to reduce transmission. This paper introduces a new general-purpose modelling framework developed to study the spread of CoViD-19. A simple model, chosen to interpret CoViD-19 daily data, is described and its properties are investigated. Comparative statistics, chosen to have weak dependence on model assumptions, are defined. Data analysis approaches to estimate model parameters and their uncertainties are described, along with studies of simulated data. 1 University of Victoria 2 TRIUMF

The python Population Modeller (pyPM 1 ), is a general framework for building models of viral spread using discretetime difference equations. A pyPM model is an object that consists of a set of population objects connected by an ordered list of directional connector objects. Parameter objects are used to manage the various adjustable parameters necessary to define a specific model. The object oriented design separates the task of model design from numerical implementation which reduces the risk of implementation errors and simplifies the process of model redesign. The model object, containing the model design and its parameters, can be saved in small files, allowing for a multitude of models to be cataloged.

Two main reasons favor the use of discrete-time difference equations for this study, over the traditional ordinary differential equation approach. Firstly, with discrete-time difference equations, it is straightforward to implement arbitrary time delay distributions, such as normal distributions with mean and standard deviations specified as parameters. Secondly, the purpose of the modeller is to interpret publicly available data reported as daily counts, and with a step size of one day, daily expectations or simulated daily counts are computed directly. The daily values are saved as a time series in the population objects, and referred to as "histories".

The basic connector, called a propagator, transfers a fraction of incoming members from one population (referred to as the "from-population") to another (referred to as the "to-population") according to a delay distribution. The delayed transfer is accomplished by having each population maintain a list of incoming contributions for each day in the future. A "splitter" connector divides the incoming "from-population" members to two or more "to-populations".

A "multiplier" connector produces a number of new members in the "to-population", based on the sizes of the "frompopulations" and forms the basis of the infection cycle. "Subtractor" and "Adder" connectors subtract from or add to populations without delay.

Prior to taking a time step, the connectors are processed in order, with each calculating the future contributions to its "to-populations", given the number of "from-population" members arriving at the next time step. After all calculations are completed, the time step is taken, by having each population extend its history with a new value, consisting of the previous day's number and the future contribution for that day.

As an illustration, consider a model in which a fraction f u of the symptomatic population S are admitted into ICU, population U , and those are split into two populations, a fraction f v are treated with a ventilator V , and the remaining are released without ventilation, W . If the current time step is t, the expected incoming population for S is E[∆S t+1 ].

Future expected contributions to population U arising from that incoming population to S are:

where g u is the distribution that defines the delay in the symptomatic population transferring to the ICU population.

Following that calculation, the future contributions to the ventilated and non-ventilated populations, V and N , are calculated in a similar fashion, Alternatively, instead of calculating expectation values, the calculations can be performed as a stochastic simulation,   where population sizes are defined by integers, for example 

where B returns a binomial variate and M returns multinomial variates. The stochastic simulation of the infection cycle would use a Poisson variate if infections were described by independent events. To account for grouping of infections, negative binomial variates are used, parameterized by the mean µ and p nb ∈ (0, 1), such that the variance is µ/p nb .

The stochastic calculations are useful for checking for bias and for evaluating standard deviations of estimators.

The pyPM reference model 2.3 is a simple model developed to characterize the spread of CoViD-19. It includes an infection cycle connected to populations that correspond to publicly available data for cases, deaths, and hospitalization, including ICU and ventilator use.

Results in this paper use pyPM reference model 2.3 as the nominal model, and its infection cycle is defined by three connectors. Firstly, for each day, the expected number of new infections is calculated as:

where α is the transmission rate, S the susceptible population, N the total population, and C the circulating contagious population. In the initial stages of an epidemic, when almost the entire population is susceptible, α represents the average number of people that a contagious person infects in one day. Its value can be considered to be linearly proportional to the number of contacts and closeness of contact between individuals, and therefore it is a parameter that is linearly proportional to social distancing measures.

The second connector propagates a fraction of the newly infected population to the circulating contagious population, with a delay distribution modeling the so-called "latent period". The third connector reduces the size of the circulating contagious population using a propagator to a removed population and a subtractor is used to remove them.

The propagator uses a delay distribution to represent the "contagious circulation period" which arises from all ways that contagious individuals become unable to infect others, including quarantine, hospitalization, natural recovery, or death.

In the initial stages and when the transmission rate and circulation time are constant, the "steady state" solution to the time difference equations has the expected contagious population being exponential in time,

The parameter describing fractional daily growth, identified as δ in this paper, is often referred to as r in epidemiology literature. To ensure that the initial state corresponds to a "steady state" solution, the model uses a "boot-strap" approach. A state with a very small expectation value for the contagious population size is allowed to grow until the target contagious population for the initial state is reached. While only the initial contagious population needs to be specified, all other populations will be assigned non-zero expectation values at the initial state that correspond to a "steady state" solution.

The infection cycle model is purposefully simplistic, being described by a homogeneous population. This reduces the number of parameters and characterizes the epidemic spread for those groups in the population that contribute the greatest numbers to cases, hospitalization, or deaths. While the pyPM framework includes ensembles for modelling heterogeneous populations, there is little public data available to constrain the many additional parameters. For this paper, we characterize the spread using a homogeneous model.

The pyPM reference model 2.3 connects the contagious population to the symptomatic population and from symptomatic to reported (positive test cases), icu admission, and non-icu admission populations. It also has an independent propagator from contagious to recovery or death. Each of these connectors is parametrized by a fraction and a normal delay distribution.

If testing captures a constant fraction of contagious individuals, then the expected cases will follow the contagious population trajectory with a time lag. The case data can therefore be used to estimate δ during these periods without reference to a particular model. Hospitalization and death rates provide additional measures of the infection trajectory.

When these are seen to follow different growth curves, this may indicate differences in transmission by age or other factors, since these data are unequal samplings of the infected population.

It would be useful to characterize the phases of an epidemic with a transmission rate parameter like α that in some sense is linearly proportional to social distancing. By doing so, statements can be made about relative levels of social distancing observed in past phases and levels required going forward. Unfortunately, unlike for δ, estimating the transmission rate, α, from case data sensitively depends on the latent and circulation delay distributions which are not well known. The well known parameter to characterize the growth rate, the reproduction number R 0 , also suffers from strong model dependence 2,3 . Figure 1 shows how changing the delay distributions affects the relation between the exponential growth and the transmission rate. Increasing the mean circulation period increases δ, since contagious individuals have more time to infect more people. Increasing the latent period reduces δ for a growth phase and increases δ for a decline phase, since newly infected individuals take longer to become contagious thereby reducing the feedback responsible for producing the exponential growth or decline.

To further illustrate the sensitivity to the delay distribution parameters, consider the default conditions for pyPM reference model 2.3, which describes a period of growth followed by a period of decline:

• C 0 (initial contagious population): cont 0 = 10 • α 0 (initial transmission rate): alpha 0 = 0.4

• t 1 (day # for transmission rate change): trans rate 1 time = 20

• α 1 (new transmission rate): alpha 0 = 0.1

• µ (latent period mean): cont delay mean = 5 (days)

• σ (latent period standard deviation): cont delay sigma = 3 (days)

• c µ (circulation period mean): removed delay mean = 7 (days)

• c σ (circulation period standard deviation): removed delay sigma = 3 (days) If case data produced according to this nominal model, is analyzed with a modified model, having different assumptions for the latent and circulation periods, the estimators for α 0 and α 1 will be biased. To estimate the bias, assume that estimators for the model independent parameters, δ 0 and δ 1 , are unbiased. Given a choice for the delay parameters, the relation betweenα andδ can be found empiricallŷ

which would be the inverse of the functions shown in Fig. 1 . With sufficient statistics, the bias in the transmission rate estimators would be approximately Table 1 shows the expectation values and the relative bias for the transmission rate estimators for reasonable alternative delay parameter values. For many cases, the bias is larger than typical standard deviations of the estimators (statistical uncertainty). When alternative latent period parameters are used, the bias for the growth and decline estimators have opposite sign.

From this study, we find that transmission rates, or relative transmission rates, are not good choices for characterizing growth or decline due to their sensitivity to poorly known model parameters. Instead, in this paper we use the nominal model parameters to form estimators for α and convert those to estimators for δ. As shown in section 4, the biases in estimators for δ are found to be small. Table 1 . The size of transmission rate estimator biases for alternative delay parameter values for an epidemic having a growth period followed by decline, as described in the text. The last column shows that the ratio of the estimated transmission rates varies from 3 to 6 depending on the delay parameters. 

In addition to the rate of growth or decline, represented by δ, the size of the circulating contagious population is important to characterize the state of the epidemic. This is not directly measured by case data and estimators are affected by many unknown factors, such as the fraction of infected individuals who are tested. While the absolute number may be poorly known, a relative indicator that has weaker model dependence would be useful to indicate relative prevalence between two regions or between two different periods within a region. For this purpose, case data is analyzed using the nominal model, and the deduced contagious population is scaled by the ratio of total cases to total infections. The result, U C, for "uncorrected circulating contagious population", would need to be divided by the fraction of infected individuals tested, to be an estimate for the absolute size of the contagious population. The correction factor is not necessary to compare the relative prevalence in regions with similar testing policy.

Just as for α, the estimator for U C depends on the latent and circulation delay parameters. To illustrate the sensitivity, the same combinations for delay parameters are considered and the ratio of U C mod /U C nom is shown in Fig. 2 for the same conditions as described in section 3.3. Large deviations up to about 30% are seen, but this effect is small compared to the observed range of prevalence which spans several orders of magnitude. information that can distinguish these two hypotheses, it may be necessary to wait until for the case data itself to identify whether or not the region is experiencing a new rate of growth. During the initial phase of the epidemic in western nations, in March 2020, testing policy and availability were changing significantly. This may account for the fact that in many regions, including in Canada and the United States, the number of cases did not follow an exponential growth trajectory in early March. After that, the growth in number of cases are generally well described by models with relatively long periods of constant transmission and testing rates, even though testing availability generally increased. This suggests that revised testing policies enacted after March did not substantially change the fraction of infected individuals getting tested. Should a testing or reporting policy change cause a rapid rise in cases, this could be misinterpreted as a reporting anomaly or a localized outbreak, but this will not cause δ to be overestimated. It is the exponential growth of cases that determines δ, not the absolute number of cases.

Estimating the growth and size of the contagious population from case data, and interval estimation in particular, presents a challenge due to the large variance in the daily reported numbers. Occasionally, a very large number of cases is reported on one day, as a backlog of cases clears some bottleneck in the reporting process. These anomalies are normally announced and they are modelled by an injection into the reported population. Even ignoring those rare situations, daily case numbers have variance that far exceeds that expected in a model with independent infected individuals being tested as they become symptomatic. The pyPM model includes variation that arises from the reporting process itself, in which a variable fraction of cases are held back and reported the next day. The fraction is drawn from a uniform distribution between rn and 1, where rn is the reporting noise parameter. This produces a negative autocorrelation between one day and the next, which can be compared with actual data. A second parameter, noise backlog, is provided to allow that only a fraction of the backlog being reported the next day, to reduce the autocorrelation effect in the data. By including this additional source of variance, the intervals for the growth parameter estimates can grow by a factor of 2 or more.

As an example, the daily cases from the province of Ontario is shown, along with the expectations from the model fit to that data in Fig. 3 . The first 110 days are generally described by two transitions of transmission rate (on day 26 and 44) and one broad reporting anomaly centered at day 91. The day-to-day variation is larger than the model with no reporting noise. In many jurisdictions, the effect of reporting noise is even larger than this. To match the goodness of fit, calculated by assuming the daily data follow a Poisson distribution, the reporting noise parameter is set to 0.7.

Other provincial and state data typically require additional reporting noise with this parameter set between 0 and 0.5.

To match the goodness of fit for the cumulative cases, the negative binomial parameter for the infection cycle p nb is set to 0.1. These parameters were set by looking at the goodness of fit for 10 fitted simulation samples. 

As indicated in section 3.6, the variance in the daily case counts exceeds that expected in a simple model, and therefore an additional source of variation is included in the model to represent variation in daily reporting. A more significant complication is that the daily cases do not represent outcomes of independent random variables. Having a larger than expected number of infections for one day will generally result in larger than expected number of cases (and a larger than expected number of infections) in subsequent days. This effect is illustrated in Fig. 4 . Table 2 shows that the biases are less than 1 standard deviation for all parameters. 

Public data from regions with significant CoViD-19 hospitalizations can be used as an alternative measure of the exponential growth rate parameter δ. In the simplified homogeneous model assumed for this paper, daily hospital admissions and the number of individuals in hospital will also follow exponential growth or decline during periods of constant transmission rate. For the Ontario public hospitalization data corresponding to the case data used for the fits in Table 3 , there is sufficient data to estimate δ 2 . The results are found to beδ 2h = −0.003 ± 0.012 (in-hospital data) andδ 2i = −0.022 ± 0.010 (in-ICU data), both consistent with the case data estimate (δ 2 = −0.019 ± 0.003), albeit with much larger statistical uncertainties, given the limited number of hospitalizations as compared to positive tests.

Differences between case data and hospitalization estimates for δ could arise in models with heterogeneous populations, accounting for the fact that the population requiring hospitalization for CoViD-19 is older than the population tested positive for the disease, and the transmission rates and circulation periods may differ by age. In British Columbia, for example, the median age for the hospitalized population is 69 years, compared to 51 years for the population who have tested positive 5 . Table 4 . Figure 5 shows the cumulative distributions for cases and deaths for the first 45 days compared to the model predictions that use the fitted parameter values.

To estimate bias and standard deviations of the estimators, 200 simulated samples were produced using these point estimates, and fit keeping the transition dates fixed. The mean and standard deviations of the estimates are reported in Table 5 . In separate studies, the standard deviations of transition date estimates were found to be typically 1 day.

The estimated dates for the second transition has a mean of 22.6 days following March 1, and a standard deviation Table 4 for the period March 1 -April 15, 2020. The dashed lines show the model predictions for the cumulative deaths during this period. Figure 6 . Summary of the growth rate estimates (corrected for bias) and their one standard deviation statistical uncertainties for the four periods for each state in Germany.

Apr-05 Apr-12 

A test was performed for the statistical uncertainties assigned for growth rate estimates, by comparing estimates for the growth rates for two independent periods (A) March 27 -April 15 and (B) April 17 -May 6. Lock-down measures were fixed during this period, so growth rates would be expected to be nearly the same for the two periods. The cumulative cases and deaths for those periods are shown in Fig. 7 Model fits were performed using case data for those periods separately, by optimizing only the growth rate parameter, δ 2 , and the scale parameter C 0 (the initial size of the contagious population). The same approach was applied to 100 simulated samples using the point estimates shown in Table 4 . The differences between the point estimates for the two periods, ∆ =δ 2A −δ 2B , for data and the simulated samples are summarized in Fig. 8 

As described above, a transition in transmission rates was forced on May 6, so that the estimated growth rate parameters before (δ 2 ) and after (δ 3 ) that date can be compared. The test of bias and statistical uncertainty shown in section 5.1 confirms the method to estimate the statistical significance of the observed difference in growth rate.

After correcting the point estimates in Table 4 for biases deduced from fits to simulated samples as reported in Table 5 , the weighted average difference of the point estimates for the 13 German states isδ 3 −δ 2 = 0.0145 ± 0.0035, a small but statistically significant increase. Even with 50 days of data following relaxation, the growth rate estimatesδ 3 have large uncertainty due to the diminishing numbers of cases.

As discussed in section 3.4 the "uncorrected circulating contagious population" (or U C) is a useful comparative statistic for the size of the contagious population, with reduced systematic uncertainty. The U C history for each of the German states is shown in Fig. 9 relative to the state population. The relative sizes for different states were within a factor of about 5 throughout the period shown.

After estimating the transmission rates using the case data, the death delay distribution parameters are estimated in separate fits of the model to the cumulative death data. In these fits, the transmission rates are fixed to the point estimates in Table 4 and only three parameters are estimated; a scale factor (the fraction of the contagious who die), and the mean and standard deviation of the time between becoming contagious and death. The statistical uncertainty in these estimates are defined by the standard deviation of estimates from fits to 100 simulated samples. The results

from each state are shown in Fig. 10 . The data from the rapid rise in the death rate in the early stages of the epidemic carry the most information to estimate the standard deviation of the delay (σ D ). With larger values for σ D , the rise in deaths is slower. The analysis is not sensitive to a long upper tail in the delay distribution, and the assumption that the distribution is normal may be incorrect. reported from an early study of 24 deaths in China. 7

The simple pyPM model, introduced in this paper, can be used to interpret public data in order to characterize the spread of CoViD-19 in different regions around the world. With relatively few parameters, the epidemic history can be conveniently summarized, with relatively long periods having constant transmission rates (reported with the daily growth parameters δ i ) and the size of the contagious population (reported with U C).

As governments relax social distancing rules and open borders, it will be important to compare the state of the disease and to detect changes in growth rates. As the situation develops, results from additional studies will be made available on the website www.pypm.ca. 

The Failure of R0

How generation intervals shape the relationship between growth rates and reproductive numbers

Public Health Agency Canada Modelling Group weekly report

British Columbia COVID-19 Daily Situation Report

Estimates of the severity of coronavirus disease 2019: a model-based analysis