key: cord-0934219-9nft6b5e authors: TOVISSODE, C. F.; LOKONON, B. E.; GLELE KAKAÏ, R. title: On the use of growth models for forecasting epidemic outbreaks with application to COVID-19 data date: 2020-08-18 journal: nan DOI: 10.1101/2020.08.16.20176057 sha: 66d77a8aa0fd28b4f2fe74fedc289409bed8003f doc_id: 934219 cord_uid: 9nft6b5e The initial phase dynamics of an epidemic without containment measures is commonly well modeled using exponential growth models. However, in the presence of containment measures, the exponential model becomes less appropriate. Under the implementation of an isolation measure for detected infectives, we propose to model epidemic dynamics by fitting a flexible growth model curve to reported positive cases and to infer the overall epidemic dynamics by introducing information on the detection/testing effort and recovery and death rates. The resulting modeling approach is close to the SIQR (Susceptible- Infectious-Quarantined-Recovered) model framework. We focused on predicting the peaks (time and size) in positive cases, actives cases and new infections. We applied the approach to data from the COVID-19 outbreak in Italy. Fits on limited data before the observed peaks illustrate the ability of the flexible growth model to approach the estimates from the whole data. The current COVID-19 pandemic caused by the new coronavirus strain SARS-nCOV2 has emerged 15 from Wuhan, China (Giordano et al., 2020, Velavan and Meyer, 2020) . The COVID-19 outbreaks totalize 21 026 758 cases and 755 786 deaths across the world on August 15, 2020 (WHO, 2020) . 1 an application to daily case reporting data from Italy which has virtually completed a whole COVID-19 50 outbreak wave and thus offers the possibility to compare predicted outputs to real events. We consider a growth curve approach for modeling the course of an epidemic along time. We follow Chowell et al. (2020) , Chowell (2017) , Spencer and Golinski (2020) who among others used growth models for forecasting epidemic dynamics. 55 2.1 Structural model for epidemic incidence Let C t denote the size of the detected infected population at time t, i.e. the cumulative number of infected, identified and isolated individuals. We assume for convenience that C t is continuous and denotė C t its first derivative with respect to t. Also let I t be the true size of infectives at t, related to C t where δ t ∈ (0, 1] is the detection rate which is closely related to the testing effort (number of tests, tracing of contact persons of identified cases and targeting exposed people) and is assumed at least twice differentiable with respect to t. We ressort to the generic growth model of Turner et al. (1976) for the identified positive cases: with u t = [1 + νωρ(t − τ )] −1/ρ . In (2), K > 0 is the ultimate epidemic size (detected), ω > 0 is the 65 "intrinsic" growth constant, ν and ρ are powers (ν > 0 and −1 < ρ < ν −1 ) characterizing respectively the rates of change with respect to the initial size C 0 = δ 0 I 0 (number of cases detected at time t = 0) and the ultimate size K, and τ is a constant of integration, determined by the initial conditions of the epidemic and implicitly the detection rate δ 0 through C 0 = K 1 + (1 − νωρτ ) −1/ρ −1/ν for ρ = 0 and C 0 = K (1 + e νωτ ) −1/ν for ρ = 0. The growth model (2) is quite flexible to handle various shapes of 70 epidemic dynamics. Indeed, if K → ∞ and νρ → 0, (2) specializes to the exponential growth model where ω is the exponential growth rate. Apart from (3), other special or limiting cases of (2) include the hyper-Gompertz (ν → 0 while ων 1+ρ is constant) and the Gompertz (ν → 0, ρ → 0 while ων is constant), 3 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) the Bertalanffy-Richards (ρ → 0), the hyper-logistic (ν = 1) and the logistic (ν = 1 and ρ → 0) growth models (Turner et al., 1976) . From (2), the observed epidemic incidenceĊ t is given by In order to ensure the restriction −1 < ρ < ν −1 , we set ρ = ρ 0 ν+1 ν − 1 with ρ 0 ∈ (0, 1) free of ν. The number A t of detected and active cases along an epidemic outbreak is of high interest for public health officials. Indeed, A t must be kept under the carrying capacity of the health system to avoid overload and disrupture. The derivativeȦ t of the detected and active cases satisfies where R t = α t A t denotes the number of removed and permanently immune (mortality and recovery) at time t, and α t is the unit time removal probability, i.e. the odds for having an outcome (recovery or death), averaged over the active cases. Equation (5) fits in the SIQR (Susceptible, Infectious, Quarantined, Recovered) model framework (Hethcote et al., 2002) with the detected actives cases referred to as "quarantined" and the strong assumption that α t is constant along the epidemic outbreak (see the third 85 equation in system (6) in Hethcote et al. (2002) ). The removal probability can more generally be given the logistic form α t = e η t 1+e η t with η t = X t β + κt where X t = (X t1 , X t2 , · · · , X tq ) is a vector of q covariates (known constants) and β is the q vector of associated effects, and κ determines the change in the log-odds ratio for having an outcome per unit time. These changes in α t can be due to an improve in the health care system during the epidemic outbreak (increase in recovery ratio) or a deterioration of the 90 health care system for infected individuals (increase in mortality ratio due to the outbreak). The general solution of the differential equation (5) turns to have the form where A 0 is the number of active cases at time t = 0. Indeed, when κ = 0, taking the first derivative of (6) yieldsȦ t =Ċ t e αtt e −αtt + A 0 + t 0Ċ s e αss ds (−α t ) e −αtt =Ċ t − α t A t which is the equation (5). For t = 0, the integral in (6) vanishes, resulting as expected in A t = A 0 since e −αtt = 1. When κ = 0, the 95 first derivative of (6) isȦ Here, for t = 0, e ηt = e X t β so that A t = A 0 . There are no general closed form solutions for the integrals in (6), unlessĊ t and α t are purposely chosen as functions of time to simplify the integral. A t can however be obtained in practice from (6) using a numerical integration routine such as the function integrate in R freeware (R Core Team, 2019) or the 100 function integral of Matlab (MATLAB, 2016) . To nevertheless circumvent this issue during estimation under the generic growth model (2), we discretized the active cases A t by assuming a binomial removal process R t conditional on the detected unit time new cases Y t as where BIN(n, α) denotes a binomial distribution with n trials and success probability α and Y t is a non negative process with expectation λ t =Ċ t . Clearly, the bivariate process {A t , R t } defined by (8-7) is not 105 stationary. However, since Y t ≥ 0 andĊ t → 0 as t → ∞, we have Y t → 0 in distribution as t → ∞, and if the removal probability α t does not approach zero as t → ∞, then A t → 0 as t → ∞. The epidemic peak is an important event in the disease dynamic and can be estimated for a better management of the epidemic. For an epidemic described by the exponential growth model (3) (K → ∞ or 110 νρ → 0) does not peak. Otherwise, the peak in the detected number of infected individuals corresponds to the maximum of the incidence rateĊ t . This maximum is then attained whenC t = ∂Ċt ∂t = 0. We have from (4) SolvingC t = 0 for t using (9) yields the peak time t p = τ + 1−νρ ν(1+ρ) ρ − 1 /(νωρ) which reads, on replacing ρ = ρ 0 ν+1 ν − 1. Inserting t p in (1) and denoting u p = ν 1+ρ 1−ρν gives the peak At the peak in detected cases, the cumulative number of detected cases is C p = K(1 + u p ) −1/ν . . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 18, 2020. . https://doi.org/10.1101/2020.08.16.20176057 doi: medRxiv preprint An important interest in modeling the epidemic incidence is the derivation of quantities related to the overall dynamics of the epidemic, in both detected and undetected cases. 2.4.1 Total cases: detected and losts 120 Let us denote S t the cumulative number of cases from the epidemic outbreak to t, and letṠ t be the first derivative of S t . We also introduce Λ t , the cumulative number of lost cases (with first derivativeΛ t ), i.e. people who were infected, undetected, and removed from infectives (mortality and recovery). The size of the lost cases is determined by the unit time removal rate π t ∈ (0, 1) from undetected infectives (π t is an average over all infectives, i.e. irrespective of the time since infection onset). The lost rate 125 π t which is assumed at least twice differentiable with respect to t, depends on various factors like the disease related mortality, the average infection duration, the natural proportion of asymptomatics within infectives, and the existence and the use of medicines that may reduce symptoms (induced asymptomatics). It is worthwhile noticing that π t can be estimated from the removal rate α t in the detected cases (see section 2.2), taking into account various factors that may induce difference between the two rates. For 130 instance, since the undetected cases include asymptomatics, desease related mortality may be lower and recovery rate higher in undetected as compared to detected cases. However, efficiency of the health care system in treating identified and isolated cases can reduce mortality thereby reducing α t , but also improve recovery thereby increasing α t . With the above notations, the lost cases count Λ t satisfies the differential equation, whereas the cumulative number of cases S t is given on The factor υ t represents at time t the proportion of infectives who will potentially continue to spread the epidemic after adequate contacts (i.e. contacts sufficient for transmission) with susceptibles. In other words, the number of undetected currently infectives is (1), the infectives I t and its first derivative with respect to timeİ t are given for t ≥ 0 by whereδ t is the first derivative of the detection rate δ t with respect to t. Straightforward algebraic 6 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 18, 2020. . https://doi.org/10.1101/2020.08.16.20176057 doi: medRxiv preprint operations then give the number of new cases and the cumulative number of cases aṡ the first derivative of the lost rate π t , and the cumulative losts Λ t is given for t ≥ 0 by with S 0 the cumulative number of all cases until the first detection date t = 0. The total size of the 145 Let us assume a constant detection rate δ t = δ closely related to detection effort but also to the average duration from infection to recovery or death of non-isolated cases. Assuming in addition a constant lost rate (π t = π), we haveδ t =π t =υ t = 0, and the new casesṠ t and its accumulation S t as well as the lost cases Λ t simplify to The total epidemic size is here S ∞ = S 0 + 1 + π δ −1 − 1 K. At the time t p of the peak of reported cases (C t = 0) under constant detection and lost rates, the new infectives isṠ p = 1 + π δ −1 − 1 Ċ p withĊ p given in (11). This however corresponds to the peak in the overall new casesṠ t only under the unrealistic assumption δ = 1. The peak of new infections occurs 155 when the second derivativeS t of S t with respect to t vanishes (S t = 0). We have from (16) where ... Ct (the third derivative of C t with respect to t) and Ψ t are given by . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 18, 2020. . https://doi.org/10.1101/2020.08.16.20176057 doi: medRxiv preprint withπ t andδ t the second derivatives of respectively π t and δ t with respect to t. The peak time and value depend on the particular forms of δ t and π t as functions of time. We here restrict attention to the simple situation with constant positive detection and lost rates (δ t = δ with δ ∈ (0, 1) and π t = π with 160 π ∈ (0, 1)) whereδ t =π t = Ψ t = 0 and (22) reduces tö It appears that the peak of new infections occurs before the time t p of the peak in detected cases. Indeed, (25) indicates that at the time t P of the peak of new infections,C t is ... CP is given by (23) with t = t P . The lower ζ, the lower 165 |C P |, and the lower the difference t p − t P (delay of the observed peak). Differentiating ζ with respect to ] 2 < 0, hence the higher δ, the lower the delay between the observed peak time and the time of the peak in new infections. Using (9) and (23), and setting z t = ut 1+ut ,S t becomes which does not have a closed form root. The root t P can however be obtained using root finding numerical routines such as the R function uniroot or the Matlab function fzero. Afterwards, the peakṠ p 170 size (the maximum number of new infections) is obtained using (19). Let us consider a record of new confirmed infected cases Y 1 , Y 2 , · · · , Y n , active cases A 0 , A 1 , · · · , A n−1 , removed cases R 1 , R 2 , · · · , R n (available from (8) as R t = Y t − A t + A t−1 ) and the associated vectors of covariates X 1 , X 2 , · · · , X n at n time points. The parameters K, ω, ν, ρ 0 , τ , and κ can be estimated using 175 maximum likelihood (ML) by assigning to each Y t an appropriate statistical distribution with expectation λ t =Ċ t and a dispersion parameter σ > 0, and probability density function (pdf) or probability mass . We subsequently consider inference under log-normal and negative binomial distributions. Epidemic incidence case data are generally fitted through non linear least squares applied at logarithmic scale , Chowell et al., 2007 . To deal with zero incidence 8 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 18, 2020. . cases, the logarithmic transform is usually applied on the shifted cases Y t + 1. Mimicking this procedure in a likelihood inference framework, we consider a log-normal distribution assumption for the shifted incidence cases, i.e. Y t + 1 ∼ LN(λ t + 1, σ). The pdf of Y t , adapted from Limpert et al. (2001), reads so that Y t has expectation E[Y t ] = λ t and variance V ar[Y t ] = (λ t + 1) 2 e σ 2 − 1 . Since incidence cases are counts, Y t can be assumed to follow the negative binomial distribution, i.e. The incidence case Y t then has expectation E[Y t ] = λ t and variance V ar[Y t ] = λ t (1 + σλ t ). 190 Based on the information {Y t , R t } for t = 1, 2, · · · , n, the conditional log-likelihood of the parameter θ where f B (R t |θ) = At−1+Nt In order to test the reliability of the Turner's growth model in predicting the dynamics of an epidemic, we use data from one of the countries which has completed a whole COVID-19 outbreak wave. The daily case reporting data in Italy has been obtained from https://github.com/CSSEGISandData/COVID-19/ tree/master/csse_covid_19_data/csse_covid_19_time_series. We use only the confirmed data 205 9 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 18, 2020. . https://doi.org/10.1101/2020.08.16.20176057 doi: medRxiv preprint (2020-02-20 to 2020-07-11) available on 2020-07-28, discarding the latest data subject to possible reporting delay, as indicated by the Istituto Superiore di Sanità (ISS) at https://www.epicentro.iss.it/en/ coronavirus/sars-cov-2-dashboard. All analyses were performed in R (R Core Team, 2019). We have fitted the Turner's growth model curve 210 to the whole italian data. Both the log-normal and the negative binomial distributions were used and the fit with the lowest root mean square error (RMSE) computed for the daily new positive cases was selected as the best. We next derived peak statistics (time and size) for daily new reported cases and actives cases. We also inferred the daily new infections from assuming constant detection and lost rates and estimated its peak (time and size). The detection (δ = 0.033/day) and the lost rates (π = 0.1/day) for 215 Italy were obtained from Pedersen and Meneghini (2020). These rates follows from assuming an average time of duration from infection to recovery or death of non-isolated cases of ten days (hence π = 0.1/day) and that during this detection window, 1/3 of infectives are tested positives (hence δ = 0.033/day). In order to assess the ability of the model to predict the peak of the new positive cases in countries which has not yet reached the peak, we retrospectively fitted the model to the italian data before the observed 220 peak (day 29 after the first case notification), using data of the first two weeks, and then data of the first three weeks. For these analyses with limited data, we fitted the full Turner's growth model to the positives cases, but also its special cases, namely the hyper-Gompertz (ν → 0 while ων 1+ρ is constant), the Gompertz (ν → 0, ρ → 0 while ων is constant), the Bertalanffy-Richards (ρ → 0), the hyper-logistic (ν = 1) and the logistic (ν = 1 and ρ → 0) models using the log-normal distribution for the daily counts. 225 We then computed the Akaike's Information Criterion (AIC) defined as AIC = −2 + 2N p with the maximized log-likelihood and N p the number of parameters in a fitted model. We finally retained and presented the best fit (lowest AIC value). 230 Table 1 shows parameter estimates using the whole italian COVID-19 daily case reporting data from 2020-02-20 to 2020-07-11, with standard errors and 95% confidence intervals. It appears that the lognormal distribution based fit recorded the lowest RMSE and is thus retained for subsequent analyses. The confidence bounds for the parameter ρ (ρ = 0.32 with CI(ρ) = [0.29, 0.35]) indicate that neither 10 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 18, 2020. . Notes: β and κ define the daily removal rate from detected cases as αt = e β+κt 1+e β+κt ; σ is the log-normal/negative binomial distribution scale parameter (see pdf (27) and pmf (28)). The AICs of the retrospective fits of Tuners's growth model and its special cases to the italian COVID-19 data of the first two weeks and the first three weeks are presented in table 3. It can be observed that 12 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 18, 2020. . https://doi.org/10.1101/2020.08.16.20176057 doi: medRxiv preprint Table 2 . Estimate, standard error (SE) and 95% confidence interval of peak statistics using the italian COVID-19 daily case reporting data from 2020-02-20 to 2020-07-11 Table 4 shows the estimate of the hyper-logistic growth model parameters for the two shorted datasets. It appears that the estimates of the intrinsic growth parameter ω increases slightly with data availability 270 from ω = 0.05 (CI ω = [0.04, 0.07]) using the data of the first two weeks, to ω = 0.07 (CI ω = [0.06, 0.08]) using the data of the first three weeks and to ω = 0.09 (CI ω = [0.07, 0.11]) using the whole dataset from Italy. The estimates of the peak time and size from the two shorted datasets are shown in table 4. It can be observed that the forecast of the peak time from the data of the first two weeks is day 44 ( t p = 43.38, 275 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 18, 2020 . . https://doi.org/10.1101 /2020 Notes: β and κ define the daily removal rate from detected cases as αt = e β+κt 1+e β+κt ; σ is the log-normal distribution scale parameter (see pdf (27) This work proposes the use of a flexible growth model for modeling case reporting data from an epidemic outbreak with containment measures including at least isolation of individuals tested positive. The generic growth model of Turner et al. (1976) offers a flexible framework with the possibility to recover many 285 special growth models such as the common exponential and the logistic growth models, the hyper-logistic, the hyper-Gompertz, the Gompertz and the Bertalanffy-Richards growth models. Since the special models are all nested within the generic model framework, the most appropriate model can be identified 14 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted August 18, 2020. . https://doi.org/10.1101/2020.08.16.20176057 doi: medRxiv preprint using information criteria such as the Akaike's An Information Criterion (AIC), but a likelihood ratio test (Wilks, 1938) can also be conducted for models with different number of free parameters. When additional 290 information can be obtained on the ability to detect infective individuals (number of tests, tracing of contact persons), the proposed framework allows to include this information so as to infer on the dynamics of the epidemic beyond the identified (positive) cases, without ressorting to mechanistic/compartmental models. From our application to the COVID-19 outbreak data in Italy, the hyper-logistic model is the most 295 appropriate model for the dataset. It appears that the modeling approach can predict the dynamics of an epidemic using data from first few days of an outbreak, at least in this example. Indeed, the predicted peak time (and size) for the positives cases (using only the first two/three weeks data) overestimates (and underestimates) the observed peak time (and size). However, the biases can be attributed for instance to the increase in the testing effort and isolation (and the subsequent decrease in the growth rate) in Italy 300 where only about 3762 tests/day were performed in the first three weeks from 2020-02-20, and about 21248 tests/day were performed in the next three weeks. Our estimate of the ratio of the number of infectives to the number of active cases averaged 22.95 in the first week of the outbreak, in the range [5, 25] obtained by Pedersen and Meneghini (2020) using the SIQR model. Our proposal thus offers a valid alternative to mechanistic models, for instance the picewise exponential growth used by Pedersen 305 and Meneghini (2020) withing the SIQR model framework on the italian early outbreak data. In a very limited data situation, we suggest a further reduction of the number of model parameters to be estimated. Indeed, since the parameter τ in the growth model (2) . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted August 18, 2020 . . https://doi.org/10.1101 /2020 A poisson autoregressive model to understand covid-19 contagion dynamics Athanasios Tsakris, and Constantinos Siettos. Data-based analysis, modelling and forecasting of the covid-19 outbreak Can the covid-19 epidemic be controlled on the basis of daily test reports? Real-time forecasting of epidemic trajectories using computational dynamic ensembles Fitting dynamic models to epidemic outbreaks with quantified uncertainty: A primer for parameter uncertainty, identifiability, and forecasts Is it growing exponentially fast?-impact of assuming exponential growth for characterizing and forecasting epidemics with initial near-exponential growth dynamics Comparative estimation of the reproduction number for pandemic influenza from daily case notification data Mathematical models to characterize early epidemic growth: A review Modelling the covid-19 epidemic and implementation of population-wide interventions in italy Effects of quarantine in six endemic models for infectious diseases Early dynamics of transmission and control of covid-19: a mathematical modelling study. The lancet infectious diseases Quantifying undetected covid-19 cases and effects of containment measures in italy R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing Modeling the covid-19 epidemic using time series econometrics. medRxiv A theory of growth The covid-19 epidemic A generalized-growth model to characterize the early ascending phase of infectious disease outbreaks WHO. Coronavirus disease 2019 (covid-19): situation report The large-sample distribution of the likelihood ratio for testing composite hypotheses. The annals of mathematical statistics