key: cord-0977803-pem3eqiz authors: Smith, Ben A. title: A novel IDEA: The impact of serial interval on a modified-Incidence Decay and Exponential Adjustment (m-IDEA) model for projections of daily COVID-19 cases date: 2020-05-30 journal: Infect Dis Model DOI: 10.1016/j.idm.2020.05.003 sha: 720466e32bdb64a153d791060501bff7476db58f doc_id: 977803 cord_uid: pem3eqiz

The SARS-CoV-2 virus causes the disease COVID-19, and has caused high morbidity and mortality worldwide. Empirical models are useful tools to predict future trends of disease progression such as COVID-19 over the near-term. A modified Incidence Decay and Exponential Adjustment (m-IDEA) model was developed to predict the progression of infectious disease outbreaks. The modification allows for the production of precise daily estimates, which are critical during a pandemic of this scale for planning purposes. The m-IDEA model was employed using a range of serial intervals given the lack of knowledge on the true serial interval of COVID-19. Both deterministic and stochastic approaches were applied. Model fitting was accomplished through minimizing the sum-of-square differences between predicted and observed daily incidence case counts, and performance was retrospectively assessed. The performance of the m-IDEA for projecting cases in the near-term was improved using shorter serial intervals (1–4 days) at early stages of the pandemic, and longer serial intervals (5–9 days) at mid- to late-stages thus far. This, coupled with epidemiological reports, suggests that the serial interval of COVID-19 might increase as the pandemic progresses, which is rather intuitive: increasing serial intervals can be attributed to gradual increases in public health interventions such as facility closures, public caution and social distancing, thus increasing the time between transmission events.
In most cases, the stochastic approach captured the majority of future reported incidence data, because it accounts for the uncertainty around the serial interval of COVID-19. As such, it is the preferred approach for using the m-IDEA during a dynamic situation such as the midst of a major pandemic.

The SARS-CoV-2 virus causes the disease COVID-19, and has caused high morbidity and mortality worldwide (WHO, 2020; Wu et al., 2020). COVID-19 was first reported in Canada on January 25, 2020 (COVID-19 Canada Open Data Working Group, 2020). This case, and many subsequent cases over the following weeks, were linked to international travel (COVID-19 Canada Open Data Working Group, 2020). The WHO declared COVID-19 a pandemic on March 11, 2020, the same day that the case total reported in Canada exceeded 100 (Government of Canada, 2020). It might never be known precisely when community transmission of the disease first occurred in Canada, but the first documented case of community transmission was reported on March 1, 2020 (COVID-19 Canada Open Data Working Group, 2020). However, it is likely that community transmission had occurred at some point before this date.

As of May 15, 2020, cumulative cases of COVID-19 continue to climb in Canada, with variations in the trends of reported cases among provinces and territories (Government of Canada, 2020). There is a need for mathematical models to provide projections of future incidence and cumulative cases in Canada to assist with decision-making. During the early to mid-stages of a novel epidemic or pandemic, very little is known of the specific mechanisms of disease transmission; this is a global issue for COVID-19. Empirical models are useful tools to predict future trends of disease progression such as case counts over the near-term. Many empirical models are available for such projections, with varying degrees of success in projecting cases in past outbreaks of infectious disease (e.g.
Chowell et al., 2016; Xiao et al., 2013).

The Incidence Decay and Exponential Adjustment (IDEA) model has two parameters: the reproduction number of the disease (R0) and d, the latter of which is a discount factor that can represent, for example, interventions implemented to curb the spread of disease (Fisman et al., 2014). In part because it does not require detailed mechanistic information on disease transmission, it has recently been suggested for "quick and dirty" forecasting in the face of an infectious disease epidemic (Tuite & Fisman, 2018). While the model has several limitations, it can provide a complement to mechanistic approaches, with the advantage of being easily and rapidly applied by public health officials (Tuite & Fisman, 2018).

The IDEA model is parameterized on the basis of the serial interval of the disease in question.

Given that the IDEA model is one tool used for producing near-term projections of cases of COVID-19 (Smith et al., submitted), the goal of this study was to explore the impact of the assumed serial interval for COVID-19 on projection accuracy using a retrospective analysis. This was accomplished by producing projections of cases seven days into the future at various stages of the pandemic in Canada.

The IDEA model is defined as (Fisman et al., 2014):

$$I(g) = \left(\frac{R_0}{(1+d)^{g}}\right)^{g} \tag{1}$$

where I(g) represents the incidence of disease in one generation of the disease (defined by the serial interval), R0 is the reproduction number, d is a discount factor that can represent the impact of public health measures to limit the spread of the disease, and g represents the disease generation (the original IDEA model denoted this as t, but it is referred to as g herein to avoid confusion with subsequent equations). While the IDEA model compares predicted cases versus reported cases aggregated by serial interval (g has historically been an integer value), the approach was modified to provide case projections on a daily basis.
The generation of the disease, g, is defined as:

$$g = \frac{t}{SerInt} \tag{2}$$

where t represents time in days, and SerInt represents the serial interval of the disease in days. This parameterization allows g to take continuous rather than discrete values, dependent on SerInt. I(g) is calculated at each day, t, and returns the number of cases occurring between t − SerInt and t. Cumulative cases at time t are then determined by summing each I(g) calculated at every corresponding serial interval from t = 0 through t:

$$C_t = \sum_{k=0}^{n} I(g)_{t - k \cdot SerInt} \tag{3}$$

where I(g)_{t − n·SerInt} represents the first determination of I(g) since the beginning of the pandemic, and I(g)_t represents the most recent determination of I(g) to date. Finally, daily incidence was calculated as:

$$I_t = C_t - C_{t-1} \tag{4}$$

In what follows, the m-IDEA model is defined as the collection of Equations (1)-(4). The m-IDEA model is a re-parameterization of the IDEA model and uses an approach that allows for calculation of precise daily estimates. For example, if one assumes a serial interval of seven days for a given disease, the original IDEA model returns estimates of incidence for each week-long period. That is, the IDEA model calculates the total cases aggregated by disease generation: total cases occurring between Days 0-6, 7-13, 14-20, etc. As a result, it is impossible to determine the precise number of cases predicted for a single day. The modified approach allows g to take intra-generational values, and precise daily estimates are determined through Equations (3) and (4).

The m-IDEA model was used to project daily case incidence using a range of serial intervals given the lack of knowledge on the true serial interval of COVID-19 at this time. It is unknown precisely when the first community-acquired case of COVID-19 occurred in Canada.
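Equations (1)-(4) can be illustrated with a minimal Python sketch. It is not the implementation used in this study (which was built in Microsoft Excel); the function names are hypothetical, Equation (1) is taken in the form given by Fisman et al. (2014), and Equation (4) is read here as the day-over-day difference in cumulative cases, with day 0 being the assumed start of community transmission.

```python
def idea_incidence(g, r0, d):
    """Equation (1): incidence for a (possibly fractional) generation g."""
    return (r0 / (1.0 + d) ** g) ** g


def cumulative_cases(t, ser_int, r0, d):
    """Equation (3): sum I(g) evaluated at t, t - ser_int, t - 2*ser_int, ...

    Terms are added for as long as the evaluation day is not before day 0.
    """
    total = 0.0
    day = t
    while day >= 0:
        g = day / ser_int  # Equation (2): generation as a continuous value
        total += idea_incidence(g, r0, d)
        day -= ser_int
    return total


def daily_incidence(t, ser_int, r0, d):
    """Equation (4), as read here: difference of consecutive cumulative totals."""
    return cumulative_cases(t, ser_int, r0, d) - cumulative_cases(t - 1, ser_int, r0, d)


# With a one-day serial interval the daily estimate reduces to Equation (1).
print(daily_incidence(3, 1, 2.0, 0.0), idea_incidence(3, 2.0, 0.0))
```

With a seven-day serial interval, `cumulative_cases(14, 7, r0, d)` adds the incidence of generations 2, 1 and 0, while `daily_incidence(10, 7, r0, d)` gives an estimate for a single day inside generation 1, which the original generation-aggregated model cannot provide.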
It was assumed for this analysis that community transmission was occurring on or before March 11, 2020 simply (and subjectively) because it was the day when reported cumulative cases in Canada exceeded 100. A simple back-calculation from this date, assuming an a priori R0 of 2, was conducted to anchor the model so that March 11, 2020 corresponded to generation 6 of the disease in Canada. The assumed occurrence of the first case of community-acquired COVID-19 in Canada was therefore dependent on the serial interval used.

The m-IDEA model was implemented in Microsoft Excel with the add-ins @Risk version 8.0 and Evolver version 8.0 (Palisade Corporation, New York, USA) to allow for Monte Carlo simulation using Latin Hypercube Sampling and optimization of model parameters. Optimization of the R0 and d parameters was accomplished through minimizing the sum-of-square differences between predicted and observed daily incidence case counts. This least-squares fitting was performed using the past twenty days of observed data, aside from the first three projection days, which used only 5, 10, and 15 days' worth of data, respectively, since that was all that was available from March 11 onwards. Near-term projections using the m-IDEA model were made every five days using data from projection day (PD) 0 (March 11, 2020) through to PD 55 (May 4, 2020). The first projection occurred on PD 5 and considered data reported from March 11, 2020 through March 15, 2020, whereas the last projection is referred to as PD 55 and considered data reported from April 15, 2020 through May 4, 2020. Case data were downloaded from www.Canada.ca on May 11, 2020. On May 3, 2020, the Quebec government announced that new case totals reported that day included 1,317 previously unreported cases from April. Therefore, incidence on May 3, 2020 was reduced by 1,317 cases to account for this reporting error.
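The anchoring and least-squares steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's Excel/Evolver workflow: a coarse grid search stands in for the Evolver optimizer, the function names and grid ranges are hypothetical, and the "observed" series is synthetic model output used only to check that known parameters are recovered.

```python
from datetime import date, timedelta


def daily_incidence(t, ser_int, r0, d):
    """m-IDEA daily cases on day t (difference of cumulative totals)."""
    def cumulative(day):
        total = 0.0
        while day >= 0:
            g = day / ser_int
            total += (r0 / (1.0 + d) ** g) ** g
            day -= ser_int
        return total
    return cumulative(t) - cumulative(t - 1)


def assumed_start_date(anchor, ser_int, anchor_generation=6):
    """Back-calculate day 0 so that `anchor` falls on the anchor generation."""
    return anchor - timedelta(days=anchor_generation * ser_int)


def fit_r0_d(days, observed, ser_int, r0_grid, d_grid):
    """Return (sse, r0, d) minimising the sum of squared daily differences.

    A plain grid search is used here as a stand-in optimizer (assumption).
    """
    best = None
    for r0 in r0_grid:
        for d in d_grid:
            sse = sum((daily_incidence(t, ser_int, r0, d) - y) ** 2
                      for t, y in zip(days, observed))
            if best is None or sse < best[0]:
                best = (sse, r0, d)
    return best


# Synthetic check: 20 days of noise-free model output starting at generation 6.
ser_int = 4
days = list(range(6 * ser_int, 6 * ser_int + 20))
observed = [daily_incidence(t, ser_int, 1.8, 0.01) for t in days]
r0_grid = [1.0 + 0.1 * i for i in range(21)]   # 1.0 to 3.0 (illustrative range)
d_grid = [0.005 * i for i in range(11)]        # 0.0 to 0.05 (illustrative range)
sse, r0_hat, d_hat = fit_r0_d(days, observed, ser_int, r0_grid, d_grid)
print(assumed_start_date(date(2020, 3, 11), ser_int), round(r0_hat, 3), round(d_hat, 3))
```

Because the anchor is fixed at generation 6, the assumed start date moves earlier by six days for every additional day of serial interval, which is why the fitted R0 and d cannot be compared across serial intervals without keeping this shift in mind.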
Model parameter solutions were derived for a range of discrete integer serial interval values on each PD using the OptQuest optimization function in Evolver, running until optimization results remained stable (defined as within 0.01% for 20,000 consecutive trials). The OptQuest engine uses methods of tabu search, scatter search, integer programming, and neural networks in a novel algorithm.

To provide a comparative visual of projections produced by both the original IDEA model and the m-IDEA, fitting was performed as described above, except that it was performed on all data reported from March 11, 2020 through May 10, 2020, arbitrarily assuming a five-day serial interval. The entire dataset was used because limiting data to only the past 20 days of reporting resulted in too few data points to fit the IDEA model.

Following analysis of the deterministic serial interval values, a stochastic approach was considered. Fitting was generally performed using the same procedure as for deterministic serial intervals. However, the serial interval used on each PD was allowed to range uniformly from either 1 to 4 days or 5 to 9 days, given the results of the deterministic approach. To derive 90% prediction intervals and minimum and maximum estimates, a Poisson error structure in the reported data was assumed. Monte Carlo simulations with Latin Hypercube Sampling were performed for 1,000 iterations to consider the impact of total uncertainty in both the serial interval of COVID-19 and the reported case data; for computational practicality, the parameter values were considered stable if they remained within 0.01% for 500 trials (rather than the 20,000 trials used for deterministic simulations).
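The stochastic procedure can be approximated with the following Python sketch. It makes several assumptions that are not taken from the study: simple random sampling replaces Latin Hypercube Sampling, the serial interval is drawn from integer days only, Poisson noise is approximated by a normal distribution with variance equal to the count, a coarse grid search replaces OptQuest, the first observation is placed on the anchor generation, and all names and grid ranges are hypothetical.

```python
import random


def daily_incidence(t, ser_int, r0, d):
    """m-IDEA daily cases on day t (difference of cumulative totals)."""
    def cumulative(day):
        total = 0.0
        while day >= 0:
            g = day / ser_int
            total += (r0 / (1.0 + d) ** g) ** g
            day -= ser_int
        return total
    return cumulative(t) - cumulative(t - 1)


def fit_r0_d(days, observed, ser_int, r0_grid, d_grid):
    """Grid-search least squares; returns the best (r0, d) pair."""
    best = None
    for r0 in r0_grid:
        for d in d_grid:
            sse = sum((daily_incidence(t, ser_int, r0, d) - y) ** 2
                      for t, y in zip(days, observed))
            if best is None or sse < best[0]:
                best = (sse, r0, d)
    return best[1], best[2]


def noisy_counts(counts, rng):
    """Poisson-like noise via a normal approximation (mean = variance = count)."""
    return [max(0.0, rng.gauss(y, y ** 0.5)) for y in counts]


def stochastic_projection(observed, ser_range, horizon, n_iter, r0_grid, d_grid,
                          anchor_generation=6, seed=1):
    """Per-day (min, 5th pct, 95th pct, max) of projected daily incidence."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_iter):
        ser_int = rng.randint(*ser_range)       # uniform over integer days
        first = anchor_generation * ser_int     # first observation on the anchor
        days = range(first, first + len(observed))
        r0, d = fit_r0_d(days, noisy_counts(observed, rng), ser_int, r0_grid, d_grid)
        last = first + len(observed)
        paths.append([daily_incidence(t, ser_int, r0, d)
                      for t in range(last, last + horizon)])
    bands = []
    for day_values in zip(*paths):              # one tuple per projected day
        ordered = sorted(day_values)
        lo = ordered[int(0.05 * (n_iter - 1))]
        hi = ordered[int(0.95 * (n_iter - 1))]
        bands.append((ordered[0], lo, hi, ordered[-1]))
    return bands


# Small illustrative run on synthetic data (the study used 1,000 iterations).
observed = [daily_incidence(t, 2, 1.6, 0.01) for t in range(12, 32)]
r0_grid = [1.0 + 0.2 * i for i in range(11)]
d_grid = [0.01 * i for i in range(6)]
bands = stochastic_projection(observed, (1, 4), horizon=7, n_iter=20,
                              r0_grid=r0_grid, d_grid=d_grid)
print(len(bands))
```

Each iteration refits R0 and d jointly before projecting, so the interval for each projected day is built from curves whose parameter pairs were fitted together rather than from independently chosen percentiles of each parameter.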
For each PD, daily incident cases projected for seven days in the future were compared with actual case counts reported since the date of projection (e.g., PD 5 projections made using data up to March 15, 2020 were compared with cases reported on March 16 through March 22, 2020) using the root mean square error (RMSE):

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$

where y_t is the reported value at time t, ŷ_t is the model projection at time t, and n is the number of observations.

A plot of predicted incidence produced by the original IDEA and m-IDEA models using a five-day serial interval and reported incidence through May 10, 2020 is provided in Figure 1. [...] Table 1, and model performance in Figure 3. Estimated R0 parameters tended to increase with serial interval; however, in most cases, the control parameter, d, also increased with serial interval (Table 1). Generally, the serial intervals that provided the best fits were variable throughout the first two weeks of the projection period. Once a full 20 days of data had accumulated for fitting, lower serial intervals (1 to 4 days) provided better predictive capability through PD 30. However, from PD 35 onward, in most cases higher serial intervals (5 to 9 days) were associated with better projection performance (Figure 3), and serial intervals ranging from 1 to 4 days considerably under-predicted future incidence in the near-term.

Results from the stochastic approach to incorporate uncertainty in the serial interval used in the m-IDEA are provided in Figure 4. Serial intervals were allowed to range uniformly from 1 through 4 days for PDs 5 through 30, and 5 through 9 days for PDs 35 through 55, because these ranges generally included the best-fit discrete serial intervals in Figure 3. This approach provides large prediction intervals and ranges between minimum and maximum values due to the incorporation of a range of serial intervals.
In most cases, this approach captures the majority of future reported incidence data, with some notable exceptions on PD 10, 15, and 35.

The performance of the m-IDEA for projecting cases in the near-term was improved using shorter serial intervals (1-4 days) at early stages of the pandemic, and longer serial intervals (5-9 days) at mid- to late-stages. Correspondingly, studies examining pooled data from multiple countries reported median serial intervals ranging from 4 to 4.6 days between December 12, 2019 and February 5, 2020, and 6. [...]

Best-fit projections at later PDs considered herein were associated with greater control parameter (d) values (but also greater R0 values compared to shorter serial intervals).

The best-fit parameters for the m-IDEA model were dependent on the serial interval used. Generally, the estimated R0 and d parameters both increased with serial interval (Table 1). Projections made in mid-April (PD 35) were associated with the highest R0 values across the projection period when using serial intervals ranging from three to eight days. On its own, this could mistakenly be interpreted as an indication of increased disease transmission, which is contrary to expectations given that these projections were made approximately one month following implementation of public health measures across Canada. However, in most cases, d values were also greater compared to earlier PDs. This emphasizes that the m-IDEA is not a tool intended to produce R0 values in a vacuum, and R0 must be presented and considered alongside its associated d value.

Given a lack of knowledge of the actual pandemic start date in Canada, March 11, 2020 was chosen as an anchoring point for COVID-19 community transmission in Canada for the m-IDEA model, corresponding to generation 6 of the disease. There is considerable uncertainty around the start date of community transmission in Canada.
With this approach, the calculated start date varied with serial interval (e.g., a serial interval of 1 day corresponded with a start date of March 5, 2020, whereas a serial interval of 9 days corresponded with a start date of January 17, 2020), as well as with the assumed R0 of the disease (an R0 of 2 was assumed a priori for this analysis). The assumed start date of the pandemic can influence m-IDEA model projections: when March 11, 2020 was instead anchored to lower or higher generations relative to generation 6, the resulting near-term projections for PD 55 (the only PD for which this was explored) were also lower and higher, respectively (results not shown). It is also likely that best-fit serial intervals for each PD would vary with differing generation numbers anchored to March 11, 2020. Future work could explore these impacts more thoroughly.

Implementations of the IDEA model for Ebola were better able to predict the epidemic peak in West Africa than to produce accurate short-term projections (Tuite & Fisman, 2018). The m-IDEA was unable to produce accurate short-term projections at early stages of the pandemic in Canada, which is unsurprising given the rapidly changing landscape of public health intervention and testing strategies during these early stages, as well as the lack of time-series data for fitting at early stages. The best-fit serial intervals appeared to stabilize somewhat from PD 25 onwards. A comparison of three phenomenological models (the generalized Richards, logistic, and stochastic m-IDEA models) showed that none of the models provided reasonably accurate projections in early March of 2020 (Smith et al., submitted).

The stochastic approach used can account for the uncertainty around the serial interval of COVID-19, and is preferred for the m-IDEA during a dynamic situation such as the midst of a major pandemic.
Various distributions of serial intervals coupled with a Poisson error structure in reported case data were implemented, and resultant distributions of case incidence were projected. This approach inherently accounts for correlation between the m-IDEA parameters R0 and d. Through iterative fitting procedures, the incidence curves represent the 90% prediction intervals (and minimum and maximum) for each data point of each projection associated with specific solutions of R0 and d. This is more realistic than simply calculating the 5th and 95th percentiles of R0 and d across simulations and deriving prediction intervals from them, which would result in highly unlikely to impossible combinations of parameters.

It was shown here that the serial interval associated with the best fit of the m-IDEA model changes throughout the course of the COVID-19 pandemic in Canada. It is proposed that the serial interval used in the m-IDEA model be frequently updated through retrospective analysis of model performance, and modified where necessary for future projections. For example, here it was shown that lower and higher serial intervals produced more accurate projections at earlier and later stages, respectively, of the COVID-19 pandemic in Canada. Low serial intervals (1 to 4 days) projected considerably lower incidence than reported for future cases at the later stages of this study period. Although the IDEA model was not developed as a tool to predict the actual serial interval of a disease, the use of the m-IDEA provides evidence that the serial interval of COVID-19 at the national level in Canada has increased over the course of the pandemic. While the RMSE was used to assess model performance, it is possible that results could vary when using alternate error metrics.
Caution is advised before adopting these same serial intervals for application of phenomenological models to other datasets, at the regional (e.g. provinces and territories) or international scale. A similar analysis should be conducted in other areas to determine the serial interval(s) that provide the best fit to date in a recurring, iterative manner.

It has been previously proposed that the IDEA model could implement "multiple time-specific control parameters" to improve forecasting (Tuite & Fisman, 2018). Until then, incorporating uncertainty through a stochastic approach to parameterize the serial interval is preferred, as it has been shown that the serial interval associated with the more accurate near-term projections varied throughout the study period. Although this is somewhat akin to trying to "hit a moving target", it is nevertheless preferable and more biologically plausible compared to assuming an identical, discrete serial interval value throughout the pandemic.

See text for full details on the fitting procedure. Projections were made every five days using data reported since March 11, 2020. All incidence data up to and including the vertical red line were used to fit the model. Incidence data reported since the projections are plotted to the right of the vertical red line. a) projection day (PD) 5; b) PD 10; c) PD 15; d) PD 20; e) PD 25; f) PD 30; g) PD 35; h) PD 40; i) PD 45; j) PD 50; k) PD 55. Plots for PDs 5 through 30 (a-f) were derived with a uniformly distributed serial interval ranging from one to four days, and assuming a Poisson error structure in reported data. Plots for PDs 35 through 55 (g-k) reflect a serial interval ranging from five through nine days, also assuming a Poisson error structure in reported data. The 90% prediction intervals and the minimum and maximum predicted values are indicated in shaded red and pink, respectively.
Early transmission dynamics in of novel coronavirus-infected pneumonia
Epidemiological and clinical characteristics of COVID-19 in adolescents and young adults
Community transmission of severe acute respiratory syndrome coronavirus 2
Presymptomatic transmission of COVID-19 in a cluster of cases occurred in confined space: A case report
COVID-19 in Tianjin
Active or latent tuberculosis increases susceptibility to COVID-19 and disease severity
Estimating the ascertainment rate of SARS-CoV-2 infection in Wuhan, China: Implications for management of the global outbreak
Epidemiological parameters of coronavirus disease 2019: A pooled analysis of publicly reported individual data of 1155 cases from seven countries
Utilizing nontraditional data sources for near real-time estimation of transmission dynamics during the 2015-2016 Colombian Zika virus disease outbreak
Transmission potential of the novel coronavirus (COVID-19) onboard the Diamond Princess cruises ship
Glossary of terms for infectious disease modelling: A proposal for consistent language
Initial cluster of novel coronavirus (2019-nCoV) infections in Wuhan, China is consistent with substantial human-to-human transmission
Serial interval of novel coronavirus (COVID-19) infections
Risk of travel-related cases of Zika virus infection is predicted by transmission intensity in outbreak-affected countries
Epidemiologic characteristics of COVID-19 in Guizhou
Investigation of three clusters of COVID-19 in Singapore: Implications for surveillance and response measures
COVID-19 transmission within a family cluster by presymptomatic carriers in China
The performance of phenomenological models in providing near-term Canadian case projections in the midst of the COVID-19 pandemic
Epidemiological characteristics of 2019 novel coronavirus family clustering in Zhejiang province
Transmission interval estimates suggest pre-symptomatic spread of COVID-19
The IDEA model: A single equation approach to the Ebola forecasting challenge
Serial intervals of respiratory infectious diseases: A systematic review and analysis
Strongly heterogeneous transmission of COVID-19 in mainland China: Local and regional variation
WHO director-general's remarks at the media briefing on
A new coronavirus associated with human respiratory disease in China
Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China
Estimated impact of aggressive empirical antiviral treatment in containing an outbreak of pandemic influenza H1N1 in an First Nations community. Influenza and Other Respiratory Viruses
Household transmissions of SARS-CoV-2 in the time of unprecedented travel lockdown in China
Estimation of the time-varying reproduction number of COVID-19 outbreak in China
Evolving epidemiology of novel coronavirus diseases 2019 and possible interruption of local transmission outside Hubei province in China: A descriptive and modeling study
The time-varying serial interval of the coronavirus disease (COVID-19) and its gender-specific difference: A data-driven analysis using public surveillance data in Hong Kong and Shenzhen
Epidemic growth and reproduction number for the novel coronavirus disease 2020: A preliminary data-driven analysis
Estimating the serial interval of the novel coronavirus disease (COVID-19): A statistical analysis using the public data in