key: cord-0871795-sr6uizat authors: Bartolomeo, Nicola; Trerotoli, Paolo; Serio, Gabriella title: Short-term forecast in the early stage of the COVID-19 outbreak in Italy. Application of a weighted and cumulative average daily growth rate to an exponential decay model date: 2020-12-30 journal: Infect Dis Model DOI: 10.1016/j.idm.2020.12.007 sha: 29926baccb33ba7e5d62224cc8d9ad819c21f814 doc_id: 871795 cord_uid: sr6uizat To estimate the size of the novel coronavirus (COVID-19) outbreak in the early stage in Italy, this paper introduces the cumulated and weighted average daily growth rate (WR) to evaluate an epidemic curve. On the basis of an exponential decay model (EDM), we provide estimations of the WR in four-time intervals from February 27 to April 07, 2020. By calibrating the parameters of the EDM to the reported data in Hubei Province of China, we also attempt to forecast the evolution of the outbreak. We compare the EDM applied to WR and the Gompertz model, which is based on exponential decay and is often used to estimate cumulative events. Specifically, we assess the performance of each model to short-term forecast of the epidemic, and to predict the final epidemic size. Based on the official counts for confirmed cases, the model applied to data from February 27 until the 17th of March estimate that the cumulative number of infected could reach 131,280 (with a credibility interval 71,415-263,501) by April 25 (credibility interval April 12 to May 3). With the data available until the 24st of March the peak date should be reached on May 3 (April 23 to May 23) with 197,179 cumulative infections expected (130,033–315,269); with data available until the 31st of March the peak should be reached on May 4 (April 25 to May 18) with 202,210 cumulative infections expected (155.235–270,737); with data available until the 07st of April the peak should be reached on May 3 (April 26 to May 11) with 191,586 (160,861-232,023) cumulative infections expected. Based on the average mean absolute percentage error (MAPE), cumulated infections forecasts provided by the EDM applied to WR performed better across all scenarios than the Gompertz model. An exponential decay model applied to the cumulated and weighted average daily growth rate appears to be useful in estimating the number of cases and peak of the COVID-19 outbreak in Italy and the model was more reliable in the exponential growth phase. In December 2019, an outbreak of a novel coronavirus occurred in Hubei Province, China. Since then, COVID-19 has spread around the world, and on March 11, 2020, it was declared a pandemic by the World Health Organization (WHO). As of April 30, 2020, there have been 3,090,445 cases and 217,769 deaths, reported globally. Italy has been one of the worst affected countries, and so far has reported 203,951 cases and 27682 deaths (as of April 30, 2020) (World Health Organization, 2020). The Italian government has implemented a series of social distancing and public health strategies to try and reduce the spread of COVID-19, including closing schools, screening passengers at railway stations and airports, identifying active cases, and isolating suspected cases. On March 9, 2020, Italy became the first country to implement a nation-wide lockdown to try and limit the COVID-19 spread (Sylvers & Legorano, 2020) . public health measures can be implemented quickly. Besides, it is essential to formulate a plan to ensure that patients with COVID-19 can be cared for in the hospital if required. Many scholars have been attempting to estimate the basic reproduction number (R0) and predict the future trajectory of the COVID-2019 outbreak. In this paper, we use phenomenological models without detailed microdata, which allows for simple calibrations to be made to the reported empirical data so that quick interpretations can be formulated. Many authors have attempted to use the cumulative number of new infections to estimate the "peak" in the early stage of the outbreak with the main models used in epidemiology, SIR, SEIR, Log-Linear, Negative-Binomial. However, these forecasting models are limited in that the transmission speed of new viruses, such as SARS-CoV-2 is unknown. As such, many epidemiologists have reported that it is challenging to make short-term predictions regarding the peak. Studying the spread of COVID-19 in a country such as China could be useful. In particular, Hubei Province, which is similar to Italy in terms of resident population numbers, could provide vital information for exploring the dynamics of COVID-19, despite the social contexts between the two locations being very different. The contagion curve of China and/or the Hubei Province could be used to create epidemiological growth models, from which the parameters could be used as a starting point for other populations. However, the absolute number of infections depends on a number of population-based characteristics, such as ethnicity and age structure. Another useful indicator to evaluate the speed of the virus spread, which can be used comparatively between two populations, is the daily growth rate and its associated trend. The daily growth rate, which is the number of new cases at time t divided by the total number of cases recorded up to time t-1, is a useful indicator for assessing the daily spread and speed of infection. The peak of an epidemic is considered to be when the growth rate is zero. This rate can be comparatively used as it is affected by the intrinsic characteristics of the virus. However, it may be affected by the particular J o u r n a l P r e -p r o o f detection conditions in a single day (number of tests carried out, time of detection, delay in the processing of tests by laboratories, partial communication of daily data by regions, and endogenous situations, such as climatic events that influence the movements of people). Indeed, the daily growth rate can have significant variability, particularly if the rate is only tracked over a short period of time. Besides, the daily growth rate and its variability depend on the total number of infections. At the beginning of the outbreak, the daily rates are more unstable, but they appear to stabilize when the cumulative number of infections increases. The excessive variability of daily growth rates adversely affects the robustness of forecasting models. The average of the daily growth rate could be used to reduce this high variability in the daily growth rate, as it has a smoothing effect on the rates cited. This study aimed to make short-term forecast by defining and evaluating the daily growth rate and applying the cumulative and weighted average of the crude daily growth rate to an epidemic curve. Italian epidemic data registered between February 26, 2020 and April 18, 2020 by Italian Protezione Civile (Protezione Civile Italiana, 2020) and Chinese (Hubei Province) epidemic data, between January 21, 2020 and March 18, 2020, were taken from Hubei Province government website and used in this study (Hubei provincial government website, 2020). The first step was to evaluate the growth rate: given the daily infections at day t (DI t ) and the cumulative infections at day t (CI t ), respectively. The daily growth rate (DGR) was calculated by: Second, the average of the DGR was evaluated. These were determined using the highest values of DI t and CI t , weighted for CI t-1 , and calculated progressively, using the cumulative number of cases from the second day of the series: where t > 1. J o u r n a l P r e -p r o o f DGR t = DI t /CI t-1 and DRG t * CI t-1 = DI t , (1) becomes Third, the weighted and cumulated average of the daily growth rate ( ) was determined by analyzing the trends in Hubei, from which the results were used to model an epidemic in Italy. The series of observed in Hubei approach a negative exponential model. Therefore, a nonlinear regression was applied to the exponential decay model (EDM) to evaluate its reliability. In this model, if a function Y(t) is monotonically decreasing with a velocity parameter r>0, then Y(t) the exponential decay model with the form: where t is time in days, a is the starting value Y(0), and b the decay rate; in our application, a is when t=0. To take into account the plateau, which is expected at the end of the first epidemic phase in (3), we added a term k that is the rate at the end of the curve: When the number of new cases 0 then, k. It was thought that the epidemic in Italy had peaked when this model was applied. Therefore, it was assumed that a conceivable k term in the exponential decay model for the Italian epidemic could be the same as that observed in the Hubei epidemic that was signed as , so the model applied was: J o u r n a l P r e -p r o o f To evaluate the variability of the forecast, we used credibility intervals (Andrej-Nikolai Spiess, 2018), determined through a first-order Taylor expansion and Monte Carlo simulation (Mekid &Vaja, 2008; Wang & Iyer, 2005) . Expected occurrences, !" , and expected cumulated occurrences #" , were determined using the inverse of (2) once estimated ( ). A comparison with the Gompertz model, which is often used to estimate cumulative events, was made to evaluate the fitting of the model based on the weighted and averaged growth rates. This was done because the two-parameter Gompertz model (GoM) is based on exponential decay, and is given by the ordinary differential equations: where t is time, C'(t) describes the incidence curve over time, and C(t) is the cumulative number of cases at time t, and b > 0 describes the exponential decay of the growth rate r (with r > 0). If C(0) is the initial number of cases, then the solution of (6) is This model was applied to the cumulative infection data in Italy during the same period. To compare the mathematical models and assess their predictive accuracy at different stages of the outbreak, we estimated the two models (5 and 7) using the observed values in four-time intervals: It is possible to quantify the error of the model fit to the data using performance metrics (Kuhn & Johnson, 2013) . The most widely used performance metrics are the Pearson's correlation coefficient, root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), given by the respective expressions: MAPE, together with the correlation coefficient, are performance indicators independent of the outcome size. For this reason, they were used to compare model (5) applied to model (7) applied to CI. RMSE and MAE were used to compare the same model in relation to the different time phases of the estimate. These four metrics are also useful for quantifying the error associated with a forecast and were used to evaluate the goodness of fit of the models and to measure the reliability of the model in predicting the values observed after the date of the estimate. The analysis was carried out using R software (version 3.6.3) (R Core Team, 2020) with the packages "nls2" (Grothendieck, 2013) and "propagate" (Andrej-Nikolai Spiess, 2018). Starting from February 27, 2020 until April 07, 2020 (last day before the model fitting), the number of infections had grown from 650 to 135,606. The daily growth rate ranged from 2.3% to 62.5%, with a median of 13.4% (Inter Quartile Range IQR 6.9% to 22.8%). The ranged from 7.0%-45.4%, with a median of 17.3% (IQR, 11.3%-26.0%) (Fig 1) . 2) . The fitting of the curve was remarkably high in relation to the observed values, and the residual standard error was 0.0084. We used =0.011 in (5) The cumulative number of cases (CI) was estimated at the same time intervals P1-P4 with the model (6): parameters decreased from 602.2 in P1 to 118.2 in P4, whereas exponential decay b and growth rate increased (Table 2 ). In these models, the RMSE increased from 256.7 in P1 to 106 in P2, and the MAE was opposite to the growth rate models (Table 2) . The correlation between the expected and observed values is high in all models, with the CI models having coefficients of nearly 1. The MAPE result was lower only in P1 for the CI models. In contrast, in time intervals P3 and P4, the measure of fit was higher for than for CI (Table 3a) . J o u r n a l P r e -p r o o f Table 3 . Fitting (a) and forecasting (b) performance statistics for the exponential decay model applied to (Eq. (5) ) and the Gompertz model applied to the CI (Eq. (7)). For each time interval, we highlight the lowest MAPE (gray) for the size of the error. Gompertz model (Eq. (7); right column) calibrated using the epidemic data for scenarios 1, 2, 3, and 4, respectively. The expected values (continuous red line) and the 95% credibility interval (red dotted line) for the calibrated models are displayed along with the observed data of the epidemic, both those used for the estimation of the models (empty black circles) and those after modeling (filled black circles). All models for CI based on Gompertz have forecast peak dates beyond the May 31 with cumulative infections expected using the data available for P1, P2, P3 and P4 of 585,836 (227,674-1,027,058), 437,665 (309,556-564412), 222,690 (190,579-243,492) and 195,472 (178,952-204,502) , respectively. In this report, we provide short-term forecasts of the cumulative number of reported cases of the SARS-CoV-2 in initial phase of the outbreak in Italy as of March 17, 2020. Dynamic growth models provide an important quantitative framework for predicting epidemic trajectories, generating estimates of key transmission parameters, assessing the impact of control interventions, gaining insight into the contribution of different transmission pathways, and producing short and long-term forecasts (Chowell, 2017) . To forecast the evolution of the outbreak, we used phenomenological models with an empirical approach, as such models are useful for reproducing patterns observed in time series data (Viboud, Simonsen & Chowell, 2016) . The result is a reasonably simple temporal description of the epidemic growth patterns (Chowell, 2017). Observation of the epidemic curve in other locales that have already experienced the same outbreak is fundamental in the application of phenomenological models. In this study, we used the epidemic curve of the Chinese province of Hubei to estimate one of the parameters of the exponential decay model that was then applied to the Italian data. It was necessary to find the time point at the beginning of the outbreak in which the curves of the weighted and cumulative average rates were superimposable between the two populations to enable this transposition. proved to be a stable The exponential model with WR, fitted with data available between February 27 and April 07 appears to be close to the observed data, and the MAPE shows a very good fit, with a lower difference between the expected and observed cases than those in the previous model. In contrast, the Gompertz model has a sudden deviation from the observed line and overestimated the number of cases and duration. It is still unclear what effect the number of swabs performed has on the reported number of positive subjects in the last week of the epidemic compared to at the beginning, and this could significantly affect any modeling of the COVID-19 outbreak. An exponential decay model applied to the growth rate of infections weighted with WR appears to be useful in short-term estimation of the number of cases and peak in the first phase of the COVID- propagate: Propagation of Uncertainty. R package version 1 COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: A data driven model approach Fitting dynamic models to epidemic outbreaks with quantified uncertainty: A primer for parameter uncertainty, identifiability, and forecast Analysis and forecast of COVID-19 spreading in China nls2: Non-linear regression with brute force Applied Predictive Modeling Propagation of uncertainty: Expressions of second and third order uncertainty with third and fourth moments Early epidemiological assessment of the transmission potential and virulence of 2019 Novel Coronavirus in Wuhan City: China Coronavirus Emergency -Situation Map The Novel Coronavirus, 2019-nCoV, is Highly Contagious and More Infectious Than Initially Estimated A generalized-growth model to characterize the early ascending phase of infectious disease outbreaks On higher-order corrections for propagating uncertainties Novel Coronavirus (2019-nCoV) Situation Reports 103 We would like to thank Editage (www.editage.com) for English language editing The epidemic trend information from the COVID-19 outbreak in Hubei Province, China, proved to be essential for estimating the parameters to use in the Italian model.