key: cord-0823199-azpz6e7q authors: Distante, Cosimo; Gadelha Pereira, Igor; Garcia Goncalves, Luiz Marcos; Piscitelli, Prisco; Miani, Alessandro title: Forecasting Covid-19 Outbreak Progression in Italian Regions: A model based on neural network training from Chinese data date: 2020-04-14 journal: nan DOI: 10.1101/2020.04.09.20059055 sha: 26d4b8477cbf9322d92147d863f7d318a71c76da doc_id: 823199 cord_uid: azpz6e7q

Background. Epidemiological figures of the Covid-19 epidemic in Italy are worse than those observed in China.

Methods. We modeled the Covid-19 outbreak in the Italian regions versus Lombardy to assess the progression of the epidemic and to predict the peaks of new daily infections and of total cases, by learning from the entire Chinese epidemiological dynamics. We trained an artificial neural network model, a modified auto-encoder, on Chinese Covid-19 data to forecast the epidemic curves of the different Italian regions. We then used the susceptible/exposed/infected/removed (SEIR) compartment model to predict the spreading and the peaks. We estimated the basic reproduction number (R0), which represents the average number of people that can be infected by a person who has already acquired the infection. We did so in two ways: by fitting the exponential growth rate of the infection across a one-month period, and by a day-by-day assessment based on single observations.

Results. According to the SEIR model, the expected peak of new daily cases was at the end of March at national level. The peak of overall positive cases is expected by April 11th in the Southern Italian regions, a couple of days after that of Lombardy and the Northern regions. According to our model, total confirmed cases in all Italian regions could reach 160,000 by April 30th and then stabilize at a plateau.

Conclusions. Training neural networks on Chinese data and using that knowledge to forecast the spread of Covid-19 in Italy resulted in a good fit, measured with the mean average precision between official Italian data and the forecast.
According to the Italian National Institute of Health (ISS), as of April 8th about 140,000 people in Italy had tested positive for 2019-nCoV since the beginning of the epidemic, including deceased patients (95,262 currently positive and 26,491 healed). [7] About 53% of cases are males (median age: 62 years). The detailed epidemiological figures provided by the ISS show that men represent the majority of cases in people aged 0-9 and 50-79 (range 52-63%). In the younger age groups 0-19, as well as between 80 and 89 years, males and females are equally represented among people who tested positive for Covid-19. Women accounted for 70% of cases over 90 years old and about 55% of cases between 20 and 39 years of age. However, men also represented the vast majority of deceased people in all age groups up to 89 years (range 57-79%). [7]

Regional figures are available up to April 8th. They show that about 30% (n=28,545) of currently positive people still live in Lombardy (56% if considering the overall cases confirmed since the beginning of the epidemic). Lombardy is followed by Emilia Romagna (13.7% of currently positive people), Piedmont (11.5%), Veneto (10.7%), Tuscany (5.8%), Marche (3.7%), Lazio (3.6%), Liguria and Trentino Alto Adige (3.4%), Campania and Apulia (3%), Sicily (2%), and Friuli Venezia Giulia and Abruzzo (1.5%). Umbria, Sardinia, Calabria, Val d'Aosta, Basilicata, and Molise each account for less than 1%. [7]

A total of 28,485 symptomatic people were hospitalized on the same date of April 8th. With the exception of Lazio (n=196), Campania (n=97) and Apulia (n=90), all the other regions of Central and Southern Italy currently have fewer than 65 patients admitted to the ICUs of their regional healthcare systems. [7] On April 8th, total deaths were 17,669 at national level (+65% from March 30th to April 8th), with ... in Val d'Aosta and fewer than 50 in the other four regions (Figure 1).
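The national figures quoted above can be checked with a little arithmetic. The national total of about 140,000 is the sum of currently positive, healed and deceased patients. Lombardy's 28,545 currently positive cases are about 30% of the national currently positive count. This minimal Python sketch uses only numbers quoted in this paragraph; the variable names are illustrative.

```python
# Figures quoted in the text for April 8th (ISS bulletin [7]).
currently_positive = 95_262
healed = 26_491
deaths = 17_669
lombardy_currently_positive = 28_545

# Total cases since the start of the epidemic, deceased patients included.
total_cases = currently_positive + healed + deaths

# Lombardy's share of the currently positive cases, in percent.
lombardy_share_pct = 100 * lombardy_currently_positive / currently_positive

print(total_cases)                   # 139422, i.e. "about 140,000"
print(round(lombardy_share_pct, 1))  # 30.0
```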
[7] Lethality rates seem to increase with age and are higher in males: 0% from 0 to 29 and <1% between 30 and 49 years of age; 2.3% in the age group 50-59 (1% in women and 3.5% in men); 8.4% from 60 to 69 years old (...

Correctly predicting new daily cases at this stage of the Italian COVID-19 outbreak requires a correct estimate of the peak. That in turn requires the still unknown remaining part of the epidemiological curve, which can be predicted ahead with a deep convolutional auto-encoder. We therefore applied a Modified Auto-Encoder (MAE) for time-series forecasting to predict the evolution of daily cases for each of the 21 regions of Italy. [15]

The model was trained with data from the Chinese regions. These provide complete data, in the sense that they have already gone through the peak number of daily cases and managed to suppress the epidemic through social distancing measures. The measures implemented by the Chinese government may be impossible to implement in other countries, or may not be as effective there as they were in China. Even so, the data generated by the Chinese experience of the epidemic can be used to derive data-oriented models that predict the dynamic behavior of the epidemic in other countries. The forecast of new daily cases was used to estimate the individual peaks and also to obtain better spreading predictions from the SEIR model. We modeled the spreading of Covid-19 using Chinese data and used the model ...

The dataset reports the cumulative number of confirmed cases up to each date. To obtain daily cases, we took the day-to-day difference of each region's cumulative series. To train our model, we used data from the 31 provinces/cities of mainland China and from three other regions: Hong Kong, Macau, and Taiwan. In total, data from 34 regions were used. They are represented on the columns of the matrix D, whose rows represent each day since the first case report in the region. The Italian regional time series used in this paper for the forecast have been taken from the Itali...

(This preprint, which was not peer-reviewed, is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. https://doi.org/10.1101/2020.04.09.20059055)

The auto-encoder [16] models the input/output data by creating a hidden probabilistic representation of the data in its middle layer, also called the latent space. The MAE model modifies the traditional auto-encoder by adding an extra output branch derived from the latent space. The traditional output of an auto-encoder is trained to match its input. In our case, the extra output is trained to also predict the next sample of the time series given as input. Figure 2 depicts the MAE model architecture.

... new daily cases would occur, since it indicates the decrease in the number of cases and a possible end of the epidemic. Therefore, we use the output of the i-th prediction step as an input to the (i+1)-th prediction step. Repeating this recursive procedure multiple times gives a multiple-step-ahead forecast. In this way, we divided the forecast procedure into two phases. The first phase covers the period in which real data are available; for this phase we compute only the one-step-ahead forecast. The second phase covers the period in which no real data are available, so we compute the multiple-step-ahead forecast. In general, the second phase starts on the last day of available data. We trained the modified auto-encoder with several datasets and several numbers of latent variables. The model trained with data from the provinces of China with z=4 gave the best results.
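The preprocessing and the two-phase forecasting loop described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code:

- the function names are hypothetical;
- shorter regional series are padded with NaN;
- negative day-to-day differences (data revisions) are clipped to zero;
- a moving-average predictor stands in for the trained MAE as the one-step model.

```python
import numpy as np

def daily_from_cumulative(cumulative):
    """Turn a cumulative confirmed-case series into daily new cases.

    The first day keeps its cumulative value. Negative differences
    (data revisions) are clipped to zero: an assumption, not from the paper.
    """
    cumulative = np.asarray(cumulative, dtype=float)
    daily = np.diff(cumulative, prepend=0.0)
    return np.clip(daily, 0.0, None)

def build_matrix(cumulative_by_region):
    """Stack regions as columns of D; row k is the k-th day since that
    region's first reported case. Shorter columns are NaN-padded (hypothetical choice)."""
    columns = []
    for series in cumulative_by_region:
        daily = daily_from_cumulative(series)
        nonzero = np.flatnonzero(daily)
        start = nonzero[0] if nonzero.size else len(daily)
        columns.append(daily[start:])
    n_days = max(len(c) for c in columns)
    D = np.full((n_days, len(columns)), np.nan)
    for j, col in enumerate(columns):
        D[: len(col), j] = col
    return D

def recursive_forecast(history, one_step, window, horizon):
    """Second phase: multiple-step-ahead forecast in which each
    prediction is appended and fed back as input to the next step."""
    series = list(history)
    forecasts = []
    for _ in range(horizon):
        x = np.asarray(series[-window:], dtype=float)
        y_next = float(one_step(x))  # one-step-ahead prediction
        forecasts.append(y_next)
        series.append(y_next)        # output of step i is input of step i+1
    return np.array(forecasts)

# Toy cumulative series for two hypothetical regions.
D = build_matrix([[0, 0, 1, 3, 6, 10], [2, 5, 9]])
print(D)

# Stand-in one-step model: mean of the last `window` values (NOT the MAE).
forecast = recursive_forecast(D[:, 0], lambda x: x.mean(), window=2, horizon=3)
print(forecast)
```

In the first phase, the same `one_step` callable would simply be applied to windows of real data rather than to its own outputs.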
The results presented in this section are obtained with the best model. We also evaluated the performance of our model by analyzing how closely it predicts the next sample in the first phase of the forecasting procedure, that is, when real data are available. Figure 4a depicts the daily cases and Figure 4b the cumulative cases of Covid-19 for the Lombardy region. Figure 4a shows that the model follows the real data in the one-step-ahead forecasting, and Figure 4b indicates a plausible trend in the multiple-step-ahead forecasting.

The basic reproduction number (R0) is an indicator that summarizes the average number of people that can be infected by a person who has already acquired the infection. R0 measures how contagious the disease is, and its correct estimation is extremely important for epidemiologists, especially when facing new diseases like COVID-19. R0 can be computed in different ways. In our models, we estimated R0 in two ways: by fitting the exponential growth rate of the infection across a one-month period, and by a day-by-day assessment based on single observations [1]. This study uses the susceptible-exposed-infected-removed (SEIR) compartment model [4] to predict the spreading of the pandemic in Italy. Our efforts could help in the adoption of all possible preventive measures. They could also support the study of how the epidemic progresses across the Southern regions as opposed to the national trend. The resulting forecasts can be biased by the estimate of the basic reproduction number (pronounced R-nought). It must also be said that R0 is correlated with weather conditions: the reproduction index decreases as air temperature and relative humidity increase, according to the formula reported in [5].
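The growth-rate route to R0 mentioned above can be illustrated as follows, assuming early exponential growth Y(t) ≈ Y(0)·e^(r·t). Under that assumption, r is the slope of a least-squares line through the log-counts. To turn r into R0, this sketch uses the standard relation for an SEIR model with exponentially distributed latent and infectious periods: R0 = (1 + r·T_lat)(1 + r·T_inf). This conversion and the period lengths in the example are our assumptions. They are not necessarily the choices made in this paper.

```python
import numpy as np

def growth_rate(counts):
    """Fit log(counts) = log(Y0) + r*t by least squares and return r (per day).

    All counts must be positive, since the logarithm is taken.
    """
    counts = np.asarray(counts, dtype=float)
    t = np.arange(len(counts))
    r, _log_y0 = np.polyfit(t, np.log(counts), 1)
    return float(r)

def r0_from_growth_rate(r, latent_days, infectious_days):
    """SEIR relation R0 = (1 + r*T_lat) * (1 + r*T_inf), exponential periods assumed."""
    return (1.0 + r * latent_days) * (1.0 + r * infectious_days)

# Synthetic one-month series growing at exactly r = 0.2 per day.
days = np.arange(30)
counts = 10.0 * np.exp(0.2 * days)

r = growth_rate(counts)
# Illustrative period lengths (hypothetical, not estimated in this paper).
r0 = r0_from_growth_rate(r, latent_days=5.2, infectious_days=2.3)
print(round(r, 3), round(r0, 2))
```

On real data the fit window matters: the same one-month series gives different r, and hence different R0, depending on whether it starts before or after containment measures.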
R0 is an average value, but it can also be computed day by day to monitor the transmission of the infection. Being an average value, it can be skewed by super-spreader events. A super-spreader is an infected individual who infects an unexpectedly large number of people. In Italy, such an event need not be generated by a single individual. It can also come from a perturbation of the susceptible population, as happened in Apulia and Sicily with uncontrolled large groups of people arriving from outbreak areas. Super-spreading by individuals is not necessarily a bad sign, because it can indicate that fewer people are perpetuating an epidemic. Super-spreaders may also be easier to identify and contain, since their symptoms are likely to be more severe.

In short, R0 is a moving target. Tracking every case and the transmission of a disease is extremely difficult, so estimating R0 is a complex and challenging issue, and estimates often change as new data become available. The work in [10] reviewed 12 studies on the reproduction number of Covid-19 in China and overseas, covering the period from 1 January 2020 to 7 February 2020. It found that R0 estimates range from 1.4 to 6.49 [12], [11], including an estimate of 4.08 for mainland China [14]. The review in [10] reported a mean of 3.28, a median of 2.79 and an interquartile range (IQR) of 1.16. These values are considerably higher than the WHO estimate of 1.95. In [18], the R0 for 2019-nCoV is likewise reported in the range [1.4, 5.5]. These estimates of R0 depend on the estimation method used as well as on the validity of the underlying assumptions. In the early stage, the small amount of data and the short time since onset can bias these estimates, while over a longer period they converge toward the WHO estimate.
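One common way to compute a reproduction number day by day is the renewal-equation ratio R_t = I_t / Σ_s w_s·I_(t−s). Here I_t is the incidence on day t and w_s is the serial-interval distribution. This is a swapped-in illustration: a plain point estimate without any smoothing, not necessarily the day-by-day method of [1]. The serial-interval weights below are hypothetical.

```python
import numpy as np

def daily_reproduction_number(incidence, serial_weights):
    """Point estimate R_t = I_t / sum_{s=1..L} w_s * I_{t-s}.

    incidence: new cases per day; serial_weights: w_1..w_L (normalized here).
    Days without a full window of L past observations, or with zero
    infection pressure, are returned as NaN.
    """
    incidence = np.asarray(incidence, dtype=float)
    w = np.asarray(serial_weights, dtype=float)
    w = w / w.sum()
    L = len(w)
    r_t = np.full(len(incidence), np.nan)
    for t in range(L, len(incidence)):
        # incidence[t-L:t][::-1] is [I_{t-1}, ..., I_{t-L}], aligned with w_1..w_L
        pressure = float(np.dot(w, incidence[t - L:t][::-1]))
        if pressure > 0:
            r_t[t] = incidence[t] / pressure
    return r_t

# Doubling every day with a 1-day serial interval gives R_t = 2.
doubling = daily_reproduction_number([1, 2, 4, 8, 16], [1.0])
print(doubling)

# Flat incidence gives R_t = 1 whatever the (hypothetical) serial interval.
flat = daily_reproduction_number([10] * 6, [0.25, 0.5, 0.25])
print(flat)
```

A single cluster of cases on one day, such as a super-spreading event, shows up as a spike in R_t. That spike is one reason why day-by-day values are usually smoothed before being interpreted.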
The initial estimates give Covid-19 a higher reproduction number than the SARS coronavirus, whose R0 is reported to range between 2 and 5. If we define Y(t) as the number of infected people with symptoms at time t, the exponential growth of Y(t) is described as reported in [6].

We used the susceptible-exposed-infectious-recovered (SEIR) model [4] to simulate the epidemic from the time it was established in January 2020. This model is used together with the predictions from the modified auto-encoder neural network to better estimate the peaks. It extends the earlier three-compartment SIR model: since the infection has an incubation period, the compartment E (Exposed) is added. These compartments are modeled over time and capture the changes in the population. Given the total population N, N = S + E + I + R, where:
- "S" (Susceptible) is the portion of the population that has no vaccine coverage or immunity;
- "E" (Exposed) is the portion of the population that has been infected but is still in the incubation period and does not infect others;
- "I" (Infectious) is the portion of N that is infectious and may infect others; these people may recover or die;
- "R" (Recovered) is the number of infectious people who have healed and become immune.

... with the mean average precision between official Italian data and the forecast. The SEIR model gains an advantage in modeling the epidemic because its compartments are based on the complete dynamics of the curve: both the portion covered by real data and the forecast portion.
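The compartment definitions above correspond to the standard SEIR equations:

- dS/dt = −β·S·I/N
- dE/dt = β·S·I/N − σ·E
- dI/dt = σ·E − γ·I
- dR/dt = γ·I

Here σ = 1/(latent period), γ = 1/(infectious period) and β = R0·γ. The sketch below shows how the infectious peak is located. It assumes homogeneous mixing, fixed-step fourth-order Runge-Kutta integration and illustrative parameter values, none of them fitted to Italian data. It does not reproduce the authors' coupling with the MAE forecasts.

```python
import numpy as np

def seir_rhs(y, beta, sigma, gamma, n):
    """Right-hand side of the SEIR equations for y = [S, E, I, R]."""
    s, e, i, _r = y
    new_infections = beta * s * i / n
    return np.array([
        -new_infections,
        new_infections - sigma * e,
        sigma * e - gamma * i,
        gamma * i,
    ])

def simulate_seir(n, e0, i0, r0_value, latent_days, infectious_days, days, steps_per_day=10):
    """Integrate SEIR with fixed-step RK4; returns an array of shape (days + 1, 4)."""
    sigma = 1.0 / latent_days
    gamma = 1.0 / infectious_days
    beta = r0_value * gamma  # R0 = beta / gamma in SEIR without births or deaths
    dt = 1.0 / steps_per_day
    y = np.array([n - e0 - i0, e0, i0, 0.0], dtype=float)
    trajectory = [y.copy()]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = seir_rhs(y, beta, sigma, gamma, n)
            k2 = seir_rhs(y + 0.5 * dt * k1, beta, sigma, gamma, n)
            k3 = seir_rhs(y + 0.5 * dt * k2, beta, sigma, gamma, n)
            k4 = seir_rhs(y + dt * k3, beta, sigma, gamma, n)
            y = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        trajectory.append(y.copy())  # one row per day
    return np.array(trajectory)

# Illustrative run: 1 million people, 10 initial infectious, R0 = 2.5.
n = 1_000_000
traj = simulate_seir(n, e0=0, i0=10, r0_value=2.5, latent_days=5, infectious_days=5, days=365)
peak_day = int(np.argmax(traj[:, 2]))
print("peak of infectious on day", peak_day)
print("final fraction recovered", round(traj[-1, 3] / n, 3))
```

Because the four derivatives sum to zero, S + E + I + R stays equal to N throughout the integration. This gives a quick sanity check on any implementation.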
We showed the validity of the method: the predictive model learns from the dynamics of Covid-19 in China and exploits the learned knowledge to predict future daily cases in Italy. As shown in Figure 9 and in Table 2, the expected peak of the SEIR model was confirmed at the end of March at national level. The Southern Italian regions are expected to reach the peak of total positive cases later, by April 11th, as estimated with the dynamic SEIR model.
References.
Coronavirus Surveillance Bulletin
Italian Ministry of Health, daily bulletin
Covid-19 outbreak in Italy
Preliminary prediction of the basic reproduction number of the Wuhan novel coronavirus 2019-nCoV
Epidemic processes in complex networks
High Temperature and High Humidity Reduce the Transmission of COVID-19
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak
Simple framework for real-time forecast in a data-limited situation: the Zika virus (ZIKV) outbreaks in Brazil from 2015 to 2016 as an example
Early dynamics of transmission and control of COVID-19: a mathematical modelling study
The reproductive number of COVID-19 is higher compared to SARS coronavirus
Modelling the epidemic trend of the 2019 novel coronavirus outbreak in China
Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Estimating the effective reproduction number of the 2019-nCoV in China
Artificial intelligence forecasting of Covid-19 in China
Reducing the dimensionality of data with neural networks
Pathogenicity and transmissibility of 2019-nCoV: a quick overview and comparison with other emerging viruses

Authors Contribution: AD, PP, AM conceived, wrote and revised the manuscript