key: cord-1001785-35xpmdbj authors: Kolozsvari, Laszlo Robert; Berczes, Tamas; Hajdu, Andras; Gesztelyi, Rudolf; TIba, Attila; Varga, Imre; Szollosi, Gergo Jozsef; Harsanyi, Szilvia; Garboczy, Szabolcs; Zsuga, Judit title: Predicting the epidemic curve of the coronavirus (SARS-CoV-2) disease (COVID-19) using artificial intelligence date: 2020-04-22 journal: nan DOI: 10.1101/2020.04.17.20069666 sha: 1c64f9fc33b20d3ad9e96e86896834cd38b7483c doc_id: 1001785 cord_uid: 35xpmdbj

Objectives: Coronavirus disease 2019 (COVID-19), the current form of severe acute respiratory syndrome caused by a coronavirus (SARS-CoV-2), is a major global health problem. The aim of our study was to use official data to predict the possible outcomes of the COVID-19 pandemic using artificial intelligence (AI)-based recurrent neural networks (RNNs), and then to compare the predicted data with the observed data for validation. Materials and Methods: We used the publicly available datasets of the World Health Organization and Johns Hopkins University to create the training dataset, then used RNNs with gated recurrent units (long short-term memory, LSTM, units) to create two prediction models. Information collected in the first t time steps was aggregated with a fully connected (dense) neural network layer and a subsequent regression output layer to determine the next predicted value. We used root mean squared logarithmic errors (RMSLE) to compare the predicted and observed data, then recalculated the predictions. Results: The results of our study underscore that the COVID-19 pandemic is probably a propagated source epidemic; therefore, repeated peaks on the epidemic curve (rises in the daily number of newly diagnosed infections) are to be anticipated. The errors between the predicted and validated data and trends seem to be low. Conclusions: The influence of this pandemic is great worldwide and impacts our everyday lives.
Decision makers in particular must be aware that even if strict public health measures are implemented and sustained, future peaks of infections are possible. AI-based predictions might be useful forecasting tools, and the models can be recalculated as new observed data become available to obtain a more precise forecast of the pandemic.

WHO with R0 suggested to range between 1.4 and 2.5. More recent analyses have indicated higher R0 values of around 3 (with the mean and median R0 for published estimates being 3.28 and 2.79, respectively). 11,15

The daily number of newly diagnosed infections: epidemic curves

The initial epidemic curves of the COVID-19 outbreak from Hubei, China showed a mixed pattern, indicating that early cases were likely from a continuous common source, e.g. from several zoonotic events in Wuhan, followed by secondary and tertiary transmission providing

There are different mathematical models that may demonstrate and predict the dynamics of different infectious diseases. 18 These models, used to simulate the dynamics of infectious diseases, may be based on statistical, mathematical, empirical or machine learning methods. 19

The first attempts to use Artificial Intelligence (AI) in medicine were made in the 1970s. Initially AI was used to implement programs to support clinical decision making, but its use is now gaining increasingly widespread acceptance in the biomedical sciences. 20

One class of AI, a form of artificial neural network, the Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM), has previously been used to model and forecast the influenza epidemic, with competitive and reliable results. 21, 22, 23

The aim of the current study was to use the available official data as a training dataset, then to predict the possible outcomes of the COVID-19 pandemic using AI-based RNNs, and finally to compare the predictions with the observed data.
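The idea of using the first t time steps of a series to determine the next value can be illustrated with a short sketch of how such training pairs are commonly built. This is a minimal illustration under assumptions, not the authors' implementation: the function name `make_windows`, the window length `t=3`, and the incidence values are all hypothetical, and the network itself (LSTM units, dense layer, regression output) is not shown.

```python
def make_windows(series, t):
    """Split a time series into (input window, next value) pairs.

    Each input holds t consecutive observations; the target is the
    observation that immediately follows the window.
    """
    if t < 1 or t >= len(series):
        raise ValueError("t must satisfy 1 <= t < len(series)")
    inputs, targets = [], []
    for start in range(len(series) - t):
        inputs.append(series[start:start + t])
        targets.append(series[start + t])
    return inputs, targets


# Hypothetical daily incidence per 100 000 inhabitants (illustrative only).
daily_incidence = [0.1, 0.3, 0.6, 1.0, 1.5, 2.1]

# Assumed window length; the source does not state the value of t here.
X, y = make_windows(daily_incidence, t=3)

for window, target in zip(X, y):
    print(window, "->", target)
```

A recurrent model would be fitted on pairs of this kind; forecasting several days ahead then amounts to appending each predicted value to the series and sliding the window forward.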
in China were from Hubei province, only data from that province was included. For each country, the date of the first infection was set as day 1 of the disease time scale (Fig 1).

All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

cases where single verified cases were isolated, and no further transmission has occurred). This was important to avoid distortion of the propagated epidemic curves. In Belgium, for example, the first illness occurred on 04/02/2020 and no further case was reported for 26 days. The next illness occurred on 01/03/2020. Inclusion of the early case from February would contribute to a false learning rule for the AI, hence corrupting the results. As for Hubei Province,

The training data set was obtained by averaging the daily incidence rates per 100 000 inhabitants across the 17 countries included, for each day in the time series. When calculating the average, missing data were left blank, i.e. NULL, e.g. countries that did not contain a data

The copyright holder for this preprint this version posted April 22, 2020.

The intuitive interpretations of the difference between Prediction 1 and Prediction 2 are as follows. Prediction 2 makes its predictions utilizing the information derived from the training
To validate the predictions, we first made the two predictions described above based on data available up to 30/03/2020. The resulting daily new morbidity data are labeled "Old Prediction 1" and "Old Prediction 2" on each graph. We then expanded our factual data set with new daily data available until 10/04/2020. These new factual data are labeled "Observed next days" on the graphs. Thus, except for Hungary, we have 11 new daily factual data elements for all countries examined. In the case of Hungary, the data of 10/04/2020 were already available, so in this case 12 new factual data elements are included. Using these data, we validated the two predictions of our model. The root mean squared logarithmic error (RMSLE) was used for validation.

In our analysis, the possible bias in the ratios between the observed and predicted values is interpreted using the RMSLE.

It should be noted that if the error function is parallel to the x-axis, the trend of the prediction is the same as the real trend, only at a lower or higher scale.

As the next step, using the 11 new observed data elements after the first prediction (12 in the case of Hungary), we modified the predictions using both methods. These modified prediction data are labeled "New Prediction 1" and "New Prediction 2", respectively.
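The validation metric can be sketched as follows. The text does not spell out the formula, so this assumes the commonly used definition of RMSLE, the square root of the mean of (log(1 + predicted) - log(1 + observed))^2; the function names and the incidence values are hypothetical. The example also illustrates the remark about an error curve parallel to the x-axis: when 1 + predicted is a constant multiple of 1 + observed, the per-day logarithmic error is constant, so the prediction follows the observed trend at a different scale.

```python
import math


def log_errors(predicted, observed):
    """Per-day logarithmic error: log(1 + p) - log(1 + o)."""
    if len(predicted) != len(observed):
        raise ValueError("series must have the same length")
    return [math.log1p(p) - math.log1p(o) for p, o in zip(predicted, observed)]


def rmsle(predicted, observed):
    """Root mean squared logarithmic error (assumed standard definition)."""
    errs = log_errors(predicted, observed)
    if not errs:
        raise ValueError("series must not be empty")
    return math.sqrt(sum(e * e for e in errs) / len(errs))


# Hypothetical observed daily incidence values (illustrative only).
observed = [1.0, 3.0, 7.0, 15.0]

# Hypothetical prediction with the same trend at twice the scale:
# 1 + predicted = 2 * (1 + observed) on every day.
predicted = [2.0 * (o + 1.0) - 1.0 for o in observed]

errs = log_errors(predicted, observed)   # constant, equal to log(2)
score = rmsle(predicted, observed)       # therefore also log(2)

print("per-day log errors:", errs)
print("RMSLE:", score)
```

Because the metric works on logarithms, it penalizes the ratio between predicted and observed values rather than their absolute difference, which keeps countries with very different incidence levels comparable.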
The following section shows the outcomes of Prediction 1 and Prediction 2 for the individual country-level data (Figs 4-10).

The total errors for the entire investigated period, i.e. the summarized mean of the predictions (RMSLE) by country, are shown in Table 1.

Although suppression and mitigation measures can reduce the incidence of infection, COVID-19, given its relatively high transmissibility reflected by an average R0 value of 3.28, will most likely continue to spread. 14 Accordingly, public health measures must be implemented, as the incubation period of the virus may be long (1-14 days, although some suggest it can be as long as 21 days), during which time asymptomatic or presymptomatic spreading may ensue.
Moreover, it is currently uncertain whether those who were diagnosed with COVID-19

data regarding diagnostic tests performed per country, or death rates were omitted, given they

In summary, COVID-19 is a global health challenge, which caused the WHO to declare a "public health emergency of international concern" on 30/01/2020. 16 The influence of this global epidemic has dug deep into the day-to-day conduct of everyone, with unforeseen challenges still pending for governments and policymakers. Consequently, everyone, especially decision makers, must be aware that the current situation might be just the beginning,

The results of our study underscore that the COVID-19 pandemic is probably a propagated source epidemic; therefore, repeated peaks in the daily number of newly diagnosed infections are to be anticipated.

To the best of our knowledge, this is the first study to model the predicted evolution of the pandemic using data from official databases with AI-based RNNs trained on the currently available data on the spread of the disease, and validated by comparing the predicted and observed data. Most studies to date expect a single peak on the epidemic curve, but some fear the emergence of future peaks when mitigation and suppression measures are discontinued. According to our models, this can happen even if the strict measures are

The role of the funding source

The funding sources had no role in the writing of the manuscript or the decision to submit it for publication, and no involvement in data collection, analysis, or interpretation; trial design; patient recruitment; or any aspect pertinent to the study.
We have not been paid to write this article by a pharmaceutical company or other agency. Dr. László R. Kolozsvári, the corresponding author, had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Coronavirus 2019-nCoV: A brief perspective from the front
11 WHO. World Health Organization. Q&A on coronaviruses (COVID-19)
Association of COVID-19 Disease Severity with Transmission Routes and Suggested Changes to Community Guidelines
SARS-CoV-2 and COVID-19: The most important research questions
WHO. World Health Organization. Novel Coronavirus (2019-nCoV) situation reports
The reproductive number of COVID-19 is higher compared to SARS coronavirus
Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention
Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand