key: cord-1052024-qq3e48cj authors: Ankarali, H.; Erarslan, N.; Pasin, O. title: Modeling and Short-Term Forecasts of Indicators for COVID-19 Outbreak in 25 Countries at the end of March date: 2020-05-01 journal: nan DOI: 10.1101/2020.04.26.20080754 sha: 4286e37f075ce17831cbe0d4db16287a5e58f140 doc_id: 1052024 cord_uid: qq3e48cj Background: The coronavirus, which originated in Wuhan, causing the disease called COVID-19, spread more than 200 countries and continents end of the March. There is a lot of data since the virus started. However, these data will be explanatory when accurate analyzes are made and will allow future predictions to be made. In this study, it was aimed to model the outbreak with different time series models and also predict the indicators. Methods: The data was collected from 25 countries which have different process at least 20 days. ARIMA(p,d,q), Simple Exponential Smoothing, Holts Two Parameter, Browns Double Exponential Smoothing Models were used. The prediction and forecasting values were obtained for the countries. Trends and seasonal effects were also evaluated. Results: China has almost under control according to forecasting. The cumulative death prevalence in Italy and Spain will be the highest, followed by the Netherlands, France, England, China, Denmark, Belgium, Brazil and Sweden respectively as of the first week of April. The highest daily case prevalence was observed in Belgium, America, Canada, Poland, Ireland, Netherlands, France and Israel between 10% and 12%.The lowest rate was observed in China and South Korea. Turkey was one of the leading countries in terms of ranking these criteria. The prevalence of the new case and the recovered were higher in Spain than Italy. Conclusions: More accurate predictions for the future can be obtained using time series models with a wide range of data from different countries by modelling real time and retrospective data. have been examined before. But the estimates have been made for several countries such as China, Japan and USA, whose data are relatively much more [1] [2] [3] [4] [5] [6] [7] [8] [9] . In many countries, the total number of cases reached very high at the end of March, and the outbreak period has exceeded one month. Under these conditions, modeling the data collected from many countries to define the outbreak process and the characteristics of the outbreak, to take precautions and to create projection for the countries where the outbreak has begun or will began is of great importance. In this study, it was aimed to model the outbreak for six indicators from twenty five countries with different starting dates and processing at least twenty days of data using the end of March data. For In the study, twenty five countries were selected which cumulative cases exceeding 1000 among the 160 countries have exposed to the COVID-19 outbreak before March 15 and data for six indicators were modeled. The countries selected for modeling were divided into periods at certain intervals according to the start times of the outbreak and was presented in Table 1 as a summary. The number of countries struggling with the outbreak increased to 200 at the end of March but data from 40 countries that did not contain sufficient data for modeling and started to struggle the outbreak after March 15 were excluded from the study. Since at least one of the countries which was announcing the first case at different periods, was selected for modeling, the results to be obtained from these countries will create a prediction for other countries under similar conditions and for countries that have just started the outbreak. The data were obtained from the internet sources (WHO, Worldometers and Wikipedia). The data of the indicators described below were modeled. • Cumulative Cases practice, data generally varies due to seasonal, irregular movements or incidental reasons. The process becomes stationary when the non-stationary ARIMA time series is taken from the appropriate degree. ARIMA processes are expressed in ARIMA (p, d, q) notation. Here, p indicates the degree of autoregressive process, d is the difference of degrees, and q is the degree of the moving averages process. The ARIMA (p, d, q) process includes both AR (p), MA (q) and ARIMA (p, q) processes. The ARIMA process is an integrated autoregressive moving average process. In the ARIMA models, where d = 1, p and q is 0, then the ARIMA (0,1,0) process is obtained, which is defined as the random walk process [10] [11] [12] . When the first difference of the non-stationary If the series is not still stationary, a second difference is taken. is obtained. In this case, d = 2. Therefore, the non-stationary series is made stationary and shows the ARMA process. The model takes the form of ARIMA (p, d, q). In the method, trial method is used to determine the appropriate p, d and q values. Therefore, this method has the disadvantage, but it gives very good results in short term estimations. While d value is determined as the number of differences in the series, p and q values are obtained by examining partial autocorrelation and autocorrelation functions, respectively. The least squares method or nonlinear methods are used in parameter estimates. Assessment of goodness of fit is investigated by AIC and SIC criteria. In addition, compliance with the white noise process is checked for model suitability. The model must comply with the white noise process [10] [11] [12] . The advantage of the model is that it can be applied in series with stochastic trends and the estimates can be updated by evaluating the latest changes in the data. which gives the lowest average standard error value, will be the best choice 13 . The simple exponential smoothing method is only suitable for time series that move around an average. In the moving averages method, equal weight is given to the period values by experiment while different weights are given in the simple exponential smoothing method and these values are becoming exponential shape. In the simple exponential smoothing method, higher values are given to recently obtained values in weighting. The simple exponential smoothing method is often used to estimate short-term predictions 13 . Holt's Two Parameter Model: The Holt's two parameter model is used when there is a trend in the data. But in the data there is no seasonality. In the method, while each forecast is obtained, the previous forecast is updated and new forecast values are obtained. Since the method also takes into account the trend in the data, it is slightly more complicated than moving averages and simple exponential smoothing methods 14 . Holt's two parameter model is created with the help of the following equations. In the above functions, p is the number of periods predicted, while the value of Method uses trend estimates, so in the model there is two coefficients. α is the smoothing coefficient and the β is the coefficient used for trend estimation. These coefficients ranges between 0 and 1. The . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 1, 2020. is the predicted level in time t, while ܶ ௧ is the predicted trend in time. If there is an increase and decrease in the series in the data set, Brown's model gives better results. In the method, two different smoothed series are used, which are located in two different centers. Brown's method tries to create a linear equation. Estimates include single and double smoothed constants. Therefore, the method takes its name from this feature 15 . In modeling, cumulative cases, cumulative deaths, daily cases, daily deaths, cumulative recovered and active cases, which are indicators of the COVID-19 outbreak, were used. The models with the most successful results were selected and 10-day forecasts after the modelling were calculated. In the modelling process, autocorrelation values were calculated in the data and seasonality and trend of the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 1, 2020. . data were examined. The largest R-square value and the smallest RMSE and Normalized BIC values were used for the model performance. The results of the China is very important for many countries because the COVID-19 outbreak first began in China. In addition, China has caused the outbreak to spread to many countries and has been fighting this epidemic for 3 months. Also, as of the end of March, it is the fifth country with the highest number of cases and total deaths in the world When the modeling results of China data were investigated considering six different indicators, It was seen that the first four indicators were modeled with high success, and sufficient success was reached in the modeling of cumulative recovered (Table 2 ). However, there was no suitable time series model that successfully models active cases. (R-squared < 0,50). The 10-day forecast results after the model building date for the indicators modeled with high success were given in Table 3 . When the table was evaluated, it was observed that the number of cases did not change much except recovered cases in the first half of April in China. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. Estimates of different indicator data obtained from China are shown in Figure 1 . From the graphs, it was seen that there was no significant change in the number of cumulative cases and the number of cumulative deaths as of the end of March. But the number of cumulative recovered was seen to increase. Daily cases and daily death were close to the minimum level. In 11 out of 24 countries that detected the first Coronavirus cases in the period of 16-31 January, the total number of cases was over 1000. So the data of the indicators used in the prediction of the outbreak process were modeled and the results are presented in Table 4 and Table 5 . It was seen that the activated cases can be modeled successfully in 6 out of 11 countries, and there was no time series model that produces a successful forecast for other countries (Table 4 , R-squared < 0,50). In addition, success in modeling the number of cumulative recovered numbers in several countries was between 70% and 85%, while the prediction success of models for other indicators was calculated around 99%. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. Italy was the second country with the highest number of cases in the world and the country with the highest number of deaths as of the end of March. In general, it was observed that the increase in indicator results continues, but the prevalence in the daily cases and daily death was slower than . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. . https://doi.org/10.1101/2020.04.26.20080754 doi: medRxiv preprint other indicators when Table 5 and Figure 2 , which include the results of the forecast for the next 10 days, were examined. Spain was the 2nd country with the highest total number of deaths and the 4th country with the total number of cases. The forecast values for 10 days after modeling were presented in Table 6 and Figure 3 . It can be seen from the graphs that the number of cases and deaths continues to increase rapidly. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. South Korea was a country that has controlled the number of cumulative cases and cumulative deaths since the outbreak. The number of new deaths per day has never exceeded 10. After modeling the obtained data, the 10-day predicted results were given in Table 7 and Figure 4 . It was predicted that the changes in the indicators will not change in the first half of April when the table is evaluated. Germany has been the most successful European country that managed the outbreak for a long time. When the 10-day forecast values given in Table 8 and the forecast of the models in Figure 5 were examined, it was seen that there will be an increase in the indicators similar to the current days and the cumulative death in the cumulative case is very low (1%). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. The results for France were summarized in Table 9 and Figure 6 . The increase rate in the analyzed indicators continues. In the first half of April, the cumulative recovered prevalence in the cumulative case were expected to be around 14%. The mortality rate in positive cases is expected around 6%. USA data results were given in Table 10 and Figure 7 . The increase rate in the analyzed indicators continues. In the first half of April, the daily case prevalence in cumulative case was expected to be 12% and the cumulative deaths prevalence is around 2%. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. The results of UK data were given in Table 11 and Figure 8 . The rate of increase in the indicator values examined was similar in the first week of April. In this period, the cumulative recovered prevalence was quite low. The cumulative active case prevalence was found to be quite high (90%). When Canada data was evaluated, it is seen that the number of daily deaths is estimated to be quite low (Table 12 and Figure 9 ). The cumulative recovered prevalence was expected to be around 7% and the cumulative active case prevalence was around 87% in the first week of April. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. When Sweeden's data was evaluated, it was predicted that the new daily case prevalence will be around 5% in the first week of April, and the daily mortality and recoveried prevalence will still be at a very low level. In addition, cumulative death prevalence in cumulative case was estimated at 4% levels (Table 13 and Figure 10 ). At the end of the modeling for Australia data, it was predicted that the number of daily deaths will be very low in the first week of April, and the cumulative death prevalence will be observed at a very low level in the cumulative case. In addition, daily new cases and cumulative recovered prevalence were estimated to be around 4% (Table 14 and Figure 11 ). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. When Malaysia results were analyzed, it was predicted that the number of daily deaths will be very low and the cumulative death prevalence in the cumulative case will be 1% in the first week of April. In addition, the daily case prevalence was estimated to be around 6% and the cumulative recovered prevalence around 15% (Table 15 and Figure 12 ). These results emphasized that Malaysia has shown successful results in struggle with the outbreak. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. Model performance criteria obtained when Belgium data exposed to the outbreak between February 1-15 were modeled (Table 16) For Belgium, it was predicted that the daily mortality prevalence would be low in the cumulative case, the daily case prevalence would be around 15% and the cumulative recovered prevalence would be around 10% in the first week of April (Table 17 and Figure 13 ). Iran and Israel, which were exposed to the outbreak between 16-20 February, model performance criteria obtained and the model results were given in Table 18 . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. When Iran data is modeled, it was predicted that the daily mortality prevalence in the cumulative case will be low in the first week of April, the daily case prevalence would be around 6% and the cumulative recovered prevalence will be around 30% (Table 19 and Figure 14 ). is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. . When Israel data was modeled, it is predicted that there would be no new daily deaths in the first week of April, the daily case prevalence in the cumulative case would be around 10% and the cumulative recovered prevalence would be around 2% (Table 20 and Figure 15 ). The model performance criteria obtained when the data of 7 countries selected among the countries exposed to the outbreak between 21-29 February were modeled are given in Table 21 . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020 When the Netherlands results were analyzed, it was predicted that cumulative recovered and daily new deaths are very low in the first week of April, the daily case prevalence in the cumulative case would be around 8-10% and the cumulative active case prevalence would be reach around 92% (Table 23 and Figure 17 ). When the results of Austria data were evaluated, it was predicted that cumulative death, daily new death and cumulative recovered prevalence would be quite low in the first week of April. In addition, the cumulative recovered prevalence in the cumulative case was estimated to be around 2% (Table 24 and Figure 18 ). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Cumulative death, Daily new death and cumulative recovered prevalence were predicted to be quite low in the first week of April for Norway. In addition, the prevalence of daily new cases in the cumulative case was estimated to be around 6-7% (Table 25 and Figure 19 ). The prevalence of daily new death and cumulative recovered were predicted to be quite low in the first week of April for Brazil. On the other hand, the cumulative active case ratio in cumulative cases was estimated at 95% (Table 26 and Figure 20) . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. . Table 29 shows the model performance criteria for data from two countries selected from countries exposed to the outbreak between 1-7 March. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. The model performance criteria obtained when the data of one country selected among the countries exposed to the outbreak between 8-15 March are modeled were given in Table 29 . When the results of Turkey were analyzed, it was seen that the cumulative death prevalence in the cumulative case was about 1% in the first week of April, the daily case prevalence showed a decline, the forecast was around 6% on the tenth day and the cumulative recovered prevalence was around 2% (Table 33 and Figure 25 ). is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 1, 2020. . In the table, on the 5th day after modeling for each country, it was calculated as the prevalence of the indicators in the cumulative cases as %. When the table was investigated, in the first week of April it was seen that the cumulative death prevalence in Italy and Spain is the highest, followed by the Netherlands, France, UK China, Denmark, Belgium, Brazil and Sweden respectively. The daily new case proportion in cumulative cases can also be considered as an indicator of an increase or decrease in the severity of the outbreak. When this indicator was evaluated, it can be said that the highest values was observed in Belgium, USA, Canada, Poland, Ireland, Netherlands, France, and Israel, which vary between 10% and 12% and the lowest proportions was observed in China and South Korea. The daily deaths proportion in the cumulative case was low in many countries and similar. But it is lowest in Netherlands. The cumulative recovered proportion in the cumulative case is the another important criteria for outbreak. This proportion was the highest in China, followed by South Korea. However, the highest value was in Switzerland (16.45%) among the ten countries exposed to the outbreak after February 21. The prevalence was found 1.75% for Turkey. Finally, the active case proportion in cumulative case was another criterion that can be used to evaluate the course of the outbreak. The predicted active case proportions are high in most of countries. This result shows that there are many people in the active disease period. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 1, 2020. The calculations were made with on the 5th day after modeling for each country. The results obtained from the evaluations made with limited epidemiological data at the beginning of March or earlier have transferred some information to the present. As of the end of March, the longest 3.5 months and the shortest 1 month data are available and their evaluation is of great importance. The results of this study gives an important information for the countries struggling with the outbreak in less than one month or who have not yet been exposed to the outbreak. Also future models that can be simulated against similar risks will be obtained. It was predicted that the outbreak in China was almost under control as a result of the models obtained with this study. The number of cases and deaths has increased rapidly in Italy and the country could not control the outbreak since the outbreak started. However, it was seen that there was a relatively increase in the proportion of the recovered. Both the rapid case increase to date and the increase seen in future forecasts show that Spain has not yet been able to control the outbreak. The new case and recovered case proportions were higher than Italy. It is seen that there has not been a significant decrease in the severity of the outbreak, although it has been 2.5 months since Italy and Spain . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 1, 2020. . https://doi.org/10.1101/2020.04.26.20080754 doi: medRxiv preprint announced the first case. South Korea, which started to fight the outbreak in the same period, was the most successful country in terms of results, followed by Germany. The results in Iran was better than Israel, which is in the same period, and France that the outbreak started 1 month ago. The results of Turkey was similar to Germany and Ireland. The differences and advantages of this study from the modelling studies published before can be summarized as follows. • Modeling and comparative analysis of real-time and retrospective data of 25 countries with a total number of cases exceeding 1000 among infected countries until 10 March, • The number of data of the most countries has reached sufficient size to develop a good model • It is also the evaluation of trend and seasonal effect in time series models. Prediction and Analysis of Coronavirus Disease COVID-19, SARS-CoV2, Infection: Current Epıdemiological Analysis and Modeling of Disease Application of the ARIMA model on the COVID-2019 outbreak dataset Short-term Forecasts of the COVID-19 Outbreak in Guangdong and Zhejiang Optimization Method for Forecasting Confirmed Cases of COVID-19 in China Prediction of Outbreak Spread of the 2019 Novel Coronavirus Driven by Spring Festival Transportation in China: A Population-Based Study Preliminary prediction of the basic reproduction number of the Wuhan novel coronavirus 2019-nCoV Prediction of the Outbreak Peak of Coronavirus Disease in Japan Epidemiological and Clinical Predictors of COVID-19 Applied Econometric Time Series Non-linear Time Series Models in Empirical Finance Time Series Forecasting Using Holt-Winters Exponential Smoothing A New Approach of Brown's Double Exponential Smoothing Method in Time Series Analysis It was predicted that the number of new daily deaths will be low in the first week of April, the daily case prevalence in the cumulative case would be around 6-10% and the cumulative recovered prevalence would be around 75% when Switzerland data was modeled (Table 22 and Figure 16 ). According to Denmark results, the prevalence of daily new death and cumulative recovered were predicted to be quite low in the first week of April. On the other hand, the cumulative active case ratio in the cumulative case was estimated at 95% (Table 27 and Figure 21 ). It was predicted that daily new death and cumulative recovered prevalence will be quite low in the first week of April for Ireland.On the other hand, the cumulative active case prevalence in the cumulative case was estimated at 95% and the daily case prevalence was estimated at 7-10% (Table 28 and Figure 22 ). When Portugal's results were analyzed, it was predicted that daily new death and cumulative recovered prevalence would be quite low in the first week of April. On the other hand, the prevalence of daily cases in the cumulative cases was at most 15% and it was predicted that this prevalence will decrease gradually, but the cumulative active cases prevalence would be 95% (Table 30 and Figure 23 ). It was predicted that cumulative death, daily new death and cumulative recovered prevalence would be quite low in the first week of April when Poland results were analyzed. The prevalence of daily new cases in the cumulative case was estimated at around 20% (Table 31 and Figure 24 ).