key: cord-0844719-44hsw4qp authors: Awan, Tahir Mumtaz; Aslam, Faheem title: Prediction of daily COVID-19 cases in European countries using automatic ARIMA model date: 2020-07-08 journal: J Public Health Res DOI: 10.4081/jphr.2020.1765 sha: 9fe7baefce887f37443ecfa667b440e15e91edf7 doc_id: 844719 cord_uid: 44hsw4qp

The recent pandemic (COVID-19) emerged in the city of Wuhan, China. After causing widespread destruction there, it has recently shifted its epicenter to Europe. Countless people have been affected, and the number of reported cases is increasing day by day. Predictive models need to take previously reported cases into account to forecast the upcoming number of cases. Automatic ARIMA, one of the predictive models used for forecasting contagions, was used in this study to predict the number of confirmed cases for the next 10 days in the four top European countries through the R package "forecast". The study finds that Auto ARIMA applied to the sample satisfactorily forecasts the confirmed cases of coronavirus for the next ten days. The confirmed cases for the four countries show an increasing trend over the next ten days. Spain has the highest number of expected new confirmed cases, followed by Germany and France. Italy is expected to have the lowest number of new confirmed cases among the four countries.

A growing list of countries are in lockdown, and governments are ordering residents to self-quarantine by staying inside their homes during the coronavirus pandemic. According to recent statistics, as of 23rd March 2020, COVID-19 had spread to 168 countries, with 360,697 confirmed infections, 15,495 deaths and 100,471 recovered cases worldwide. The top countries in terms of total confirmed infections are China, Italy, the USA, Spain, Germany, Iran, France, and South Korea. In terms of deaths, the top countries and regions are Italy, the Hubei province of China, Spain, Iran, France, the UK, the Netherlands, and Switzerland.
According to the Coronavirus COVID-19 Global Cases dashboard of the Center for Systems Science and Engineering (CSSE) at the Johns Hopkins University (JHU) Coronavirus Resource Center, 1 the highest numbers of recovered patients are found in the Hubei province of China, Iran, Italy, Spain, South Korea, France, and the Guangdong and Hunan provinces of China. The data are available in the COVID-19 repository (https://systems.jhu.edu/research/public-health/ncov/) operated by JHU CSSE with the support of the Esri Living Atlas Team and the JHU Applied Physics Lab. A live news dashboard is also available at https://visualizenow.org/corona-news.

COVID-19 was initially named Novel Coronavirus (2019-nCoV) by the National Institute of Viral Disease Control and Prevention (IVDC) on 3rd January 2020. 2 On 11th February 2020, the World Health Organization named the disease COVID-19, 3, 4 whereas the virus itself is named SARS-CoV-2. The World Health Organization subsequently declared this deadly epidemic a pandemic. 5, 6 After causing mass destruction in China, especially in the Hubei province where it originated, 7 it has now shifted its epicenter to Europe. 8

Diseases caused by related coronaviruses have a history of outbreaks:
- in 2018 (MERS-CoV), with 41 deaths in Saudi Arabia; 9
- in 2015 (MERS-CoV), with 36 deaths in South Korea; 10
- in 2012 (MERS-CoV), with over 400 deaths; 11
- in 2003 (SARS-CoV), with about 774 deaths. 12

As of 23rd March 2020, the EU/EEA and the UK had reported 160,233 cases in total and 8622 fatalities. Italy was at the top with 59,138 cases and 5476 deaths, followed by Spain, Germany and France with 28,572, 24,774, and 16,018 cases, respectively. 13 In pandemic outbreaks of this kind, forecasting is important, as it is in many scientific and engineering disciplines.
14 The use of statistical methods for prediction is of great importance. It helps authorities make the necessary arrangements and respond in time, which may ultimately reduce the loss of life in the current pandemic. The Auto-Regressive Integrated Moving Average (ARIMA) model is a forecasting model that uses time series data to predict future values. It has been applied in various domains, for example:
- to predict next-day electricity prices; 15
- to forecast primary energy demand; 16
- to predict stock prices; 17
- to predict water quality; 18
- to forecast traffic flow. 19

It has also been applied in medical science in general and to epidemics in particular. 14, [20] [21] [22] [23] [24] [25] Specifically, a recent article on COVID-19 26 used an ARIMA model to predict the epidemiological trend of the prevalence and incidence of the pandemic, and another article 27 reported similar results for COVID-19. This study, however, makes predictions for the next seven days and focuses primarily on forecasting the confirmed cases in European countries using the ARIMA technique. Data on confirmed COVID-19 cases up to 21st March 2020 were used to predict the cases of the upcoming week.

The World Health Organization (WHO) and medical authorities all over the world, and especially in the European countries, are busy taking appropriate measures against COVID-19. Proper planning is important, and success depends on the arrangements made in the near future to stop the spread of this disease. By predicting upcoming cases, this study will help authorities plan accordingly, i.e., arrange an appropriate number of medical facilities. Similar approximations for other parts of the world can be made by following the methodology used in this paper, so that better medical arrangements can be ensured. Overall, such research plays an important role in policy making and in setting up task forces to combat epidemics.

The materials and methods section below discusses the forecasting mechanism in detail. The results, which are based on 80% and 95% confidence intervals, are discussed afterwards. The final section presents the conclusions, implications and recommendations of the study for the government departments and health ministries of the European countries. These are intended to help them take preventive measures and make quick policy decisions to overcome this deadly pandemic.

In this study, the ARIMA technique was used to estimate the upcoming cases of COVID-19 in the European countries. For this purpose, time series data of daily confirmed coronavirus cases in these countries were considered. ARIMA is one of the most popular models for time series forecasting. It combines the autoregressive (AR) model and the moving average (MA) model with differencing (the "integrated" part). The ARMA part of the model assumes a stationary time series, i.e., one whose mean and variance do not change over time. A non-stationary series is therefore first made stationary by differencing it d times. ARIMA analysis identifies the underlying process from the observations in order to obtain a precise process-generating mechanism, which results in a good model. 28 It includes three steps: identification, estimation, and diagnostic checking. 29, 30 In general, an ARIMA model can be viewed as a filter that tries to separate the signal from the noise; the signal is then extrapolated into the future to obtain forecasts. The data for this study were taken from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports. This repository is maintained by the Center for Systems Science and Engineering (CSSE) at the Johns Hopkins University (JHU) Coronavirus Resource Center and is updated through GitHub pull requests.
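The differencing step mentioned above, the "integrated" part of ARIMA with order d, can be illustrated with a short Python sketch. This is not the authors' code, since the study used R. The function names `difference` and `undifference` and the toy series are hypothetical. They only show two things: how differencing turns a trending cumulative count into a flatter series, and how forecasts of the differenced series are mapped back to levels.

```python
from typing import List, Sequence


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply the first difference x_n - x_{n-1} to the series d times."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


def undifference(last_level: float, diffs: Sequence[float]) -> List[float]:
    """Invert one (d = 1) difference by accumulating diffs onto the last observed level."""
    levels = []
    level = last_level
    for delta in diffs:
        level += delta
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Hypothetical cumulative confirmed cases (not data from the study).
    cumulative = [10, 15, 25, 40, 60]
    daily_new = difference(cumulative)          # [5, 10, 15, 20]
    second_diff = difference(cumulative, d=2)   # [5, 5, 5]
    # Hypothetical forecasts of the next two daily increments, mapped back to totals.
    future_totals = undifference(cumulative[-1], [25, 30])  # [85, 115]
    print(daily_new, second_diff, future_totals)
```

For this toy quadratic series, one difference gives the daily new cases, and a second difference is constant (trivially stationary). `undifference` shows how forecasted daily increments are added back onto the last observed cumulative total.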
Data on reported COVID-19 cases in four European countries were used in this study for the following reasons:
i) Europe is at high risk because of its population density and its business connections all over the world;
ii) European countries have exhibited a high peak of cases in recent days.

Daily data for the most affected countries, namely Italy, Germany, France and Spain, were collected from January 22nd, 2020 to March 28th, 2020, which corresponds to 66 observations. These four countries were selected on the basis of the highest daily growth, $\Delta X_n = X_n - X_{n-1}$, computed by taking the first difference of the confirmed cases, which reveals non-constant growth of the daily confirmed cases.

ARIMA is a frequently used technique for forecasting time series data. It is specified by three order parameters:
- p, the order of the autoregressive (AR) part;
- d, the order of differencing;
- q, the order of the moving average (MA) part.

The procedure for fitting an ARIMA model is also referred to as the Box-Jenkins method. 31 AR is a class of linear models in which the variable of interest is regressed on its own lagged values. If $y_t$ is modeled as an AR(p) process, it can be written as:

$$y_t = \delta + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \epsilon_t \qquad (1)$$

where $\delta$ is the intercept, $y_{t-i}$ are the regressors, $\phi_i$ are the autoregressive coefficients, and $\epsilon_t$ is a white-noise error term. MA is another class of linear models. Here the variable of interest is modeled through its own imperfectly predicted values, i.e., the errors at the current and previous times. In terms of error terms, an MA(q) process can be written as:

$$y_t = \mu + \epsilon_t + \sum_{j=1}^{q} \theta_j\, \epsilon_{t-j} \qquad (2)$$

where $\mu$ is the mean of the series and $\theta_j$ are the moving-average coefficients. The mathematical form of ARMA(p,q) is as follows:

$$y_t = \delta + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \sum_{j=1}^{q} \theta_j\, \epsilon_{t-j} + \epsilon_t \qquad (3)$$

In short, using the backshift operator $B$ (with $B^k y_t = y_{t-k}$), the above equation can be rewritten as:

$$\phi(B)\, y_t = \delta + \theta(B)\, \epsilon_t, \qquad \phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^i, \quad \theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^j \qquad (4)$$

For parameter estimation, the "auto.arima" function of the R package "forecast" was used.
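Equation (3) can be read as a recursion for one-step-ahead forecasts. The Python sketch below is a hypothetical illustration of that recursion with coefficients supplied by hand. It assumes a conditional convention: residuals for the first p conditioning observations, and any errors before the sample starts, are set to zero. This is a simplification for illustration only, not a description of how the forecast package estimates the coefficients.

```python
from typing import List, Sequence, Tuple


def arma_one_step(y: Sequence[float], delta: float, phi: Sequence[float],
                  theta: Sequence[float]) -> Tuple[float, List[float]]:
    """One-step-ahead ARMA(p, q) forecast following equation (3).

    y_hat_t = delta + sum_i phi_i * y_{t-i} + sum_j theta_j * e_{t-j}

    Returns the forecast for the period after the last observation and the
    in-sample residuals e_t = y_t - y_hat_t. Residuals for the first p
    observations, and errors before the sample starts, are taken as zero
    (a conditional simplification, assumed here for illustration).
    """
    p, q = len(phi), len(theta)
    if len(y) < p:
        raise ValueError("need at least p observations")
    residuals = [0.0] * len(y)

    def fitted(t: int) -> float:
        # AR part uses past observations; MA part uses past residuals.
        ar = sum(phi[i - 1] * y[t - i] for i in range(1, p + 1))
        ma = sum(theta[j - 1] * residuals[t - j]
                 for j in range(1, q + 1) if t - j >= 0)
        return delta + ar + ma

    for t in range(p, len(y)):
        residuals[t] = y[t] - fitted(t)
    return fitted(len(y)), residuals


if __name__ == "__main__":
    # Hypothetical ARMA(1, 1) coefficients, not estimates from the study.
    forecast, resid = arma_one_step([1.0, 2.0, 4.0], delta=0.5, phi=[0.5], theta=[0.5])
    print(forecast, resid)  # 3.5 [0.0, 1.0, 2.0]
```

Setting `theta=[]` reduces the recursion to the AR(p) form of equation (1), and setting `phi=[]` reduces it to the MA(q) form of equation (2).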
The auto.arima function of the forecast package 32, 33 fits the best ARIMA model to a univariate time series. It returns the best model according to the Akaike Information Criterion (AIC), its small-sample correction (AICc), or the Bayesian Information Criterion (BIC). 34, 35 The function searches over possible models 36 within the order constraints provided. Table 1 documents the details of the candidate models with their corresponding AIC values. The best models for Italy, Germany, France and Spain, selected on the basis of AIC, are highlighted. After model selection, the best-fitting models were used to forecast the growth of confirmed COVID-19 cases in all four countries.

Based on confirmed COVID-19 cases, predictions were made for the next 10 days for the top four European countries, namely Italy, Spain, Germany, and France. Table 2 details the ten-day forecasts for the four countries with 80% and 95% confidence intervals (CI). The minimum and maximum values of both intervals are also presented in the table.

For instance, Spain is predicted to see an increasing number of additional cases over the coming 10 days, with an average addition of 11,410 cases. For Spain, the 95% confidence interval indicates an increase of between a minimum of 8770 and a maximum of 10,975 cases. Likewise, Italy would see an average of 6190 additional cases over the next ten days, i.e., by the end of the first week of April 2020. This ranges from 3540 (lower bound) to 8407 (upper bound). A similar increasing trend can be observed for Germany and France from 29th March 2020 to 4th April 2020. Germany would experience an average increase of 9966 confirmed cases over the next ten days, ranging from a minimum of 7776 to a maximum of 12,156 per day at the 95% confidence level. Compared with Germany, the addition in France is slightly lower.
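The auto.arima model-selection step described above can be sketched in Python. The sketch assumes each candidate model has already been fitted and summarised by two numbers: its maximised log-likelihood and its number of estimated parameters k, counting the noise variance. The candidate orders and log-likelihood values are hypothetical.

It uses the standard definitions:
- AIC = −2 log L + 2k;
- AICc = AIC + 2k(k+1)/(n−k−1);
- BIC = −2 log L + k log n.

In the forecast package, the differencing order d is normally chosen first with unit-root tests, and only models with that d are compared by the criterion. For that reason, all hypothetical candidates below share d = 1.

```python
import math
from typing import Dict, Tuple

Order = Tuple[int, int, int]


def information_criteria(log_lik: float, k: int, n: int) -> Dict[str, float]:
    """AIC, AICc and BIC for a model with k estimated parameters fitted to n observations."""
    aic = -2.0 * log_lik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * log_lik + k * math.log(n)
    return {"aic": aic, "aicc": aicc, "bic": bic}


def select_order(candidates: Dict[Order, Tuple[float, int]], n: int,
                 criterion: str = "aic") -> Tuple[Order, Dict[Order, float]]:
    """Return the (p, d, q) order with the smallest criterion value, plus all scores."""
    scores = {order: information_criteria(log_lik, k, n)[criterion]
              for order, (log_lik, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical fits: (p, d, q) -> (maximised log-likelihood, number of parameters).
    candidates = {
        (0, 1, 0): (-300.0, 1),
        (1, 1, 0): (-290.0, 2),
        (2, 1, 1): (-289.5, 4),
    }
    for ic in ("aic", "aicc", "bic"):
        best, scores = select_order(candidates, n=60, criterion=ic)
        print(ic, best, round(scores[best], 2))
```

A lower value is better for all three criteria. BIC penalises extra parameters more heavily than AIC once n exceeds about 7 (log n > 2), so it tends to favour smaller models.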
For France, we predict an average addition of 6937 cases per day, with a minimum of 5848 and a maximum of 8027 cases per day at the 95% confidence level. The forecasts of the additional number of cases are presented in Figure 1. The blue line shows the forecast values, the dark grey area shows the 95% confidence interval, and the light grey area shows the 80% lower and upper bounds.

The ACF and PACF plots in Figures 2 and 3 show no significant autocorrelations, indicating that the residuals behave like white noise. To test overall randomness over a number of lags, a portmanteau test (the Box-Pierce test) was applied to the residuals of all fitted ARIMA models. The p-values of the Box-Pierce test also suggest that the residuals are white noise. A non-significant p-value means that the null hypothesis of no residual autocorrelation cannot be rejected.

The purpose of this study was to predict the upcoming confirmed cases of COVID-19 in the top four countries by number of confirmed cases to date: Italy, Spain, Germany and France. Such a study is needed because the estimates for the next ten days show governments whether cases will increase or decrease. Governments can then plan their strategies and manage medical facilities accordingly. The ten-day predictions for these four countries show an increasing trend, indicating more destruction in these countries in the coming days. Over the next ten days, Spain is expected to have an average of 11,410 additional cases, Germany 9966, France 6937, and Italy 6190 additional confirmed cases. Hospitals need to prepare more isolation wards, and medical supplies must be ensured for the upcoming cases. Furthermore, more investment in health and in primary prevention is needed to deal with the burden of this pandemic.
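The portmanteau check described above can be sketched in Python. The Box-Pierce statistic is Q = n Σ_{k=1}^{h} r_k², where r_k is the lag-k sample autocorrelation of the residuals. Under the white-noise null hypothesis, Q is approximately χ² with h − fitdf degrees of freedom; fitdf is commonly set to p + q for residuals of a fitted ARMA model. The function names, the lag choice and the residual values in the usage example are hypothetical. The χ² tail probability is computed from the series expansion of the regularized incomplete gamma function, so the sketch needs only the standard library. A large p-value (e.g., above 0.05) gives no evidence against white-noise residuals.

```python
import math
from typing import List, Sequence, Tuple


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1..r_max_lag (demeaned, normalised by the lag-0 sum)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variable at x.

    Uses the power series for the regularized lower incomplete gamma P(df/2, x/2);
    adequate for moderate x such as typical portmanteau statistics.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def box_pierce(residuals: Sequence[float], lag: int, fitdf: int = 0) -> Tuple[float, float]:
    """Box-Pierce statistic Q and its chi-square p-value with lag - fitdf degrees of freedom."""
    if lag <= fitdf:
        raise ValueError("lag must exceed fitdf")
    r = acf(residuals, lag)
    q = len(residuals) * sum(rk * rk for rk in r)
    return q, chi2_sf(q, lag - fitdf)


if __name__ == "__main__":
    # Hypothetical residuals of an ARIMA model with p + q = 1 (not the study's).
    resid = [0.3, -0.1, 0.4, -0.5, 0.2, 0.1, -0.3, 0.0, 0.2, -0.4]
    q, p_value = box_pierce(resid, lag=3, fitdf=1)
    print(round(q, 3), round(p_value, 3))
```

The Ljung-Box variant of the test reweights each r_k² by (n + 2)/(n − k) and is usually preferred for short series; the sketch implements only the Box-Pierce form named in the text.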
An interactive web-based dashboard to track COVID-19 in real time
How early signs of the coronavirus were spotted, spread and throttled in China
Coronavirus disease named Covid-19. Channel News Asia, China
WHO names novel coronavirus as 'COVID-19'. Channel News Asia
Coronavirus: COVID-19 is now officially a pandemic, WHO says
Coronavirus confirmed as pandemic by World Health Organization
A comprehensive timeline of the new coronavirus pandemic, from China's first COVID-19 case to the present
Coronavirus: Europe now epicentre of the pandemic, says WHO
diseases/news/infectious-disease-outbreaks-reported-in-theeastern-mediterranean-region-in-2018.html
10. WHO. Middle East respiratory syndrome coronavirus (MERS-CoV) - Republic of Korea
Responding to global infectious disease outbreaks: lessons from SARS on the role of risk perception, communication and management
European Centre for Disease Prevention and Control. Covid-19 situation update for the EU/EEA and the UK
Deep transformer models for time series forecasting: The influenza prevalence case
ARIMA models to predict next-day electricity prices
ARIMA forecasting of primary energy demand by fuel in Turkey
Stock price prediction using the ARIMA model
A hybrid neural network and ARIMA model for water quality time series prediction
Combining Kohonen maps with ARIMA time series models to forecast traffic flow
Predicting Seasonal influenza based on SARIMA model
Forecasting influenza activity using self-adaptive AI model and multi-source data in Chongqing
Comparative evaluation of time series models for predicting influenza outbreaks: application of influenza-like illness data from sentinel sites of healthcare centers in Iran
Mortality forecasting in the context of non-linear past mortality trends: an evaluation
Forecasting respiratory infectious outbreaks using ED-based syndromic surveillance for febrile ED visits in a metropolitan city
DEFSI: Deep learning based epidemic forecasting with synthetic information
Application of the ARIMA model on the COVID-2019 epidemic dataset
Real-time forecasts of the COVID-19 epidemic in China from
Time series analysis: Forecasting and control
Time series forecasting using a hybrid ARIMA and neural network model
Introduction to time series and forecasting
Advances in Box-Jenkins modeling: 1. Model construction
Characteristic-based clustering for time series data
A study of time series models ARIMA and ETS. Available at SSRN 2898968
Using the R-package to forecast time series: ARIMA models and Application. INTERNATIONAL CONFERENCE Economic & Social Challenges and Problems 2010 Facing Impact of Global Crisis
The intelligent forecasting model of time series. Automation, Control and Intelligent Systems