key: cord-0688493-z1u3c5qm authors: MESSIS, A.; Adjebli, A.; Ayeche, R.; Ghidouche, A.; Ait ali, D. title: Forecasting daily confirmed COVID-19 cases in Algeria using ARIMA models date: 2020-12-20 journal: nan DOI: 10.1101/2020.12.18.20248340 sha: 3c4ecda3ac5320eede467b5789f68f1cbc75bbff doc_id: 688493 cord_uid: z1u3c5qm

ABSTRACT Coronavirus disease has become a worldwide threat affecting almost every country in the world. The aim of this study is to characterize COVID-19 cases (positive, recovered and deaths) in Algeria using the Double Exponential Smoothing method and to forecast them with an Autoregressive Integrated Moving Average (ARIMA) model. The data for this study cover the period from March 21st, 2020 to November 26th, 2020. The daily Algerian COVID-19 confirmed cases were sourced from the Ministry of Health, Population and Hospital Reform of Algeria. Based on the PACF and ACF results and the estimated parameters, the COVID-19 case series in Algeria follow an ARIMA(0,1,1) model. Observed cases during the forecast period were accurately predicted and fell within the prediction intervals generated by the fitted model. This study shows that ARIMA models with optimally selected covariates are useful tools for monitoring and predicting trends of COVID-19 cases in Algeria.

Keywords: COVID-19, Time series, Double Exponential Smoothing, ARIMA, forecast, Algeria.

On March 11th, 2020, the World Health Organization (WHO) declared COVID-19 a worldwide pandemic. In December 2019, a local outbreak of pneumonia of initially unknown cause was detected in Wuhan (Hubei, China) and was quickly determined to be caused by a novel coronavirus [1], namely severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) [2]. The outbreak first spread to every province of mainland China and then to countries and regions around the world. The contagious COVID-19 has devastated normal life around the world.
As of November 26th, 2020, more than 60,776,978 COVID-19 cases had been confirmed worldwide, more than 1,428,228 people had died, and more than 7 billion had been forced to stay in their homes [3]. In response to this ongoing public health emergency, an online interactive dashboard, hosted by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, Baltimore, MD, USA, was developed to visualize and track reported cases of coronavirus disease 2019 worldwide in real time [3]. Coronaviruses are a large family of viruses, some causing less severe disease, such as the common cold, and others more severe disease, such as MERS and SARS. Some transmit easily from person to person, while others do not.

All rights reserved. No reuse allowed without permission. author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

According to Chinese authorities, the virus in question can cause severe illness in some patients and does not transmit readily between people. The new coronavirus spreading around the world is threatening established norms, and fear and panic are becoming more prevalent. It has also affected the cryptocurrency market [4] [5]. Algeria reported its first COVID-19 case on February 25th, 2020. As of November 26th, 2020, Algeria had reported 79,110 confirmed cases, with 51,334 recoveries and 2,352 deaths from COVID-19 [6]. Countries all over the world are challenged by this virus and have declared lockdowns in their cities and states. Researchers estimate that each infected person transmits the virus to more than two other people, which highlights its potential to infect millions [6]. In order to control this pandemic, the Algerian government instituted several non-pharmaceutical intervention strategies on March 23rd, 2020.
These strategies, along with other measures such as social distancing, isolation and quarantine, aimed to break the chain of transmission of COVID-19 in Algeria [7]. These measures were implemented to flatten the pandemic curve and prevent an exponential rise in new COVID-19 infections, allowing effective management and control of the pandemic.

Accurate forecasting of COVID-19 case trends is essential for the preparedness of health systems in terms of outbreak management and resource planning. Mathematical and statistical models of infectious disease are effective tools that enable health systems to anticipate future disease trends [8] [9]. Time series models such as the Autoregressive Integrated Moving Average (ARIMA) have been widely used to statistically model and forecast infectious disease trends [10]. ARIMA models are preferred in this context because they are suitable for investigating short-term effects of acute infectious diseases, form a flexible class of models that can fit several trajectories, and are well documented in the literature [10] [11]. ARIMA models have been used in several studies to forecast COVID-19 outbreak trends [12] [13] [14] [15]. However, to date, no studies have used ARIMA models to forecast the COVID-19 outbreak in Algeria. In this study, ARIMA models were developed using daily COVID-19 confirmed and active cases in Algeria, from March 21st, 2020 to November 26th, 2020, to identify the best-fitting model for COVID-19 cases. Forecasting future COVID-19 cases using ARIMA models is especially suitable when the model parameters that determine the disease dynamics are unavailable or undetermined because of the novelty of the disease.
In addition, ARIMA models are a flexible, empirical method that is able to produce reliable forecasts in situations with limited data. This paper demonstrates that ARIMA models are able to provide reasonable forecasts even with the above-mentioned limitations.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This version posted December 20, 2020; https://doi.org/10.1101/2020.12.18.20248340; medRxiv preprint.

Data for this study were obtained from March 21st, 2020 to November 26th, 2020. The daily Algerian COVID-19 confirmed cases were sourced from the Ministry of Health, Population and Hospital Reform. Daily COVID-19 confirmed cases for neighboring countries were also obtained from the Johns Hopkins University's official website [6].

The Double Exponential Smoothing and Autoregressive Integrated Moving Average (ARIMA) procedures follow [16], with the following equations:
• Determine the first smoothing value and determine the parameter
• Determine the second smoothing value: $S''_t = \alpha S'_t + (1 - \alpha) S''_{t-1}$
• ARIMA Model For Time Series Data

The ARIMA model is stated as follows:

An ARIMA model forecasts a series from its own past values and is characterized by three terms: p, d and q, where p is the order of the autoregressive (AR) term, q is the order of the moving average (MA) term, and d is the number of differencing operations required to make the time series stationary. Our goal is to find the values of p, d and q that optimize the metric of interest [17]. The analysis was carried out in Minitab 17.
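The double exponential smoothing steps listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the Minitab procedure used in the study: it assumes Brown's one-parameter form, in which the first smoothed value is $S'_t = \alpha X_t + (1 - \alpha) S'_{t-1}$, both smoothers are initialised with the first observation, and the forecast is built from the usual level and trend terms. The function names and the input values are hypothetical.

```python
from typing import List, Tuple


def brown_double_smoothing(series: List[float], alpha: float) -> Tuple[List[float], List[float]]:
    """Return the first (S') and second (S'') smoothed series.

    Assumption: both smoothers start from the first observation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    first = [series[0]]
    second = [series[0]]
    for x in series[1:]:
        # First smoothing: S'_t = alpha * X_t + (1 - alpha) * S'_{t-1}
        first.append(alpha * x + (1 - alpha) * first[-1])
        # Second smoothing: S''_t = alpha * S'_t + (1 - alpha) * S''_{t-1}
        second.append(alpha * first[-1] + (1 - alpha) * second[-1])
    return first, second


def brown_forecast(series: List[float], alpha: float, horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead from the last smoothed values."""
    first, second = brown_double_smoothing(series, alpha)
    level = 2 * first[-1] - second[-1]
    trend = alpha / (1 - alpha) * (first[-1] - second[-1])
    return [level + trend * m for m in range(1, horizon + 1)]


if __name__ == "__main__":
    # Made-up daily counts, for illustration only (not data from the study).
    toy_counts = [100.0, 104.0, 109.0, 115.0, 122.0]
    print(brown_forecast(toy_counts, alpha=0.5, horizon=3))
```

A larger `alpha` gives more weight to recent observations, so the forecast reacts faster to changes in the trend at the cost of more noise.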
In general, the equation can be approximated using a regression model:

Using the time-series model approach, the COVID-19 data in Algeria show an exponential pattern, in which the number of positive COVID-19 cases increases significantly every day of the epidemic. The numbers of people who recovered and died follow a similar pattern (Figure 1). In time-series modeling, the exponential-type models comprise the single exponential, double exponential, and Winters' methods. Based on the literature, the best model is the double exponential model [18].

At a significance level (α) of 5%, the series follows an ARIMA(0,1,1) process, since the p-value of the MA(1) term (0.0%) is smaller than α.
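To make the ARIMA(0,1,1) structure concrete, the sketch below shows how such a model produces forecasts once its constant and MA(1) coefficient are known. It assumes the Box-Jenkins sign convention, $Y_t - Y_{t-1} = c + e_t - \theta e_{t-1}$, and an initial residual of zero; the function names, the coefficients and the series are hypothetical and are not the values estimated in this study.

```python
from typing import List


def arima011_residuals(series: List[float], constant: float, theta: float) -> List[float]:
    """One-step-ahead residuals for Y_t - Y_{t-1} = c + e_t - theta * e_{t-1}.

    Assumption: the residual before the first observation is zero.
    """
    residuals = [0.0]
    for t in range(1, len(series)):
        predicted = series[t - 1] + constant - theta * residuals[-1]
        residuals.append(series[t] - predicted)
    return residuals


def arima011_forecast(series: List[float], constant: float, theta: float, horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead of the last observation."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    last_residual = arima011_residuals(series, constant, theta)[-1]
    forecasts = []
    previous = series[-1]
    for step in range(1, horizon + 1):
        # The MA(1) term only affects the first step; future shocks have mean zero.
        ma_term = theta * last_residual if step == 1 else 0.0
        previous = previous + constant - ma_term
        forecasts.append(previous)
    return forecasts


if __name__ == "__main__":
    # Made-up cumulative counts and coefficients, for illustration only.
    toy_series = [10.0, 12.0, 15.0]
    print(arima011_forecast(toy_series, constant=1.0, theta=0.5, horizon=2))
```

Beyond the first step the forecast grows by the constant alone, which is why ARIMA(0,1,1) forecasts with a constant follow a straight line after one period.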
The estimated model parameters for the COVID-19 positive data using the ARIMA model are shown in Table 1. Referring to equation (4), the ARIMA(0,1,1) model can be stated mathematically using the coefficients in Table 1.

Following the same steps as for the positive data, the next step is the PACF and ACF identification process for the recovery data. As for the COVID-19 positive data, Figure 6 shows the PACF and ACF plots of residuals for the COVID-19 recovery data. The PACF cuts off at lag one and the ACF tails off slowly. At a significance level (α) of 5%, the series follows an ARIMA(0,1,1) process, since the p-value of the MA(1) term (0.0%) is smaller than α. The estimated parameters for the COVID-19 recovery data using the ARIMA model are reported in Table 2. Referring to equation (4), the ARIMA(0,1,1) model in Table 2 can be stated mathematically as follows: = 205.53 − 0.30 −1

After the positive and recovery data have been analyzed, the PACF and ACF plots of residuals for the COVID-19 death data are shown in Figure 7. The PACF cuts off at lag two and the ACF tails off slowly. At a significance level (α) of 5%, the series follows an ARIMA(0,1,1) process, since the p-value of the MA(1) term (0.00%) is smaller than α. All estimated parameters of the ARIMA model are shown in Table 3.
Referring to equation (4), the ARIMA(0,1,1) model can be stated mathematically using the constant and coefficients obtained in Table 3 as follows: =9.343−0.415 −1

Since the WHO declared COVID-19 a pandemic on March 11th, 2020, several countries, including Algeria, have experienced an exponential rise in COVID-19 cases [7]. This rapid increase in cases has stressed most healthcare systems worldwide and has made outbreak response and resource planning a challenge. In response, health authorities have attempted to forecast the trend of this pandemic; however, this has proven difficult, as COVID-19 is a novel disease with limited data and knowledge on its trends and dynamics [6]. This is especially the case when using ARIMA models to predict disease trends, since ARIMA models require sufficiently long time series to be accurate. Our forecast showed an accurate trend which corresponded to the positive cases observed and reported by the Ministry of Health in Algeria over three days (days 252, 253 and 254). The same result was obtained for the forecasted recovery and death cases. As shown in Table 4, this finding is strengthened by variations of less than 5% between the forecast and observed cases in 100% of the forecasted data points. This paper demonstrates that ARIMA models are a suitable tool to forecast case trends, especially in situations where data are limited.
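The accuracy criterion described above, a variation of less than 5% between forecast and observed cases, can be checked with a short calculation. The sketch below assumes the variation is the absolute difference expressed as a percentage of the observed value; the function names and the numbers are hypothetical and are not taken from Table 4.

```python
from typing import List


def percent_variation(observed: List[float], forecast: List[float]) -> List[float]:
    """Absolute forecast error as a percentage of each observed value."""
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    return [100.0 * abs(f - o) / o for o, f in zip(observed, forecast)]


def all_within(observed: List[float], forecast: List[float], limit: float = 5.0) -> bool:
    """True when every forecast deviates from its observation by less than `limit` percent."""
    return all(v < limit for v in percent_variation(observed, forecast))


if __name__ == "__main__":
    # Made-up values for three forecast days, for illustration only.
    observed_cases = [1000.0, 1020.0, 1045.0]
    forecast_cases = [990.0, 1031.0, 1060.0]
    print(percent_variation(observed_cases, forecast_cases))
    print(all_within(observed_cases, forecast_cases))
```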
Similarly, studies on COVID-19 conducted in countries such as South Korea, Iran and Italy were able to predict case trends using ARIMA models under similar conditions. In addition, in line with our findings, a study in Italy reported a high level of forecast accuracy of 95% in predicting COVID-19 trends using ARIMA models [14] [19] [20].

This study has several strengths. Firstly, this paper is the first to report the use of ARIMA models to forecast COVID-19 cases and trends in Algeria. Secondly, this was the first attempt to use smoothed case data to improve accuracy, compared with similar studies on ARIMA models for COVID-19 conducted in other countries [20] [21]. Thirdly, we used several independent covariates, which provided more accurate signals for developing short-term model predictions for immediate outbreak response. Finally, we optimized the model training and validation period to provide the highest number of data points for generating the best-fitting model.

This study demonstrated the effectiveness of ARIMA models as an early warning strategy that can provide accurate COVID-19 forecasts on a larger number of data points (251 days).
ARIMA models are not only effective but it's a simple and easy tool by

Preventing intra-hospital infection and transmission of COVID-19 in healthcare workers
Clinical findings in a group of patients infected with the 2019 novel coronavirus (SARS-Cov-2) outside of Wuhan, China: retrospective case series
World Health Organization. Coronavirus disease (COVID-19) outbreak situation URL
Prediction of the epidemic peak of coronavirus disease in Japan
Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data
Novel coronavirus (COVID-19) cases URL
A novel coronavirus outbreak of global health concern
Early dynamics of transmission and control of COVID-19: a mathematical modelling study
Investigating a serious challenge in the sustainable development process: analysis of confirmed cases of COVID-19 (new type of coronavirus) through a binary classification using artificial intelligence and regression analysis
Use of time-series analysis in infectious disease surveillance
A systematic review of methodology: Time series regression analysis for environmental factors and infectious diseases
Application of the ARIMA model on the COVID-2019 epidemic dataset
COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: A data driven model approach
Forecasting daily confirmed COVID-19 cases in Malaysia using ARIMA models
Prediction of the COVID-19 Pandemic for the Top 15 Affected Countries: Advanced Autoregressive Integrated Moving Average (ARIMA) Model
Modeling COVID-19 Epidemic of USA, UK, and Russia
Lag order and critical values of the augmented Dickey-Fuller test
Identification COVID-19 Cases in Indonesia with The Double Exponential Smoothing Method
Confirmed Cases in Different Countries with ARIMA Models
COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: A data driven model approach
Application of the ARIMA model on the COVID-2019 epidemic dataset

The authors would like to thank the Ministry of Health,