key: cord-0921844-f27c9og6 authors: GAHAN, P.; PATTNAIK, M.; Nayak, A.; Roul, M. K. title: Prediction of COVID-19 Pandemic of Top Ten Countries in the World Establishing a Hybrid AARNN LTM Model date: 2021-01-05 journal: nan DOI: 10.1101/2020.12.31.20249105 sha: 502d11836511ecdc768fcaaefe7fdbd98abe2e22 doc_id: 921844 cord_uid: f27c9og6 The novel COVID-19 global pandemic has become a public health emergency of international concern affecting 215 countries and territories around the globe. As of 28 November 2020, it has caused a pandemic outbreak with a total of more than 6,171,5119 confirmed infections and more than 1,44,4235 confirmed deaths reported worldwide. The main focus of this paper is to generate LTM real-time out of sample forecasts of the future COVID-19 confirmed and death cases respectively for the top ten profoundly affected countries including for the world. To solve this problem we introduced a novel hybrid approach AARNN model based on ARIMA and ARNN forecasting model that can generate LTM (fifty days ahead) out of sample forecasts of the number of daily confirmed and death COVID-19 cases for the ten countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and also for the world respectively. The predictions of the future outbreak for different countries will be useful for the effective allocation of health care resources and will act as early-warning system for health warriors, corporate leaders, economists, government/public-policy makers, and scientific experts. traditional time series forecasting, the ARIMA model is used predominantly for forecasting linear time series (Box et al., 2015) . Machine learning models such as ANN, ARNN and WBF (Paul a al., 2017, Nury et al., 2017, Aminghafari et al., 2007 , Beraouda et al., 2006 and Fay et al., 2007 are proved to perform well for nonlinear data structures in time series data. COVID-19 data sets are neither purely linear nor nonlinear. They usually contain both linear and nonlinear patterns. If this is the case, then the individual ARIMA or ARNN or ANN or WBF is not enough to model such situations. Therefore, the combination of linear and nonlinear models can be well suited for accurately modeling such complex autocorrelation structures ( To ignore this drawback, we took recourse to the ARNN model approach. ARNN fits a feed-forward neural network model with only one hidden layer to a time series with lagged values of the series as inputs. The advantage of ARNN over ANN and WBF is that ARNN is a nonlinear autoregressive model, and it provides less complexity, easy interpretability and better prediction as compared to others in many situations. Due to this, ARNN is getting more attention in recent literature of non-stationary time series forecasting (Hydmann, 2018) . In the absence of vaccines or antiviral drugs for COVID-19, these estimates will present an insight into the resource allocations for the extremely influential countries to keep this global pandemic under control. Besides detaching light on the dynamics of COVID-19 dissemination, the practical intent of this real-time data-driven analysis is to provide health warriors, corporate leaders, economists, government/publicpolicy makers, and scientific experts with realistic estimates for the magnitude of the pandemic for policy-making. The main inspiration of this paper is to handle the dynamism of the characteristics of the time series COVID-19 data. Taking a final decision in policy making based on a component model may be risky in such a severe problem like COVID-19 pandemic confirmed and death cases forecasting where one frequently observes changes in the dynamic properties of the variable being measured. Hybridization of two or more models is the most common solution to this problem (Oliveira et al., 2014). Even from the practitioners' point of view, hybrid models are more effective when the complete data characteristics are not known (Kuncheva, 2004) . Motivated from these discussions, this paper proposes a novel hybrid ARIMA-ARNN (AARNN) model that captures complex data structures and linear plus nonlinear behavior of COVID-19 data sets. In the first phase of our proposed model, ARIMA catches the linear patterns of the data set. Then the ARNN model is employed to capture the nonlinear patterns in the data using residual values obtained from the base ARIMA model. The proposed model has easy interpretability, robust predictability and can adapt seasonality indices as well. Through experimental evaluation, we have shown the excellent performance of the proposed hybrid AARNN model for the COVID-19 global pandemics forecasting for profoundly affected top ten countries including world data sets. The rest of the paper is structured as follows: In Section 2, country-wise COVID-19 confirmed and death cases datasets, preliminary of data analysis and performance evaluation metrics are presented. In Section 3, we discuss the proper formulation of the proposed hybrid AARNN model. We discuss the experimental evaluations and results for LTM out-of-sample forecasts of COVID-19 global pandemic for the top ten countries including for the world in Section 4. The discussions about the results and practical implications are presented in Section 5. Finally, in Section 6, concluding remarks of this study are presented. We explore on the daily confirmed and death cases of COVID-19 pandemic for ten countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and for the World respectively. The datasets are retrieved by the Global Change Data Lab 1 . The starting date of all the countries is different but the ending date is same for all i.e. November 28, 2020. Present section includes the brief discussion about the datasets, the development of the proposed hybrid model and finally, the application of the proposed model to generate long-term-memory (LTM) out-of-sample forecasts of the COVID-19 confirmed and death cases for the above ten countries including the world. 2.1 COVID-19 datasets Eleven univariate time series datasets of both confirmed and death cases of COVID-19 are collected for the real-time prediction purpose for the ten countries including the world. Different earlier studies have forecasted future COVID-19 confirmed cases only for different countries using mathematical and traditional time series models [].We try to investigate accurate prediction both of COVID-19 confirmed and death cases of top ten profoundly affected countries. Initial day of daily laboratory-confirmed and -death cases for all the countries are different but the last day of observation for all the countries is considered same timeline i.e. 28.11.2020. Table 2 shows the description of COVID-19 data sets of confirmed cases and death cases for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and for world respectively. The dataset of confirmed cases for the USA, India, Russia, France, Spain, UK and world contains a total of more than 300 observations whereas the dataset for Brazil, Argentina and Colombia contains less than 300 observations. The dataset of death cases for all the studied countries are in between a total of 205 and 300 observations except for world. 2.2 Preliminary data analysis A summary of the COVID-19 data sets of confirmed and death cases are shown in Table 1 . From p-values it indicates that all the eleven COVID-19 data sets of both confirmed and death cases are not normally distributed. Table 3 and 4 (A-1 & A-2) show the daily confirmed and death cases of COVID-19 pandemic of ten countries including world respectively. The epidemic curves still not showing the spiky declining nature. We bound our attention for trend and non-seasonal models. We consider a novel approach which assumes that the trend will tend to infinity in the future. To fit the traditional ARIMA model, hybrid ARIMA-ANN, WARIMA and proposed AARNN models of COVID-19 confirmed and death cases, the parameters namely ‫‬ and ‫ݍ‬ are specified by plotting ACF and PACF plots (see Table 3 and 4). (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. is the predicted output, and ݊ denotes the number of data points. By definition, the lower the value of these performance metrics, the better is the performance of the concerned forecasting model. To forecast COVID-19 confirmed and death cases, we adopt hybrid time series forecasting approaches combining ARIMA and ARNN techniques. The proposed AARNN hybrid model overcomes the deficiencies of the individual linear and nonlinear and other hybrid time series models. Before describing the methodology, we give a brief description of the individual traditional and advanced models to be used in the hybridization. is the random error at time t, ߮ and ߠ are the coefficients of the model. It is assumed that has zero mean with constant variance, and satisfies the i.i.d condition. The methodology consists of four iterative steps: 1. Model identification and model selection 2. Parameter estimation of the model parameters 3. Model diagnostics checking (namely, residual analysis) are performed to find the 'best' fitted model 4. Model forecasting In the model identification and model selection step, differencing is applied once or twice to achieve stationarity for non-stationary data. As stationarity condition is satisfied, the ACF plot and the PACF plot are examined to select the AR and MA model types. The parameter estimation step involves an optimization process utilizing metrics such as the AIC and/or the BIC. Finally, in the model checking step, the residual analysis is carried out to finalize the 'best' fitted ARIMA model. ARIMA model is a data-dependent approach that can adapt to the structure of the data set. But it has the major disadvantage that any significant nonlinear data and non-Gaussian set can restrict the ARIMA model. Therefore, some individual and hybrid models uses ARNN, ANN and WBF models to deal with the nonlinear data patterns for forecasting intricate time series structure. 3.2 ARNN Model ANN is a widely used supervised learning model, highly useful for sophisticated nonlinear time series forecasting. Neural nets are based on simple mathematical models of the brain, used for intricate nonlinear forecasting. Any neural net architecture can be described as a network of "neurons", arranged in layers, namely the input layer, hidden layer, and output layer. The information from one layer to another layer is passed using weights that are selected using a risk minimization based 'learning algorithm' ). The ARNN model is a modified neural network model especially designed for time series datasets which uses a pre-specified number of hidden neurons in its architecture (Faraway and Chatfield 1998 The aim of developing a novel hybridization is to harness the advantages of individual models and reduce the risk of failures of these models. The underlying assumption of the hybrid approach based on linear and nonlinear model assumption is that the relationship between linear and nonlinear components are additive. The strength of individual models for hybridization is very important, and this selection is essential to show the consistent improvement over single models. This paper presents a novel hybridization of ARIMA and ARNN (AARNN) to overcome the limitation of individual models and utilize their strengths. With the combination of linear and nonlinear models, the proposed methodology can guarantee better performance as compared to the component models. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. , where f is a nonlinear function modeled by the ARNN approach and ߦ ௧ is the random error. Therefore, the combined forecast is is the forecast value of the ARNN model. The rationale behind the use of residuals in the diagnosis of the sufficiency of the proposed hybrid model is that there is still autocorrelation left in the residuals which ARIMA could not model. This work is performed by the ARNN model which can capture the nonlinear autocorrelation relationship. In summary, the proposed hybrid AARNN model works in two phases. In the first phase, an ARIMA model is applied to analyze the linear part of the model. In the next stage, an ARNN model is employed to model the residuals of the ARIMA model. The hybrid model also reduces the model uncertainty which occurs in inferential statistics and forecasting time series. The algorithmic representation of the proposed hybrid AARNN model is also given in Table 2 (Chakraborty et al., 2020). It is a challenging task to study the characteristics and nature of the limited COVID-19 data sets. With the availability of COVID-19 datasets of both confirmed and death cases for ten different countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina and Colombia including world, we consider one significant problem relevant to ongoing COVID-19 global pandemic. This problem deals with the real-time data-driven forecasts of the daily COVID-19 confirmed and death cases of the above top ten profoundly affected countries. We propose a novel hybrid ARIMA-ARNN (AARNN) model to forecast daily COVID-19 confirmed and death cases of these countries. This proposed novel hybrid model can be treated as the best to notify the health practitioners, corporate leaders, economists, government/public-policy makers, and scientific experts to fight against the COVID-19 global pandemic. Eleven univariate time series COVID-19 datasets of daily confirmed and death cases for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, and Colombia including world are considered for training the proposed novel hybrid AARNN model and the traditional (ARIMA), advanced (ARNN, ANN and WBF) and other hybrid models (ARIMA-ANN and WARIMA). The datasets are nonlinear, nonstationary, and non-Gaussian in nature and statistical tests confirmed this (see Section 2.2 and refer to Table 1 We start the experimental evaluation for all the eleven datasets with the ARIMA model using 'forecast' (Hyndman et al., 2020) statistical package in R statistical software. We fit the ARIMA model using auto.arima function. As the ARIMA model is fitted, predictions are generated for 50-time steps ahead for LTM forecasts for all the eleven COVID-19 confirmed and death cases datasets respectively. We also compute training data predicted values and calculate the residual errors using the traditional individual model ARIMA model. Plots of the predicted and residual series of both confirmed and death cases of COVID-19 pandemic are given in Table 5 (A-3) . As the ARIMA model is fitted on the residual time series, predictions of both confirmed and death cases are generated for the next fifty (29 th November 2020 to 17 th January 2021) time steps respectively. Further, both the ARIMA, ANN, WBF and proposed ARNN residual forecasts are added together to get the final out-of-sample forecasts of both confirmed and death cases COVID-19 outbreaks using the traditional hybrid ARIMA-ANN, WARIMA and the proposed hybrid AARNN model for the next fifty (29 th November 2020 to 17 th January 2021) days respectively. The proposed hybrid AARNN model for actual and fittings (training data) of COVID-19 confirmed and death cases for these ten countries including world data respectively are displayed in Table 6 (A-4). The real-time LTM out of sample forecasts (50 days ahead) of COVID-19 confirmed and death cases using ARIMA, ARNN and proposed hybrid AARNN model for these ten countries including world are displayed in Table 7 and 8 (A-5 & A-6) respectively. For comparison purposes, we applied traditional and advanced individual models like ARIMA, ARNN, ANN, WBF along with the hybrid models like ARIMA-ANN and WARIMA models (Chakraborty et al., 2020) the non-linearity, non-stationarity and non-Gaussian COVID-19 confirmed and death cases datasets of these ten countries including world respectively. All the experimental results are reported in Table 9 . The performance of the proposed hybrid AARNN model is superior as compared to all the traditional, advanced individual and hybrid models under study. In comparison to other individual and hybrid models, nine for confirmed cases and seven for death cases out of eleven datasets of COVID-19 respectively, our proposed hybrid AARNN model outperformed all the hybrid, traditional and advanced individual models in the significant edge. The consistency and adequacy in experimental results empirically approves the same. Thus, the efficacy of the proposed methodology of the proposed hybrid model is experimentally validated. Table 9 : Quantitative measures of performance for different forecasting models on eleven time series (training data sets only) of COVID-19 confirmed cases and death cases for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and World All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. In this study, a novel hybrid AARNN model is established that performs considerably well for both confirmed and death cases of COVID-19 forecasting for the top ten countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia including world respectively. The proposed novel hybrid AARNN model filters linearity using the ARIMA model and on the whole, it can explain the linearity, nonlinearity and nonstationary tendencies present in both the confirmed cases and death cases of COVID-19 data sets of ten different countries including world better as compared to the traditional, advanced individual and hybrid models. The proposed hybrid AARNN model also yields better forecast accuracy than various traditional, advanced individual and hybrid models like ARIMA, ARNN, ANN, WBF, ARIMA-ANN and WARIMA for nine and seven out of eleven data sets of both confirmed and death cases of COVID-19 respectively considered in this study. The proposal will be useful in decision and policy makings for government officials and policymakers to allocate adequate health care resources for the coming days in responding to the crisis. Although any epidemics time series can oscillate heavily because of various epidemiological factors and these fluctuations are generally challenging to be fittingly captured for precise forecasting. However, the proposed novel hybrid model may still predict with better accuracy. LTM of fifty days ahead out of sample forecasts of both confirmed and death cases of COVID-19 separately are provided for all the ten countries including world. The proposed model can be used as a premature word of warning to struggle against the COVID-19 global pandemic. A list of suggestions based on the results of the real-time forecasts using a novel data-driven hybrid methodology is presented below: 1. Since we presented a real-time forecast system unlike ex-post analysis, thus one can regularly update both the actual daily confirmed and death cases and update the predictions, just like it happens in weather forecasting or other epidemics forecasting. 2. The forecasts on the whole follow oscillating behaviour for the out of sample forecast of 50 days ahead respectively for the ten different countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia including world and reflect the impact of the broad spectrum of social distancing, wearing masks, containment zone, lock down, shutdown, quarantine and sanitizing properly measures implemented by the governments, which likely promoted stable the epidemic. pandemic. 4. Followed by the LTM out of sample forecasts reported in this paper, the wearing masks, sanitization, immunity recreation, social distancing, containment zone, self precaution etc. can be implemented accordingly to tackle the uncertain and robust situations of COVID-19 global pandemic. From the data-driven experimental analysis, it is established that all the traditional and advanced individual and hybrid models like ARIMA, ARNN, ANN, WBF, ARIMA-ANN and WARIMA models are unable to completely capture the behaviour of the nonlinearity, non-stationarity and non-Gaussian of the stochastic time series datasets containing intrinsic random shock or moving average component. This indicates the need to develop a novel hybrid model with AARNN to predict the subsequent both the confirmed and death cases of COVID-19 outbreaks accurately and respond to global pandemics more powerfully. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. ; https://doi.org/10.1101/2020. 12 (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. ; https://doi.org/10.1101/2020. 12 (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 5, 2021. ; https://doi.org/10.1101/2020.12.31.20249105 doi: medRxiv preprint Forecasting nonlinear time series with a hybrid methodology Forecasting time series using wavelets Wavelet-based nonlinear mul tiscale decomposition model for electricity load forecasting Modeling and forecasting of epidemic spreading: The case of covid-19 and beyond Risk assessment of novel coronavirus covid-19 outbreaks outside China Time series analysis: forecasting and control Wind speed forecasting in three different regions of Mexico, using a hybrid ARIMA-ANN model Forecasting dengue epidemics using a hybrid methodology Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: A data-driven analysis Chaos Unemployment Rate Forecasting: A Hybrid Approach The analysis of time series: an introduction Analysis and forecast of covid-19 spreading in China, Italy and France A wavelet transfer model for time series forecasting Correcting and combining time series forecasters Clinical characteristics of coronavirus disease 2019 in China. N top The elements of statistical learning: data mining, inference, and prediction Forecasting: principles and practice An introduction to statistical learning Real time estimation of the risk of death from novel coronavirus (covid-19) infection: inference using exported cases A novel hybridization of artificial neural networks and ARIMA models for time series forecasting A comparative study of series arima/mlp hybrid models for stock price forecasting Appraisal of artificial neural network for forecasting of economic parameters Early dynamics of transmission and control of covid-19: a mathematical modelling study Combining Pattern Classifiers: Methods and Algorithms Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N top Comparison of ARIMA and NNAR models for forecasting water treatment plant's influent characteristics