key: cord-0836388-3pkncxgb authors: Maleki, Mohsen; Mahmoudi, Mohammad Reza; Wraith, Darren; Pho, Kim-Hung title: Time series modelling to forecast the confirmed and recovered cases of COVID-19 date: 2020-05-13 journal: Travel Med Infect Dis DOI: 10.1016/j.tmaid.2020.101742 sha: 0526f63d4f6a7af7401fefc1e72dbf415659fa8d doc_id: 836388 cord_uid: 3pkncxgb

Coronaviruses are enveloped RNA viruses from the Coronaviridae family affecting neurological, gastrointestinal, hepatic and respiratory systems. In late 2019 a new member of this family belonging to the Betacoronavirus genus (referred to as COVID-19) emerged and spread quickly across the world, calling for strict containment plans and policies. In most countries, the outbreak has been serious and the number of confirmed COVID-19 cases has increased daily while, fortunately, the number of recovered COVID-19 cases has also increased. Forecasting the “confirmed” and “recovered” COVID-19 cases helps in planning to control the disease and in planning the use of health care resources. Time series models based on statistical methodology are useful for modelling time-indexed data and for forecasting. Autoregressive time series models based on two-piece scale mixture normal distributions, called TP-SMN-AR models, are a flexible family of models that includes many classical symmetric/asymmetric and light/heavy-tailed autoregressive models. In this paper, we use this family of models to analyze the real-world time series data of confirmed and recovered COVID-19 cases.

The Coronaviridae family includes two main subfamilies, Coronavirinae and Torovirinae. The member genera include Alphacoronavirus, Betacoronavirus, Gammacoronavirus, Torovirus, and Bafinivirus. They are a large family of viruses that affect neurological, gastrointestinal, hepatic and respiratory systems and can spread among humans, bats, mice, livestock, birds, and others [1] [2] [3].
In the Coronaviridae family, a well-known virus called SARS coronavirus (SARS-CoV) spread from animal to animal and to humans [4]. Another coronavirus, called MERS coronavirus (MERS-CoV), spread significantly from human to human in 2012 [4]. In 2019 many cases of respiratory disease in China were reported by the World Health Organization (WHO), with evidence that these cases originated from a seafood market in Wuhan [5]. In 2019 a new virus called COVID-19 (novel coronavirus, 2019-nCoV), belonging to the Betacoronavirus genus of the Coronaviridae family, spread from Wuhan in China [6]. The Centers for Disease Control and Prevention (CDC) has verified that COVID-19 spreads from human to human, and has also reported that it spreads through close contact, through the air, and by touching surfaces or objects that carry viral particles. The incubation period of COVID-19 can be as long as 14 days [7], and the virus can spread to others during the incubation period. Finally, note that the median incubation period and the median age of confirmed cases are 3 days and 47.0 years, respectively [8].

Preparing for and controlling the outbreak of COVID-19 requires thorough planning and policies. Some researchers have used statistical and mathematical modelling for this purpose. In China, the number of unreported COVID-19 cases was estimated mathematically in [9]. Using information on some Japanese passengers in Wuhan, [10] estimated the infection rate of COVID-19 in Wuhan; the results indicated an infection rate of 9.5% and a death rate of 0.3% to 0.6%. Based on mathematical modelling, [11] estimated the transmission risk of COVID-19 to be about 6.47 persons on average and predicted the time at which the peak of COVID-19 would be reached. Using information on 47 patients, [12] estimated the sustained human-to-human transmission of COVID-19 to be 0.4.
Based on two scenarios, [13] found the risk of death to be 5.1% and 8.4%. Modelling, estimating and predicting the prevalence of viruses and their epidemiological characteristics is important for providing the equipment needed to cope with their consequences. Forecasts of the cases and transmission risk of West Nile virus (WNV) were provided by [14]. For further modelling and forecasting of the spread of viruses such as hepatitis A virus, Ebola, SARS, influenza A and MERS, see refs. [15, 21]. To plan suitably for COVID-19, forecasting future confirmed cases is critical. An optimization method named FPASSA-ANFIS was proposed by [22] to model the number of confirmed COVID-19 cases and to predict future values using data collected in China. Forecasting various COVID-19 data with mathematical and statistical models is very important for programs aimed at cutting the transmission chain of the disease; see, e.g., [23, 24, 25].

According to credible daily reports from the World Health Organization and other world-renowned public health institutions, the total number of confirmed COVID-19 cases has increased in many countries, especially in the U.S.A., Italy, Spain and Iran. Although the spread of COVID-19 carries many dangers, reports fortunately show that the total number of recovered COVID-19 cases has also increased. An increasing number of recovered cases, together with a falling or stabilizing number of confirmed cases, is important for controlling the spread of COVID-19 and leads to a stable rate of infection in the world. Modelling and forecasting the numbers of confirmed and recovered COVID-19 cases therefore plays an important role in planning the control of the spread of COVID-19 in the world. The cumulative numbers of confirmed and recovered COVID-19 cases, which are reported daily by these organizations, depend on each day on their values on the preceding days.
An autoregressive time series model can therefore be a useful tool to model, analyze and forecast the confirmed and recovered cases of COVID-19. SIR epidemic modelling can be done at the local (country) level, whereas the autoregressive model is well suited to looking at overall patterns. The autoregressive time series model is a flexible tool for modelling dependent data and has been used for estimation and forecasting in many practical problems; see refs. [27] [28] [29] [30] [31] [32] [33] [34]. The autoregressive model of order $p$ determines the probabilistic behavior of the current value $Y_t$ based on a linear combination of the past values $Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots, Y_{t-p}$, in the form of:

$$Y_t = \varphi_0 + \varphi_1 Y_{t-1} + \varphi_2 Y_{t-2} + \cdots + \varphi_p Y_{t-p} + \varepsilon_t,$$

where the error terms $\varepsilon_t$ are generally assumed to be uncorrelated and identically distributed random variables from a common distribution.

Time series plots of the total confirmed and recovered cases are shown in Fig. 1 and Fig. 2, respectively. These series are not stationary because they are increasing and show signs of a trend. After some suitable transformations described in [40], we obtain stationary data. Using model selection criteria [34, 39, 40], the best TP-SMN-AR models (autoregressive models based on the two-piece t distributions) were fitted to the stationary series of the confirmed and recovered cases. The histograms of the estimated errors (residuals), with the estimated heavy-tailed TP-SMN densities superimposed, are shown in Fig. 3 and indicate that the estimated models fit the stationary series of total confirmed and recovered COVID-19 cases well. The auto-correlation function (ACF) plots of the residuals presented in Fig. 4 also support the suitability of the fitted models. To further demonstrate the goodness of fit of the model, we withheld the last 10 days of the confirmed and recovered cases (2020-Apr-21 to 2020-Apr-30), fitted the TP-SMN-AR models to the remaining data, and produced forecasts for the withheld days.
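The hold-out check described above can be illustrated with a minimal sketch. It is not the authors' procedure: it fits an ordinary least-squares (Gaussian) AR(p) model rather than a TP-SMN-AR model, it assumes that first differencing is enough to make the cumulative series stationary (the transformations of [40] are not reproduced here), and it runs on synthetic numbers rather than the COVID-19 data. The function names `fit_ar_ols`, `forecast_ar` and `mape` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def fit_ar_ols(x, p):
    """Least-squares fit of x_t = c + phi_1*x_{t-1} + ... + phi_p*x_{t-p} + e_t.

    Assumption: Gaussian/OLS estimation, used here as a simple stand-in
    for the TP-SMN likelihood-based estimation in the paper.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Row for time t holds [1, x_{t-1}, ..., x_{t-p}], for t = p, ..., n-1.
    lag_columns = [x[p - k:n - k] for k in range(1, p + 1)]
    design = np.column_stack([np.ones(n - p)] + lag_columns)
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]


def forecast_ar(x, coef, steps):
    """Iterate the fitted recursion forward, feeding forecasts back in."""
    p = len(coef) - 1
    history = list(np.asarray(x, dtype=float)[-p:])
    forecasts = []
    for _ in range(steps):
        lags = history[::-1][:p]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        forecasts.append(nxt)
        history.append(nxt)
    return np.array(forecasts)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * float(np.mean(np.abs((actual - predicted) / actual)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a cumulative case count (NOT real data).
    daily = 500.0 + 20.0 * rng.standard_normal(120)
    cumulative = np.cumsum(daily)

    # Withhold the last 10 days, as in the hold-out check.
    train, test = cumulative[:-10], cumulative[-10:]

    # Assumed transformation: first differences of the cumulative series.
    increments = np.diff(train)
    coef = fit_ar_ols(increments, p=2)

    # Forecast increments, then undo the differencing.
    forecast = train[-1] + np.cumsum(forecast_ar(increments, coef, steps=10))
    print(f"hold-out MAPE: {mape(test, forecast):.2f}%")
```

Differencing before fitting and cumulating afterwards keeps the forecast on the same scale as the reported cumulative counts, so the percentage error is computed against the withheld totals directly.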
Table 1 contains the predictions and 98% confidence intervals for this analysis. Fig. 5, Fig. 6 and Fig. 7 show the forecasted values superimposed on the plots of the real values of the confirmed and recovered COVID-19 cases in the world. To evaluate the accuracy of the predictions, we use the mean absolute percentage error (MAPE), which is 0.22% for the confirmed COVID-19 cases and 1.6% for the recovered COVID-19 cases; these reasonably low values demonstrate the suitability of the proposed models for prediction. Finally, note that the proposed TP-SMN-AR models include as special or limiting cases the more standard autoregressive time series models used in the literature. In particular, model selection criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), together with Box-Pierce and Ljung-Box tests on the residuals, indicate that the fitted TP-SMN-AR models are more reasonable than other well-known counterparts.

Coronaviruses are a large family of viruses that affect neurological, gastrointestinal, hepatic, and respiratory systems. The number of confirmed cases has increased daily in many countries, especially in the U.S.A., Italy, Spain, Iran, China and others. The spread of COVID-19 carries many dangers and calls for strict, dedicated plans and policies. Forecasting future confirmed and recovered cases is therefore critical to making such plans and policies. Autoregressive time series models are a useful tool for modelling data over time. However, some standard time series models assume that the error terms are symmetric (Gaussian). In many real-world situations, the assumption of symmetrically distributed error terms is not satisfactory. In our methodology, we considered autoregressive time series models based on the two-piece scale mixture normal (TP-SMN) distributions.
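To make the idea of an asymmetric error distribution concrete, the following sketch evaluates the density of the two-piece (split) normal distribution, in which the left and right halves of a normal curve are given different scales and joined at the mode. This is only the Gaussian member of the two-piece idea, written in the common two-scale parameterization; the heavy-tailed two-piece t models fitted in this paper and the authors' own parameterization are not reproduced here. The names `two_piece_normal_pdf`, `left_mass`, `sigma_left` and `sigma_right` are assumptions made for this illustration.

```python
import math


def two_piece_normal_pdf(x, mu, sigma_left, sigma_right):
    """Density of the two-piece (split) normal distribution.

    Scale sigma_left applies for x <= mu and sigma_right for x > mu.
    The shared constant makes the two halves meet at mu and the total
    area equal one. Equal scales give the ordinary normal density.
    """
    norm = math.sqrt(2.0 / math.pi) / (sigma_left + sigma_right)
    sigma = sigma_left if x <= mu else sigma_right
    return norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def left_mass(sigma_left, sigma_right):
    """Probability of falling at or below the mode mu."""
    return sigma_left / (sigma_left + sigma_right)


if __name__ == "__main__":
    # A right-skewed error distribution: the right tail is three times wider.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(x, two_piece_normal_pdf(x, mu=0.0, sigma_left=1.0, sigma_right=3.0))
    print("mass left of the mode:", left_mass(1.0, 3.0))
```

With equal scales the model reduces to the symmetric Gaussian case, which is why a Gaussian autoregressive model can be seen as a special case of a two-piece one.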
The results indicated that the proposed method performed well in forecasting confirmed and recovered COVID-19 cases in the world. According to model selection criteria, the proposed models were also more reasonable than the standard Gaussian autoregressive time series model, which is the simplest member of the proposed family. For future work, we suggest that researchers apply cyclostationary, almost cyclostationary and simple processes [41-47] based on the TP-SMN distributions, instead of stationary processes.

The authors declare no conflict of interest.

Emerging coronaviruses: Genome structure, replication, and pathogenesis
Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor
Transmission scenarios for Middle East Respiratory Syndrome Coronavirus (MERS-CoV) and how to tell them apart
Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding
Novel Coronavirus: Where We are and What We Know
Clinical characteristics of 2019 novel coronavirus infection in China
Estimating the Unreported Number of Novel Coronavirus (2019-nCoV) Cases in China in the First Half of January 2020: A Data-Driven Modelling Analysis of the Early Outbreak
The Rate of Underascertainment of Novel Coronavirus (2019-nCoV) Infection: Estimation Using Japanese Passengers Data on Evacuation Flights
Estimation of the Transmission Risk of the 2019-nCoV and Its Implication for Public Health Interventions
Novel Coronavirus Outbreak in Wuhan, China, 2020: Intense Surveillance Is Vital for Preventing Sustained Transmission in New Locations
Real time estimation of the risk of death from novel coronavirus (2019-nCoV) infection: Inference using exported cases
Ensemble forecast of human West Nile virus cases and mosquito infection rates
Comparison of four different time series methods to forecast hepatitis A virus infection
Forecasting seasonal outbreaks of influenza
Real-time influenza forecasts during the 2012-2013 season
Inference and forecast of the current West African Ebola outbreak in Guinea, Sierra Leone and Liberia
Forecasting versus projection models in epidemiology: The case of the SARS epidemics
Realtime epidemic monitoring and forecasting of H1N1-2009 using influenza-like illness from general practice and family doctor clinics in Singapore
Abd El Aziz, M. Optimization Method for Forecasting Confirmed Cases of COVID-19 in China
How to make predictions about future infectious disease risks
Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model
A new method to detect periodically correlated structure
Maximum a-posteriori estimation of autoregressive processes based on finite mixtures of scale-mixtures of skew-normal distributions
Time series process based on the unrestricted skew normal process
A Bayesian approach to robust skewed Autoregressive process
Nonlinear semiparametric autoregressive model with finite mixtures of scale mixtures of skew normal innovations
Asymmetric heavy-tailed vector autoregressive processes with application to financial data
Autoregressive processes with generalized hyperbolic innovations
Leptokurtic and Platykurtic class of Robust Symmetrical and Asymmetrical Time Series Models
Robust mixture modelling based on two-piece scale mixtures of normal family
A robust class of homoscedastic nonlinear regression models
The Skew-Reflected-Gompertz distribution for analyzing symmetric and asymmetric data
Testing the Difference between Two Independent Time Series Models
On the asymptotic distribution for the periodograms of almost periodically correlated (cyclostationary) processes
On the detection and estimation of the simple harmonizable processes
Periodically correlated modeling by means of the periodograms asymptotic distributions
A new method to compare the spectral densities of two independent periodically correlated time series
Testing the difference between spectral densities of two independent periodically correlated (cyclostationary) time series models

No funding.