key: cord-224428-t8s52emf authors: Tandon, Hiteshi; Ranjan, Prabhat; Chakraborty, Tanmoy; Suhag, Vandana title: Coronavirus (COVID-19): ARIMA based time-series analysis to forecast near future date: 2020-04-16 journal: nan DOI: nan sha: doc_id: 224428 cord_uid: t8s52emf

COVID-19, a novel coronavirus disease, is currently a major worldwide threat. It has infected more than a million people globally, leading to more than a hundred thousand deaths. In such grave circumstances, it is very important to predict future infected cases to support prevention of the disease and aid healthcare service preparation. To that end, we have developed a model and employed it to forecast future COVID-19 cases in India. The study indicates an ascending trend in cases over the coming days. A time series analysis also shows an exponential increase in the number of cases. The present prediction models are expected to help the government and medical personnel prepare for the upcoming conditions and improve the readiness of healthcare systems.

The 2019-nCoV pandemic began in December 2019 in Wuhan, China, and has caused extreme havoc across almost the whole world. 1,2 2019-nCoV or COVID-19, commonly known as coronavirus, is a novel, highly contagious virus of the Coronaviridae family that is suspected to have been transmitted to humans from animals. The virus causes mild to severe respiratory illness and death. 3 The pandemic has engulfed 185 countries/regions in merely four months, infecting 1,949,210 people and taking the death toll to 123,348. 4, 5 Although the early cases suggest that the infection is less severe than that of other coronaviruses such as [...] Syndrome Corona Virus), the cases of rapid human-to-human transmission indicate that 2019-nCoV is more infectious than the others. 6 Although a local seafood market in Wuhan is believed to be the source of exposure, 7 the extent of the disease is unclear because its occurrence is currently so dynamic. 3 Countries differ markedly in their epidemiological investigations and in their capacity to detect infected cases. 8 At present, the highest number of 2019-nCoV infections has been reported in the US, while cases are rising sharply every day in Spain, Italy, France and Germany. 4 China, where the disease originated, is now reporting very few new cases. 4

The first case of coronavirus infection in India was reported on 30 January 2020 in Kerala; it was imported from the city of Wuhan, China. 9 In the initial phase the spread was extremely slow, and only 3 people tested positive for more than a month. However, the numbers started rising exponentially after one month and continue to do so. As reported on 13 April 2020, India had 10,453 confirmed COVID-19 cases, with 358 deaths and 1181 recoveries. 4

At present, there is neither a treatment nor a vaccine for COVID-19 infection. It is currently a major health crisis around the world, and it would not be wrong to call it 'an enemy to humanity'. In these circumstances, the only option is to prevent infection and prepare our healthcare system for what may come. It is therefore crucial to construct models that are both computationally efficient and realistic, so that they can help policy makers, medical personnel and the general public. Modeling the disease and forecasting the likely number of daily cases can help the medical system prepare for new patients. Statistical prediction models are useful for forecasting as well as for controlling the global epidemic threat.
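The India totals quoted above for 13 April 2020 can be put into proportion with a short calculation. This is only a minimal sketch: the variable names are illustrative, and the shares are crude proportions of cumulative confirmed cases, not the time-dependent recovery and death trends that the study plots.

```python
# Cumulative totals for India as reported on 13 April 2020 (from the text above).
confirmed = 10453
deaths = 358
recovered = 1181

# Cases that were neither recovered nor deceased at that date.
active = confirmed - deaths - recovered

# Crude shares of all confirmed cases, in percent.
death_share = 100.0 * deaths / confirmed
recovered_share = 100.0 * recovered / confirmed

print(f"active cases: {active}")
print(f"deaths: {death_share:.2f}% of confirmed cases")
print(f"recoveries: {recovered_share:.2f}% of confirmed cases")
```

Because most cases were still active on that date, both shares can change considerably as outcomes accrue.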
In the present effort, we have employed the Auto-Regressive Integrated Moving Average (ARIMA) model to predict the incidence of the 2019-nCoV disease. Compared with other prediction models, for instance support vector machine (SVM) and wavelet neural network (WNN), the ARIMA model has been reported to be more capable in the prediction of natural adversities. 10 For our study, we identified the best ARIMA model and then predicted the number of cases for the next 20 days. The main objective of the study is to find the best predictive model and apply it to forecast the future incidence of COVID-19 cases in India.

This data is used to build predictive models. For forecasting a time series, ARIMA modeling is one of the best modeling techniques. An ARIMA model is specified by three parameters and written as ARIMA (p, d, q). Here, p is the order of auto-regression, d is the degree of differencing (trend difference) and q is the order of the moving average. We applied an ARIMA model to the time series of confirmed COVID-19 cases in India. Autocorrelation function (ACF) and partial autocorrelation function (PACF) graphs are used to identify the initial candidate ARIMA models. These ARIMA models are then tested for variance in normality and stationarity. Next, their accuracy is checked by observing their MAPE (mean absolute percentage error), MAD (mean absolute deviation) and MSD (mean squared deviation) values to determine the best model for forecasting. In addition, the best-fit ARIMA model is compared with Linear Trend, Quadratic Trend, S-Curve Trend, Moving Average, Single Exponential and Double Exponential models using the same measures of accuracy, viz. MAPE, MAD and MSD, to select the best model for forecasting. The best model is the one with the lowest value for all the measures. After the model is fitted, its parameters are estimated 3 followed by verification of the model. The resulting model is employed to forecast confirmed COVID-19 cases for the next 20 days, i.e. 14 April 2020 to 3 May 2020.
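The accuracy comparison described above can be sketched in a few lines of Python. The function names `accuracy_measures` and `best_model` and the toy numbers are illustrative assumptions, not the authors' Minitab workflow or data. The formulas follow the usual definitions of MAPE, MAD and MSD and assume every actual value is non-zero.

```python
def accuracy_measures(actual, fitted):
    """Return (MAPE, MAD, MSD) for paired actual and fitted values.

    Assumes every actual value is non-zero, since MAPE divides by it.
    """
    if len(actual) != len(fitted) or len(actual) == 0:
        raise ValueError("actual and fitted must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, fitted)]
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    return mape, mad, msd


def best_model(actual, candidates):
    """Return the name of the candidate that is lowest on all three measures.

    `candidates` maps a (hypothetical) model name to its fitted values.
    Returns None when no single model wins on every measure.
    """
    scores = {name: accuracy_measures(actual, fit)
              for name, fit in candidates.items()}
    for name, score in scores.items():
        if all(score[i] <= other[i]
               for other in scores.values() for i in range(3)):
            return name
    return None


# Toy numbers for illustration only; they are not taken from the paper.
actual = [10, 20, 40]
candidates = {"model_a": [12, 18, 44], "model_b": [15, 25, 30]}
print(best_model(actual, candidates))  # prints "model_a"
```

Returning `None` when no candidate is lowest on every measure mirrors the selection rule stated above, which requires the lowest value for all the measures rather than for any single one.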
In the model for forecasting future confirmed COVID-19 cases, Xt is the predicted number of confirmed COVID-19 cases on the t-th day, α1, α2, β1 and β2 are parameters, and Zt is the residual term for the t-th day.

The trend of forthcoming incidence can be estimated from previous cases, and a time series analysis is performed for this purpose. Time series forecasting refers to the use of a model to forecast future data based on previously observed data. 11 In the present study, time series analysis is used to recognize the trends in confirmed COVID-19 cases in India over the period 22 January 2020 to 13 April 2020 and to predict future cases from 14 April 2020 to 3 May 2020. The level of statistical significance is set at 0.05. Actual and predicted confirmed cases are plotted against time to verify the efficiency of the model. To get an idea of the recovery and death trends in India, a graph of both is plotted against time. A comparative study is also performed to examine the status of confirmed COVID-19 cases in India relative to those of highly infected countries. A similar comparison is made with the countries of the South-East Asia region. All model development, computation and comparison were performed using Minitab software (version 17). 12

The present work encompasses the development of a model to forecast COVID-19 incidence in the coming days. The results for the measures of model accuracy for the ARIMA, Linear Trend, Quadratic Trend, S-Curve Trend, Moving Average, Single Exponential and Double Exponential models are presented in Table 1. The ARIMA (2, 2, 2) model (Eq. (3)) is used to forecast confirmed COVID-19 cases in India for the next 20 days, i.e. 14 April 2020 to 3 May 2020.
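The forecasting model is described above through Xt, α1, α2, β1, β2 and Zt, and the selected order is ARIMA (2, 2, 2). As a hedged sketch, and not necessarily the authors' exact Eq. (3), a textbook ARIMA (2, 2, 2) with these symbols can be written by differencing the series twice and then applying an ARMA (2, 2) relation. The auxiliary symbol W_t, the plus signs on the moving-average terms and the absence of a constant term are assumptions here. Some software follows the Box-Jenkins convention and writes the moving-average terms with minus signs.

```latex
\begin{align}
W_t &= \nabla^2 X_t = X_t - 2X_{t-1} + X_{t-2}, \\
W_t &= \alpha_1 W_{t-1} + \alpha_2 W_{t-2} + Z_t + \beta_1 Z_{t-1} + \beta_2 Z_{t-2}.
\end{align}
```

Substituting the first line into the second expresses X_t in terms of X_{t-1}, ..., X_{t-4} and the current and two previous residuals, which is why the degree of differencing d = 2 extends the effective memory of the model beyond the auto-regressive order p = 2.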
The forecast for cases is presented in Table 4 with a 95% confidence interval. [...] by China, that is, severe control and quarantine, it can be expected that India will also recover soon because of its similar preventive measures.

Time series analysis presents meaningful statistics for the confirmed COVID-19 data. Figure 2 [...] To compare the actual and forecasted confirmed COVID-19 cases, a time series graph is plotted from 30 January 2020 to 13 April 2020 (Figure 3). The similarity of the forecasted data to the actual data is clear from these plots. This comparison demonstrates the precision of the model in forecasting.

The trend in the number of recoveries and deaths due to COVID-19 infection in India over time is depicted in Figure 4. The numbers of recoveries and deaths both increase with time; however, the rate of recovery is higher than the death rate. Thus, a low mortality rate could be expected from the disease.

Figure 5 shows a comparative study of confirmed COVID-19 cases in India and in highly infected countries. According to the plot, the US is the most infected and India the least infected of the selected countries, viz. US, Spain, Italy, France, Germany, China and Iran. This is expected, as India was the last of these countries to be infected. However, the plot also shows that China has been able to control the pandemic and is now reporting very few new cases. It follows that if strict prevention measures such as quarantine and sanitization are continued for some days, the situation could be controlled in the coming days. In the remaining countries, infected cases are growing exponentially and severe spread of infection is seen. A similar comparison for the countries of the South-East Asia region is shown in Figure 6.
Figure 6 suggests that India is the most infected of the South-East Asian countries, followed by Indonesia and Thailand. All three countries show a continuous rise in confirmed COVID-19 infections. The remaining countries of the region have a very low infection rate, the lowest being in Timor-Leste. Measures such as quarantine and sanitization can decrease human exposure and help control this pandemic. These measures should therefore be stringently imposed in India, and strict action must be taken against people who violate the rules and disregard the severity of the situation.

Although a larger amount of data would allow a more exhaustive prediction and explanation, in the present circumstances these models could be valuable in anticipating future cases of infection, provided the pattern of virus spread does not change abnormally. The virus is new and is capable of intense transmission, which may influence the predictions; however, to the best of our knowledge, this model is the best available in the present situation.

The novel coronavirus disease (COVID-19) has been declared a pandemic by WHO and is currently a major global threat. To support prevention of the disease and aid healthcare service preparation, we conducted this study to identify the best model for predicting confirmed COVID-19 cases and to employ that model to forecast future COVID-19 cases in India. According to the model forecast, confirmed cases are expected to rise greatly in the coming days. The time series analysis shows an exponential increase in infected cases. However, it is also anticipated that efforts such as lockdown may affect this prediction and that cases may start to decline after approximately a month.
A comparative study with some of the highly infected countries and with countries in the South-East Asia region indicates that India can still control the situation if prevention measures such as quarantine and city sanitization are strictly followed. The prediction models will help the government and the medical workforce prepare for upcoming situations and improve readiness in healthcare systems.

A novel coronavirus from patients with pneumonia in China
Coronavirus infections-more than just the common cold
Coronavirus (COVID-19) Cases -20 coronavirus outbreak
The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China
Quantifying bias of COVID-19 prevalence and severity estimates in Wuhan, China that depend on reported cases in international travellers
Kerala confirmed first novel coronavirus case in India
Comparison of the Ability of ARIMA, WNN and SVM Models for Drought Forecasting in the Sanjiang Plain
Time series analysis. Basic statistics and data analysis
Minitab 17 Statistical Software

Both the corresponding authors are thankful to Presidency University, Bengaluru and Manipal University Jaipur, Jaipur for providing research facilities.

H.T. and T.C. conceptualized the project. H.T. designed the study, performed the computations and investigations, contributed to data analysis and wrote the manuscript. P.R. provided the resources. T.C. and V.S. supervised the study and reviewed the manuscript.

The authors declare no competing interests.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.