key: cord-0794693-bavt82ib
authors: Swapnarekha, H.; Behera, Himansu Sekhar; Nayak, Janmenjoy; Naik, Bighnaraj; Kumar, P. Suresh
title: Multiplicative Holts Winter Model for Trend Analysis and Forecasting of COVID-19 Spread in India
date: 2021-08-16
journal: SN Comput Sci
DOI: 10.1007/s42979-021-00808-0
sha: dc4b6786957dad15487db230364f6f7cf8cf093e
doc_id: 794693
cord_uid: bavt82ib
The surge of the novel COVID-19 has had a tremendous effect on people's health and lives, resulting in more than 4.4 million confirmed cases in 213 countries as of May 14, 2020. In India, the number of cases has risen steadily since the first case was reported on January 30, 2020, reaching a total of 81,997 cases, including 2649 deaths, as of May 14, 2020. To assist the government and the healthcare sector in preventing transmission of the disease, it is necessary to predict future confirmed cases. In this paper, we forecast COVID-19 cases for the five most affected states of India (Maharashtra, Tamil Nadu, Delhi, Gujarat, and Andhra Pradesh) using real-time data. Using the Holt-Winters method, a forecast of the number of confirmed cases in these states has been generated. The performance of the method has been measured using RMSE, MSE, MAPE and MAE and compared with other standard algorithms. The analysis shows that the proposed Holt-Winters model yields RMSE values of 76.0, 338.4, 141.5, 425.9 and 1991.5 for Andhra Pradesh, Maharashtra, Gujarat, Delhi and Tamil Nadu, respectively, and produces more accurate predictions than the Holt's Linear, Auto-regression (AR), Moving Average (MA) and Autoregressive Integrated Moving Average (ARIMA) models. These estimates may further assist the government in employing strong policies and strategies for enhancing healthcare support all over India.
Throughout history, contagious diseases have claimed many lives and created difficult conditions that took a long time to overcome. In the past, smallpox killed roughly 500 million people all over the world [1]. In 1918, an estimated 17-100 million individuals were killed by the Spanish influenza epidemic [2]. Several epidemics have emerged over the last 20 years, such as severe acute respiratory syndrome coronavirus (SARS-CoV). The outbreak of the novel coronavirus that began in December 2019 in the city of Wuhan in central China killed hundreds and infected thousands of individuals within its first few days. The human coronaviruses that originated from animal reservoirs in the twenty-first century have led to a global epidemic with alarming morbidity and mortality. These viruses are named corona because of the spike-like morphology visible on their outer surface under the electron microscope. They are composed of single-stranded RNA and belong to the Coronavirinae subfamily of the Coronaviridae family. The viruses fall into four genera: α, β, γ and δ. Mammals are usually infected by α- and β-CoV, while birds are infected by γ- and δ-CoV. HCoV-229E and HCoV-NL63 (alpha coronaviruses) and HCoV-HKU1 and HCoV-OC43 (beta coronaviruses) have low pathogenicity and cause mild respiratory symptoms similar to the common cold, whereas SARS-CoV and MERS-CoV, both β-CoVs, cause severe and potentially fatal respiratory infections [3]. In December 2019, local hospitals in Wuhan reported patients diagnosed with unidentified pneumonia [4]. All of these patients were connected to the Huanan Seafood Market, where a variety of live species are sold. The symptoms of these cases were similar to the clinical characteristics of viral pneumonia.
On 7 January 2020, after analyzing samples gathered from throat swabs, experts of the Centers for Disease Control (CDC) declared the disease to be novel coronavirus pneumonia (NCP) [5]. Later, the International Committee on Taxonomy of Viruses (ICTV) named the novel virus SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) [6, 7]. On 11 February 2020, the World Health Organization (WHO) named the disease COVID-19 [8]. SARS-CoV-2, which causes COVID-19, belongs to the β-CoV genus. The genome of SARS-CoV-2 shows 79.5% similarity to that of SARS-CoV, and it retains eight of the SARS-CoV binding residues [9]. As the SARS-CoV-2 genome sequence shows 96.2% similarity to bat coronavirus RaTG13, the bat coronavirus and human SARS-CoV-2 appear to share a common ancestor [10]. On 30 January 2020, the WHO declared the outbreak a Public Health Emergency of International Concern (PHEIC) after COVID-19 had spread to 18 countries through person-to-person contact. In the United States, a major crisis emerged on 26 February 2020, when the first case not imported from China was identified. On 11 March 2020, after the number of COVID-19 infections outside China had risen 13-fold and the number of affected countries had tripled, the WHO declared COVID-19 a pandemic, as it posed a serious threat to public health all over the world. The number of COVID-19 cases registered in different countries has surpassed the records of previous pandemics. It is considered the most dangerous disease to date owing to its rapid transmission [11]. The first COVID-19 case in India was registered on January 30, 2020. In March, the total number of COVID-19 infections started escalating. Most of these cases were connected to people with a history of travel to other countries affected by COVID-19 [12].
The Indian government took strict action by suspending all visas to India with effect from 13 March 2020. As of May 14, 2020, the cumulative number of registered cases in India was 81,997. The number of daily registered COVID-19 cases in India up to May 14, 2020, smoothed with a 7-day moving average (MA), is shown in Fig. 1.
Over the past decades, progress in sensor technology, biological understanding, and mathematical techniques has contributed to the growing significance of modeling in the fields of health and bioinformatics. A mathematical model can be described as a depiction of a system using mathematical concepts and language to facilitate interpretation of the system, to analyze the influence of its different elements, and to generate predictions about patterns of behavior [13]. Because mathematical modeling demands transparency and precision about assumptions, it enables us to test our understanding of the epidemiology of an infection by comparing model results with observed patterns. In medicine, mathematical models are suitable for research on epidemiology, for the planning and assessment of preventive and control programs, for clinical investigations, for health and cost-benefit analysis, for the investigation of patients, and for maximizing the efficacy of operations directed at attaining stated goals with existing resources [14]. A statistical model incorporates a set of statistical assumptions to approximate reality and to make predictions from these approximations. The advantage of a statistical model is that it summarizes the results of a test and presents them in a way that makes patterns within the data easier to see and understand. The use of a statistical model allows clinical analysts to draw reasonable and accurate inferences from the gathered information and to make reliable decisions in the presence of uncertainty.
A mathematical procedure known as the decomposition method was suggested by Adomian [15] to solve problems in neuroscience and medicine, such as the conduction of nerve impulses, the behavior of the immune system, and the observation of medication effects. The results demonstrate the accuracy and efficacy of the method. A mathematical model to predict whether isolation and quarantine can stop the spread of SARS was developed by Castillo-Chavez et al. [16]. The simplicity of the questions and assumptions in the model reduces the amount of data required to predict SARS. The results indicate that the recommended measures can reduce the size of a SARS outbreak by a factor of 1000. To determine the risk of non-immune persons contracting dengue when traveling, a mathematical model was presented by Massad et al. [17]. The model was tested on data from Singapore, and the results show that it is robust in predicting the risk of contracting dengue when traveling to dengue-endemic countries. To forecast the spread of infectious diseases such as dengue, two statistical models, namely the ARIMA model and the Knorr-Held two-component (K-H) model, were suggested by Earnest et al. [18]. The models were validated on Singapore dengue fever data, and their performance was compared using the Mean Absolute Percentage Error (MAPE). The results show that the K-H model achieves a lower MAPE value of 17.21 but takes longer to execute than the ARIMA model. To analyze clinical data and more complicated data, linear and logistic regression along with a modern statistical model known as Bayesian networks were described by Yoo et al. [19]. Using the modern statistical model, the interactions among clinical, genomic, and environmental data were represented.
It was also concluded that the modern statistical model performs better in analyzing both clinical and more complicated data. To analyze tuberculosis epidemiology, a Bayesian statistical model was proposed by Getoor et al. [20]. Statistical relational models, which are constructed using a data-driven method, are used to model distributions over relational domains. The model was applied to San Francisco tuberculosis patient data, and the results indicate its potential over other conventional statistical approaches. In past pandemics, the assessment of human loss and the prediction of the mortality rate up to a certain period, or up to the end of the pandemic, were performed successfully using statistical models. In the present pandemic, researchers and technocrats have been using the same statistical procedures to assess the spread rate and the mortality rate, as these models performed well in predicting earlier epidemics. A statistical model based on multivariate analysis was proposed by Xu et al. [21] to determine false-negative results as well as the window period for testing positive. This model is used to determine the clinical symptoms that are important for detecting false-negative results for SARS-CoV-2. Moreover, a prediction model based on the clinical characteristics was proposed to identify the right time for testing. The findings show that the proposed model provides better accuracy in the clinical diagnosis of COVID-19. To estimate the dynamics of disease transmission over time, a statistical model combined with data on COVID-19 cases in Wuhan was proposed by Kucharski et al. [22]. The model was evaluated on publicly available datasets of cases in Wuhan as well as of international cases exported from Wuhan.
Based on the findings, the authors concluded that there would be a decline in the transmission of COVID-19 in Wuhan during late January 2020. An analysis based on the Boltzmann function to predict the number of deaths in China was proposed by Gao et al. [23]. The findings suggest that the severity of the situation can be assessed better using the proposed method. To estimate the real number of infected people and the infection fatality ratio (IFR), a novel mechanistic-statistical model combined with the SIR (Susceptible, Infected and Recovered) model was proposed by Roques et al. [24]. The findings show that the IFR is compatible with earlier findings in China (0.66%) and lower than the value computed earlier from the Diamond Princess cruise ship data (1.3%). A statistical model based on Holt's second-order exponential smoothing method and the ARIMA model was proposed by Poonia and Azad [25] to forecast COVID-19 infected patients in 28 states and 5 union territories of India. According to their results, the cumulative number of cases in India would increase to 36,335.63 and the number of deaths to 1099.38 by 1 May 2020. Other analyses of the applicability of mathematical and statistical models are summarized in Table 1. Despite the successful application of statistical models to the prognosis and forecasting of the COVID-19 pandemic, certain limitations exist. The Moving-Average model performs well with stationary data but does not consider the trend or seasonality of time series data. In the Auto-regressive model, the assumption of uncorrelated errors is easily violated, as the independent variables are time-lagged values of the dependent variable. With the ARIMA model, long-term forecasting generates poor prediction results.
Although the ARIMA model is the most widely used model for forecasting time series, it has certain limitations: (i) it does not have an automatic updating feature as smoothing models do, so the entire modeling process has to be repeated from the beginning whenever new data become available; (ii) its suitability for complex real-world problems is not always adequate, as ARIMA models cannot handle non-linear patterns [26]; (iii) it does not support changes in the middle of the prediction phase [27]. Therefore, in this paper, we propose the Holt-Winters model for forecasting time series data with seasonal and trend patterns. The Holt-Winters method is a time-series forecasting method that extracts and interprets patterns in past data in order to forecast the future trend more precisely. In time series analysis, error, trend, seasonality (ETS), ARIMA and Holt-Winters are the main classical models that have been widely used as predictors. Holt-Winters is a statistical model, also called the triple exponential smoothing model, used for short-term forecasting of series with seasonal and trend patterns. The Holt-Winters model forecasts from three components: level, trend, and season. Each component has an associated smoothing parameter whose value lies between 0 and 1. Based on the pattern of the season, the Holt-Winters model is classified as additive or multiplicative. The additive method is used when the seasonal variations are roughly constant throughout the series, while the multiplicative method is used when the seasonal variations change in proportion to the level of the series. If the seasonal effect is independent of the prevailing mean level of the time series, then the Holt-Winters additive model is used.
If the seasonal effect depends on the mean level of the time series, i.e., the seasonal variations rise with the mean level of the time series, then the Holt-Winters multiplicative model is used [28]. In COVID-19 time series data, trends can be observed because certain patterns repeat at regular intervals of time owing to external factors such as the lockdown of the country, mandatory social distancing, quarantines, etc. Therefore, in this research, the multiplicative method has been considered, as the variations in COVID data are quite frequent. In this method, the seasonal components are expressed in relative terms, such as percentages, and the series is seasonally adjusted by dividing through by the seasonal component. Algorithm 1 represents the procedure of the Holt-Winters model for COVID forecasting. The Holt-Winters multiplicative algorithm makes use of a state space model, which gives exponential smoothing a statistical foundation similar to that used in regression and the Box-Jenkins methodology [28]. In the Holt-Winters multiplicative model, the relationship to the Holt-Winters multiplicative smoothing equations is revealed by providing equivalent exponential smoothing equations for the transition equations of level and trend. The observation equation for y_t discloses the relationship between the time series and the state variables. The state variable p_t represents the level of the time series, q_t represents the growth per period, and r_t represents the seasonal factor. The error terms ε_t are independent of the past values of the time series and of the state variables. The updates of p_t, q_t and r_t constitute the Holt-Winters multiplicative smoothing equations. The parameter m represents the frequency of seasonality, that is, the number of seasons in a year. The framework of the proposed work is represented in Fig. 2.
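The recursions described above can be illustrated with a minimal Python sketch of multiplicative Holt-Winters smoothing. It is not the paper's Algorithm 1, which did not survive in this text. The smoothing parameter names `alpha`, `beta` and `gamma`, the initialization from the first two seasons, and the particular form of the seasonal update are assumptions taken from the common textbook formulation; `level`, `trend` and `season` play the roles of p_t, q_t and r_t, and `m` is the seasonal period.

```python
def holt_winters_multiplicative(y, m, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with multiplicative Holt-Winters.

    y       : list of strictly positive observations
    m       : seasonal period (number of observations per season cycle)
    alpha, beta, gamma : smoothing parameters in (0, 1) for level, trend, season
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    if any(v <= 0 for v in y):
        raise ValueError("the multiplicative form requires strictly positive data")

    # Assumed initialization: level = mean of the first season, trend = average
    # per-period change between the first two seasons, seasonal factors =
    # ratio of each first-season value to the first-season mean.
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    season = [y[i] / first for i in range(m)]

    for t in range(m, len(y)):
        prev_level, prev_trend, prev_season = level, trend, season[t % m]
        # Level: deseasonalized observation blended with the one-step projection.
        level = alpha * (y[t] / prev_season) + (1 - alpha) * (prev_level + prev_trend)
        # Trend (growth per period): change in level blended with the old trend.
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # Seasonal factor: observation relative to the projected level.
        season[t % m] = (gamma * (y[t] / (prev_level + prev_trend))
                         + (1 - gamma) * prev_season)

    n = len(y)
    # h-step forecast: projected level times the matching seasonal factor.
    return [(level + h * trend) * season[(n - 1 + h) % m]
            for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Hypothetical weekly-patterned counts (m = 7); not data from the paper.
    counts = [80, 90, 100, 110, 120, 100, 100] * 3
    print(holt_winters_multiplicative(counts, m=7, alpha=0.3, beta=0.1,
                                      gamma=0.2, horizon=7))
```

Because every update divides by a seasonal factor or a projected level, this form is only suitable for strictly positive series, which is why the sketch rejects zero counts.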
The experiments were run on a Lenovo T520 with the Windows 10 operating system, an Intel Core i5 processor and 6 GB of RAM. Feeding data and preparing valid data are the primary steps in building a model. In this study, we considered the data of patients from different states of India obtained from Covid19india.org [29].
Error measures are helpful to know the efficiency of the model in terms of error rate. RMSE is calculated as the square root of the mean of the squared differences between predictions and actual outcomes, as shown in Eq. (11). MSE is the average squared difference between the estimated and actual outcomes, as shown in Eq. (12), where 'n' is the total number of predictions. MAE measures the average absolute error between paired observations of the same quantity, here the predicted and actual values; in Eq. (13), 'n' indicates the total number of predictions. MAPE is a standard loss function used to denote the prediction accuracy of a forecast, as displayed in Eq. (14), where 'n' again is the total number of predictions. The absolute value in this calculation is summed for every forecasted point in time and divided by the number of fitted points n.
Here, we discuss the parameter settings of the Holt-Winters, Holt's Linear, MA, AR and ARIMA models for the states of Andhra Pradesh, Maharashtra, Gujarat, Delhi, and Tamil Nadu. The performance of the models is evaluated using the RMSE. The parameter settings are depicted in Table 2. The first case in India was reported on January 30, 2020. Table 4 shows some of the predictions of the total number of cases using the Holt-Winters, Holt's Linear, AR, MA and ARIMA models for Andhra Pradesh, Maharashtra, Gujarat, Delhi and Tamil Nadu. The predictions are computed up to June 21, 2020. The actual number of registered cases is given in Table 5.
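The four error measures described above (RMSE, MSE, MAE and MAPE) can be sketched in a few lines of Python. The bodies of Eqs. (11)-(14) did not survive in this text, so the functions below follow the standard definitions that match the verbal descriptions; the function names and the sample numbers are hypothetical and are not taken from the paper's tables.

```python
import math


def _check(actual, predicted):
    # Both sequences must pair up one-to-one and be non-empty.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")


def mse(actual, predicted):
    """Mean squared error: average of squared prediction errors."""
    _check(actual, predicted)
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error: average magnitude of the prediction errors."""
    _check(actual, predicted)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so such input is rejected.
    """
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical confirmed-case counts and forecasts.
    actual = [100, 200, 400]
    predicted = [110, 190, 420]
    for name, fn in [("MSE", mse), ("RMSE", rmse), ("MAE", mae), ("MAPE", mape)]:
        print(name, fn(actual, predicted))
```

RMSE and MAE are expressed in the same unit as the case counts, whereas MAPE is scale-free, which makes it the more convenient measure when comparing states with very different case totals.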
From Tables 4 and 5, it can be noted that the cases predicted by the Holt-Winters model are close to the actual numbers of registered cases in Andhra Pradesh, Maharashtra, Gujarat, Delhi and Tamil Nadu. Therefore, it can be concluded that the Holt-Winters model predicted COVID-19 cases better than the other models. The predicted number of cases can also be inferred from Fig. 3a-e, which presents the capacity and pattern of each model in predicting the actual values of COVID-19 cases for Andhra Pradesh, Maharashtra, Gujarat, Delhi, and Tamil Nadu individually. From Fig. 3a-d, it can be observed that the predictions of the Holt-Winters model, represented by red ticks, are nearest to the actual validation values, represented by green ticks. In Fig. 3e, the predictions of the ARIMA model, represented by yellow ticks, are nearest to the actual validation values. Figure 4a-e shows the number of confirmed cases predicted by the various time series models for Andhra Pradesh, Maharashtra, Gujarat, Delhi, and Tamil Nadu, respectively, from May 15, 2020 to June 21, 2020, including the predictions of the Holt-Winters model.
To assess statistical significance, the Friedman test [30] has been performed on the results obtained by all the models over all the considered datasets. This non-parametric test considers the average results of all the models in the form of ranks, assigned in ascending order of performance [31]. The null hypothesis, "all the models have similar performance and their differences are merely random", has been considered for conducting this test. Table 6 indicates the ranks (in brackets) assigned to all the models with respect to the datasets. By considering all the parameters of the Friedman test, χ²_F has been evaluated as 16.31.
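The rank-based computation behind the Friedman test can be sketched as follows. This is a minimal illustration using the standard formulas, assuming that F_F denotes the Iman-Davenport correction of χ²_F. The rank matrix is hypothetical (it is not Table 6), and the sketch does not attempt to reproduce the values reported in the text.

```python
def friedman_statistics(ranks):
    """Return (chi2_f, f_f) for a rank matrix.

    ranks : list of N rows (one per dataset), each holding the ranks
            1..k of the k models on that dataset (1 = best).
    """
    n = len(ranks)        # number of datasets
    k = len(ranks[0])     # number of models
    if any(len(row) != k for row in ranks):
        raise ValueError("every dataset must rank the same number of models")

    # Average rank of each model across the datasets.
    avg = [sum(row[j] for row in ranks) / n for j in range(k)]

    # Friedman chi-square statistic.
    chi2_f = (12.0 * n / (k * (k + 1))) * (
        sum(r * r for r in avg) - k * (k + 1) ** 2 / 4.0
    )

    # Iman-Davenport F statistic (assumed meaning of F_F); it is compared
    # with an F distribution with (k - 1) and (k - 1)(n - 1) degrees of freedom.
    denom = n * (k - 1) - chi2_f
    f_f = float("inf") if denom == 0 else (n - 1) * chi2_f / denom
    return chi2_f, f_f


if __name__ == "__main__":
    # Hypothetical ranks of five models on five datasets.
    example_ranks = [
        [1, 2, 3, 5, 4],
        [1, 2, 4, 5, 3],
        [1, 3, 2, 5, 4],
        [1, 2, 3, 5, 4],
        [2, 3, 4, 5, 1],
    ]
    print(friedman_statistics(example_ranks))
```

The null hypothesis of equal performance is rejected when the F statistic exceeds the critical value of the F distribution at the chosen significance level.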
After obtaining χ²_F, the F_F statistic is computed and found to be 10.615. Finally, the critical value, determined from the degrees of freedom of the F_F statistic at a significance level of α = 0.05, is 5.19. The null hypothesis is rejected, as the critical value (5.19) is smaller than the F_F statistic (10.615). The details of the calculation of the Friedman χ²_F, the F_F statistic, and the critical value can be found in [32, 33]. Hence, the performance of the proposed model is statistically significant and better than that of the other models under study. Combining the results from the performance metrics, tables, and graphs, it is evident that the Holt-Winters method fits the growing trend more efficiently than the other models (Holt's Linear, MA, AR and ARIMA) in forecasting the number of confirmed cases.
Since the first case of COVID-19 in India, the number of registered cases has been growing steadily, posing a great threat to public health in India. In this paper, we employed the Holt-Winters model to forecast the number of COVID-19 cases in the Indian states of Maharashtra, Tamil Nadu, Gujarat, Delhi, and Andhra Pradesh up to June 21, 2020. The future number of cases has been predicted by analyzing the data from January 30, 2020 to May 14, 2020. The performance of the model has been evaluated using RMSE, MSE, MAPE and MAE, and the analysis shows that the Holt-Winters method yields lower values of these measures and generates more accurate predictions than the Holt's Linear, AR, MA and ARIMA models. From the analysis, it can be predicted that the number of cases in other states of India may also increase in the near future.
Based on the predictions, the government has to employ strict policies, such as awareness programs and strict lockdowns, to prevent the spread of the disease. Moreover, the government also has to implement the necessary measures for enhancing medical facilities throughout India.
The authors declare that this manuscript has no conflict of interest with any other published source and has not been published previously (partly or in full). No data have been fabricated or manipulated to support our conclusions.
Consent for publication: I, on behalf of the authors, would like to state that the above manuscript is our original research work and that it has not been published elsewhere. Also, it has not been submitted to any journal for publication.
[1] Smallpox: the death of a disease
[2] Reassessing the global mortality burden of the 1918 influenza pandemic
[3] SARS and other coronaviruses as causes of pneumonia
[4] Wuhan Municipal Health and Family Planning Commission
[5] Clinical features of patients infected with 2019 novel coronavirus in Wuhan
[6] Origin and evolution of pathogenic coronaviruses
[7] Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): the epidemic and the challenges
[8] World Health Organization. 2020. Laboratory testing for 2019 novel coronavirus (2019-nCoV) in suspected human cases. Interim guidance
[9] Coronaviruses: an overview of their replication and pathogenesis
[10] A pneumonia outbreak associated with a new coronavirus of probable bat origin
[11] COVID-19: a danger and an opportunity for the future of general practice
[12] The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak
[13] Washington: US Government Printing Office
[14] Mathematical models and their applications in medicine and health
[15] Solving the mathematical models of neurosciences and medicine
[16] Mathematical models of isolation and quarantine
[17] Risk estimates of dengue in travelers to dengue endemic areas using mathematical models
[18] Comparing statistical models to predict dengue fever notifications
[19] Big data analysis using modern statistical and machine learning methods in medicine
[20] Understanding tuberculosis epidemiology using structured statistical models
[21] Analysis and prediction of false negative results for SARS-CoV-2 detection with pharyngeal swab specimen in COVID-19 patients: a retrospective study
[22] Early dynamics of transmission and control of COVID-19: a mathematical modelling study
[23] Forecasting the cumulative number of COVID-19 deaths in China: a Boltzmann function-based modeling study
[24] Using early data to estimate the actual infection fatality ratio from COVID-19 in France
[25] Short-term forecasts of COVID-19 spread across Indian states until 1
[26] Time series forecasting using a hybrid ARIMA and neural network model
[27] Prediction of the COVID-19 pandemic for the top 15 affected countries: Advanced autoregressive integrated moving average (ARIMA) model
[28] Forecasting models and prediction intervals for the multiplicative Holt-Winters method
[29] Coronavirus Outbreak in India-Covid19india.Org
[30] The use of ranks to avoid the assumption of normality implicit in the analysis of variance
[31] Comparison of alternative tests of significance for the problem of m rankings
[32] A self adaptive harmony search based functional link higher order ANN for nonlinear data classification
[33] Elitist teaching-learning-based optimization (ETLBO) with higher-order Jordan Pi-sigma neural network: a comparative performance analysis
[34] First two months of the 2019 Coronavirus Disease (COVID-19) epidemic in China: real-time surveillance and evaluation with a second derivative model
[35] SEIR and regression model based COVID-19 outbreak predictions in India
[36] Optimization method for forecasting confirmed cases of COVID-19 in China
[37] COVID-19): ARIMA based time-series analysis to forecast near future
[38] COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: a data driven model approach
[39] Forecasting COVID-19 impact in India using pandemic waves nonlinear growth models. medRxiv
[40] ARIMA modelling of predicting COVID-19 infections