key: cord-0880334-axasna8z authors: Rath, Smita; Tripathy, Alakananda; Tripathy, Alok Ranjan title: Prediction of new active cases of coronavirus disease (COVID-19) pandemic using multiple linear regression model date: 2020-08-01 journal: Diabetes Metab Syndr DOI: 10.1016/j.dsx.2020.07.045 sha: a475b100fbeb0c806024decc63ae72429a73abfc doc_id: 880334 cord_uid: axasna8z
INTRODUCTION AND AIMS: The COVID-19 pandemic, which originated in the city of Wuhan, China, has severely affected the health, socio-economic and financial conditions of countries around the world. India is one of the affected countries, with thousands of people becoming infected every day. In this paper, daily statistics of people affected by the disease are analysed to predict the trend in active cases over the following days in Odisha as well as India.
MATERIAL AND METHODS: A valid global data set is collected from the WHO daily statistics, and the correlations among the total confirmed, active, deceased and positive cases are reported. Linear Regression and Multiple Linear Regression models are applied to the data set to visualize the trend of the affected cases.
RESULTS: The Linear Regression and Multiple Linear Regression models are compared. The score of the model [Formula: see text] tends to be 0.99 and 1.0, which indicates a strong prediction model for forecasting active cases in the coming days. Using the Multiple Linear Regression model on data up to July, 52,290 active cases are forecast for India and 9,358 active cases for Odisha by 15 August if the situation continues unchanged.
CONCLUSION: These models achieved remarkable accuracy in predicting COVID-19 active cases. A strong correlation factor determines the relationship between the dependent variable (active) and the independent variables (positive, deceased, recovered).
The first case of COVID-19 in India was reported on 30 January 2020. COVID-19 is the coronavirus disease caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2). The novel coronavirus originated in the wet market of Wuhan, a city in China [1]. It has wreaked havoc on humanity, claiming 523,011 lives worldwide according to the World Health Organization [2]. The virus spreads among people most often through small droplets released by coughing, sneezing and talking in close contact [3]. Rather than travelling long distances through the air, the droplets fall onto the ground or onto surfaces. The basic symptoms of COVID-19 are fever, cough, shortness of breath, loss of sense and fatigue [4]. Other symptoms include breathing difficulty and chest pain. To prevent the spread of the virus, a number of measures are being taken, such as personal hygiene, frequent hand washing with soap and water, wearing face masks and social distancing. To prevent transmission of the virus, many countries have imposed shutdowns and lockdowns.
COVID-19 has had a huge impact in India. According to the World Health Organization report [2], India had 793,802 confirmed cases of the virus as of 11 July, and the total number of samples tested so far is more than one crore (10 million). The number of fatalities due to the COVID-19 pandemic is 21,604 to date. As per the Government of India report [5], the worst affected states and union territories are Maharashtra with 230,599 cases and 9,667 deaths, Tamil Nadu with 126,581 cases, and Delhi with 107,051 cases. India declared a nationwide lockdown to stop the exponential growth of infection that affected other countries such as Italy [6]. The nationwide lockdown was imposed in order to flatten the infection curve in India.
The focus of the paper lies in estimating the daily active cases, or new confirmed COVID-19 cases, using a regression model that will be helpful in forecasting the next day's scenario for the country. The objective was to identify the relationships among the data collected each day, thereby contributing to reliable data, and to build a forecasting model for India as well as Odisha. We felt this was of utmost importance because it would help to clarify and prepare a potential plan of action.
A Multiple Linear Regression model is proposed for the prediction of active cases in COVID-19 daily data. The model predicts 52,290 active cases in India and 9,358 active cases in Odisha by 15 August. The ANOVA results show a significant P value, which supports the proposed model. Statistical results show that the MLR model has better predictive potential than the LR model.
The whole data set was collected from WHO. The correlation coefficients among the variables (Table 1 and Table 2) are very close to +1. Initially, data cleaning is performed on the two data sets to remove any missing values. A correlation analysis is then performed on the data sets using Python in the Spyder IDE, launched from Anaconda Navigator. Next, a Linear Regression model is used to evaluate the effect of daily positive cases on active cases for the Odisha data as well as the India data. The key goal of linear regression is to fit a straight line to the data that forecasts Y from X, where Y is the total number of daily active cases and X is the total number of positive cases. The least squares method is commonly used to estimate the intercept and slope, the regression parameters that define the line. Figures 1 and 2 show the average peak values of active cases in Odisha as well as India.
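The cleaning, correlation and least-squares steps described above can be sketched as follows. This is a minimal illustration using NumPy, which is an assumption (the paper names Python but not the library), and the counts are hypothetical values, not the WHO data.

```python
import numpy as np

# Hypothetical daily counts (illustrative values only, not the WHO data).
# Columns: positive, recovered, deceased, active; NaN marks a missing entry.
records = np.array([
    [100.0, 40.0, 2.0, 58.0],
    [150.0, 70.0, 3.0, 77.0],
    [np.nan, 90.0, 4.0, 96.0],   # row with a missing value
    [260.0, 130.0, 6.0, 124.0],
    [330.0, 180.0, 8.0, 142.0],
    [410.0, 240.0, 10.0, 160.0],
])

# Data cleaning: drop every row that contains a missing value.
clean = records[~np.isnan(records).any(axis=1)]

# Pearson correlation matrix between the four columns.
corr = np.corrcoef(clean, rowvar=False)

# Simple linear regression of active (Y) on positive (X) by least squares.
# np.polyfit returns the highest-degree coefficient first: slope, then intercept.
x, y = clean[:, 0], clean[:, 3]
slope, intercept = np.polyfit(x, y, deg=1)

print("correlation(positive, active):", round(corr[0, 3], 3))
print("fitted line: active =", round(intercept, 2), "+", round(slope, 3), "* positive")
```

A correlation close to +1 in `corr[0, 3]` is what justifies fitting a straight line between the two variables in this toy data.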
The linear regression model can be expressed as in equation (1), where $Y$ and $X$ are the dependent and independent variables, $\beta_0$ is the intercept, $\beta_1$ is the regression parameter (slope) and $\varepsilon$ is the random error.

$$Y = \beta_0 + \beta_1 X + \varepsilon \qquad (1)$$

A limitation of linear regression is that it only describes the relationship between the independent variables and the mean of the dependent variable. Just as the mean is not a full description of a single variable, linear regression is not a complete description of the relationships among variables. Therefore, the various factors are analysed using a Multiple Linear Regression (MLR) model. In this case, the dependent (target) variable depends on several independent variables. A regression equation involving multiple variables can be written as:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \qquad (2)$$

where $Y$ is the target variable, $x_1, x_2, x_3$ are the independent variables, $\beta_0$ is the y-intercept, $\beta_1, \beta_2, \beta_3$ are the coefficients and $\varepsilon$ is the error term.
Case 1: The two data sets are first split into an eighty percent training set and a twenty percent testing set, and Linear Regression is then trained on the first 80% of the data. Here the number of daily active cases is predicted from the daily positive cases. The model generates the coefficients used to predict the number of active cases on the test set, as shown in Table 3. As explained above, the regression fit selects the values of the intercept and slope that give the line that best fits the data.
Multiple Linear Regression is then applied to predict the daily number of active cases. We can derive a relation between the variables using the correlation factors from Tables 1 and 2. Table 4 presents the MAE, MSE, RMSE, intercept, score and coefficients for the predictor model. The values show that the Multiple Linear Regression model is more accurate than the Linear Regression model.
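The comparison of the two models described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes NumPy's least-squares solver, uses synthetic data in place of the WHO figures, and the helper names `fit_ols`, `predict` and `metrics` are hypothetical. The R² computed here plays the role of the model "score".

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the daily data (hypothetical, not the WHO figures).
n = 50
positive = np.cumsum(rng.integers(50, 150, size=n)).astype(float)
recovered = np.round(0.6 * positive + rng.normal(0, 20, size=n))
deceased = np.maximum(0, np.round(0.02 * positive + rng.normal(0, 2, size=n)))
# In this toy data, active is an exact linear combination of the other columns.
active = positive - recovered - deceased

def fit_ols(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_k]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Evaluate the fitted linear model on the rows of X."""
    return beta[0] + X @ beta[1:]

def metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 of the predictions."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": np.mean(np.abs(err)), "MSE": mse, "RMSE": np.sqrt(mse), "R2": r2}

# 80/20 split: train on the first 80% of the days, test on the rest.
split = int(0.8 * n)
X_lr = positive.reshape(-1, 1)                              # one predictor (LR)
X_mlr = np.column_stack([positive, recovered, deceased])    # three predictors (MLR)

beta_lr = fit_ols(X_lr[:split], active[:split])
beta_mlr = fit_ols(X_mlr[:split], active[:split])

lr_scores = metrics(active[split:], predict(beta_lr, X_lr[split:]))
mlr_scores = metrics(active[split:], predict(beta_mlr, X_mlr[split:]))

print("LR :", lr_scores)
print("MLR:", mlr_scores)
```

Because the synthetic active count is built directly from the three predictors, the MLR fit on this toy data is essentially exact, while the single-predictor fit leaves a residual error.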
From Table 4, equations (3) and (4) are established for predicting the next day's active cases. The actual and predicted values for 25 records are shown in the bar graphs in Figures 5 and 6, which show how close the two are. The forecast values for the next few days, obtained with the above prediction model, are shown in Figures 7 and 8. India, as well as Odisha as one of its states, is now at a critical reaction point, as shown above.
Multiple linear regression with TOPSIS is studied by Luu, von Meding, and Mojtahedi [12] for predicting disaster from data. Similarly, the relationship between the mechanical properties of the tea stem and their impact factors has been studied by Du, Hu, and Buttar [13] using the MLR technique to improve the picking efficiency of the tea plucking machine. Kadam et al. [14] use an Artificial Neural Network and Multiple Linear Regression to predict the fitness of ground water for drinking in the Shivganga river basin, located on the eastern slope of the Western Ghats, India. Xu and Yan [15] use quasi-Monte Carlo and multiple linear regression to solve the calculation of probabilistic load flow (PLF). To reduce the number of accidents, Jomnonkwao, Uttra, and Ratanavaraha [16] provide a plan that requires future forecast data obtained from regression models. Yuchi et al. [17] use Multiple Linear Regression and a random forest model to predict particulate matter, which increases death and disease.
This prediction model anticipates the situation in the coming days. From the training and testing of the prediction models above, it was found to be an effective way to forecast the number of daily active cases during the second week of August. The forecast figures show that the number of active cases in Odisha will lie between a lower confidence value of 8,582 and an upper confidence value of 10,134, and that the number of active cases in India will lie between a lower confidence value of 48,711 and an upper confidence value of 55,868.
These models achieved remarkable accuracy in predicting COVID-19 active cases. Bearing in mind these projected active cases, the current measures for COVID-19 containment need to be reinforced or updated. Our framework could assist and protect healthcare professionals and government officials in making appropriate plans to cope with the influx of future COVID-19 patients.
Modes of transmission of virus causing COVID-19: implications for IPC precaution recommendations: scientific brief. World Health Organization
Sentiment analysis of nationwide lockdown due to COVID 19 outbreak: Evidence from India
Partial correlation analysis using multiple linear regression: Impact on business environment of digital marketing interest in the era of industrial revolution 4.0
Multiple linear regression for reconstruction of gene regulatory networks in solving cascade error problems
A study on multiple linear regression analysis
Predicting strength of recycled aggregate concrete using artificial neural network, adaptive neuro-fuzzy inference system and multiple linear regression
Application of artificial neural network and multiple linear regression in modeling nutrient recovery in vermicompost under different conditions
Analyzing Vietnam's national disaster loss database for flood risk assessment using multiple linear regression-TOPSIS
Analysis of mechanical properties for tea stem using grey relational analysis coupled with multiple linear regression
Prediction of water quality index using artificial neural network and multiple linear regression modelling approach in Shivganga River basin
Probabilistic load flow calculation with quasi-Monte Carlo and multiple linear regression
Forecasting road traffic deaths in Thailand: Applications of time-series, curve estimation, multiple linear regression, and path analysis models
Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city
The authors have no conflicts of interest to disclose.