key: cord-1003311-4spnfg9z authors: Daniyal, Muhammad; Ogundokun, Roseline Oluwaseun; Abid, Khadijah; Khan, Danyal; Ogundokun, Opeyemi Eyitayo title: Predictive modeling of COVID-19 death cases in Pakistan date: 2020-11-07 journal: Infect Dis Model DOI: 10.1016/j.idm.2020.10.011 sha: 63120407b37c4ab079e25455779d246b7c69862a doc_id: 1003311 cord_uid: 4spnfg9z

BACKGROUND: The world is presently facing the challenges posed by COVID-19 (2019-nCoV), especially in the public health sector, and these challenges are dangerous to both health and life. The disease causes an acute respiratory infection that may result in pain and death. In Pakistan, the disease curve shows a steep upward trend, with almost 256K confirmed cases and 6035 documented deaths as of August 5, 2020.

OBJECTIVE: The primary purpose of this study is to provide a statistical model to predict the trend of COVID-19 death cases in Pakistan. The age and gender of COVID-19 victims were described using a descriptive study.

METHODOLOGY: Three regression models (linear, logarithmic, and quadratic) were employed in this study for modelling COVID-19 death cases in Pakistan. The three models were compared based on the R², adjusted R², AIC, and BIC criteria. The data used for the modelling were obtained from the National Institute of Health of Pakistan and cover February 26, 2020 to August 5, 2020.

CONCLUSION: The finding deduced from the prediction modelling is that the rate of mortality would decrease by the end of October. The total number of deaths will reach its maximum point; then, it will gradually decrease. This indicates that the curve of total deaths will flatten, i.e., it will shift to a constant, which is also the upper bound of the underlying function of total deaths.

The COVID-19 pandemic has emerged very rapidly worldwide, affecting nearly 5,488,825 individuals with 349,095 deaths [1].
Initially, COVID-19 was thought to be a zoonotic virus (bat to human transmission); however, recent studies and the exponential increase in the incidence of COVID-19 provide clear evidence of person-to-person transmission [2, 3, 4, 5]. The first human exposure case was connected to a "wet market" in Wuhan, Hubei Province, China, in late December 2019 [6, 7, 8, 9]. Transmission occurred via droplets released when an infected individual coughed; the virus then entered the human body and caused deteriorating effects on the intestines, spleen, and lungs. Even a single cough from a corona-infected individual can affect three healthy individuals and six immunocompromised patients [10, 11]. COVID-19 reached Pakistan from Iranian territory, as several thousand citizens travel on pilgrimage to holy sites in Iran. Pakistan subsequently decided to close its border to entry from Iran on 23rd February 2020 [7]. Apart from pilgrims returning from Iran, several cases were traced to Afghanistan [12]. The first two cases in Pakistan were announced by the government on 26th February 2020, and it was established that both patients had a travel history from Iran. To curtail the outbreak of COVID-19, the federal government launched a quarantine policy in the Pak-Iran border city of Taftan [1]. Up to 5th August 2020, the number of confirmed death cases was 6035 [13]. The government of Pakistan has continued to enforce mixed rules on social distancing. Pakistan was forced to close mosques, large get-togethers, mass gatherings, shopping malls, private institutions, universities, and marriage halls. The government is taking strict action and reassuring religious leaders about the health measures. The current situation is unfavourable for Pakistanis in that cases keep increasing; experts therefore urged that a lockdown be enforced in numerous urban areas, yet this was not easy.
Many people did not keep to the rules of the lockdown in Italy or China, so it would be challenging to enforce in a nation like Pakistan. However, to date, partial lockdowns under section 144 have been enforced throughout Pakistan [14].

Medical researchers often use linear regression to understand the relationship between drug dosage and the blood pressure of patients. The quadratic regression model serves the purpose of modelling when a set of data is shaped like a parabola, and logarithmic regression models have been extensively used for modelling the intensity of sound, yields of chemical reactions, production of goods, and growth of infants. Several statistical models can provide essential insights for public health interventions by examining "what if" scenarios. Therefore, this study aimed to predict changes in the cumulative number of COVID-19 related deaths for the coming weeks in Pakistan. This would help evaluate the impact of quarantine, social distancing, mask wearing, and smart lockdowns in the country. Three regression models were chosen, all of which are conventionally used in the literature for modelling and prediction purposes.

Different model selection criteria have been extensively used in the literature, such as Kullback-Leibler divergence, the Akaike information criterion, the PRESS statistic, the Bayesian information criterion, the coefficient of determination, the adjusted coefficient of determination, and Mallows's C_p. R² is one of the conventional criteria used for model selection. The closer it is to 1, the better the fit. Goodness of fit means how close an estimated value of Y is to its actual value in the given sample observations. However, R² increases as predictors are added, so it is not the best choice, because adding predictors may also increase the variance of the forecast error. Adjusted R² is another choice, as it adjusts for the number of predictors and thereby addresses this problem.
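For reference, the criteria named above can be written in their standard textbook forms. This is a sketch only: the symbols are assumptions introduced here (n observations, p predictors excluding the intercept, k estimated parameters, SSE and SST the residual and total sums of squares, and L-hat the maximized likelihood), and the source does not state which exact variants its software used.

```latex
\begin{align}
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, &
\bar{R}^2 &= 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}, \\
\mathrm{AIC} &= 2k - 2\ln\hat{L}, &
\mathrm{BIC} &= k\ln n - 2\ln\hat{L}.
\end{align}
```

Under these definitions R² can never fall when a predictor is added, whereas adjusted R², AIC, and BIC each include a term that grows with the number of parameters; the BIC penalty exceeds the AIC penalty whenever ln n > 2, i.e. for n of 8 or more.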
The most reliable techniques for model selection nowadays are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), as they impose a penalty for adding regressors to the model. They impose a harsher penalty than R² and adjusted R². The main advantage of AIC and BIC is that they are well suited to forecasting purposes.

Machine learning and predictive approaches, of which time series forecasting is a branch, have been widely applied in earlier research on infectious diseases. Examples include models of leptospirosis and its relationship with rainfall and temperature [14], and of the temporal association between the monthly number of Plasmodium falciparum cases and the El Niño Southern Oscillation (ENSO) [15]. Different methods have often been adopted for modelling pathogens that occur in recurrent cycles, for instance seasonal influenza, for which a variety of studies have used time series models to forecast possible epidemics. In [16], an ARIMA [17] model was built to predict the incidence of influenza in China for 2012, whereas in [18], a time series prediction method (Tempel) was proposed for influenza mutation prediction. Further examples include research by Lee et al. [19], who developed a time series method using daily influenza-related tweet counts to deliver real-time assessments of influenza levels. Zhang et al. [20] designed a SARIMA model [17] using Australian influenza surveillance data and local Internet search data to forecast seasonal influenza epidemics in the northern hemisphere. Time series analysis was used in [21] to examine the role of climatological variables in influenza transmission in two warm-climate regions, Hong Kong and Maricopa County (Arizona, U.S.). Dominguez et al.
[22] used an alternative time series method to investigate the behaviour of two influenza incidence indicators in the Barcelona area in order to improve their detection.

As far as COVID-19 predictions are concerned, there has been a surge in the available scientific literature over recent months. Much of this research concerns forecasting metrics linked to the coronavirus, for instance active cases and deaths, in China, where the virus first emerged. In [23], real-time estimates of the total number of confirmed infected individuals in China were generated using three different phenomenological models commonly used to predict transmissible diseases, for instance SARS, AIDS, other contagious infections, and dengue. In similar research, Yang et al. [24] merged population migration data and public health data to build a Susceptible-Exposed-Infectious-Removed (SEIR) model and combined it with an artificial intelligence system trained on the 2003 SARS data to forecast China's epidemic curve. In [25], an asymmetrical function was used to model the daily and total numbers of cases and deaths, including the associated epidemic turning points in China. A modified stacked auto-encoder was developed in [26] to forecast the epidemic transmission dynamics and to estimate the number of confirmed COVID-19 cases across China. In contrast, Al-qaness et al. [27] proposed a combination of an adaptive neuro-fuzzy inference system (ANFIS) and a flower pollination algorithm (FPA) enhanced by the salp swarm algorithm to predict confirmed COVID-19 cases. Simple mean-field models were used in [28], an analysis covering China and two European nations, Italy and France, to forecast the spread of the pandemic and, most importantly, the height and duration of the outbreak in both of those nations.
Cai, Jia, Feng, Li, Hsu & Lee [29] implemented a Multi-Task Gaussian Process (MTGP) regression method to improve numerical wind speed forecasts. In their system, the numerical weather forecasts (NWF) are first combined with a Support Vector Regressor (SVR). Pandey, Chaudhary, Gupta & Pal [30] employed SEIR and regression models for forecasting in India, based on data gathered from the Johns Hopkins University repository. Model performance was measured using RMSLE, which was 1.52 for the SEIR model and 1.75 for the regression model. The RMSLE error rate between the SEIR and regression models was 2.01. To explain the progress of COVID-19 transmission, Hou et al. [31] developed a well-mixed SEIR compartmental model. The results of the well-mixed SEIR model assumed that the contact rate of latent individuals is between six and eighteen, reflecting the potential effect of isolation and quarantine interventions on the infection rate. The findings indicate that interventions that reduce the contact rate, for instance isolation and quarantine, can effectively decrease the total number of COVID-19 infections and delay the peak of the epidemic. Multivariate Cox regression was used by Ji et al. [32] to identify the risk factors associated with progression, which were then incorporated into a nomogram to construct a novel prediction scoring model. ROC analysis was used to assess the performance of the novel model [32]. Hao, Xu, Hu & Wang [33] employed an Elman neural network, long short-term memory (LSTM), and a support vector machine (SVM). An SVM with fuzzy granulation was employed to forecast the range of growth in newly confirmed cases, new deaths, and newly recovered persons. To derive the association between various features and the spread of COVID-19, Malki et al. [34] proposed different regressor machine learning models.
The machine learning procedures used in that analysis evaluate the effect of weather variables, for instance temperature and humidity, on the transmission of COVID-19 by extracting the relationship between the number of reported cases and weather variables in certain regions. In 2020, a risk model for predicting critical illness, including death, was developed by Schalekamp et al. [35], incorporating clinical, CXR, and laboratory results. They used multivariable logistic regression. Decision curve analysis was also conducted, and a risk calculator was derived.

The following are the three regression models that were compared for the modelling and prediction purposes. Table 1 shows the parameter estimates and the values of AIC and BIC from the three models for the corona deaths. R² for the linear regression model is 0.928, and the adjusted R² is 0.861. The value of the coefficient of determination for logarithmic regression is 0.705, which shows that the independent variables explain 70.5% of the variation in the dependent variable, as compared to the R² value of quadratic regression (0.997). This is much higher than logarithmic and linear regression (0.994) but does not guarantee an excellent fit of the model, because the value of R² changes as the number of independent variables increases. The essential criteria that have been extensively used in the literature for model comparison are the Akaike and Bayesian information criteria. The Akaike information criterion (AIC) is a penalized technique based on in-sample fit that estimates how well a model is likely to predict future values. AIC is an estimator of out-of-sample prediction error and thereby of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model relative to each of the other models. We therefore choose the AIC and BIC criteria for the selection of a good model.
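The comparison described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic (the NIH Pakistan series is not reproduced here), the helper names `fit_ols`, `design_matrices`, and `compare_models` are hypothetical, and AIC and BIC are computed from a Gaussian likelihood with the error variance counted as a parameter, which may differ from the convention of the software behind Table 1.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares fit returning coefficients and selection criteria."""
    n, k = X.shape  # k counts the intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)
    sst = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    # Gaussian log-likelihood evaluated at the ML variance estimate sse / n
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sse / n) + 1.0)
    n_params = k + 1  # regression coefficients plus the error variance
    aic = 2.0 * n_params - 2.0 * loglik
    bic = n_params * np.log(n) - 2.0 * loglik
    return {"beta": beta, "r2": r2, "adj_r2": adj_r2, "aic": aic, "bic": bic}

def design_matrices(t):
    """Design matrices for the linear, logarithmic, and quadratic trend models."""
    one = np.ones_like(t)
    return {
        "linear": np.column_stack([one, t]),
        "logarithmic": np.column_stack([one, np.log(t)]),
        "quadratic": np.column_stack([one, t, t ** 2]),
    }

def compare_models(t, y):
    """Fit all three models and return their criteria keyed by model name."""
    return {name: fit_ols(X, y) for name, X in design_matrices(t).items()}

# Synthetic cumulative-count series (an assumption for illustration only)
rng = np.random.default_rng(0)
t = np.arange(1, 161, dtype=float)  # day index, starting at 1 so log(t) is defined
y = 5.0 + 2.0 * t + 0.2 * t ** 2 + rng.normal(0.0, 50.0, size=t.size)

results = compare_models(t, y)
for name, res in results.items():
    print(f"{name:12s} R2={res['r2']:.3f} adjR2={res['adj_r2']:.3f} "
          f"AIC={res['aic']:.1f} BIC={res['bic']:.1f}")
```

Because the linear model is nested in the quadratic one, the quadratic R² can never be lower than the linear R²; the penalized criteria are what guard against choosing the larger model for that reason alone.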
The optimal model is selected based on the highest R² and the minimum AIC and BIC. From Table 1, it can be seen clearly that the quadratic regression model shows the best results for every model selection criterion. It has the minimum values of AIC and BIC among all three regression models, which are 330.71 and 141.81 respectively. The value of the coefficient of determination is 0.997, showing that the independent variables explain 99.7% of the variation in the dependent variable. The adjusted R² is 99.4% and shows the same trend. The value of the Durbin-Watson d-statistic is 0.01, which lies in the autocorrelation region. So, there is evidence of autocorrelation, but this does not have an impact on the predictions. The authors also report evidence of multicollinearity, with a reported VIF value of one, reflecting a linear relationship between the linear and the quadratic trend terms, but this does not have an impact on the predictions either. The multicollinearity could have been avoided by transforming the variable. Figure 1 shows the comparison of the fitted regression models. The observed data were plotted against the fitted data of all three models. Quadratic regression shows a better fit than the other two models. Figure 3 shows the scatter plot, which indicates that there is no distinct pattern and that the points are diffuse. For this reason, there is no heteroscedasticity in the quadratic regression model. Figure 2 shows the probability plot of the residuals, which indicates that the residuals follow the normal distribution. The model could be used for prediction purposes because all assumptions were met.

The data on the number of deaths in Pakistan were obtained from the National Institute of Health of Pakistan for the period from 26th February 2020 to 5th August 2020. After testing for a unit root and some functional forms, the daily data fit well and suggest a statistically adequate model [12].
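The two diagnostics mentioned above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' procedure: the function names `durbin_watson` and `vif` are hypothetical helpers written here with NumPy, the day index is synthetic, and centering the time variable is only one possible transformation (the source does not say which transformation it has in mind). By the standard definitions, a Durbin-Watson statistic near 0 indicates positive autocorrelation and a value near 2 indicates none, while a VIF of 1 means a predictor is uncorrelated with the others.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

def vif(X):
    """Variance inflation factor for each column of X (X excludes the intercept)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic day index (an assumption for illustration only)
t = np.arange(1, 161, dtype=float)

# Raw linear and quadratic terms are strongly correlated
vif_raw = vif(np.column_stack([t, t ** 2]))

# Centering t before squaring removes that correlation on a symmetric grid
tc = t - t.mean()
vif_centered = vif(np.column_stack([tc, tc ** 2]))

print("VIF raw:", vif_raw)
print("VIF centered:", vif_centered)
print("DW of alternating residuals:", durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

Centering leaves the fitted values of a quadratic trend unchanged, since the centered model is a reparameterization of the raw one; only the coefficient estimates and their standard errors are affected.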
The parameter estimates from the quadratic model for the corona deaths are given in Table 2, so the quadratic regression equation for the corona deaths is; In the quadratic regression equation, a trend with a negatively signed coefficient allows the fitted Δ ln Y_t to reach a maximum (both local and global) and then to change its direction from increasing to decreasing. The model is relatively statistically adequate for prediction purposes. Table 3 shows the observed and fitted data of death cases. Figure 4 shows the predicted and fitted death cases due to COVID-19 in Pakistan, which indicates a good fit of the model and its suitability for prediction purposes.

In this paper, we have proposed three regression models for the prediction of death cases from COVID-19 in Pakistan and selected the quadratic model based on the model selection criteria. There are four stages of the epidemic, S1: exponential, S2: power law, S3: linear, and S4: flat [36]. The death cases in Pakistan have entered a balanced phase, in which quadratic regression gives an excellent fit to the data. The same model was used in [37], which showed that such a regression model is good even in the early stages of the epidemic, when cases are generally said to increase exponentially and monotonically. Quadratic regression modelling has also been used for prediction in the Fenton treatment of municipal landfill leachate [38]. That model included both significant linear and quadratic parameters. This method of modelling has also been suggested for situations that require an estimate of the possible date of flattening of the curve of infected cases [39]. The same modelling technique has been used for projections of first-wave COVID-19 deaths across the U.S. using social-distancing measures derived from mobile phones [40].
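The turning-point behaviour of a quadratic trend described above can be made explicit. This is a generic sketch: the coefficient symbols are assumptions introduced here, and the fitted values from Table 2 are not reproduced.

```latex
\begin{align}
y_t &= \beta_0 + \beta_1 t + \beta_2 t^2, \qquad \beta_2 < 0, \\
\frac{dy_t}{dt} &= \beta_1 + 2\beta_2 t = 0
\;\Longrightarrow\; t^{*} = -\frac{\beta_1}{2\beta_2}, \\
y_{t^{*}} &= \beta_0 - \frac{\beta_1^{2}}{4\beta_2}.
\end{align}
```

With a negative second-order coefficient and a positive first-order coefficient, t* is positive and marks the single global maximum of the fitted curve; beyond t* the fitted values decline.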
The quadratic time trend model was also applied to the log of new cases and accurately predicted the trajectory of the epidemic in China [41]. WHO data for the whole world, together with the initial statistics from China, indicate that the daily cases and the number of patients who have recovered from this disease are trending upward. Although there are deaths from this disease, they are not trending upward. The number of deaths has been analyzed with respect to gender, and it was concluded from the data of different countries that men are more vulnerable to COVID-19 than women. This may be due to heart disease, blood pressure, and smoking habits in men, which make them more susceptible to COVID-19 than women. It has also been observed that the age groups most affected by this virus vary from country to country. The age group least affected by this virus around the globe is below 18 years. There is a significant difference between the average numbers of deaths and recoveries, since recovered patients are greater in number than fatalities. Studies showed that, as far as cases and death rates are concerned, age and gender had different impacts.

The quadratic regression model has been selected from three regression models based on the model selection criteria; the conventionally used criteria are AIC and BIC. The model with the smallest values of AIC and BIC among all the regression models is used for modelling and prediction. After applying the predictive model, the rate of mortality is predicted to decrease by the end of October. The total number of deaths will reach its maximum point; then, it will gradually decrease. This indicates that the curve of total deaths will flatten, i.e., it will shift to a constant that is also the upper bound of the underlying function of total deaths.
This interpretation holds for the deterministic aspect of the model. Carriers of the coronavirus are not readily identifiable, and everyone is a potential carrier of a virus that could cause great havoc to society. The outbreak may grow into an unmanageable scenario. With the increasing number of deaths, the government should consider a lockdown with strict rules and regulations, and the public should follow simple and basic prevention guidelines.

Situation Report -128: World Health Organization
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. The New England journal of medicine
COVID-19): World health organization
Machine Learning Prediction for COVID 19 Pandemic in India. medRxiv
Mobile Health Application and COVID-19: Opportunities and Challenges
Progress of COVID-19 Epidemic in Pakistan
The outbreak of pneumonia of unknown etiology in Wuhan, China: The mystery and the miracle
Predictive modelling of COVID-19 confirmed cases in Nigeria
COVID-19 prevalence estimation: Four most affected African countries
The novel coronavirus 2019 (2019-nCoV) uses the SARS-coronavirus receptor ACE2 and the cellular protease TMPRSS2 for entry into target cells
Predictive modelling of COVID-19 confirmed cases in Nigeria
Transmission Potential and Severity of COVID-19 in Pakistan. 2020
Government of Pakistan
Modeling seasonal leptospirosis transmission and its association with rainfall and temperature in Thailand using time-series and ARIMAX analyses. Asian Pac
The role of El Niño southern oscillation (ENSO) on variations of monthly Plasmodium falciparum malaria cases at the cayenne general hospital
Time series analysis of influenza incidence in Chinese provinces from
An introductory study on time series modelling and forecasting. arXiv 2013
Tempel: time-series mutation prediction of influenza A viruses via attention-based recurrent neural networks
Forecasting influenza levels using real-time social media streams
Predicting seasonal influenza epidemics using cross-hemisphere influenza surveillance data and local Internet query data
Modeling, and predicting seasonal influenza transmission in warm regions using climatological parameters
Monitoring mortality as an indicator of influenza in Catalonia
Real-time forecasts of the COVID-19 epidemic in China from 5th
Modified SEIR and A.I. prediction of the trend of the epidemic of COVID-19 in China under public health interventions
Trend, and forecasting the COVID-19 outbreak in China
Artificial intelligence forecasting of covid-19 in china
Optimization method for forecasting confirmed cases of covid-19 in China
Analysis and forecast of COVID-19 spreading in China
Gaussian Process Regression for numerical wind speed prediction enhancement
SEIR and Regression Model-based COVID-19 outbreak predictions in India
The effectiveness of quarantine of Wuhan city against the Corona Virus Disease 2019 (COVID-19): A well-mixed SEIR model analysis
Prediction for progression risk in patients with COVID-19 pneumonia: the CALL Score
Prediction and analysis of Corona Virus Disease
Association between weather data and COVID-19 pandemic predicting mortality rate: Machine learning approaches
Model-based prediction of critical illness in hospitalized patients with COVID-19
COVID-19 pandemic: Power-law spread and flattening of the curve
Logarithmic Quadratic Regression Model for Early Periods of COVID-19 Epidemic Count Data
Application of the quadratic regression model for Fenton treatment of municipal landfill leachate
Modelling the Corona Virus Data in Switzerland. 2020
Projections for first-wave COVID-19 deaths across the U.S. using social-distancing measures derived from mobile phones. medRxiv
When will the Covid-19 pandemic peak?
The authors declare that there is no conflict of interest in this study.