key: cord-0062755-czyscqgq authors: Ahsan-ul-Haq, Muhammad; Ahmed, Mukhtar; Zafar, Javeria; Ramos, Pedro Luiz title: Modeling of COVID-19 Cases in Pakistan Using Lifetime Probability Distributions date: 2021-05-04 journal: Ann DOI: 10.1007/s40745-021-00338-9 sha: bd9b1dc1bd2be77deedd7b043a9833cade14fee7 doc_id: 62755 cord_uid: czyscqgq

The Coronavirus Disease (COVID-19) is a respiratory disease that has caused a large number of deaths all over the world since its outbreak. The World Health Organization (WHO) has declared the outbreak a global pandemic. Understanding the random process underlying the spread of COVID-19 infection is an important health and economic problem. In this study, we analyze the frequency of daily confirmed cases of COVID-19 using different two-parameter lifetime probability distributions. We consider data for Pakistan from March 11, 2020, to July 25, 2020. Nine lifetime probability distributions are considered, and the best fit is selected using the log-likelihood, AIC, BIC, RMSE, and $R^2$ goodness-of-fit measures. The results indicate that the Weibull distribution generally provides the best fit.

A viral infectious disease named coronavirus disease 2019 (COVID-19) was first reported in mid-December 2019 in Wuhan City, China [1]. COVID-19 spread worldwide and has affected more than 213 countries, including Pakistan [2]. It is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). COVID-19 infection leads to respiratory illness. The most common symptoms are fever, dry cough, and tiredness; other widely reported symptoms include sore throat, diarrhea, loss of taste or smell, and aches and pains [3]. The disease is highly infectious and spreads through direct contact and respiratory droplets from infected people, which is currently the main route of transmission.
The virus can remain active for as long as 12 h, or even two days, on a contaminated surface [4]. In Pakistan, the first report of COVID-19 emerged on 26th February 2020 with two positive cases; within 2 days, three new cases were reported in different cities, with no connection between these patients [5]. Reported cases then increased steadily until 12th June, when 139,230 positive cases had been reported, after which the number of cases showed a decreasing trend. The total number of confirmed cases up to 25th July was 273,113. The province-wise totals of COVID-19 positive cases for Punjab, Sindh, KPK, and Balochistan were 91,901, 117,598, 33,220, and 11,578, respectively.

COVID-19 became a worldwide pandemic, and its spread can be controlled by taking preventive measures. For patients, all of the symptoms above should be monitored continuously along with vital signs and, to avoid further spread, patients should be isolated under strict clinical measures following preventive guidelines. The government needs to find a strategy to fight this war in a timely manner; for example, authorities took further measures such as closing borders, suspending community services and schools, and restricting both domestic and international travel until further notice [6]. The purpose of these measures is to limit the chances of physical contact among people in order to control the transmission of COVID-19, especially because the incubation period of this virus is relatively longer than that of other viruses. Because of the novel nature of the virus, there is greater uncertainty about when the disease will disappear. Short-term forecasting is therefore critical, even if it offers only limited insight, for anticipating the coming month and for better management of societal, economic, cultural, and public health issues [7].
Data science techniques have been used to describe the behavior of pandemics, crop harvesting, business data mining, and e-commerce fraud, as well as other applied problems [8-19]. In the past few months, researchers have developed or applied existing mathematical and statistical methods to predict the number of COVID-19 cases and related outcomes. The generalized logistic model showed that epidemic growth was exponential in China [20]. According to one forecast, the situation would worsen across Europe, and the USA would become the epicenter of new cases by mid-April 2020 [21]. About 115 million people had been infected worldwide by March 5, 2021, with more than 2,570,000 deaths. Predictions and forecasts help strengthen strategies to keep the pandemic from worsening. Soltani-Kermanshahi et al. [22] studied the statistical distribution of novel coronavirus cases in Iran, comparing three parametric distributions (normal, log-normal, and Weibull) fitted to the daily reported COVID-19 cases of Iran. Yousaf et al. [5] conducted a statistical analysis to forecast COVID-19 for the upcoming month in Pakistan.

Due to a lack of epidemiological analyses, there are many uncertainties in assessing the risk of this disease in the population. In Pakistan, any future treatment or vaccination for COVID-19 is expected to take at least a year. In the meantime, the only way to avoid contact with the virus is through precautionary measures and lockdowns. Lockdowns cause economic problems and are not easy to implement without economic losses, so policymakers need to make effective decisions and implement standard operating procedures (SOPs). In short, proper modeling of a pandemic can help reduce the exponential spread of the infection.
Research is needed to fully explain the pathways and mechanisms of the disease and to identify potential curative targets, which can help in developing common preventive and therapeutic strategies. This global problem has attracted the interest of researchers, giving rise to several proposals to analyze and predict the evolution of the pandemic. The first priority is to examine the behavior of the number of COVID-19 cases. To this end, we consider different parametric distributions to describe the number of daily reported COVID-19 cases in Pakistan. This paper aims to identify the best-fitting model for daily confirmed COVID-19 cases in Pakistan, both nationally and province-wise. We consider the most common two-parameter lifetime models to fit the data. To the best of our knowledge, this is the first time these probability distributions have been used to model the number of occurrences of COVID-19 cases. The daily confirmed cases are taken from four provinces of Pakistan (Punjab, Sindh, KPK, and Balochistan). The parameters are estimated using the maximum likelihood approach. The best-fit model is selected using the AIC, BIC, coefficient of determination ($R^2$), and root mean square error (RMSE) criteria.

The rest of the paper is organized as follows. Sect. 2 describes the COVID-19 data for the selected regions. Sect. 3 describes the statistical models, and Sect. 4 presents the model evaluation measures. In Sect. 5, the data are analyzed through parameter estimates and goodness-of-fit measures. Finally, conclusions, discussion, and future research are given in Sect. 6.

Lifetime models are mathematical functions that return the probability of observing the event of interest at a given time. Usually specified through a probability density function (pdf), such a model is used to obtain the probability that the event takes values in a given time interval. Here, the event of interest is the daily occurrence of COVID-19 in the population of Pakistan.
This section presents a brief description of the two-parameter models considered in this study. Several common probability distributions are used in the literature as lifetime distributions, for instance the Weibull distribution (WD), Power function distribution (PFD), Log-Logistic distribution (LLD), Log-Normal distribution (LND), inverse Weibull distribution (IWD), Gumbel distribution (GuD), Burr III distribution (BIIID), Burr XII distribution (BXIID), and Birnbaum-Saunders distribution (BSD). The probability density functions, their supports, and the parameter ranges are given in Table 1. For the models parameterized by $\alpha$ and $\beta$ in Table 1, $x > 0$, $\alpha > 0$ is the scale parameter, and $\beta > 0$ is the shape parameter. The two-parameter models considered here are standard in statistical analysis, and their properties, applicability, and inferential procedures are presented in the statistical literature. Our aim here is not to propose new distributions but to verify whether some of the well-established distributions can be used to describe the frequency of COVID-19 cases.

We collected data on daily positive cases of COVID-19 for the period from March 11, 2020, to July 25, 2020, obtained from the public reports of the National Institute of Health (NIH), Islamabad, Pakistan. We also consider the confirmed daily case data from four provinces: Punjab, Sindh, Khyber Pakhtunkhwa (KPK), and Balochistan. Table 2 presents an exploratory analysis of the COVID-19 data.

The following goodness-of-fit measures are used to select the best-fitting probability distribution: the Akaike information criterion (AIC), Bayesian information criterion (BIC), root mean square error (RMSE), and coefficient of determination ($R^2$). The information criteria are

$$\mathrm{AIC} = -2\hat{\ell} + 2k, \qquad \mathrm{BIC} = -2\hat{\ell} + k \ln(n),$$

where $\hat{\ell}$ is the log-likelihood function evaluated at the MLEs, $k$ refers to the number of parameters in the model, and $n$ is the sample size.
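The goodness-of-fit measures named above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, AIC and BIC follow their standard definitions, and the RMSE and $R^2$ helpers assume that the two sequences being compared are observed and model-fitted values (for example, empirical and fitted cumulative probabilities), which is an assumption about how these two measures were applied.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 * log-likelihood + 2 * k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 * log-likelihood + k * ln(n)."""
    return -2.0 * loglik + k * math.log(n)


def rmse(observed, fitted):
    """Root mean square error between observed and fitted values.

    Assumption: both sequences have the same length and are aligned.
    """
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / n)


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical log-likelihood for a two-parameter model fitted to 137 days.
    print(aic(-1000.0, k=2), bic(-1000.0, k=2, n=137))
```

Smaller AIC, BIC, and RMSE values and larger $R^2$ values indicate a better fit, which is the ordering used when the candidate distributions are compared.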
For each parameter $\theta_i$, maximum likelihood estimation (MLE) involves maximizing the likelihood function by solving the following:

$$\frac{\partial \ell(\boldsymbol{\theta})}{\partial \theta_i} = 0, \qquad i = 1, \ldots, k,$$

where $\ell(\boldsymbol{\theta})$ is the log-likelihood function of the model and $k$ is the number of parameters. We apply this approach to the likelihood functions of the selected models; in this case, numerical techniques were used to obtain the parameter estimates. Interested readers can use statistical software such as R, with packages that implement some of the cited models; see, for instance, Delignette-Muller and Dutang [23]. The codes and routines used to obtain the parameter estimates are available upon request.

The parameters of the probability distributions are estimated using the maximum likelihood estimation method. Table 3 presents the parameter estimates for all probability models, and Table 4 provides the goodness-of-fit results. For the Pakistan COVID-19 daily cases, the W, Gu, PF, and LL distributions appear to have the highest $R^2$ and the lowest AIC, BIC, and RMSE. Hence, among the selected distributions, we conclude that these four can be used to describe the distribution of the daily number of cases. For Punjab, the W, LL, LN, and Gu distributions gave a better fit than the other distributions, with smaller RMSE, AIC, and BIC and higher $R^2$ values. Similar conclusions for the W, LL, LN, and Gu distributions hold for the Sindh, KPK, and Balochistan provinces. Overall, Table 4 shows that the most suitable model for the data from the different provinces of Pakistan is the Weibull distribution. Figures 1 and 2 present box plots of $R^2$, RMSE, AIC, and BIC for the different models. As the figures show, the Weibull distribution performed better than the other models. Figure 3 compares the fitted Weibull distribution with the empirical distributions for Pakistan and for the Punjab, Sindh, KPK, and Balochistan provinces.
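The numerical maximum likelihood step described above can be sketched for the Weibull model as follows. This is a minimal pure-Python illustration, not the authors' R routines: the function names are hypothetical, the parameterization (shape `k`, scale `lam`) is assumed, and the data are assumed to be strictly positive and not all equal. The shape is found by bisection on the profile score equation, and the scale then follows in closed form.

```python
import math


def weibull_loglik(data, shape, scale):
    """Weibull log-likelihood for strictly positive observations."""
    n = len(data)
    return (n * math.log(shape) - n * shape * math.log(scale)
            + (shape - 1.0) * sum(math.log(x) for x in data)
            - sum((x / scale) ** shape for x in data))


def fit_weibull_mle(data, tol=1e-10):
    """Return (shape, scale, loglik) maximizing the Weibull likelihood."""
    xs = [float(x) for x in data]
    if len(xs) < 2 or min(xs) <= 0.0 or min(xs) == max(xs):
        raise ValueError("need at least two positive, non-identical values")
    n = len(xs)
    x_max = max(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n
    ys = [x / x_max for x in xs]  # rescale so powers cannot overflow

    def score(k):
        # Profile score in the shape: zero at the MLE, increasing in k.
        w = [y ** k for y in ys]
        weighted = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
        return weighted - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0.0:  # expand the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol * hi:  # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # Given the shape, the scale satisfies lam**k = mean(x**k).
    lam = x_max * (sum(y ** k for y in ys) / n) ** (1.0 / k)
    return k, lam, weibull_loglik(xs, k, lam)


if __name__ == "__main__":
    # Hypothetical daily counts, used only to exercise the routine.
    print(fit_weibull_mle([3, 7, 12, 20, 35, 50]))
```

Daily counts equal to zero would have to be excluded or handled separately before calling this routine, since the Weibull density is defined only for positive values.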
It can be seen from the figures that the Weibull distribution fits all the considered datasets well, which confirms the goodness-of-fit results. Hence, the findings indicate that using the Weibull distribution to analyze COVID-19 daily cases returns more accurate probabilities than using the competing distributions. From the fitted results we can compute the expected number of cases at different probability levels. The values can be computed from

$$x_p = \left\lfloor \hat{\lambda} \left[ -\ln(1 - p) \right]^{1/\hat{k}} \right\rfloor,$$

where $\hat{\lambda}$ (scale) and $\hat{k}$ (shape) are the MLEs available in Table 3, $\lfloor x \rfloor$ is the integer part of $x$, and $p$ is the probability level. As an example, assuming a probability level of 0.5 and using the estimates for Pakistan, we have $x_{0.5} = 1241$. It is important to point out that computing estimates in real time plays a key role as a decision-making tool during pandemic periods. For this reason, we have provided the necessary R code (available in the Supplemental Material) to update the estimates and compute the expected values at different probability levels.

The current study analyzes COVID-19 daily case data for Pakistan, both nationally and province-wise. Our focus was to identify appropriate two-parameter models that can be used to describe the distribution of the daily number of positive COVID-19 cases. We conclude that the Weibull distribution returned better results than other well-known two-parameter distributions. This conclusion is based on widely used model-discrimination metrics such as $R^2$, AIC, BIC, and RMSE. Visual confirmation was also obtained by comparing the empirical distributions with the fitted Weibull distributions with different parameters.
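The computation of the expected number of cases at a given probability level, described above, amounts to evaluating the Weibull quantile function and taking its integer part. A minimal Python sketch is given below, assuming the usual scale and shape parameterization of the Weibull distribution; the function name and the parameter values in the example are hypothetical and are not the estimates from Table 3.

```python
import math


def weibull_quantile_cases(p, scale, shape):
    """Integer part of the Weibull quantile at probability level p.

    Assumes the parameterization F(x) = 1 - exp(-(x / scale) ** shape).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.floor(scale * (-math.log(1.0 - p)) ** (1.0 / shape))


if __name__ == "__main__":
    # Hypothetical scale and shape values, for illustration only.
    for level in (0.25, 0.5, 0.75, 0.95):
        print(level, weibull_quantile_cases(level, scale=1500.0, shape=1.2))
```

Re-running such a function with updated parameter estimates gives updated case levels as new daily data arrive.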
An interesting aspect of our findings is that, while most analyses of COVID-19 aim to flatten the curve of the temporal observations (so that the number of infected does not pass a threshold that could collapse the health system), here we aim to obtain graphs with an exponential decay without a very long tail; this would imply that there are ... Additionally, with the fitted parameters of the Weibull distribution, we can use the complement of the cumulative distribution function to estimate the probability that the number of cases is greater than or equal to a given number of positive COVID-19 cases in Pakistan or its provinces. To the best of our knowledge, no comparison has been considered using the proposed lifetime models. These results are of primary interest for resource allocation planning and social isolation policies.

[1] Coronavirus infections-more than just the common cold
[2] COVID 19 pandemic and Pakistan; limitations and gaps
[3] Probiotics and COVID-19
[4] Finding an accurate early forecasting model from small dataset: a case of 2019-ncov novel coronavirus outbreak
[5] Statistical analysis of forecasting COVID-19 for upcoming month in Pakistan
[6] Transmission potential and severity of COVID-19 in Pakistan
[7] Forecasting the novel coronavirus COVID-19
[8] Monitoring novel corona virus (COVID-19) infections in India by cluster analysis
[9] Outbreak prediction of COVID-19 for dense and populated countries using machine learning
[10] Culture vs policy: more global collaboration to effectively combat COVID-19. The Innovation
[11] What are the underlying transmission patterns of COVID-19 outbreak? An age-specific social contact characterization. EClinicalMedicine
[12] Probability on graphical structure: a knowledge-based agricultural case
[13] Introduction to business data mining
[14] Optimization based data mining: theory and applications
[15] Internet of things, real-time decision making, and artificial intelligence
[16] Risk management in e-commerce: a fraud study case using acoustic analysis through its complexity
[17] Modeling traumatic brain injury lifetime data: improved estimators for the generalized gamma distribution under small samples
[18] COVID-19: immunopathology and its implications for therapy
[19] Predicting the evolution and control of COVID-19 pandemic in Portugal
[20] Real-time forecasts of the COVID-19 epidemic in China from
[21] Forecasting the dynamics of COVID-19 pandemic in top 15 countries in April 2020: ARIMA model with machine learning approach
[22] Statistical distribution of novel coronavirus in Iran
[23] fitdistrplus: an R package for fitting distributions

Data Availability: Data sets are available at https://covid.gov.pk/.