key: cord-0857547-madqizvz authors: Almetwally, Ehab M.; Abdo, Doaa A.; Hafez, E.H.; Jawa, Taghreed M.; Sayed-Ahmed, Neveen; Almongy, Hisham M. title: The new discrete distribution with application to COVID-19 Data date: 2021-12-05 journal: Results Phys DOI: 10.1016/j.rinp.2021.104987 sha: 7da168cc474cf55005324bc6902e47305252fa2c doc_id: 857547 cord_uid: madqizvz

This research aims to model COVID-19 data from several countries, including Italy, Puerto Rico, and Singapore. Because discrete distributions are well suited to the analysis of count data, we construct a new discrete distribution using the survival discretization method. Building on the Marshall-Olkin family and the inverse Topp-Leone distribution, we first introduce the Marshall-Olkin inverse Topp-Leone distribution and then discretize it to obtain the discrete Marshall-Olkin inverse Topp-Leone (DMOITL) distribution. The new model has only two parameters, and several of its properties are obtained, such as reliability measures and moment functions. Maximum likelihood and Bayesian estimation methods are applied to estimate the unknown parameters of the DMOITL distribution, and a Monte Carlo simulation is carried out to compare the two approaches. Highest posterior density (HPD) credible intervals for the parameters of the new discrete distribution are computed from the output of the Markov chain Monte Carlo (MCMC) technique.

Coronaviruses are a large family of viruses that cause diseases ranging from the common cold to much more serious conditions such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). In 2019, a new coronavirus disease (COVID-19) was discovered in Wuhan, China. It is caused by a coronavirus that had not been found in people before.
The coronavirus disease 2019 (COVID-19) has been declared a pandemic by the World Health Organization (WHO). To stop the virus from spreading further, a concerted global effort is required. A pandemic affects a wide geographic area and an exceptionally high proportion of the population. The H1N1 flu pandemic in 2009 was the previous pandemic reported in the world.

Many researchers have examined the COVID-19 pandemic and built models to fit the data and to forecast the number of cases, helping nations choose prevention strategies. For example, El-Morshedy et al. [1] presented a new discrete distribution, the discrete generalized Lindley, for analysing daily coronavirus infections in Hong Kong and daily new deaths in Iran. Maleki et al. [2] forecast confirmed and recovered COVID-19 cases using an autoregressive time series model based on the two-piece scale mixture normal distribution. Nesteruk [3] and Batista [4] predicted the daily new COVID-19 cases in China using the susceptible, infected, and recovered (SIR) mathematical model. Almongy et al. [5] introduced a new model for the COVID-19 mortality rates in Italy, Mexico, and the Netherlands. Liu et al. [6] discussed a new model for the survival times of COVID-19 patients in China.

Inverse distributions are obtained by applying the inverse transformation to a random variable. These distributions display different features in the shapes of the density and the hazard rate. Many authors have discussed inverted distributions and their applications. Some of the well-known inverted models are the inverse Weibull distribution (Calabria and Pulcini [7], Muhammed and Almetwally [8, 9]) and the inverted Topp-Leone (ITL) distribution (Hassan et al. [10], Almetwally et al. [11], Hassan et al. [12] and Almetwally [13]), among others. Hassan et al. [10] proposed the ITL distribution with CDF given by

$$F(x;\vartheta) = 1 - \frac{(1+2x)^{\vartheta}}{(1+x)^{2\vartheta}}, \quad x \ge 0, \tag{1}$$

where ϑ > 0 is the shape parameter.
The probability mass function (PMF) related to equation (2) is given by

We use discrete distributions for count data because most existing continuous distributions do not give adequate results when modelling COVID-19 cases, and counts of deaths or daily new cases exhibit significant dispersion. Survival discretization is the most widely used method for generating discrete distributions, and it requires a cumulative distribution function (CDF). Time is divided into unit intervals, and the survival function should be continuous and non-negative. Roy [14] defines the PMF of the discrete distribution as follows:

$$P(X = x) = S(x;\Theta) - S(x+1;\Theta), \quad x = 0, 1, 2, \ldots,$$

where S(x; Θ) = 1 − F(x; Θ), F(x; Θ) is the CDF of a continuous distribution, and Θ is a parameter vector. A random variable X is said to have this discrete distribution if its CDF satisfies P(X ≤ x) = F(x + 1; Θ). The hazard rate is given by hr(x) = P(X = x)/S(x), and the reversed failure rate of the discrete distribution is given by rfr(x) = P(X = x)/(1 − S(x)).

Para and Jan [15] proposed the discrete Burr type XII and discrete Lomax distributions; discrete data with heavy tails can be modelled using the discrete Lomax (DL) distribution. Nakagawa and Osaki [26] proposed the discrete Weibull (DW) model, Krishna and Pundir [25] introduced the discrete Burr (DB) model, Gómez-Déniz and Calderín-Ojeda [29] introduced the discrete Lindley (DLi) model, Nekoukhou et al. [16] suggested the discrete generalized exponential (DGEx) model, Al-Babtain et al. [17] introduced the natural discrete Lindley (NDL) model, and Eliwa et al. [18] introduced the discrete Gompertz exponential (DGzEx) model. Gillariose et al. [19] introduced a discrete Weibull Marshall-Olkin exponential distribution, and Almetwally et al. [20] introduced the discrete Marshall-Olkin generalized exponential distribution.

Marshall and Olkin [21] introduced a technique for adding a new parameter to an existing distribution, resulting in a new distribution known as the Marshall-Olkin (MO) extended distribution.
This new distribution includes the original distribution as a special case and gives the model more flexibility. Sankaran and Jayakumar [22] presented a detailed analysis of the physical interpretation of the MO family. Let $\bar{F}(x) = 1 - F(x)$ denote the survival function of a continuous random variable X, where f(x) = dF(x)/dx is the density function associated with the cumulative distribution function (CDF) F(x). The MO extended distribution has the survival function

$$\bar{G}(x;\alpha) = \frac{\alpha \bar{F}(x)}{1 - (1-\alpha)\bar{F}(x)}, \quad \alpha > 0, \tag{4}$$

and $\bar{F}(x)$ is the special case of $\bar{G}(x;\alpha)$ with α = 1. The probability mass function (PMF) obtained by discretizing equation (4) has the following form:

$$P(X = x) = \frac{\alpha \bar{F}(x)}{1 - (1-\alpha)\bar{F}(x)} - \frac{\alpha \bar{F}(x+1)}{1 - (1-\alpha)\bar{F}(x+1)}, \quad x = 0, 1, 2, \ldots \tag{5}$$

Our aim is to introduce the discrete Marshall-Olkin inverted Topp-Leone (DMOITL) distribution and to use it to model COVID-19 data from different countries. We obtain point estimates of the unknown parameters by maximum likelihood and by Bayesian estimation. HPD intervals are used as credible intervals for the parameters of the new discrete distribution, based on the MCMC output. We also compute asymptotic confidence intervals (ACI) for the unknown parameters of the DMOITL distribution.

The rest of this study is organized as follows. Section 2 defines the DMOITL distribution. Section 3 introduces its statistical properties. In Section 4, the two parameters of the distribution are estimated by classical and Bayesian point estimation methods. Section 5 is concerned with interval estimation. Section 6 presents a simulation study that compares the performance of the estimation approaches. In Section 7, three real COVID-19 data sets from Italy, Puerto Rico, and Singapore are used to demonstrate the efficiency of the DMOITL distribution relative to other distributions. Finally, conclusions and major findings are given in Section 8.
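The two building blocks described in the introduction, the Marshall-Olkin transformation of a survival function and Roy's survival discretization, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the exponential baseline is chosen purely for the example and is not part of the paper.

```python
import math


def marshall_olkin_sf(base_sf, alpha):
    """Return the Marshall-Olkin extended survival function of a baseline survival function."""
    def sf(x):
        s = base_sf(x)
        return alpha * s / (1.0 - (1.0 - alpha) * s)
    return sf


def discretize(sf):
    """Survival discretization: P(X = x) = S(x) - S(x + 1) for x = 0, 1, 2, ..."""
    def pmf(x):
        return sf(x) - sf(x + 1)
    return pmf


def exponential_sf(x, rate=0.5):
    """Baseline used only for illustration (an assumption, not the paper's baseline)."""
    return math.exp(-rate * x)


mo_sf = marshall_olkin_sf(exponential_sf, alpha=2.0)
pmf = discretize(mo_sf)

# The probabilities telescope to S(0) - S(200), which is numerically 1 here.
total = sum(pmf(x) for x in range(200))

# With alpha = 1 the Marshall-Olkin transform returns the baseline unchanged.
same_as_baseline = marshall_olkin_sf(exponential_sf, 1.0)(3)
```

Keeping the two steps as separate higher-order functions mirrors the construction in the text: any continuous baseline can be extended first and discretized second.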
In this part, we introduce the Marshall-Olkin inverted Topp-Leone (MOITL) distribution and convert this continuous distribution into its discrete counterpart, the discrete MOITL (DMOITL) distribution. Using Equations (4) and (1), the survival function of the MOITL distribution can be written as follows:

$$S(x;\Theta) = \frac{\alpha (1+2x)^{\vartheta}}{(1+x)^{2\vartheta} - (1-\alpha)(1+2x)^{\vartheta}}, \quad x \ge 0, \tag{6}$$

where Θ = (α, ϑ) is the parameter vector of the MOITL distribution. The DMOITL distribution is obtained by the survival discretization method, with Equation (6) as the survival function of the baseline MOITL model with parameter vector Θ. As a result, the CDF of the DMOITL distribution is

$$F(x;\Theta) = 1 - S(x+1;\Theta) = 1 - \frac{\alpha (2x+3)^{\vartheta}}{(x+2)^{2\vartheta} - (1-\alpha)(2x+3)^{\vartheta}}, \quad x = 0, 1, 2, \ldots \tag{7}$$

The corresponding PMF is defined by

$$P(X = x;\Theta) = \frac{\alpha (1+2x)^{\vartheta}}{(1+x)^{2\vartheta} - (1-\alpha)(1+2x)^{\vartheta}} - \frac{\alpha (2x+3)^{\vartheta}}{(x+2)^{2\vartheta} - (1-\alpha)(2x+3)^{\vartheta}}, \tag{8}$$

where α > 0 and ϑ > 0. We write X ∼ DMOITL(Θ) for a random variable with PMF (8).

Figure 1 shows various shapes of the PMF of the DMOITL distribution. The PMF can be right-skewed, symmetric, or decreasing. As seen in the application section, the DMOITL distribution is flexible and can be used to model skewed data, which makes it suitable for fields such as biomedical studies, biology, reliability, physical engineering, and survival analysis.

Sub-models of the DMOITL model arise for selected values of the parameters. If α = 1, the DMOITL distribution reduces to the discrete inverted Topp-Leone (DITL) distribution, whose PMF and CDF are given by

$$P(X = x;\vartheta) = \frac{(1+2x)^{\vartheta}}{(1+x)^{2\vartheta}} - \frac{(2x+3)^{\vartheta}}{(x+2)^{2\vartheta}}, \qquad F(x;\vartheta) = 1 - \frac{(2x+3)^{\vartheta}}{(x+2)^{2\vartheta}}.$$

The reliability measures, moments, and moment generating function (MGF) of the DMOITL distribution are given here. The hazard rate (HR) function of the DMOITL distribution is given by

$$hr(x;\Theta) = \frac{P(X = x;\Theta)}{S(x;\Theta)} = 1 - \frac{S(x+1;\Theta)}{S(x;\Theta)},$$

where S(x; Θ) is the survival function in Equation (6). The survival function of the DMOITL distribution follows from Equation (7) as

$$P(X > x) = 1 - F(x;\Theta) = \frac{\alpha (2x+3)^{\vartheta}}{(x+2)^{2\vartheta} - (1-\alpha)(2x+3)^{\vartheta}}.$$

Figure 2 shows some important shapes of the HR of the DMOITL distribution, including decreasing and upside-down curves, which are appealing features for count models.
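The definitions above translate directly into code. The sketch below assumes the standard ITL survival function (1 + 2x)^ϑ / (1 + x)^(2ϑ) as the baseline and reads S(x) in hr(x) = P(X = x)/S(x) as the MOITL survival function; the function names and the parameter values are hypothetical, and `theta` stands for ϑ.

```python
import math


def moitl_sf(x, alpha, theta):
    """MOITL survival function at x >= 0 (theta stands for the shape parameter)."""
    # ITL survival (1 + 2x)^theta / (1 + x)^(2 * theta), evaluated on the log scale.
    s = math.exp(theta * (math.log1p(2.0 * x) - 2.0 * math.log1p(x)))
    return alpha * s / (1.0 - (1.0 - alpha) * s)


def dmoitl_pmf(x, alpha, theta):
    """P(X = x) = S(x) - S(x + 1) for x = 0, 1, 2, ..."""
    return moitl_sf(x, alpha, theta) - moitl_sf(x + 1, alpha, theta)


def dmoitl_cdf(x, alpha, theta):
    """P(X <= x) = 1 - S(x + 1)."""
    return 1.0 - moitl_sf(x + 1, alpha, theta)


def dmoitl_hazard(x, alpha, theta):
    """hr(x) = P(X = x) / S(x), reading S as the MOITL survival function."""
    return dmoitl_pmf(x, alpha, theta) / moitl_sf(x, alpha, theta)


# Hypothetical parameter values, chosen only to exercise the functions.
alpha, theta = 2.0, 1.5
probabilities = [dmoitl_pmf(x, alpha, theta) for x in range(10)]
hazards = [dmoitl_hazard(x, alpha, theta) for x in range(10)]
```

As a quick sanity check, with α = 1 and ϑ = 1 the model reduces to the DITL case and P(X = 0) = 1 − 3/4 = 0.25, and the first ten probabilities sum to the CDF evaluated at 9.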
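The reverse hazard function of the DMOITL distribution is given as

$$rfr(x;\Theta) = \frac{P(X = x;\Theta)}{1 - S(x;\Theta)}.$$

The second rate of failure (srf) of the DMOITL distribution is

$$srf(x;\Theta) = \ln\left[\frac{S(x;\Theta)}{S(x+1;\Theta)}\right].$$

Here P(X = x; Θ) is the PMF in Equation (8) and S(x; Θ) is the survival function in Equation (6).

The r-th non-central moment of the DMOITL distribution can be derived using Equation (8) as follows:

$$\mu'_r = E(X^r) = \sum_{x=0}^{\infty} x^r P(X = x;\Theta).$$

In particular, the mean of the DMOITL distribution is

$$\mu'_1 = \sum_{x=0}^{\infty} x\, P(X = x;\Theta).$$

The variance of the DMOITL distribution is given as

$$\mathrm{Var}(X) = \mu'_2 - (\mu'_1)^2.$$

The dispersion index (DI) may be determined from the following expression:

$$DI = \frac{\mathrm{Var}(X)}{\mu'_1}.$$

The skewness value (SKV) of the DMOITL distribution can be positive, zero, negative, or undefined. It can be expressed in terms of the third raw moment:

$$SKV = \frac{\mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3}{[\mathrm{Var}(X)]^{3/2}}.$$

The kurtosis value (KTV) of the DMOITL distribution can be expressed in terms of the fourth raw moment:

$$KTV = \frac{\mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4}{[\mathrm{Var}(X)]^{2}}.$$

Table 1 reports the mean μ'_1, variance, DI, μ'_3, μ'_4, SKV, and KTV of the DMOITL distribution for different values of the parameters α and ϑ.

In this section, we apply both classical and Bayesian methods of point estimation. First, we apply maximum likelihood estimation (MLE), and then we apply the Bayesian estimation method.

We begin with the classical method, the MLE. Let X = (X_1, ..., X_n) be a random sample of size n from the DMOITL distribution. The log-likelihood function of the vector Θ = (α, ϑ) is given by

$$\ell(\Theta) = \sum_{i=1}^{n} \ln\left[S(x_i;\Theta) - S(x_i+1;\Theta)\right], \tag{17}$$

where S(x; Θ) = α s(x)/[1 − (1 − α) s(x)] and s(x) = (1 + 2x)^ϑ/(1 + x)^{2ϑ}. Differentiating Equation (17) with respect to α and ϑ, respectively, gives the non-linear likelihood equations

$$\frac{\partial \ell}{\partial \alpha} = \sum_{i=1}^{n} \frac{S_{\alpha}(x_i) - S_{\alpha}(x_i+1)}{S(x_i;\Theta) - S(x_i+1;\Theta)} = 0 \tag{18}$$

and

$$\frac{\partial \ell}{\partial \vartheta} = \sum_{i=1}^{n} \frac{S_{\vartheta}(x_i) - S_{\vartheta}(x_i+1)}{S(x_i;\Theta) - S(x_i+1;\Theta)} = 0, \tag{19}$$

where

$$S_{\alpha}(x) = \frac{s(x)\,[1 - s(x)]}{[1 - (1-\alpha)s(x)]^{2}}, \qquad S_{\vartheta}(x) = \frac{\alpha\, s(x) \ln\left[\frac{1+2x}{(1+x)^{2}}\right]}{[1 - (1-\alpha)s(x)]^{2}}.$$

Because these equations cannot be solved explicitly, we use a nonlinear optimization algorithm such as the Newton-Raphson method.

Bayesian estimation is a widely used alternative to classical estimation. In Bayesian estimation, the parameters are treated as random variables that follow a specified distribution. We express our prior belief about the two parameters through prior distributions.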
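Before the Bayesian details, the maximum likelihood step described above can be illustrated numerically. The sketch below is not the authors' implementation: it replaces Newton-Raphson with a simple derivative-free pattern search on (log α, log ϑ), assumes the standard ITL survival function as the baseline, and uses counts simulated from the model itself (by inverting the MOITL survival function and taking the integer part) rather than the paper's data. All function names, the seed, and the "true" parameter values are hypothetical.

```python
import math
import random


def moitl_sf(x, alpha, theta):
    """MOITL survival function: ITL baseline with the Marshall-Olkin parameter alpha."""
    s = math.exp(theta * (math.log1p(2.0 * x) - 2.0 * math.log1p(x)))
    return alpha * s / (1.0 - (1.0 - alpha) * s)


def log_likelihood(alpha, theta, data):
    """Sum of log P(X = x_i); returns -inf if a probability underflows to zero."""
    total = 0.0
    for x in data:
        p = moitl_sf(x, alpha, theta) - moitl_sf(x + 1, alpha, theta)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def sample_dmoitl(n, alpha, theta, rng):
    """Draw n counts by inverting the MOITL survival function and taking the floor."""
    draws = []
    for _ in range(n):
        u = 1.0 - rng.random()                   # survival probability in (0, 1]
        s = u / (alpha + (1.0 - alpha) * u)      # undo the Marshall-Olkin transform
        t = s ** (1.0 / theta)                   # t = (1 + 2y) / (1 + y)^2
        y = (1.0 + math.sqrt(max(0.0, 1.0 - t))) / t - 1.0
        draws.append(int(math.floor(y)))
    return draws


def fit_mle(data, start=(1.0, 1.0), step=0.5, tol=1e-5, max_iter=5000):
    """Derivative-free pattern search on (log alpha, log theta)."""
    point = [math.log(start[0]), math.log(start[1])]
    best = log_likelihood(math.exp(point[0]), math.exp(point[1]), data)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in (0, 1):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                if abs(trial[i]) > 20.0:         # keep exp() well inside float range
                    continue
                value = log_likelihood(math.exp(trial[0]), math.exp(trial[1]), data)
                if value > best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0                          # refine the search around the current point
    return math.exp(point[0]), math.exp(point[1]), best


rng = random.Random(2021)
data = sample_dmoitl(200, alpha=3.0, theta=2.0, rng=rng)
alpha_hat, theta_hat, loglik_hat = fit_mle(data)
```

Searching on the log scale keeps both estimates positive without explicit constraints. A Newton-Raphson or quasi-Newton routine would typically converge faster, at the cost of coding the derivatives.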
The ability to incorporate prior information makes the Bayesian approach very valuable for reliability assessment, where data scarcity is one of the primary challenges. The parameters α and ϑ of the DMOITL distribution are assigned gamma prior distributions, since both take positive values. Treating α and ϑ as independent, their joint prior density can be expressed as follows:

$$\pi(\alpha,\vartheta) \propto \alpha^{a_1-1} e^{-b_1\alpha}\, \vartheta^{a_2-1} e^{-b_2\vartheta}, \quad \alpha > 0,\ \vartheta > 0, \tag{20}$$

where a_1, b_1, a_2, and b_2 are positive hyperparameters. The joint posterior density of Θ is derived from the likelihood function of the DMOITL distribution and the joint prior density (20):

$$\pi(\Theta \mid x) \propto \pi(\alpha,\vartheta) \prod_{i=1}^{n} P(X = x_i;\Theta). \tag{21}$$

Most Bayesian inference procedures have been developed under symmetric loss functions, of which the squared-error loss function is the most common. The Bayes estimators of Θ, say $(\hat{\alpha}_B, \hat{\vartheta}_B)$, under the squared-error loss function are the posterior means

$$\hat{\alpha}_B = E(\alpha \mid x) = \int_0^{\infty}\!\!\int_0^{\infty} \alpha\, \pi(\Theta \mid x)\, d\alpha\, d\vartheta \tag{22}$$

and

$$\hat{\vartheta}_B = E(\vartheta \mid x) = \int_0^{\infty}\!\!\int_0^{\infty} \vartheta\, \pi(\Theta \mid x)\, d\alpha\, d\vartheta. \tag{23}$$

The integrals in (22) and (23) cannot be derived explicitly. As a consequence, we approximate them using the Markov chain Monte Carlo (MCMC) approach. Many studies have used MCMC techniques, such as Almetwally et al. [23, 24], Basheer et al. [30], Almongy et al. [32, 33], and Bantan et al. [34]. Gibbs sampling and the more general Metropolis-within-Gibbs samplers are important subclasses of MCMC techniques. The Metropolis-Hastings (MH) and Gibbs sampling techniques are the two most often used MCMC methods. Like acceptance-rejection sampling, the MH algorithm generates a candidate value from a proposal distribution at each iteration, and the candidate is then accepted or rejected.
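A minimal Metropolis-within-Gibbs sampler for the posterior described above might look like the sketch below. It rests on several assumptions that are not taken from the paper: the gamma hyperparameters, the Gaussian random-walk proposal and its scales, the burn-in length, and the use of counts simulated from the model in place of real data. The function names are hypothetical, `theta` stands for ϑ, and the ITL baseline survival function is taken to be (1 + 2x)^ϑ / (1 + x)^(2ϑ).

```python
import math
import random


def moitl_sf(x, alpha, theta):
    """MOITL survival function with an ITL baseline."""
    s = math.exp(theta * (math.log1p(2.0 * x) - 2.0 * math.log1p(x)))
    return alpha * s / (1.0 - (1.0 - alpha) * s)


def log_posterior(alpha, theta, data, prior):
    """Log of likelihood times independent gamma priors, up to an additive constant."""
    if alpha <= 0.0 or theta <= 0.0:
        return float("-inf")
    a1, b1, a2, b2 = prior
    value = (a1 - 1.0) * math.log(alpha) - b1 * alpha
    value += (a2 - 1.0) * math.log(theta) - b2 * theta
    for x in data:
        p = moitl_sf(x, alpha, theta) - moitl_sf(x + 1, alpha, theta)
        if p <= 0.0:
            return float("-inf")
        value += math.log(p)
    return value


def sample_dmoitl(n, alpha, theta, rng):
    """Simulate counts by inverting the MOITL survival function and taking the floor."""
    draws = []
    for _ in range(n):
        u = 1.0 - rng.random()
        t = (u / (alpha + (1.0 - alpha) * u)) ** (1.0 / theta)
        draws.append(int(math.floor((1.0 + math.sqrt(max(0.0, 1.0 - t))) / t - 1.0)))
    return draws


def mh_within_gibbs(data, prior, n_iter, start, scales, rng):
    """Update alpha and theta in turn with Gaussian random-walk MH steps."""
    current = list(start)
    current_lp = log_posterior(current[0], current[1], data, prior)
    chain, accepted = [], [0, 0]
    for _ in range(n_iter):
        for i in (0, 1):
            proposal = list(current)
            proposal[i] += rng.gauss(0.0, scales[i])
            proposal_lp = log_posterior(proposal[0], proposal[1], data, prior)
            # Symmetric proposal: accept with probability min(1, posterior ratio).
            if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
                current, current_lp = proposal, proposal_lp
                accepted[i] += 1
        chain.append(tuple(current))
    return chain, [count / n_iter for count in accepted]


rng = random.Random(7)
data = sample_dmoitl(60, 3.0, 2.0, rng)
prior = (1.0, 1.0, 1.0, 1.0)      # hypothetical gamma shape/rate hyperparameters
chain, rates = mh_within_gibbs(data, prior, 2000, (1.0, 1.0), (0.8, 0.3), rng)

kept = chain[500:]                # discard an assumed burn-in of 500 draws
alpha_bayes = sum(a for a, _ in kept) / len(kept)   # posterior mean under squared-error loss
theta_bayes = sum(t for _, t in kept) / len(kept)
```

The posterior means of the retained draws approximate the Bayes estimators under squared-error loss. In practice the proposal scales would be tuned using the acceptance rates returned in `rates`, and convergence would be checked with trace plots.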
To generate random samples from the conditional posterior densities of the DMOITL parameters, we employ MH-within-Gibbs sampling steps based on

$$\pi(\alpha \mid \vartheta, x) \propto \pi(\alpha) \prod_{i=1}^{n} P(X = x_i;\Theta)$$

and

$$\pi(\vartheta \mid \alpha, x) \propto \pi(\vartheta) \prod_{i=1}^{n} P(X = x_i;\Theta),$$

where π(α) and π(ϑ) denote the gamma prior densities of α and ϑ.

In this section, we construct interval estimates for the unknown parameters α and ϑ of the DMOITL distribution by two methods: the asymptotic confidence interval (ACI) based on the MLE and the credible interval based on MCMC. Using the asymptotic normal distribution of the MLE is the most popular way to set confidence bounds for the parameters. The 100(1 − γ)% ACIs are $\hat{\alpha} \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{var}}(\hat{\alpha})}$ and $\hat{\vartheta} \pm z_{\gamma/2}\sqrt{\widehat{\mathrm{var}}(\hat{\vartheta})}$, where the variances are taken from the inverse of the observed Fisher information matrix.

The HPD intervals: Chen and Shao [31] discussed this technique for generating HPD intervals for the unknown parameters of a distribution; see [31] for more information. In this study, samples drawn with the proposed MH algorithm are used to generate the interval estimates. For example, using the MCMC sampling output and the percentile tail points, a 100(1 − γ)% HPD interval with two end points can be generated for each of the two parameters of the DMOITL distribution. According to [31], the Bayesian credible intervals (BCIs) of the parameters α and ϑ of the DMOITL distribution can be obtained through the following steps: 1. Arrange $\hat{\alpha}$ and $\hat{\vartheta}$ as $\hat{\alpha}_{[1]} \le \hat{\alpha}_{[2]} \le \ldots \le \hat{\alpha}_{[A]}$ and $\hat{\vartheta}_{[1]} \le \hat{\vartheta}_{[2]} \le \ldots \le \hat{\vartheta}_{[A]}$, where A denotes the length of the chain generated by the MH algorithm.

In this part of the paper, we carry out a simulation study to assess the performance of the estimation methods for the distribution. Tables 2, 3 and 4 summarise the simulation findings for the point and interval estimation methods presented in this work for the parameters of the DMOITL distribution. The point estimation approaches are compared through the bias, the MSE, and the lower and upper confidence limits. The following conclusions can be drawn from these tables: 1. As n increases, the bias and MSE of the DMOITL parameter estimates decrease. 2. The bias and MSE for the α and ϑ parameters grow as ϑ increases. 3.
As the value of α grows, the bias and MSE values for the α and ϑ parameters decrease. 4. Bayesian estimation is the best approach for estimating the parameters, as it provides the smallest MSE and bias and also has the shortest intervals. 5. The credible intervals from Bayesian estimation are shorter than the ACIs based on the MLE for the parameters of the DMOITL distribution.

In this part of the paper, we use three real data sets to illustrate the applicability of the distribution. The DMOITL distribution is fitted to COVID-19 data from different countries, namely Italy, Puerto Rico, and Singapore. We compare the fit of the DMOITL model with those of the discrete Burr (DB) [Krishna and Pundir [25]], discrete Weibull (DW) [Nakagawa and Osaki [26]], discrete inverse Weibull (DIW) [Jazi et al. [27]], Poisson, negative binomial (NB), discrete alpha power inverse Lomax (DAPIL) [Almetwally and Ibrahim [28]], discrete Lindley (DLi) [Gómez-Déniz and Calderín-Ojeda [29]], and DITL models in Tables 5, 6 and 7. Tables 5, 6 and 7 provide the values of the Cramér-von Mises (CvM), Anderson-Darling (AD), Kolmogorov-Smirnov (KS) and Akaike information criterion (AIC) statistics for all models fitted to the three real data sets. These tables also include the MLEs of the parameters of the models under consideration. Figures 3, 5 and 7 show the fitted DMOITL PMF, CDF, PP-plot, and QQ-plot for the three data sets, respectively. These statistics show that, among all fitted models, the DMOITL distribution has the lowest CvM, AD, KS, and AIC values. For the different data sets, Table 8 presents the MLE and Bayesian estimates of the parameters of the DMOITL distribution. Figures 4, 6 and 8 show MCMC convergence plots for the parameter estimates of the DMOITL distribution for the different data sets.

Firstly: This is a COVID-19 data set from Puerto Rico that spans 38 days, from February 26 to April 4, 2021.
This data set comprises newly reported instances on a daily basis. The data are as follows: 100, 311, 114, 253, 287, 151, 30, 102, 199, 261, 305, 185, 120, 68, 46, 356, 160, 235, 193, 216, 67, 69, 332.

Secondly: This is a 61-day COVID-19 data set from Italy, recorded between 13 June and 12 August 2021. This data set comprises newly reported instances on a daily basis. The data are as follows: 52, 26, 36, 63, 52, 37, 35, 28, 17, 21, 31, 30, 10, 56, 40, 14, 28, 42, 24, 21, 28, 22, 12, 31, 24, 14, 13, 25, 12, 7, 13, 20, 23, 9, 11, 13, 3, 7, 10, 21, 15, 17, 5, 7, 22, 24, 15, 19, 18, 16, 5, 20, 27, 21, 27, 24, 22, 11, 22, 31, 31.

Thirdly: This is a 242-day COVID-19 data set from Singapore, recorded between 20 November 2020 and 19 July 2021. This data set comprises newly reported instances on a daily basis. The data are as follows: 4, 4, 5, 12, 5, 18, 7, 5, 4, 6, 8, 5, 10, 2, 9, 3, 13, 5, 13, 12, 6, 6, 8, 8, 7, 5, 16, 12, 24, 9, 17, 19, 10, 29, 21, 13, 14, 10, 5, 5, 13, 27, 30, 30, 33, 35, 24, 28, 31, 33, 23, 29, 42, 22, 17, 38, 45, 30, 24, 30, 14, 30, 40, 38, 15, 10, 48, 44, 14, 25, 34, 24, 58, 29, 29, 19, 18, 22, 25, 26, 24, 22, 11, 15, 12, 18, 9, 14, 9, 1, 11, 11, 14, 12, 11, 10, 4, 7, 10, 13, 12, 11, 12, 8, 23, 19, 9, 13, 13, 13, 6, 10, 8, 10, 8, 17, 12, 11, 9, 15, 15, 17, 12, 12, 13, 15, 17, 12, 23, 12, 21, 26, 34, 26, 43, 18, 10, 17, 24, 35, 21, 26, 32, 20, 25, 14, 27, 16, 34, 39, 23, 20, 14, 15, 24, 39, 23, 40, 45, 12, 23, 35, 24, 34, 39, 17, 17, 16, 18, 25, 20, 28, 19, 25, 16, 34, 52, 31, 49, 28, 38, 38, 41, 40, 29, 25, 36, 30, 26, 24, 30, 33, 25, 23, 18, 31, 45, 13, 18, 20, 14, 9, 4, 13, 9, 18, 13, 25, 14, 24, 27, 16, 21, 11, 16, 18, 22, 23, 20, 17, 14, 9, 10, 16, 10, 10, 7, 11, 13, 10, 12, 16, 10, 6, 8, 26, 26, 60, 48, 61, 68, 92.

Based on the Akaike information criterion (AIC) statistics for all models fitted to the three real data sets, we found that our proposed distribution is the best model, as it has the lowest AIC and KS values.
By referring to these values, we can conclude that our proposed distribution outperforms all its competitors. We sketched the log-likelihood for each parameter, as shown in Figures 9, 10 and 11, by fixing one parameter and varying the other. The figures show that the three data sets behave very well: the roots for the two parameters are global maxima. Moreover, the derivative of the log-likelihood with respect to each parameter is a decreasing function that intersects the x-axis at a single point, which is the root for that parameter; this ensures that the roots are unique.

In this paper, we introduce the discrete Marshall-Olkin inverted Topp-Leone distribution, called DMOITL. We derived its statistical properties. We carried out point and interval estimation of the unknown parameters α and ϑ of the DMOITL distribution by classical and Bayesian methods. We conducted a simulation analysis in R to compare the performance of the different estimation methods. We found that the Bayesian method is more efficient than the classical method, as its MSE is always smaller and its intervals are always shorter. To demonstrate the applicability of the proposed distribution, we analysed COVID-19 data. We used three data sets from three different countries over different intervals of time. Referring to the results in Tables 5, 6 and 7, which provide the values of the Cramér-von Mises (CvM), Anderson-Darling (AD), Kolmogorov-Smirnov (KS) and Akaike information criterion (AIC) statistics for all models fitted to the three real data sets, we found that our proposed distribution is the best model, as it has the lowest AIC and KS values.

The paper includes the data used to support the study's results.
The authors state that they have no known conflicting financial or personal interests that might seem to have influenced the work presented in this study.

[1] A new statistical approach to model the counts of novel coronavirus cases
[2] Time series modelling to forecast the confirmed and recovered cases of COVID-19. Travel Medicine and Infectious Disease
[3] Statistics-based predictions of coronavirus epidemic spreading in mainland China
[4] Estimation of the final size of the coronavirus epidemic by the SIR model
[5] A new extended Rayleigh distribution with applications of COVID-19 data
[6] Modeling the survival times of the COVID-19 patients with a new statistical model: A case study from China
[7] On the maximum likelihood and least-squares estimation in the inverse Weibull distribution
[8] Bayesian and non-Bayesian estimation for the bivariate inverse Weibull distribution under progressive type-II censoring
[9] On a bivariate Fréchet distribution
[10] Statistical properties and estimation of inverted Topp-Leone distribution
[11] A new inverted Topp-Leone distribution: applications to the COVID-19 mortality rate in two different countries
[12] Kumaraswamy Inverted Topp-Leone Distribution with Applications to COVID-19 Data
[13] The Odd Weibull Inverse Topp-Leone Distribution with Applications to COVID-19 Data
[14] Discrete Rayleigh distribution
[15] On discrete three parameter Burr type XII and discrete Lomax distributions and their applications to model count data from medical science
[16] Discrete generalized exponential distribution of a second type
[17] A New Discrete Analog of the Continuous Lindley Distribution, with Reliability Applications
[18] Discrete Gompertz-G family of distributions for over- and under-dispersed data with properties, estimation, and applications
[19] On the Discrete Weibull Marshall-Olkin Family of Distributions: Properties, Characterizations, and Applications. Axioms
[20] Managing risk of spreading "COVID-19" in Egypt: Modelling using a discrete Marshall-Olkin generalized exponential distribution
[21] A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families
[22] On proportional odds models
[23] Bayesian and maximum likelihood estimation for the Weibull generalized exponential distribution parameters using progressive censoring schemes
[24] Maximum Product Spacing Estimation of Weibull Distribution Under Adaptive Type-II Progressive Censoring Schemes
[25] Discrete Burr and discrete Pareto distributions
[26] The discrete Weibull distribution
[27] A discrete inverse Weibull distribution and estimation of its parameters
[28] Discrete Alpha Power Inverse Lomax Distribution with Application of COVID-19 Data
[29] The discrete Lindley distribution: properties and applications
[30] Marshall-Olkin Alpha Power Inverse Weibull Distribution: Non Bayesian and Bayesian Estimations
[31] Monte Carlo estimation of Bayesian credible and HPD intervals
[32] Applying Transformer Insulation Using Weibull Extended Distribution Based on
[33] Marshall-Olkin Alpha Power Lomax Distribution: Estimation Methods, Applications on Physics and Economics
[34] Bayesian Analysis in Partially Accelerated Life Tests for Weighted Lomax Distribution

Taif University Researchers Supporting Project number (TURSP-2020/318), Taif University, Taif, Saudi Arabia.