key: cord-0075860-02pyrugn
authors: Osatohanmwen, Patrick; Efe-Eyefia, Eferhonore; Oyegue, Francis O.; Osemwenkhae, Joseph E.; Ogbonmwan, Sunday M.; Afere, Benson A.
title: The Exponentiated Gumbel–Weibull {Logistic} Distribution with Application to Nigeria’s COVID-19 Infections Data
date: 2022-03-19
journal: Annals of Data Science
DOI: 10.1007/s40745-022-00373-0
sha: 514206f46925445ef3a1366d7f0b48f8199eed18
doc_id: 75860
cord_uid: 02pyrugn

A new flexible univariate probability distribution was defined in this paper. The new distribution is called the ‘exponentiated Gumbel–Weibull {logistic} distribution’. It arises by using the exponentiated Gumbel distribution to generate a generalized Weibull distribution, with the logit function (the quantile function of the logistic distribution) as a link. The new distribution was observed to be either unimodal or bimodal and to exhibit various shape and tail properties consistent with data arising from several real-life phenomena. A detailed study of its statistical properties was carried out and the maximum likelihood method was used to estimate its parameters. The new distribution was applied in fitting the reported daily number of infections due to the COVID-19 pandemic in Nigeria. Five other data sets were further used to ascertain the flexibility of the new distribution in fitting data sets with different statistical properties.

The science of data involves the use of methodologies from disparate fields to extract information from data, usually for policy purposes. These methodologies include statistical methodologies, scientific methodologies, artificial intelligence and data analysis methodologies [1–3]. They come in handy in aggregating, cleaning and preparing data for analysis, in manipulating data, and in finding specific patterns or trajectories that data follow.
Within the vanguard of statistical modeling of data, the practice is usually to find a stochastic model which best describes the behavior of a given data set. These stochastic models are usually completely specified as probability distribution functions, from which other desirable properties of the data are obtained, either for policy making or for further investigation. The need for distribution functions that can adequately describe the stochastic behavior of data sets arising from real-life situations is one of the major drivers of the development of new and more flexible families of probability distributions. In applications, the classical probability distributions have been found in many studies to be unable to adequately fit data sets with varying shape and tail properties; hence the increasing volume of research devoted to generalizing them and, in the process, increasing their flexibility. Several methods have been put forward in the literature for the generalization of a probability distribution [4–12], each with its attendant benefits and shortcomings.

The COVID-19 pandemic has ravaged the entire world, and accompanying it are economic, social and behavioral challenges and responses. Several studies using mathematical models, statistical models, behavioral models and artificial intelligence frameworks have already been put forward to explain the evolution, transmission and impact of the pandemic in several countries, using data on the daily, weekly or monthly number of infections from the disease [13–21]. However, data of this nature tend to possess one or more characteristics which the classical probability distributions used in statistical modeling may not be able to capture.
For example, data of this sort tend to be highly skewed to the right or to the left, possibly with some outlying observations. A classical distribution like the normal distribution therefore cannot be used to fit such data, and it becomes imperative to use a very flexible distribution, such as a member of the generalized families of distributions. In this paper a new probability distribution which generalizes the classical Weibull distribution is developed and used to fit the daily number of infections from the COVID-19 pandemic in Nigeria. The new distribution is further used to fit five other data sets in order to demonstrate its flexibility.

The rest of the paper is organized as follows. In Sect. 2, the new distribution is presented. Some of the statistical properties of the distribution are discussed in Sect. 3. The use of the maximum likelihood method for estimating the parameters of the distribution is described in Sect. 4, while the application of the distribution to real data sets is carried out in Sect. 5. The paper closes in Sect. 6 with a summary and conclusion.

Suppose T is a random variable following the exponentiated Gumbel distribution defined in [22], with parameters β, c and k, cumulative distribution function (cdf) $F_T(t)$, probability density function (pdf) $f_T(t)$ and quantile function $Q_T(p)$. Suppose also that R is a Weibull random variable with cdf, pdf and quantile function given respectively by

$$F_R(x) = 1 - e^{-(x/\lambda)^{\alpha}}, \qquad f_R(x) = \frac{\alpha}{\lambda}\left(\frac{x}{\lambda}\right)^{\alpha-1} e^{-(x/\lambda)^{\alpha}}, \qquad Q_R(p) = \lambda\{-\log(1-p)\}^{1/\alpha}, \qquad x > 0.$$

Let Y be a standard logistic random variable with cdf, pdf and quantile function given respectively by

$$F_Y(y) = \frac{1}{1+e^{-y}}, \qquad f_Y(y) = \frac{e^{-y}}{(1+e^{-y})^{2}}, \qquad Q_Y(p) = \log\frac{p}{1-p}.$$

The cdf

$$F_X(x) = F_T\big(Q_Y(F_R(x))\big) \qquad (1)$$

is a valid cdf, and from (1) the cdf of the 5-parameter exponentiated Gumbel-Weibull {logistic} (EGuWL) distribution can be written in terms of the exponentiated Gumbel cdf as

$$F_X(x) = F_T\big(\log(e^{(x/\lambda)^{\alpha}} - 1)\big), \qquad x > 0. \qquad (2)$$

The pdf corresponding to (2) is

$$f_X(x) = \frac{\alpha}{\lambda}\left(\frac{x}{\lambda}\right)^{\alpha-1} \frac{e^{(x/\lambda)^{\alpha}}}{e^{(x/\lambda)^{\alpha}} - 1}\, f_T\big(\log(e^{(x/\lambda)^{\alpha}} - 1)\big), \qquad (3)$$

where the parameters α, β, c and k control the shape of the distribution and λ is a scale parameter. The graphs of the pdf in (3) for selected parameter values are shown in Figs. 1, 2 and 3.

In Fig. 1, for fixed values of the parameters λ and k, the EGuWL density is highly skewed to the right when the parameters α, β and c are varied. In fact, for decreasing values of α (increasing values of β) the density falls exponentially. This behavior shows that the EGuWL distribution can be very effective in fitting highly right-skewed data sets with possible outliers, or reverse-J shaped data sets. In Fig. 2, for fixed values of λ and β and varied values of α, c and k, the EGuWL density can be bimodal or almost symmetric. For negative values of k and increasing values of α (decreasing values of c), the EGuWL density is bimodal; for non-negative values of k and increasing values of α (decreasing values of c), the density is almost symmetric. This highlights that the EGuWL distribution can be used for fitting bimodal and near-symmetric data sets. In Fig. 3, the EGuWL density is also observed to be left-skewed. In fact, for fixed values of λ and α the density is skewed to the left when β decreases and when k and c increase. This shows that the EGuWL distribution can also be used to fit left-skewed data sets. Observe that in Figs. 1, 2 and 3 the value of the parameter λ is always fixed; this is because λ is a scale parameter and its value does not affect the shape of the density.

Proposition 1: Suppose X is an EGuWL random variable, U is a uniform random variable defined on (0, 1) and T is an exponentiated Gumbel random variable. Then

(i) $X = \lambda\{\log(e^{T} + 1)\}^{1/\alpha}$;
(ii) $X = Q_X(U)$, where $Q_X$ is the quantile function of the EGuWL distribution in (4).

The proofs of (i) and (ii) follow from (1) and (4) respectively. Proposition 1 is very useful for simulating random samples from the EGuWL distribution: one first simulates from the exponentiated Gumbel distribution or the uniform distribution and then applies the corresponding transformation. The relation in (i) can also be used to determine the moments of the EGuWL distribution.
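The simulation recipe suggested by Proposition 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the exponentiated Gumbel quantile uses the form $F_T(t) = 1 - [1 - \exp(-\exp(-(t-k)/c))]^{\beta}$, which is assumed here and may differ from the parameterization in [22]. Only the map $X = \lambda\{\log(e^{T}+1)\}^{1/\alpha}$ is taken from Proposition 1(i).

```python
import numpy as np

def egu_quantile(p, beta, c, k):
    """Quantile of an exponentiated Gumbel law.

    ASSUMPTION: F_T(t) = 1 - [1 - exp(-exp(-(t - k)/c))]**beta.
    This form is not taken from the paper and is for illustration only.
    """
    s = (1.0 - p) ** (1.0 / beta)          # s = 1 - G(t), G the Gumbel cdf
    return k - c * np.log(-np.log1p(-s))   # solve G(t) = 1 - s for t

def t_to_x(t, alpha, lam):
    """Proposition 1(i): X = lam * (log(e^T + 1))**(1/alpha)."""
    # logaddexp(0, t) evaluates log(1 + e^t) without overflow for large t
    return lam * np.logaddexp(0.0, t) ** (1.0 / alpha)

def x_to_t(x, alpha, lam):
    """Inverse of t_to_x: T = log(exp((x/lam)**alpha) - 1)."""
    u = (x / lam) ** alpha
    return u + np.log(-np.expm1(-u))       # log(e^u - 1) = u + log(1 - e^-u)

def sample_eguwl(n, alpha, beta, c, k, lam, seed=None):
    """Draw n EGuWL variates: U -> T (assumed quantile) -> X (Prop. 1(i))."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    return t_to_x(egu_quantile(u, beta, c, k), alpha, lam)
```

Because the transformation step is separate from the sampler for T, a different exponentiated Gumbel parameterization can be substituted by replacing `egu_quantile` alone.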
Here we present some essential statistical properties of the EGuWL distribution, beginning with the hazard function. The hazard function of the EGuWL distribution is expressed as

$$h_X(x) = \frac{f_X(x)}{1 - F_X(x)}, \qquad x > 0, \qquad (5)$$

where $f_X$ and $F_X$ are the pdf in (3) and the cdf in (2).

The mode(s) of the EGuWL distribution is either at x = 0 or satisfies equation (6).

Proof: As observed from the graphs of the EGuWL density, the distribution can be unimodal or bimodal. Differentiating the EGuWL density with respect to x gives the derivative f′(x). The derivative f′(x) does not exist when x = 0. Other critical point(s) satisfy f′(x) = 0; hence the mode(s) of the EGuWL distribution will either be at x = 0 or satisfy equation (6), which is obtained from a factor of f′(x) that has the same sign as f′(x). An analytical solution of (6) for x is not possible. However, (6) can be solved numerically to obtain the desired mode(s).

An expression for computing the r-th non-central moments of the EGuWL distribution can easily be obtained by making use of the relationship between the EGuWL random variable X and the exponentiated Gumbel random variable T specified in Proposition 1(i). In particular, the relation $X = \lambda\{\log(e^{T} + 1)\}^{1/\alpha}$ implies that

$$X^{r} = \lambda^{r}\{\log(e^{T} + 1)\}^{r/\alpha}.$$

Since X is a transformation of the exponentiated Gumbel random variable T by Proposition 1(i), its moments can be obtained as expectations with respect to T. The density function of the exponentiated Gumbel distribution is therefore used in obtaining the moments instead of the more complex density function of the EGuWL distribution, and this is a major result in this paper. It follows that

$$\mu'_r = E(X^{r}) = \lambda^{r} \int_{-\infty}^{\infty} \{\log(e^{t} + 1)\}^{r/\alpha} f_T(t)\, dt. \qquad (7)$$

The r-th non-central moments of the EGuWL distribution are computed from the relation in (7).
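As a numerical illustration of computing moments through T rather than through the EGuWL density, the sketch below evaluates $E(X^{r})$ in its equivalent quantile form, $\lambda^{r}\int_0^1 \{\log(e^{Q_T(u)}+1)\}^{r/\alpha}\,du$. The function names are hypothetical, and the exponentiated Gumbel quantile again assumes $F_T(t) = 1 - [1 - \exp(-\exp(-(t-k)/c))]^{\beta}$, which may not match the parameterization in [22]; any other quantile function for T can be passed in instead.

```python
import numpy as np
from scipy import integrate

def egu_quantile(p, beta, c, k):
    """ASSUMED exponentiated Gumbel quantile (illustrative, not from the paper)."""
    s = (1.0 - p) ** (1.0 / beta)
    return k - c * np.log(-np.log1p(-s))

def eguwl_raw_moment(r, alpha, lam, q_t):
    """r-th raw moment of X = lam * (log(e^T + 1))**(1/alpha).

    q_t is the quantile function of T, so T = q_t(U) with U uniform on (0, 1)
    and E[X^r] = lam^r * integral over (0, 1) of log(1 + e^{q_t(u)})**(r/alpha).
    """
    def integrand(u):
        return np.logaddexp(0.0, q_t(u)) ** (r / alpha)
    value, _abs_err = integrate.quad(integrand, 0.0, 1.0, limit=200)
    return lam ** r * value

def summary_from_moments(m1, m2, m3, m4):
    """Mean, variance, skewness and kurtosis from the first four raw moments."""
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    return m1, var, skew, kurt

# Hypothetical parameter values, chosen only to exercise the functions.
q_t = lambda u: egu_quantile(u, beta=1.5, c=1.0, k=0.5)
moments = [eguwl_raw_moment(r, alpha=2.0, lam=3.0, q_t=q_t) for r in (1, 2, 3, 4)]
mean, var, skew, kurt = summary_from_moments(*moments)
```

The scale parameter enters only through the factor $\lambda^{r}$, so doubling λ doubles the mean and leaves skewness and kurtosis unchanged, consistent with λ being a scale parameter.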
The mean (μ), variance (σ²), skewness (S) and kurtosis (K) of the EGuWL distribution are given respectively as

$$\mu = \mu'_1, \qquad \sigma^{2} = \mu'_2 - \mu^{2}, \qquad S = \frac{\mu'_3 - 3\mu\mu'_2 + 2\mu^{3}}{\sigma^{3}}, \qquad K = \frac{\mu'_4 - 4\mu\mu'_3 + 6\mu^{2}\mu'_2 - 3\mu^{4}}{\sigma^{4}}.$$

The quantile function can also be used in computing the skewness and kurtosis of a distribution, especially when the quantile function exists in a simple analytic form. Galton [23] proposed a quantile-based measure of skewness, while Moors [24] did the same for kurtosis. Galton's skewness and Moors' kurtosis are evaluated using the relations

$$S_G = \frac{Q(6/8) - 2Q(4/8) + Q(2/8)}{Q(6/8) - Q(2/8)}, \qquad K_M = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)}.$$

Since the quantile function of the EGuWL distribution exists in a simple analytic form as expressed in (4), the above expressions can be used in computing the skewness and kurtosis of the EGuWL distribution. 3-D plots of Galton's skewness and Moors' kurtosis of the EGuWL distribution for some selected parameter values are presented in Fig. 7.

Shannon [25] offered a probabilistic definition of entropy. The Shannon entropy $\eta_X$ of a random variable X following a known probability distribution is a measure of variation of uncertainty. The Shannon entropy of a random variable X following the EGuWL distribution can be expressed as in (8), where $\eta_T$ and $\mu_T$ are respectively the Shannon entropy and mean of the exponentiated Gumbel distribution. The integrals in (9)-(11) exist because $\log\log(e^{t}+1) \le \log(e^{t}+1) \le \log 2 + t$ when $t > 0$, $\log\log(e^{t}+1) \le \log(e^{t}+1) \le \log 2$ when $t < 0$, $\log\log(e^{-t}+1) \le \log(e^{-t}+1) \le \log 2 - t$ when $t < 0$, and $\log\log(e^{-t}+1) \le \log(e^{-t}+1) \le \log 2$ when $t > 0$. The resulting expressions involve G(·), the cdf of the Gumbel distribution, and γ ≈ 0.57722, Euler's constant. An expression for $\mu_T$ in terms of the complete gamma function Γ(·) was given in [22].

Here the maximum likelihood method is presented for the estimation of the parameters of the EGuWL distribution. For a complete independent random sample $x_1, x_2, \ldots, x_n$ of size n, the log-likelihood function of the EGuWL distribution is

$$L(\Theta) = \sum_{i=1}^{n} \log f_X(x_i; \Theta),$$

where $f_X$ is the pdf in (3) and $\Theta = (\alpha, \beta, c, k, \lambda)^{T}$ is the unknown parameter vector. The associated score function is given by

$$U(\Theta) = \left(\frac{\partial L}{\partial \alpha}, \frac{\partial L}{\partial \beta}, \frac{\partial L}{\partial c}, \frac{\partial L}{\partial k}, \frac{\partial L}{\partial \lambda}\right)^{T},$$

where $\partial L/\partial\alpha$, $\partial L/\partial\beta$, $\partial L/\partial c$, $\partial L/\partial k$ and $\partial L/\partial\lambda$ are the partial derivatives of the log-likelihood function with respect to each parameter. The maximum likelihood estimate $\hat\Theta$ of Θ is obtained by solving the non-linear system of equations $U(\Theta) = 0$. Since the resulting system of equations is not in closed form, the solutions can be found numerically using any of the Newton-type algorithms.

The Fisher information matrix (FIM) of the EGuWL distribution is the 5 × 5 symmetric matrix given by

$$I(\Theta) = -E\left[\frac{\partial^{2} L}{\partial \Theta\, \partial \Theta^{T}}\right].$$

Thus, the elements of the FIM can be obtained from the second-order partial derivatives of the log-likelihood function with respect to the parameters. These elements can be obtained numerically using the R software. The total FIM, $I(\Theta)$, can be approximated by the observed information matrix

$$J(\hat\Theta) = -\left.\frac{\partial^{2} L}{\partial \Theta\, \partial \Theta^{T}}\right|_{\Theta = \hat\Theta}.$$

For real data, $J(\hat\Theta)$ is obtained after the maximum likelihood estimate of Θ has been found, which implies the convergence of the iterative numerical procedure involved in finding the estimate. Suppose $\hat\Theta$ is the maximum likelihood estimate of Θ. Under the usual regularity conditions, and provided the parameters are in the interior of the parameter space and not on the boundary, $\hat\Theta - \Theta$ is asymptotically distributed as $N_5(0, I^{-1}(\Theta))$, where $I^{-1}(\Theta)$ is the inverse of the expected FIM, which also corresponds to the variance-covariance matrix of the parameters. The asymptotic behavior is still valid if $I^{-1}(\Theta)$ is replaced by the inverse of the observed information matrix evaluated at $\hat\Theta$, that is, $J^{-1}(\hat\Theta)$. The multivariate normal distribution with mean vector $0 = (0, 0, 0, 0, 0)^{T}$ and covariance matrix $I^{-1}(\Theta)$ can be used to construct confidence intervals for the EGuWL parameters.
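The estimation steps described above (numerical maximization of the log-likelihood, observed information from second derivatives, normal-theory intervals) can be sketched in Python. To avoid assuming a form for the EGuWL log-likelihood, the sketch uses the two-parameter Weibull model as a stand-in, with simulated data and a finite-difference Hessian; all function names are hypothetical, and the paper itself reports using R.

```python
import numpy as np
from scipy import optimize, stats

def weibull_negloglik(theta, x):
    """Negative log-likelihood of a Weibull(shape=a, scale=lam) sample."""
    a, lam = theta
    if a <= 0 or lam <= 0:
        return np.inf                      # keep the search inside the parameter space
    z = x / lam
    return -np.sum(np.log(a / lam) + (a - 1.0) * np.log(z) - z ** a)

def numerical_hessian(f, theta, rel_step=1e-4):
    """Central-difference Hessian of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    h = rel_step * np.maximum(np.abs(theta), 1.0)
    hess = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p)
            ej = np.zeros(p)
            ei[i] = h[i]
            ej[j] = h[j]
            hess[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                          - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h[i] * h[j])
    return hess

def wald_intervals(theta_hat, observed_info, omega=0.05):
    """theta_hat +/- z_{omega/2} * sqrt(diag(J^{-1})), one row per parameter."""
    cov = np.linalg.inv(observed_info)
    se = np.sqrt(np.diag(cov))
    z = stats.norm.ppf(1.0 - omega / 2.0)
    return np.column_stack([theta_hat - z * se, theta_hat + z * se])

# Simulated stand-in data: Weibull with shape 1.5 and scale 2.0.
rng = np.random.default_rng(1)
x = 2.0 * rng.weibull(1.5, size=500)

fit = optimize.minimize(weibull_negloglik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
J = numerical_hessian(lambda th: weibull_negloglik(th, x), fit.x)  # observed information
ci = wald_intervals(fit.x, J)
```

The Hessian is taken of the negative log-likelihood, so it is the observed information directly; its inverse plays the role of $J^{-1}(\hat\Theta)$ in the text.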
The approximate 100(1 − ω)% two-sided confidence intervals for the parameters α, β, c, k and λ are given by

$$\hat\theta \pm z_{\omega/2}\sqrt{\widehat{\mathrm{var}}(\hat\theta)}, \qquad \theta \in \{\alpha, \beta, c, k, \lambda\},$$

where $\widehat{\mathrm{var}}(\hat\theta)$ is the diagonal element of the inverse observed information matrix corresponding to θ, evaluated at the maximum likelihood estimate, and $z_{\omega/2}$ is the upper ω/2 quantile of the standard normal distribution.

Here we conduct a Monte Carlo simulation study to assess the performance and efficiency of the maximum likelihood estimators of the parameters of the EGuWL distribution. The performance of the maximum likelihood estimators is examined for different sample sizes and different combinations of parameter values. The simulation is repeated N = 5000 times using the sample sizes n = 25, 80, 150, 400, 800 and 1500 for the parameter combinations reported in Tables 1 and 2. The quantities computed are ME, AVB and RMSE, together with: (d) the coverage probability (CP) of the 95% confidence intervals of the parameters (α, β, c, k, λ), i.e., the percentage of intervals that contain the true value of the parameter; (e) the average width (AW) of the 95% confidence intervals of the parameters (α, β, c, k, λ).

Tables 1 and 2 contain the results for the quantities ME, AVB, RMSE, AW and CP. In Tables 1 and 2, it can be observed that the ME of each parameter moves toward its true value as the sample size increases. The AVB of all the parameters are positive and reduce as the sample size increases. The RMSE and the AW of all the parameters also reduce as the sample size increases. The simulation was also conducted for other combinations of parameter values, namely (α = 4, β = 4.5, c = 2, k = 0, λ = 0.5), (α = 2.5, β = 3, c = 5, k = −5, λ = 1) and (α = 4, β = 2.5, c = 1.5, k = 5, λ = 2.5), and the results followed a pattern similar to that in Tables 1 and 2. To conserve space, they are not reported.
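The summary quantities of the simulation study can be computed from the replicated estimates as in the following Python sketch. The definitions of CP and AW follow the text; the definitions used for ME (mean of the estimates), AVB (average absolute bias) and RMSE (root mean squared error) are assumptions, since their formal definitions are not reproduced above. The function name is hypothetical.

```python
import numpy as np

def mc_summary(estimates, std_errors, true_value, z=1.959964):
    """Summarize N replicated estimates of a single parameter.

    estimates, std_errors : arrays of length N (one entry per replication)
    true_value            : parameter value used to generate the samples
    z                     : normal quantile for the interval (default: 95%)
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower, upper = est - z * se, est + z * se
    return {
        "ME": est.mean(),                                   # ASSUMED: mean estimate
        "AVB": np.abs(est - true_value).mean(),             # ASSUMED: average absolute bias
        "RMSE": np.sqrt(np.mean((est - true_value) ** 2)),  # root mean squared error
        "CP": np.mean((lower <= true_value) & (true_value <= upper)),
        "AW": np.mean(upper - lower),
    }

# Toy input with four replications, only to show the calling convention.
summary = mc_summary([1.9, 2.1, 2.0, 2.4], [0.1, 0.1, 0.1, 0.1], true_value=2.0)
```

In a full study this function would be called once per parameter and per sample size, with the estimates and standard errors collected from the N fitted replications.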
675, 452, 649, 594, 684, 779, 490, 566, 561, 790, 626, 454, 603, 544, 575, 503, 460, 499, 575, 664, 571, 595, 463, 643, 595, 600, 653, 556, 562, 576, 543, 604, 591, 438, 555, 648, 624, 404, 481, 462, 386, 304, 288, 304, 457, 354, 443, 453, 437, 290, 423, 453, 373, 329, 325, 298, 417, 410, 593, 476, 340, 601, 322, 321, 252, 221, 296, 160, 250, 138, 143, 239, 216, 125, 162, 100, 155, 296, 176, 197, 188, 160, 79, 132, 90, 126, 131, 221, 189, 97, 195, 176, 111, 125, 213, 136, 126, 136, 187, 201, 153, 126, 160, 58, 120, 118, 155, 103, 151, 111, 163, 164, 225, 179, 148, 212, 113, 133, 118

The EGuWL distribution will be applied to fit the daily number of reported infections from the COVID-19 pandemic in Nigeria. Five other data sets will also be used to demonstrate its flexibility. The fit of the EGuWL distribution will be compared with those of other models in its class.

For the first application, the EGuWL distribution is used to fit the daily number of reported infections from the COVID-19 pandemic in Nigeria for a seven-month period (20th March-19th October, 2020). The data set was obtained from the website of the Nigeria Centre for Disease Control (NCDC) at http://covid19.ncdc.gov.ng/. The data set is unimodal, right-skewed and platykurtic (skewness = 0.4671, excess kurtosis = −0.8916). The data set is contained in Table 3. The Weibull (W), the exponentiated Gumbel (EGu) [22], the beta exponential (BE) [26], the beta generalized exponential (BGE) [27] and the Gumbel-Weibull {logistic} (GuWL) [28] distributions are also used to fit the data and their fits are compared with that of the EGuWL distribution. The BE, BGE and GuWL densities are as given in [26], [27] and [28] respectively. The results from fitting the COVID-19 data, which include the parameter estimates, their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov-Smirnov (K-S) statistic values (with the corresponding p values), are reported for all the fitted distributions in Table 4. Figure 8 shows the graph of all the fitted densities alongside the histogram of the data.
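The comparison criteria used in the applications can be illustrated with a short Python sketch. It fits only the two-parameter Weibull model, one of the comparison distributions, to simulated data; the data, the function names and the use of SciPy are assumptions made for illustration and do not reproduce the values in Table 4.

```python
import numpy as np
from scipy import stats

def aic(loglik, n_params):
    """Akaike Information Criterion: 2p - 2 log L (smaller indicates a better fit)."""
    return 2.0 * n_params - 2.0 * loglik

def summarize_weibull_fit(x):
    """Fit a two-parameter Weibull by maximum likelihood and report fit criteria."""
    x = np.asarray(x, dtype=float)
    # floc=0 fixes the location at zero, leaving shape and scale to be estimated
    shape, loc, scale = stats.weibull_min.fit(x, floc=0.0)
    loglik = float(np.sum(stats.weibull_min.logpdf(x, shape, loc, scale)))
    ks = stats.kstest(x, stats.weibull_min.cdf, args=(shape, loc, scale))
    return {
        "shape": shape,
        "scale": scale,
        "loglik": loglik,
        "AIC": aic(loglik, n_params=2),
        "KS": ks.statistic,
        "p_value": ks.pvalue,
    }

# Simulated positive data standing in for a real data set.
rng = np.random.default_rng(0)
x = 300.0 * rng.weibull(2.0, size=200)
result = summarize_weibull_fit(x)
```

Because the parameters are estimated from the same data that the K-S test is applied to, the resulting p value is only approximate and is best read as a relative measure when comparing models.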
The results in Table 4 clearly show that the EGuWL distribution provided the best fit for the data, possessing the smallest AIC value as well as the highest p value of the K-S statistic.

For the second application, the EGuWL distribution is used to fit the fatigue time of 101 6061-T6 aluminum coupons cut parallel to the direction of rolling and oscillated at 18 cycles per second (cps). The data set was reported in [29] and is presented in Table 5. The data set is unimodal, right-skewed and leptokurtic (skewness = 0.3355, excess kurtosis = 1.1687). The beta normal (BN) [6], the beta Weibull (BW) [30], the beta Burr XII (BBXII) [31], the Gumbel-Burr XII {logistic} (GuBXIIL) [32] and the GuWL distributions are also used to fit the data set and their fits are compared with that of the EGuWL distribution. The BN, BW, BBXII and GuBXIIL densities are as given in [6], [30], [31] and [32] respectively, with φ(·) and Φ(·) denoting the pdf and cdf of the normal distribution in the BN density. The results from fitting the aluminum coupons data, which include the parameter estimates, their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov-Smirnov (K-S) statistic values (with the corresponding p values), are reported for all the fitted distributions in Table 6. Figure 9 shows the graph of all the fitted densities alongside the histogram of the data. The results in Table 6 clearly show that the EGuWL distribution provided the best fit for the data, possessing the smallest AIC value as well as the highest p value of the K-S statistic.

(iii) Application to the Kevlar 49/epoxy strands failure times data (pressure at 70%)

For the third application, the EGuWL distribution is used to fit the Kevlar 49/epoxy strands failure times data (pressure at 70%). The data set was reported in [28]. The data set is multimodal, platykurtic and approximately symmetric (skewness = 0.0998, excess kurtosis = −0.79).
The data set is presented in Table 7. The BN, BW, GuWL, beta exponentiated Weibull (BEW) [33] and Gumbel-Weibull {logistic} Poisson (GuWLP) [12] distributions are also used to fit the data set and their fits are compared with that of the EGuWL distribution. The BEW and GuWLP densities are as given in [33] and [12] respectively. The results from fitting the Kevlar 49/epoxy strands failure times data (pressure at 70%), which include the parameter estimates, their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov-Smirnov (K-S) statistic values (with the corresponding p values), are reported for all the fitted distributions in Table 8. Figure 10 shows the graph of all the fitted densities alongside the histogram of the data. The results in Table 8 clearly show that the EGuWL distribution provided the best fit for the data, possessing the smallest AIC value as well as the highest p value of the K-S statistic.

(iv) Application to the Kevlar 49/epoxy strands failure times data (pressure at 90%)

For the fourth application, the EGuWL distribution is used to fit the Kevlar 49/epoxy strands failure times data (pressure at 90%). The data set was reported in [28]. The data set is unimodal, leptokurtic and highly skewed to the right, with a reverse-J shape (skewness = 3.0472, excess kurtosis = 14.4745). The data set is presented in Table 9. The BN, BW, GuWL, exponentiated Weibull (EW) [5] and GuWLP distributions are also used to fit the data set and their fits are compared with that of the EGuWL distribution.
The EW density is as given in [5]. The results from fitting the Kevlar 49/epoxy strands failure times data (pressure at 90%), which include the parameter estimates, their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov-Smirnov (K-S) statistic values (with the corresponding p values), are reported for all the fitted distributions in Table 10. Figure 11 shows the graph of all the fitted densities alongside the histogram of the data. The results in Table 10 clearly show that the EGuWL distribution provided the best fit for the data, possessing the smallest AIC value as well as the highest p value of the K-S statistic.

(v) Application to the Australian Athletes' Height Data

For the fifth application, the EGuWL distribution is used to fit the heights (in centimeters) of 100 female Australian athletes. The data set was collected by the Australian Institute of Sport and reported in [28]. The data set is unimodal, leptokurtic and left-skewed (skewness = −0.5684, excess kurtosis = 1.3212). The data set is presented in Table 11. The BN, GuWL, EW, Weibull-Pareto {exponential} (WPE) [34] and beta skew normal (BSN) [35] distributions are also used to fit the data set and their fits are compared with that of the EGuWL distribution. The WPE and BSN densities are as given in [34] and [35] respectively. The results from fitting the heights data, which include the parameter estimates, their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov-Smirnov (K-S) statistic values (with the corresponding p values), are reported for all the fitted distributions in Table 12. Figure 12 shows the graph of all the fitted densities alongside the histogram of the data.
The results in Table 12 clearly show that the EGuWL distribution provided the best fit for the data, possessing the smallest AIC value as well as the highest p value of the K-S statistic.

For the last application, the EGuWL distribution is used to fit the sum of skin folds of 100 female Australian athletes. The data set was collected by the Australian Institute of Sport and reported in [28]. The data set is unimodal, leptokurtic and right-skewed (skewness = 0.7878, excess kurtosis = 0.7320). The data set is presented in Table 13. The BN, GuWL, WPE, EW and BW distributions are also used to fit the data set and their fits are compared with that of the EGuWL distribution. The results from fitting the sum of skin folds data, which include the parameter estimates, their standard errors, the log-likelihood (loglik) values, the Akaike Information Criterion (AIC) values and the Kolmogorov-Smirnov (K-S) statistic values (with the corresponding p values), are reported for all the fitted distributions in Table 14. Figure 13 shows the graph of all the fitted densities alongside the histogram of the data. The results in Table 14 clearly show that the EGuWL distribution provided the best fit for the data, possessing the smallest AIC value as well as the highest p value of the K-S statistic.

A new flexible probability distribution called the exponentiated Gumbel-Weibull {logistic} distribution has been defined and studied in this paper. The new distribution has been applied in modeling the daily number of infections from the novel COVID-19 pandemic in Nigeria. Five other data sets which exhibit various shape and tail behaviors have been further used to buttress the flexibility of the new distribution. The performance of the distribution in fitting the various data sets has been compared with those of other probability distributions in its class, and the results showed that the new distribution gave the best fits.
We hope the new distribution will attract further usage in fitting data sets from other fields.

Funding No funding was received for conducting this study.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethics approval Ethical standards as recommended by the journal and in line with global best practices have been followed in the course of writing the article as well as in the reporting of the results contained therein.

Data Availability All data used in the article and in the generation of results are contained in the body of the article and, where necessary, URL addresses have been provided to access them. The codes used in the article can be obtained upon request from the corresponding author.

1. Introduction to business data mining
2. Optimization based data mining: theory and applications
3. Internet of things, real-time decision making, and artificial intelligence
4. A class of distributions which includes the normal ones
5. Exponentiated Weibull family for analyzing bathtub failure-rate data
6. Beta-normal distribution and its applications
7. The alchemy of probability distributions: beyond Gram-Charlier expansions and a skew-kurtotic-normal distribution from a rank transmutation map
8. A new family of generalized distributions
9. The exponentiated generalized class of distributions
10. T-normal family of distributions: a new approach to generalize the normal distribution
11. A new generalized family of distributions on the unit interval: the T-Kumaraswamy family of distributions
12. The T-R{Y} power series family of probability distributions
13. Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data
14. Real-time forecasts of the COVID-19 epidemic in China from
15. An updated estimation of the risk of transmission of the novel corona virus (2019-nCov)
16. Estimation of the transmission risk of the 2019-nCov and its implication for public health intervention
17. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCov outbreak originating in Wuhan, China: a modelling study
18. Modeling the daily number of reported cases of infection from the COVID-19 Pandemic in Nigeria: a stochastic approach
19. Using social media to predict the stock market crash and rebound amid the pandemic: the digital 'Haves' and 'Have-mores'
20. Culture vs policy: more global collaboration to effectively combat COVID-19
21. What are the underlying transmission patterns of COVID-19 outbreak? An age-specific social contact characterization
22. The exponentiated Gumbel distribution with climate application
23. Enquiries into human faculty and its development. Macmillan and Company, London
24. Moors JJ (1988) A quantile alternative for kurtosis
25. A mathematical theory of communication
26. The beta exponential distribution
27. The beta generalized exponential distribution
28. Gumbel-Weibull distribution: properties and application
29. A new family of life distributions
30. The beta-Weibull distribution
31. The beta Burr XII distribution with application to lifetime data
32. A new member from the T-X family of distributions: the Gumbel-Burr XII distribution and its properties
33. The beta exponentiated Weibull distribution
34. Weibull-Pareto distribution and its applications
35. A generalization of the beta skew-normal distribution: the beta skewnormal

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.