key: cord-0075731-j886a6wd
authors: Rahman, Tousifur; Hazarika, Partha Jyoti; Ali, M. Masoom; Barman, Manash Pratim
title: Three-Inflated Poisson Distribution and its Application in Suicide Cases of India During Covid-19 Pandemic
date: 2022-03-15
journal: Ann
DOI: 10.1007/s40745-022-00372-1
sha: ae6eb723df2349022110b2e1ab62cc1eb85ddfb2
doc_id: 75731
cord_uid: j886a6wd

Inflated models are generally used whenever there is an excess number of frequencies at a particular count. In this study, a three-inflated Poisson (ThIP) distribution is proposed by mixing the Poisson distribution with a point mass at three. Some of its distributional properties and reliability characteristics are studied. A simulation study is carried out to assess the performance of the MLEs. In India, the implications of Covid-19 for mental health have been dire. Covid-19 related suicide data of India, from the lockdown to the first gradual relaxation of the total lockdown (unlocking 1.0), are used to examine the appropriateness of the proposed distribution. A likelihood ratio test is used to discriminate between the Poisson and the proposed distribution.

Statistical modelling is an essential part of data science in many areas of scientific research and decision-making, and selecting an appropriate distribution is one of its significant tasks. Distributional properties are quite important when dealing with large data in data science (see [1] [2] [3] for example). In the medical and social sciences, modelling count variables is an everyday exercise [4], and the Poisson distribution plays a central role in count data analysis.
Count data such as the number of suicide attempts, heart attacks, unprotected sexual encounters, days of alcohol drinking, days of missing primary activities, cigarettes smoked, hospitalizations, or unhealthy days during a period are common in medical and psychological research [5] [6] [7]. The Poisson distribution is widely used to model such count data [6]. In practice, however, data clustering and other factors create heterogeneity in study populations, and this extra variability makes the variance greater than the mean [6]. In such cases the Poisson distribution is inappropriate for modelling the data. Medical and public health research often uses zero-inflated models when there is a large proportion of zeros [5]; more generally, inflated models are used whenever a particular count occurs with excess frequency. Too many observed frequencies at a particular count may occur in the following situations:
(I) Not all participants in the study area are affected by the Poisson process, so inflation occurs at a particular count;
(II) Participants at a particular count may enter the sample more or less often than expected because of some unavoidable problem in the sampling, which leads to inflation or deflation at that count;
(III) As an extreme case of situation (II), no participant in the sample can take that particular count; this is called truncation at that count;
(IV) A subpopulation that combines situations (I) and (III) follows the Poisson data-generating process, whereas the rest of the population, which is not affected by this process, contributes the excess frequency at that particular count.
Thus an inflated distribution is a mixture of a point mass at a particular count and any other count distribution supported on the non-negative integers [8]. In the statistical literature, the problem of excess zeros in count data has a long history. Neyman [9] and Feller [10] first introduced the concept of zero inflation for data with an excess of zeros. Mullahy [11] introduced the zero-inflated Poisson (ZIP) distribution as a mixture of the Poisson distribution and a point mass at zero with mixing probability γ, denoted by ZIP(λ, γ), with probability mass function

$$P(Z=z)=\begin{cases}\gamma+(1-\gamma)e^{-\lambda}, & z=0,\\ (1-\gamma)\dfrac{e^{-\lambda}\lambda^{z}}{z!}, & z=1,2,\ldots,\end{cases}$$

where γ is the zero-inflation parameter (0 < γ < 1) and λ ≥ 0; if γ = 0, the ZIP distribution reduces to the Poisson distribution. Using an inflated Poisson distribution, Pandey [12] described the number of flowers of Primula veris and showed that the data follow a Poisson distribution inflated at the point eight, not at zero, because of an excess number of plants with eight flowers. In view of this example, inflated discrete distributions should be studied at any point k, not only at k = 0. Johnson et al. [13] described the zero-inflated distribution as a mixture of any count distribution supported on the non-negative integers and a point mass at zero, defined as follows: a random variable Z is said to follow a zero-inflated distribution if its probability mass function is

$$P(Z=z)=\begin{cases}\gamma+(1-\gamma)h(0;\Phi), & z=0,\\ (1-\gamma)h(z;\Phi), & z=1,2,\ldots,\end{cases}$$

where γ is the zero-inflation parameter (0 < γ < 1) and h(z; Φ) is the pmf of the base count distribution with parameter vector Φ = {φ1, φ2, ..., φn}. Gupta et al. [14] studied the structural properties and obtained the MLEs of discrete distributions inflated at zero. Murat and Szynal [15] extended the results of Gupta et al. [14] to discrete distributions inflated at any point j (j ≥ 0). Najundan et al.
[16] estimated the parameters of the zero-inflated Poisson model by the method of moments and compared them with the maximum likelihood estimators. Using data on natural calamities, Beckett et al. [17] studied the zero-inflated Poisson model and compared the MLEs and MMEs in terms of standardized bias and standardized mean squared error. The zero-inflated Poisson, zero-inflated binomial, zero-inflated negative binomial and zero-inflated geometric distributions were characterized by Najundan et al. [18], Najundan et al. [19], Suresh et al. [20] and Nagesh et al. [21], respectively. Alshkaki [22] introduced the zero-one-inflated Poisson (ZOIP) distribution, defined as follows: a random variable Z is said to follow a zero-one-inflated Poisson distribution, denoted by ZOIP(λ, γ, ψ), if its probability mass function is

$$P(Z=z)=\begin{cases}\gamma+(1-\gamma-\psi)e^{-\lambda}, & z=0,\\ \psi+(1-\gamma-\psi)\lambda e^{-\lambda}, & z=1,\\ (1-\gamma-\psi)\dfrac{e^{-\lambda}\lambda^{z}}{z!}, & z=2,3,\ldots,\end{cases}$$

where γ and ψ are the zero- and one-inflation parameters (0 < γ < 1, 0 < ψ < 1, 0 < γ + ψ < 1); if ψ = 0, ZOIP reduces to ZIP, and if γ = 0 and ψ = 0, ZOIP reduces to the Poisson distribution. He studied its structural properties and estimated its parameters by maximum likelihood and by the method of moments. Using three real data sets (stillbirths in rabbits, accident insurance claims and heavy-vehicle traffic accidents), he showed that the zero-one-inflated Poisson distribution fits better than the zero-inflated Poisson distribution and that the MLE provides better estimates than the MME. Singh et al. [23] introduced a two-inflated binomial distribution to investigate son preference by modelling the number of male children in Uttar Pradesh, where family size and sex composition are dominated by strong son preference. Mwalili et al. [24] studied a zero-inflated negative binomial model, an extension of the negative binomial distribution, to accommodate excess zeros.
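The ZIP and ZOIP pmfs above share one structure: extra probability is placed on one or more fixed counts, and the remaining mass follows a Poisson pmf. The minimal Python sketch below assumes that structure with a Poisson base distribution; the names `poisson_pmf`, `inflated_poisson_pmf` and the `weights` dictionary (each inflated count mapped to its extra mass) are illustrative and not taken from the cited papers.

```python
import math


def poisson_pmf(z: int, lam: float) -> float:
    """Poisson(lam) probability of the count z."""
    if z < 0:
        return 0.0
    return math.exp(-lam) * lam ** z / math.factorial(z)


def inflated_poisson_pmf(z: int, lam: float, weights: dict[int, float]) -> float:
    """Poisson(lam) pmf with extra mass weights[k] placed on each inflated count k.

    weights={0: gamma} gives ZIP(lam, gamma); weights={0: gamma, 1: psi} gives
    ZOIP(lam, gamma, psi). The remaining mass 1 - sum(weights) follows Poisson(lam).
    """
    total = sum(weights.values())
    if any(w < 0 for w in weights.values()) or total >= 1:
        raise ValueError("inflation weights must be non-negative and sum to less than 1")
    return weights.get(z, 0.0) + (1.0 - total) * poisson_pmf(z, lam)


if __name__ == "__main__":
    lam, gamma, psi = 2.0, 0.2, 0.1
    print(inflated_poisson_pmf(0, lam, {0: gamma}))          # ZIP: P(Z = 0)
    print(inflated_poisson_pmf(1, lam, {0: gamma, 1: psi}))  # ZOIP: P(Z = 1)
```

With `weights={0: gamma}` the function gives ZIP(λ, γ), with `{0: gamma, 1: psi}` it gives ZOIP(λ, γ, ψ), and an empty dictionary gives the plain Poisson pmf, matching the reductions stated above.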
Lambert [25] demonstrated that a zero-inflated Poisson regression fits a data set with excess zeros better than a Poisson regression, using a dataset of the number of manufacturing defects on wiring boards to compare the models. Many extensions and applications of zero-inflated Poisson regression have been described (for details see [26] [27] [28] [29] [30], among others). Hall [31] considered upper-bounded count data and, adapting the methodology of Lambert [25], proposed a zero-inflated binomial model, with an example from horticulture. Famoye and Singh [32] proposed a zero-inflated generalized Poisson regression model, an extension of the generalized Poisson regression model. In this paper, we propose a three-inflated Poisson (ThIP) distribution, study its distributional properties and reliability characteristics, and estimate its parameters by the method of moments (MM) and maximum likelihood estimation (MLE). A simulation study is conducted to examine the behaviour of the MLEs. In the application, a real-life data set of Covid-19 related suicides from the lockdown to the first gradual relaxation of the total lockdown (Unlocking 1.0) is used to examine the suitability of the proposed distribution. The proposed distribution is compared with the Poisson (PD), zero-inflated Poisson (ZIPD) and zero-one-inflated Poisson (ZOIPD) distributions using the log-likelihood, the Akaike Information Criterion (AIC) [33] and the Bayesian Information Criterion (BIC) [34] for model selection, and the P-values of the Kolmogorov-Smirnov (K-S) test [35] for goodness of fit. A likelihood ratio test is provided to discriminate between the Poisson distribution and the proposed distribution.

A random variable Z is said to follow a three-inflated Poisson distribution, denoted by ThIPD(λ, α), if its probability mass function is

$$P(Z=z)=\begin{cases}\alpha+(1-\alpha)\dfrac{e^{-\lambda}\lambda^{3}}{3!}, & z=3,\\ (1-\alpha)\dfrac{e^{-\lambda}\lambda^{z}}{z!}, & z=0,1,2,4,5,\ldots,\end{cases}\qquad(1)$$

where α is the three-inflation parameter (0 < α < 1) and λ ≥ 0. Some particular cases: (i) when α → 0, ThIPD(λ, α) reduces to the Poisson distribution with parameter λ; (ii) when α → 0 and λ → ∞, ThIPD(λ, α) tends to the normal distribution.
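Eq. (1) and the inflation-at-three property can be checked numerically. This Python sketch (function names are illustrative) evaluates the ThIPD pmf, confirms that it sums to one and reduces to the Poisson pmf when α = 0, and checks on a small grid of (λ, α) values that P(Z = 3) exceeds the Poisson probability of three; a finite grid is only a sanity check, not a proof.

```python
import math


def poisson_pmf(z: int, lam: float) -> float:
    """Poisson(lam) probability of the count z."""
    return math.exp(-lam) * lam ** z / math.factorial(z)


def thip_pmf(z: int, lam: float, alpha: float) -> float:
    """pmf of ThIPD(lam, alpha), Eq. (1): extra mass alpha at z = 3."""
    if z < 0:
        return 0.0
    return (alpha if z == 3 else 0.0) + (1.0 - alpha) * poisson_pmf(z, lam)


def check_three_inflation(lams, alphas) -> bool:
    """True if P(Z = 3) under ThIPD exceeds the Poisson P(X = 3) on the whole grid."""
    return all(
        thip_pmf(3, lam, a) > poisson_pmf(3, lam) for lam in lams for a in alphas
    )


if __name__ == "__main__":
    lam, alpha = 5.0, 0.3
    print(sum(thip_pmf(z, lam, alpha) for z in range(150)))  # total probability
    print(thip_pmf(3, lam, alpha), poisson_pmf(3, lam))      # inflated vs Poisson
    print(check_three_inflation([0.5, 1, 3, 5, 10], [0.01, 0.3, 0.9]))
```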
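The probability of three under ThIPD is larger than under the ordinary PD. Indeed, since 0 < e^{−λ}λ^3/3! < 1, multiplying both sides by −α and adding α gives

$$\alpha-\alpha\frac{e^{-\lambda}\lambda^{3}}{3!}>0.$$

Finally, adding e^{−λ}λ^3/3! to both sides gives

$$\alpha+(1-\alpha)\frac{e^{-\lambda}\lambda^{3}}{3!}>\frac{e^{-\lambda}\lambda^{3}}{3!},$$

so P(Z = 3) under ThIPD(λ, α) exceeds the Poisson probability of three. It is observed from the plots of ThIPD(λ, α) in Fig. 1 that as α increases the pmf peaks at z = 3, and as α decreases and λ increases the pmf approaches the normal curve.

If Z ∼ ThIPD(λ, α), then the r-th order moment about zero is

$$\mu_r'=E(Z^{r})=3^{r}\alpha+(1-\alpha)\sum_{j=0}^{r}S(r,j)\,\lambda^{j},$$

where S(r, j) is the Stirling number of the second kind. This follows because E(Z^r) = 3^r α + (1 − α)E(X^r) for X ∼ Poisson(λ), and E(X^r) = Σ_{j=0}^{r} S(r, j)λ^j. Hence proved. In particular,

$$\mu_1'=E(Z)=3\alpha+(1-\alpha)\lambda,\qquad(3)$$
$$\mu_2'=E(Z^{2})=9\alpha+(1-\alpha)(\lambda+\lambda^{2}).\qquad(4)$$

Therefore,

$$\operatorname{Var}(Z)=\mu_2'-\mu_1'^{2}=(1-\alpha)\left[\lambda+\alpha(\lambda-3)^{2}\right].$$

The mean and variance of the proposed distribution for different choices of the parameters are plotted in Figs. 2 and 3. It is clear from Figs. 2 and 3 that as α tends to 0, the mean and variance of the proposed distribution tend to λ, which is the mean and variance of the ordinary Poisson distribution.

If Z ∼ ThIPD(λ, α), then Pearson's β1 coefficient is β1 = μ3²/μ2³, where μ2 and μ3 are the second and third central moments obtained from the raw moments above. The coefficient of skewness of the proposed distribution for different choices of the parameters is plotted in Fig. 4. From Fig. 4 it is observed that β1 increases as α increases, and that for 0 < α < 1, β1 tends to zero as λ decreases and increases. As α → 0 and λ → ∞, β1 → 0, i.e. the proposed distribution tends to a symmetric distribution. If Z ∼ ThIPD(λ, α), then Pearson's β2 coefficient is β2 = μ4/μ2², where μ4 is the fourth central moment. The coefficient of kurtosis of the proposed distribution for different choices of the parameters is plotted in Fig. 5. From Fig. 5 it is observed that β2 > 3 as α increases. As α → 0 and λ → ∞, β2 → 3, i.e. the proposed distribution tends to the normal distribution.

Theorem 3. If Z ∼ ThIPD(λ, α), then its probability generating function (pgf) P_Z(S) is

$$P_Z(S)=\alpha S^{3}+(1-\alpha)e^{\lambda(S-1)}.\qquad(10)$$

Proof. If Z ∼ ThIPD(λ, α), then

$$P_Z(S)=E\left(S^{Z}\right)=\alpha S^{3}+(1-\alpha)\sum_{z=0}^{\infty}S^{z}\frac{e^{-\lambda}\lambda^{z}}{z!}=\alpha S^{3}+(1-\alpha)e^{\lambda(S-1)}.$$

Hence proved. Putting S = e^t in Eq. (10), the moment generating function is

$$M_Z(t)=\alpha e^{3t}+(1-\alpha)e^{\lambda(e^{t}-1)}.$$

If Z ∼ ThIPD(λ, α), then its CDF is

$$F(z)=P(Z\le z)=\begin{cases}(1-\alpha)\dfrac{\Gamma(z+1,\lambda)}{\Gamma(z+1)}, & z=0,1,2,\\ \alpha+(1-\alpha)\dfrac{\Gamma(z+1,\lambda)}{\Gamma(z+1)}, & z=3,4,\ldots\end{cases}$$

(for details see [36]), where Γ(z + 1, λ) is the upper incomplete gamma function and Γ(z + 1) is the gamma function; this follows because P(X ≤ z) = Γ(z + 1, λ)/Γ(z + 1) for X ∼ Poisson(λ). Hence proved.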
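The moment results, the pgf and the CDF above can be cross-checked against direct summation over the pmf. The following Python sketch uses illustrative names and assumes that truncating the infinite sums at a large count is accurate enough for moderate λ. It computes raw moments from Stirling numbers of the second kind, Pearson's β1 and β2 from the central moments, the pgf of Eq. (10), and the CDF as α·1{z ≥ 3} plus (1 − α) times the Poisson CDF, which for integer z equals the incomplete-gamma form above.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(r: int, j: int) -> int:
    """Stirling number of the second kind, S(r, j) = j S(r-1, j) + S(r-1, j-1)."""
    if r == j:
        return 1
    if r == 0 or j == 0:
        return 0
    return j * stirling2(r - 1, j) + stirling2(r - 1, j - 1)


def thip_pmf(z: int, lam: float, alpha: float) -> float:
    base = math.exp(-lam) * lam ** z / math.factorial(z)  # Poisson(lam) pmf
    return (alpha if z == 3 else 0.0) + (1.0 - alpha) * base


def thip_raw_moment(r: int, lam: float, alpha: float) -> float:
    """E(Z^r) = 3^r alpha + (1 - alpha) sum_{j=0}^{r} S(r, j) lam^j."""
    poisson_moment = sum(stirling2(r, j) * lam ** j for j in range(r + 1))
    return 3 ** r * alpha + (1.0 - alpha) * poisson_moment


def pearson_betas(lam: float, alpha: float) -> tuple[float, float]:
    """beta1 = mu3^2 / mu2^3 and beta2 = mu4 / mu2^2 from the raw moments."""
    m1, m2, m3, m4 = (thip_raw_moment(r, lam, alpha) for r in range(1, 5))
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def thip_pgf(s: float, lam: float, alpha: float) -> float:
    """P_Z(S) = alpha S^3 + (1 - alpha) exp(lam (S - 1)), Eq. (10)."""
    return alpha * s ** 3 + (1.0 - alpha) * math.exp(lam * (s - 1.0))


def thip_cdf(z: int, lam: float, alpha: float) -> float:
    """F(z) = alpha 1{z >= 3} + (1 - alpha) P(X <= z), X ~ Poisson(lam)."""
    poisson_cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(z + 1))
    return (alpha if z >= 3 else 0.0) + (1.0 - alpha) * poisson_cdf


if __name__ == "__main__":
    lam, alpha = 5.0, 0.3
    mean = thip_raw_moment(1, lam, alpha)
    var = thip_raw_moment(2, lam, alpha) - mean ** 2
    print(mean, var)                  # compare with Eq. (3) and the variance formula
    print(pearson_betas(lam, alpha))  # Pearson's beta1, beta2
    print(thip_cdf(3, lam, alpha), 1.0 - thip_cdf(3, lam, alpha))  # F(3) and S(3)
```

For α = 0 the β values reduce to the Poisson values β1 = 1/λ and β2 = 3 + 1/λ, which is a convenient check on the central-moment formulas.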
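The CDF of ThIPD(λ, α) is plotted for different choices of the parameters λ and α in Fig. 6. If Z ∼ ThIPD(λ, α), then its survival function is

$$S(z)=P(Z>z)=1-F(z)=\begin{cases}\alpha+(1-\alpha)\dfrac{\gamma(z+1,\lambda)}{\Gamma(z+1)}, & z=0,1,2,\\ (1-\alpha)\dfrac{\gamma(z+1,\lambda)}{\Gamma(z+1)}, & z=3,4,\ldots\end{cases}\qquad(14)$$

(for details see [36]), where γ(z + 1, λ) is the lower incomplete gamma function and Γ(z + 1) is the gamma function; this follows because P(X > z) = γ(z + 1, λ)/Γ(z + 1) for X ∼ Poisson(λ). Hence proved. The survival function (SF) of ThIPD(λ, α) is plotted for different choices of λ and α in Fig. 7. Using the pmf in Eq. (1) and S(z) from Eq. (14), the failure rate (FR) of ThIPD(λ, α) is obtained; it is plotted for different choices of λ and α in Fig. 8.

The parameters λ and α of Eq. (1) can be estimated by the method of moments as follows. Equating the first two sample raw moments m1′ and m2′ to Eqs. (3) and (4) gives

$$\alpha=\frac{m_1'-\lambda}{3-\lambda}\qquad(16)$$

and

$$m_2'=9\alpha+(1-\alpha)(\lambda+\lambda^{2}).\qquad(17)$$

Substituting α from Eq. (16), Eq. (17) reduces to

$$(m_2'-\lambda-\lambda^{2})(3-\lambda)=(m_1'-\lambda)(9-\lambda-\lambda^{2}),\qquad(18)$$

and since the cubic terms cancel,

$$(m_1'-3)\lambda^{2}+(m_1'-m_2'+6)\lambda+3(m_2'-3m_1')=0.\qquad(19)$$

Solving the quadratic Eq. (19) gives the estimate of λ, which is then used in Eq. (16) to estimate α.

The parameters λ and α of Eq. (1) can also be estimated by maximum likelihood. Let z1, z2, ..., zn be a random sample from ThIPD(λ, α) given by Eq. (1), and let Y be the number of the zi taking the value three, with observed value n0. Writing yi = 1 if zi = 3 and yi = 0 otherwise, Eq. (1) can be written as

$$P(Z=z_i)=\left[\alpha+(1-\alpha)\frac{e^{-\lambda}\lambda^{3}}{6}\right]^{y_i}\left[(1-\alpha)\frac{e^{-\lambda}\lambda^{z_i}}{z_i!}\right]^{1-y_i}.\qquad(20)$$

Hence the likelihood function L = L(λ, α; z1, z2, ..., zn) is

$$L=\left[\alpha+(1-\alpha)\frac{e^{-\lambda}\lambda^{3}}{6}\right]^{n_0}\prod_{z_i\neq 3}(1-\alpha)\frac{e^{-\lambda}\lambda^{z_i}}{z_i!}.\qquad(21)$$

Therefore,

$$\log L=n_0\log\left[\alpha+(1-\alpha)\frac{e^{-\lambda}\lambda^{3}}{6}\right]+(n-n_0)\left[\log(1-\alpha)-\lambda\right]+\Big(\sum_{z_i\neq 3}z_i\Big)\log\lambda-\sum_{z_i\neq 3}\log z_i!,\qquad(22)$$

$$\frac{\partial\log L}{\partial\alpha}=\frac{n_0\left(1-\frac{e^{-\lambda}\lambda^{3}}{6}\right)}{\alpha+(1-\alpha)\frac{e^{-\lambda}\lambda^{3}}{6}}-\frac{n-n_0}{1-\alpha}=0.\qquad(23)$$

Similarly,

$$\frac{\partial\log L}{\partial\lambda}=\frac{n_0(1-\alpha)\frac{e^{-\lambda}\lambda^{2}(3-\lambda)}{6}}{\alpha+(1-\alpha)\frac{e^{-\lambda}\lambda^{3}}{6}}-(n-n_0)+\frac{1}{\lambda}\sum_{z_i\neq 3}z_i=0.\qquad(24)$$

Let p = α + (1 − α)e^{−λ}λ^3/6 denote the probability of a three. Now, if we replace p by its sample estimate, the proportion of threes in the sample, p̂ = n0/n, then Eqs. (23) and (24) reduce to

$$1-\alpha=\frac{6(n-n_0)}{n\left(6-e^{-\lambda}\lambda^{3}\right)}\qquad(25)$$

and

$$n(1-\alpha)\frac{e^{-\lambda}\lambda^{2}(3-\lambda)}{6}-(n-n_0)+\frac{1}{\lambda}\sum_{z_i\neq 3}z_i=0.\qquad(26)$$

Substituting Eq. (25) into Eq. (26) gives

$$C(\lambda)=(n-n_0)\left[\frac{e^{-\lambda}\lambda^{2}(3-\lambda)}{6-e^{-\lambda}\lambda^{3}}-1\right]+\frac{1}{\lambda}\sum_{z_i\neq 3}z_i=0.\qquad(27)$$

Hence Eq. (27) can be solved by any numerical procedure, say the Newton-Raphson method, to obtain λ̂ numerically, i.e. C(λ̂) = 0. Similarly, using Eqs.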
(25) and (27), α can be estimated as

$$\hat\alpha=1-\frac{6(n-n_0)}{n\left(6-e^{-\hat\lambda}\hat\lambda^{3}\right)}.\qquad(28)$$

Therefore, the maximum likelihood estimates (MLEs) of λ and α are obtained by solving Eq. (27) numerically for λ̂ and then computing α̂ from Eq. (28). To compute the asymptotic variance-covariance matrix of the estimates, the second-order derivatives of the log-likelihood function are given here. With q = e^{−λ}λ^3/6, q′ = e^{−λ}λ^2(3 − λ)/6, q″ = e^{−λ}λ(λ^2 − 6λ + 6)/6 and p = α + (1 − α)q,

$$\frac{\partial^{2}\log L}{\partial\lambda^{2}}=\frac{n_0(1-\alpha)\left[p\,q''-(1-\alpha)q'^{2}\right]}{p^{2}}-\frac{1}{\lambda^{2}}\sum_{z_i\neq 3}z_i,$$

$$\frac{\partial^{2}\log L}{\partial\lambda\,\partial\alpha}=-\frac{n_0\,q'}{p^{2}},\qquad \frac{\partial^{2}\log L}{\partial\alpha^{2}}=-\frac{n_0(1-q)^{2}}{p^{2}}-\frac{n-n_0}{(1-\alpha)^{2}}.$$

The asymptotic variance-covariance matrix of the MLEs of λ and α for ThIPD(λ, α) is obtained by inverting the Fisher information matrix

$$I=\begin{pmatrix}I_{\lambda\lambda}&I_{\lambda\alpha}\\ I_{\alpha\lambda}&I_{\alpha\alpha}\end{pmatrix},$$

whose elements are I_λλ = −E(∂² log L/∂λ²), I_λα = I_αλ = −E(∂² log L/∂λ∂α) and I_αα = −E(∂² log L/∂α²). The asymptotic distribution of the MLE (λ̂, α̂)′ is

$$(\hat\lambda,\hat\alpha)'\;\overset{a}{\sim}\;N_{2}\left((\lambda,\alpha)',\,I^{-1}\right).$$

In this section a simulation study is conducted to assess the performance of the estimated parameters. To generate random numbers Z from ThIPD(λ, α), we applied acceptance-rejection sampling [37]. Random samples of size n = 30, 50, 100 and 200 are generated for different combinations of the true parameter values λ and α, and the MLEs are computed using the optim function of the R software. The bias and MSE of the parameters given in Table 1 are calculated as

$$\operatorname{Bias}(\hat\theta)=\frac{1}{r}\sum_{i=1}^{r}\left(\hat\theta_i-\theta\right),\qquad \operatorname{MSE}(\hat\theta)=\frac{1}{r}\sum_{i=1}^{r}\left(\hat\theta_i-\theta\right)^{2},$$

where θ̂ is the estimated parameter, θ is the true parameter and r = 1000 is the number of replications. From the MSEs and biases in Table 1, the bias and MSE of the estimators are small, and they gradually decrease as the sample size increases, as expected. Furthermore, we checked the normality of the MLEs with normal Q-Q plots for both parameters in each run. One such Q-Q plot, for (λ, α) = (5, 0.3), is presented as a demonstration (Fig. 9). Fig. 9 shows that the MLEs of both parameters approximately follow a normal distribution.
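The estimation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation (the paper uses R's optim). It returns the method-of-moments roots of Eq. (19), solves C(λ) = 0 of Eq. (27) by bisection rather than Newton-Raphson (assuming a single sign change on the search interval), computes α̂ from Eq. (28), evaluates the observed information (the negative Hessian built from the second derivatives above) as a plug-in estimate of the Fisher information, and runs a small bias/MSE simulation. The sampler draws from the mixture representation (3 with probability α, otherwise Poisson(λ)) instead of the acceptance-rejection sampling used in the paper. All function names are hypothetical.

```python
import math
import random


def thip_mom(m1, m2):
    """All (lam, alpha) roots of Eqs. (19) and (16) with lam > 0 and 0 < alpha < 1."""
    a, b, c = m1 - 3.0, m1 - m2 + 6.0, 3.0 * (m2 - 3.0 * m1)
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return []
    out = []
    for lam in sorted({(-b + s * math.sqrt(disc)) / (2.0 * a) for s in (1.0, -1.0)}):
        if lam > 0.0 and lam != 3.0:
            alpha = (m1 - lam) / (3.0 - lam)  # Eq. (16)
            if 0.0 < alpha < 1.0:
                out.append((lam, alpha))
    return out


def profile_score(lam, n, n0, s):
    """C(lam) of Eq. (27); s is the sum of the observations not equal to 3."""
    e = math.exp(-lam)
    return (n - n0) * (e * lam ** 2 * (3.0 - lam) / (6.0 - e * lam ** 3) - 1.0) + s / lam


def thip_mle(data, tol=1e-12):
    """Solve C(lam) = 0 by bisection (not Newton-Raphson), then alpha from Eq. (28)."""
    n, n0 = len(data), data.count(3)
    s = sum(z for z in data if z != 3)
    lo, hi = 1e-8, 10.0 * (max(data) + 1)
    f_lo = profile_score(lo, n, n0, s)
    if f_lo * profile_score(hi, n, n0, s) > 0:
        raise ValueError("C(lambda) does not change sign on the search interval")
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        f_mid = profile_score(mid, n, n0, s)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    alpha = 1.0 - 6.0 * (n - n0) / (n * (6.0 - math.exp(-lam) * lam ** 3))
    return lam, alpha


def thip_loglik(lam, alpha, data):
    """log L of Eq. (22), computed with log-gamma for numerical stability."""
    ll = 0.0
    for z in data:
        log_base = -lam + z * math.log(lam) - math.lgamma(z + 1)  # log Poisson pmf
        if z == 3:
            ll += math.log(alpha + (1.0 - alpha) * math.exp(log_base))
        else:
            ll += math.log(1.0 - alpha) + log_base
    return ll


def observed_information(lam, alpha, data):
    """Negative Hessian of log L, built from the second derivatives above."""
    n, n0 = len(data), data.count(3)
    s = sum(z for z in data if z != 3)
    e = math.exp(-lam)
    q0 = e * lam ** 3 / 6.0                            # q
    q1 = e * lam ** 2 * (3.0 - lam) / 6.0              # q'
    q2 = e * lam * (lam ** 2 - 6.0 * lam + 6.0) / 6.0  # q''
    p = alpha + (1.0 - alpha) * q0
    l_ll = n0 * (1.0 - alpha) * (p * q2 - (1.0 - alpha) * q1 ** 2) / p ** 2 - s / lam ** 2
    l_la = -n0 * q1 / p ** 2
    l_aa = -n0 * (1.0 - q0) ** 2 / p ** 2 - (n - n0) / (1.0 - alpha) ** 2
    return [[-l_ll, -l_la], [-l_la, -l_aa]]


def sample_thip(lam, alpha, n, rng):
    """Mixture sampler: 3 with probability alpha, else a Poisson(lam) draw (Knuth)."""
    out = []
    for _ in range(n):
        if rng.random() < alpha:
            out.append(3)
            continue
        k, prod, threshold = 0, rng.random(), math.exp(-lam)
        while prod > threshold:
            k, prod = k + 1, prod * rng.random()
        out.append(k)
    return out


def bias_mse(lam, alpha, n, reps=200, seed=1):
    """Monte Carlo bias and MSE of (lam_hat, alpha_hat) over `reps` replications."""
    rng = random.Random(seed)
    est = [thip_mle(sample_thip(lam, alpha, n, rng)) for _ in range(reps)]
    true = (lam, alpha)
    bias = [sum(e[i] - true[i] for e in est) / reps for i in range(2)]
    mse = [sum((e[i] - true[i]) ** 2 for e in est) / reps for i in range(2)]
    return bias, mse
```

Inverting the 2×2 matrix returned by `observed_information` gives approximate standard errors for λ̂ and α̂. For m1′ = 4.4 and m2′ = 23.7 (the population moments when λ = 5 and α = 0.3), `thip_mom` returns both (4.5, 1/15) and (5, 0.3), so the quadratic Eq. (19) alone may not single out one moment estimate.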
Table 1: Results of simulation (bias and MSE of the MLEs of λ and α)

The Covid-19 virus, which started in Wuhan, China, has ushered in a new world order [38]. This new world order has required the global community to set aside cultural differences and brainstorm mitigating measures, especially for sustaining the economy (for details see [39] [40] [41]). Covid-19 came as a bolt from the blue, and India was unprepared to withstand its onslaught. In India, the first case of Covid-19 was reported on 30th January 2020 [42]. The disease spread so fast that it prompted the Government of India to impose an emergency-like lockdown (a lockdown denotes a clampdown on almost all human transactions and activities in an emergency), the fallout of which has been clearly visible [43]. The lockdown in India ran in phases from 25th March 2020 to 7th June 2020 [44]. The first phase began on 25th March 2020 for a period of twenty-one days [45]. After the first lockdown, three further phases were announced with conditional relaxations and restrictions: the second phase from 15th April 2020 to 3rd May 2020, the third phase from 4th May 2020 to 17th May 2020 and the fourth phase from 18th May 2020 to 31st May 2020 [45]. The fourth phase of the lockdown was extended to 7th June 2020. The hardest hit by the lockdowns have been those living on the edge and on the margins, such as daily wage earners, private-sector employees who lost their jobs, farmers who could not find markets for their agricultural produce, stranded migrant workers, large numbers of underprivileged students and those who opted for reverse migration [43, 46].
The most visible consequence has therefore been poverty and starvation, which in turn have sapped the vitality and shaken the psychological and cognitive well-being of the densely populated Indian nation [43]. Although Covid-19 is billed as a biomedical disease, it also has negative cognitive consequences, and, unfortunately, Asian countries have borne the brunt of the exponential growth in the transmission of SARS-CoV-2 in densely populated areas of internal migrants. There have been alarming instances of socially irresponsible behaviour and panic attacks among internal migrant workers, who are in desperate need of psycho-social support [47]. After the fourth phase of the lockdown, the Government of India started unlocking the nation in phases from 8th June 2020, with restrictions confined to containment zones [48] (unlocking denotes relaxation of a lockdown imposed in an emergency). The first phase of unlocking, i.e. Unlocking 1.0, ran from 8th June 2020 to 30th June 2020 [48]. Covid-19 has had a diverse array of effects, the worst being suicides triggered primarily by uncertainty about day-to-day survival. Under these circumstances, suicides have frequently made headlines. The first Covid-19 related suicide in India took place on February 12, 2020 [49], followed by two more such suicides [50]. In addition, the first Covid-19 related student suicide was reported on June 2, 2020 [51]. Financial distress, fear of infection, freezing of employment opportunities, lack of freedom of movement and withdrawal have been the major causes of suicide [47, 49, 50]. The data set of 298 Covid-19 pandemic related suicide cases in India from the lockdown to Unlocking 1.0 was collected from the web portal https://www.kaggle.com/..
The age and sex distribution of the individuals who died by suicide during the lockdown and Unlocking 1.0 is presented in Fig. 10, which shows that throughout both periods the highest percentage of suicides occurred among individuals aged 21-40 and among males. The causes of suicide during the lockdown and Unlocking 1.0 are presented in Fig. 11, where it can be observed that financial distress and fear of infection accounted for most suicides in both periods. The occupations of the individuals who died by suicide during the lockdown and Unlocking 1.0 are presented in Fig. 12, where it can be observed that most of them were migrant workers, workers and private-sector employees. To study the pattern of the suicide cases, the Poisson distribution is used. Since the proportion of days with three (3) deaths from the lockdown to Unlocking 1.0 is inflated relative to the other counts, we fitted the three-inflated Poisson distribution ThIPD(λ, α) to the real-life data set of Covid-19 related suicides, along with the zero-inflated Poisson distribution ZIPD(λ, γ) [11], the zero-one-inflated Poisson distribution ZOIP(λ, γ, ψ) [22] and the standard Poisson distribution PD(λ). The MLEs of the parameters of the different distributions are computed with the optim function of R. The log-likelihood, the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the Kolmogorov-Smirnov (KS) statistic with its p-value for the number of suicide cases per day during the 98 days of lockdown and Unlocking 1.0 in India are summarized in Table 2. Table 2 shows that the AIC and BIC of ThIPD are smaller than those of PD, ZIPD and ZOIPD, that the p-value of the KS statistic is highest for ThIPD, and that the expected frequencies of ThIPD are closest to the observed frequencies. In Fig.
13, the observed histogram and the estimated pmfs of PD, ZIPD, ZOIPD and ThIPD are plotted, which also supports our findings, and in Fig. 14 the observed ogive and the estimated CDFs of PD, ZIPD, ZOIPD and ThIPD are plotted for visual comparison. The proposed three-inflated Poisson distribution (ThIPD) provides a better fit to the data set under consideration on all criteria. Since Poi(λ) and ThIPD(λ, α) are nested models, the likelihood ratio (LR) test is used to discriminate between them. The LR test is carried out to test the hypothesis H0: α = 0, that is, the sample is drawn from Poi(λ), against the alternative H1: α ≠ 0, that is, the sample is drawn from ThIPD(λ, α), using the statistic LR = 2(log L̂_ThIPD − log L̂_PD), where log L̂ denotes a maximized log-likelihood. The value of the LR test statistic for the dataset is given in Table 3. The LR statistic for the dataset is 28.542, which exceeds 3.841, the critical value of the chi-square distribution with one (1) degree of freedom at the 5% level of significance. Thus the evidence supports the alternative hypothesis that the sample data come from ThIPD(λ, α) and not from Poi(λ).

A three-inflated Poisson distribution (ThIPD) has been proposed, and its distributional properties and reliability characteristics have been studied. A simulation study was conducted to examine the behaviour of the MLEs. The adequacy of the fitted distribution was assessed with a goodness-of-fit test and information criteria. The usefulness of the proposed distribution is illustrated with data on the number of suicides during the lockdown to Unlocking 1.0 in India. For the real-life data set of Covid-19 related suicides considered here, the proposed ThIPD provides a better fit than the other distributions considered, namely ZIPD, ZOIPD and the ordinary PD, in terms of the model selection criteria AIC and BIC and the KS goodness-of-fit test. The plots presented above also support our findings.
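The model-selection criteria and the likelihood ratio test used above follow standard formulas: AIC = −2 log L + 2k, BIC = −2 log L + k log n, and LR = 2(log L_ThIPD − log L_PD) referred to a chi-square distribution with one degree of freedom. The Python sketch below (hypothetical function names; it takes already maximized log-likelihoods as inputs) computes them, using the identity P(χ²₁ > x) = erfc(√(x/2)) for the p-value.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * loglik + k * math.log(n)


def chi2_df1_sf(x: float) -> float:
    """Upper tail P(X > x) of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lr_test(loglik_poisson: float, loglik_thip: float, level: float = 0.05):
    """LR = 2 (log L_ThIPD - log L_PD), referred to chi-square(1)."""
    stat = 2.0 * (loglik_thip - loglik_poisson)
    p_value = chi2_df1_sf(stat)
    return stat, p_value, p_value < level


if __name__ == "__main__":
    print(chi2_df1_sf(3.841))   # about 0.05, matching the critical value quoted above
    print(chi2_df1_sf(28.542))  # p-value of the reported LR statistic
```

Here k is the number of estimated parameters (1 for PD, 2 for ZIPD and ThIPD, 3 for ZOIPD) and n = 98 days for the suicide data. Applied to the reported LR statistic of 28.542, the p-value is far below 0.05, consistent with rejecting H0: α = 0.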
Moreover, the LR test indicates that the sample comes from ThIPD rather than PD. Thus the proposed distribution provides a better fit than its competitor distributions.

Funding: No funding.
Code availability: Not applicable.
The authors declare that they have no conflict of interest.
We hereby declare that this manuscript is the result of our independent work, revised in light of the reviewers' comments. Except for the quoted contents, this manuscript does not contain any research results that have been published or written by other individuals or groups. We are the only authors of this manuscript. The legal responsibility for this statement is borne by us.

References
Introduction to business data mining
Optimization based data mining: theory and applications
Internet of things, real-time decision making, and artificial intelligence
Modeling Zero-inflated and overdispersed count data: Application to IN-Hospital mortality data
Semiparametric analysis of zero-inflated count data
Structural zeroes and zero-inflated models
A comparison of different methods of zero-inflated data analysis and an application in health surveys
The zero-inflated negative binomial-Erlang distribution: An application to highly pathogenic avian influenza H5N1 in Thailand. Songklanakarin
On a new class of contagious distributions applicable in entomology and bacteriology
On a general class of contagious distributions
Specification and testing of some modified count data models
Generalized inflated Poisson distribution
Univariate discrete distributions
Inflated modified power series distributions with applications
Non-Zero-Inflated modified power series distributions
Asymptotic Comparison of Method of Moments estimators and maximum likelihood estimators of parameters in Zero-Inflated Poisson model
Zero-inflated Poisson (ZIP) distribution: parameter estimation and applications to model data from natural calamities
A note on the characterization of Zero-inflated Poisson model
A characterization of Zero-inflated binomial model
On a characterization of Zero-inflated negative binomial distribution
A Characterization of Zero-inflated geometric model
On the Zero-One Inflated Poisson distribution
A probability model for sex composition of children in the presence of son preference
The zero-inflated negative binomial regression model with correction for misclassification: an example in caries research
Zero-inflated Poisson regression, with an application to defects in manufacturing
Zero-inflated Poisson models and C.A.MAN: A tutorial collection of evidence
The zero inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology
A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives
Analysis of zero-inflated Poisson data incorporating extent of exposure
Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme
Zero-inflated Poisson and binomial regression with random effects: A case study
Zero-inflated generalized Poisson regression model with an application to domestic violence data
A new look at the statistical model identification
Estimating the dimension of a model
An Introduction to Probability and Statistics
Handbook of the Poisson distribution
COVID-19 pandemic and its recovery time of patients in India: A pilot study
Using social media to predict the stock market crash and rebound amid the pandemic: the digital 'Haves' and 'Have-mores'
Culture vs policy: more global collaboration to effectively combat COVID-19. The Innovation
What are the underlying transmission patterns of COVID-19 outbreak?
Monitoring Novel Corona Virus (COVID-19) infections in India by cluster analysis
COVID-19, India, lockdown and psychosocial challenges: What next?
Unlocking India during Covid-19 pandemic: a data driven investigation
Dynamics of COVID-19 in India: A review of different phases of lockdown
Uniyal R (2020) Depression, Anxiety and Stress among Indians in times of Covid-19 Lockdown
COVID 19 pandemic: Mental health challenges of internal migrant workers of India
Lockdown and unlock for the COVID-19 pandemic and associated residential mobility in India
Fear of COVID 2019: first suicidal case in India!
Self-harm and COVID-19 Pandemic: an emerging concern-a report of 2 cases from India
Kerala class X girl ends life allegedly over lack of access to online classes

Acknowledgements: We thank all the reviewers for their suggestions and expert advice towards the improvement of the manuscript. We are also thankful to Chandan Borgohain, Sibsagar College, Assam, India for helping with the language of the manuscript.
Author's contributions: All authors contributed equally to preparing the manuscript.