Estimating the reciprocal of a binomial proportion

Jiajin Wei, Ping He, Tiejun Tong

Date: 2020-09-02

Abstract: As a classic parameter from the binomial distribution, the binomial proportion has been well studied in the literature owing to its wide range of applications. In contrast, the reciprocal of the binomial proportion, also known as the inverse proportion, is often overlooked, even though it also plays an important role in various fields including clinical studies and random sampling. The maximum likelihood estimator of the inverse proportion suffers from the zero-event problem, and to overcome it, alternative methods have been developed in the literature. Nevertheless, there is little work addressing the optimality of the existing estimators or comparing their practical performance. Inspired by this, we propose to further advance the literature by developing an optimal estimator for the inverse proportion within a family of shrinkage estimators. We further derive explicit and approximate formulas for the optimal shrinkage parameter under different settings. Simulation studies show that our new estimator performs better than, or as well as, the existing competitors in most practical settings. Finally, to illustrate the usefulness of our new method, we revisit a recent meta-analysis on COVID-19 data assessing the relative risks of physical distancing on the infection of coronavirus, in which six out of seven studies encounter the zero-event problem.

The binomial distribution is one of the most important distributions in statistics, and it has been extensively studied in the literature with a wide range of applications.
This classical distribution has two parameters n and p, where n is the number of independent Bernoulli trials and p is the probability of success in each trial (Hogg, McKean & Craig, 2005). The probability of success, p, is also referred to as the binomial proportion. For excellent reviews on its estimation and inference, one may refer to, for example, Agresti & Coull (1998) and Brown, Cai & DasGupta (2001). Apart from the parameter p, it is known that some of its functions, say p(1 − p) and ln(p), also play important roles in statistics and have received much attention. In this article, we are interested in the reciprocal function

θ = 1/p,   (1)

which is another important function of p yet is often overlooked in the literature. For convenience, we also refer to θ in formula (1) as the inverse proportion of the binomial distribution. To demonstrate its usefulness, we will introduce some motivating examples in Section 2 that connect the inverse proportion with the relative risk (RR) and with the Horvitz-Thompson estimator (Horvitz & Thompson, 1952; Fattorini, 2006). Moreover, we will also introduce in Section 6 a relationship of the inverse proportion to the number needed to treat (NNT) and the reduction in number to treat (RNT) in clinical studies, and present some future directions (Laupacis, Sackett & Roberts, 1988; Altman, 1998; Hutton, 2000; Zhang & Yin, 2021). To start with, let X = Σ_{i=1}^n X_i, where the X_i are independent and identically distributed random variables from a Bernoulli distribution with success probability p ∈ (0, 1). Then equivalently, X follows a binomial distribution with parameters n ≥ 1 and p. Now if we want to estimate the inverse proportion θ, a simple method is to apply maximum likelihood estimation (MLE), which yields

θ̂_MLE = n/X.   (2)

This estimator is, however, not a valid estimator because it is not defined when X = 0, i.e., when there is no successful event in n trials. We refer to this problem as the zero-event problem in the point estimation of θ.
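To see how often the zero-event problem actually occurs, note that P(X = 0) = (1 − p)^n, which can be far from negligible when p is small. A minimal sketch (the function name is ours, not from the paper):

```python
def prob_zero_event(n: int, p: float) -> float:
    """P(X = 0) for X ~ Binomial(n, p): the chance that the MLE n/X is undefined."""
    return (1 - p) ** n

# With p = 0.02 (theta = 50) and n = 10, the MLE fails for most samples:
print(round(prob_zero_event(10, 0.02), 3))  # 0.817
```

Even with n = 200 trials, roughly 2% of samples would still leave θ̂_MLE undefined at p = 0.02.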
In fact, the same problem also exists in the interval estimation of p. Specifically, by Hogg, McKean & Craig (2005), the 100(1 − α)% Wald interval is given as

p̂ ± z_{α/2} √(p̂(1 − p̂)/n),

where p̂ = X/n, and z_{α/2} is the upper α/2 percentile of the standard normal distribution. When X = 0, the lower and upper limits of the Wald interval are both zero; consequently, they will not be able to provide a (1 − α) coverage probability for the true proportion. To overcome the zero-event problem, Hanley & Lippman-Hand (1983) proposed the "Rule of Three" to approximate the upper limit of the 95% confidence interval (CI) for p. Specifically, since the upper limit of the one-sided CI for p is 1 − 0.05^{1/n} when X = 0, the authors suggested approximating this upper limit by 3/n, which then yields the simplified CI as (0, 3/n). For more discussion on the "Rule of Three", one may refer to Tuyl, Gerlach & Mengersen (2009) and the references therein. In particular, we note that the Wilson interval (Wilson, 1927) and the Agresti-Coull interval (Agresti & Coull, 1998) for p have also been referred to as variations of the "Rule of Three". The Wilson interval originated with Laplace, who proposed the "Law of Succession" in the 18th century. As mentioned in Good (1980), Laplace's estimator for the binomial proportion was given as (X + 1)/(n + 2), which is indeed a shrinkage estimator for p. Wilson (1927) generalized the shrinkage idea and proposed an updated "Law of Succession" as p̂(c) = (X + c)/(n + 2c), where c > 0 is a regularization parameter. Following the Wilson estimator, Agresti & Coull (1998) proposed to substitute p̂(c) for p̂ in the Wald interval, which yields the Agresti-Coull interval as

p̂(c) ± z_{α/2} √(p̂(c)(1 − p̂(c))/(n + 2c)).

It is also noteworthy that the Agresti-Coull interval always performs better than the Wald interval, no matter whether n is large or small (Brown, Cai & DasGupta, 2001).
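The intervals discussed above can be sketched as follows; this is a hedged illustration in which the Agresti-Coull form simply substitutes p̂(c) and the inflated sample size n + 2c into the Wald formula, and the function names are ours:

```python
from math import sqrt

Z = 1.959964  # z_{0.025}, the upper 2.5% normal percentile

def wald_ci(x, n, z=Z):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    ph = x / n
    half = z * sqrt(ph * (1 - ph) / n)
    return ph - half, ph + half

def agresti_coull_ci(x, n, c=2.0, z=Z):
    """Agresti-Coull-type interval built from p_hat(c) = (x + c)/(n + 2c)."""
    nt = n + 2 * c
    ph = (x + c) / nt
    half = z * sqrt(ph * (1 - ph) / nt)
    return ph - half, ph + half

# With x = 0 the Wald interval degenerates to (0.0, 0.0), while the
# "Rule of Three" upper limit 3/n approximates the exact 1 - 0.05**(1/n):
print(wald_ci(0, 30))
print(round(1 - 0.05 ** (1 / 30), 3), 3 / 30)
```

The degenerate Wald output at x = 0 is exactly the zero-coverage failure described in the text, whereas the shrunken p̂(c) keeps the interval non-degenerate.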
By applying the Wilson estimator p̂(c), one may estimate the inverse proportion as

θ̂(c) = (n + 2c)/(X + c).   (3)

Note that the estimator with form (3) does not suffer from the zero-event problem, and so provides a valid estimate of θ for any given c > 0. In particular, two special cases of estimator (3) with c = 0.5 and 1 have been widely applied in the previous literature (Walter, 1975; Carter et al., 2010). Moreover, there are other estimators that follow the structure of (3) including, for example, a piecewise estimator (PE) with corrections only on X = 0 or n (Schwarzer, 2007). In addition to (3), another family of shrinkage estimators for the inverse proportion takes the form of

θ̃(c) = (n + c)/(X + c).   (4)

The special case θ̃(0.5) has been investigated by Pettigrew, Gart & Thomas (1986) and Hartung & Knapp (2001). More recently, Fattorini (2006) applied θ̃(1) to estimate θ in sampling designs and demonstrated that it provides a good performance when n is large. More specifically, it can be shown that θ̃(1) is an asymptotically unbiased estimator of θ as n tends to infinity (Chao & Strawderman, 1972; Seber, 2013). In this paper, we first review the existing estimators for the inverse proportion and study their statistical properties, and then develop an optimal estimator within family (4). In Section 2, we briefly review the literature and introduce two real situations where an estimate of the inverse proportion is needed. In Section 3, we derive the asymptotic properties of the existing estimators and derive the optimal shrinkage estimator within family (4). In Section 4, we conduct simulation studies to evaluate the performance of our new estimator, and compare it with the existing competitors. In Section 5, we revisit a recent meta-analysis on COVID-19 data by Chu et al. (2020) for assessing the relative risks of physical distancing on the infection of coronavirus, and then apply our new estimator to overcome the zero-event problem on the relative risks.
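The two shrinkage families (3) and (4) differ only in the numerator, and the piecewise variant applies the correction only at the boundary counts X = 0 or n. A small sketch (function names are ours):

```python
def theta_hat(x, n, c):
    """Family (3): (n + 2c) / (X + c)."""
    return (n + 2 * c) / (x + c)

def theta_tilde(x, n, c):
    """Family (4): (n + c) / (X + c)."""
    return (n + c) / (x + c)

def theta_pe(x, n, c):
    """Piecewise variant of (3): apply the correction c only when X = 0 or n."""
    k = 1 if x in (0, n) else 0
    return (n + 2 * c * k) / (x + c * k)

# Walter (c = 0.5 in (3)), Pettigrew (c = 0.5 in (4)), Fattorini (c = 1 in (4))
# at the zero-event point X = 0 with n = 10:
print(theta_hat(0, 10, 0.5), theta_tilde(0, 10, 0.5), theta_tilde(0, 10, 1.0))  # 22.0 21.0 11.0
print(theta_pe(5, 10, 0.5))  # 2.0 -- no correction away from the boundary
```

Even at the same c, the three forms can disagree substantially at X = 0, which is precisely where the finite-sample comparison below matters.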
Lastly, we conclude the paper in Section 6 with some discussion and future work, and postpone the technical results to the Appendix. In this section, we provide two motivating examples in which an accurate estimate of the inverse proportion θ is highly desired. In clinical studies, the relative risk (RR), also known as the risk ratio, is a commonly used effect size for measuring the effectiveness of a treatment or intervention. Specifically, RR is defined as

RR = p_1/p_2,   (5)

where p_1 is the event probability in the exposed group, and p_2 is the event probability in the unexposed group. To estimate RR, we assume that there are n_1 samples in the exposed group with X_1 being the number of events, and n_2 samples in the unexposed group with X_2 being the number of events. Let also X_1 follow a binomial distribution with parameters n_1 and p_1, let X_2 follow a binomial distribution with parameters n_2 and p_2, and let them be independent of each other. Then by (5) and applying the MLEs of p_1 and p_2 respectively, RR can be estimated by

R̂R_MLE = (X_1/n_1)/(X_2/n_2) = X_1 n_2/(X_2 n_1).   (6)

A problem of this estimator, however, is that it suffers from the zero-event problem when X_2 = 0, which is the same problem as mentioned in Section 1 (Wei et al., 2021). To overcome this problem, there are a few popular suggestions in the literature to further improve the RR estimator in (6). (i) Walter (1975) introduced a modified estimator of RR as R̂R(0.5) = (X_1 + 0.5)(n_2 + 1)/[(X_2 + 0.5)(n_1 + 1)]. Following this idea, the inverse proportion of the unexposed group is, in fact, estimated by the Walter estimator

θ̂(0.5) = (n_2 + 1)/(X_2 + 0.5),   (7)

which is a special case of estimator (3) with c = 0.5. (ii) Pettigrew, Gart & Thomas (1986) proposed to estimate p_i by (X_i + 0.5)/(n_i + 0.5) for i = 1 or 2, and further concluded that ln[(X_i + 0.5)/(n_i + 0.5)] is an unbiased estimator of ln(p_i) up to a term of order O(n^{−2}). Accordingly, the Pettigrew estimator for the inverse proportion can be given as

θ̃(0.5) = (n_2 + 0.5)/(X_2 + 0.5),   (8)

which is a special case of estimator (4) with c = 0.5.
(iii) Originating from (3), a family of piecewise estimators is defined as

θ̂_PE(c) = [n + 2c·I(X = 0 or n)]/[X + c·I(X = 0 or n)],   (9)

where I(·) is the indicator function. One special case with c = 0.5, which has been extensively applied in clinical studies (Carter et al., 2010; Higgins et al., 2019; Chu et al., 2020), is given as

θ̂_PE(0.5) = [n_2 + I(X = 0 or n)]/[X_2 + 0.5·I(X = 0 or n)].   (10)

For ease of notation, we refer to this estimator as the piecewise Walter estimator in this paper. (iv) To further advance the piecewise Walter estimator, Carter et al. (2010) proposed R̂R(1) = (X_1 + 1)(n_2 + 2)/[(X_2 + 1)(n_1 + 2)], which yields the Carter estimator for the inverse proportion

θ̂(1) = (n_2 + 2)/(X_2 + 1),   (11)

a special case of estimator (3) with c = 1. For random sampling without replacement from a finite population, it is known that the Horvitz-Thompson estimator has played an important role in the literature for estimating the population total (Horvitz & Thompson, 1952; Cochran, 2007). Let U be a population composed of t units {u_1, . . . , u_t}, and p_i be the first-order selection probability associated with unit u_i. Let also Ω be a random variable associated with the population U, and Ω_i be the value of Ω determined by unit u_i. Following these notations, the population total of Ω can be defined as T = Σ_{i=1}^t Ω_i. Then, as an unbiased estimator of T, the Horvitz-Thompson estimator is given as

T̂ = Σ_{j∈V} ω_j/p_j = Σ_{j∈V} ω_j θ_j,   (12)

where ω_j is the observed value of Ω_j, and V ⊆ {1, . . . , t} is a subset of samples selected for estimating the population total. In practice, the inverse proportions θ_j = 1/p_j are often unknown and need to be estimated. To estimate θ_j in (12), Fattorini (2006) proposed a numerical method via Monte Carlo simulations. Specifically, in each simulation, a total of n samples are selected independently with replacement from the population U, with X_j being the number of samples that contain the jth unit, where j ∈ V.
Further, to avoid the zero-event problem on X_j, Fattorini applied estimator (4) with c = 1 to estimate the inverse proportions by

θ̃_j(1) = (n + 1)/(X_j + 1),   (13)

which then yields the modified Horvitz-Thompson estimator T̂_m = Σ_{j∈V} ω_j θ̃_j(1). Unless otherwise specified, we will suppress the subscript j in (13) and refer to θ̃(1) as the Fattorini estimator. For the Fattorini estimator in family (4) with c = 1, Seber (2013) showed that

E[θ̃(1)] = θ[1 − (1 − p)^{n+1}].   (14)

Then by the fact that lim_{n→∞} Bias[θ̃(1)] = lim_{n→∞} [−θ(1 − 1/θ)^{n+1}] = 0 for any fixed θ ∈ (1, ∞), the Fattorini estimator is an asymptotically unbiased estimator of θ when n is large. In addition, when p is large enough, or equivalently when θ is close to 1, the estimation bias of the Fattorini estimator is often negligible no matter whether n is large or small. In view of the demand for an accurate estimate of the inverse proportion, we revisit the three families of shrinkage estimators in (3), (4) and (9) and compare them in both theory and practice. We first show that the three estimators are all consistent and asymptotically equivalent, with the proof of the theorem in Appendix A.

Theorem 1. Let X be a binomial random variable with parameters n and p. For the shrinkage estimators in (3), (4) and (9) with any finite c > 0, we have the following properties: (i) θ̂ is a consistent estimator of θ; (ii) √n(θ̂ − θ) →D N(0, θ²(θ − 1)), where θ̂ is a generic notation for the three estimators and →D denotes convergence in distribution.

Despite the asymptotic equivalence, we note, however, that their finite-sample performance can be quite different. To illustrate this, we conduct a numerical study by considering θ = 1.02, 2 or 50, which is equivalent to p = 0.98, 0.5 or 0.02. We also consider n = 10 or 200 to represent small and large sample sizes respectively, and let c range from 0 to 2 so as to cover the most common choices of c in the literature. Then for each setting, we generate N = 1,000,000 data sets from the binomial distribution and estimate θ by each estimator from the three families.
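The expectation formula (14) for the Fattorini estimator can be checked numerically by enumerating the binomial probability mass function; a small verification sketch (the function name is ours):

```python
from math import comb

def mean_fattorini(n, p):
    """E[(n + 1) / (X + 1)] for X ~ Binomial(n, p), by direct enumeration."""
    return sum((n + 1) / (x + 1) * comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(n + 1))

n, p = 10, 0.3
closed_form = (1 / p) * (1 - (1 - p) ** (n + 1))  # theta * [1 - (1-p)^(n+1)] from (14)
print(abs(mean_fattorini(n, p) - closed_form) < 1e-12)  # True
```

The agreement is exact up to floating-point error, since (n + 1)E[1/(X + 1)] telescopes into the binomial sum of order n + 1.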
Finally, with the simulated data sets, we compute the average Stein loss (SL) (Dey & Srinivasan, 1985) of each estimator by

SL = (1/N) Σ_{k=1}^N [θ̂_k/θ − ln(θ̂_k/θ) − 1],   (15)

where θ̂_k is the estimate from the kth data set, and then report the simulation results in Figure 1.

Figure 1: The Stein losses for the shrinkage estimators from the three families with θ = 1.02, 2 or 50, n = 10 (top three panels) or 200 (bottom three panels), and c ∈ (0, 2), where "3" represents the estimators from family (3), "4" represents the estimators from family (4), and "9" represents the estimators from family (9).

From Figure 1, it is evident that the estimators from family (4) perform better than those from the other two families in most settings. In particular, no estimator from family (3) is able to provide an accurate estimate when θ = 1.02, no matter whether the sample size is large or small. On the other side, the estimators from family (9) fail to provide a stable performance when θ is moderate to large. To summarize, except for the extreme case where θ is relatively large and n is relatively small, the estimators from family (4) are always among the best and so can be safely recommended. Moreover, we also provide theoretical evidence, from the perspective of bias, that the estimators from family (3) can be suboptimal for practical use.

Theorem 2. Let X be a binomial random variable with parameters n and p. Then for the estimators from family (3), there does not exist a shrinkage parameter c such that E[θ̂(c)] = θ.

The proof of Theorem 2 is given in Appendix B. Taking the above comparisons together, we propose to probe into the family of estimators (4) and find the optimal estimator of θ in this paper. For the estimators from family (4), we introduced the Fattorini estimator with c = 1 as a special case with the asymptotic property in Section 2.2. However, as shown in the numerical study, the Fattorini estimator may not provide an accurate estimate for the inverse proportion when n is small and θ is large.
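The Stein loss used in (15) penalizes an estimate on a multiplicative scale and vanishes only at the truth; a minimal sketch (the function name is ours):

```python
from math import log

def stein_loss(est, theta):
    """Stein loss est/theta - ln(est/theta) - 1; zero if and only if est == theta."""
    r = est / theta
    return r - log(r) - 1

print(stein_loss(2.0, 2.0))            # 0.0
print(round(stein_loss(4.0, 2.0), 4))  # 0.3069
```

Because the loss depends only on the ratio est/θ, it treats over- and under-estimation of the inverse proportion on a comparable scale, which suits a parameter living on (1, ∞).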
To further illustrate this, we take n = 10 and θ = 50; then according to (14), the relative bias of the Fattorini estimator is as large as −(1 − 1/50)^{11} ≈ −0.80; that is, the estimator underestimates θ by about 80%. In addition, it is noteworthy that the expected value of the Fattorini estimator is always lower than θ, and so it is consistently negatively biased. This evidence indicates that the Fattorini estimator may not be the optimal estimator in family (4). To alleviate the bias in the Fattorini estimator, we now define the optimal shrinkage parameter c as the value such that E[θ̃(c)] = θ. For ease of notation, we also express the expected value of θ̃(c) as

g(c) = E[θ̃(c)] = Σ_{x=0}^n [(n + c)/(x + c)] C(n, x) p^x (1 − p)^{n−x},   (16)

and then regard g(c) as a function of c. In the following theorem, we provide some properties of g(c), including the continuity, monotonicity and convexity, with the proof in Appendix C.

Theorem 3. For the expected value function g(c) in (16) with any finite integer n, we have the following properties: (i) g(c) is continuous on (0, ∞), with lim_{c→0+} g(c) = ∞ and lim_{c→∞} g(c) = 1; (ii) g(c) is strictly decreasing on (0, ∞); (iii) g(c) is strictly convex on (0, ∞).

Note also that θ takes value in (1, ∞), and g(1) < θ for any fixed n according to formula (14). Then by Theorem 3 and the Intermediate Value Theorem, there exists a unique solution c ∈ (0, 1) such that g(c) = θ, or equivalently,

Σ_{x=0}^n [(n + c)/(x + c)] C(n, x) p^x (1 − p)^{n−x} = 1/p.   (17)

When n is small, in particular for n = 1 or n = 2, we can derive the explicit solution of c from equation (17). When n is large, since equation (17) reduces to a polynomial equation of degree n + 1 in c, an explicit solution may not exist in general. To summarize, we have the following theorem with the proof in Appendix D.

Theorem 4. When n is less than 3, the solution of c in equation (17) is given by c_1 = p for n = 1, and c_2 = p − 0.5 + √(0.5 − (p − 0.5)²) for n = 2. When n ≥ 3, we have the approximate solution of c as

c_n ≈ 1 − p^{−1}(1 − p)^{n+1} / [(n + 1)(1 + D_1)D_2 − D_1],   (18)

where D_1 = E[1/(X + 1)] and D_2 = E[1/((X + 1)(X + 2))].

To check the accuracy of the approximate solution in Theorem 4, we also plot the numerical results of the true and approximate solutions of c as a function of p in Figure 2. Under various settings, we note that the true solution of c is a monotonically increasing function of p with upper bound 1.
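The approximate solution (18) depends on p only through the closed forms of D_1 = E[1/(X + 1)] and D_2 = E[1/((X + 1)(X + 2))] derived in the Appendix. A sketch of the n ≥ 3 branch under those closed forms (the function name is ours):

```python
def c_approx(n, p):
    """Approximate optimal shrinkage parameter from (18), for n >= 3 (a sketch)."""
    q = 1 - p
    d1 = (1 - q ** (n + 1)) / (p * (n + 1))                    # E[1/(X+1)]
    d2 = (1 - q ** (n + 2) - (n + 2) * p * q ** (n + 1)) / (
        p**2 * (n + 1) * (n + 2))                               # E[1/((X+1)(X+2))]
    return 1 - (q ** (n + 1) / p) / ((n + 1) * (1 + d1) * d2 - d1)

# c_n lies in (0, 1) and approaches 1 as n grows:
print(round(c_approx(10, 0.5), 4), round(c_approx(200, 0.5), 6))
```

Since the numerator p^{−1}(1 − p)^{n+1} decays exponentially in n, the approximate c_n converges quickly to the Fattorini choice c = 1 as the sample size increases.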
In addition, our approximate solution always works well as long as n or p is not extremely small. To apply Theorem 4 for the optimal shrinkage parameter, we need a plug-in estimator for the unknown p. Intuitively, the MLE of p, p̂_MLE = X/n, can serve as a natural choice. By doing so, however, for n = 1 we have ĉ_1 = p̂_MLE = X, which yields θ̃(ĉ_1) = (1 + ĉ_1)/(X + ĉ_1) = (1 + X)/(2X), and this again suffers from the zero-event problem. For n = 2, the same problem remains. For n ≥ 3, the approximate solution no longer suffers from the zero-event problem; but on the other side, the denominator term, (n + 1)(1 + D_1)D_2 − D_1, in (18) will be zero when X = n, and consequently the approximate solution is still not applicable. To conclude, the MLE of p cannot be directly applied as the plug-in estimator when applying Theorem 4 to estimate the inverse proportion. To overcome the boundary problems on both sides, we consider the plug-in estimator of p with the following structure:

p̂_plug(α) = min(max(p̂_MLE, α), 1 − α),

where 0 < α ≤ 0.5 is the threshold parameter. Then with p̂_plug(α) as the plug-in estimator of p, we let c̃_n(α) be the estimator of c_n in Theorem 4. To determine the best threshold value for practical use, we take several different α and then compute the relative bias of the estimator by

RB = (1/N) Σ_{k=1}^N (θ̃_k − θ)/θ,   (19)

where θ̃_k is a generic form of θ̃(c̃_n(α)) for the kth data set. Specifically, in Figure 3 we plot the relative biases of the estimator as functions of θ for α = 0.1, 0.2, 0.3, 0.4, 0.5 and n = 1, 2, 10, 50; for comparison, the relative biases of the Fattorini estimator are also presented.

Figure 3: The relative biases of the estimators with n = 1, 2, 10 or 50, where "1" represents the relative biases associated with α = 0.1, "2" with α = 0.2, "3" with α = 0.3, "4" with α = 0.4, and "5" with α = 0.5; for comparison, "0" represents the relative biases of the Fattorini estimator.
In the top two panels of Figure 3, it is evident that a small threshold value, say α = 0.1 or 0.2, may not provide an adequate remedy for the boundary problems when n is extremely small. Note also that p̂_plug(α) = 0.5 when α = 0.5. Then, given from Figure 2 that c_n is always close to 1 when p = 0.5, the resulting estimator of θ with α = 0.5 will be nearly the same as the Fattorini estimator when n is large. And for moderate sample sizes, say n = 10 and n = 50, the bottom two panels of Figure 3 show that the best value of α should be neither too small nor too large. Taken together, we recommend the adaptive threshold value α_n = 1/(2 + ln(n)), which follows a decreasing trend, with, for example, α_1 = 0.5, α_10 = 0.23, α_100 = 0.15, and α_1000 = 0.11. Then with p̂_plug(α_n) = min(max(p̂_MLE, α_n), 1 − α_n) as the plug-in estimator, our final estimator of the inverse proportion is given by

θ̃(c̃_n) = (n + c̃_n)/(X + c̃_n),   (20)

where c̃_n = c_n(p̂_plug(α_n)) is the estimator of c_n given in Theorem 4. In addition, we derive the asymptotic properties of estimator (20) in the following theorem with the proof in Appendix E.

Theorem 5. Let X be a binomial random variable with parameters n and p. For the estimator θ̃(c̃_n) in (20), we have c̃_n = 1 + o_p(1), and θ̃(c̃_n) is a consistent estimator of θ.

In this section, we conduct simulation studies to evaluate the finite-sample performance of our new estimator in (20) for the inverse proportion. For comparison, five existing estimators in the literature are also considered, including the Walter estimator in (7), the Pettigrew estimator in (8), the piecewise Walter estimator in (10), the Carter estimator in (11), and the Fattorini estimator in (13). For the simulation settings, we let θ range from 1.02 up to 50, which is equivalent to p ranging from 0.98 down to 0.02, and consider n = 10, 50 or 200 as three different sample sizes. We further generate N = 1,000,000 data sets from the binomial distribution for each combination of θ and n.
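Putting the pieces together, the final estimator (20) can be sketched as follows; this is a hedged illustration restricted to the n ≥ 3 approximate branch of Theorem 4, with the adaptive threshold α_n = 1/(2 + ln n), and the function name is ours:

```python
from math import log

def theta_new(x, n):
    """Sketch of estimator (20): theta_tilde(c_n) with plug-in p_plug(alpha_n).

    Only the n >= 3 approximate branch of Theorem 4 is implemented here.
    """
    alpha = 1 / (2 + log(n))                   # adaptive threshold alpha_n
    p = min(max(x / n, alpha), 1 - alpha)      # plug-in p_plug(alpha_n)
    q = 1 - p
    d1 = (1 - q ** (n + 1)) / (p * (n + 1))
    d2 = (1 - q ** (n + 2) - (n + 2) * p * q ** (n + 1)) / (
        p**2 * (n + 1) * (n + 2))
    c = 1 - (q ** (n + 1) / p) / ((n + 1) * (1 + d1) * d2 - d1)
    return (n + c) / (x + c)

# Well defined even in the zero-event case X = 0:
print(round(theta_new(0, 10), 3))
```

The clipping step is what keeps the estimator valid at both boundaries X = 0 and X = n, where the raw MLE plug-in would break down.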
Finally, we compute the relative bias by (19) and the Stein loss by (15) for each estimator, and then report the simulation results in Figure 4.

Figure 4: The relative biases and the Stein losses of the six estimators with n = 10, 50 or 200, where "*" represents our new estimator, "1" represents the Walter estimator, "2" represents the Pettigrew estimator, "3" represents the piecewise Walter estimator, "4" represents the Carter estimator, and "5" represents the Fattorini estimator.

From Figure 4, our new estimator performs better than, or as well as, the existing competitors in most settings; the Fattorini estimator can also be recommended by virtue of its simple form and its good performance when the sample size is reasonably large. In this section, we apply our new estimator to a meta-analysis on COVID-19 data with zero-event studies. As shown in the top panel of Figure 5, seven studies were included in the meta-analysis of physical distancing by Chu et al. (2020), six of which suffered from the zero-event problem. For the four single-zero-event studies, the 0.5 continuity correction was added to all the counts of events, while the two double-zero-event studies were not included in the meta-analysis. By Xu et al. (2020) and our simulation results, adding the 0.5 continuity correction is suboptimal. Moreover, Xu et al. (2020) also showed that double-zero-event studies may still be informative, and so excluding them can be questionable and may even alter the results. In view of the above limitations, we re-conducted the meta-analysis on COVID-19 data including the two double-zero-event studies. Specifically, by applying our new estimator in (20), the relative risks are estimated by

R̂R(c̃_n) = (X_1 + c̃_{n_1})(n_2 + c̃_{n_2}) / [(X_2 + c̃_{n_2})(n_1 + c̃_{n_1})],   (21)

where c̃_{n_1} and c̃_{n_2} are the estimates of the optimal shrinkage parameter for the exposed group and the unexposed group, respectively. For comparison, we also conduct a meta-analysis for all seven studies with the 0.5 continuity correction, and then present all the forest plots in Figure 5.
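The confidence intervals in the forest plots are conventionally built on the log scale with the delta-method variance Var[ln R̂R] ≈ 1/X_1 − 1/n_1 + 1/X_2 − 1/n_2. A hedged sketch (the function name is ours; the formula is undefined when X_1 or X_2 is zero, which is exactly the zero-event problem):

```python
from math import exp, log, sqrt

def log_rr_ci(x1, n1, x2, n2, z=1.959964):
    """Wald-type CI for RR on the log scale, using the variance
    Var[ln RR] ~= 1/X1 - 1/n1 + 1/X2 - 1/n2; undefined if x1 or x2 is 0."""
    lrr = log((x1 / n1) / (x2 / n2))
    se = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    return exp(lrr - z * se), exp(lrr + z * se)

lo, hi = log_rr_ci(10, 100, 20, 100)
print(round(lo, 3), round(hi, 3))
```

As the counts shrink toward zero, the 1/X terms blow up, which is why continuity-corrected zero-event studies produce the very wide intervals discussed next.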
From the middle and bottom panels of Figure 5, it is evident that the new meta-analytical results with the double-zero-event studies also support the claim that a further distance will reduce the virus infection. On the other hand, the evidence becomes less significant as the combined relative risks get larger. Moreover, by comparing the two forest plots that both include the double-zero-event studies, we also note that our new estimator in (21) is able to yield a larger combined relative risk with a narrower confidence interval. By the variance function of ln(R̂R), namely 1/X_1 − 1/n_1 + 1/X_2 − 1/n_2, the 0.5 continuity correction may lead to a large estimate of the relative risk after the exponential transformation, especially when the zero-event problem occurs. Hence, the confidence intervals of the relative risks in the two double-zero-event studies are very wide, which indicates high uncertainty in the interval estimation. In contrast, by applying our new estimator of the inverse proportion, the confidence intervals for the double-zero-event studies become much narrower. The binomial proportion is a classic parameter originating from the binomial distribution, which has been well studied in the literature because of its wide range of applications. In contrast, the reciprocal of the binomial proportion, also known as the inverse proportion, is often overlooked, although it also plays an important role in various fields. In this paper, we reviewed the three families of shrinkage estimators for the inverse proportion and studied their statistical properties. Finally, we proposed a new estimator of the inverse proportion by deriving the optimal shrinkage parameter c in the family of estimators (4). To be more specific, we derived the explicit formula for the optimal c in Theorem 4 for n = 1 or 2, and an approximate formula for the optimal c for n ≥ 3. Further, to estimate the unknown p in the formula of the optimal shrinkage parameter, a plug-in estimator was introduced that also overcomes the boundary problem of p.
Simulation studies showed that our new estimator performs better than, or as well as, the existing competitors in most practical settings, and it can thus be recommended for estimating the inverse proportion in practical applications. Finally, we also applied our new estimator to a recent meta-analysis on COVID-19 data with the zero-event problem, and it yielded more reliable results for the scientific question of how physical distancing can effectively prevent the infection of the new coronavirus. To conclude the paper, we have made a serious effort to find the optimal estimator for the inverse proportion of the binomial distribution. According to Gupta (1967), there does not exist an unbiased estimator for the inverse proportion θ. To verify this result by proof by contradiction, assume that θ̂_u = η(X) is an unbiased estimator of θ. Then by definition,

E(θ̂_u) = Σ_{x=0}^n η(x) C(n, x) p^x (1 − p)^{n−x} = θ.

The left-hand side, the expected value of θ̂_u, is a polynomial in p of degree at most n. The right-hand side, however, equals θ = 1/p = Σ_{k=0}^∞ (1 − p)^k by the Taylor expansion, which is a power series in p of infinite degree. This shows that unbiasedness cannot hold for any finite n. In view of this property, there is probably no uniformly best estimator for the inverse proportion. Although we have obtained some useful results in this paper, we believe that more advanced research is still needed to further improve the estimation accuracy of the inverse proportion. For example, one may consider developing a better and more robust approximation for the optimal shrinkage parameter when the binomial proportion p is extremely small. In addition, other families of shrinkage estimators can also be considered to see whether they can yield better estimators for the inverse proportion. Last but not least, we note that our new estimator of the inverse proportion can have many other real applications.
For instance, the spirit of our new method may also be applied to estimate the number needed to treat (NNT), which is another important medical term and was first introduced by Laupacis, Sackett & Roberts (1988). Specifically, NNT is defined as NNT = 1/(p_1 − p_2), where p_1 is the event probability in the exposed group and p_2 is the event probability in the unexposed group. Noting also that p_1 − p_2 is the absolute risk reduction (ARR), NNT can be interpreted as the average number of patients who need to be treated to obtain one more patient cured compared with a control in a clinical trial (Hutton, 2000). Nevertheless, the estimation of NNT is more challenging than the estimation of the inverse proportion, mainly because the estimate of p_1 − p_2 can be either positive or negative, in addition to the zero-event problem in the denominator. More recently, Veroniki et al. (2019) also referred to this situation as a statistically nonsignificant result, which may lead to an unexpected calculation complication. In addition to NNT, Zhang & Yin (2021) proposed the reduction in number to treat (RNT) as a new measure of the treatment effect in randomized controlled trials. Specifically, let θ_1 = 1/p_1 be the average number of patients who need to be treated to obtain one patient cured in the exposed group, and θ_2 = 1/p_2 be the corresponding number in the unexposed group; then RNT is defined as RNT = θ_2 − θ_1 = 1/p_2 − 1/p_1. Also by (2), the MLE of RNT is given as R̂NT_MLE = n_2/X_2 − n_1/X_1, which once again is not applicable when X_1 or X_2 is zero. Thus, to study the statistical inference of RNT, a valid estimate of each inverse proportion that does not suffer from the zero-event problem is also required.
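As a hypothetical illustration (not the paper's proposal), each MLE n_i/X_i in R̂NT_MLE could be replaced by a shrinkage estimate from family (4), which removes the zero-event failure; the function name and the fixed choice c = 1 are our assumptions:

```python
def rnt_shrunk(x1, n1, x2, n2, c=1.0):
    """Hypothetical RNT estimate: replace each MLE n_i/X_i with the
    family-(4) form (n_i + c)/(X_i + c); valid even when a count is zero."""
    return (n2 + c) / (x2 + c) - (n1 + c) / (x1 + c)

# Zero events in the unexposed group no longer break the estimate:
print(rnt_shrunk(5, 20, 0, 20))  # 17.5
```

The sign of the estimate still flips with the direction of the treatment effect, so the nonsignificance complication discussed above remains.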
We expect that our new work in this paper will shed light on new directions for NNT and RNT estimation, which can be particularly useful in clinical trials and evidence-based medicine.

To prove (ii), for θ̂(c) = (n + 2c)/(X + c) in (3), we note by the central limit theorem that √n(X/n − p) →D N(0, p(1 − p)). Then by Slutsky's Theorem and the delta method applied to the map t ↦ 1/t, it yields that √n(θ̂(c) − θ) →D N(0, (1 − p)/p³) = N(0, θ²(θ − 1)). The proofs for the other two estimators are similar, and so are omitted. Consequently, the three estimators in (3), (4) and (9) are asymptotically equivalent.

Proof of Theorem 2. Without loss of generality, take p = 0.5, so that θ = 2, and write E[θ̂(c)] = 2^{−n} h(c), where h(c) = Σ_{x=0}^n [(n + 2c)/(x + c)] C(n, x). Hence, to show that the estimator is unbiased for θ = p^{−1} = 2, it is equivalent to show that there exists a value c > 0 such that h(c) = 2^{n+1}. When n is an even number, by noting that C(n, x) = C(n, n − x), we can rewrite the first derivative as h′(c) = Σ_{x=0}^{n/2−1} C(n, x)(n − 2x)[1/(n − x + c)² − 1/(x + c)²], where the term with x = n/2 is zero and so is excluded. Note also that, for any x = 0, . . . , n/2 − 1, we have n − x > x and further 1/(n − x + c)² < 1/(x + c)². This shows that h′(c) < 0 for any c > 0. When n is an odd number, we can write the first derivative of h(c) as h′(c) = Σ_{x=0}^{(n−1)/2} C(n, x)(n − 2x)[1/(n − x + c)² − 1/(x + c)²], and similarly we can show that h′(c) < 0 for any c > 0. Combining the above results, h(c) is a strictly decreasing function of c on (0, ∞). In addition, for any finite n we note that lim_{c→∞} h(c) = 2 Σ_{x=0}^n C(n, x) = 2^{n+1}, so that h(c) > 2^{n+1} for any finite c > 0. This shows that there does not exist a finite value of c > 0 such that h(c) = 2^{n+1}, and so Theorem 2 holds.

Proof of Theorem 3. To prove (ii), we verify that the first derivative of g(c) satisfies g′(c) = Σ_{x=0}^{n−1} [(x − n)/(x + c)²] C(n, x) p^x (1 − p)^{n−x} < 0. Hence, g(c) is a strictly decreasing function of c on (0, ∞). To prove (iii), we show that the second derivative of g(c) satisfies g′′(c) = Σ_{x=0}^{n−1} [2(n − x)/(x + c)³] C(n, x) p^x (1 − p)^{n−x} > 0. As a consequence, g(c) is a strictly convex function of c on (0, ∞).

Proof of Theorem 4. For n = 1, equation (17) gives (1 + c)(1 − p)/c + p = 1/p, which yields c_1 = p. For n = 2, equation (17) reduces to a quadratic equation in c; after factorizing this equation, the solutions are c = p − 0.5 ± √(0.5 − (p − 0.5)²). To keep the estimator positive, the value of c is required to be positive, so c_2 = p − 0.5 + √(0.5 − (p − 0.5)²).
To get the solution of c when n ≥ 3, we apply the Taylor expansion of 1/(X + c) around c = 1, which yields

1/(X + c) = 1/(X + 1) − (c − 1)/(X + 1)² + O((c − 1)²).   (24)

By (16) and (24), for any finite n we have

g(c) = E[(n + c)/(X + 1)] − E[(n + c)(c − 1)/(X + 1)²] + O((c − 1)²) = E[(n + c)/(X + 1)] − E[(n + 1)(c − 1)/(X + 1)²] + O((c − 1)²).   (25)

Let D_1 = E[1/(X + 1)] and D_2 = E[1/((X + 1)(X + 2))]. For D_1, with s = x + 1 we have

D_1 = (1/(n + 1)) Σ_{x=0}^n [(n + 1)/(x + 1)] C(n, x) p^x (1 − p)^{n−x} = (1/(p(n + 1))) Σ_{s=1}^{n+1} C(n + 1, s) p^s (1 − p)^{n+1−s} = [1 − (1 − p)^{n+1}] / (p(n + 1)).

And for D_2, with s = x + 2 we have

D_2 = (1/((n + 1)(n + 2))) Σ_{x=0}^n [(n + 1)(n + 2)/((x + 1)(x + 2))] C(n, x) p^x (1 − p)^{n−x} = (1/(p²(n + 1)(n + 2))) Σ_{s=2}^{n+2} C(n + 2, s) p^s (1 − p)^{n+2−s} = [1 − (1 − p)^{n+2} − (n + 2)p(1 − p)^{n+1}] / (p²(n + 1)(n + 2)).

Now with D_1 and D_2, to derive the solution of c, we take the approximation E[1/(X + 1)²] ≈ (1 + D_1)D_2 and ignore the remainder term O((c − 1)²) in (25). Consequently, we have the approximate equation (n + c)D_1 − (n + 1)(c − 1)(1 + D_1)D_2 ≈ 1/p, which yields the approximate solution of c as

c_n ≈ 1 − [1/p − (n + 1)D_1] / [(n + 1)(1 + D_1)D_2 − D_1] = 1 − p^{−1}(1 − p)^{n+1} / [(n + 1)(1 + D_1)D_2 − D_1].

Noting also that lim_{n→∞} D_1 = 0, it leads to (n + 1)(1 + D_1)D_2 − D_1 = O(1/(n + 2)). Moreover, since p^{−1}(1 − p)^{n+1} decays to zero at an exponential rate for any fixed p ∈ (0, 1), we have c_n = 1 + o(1) as n → ∞. Recall that the plug-in estimator p̂_plug(α_n) = min(max(p̂_MLE, α_n), 1 − α_n) with α_n = 1/(2 + ln(n)) takes values in (0, 1) and converges to p in probability; then we have c̃_n = c_n(p̂_plug(α_n)) = 1 + o_p(1). Finally, by a similar argument as in Theorem 1, θ̃(c̃_n) is a consistent estimator of θ.
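The closed forms of D_1 and D_2 derived above can be verified by direct enumeration of the binomial probability mass function; a small check (function names are ours):

```python
from math import comb

def enum_moment(n, p, f):
    """E[f(X)] for X ~ Binomial(n, p), by direct enumeration of the pmf."""
    return sum(f(x) * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1))

n, p, q = 12, 0.3, 0.7
d1_closed = (1 - q ** (n + 1)) / (p * (n + 1))
d2_closed = (1 - q ** (n + 2) - (n + 2) * p * q ** (n + 1)) / (p**2 * (n + 1) * (n + 2))
print(abs(enum_moment(n, p, lambda x: 1 / (x + 1)) - d1_closed) < 1e-12)              # True
print(abs(enum_moment(n, p, lambda x: 1 / ((x + 1) * (x + 2))) - d2_closed) < 1e-12)  # True
```

Both identities hold exactly, since each sum telescopes into a binomial expansion of order n + 1 and n + 2 respectively.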
Figure 6: The relative biases and the Stein losses of the six estimators with n = 10, 50 or 200 and θ ∈ [1.02, 2], where "*" represents the simulation results of our new estimator, "1" represents the Walter estimator, "2" represents the Pettigrew estimator, "3" represents the piecewise Walter estimator, "4" represents the Carter estimator, and "5" represents the Fattorini estimator.
Proof. To prove (i), the inverse of θ̂(c) = (n + 2c)/(X + c) in (3) is given as

1/θ̂(c) = (X + c)/(n + 2c) = [n/(n + 2c)](X/n) + c/(n + 2c).

For any p ∈ (0, 1), we note that X/n converges to p in probability as n → ∞. Thus for any fixed c > 0, by Slutsky's Theorem, formula (22) also converges to p in probability as n → ∞. This shows that θ̂(c) converges to θ in probability as n → ∞ for any θ ∈ (1, ∞), i.e., θ̂(c) is a consistent estimator of θ.
The proofs for the other two estimators are similar, and so are omitted for the sake of brevity.

To prove (ii), let Y be a Bernoulli random variable with success probability p, and let L(p; Y) denote its log-likelihood function. By the asymptotic normality of the MLE, the stated limiting distribution follows.

Appendix C: Proof of Theorem 3

Proof. To prove (i), we note that (n + c)/(x + c) is a rational function of c and so is always continuous on the domain (0, ∞). Now since n is also finite, g(c) is a continuous function of c on (0, ∞). Also, for the limit of g(c), we have lim_{c→∞} g(c) = 1.

Proof. When n = 1, equation (17) becomes ((1 + c)/c)(1 − p) + p = 1/p, from which we obtain c_1 = p. When n = 2, it is necessary to solve

((2 + c)/c)(1 − p)^2 + 2[(2 + c)/(1 + c)]p(1 − p) + p^2 = 1/p.

Proof. For c_n in formula (18) with p ∈ (0, 1), by (26) and (27) we have
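The exact solutions for n = 1 and n = 2 can be verified numerically. The following sketch (a check under the reading g(c) = E[(n + c)/(X + c)], not the authors' code) confirms that c_1 = p solves the n = 1 equation and that c_2 = p − 0.5 + √(0.5 − (p − 0.5)^2) solves the n = 2 equation g(c) = 1/p:

```python
from math import comb, isclose, sqrt

def g(n: int, p: float, c: float) -> float:
    """g(c) = E[(n + c) / (X + c)] for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) * (n + c) / (x + c)
               for x in range(n + 1))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    c1 = p                                      # exact solution for n = 1
    c2 = p - 0.5 + sqrt(0.5 - (p - 0.5)**2)     # positive root for n = 2
    assert isclose(g(1, p, c1), 1 / p)
    assert isclose(g(2, p, c2), 1 / p)
```

For n = 1 the equation is linear in 1/c and gives c_1 = p directly; for n = 2 it reduces to a quadratic in c, and only the positive root keeps the estimator well defined.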