key: cord-0306793-mighlgpy
authors: Guo, J.; Xiao, M.; Chu, H.; Lin, L.
title: Meta-analysis methods for risk difference: a comparison of different models
date: 2022-05-10
journal: nan
DOI: 10.1101/2022.05.06.22274777
sha: 8cc44a2a358660d3882403bb9f515403657f7c51
doc_id: 306793
cord_uid: mighlgpy

Risk difference is a frequently-used effect measure for binary outcomes. In a meta-analysis, commonly-used methods to synthesize risk differences include: 1) the two-step methods that estimate study-specific risk differences first, then followed by the univariate common-effect model, fixed-effects model, or random-effects models; and 2) the one-step methods using bivariate random-effects models to estimate the summary risk difference from study-specific risks. These methods are expected to have similar performance when the number of studies is large and the event rate is not rare. However, studies with zero events are common in meta-analyses, and bias may occur with the conventional two-step methods from excluding zero-event studies or using an artificial continuity correction to zero events. In contrast, zero-event studies can be included and modeled by bivariate random-effects models in a single step. This article compares various methods to estimate risk differences in meta-analyses. Specifically, we present two case studies and three simulation studies to compare the performance of conventional two-step methods and bivariate random-effects models in the presence or absence of zero-event studies. In conclusion, we recommend researchers using bivariate random-effects models to estimate risk differences in meta-analyses, particularly in the presence of zero events.

1 Introduction tackle this problem. 18 Either studies with DZEs are not included in the meta-analysis, or they are included after applying the continuity correction; both approaches bias the result. 28,29 Such problems would not occur with the one-step method; DZEs are modeled directly by bivariate random-effects models. 30 In this article, methods for estimating the RD are discussed in their mathematical formulation and compared empirically. We first describe the statistical rationale of all methods mentioned in Section 2. Then, Section 3 shows the results of methods for two real case studies, one without zero-event studies and one with zero-event studies. In Section 4, we explore and contrast the performance of each method by simulated meta-analyses under a variety of conditions. We will discuss our findings and conclude in Section 5.

Consider a meta-analysis of K studies with binary outcomes, and let k = 1, 2, . . . , K index the studies. For study k, we denote the number of patients in the experimental group by n 1k , the number of patients in the control group by n 2k , the number of events in the experimental group by e 1k , and the number of events in the control group by e 2k . In addition, the probability of events in the experimental group is p 1k and that in the control group is p 2k . For all the K studies, the true overall probability of events in the experimental group is denoted by p 1 , and that in the control group is denoted by p 2 . The number of events are assumed to follow binomial distributions: e 1k ∼ Binomial (n 1k , p 1k ) and e 2k ∼ Binomial (n 2k , p 2k ). The RD measures the difference in the event probabilities between the experimental and control groups, i.e., δ k = p 1k − p 2k in study k. When one assumes studies share the same true RD, δ k becomes δ, which is the true overall RD across all studies. Let δ method be δ for a particular method, e.g., δ CE means δ in the CE model. The observed RD in study k is denoted by d k .

The two-step methods first estimate the RD of each study and then calculate the pooled RD. In the first step, the probability of events of each group is estimated asp ik = e ik /n ik for i = 1, 2, and its variance estimate is Var (p ik ) = e ik (n ik − e ik )/n 3 ik . The estimate of the RD for study k is d k =p 1k −p 2k = e 1k /n 1k − e 2k /n 2k with variance estimatê σ 2 k = Var(d k ) = e 1k (n 1k − e 1k )/n 3 1k + e 2k (n 2k − e 2k )/n 3 2k .

In cases of DZE studies (with e ik = 0 or e ik = n ik ), it is questionable that the variance estimate turns to be zero. In the second step, the RDs d k 's from the individual studies are pooled together via certain meta-analysis methods such as the CE, FE, and RE models as follows.

One simple meta-analysis approach is to assume a common RD across all studies, i.e., the CE model, which is also commonly referred to as the FE model or equal-effect model. 22 , 31 We will distinguish between the CE model and FE model in Section 2.2. Under the CE model assumption, each component study comes from the same population where the differences between the observed effects are only due to random variations. The model is written as:

∼ N (0, 1), k = 1, 2, . . . , K,

where k is the random variation of observed RD from δ CE , and the σ k describes the variation. The variance estimate is given by Equation (1) . Conventionally, the estimated standard errorŝ σ k are used to replace the true standard errors σ k in Equation (2) . Because the estimate of σ 2 k in a DZE study is zero, DZEs may cause estimation problems for δ CE . In this case, an estimate to δ CE needs some adjustments of the observed RDs from individual studies. 32

The IV method is one of the classical methods for implementing the CE model. 33 It estimates the pooled RD as a weighted average of individual RD of each study, where the weight given to each study is the inverse of the variance of the RD:

with w k,IV = 1

Var(d k )

.

The variance estimates are zero in DZE studies, making the IV estimate incalculable. There are two ways to deal with this problem. One way is to remove the DZE studies. Another way is to add a continuity correction c 1k to e 1k and n 1k − e 1k and add c 2k to e 2k and n 2k − e 2k ; a common choice of the continuity correction is c 1k = c 2k = 0.5. An alternative continuity correction is c 1k and c 2k that satisfy c 1k /c 2k = n 1k /n 2k and c 1k + c 2k = 1; we call this correction method as the treatment arm continuity correction (TACC with w k,MH = n 1k n 2k n 1k + n 2k .

The advantage of this method is that it can naturally handle DZE studies as the weights are based on sample sizes that are always positive, instead of variances. Nevertheless, a continuity correction is still applied to the MH method as a convention. 29 2.2 Two-step method: fixed-effects model

The CE model assumes that all studies have the same effect, but this assumption may not strictly hold in real-world settings. The FE model permits each study to have different true treatment effects: 31,37

where δ k,FE denotes the true RD of study k. Different types of weights could be given to d k to estimate δ k,FE . A commonly-used weight is the inverse of the observed sample variance, i.e, 1/σ 2 k . In this case, the estimation of the FE model is the same as the CE model using the IV method. 20

In real-world cases, it is common that different component studies come from heterogeneous populations. 38,39 Consequently, the true RDs in each study are different, rather than sharing a common RD, e.g., δ IV . The heterogeneity statistic Q can test whether RDs are heterogeneous. 40 It is defined as Q = K k=1 w k,IV (d k −δ IV ) 2 and follows approximately a χ 2 K−1 distribution under the null hypothesis of homogeneity, i.e., H 0 : δ k = δ (k = 1, 2, . . . , K). If the null hypothesis is rejected, then the effects are different across studies due to their heterogeneous populations. To account for the heterogeneity of study-level true effects, the RE model uses random effects to represent the between-study heterogeneity. The true study effects are assumed to follow a normal distribution with mean zero and a between-study variance τ 2 :

Here, the summarized RD is δ RE , and u k is the random effect due to between-study heterogeneity. This model aims to estimate both δ RE and τ . We can first estimate τ and then incorporate τ in IV method in the CE model to estimate δ RE , i.e., w * k = 1/(σ 2 k +τ 2 ). The CE model is a special case of the RE model when τ = 0. As in the CE model, the RE model still has technical difficulties in the presence of DZE studies. The RE model may predict a RD outside the range of −1 to 1 due to unbounded normally distributed random effects. An accurate estimate of the . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ; https://doi.org/10.1101/2022.05.06.22274777 doi: medRxiv preprint between-study variance (τ 2 ) may reduce such a problem. Many estimators of the between-study variance τ 2 exist 32,41 ; we review 8 of them as follows.

The DL estimator is the most widely used method to estimate τ 2 in the current literature. 42 The estimator is given by:

where Q is the the heterogeneity statistic.

The HE estimator is an unbiased estimator of τ 2 given by: 43

The major advantage of HE estimator is its simplicity because it does not consider different weights across studies. It can be used as an initial value for some estimators that need to be obtained via iterations.

The PM estimator applies iterations to the generalized Q statistic: 44

The generalized Q statistic is expected to equal K − 1, and Q (r+1) PM is the generalized Q statistic at the (r + 1)th iteration. Theδ PM just slightly over zero and stop when the change in τ 2 PM is less than a specific number or Equation (4) satisfies. (3) is essentially a linear mixed model, which could be implemented by the ML estimation. 45 The log-likelihood function which is given by:

. CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint Taking partial derivatives to the log-likelihood function, τ 2 ML could be solved via the following equation:τ

where w k,ML = (τ 2 ML +σ 2 k ) −1 andδ ML is the estimated δ ML . We estimateτ 2 ML and δ ML through iterations; the iterations can start from an initial value of 0 for δ ML and use the estimate of δ from other methods (e.g.,δ HE ) as the initial value for δ ML . The convergence is typically achieved within 10 iterations. 46 2.3.5 Restricted maximum-likelihood estimator The ML estimator fails to account for the heterogeneity of the data in the first step of iteration and is sometimes negatively biased; the REML estimator solves these problems by a transformation. 45,47 Specifically, the REML estimation modifies the log-likelihood function as

The estimator of τ 2 REML is obtained from solving the following equation:

whereδ ML is the likelihood-based estimate of δ, which is identical to the one obtained as the ML estimator. The weight w k,REML is (τ 2 REML +σ 2 k ) −1 . This estimation also requires iterations. Of note, the REML estimator may be obtained using an "approximate" equation: 48-50

Equations (5) and (6) are equivalent only when studies are homogeneous. This situation is rare in practice, so Equation (5) should be preferred. 45 . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint 

Here, w k can be any valid weights, and a convenient type of weights is w k,IV .

The SJ estimator is a general between-study variance estimator that is based on the weighted residual sum of squares as follows: 52

The EB estimator is a multi-parameter shrinkage estimator of τ . 53 If δ k is the true effect size for study k andδ IV is the effect estimate from the IV method, then

The posterior distribution of δ k is:

. Similar toτ 2 REML , τ 2 EB can be estimated by iterations using the following equations if the within-study variances σ 2 k are homogeneous:τ

where w k,EB = (τ 2 EB +σ 2 k ) −1 . As discussed earlier, σ 2 k often differs among studies in a metaanalysis. In that case,τ 2 EB is obtained by the EM algorithm. 54 The EB estimator is essentially the same as the PM estimator. 55 . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint 

Contrary to the two-step methods introduced above, the one-step methods do not force a normal distribution on the estimated RDs from individual studies. Instead, they typically assume that the event counts of individual studies follow binomial distributions. Thus, the one-step methods need no continuity correction for zero counts and avoid bias from continuity corrections. Here, we focus on the bivariate random-effects models: 

where the link function g(·) is usually the logit or probit link. To facilitate the estimation, the inverse of Fisher's z transformation, ρ = [exp(2z) − 1]/[exp(2z) + 1], is used to transform the correlation coefficient. 56 The parameters v 1 , v 2 , and ρ can be obtained via estimation methods such as the ML. The overall mean of p 1k and p 2k (k = 1, 2, . . . , K) across studies, i.e., p 1 and p 2 , are calculated as follows:

where φ(·) denotes the standard normal density. Then, the overall RD by the BGLMM is

In the BBBM, the joint distribution of p 1k and p 2k is modeled by the Sarmanov bivariate beta distribution in their original scales as: 27,57

. CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ;  where α i , β i > 0, ω is a real number that satisfies 1 + ω

is the beta function, and µ i is the mean of the binomial event probability, 58 The correlation coefficient between the two groups is

and ω depends on ρ and Var(p i ). To ensure that ρ lies between −1 and 1, we set some constraints on ω as follow: 27

where η is a real number.

Consequently, the marginal joint distribution function of the BBBM is

The ML estimation could be used to estimate the parameters in this model and the log-likelihood function is

If α i and β i are known, the overall RD from the BBBM can be calculated as

In this article, all the two-step methods are implemented by the "meta" package (version 5.2-0) in R (version 4.1.3). 32, 59 The bivariate random-effects models are implemented by SAS (version 9.4) using PROC NLMIXED. 60 It uses an adaptive Gaussian quadrature to approximate the likelihood integrated over the random effects, and it maximizes the likelihood function by dual quasi-Newton optimization techniques for the BGLMM and BBBM.

. CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ;

In order to show and compare the empirical performance of various methods in estimating RDs, we apply the above methods to two real datasets, one without zero events and another with zero events, as shown in Tables 1 and 2. Detailed code for case studies could be found in the Supplementary Materials. 

Due to the widespread of COVID-19 all over the world, evidence on the effectiveness of prevention measures can be combined to inform decisions. Chu et al. 62 performed a meta-analysis on different prevention measures in preventing the transmission of COVID-19 and related diseases. We focus on the meta-analysis that compared the use of face masks with no face masks in healthcare settings. The meta-analysis had 26 studies in total, including 5 single-zero-event studies and 6 DZE studies. The total number of patients was 9,445; 6,003 were in the face mask group, and 3,442 were in the no face mask group. This dataset is used to demonstrate the impact of zero-event studies in a meta-analysis.

Tables 3 and 4 present the estimates of the overall RD (δ) and between-study standard deviation (τ ) from each model, respectively. All models did not encounter estimation difficulties in the risk of stunting in left-behind children dataset. Similarly, no estimation difficulty occurred in the risk of COVID-19 infection with face masks dataset after corrections for zero events. The implementations of the bivariate random-effects models took slightly longer time than the onestep methods. Table 3 shows that the estimated RDs were more consistent among different models (from 0.008 to 0.015) in the case study of stunting risk than the results in the case study of COVID-19 risk, where different models produced a range of RD estimates from −0.197 to −0.032, likely . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ; https://doi.org/10.1101/2022.05.06.22274777 doi: medRxiv preprint caused by DZE studies. If DZE studies were removed from the meta-analysis and zero event was not corrected, the RDs were overestimated below 0. Two continuity correction methods, i.e., adding a continuity correction of 0.5 to each cell (c 1k = c 2k = 0.5) and TACC, produced similar estimates. The estimated RDs from bivariate random-effects models were similar or slightly smaller to the estimates from the RE model with continuity corrections. All these estimates suggested a smaller effect of face masks than the RE model with DZE studies removed. Finally, link functions in BGLMMs resulted in little differences inδ.

In the case study of comparing stunting risks between left-behind children and children not left-behind, the I 2 statistic was larger than 70%, 63 and thep-value of the Q test of heterogeneity was less than 0.001. This suggested a high between-study heterogeneity so the RE model was suitable. Different heterogeneity estimators under the RE model gave consistentτ ; the largest difference between all τ estimates was 0.007 (Table 4 ).

Contrary to the small variation ofτ in the case study of stunting risk, a wide range of estimated τ 2 was observed under various heterogeneity estimators in the case study of COVID-19 risk. Because I 2 in this dataset was larger than 80% and the p-value of the Q test was less than 0.001 in the two-step methods, the RE model was also suitable. Among all the estimators for τ , the DL and HS estimators give smaller results than others, leading to substantial differences inδ.

The considerable differences of the various methods in these case studies leave us to wonder which model provides the best estimates of RDs. Section 4 will use simulated meta-analyses to further investigate the performance of the models.

We evaluated different methods for the RD by simulated meta-analyses under different settings using the bivariate probit-normal model. 64 Since our main interest is to understand the performance of all models in the presence of DZE, we designed scenarios where heterogeneity is absent or present in a meta-analysis with DZE. Table 5 outlines settings that simulated three types of meta-analyses: meta-analyses without zero events (Settings 1-4), meta-analyses with zero events and homogeneous studies (Settings 5-8), and meta-analyses with zero events and heterogeneous studies (Settings 9-12). Each metaanalysis type included four simulation settings that were combinations of two different designs of correlations between event probabilities in the treatment and control groups and the number of studies. When no zero events exist in meta-analyses, we used 13 different models introduced in Section 2, including the CE, RE, and bivariate random-effects models. In meta-analyses with . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ;  zero events, we additionally applied three different continuity corrections to two-step models, resulting in an evaluation of 31 methods in total (Table 6) .

We used bias, mean squared error (MSE), and coverage probability (CP) of nominal 95% confidence interval (CI) of the estimated RD to quantify the methods' performance. The bias is the difference between an estimate and the true value. The MSE reflects the combination of bias and variance. The CP is the probability that a CI contains the true RD. If an estimate is accurate and precise, we expect that the bias is close to 0, the MSE is small, and the CP is around 95%. 65 Under each setting in Table 5 , we generated 5,000 meta-analyses. We used the bivariate probit-normal model to generate meta-analyses with binary events, where event counts follow binomial distributions and probit-transformed probabilities follow normal distributions. To be comprehensive in our simulations, we also simulated data from the BBBM; see details in the Supplementary Materials.

In real-world meta-analyses, event probabilities often have within-study correlations between the experimental and control groups. 66 We set the correlation coefficient ρ to be 0.3 and 0.6. Moreover, we set the number of studies K to be 10 and 30. The number of patients in each group for each study (n) was 100.

The bivariate probit-normal model includes four other important parameters, i.e., the means and variances of the event probabilities in the probit scale of the two groups. These four parameters affect the likelihood of DZEs occurrence in data. If the event probabilities in both the treatment and control groups are small, then a meta-analysis likely contains DZEs. Nevertheless, simulated meta-analyses with small event probabilities cannot guarantee the presence of zero events, and meta-analyses with large event probabilities may still contain zero events. Therefore, we added further restrictions to the event probabilities by controlling the 95% lower confidence limit of the event probabilities, instead of variances of probabilities, to increase the proportions of meta-analyses with and without zero events under the designed settings. Additionally, we introduced study heterogeneity in a meta-analysis from the difference between the mean and the 95% lower confidence limit of the event probabilities in both groups (p i 's).

We set the mean and 95% lower confidence limit of events for both settings with and without zero events. For meta-analyses without zero events, we set E(p 1 ) = 0.2, E(p 2 ) = 0.4 and set the 95% lower confidence limit of p 1 to 0.1 and 95% lower confidence limit of p 1 to 0.2 across all studies. In meta-analyses with zero events and homogeneous studies, we set E(p 1 ) = 0.01, E(p 2 ) = 0.03 and set the 95% lower confidence limit of p 1 to 0.005 and 95% lower confidence limit of p 2 to 0.01. To introduce heterogeneity in meta-analyses with zero events, we set E(p 1 ) = 0.08 and E(p 2 ) = 0.10 and set the 95% lower confidence limit of p 1 to 0.0001 and 95% lower confidence . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ; https://doi.org/10.1101/2022.05.06.22274777 doi: medRxiv preprint limit of p 2 to 0.0005. In such cases, the overall I 2 were above 0.80, in contrast to that I 2 < 0.20 in settings without heterogeneity. Table 5 gives a summary of simulation designs across all settings.

We summarized the bias (×10 −3 ), MSE (×10 −3 ), and CP (percentage) of 13 methods in metaanalyses without zero events in Table 7 and those of 31 methods in meta-analyses with zero events in Tables 8 and 9. Figure 1 visualizes the performance comparison among all methods, where each row represents each of the three performance metrics and columns correspond to three meta-analysis types.

When the true RD was −0.20 in meta-analyses without zero events, almost every method (except for Models I1 and I2) achieved nearly 90% CP (nominal level of 0.05). While the largest bias was around 0.05 (Model I1), most methods only had a bias of around 0.001. We observed small differences in MSE among all methods. CPs, however, varied across methods; the CE models (Models I1 and I2), in particular, had lower CPs (<75%) than other models. The CPs of BBBMs varied from 73.13% to 93.90% under different settings.

In meta-analyses with homogeneous studies and zero events, the performance across models had a greater variation than the previous settings without zero events (Table 8 and Figure 1 ). The bias ranged from 0.02 × 10 −3 to 8.27 × 10 −3 . Except for the MH method (Model I2), all two-step models (Model I1a-Model I1c and Model II1a-II8c) had a higher bias than the onestep models (Table 8 and Figure 1 ). In other words, the two-step models with the continuous correction for zero counts were generally biased. The problem was especially pronounced when a small bias towards the null might lead to an opposite sign to a small true RD. Different methods showed similar MSEs, and the CP ranged from 36.66% to 92.88%. While the one-step model and MH method had a CP >85%, two-step models showed a large variation of CP across different choices of models (CE or RE models) or between-study variance estimators in RE models. The variation was greater when the correlation (ρ) between treatment and control groups was larger (Settings 7 and 8 in Figure 1H ). Table 9 and Figure 1C , 1F, and 1G showed that one-step methods generally had a better performance than two-step methods when meta-analyses had zero events heterogeneity. The highest average bias was 12.68×10 −3 in CE model with artificial corrections given the true RD was only 20×10 −3 . Furthermore, most two-step models had higher biases than one-step models, and the bias for two-step models varied by both ρ and K. Different methods had similar MSEs to each other except for two-step models deleting DZE studies. The CP of all CE models were below 60%, suggesting that CE models were not suitable for meta-analysis with high between-study . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ; https://doi.org/10.1101/2022.05.06.22274777 doi: medRxiv preprint heterogeneity and zero events. CP for RE models were all larger than 75% and CP for one-step models are all larger than 90%. While BGLMM with the probit link (Mode III1) and BBBM (Model IV) had a consistently low bias and CP of at least 95%, bias or CP in BGLMM with the logit link (Model III2) was sub-optimal when ρ was 0.6 in Settings 10 and 12. A possible reason for such a variation is that the logit link is more likely to suffer from estimation difficulties when ρ is high in bivariate settings. 67 In summary, one-step models, especially BGLMM (probit link) and BBBM, outperformed other models considering all three metrics; thus, they are recommended for estimating meta-analysis with substantial heterogeneity and zero events.

By all evaluation metrics, bivariate random-effects models (Models III1, III2, and IV) had significantly better performance than the others when studies had zero events. Although the CE model with the MH method (Model I2) performed better than most two-step methods when the between-study heterogeneity was low, it had a significantly reduced performance in meta-analyses with high between-study heterogeneity. CE models with the IV method (Models I1a, I1b, and I1c) had poor performance when DZE studies were present, regardless of study heterogeneity. Models that removed DZE studies (Models II1a to II8a) or added continuity corrections (Models II1b to II8b and Models II1c to II8c) not only biased the RD estimate but also increased the MSE and reduced the CP. For a specific model (e.g., Model II4b/II4c vs. Model II4a), the performance of using continuity corrections was slightly better than that of deleting DZE studies. Adding 0.5 to zero events and TACC in DZE had similar performance ( Table 8 ). The variance estimator that produced the best CP depended on specific settings of meta-analyses with DZE. Although two-step models appeared to have similarly large biases (> 2 × 10 −3 ) for all meta-analyses with DZEs and homogeneous studies (Table 8) , only settings with 30 studies (Settings 6 and 8) per meta-analysis had a small CP. This was because a large number of studies (e.g., 30) increased the precision of the RD estimate, and 95% CIs around the biased estimates were unlikely to cover the true RD. This means that most two-step models (except for the CE model with the MH method) were biased in both the point and interval estimate when DZEs studies were present in a meta-analysis with low heterogeneity and a large number of studies. When there was substantial heterogeneity in a meta-analysis with DZEs, bias in all CE models (including the MH method) caused a poor CP.

This article comprehensively evaluates both the two-step and one-step models for estimating the RD in a meta-analysis with binary outcomes. While many existing studies have compared methods in estimating the odds ratio and the risk ratio under various situations, their conclusions . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ; https://doi.org/10.1101/2022.05.06.22274777 doi: medRxiv preprint may not easily extend to the RD because the RD can be defined in DZEs. In this study, we compared and evaluated multiple models to estimate the RD in meta-analyses with zero-event studies. Results from real-data case studies and simulations show that various models performed consistently in the absence of DZEs but had noticeable differences when DZEs appeared. For datasets with zero-event studies, we suggest using one-step methods, especially BGLMM (probit link) and BBBM. The CE model with the MH method is suitable when zero-event studies are not heterogeneous, in addition to one-step methods. In contrast to other models, these two methods do not need any artificial corrections for zero-event studies. Both had smaller biases and MSEs and higher CPs than other methods. Their performance remains superior under different settings. However, the CE model with the MH method assumes that all studies estimate the same effect, which may not be suitable in the presence of substantial heterogeneity. 22 Thus, when studies are heterogeneous, researchers should avoid using the MH method. One-step methods may be preferred over the MH method because they make no explicit assumption about homogeneity.

Although bivariate random-effects models have many advantages, some limitations still remain. Because bivariate random-effects models need sufficient data to estimate model parameters (p = 5 in bivariate models), the estimation may encounter singularity problems when the number of studies n is small in a meta-analysis. In order to avoid model singularity, the dataset should contain at least six studies such that n > p. We suggest researchers use two-step methods instead of bivariate random-effects models if there is a limited number of studies (e.g., ≤5). In addition, more parameters imply that estimation failures are more likely to arise with inappropriate starting values of parameters in bivariate random-effects models. Finally, some refined CIs of univariate random-effects models exist in the current literature. 68-70 While they were not fully investigated in this article, these alternative CI constructions are expected to improve the CPs for two-step methods. Methods to improve the CPs for the bivariate one-step methods demand further research.

Our work may spark future interests in estimating the RD from datasets of 2 × 2 tables, yet several limitations exist. Several RD estimators are not included in this article, such as Bayesian methods. 29,71 Bayesian methods have become popular in meta-analyses as they could overcome some computational difficulties encountered by traditional methods and permit the use of pertinent external information for improving the estimation. Current software packages are available to implement our one-step models from Bayesian approaches, such as the PROC BGLIMM in SAS. 72,73 Future studies may compare the performance of Bayesian methods with the methods discussed in this article. In addition, one may use other types of bivariate randomeffects models, such as the marginal beta-binomial model approach. 74 Finally, although we show that one-step models have better performance than two-step models, two-step models . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ;  that apply artificial corrections currently dominate meta-analysis applications, partly because they are readily implemented by most meta-analysis software. It is worthwhile to develop userfriendly software or modules for bivariate random-effects models for meta-analyses of RDs so that they can be conveniently implemented by practitioners.

The Supplementary Materials contain additional simulation studies and statistical code for reproducing the case studies. is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ; 

Children not left-behind is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint Table 3 . Estimates of the overall risk differenceδ in the two case studies. is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ; https://doi.org/10.1101/2022.05.06.22274777 doi: medRxiv preprint Table 5 . Simulation designs of meta-analyses from a bivariate probit-normal model.

95% lower Setting E(p 1 ) confidence limit of p 1 E(p 2 ) confidence limit of is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ; https://doi.org/10.1101/2022.05.06.22274777 doi: medRxiv preprint Table 6 . Models in the simulation study for risk differences.

Without zero-event studies With zero-event studies . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint The copyright this version posted May 10, 2022. ; Table 9 . Results of simulated meta-analyses with heterogeneous zero-event studies (I 2 >0.80).

Setting 10 Setting 11 Setting 12 . CC-BY-ND 4.0 International license It is made available under a license to display the preprint in perpetuity.

is the author/funder, who has granted medRxiv a (which was not certified by peer review) holder for this preprint I2  II1  II2  II3  II4  II5  II6  II7  II8  III1 III2 I1c I2 II1a II2a II3a II4a II5a II6a II7a II8a II1b II2b II3b II4b II5b II6b II7b II8b II1c II2c II3c II4c II5c II6c II7c II8c III1 III2 I1c I2 II1a II2a II3a II4a II5a II6a II7a II8a II1b II2b II3b II4b II5b II6b II7b II8b II1c II2c II3c II4c II5c II6c II7c II8c III1 III2 I2  II1  II2  II3  II4  II5  II6  II7  II8  III1 III2 I1c I2 II1a II2a II3a II4a II5a II6a II7a II8a II1b II2b II3b II4b II5b II6b II7b II8b II1c II2c II3c II4c II5c II6c II7c II8c III1 III2 I1c I2 II1a II2a II3a II4a II5a II6a II7a II8a II1b II2b II3b II4b II5b II6b II7b II8b II1c II2c II3c II4c II5c II6c II7c II8c III1 III2 I1  I2  II1  II2  II3  II4  II5  II6  II7  II8  III1 III2 I1c I2 II1a II2a II3a II4a II5a II6a II7a II8a II1b II2b II3b II4b II5b II6b II7b II8b II1c II2c II3c II4c II5c II6c II7c II8c III1 III2 I1c I2 II1a II2a II3a II4a II5a II6a II7a II8a II1b II2b II3b II4b II5b II6b II7b II8b II1c II2c II3c II4c II5c II6c II7c II8c III1 III2 is the author/funder, who has granted medRxiv a 

Meta-analysis and the science of research synthesis

Empirical comparison of publication bias tests in meta-analysis

Quantifying publication bias in meta-analysis

Outcome reporting bias in trials: a methodological approach for assessment and adjustment in systematic reviews

Systematic Reviews in Health Care: Meta-Analysis in Context

Handbook of Meta-Analysis

Assessment of publication trends of systematic reviews and randomized clinical trials

Systematic Reviews in Health Care: Meta-Analysis in Context, chapter Effect measures for meta-analysis of trials with binary outcomes

Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes

Is the risk difference really a more heterogeneous measure?

Controversy and debate: Questionable utility of the relative risk in clinical research: Paper 1: A call for change to practice

Controversy and debate: Questionable utility of the relative risk in clinical research: Paper 2: Is the odds ratio "portable" in meta-analysis? time to consider bivariate generalized linear mixed model

The odds ratio is "portable" across baseline risk but not the relative risk: Time to do away with the log link in binomial regression

Controversy and debate: Questionable utility of the relative risk in clinical research: Paper 4: Odds ratios are far from "portable"-a call to use realistic models for effect variation in meta-analysis

Empirical comparisons of heterogeneity magnitudes of the risk difference, relative risk, and odds ratio

A comparison of seven random-effects models for meta-analyses that estimate the summary odds ratio

A comparison of methods for meta-analysis of a small number of studies with binary outcomes

The identity of two meta-analytic likelihoods and the ignorability of double-zero studies

Estimating risk difference from relative association measures in meta-analysis can infrequently pose interpretational challenges

Averaging correlation coefficients: should Fisher's z transformation be used

Bacon with your eggs? Applications of a new bivariate beta-binomial distribution

Properties and applications of the sarmanov family of bivariate distributions

R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing

Health impacts of parental migration on left-behind children and adolescents: a systematic review and meta-analysis

Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and metaanalysis

Quantifying heterogeneity in a meta-analysis

Using simulation studies to evaluate statistical methods

The design of simulation studies in medical statistics

Bayesian analysis on meta-analysis of case-control studies accounting for within-study correlation

Estimating logistic regression parameters for bivariate binary data

Confidence intervals for a random-effects meta-analysis based on Bartlett-type corrections

A unified method for improved inference in random effects meta-analysis

Research reported in this publication was partially supported by the National Center for Advancing Translational Sciences grant UL1 TR002494 (HC), the National Library of Medicine grant R01 LM012982 (MX, LL, and HC), and the National Institute of Mental Health grant R03 MH128727 (LL) of the US National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.