key: cord-0475218-rr0ktzpx
title: Bayesian inference for asymptomatic COVID-19 infection rates
authors: Cahoy, Dexter; Sedransk, Joseph
date: 2022-03-27

abstract: To strengthen inferences, meta-analyses are commonly used to summarize information from a set of independent studies. In some cases, though, the data may not satisfy the assumptions underlying the meta-analysis. Using three Bayesian methods that have a more general structure than the common meta-analytic ones, we show the extent and nature of the pooling that is justified statistically. In this paper, we re-analyze data from several reviews whose objective is to make inference about the COVID-19 asymptomatic infection rate. When it is unlikely that all of the true effect sizes come from a single source, researchers should be cautious about pooling the data from all of the studies. Our findings and methodology are applicable to other COVID-19 outcome variables, and more generally.

Meta-analyses are commonly used to summarize information from a set of independent experiments, observational studies or sample surveys. Doing this may strengthen inferences when there are deficiencies in the individual studies, such as small sample sizes. Methodology for combining findings from repeated research studies has a long history and, in particular, meta-analyses have become very popular over the past thirty years: from an online search for 'books meta-analysis' we found forty-nine books. Thus, it was natural that early in 2020 several meta-analyses were conducted (and subsequently published) about infection rates from the novel coronavirus. Looking at several early review papers, we were concerned about whether the meta-analyses were carried out in an appropriate manner. Even after careful evaluation to include only studies thought to be comparable, there may be subsets of the collection of studies where the true (subset) effects are very different. If this is so, pooling the data from all of the studies may result in misleading conclusions. Borenstein et al. (2010) add: "If the variation is substantial, then we might want to shift our focus .... Rather it should be on the fact that the ... effect differs from study to study. Hopefully, it would be possible to identify reasons ... that might explain the dispersion."

In this paper we consider three Bayesian methods that have a more general structure. One can use these methods to check the validity of the more standard approaches by investigating whether the set of true effect sizes comes from a common source. If the assumptions underlying the standard approaches are not met, our proposed methodology will lead to more appropriate inferences. From five review papers we selected for further analysis several studies that have different features. In each of these cases the objective is to make inference about the asymptomatic infection rate. Note that we are not evaluating specific meta-analytic methods; our concern is the appropriate aggregation of possibly disparate data.

Buitrago-Garcia et al. (2020) explain the importance of a review: "Accurate estimates of the proportions of true asymptomatic and presymptomatic infections are needed urgently because their contribution to overall SARS-CoV-2 transmission at the population level will determine the appropriate balance of control measures.
If the predominant route of transmission is from people who have symptoms, then strategies should focus on testing, followed by isolation of infected individuals and quarantine of their contacts. If, however, most transmission is from people without symptoms, social distancing measures that reduce contact with people who might be infectious should be prioritized, enhanced by active case-finding through testing of asymptomatic people." Referring to a narrative review (Oran and Topol, 2020) that reports a range (over studies) of 6% to 96% for the proportion of individuals positive for SARS-CoV-2 but asymptomatic, the authors point out the need for a careful review.

Standard meta-analyses typically assume that the true effect sizes, $\mu = (\mu_1, \ldots, \mu_L)^t$, come from a common source. Even after including only those studies thought to be comparable, $\mu$ may be composed of distinct subsets, each with a different underlying distribution. This seems likely for some of the reviews, e.g., the seventy-nine rates in Buitrago-Garcia et al. (2020) that range from 0.01 to 0.92. To make appropriate inferences, the three Bayesian methods have a more general structure than that assumed in a standard meta-analysis. The principal method, termed uncertain pooling, is flexible in that it can identify distinct subsets of $\mu$: e.g., for $r$ subsets there would be $r$ true effect sizes, $\nu_1, \ldots, \nu_r$. In that case, pooling the data from all of the studies may lead to misleading inferences. This methodology will also indicate when the true effect sizes have a common source, thus leading to an appropriate inference. The more general structure should ensure greater concordance of the data with our model than with a more restricted model, and a better fitting model should lead to better inference; specifically, only similar studies will be combined. It is not surprising that there is strong statistical evidence that in three of the four data sets that we analyze (Section 4) the true effect sizes do not come from a single source. In those cases the analyst should be cautious about combining the data from all of the studies. For a general discussion of Bayesian methods for meta-analysis see Schmid, Carlin and Welton (2021). Borenstein et al. (2010) is a basic treatment of fixed and random effects models for meta-analysis, while Rice, Higgins and Lumley (2018) re-evaluate fixed effect(s) meta-analysis.

Section 2 has brief descriptions of the data sets that we analyze, together with background information. The methodology is introduced in Section 3 and the results are summarized in Section 4. Section 5 has a brief summary and an extension that accommodates study-level covariates, with notes about the availability of covariates in the reviews we investigate.

This section has brief descriptions of the meta-analyses that we have analyzed, together with some background information. The definitions of the asymptomatic infection rate and the conditions required for including individual studies in the meta-analysis differ considerably, and are too detailed to present in full in this paper.

The first paper that we considered was by He, Yi and Zhu (2020). Using data from six studies they obtain estimates of the asymptomatic infection rate, noting that these measures differ considerably over the six studies and explaining that this may be due to "differences in data collection, sample size, and the conditions."
Since the information from one of the six studies is inconsistent with that in the other five studies, we include only the latter in our analyses. As seen in Table 1, the sample proportions range from 0.22 to 0.78, with little or no clustering.

The first meta-analysis concerning only the asymptomatic coronavirus disease rate is He et al. (2021). In their Section 1 the authors note the importance of studying the asymptomatic rate and that this rate is not "well characterized." They conduct meta-analyses for all 41 studies and for five subsets. The rates are markedly heterogeneous (proportions from 0.02 to 0.75 and total numbers of cases from 4 to 44,627), suggesting concern about the aggregation. Our analysis is for the children subgroup, with eleven studies. As seen in Table 2, the sample proportions range from 0.11 to 0.57, while the SEs range from 0.01 to 0.16. Unlike the first meta-analysis, there is some apparent clustering.

The third data set is a subset of six of the eleven studies in He et al. (2021). These six studies were chosen to illustrate properties when there is considerable separation. From Table 3, there are two apparent clusters with cluster proportions of about 0.13 and 0.55; overall, the SEs range from 0.01 to 0.16.

Buitrago-Garcia et al. (2020) has several important features: the authors consider rates associated with both asymptomatic and presymptomatic cases, and they include only studies that document follow-up and symptom status at the beginning and end of follow-up or modeling. Their meta-analyses are based on seventy-nine studies, summarized for the entire set and for seven subsets. The seventy-nine rates are markedly heterogeneous (proportions from 0.01 to 0.92 and total numbers of cases from 2 to 1012). Our analysis is for the screening subgroup with seven studies, noted by Buitrago-Garcia et al. (2020) as being of special interest. Here, the proportions range from 0.17 to 0.50, while the SEs range from 0.04 to 0.35.

For additional background we give the criteria that He et al. (2021) used to select the studies for their meta-analyses, and then summarize features of the eleven-study children subgroup from He et al. (2021) that we have analyzed. He et al. (2021) searched two databases, PubMed and Embase, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline. Their search terms included "COVID-19" and analogous phrases, and "Asymptomatic." They included "articles reporting a specific number of asymptomatic infection cases in confirmed COVID-19 patients, information describing the epidemiological and clinical features of COVID-19." There is no evidence of a risk-of-bias assessment or of consideration of a sufficient follow-up period; Byambasuren et al. (2020) identify these as essential characteristics when deciding which studies to include in a meta-analysis. Since He et al. (2021) is the first meta-analysis to make inferences about the asymptomatic infection rate, one may conjecture that the authors were motivated to publish their results quickly.

Of the eleven studies of children, five papers are published in Chinese, so for these there are only summaries in English. Only three of the eleven papers give the age distribution of the children, although most give the mean age and some give the range. Seven of the papers give the sex distribution. In five studies all cases were associated with a single hospital, while four studies summarized the results from many hospitals; there was no information for two studies.
Several papers noted that most of the patients had a history of close contact with adults with COVID-19.

A common assumption in situations where combining data is plausible is the following: for $i = 1, \ldots, L$ and $j = 1, \ldots, n_i$, the $Y_{ij}$ are independent with

$$Y_{ij} \sim N(\mu_i, \sigma_i^2), \qquad (1)$$

where $\bar{Y}_i = \sum_{j=1}^{n_i} Y_{ij}/n_i$, the $\sigma_i^2$ are known, $L$ is the number of studies and $n_i$ is the number of replicates. Note that all of the analyses of COVID-19 data that we consider make assumptions like (1).

The first method, uncertain pooling, is based on Malec and Sedransk (1992) and Evans and Sedransk (2001). Since this method may be unfamiliar, we describe it in some detail. They showed that a prior for $\mu = (\mu_1, \mu_2, \ldots, \mu_L)^t$ can be selected to reflect the beliefs that there are subsets of $\mu$ such that the $\mu_i$ in each subset are "similar", and that there is uncertainty about the composition of such subsets of $\mu$. Let $G$ be the total number of partitions of the set $\mathcal{L} = \{1, \ldots, L\}$. Denote a particular partition by $g = 1, \ldots, G$, let $d(g)$ denote the number of subsets of $\mathcal{L}$ in the $g$th partition ($1 \le d(g) \le L$), and let $S_k(g)$ denote the set of study labels in subset $k$, for $k = 1, \ldots, d(g)$. For our analyses $L = 5, 6, 7$ and $11$ with $G = 37$, $G = 203$, $G = 877$ and $G = 678{,}570$, respectively. For other values of $L$ the total number of partitions of an $L$-element set is given by the Bell number, $B_L$. Recent work (e.g., Dahl, Day and Tsai, 2017) has proposed using prior information to place a prior on the set of partitions $g = 1, \ldots, G$. This will increase the complexity of the computations but avoid the need to consider the $G$ partitions explicitly.

To specify a prior for $\mu$, first condition on $g$. Malec and Sedransk (1992) and Evans and Sedransk (2001) assume that there is independence between subsets and that, within $S_k(g)$, the $\mu_i$ are independent with common mean $\nu_k(g)$ and variance $\delta_k^2(g)$, i.e.,

$$\mu_i \mid g, \nu_k(g), \delta_k^2(g) \sim N(\nu_k(g), \delta_k^2(g)), \quad i \in S_k(g). \qquad (2)$$

Also, the $\nu_k(g)$ are mutually independent with means $\theta_k(g)$ and variances $\gamma_k^2(g)$. Conditioning on the variances above (but suppressing them in our notation), and letting $\gamma_k^2(g) \to \infty$, leads to the following results. Defining $y = (\bar{Y}_1, \ldots, \bar{Y}_L)^t$, the conditional posterior mean and covariance matrix of $\mu$, $E(\mu \mid y, g, \Delta)$ and $V(\mu \mid y, g, \Delta)$, have closed forms, given in (3) and (4): each $E(\mu_i \mid y, g, \Delta)$ shrinks $\bar{Y}_i$ toward a weighted average of the $\bar{Y}_j$ in the subset $S_k(g)$ containing study $i$. Inference about $\mu$ includes uncertainty about the value of $g$, i.e.,

$$f(\mu \mid y) = \int f(\mu \mid y, g, \Delta)\, f(g, \Delta \mid y)\, d\Delta\, dg, \qquad (6)$$

where the notation is simplified by using integration rather than summation for $g$. To evaluate (6) we need $f(g, \Delta \mid y)$. One must be careful about specifying how the $\gamma_k^2(g) \to \infty$ because the models corresponding to the partitions have different numbers of parameters. We use a method described in Section 3 of Evans and Sedransk (2001) that postulates little prior, relative to sample, information about the $\nu_k(g)$ and is invariant to changes in the scale of $Y$. Let $\nu(g) = (\nu_1(g), \ldots, \nu_{d(g)}(g))^t$ and let $K(f_1(\nu(g)), f_2(\nu(g) \mid y))$ be the Kullback-Leibler information about $\nu(g)$. With prior $f(g, \Delta) = f(g) f(\Delta)$, letting the $\gamma_k^2(g) \to \infty$ subject to $K(f_1(\nu(g)), f_2(\nu(g) \mid y)) = \text{constant}$ yields the expression for $f(g, \Delta \mid y)$ in (7). The factor $Q\{d(g)\}$ in the exponent of (7), a consequence of the limit process just described, is the usual within sum of squares from a conventional, weighted analysis of variance. Now, $Q\{d(g)\}$ is likely to decrease as $d(g)$ increases, for example for a new partition of $\cup_{k=1}^{d(g)} S_k(g)$ obtained by creating subsets of the existing $\{S_k(g)\}$. Since $f(g, \Delta \mid y)$ increases as $Q\{d(g)\}$ decreases, it is helpful that (7) includes the second term, $\exp\{-d(g)/2\}$, which penalizes partitions with larger values of $d(g)$.

For our analysis we take $\delta_k^2(g) = \delta^2$ and write $\lambda_i(g) = \delta^2/(\delta^2 + \sigma_i^2/n_i)$. Inference for $\mu$ is made using (6) and (7) with

$$\mu \mid y, g, \delta^2 \sim N\big(E(\mu \mid y, g, \delta^2), V(\mu \mid y, g, \delta^2)\big), \qquad (8)$$

where the conditional posterior moments of $\mu$ are given in (3) and (4).
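To make the size of the partition space concrete, the short Python sketch below enumerates every partition g of a small set of study labels; it is an illustration of the combinatorics only, not the authors' code, and the function name set_partitions is ours. For L = 6 and L = 7 it reproduces the counts G = 203 and G = 877 quoted above. Enumeration for L = 11 (678,570 partitions) is still feasible but considerably heavier.

def set_partitions(items):
    """Recursively enumerate every partition g of a list of study labels.

    Each partition is returned as a list of subsets S_1(g), ..., S_{d(g)}(g).
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # put `first` into each existing subset in turn ...
        for k, subset in enumerate(smaller):
            yield smaller[:k] + [[first] + subset] + smaller[k + 1:]
        # ... or give `first` its own singleton subset
        yield [[first]] + smaller


if __name__ == "__main__":
    for L in (6, 7):
        partitions = list(set_partitions(list(range(1, L + 1))))
        d = [len(g) for g in partitions]  # d(g): number of subsets in partition g
        print(f"L = {L}: G = {len(partitions)} partitions, "
              f"d(g) ranges from {min(d)} to {max(d)}")
    # Prints G = 203 for L = 6 and G = 877 for L = 7 (the Bell numbers B_6 and B_7).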
Our analyses will indicate whether the true effect sizes come from a common source. If so, then using a standard meta-analysis will provide appropriate inference. If not, several alternatives should be considered, as discussed below.

If a prior evaluation indicates that one of the studies, $i$, can be regarded as a gold standard, we can consider the posterior distribution corresponding to study $i$ to be the object of inference. Then, using the posterior expected value for illustration,

$$E(\mu_i \mid y) = \int E(\mu_i \mid y, g, \delta^2)\, f(g, \delta^2 \mid y)\, d\delta^2\, dg,$$

where $E(\mu_i \mid y, g, \delta^2)$ is defined in (3). Thus, inference for $\mu_i$ is a function of the data from study $i$ together with the data from the other $L - 1$ studies, as determined by the form of (3) and, critically, by the likelihood associated with the set of subsets, $S_k(g)$, containing study $i$. See Evans and Sedransk (2001) for additional details and an application to a notable study of the effect of using aspirin by patients following a myocardial infarction.

Otherwise, one must rely on substantive evaluation to decide whether any distinct subsets identified in the analysis should be analyzed separately, e.g., by separate standard meta-analyses. If there are no covariates that can distinguish the subsets, then the distribution of the true effect for study $i$ is a mixture distribution with unknown probabilities associated with the components. If there are distinct subsets and a single analysis is presented, it is important to include a credible interval for the overall true effect. Taken together with the presence of distinct subsets, a very wide interval for the overall true effect would be a strong indication that the single summary rate is not informative.

The second method uses a Dirichlet process (DP) prior, as implemented in the function DPmeta in the R package DPpackage. A realization $H$ from a DP, with mass parameter $M$ and base measure $H_0$, is a discrete measure, so it is typical to extend the DP to a DP mixture (DPM) by mixing over a simple parametric form such as a $N(\mu, \sigma^2)$ pdf. Let $\Theta$ be a finite dimensional parameter space and, for each $\theta \in \Theta$, let $f_\theta$ be a continuous pdf. Given a probability distribution $H$ defined on $\Theta$, a mixture of $f_\theta$ with respect to $H$ has the pdf $\int_\Theta f_\theta(\cdot)\, dH(\theta)$. This mixture model can be expressed as an equivalent hierarchical model, especially relevant for our application, in which each study has its own parameter $\theta_i$ drawn from $H$ and $H$ itself has a DP prior. The model in DPmeta, for $\sigma_i^2$ fixed, is given by (9) and (10), with independent hyperpriors on the DP mass parameter $M$ and the base-measure parameters $\eta$ and $\tau^2$.

The uncertain pooling method requires only that one specify a prior distribution for $g$ and $\delta^2$. By contrast, DPmeta requires substantial prior input, i.e., values for the many hyperparameters just described. We have concerns about the sensitivity of inferences to some of the choices of distributions in DPmeta, and about possible over-fitting, since there are more quantities to be specified than in the uncertain pooling method. Moreover, our analyses include data from only five, six, seven and eleven studies. So, we have omitted the specification $M \mid a_0, b_0 \sim \mathrm{Gamma}(a_0, b_0)$ and made inferences for a selected set of values of $M$, as suggested by Escobar (1994). Also, we omitted the step $\eta \mid \eta_b, S_b$ and replaced $\eta$ and $\tau^2$ with their maximum likelihood estimates.

As noted by a reviewer, a limitation of our uncertain pooling method is that the sample standard errors are assumed to be known, as is typically done. There is contemporary research (Yao et al., 2021) that models both the sample mean and the log of the sample standard error. However, they assume a bivariate normal distribution for these two statistics, a questionable assumption for our (binomial) case. An alternative that we have investigated assumes a binomial likelihood together with beta and uniform prior distributions in a hierarchical model.
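To make this binomial-beta alternative concrete, the short Python sketch below simulates once from one plausible version of such a hierarchy for a single, fixed partition g, using the parametrization detailed in Section 3.3: within each subset $S_k(g)$, $\theta_i \sim \mathrm{Beta}(q\alpha_k, q(1-\alpha_k))$ with $\alpha_k \sim U(0,1)$ and $\log q \sim U(\log 100, \log 1000)$, and $y_i \sim \mathrm{Bin}(n_i, \theta_i)$. The sample sizes and the partition are hypothetical, and the code is ours, not the implementation used for the paper's analyses.

import numpy as np

rng = np.random.default_rng(2022)

# Hypothetical sample sizes for L = 6 studies (not taken from the paper's tables)
n = np.array([40, 120, 55, 210, 75, 64])

# One fixed partition g of the six studies into d(g) = 2 subsets (illustrative only)
partition = [[0, 1, 2], [3, 4, 5]]

# Hyperpriors: group means alpha_k ~ U(0, 1); log q ~ U(log 100, log 1000)
alpha = rng.uniform(0.0, 1.0, size=len(partition))
q = np.exp(rng.uniform(np.log(100.0), np.log(1000.0)))

# Study-level asymptomatic rates: theta_i | alpha_k, q ~ Beta(q*alpha_k, q*(1 - alpha_k))
theta = np.empty(len(n))
for k, subset in enumerate(partition):
    theta[subset] = rng.beta(q * alpha[k], q * (1.0 - alpha[k]), size=len(subset))

# Observed counts: y_i | theta_i ~ Binomial(n_i, theta_i)
y = rng.binomial(n, theta)

for i in range(len(n)):
    print(f"study {i + 1}: theta = {theta[i]:.3f}, y/n = {y[i]}/{n[i]}")

Fitting this model when the partition g itself is unknown is what requires the trans-dimensional machinery discussed next.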
A limitation of this binomial-beta alternative is that one must use Reversible Jump Markov Chain Monte Carlo (RJMCMC) for the computations, and implementation is substantially more difficult than for the method presented in Section 3.1. Recall that the methodology in Section 3.1 is based on Evans and Sedransk (2001), who use a constraint in the limit process to overcome the problem that the partitions, $g$, have different sizes, i.e., that $d(g)$ varies with $g$. Without this adjustment $p(g \mid y)$ would not be invariant to changes in the scale of the outcome variable, $Y$. RJMCMC addresses the problem of varying $d(g)$ by introducing additional random variables that enable the matching of parameter space dimensions across the partitions. For a concise outline of the RJMCMC method see Gelman et al. (2004). The pioneering paper (Green, 1995) provides the theoretical background for RJMCMC and includes, as an example, an application to the uncertain pooling methodology. We have used this model and RJMCMC to analyze the data in the four tables in Section 4.

Assume $L$ independent responses, $y_1, \ldots, y_L$, with $y_i \sim \mathrm{bin}(n_i, \theta_i)$. Within $S_k(g)$ the $\theta_i$ are assumed exchangeable with a common beta prior having mean $\alpha_k$ and precision $q$, i.e.,

$$\theta_i \mid \alpha_k, q \sim \mathrm{Beta}(q\alpha_k, q(1-\alpha_k)), \quad i \in S_k(g). \qquad (11)$$

The group mean parameters, $\{\alpha_k\}$, are drawn independently from $U(0,1)$ while $\log q \sim U(\log a, \log b)$. Finally, $p(g) \propto d(g)^{-1}/\#\{g' : d(g') = d(g)\}$. Then the joint distribution of all the variables is $p^* = p(g, \alpha, q, \theta, y)$ with

$$p^* = p(g)\,p(\alpha, q \mid g)\,p(\theta \mid g, \alpha, q)\,p(y \mid g, \alpha, q, \theta) = p(g)\,p(\alpha \mid g)\,p(q)\,p(\theta \mid g, \alpha, q)\,p(y \mid \theta) = p(g)\cdot 1\cdot p(q)\,p(\theta \mid g, \alpha, q)\,p(y \mid \theta),$$

where the 1 in the last expression is the pdf of the (assumed) uniform distribution for the $\alpha_k$. Section 6.2 of Green (1995) gives the full conditionals for $\theta_i$, $q$ and $\alpha_j$. The step involving a possible move from partition $g$ to a new partition $g^*$ is much more complicated. Green (1995) uses a process that jumps between partitions, making only the changes of splitting a group (a birth) and combining two groups (a death). There is an algorithm to select the groups to split and merge; births are then attempted with probability $b_g$ and deaths with probability $d_g$. Jumping to a new partition requires a change in the vector $\alpha$ since its length must increase or decrease by one unit. Several steps are required to develop the associated proposal, finally leading to a complicated acceptance probability. For the pool-all partition, Hamza et al. (2008) suggest modeling the within-study variability directly with the binomial likelihood rather than with a normal approximation. As an alternative, it may be feasible to adapt this to the specification in (11) and to implement it using RJMCMC.

Our inferences are for the asymptomatic rates, i.e., population proportions, and Tables 1-4 use this representation; that is, in Tables 1-4 the posterior summaries for $\mu$ are transformed back to the proportion scale (for the logit analysis, $p_i = e^{\mu_i}/(1+e^{\mu_i})$). As is typically done in applications such as this, e.g., DerSimonian and Laird (1986), we have replaced $\sigma_i^2$ with an estimate from the sample. For each of the meta-analyses we give in the Appendix the number of asymptomatic cases and the number of observations for each of the component studies, together with references where these data can be found.

For uncertain pooling, inference for $\mu$ is made using (6). To start, evaluate the right side of (7) for $\{g : g = 1, \ldots, G\}$ and $D$ grid points for $\delta^2$, then standardize by dividing the individual terms in the grid by their sum. This provides an approximation for $f(g, \delta^2 \mid y)$. Then select a random sample of size $B$ from the $DG$ normalized values of $f(g, \delta^2 \mid y)$. For each selection, $(g^*, \delta^{2*})$, sample $\mu$ from $f(\mu \mid y, g^*, \delta^{2*})$. For the logit case, transform $\mu$ to $p = (p_1, \ldots, p_L)^t$ at each step. Starting with a grid for $\delta^2$ and $g$ with a very large range for $\delta^2$, we reduced the $(g, \delta^2)$ space to make the 2D-grid sampler faster.
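The following Python sketch shows the mechanics of this 2D-grid sampler. It is a minimal illustration, not the authors' code: it assumes a user-supplied function log_post(g, delta2) returning the log of the right side of (7) (whose exact form is not reproduced here) and a function draw_mu(g, delta2, rng) sampling from the conditional normal in (8); both names are ours.

import numpy as np

rng = np.random.default_rng(0)

def grid_sampler(partitions, delta2_grid, log_post, draw_mu, B=10_000):
    """Approximate f(g, delta^2 | y) on a (partition x delta^2) grid, then
    draw B values of mu by composition sampling.

    partitions  : list of candidate partitions g
    delta2_grid : 1-D array of D grid points for delta^2
    log_post    : callable (g, delta2) -> log of the unnormalized right side of (7)
    draw_mu     : callable (g, delta2, rng) -> one draw from f(mu | y, g, delta2), eq. (8)
    """
    G, D = len(partitions), len(delta2_grid)

    # Evaluate the unnormalized log posterior over the whole G x D grid
    logw = np.array([[log_post(g, d2) for d2 in delta2_grid] for g in partitions])

    # Stabilize on the log scale to avoid underflow, then normalize
    w = np.exp(logw - logw.max())
    w /= w.sum()                      # grid approximation to f(g, delta^2 | y)

    # Sample B grid cells (g*, delta^2*) with these probabilities ...
    cells = rng.choice(G * D, size=B, p=w.ravel())

    # ... and for each cell draw mu from the conditional posterior (8)
    draws = [draw_mu(partitions[c // D], delta2_grid[c % D], rng) for c in cells]
    return w, np.array(draws)

The grid approximation w also yields the marginal approximations directly: f(g | y) and f(delta^2 | y) are its row and column sums, respectively.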
In reducing the $(g, \delta^2)$ space we retained 99.2% of the probability associated with the extensive initial grid. We generated $B = 30{,}000$ values of $\mu$ for the eleven-study case and $B = 10{,}000$ for the other cases. Finally, note that approximations for the marginal posterior distributions, i.e., $f(g \mid y)$ and $f(\delta^2 \mid y)$, can be obtained directly from the grid approximation of $f(g, \delta^2 \mid y)$.

As noted in Section 1, our study was motivated by the meta-analysis of He, Yi and Zhu (2020), which included early studies of the COVID-19 asymptomatic infection rate. He, Yi and Zhu (2020) carried out a standard meta-analysis using a normal-based random effects model with the effect sizes as the outcome random variable and assuming fixed SEs. Since there is considerable variation in the effect sizes, it is prudent to be cautious and investigate whether the true effect sizes are from a single source. Our analysis starts with the basic uncertain pooling method (Section 3.1) and DPmeta (Section 3.2). We summarize the results using the binomial-beta model and RJMCMC (Section 3.3) at the end of this section.

For the basic uncertain pooling method we assumed, a priori, that all 37 partitions have equal probability, i.e., $p(g) = 1/37$. For the prior on $\delta^2$, independent of $g$, we used two distributions. (a) InvBeta: for a standard random effects model, Gelman (2006) recommends a half-Cauchy prior distribution for $\delta$, $p(\delta) \propto 1/(1+\delta^2)$. While our situation is very different, i.e., many partitions and weighting that depends on $\delta^2$ (not $\delta$), we have adopted this suggestion in (a) by transforming the half-Cauchy pdf to obtain the Inverse Beta (InvBeta) pdf of $\delta^2$; the change of variables is displayed below, before Table 1. (b) Previous research has shown that there are benefits to having the prior distribution for $\delta^2$ concentrated near 0, leading to the choice in (b) with $\alpha = 11.01$ and $\beta = 0.001$. In the following we present results only for (a), as those for (b) are similar.

As described in Section 3.2, a complete specification of DPmeta requires estimates of many hyperparameters. This seems inappropriate for meta-analyses such as these. So, we have replaced $\eta$ and $\tau^2$ with their maximum likelihood estimates, thus eliminating the need to specify the other hyperparameters. We have adopted the suggestion of Escobar (1994) to use a small set of values for $M$, i.e., $\{L^{-1}, L^0, L^1, L^2\}$, typically augmented by one value of $M$ much smaller than $L^{-1}$ and one larger than $L^2$.

We start by discussing the meta-analysis that motivated our investigation, i.e., He, Yi and Zhu (2020). In Table 1 we present, for each of the five studies, the sample effect size (sample proportion, $\hat{p}$), the posterior expected value of the true effect size, the standard error (SE), and a 95% credible interval for the true effect size. Here $\mathrm{SE} = \sqrt{\hat{p}\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$ and $n$ is the total number of observations. From DPmeta we include the posterior expected values of the true effect size corresponding to $M = 1/5$ and $M = 5$ (a good representation of the six values used in our analysis). The remaining columns give the posterior means and credible intervals obtained by using the Reversible Jump MCMC method. We summarize our results from the basic uncertain pooling method and DPmeta first, adding brief comments about the results obtained using the RJMCMC method at the end of this section. In general, the results are consistent.
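For completeness, here is the change of variables behind choice (a); this is our reading of "transforming the half-Cauchy pdf", assuming $p(\delta) \propto 1/(1+\delta^2)$ for $\delta > 0$ and setting $u = \delta^2$:

$$p(u) \;=\; p(\delta)\Big|_{\delta=\sqrt{u}} \left|\frac{d\delta}{du}\right| \;\propto\; \frac{1}{1+u}\cdot\frac{1}{2\sqrt{u}} \;\propto\; \frac{u^{-1/2}}{1+u}, \qquad u > 0,$$

which is an inverse-beta (beta-prime) density with both shape parameters equal to 1/2.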
Table 1: Sample effect sizes, standard errors, and posterior summaries from the basic uncertain pooling and DPmeta methods and the binomial-beta model for five COVID-19 studies (He, Yi and Zhu, 2020).

For these data the sum of the posterior probabilities associated with partitions having only a single cluster together with singleton subsets is $1.1 \times 10^{-4}$, a very small quantity. A standard way (Gelman, 2006) to assess the likelihood that the true effect sizes come from a common source is to assume the pool-all model, $g = g_0$, and evaluate the posterior predictive p-value using a standard discrepancy measure (Section 6.4 of Gelman, 2006). With the overall effect $\nu$ defined in (12), and with $\hat{\sigma}_i^2/n_i = 1/(n_i \hat{p}_i(1-\hat{p}_i))$, the estimated sampling variance of the logit of $\hat{p}_i$, the discrepancy measure $T(y, \nu, \delta^2)$ in (13) is based on the pool-all model, as defined in (1) and (2). The posterior predictive p-value is then $\Pr\{T(y^{\mathrm{rep}}, \nu, \delta^2) \ge T(y^{\mathrm{obs}}, \nu, \delta^2) \mid y^{\mathrm{obs}}\}$, with $y^{\mathrm{rep}}$ denoting a replication from the pool-all model. For these data the p-value is $3.3 \times 10^{-5}$, showing that the observed data are not concordant with the pool-all model. These results show that it is highly unlikely that the five true effect sizes come from a single source.

Another way to analyze these data is to construct a similarity matrix: for each pair of studies, $i$ and $j$, the similarity matrix gives the posterior probability that $i$ and $j$ are in the same cluster. We present in Figure 1 the similarity matrix for these five studies, with the studies labeled as in Table 1. The results in Table 1 and Figure 1 suggest that one should not pool the data from these five studies.

The next review that we consider, He et al. (2021), shows the same issue, i.e., questionable pooling of the data from all of the eleven studies. However, in this case, our analysis provides evidence of considerable clustering. Looking at the characteristics of the eleven studies could reveal the reasons for this clustering, and the direction to take to make appropriate inferences. The analysis presented below uses a subset of the data in He et al. (2021), namely the asymptomatic infection rate in eleven studies of children. The results are summarized in Table 2 and Figure 2, analogous to Table 1 and Figure 1. From Table 2 and Figure 2 it is apparent that there are several distinct subsets. Further investigation could reveal features that separate these subsets, leading to advances in understanding the differences in the asymptomatic rates. The posterior probability, $p(g_0 \mid y)$, associated with the pool-all partition is minuscule, i.e., $1.5 \times 10^{-11}$. As described for the He, Yi and Zhu (2020) study, the sum of the posterior probabilities associated with partitions having only a single cluster together with 0, 1, 2, 3, 4 or 5 singleton subsets is $1.63 \times 10^{-4}$. This result suggests that it is unlikely that the eleven true effect sizes come from a single source. Moreover, using the discrepancy measure in (13), the posterior predictive p-value is $3.3 \times 10^{-5}$, showing that the observed data are not concordant with the pool-all model. Finally, a 95% credible interval for the overall true effect size, $\nu$, is $0.15 \le \nu \le 0.45$. This interval, together with the clustering, is a strong indication that a single summary value such as the posterior mean of $\nu$ would not be informative.

From Figure 2 the most likely cluster is {3,6,7,11}, while the next most likely one is {1,2,5,8,9,10}. In this case there is considerable clustering, but it does not extend to all eleven studies. Without additional evidence, this analysis suggests conducting separate standard meta-analyses for the two large subsets. In Table 2, the clustering seen in Figure 2 is also evident in the posterior expected values from DPmeta with small $M$; see the column with $M = 1/6$.
With $M = 6$ there is good agreement between the posterior expected values for the two methods. For most of the eleven studies there is also good agreement in the credible intervals for the two methods with $M = 6$ for DPmeta (not shown in Table 2).

The next analysis uses the data from a subset of six of the eleven studies in He et al. (2021). These six studies were chosen to illustrate properties of the methodology when there is considerable separation. The results are summarized in Table 3 and Figure 3. From Figure 3 and Table 3 it is apparent that there are two distinct subsets, i.e., {1, 2, 5} and {6, 7, 11}. Presumably this separation reflects different characteristics of the two populations and/or different ways that the studies were carried out. As expected, $p(g_0 \mid y)$ is minuscule, i.e., $3.1 \times 10^{-6}$. Proceeding as described for the He et al. (2021) study, the sum of the posterior probabilities associated with partitions having only a single large cluster (i.e., with at least four members) is $1.1 \times 10^{-3}$. Thus, there is no evidence that the true effect sizes from these six studies come from a single source. Moreover, the posterior predictive p-value based on the discrepancy measure in (13) again indicates that the observed data are not concordant with the pool-all model.

The final analysis is for the screening subgroup of seven studies from Buitrago-Garcia et al. (2020), described in Section 2; the results are summarized in Table 4 and Figure 4. For these data, using the discrepancy measure in (13), the posterior predictive p-value is 0.40, indicating that there may be a common source for the true effects. This result is supported by Figure 4, which suggests relatively uniform clustering (except for studies 4 and 6). Since the data in each of Tables 2-4 are for a subset of a much larger set of studies, they are likely to be substantially more homogeneous than the data in the full set of studies. For example, the seven screening studies (Table 4) are a subset of seventy-nine studies with sample proportions ranging from 0.01 to 0.92. Results from using DPmeta are, for the most part, consistent with these observations. For Table 4 with $M = 6$, the intervals corresponding to {5,6,7}, the studies with the smallest SEs, are similar to those from the basic uncertain pooling methodology.

With a few modifications we have implemented the RJMCMC procedure outlined in Sections 6.2 and 6.3 of Green (1995), but expanded the range of the prior for $q$ by taking $\log q \sim U(\log 100, \log 1000)$ throughout. We summarize by using the posterior means and 95% credible intervals (bottom of Tables 1-4) and similarity plots (bottom of Figures 1-4). Since the sample likelihoods and prior distributions differ between the two approaches, i.e., those based on the likelihoods and priors in Sections 3.1 and 3.3, comparisons of the results may not be especially meaningful. However, examining Tables 1-4, it is notable that the summaries (posterior means and intervals) from the two approaches are consistent: generally close when the standard errors (SEs) are small, less so when the SEs are very large. There are no major differences between the comparable similarity plots corresponding to basic uncertain pooling and RJMCMC.

The importance of good inferences for the COVID-19 asymptomatic infection rates is clear, as noted in the quotation from Buitrago-Garcia et al. (2020) in Section 1. Conducting meta-analyses is a common, often useful, way to summarize information from a collection of studies. However, inference will be misleading if there is pooling of data from studies that are not concordant. For example, Byambasuren et al. (2020) note: "A recent review by the Centre for Evidence Based medicine in Oxford found a range of estimates of asymptomatic COVID-19 cases which ranged from 5% to 80%.
However, many of the identified studies were either poorly executed or poorly documented, making the validity of these estimates questionable."

In this paper we re-analyze data from three review papers, using three Bayesian methods that have a more general structure than the common meta-analytic ones. This methodology shows, in a principled manner, the extent and nature of the pooling that can be justified statistically. The more general structure should ensure greater concordance of the data with our model than with a more restricted model. If the authors of a review have screened the studies so that the ones remaining for analysis have no markedly aberrant characteristics, then an analysis showing distinct clusters should prompt a further review, and careful consideration of the inferences to make and the method to use.

In some situations there may be covariates associated with the studies that help to explain differences in the outcomes. To illustrate, use the basic notation and a linear regression offset; that is, replace (1) with $\bar{Y}_i \sim N(\mu_i + x_i^t\beta, \sigma_i^2)$. Then inference for $\mu$ can be made using the extension of (6),

$$f(\mu, \beta \mid y) = \int f(\beta \mid \mu, y, g, \delta^2)\, f(\mu \mid y, g, \delta^2)\, f(g, \delta^2 \mid y)\, d\delta^2\, dg,$$

where it is easily shown that $\beta \mid \mu, y, \delta^2, g \sim MVN(d, A^{-1})$ with $z_\mu = (\bar{Y}_1 - \mu_1, \ldots, \bar{Y}_L - \mu_L)^t$, $X$ an $L \times p$ matrix of covariates, $\beta$ a $p \times 1$ vector of regression coefficients, $V$ an $L \times L$ diagonal matrix with $(i,i)$th element $\sigma_i^{-2}$, $A = X^t V X$, $c = z_\mu^t V X$ and $d = A^{-1} c^t$. In the reviews we have considered, only He et al. (2021) gives more than one covariate for each study, i.e., the number of confirmed cases and the percent male. Using these covariates, an exploratory analysis of the residuals showed that such an augmented analysis would not improve inferences.

The three methods can be implemented. For DPmeta there is an R package (DPpackage), and the code for DPmeta is included as Supplementary Material. R packages are being developed for the two uncertain pooling methods, i.e., basic uncertain pooling and RJMCMC. Finally, for basic uncertain pooling and DPmeta both the sample proportion and the logit of the sample proportion are used in applications. While we have presented results only for the latter, our findings are similar for both choices.

Appendix A. Observed numbers of asymptomatic cases (Y) and observations (n)

References
Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: A living systematic review and meta-analysis, PLOS Medicine
Prevalence of asymptomatic SARS-CoV-2 infection: a narrative review
Bayesian methods for meta-analysis
A re-evaluation of fixed effect(s) meta-analysis
Estimation of the basic reproduction number, average incubation time, asymptomatic infection rate, and case fatality rate for COVID-19: Meta-analysis and sensitivity analysis
Proportion of asymptomatic coronavirus disease 2019: A systematic review and meta-analysis
Estimating the extent of true asymptomatic COVID-19 and its potential for community transmission: systematic review and meta-analysis,
Official Journal of the Association of Medical Microbiology and Infectious Disease Canada
Bayesian methodology for combining the results from different experiments when the specifications for pooling are uncertain
Combining data from experiments that may be similar
Random partition distribution indexed by pairwise information
Bayesian Nonparametric Data Analysis
DPpackage: Bayesian semi- and nonparametric modeling in R
Estimating normal means with a Dirichlet process prior
Bivariate hierarchical Bayesian model for combining summary measures and their uncertainties from multiple sources
Bayesian Data Analysis
Reversible jump Markov chain Monte Carlo computation and Bayesian model determination
The binomial distribution of meta-analysis was preferred to model within-study variability
Meta-analysis in clinical trials
Prior distributions for variance parameters in hierarchical models
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19)
Asymptomatic and pre-symptomatic SARS-CoV-2 infections in residents of a long-term care skilled nursing facility
A considerable proportion of individuals with asymptomatic SARS-CoV-2 infection in Tibetan population, medRxiv
Adda è un caso di studio: "Il 70% dei donatori di sangue è positivo" [in Italian: "Adda is a case study: '70% of blood donors are positive'"]
COVID-19: four fifths of cases are asymptomatic, China figures indicate

The authors are grateful to the reviewers for their comments, which have improved the focus of the paper and motivated further methodological development. They are also grateful to Professor Peter Green for his assistance in applying Reversible Jump MCMC to our data. They also appreciate research allocation grants from XSEDE's Pittsburgh Supercomputing Center.

None reported. The authors declare no potential conflicts of interest.