key: cord-0823031-pv83pc6x
authors: Dong, Q.; Gao, X.
title: Bayesian Estimation of the Seroprevalence of Antibodies to SARS-CoV-2
date: 2020-08-25
journal: nan
DOI: 10.1101/2020.08.23.20180497
sha: 7310edf764e5cef31288199ebd8492a2a4057869
doc_id: 823031
cord_uid: pv83pc6x

Accurately estimating the seroprevalence of antibodies to SARS-CoV-2 requires appropriate statistical methods. Bayesian statistics provides a natural framework for accounting for the variability in the specificity and sensitivity of antibody tests, as well as for incorporating prior knowledge of viral infection prevalence. We present a full Bayesian approach for this purpose and demonstrate its utility using a recently published large-scale dataset from the U.S. CDC.

Antibody tests for COVID-19 have been increasingly deployed to estimate the seroprevalence of antibodies to SARS-CoV-2 [1]. Recently, the U.S. Centers for Disease Control and Prevention (CDC) published a large-scale study of antibody tests administered at 10 sites in the U.S. between March 23 and May 12, 2020 [3]. The CDC antibody tests used an enzyme-linked immunosorbent assay with a specificity (i.e., 1 − false positive rate) of 99.3% (95% CI, 98.3%-99.9%) and a sensitivity (i.e., true positive rate) of 96.0% (95% CI, 90.0%-98.9%) [3]. To take test accuracy into account, the CDC study applied the following simple correction:

$$R_{obs} = P \times \mathrm{Sensitivity} + (1 - P) \times (1 - \mathrm{Specificity})$$

where $R_{obs}$ is the observed seroprevalence in the study samples and $P$ is the unknown seroprevalence in the population. Using the point estimates of sensitivity (96.0%) and specificity (99.3%), and solving for $P$ (with $1 - 0.993 = 0.007$ and $0.960 + 0.993 - 1 = 0.953$), they obtained the point estimate of the population prevalence $P = (R_{obs} - 0.007)/0.953$.
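The CDC correction is a linear relation between $R_{obs}$ and $P$, so it can be inverted in closed form. The Python sketch below shows the correction in both directions. The function names and the 5% observed rate are illustrative assumptions. They are not taken from the CDC study or the authors' code.

```python
def observed_rate(prevalence, sensitivity, specificity):
    """Expected observed positive rate R_obs under the CDC correction."""
    return prevalence * sensitivity + (1 - prevalence) * (1 - specificity)


def corrected_prevalence(r_obs, sensitivity, specificity):
    """Solve the correction for P: (R_obs - (1 - Sp)) / (Se + Sp - 1)."""
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    return (r_obs - (1 - specificity)) / denominator


SENSITIVITY, SPECIFICITY = 0.960, 0.993  # CDC point estimates

# Hypothetical observed rate of 5%: P = (0.05 - 0.007) / 0.953
p_hat = corrected_prevalence(0.05, SENSITIVITY, SPECIFICITY)
print(f"P = {p_hat:.4f}")  # prints P = 0.0451
```

The numerator is $R_{obs} - (1 - \mathrm{Specificity})$. This point estimate therefore becomes negative whenever the observed rate falls below the false positive rate (0.7% here). That illustrates why a plug-in correction can behave poorly at low prevalence.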
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. This version posted August 25, 2020. https://doi.org/10.1101/2020.08.23.20180497 (medRxiv preprint)

There are two main limitations with this approach. First, only a point estimate of the population prevalence $P$ was obtained. The CDC study also generated confidence intervals for the point estimate using a non-parametric bootstrap procedure. However, a confidence interval does not provide a probabilistic measure of the uncertainty across all possible values of the unknown prevalence. Second, the CDC approach cannot incorporate prior knowledge of the population prevalence $P$. This can lead to inaccurate estimates, especially when the true rate of viral infection is low, even if the tests have high specificity and sensitivity [4,5].

To overcome these limitations, we developed a Bayesian approach. It is not a simple application of Bayes' theorem in which point estimates of sensitivity and specificity are plugged into the formula to compute a posterior probability. Instead, it is a full Bayesian procedure with three features. It models the known variability in the sensitivity (95% CI, 90.0%-98.9%) and specificity (95% CI, 98.3%-99.9%) of the antibody test. It can incorporate any prior knowledge of the viral infection rate. It estimates the entire posterior probability distribution of the unknown population prevalence.

Let $N_t$ and $N_p$ denote the total number of people tested and the number of people who tested positive, respectively. Let $p$ denote the unknown seroprevalence of antibodies to SARS-CoV-2. Let $\theta$ denote the true positive rate of the antibody test (i.e., sensitivity), and let $\kappa$ denote its false positive rate (i.e., 1 − specificity). We then define the following likelihood function:

$$L(N_t, N_p \mid p, \theta, \kappa) = \left(p\theta + (1-p)\kappa\right)^{N_p} \left(p(1-\theta) + (1-p)(1-\kappa)\right)^{N_t - N_p} \qquad (1)$$
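Eq. (1) is easy to evaluate numerically on the log scale. The sketch below is a minimal illustration under stated assumptions. It holds $\theta$ and $\kappa$ fixed at the CDC point estimates (0.960 and 0.007) and maximizes over $p$ on a grid. The counts (50 positives out of 1,000 tests) and the names `log_likelihood`, `N_TOTAL`, and `N_POS` are hypothetical.

```python
import math


def log_likelihood(p, theta, kappa, n_total, n_pos):
    """Log of the likelihood in Eq. (1).

    p: seroprevalence, theta: sensitivity, kappa: false positive rate.
    """
    prob_pos = p * theta + (1 - p) * kappa              # P(test positive)
    prob_neg = p * (1 - theta) + (1 - p) * (1 - kappa)  # P(test negative)
    return n_pos * math.log(prob_pos) + (n_total - n_pos) * math.log(prob_neg)


# Hypothetical counts (not the CDC data): 50 positives out of 1,000 tests.
N_TOTAL, N_POS = 1000, 50
grid = [i / 10000 for i in range(1, 10000)]  # candidate values of p in (0, 1)
best_p = max(grid, key=lambda p: log_likelihood(p, 0.960, 0.007, N_TOTAL, N_POS))
print(f"maximum-likelihood p on the grid: {best_p:.4f}")
```

With $\theta$ and $\kappa$ fixed, the likelihood peaks where $p\theta + (1-p)\kappa$ equals the observed rate $N_p/N_t$. The grid maximizer therefore lands next to the CDC estimate $(0.05 - 0.007)/0.953 \approx 0.0451$. The Bayesian approach differs by treating $\theta$ and $\kappa$ as uncertain rather than fixed.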
In Eq. (1), the term $(p\theta + (1-p)\kappa)^{N_p}$ corresponds to the probability of observing $N_p$ people who tested positive. A person with a positive test result is either infected (with probability $p$) and correctly tests positive (with probability $\theta$), or not infected (with probability $1 - p$) and falsely tests positive (with probability $\kappa$). Similarly, the term $(p(1-\theta) + (1-p)(1-\kappa))^{N_t - N_p}$ corresponds to the probability of observing $N_t - N_p$ people whose test results were negative. To estimate the posterior distribution of $p$, we sample from the following joint posterior distribution:

$$\pi(p, \theta, \kappa \mid N_t, N_p) \propto L(N_t, N_p \mid p, \theta, \kappa)\, \pi(p)\, \pi(\theta)\, \pi(\kappa)$$

To specify the prior distributions for $p$, $\kappa$, and $\theta$, we chose beta distributions, as they are commonly used to model probabilities [6]:

$$p \sim \mathrm{Beta}(\alpha_p, \beta_p), \quad \kappa \sim \mathrm{Beta}(\alpha_\kappa, \beta_\kappa), \quad \theta \sim \mathrm{Beta}(\alpha_\theta, \beta_\theta)$$

where $\alpha_p$, $\beta_p$, $\alpha_\kappa$, $\beta_\kappa$, $\alpha_\theta$, and $\beta_\theta$ denote the shape parameters of the corresponding beta distributions. For the unknown parameter $p$, we used a non-informative flat prior in this study (i.e., $\alpha_p = \beta_p = 1$). This prior can be adjusted if prior knowledge of the proportion of infected people in a particular region is available (see the Discussion section). For $\kappa$ and $\theta$, we chose informative priors that reflect the known specificity and sensitivity of a particular antibody test. Specifically, the shape parameters $\alpha_\kappa$, $\beta_\kappa$, $\alpha_\theta$, and $\beta_\theta$ can be estimated using the method of moments [5] as follows:

$$\alpha_\kappa = \mu_\kappa\left(\frac{\mu_\kappa(1-\mu_\kappa)}{\sigma_\kappa^2} - 1\right), \qquad \beta_\kappa = (1-\mu_\kappa)\left(\frac{\mu_\kappa(1-\mu_\kappa)}{\sigma_\kappa^2} - 1\right)$$

$$\alpha_\theta = \mu_\theta\left(\frac{\mu_\theta(1-\mu_\theta)}{\sigma_\theta^2} - 1\right), \qquad \beta_\theta = (1-\mu_\theta)\left(\frac{\mu_\theta(1-\mu_\theta)}{\sigma_\theta^2} - 1\right)$$
where $\mu_\kappa$ and $\sigma_\kappa^2$ are the mean and variance of the false positive rate $\kappa$ (1 − specificity), and $\mu_\theta$ and $\sigma_\theta^2$ are the mean and variance of the sensitivity. For this study, the mean specificity and mean sensitivity are 99.3% and 96.0%, respectively, so $\mu_\kappa = 0.007$ and $\mu_\theta = 0.960$. Because $\kappa = 1 -$ specificity, $\sigma_\kappa^2$ equals the variance of the specificity. The variances of specificity and sensitivity were approximated [7] as $s(1-s)/n$, where $s$ is the mean specificity or sensitivity and $n = 618$, the sample size of the CDC validation study of antibody test accuracy [8].

We used WinBUGS [9] (version 1.4.3) to implement the above models. In particular, the likelihood function was implemented using the "ones trick" [10] […] were evaluated with trace/history/autocorrelation plots and the Gelman-Rubin diagnostic [11]. Multiple initial values were used for MCMC sampling. The Bayesian procedure was validated with simulated datasets generated by our custom R [12] script (available in the above GitHub repository).

The seroprevalence data were taken from the aforementioned CDC publication [3]. Our approach requires two inputs: (i) the total number of tested samples and (ii) the number of positive samples. For this project, we focused only on the gender-specific data in the CDC study. We extracted the total numbers of male and female samples from Table 1 of the CDC publication. However, the CDC publication did not report the numbers of positive samples. To infer these numbers for both genders, we extracted the CDC-estimated seroprevalence, $P$, for each gender from Table 2 of the CDC publication. Inverting the equation $P = (R_{obs} - 0.007)/0.953$ given above (i.e., $R_{obs} = 0.953P + 0.007$), we obtained the observed seroprevalence $R_{obs}$ for both genders. We then multiplied $R_{obs}$ by the total number of samples for each gender to obtain the numbers of observed positive male and female samples.

We applied our Bayesian approach to the data listed in Table 1. Bayesian approaches produce entire probability distributions instead of point estimates [6]. Table 2 lists the original CDC point estimates with their accompanying 95% confidence intervals. It also lists our Bayesian estimates, presented as the medians and 95% credible intervals of the posterior distributions. Confidence intervals and Bayesian credible intervals are different concepts [13], so they are not technically comparable, even though they are listed together in Table 2 for convenience. Overall, the posterior medians are similar to the original CDC point estimates. The full posterior distributions inferred by our Bayesian approach (Fig. 1) additionally capture the uncertainty associated with seroprevalence: the posterior distribution assigns a probability density to every possible value of seroprevalence. Confidence intervals cannot provide this.

Antibody tests have been increasingly applied to estimate the proportion of people who have been infected by the SARS-CoV-2 virus. For example, on August 18, 2020, New York City released the results of more than 1.46 million coronavirus antibody tests. Accurately analyzing such data is critical for developing public health policies [14].
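The posterior summaries reported above (medians and 95% credible intervals) can be illustrated end to end. The following is a minimal sketch, not the authors' WinBUGS model. It swaps MCMC for a grid approximation of the marginal posterior of $p$, under the flat Beta(1, 1) prior. It integrates $\theta$ and $\kappa$ out by averaging the likelihood of Eq. (1) over draws from their method-of-moments beta priors, with variances $s(1-s)/n$ and $n = 618$. The function names, grid size, number of draws, and the counts in the usage line are hypothetical choices. They do not reproduce the CDC data.

```python
import math
import random


def beta_from_moments(mean, var):
    """Method-of-moments beta shape parameters (alpha, beta) for a mean and variance."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def log_likelihood(p, theta, kappa, n_total, n_pos):
    """Log of Eq. (1), using P(negative) = 1 - P(positive)."""
    prob_pos = p * theta + (1 - p) * kappa
    return n_pos * math.log(prob_pos) + (n_total - n_pos) * math.log(1 - prob_pos)


def posterior_summary(n_total, n_pos, sens=0.960, spec=0.993, n_val=618,
                      grid_size=1000, n_draws=400, seed=1):
    """Median and 95% credible interval of p under a flat Beta(1, 1) prior.

    The marginal posterior of p is approximated on a grid; theta and kappa are
    integrated out by averaging the likelihood over draws from their beta priors.
    """
    rng = random.Random(seed)
    # Informative priors, with variances approximated as s(1 - s)/n.
    a_t, b_t = beta_from_moments(sens, sens * (1 - sens) / n_val)
    a_k, b_k = beta_from_moments(1 - spec, spec * (1 - spec) / n_val)
    draws = [(rng.betavariate(a_t, b_t), rng.betavariate(a_k, b_k))
             for _ in range(n_draws)]

    grid = [(i + 0.5) / grid_size for i in range(grid_size)]  # midpoints in (0, 1)
    log_post = []  # flat prior on p, so posterior is proportional to marginal likelihood
    for p in grid:
        lls = [log_likelihood(p, t, k, n_total, n_pos) for t, k in draws]
        top = max(lls)  # log-mean-exp for numerical stability
        log_post.append(top + math.log(sum(math.exp(x - top) for x in lls) / n_draws))

    top = max(log_post)
    weights = [math.exp(x - top) for x in log_post]
    total = sum(weights)

    def quantile(q):
        cumulative = 0.0
        for p, w in zip(grid, weights):
            cumulative += w / total
            if cumulative >= q:
                return p
        return grid[-1]

    return quantile(0.5), quantile(0.025), quantile(0.975)


# Hypothetical counts (not the CDC data): 50 positives out of 1,000 tests.
median, lower, upper = posterior_summary(1000, 50)
print(f"median {median:.3f}, 95% credible interval ({lower:.3f}, {upper:.3f})")
```

A grid is workable here because $p$ is one-dimensional once $\theta$ and $\kappa$ are averaged out. For richer models (e.g., several regions or time points), MCMC of the kind used by the authors would likely scale better.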
Our Bayesian approach can account for the variability of antibody tests (i.e., uncertainty in the sensitivity and specificity of the tests). In addition, it can easily incorporate prior knowledge of the proportion of infected people in a particular region, which is particularly important for accurate estimation when the true prevalence is low [5]. The Bayesian approach also provides a natural framework for updating estimates as new data arrive. This is particularly relevant to the continuous monitoring of the seroprevalence of coronavirus antibodies. For example, New York City is still releasing coronavirus antibody test results weekly [15]. The posterior distribution estimated from previous weeks can serve as the prior distribution for the next week. In this way, the seroprevalence estimate can be quickly updated within a principled Bayesian inference framework.

Q.D. and X.G. both contributed to project conception. Q.D. contributed the WinBUGS modeling and drafted the manuscript. X.G. contributed the R programming and data analysis.
References
1. The Promise and Peril of Antibody Testing for COVID-19
2. Test, test, test for covid-19 antibodies: the importance of sensitivity, specificity and predictive powers
3. Seroprevalence of Antibodies to SARS-CoV-2 in 10 Sites in the United States
4. COVID-19: understanding the science of antibody testing and lessons from the HIV epidemic
5. Antibody testing for COVID-19: can it be used as a screening tool in areas with low prevalence?
6. Bayesian Data Analysis
7. Elementary Statistics
8. Validation of a SARS-CoV-2 spike protein ELISA for use in contact investigations and sero-surveillance
9. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility
10. The BUGS Book: A Practical Introduction to Bayesian Analysis
11. Inference from iterative simulation using multiple sequences (with discussion)
12. R: A language and environment for statistical computing. R Foundation for Statistical Computing
13. Understanding and interpreting confidence and credible intervals around effect estimates
14. What policy makers need to know about COVID-19 protective immunity

Funding: None