key: cord-0845429-f5mcyv0r
authors: Dong, Qunfeng; Gao, Xiang
title: Bayesian Estimation of the Seroprevalence of Antibodies to SARS-CoV-2
date: 2020-10-23
journal: JAMIA Open
DOI: 10.1093/jamiaopen/ooaa049
sha: e2ae02bb39c26c957fbdc26d7285decbec22eeda
doc_id: 845429
cord_uid: f5mcyv0r

Accurate estimation of the seroprevalence of antibodies to SARS-CoV-2 must properly account for the specificity and sensitivity of the antibody tests. In addition, prior knowledge of the extent of viral infection in a population may also be important for adjusting the estimation of seroprevalence. For this purpose, we have developed a Bayesian approach that can incorporate the variabilities of specificity and sensitivity of the antibody tests, as well as the prior probability distribution of seroprevalence. We have demonstrated the utility of our approach by applying it to a recently published large-scale dataset from the U.S. CDC, with our results providing entire probability distributions of seroprevalence instead of single point estimates. Our Bayesian code is freely available at https://github.com/qunfengdong/AntibodyTest.

Lay summary
To estimate the extent of the viral infection, we have developed a statistical method that can incorporate the variabilities of specificity and sensitivity of the antibody tests. Our computer code is freely available at https://github.com/qunfengdong/AntibodyTest.

Antibody tests for COVID-19 have been increasingly deployed to estimate the seroprevalence of antibodies to SARS-CoV-2 [1]. The antibody test used in the CDC study has a specificity (i.e., 1 - false positive rate) of 99.3% (95% CI, 98.3%-99.9%) and a sensitivity (i.e., true positive rate) of 96.0% (95% CI, 90.0%-98.9%) [3]. To take the test accuracy into consideration, the CDC study applied the following simple correction:

$$R_{obs} = P \times \text{Sensitivity} + (1 - P) \times (1 - \text{Specificity})$$

where $R_{obs}$ is the observed seroprevalence and $P$ is the corrected seroprevalence. With the sensitivity and specificity above, this gives $P = (R_{obs} - 0.007)/0.953$.

Let $N_t$ and $N_p$ denote the number of people tested in total and the number of people who tested positive, respectively.
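Let $p$ denote the unknown seroprevalence of antibodies to SARS-CoV-2. Let $\alpha$ denote the true positive rate of the antibody test (i.e., sensitivity). Let $\beta$ denote the false positive rate of the test (i.e., 1 - specificity). Then, we can define the following likelihood function:

$$L(p, \alpha, \beta \mid N_t, N_p) = \left[p\alpha + (1-p)\beta\right]^{N_p} \left[p(1-\alpha) + (1-p)(1-\beta)\right]^{N_t - N_p}$$

where the first factor corresponds to the probability of observing $N_p$ people whose test results were positive, and the second factor corresponds to the probability of observing $(N_t - N_p)$ people whose test results were negative.

To estimate the posterior probability of $p$, $P(p \mid N_t, N_p)$, we need to sample from the following posterior distribution:

$$P(p, \alpha, \beta \mid N_t, N_p) \propto L(p, \alpha, \beta \mid N_t, N_p)\, P(p)\, P(\alpha)\, P(\beta)$$

To specify the prior distributions for $p$, $\alpha$, and $\beta$, we chose beta distributions as they are commonly used to model probabilities [6]:

$$p \sim \mathrm{Beta}(a_p, b_p), \quad \alpha \sim \mathrm{Beta}(a_\alpha, b_\alpha), \quad \beta \sim \mathrm{Beta}(a_\beta, b_\beta)$$

where $a_p$, $b_p$, $a_\alpha$, $b_\alpha$, $a_\beta$, and $b_\beta$ denote shape parameters of the corresponding beta distributions. For the unknown parameter $p$, we chose to use a non-informative flat prior probability distribution for this study (i.e., $a_p = b_p = 1$), although it can be adjusted if prior knowledge of the proportion of infected people for a particular region is available (see more in the Discussion section). For $\alpha$ and $\beta$, we chose informative priors to reflect the known specificity and sensitivity of a particular antibody test. Specifically, the shape parameters $a_\alpha$, $b_\alpha$, $a_\beta$, and $b_\beta$ can be estimated using the method of moments [5] as follows:

$$a_\alpha = \mu_\alpha \left(\frac{\mu_\alpha (1 - \mu_\alpha)}{\sigma_\alpha^2} - 1\right), \quad b_\alpha = (1 - \mu_\alpha) \left(\frac{\mu_\alpha (1 - \mu_\alpha)}{\sigma_\alpha^2} - 1\right)$$

$$a_\beta = \mu_\beta \left(\frac{\mu_\beta (1 - \mu_\beta)}{\sigma_\beta^2} - 1\right), \quad b_\beta = (1 - \mu_\beta) \left(\frac{\mu_\beta (1 - \mu_\beta)}{\sigma_\beta^2} - 1\right)$$

where $\mu_\alpha$ and $\sigma_\alpha^2$ represent the mean and variance of the test sensitivity, and $\mu_\beta$ and $\sigma_\beta^2$ represent the mean and variance of the false positive rate (i.e., 1 - specificity, which has the same variance as the specificity). For this study, the mean of specificity and sensitivity is 99.3% and 96.0%, respectively, so $\mu_\alpha = 0.960$ and $\mu_\beta = 1 - 0.993 = 0.007$. The variances of specificity and sensitivity were approximated [7] as $s(1-s)/n$, where $s$ is the mean value of specificity or sensitivity, and $n = 618$ according to the CDC validation study on the antibody test accuracy [8].

We used WinBUGS [9] (version 1.4.3) to implement the above models. In particular, the likelihood function was implemented using the "ones trick" [10]. Multiple initial values were applied for MCMC sampling.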
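The model above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' WinBUGS implementation: it swaps in a plain random-walk Metropolis sampler for the WinBUGS machinery and the "ones trick", and all function names, the proposal step sizes, the iteration counts, and the example counts (150 positives out of 3000 tested) are hypothetical choices made for this sketch. It assumes the likelihood is the product of the positive and negative test probabilities as defined above, with a flat Beta(1, 1) prior on the seroprevalence.

```python
import math
import random


def beta_shape_from_moments(mean, variance):
    """Method-of-moments shape parameters (a, b) of a beta distribution."""
    common = mean * (1.0 - mean) / variance - 1.0
    return mean * common, (1.0 - mean) * common


def log_beta_kernel(x, a, b):
    """Log of the unnormalized Beta(a, b) density at x, for 0 < x < 1."""
    return (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def log_posterior(p, alpha, beta, n_total, n_pos, shape_p, shape_alpha, shape_beta):
    """Unnormalized log posterior of (p, alpha, beta) given the test counts."""
    if not all(0.0 < v < 1.0 for v in (p, alpha, beta)):
        return -math.inf
    # Probability that a tested person is positive (true or false positive).
    q_pos = p * alpha + (1.0 - p) * beta
    log_lik = n_pos * math.log(q_pos) + (n_total - n_pos) * math.log(1.0 - q_pos)
    return (log_lik
            + log_beta_kernel(p, *shape_p)
            + log_beta_kernel(alpha, *shape_alpha)
            + log_beta_kernel(beta, *shape_beta))


def sample_p(n_total, n_pos, shape_alpha, shape_beta, shape_p=(1.0, 1.0),
             steps=(0.005, 0.008, 0.003), n_iter=20000, burn_in=5000, seed=1):
    """Random-walk Metropolis draws of p (a stand-in for the WinBUGS sampler)."""
    rng = random.Random(seed)
    # Start at the raw positive rate and at the prior means of alpha and beta.
    state = [min(max(n_pos / n_total, 0.001), 0.999),
             shape_alpha[0] / sum(shape_alpha),
             shape_beta[0] / sum(shape_beta)]
    args = (n_total, n_pos, shape_p, shape_alpha, shape_beta)
    current = log_posterior(*state, *args)
    draws = []
    for i in range(n_iter):
        proposal = [v + rng.gauss(0.0, s) for v, s in zip(state, steps)]
        candidate = log_posterior(*proposal, *args)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            state, current = proposal, candidate
        if i >= burn_in:
            draws.append(state[0])
    return draws


def quantile(values, q):
    """Nearest-rank empirical quantile of a list of draws."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


if __name__ == "__main__":
    n_validation = 618  # validation sample size reported in the text
    shape_alpha = beta_shape_from_moments(0.960, 0.960 * 0.040 / n_validation)
    shape_beta = beta_shape_from_moments(0.007, 0.007 * 0.993 / n_validation)
    draws = sample_p(3000, 150, shape_alpha, shape_beta)  # hypothetical counts
    print("median:", quantile(draws, 0.5))
    print("95% credible interval:", quantile(draws, 0.025), quantile(draws, 0.975))
```

A single chain with fixed step sizes is used here for brevity; in practice, several chains started from different initial values, as described above, give a better check on convergence.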
The above Bayesian procedure was validated with simulated datasets generated by our customized R [12] script (available in the above GitHub repository).

The seroprevalence data were taken from the aforementioned CDC publication [3]. Our approach requires two inputs: (i) the total number of tested samples and (ii) the number of positive samples. For this project, we focused only on gender-specific data in the CDC study. We extracted the total number of male and female samples from the original Table 1 in the CDC publication. However, the number of positive samples was not reported in the CDC publication. To infer those numbers for both genders, we extracted the CDC estimated seroprevalence, $P$, for both genders from the original Table 2 in the CDC publication. Using the equation $P = (R_{obs} - 0.007)/0.953$ mentioned above, we obtained the observed seroprevalence $R_{obs}$ for both genders, which was then used to calculate the number of observed positive male and female samples by multiplying $R_{obs}$ by the total number of samples in each respective gender.

We applied our Bayesian approach to the data listed in Table 1. It is important to emphasize that Bayesian approaches produce entire probability distributions instead of point estimates [6]. Figure 1 depicts the posterior distributions of the seroprevalence of antibodies to SARS-CoV-2 virus in 10 U.S. sites. Table 2 lists both the original CDC point estimates with the accompanying 95% confidence intervals, and our Bayesian estimates, which are presented as the medians and 95% credible intervals of the posterior distributions. It is worth noting that confidence intervals and Bayesian credible intervals are two different concepts [13]; thus, they are not technically comparable despite being listed together in Table 2 for convenience.

This is particularly important for accurate estimation if the true prevalence is low [5].
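The back-calculation of positive counts described in the data paragraph above can be illustrated with a short Python sketch. It inverts the correction $P = (R_{obs} - 0.007)/0.953$ to recover the observed positive rate and multiplies by the number of samples. The function names, the rounding to the nearest integer, and the example inputs (a corrected seroprevalence of 5% in 1000 samples) are assumptions made for illustration; they are not values taken from the CDC tables.

```python
def corrected_seroprevalence(r_obs, sensitivity=0.960, specificity=0.993):
    """Simple correction: P = (R_obs - FPR) / (sensitivity - FPR)."""
    false_positive_rate = 1.0 - specificity
    return (r_obs - false_positive_rate) / (sensitivity - false_positive_rate)


def observed_positive_rate(p_corrected, sensitivity=0.960, specificity=0.993):
    """Inverse of the correction: R_obs = P * sensitivity + (1 - P) * FPR."""
    false_positive_rate = 1.0 - specificity
    return p_corrected * sensitivity + (1.0 - p_corrected) * false_positive_rate


def inferred_positive_count(p_corrected, n_tested, sensitivity=0.960, specificity=0.993):
    """Number of positive samples implied by a corrected estimate (rounded)."""
    r_obs = observed_positive_rate(p_corrected, sensitivity, specificity)
    return round(r_obs * n_tested)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(observed_positive_rate(0.05))
    print(inferred_positive_count(0.05, 1000))
```

The inferred count and the total number of samples are then the two inputs required by the Bayesian model.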
Moreover, the Bayesian approach also provides a natural framework for updating the estimation based on new data, which is particularly relevant to the continuous monitoring of the seroprevalence of coronavirus antibodies. For example, New York City is still releasing coronavirus antibody test results on a weekly basis [15]. By turning the estimated posterior distribution from previous weeks into a prior distribution for the next week, the seroprevalence of coronavirus antibodies can be quickly updated within a solid Bayesian probabilistic inference framework.

Q.D. and X.G. both contributed to the project conception. Q.D. contributed WinBUGS modeling and drafted the manuscript. X.G. contributed R programming and data analysis.

None declared.

References
1. The Promise and Peril of Antibody Testing for COVID-19
2. Test, test, test for covid-19 antibodies: the importance of sensitivity, specificity and predictive powers
3. Seroprevalence of Antibodies to SARS-CoV-2 in 10 Sites in the United States
4. COVID-19: understanding the science of antibody testing and lessons from the HIV epidemic
5. Antibody testing for COVID-19: can it be used as a screening tool in areas with low prevalence?
6. Bayesian Data Analysis
7. Elementary Statistics
8. Validation of a SARS-CoV-2 spike protein ELISA for use in contact investigations and sero-surveillance
9. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility
10. The BUGS Book: A Practical Introduction to Bayesian Analysis
11. Inference from iterative simulation using multiple sequences (with discussion)
12. R: A language and environment for statistical computing. R Foundation for Statistical Computing
13. Understanding and interpreting confidence and credible intervals around effect estimates
14. What policy makers need to know about COVID-19 protective immunity