key: cord-0443556-dl8ztzwt
authors: Rosin, Samuel; Shook-Sa, Bonnie E.; Cole, Stephen R.; Hudgens, Michael G.
title: Estimating SARS-CoV-2 Seroprevalence
date: 2021-11-04
journal: nan
DOI: nan
sha: 160553d61690c80c9db1fcaa6310e2dcd2dd0608
doc_id: 443556
cord_uid: dl8ztzwt

Governments and public health authorities use seroprevalence studies to guide their responses to the COVID-19 pandemic. These seroprevalence surveys estimate the proportion of persons within a given population who have detectable antibodies to SARS-CoV-2. However, serologic assays are prone to misclassification error due to false positives and negatives, and non-probability sampling methods may induce selection bias. In this paper, we consider nonparametric and parametric prevalence estimators that address both challenges by leveraging validation data and assuming equal probabilities of sample inclusion within covariate-defined strata. Both estimators are shown to be consistent and asymptotically normal, and consistent variance estimators are derived. Simulation studies are presented comparing the finite sample performance of the estimators over a range of assay characteristics and sampling scenarios. The methods are used to estimate SARS-CoV-2 seroprevalence in asymptomatic individuals in Belgium and North Carolina.

or vaccination, and then eventually decreasing (or waning) over time. Thus individuals may not have detectable antibodies if never (or very recently) exposed, infected, or vaccinated, or if their antibody levels have waned to a level below the limit of detection (of the assay being employed). To the extent that antibody levels are associated with protection from infection with SARS-CoV-2 or COVID-19 disease (Earle et al., 2021; Khoury et al., 2021) , seroprevalence estimates may be helpful in modeling the fraction of a population which may be immune or less susceptible to Covid-19. Likewise, cross-sectional seroprevalence estimates, combined with certain modeling assumptions and other data, may permit inference about other parameters such as the cumulative incidence of SARS-CoV-2 infection, infection fatality rate, or attack rate (Takahashi et al., 2021; Shioda et al., 2021; Buss et al., 2021; Perez-Saez et al., 2021; Brazeau et al., 2020) .

Unfortunately, seroprevalence studies often suffer from at least two sources of bias: measurement error due to false positives and negatives, and selection bias because convenience sampling designs are often used due to time and cost constraints. Typically, blood tests for antibodies result in a continuous measure of a particular antibody response, such as that of immunoglobulin G, M, or A (IgG, IgM, or IgA), which is dichotomized into a positive or negative classification based on a cutoff value. This dichotomization almost always produces misclassification bias in the form of false positives and false negatives (Bouman et al., 2021) . The following example from Sempos and Tian (2021) demonstrates how this measurement error can lead to biased seroprevalence estimates. Suppose that true seroprevalence is 1% and antibody tests are performed using an assay which perfectly identifies true positives as positive, so with 100% sensitivity, and nearly perfectly identifies true negatives as negative, with 99% specificity. Despite this assay's high sensitivity and specificity, it is straightforward to show that naively using the sample proportion of positive test results as a seroprevalence estimator would, in expectation, lead to a seroprevalence estimate of nearly 2% rather than 1%.

To appropriately account for measurement error, sensitivity and specificity should be estimated and incorporated into seroprevalence estimators. The sensitivity and specificity of a diagnostic test are commonly estimated by performing the assay on samples of known (or gold standard) positives and negatives, respectively. For example, samples from patients who had a case of COVID-19 confirmed with reverse transcription polymerase chain reaction (RT-PCR) testing are often assumed to be true positives. Remnant blood samples that were drawn in 2019 or earlier are often assumed to be true negatives. Sensitivity and specificity are estimated from these validation data as the proportion of samples appropriately classified as positive and negative, respectively. These estimates of sensitivity and specificity can then be incorporated into the prevalence estimator, often with a method popularized by Rogan and Gladen (1978) (see also Marchevsky (1979) ), to appropriately adjust the seroprevalence estimate for assay characteristics.

Many seroprevalence studies are conducted by non-probability sampling methods such as convenience sampling, which may lead to selection bias when characteristics that drive participation in the study are also risk factors for SARS-CoV-2 infection. Probability-based sampling studies are ideal because they are representative by design and lead to less biased estimates than convenience samples analyzed with post-hoc statistical adjustments (Shook-Sa et al., 2020; Accorsi et al., 2021) . However, probability-based sampling may not always be feasible. Convenience sample based estimators often rely on the assumption that each person in a covariate-defined stratum has an equal probability of being in the sample (Elliott and Valliant, 2017) . Under this assumption, population prevalence can be estimated with direct standardization, though complex survey methods like calibration have been used as well in the SARS-CoV-2 setting (e.g., Bajema et al., 2021) .

In this paper, methods are considered which combine standardization and the Rogan-Gladen adjustment to account for both measurement error and selection bias. The article is organized as follows. Section 2 reviews prevalence estimation under measurement error. Nonparametric and parametric standardized prevalence estimators and their large-sample properties are described in Section 3. Section 4 presents simulation studies to evaluate the empirical bias and 95% confidence interval (CI) coverage of the standardized estimators across a range of assay characteristics and bias scenarios. The methods are then applied in Section 5 to estimate seroprevalence of SARS-CoV-2 among all residents of Belgium in 2020 and, separately, in asymptomatic residents of North Carolina during the spring of 2020. Section 6 concludes with a discussion. Proofs are in the Appendices.

2 Seroprevalence estimation under measurement error

Let the true serology status for an individual in the target population be denoted by Y , with Y = 1 if the individual has antibodies against SARS-CoV-2 and Y = 0 otherwise. Our goal is to draw inference about the population seroprevalence π = P (Y = 1). Because of error in the serology assay, Y is not observed directly. Let the result of the serology assay be denoted by X, with X = 1 if the individual tests positive (according to the antibody assay used) and X = 0 otherwise. Three key quantities are sensitivity, the probability that a true positive tests positive, denoted by σ e = P (X = 1 | Y = 1); specificity, the probability that a true negative tests negative, denoted by σ p = P (X = 0 | Y = 0); and the population expectation of the serology assay outcome, denoted by ρ = E(X) = P (X = 1). Unless the assay has perfect sensitivity and specificity with σ e = σ p = 1, ρ typically will not equal π and X will be a misclassified version of Y .

Suppose validation studies are conducted to estimate the sensitivity and specificity by applying the assay to known positives and negatives. Specifically, measurements are taken on n 1 independent and identically distributed (iid) units from strata of the population where Y = 1 and on n 2 iid units from strata where Y = 0. Thus n 1 copies of X are observed to estimate sensitivity and n 2 copies of X are observed to estimate specificity. To estimate seroprevalence in a target population, a 'main' study with n 3 iid copies of X is then conducted, among which true infection status is unknown.

Assume, as is realistic in many SARS-CoV-2 studies, that there is no overlap between the units in each of the three studies. Let δ i be an indicator of which study the ith individual's sample X i is from, with δ i = 1 for the sensitivity study, δ i = 2 for the specificity study, and δ i = 3 for the main study. Note that I(δ i = j) = n j for j = 1, 2, 3, where n = n 1 + n 2 + n 3 and here and throughout summations are taken from i = 1 to n unless otherwise specified. Assume n j /n → c j ∈ (0, 1) as n → ∞.

Let θ = (σ e , σ p , ρ, π) T . Consider the estimatorθ = (σ e ,σ p ,ρ,π RG ) T , whereσ e = n −1 1 I(δ i = 1)X i ,σ p = n −1 2 I(δ i = 2)(1−X i ),ρ = n −1 3 I(δ i = 3)X i , andπ RG = (ρ+σ p −1)/(σ e +σ p −1). The prevalence estimatorπ RG is motivated by rearranging the identity that ρ = πσ e +(1−π)(1−σ p ) and is sometimes referred to as the Rogan-Gladen (1978) estimator.

The estimatorθ can be expressed as the solution (for θ) to the estimating equation vector

where here and below 0 denotes a column vector of zeros. Since the samples were selected from three different populations, the data X 1 , . . . , X n are not identically distributed and care must be taken to derive the large sample properties ofθ. In Appendix A, the estimatorθ is shown to be consistent and asymptotically normal. Specifically, as n → ∞,

The proof of consistency and asymptotic normality is similar to those from standard estimating equation theory (e.g., Boos and Stefanski, 2013, Equation 7 .10), but because the data are not identically distributed the Lindeberg-Feller Central Limit Theorem (CLT) is used in place of the classical Lindeberg-Lévy CLT. Note that the asymptotic variance (1) consists of three components corresponding to the sensitivity, specificity, and main studies. In some circumstances, investigators may be able to decrease the variance ofπ RG by increasing the sample sizes of the sensitivity or specificity studies compared to the main study (Larremore et al., 2020) .

LetV π,RG denote the plug-in estimator defined by replacing σ e , σ p , ρ, π, and c j in (1) witĥ σ e ,σ p ,ρ,π RG , and n j /n for j = 1, 2, 3, and note thatV π,RG /n is the variance estimator proposed by Rogan and Gladen (1978) . By the continuous mapping theorem,V π,RG is consistent for the asymptotic variance assuming σ e > 1 − σ p and can be used to construct Wald-type CIs that asymptotically attain nominal coverage probabilities.

In finite samplesπ RG sometimes yields estimates outside of [0, 1] when (i)σ e < 1 −σ p , (ii) ρ < 1 −σ p , or (iii)ρ >σ e . Indeed, (ii) occurred in the ScreenNC study discussed in Section 5.2.

Estimates are typically truncated to be inside [0, 1] because the true population prevalence must exist in [0, 1] (Hilden, 1979) . In this article, all point estimates and bounds of interval estimates are so truncated. Note, though, that as the three sample sizes grow large the estimatorπ RG yields estimates inside [0, 1] almost surely unless σ e < 1 − σ p . In practice, settings where σ e < 1 − σ p may be very unlikely; in such scenarios, the probability of a positive test result is higher for seronegative persons than for seropositive persons, so such a measurement instrument performs worse in expectation than random guessing. Throughout this manuscript, we assume σ e > 1 − σ p .

3 Standardized seroprevalence estimation

In some settings it may not be reasonable to assume the n 3 copies of X from the main study constitute a random sample from the target population. Suppose instead that for each copy of X a vector of discrete covariates Z is observed, with Z taking on k possible values z 1 , . . . , z k . The covariates Z are of interest because seroprevalence may differ between the strata; for instance, Z might include demographic variables such as age group, race, or gender. Denote the mean of X in the jth stratum as ρ j = P (X = 1 | Z = z j ) and the sample size for the jth stratum as n z j = I(δ i = 3, Z i = z j ), so k j=1 n z j = n 3 . The distribution of strata in the target population, if known, can be used to standardize estimates so they are reflective of the target population (for a review of direct standardization, see van Belle et al. (2004, Chapter 15) ). Denote the proportion of the target population comprised by the jth stratum as γ j = P (Z = z j ) and suppose that these stratum proportions are known with each γ j > 0 and k j=1 γ j = 1. For instance, if the target population were all of the adults in a US state, then γ 1 , . . . , γ k could be obtained from US census data. Assume that all persons in a covariate stratum defined by Z have the same probability of inclusion in the sample. Then the covariates Z in the main study sample have a multinomial distribution with k categories, sample size n 3 , and an unknown sampling probability vector (s 1 , . . . , s k ) T where k j=1 s j = 1. Note that if the main study were a simple random sample from the target population, then the sampling probabilities would be equal to the stratum proportions, that is, s j = γ j for j = 1, . . . , k.

First, consider a seroprevalence estimator which combines nonparametric standardization and the Rogan-Gladen adjustment to account for both selection bias and measurement error. Notice that ρ is a weighted average of the stratum-conditional means ρ j where each weight is a known stratum proportion γ j , i.e., ρ = k j=1 ρ j γ j . A nonparametric standardization estimator for ρ using the sample stratum-conditional prevalencesρ j = n −1 z j I(Z i = z j , δ i = 3)X i for j = 1, . . . , k iŝ ρ SRG = k j=1ρ j γ j . A standardized prevalence estimator accounting for measurement error iŝ π SRG = (ρ SRG +σ p − 1)/(σ e +σ p − 1), which has been used in SARS-CoV-2 seroprevalence studies Barzin et al., 2020; Cai et al., 2020) .

Letting θ s = (σ e , σ p , ρ 1 , . . . , ρ k , ρ, π) T , the estimatorθ s = (σ e ,σ p ,ρ 1 , . . . ,ρ k ,ρ SRG ,π SRG ) T solves the estimating equation vector ψ(X i , Z i ; δ i , θ s ) = ψ e , ψ p , ψ ρ , ψ ρ , ψ π T = 0 where ψ e , ψ p , and ψ π are defined in Section 2; ψ ρ is a k-vector with jth element ψ ρ j = I(Z i = z j , δ i = 3)(X i − ρ j ); and ψ ρ = k j=1 ρ j γ j − ρ. It follows thatθ s is consistent and asymptotically normal and that

The asymptotic variance V π,SRG can be consistently estimated by the plug-in estimatorV π,SRG defined by replacing σ e , σ p , ρ j , s j , π, and c l in (2) withσ e ,σ p ,ρ j , n z j /n 3 ,π SRG , and n l /n for j = 1, . . . , k and l = 1, 2, 3. Consistency ofV π,SRG holds by continuous mapping, and a proof of asymptotic normality and justification of (2) are in Appendix B.

Standardization requires estimating the stratum-conditional mean of X, ρ j = P (X = 1 | Z = z j ). However, when n z j = 0 for some strata j, the corresponding estimatorρ j is undefined, and ρ SRG is then undefined as well. Values of n z j may equal zero for two reasons. First, the study design may exclude these strata (s j = 0), a situation referred to as deterministic or structural nonpositivity (Westreich and Cole, 2010). Second, even if s j > 0, random nonpositivity can occur if no individuals with Z = z j are sampled, which may occur if s j is small or if n 3 is relatively small. When nonpositivity arises, an analytical approach often employed entails "restriction" (Westreich and Cole, 2010), where the target population is redefined to consist only of strata j for which n z j > 0. However, this redefined target population may be less relevant from a public health or policy perspective.

We now consider a parametric approach for modeling the stratum-conditional means ρ j , an alternative strategy which allows inference to the original target population and may also perform better when some strata have small sample sizes. Assume the binary regression model g(ρ j ) = βh(z j ) holds where g is an appropriate link function for a binary outcome like the logit or probit function; β is a row vector of p regression coefficients with intercept β 1 ; and h(z j ) is a user-specified p-vector function of the jth stratum's covariate values that may include main effects and interaction terms, with lth element denoted h l (z j ) and h 1 (z j ) set equal to one to correspond to an intercept. Let supp(z) be the covariate support in the sample, i.e., supp(z) = {z j : n z j > 0} with dimension dim{supp(z)} = k j=1 I(n z j > 0), and assume p ≤ dim{supp(z)} ≤ k. (Note that dim{supp(z)} = k only when there is positivity, and in that caseπ SRG can be used with no restriction needed.)

Under the assumed binary regression model, each ρ j is a function of the parameters β and the covariates z j that define the jth stratum, denoted ρ j (β, z j ) = g −1 {βh(z j )}. A model-based standardized Rogan-Gladen estimator of π isπ SRGM = (ρ SRGM +σ p − 1)/(σ e +σ p − 1), wherê ρ SRGM = k j=1ρ j (β, z j )γ j andβ is the maximum likelihood estimator of β. Estimating equation theory can again be used to derive large-sample properties by replacing the k equations for ρ 1 , . . . , ρ k from Section 3.2 with p equations for β 1 , . . . , β p corresponding to the score equations from the binary regression.

Let θ m = (σ e , σ p , β 1 , . . . , β p , ρ, π) T . The estimatorθ m = (σ e ,σ p ,β 1 , . . . ,β p ,ρ SRGM ,π SRGM ) T is the solution to the vector ψ(X i , Z i ; δ i , θ m ) = ( ψ e , ψ p , ψ β , ψ ρ , ψ π ) T = 0 of estimating equations where ψ e , ψ p , and ψ π are as in Section 2; ψ β is a p-vector with jth element

It follows thatθ m is consistent and asymptotically normal and √ n(π SRGM − π) → d N (0, V π,SRGM ). The asymptotic variance V π,SRGM can be consistently estimated byV π,SRGM , the lower right element of the empirical sandwich variance estimator of the asymptotic variance ofθ m . A proof of asymptotic normality and the empirical sandwich variance estimator are given in Appendix C.

A simulation study was conducted to compareπ RG ,π SRG , andπ SRGM . Four data generating processes (DGPs) were considered, within which different scenarios were defined through full factorial designs that varied simulation parameters π, σ e , σ p , n 1 , n 2 , and n 3 . These DGPs featured no selection bias (DGP 1), selection bias with two strata (DGP 2), and more realistic selection bias settings with 40 strata and 80 strata (DGPs 3 and 4).

For each DGP and set of simulation parameters, sensitivity and specificity validation samples of size n 1 and n 2 were generated with X distributed Bernoulli with a mean of σ e or 1 − σ p , respectively. In DGPs 1 and 2 a main study of size n 3 was then generated where Y was Bernoulli with mean π and X | Y was Bernoulli with mean σ e Y + (1 − σ p )(1 − Y ); in DGPs 3 and 4 X was generated directly from the distribution of X | Z, as will be explained. Simulation parameter values were selected based on the seroprevalence studies described in Section 5. Sensitivity was varied in σ e ∈ {.8, .99}, specificity in σ p ∈ {.8, .95, .99}, and prevalence in π ∈ {.01, .02, . . . , .20}. Sample sizes were n 1 = 40, n 2 = 250, and n 3 = 2500. The full factorial design led to 120 scenarios per DGP, and within each scenario 1,000 simulations were conducted. Performance was measured across the 1,000 simulations by (a) mean bias, computed asπ − π for each estimatorπ; and (b) empirical coverage, i.e., whether the 95% Wald-type CIs based on each variance estimatorV π contained the true prevalence.

Simulations for DGP 1 were conducted to assess the performance ofπ RG when no selection bias was present. The estimatorπ RG was generally unbiased, as seen in Appendix Figure A.1. Performance improved as σ e and σ p tended toward 1, with specificity σ p being a stronger determinant of bias. An exception to these results was for low prevalence π (0.05 or lower), whenπ RG overestimated the true prevalence. Wald CIs based onV π,RG generally attained nominal coverage, as seen in Appendix Figure A.2. However, when prevalence π was near zero, coverage of the 95% CIs did not equal the nominal level. Coverage also decreased as σ p increased toward 1. These variable CI coverage results concord with previous simulation studies evaluatingV π,RG (Lang and Reiczigel, 2014) . Alternative methods for constructing CIs are discussed in Section 6.

In DGP 2, the target population was comprised of two strata defined by a covariate Z ∈ {z 1 , z 2 } with proportions γ 1 = γ 2 = .5. Within the main study, Z was generated from a multinomial distribution of sample size n 3 and sampling probability vector (.2, .8). Individuals' serostatuses were generated from the conditional distribution Y | Z, which was such that P (Y = 1 | Z = z 1 ) = 1.5π and P (Y = 1 | Z = z 2 ) = 0.5π for each value of π. In each simulationπ RG andπ SRG and their corresponding 95% CIs were computed.

The nonparametric standardized estimatorπ SRG was empirically unbiased for true prevalences π ≥ 0.05, as seen in Appendix Figure A .3, and 95% CIs based onV π,SRG attained approximately nominal coverage, as seen in Appendix Figure 4 A.4. As withπ RG in DGP 1, CI coverage fell slightly below the nominal level for very low prevalences, and more noticeably for σ p = .99. Appendix Figures A.3 and A.4 show thatπ RG performed poorly under selection bias, with large negative bias and CI coverage far below the nominal level in most cases.

DGPs 3 and 4 comparedπ SRG andπ SRGM in scenarios with larger numbers of strata.

Three covariates were defined as Z 1 ∈ {z 10 , z 11 }, Z 2 ∈ {z 20 , z 21 , z 22 , z 23 }, and Z 3 ∈ {z 30 , z 31 , z 32 , z 33 , z 34 }, leading to k = 40 strata with proportions (γ 1 , . . . , γ 40 ). Within the main study, Z was again generated as multinomial with size n 3 and known sampling probability vector. Figure 1(a) shows the structure of selection bias in DGP 3 by comparing the stratum proportions and sampling probabilities. Some low-prevalence strata that frequently occur in the population were oversampled, while most remaining strata were undersampled. Individuals' test results were generated from the conditional distribution X | Z, where

. The parameters β 1 = −1, β 2 = −.6, β 3 = .8, β 4 = .6, and β 5 = .4 were set to reflect differential prevalences by stratum, while a "balancing intercept" β 0 (Rudolph et al., 2021) was set to different values so that prevalence π equaled (approximately) {.01, .02, . . . , .20}. The nonparametric estimatorπ SRG and corresponding CI were computed using a restricted target population when random nonpositivity arose; the values of prevalence used to compute bias and coverage were based on the total (unrestricted) population, which is the parameter of interest. The parametric estimator π SRGM was computed with a correctly-specified logistic regression model. Bothπ SRG andπ SRGM performed well in this scenario. Figure 2 shows that the estimators were generally empirically unbiased, though modest bias occurred when σ p = 0.8 and prevalence was low. Appendix Figure A .5 shows 95% CIs based on eitherV π,SRG orV π,SRGM attained nominal coverage, with slight under-coverage for π < 0.05. On average across all 120 scenarios, 89% (range of 86%-92%) of simulated datasets had positivity. As in DGP 2,π RG was seriously biased and the corresponding CIs did not attain nominal coverage.

Data were generated as in DGP 3, but the inclusion of a fourth covariate Z 4 ∈ {z 40 , z 41 } now led to 80 strata. The conditional distribution X | Z was such that logit{P (X = 1 | Z)} = νh(Z), where h(Z) here contains the same terms as in DGP 3 plus a main effect for I(Z 4 = z 41 ) with corresponding coefficient ν 6 . Regression parameters were a balancing intercept ν 0 , ν 1 = −1, ν 2 = 3.25, ν 3 = .8, ν 4 = .6, ν 5 = .4, and ν 6 = .1. The larger value for ν 2 , as compared to β 2 , led to a stronger relationship between X and Z than was present in DGP 3. Figure 1 (b) displays selection bias in DGP 4. Some of the highest-prevalence and most commonly-occurring strata were undersampled to a greater degree than occurred in DGP 3, so in this sense there was more selection bias in DGP 4. Figure 3 shows that onlyπ SRGM was empirically unbiased in this scenario, whileπ SRG typically had a moderately negative bias. Nonpositivity almost always occurred (in either all or all but one of the simulations, for each of the 120 scenarios). The worse performance ofπ SRG may be explained by restriction leading to bias under nonpositivity. CIs based onV π,SRGM typically attained nominal coverage, unlike those based onV π,SRG orV π,RG , as seen in Appendix Figure A .6. However, the trend of empirical coverage being below the nominal level for low prevalence and σ p = .99 was more noticeable compared to other DGPs, perhaps due to greater selection bias. Note thatV π,SRGM was negative for two simulations in a single scenario, and these "Heywood cases" (Kolenikov and Bollen, 2012) were ignored in the coverage calculation for that scenario. Circle size is proportional to prevalence. Points are jittered slightly for legibility, and the diagonal lines denote equality between γ j (stratum proportion) and s j (sampling probability).

In summary, both the nonparametric and parametric standardized estimatorsπ SRG andπ SRGM had low empirical bias and close to nominal 95% CI coverage when there was little nonpositivity. As the number of covariates, amount of selection bias, and potential for nonpositivity increased, the (correctly-specified) parametricπ SRGM generally maintained its performance whileπ SRG had greater empirical bias and the corresponding CIs did not attain nominal coverage levels.

The performance ofπ SRGM was also assessed in scenarios similar to DGPs 3 and 4, but under mild model misspecification. Here, the true conditional distributions of Y | Z were logit{P (Y = 1 | Z)} = βh(Z) and logit{P (Y = 1 | Z)} = νh(Z), where βh(Z) and νh(Z) are the specifications used in the models for P (X = 1 | Z) in DGPs 3 and 4, respectively. The test results X were generated from X | Y as in DGPs 1 and 2. The degree of misspecification in each scenario thus Figure 2 : Empirical bias of the Rogan-Gladen (π RG ), nonparametric standardized (π SRG ), and logistic regression standardized (π SRGM ) estimators from simulation study for DGP 3, described in Section 4.3.1. The six facets correspond to a given combination of sensitivity σ e ('Sens') and specificity σ p ('Spec').

depended on the values of σ e , σ p , and π. Appendix Figures A.7 through A.10 show that, in terms of bias and 95% CI coverage,π SRGM was robust to the misspecification considered.

The standardized Rogan-Gladen methods were applied to a nationwide SARS-CoV-2 seroprevalence study in Belgium conducted across seven week-long collection rounds between March and October 2020 (Herzog et al., 2021) . The final collection round took place before the first vaccine authorization in the European Union in December 2020. Residual sera were collected in a stratified random sample from private laboratories encompassing a wide geographical network, with stratification by age group (10 year groups from 0-9, 10-19, . . . , 90-plus), sex (male or female), and region (Wallonia, Flanders, or Brussels). The presence of SARS-CoV-2 IgG antibodies was determined using a semi-quantitative EuroImmun ELISA. Based on validation studies of n 1 = 181 RT-PCR confirmed COVID-19 cases and n 2 = 326 pre-pandemic negative controls, sensitivity and specificity were estimated to beσ e = .851 andσ p = .988 (Herzog et al., 2021, Table S1 .1). The number of samples for assessing seroprevalence varied between n 3 = 2,960 and n 3 = 3,910 across the seven collection rounds.

In our analysis, we estimated nationwide seroprevalence in Belgium during each collection round standardized by age group, sex, and province (11 total), using 2020 stratum proportion data from the Belgian Federal Planning Bureau (2021). Province was used rather than region to match the covariates selected for weighting in Herzog et al. (2021) . In six of the seven collection rounds nonpositivity arose, with between 2 to 15 of the 220 strata not sampled, so restricted target populations were used for computation ofπ SRG . Forπ SRGM , each logistic regression model had main effects for age group, sex, and province, as well as an interaction term between age group and sex. Collection period Seroprevalence ρ π RG π SRG π SRGM Figure 4: Estimates and corresponding 95% confidence intervals for each of seven collection rounds for the 2020 Belgian seroprevalence study (Herzog et al., 2021) , described in Section 5.1. 95% CIs). The naive estimatesρ were typically greater than the other three estimates and had narrower CIs. The greatest differences were betweenρ andπ RG , which can be attributed to (estimated) measurement error in the assay. Both of the standardized estimates,π SRG andπ SRGM , were similar in value toπ RG in most collection periods. These estimates, in combination with the stratified random sampling design, suggest that the magnitude of measurement error in this study may have been larger than that of selection bias.

Time-specific seropositivity was at its highest level in the third collection period in May 2020, and thereafter followed a general decreasing trend through September 2020. This trend could reflect a decrease in seroprevalence in Belgium during this calendar time period, perhaps due to a decrease in seroincidence and waning antibodies (Herzog et al., 2021) . Alternatively, the apparent trend could reflect changes in care seeking behaviors during the study timeframe which could have affected the composition of the population contributing residual sera in each collection period (Herzog et al., 2021) , although the standardized estimates should at least partially account for changes in the composition of individuals contributing sera over time.

The standardization methods of Section 3 were also applied to ScreenNC, which tested a convenience sample of n 3 = 2,973 asymptomatic patients age 20 and older in North Carolina (NC) for antibodies to SARS-CoV-2 between April to June 2020 (Barzin et al., 2020) , before the authorization of vaccines in the United States. These patients were seeking unrelated medical care at eleven sites in NC associated with the University of North Carolina (UNC) Health Network. The presence of antibodies was determined with the Abbott Architect SARS-CoV-2 IgG assay. Based on validation studies of n 1 = 40 RT-PCR confirmed positive patients and n 2 = 277 pre-pandemic serum samples assumed to be negative, sensitivity was estimated asσ e = 1 and specificity asσ p = 0.989.

In our analysis, seroprevalence was estimated in two relevant target populations. First, we standardized to patients accessing the UNC Health Network during a similar timeframe (21,901 patients from February to June of 2020). The main study sample differed from this UNC target population in terms of age group, race, and sex characteristics, as seen in Table 1 , and meta-analyses suggested that prevalence of COVID-19 infections differed between levels of these covariates in some populations (Pijls et al., 2021; Mackey et al., 2021) , supporting the covariates' use in standardization. Note that several racial classifications, including patient refused and unknown, were reclassified as 'Other'. Second, we standardized to the 2019 NC population over the age of 20 (7,873,971 persons) using covariate data from the American Community Survey (US Census Bureau, 2019). The assumption of equal sampling probabilities may be less reasonable for this target population because not all NC residents are in the UNC Health Network and because there were some geographic areas where no patients in the study sample were from. There was no sample data in the main study for two covariate strata that existed in the UNC Health Network, so restriction was used forπ SRG . Logistic regression models with main effects for sex, race, and age group were used to computeπ SRGM ; interaction effects were not included as the small number of positive test results could have led to model overfit.

The sample proportion of positive tests wasρ = 24/2973 = 0.81%. The sample false positive rate was 1 −σ p = 1.08%, so the data are, at first appearance, consistent with a population prevalence of 0%. Indeed, the Rogan-Gladen seroprevalence estimate wasπ RG = 0% (95% CI 0%, 1.00%). Likewise, the UNC target population had nonparametric and parametric standardized estimates ofπ SRG = 0% (0%, 1.11%) andπ SRGM = 0% (0%, 1.13%), and the NC target population had corresponding estimates of 0% (0%, 1.10%) and 0% (0%, 1.11%). All estimates were truncated into [0, 1]. The closeness of the standardized and unstandardized results may be due to the small number of positive test results and similarities between the sample and the target populations.

We examined nonparametric and model-based standardized Rogan-Gladen estimators, deriving their large-sample properties and consistent variance estimators. While motivated by SARS-CoV-2 seroprevalence studies, the methods considered here are also applicable to prevalence estimation of any binary variable for settings where validation data can be used to estimate the measurement instrument's sensitivity and specificity and covariate data can be used for standardization. Simulation studies demonstrated that both standardized Rogan-Gladen methods had low empirical bias and nominal CI coverage in the majority of practical settings. The empirical results in Section 4 highlight the tradeoffs inherent in choosing which method to use for a seroprevalence study. The parametric standardized estimatorπ SRGM was empirically unbiased even when the number of strata and covariates, and with them the potential for random nonpositivity, increased. A drawback tô π SRGM is the need to correctly specify the form of a regression model. On the other hand, the nonparametric standardized estimatorπ SRG does not require model specification and performed well in scenarios with lower amounts of selection bias and nonpositivity. As the number of strata and covariates grew, however,π SRG was empirically biased and its corresponding 95% CIs did not attain nominal coverage.

For practical use of either method, careful choice of covariates is necessary. Including additional covariates may make the assumption of equal probability of sampling within strata more reasonable. However, such inclusion also makes covariate-defined strata smaller and random nonpositivity more likely. An alternative strategy is to collapse smaller strata with few or no persons to create larger strata, which may make random nonpositivity less likely. However, if strata with sufficiently different sampling probabilities were collapsed together, then the assumption of equal probability of sampling within strata would be violated.

Topics for future research and broader issues in SARS-CoV-2 seroprevalence studies merit mention. In this paper Wald-type confidence intervals are considered, which have known limitations (Brown et al., 2001; Dean and Pagano, 2015) ; alternative types of confidence intervals could be considered based on the bootstrap (Cai et al., 2020) , Bayesian posterior intervals (Gelman and Carpenter, 2020) , or test inversion (DiCiccio et al., 2021) . While the approaches here estimate seroprevalence at a fixed point in time, seroprevalence is a dynamic parameter. For analysis of studies with lengthier data collection periods, extensions of the estimators in this paper could be considered which make additional assumptions (e.g., smoothness, monotonicity) about the longitudinal nature of seroprevalence. Another possible extension could consider variations in assay sensitivity, which may depend on a variety of factors such as: the type of assay used; the recency of exposure, infection, or vaccination of an individual; disease severity in infected individuals; the type and dose of vaccine for vaccinated individuals; and so forth. Where additional data are available related to these factors, then extensions of the standardized Rogan-Gladen estimators which incorporate these additional data could be developed. As an alternative to standardization, inverse probability of sampling weights (Lesko et al., 2017) or inverse odds of sampling weights (Westreich and Cole, 2010) could be considered. Standardization and weighting methods may possibly be combined to create a doubly robust Rogan-Gladen estimator.

Marchevsky, N. (1979) . Re: "Estimating prevalence from the results of a screening test". American

Code and data availability statement R code based on the simulations in Section 4 and data analysis in Section 5, data from the Belgium and North Carolina seroprevalence studies in Section 5, and a Microsoft Excel spreadsheet that computes the estimatorsπ SRG andV π,SRG are available at https://github.com/samrosin/ rgStandardized.

A.1 Proof of asymptotic normality

Taylor expansion of n −1 ψ(X i ; δ i ,θ) around the true parameter θ yields

whereψ(X i ; δ i , θ) = ∂ψ(X i ; δ i , θ)/∂θ T and R is a remainder term. Rearranging and multiplying by

where R * is a new remainder defined below. It is shown below that n −1 −ψ(

where A(θ) and B(θ) are defined below. Therefore, by Slutsky's theorem,

As n → ∞, A n (X, δ, θ) → A(θ) by the assumption that n −1 I(δ i = j) = n j /n → c j ∈ (0, 1) for j ∈ {1, 2, 3}.

Second, for brevity, let ψ e denote ψ e (X i ; δ i , θ i ), and similarly for ψ p , ψ ρ , and ψ π . Define

Note that B(θ) = lim n→∞ B n (X, δ, θ) as E(ψ 2 e ) = E{I(δ i = 1)(X i − σ e ) 2 } = I(δ i = 1)σ e (1 − σ e ) and likewise for E(ψ 2 p ) and E(ψ 2 ρ ). It follows that

Lindeberg-Feller CLT. In particular, let s 2 n = {Var(ψ e ) + Var(ψ p ) + Var(ψ ρ ) + Var(ψ π )} = n 1 σ e (1 − σ e ) + n 2 σ p (1 − σ p ) + n 3 ρ(1 − ρ) and let · denote the Euclidean norm. Note that max i ψ(X i ; δ i , θ) ≤ √ 2 as the maximum magnitude of each element of ψ(X i ; δ i , θ) is 1, and for a given observation X i it is always true that two of the indicators δ 1 , δ 2 , δ 3 equal zero and the third indicator equals one. Therefore for all > 0,

implying the Lindeberg condition holds.

Third, it remains to prove √ nR * → p 0. The outline of the proof of Boos and Stefanski (2013) Theorem 7.2 can be followed, but their assumption of identically distributed data must be removed. Consider the second-order Taylor series expansion of the jth element of the vector n −1 ψ(X i ; δ i ,θ), denoted n −1 ψ j (X i ; δ i ,θ), around the true value θ:

where θ 1 , . . . , θ 4 are on the line segment joiningθ and θ andψ j (X i ; δ i , θ j ) is a 4 × 4 matrix with entry (j, k) equal to ∂ 2 ψ j (X i ; δ i , θ j )/∂θ j ∂θ k for j, k ∈ {1, 2, 3, 4}. Writing these 4 equations in matrix notation yields

is the 4 × 4 matrix with jth row given by (θ − θ) T n −1 ψ j (X i ; δ i , θ j ). Note that Q → p 0 4×4 by the Weak Law of Large Numbers, where in general 0 r×c denotes an r × c matrix of zeros. Also note that n −1 ψ (X i ; δ i , θ) → p −A(θ), where −A(θ) is nonsingular under the assumption that σ e > 1 − σ p . It follows that as n → ∞, R is invertible with probability one. On the set S n where R is invertible, (A.2) can be rearranged to yield

, and note that R * → p 0 4×4 by Slutsky's theo-

−1 + R * by an application of the Sherman-Morrison-Woodbury formula (Miller, 1981) . Thus, substituting this expression for (− R) −1 into (A.3) and multiplying by √ n yields (A.1), where R * = R * {n −1 ψ(X i ; δ i , θ)}. Therefore, because lim n→∞ Pr(S n ) = 1, R * → p 0 4×4 , and √ n {n −1 ψ(X i ; δ i , θ)} → d N (0, B(θ)), by Slutsky's theorem √ nR * → p 0.

Since A(θ) is lower triangular, it follows that

where * denotes quantities not expressed explicitly and

Appendix B Proofs for Section 3.2

The Taylor expansion of n −1 ψ(

which is similar in form to (A.1), except here the estimating equations are dependent on covariates Z. The remainder R * here is distinct from that in Appendix A.1, and in general symbols may be reused and notation may not hold the same meaning across appendices. Below it is established, using an analogous approach to that of Appendix A.1, that

as a block matrix where A = diag (n 1 /n, n 2 /n, n z 1 /n, . . . , n z k /n) is (k + 2) × (k + 2),

where A = diag{c 1 , c 2 , c 3 s 1 , . . . , c 3 s k }, and note that A(θ s ) = lim n→∞ A n (X, Z, δ, θ s ) since n j /n → c j for j ∈ {1, 2, 3}. Note that s 1 , . . . , s k are all nonzero due to positivity, without whichπ SRG is undefined.

Define B(θ s ) to have the same form as B n (X, Z, δ, θ s ) except E is replaced by

and note that max i ψ(X i , Z i ; δ i , θ s ) ≤ √ 3 because at most one of ψ e , ψ p , ψ ρ 1 , . . . , ψ ρ k is nonzero and each element of ψ(X i , Z i ; δ i , θ s ) has a maximum value of one. Therefore for all > 0,

implying the Lindeberg condition holds.

Third, let Q s be analogous to Q from Appendix A.1. Denote the jth entry of θ s as θ s j and note |∂ 2 ψ j (X i , Z i ; δ i , θ s )/∂θ s j θ s l | ≤ 1 for all j, l ∈ {1, 2, . . . , k + 4}. Thus there exists a function

This fact can be used to prove √ nR * → p 0 with the same technique as in Appendix A.1, so we omit the rest of the proof.

Note that A(θ s ) is block lower triangular, and thus

and therefore

The bottom right submatrix of A(θ s ) −1 B(θ s )A(θ s ) −T , and specifically the bottom right element of that submatrix, is of primary interest. Since A and E are both diagonal,

where * denotes quantities not expressed explicitly and

Appendix C Proof for Section 3.3

The proof that θ m is asymptotically normal is given for the case of logistic regression, but it extends to any link function g(·) appropriate for binary regression. Recall that the (p + 4)-vector of estimating equations is

In the above vector ψ e , ψ p , and ψ π are identical to the equations used in Appendices A and B; ψ ρ = k j=1 logit −1 {βh(Z j )}γ j − ρ; and ψ β is a p-vector with jth element ψ β j = I(δ

The rest of the proof is similar to Appendices A.1 and B.1. Namely, first define

as a block matrix where A = diag(n 1 /n, n 2 /n), B is p × p with entry (j, k) equal to n −1 I(

Let A(θ m ) have the same form as A n (X, Z, δ, θ m ), replacing A with A = diag(c 1 , c 2 ) and B with B, with entry (j,

Let B(θ m ) be of the same form as B n (X, Z, δ, θ m ) except with F and G replacing F and G , where F = diag{c 1 σ e (1 − σ e ), c 2 σ p (1 − σ p )} and G is identical to G except with c 3 replacing n 3 /n in each element. Noting that B(θ m ) = lim n→∞ B n (X, Z, δ, θ m ), it follows that

, B(θ m )} by the Lindeberg-Feller CLT. Specifically, note that the range of the inverse logit function is [0, 1] and that all elements of h(Z), the user-specified function of the covariates, must be finite. Then the Lindeberg condition holds by the same logic as in Appendices A.1 and B.1.

√ nR * → p 0 can be proved as in Appendix B.1. Letting θ m j denote the jth entry of θ m , the proof follows from the fact that |∂ 2 ψ j (X i , Z i ; δ i , θ m )/∂θ m j ∂θ m k | < ∞ for all j, k ∈ {1, 2, . . . , p + 4}. Therefore, by Slutsky's Theorem Empirical bias of the Rogan-Gladen (π RG ) estimator from simulation study for DGP 1, described in Section 4.1. The six facets correspond to a given combination of sensitivity σ e ('Sens') and specificity σ p ('Spec'). Figure A .5: Confidence interval coverage of the Rogan-Gladen (π RG ), nonparametric standardized (π SRG ), and parametric standardized (π SRGM ) estimators from simulation study for DGP 3, described in Section 4.3.1. Figure A .6: Confidence interval coverage of the Rogan-Gladen (π RG ), nonparametric standardized (π SRG ), and parametric standardized (π SRGM ) estimators from simulation study for DGP 4, described in Section 4.3.2. 

How to detect and reduce potential sources of biases in studies of SARS-CoV-2 and COVID-19

SeroTracker: a global SARS-CoV-2 seroprevalence dashboard. The Lancet Infectious Diseases

Estimated SARS-CoV-2 seroprevalence in the US as of

SARS-CoV-2 seroprevalences among a southern U.S. population indicates limited asymptomatic spread under physical distancing measures

Essential Statistical Inference: Theory and Methods

Estimating the cumulative incidence of SARS-CoV-2 with imperfect serological tests: Exploiting cutoff-free approaches

COVID-19 infection fatality ratio: Estimates from seroprevalence

Interval estimation for a binomial proportion

Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic

Exact inference for disease prevalence based on a test with unknown specificity and sensitivity

Evaluating confidence interval methods for binomial proportions in clustered surveys

Confidence intervals for seroprevalence

Evidence for antibody as a protective correlate for COVID-19 vaccines

Inference for nonprobability samples

Population projections 2020-2070

Bayesian analysis of tests with unknown specificity and sensitivity

Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States

Seroprevalence of IgG antibodies against SARS coronavirus 2 in Belgium -a serial prospective cross-sectional nationwide study of residual samples

Estimating prevalence from the results of a screening test

Neutralizing antibody levels are highly predictive of immune protection from symptomatic SARS-CoV-2 infection

Testing negative error variances: Is a Heywood case a symptom of misspecification?

Confidence limits for prevalence of disease adjusted for estimated sensitivity and specificity

Jointly modeling prevalence, sensitivity and specificity for optimal sample allocation

Generalizing study results: a potential outcomes perspective

Racial and ethnic disparities in COVID-19-Related infections, hospitalizations, and deaths

We thank Dirk Dittmer and the ScreenNC research team for data access. We also thank Shaina Alexandria, Bryan Blette, and Kayla Kilpatrick for constructive suggestions and discussions. This research was supported by the NIH (Grant R01 AI085073), the UNC Chapel Hill Center for AIDS Research (Grants P30 AI050410 and R01 AI157758), and the NSF (Grant GRFP DGE-1650116). The content is solely the responsibility of the authors and does not represent the official views of the NIH.