key: cord-0298549-1phc3qmr
title: Evaluation of commercially available high-throughput SARS-CoV-2 serological assays for serosurveillance and related applications
authors: Stone, M.; Grebe, E.; Sulaeman, H.; Di Germanio, C.; Dave, H.; Kelly, K.; Biggerstaff, B.; Crews, B. O.; Tran, N.; Jerome, K.; Denny, T. N.; Hogema, B.; Destree, M.; Jones, J. M.; Thornburg, N.; Simmons, G.; Krajden, M.; Kleinman, S.; Dumont, L. J.; Busch, M. P.
date: 2021-09-15
DOI: 10.1101/2021.09.04.21262414
sha: 371668a9bfc90f1f4cd31bf0efe7dbf512a25f4d doc_id: 298549 cord_uid: 1phc3qmr
SARS-CoV-2 serosurveys can estimate cumulative incidence for monitoring epidemics, but they require characterization of the performance of the serological assays employed to inform testing algorithm development and interpretation of results. We conducted a multi-laboratory evaluation of 21 commercial high-throughput SARS-CoV-2 serological assays using blinded panels of 1,000 highly characterized blood donor specimens. Assays demonstrated a range of sensitivities (63%-96%), specificities (96%-99%) and precision (ICC 0.55-0.99). Durability of antibody detection in longitudinal samples depended on assay format and immunoglobulin target: anti-spike, direct, or total Ig assays demonstrated more stable or increasing reactivity over time than anti-nucleocapsid, indirect, or IgG assays. Assays with high sensitivity, high specificity and durable antibody detection are ideal for serosurveillance. Less sensitive assays that demonstrate waning reactivity are appropriate for other applications, including characterizing antibody responses after infection and vaccination, and detecting anamnestic boosting by reinfections and vaccine breakthrough infections. Assay performance must be evaluated in the context of the intended use.
Serosurveillance for SARS-CoV-2 infection is critical for monitoring the course of the evolving pandemic and local outbreaks, and it informs estimates of infection fatality ratios, vaccine penetrance, the impact of mitigation measures, and levels of population immunity. Serosurveillance should be conducted with representative population sampling, using well-characterized serological assays selected on the basis of their performance characteristics, and optimized testing algorithms. Assays and algorithms that detect mild or asymptomatic infections are critical for accurately estimating cumulative incidence and case- and death-to-infection ratios.
More than 85 SARS-CoV-2 antibody (Ab) assays had received FDA Emergency Use Authorization (EUA) as of August 19, 2021, ranging from point-of-care tests to fully automated high-throughput platforms [1]. These assays target different immunoglobulins (Ig) against viral antigens: full-length Spike protein [S1/S2], subunit 1 [S1] and/or subunit 2 [S2] of Spike, the receptor binding domain [RBD] of Spike, or the nucleocapsid protein [N]. Detection methods include lateral flow assays [LFA], enzyme-linked immunosorbent assays [ELISA], and chemiluminescent immunoassays [CLIA], and assays detect either total Ig or selectively IgG, IgM or IgA antibodies [1]. Few head-to-head evaluation data are available for high-throughput SARS-CoV-2 serological assays, and few large-scale studies have focused on performance for serosurveillance applications. Comprehensive characterization of assay performance must include sensitivity, specificity, and durability of antibody detection over time since infection.
We conducted a standardized, multi-laboratory comparative assessment of 21 high-throughput, commercially available SARS-CoV-2 serological assays using blinded panels of 1,000 highly characterized, de-identified specimens, including longitudinal and cross-sectional […]
[…] relied on this determination as consistent with applicable federal law and CDC policy (45 C.F.R. part 46, 21 C.F.R. part 56; 42 U.S.C. Sect. 241(d); 5 U.S.C. Sect. 552a; 44 U.S.C. Sect. 3501).
Qualification for CCP donation required documentation of a positive SARS-CoV-2 molecular or serologic test, complete resolution of symptoms 14-28 days prior to donation [2], reactivity on the primary screening assay, the Ortho VITROS SARS-CoV-2 S total Ig Ab assay (Ortho Clinical Diagnostics, Raritan, NJ), and standard allogeneic blood donor qualification criteria [3].
To evaluate the waning of sensitivity over time, longitudinal specimens were included from 24 CCP donors who continued to qualify for CCP donation at each of 4-14 donations (median 9) over 79-126 days (median 95). A COVID-19 Seroconversion Panel consisted of 14 time points from a single source plasma donor during the progression of a SARS-CoV-2 infection over 87 days [4]. Fifteen CCP specimens were each included as 6 blinded replicates to evaluate precision. The dilution panel consisted of six 4-fold serial dilutions of specimens with a range of neat Ab titers [5]. The panel also included 24 apparently serosilent specimens from donors who initially qualified for CCP donation on the basis of a positive molecular test but showed no evidence of seroconversion on the Ortho S total Ig assay. The specificity panel included pre-pandemic blood donor specimens derived from plasma components collected before the end of 2019 (N=432) and 27 donations collected in early 2020 that tested non-reactive on the Ortho CoV2T assay and were non-neutralizing in a pseudovirus neutralization assay [5].
All statistical analyses were performed using the R statistical programming language (v. 4.0.4 [6]) and various packages, including the binom package for confidence intervals on proportions [7], the glm2 package for regression analysis [8] and the ggplot2 package for plotting [9].
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
[…] (1) and (3), 3 by (2) and (3), and only 22 were positive by only one of the definitions. The 24 purposely selected "serosilent" CCP specimens (see Table 2) were excluded from the sensitivity analysis on the basis of criterion 1 above. Specimens from the longitudinal CCP donor cohort were excluded from all sensitivity analyses, because donors who continued to qualify for CCP donation were required to have bAb reactivity for continued donation and could therefore bias sensitivity estimates.
Specificity was assessed using pre-pandemic blood donor specimens (N=432). The 27 seronegative early 2020 donations [5] were not included in the primary analysis but were included in a secondary specificity analysis (N=459) (Appendix Figure 1).
All sensitivity and specificity estimates were based on reported qualitative interpretations of assay results. Results defined and reported by the manufacturer as "equivocal" were excluded from primary sensitivity and specificity estimates. A secondary analysis of sensitivity was conducted in which results reported as equivocal by the testing lab were considered non-reactive (Appendix Figure 2). All 95% confidence intervals are Wilson score intervals.
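The Wilson score interval used for all 95% CIs, together with the primary and secondary handling of equivocal results described above, can be sketched in Python. This is an illustrative sketch, not the authors' code (the analyses were run in R with the binom package); the function names and the result labels "reactive", "nonreactive" and "equivocal" are hypothetical. As a sanity check, 432 non-reactive results out of 432 pre-pandemic specimens give a lower bound of about 99.1%, in line with the 99.1%-100% interval reported for the assays with 100% specificity.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity(results, equivocal_as_nonreactive=False):
    """Proportion reactive among true-positive specimens, with its Wilson CI.

    results: iterable of hypothetical labels "reactive", "nonreactive", "equivocal".
    Primary analysis (default): equivocal results are dropped from the denominator.
    Secondary analysis: equivocal results are counted as non-reactive.
    Returns (reactive_count, denominator, proportion, (ci_low, ci_high)).
    """
    results = list(results)
    reactive = sum(r == "reactive" for r in results)
    if equivocal_as_nonreactive:
        n = len(results)
    else:
        n = sum(r != "equivocal" for r in results)
    return reactive, n, reactive / n, wilson_interval(reactive, n)
```

For example, `wilson_interval(432, 432)` returns approximately `(0.991, 1.0)`. Specificity uses the same interval, with non-reactive results among true negatives as the "successes".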
Coefficients of variation (CV; the ratio of the standard deviation across the six replicate measurements of a specimen to the mean of those six measurements, expressed as a percentage) were computed for each of the replicate specimens (N=90). A limitation of this approach is that assays with a narrower dynamic range produced very low or zero CVs for results at the upper limit of quantification. To account for specimens with reactivity outside the measurement range, we excluded them from the overall repeatability assessment, for which we used intraclass correlation coefficients (ICCs). The ICC expresses between-specimen variance as a proportion of the total variance across the tested replicate specimens. For the Bio-Rad BioPlex assay (Bio-Rad, Hercules, CA), on-board dilutions were performed by the testing lab and used to estimate reactivity in specimens whose initial results were above the assay's limit of quantitation.
The dilution panel (N=55) allows comparative assessment of the linearity of observed vs. expected reactivity measurements above and below assay cutoffs. Expected reactivity is defined as the mean signal intensity measured over six replicates of the neat specimen divided by the dilution factor. These analyses are reported in the supplemental materials.
We assessed both qualitative and quantitative durability of bAb detection in longitudinal CCP specimens (N=209 specimens from 27 donors). Documented dates of symptom onset, symptom resolution or nucleic acid test (NAT)-based diagnosis are not available for these donors, so all analyses are anchored to the index donation. These CCP donors first presented for donation early in the pandemic, typically within one month of symptom resolution [5].
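The CV and ICC calculations described above can be sketched in Python. The text does not state which ICC form was used, so this sketch assumes a one-way random-effects ICC(1,1) computed from ANOVA mean squares on a balanced design (for example, six replicates per specimen), and it assumes the sample standard deviation for the CV. Function names are hypothetical.

```python
import statistics


def cv_percent(replicates):
    """CV (%) = sample SD of the replicate signals / their mean * 100."""
    return statistics.stdev(replicates) / statistics.mean(replicates) * 100


def icc_oneway(groups):
    """One-way random-effects ICC(1,1) for n specimens x k replicates each.

    Estimates between-specimen variance / (between + within variance) from
    ANOVA mean squares. Assumes a balanced design and that results outside
    the measurement range were already removed.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 specimens with the same number (>= 2) of replicates")
    grand = statistics.mean(x for g in groups for x in g)
    means = [statistics.mean(g) for g in groups]
    # Between-specimen mean square: spread of specimen means around the grand mean.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    # Within-specimen mean square: spread of replicates around their own specimen mean.
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Replicates that are identical within each specimen give an ICC of 1.0, while an assay whose replicate noise is large relative to the spread between specimens drifts toward 0.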
Qualitative detection was assessed by estimating the proportion of specimens with detectable bAbs, grouped in 30-day bins of time since the index donation. To account for within-donor correlation, if a donor contributed more than one specimen to a particular time bin, the proportion of that donor's specimens that were reactive was added to the numerator for the bin and only 1 to the denominator, so that the reported proportion detected is the proportion of donors whose bAbs could be detected in each bin.
[…] EUROIMMUN IgA assays, with estimates below 80%. Figure 1, panel C shows patterns similar to those seen with the first and second definitions of true positivity when true positivity was instead defined by the 'operational standard' of bAb reactivity on three or more assays.
Specificities, based on testing 432 pre-COVID-19 specimens, were high, with estimates ranging from 96.1% (95% CI: 93.8%-97.5%; Diazyme DZ-Lite assay) to 100% (95% CI: 99.1%-100%; Abbott IgG N, Bio-Rad BioPlex IgG, Bio-Rad Platelia Total Ig N, and Ortho VITROS Total Ig S assays). Most assays (13/20) had specificities above 99%, and 5/20 assays had specificities of 100% in this panel (Figure 2).
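The donor-weighted 30-day binning described above for qualitative durability can be sketched in Python. The tuple layout `(donor_id, days_since_index, reactive)` and the function name are hypothetical, and the bins are assumed to be half-open intervals [0, 30), [30, 60), and so on, since the exact bin boundaries are not stated.

```python
from collections import defaultdict


def donor_weighted_detection(specimens, bin_days=30):
    """Proportion of donors with detectable bAbs in each time bin.

    specimens: iterable of (donor_id, days_since_index, reactive) tuples.
    Within a bin, each donor adds the fraction of their specimens that were
    reactive to the numerator and exactly 1 to the denominator.
    Returns {bin_start_day: proportion}, ordered by bin.
    """
    per_bin = defaultdict(lambda: defaultdict(list))
    for donor, days, reactive in specimens:
        bin_start = (days // bin_days) * bin_days
        per_bin[bin_start][donor].append(bool(reactive))
    result = {}
    for bin_start, donors in sorted(per_bin.items()):
        numerator = sum(sum(calls) / len(calls) for calls in donors.values())
        result[bin_start] = numerator / len(donors)
    return result


data = [("A", 0, True), ("B", 5, True), ("A", 35, True), ("A", 50, False), ("B", 40, True)]
print(donor_weighted_detection(data))  # {0: 1.0, 30: 0.75}
```

In the 30-59 day bin, donor A contributes 0.5 (one of two specimens reactive) and donor B contributes 1, so the bin reports 1.5 / 2 = 0.75 rather than the specimen-level 2/3.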
Assays with poorer specificity tended to have poorer sensitivity, suggesting no tradeoff between sensitivity and specificity (Appendix Table 1) […]
Durability of bAb detection was highly variable: some assays were reactive at all longitudinal timepoints, while others showed substantial declines in the proportion of reactive specimens over time (Figure 3). IgG assays and anti-N assays generally showed more rapid seroreversion than total Ig and anti-S assays. For example, the Abbott and EUROIMMUN IgG anti-N assays detected antibodies in <70% of specimens collected >90 days after the index donation, while total Ig assays such as the Ortho VITROS S total Ig and Roche Elecsys N total Ig assays detected antibodies in 100% of specimens at these timepoints. Given the relatively small number of donors in the cohort, the declining detection rates at later timepoints were generally not statistically distinguishable from sensitivity at earlier timepoints for these qualitative assays.
Regression models of quantitative signal intensity over time showed statistically significant declining reactivity in some assays. All anti-S total Ig ("direct" antigen sandwich format) assays showed stable or increasing reactivity, while all IgG assays showed declining reactivity over time (Figure 4, panel A). Anti-N assays showed more rapid waning than anti-S assays, with multivariable regression confirming that both assay format and antigen (Ag) […]
[…] Wantai assay, which had an ICC below 0.6 (Figure 5, Appendix Table 2). CVs were generally <10% for low- and medium-titer blinded replicate specimens and higher for high-titer specimens, ranging from ~20% to over 100% (Appendix Table 3).
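The regression models of signal intensity over time are not specified in detail here, so the following is only a minimal single-donor sketch under a stated assumption: log signal is modeled as linear in days since the index donation and fitted by ordinary least squares. It is not the study's glm2-based multivariable model, which also included assay format and antigen target; the function name is hypothetical.

```python
import math


def log_linear_trend(days, signals):
    """Ordinary least-squares fit of log(signal) = a + b * days.

    Returns (b, fold_change_per_30_days). b < 0 indicates waning reactivity,
    b > 0 increasing reactivity. Signals must be positive.
    """
    if len(days) != len(signals) or len(days) < 2:
        raise ValueError("need at least two paired observations")
    ys = [math.log(s) for s in signals]
    mx = sum(days) / len(days)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("days must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(days, ys))
    b = sxy / sxx
    # Multiplicative change in signal over a 30-day interval under the fitted model.
    return b, math.exp(30 * b)
```

A signal that halves every 30 days (for example 8, 4, 2 at days 0, 30, 60) yields a fold change of 0.5 per 30 days. Pooling donors would call for donor-level random effects or clustered errors, which this sketch omits.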
The Ortho VITROS anti-S and Roche Elecsys anti-N total Ig assays had notably low CVs on most replicate specimens (generally below 10%).
Dilutional performance was generally good, with most assays demonstrating reasonable linearity in the relationship between expected and observed reactivity above the assay cutoff (Appendix Figure 5). Assays with greater dynamic ranges tended to show a linear dilutional response even below the cutoff. Most assays had a well-defined inflection point, representing a level of reactivity below which the dilutional response was not linear.
For most assays, all 24 serosilent specimens were non-reactive; 7 assays had 1/24 specimens reactive and 2 assays had 2/24 specimens reactive. The vast majority of apparently serosilent specimens were not detected by any of the serological assays included in this study (Appendix Figure 6). For the single seroconversion series, most assays showed seroconversion over the same two-week timeframe, providing little evidence of variable sensitivity relative to time of infection (Appendix Figure 7).
This study characterized 21 commercial SARS-CoV-2 serological assays, supporting the development, validation, and implementation of testing algorithms for serosurveillance programs, including algorithms that can distinguish natural infection from vaccine-induced seroreactivity.
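The dilutional linearity comparison reported above rests on the expected-reactivity definition given in the methods (mean neat signal over six replicates divided by the dilution factor). A minimal Python sketch follows; the log-log slope is an assumed summary of linearity (a slope near 1 means observed signal scales in proportion to expected), not necessarily the measure used in the supplemental analyses, and the dilution factors 4 to 4096 for six 4-fold dilutions are likewise an assumption.

```python
import math
import statistics


def expected_reactivity(neat_replicates, dilution_factor):
    """Mean neat signal over the replicates divided by the dilution factor."""
    return statistics.mean(neat_replicates) / dilution_factor


def log_log_slope(expected, observed):
    """OLS slope of log(observed) on log(expected); ~1 suggests a proportional response."""
    xs = [math.log(e) for e in expected]
    ys = [math.log(o) for o in observed]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical neat specimen read at 64.0 in all six replicates, diluted 1:4 ... 1:4096.
expected = [expected_reactivity([64.0] * 6, 4 ** i) for i in range(1, 7)]
# An assay that saturates at a hypothetical upper limit of 5.0 flattens the top of the curve.
saturated = [min(e, 5.0) for e in expected]
print(log_log_slope(expected, expected), log_log_slope(expected, saturated) < 1.0)
```

The saturated series illustrates why narrow dynamic ranges understate reactivity at high titers: the least-diluted points fall below the expected line and pull the slope under 1.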
The three most critical characteristics for assays used to conduct serosurveillance are 1) sensitivity, including an assay's ability to detect antibodies following asymptomatic and mildly symptomatic infections that may result in weak Ab responses [11, 12, 13], 2) specificity to […]
The impact of particular performance characteristics on interpretation of serosurveillance data is context dependent. Ideal assays for serosurveillance applications (e.g., total Ig assays […]
It is common practice for assay manufacturers to determine sensitivity based on the timing of seroconversion relative to diagnostic testing or clinical disease. Because this clinical diagnostic definition of sensitivity may not be the most relevant criterion when there is a high rate of mildly symptomatic or asymptomatic infections, alternative definitions of true positivity should be considered. Thus, in this study focused on serosurveillance applications, we used multiple definitions of true positivity to assess sensitivity in practical serosurveillance contexts. Of particular note, including all CCP donors lowers sensitivity estimates because serosilent infections are included, whereas requiring neutralizing activity excludes those cases and results in higher sensitivity estimates.
The ability to detect past infections long after the resolution of symptoms is key to accurately estimating the cumulative incidence of infection from seroreactivity rates; otherwise, complex and unvalidated adjustments for seroreversion may be required [17, 19, 20].
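One of the alternative definitions of true positivity, the 'operational standard' of bAb reactivity on three or more assays, can be illustrated with a small Python sketch. The data layout (one dict per specimen mapping a hypothetical assay name to a reactive/non-reactive call) and the function names are assumptions. This sketch also counts the assay under evaluation toward the three-assay threshold, a choice the text does not settle; excluding it would avoid any self-reinforcement.

```python
def operational_positive(calls_by_assay, min_reactive=3):
    """True if a specimen is reactive on at least `min_reactive` assays."""
    return sum(bool(c) for c in calls_by_assay.values()) >= min_reactive


def sensitivity_vs_operational_standard(panel, assay, min_reactive=3):
    """Sensitivity of `assay` among specimens meeting the operational standard.

    panel: list of dicts mapping assay name -> reactive (True/False).
    Returns None if no specimen meets the standard.
    """
    positives = [s for s in panel if operational_positive(s, min_reactive)]
    if not positives:
        return None
    return sum(bool(s[assay]) for s in positives) / len(positives)


panel = [
    {"A": True, "B": True, "C": True, "D": False},
    {"A": False, "B": True, "C": True, "D": True},
    {"A": True, "B": False, "C": False, "D": False},  # reactive on one assay only
]
print(sensitivity_vs_operational_standard(panel, "A"))  # 0.5
```

The third specimen does not meet the standard, so assay A is scored on the first two specimens only.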
To evaluate the durability of humoral immunity after natural infection and vaccination, and to detect anamnestic boosting of Abs following reinfection or vaccine breakthrough, it is necessary to detect changes in both quantitative signal intensity and qualitative results over time, which requires assays with a wide dynamic range and quantitative precision. Seroreactivity levels on such assays may still plateau at the upper limit of quantitation. Dilution of specimens, which many platforms can perform automatically, extends the dynamic range and enables quantitation of high-titer specimens, as demonstrated by the Bio-Rad BioPlex assay. Furthermore, rates of waning immunity are difficult to assess with assays whose narrow dynamic range constrains detection of declining reactivity, which may persist at the upper limit of quantification. Although qualitative seroreversion was observed over the timescale evaluated in some assays, including ones with a narrow dynamic range (Figure 3), further studies are required to assess the durability of detection over longer timescales. Quantitation of very low-level reactivity is possible with assays that demonstrate linear dilutional performance below the manufacturer-defined thresholds for reactivity, which are generally set to maintain high specificity. Quantitation of high-level reactivity requires assays with a wide dynamic range, or testing of dilutions to extend the measurement range, which is more practical on platforms that support on-board dilutions.
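How dilution testing extends the measurement range can be sketched in Python. The back-calculation used by the testing lab is not described, so this sketch assumes the signal scales linearly with dilution inside the measurement range and takes the least-diluted reading below a hypothetical upper limit of quantitation (ULOQ), multiplied by its dilution factor. The function name and the ULOQ value are hypothetical.

```python
def back_calculated_reactivity(readings, uloq):
    """Estimate neat reactivity from dilution readings.

    readings: list of (dilution_factor, measured_signal); the neat run has
    dilution_factor 1. uloq: upper limit of quantitation (hypothetical value).
    Results at or above the ULOQ are treated as saturated.
    Returns measured * dilution for the least-diluted in-range reading,
    or None if every reading is saturated.
    """
    for dilution, signal in sorted(readings):
        if signal < uloq:
            return signal * dilution
    return None


# Neat and 1:10 runs saturate at the hypothetical ULOQ of 100; the 1:100 run reads 12.5.
print(back_calculated_reactivity([(1, 100.0), (10, 100.0), (100, 12.5)], uloq=100.0))  # 1250.0
```

Choosing the least-diluted in-range reading keeps the estimate as close as possible to the neat specimen, where any departure from linear dilutional response is expected to be smallest.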
We observed that all anti-S total Ig ("direct" antigen sandwich format) assays showed stable or increasing reactivity, presumably due to continued maturation of Ab affinity and/or avidity resulting in increasing signal intensity in these assays [21, 22, 23], while all but one of the IgG assays showed declining reactivity over time. Anti-N assays showed more rapid waning than anti-S assays, with multivariable regression confirming that both assay format and Ag target are important predictors of the rate of waning.
The relatively stable detection of neutralizing activity up to four months after the index donation demonstrates that, in the cross-sectional CCP sample set used in the sensitivity analysis, any waning of nAb titers was very unlikely to have occurred by the time specimens were collected and would therefore not have biased sensitivity analyses based on neutralizing activity.
Although there was sporadic reactivity in a few specimens from serosilent cases, most assays included in this evaluation tested non-reactive on all such specimens. This corroborates the findings of other studies [11, 24] indicating that some infected individuals do not develop a detectable systemic humoral immune response to SARS-CoV-2 infection.
The best performing assays for serosurveillance applications in this evaluation were high-throughput total Ig antigen sandwich format assays, as they met the three key performance criteria of durable Ab detection, sensitivity and specificity.
The Ortho and Roche total Ig assays, which target anti-S and anti-N antibodies respectively, performed well and are currently employed in large-scale serosurveillance studies in the US, Canada, the UK and other countries, including the CDC nationwide blood donor seroprevalence study (COVID Data Tracker). The Wantai assay has been widely used in serosurveillance globally [25, 26, 27]; although it demonstrated lower specificity and reproducibility than the best performing assays, it performs adequately for serosurveillance when those limitations are accounted for. Several other assays, including the Abbott IgG anti-N and EUROIMMUN IgG anti-S assays, have been employed in large-scale serosurveillance but require adjustments for rapid waning and seroreversion to estimate cumulative incidence or attack rates, especially over longer periods and multiple epidemic waves. This study provides data that can be applied to adjust for waning in other studies.
This study has several limitations. Asymptomatic cases are underrepresented in the panel because CCP donors qualify on the basis of recovery from symptomatic infection, potentially resulting in overestimation of sensitivity. The assessment of durability of bAb detection is based on CCP donations from donors whose continued qualification required ongoing Ortho VITROS Total Ig anti-S1 reactivity. Although these CCP donors do not have documented dates of NAT positivity, symptom onset or resolution, their first donations were generally within 1-2 months of symptom resolution [5]. To address these limitations, we developed approaches to adequately characterize sensitivity and durability of reactivity. The number of specimens included in the dilution series subpanels was not sufficient for robust assessment of endpoint dilutional sensitivity.
This study provides a standardized, comparative assessment of 21 SARS-CoV-2 Ab assays from major commercial manufacturers and allows identification of optimal assays and testing algorithms for serosurveillance applications in various contexts. These results also provide performance data applicable to other serological testing use cases relevant to clinicians, public health organizations, laboratorians, and emergency response planners.
[…] immunoglobulin; RVP, reporter viral particle; nAb, neutralizing antibodies.
[…] contributed more than one donation in a time bin contributed the fractional proportion reactive to […]
[…] Table 1 for assay details.
Figure 5. Intraclass correlation coefficients based on blinded replicate sample testing, reflecting the proportion of total variance that is between-sample rather than within-sample variability. S, spike protein; RBD, receptor binding domain; N, nucleocapsid; Ag, antigen; Ab, antibody; Ig, immunoglobulin. See Table 1 for assay details. * Results falling outside the primary measurement range were excluded. ** On-board dilutions were used to estimate reactivity in specimens where initial results fell outside the primary […] for "moderate" (0.5), "good" (0.75) and "excellent" (0.9) repeatability [10].
Food & Drug Administration […]
Selecting COVID-19 convalescent plasma for neutralizing antibody potency using a high-capacity SARS-CoV-2 antibody assay.
Food and Drug Administration. Investigational COVID-19 Convalescent Plasma: Guidance for Industry.
[…] Available COVID-19 serial seroconversion panel for validation of SARS-CoV-2 antibody assays. Diagnostic Microbiology and Infectious Disease.
SARS-CoV-2 antibody persistence in COVID-19 convalescent plasma donors: dependency on assay format and applicability to serosurveillance.
R: A Language and Environment for Statistical Computing.
Binomial Confidence Intervals For Several Parameterizations.
glm2: Fitting generalized linear models with convergence problems.
Elegant Graphics for Data Analysis.
Skavberg Roaldsen K. Intraclass correlation - A discussion and demonstration of basic features.
Lack of antibodies to SARS-CoV-2 in a large cohort of previously infected persons.
Rapid Decay of Anti-SARS-CoV-2 Antibodies in Persons with Mild Covid-19.
SARS-CoV-2 antibody magnitude and detectability are driven by disease severity, timing, and assay.
Multiple SARS-CoV-2 variants escape neutralization by vaccine-induced humoral immunity.
SARS-CoV-2 Serologic Assay Needs for the Next Phase of the US COVID-19 Pandemic Response.
SARS-CoV-2 Vaccines and the Growing Threat of Viral Variants.
Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence. The Lancet 2021.
The Importance and Challenges of Identifying SARS-CoV-2 Reinfections.
Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study.
Estimating the cumulative incidence of SARS-CoV-2 infection and the infection fatality ratio in light of waning antibodies.
[…] CoV-2 lineage in Manaus, Brazil.
Cross-Neutralization of a SARS-CoV-2 Antibody to a Functionally Conserved Site Is Mediated by Avidity.
SARS-CoV-2 Antibody Avidity Responses in COVID-19 Patients and Convalescent Plasma Donors. The Journal of Infectious Diseases 2020.
Intrafamilial Exposure to SARS-CoV-2 Associated with Cellular Immune Response without Seroconversion, France. Emerging Infectious Diseases 2021.
Humoral Immune Response to SARS-CoV-2 in Iceland. The New England Journal of Medicine 2020.
Low SARS-CoV-2 seroprevalence in blood donors in the early COVID-19 epidemic in the Netherlands.
SARS-CoV-2 seroprevalence survey among 17,971 healthcare and administrative personnel at hospitals, pre-hospital services, and specialist practitioners in the Central Denmark Region.