key: cord-0878996-juqkrokx
authors: Perkmann, Thomas; Perkmann-Nagele, Nicole; Breyer, Marie-Kathrin; Breyer-Kohansal, Robab; Burghuber, Otto C; Hartl, Sylvia; Aletaha, Daniel; Sieghart, Daniela; Quehenberger, Peter; Marculescu, Rodrig; Mucher, Patrick; Strassl, Robert; Wagner, Oswald F; Binder, Christoph J; Haslacher, Helmuth
title: Side by side comparison of three fully automated SARS-CoV-2 antibody assays with a focus on specificity
date: 2020-08-10
journal: Clin Chem
DOI: 10.1093/clinchem/hvaa198
sha: a02bd129f84f5d227ba1ab7ffb63d946086054d1
doc_id: 878996
cord_uid: juqkrokx

BACKGROUND: In the context of the COVID-19 pandemic, numerous new serological test systems for the detection of anti-SARS-CoV-2 antibodies have rapidly become available. However, the clinical performance of many of them is still insufficiently described. Therefore, we compared three commercial, CE-marked SARS-CoV-2 antibody assays side by side. METHODS: We included a total of 1,154 specimens from pre-COVID-19 times and 65 samples from COVID-19 patients (≥14 days after symptom onset) to evaluate the test performance of SARS-CoV-2 serological assays by Abbott, Roche, and DiaSorin. RESULTS: All three assays presented with high specificities: 99.2% (98.6-99.7) for Abbott, 99.7% (99.2-100.0) for Roche, and 98.3% (97.3-98.9) for DiaSorin. In contrast to the manufacturers' specifications, sensitivities only ranged from 83.1% to 89.2%. Although the three methods were in good agreement (Cohen's kappa 0.71-0.87), McNemar tests revealed significant differences between results obtained from Roche and DiaSorin. However, the minor differences in specificity resulted in profound discrepancies of positive predictive values at a low seroprevalence of 1%: 52.3% (36.2-67.9), 77.6% (52.8-91.5), and 32.6% (23.6-43.1) for Abbott, Roche, and DiaSorin, respectively. CONCLUSION: We found diagnostically relevant differences in specificities for the anti-SARS-CoV-2 antibody assays by Abbott, Roche, and DiaSorin that have a significant impact on the positive predictive values of these tests.

Coronavirus disease 2019 (COVID-19) is caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which was first described by Chinese scientists in early January 2020 (1). On March 11, the World Health Organization (WHO) officially declared the novel SARS-CoV-2 infections a pandemic, which has now spread rapidly across the entire globe, with more than 13.5 million confirmed cases and over 580,000 confirmed deaths (2). COVID-19 is characterized by a broad spectrum of individual disease courses, ranging from asymptomatic infections to the most severe cases requiring intensive medical care (3). The reliable detection of infected persons and, subsequently, their isolation is essential for the effort to prevent the rapid spread of the SARS-CoV-2 virus. Therefore, reverse transcriptase-polymerase chain reaction (RT-PCR) testing is required for direct detection of the pathogen. Unfortunately, RT-PCR testing can yield false-negative results, mainly due to preanalytical problems (4, 5). On the other hand, serological testing for SARS-CoV-2 specific antibodies can be used as an additional diagnostic tool in case of suspected false-negative RT-PCR results (6) or for individual determination of antibody levels. Moreover, cross-sectional serological studies provide essential epidemiological information to allow a correct estimation of the spread of the disease within a population (7, 8).
The first commercially available serological SARS-CoV-2 tests, mostly standard ELISA tests or lateral flow rapid tests, have not always proved to be sufficiently specific and sensitive (9, 10). Recently, the first tests for fully automated large-scale laboratory analyzers have been launched. The present evaluation aimed to compare three of these test systems, manufactured by Abbott, Roche, and DiaSorin, with a particular focus on whether their specificities are high enough to yield an adequate positive predictive value. In view of the currently low seroprevalence worldwide (11), high specificities are important for reliable seroprevalence studies that attempt to close the gap between the number of RT-PCR confirmed COVID-19 cases and the total number of SARS-CoV-2 infections that have occurred (12). This nonblinded prospective study aims at a detailed comparison of three automated SARS-CoV-2 antibody detection methods with a particular focus on specificity and positive predictive value.

A total of 1,154 samples from three cohorts of patients/participants with sampling dates before January 1, 2020 were used to test specificity. The samples derived from three different collections: a cross-section of the Viennese population (LEAD study), healthy voluntary donors, and patients with rheumatic diseases (described in detail below). For sensitivity testing, 65 samples from COVID-19 patients (≥14 days after symptom onset; median 41 days) were evaluated in parallel on all three analysis platforms. In this late phase, we assumed the majority of donors/patients had reached prominent and constant levels of SARS-CoV-2 specific antibodies. 52 of the 65 donors/patients were non-hospitalized convalescent individuals, two-thirds of them with mild symptoms. Of those, 42 donors/patients were RT-PCR confirmed cases, and 10 were close contacts of RT-PCR confirmed cases (similar to (14)). For asymptomatic donors (n=5), the time from SARS-CoV-2 RT-PCR confirmation to analysis was used instead. We subjected only a single serum sample per patient to sensitivity analysis to avoid data bias due to uncontrolled multiple measurement points of individual patients. Symptom onset was determined by a questionnaire in convalescent donors and by reviewing individual health records in patients. Online Supplemental Table 1 gives a comprehensive overview of characteristics and cohort-specific inclusion and exclusion criteria; further details are given in online Supplemental Tables 2 and 3. The following three test systems were compared:

1) The SARS-CoV-2 IgG assay (Abbott Laboratories) was run on an ARCHITECT analyzer. It detects IgG antibodies against the SARS-CoV-2 nucleocapsid (N) antigen in a chemiluminescent microparticle immunoassay (CMIA). The cut-off suggested by the manufacturer is ≥1.4 (index S/C).

2) The Elecsys® Anti-SARS-CoV-2 assay (Roche Diagnostics) was applied on a Cobas e 801 modular analyzer. It detects total antibodies against the SARS-CoV-2 nucleocapsid (N) antigen in a sandwich electrochemiluminescence assay (ECLIA). The cut-off was pre-defined as ≥1.0 COI. According to the manufacturer, the system delivers qualitative results, either reactive or non-reactive for anti-SARS-CoV-2 antibodies.

3) The LIAISON® SARS-CoV-2 S1/S2 IgG test detects IgG antibodies against the S1/S2 domains of the virus spike protein in a chemiluminescence immunoassay (CLIA). The test was run on a LIAISON® XL analyzer (DiaSorin S.p.A.). The manufacturer suggests a cut-off of ≥15.0 AU/mL (borderline results: 12.0 to <15.0 AU/mL, which require a re-test algorithm). Samples that repeatedly tested borderline were classified as positive. According to the manufacturer, this assay is the only test system in the current comparison to yield quantitative results (AU/mL). In addition to a two-point calibration, precision measurements at 3 levels and a linearity test according to CLSI EP-6A are cited as proof of this.

Unless stated otherwise, continuous data are given as median (quartile 1 - quartile 3). Categorical data are given as counts and percentages. Diagnostic sensitivity and specificity, as well as positive and negative predictive values, were calculated using MedCalc software 19.2.1 (MedCalc Ltd.). 95% confidence intervals (CI) for sensitivity and specificity were calculated according to Clopper and Pearson ("exact" method), with standard logit confidence intervals for the predictive values (16). Receiver operating characteristic (ROC) curve analysis was used to evaluate test accuracy and to compare the diagnostic performance of the three test systems according to DeLong et al. (17).
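As a rough illustration of these calculations (a minimal sketch, not the authors' MedCalc workflow), the following Python code computes sensitivity and specificity with exact Clopper-Pearson confidence intervals and prevalence-dependent predictive values. The counts used below (1,151 of 1,154 pre-COVID-19 samples non-reactive, 58 of 65 COVID-19 samples reactive) are back-calculated assumptions from the reported Roche percentages; the logit confidence intervals for the predictive values are omitted here.

```python
# Minimal sketch of the reported performance metrics: sensitivity/specificity
# with exact (Clopper-Pearson) 95% CIs and prevalence-dependent PPV/NPV.
from scipy.stats import beta


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion k/n."""
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper


def predictive_values(sens, spec, prevalence):
    """Positive and negative predictive values at an assumed seroprevalence."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


# Assumed counts, back-calculated from the reported Roche percentages.
spec = 1151 / 1154
sens = 58 / 65
spec_lo, spec_hi = clopper_pearson(1151, 1154)
sens_lo, sens_hi = clopper_pearson(58, 65)
print(f"specificity {spec:.3f} (95% CI {spec_lo:.3f}-{spec_hi:.3f})")
print(f"sensitivity {sens:.3f} (95% CI {sens_lo:.3f}-{sens_hi:.3f})")
for p in (0.01, 0.05, 0.10):
    ppv, npv = predictive_values(sens, spec, p)
    print(f"prevalence {p:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```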
To describe assay specificity, we used a total of 1,154 serum samples collected before January 1, 2020. The DiaSorin assay (based on the S1/S2-domain antigen) was exclusively negative for an additional six serum samples. A comparative overview of specificity, sensitivity, and positive and negative predictive values at 1%, 5%, and 10% SARS-CoV-2 antibody seroprevalence is shown in online Supplemental Figures 2A and 2B. In the next step, we assessed whether modifying the cut-off values could improve the diagnostic performance indicated by the ROC curves (Figure 2A-C, online Supplemental Table 5). Correlation analysis of measurement values between the different platforms showed only moderate to weak concordance (online Supplemental Fig. 3). Overall agreement between the three assays was good (Cohen's kappa 0.71-0.87; online Supplemental Table 3). Despite this good overall inter-rater agreement, significant differences could be shown using the McNemar test for DiaSorin and Roche (online Supplemental Table 6).
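As an illustration of these agreement statistics, the sketch below computes Cohen's kappa and McNemar's test for paired qualitative results using scikit-learn and statsmodels (not the software used in the study); the example arrays are made-up calls, not the study data.

```python
# Agreement between two assays on the same specimens: Cohen's kappa for
# overall concordance, McNemar's test for a systematic difference in calls.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.contingency_tables import mcnemar

# Paired qualitative results (1 = reactive, 0 = non-reactive); illustrative only.
assay_a = np.array([1, 1, 0, 0, 1, 0, 0, 0, 1, 0])
assay_b = np.array([1, 0, 0, 0, 1, 0, 1, 0, 1, 0])

kappa = cohen_kappa_score(assay_a, assay_b)

# McNemar's test operates on the paired 2x2 table; the discordant
# (off-diagonal) cells drive the test statistic.
table = np.zeros((2, 2), dtype=int)
for a, b in zip(assay_a, assay_b):
    table[a, b] += 1
result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs

print(f"Cohen's kappa: {kappa:.2f}")
print(f"McNemar p-value: {result.pvalue:.3f}")
```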
SARS-CoV-2 is closely related to SARS-CoV and MERS-CoV. Like SARS-CoV-2, those highly virulent pathogens cause severe respiratory syndromes, often with lethal outcome (18). In contrast, infections with other members of the coronavirus family, including 229E, OC43, NL63, and HKU1, usually present as mild colds (19). Compared to SARS-CoV (which, however, is no longer circulating), cross-reactivity between SARS-CoV-2 and endemic seasonal coronaviruses is low. To date, with few exceptions (20), no accumulation of cross-reactivities between anti-SARS-CoV-2 antibodies and seasonal coronavirus antibodies has been found. We have therefore refrained from screening a coronavirus panel for possible cross-reactivity.

To best describe the specificity of a serological test, it is essential to have a reliable reference, i.e., to ensure that the samples used are negative for the target analyte. For SARS-CoV-2, this means using serum/plasma samples obtained before the first appearance of the new virus. Therefore, we compiled large pre-COVID-19 cohorts with the following characteristics: A) samples of an age- and sex-controlled population-based cohort of more than 11,000 participants (LEAD study) (13), randomly chosen from Vienna and surrounding areas (n=494); B) samples of healthy voluntary donors (n=302), which are typically used at our department for the evaluation of new assays; and C) samples of a disease-specific collection of patients with rheumatic diseases, including rheumatoid arthritis and systemic lupus erythematosus (n=358), known to have a high prevalence of autoantibodies and other atypical immune activities, which enhances the potential for interference with serological testing. We found several false-positives in the rheumatological cohort (n=13), and to a lesser extent in the other two cohorts (n=9 in the healthy donor cohort and n=10 in the LEAD study). Notably, false-positive samples did not typically overlap between the different systems, and only one out of 32 false-positive samples was reactive in more than one assay (Abbott and DiaSorin, a sample from the LEAD study). Since these two test systems use different antigens (nucleocapsid vs. S1/S2 proteins) but the same detection method (IgG), this false-positive reaction is likely associated with interference in the IgG measurement.

Calculated specificities strongly depend on the spectrum and the size of the selected specificity cohort. Had we calculated the specificities of each cohort separately, we would have reported variable specificities: cohort A (Abbott 99.2%, Roche 100%, DiaSorin 98.8%), cohort B (Abbott 99%, Roche 99.7%, DiaSorin 98.3%), and cohort C (Abbott 99.4%, Roche 99.4%, DiaSorin 97.5%). Roche would range from an ideal 100% down to 99.4%, the same level as the best result for Abbott, and DiaSorin would either be nearly as good as the worst Abbott specificity or show a difference of 2.5 percentage points to the best Roche value. These variable specificities could have an enormous impact on prevalence-dependent parameters like the PPV. A recent evaluation of the DiaSorin LIAISON® SARS-CoV-2 S1/S2 IgG assay with 1,140 pre-COVID-19 samples reported a specificity of 98.5% (21), nearly perfectly matching the specificity of 98.3% we found when calculating the average of all three cohorts. In contrast, another recent study reported a specificity of 100% for DiaSorin; however, the authors used only n=81 samples for specificity testing (22). Similarly, a further evaluation comparing the three SARS-CoV-2 tests by Abbott, Roche, and DiaSorin found quite different specificities, namely 100%, 98%, and 96.9% for Abbott, Roche, and DiaSorin, respectively. Again, the specificity cohort was very small (n=100, and n=98 for DiaSorin) (23). This underlines the importance of selecting adequately sized testing cohorts to obtain reliable and comparable results. In summary, the specificities of 99.2%, 99.7%, and 98.3% found in the present study are very close to the values of 99.6%, 99.8%, and 98.5% given by the manufacturers for Abbott, Roche, and DiaSorin, which were also established on large collectives.

The COVID-19 positive cohort used in this study for the estimation of sensitivities is relatively small (n=65). However, it has three distinctive features: only one serum sample per patient/donor was included, blood was sampled a median of 41 days after the onset of symptoms, and 80% were non-hospitalized COVID-19 patients (two-thirds of them with mild symptoms). Q.-X. Long et al. (24) have previously shown that in the majority of COVID-19 patients, seroconversion for anti-SARS-CoV-2 IgG and IgM started 13 days after onset of symptoms. In the same publication, serum IgG and IgM levels plateaued within 6 days after seroconversion. This is consistent with the observation that the sensitivity within the first 14 days after symptom onset is highly variable for most SARS-CoV-2 antibody assays but improves >14 days after onset (25, 26). As our median time between symptom onset and blood sampling was 41 days, we expected high sensitivities for all tested assays. Surprisingly, five samples were negative in all three assays: all were RT-PCR confirmed cases, 4/5 were non-hospitalized (42-51 days after symptom onset), two with mild and the other two with moderate symptoms, and all with a symptom duration of <1 week. None of these four patients had a known immune dysfunction or other severe diseases. The fifth patient was an ICU patient with an underlying hematological disease, and the sample was taken on day 15 after symptom onset. For Abbott, there is some evidence that the suggested sensitivity of 100% ≥14 days after symptom onset might not be reproducible; reported sensitivities were 96.95% ≥14 days and 100% ≥17 days after symptom onset. In our study, we observed a sensitivity of 84.6% for Abbott, which is still far below these reported sensitivities. One possible explanation might be the high proportion of non-hospitalized and mild cases in our sensitivity cohort, as antibody levels could depend on disease severity (28, 29). Interestingly, Tang et al. compared assays using the same specimens and validation protocol.

Specificity and sensitivity alone are not sufficient to judge the performance of a diagnostic test; prevalence-dependent accuracy measures like PPV and NPV are necessary, especially the PPV in times of low prevalence (31). For most regions affected by the pandemic, the prevalence of SARS-CoV-2 antibody-positive individuals is unknown but can be estimated to be below 5% (32-34). Seroprevalence can change substantially over time, and large regional differences have been shown, for example, in a large nationwide seroprevalence study in Spain (32). In line with this, the FDA compares the performance of all SARS-CoV-2 EUA-approved antibody tests based on an assumed 5% seroprevalence (35). At this rate, the results presented here correspond to PPVs of 85.1% (74.7-91.7), 94.8% (85-98), and 71.6% (61.7-79.8) for Abbott, Roche, and DiaSorin, respectively. The PPVs of Roche and DiaSorin differ so clearly that not even the 95% CIs overlap. Therefore, we must assume that these two assays differ significantly from each other in terms of positive predictive value. Using these tests at lower seroprevalences, such as 1%, leads to even more pronounced differences.
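For illustration, the positive predictive value at an assumed prevalence p follows directly from sensitivity and specificity. Plugging in the rounded Roche point estimates (sensitivity 89.2%, specificity 99.7%) reproduces the magnitude of the reported values; the published figures were derived from the exact counts and therefore differ slightly.

```latex
% Prevalence dependence of the positive predictive value, worked with the
% rounded Roche point estimates (sens = 0.892, spec = 0.997).
\[
\mathrm{PPV}(p) = \frac{\mathrm{sens}\cdot p}{\mathrm{sens}\cdot p + (1-\mathrm{spec})\,(1-p)}
\]
\[
\mathrm{PPV}(0.05) = \frac{0.892\times 0.05}{0.892\times 0.05 + 0.003\times 0.95} \approx 0.94,
\qquad
\mathrm{PPV}(0.01) = \frac{0.892\times 0.01}{0.892\times 0.01 + 0.003\times 0.99} \approx 0.75
\]
```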
The strength of this study is the side-by-side evaluation of three assays with a large number of negative samples, yielding reliable and comparable specificity data (no missing data). A limitation is the moderate number of positive samples. Moreover, the obtained sensitivities cannot easily be compared to other studies because of the unique composition of our COVID-19 cohort, which includes 80% non-hospitalized patients with mainly mild symptoms. The latter is highly relevant for a potential use of antibody tests to assess seroprevalence in large populations. We found diagnostically relevant differences in specificities for the anti-SARS-CoV-2 antibody assays by Abbott, Roche, and DiaSorin that have a significant impact on the positive predictive value of these tests. We conclude that low seroprevalences require an unusually high specificity of SARS-CoV-2 antibody tests, which pushes some test systems to their limits earlier than others. Therefore, the choice of the test must depend on the respective seroprevalence, and strategies such as confirmation of possibly false-positive test results by additional testing must be considered.

References:
1. A Novel Coronavirus from Patients with Pneumonia in China
2. An interactive web-based dashboard to track COVID-19 in real time
3. Clinical characteristics of 145 patients with corona virus disease 2019 (COVID-19)
4. A case of imported COVID-19 diagnosed by PCR-positive lower respiratory specimen but with PCR-negative throat swabs
5. Positive rate of RT-PCR detection of SARS-CoV-2 infection in 4880 cases from one hospital in Wuhan, China, from Jan to Feb 2020
6. Evaluation of the auxiliary diagnostic value of antibody assays for the detection of novel coronavirus (SARS-CoV-2) [Epub ahead of print]
7. SARS-CoV-2 Serology: Much Hype, Little Data
8. The Role of Antibody Testing for SARS-CoV-2: Is There One?
9. Diagnostic accuracy of an automated chemiluminescent immunoassay for anti-SARS-CoV-2 IgM and IgG antibodies: an Italian experience
10. Test performance evaluation of SARS-CoV-2 serological assays. medRxiv
11. SARS-CoV-2 seroprevalence in COVID-19 hotspots
12. Antibody tests for identification of current and past infection with SARS-CoV-2
13. The LEAD (Lung, Heart, Social, Body) Study: Objectives, Methodology, and External Validity of the Population-Based Cohort Study
14. Convergent antibody responses to SARS-CoV-2 in convalescent individuals
15. Usage Data and Scientific Impact of the Prospectively Established Fluid Bioresources at the Hospital-Based MedUni Wien Biobank
16. Confidence intervals for predictive values with an emphasis to case-control studies
17. Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach
18. Comparative pathogenesis of COVID-19, MERS, and SARS in a nonhuman primate model
19. Forty years with coronaviruses
20. Severe Acute Respiratory Syndrome Coronavirus 2-Specific Antibody Responses in Coronavirus Disease Patients
21. Clinical and analytical performance of an automated serological test that identifies S1/S2-neutralizing IgG in COVID-19 patients semiquantitatively
22. Validation of a chemiluminescent assay for specific SARS-CoV-2 antibody
23. High-throughput immunoassays for SARS-CoV-2, considerable differences in performance when comparing three methods. medRxiv
24. Antibody responses to SARS-CoV-2 in patients with COVID-19
25. Clinical Performance of Two SARS-CoV-2 Serologic Assays
26. Performance Characteristics of the Abbott Architect SARS-CoV-2 IgG Assay and Seroprevalence in Boise, Idaho
27. Performance Characteristics of Four High-Throughput Immunoassays for Detection of IgG Antibodies against SARS-CoV-2
28. Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections
29. Antibody profiles in mild and severe cases of COVID-19
30. Clinical Performance of the Roche SARS-CoV-2 Serologic Assay
31. Measures of Diagnostic Accuracy: Basic Definitions
32. Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study
33. Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): a population-based study
34. Seroprevalence of immunoglobulin M and G antibodies against SARS-CoV-2 in China
35. EUA Authorized Serology Test Performance [Internet]. fda.gov

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 4 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; (c) final approval of the published article; and (d) agreement to be accountable for all aspects of the article, thus ensuring that questions related to the accuracy or integrity of any part of the article are appropriately investigated and resolved.

T. Perkmann, financial support, statistical analysis, administrative support, provision of study material or patients; M.-K. Breyer, provision of study material or patients; R. Breyer-Kohansal, provision of study material or patients; O.C. Burghuber, provision of study material or patients; S. Hartl, provision of study material or patients; D. Aletaha, provision of study material or patients; D. Sieghart, provision of study material or patients; P. Quehenberger, provision of study material or patients; P. Mucher, administrative support, provision of study material or patients; O.F. Wagner, financial support; H. Haslacher, statistical analysis, administrative support.