key: cord-0865740-aprb819p authors: Perkmann, T.; Perkmann-Nagele, N.; Breyer, M.-K.; Breyer-Kohansal, R.; Burghuber, O. C.; Hartl, S.; Aletaha, D.; Sieghart, D.; Quehenberger, P.; Marculescu, R.; Mucher, P.; Strassl, R.; Wagner, O. F.; Binder, C. J.; Haslacher, H. title: Side by side comparison of three fully automated SARS-CoV-2 antibody assays with a focus on specificity date: 2020-06-05 journal: nan DOI: 10.1101/2020.06.04.20117911 sha: a73a02a9097483e778f6ac110c1477b81aa64dc7 doc_id: 865740 cord_uid: aprb819p Background: In the context of the COVID-19 pandemic, numerous new serological test systems for the detection of anti-SARS-CoV-2 antibodies have become available quickly. However, the clinical performance of many of them is still insufficiently described. Therefore, we compared three commercial, CE-marked SARS-CoV-2 antibody assays side by side. Methods: We included a total of 1,154 specimens from pre-COVID-19 times and 65 samples from COVID-19 patients (≥14 days after symptom onset) to evaluate the test performance of SARS-CoV-2 serological assays by Abbott, Roche, and DiaSorin. Results: All three assays presented with high specificities: 99.2% (98.6-99.7) for Abbott, 99.7% (99.2-100.0) for Roche, and 98.3% (97.3-98.9) for DiaSorin. In contrast to the manufacturers' specifications, sensitivities only ranged from 83.1% to 89.2%. Although the three methods were in good agreement (Cohen's Kappa 0.71-0.87), McNemar's test revealed significant differences between results obtained from Roche and DiaSorin. However, at low seroprevalences, the minor differences in specificity resulted in profound discrepancies of positive predictability; at 1% seroprevalence, it was 52.3% (36.2-67.9), 77.6% (52.8-91.5), and 32.6% (23.6-43.1) for Abbott, Roche, and DiaSorin, respectively.
Conclusion: We find diagnostically relevant differences in specificities for the anti-SARS-CoV-2 antibody assays by Abbott, Roche, and DiaSorin that have a significant impact on the positive predictability of these tests.
COVID-19 is a new disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which was first described by Chinese scientists in early January 2020 (1). On March 11, the WHO officially declared the novel SARS-CoV-2 infections a pandemic, which has now spread rapidly across the entire globe, with almost 6.5 million confirmed cases and over 375,000 confirmed deaths (2). COVID-19 is characterized by a broad spectrum of individual disease courses, ranging from asymptomatic infections to the most severe cases requiring intensive medical care (3). The reliable detection of infected persons and, subsequently, their isolation is essential for the effort to prevent the spread of the SARS-CoV-2 virus quickly and efficiently. Therefore, reverse transcriptase-polymerase chain reaction (RT-PCR) testing is required for direct detection of the pathogen. Unfortunately, RT-PCR testing does not always give a clear answer as to whether a SARS-CoV-2 infection is currently present (4, 5). On the other hand, serological testing for SARS-CoV-2-specific antibodies can be used as an additional diagnostic tool in case of suspected false-negative RT-PCR results (6) or for individual determination of antibody levels. Moreover, cross-sectional serological studies provide essential epidemiological information to allow a correct estimation of the spread of the disease within a population (7, 8). The first commercially available serological SARS-CoV-2 tests, mostly standard ELISA tests or lateral flow rapid tests, have not always proved to be sufficiently specific and sensitive (9, 10). Recently, the first tests for fully automated large-scale laboratory analyzers have been launched.
The present evaluation aims to compare three of these test systems manufactured by Abbott (11), DiaSorin (12), and Roche (13), with particular emphasis on specificity, which is crucial for an adequate positive predictive value given the current low seroprevalence worldwide.
All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 5, 2020. . https://doi.org/10.1101/2020.06.04.20117911 doi: medRxiv preprint
The present study aims at a detailed comparison of three automated SARS-CoV-2 detection methods with a particular focus on specificity and positive predictability. A total of 1,154 samples from three cohorts of patients/participants with sampling dates before 01.01.2020 were used to test specificity. The samples were derived from three different collections: a cross-section of the Viennese population, the LEAD study (14), preselected for samples collected between
(13). According to the manufacturer, the system delivers qualitative results, either being reactive or non-reactive for anti-SARS-CoV-2 antibodies.
3. The LIAISON® SARS-CoV-2 S1/S2 IgG test detects IgG-antibodies against the S1/S2 domains of
Unless stated otherwise, continuous data are given as median (quartile 1 - quartile 3). Categorical data are given as counts and percentages. Diagnostic sensitivity and specificity, as well as positive and negative predictive values, were calculated using MedCalc software 19.2.1 (MedCalc Ltd., Ostend, Belgium).
95% confidence intervals (CI) for sensitivity and specificity were calculated according to Clopper and Pearson ("exact" method), with standard logit confidence intervals for the predictive values (16).
To describe assay specificity, we used a total of 1,154 serum samples collected before SARS-CoV-2 circulated in the population, which are therefore, by definition, negative for SARS-CoV-2-specific antibodies. The three different specificity cohorts A-C (described in detail in Supplementary Figure 1) presented with different rates of false-positives (Table 1), with cohort C (the rheumatic diseases cohort) showing the highest reactivities. We found in total 3,
overlapped with false-negatives in the Abbott test (both nucleocapsid-antigen based assays), whereas DiaSorin was negative for an additional six serum samples exclusively (S1/S2-domain antigen-based assay).
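The exact (Clopper-Pearson) interval named in the statistical methods can be sketched in a few lines of Python. This is only an illustration using the standard library, not the MedCalc implementation the authors used; the function names are hypothetical, and the bounds are found by bisection on the binomial distribution function. The example call uses the 9 false positives among 1,154 pre-COVID-19 samples listed for the Abbott assay in Table 1.

```python
import math


def _log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of P(X = k) for X ~ Binomial(n, p), computed via lgamma to avoid overflow."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return min(1.0, sum(math.exp(_log_binom_pmf(i, n, p)) for i in range(k + 1)))


def clopper_pearson(successes: int, n: int, alpha: float = 0.05):
    """Two-sided exact (Clopper-Pearson) interval for a binomial proportion."""

    def solve(cdf, target):
        # The binomial CDF decreases as p grows, so bisect for cdf(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if cdf(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= successes | p) = alpha / 2
    lower = 0.0 if successes == 0 else solve(
        lambda p: _binom_cdf(successes - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= successes | p) = alpha / 2
    upper = 1.0 if successes == n else solve(
        lambda p: _binom_cdf(successes, n, p), alpha / 2)
    return lower, upper


# 1,154 negative samples, 9 false positives -> 1,145 true negatives
low, high = clopper_pearson(1145, 1154)
print(f"specificity {1145 / 1154:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The same function shows why small specificity cohorts give wide intervals: with n=81 and no false positives, the lower bound lies far below the point estimate of 100%.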
Although specificity and sensitivity are essential criteria for assessing the quality of a test procedure, they say little about the probability that a positive or negative test result indicates the presence or absence of SARS-CoV-2-specific antibodies unless prevalence is taken into account. Therefore, a comparative overview of specificity, sensitivity, as well as positive and negative predictive values at 1%, 5%, and 10% SARS-CoV-2 antibody seroprevalence is shown in Figures 2A and 2B.
As shown in Figure 3A
In the next step, we aimed to assess whether modifying the cut-off values could improve the explanatory power of the ROC-curves.
(Table 3). Despite a good overall inter-rater agreement, significant differences could be shown using McNemar's test for DiaSorin and Roche (Supplementary Table 5).
To the best of our knowledge, this is the first side-by-side comparison of three fully automated SARS-CoV-2 antibody tests using more than 1,200 distinct donor/patient samples. We identified significant differences between two of the three systems, especially regarding positive predictability at the expected low prevalence rates.
SARS-CoV-2, those highly virulent pathogens cause severe respiratory syndromes, often with lethal outcome (18).
In contrast, infections with other members of the coronavirus family, including 229E, OC43, NL63, and HKU1, usually present as mild colds (19). Compared to SARS-CoV (which is no longer circulating), cross-reactivity between SARS-CoV-2 and endemic seasonal coronaviruses is low. To date, with few exceptions (24), no accumulation of cross-reactivities between anti-SARS-CoV-2 antibodies and seasonal coronavirus antibodies has been found. We have therefore refrained from screening a coronavirus panel for possible cross-reactivity. To best describe the specificity of a serological test, it is essential to have a reliable reference, i.e., to ensure that the samples used are negative for the target analyte. For SARS-CoV-2, this means using serum/plasma samples obtained before the first appearance of the new virus. Therefore, we have compiled large pre-COVID-19 cohorts, which have the following characteristics: A) samples of an age and sex-controlled population-based cohort of more than
32 samples tested positive in more than one assay (Abbott and DiaSorin, one sample from the LEAD study). Since these two test systems use different antigens (nucleocapsid vs. S1/S2 proteins) but the same detection method (IgG), this false-positive reaction is likely associated with interference of the IgG measurement. Calculated specificities are strongly dependent on the spectrum and the size of a selected specificity cohort. If we calculated the specificities of each cohort separately, we would be able to report variable specificities: cohort A (Roche 100%, Abbott 99.2%, DiaSorin 98.8%), cohort B (Roche 99.7%, Abbott 99%, DiaSorin 98.3%), and cohort C (Roche 99.4%, Abbott 99.4%, DiaSorin 97.5%).
Roche would range from an ideal 100% down to 99.4%, the same level as the best result for Abbott, and DiaSorin would be nearly as good as the worst Abbott specificity or show a 2.5% difference to the best Roche value. This would have an enormous impact on prevalence-dependent parameters like PPV. A recent evaluation of the DiaSorin LIAISON® SARS-CoV-2 S1/S2 IgG assay with 1,140 pre-COVID-19 samples reported a specificity of 98.5% (20), nearly perfectly matching the specificity of 98.3% we found when calculating the average of all three cohorts. In contrast, another recent study reported a specificity of 100% for DiaSorin. However, the authors used only n=81 samples for specificity testing (21). Similarly, a further evaluation comparing all three SARS-CoV-2 tests by Abbott, Roche, and DiaSorin found quite different specificities, namely 100%, 98%, and 96.9% for Abbott, Roche, and DiaSorin, respectively. Again, the specificity cohort was very small (n=100, and n=98 for DiaSorin) (22). This underlines the importance of selecting adequately sized testing cohorts to obtain reliable and comparable results. In summary, the specificities of 99.7%, 99.2%, and 98.3% found in the present study are very close to the values given by the manufacturers of 99.8%, 99.6%, and 98.5% for Roche, Abbott, and DiaSorin, which were also established on large cohorts.
The COVID-19 positive cohort used in this study for the estimation of sensitivities is relatively small (n=65). However, it has three distinctive features: 1. each patient/donor is represented in the cohort with only one serum sample, avoiding bias of the data by multiple measurements of the same individuals, 2. the median time of blood sampling was 41 days after onset of symptoms and thus in the plateau phase of antibody formation, and 3. 80% of the cohort were non-hospitalized COVID-19 patients (two-thirds of them with mild symptoms), and only 20% were intensive care patients. As sensitivity within the first 14 days after symptom onset is highly variable for most SARS-CoV-2 antibody assays but improves after more than 14 days (23, 24), we expected high sensitivities for all tested assays in the plateau phase of antibody formation. Surprisingly, we found multiple RT-PCR confirmed COVID-19 patients displaying very low antibody titers that did not surpass the respective assay-specific cut-offs and therefore were considered negative.
antibodies. This observation, combined with the claim that the detection of S1/S2 protein-specific antibodies is equivalent to the detection of neutralizing antibodies (nAbs) (20), raises the fundamental question of whether nAbs are detectable in all patients with confirmed COVID-19
protein-specific antibodies (25) nor the assumption that all COVID-19 patients produce measurable titers of nAbs is universally valid (26). A clear answer to the question of whether antibody measurements against nucleocapsid- or spike protein-associated antigens are more sensitive and specific, and how these behave in relation to nAbs assays (the postulated gold standard in terms of sensitivity and specificity), is not possible based on the data currently available.
Specificity and sensitivity alone are not sufficient to judge the performance of a diagnostic test; prevalence-dependent accuracy measures like PPV and NPV are necessary, especially PPV in times of low prevalence (27). For most regions affected by the pandemic, the prevalence of SARS-CoV-2 antibody-positive individuals is unknown but can be estimated to be below 5%. Therefore, for all SARS-CoV-2 EUA-approved antibody tests, the FDA compares the performance of the assays based on a 5% seroprevalence (28). At this rate, the results presented here show PPV values of 94.8% (85-98), 85.1% (74.7-91.7), and 71.6% (61.7-79.8) for Roche, Abbott, and DiaSorin, respectively. The PPV values of Roche and DiaSorin differ so clearly that not even the 95% CIs overlap. Therefore, we must assume that these two assays differ significantly from each other in terms of positive predictability. Using these two tests at lower seroprevalences, such as 1%, leads to an even more pronounced difference between Roche and DiaSorin (77.6% vs. 32.6%) and an unacceptably low PPV of 32.6% (23.6-43.1) for DiaSorin.
significant differences, indicating disagreement (in particular in false-positives) more often than expected by chance.
The strength of this study is the side-by-side evaluation of three assays with a large number of negative samples to give reliable and comparable specificity data (no missing data). A limitation is the moderate number of positive samples. Moreover, the obtained sensitivities cannot easily be compared to other studies because of the unique feature of our COVID-19 cohort, including 80% non-hospitalized patients with mainly mild symptoms.
The latter is highly relevant for a potential use of antibody tests to assess seroprevalence in large populations.
We find diagnostically relevant differences in specificities for the anti-SARS-CoV-2 antibody assays by Abbott, Roche, and DiaSorin that have a significant impact on the positive predictability of these tests. We conclude that low seroprevalences require an unusually high specificity for SARS-CoV-2 antibody tests, which pushes some test systems to their limits earlier than others. Therefore, the choice of the test must depend on the respective seroprevalence, and strategies such as confirmation of possible false-positive test results with additional testing must be considered.
Table 1. False-positive results in the specificity cohorts A-C, n (%).
Assay                                    Cohort A    Cohort B    Cohort C    Total
Abbott SARS-CoV-2 IgG                    4 (0.8%)    3 (1.0%)    2 (0.6%)    9 (0.8%)
DiaSorin LIAISON® SARS-CoV-2 S1/S2 IgG   6 (1.2%)    5 (1.7%)    9 (2.5%)    20 (1.7%)
Table 2. Values for Specificity, Sensitivity, Positive-Predictive-Value (PPV) and Negative-Predictive-Value (NPV) at 1%, 5% and 10% SARS-CoV-2 seroprevalence (SP) with 95% confidence intervals (95% CI).
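The prevalence dependence of PPV and NPV discussed above follows directly from Bayes' theorem. The following Python sketch illustrates that relationship; it is not the authors' calculation. The function name is hypothetical, the specificities are the rounded values from the abstract, and the single sensitivity of 85% is an assumed illustrative value chosen inside the reported range of 83.1% to 89.2%, so the printed numbers will not reproduce Table 2 exactly.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given seroprevalence.

    All arguments are fractions between 0 and 1. The denominators are zero only
    in degenerate cases (e.g. prevalence 0 with specificity 1), not handled here.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Specificities as reported in the abstract (rounded).
SPECIFICITY = {"Roche": 0.997, "Abbott": 0.992, "DiaSorin": 0.983}
# ASSUMPTION: one illustrative sensitivity for all assays, not a per-assay result.
ASSUMED_SENSITIVITY = 0.85

if __name__ == "__main__":
    for assay, spec in SPECIFICITY.items():
        for prevalence in (0.01, 0.05, 0.10):
            ppv, npv = predictive_values(ASSUMED_SENSITIVITY, spec, prevalence)
            print(f"{assay:9s} SP {prevalence:4.0%}: PPV {ppv:6.1%}  NPV {npv:6.1%}")
```

Even with identical sensitivity, the gap between the assays widens as seroprevalence falls, because at 1% seroprevalence the false positives from the 99% of seronegative individuals dominate the pool of positive results.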
A Novel Coronavirus from Patients with Pneumonia in China
An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases
Clinical characteristics of 145 patients with corona virus disease 2019 (COVID-19
A case of imported COVID-19 diagnosed by PCR-positive lower respiratory specimen but with PCR-negative throat swabs
Positive rate of RT-PCR detection of SARS-CoV-2 infection in 4880 cases from one hospital in
Evaluation of the auxiliary diagnostic value of antibody assays for the detection of novel coronavirus (SARS-CoV-2)
SARS-CoV-2 Serology: Much Hype, Little Data
The Role of Antibody Testing for SARS-CoV-2: Is There One?
Diagnostic accuracy of an automated chemiluminescent immunoassay for anti-SARS-CoV-2 IgM and IgG antibodies: an Italian experience
Elecsys Anti-SARS-CoV-2 package insert 2020-04
The LEAD (Lung, Heart, Social, Body) Study: Objectives, Methodology, and External Validity of the Population-Based Cohort Study
Usage Data and Scientific Impact of the Prospectively Established Fluid Bioresources at the Hospital-Based MedUni Wien Biobank. Biopreservation and Biobanking
Confidence intervals for predictive values with an emphasis to case-control studies
Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach
Comparative pathogenesis of COVID-19, MERS, and SARS in a nonhuman primate model
Forty years with coronaviruses
Clinical And Analytical Performance Of An Automated Serological Test That Identifies S1/S2 Neutralizing IgG In Covid-19 Patients Semiquantitatively. bioRxiv. Cold Spring Harbor Laboratory
Validation of a chemiluminescent assay for specific SARS-CoV-2 antibody
High-throughput immunoassays for SARS-CoV-2, considerable differences in performance when comparing three methods. medRxiv
Clinical Performance of Two SARS-CoV-2 Serologic Assays
Performance Characteristics of the Abbott Architect SARS-CoV-2 IgG Assay and Seroprevalence in
Performance of six SARS-CoV-2 immunoassays in comparison with microneutralisation. medRxiv
Neutralizing Antibody Responses to SARS-CoV-2 in a COVID-19 Recovered Patient Cohort and Their Implications
Measures of Diagnostic Accuracy: Basic Definitions. EJIFCC. International Federation of Clinical Chemistry and Laboratory Medicine
US Food Drug Administration. EUA Authorized Serology Test Performance
Table 3. Inter-rater agreement (Cohen's kappa) with linear weights. Kappa values <0.20 indicate poor agreement, 0.21-0.40 fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 good agreement, and 0.81-1.00 very good agreement.
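The two agreement statistics used in this comparison, Cohen's kappa and McNemar's test, can both be computed from a 2x2 table of paired assay results. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, the counts in the example are invented for illustration and are not study data, and the exact binomial form of McNemar's test is an assumption, since the text does not state which variant was used. For two result categories (reactive/non-reactive), linear weights do not change kappa, so the unweighted formula is shown.

```python
import math


def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two assays rated on the same samples.

    a = both positive, b = only assay 1 positive,
    c = only assay 2 positive, d = both negative.
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance from the marginal positive/negative rates.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value based on the discordant pairs only."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# HYPOTHETICAL counts, for illustration only (not taken from the study).
a, b, c, d = 50, 5, 3, 42
print(f"kappa = {cohens_kappa(a, b, c, d):.3f}")
print(f"McNemar exact p = {mcnemar_exact(b, c):.3f}")
```

Kappa summarizes overall agreement, whereas McNemar's test looks only at the discordant pairs, which is why two assays can show good kappa agreement and still differ significantly in the direction of their disagreements.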