key: cord-0841159-43fbfglt
authors: Plebani, Mario; Padoan, Andrea; Negrini, Davide; Carpinteri, Benedetta; Sciacovelli, Laura
title: Diagnostic performances and thresholds: the key to harmonization in serological SARS-CoV-2 assays?
date: 2020-05-30
journal: Clin Chim Acta
DOI: 10.1016/j.cca.2020.05.050
sha: d70624079bd8dac19421c92005343e20d145a605
doc_id: 841159
cord_uid: 43fbfglt

BACKGROUND: The evaluation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) specific antibody (Ab) assay performances is of the utmost importance in establishing and monitoring virus spread in the community. In this study focusing on IgG antibodies, we compare reliability of three chemiluminescent (CLIA) and two enzyme linked immunosorbent (ELISA) assays. METHODS: Sera from a total of 271 subjects, including 64 reverse transcription-polymerase chain reaction (RT-PCR) confirmed SARS-CoV-2 patients were tested for specific Ab using Maglumi (Snibe), Liaison (Diasorin), iFlash (Yhlo), Euroimmun (Medizinische Labordiagnostika AG) and Wantai (Wantai Biological Pharmacy) assays. Diagnostic sensitivity and specificity, positive and negative likelihood ratios were evaluated using manufacturers’ and optimized thresholds. RESULTS: Optimized thresholds (Maglumi 2 kAU/L, Liaison 6.2 kAU/L and iFlash 15.0 kAU/L) allowed us to achieve a negative likelihood ratio and an accuracy of: 0.06 and 93.5% for Maglumi; 0.03 and 93.1% for Liaison; 0.03 and 91% for iFlash. Diagnostic sensitivities and specificities were above 93.8% and 85.9%, respectively for all CLIA assays. Overall agreement was 90.3% (Cohen’s kappa = 0.805 and SE = 0.041) for CLIA, and 98.4% (Cohen’s kappa = 0.962 and SE = 0.126) for ELISA. CONCLUSIONS: The results obtained indicate that, for CLIA assays, it might be possible to define thresholds that improve the negative likelihood ratio. Thus, a negative test result enables the identification of subjects at risk of being infected, who should then be closely monitored over time with a view to preventing further viral spread. Redefined thresholds, in addition, improved the overall inter-assay agreement, paving the way to a better harmonization of serologic tests.

The spread of coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a pandemic, with sustained human-to-human transmission. Since the initial identification of COVID-19 in December 2019, there has been an exponential rise in the number of cases worldwide. The reasons for the rapid spread include the high transmissibility of the virus, especially among asymptomatic or minimally symptomatic carriers, the apparent absence of any cross-protective immunity from related coronavirus infections, and the tardy public health response measures (1, 2).

An accurate diagnosis of SARS-CoV-2 infection is required for prompt and effective patient care. In particular, the rapid identification of cases among hospitalized patients remains a high priority in assuring prompt and effective treatment, allocating personal protective equipment (PPE), and preventing nosocomial spread with subsequent community transmission. In addition, accurate diagnosis is of paramount importance in controlling the outbreak, establishing protective measures, monitoring therapy and conducting epidemiological surveillance (3). The detection of the viral genome in respiratory samples, particularly nasopharyngeal specimens for swab-based SARS-CoV-2 testing with RT-PCR, is currently considered the "gold standard" for confirming a clinically suspected diagnosis and identifying asymptomatic carriers (4). SARS-CoV-2 infection can also be detected indirectly by measuring the host immune response. Virus-specific antibody (Ab) detection for COVID-19 should complement nucleic acid testing, particularly in the later stages of infection (i.e., when the virus has been eliminated) (5), in surveying for asymptomatic infection in close contacts, in establishing and monitoring the extent of viral spread in the community, and in conducting epidemiological surveillance (4).

The several assays developed and currently available on the market differ in two major respects: a) the format used, in particular whether the test is a quantitative laboratory-based immunoassay (ELISA or CLIA) or a qualitative point-of-care test (POCT); and b) the SARS-CoV-2 antigen targeted, in particular whether the antibodies are directed against the spike surface protein (S) (namely subunits 1 and 2), and/or the spike receptor binding domain (RBD), and/or the nucleocapsid protein (NC) (6).

In order to use serological tests appropriately "for the right patient at the right time", it is therefore important to validate serological methods that can be used in a specific patient as well as in large-scale studies, by comparing the different available methods in order to identify the right cut-off as well as clinical performance and diagnostic accuracy.

The aim of this study was to evaluate different chemiluminescent (CLIA) and enzyme-linked immunosorbent (ELISA) assays for SARS-CoV-2 antibodies in COVID-19 patients and healthcare operators, in order to identify appropriate cut-offs and evaluate diagnostic accuracy.


A total of 271 subjects (87 healthcare workers, 64 SARS-CoV-2 patients, 19 autoimmune patients and 101 blood donors) underwent analysis with the different chemiluminescent (CLIA) and enzyme-linked immunosorbent assay (ELISA) systems in order to verify the immune response in relation to the different subject categories. With the exception of donors, all subjects underwent at least one nasopharyngeal swab test, analyzed by RT-PCR. Of the 87 healthcare workers, 71 were considered negative (Neg-HW), since at least three sequential molecular results obtained between February 26th and April 10th, 2020 were negative; the remaining 16 were considered positive with mild disease (Mild), and recovered at home with supportive care and isolation. The 64 SARS-CoV-2 patients had moderate or severe disease and were monitored throughout hospitalization; 32 recovered with, and 32 without, the need for air ventilation support (Mod and Sev, respectively). The 19 autoimmune patients, who were SARS-CoV-2 negative and regularly referred to the clinic's consultants for autoimmune disease (AI), were included in order to evaluate possible analytical interferences. The 101 donors were included because their specimens had been collected and frozen at -80°C in 2015, before the emergence of SARS-CoV-2 (Pre-COV). Table 1 reports the number and characteristics of the subjects included in the study by gender, age and negativity or positivity to SARS-CoV-2.

The study protocol (number 23307) was approved by the Ethics Committee of the University-Hospital, Padova.

The following five different analytical systems were evaluated. 

For Liaison and iFlash, precisions were estimated by using two or three human serum pools, respectively, of samples with different values. Estimations of precision were obtained by means of triplicate measurements of aliquots of the same pool, performed for a total of five consecutive days (Liaison) or three consecutive days (iFlash). Analysis of variance was used to estimate precision.
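The variance-component calculation behind such a precision study can be sketched as follows. This is a minimal illustration in Python, assuming a balanced design (the same number of replicates on every day) and a one-way analysis of variance with day as the grouping factor; the function name `precision_from_anova` and the example values are hypothetical and do not reproduce the study data or the software actually used.

```python
from statistics import mean

def precision_from_anova(runs):
    """Estimate repeatability and intermediate precision (as CV%).

    runs: list of days, each a list of replicate results for one pool.
    Assumes a balanced design (same number of replicates per day).
    """
    k = len(runs)        # number of days
    n = len(runs[0])     # replicates per day
    if any(len(day) != n for day in runs):
        raise ValueError("balanced design expected")

    grand = mean(x for day in runs for x in day)
    day_means = [mean(day) for day in runs]

    # One-way ANOVA sums of squares and mean squares
    ss_within = sum((x - m) ** 2 for day, m in zip(runs, day_means) for x in day)
    ss_between = n * sum((m - grand) ** 2 for m in day_means)
    ms_within = ss_within / (k * (n - 1))
    ms_between = ss_between / (k - 1)

    # Variance components: within-day (repeatability) and between-day
    var_between = max(0.0, (ms_between - ms_within) / n)
    sd_repeat = ms_within ** 0.5
    sd_intermediate = (ms_within + var_between) ** 0.5

    return {
        "mean": grand,
        "cv_repeatability": 100 * sd_repeat / grand,
        "cv_intermediate": 100 * sd_intermediate / grand,
    }

# Hypothetical pool measured in triplicate on five consecutive days (kAU/L)
example = [
    [20.1, 19.8, 20.4],
    [20.9, 21.2, 20.7],
    [19.6, 19.9, 19.5],
    [20.3, 20.6, 20.2],
    [20.8, 20.5, 21.0],
]
print(precision_from_anova(example))
```

By construction, the intermediate precision is never smaller than the repeatability, because the between-day component is truncated at zero.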

Maglumi data on repeatability and intermediate precision are reported elsewhere (7).

Analyses were performed using Stata v13.1 (StataCorp, Lakeway Drive, TX), R software v 4.0 (The R Foundation for Statistical Computing) and MedCalc Statistical Software version 19.2.1. Means and standard deviations were used for descriptive statistics. Logarithmic transformation (log10) was applied to skewed data when necessary, before using the parametric t-test to compare groups. Multiple comparisons were made by calculating Bonferroni adjusted (B-adj) p-values. Fisher's exact test was employed to evaluate categorical data. The empirical method and the Youden index were used to estimate the area under the receiver operating characteristic (ROC) curve (AUC) and the best thresholds, respectively. Bland-Altman analysis and Passing-Bablok regressions were used to assess the comparability of CLIA assays. Assessment of agreement was performed by concordance (in percentage) and by Cohen's kappa. Thresholds for harmonizing assay results were determined by an in-house R script iterating the assessment of agreement over all possible combinations of method cut-offs, with a minimum incremental delta of 0.2 kAU/L.

Table 1 reports the demographic characteristics of the study subjects: healthcare workers, SARS-CoV-2 patients, autoimmune patients and donors. The overall mean age of subjects was 51.6 years, with a standard deviation (±SD) of 14.7 (min, 10.2; max, 89.9); 141 (52%) were females. Age

The results of Liaison precisions calculated for IgG at two levels for repeatability and intermediate 

IgG and IgM Ab results are reported in Figures 1 and 2, a log10 scale being used to enhance the visualization of data dispersion and box plots. Results for donors (only for CLIA assays) and negative autoimmune patients were included to verify possible analytical interferences and differences with respect to healthcare workers who repeatedly tested negative on nasopharyngeal swab.

Manufacturers' cut-offs are reported in Table 2.

Different numbers of samples were measured for each assay, depending on the availability of reagents: 170 for Maglumi, 131 for Liaison and 156 for iFlash. The ROC analyses showed overlapping results in terms of AUC for all assays.

Following the manufacturers' specifications, the sensitivities, specificities, likelihood ratios (LR), classification accuracy and Cohen's kappa were calculated and reported (Table 2). The highest sensitivity and specificity were obtained for Maglumi and Liaison, respectively. The performances of the two assays resulted in a negative and positive likelihood ratio of 0.06 and 25.94, respectively.
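For readers less familiar with these indices, the relationship between a 2x2 classification table and the reported metrics can be sketched in Python as below. The function name `diagnostic_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not the counts from Table 2.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic indices from a 2x2 table.

    tp/fn: RT-PCR positive subjects testing positive/negative for antibodies
    tn/fp: RT-PCR negative subjects testing negative/positive for antibodies
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Positive LR: how much a positive result raises the odds of disease
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # Negative LR: how much a negative result lowers the odds of disease
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "accuracy": accuracy,
    }

# Hypothetical counts, for illustration only
print(diagnostic_metrics(tp=90, fn=10, tn=80, fp=20))
```

A negative likelihood ratio close to zero means that a negative result strongly lowers the odds of disease, which is the property the threshold optimization in this study aims to improve.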

Classification accuracies were greater than 90% for Maglumi and iFlash.

Using the Youden index, the best threshold was estimated for each assay. These thresholds differed from the manufacturers' suggested cut-offs, especially for Maglumi and Liaison. The redefined thresholds improved specificity, classification accuracy and positive LR for Maglumi; sensitivity, accuracy and negative LR for Liaison; and sensitivity and negative LR for iFlash. Using these redefined thresholds, the predictive characteristics of each assay were investigated by Fagan's nomogram, taking the prevalence of disease detected among healthcare workers at the University-Hospital of Padova as 0.04 (4%; data not shown). The results showed that the Liaison and iFlash assays allowed an almost perfect classification of negative subjects, with a post-test probability of disease after a negative result of around 0.0015 (0.15%) (Supplementary Fig. 1).
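The Youden index J = sensitivity + specificity - 1 can be maximized over candidate cut-offs with a simple scan, sketched below in Python. This is an illustration under stated assumptions, not the authors' procedure: a result is taken as positive when its value is greater than or equal to the cut-off, each observed value is tried as a candidate, ties are resolved in favor of the lowest cut-off, and the data are hypothetical.

```python
def youden_threshold(values, labels):
    """Return (threshold, J, sensitivity, specificity) maximizing Youden's J.

    values: antibody concentrations (e.g. kAU/L)
    labels: True/1 for RT-PCR positive subjects, False/0 for negative ones
    Assumption: a result is positive when value >= threshold.
    """
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    if not pos or not neg:
        raise ValueError("both positive and negative subjects are required")

    best = None
    for t in sorted(set(values)):            # every observed value is a candidate
        sens = sum(v >= t for v in pos) / len(pos)
        spec = sum(v < t for v in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[1]:      # ties keep the lowest threshold
            best = (t, j, sens, spec)
    return best

# Hypothetical data, for illustration only
values = [0.3, 0.8, 1.1, 2.5, 1.9, 6.0, 12.4, 30.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(youden_threshold(values, labels))
```

Because J weights sensitivity and specificity equally, a cut-off chosen this way need not coincide with the one giving the lowest negative likelihood ratio.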

The pairwise agreements between the results of CLIA and ELISA assays were evaluated considering 

The analyses enabled us to define the best possible thresholds for maximizing the agreement of CLIA results. The highest agreements and Cohen's kappa were achieved with the following assay/threshold settings: a) 92.9% and 0.856 for Maglumi and Liaison with 2.0 kAU/L and 7.6 kAU/L cut-offs, respectively; b) 97.2% and 0.944 for Maglumi and iFlash with 1.6 kAU/L and 10 kAU/L cut-offs, respectively; c) 94.2% and 0.883 for Liaison and iFlash with 7.6 kAU/L and 15.8 kAU/L cut-offs, respectively.
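The pairwise search over cut-off combinations can be sketched as follows. The study used an in-house R script that is not reproduced here; the Python code below is only an illustration of the general idea, assuming paired results from two assays on the same samples, a grid step of 0.2 kAU/L, positivity defined as value >= cut-off, and Cohen's kappa as the quantity to maximize. All function names and data are hypothetical.

```python
from itertools import product

def agreement_and_kappa(a, b):
    """Observed agreement and Cohen's kappa for two binary classifications."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the marginal frequencies of each label
    pe = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in (True, False))
    # Kappa is undefined when chance agreement is 1; 0.0 is used by convention
    kappa = (po - pe) / (1 - pe) if pe < 1 else 0.0
    return po, kappa

def grid(lo, hi, step=0.2):
    """Candidate cut-offs from lo to hi (inclusive) in fixed increments."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 10) for i in range(n + 1)]

def best_pair_thresholds(x, y, grid_x, grid_y):
    """Exhaustively search the cut-off pair that maximizes Cohen's kappa."""
    best = None
    for tx, ty in product(grid_x, grid_y):
        a = [v >= tx for v in x]
        b = [v >= ty for v in y]
        po, kappa = agreement_and_kappa(a, b)
        if best is None or kappa > best["kappa"]:
            best = {"cutoff_x": tx, "cutoff_y": ty, "agreement": po, "kappa": kappa}
    return best

# Hypothetical paired results (kAU/L) for two assays on the same four samples
assay_x = [0.5, 1.0, 3.0, 5.0]
assay_y = [2.0, 4.0, 9.0, 20.0]
print(best_pair_thresholds(assay_x, assay_y, grid(0.2, 6.0), grid(1.0, 21.0)))
```

An exhaustive search is feasible here because the grids are small; with many samples, several cut-off pairs usually give the same kappa, so the tie-breaking rule (here, the first pair found) affects which thresholds are reported.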

Considering all CLIA assays, the highest agreement and Cohen's kappa were achieved with the following threshold settings: 91.0% and 0.879 for Maglumi at 2.0 kAU/L, Liaison at 7.6 kAU/L and iFlash at 15.8 kAU/L.

There is an urgent need to identify strategies for safely easing lockdown measures, thus allowing a return to productive economic levels and social activity. Individuals who test positive for antibodies against SARS-CoV-2 could act as 'shields' against transmission. A recent model suggested that serological testing could make an important contribution to the reduction of viral spread and overall mortality (9). Rigorous comparative performance data are crucial to understanding the potential clinical usefulness of serological assays, starting from the evaluation of analytical performance characteristics to improve the definition of diagnostic accuracy not only in terms of specificity and sensitivity but also as positive and negative likelihood ratios, in order to provide reliable clinical information in different disease prevalence settings. A significant challenge in determining whether an individual is immune to SARS-CoV-2 lies in the fact that, so far, serologic data have mainly been obtained in hospitalized symptomatic patients. Serologic findings in asymptomatic or mildly symptomatic exposures may not present the same high degree of correlation as in hospitalized patients. Therefore, we evaluated not only hospitalized patients but also healthcare operators in order to provide evidence of any different antibody behavior. Currently, there is an urgent need to identify those subjects who were not previously infected with SARS-CoV-2, in order to prevent further spread of the virus. In this study we focused on detecting the virus in healthcare workers, who are at a high risk of contracting the disease and consequently putting patients and coworkers at risk. For this purpose, we performed an in-depth optimization of assay thresholds, to achieve the best negative likelihood ratios. COVID-19 patients were identified according to the currently recognized "gold standard": a positive nasopharyngeal swab test result for both hospitalized patients and healthcare workers.
In view of current knowledge of seroconversion time and antibody kinetics (10, 11), we only included samples collected more than 11 days after the onset of symptoms. SARS-CoV-2 negative healthcare workers were defined as subjects with at least three sequential negative nasopharyngeal swab tests, as a criterion to assure the complete absence of infection. In addition, we included 101 donors with samples collected in 2015 (before the emergence of SARS-CoV-2) and 19 autoimmune patients in order to verify possible analytical interferences.

At the time the study started, the data available on precision for CLIA assays were limited. Apart from Maglumi, which we evaluated in a previous study, Liaison and iFlash assays repeatabilities and intermediate precisions were estimated and results obtained with both methods proved satisfactory.

Figure 2 shows the comparison data for IgM and IgA SARS-CoV-2 antibodies. The Liaison and Euroimmun assays have been developed to measure SARS-CoV-2 IgG only, and therefore IgM results were not available, whilst Euroimmun IgA was evaluated and included in the comparison. As shown, and in agreement with findings reported in recently published studies (12), IgM does not provide valuable information for study purposes, and therefore these results were not included in the assay comparison. Interestingly, as highlighted in Figure 2, some Neg-HW cases were found to be positive, whereas some SARS-CoV-2 positive patients were negative with all IgG, IgM and IgA methods.

This might suggest that some SARS-CoV-2 positive patients neither produce detectable IgG antibodies, nor produce IgA and IgM, even if the nasopharyngeal swab test is positive.

The overall performance of the assays was high, with AUC above 96% for all three CLIA methods; the overlapping confidence intervals indicate that the differences between assays were not statistically significant. Further studies are required to confirm this observation. The threshold redefinition was effective in improving diagnostic performance. Considering the purpose of achieving the best negative likelihood ratio, the optimized cut-offs allowed us to obtain a real improvement for Liaison and iFlash. Furthermore, Fagan's nomogram was used to provide further evidence of the usefulness of IgG values in clinical practice. Given a pre-test probability of disease (e.g. 0.04 or 4%), this tool shows the post-test disease probability for both positive and negative test results. In an ongoing study of the Veneto Region, the estimated prevalence of COVID-19 disease is 4% (data not shown). Consequently, the final probability of infection in a healthcare worker with a negative test result is around 0.15% for Liaison and iFlash and 0.3% for Maglumi; from a clinical viewpoint, this means that a negative result provides a highly satisfactory exclusion power.
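The calculation that Fagan's nomogram performs graphically is Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. A minimal Python sketch is given below; the function name `post_test_probability` is hypothetical, and the inputs are the pre-test probability of 0.04 and the rounded negative likelihood ratios quoted in this paper (0.03 and 0.06).

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability into a post-test probability.

    Uses the odds form of Bayes' theorem:
    post-test odds = pre-test odds * likelihood ratio.
    """
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Pre-test probability and rounded negative likelihood ratios quoted in the text
for assay, lr_neg in [("Liaison/iFlash", 0.03), ("Maglumi", 0.06)]:
    p = post_test_probability(0.04, lr_neg)
    print(f"{assay}: post-test probability after a negative result = {p:.4%}")
```

With these rounded inputs the sketch yields values of the same order as those reported above; small differences are to be expected, since the published likelihood ratios are rounded to two decimal places.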

The between-methods agreements obtained were all above 93%, but our data suggest that the highest agreements can be achieved with ELISA assays. A combination of thresholds was estimated in order to improve the overall agreement between CLIA assays. The thresholds calculated allowed us to obtain the highest agreement (91%) on adopting the threshold that assured the best performance (Youden index) for Maglumi and Liaison, while for iFlash the threshold was higher than that identified with the Youden index, as previously reported. This, in turn, provides preliminary evidence that analytical harmonization is not a "mission impossible".

The present study has some limitations. First, antibody dynamics monitoring was extended to a maximum of 54 days, the initial COVID-19 patients being admitted to our hospital at the end of February 2020. Second, the limitation in sample sizes and reagents precluded measurement with different assays in the same number of subjects. Third, the relationships of currently measured antibodies with neutralizing activity against SARS-CoV-2 were not evaluated. A body of evidence, however, demonstrates that antibodies targeting different domains of S protein, including S1, RBD and S2, may all contribute to virus neutralization (13, 14).

Overall performances of the evaluated CLIA assays were highly satisfactory, and allowed us to achieve an accurate classification. Moreover, good agreement was found between CLIA and ELISA assays. The results obtained indicate that it might be possible to define thresholds that improve the negative likelihood ratio. On considering healthcare workers, a negative test result allowed us to identify negative subjects, who should then be closely monitored over time to prevent viral spread. 

- Serological assays for SARS-CoV-2 are widely available
- Data on diagnostic sensitivity, specificity and likelihood ratio are needed
- The identification of reliable thresholds improves diagnostic accuracy
- The negative predictive value is valuable clinical information

Diagnostic testing for Severe Acute Respiratory Syndrome-Related Coronavirus-2: A Narrative Review

Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship

Emergence of a Novel Coronavirus Disease (COVID-19) and the Importance of Diagnostic Testing: Why Partnership between Clinical Laboratories, Public Health Agencies, and Industry Is Essential to Control the Outbreak

Interpreting Diagnostic Tests for SARS-CoV-2

SARS-CoV-2 positive patients, subdivided into Mild (Mild), Moderate (Mod), and Severe (Sev) symptoms. Statistically significant differences: for Maglumi, Pre-COV and AI vs NegHW and SARS-CoV-2 positive patients (p<0.01 for all) and Mild vs Mod and Sev (p<0.01 and p = 0.022, respectively); iFlash, Pre-COV and AI vs NegHW and SARS-CoV-2 positive patients (p<0.01 for all) and Mild vs Sev

01 for Mod and Sev).

Supplementary Figure 1: Fagan's nomograms for CLIA methods, calculated using the likelihood ratios obtained considering the thresholds from Youden's index, and a prevalence of SARS-CoV-2 infection of 0.04 (4%). Post-test probabilities for positive (LR_positive) and negative (LR_Negative) test results are shown for A) Maglumi, B) Liaison and C) iFlash

Supplementary Figure 2: Bland-Altman and Passing-Bablok analyses for: A) and D) Liaison vs

B) and E) for Liaison and iFlash; C) and F) for iFlash and Maglumi