key: cord-1049840-miztel9l authors: rudolf, f.; Kaltenbach, H.-M.; linnik, J.; Ruf, M.-T.; Niederhauser, C.; Nickel, B.; Gygax, D.; Savic, M. title: Clinical Characterisation of Lateral Flow Assays for Detection of COVID-19 Antibodies in a population date: 2020-08-21 journal: nan DOI: 10.1101/2020.08.18.20177204 sha: 921eef4d9a3571c01367bbf816b6821bccebb6ee doc_id: 1049840 cord_uid: miztel9l Importance: Serological assays can help diagnose and determine the rate of SARS-CoV-2 infections in a population. Objective: We characterized and compared 11 different lateral flow assays for their performance in diagnostic or epidemiological settings. Design, Setting, Participants: We used two cohorts to determine the speci- ficity: (i) up to 350 blood donor samples from past influenza seasons and (ii) up to 110 samples which tested PCR negative for SARS-CoV-2 during the first wave of SARS-CoV-2 infections in Switzerland. The sensitivity was determined using up to 370 samples which tested PCR positive for SARS-CoV-2 during the same time and is representative for age distribution and severity. Main Outcome: We found a single test usable for epidemiological studies in the current low-prevalence setting, all other tests showed lacking sensitivity or specificity for a usage in either epidemiological or diagnostic setting. However, orthogonal testing by combining two tests without common cross-reactivities makes testing in a low-prevalence setting feasible. Results: Nine out of the eleven tests showed specificities below 99%, only five of eleven tests showed sensitivities comparable to established ELISAs, and only one ful- filled both criteria. Contrary to previous results from lab assays, five tests measured an IgM response in >80% of the samples. We found no common cross-reactivities, which allows orthogonal testing schemes for five tests of sufficient sensitivities. Conclusions and Relevance: This study emphasizes the need for large and diverse negative cohorts when determining specificities, and for diverse and repre- sentative positive samples when determining sensitivities of lateral flow assays for SARS-CoV-2 infections. Failure to adhere to statistically relevant sample sizes or cohorts exclusively made up of hospitalised patients fails to accurately capture the performance of these assays in epidemiological settings. Our results allow a rational choice between tests for different use cases. weakest antigen interaction. In contrast, IgA and IgG antibodies are produced later, but show orders of magnitude higher affinity and can provide long-term protection. Hence, IgM levels are sometimes used for diagnostic testing, while IgA and especially IgG levels can confirm a previous infection or vaccination . 3 Scalable and accurate serological tests are fundamental to the understanding of the cumulative incidence of infections in a population. 4 According to the WHO such knowledge can help in establishing the occurrence of infection in a population, 5 which is crucial to determine the true extent of the disease, the distribution of severe, mild or asymptomatic cases, the infection fatality ratio of a population, the number of cases missed using routine disease surveillance methods, and the proportion of the population potentially protected against future infection. A SARS-CoV-2 infection can lead to a strong IgG and IgA response for all tested epitopes in people with severe symptoms, while oligosymptomatic cases showed a diverse response in both antibody levels and epitope recognition. [6] [7] [8] Most studies also point to a fast IgA and IgG response and a rather weak IgM response. Lateral flow assays (LFA) are scalable and affordable tests 9 that can potentially be used as both diagnostic and epidemiological assays. The use and approval of a test by the FDA requires a rigorous characterisation of its specificity, the probability that a negative sample yields a negative test result, and its sensitivity, the probability that a positive sample yields a positive test result. These characteristics also provide a means to compare different assays in the absence of established clinical decision points. An accurate and precise determination of the specificity hinges on sufficiently many and sufficiently diverse samples to check for unknown cross-reactivities, such as a large, diverse panel of samples from blood donors including samples from flu seasons of the previous years, as performed for the Roche assay. 10 Similarly but using a much smaller panel, post-market validations in Australia 11 and the US 12 are conducted by the regulators to check the products for their actual performance. Most LFA did not reach the manufacturers specification and were deemed "should not be used" in the US. 13 Such cohorts also mimic the setting of a seroprevalence study, an important feature for epidemiological application of a test, 14 where current low prevalence settings require high precision of specificity estimates. 15 3 . CC-BY-NC 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint On the other hand, the sensitivity needs to be assayed using a sufficiently large and representative sample of the antibody response in a population for time post infection, (but especially also in disease severity) i don't know what you mean. 16 A diverse standard panel representative for the SARS-CoV-2 response in a population yields a sufficiently precise estimate. However, most published clinical characterisations of LFA and claims from vendors are not based on such representative samples, but instead use small negative and/or positive cohorts predominantly comprised of hospitalised patients with severe cases of the disease. [17] [18] [19] [20] [21] They thereby neglect the approximately 80% oligosymptomatic cases observed in the population 22 and likely overestimate a the sensitivity of a test when applied population-wide rather than for severe cases only. Here we present the clinical characterisation of eleven commercially available SARS-CoV-2 lateral flow assays (Table 1 ). All tests were characterised using the same positive cohort consisting of all or part of a collection of 366 convalescent samples 23 Overall, most tests appear unsuitable for general use: only two tests, Hightop and Augurix, achieved a specificity higher than 99%. Several tests, including Hightop, showed sensitivities greater than two previously characterized Euroimmun and Epitope Diagnostics ELISA (>92%), while Augurix showed a sensitivity only slightly above 50%. However, our analysis and data provide evidence that orthogonal testing strategies combining several tests with high sensitivity can compensate for the individual low specificities to achieve a combined specificity of more than 98.5% with sensitivities between 85-95% for IgG. Interestingly, we found that five LFA were able to detect an IgM response early after symptom onset in more than 80% of the samples. In contrast to previous findings, this indicates a robust IgM response in SARS-CoV-2 infections. Taken together, our study highlights the need for standardized testing of these assays and suggests applications for those that perform better performing. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . https://doi.org/10.1101/2020.08.18.20177204 doi: medRxiv preprint The clinical specificity Sp=TN/(TN+FP) characterizes the qualitative performance of a dichotomous test based on the number of true negatives (TN) and false positives (FP) and is the probability that a sample with no or very low antibodies yields a negative test result. Conversely, clinical sensitivity Sp=TP/(TP+FN) characterizes the probability that a patient with detectable level of antibodies yields a positive test result based on the number of true positives (TP) and false negatives (FN). To accurately calculate the specificity, we used the plasma of a blood donor cohort com- We characterised eleven different commercially available lateral flow assays (LFA) for detection of SARS-CoV-2 specific IgM and IgG with serum samples from the three cohorts (Table 1 ). All LFA were performed according to their respective manual. In brief, test components were brought to room temperature, sera or plasma aliquots were completely thawed before testing. The test cassette was removed from the sealed pouch and the required amount of sample was pipetted into the specimen well (Table 1) , followed by addition of two or three drops of sample buffer to the specimen or, if present, buffer well. Results were read within the specified time window stated in Table 1 . Lot numbers and expiration dates are given in (Supp. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . We assayed the Hightop test using whole blood, 23 serum and plasma , while all other tests were assayed using serum and plasma. The Hightop and MEDSan assays were characterised at the SwissTPH using the identical biobank and experimental setup as outlined previously. 23 Eight tests were characterised simultaneously at the KUSPO Münchenstein and the Biotime at the FHNW, Muttenz. These latter nine tests were characterised using the experimental design outlined below. We prepared ten 96-well plates to distribute the samples of the SERO-BL-negative, SERO-BL-positive and blood donor cohorts. We aimed at a roughly equal distribution of IgG levels (as previously established using ELISA) on each plate to avoid plate-specific biases and to this end divided the positive samples into five strata of different IgG level, each strata occurring on each plate roughly the same number of times. Assignment of samples of each stratum and the negative cohorts to each plate was fully randomized. We selected two samples from the SERO-BL-negative cohort, and two samples each from 6 . CC-BY-NC 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . medium and high-level IgG patients and replicated each of these samples on each plate to estimate between-plate variation. We also selected one patient sample at random for each plate and replicated it five times on that plate to provide an estimate of withinplate variation. Finally, we randomly selected one patient sample with high IgG level for each plate, and added a ten-fold 1:2 dilution series on the same plate to be able to establish detection limits for each test. Assignment of patient samples to wells was fully randomized individually for each plate and the same plate layouts were used for all tests. Data analysis and creation of figures and tables was carried out using in-house scripts in R; 24 binomial confidence intervals are 95%-Clopper-Pearson intervals calculated using exactci provided by the package PropCIs. 25 is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . Overall, the assays from Hightop and Augurix show specificities >99% for both IgG and IgM, while the assays by CTK, SureBiotech and Mexacare have specificities between 97% and 99% for IgG and IgM. The assay by Biotime also showed a specificity between 97% and 99%, but only for IgM. All other assays have specificities below 95% for both IgG and IgM. The combined IgG/IgM specificity of TAmiRNA is 97.8%. To address potential cross-reactivity with different viruses circulating in the population, we additionally characterized each assay based only on blood donor samples from previous flu seasons. We found no significant differences for any test based on IgM, and only Lumiratek and MEDsan showed markedly worse characteristics for samples from earlier flu seasons. We next checked whether samples are cross-reactive in multiple tests, which could be indicative for common epitopes, membranes or impurities. We found no noticeable shared cross-reactivities: Only 11 samples were positive in two or more tests and only two tests showed common cross-reactive samples for IgG (Supp . Table 5 ). Similarly, only six samples were positive in two or more IgM tests and only two tests showed common cross-reactive samples for IgM (Supp. Table 6 ). Only two samples were cross-reactive in two or more tests for both IgM and IgG. The sensitivity of an assay depends on the time post infection and on the disease severity. To arrive at a comprehensive characterization of the sensitivity of each assay, we used samples from our previously established biobank of the canton of Basel-Landschaft. It contains samples from people tested during the first wave of the pandemic in Switzerland, with a wide range of days post symptom onset and of disease severity; these samples are representative for symptomatic and oligosymptomatic cases. 23 Days post symptoms and disease severity were established with a doctor's interview, and we categorized the severity in 'bedridden', 'help needed', and 'no restriction'. An overview of the results stratified by days after onset of symptoms ( Figure 1A) and severity of the disease ( Figure 1B) . is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. Table 3 ). The assays from SureBiotech, Lumiratek, and Hightop showed sensitivities >94%, a value exceeding the sensitivities of two previously characterised ELISAs (Epitope Diagnostics and Euroimmun), 23 while the assay from Biotime showed a sensitivity of about 91%, comparable to these ELISAs. Tests generally showed higher sensitivities with increasing days post onset of symptoms, while Augurix and NTbio surprisingly detected more cases at >14d, although on a low level. The assays from SureBiotech and Lumiratek additionally showed a response for about 2/3 of the samples with less than 14 days post symptom onset. Levels of IgM are expected to increase early during a disease and then decrease at later timepoints. We also observe this general trend for the 11 tests, even though each test seems to react differently at different stages (Supp Table 3 ). Detection of IgM levels is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. To evaluate the suitability of each test for applications, we calculated the PPV and NPV for an assumed prevalence between 0% and 25% (Figure 3 and Supp. Figure 8 ). The PPV is mainly determined by the specificity of the test, and a low prevalence can severely influence the PPV due to the high proportion of false positive test results ( Figure 3 ). For IgG at >21d post onset of symptoms, only the assays by Hightop and Augurix showed sufficient PPV for low prevalence, while all other assays showed poor performance, often far below 50% PPV (meaning at least one in two positive tests is incorrect). Even at prevalence as high as 5%, the tests by MEDSan, Biotime, Biozek, and NTBio have PPVs below 50%. Results for IgM for 7-28 days post symptoms are generally worse. Conversely, the NPV is largely determined by the sensitivity of a test and usually decreases with increasing prevalence, as proportionally more positive cases are observed; this decrease in NPV is therefore more pronounced for tests with low sensitivity, especially Augurix and CTK (Figure 3 ). 14 . CC-BY-NC 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . https://doi.org/10.1101/2020.08.18.20177204 doi: medRxiv preprint Europe, augmented by samples from blood donors of previous influenza seasons. Our point estimates of specificities differ from those previously reported in studies or by manufacturers. However, the corresponding interval estimates show compatible estimates with all but one previous study (Supp Table 7 ). Our comparatively large number of negative samples resulted in a narrower interval estimation, indicating higher precision. We did not observe clusters of samples showing cross-reactivity, these false positive samples are unlikely to come from a common recognition of the employed epitopes. This also indicates that few common cross-reactivities exist and specificities are therefore inherent to each lateral flow assay. Consequently, orthogonal testing strategies become viable options for increasing specificity beyond a single test. Our results indicate that specificity increases dramatically while the sensitivity remains usable, most often by employing a combination of the Spike and NCP as epitopes. Our point estimates of the sensitivities differ substantially from previous reports for some tests (Supp Table 7 ), but interval estimates are again compatible. In particular, our estimated sensitivities of samples of bedridden patients with >21 days post symptoms are comparable to most previous studies. This subcohort includes hospitalised patients and might therefore be comparable to the previous studies, but no definite statement is possible due to the small size of this subcohort. Importantly, our biobank allows stratifying the sensitivity estimates by days post symptoms and disease severity, thereby providing a more detailed picture of test performance for oligosymptomatic patients which comprise about 80% of cases in a population. We found that for the 'no restriction' cohort, three assays-SureBiotech, Lumiratek, Hightop-perform similarly to previously characterised ELISAs. In contrast to previous reports, we were able to determine an early IgM response with several assays. This hints at the possibility of using IgM for diagnostic purposes in early infection, but our cohort is again for a conclusive statement. Additionally, the sensitive IgM assays show low specificity resulting in a high false positive rate, and their results can therefore only serve as an additional input for a clinical decision. At the moment, there is no diagnostic value in the IgG measurement as a comparison to neutralising titers is missing in all studies as well as the vendors specifications. . CC-BY-NC 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . We anticipate that our characterisation of 11 LFA on common and diverse samples provides a basis to establish clinical and epidemiological decision points from which analytical sensitivities can be established. This requires standard panels of antibodies representative for the immune response to a SARS-CoV-2 infection, which are still under development. 26 In contrast, the clinical specificity will always remain the sole meaningful one and future specificity tests have to be performed on rather large, diverse panels of negative samples, especially as only few common cross-reactivities were detected. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . https://doi.org/10.1101/2020.08.18.20177204 doi: medRxiv preprint . CC-BY-NC 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . https://doi.org/10.1101/2020.08.18.20177204 doi: medRxiv preprint . CC-BY-NC 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted August 21, 2020. . https://doi.org/10.1101/2020.08.18.20177204 doi: medRxiv preprint Overview of the immune response Structure and function of immunoglobulins A guide to utilization of the microbiology laboratory for diagnosis of infectious diseases: 2018 update by the Infectious Diseases Society of America and the Serology assays to manage COVID-19 World Health Organization . Population-based age-stratified seroepidemiological investigation protocol for coronavirus 2019 (COVID-19) infection, 26 May 2020. tech. rep.World Health Organization Severe acute respiratory syndrome coronavirus 2-specific antibody responses in coronavirus disease 2019 patients Virological assessment of hospitalized patients with COVID-2019 Antibody responses to SARS-CoV-2 in patients with COVID-19 Performance of an automated anti-SARS-CoV-2 immunoassay in prepandemic cohorts Post-market validation of three serological assays for COVID-19 The COVID-19 Serology Studies Workshop: Recommendations and Challenges. Immunity EUA authorized serology test performance Point-of-care serological assays for delayed SARS-CoV-2 case identification among health-care workers in the UK: a prospective multicentre cohort study. The Lancet Respiratory Medicine Requirements for minimum sample size for sensitivity and specificity analysis Basic methods for sensitivity analysis of biases Diagnostic accuracy of Augurix IgG serology rapid test Four point-of-care lateral flow immunoassays for diagnosis of COVID-19 and for assessing dynamics of antibody responses to SARS-CoV-2 Diagnostic performance of 7 rapid IgG/IgM antibody tests and the Euroimmun IgA/IgG ELISA in COVID-19 patients Clinical performance of different SARS-CoV-2 IgG antibody tests Head-to-Head Accuracy Comparison of Three Commercial COVID-19 IgM/IgG Serology Rapid Tests Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China Initial characterisation of ELISA assays and the immune response of the clinically correlated SARS-CoV-2 biobank SERO-BL-COVID-19 collected during the pandemic onset in Switzerland R: A Language and Environment for Statistical Computing. R Foundation for Statistical ComputingVienna PropCIs: Various Confidence Interval Methods for Interim Guidelines for COVID-19 Antibody Testing