key: cord-0682818-fqj2mh78 authors: Spick, Matt; Lewis, Holly M.; Wilde, Michael J.; Hopley, Christopher; Huggett, Jim; Bailey, Melanie J. title: Systematic review with meta-analysis of diagnostic test accuracy for COVID-19 by mass spectrometry date: 2021-10-27 journal: Metabolism DOI: 10.1016/j.metabol.2021.154922 sha: 4446034fa38a4e5a1cf8d7701c5a1f77685cb13b doc_id: 682818 cord_uid: fqj2mh78 BACKGROUND: The global COVID-19 pandemic has led to extensive development in many fields, including the diagnosis of COVID-19 infection by mass spectrometry. The aim of this systematic review and meta-analysis was to assess the accuracy of mass spectrometry diagnostic tests developed so far, across a wide range of biological matrices, and additionally to assess risks of bias and applicability in studies published to date. METHOD: 23 retrospective observational cohort studies were included in the systematic review using the PRISMA-DTA framework, with a total of 2858 COVID-19 positive participants and 2544 controls. Risks of bias and applicability were assessed via a QUADAS-2 questionnaire. A meta-analysis was also performed focusing on sensitivity, specificity, diagnostic accuracy and Youden's Index, in addition to assessing heterogeneity. Findings. Sensitivity averaged 0.87 in the studies reviewed herein (interquartile range 0.81–0.96) and specificity 0.88 (interquartile range 0.82–0.98), with an area under the receiver operating characteristic summary curve of 0.93. By subgroup, the best diagnostic results were achieved by viral proteomic analyses of nasopharyngeal swabs and metabolomic analyses of plasma and serum. The performance of other sampling matrices (breath, sebum, saliva) was less good, indicating that these protocols are currently insufficiently mature for clinical application. 
CONCLUSIONS: This systematic review and meta-analysis demonstrates the potential for mass spectrometry and ‘omics in achieving accurate test results for COVID-19 diagnosis, but also highlights the need for further work to optimize and harmonize practice across laboratories before these methods can be translated to clinical applications. The COVID-19 pandemic has resulted in significant morbidity and mortality across the globe. 1 The severity of the pandemic has also triggered developments and accelerated application in many scientific fields, including vaccine technology, drug treatment, and testing. Whilst the global standard in diagnosis has been the polymerase chain reaction combined with reverse transcription (RT-PCR), at times demand has exceeded supply, leading to research across many analytical disciplines for alternative diagnostic solutions. 2, 3 The potential of mass spectrometry (MS) for research into diseases and their diagnosis is well-established, 4, 5 with the flexibility of the technique allowing both proteomic and metabolomic analysis across a wide array of biological matrices. A number of methods have been developed and improved over the last eighteen months, 6 but given the exigencies of the pandemic, researchers have often been unable to establish ideal case-controls, blind tests or sufficient participant recruitment to meet best-practice thresholds for either point of care or laboratory-based detection tests. 7 Whilst clinical diagnostic tools such as bilateral chest X-rays and similar methods have been systematically reviewed, 2 no such systematic review and diagnostic meta-analysis has to our knowledge been published on tests based on mass spectrometry. In this review we explored the state of mass-spectrometry-led diagnostic testing for COVID-19 infection across different biological matrices using 'omics approaches, incorporating a meta-analysis of key parameters. 
These included accuracy, sensitivity, specificity, and Youden's Index, as well as an assessment of heterogeneity. Any diagnostic test must have a relevant use-case, and in this review we focused on applicability to hospital admissions, 8 given that this use-case for MS would complement the capabilities offered by RT-PCR (highly sensitive, but slow turnaround relative to point of care tests) and lateral flow tests (faster, but these do not take advantage of the facilities and expertise available in a hospital setting). We additionally aimed to assess published studies for issues relating to bias and applicability, in order to review the undoubted progress made so far, as well as to highlight improvements that can be made in future work. The objective of this review was to benchmark a series of MS based diagnostic index tests against each other using RT-PCR as a reference test. We also aimed to identify how well new tests might meet a clinical role of accurate identification of COVID-19 infection, with a focus on admission settings. The review also sought to identify areas in existing research where bias or applicability issues may occur, and how future research may mitigate these issues. the reference lists of all studies identified by the search strategy described above. The search strategy included articles published on the above-listed databases up to and including 14 September 2021. For all articles identified under the search strategy, titles and abstracts were screened for eligibility. The relevant articles were then read in full, including data extraction for meta-analysis. 
In this work, the eligibility criteria for inclusion in the systematic review and meta-analysis were set as follows: (a) evaluation of a diagnostic method for COVID-19 using mass spectrometry, based on 'omics approaches, (b) using human biological matrices and (c) including diagnostic analyses, at a minimum reporting sensitivity and specificity by confusion matrix, or receiver operating characteristic (ROC) curves provided that the sensitivity / specificity trade-off was unambiguous. Articles in non-Roman characters were not included. The above search and eligibility steps were carried out by two researchers, with differences in identified articles reviewed by a third author for inclusion / exclusion. The following items were collected by two researchers from articles identified above: key metadata for each article (authors, date of publication, country of origin); methods employed (mass spectrometry, separation, biological samples collected) and diagnostic outcomes (true positive, TP; false positive, FP; false negative, FN; and true negative, TN). Diagnostic outcomes were taken directly from research where possible, or were calculated using confusion matrices based on reported sensitivity and specificity outcomes as applied to cohort data, or in one case by use of a reported ROC chart. Two researchers independently evaluated risks relating to both bias and applicability using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2), 10 with the approach (and conflicts between the researchers) being reviewed by a third author. Meta-analysis was performed for the aggregate of mass spectrometry 'omics based approaches. Given the small sample sizes, not all subgroups offered meaningful results, but subgroups comprising viral proteomics, blood-based metabolomics, and novel 'omics approaches (saliva, sebum and breath) were reviewed independently from the aggregate. 
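Where a study reports only sensitivity and specificity alongside cohort sizes, the confusion matrix has to be back-calculated. The following Python sketch illustrates one way this could be done; it is not the authors' code (the analysis in this review was performed in R), and the function name `counts_from_rates`, the half-up rounding rule and the example figures are all assumptions made for illustration.

```python
import math


def counts_from_rates(sensitivity, specificity, n_positive, n_control):
    """Back-calculate TP, FN, TN and FP from reported rates and cohort sizes.

    n_positive: participants positive by the reference test (e.g. RT-PCR)
    n_control:  participants negative by the reference test
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    # Round half up, so the result does not depend on Python's
    # round-half-to-even behaviour for exact .5 values.
    tp = math.floor(sensitivity * n_positive + 0.5)
    tn = math.floor(specificity * n_control + 0.5)
    return {"TP": tp, "FN": n_positive - tp, "TN": tn, "FP": n_control - tn}


# Hypothetical study (not taken from the review): 50 positives, 40 controls,
# reported sensitivity 0.90 and specificity 0.80.
example = counts_from_rates(0.90, 0.80, n_positive=50, n_control=40)
print(example)
```

Because reported rates are usually rounded to two decimal places, counts recovered in this way may differ by one or two participants from the true values, particularly for small cohorts.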
The following ratios were calculated: sensitivity, specificity, diagnostic accuracy, Youden's Index, positive likelihood ratio (PLR) and negative likelihood ratio (NLR). Sensitivity was defined as the true positive rate, i.e. the probability that a positive test result will be obtained when the disease is present, and calculated as TP / (TP + FN). Specificity was defined as the true negative rate, i.e. the probability that a negative test result will be obtained when the disease is not present, and calculated as TN / (TN + FP). Youden's Index was defined as sensitivity - (1 - specificity), or alternatively, one minus the sum of the error rates. The PLR was defined as the true positive rate / false positive rate. The NLR was defined as false negative rate / true negative rate. Heterogeneity of diagnostic power across the different biofluids investigated in this work was assessed by measuring Cochran's Q and Higgins' I². In this work, a p-value below 0.10 or an I² value greater than 50% was taken as evidence of substantial heterogeneity of diagnostic power; it should be noted, however, that lower values do not necessarily confirm homogeneity, only an absence of evidence for heterogeneity. 11, 12 A summary receiver operating characteristic (sROC) curve was also constructed for the studies included herein. ROC curves show the trade-off between sensitivity and specificity, whereby a test can be more sensitive (by over-diagnosing disease) at the cost of being less specific (more false positives), and vice versa. A test that was 100% sensitive and 100% specific would generate an area under the curve (AUROC) of exactly 1, and more generally values closer to 1 indicate better diagnostic performance. All statistical analysis was performed in the RStudio environment, 13, 14 with additional functionality using the epiR, forestplot and mada packages. 
15, 16, 17 3 Results In total, 253 articles were identified in the initial search strategy by the terms described, after removing 308 duplicate results. From this initial list, 51 were identified as meeting the eligibility criteria and 202 were excluded. The articles on this shortlist were then read in full. 23 of the 51 identified articles contained the complete set of diagnostic accuracy data to allow for meta-analysis, albeit for one article 18 the data were imputed from provided ROC charts. Figure 1 provides a flowchart illustrating these steps. Table 1, grouped by methods whose focus was on host characteristics, methods focused on the virus (by proteomics), and groups that identified features but were agnostic as to the source of those features. In the analysis that follows, 'Unclear' does not denote 'medium' risk of concern; rather it denotes that insufficient information was provided, and there is no basis to consider the study to be at 'low' risk of bias or inapplicability. In terms of risks of bias around patient selection, 30% of the studies provided no cohort analysis, making it impossible to ascertain whether the work was free from bias in this regard. Furthermore, only 9% of studies specified whether participants were recruited consecutively or at random. Only 23% of studies explicitly stated that asymptomatic patients were included, 39% stated that they were excluded, and 39% provided no information. The key extracted diagnostic indicators are summarised in Table 2. Across the studies reviewed, sensitivity ranged from 0.62 to 1.00 (aggregate sensitivity of 0.87), and specificity ranged from 0.72 to 1.00 (aggregate specificity of 0.88). Specificity was greater than sensitivity on average, albeit the difference was not statistically significant based on a two-tailed t-test (p-value of 0.34). 
In terms of biofluids analysed, sebum was analysed in 2 papers, and delivered the lowest aggregated sensitivity (0.76) and specificity (0.82), calculated by summing confusion matrices. Saliva was investigated in 2 studies, with sensitivity and specificity of 0.74 / 0.75 for metabolomic analysis of saliva, and 1.00 / 0.93 for proteomic analysis. Breath was analysed in 4 studies, with comparable sensitivity (0.78) and specificity (0.81) to sebum. Nine studies sampled nasopharyngeal swabs, with high sensitivity (0.89) and specificity (0.88). The remaining 5 studies sampled blood (either plasma or serum), with aggregated sensitivity of 0.89 and specificity of 0.96. Proteomic approaches that targeted the virus reported higher sensitivity and specificity than approaches that targeted the impact on the host, albeit within the latter category there was considerable variation. Table 1 lists the major features differentiating the populations by study. In studies focusing on proteomics, a number identified features by m/z only, but 2 studies targeted peptides originating from spike proteins, and 2 identified peptides originating from nucleocapsid proteins. For the 4 studies analysing breath, a wide variety of alcohols, aldehydes and ketones were found to differentiate the populations, but there was limited overlap, with heptanal and octanal featuring in 2 of the 4 studies. In terms of sebomics, the studies described in this review found no differentiating features in common. Within plasma and serum, 2 papers identified ratios of amino acids (kynurenine in particular) as key differentiating features, and 2 papers focused on lipid dysregulation. As a single measure of performance, estimates of Youden's Index including confidence intervals are shown in Figure 3, with Youden's Index calculated as sensitivity minus (one minus specificity), or alternatively one minus the sum of error rates. 
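The ratios defined in this review, and the aggregation of subgroups by summing confusion matrices, can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation (which used R); the class name `ConfusionMatrix` and the two example studies are hypothetical and the counts are not drawn from any study in the review.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def __add__(self, other):
        # Pool two studies by summing their cells.
        return ConfusionMatrix(self.tp + other.tp, self.fp + other.fp,
                               self.fn + other.fn, self.tn + other.tn)

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def youden(self):
        # sensitivity - (1 - specificity), i.e. one minus the sum of error rates
        return self.sensitivity + self.specificity - 1

    @property
    def plr(self):
        # true positive rate / false positive rate
        fpr = self.fp / (self.fp + self.tn)
        return math.inf if fpr == 0 else self.sensitivity / fpr

    @property
    def nlr(self):
        # false negative rate / true negative rate
        fnr = self.fn / (self.fn + self.tp)
        return math.inf if self.specificity == 0 else fnr / self.specificity


# Two hypothetical studies of the same biofluid, pooled by summing cells.
study_a = ConfusionMatrix(tp=45, fp=8, fn=5, tn=32)
study_b = ConfusionMatrix(tp=30, fp=2, fn=10, tn=38)
pooled = study_a + study_b
print(f"sensitivity={pooled.sensitivity:.2f} specificity={pooled.specificity:.2f} "
      f"Youden={pooled.youden:.2f} PLR={pooled.plr:.2f} NLR={pooled.nlr:.2f}")
```

Summing cells weights each study by its number of participants, so a pooled estimate obtained this way is dominated by the larger cohorts; it is a simple aggregate rather than a model-based summary.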
Heterogeneity assessment of the studies The studies show variation in their diagnostic performance measured by either sensitivity, specificity or Youden's Index (Table 2 and Figure 3) and, partly due to small participant populations, confidence intervals are wide. Cochran's Q was calculated as 26.2 with a p-value of 0.24, and Higgins' I² was calculated as 16%. The latter value should be treated with caution given the small sample sizes assessed in this meta-analysis, as Higgins' I² tends to be underpowered in the meta-analysis of studies with small n and therefore lower precision. 42 A low I² does not represent evidence of homogeneity per se, but may indicate that the variability in results could be due to wide confidence intervals rather than unexplained heterogeneity, as is the case in this work (Figure S1, Supplementary Material). Heterogeneity was also investigated by broad method employed, specifically proteomics versus metabolomics, and also by subgroup. Heterogeneity was notably low for proteomics including viral proteins, with Cochran's Q calculated as 7.6, and Higgins' I² was 0%. For blood-based analyses, Cochran's Q was calculated as 4.1, and Higgins' I² was calculated as 3%. For saliva, sebum and breath (the more novel 'omics analyses), Cochran's Q was calculated as 3.2 and Higgins' I² was calculated as 0%. Visual inspection also illustrates the differences between, but similarity within, these methods (Figures 4A and 4B). This can also be illustrated by calculating summary area under the sROC curves for these groups. metabolomics analyses applied to other sampling matrices (saliva, breath, sebum) may partly reflect instrumental setup, but could also relate to confounders, and illustrates the need for much more inter-laboratory validation and comparison before these diagnostic techniques are likely to be suitable for translation to clinical practice. 
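The relationship between Cochran's Q and Higgins' I² can be made concrete with a short Python sketch. It uses the standard inverse-variance form of Q and I² = max(0, (Q - df) / Q); the review does not state which per-study effect measure was used, so the `effects` and `variances` inputs are generic placeholders, and the degrees of freedom of 22 assume that all 23 included studies entered the aggregate calculation.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the pooled effect.

    effects:   per-study effect estimates (placeholder for whichever
               measure is chosen, e.g. a log diagnostic odds ratio)
    variances: per-study variances of those estimates
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def higgins_i2(q, df):
    """Higgins' I² as a fraction: share of variation beyond chance, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# Aggregate reported in this review: Q = 26.2; df = 23 - 1 is an assumption.
aggregate_i2 = higgins_i2(26.2, df=22)
print(f"I² = {aggregate_i2:.0%}")

# Hypothetical two-study example: effects 0 and 2 with unit variances.
q_demo = cochran_q([0.0, 2.0], [1.0, 1.0])
print(f"Q = {q_demo:.1f}")
```

Under that assumption the sketch reproduces the reported aggregate I² of 16%. Whenever Q falls below its degrees of freedom, I² is floored at 0%, which is why a value of 0% signals an absence of evidence for heterogeneity rather than proof of homogeneity.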
RT-PCR as a reference standard achieves very high analytical sensitivity and specificity and is generally seen as the clinical gold standard for release of patients from isolation, 47 but there is also a role for less sensitive, faster approaches to support a triage environment, e.g. for ward allocation on hospital admission, where a negative RT-PCR result will often require additional testing for confirmation. 48 Antigen detection assays can offer an alternative to RT-PCR with faster response time, depending on type; one meta-analysis found sub-category sensitivity ranging from 0.66 (for lateral flow immunoassays) to 0.98 (for chemiluminescent immunoassays). 49 Bilateral chest X-rays have also been reported to be a useful supplementary tool in COVID-19 diagnosis. In a recent meta-analysis chest X-rays were found to have sensitivity of 0.91 and specificity of 0.78, again with RT-PCR as the reference, 50 albeit the American College of Radiology has noted that chest imaging in COVID-19 is not specific, and overlaps with other infections. 51 Compared with these benchmarks, MS-based approaches show promise based on achieved sensitivity and specificity and, given that mass spectrometry facilities are often available in hospital settings, may find a use-case by offering faster turnaround than RT-PCR and so supplementing clinical diagnosis. In addition, MS-based approaches offer alternatives in the initial stages of a pandemic, when PCR or other tests may be in short supply. Because of the ability of MS based techniques to identify dysregulation involving many pathways, such tests could provide information on the wider host metabolome and proteome. This potentially allows for prognosis as well as diagnosis, and promising results have already been obtained for mass spectrometry-based prognostic analyses of serum, plasma and saliva. 
31, 52, 53 In this work, the best results were found to be delivered by metabolomic study of homeostatically regulated biofluids (serum and plasma) and by proteomic study of nasopharyngeal swabs, with areas under their respective sROC curves of xxx and xxx respectively. These results were mainly obtained by UHPLC-MS (for blood metabolomics) and MALDI-TOF-MS (for proteomics). These sampling methods are, of course, more invasive than skin swabs or exhaled breath, but based on the studies reviewed here, the invasive methods deliver the greatest diagnostic accuracy and are most concordant with the WHO's shows low heterogeneity of diagnostic performance for proteomics and blood-based sampling, the variation (and lack of overlap) in differentiating features suggests that much more inter-laboratory validation and optimisation will be required before these results can be translated into a clinical setting. The pilot studies described herein have shown the potential for accurate diagnosis of COVID-19, but we believe that future work should focus on larger recruitment cohorts, the inclusion of more blind tests for validation, validation across different locations, and optimisation of techniques. The detection and diagnosis of COVID-19 by mass spectrometry has made substantial progress over the course of the SARS-CoV-2 pandemic. Achieved sensitivity and specificity of the diagnostic tests discussed in this review are encouraging, but with clear limits in the biases and applicability of the research undertaken so far. Whilst results based on proteomics and blood metabolomics delivered the most compelling performance, and these methods are most promising in terms of clinical application in the near term, more validation studies are still needed to reduce risks of bias and applicability. 
In the case of less invasive matrices, whilst the potential advantages are attractive, as yet there is little agreement between studies on suitably robust and reproducible targets. Whilst mass spectrometry techniques may show promise, and advances in this field could be applicable to disease diagnosis beyond COVID-19, future research should focus on reducing bias by recruiting larger numbers of participants without inappropriate exclusions, especially to meet thresholds for determining suitability for point of care or other use-cases. In addition, greater use of blind test sets for validation would reduce bias from over-fitted machine learning models in MS based diagnostic testing. Furthermore, and especially for the less invasive sampling matrices, considerable work is required to harmonise and optimise methodologies so that features can be validated between labs.
WHO. WHO Coronavirus Disease (COVID-19) Dashboard
Systematic review with meta-analysis of the accuracy of diagnostic tests for COVID-19
COVID-19 diagnosis - A review of current methods
Metabolomics by numbers: Acquiring and understanding global metabolite data
Mass spectrometry-based 'omics' technologies in cancer diagnostics
Mass Spectrometry Analytical Responses to the SARS-CoV2 Coronavirus in Review
Guidance for industry and manufacturers: COVID-19 tests and testing kits
Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement
QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies
The Combination of Estimates from Different Experiments
Measuring inconsistency in meta-analyses
R: A Language and Environment for Statistical Computing
RStudio: Integrated Development Environment for R
epiR: Tools for the Analysis of Epidemiological Data
Advanced Forest Plot Using 'grid' Graphics
Metabolomics Profiling of Critically Ill Coronavirus Disease 2019 Patients: Identification of Diagnostic and Prognostic Biomarkers
World Health Organization. Epidemic Intelligence from Open Sources (EIOS)
Multi-omics analysis of respiratory specimen characterizes baseline molecular determinants associated with SARS-CoV-2 outcome
A combined approach of MALDI-TOF mass spectrometry and multivariate analysis as a potential tool for the detection of SARS-CoV-2 virus in nasopharyngeal swabs
Rapid Screening of COVID-19 Directly from Clinical Nasopharyngeal Swabs Using the MasSpec Pen
Diagnosis of COVID-19 by analysis of breath with gas chromatography-ion mobility spectrometry - a feasibility study
Diagnosis of COVID-19 by exhaled breath analysis using gas chromatography-mass spectrometry
Metabolomics of exhaled breath in critically ill COVID-19 patients: A pilot study
Covid-19 Automated Diagnosis and Risk Assessment through Metabolomics and Machine Learning
Integrative Modeling of Quantitative Plasma Lipoprotein, Metabolic, and Amino Acid Data Reveals a Multiorgan Pathological Signature of SARS-CoV-2 Infection
Diagnostic potential of the plasma lipidome in infectious disease: Application to acute sars-cov-2 infection
Changes to the sebum lipidome upon COVID-19 infection observed via rapid sampling from the skin
Skin imprints to provide noninvasive metabolic profiling of COVID-19 patients
Untargeted saliva metabolomics reveals COVID-19 severity
Novel application of automated machine learning with MALDI-TOF-MS for rapid high-throughput screening of COVID-19: a proof of concept
Detection of SARS-CoV-2 in nasal swabs using MALDI-MS
Detection of SARS-CoV-2 Infection in Human Nasopharyngeal Samples by Combining MALDI-TOF MS and Artificial Intelligence
Rapid Detection of COVID-19 Using MALDI-TOF-Based Serum Peptidome Profiling
Establishing a mass spectrometry-based system for rapid detection of SARS-CoV-2 in large clinical sample cohorts
MALDI-ToF Protein Profiling as a Potential Rapid Diagnostic Platform for COVID-19
Rapid and Sensitive Detection of SARS-CoV-2
A mass spectrometry-based targeted assay for detection of SARS-CoV-2 antigen from clinical specimens
A rapid and sensitive method to detect SARS-CoV-2 virus using targeted-mass spectrometry
Cautionary tales in the clinical interpretation of studies of diagnostic tests
Undue reliance on I2 in assessing heterogeneity may mislead
Ultra-High-Throughput Clinical Proteomics Reveals Classifiers of COVID-19 Infection
Proteomic and Metabolomic Characterization of COVID-19 Patient Sera
A comprehensive overview of proteomics approach for COVID 19: new perspectives in target therapy strategies
Comprehensive Meta-Analysis of COVID-19 Global Metabolomics Datasets
World Health Organization. Criteria for releasing COVID-19 patients from isolation
Patient safety recommendations and management in patients with COVID-19 pneumonia suspicion: a retrospective study
Diagnostic accuracy of serological tests for covid-19: Systematic review and meta-analysis
Chest CT versus RT-PCR for the detection of COVID-19: systematic review and meta-analysis of comparative studies
ACR Recommendations for the use of Chest Radiography and Computed Tomography (CT) for Suspected COVID-19 Infection Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection
The serum metabolome of COVID-19 patients is distinctive and predictive
SARS-CoV-2 RNAemia and proteomic trajectories inform prognostication in COVID-19 patients admitted to intensive care
World Health Organization. COVID-19 Target product profiles for priority diagnostics to support response to the COVID-19 pandemic v.1.0
Impact of baseline cases of cough and fever on UK COVID-19 diagnostic testing rates: estimates from the Bug Watch community cohort study
The missing season: The impacts of the COVID-19 pandemic on influenza
Anticipating outcomes for patients with COVID-19 and identifying prognosis patterns