key: cord-0921845-0o6bl9px authors: Gomila, Rosa M; Martorell, Gabriel; Fraile-Ribot, Pablo A; Doménech-Sánchez, Antonio; Albertí, Miguel; Oliver, Antonio; García-Gasalla, Mercedes; Albertí, Sebastián title: Use of matrix-assisted laser desorption ionization time-of-flight mass spectrometry analysis of serum peptidome to classify and predict COVID-19 severity date: 2021-05-02 journal: Open Forum Infect Dis DOI: 10.1093/ofid/ofab222 sha: 306c409fd1486b2842444010680672c831b7c7fc doc_id: 921845 cord_uid: 0o6bl9px

BACKGROUND: Classification and early detection of patients with severe COVID-19 are required to establish effective treatment. We tested the utility of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to classify and predict the severity of COVID-19. METHODS: We used MALDI-TOF MS to analyse the serum peptidome of 72 COVID-19 patients (training cohort), clinically classified as mild (28), severe (23) or critical (21), and of 20 healthy controls. The resulting matrix of peak intensities was used in machine learning (ML) approaches to classify and predict the COVID-19 severity of 22 independent patients (validation cohort). Finally, we analysed all sera by liquid chromatography tandem mass spectrometry (LC-MS/MS) to identify the proteins most strongly associated with disease severity. RESULTS: The serum peptidome profile varied clearly with COVID-19 severity. Forty-two peaks exhibited a log fold change ≥ 1, and 17 were significantly different and at least four-fold more intense in critical patients than in mild ones. The ML approach classified clinically stable patients according to their severity with 100% accuracy and correctly predicted the evolution of the non-stable patients in all cases. LC-MS/MS identified five proteins that were significantly upregulated in critical patients.
They included serum amyloid A2 protein, which probably produced the most intense peak detected by MALDI-TOF MS. CONCLUSION: We demonstrated the potential of MALDI-TOF MS as a bench-to-bedside technology to aid clinicians in their decisions on COVID-19 patients.

Coronavirus disease 2019 (COVID-19) was first reported in Wuhan, Hubei province, China, as a new disease caused by a positive-strand RNA virus designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (1). This virus is responsible for a pandemic of unprecedented dimensions. Approximately 80% of cases are asymptomatic or present mild symptoms, such as fever, cough, fatigue and dyspnea. Conversely, about 20% of patients with COVID-19 develop viral pneumonia, with an exaggerated host inflammatory response and hypoxia, requiring intubation and mechanical ventilation (2, 3). These patients, classified as having severe or critical life-threatening infections, are mainly diagnosed empirically on the basis of a set of clinical characteristics. However, by the time these symptoms appear, patients have already progressed to a serious clinical condition that requires specialized intensive care. Therefore, novel and rapid approaches are needed to identify biomarkers of symptom onset and disease progression, in order to facilitate patient triage and establish appropriate treatments. Peptidome-based studies using patient serum and high-throughput spectrometric techniques promise to be valuable for identifying COVID-19-associated biomarkers. Serum may contain proteins induced by the systemic effects of the viral infection or released from the lung as a result of it. Thus, patient serum can reflect the physiological or pathological state.
Indeed, a proteomic and metabolomic analysis of serum from 46 COVID-19 patients performed by Shen et al demonstrated that serum protein and metabolite biomarkers make it possible not only to classify patients according to their grade of severity but also to predict progression to severe COVID-19 (4). More recently, Messner et al redesigned a high-throughput mass spectrometry platform that enabled the identification of up to 27 potential biomarkers differentially expressed depending on the severity grade of COVID-19 (5). Although the technologies used in both studies are highly sensitive and provide robust results, they are time-consuming, require specialized personnel and, most importantly, are not available in most hospitals, so their bench-to-bedside translation is limited.

Accepted Manuscript

In the present study, we used matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS), a simple and fast technology available in most hospitals, to conduct a comparative analysis of serum from COVID-19 patients. Our results demonstrate the value and power of MALDI-TOF MS to classify and predict the progression of COVID-19 in a clinical setting.

All human samples were taken after written consent was obtained from each participant, who was informed of the purposes of the study. The study was approved by the Institutional Review Board of Hospital Universitario Son Espases and the Regional Ethics Committee of Illes Balears (CEI). This study included a total of 94 COVID-19 patients who attended Hospital Son Espases, the reference hospital of the Balearic Islands, between March 2020 and November 2020. COVID-19 cases were confirmed based on the Chinese management guideline for COVID-19 (6). Only patients with a confirmed molecular diagnosis of SARS-CoV-2, based on a positive polymerase chain reaction (PCR) for viral ribonucleic acid, were enrolled.
The severity grade of COVID-19 was defined according to the same guideline (6). Accordingly, COVID-19 patients were classified into three subgroups: mild, severe or critical. Mild included non-pneumonia and mild pneumonia cases. Severe was characterized by dyspnea, respiratory frequency ≥ 30/minute, blood oxygen saturation ≤ 93%, PaO2/FiO2 ratio < 300 and/or lung infiltrates > 50% within 24-48 hours. Critical cases were those that exhibited respiratory failure, septic shock and/or multiple organ dysfunction/failure. Twenty healthy volunteers, including 13 who had recovered from COVID-19, were also included in the study.

Blood samples were collected into anticoagulant-free tubes. The tubes were centrifuged at 2,500 rpm at 20°C for 10 min within a 30-min time frame. Serum from each patient sample was then collected, aliquoted and stored at -80°C. Each serum sample was heat-inactivated at 56°C for 90 min to inactivate the virus prior to analysis. Serum preparation and MALDI-TOF MS analysis were performed as previously described (7). To obtain better resolution and sensitivity in the MALDI-TOF analysis, serum samples were purified and concentrated using reversed-phase C18 tips (Pierce™ C18) following the manufacturer's instructions. These are miniature reversed-phase columns packed into 10 µL pipette tips, with a micro-volume bed of reversed-phase medium fixed at the end, without dead volume. These tips are suitable for concentrating, desalting and enriching protein/peptide samples prior to analysis. Samples are passed through activated reversed-phase C18 tips where the proteins/peptides are …

Raw mass spectra obtained by MALDI-TOF MS were analysed using the MALDIquant R package (8). Square root transformation, smoothing, baseline correction and intensity normalization were performed on each mass spectrum. The average spectrum of the triplicates was then obtained.
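The four preprocessing steps listed above, followed by averaging of the triplicates, can be sketched in plain Python as follows. This is a simplified illustration, not the authors' MALDIquant pipeline: the paper does not say which smoothing, baseline or normalization methods were chosen, so a centred moving average stands in for smoothing (MALDIquant defaults to Savitzky-Golay but also offers a moving average), a rolling minimum stands in for baseline estimation (MALDIquant defaults to SNIP), and intensities are scaled to sum to 1 as a simple form of total ion current (TIC) normalization. The window sizes are arbitrary assumptions, and replicate spectra are assumed to share one m/z axis.

```python
import math
from statistics import mean


def sqrt_transform(intensities):
    """Variance-stabilising square-root transform of raw intensities."""
    return [math.sqrt(max(x, 0.0)) for x in intensities]


def moving_average(intensities, half_window=2):
    """Centred moving-average smoother; the window shrinks at the spectrum edges.

    Assumption: stands in for MALDIquant's default Savitzky-Golay smoother.
    """
    n = len(intensities)
    return [mean(intensities[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]


def remove_baseline(intensities, half_window=10):
    """Subtract a rolling-minimum baseline, so every value stays >= 0.

    Assumption: a crude stand-in for MALDIquant's default SNIP baseline.
    """
    n = len(intensities)
    return [x - min(intensities[max(0, i - half_window):min(n, i + half_window + 1)])
            for i, x in enumerate(intensities)]


def tic_normalize(intensities):
    """Scale the spectrum so its intensities sum to 1 (simple TIC normalisation)."""
    total = sum(intensities)
    return [x / total for x in intensities] if total > 0 else list(intensities)


def preprocess(intensities):
    """Apply the four steps in the order given in the text."""
    return tic_normalize(remove_baseline(moving_average(sqrt_transform(intensities))))


def average_replicates(spectra):
    """Point-wise mean of replicate spectra that share one m/z axis."""
    return [mean(values) for values in zip(*spectra)]
```

In practice the corresponding MALDIquant functions would replace these stand-ins; the sketch only shows how the steps compose, one spectrum at a time, before the triplicates are averaged.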
Peaks were detected and binned across all average spectra with a signal-to-noise ratio of 5 and a tolerance of 0.002. Peaks present in fewer than 15% of the spectra were rejected. All spectra from the groups under study were pre-processed, and peak detection was applied to obtain an intensity matrix. For every peak, the log fold change in intensity (logFC) between groups was calculated using the median intensity of each group. Peaks with a logFC ≥ 1 were used for principal component analysis (PCA) (9) and machine learning (ML) analysis (10). Five ML algorithms were used to classify samples, namely logistic regression, support vector machine with a linear kernel, naive Bayes, random forest and decision tree. The accuracies of the models did not vary substantially among the ML methods (supplementary table 2), with the decision tree algorithm performing best. We decided to classify each sample using the most frequent result (majority vote) across all five ML algorithms.

Three microliters of serum were diluted to 1 ml with 50 mM ammonium bicarbonate (approximately 0.2 g/l, taking the average plasma protein concentration as 80 mg/ml). One hundred microliters of the dilution were reduced with 11 µl of 50 mM dithiothreitol for 30 minutes at 56°C and alkylated with 12.5 µl of 20 mM iodoacetamide for 20 minutes in the dark at 37°C. The total volume of 123.5 µl, containing approximately 24 µg of plasma protein, was digested with 10 µl of trypsin (100 ng/µl) at 37°C overnight. Ten microliters of 5% formic acid (FA) were added to stop the digestion. The precursor mass tolerance was set to 10 ppm, and the product ion mass tolerance was set to 0.6 Da. The peptide-spectrum match allowed a 1% target false discovery rate (FDR) (strict) and a 5% target FDR (relaxed). Normalization was performed against the total peptide amount. The other parameters followed the default setup.
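Two quantitative steps in the methods above can be made concrete with a short sketch. The first function implements the consensus rule (the most frequent label among the five classifiers); the classifier names are hypothetical keys, and because the paper does not say how ties were broken (with three classes and five votes a 2-2-1 split is possible), the tie rule used here, which defers to the decision tree reported as best-performing, is an assumption. The second function reproduces the digest arithmetic: 3 µl of serum at the stated average of 80 mg/ml gives 0.24 g/l after dilution to 1 ml (rounded to 0.2 g/l in the text), so the 100 µl aliquot holds about 24 µg of protein, digested with 1 µg of trypsin, roughly a 24:1 protein-to-enzyme ratio.

```python
from collections import Counter


def consensus_label(predictions, tie_breaker="decision_tree"):
    """Majority vote over per-classifier severity predictions.

    `predictions` maps a (hypothetical) classifier name to its predicted label.
    Tie handling is an assumption: the tie_breaker classifier's label wins if it
    is among the tied labels, otherwise the first tied label encountered wins.
    """
    counts = Counter(predictions.values())
    top = max(counts.values())
    # Counter keeps first-appearance order, so `tied` is deterministic
    tied = [label for label, n in counts.items() if n == top]
    if len(tied) > 1 and predictions.get(tie_breaker) in tied:
        return predictions[tie_breaker]
    return tied[0]


def digest_protein_load(serum_ul=3.0, dilution_ml=1.0, protein_mg_per_ml=80.0,
                        aliquot_ul=100.0, trypsin_ul=10.0, trypsin_ng_per_ul=100.0):
    """Return (protein g/l after dilution, protein µg in aliquot, protein:trypsin ratio)."""
    protein_mg = serum_ul * protein_mg_per_ml / 1000.0    # µl x mg/ml -> mg
    conc_g_per_l = protein_mg / dilution_ml                # mg/ml is numerically g/l
    aliquot_ug = conc_g_per_l * aliquot_ul                 # g/l is numerically µg/µl
    trypsin_ug = trypsin_ul * trypsin_ng_per_ul / 1000.0   # ng -> µg
    return conc_g_per_l, aliquot_ug, aliquot_ug / trypsin_ug
```

For example, three votes for "critical" and two for "severe" yield "critical", and `digest_protein_load()` with its defaults returns approximately (0.24, 24.0, 24.0).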
We acquired MALDI mass spectra of 72 serum samples obtained from different COVID-19 patients: 28 samples were collected from mild COVID-19 patients, 23 from severe patients and 21 from critical patients. To avoid the effect of the longitudinal immunological changes associated with this infection, we verified that patients did not change their clinical classification for at least 72 h after the sample was taken. We also analysed 20 serum samples from healthy people, including 13 samples from individuals who had completely recovered from COVID-19 at least one month earlier. The samples were randomized with respect to spotting and analysis. The relevant characteristics of each group are shown in supplementary Table 1. All spectra from the four groups under study were processed, and peak detection was applied with MALDIquant to obtain an intensity matrix of 179 peaks in the mass range of 2,000 to 25,000 daltons. Representative mass spectra from each patient set are shown in supplementary figure S1. To select the peaks that best distinguished the severity groups, we calculated the median intensity of each peak in each group. Figure 1A shows a heatmap of the quantitative variability of the peaks that exhibited a log fold change ≥ 2 for COVID-19 severity. We found clear differences between critical and mild patients: sixteen peaks were significantly different and at least four-fold more intense in critical patients than in mild ones (Figure 1B). However, only five of those peaks (m/z 11,532, 4,875, 2,979, 3,067 and 6,294) were significantly different between severe and critical patients. We then used PCA to compare samples from patients with different severity grades in a multidimensional space, using the 42 peaks (m/z) that exhibited a log fold change ≥ 1 in the MALDI-TOF MS data (Figure 2).
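The peak-selection logic described above (discarding peaks detected in fewer than 15% of spectra, then comparing group medians) can be sketched as follows. The base-2 logarithm is an inference from the text, where a log fold change ≥ 2 corresponds to peaks at least four-fold more intense; taking the absolute value, so that peaks higher in either group are kept, is an assumption, as is treating a zero intensity as "not detected". The peak names and intensities in the tests are toy values, not study data.

```python
import math
from statistics import median


def presence_filter(matrix, min_fraction=0.15):
    """Indices of peak columns detected (intensity > 0) in >= min_fraction of spectra."""
    n_spectra = len(matrix)
    return [j for j in range(len(matrix[0]))
            if sum(1 for row in matrix if row[j] > 0) / n_spectra >= min_fraction]


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of group medians; +/-inf when only one median is zero."""
    a, b = median(group_a), median(group_b)
    if a == 0 and b == 0:
        return 0.0
    if b == 0:
        return math.inf
    if a == 0:
        return -math.inf
    return math.log2(a / b)


def select_peaks(peaks, group_a, group_b, threshold=1.0):
    """Peaks whose |log2 fold change| between two groups reaches the threshold.

    `peaks` maps a peak identifier to {group name: [intensities]}.
    """
    return [peak for peak, groups in peaks.items()
            if abs(log2_fold_change(groups[group_a], groups[group_b])) >= threshold]
```

With toy data where a peak's medians are 9 (critical) and 2 (mild), its log2 fold change is log2(4.5) ≈ 2.17, so it passes both the ≥ 1 and ≥ 2 thresholds, whereas medians of 3 and 2 give about 0.58 and the peak is dropped.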
All critical patients (red dots) were clearly separated from the mild patients (blue dots), except for outlier sample 165. However, samples from severe patients were not separated from either mild or critical patients; in fact, they were distributed almost equally between the mild and critical groups. Next, we applied an ML approach to classify and predict COVID-19 severity. We built a support vector ML model using the same peaks (m/z) used in the PCA. Samples from the 72 patients were classified as mild, severe or critical according to the Chinese guideline and used …

Case studies demonstrated the clinical utility of the peptidome profiles to classify or predict COVID-19 evolution (Figure 3). Thus, all patients who remained clinically stable for at least 72 h (55, 56, 57, 61, 75, 73, 111, 113, 143, 157, 161, 163, 167, 168, 170 and 171) were correctly classified using ML. Three samples from patients classified as clinically severe (101, 118 and 137) were clustered with the critical patients by ML 48 h before the patients clinically progressed from severe to critical. Conversely, samples from patients 51, 52 and 162, who were classified as clinically severe (grade 2) when the samples were collected and evolved to mild (grade 1) 48 h later, were clearly clustered with the mild patients by ML. Most of the discriminating peaks identified by MALDI-TOF MS analysis had a low molecular weight (< 5,000 Da) and probably resulted from the fragmentation of proteins upregulated in severe and critical patients. Interestingly, the largest intensity difference was exhibited by the peak at m/z 11,532, which might correspond to an unfragmented acute-phase protein induced by the virus. To investigate this hypothesis, we performed a proteomic analysis of the samples by LC-MS/MS.
We identified five proteins that were significantly upregulated according to disease severity: serum amyloid A2 protein (SAA2), C-reactive protein (CRP), serum amyloid A1 protein (SAA1), lipopolysaccharide-binding protein (LBP) and the gamma chain of fibrinogen (FGG) (Figure 4 and supplementary figure 2). Only the serum level of SAA2 increased significantly both between mild and severe patients and between severe and critical patients. In addition, SAA1, CRP, LBP and FGG were increased in serum from critical patients compared with mild patients, while only CRP was increased in severe patients compared with mild patients. Given that the molecular weight of SAA2 is approximately 11.7 kDa, depending on the isoform (11), and that we found a good correlation between the level of both proteins and the intensity of the peak at m/z 11,532, we suggest that this peak might correspond to serum amyloid A2 protein.

In this study we demonstrate that the molecular changes that occur in the sera of COVID-19 patients can be detected by MALDI-TOF MS analysis, generating peptidome profiles that may be used as clinical classifiers. In addition, we show that it is possible to predict disease progression using the peptidome signatures obtained with this technology. Finally, we provide strong evidence that serum amyloid A2 protein is one of the major biomarkers of severe COVID-19. To our knowledge, only two previous studies have reported the use of mass spectrometry analysis of serum from COVID-19 patients to classify disease severity (4, 5). However, both studies were performed using sophisticated technologies that are not available in most hospitals. Our challenge was to test whether MALDI-TOF MS analysis, a simpler technology available in most clinical microbiology laboratories for the identification of microbial species, could achieve similar results.
Our peptidome profile data show that the most important changes occur in severe patients, the stage at which patients are put on oxygen supply. This observation is consistent with the proteome analysis conducted by Messner et al, who found that, at the molecular level, the requirement for oxygen supply coincided with progression to severe disease (5). In contrast, mild patients have a peptidome signature virtually identical to that of healthy controls, suggesting that in non-severe patients the changes are restricted to the site of infection, the respiratory tract, without significant systemic molecular alterations. As in previous studies (4, 5), the major differences between critical and mild patients were due to upregulated rather than downregulated proteins in the sera of critical patients. Thus, among the peaks that exhibited major differences, only one (m/z 2,023) was four-fold more intense in mild patients than in critical patients.

The ML approach and analysis of clinical data demonstrated the clinical utility of the peptidome profiles to classify and predict COVID-19 evolution. Our study classified clinically stable patients with high accuracy (100%), higher than that obtained in a previous report (93%) (4). In addition, the evolution of 100% of the patients was correctly predicted 48 h before the clinical change. Overall, these results suggest that this technology is accurate for classifying patients and predicting their prognosis. It would be interesting to conduct a longitudinal study with sequential daily samples from a cohort of patients at different grades of severity until their recovery, to determine the anticipation time of prediction with more confidence.
One potential limitation of our study is that, owing to the rapid response required in the initial stages of the pandemic, we collected samples from patients admitted to our hospital using as the only criterion that they were hospitalized due to SARS-CoV-2 infection. Therefore, our study did not take into account some confounding factors, such as age. Nonetheless, the change in peak intensities between groups substantially exceeded the variability observed within each group (patient ages ranged from 33 to 89 years), suggesting that differences in the peptidome profiles of the groups are only weakly influenced by confounding factors.

The results of our comparative analysis of serum from COVID-19 patients with different grades of severity demonstrate the potential of MALDI-TOF MS as a fast and clinically available technology to classify and predict the progression of this infectious disease in a clinical setting. Our workflow has a total hands-on time of less than 5 min per sample, from inactivation of the serum samples, collected following standard procedures, to acquisition of the mass spectra with the MALDI-TOF mass spectrometer. A single person can perform the whole procedure because the workflow is designed to reduce pipetting to only two steps, mitigating variability. A key step is the cleanup and concentration of the samples using the reversed-phase C18 tips, which improves the resolution and sensitivity of the assay. One 96-position target plate can hold up to 30 serum samples in triplicate plus calibrating standards and can be ready for mass spectrometric analysis in 2 h. Next, a data acquisition scheme based on the MALDI Biotyper®, commonly used in clinical microbiology laboratories for microbial identification, is used. Therefore, this system could be implemented in most hospitals without major problems.
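The plate arithmetic above (30 samples in triplicate occupy 90 of the 96 positions, leaving 6 for calibrating standards) can be turned into a small layout helper. The row-by-row A1 to H12 labelling follows the usual 8 × 12 grid of a 96-position target; assigning the calibrant to the last six spots, the replicate naming and the optional seeded shuffle (echoing the randomized spotting mentioned in the results) are illustrative assumptions rather than the authors' protocol.

```python
import random
from string import ascii_uppercase


def plate_positions(rows=8, cols=12):
    """Spot labels A1..H12 of a 96-position target, listed row by row."""
    return [f"{ascii_uppercase[r]}{c + 1}" for r in range(rows) for c in range(cols)]


def layout(sample_ids, replicates=3, calibrant_spots=6, seed=None):
    """Map spot label -> '<sample>_rep<k>' or 'calibrant' (hypothetical scheme).

    calibrant_spots=6 is an assumption: it is what remains of 96 spots once
    30 samples are spotted in triplicate.
    """
    positions = plate_positions()
    capacity = (len(positions) - calibrant_spots) // replicates
    if len(sample_ids) > capacity:
        raise ValueError(f"at most {capacity} samples fit with these settings")
    order = list(sample_ids)
    if seed is not None:
        # Optional randomized spotting order, reproducible via the seed
        random.Random(seed).shuffle(order)
    plan = {}
    spots = iter(positions)
    for sample in order:
        # Replicates of one sample occupy consecutive positions
        for rep in range(1, replicates + 1):
            plan[next(spots)] = f"{sample}_rep{rep}"
    # Reserve the last positions of the plate for the calibration standard
    for pos in positions[len(positions) - calibrant_spots:]:
        plan[pos] = "calibrant"
    return plan
```

With 30 sample IDs every position is used and H7 to H12 hold the calibrant, while a 31st sample raises a ValueError.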
However, if the available mass spectrometer does not have this acquisition system, the method should be optimized starting from an available positive linear method, using the BTS (Bruker Bacterial Test Standard) calibrant. Finally, raw mass spectra obtained by MALDI-TOF MS can be analysed using R packages such as MALDIquant (12). The laboratory can provide clinicians with the results of the mass spectra analysis in less than 3 h. These results may support clinical decisions, including patient triage and early identification of patients at high risk of disease progression and severe illness, in order to establish effective treatment that averts progression to more serious illness, with the additional benefit of reducing the burden on healthcare systems. Furthermore, the data provided by the peptidome analysis could be useful for monitoring the efficacy of treatments and managing hospital resources, particularly intensive care unit stays, which is particularly important during pandemic waves.

Figure 1. A) Heatmap illustrating peptidome profiles that inform on COVID-19 severity. The heatmap was generated with the ComplexHeatmap package (8) using the peaks that were significantly different based on an unpaired two-tailed t test (p < 0.05) and had a log fold change ≥ 2 for COVID-19 severity. Groups were classified according to COVID-19 severity following the Chinese management guideline for COVID-19. The blue bracket below the heatmap indicates healthy individuals who recovered from COVID-19. B) Relative intensity of MALDI mass spectra peaks with major differences between groups. The boxes show the first and third quartiles as well as the median (middle), the mean (cross) and the outliers (circles outside the whiskers) of the relative intensity of the peaks that exhibited a log fold change ≥ 2 for COVID-19 severity. Asterisks indicate statistical significance based on an unpaired two-tailed t test (p < 0.05).
Figure 2. … Groups were classified according to COVID-19 severity following the Chinese management guideline for COVID-19: mild (blue dots), severe (orange dots) and critical (red dots) patients. Only peaks with a log fold change ≥ 1 between groups were used for the analysis.

1. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.
2. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study.
3. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China.
4. Proteomic and metabolomic characterization of COVID-19 patient sera.
5. Ultra-high-throughput clinical proteomics reveals classifiers of COVID-19 infection.
6. The Novel Coronavirus Pneumonia Emergency Response Epidemiology Team. The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19), China.
7. MALDI-TOF analysis of blood serum proteome can predict the presence of monoclonal gammopathy of undetermined significance.
8. RStudio: Integrated Development Environment for R.
9. Principal component analysis.
10. Machine learning in bioinformatics.
11. Purification, identification and profiling of serum amyloid A proteins from sera of advanced-stage cancer patients.
12. MALDIquant: a versatile R package for the analysis of mass spectrometry data.

The authors thank Sara Fernández for helpful discussions on statistical analysis. This study was supported by Instituto de Investigación Sanitaria de las Islas Baleares (IdISBa). Potential conflicts of interest.

… severity following the Chinese management guideline for COVID-19: mild (blue boxes), severe (orange boxes) and critical (red boxes) patients.