key: cord-0920462-sewfb3q8 authors: Kang, Xixiong; Xu, Yang; Wu, Xiaoyi; Liang, Yong; Wang, Chen; Guo, Junhua; Wang, Yajie; Chen, Maohua; Wu, Da; Wang, Youchun; Bi, Shengli; Qiu, Yan; Lu, Peng; Cheng, Jing; Xiao, Bai; Hu, Liangping; Gao, Xing; Liu, Jingzhong; Wang, Yiping; Song, Yingzhao; Zhang, Liqun; Suo, Fengshuang; Chen, Tongyan; Huang, Zeyu; Zhao, Yunzhuan; Lu, Hong; Pan, Chunqin; Tang, Hong title: Proteomic Fingerprints for Potential Application to Early Diagnosis of Severe Acute Respiratory Syndrome date: 2005-01-01 journal: Clin Chem DOI: 10.1373/clinchem.2004.032458 sha: 936d6133d9a02ae489d4379698839dc86acfb4d0 doc_id: 920462 cord_uid: sewfb3q8 Background: Definitive early-stage diagnosis of severe acute respiratory syndrome (SARS) is important despite the number of laboratory tests that have been developed to complement clinical features and epidemiologic data in case definition. Pathologic changes in response to viral infection might be reflected in proteomic patterns in sera of SARS patients. Methods: We developed a mass spectrometric decision tree classification algorithm using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. Serum samples were grouped into acute SARS (n = 74; <7 days after onset of fever) and non-SARS [n = 1067; fever and influenza A (n = 203), pneumonia (n = 176); lung cancer (n = 29); and healthy controls (n = 659)] cohorts. Diluted samples were applied to WCX-2 ProteinChip arrays (Ciphergen), and the bound proteins were assessed on a ProteinChip Reader (Model PBS II). Bioinformatic calculations were performed with Biomarker Wizard software 3.1.1 (Ciphergen). Results: The discriminatory classifier with a panel of four biomarkers determined in the training set could precisely detect 36 of 37 (sensitivity, 97.3%) acute SARS and 987 of 993 (specificity, 99.4%) non-SARS samples. More importantly, this classifier accurately distinguished acute SARS from fever and influenza with 100% specificity (187 of 187). 
Conclusions: This method is suitable for preliminary assessment of SARS and could potentially serve as a useful tool for early diagnosis.
causative agent for SARS (7, 8). Rapid progress has also been made in the determination of its genome sequences (9-11) and the molecular evolution of the coronavirus (12). Identification of angiotensin-converting enzyme 2 as the viral receptor provided further information toward deciphering its molecular mechanisms of infection (13). Despite such advances in virologic studies, early diagnosis of SARS has been based primarily on the clinical definitions released by WHO and CDC (14, 15), which can be confusing or contradictory (16). Available serologic tests cannot guarantee an early diagnosis (17), and PCR-based molecular detection of the viral RNA suffers from unsatisfactory sensitivity and specificity (3, 17-19). In the last year, failure to develop diagnostic tests for SARS, especially in the acute phase, severely impacted specific prevention and treatment measures for SARS. There is a need to establish a reliable diagnostic methodology for SARS-CoV, in particular, to distinguish the similar clinical manifestations of SARS and other respiratory tract infections. This urgency is reinforced by the first SARS case not linked to laboratory contamination, which occurred in Guangdong, China this year (20). Proteomic analysis has provided a unique tool for the identification of diagnostic biomarkers, evaluation of disease progression, and drug development (21, 22). Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) enables rapid, reproducible protein/peptide profiling of multiple disease-specific biomarkers directly from crude samples (e.g., tissue cell lysates or body fluids) (23, 24).
Small amounts of sample can be applied directly to a biochip coated with specific chemical matrices (e.g., hydrophobic, cationic, or anionic) or specific biochemical materials such as DNA fragments or purified proteins. The bound proteins/peptides can then be analyzed by MS to obtain the protein fingerprints, or even amino acid sequence determinants, when interfaced to a mass spectrometric microsequencing device. Analogous to the proteomic detection of various cancers (25, 26), we used a weakly cationic ProteinChip (WCX2 chip surface) to retrospectively analyze SARS sera to determine whether there are distinct and reproducible protein fingerprints potentially applicable to the diagnosis of SARS. We established a decision tree algorithm consisting of four unique biomarkers for acute SARS in the training set and subsequently validated the accuracy of this classifier by use of a completely blinded test set.
More than 2000 serum specimens from suspected/probable SARS patients admitted to 38 major hospitals in the Beijing area between April 14 and June 5, 2003, were eligible for inclusion. The serum procurement, data management, and blood collection protocols were approved by the Beijing SARS-Control Working Group and were in accordance with WHO biosafety guidelines (27). Among the retrospective samples, only 74 were selected, all from probable patients whose blood samples were collected within 7 days of the onset of fever at the time of admission (acute SARS patients; Table 1). Probable cases were defined according to the eligibility criteria set forth by WHO (15). These cases also had radiographic evidence of infiltrates consistent with pneumonia or respiratory distress syndrome on chest x-ray. The paired convalescent serum samples from the SARS cohort tested positive for IgM seroconversion by the IFA method (Beijing Genomics Institute), and four samples also tested positive in a DNA array test using nasopharyngeal samples.
The 1067 non-SARS control se- The patients and serum samples were then divided into two groups: one for the "training" set and the other for the blinded "test" set (Tables 1 and 2). SARS and non-SARS control sera were all stored at −80°C in 30-μL aliquots. Before each round of mass spectrometric assays, we routinely performed quality control of serum samples by the appearance and peak intensity of m/z 6635.09 (Fig. 3A). Because the peak intensity of m/z 6635.09 remained relatively constant among spectra from different assays and different instruments, it was also used for normalization between each round of analyses. Three different chip chemistries (hydrophobic, anionic, and cationic) were first evaluated to determine which affinity chemistry gave the best serum profiles in terms of the number and resolution of proteins. The weakly cationic exchange chip (WCX) gave the best results with mass spectra from 0 to 200 kDa. The WCX chips in an 8-well bioprocessor format (Ciphergen) were chosen to allow a larger volume of serum for the chip array. The bioprocessor was pretreated with 150 μL of 100 mmol/L sodium acetate (pH 4) on a platform shaker at 250 rpm for 5 min. The excess sodium acetate was removed by inverting the bioprocessor on a paper towel. This process was repeated twice. The serum samples were thawed on ice in a Biosafety Level II cabinet, and 20 μL of each sample was mixed with 30 μL of U9 buffer (9 mol/L urea, 10 g/L CHAPS in phosphate-buffered saline) in a 1.5-mL Eppendorf tube and vortex-mixed at 4°C for 20 min. We then added 100 μL of U1 buffer [U9 buffer diluted ninefold with 50 mmol/L Tris-HCl (pH 7); 100 mL of U9 buffer plus 800 mL of Tris-HCl] to the serum/urea mixture, vortex-mixed it for 10 min, and stopped the reaction by addition of 600 μL of sodium acetate on ice. We applied 50 μL of the serum/urea sample to each well, and the bioprocessor was sealed and shaken on a platform shaker at 250 rpm for 30 min.
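The dilution steps in this protocol can be checked with a short calculation. The sketch below is illustrative only: it assumes that the stated volumes are in microliters, that volumes are additive, and that the sodium acetate stop solution contains no urea. The function and constant names are hypothetical and do not come from the original protocol.

```python
# Volumes in microliters, as given in the protocol text (assumed additive).
SERUM_UL = 20.0      # serum sample
U9_UL = 30.0         # U9 buffer, 9 mol/L urea
U1_UL = 100.0        # U1 buffer, U9 diluted ninefold
ACETATE_UL = 600.0   # sodium acetate stop solution (assumed urea-free)
APPLIED_UL = 50.0    # volume applied to each well

U9_UREA_MOL_PER_L = 9.0
U1_UREA_MOL_PER_L = U9_UREA_MOL_PER_L / 9.0  # ninefold dilution gives 1 mol/L


def dilution_summary():
    """Return total volume, serum dilution, urea concentration, serum per well."""
    total_ul = SERUM_UL + U9_UL + U1_UL + ACETATE_UL
    # microliters times mol/L gives micromoles of urea
    urea_umol = U9_UL * U9_UREA_MOL_PER_L + U1_UL * U1_UREA_MOL_PER_L
    return {
        "total_ul": total_ul,
        "serum_dilution_fold": total_ul / SERUM_UL,
        "final_urea_mol_per_l": urea_umol / total_ul,
        "serum_per_well_ul": APPLIED_UL * SERUM_UL / total_ul,
    }


if __name__ == "__main__":
    for name, value in dilution_summary().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the serum is diluted 37.5-fold before it reaches the chip, so each 50-μL application carries a little over 1 μL of the original serum.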
The excess serum/urea solution was discarded, and the bioprocessor was washed three times with 100 mmol/L sodium acetate as described above. The chips were removed from the bioprocessor, washed twice with deionized water, and air-dried. Subsequently 0.5 μL of EAM sinapinic acid saturated in 500 mL/L acetonitrile-5 g/L trifluoroacetic acid was added to each well. After air-drying, the sinapinic acid application was repeated. Chips were then placed in the Protein Biological System II (PBS II) mass spectrometer reader (Ciphergen), and TOF spectra were generated by an average of 104 laser shots collected in the positive mode. The settings for low-energy readings were set with a high mass of 50 kDa and were optimized from 3 to 15 kDa at a laser intensity of 200, detector sensitivity of 8, and a focus by optimization center. High-energy readings were set with a high mass of 200 kDa and were optimized from 10 to 50 kDa at a laser intensity of 230 and a detector sensitivity of 9. Mass accuracy was calibrated externally by use of the All-in-One peptide molecular mass calibrator (Ciphergen).
... in the pericardium (n = 1), upper right clavicle (n = 1), lymph nodes (n = 1), liver (n = 1), and brain (n = 1); accompanying hydrothorax was also observed in nine patients.
Sera from a healthy control were individually applied to seven bait surfaces of eight WCX2 chips and run during 3-day intervals for analysis of within-run reproducibility. In parallel, 40 samples (10 from SARS patients, 10 from patients with fever, 10 from patients with pneumonia, and 10 from healthy controls) were applied in duplicate to a single chip and run on two different instruments (PBS II and PBS IIc; Ciphergen) for between-run analysis of instrument drift. To avoid the possibility that placement or run order of samples would affect assay accuracy, samples were loaded on chips in a rotational fashion. In brief, sample 1 was spotted on the 8-well directional chip (wells A to H) in duplicate in wells A and B and then in wells G and H of the second chip. Samples 2, 3, and 4 were loaded on chips in the same rotation order. We also randomized the order of chip placement in the spectrometer to minimize bias from run order. Spectra were collected for each sample and analyzed independently using the classification algorithm established in the training step. The peak at m/z 6635.09 in the quality-control serum was adjusted to have an intensity of 40-60 for both the PBS II and PBS IIc. The peak intensity of m/z 6635.09 in the quality-control serum was used to normalize instrument resolution between the PBS II and PBS IIc. We normalized spectra using total ion current with an identical normalization coefficient and a low mass cutoff <2000 Da. If the factor was <0.3 or >2.9 after normalization to total ion current for the peak at m/z 3939, repeated runs would be performed. No outlier was rejected in the test. The "root" biomarker, m/z 3939, yielded the lowest and similar P value in both the PBS II and PBS IIc.
Peak detection was performed with Biomarker Wizard software 3.1.1 (Ciphergen). The m/z ratios between 2000 and 20 000 were selected for analysis because this range contained the majority of the resolved protein and peptides. The m/z range between 0 and 2000 was eliminated from analysis to avoid interference from adducts, artifacts of the energy-absorbing molecules, and other possible chemical contaminants. Peak detection involved baseline subtraction, mass normalization using a common calibrant peak (m/z 6635.09), and normalization to the total ion current intensity with a minimum m/z of 2000, using an external normalization coefficient of 0.2 (normalization factor for individual spectrum = 0.2/average ion current for each spectrum) for spectra obtained at different times or locations.
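The total ion current normalization described here can be sketched in a few lines. This is a minimal illustration of the stated formula (factor = 0.2/average ion current above m/z 2000) and of the rerun rule (factor <0.3 or >2.9), not the Biomarker Wizard implementation. The function names are hypothetical, and the treatment of the average as a simple mean over the retained data points is an assumption.

```python
def tic_normalization_factor(mz, intensity, coefficient=0.2, min_mz=2000.0):
    """Factor = coefficient / mean intensity of points with m/z >= min_mz."""
    kept = [i for m, i in zip(mz, intensity) if m >= min_mz]
    if not kept:
        raise ValueError("no data points at or above the m/z cutoff")
    average_current = sum(kept) / len(kept)
    if average_current <= 0:
        raise ValueError("average ion current must be positive")
    return coefficient / average_current


def normalize_spectrum(mz, intensity, coefficient=0.2, min_mz=2000.0):
    """Return the factor and the scaled (m/z, intensity) pairs above the cutoff."""
    factor = tic_normalization_factor(mz, intensity, coefficient, min_mz)
    scaled = [(m, i * factor) for m, i in zip(mz, intensity) if m >= min_mz]
    return factor, scaled


def needs_rerun(factor, low=0.3, high=2.9):
    """Flag a spectrum whose normalization factor falls outside the accepted range."""
    return factor < low or factor > high


if __name__ == "__main__":
    # Made-up four-point spectrum, for illustration only.
    mz = [1500.0, 2500.0, 4000.0, 8000.0]
    intensity = [10.0, 0.1, 0.2, 0.3]
    factor, scaled = normalize_spectrum(mz, intensity)
    print(factor, needs_rerun(factor), scaled)
```

In the made-up spectrum, the point at m/z 1500 is ignored both when computing the factor and in the returned data, mirroring the exclusion of the 0 to 2000 range from analysis.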
The settings used for autodetect peaks to cluster in the first pass were a signal-to-noise ratio of 5 and a minimum peak threshold of 5% of all spectra. The peak clusters were completed by second-pass peak detection using a signal-to-noise ratio of 2 and 0.3% of mass for the cluster window. An average of 99 peaks was detected in each spectrum. The mass range from 20 to 200 kDa was analyzed in parallel.
Analytical procedure
Data analysis. The data analysis process used in this study involved three stages: (a) peak detection and alignment; (b) selection of peaks with the highest discriminatory power; and (c) data analysis using a decision tree algorithm. A random sampling (acute SARS, fever, pneumonia, lung cancer, and healthy) with two strata (acute SARS and non-SARS) was used to separate the entire data set into training and test data sets. The training data set consisted of SELDI spectra from 37 acute SARS and 74 non-SARS serum samples. The validity and accuracy of the classification algorithm were then challenged with a blinded test data set consisting of 37 acute SARS and 993 non-SARS samples.
Decision tree classification. Construction of the decision tree classification algorithm was performed as described previously (26) with modifications based on the Biomarker Patterns Software (Ciphergen). Classification trees were split into two branches or nodes, using one rule at a time. We set the target variable level at 2 and the minimum value at 0, and each decision was based on the presence or absence and the intensity of one peak. Trees were grown with the Gini or Twoing method, with the setting favoring even splits varied from 0.00 to 2.00 in steps of 0.2 and V-fold cross-validation varied from 6 to 12 in steps of 2, for the growth of 88 trees. The lowest cost tree (value = 0.068; Gini = 2.0; V-fold = 10) was selected for the final test.
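To make the splitting rule concrete, the following sketch shows how a single decision node can be chosen with the Gini criterion: for one peak, every candidate intensity threshold is scored by the weighted Gini impurity of the two resulting branches, and the threshold with the lowest score is kept. This is a generic CART-style illustration, not the Biomarker Patterns Software algorithm; the function names and the example intensities are hypothetical.

```python
def gini(labels):
    """Gini impurity for binary labels (1 = SARS, 0 = non-SARS)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    # For two classes, 1 - p^2 - (1 - p)^2 simplifies to 2p(1 - p).
    return 2.0 * p * (1.0 - p)


def best_split(intensities, labels):
    """Return (threshold, weighted Gini) of the best split on one peak."""
    pairs = sorted(zip(intensities, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for k in range(1, n):
        low, high = pairs[k - 1][0], pairs[k][0]
        if low == high:
            continue  # no threshold can separate identical intensities
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = (low + high) / 2.0, score
    return best_threshold, best_score


if __name__ == "__main__":
    # Made-up normalized intensities of one peak in six spectra.
    intensities = [0.5, 1.0, 1.5, 6.0, 7.0, 8.0]
    labels = [0, 0, 0, 1, 1, 1]
    print(best_split(intensities, labels))
```

A full tree repeats this search over all peaks at each node and then recurses into the two branches; cross-validation, as used in the study, guards against choosing a tree that only fits the training spectra.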
To identify the serum biomarkers that could distinguish SARS from non-SARS samples, we used a training set of specimens (37 acute SARS and 74 controls; Tables 1 and 2) and constructed the decision tree classification algorithm using 10 989 peaks [99 peaks × (37 + 74) spectra] of statistical significance identified in the low-energy readings (see Materials and Methods). The classification algorithm used four peaks between 3 and 12 kDa (m/z 3939.08, 4137.71, 8136.64, and 11 514.2) and generated five terminal nodes (Fig. 1). These discriminatory peaks efficiently split SARS specimens into terminal nodes 3 and 5 and non-SARS samples into terminal nodes 1, 2, and 4. Each mass peak showed a mean intensity ratio of SARS vs non-SARS >3 and a P value close to 0 (Table 3). Notably, the proteins or peptides with masses of 3939.08, 8136.64, and 11 514.2 Da were up-regulated in patients with acute SARS, whereas that with a mass of 4137.71 Da was down-regulated compared with healthy controls or patients with respiratory tract infections. A representative spectrum of a SARS specimen aligned with that of a healthy control (Fig. 2A) showed the four fingerprints in node 3 required for pattern recognition in the classifier. The unique presence of the root biomarker, m/z 3939.08, is demonstrated in the alignment of representative spectra of samples from patients with acute SARS (1, 3, 5, and 7 days after the onset of fever; from terminal node 5) and those from healthy controls and patients with fever and influenza or pneumonia (Fig. 2B). This decision algorithm correctly classified 37 of 37 (100%) of the acute SARS samples and 72 of 74 (97.3%) of the non-SARS controls in the training set (Table 3). The above classifier used only those masses in the low-energy readings (m/z <50 000).
To exhaust all meaningful serum biomarkers, we expanded the analysis of the same training samples in the high-energy setting (m/z combine two energy settings for analysis, we reasoned that the decision tree generated with only low-energy readings (Fig. 1) would be more sensitive (100%) and more convenient for a clinical application.
To determine the reproducibility of SELDI spectra, mass location, and intensity from array to array on a single chip (intraassay) and between instruments (interassay), we first spotted the serum from a healthy control on seven baits in a single chip and collected seven independent spectra over a time span of 21 days (Fig. 3A). We then selected seven proteins in the range of 3-10 kDa (m/z 4089.59, 5334.17, 5631.18, 5901.49, 6625.63, 7762.24, and 7966.63; black arrows in Fig. 3A) to calculate the intraassay CV. These peaks were selected because they were in the proximity of the four biomarkers with comparable current intensities. The interassay experiments were similar except that sera from healthy controls and from patients with high fever, pneumonia, and SARS were applied to a single chip, and the independent spectra were collected from two different instruments (PBS II and PBS IIc; Fig. 3, B and C). The mean intra- and interassay CVs for peak location were 0.02% and 0.03%, respectively. We considered masses with accuracies within 0.1% between spectra to be the same. The mean intra- and interassay CVs for the normalized intensity were 15% and 20%, respectively. CV calculations using lower intensity peaks (Fig. 3A, gray arrowheads), on the other hand, yielded results similar to those obtained with the seven high-intensity peaks (peak location, intra- and interassay CVs both 0.03%; peak intensity, intraassay CV = 17% and interassay CV = 18%).
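The reproducibility measures used here are simple to compute. The sketch below shows a coefficient of variation (CV) calculation and a mass-matching rule with a 0.1% tolerance. It is illustrative only: the text does not say whether the sample or population standard deviation was used, nor which mass the 0.1% is taken relative to, so the sample standard deviation and the mean of the two masses are assumptions. The function names and example values are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: 100 * SD / mean (sample SD assumed)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return 100.0 * statistics.stdev(values) / mean


def same_mass(mz_a, mz_b, tolerance=0.001):
    """Treat two peaks as the same mass if they differ by at most 0.1%.

    The difference is taken relative to the mean of the two masses (assumption).
    """
    reference = (mz_a + mz_b) / 2.0
    return abs(mz_a - mz_b) <= tolerance * reference


if __name__ == "__main__":
    # Made-up normalized intensities of one peak across three replicate spectra.
    print(cv_percent([8.0, 10.0, 12.0]))
    print(same_mass(3939.08, 3941.0), same_mass(3939.08, 3950.0))
```

A mean CV across several peaks, as reported in the text, would be obtained by applying `cv_percent` to each peak's replicate values and averaging the results.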
Analysis of spectra from the completely blinded test set (37 acute SARS and 993 controls; Tables 1 and 2) accurately classified 36 of 37 (97.3%) SARS specimens and accurately classified 987 of 993 (99.4%) of the controls as non-SARS (Table 3). More important was that the classification algorithm successfully distinguished acute SARS from fever and influenza, with a sensitivity and specificity reaching 97.3% (36 of 37) and 100% (187 of 187; 60 of 60 with influenza), respectively. Interestingly, when we tested the classifier using an additional control population of 40 samples from patients in the Beijing area with measles after July 16, 2003, who had no history of close contact with SARS patients and had not visited those hospitals treating SARS patients, the classifier had a specificity of 100% (95% confidence interval, 89-100%; data not shown).
Fig. 3. Intra- and interassay reproducibility. (A), example of intraassay reproducibility of mass spectra and tree decision classification. Serum from an unaffected healthy control was individually applied to seven bait surfaces on eight chips, and seven randomly selected peaks (arrows) in each spectrum over a course of 27 days were used as surrogate markers for calculation of CV. The reproducibility of SELDI spectra, mass location, and intensity from spectrum to spectrum was determined accordingly.
Several laboratory tests, based on either viral RNA (3, 17, 19) or serology (6, 17), have been developed to complement clinical characteristics and epidemiologic data in the identification of SARS, but early detection of SARS with sufficiently high sensitivity and specificity has not been achieved. The identification of proteins/peptides of pathophysiologic significance (phenomic fingerprints) in crude biological and clinical samples by SELDI-TOF MS has been demonstrated in various cancer studies (28).
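The sensitivity and specificity figures reported for the training and test sets follow directly from the classification counts. A minimal sketch of the calculation, using the counts given in the text (the function names are hypothetical):

```python
def sensitivity_percent(true_positives, false_negatives):
    """Percentage of SARS samples classified as SARS."""
    return 100.0 * true_positives / (true_positives + false_negatives)


def specificity_percent(true_negatives, false_positives):
    """Percentage of non-SARS samples classified as non-SARS."""
    return 100.0 * true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Training set: 37 of 37 acute SARS and 72 of 74 non-SARS classified correctly.
    print(round(sensitivity_percent(37, 0), 1), round(specificity_percent(72, 2), 1))
    # Blinded test set: 36 of 37 acute SARS and 987 of 993 non-SARS classified correctly.
    print(round(sensitivity_percent(36, 1), 1), round(specificity_percent(987, 6), 1))
```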
Using a similar profiling strategy, we have established a classification algorithm that delineates probable SARS patients as early as day 1 after self-described onset of symptoms from healthy individuals and from patients with respiratory tract infections in the training set (sensitivity = 100%; specificity = 97.3%). When applied to the blinded test set, this discriminatory profiling method precisely classified 97.3% of patients with acute SARS and 99.4% of non-SARS patients. More strikingly, our classifier was able to discriminate SARS-CoV infection from bacterial (mycoplasma, tuberculosis) and other local (influenza) or systemic (measles) viral infections of the respiratory tract with a specificity reaching 100%. This was attributable to the inclusion of corresponding inflammatory control samples in the training set and optimization of the classification algorithm. The biomarkers identified in the acute phase of SARS seemed to remain throughout the convalescent phase of the disease because when we applied the identical tree classification to samples from patients in whom onset of fever had been >2, 3, 4, and >5 weeks previously, we could detect SARS with sensitivities and specificities reaching 89.2% and 91.8%, 86.0% and 91.8%, 93.1% and 91.8%, and 79.5% and 91.8%, respectively (data not shown). One intriguing observation was that SARS patients clustering in terminal node 3 all demonstrated moderate clinical features, whereas those in node 5 were severe cases. We are investigating the correlation between this proteomic pattern and the pathology of SARS. These results represent, to the best of our knowledge, the most accurate laboratory technique for early detection of SARS: PCR-based assays have a maximum sensitivity of 80% when used to test nasopharyngeal aspirates or plasma specimens (29, 30). The proteomic method described here also has advantages over PCR-based assays in that it does not require BSL-3 containment and it can detect SARS in serum samples.
This is a critical alternative to PCR-based tests, which are challenged by low viral loads in nasopharyngeal aspirates and throat swab specimens in the acute phase of SARS. Instead of traditional chromatographic fractionation of samples, we directly spotted the crude serum on the WCX chips. By doing this we avoided the unnecessarily biased depletion of thousands of proteins and/or peptides associated with human serum albumin before MS analysis. Processing of samples and generation of the diagnostic mass spectra by our method required only a small amount of serum (20 μL vs several milliliters needed for PCR methods) and took <3 h. High-throughput proteomic screening for SARS in a 96-well format is also feasible.
We adhered to the WHO case definition and eligibility criteria for SARS and avoided using samples from non-SARS controls from hospitals where SARS patients had been admitted because these persons might have a history of close contact with SARS patients or had been inside those SARS hospitals. We further emphasized this point by sampling control sera from a nonepidemic region of the country. Although the possibility might exist that the difference in serum fingerprints would reflect differences among SARS and non-SARS hospitals, the fact that all SARS cases from 38 different hospitals fit into the single classification algorithm would likely rule out such a concern. More importantly, severe and mild cases of SARS from different hospitals, which had been completely randomized in the experimental analysis, fell into distinct nodes of the tree classification, strongly indicating that the biomarkers we have identified were specific to SARS and not the sites at which blood samples were collected. We further minimized the potential sampling bias by simultaneously using four biomarkers instead of one (e.g., m/z 3939.08), which nevertheless could sufficiently delineate SARS from non-SARS (sensitivity = 93.7%; specificity = 91.8%; data not shown).
All SARS and non-SARS samples were from patients with the same ethnic background. SARS and non-SARS control sera collected at different times were all freshly aliquoted and properly stored at −80°C. The differential protein pattern as the discriminator between SARS and non-SARS is independent of protein identities. The origins and full identities of the discriminating biomarkers are under investigation. Knowing their identities is not absolutely required for the purpose of differential diagnosis, as shown by numerous studies of cancer diagnosis by SELDI methods. However, characterizing these peaks would certainly help in understanding the biological roles of these peptides/proteins and could potentially lead to the discovery of more direct diagnostic tools and novel therapeutic targets for SARS-CoV.
References
1. Cumulative number of reported probable cases of SARS
2. Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong
3. Identification of a novel coronavirus in patients with severe acute respiratory syndrome
4. A novel coronavirus associated with severe acute respiratory syndrome
5. Identification of severe acute respiratory syndrome in Canada
6. Coronavirus as a possible cause of severe acute respiratory syndrome
7. Newly discovered coronavirus as the primary cause of severe acute respiratory syndrome
8. Koch's postulates fulfilled for SARS virus
9. Characterization of a novel coronavirus associated with severe acute respiratory syndrome
10. The genome sequence of the SARS-associated coronavirus
11. Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection
12. SARS-beginning to understand a new virus
13. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus
14. Updated interim U.S. case definition of severe acute respiratory syndrome (SARS)
15. Case definitions for surveillance of severe acute respiratory syndrome (SARS)
16. Clinical presentations and outcome of severe acute respiratory syndrome in children
17. Clinical progression and viral load in a community outbreak of coronavirus-associated SARS pneumonia: a prospective study
18. Quantitative analysis and prognostic implication of SARS coronavirus RNA in the plasma and serum of patients with severe acute respiratory syndrome
19. Rapid diagnosis of a coronavirus associated with severe acute respiratory syndrome (SARS)
20. Laboratory confirmation of a SARS case in southern China
21. Disease proteomics
22. Biomedical informatics for proteomics
23. Clinical proteomics translating benchside promise to bedside reality
24. SELDI proteinchip MS: a platform for biomarker discovery and cancer diagnosis
25. Use of proteomic patterns in serum to identify ovarian cancer
26. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men
27. WHO biosafety guidelines for handling of SARS specimens
28. Proteomic applications for the early detection of cancer
29. Detection of SARS coronavirus in plasma by real-time RT-PCR
30. Crouching tiger, hidden dragon: the laboratory diagnosis of severe acute respiratory syndrome