key: cord-0258420-0gsq1i2h authors: Fossati, Andrea; Richards, Alicia L.; Chen, Kuei-Ho; Jaganath, Devan; Cattamanchi, Adithya; Ernst, Joel D.; Swaney, Danielle L. title: Towards comprehensive plasma proteomics by orthogonal protease digestion date: 2021-04-29 journal: bioRxiv DOI: 10.1101/2021.04.28.441706 sha: 10c4732c1ecec4a13365b5866c1a0f023b0a6e42 doc_id: 258420 cord_uid: 0gsq1i2h
Rapid and consistent protein identification across large clinical cohorts is an important goal for clinical proteomics. With the development of data-independent technologies (DIA/SWATH-MS), it is now possible to analyze hundreds of samples with great reproducibility and quantitative accuracy. However, this technology benefits from empirically derived spectral libraries that define the detectable set of peptides and proteins. Here we apply a simple and accessible tip-based workflow for the generation of spectral libraries to provide a comprehensive overview of the plasma proteome in individuals with and without active tuberculosis (TB). To boost protein coverage, we utilized non-conventional proteases such as GluC and AspN together with the gold standard trypsin, identifying more than 30,000 peptides mapping to 3,309 proteins. Application of this library to quantify plasma proteome differences in TB infection recovered more than 400 proteins in 50 minutes of MS acquisition, including diagnostic Mycobacterium tuberculosis (Mtb) proteins that have previously been detectable primarily by antibody-based assays and intracellular proteins not previously described to be in plasma.
Mass spectrometry-based proteomics is among the most promising technologies for biomarker discovery due to the ability to simultaneously detect thousands of proteins, post-translational modifications, and isoforms, all of which hold great potential as future biomarkers. 1 This high-throughput approach can lead to the identification of proteins that can be translated into simple, affordable, and non-invasive assays at the point of care for disease diagnosis and monitoring. For example, tuberculosis (TB) is a leading cause of mortality from an infectious disease globally, and its diagnosis remains a key challenge. There is a critical need for rapid, low-cost, point-of-care assays but there are few promising biomarker 2 targets for assay development. Proteomics offers the potential to address this challenge and facilitate advances in diagnostics development for TB and other diseases.
Plasma is easy to obtain and has been used for diagnosis of a variety of infectious diseases, such as AIDS, 3 hepatitis C, 4 and, recently, SARS-CoV-2. 5 The plasma proteome also represents a particularly challenging matrix to analyze due to the large dynamic range of protein concentrations, spanning 10 orders of magnitude, and the overwhelming presence of a select set of highly abundant proteins (e.g. albumin). Historically, this has limited both the number of proteins detected and the reproducibility of detection. To mitigate issues in protein detection, numerous studies have employed extensive off-line chromatographic fractionation, allowing for the injection of individual fractions of reduced complexity into the mass spectrometer. This approach has been highly successful in increasing the number of proteins detectable in plasma, 6, 7 albeit at the cost of a correspondingly dramatic increase in MS acquisition time to analyze dozens of fractions. Furthermore, the reliance on off-line fractionation introduces a low-throughput and cumbersome additional step in sample preparation that is not accessible to many labs. Reproducible protein quantification is also critical for biomarker discovery, as differences in the abundance of specific proteins can be used as a clinical marker. 
Regardless of the method of quantification employed, data-dependent acquisition (DDA) 8 strategies suffer from stochastic precursor ion sampling, resulting in incomplete quantification, particularly with increasing sample numbers. 9 In contrast, data-independent acquisition mass spectrometry approaches (DIA/SWATH-MS) 10 sequentially sweep across m/z precursor isolation windows to acquire multiplexed tandem mass spectra irrespective of which peptides are being sampled. This results in highly complete and consistent quantification that readily scales to the analysis of hundreds or thousands of samples. While DIA offers great potential for plasma proteomics, most studies have been limited to measuring ≈ 300 proteins, 11 partially due to the lack of comprehensive spectral libraries that are used to guide peptide identification and quantitative data extraction. Here we offer a plasma proteomics spectral library in which we have utilized accessible tip-based fractionation and non-conventional proteases to boost proteome sequence coverage, and combined this with a DIA-MS strategy to reproducibly quantify the differential regulation of the plasma proteome upon active TB infection.
Sample-specific library generation
Plasma samples from 3 adults with (0 with HIV) and 3 adults without (0 with HIV) active pulmonary TB were used from the FIND specimen bank. The samples were inactivated by addition of 2x inactivation buffer (8 M urea, 100 mM ammonium bicarbonate, 150 mM NaCl) in a 1:1 v:v ratio, followed by addition of RNAse (NEB) to a concentration of 0.75 µL/mL. 10 µL of plasma from the individuals with active TB were pooled and depleted using the top12 most abundant depletion kit (Thermo-Fisher) according to the manufacturer's instructions. Following depletion, the samples were boiled at 90°C for 5 minutes. Denatured proteins were reduced with 5 mM TCEP for 30 minutes at 56°C and then alkylated with 10 mM chloroacetamide for 30 minutes at room temperature in the dark. 
The samples were then loaded into a Samples were resuspended in 100 µL of 50 mM ammonium bicarbonate and then subjected to proteolysis using either 2 µg of trypsin (Promega), 2 µg of AspN (Promega), or 2 µg of GluC (Sigma-Aldrich) overnight at 37°C on a shaker at 1000 rpm. Peptides were collected by centrifugation (8000 g for 30 minutes) and the filters were washed once with 100 µL of ddH2O. To perform basic reverse phase fractionation, the samples were acidified to a final concentration of 0.1% TFA. C18 spin columns (Nest group) were activated with 1 column volume of ACN and equilibrated with two column volumes of 0.1% TFA. Peptides were bound to the column and washed twice with 0.1% TFA. For elution, 7 solutions were used with increasing concentrations of ACN in 0.1% triethylamine, from 2.5% to 20%; following the last elution, the column was washed twice with 1 column volume of 50% ACN (see Supplementary Table 1). Fractions were dried under vacuum and resuspended in 15 µL of buffer A (0.1% FA in MS-grade H2O), and approximately 500 ng were subjected to proteomic analysis.
In plate sample processing for Mtb positive and negative samples
Data for each fraction was acquired on a timsTOF Pro mass spectrometer (Bruker) interfaced with a Thermo Easy-nLC 1200 (Thermo Fisher Scientific). The peptides were separated at a flow rate of 400 nL/min over a manually packed 15 cm long column containing 1.7 µm BEH beads (Waters) packed with a silica PicoTip™ Emitter (inner diameter 75 µm) (New Objective, Woburn, USA). Peptides were eluted from the column using a linear gradient from 2% to 32% buffer B (80% acetonitrile and 0.1% formic acid in HPLC grade H2O) in buffer A (0.1% formic acid in HPLC grade H2O) with a total length of 90 minutes. The peptides were sprayed into the timsTOF Pro using a CaptiveSpray source (Bruker), with an end plate offset of 500 V, a dry temperature of 200°C, and the capillary voltage fixed at 1.6 kV. 
The mass spectrometer was operated in positive ion mode. For DDA acquisition the timsTOF Pro (Bruker) was operated in PASEF mode using Compass Hystar v5.1 and oTOF control v6.2. The mass range was set to 100-1700 m/z, with 10 PASEF scans between 0.6 V·s/cm² and 1.6 V·s/cm². Accumulation time was set to 2 ms and ramp time was set to 100 ms. Fragmentation was triggered at 20,000 arbitrary units (a.u.) and peptides (up to charge 5) were fragmented using collision-induced dissociation (CID) with a spread between 20 eV and 59 eV. For DIA acquisition, each sample was acquired on the same HPLC-MS setup previously described, and analyzed with either the 90-minute gradient used for DDA analysis or a shorter 50-minute gradient in which peptides were separated for 35 minutes using a linear gradient of buffer B (80% acetonitrile and 0.1% formic acid in HPLC grade H2O) from 5% to 33%, then buffer B was increased to 40% in 5 minutes and the column was washed at 90% for 10 minutes before the next run. The separation was done at 400 nL/min while the column wash was performed at a flow rate of 500 nL/min. Similar MS1 range, PASEF parameters, and fragmentation parameters were employed as described above for DDA. 12 DIA-PASEF scans were performed. The AspN and trypsin libraries were generated using Spectronaut. 12 The samples were searched using Pulsar against a combined database encompassing the Mycobacterium tuberculosis proteome (4081 entries, downloaded from UniProt on 12/02/21) and the Homo sapiens proteome (20,397 entries, downloaded on 07/01/21). The default BGS settings without iRT normalization were used. The GluC spectral library was generated using MSFragger. 13 Briefly, the 'SpecLib' workflow was employed using default parameters. The number of missed cleavages was fixed to 2, with cysteine carbamidomethylation as a fixed modification and N-terminal acetylation and methionine oxidation as variable modifications. 
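To illustrate why the three proteases yield complementary peptides, and what a limit of 2 missed cleavages means, the following is a minimal in-silico digestion sketch. It is not the search engines' implementation: the cleavage rules are simplified assumptions (trypsin cuts after K/R with the proline exception ignored, GluC cuts after E only, AspN cuts before D), and the function names and the example sequence are hypothetical.

```python
def cleavage_sites(sequence, protease):
    """Return indices i where the chain is cut between sequence[i-1] and sequence[i].

    Simplified, assumed rules (not taken from the paper or any search engine):
      trypsin: C-terminal to K or R (proline exception ignored)
      gluc:    C-terminal to E
      aspn:    N-terminal to D
    """
    rules = {
        "trypsin": lambda s, i: s[i - 1] in "KR",
        "gluc": lambda s, i: s[i - 1] == "E",
        "aspn": lambda s, i: s[i] == "D",
    }
    rule = rules[protease]
    return [i for i in range(1, len(sequence)) if rule(sequence, i)]


def digest(sequence, protease, missed_cleavages=2):
    """List peptides produced with up to `missed_cleavages` skipped cut sites."""
    bounds = [0] + cleavage_sites(sequence, protease) + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        for skipped in range(missed_cleavages + 1):
            end = start + 1 + skipped
            if end >= len(bounds):
                break
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    toy = "MKDAEGRLD"  # hypothetical sequence, for illustration only
    for enzyme in ("trypsin", "gluc", "aspn"):
        print(enzyme, digest(toy, enzyme, missed_cleavages=0))
```

Because each protease cuts at different residues, the same protein region is represented by peptides of different length and composition, which is the basis for the gain in sequence coverage.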
The GluC DDA-PASEF files were also searched against the combined human-Mtb database. Decoys were generated by pseudo-inversion as previously described. 14 Both searches were performed with 1% FDR at the peptide and protein level. EasyPQP (https://github.com/grosenberger/easypqp, commit #dfa4ead) was used to generate the aligned retention time using high-confidence iRT (ciRT). The resulting library was then converted into a Spectronaut-compatible library using an in-house Python script. The final sample-specific spectral assay combined data from all proteases and encompasses 765,411 assays from 30,400 peptides mapping to 3309 protein groups (Supplementary Table 2). The spectral assay library has been deposited to the ProteomeXchange via the PRIDE 15 partner repository with the dataset identifier PXD025671. To compute sequence coverage, the protein coverage summarizer from the Pacific Northwest National Laboratory was used (https://github.com/PNNL-Comp-Mass-Spec/protein-coverage-summarizer).
Data processing and analysis for DDA and DIA data
DIA data for each protease was searched independently for both the 90-minute and 50-minute gradients using Spectronaut and the corresponding spectral library. The settings employed in Spectronaut were default BGS (iRT normalization kit off) and each file was exported at the peptide level. For protein inference, the average of the top3 peptide intensities was used. The resulting protein-level matrix was log2-transformed and the data was normalized using median-centering. For missing value imputation, a distribution-based strategy was employed. For each sample, we selected the lowest 10% of values and calculated their standard deviation (σ) and mean (µ). We then generated a normal distribution having similar σ but a mean downshifted by 1.8 × σ. The rationale for this imputation strategy is that a missing peptide detection cannot be attributed with certainty to either a precursor ion intensity below the limit of detection (LOD) or true biological absence. 
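The protein roll-up, normalization, and imputation steps described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' script: function names are hypothetical, samples are assumed to be columns of a proteins × samples matrix, the imputation mean is read as µ − 1.8 × σ with µ and σ taken from the lowest 10% of observed values in each sample, and a sample standard deviation (ddof=1) is assumed.

```python
import numpy as np


def top3_protein_intensity(peptide_intensities):
    """Average the (up to) three most intense peptides of one protein in one sample."""
    values = np.asarray(peptide_intensities, dtype=float)
    values = values[~np.isnan(values)]
    if values.size == 0:
        return np.nan
    return float(np.sort(values)[-3:].mean())


def median_center(log2_matrix):
    """Subtract each sample's (column's) median, ignoring missing values."""
    matrix = np.asarray(log2_matrix, dtype=float)
    return matrix - np.nanmedian(matrix, axis=0, keepdims=True)


def impute_downshifted(log2_matrix, low_fraction=0.10, shift=1.8, seed=0):
    """Fill missing values per sample from N(mu - shift * sigma, sigma).

    mu and sigma are computed on the lowest `low_fraction` of observed values
    in that sample (assumed reading of the description in the text).
    """
    rng = np.random.default_rng(seed)
    imputed = np.array(log2_matrix, dtype=float)  # copy, input left untouched
    for j in range(imputed.shape[1]):
        column = imputed[:, j]  # view into `imputed`
        observed = column[~np.isnan(column)]
        cutoff = np.quantile(observed, low_fraction)
        low_tail = observed[observed <= cutoff]
        mu, sigma = low_tail.mean(), low_tail.std(ddof=1)
        missing = np.isnan(column)
        column[missing] = rng.normal(mu - shift * sigma, sigma, size=missing.sum())
    return imputed
```

Each sample needs enough observed values for the low tail to contain at least two points; otherwise σ is undefined and a different fallback would be required.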
By sampling intensities below the LOD (defined here as the lowest 10% of recorded values per MS injection), we assume that all undetected peptides are below the LOD of the instrument. Following normalization and imputation, the log2FC was calculated as the ratio of the average intensities between Mtb-infected and uninfected individuals in log space. P values were calculated using a two-tailed Welch t-test and corrected for multiple testing using the Benjamini-Hochberg correction. The coefficient of variation was calculated on the non-log-transformed data and defined as σ/µ. For estimation of concentration for proteins detected in the spectral library, the con-
Comprehensive plasma proteome spectral library generation
Mycobacterium tuberculosis (Mtb) proteins have been challenging to detect in plasma due to their intrinsic low abundance, estimated to be in the picomolar range, 26 Figure 4). When analyzing the downregulated proteins, we observed several immunoglobulins having lower abundance in our TB cohort compared to the healthy controls. Interestingly, this has also been observed in another proteomics study. 7 Overall, our analysis recapitulates previous findings and showcases the applicability of DIA and multi-protease digestion for robust analysis of clinical samples.
Clinical proteomics plays an important role in understanding the pathogenesis of human disease and identifying new biomarkers for diagnosis and treatment monitoring. As plasma is easy to obtain and commonly used in diagnostic testing, we developed a novel protocol that utilizes orthogonal proteases coupled with DIA-MS to improve dynamic range, protein coverage, and quantification. While mass spectrometry has not been routinely used in large-scale clinical trials and biomarker discovery cohorts, it has the potential to be a key technology for robust protein detection and quantification in a variety of clinical settings. 
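For reference, the differential-abundance statistics described in the methods (log2 fold change, two-tailed Welch t-test, Benjamini-Hochberg correction, and coefficient of variation) can be sketched as below. This is an illustrative sketch, not the authors' analysis code: function names are hypothetical, inputs are assumed to be proteins × samples matrices of normalized and imputed log2 intensities, the log2FC is read as the difference of group means in log space, and a sample standard deviation (ddof=1) is assumed for the CV.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def differential_abundance(log2_tb, log2_ctrl):
    """Per-protein log2FC, Welch p-value, and BH-adjusted p-value."""
    log2_tb = np.asarray(log2_tb, dtype=float)
    log2_ctrl = np.asarray(log2_ctrl, dtype=float)
    log2fc = log2_tb.mean(axis=1) - log2_ctrl.mean(axis=1)
    # equal_var=False selects the Welch (unequal variance) t-test.
    _, p = stats.ttest_ind(log2_tb, log2_ctrl, axis=1, equal_var=False)
    return log2fc, p, benjamini_hochberg(p)


def coefficient_of_variation(log2_values):
    """CV = sigma / mu, computed per protein on the non-log-transformed intensities."""
    linear = np.power(2.0, np.asarray(log2_values, dtype=float))
    return linear.std(axis=1, ddof=1) / linear.mean(axis=1)
```

With only three samples per group, as in this cohort, the Welch test has few degrees of freedom, so adjusted p-values should be interpreted together with the fold change.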
We have demonstrated its utility in TB disease, which triggers a large host response and creates a complex plasma sample that can challenge standard mass spectrometry approaches. From a biological perspective, our results recapitulate several previous transcriptomic and proteomic analyses of TB patient samples, such as the upregulation of inflammatory pathway components reported to be specific for TB disease. 35 The sensitivity of our meth- While we observed a slight decrease in protein identifications upon pooling proteases in DIA analysis, the proteins additionally identified by trypsin were not consistently found across samples and are thus unlikely to have clinical utility. Altogether, we showcase the applicability of library-based DIA-MS for plasma proteomics, with consistent recovery of hundreds of proteins and a high degree of quantitative accuracy. We anticipate that our spectral library can serve as a useful base for future biomarker studies utilizing the timsTOF Pro, or can be complemented with additional assays to increase proteome coverage. While our approach showed improvements over previous methods, a limitation is that current tools for DIA analysis, and more broadly DIA acquisition, have been developed specifically for tryptic digests. It is therefore conceivable to develop ad-hoc DIA window schemes that exploit differences between proteases (e.g. z, m/z, etc.) to more comprehensively sample the precursor space while reaching an optimal duty cycle. Further advances in software, such as FDR models trained on non-tryptic sets or novel decoy-generation methods, may also significantly increase the number of peptides that can be extracted from DIA data using alternative proteases. Looking forward, the application of alternative proteases could be beneficial for deep proteomic profiling of clinical specimens and for increasing the confidence in identified proteins in large clinical cohorts. 
We used digested plasma from different proteases and acquired them in DIA-MS using a library derived from a tip-based fractionated representative plasma sample. We showed increased sequence coverage, robustness, and reduced missing values for the combination of AspN, GluC, and trypsin compared to a standalone tryptic digested sample.
Mass spectrometry-based proteomics
High-priority target product profiles for new tuberculosis diagnostics: report of a consensus meeting
Guidelines for Using HIV Testing Technologies in Surveillance. Working Group on Global HIV/AIDS/STI Surveillance
Hepatitis C: Diagnosis and treatment
Proteomic and Metabolomic Characterization of COVID-19 Patient Sera
Robust Microflow LC-MS/MS for Proteome Analysis: 38 000 Runs and Counting
Comprehensive plasma proteomic profiling reveals biomarkers for active tuberculosis
Andromeda: A peptide search engine integrated into the MaxQuant environment
Multibatch TMT reveals false positives, batch effects and missing values
Aebersold, R. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: A new concept for consistent and accurate proteome analysis
Analysis of 1508 plasma samples by capillary-flow data-independent acquisition profiles proteomics of weight loss and maintenance
Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues
Ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics
Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry
The PRIDE database and related tools and resources in 2019: Improving support for quantification data
Tissue-based map of the human proteome
Skyline: An open source document editor for creating and analyzing targeted proteomics experiments
Interactive peptide spectral annotator: A versatile web-based tool for proteomic applications
Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool
Array programming with NumPy
SciPy 1.0: fundamental algorithms for scientific computing in Python
Online parallel accumulation-serial fragmentation (PASEF) with a novel trapped ion mobility mass spectrometer
An Augmented Multiple-Protease-Based Human Phosphopeptide Atlas
Multi-in-One: Multiple-Proteases, One-Hour-Shot Strategy for Fast and High-Coverage Phosphoproteomic Investigation
A high-confidence human plasma proteome reference set with estimated concentrations in PeptideAtlas
Quantification of circulating Mycobacterium tuberculosis antigen peptides allows rapid diagnosis of active disease and treatment monitoring
Cloning and characterization of secretory tyrosine phosphatases of Mycobacterium tuberculosis
Dual RNA-Seq of Mtb-Infected Macrophages In Vivo Reveals Ontologically Distinct Host-Pathogen Interactions
Identification of new diagnostic biomarkers for Mycobacterium tuberculosis and the potential application in the serodiagnosis of human tuberculosis
Rv0753c and Rv0009 antigens specific T cell responses in latent and active TB - a flow cytometry-based analysis
Evaluation of cytokine and chemokine response elicited by Rv2204c and Rv0753c to detect latent tuberculosis infection
BoxCar acquisition method enables single-shot proteomics at a depth of 10,000 proteins in 100 minutes
Pathogen recognition and innate immunity
Tuberculosis is associated with expansion of a motile, permissive and immunomodulatory CD16+ monocyte population via the IL-10/STAT3 axis
Mycobacterium tuberculosis infection and inflammation: What is beneficial for the host and for the bacterium? Frontiers in Microbiology
Extracellular membrane vesicles in the three domains of life and beyond
Extracellular vesicles in the context of Mycobacterium tuberculosis infection
Gram-positive bacterial extracellular vesicles and their impact on health and disease
We thank Nevan J. Krogan for use of the Thermo Fisher Scientific Proteomics Facility for Disease Target Discovery at the Gladstone Institutes. We thank FIND for providing plasma samples from its specimen bank.
Funding: NIH R01GM133981 to DLS, NIH R01AI152161 to AC and JE, and NIH K23HL153581 to DJ.