key: cord-0002629-ciwts11b authors: Burke, Thomas W.; Henao, Ricardo; Soderblom, Erik; Tsalik, Ephraim L.; Thompson, J. Will; McClain, Micah T.; Nichols, Marshall; Nicholson, Bradly P.; Veldman, Timothy; Lucas, Joseph E.; Moseley, M. Arthur; Turner, Ronald B.; Lambkin-Williams, Robert; Hero, Alfred O.; Woods, Christopher W.; Ginsburg, Geoffrey S. title: Nasopharyngeal Protein Biomarkers of Acute Respiratory Virus Infection date: 2017-02-21 journal: EBioMedicine DOI: 10.1016/j.ebiom.2017.02.015 sha: bc7219efb063d1c0aa002e536f9503f999fbcb42 doc_id: 2629 cord_uid: ciwts11b Infection of respiratory mucosa with viral pathogens triggers complex immunologic events in the affected host. We sought to characterize this response through proteomic analysis of nasopharyngeal lavage in human subjects experimentally challenged with influenza A/H3N2 or human rhinovirus, and to develop targeted assays measuring peptides involved in this host response, allowing classification of acute respiratory virus infection. Unbiased proteomic discovery analysis identified 3285 peptides corresponding to 438 unique proteins, and revealed that infection with H3N2 induces significant alterations in protein expression. These include proteins involved in acute inflammatory response, innate immune response, and the complement cascade. These data provide insights into the nature of the biological response to viral infection of the upper respiratory tract, and the proteins that are dysregulated by viral infection form the basis of a signature that accurately classifies the infected state. Verification of this signature using targeted mass spectrometry in independent cohorts of subjects challenged with influenza or rhinovirus demonstrates that it performs with high accuracy (0.8623 AUROC, 75% TPR, 97.46% TNR).
With further development as a clinical diagnostic, this signature may have utility in rapid screening for emerging infections, avoidance of inappropriate antibacterial therapy, and more rapid implementation of appropriate therapeutic and public health strategies. Acute respiratory viral (ARV) infections are among the most common reasons for patient visits in primary and acute care settings (Hong et al., 2004; Johnstone et al., 2008) . Many viruses cause such acute respiratory illness including human rhinovirus (HRV), respiratory syncytial virus (RSV) and influenza. These viruses can be associated with a range of clinical severity from asymptomatic to mild, self-limited illness to respiratory failure and death. Influenza alone causes 25 to 50 million infections annually in the USA, resulting in several hundred thousand hospitalizations and 20-40,000 deaths (Thompson et al., 2010) . Despite viral etiologies driving most cases of acute respiratory infection, definitive diagnostic tools for these syndromes are lacking. Even highly sensitive pathogen-specific tests such as PCR are dependent upon proper sampling technique and inclusion of virus-type-specific reagents and processing methods. Moreover, detection of a specific microbe in a clinical sample does not necessarily indicate the cause of the acute clinical syndrome. For example, it has been reported that HRV has been detected in up to 44% of asymptomatic individuals (Byington et al., 2015; Johnston et al., 1993) . Therefore, better tools that help providers define the etiology of a suspected infectious syndrome in a safe, rapid, accurate, and cost-effective manner are of paramount importance for both individual and public health as recently noted by the Presidential Advisory Council on Combating Antibiotic-Resistant Bacteria (House, 2014) , and others (O'Neill, 2015; Organization, 2015) . 
A complementary diagnostic strategy to pathogen detection could focus on utilizing the varied (but pathogen-class specific) host response to infection (Ramilo and Mejias, 2009; Zaas et al., 2014). This approach discriminates between infection and colonization. It is pathogen-agnostic and therefore circumvents another limitation of pathogen detection assays, which due to technical limitations are only capable of detecting a limited subset of microorganisms. Furthermore, categorizing infection based on host response provides additional insights into the mechanisms of infection and disease response, and may offer new targets, pathways, or strategies for therapeutic intervention. We recently identified gene expression patterns in peripheral whole blood capable of differentiating individuals with symptomatic infection due to influenza H3N2, HRV, or RSV from uninfected individuals with >90% accuracy (Zaas et al., 2009; McClain et al., 2016; Tsalik et al., 2016; Huang et al., 2011). Moreover, this ARV signature was validated in an independent population of patients with influenza A infection, demonstrating an ability to distinguish from bacterial respiratory infections (93% accuracy) and healthy controls (100% accuracy) (Zaas et al., 2009). Thus, host-derived biomarkers are capable of making these types of distinction. However, considering the technical challenges inherent in developing peripheral blood host gene expression classifiers as a diagnostic tool (including semi-invasive venipuncture, RNA instability, processing complexity, relatively high cost of RNA profiling, and time to result), we sought to extend this host response paradigm for ARV diagnosis to an alternative and potentially more suitable sample matrix and analyte class.
Upon contact with the respiratory epithelium, respiratory viruses incite activation of type I interferons (IFNs) and pro-inflammatory cytokines, orchestrate proliferation of inflammatory cells and the innate immune response, and regulate induction of adaptive immunity (Yoneyama and Fujita, 2010; Koyama et al., 2008; Bhoj et al., 2008). Based on the prominent role of the nasopharyngeal epithelium in mediating ARV infections, we hypothesized that nasopharyngeal lavage (NPL) would reflect the in situ host response and serve as a potential target for diagnostic development. Furthermore, the NPL protein fraction represents an accessible sample matrix, providing a highly tractable diagnostic analyte class. Multiple reaction monitoring (MRM) is a quantitative mass spectrometry (MS) platform that enables facile development of multiplexed, quantitative assays for measuring specific protein levels in biologic fluids, and it is routinely used for biomarker verification in clinical cohorts (Kiyonami et al., 2011; Gerszten et al., 2010; Boja and Rodriguez, 2011). In addition to being customizable for nearly any target protein, MRM assays provide a more specific quantitation of individual proteins and protein isoforms by targeting multiple unique peptides per protein target. Combined with internal stable-isotope labeled (SIL) peptide standards, these assays match or exceed the quantitative precision of ELISA assays, with low femtomole limits of quantitation and an analytical coefficient of variation <10% across clinically sized cohorts (Addona et al., 2009; Aebersold et al., 2013). Using human viral challenge cohorts for influenza A/H3N2 and HRV, we have discovered and independently verified multiple NPL protein biomarkers capable of distinguishing individuals infected with human influenza A or HRV from uninfected individuals. This work reinforces the important concept that host response to infection, particularly in the NPL proteome, serves as a potential basis for diagnostic testing.
It also sheds light on the complex interactions of host and pathogen in two of the most common infectious diseases in humans. All pathogen exposures were approved by the relevant Institutional Review Boards and conducted according to the Declaration of Helsinki. All volunteers provided informed consent. The objective of these experimental challenge studies was to generate clinico-molecular classifiers of ARV infection through the development and characterization of high-density sample and data sets across the course of respiratory virus exposure, infection, and resolution. A description of the methods used in each challenge study can be found in Supplementary materials, and the methods have been described previously (McClain et al., 2016; Woods et al., 2013; Zaas et al., 2009). Briefly, healthy volunteers underwent extensive pre-enrollment health screening and were excluded for positive baseline antibody titers to the strain of virus used in each challenge (Influenza A H3N2 A/Wisconsin/67/2005 or HRV serotype 39). Following 24-48 h in quarantine, we instilled viral inoculum into bilateral nares of subjects using standard methods. At predetermined intervals, biological samples and clinical and symptom data were collected. NPL sampling was performed daily for each participant. The H3N2 #2 cohort included an early (36 h post-inoculation) oseltamivir treatment arm, while HRV #2 included a blinded "sham" inoculation (saline only) control group. NPL analyses included baseline and time T samples from all individuals in each challenge study with complete and unambiguous symptomatology and microbiology data, and available NPL samples. Sample phenotype labels were blinded for MRM analysis, but samples were assayed in a manner that ensured that samples from an individual, and from within a challenge study, would be processed in the same batch and assayed in close temporal proximity, to minimize batch effects between distinct phenotypes.
Self-reported symptoms were recorded at predetermined intervals prior to inoculation and at least twice daily throughout the time-course of infection and resolution as reported previously (Zaas et al., 2009; Jackson et al., 1958) and described in Supplementary materials. This modified Jackson score requires subjects to rank 8 symptoms of upper respiratory infection (headache, sore throat, rhinorrhea, rhinitis, sneezing, coughing, myalgia, malaise) on a standardized scale of 0 (no symptoms) to 3 (high symptoms). Symptom scores were tabulated for each study participant to assign symptom status as symptomatic or asymptomatic (Supplementary Table S2A). For each symptomatic subject, time T was identified as the time of maximal symptoms. The average time T was then defined for that cohort, which served as the time chosen for asymptomatic subjects (Table 1).

Table 1. Description of experimental ARV challenge cohorts. Four experimental ARV challenge cohorts (two influenza A/H3N2 and two HRV) are described, including adjudicated phenotype summary data for each. Individuals with discordant symptom and shedding labels, i.e. symptomatic non-shedders or asymptomatic shedders, are shown. Sx = symptomatic; Asx = asymptomatic; mean Sx and Asx time T represents the average time of maximal self-reported symptoms among subjects included in NPL analysis.

Participants were tested for virus shedding based on quantitative culture assays as described previously (Zaas et al., 2009) and outlined in Supplementary Table S2B. For the purpose of the current analysis, we differentiated between "symptomatic" and "shedding". A symptomatic subject shedding virus was labeled as "infected". Asymptomatic non-shedders were "uninfected". Discordance between symptom and shedding status was reconciled by measuring a previously published peripheral blood gene expression score, as described in Supplementary materials.
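The adjudication rules and the choice of time T described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function names, the boolean `gea_positive` input (standing in for the outcome of the blood gene expression tiebreaker, whose scoring is given only in Supplementary materials), and the use of hours as the time unit are all assumptions.

```python
from statistics import mean


def adjudicate(symptomatic, shedding, gea_positive=None):
    """Assign 'infected' / 'uninfected' from symptom and shedding status.

    Concordant subjects are labeled directly. Discordant subjects fall back
    on a tiebreaker; `gea_positive` is a hypothetical boolean summarizing it.
    """
    if symptomatic and shedding:
        return "infected"
    if not symptomatic and not shedding:
        return "uninfected"
    if gea_positive is None:
        raise ValueError("discordant subject needs a tiebreaker result")
    return "infected" if gea_positive else "uninfected"


def assign_time_t(peak_symptom_hours):
    """Map each subject to a time T.

    `peak_symptom_hours` maps subject id -> hour of maximal symptoms, or
    None for asymptomatic subjects, who receive the cohort average.
    """
    observed = [h for h in peak_symptom_hours.values() if h is not None]
    cohort_mean = mean(observed)
    return {
        subject: (hour if hour is not None else cohort_mean)
        for subject, hour in peak_symptom_hours.items()
    }
```

For example, `assign_time_t({"s1": 48, "s2": 72, "s3": None})` gives the asymptomatic subject `s3` the cohort average of the two symptomatic peak times.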
This tiebreaker was a gene expression analysis (GEA) representing the host peripheral blood response to viral infection (Supplementary materials). Sample collections for subjects in H3N2 cohort #1 and HRV cohort #1 have been described previously (Zaas et al., 2009). Samples for H3N2 cohort #2 and HRV cohort #2 were collected, processed, and analyzed similarly. Briefly, biological samples were collected prior to inoculation (baseline) and at predetermined intervals throughout the course of infection and resolution. Nasopharyngeal lavage procedures were performed using sterile 0.9% saline solution (5 ml into each naris), as described in Supplementary Materials, prior to inoculation and at 24-hour intervals post-inoculation, and samples were stored in aliquots at −80°C. This study focused on comparison of pre-inoculation (baseline) samples with samples taken at or shortly following time T. NPL samples were processed for proteomic analyses as described in Supplementary materials. For the pooled 2D-LC-MS/MS discovery analysis, four sample pools were created representing the four groups in the H3N2 #1 challenge: Uninf-BL (n = 4), Uninf-T (n = 6), Inf-BL (n = 6), and Inf-T (n = 8), with equal protein mass (2 μg) from each participant. Normalized pooled samples (2D-LC-MS/MS) and individual participant samples (MRM) were reduced, alkylated, and digested with trypsin as described in Supplementary materials. Prior to analysis, all samples were spiked with ADH1_YEAST digest (Massprep standard, Waters Corporation) as an internal technical standard. Unbiased proteomic discovery analysis was performed using nano-scale capillary UPLC-MS/MS as described in Supplementary Materials. Briefly, quantitative 10-fraction 2D-LC-MS/MS was performed on duplicate injections for each sample pool, providing accurate mass and intensity (abundance) acquisitions with qualitative identification of the resulting peptide fragments via searching against a SwissProt_Human (www. 
uniprot.org) database that also contained a reversed-sequence "decoy" database for false positive rate determination. Analytical reproducibility of the label-free 2D-LC-MS/MS method was assessed by calculating the variation in measured abundance of the spiked ADH1_YEAST standard, demonstrating a coefficient of variation of 10.6% across all eight injections. For quantitative processing, peptide quantities across all ten 2D-LC fractions were summed and the dataset was intensity-scaled to the robust mean (excluding the highest and lowest 10% of detected features) across all quantitative acquisitions. The final quantitative dataset for NPL was based on 3285 peptides and contained 438 unique proteins. Following the selection of 25 candidate protein targets from the unbiased discovery data, all individual samples were subjected to a targeted MRM assay as detailed in Supplementary materials. MRM assay development and transition selection were performed within the open-access Skyline software (MacCoss Laboratory, University of Washington). Initially, up to five unique peptides were selected from each candidate protein based on average precursor ion intensity. Five transitions for each precursor ion were selected based on 1) qualitative DDA discovery MS/MS data, 2) other discovery datasets for which the same peptide sequence was identified, or 3) the PeptideAtlas (www.PeptideAtlas.org) public repository. Following deployment of the initial MRM assay on a healthy human control NPL pool, the MRM method was optimized to choose three transitions from the two most robust peptides per protein. Custom SIL peptides were ordered for each candidate peptide to be assayed, and were spiked into each individual digested NPL participant sample at one of four ratios relative to endogenous peptide as described in Supplementary Materials.
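The robust-mean intensity scaling and the coefficient-of-variation check on the spiked standard, both described above, can be illustrated with a short Python sketch. The text does not spell out the exact scaling procedure, so this assumes one plausible reading: each acquisition is rescaled so that its 10%-trimmed mean matches the average trimmed mean across acquisitions. All function names are hypothetical.

```python
from statistics import mean, stdev


def trimmed_mean(values, trim=0.10):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def scale_to_robust_mean(runs, trim=0.10):
    """Rescale each acquisition (a list of feature intensities) so that its
    trimmed mean equals the average trimmed mean over all acquisitions.
    This is an assumed interpretation of 'intensity-scaled to the robust mean'.
    """
    robust = [trimmed_mean(run, trim) for run in runs]
    target = mean(robust)
    return [[x * target / m for x in run] for run, m in zip(runs, robust)]


def percent_cv(values):
    """Coefficient of variation (sample SD / mean), as a percentage."""
    return 100.0 * stdev(values) / mean(values)
```

Trimming before averaging keeps a handful of very abundant or near-absent features from dominating the scaling factor, which is the usual motivation for a robust mean. `percent_cv` applied to the standard's abundance across injections corresponds to the kind of reproducibility figure reported above.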
Each of the four patient cohorts (two H3N2 and two HRV) was run as an individual run block, and samples within a cohort were randomized in injection order across the cohort. Single MRM assays were performed on spiked NPL samples with a target quantity of up to 1 μg on-column. Four samples (3 from H3N2 #1 and 1 from H3N2 #2) were prepared and MRM quantification was attempted, but they did not have sufficient protein material to generate robust quantitative data and were therefore excluded from subsequent analyses. To assess analytical variation, an equal portion of each patient's SIL-spiked NPL sample was used to generate a QC pool, which was run approximately every 12 h across the entire cohort. In addition, all samples were spiked with five SIL peptides from yeast ADH as an internal technical control. MRM assay reproducibility metrics are shown in Supplementary Fig. S1. We examined 51 human peptide analytes (plus 5 yeast ADH) from 26 different human proteins. Eighty subjects from the four viral challenge studies were assayed, although four subjects had insufficient NPL material at one time point, resulting in 156 samples. Four peptides with >2 missing values were excluded from analysis. Missing values and zeroes (11 sample-analytes) were imputed with half the observed minimum value of a given peptide, and expression levels from the remaining 47 peptides were log transformed and carried forward for further analysis. Simple batch (study) correction was performed by removing study-wise mean values from each peptide. Final analysis included 47 human peptides measured in 156 samples. Univariate testing was performed using two-sided t-tests with Benjamini-Hochberg FDR corrected p-values. For classification we used sparse logistic regression, in particular a Least Absolute Shrinkage and Selection Operator (LASSO) generalized linear model with binomial likelihood (Friedman et al., 2010).
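The preprocessing steps described above (half-minimum imputation, log transformation, study-wise mean removal) and the Benjamini-Hochberg adjustment can be sketched as follows. This is a minimal standard-library illustration, not the authors' MATLAB code; the log base is not stated in the text, so base 2 is an assumption, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def preprocess(values, studies):
    """Preprocess one peptide's intensities across samples.

    `values` holds one intensity per sample (None or 0 marks a missing
    value); `studies` holds the matching study label for each sample.
    """
    observed = [v for v in values if v]          # drops None and zeroes
    floor = min(observed) / 2.0                  # half the observed minimum
    logged = [math.log2(v if v else floor) for v in values]  # base 2 assumed
    by_study = defaultdict(list)
    for x, s in zip(logged, studies):
        by_study[s].append(x)
    study_mean = {s: sum(xs) / len(xs) for s, xs in by_study.items()}
    # Simple batch correction: subtract the study-wise mean.
    return [x - study_mean[s] for x, s in zip(logged, studies)]


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

After `preprocess`, each peptide has zero mean within every study, so remaining differences between infected and uninfected samples are not driven by cohort-level offsets.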
Performance metrics and model parameters were obtained via nested leave-one-out cross-validation (LOOCV). As classification performance metrics we consider the area under the receiver operating characteristic curve (AUROC) (Fawcett, 2006), true positive rate (TPR), and true negative rate (TNR). A 17 protein (30 peptide) relaxed classifier (α = 0.1 rather than 1.0) was subjected to pathway association analysis using the DAVID 6.7 Functional Annotation Tool (Huang da et al., 2009) with UniProt accession identifiers and a human background. The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request. Computational scripts were written in MATLAB using the glmnet toolbox (web.stanford.edu/~hastie/glmnet_matlab/) and can be accessed at https://bitbucket.org/rhenao/npl_ebm. Four independent viral challenge studies were conducted (Table 1), two with influenza A/H3N2 and two with HRV, and were described previously (Woods et al., 2013; Zaas et al., 2009). Clinical and self-reported symptom data, along with corresponding samples, were collected multiple times per day across the course of exposure, infection, and resolution. For each challenge participant, standardized algorithms were applied to assign phenotype labels describing symptomatic, shedding, and "infected" status (Supplementary Tables S2A, S2B, S2C, respectively) across the time-course. Since symptom and shedding status were not always congruent, we required that both be present to define a participant as "infected". When both were absent, that subject was labeled "uninfected". When discordant, we applied a tiebreaker based on gene expression analysis (GEA) representing the host peripheral blood response to viral infection (Supplementary Table S2C) (unpublished data; N. Arzouni, T. Burke, M. McClain, A. Hero).
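The performance metrics named in the methods above (AUROC, TPR, TNR) and the outer loop of a leave-one-out evaluation can be sketched in plain Python. This is a generic illustration under stated assumptions, not the study's pipeline: `fit` and `predict` are hypothetical callables supplied by the caller (in a nested design, `fit` would itself tune the regularization strength on the training fold), and the 0.5 decision threshold is an assumption.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative
    (ties count one half); labels are 1 for infected, 0 for uninfected."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_tnr(scores, labels, threshold=0.5):
    """True positive rate and true negative rate at a score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def loocv_scores(features, labels, fit, predict):
    """Score each sample with a model trained on all other samples."""
    scores = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)   # any inner tuning happens inside fit
        scores.append(predict(model, features[i]))
    return scores
```

Because every held-out score comes from a model that never saw that sample, metrics computed on the pooled `loocv_scores` output estimate out-of-sample performance rather than training fit.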
Of the 80 challenge participants included in this analysis, just over half (42/80) of the participants were adjudicated as becoming infected (H3N2 #1: 9 of 15 infected; H3N2 #2: 10 of 21; HRV #1: 10 of 20; and HRV #2: 13 of 24). An overview of the proteomic discovery and candidate biomarker verification strategy is depicted in Fig. 1 . Despite being a readily accessible matrix, one technical challenge of studying NPL is the variability in protein yield from subject to subject. To overcome this limitation, we utilized a two-phase biomarker discovery strategy: The first phase used pooled NPL samples from a single H3N2 challenge for in-depth MS-based discovery proteomics to assist in identification and prioritization of candidate biomarkers. In the second phase, targeted assays were developed for these candidates using the more sensitive and quantitatively reproducible MRM approach. Verification of the utility of the markers was then performed in the original and three independent challenge cohorts using targeted MRM quantitation. Discovery proteomics was performed on the H3N2 #1 challenge cohort using open-platform, 2-dimensional liquid chromatography, tandem MS (2D-LC-MS/MS) analysis of four sets of pooled NPL samples (uninfected and infected individuals at baseline and time of maximal symptoms, time T) using equal protein mass per participant sample. Across the four unique NPL sample pools, a total of 3285 peptides corresponding to 438 unique proteins were identified at a 1.0% peptide-level false discovery rate (FDR). We next investigated the variable expression of NPL proteins in infected and uninfected viral challenge pooled samples. Three criteria were used to prioritize suitable peptides for subsequent MRM quantification from the entire collection of 438 identified NPL proteins (Supplementary Table S3 ). 
First, we sought a minimum two-fold change in expression between baseline and time T in the infected pool in at least 2 peptides per protein; this criterion was used to identify proteins that would increase or decrease as a function of infection within the same individuals over time. Levels of 107 proteins increased at least two-fold, while 61 proteins decreased at least two-fold from baseline to time T. The second criterion was a greater than two-fold difference between infected and time-matched uninfected subjects; this criterion was intended to ensure specificity of the proteins by comparing test and control subjects at a time when symptoms are present. This included 36 proteins with higher expression and 33 proteins with lower expression. The third criterion excluded proteins that might reflect general nasal trauma stemming from repeated collections. To address this, we calculated the expression change between the time of maximal symptoms for infected subjects (Inf-T) relative to similar times for uninfected subjects (Uninf-T), and prioritized candidates with a unique response to infection at time T. Additionally, four proteins met 2 of 3 criteria and also had a reported association with infection (IC1) (Zaas et al., 2009) or inflammation (APOA1, APOA2, and APOA4) (Pirillo et al., 2015) and were included in the verification phase. Based on these criteria, 25 proteins were selected for subsequent MRM assay development: 13 had increased and 12 had decreased expression in infected participants. MRM is a quantitative LC-MS/MS method utilizing synthetic, SIL peptides as internal standards, and provides absolute specificity for the target analyte and relative abundance measurements (Kiyonami et al., 2011; Addona et al., 2009). A total of 51 unique peptide MRM assays were designed to target two unique peptides for each of the 25 prioritized candidate biomarker proteins, plus human serum albumin and yeast alcohol dehydrogenase as a spiked exogenous control.

Fig. 1. Study design and experimental workflow. A two-phased strategy was employed to identify and characterize candidate protein biomarkers of ARV infection from NPL samples collected from participants in four experimental ARV challenge cohorts. For phase 1 discovery analysis, four NPL pools were prepared from the H3N2 #1 cohort and analyzed using unbiased 2D-LC-MS/MS. The numbers of subjects (N) with samples included in each pool are shown (Uninf = uninfected individuals; Inf = infected individuals; BL = baseline; T = time of maximal symptoms). For phase 2, the original and three additional independent challenge cohorts were assayed by targeted MRM. Quantitative peptide expression data from 80 individuals and 156 total samples were used in the derivation of an NPL ARV classifier, and classification performance was assessed in independent challenge cohorts using LOOCV.

Two proteins yielded only a single peptide that was suitable for an MRM assay: Statherin (only one viable SIL peptide could be synthesized) and Filaggrin (one of two SIL peptides was insoluble). A third protein, Calcyphosin, yielded a third suitable peptide assay from the publicly available PeptideAtlas database. Following development, MRM assays were performed on 156 individual NPL samples from 80 subjects across four viral challenge study cohorts. This included the original influenza H3N2 cohort (H3N2 #1), a second H3N2 cohort (H3N2 #2), and two HRV challenge cohorts (HRV #1 and #2). From the original set of 51 human peptide assays designed, 47 peptides representing 26 proteins were successfully measured. For the H3N2 #1 cohort, we observed that both the direction and magnitude of change in protein expression were consistent between the initial pooled discovery measurements (Fig. 2A) and the targeted MRM peptide measurements (Fig. 2B). This was expected, since the pooled sample is the biological average of all samples included in the pool.
For example, retinoic acid receptor responder protein TIG1 was measured across three unique peptides in the pooled discovery data set, with an average fold change from uninfected to infected time T of 3.3-fold. Targeted MRM assays developed for two TIG1 peptides demonstrated average increased expression of 2.3- and 2.0-fold when assayed in individual NPL samples (Table 2). Similarly, two complement factor B (CFAB) peptides each displayed 2.0-fold increased expression when assayed in individual H3N2 #1 samples by MRM, compared to a 3.6-fold increase for the summed 24-peptide CFAB protein composite assayed in the pooled H3N2 #1 discovery dataset. Also as expected, the pooled discovery dataset shows a much narrower standard deviation (Fig. 2A), as this deviation is due only to technical variability, whereas the targeted MRM experiment on all individuals (Fig. 2B) captures the actual biological variation. These combined results provided confidence that the pooled-sample 2D-LC-MS/MS phase 1 discovery strategy gives a good measure of the average protein expression level in situations where the matrix of interest is extremely sample-limited, and that the phase 2 targeted MRM assays reproduce the protein expression changes of candidates selected using the pooled data, while also capturing biological variance with higher sensitivity. We then sought to independently verify candidate peptide expression changes in individual participant samples from a second H3N2 challenge cohort, as well as two additional challenge studies using a second common ARV, human rhinovirus. Individual participant NPL samples from the 3 additional cohorts were processed and measured using the same panel of experimental and control peptide MRM assays derived from the H3N2 #1 cohort.
In a combined analysis across the four independent ARV cohorts, we found that median sample-wise peptide expression intensity was relatively constant across all samples regardless of infection status, though 26 of 47 peptide analytes were measured to be differentially expressed (Benjamini-Hochberg FDR < 0.05) at the time of maximal symptoms in the infected H3N2 subjects. Likewise, 30 peptide analytes were differentially expressed (Benjamini-Hochberg FDR < 0.05) in the HRV subjects at time T, with 16 peptides overlapping between the two H3N2 and two HRV cohorts. The direction and magnitude of change for all 40 peptide analytes (10 H3N2 only, 14 HRV only, 16 in both) that were significantly differentially expressed in either the H3N2 or HRV cohorts are highly correlated (r = 0.871) between viral groups (Fig. 3), indicating that these analytes are likely not virus type-specific or useful in differentiating between H3N2 and HRV. This finding provided a basis for investigating the utility of these protein assays in a multi-analyte classification model of ARV infection. To build an NPL protein classifier that distinguishes infected from uninfected subjects, we utilized LASSO sparse logistic regression (Friedman et al., 2010) to build a list of potential logistic regression models. The classifier was trained to discriminate samples from infected individuals at time T (Inf-T) from paired baseline measures (Inf-BL) and from uninfected individuals at time T (Uninf-T). Fig. 4A shows the performance of an NPL classification model that selected 10 peptides (9 unique proteins) in the four independent challenge cohorts, including discordant individuals adjudicated using GEA status as a tiebreaker. For this analysis we ignored the paired nature of samples in the training and classification tasks in order to better estimate the accuracies in a situation in which baseline samples are not available. Overall classifier performance (Fig. 
4B) in LOOCV yielded an AUROC of 0.8623 (95% CI: 0.7538-0.9315, bootstrapped 10 K samples) with a 75% TPR and 97.46% TNR. Baseline samples from the asymptomatic cohort were withheld from model training since this was not a phenotype the model was trying to identify. However, they represent a cohort of asymptomatic individuals available for validation. Applying the model to these 37 samples revealed only one misclassification error (2.7%).

Table 2. Candidate biomarker protein relative expression ratios. Rows represent the 26 host NPL proteins for which MRM assays were developed. UniProt gene symbol and protein names are shown for each candidate protein, with number of peptides measured for both unbiased pool and targeted MRM analyses. Unbiased pool ratios are shown for selection criteria 1, 2, and 3, as described in Supplementary Table S3. Fold expression changes from BL to T for infected individuals, as measured by peptide-specific MRM assay, across all four ARV challenge studies are shown with two-sided t-test Benjamini-Hochberg FDR-adjusted p-values (in parentheses) for each peptide, respectively.

LOOCV confusion matrix and identities of the 10 peptide classifier with weights are shown in Table 3A and B, respectively. We also performed analyses separately on infection status of individuals independently of the GEA status (i.e., including only symptomatic shedders and asymptomatic non-shedders) with, not surprisingly, increased classification performance, 0.8821 LOOCV-AUC (vs. 0.8623), and on all individuals using shedding as outcome (ignoring symptoms) with 0.8610 LOOCV-AUC. The experimental design of the HRV #2 challenge included a "sham" inoculation control group (Fig. 4A, panel HRV #2 in green), with several participants receiving intranasal inoculation of saline without HRV.
Four of the seven sham controls reported symptoms sufficient to be classified as symptomatic, despite receiving no HRV inoculum and showing no microbiological evidence of infection. All four subjects were classified as negative by NPL MRM analysis, consistent with absence of HRV shedding and no observed GEA host response in the blood mRNA ARV infection pathway (Huang et al., 2011; Woods et al., 2013; McClain et al., 2016) for these subjects. Since the classifier tested in Fig. 4 includes only 9 unique proteins, we repeated the analysis using a more relaxed variable selection parameter to better describe the biological pathways involved (α = 0.1 rather than 1.0; in glmnet, α is the elastic-net mixing parameter, with α = 1.0 corresponding to the LASSO penalty). The new classifier has nearly identical performance (0.8765 AUROC, 75% TPR, and 97.46% TNR) but selected 30 of 47 peptides (17 unique proteins). Pathway association analysis was performed using the DAVID 6.7 Functional Annotation Tool (Huang da et al., 2009) on the 17 proteins selected by this expanded classifier as being informative in differentiating infected from uninfected individuals (results in Supplementary Table S5). Significant annotation clusters were primarily driven by 5 proteins (A1AG1, CFAB, A1AT, IC1, and APOA4), and were associated with Gene Ontology (GO) biological process terms acute inflammatory response (GO:0002526; p = 1.6e-4), innate immune response (GO:0045087; p = 9.9e-3), inflammatory response (GO:0006954; p = 5.0e-3), and defense response (GO:0006952; p = 3.9e-3). Similar results were seen when expanding the query to include all 26 protein candidates used in the phase 2 MRM analyses. We conducted unbiased and targeted protein analyses on human NPL samples collected from four independent ARV challenge cohorts (two influenza A H3N2, two HRV) to define host protein expression patterns characteristic of response to viral respiratory infection.
The results demonstrate that robust changes in secreted proteins occur in the NPL of infected individuals, and that a subset of proteins is capable of accurately classifying the infected state of the individual. Despite changes to the assay approach and methods in transitioning from pooled phase 1 discovery with unbiased 2D-LC-MS/MS proteomics to individual measurements in phase 2 discovery with MRM, we observed good reproduction of both the direction and magnitude of peptide expression measurements between the methods. This provides confidence in the specificity and quantitative performance of both methods. Importantly, this approach enabled accurate selection of candidate biomarkers in a pooled phase 1 discovery, where the amount of material available from individual samples was severely limited and precluded analysis of every sample by unbiased 2D-LC-MS/MS. Though some biomarker candidate attrition was expected because biological variance cannot be estimated from pooled data, appropriate strategic filtering of the candidates gave a high success rate for statistical validation between phase 1 (pooled discovery) and phase 2 (individual MRM measurements). Investigations by our group and others have shown differential expression of host genes at the RNA level in peripheral blood in response to ARV infections (Ramilo et al., 2007; Zaas et al., 2009; Woods et al., 2013; Zaas et al., 2013), with heavy representation of genes in the IFN-signaling canonical pathway and innate immune response signaling. Analysis of differentially expressed NPL proteins in infected individuals demonstrates involvement in several biological pathways critical to mounting a host defense to virus infection, including innate immune responses, acute inflammatory responses, and defense response pathways.
The inclusion of three members of the complement system (CFAB, A1AT, and IC1) is particularly consistent with an innate immune response, as the complement system enhances the ability of antibodies and phagocytic cells to clear pathogens from the infected site (Ricklin et al., 2010). Despite the apparently related pathways involved, there does not appear to be extensive overlap between the differentially expressed nasal proteins identified in our H3N2 #1 challenge study, and the previous peripheral blood RNA signatures of ARV infection characterized in the same challenge cohort (Zaas et al., 2009). One gene product that is increased at both the NPL peptide and peripheral blood RNA level upon influenza infection (Cameron et al., 2008; Cameron et al., 2007; Zaas et al., 2009) is IC1 (SERPING1 gene). IC1 (also called C1-inhibitor) is a peptidase inhibitor belonging to the serpin superfamily and has an important role in innate immunity through modulation of the classical pathway of complement activation (Gaboriaud et al., 2004). As the complement system has the potential to be damaging to host tissues, complement control proteins must tightly regulate activation. IC1 binds to complement protein C1 to inhibit activation of the classical complement pathway, and thus its discovery fits well with our understanding of the biology of these diseases. Notably absent from this NPL protein analysis are proinflammatory cytokine and chemokine gene products which have previously been shown to be strong contributors to the host response both in peripheral blood and near the site of ARV infection (Kimura et al., 2013). Oshansky and colleagues assayed nasal lavage samples from a cohort of healthy and naturally-infected influenza patients using a multiplex cytokine and chemokine assay panel, reporting correlation of inflammatory cytokines MCP-3 and IFN-α2 with disease progression (Oshansky et al., 2014).
An aptamer-based detection method was subsequently used to screen the same cohort and generate quantitative measures of over 1000 protein analytes from nasal lavage, showing differential expression of 162 proteins including cytokines associated with immune response to infection (Marion et al., 2016). We did not identify inflammatory cytokines to be differentially expressed in our pooled NPL analysis. It is possible that cytokine proteins in NPL samples are expressed at levels below the detection limits of LC/MS-based methods, and that adding targeted methods capable of detecting and quantifying cytokines directly may provide enhanced datasets for biomarker discovery.

Table 3 10-peptide classification model performance. (A) Classifier performance on individual samples from all four ARV cohorts as represented by LOOCV confusion matrix, with (B) identity and contribution (weight) for 10 peptides, with amino acid sequence in one-letter notation. Average peptide length is 11.8 amino acids, with range between 8 and 30 residues. Negative weight values indicate down-regulation upon infection.

The inclusion of "sham" infected participants in the HRV #2 cohort provided an important opportunity to assess the accuracy of standard clinical symptom assessments in the diagnosis of ARV infection. The clinical definition of symptomatic, even when applying standardized symptom scoring and algorithms, includes elements of subjectivity and thus may be imperfect in describing such infections. The finding that, despite the apparent presence of symptoms of infection, MRM analysis classified all seven individuals as uninfected suggests that this NPL biomarker assay has a lower false positive rate in diagnosing HRV infection than standard symptom status measures.
Experimental challenge studies provide an excellent model for studying ARV infection and illness, with pre-screened volunteers, known time of exposure, standardized pathogen exposure, and extensive sampling and data collection through the course of illness. However, experimental challenges do not perfectly replicate natural ARV infection and illness in humans. Volunteers in these studies are young and healthy, and represent a relatively homogeneous population. In contrast, patients presenting to clinical care are demographically heterogeneous, have a variety of comorbidities, present at various times in the course of their illness, and may be infected with a far greater breadth of pathogens than the two studied here. As such, additional validation should be performed in a more diverse population, such as patients presenting to clinical care with community-onset disease. While the generalizability of this study's findings will require additional validation, the high TNR of 97.46% suggests this assay may provide value in ruling out ARV infection. Categorizing infection based upon host response represents an emerging strategy with great potential for complementing current pathogen-based diagnostics, as well as providing additional insights into the pathobiology of infection. The results presented in this study provide evidence that a protein-based host response to ARV infection can be detected in the nasopharyngeal space, and that this response involves perturbation of pathways involved in acute inflammation and innate immune response. Further, this work demonstrates that targeted assays measuring peptides involved in this response allow classification of ARV infection with a high degree of accuracy. Validation of these findings across independent experimentally infected influenza A and human rhinovirus cohorts suggests a robust and generalized response to viral infection.
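How the reported TPR and TNR would translate into predictive values in a clinical population depends on how common infection is among those tested. The sketch below applies Bayes' rule to the reported 75% TPR and 97.46% TNR at an assumed prevalence. The 30% prevalence and the function name are hypothetical, chosen only for illustration, and do not come from the study.

```python
def predictive_values(tpr, tnr, prevalence):
    """Return (PPV, NPV) for a test with the given TPR and TNR at a given prevalence."""
    tp = tpr * prevalence                # infected, test positive
    fn = (1 - tpr) * prevalence          # infected, test negative
    tn = tnr * (1 - prevalence)          # uninfected, test negative
    fp = (1 - tnr) * (1 - prevalence)    # uninfected, test positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Reported LOOCV operating point, with a hypothetical 30% prevalence of ARV infection.
ppv, npv = predictive_values(tpr=0.75, tnr=0.9746, prevalence=0.30)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Repeating the calculation over a range of prevalence values shows how the value of a positive or a negative result would shift between, for example, peak respiratory virus season and the off season.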
With further development as a clinical diagnostic, this signature may have utility in rapid screening for emerging infections, avoidance of inappropriate antimicrobial therapy, and more rapid implementation of appropriate therapeutic and public health strategies. Nonetheless, as with other validated biomarkers, additional validation in community-based cohorts will be important to demonstrate the potential utility of such an assay in its clinical applications. Furthermore, testing this approach across a larger series of upper respiratory viruses will be important to understand its full potential utility and limitations. An assay that combines host protein biomarkers with nasal viral antigen detection may be quite valuable in clinical care to optimize therapeutic decision making. While a positive result from such an assay may avoid the use of inappropriate antimicrobial therapy, it will likely require vigilance on the part of the clinician to exclude bacterial co-infection when clinically indicated. Our previous demonstration of the potential for host response-based pre-symptomatic detection of H3N2 infection using blood RNA expression raises the intriguing possibility that an NPL protein host response assay might be useful in early detection of ARV infection, and this should be evaluated. The availability of a proteomic 'signature' that accurately classifies ARV infection and might be migrated to simple and inexpensive antibody-based tests that are routinely used in both clinical laboratory and over-the-counter diagnostic applications represents an important advance, and may one day yield an ARV host response test that is safe, simple, rapid, inexpensive, and accurate.

Funding Support for this work was provided by the U.S. Defense Advanced Research Projects Agency (DARPA) through contract N66001-07-C-2024. E.L.T. and M.T.M.
were supported by award numbers 1IK2CX000530 and 1IK2CX000611, respectively, from the Clinical Science Research and Development Service of the VA Office of Research and Development. The funders had no role in the preparation of this manuscript.

Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma
Western blots versus selected reaction monitoring assays: time to turn the tables?
MAVS and MyD88 are essential for innate immunity but not cytotoxic T lymphocyte response against respiratory syncytial virus
The path to clinical proteomics research: integration of proteomics, genomics, clinical laboratory and regulatory science
Community surveillance of respiratory viruses among families in the Utah better identification of germs-longitudinal viral epidemiology (BIG-LoVE) study
Gene expression analysis of host innate immune responses during Lethal H5N1 infection in ferrets
Interferon-mediated immunopathological events are associated with atypical innate and adaptive immune responses in patients with severe acute respiratory syndrome
An introduction to ROC analysis
Regularization paths for generalized linear models via coordinate descent
Structure and activation of the C1 complex of complement: unraveling the puzzle
Integration of proteomic-based tools for improved biomarkers of myocardial injury
Acute respiratory symptoms in adults in general practice
National Strategy for Combating Antibiotic-Resistant Bacteria
Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists
Temporal dynamics of host molecular responses differentiate symptomatic and asymptomatic influenza A infection
Transmission of the common cold to volunteers under controlled conditions. I. The common cold as a clinical entity
Use of polymerase chain reaction for diagnosis of picornavirus infection in subjects with and without respiratory symptoms
Viral infection in adults hospitalized with community-acquired pneumonia: prevalence, pathogens, and presentation
Cytokine production and signaling pathways in respiratory virus infection
Increased selectivity, analytical precision, and throughput in targeted proteomics
Innate immune response to viral infection
An individualized predictor of health and disease using paired reference and target samples
Respiratory mucosal proteome quantification in human influenza infections
A genomic signature of influenza infection shows potential for presymptomatic detection, guiding early therapy, and monitoring clinical responses
Rapid Diagnostics: Stopping Unnecessary Use of Antibiotics
Global Action Plan on Antimicrobial Resistance. World Health Organization
Mucosal immune responses predict clinical outcomes during influenza infection independently of age and viral load
HDL in infectious diseases and sepsis
Gene expression patterns in blood leukocytes discriminate patients with acute infections
Shifting the paradigm: host gene signatures for diagnosis of infectious diseases
Complement: a key system for immune surveillance and homeostasis
Estimates of deaths associated with seasonal influenza-United States
Host gene expression classifiers diagnose acute respiratory illness etiology
A host transcriptional signature for presymptomatic detection of infection in humans exposed to influenza H1N1 or H3N2
Recognition of viral nucleic acids in innate immunity
A host-based RT-PCR gene expression signature to identify acute respiratory viral infection
Gene expression signatures diagnose influenza and other symptomatic respiratory viral infections in humans
The current epidemiology and clinical decisions surrounding acute respiratory infections

We thank the staff of the Duke Clinical Research Unit, University of Virginia School of Medicine Research Unit, and hVIVO for their assistance in conducting the experimental challenge studies. The authors declare that they have no competing interests. A provisional patent application has been filed pertaining to the results presented in this article.