key: cord-0750142-xca1bkcv authors: Filbin, Michael R.; Mehta, Arnav; Schneider, Alexis M.; Kays, Kyle R.; Guess, Jamey R.; Gentili, Matteo; Fenyves, Bánk G.; Charland, Nicole C.; Gonye, Anna L.K.; Gushterova, Irena; Khanna, Hargun K.; LaSalle, Thomas J.; Lavin-Parsons, Kendall M.; Lilley, Brendan M.; Lodenstein, Carl L.; Manakongtreecheep, Kasidet; Margolin, Justin D.; McKaig, Brenna N.; Rojas-Lopez, Maricarmen; Russo, Brian C.; Sharma, Nihaarika; Tantivit, Jessica; Thomas, Molly F.; Gerszten, Robert E.; Heimberg, Graham S.; Hoover, Paul J.; Lieb, David J.; Lin, Brian; Ngo, Debby; Pelka, Karin; Reyes, Miguel; Smillie, Christopher S.; Waghray, Avinash; Wood, Thomas E.; Zajac, Amanda S.; Jennings, Lori L.; Grundberg, Ida; Bhattacharyya, Roby P.; Parry, Blair Alden; Villani, Alexandra-Chloé; Sade-Feldman, Moshe; Hacohen, Nir; Goldberg, Marcia B. title: Longitudinal proteomic analysis of plasma from patients with severe COVID-19 reveal patient survival-associated signatures, tissue-specific cell death, and cell-cell interactions date: 2021-05-03 journal: Cell Rep Med DOI: 10.1016/j.xcrm.2021.100287 sha: 24600b5cefc2d7c3cd838fd3112cb8f40401658a doc_id: 750142 cord_uid: xca1bkcv Mechanisms underlying severe COVID-19 disease remain poorly understood. We analyze several thousand plasma proteins longitudinally in 306 COVID-19 patients and 78 symptomatic controls, uncovering immune and non-immune proteins linked to COVID-19. Deconvolution of our plasma proteome data using published scRNAseq datasets reveals contributions from circulating immune and tissue cells. Sixteen percent of patients display reduced inflammation yet comparably poor outcomes. Comparison of patients who died to severely ill survivors identifies dynamic immune cell-derived and tissue-associated proteins associated with survival, including exocrine pancreatic proteases. Using derived tissue-specific and cell type-specific intracellular death signatures, cellular ACE2 expression, and our data, we infer whether organ damage resulted from direct or indirect effects of infection. We propose a model in which interactions among myeloid, epithelial, and T cells drive tissue damage. These datasets provide important insights and a rich resource for analysis of mechanisms of severe COVID-19 disease. COVID-19 has caused >1 million deaths globally. Disease varies considerably, 1-4 ranging from an asymptomatic carrier state to severe illness, organ dysfunction and death. 5 Implicated in the pathophysiology of severe disease is immune dysfunction, involving both hyper-immune responses (activated inflammatory cascades, cytokine storm, tissue infiltrates, damage) and hypo-immune responses (relative lymphopenia, impaired T-cell function, impaired interferon antiviral responses, reduced viral clearance). [5] [6] [7] [8] To date, many studies addressing the immune response to SARS-CoV-2 are limited by small sample sizes or analyze narrow sets of immune mediators, 2, 4, [9] [10] [11] [12] [13] though multiomic approaches are beginning to overcome these limitations. 14 By analyzing responses to SARS-CoV-2 using two unbiased plasma proteomic methodologies in a large cohort of acutely ill patients presenting to a large urban Emergency Department (ED), we uncover protein signatures associated J o u r n a l P r e -p r o o f 4 with COVID-19 infection, severity, and death. To gain insights into underlying disease mechanisms, we map these to specific cell types in the context of relevant clinical phenotypes. We enrolled 384 unique subjects who presented with acute respiratory distress suspected or known to be due to COVID-19 infection. 306 patients were subsequently confirmed to be COVID-19-infected. We classified patients by acuity levels A1-A5 on days 0, 3, 7, and 28 (based on WHO Ordinal Outcomes Scale: 15 A1, died; A2, intubated, survived; A3, hospitalized on oxygen; A4, hospitalized without oxygen; A5, discharged), with the primary outcome of maximal acuity (Acuity max ) within 28 days of enrollment ( Figure 1A ; Table S1 ). COVID-19-positive patients were younger than COVID-19negative patients (median age 58 vs. 67 years, respectively), with a wide age distribution ( Figure S1B ), and were predominantly Hispanic (54% vs. 15%, respectively). Clinically measured non-specific inflammatory markers, including C-reactive protein (CRP) and ferritin, were significantly higher in COVID-19-positive than COVID-19-negative patients; 28-day outcomes were similar ( Figure S1B ). Given that enrollment occurred early in the pandemic, few patients received targeted therapies that might be expected to alter the disease course; 6 received remdesivir vs. placebo and 22 received anti-IL6 receptor monoclonal antibody vs. placebo (both as study protocols), and dexamethasone was not administered as usual care for COVID-19. We analyzed 1472 unique plasma proteins measured by Proximity Extension Assay (PEA) using the Olink platform (Olink® Explore 1536) for all patients on day 0 (D0, N=383, one assay outlier excluded) and for COVID-19-positive patients still hospitalized on days 3 (D3, N=217) and 7 (D7, N=143) (Tables S2 and S3). Time since symptom onset at presentation ranged from 0 to 31 days (median 7 days). Unsupervised clustering of D0 protein levels shows clustering by COVID-19 status, age, acuity, ethnicity, and kidney disease ( Figure S1A ). To identify proteins differential between COVID-19-positive and COVID-19-negative patients, linear models were fit to each protein at D0 with COVID-19 status as a main effect and adjusted for age, demographics, and key comorbidities (Figures 1B, S2 ; Table S3 ). Hierarchical clustering of patients using these differentially expressed proteins demonstrated a clear separation of the majority of COVID-19-positive from COVID-19-negative patients (Figures 1C, S1C-S1E) . COVID-19-positive patients displayed higher expression of viral response and interferon pathway proteins, including DDX58 (RIG-I), type II (IFNγ) and type III (IFNλ1) interferons, and the proinflammatory cytokines CCL7, CXCL10, and CXCL11 ( Figures 1D and 1E) , with enrichment of proteins in pathways associated with vaccine response, innate immune activation, and T cell function (Figures S1F). Fifty (16%) COVID-19-positive patients clustered with COVID-19-negative patients, displaying lower levels of the typical COVID-19-positive inflammatory signature (Figure 1C ), yet with mortality similar to the main cluster of COVID-19-positive patients (Table S1); although significantly older than the main COVID-19-positive patients (median age 69 vs 57 years), with more cardiac and kidney comorbidities, this subset is comparably ill with a distinct low-inflammatory proteomic signature. To infer potential immune cell subtype origins of key proteins, we mapped the differential protein expression (Olink assay) in COVID-19-infected patients to published single-cell RNA sequencing (scRNAseq) profiles from peripheral blood mononuclear cells (PBMCs) and bronchoalveolar lavage (BAL) samples from COVID-19-infected patients 8, 16, 17 (Figures 1F, S3A-S3B ). The majority of proteins were selectively expressed in circulating plasmablasts (e.g. RRM2, WARS, PRDX1) and myeloid cells (e.g. CD14, SIGLEC1, SIGLEC10, IL1RN, CCL8, CXCL10), particularly monocytes and neutrophils, consistent with the reported remodeling of these cell types in infected patients. 8, 16, 17 A smaller group of proteins, expressed strongly in peripheral CD8+ T cells and natural killer (NK) cells, reflected cytotoxic responses, including IFNγ, granzymes B and H (GZMB, GZMH), which trigger cell death upon delivery into target cells, and the receptor LAG3 ( Figure 1F ). Whereas membrane embedded LAG3 inhibits T cell activation, soluble LAG3, such as we observed in plasma, functions as an immune adjuvant. 18, 19 A set of proteins was found to be overlapping within BAL cells and J o u r n a l P r e -p r o o f 6 circulating myeloid and T cells; BAL lung epithelial cells additionally expressed proteins not detected in the plasma ( Figure S3B ). As these datasets were generated from distinct cohorts of patients, the conclusions drawn will require validation in individuals in whom plasma proteomics is performed in parallel with scRNA-seq of PBMCs and BAL samples. Similar to previous reports, [20] [21] [22] Acuity max of COVID-19 patients was significantly correlated with age, D0 acute kidney dysfunction, lactate dehydrogenase (LDH), lymphopenia, acute inflammatory markers (ESR, CRP, D-dimer, ferritin), and the pre-existing comorbidities kidney disease, diabetes, smoking, and heart disease (Figures 2A, S3C-S3E) . Distinct from some reports, [20] [21] [22] Acuity max was not significantly correlated with race, ethnicity, or body mass index (BMI). Virus neutralization activity by plasma was highly correlated with inflammatory markers, absolute neutrophil count (ANC), and COVID-19-positive status, but not with Acuity max (Figures 2A, 3A, S3C-S3D ). Unsupervised clustering of COVID-19-positive patient samples demonstrated separation of patient samples by Acuity max , severity (severe: Acuity max A1-A2; non-severe: Acuity max A3-A5), age, and time point (Figure 2B, S3F) . To identify proteins associated with Acuity max levels and severity, we fit linear mixed models (LMMs), which correct for non-independence of time course data, to protein values with time and either Acuity max or severity as main effects, and with covariates age, demographics, and key comorbidities (Figures 2C, S4A-S4E ; Table S3B ). At D0, 251 Olink plasma proteins were differentially expressed between severe and non-severe patients, 694 at D3, and over 767 at D7 (Figure 2D-2F; Table S3B ). Because many patients with mild disease were discharged from the hospital within 3 days of admission, and D3 and D7 samples were collected from the subset of patients who remained hospitalized at these timepoints, D3 and D7 samples represent a generally sicker population than D0 samples ( Figure 2E ). The increased numbers of severity-associated proteins at the D3 and D7 timepoints indicates that even though the population is generally sicker, the differences between those with severe disease and those with non-severe disease becomes more pronounced with time; these dynamic changes likely reflect clinically relevant phenotypes and underlying disease processes. These severity proteins showed signals for enrichment in pathways implicated in COVID-19 inflammatory response, including IFNγ, IL6, and TNF signaling, and in tissue remodeling, including KRAS signaling and epithelial to mesenchymal transitions (Table S4) . Hierarchical clustering of patients by D0 severity-associated proteins revealed multiple distinct clusters of severe patients (Figure 2C ), indicating that severe disease is phenotypically heterogeneous and underscoring the presence of multiple phenotypes of patients with severe disease, beyond the single subgroup described above that displayed a lowinflammatory proteomic signature ( Figure 1C ). Similar to proteins associated with COVID-19-positive status (Figure 1) , the majority of circulating proteins associated with severity were most highly transcriptionally expressed in myeloid and plasmablast subsets ( Figure S5 ). To test whether D0 plasma proteins predict subsequent disease severity, we built a classifier of severe disease (Acuity max A1 or A2, Olink data) using elastic-net logistic regression with crossvalidation; the classifier yielded good predictive performance (AUC 0.85, CI 0.81-0.86) (Figures 2G, S4F and S4G), though an independent validation dataset is needed. Amongst the strongest weighted proteins in the predictor were IL6, IL1RL1, PTX3, and the IL1 receptor inhibitor IL1RN, consistent with our LMM results, the epithelial damage marker keratin-19 (KRT19, a predominantly intracellular cytoskeletal protein 23 ), and the apoptosis inhibitor TRIAP1 24 ( Figure S4F) . The strength and weighting of the this predictor highlight that disease severity can be accurately predicted at the time of presentation to the hospital, that proinflammatory signatures are associated with severity, and that severity-associated proteins identified both here (PTX3, IL1RN) and previously (IL6 2-4, 9, 10, 25, 26 ) contribute to a robust predictor. J o u r n a l P r e -p r o o f 8 Virus neutralization activity was detected in plasma from nearly all COVID-19-positive patients ( Figure 3A , Table S5 ), consistent with previous reports. 27, 28 Consistent with the observed lack of correlation with Acuity max (Figure 2A) , neutralization activity increased over time among the majority of both severe and non-severe patients (Figures 3B-3D) , indicating that, as previously described, 27 neutralization activity per se does not predict milder disease. However, neutralization activity was inversely correlated with age and age-related comorbidities (Figure 2A) , as previously observed, 27 displaying age-associated decreases in both the rate of increase over time and the level of neutralization activity achieved (Figures 3E and 3F ). The negative impact of age on the rate of rise in neutralization over time was observed only among patients who died (A1) (Figure 3E) , suggesting that disease processes present in severe illness contribute to impaired adaptive immune responses. D0 plasma protein levels (Olink assay) predicted neutralization levels at D3 (AUC 0.83, confidence interval (CI) 0.80-0.85), with many proteins contributing to the prediction being independent of those associated with severity ( Figures 3G and 3H) , though this needs to be validated in an independent dataset. Amongst the proteins most often selected in predicting neutralization were those involved in induction of apoptosis (TNF superfamily members TNFSF10, TNFSF8, and galectin-7 (LGALS7B)), phagocytosis (BRK1), T cell proliferation (IL2), and tissue regeneration and proliferation (EGFR, PTEN, PLA2G10, DKK3, RRM2). To identify plasma proteins differentially expressed between patients with high and low neutralization titers, we also used a linear mixed model with neutralization level and time as main effects ( Figure 3I) ; this identified several proteins expressed in plasma cells, (e.g., MZB1, SDC1), and others known to be important for priming (e.g., CD40LG). Amongst the proteins most significantly highly expressed in patients with low neutralization titers were CXCL10, which has recently been implicated to be negatively correlated with CD4+ T cell features associated with antibody titers, 27 and GPA33, a marker of thymic regulatory T cells 29 (Figure 3I) . These findings indicate that elderly patients who do poorly display distinctive neutralization activity-associated protein J o u r n a l P r e -p r o o f profiles that may be useful in clinical prediction algorithms, vaccine response prediction, and identifying subsets of patients most appropriate for trials of antibody-based therapy. Acute respiratory distress syndrome (ARDS) is the leading cause of death in COVID-19. To gain insight into processes that might underlie the development of ARDS, we compared patients who died (A1, median time to death 9 days ) to those receiving mechanical ventilation yet surviving (A2) (Figure 4 , Table S3C ); by clinical criteria, essentially all patients in both groups had ARDS, although not all who died were mechanically ventilated. At D7, 24 plasma proteins were significantly differentially-expressed between the two groups ( Figure 4A ); amongst those elevated in patients who died were previously reported proinflammatory proteins (proinflammatory cytokines IL6 [2] [3] [4] 9, 10, 25, 26 , IL8 2, 3, 9, 10, 26 , and CXCL10 2, 9, 10, 26 ), chemokines that attract monocytes/T cells (CCL2, CCL7, CCL8, CCL20), a receptor for IL33 that activates T cells and mast cells (IL1RL1), regulators of innate immunity (PTX3), the endothelial and monocyte receptor for the growth factors VEGF and PGF (FLT1), and a multi-functional cytokine (IL24). Most of these proinflammatory proteins showed similar upward trajectories in survivor and non-survivor groups through D3, but diverged at D7, with a decline in survivors and a sustained elevation in those who died ( Figure 4C ); D0 plasma levels were associated with survival ( Figure 4B ). Whereas an upward trajectory at D3 could result from the sicker composition of the D3 patient population compared with the D0 population, the subsequent divergence in trajectories observed at D7 between survivors and those who died instead likely represents relevant biological processes associated with death. Several exocrine pancreas proteases and protease inhibitors (CTRC, CELA3A, CPA2, CTRB1, AMY2A, AMY2B) were reduced in the plasma of those who died relative to survivors (A2) (Figures 4A and 4D) ; whereas their relevance in COVID-19 remains uncertain, many display anti-inflammatory effects in mouse models. [30] [31] [32] [33] Few patients received the anti-inflammatory dexamethasone because it was not yet the standard-of-care at J o u r n a l P r e -p r o o f 10 the time of patient recruitment. These findings suggest that survival from COVID-19 ARDS is associated with decreased proinflammatory and increased anti-inflammatory responses over time. To elucidate patterns of tissue damage, we calculated gene expression signatures associated with specific tissues using the Genotype-Tissue Expression (GTEx) 34 dataset and confirmed using published scRNA-seq datasets that these signatures were expressed primarily in non-immune cells that compose the structure of tissues (Figures S6A-S6F) . Because of the breadth of the database, we chose for analysis of these signatures the SomaScan platform 35 , which detects greater than 4,400 proteins. We confirmed a high degree of overlap in differentially expressed proteins between severe and non-severe patients using this platform as compared to the Olink platform (Table S6 ; STAR methods). We identified plasma proteins that overlap with these tissue signatures and filtered for intracellular proteins (Table S7A) , based on the principle that intracellular proteins found in the circulation represent release of cellular cytosolic contents in the setting of tissue damage. When possible, we validated our tissue-specific signatures against clinically measured laboratory values, finding significant correlations with tissue-specific clinical markers of damage (Figures 5B, S6I-S6L ). In patients with severe COVID-19 (A1, A2), among the organ-specific signatures, heart, lung, and skeletal muscle intracellular plasma protein signatures were elevated as early as D0 and remained elevated to D7 (Figures 5A and S6G) . Elevated D0 heart and skeletal muscle protein signatures portended poor overall survival ( Figure 5C ; Table S7C ). Our lung signature contained only one protein, the intracellular cytoskeletal protein keratin-7 (KRT7); therefore, this particular signature should be interpreted with caution. Together, our tissue damage signatures suggest that COVID-19 illness drives organ damage that can be detected in the circulation upon hospital presentation. Lung damage due to epithelial death J o u r n a l P r e -p r o o f 11 We mapped intracellular severity-associated plasma proteins to organ-specific cell types using published scRNAseq datasets. Datasets from healthy lung, kidney, pancreas, and liver revealed that In contrast, heart cell-type plasma signatures did not correlate with ACE2 expression (Figure 5F ), suggesting that heart damage may be largely an indirect effect of the disease process (assuming that ACE2 expression is similar in healthy individuals and COVID-19-infected patients); the implications of the observed correlation with TMPRSS2 expression on cells (R=0.62, p<0.001; Figure 5F ) are unclear. Unlike in circulating immune cells (Figure S5A-S5D) , the expression of a larger subset of severity-associated intracellular plasma proteins was found in effector and cytotoxic CD8+ T cells and NK cells located within the lung ( Figure 5D ). Intracellular plasma signatures from cell subsets that do J o u r n a l P r e -p r o o f not express ACE2 or TMPRSS2 (Figure 5D ), including these cytotoxic CD8+ T cells and NK cells, epithelial progenitors, and alveolar macrophages, may result from bystander cell death. Together, these findings suggest that immune-mediated death of virus-infected lung epithelial cells is a key feature of severe disease, that damage to several other cell types is indirect, and that cell death is detectable in the circulating proteome. To gain insights into immune activation in severe disease, we looked for enrichment of inflammatory pathways among plasma proteins that are normally secreted or membrane bound. Within the D0 Olink severity-associated proteome (and consistent with SomaScan results), we analyzed enriched pathways against the entire measured protein set and found enrichment in signaling by cytokines IL6 and IL10, activation of myeloid and T cells by the cytokine IL17, airway pathology in COPD, cardiac hypertrophy signaling, signaling by the proinflammatory danger-associated molecular pattern (DAMP) molecule HMGB1, and signaling via the glucocorticoid receptor ( Table S4A) . Analysis of upstream regulators revealed TNF to be the most significant putative regulator of these pathways (Table S4B) . To identify cellular mechanisms regulated by severity-associated proteins, we analyzed ligandreceptor interactions 38 using the BAL fluid cell dataset from COVID-19-infected patients 36 (Figure 6 ). From D0 to D3, the number of predicted ligand-receptor interactions increased dramatically ( Figure 6A ), predominantly represented by ligand-receptor interactions occurring in lung epithelial cells, T cells, and mast cells ( Figure 6B ). Most dramatic changes in terms of fold-change were in mast cells though the total number of interactions was lower than other cell types. This was driven by their interactions with other mast cells, CD4+ and CD8+ T cell subsets, and epithelial progenitors ( Figure 6B ). Consistent with this, the mast cell function marker tryptase was differentially expressed between severe and non-severe patients over time ( Figure S6H ). Mast cell activity in lung tissue may be related to signaling by the proinflammatory cytokine IL18, 39 with release of proinflammatory cytokines IL4 and IL13, 40 Figure 6C ). Several T cell activating and exhaustion signals were upregulated and may originate from lung epithelial cells, including, as early as D0, PVR triggering of the receptors TIGIT and CD96, which induces an immunosuppressive and non-cytotoxic response, and at D3, IL18 and IL7 ( Figure 6C) , which dampen T cell exhaustion 43 and maintain non-exhausted T cells, 44 respectively. IL18 is a predominant effector released upon inflammasome activation and pyroptotic cell death; the observed increase of IL18 here thus suggests increased inflammasome activation in severe COVID-19. We examined lung epithelial cell receptor interactions with severity-associated ligands ( Figure 6D) ; a correlation matrix of plasma ligand abundance identified coregulated groups of proteins that act on lung epithelial cells, including protein modules for regeneration and growth factor signaling (module 1: growth factors EGF, TGFB1, and VEGFA, and anti-apoptotic factor DKK1; module 2: growth factors BMP6 and HGF, and Wnt signaling pathway activators RSPO3 and RSPO1) and for IL6 pathway other myeloid cells, most apparent at D3, may drive later-stage damage, immune suppression, and regulation of phagocytosis (e.g., ligand-receptor interactions TGFB1-ITGB6 and -ITGB8, SPP1-ITGAV, SIRPA-CD47; Figure 6E ). The interaction of TGFB1 proprotein with ITGB6/8 on lung epithelial cells likely releases active TGFB1, 45, 46 which inhibits cytotoxic T cells, naïve T cell and B cell proliferation, and enhances Treg differentiation. 47 The interaction of SPP1 with its receptor integrin ITGAV is associated with lung fibrosis and is proposed to inhibit apoptosis 48 . The interaction of CD47, which is ubiquitously expressed on cell surfaces, with SIRPa on macrophages inhibits phagocytosis. Based on these data, we propose a model of COVID-19-induced immune and cellular responses and cell death within the lower airways. We posit that early monocyte activation drives T cell recruitment, activation, and exhaustion. This is followed by a temporally delayed activation of additional proinflammatory monocyte pathways and repair and regeneration within lung epithelial cells ( Figure 7 ). In patients who die, there is increased expression over time of severity-associated, monocytesecreted ligands that interact with T cells (e.g., IL18, IL7, IL15), suggesting an inability to contain proinflammatory immune responses. (Figure 1) . The large size of our cohort enabled the identification of a substantial subset (16%) of COVID-19-infected patients with inflammatory signatures similar to COVID-19-negative controls but outcomes similar to those of other COVID-19positive patients. In these patients, the muted levels of circulating inflammatory proteins suggest that much of the underlying pathology is due to viral infection itself and pre-existing co-morbidities in the setting of advanced age rather than immune-mediated processes. In this case, clinical response to immune-targeted therapies, including dexamethasone, could be sub-optimal, and anti-viral and other interventions may be more impactful. Over 250 proteins were independently associated with COVID-19 severity, with multiple inflammatory mediators associated with death in ARDS patients, including previously-identified markers (IL6, 2-4, 9, 10, 25, 26 IL8, 2, 3, 9, 10, 26 and CXCL10 2, 9, 10, 26 ) and several other markers (CCL2, CCL7, CCL8, CCL20, AREG, IL1RL1, FLT1, IL24) (Figures 2 and 4) , including some recently reported in a smaller study of hemodialysis-dependent COVID-19 patients from a distinct geographic region, 49 independently validating our findings. Of note, several exocrine pancreas proteases and other proteases were significantly associated with survival of patients with ARDS. Determining whether these proteases are markers of underlying processes that contribute to survival or are directly contributing to a beneficial anti-inflammatory response will require further investigation. Prior proteomics studies have included fewer COVID-positive patients (N=46, N=22, N=48) compared with ours (N=306) and have not obtained a sample at point of hospital arrival, yet these studies have the advantage of using unbiased methods (e.g., liquid chromatography / mass spectrometry) for protein discovery. 11, 13, 50 Among the few overlapping proteins from these prior datasets, our findings are consistent, yet compared to these other works, our data show overall stronger associations of proinflammatory cytokines and chemokines with severity and death, and less strong associations with J o u r n a l P r e -p r o o f 16 complement activation and coagulation signals. These differences may in part reflect an enrichment in our panel of proteins of immune-mediated markers. This enrichment enables us to better infer immune cell function and cellular communication at play in severe COVID-19. Our classifier of severity did not perform as well as in the above-mentioned studies. 13, 50 Decreased classifier performance may reflect increased heterogeneity of our population with respect to comorbidities and treatments received, resulting in less distinct proteomic signatures in severe versus mild COVID-19, or it may be a limitation of the finite number of proteins assayed on our platform. We observed a strong association between advanced age and attenuated neutralizing antibody production and identified discrete plasma protein signatures associated with the neutralization response (Figure 3) , which may predict vaccination response and have implications for vaccination strategies. The strong predictive value of D0 plasma proteins highlights the presence of severityassociated pathways that may be amenable to early therapeutic intervention. Incorporation of derived biomarkers into diagnostics could stratify high-risk patients for tailored therapies. 55 and a TNFα pathway signature observed by scRNA-seq in COVID-19 severityassociated monocytes. 17, 56 Few severity-associated proteins were part of the type I interferon response, in agreement with published data 4, 7 and with the association of COVID-19 severity with genetic variants that weaken interferon-related viral sensing. 57 Our proteomic analysis of a large cohort of COVID-19 patients reveals COVID-19 severity-and mortality-associated pathways that may serve as potential therapeutic targets and provide the basis for diagnostics to stratify high-risk patients for tailored therapies and earlier interventions. The proteomic datasets we generated, which are freely available for investigators from Mendeley Data at https://data.mendeley.com/datasets/nf853r8xsj/1, will serve as a valuable resource in COVID-19 discovery. First, because it was not feasible to collect a second cohort for validation of our findings, trends seen here will need to be corroborated in future studies, especially at other institutions. Second, blood collections at later time points were biased towards sicker patients, as they were more likely to remain hospitalized, thus skewing the balance of severity groups over time, affecting the comparison of differentially expressed proteins, and limiting the ability to interpret effect estimate trends. Third, relative contributions to the plasma proteome from circulating immune cells or lung-resident cells was inferred from mapping to published scRNA-seq data from PBMC and BAL datasets, respectively. Whereas consistent patterns of co-expression were observed between our data and published scRNA-seq datasets, because circulating plasma proteins may have multiple sources, confirmation of cell or tissue origin will require validation in individuals in whom plasma proteomics is performed in parallel with scRNA-seq of PBMC and BAL samples. Fourth, the mapping of peripheral plasma proteins onto tissue expression was done using scRNA-seq data from normal, healthy tissues that may not reflect expression profiles in SARS-CoV-2 infection. Fifth, in linear mixed models, we used significance in the interaction term of severity x time to define our subset of severity-associated proteins; thus, significant association of a protein with severity required a dynamic effect over time, We worked to ensure gender balance in the recruitment of human subjects. We worked to ensure ethnic or other types of diversity in the recruitment of human subjects. One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in science. The author list of this paper includes contributors from the location where the research was conducted and who participated in the data collection, design, analysis, and/or interpretation of the work. Blood samples were collected in EDTA tubes and processed no more than 3 hours post blood draw in a Biosafety Level 2+ laboratory on site. Whole blood was diluted with room temperature RPMI medium in a 1:2 ratio to facilitate cell separation for other analyses using the SepMate PBMC isolation tubes (STEMCELL) containing 16 mL Ficoll (GE Healthcare). Diluted whole blood was centrifuged at 1200 g for 20 minutes at 20 °C. After centrifugation, plasma (5 mL) was pipetted into 15 mL conical tubes and placed on ice during PBMC separation procedures, centrifuged at 1000 g for 5 min at 4°C, aliquoted into cryovials, and stored at -80°C. Study samples (45 µL) were randomly allocated onto 96well plates based on disease outcome grouping and were treated with 1% Triton X-100 for virus inactivation at room temperature for 2 hrs. The Olink Proximity Extension Assay (PEA) is a technology developed for high-multiplex analysis of proteins using 1 µL of sample. In PEA, oligonucleotide-labelled monoclonal or polyclonal antibodies (PEA probes) are used to bind target proteins in a pair-wise manner thereby preventing all cross-J o u r n a l P r e -p r o o f 31 reactive events. Upon binding, the oligonucleotides come in close proximity and hybridize followed by extension generating a unique sequence used for digital identification of the specific protein assay. With recent developments, PEA enables an increased number of 384 multiplex assays and higher throughput using next-generation sequencing (NGS) as a readout method. PEA probe design is based on addition of Illumina adapter sequences, unique barcodes for protein identification and indexes to distinguish samples in multiplex sequencing. The protocol has also been miniaturized and automated using liquid handlers to further improve robustness and maximize output. The full library (Olink® Explore 1536) consists of 1472 proteins and 48 controls assays divided into four 384-plex panels focused on inflammation, oncology, cardiometabolic and neurology proteins. In each of the four 384-plex panels, overlapping assays of IL6, IL8 (CXCL8), and TNF are included for quality control (QC) purposes. Library content is based on target selection of low-abundant inflammation proteins, actively secreted proteins, organ-specific proteins leaked into circulation, drug targets (established and from ongoing clinical trials), and proteins detected in blood by mass spectrometry. Selection, classification, and categorization of proteins were based on using various databases (e.g. Gene Ontology), the Blood Atlas -the human secretome (www.proteinatlas.org), a collaboration with the Institute of Systems Biology, Seattle WA, for tissue-specific proteins, www.clinicaltrials.gov for mapping of drug targets, detection of proteins in blood measured by mass spectrometry and finally, various text-mining approaches identifying protein biomarkers described in the literature. The analytical performance of PEA is carefully validated for each protein assay; performance data are available at www.olink.com. Technical criteria include assessing sensitivity, dynamic range, specificity, precision, scalability, endogenous interference, and detectability in healthy and pathological plasma and serum samples. In the immune reaction, 2.8 µL of sample is mixed with PEA probes and incubated overnight at 4 °C. Then, a combined extension and pre-amplification mix is added to the incubated samples at room temperature for PCR. The PCR products are pooled before a second PCR step following addition of J o u r n a l P r e -p r o o f 32 individual sample index sequences. All samples are thereafter pooled, followed by bead purification and QC of the generated libraries on a Bioanalyzer. Finally, sequencing is performed on a NovaSeq 6000 system using two S1 flow cells with 2 x 50 base read lengths. Counts of known sequences are thereafter translated into normalized protein expression (NPX) units through a QC and normalization process developed and provided by Olink. The Olink PEA QC process consists of specifically engineered controls to monitor the performance of the main steps of the assays (immunoreaction, extension and amplification/detection) as well as the individual samples. Internal controls are spiked into each sample and represent a control using a nonhuman assay, an extension control composed of an antibody coupled to a unique DNA-pair always in proximity and, finally, a detection control based on a double stranded DNA amplicon. In addition, each plate run with Olink includes a control strip with sample controls used to estimate precision (intra-and inter-coefficient of variation). A negative control (buffer) run in triplicate is utilized to set background levels and calculate limit of detection (LOD), a plate control (plasma pool) is run in triplicate to adjust levels between plates, and a sample control (reference plasma) is included in duplicate to estimate CV between runs. NPX is Olink's relative protein quantification unit on a log2 scale and values are calculated from the number of matched counts on the NovaSeq run. Data generation of NPX consists of normalization to the extension control (known standard), log2-transformation, and level adjustment using the plate control (plasma sample). The SomaScan Platform for proteomic profiling uses 4979 SOMAmer reagents, single-stranded DNA aptamers, to 4776 unique human protein targets. The modified aptamer binding reagents, 35 SomaScan assay, 35, 59 its performance characteristics, 60, 61 and specificity 62, 63 to human targets have J o u r n a l P r e -p r o o f 33 been previously described. The assay used standard controls, including 12 hybridization normalization control sequences to control for variability in the Agilent readout process and 5 human calibrator control pooled replicates and 3 quality control pooled replicates to mitigate batch effects and verify the quality of the assay run using standard acceptance criteria. The SomaScan Assay is run using 96-well plates; 11 wells are allocated for control samples used to control for batch effects and to estimate the accuracy, precision, and buffer background of the assay are replaced by the NRVRQGYS sequence of HIV-1, a strategy previously described for retroviruses pseudotyped with SARS-CoV S. 64 The truncated SARS-CoV-2 S fused to gp41 was cloned into pCMV by Gibson assembly to obtain pCMV-SARS2∆C-gp41. psPAX2 and pCMV-VSV-G were previously described. 65 pTRIP-SFFV-EGFP-NLS was previously described 66 ACE2/TMPRSS2 cells, which were selected with 320 µg/ml of hygromycin (Invivogen) and used as a target in pseudotyped SARS-CoV-2 S lentivirus neutralization assays. The protocol for lentiviral production was previously described. 65 All statistical analyses for the clinical and proteomics data in this cohort was performed using R version 4.0.2. All plots were generated using the ggplot2 package in R with the exception that the correlation plots were generated using the corrplot() function in R. Pairwise Pearson correlations were calculated for all proteins, and rows and columns of correlation plots were ordered based on hierarchical clustering. All heatmaps were generated using the heatmap3 67 package and NPX values for each protein centered to have a mean of 0 and scaled to have a standard deviation of 1 within each protein. Scaled data greater than either 4 or 5 standard deviations from the mean were truncated at +/-4 or 5. Rows and columns were ordered based on hierarchical clustering. Principal components analysis (PCA) was performed using all proteins and all samples using the prcomp() function in R. Unsupervised clustering by UMAP was performed using all proteins, and J o u r n a l P r e -p r o o f 36 either all samples or just day 0 samples, using the umap() function in R, and UMAP coordinates were plotted using the ggplot2 package. Unsupervised clustering by tSNE was by first performing dimensionality reduction by PCA and then taking the top principal components for a tSNE embedding using the Rtsne package and the argument pca=TRUE. k-nearest neighbor (KNN) graphs and Louvain community detection was performed using custom code and the FNN package provided in R. Linear regression models were fit independently to each protein using the lm package in R with protein values (NPX for Olink data) as the dependent variable. The models included a term for COVID-19 status and covariates for age, gender, ethnicity, heart disease, diabetes, hypertension, hyperlipidemia, pulmonary disease, kidney disease, immuno-compromised status to control for any potential confounding. P-values were adjusted to control the false discovery rate (FDR) at 5% using the Benjamini-Hochberg method implemented in the emmeans package in R. Linear mixed effects models (LMMs) were fit independently to each protein using the lme4 66 package in R with protein values (NPX for Olink data) as the dependent variable. The model for severity included a main effect of time, a main effect of severity, the interaction between these two terms, and a random effect of patient ID to account for the correlation between samples coming from the same patient. Covariates for age, gender, ethnicity, heart disease, diabetes, hypertension, hyperlipidemia, pulmonary disease, kidney disease, and immuno-compromised status were included in the model to control for any potential confounding. Significance of the three model terms was determined with an F-test using Satterthwaite degrees of freedom and type III sum of squares implemented with the lmerTest 68 package in R. P-values for the three model terms of interest were adjusted to control the FDR at 5% using the Benjamini-Hochberg method. Group differences were calculated for each protein passing the FDR threshold with p-values adjusted using the Tukey method implemented by the J o u r n a l P r e -p r o o f 37 emmeans package in R. Group differences with Tukey adjusted p-values less than 0.05 were considered statistically significant. Note, all other models were run similarly with time in addition to either Acuity max , age, or both age and severity as main effects instead of severity. For SomaScan data, LMMs for severity and time as main effects were run as was done for Olink. Overall, significant proteins were found to be partially overlapping with those found for Olink (hypergeometric test p=0.002) ( Table S3B and S6); for example, at D0, of the 1085 overlapping assays between the two platforms, 779 proteins were significant for severity or interaction term in Olink data, and 669 in the SomaScan data, with 460 proteins overlapping between the two sets. In other words, 69% of the SomaScan severity-associated proteins overlapped with those identified by Olink data. The non-overlapping assays in part due to a narrower dynamic range for some of the SomaScan assays. Model residual values were extracted from LMMs (as described above) independently fit to every protein using NPX as the dependent variable, age, gender, ethnicity, heart disease, diabetes, hypertension, hyperlipidemia, pulmonary disease, kidney disease, and immuno-compromised status as covariates and a random effect of patient ID to account for the correlation between samples taken from the same patient. These residuals represent the remaining unexplained variance in the protein expression after accounting for the effects of the included covariates. For the Olink assay, the likelihood of observing 1131 statistically significant proteins for the Acuity max model term and 963 statistically significant proteins for the time and Acuity max interaction term from the linear mixed models was evaluated using permutation testing. Acuity max group was randomly permuted 100 times among patients and for each permutation the full LMM procedure was followed. J o u r n a l P r e -p r o o f 38 None of the permutations produced as many statistically significant results as were observed when using the true Acuity max groupings. For analysis of functional pathways, two different strategies were employed: (i) gene set enrichment analysis 69 using the ClusterProfiler package in R using the C7 immunologic signature gene set from the molecular signatures database v7.2 (https://www.gsea-msigdb.org/gsea/msigdb); and (ii) Ingenuity Pathway Analysis (Qiagen) on our gene lists using default parameters from the vendor. Pathways were visualized in dot plots and bar plots using the ggplot2 package in R. Predictive performance of severity within 28 days was performed using all proteins and model covariates and was estimated using elastic net logistic regression implemented by the glmnet 70 package in R and 100 repeats of 5-fold cross validation. Model tuning was performed using the caret package in R. Variable scaling, model tuning, and feature selection was performed independently for each held-out fold such that the predictive model was never exposed to the held-out data. Measures of predictive performance are reported as medians and 95% confidence intervals calculated from the 100 repeats of the cross validation. Features were ranked by how frequently they were chosen to be included in the model. Generalized linear models with lasso regularization were trained (using the R caret package) on COVID-19-positive patient proteome samples (consisting of 1472 Olink protein features) from each selected day (0, 3, and 7) to neutralization levels (≤ or > 75%). For percent neutralization predictions, protein levels at day 0 were used to predict binned neutralization categories at day 3. Repeated 5-fold cross validation (with a hyperparameter scan from 0.0001 to 1 to select the lambda constant yielding J o u r n a l P r e -p r o o f 39 the greatest prediction accuracy) was replicated 100 times to obtain a confidence interval for the area under the ROC curve (where ROC curves were generated using each patient's estimated probability while serving as the held-out fold). The average feature weights of the final models from each of the 100 rounds of 5-fold cross validation were used to identify proteins of importance. Orthogonally, 10fold cross validation was used to train and validate a random forest model (with default ntree=500 and mtry=38) to predict neutralization quartiles (0-25, 25-50, 50-75, 75-100%) and important proteins were identified based on mean decrease in Gini. To identify protein features that were independent from or overlapping with severity markers, the union of the top 50 important features from the lasso and random forest models were intersected with significantly variable proteins between severity groups on day 0 (from the LMM described above). We analyzed 4 publicly available scRNAseq PBMC datasets from COVID-19 patients, which were Normalization: Single cell/nuclei RNA-seq datasets from individual studies were aggregated and normalized using Scanpy. 79 Each study was subjected to identical pre-processing steps. First, UMI count values were winsorized, those above the 99th percentile of non-zero counts were reduced to the value of the 99th percentile (13 counts). Winsorized count data were normalized, so that UMI counts per cell/nucleus summed to 10,000, and then were logged, resulting in log(1+10,000*UMIs / total UMIs) for each cell/nuclei ("logtp10k"). Then the aggregated expression data were scaled using the scanpy `scale` function with zero_center=False. To prepare cell type labels, we mapped each annotation to a common reference list before training. Cells labeled with cell types with ambiguous mappings (e.g., "T cell" or "myeloid") were excluded from training. All scRNAseq gene expression data was analyzed in R version 4.0.2 using custom code to look at average expression of genes of interest in each cell type. Genes of interest were selected from the proteomic analysis, and the tissue distribution of these genes (or groups of genes) were evaluated in the different scRNAseq datasets. For visualization, gene expression was normalized across cell types (rows) with Z-scores and visualized in heatmaps using the heatmap3 function in R with hierarchical clustering of both cell types and genes. Where cell types were annotated on heatmaps, this was done by identifying cell types with the highest relative expression by Z-scores. The cell-type-specific intracellular gene list was defined as the top 20 genes with the highest relative expression for that cell type. Organ specific protein signatures were defined using RNA sequencing data from the Genotype-Tissue Expression (GTEx) Portal (https://www.gtexportal.org/home/). The median transcripts per million (TPM) of 56,200 genes across 54 non-diseased tissue sites were obtained. For each tissue site, the intersection of the top 500 highest TPM genes and the top 500 most variable genes (based on coefficient of variation across tissue types) was identified. Proteins that were also measured by SomaScan were extracted, validated for high tissue specific expression, and consolidated across related tissues for each organ of interest. Organ signatures were split based on localization (intracellular versus membrane/secreted) using UniProt and literature annotations. The values for each protein across all COVID-19-positive patients were scaled to Z scores, and the mean Z score of all proteins in an organ set was used as an overall signature score for a given patient. Single-cell RNAseq expression profiles (10X genomics) of immune cells isolated from BAL fluid of healthy and COVID-19-infected patients of varying severity from Bost et al, 2020 36 Table Legends Table S1 . Patient characteristics by 28- . Table S2 . List of proteins assayed using the Olink proteomics platform. Related to Figures 1, 2, 3 Table S2 ). See methods for derivation. Table S6 . SomaScan models of severity. Related to Figure 5 . Linear mixed model for SomaScan data with severity and time as a main effects and age, gender, ethnicity, heart disease, diabetes, hypertension, hyperlipidemia, pulmonary disease, kidney disease, immunocompromised status as covariates. Table S7 . Derived organ-specific intracellular plasma protein signatures. Related to Figure 5 Deep immune profiling of COVID-19 patients reveals distinct immunotypes with therapeutic implications Longitudinal analyses reveal immunological misfiring in severe COVID-19 An inflammatory cytokine signature predicts COVID-19 severity and survival Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans Factors associated with COVID-19-related death using OpenSAFELY Comprehensive mapping of immune perturbations associated with severe COVID-19 Impaired type I interferon activity and inflammatory responses in severe COVID-19 patients Immunophenotyping of COVID-19 and influenza highlights the role of type I interferons in development of severe Systems-Level Immunomonitoring from Acute to Recovery Phase of Severe COVID-19 Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections Ultra-High-Throughput Clinical Proteomics Reveals Classifiers of COVID-19 Infection Cytokine profile in plasma of severe COVID-19 does not differ from ARDS and sepsis Proteomic and Metabolomic Characterization of COVID-19 Patient Sera Multi-Omics Resolves a Sharp Disease-State Shift between Mild and Moderate COVID-19 Novel Coronavirus. COVID-19 Therapeutic Trail Synopsis A single-cell atlas of the peripheral immune response in patients with severe COVID-19 Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell Compartment Lymphocyte activation gene-3 induces tumor regression and antitumor immune responses Soluble human LAG-3 molecule amplifies the in vitro generation of type 1 tumor-specific immunity Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study Risk Factors Associated With Acute Respiratory Distress Syndrome and Death in Patients With Coronavirus Disease Evaluating the Association of Clinical Characteristics With Neutralizing Antibody Levels in Patients Who Have Recovered From Mild COVID-19 in Cytokeratin 19 fragments in patients with acute lung injury: a preliminary observation TRIAP1/PRELI complexes prevent apoptosis by mediating intramitochondrial transport of phosphatidic acid Novel Outcome Biomarkers Identified With Targeted Proteomic Analyses of Plasma From Critically Ill Coronavirus Disease 2019 Patients A dynamic COVID-19 immune signature includes associations with poor prognosis Antigen-Specific Adaptive Immunity to SARS-CoV-2 in Acute COVID-19 and Associations with Age and Disease Severity Rapid Generation of Neutralizing Antibody Responses in COVID-19 Patients Proteomic Analyses of Human Regulatory T Cells Reveal Adaptations in Signaling Pathways that Protect Cellular Identity Chymotrypsin Reduces the Severity of Secretagogue-Induced Pancreatitis in Mice Carboxypeptidase E modulates intestinal immune homeostasis and protects against experimental colitis in mice Regulation of tissue inflammation by thrombin-activatable carboxypeptidase B (or TAFI) Natural single-nucleotide deletion in chymotrypsinogen C gene increases severity of secretagogue Transcriptomic signatures across human tissues identify functional rare Nucleic Acid Ligands With Protein-like Side Chains: Modified Aptamers and Their Use as Diagnostic and Therapeutic Agents Host-Viral Infection Maps Reveal Signatures of Severe COVID-19 Patients CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes Interleukin-18 has an Important Role in Differentiation and Maturation of Mucosal Mast Cells Mast cell production and response to IL-4 and IL Protective and pathogenic roles for mast cells during viral infections Mast cells can secrete vascular permeability factor/ vascular endothelial cell growth factor and exhibit enhanced release after immunoglobulin E-dependent upregulation of fc epsilon receptor I expression IL-18BP is a secreted immune checkpoint and barrier to IL-18 immunotherapy Inverse correlation between IL-7 receptor expression and CD8 T cell exhaustion during persistent antigen stimulation Force interacts with macromolecular structure in activation of TGF-beta Regulation of the Immune Response by TGF-beta: From Conception to Autoimmunity and Infection Fibrotic microenvironment promotes the metastatic seeding of tumor cells via activating the fibronectin 1/secreted phosphoprotein 1-integrin signaling Longitudinal proteomic profiling of dialysis patients with COVID-19 reveals markers of severity and predictors of death Plasma Proteomics Identify Biomarkers and Pathogenesis of COVID-19 Single-Cell RNA Expression Profiling of ACE2, the Receptor of SARS-CoV-2 Temporal and spatial heterogeneity of host response to SARS-CoV-2 pulmonary infection Tissue-Specific Immunopathology in Fatal COVID-19 Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19 IFN-gamma and TNF-alpha drive a CXCL10 Induction of a regulatory myeloid program in bacterial sepsis and severe COVID-19 Inborn errors of type I IFN immunity in patients with life-threatening COVID Transcriptional and Cellular Diversity of the Human Heart Aptamer-based multiplexed proteomic technology for biomarker discovery Stability and reproducibility of proteomic profiles measured with an aptamerbased platform Plasma protein patterns as comprehensive indicators of health Genomic atlas of the human plasma proteome Co-regulatory networks of human serum proteins link genetics to disease Retroviruses pseudotyped with the severe acute respiratory syndrome coronavirus spike protein efficiently infect cells expressing angiotensin-converting enzyme 2 Transmission of innate immune signaling by packaging of cGAMP in viral particles ESCRT III repairs nuclear envelope ruptures during cell migration to limit DNA damage and cell death Heatmap3: an improved heatmap package with more powerful and convenient features Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles Regularization Paths for Generalized Linear Models via Coordinate Descent Single cell transcriptomics identifies focal segmental glomerulosclerosis remission endothelial biomarker Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter-and Intra-cell Population Structure Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis A molecular cell atlas of the human lung from single-cell RNA sequencing A cellular census of human lungs identifies novel cell states in health and in asthma Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19 SCANPY: large-scale single-cell gene expression We owe deep gratitude to the study participants, Translational and Clinical Research Center (TCRC) and nursing staff, in particular Grace Holland, RN, Katherine Broderick, RN and Siobhan Boyce, RN, and Kathryn Hall, NP, for sample collection. We thank the Massachusetts General Hospital for institutional support to enable enrollment when access to clinical spaces was limited. We thank the Departments of Emergency Medicine and Medicine for maintaining needed staffing levels during enrollment, when many research funding sources were suspended. We thank Caroline Beakes and Nicole Russell for assistance with data entry, and Jayaraj Rajagopal, Itai Yanai, Patrick Ellinor, and Mark Chaffin for access to processed single-cell RNA-sequencing datasets.