key: cord-0743409-rfege9x3
authors: Juan Guardela, Brenda M.; Sun, Jiehuan; Zhang, Tong; Xu, Bing; Balnis, Joseph; Huang, Yong; Ma, Shwu-Fan; Molyneaux, Philip L.; Maher, Toby M.; Noth, Imre; Michaud, Gaetane; Jaitovich, Ariel; Herazo-Maya, Jose D.
title: 50-gene risk profiles in peripheral blood predict COVID-19 outcomes: A retrospective, multicenter cohort study
date: 2021-06-20
journal: EBioMedicine
DOI: 10.1016/j.ebiom.2021.103439
sha: 180e370e78838d4881e754048d8a22b1f9d688e0
doc_id: 743409
cord_uid: rfege9x3

BACKGROUND: COVID-19 has been associated with Interstitial Lung Disease features. The immune transcriptomic overlap between Idiopathic Pulmonary Fibrosis (IPF) and COVID-19 has not been investigated.
METHODS: We analyzed blood transcript levels of 50 genes known to predict IPF mortality in three COVID-19 and two IPF cohorts. The Scoring Algorithm of Molecular Subphenotypes (SAMS) was applied to distinguish high versus low-risk profiles in all cohorts. SAMS cutoffs derived from the COVID-19 Discovery cohort were used to predict intensive care unit (ICU) status, need for mechanical ventilation, and in-hospital mortality in the COVID-19 Validation cohort. A COVID-19 Single-cell RNA-sequencing cohort was used to identify the cellular sources of the 50-gene risk profiles. The same COVID-19 SAMS cutoffs were used to predict mortality in the IPF cohorts.
FINDINGS: 50-gene risk profiles discriminated severe from mild COVID-19 in the Discovery cohort (P = 0·015) and predicted ICU admission, need for mechanical ventilation, and in-hospital mortality (AUC: 0·77, 0·75, and 0·74, respectively, P < 0·001) in the COVID-19 Validation cohort. In COVID-19, 50-gene expressing cells with a high-risk profile included monocytes, dendritic cells, and neutrophils, while low-risk profile-expressing cells included CD4(+), CD8(+) T lymphocytes, IgG producing plasmablasts, B cells, NK, and gamma/delta T cells.
The same COVID-19 SAMS cutoffs were also predictive of mortality in the University of Chicago (HR:5·26, 95%CI:1·81–15·27, P = 0·0013) and Imperial College London (HR:4·31, 95%CI:1·81–10·23, P = 0·0016) IPF cohorts.
INTERPRETATION: 50-gene risk profiles in peripheral blood predict COVID-19 and IPF outcomes. The cellular sources of these gene expression changes suggest common innate and adaptive immune responses in both diseases.

The COVID-19 pandemic has so far caused more than three million deaths worldwide, mainly due to the development of acute respiratory distress syndrome (ARDS). While autopsy data from patients dying early after ARDS development demonstrate diffuse alveolar damage, endothelial injury, thrombosis, and angiogenesis [1, 2], longer disease courses are associated with features of Interstitial Lung Disease (ILD), including tissue remodeling, fibroblast proliferation, airspace obliteration, micro-honeycombing and extracellular matrix deposition [3, 4]. Moreover, radiological surrogates of lung fibrosis, including sub-pleural reticulation and fibrotic streaks, have also been described in COVID-19 [5]. While an association between COVID-19-induced ARDS and risk for ILD development has been recently suggested [6], no research has focused on immune gene expression profiles shared by COVID-19 and Idiopathic Pulmonary Fibrosis (IPF) patients. That characterization could provide pathophysiological insight into the mechanisms regulating COVID-19-induced pulmonary injury and repair, and could facilitate the identification of molecular predictors of long-term lung damage, mortality, and other relevant outcomes in these patients.

In this work, we hypothesized that a peripheral blood transcriptomic signature known to predict mortality in IPF [7, 8] could also be associated with COVID-19 outcomes. To address that hypothesis, we analyzed transcriptomic data reported by multiple centers enrolling COVID-19 and IPF patients.
Using a previously established bioinformatic pipeline [8], we found a remarkable overlap of an outcome-predicting signature between the two diseases, and single-cell RNA-sequencing (RNA-seq) analyses revealed the cell types accounting for that signature in COVID-19.

In this retrospective, multicentre cohort study, we analyzed gene expression and clinical data from five independent cohorts:
(1) COVID-19 Discovery cohort (N = 8 subjects). Peripheral Blood Mononuclear Cells (PBMC) were obtained twice from three of these subjects, at two different time points during hospitalization. PBMC specimens from patients with COVID-19 were assigned to severe (N = 6) or mild (N = 5) disease groups according to the National Early Warning Score [9] (NEWS; mild < 5, severe ≥ 5) evaluated on the day of blood sampling [10] (PBMC, single-cell RNA-seq data, GEO accession: GSE149689 [10]);
(2) COVID-19 Validation cohort (N = 100 subjects, bulk leukocyte RNA-seq data, GEO accession: GSE157103 [11]). This study was designed to enroll all hospitalized patients older than 18 years of age with a COVID-19 diagnosis who were not anticipated to die imminently;
(3) COVID-19 Single-cell cohort (N = 7 subjects, N = 155 single cells, single-cell RNA-seq data, GEO accession: GSE150728 [12]);
(4) IPF-University of Chicago cohort (N = 45, bulk PBMC, Affymetrix Human Exon 1.0 ST RNA Array data, GEO accession: GSE28221 [7]);
(5) IPF-Imperial College London cohort (N = 55, bulk whole blood, Affymetrix Human Gene 1.1 ST RNA Array data, GEO accession: GSE93606 [13]).

Transcriptomic data collection for all cohorts has been previously described [7, 10–13]. As these are publicly available, de-identified datasets, no institutional review board approval was required.

All analyses were performed in R software (version 4.0.2) [14]. For the COVID-19 Discovery cohort, we used the R package "Seurat" to pre-process the feature-barcode matrices of the single-cell RNA-seq data.
Cells expressing fewer than 200 genes, or in which mitochondrial genes accounted for more than 15% of total gene expression, were excluded. Genes expressed in fewer than 10 cells were also excluded from the analysis. The "NormalizeData" function was used to normalize gene expression levels. The subject-level expression profile was estimated using the average expression level across all cells. For bulk RNA-seq data in the COVID-19 Validation cohort, the Transcripts Per Million (TPM) matrix was analyzed using log(1+TPM) to normalize gene expression levels. For the COVID-19 Single-cell cohort dataset, pre-processed and normalized data were provided directly according to the published report [10].

Evidence before this study
We searched the scientific literature using PubMed to identify studies that use gene expression in peripheral blood for outcome prediction in COVID-19 and Idiopathic Pulmonary Fibrosis (IPF). We used the search terms "COVID-19", "gene expression", "outcome prediction", and "blood", and identified 23 studies. When we added the term Idiopathic Pulmonary Fibrosis (IPF), we found no studies investigating this association.

We have previously identified a transcriptomic signature predictive of IPF mortality in peripheral blood. In this work, we sought to determine whether genomic risk profiles based on 50 genes of this signature could be predictive of COVID-19 outcomes. A 50-gene, high-risk profile predicted ICU admission, need for mechanical ventilation and in-hospital mortality in COVID-19. 50-gene expressing cells with a high-risk profile in COVID-19 mainly included CD14+ monocytes, dendritic cells, and neutrophils, while low-risk profile-expressing cells included CD4+ and CD8+ T lymphocytes, IgG producing plasmablasts, B cells, NK and gamma/delta T cells.

The identification of 50-gene risk profiles in COVID-19, in addition to clinical variables, can facilitate healthcare utilization such as triage of patients to the most appropriate location, reduce hospital length-of-stay, and allow for proper allocation of limited resources. It may also allow the identification of patients that are more likely to respond to COVID-19 targeted therapies.

The Scoring Algorithm of Molecular Subphenotypes (SAMS) was used to identify genomic risk profiles as previously described [8]. SAMS Up and Down scores were calculated in each cohort using the product of two variables: the proportion of genes expected to be increased or decreased per subject (or single cell) and their median normalized expression levels. In this study, we calculated Up and Down scores based on the expression levels of seven increased genes (PLBD1, TPST1, MCEMP1, IL1R2, HP, FLT3, S100A12) and 43 decreased genes (LCK, CAMK2D, NUP43, SLAMF7, LRRC39, ICOS, CD47, LBH, SH2D1A, CNOT6L, METTL8, ETS1, P2RY10, TRAT1, BTN3A1, LARP4, TC2N, GPR183, MORC4, STAT4, LPAR6, CPED1, DOCK10, ARHGAP5, HLA-DPA1, BIRC3, GPR174, CD28, UTRN, CD2, HLA-DPB1, ARL4C, BTN3A3, CXCR6, DYNC2LI1, BTN3A2, ITK, CD96, GBP4, S1PR1, NAP1L2, KLF12, IL7R) from a gene signature previously found to be predictive of IPF mortality [7, 8]. Two non-coding transcripts (SNHG1, C2orf27A) of the original gene signature were excluded because they were not consistently present across COVID-19 datasets.

SAMS was applied as follows:
(1) Gene normalization: the expression of each gene was normalized to the median of all the samples in each independent cohort. This step determines whether the expression of a gene is increased or decreased in a subject or single cell when compared to other subjects or single cells in the same cohort.
(2) Calculation of the proportion of up- and down-regulated genes: given that 50-gene risk profiles are based on seven increased and 43 decreased genes, the proportion of genes expected to be either increased or decreased can be estimated per subject or single cell to calculate Up and Down scores. That is, if a subject or single cell X has five increased genes out of the seven genes expected to be increased, then the proportion of increased genes for this subject or single cell is 0·714.
(3) Sum of the median normalized expression values of increased and decreased genes: the sum of the median normalized expression values is calculated per subject or single cell for the entire set of increased and decreased genes.
(4) Calculation of the product between the sum of normalized expression values and the proportion of increased or decreased genes: for this step, the sum of increased genes calculated in step three is multiplied by the proportion of increased genes calculated in step two. The Down score is obtained in the same way from the decreased genes.

To determine 50-gene risk profiles in the COVID-19 Discovery cohort, samples with Up scores above the median and Down scores below the median value within this cohort were classified as high-risk. Subjects without this pattern of expression were classified as low-risk. In the 50-gene, high-risk group of the COVID-19 Discovery cohort, the lowest Up score (0·41) and the highest Down score (−0·41) were used as cutoffs to identify a 50-gene, high-risk profile (subjects or single cells with Up score >0·41 and Down score <−0·41) in the COVID-19 Validation, COVID-19 Single-cell, IPF-University of Chicago and IPF-Imperial College London cohorts.

Two-sided Fisher's exact test was used to identify differences in disease severity between risk profiles in the COVID-19 Discovery cohort. Categorical and continuous clinical variables were analyzed using two-sided Fisher's exact test and two-sample t-test, respectively. The Area Under the Curve (AUC) was used to assess the prediction accuracy of 50-gene risk profiles for ICU admission, use of mechanical ventilation and in-hospital mortality in the COVID-19 Validation cohort. These patients were followed for 45 days after hospitalization.
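The SAMS steps (1) to (4) and the Discovery-derived cutoff rule described in the methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (the study's analyses were run in R). Two readings are assumed here: expression values are on a log scale, so "normalized to the median" is taken to mean subtracting each gene's cohort median, and the "sum" in step (3) is taken literally as the sum over the whole increased or decreased gene set. All function names and the toy gene labels (UP1, DN1, and so on) are hypothetical.

```python
from statistics import median


def median_normalize(expr):
    """Step 1. expr maps gene -> list of log-scale values, one per sample.
    ASSUMPTION: 'normalized to the median' = subtract the gene's cohort median."""
    out = {}
    for gene, values in expr.items():
        m = median(values)
        out[gene] = [v - m for v in values]
    return out


def sams_scores(norm, up_genes, down_genes):
    """Steps 2-4. Return one (up_score, down_score) pair per sample."""
    n_samples = len(next(iter(norm.values())))
    scores = []
    for i in range(n_samples):
        up_vals = [norm[g][i] for g in up_genes]
        down_vals = [norm[g][i] for g in down_genes]
        # Step 2: share of genes moving in the expected direction.
        prop_up = sum(v > 0 for v in up_vals) / len(up_vals)
        prop_down = sum(v < 0 for v in down_vals) / len(down_vals)
        # Steps 3-4: sum of normalized values times that proportion.
        # ASSUMPTION: the sum runs over the entire gene set.
        scores.append((sum(up_vals) * prop_up, sum(down_vals) * prop_down))
    return scores


def discovery_cutoffs(scores):
    """High-risk in the discovery set: Up above and Down below the medians.
    Cutoffs: lowest Up and highest Down score within that high-risk group."""
    med_up = median([s[0] for s in scores])
    med_down = median([s[1] for s in scores])
    high = [s for s in scores if s[0] > med_up and s[1] < med_down]
    return min(s[0] for s in high), max(s[1] for s in high)


def classify(scores, up_cut, down_cut):
    """Apply fixed cutoffs to another cohort (strict inequalities)."""
    return ["high" if up > up_cut and down < down_cut else "low"
            for up, down in scores]


# Toy discovery data: two 'increased' and two 'decreased' genes, four samples.
toy = {
    "UP1": [1.0, 2.0, 3.0, 4.0],
    "UP2": [0.0, 1.0, 2.0, 5.0],
    "DN1": [4.0, 3.0, 2.0, 1.0],
    "DN2": [5.0, 2.0, 1.0, 0.0],
}
toy_scores = sams_scores(median_normalize(toy), ["UP1", "UP2"], ["DN1", "DN2"])
up_cut, down_cut = discovery_cutoffs(toy_scores)  # (1.0, -1.0) for this toy set
print(classify([(1.2, -1.5), (0.9, -2.0)], up_cut, down_cut))
```

In the study the corresponding cutoffs were 0·41 and −0·41, and a sample with five of the seven increased genes above the cohort median would have a proportion of 5/7 ≈ 0·714, as in step (2). Because the cutoffs are applied with strict inequalities, a sample whose score equals a cutoff is classified as low-risk in this sketch.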
We used logistic regression to determine the relationship between 50-gene risk profiles and studied outcomes after adjusting for age, Charlson comorbidity index, absolute lymphocyte count, corticosteroid therapy and convalescent plasma. To determine whether adding the Charlson index [14] to 50-gene risk profiles could improve outcome prediction in COVID-19, we compared three AUC models (50-gene risk profiles alone, Charlson index alone and 50-gene risk profiles combined with Charlson index) using logistic regression with 10-fold cross-validation. Kaplan-Meier curves were used to evaluate the association between 50-gene risk profiles and mortality in IPF cohorts. Significance was defined as P < 0·05 for all tests.

To determine cell types expressing either 50-gene high- or low-risk profiles in COVID-19, we conducted a cell-type-specific analysis using eight single-cell data measurements from seven subjects with COVID-19 (COVID-19 Single-cell cohort). We estimated the average expression levels of each gene, for each cell type, producing 155 cell-type-specific expression profiles. An Up score >0·41 and a Down score <−0·41 were used to classify 50-gene risk profiles into High- and Low-risk groups. The estimated proportion of specific cell types was compared between risk profiles (High versus Low). The cell type definition and classification have been previously described [12]. We tested the overall difference in cell proportions between high- and low-risk subgroups using a chi-square test.

The funders had no role in study design, data collection, data analyses, interpretation of results, or manuscript writing.

3.1. 50-gene risk profiles in peripheral blood distinguish COVID-19 severity subgroups in a discovery cohort

The COVID-19 Discovery cohort included PBMC samples from eight subjects, three of them (Subjects C3, C6, and C7) with two repeated measurements during hospitalization. These samples were classified as mild (N = 5) and severe (N = 6) COVID-19, based on the NEWS score as previously published [10].
To identify 50-gene risk profiles in this cohort, SAMS Up and Down scores were calculated for each sample. All of the samples with a 50-gene, high-risk profile were classified as severe COVID-19, while 83·3% of samples with a low-risk profile were classified as mild COVID-19 (P = 0·015) (Fig. 1A). Table 1 describes the clinical characteristics of the COVID-19 Discovery cohort. Subjects with a low-risk profile had radiological evidence of pneumonia, while subjects with a high-risk profile had evidence of multifocal pneumonia with ground glass opacities. 50-gene, high-risk samples had significantly higher NEWS scores (mean of 9·2 versus 1·8, P < 0·001) and C-reactive protein (mean of 16·9 mg/dl versus 3·3 mg/dl, P = 0·047), and lower absolute lymphocyte counts (mean of 802 cells/μL versus 1430 cells/μL, P = 0·033), when compared to low-risk samples. Regarding the three subjects with baseline samples and repeated measurements, the high-risk profile remained the same in subject C3 (Table 1) after four days of follow-up, which was associated with an increase in NEWS score from eight to ten. Subjects C6 and C7 (Table 1) changed their 50-gene risk profile from High- to Low-risk during follow-up (mean: 5·5 days), which was associated with a mean decline in NEWS score from seven to four.

To assess the reproducibility of our findings, we analyzed 50-gene risk profiles in the COVID-19 Validation cohort. SAMS cutoffs derived from the COVID-19 Discovery cohort (Up score >0·41 and Down score <−0·41) distinguished High- versus Low-risk subjects in the COVID-19 Validation cohort (Fig. 1B). High-risk subjects in the validation cohort were significantly older (64·8 versus 55 years, P = 0·002) and had higher APACHE-II severity scores (22·5 versus 14·1, P = 0·006), Charlson Comorbidity Index (4 versus 2·3, P < 0·001), C-reactive protein (165·7 mg/l versus 101·3 mg/l, P = 0·003), and ferritin levels (1215·6 ng/ml versus 497 ng/ml, P = 0·002) when compared to low-risk subjects.
They also had lower absolute lymphocyte counts (838·2 versus 1550 cells/μL, P < 0·001) and albumin levels (2·8 mg/L versus 3·2 mg/L, P < 0·001) (Table 2). High-risk subjects were more likely to have a prior history of myocardial infarction (16·9% versus 2·4%, P = 0·02) and were more likely to receive convalescent plasma (32·2% versus 12·2%, P = 0·02) and corticosteroid therapy (64·4% versus 14·6%, P < 0·001). There was no significant difference in the incidence of venous thromboembolism between risk subgroups. A 50-gene, high-risk profile predicted ICU admission (AUC:0·77, 95%CI:0·686–0·844, P < 0·001), mechanical ventilation (AUC:0·75, 95%CI:0·67–0·827, P < 0·001) and in-hospital mortality (AUC:0·74, 95%CI:0·678–0·815, P < 0·001) in the COVID-19 Validation cohort (Table 2). Prediction based on 50-gene risk profiles remained statistically significant (P < 0·05) for each outcome measure after adjusting for age, Charlson index, absolute lymphocyte count, corticosteroid therapy and convalescent plasma use. The addition of the Charlson index to 50-gene risk profiles modestly improved the in-hospital mortality prediction accuracy of the genomic classifier by 3% (AUC went from 0·74 to 0·77) (Table 3). High-risk patients spent more days on mechanical ventilation (21·9 versus 15·5 days, P < 0·001) and had longer hospitalizations (21·1 versus 9 days, P < 0·001) compared to low-risk patients. Only one patient in the 50-gene, low-risk profile group died, while 23 patients in the 50-gene, high-risk profile group died during hospitalization (P < 0·001) (Table 2). All deceased patients in the validation cohort were in severe ARDS and on mechanical ventilation. Refractory respiratory failure was the cause of death in all the patients who died from COVID-19 in the validation cohort.

A COVID-19 Single-cell cohort [11] was used to identify the cellular origin of 50-gene risk profiles.
SAMS cutoffs derived from the COVID-19 Discovery cohort (Up score >0·41 and Down score <−0·41) classified 47 cells with a high-risk profile and 108 cells with a low-risk profile (Fig. 2A). 50-gene expressing cells with a high-risk profile mainly included CD14+ monocytes (16·7%), dendritic cells (16·7%) and neutrophils (16·7%), while 50-gene expressing cells with a low-risk profile mainly included IgG-producing plasmablasts (7·48%), mature (7·48%) and naïve (7·48%) CD4 T cells, CD8 mature T cells (7·48%), B cells (7·48%), NK cells (7·48%), proliferative lymphocytes (7·48%), gamma/delta T cells (6·54%) and interferon-stimulated CD4 T cells (5·41%) (Fig. 2B). Cells with overlapping 50-gene risk profiles included developing neutrophils, stem cells, eosinophils, myeloid cells, CD16 monocytes, platelets, plasmacytoid dendritic cells, and IgA- and IgM-producing plasmablasts. The overall difference in cell proportions between 50-gene risk profiles (High versus Low) was statistically significant (P < 0·001). The full list of 50-gene expressing cells can be seen in Table 4. These findings provide evidence of the cellular source of 50-gene expression changes in peripheral blood and point to specific cell types potentially associated with increased risk of mortality and other poor outcomes in COVID-19.

To determine whether the same SAMS cutoffs used to distinguish a 50-gene, high-risk profile in COVID-19 could also be applied to predict IPF mortality, we reanalyzed peripheral blood 50-gene expression data from two independent IPF cohorts (IPF-University of Chicago and IPF-Imperial College London). An Up score >0·41 and a Down score <−0·41 distinguished 50-gene high- versus low-risk profiles in both IPF cohorts (Fig. 3A and B). 50-gene risk profiles were significantly predictive of mortality in the IPF-University of Chicago (HR:5·26, 95%CI:1·81–15·27, P = 0·0013) and IPF-Imperial College London (HR:4·31, 95%CI:1·81–10·23, P = 0·0016) cohorts (Fig. 3C and D).
These results confirmed our previous findings [7, 8] and indicated an overlapping outcome-associated transcriptomic signature between COVID-19 and IPF.

In this study, we show that a high-risk, 50-gene profile previously shown to predict IPF mortality is also predictive of worse outcomes in COVID-19 patients. The transcriptomic overlap captured in different cohorts and experimental settings suggests a remarkably conserved systemic gene expression signature evoked by COVID-19 and IPF. Moreover, this overlapping profile, combined with the observed pathological and radiological surrogates of pulmonary fibrosis shown by some severe COVID-19 patients, suggests that both diseases share, to some extent, common host response features.

The single-cell RNA-seq data in COVID-19 subjects point to the cells expressing the 50 genes predictive of poor disease outcomes. These data suggest that CD14+ monocytes, dendritic cells and neutrophils are critical regulators of the high-risk profile. In SARS-CoV-2-infected primates, increased circulating levels of classical and non-classical monocytes and neutrophilic migration to the lungs [15] were associated with poor disease outcomes. In humans, reports have shown that severe COVID-19 is associated with elevated numbers of neutrophil precursors and circulating levels of CD14+ monocytes with high expression of alarmins S100A8/9/12 and low expression of HLA-DR [16]. The present analysis is consistent with those data. A recent report also indicates that serum calprotectin, which belongs to the S100 protein family, is associated with IPF diagnosis and correlates with diffusing capacity for carbon monoxide (DLCO) and the composite physiologic index (CPI) [17].
Moreover, previous evidence indicates that S100A9 is elevated in bronchoalveolar lavage fluid from IPF patients in comparison with healthy controls [18], and increased circulating levels of CD14+ monocytes were found to be predictive of mortality in IPF and other fibrotic lung diseases [19]. The single-cell RNA sequencing data show an increased proportion of CD4 and CD8 T lymphocytes and immunoglobulin-producing plasmablasts in individuals with a low-risk genomic profile, suggesting an association between a strong T cell response [20, 21] and better disease outcomes [22]. This finding is consistent with recent data indicating that severe COVID-19 infection induces a distinct inflammatory program characterized by suppression of the innate immune system in the periphery and that milder cases evoke a more robust T cell response [23].

The biomarker and therapeutic implications of this discovery are significant, since the identification of 50-gene risk profiles in COVID-19, in addition to clinical variables, can facilitate healthcare utilization such as triage of patients to the most appropriate location (home, ward, ICU), reduce hospital length of stay, allow for proper allocation of limited resources including mechanical ventilators, and reduce the cost of inappropriate hospitalization. It could also allow the early identification of patients likely to deteriorate and resolve specific transcriptomic sub-phenotypes that are amenable to certain treatments. For example, while corticosteroids are currently recommended for hospitalized COVID-19 patients due to their positive effect on survival [24], the use of corticosteroids in IPF has been controversial due to increased risk of death and hospitalizations associated with immunosuppressive therapy [25].
Thus, the 50-gene, high-risk profile may facilitate the identification of patients that are more likely to respond to COVID-19 targeted therapies such as corticosteroids and others [26], or help identify a subgroup of IPF patients who may benefit from a limited course of corticosteroid therapy. The use of 50-gene risk profiles could also support the rationale to investigate the use of IPF-targeted antifibrotic medications [27, 28] to prevent short- and long-term sequelae of COVID-19. Another important aspect of our study is the ability of SAMS cutoffs derived from the COVID-19 Discovery cohort to identify 50-gene risk profiles predictive of poor outcomes across two additional COVID-19 cohorts and two IPF cohorts, despite different genomic technologies and different starting material (bulk versus single-cell RNA).

Despite the relevance and reproducibility of our findings, we need to acknowledge some limitations of our study. COVID-19 and IPF are diseases with different etiologies. COVID-19, the illness caused by SARS-CoV-2 infection, is characterized by diffuse [1] and extensive alveolar damage, dysmorphic pneumocytes and thrombosis of the lung micro- and macro-vasculature [29]. Poor outcomes in COVID-19 are predominantly driven by the host response to the infection [22]. IPF is a specific form of chronic fibrosing interstitial pneumonia of unknown etiology, limited to the lung and histologically characterized by usual interstitial pneumonia [30]. Accumulating evidence suggests that under genetic predisposition and environmental factors, the fibrotic response seen in IPF is driven by abnormally activated alveolar epithelial cells (AECs) leading to epithelial to mesenchymal transition and activation, proliferation, and differentiation of fibroblasts to myofibroblasts [31].
While our study focuses on predictive features of peripheral blood transcriptomic profiles in COVID-19 and IPF, we did not study the underlying mechanisms triggering this aberrant immune response and its potential relationship to alveolar epithelial cell injury or any other molecular mechanisms shared between COVID-19 and IPF. Future studies could characterize lung autopsy findings in deceased individuals with a 50-gene, high-risk profile, or determine whether COVID-19 survivors with a high-risk profile are more likely to develop chronic ILD changes and a fibroproliferative phenotype. While this is, to our knowledge, the first systematic analysis of the overlapping gene expression signature of COVID-19 and IPF, we believe these data need corroboration in large, prospective trials including more diverse patient populations to generalize our findings. That research could be complemented with an unbiased whole-exome analysis of circulating blood in both diseases, which could uncover other relevant genes associated with poor outcomes. Also, it would be important to determine whether the identification of 50-gene expressing cells in COVID-19 can be replicated in single-cell RNA-seq analyses of IPF patients, which could help define whether the observed overlap is driven by similar cell type distributions in both diseases. Finally, given the retrospective nature of our study, we were limited by the lack of a comprehensive radiological assessment of COVID-19 subjects in both cohorts. Future studies should focus on comparing the radiological characteristics of subjects with a 50-gene high- versus low-risk profile.

In conclusion, peripheral blood 50-gene risk profiles predict ICU admission, need for mechanical ventilation and in-hospital mortality in COVID-19 and overlap with a signature known to predict poor IPF outcomes. The cellular sources of these gene expression changes suggest common mechanisms implicating innate and adaptive immune responses in both diseases.
A 50-gene risk profile test in peripheral blood could be a potentially useful biomarker to predict COVID-19 mortality and morbidity.

BJG, JS and JHM conceptualized and designed the study, collected data, carried out the initial analyses and drafted the initial manuscript. BJG, JS, TZ, BX, JB and JHM performed statistical analyses. JB, YH, SFM, PM, TM, IN, GM and AJ collected data, critically reviewed and drafted the revised manuscript. BJG and JS contributed equally to this work. All authors read and approved the final version of the manuscript. BJG, JS and JHM verified the underlying data.

Gene expression and clinical data have been previously deposited in the Gene Expression Omnibus (GEO) under the following accession numbers: GSE149689, GSE157103, GSE174818, GSE150728, GSE28221 and GSE93606.

JHM has a patent titled "52-gene signature in peripheral blood identifies a genomic profile associated with increased risk of mortality and poor disease outcomes in idiopathic pulmonary fibrosis" that relates to the work presented in this manuscript. IN receives consulting fees from Boehringer Ingelheim, Genentech and Parion Sciences.

[1] Pulmonary post-mortem findings in a series of COVID-19 cases from northern Italy: a two-center descriptive study
[2] Pulmonary vascular endothelialitis, thrombosis, and angiogenesis in Covid-19
[3] Lung fibrosis: an undervalued finding in COVID-19 pathological series
[4] Pathological study of the 2019 novel coronavirus disease (COVID-19) through postmortem core biopsies
[5] CT Features of Coronavirus disease 2019 (COVID-19) pneumonia in 62 patients in Wuhan, China
[6] Pulmonary fibrosis secondary to COVID-19: a call to arms?
[7] Peripheral blood mononuclear cell gene expression profiles predict poor outcome in idiopathic pulmonary fibrosis
[8] Validation of a 52-gene risk profile for outcome prediction in patients with idiopathic pulmonary fibrosis: an international, multicentre, cohort study
[9] The ability of the National Early Warning Score (NEWS) to discriminate patients at risk of early cardiac arrest, unanticipated intensive care unit admission, and death
[10] Immunophenotyping of COVID-19 and influenza highlights the role of type I interferons in development of severe COVID-19
[11] Large-Scale multi-omic analysis of COVID-19 severity
[12] A single-cell atlas of the peripheral immune response in patients with severe COVID-19
[13] Host-microbial interactions in idiopathic pulmonary fibrosis
[14] R: a language and environment for statistical computing
[15] Cellular events of acute, resolving or progressive COVID-19 in SARS-CoV-2 infected non-human primates
[16] Severe COVID-19 is marked by a dysregulated myeloid cell compartment
[17] Serum calprotectin as new biomarker for disease severity in idiopathic pulmonary fibrosis: a cross-sectional study in two independent cohorts
[18] S100A9 in BALF is a candidate biomarker of idiopathic pulmonary fibrosis
[19] Increased monocyte count as a cellular biomarker for poor outcomes in fibrotic diseases: a retrospective, multicenter cohort study
[20] T cell responses in patients with COVID-19
[21] Distinct early serological signatures track with SARS-CoV-2 survival
[22] Viral and host factors related to the clinical outcome of COVID-19
[23] Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans
[24] The RECOVERY Collaborative Group. Dexamethasone in hospitalized patients with COVID-19
[25] Prednisone, azathioprine, and N-acetylcysteine for pulmonary fibrosis
[26] Precision medicine for COVID-19: a call for better clinical trials
[27] A phase 3 trial of pirfenidone in patients with idiopathic pulmonary fibrosis
[28] Efficacy and safety of nintedanib in idiopathic pulmonary fibrosis
[29] Persistence of viral RNA, pneumocyte syncytia and thrombosis are hallmarks of advanced COVID-19 pathology
[30] European Respiratory Society International multidisciplinary consensus classification of the idiopathic interstitial pneumonias
[31] The leading role of epithelial cells in the pathogenesis of idiopathic pulmonary fibrosis

This work is dedicated to those who lost their lives due to COVID-19.

This work was supported by the Ubben Pulmonary

Supplementary material associated with this article can be found in the online version at doi:10.1016/j.ebiom.2021.103439.