key: cord-0737788-8n8pnoxs authors: Delorey, Toni M.; Ziegler, Carly G. K.; Heimberg, Graham; Normand, Rachelly; Yang, Yiming; Segerstolpe, Asa; Abbondanza, Domenic; Fleming, Stephen J.; Subramanian, Ayshwarya; Montoro, Daniel T.; Jagadeesh, Karthik A.; Dey, Kushal K.; Sen, Pritha; Slyper, Michal; Pita-Juárez, Yered H.; Phillips, Devan; Bloom-Ackerman, Zohar; Barkas, Nick; Ganna, Andrea; Gomez, James; Normandin, Erica; Naderi, Pourya; Popov, Yury V.; Raju, Siddharth S.; Niezen, Sebastian; Tsai, Linus T.-Y.; Siddle, Katherine J.; Sud, Malika; Tran, Victoria M.; Vellarikkal, Shamsudheen K.; Amir-Zilberstein, Liat; Atri, Deepak S.; Beechem, Joseph; Brook, Olga R.; Chen, Jonathan; Divakar, Prajan; Dorceus, Phylicia; Engreitz, Jesse M.; Essene, Adam; Fitzgerald, Donna M.; Fropf, Robin; Gazal, Steven; Gould, Joshua; Grzyb, John; Harvey, Tyler; Hecht, Jonathan; Hether, Tyler; Jane-Valbuena, Judit; Leney-Greene, Michael; Ma, Hui; McCabe, Cristin; McLoughlin, Daniel E.; Miller, Eric M.; Muus, Christoph; Niemi, Mari; Padera, Robert; Pan, Liuliu; Pant, Deepti; Pe’er, Carmel; Pfiffner-Borges, Jenna; Pinto, Christopher J.; Plaisted, Jacob; Reeves, Jason; Ross, Marty; Rudy, Melissa; Rueckert, Erroll H.; Siciliano, Michelle; Sturm, Alexander; Todres, Ellen; Waghray, Avinash; Warren, Sarah; Zhang, Shuting; Zollinger, Daniel R.; Cosimi, Lisa; Gupta, Rajat M.; Hacohen, Nir; Hide, Winston; Price, Alkes L.; Rajagopal, Jayaraj; Tata, Purushothama Rao; Riedel, Stefan; Szabo, Gyongyi; Tickle, Timothy L.; Hung, Deborah; Sabeti, Pardis C.; Novak, Richard; Rogers, Robert; Ingber, Donald E.; Jiang, Z. Gordon; Juric, Dejan; Babadi, Mehrtash; Farhi, Samouil L.; Stone, James R.; Vlachos, Ioannis S.; Solomon, Isaac H.; Ashenberg, Orr; Porter, Caroline B.M.; Li, Bo; Shalek, Alex K.; Villani, Alexandra-Chloé; Rozenblatt-Rosen, Orit; Regev, Aviv title: A single-cell and spatial atlas of autopsy tissues reveals pathology and cellular targets of SARS-CoV-2 date: 2021-02-25 journal: bioRxiv DOI: 10.1101/2021.02.25.430130 sha: 9f0155b6377b2492795b32d6bfdface6e1f04ca3 doc_id: 737788 cord_uid: 8n8pnoxs The SARS-CoV-2 pandemic has caused over 1 million deaths globally, mostly due to acute lung injury and acute respiratory distress syndrome, or direct complications resulting in multiple-organ failures. Little is known about the host tissue immune and cellular responses associated with COVID-19 infection, symptoms, and lethality. To address this, we collected tissues from 11 organs during the clinical autopsy of 17 individuals who succumbed to COVID-19, resulting in a tissue bank of approximately 420 specimens. We generated comprehensive cellular maps capturing COVID-19 biology related to patients’ demise through single-cell and single-nucleus RNA-Seq of lung, kidney, liver and heart tissues, and further contextualized our findings through spatial RNA profiling of distinct lung regions. We developed a computational framework that incorporates removal of ambient RNA and automated cell type annotation to facilitate comparison with other healthy and diseased tissue atlases. In the lung, we uncovered significantly altered transcriptional programs within the epithelial, immune, and stromal compartments and cell intrinsic changes in multiple cell types relative to lung tissue from healthy controls. We observed evidence of: alveolar type 2 (AT2) differentiation replacing depleted alveolar type 1 (AT1) lung epithelial cells, as previously seen in fibrosis; a concomitant increase in myofibroblasts reflective of defective tissue repair; and, putative TP63(+) intrapulmonary basal-like progenitor (IPBLP) cells, similar to cells identified in H1N1 influenza, that may serve as an emergency cellular reserve for severely damaged alveoli. Together, these findings suggest the activation and failure of multiple avenues for regeneration of the epithelium in these terminal lungs. SARS-CoV-2 RNA reads were enriched in lung mononuclear phagocytic cells and endothelial cells, and these cells expressed distinct host response transcriptional programs. We corroborated the compositional and transcriptional changes in lung tissue through spatial analysis of RNA profiles in situ and distinguished unique tissue host responses between regions with and without viral RNA, and in COVID-19 donor tissues relative to healthy lung. Finally, we analyzed genetic regions implicated in COVID-19 GWAS with transcriptomic data to implicate specific cell types and genes associated with disease severity. Overall, our COVID-19 cell atlas is a foundational dataset to better understand the biological impact of SARS-CoV-2 infection across the human body and empowers the identification of new therapeutic interventions and prevention strategies. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes coronavirus disease 2019 , which has resulted in over 1 million deaths globally as of November 2020 (https://covid19.who.int/). The variable host immune response to infection can result in a range of clinical outcomes spanning from an asymptomatic state, to severe illness, organ failures, and death. The vast majority of deaths are due to acute lung injury and acute respiratory distress syndrome (ARDS), or direct complications thereof that can lead to multiple organ failure [1] [2] [3] [4] . Progression to ARDS is thought to reflect: a combination of increasing viral load; cytopathic effects; translocation of virus into pulmonary tissue; and, inappropriate or insufficient host immune responses 1-6 . Collectively, this contributes to clinical deterioration in the acute phase of systemic illness, leading to ineffective viral clearance and collateral tissue damage-which can impact the lung 1-8 , gastrointestinal tract 5 , kidney 6, 7 , liver 3,6 , vasculature 8 , heart 2,6,9,10 and brain 6,11-14 -resulting in single or multi-organ failure [1] [2] [3] [4] . While clinical knowledge of severe COVID-19 is developing rapidly, our molecular and cellular understanding of disease pathogenesis and pathophysiology remains limited. Current data indicate that severe COVID-19 is accompanied by an inappropriate host immune response, characterized by cytokine storms involving pro-inflammatory cytokines (TNF-α, IL-1 and IL-6), chemokines (IL-8) and a diminished antiviral interferon response [15] [16] [17] [18] Addressing these questions using relevant human tissue sources will be essential to inform the identification of new therapeutic targets and prevention strategies. It has been challenging to tackle these questions as it is rare to obtain samples from essential affected organs of living patients. Autopsies are thus a critical path to gaining knowledge of severe COVID-19 pathology [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] , but are usually followed by established molecular histopathology approaches relying on marker preselection, such as immunohistochemistry, which may limit discovery of novel molecular insights into COVID-19 pathogenesis. Conversely, comprehensive genomic studies remain challenging as autopsy specimens are often collected after a prolonged post-mortem interval (PMI), which can lead to degradation of biomolecules. Furthermore, the heterogeneous composition of, and unpredictable cellular responses within, post-mortem tissues require high resolution analyses that can distinguish individual cells and their features. To overcome these challenges, we leveraged ongoing clinical COVID-19 autopsy efforts across three major Boston area hospitals to orchestrate a concerted study and assemble a comprehensive biobank of approximately 420 autopsy specimens spanning 11 organs from 17 donors who succumbed to COVID-19. We then utilized recently developed methods for single-nucleus RNA-Seq (snRNA-Seq) from frozen specimens 29, 30 and for spatial RNA profiling from formalin fixed paraffin embedded (FFPE) tissues 31 , and developed and applied analytical strategies to overcome ambient RNA in autopsy specimens and relate annotations across datasets. Together, this enabled us to generate comprehensive single-cell/single-nucleus tissue atlases of lung, kidney, liver and We generated single-cell/single-nucleus atlases of the lung (n=16 individuals, k=106,792 cells/nuclei), heart (n=15, k=36,662), liver (n=16, k=47,001) and kidney (n=11, k=29,568). We initially tested both scRNA-Seq and snRNA-Seq on tissue samples collected from COVID-19 autopsies through methods we recently established for analyzing human tumor biopsies and resections 29, 30 (Methods) . In these pilots, snRNA-Seq, which is well-suited for processing hardto-dissociate or damaged tissues, performed better in systematically capturing the complex tissue cellular ecosystem (Supplemental Fig. 1 and data not shown) and was thus chosen for profiling most COVID-19 autopsy samples. We developed a computational pipeline (Fig. 1c) to tackle key technical challenges posed by this dataset. These include: (1) ambient RNA contamination, which can reduce the cell type-specificity of transcriptional profiles; (2) the need to efficiently and uniformly process a large dataset across tissues; and, (3) the inclusion of diverse tissues with many cell types, which we needed to annotate systematically to allow comparisons between donors and to existing reference atlases. First, the long PMI associated with many autopsies in this cohort likely promoted significant cell death and tissue damage. As a result, we observed substantial amounts of ambient RNA in our sc/snRNA-Seq profiles, reflected by reduced separation of cell subsets in low dimensionality projections relative to what is observed in typical tissue atlases (e.g., Fig. 1d) , and in lower cell type specificity for known marker genes (e.g., Fig. 1e, Supplemental Fig. 2) . To address this, we used CellBender remove-background 32 (Methods) , which removes ambient RNA and likely empty (non-cell) droplets from droplet-based sc/snRNA-Seq based on a principled generative model of the various . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint technical errors and contaminants in such data. This improved separation of cell subsets and increased the specificity of marker gene expression among key cell subsets (e.g., AT1 and AT2 cells in lung, Fig. 1d,e, Supplemental Fig. 2) . Second, to efficiently and systematically process our data, we relied on the cloud-based platform Cumulus 33 , which provides a standard pipeline for processing sc/snRNA-Seq data at scale with cloud computing resources. Cumulus generated gene-count matrices, filtered low quality cells, reduced dimensions, clustered, and generated UMAP visualizations of the data. The samplespecific results allowed rapid sample quality control and preparation for integrated analyses. Third, we devised an automated approach to annotate individual cells/nuclei by type through transferring labels from previously annotated datasets of matched tissues from diverse sources (e.g., healthy or other diseases; different organ regions; single-cell or single-nucleus) to our unlabeled expression profiles. Briefly, we trained a logistic regression classifier on individual expression profiles from public sc/snRNA-Seq datasets of matched tissues to assign cell types to individual cells/nuclei without cell clustering or prior knowledge of marker genes (Fig. 2a, Methods). To further refine these cell type labels, we also performed unsupervised sub-clustering on batch-corrected integrated data for each of the main cell lineages and manually annotated subclusters using known lineage markers and established gene signatures ( Fig. 2b-g, Supplemental Fig. 3, 4, 5, Methods) . The automated annotation approach allowed us to compare cells/nuclei to other data resources, including atlases of the same tissues in health and disease, and was conducted on a per-cell/nucleus basis (without the need for batch correction). Our manual annotation, meanwhile, enabled us to refine cell identity assignments with detailed domain knowledge to . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint generate richer labels and describe clusters of cell states that may be specific to COVID-19 hostimmune response and thus would not be readily captured through automated annotation using non-COVID-19 atlases. We assembled a COVID-19 lung atlas comprised of 106,792 sc/snRNA-Seq profiles (99, 735 single nucleus, 4,608 cryopreserved single cells, and 2,449 freshly processed single cells) from 24 lung tissue samples from 16 COVID-19 deceased donors (Fig. 1a, Fig. 2, Supplemental Fig. 3 ). were generally well-mixed across donors and lab protocols (Supplemental Fig. 3a -c, adjusted rand index between sample and cluster at 3.8%). Thus, despite variation between donors, intrinsic cell profiles generalized well across samples, and all major parenchymal, endothelial, and immune cell subsets were captured. We defined 28 cell subsets based on annotations automatically transferred from six sc/snRNA-Seq lung datasets (for cell types present in at least two datasets), spanning different lung regions from healthy donors and from donors with lung fibrosis, with the classifier capturing canonical cell type markers well (Fig. 2a Table 3 ). The annotations were robust by cross-validation (~80% accuracy on two held out datasets), and cell type mis-assignments were typically between cell subtypes from the same lineage. For finer annotations, especially of cell states that may not have been observed in the reference datasets (e.g., in immune and endothelial cells), we partitioned cells (after batch correction) into six main cell subgroupings -epithelial cells, endothelial cells, fibroblasts, myeloid cells, T and NK cells, and B and plasma cells -and annotated each by sub-clustering and manual analysis ( Fig. 2b-g, Supplemental Fig. 4b-e, Methods) . . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02. 25.430130 doi: bioRxiv preprint In the immune subgroupings, among the 24,417 myeloid cells, we distinguished six cell subsets: CD14 high CD16 high inflammatory monocytes expressing transcripts with antimicrobial properties (e.g., LYZ, S100A6); and five macrophage subsets enriched for distinct key immune genes (Fig. 2b, Supplemental Fig. 4d, 5a) , including either scavenger receptors (e.g., CD163, STAB1), tolllike receptor ligands (e.g., VCAN) and inflammatory transcriptional regulators (e.g., LDB2, YAP1); or genes associated with metabolism (e.g. DHFR, INO80D) and higher mtRNA reads. Among the 9,950 T and NK cells, we annotated six subsets including: two CD4 + subsets, including a regulatory T cell subset expressing FOXO1 and ANK3, and a metabolically active subset enriched for DHFR expression and high mtRNA reads; one CD8 + subset; and, two T/NK cell subsets ( Fig. 2c, Supplemental Fig. 4d, 5b) , including one enriched for cytotoxic effector genes (e.g., GNLY, PRF1). In addition to the subsets, there were three doublet clusters, one containing CD4 + T cells, In the stromal grouping, among 21,391 endothelial cells (ECs), we identified seven cell subsets ( Fig. 2e) : Three were annotated using known signatures 35 as arterial endothelial cells (cluster 5), venous endothelial cells (cluster 6) and lymphatic endothelial cells (cluster 7, Supplemental Fig. 4d, 4e) . The remaining four subpopulations were identified using imputation and comparison to . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02. 25.430130 doi: bioRxiv preprint other lung and endothelial cell atlases 35, 36 (Supplemental Fig. 4f) . They included: a capillary aerocyte subpopulation (cluster 2) based on expression of EDNRB, HPGD, CA4, and PRKG1; a capillary EC-1 population (cluster 1) by VWF and PTPRB; a capillary EC-2 population (cluster 3) by RUNX1, CD44, and mitochondrial genes, a pattern typically observed in states of cellular stress or dying cells; and a EC subpopulation (cluster 4) whose gene expression signature overlaps with multiple EC subpopulations in the healthy lung reference data and is enriched with predicted doublets (Methods). This unidentified subpopulation was also rich in whole cells (fresh and cryopreserved) compared to the more specific nuclear preparation of the other EC subpopulation (Supplemental Fig. 4g ). Within the fibroblast lineage, among 20,925 cells, we identified fibroblasts, proliferative fibroblasts, myofibroblasts, and doublet clusters 35 (Fig. 2f, Supplemental Fig 4d) . In the epithelial compartment, among 21,700 cells, we observed eight cell clusters, including club/secretory cells, AT1 cells, AT2 cells, a proliferative state of the AT2 cells, and three sets of cell doublets (Fig. 2g) . The eighth cluster corresponds to an intermediate cell state (KRT8+ PATS/ADI/DATP) previously described as a transitional state from AT2 cells to AT1 cells during alveolar regeneration [37] [38] [39] (Fig. 2g, Supplemental Fig. 4h) , which we discuss below. Deconvolution of bulk RNA-Seq profiles from the same lung samples largely agreed with our cell type classifications, suggesting that our snRNA-Seq reflects tissue composition. We first used sn/scRNA-Seq profiles to deconvolve bulk RNA-Seq and infer relative composition across 11 major cell subsets (Supplemental Fig. 5c-e, Methods) . Most samples had at least 25% inferred fibroblast content, which may indicate lung fibrosis (Supplemental Fig. 5c) , and substantial variation in inferred myeloid cell composition, which was consistent between replicates but varied from 4-42% between samples. Conversely, the inferred proportion of epithelial and endothelial cells was often lower (0.05 -38% and 0.03 -43% respectively). Overall, predicted cell type composition was broadly consistent with sc/snRNA-Seq, identifying the same cell types as most prevalent, with the exception of T+NK cells that we inferred at low proportions in bulk RNA-Seq, but present at ~10% on average in sc/snRNA-Seq (Supplemental Fig. 5e) . Importantly, our two annotation strategies were highly coherent, such that 94% of the lineage assignments matched between the manual and automatic annotations (Fig. 2h, Supplemental Fig. 5a , 5b). Deconvolution of bulk RNA-Sequencing data also largely agreed with sc/snRNA-Seq cell type classifications (Supplemental Fig. 5c-5e, Methods) . Manual annotations were particularly important for resolving specific cell states not in the classification model, such as the pre-alveolar type-1 transitional cell state (PATS), TP63 + intrapulmonary basal-like progenitor cells, and macrophage and endothelial cell subsets; whereas the automatic annotations allowed us to next compare cell compositions to non-COVID-19 lung atlases. Although most cell subsets were present in all samples (Supplemental Fig. 3a,b) , there was variability in the cellular composition of the lung samples across donors. For example, the proportion of lymphoid cells ranged from 0.4% to 33% across snRNA-Seq samples between donors, and epithelial cells ranged from 3.8% to 52% (Supplemental Fig. 6a,b) . These two . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made compartments were strongly anti-correlated (though it is not possible for us to distinguish in such analyses between causal relations and the expected zero-sum game due to sampling). We also compared the cellular composition of COVID-19 lungs to that of normal lung, by contrasting against a snRNA-Seq lung dataset from a matching tissue region, generated using similar lab protocols and profiling technology in healthy deceased donors (MS, ORR, AR, unpublished data), and using the automatic cell type classification model to transfer labels between the two studies ( Fig. 3a, Methods) . The largest change was a significant decrease in the proportion of AT2 cells (p-value = 2*10 -16 ), dropping from 40% of cells (the largest population) in healthy lung lobes to 10% in COVID-19 samples (Methods). As AT2 cells were previously identified as likely targets of SARS-CoV-2 [40] [41] [42] , this may reflect widespread, virally-induced AT2 cell death. Conversely, immune cells -including dendritic cells (p-value=0.001), macrophages (p-value = 6.4*10 -11 ), and NK cells (p-value = 0.008) -all increased in their relative abundance in severe COVID-19. Fibroblasts (p-value = 0.0041), lymphatic endothelial cells (p-value = 0.0001), and vascular endothelial cells (p-value = 2.3*10 -5 ) were also captured at a greater relative abundance in COVID-19 lungs. To identify cell intrinsic changes in expression associated with severe COVID-19, we created a lung meta-atlas of ~1,280,000 cells and nuclei from 14 studies (Supplemental Table 2 ), spanning healthy and COVID-19 infected lungs, sampled by biopsy or bronchial alveolar lavage (BAL) and profiled using scRNA-Seq or snRNA-Seq on the 10x Chromium platform. After dataset . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint aggregation, we applied our automated cell annotation to classify each of the 1.28M cells into the same 28 classes identified in our severe COVID-19 lung atlas. We then used a linear regression model to identify differentially expressed genes between COVID-19 and healthy tissues for each abundant cell type (Supplemental Table 4 , Methods). Differential gene expression showed widespread transcriptional changes in key COVID-19 cell types (Fig. 3b,c) . CD16 + monocytes had 1,580 genes with elevated gene expression levels in the late stages of COVID-19. Lymphatic endothelial (578 genes with elevated expression), vascular endothelial (317), AT2 (309), and AT1 (307) cells also showed large transcriptional changes. Some cell types, such as ciliated cells, had very few genes that significantly increased expression (38) , showing less involvement in the infection's late stages. Within AT2 cells, there was higher expression of genes associated with host viral response (Fig. 3c ), including programmed cell death genes (e.g., STAT1, p-value = 1.6*10 -43 ), and inflammation and adaptive immune response genes (e.g., EREG with p-value = 3.2*10 -14 , CTSE with p-value = 2.1*10 -25 , TNFSF13B with p-value = 2.5*10 -7 , and SAMD9 with p-value = 8.1*10 -14 ). Gene sets associated with cell migration (e.g., ITGA3 with p-value = 1.3*10 -11 , AGRN with p-value = 1.6*10 -14 ) and damage response (e.g., TAOK1 with p-value = 1.6*10 -9 ) were also more highly expressed in AT2 cells from COVID-19 lung tissue. By comparison, expression of genes linked to proliferation and apoptosis regulation (e.g., CD63 with p-value = 7.5*10 -16 , Supplemental Table 4 ) were significantly reduced in COVID-19 lung tissue. Expression of lung surfactant genes were also reduced in COVID-19 lung tissue (SFTPD, SFTPC, SFTPA1, SFTPB, SFTA2, SFTPA2), with . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 38 is also observed in vivo. Notably, we saw an increase in the expression of the PATS program signature in epithelial cells in COVID-19 lungs compared to healthy lungs (p-value < 2.2*10 -16 , one-sided Mann-Whitney U test), consistent with prior studies showing that this progenitor program is induced during lung injury [37] [38] [39] (Fig. 3d, 3e) . Interestingly, these studies all reported that this intermediate state expands in lung diseases, such as idiopathic pulmonary fibrosis (IPF), and lung fibrosis has been documented in patients with severe COVID-19 43, 44 . These studies also highlighted the expansion of myofibroblasts, which we also observe in COVID-19 lungs (Fig. 3a) . Additionally, we detected a subset of cells among those that express the PATS program which shares the expression of PATS markers (KRT8/CLDN4/CDKN1A) but also expresses KRT5, TP63, and KRT17, which are not canonical features of the PATS program (Supplemental Fig. 4h , top Supplemental Table 5) . Notably, these cells are also distinct from KRT5 + /TP63 + airway basal cells (Supplemental Fig. 4h , bottom). We thus hypothesize that these cells may represent TP63 + intrapulmonary basal-like progenitor cells (IPBLP), which were previously identified in H1N1 influenza 45 , and are thought to be an emergency cellular reserve for severely damaged alveoli (Fig. 3e, 3f) 46 . Compared to airway basal cells, the putative IPBLP cells are characterized by the expression of numerous interferon viral-defense pathway genes (IFI27, IFITM1, IFITM2, IFITM3, IFI6, ISG15, BST2) as well as genes involved in the differentiation of progenitor cells (S100A11, PPDPF, S100A16, TNFRSF12A). . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made To determine SARS-CoV-2 viral load and its association with host responses, we examined the donor and cell type-specific distribution of reads that aligned to the SARS-CoV-2 genome ( Fig. 4a-4m, Supplemental Fig. 7, Methods) . Here, we found substantial variation between donors: in 13 donors (14 samples), we detected at least one SARS-CoV-2-aligned unique molecular identifier (UMI) (from 0 -4,731 distinct viral UMI per lung, Fig. 4a, Supplemental Fig. 7a ). Viral-aligning UMI spanned the entire SARS-CoV-2 genome, and were biased toward positive-sense alignments, with a few cells containing reads aligning to all 28 viral segments, including the negative strand (Supplemental Fig. 7d) , which may indicate productive infection. Notably, viral detection from single-cell or single-nucleus transcriptomic data was not driven by confounding technical factors on a per-cell or per-sample basis (Supplemental Fig. 7e-7h) . This inter-donor variation reflected the SARS-CoV-2 burden in the tissue microenvironment, as estimated by SARS-CoV-2 RT-PCR on bulk RNA from directly adjacent biopsy-sized tissue pieces (Fig. 4b, Supplemental Fig. 7i, 7j ). Importantly, viral load (number of SARS-CoV-2 copies/ng RNA by RT-PCR) in lung parenchyma was negatively correlated to the time interval between donor-reported symptom onset and death (spearman's rho = -0.68, p-value = 0.005, Fig. 4f) , consistent with previous reports 47, 48 . Total viral burden per sample (Fig. 4a, 4b , including ambient RNA, Methods) was associated with significant changes in lung-resident cell populations. In particular, there was significant positive correlation between viral burden and the proportion of mast cells (Bonferroni-corrected p-value (q) = 0.012, Pearson's r = 0.67), VCAN high FCN1 high macrophages (q = 0.015, r = 0.66), CD163 high MERTK high macrophages (q = 0.021, r = 0.65), LDB2 high OSMR high YAP1 high macrophages . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Fig. 7k-q) . Differential expression analysis of bulk RNA-Seq profiles from adjacent lung biopsies comparing "highly infected" (viral RNA copies/ng total RNA > 5*10 3 ; samples: D7, D10, D11) and "uninfected" (< 1.5 viral RNA copies/ng total RNA; samples: D4, D6, D8, D15, D17) samples (Methods), highlighted viral response and innate immune processes. Specifically, genes upregulated (log2FC > 1.4, Wald test, FDR-corrected p-value < 0.05) in highly infected specimens included those associated with host response to virus (IFIT1, ISG15, IFIT3, APOBEC3A, and PRF1), and innate immune and effector responses (LAG3, GZMB, GREM, CCL7, and SEMA4A) (Supplemental Fig. 7r) . These genes were enriched (Kolmogorov-Smirnov statistic, FDR q-value = 3.12E*10 -6 ) with genes upregulated in post-mortem lung tissue of COVID-19 donors compared to uninfected biopsies 49 (e.g., ISG15, IFIT1, IFIT3, GBP5, CCL7, APOBEC3A). Genes significantly down-regulated (log2FC < 1.4, Wald test, FDR-corrected p-value < 0.05) in highly infected lung tissue were involved in lung fibrosis, surfactant metabolism dysfunction, and lamellar body function (SFTPA1, SFTPA2, SFTPC, LAMP3), which serve as secretory vesicles in AT2 cells 50, 51 . To evaluate viral genetic diversity, we performed metagenomic sequencing on bulk RNA collected from adjacent biopsy-sized tissue from 16 donors. We assembled nine complete genomes from the six unique donors with the highest viral loads. When compared to a previously-published database of SARS-CoV-2 genomes from Massachusetts between January-May 2020, the six genomes derived from this cohort are broadly distributed across local circulating lineages (Supplemental . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Fig. 7s ) and all carry the "G" allele at D614G, which represents the dominant variant observed in Massachusetts. Finally, metagenomic classification of reads from bulk genomic data did not identify co-infections from common respiratory pathogens such as influenza, rhinovirus, enterovirus, metapneumovirus, or other seasonal coronavirus strains (Supplemental Fig. 7t) , confirming the expected clinical history. Cell types with SARS-CoV-2 RNA+ cells spanned myeloid cells, B and plasma cells, epithelial cells, fibroblasts, pericytes and T/NK cells, with only myeloid cells being significantly enriched ( Fig. 4g-4i, Supplemental Fig. 7u-7w) . Notably, the presence of SARS-CoV-2 RNA+ cells did not overlap with co-expression of the SARS-CoV-2 entry factors ACE2 and TMPRSS2, nor did they correspond with an expanded set of novel or hypothesized proteases and cofactors thought to contribute to SARS-CoV-2 entry, including FURIN, CTSL, PCSK7, PCSK2, PCSK5 and/or NRP1 ( Fig. 4g, 4h) . We attribute this finding to the lack of expression of ACE2 among myeloid subsets (the largest contributor of SARS-CoV-2 RNA+ cells), as well as the relative loss of viable epithelial cells (previously identified as highest expressors of ACE2 among lung resident cell types) among the donors with highest abundances of SARS-CoV-2 RNA (e.g., D10 and D11, Supplemental Fig. 8a-d) . To define SARS-CoV-2 RNA+ cells, we corrected for the variable amount of SARS-CoV-2 reads within the ambient RNA pool with a virus-specific ambient RNA removal step (combining CellBender with a previously described approach 52 to achieve a more stringent cutoff for cellassociated viral RNA, Methods). Following correction and filtering, we retained seven samples . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made (donors: D3, D6, D7, D10, D11, D13 and D16, all from snRNA-Seq data) ( Fig. 4c-e, Supplemental Fig. 7b, 7c) , and computed an enrichment score to quantify the proportional abundances of SARS-CoV-2 RNA+ cells within each manually-annotated cell type category ( Fig. 4h-i, Supplemental Fig. 7u, Methods) . Myeloid cells were enriched for SARS-CoV-2 RNA+ cells ( Fig. 4i , 158 SARS-CoV-2 RNA+ cells, adjusted FDR < 0.001). Mast cells, B cells, and plasma cells had an elevated enrichment score compared to other lineages, but these were not significant (Fig. 4i) . Viral RNA+ cells were also found among many other cell types, including endothelial, fibroblasts, AT1, AT2, T/NK, pericytes and myofibroblasts, but none were significantly enriched (Fig. 4h,i) . Moreover, cells containing UMIs aligning to the viral negative strand, SARS-CoV-2 UMI/ cell > 100, and SARS-CoV-2 genomic features/cell > 20 -all metrics associated with high confidence in cell-associated viral RNA -were largely found among myeloid cells, and rarely within each of the other listed cell types. Viral enrichment scores varied among the six myeloid subsets (Fig. 4j,k, Supplemental Fig. 7v, 8e ). In particular, SARS-CoV-2 RNA+ cells were significantly enriched within "Inflammatory monocytes CD14 high CD16 high " (adjusted FDR < 0.001, 40 cells, represented by 5 donors) and "Macrophages LDB2 high OSMR high YAP1 high " (adjusted FDR < 0.009, 27 cells, 5 donors) (Fig. 4j) . When calculating enrichment scores in each donor individually (to account for different cell subset proportions across donors), we found large inter-individual variability (Supplemental Fig. 8c- RNA containing cells were often present in multiple myeloid subsets within the same sample ( Fig. 4j,k, Supplemental Fig. 7v, 8e) . Endothelial cells were the next most abundant cell type containing SARS-CoV-2 RNA+ cells (49 cells, 5 donors), but were not significantly enriched (Fig. 4i) . Within endothelial cells, the mixed population subset (cluster 3, adjusted FDR < 0.007, 16 cells, 4 donors) and lymphatic endothelial cells (cluster 7, adjusted FDR < 0.001, 9 cells, 3 donors) were enriched compared to other endothelial subsets (Fig. 4l,m and Supplemental Fig. 7w) . Within individual donors, lymphatic endothelial cells were enriched in SARS-CoV-2 RNA+ cells in one donor (D10: adjusted FDR < 0.014, 6 cells) and the capillary 2 subset in two others (D7: adjusted FDR < 0.001, 4 cells, D11: adjusted FDR < 0.001, 9 cells) (Supplemental Fig. 8f ). Next, we tested whether SARS-CoV-2 RNA+ cells had distinct transcriptional programs compared to RNA-counterparts in the same sample and cell type, focusing on subsets with a large enough number of SARS-CoV-2 RNA+ cells (myeloid, epithelial, T/NK, B/Plasma, and endothelial cells, and fibroblasts and select sub-clusters; Methods). We found significantly differentially expressed genes (FDR-corrected p-value < 0.05; Methods) between SARS-CoV-2 RNA+ and SARS-CoV-2 RNA-cells in epithelial and myeloid cells, and the "Macrophages PPARG high CD15L high " and "Inflammatory monocytes CD14 high CD16 high " subsets (Supplemental Table 6 ). Genes upregulated in epithelial SARS-CoV-2 RNA+ cells (compared to SARS-CoV-2 RNAepithelial cells in matched donors) were enriched for TNF signaling, chemokine/cytokine . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made signaling, SARS-CoV-2 driven cell responses following in vitro infection 49 , AP1 signaling and pathways involved in keratinization, which may reflect pneumocyte specific responses to injury, and included key chemokines (e.g., CXCL2 and CXCL3), early-response genes, and transcriptional regulators (e.g., CREB5, JUN, EGR1, ZBTB10, NFKBIA and TCIM) (Supplemental Fig. 8g ). Genes upregulated in myeloid SARS-CoV-2 RNA+ cells were enriched for chemokine and cytokine signaling, and responses to interferon, TNF, intracellular pathogens, and viruses (Methods, Supplemental Table 6 , Supplemental Fig. 8h) , and included CXCL11, CXCL10, CCL8, CCL18, and other interferon-responsive factors (e.g., ISG15, GBP5, TNFAIP6, GBP1 and NFKB1) (Fig. 4n ,o, Supplemental Table 6 ). Within "Inflammatory monocytes CD14 high CD16 high ", CXCL11, CCL18, CXCL10, TNFAIP6, ISG15, and GBP1 were upregulated in SARS-CoV-2 RNA+ cells (Fig. 4p, Supplemental Table 6 ), and among "Macrophages PPARG high CD15L high ", NT5DC1, NCF1, FLT1, CXCL11 and TNFAIP6 were all upregulated (Supplemental Table 6 ). To put our single-cell atlas in the tissue context and evaluate spatial expression patterns in autopsy tissues, we used Nanostring GeoMx Digital Spatial Profiling (DSP) for highly multiplexed proteomic or transcriptomic profiling from user-defined regions of interest (ROIs) (Methods). We profiled a total of fourteen lung tissues: nine COVID-19 donors also studied by snRNA-Seq (D8, D10, D11, and D12; LUL and D13-D17; upper lobe (UL)) and five LUL tissues from COVID-19 donors with no accompanying snRNA-Seq (D9, D18-D21; LUL). We also profiled right upper lobe (RUL) from D18; left lower lobe (LLL) from D19; heart left ventricle from D20; and parenchyma from three SARS-CoV-2 negative patients (D22-D24) (Supplemental Fig. 9, 10a) . . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made We stained serial sections from the same FFPE block with an RNAscope probe against the viral genome S gene; a GeoMx ® Cancer Transcriptome Atlas Panel (CTA, 1,811 genes); an early-access GeoMx ® Whole Transcriptome Atlas Panel (WTA, 18,335 genes); and, a DSP protein panel (77 proteins) . The CTA and WTA panels were supplemented with 26 human genes associated with SARS-CoV-2 biology (Methods). We selected ROIs in the WTA and CTA slides based on the S gene RNA probe and a 3-color nuclear (Syto13), immune (CD45), and epithelial (Pan-cytokeratin; PanCK) staining of the investigated slide (Fig. 5a, Methods) . We also performed a similar protocol using five additional FFPE blocks from subjects D13-D17, where GeoMx ® WTA ROIs were selected based on tissue type (bronchial epithelium, artery, alveoli), and where serial sections were DAB stained with an anti-SARS-CoV-2 spike glycoprotein antibody and RNAscope probes for SARS-CoV-2 to complement ROI selection (Supplemental Fig. 11a, Methods) . We chose ROIs that spanned a range of anatomical structures and viral abundance levels based on RNAscope signals, and, when possible, segmented them to PanCK + and PanCK -, inflamed and normalappearing alveoli areas of interest (AOIs) (Fig. 5a, bottom, Supplemental Fig. 11 , Methods); We did not use CD45 staining for segmentation due to high off-target binding. We then acquired profiles (WTA or CTA) from matched AOIs for all assays based on distance to morphological landmarks (Methods). We retained high quality data for all kept AOIs, observing high sequence saturation (Supplemental Fig. 10b, Methods) , and agreement between the CTA and WTA assays as reflected by correlation of expression levels for their overlapping genes (Supplemental Fig. 12a ,b; cases with lower correlations are partly due to physical distance between serial sections; Supplemental Fig. 12c ), viral expression scores (Fig. 5b, Supplemental Fig. 10c) , inferred cell . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint type composition (Fig. 5c, Supplemental Fig. 12d) , and differential expression between COVID-19 and healthy samples (Fig 5d,e, Supplemental Fig. 12e ,f, Supplemental Tables 7,8) . We focused our subsequent analyses on the WTA data. Across samples D8-D12 and D18-D24 (D13-D17 were analyzed separately, below), the donor autopsies showed variation in SARS-CoV-2 RNA expression, with four donors demonstrating elevated levels (Fig. 5b, Supplemental Fig. 10c, Methods) , consistent with the corresponding SARS-CoV-2 expression levels for the three samples also profiled by viral qPCR and sc/snRNA-Seq, and with the highest SARS-CoV-2 Spike signal in D20 by NanoString Protein assay (Supplemental Fig. 10c) . Deconvolution of major cell type composition with gene signatures from our snRNA-Seq atlas (Supplemental Tables 9, 10, Methods) showed distinct composition of PanCK + and PanCKcompartments of alveolar AOIs (Fig. 5c, Methods) , such that PanCK + compartments were dominated by inferred AT1 and AT2 cells, with smaller contributions of endothelial, ciliated, and basal cells. The two donors with normal alveolar morphology (D23 and D21) showed expression of both AT1 and AT2 cell markers, while most other donors showed a preference for either AT1 or AT2 cells in the PanCK + compartment. Compared to the PanCK + compartment, PanCKcompartment had greater inferred cellular diversity across and within subjects, with strong expression of genes associated with fibroblasts, myofibroblasts, vascular endothelial, and lymphatic endothelial cells. Within the set of PanCK -AOIs, AOIs from COVID-19 autopsy samples showed increased scores for fibroblasts and myofibroblasts compared to AOIs from control autopsies. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made In comparing gene expression between COVID-19 and control alveolar AOIs (Fig. 5d, Table 11 ). AOIs characterized as bronchial epithelium, artery, and alveoli in donors (D13-D17) exhibited distinct transcriptional programs (Supplemental Fig. 11b) . Specifically, 8,958 genes were differentially expressed (FDR < 0.05, Methods) between bronchial epithelial AOIs and matched normal alveoli from the same subjects, with cilium assembly being the top most enriched pathway (Komogorov-Smirnov statistic FDR q-value = 2.25*10 -25 ). Furthermore, significant transcriptional differences were also identified between inflamed and uninflamed alveoli AOIs within the same group of samples (D13-D17). A total of 1,246 genes were differentially expressed between inflamed and normal-appearing alveolar AOIs in the donors , with increased expression of innate . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint immune and inflammatory pathway genes by ssGSEA 53, 54 (Methods), including neutrophil degranulation (FDR q-value = 5.2*10 -17 ), interferon-γ signaling (FDR q-value = 3.4*10 -15 ), and signaling by interleukins (FDR q-value = 1.4*10 -13 ). Notably, contrasting transcriptional activity was observed in AOIs adjacent to each other, as demonstrated by the example of upregulated interferon-γ pathway in inflamed alveoli when comparing to normal-appearing alveoli within the same biopsy sample (Supplemental Fig. 11c) , with dysregulation of numerous pathway genes (Supplemental Fig. 11d) . Table 6 , Supplemental Fig. 8g) . Interestingly, NT5C, encoding a nucleotidase with a preference for 5'-dNTPs, is consistently up-regulated in SARS-CoV-2 high in both PanCK + and PanCKcompartments (Supplemental Fig. 13c ,d, Supplemental Table 12 ). There are no reports of this gene playing a role in lung injury, and it will be the target of future investigation. To gain insight as to how COVID-19 might affect other tissue types, we profiled liver (n = 16), heart (n = 15) and kidney (n = 11) tissues from a subset of the donors by snRNA-Seq (Methods), . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made (Fig. 6) . In all three tissues, we recovered the major parenchymal, endothelial, and immune cell subsets, with both automated and manual annotations (Fig. 6a,c,d,f,g,i) . Cell classes were generally well-mixed among donors and SARS-CoV-2 entry factors (ACE2, TMPRSS2, CTSL) were expressed in the expected cell classes; for example, ACE2 is highly expressed in kidney proximal tubular cell subsets and in heart cardiomyocytes and pericytes (Fig. 6j, Supplemental Fig. 14a,h-k) , and ACE2 and TMPRSS2 are co-expressed in liver cholangiocytes ( Fig. 6k) . Notably, we detected very few viral RNA reads in all three tissues, and almost all in droplets that would be deemed technical artifacts, such as misalignments (Supplemental Fig. 15 ). In one heart sample, the absence of viral reads was also confirmed by NanoString DSP and RNAscope (data not shown). Though no viral counts were present in the heart nuclei that passed quality control, we further examined the cardiomyocytes for evidence of gene expression changes similar to those observed in SARS-CoV-2 infected iPSC-derived cardiomyocytes in vitro 55 (Fig. 3e) . Scoring each nucleus by the expression of gene sets upregulated in infected vs. uninfected iPSC-derived cardiomyocytes revealed elevated expression among select cardiomyocytes (Fig. 6b) . The highscoring cells are predominantly from a single individual, D17, who was diagnosed with cardiac complications (Supplemental Fig. 14b-g) . Specific cell types in lung, liver and kidney are associated with severe COVID-19 through integrated GWAS risk loci . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made A complementary view of the mechanisms and causes of severe COVID-19 is offered by genetic studies, based on either common genetic variation studied through GWAS 56 , or on rare, coding loss-of-function variants 57, 58 . While these studies identify loci that causally contribute to disease risk, mapping the specific genes, their mechanisms, and organ/cell of action is challenging, especially because these may only manifest in the disease context. We reasoned that integration of the common variant GWAS data with our rich atlas can help address some of these challenges, To study genetic variants associated with severe COVID-19, we integrated our atlas with the COVID-19 GWAS summary statistics from COVID-19 Host Genetics Initiative's meta-analysis 59 (round 4 freeze, Oct 20, 2020) for two phenotypes: severe COVID-19 and COVID-19. We formed a list of 27 curated genes proximal to the 6 regions with the genome-wide significant association signals thus far (chromosomes 3p, 3q, 9, 12, 19 and 21 loci (https://www.covid19hg.org/results/), Supplemental Table 13 ), but excluded ABO from the reported results, because its expression cannot be reliably reported in sc/snRNA-Seq, given red blood cell contamination. We also identified 64 genes with aggregated GWAS signals using MAGMA 60 based upon two SNP-togene (S2G) linking strategies (Methods), which included 17 of the 25 curated GWAS genes (Supplemental Table 13 ). (We expect curated GWAS genes to overlap with MAGMA, as SNP associations lead to both GWAS loci and MAGMA genes.) Finally, we analyzed the 25 curated . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint GWAS genes and 61 MAGMA implicated genes separately in each of the four tissue atlases (lung, liver, kidney, and heart). The 25 curated GWAS genes across the four tissue atlases showed both cell type specific expression and differential expression between cells of the same cell type in COVID-19 vs. healthy tissue. In the lung, 21 of the 25 curated GWAS genes showed significant (FDR < 0.05) expression specificity in at least one cell type, including DPP9 (chr 19) in lung AT2 cells and CCR1 and CCRL2 (chr 3) in lung macrophages ( Fig. 7a and Supplemental Table 14 ). Furthermore, 18 of the 25 curated GWAS genes showed significant (FDR < 0.05) differential expression between cells of the same type in COVID-19 vs. healthy lung, including SLC6A20 in goblet cells, CCR5 in CD8 T cells and Tregs, and CCR1 in macrophages and CD16 monocytes ( Fig. 7b and Supplemental Table 14 ). For 55% of these GWAS genes, the average expression was significantly higher in the lung (p value < 0.05, t-test) compared to heart, liver, or kidney (Supplemental Fig. 16) . A similar specificity in cell type expression in the lung was also observed for the extended list of 61 MAGMA implicated genes (Supplemental Table 14 ). To relate COVID-19 severity traits to cell type level processes in each tissue, we analyzed gene programs characterizing cell types in general, as well as disease associated programs that are differentially expressed in the same cell type between COVID-19 and healthy tissue (Methods). We used sc-linker, a new computational approach (Methods, KAJ, KKD, ALP, AR, unpublished work), to link gene programs defined by scRNA-Seq to genetic signal using linkage disequilibrium (LD) score regression [61] [62] [63] . . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made In the lung, three cell type programs attained nominal significance (but none attained Bonferronicorrected significance, and should thus be interpreted cautiously): AT2 cells, ciliated cells and CD8 T cells ( Fig. 7c and Supplemental Table 15 ). The CD8 T cell program showed the highest excess overlap with the curated GWAS genes (3.4x), followed by the AT2 cell program (1.8x). In other tissues (kidney, liver and heart), nominal (but not Bonferroni-corrected) significance was attained in the proximal convoluted tubule and connecting tubule cell type programs in kidney and cholangiocytes in liver ( Fig. 7d and Supplemental Table 15 ). Moreover, disease progression programs in the lung highlighted significant association and high excess overlap in GWAS genes (3.8x) ( Fig. 7c and Supplemental Table 15 ). Finally, we nominated genes in each cell type (across programs) as potentially relevant for COVID-19 by integrating the sc-linker results with MAGMA gene level analysis (Methods). In lung, this approach highlighted, for example, OAS3 in AT2 and club cells and SLC4A7 in CD8 T Table 16 ). The highest number of nominated genes was observed for lung AT2 cells and spanned several loci, hinting at a polygenic architecture linking AT2 cells with COVID-19 (Supplemental Table 16 ). We further used this approach in the lung to prioritize genes by cell type specificity in COVID-19 lung at unresolved, significantly associated GWAS loci (chromosomes 3, 9, 12, 19 and 21) (Fig. 7e) , prioritizing, for example, FYCO1 (AT2, ciliated, club; chr3p), NFKBIZ (AT2; chr3q), and DPP9 (AT2; chr 19) (Supplemental Table 16 ). Here, we established a clinical sample processing pipeline to collect and profile autopsy tissues from COVID-19 donors at the single cell and spatial level, offering a rare opportunity to examine . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint the impact of severe disease in the same tissue across multiple donors, and multiple tissues within a single donor. To ensure long term success for meta-analyses, we collected careful clinical annotations, harmonized the common descriptors, and coordinated lab protocols with a sister atlas collected in a New York City hospital (Melms et al., companion manuscript). To overcome variable tissue quality, given differences in necrosis and PMI, we optimized our tissue processing protocols and implemented new computational strategies to analyze the resulting sc/snRNA-Seq data. Using CellBender, we removed ambient RNA and empty droplets from the data, and addressed ambient viral RNA, leading to more defined cell subsets with specific expression of canonical cell type markers, and allowing us to reliably identify cells harboring viral RNA. By implementing a novel approach to automatically annotate individual cells by their types, we generated annotated atlases of the lung, heart, kidney, and liver, and were able to leverage existing human data sets to contextualize our findings. Our parallel expert manual annotation of cell clusters by prior knowledge and literature-based gene sets helped further characterize cell subsets that may be more specific to COVID-19 host responses, including those that may not yet have been observed in other disease atlases and thus would not be readily captured through automated annotation with reference datasets. For example, this allowed us to distinguish different subtypes of macrophages, expressing different scavenger receptors and showing varying rates of viral enrichment. In particular, our analysis of different epithelial cell subsets and programs provides a detailed cellular landscape of the catastrophic set of events in these infected terminal lungs, showing how multiple avenues of repair have failed. First, we see relative depletion of AT1 and AT2 cells, the . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made [37] [38] [39] , especially in the context of lung fibrosis 46 indicates that multiple regenerative strategies are invoked to re-establish the alveolar epithelial cells that are lost to viral infection. The loss of AT2 cells and accumulation of PATS that we observe in COVID-19 lungs indicates that AT2 self-renewal and AT2 to AT1 differentiation are compromised, likely because of ACE2-mediated infection of AT2 cells. We hypothesize that the loss of AT2 cells and their insufficient self-replacement may trigger the mobilization of IPBLP as a last-resort strategy to regenerate the alveolar epithelial barrier, though this may come at the cost of fully-functional alveolar structure and function 46 . Notably, both airway secretory and alveolar AT2 cells express ACE2, indicating the tropism of SARS-CoV-2 for major progenitor populations in the lung, and suggesting that the loss of these progenitors may further compound the devastating epithelial damage caused by direct viral infection. Thus, the path to the terminal disease state of these lung samples is evidenced by the serial failure of epithelial progenitors to enact regeneration at the rate that functional cells are lost, first by secretory progenitor cells in the nasal passages and large and small airways, then followed by alveolar AT2 cells, PATS, and IPBLP cells, and eventually leading to complete lung failure (Fig. 3f) . . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made To allow characterization of possible tissue pathologies, we also report possible "doublets" (cell barcodes with transcripts that would reflect cells of two different types and may reflect two cells/nuclei that are co-encapsulated) or cells/nuclei with high mitochondrial reads that passed initial quality controls. Further work will be needed to validate whether "possible doublets" are indeed technical artifacts or a result of biological events, such as dying or virally infected cells being phagocytosed (e.g., T+endothelial/macrophage cells). Meanwhile, cells with relatively high mitochondrial reads could similarly be biologically relevant, as both proliferating/metabolically active and necrotic cells tend to express higher levels of these genes. Relatedly, at the macroscopic level, tissue necrosis was evident and we observed evidence of proliferating fibroblasts at the gene expression level (Fig. 2b) . The level of viral RNA detected in lung samples varied significantly by donor, and there was a negative correlation between viral burden in lung tissue and time from symptom onset to time of death, likely reflecting different stages of COVID-19 lung pathology and viral infection. We primarily detected viral RNA within myeloid and endothelial cells ( Fig. 4h-m) . Genes differentially expressed between SARS-CoV-2 RNA+ and SARS-CoV-2 RNA-cells of the same type were enriched for chemokine and cytokine signaling, and responses to interferon, TNF, intracellular pathogens, and viruses. Surprisingly, however, we did not observe significant concordance between the expression of SARS-CoV-2 entry factors (ACE2, CTSL, FURIN, TMPRSS2, PCSK7, PCSK5, PCSK2, and NRP1, Fig. 4g ) and SARS-CoV-2 RNA+ cells (Fig. 4h) . Notably, in the seven samples containing highest levels of total SARS-CoV-2 UMI and the only SARS-CoV-2 RNA+ cells, we did not detect substantial populations of viable epithelial cells, including AT1, AT2, and ciliated subtypes compared to samples with lower total viral loads. In . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint particular, within the four individuals with the overall highest viral load, epithelial cells accounted for less than 10% of the total cells captured (Supplemental Fig. 7k, 8c) . This may be due to excessive epithelial death secondary to high viral replication within targeted pneumocytes and a highly inflammatory environment. We further caution that cell-associated SARS-CoV-2 UMI here only in SARS-CoV-2 high epithelial regions of the lung (Supplemental Fig. 9 ). While SARS-CoV-2 has been previously detected in the heart, liver and kidney, we did not detect viral reads by sc/snRNA-Seq in these tissues in the cohort in our study. By necessity, our study is largely descriptive, due to a small sample size (n=17), but by combining our single-cell profiles with signals from GWAS of COVID-19 and severe COVID-19, we can begin to shed light on the relationship between genetic risk factors and tissue physiology. Analyzing both 27 genes from regions that have genome-wide statistically significant associations as well as genome wide association signals from the COVID-19 summary statistics, our atlas helps highlight likely functionally relevant genes in unsolved GWAS loci by their expression in relevant cells, especially in the diseased lungs, as well as highlight cell types that are implicated by overall genetic signal. Both analyses showed more prominent signal in lung cells than in kidney, heart, or . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint liver, especially implicating genes in AT2 cells, ciliated cells, and CD8+ T cells at the cell type level, and AT2 cells and macrophages based on disease specific gene expression, and helped highlight specific genes from multi gene regions, such as FYCO1, SLC4A7 and NFKBIZ in chromosome 3, as underlying the association to those cells. Proximal convoluted tubule and connecting tubule cells in the kidney, as well as cholangiocytes in the liver were also implicated in this analysis. Nevertheless, given the size of our cohort and limited power of the current GWAS, our results should be interpreted with care. As the GWAS grows, and as meta-analyses of multiple atlases can be done (for example, by integration with Melms et al., companion manuscript), such analyses can help better characterize the mechanisms of COVID-19 severe disease. The data presented here can be used to inform future studies of COVID-19 tissue pathology and pathophysiology. The tissue processing and computational protocols we developed should be useful when studying an array of diseased or damaged tissues. In future studies, we will complete the profiling and release of the remaining seven tissue types collected, such as brain, spleen and trachea, to obtain a more complete view of COVID-19 derived organ pathology, and will integrate across atlases in meta-analyses, to provide critical resources for the community studying host-SARS-CoV-2 biology. Samples in this study underwent IRB review and approval at the institutions where they were originally collected. Specifically, Dana-Farber Cancer Institute approved protocol 13-416, . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint kidney, and brain (for donor D6) were placed in collection tubes, which were then placed inside a cooler for immediate transport to the Broad institute where all tissues were processed fresh. Autopsies at Brigham and Women's Hospital were performed in a negative pressure isolation room by personnel equipped with powered air-purifying or N95 respirators. Organs were removed from the body en bloc, and subsequently dissected for individual organ examination, including weighing and photographing. Representative samples of each lung lobe, trachea, left ventricle of the heart, aorta, coronary arteries, liver, kidney, lymph nodes, and spleen, as well as nasal and oral swab/scrapings were placed in 25 mL of RPMI-1640 media with 25 mM HEPES and L-glutamine (ThermoFisher Scientific) + 10% heat inactivated FBS (ThermoFisher Scientific) in 50 ml falcon tubes (VWR International Ltd). Collection tubes containing tissue were then placed inside a cooler for immediate transport to the Broad Institute. Additional tissue specimens from lung and heart were fixed in 10% formalin, processed and paraffin embedded using standard protocols. 5 µmthick slides were prepared from the FFPE tissue blocks and transferred to the Broad Institute. Autopsies at Beth Israel Deaconess Medical Center were performed on a cohort of 5 patients with positive SARS-CoV-2 nasopharyngeal swab on admission with a minimally invasive approach. Consent for autopsy and research was obtained from the healthcare proxy or the next of kin. Ultrasound-guided biopsies were performed within 3 hours of death on a gurney in the hospital morgue. The body was not removed from the bag until staff was wearing standard precautions and an N95 mask. Biopsies were obtained through 13G coaxial guide with 14G core biopsy and 20 mm sample length. Multiple core biopsies were acquired from each organ by tilting the coaxial needle a few degrees in different directions 65 . Cores for snRNA-Seq and spatial transcriptomics were flash-frozen and stored at -80˚C or paraffin embedded, respectively. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Tissues from deceased COVID-19 donors were processed immediately upon arrival in a Biosafety Level (BSL) 3 lab space. Tissues were washed in cold PBS and dissected in sterile Petri dishes. Grossly necrotic tissue areas were excluded. As phenotypes varied across organs, representative biopsies were collected from each tissue sample. Biopsies (each approximately 0.5 cm 3 ) were cut and either directly dissociated for scRNA-Seq or frozen for future processing (each described below). For snRNA-Seq, three dry biopsies per tissue were placed in one cryovial and flash frozen on dry ice before being placed at -80°C for long term storage. Two to four flash frozen tubes were collected per tissue. For scRNA-Seq of viably frozen tissue, tissue was viably frozen by placing 4 biopsies in 500 ml of CryoStor CS10 freezing media (STEMCELL Technologies). Tubes were inverted and then placed on ice for up to 30 minutes. Samples were then stored at -80°C for future processing. For scRNA-Seq of fresh tissue (i.e., samples processed on the day of collection), single biopsies were placed in a dissociation buffer and processed as described below. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made For each sample, 1 mL of lung dissociation medium (100 µg/mL of DNAse I, Roche; 1.25 mg/mL of Pronase, Roche; 9.2 µg/mL of Elastase, Worthington; 100 µg/mL of Dispase II, Roche; 1.5 mg/mL of Collagenase A, Roche; 100 µg/mL of Collagenase IV, Worthington) was aliquoted in a 15 mL Falcon tube (VWR International Ltd) and kept on ice until use. For each tissue, 2 wells of a 24 well plate were prepared and contained 2 mL RPMI-1640 media with 25 mM HEPES and Lglutamine (ThermoFisher Scientific) + 10% heat inactivated FBS (ThermoFisher Scientific). For viably frozen tissue, before sample thawing, tubes with lung dissociation media and the 24 well plate containing media were placed at 37ºC to warm. One biopsy piece was transferred to one well of a 24 well tissue culture plate (Corning) and moved back and forth in the liquid to wash. The biopsy was then placed in a second well containing media for 1 minute before moving to a 1.5 mL Eppendorf tube containing warmed lung dissociation media. While the tube was in a rack, sterile scissors were used to mince the tissue until most pieces were < 1 mm. The Eppendorf tubes containing minced tissues were then placed in a thermomixer (Eppendorf) and incubated for 25 minutes at 300 RPM, 37ºC. After 10-15 minutes, the sample was briefly triturated with a P1000 pipette, and returned to the thermomixer. After 25 minutes, 600 µL cold RPMI-1640 + 10% FBS + 1 mM EDTA (Invitrogen) was added to quench the digestion. Using a wide bore P1000 pipette tip (Rainin), the tissue was triturated~ 20 times. The dissociated tissue was then placed on ice. One 50 mL conical per tissue preparation was placed in a rack and on ice, and a 100 µm filter was prepared by wetting with 1 mL cold RPMI-1640 + 10% FBS + 1 mM EDTA. Using a wide-bore P1000 pipette tip, digested tissue was passed through the filter. The filter was washed with 4 mL cold RPMI-1640 + 10% FBS. Tubes were spun at 300g x 8 minutes at 4ºC. Supernatant was removed and pellet was resuspended in 2 mL ACK lysis buffer, placed on ice for 2-3 minutes. The . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint lysis was quenched in 2 mL RPMI-1640 + 10% FBS, and the sample was spun at 300g x 8 minutes at 4ºC. After centrifugation, most of the supernatant was removed and the sample pellet was resuspended in ~1 mL of RPMI-1640 + 10% FBS. Cells were counted and immediately loaded on the 10x Chromium controller (10x Genomics) for partitioning single cells into droplets. All sample handling steps were performed on ice. TST and ST buffers were prepared fresh as previously described 30, 66 . A 2x stock of salt-Tris solution (ST buffer) containing 292 mM NaCl (ThermoFisher Scientific), 20 mM Tris-HCl pH 7.5 (ThermoFisher Scientific), 2 mM CaCl2 (VWR International Ltd) and 42 mM MgCl2 (Sigma Aldrich) in ultrapure water was made and used to prepare 1xST and TST. TST was then prepared with 1 mL of 2x ST buffer, 60 µL of 1% Tween-20 (Sigma Aldrich), 10 µL of 2% BSA (New England Biolabs), 930 µL of nuclease-free water and supplemented with 0.2 U/ul RNaseIN Plus (Promega) and 0.1U/ul Superasin (ThermoFisher Scientific). 1xST buffer was prepared by diluting 2x ST with ultrapure water (ThermoFisher Scientific) in a ratio of 1:1, and was supplemented with 0.1% RNase inh (Enzymatics). 1 mL of 1xST without RNase was also prepared for sample dilution prior to 10X chip loading. Single frozen biopsy pieces were kept on dry ice until immediately prior to dissociation. With clean forceps, a single frozen biopsy was placed into a gentleMACS C tube on ice (Miltenyi Biotec) containing 2 mL of TST buffer. gentleMACS C tubes were then placed on the gentleMACS Dissociator (Miltenyi Biotec) and tissue was homogenized by running the program "m_spleen_01" one to three times, until tissue was fully dissociated. A 40 µm filter (VWR International Ltd) was placed on a 50 mL falcon tube (VWR International Ltd) and pre-wet with 1 mL of 1xST. Homogenized tissue was then transferred to the 40 µm filter and washed with 2 mL . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint of 1xST buffer containing RNase inhibitors. Samples were then centrifuged at 500g for 10 minutes at 4ºC. Sample supernatant was removed and the pellet was resuspended in 100 µL -500 µl 1xST buffer (without RNase inhibitor). Nuclei were counted and immediately loaded on the 10x Chromium controller (10x Genomics) for single nucleus partitioning into droplets. (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint the program "RNA_02". Samples were centrifuged at 2,000xg for 1 minute and moved to new tubes. As DNA/RNA shield inactivates SARS-CoV-2, the samples could then be moved out of the BSL3 lab space. RNA was extracted using the quick-RNA MagBead kit (Zymo Research). RNA was treated with proteinase K according to manufacturer's instructions. For some donors (D13-D17), we had access to a single frozen lung biopsy. Bulk nuclei were isolated as a suspension (as described above) and flash frozen in RLT + 1 % 1 BME. To isolate total RNA, we added ~ 600 µl of 2x DNA/RNA shield. After proteinase K treatment, one volume of water was added to dilute the DNA/RNA shield. Total RNA was quantified via Nanodrop (ThermoFisher Scientific). This ligation-based library construction method utilized adapters (1.5 uM each) that contained a unique molecular identifier on the upstream of the i7 70 . Some libraries with lower viral loads were . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made For each bulk RNA sample, SARS-CoV-2 RNA was quantified by an RT-qPCR assay targeting the nucleocapsid protein (modified from the CDC N1 assay, as described previously 72 (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The amount of hybridized product from each plate to pool together was determined based on the total segment area (µm 2 ) per column from the Lab Worksheet file produced by the DSP instrument following the volumes listed in Table 6 (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Cell Ranger mkfastq (10x Genomics) was used to demultiplex the raw sequencing reads, and Cell Ranger count on Terra using the cellranger_workflow in Cumulus 33 was used to align sequencing reads and generate a counts matrix. Reads were aligned to a custom-built Human GRCh38 and SARS-CoV-2 RNA reference. ScRNA-Seq reads were aligned to "GRCh38_and_SARSCoV2", and snRNA-Seq reads were aligned to "GRCh38_premrna_and_SARSCoV2''. The GRCh38 pre-mRNA reference captures reads mapping to both exons or introns, and was built as previously described 66 . The SARS-CoV-2 viral sequence (FASTA file) and accompanying gene annotation and structure (GTF file) are as previously described 73 . The GTF file was edited to include only CDS regions, with added regions for the 5' UTR ("SARSCoV2_5prime"), 3' UTR ("SARSCoV2_3prime"), and anywhere within the Negative Strand ("SARSCoV2_NegStrand") of SARS-CoV-2. Trailing A's at the 3' end of the virus were excluded from the SARSCoV2 fasta file, as we found these to drive spurious viral alignment in pre-COVID-19 samples. CellBender remove-background 32 was run (on Terra) to remove ambient RNA and other technical artifacts from the count matrices. The workflow is available publicly as cellbender/removebackground (snapshot 11) and documented on the CellBender github repository as v0.2.0: https://github.com/broadinstitute/CellBender. Following CellBender, individual samples were processed using Cumulus, including filtering out cells/nuclei with fewer than 400 UMI, 200 genes, or greater than 20% of UMIs mapped to mitochondrial genes. Sc/sn RNA-Seq data from individual samples were combined into a single expression matrix and analyzed using Scanpy 74 with identical pre-processing steps -aggregation, PCA, and batch correction using Harmony-Pytorch 34 . First, transcript counts were clipped and then normalized. UMI counts were clipped at 13, the value of the 99th percentile of non-zero counts. Clipped count data were then normalized so that UMI counts per cell/nucleus summed to 10,000 and then logged, resulting in log(1+10,000*UMIs / total UMIs) for each cell/nuclei, "logtp10k". Next, highly variable genes were identified by Scanpy's highly_variable_genes function applied separately to data from each profiling modality: cryopreserved (cells), fresh tissue (cells), and fresh-frozen (nuclei), and retained genes that were highly variable across two or more methods. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Next, data dimensionality was reduced to 75 top principal components by PCA, followed by Harmony 34,75 for batch correction, where each sample was considered a separate batch. Neighbors were computed using the Harmony corrected PCA coefficients and UMAP and Leiden clusters (below) were computed using the resulting nearest neighbors' graph. Leiden clustering was performed on a k-nearest neighbor graph (k=100) with two different resolutions: 1.3 and 2. Resolution 1.3 was sufficient for coarse annotations of clusters. Clustering was performed using Pegasus 33 with default parameters (except the resolution as mentioned above). Individual cell profiles were annotated by classification using a logistic regression model trained on six publicly available lung or bronchial alveolar lavage scRNA-Seq datasets with published manual labels, all profiled using the 10x Genomics platform. These datasets included healthy lung 35 (and unpublished), pulmonary fibrosis 76, 77 , and COVID-19 78 (and unpublished) samples. Two additional datasets were withheld from training to evaluate the predictive model 79, 80 . Since annotations between published datasets differed in resolution and nomenclature, we manually determined a single reference label set. We selected the most granular cell type labels that appeared in at least two datasets in the training set. Published cell type labels were mapped to that reference list or those cells were removed from training if they did not match a term. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made To ensure consistent gene expression quantification across studies, training datasets were reindexed to a single reference gene list and re-normalized from counts. Before training, data from all datasets were scaled together with the Scanpy scale function with zero_center=False. An L2 penalized logistic regression model was trained using SciKit-Learn's LogisticRegression function with C=0.1, max_iter=30, and multiclass=ovr. Performance on train and test sets was quantified by determining the frequency that predictions matched provided labels. If the predicted label was a cell type not included in the provided labels for that dataset, that cell was excluded from the accuracy calculation. Table 3 Kidney, heart and liver were classified using a similar approach after ambient removal and basic quality control as described above. For classifying kidney nuclei, a published meta-atlas 81 was leveraged as a training dataset. For liver and heart, we used publicly available 10x datasets for training (Supplemental Table 2 ). . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The manual annotation of clusters was done in two main steps: (1) Identifying main lineages in the global clustering, (2) Identifying sub population of cells by sub-clustering each lineage separately. Lung clusters were manually annotated by identifying elevated expression of pre-known markers in UMAP plots as well as visualizing mean Z-score expression of gene sets characterizing known populations (Supplemental Fig. 4b-4c ). Immune cell types were identified using canonical markers (Supplemental Fig. 4b) , whereas in order to identify the rest of the lineages we used both published 29, 82, 83 and unpublished gene sets (Supplemental Fig. 4c, Supplemental Table 17 ). After identifying the main cell lineages, we studied each lineage by sub-clustering it separately from the rest of the cells, to enable identification of subtle cell states. For each of the three immune cell types and the endothelial cells we re-computed highly variable genes, performed PCA, batch corrected with Harmony, generated K neighbor neighbors (k=100), and performed Leiden clustering, all with Pegasus 33 . We tested different options for the number of highly variable genes and PCs, followed by manual inspection of the resulting UMAP, and choosing the parameters where cell clusters were evenly distributed and devoid of finger-like structure. We assessed the stability of clusters with rand-index 84, 85 , and chose the highest resolution that showed a rand_index value > 0.85. We tested each cluster for differentially expressed genes, and merged clusters that showed no difference. One cluster in the original T and NK sub-clustering was identified as a B cell subset and was reported in that lineage. Harmonization at the lineage level merged different samples evenly across clusters, but cells and nuclei data were separated in all the sub-clusterings, making the cell data merged in one cluster, with or without additional nuclei in the same cluster (Supplemental Fig. 4g ). Differentially expressed (DE) genes were determined for each final sub-cluster by two methods: (1) single cell differential analysis implemented in Pegasus, and (2) pseudo-bulk analysis, summing UMIs from all cells in each cluster per donor, and then performing DE analysis between clusters by fitting a linear regression using the functions "lmFit" and "eBayes" from the limma R package 86 (Results from method 1 are available in Supplemental Tables 18-21) . Fig. 4d) we set different cutoffs to the statistics measured in each lineage to get the following number of DE genes per lineage: (a) the top 25 DEGs from method (1), (b) all DE genes above a threshold value of AUC from method . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Annotations of the unique sub-populations that were identified in the sub-clustering process were done by using legacy knowledge of these sub-populations through looking at unique gene markers associated with AUC statistics > 0.70 and pseudo-bulk DE over 2-fold difference in expression. In the remaining four subpopulations, we did not detect known gene signatures that correspond to healthy human lung ECs, so we imputed the endothelial subpopulations using Adaptivelythresholded Low Rank Approximation (ALRA; implemented with Seurat v3.2.1) to compensate for EC signature dropout. After imputation, we were able to annotate the remaining EC subpopulations (Supplemental Fig. 4f) : clusters 1-3. The remaining EC subpopulation (cluster 4) has a gene expression signature that overlaps with multiple EC subpopulations in the healthy lung reference data. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made For the lung epithelial and fibroblast lineages, we harmonized the cells using LIGER 87 , a nonnegative matrix factorization approach that allows cells to be described as occupying multiple different cell states. For the epithelial and fibroblast lineages, we used 8 factors and 7 factors respectively in the factorization. Each factor describes a different cell state, and to describe these states, we examined the top 100 genes loading each factor (Supplemental Tables 22-23) . We identified the PATS cell state using the gene signature derived from alveolar organoids in Supplementary Table 1 [37] [38] [39] . Within the epithelial lineage, we further sub-clustered the KRT8/PATS/ADI/DATPs (cluster 7 in Fig. 2g) , and for each we applied LIGER with 4 factors. We identified intrapulmonary basal-like progenitor cells (IPBLPs) and airway basal cells by expression of the marker genes TP63, KRT5, and KRT8, and by testing for differentially expressed genes using the Mann-Whitney U test . To identify expression differences between IPBLPs and airway basal cells, we compared these two cell subsets using the same test. For heart, liver and kidney, we used standard approaches of graph-based clustering on the integrated datasets (Seurat v3 for kidney and liver; Scanpy for heart) to first derive high-level groupings followed by differential gene expression to determine cluster enriched genes. For kidney and liver, we defined the endothelial and immune compartments as predominantly PECAM1+/CD31+ and PTPRC+/CD45+ clusters respectively from the high-level assignment, and subjected them to iterative clustering, differential expression and post-hoc annotation to determine granular subsets. Similar iterative clustering was performed for the high-level . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint mesenchymal groups in both tissues, proximal tubular compartment in the kidney and hepatocyte compartment in the liver. For heart, Leiden clustering at a resolution of 1.5 yielded clusters whose marker genes were compared to the heart atlas from Tucker et al. 88 . Where applicable, technical or putatively low-quality clusters are indicated as such to allow for permissive downstream analysis by the interested reader. We identified doublets using a two-step procedure for each tissue type based on ambient RNA removed gene count matrices. First, we independently identify doublets for each sample by calculating Scrublet-like 89 doublet scores and then automatically determining a doublet score cutoff. Secondly, we integrated and clustered all samples from the sample tissue type and further marked any cluster that has more than 70% of cells identified as doublets as a doublet cluster. All cells within doublet clusters were marked as doublets. We used a recent re-implementation of the Scrublet algorithm in Pegasus to calculate Scrubletlike doublet scores per sample. The original Scrublet algorithm consists of three major steps: preprocessing, doublet simulation and doublet score calculation using a k-nearest-neighbor (kNN) classifier. In the preprocessing step, we filtered low quality cells using the same criteria discussed in the quality control section, normalized each cell to TP100K, log-transformed expression values to log(TP100K+1) and identified highly variable genes using Pegasus 33 . We then standardized each highly variable gene based on the normalized count matrix (not log-transformed) and computed the first 30 principal components. In the doublet score calculation step, we built the kNN graph using Pegasus' kNN building function. We then tested if each cluster was statistically significantly enriched for doublets using Fisher's exact test and controlling False Discovery Rate at 5%. Finally, among clusters that are significantly enriched for doublets, we picked those with more than 70% of cells identified as doublets and marked all cells in these clusters as doublets. We performed composition estimation for the bulk lung samples using the MuSiC R package 93 . Ten bootstrap estimates, each using an independent sampling of 10,000 single cells, stratified by annotated cell type, to use as reference, were used to perform estimation.We confirmed that the . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made subsampling of the reference to 10,000 cells was appropriate and did not introduce a specific sample composition bias by comparing the predicted sample composition with successively lower number of reference cells in the range of 10,000 to 5,000 cells (Supplemental Fig. 5d ). To test statistical significance of changes in cell type proportions between healthy and COVID-19 samples we used a Dirichlet-multinomial regression, as described by Smillie et al 94 . We tested for differential expression between COVID-19 and other lung samples using Linear Regression models applied to each individual gene with the automated cell annotations. A model for each cell type was built using the "statsmodels'' module in Python and included Boolean covariates for snRNA-Seq vs. scRNA-Seq and 10X V3 kit vs. other kits as well as a continuous covariate for UMI counts for each cell and a random intercept (Gene expression ~ c1 + c2*(nuclei indicator) + c3*(10X v3 indicator) + c3*(number of UMIs) + c4*(COVID-19 indicator). Since each dataset had different numbers of cells, we sampled 2000 cells with replacement from the 10 healthy datasets and four COVID-19 datasets weighted specifically to balance the disease states and studies. We used Hold-Sidak multiple test correction in "statsmodels'' with an error rate of alpha=0.05 to test which COVID-19 coefficients were predictive of gene expression levels. To address viral ambient RNA, we adapted several methods (CellBender 32 and those described in Kotliar, Lin et al. 52 and Cao et al. 95 ) to create a conservative criteria for assigning a single cell or single nucleus as SARS-CoV-2 RNA+ or RNA-. We used a binomial test to determine the . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The abundance of SARS-CoV-2 aligning UMIs in the ambient pool was defined as the sum of all UMIs in the pre-CellBender output (from CellRanger counts) within discarded cells determined as "empty" or "low quality", and was therefore determined on a per-sample basis. The estimated percent ambient contamination was defined as the ratio of the total UMIs for each single cell before vs. after CellBender. An exact binomial test was applied to each single cell using R given these parameters. P-values were adjusted using a Bonferroni correction and p < 0.01 was considered significant. Cells were assigned as "SARS-CoV-2 RNA+", "SARS-CoV-2 Ambient" (if they had any UMIs to SARS-CoV-2 but were not significantly higher than the ambient pool) and "SARS-CoV-2 RNA-" (no UMIs to SARS-CoV-2). To test for host genes associated with the presence of SARS-CoV-2 RNA, we used the following approach to mitigate potential biases or confounding factors due to low cell numbers, differences in cell quality, or donor-to-donor variability: (1) We removed any cells classified as "SARS-CoV-2 ambient" from our analysis; (2) We analyzed DE genes only within cell types that contained at least 10 SARS-CoV-2 RNA+ cells; (3) Within a given cell type, we only considered donors where at least 2 cells were SARS-CoV-2 RNA+, and selected SARS-CoV-2 RNA-cells only from these . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made donors; (4) We subsampled the SARS-CoV-2 RNA-cells to match complexity distributions to the SARS-CoV-2 RNA+ as previously described 41 (briefly, cells were partitioned into 10 bins based on complexity (defined by the log10(# genes/cell)), and the SARS-CoV-2 RNA-cells were randomly subsampled to match the distribution in the SARS-CoV-2 RNA+ cells). We tested differential expression using the DESeq2 package 96 , which fits a general linear model to each gene's expression, and included % mitochondrial genes per cell as a continuous covariate, and the donor as a discrete covariate. Genes were considered DE if they had an FDR-corrected pvalue < 0.05. Gene set enrichment testing with GSEA 53,54 was completed by defining a pre-ranked list of genes based on the log2(fold change) calculated from DESeq2 between SARS-CoV-2 RNAcells and SARS-CoV-2 RNA+ cells of a given cell type. Hallmark, C2, and C5 gene sets were tested for enrichment, and an FDR-corrected p < 0.05 was considered significant. Differential expression (DE) analysis was performed after FASTQ files were aligned to Hg19 using STAR 97 . RSEM was used to generate a count matrix 98 and DE genes were computed in R 99 using DESeq2 96 . A log2 FC cutoff of 1.4 and an adjusted p-value of 0.05 was used as a threshold for significance. GSEA 53,54 was used to find gene ontology terms associated with the genes significantly up-regulated genes or down-regulated genes in highly infected tissue. Inspired by Viral Track 100 , a program that identifies viral UMIs in human single cells and enumerates them per meta-cell, we computed a viral enrichment score per cluster. We could not . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint employ the enrichment score of ViralTrack given the low number of SARS-CoV-2 RNA+ cells in our data, such that entire clusters, rather than higher resolution meta cells, were required for sufficient power. The enrichment score for cluster C in a clustering of cells is computed as follows: Where ? is the proportions of the total number of cells in cluster C out of the total number of cells in the given clustering (either all cells or within lineage). We assessed the significance of each enrichment score empirically by permuting the data 100 times by randomly assigning the same number of SARS-CoV-2 RNA+ annotations to cells, such that the overall proportion of SARS-CoV-2 RNA+ cells was the same, computing the enrichment score per cluster, and calculating empirical p-value as the fraction of permutation that showed a higher enrichment score compared to the real result. We adjusted for multiple hypothesis testing by a Benjamini/Hochberg False Discovery Rate (FDR) as implemented in the function 'fdrcorrection' in the Python package statmodels 101 . For CTA and WTA (RNA) data, sequencing reads from NovaSeq or NextSeq was compiled into FASTQ files corresponding to each AOI using bcl2fastq. FASTQ files were demultiplexed and . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made were removed from the analysis if sequencing reads totaled less than 10,000. Target counts were generated by taking the geometric mean of all probe counts against the same gene target in each well. Individual CTA probes were dropped if they had counts less than 10% of the target counts (2 CTA) or if they were global Grubb's outliers at alpha = 0.01 (6 for CTA). These probe screens were not performed on the WTA assay as each gene is sampled by one probe only. As a quality control step, sequencing saturation was calculated by dividing the number of non-unique sequencing reads by the total number of reads for each AOI. AOIs with sequence saturation below 50% were dropped from analysis (2 CTA, 0 WTA). The 75 th percentile of the target counts of each probe pool in each AOI were calculated and normalized to the geometric mean of the 75 th percentiles across all AOIs to give normalization factors. Target counts were divided by these normalization factors to give normalized expression counts. WTA and COVID-19 Spike-in GeoMx DSP samples from D13-17 patients were processed similarly as above with a few modifications. Both WTA and COVID-19 Spike-in panels were sequenced on an Illumina NovaSeq platform, aligned to probe references and demultiplexed using Nanostring's DnD pipeline (as above). The COVID-19 Spike-in panel contains multiple probes per target gene and counts that were not excluded by outlier detection were collapsed into a single value by calculating the geometric mean. In any cases where the probe counts were zero, the probe count was set to one prior to taking the geometric mean. Target counts for both the BIDMC's WTA and COVID-19 Spike-in genes were then normalized by multiplying each count by its poolspecific negative probe normalization factor. Genes were further removed if their mean expression . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint was greater than 100. Following these AOI-based and feature-based filtering steps, the data were normalized to the geometric mean of the 75 th percentile as above. To measure DSP protein expression, indexing oligonucleotides were UV photocleaved from the profiled antibodies and deposited into a 96-well plate. Oligonucleotides were then hybridized to fluorescent barcodes and run through the nCounter MAX analysis system to generate raw digital counts. The raw counts were inputted into the DSP for calibration (adjusting for oligo-barcode binding efficiency) and for generation of a protein expression matrix; this matrix was subsequently normalized to internal spike-in positive controls (ERCC) to account for system variation. The calibrated and ERRC-normalized expression matrix was then normalized to the geometric mean of housekeeping genes 102 as follows: first, the geometric mean of three housekeeping genes, histone H6, GAPDH, and S6, was calculated for each AOI; next, normalization factors were calculated as the geometric mean in each AOI dividing by the geometric mean of geometric means across all AOIs; finally, raw expression values were divided by the calculated normalization factors to give normalized protein expression levels. Log-transformed matrices were obtained from normalized CTA, WTA and protein expression matrices by applying the function ( ) = ( + 1) to each element of the matrices and were used for downstream analysis. We note that protein DSP data showed heterogeneity in immune cell marker levels across ROIs and donors (Supplemental Fig. 10d) , but in many alveolar ROIs, PanCK and CD45 immunofluorescence stains showed significant mucus staining, which could affect antibody-based DSP data. Such off-target mucus binding was not observed in RNAscope images, and is thus less likely to be a concern in oligonucleotide probe-based WTA and CTA data. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made A signature score summarizes the expression levels of a set of functionally-related genes for each AOI while controlling for gene abundance related technical noise 82 . Signature scores were calculated on the log-transformed expression matrices using Pegasus 33 ' new signature score calculation algorithm 103 . A gene set consisting of COVID-19 S and ORF1ab genes was used to calculate the SARS-CoV-2 virus signature score. To score cell signatures we used a 'reconciled' annotation between the automated ('predictions') and manual annotations, as determined by the following rules. If a cell is assigned more than one label according to the rules, the first rule that applies to that cell determines its label. AT1: To define cell markers for deconvolution, Pegasus v1.0 was used to conduct differential expression (DE) analysis on the snRNA-Seq data using the 'reconciled' annotation (above). For each cell type, Mann-Whitney U tests were conducted between cells annotated as this cell type and all other cells and with an FDR control at 0.05. DE genes were ranked by their Area Under the Receiver Operating Characteristics (AUROC) scores (Supplemental Table 9 ). Next, 10 marker genes were selected for each cell type by the combination of the DE results and published lung cell type markers 29, 41, 79 . We additionally picked 6 published marker genes for basal cells and neutrophils, respectively. These cell type markers were used for Fig. 5c and are available in Supplemental Table 10 . Many of these markers were not available for the CTA data, which comprises only ~1,800 genes. Thus, for cell types with less than 3 markers in the CTA data, we added genes based on the DE results to ensure at least 3 markers. These markers were used for Supplemental Fig. 12d and are available in Supplemental Table 10 . All ROIs in WTA data annotated as alveoli PanCK + and alveoli PanCKfrom donors D18, D19, D20, D21, D8, D9, D10, D11 and D12 were grouped together as the donor group (Fig. 5d,e) ; all ROIs in WTA data annotated as alveoli PanCK + and alveoli PanCKfrom controls D22, D23 and D24 were grouped as the control group and then the limma package 86 was used to perform differential expression analysis between the two groups (Fig. 5d,e) . Specifically, Q3-normalized . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint count data was transformed to log2-counts per million. The transformed data was fit with a linear model, considering the indicator functions of donor and control ROI, and empirical Bayes moderation was used to estimate the contrast S -C, where S stands for donor data and C for control. A similar approach is taken for alveoli CTA (Supplemental Fig. 12e,f) . GSEA analysis was conducted using fGSEA (R package) 104 using the MSigDB Hallmark gene set (https://www.gseamsigdb.org/gsea/msigdb/download_file.jsp?filePath=/msigdb/release/7.2/c2.cgp.v7.2.symbols.g mt), with gene ranks preset by their log2(fold-change) in a DE analysis, and the size of a gene set to test ranging between 15 and 500. Terms that are up-regulated (corresponds to COVID-19 samples) and down-regulated (corresponds to healthy samples) were plotted separately, with an FDR q-value threshold of 0.05. Correlation analysis was performed across 1,783 genes that were shared between the CTA and WTA data and 306 ROIs where both CTA and WTA data were collected. For each ROI, the Pearson's correlation coefficient between its corresponding CTA and WTA count vectors of the common genes. We excluded 10 ROIs where the WTA counts for the common genes were a constant vector of small numbers (resulting in NaN correlation). . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Fig. 13a,b) as SARS-CoV-2-high (score>1.30), SARS-CoV-2-medium (0.75<=score<=1.30), and SARS-CoV-2-low (score<0.75). The limma package was used to perform differential expression analysis comparing SARS-CoV-2-high vs. SARS-CoV-2-low ROIs (separately for PanCK + and PanCK -AOIs), where Q3-normalized count data was transformed to log2-counts per million, the transformed data was fit into a linear model with A total of 50 AOIs from lung biopsies (subjects D13-D17) were annotated as inflamed alveoli and 17 AOIs as normal alveoli within the same slides following review by board certified pathologists at the time of ROI selection (Supplemental Fig. 11a) . Inflamed ROIs were selected based on histomorphology. Specifically, inflamed alveoli were defined as areas of alveolar tissue, excluding medium-sized bronchioles and large vessels, with intra-alveolar inflammatory cells (CD45 + or CD68 + cells) with or without fibroblastic foci. These areas generally also showed enlarged and detached type II pneumocytes and hyaline membrane formation. Non-inflamed ROIs (normal . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint alveoli) were selected as areas of alveolar tissue from the same section. For reference, AOIs of bronchial epithelium (n=8) and of arterial vessels (n=5) were also selected. Q3-normalized count data were transformed to log2-counts per million and fit into a generalized linear model (Expression ~ Patient + ROI Group). The patient-normalized residuals were utilized to generate plots following dimensionality reduction (UMAP, PCA). Differential expression analysis between inflamed (n=50) and normal appearing alveoli (n=17) was performed by incorporating the log2counts per million into a generalized linear model and using contrasts to identify significant comparisons. All models were fitted using the quasi-likelihood negative binomial generalized loglinear model implemented in edgeR 105 . SARS-CoV-2 genomes were assembled using the open-source software viral-ngs version 2.0.21, implemented on the Terra platform (app.terra.bio), using workflows that are publicly available via the Dockstore Tool Registry Service (dockstore.org/organizations/BroadInstitute/collections/pgs). Briefly, reference-based assembly (assemble_refbased) was performed using the SARS-CoV-2 reference NC_045512.2. We constructed a phylogenetic maximum likelihood (ML) tree using the Augur pipeline (augur_with_assemblies) including unique genomes assembled with greater than 10X mean coverage depth and 772 previously reported genomes from unique individuals from (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint syncytial virus, SARS-CoV, MERS-CoV, human rhinovirus) We used the classify_single and merge_metagenomics workflow with reference files as previously described 106 . Cell type gene programs were constructed from the scRNA-Seq data for each annotated cell type by performing a differential expression between the cell type on interest and the rest of the cells using a Wilcoxon rank sum test. Disease progression programs were constructed by a differential expression of disease and healthy cells of the same cell type as described in the "Healthy reference comparisons and differential expression" section earlier in the Methods. We also investigated gene programs based on genes that are differentially expressed in infected cells compared to uninfected, or genes that covary with the ACE2 and TMPRSS2 genes, but these programs showed no disease informative signal. Additionally, we performed MAGMA (de Leeuw 2015) on the COVID-19 summary statistics using both window-based and enhancer based SNP-to-gene linking strategies (union of Roadmap and Activity-By-Contact (ABC); RoadmapUABC). Cardiomyocytes from the combined data (nuclei from Leiden resolution 1.5 clusters 2, 3, 12, 13, (CALM1, ATP2A2, ENO3, ACTB, TPM2, TPM4, CALM3, TPM1, CALM2, MYL9, MYL6, TNNC1, CKM, MYL7, MYH11, MYLK, MYL4, TNNI1, MYH6, TPM3, TNNI3, CAPZB) 55 using . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint the "score_genes'' function from Scanpy 1.5.1. A "virus/infection response score" was calculated as max(0, infected_score) -max(0, uninfected_score). The genes that contribute most to this score were assessed by conducting a differential expression test between high-scoring and low-scoring cardiomyocytes. Testing was performed by summing cardiomyocyte expression over individual and score-group (high: > 5, low: <=5) using voom and limma 86 with the model "~ 0 + scoregroup", testing the contrast (scoregroup:high -scoregroup:low), and using Benjamini Hochberg to adjust p-values for multiple hypothesis testing. All samples were initially processed using Cumulus (https://github.com/klarman-cellobservatory/cumulus), which we ran on the Terra Cloud platform (https://app.terra.bio/). Code for all other analyses will be available on GitHub upon publication. Processed sequencing (sc/snRNA-Seq and bulk) data will be deposited in the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and raw human sequencing data will be deposited in the controlled access repository DUOS (https://duos.broadinstitute.org/), upon publication. Nanostring GeoMx raw and normalized count matrices will be available on GEO upon publication. The processed lung data (sc/snRNA-Seq) is available on the Single Cell Portal: https://singlecell.broadinstitute.org/single_cell/study/SCP1052/ . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint P.R.T. receives consulting fees from Cellarity Inc., and Surrozen Inc., for work not related to this manuscript. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made cell self-renewal (1) and AT1 differentiation (2) are inhibited, resulting in PATS accumulation (3) and recruitment of airway-derived IPBLP progenitors to alveoli (4). . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. color bar) for genes sets specific to lung (AT1 a. 79 and b. 41 and AT2 (c. 79 samples (x axes). Pearson's r denoted in the upper left corner with significance following Bonferroni correction (q). r. Impact of viral load on bulk RNA profiles. Significance (-log10(Pvalue), y axis) and magnitude (log2(fold-change), x axis) of differential expression of each gene . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Figure 9 . Metadata for the COVID-19 autopsy lung spatial atlas. Metadata for FFPE tissues from 12 donors (3 SARS-CoV-2 negative, 9 SARS-CoV-2 positive). S/s to death: time from onset to death, in days. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. Fig. 6b ). g. Proportion of cardiomyocytes (dot size) in each sample (row) that score low (score<5) or high (score >5) (columns) for the infected iPSC-derived cardiomyocytes 55 signature. Nearly all high scoring nuclei come from a single sample. h-k. Cell type specificity of SARS-CoV-2 viral entry factors. h-j. UMAP of COVID-19 heart snRNA-Seq profiles colored by expression of ACE2 (h), TMPRSS2 (i), and CTSL (j) (color is saturated to emphasize small counts). k. Mean expression (color) and proportion of expressing cells (dot size) for each viral entry factor (columns) across the cell subsets in the heart dataset (rows). . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint Supplemental Figure 15 . Different source of viral UMIs in heart, liver and kidney, compared to lung a, b, c, d. Number of viral UMIs (red, right y axis) and all UMI (black, left y axis) for each droplet (x axis), rank ordered by total number of UMIs, in lung (a), heart (b), liver (c), and kidney (d). In lung (a) viral reads are in either ambient RNA (UMI counts (~100 for many droplets) in the empty droplet plateau #200,000 -1,000,000) or in nuclei, above and beyond the ambient RNA level. In heart (b), liver (c), and kidney (d) most viral UMI counts do not come from empty droplets, but mostly from droplets in the tail (beyond droplet 1,000,000), suggesting these are not ambient RNA, but a technical or mapping artifact. e, f, g, h. Cumulative distribution functions of the number of viral-positive droplets as a function of total droplet UMI count. In lung (e) most viral-positive droplets are empty droplets (total UMI count ~ 100) with some viral-positive droplets which contain nuclei (UMI count > ~1,000), but in heart (f), liver (g), and kidney (h), most of the "viralpositive" droplets have fewer than 10 total UMI counts, suggesting these reads are not trustworthy. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint Mean expression (dot color, log(TP10K + 1)) and proportion of expressing cells (dot size) for each curated GWAS gene (columns) in each cell subset (rows) for lung (a), heart (b), liver (c) and kidney (d) COVID-19 autopsy atlases. Some GWAS genes have higher expression in the lung compared to the other three tissues. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Table 5 . Differentially expressed genes in cluster 7, epithelial cell sub-cluster Supplemental Table 6 Table 7 . Differentially expressed genes between subject and control tissues of WTA data in alveolar AOIs. Sheets of titles ending with ".up" give significant up-regulated genes on subject data, while those ending with ".down" give significant genes on control data. Only genes with adjusted p-value < 0.05 are listed; "PanCK" in sheet names refers to PanCK + AOIs, while "Syto13" refers to PanCK -AOIs. . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Table 8 . Differentially expressed genes between subject and control tissues of CTA data in alveolar AOIs. Sheets of titles ending with ".up" give significant up-regulated genes on subject data, while those ending with ".down" give significant genes on control data. Only genes with adjusted p-value < 0.05 are listed; "PanCK" in sheet names refers to PanCK + AOIs, while "Syto13" refers to PanCK -AOIs. Table 9 . DE results of snRNA-Seq based on the reconciled annotation. Each sheet lists up-regulated genes by comparing nuclei inside the corresponding cluster with those outside, under q-value control at 0.05 using Mann-Whitney U test. Genes are sorted by AUROC in descending order. Table 10 . Cell type markers used for cell type deconvolution of WTA and CTA data. This table contains two sheets, one for WTA and one for CTA. Table 11 . List of up-regulated genes in patient PanCK + alveolar AOIs overlapping with genes expressed at higher levels in epithelial cell types with high ACE2 expression. In each sheet title, the corresponding cell types are separated with spaces. Table 12 . SARS-CoV-2 high vs. SARS-CoV-2 low. Differentially expressed genes from subject tissues of WTA data in alveolar AOIs. Sheets of titles ending with ".up" . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint give significant up-regulated genes on SARS-CoV-2 high data, while those ending with ".down" give significant genes on SARS-CoV-2 low data. Only genes with adjusted p-value < 0.05 are listed; "PanCK" in sheet names refers to PanCK + AOIs, while "Syto13" refers to PanCK -AOIs. Table 13 . Gene descriptions for genes relevant to COVID-19 GWAS. We list 26 genes (with gene descriptions) identified by the COVID-19 Human Genetics Initiative based on their proximity to GWAS peaks, and 64 genes identified by MAGMA analyses using 0kb and enhancer map S2G strategies. Table 14 . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted February 25, 2021. ; https://doi.org/10.1101/2021.02.25.430130 doi: bioRxiv preprint Pathological findings of COVID-19 associated with acute respiratory distress syndrome Clinical features of patients infected with 2019 novel coronavirus in Wuhan SARS-CoV-2 infection of the liver directly contributes to hepatic impairment in patients with COVID-19 Clinical Features of COVID-19-Related Liver Functional Abnormality Clinical Characteristics of Coronavirus Disease 2019 in China Multiorgan and Renal Tropism of SARS-CoV-2 SARS-CoV-2 renal tropism associates with acute kidney injury Endothelial cell infection and endotheliitis in COVID-19 Clinical and radiographic features of cardiac injury in patients with 2019 novel coronavirus pneumonia COVID-19 and the cardiovascular system Neuropathology of patients with COVID-19 in Germany: a post-mortem case series International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 12 Viral presence and immunopathology in patients with lethal COVID-19: a prospective autopsy cohort study Neurologic Manifestations of Hospitalized Patients With Coronavirus Disease Clinical and immunological features of severe and moderate coronavirus disease 2019 Dysregulation of Immune Response in Patients With Coronavirus 2019 (COVID-19) in Wuhan, China Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study Impaired type I interferon activity and exacerbated inflammatory responses in severe Covid-19 patients Postmortem Kidney Pathology Findings in Patients with COVID-19 Postmortem examination of COVID-19 patients reveals diffuse alveolar damage with severe capillary congestion and variegated findings in lungs and other organs suggesting vascular dysfunction COVID-19 pulmonary pathology: a multi-institutional autopsy cohort from Italy and New York City Association of Cardiac Infection With SARS-CoV-2 in Confirmed COVID-19 Autopsy Cases International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 23. Wichmann, D. et al. Autopsy Findings and Venous Thromboembolism in Patients With COVID-19: A Prospective Cohort Study Megakaryocytes and platelet-fibrin thrombi characterize multiorgan thrombosis at autopsy in COVID-19: A case series Clinicopathologic and Immunohistochemical Findings from Autopsy of Patient with COVID-19 Pulmonary and cardiac pathology in Covid-19: the first autopsy series from New Orleans Autopsy of COVID-19 patients in China The evolution of pulmonary pathology in fatal COVID-19 disease: an autopsy study with clinical correlation Author Correction: A single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors The Human and Mouse Enteric Nervous System at Single-Cell Resolution Multiplex digital spatial profiling of proteins and RNA in fixed tissue CellBender remove-background: a deep generative model for unsupervised removal of background noise from scRNA-seq datasets Cumulus provides cloud-based data analysis for large-scale single-cell and . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made single-nucleus RNA-seq Fast, sensitive and accurate integration of single-cell data with Harmony A molecular cell atlas of the human lung from single-cell RNA sequencing Integrated Single Cell Atlas of Endothelial Cells of the Human Lung. Cold Spring Harbor Laboratory Alveolar regeneration through a Krt8+ transitional stem cell state that persists in human lung fibrosis Persistence of a regeneration-associated, transitional alveolar epithelial cell state in pulmonary fibrosis Inflammatory Signals Induce AT2 Cell-Derived Damage-Associated Transient Progenitors that Mediate Alveolar Regeneration SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues Integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell type-specific expression of mediators of SARS-CoV-2 viral entry and highlights inflammatory programs in putative target cells SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes SARS-CoV-2 induces transcriptional signatures in human lung epithelial cells . CC-BY-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made that promote lung fibrosis Lung fibrosis: an undervalued finding in COVID-19 pathological series Lineage-negative progenitors mobilize to regenerate lung epithelium after major injury Basal-like Progenitor Cells: A Review of Dysplastic Alveolar Regeneration and Remodeling in Lung Repair Virological assessment of hospitalized patients with COVID-2019 SARS-CoV-2 detection, viral load and infectivity over the course of an infection Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19 Human alveolar epithelial type II cells in primary culture Release of lamellar bodies from alveolar type 2 cells Single-Cell Profiling of Ebola Virus Disease In Vivo Reveals Viral and Host Dynamics International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes SARS-CoV-2 infection of human iPSC-derived cardiac cells predicts novel cytopathic features in hearts of COVID-19 patients Severe Covid-19 GWAS Group et al. Genomewide Association Study of Severe Covid-19 with Respiratory Failure Inborn errors of type I IFN immunity in patients with life-threatening COVID-19 Autoantibodies against type I IFNs in patients with life-threatening COVID-19 The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic MAGMA: generalized gene-set analysis of GWAS data Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection Partitioning heritability by functional annotation using genome-wide association summary statistics Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies. Cold Spring Harbor Laboratory Temporal and Spatial Heterogeneity of Host Response to SARS-CoV-2 Feasibility and safety of ultrasound-guided minimally invasive autopsy in COVID-19 patients A single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors Coordinated host-pathogen transcriptional dynamics revealed using sorted subpopulations and single macrophages infected with Candida albicans Spatial reconstruction of single-cell gene expression data Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples Unique, dual-indexed sequencing adapters with UMIs effectively eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing Capturing sequence diversity in metagenomes with comprehensive and scalable probe design Remdesivir for the Treatment of Covid-19 -Final Report SCANPY: large-scale single-cell gene expression data analysis Github) Single Cell RNA-seq reveals ectopic and aberrant lung resident cell populations in Idiopathic Pulmonary Fibrosis Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis COVID-19 severity correlates with airway epithelium-immune cell interactions identified by single-cell analysis A cellular census of human lungs identifies novel cell states in health and in asthma Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19 RAAS blockade, kidney disease, and expression of ACE2, the entry receptor for SARS-CoV-2, in kidney epithelial and endothelial cells. Cold Spring Harbor Laboratory A Cancer Cell Program Promotes T Cell Exclusion and Resistance to Checkpoint Blockade The Human Lung Cell Atlas: a high-resolution reference map of the human lung in health and disease Details of the adjusted rand index and clustering algorithms, supplement to the paper an empirical study on principal component analysis for clustering gene expression data An immune-cell signature of bacterial sepsis limma powers differential expression analyses for RNA-sequencing and microarray studies Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity Transcriptional and Cellular Diversity of the Human Heart Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data Doublet detection in Pegasus A single-cell molecular map of mouse gastrulation and early organogenesis Decoding human fetal liver haematopoiesis Bulk tissue cell type deconvolution with multi-subject single-cell expression reference Intra-and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis Single-cell analysis of upper airway cells reveals host-viral dynamics in influenza infected adults. Cold Spring Harbor Laboratory Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 STAR: ultrafast universal RNA-seq aligner RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome R: A language and environment for statistical computing Host-Viral Infection Maps Reveal Signatures of Severe COVID-19 Patients Econometric and statistical modeling with python Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes A streamlined method for signature score calculation Fast gene set enrichment analysis edgeR: a Bioconductor package for differential expression analysis of digital gene expression data Phylogenetic analysis of SARS-CoV-2 in the Boston area highlights the role of recurrent importation and superspreading events Improved metagenomic analysis with Kraken 2 Capillary cell-type specialization in the alveolus Single-cell longitudinal analysis of SARS-CoV-2 infection in human bronchial epithelial cells International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity We are deeply grateful to all donors and their families. We acknowledge the contribution of Casey