key: cord-0684317-iqd0xj4a authors: Stephenson, E.; Reynolds, G.; Botting, R. A.; Calero-Nieto, F. J.; Morgan, M.; Tuong, Z. K.; Bach, K.; Sungnak, W.; Worlock, K. B.; Yoshida, M.; Kumasaka, N.; Kania, K.; Engelbert, J.; Olabi, B.; Spegarova, J. S.; Wilson, N. K.; Mende, N.; Jardine, L.; Gardner, L. C.; Goh, I.; Horsfall, D.; McGrath, J.; Webb, S.; Mather, M. W.; Lindeboom, R. G.; Dann, E.; Huang, N.; Polanski, K.; Prigmore, E.; Gothe, F.; Scott, J.; Payne, R. P.; Baker, K. F.; Hanrath, A. T.; Schim van der Loeff, I. C.; Barr, A. S.; Sanchez-Gonzalez, A.; Bergamaschi, L.; Mescia, F.; Barnes, J. L.; Kilich, E.; de Wilton, A.; Sai, title: The cellular immune response to COVID-19 deciphered by single cell multi-omics across three UK centres date: 2021-01-15 journal: nan DOI: 10.1101/2021.01.13.21249725 sha: b560c93ea82b96b6fd85ade085642cad411bd85d doc_id: 684317 cord_uid: iqd0xj4a The COVID-19 pandemic, caused by SARS coronavirus 2 (SARS-CoV-2), has resulted in excess morbidity and mortality as well as economic decline. To characterise the systemic host immune response to SARS-CoV-2, we performed single-cell RNA-sequencing coupled with analysis of cell surface proteins, providing molecular profiling of over 800,000 peripheral blood mononuclear cells from a cohort of 130 patients with COVID-19. Our cohort, from three UK centres, spans the spectrum of clinical presentations and disease severities ranging from asymptomatic to critical. Three control groups were included: healthy volunteers, patients suffering from a non-COVID-19 severe respiratory illness and healthy individuals administered with intravenous lipopolysaccharide to model an acute inflammatory response. Full single cell transcriptomes coupled with quantification of 188 cell surface proteins, and T and B lymphocyte antigen receptor repertoires have provided several insights into COVID-19: 1. a new non-classical monocyte state that sequesters platelets and replenishes the alveolar macrophage pool; 2. platelet activation accompanied by early priming towards megakaryopoiesis in immature haematopoietic stem/progenitor cells and expansion of megakaryocyte-primed progenitors; 3. increased clonally expanded CD8+ effector:effector memory T cells, and proliferating CD4+ and CD8+ T cells in patients with more severe disease; and 4. relative increase of IgA plasmablasts in asymptomatic stages that switches to expansion of IgG plasmablasts and plasma cells, accompanied with higher incidence of BCR sharing, as disease severity increases. All data and analysis results are available for interrogation and data mining through an intuitive web portal. Together, these data detail the cellular processes present in peripheral blood during an acute immune response to COVID-19, and serve as a template for multi-omic single cell data integration across multiple centers to rapidly build powerful resources to help combat diseases such as COVID-19. The outbreak of coronavirus disease 2019 was declared a global pandemic on 11 90 March 2020 1 , and as of 12 January 2021 has led to over 91 million infections and 1.9 million 91 deaths worldwide 2 . Common symptoms, which are often mild and transient, include cough, fever 92 and loss of taste and/or smell 3 . In a small proportion of those infected, symptoms can worsen and 93 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 lead to hospitalisation, with the elderly and those with comorbidities being most at risk 4 The aetiologic agent of COVID-19 is a novel highly-infectious pathogenic coronavirus, severe 102 acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This enveloped, positive-sense single-103 stranded RNA betacoronavirus utilises the cell surface receptor angiotensin-converting enzyme 2 104 (ACE2) to enter host cells 9 . ACE2 is expressed in various barrier tissues, including nasal 105 epithelium, conjunctival epithelium and intestines, as well as internal organs, including alveoli of 106 the lung, heart, brain, kidney and the uterine-placental interface 10 Several studies have highlighted a complex network of peripheral blood immune responses in 112 COVID-19 infection, with the role of T cells during infection being an area of particular focus 13, 14 . 113 A reduction of absolute numbers of T cells linked with disease severity has been reported, as well 114 as a decrease in IFN-γ production by lymphocytes 15 . However, a significant expansion of highly 115 cytotoxic effector T cell subsets has also been found in patients with moderate disease 16 . 116 Additionally, higher expression of exhaustion markers PD-1 and Tim-3 on CD8 + T cells have been The response of myeloid cells and B cells have been less well explored in COVID-19. Emergency 120 myelopoiesis, driven by inflammation, is thought to arise as a way to prevent tissue damage 18,19 . 121 In severe cases, dysregulation of myelopoiesis coupled with abnormal monocyte activation can 122 occur 18 , but the underlying mechanisms remain to be explored. Extrafollicular B cell activation is 123 present in critically ill patients but despite the high levels of SARS-CoV-2 specific antibodies and 124 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 antibody secreting cells, many of these patients do not recover from the disease 14 Table 2 ). In total, following demultiplexing and doublet removal, we sequenced 149 1,141,860 cells from 143 samples with 850,100 cells passing quality control (min of 200 genes 150 and <10% mitochondrial reads/cell) (Extended Data 1A). The full scRNA-seq dataset was 151 integrated using Harmony 24 (Fig. 1B) . There was good mixing of cells by the kBET statistic 152 calculated for each cluster across sample IDs (rejection rate improved from 0.62 to 0.36 following 153 integration, p<2.1x10 -8 by Wilcoxon paired signed rank test (Extended Data 1B-C)). 154 155 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 Following Leiden clustering, cells were manually annotated based on the RNA expression of 156 known marker genes. RNA-based annotation was supported by surface protein expression of 157 markers commonly employed in flow cytometry to discriminate PBMC subpopulations (Extended 158 Data 1D). We defined 18 cell subsets across the datasets (Fig. 1B) , with an additional 27 cell states 159 identified following sub-clustering (Fig. 1B, 2A, 3A-B, 4A-B) . Our annotation was further 160 validated using the Azimuth annotation tool for PBMC where more than 50% of the cells were 161 mapped and matched to a unique cluster in 32/33 of the clusters defined in the Azimuth PBMC 162 dataset (proliferating CD8 T cells mapped across two clusters). Clusters unique to our data include 163 proliferating monocytes, ILC subpopulations and isotype-specific plasma cells. (Extended Data 164 1E). Our complete COVID-19 peripheral blood multi-omic data is available through the web-165 portal at https://covid19cellatlas.org. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 in critical disease (Extended Data 2D). However, concordant changes according to symptom 187 duration were still found when excluding critical patients, indicating the additional influence of 188 symptom duration on peripheral immune cell changes in SARS-CoV-2 infection (Extended Data 189 2F). 190 191 We observed expression of Type I/III interferon response genes in monocytes, DCs and HSPCs 192 across the spectrum of COVID-19 severity, but not in patients challenged with in keeping 193 with the importance of type I and III interferons in the innate immune response to viral infection 194 (Fig. 1D) . Type I/III interferon response-related genes were recently identified as harbouring 195 association signals in a Genome Wide Association Study (GWAS) for susceptibility 25, 26 . Of the genes identified in this study, we found IFNAR2 was both upregulated in 197 COVID-19 compared to healthy in most circulating cell types and highly expressed by 198 plasmablasts, monocytes and DCs (Extended Data 2G). To take advantage of the comprehensive protein expression data, we used Cydar 27 to characterise 208 how the immune landscape changes with disease severity based on surface protein expression. We 209 divided cells into phenotypic hyperspheres based on the expression of 188 proteins. We then 210 quantified the number of cells from each severity group within the hyperspheres, which allowed 211 us to identify 430 hyperspheres that differed significantly in abundance with increasing severity 212 (spatial FDR < 0.05, Fig. 1E ). Examining the surface protein expression profiles post-hoc showed 213 that differentially abundant hyperspheres were present in all major immune compartments. In 214 particular, we found an increase in populations of B cells (CD19 + /CD20 + ), plasma cells (CD38 + ) 215 and HSPCs (CD34 + ) as well as a previously reported remodelling of the myeloid compartment 18 216 (Fig. 1E) . . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 Mononuclear phagocytes and haematopoietic stem progenitors 218 Transcriptome and surface proteome analysis of blood mononuclear phagocytes identified known 219 DC subsets (pDC, ASDC, DC1, DC2, DC3) and several monocyte states ( Figs. 2A-B We previously demonstrated the egress of blood DCs and monocytes from blood to alveolar space 232 with rapid acquisition of a lung molecular profile following human inhalational LPS challenge 28 . 233 To better understand the relationship between circulating and lung alveolar mononuclear 234 phagocytes in COVID-19, we compared the transcriptome profile of blood DCs and monocytes 235 with their bronchoalveolar lavage (BAL) counterparts during COVID-19 using recently published 236 data (GSE145926) 29 (Extended Data 3B). As expected, partition-based graph abstraction (PAGA) 237 suggests transcriptional similarity between healthy circulating CD14 + monocytes and healthy BAL 238 macrophages, in agreement with recent data demonstrating that BAL macrophages can arise from 239 circulating CD14 + monocytes (Fig. 2C) 30 . However, there is a surprisingly greater transcriptional 240 similarity between BAL macrophages and the expanded population of circulating 241 C1QA/B/C + CD16 + monocytes in COVID-19 (Fig. 2C) . These observations raise the possibility of 242 a differential origin of alveolar macrophages during health and COVID-19. Both BAL 243 macrophages and C1QA/B/C + CD16 + monocytes express FCGR3A and C1QA/B/C and are 244 enriched for expression of type I interferon response genes ( Fig. 2A) . Myeloid hyperinflammatory 245 response has been reported to mediate lung and peripheral tissue damage via secretion of 246 inflammatory cytokines such as IL-6 and TNFa in COVID-19. We evaluated the expression of 247 these cytokines and found that they are primarily expressed by tissue rather than blood 248 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 mononuclear phagocytes (Fig. 2C) . Genes differentially expressed in CD83 + CD14 + monocytes 249 and BAL macrophages across pseudotime identified expression of IL15, which is produced in 250 response to viral infections to promote NK proliferation, and leukocyte recruiting chemokines 251 including CCL2, CCL4, CCL7, and CCL8 upregulated by BAL macrophages (Fig. 2D) . Tissue DCs respond to local inflammation and pathogen challenge by migrating to the draining 254 lymph node to activate naïve T cells. BAL contains a population of mature, migratory DCs that 255 express CCR7 and LAMP3 but downregulate DC-specific markers such as CD1C and CLEC9A 256 (Extended Data 3B). These migratory DCs express IL10 in healthy BAL but express TNF and the 257 common IL-12 and IL-23 subunit IL12B in COVID-19, suggesting altered capacity for T cell 258 polarisation (Fig. 2E) . In peripheral blood, C1QA/B/C + CD16 + monocytes expressed the highest 259 amount of Type 1 IFN response genes compared to all peripheral blood myeloid cells (Fig. 2F) . 260 We detected minimal TNF-or IL6-mediated JAK-STAT signaling pathway activation in 261 circulating monocytes and DCs but this was upregulated by COVID-19 BAL mononuclear 262 phagocytes (Fig. 2F) . Coagulation abnormalities and monocyte-platelet aggregates have been previously reported in 265 COVID-19 patients 31,32 and we observe an expansion of platelets associated with disease severity 266 (Fig. 1C) . This led us to investigate the receptor-ligand interactions predicted to mediate 267 monocyte-platelet interactions using CellPhoneDB, which identified ICAM1 interactions on 268 platelets with CD11a-c/CD18 primarily on C1QA/B/C + CD16 + monocytes and CD16 + monocytes 269 (Fig. 2G ). This is accompanied by increased expression of surface proteins indicative of platelet 270 activation (Fig. 2H) . Our large dataset (850,100 PBMCs) allowed us to interrogate rare populations, including the HSPC 273 compartment. To this end, we selected all cells in clusters with significant expression of the HSPC 274 marker CD34, which resulted in a total of 3,085 HSPCs, following removal of minor clusters co-275 expressing mature lineage markers. Leiden clustering and UMAP visualisation resulted in a cloud-276 like representation with closely attached clusters, consistent with a stem/progenitor landscape 277 previously described for bone marrow HSPCs 33 (Fig. 2I, Extended Data 3C) . Absence of CD38 278 mRNA and protein expression marks the most immature cells within the CD34 compartment, 279 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 while expression of markers such as GATA1, MPO and PF4 characterises distinct erythroid, 280 myeloid and megakaryocytic progenitor populations (Fig. 2I) . Accordingly, we were able to 281 annotate six transcriptional clusters as CD34 + CD38 -HSPCs, CD34 + CD38 + early progenitor 282 HSPCs, and CD34 + CD38 + erythroid, megakaryocytic and myeloid progenitors as well as a small 283 population distinguished by the expression of genes associated with cell cycle (S-phase) (Fig. 2I) . Following stratification by disease severity, the most noteworthy observation was that the 286 megakaryocyte progenitors were essentially absent in healthy and asymptomatic individuals, but 287 comprised approximately 5% of CD34 + cells in mild, moderate, severe and critical patients (Fig. 288 2J). Unlike the bone marrow, which contains rapidly cycling progenitors, the peripheral blood is 289 not thought to constitute a site for haematopoiesis 34 , consistent with the low number of CD34 + 290 cells expressing a cell cycle signature, which was furthermore restricted to genes associated with was of particular interest, as it suggested a rebalancing of the stem/progenitor compartment. The 298 overall number of these megakaryocyte progenitors however was low, prompting us to seek 299 additional evidence for reprogramming of immature haematopoiesis. To this end, we carried out 300 differential gene expression analysis between the megakaryocyte, myeloid and erythroid 301 progenitor clusters, and used the resulting gene lists to build gene signatures in order to interrogate 302 early activation or priming of lineage-specific transcriptional programs in the most immature 303 haematopoietic progenitor cell clusters (Extended Data 3D). This analysis showed activation of 304 the megakaryocyte progenitor signature in both the CD38and CD38 + HSPC populations (Fig. 305 2K), with less pronounced effects seen with the erythroid and myeloid signatures (Extended Data 306 3E). Of note, the megakaryocyte signature was also strongly induced in the asymptomatic patients, 307 which do not contain substantial numbers of CD34 + megakaryocyte progenitors in their peripheral 308 blood. Our earlier observation of increased platelet activation within the context of normal platelet 309 counts (Fig. 2H , Extended Data 2B) is therefore consistent with a model whereby exaggerated 310 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint megakaryopoiesis may be compensating for peripheral platelet consumption in COVID-19 311 patients. Of note, our HSPC compartment analysis suggests that immature haematopoiesis is also 312 affected in asymptomatic patients, but possibly through distinct differentiation and/or cell 313 mobilisation processes. Taken together, our data suggest that alterations in the cellular composition 314 and transcription programs of the stem/progenitor compartment contribute to the patho-315 physiological response to SARS-CoV-2 infection. To investigate T cell phenotype beyond differential abundance of T cell subsets, we performed 347 differential gene expression analysis across disease severity (FDR 1%) followed by gene set CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint IFN response genes in B cells in severe or critical disease does not reflect an inability of B cells to 404 respond to IFN-ɑ, but rather attenuation of IFN-ɑ. This may be because the initial anti-viral 405 response has waned in patients with severe or critical disease or because these patients fail to 406 sustain adequate IFN-ɑ production by myeloid cells and pDCs following symptom onset as 407 previously reported 13 . Longitudinal sampling would be required to distinguish these two 408 possibilities. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint with BCR activation signaling, suggesting a potential for greater BCR activation. This may 435 indicate that more avid responses early in disease prevent progression to a more severe phenotype, 436 or may merely reflect the immune response in the early phase of the disease. Longitudinal analysis 437 of patient samples would be required to address this question. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint We summarise the immunological cellular and molecular profiles observed in our study 466 highlighting known and new discoveries as well as the distinguishing features of asymptomatic 467 and milder disease from severe and critical disease (Fig. 5) . Future longitudinal studies may enable 468 us to distinguish if the distinct responses in asymptomatic and milder disease prevent progression 469 to severe phenotypes. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. A. Volcano plots showing results of differential abundance testing. Hypothesis testing was 864 performed using quasi-likelihood F-test comparing healthy controls to cases for linear trends 865 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint across disease severity groups (healthy > asymptomatic > mild > moderate > severe > critical). 866 Differentially abundant cell types were determined using a 10% false discovery rate (FDR) and 867 marked (*). Hypothesis testing was performed using quasi-likelihood F-test comparing healthy 868 controls to cases. Differentially abundant cell types were determined using a 10% false discovery Hypothesis testing was performed using quasi-likelihood F-test comparing healthy controls to 907 cases, or for either a linear or quadratic trend across disease severity groups (asymptomatic > mild 908 > moderate > severe > critical). Differentially abundant cell types were determined using a 10% 909 false discovery rate (FDR). C. Gene set enrichment (MSigDB Hallmark 2020) in each T cell type 910 based on differential gene expression (DGE) analysis was performed across COVID-19 disease 911 severity groups, ordered from healthy > asymptomatic > mild > moderate > severe > critical. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 plasmablast and plasma cells between severity statuses. Significance is denoted by *P < 0.05; **P 928 < 0.01; ***P < 0.001. (Bottom) Cell type abundance counts were modelled as a function of disease 929 severity. Hypothesis testing was performed using quasi-likelihood F-test comparing asymptomatic 930 to symptomatic covid, for either a linear or quadratic trend across disease severity groups 931 (asymptomatic > mild > moderate > severe > critical). Differentially abundant cell types were 932 determined using a 10% false discovery rate (FDR). C. GSEA of pathways from MSigDB v7. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. Size of nodes is scaled according to increasing node closeness centrality scores i.e. nodes that are 972 highly central to a clonotype network will be larger. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 Methods . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint TotalSeq-C™ antibody cocktail (BioLegend 99813; see Supplementary Table 2) was centrifuged 1124 at 14,000 g at 4°C for 1 min, resuspended in 52 μL of FACS buffer, incubated at room temperature 1125 for 5 min and centrifuged at 14,000 g at 4°C for 10 min. 25 μL were subsequently added to each CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10. 1101 /2021 Cambridge: 1186 Non-empty droplets were called within each multiplexed pool of donors using the emptyDrops 1187 function implemented in the Bioconductor package DropletUtils, using a UMI threshold of 100 1188 and FDR of 1%. The probability of being a doublet was estimated for each cell per sample (that is 1189 one 10x lane) using the "doubletCells" function in scran based on highly variable genes (HVGs). 1190 Next, we used "cluster_walktrap" on the SNN-Graph that was computed on HVGs to form highly 1191 resolved clusters per sample. Per-sample clusters with either a median doublet score greater than 1192 the median + 2.5 x MAD or clusters containing more than the median + 2.5 MAD genotype 1193 doublets were tagged as doublets. This was followed by a second round of highly-resolved 1194 clustering across the whole data set, in which again cells belonging to clusters with a high 1195 proportion (> 60%) of cells previously labelled as doublets were also defined as doublets. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint each protein was then subtracted from the log CPM from the QC-passed droplets in the respective 1217 experimental sample. 1218 1219 Quality control, normalisation, embedding and clustering 1220 Combined raw data from the three centres was filtered to remove those that expressed fewer than 1221 200 genes and >10% mitochondrial reads. Data was normalised (scanpy: normalize_total), log+1 1222 corrected (scanpy: log1p) and highly variable genes identified using the Seurat vst algorithm 1223 (scanpy: highly_variable_genes). Harmony was used to adjust principal components by sample ID 1224 and used to generate the neighbourhood graph and embedded using UMAP. Clustering was 1225 performed using the Leiden algorithm with an initial resolution of 3. For initial clustering, 1226 differentially expressed genes were calculated using Wilcoxon rank-sum test. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 clinical/technical factor on cell type composition was estimated by the interaction term with the 1248 cell type (see Supplementary Note 1 for detail). The likelihood ratio test was performed to assess 1249 the statistical significance of each factor on cell type abundance by removing one interaction term 1250 from the full model at a time. The number of factors was used to adjust multiple testing with the 1251 Bonferroni approach. The 'glmer' function in the lme4 package implemented on R was used to fit 1252 the model. The standard error of variance parameter for each factor was estimated using the 1253 numDeriv package. 1254 1255 Cydar Analysis 1256 We utilized cydar to identify changes in cell composition across the different severity groups based 1257 on the protein data alone. First, the background-corrected protein counts from the three different 1258 sites were integrated using the 'fastMNN' method (k = 20, d = 50, cos.norm = TRUE) in scran 58 . The batch-corrected counts for 188 proteins (4 rat/mouse antibody isotypes were removed) were 1260 then used to construct hyperspheres using the 'countCells' function (downsample = 8) with the 1261 tolerance parameter chosen so that each hypersphere has at least 20 cells which was estimated 1262 using the 'neighborDistances' function. To assess whether the abundance of cells in each 1263 hypersphere are associated with disease status, hypersphere counts were analyzed using the quasi-1264 likelihood (QL) method in edgeR. After filtering out hyperspheres with an average count per 1265 sample below 5 we fitted a mean-dependent trend to the NB dispersion estimates. The trended 1266 dispersion for each hypersphere was used to fit a NB GLM using the log-transformed total number 1267 of cells as the offset for each sample and blocking for sex, age and batch. The QL F-test was used 1268 to compute P values for each hypersphere which were corrected for multiple testing using the 1269 spatial FDR method in cydar. Comparisons of PBMC annotation using the Azimuth tool 1272 The final annotation of PBMCs was compared to a published PBMC annotation using the Azimuth 1273 tool (http://azimuth.satijalab.org/app/azimuth). Because of size restrictions of 100,000 cells, our 1274 data was subsampled to 10% of the total cells. After running the algorithm, results with a prediction 1275 score < 0.5 were removed (5.8% of total removed). For each cluster in the COVID-19 PBMC data, 1276 the percentage of cells mapped to each cluster in the Azimuth annotation was calculated. 1277 1278 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 Interferon, TNF and JAK-STAT response scoring 1279 A list of genes related to response to type I interferons was obtained from the GSEA Molecular CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 CellphoneDB 60 was used to assess putative interactions between monocytes (CD14_mono, CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint expression of specific marker gene and protein pairs. Dot plots to visualise marker protein and 1402 mRNA expression across clusters were generated using the R package ggplot2. UMAP 64 was used 1403 to project all single T cells into a 2D space (k=31) using the first 30 batch-integrated PCs as input 1404 using the R package umap. 3) productive CDR3 spanning V+J genes. Chain-specific TCR clones were defined for each 1425 observed α and β chain by first concatenating the V, J and identical CDR3 nucleotide sequences. 1426 For each single T cell, these chains were then combined to form a single clonotype, removing cells 1427 that contained: 1) > 2 β chains and > 2 α chains, 2) a single α or a single β chain only. T cells with 1428 exactly 2 β chains and 1 α chain, or those with exactly 2 α chains and 1 β chain were retained. TCR 1429 clonotypes were counted within each donor sample, and expanded clones were defined where > 1 1430 cell was assigned to the TCR clonotype. 1431 1432 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint from the number of cells sampled (post-QC) may not necessarily recapitulate the entire repertoire, 1526 which may under-or over-represent merged counts for gini index calculation. We propose the use 1527 of node closeness centrality computed on each expanded clone (clone size > 1) as an alternative 1528 metric to emulate the statistics to adapt to the single-cell nature of the data; closeness centrality 1529 defines how close and central each node is with respect to other nodes in the graph and therefore 1530 cells with identical BCRs will have high closeness centrality scores, due to the way the BCR 1531 network is constructed in dandelion. Thus, we can quickly calculate if cells across clones, and/or 1532 samples overall, in the entire graph display proportionately/disproportionately high or low 1533 closeness centrality scores. One caveat to the current implementation is that it is only meaningful (1 or 0). The information is turned into an adjacency matrix where an edge is created between two 1543 individuals if there is at least one clonotype (at least 1 cell from each individual displays an 1544 identical combination of heavy and light chain V-and J-gene usage with allowance for somatic 1545 hypermutation at the CDR3 junctional region) that is similar between the two individuals. 1546 Visualisation is achieved using the CircosPlot function from nxviz package (v0.6.2). . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. CLEC9A CADM1 CLEC10A CD1C CD14 VCAN CCR7 LAMP3 AXL SIGLEC6 LILRA4 ITM2C GZMB IL1B IER3 LDLR CD83 S100A12 CSF3R FCGR3A MS4A7 LILRB1 CSF1R CDKN1C C1QA C1QB C1QC CCR1 MARCO MKI67 IL2 IL1A IL1B IL10 IL22 TNF IFNG LTA IL4 IL5 IL13 IL17A IL17F IL21 IL12A MS4A1 CD19 CD40 CD69 CD22 FCER2 CD24 CR2 MME MKI67 IGHD IGHM IGHA1 IGHG1 MZB1 CD27 TNFRSF13B CD38 1% (v/v) Penicillin/Streptomycin (100 U/mL and 100 μg/mL 1341 respectively; Sigma Aldrich, P0781) and 1% (v/v) L-Glutamine (2 mM; Sigma Aldrich, G7513), 1342 referred as RPMI10, followed by centrifugation at 500 g for 5 min. Cell pellet was resuspended in 1343 500 μL RPMI10 with added DNAse (1 μg/mL, Merck, 10104159001), divided into 5 wells of 1344 round bottom 96-well plate and left to rest at 37°C for an hour CoV-2 PepTivator peptide S for pan-HLA (2 μg/mL, Miltenyi Biotech μL/mL, Cell Activation cocktail, Biolegend, 423301), and 1347 incubated at 37°C for 2 h. Negative controls were left untreated. Brefeldin A (2 μg/mL :50, clone H4A3, BD Bioscience, 1349 566558) was added for additional 4 h into all conditions. Cells were stained for detection of 1350 activation induced markers and intracellular cytokines 6 h after stimulation and subjected to flow 1351 cytometry Flow Cytometry of stimulated cells 1354 PBMC stimulated for 6 h with the SARS-Cov-2 peptide were washed with PBS, and cell surface 1355 stained for 1 h at room temperature: anti-CD14-FITC (1:50, clone M5E2, BD Biosciences anti-CD3-BUV395 (1:50, clone UCHT1, BD Biosciences, 563546) Cells were washed with PBS 2% (v/v) FCS, fixed with 4% (w/v) paraformaldehyde 1364 (ThermoFisher Scientific, 28908) and kept at 4°C overnight :10, clone 1367 The proportion of expanded clones as a function of a linear trend across disease severity groups 1433 was modelled using logistic regression, adjusted for age, gender and batch. A separate model was 1434 run for each T cell subtype which contained at least 5 cells assigned to the expanded TCR 1435 clonotypes The TE:EM ratio change across COVID-19 severity was tested using a robust linear 1440 model implemented in the R package robustbase, regressing TE:EM ratio on disease severity as 1441 an ordered linear variable (asymptomatic > mild > moderate > severe > critical), adjusted for age, 1442 gender and batch Cell type proportions were 1448 computed by normalizing the counts of each cell type within each donor by the total number of 1449 cells captured for that donor sample. Donor samples were ranked according to their disease 1450 severity (healthy > asymptomatic > mild > moderate > severe > critical). Differential correlation 1451 analysis was then performed between CD4.Tfh vs all B cell types BCR contigs contained in filtered_contigs.fasta and filtered_contig_annotations.csv from 1458 all three sites were then pre-processed using immcantion inspired preprocessing pipeline 70 1459 implemented in the dandelion python package; dandelion is a novel single cell BCR-seq analysis 1460 package for 10x Chromium 5' data. All steps outlined below are performed using dandelion 1461 v0 17/EE/0025, and "Genetic variation AND Altered Leukocyte Function in health and disease -1031 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint 37 GANDALF" REC ref 08/H0308/176). All participants provided informed consent. Each 1032 participant provided 27 mL of peripheral venous blood collected into a 9 mL sodium citrate tube. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101/2021.01.13.21249725 doi: medRxiv preprint 38 ventilation were defined as "Critical". There were no patients in ICU that did not require 1062 supplemental oxygen. BCRs were grouped into clones/clonotypes based on the following sequential criterion that applies 1492 to both heavy chain and light chain contigsi) identical V-and J-gene usage, ii) identical 1493 junctional CDR3 amino acid length, and iii) at least 85% amino acid sequence similarity at the 1494 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 15, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 52 CDR3 junction (based on hamming distance). Light chain pairing is performed using the same 1495 criterion within each heavy chain clone. Only samples collected at day 0 of the study were analyzed 1496 from this step onwards and clones/clonotypes were called across the entire dataset; the sample 1497 from one of the donors who was subsequently found to have a B cell malignancy was separated 1498 from the analysis and processed independently. The use of the BCR network properties for computing gini indices was inspired from bulk BCR-1516 seq network analysis methods where distribution of clone sizes and vertex sizes (sum of identical 1517 BCR reads) in BCR clone networks were used to infer the relationships between BCR clonality, 1518 somatic hypermutation and diversity 73 . However, there are challenges with native implementation 1519 of this approach for single-cell data. Firstly, to enable calculation of network-based clone/cluster 1520 and vertex/node size distribution, BCR networks needed to be reduced such that nodes/cells with 1521 identical BCRs had to be merged and counted; this required the re-construction of BCR networks 1522 per sample and discarding single-cell level information. Furthermore, the process of node 1523 contraction and counting of merging events requires significant computation time and resource. 1524