key: cord-0841299-95renluh authors: Garg, Manik; Li, Xu; Moreno, Pablo; Papatheodorou, Irene; Shu, Yuelong; Brazma, Alvis; Miao, Zhichao title: Meta-analysis reveals consistent immune response patterns in COVID-19 infected patients at single-cell resolution date: 2021-01-24 journal: bioRxiv DOI: 10.1101/2021.01.24.427089 sha: be332d3ed28dbfea455fdf964288351f4319215d doc_id: 841299 cord_uid: 95renluh A number of single-cell RNA studies looking at the human immune response to the coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have been recently published. However, the number of samples used in each individual study typically is small, moreover the technologies and protocols used in different studies vary, thus somewhat restricting the range of conclusions that can be made with high confidence. To better capture the cellular gene expression changes upon SARS-CoV-2 infection at different levels and stages of disease severity and to minimise the effect of technical artefacts, we performed meta-analysis of data from 9 previously published studies, together comprising 143 human samples, and a set of 16 healthy control samples (10X). In particular, we used generally accepted immune cell markers to discern specific cell subtypes and to look at the changes of the cell proportion over different disease stages and their consistency across the studies. While half of the observations reported in the individual studies can be confirmed across multiple studies, half of the results seem to be less conclusive. In particular, we show that the differentially expressed genes consistently point to upregulation of type I Interferon signal pathway and downregulation of the mitochondrial genes, alongside several other reproducibly consistent changes. We also confirm the presence of expanded B-cell clones in COVID-19 patients, however, no consistent trend in T-cell clonal expansion was observed. Coronavirus disease 2019 (COVID- 19) , the global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a recognised major threat to the humanity. Thanks to studies from countries around the world, a significant progress in the fields of disease diagnosis, treatment, prevention and control of this disease has been made. However, the pathogenesis of SARS-CoV-2 infection and the immunological characteristics associated with the severity of the disease are still unknown. To understand the pathology and immune response in COVID-19 patients, a number of single-cell RNA sequencing (scRNA-seq) experiments have been performed on different cell types obtained from human patients [1] [2] [3] [4] [5] [6] [7] . Studies on diseases caused by influenza and other respiratory viruses have shown that the peripheral immune response plays an important role in the defence against the infections and disease progression 1 . In COVID-19 patients, several pathways have been reported to be regulated, including the CCR1 and CCR5 pathways 5 , the HLA class II and type I interferon pathways 2 , the IL1B pathways and interferon-stimulated genes 4 . However, most studies only focus on pathway analysis in certain cell types. Due to the limited patient sample availability, it is difficult to derive statistically reliable trends in the changes of cell subtype proportions over the disease stages in individual studies. It is still unclear to what extent some of the observations can be generalized and which pathways are consistently regulated. Here we report results from systematic meta-analysis of 9 single-cell RNA-seq datasets from SARS-CoV-2 infected patient samples and a healthy control dataset. In these datasets, 3 of them have T cell receptor (TCR) sequencing data and 2 have B cell receptor (BCR) sequencing data available. Specifically, we analysed 1) the cell type and cell proportion captured in different experiments to infer the immune response upon SARS-CoV-2 infection; 2) the gene expression regulation and pathway activation in COVID-19 patients; 3) the clonal expansion in T cells and B cells. While we can reproduce 84.21% of the 19 previous conclusions based on small sample size in their original datasets, we could not fully reproduce less than half of these conclusions in other studies. In particular, we show consistent upregulation of type I Interferon signal pathway and downregulation of the mitochondrial genes, as well as demonstrate an absence of consistent T-cell clonal expansion in COVID-19 patients across studies. Meta-analysis reveals common cell types and tissue-specific heterogeneity • Mapping of the disease states The overview of the datasets is given in Table 1 , which also gives each dataset a name referred to throughout the text. In total, the datasets that we analysed include 159 experimental samples and 862,354 cells across 9 different disease conditions. Considering the sample stages in different datasets are defined with different names, we map them to standardised terms Healthy, Mild, Moderate, Severe, Post Mild, Convalescent, Late recovery, Asymptomatic and Influenza ( Supplementary Table S1 ). Although such retrospective standardisation may introduce a certain level of noise, the mapping of comparable conditions is essential for meta-analysis ( Figure 1 ). We have taken care to consider the detailed descriptions of the named conditions as presented in the respective papers and we are confident that for the purposes of our analysis the introduced mappings are correct (see Supplementary Data 1 ). Visualisation of the combined data using uniform manifold approximation and projection (UMAP) in Figure 2a shows strong batch-effect corresponding to different studies. We find the peripheral blood mononuclear cell (PBMC) datasets are more similar to each other and form three large cell clusters, which are the lymphoid lineage, myeloid lineage and the B cells. The Wilk PBMC dataset, which is based on the Seq-Well technique, clusters separately. After the data integration and batch-effect removal with Harmony 8 , the cells from different studies were no longer separated and largely clustered by the underlying biology ( Figure 2b ). Moreover, the healthy samples are also well-mixed, except the epithelial cells, which are mainly captured in the Chua dataset. Similar to previous publication 4, 9 , the cells can be classified into 5 major cell populations, Cells in each population are then further divided into subpopulations based on the expression of marker genes and logistic regression reference-based annotation methods (see Methods) . The resulting cell clusters were further refined using SCCAF 10 ; all the 5 cell populations achieved self-projection accuracies above 92% in the Harmony latent space, Supplementary Table S2 . The final result includes 37 cell subpopulations excluding doublets. In the lymphoid cell population, we define 5 CD8 + T cell subpopulations, including CD8n T cell, CD8m T(GZMK), CD8m T(GZMH), CD8m T(IL7R) and CD8 effector cells. The CD8n T cells cluster next to the CD4 naive T cells and express CD8B, CD8A, CD3D, CD3E and CCR7 . CD8m T(GZMK), CD8m T(GZMH) and CD8m T(IL7R) are 3 subpopulations of the CD8 T cells and show complementary marker genes ( GZMK, GZMH and IL7R ), Supplementary Figure S2 . GZMH is known as a cytotoxic effector T cell marker, while GZMK is a transitional effector T cell marker 11 . GZMH and GZMK not only show good discrimination of the CD8 + T cell subpopulations, but also discriminates the NK cell populations, Supplementary Figure S3 . The CD8eff T cell is a proliferating cell subpopulation and is easily distinguishable. CD56 ( NCAM1 ), GNLY and NKG7 highlight the NK cells. We further divide the NK cells into 4 subpopulations NK (GZMH + ), NK(GZMK), NK(SYNE2) and mito-high NK. The mito-high NK cluster has a higher level of percentage of mitochondrial RNA contents and also expresses T cell markers ( CD3D, CD3E ) as well as some red blood cell markers ( HBB, HBA1, HBA2 ). Therefore, the mito-high NK cluster is possibly a contaminated subpopulation. Considering γδ T cells comprise 1%-10% of human PBMCs, it is difficult to distinguish the γδ T cell subpopulations within a single dataset. The existing annotations of γδ T cells are not fully consistent: Wilk et al. annotated according to the γδ TCR constant chains encoding genes ( TRGC1, TRGC2 and TRDC ), while Zhang et al. annotated γδ T cells as TRGV9 + TRDV2 + . However, these groups of marker genes highlight two cell subpopulations next to each other, Supplementary Figure S4 . It is known that Vγ9Vδ2 T cells are the major subset of γδ T cells in human PBMCs 12 . According to the marker knowledge of γδ T cells [12] [13] [14] , we define the TRGV9 + TRDV2 + subpopulation as Vγ9Vδ2 T cells and other CD161(KLRB1) + TRGC1 + TRGC2 + cells as γδ T cells. The γδ T cells also express canonical γδ T cell markers CCR5, CCR6 15 as well as SLC4A10 16 and TRAV1-2 17 , which are known as Mucosal-associated invariant T (MAIT) cell markers. Although the monocytes cells have been discussed in a previous publication 18 , we are able to refine this type to include 3 CD16 + monocytes subpopulations and 2 CD14 + monocytes subpopulations. CD16 + monocytes express high FCGR3A ( CD16 ), while CD16 monocytes(C1QA) not only express the complement proteins ( C1QA, C1QB, C1QC ) but also HLA-DRA and CD63 . The CD16 monocytes(CTSL) cluster expresses CTSL and CD163 , the CD16 monocytes(S100A9) cluster expresses S100A8 and S100A9 , Supplementary Figure S5 . These cell clusters correlate well with the cell populations reported by Schulte-Schrepping et al. 18 . Neutrophils were not annotated in all the reported datasets 3 4 . However, after annotating neutrophils according to the canonical markers ( FCGR3B and CXCR2 ), Supplementary Figure S6 , we find that they exist in all the datasets (but hardly in the healthy controls). We broadly divided the B cells into naïve B, memory B and plasma cells, where the naive B-cells were further divided into Naïve B(EEF1G), Naïve B(TMSB4X) and Naïve B(IGHD) based on the marker genes' expression and memory B-cells were further divided into Memory B(EEF1G) and Memory B(CD27). Meta-analysis shows consistently decreasing relative proportion of lymphoid cells and increasing proportion of myeloid cells with disease severity In Figure 4a , we show the average cell composition change of the 37 cell subpopulations at different disease stages. The lymphoid cells (T cells and NK cells) decrease from healthy to severe (healthy, mild, moderate, severe), while myeloid cells (CD14 and CD16 monocytes, Neutrophils) relatively increase. In the recovery stages (post mild, convalescent and late recovery), the lymphoid cell proportions are comparable to the proportion at the mild stage but higher than the proportions in moderate and severe stages. And for myeloid cells, the cell proportions are higher than those in the healthy and mild stages. The cell composition of asymptomatic patients is similar to healthy donors. Possibly the asymptomatic patients do not show notable immune responses. Influenza infection data was only found in the Lee dataset and the result shows a very different cell composition from those in COVID-19 infection, where influenza patients have a similar level of lymphoid cells as the moderate COVID-19 patients but more CD14 + monocytes. To understand the disease process in each dataset separately, 4 datasets (Zhang, Wilk, Chua and Liao) include stages of healthy, moderate and severe, while the Lee dataset includes healthy, mild and severe (here, we keep the mild and moderate as different stages because the samples from patients with mild response show difference in the cell composition from those with moderate infections). Since the 10X reference dataset was used as a healthy reference in the He study 6 , we use the same strategy when analysing this dataset. According to the distribution of these 6 datasets, we find a consistent decreasing trend over the stages from healthy to severe, except for the Liao dataset ( Figure 4b ). Healthy samples of the same tissue type are expected to show similar results. The 4 datasets (Lee, Zhang, Wilk and Chua) show a similar proportion of T cells in healthy donors, which is around 30-40%. But the 10X healthy PBMC reference shows~60% T cells and the Liao dataset of BALF shows only~10% T cells. Due to the lack of BALF-derived healthy samples in the He dataset, we cannot confirm whether this is expected in healthy BALF samples or if it corresponds to the general individual-specific variability such as that explored by Wong et al. 2013 19 . However, we can still find a decrease of T cell proportion from moderate to severe stages in the Liao dataset. The percentages of monocytes and neutrophils relatively increase from healthy to severe stages. Similar to the T cell proportion distribution, the Lee, Zhang, Wilk, Chua and 10X datasets show~20% monocytes in the healthy samples, Figure 4d . In the Liao data this percentage is around 80% and it still shows an increase in monocytes from moderate stage to severe stage. If we use 20% as a reference value for monocytes in healthy controls, in the Liao dataset monocytes further increase from healthy to severe patients. For neutrophils, the 6 datasets consistently illustrate an increasing trend from healthy to severe, though the levels of increase can vary, Figure 4e . In the He and Chua dataset, some of the samples may even include 40%-70% neutrophils. Only a small number of Plasma cells have been detected in the data from healthy donors, but the cell proportion increases in the moderate and severe stages, which is consistent for all the 6 datasets, To study the response of gene regulatory pathways to COVID infection, we identified differentially expressed (DE) genes and pathways by comparing the mild, moderate and severe patient samples with healthy controls. For each dataset, we first measure the DE genes between moderate samples and healthy controls, and also the DE genes between severe samples and healthy controls, Figure 5a . Then, we compare the log fold changes of the genes (False Detection Rate <0.01) between moderate/healthy and severe/healthy. Interestingly, the log fold changes between severe and healthy correlates with the log fold changes between moderate and healthy, which indicates that the genes upregulated and downregulated in moderate and severe samples compared to the healthy controls are similar, Figure 5b , c . Moreover, we find the correlation happens in most of the cell types in all studies (including Zhang, Wilk, Liao and Chua). The Lee dataset does not include moderate samples, but includes mild samples, which are different from the moderate ones in cell type composition. It is not surprising that the correlation between mild/healthy and severe/healthy is lower than moderate/healthy and severe/healthy. Nevertheless, this correlation is still positive in all the cell types. Furthermore, we identified the genes upregulated in both moderate and severe samples for each study and checked the overlaps between different studies. In CD14 monocytes, 3822, 639 and 650 genes are upregulated in the Zhang, Wilk and Liao datasets respectively. Of these, 153 genes are consistently upregulated by all the studies, Figure 5d . Pathway analysis of these 153 genes highlight the gene ontology terms: immune response, response to virus and type I interferon signalling pathway. Although the upregulation of type I interferon signalling pathway in monocytes has been reported by Wilk et al. 2020 2 , the upregulation mainly happens in CD14 monocytes and not in CD16 monocytes. We find the upregulation of type I interferon signalling pathway in many of the cell types, including T cells (CD4 T, CD8 T and γδ T), NK cells, DC, pDC, B cells, Plasma cells and Neutrophils ( Supplementary Figure S8 ) . Thus, the upregulation of type I interferon signalling pathway seems to be consistent for many cell types. Specifically, we find the upregulation of IFITM3, IFIT3, IFI44, IFI44L, IFI16, IFI6, LY6E, ISG15, ISG20 Additionally, the human leukocyte antigen (HLA) class II genes have been reported to be downregulated in monocytes of severe COVID-19 patients 2 . We found that in CD14 monocytes, HLA class II module scores (see Methods ) were significantly lower (Bonferroni corrected p-value < 0.05) in severe COVID-19 patients than healthy controls across all the three PBMC datasets (Lee, Zhang and Wilk), as well as in the Liao dataset and the Chua dataset ( Figure 5f ). This observation was also consistent in CD16 monocytes, dendritic cells (DCs) and plasmacytoid dendritic cells (pDCs) across all the five datasets but not in NK cells, naive B-cells or memory B-cells ( Supplementary Figure S9 ). The interferon-stimulated genes (ISGs) have been reported to be heterogeneously upregulated in CD14 monocytes in COVID-19 (both moderate and severe) 2 . We found the result is reproducible in Wilk, Zhang and Liao datasets. However, the result in the Lee dataset demonstrates a downregulation and the difference in Chua dataset was not statistically significant ( Figure 5g ). Therefore, we may not confirm a definite result for the ISG regulation in COVID-19 patients. Meta-analysis did not provide validation of the developing neutrophil population in severe COVID-19 patients We first pooled T-cell and NK cell data from all the studies together and visualized them using UMAP ( Figure 3 : "Lymphoid cells" and Figure 6a ). As expected, NK-cells mostly had no corresponding TCR data. For the cells with TCR data, we observed clonal expansion of CD8m T-cells, particularly the subpopulations CD8m T(GZMH) and CD8m T(GZMK) and Vγ9V 2 T-cells ( Figure 6a ). Their clonal expansion did not appear to be stage-specific ( Figure 6a ). However, we observed an over-representation of CD8 + effector T-cells in severe patients. These observations were confirmed in the bar plots where samples from all the datasets at a specific stage were pooled in together and normalized according to the total number of cells at each stage ( Figure 6b ). This is also consistent with Zhang et al. 2020 20 . To further check if these observations were consistent across studies, we analyzed the overall clonal expansion trend of T-cells. Contrary to previously published results 3,4 , we didn't observe a consistent trend of increased clonal expansion of T-cells in COVID-19 patients compared to healthy controls ( Figure 6c ). Although the moderate and severe patients of Zhang et al. 2020 3 Zhang et al. 2020 20 showed that the overall increased TCR clonal expansion was observed in the severe convalescent COVID-19 patients, compared to healthy control, in the top 6 clonotypes ratio analysis. However, in all datasets, the differences between individuals were found to be large, and in some datasets greater than the differences between groups ( Figure 6c ) 20 . This seems to indicate that there are multiple factors that may affect TCR clonal expansion. We repeated the analysis on CD8 + T-cell subpopulations showing clonal expansion in Figure 6a (and Figure 6b ) to check if they share a common trend across stages. As shown in Figure 6d , the results were inconclusive for memory CD8 + T-cells but increased clonal expansion of γ T-cells, with more than 5 clones in each clonotype (group 'Expnd (n>5)') was observed in recovered patients compared to healthy across both the PBMC datasets. Further, due to the presence of less than 100 CD8 + effector T-cells in most of the samples, we were unable to analyze this subpopulation in greater detail ( Supplementary Figure S12a We further analyzed whether the previously reported observations 3,4 of increased B-cell clonal expansion in COVID-19 patients compared to healthy controls can be confirmed and found that this is indeed consistent ( Figure 7b ). For closer examination of clonal expansion of specific B-cell subpopulations in COVID-19 patients ( Figure 3 : "B-cells" ), we grouped them into three major groups to have a comparable number of cells in each sample: namely memory B-cells, naïve B-cells and plasma cells. It was observed that there were less than 100 plasma cells with corresponding BCR data in most of the samples to draw any reliable conclusions ( Supplementary Figure S13b ). In case of naïve B-cells, we observed slightly higher clonal expansion in convalescent COVID-19 patients compared to healthy controls in both the studies (clonally expanded clonotypes with 2 to 5 clones each; group 'Expnd (2<=n<=5)'; Supplementary Figure S13c ), but this difference was not statistically significant. We also observed increased clonal expansion of naïve B-cells in moderate and severe COVID-19 patients compared to healthy controls in both the clonally expanded groups ('Expnd (2<=n<=5)' and 'Expnd (n>5); Supplementary Figure S Understanding the immunological landscape of SARS-COV2 is an important part of molecular biology research relevant to fighting the pandemic. Previous single cell studies have tried to contribute to the knowledge of COVID-19 infection across multiple disease stages, however, these studies suffer from limited sample size and lack standardization in data collection, pre-processing, cell-type annotation and analysis. In this study, we have tried to identify which of the previously published conclusions are reproducible across multiple datasets and which are study specific when these datasets are analyzed in a standardised way (summarized in Tables 1 and 2 ). By remapping the reads, we were able to identify certain cell populations that were missed out by the original studies. For example, we observed neutrophil population in the Zhang and the Wen datasets, not reported by the original studies. Further, all nine COVID-19 scRNA-seq datasets analyzed in this study had samples collected from patients at different stages, thereby making it difficult to perform homogeneous comparison across all studies. An ideal way would have been to regroup the samples according to the stages defined on a standardized scale (e.g. the WHO scale 21 ) but due to the lack of this metric in the present datasets, we had to make various assumptions, which we verified by considering the available detail in the information provided in original publications and in some cases by contacting the authors. In particular, we assumed that the healthy, severe, early recovery (convalescent) and asymptomatic stages should be comparable. We also checked whether samples at moderate stage from other datasets could be merged with samples at mild stage from Lee et al. 2020 1 dataset, but these two stages exhibited different cell-proportion distributions ( Figure 4 ) and differentially expressed genes and were, therefore, kept as separate. Presence of more BCR data, longitudinal samples with a detailed history of received therapeutic interventions, particularly those with immunomodulatory effects such as Azithromycin 22 given to severe patients in the Wilk dataset 2 , additional clinical data on pre-existing conditions and ethnicities 19 , comparison with asymptomatic patients and presence of other omics modalities [23] [24] [25] [26] [27] would have helped to strengthen the current analysis. Bar the mentioned uncertainties, which may explain some of the unconfirmed original results across individual studies, this study provides a direct comparison of multiple single cell COVID-19 studies published to date. As shown in The raw fastq files for the Wen dataset 4 were downloaded from the GSA (Genome Sequence Archive) database 28, 29 under the accession number PRJCA002413, including all the files for scRNA-seq, TCR-seq and BCR-seq. The fastq files were aligned to the human genome (GRCh38 version 3.0.0, including 33538 genes), which was downloaded from the 10X Chromium website ( https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest ), using Cell Ranger (3.1.0). TCR and BCR results were aligned to the human vdj reference (version 4.0.0) from 10X Chromium website using Cell Ranger (3.1.0). For the Liao dataset 9 , the Cell Ranger mapped results together with the TCR results were downloaded from GEO 30 under the accession number GSE145926. The downloaded scRNA-seq data includes 33538 features, which is the same as the reference genome used in the above mentioned datasets. The cell type annotation of the dataset was downloaded from GitHub [ https://raw.githubusercontent.com/zhangzlab/covid_balf/master/all.cell.annotation.meta.txt ]. The unannotated cells were considered as low quality and were excluded from analysis. Results for the Lee dataset 1 was downloaded from GEO under the accession number GSE149689. The expression matrix also includes 33,538 features, which is the same as the reference genome used in the above-mentioned datasets. We downloaded the raw fastq files for the Yu dataset 7 from the GSA (Genome Sequence Archive) database under the accession number PRJCA002579. Reads mapping was performed in the same way as on the Wen dataset. The mapped results for the Jiang dataset 31 was downloaded from figureshare [https://figshare.com/articles/dataset/Single_cell_and_immune_repertoire_profiling_of_COVI D-19_patients_reveal_novel_therapeutic_candidates/12115095]. The expression matrix also includes 33,538 features, which is the same as the reference genome used in the above mentioned datasets. We downloaded the raw fastq files of the Wilk dataset 2 from the ENA database 32 under the accession number PRJNA633393. According to Wilk et al., dropEst, samtools and STAR were used for the reads mapping. Human genome GRCh38 version 3.0.0 (the same as the Wen dataset reference) was used as the reference genome for reads mapping. The resulting expression matrix includes 33,538 features for 150,245 cells. We downloaded the raw fastq files of the Zhang dataset 3 from the GSA (Genome Sequence Archive) database under the accession number HRA000150. Reads mapping was performed in the same way as on the Wen dataset. We downloaded the mapped results of the He dataset 6 from GEO under the accession number GSE147143. The expression matrix includes 33,578 features, 24,020 of which are overlapped with the features in the reference genome used in the Wen dataset. The mapped results of the Chua dataset 5 were downloaded from figureshare [ https://doi.org/10.6084/m9.figshare.12436517 ] as mentioned in the publication. The expression matrix includes 26,924 features, 18,410 of which are overlapped with the features in the reference genome used in the Wen dataset. The 'celltype' column in the dataset was used as cell type annotation. Quality control All the datasets include 33,538 features except the He and Chua datasets. For the He and Chua datasets, only the gene symbols that overlap with the other datasets are considered. All the datasets are combined together for quality control and downstream analyses. We filtered the cells with higher than 15% of mitochondrial contents. We also excluded cells of fewer than 500 UMIs or fewer than 200 features. Features expressed in fewer than 3 cells were removed in the analysis. We used Scrublet (version 0.2.1) 33 to determine the doublets in the datasets. As Scrublet was designed to deal with 10X data, the result on Wilk dataset was not used. Cell clusters represented by doublets were removed from the analysis ( Supplementary Figure S14 ) . The cell numbers after quality control were listed in Table 1 . Data Integration SCANPY 34 workflow was used to analyze the data. The following steps were performed: data normalization, log-transformation, highly variable genes selection using the 'cellranger' flavor and principal component analysis. Then, we used Harmony (version 0.0.5) 8 to integrate data from different samples. UMAP 35 and Louvain 36 clustering were calculated according to the Harmony corrected latent space. For the eight datasets of the same reference genome (Wen, Liao, Lee, Yu and Jiang datasets) we combined the dataset to annotate all the cell types. We first used logistic regression in SCCAF 10 to infer the cell types using the possible cell types. And each of the Louvain clusters was assigned to a cell type label accordingly. For Wilk dataset and Chua dataset, we used the cell type annotations adopted from the publication. For the He dataset, we annotated the cell types according to the marker gene expression. To determine the cell types in the data, we first used a machine learning based method to generate cell type labels according to a reference dataset 2 To account for the technical effects such as number of cells or sequencing depth, the hurdle model in MAST 37 was used to model the differential expression of the cells. Only the genes with a false detection rate (FDR) lower than 0.01 was used for volcano plot and later gene ontology analysis. In the MAST results, the genes with a log fold change value greater than 0 are considered as up-regulated genes, while the rest are the down-regulated genes. And the up-regulated genes in a cell cluster are used for gene ontology analysis. The python package of GProfiler 38 was used to understand the pathway regulation. The gene module scores for HLA class II and ISG signature were calculated for each individual cell using sc.tl.score_genes function of scanpy 39 For the other studies included in the current manuscript, the raw sequencing data are available in European Genome-phenome Archive (EGAS00001004481 for Chua et al. 5 The authors declare no conflicts of interests. Tables Table 1. The scRNA-seq studies of SARS-CoV-2 infected patients samples included in this meta-analysis. The 10 datasets used in this meta-analysis study include 6 datasets (Lee, Wilk, Zhang, Wen, Yu, Jiang) of Peripheral blood mononuclear cells (PBMCs), 2 datasets (Liao and He datasets) of Bronchoalveolar lavage fluid (BALF), 1 dataset of nasopharyngeal and bronchial samples (Chua dataset) and 10X healthy control dataset of PBMCs. Increased T-cell clonal expansion (Fig. 6c) Increased CD8 + T-cell clonal expansion in moderate COVID-19 patients compared to severe ( Fig. 6d; Supp Fig. S12b , S12c) Table 1 . The number of samples are listed on the left, while the sample tissue types are colored on the top. The similarities between different conditions are colored from white to black (as from no similarity to high similarity). Here, "BALF" denotes samples derived from Bronchoalveolar lavage fluid, "NB" denotes samples derived from nasopharyngeal/bronchial tissue and "PBMC" denotes samples derived from peripheral blood mononuclear cells. a) The volcano plots show the differential expressions between: 1) moderate patients and healthy donors; 2) severe patients and healthy donors; b) For CD4 T cells, the dot plots show the correlation between the log fold change of moderate/healthy and that of severe/healthy in the five studies; c) For CD8 T cells, the dot plots show the correlation between the log fold change of moderate/healthy and that of severe/healthy in the five studies; d) The venn plot shows the number of upregulated genes in CD14 + monocytes overlapped between the three studies; e) The dot plot shows the top upregulated pathways traced from the 153 overlapped genes identified in c). The number of differentially expressed genes (DEGs) that share a particular GO term is represented by the size of the datapoint, the color shows the FDR-corrected p-value, and the x-axis is the ratio of genes in the full DEG set that contain the particular annotation. f) Violin plots comparing the HLA class II module score of each CD14 + monocytes across studies. All differences were analyzed using two-sided unpaired Wilcoxon rank sum tests with Bonferroni correction and p-values <0.05 are reported. g) Violin plots comparing the ISG module score of each CD14 + monocytes across studies. All differences were analyzed using two-sided unpaired Wilcoxon rank sum tests with Bonferroni correction and p-values <0.05 are reported. . Colors indicate the clusters of TCR detection and cells belonging to expanded (Expnd) vs non-expanded (Non-expnd) clonotypes (left) and the stage of the patients (right). Here, n denotes the number of clones in each clonotype belonging to that group. b) Bar plots showing the percentage of cells at each stage belonging to the specific T-cell subpopulation as well as clone status. Here, grey color denotes no TCR data, orange color denotes non-expanded clone status (Non-expnd) and blue color denotes expanded clone status (Expnd). c) Box plots comparing the percentage of all the T-cells belonging to expanded clonotypes (Expnd (n>=2)) at each stage across studies. Colors denote the stage of the patient. All differences were analyzed using two-sided unpaired Wilcoxon rank sum tests with Bonferroni correction and p-values <0.05 are reported. d) Box plots comparing the percentage of CD8 + memory T-cells (left) and γ T-cells (right) belonging to each group of clonotypes at each stage across studies. Here, the three groups of clonotypes are non-expanded clonotypes (Non-expnd (n=1)), expanded clonotypes with greater than 2 and less than 5 clones (Expnd (2<=n<=5)) and expanded clonotypes with greater than 5 clones (Expnd (n>5)). Colors denote the stage of the patient. All differences were analyzed using two-sided unpaired Wilcoxon rank sum tests with Bonferroni correction and p-values <0.05 are reported. Here, n denotes the number of clones in each clonotype belonging to that group. b) Box plots comparing the percentage of all the B-cells belonging to expanded clonotypes (Expnd (n>=2)) at each stage across studies. Colors denote the stage of the patient. All differences were analyzed using two-sided unpaired Wilcoxon rank sum tests with Bonferroni correction and p-values <0.05 are reported. Immunophenotyping of COVID-19 and influenza highlights the role of type I interferons in development of severe COVID-19 A single-cell atlas of the peripheral immune response in patients with severe COVID-19 Single-cell landscape of immunological responses in patients with COVID-19 Immune cell profiling of COVID-19 patients in the recovery stage by single-cell sequencing COVID-19 severity correlates with airway epithelium-immune cell interactions identified by single-cell analysis Single-cell analysis reveals bronchoalveolar epithelial dysfunction in COVID-19 patients Thymosin alpha-1 Protected T Cells from Excessive Activation in Severe COVID-19 Fast, sensitive and accurate integration of single-cell data with Harmony Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19 Putative cell type discovery from single-cell gene expression data Dysfunctional CD8 T Cells Form a Proliferative Vγ9Vδ2 T Cells: Can We Re-Purpose a Potent Anti-Infection Mechanism for Cancer Therapy? Cells 9 Functional Phenotypes of Human Vγ9Vδ2 T Cells in Lymphoid Stress Surveillance Enriched CD161high CCR6+ γδ T cells in the cerebrospinal fluid of patients with multiple sclerosis Patterns of chemokine receptor expression on peripheral blood gamma delta T lymphocytes: strong expression of CCR5 is a selective feature of V delta 2/V gamma 9 gamma delta T cells Differences in the molecular signatures of mucosal-associated invariant T cells and conventional T cells TRAV1-2 + CD8 + T-cells including oligoconal expansions of MAIT cells are enriched in the airways in human tuberculosis Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell Compartment Reference ranges for lymphocyte subsets among healthy Hong Kong Chinese adults by single-platform flow cytometry Adaptive immune responses to SARS-CoV-2 infection in severe versus mild individuals Azithromycin modulates immune response of human monocyte-derived dendritic cells and CD4+ T cells Multi-Omics Resolves a Sharp Disease-State Shift between Mild and Moderate COVID-19 Large-Scale Multi-omic Analysis of COVID-19 Severity Longitudinal Multi-omics Analyses Identify Responses of Megakaryocytes, Erythroid Cells, and Plasmablasts as Hallmarks of Severe COVID-19 Multi-omics-based identification of SARS-CoV-2 infection biology and candidate drugs against COVID-19 The cellular immune response to COVID-19 deciphered by single cell multi-omics across three UK centres NCBI GEO: archive for functional genomics data sets--update Single cell and immune repertoire profiling of COVID-19 patients reveal novel therapeutic candidates The European Nucleotide Archive in 2019 Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data SCANPY: large-scale single-cell gene expression data analysis Dimensionality reduction for visualizing single-cell data using UMAP Fast unfolding of communities in large networks MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update) SCANPY: large-scale single-cell gene expression data analysis Elegant Graphics for Data Analysis kassambara/rstatix Welcome to the Tidyverse User-friendly, scalable tools and workflows for single-cell analysis. Cold Spring Harbor Laboratory We thank Prof. Catherine Blish for providing us with WHO scores 46 of samples from Wilk et al. 2020 2 that helped us to redefine non-ventilated patients as moderate for comparison. We would like to acknowledge Dr. Pedro Beltrao and other members of the EMBL COVID-19 discussion group for useful suggestions. We also thank Dr. Craig Russell for comments.Author contributions MG analysed the data and wrote the paper. XL summarized patient level clinical data, gave biological inputs and wrote the paper. ZM designed the project, collected the datasets, analysed the data and wrote the paper. PM and IP visualised the single cell data. AB, ZM and YS supervised the project.