key: cord-0872196-fnb2j2t9
authors: Nassir, Nasna; Tambi, Richa; Bankapur, Asma; Karuvantevida, Noushad; Khansaheb, Hamdah Hassan; Zehra, Binte; Begum, Ghausia; Hameid, Reem Abdel; Ahmed, Awab; Deesi, Zulfa; Alkhajeh, Abdulmajeed; Uddin, K M Furkan; Akter, Hosneara; Safizadeh Shabestari, Seyed Ali; Gaudet, Mellissa; Hachim, Mahmood Yaseen; Alsheikh-Ali, Alawi; Berdiev, Bakhrom K.; Al Heialy, Saba; Uddin, Mohammed
title: Analyzing single cell transcriptome data from severe COVID-19 patients
date: 2022-04-21
journal: STAR Protoc
DOI: 10.1016/j.xpro.2022.101379
sha: 68a93b7e4a21e0f4fa6f8f2634690d9b38e71be3
doc_id: 872196
cord_uid: fnb2j2t9

We describe a protocol for identifying COVID-19 severity-specific cell types and their regulatory marker genes using single-cell transcriptomics data. We construct a list of genes associated with COVID-19 comorbid diseases using multiple databases and literature resources. Next, we identify the specific cell type in which comorbid genes are upregulated. We further characterize the identified cell type using gene-enrichment analysis. We detect upregulation of a marker gene restricted to the severe COVID-19 cell type and validate our findings using in silico, in vivo and in vitro cellular models.

Authority (approval number #DSREC-04/2020_02). The requirement for informed consent was waived as this study was part of a public health surveillance and outbreak investigation in the UAE. Nonetheless, all patients treated at a healthcare facility in the UAE provide written consent for their deidentified data to be used for research, and this study was performed in accordance with the relevant laws and regulations that govern research in the UAE.

Analyze the h5ad output file generated from the above step using the Scanpy library (in Python).

Perform principal component analysis (PCA), select the principal components (PCs) that capture the most variance for cluster detection (PhenoGraph), and calculate the cluster connectivity using the partition-based graph abstraction (PAGA) algorithm. The Python code is described in the GitHub repository at https://github.com/theislab/scanpy. The control dataset 2 can be downloaded as an h5ad object, and similar steps for filtering, normalization, PCA calculation and clustering can be done using Scanpy. Use UMAP (uniform manifold approximation and projection) to visualize the clusters in reduced dimensional space, and then compare the cluster topography with the UMAP from the original studies. Identify the number of PCs to use for data clustering with an elbow plot, and decide on the highly variable features to be used for downstream analysis.

    library(SingleCellExperiment)
    library(scran)
    library(sceasy)
    library(reticulate)

    # Set the working directory to the folder that holds your input files;
    # the outputs are saved at the same location
    setwd("/PATH/TO/YOUR/DIRECTORY")

    # Read matrix
    airway.data <- read.table(file = "exprMatrix_hg19.tsv", header = TRUE, row.names = 1, sep = "\t", as.is = TRUE)

    # Normalize
    airway.data_log <- log(airway.data + 1)
    airway.data_log_mtx <- data.matrix(airway.data_log)
    write.csv(airway.data_log_mtx, file = "exprMatrix_norm_hg19.tsv")

    # Identify highly variable genes
    mgv <- modelGeneVar(airway.data_log_mtx)
    top_hvg <- getTopHVGs(mgv)
    my_metadata <- read.csv("meta_cell_types_merged.tsv", sep = "\t")

    # Generate SCE object and convert it to h5ad
    sce <- SingleCellExperiment(assays = list(logcounts = airway.data_log), colData = my_metadata)
    use_condaenv("EnvironmentName")
    loompy <- reticulate::import("loompy")
    sceasy::convertFormat(sce, from = "sce", to = "anndata", outFile = "deprez_cell_types_merged.h5ad")

b) COVID single cell dataset processing using Seurat: Once you download the data from GSE145926, process it using Seurat in R (https://github.com/zhangzlab/covid_balf/seurat_integration.R). The script 'seurat_integration.R' carries out the following functions:

i) Load data and create Seurat object.

ii) Preprocess data, filtering the matrix with nFeature_RNA, nCount_RNA, and percent.mito.

iii) Integrate data using Seurat v3.

iv) Scale data in 'Integrated' assay and compute PCs.

    # Install curl (shell)
    sudo apt install curl
    # Check if curl is installed (shell)
    curl --version

    # The remaining commands are run in R
    library(curl)
    library(dplyr)

    # PanglaoDB
    url <- "https://panglaodb.se/markers/PanglaoDB_markers_27_Mar_2020.tsv.gz"
    dest <- "/Users/ Desktop/STAR_protocol-_review_comments/new/PanglaoDB_markers_27_Mar_2020.tsv.gz"
    curl_download(url, dest)
    # tsv = gzfile("PanglaoDB_markers_27_Mar_2020.tsv.gz")
    # or
    tsv = ("PanglaoDB_markers_27_Mar_2020.tsv")
    pdb <- read.csv(tsv, header = TRUE, sep = "\t")
    # Keep human markers from lung with sensitivity of at least 0.5
    pdb_filtered <- filter(pdb, species == "Hs" | species == "Mm Hs")
    pdb_filtered <- filter(pdb_filtered, organ == "Lungs")
    pdb_filtered <- filter(pdb_filtered, sensitivity_human >= 0.5)
    # One character vector of gene symbols per cell type
    pdb1 <- split(as.character(pdb_filtered$official.gene.symbol), pdb_filtered$cell.type)
    View(pdb1)

    # CellMarker
    url2 <- "http://biocc.hrbmu.edu.cn/CellMarker/download/Human_cell_markers.txt"
    dest2 <- "/Users/ Desktop/STAR_protocol-_review_comments/new/Human_cell_markers.txt"
    curl_download(url2, dest2)
    cmdb <- read.csv("Human_cell_markers.txt", header = TRUE, sep = "\t")
    # Keep markers from normal lung tissue
    cmdb_filtered <- filter(cmdb, cancerType == "Normal")
    cmdb_filtered <- filter(cmdb_filtered, tissueType == "Lung")
    cmdb1 <- split(as.character(cmdb_filtered$geneSymbol), cmdb_filtered$cellName)

    # Making list of all cell types
    nam_all_cell <- unique(unlist(c(lis, lis2)))
    # Compare cell type names
    # Merge
    # Compare genes within; keep only unique
    for(j in 1: [...]

(Key resource Table).
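The comments in the code fragment above indicate that the marker lists from the two databases are merged per cell type and that only unique genes are kept. The following is a minimal Python sketch of that idea, not the authors' code: the function name `merge_marker_lists`, the case-insensitive matching of cell type names, and the toy marker lists are assumptions made for illustration.

```python
from collections import defaultdict


def merge_marker_lists(*sources):
    """Merge several {cell_type: [gene, ...]} dicts into one.

    Assumption: two cell type names refer to the same cell type when they
    are equal after trimming whitespace and lowercasing.
    """
    merged = defaultdict(set)
    for source in sources:
        for cell_type, genes in source.items():
            key = cell_type.strip().lower()
            # A set keeps each gene symbol once per cell type
            merged[key].update(g.strip() for g in genes if g and g.strip())
    # Sorted lists make the result deterministic and easy to inspect
    return {cell_type: sorted(genes) for cell_type, genes in merged.items()}


# Toy inputs (illustrative only, not taken from either database)
panglao = {"Alveolar macrophages": ["MARCO", "MSR1"], "Ciliated cells": ["FOXJ1"]}
cellmarker = {"alveolar macrophages": ["MARCO", "PPARG"]}

merged = merge_marker_lists(panglao, cellmarker)
```

In practice, cell type names differ between databases in more ways than letter case, so a manually curated name mapping may be needed before merging.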

SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs)

Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans

ImmPort, toward repurposing of open access immunological assay data for translational and clinical research

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019

Severe infectious diseases of childhood as monogenic inborn errors of immunity

A Single-Cell Atlas of the Human Healthy Airways

PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data

Circuits between infected macrophages and T cells in SARS-CoV-2 pneumonia

Single-cell landscape of bronchoalveolar immune cells

In vivo antiviral host transcriptional response to SARS-CoV-2 by viral load, sex, and age

scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation

Single-cell transcriptome identifies molecular subtype of autism spectrum disorder impacted by de novo loss-of-function variants regulating glial cells

Single-cell transcriptome identifies FCGR3B upregulated subtype of alveolar macrophages in patients with critical COVID-19

Single-Cell Transcriptomic Analysis of Human Lung Provides Insights into the Pathobiology of Pulmonary Fibrosis

Cytoscape: a software environment for integrated models of biomolecular interaction networks

Comprehensive Integration of Single-Cell Data

Single-cell transcriptomics trajectory and molecular convergence of clinically relevant mutations in Brugada syndrome

A benchmark of batch-effect correction methods for single-cell RNA sequencing data

SCANPY: large-scale single-cell gene expression data analysis

CellMarker: a manually curated resource of cell markers in human and mouse

the highest median expression across the clusters and an expression value above the 99th percentile of overall expression. (B) Dot plot showing expression of macrophage and macrophage-subtype marker genes for the severe COVID-19 dataset. The y-axis represents the cell types based on the marker database and the x-axis represents the marker genes.
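The selection criterion named in the legend above (highest median expression across clusters, and expression above the 99th percentile of overall expression) could be sketched in Python as follows. This is an illustrative reading of the legend, not the authors' implementation: the function name `restricted_genes`, the nested-dictionary input layout, and the choice to compare the cluster median against a percentile taken over all pooled values are assumptions.

```python
from statistics import median, quantiles


def restricted_genes(expr, target, pct=99):
    """Return genes whose median expression peaks in `target`.

    expr: {gene: {cluster: [expression value per cell, ...]}}
    A gene is kept when (a) its median is highest in `target` and
    (b) that median exceeds the `pct`-th percentile of all pooled values.
    """
    pooled = [v for clusters in expr.values()
              for cells in clusters.values() for v in cells]
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th percentile
    cutoff = quantiles(pooled, n=100, method="inclusive")[pct - 1]
    hits = []
    for gene, clusters in expr.items():
        medians = {cluster: median(cells) for cluster, cells in clusters.items()}
        top_cluster = max(medians, key=medians.get)
        if top_cluster == target and medians[target] > cutoff:
            hits.append(gene)
    return hits


# Toy data (illustrative only): gene "A" is high in cluster "c1" only
toy = {
    "A": {"c1": [50, 60, 70], "c2": [0, 0, 1]},
    "B": {"c1": [1, 2, 1], "c2": [2, 3, 2]},
}
# A lower percentile is used here because the toy data set is tiny
hits_c1 = restricted_genes(toy, "c1", pct=75)
hits_c2 = restricted_genes(toy, "c2", pct=75)
```

With real data the default `pct=99` corresponds to the threshold stated in the legend.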

Steps in identifying the cluster associated with comorbid disease gene expression and finding genes restricted to that cell type. Enrichment analysis of severe COVID-19 clusters with comorbid gene sets. Higher expression (above the global median) is indicated by a star. The cluster with the maximum number of upregulated gene sets is selected for further downstream analysis and for identifying candidate genes with restricted expression in the severe COVID-19 cluster. The representative feature plots are reused from Figure 3 of
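The cluster-selection rule described in this legend (star a gene set in a cluster when its expression is above the global median, then pick the cluster with the most starred gene sets) can be sketched as below. This is a hedged illustration only: the function name `select_cluster`, the use of the mean over a gene set as its per-cluster score, and the definition of the global median as the median of all set-by-cluster scores are assumptions, and the gene and set names are placeholders.

```python
from statistics import mean, median


def select_cluster(expr, gene_sets):
    """Pick the cluster with the most gene sets above the global median.

    expr: {cluster: {gene: average expression in that cluster}}
    gene_sets: {set_name: [gene, ...]}
    Returns (starred, best): starred maps each cluster to the gene sets
    scoring above the global median; best is the selected cluster.
    """
    scores = {}
    for name, genes in gene_sets.items():
        scores[name] = {}
        for cluster, profile in expr.items():
            values = [profile[g] for g in genes if g in profile]
            # Assumption: a gene set is scored by the mean of its genes
            scores[name][cluster] = mean(values) if values else 0.0
    global_median = median(
        s for per_cluster in scores.values() for s in per_cluster.values()
    )
    starred = {
        cluster: [n for n in gene_sets if scores[n][cluster] > global_median]
        for cluster in expr
    }
    best = max(starred, key=lambda c: len(starred[c]))
    return starred, best


# Toy data (placeholders, not real measurements)
toy_expr = {
    "c0": {"G1": 5, "G2": 4, "G3": 6},
    "c1": {"G1": 1, "G2": 0, "G3": 1},
    "c2": {"G1": 2, "G2": 5, "G3": 0},
}
toy_sets = {"setA": ["G1", "G2"], "setB": ["G3"]}
starred, best = select_cluster(toy_expr, toy_sets)
```

Ties between clusters are resolved here by dictionary order, which may not be appropriate for a real analysis.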

Steps to perform Cytoscape analysis. Flowchart representing pathway network creation using the Enrichment Map and Autoannotate tools. The tab selections are highlighted in red

In the preprocessing steps, the single-cell transcriptome data may be available in different formats (Seurat, h5ad, loom and so on). It is important to use the same steps and the same single-cell analysis tool as the original work, and then convert the final output to the desired format for further processing. This preserves the overall topology of the UMAP for comparison purposes.