key: cord-0872196-fnb2j2t9 authors: Nassir, Nasna; Tambi, Richa; Bankapur, Asma; Karuvantevida, Noushad; Khansaheb, Hamdah Hassan; Zehra, Binte; Begum, Ghausia; Hameid, Reem Abdel; Ahmed, Awab; Deesi, Zulfa; Alkhajeh, Abdulmajeed; Uddin, K M Furkan; Akter, Hosneara; Safizadeh Shabestari, Seyed Ali; Gaudet, Mellissa; Hachim, Mahmood Yaseen; Alsheikh-Ali, Alawi; Berdiev, Bakhrom K.; Al Heialy, Saba; Uddin, Mohammed title: Analyzing single cell transcriptome data from severe COVID-19 patients date: 2022-04-21 journal: STAR Protoc DOI: 10.1016/j.xpro.2022.101379 sha: 68a93b7e4a21e0f4fa6f8f2634690d9b38e71be3 doc_id: 872196 cord_uid: fnb2j2t9 We describe the protocol for identifying COVID-19 severity specific cell types and their regulatory marker genes using single-cell transcriptomics data. We construct COVID-19 comorbid disease associated gene-list using multiple databases and literature resources. Next, we identify specific cell type where comorbid genes are upregulated. We further characterize the identified cell type using gene-enrichment analysis. We detect upregulation of marker gene restricted to severe COVID-19 cell type and validate our findings using in silico, in vivo and in vitro cellular models.
Authority (approval number #DSREC-04/2020_02).
The requirement for informed consent was waived as this study was part of a public health surveillance and outbreak investigation in the UAE. Nonetheless, all patients treated at a healthcare facility in the UAE provide written consent for their deidentified data to be used for research, and this study was performed in accordance with the relevant laws and regulations that govern research in the UAE.
Key resources
Analyze the h5ad output file generated in the above step using the Scanpy library (in Python). Perform principal component analysis (PCA), select the highly variable principal components (PCs) for cluster detection (PhenoGraph), and calculate the cluster connectivity using the partition-based graph abstraction (PAGA) algorithm. The Python code is described in the GitHub repository at https://github.com/theislab/scanpy. The control dataset 2 can be downloaded as an h5ad object, and similar steps for filtering, normalization, PCA calculation and clustering can be done using Scanpy.
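Choosing how many PCs to carry forward is usually done by eye from a scree (elbow) plot. The sketch below shows one possible way to automate that choice, assuming the per-component explained-variance ratios have already been exported from the PCA step. The function name `choose_n_pcs`, the chord-distance heuristic, and the example values are illustrative assumptions, not part of the published protocol or of the Scanpy API.

```python
import math


def choose_n_pcs(explained_variance):
    """Pick the elbow of a scree curve (hypothetical helper, not a Scanpy call).

    The elbow is taken as the component lying farthest from the straight
    line (chord) that joins the first and last points of the curve.
    Returns the number of leading components to keep.
    """
    n = len(explained_variance)
    if n < 3:
        # Too few points to define an elbow; keep everything.
        return n

    # Chord end points: (component index, explained variance).
    x1, y1 = 0, explained_variance[0]
    x2, y2 = n - 1, explained_variance[-1]
    chord_length = math.hypot(x2 - x1, y2 - y1)

    best_index, best_distance = 0, -1.0
    for i, y in enumerate(explained_variance):
        # Perpendicular distance from the point (i, y) to the chord.
        distance = abs((y2 - y1) * i - (x2 - x1) * y + x2 * y1 - y2 * x1) / chord_length
        if distance > best_distance:
            best_index, best_distance = i, distance

    # Convert the zero-based index into a component count.
    return best_index + 1


# Made-up variance ratios for illustration only (not data from the study).
example_ratios = [0.40, 0.20, 0.10, 0.03, 0.02, 0.02, 0.01, 0.01]
n_pcs = choose_n_pcs(example_ratios)
print(f"Suggested number of PCs: {n_pcs}")
```

A heuristic like this is only a starting point; the suggested cut-off should still be checked against the elbow plot itself before clustering.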
Use UMAP (uniform manifold approximation and projection) to visualize the clusters in reduced dimensional space, and then compare the cluster topography with the UMAP from the original studies. Identify the number of PCs using an elbow plot for data clustering and decide on the highly variable features to be used for downstream analysis.

# R script: log-normalize the airway expression matrix and export it as an h5ad file
library(SingleCellExperiment)
library(scran)
library(sceasy)
library(reticulate)
# Set the working directory to the folder that holds your input files; outputs are saved there too
setwd("/PATH/TO/YOUR/DIRECTORY")
# Read matrix (tab separator assumed from the .tsv extension)
airway.data <- read.table(file = "exprMatrix_hg19.tsv", header = TRUE, row.names = 1, sep = "\t", as.is = TRUE)
# Normalize
airway.data_log <- log(airway.data + 1)
airway.data_log_mtx <- data.matrix(airway.data_log)
# Note: write.csv writes comma-separated values even though the file name ends in .tsv
write.csv(airway.data_log_mtx, file = "exprMatrix_norm_hg19.tsv")
# Identify highly variable genes
mgv <- modelGeneVar(airway.data_log_mtx)
top_hvg <- getTopHVGs(mgv)
# Read cell metadata (tab separator assumed from the .tsv extension)
my_metadata <- read.csv("meta_cell_types_merged.tsv", sep = "\t")
# Generate SCE object
sce <- SingleCellExperiment(assays = list(logcounts = airway.data_log), colData = my_metadata)
# Activate the conda environment that provides the Python dependencies
use_condaenv("EnvironmentName")
loompy <- reticulate::import("loompy")
# Convert the SCE object to AnnData (h5ad)
sceasy::convertFormat(sce, from = "sce", to = "anndata", outFile = "deprez_cell_types_merged.h5ad")

b) COVID single cell dataset processing using Seurat: Once you download the data from GSE145926, process it using Seurat in R with the script 'seurat_integration.R' from https://github.com/zhangzlab/covid_balf/. The script 'seurat_integration.R' carries out the following functions:
i) Load data and create Seurat object.
ii) Preprocess data, filtering the matrix with nFeature_RNA, nCount_RNA, and percent.mito.
iii) Integrate data using Seurat v3.
iv) Scale data in 'Integrated' assay and compute PCs.
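The filtering in step ii can be illustrated outside Seurat with a minimal Python sketch. It assumes per-cell QC metrics are already available as plain records. The class `CellQC`, the helper names, the threshold values, and the example cells are all hypothetical placeholders chosen for illustration; the cut-offs actually used are defined in 'seurat_integration.R' and should be taken from there.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics, named after the Seurat metadata columns."""
    barcode: str
    n_feature_rna: int   # number of genes detected in the cell
    n_count_rna: int     # total UMI counts in the cell
    percent_mito: float  # percentage of counts from mitochondrial genes


def passes_qc(cell, min_features=200, max_features=6000,
              min_counts=1000, max_percent_mito=10.0):
    """Return True if a cell lies inside all QC bounds.

    The default thresholds are placeholders (assumptions), not the
    values used in the published analysis.
    """
    return (min_features < cell.n_feature_rna < max_features
            and cell.n_count_rna > min_counts
            and cell.percent_mito < max_percent_mito)


def filter_cells(cells, **thresholds):
    """Keep only the cells that pass every QC threshold."""
    return [cell for cell in cells if passes_qc(cell, **thresholds)]


# Made-up cells for illustration only.
example_cells = [
    CellQC("AAAC", 1500, 5000, 3.2),    # within all bounds
    CellQC("AAAG", 150, 400, 2.0),      # too few genes and counts
    CellQC("AACT", 2500, 9000, 25.0),   # high mitochondrial fraction
    CellQC("AAGT", 7000, 40000, 4.0),   # too many genes (possible doublet)
]
kept = filter_cells(example_cells)
print([cell.barcode for cell in kept])
```

Keeping the thresholds as keyword arguments makes it easy to rerun the filter with different cut-offs, for example `filter_cells(example_cells, max_percent_mito=30.0)`, and see how many cells each choice retains.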
# Install curl (shell command, Debian/Ubuntu)
sudo apt install curl
# Check that curl is installed
curl --version

# R: download and filter the marker databases
library(curl)
library(dplyr)
# PanglaoDB
url <- "https://panglaodb.se/markers/PanglaoDB_markers_27_Mar_2020.tsv.gz"
dest <- "/Users/ Desktop/STAR_protocol-_review_comments/new/PanglaoDB_markers_27_Mar_2020.tsv.gz"
curl_download(url, dest)
# Read the compressed file directly from the download location
tsv <- gzfile(dest)
# or, if the file has already been decompressed into the working directory:
# tsv <- "PanglaoDB_markers_27_Mar_2020.tsv"
# Tab separator assumed from the .tsv extension
pdb <- read.csv(tsv, header = TRUE, sep = "\t")
# Keep human markers from lung with human sensitivity of at least 0.5
pdb_filtered <- filter(pdb, species == "Hs" | species == "Mm Hs")
pdb_filtered <- filter(pdb_filtered, organ == "Lungs")
pdb_filtered <- filter(pdb_filtered, sensitivity_human >= 0.5)
# Group gene symbols by cell type
pdb1 <- split(as.character(pdb_filtered$official.gene.symbol), pdb_filtered$cell.type)
View(pdb1)
# CellMarker
url2 <- "http://biocc.hrbmu.edu.cn/CellMarker/download/Human_cell_markers.txt"
dest2 <- "/Users/ Desktop/STAR_protocol-_review_comments/new/Human_cell_markers.txt"
curl_download(url2, dest2)
# Tab separator assumed for this text file
cmdb <- read.csv(dest2, header = TRUE, sep = "\t")
# Keep markers from normal lung tissue
cmdb_filtered <- filter(cmdb, cancerType == "Normal")
cmdb_filtered <- filter(cmdb_filtered, tissueType == "Lung")
cmdb1