key: cord-0823832-1nqldrpu authors: He, Bing; Garmire, Lana X. title: ASGARD: A Single-cell Guided pipeline to Aid Repurposing of Drugs date: 2021-09-14 journal: ArXiv DOI: nan sha: cce424e57e0891edda8b4dfa54e599ac1923dfd1 doc_id: 823832 cord_uid: 1nqldrpu
Intercellular heterogeneity is a major obstacle to successful personalized medicine. Single-cell RNA sequencing (scRNA-seq) technology has enabled in-depth analysis of intercellular heterogeneity in various diseases. However, its full potential for personalized medicine is yet to be reached. Towards this, we propose A Single-cell Guided pipeline to Aid Repurposing of Drugs (ASGARD). ASGARD can repurpose single drugs for each cell cluster and for multiple cell clusters at the individual patient level; it can also predict personalized drug combinations to address the intercellular heterogeneity within each patient. We tested ASGARD on three independent datasets, covering advanced metastatic breast cancer, acute lymphoblastic leukemia, and coronavirus disease 2019 (COVID-19). For single-drug therapy, ASGARD shows significantly better average accuracy (AUC=0.95) than two other single-cell pipelines (AUC 0.69 and 0.57) and two bulk-cell-based drug repurposing methods (AUC 0.80 and 0.75). The top-ranked drugs, such as fulvestrant and neratinib for breast cancer, tretinoin and vorinostat for leukemia, and chloroquine and enalapril for severe COVID-19, are either approved by the FDA or in clinical trials for the corresponding diseases. In conclusion, ASGARD is a promising pipeline, guided by single-cell RNA-seq data, for repurposing personalized drugs and drug combinations. ASGARD is free for academic use at https://github.com/lanagarmire/ASGARD.
Heterogeneity, or more specifically the diversity of cell populations within diseased tissue, is the main cause of treatment failure for many complex diseases, such as cancers [1], Alzheimer's disease [2], stroke [3], and coronavirus disease 2019 (COVID-19) [4], as well as a major obstacle to successful personalized medicine [5] [6] [7]. Recent advances in single-cell technologies, especially single-cell RNA sequencing (scRNA-seq), have enabled the analysis of intercellular heterogeneity at very fine resolution [8, 9] and have led to many breakthroughs in understanding disease mechanisms [10], for example in breast cancer [11], liver cancer [12], and COVID-19 [13]. However, its full potential for personalized medicine has not been fulfilled [14, 15]. Drug repurposing (also known as drug repositioning, reprofiling, or re-tasking) is a strategy to identify new uses of a drug outside the scope of its original medical approval or investigation [16]. So far, very few drug repurposing methods have been developed to exploit the information in scRNA-seq data. The pipeline by Alakwaa identifies significantly differentially expressed genes (DEGs) for a specific group of cells, then predicts candidate drugs for the DEGs using the Connectivity Map Linked User Environment (CLUE) platform, and finally prioritizes these drugs using a comprehensive ranking score system [17]. This pipeline identified didanosine as a potential treatment for COVID-19 using scRNA-seq data [17]. Another pipeline, by Guo et al., uses a simple combination of Seurat [18], a tool for scRNA-seq analysis, and CLUE to identify 281 FDA-approved drugs that have the potential to be effective for treating COVID-19 [19]. However, these pipelines predict drugs for each single-cell cluster within the patient and cannot give comprehensive drug scores at the patient level.
Meanwhile, in heterogeneous diseases caused by multiple types of cells, a combination of cell-targeting drugs has been shown to be a better treatment strategy [20]. Neither of the above-mentioned pipelines is capable of predicting drug combinations, which limits their utility in the era of precision medicine.
ASGARD accepts processed scRNA-seq data from the Seurat package [18]. In this study, genes detected in fewer than 3 cells are removed from the dataset. We used the same criteria as the original studies to filter cells [11, 13, 21]. Epithelial cells from breast cancer PDXs and healthy breast tissues with fewer than 200 unique genes are removed from the dataset. PBMCs from leukemia patients and healthy controls with fewer than 200 unique genes are removed from the dataset. BALF cells from COVID-19 patients with fewer than 200 unique genes, more than 6000 unique genes, or a proportion of mitochondrial genes larger than 10% are removed from the dataset [13]. We used cell cycle marker genes and linear transformation to scale the expression of each gene and remove the effects of the cell cycle on gene expression.
ASGARD suggests using functions from Seurat for cell pairwise correspondences. In this study, gene counts for each cell were divided by the total counts for that cell and multiplied by a scaling factor (default is set to 10000). The count matrix was then transformed by log2(count+1) in R. To identify gene variance across cells, we first fitted a line to the relationship between log(variance) and log(mean) using local polynomial regression (loess). Then we standardized the feature values using the observed mean and expected variance (given by the fitted line). Gene variance was then calculated on the standardized values. In this study, we used the 2,000 genes with the highest standardized variance for downstream analysis.
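As a rough illustration of the cell filtering and normalization steps described above, the following Python sketch works on a single cell represented as a dictionary of raw counts. It is not Seurat's implementation: the function names `passes_qc` and `log_normalize`, the dictionary layout, and the `MT-` prefix used to recognize mitochondrial genes are assumptions made for this example, and the log base simply follows the text as written.

```python
import math


def passes_qc(counts, min_genes=200, max_genes=None, max_mito_fraction=None):
    """Return True if one cell passes the gene-count and mitochondrial filters.

    counts: dict mapping gene name -> raw count for a single cell.
    The "MT-" prefix for mitochondrial genes is an assumption (human gene symbols).
    """
    detected = sum(1 for c in counts.values() if c > 0)
    if detected < min_genes:
        return False
    if max_genes is not None and detected > max_genes:
        return False
    if max_mito_fraction is not None:
        total = sum(counts.values())
        mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
        if total > 0 and mito / total > max_mito_fraction:
            return False
    return True


def log_normalize(counts, scale_factor=10_000):
    """Divide each count by the cell total, scale, then apply log2(x + 1)."""
    total = sum(counts.values())
    if total == 0:
        # A cell without counts carries no signal; avoid dividing by zero.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log2(count / total * scale_factor + 1)
        for gene, count in counts.items()
    }
```

With the thresholds from the text, the BALF filter would correspond to `passes_qc(cell, min_genes=200, max_genes=6000, max_mito_fraction=0.10)`, while the breast cancer and leukemia datasets would use only `min_genes=200`.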
Then we identified the K-nearest neighbors (KNNs) between disease and normal cells, based on the L2-normalized canonical correlation vectors (CCVs). Finally, we built the cell pairwise correspondences by identifying mutual nearest neighbors [18].
We applied principal component analysis (PCA) from Seurat to the scaled data to perform linear dimensionality reduction. Then we used a graph-based clustering approach [18]. In this approach, we first constructed a KNN graph based on the Euclidean distance in PCA space and refined the edge weights between any two cells using Jaccard similarity. Then we applied the Louvain algorithm of modularity optimization to iteratively group cells together. We further ran non-linear dimensionality reduction (UMAP) to place similar cells within the graph-based clusters determined above together in low-dimensional space. To annotate clusters of cells, we ran an automatic annotation of single cells based on similarity to the reference single-cell panel using the SingleR package [22]. We used the dominant cell type (>50% of cells) as the cell type of the cluster.
ASGARD supports multiple methods for differential gene analysis, including Limma [23], Seurat (Wilcoxon Rank Sum test) [18], DESeq2 [24], and edgeR [25]. The differentially expressed gene list in a disease is transformed into a gene rank list. ASGARD uses 21,304 drugs/compounds with response gene expression profiles in 98 cell lines from the LINCS L1000 project [26]. A differential gene expression list in response to drug treatment is also transformed into a gene rank list. ASGARD further identifies potential candidate drugs that yield gene expression patterns reversed from those in diseased vs. normal cells, using the DrInsight package [27] (version: 0.1.1).
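The idea of a gene rank list and of a drug "reversing" a disease signature can be shown with a small Python sketch. This is a simplified illustration only, not the DrInsight procedure: it assumes that each signature is available as a dictionary of log fold changes, and the names `rank_genes` and `reversed_genes` are hypothetical.

```python
def rank_genes(log_fold_changes):
    """Order genes from most up-regulated to most down-regulated."""
    return sorted(log_fold_changes, key=log_fold_changes.get, reverse=True)


def reversed_genes(disease_lfc, drug_lfc):
    """Genes whose change under the drug opposes their change in disease.

    Both arguments map gene name -> log fold change. A gene counts as
    reversed when the two values have opposite signs; genes missing from
    the drug profile are ignored.
    """
    result = set()
    for gene, disease_value in disease_lfc.items():
        drug_value = drug_lfc.get(gene)
        if drug_value is None:
            continue
        if disease_value * drug_value < 0:
            result.add(gene)
    return result


# Toy values, invented for illustration only.
disease = {"CCND1": 2.0, "CDK1": 1.5, "TP53": -1.0, "ACTB": 0.5}
drug = {"CCND1": -1.2, "CDK1": 0.3, "TP53": 0.8}
print(rank_genes(disease))
print(sorted(reversed_genes(disease, drug)))
```

A real analysis would also require statistical significance for each gene, which this sketch leaves out.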
Specifically, ASGARD identifies consistently differentially expressed genes, which are up-regulated in cells from diseased tissue but down-regulated in cells with drug treatment, or down-regulated in cells from diseased tissue but up-regulated in cells with drug treatment, to calculate the outlier-sum (OS) statistic [28]. The Kolmogorov-Smirnov test (K-S test) is then applied to the OS statistic to show the significance level of one drug treatment relative to the background of all other drugs in the dataset. The reference drug dataset contains gene rank lists of 591,697 drug/compound treatments from the LINCS L1000 data, as mentioned above. The Benjamini-Hochberg (BH) false discovery rate (FDR) is used to adjust P-values from the K-S test to avoid false significance due to multiple hypothesis testing [29].
ASGARD defines a novel drug score (Formula 1) to evaluate the treatment efficacy of single drugs and drug combinations on multiple single-cell clusters per sample. The drug score is a comprehensive estimation of drug therapeutic effects, summed over every single-cell cluster with weights. In Formula 1, k is a particular single-cell cluster. Besides the drug score, ASGARD further provides a Fisher's combined P-value [30] over the original P-values of every cluster. It shows the drug significance on multiple single-cell clusters per sample. The combined P-value is calculated as the right-tail probability of the corresponding chi-squared statistic. The BH FDR is used to adjust Fisher's combined P-value.
For the drug combination, the user needs to set the number of drugs (size) in the combination. ASGARD explores all the potential combinations of the required size, using significant drugs obtained from each cluster. It removes combinations that have adverse effects according to data from the DrugBank [31] and SIDER [32] databases.
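The two statistical steps named above, Fisher's combined P-value across clusters and the Benjamini-Hochberg adjustment, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than ASGARD's own code: the function names are hypothetical, and the chi-squared tail is evaluated with the closed form that holds for an even number of degrees of freedom.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-squared with 2n d.o.f."""
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one P-value")
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # Right tail for 2n degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{n-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For one drug, `fisher_combined_p` would take the per-cluster P-values; `benjamini_hochberg` would then be applied across the combined P-values of all drugs.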
For the remaining combinations, ASGARD uses an additive model to estimate the combined gene expression response to the drug combination [20]. The combined gene responses are then used to identify the genes reversely deregulated by the drug combination. If a gene is significantly up- or down-regulated in the diseased cluster k, and the gene is reversely down- or up-regulated according to the combined gene response from the additive model [20], then this gene is called a reversely deregulated gene of the drug combination. The drug combination score of drug A combined with drug B is modified from Formula 1 (Formula 2). In Formula 2, the two FDR terms are the FDRs of drug A and drug B for cluster k, respectively, and the gene-count term is the number of genes reversely deregulated by the combination of drug A and drug B in cluster k. Other terms have the same meanings as in Formula 1.
We use receiver operating characteristic (ROC) curves and the areas under the ROC curves (AUCs) to compare the performance of ASGARD with that of the other two pipelines, as well as bulk methods. Since these pipelines/methods report both drugs and compounds, we let ASGARD report both drugs and compounds in the comparisons with other pipelines/methods. For the performance estimation of ASGARD with different methods of differential gene analysis, we let ASGARD report drugs only. ROCs and AUCs are calculated for each pipeline using the pROC package [33].
Using scRNA-seq data, ASGARD repurposes drug combinations to be maximally efficient for all cell populations, by fully accounting for the cellular heterogeneity of patients (Figure 1). In ASGARD, every cell type in the diseased sample is paired to that in the normal (or control) sample, according to "anchor" genes that are consistently expressed between diseased and normal cells.
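The combination step described in Methods (enumerate drug sets of a given size, drop sets with known adverse interactions, then estimate the joint response additively) can be sketched as follows. This is a hedged illustration, not the ASGARD implementation: it assumes that the additive model sums per-gene log fold changes, that adverse interactions are supplied as a set of drug pairs, and all function names are hypothetical.

```python
from itertools import combinations


def candidate_combinations(drugs, size, adverse_pairs):
    """All drug sets of the given size that contain no known adverse pair.

    adverse_pairs: set of frozenset({drug_a, drug_b}) entries (assumed format).
    """
    kept = []
    for combo in combinations(sorted(drugs), size):
        if any(frozenset(pair) in adverse_pairs for pair in combinations(combo, 2)):
            continue
        kept.append(combo)
    return kept


def additive_response(drug_lfcs):
    """Sum per-gene log fold changes over the drugs in a combination."""
    combined = {}
    for lfc in drug_lfcs:
        for gene, value in lfc.items():
            combined[gene] = combined.get(gene, 0.0) + value
    return combined


def count_reversed(disease_lfc, combined_lfc):
    """Number of disease genes whose combined response has the opposite sign."""
    return sum(
        1
        for gene, value in disease_lfc.items()
        if value * combined_lfc.get(gene, 0.0) < 0
    )
```

Summation of log fold changes is only one way to read "additive"; a different scale for the responses would change `additive_response` but leave the enumeration and filtering unchanged.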
ASGARD then identifies consistently differentially expressed genes (P-value < 0.05) between the paired diseased and normal clusters in the scRNA-seq data, using one of several computational methods: Limma [23], Seurat (Wilcoxon Rank Sum test) [18], DESeq2 [24], or edgeR [25]. These individual clusters can optionally be annotated to specific cell types. ASGARD then uses these consistently differentially expressed genes as inputs to identify drugs that can significantly (FDR < 0.05) reverse their expression levels in the L1000 drug response dataset, which comprises 591,697 drug/compound treatments [26]. Specifically, ASGARD first calculates the outlier-sum (OS) statistic [28] from the differentially expressed gene list, then applies the Kolmogorov-Smirnov test (K-S test) to the OS statistic to obtain the significance level of one drug treatment relative to the background of all other drugs in the reference drug dataset, as done before [27]. Finally, ASGARD estimates the comprehensive drug score to identify single drugs that are most efficient in treating all or selected diseased cell clusters (Formula 1 in Methods). ASGARD can also perform drug combination analysis to identify the synergistic combinations of repurposed drugs that are most potent in treating the selected diseased cell clusters (Formula 2 in Methods).
Before comparing ASGARD to repurposing methods based on bulk RNA-Seq samples, we first determined the default differential expression method in ASGARD. For this, we compared several representative differential expression methods: Limma [23], Seurat (Wilcoxon Rank Sum test) [18], DESeq2 [24], and edgeR [25], using advanced metastatic breast cancer [11, 21], acute lymphoblastic leukemia [35]
To compare ASGARD with drug repurposing methods that use bulk RNA-Seq samples, we summarized scRNA-seq data into pseudo-bulk RNA-Seq data.
We then applied the bulk methods CLUE [37] and DrInsight [27] to the pseudo-bulk RNA-Seq query data and compared their results with ASGARD on predicting both drugs and compounds (Figure 2B). We took the same scRNA-seq data from the same three datasets above. Since CLUE and DrInsight predict both drugs and compounds, we added compounds validated in animal models to the true positive dataset for the AUC evaluation of drug/compound predictions. As a result, the AUCs obtained from ASGARD on drugs and compounds (Figure 2B). In summary, by paying attention to heterogeneity at the single-cell level, ASGARD shows much better drug repurposing predictability than methods that rely on bulk samples.
We also compared single-drug prediction using ASGARD with two other pipelines, developed by Alakwaa et al. [17] and Guo et al. [19], which were reported to handle scRNA-Seq data. Note that ASGARD offers more functionality than those two methods in at least two respects. (Figure 3C). These results collectively support the conclusion that ASGARD predicts single drugs more accurately than the Alakwaa and Guo pipelines. Additionally, given that sample size, cell population similarity, and proportion of disease cells significantly affect differential gene analysis [38] (Figure 2C).
We collected scRNA-seq data from 24,741 epithelial cells of advanced metastatic breast cancer Patient-Derived Xenograft (PDX) models [11]
We first applied ASGARD for single-drug repurposing prediction and predicted 11 drugs (FDR<0.05 and overall drug score above the 0.99 quantile) for advanced metastatic breast cancer (Figure 4C). Among them, fulvestrant and neratinib have been approved by the Food and Drug Administration (FDA) for breast cancer treatment [39, 40]. Fostamatinib is the top-ranked drug candidate (Figure 4C). It is a tyrosine kinase inhibitor approved for the treatment of chronic immune thrombocytopenia [41].
We next applied ASGARD to predict drug combinations, using significant drugs (FDR<0.05) repurposed for each cluster (
We next investigated the target genes and pathways of the fostamatinib and colchicine combination (Figure 4F). Fostamatinib and colchicine both target all the significant pathways in each cluster, and they are complementary in targeting genes of these pathways. Among the 143 target genes from these significant pathways, only 29 are shared by fostamatinib and colchicine (Figure 4F). The combination of fostamatinib and colchicine also shows biologically synergistic targeting of multiple genes on the same significant pathways. For example, fostamatinib inhibits Cyclin D1 (CCND1) to produce G1 arrest in the p53 signaling pathway, while colchicine inhibits Cyclin-dependent kinase 1 (CDK1) to produce G2 arrest in the p53 signaling pathway and cell cycle pathway [43] (Figure 4F). Additionally, the drug scores of top drug combination candidates vary from one PDX model to another (Figure 4E), demonstrating that ASGARD is a forward-looking in silico strategy for personalized medicine.
We further applied ASGARD to the scRNA-seq data collected from 2 Pre-T ALL patients and 3 healthy controls [35]
(ZAP70) in apoptosis and NF-kappa B signaling pathways (Figure 5E). All these genes were previously shown to be significant in T cell clusters of ALL [46] [47] [48].
Using ASGARD, we annotated scRNA-seq data collected from the bronchoalveolar lavage fluid
In this study, we present A Single-cell Guided pipeline to Aid Repurposing of Drugs (ASGARD). To evaluate the accuracy of ASGARD in single-drug repurposing, we compared ASGARD to other
This drug score shows a significantly (P-value <0.05, Student's t-test) better AUC than the prediction based on individual clusters (Figures 2, 3). This suggests that targeting individual cell clusters is not sufficient for treating diseases.
Good therapy should be able to target all essential diseased cell clusters of the patient. On the other hand, it is also not ideal to repurpose drugs using bulk RNA-seq, as done by traditional methods (e.g., CLUE and DrInsight). Different cell populations are significantly heterogeneous, and not all of these cells play equal roles in disease [53, 54], as reflected by different gene expression responses to drug treatment [55]. ASGARD can distinguish more important cell types from others and repurpose drugs accordingly, which explains why ASGARD has significantly (P-value <0.05, Student's t-test) better AUC performance than traditional bulk methods (Figure 2B). Moreover, ASGARD also demonstrates variations of drug and drug combination scores within patients (Figures 4C, 4E, 5D, and 6D). This stresses that personalized therapy is necessary for the best therapeutic effect, and that single-cell sequencing information may help to achieve it.
We chose breast cancer and leukemia datasets to illustrate the utility of ASGARD, given the relative abundance of prior knowledge on drugs. Many drugs predicted by ASGARD have been approved by the FDA, such as fulvestrant and neratinib for breast cancer [39, 40] (Figure 4C) and tretinoin and vorinostat for leukemia [44, 45] (Figure 5D). Drug combination is an alternative strategy to precisely address multiple diseased single-cell clusters. In the breast cancer dataset, fostamatinib plus colchicine is the top-ranked candidate combination (Figure 4E). Fostamatinib is a tyrosine kinase inhibitor. Tyrosine kinase inhibitors have been widely used either in single-drug treatment or in combination therapy for breast cancer [56, 57]. Colchicine is an alkaloid used for symptomatic pain relief in attacks of gout. Fostamatinib and colchicine show synergistic targeting of multiple genes in the significant pathways associated with breast cancers (Figure 4F).
For example, fostamatinib inhibits Cyclin D1 (CCND1) to produce G1 (first growth) arrest [58] in the p53 signaling pathway, and colchicine inhibits Cyclin-dependent kinase 1 (CDK1) to produce G2 (second growth) arrest [59] in the p53 signaling pathway and cell cycle pathway. G1 and G2 are important phases of the cell cycle, essential for the treatment of breast cancer [60]. The combination of colchicine and fostamatinib shows a synergistic effect according to the drug interaction record from the DrugBank database [31], although it has not been tested in breast cancer in particular.
COVID-19 is an ongoing and evolving pandemic, so drug knowledge is changing too. Remdesivir is the only drug approved for COVID-19 [61]. However, it is not in the L1000 drug response dataset and therefore was not predicted by ASGARD. ASGARD does predict the ACE inhibitors rescinnamine and enalapril as the 2nd and 4th drug candidates for reducing the mortality of severe COVID-19. A study of 19,486 COVID-19 patients among 8.28 million participants in England showed that enalapril was associated with reduced risks of COVID-19 disease [62]. Another observational clinical trial on 22,213 participants (ClinicalTrials.gov Identifier: NCT04467931) further showed that the use of enalapril is associated with a 15% lower relative risk of mortality in COVID-19 patients [63]. Our drug-gene-pathway analysis shows that enalapril targets all the significantly deregulated pathways, such as the coronavirus disease-COVID-19, chemokine signaling, and IL-17 signaling pathways, in monocytes, NK cells, neutrophils, and T cells (Figure 6E). These pathways play important roles in COVID-19 patient severity and survival [64]. This may explain the observed efficacy of enalapril in reducing mortality of COVID-19 patients. On the other hand, rescinnamine has rarely been studied for COVID-19.
Since rescinnamine targets the same significant pathways as enalapril in COVID-19 (Figure 6E), rescinnamine could be an alternative candidate for further investigation.
Altogether, this study provides clear evidence that ASGARD confidently repurposes drugs that were approved or in clinical trials for breast cancer, leukemia, and COVID-19, respectively. It also suggests new applications for drugs and drug combinations that warrant further clinical studies. In all, ASGARD is a single-cell guided pipeline with significant potential to recommend personalized repurposed drugs and drug combinations using scRNA-seq data.
[1] Tumour heterogeneity and resistance to cancer therapies
[2] Heterogeneity of Alzheimer's disease: consequence for drug trials?
[3] Cerebrovascular Disease: Primary and Secondary Stroke Prevention
[4] One size does not fit all - Patterns of vulnerability and resilience in the COVID-19 pandemic and why heterogeneity of disease matters
[5] Genomic Heterogeneity as a Barrier to Precision Medicine in Gastroesophageal Adenocarcinoma
[6] Heterogeneity in Colorectal Cancer: A Challenge for Personalized Medicine?
[7] Harnessing big 'omics' data and AI for drug discovery in hepatocellular carcinoma
[8] Single-cell analysis tools for drug discovery and development
[9] Evaluation of Cell Type Annotation R Packages on Single-cell RNA-seq Data
[10] Using single-cell multiple omics approaches to resolve tumor heterogeneity
[11] Barcoding reveals complex clonal behavior in patient-derived xenografts of metastatic triple negative breast cancer
[12] Intratumoral heterogeneity and clonal evolution in liver cancer
[13] Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19
[14] Eleven grand challenges in single-cell data science
[15] SINGLE CELL ANALYSIS, WHAT IS IN THE FUTURE? in Biocomputing
[16] Drug repurposing: progress, challenges and recommendations
[17] Repurposing Didanosine as a Potential Treatment for COVID-19 Using Single-Cell RNA Sequencing Data
[18] Comprehensive Integration of Single-Cell Data
[19] Identification of Repurposal Drugs and Adverse Drug Reactions for Various Courses of Coronavirus Disease 2019 (COVID-19) Based on Single-cell RNA Sequencing Data
[20] Combination therapeutics in complex diseases
[21] Profiling human breast epithelial cells using single cell RNA sequencing identifies cell diversity
[22] Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage
[23] limma powers differential expression analyses for RNA-sequencing and microarray studies
[24] Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
[25] Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation
[26] A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles
[27] Breaking the paradigm: Dr Insight empowers signature-free, enhanced drug repurposing
[28] Outlier sums for differential gene expression analysis
[29] Prediction of repurposed drugs for treating lung injury in COVID-19
[30] Powerful p-value combination methods to detect incomplete association
[31] DrugBank 5.0: a major update to the DrugBank database for 2018
[32] The SIDER database of drugs and side effects
[33] pROC: an open-source package for R and S+ to analyze and compare ROC curves
[34] R: A language and environment for statistical computing. R Foundation for Statistical Computing
[35] Single-cell analysis of childhood leukemia reveals a link between developmental states and ribosomal protein expression as a source of intra-individual heterogeneity
[36] COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas
[37] The Drug Repurposing Hub: a next-generation drug library and information resource
[38] Power analysis and sample size estimation for RNA-Seq differential expression
[41] Fostamatinib for the treatment of chronic immune thrombocytopenia
[42] Novel Colchicine Derivatives and their Anti-cancer Activity
[43] Patterns of cell cycle checkpoint deregulation associated with intrinsic molecular subtypes of human breast cancer cells
[44] Tretinoin in the treatment of acute promyelocytic leukemia
[45] Vorinostat in acute myeloid leukemia and myelodysplastic syndromes
[46] Therapeutic targeting of the cyclin D3:CDK4/6 complex in T cell leukemia
[47] Activation of endogenous c-fos proto-oncogene expression by human T-cell leukemia virus type I-encoded p40tax protein in the human T-cell line
[48] T-cell acute lymphoblastic leukemia
[49] Natural killer cells associated with SARS-CoV-2 viral RNA shedding, antibody response and mortality in COVID-19 patients
[50] Monocyte-driven atypical cytokine storm and aberrant neutrophil activation as key mediators of COVID-19 disease severity
[51] T cells in COVID-19 - united in diversity
[52] Neutrophil Extracellular Traps as Prognostic Markers in COVID-19: A Welcome Piece to the Puzzle
[53] Detecting heterogeneity in single-cell RNA-Seq data by non-negative matrix factorization
[54] Intercellular signaling dynamics from a single cell atlas of the biomaterials response
[55] Drug discovery in traditional Chinese medicine: from herbal fufang to combinatory drugs
[56] The role of tyrosine kinase inhibitors in the treatment of HER2+ metastatic breast cancer
[57] Tyrosine Kinase Inhibitors in the Combination Therapy of HER2 Positive Breast Cancer
[58] Targeting the Spleen Tyrosine Kinase with Fostamatinib as a Strategy against Waldenström Macroglobulinemia
[59] KINETICS OF INHIBITION AND THE BINDING OF H3-COLCHICINE
[60] Targeting the cell cycle in breast cancer: towards the next phase
[61] Baricitinib plus Remdesivir for Hospitalized Adults with Covid-19
[62] Risk of severe COVID-19 disease with ACE inhibitors and angiotensin receptor blockers: cohort study including 8.3 million people
[63] Angiotensin II receptor blocker or angiotensin-converting enzyme inhibitor use and COVID-19-related outcomes among US Veterans
[64] An inflammatory cytokine signature predicts COVID-19 severity and survival
The authors thank Qianhui Huang
L.X. Garmire conceived the study and supervised the project. B. He wrote the code and analyzed the data. B. He and L.X. Garmire wrote the manuscript.
The authors declare that they have no competing interests.