key: cord-0841126-j51zyodw authors: Zeng, Xiangxiang; Song, Xiang; Ma, Tengfei; Pan, Xiaoqin; Zhou, Yadi; Hou, Yuan; Zhang, Zheng; Li, Kenli; Karypis, George; Cheng, Feixiong title: Repurpose Open Data to Discover Therapeutics for COVID-19 Using Deep Learning date: 2020-07-12 journal: J Proteome Res DOI: 10.1021/acs.jproteome.0c00316 sha: a4ce5b42f3e41611b3977a58e667668efc53a627 doc_id: 841126 cord_uid: j51zyodw

There have been more than 2.2 million confirmed cases and over 120 000 deaths from the human coronavirus disease 2019 (COVID-19) pandemic, caused by the novel severe acute respiratory syndrome coronavirus (SARS-CoV-2), in the United States alone. However, there is currently a lack of proven effective medications against COVID-19. Drug repurposing offers a promising route for the development of prevention and treatment strategies for COVID-19. This study reports an integrative, network-based deep-learning methodology to identify repurposable drugs for COVID-19 (termed CoV-KGE). Specifically, we built a comprehensive knowledge graph that includes 15 million edges across 39 types of relationships connecting drugs, diseases, proteins/genes, pathways, and expression from a large scientific corpus of 24 million PubMed publications. Using Amazon's AWS computing resources and a network-based, deep-learning framework, we identified 41 repurposable drugs (including dexamethasone, indomethacin, niclosamide, and toremifene) whose therapeutic associations with COVID-19 were validated by transcriptomic and proteomics data in SARS-CoV-2-infected human cells and data from ongoing clinical trials. Whereas this study by no means recommends specific drugs, it demonstrates a powerful deep-learning methodology to prioritize existing drugs for further investigation, which holds the potential to accelerate therapeutic development for COVID-19.
As of June 22, 2020, in the United States alone, more than 2.2 million cases and over 120 000 deaths from Coronavirus Disease 2019 (COVID-19), the disease caused by the virus SARS-CoV-2, have been confirmed. 1 However, there are currently no proven effective antiviral medications against COVID-19. 2 There is an urgent need for the development of effective treatment strategies for COVID-19. It was estimated that in 2015, pharmaceutical companies spent $2.6 billion to develop an FDA-approved new chemical entity drug using traditional de novo drug discovery. 3 Drug repurposing, a drug-discovery strategy using existing drugs, offers a promising route for the development of prevention and treatment strategies for COVID-19. 4 In a randomized, controlled, open-label trial, 5 lopinavir and ritonavir combination therapy did not show a clinical benefit compared with standard care for hospitalized adult patients with severe COVID-19, which limits the traditional antiviral treatment options for COVID-19. SARS-CoV-2 replication and infection depend on host cellular factors (including angiotensin-converting enzyme 2 (ACE2)) for entry into cells. 6 The systematic identification of virus−host protein−protein interactions (PPIs) offers an effective way to elucidate the mechanisms of viral infection; furthermore, targeting the cellular virus−host interactome offers a promising strategy for effective drug repurposing for COVID-19, as demonstrated in previous studies. 7−9 We recently demonstrated that network-based methodologies leveraging the relationship between drug targets and diseases can serve as a useful tool for the efficient screening of potentially new indications of FDA-approved drugs with well-established pharmacokinetic/pharmacodynamic, safety, and tolerability profiles. 
10−12 Deep learning has also recently outperformed classic machine learning methods in assisting drug repurposing, 13−16 yet without foreknowledge of the complex networks connecting drugs, targets, SARS-CoV-2, and diseases, the development of affordable approaches for the effective treatment of COVID-19 is challenging. Prior knowledge of networks from the large scientific corpus of publications offers a deep biological perspective for capturing the relationships between drugs, genes, and diseases (including COVID-19), yet extracting connections from a large-scale repository of structured medical information is challenging. In this study, we present state-of-the-art knowledge-graph-based, deep-learning methodologies for the rapid discovery of drug candidates to treat COVID-19 from 24 million PubMed publications (Figure 1). Via systematic validation using transcriptomics and proteomics data generated from SARS-CoV-2-infected human cells and the ongoing clinical trial data, we successfully identified 41 drug candidates that can be further tested in large-scale randomized controlled trials for the potential treatment of COVID-19. Specifically, we built a comprehensive knowledge graph that contains 15 million edges across 39 types of relationships connecting drugs, diseases, genes, pathways, expressions, and others by incorporating data from 24 million PubMed publications and DrugBank (Tables S1 and S2). Subsequently, a deep-learning approach (RotatE in DGL-KE) was used to prioritize high-confidence candidate drugs for COVID-19 using Amazon supercomputing resources (cf. Methods and Materials). Finally, all CoV-KGE predicted drug candidates were further validated by three gene expression data sets in SARS-CoV-1-infected human cells and one proteomics data set in SARS-CoV-2-infected human cells.
In this KG, we represent the Coronaviruses (CoVs) by assembling multiple types of known CoVs, including SARS-CoV-1 and MERS-CoV, as described in our recent study. 9 We next utilized DGL-KE's knowledge graph embedding (KGE) model, RotatE, 20 to learn representations of the entities (e.g., drugs and targets) and relationships (e.g., inhibition relation between drugs and targets) in an informative, low-dimensional vector space. In this space, each relationship type (e.g., antagonists or agonists) is defined as a rotation from the source entity (e.g., hydroxychloroquine) to the target entity (e.g., toll-like receptor 7/9 (TLR7/9)).

In this study, we constructed a comprehensive KG from the Global Network of Biomedical Relationships (GNBR) 18 and DrugBank. 19 First, from GNBR, we included in the KG relations corresponding to drug−gene interactions, gene−gene interactions, drug−disease associations, and gene−disease associations. Second, from the DrugBank database, 19 we selected the drugs whose molecular mass is >230 Da and that also exist in GNBR, resulting in 3481 FDA-approved and clinically investigational drugs. For these drugs, we included in the KG relationships corresponding to the drug−drug interactions and the drug side-effects, drug anatomical therapeutic chemical (ATC) codes, drug mechanisms of action, drug pharmacodynamics, and drug-toxicity associations. Third, we included the experimentally discovered CoV−gene relationships from our recent work in the KG. 9 Fourth, we modeled the COVID-19 context by assembling known genes/proteins associated with CoVs (including SARS-CoV and MERS-CoV) as a comprehensive node of CoVs and rewired the connections (edges) from genes and drugs. The resulting KG contains four types of entities (drug, gene, disease, and drug side information), 39 types of relationships (Table S1), 145 179 nodes, and 15 018 067 edges (Table S2).
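The four assembly steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the triplets, relation names, and molecular masses are hypothetical placeholders, and the helper names (`select_drugs`, `merge_cov_genes`, `build_kg`, `summarize`) are invented for this sketch. Only the >230 Da mass cutoff, the "must also exist in GNBR" rule, and the single merged CoV node come from the text.

```python
# Toy illustration only: every entity, relation name, and mass below is a
# hypothetical placeholder, not data taken from GNBR or DrugBank.
GNBR_TRIPLETS = [
    ("drug:hydroxychloroquine", "antagonist", "gene:TLR7"),
    ("drug:toremifene", "binds", "gene:ESR1"),
    ("gene:ACE2", "interacts_with", "gene:TMPRSS2"),
    ("drug:dexamethasone", "treats", "disease:asthma"),
    ("drug:small_molecule_x", "affects", "gene:GENE_X"),
]
DRUGBANK_TRIPLETS = [
    ("drug:dexamethasone", "atc_code", "atc:H02"),
    ("drug:small_molecule_x", "side_effect", "se:nausea"),  # fails the mass filter
    ("drug:not_in_gnbr", "side_effect", "se:headache"),     # absent from GNBR
]
MASS_DA = {  # placeholder molecular masses in daltons
    "drug:hydroxychloroquine": 335.9,
    "drug:toremifene": 406.0,
    "drug:dexamethasone": 392.5,
    "drug:small_molecule_x": 46.1,
    "drug:not_in_gnbr": 500.0,
}
COV_GENES = {"SARS-CoV": ["gene:ACE2"], "MERS-CoV": ["gene:DPP4"]}


def select_drugs(mass_da, gnbr_triplets, min_mass=230.0):
    """Keep drugs heavier than min_mass that also appear somewhere in GNBR."""
    gnbr_entities = {e for h, _, t in gnbr_triplets for e in (h, t)}
    return {d for d, m in mass_da.items() if m > min_mass and d in gnbr_entities}


def merge_cov_genes(cov_to_genes, node="disease:CoVs", relation="cov_gene"):
    """Collapse the genes of all known CoVs onto one comprehensive CoV node."""
    genes = sorted({g for gene_list in cov_to_genes.values() for g in gene_list})
    return [(node, relation, g) for g in genes]


def build_kg(gnbr_triplets, drugbank_triplets, mass_da, cov_to_genes):
    """Concatenate GNBR edges, filtered DrugBank edges, and merged CoV edges."""
    keep = select_drugs(mass_da, gnbr_triplets)
    kg = list(gnbr_triplets)
    kg += [(h, r, t) for h, r, t in drugbank_triplets if h in keep]
    kg += merge_cov_genes(cov_to_genes)
    return kg


def summarize(kg):
    """Count distinct nodes, relation types, and edges."""
    nodes = {e for h, _, t in kg for e in (h, t)}
    relations = {r for _, r, _ in kg}
    return {"nodes": len(nodes), "relation_types": len(relations), "edges": len(kg)}


kg = build_kg(GNBR_TRIPLETS, DRUGBANK_TRIPLETS, MASS_DA, COV_GENES)
print(summarize(kg))
```

In this sketch the mass filter is applied only to DrugBank-derived edges; whether GNBR edges for unselected drugs are also dropped is not stated in the text, so that choice is an assumption.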
Models for computing KGEs learn vectors for each of the entities and each of the relation types so that they satisfy certain properties. In our work, we learned these vectors using the RotatE model. 20 Given an edge in the KG represented by the triplet (head entity, relation type, tail entity), RotatE defines each relation type as a rotation from the head entity to the tail entity in the complex vector space. Specifically, if h and t are the vectors corresponding to the head and tail entities, respectively, and r is the vector corresponding to the relation type, then RotatE tries to minimize the distance

$$d_r(h, t) = \lVert h \otimes r - t \rVert$$

where ⊗ denotes the Hadamard (element-wise) product. To minimize the distance between the head and the tail entities of the existing triplets (positive examples) and maximize the distance among the nonexisting triplets (negative examples), we use the loss function

$$L = -\log \sigma\left(\gamma - d_r(h, t)\right) - \sum_{i=1}^{n} p(h'_i, r, t'_i) \log \sigma\left(d_r(h'_i, t'_i) - \gamma\right)$$

where σ is the sigmoid function, γ is a margin hyperparameter with γ > 0, (h'_i, r, t'_i) is the ith of n negative triplets, and p(h'_i, r, t'_i) is the probability of occurrence of the corresponding negative sample.

DGL-KE 17 is a high-performance, easy-to-use, and scalable package for learning large-scale KGEs with a set of popular models including TransE, DistMult, ComplEx, and RotatE. It includes various optimizations that accelerate training on KGs with millions of nodes and billions of edges using multiprocessing, multi-GPU (graphics processing unit), and distributed parallelism. DGL-KE is able to compute the RotatE-based embeddings of our KG in ∼40 min on an EC2 instance with 8 GPUs using Amazon's AWS computing resources. We divided the triplets (e.g., a relationship among drug, treatment, and disease) into a training set, validation set, and test set in a 7:1:2 ratio. We selected an embedding dimensionality of dim = 200 for nodes and relations. The RotatE model was trained for 16 000 epochs with a batch size of 1024 and a learning rate of 0.1. We chose γ = 12 as the margin of the optimization function.
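To make the RotatE objective described above concrete, the following NumPy sketch evaluates the rotation distance and the margin-based loss for one positive triplet and a handful of negatives. It is an illustration only and does not reproduce DGL-KE's training code. The assumptions are: the relation is parameterized by a phase vector so that each component of r has modulus 1, the norm is taken as the sum of element-wise complex moduli, and negatives are weighted uniformly when no weights are supplied. The function names are invented for this sketch; γ = 12 is the margin reported in the text.

```python
import numpy as np


def rotate_distance(h, r_phase, t):
    """d_r(h, t) = ||h (*) r - t||, with r = exp(i * phase) so that |r_k| = 1.

    Assumption: the norm is the sum of element-wise complex moduli.
    """
    r = np.exp(1j * np.asarray(r_phase))
    return float(np.abs(h * r - t).sum())


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)


def rotate_loss(positive, negatives, r_phase, gamma=12.0, weights=None):
    """Margin loss for one positive (h, t) pair and a list of negative pairs."""
    h, t = positive
    d_pos = rotate_distance(h, r_phase, t)
    d_neg = np.array([rotate_distance(hn, r_phase, tn) for hn, tn in negatives])
    if weights is None:
        # Assumption: uniform weights stand in for p(h'_i, r, t'_i).
        weights = np.full(len(negatives), 1.0 / len(negatives))
    return float(-log_sigmoid(gamma - d_pos)
                 - np.sum(weights * log_sigmoid(d_neg - gamma)))


# Toy 4-dimensional example with a fixed seed.
rng = np.random.default_rng(0)
dim = 4
h = rng.normal(size=dim) + 1j * rng.normal(size=dim)
phase = rng.uniform(-np.pi, np.pi, size=dim)
t_true = h * np.exp(1j * phase)   # exactly the rotated head: distance 0
t_far = t_true + 100.0            # a clearly wrong tail

good = rotate_loss((h, t_true), [(h, t_far)], phase)
bad = rotate_loss((h, t_far), [(h, t_true)], phase)
print(rotate_distance(h, phase, t_true), good, bad)
```

A triplet whose tail equals the rotated head has zero distance and therefore a near-zero loss, whereas swapping the roles of the true and the corrupted tail produces a large loss, which is the behavior the margin objective is meant to enforce.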
Gene set enrichment analysis was performed to further validate the predicted drug candidates from CoV-KGE. The goal of the gene set enrichment analysis was to identify drugs that can reverse the cellular changes (transcriptome or proteome levels) that result from virus infection. Four differential expression data sets were collected, including two transcriptome data sets from SARS-CoV patients' peripheral blood 21 (GSE1739) and Calu-3 cells 22 (GSE33267), one transcriptome data set of Calu-3 cells infected by MERS-CoV 23 (GSE122876), and one proteome data set of human Caco-2 cells infected with SARS-CoV-2. 24 These four data sets were used as the gene signatures for the viral infections. For the drugs, we retrieved gene profiles from the Connectivity Map (CMap) database. 25 For each drug and each CoV signature data set, an enrichment score (ES) was computed as

$$ES = \begin{cases} ES_{up} - ES_{down}, & \operatorname{sgn}(ES_{up}) \neq \operatorname{sgn}(ES_{down}) \\ 0, & \text{otherwise} \end{cases}$$

ES_up and ES_down are computed separately for the up- and down-regulated genes of the CoV signature data set from the quantities

$$a = \max_{1 \le j \le s} \left[ \frac{j}{s} - \frac{V(j)}{r} \right], \qquad b = \max_{1 \le j \le s} \left[ \frac{V(j)}{r} - \frac{j - 1}{s} \right]$$

where j = 1, 2, ..., s are the genes from the CoV signature data set sorted in ascending order using the gene profiles of the drug being computed. V(j) denotes the rank of j, where 1 ≤ V(j) ≤ r, with r being the total number of genes (12 849) from the CMap database. Next, ES_up/down is set to a_up/down if a_up/down > b_up/down and is set to −b_up/down if b_up/down > a_up/down. Permutation tests are repeated 100 times to quantify the significance of the ES score. In each repeat, the same number of up- and down-regulated genes as in the CoV signature data set was randomly generated. ES > 0 and P < 0.05 are considered significantly enriched. The number of significantly enriched data sets is used as the final result for a certain drug.

We used the area under the receiver operating characteristic (ROC) curve (AUROC) and several evaluation metrics to evaluate the performance of drug−target interaction prediction. The AUROC 27 measures the global prediction performance. The ROC curve is obtained by calculating the true-positive rate (TPR) and the false-positive rate (FPR) at varying cutoffs.
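The enrichment-score procedure described in this section can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes the quantities a and b take the standard CMap-style Kolmogorov-Smirnov form, a = max_j [j/s − V(j)/r] and b = max_j [V(j)/r − (j − 1)/s], and that ES = ES_up − ES_down when the two components differ in sign and 0 otherwise. It further assumes that rank 1 corresponds to the gene most down-regulated by the drug, so that ES > 0 indicates reversal of the viral signature, and that the P value is a one-sided empirical estimate. The function names and the toy ranks are invented for this sketch.

```python
import numpy as np


def ks_component(ranks, r):
    """Signed KS-like statistic for one gene set.

    ranks: 1-based ranks V(j) of the signature genes in the drug profile.
    r: total number of genes in the drug profile.
    """
    v = np.sort(np.asarray(ranks, dtype=float))
    s = len(v)
    if s == 0:
        return 0.0
    j = np.arange(1, s + 1)
    a = np.max(j / s - v / r)
    b = np.max(v / r - (j - 1) / s)
    # Ties (a == b) are not covered by the text; this sketch returns -b.
    return float(a) if a > b else float(-b)


def enrichment_score(up_ranks, down_ranks, r):
    """ES = ES_up - ES_down when the signs differ, otherwise 0 (assumed form)."""
    es_up = ks_component(up_ranks, r)
    es_down = ks_component(down_ranks, r)
    if np.sign(es_up) == np.sign(es_down):
        return 0.0
    return es_up - es_down


def permutation_p_value(up_ranks, down_ranks, r, n_perm=100, seed=0):
    """One-sided empirical P value from random gene sets of matching sizes."""
    rng = np.random.default_rng(seed)
    observed = enrichment_score(up_ranks, down_ranks, r)
    n_up, n_down = len(up_ranks), len(down_ranks)
    hits = 0
    for _ in range(n_perm):
        # Draw distinct random ranks (1-based) for the up and down sets.
        perm = rng.permutation(r)[: n_up + n_down] + 1
        if enrichment_score(perm[:n_up], perm[n_up:], r) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy profile of 100 genes: virus-up genes sit at the top of the drug ranking
# (most down-regulated by the drug) and virus-down genes at the bottom.
up = [1, 2, 3, 4, 5]
down = [96, 97, 98, 99, 100]
es, p = permutation_p_value(up, down, r=100)
print(es, p)
```

With the 100 permutations used in the text, the smallest P value this estimator can return is 1/101, so the P < 0.05 threshold remains reachable.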
■ RESULTS

After mapping drugs, CoVs, and the treatment relationships to a complex vector space using RotatE, the top 100 most relevant drugs were selected as candidates for CoVs in the treatment relation space (Figure S1). Using the ongoing COVID-19 trial data (https://covid19-trials.com/) as a validation set, CoV-KGE achieved a high AUROC (AUROC = 0.85, Figure 2) for identifying repurposable drugs for COVID-19. We next employed t-SNE (t-distributed stochastic neighbor embedding algorithm 28 ) to further investigate the low-dimensional node representation learned by CoV-KGE. Specifically, we projected drugs grouped by the first level of the Anatomical Therapeutic Chemical (ATC) classification system code onto a 2D space. Figure 3A indicates that CoV-KGE is able to distinguish 14 types of drugs grouped by ATC codes, which is consistent with the high AUROC value of 0.85 (Figure 2). We further validated the top candidate drugs using an enrichment analysis of drug−gene signatures and SARS-CoV-induced transcriptomics and proteomics data in human cell lines (cf. Methods and Materials). Specifically, we analyzed three transcriptomic data sets in SARS-CoV-1-infected human cell lines and one proteomics data set in SARS-CoV-2-infected human cell lines. In total, we obtained 41 repositioned drug candidates (Table 1) using subject-matter expertise based on a combination of factors: (i) the strength of the CoV-KGE predicted score, (ii) the availability of clinical evidence from ongoing COVID-19 trials, and (iii) the availability and strength of enrichment analyses from SARS-CoV-1/2-affected human cell lines. Among the 41 candidate drugs, 9 drugs are or have been under clinical trials for COVID-19, including thalidomide, methylprednisolone, ribavirin, umifenovir, tetrandrine, suramin, dexamethasone, lopinavir, and azithromycin (Figure 3A and Table 1). We excluded chloroquine and hydroxychloroquine from our ongoing clinical trial list based on recent controversial reports. 
29, 30

Discovery of Drug Candidates for COVID-19 Using CoV-KGE

We next highlight three types of predicted drugs for COVID-19: anti-inflammatory agents (dexamethasone, indomethacin, and melatonin), selective estrogen receptor modulators (SERMs), and antiparasitics (Figure 3).

Anti-Inflammatory Agents. Given the well-described lung pathophysiological characteristics and immune responses (cytokine storms) of severe COVID-19 patients, drugs that dampen the immune responses may offer effective treatment approaches for COVID-19. 31, 32 As shown in Figure 3A, we computationally identified multiple anti-inflammatory agents for COVID-19, including dexamethasone, indomethacin, and melatonin. Indomethacin, an approved cyclooxygenase (COX) inhibitor, has been widely used for its potent anti-inflammatory and analgesic properties. 33 Indomethacin has been reported to have antiviral activity against viruses including SARS-CoV-1 33 and SARS-CoV-2. 34 Importantly, a preliminary in vivo observation showed that oral indomethacin (1 mg/kg body weight daily) reduced the recovery time of SARS-CoV-2-infected dogs. 34 Melatonin plays a key role in the regulation of the human circadian rhythm that alters the translation of thousands of genes, including melatonin-mediated anti-inflammatory and immune-related effects for COVID-19. Melatonin has various antiviral activities by suppressing multiple inflammatory pathways 35,36 (i.e., IL-6 and IL-1β); these anti-inflammatory effects are directly relevant given the well-described lung pathophysiological characteristics of severe COVID-19 patients. Melatonin's mechanism of action may also help to explain the epidemiologic observation that children, who have naturally high melatonin levels, are relatively resistant to COVID-19 disease manifestations, whereas older individuals, who have decreasing melatonin levels with age, are a very high-risk population. 
37 In addition, exogenous melatonin administration may be of particular benefit to older patients given the aging-related reduction of endogenous melatonin levels and the vulnerability of older individuals to the lethality of SARS-CoV-2. 37 Dexamethasone is a U.S. FDA-approved glucocorticoid receptor (GR) agonist for a variety of inflammatory and autoimmune conditions, including rheumatoid arthritis, severe allergies, asthma, chronic obstructive lung disease, and others. 38 Glucocorticoid medications have been used in patients with MERS-CoV and SARS-CoV-1 infections. 39 As shown in Figure 3A, dexamethasone is the fourth-ranked predicted drug among the 41 candidates. The Randomized Evaluation of COVID-19 Therapy (RECOVERY, ClinicalTrials.gov Identifier: NCT04381936) trial showed that dexamethasone reduced mortality by one-third in patients requiring ventilation and by one-fifth in individuals requiring oxygen, 40 yet dexamethasone did not reduce death in COVID-19 patients not receiving respiratory support. 40

Selective Estrogen Receptor Modulators. Overexpression of the estrogen receptor plays a crucial role in inhibiting viral replication and infection. 41 Several SERMs, including clomifene, bazedoxifene, and toremifene, are identified as promising candidate drugs for COVID-19 (Figure 3A and Table 1). Toremifene, a first-generation nonsteroidal SERM, was reported to block various viral infections at low micromolar concentrations, including Ebola virus, 42, 43 MERS-CoV, 44 SARS-CoV-1, 45 and SARS-CoV-2 46 (Figure 3B). Toremifene prevents fusion between the viral and endosomal membranes by interacting with and destabilizing the virus glycoprotein, eventually blocking replication of the Ebola virus. 42 The antiviral mechanisms of toremifene against SARS-CoV-1 and SARS-CoV-2 remain unclear and are currently being investigated. 
Toremifene has been approved for the treatment of advanced breast cancer 47 and has also been studied in men with prostate cancer (∼1500 subjects) with reasonable tolerability. 48 Toremifene is 99% bound to plasma protein with good bioavailability and is typically administered orally at a dosage of 60 mg. 49 In summary, toremifene is a promising candidate drug with ideal pharmacokinetic properties to be directly tested in COVID-19 clinical trials.

Antiparasitics. Despite the lack of strong clinical evidence, hydroxychloroquine and chloroquine phosphate, two approved antimalarial drugs, were authorized by the U.S. FDA for the treatment of COVID-19 patients using emergency use authorizations (EUAs). 2 In this study, we identified that both hydroxychloroquine and chloroquine are among the predicted candidates for COVID-19 (Figure 3A and Table 1). Between the two, hydroxychloroquine's in vitro antiviral activity against SARS-CoV-2 is stronger than that of chloroquine (hydroxychloroquine: 50% effective concentration (EC50) = 6.14 μM, whereas for chloroquine: EC50 = 23.90 μM). 50 Hydroxychloroquine and chloroquine are known to increase the pH of endosomes, which inhibits membrane fusion, a required mechanism for viral entry (including SARS-CoV-2) into the cell. 19 Although chloroquine and hydroxychloroquine are relatively well tolerated, several adverse effects 29 As of June 20, 2020, the National Institutes of Health halted the clinical trial of hydroxychloroquine owing to the lack of clinical benefits. 30 Thus further Niclosamide, an FDA-approved drug for the treatment of tapeworm infestation, was recently identified to have a stronger inhibitory activity on SARS-CoV-2 at the submicromolar level (IC50 = 0.28 μM). Gassen et al. showed that niclosamide inhibited SKP2 activity by enhancing autophagy and reducing MERS-CoV replication as well. 
54 Altogether, niclosamide may be another drug candidate for COVID-19 that warrants experimental investigation and further testing in randomized controlled trials.

Given the up-regulation of systemic inflammation, in some cases culminating in a cytokine storm, observed in severe COVID-19 patients, 31 combination therapy with an agent targeting inflammation (melatonin, dexamethasone, or indomethacin) and an agent with direct antiviral effects (toremifene and niclosamide) has the potential to lead to successful treatments (Figure 4). Because of the aging-related reduction of endogenous melatonin levels and the vulnerability of older individuals to the lethality of SARS-CoV-2, 37 combining exogenous melatonin administration and antiviral agents (such as toremifene or niclosamide) may be of particular benefit to older patients with COVID-19. Yet all computationally predicted drug candidates (Table 1) and proposed drug combinations (Figure 4) must be validated experimentally and be tested in randomized controlled trials. Several combination antiviral and anti-inflammatory treatment trials (remdesivir plus baricitinib) are underway for patients with COVID-19 (ClinicalTrials.gov Identifier: NCT04373044), supporting the proof of concept of this combination therapy for COVID-19.

As COVID-19 patients flood hospitals worldwide, physicians are searching for effective antiviral therapies to save lives. Multiple COVID-19 vaccine trials are underway, yet it might not be physically possible to make enough vaccines for everyone in a short period of time. Furthermore, SARS-CoV-2 replicates poorly in multiple animals, including dogs, pigs, chickens, and ducks, which limits preclinical animal studies. 55 To fight the emerging COVID-19 pandemic, we introduced an integrative, network-based, deep-learning methodology to discover candidate drugs for COVID-19, named CoV-KGE. 
Via CoV-KGE, we built a comprehensive KG that includes 15 million edges across 39 types of relationships connecting drugs, diseases, proteins/genes, pathways, and expressions from a large scientific corpus of 24 million PubMed publications. Using the ongoing COVID-19 trial data as a validation set, we demonstrated that CoV-KGE had high performance in identifying repurposable drugs for COVID-19, as indicated by the high AUROC (AUROC = 0.85). Using Amazon's AWS computing resources, we identified 41 high-confidence repurposed drug candidates (including dexamethasone, indomethacin, niclosamide, and toremifene) for COVID-19, which were validated by an enrichment analysis of gene expression and proteomics data in SARS-CoV-2-infected human cells. Altogether, this study offers a powerful, integrated deep-learning methodology for the rapid identification of repurposable drugs for the potential treatment of COVID-19.

We acknowledge several potential limitations in the current study. Potential data noise arising from the different experimental approaches reported across large-scale publications may influence the performance of the current CoV-KGE models. The original data of GNBR contain confidence values for the relations between entities. However, we ignored these weights so that we could directly apply the RotatE algorithm and obtain the prediction results at low computational cost. In our future work, we will take these confidence values into account and try to design a knowledge-graph-embedding algorithm that can be used for a KG with weighted relationships. The lack of dose-dependent profiles and the biological perturbation of SARS-CoV-2 virus−host interactions may generate a coupled interplay between adverse and therapeutic effects. 
The integration of pharmacokinetics data from animal models and clinical trials into our CoV-KGE methodology could establish the causal mechanism and patient evidence through which predicted drugs would have high clinical benefits for COVID-19 patients without obvious adverse effects in a specific dosage.

In summary, we presented CoV-KGE, a powerful, integrated AI methodology that can be used to quickly identify drugs that can be repurposed for the potential treatment of COVID-19. Our approach can minimize the translational gap between preclinical testing results and clinical outcomes, which is a significant problem in the rapid development of efficient treatment strategies for the COVID-19 pandemic. From a translational perspective, if broadly applied, the network tools developed here could help develop effective treatment strategies for other emerging infectious diseases and other emerging complex diseases as well. However, all predicted drugs not used in clinical trials must be tested in randomized clinical trials before being used in COVID-19 patients.

Supplementary Table 1. Details of the five categories of relationships in our KG.
Supplementary Table 2. Statistics of nodes (entity) and edges (relation) in our KG (PDF)

An interactive web-based dashboard to track COVID-19 in real time
Pharmacologic treatments for Coronavirus disease 2019 (COVID-19): A Review
The $2.6 billion pill-methodologic and policy considerations
Coronavirus puts drug repurposing on the fast track
SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor
Systems biology-based investigation of cellular antiviral drug targets identified by gene-trap insertional mutagenesis
Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2
Network-based approach to prediction and population-based validation of in silico drug repurposing
A genome-wide positioning systems network algorithm for in silico drug repurposing
A deep learning approach to antibiotic discovery
deepDR: a network-based deep learning approach to in silico drug repositioning
Predicting drug−protein interaction using quasi-visual question answering system
Target identification among known drugs by deep learning from heterogeneous networks
Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs
A global network of biomedical relationships derived from text
Knowledge Graph Embedding by Relational Rotation in Complex Space
Expression profile of immune response genes in patients with Severe Acute Respiratory Syndrome
Cell host response to infection with novel human coronavirus EMC predicts potential antivirals and important differences with SARS coronavirus
SREBP-dependent lipidomic reprogramming as a broad-spectrum antiviral target
Proteomics of SARS-CoV-2-infected host cells reveals therapy targets
The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease
Discovery and preclinical validation of drug indications using compendia of public gene expression data
Visualizing data using t-SNE
COVID-19) Update: FDA Revokes Emergency Use Authorization for Chloroquine and Hydroxychloroquine
Halts Clinical Trial of Hydroxychloroquine
Clinical and immunologic features in severe and moderate Coronavirus Disease
SARS-CoV-2 infects T lymphocytes through its spike protein-mediated membrane fusion
Indomethacin has a potent antiviral activity against SARS coronavirus
as adjuvant treatment for coronavirus disease 2019 pneumonia patients requiring hospitalization (MAC-19 PRO): a case series
Beneficial actions of melatonin in the management of viral infections: a new use for this "molecular handyman"?
Corticosteroids: Mechanisms of action in health and disease
Impact of corticosteroid therapy on outcomes of persons with SARS-CoV-2, SARS-CoV, or MERS-CoV infection: a systematic review and meta-analysis
Dexamethasone in hospitalized patients with Covid-19 - Preliminary report
A structure-informed atlas of human-virus interactions
Toremifene interacts with and destabilizes the Ebola virus glycoprotein
MERS-CoV pathogenesis and antiviral efficacy of licensed drugs in human monocyte-derived antigen-presenting cells
Repurposing of clinically developed drugs for treatment of Middle East respiratory syndrome coronavirus infection
Infection and rapid transmission of SARS-CoV-2 in ferrets
Toremifene, a new antiestrogenic compound, for treatment of advanced breast cancer. Phase II study
Prostate cancer and prostatic intraepithelial neoplasia: true, true, and unrelated?
Effect of toremifene on clinical chemistry, hematology and hormone levels at different doses in healthy postmenopausal volunteers: phase I study
Hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting SARS-CoV-2 infection in vitro
Efficacy of hydroxychloroquine in patients with COVID-19: results of a randomized clinical trial
The QT interval in patients with SARS-CoV-2 infection treated with hydroxychloroquine/azithromycin
Fatal toxicity of chloroquine or hydroxychloroquine with metformin in mice
Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARS-coronavirus 2