key: cord-0557911-3wb3m31e authors: Nam, Yonghyun; Yun, Jae-Seung; Lee, Seung Mi; Park, Ji Won; Chen, Ziqi; Lee, Brian; Verma, Anurag; Ning, Xia; Shen, Li; Kim, Dokyoon title: Network reinforcement driven drug repurposing for COVID-19 by exploiting disease-gene-drug associations date: 2020-08-12 journal: nan DOI: nan sha: 0e62e7255e2694d7e8fe7c77a3b5f7985511d987 doc_id: 557911 cord_uid: 3wb3m31e

The number of patients with COVID-19 has increased significantly, and there is an urgent need to develop treatments. Drug repurposing, the process of reusing already-approved drugs for new medical conditions, offers a fast and broad route to this goal. Many clinical trials that give COVID-19 patients treatments developed for other diseases are already in place or will begin at clinical sites in the near future. Additionally, patients with comorbidities such as diabetes mellitus, obesity, liver cirrhosis, kidney diseases, hypertension, and asthma are at higher risk for severe illness from COVID-19. Thus, the relationships between comorbid diseases and COVID-19 may help to find repurposable drugs. To reduce trial and error in finding treatments for COVID-19, we propose a network-based drug repurposing framework to prioritize repurposable drugs. First, we utilized knowledge of COVID-19 to construct a disease-gene-drug network (DGDr-Net) representing a COVID-19-centric interactome with components for diseases, genes, and drugs. DGDr-Net consisted of 592 diseases, 26,681 human genes, and 2,173 drugs, together with medical information for 18 common comorbidities. DGDr-Net recommended candidate repurposable drugs for COVID-19 through network-reinforcement-driven scoring algorithms. The scoring algorithms determined the priority of recommendations by utilizing graph-based semi-supervised learning.
From the predicted scores, we recommended 30 drugs, including dexamethasone, resveratrol, methotrexate, indomethacin, and quercetin, as repurposable drugs for COVID-19, and the results were verified against drugs that are under clinical trials. A list of drugs produced by a data-driven computational approach could help reduce trial and error in finding treatments for COVID-19.

Coronaviruses typically affect the human respiratory tract and cause mild to severe respiratory tract infections. The novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged in December 2019, and the disease it causes, coronavirus disease 2019 (COVID-19), has been declared a global pandemic by the World Health Organization 1. As of June 14, 2020, there were about 7.55 million confirmed cases and about 423 thousand deaths worldwide. The most common symptoms of COVID-19 are fever, dry cough, and fatigue. Additional symptoms include nasal congestion, headaches, conjunctivitis, and sore throat. In severe cases, dyspnea and chest pain have been reported. Several recent studies analyzed the records of hospitalized patients to investigate diseases comorbid with COVID-19. Patients with comorbidities such as diabetes mellitus, obesity, liver cirrhosis, kidney diseases, hypertension, and asthma have a higher risk of severe illness from COVID-19 [2] [3] [4] [5] [6] [7]. Among confirmed cases of COVID-19, patients with any of these comorbidities have worse outcomes than those without.

Because the coronavirus is highly contagious between humans and leads to death in severe cases, there is an urgent need to develop therapeutic treatments or vaccines. However, there are currently no effective medications 8. To overcome the pandemic, many researchers and companies are developing new treatments and vaccines for COVID-19. However, this process usually takes at least a decade of work and about $300-600 million of funding.
To overcome this global pandemic quickly, treatments must be developed immediately, which is difficult if we employ only traditional drug discovery methods. One way to hasten the search for effective treatments is drug repurposing. Drug repurposing (or drug repositioning) is the process of reusing already-approved drugs for new medical conditions. Many clinical trials that give COVID-19 patients treatments developed for other diseases are already in place or will shortly be performed at clinical sites 10. So far, candidate treatments such as remdesivir (originally designed to treat the Ebola virus), hydroxychloroquine (the now-infamous malaria drug), and lopinavir/ritonavir (an antiretroviral drug targeting HIV) have been used on hospitalized COVID-19 patients.

In addition to these clinical trials, many recent studies have used computational drug repurposing strategies. Zhou et al. constructed a drug-human coronavirus network using protein associations between coronavirus and 2,938 drugs and discovered 16 repurposable drugs 11. They built a protein interactome network from coronavirus-related and drug-related genes and then predicted candidate drugs using network proximities. Ge et al. proposed a data-driven drug repurposing framework that integrates coronavirus-related data and identifies drug candidates from 6,255 drugs 12. Muralidharan et al. modeled the protein structures of the coronavirus and of drugs and used structure-based virtual screening to find candidate drugs capable of molecular docking or drug binding, identifying four commercially available drugs 13. These previous studies used only protein/gene information or the molecular structure of the coronavirus that can be targeted by repurposable drugs. Clinical information has therefore not yet been utilized.
Because COVID-19 patients with comorbid diseases have a high risk of severe illness, the unexplored relationships between comorbid diseases and COVID-19 may help to find repurposable drugs. In this study, we propose a network-based drug repurposing framework to predict candidate therapeutic agents for COVID-19. Our framework combines heterogeneous relational data such as disease-gene associations, disease-drug associations, drug-target gene associations, and information on comorbidity with the novel coronavirus. First, we construct comprehensive disease-gene-drug association networks by combining disease-disease networks, gene-gene networks, and drug-drug networks. Then, a network propagation algorithm is applied to predict candidate repurposable drugs for COVID-19.

In more detail, from the associations between diseases, genes, and drugs, we construct a disease-gene-drug network (DGDr-Net) to capture the interactions among them. Comorbidity information is then added to the DGDr-Net to strengthen the interactions between COVID-19 and other diseases. The constructed networks have tripartite relationships among diseases, genes, and drugs, and these interactions can be represented as layered networks. In establishing a tripartite relationship between a specific disease, gene, and drug, the following three procedures should be considered for drug repurposing: (1) at the disease level, find a disease that is genetically or clinically similar to COVID-19; (2) at the gene level, find the key gene that serves as a biomarker or as other therapeutic evidence for COVID-19; (3) at the drug level, find a drug that can be used to treat COVID-19, either in combination or from among drugs already used to treat other diseases. Given the nature of tripartite relationships, all three steps need to be carried out simultaneously rather than independently.
Thus, we construct a disease-gene-drug network consisting of three single-layered networks: a disease-disease network (disease layer), a gene-gene network (gene layer), and a drug-drug network (drug layer). The single-layer networks are connected to one another according to association information. Within the network, drug scores for COVID-19 are calculated by network-based label propagation using graph-based semi-supervised learning. Since no treatments for COVID-19 have yet been found, there is no label information associated with COVID-19 in the drug network. A semi-supervised approach, which can be applied when label information is insufficient, is therefore well suited to recommending candidate drugs 14, 15. Semi-supervised learning can work with few labeled data and performs predictions by propagating label information to unlabeled nodes along the edges 16. When COVID-19 in the disease layer is set as the seed node for propagation, the label information is transmitted through the edges of the disease-gene-drug network. The comorbidity and disease-gene association information in the disease layer is propagated to the gene and drug layers to inform the priority of repurposable drugs for COVID-19. From the resulting scores, the priorities of candidate drugs can be identified, and the top-tier candidates are recommended as repurposable drugs for COVID-19. The predicted results can help reduce trial and error in finding treatments for COVID-19.

We propose a network-based drug repositioning method that utilizes and integrates biochemical and clinical COVID-19 data such as disease comorbidity, protein associations, and drug-target gene information. The overall framework for finding repurposable drugs for treating COVID-19 consists mainly of two parts, as shown in Figure 1.

We construct the Disease-Gene-Drug Network (DGDr-Net) to show the COVID-19-centric interactome with components for diseases, genes, and drugs.
The interactions among them form a tripartite relationship that can be represented as layered networks. The DGDr-Net is a graph $G = (V, E, L)$, where $V$ represents the set of nodes, $E$ represents the set of edges, and $L$ represents the set of layers. The proposed network has three single layers, $L = \{L_D, L_G, L_{Dr}\}$, depending on the type of component (disease, gene, or drug), as shown in Figure 1(a). In the network, let $v_i \in V$ denote a node and $e_{ij} \in E$ denote the edge that connects two vertices $v_i$ and $v_j$. Edges within a single layer carry weighted values, and edges between different layers carry unweighted (binary) values. The edge weight $w_{ij}$ between two nodes $v_i$ and $v_j$ in a single layer is calculated by a Gaussian similarity kernel (Equation 1). The edge value between different layers is 1 if an association exists and 0 otherwise. The similarity for DGDr-Net is then represented by a matrix $W = \{w_{ij}\}$. To describe the DGDr-Net with its three single layers, the similarity matrix can be expressed block-wise as follows:

$$W = \begin{bmatrix} W_D & W_{D,G} & W_{D,Dr} \\ W_{D,G}^{\top} & W_G & W_{G,Dr} \\ W_{D,Dr}^{\top} & W_{G,Dr}^{\top} & W_{Dr} \end{bmatrix} \qquad (2)$$

The diagonal blocks ($W_D$, $W_G$, and $W_{Dr}$) represent the similarities within the single networks of the disease (D), gene (G), and drug (Dr) layers, respectively, and the off-diagonal blocks represent the connections between different layers.

With the DGDr-Net, a scoring algorithm is applied to find repurposable drugs. Predicted scores are obtained simultaneously for every node of the disease layer, gene layer, and drug layer. In the disease layer, the scores quantify how strongly the target disease is associated with each of the other diseases. Disease scores for COVID-19 reflect both gene information shared with other diseases and comorbidity information. In the gene layer, the disease-disease associations of the disease layer and the protein interactions of the gene layer are propagated through the edges, and the resulting score indicates the priority of candidate target genes for treating COVID-19.
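The block-wise layout of the similarity matrix described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `assemble_block_similarity`, the names of the between-layer matrices, and the toy sizes and values are all hypothetical.

```python
import numpy as np

def assemble_block_similarity(W_D, W_G, W_Dr, A_DG, A_DDr, A_GDr):
    """Assemble a symmetric block matrix for a three-layer network.

    W_D, W_G, W_Dr: within-layer similarity matrices (square, symmetric).
    A_DG, A_DDr, A_GDr: binary between-layer association matrices
    (1 if an association exists, otherwise 0). All names are illustrative.
    """
    return np.block([
        [W_D,     A_DG,    A_DDr],
        [A_DG.T,  W_G,     A_GDr],
        [A_DDr.T, A_GDr.T, W_Dr],
    ])

# Toy example (hypothetical values): 2 diseases, 3 genes, 2 drugs.
W_D = np.array([[0.0, 0.8], [0.8, 0.0]])
W_G = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
W_Dr = np.array([[0.0, 0.5], [0.5, 0.0]])
A_DG = np.array([[1, 0, 0], [0, 1, 1]])      # disease-gene associations
A_DDr = np.array([[0, 0], [0, 1]])           # disease-drug associations
A_GDr = np.array([[1, 0], [0, 0], [0, 1]])   # gene-drug (target) associations

W = assemble_block_similarity(W_D, W_G, W_Dr, A_DG, A_DDr, A_GDr)
print(W.shape)  # rows and columns are ordered as diseases, genes, drugs
```

Placing the transposed association matrices below the diagonal keeps the assembled matrix symmetric, which matches an undirected network.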
In the drug layer, label information from COVID-19 in the disease layer is propagated through the edges of DGDr-Net. The resulting drug scores identify the priority of candidate drugs for repurposing.

Graph-based semi-supervised learning (SSL) is employed for the scoring algorithm because only one labeled disease node (COVID-19) is given. SSL can perform predictions by propagating label information to the other nodes along the edges even if there is only one labeled node 18. The formulation of the scoring algorithm using SSL is as follows. Consider a DGDr-Net, $G = (V, E, L)$, with node set $V$ ($= V_D \cup V_G \cup V_{Dr}$) corresponding to the $n$ ($= n_D + n_G + n_{Dr}$) vertices. Let [...]

We collected a list of components and relational data from public databases. Table 1 [...] (IL10, CXCL8, IL6, IL1B, AGT, IL2, CXCL10, CCL3, TMPRSS2, IL7, IL2RA, CSF3, TMPRSS4, ACE2, and BSG), as shown in Figure 2a. Also, 194 drugs in our network are currently in clinical trials for COVID-19 23. The data and the construction of the networks are described in more detail below.

The disease network is a sub-graph $G_D = (V_D, E_D)$ when $L = \{L_D\}$ in the multi-layered DGDr-Net. The nodes $V_D$ represent diseases, and the edges $E_D$ represent the similarity between diseases. To obtain the COVID-19-centric disease network, we construct and integrate two types of networks: (a) networks with disease-gene associations and (b) networks with comorbidity relations. In the framework for drug repurposing using the disease interactome, the former network allows us to use genetic information on other related diseases, while the latter allows us to utilize clinical data. First, we constructed networks with disease-gene associations [...]

The gene network is a sub-graph $G_G = (V_G, E_G)$ when $L = \{L_G\}$ in DGDr-Net. Protein-protein interactions (PPIs) from the STRING database 21 are used for the gene network. To avoid false-positive information, we select 26,681 genes (proteins) and 841,068 interactions with a high confidence level (≥ 0.7).
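The idea of propagating a single seed label through the network, as described above for the SSL scoring step, can be illustrated with one common graph-based SSL rule. The sketch below assumes the closed form $f = (I + \mu L)^{-1} y$ with the unnormalized graph Laplacian $L = D - W$; this particular objective, the parameter `mu`, the function name `ssl_scores`, and the four-node toy graph are assumptions for illustration and may differ from the formulation used in this study.

```python
import numpy as np

def ssl_scores(W, y, mu=1.0):
    """Closed-form graph-based SSL scores: f = (I + mu * L)^{-1} y, with L = D - W.

    W: symmetric similarity matrix; y: label vector (1 for the seed node, 0 otherwise).
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W   # unnormalized graph Laplacian
    return np.linalg.solve(np.eye(len(y)) + mu * laplacian, y)

# Hypothetical toy chain: COVID-19 -- gene_a -- drug_x -- drug_y (unit edge weights).
nodes = ["COVID-19", "gene_a", "drug_x", "drug_y"]
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
y = np.array([1.0, 0.0, 0.0, 0.0])           # only the disease node is labeled

scores = ssl_scores(W, y)
drug_ranking = sorted(["drug_x", "drug_y"],
                      key=lambda name: scores[nodes.index(name)],
                      reverse=True)
print(dict(zip(nodes, scores.round(3))), drug_ranking)
```

In this toy chain the score decays with distance from the seed, so the drug closer to COVID-19 is ranked first. A larger `mu` spreads the seed label further across the network.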
In the gene network, an edge stands for a protein interaction, and the edge weight indicates its presence ('1') or absence ('0').

The drug network is a sub-graph $G_{Dr} = (V_{Dr}, E_{Dr})$ when $L = \{L_{Dr}\}$ in the multi-layered DGDr-Net. A total of 9,540 drugs (compounds) and 525,207 drug-target gene associations were obtained from the DrugBank and CTD databases 19, 22. Each drug is represented by a 26,681-dimensional binary attribute vector, in which each entry indicates whether an association with the corresponding target gene exists ('1') or not ('0'). The similarity matrix for the drug network, $W_{Dr}$, was calculated from cosine distances transformed by the Gaussian similarity kernel in Equation (1).

Recall the similarity matrix for DGDr-Net in Equation (2). The off-diagonal blocks represent the connections between different layers. From the relational data in Table 1 [...]

We applied the scoring algorithm to predict candidate repurposable drugs for COVID-19. To verify the performance of our scoring results, we treat the 194 drugs currently in clinical trials for COVID-19 23 as ground truth. Only one node in the disease layer was set to "1" as the labeled set, and the remaining 590 diseases, 26,681 genes, and 2,173 drugs were set to "0" as the unlabeled set. After scoring, we tested how many drugs in clinical trials were highly ranked. The scoring algorithm with DGDr-Net provides score values for all nodes (diseases, genes, and drugs) in the three layers. Disease scores indicate which diseases are strongly associated with COVID-19, gene scores indicate the priorities of target genes as biomarkers, and drug scores indicate the priorities of candidate drugs for COVID-19.

Figure 3 shows the top 30 ranked diseases, genes, and drugs when COVID-19 is given as the seed. The colored nodes represent COVID-19-related diseases, genes, and drugs, sorted in descending order of the resulting scores in each layer. Blue nodes indicate diseases, green nodes indicate genes, and red nodes indicate the drugs that are recommended for repositioning.
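The drug-layer similarity described above, computed from binary target-gene vectors via cosine distance and a Gaussian kernel, can be sketched as follows. The kernel form $\exp(-d^2/\sigma^2)$, the bandwidth `sigma`, the zeroed diagonal, and the function name `drug_similarity` are assumptions for illustration; the exact parameterization of Equation (1) may differ.

```python
import numpy as np

def drug_similarity(targets, sigma=1.0):
    """Gaussian-kernel similarity over cosine distances between drugs.

    targets: (n_drugs, n_genes) binary matrix; entry 1 means the gene is a target.
    Assumed kernel form: exp(-d^2 / sigma^2).
    """
    X = np.asarray(targets, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # drugs with no targets: avoid division by zero
    unit = X / norms
    cosine_distance = 1.0 - unit @ unit.T    # 0 for identical target profiles
    W = np.exp(-(cosine_distance ** 2) / sigma ** 2)
    np.fill_diagonal(W, 0.0)                 # assumption: no self-loops
    return W

# Hypothetical toy data: 3 drugs, 3 genes.
targets = np.array([
    [1, 1, 0],   # drug 0
    [1, 1, 0],   # drug 1 shares both targets with drug 0
    [0, 0, 1],   # drug 2 shares no targets with the others
])
W_Dr = drug_similarity(targets)
print(W_Dr.round(3))
```

Drugs with identical target profiles receive a similarity of 1, while drugs with no shared targets receive the kernel's minimum value for this distance, $e^{-1}$ when `sigma` is 1.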
We also show the tripartite connections among diseases, genes, and drugs, such as the disease-gene associations and drug-disease associations.

In this study, we developed a network-based drug repurposing framework that recommends repurposable drugs by utilizing and combining COVID-19-related data such as disease comorbidity, protein associations, and drug-target gene information. We first constructed layered disease-gene-drug networks (DGDr-Net) to capture the interactions among the diseases, genes, and drugs related to COVID-19. With the network, we applied semi-supervised scoring algorithms to identify candidate repurposable drugs. However, the information available about coronavirus-associated drugs is not yet sufficient because there is not yet a cure for COVID-19. Therefore, in the scoring procedure, as in the cold-start problem in recommendation systems, the candidate drugs were identified without any known links between COVID-19 and drugs. We provided the priorities of candidate repurposable drugs and candidate target genes based on the predicted scores. Of the top 30 drugs, 17 were in clinical trials for the treatment of COVID-19. Steroids such as dexamethasone, prednisolone, and hydrocortisone are recommended as top candidates. Among them, dexamethasone, an anti-inflammatory drug, had the highest score.

A unique strength of our approach is the inclusion of comorbidity factors in our network. This allowed us to identify repurposable drugs more accurately while affording a bird's-eye view of the intricate associations between COVID-19 and other related diseases, drugs, and genes. One limitation of our study is the relatively small size of the few databases we utilized; however, this concern is mitigated by the robust yet flexible nature of a network-based approach, which allows us to supplement and correct our current model easily.
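The comparison reported above, in which 17 of the top 30 ranked drugs were in clinical trials (17/30 ≈ 0.57), amounts to counting ground-truth drugs among the top-k scores. A minimal sketch, assuming the scores are held in a dictionary; the function name `hits_at_k` and the toy drug names and values are hypothetical.

```python
def hits_at_k(scores, ground_truth, k):
    """Count how many of the k highest-scoring items are in the ground-truth set.

    scores: dict mapping item name to score; ground_truth: set of item names.
    Returns (number of hits, fraction of the top k that are hits).
    """
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for name in top_k if name in ground_truth)
    return hits, hits / k

# Hypothetical toy scores and ground truth.
toy_scores = {"drug_a": 0.9, "drug_b": 0.7, "drug_c": 0.4, "drug_d": 0.1}
in_trials = {"drug_a", "drug_c", "drug_d"}

print(hits_at_k(toy_scores, in_trials, k=2))
print(hits_at_k(toy_scores, in_trials, k=3))
```

Because drugs outside the ground-truth set may simply not have been tested yet, a miss in the top k is not evidence that a candidate is ineffective.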
As we receive the newest information regarding the novel coronavirus, we can easily update the candidate drug/gene components of our networks, performing a set of updated calculations and generating an updated gene and drug candidate list almost instantly. With this in mind, we hope our approach may help clinicians and scientists make the hard decisions regarding which drugs or gene targets to test first in this global race for a cure.

1. China Novel Coronavirus Investigating and Research Team. A novel coronavirus from patients with pneumonia in China
2. Comorbidity and its impact on 1590 patients with Covid-19 in China: A Nationwide Analysis
3. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area
4. Prevalence and impact of cardiovascular metabolic diseases on COVID-19 in China
5. Does comorbidity increase the risk of patients with COVID-19: evidence from meta-analysis
6. Kidney disease is associated with in-hospital death of patients with COVID-19
7. APOE e4 genotype predicts severe COVID-19 in the UK Biobank community cohort
8. Pharmacologic treatments for coronavirus disease 2019 (COVID-19): a review
9. Computational drug repositioning: from data to therapeutics
10. Ongoing clinical trials for the management of the COVID-19 pandemic
11. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2
12. A data-driven drug repositioning framework discovered a potential therapeutic agent targeting COVID-19
13. Computational studies of drug repurposing and synergism of lopinavir, oseltamivir and ritonavir binding with SARS-CoV-2 Protease against COVID-19
14. Drug repurposing with network reinforcement
15. Disease gene identification based on generic and disease-specific genome networks
16. Semi-supervised learning literature survey
17. Introduction to semi-supervised learning
18. Semi-supervised learning
19. The comparative toxicogenomics database: update 2019. Nucleic acids research
20. Fast unfolding of communities in large networks
21. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible
22. DrugBank 5.0: a major update to the DrugBank database
23. DrugBank: COVID-19 information
24. Dexamethasone in the management of covid-19
25. Coronavirus breakthrough: dexamethasone is first drug shown to save lives
26. Van Hemelrijck M: COVID-19 and treatment with NSAIDs and corticosteroids: should we be limiting their use in the clinical setting? ecancermedicalscience 2020
27. High-dose methotrexate with leucovorin rescue for severe COVID-19: An immune stabilization strategy for SARS-CoV-2 induced 'PANIC' attack
28. The Antiviral Properties of Cyclosporine. Focus on Coronavirus, Hepatitis C Virus, Influenza Virus, and Human Immunodeficiency Virus Infections