key: cord-199630-2lmwnfda
authors: Ray, Sumanta; Lall, Snehalika; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra; Schonhuth, Alexander
title: Predicting potential drug targets and repurposable drugs for COVID-19 via a deep generative model for graphs
date: 2020-07-05
journal: nan
DOI: nan
sha: 
doc_id: 199630
cord_uid: 2lmwnfda

Coronavirus Disease 2019 (COVID-19) has been creating a worldwide pandemic situation. Repurposing drugs, already shown to be free of harmful side effects, for the treatment of COVID-19 patients is an important option in launching novel therapeutic strategies. Therefore, reliable molecule interaction data are a crucial basis, where drug-/protein-protein interaction networks establish invaluable, year-long carefully curated data resources. However, these resources have not yet been systematically exploited using high-performance artificial intelligence approaches. Here, we combine three networks, two of which are year-long curated, and one of which, on SARS-CoV-2-human host-virus protein interactions, was published only most recently (30th of April 2020), raising a novel network that puts drugs, human and virus proteins into mutual context. We apply Variational Graph AutoEncoders (VGAEs), representing most advanced deep learning based methodology for the analysis of data that are subject to network constraints. Reliable simulations confirm that we operate at utmost accuracy in terms of predicting missing links. We then predict hitherto unknown links between drugs and human proteins against which virus proteins preferably bind. The corresponding therapeutic agents present splendid starting points for exploring novel host-directed therapy (HDT) options.

The pandemic of COVID-19 (Coronavirus Disease-2019) has affected more than 6 million people. So far, it has caused about 0.4 million deaths in over 200 countries worldwide (https://coronavirus.jhu.edu/map.html), with numbers still increasing rapidly. COVID-19 is an acute respiratory disease caused by a highly virulent and contagious novel coronavirus strain, SARS-CoV-2, which is an enveloped, single-stranded RNA virus 1 . Sensing the urgency, researchers have been relentlessly searching for possible therapeutic strategies in the last few weeks, so as to control the rapid spread.

In their quest, drug repurposing establishes one of the most relevant options, where drugs that have been approved (at least preclinically) for fighting other diseases are screened for their possible alternative use against the disease of interest, which is COVID-19 here. Because they were shown to lack severe side effects before, risks in the immediate application of repurposed drugs are limited. In comparison with de novo drug design, repurposing drugs offers various advantages. Most importantly, the reduced time frame in development suits the urgency of the situation in general. Furthermore, the most recent and most advanced artificial intelligence (AI) approaches have boosted drug repurposing enormously in terms of throughput and accuracy. Finally, it is important to understand that the 3D structures of the majority of viral proteins have remained largely unknown, which raises the obstacles for direct approaches even higher.

The foundation of AI based drug repurposing are molecule interaction data, optimally reflecting how drugs, viral and host proteins get into contact with each other. During the life cycle of a virus, the viral proteins interact with various human proteins in the infected cells. Through these interactions, the virus hijacks the host cell machinery for replication, thereby affecting the normal function of the proteins it interacts with. To develop suitable therapeutic strategies and design antiviral drugs, a comprehensive understanding of the interactions between viral and human proteins is essential 2 .

When watching out for drugs that can be repurposed to fight the virus, one has to realize that targeting single virus proteins easily leads to the viruses escaping the (rather simpleminded) attack by raising resistance-inducing mutations. Therefore, host-

(1) We link existing high-quality, long-term curated and refined, large-scale drug/protein-protein interaction data with (2) molecular interaction data on SARS-CoV-2 itself, raised only a handful of weeks ago, (3) exploit the resulting overarching network using most advanced, AI-boosted techniques (4) for repurposing drugs in the fight against SARS-CoV-2 (5) in the frame of HDT-based strategies.

As for (3)-(5), we will highlight interactions between SARS-CoV-2-host proteins and human proteins important for the virus to persist, using most advanced deep learning techniques that cater to exploiting network data. We are convinced that many of the fairly broad spectrum of drugs we raise will be amenable to developing successful HDTs against COVID-19.

In the following, we will first describe the workflow of our analysis pipeline and the basic ideas that support it.

We proceed by carrying out a simulation study showing that our pipeline accurately predicts missing links in the encompassing drug-human protein-SARS-CoV-2 protein network that we raise and analyze. Namely, we demonstrate that our (high-performance, AI-supported) prediction pipeline accurately re-establishes links that had been explicitly removed before. This provides evidence that the interactions we predict in the full network are likely to reflect true interactions between molecular interfaces.

Subsequently, we continue with the core experiments. We predict links to be missing in the full (without artificially having removed links), encompassing drug-human protein-SARS-CoV-2 protein network, raised by combining links from year-long curated resources on the one hand and most recently published COVID-19 resources on the other hand. As per our simulation study, a large fraction, if not the vast majority, of the predictions establish true, hence actionable interactions between drugs on the one hand and SARS-CoV-2 associated human proteins (hence of use in HDT) on the other hand.

Figure 1. Overall workflow of the proposed method: The three networks, SARS-CoV-2-host PPI, human PPI, and drug-target network (Panel A), are mapped by their common interactors to form an integrated representation (Panel B). The neighborhood sampling strategy Node2Vec converts the network into fixed-size low-dimensional representations that preserve the properties of the nodes belonging to the three major components of the integrated network (Panel C). The resulting feature matrix (F) from the node embeddings and adjacency matrix (A) from the integrated network are used to train a VGAE model, which is then used for prediction (Panel D).

For the purposes of high-confidence validation, we carry out a literature study on the overall 92 drugs we put forward. For this, we inspect the postulated mechanism-of-action of the drugs in the frame of several diseases, including SARS-CoV and MERS-CoV driven diseases in particular.

See Figure 1 for the workflow of our analysis pipeline and the basic ideas that support it. We will describe all important steps in the paragraphs of this subsection.

This reduces the training time compared to the general graph autoencoder model. We tested the model performance for different numbers of sampled nodes, keeping track of the area under the ROC curve (AUC), average precision (AP) score, and model training time in the frame of a train-validation-test split at proportions 8:1:1. Table 1 shows the performance of the model for sampled subgraph sizes N_S = 7000, 5000, 3000, 2500 and 1000. For 5000 sampled nodes, the model's performance is sufficiently good with respect to its training time and validation AUC and AP scores. The average test ROC-AUC and AP scores of the model for N_S = 5000 are 88.53 ± 0.03 and 84.44 ± 0.04, respectively.
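The 8:1:1 train-validation-test split mentioned above can be illustrated with a minimal sketch. The function name `split_edges`, the fixed seed, and the toy edge list are assumptions made for illustration only; the source does not describe how the split was implemented, and a full link-prediction setup would also sample non-edges as negatives.

```python
import random

def split_edges(edges, ratios=(8, 1, 1), seed=0):
    """Shuffle edges and split them into train/validation/test by integer ratios."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Toy edge list (hypothetical): a path graph with 100 edges.
edges = [(i, i + 1) for i in range(100)]
train, val, test = split_edges(edges)
```

Only the training edges would be used to build the adjacency matrix seen by the model; validation and test edges are held out for scoring.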

To assess the efficacy of the model in discovering the existing edges between only CoV-host and drug nodes, we train the model (with N_S = 5000) on an incomplete version of the graph where the links between CoV-host and drugs have been removed. We further compute the feature matrix F based on the incomplete graph and use it as model input. The test set consists of all the previously removed edges. The model performs better at discovering those edges between CoV-host and drug nodes (ROC-AUC: 93.56 ± 0.01, AP: 90.88 ± 0.02 over 100 runs).
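For readers unfamiliar with the two metrics, the following is a small, dependency-free sketch of how ROC-AUC and average precision can be computed from predicted scores of held-out edges (positives) and sampled non-edges (negatives). The helper names and the toy scores are hypothetical; the source does not state which implementation was used.

```python
def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def average_precision(pos_scores, neg_scores):
    """Mean of the precision values taken at the rank of each positive."""
    ranked = sorted(
        [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
        key=lambda t: -t[0],
    )
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / len(pos_scores)

# Toy scores (hypothetical): removed true edges vs. sampled non-edges.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
auc = roc_auc(pos, neg)
ap = average_precision(pos, neg)
```

The pairwise AUC loop is quadratic and only meant for illustration; a rank-based formula would be preferable for networks of the size used here.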

The FastGAE model is learned with the feature matrix (F) and adjacency matrix (A). The node feature matrix (F) is obtained from A using the Node2Vec neighborhood sampling strategy. The model performance is evaluated with and without using F as feature matrix. Figure 2 shows the average performance of the model on validation sets with and without F as input for the different number of sampling nodes. We calculate average AUC, and AP scores for 50 complete runs of the model. From figure 2 , it is evident that including F as feature matrix enhances the model's performance markedly. 

We use the Node2Vec framework to learn low-dimensional embeddings of each node in the compiled network. It uses the Skip-gram algorithm of the word2vec model to learn the embeddings, which eventually groups nodes with a similar 'role' or a similar 'connection pattern' within the graph. A similar 'role' ensures that nodes within a set/group are structurally more similar to each other than to nodes outside the group. Two nodes are said to be structurally equivalent if they have identical connection patterns to the rest of the network 20 . To explore this, we have analyzed the embedding results in two steps. First, we explore structurally equivalent nodes to identify 'roles' and similar connection patterns to the rest of the network, and later use Louvain clustering to examine the same within the groups/clusters. The most_similar function of Node2Vec inspects the structurally equivalent nodes within the network. We find all the CoV-host nodes which are most similar to the drug nodes. While it is expected to observe nodes of the same type within the neighborhood of a particular node, in some cases we found drugs to be neighbors of CoV-host proteins with high probability (pobs > 0.65). SARS-CoV-2 3CL protease 21 . Some other drugs such as 'Clenbuterol' and 'Fenbendazole', the probable neighbors of ppp1cb and EEF1A, respectively, are used as bronchodilators in asthma.
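The similarity lookup described above can be sketched as follows, assuming that the score behind most_similar is cosine similarity between embedding vectors. The function below is a hypothetical stand-in written for illustration, not the Node2Vec library call itself, and the node names and 3-dimensional embeddings are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(query, embeddings, candidates, topn=2):
    """Rank candidate nodes by cosine similarity to the query node."""
    scored = [
        (c, cosine(embeddings[query], embeddings[c]))
        for c in candidates
        if c != query
    ]
    return sorted(scored, key=lambda t: -t[1])[:topn]

# Toy embeddings; names and values are purely illustrative.
embeddings = {
    "drug_A": [1.0, 0.0, 0.0],
    "host_1": [0.9, 0.1, 0.0],
    "host_2": [0.0, 1.0, 0.0],
    "host_3": [0.5, 0.5, 0.0],
}
hits = most_similar("drug_A", embeddings, ["host_1", "host_2", "host_3"])
```

Restricting `candidates` to CoV-host nodes mirrors the step of finding the CoV-host proteins most similar to each drug node.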

To explore the closely connected groups, we have constructed a neighborhood graph using the k-nearest neighbor algorithm from the node embeddings and applied Louvain clustering (Figure 3, panel C). Although there is a clear separation between the host protein (including CoV-host) cluster and the drug cluster, some of the Louvain clusters contain both types of nodes. For example, Louvain clusters 16 and 17 contain four and two drugs, respectively, along with CoV-host proteins. Figure 3, panel D, represents a network consisting of these six drugs and their most similar CoV-host nodes.

For drug-CoV-host interaction prediction, we exploit the Variational Graph Autoencoder (VGAE), an unsupervised graph neural network model, first introduced in 18 to leverage the concept of the variational autoencoder in graph-structured data. To make learning faster, we utilized the FastGAE model to take advantage of the fast decoding phase. We have used two data matrices in the FastGAE model for learning: one is the adjacency matrix, which represents the interaction information over all the nodes, and the other one is the feature matrix representing the low-dimensional embeddings of all the nodes in the network. We create a test set of 'non-edges' by removing all existing links between drugs and CoV-host proteins from all possible combinations (332 CoV-host × 1302 drugs) of edges. The model is trained on the whole network with the adjacency matrix A and feature matrix F. The trained model is then applied to the test 'non-edges' to identify the most probable links. We identified a total of 692 most probable links with 92 drugs and 78 CoV-host proteins with a probability threshold of 0.8. The predicted CoV-host proteins are involved in different crucial pathways of viral infection (Table 4). The p-values for pathway and GO enrichment are calculated by using the hypergeometric test with 0.05 FDR corrections. Figure 4, panel A shows the heatmap of probability scores between predicted drugs and CoV-host proteins. To get more details of the predicted bipartite graph, we use a weighted bipartite clustering algorithm proposed by J. Beckett 23 . This results in 4 bipartite modules (Figure 4, panel A): B1 (11 drugs, 28 CoV-host), B2 (4 drugs, 41 CoV-host), B3 (71 drugs and 4 CoV-host), and B4 (6 drugs and 5 CoV-host). The other panels of the figure show the network diagrams of the four bipartite modules. B1 contains 11 drugs, including some antibiotics (Anisomycin, Midecamycin) and anti-cancer drugs (Doxorubicin, Camptothecin). B3 also has some antibiotics such as Puromycin, Demeclocycline, Dirithromycin, Geldanamycin, and Chlortetracycline; among them, the first three are widely used for bronchitis, pneumonia, and respiratory tract infections 24 . Some other drugs such as Lobeline and Ambroxol included in the B3 module have a variety of therapeutic uses, including respiratory disorders and bronchitis.

Figure 4. Drug-CoV-host predicted interaction: Panel A shows the heatmap of probability scores between 92 drugs and 78 CoV-host proteins. The four predicted bipartite modules are annotated as B1, B2, B3 and B4 within the heatmap. The drugs are colored based on their clinical phase (launched: red, preclinical: blue, phase 2/phase 3: green, and phase 1/phase 2: black). Panels B, C, D and E represent the networks corresponding to the B1, B2, B3 and B4 modules. The drugs are annotated using the disease area found in the CMAP database 22 .

Figure 5. Predicted interactions for probability threshold 0.9. Panel A shows the interaction graph between drugs and CoV-host proteins. Drugs are annotated with their usage. Panels B, C, D and E represent quasi-bicliques for one, two, three and more than three drug molecules, respectively.

The high-confidence predicted interactions (with threshold 0.9) are shown in Figure 5, panel A. To highlight some repurposable drug combinations and their predicted CoV-host targets, we performed a weighted clustering (ClusterONE) 25 on this network and found some quasi-bicliques (shown in panels B-E). We matched our predicted drugs with the drug list recently published by Zhou et al. 13 and found six common drugs: Mesalazine, Vinblastine, Menadione, Medrysone, Fulvestrant, and Apigenin. Among them, Apigenin has a known antiviral activity together with quercetin, rutin, and other flavonoids 26 . Mesalazine is also proven to be extremely effective in the treatment of other viral diseases like influenza A/H5N1 virus 27 .
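The selection of candidate links by probability threshold (0.8 for the full set, 0.9 for the high-confidence set) can be sketched as a simple filter over decoder outputs for drug/CoV-host pairs that are not already known edges. The function name `predict_links`, the data layout, and all scores below are hypothetical and serve only to illustrate the thresholding step.

```python
def predict_links(scores, known_edges, threshold=0.8):
    """Return non-edge pairs scoring at or above the threshold, best first."""
    return sorted(
        (pair for pair, p in scores.items()
         if pair not in known_edges and p >= threshold),
        key=lambda pair: -scores[pair],
    )

# Hypothetical decoder outputs for (drug, CoV-host protein) pairs.
scores = {
    ("drug_A", "host_1"): 0.95,
    ("drug_A", "host_2"): 0.40,
    ("drug_B", "host_1"): 0.85,
    ("drug_B", "host_2"): 0.99,  # already a known interaction, so excluded
}
known_edges = {("drug_B", "host_2")}
candidates = predict_links(scores, known_edges)
high_confidence = predict_links(scores, known_edges, threshold=0.9)
drugs = {drug for drug, _ in candidates}
```

Counting the distinct drugs and proteins in `candidates` corresponds to how the totals of predicted drugs and CoV-host proteins are reported in the text.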

Baclofen, a benzodiazepine receptor (GABAA-receptor) agonist, has a potential role in antiviral associated treatment 28 . The anti-inflammatory agent fisetin has also been tested for antiviral activity, such as inhibition of Dengue virus (DENV) infection 29 . It down-regulates the production of proinflammatory cytokines induced by a DENV infection. Both drugs are listed in the high-confidence interaction set with three CoV-host proteins: TAPT1 (interacting with SARS-CoV-2 protein orf9c), SLC30A6 (interacting with SARS-CoV-2 protein orf9c), and TRIM59 (interacting with SARS-CoV-2 protein orf3a) (Figure 5, panel C).

Topoisomerase Inhibitors play an active role as antiviral agents by inhibiting the viral DNA replication 30, 31 . Some Topoisomerase Inhibitors such as Camptothecin, Daunorubicin, Doxorubicin, Irinotecan and Mitoxantrone are predicted to interact with several CoV-host proteins. It has been demonstrated that the anticancer drug camptothecin (CPT) and its derivative Irinotecan have a potential role in antiviral activity 32, 33 . It inhibits host cell enzyme topoisomerase-I which is required for the initiation as well as completion of viral functions in host cell 34 . Daunorubicin (DNR) has also been demonstrated as an inhibitor of HIV-1 virus replication in human host cells 35 . The conventional anticancer antibiotic Doxorubicin was identified as a selective inhibitor of in vitro Dengue and Yellow Fever virus replication 36 . It is also reported that doxorubicin coupling with monoclonal antibody can create an immunoconjugate that can eliminate HIV-1 infection in mice cell 37 . Mitoxantrone shows antiviral activity against the human herpes simplex virus (HSV1) by reducing the transcription of viral genes in many human cells that are essential for DNA synthesis 38 .

Histone Deacetylase Inhibitors (HDACi) are generally used as latency-reversing agents for purging HIV-1 from latent reservoirs such as CD4 memory cells 39 . Our predicted drug list (Table 3 ) contains two HDACi: Scriptaid and Vorinostat. Vorinostat can be used to achieve latency reversal in the HIV-1 virus safely and repeatedly 40 . Asymptomatic patients infected with SARS-CoV-2 are of significant concern as they are more likely to infect a large number of people than symptomatic patients. Moreover, in most cases (99th percentile), patients develop symptoms after an average of 5-14 days, which is longer than the incubation period of SARS, MERS, or other viruses 41 . To this end, HDACi may serve as good candidates for recognizing and clearing the cells in which SARS-CoV-2 latency has been reversed.

Heat shock protein 90 (HSP90) is described as a crucial host factor in the life cycle of several viruses, which includes entry into the cell, nuclear import, transcription, and replication 42, 43 . HSP90 is also shown to be an essential factor for the SARS-CoV-2 envelope (E) protein 44 . In 45 , HSP90 is described as a promising target for antiviral drugs. The list of predicted drugs contains three HSP90 inhibitors: Tanespimycin, Geldanamycin, and its derivative Alvespimycin. The first two have a substantial effect in inhibiting the replication of Herpes Simplex Virus and Human enterovirus 71 (EV71), respectively. Recently in 46 , Geldanamycin and its derivatives were proposed as effective drugs in the treatment of COVID-19.

Inhibiting DNA synthesis during viral replication is one of the critical steps in disrupting the viral infection. The list of predicted drugs contains seven such small molecules/drugs, viz., Niclosamide, Azacitidine, Anisomycin, Novobiocin, Primaquine, Menadione, and Metronidazole. The DNA synthesis inhibitor Niclosamide has great potential to treat a variety of viral infections, including SARS-CoV, MERS-CoV, and HCV 47 and has recently been described as a potential candidate to fight the SARS-CoV-2 virus 47 . Novobiocin, an aminocoumarin antibiotic, is also used in the treatment of Zika virus (ZIKV) infections due to its protease inhibitory activity. In 2005, Chloroquine (CQ) was demonstrated to be an effective drug against the spread of severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV). Recently, Hydroxychloroquine (HCQ) sulfate, a derivative of CQ, has been evaluated to efficiently inhibit SARS-CoV-2 infection in vitro 48 . Therefore, another anti-malarial aminoquinoline drug, Primaquine, may also contribute to the attenuation of the inflammatory response of COVID-19 patients. Primaquine is also established to be effective in the treatment of Pneumocystis pneumonia (PCP) 49 .

Cardiac glycosides have been shown to play a crucial role in antiviral drugs. These drugs target cell host proteins, which help reduce the resistance to antiviral treatments. The antiviral effects of cardiac glycosides have been described by inhibiting the pump function of Na, K-ATPase. This makes them essential drugs against human viral infections. The predicted list of drugs contains three cardiac glycosides ATPase inhibitors: Digoxin, Digitoxigenin, and Ouabain. These drugs have been reported to be effective against different viruses such as herpes simplex, influenza, chikungunya, coronavirus, and respiratory syncytial virus 50 .

MG132, a proteasomal inhibitor, is established to be a strong inhibitor of SARS-CoV replication in early steps of the viral life cycle 51 . MG132 inhibits the cysteine protease m-calpain, which results in a pronounced inhibition of SARS-CoV-2 replication in the host cell. In 52 , Resveratrol has been demonstrated to be a significant inhibitor of MERS-CoV infection. Resveratrol treatment decreases the expression of the nucleocapsid (N) protein of MERS-CoV, which is essential for viral replication. As MG132 and Resveratrol play a vital role in inhibiting the replication of the other coronaviruses SARS-CoV and MERS-CoV, they may be potential candidates for the prevention and treatment of SARS-CoV-2. Another drug, Captopril, is known as an Angiotensin II receptor blocker (ARB), which directly inhibits the production of angiotensin II. In 53 , Angiotensin-converting enzyme 2 (ACE2) is demonstrated as the binding site for SARS-CoV-2. So Angiotensin II receptor blockers (ARBs) may be good candidates to use in the tentative treatment of SARS-CoV-2 infections 54 .

In summary, our proposed method predicts several drug targets and multiple repurposable drugs that have prominent literature evidence of use as antiviral drugs, especially for the two other coronavirus species SARS-CoV and MERS-CoV. Some drugs are also directly associated with the treatment of SARS-CoV-2 in recent literature. However, further clinical trials and several preclinical experiments are required to validate the clinical benefits of these potential drugs and drug targets.

In this work, we have successfully generated a list of high-confidence candidate drugs that can be repurposed to counteract SARS-CoV-2 infections. The novelties have been to integrate most recently published SARS-CoV-2 protein interaction data on the one hand, and to use most recent, most advanced AI (deep learning) based high-performance prediction machinery on the other hand, as the two major points. In experiments, we have validated that our prediction pipeline operates at utmost accuracy, confirming the quality of the predictions we have raised. The recent publication (April 30, 2020) of two novel SARS-CoV-2-human protein interaction resources 15, 16 has unlocked enormous possibilities in studying virulence and pathogenicity of SARS-CoV-2, and the driving mechanisms behind it. Only now, various experimental and computational approaches in the design of drugs against COVID-19 have become conceivable, and only now such approaches can be exploited truly systematically, at both sufficiently high throughput and accuracy.

Here, to the best of our knowledge, we have done this for the first time. We have integrated the new SARS-CoV-2 protein interaction data with well-established, long-term curated human protein and drug interaction data. These data capture hundreds of thousands of approved interfaces between encompassing sets of molecules, either reflecting drugs or human proteins. As a result, we have obtained a comprehensive drug-human-virus interaction network that reflects the latest state of the art in terms of our knowledge about how SARS-CoV-2 interacts with human proteins and repurposable drugs. For exploiting the new network, which already establishes a new resource in its own right, we have opted for most recent and advanced deep learning based technology. A generic reason for this choice is the surge in advances and the resulting boost in operative prediction performance of related methods over the last 3-4 years. A particular reason is to make use of most advanced graph neural network based techniques, namely variational graph autoencoders as a deep generative model of utmost accuracy, the practical implementation of which 19 was presented only a few months ago (just like the relevant network data). Note that only this recent implementation makes it possible to process networks of sizes in the range of common molecular interaction data. In essence, graph neural networks "learn" the structure of links in networks, and infer rules that underlie the interplay of links. Based on the knowledge gained, they make it possible to predict links and to output the corresponding links together with probabilities for them to indeed be missing.

Simulation experiments, reflecting scenarios where links known to exist in our network were re-established by prediction upon their removal, pointed out that our pipeline does indeed predict missing links at utmost accuracy.

Encouraged by these simulations, we proceeded by performing the core experiments, and predicted links to be missing without prior removal of links in our encompassing network. These core experiments revealed 692 high confidence interactions relating to 92 drugs. In our experiments, we focused on predicting links between drugs and human proteins that in turn are known to interact with SARS-CoV-2 proteins (SARS-CoV-2 associated host proteins). We have decidedly put the focus not on drug -SARS-CoV-2-protein interactions, which would have reflected more direct therapy strategies against the virus. Instead, we have focused on predicting drugs that serve the purposes of host-directed therapy (HDT) options, because HDT strategies have proven to be more sustainable with respect to mutations by which the virus escapes a response to the therapy applied. Note that HDT strategies particularly cater to drug repurposing attempts, because repurposed drugs have already proven to lack severe side effects, because they are either already in use, or have successfully passed the preclinical trial stages.

We further systematically categorized the 92 repurposable drugs into 70 categories based on their domains of application and molecular mechanism. According to this, we identified and highlighted several drugs that target host proteins that the virus needs to enter (and subsequently hijack) human cells. One such example is Captopril, which directly inhibits the production of Angiotensin-Converting Enzyme-2 (ACE-2), in turn already known to be a crucial host factor for SARS-CoV-2. Further, we identified Primaquine, an antimalarial drug used to prevent relapses of malaria and also of Pneumocystis pneumonia (PCP), because it interacts with the TIM complex TIMM29 and ALG11. Moreover, we have highlighted drugs that act as DNA replication inhibitors (Niclosamide, Anisomycin), glucocorticoid receptor agonists (Medrysone), ATPase inhibitors (Digitoxigenin, Digoxin), topoisomerase inhibitors (Camptothecin, Irinotecan), and proteasomal inhibitors (MG-132). Note that some drugs are known to have rather severe side effects from their original use (Doxorubicin, Vinblastine), but the disrupting effects of their short-term usage in severe COVID-19 infections may mean sufficient compensation.

In summary, we have compiled a list of drugs, which when repurposed are of great potential in the fight against the COVID-19 pandemic, where therapy options are urgently needed. Our list of predicted drugs suggests both options that had been identified and thoroughly discussed before and new opportunities that had not been pointed out earlier. The latter class of drugs may offer valuable chances for pursuing new therapy strategies against COVID-19.

We have utilized three categories of interaction datasets: human protein-protein interactome data, SARS-CoV-2-host protein interaction data, and drug-host interaction data.

We have taken SARS-CoV-2-host interaction information from two recent studies by Gordon et al. and Dick et al. 15, 16 . In 15 , 332 high-confidence interactions between SARS-CoV-2 and human proteins are identified using affinity-purification mass spectrometry (AP-MS). In 16 , 261 high-confidence interactions are predicted using sequence-based PPI predictors (PIPE4 & SPRINT).

The drug-target interaction information has been collected from five databases, viz., DrugBank database (v4.3) 57 , ChEMBL 58 database, Therapeutic Target Database (TTD) 59 , PharmGKB database, and IUPHAR/BPS Guide to PHARMACOLOGY 60 . Total number of drugs and drug-host interactions used in this study are 1309 and 1788407, respectively.

We have built a comprehensive list of human PPIs from two datasets: (1) CCSB human Interactome database consisting of 7,000 genes, and 13944 high-quality binary interactions 61-63 , (2) The Human Protein Reference Database 56 which consists of 8920 proteins and 53184 PPIs.

The summary of all the datasets is provided in Table 2 . The CMAP database 22 is used to annotate the drugs with their usage in different disease areas.

We have utilized Node2vec 17 , an algorithmic framework for learning continuous feature representations for nodes in networks. It maps the nodes to a low-dimensional feature space that maximizes the likelihood of preserving network neighborhoods.

The principle of the feature learning framework in a graph can be described as follows. Let $G = (V, E)$ be a given graph, where $V$ represents the set of nodes and $E$ represents the set of edges. The feature representation of the $|V|$ nodes is given by a mapping function $f : V \rightarrow \mathbb{R}^d$, where $d$ specifies the feature dimension. The mapping $f$ may also be represented as a node feature matrix of dimension $|V| \times d$. For each node $v \in V$, $NN_S(v) \subset V$ defines a network neighborhood of node $v$, which is generated using a neighborhood sampling strategy $S$. The sampling strategy can be described as an interpolation between breadth-first search and depth-first search 17 . The objective function can be described as:

$$\max_{f} \sum_{v \in V} \log \Pr\big(NN_S(v) \mid f(v)\big)$$

This maximizes the likelihood of observing a network neighborhood $NN_S(v)$ for a node $v$ given its feature representation $f$. Now the probability of observing the neighborhood nodes $n_i \in NN_S(v)$ given the feature representation of the source node $v$ is given as:

$$\Pr\big(NN_S(v) \mid f(v)\big) = \prod_{n_i \in NN_S(v)} \Pr\big(n_i \mid f(v)\big)$$

where $n_i$ is the $i$-th neighbor of node $v$ in the neighborhood set $NN_S(v)$. The conditional likelihood of each source ($v$) and neighborhood node ($n_i \in NN_S(v)$) pair is represented as the softmax of the dot product of their features $f(v)$ and $f(n_i)$ as follows:

$$\Pr\big(n_i \mid f(v)\big) = \frac{\exp\big(f(n_i) \cdot f(v)\big)}{\sum_{u \in V} \exp\big(f(u) \cdot f(v)\big)}$$
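As a small numerical illustration of the softmax likelihood described above, the sketch below computes the probability of a neighbor given a source node from toy feature vectors. The node names, the 2-dimensional features, and the helper name `neighbor_probability` are assumptions for illustration; in practice Node2Vec approximates this normalization rather than summing over all nodes.

```python
import math

def neighbor_probability(f, v, n_i):
    """Softmax over dot products: Pr(n_i | f(v)), normalized over all nodes in f."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    logits = {u: dot(f[u], f[v]) for u in f}
    shift = max(logits.values())  # subtract the max for numerical stability
    denom = sum(math.exp(s - shift) for s in logits.values())
    return math.exp(logits[n_i] - shift) / denom

# Toy feature vectors (hypothetical).
f = {"a": [1.0, 0.0], "b": [0.8, 0.2], "c": [-1.0, 0.0]}
probs = {u: neighbor_probability(f, "a", u) for u in f}
```

Nodes whose features point in a similar direction to the source node (here "b") receive a higher probability than nodes pointing away from it (here "c").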

Variational Graph Autoencoder (VGAE) is a framework for unsupervised learning on graph-structured data 64 . This model uses latent variables and is effective in learning interpretable latent representations for undirected graphs. The graph autoencoder consists of two stacked models: 1) Encoder and 2) Decoder. First, an encoder based on graph convolution networks (GCN) 18 maps the nodes into a low-dimensional embedding space. Subsequently, a decoder attempts to reconstruct the original graph structure from the encoder representations. Both models are jointly trained to optimize the quality of the reconstruction from the embedding space, in an unsupervised way. The functions of these two models can be described as follows:

Encoder: It uses a Graph Convolution Network (GCN) on the adjacency matrix $A$ and the feature representation matrix $F$. The encoder generates a $d$-dimensional latent variable $z_i$ for each node $i \in V$, with $|V| = n$, that corresponds to each embedding node, with $d \leq n$. The inference model of the encoder is given below:

$$r(Z \mid A, F) = \prod_{i=1}^{n} r(z_i \mid A, F)$$

where $r(z_i \mid A, F)$ corresponds to a normal distribution, $\mathcal{N}(z_i \mid \mu_i, \sigma_i^2)$, and $\mu_i$ and $\sigma_i$ are the Gaussian mean and variance parameters. The actual embedding vectors $z_i$ are samples drawn from these distributions.

Decoder: It is a generative model that decodes the latent variables $z_i$ to reconstruct the matrix $A$ using inner products with sigmoid activation from the embedding matrix $Z$:

$$\hat{A} = \mathrm{sigmoid}\big(Z Z^{T}\big)$$

where $\hat{A}$ is the decoded adjacency matrix. The objective function of the variational graph autoencoder (VGAE) can be written as:

$$C_{VGAE} = \mathbb{E}_{r(Z \mid A, F)}\big[\log p(A \mid Z)\big] - D_{KL}\big(r(Z \mid A, F) \,\|\, p(Z)\big)$$

The objective function $C_{VGAE}$ maximizes the likelihood of decoding the adjacency matrix w.r.t. the graph autoencoder weights using stochastic gradient descent. Here, $D_{KL}(\cdot \,\|\, \cdot)$ represents the Kullback-Leibler divergence 65 and $p(Z)$ is the prior distribution of the latent variable.
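To make the two terms of the objective concrete, here is a minimal, dependency-free sketch of an inner-product decoder, the Bernoulli reconstruction log-likelihood, and the KL term. It assumes a standard normal prior p(Z) = N(0, I) and uses the Gaussian means directly as embeddings instead of sampling; both are simplifying assumptions, and the toy graph and parameter values are hypothetical. The GCN encoder itself is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(Z):
    """Inner-product decoder: A_hat[i][j] = sigmoid(z_i . z_j)."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in Z] for zi in Z]

def reconstruction_log_likelihood(A, A_hat):
    """Sum of Bernoulli log-likelihoods over all node pairs."""
    eps = 1e-12  # avoids log(0)
    return sum(
        a * math.log(p + eps) + (1 - a) * math.log(1 - p + eps)
        for row_a, row_p in zip(A, A_hat)
        for a, p in zip(row_a, row_p)
    )

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over nodes and dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv
        for mu_i, lv_i in zip(mu, log_var)
        for m, lv in zip(mu_i, lv_i)
    )

# Toy graph with two connected nodes and hypothetical encoder outputs.
A = [[0, 1], [1, 0]]
mu = [[0.5, 0.0], [0.5, 0.0]]
log_var = [[0.0, 0.0], [0.0, 0.0]]
A_hat = decode(mu)  # means used as embeddings; a full VGAE would sample z_i
kl = kl_to_standard_normal(mu, log_var)
objective = reconstruction_log_likelihood(A, A_hat) - kl
```

Training would adjust the encoder weights so that `objective` increases, trading reconstruction quality against closeness of the latent distribution to the prior.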

Drug-SARS-CoV-2 Link Prediction

1. Adjacency Matrix Preparation: In this work, we consider an undirected graph G = (V, E) with |V| = n nodes and |E| = m edges. We denote by A the binary adjacency matrix of G. Here, V consists of SARS-CoV-2 proteins, CoV-host proteins, drug-target proteins and drugs. The matrix A contains a total of n = 16444 nodes, given as:

$$n = N_{Nc} + N_{DT} + N_{NT} + N_{D},$$

where $N_{Nc}$ is the number of SARS-CoV-2 proteins, $N_{DT}$ is the number of drug targets, and $N_{NT}$ and $N_{D}$ represent the number of CoV-host nodes and drug nodes, respectively. The total number of edges is given by:

$$m = |E| = E_1 + E_2 + E_3,$$

where $E_1$ is the number of interactions between SARS-CoV-2 and human host proteins, $E_2$ is the number of interactions among human proteins, and $E_3$ is the number of interactions between drugs and human host proteins.
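As a small illustration of how such an adjacency matrix could be assembled, the following sketch merges four node groups and three edge lists into one symmetric binary matrix. The helper `build_adjacency` and all node names (`V1`, `H1`, `T1`, `D1`, ...) are hypothetical placeholders, not data from this study.

```python
import numpy as np

def build_adjacency(node_groups, edge_lists):
    """Build a symmetric binary adjacency matrix from several edge lists."""
    nodes = [name for group in node_groups for name in group]
    index = {name: i for i, name in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)), dtype=np.int8)
    for edges in edge_lists:
        for u, v in edges:
            A[index[u], index[v]] = 1          # undirected graph:
            A[index[v], index[u]] = 1          # set both directions
    return A, index

# Placeholder node groups (assumed for illustration only).
virus_proteins = ["V1", "V2"]
cov_host_proteins = ["H1", "H2"]
drug_targets = ["T1"]
drugs = ["D1", "D2"]

# Placeholder edge lists, one per interaction network.
E1 = [("V1", "H1"), ("V2", "H2")]              # virus-host interactions
E2 = [("H1", "H2"), ("H2", "T1")]              # human protein-protein interactions
E3 = [("D1", "T1"), ("D2", "H1")]              # drug-human protein interactions

A, index = build_adjacency(
    [virus_proteins, cov_host_proteins, drug_targets, drugs], [E1, E2, E3])
n = A.shape[0]                                 # total number of nodes
m = int(A.sum()) // 2                          # each edge is counted twice in A
```

For a graph with tens of thousands of nodes, a sparse matrix format would usually be preferable to the dense array used here.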

2. Feature Matrix Preparation: A neighborhood sampling strategy is used here to prepare a feature representation of all nodes. A flexible biased random walk procedure is employed to explore the neighborhood of each node. A random walk in a graph G can be described by the probability:

$$P(a_i = x \mid a_{i-1} = v) = \begin{cases} \dfrac{\pi(v, x)}{Z} & \text{if } (v, x) \in E, \\ 0 & \text{otherwise,} \end{cases}$$

where π(v, x) is the unnormalized transition probability between nodes v and x, with (v, x) ∈ E, Z is a normalizing constant, and $a_i$ is the i-th node in a walk of length l. The transition probability is given by $\pi(v, x) = c_{pq}(t, x) \cdot w_{vx}$, where t is the node preceding v in the walk, $w_{vx}$ is the static edge weight, and p and q are the two parameters that guide the walk. The coefficient $c_{pq}(t, x)$ is given by

$$c_{pq}(t, x) = \begin{cases} 1/p & \text{if } distance(t, x) = 0, \\ 1 & \text{if } distance(t, x) = 1, \\ 1/q & \text{if } distance(t, x) = 2, \end{cases}$$

where distance(t, x) represents the shortest path distance between node t and node x. The generation of the feature matrix $F_{n \times d}$ is governed by the Node2vec algorithm. It starts from every node and simulates r random walks of fixed length l. In every step of a walk, the transition probability π(v, x) governs the sampling. The walk generated in each iteration is added to a walk list. Finally, stochastic gradient descent is applied to optimize the node embeddings over the list of walks, and the result is returned.
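The walk-generation part of this procedure can be sketched in plain Python as follows. This is a simplified illustration, not the Node2vec reference implementation: the graph is assumed to be a dictionary mapping each node to a dictionary of neighbor weights, the function names are hypothetical, and the final embedding step (skip-gram training by stochastic gradient descent) is omitted.

```python
import random

def search_bias(graph, t, x, p, q):
    """c_pq(t, x) for a candidate x adjacent to the current node.

    Since x neighbors the current node and t precedes it,
    distance(t, x) can only be 0, 1 or 2.
    """
    if x == t:                 # distance 0: step back to the previous node
        return 1.0 / p
    if x in graph[t]:          # distance 1: x is also a neighbor of t
        return 1.0
    return 1.0 / q             # distance 2: move away from t

def biased_walk(graph, start, length, p, q, rng):
    """Simulate one biased random walk containing `length` nodes."""
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        neighbors = sorted(graph[v])
        if not neighbors:      # isolated node: the walk cannot continue
            break
        if len(walk) == 1:     # first step: no previous node, use edge weights
            weights = [graph[v][x] for x in neighbors]
        else:
            t = walk[-2]
            weights = [search_bias(graph, t, x, p, q) * graph[v][x]
                       for x in neighbors]
        # random.choices normalizes the weights internally
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

def simulate_walks(graph, r, length, p, q, seed=0):
    """Start r walks from every node and collect them in a walk list."""
    rng = random.Random(seed)
    walks = []
    for _ in range(r):
        nodes = list(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(graph, node, length, p, q, rng))
    return walks

# Toy weighted graph (assumed): node -> {neighbor: edge weight}.
graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
walks = simulate_walks(graph, r=2, length=5, p=1.0, q=0.5)
```

The resulting walk list would then be passed to a skip-gram model to obtain the d-dimensional feature vector of each node.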

3. Link Prediction: The scalable and fast variational graph autoencoder (FastVGAE) 19 is used in our work to reduce the computational time of VGAE on large networks. The adjacency matrix A and the feature matrix F are fed into the encoder of FastVGAE. The encoder applies a graph convolutional network (GCN) to the entire graph to create the latent representation (Z).

The encoder works on the full adjacency matrix A. After encoding, sampling is performed and the decoder works on the sampled subgraph.

The decoder of FastVGAE works slightly differently from that of the traditional VGAE. It regenerates the adjacency matrix A based on a subsample of graph nodes, $V_s$. It uses a graph node sampling technique to randomly sample the nodes to be reconstructed at each iteration. Each node i is assigned a probability $p_i$, and nodes with a higher $p_i$ are more likely to be selected. The probability $p_i$ is given by the following equation:

$$p_i = \frac{f(i)^{\alpha}}{\sum_{j \in V} f(j)^{\alpha}},$$

where f(i) is the degree of node i and α is the sharpening parameter. We take α = 2 in our study. The node selection process is repeated until $|V_s| = n_s$, where $n_s$ is the number of sampled nodes.
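The degree-based node sampling can be sketched with NumPy as shown below. The sketch assumes that nodes are drawn without replacement, with probability proportional to degree raised to the power α, until $n_s$ distinct nodes are selected. The function names and the toy graph are hypothetical.

```python
import numpy as np

def sampling_probabilities(A, alpha):
    """p_i proportional to degree(i) ** alpha, normalized to sum to 1."""
    degree = A.sum(axis=1).astype(float)
    weights = degree ** alpha
    return weights / weights.sum()

def sample_nodes(A, n_s, alpha, rng):
    """Draw n_s distinct node indices; high-degree nodes are favored."""
    p = sampling_probabilities(A, alpha)
    return rng.choice(A.shape[0], size=n_s, replace=False, p=p)

# Toy graph (assumed) with node degrees 2, 2, 3 and 1.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
rng = np.random.default_rng(0)
p = sampling_probabilities(A, alpha=2)   # weights 4, 4, 9, 1 divided by 18
V_s = sample_nodes(A, n_s=3, alpha=2, rng=rng)
```

Sampling without replacement requires at least $n_s$ nodes with non-zero degree; with α = 0 the scheme reduces to uniform sampling.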

The decoder reconstructs the smaller matrix $A_s$ of dimension $n_s \times n_s$ instead of decoding the full adjacency matrix A. The decoder function is given by the following equation:

$$A_s(i, j) = \mathrm{Sigmoid}(z_i^{T} z_j), \quad \forall (i, j) \in V_s \times V_s.$$

At each training iteration, a different subgraph ($G_s$) is drawn using the sampling method.
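A minimal sketch of the subgraph decoding step is given below. It assumes that the latent matrix Z has already been produced by the encoder and that a set of node indices has already been sampled. The mean cross-entropy used as reconstruction loss and the function names are illustrative choices, not necessarily those of the original implementation.

```python
import numpy as np

def decode_subgraph(Z, sampled):
    """Reconstruct only the n_s x n_s block for the sampled nodes."""
    Z_s = Z[sampled]                             # embeddings of sampled nodes
    return 1.0 / (1.0 + np.exp(-(Z_s @ Z_s.T)))  # sigmoid of inner products

def subgraph_reconstruction_loss(A, Z, sampled, tiny=1e-10):
    """Mean binary cross-entropy between A restricted to V_s and its decoding."""
    A_s = A[np.ix_(sampled, sampled)]            # true sub-adjacency matrix
    A_s_hat = decode_subgraph(Z, sampled)
    return -np.mean(A_s * np.log(A_s_hat + tiny)
                    + (1.0 - A_s) * np.log(1.0 - A_s_hat + tiny))

# Toy inputs (assumed): 4 nodes, 2 latent dimensions, 3 sampled nodes.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Z = rng.standard_normal((4, 2))
sampled = np.array([0, 2, 3])
A_s_hat = decode_subgraph(Z, sampled)
loss = subgraph_reconstruction_loss(A, Z, sampled)
```

Because only an $n_s \times n_s$ block is decoded, the cost of each iteration scales with $n_s^2$ rather than $n^2$.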

After the model is trained, the drug-CoV-host links are predicted using the following equation:

$$\hat{A}_{ij} = \mathrm{Sigmoid}(z_i^{T} z_j),$$

where $\hat{A}_{ij}$ represents the possible links between all combinations of SARS-CoV-2 nodes and drug nodes. For each combination of nodes, the model gives a probability based on the logistic sigmoid function.
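The scoring of candidate links can be illustrated as follows. The sketch assumes a trained embedding matrix Z and two index lists identifying drug nodes and protein nodes. The helper names `predict_links` and `top_k_links`, the toy embeddings and the index lists are all hypothetical.

```python
import numpy as np

def predict_links(Z, drug_idx, protein_idx):
    """Link probabilities between every drug node and every protein node."""
    scores = Z[drug_idx] @ Z[protein_idx].T      # inner products z_i^T z_j
    return 1.0 / (1.0 + np.exp(-scores))         # logistic sigmoid

def top_k_links(prob, drug_idx, protein_idx, k):
    """Return the k highest-scoring (drug, protein, probability) triples."""
    flat = np.argsort(prob, axis=None)[::-1][:k]  # descending order
    rows, cols = np.unravel_index(flat, prob.shape)
    return [(int(drug_idx[r]), int(protein_idx[c]), float(prob[r, c]))
            for r, c in zip(rows, cols)]

# Toy embeddings (assumed): nodes 0-1 are drugs, nodes 2-3 are proteins.
Z = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 0.0],
              [0.0, -1.0]])
drug_idx = [0, 1]
protein_idx = [2, 3]
prob = predict_links(Z, drug_idx, protein_idx)
best = top_k_links(prob, drug_idx, protein_idx, k=1)
```

Ranking the predicted probabilities and applying a threshold is one simple way to select candidate links for further inspection.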

A new coronavirus associated with human respiratory disease in china

Host-pathogen systems biology

Host-directed therapies for bacterial and viral infections

Network-based drug repositioning: Approaches, resources, and research directions

New horizons for antiviral drug discovery from virus-host protein interaction networks

Drug target prediction and repositioning using an integrated network-based approach

Mapping protein interactions between dengue virus and its human and insect hosts

A review of in silico approaches for analysis and prediction of hiv-1-human protein-protein interactions

Network-based study reveals potential infection pathways of hepatitis-c leading to various diseases

Prediction of the ebola virus infection related human genes using protein-protein interaction network

A genome-wide positioning systems network algorithm for in silico drug repurposing

deepdr: a network-based deep learning approach to in silico drug repositioning

Network-based drug repurposing for novel coronavirus 2019-ncov/sars-cov-2

Network bioinformatics analysis provides insight into drug repurposing for covid-2019

A sars-cov-2 protein interaction map reveals targets for drug repurposing

Comprehensive prediction of the sars-cov-2 vs. human interactome using pipe4, sprint, and pipe-sites

Scalable feature learning for networks

Fastgae: Fast, scalable and effective graph autoencoders with stochastic subgraph decoding

From community to role-based graph embeddings

Specific plant terpenoids and lignoids possess potent antiviral activities against severe acute respiratory syndrome coronavirus

A next generation connectivity map: L1000 platform and the first 1,000,000 profiles

Improved community detection in weighted bipartite networks

Drugbank: a comprehensive resource for in silico drug discovery and exploration

Detecting overlapping protein complexes in protein-protein interaction networks

The therapeutic potential of apigenin

Delayed antiviral plus immunomodulator treatment still reduces mortality in mice infected by high inoculum of influenza a/h5n1 virus

Baclofen promotes alcohol abstinence in alcohol dependent cirrhotic patients with hepatitis c virus (hcv) infection

Antiviral and immunomodulatory effects of polyphenols on macrophages infected with dengue virus serotypes 2 and 3 enhanced or not with antibodies

Evaluation of topoisomerase inhibitors as potential antiviral agents

Potent antiviral activity of topoisomerase i and ii inhibitors against kaposi's sarcoma-associated herpesvirus

Antiviral action of camptothecin

An analog of camptothecin inactive against topoisomerase i is broadly neutralizing of hiv-1 through inhibition of vif-dependent apobec3g degradation

Water-insoluble camptothecin analogues as potential antiviral drugs

Inhibition of hiv-1 replication by daunorubicin

A derivate of the antibiotic doxorubicin is a selective inhibitor of dengue and yellow fever virus replication in vitro

Elimination of hiv-1 infection by treatment with a doxorubicin-conjugated anti-envelope antibody

Antiviral activity of mitoxantrone dihydrochloride against human herpes simplex virus mediated by suppression of the viral immediate early genes

Histone deacetylase inhibitors for purging hiv-1 from the latent reservoir

Interval dosing with the hdac inhibitor vorinostat effectively reverses hiv latency

The incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application

Synthesis and in vitro anti-hsv-1 activity of a novel hsp90 inhibitor bj-b11

Heat shock protein 90 facilitates formation of the hbv capsid via interacting with the hbv core protein dimers

Severe acute respiratory syndrome coronavirus envelope protein regulates cell stress response and apoptosis

Hsp90: a promising broad-spectrum antiviral drug target

Drug repositioning suggests a role for the heat shock protein 90 inhibitor geldanamycin in treating covid-19 infection

Broad spectrum antiviral agent niclosamide and its therapeutic potential

Hydroxychloroquine, a less toxic derivative of chloroquine, is effective in inhibiting sars-cov-2 infection in vitro

Pharmacokinetic optimisation in the treatment of pneumocystis carinii pneumonia

The antiviral effects of na, k-atpase inhibition: A minireview

Severe acute respiratory syndrome coronavirus replication is severely impaired by mg132 due to proteasome-independent inhibition of m-calpain

Effective inhibition of mers-cov infection by resveratrol

Structural basis of receptor recognition by sars-cov-2

Angiotensin receptor blockers as tentative sars-cov-2 therapeutics

Next-generation sequencing to generate interactome datasets

Development of human protein reference database as an initial platform for approaching systems biology in humans

Drugbank 4.0: shedding new light on drug metabolism

Chembl: a large-scale bioactivity database for drug discovery

Therapeutic target database update 2016: enriched resource for bench to clinical drug target and targeted pathway information

The iuphar/bps guide to pharmacology: an expert-driven knowledgebase of drug targets and their ligands

Towards a proteome-scale map of the human protein-protein interaction network

A proteome-scale map of the human interactome network

A reference map of the human binary protein interactome

Stochastic backpropagation and approximate inference in deep generative models

On information and sufficiency. The annals mathematical statistics