key: cord-0730634-hmq6bke1 authors: Zhou, Liqian; Wang, Juanjuan; Liu, Guangyi; Lu, Qingqing; Dong, Ruyi; Tian, Geng; Yang, Jialiang; Peng, Lihong title: Probing antiviral drugs against SARS-CoV-2 through virus-drug association prediction based on the KATZ method date: 2020-07-31 journal: Genomics DOI: 10.1016/j.ygeno.2020.07.044 sha: 4a3dd056307ab2210c06b206d95bc738e625534a doc_id: 730634 cord_uid: hmq6bke1

It is urgent to find an effective antiviral drug against SARS-CoV-2. In this study, 96 virus-drug associations (VDAs) between 12 viruses (SARS-CoV-2 and similar viruses) and 78 small molecules were selected. The complete genomic sequence similarity of the viruses and the chemical structure similarity of the drugs were then computed. A KATZ-based VDA prediction method (VDA-KATZ) was developed to infer possible drugs associated with SARS-CoV-2. VDA-KATZ obtained the best AUC of 0.8803 when the walking length was 2. The predicted top 3 antiviral drugs against SARS-CoV-2 are remdesivir, oseltamivir, and zanamivir. Molecular docking was conducted between the predicted top 10 drugs and the virus spike protein/human ACE2. The results showed that the above 3 chemical agents have higher molecular binding energies with ACE2. For the first time, we found that zidovudine may provide a clue for the treatment of COVID-19. We hope that our predicted drugs can help to prevent the spread of COVID-19.

At the end of December 2019, a new coronavirus pneumonia, named COVID-19 by the WHO, was found in Wuhan, Hubei, China [1]. The disease is caused by a new coronavirus called SARS-CoV-2. As of June 15, 2020, a total of 7,823,289 infection cases and 431,541 deaths had been reported [2]. Therefore, it is urgent to identify effective treatment options against COVID-19. SARS-CoV-2 is an emerging virus and there is no specific drug or vaccine [3]. Developing a new antiviral drug or vaccine may be unrealistic in such an urgent situation. 
However, SARS-CoV-2 is a single-stranded positive-sense RNA virus [4] and is strongly similar to SARS-CoV [5] and MERS-CoV [6]. All three viruses may cause severe respiratory symptoms including fever, cough and shortness of breath [7]. Previous studies have repositioned many existing drugs to treat infectious diseases caused by single-stranded RNA viruses [8], such as SARS [9], MERS [10] and influenza [11]. Similarly, selecting potential antiviral drugs against SARS-CoV-2 from FDA-approved compounds may be an effective option to combat COVID-19 [12].

In this study, the complete genomic sequence similarity of viruses, the chemical structure similarity of drugs and the virus-drug association (VDA) network were first integrated. A KATZ-based VDA prediction model was then developed to identify possible antiviral drugs against SARS-CoV-2. The proposed VDA-KATZ method was compared with three state-of-the-art association prediction methods: SMiR-NBI [13], LRLSHMDA [14] and NGRHMDA [15]. The results showed that VDA-KATZ obtained the best AUC of 0.8803 when the walking length was 2. Remdesivir, oseltamivir, and zanamivir were predicted to be the top 3 small molecules associated with SARS-CoV-2.

Molecular docking is a theoretical simulation method that computes the binding mode and affinity between molecules (for example, ligands and receptors) through electrostatic interactions [16], hydrogen bond interactions [17], van der Waals interactions [18] and hydrophobic interactions [19]. Energy matching is the basis of stable binding between molecules. In this study, we further used AutoDock, an open-source molecular simulation software, to measure the molecular binding activities between the predicted top 10 antiviral drugs and the SARS-CoV-2 spike protein [20]/human ACE2 [21]. 
The docking results suggested that the predicted top 3 chemical agents, with binding energies of −7.4 kcal/mol, −4.73 kcal/mol, and −5.48 kcal/mol with ACE2, respectively, may be effective options to combat COVID-19.

We considered 11 viruses similar to SARS-CoV-2 and obtained 96 VDAs between these 11 viruses and their 78 associated small-molecule drugs from the DrugBank [22], NCBI [23] and PubChem [24] databases. These 96 VDAs are represented as a matrix $A_{m \times n}$, where $A_{ij} = 1$ if there is an association between the $i$th drug and the $j$th virus, and $A_{ij} = 0$ otherwise. The complete genomic sequence similarity matrix $K_v$ of the viruses was computed using MAFFT, a multiple sequence alignment program [25]. The chemical structure similarity matrix $K_d$ of the small-molecule drugs was calculated using RDKit, an open-source cheminformatics tool [26]. The detailed information is shown in Table 1.

The association profile of a virus is a binary vector that encodes whether the virus is associated with each drug in the known VDA network. Based on these profiles, we further calculated the virus similarity matrix $G_v$ using the Gaussian association profile kernel (GAPK). For a given virus $v_i$, its association profile $AP(v_i)$ is defined as the $i$th column of $A$. The GAPK similarity $G_v(v_i, v_j)$ between two viruses $v_i$ and $v_j$ is calculated by Eq. (1): $G_v(v_i, v_j) = \exp\left(-\gamma_v \lVert AP(v_i) - AP(v_j) \rVert^2\right)$ (1), where $\gamma_v = \gamma_v' / \left(\frac{1}{n} \sum_{k=1}^{n} \lVert AP(v_k) \rVert^2\right)$ denotes the normalized kernel bandwidth based on the bandwidth parameter $\gamma_v'$, and $n$ is the number of viruses. We integrated $K_v$ and $G_v$ to compute the final virus similarity matrix $S_v$ by Eq. (2), where $w_1$ is a parameter ranging from 0 to 1 that balances the importance of the complete genomic sequence similarity against the GAPK similarity.

Similarly, for a given drug $d_i$, its association profile $AP(d_i)$ is defined as the $i$th row of $A$. The GAPK similarity $G_d(d_i, d_j)$ between two drugs $d_i$ and $d_j$ is calculated by Eq. (3): $G_d(d_i, d_j) = \exp\left(-\gamma_d \lVert AP(d_i) - AP(d_j) \rVert^2\right)$ (3), where $\gamma_d = \gamma_d' / \left(\frac{1}{m} \sum_{k=1}^{m} \lVert AP(d_k) \rVert^2\right)$ denotes the normalized kernel bandwidth based on the bandwidth parameter $\gamma_d'$, and $m$ is the number of drugs. 
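The GAPK similarity described above can be sketched with NumPy. This is a minimal illustration rather than the authors' implementation: the function names `gapk_similarity` and `combine_similarities`, the default `gamma_prime=1.0`, the toy matrices, and in particular the convex-combination form `w * K + (1 - w) * G` used to merge the two similarities are assumptions, since the text only states that a weight in [0, 1] balances them.

```python
import numpy as np


def gapk_similarity(profiles, gamma_prime=1.0):
    """Gaussian association profile kernel between the rows of `profiles`.

    profiles: 2-D binary array with one association profile per row.
    gamma_prime: bandwidth parameter (the default 1.0 is an assumption).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0.0:
        raise ValueError("at least one known association is required")
    gamma = gamma_prime / mean_sq_norm  # normalized kernel bandwidth
    # Pairwise squared Euclidean distances between profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against negative round-off
    return np.exp(-gamma * sq_dists)


def combine_similarities(K, G, w):
    """Assumed convex combination w*K + (1 - w)*G with w in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * np.asarray(K, dtype=float) + (1.0 - w) * np.asarray(G, dtype=float)


# Toy association matrix: 3 drugs (rows) x 2 viruses (columns).
A = np.array([[1, 0],
              [1, 1],
              [0, 1]])
G_v = gapk_similarity(A.T)  # virus profiles are the columns of A
G_d = gapk_similarity(A)    # drug profiles are the rows of A

# Made-up sequence similarities for the two toy viruses.
K_v = np.array([[1.0, 0.6],
                [0.6, 1.0]])
S_v = combine_similarities(K_v, G_v, w=0.5)
```

Passing `A.T` for viruses and `A` for drugs mirrors the convention in the text, where virus profiles are columns and drug profiles are rows of the association matrix.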
We obtained the final drug similarity matrix $S_d$ by Eq. (4), where $w_2$ is a parameter ranging from 0 to 1 that balances the importance of the chemical structure similarity against the GAPK similarity.

KATZ is a network-based association prediction method based on the similarity of nodes in a heterogeneous network [27]. It uses the number of walks between two nodes and the lengths of those walks as similarity indicators. Inspired by the KATZ method, we formulated VDA identification as the problem of counting the connection paths between viruses and drugs in the heterogeneous virus-drug network and developed a new VDA prediction model, VDA-KATZ. The details are shown in Fig. 1. First, the virus similarity matrix $S_v$, the drug similarity matrix $S_d$ and the known VDA network $A$ are integrated into a heterogeneous network $A^*$: $A^* = \begin{bmatrix} S_v & A^T \\ A & S_d \end{bmatrix}$ (5). The number of walks of length $l$ between $v_i$ and $d_j$ is given by the corresponding entry of $(A^*)^l$. Walks of all lengths up to $k$ were integrated to compute the association probabilities between $n$ viruses and $m$ drugs by Eq. (6): $P = \sum_{l=1}^{k} \beta^l (A^*)^l$ (6), where $\beta^l$ ($l = 1, 2, \ldots, k$), with $\beta$ non-negative, restrains the contribution of walks of length $l$. In addition, we represent the association probability matrix $P$ in Eq. (6) as the block matrix in Eq. (7): $P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$ (7), where $P_{11}$ denotes the virus-virus association probability matrix, $P_{12}$ denotes the VDA probability matrix, $P_{21}$ represents the drug-virus association probability matrix, and $P_{22}$ represents the drug-drug association probability matrix. Because the constructed VDA matrix is sparse, long walks carry little information in such a network, so we set $k$ to 2, 3 and 4 to measure the influence of $k$ on the prediction performance.

Finally, based on the prediction results of KATZ, we conducted molecular docking between the predicted top 10 small molecules and the SARS-CoV-2 spike protein/human ACE2. 
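The walk-counting step described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the released VDA-KATZ code: the function name `katz_virus_drug_scores`, the value `beta=0.1`, and the toy similarity matrices are hypothetical, and the heterogeneous matrix is assembled as `[[S_v, A.T], [A, S_d]]` so that its upper-right block is virus-by-drug, matching the drugs-by-viruses orientation of `A` given earlier.

```python
import numpy as np


def katz_virus_drug_scores(S_v, S_d, A, beta=0.1, k=2):
    """Sum beta**l * (A_star)**l for l = 1..k and return the virus-drug block.

    S_v: (n, n) virus similarity, S_d: (m, m) drug similarity,
    A:   (m, n) known associations (drugs x viruses).
    beta=0.1 is an assumed damping value, not taken from the paper.
    """
    S_v = np.asarray(S_v, dtype=float)
    S_d = np.asarray(S_d, dtype=float)
    A = np.asarray(A, dtype=float)
    n, m = S_v.shape[0], S_d.shape[0]
    if A.shape != (m, n):
        raise ValueError("A must have shape (number of drugs, number of viruses)")

    # Heterogeneous network with viruses first, drugs second.
    A_star = np.block([[S_v, A.T],
                       [A, S_d]])

    P = np.zeros_like(A_star)
    power = np.eye(n + m)
    for length in range(1, k + 1):
        power = power @ A_star          # (A_star) ** length
        P += (beta ** length) * power   # damp longer walks more strongly
    return P[:n, n:]                    # upper-right block: virus x drug scores


# Toy data: 2 viruses, 3 drugs (all values are made up).
S_v = np.array([[1.0, 0.5],
                [0.5, 1.0]])
S_d = np.array([[1.0, 0.2, 0.0],
                [0.2, 1.0, 0.4],
                [0.0, 0.4, 1.0]])
A = np.array([[1, 0],
              [1, 1],
              [0, 1]])

P12 = katz_virus_drug_scores(S_v, S_d, A, beta=0.1, k=2)

# For k = 2, block multiplication of A_star gives this closed form.
closed_form = 0.1 * A.T + 0.1 ** 2 * (S_v @ A.T + A.T @ S_d)

# Drugs ranked for virus 0, highest score first.
ranking = np.argsort(-P12[0])
```

For `k = 2` the loop agrees with the closed form obtained by multiplying the blocks by hand, which is a convenient sanity check before trying `k = 3` or `k = 4`.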
Molecular binding energies were computed to evaluate the binding activities between these small molecules and the target proteins.

In this study, we evaluated the performance of the VDA-KATZ prediction method based on 5-fold cross-validation. Accuracy, sensitivity, specificity and AUC were used as evaluation metrics. Accuracy, sensitivity, and specificity are defined in Eqs. (11)-(13): $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (11), $\mathrm{Sensitivity} = \frac{TP}{TP + FN}$ (12), $\mathrm{Specificity} = \frac{TN}{TN + FP}$ (13), where TP, FP, TN, and FN are defined in Table 2. AUC is the area under the ROC curve, which is plotted from the true positive rate (TPR, Eq. (14)) and the false positive rate (FPR, Eq. (15)): $\mathrm{TPR} = \frac{TP}{TP + FN}$ (14), $\mathrm{FPR} = \frac{FP}{FP + TN}$ (15). An AUC value of 1 denotes optimal performance. When the AUC value lies in the range (0.5, 1), a larger AUC represents better performance.

In this section, we conducted a series of experiments to measure the performance of our proposed VDA-KATZ method based on 5-fold cross-validation. All known VDAs in the VDA network were randomly divided into 5 mutually exclusive and roughly equal subsets. In each round, four subsets were used as the training set and the remaining subset was used to test the performance of the models. The experiment was repeated 100 times and the final result is the average of the 5-fold results over these 100 trials. For VDA-KATZ, SMiR-NBI [13], LRLSHMDA [14] and NGRHMDA [15], we conducted a grid search to obtain the optimal parameter values. In VDA-KATZ, we set the parameters β, w 1, w 2, γ v ′ and γ d ′ to the values that gave the best performance. The parameter lw in LRLSHMDA was set to lw = 0.1, where LRLSHMDA obtained its optimal performance. The parameters γ d, γ w, α and β in NGRHMDA were set to 0.5, where NGRHMDA best predicted VDA candidates.

To evaluate the prediction performance of our proposed VDA-KATZ method, SMiR-NBI [13], LRLSHMDA [14] and NGRHMDA [15] were compared with VDA-KATZ (k = 2). SMiR-NBI [13] was used to predict the responses of anticancer drugs on a heterogeneous network by a network-based method. 
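The evaluation metrics defined earlier in this section (accuracy, sensitivity, specificity and AUC) can be illustrated with a short, dependency-free sketch. The function names, the threshold of 0.45 and the toy labels and scores are hypothetical and are not taken from the paper; the AUC is computed with the pairwise-ranking formulation, which equals the area under the ROC curve.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and not label:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    # Identical to the true positive rate.
    return tp / (tp + fn)


def specificity(tn, fp):
    # Equal to 1 minus the false positive rate.
    return tn / (tn + fp)


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("both positive and negative samples are required")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy test fold: 1 marks a known association, 0 an unknown pair.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]

tp, fp, tn, fn = confusion_counts(labels, scores, threshold=0.45)
acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
area = auc(labels, scores)
```

In a cross-validation run, these functions would be applied to the held-out fold in each round and the results averaged over folds and repetitions.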
LRLSHMDA [14] used a Laplacian regularized least squares classifier to find new microbe-disease associations. NGRHMDA [15] was applied to microbe-disease association prediction by combining neighbor-based collaborative filtering and graph-based scoring. These three methods achieved good performance in their respective areas. The experimental results are shown in Table 3, where the best result in each row is shown in bold. As shown in Table 3, VDA-KATZ obtained the best AUC, accuracy, and specificity among the four methods. The AUC values of the four methods are shown in Fig. 2. AUC is the average area under the ROC (receiver operating characteristic) curve. The ROC curve is plotted from the true positive rate and the false positive rate at different thresholds. The true positive rate denotes the ratio of correctly predicted VDAs to all known VDAs. The false positive rate represents the ratio of falsely predicted VDAs to all known negative VDAs. Although the sensitivity of VDA-KATZ was slightly lower than that of SMiR-NBI [13], AUC is the more important evaluation metric. Therefore, VDA-KATZ can better discover antiviral drugs against SARS-CoV-2.

We conducted extensive experiments to measure the effect of different walking lengths on the prediction performance. The results are shown in Table 4, where the best result in each row is shown in bold. As shown in Table 4, although VDA-KATZ achieved the best accuracy of 0.8145 and specificity of 0.8203 when k = 3, it obtained the best AUC of 0.8803 and sensitivity of 0.6976 when k = 2. Fig. 3 shows the AUC values for different walking lengths. AUC is a more important measurement than the other three evaluation metrics, and a larger AUC denotes better prediction performance. The AUC value obtained by VDA-KATZ when k = 2 is better than that obtained when k = 3. We found that larger walking lengths may result in slightly lower prediction performance on the constructed dataset. 
Therefore, we selected k = 2 as the optimal walking length.

After confirming the performance of VDA-KATZ, we further analyzed possible VDAs related to SARS-CoV-2. We predicted the top 10 antiviral drugs against SARS-CoV-2 with k = 2. The results are shown in Table 5. Among the predicted top 10 antiviral drugs against SARS-CoV-2, 8 chemical agents have been reported in recent publications; that is, 80% of the small molecules are supported by related work as clues for the treatment of COVID-19. We found that remdesivir obtained the highest association score with SARS-CoV-2. Remdesivir is a nucleoside analogue with antiviral activity [28]. It has broad-spectrum activity against RNA viruses [29], such as SARS and MERS [30] [31] [32], and has been tested in a clinical trial for Ebola [33]. Oseltamivir is an antiviral neuraminidase inhibitor [34] used to prevent infection by influenza A virus (for example, A-H1N1 [35] and A-H5N1 [36]) and influenza B virus. It can prevent the budding, replication and infectivity of the virus in the host cell. More importantly, oseltamivir combined with other drugs has been reported to inhibit SARS-CoV-2 infection [37]. Zanamivir is an antiviral drug and neuraminidase inhibitor used for the treatment of uncomplicated acute illness caused by influenza A and influenza B viruses [38]. The small molecule shows potential as an inhibitor of the 3CLpro main protease and is likely to be applicable to the treatment of COVID-19 [39].

We downloaded the chemical structures of the above 10 small molecules from the DrugBank database and the structure of ACE2 from the RCSB Protein Data Bank (ID: 6MJ0) [40]. The structure of the SARS-CoV-2 spike protein was obtained by homology modeling from the Zhang Lab [41]. We used AutoDock, an available bioinformatics tool, to conduct molecular docking between the predicted antiviral drugs and the SARS-CoV-2 spike protein/ACE2. 
In AutoDock, a genetic algorithm was used as the search algorithm and the grid box was set to cover the entire protein. We computed the molecular binding energies between the predicted top 10 small molecules and the SARS-CoV-2 spike protein and ACE2. The results are shown in Table 6. They suggest that remdesivir has high binding activities of −5.22 kcal/mol and −7.4 kcal/mol with these two proteins, respectively, followed by oseltamivir with −4.04 kcal/mol and −4.73 kcal/mol and zanamivir with −4.93 kcal/mol and −5.48 kcal/mol. More importantly, we predicted that zidovudine has molecular binding energies of −6.54 kcal/mol and −7.93 kcal/mol. The drug is an effective HIV replication inhibitor. It can improve immune function and partially reverse HIV-induced neurological dysfunction and other AIDS-related clinical abnormalities [42]. As an HIV nucleoside/nucleotide analogue reverse transcriptase inhibitor, zidovudine is likely to provide a clue for the treatment of COVID-19 [43]. Figs. 4 and 5 show the molecular docking of remdesivir, oseltamivir, zanamivir, and zidovudine with the SARS-CoV-2 spike protein/human ACE2. The circle in each subfigure marks the binding site of the drug on the target protein. For example, the amino acids L849, T827, W1212, L144, and P504 were predicted to be the key residues for remdesivir binding to the SARS-CoV-2 spike protein/ACE2, while A892, K786, F438, and I291 were predicted to be the key residues for zidovudine binding to these two target proteins.

Table 6. The molecular binding energies between the predicted top 10 antiviral drugs and two target proteins.

COVID-19 is rapidly spreading around the world and it is urgent to find effective treatment strategies. However, it is almost impossible to develop a new antiviral drug against SARS-CoV-2 in such a short time. Drug repositioning, which aims to find new uses for FDA-approved drugs, provides a new strategy. We can identify antiviral drugs applicable to COVID-19 by drug repositioning. 
In our proposed VDA-KATZ method, we integrated the complete genomic sequences of viruses, the chemical structures of small molecules, and the VDA network into a unified framework and developed a VDA prediction method based on the KATZ model. The method is mainly based on the assumption that similar drugs (viruses) tend to associate with the same or similar viruses (drugs). The originality of the proposed method lies in identifying potential antiviral small molecules against SARS-CoV-2 among FDA-approved drugs through a new VDA prediction approach based on drug repositioning. More importantly, VDA-KATZ integrates topological features and different walking lengths in the known VDA network. The comparative experiments showed the better performance of VDA-KATZ. We performed molecular docking of the predicted top 10 drugs with the SARS-CoV-2 spike protein/ACE2. The results showed that remdesivir, oseltamivir, and zanamivir have higher molecular binding energies with these two target proteins. The higher AUC and molecular binding energies suggest that the inferred chemical agents related to SARS-CoV-2 are likely to be effective against COVID-19. Interestingly, zidovudine was predicted as an antiviral drug against SARS-CoV-2 although no work has reported its relationship with the virus. In the future, we will further consider an ensemble strategy [44] to improve VDA prediction by integrating logistic matrix factorization [45] and bipartite network projection [46]. We hope that our prediction results may help to prevent the rapid transmission of COVID-19.

Source code and dataset are freely available for download at https://github.com/plhhnu/VDA-KATZ/.

Authors Geng Tian, Jialiang Yang, Qingqing Lu and Ruyi Dong were employed by the company Geneis (Beijing) Co. Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. 
Fig. 5. Molecular docking between remdesivir, oseltamivir, zanamivir and zidovudine and ACE2

World Health Organization, WHO Director-General's Remarks at the Media Briefing on Coronavirus Disease (COVID-19): Situation Report-147
Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods
Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2
Characterization of severe acute respiratory syndrome-associated coronavirus (SARS-CoV) spike glycoprotein-mediated viral entry
State of knowledge and data gaps of Middle East respiratory syndrome coronavirus (MERS-CoV) in humans
World Health Organization
Polypharmacology: challenges and opportunities in drug discovery
Enteric involvement of severe acute respiratory syndrome-associated coronavirus infection
SARS and MERS: recent insights into emerging coronaviruses
Influenza virus assembly and budding
Drug treatment options for the 2019-new coronavirus (SARS-CoV-2)
Network-based identification of microRNAs as potential pharmacogenomic biomarkers for anticancer drugs
LRLSHMDA: Laplacian regularized least squares for human microbe-disease association prediction
Prediction of microbe-disease association from the integration of neighbor and graph with collaborative recommendation model
A local reaction field method for fast evaluation of long-range electrostatic interactions in molecular simulations
Hydrogen bonding and molecular surface shape complementarity as a basis for protein docking
Combination rules for van der Waals force constants
Stability of protein structure and hydrophobic interaction
Return of the coronavirus: 2019-nCoV
Single-cell RNA expression profiling of ACE2, the putative receptor of Wuhan SARS-CoV-2
DrugBank 5.0: a major update to the DrugBank database for
Database resources of the national center for biotechnology information
PubChem substance and compound databases
Recent developments in the MAFFT multiple sequence alignment program
RDKit: Open-Source Cheminformatics
A new status index derived from sociometric analysis
Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro
The antiviral compound remdesivir potently inhibits RNA-dependent RNA polymerase from Middle East respiratory syndrome coronavirus
Role of GS-5734 (Remdesivir) in inhibiting SARS-CoV and MERS-CoV: The expected role of GS-5734 (Remdesivir) in COVID-19 (2019-nCoV)-VYTR hypothesis
Comparative therapeutic efficacy of remdesivir and combination lopinavir, ritonavir, and interferon beta against MERS-CoV
COVID-19) epidemics, the newest and biggest global health threats: what lessons have we learned?
Mechanism of inhibition of Ebola virus RNA-dependent RNA polymerase by remdesivir
Oseltamivir-resistant influenza virus A (H1N1), Europe, 2007/08 season
Oseltamivir resistance during treatment of influenza A (H5N1) infection
Clinical features and treatment of 2019-nCov pneumonia patients in Wuhan: report of a couple cases
A search for medications to treat COVID-19 via in silico molecular docking models of the SARS-CoV-2 spike glycoprotein and 3CL protease
The Protein Data Bank
Antiretroviral drug activity and potential for pre-exposure prophylaxis against COVID-19 and HIV infection
HLPI-Ensemble: Prediction of human lncRNA-protein interactions based on ensemble strategy
Predicting lncRNA-miRNA interactions based on logistic matrix factorization with neighborhood regularized
The Bipartite Network Projection Recommended Algorithm for predicting long noncoding RNA-protein interactions

This work was supported by the National Natural Science Foundation of China (Grant 61803151) and the Natural Science Foundation of Hunan province (Grants 2018JJ2461 and 2018JJ3570). We are thankful for help from Ming Kuang and Longjie Liao from Hunan University of Technology, Lebin Liang and Jidong Lang from Geneis (Beijing) Co. Ltd., and Junlin Xu from Hunan University. We would like to thank all authors of the cited references.

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ygeno.2020.07.044.