key: cord-0262707-b5uih9yr
authors: Yin, Qijin; Cao, Xusheng; Fan, Rui; Liu, Qiao; Jiang, Rui; Zeng, Wanwen
title: DeepDrug: A general graph-based deep learning framework for drug-drug interactions and drug-target interactions prediction
date: 2022-04-12
journal: bioRxiv
DOI: 10.1101/2020.11.09.375626
sha: a0c18e0e4dd92de419c538a230f58ee23c6f9498
doc_id: 262707
cord_uid: b5uih9yr

Computational approaches for accurate prediction of drug interactions, such as drug-drug interactions (DDIs) and drug-target interactions (DTIs), are in high demand among biochemical researchers because of their efficiency and cost-effectiveness. Although many methods have been proposed to predict DDIs and DTIs separately, their success is still limited by a lack of systematic evaluation of the intrinsic properties embedded in the corresponding chemical structures. In this paper, we develop a deep learning framework, named DeepDrug, to overcome this limitation by using residual graph convolutional networks (RGCNs) and convolutional networks (CNNs) to learn comprehensive structural and sequential representations of drugs and proteins in order to boost DDI and DTI prediction accuracy. We benchmark our method in a series of systematic experiments, including binary-class DDI classification, multi-class/multi-label DDI classification, binary-class DTI classification and DTI regression tasks, using several datasets. We then demonstrate that DeepDrug outperforms state-of-the-art methods in terms of both accuracy and robustness in predicting DDIs and DTIs under multiple experimental settings. Furthermore, we visualize the structural features learned by the DeepDrug RGCN module, which display patterns consistent with chemical properties and drug categories, providing additional evidence to support the strong predictive power of DeepDrug.
Ultimately, we apply DeepDrug to perform drug repositioning on the whole DrugBank database to discover potential drug candidates against SARS-CoV-2, where 3 out of 5 top-ranked drugs are reported to be repurposed to potentially treat COVID-19. To sum up, we believe that DeepDrug is an efficient tool for accurate prediction of DDIs and DTIs and provides promising insight into the underlying mechanism of these biochemical relations. The source code of DeepDrug can be freely downloaded from https://github.com/wanwenzeng/deepdrug.

To whom correspondence may be addressed. Email: liuqiao@stanford.edu;

The exploration of biomedical interactions between chemical compounds (drugs, molecules) and protein targets is of great significance for drug discovery 1. It is believed that drugs interact with biological systems by binding to

In summary, the contributions of this paper are as follows:
- DeepDrug provides a unified framework based on RGCNs with edge feature incorporation to extract structural information and CNNs to extract sequential information for both drugs and proteins for downstream DDIs and
- The interpretation of structural features learned from DeepDrug supports the key insight that biomedical structure may determine function and that drugs with similar structures tend to have similar targets.
- The results of drug repositioning for SARS-CoV-2 suggest that DeepDrug can be a useful tool for effectively predicting DDIs and DTIs and can greatly facilitate the drug discovery process.

Overview of DeepDrug

We developed a deep learning framework, DeepDrug, to predict drug interactions (e.g., DDIs and DTIs) by combining a sequence profile and a structural profile (Methods). For each input (drug or protein), we used sequence data as well as the partially available structure profile data as separate input branches to the DeepDrug model (Fig. 1).
The input sequence was converted into a representation using one-hot encoding, and connected to several

Table 1). Compared with the second-best baseline method, DeepPurpose, DeepDrug achieved on average a 2.1% higher F1 score, a 1.3% higher auPRC score and a 1.1% higher auROC score in balanced data settings. However, due to the rarity of DDIs 41, the number of known DDIs in a typical drug database is usually very low. Hence, to be more realistic and practical, we also evaluated the robustness of DeepDrug with

To further showcase the predictive capability of our model, we compared DeepDrug with other methods in multi-class/multi-label classification tasks. We conducted the classification experiments using the DrugBank and Twosides databases, based on 86 and 1317 interaction types, respectively. All of the DDI methods were evaluated using standard metrics, including macro F1 score and auPRC score. In multi-class classification, DeepDrug achieved the best performance, obtaining a 4.3%-5.8% higher F1 score and a 4.9%-6.7% higher auPRC than the best baseline method (Supplementary Table 2

DeepDrug was shown to be superior and robust in both binary and multi-class/multi-label classification of DDIs. Unlike DeepPurpose, which used only the SMILES sequence information, DeepDrug exploited both structural information from a novel graph representation and sequence information from the SMILES string, and is therefore potentially capable of learning the underlying structural properties to gain better performance.

Although proteins generally have more intricate structures than chemical drugs due to the three-dimensional arrangement of their sequence residues, they can still be effectively represented by 3D graphs. We first classified the

(Fig. 4A, Supplementary Fig. 3). We assumed that drugs that were closer in the embedding space (e.g., within the same cluster) shared some form of higher similarity or closer relationship.
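As a concrete illustration of the one-hot sequence encoding mentioned in the overview, the following is a minimal sketch rather than the authors' implementation. The 20-letter amino-acid alphabet, the fixed length `max_len`, the zero-padding scheme and the handling of unknown symbols are all assumptions made for this example; the vocabulary and padding actually used by DeepDrug are not stated in this text.

```python
from typing import List

# Assumed alphabet for illustration only: the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(sequence: str,
                   alphabet: str = AMINO_ACIDS,
                   max_len: int = 8) -> List[List[int]]:
    """Return a max_len x len(alphabet) one-hot matrix.

    Sequences longer than max_len are truncated; shorter ones are
    zero-padded. Symbols outside the alphabet become all-zero rows.
    (Both choices are assumptions of this sketch.)
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in range(max_len)]
    for pos, symbol in enumerate(sequence[:max_len]):
        col = index.get(symbol)
        if col is not None:
            matrix[pos][col] = 1
    return matrix


# Example: a three-residue toy sequence.
encoded = one_hot_encode("MKV")
```

Each row corresponds to one sequence position and each column to one symbol, which is the layout a 1D convolution over the sequence would typically consume. The same function could be applied to SMILES strings by passing a different `alphabet`.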
consistently achieved the best performance (Fig. 4C). Furthermore, to evaluate the performance of DeepDrug on unseen drugs, we collected 4886 unseen drugs from the DrugBank website (Methods); the DCES was 0.575 (Supplementary Table 9), which was significantly better than the background distribution (DCES of null distribution = 0.10). For further understanding, we isolated 12 drugs in cluster 4 (enriched as opioids) and compared their chemical

We next investigated whether DeepDrug was able to correctly identify the interactions of SARS-CoV-2 proteins. We constructed two drug-target positive datasets (one expert-confirmed and one literature-based) for SARS-CoV-2 from a recent study 46 (Methods). In our benchmark BindingDB dataset, there were 68 SARS-CoV-2 interacting drugs and 124 proteins that were similar to these SARS-CoV-2 proteins. To construct the dataset stringently, we removed from the training set those SARS-CoV-2 interacting drugs and analogous drugs that shared similar SMILES sequences (defined as drug sequence similarity > 60%, see Methods, Supplementary Fig. 4A-B), and removed proteins similar to SARS-CoV-2 proteins (protein sequence similarity > 30%). After removing these records, we re-trained the DeepDrug model and combined the SARS-CoV-2 interacting drugs with the remaining 64,980 drugs to construct an independent test set. The DeepDrug prediction scores for interacting pairs and non-interacting pairs are shown in Fig. 5A; DeepDrug assigned higher prediction scores to the interacting pairs. The results showed that DeepDrug was able to distinguish expert-

In this study, we proposed DeepDrug as a novel end-to-end deep learning framework for DDIs and DTIs

We provide two future directions for improving our DeepDrug model.
First, the current interaction predictions (e.g., DTIs) do not consider the causal interaction where one drug is involved in a biological or biochemical process to

We collected 3 DTI benchmark datasets for evaluation: DAVIS, KIBA and BindingDB 50. Specifically, the DAVIS dataset consists of 68 drugs and 316 proteins, which form 21488 drug-protein pairs. KIBA

The two RGCN or CNN modules shared weights in DDI tasks and were independent in DTI tasks. The features extracted by these modules were concatenated and fed to the combined prediction module, which consisted of two linear layers and a final prediction layer. The two linear layers had 128 and 32 nodes, respectively, and each was followed by a batch normalization layer, a dropout layer and a ReLU nonlinear layer. The final prediction layer was a linear layer with a task-dependent activation function. Specifically, the Sigmoid activation function was used for the binary classification and multi-label classification tasks, the Softmax activation function was used for the multi-class classification task, and no activation function was used for the regression task.

We used the Adam optimizer with an initial learning rate of 0.01 and a weight decay of 10^-4. The dropout

Secondly, for each SARS-CoV-2 protein, the most similar templates in the RCSB database were used as the crystal structure of the protein; these are also provided in the SARS-CoV-2 3D database 57.
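The task-dependent activation of the final prediction layer described in the Methods can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the task labels (`"binary"`, `"multi_label"`, `"multi_class"`, `"regression"`) and the function names are hypothetical, and the input `logits` stands for the raw output of the final linear layer.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits: List[float]) -> List[float]:
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def final_activation(logits: List[float], task: str) -> List[float]:
    """Apply the activation matching the task (task names are hypothetical)."""
    if task in ("binary", "multi_label"):
        # Independent probability per output unit.
        return [sigmoid(z) for z in logits]
    if task == "multi_class":
        # Probabilities over mutually exclusive classes.
        return softmax(logits)
    if task == "regression":
        # No activation: the raw linear output is the prediction.
        return list(logits)
    raise ValueError(f"unknown task: {task}")


binary_out = final_activation([0.0], "binary")
multi_class_out = final_activation([1.0, 2.0, 3.0], "multi_class")
regression_out = final_activation([7.25], "regression")
```

Sigmoid treats each output independently, which suits multi-label DDI prediction where several interaction types may hold at once, whereas Softmax forces the outputs to sum to one for mutually exclusive classes.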
Supervised prediction of drug-target interactions using bipartite local models
Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities
Sildenafil: an orally active type 5 cyclic GMP-specific phosphodiesterase inhibitor for the treatment of penile erectile dysfunction
Mechanisms of drug combinations: interaction and network perspectives
Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions
Combining genomic and network characteristics for extended capability in predicting synergistic drugs for cancer
Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies
Inappropriate prescribing in an acutely ill population of elderly patients as determined by Beers' Criteria
Mibefradil-a drug which may enhance the propensity for the development of abnormal QT prolongation
Cerivastatin and reports of fatal rhabdomyolysis
DrugBank: a comprehensive resource for in silico drug discovery and exploration
Data-driven prediction of drug effects and interactions
Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy
PubChem 2019 update: improved access to chemical data
Drug-Drug Interaction Predicting by Neural Network Using Integrated Similarity
Deep learning improves prediction of drug-drug and drug-food interactions
DeepDTA: deep drug-target binding affinity prediction
DeepCDR: a hybrid graph convolutional network for predicting cancer drug response
Survey of Similarity-Based Prediction of Drug-Protein Interactions
Prediction of drug-target interaction networks from the integration of chemical and genomic spaces
Modeling polypharmacy side effects with graph convolutional networks
Deep learning for drug-drug interaction extraction from the literature: a review
Machine learning approaches and databases for prediction of drug-target interaction: a survey paper
DeepPurpose: A Deep Learning Library for Drug
Predicting drug-target binding affinity with graph neural networks. bioRxiv
Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?
Drug-target interaction prediction via chemogenomic space: learning-based methods
A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information
Semi-supervised classification with graph convolutional networks
Graph attention networks
Gated graph sequence neural networks
Reinforced molecular optimization with neighborhood-controlled grammars
Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications
Convolutional networks on graphs for learning molecular fingerprints
Protein interface prediction using graph convolutional networks. Advances in neural information processing systems 30
Padme: A deep learning-based framework for drug-target interaction prediction
Structural learning of proteins using graph convolutional neural networks
PAIRpred: partner-specific prediction of interacting residues from sequence and structure
AttentionDDI: Siamese Attention-based Deep Learning method for drug-drug interaction predictions
A community computational challenge to predict the activity of pairs of compounds
Compound-protein interaction prediction with end-to-end learning of neural networks for graphs and sequences
MolTrans: Molecular Interaction Transformer for drug-target interaction prediction
TransformerCPI: improving compound-protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments
The DDI corpus: an annotated corpus with pharmacological substances and drug-drug interactions
A SARS-CoV-2-Human Protein-Protein Interaction Map Reveals Drug Targets and Potential Drug-Repurposing. bioRxiv
The SARS-CoV nucleocapsid protein: a protein with multifarious activities
Tiotropium is predicted to be a promising drug for COVID-19 through transcriptome-based comprehensive molecular pathway analysis
DDInter: an online drug-drug interaction database towards improving clinical decision-making and patient safety

A graph neural network module and a convolutional neural network (CNN) module are used for extracting features for drug and protein separately in the DTI task. For the DDI task, the weights in the deep graph neural network are shared for a pair of drugs. The features extracted are concatenated and finally fed to a prediction module for various tasks

Note that the performance scores of NDD and AttentionDDI are from the original paper and "-" indicates not applicable. The performance of DeepDrug and the best baseline method, DeepPurpose, on the unbalanced dataset is shown in (C) and (D). The x axis indicates the odds of

DeepDrug is compared with every baseline on every dataset in terms of F1 (E) and auPRC (F). The x axis and the y axis of each dot indicate the performance of a certain baseline (indicated by dot color) and DeepDrug on a certain dataset

DeepDrug is benchmarked with 6 baselines on three datasets in terms of F1 (A), auROC (B) and auPRC (C) in the DTI tasks