key: cord-0274585-hg85t2sp
authors: Lee, Segyu; Bang, Junil; Hong, Sungeun; Jang, Woojung
title: ABCnet: Self-Attention based Atom, Bond Message Passing Network for Predicting Drug-Target Interaction
date: 2021-12-27
journal: bioRxiv
DOI: 10.1101/2021.12.27.474154
sha: 2ca98144c72da1d9d417f42b224dda6cb7e93597
doc_id: 274585
cord_uid: hg85t2sp

Drug-target interaction (DTI) prediction estimates the binding affinity between a compound and a target protein, and is a key technology for deriving candidate substances in drug discovery. As DTI experiments have progressed over a long period, a substantial volume of chemical, biomedical, and pharmaceutical data has accumulated. This accumulation has coincided with the advent of the big data era, and data-driven machine learning methods could significantly reduce the time and cost of drug development. In particular, deep learning has shown its potential in vision and speech recognition, and studies applying it to many other fields have emerged. Research applying deep learning is under way in drug development, and among the various deep learning models, graph-based models that can effectively learn molecular structure have received increasing attention as they achieved state-of-the-art (SOTA) experimental results. Our study focuses on the molecular structure information captured by graph-based models, specifically message passing neural networks. In this paper, we propose a self-attention-based atom and bond message passing neural network that predicts DTI by extracting molecular features through a graph model with an attention mechanism. Model validation experiments were performed after defining binding affinity as both a classification and a regression problem: binary classification to predict the presence or absence of drug-target binding, and regression to predict the strength of that binding. Classification was performed with BindingDB, and regression with the DAVIS dataset. In the classification problem, ABCnet showed higher performance than existing MPNNs, consistent with prior studies, and in regression, the potential of ABCnet was assessed against the SOTA. According to our experiments, in binary classification ABCnet achieves an average performance improvement of about 1% over other MPNNs on the DTI task, and in regression its CI is on average 0.01 to 0.02 lower than the SOTA.

The study of the interaction between compounds and proteins plays an important role in the development of a wide range of drugs, as it can reveal the extent of the therapeutic benefit to patients of the activation, inhibition, or conformational change of functional proteins. Drug-target affinity (DTA) prediction estimates the interaction, specifically the binding affinity, between a compound and a target protein. Structure-based simulation approaches such as molecular docking cannot be applied when the three-dimensional structure of a protein is not known, and large-scale simulations with such methods require a great deal of time. Machine learning approaches are attracting attention because they can scan a large number of candidates in a short period of time [2]. An immense volume of chemical and biomedical data has accumulated over decades of experiments, prompting the emergence of data-driven research methodologies, and recently, machine learning methods whose performance scales with the quantity and quality of data have attracted attention [21].
The deep learning method has achieved overwhelming success in fields such as computer vision, natural language processing, and speech recognition [10]. As the applicability of deep learning has been proven across fields, research on its application to drug discovery has also emerged, first in molecular property prediction and DTI [1]. More recently, work has moved beyond this introduction stage, including cases where GCN [15] and Transformer [12] methodologies are applied to DTA and molecular property prediction. Recent studies have used deep learning to find drug candidates for the SARS-CoV-2 virus [29] [3]. Since deep learning can find drugs that interact with and bind to disease-causing target proteins in a relatively short time and at low cost, it is attracting attention in the discovery of candidate substances during drug development.

DTA analysis predicts the binding affinity of a compound and a target from the sequence data of the compound and the target protein, and can be applied to various drug discovery processes. In the input, compound data is commonly represented in SMILES, a format that lists the element symbols and bonds of a compound as a string. The target protein is expressed as a one-dimensional sequence in which its amino acids are listed as a string. From these two strings, the DTA model learns and predicts whether the compound and target protein bind to each other, and to what extent.

The architecture of a deep learning-based DTA model consists of three major components. The first is the drug encoder, which converts the information about atoms and bonds expressed in the SMILES string into a learnable representation, typically a one-hot encoding matrix generated with RDKit; in other words, it generates expression embeddings that capture the structural and physical properties of the compound. The second is the target encoder, which converts the amino acid sequence of a protein, expressed over 26 possible amino acid symbols, into a one-hot vector per amino acid and then into a vector representation; that is, it performs the expression embedding of the protein. The third is the decoder, which fuses the drug and protein feature embeddings extracted by the two encoders into a latent vector; several deep learning techniques can be used in this step. If the prediction task is regression, the model is configured to output an affinity score; if it is classification, it outputs whether the pair binds.
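To make this three-component layout concrete, the following is a minimal PyTorch-style sketch of the two-way encoder-decoder pattern described above. The module names, dimensions, and layer choices are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TwoWayDTA(nn.Module):
    """Illustrative two-way DTA model: drug encoder + target encoder + decoder.
    All sizes are hypothetical placeholders, not the paper's settings."""
    def __init__(self, drug_feat_dim=64, n_amino=26, emb_dim=128):
        super().__init__()
        # Drug encoder: stands in for a graph/MPNN encoder over molecular features.
        self.drug_encoder = nn.Sequential(
            nn.Linear(drug_feat_dim, emb_dim), nn.ReLU(),
            nn.Linear(emb_dim, emb_dim),
        )
        # Target encoder: 1D CNN over the one-hot amino acid sequence.
        self.target_encoder = nn.Sequential(
            nn.Conv1d(n_amino, emb_dim, kernel_size=8), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Decoder: fuse both embeddings into a latent vector, predict affinity.
        self.decoder = nn.Sequential(
            nn.Linear(2 * emb_dim, emb_dim), nn.ReLU(),
            nn.Linear(emb_dim, 1),
        )

    def forward(self, drug_x, protein_onehot):
        # drug_x: (batch, drug_feat_dim) pooled molecular features
        # protein_onehot: (batch, 26, seq_len)
        d = self.drug_encoder(drug_x)
        p = self.target_encoder(protein_onehot).squeeze(-1)
        return self.decoder(torch.cat([d, p], dim=1))  # affinity score
```

For the classification variant, a sigmoid over the decoder output would give the binding probability instead of an affinity score.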
Our study aimed to derive performance beyond the SOTA by focusing on improving the drug encoder in the DTA pipeline. Various efforts are under way to extract drug characteristics; among them, research using message passing neural networks (MPNNs) has been active for several years. Gilmer [8] first introduced MPNNs for the effective extraction of drug characteristics, Tang [24] added self-attention on top of the MPNN, and Huang [11] studied further improvements to the MPNN. The standard MPNN creates the initial value, that is, the 0th hidden state for each atom, through a dense layer before message passing begins. Without computing a hidden state for bonds, it transfers, based only on atom v, the information about each neighboring atom u and the connecting bond v-u to create an atom message. This makes it difficult to properly reflect the structural information of the compound, because the atom initial values are set so simply, and since bonds contribute only their raw features, the structural information around each bond is ignored. Huang's study added a hidden state computation for bonds to ease this constraint, but it did not overcome the limitation on the atom side.

In our study, we did not set the initial atom hidden state with a simple dense layer; instead, we used neighboring atom feature information to set the initial value through a dense layer, message passing, and an update function. Our second point of focus is the bond features: rather than using the raw bond data as it is, we apply the same initialization scheme used for the atom hidden states to the bonds. In addition, following earlier work on self-attention, we apply self-attention to the atom, bond, and concat messages to better capture relationships between elements that are far apart, which are often poorly reflected in sequence data.

We propose ABCnet, shown in Figure 1. First, the drug encoder is composed of an atom initial message block, a bond initial message block, and a concat message block that concatenates the atom and bond initial hidden states, transmits neighboring atom and bond information, and generates an atom message; hence the name ABCnet: Atom, Bond, Concat message passing network. The second component is the target encoder, which uses a simple convolutional neural network as the protein model to extract protein features. The third is the decoder, which predicts a drug-target interaction score through a fully connected layer, using as input a latent vector obtained by concatenating the drug and protein features extracted by the drug encoder and the target encoder.

To verify the proposed ABCnet, two experiments were performed: binary classification, which predicts whether a drug and target interact, and regression, which predicts the binding score. The BindingDB [17] and DAVIS [5] benchmarks were used for the two experiments, with accuracy and AUPR as the evaluation metrics for classification, and MSE and CI for regression. For a fair comparison, the SOTA algorithms were run in the same environment and the results were compared and analyzed. In this evaluation against the SOTA, there was a 1-2% improvement in classification and a 0.01-0.02 lower CI in regression.

The purpose of the binary classification experiment is to compare existing MPNN models and determine whether the drug encoder of the proposed model extracts drug properties well. To this end, the structure of all model components other than the drug encoder, as well as the hyperparameter values, were kept identical, so that the comparison isolates the drug encoder. Since everything except the drug encoder is the same, the DTI performance depends entirely on the drug encoder.
Through regression analysis with the ABCnet selected in this way, its performance was compared with the state of the art (SOTA) for DTI.

As studies predicting drug-target interaction (DTI) with artificial intelligence have been actively conducted, two studies in particular have received a lot of attention. The first, DeepDTA, was the first study to predict drug-target binding affinity using deep learning. It proposed a deep learning-based two-way CNN model that uses only the sequence information of the target and the drug, and utilized the Davis and KIBA datasets. This study opened the door to new-drug development research using artificial intelligence, and it serves as a reference point for other studies. The second study, GraphDTA, is a model that represents drugs as graphs and predicts drug-target affinity using a graph neural network. The graph representation lets the data be used in a form close to its natural state, rather than as a simple sequence; this study used the same data as DeepDTA. Through it, graphs were verified as a representation that can be used for drug data in DTA.

Kipf's graph convolutional network (GCN) study [14] considers a graph G = (V, E) with a node feature matrix $x \in \mathbb{R}^{n \times d}$, where V is the node (atom) set, E is the edge (bond) set, n is the number of nodes, d is the feature dimension of each node, and A is the adjacency matrix of the graph. In that paper, using this input x and the adjacency matrix A, a forward pass is performed through a filter obtained from a Chebyshev polynomial approximation and two hidden GCN layers. Through the resulting hidden states, node-level prediction is made, which enables semi-supervised classification of unlabeled nodes [14].
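As a concrete illustration of this propagation, here is a toy sketch of a single GCN-style step in the widely used form $H' = \mathrm{ReLU}(\hat{A}HW)$ with a symmetrically normalized adjacency matrix; this is a simplification for illustration, not the Chebyshev-filtered implementation of the cited paper.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: merge each node's features with its neighbors'.
    A: (n, n) adjacency matrix, H: (n, d) node features, W: (d, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)                   # node degrees
    D_inv_sqrt = np.diag(deg ** -0.5)         # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU(A_norm @ H @ W)

# Tiny example: a 3-node path graph with 4-dim node features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.rand(3, 4)
W = np.random.rand(4, 8)
H1 = gcn_layer(A, H, W)   # (3, 8) updated node embeddings
```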
Message passing is a necessary process in most graph neural networks. Among the methodologies for learning on graphs, the spatial convolutional network borrows the idea of the convolutional neural network: where a CNN combines surrounding pixels with a convolution filter when computing the value of a central pixel, the spatial convolutional network merges the features of neighboring nodes instead of neighboring pixels. The spectral convolutional network, designed on the basis of graph signal processing theory, likewise performs matrix multiplication of the adjacency matrix and the feature matrix, which are the data representations of the graph. Most graph convolutional networks (GCNs) operate in this way, and research has focused on how information should be transmitted to share and update node information.

The function that determines how information is transmitted is the message passing function, and Gilmer applied GNNs to quantum chemistry using three such functions [8]. The MPNN described there has a message passing phase and a readout phase in forward propagation, and for simplicity operates on an undirected graph G. In the equations below, the node features are $x_v$, the edge features are $e_{vw}$, the message function is $M_t$, the vertex update function is $U_t$, $h_v^t$ is the hidden state of node $v$ at step $t$, $m_v^{t+1}$ is the updated message, and $N(v)$ denotes the neighbors of $v$ in $G$. The message passing phase is:

$$m_v^{t+1} = \sum_{w \in N(v)} M_t(h_v^t, h_w^t, e_{vw})$$
$$h_v^{t+1} = U_t(h_v^t, m_v^{t+1})$$

In the readout phase, the readout function $R$ is used to calculate a feature vector for the entire graph:

$$\hat{y} = R(\{h_v^T \mid v \in G\})$$

The first step delivers each neighbor's message to a node; the second step updates the node's own state using the messages received from its neighbors. Messages from neighboring nodes are aggregated using the adjacency matrix. The parameter that determines how many hops of neighbors are considered is called the hop count or depth, and according to this size, the features of an entire subgraph are transmitted to compose a single node's information [8].

As mentioned above, MPNN design is very important for learning graph-structured data, because the embedding quality produced as a by-product of learning varies greatly with how the message passing is constructed. Withnall's study differs from the standard MPNN in how edge features are built when constructing a neighbor message: the edge feature is embedded by concatenating the edge's own feature with the features of the two nodes connected by that edge. Adding the two endpoint node features in this way enriches the edge feature information [27]. Kearnes [13] and Huang [11] modified the edge feature learning in other ways: edge features are learned with separate hidden states for edges, and when updating an edge message, not only the immediately preceding (1-hop) neighbor edge messages are used, as in the standard MPNN, but edge messages up to the chosen depth (n-hop). More neighbors are thereby considered when composing each edge message, and more of the surrounding information is included, expanding the expressive power of the embedding learned with the MPNN.
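The following is a minimal sketch of these two phases on a small molecular graph, with simple fixed stand-ins (a concatenation-based message, an averaging update, and a sum readout) in place of the learned $M_t$, $U_t$, and $R$; a real model learns these functions.

```python
import numpy as np

def mpnn_forward(adj, node_feats, edge_feats, T=3):
    """Toy MPNN: message passing phase then readout phase.
    adj: (n, n) 0/1 adjacency; node_feats: (n, d); edge_feats: dict[(v, w)] -> (d,)."""
    n, d = node_feats.shape
    rng = np.random.default_rng(0)
    W_msg = rng.standard_normal((3 * d, d)) / np.sqrt(3 * d)  # fixed "message" projection
    h = node_feats.copy()
    for _ in range(T):                       # message passing phase
        m = np.zeros_like(h)
        for v in range(n):
            for w in np.nonzero(adj[v])[0]:  # neighbors N(v)
                e = edge_feats[(min(v, w), max(v, w))]
                m[v] += np.concatenate([h[v], h[w], e]) @ W_msg   # M_t(h_v, h_w, e_vw)
        h = 0.5 * (h + np.tanh(m))           # U_t: blend old state and message
    return h.sum(axis=0)                     # R: sum readout -> graph-level vector

# Tiny triangle graph with 4-dim node and edge features.
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
nodes = np.random.rand(3, 4)
edges = {(0, 1): np.random.rand(4), (0, 2): np.random.rand(4), (1, 2): np.random.rand(4)}
graph_vec = mpnn_forward(adj, nodes, edges)
```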
Self-attention takes the dot product of the query and key computed from the input data: the similarity between each query $Q_i$ and key $K_i$ is found through the dot product. The result is divided by $\sqrt{d_k}$; this scaling prevents the situation where the derivative of the softmax function barely changes as the dot-product values grow. From this scaled value, the weight for each value $V_i$ is obtained through a softmax. When this weight, the attention score, is multiplied by the value $V_i$, entries similar to the query $Q_i$ receive larger values. In other words, the attention mechanism focuses more on information that is more similar, that is, more important; compactly, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^{\top}/\sqrt{d_k})V$.

Attention was first introduced in Vaswani's study [25]. Previously, sequence data were trained with recurrent neural networks. An RNN, which learns by remembering the order of the data, can reflect the order of a sequence and neighbor relationships, so it has been used for much sequence data. However, an RNN has to receive its input one sequence element at a time, and as the length grows, gradients vanish under the repeated matrix operations, so relationships with distant data are not properly reflected. Attention is a way to solve this problem: rather than considering only distance within a sequence, it models the relationships between tokens, so high correlation can be expressed even between elements that are far apart. Unlike an RNN, attention does not learn sequentially, so it also has a computational advantage [25]. Attention has achieved especially high performance in natural language processing, and the well-known BERT [6] and GPT [22] are also attention-based algorithms.

Shin applied a Transformer with attention to DTA, transforming the information contained in SMILES, the string representation of molecules, so that it could be used as Transformer input; this study proved that the Transformer technique can be applied to the embedding of important compounds in the bio field [23]. Similarly, Maziarka added three inputs, molecular distance, adjacency, and self-attention, when applying the Transformer to molecular property prediction [18]. These studies showed that attention can improve the SOTA in predicting the properties of compounds or their binding affinity with a target. Additionally, Tang applied self-attention to a message passing neural network (MPNN) to predict chemical properties: a basic MPNN was applied to the molecule, and self-attention was applied to the per-atom feature matrix produced by the MPNN [24].
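A minimal sketch of this scaled dot-product self-attention over a per-atom feature matrix follows; the single-head form and the random projection matrices are illustrative assumptions.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V.
    X: (n_atoms, d) per-atom feature matrix, e.g. the output of an MPNN."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over atoms
    return weights @ V, weights                           # attended features, attention map

rng = np.random.default_rng(0)
X = rng.random((5, 16))                                   # 5 atoms, 16-dim features
W_q, W_k, W_v = (rng.standard_normal((16, 16)) for _ in range(3))
attended, attn = self_attention(X, W_q, W_k, W_v)
```

The returned attention map makes the per-atom weights inspectable, which is what the approach described below exploits to identify atoms that contribute to binding.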
In this study, using this self-attention method, we designed a model that preserves atom and bond information through self-referential computation during feature extraction and reduces the information loss that follows the aggregation of adjacent nodes. By scoring each atom and bond with self-attention, the model is designed to identify the factors that significantly strengthen or weaken the interaction. Experiments adding further methods to the protein model were adopted as the default path toward upgrading each data stream into an explainable model. In this paper, we propose a method for predicting the binding affinity of proteins and molecules using a self-attention, graph-based, two-way, end-to-end drug-target interaction model; here, two-way refers to a model that extracts drug and protein features separately. As messages pass through the model's self-attention components, higher attention is given to the messages most relevant to drug-target binding. As a result, we propose the ABC structure, the drug-feature-extraction MPNN structure that can best predict DTI.

ABCnet predicts binding affinity with a fully connected layer in the decoder, which takes the concatenation of the drug and protein feature vectors computed by the drug encoder and the target encoder (a CNN), respectively.

The drug model takes SMILES as input data, but it does not encode the SMILES string directly. Instead, the atom and bond feature matrices, containing features extracted for each atom and bond, are used as input. Atom and bond features were extracted using the RDKit Python library [16], and Gilmer's [8] and Tang's [24] studies were consulted for the specific features to extract. We sought features that reflect the molecule's properties and that are highly related to drug-target binding: formal charge, degree, chirality, aromaticity, bond stereo, and ring membership, which capture the chemical and structural characteristics of the molecule, in addition to the mandatory atom and bond types. Furthermore, the hybridization type, that is, the shape of the orbitals in which the electrons move, was also included.
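As a concrete illustration, a minimal RDKit-based featurizer for exactly these properties might look as follows; the feature ordering and the use of raw enum values rather than one-hot encodings are simplifications for illustration.

```python
from rdkit import Chem

def atom_features(atom):
    """Per-atom features named in the text: type, formal charge, degree,
    chirality, aromaticity, hybridization."""
    return [
        atom.GetAtomicNum(),            # atom type
        atom.GetFormalCharge(),         # formal charge
        atom.GetDegree(),               # degree (number of bonded neighbors)
        int(atom.GetChiralTag()),       # chirality
        int(atom.GetIsAromatic()),      # aromaticity
        int(atom.GetHybridization()),   # orbital hybridization type
    ]

def bond_features(bond):
    """Per-bond features named in the text: type, stereo, ring membership."""
    return [
        int(bond.GetBondType()),        # bond type (single/double/...)
        int(bond.GetStereo()),          # bond stereochemistry
        int(bond.IsInRing()),           # ring membership
    ]

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
atom_matrix = [atom_features(a) for a in mol.GetAtoms()]
bond_matrix = [bond_features(b) for b in mol.GetBonds()]
```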
1. Bond message block: a message for bond v at step t+1 is created from bond v's hidden state at step t and the hidden states of its neighboring bonds w at step t. The hidden state of bond v at step t+1 is then updated from this t+1 message and bond v's previous hidden state at step t.

2. Atom message block: after that, the hidden states of atoms v and w and the hidden state of bond v-w are taken as input to create a message. A new hidden state for atom v at step t+1 is created from this message and atom v's previous hidden state at step t, and the feature vector is computed from the resulting hidden state of atom v.

3. Concat message block: an atom-based message is generated by receiving the neighboring atom messages and bond messages, and the feature vector is computed from the hidden state of atom v. In our study, when generating a message, an atom-based message is created by combining the neighboring atom and bond messages; because neighbor information is propagated up to the chosen depth in both the atom and the bond message construction, the features of the entire molecule can be delivered more abundantly.

At the end of an MPNN, a vector for each atom is computed through a readout function, and we added self-attention at this point to obtain two effects. First, self-attention was applied to the final vector of the concat message block, in order to obtain an interpretable vector that can locate the atoms involved in drug-target binding. Second, self-attention was applied to the final vectors of the atom message block and the bond message block, for effective learning, by weighting each atom and bond substructure according to how much it contributes to drug-target binding.

ABCnet follows the structure of existing DTI models. The first input is the drug SMILES data, and the second is the one-dimensional protein sequence of amino acids. Each input is embedded by the drug model and the protein model, respectively. In the drug model in particular, an atom-based GCN and an edge-based GCN are computed so that the graph structure of the data can be learned better; here the atom-based GCN is defined as ACnet, and the edge-based GCN component as BCnet. Attention is added to each result so that importance can be taken into account. To predict the interaction, we fuse the two attended results and use a fully connected layer. In addition, by adding attention at the end, it is possible to determine which drug atom is related to which protein amino acid in the DTI; in other words, the design intent is to express numerically the correlation between each element of the drug and the protein, enabling an analyzable model. Consequently, we build the total model, a two-way end-to-end neural network for predicting DTI.

A protein is a sequence of amino acids, which can be expressed as one-dimensional sequence data. Sequence data can be expressed as a one-hot encoded matrix over a total of 26 amino acid symbols (including unknown). When expressing methionine, 'M' can be represented as a vector of size (26, 1), and a protein sequence of 1,200 amino acids can be expressed as a matrix of size (26, 1200). Protein sequences can have different lengths, and the length can be adjusted depending on whether the model requires a fixed length. Using the one-hot encoding matrix as input, the sequence passes through a one-dimensional convolutional neural network to embed the protein. Since our study focused on modifying the drug model, the protein model was kept to a simple extraction stage.
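A small sketch of this protein featurization follows; the 26-symbol vocabulary ordering is an assumption (the paper does not specify it), chosen here as the 20 standard amino acids plus ambiguous/rare codes and an unknown symbol.

```python
import numpy as np

# Illustrative 26-symbol vocabulary; the paper's exact ordering is not given.
VOCAB = "ACDEFGHIKLMNPQRSTVWYBZUOJX"
IDX = {ch: i for i, ch in enumerate(VOCAB)}

def one_hot_protein(seq, max_len=1200):
    """Encode a protein sequence as a (26, max_len) one-hot matrix,
    truncating or zero-padding to a fixed length."""
    mat = np.zeros((len(VOCAB), max_len), dtype=np.float32)
    for pos, ch in enumerate(seq[:max_len]):
        mat[IDX.get(ch, IDX["X"])][pos] = 1.0   # unknown symbols map to 'X'
    return mat

m = one_hot_protein("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
# 'M' at position 0 becomes a one-hot column of size 26, as described above;
# the (26, 1200) matrix is then fed to the 1D convolutional protein encoder.
```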
Two problem settings were used for the performance evaluation of our model: binary classification, where the label indicating whether the drug and target bind can be expressed as 0 or 1, and regression, which predicts the IC50, a drug-target interaction score. In binary classification, all the MPNN variants introduced above (the original MPNN, BCnet, ACnet, and ABCnet) were tested to find the best-performing model. The best model was then used for the regression experiment, where it was analyzed and compared with the DTI SOTA models. In the binary comparison of the MPNN variants, ABCnet, the best model, achieved over 96.6% performance, and in the regression problem ABCnet was compared with the SOTA models.

To predict DTI, experiments were conducted on drug-target binding (binary) and the drug-target binding score (regression); the dataset for each setting is shown in Table 3. Binary classification was performed with the BindingDB dataset [17]. BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be drug targets with small, drug-like molecules. BindingDB's interaction indicators include Ki and IC50; the experiment used IC50, the label with the largest amount of data. IC50 abbreviates "the inhibitory concentration of drug that causes 50% of the maximum inhibition" and represents an active effect on drug-target binding: it is the concentration of the drug at which the binding between drug and target is suppressed by 50%. A small IC50 value therefore means the drug has high binding affinity, since it halves the degree of binding to the target even at a low concentration. The IC50 values in BindingDB are integers and vary widely. To train with binary classification, each label must be mapped to 0 or 1, that is, not bound or bound. The thresholds for positive and negative labels follow [26] and [19]: a label is positive if its IC50 is less than 100 nM, and negative if its IC50 is greater than 10,000 nM.

Regression was performed with the DAVIS dataset [5]. The DAVIS dataset contains the interactions of 72 kinase inhibitors with 442 kinases covering more than 80% of the human catalytic protein kinome. DAVIS's interaction indicator is Kd, the ratio of the antibody dissociation rate (how quickly it dissociates from the antigen) to the antibody association rate (how quickly it binds to the antigen). Therefore, the smaller the Kd value, the greater the binding affinity; the Kd value tells us the strength of the binding interaction between the drug and the target protein.

The dataset for binary classification was created as follows. BindingDB contains 1,202,086 entries, and a total of 187,700 data points were sampled when extracting positives and negatives according to the thresholds of [26] and [19]. In addition, the test data is composed of five datasets according to the "Korea Testing and Certification Institute" test regulations, so that abnormal behavior such as overfitting can be checked; dataset validation and performance validation were performed under the supervision of the respective institutions. Comparing the five test datasets, Test1-5, the negatives contain the same data, while the positives differ by about 2,300 entries, so the five positive test sets share roughly 95% of their data. Objective evaluation was conducted to find the optimal deep learning model across these five test datasets.

The evaluation metrics and loss function in our study are shown in Table 6, where $Y_i$ is the $i$th label (1 for positive affinity, 0 for negative affinity) and $\hat{Y}_i$ is the $i$th predicted probability of positive affinity over all $n$ data points. For classification we use the binary cross-entropy loss

$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[Y_i \log \hat{Y}_i + (1 - Y_i)\log(1 - \hat{Y}_i)\right],$$

and for regression the mean squared error

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$$

and the concordance index

$$\mathrm{CI} = \frac{1}{Z}\sum_{\delta_i > \delta_j} h(b_i - b_j),$$

where $b_i$ and $b_j$ are the predictions for affinities $\delta_i > \delta_j$, $Z$ is a normalization constant counting such pairs, and $h(x)$ is the step function

$$h(x) = \begin{cases} 1 & x > 0 \\ 0.5 & x = 0 \\ 0 & x < 0. \end{cases}$$
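A small sketch of computing this concordance index over true and predicted affinities (a direct O(n^2) implementation, for illustration only):

```python
import numpy as np

def concordance_index(y_true, y_pred):
    """CI: fraction of pairs with y_true[i] > y_true[j] whose predictions
    are correctly ordered; prediction ties count as 0.5."""
    num, Z = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:          # pair with a defined order
                Z += 1
                diff = y_pred[i] - y_pred[j]
                num += 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
    return num / Z if Z > 0 else 0.0

ci = concordance_index(np.array([7.1, 5.0, 6.3]), np.array([6.8, 5.2, 6.1]))
# ci == 1.0 here: every pair is ordered consistently with the true affinities.
```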
In DTI research, like ours, most of the studies 527 improve Molecule Feature Embedding. A study to strengthen the expression learning 528 ability of Target Sequence Embedding is needed, and we plan to study this in the future. 529 15/17 Deep learning in drug target interaction prediction: Current and future perspectives Machine learning approaches and databases for prediction of drug-target interaction: a survey paper Predicting commercially available antiviral drugs that may act on the novel coronavirus (sars-cov-2) through a drug-target interaction deep learning model Prediction of drug-target interactions from multi-molecular network based on deep walk embedding model Comprehensive analysis of kinase inhibitor selectivity Pre-training of deep bidirectional transformers for language understanding Interpretable drug target prediction using deep neural representation Neural message passing for quantum chemistry Simboost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines The business impact of deep learning Deeppurpose: a deep learning library for drug-target interaction prediction Graph neural networks with multiple feature extraction paths for chemical property estimation Molecular graph convolutions: moving beyond fingerprints Semi-supervised classification with graph convolutional networks Directional message passing for molecular graphs Bindingdb: a web-accessible database of experimentally determined protein-ligand binding affinities Molecule attention transformer A multiple kernel learning algorithm for drug-target interaction prediction Deepdta: deep drug-target binding affinity prediction Activity, assay and target data curation and quality in the chembl database Language models are unsupervised multitask learners Self-attention based molecule representation for predicting drug-target interaction A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility Attention is all you need Improving chemical similarity ensemble approach in target prediction Building attention and edge message passing neural networks for bioactivity and physical-chemical property prediction Estimated research and development investment needed to bring a new medicine to market An in silico deep learning approach to multi-epitope vaccine design: a sars-cov-2 case study