key: cord-0203412-g06c0ehp authors: Nguyen, Tri Minh; Nguyen, Thin; Tran, Truyen title: Learning to Discover Medicines date: 2022-02-14 journal: nan DOI: nan sha: 63e89067c67bc6eb10e636fb9c671bee4b5e92a8 doc_id: 203412 cord_uid: g06c0ehp Discovering new medicines is the hallmark of human endeavor to live a better and longer life. Yet the pace of discovery has slowed down as we need to venture into more wildly unexplored biomedical space to find one that matches today's high standard. Modern AI-enabled by powerful computing, large biomedical databases, and breakthroughs in deep learning-offers a new hope to break this loop as AI is rapidly maturing, ready to make a huge impact in the area. In this paper we review recent advances in AI methodologies that aim to crack this challenge. We organize the vast and rapidly growing literature of AI for drug discovery into three relatively stable sub-areas: (a) representation learning over molecular sequences and geometric graphs; (b) data-driven reasoning where we predict molecular properties and their binding, optimize existing compounds, generate de novo molecules, and plan the synthesis of target molecules; and (c) knowledge-based reasoning where we discuss the construction and reasoning over biomedical knowledge graphs. We will also identify open challenges and chart possible research directions for the years to come. The COVID-19 pandemic has triggered an unprecedented rise of investment capital in AI for drug discovery (DD), the process of identifying new medicines for a druggable target [Zhang et al., 2021] . Recent breakthroughs in AI present a great opportunity to break the so-called Eroom's Law in DDthe inverse of the well-known Moore's Law-dictating that the rate of FDA drug approval is slowing down despite a huge increase in development cost [Scannell et al., 2012] . Enabled by deep learning advances, powerful computing, and large databases, modern AI is ready to make a huge impact through in silico processes to supplement and sometimes replace the in vitro counterparts of drug development. For a wide range of problems, from determining the 3D structure of proteins to predicting drug-target binding, to generating synthesizable molecules, AI has helped change the DD landscape in recent years. The reverse also holds: The problems in DD necessitate new advances in AI methodologies to deal with the new scope and complexity typically not seen in traditional application domains of AI such as computer vision and language processing. There are three major DD questions AI can help answer. The first is, given the molecule, what are its chemo-biological and therapeutic properties? Second, for a given target, what kind of molecules will therapeutically modify its functions? Finally, given a molecule, how can we synthesize and optimize the molecule from the available compounds, meaning solving the problems of synthetic tractability and reaction planning? In this survey, we bring in the machine learning (ML) and reasoning perspectives for answering these questions, with an emphasis on recent developments. Each question poses representation, learning, and reasoning sub-problems. This is because drugs, targets, and the hosting environments need to be represented in machine comprehensible formats. Modern ML suggests that the representations should be learned instead of handcrafted to best explore the power of computing and the richness of data (Sec. 2). Once learning has been completed, the next phases of prediction, search and discovery are performed using reasoning methods that leverage the learned models (Sec. 3) and the vast domain knowledge (Sec. 4). Before concluding, we will discuss the remaining challenges and opportunities for AI/ML in this important area (Sec. 5). See Fig. 1 for a taxonomy of the problem space, which we follow in the paper. The first step of applying AI/ML is forming a computerreadable representation of biomedical entities and concepts. We will primarily focus on the drug-target pairs. A drug is a small molecule while a target such as protein is a large (macro) one. Typically, in drugs, we are concerned with atoms and bonds, while in protein, we are concerned with amino acids. The atoms and bonds of a small molecule can be efficiently represented as a string of ASCII characters. A popular representation is SMILES (Simplified Molecular-Input Line-Entry System), which can be decoded back to the atom graph. However, the SMILES string may not be unique as a molecule can have different SMILES forms. Figure 1 : Three aspects of AI in drug discovery: (i) Transforming the biological data into representations readable by computer; (ii) data-driven reasoning in which models estimated from data are used to infer properties, optimize, generate molecules and plan synthesis; and (iii) reasoning with biomedical knowledge graphs. Likewise, a protein can also be represented by a string of characters varying in length. Each character represents one of the 20 amino acids. We often incorporate the evolutionary information of the target sequence by searching for related proteins to form multiple sequence alignment (MSA) and extracting evolutionary information (EI). The intuition is that the important substructure of protein remains stable through evolution. The EI has shown its effectiveness in several tasks such as protein folding prediction [Jumper et al., 2021] . A richer and more precise representation of a molecule is attributed graph. A molecular graph is defined as G = (V, E) where V is the atom set of the molecule and E is the edge set of bondings between atoms. To balance between 3D structural information and simplicity, 2D representation via attributed graph can be used. For example, in the case of protein, we predict the distance/contact between residues to form the contact/distance map. The contact/distance map is then used as an adjacency matrix of an attributed graph where each node represents a residue and edges represent the contact/distance between residues. Indeed, many biomedical problems are well cast into graph reasoning: Molecule properties prediction as graph classification/regression, drug-target binding as graph-in-graph, chemical-chemical interaction as graph-graph pairing, molecular optimization as graph edit/translation, and finally chemical reaction as graph morphism. For small molecules, fingerprints are often used to encode the 2D structure into a vector. One approach is using structural keys to encode the structure of the molecule into a bit string, each bit represents the presence or absence of predefined feature such as substructure or fragment [Durant et al., 2002] . The structural keys suffer from the lack of generalization because it depends on the pre-defined fragments and substructure to encode the molecule. The alternative approach is hashed fingerprints which encode the counting of molecular fragments into numeric values using a hash function, without relying on a pre-defined library. Based on the fragment enumerating process, the hashed fingerprints can be categorized into path-based and circular types [Rogers and Hahn, 2010] . In [Duvenaud et al., 2015] , manual fingerprint extraction is replaced by a learnable hashing function on the convolution over the molecular graph. For proteins, a set of descriptors are constructed based on the amino acids, their appearance frequency [Gao et al., 2005] , their individuals [Sandberg et al., 1998] or autocorrelation [Feng and Zhang, 2000] physical-chemical characteristic, and sequence-order feature derived from physicochemical distance. With the advance in the protein structure prediction, the distance between residues is also considered in protein description [Xu et al., 2020]. The string representation of molecules makes it ready for applying language modeling techniques, assuming the existence of statistical sequential patterns, akin to those found in linguistic grammars. Because the atom structure in the molecule follows a set of rules such as valance, we can view it as grammar rules in chemical language. This is not limited to string representation and can be extended to other types of representation such as graphs. Several unsupervised representation learning models have exploited the structural patterns to learn the molecule representations. An early work in sequential representation learning, word2vec [Mikolov et al., 2013] , learns the representation by using the predicting neighbour tokens as a self-supervised task. Recently, Transformers with BERT-like masking strategy has become a popular technique to learn sequential representations, e.g., out of molecular SMILES sequence [Chithrananda et al., 2020] . Likewise, the graph representation of molecules allows us to learn the graph structure patterns. For example, DeepWalk [Perozzi et al., 2014] learns the node representation of the given graph structure using random walks to learn the pattern of nodes' neighbour. Grover [Rong et al., 2020] uses subgraph masking as contextual properties prediction and graph motif prediction. An increasingly popular strategy is through exploring the local structure in the chemical space by using contrastive learning for estimating representations. This works by minimizing an energy-based loss to keep the distance in the embedding space small for similar molecules and large for dissimilar pairs [Qiu et al., 2020] . The main difference between these contrastive losses is the number of positive, negative samples and how the pairs are sampled. There are three basic questions in drug discovery. The first question is determining whether the given molecule is druglike, meaning having therapeutic effects on druggable targets. The second question is given a biological target, what are the candidate compounds that are likely to modify their functions in a desired way? This question addresses two subproblems: searching and generating. The former is about finding a suitable molecule from an approved list, and the latter is generating a de novo molecule tailored to the target. The final question is given a molecule, how can we make it? This question addresses the sub-problems of synthetic tractability, reaction planning, and retrosynthesis. The first question is predicting the molecule's properties. There are a wide range of prediction tasks, from predicting the drug-likeness, which targets it can modulate, to predicting molecule dynamics/kinetics/effects/metabolism if administered orally or via injection. Molecule property prediction is a fundamental task in many stages of drug discovery. The task is a specific case of many-body systems where we predict the emergent properties of a group of interacting objects. The most popular first-principle technique to tackle this general problem is Density Functional Theory (DFT) to approximate the wave function, which describes the quantum state of an isolated quantum system in the many-body system. However, calculating DFT is computationally expensive and can take up to O(10 3 ) seconds for a medium-sized molecule, making rapid screening over millions of potential candidates intractable. A recent approach is to approximate DFT calculations by learning a graph neural network (GNN) over the molecular graphs, which can be trained on a large pre-computed dataset. Once trained GNNs can run many orders of magnitudes faster than precise DFT methods with reasonable accuracy. A popular type of GNNs is the Message Passing Neural Network (MPNN) [Gilmer et al., 2017; Unke et al., 2021] which models the atoms interaction with message passing function, update function, and readout function. MPNN and its cousin, the Graph Convolutional Network (GCN), have since been frequently used in predicting physical chemistry properties (e.g., water solubility, hydrophobicity), physiology (e.g., toxicity), and biophysics (bind affinity) [Altae-Tran et al., 2017; Yang et al., 2019] . Alternatively, the molecule properties prediction can also be formulated as a reasoning process to answer a query (of a specific property), and thus lends itself to more elaborate neural networks such as Graph Memory Networks (GMN) [Pham et al., 2018] . Drug-target binding affinity indicates the strength of the binding force between the target protein and its ligand (drug or inhibitor) [Ma et al., 2018] . There are two main approaches: the structural approach and the non-structural approach [Thafar et al., 2019] . Structural methods utilize the 3D structure of proteins and ligands to run the interaction simulation between proteins and ligands. On the other hand, the non-structural approach relies on ligand and protein features such as sequence, hydrophobic, similarity and other structural information. Structural approach The structure-based approach usually relies on molecular docking which simulates the post-binding 3D conformation of drug-target complex. As there are several possible conformations, the simulated structure is evaluated using a scoring function. The scoring function can vary from the molecular mechanics' interaction energies [Meng et al., 1992] to machine learning predicted value derived from protein and drug features [Kundu et al., 2018; Gomes et al., 2017] , or 3D convolution on 3D structure [Stepniewska-Dziubinska et al., 2018] . Non-structural approach The non-structural approach relies on the drug/target similarity, and structural features such as protein sequence or secondary structure without relying on calculating the exact 3D structure of the drug-target complex. Popular among them are kernel-based methods which employ kernel functions to measure the molecule similarity [Cichonska et al., 2017] . Alternatively, we can use drug-drug, target-target, and drug-target similarity features as input for a classifier/regressor [He et al., 2017] . More recently neural networks have become common as they can learn the drug and target representations instead of handcrafting them. For sequences, 1D convolution [Öztürk et al., 2018] , BiLSTM [Zheng et al., 2020] , or language model feature [Nguyen et al., 2021] are used to encode the biological sequence to the latent space. The drawback of sequential features is that they ignore the structural information of the drug and the target which also plays a critical role in the drug-target interaction. Thus, graph neural networks have been applied whenever graph structures are available [Do et al., 2019a; Jiang et al., 2020; Nguyen et al., 2021 ]. More elaborate techniques use self-attention to model the residue-atom interactions between drug and protein [Zheng et al., 2020; Nguyen et al., 2021] . The second question is what kind of molecules interact with the given target. The traditional combinatorial chemistry approach uses a template as a starting point. From this template, a list of variations is generated with the goals that they should bind to the pocket with good pharmacodynamic, have good pharmacokinetics, and be synthetically accessible. The space of drugs is estimated to be 10 23 to 10 80 substances but only 10 8 substances have been synthesized thus far. Thus, it is practically impossible to model this space fully. The current techniques for graph generations can be search-based, generative, or a combination of both approaches. The search-based approach starts with the template and uses optimization framework such as Bayesian Optimization to improve it over time. This approach does not require a large amount of data but demands a reliable evaluator through expensive computer simulation or lab experiments. A more ambitious approach is building expressive generative models of the entire chemical space, and thus it requires a large amount of data to train. The search-based approach can be formulated as structured machine translation. Here we search for an inverse mapping of the knowledge base and binding properties back to query molecules. In this approach, the template molecule is represented as a graph or a string. The starting molecule is optimized towards desirable properties. There are two common strategies for optimization. The first strategy is sequential optimization in the discrete chemical space via atom/bond addition/deletion while maintaining the validity of the molecule. This sequential discrete search fits well to the reinforcement learning frameworks with target molecule properties as rewards [Zhou et al., 2019] . Reinforcement learning can cooperate with the graph representation of the molecule with a graph policy network [You et al., 2018a] . The second strategy is continuous optimization in the latent representation space. We first encode the input molecule graphs or strings into the latent space. The encoder architecture depends on the input molecule representation, varying from sequence-based encoder (e.g., RNN [Gómez-Bombarelli et al., 2018] or molecule graph junction tree [Jin et al., 2018] ). To have an embedding space representing a set of specific properties, the encoder is jointly trained with the property prediction task [Gómez-Bombarelli et al., 2018] . Then the molecule is optimized in the latent space with Bayesian Optimization (BO) [Jin et al., 2018] or genetic algorithms [Winter et al., 2019] before being decoded back to the original molecule space. The bottleneck of this approach is in accurate modeling of the drug latent space. The molecule optimization can be viewed as inverse function learning where we learn the function that maps the desired outputs to the target structure. Then we can leverage the existing data and query the simulators in an offline manner. In particular, we start with randomly sampled structure x variable. Then the simulators answer the query structure x with the properties y. With a sufficiently large number of (x, y) pairs, the machine learning can learn the inverse function x ≈ g (y). The core idea of generative models is learning and sampling from the density function p(x) of the training data. The main challenges are due to the complexity of the discrete molecular space, unlike those typically seen in continuous domains like computer vision. The most popular generative models to date are variational autoencoder (VAE), generative adversarial networks (GAN), autoregressive models and normalizing flow models. VAE is a two-stage process: we first encode the visible input structure into the hidden variable and then decode back to the original structure. The first VAE implemented in modeling the drug space was by mapping the SMILES sequence into the vector space [Gómez-Bombarelli et al., 2018] . Then we explore the vector space by optimization methods such Bayesian Optimization (BO) [Gómez-Bombarelli et al., 2018] or genetic algorithms [Winter et al., 2019] . GraphVAE [Simonovsky and Komodakis, 2018] operates directly on the expressive graph representations. Since the iterative generation of discrete structure such as graph is non-differentiable, GraphVAE models the decoded graph as probabilistic fully-connected on the restricted k-node domain. Then the decoded graph is compared with the ground truth by a standard graph matching. The main drawbacks of searching in the latent space of VAEs is that it cannot explore the low density regions, where most interesting novel compounds reside. A more intrinsically explorative strategy is through compositionality, where novel combinations can be generated once the compositional rules are learnt. This has been studied under GrammarVAE [Kusner et al., 2017] , an interesting method that imposes a set of SMILES grammar rules via parse trees to ensure the validity of the SMILES sequence. GAN is a powerful alternative to VAEs as it does not require an encoder, and hence it models the compound distribution implicitly. GAN has two sub-models: a discriminator and a generator. The discriminator determines if any two samples come from the same distribution. The generator learns the to generate good samples by trying to fool the discriminator to believe that the generated samples are real training data. Mol-CycleGAN [Maziarka et al., 2020] learns the mapping function G : X → Y and G : Y → X with two discriminators D X and D Y where X is the set of input molecule and Y is the molecule set with desired properties. This ensures the generator transform input molecule to the desired properties while retaining the structure. Autoregressive models factorize the density function p(x) as: hence allowing generation of molecules in a stepwise manner. GraphRNN [You et al., 2018b ] encodes a sequence of graph states using RNN. Each state represents a step in the graph generation process. GraphRNN uses BFS to reduce the complexity of learning all the possible graph state sequences. Normalizing flow models explicitly learn the complex density function by transforming the simple distribution through a series of invertible functions. GraphAF [Shi et al., 2020b] defines an invertible function mapping the multivariate Gaussian distribution to a molecular graph structure. Each step of molecule generation samples random variables to map them to atom/bond features. The third question is given a molecule graph, how can we synthesize the target molecule? The problem is known as retrosynthesis planning, and it involves determining a chain of reactions to finally synthesize a target molecule with high efficiency and low cost. At each reaction step, we need to identify a set of reactants for an intermediate molecule. This problem can be viewed as the reverse of chemical reaction prediction. Normally, we have chemical reaction prediction where we predict the post-reaction products of two or more molecules. However, in retrosynthesis, given the post-reaction product, the task is to search for two or more feasible candidates for chemical reaction. Both reaction prediction and retrosynthesis can be cast as graph morphism, where the molecules form a graph of disconnected sub-graphs, each of which is a molecule. Reaction changes the graph edges (dropping bonds and creating new bonds) but keeps the nodes (atoms) intact. A learnable graph morphism was introduced in GTPN [Do et al., 2019b] , a reinforcement learning based technique to sequentially modify the bonds. There are two main approaches to solve the retrosynthesis problem: template-based and template-free. The templatebased approach relies on the set of predefined molecules to construct the target molecules. This approach formulates the retrosynthesis as the subgraph matching problem to match the template to the target molecule. The matching problem is then solved by a variety of techniques, ranging from a simple deep neural network [Baylon et al., 2019 ] to a more sophisticated framework such as conditional graphical models [Dai et al., 2019] . The template-based approach suffers from poor generalization on unseen structures as it relies on predefined fragments and template libraries. The template-free approach is proposed to overcome the poor generalization of the template-based approach by inheriting the strong generalization from the (machine) translation model. The Transformer model can effectively solve the retrosynthesis problem formulated as sequence-to-sequence, in which the product molecule SMILES sequence is translated into a set of reactants SMILES sequences [Karpov et al., 2019] . Graph-to-graph is another approach [Shi et al., 2020a] , in which at first the reaction center is identified using edge embedding of the graph neural network to break the target molecule into synthons. Then the synthons are translated into reactants using graph translation model. The biomedical community has accumulated a vast amount of domain knowledge over the decades, among them those structured as knowledge graphs are the most useful for learning and reasoning algorithms. We are primarily interested in the knowledge graphs that represent the relationships between biomedical entities such as drug, protein, diseases, and symptoms. Examples of manually curated databases are OMIM [Amberger et al., 2019] and COSMIC [Forbes et al., 2017] . Formally a knowledge graph is a triplet K = H, R, T where H and T are the set of entities, R is the set of relationship edge connecting entities of H and T . Biomedical knowledge graphs enable multiple graph reasoning problems for drug discovery. Among them is drug repurposing, which aims to find novel uses of existing approved drugs. This is extremely important when the demands for new diseases are immediate, such as COVID-19; when the market is too small to warrant a full de novo costly development cycle (e.g., rare, localized diseases). Given a knowledge graph, the drug repurposing is searching for new links to a target from existing drug nodes -a classic link prediction problem. This setup is also used in gene-disease prioritisation in which we predict the relationship between diseases and molecular entities (proteins and genes) [Paliwal et al., 2020] . Another reasoning task is polypharmacy prediction of the adverse side effects due to the interaction of multiple drugs. The multi-relation graph with graph convolution neural network can encode the drug-drug interactions [Zitnik et al., 2018] . Given a pair of drugs, the drugs are embedded using the encoder and the polypharmacy prediction task is formulated as a link prediction task. In what follows, we briefly discuss two major AI/ML problems: graph construction and graph reasoning. Biomedical knowledge graph is constructed using existing databases or a rich source of data from biomedical publications. As manual literature curation is time-consuming, ML has been applied to speed up the process. The usual framework starts with relevant sentences filtering, followed by biomedical entity identification and disambiguation [Weber et al., 2021] . The biomedical entities relationships are extracted from selected text using rule-based method [Müller et al., 2018] , unsupervised [Szklarczyk et al., 2021] , or supervised manner [Li et al., 2016] Reasoning on knowledge graphs is the process of inferring the relationship between a pair of entities as well as the logic behind the relationship. Machine learning reasoning applying to this problem can be categorized into rule-based reasoning, embedding-based reasoning, and multi-chain reasoning. Rules-based reasoning uses logic rules or ontology to infer the new triplet from the knowledge graph KG. A logic rule is defined by its head r(x, y) and body B = {B 1 , B 2 , ..., B n }: AMIE [Galárraga et al., 2013] explores the knowledge graph with the mining scheme similar to association rule mining. Ontology is the formal way to describe the types, categories of entities' structure. Web ontology language (OWL) is a logic-based language to describe the entities and their relationship. OWL can apply to complex structure like biomedical knowledge graphs . The logic-based reasoning suffers from the lack of generalization. A more robust technique assumes a distributed representation of entities and relations, typically as embedding vectors in high-dimensional spaces. Matrix/tensor factorization projects the high-dimensional/multi-way objects into multiple low dimensional vectors. TriModel [Mohamed et al., 2020] learns a low-rank vector representation Θ E and Θ R of knowledge entities E and relations R. The graph embedding encoder is trained using tensor factorization where each entity is represented by three embedding vectors. The Distance-based models exploit the fact that given a triplet (h, r, t), the embedded representation of h and t is in the proximity, translated by the embedded vector of the relationship r. The best known model TransE [Bordes et al., 2013] learns the embedding of entities by minimizing the distance between h + r and t and maximizing the distance between the h + r and t where (h, r, t ) triplet does not hold. Structural information like 2D structure or 3D conformation is also helpful for entity representation learning. It is necessary to integrate heterogeneous information with structural information. The knowledge graph embedding can be combined with the structural embedding using neural factorization machine to form the hybrid representation [Ye et al., 2021] . Shallow embedding has achieved remarkable results in reasoning over the biomedical graphs. However, they can fail to reason when presented with multiple complex relationships. Multi-chain reasoning extends the reasoning from a triplet to an extended path of reasoning chain. DeepPath [Xiong et al., 2017] applies reinforcement learning (RL) to find the optimal path of reasoning in the knowledge graph. The RL can be combined with pre-defined logic rules to learn the drug repurposing to achieve explainable reasoning [Liu et al., 2021] . We are now in a position to discuss remaining challenges and chart possible courses to overcome them. Large biomedical space The drug space is estimated to be from 10 23 up to 10 60 . Due to the diversity of the molecules in term of function and structure, and the combinatorial nature of their interaction, unconstrained exploration of the biomedical space-such as molecule optimization and generation-is intractable. Search-based optimization requires an accurate predicting model which maps the generated molecule to the target properties. The generated molecule may have undiscovered properties which leads to an inaccurate predicting model. As a result, the search direction can be misleading. Human implicit and explicit feedback can assist and redirect the optimization to the correct course. Data quality Poor data quality will have a snowball effect in the multi-stage process of discovery. The data error may lead to an inaccurate machine learning model when trained on insufficient data. Factors affecting the data quality are data entry error, hidden bias, and incompleteness due to law and regulations. Machine learning techniques can help enhance the data quality by detecting and removing the hidden bias in the early stage of data collection, data pre-processing, or considering the bias in the model design. One promising direction is to develop a "foundation model" trained on large-scale data and then adapted to a wide-range of relevant downstream tasks, similar to what is happening in the space of text and vision [Bommasani et al., 2021] . Large gap between virtual screening and real clinical trials There is a large gap between clinical trial results and in silico results [Viceconti et al., 2021] : Clinical trials can fail despite excellent model prediction. For example, machine learning only predicts the interaction between a drug and a protein without factoring in a chain reaction or off-target interaction that reduces the effectiveness of the drug. It is necessary to have a drug discovery framework that takes account of multiple and chain drug-target, drug-drug, and protein-protein interactions. Drug effect on the protein functions The current drug discovery and optimization work on the binding interaction between the target protein and drug molecule. The machine learning molecule generation and optimization work on the principle of targeting a specific set of properties or proteins. The machine learning framework tries to generate or optimize a molecule that is likely to fit to the binding pocket of the protein. However, there is no clear connection between the binding activity predicted by the machine learning framework and the target protein function change. This opens up the direction to cooperate the protein function information from other sources like literature into the optimization model. Personalized prescription and drug discovery Personalized medicine allows efficient and safe treatment by coursing the treatment based on the patient's genomic environments. With the advance in the 3D printing techniques in pharmaceutics [Goole and Amighi, 2016] , a patient-tailored drug delivery system allows safe and efficient usage of drugs. At the same time, with the development of bio-markers in both clinical and biomedical data, the information from bio-markers is getting integrated into the drug discovery loops. From the machine learning point of view, it presents a challenge as well as an opportunity in personalized medicine and drug discovery systems. With the advance in generative models and optimization, the machine learning framework can combine bio-maker data with the high-speed drug screening, optimization and printing techniques to develop a personalized drug discovery system. Efficient human-machine co-creation The end goal of the drug discovery process and the intermediate goal of machine learning systems may not align due to undiscovered knowledge. Having an efficient human-machine ecosystem allows the domain experts to inject prior knowledge, verify and discover the underlying mechanism. We have provided a survey on recent AI advances targeting one of the most impactful areas of our time: drug discovery. While this is a very challenging task, the rewards are huge, and AI is already making solid progress, contributing to the saving of development costs, and speed up the discovery. Reversing Eroom's Law will demand new fundamental advances in AI itself, from learning in the low-data regime, to explore the vast molecular space, to sophisticated reasoning, to robotic automation. AI will need to work alongside humans and help expand the knowledge bases and then benefit from it. Low data drug discovery with one-shot learning org: leveraging knowledge across phenotype-gene relationships Enhancing retrosynthetic reaction prediction with deep learning using multiscale reaction classification Anna Cichonska, Balaguru Ravikumar, and Elina Parri et al. Computational-experimental approach to drugtarget interaction mapping: A case study on kinase inhibitors Attentional multilabel learning over graphs: a message passing approach Convolutional networks on graphs for learning molecular fingerprints Simon A Forbes, David Beare, and Harry Boutselakis et al. COSMIC: somatic cancer genetics at high-resolution Association rule mining under incomplete evidence in ontological knowledge bases Automatic chemical design using a data-driven continuous representation of molecules 3D printing in pharmaceutics: A new tool for designing customized drug delivery systems SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines Drug-target affinity prediction using graph neural network and contact maps Junction tree variational autoencoder for molecular graph generation Highly accurate protein structure prediction with AlphaFold A transformer model for retrosynthesis A machine learning approach towards the prediction of proteinligand binding affinity based on fundamental molecular properties Grammar variational autoencoder BioCreative V CDR task corpus: a resource for chemical disease relation extraction Neural multi-hop reasoning with logical rules on biomedical knowledge graphs Overview of the detection methods for equilibrium dissociation constant KD of drug-receptor interaction Mol-CycleGAN: a generative model for molecular optimization Automated docking with grid-based energy evaluation Efficient estimation of word representations in vector space Discovering protein drug targets using knowledge graph embeddings Textpresso Central: A customizable platform for searching, text mining, viewing, and curating biomedical literature GEFA: early fusion approach in drug-target affinity prediction DeepDTA: deep drug-target binding affinity prediction Preclinical validation of therapeutic targets predicted by tensor factorization on heterogeneous graphs DeepWalk: Online learning of social representations Graph memory networks for molecular activity prediction Graph contrastive coding for graph neural network pre-training Self-supervised graph transformer on large-scale molecular data New chemical descriptors relevant for the design of biologically active peptides. a multivariate characterization of 87 amino acids Diagnosing the decline in pharmaceutical R&D efficiency A graph to graphs framework for retrosynthesis prediction GraphAF: a flow-based autoregressive model for molecular graph generation Development and evaluation of a deep learning model for protein-ligand binding affinity prediction The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets Comparison study of computational prediction tools for drug-target binding affinities In silico trials: Verification, validation and uncertainty quantification of predictive models used in the regulatory evaluation of biomedical products A unified drug-target interaction prediction framework based on knowledge graph and recommendation system The AI index 2021 annual report Predicting drug-protein interaction using quasi-visual question answering system Optimization of molecules via deep reinforcement learning Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks