key: cord-0000000- authors: Deng, Lei; Huang, Yibiao; Liu, Xuejun; Liu, Hui title: Graph2MDA: a multi-modal variational graph embedding model for predicting microbe-drug associations date: 2021-08-14 journal: nan DOI: nan sha: 714e55ea0e31133fa8164843cb54263014f2bbee doc_id: y7zbivx8 cord_uid: Accumulated clinical studies show that microbes living in humans interact closely with human hosts, and get involved in modulating drug efficacy and drug toxicity. Microbes have become novel targets for the development of antibacterial agents. Therefore, screening of microbe-drug associations can benefit greatly drug research and development. With the increase of microbial genomic and pharmacological datasets, we are greatly motivated to develop an effective computational method to identify new microbe-drug associations. In this paper, we proposed a novel method, Graph2MDA, to predict microbe-drug associations by using variational graph autoencoder (VGAE). We constructed multi-modal attributed graphs based on multiple features of microbes and drugs, such as molecular structures, microbe genetic sequences, and function annotations. Taking as input the multi-modal attribute graphs, VGAE was trained to learn the informative and interpretable latent representations of each node and the whole graph, and then a deep neural network classifier was used to predict microbe-drug associations. The hyperparameter analysis and model ablation studies showed the sensitivity and robustness of our model. We evaluated our method on three independent datasets and the experimental results showed that our proposed method outperformed six existing state-of-the-art methods. We also explored the meaningness of the learned latent representations of drugs and found that the drugs show obvious clustering patterns that are significantly consistent with drug ATC classification. Moreover, we conducted case studies on two microbes and two drugs and found 75%-95% predicted associations have been reported in PubMed literature. Our extensive performance evaluations validated the effectiveness of our proposed method. 185133 Microbe communities, including bacteria, archaea, viruses, protozoa, and fungi, are closely associated with human hosts (Huttenhower et al., 2012, Sommer and Backhed, 2013) . They reside in various human organs, such as skin, gastrointestinal tract, oral cavity, and other tissues. Accumulated studies have confirmed that microbe plays a fundamental role in keeping the homeostasis of the human internal environment. Their beneficial functions cover quite a few aspects, such as improvement of metabolism, synthesis of essential vitamins, resistance to pathogens, and enhancement of immunity (Ventura et al., 2009) . For example, gut microbes play a key role in modulating immune response or immune tolerance through Treg cell regulation (Malla et al., 2018) . Also, the microbiome acts in collaboration with host mucosal sites as a barrier to pathogens (Macpherson and Harris, 2004) . As such, imbalance of microbe communities often leads to various diseases, such as diabetes (Wen et al., 2008) , obesity (Zhang et al., 2009) , and even cancer (Schwabe and Jobin, 2013) . Also, several studies have shown that microbes involve in drug absorption and metabolism, and thus modulate the drug efficacy and drug toxicity (Zimmermann et al., 2021) . For instance, the gut Actinobacterium Eggerthella lenta is responsible for the inactivation of the cardiac drug digoxin (Haiser et al., 2013) . The vaginal microbiota is effective at degrading tenofovir before the host cell converts the drug into its drug active form (Klatt et al., 2017) . On the other hand, drugs also, in turn, change the diversity and function of microbe communities living in the human body. Some new drugs have been developed to target the microbes to maintain the fitness of microbial community structure (Kashyap et al., 2017) . But everything has two sides, the interplay between microbes and drugs leads to the spread of drugresistant bacteria, which poses another serious threat to human health (Sutradhar et al., 2021) . As a result, there is an urgent need to identify the associations between microbes and drugs. As conventional wet-lab assay is time-consuming and labor-intensive, in silico methods have been proposed to prioritize microbe-drug associations for further experimental validation. HMDAKATZ adopted KATZ metric to prediction microbe-drug associations, while NTSHMDA (Luo and Long, 2020) used random walk with restart on microbe-disease heterogeneous network to predict new associations. Recently, several databases have released a large number of experimentally validated associations of microbes with drugs, such as MADA (Sun et al., 2018) , Abiofilm (Rajput et al., 2018) and Drugvirus (Andersen et al., 2020) , as well as the COVID-19 database HDVD (Meng et al., 2021) . These freely accessible datasets motivated the development of deep learning-based methods to predict microbe-drug associations. For example, EGATMDA (Long et al., 2020b) constructed a novel integrated graph attention network to predict human microbe-drug associations. GCNMDA (Long et al., 2020a) engineered conditional random field into the graph convolutional network framework to predict microbe-drug association. LAGCN (Yu et al., 2021 ) is another computational model based on layered attention map convolutional network. The studies demonstrated their progress on the predictive performance compared to traditional classifiers, but they still have limitations that seriously restrict their effectiveness. These models constructed heterogeneous networks and extract features to predict new associations. However, they failed to tackle outlier nodes in the feature extraction, which often resulted in incorrect predictions. Also, these models take into account either network structural information or node features, but failed to deal with both simultaneously. In this paper, we proposed a novel model, Graph2MDA, which is an integrated framework of Variational Graph Autoencoder (VGAE) and Deep Neural Network (DNN) to prioritize microbe-drug association. To make full use of multiple types of attributes that can be derived from microbes and drugs, we constructed multi-modal attribute graphs composed of microbes and drugs and their associations. For drugs, the molecular structure, drug interaction profile, and network topological attributes were taken into account. For microbe, the genomic sequence and functional annotations are considered. Based on the multi-modal attribute graphs, we trained VGAE to learn informative and interpretable latent representations of each node and whole graph. Next, a DNN classifier is fed by the learned latent representation and outputs the probability standing for the presence of microbe-drug associations. We have performed extensive performance evaluations and demonstrated that Graph2MDA can effectively leverage multi-modal attribute networks to improve its performance. On other independent datasets, Graph2MDA outperformed the other six state-of-the-art methods. Especially, we found that the drugs show obvious clustering patterns in the latent representation space, and the clusters are significantly consistent with drug ATC classification. This validated the interpretability of the learned latent representations. Finally, our case studies on two microbes (i.e. human immunodeficiency virus, Mycobacterium tuberculosis) and two drugs (i.e. Cloxacillin and Aloe Vera Gel) also proved the effectiveness and robustness of our proposed model. To our best knowledge, we are the first to apply VGAE to the prediction of microbe-drug associations. The experimentally confirmed microbe-drug associations are obtained from the MDAD database (Sun et al., 2018) . The database has collected a total of 2,470 clinical reports or experimentally verified associations between 1,373 unique drugs and 173 kinds of microbes. The drug-drug interactions are retrieved from the Drugbank database (Wishart et al., 2018) . We selected the interactions associated with the drugs included in the MDAD dataset and got 5,586 drug-drug interactions covering 1,228 drugs. The microbe-microbe interactions are derived from the MIND database 1 . Similarly, we selected only the interactions associated with the microbes included in the MDAD dataset and got 138 microbe-microbe interactions covering 123 microbes. The FASTA genome sequence of microbes is downloaded from the NCBI database, and 131 of the 173 microbes are found. The detailed numbers of the aforementioned datasets are shown in Table 1 . Given the data source of drugs and microbes, we built the bipartite network (BipNet) using only the microbe-drug associations. Specifically, the nodeset of the BipNet network includes only the microbes and drugs involved in known associations, as shown in Figure 1 . Also, we constructed the heterogeneous network (HetNet) that includes microbe-drug association, as well as the drug-drug and microbe-microbe associations. Note that the HetNet network contains additional nodes that are not appear in the BipNet network. Without loss of generality, denote by A ∈ R nd×nm the adjacent matrix of the network, in which nd and nm represent the number of microbes and drugs, respectively. If there exists a known association between node i and j, element A ij is 1, and otherwise 0. Molecular structure similarity We used SIMCOMP2 (Hattori et al., 2010) tool to calculate drug structural similarity. SIMCOMP2 measures the similarity between drugs based on their chemical structure information. We computed the pairwise drug structural similarity and then constrcted the similarity matrix DS struct ∈ R nd×nd , where DS struct (d i , d j ) represents the molecular structural similarity between drug d i and drug d j . Drug gaussian kernel similarity With the assumption that drugs with similar therapeutic effects interact closely with similar microbes, we used the Gaussian kernel interaction of drugs to calculated another similarity measure. Denote by DIP (d i ) the drug-drug interaction profile of drug d i , namely the i-th row of drug-drug interaction matrix representing the interactions between drug d i and all other drugs, the Gaussian kernel similarity DS gauss (d i , d j ) between drug d i and drug d j is formulated as below: in which µ represents the normalized kernel bandwidth, defined as follows: Where µ is the original bandwidth and is generally set to 1. Fused drug similarity We combined the molecular structure similarity DS struct (d i , d j ) and the drug Gaussian kernel similarity DS gauss (d i , d j ) to get the integrated drug similarity S d (d i , d j ), and the new drug similarity is calculated as follows: Since several drugs are far from the majority in terms of the similarity measure, we ran random walk with restart (RWR) on the drug-drug network to derive another drug attribute. RWR can effectively capture the inherent features of the local and global topological information of a network and has been widely used in image recognition to reduce noise (Jain et al., 2018) . Formally, RWR is defined as: in which θ is the restart probability, and T is the transition probability matrix, p (0) i ∈ R n×1 represents the starting probability vector of the i-th node, and p (t) i ∈ R n×1 denotes the probability of node i moving to other nodes at time t. After convergence of RWR, we obtain the probability distribution vector of each drug. As such, the probability distribution vectors are used as the network topological attribute matrix F d ∈ R nd×nd . With the integration of structural feature, the attribute of outlier nodes is effectively enriched, and lead to informative and interpretable latent representations. We used the Kamneva tool (Kamneva, 2017) to calculate the microbe functional similarity. To obtain the functional similarity between the two microbes, a microbial protein-protein functional association network was constructed, in which the nodes represent any gene family encoded by the genome, and the link represents a genetic neighbor score based on the latest STRING database (Szklarczyk et al., 2021) . Then, the functional similarity of the microbe was calculated as the ratio of the link scores connecting the two microbes to the sum of all the link scores of the two microbial gene families. Finally, a matrix Sm ∈ R nm×nm is used to represent the microbe function similarity, where Sm(m i , m j ) represents the similarity between microbe m i and microbe m j . The original genome sequences were encoded by one-hot coding, and all sequences were filled with 0 to make the length of all sequences the same. For microbes with no sequence found in NCBI, we use the mean of other known microbes instead. Finally, we ran principal component analysis (PCA) to extract the main dimensional feature of microbes (Weber et al., 2016) . The microbe sequence attribute is expressed by the k-dimensional matrix as Fm ∈ R nm×k . To make the descriptive values of microbes (drugs) comparable across multiple types of measures, we separately normalized the similarity-based attribute matrix S d and topology-based attribute matrix F d for drugs, as well as the Sm and Fm for microbes. Then, we constructed multi-modal attributes for microbes and drugs. In fact, we could construct some modals by combining the different attributes above. For simplicity, we consider only two different modals. The first is merely based on the similarity measure, denoted by X similarity ∈ R (nd+nm)×(nd+nm) and defined as below: The other modal integrated the secondary features extracted from the drug-drug interaction network, and the main component of microbe sequence. So, it is referred to as secondary attribute, denoted by X secondary ∈ R (nd+nm)×(nd+k) : The two different attributes of drugs and microbes are separately or combinedly taken as input into the VGAE learning framework, and we evaluate their influence on the predictive performance in our experiments. We proposed a novel model, Graph2MDA, an integrated framework based on Variational Graph Autoencoders (VGAE) and Deep Neural Networks (DNN), to predict the association between microbes and drugs in multi-mode networks. As shown in Figure 1 , VGAE is used to learn the latent representations (embedding) by taking as input the graph and node attributes constructed from raw data. The deep neural network (DNN) classifier receives the learned embedding by the VGAEs to predict microbe-drug associations. Note that we used two VGAE when BipNet and HetNet are simultaneously feed, and the output embedding vectors are concatenated to form the input of the DNN classifier. Variational Graph Autoencoder(VGAE) (Kipf and Welling, 2016 ) is a an unsupervised learning framework based on variational autoencoder (VAE). VAE (Kingma and Welling, 2014) is a deep generative model in the combination of variational Bayesian inference and neural networks, and it provides a generic probability modeling for describing a variable in latent space. VGAE extends VAE to learn interpretable latent representations of graph-structured data. VGAE uses a graph convolutional network (GCN) as the backbone of the encoder to learn the latent node-level and graphlevel representations, which is used by the decoder to reconstruct the graph. VGAE has made great achievements in many bioinformatics tasks, such as miRNA-disease association prediction (Ding et al., 2021) , lncRNAdisease association prediction (Shi et al., 2021) . However, VGAE has not been applied to predict microbe-drug associations, and current studies did not make full use of node similarity and topological attributes. represents the set of microbe and drug nodes, and each edge e ij ∈ E represent an association between a pair of different type nodes (microbe and drug) or same type nodes (two microbes or two drugs). Denote the adjacency matrix of G by A, and the attribute matrix by X served by X similarity or X secondary or a combined form of both. As shown in Figure 1 , the GCN encoder converts each node v i in the graph to a low-dimensional latent representation by aggregating the Based on the multi-modal attribute graphs constructed from multiple type of attributes of microbes and drugs, two VGAE models are trained to learn informative and interpretable latent representations for node attributes and whole graphs. Next, a DNN classifier is fed by the learned latent representations to predict microbe-drug associations. informations from its local neighborhood nodes and the attributes itself. Next, the decoder tries to reconstruct the adjacency relationship A ij corresponding to the node pair v i and v j . The encoder and decoder are trained to optimize the latent representations by progressive message propagation among the microbes and drugs in the heterogeneous network. Finally, the learned latent representations are fed into a DNN classifier to predict new microbe-drug associations. Encoder The encoder is a two-layer graph convolutional network (GCN) (Kip and Welling, 2016), which takes the network adjacency matrix A and the node attribute matrix X as input and output the latent variable Z. More specifically, we tried to model the conditional probability distribution q(Z|X, A) using Normal distribution by a two-layer GCN as below: q(Z|X, A) = N (Z; µ, σ 2 I) in which µ and σ are the mean and variance of the Gaussian distribution with respect to the latent variable Z, respectively. I is an identity matrix. The two-layer GCN is defined as follows: in which and W i represents the parameters of i-th GCN layer we need to train, ReLU = max(0, •) is an element-wise activation function, D is the degree matrix of A, and A is a symmetric normalized adjacency matrix. A is first normalized by symmetric normalization to maintain the size of the eigenvector so that the sum of all rows is 1. With the GCN output, we calculated the mean and variance of the Gaussian distribution as: where Wµ and Wσ represent the parameters of layer µ and layer σ we need to train. Once we obtain the value of µ and σ, Z can be sampled from the distribution q(Z|X, A) using the reparameterization technique (Kingma and Welling, 2014) , that is, z i can be calculated by the following formula: Where ε i ∼ N (0, 1). Intuitively, the GCN encoder progressively aggregates information from neighbors (including both microbes and drugs) to update the attribute for each node, so that the latent variable of each node (microbe or drug) yields to an informative representation that integrates both structural and attributes information. This is particularly useful for microbes (or drugs) with few annotations. Decoder Considering that the latent variables Z learned by the encoder already carry sufficient information, the decoder runs a simple inner product to reconstruct the adjacency matrix A. Formally, let p(A ij |z i , z j ) be the conditional probability having an edge between node i and j given latent variable z i and z j , we have in which p (A ij |z i , z j ) is defined as where ϕ(•) is a sigmoid function. The sigmoid function we use transforms take as input the inner products to estimate the probability of associations between microbes and drugs. The decoder is trained to output close to the real adjacency matrix A. The loss function consists of two parts, the first part E( * ) is the binary cross entropy between input graph A and output graphÂ, and the second part is the KL divergence between q(Z|X, A) and p(A|Z): According to the setting in VAE, we assume p(Z) ∼ N (0, 1). During training, we used stochastic gradient descent to train VGAE to minimize the loss function. We formulated the prediction of the association between microbes and drugs into a binary classification task. A multi-layer fully connected DNN is used as the classification model. The DNN classifier takes as input the latent representations of microbes and drugs, transforms the information through multiple non-linear hidden layers, and outputs the probability standing for the existence of the association between two nodes. The cross entropy is used as the loss function, which is first pretrained using the adaptive optimizer Adam, and then fine-tuned using the Stochastic Gradient Descent (SGD) optimizer. As the performance of the DNN classifier is affected by the hyperparameters, we empirically evaluated the impact of hyperparameters on the performance shown in detail in the subsequent experiments.According to the setting in VAE, we assume p(Z) ∼ N (0, 1). During training, the stochastic gradient descent algorithm was used to train VGAE to minimize the loss function. There are several hyperparameters in the VGAE and DNN classifier. To explore the impact of the hyperparameters, we performed a 10-fold CV on the MDAD dataset to observe the changing trend of performance so as to tune their values. We first investigated the effect of the number of hidden layers of the VGAE encoder and DNN classifier. As too many hidden layers lead to overfitting and meaningless latent representations, we considered most 4 hidden layers in the VGAE encoder and DNN classifier. We adopt a tower-shaped DNN structure, as suggested by He et al. that the towershaped DNN model can extract more compact and discriminative features at a higher level (He et al., 2016) . We tested the number of hidden layers in {1,2,3,4}, and the size of the hidden layer is set to 1024, 512, 256, 128, respectively. As shown in Figure 4 (a), the AUC value increases when the number of hidden layers of the DNN classifier increases, achieves the best performance at 3, and decreased once the layer number exceeds 3. Thus, we choose a three-layer tower structure for the DNN classifier model. As shown in figure 4 (b) , when the number of VGAE hidden layers is 2, the performance is the highest and drops sharply once the layer number exceeds 2. Moreover, we explored the impact of dropout and learning rate (lr) in the VGAE encoder. As shown in Figures 4 (c) and (d), when the dropout probability is greater than 0.5, the performance decreases slightly because too much high dropout will prevent information from spreading between hidden layers. We also noticed that too low or too high lr will cause performance degradation, and the AUC is highest when lr is 0.0005. Therefore, we set dropout to 0.5 and lr to 0.0005 for the VGAE encoder. To benchmark the performance of our proposed method, we compared our method with existing methods proposed for microbe-drug association prediction, as well as a few methods for link prediction problems in the bioinformatics field. We briefly summarized these methods as below: • EGATMDA (Long et al., 2020b) is proposed to predict human microbe-drug associations by constructing a novel integrated graph attention network. • GCNMDA (Long et al., 2020a ) is a method based on graph convolutional network and conditional random field to predict human microbe-drug association. • HNERMDA (Long and Luo, 2021) proposes an embedding representation based on heterogeneous networks for predicting human microbe-drug associations. • HMDAKATZ is developed specifically for microbial drug prediction based on the KATZ metric. • NTSHMDA (Luo and Long, 2020) adopts random walk with restart model to predict microbe-disease associations. • LAGCN (Yu et al., 2021) is a model based on the convolutional neural network with attention mechanism for predicting drug-disease associations. We ran these methods on the MDAD dataset with their default parameter settings, and adopt the AUC value as the performance measure. We performed 10-fold cross-validation (CV) for all methods, that is, the experimentally confirmed microbe-drug associations are randomly divided into 10 subsets with equal size, each subset is taken in turn as a test set and the remaining are used to train the model. The average prediction accuracy over the 10-folds is used as the final performance measure. To eliminate the bias of random sampling, the process is repeated 10 times, and the final AUC score was calculated from the average of 10 repetitions. As shown in the figure 2, our Graph2MDA model achieved the highest AUC value of 0.9732, followed by LAGCN with 0.9645 AUC value, while NTSHMDA gets the lowest AUC value 0.8893. Overall, the deep learning-based methods outperform those based on traditional machine learning. We went further to conduct ablation experiments to evaluate the impact of multi-modal attribute graphs on model performance. First, we tested three different input graphs, including bipartite network, heterogeneous network, and both of them, to perform ablation studies on the MDAD dataset. Specifically, the bipartite network includes only microbe-drug associations, while the heterogeneous network includes microbe-drug, drug-drug, and microbe-microbe associations. Table 2 shows the AUC values achieved by our model upon three different input graphs. For comprehensive contrast, we ran both 5-fold and 10-fold cross validations. We found that the best performance could be achieved when both networks are fed into the model, suggesting that the combination of multisource information facilitates the feature extraction for the prediction of associations between microbes and drugs. We also evaluated the robustness of our model upon multi-modal node attributes. Under the premise of using the BipNet+HetNet network, we independently tested the similarity attribute, secondary attribute, and both of them. Note that when both type attributes are used, the attribute vector is concatenated for each node. The AUC values gained by 5-fold and 10-fold cross validations are shown in Table 3 , and the best performance can be achieved when both node attributes are fed into the model. Meanwhile, we found that the secondary attributes also obtain notably high performance alone. We think the reason may lie in that the secondary attributes contain high-order information. To verify the generality of our model, we compared our method with the competitive methods on other two independent datasets, aBiofilm (Rajput et al., 2018) , DrugVirus (Andersen et al., 2020) . The aBiofilm database contains 1,720 unique anti-biofilm drugs that target more than 140 microbes. After filtering out the duplicates, we obtained 2,884 microbedrug associations covering 1,720 drugs and 140 microbes. The DrugVirus database has manually collected associations between 175 drugs and 95 human viruses from the drug database and related publications, there are 933 clinically or experimentally confirmed drug-virus associations. The detail of these two datasets is shown in Table 4 . Similarly, we ran all competitive methods using their default parameter settings on aBiofilm and DrugVirus datasets, and adopt the AUC value as the performance measure. As the DrugVirus dataset is relatively small, we performed a 5-fold cross-validation to avoid too small test set. Figure 3 shows the performance of Graph2MDA and six competitive methods. Compared with other methods, our model still showed the best performance in the two independent datasets. The performance evaluation shows that Graph2MDA is an effective and powerful computational model for predicting the association between microbes and drugs. Our model can learn high-level feature representations from the original input graph and node attributes. To mine the interpretability, we visualized the learned latent representations of drugs in the MDAD dataset using t-SNE(Hinton, 2017), which is a tool to realize high-dimensional data visualization by embedding high-dimensional features into twodimensional (2D) images. As shown in Figure 4 (a) , the scatter plot shows the distribution of drugs using the original attributes. It can be seen that before training the drug distribution is in chaos, while with the learned latent representations by our model, the drugs show clear clustering patterns, as shown in Figure 4 (b). To further explore the meaningness of the latent representations, we used the Anatomical Therapeutic Chemical (ATC) code of drugs to verify the consistency between the clustering patterns and ATC classification (Nahler, 2009) . The ATC code is assigned to medicine according to the organ or system it works on and how it works. We manually curated the ATC codes of the 1,373 drugs in MDAD dataset, and successfully map 269 drugs with ATC codes. According to the ATC classification system, drugs belonging to different categories are marked in different colors. For comparison, the drugs without ATC codes are also represented in gray, as shown in Figure 5 (c). We found that the yellow scattered spots (yellow corresponding to anti-infectives systemic use) obviously gather together and separate from other clusters. This observation confirms that the learned features of drugs are informative and interpretable, and we make a bold guess that our model can effectively learn the feature that can be translational to drug pharmacological functions. To further verify the effectiveness of Graph2MDA, we applied Graph2MDA to two popular antibacterial drugs, Aloe Vera Gel and Cloxacillin, and two microbes, human immunodeficiency virus (HIV) and mycobacterium tuberculosis as our case studies. For the top 20 predicted microbes or drugs, we cross-checked their synonyms by searching MeSH and DrugBank, and then verify whether the predicted microbe-drug associations have been reported by searching PubMed literature. Cloxacillin is a semi-synthetic penicillin antibiotic and is widely used to treat β-hemolytic streptococcal and pneumococcal infections as well as staphylococcal infections. Cloxacillin has an inactivation effect on the penicillinase of most staphylococci and is active against many strains of Staphylococcus aureus and Staphylococcus epidermidis that produce penicillinase (Castle, 2007) . For the top 20 predicted microbes of cloxacillin, we also found that 15 microbes (75%) have been studied, as shown in Table 5 . For example, Saengsai et al. studied the antimicrobial activity of cloxacillin against the iridoid lactone Plummericin against Enterococcus faecalis and Bacillus subtilis (Saengsai et al., 2015) . Orogade and Akuse also reported that cloxacillin can inhibit most to 50% activity of Staphylococcus aureus, Schistosoma, and Salmonella Typhi. The drug Aloe Vera Gel has long been used as a traditional medicine to induce wound healing. It is a natural product and is now used in the cosmetics industry. Although there are multiple instructions for use, the benefits of aloe vera are attributed to the polysaccharides contained in the leaf gel (Gupta and Malhotra, 2012) . Kaithwas et al. also studied the use of aloe vera in constipation, inflammation, cancer, ulcers, and diabetes (Kaithwas et al., 2014) . As a result, out of the 20 microbes predicted to be targeted by Aloe Vera Gel, 16 microbes (80%) have been reported by previous studies. As shown in Table S1 , we list the name of microbes and the PMIDs of the publications that reported the associations between the microbes and Aloe Vera Gel. For example, Fani M et al. investigated the inhibitory activity of aloe vera gel against certain cariogenic bacteria (Streptococcus mutans), periodontal bacteria (Aggregatibacter actinomycetemcomitans, porphyromonas gingivalis), and opportunistic periodontal bacteria (Bacteroides fragilis) isolated from patients with dental caries and periodontal disease (Fani and Kohanteb, 2012) Mycobacterium tuberculosis is a gram-positive aerobic bacteria. This bacterium can invade all organs of the body, but it is the most common cause of tuberculosis. Among the top 20 drugs predicted to target mycobacterium tuberculosis, 80% of drugs are supported by the literature. Table 6 shows the drug names and PMIDs of publications. For example, in PMID29311078, the absolute concentration method showed that the activity of Amikacin against Mycobacterium tuberculosis was higher than kanamycin and capreomycin (Dijkstra et al., 2018) . Apramycin is a unique aminoglycoside with antibacterial activity and ototoxicity. In two murine models of infection, Apramycin has shown significant antibacterial efficacy to Mycobacterium tuberculosis and Staphylococcus aureus [Meyer et al., 2014] . The human immunodeficiency virus (HIV) is a virus that can attack the human immune system. HIV will induce the failure and functional damage of key T cells, which will lead to dysfunction and defects of the entire immune system, and ultimately lead to opportunistic infections and tumors (Stingl, 1988) . We checked the top 20 predicted drugs and found 95% of them have been already confirmed, as shown in Table S2 . Take Erythromycin as an example, it has been reported to inhibit HIV-1 replication in macrophages through modulation of MAPK activity to induce small isoforms of C/EBPβ (Komuro et al., 2008) . In this paper, we propose a novel model based on Graph2MDA, an integrated framework of Variational Graph Autoencoder(VGAE) and Deep Neural Networks (DNN), to predict the correlation between microbes and drugs. The experimental results show that our proposed Graph2MDA model is better than the existing state-of-the-art methods. We think the main contributions of our work lie in at least three aspects: 1) We constructed multi-modal attribute graphs to effectively integrate the rich similarity and ontology information of microbes and drugs. 2) A novel framework based on VGAE and DNN was proposed to predict the association between microbes and drugs. Compared with current state-of-the-art methods for predicting microbe-drug association based on graph neural networks and link prediction, our method achieves better performance than competitive methods on three independent datasets. 3) We verified that the learned latent representations are sematically related to drug pharmacological functions. Specifically, we found the drugs show obvious clustering patterns in the latent representation space, and the clusters are significantly consistent with drug ATC classification. Discovery and development of safein-man broad-spectrum antiviral agents Cloxacillin -sciencedirect. xPharm: The Comprehensive Pharmacology Reference A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases Susceptibility of Mycobacterium tuberculosis to Amikacin, Kanamycin, and Capreomycin Variational graph auto-encoders for miRNA-disease association prediction Inhibitory activity of Aloe vera gel on some clinically isolated cariogenic and periodontopathic bacteria Pharmacological attribute of Aloe vera: Revalidation through experimental and clinical studies Predicting and manipulating cardiac drug inactivation by the human gut bacterium Eggerthella lenta SIMCOMP/SUBCOMP: chemical structure search servers for network analyses Deep residual learning for image recognition Visualizing data using t-sne laurens van der maaten micc-ikat Structure, function and diversity of the healthy human microbiome Random walk-based feature learning for micro-expression recognition Evaluation of in vitro and in vivo antioxidant potential of polysaccharides from Aloe vera (Aloe barbadensis Miller) gel Genome composition and phylogeny of microbes predict their co-occurrence in the environment Microbiome at the Frontier of Personalized Medicine Semi-supervised classification with graph convolutional networks Vaginal bacteria modify HIV tenofovir microbicide efficacy in African women Erythromycin derivatives inhibit HIV-1 replication in macrophages through modulation of MAPK activity to induce small isoforms of C/EBPbeta Association Mining to Identify Microbe Drug Interactions Based on Heterogeneous Network Embedding Representation Predicting human microbedrug associations via graph convolutional network with conditional random field Ensembling graph attention networks for human microbe-drug association prediction NTSHMDA: Prediction of Human Microbe-Disease Association Based on Random Walk by Integrating Network Topological Similarity Interactions between commensal intestinal bacteria and the immune system Exploring the Human Microbiome: The Potential Future Role of Next-Generation Sequencing in Disease Drug repositioning based on similarity constrained probabilistic matrix factorization: COVID-19 as a case study In vivo efficacy of apramycin in murine infection models anatomical therapeutic chemical classification system (atc) aBiofilm: a resource of anti-biofilm agents and their potential implications in targeting antibiotic drug resistance Antibacterial and Antiproliferative Activities of Plumericin, an Iridoid Isolated from Momordica charantia Vine The microbiome and cancer A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations The gut microbiota-masters of host development and physiology MDAD: A Special Resource for Microbe-Drug Associations Computational Model To Quantify the Growth of Antibiotic-Resistant Bacteria in Wastewater. mSystems The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets Genome-scale analyses of healthpromoting bacteria: probiogenomics Associations between explorative dietary patterns and serum lipid levels and their interactions with ApoA5 and ApoE haplotype in patients with recently diagnosed type 2 diabetes Innate immunity and intestinal microbiota in the development of Type 1 diabetes DrugBank 5.0: a major update to the DrugBank database Predicting drug-disease associations through layer attention graph convolutional network Human gut microbiota in obesity and after gastric bypass Towards a mechanistic understanding of reciprocal drug-microbiome interactions This work was supported by the National Natural Science Foundation of China under grants No. 61972422 and No. 62072058. Conflict of interest: none declared.