key: cord-0288651-kj1hh3xa
authors: Beshartifard, Milad; Ghorbanali, Zahra; Zare-Mirakabad, Fatemeh
title: The assessment of similarity vectors of fingerprint and UMLS in adverse drug reaction prediction
date: 2021-11-16
journal: bioRxiv
DOI: 10.1101/2021.11.15.468509
sha: 7b5f7d675d9d53de27347c33a7fbd4180f8ecb59
doc_id: 288651
cord_uid: kj1hh3xa

Identifying and controlling adverse drug reactions is a complex problem in the pharmacological field. Despite the studies done in different laboratory stages, some adverse drug reactions are recognized after being released, such as Rosiglitazone. Due to such experiences, pharmacists are now more interested in using computational methods to predict adverse drug reactions. In computational methods, finding and representing appropriate drug and adverse reaction features are one of the most critical challenges. Here, we assess fingerprint and target as drug features; and phenotype and unified medical language system as adverse reaction features to predict adverse drug reaction. Meanwhile, we show that drug and adverse reaction features represented by similarity vectors can improve adverse drug prediction. In this regard, we propose four frameworks. Two frameworks are based on random forest classification and neural networks as machine learning methods called F_RF and F_NN, respectively. The rest of them improve two state-of-art matrix factorization models, CS and TMF, by considering target as a drug feature and phenotype as an adverse reaction feature. However, machine learning frameworks with fewer drug and adverse reaction features are more accurate than matrix factorization frameworks. In addition, the F_RF framework performs significantly better than F_NN with ACC = %89.15, AUC = %96.14 and AUPRC = %92.9. Next, we contrast F_RF with some well-known models designed based on similarity vectors of drug and adverse reaction features. Unlike other methods, we do not remove rare reactions from the data set in our frameworks. The data and implementation of proposed frameworks are available at http://bioinformatics.aut.ac.ir/ADRP-ML-NMF/.

After the outbreak of coronavirus (SARS-CoV-2), according to studies and performed tests, the World Health Organization (WHO) issued an emergency use authorization for the drug hydroxychloroquine, which was canceled shortly afterward [1] . One of the reasons was the presence of a rare adverse reaction for this drug that caused heart disorders (cardiotoxicity) [2] [3] . However, after its widespread use, this drug also caused many deaths [2] .

In addition, it has been reported that about 26% of the people are admitted, only in a single hospital in southern India, due to adverse drug reactions (ADRs) [4] . Moreover, drug toxicity is common among children [5] caused to be hospitalized in about 300 children with an average age of 5 years in a medical center in the Netherlands due to ADRs. Such problems show the importance of various drug assessments before a drug is produced and released to the market, considering its time-consuming and costly process.

Nevertheless, sometimes monitoring a drug after launch shows some rare adverse reactions, and it is caused to be withdrawn after a few years, e.g., Rosiglitazone [6] . In May 2007, after examining data from a clinic, it was found that taking Rosiglitazone had a significant effect on the deaths caused by cardiovascular diseases. Eventually, the drug was first withdrawn in Europe, and then in the same year, severe restrictions were imposed on its use in the United States [7] .

Regardless of laboratory studies performed at various stages of drug production to identify its adverse reactions, this strategy is still not effective enough to solve the ADR problem.

Therefore, there is a severe need to diagnose ADRs accurately. In this regard, researchers are interested in approaching the ADR problem by computational methods. Nowadays, recommender systems [8] and machine learning methods [9] [10] have been common computational models for ADR prediction.

A recommender system can predict whether a user prefers an item based on its profile [11] . This technique is also used in the ADR problem. Drugs and adverse reactions are assumed as users and items, respectively. In other words, the recommender system predicts whether a drug has an adverse reaction based on the drug profile [12] . Galeano [13] introduced a collaborative filtering recommendation system to predict an adverse reaction for a new drug using known similar drugs. Matrix factorization [14] is a class of collaborative filtering recommendation systems. Poleksic et al. [15] used the compressed sensing (CS) model as a matrix factorization to predict unknown relationships between drugs and adverse reactions. In addition, the CS model is an appropriate model for dealing with sparsity data. It is suitable for solving the ADR problem because known drug-adverse reactions (positive data) are less than unknowns. In this method, the latent preference of drugs and adverse reactions is computed in a lower dimension by minimizing the defined loss function to complete the drug-adverse reaction associations.

Also, later in 2020, Guo et al. [12] recovered drug-adverse reaction matrix using triple matrix factorization (TMF) model based on calculating the similarity between drugs and adverse reactions with different features.

In addition to recommender systems, various machine learning methods are also very effective in solving this problem. Chen et al. [16] predicted the possible likelihood of that drug being associated with adverse effects for each drug. In this method, the similarity of two drugs is calculated based on the interaction of each drug with other drugs and target proteins. Finally, this algorithm uses the other drugs with this adverse reaction to calculate the score of a drugadverse reaction association based on the relationship between the drugs and calculating the likelihood.

Khan [17] applied different learning models for predicting ADRs such as neural networks, support vector machine, random forest, naive Bayes, using different drug features like fingerprint and drug indications. He limited the SIDER database [18] based on ten adverse reactions with the maximum variance across the drugs. The negative and positive data are defined using the frequency of causing each side effect by a drug according to the recorded medical history of 30 patients in the SIDER database. Therefore, if the frequency of each drugadverse reaction is more than 0.5, its association is considered positive and vice versa. Finally, for each adverse reaction, it is created a classification model. Zhao et al. [19] proposed a different approach based on the similarity of drugs with different properties such as fingerprints, the two-dimensional structure of drugs, target proteins, ATC code, and some features from the STITCH database. Similarity vectors are considered as the input of the random forest classifier model. Also, the positive data is determined based on known adverse reactions of the drugs with label 1. The negative data is randomly selected from unknown adverse reactions of drugs with label 0. In addition, they chose adverse reactions that are associated with more than five drugs. Rodriguez et al. [20] proposed a Bayesian network approach to predict ADRs using 593 pharmaceutical care center reports. Dey [21] introduced a model to convert two-dimensional or three-dimensional drug structures into numerical vectors using convolutional neural networks (CNNs) for each adverse reaction. Zheng et al. [22] calculated the similarity of drugs based on various properties such as chemical structure, target protein, drug alternatives, and ATC code to form the feature vectors of each drug-adverse reaction association as the model input. Also, they introduced a new approach for selecting negative data. The negative data is chosen based on the assumption that dissimilar drugs have fewer common adverse reactions. Uner [23] proposed a learning method to solve the ADR problem based on CNNs using the structural features of the drug and the characteristics of gene expression. They also selected negative data from pairs of drugs and adverse reactions whose relationship is unknown. In 2020, Liang et al. proposed a new approach for making negative data by random walk on drug-drug interaction networks [24] . They removed adverse reactions associated with less than six drugs from the dataset. In 2021, It was shown that combining different data sources can improve the accuracy of learning models for the ADR problem [25] .

Zhang et al. [26] defined negative data based on drug-indication associations and applied a machine learning model to classify adverse reactions as adverse or therapeutic. This article proposes four frameworks to analyze drug features and adverse reaction features in drug-adverse reaction association prediction. Two frameworks are based on two machine learning methods; a random forest classifier[10] and a neural network [9] . The rest improve two well-known matrix factorization models; CS [14] and TMF [12] .

The first framework is F_RF to predict drug-adverse reactions based on a random forest classifier by considering drug-drug similarity vectors and the vector of adverse reaction similarity. F_RF approach is compared with the neural network-based framework, F_NN, to address the ADR problem. F_RF framework shows better performance than F_NN. Although we improve the CS model [15] into ℎ and TMF method [12] into three versions, , ℎ and ℎ as matrix factorization approaches, the result announces that the F_RF model has better accuracy than state-of-the-art matrix factorization methods. Therefore, we compare the F_RF framework with some well-known algorithms in machine learning approaches that consider similarity vectors as features.

It seems that defining negative data similar to Zheng's approach [22] and considering similarity vectors as drug and adverse reaction features improve the performance of F_RF. Moreover, unlike some methods [24] [19] that remove rare adverse-drug reactions, we consider all drugs and adverse reactions, including rare associations, as training data.

Although the limited association of rare adverse reactions can reduce the model's accuracy, the F_RF framework achieves comparable performance. Finally, our framework successfully predicts some associations in some case studies.

This paper is organized as follows: the "Materials and methods" section presents the databases and datasets, the notations and definitions, and a description of our proposed frameworks. The "Results" section includes assessing our frameworks and comparing results with other models.

The "Discussion" section shows that the F_RF framework successfully predicts adverse reactions for some case study drugs, and finally, the "Conclusion" represents the future point of the ADR problem.

This section introduces drug and adverse reaction datasets and databases and then defines some notations to describe the problem of the adverse drug reaction (ADR). Finally, the proposed model for ADR prediction is explained in more detail.

In the ADR problem, we need to extract drug and adverse reaction features in addition to known drug-adverse reaction associations from databases. In this paper, we generate three datasets called Δ 1 , Δ 2 , and Δ 3 from the SIDER database [18] , and apply Poleksic [15] and Mizutani [28] datasets as Δ 4 and Δ 5 , respectively.

We use the SIDER database [18] for collecting adverse reactions (CUI codes), drugs (CID codes), and drug-adverse reaction associations. We split the extracted adverse reactions and drugs from the SIDER database into three datasets, Δ 1 , Δ 2 and Δ 3 . Fig. 1 . shows that the first dataset includes adverse reactions, which are not more common in most drugs. Fig. 3 . indicates Δ 3 dataset contains rare adverse reactions. Moreover, we apply Poleksic [15] and Mizutani [28] datasets as Δ 4 and Δ 5 , respectively. Table   3 shows more details of these datasets. Meanwhile, we extract 17843 and 17418 phenotypes for the adverse reactions in Δ 4 and Δ 5 datasets from the CTD database, respectively.

The fingerprint of each drug as a binary vector with length 881 is extracted from PubChem [25] database . The directly interacted proteins with drugs of the Mizutani database are extracted from DrugBank [29] and Matador [30] databases. The number of these target proteins is equal to 1368.

The number of indications 

This part describes the selected biological features of a drug and an adverse reaction. Moreover, we define some notations for the feature representations.

A set of drugs is denoted by = { 1 , 2 , … , }, where ∈ shows the ℎ drug. Each drug ∈ is displayed by fingerprint chemical structure or protein targets as follows:

• The binary vector = [ 1 , … , 881 ] with length 881 represents fingerprint [31] . Each with value 1 or 0 represents the existence or absence of the ℎ substructure descriptor associated with a specific chemical feature, respectively.

• The binary vector Ʈ = [ 1 , 2 , . . . , 1368 ] with length 1368 shows target proteins. Each with value 1 or 0, represents the ℎ protein as a known target for drug or not, respectively.

To calculate the fingerprint chemical structure similarity of drugs , ′ ∈ , we use Gaussian Interaction Profile ( ) meter [12] [32] defined as follows:

where the bandwidth control parameter (γ) is assigned one [12] .

To measure the target protein similarity of drugs , ′ ∈ , we apply the cosine similarity ( ) criterion [12] , as follows:

For each drug ∈ and similarity function ∈ { , }, we define the similarity vector , with length , as follows:

where , shows the feature representation of drug .

The set of adverse reactions is denoted by = { 1 , 2 , … , }, where ∈ shows the ℎ adverse reaction. For each ∈ , phenotype or UMLS [33] [15] is considered as an adverse reaction feature:

• The phenotype of adverse reaction is shown by the binary vector = [ 1 , … . , 18058 ].

In our datasets, 18058 is the union of all extracted phenotypes and is considered 1 or 0 to show the existence and absence of the ℎ phenotype for adverse reaction . The phenotype illustrates each adverse reaction based on genetic ontology through biological processes, cellular components, and molecular functions [34] .

• The UMLS [33] includes over 100 medical terminologies with a unified and semantic network designed by the National Library of Medicine to support scientific research .

To calculate the phenotype similarity of adverse reactions, , ′ ∈ , we use the cosine similarity ( ) criterion as follows:

The UMLS similarity of adverse reactions , ′ ∈ is computed using UMLS-similarity software which is denoted by function ( , ′)[33] [15] .

For each adverse reaction ∈ and similarity function ∈ { ℎ , }, we define the similarity vector α , with length , as follows:

where α , indicates the feature vector of adverse reaction .

We assume that = { 1 , 2 , … , } and = { 1 , 2 , … , } represent drugs and adverse reactions. In the ADR problem, biological features of drug ∈ and adverse reaction ∈ are given to the model. The primary goal of the ADR problem is to predict the association between adverse reaction ∈ and drug ∈ . If the model predicts adverse reaction associated with drug , the output is one and otherwise zero.

For each drug , we use the similarity vector , , or , as a drug feature. For each adverse reaction , the similarity vector α , ℎ or α , is considered as an adverse reaction feature.

In this paper, four frameworks are proposed to solve the ADR problem. As machine learning approaches, the first and second frameworks use random forest[10] and neural networks [9] , respectively. The third and fourth ones improve CS [15] and TMF [12] as matrix factorization models for ADR prediction. In the following, we describe these frameworks in more detail.

The first and second frameworks are called F_RF and F_NN, respectively. In both models, the concatenation of the similarity vectors , and α , is given as input to predict drug-adverse reaction association. For each drug , the similarity vectors , is computed using the GIP function (see Eq.1). Meanwhile, the similarity vector α , is obtained by the UMLS function [33] for each adverse reaction . In F_RF and F_NN frameworks, we require positive and negative samples for training the model. Known drugadverse reaction associations and known drug-indication associations are considered positive and negative data, respectively (see Fig. 4 

The fourth framework is defined based on TMF [12] model as a matrix factorization approach for ADR prediction. This model combines different criteria to compute the similarity between two drugs , ′ ∈. We call this similarity ( , ′). Meanwhile, the original TMF uses drug profile as a feature to compute the similarity between two adverse reactions , ′ ∈ named ( , ′). Here, we define three different versions of TMF as follows:

• : The drug-drug similarity matrix of TMF is changed as:

where ( , ′ ) is obtained based on Eq. 2 to show the similarity between targets of drugs.

• ℎ : The similarity matrix between adverse reactions is developed as:

where ℎ is obtained based on Eq.4 to show the similarity between phenotypes of adverse reactions .

• ℎ : Both drug-drug similarity matrix and the similarity matrix between adverse reactions are improved by Eq.7 and Eq.8.

The matrix factorization models define positive and negative data based on known drugadverse reaction associations and unknown drug-adverse reaction associations, respectively. 

In this section, we evaluate the four proposed frameworks. The first and second models, F_RF and F_NN, are based on random forest classifier and neural network as machine learning methods. The third framework, ℎ , improvs CS model as a matrix factorization approach.

In the fourth framework, we define three versions of the TMF model, , ℎ and ℎ , as the matrix factorization model. Each framework was implemented in Matlab 2018b under Windows and Intel Core i5-2430M processor and 4GB of memory.

In the following, we introduce our selected evaluation criteria, then the parameters of each framework are explained. Next, we assess the performance of machine learning frameworks F_RF and F_NN and analyze their effectiveness to predict rare adverse reactions. Then, the assessment of frameworks 3-4, based on matrix factorization, is evaluated. Later, we compare our proposed frameworks to introduce the best one and determine its performance against four well-known models. Finally, we assess our best framework effectiveness on predicting associations of some case studies.

We evaluate our frameworks using the area under receiver operating characteristic curve respectively (see Table 4 ).

True Positive (TP) the number of known drug-adverse reaction associations predicted correctly by the model AUPRC shows the relationship between sensitivity (recall) and positive predictive value (precision), where:

ACC indicates the rate of correct prediction to all predictions as below:

The hyperparameters of F_RF, F_NN, ℎ , , ℎ and ℎ models, respectively, are defined as follows:

• The F_RF framework

In Matlab 2018, the random forest classifier is located in the package Statistics and Machine

Learning Toolbox 1 and has some hyperparameters which can be changed according to the problem. Here, we refer to the three most important ones:

o "MinLeafSize" shows the minimum observations (samples) per leaf, which is essential in dividing the nodes in the decision trees. By default, this parameter is 1 for classification. A smaller number of "MinLeafSize" makes the model more prone to capturing noise in the training data.

o "NumPredictorsToSample" means the number of predictor or feature variables to select at random for each decision split. By default, it is equal to the square root of the total number of variables for classification.

o "NumLearningCycles" variable represents the number of decision trees in the random forest.

As it can be seen in Table 5 • The F_NN framework

We use a neural network strategy using the back-propagation approach for learning to predict drug-adverse reaction associations. Our model contains an input layer (according to the size of features) followed by one fully connected hidden layer (with100 neurons) and an output layer that decides whether a drug and an adverse reaction are associated or not based on sigmoid function. The learning rate is 0.05, and we train the network about 100 iterations. Our activation function for the hidden layer is tan-sigmoid, and errors are backpropagated according to Scaled

Conjugate Gradient (SCG) strategy.

• Improved CS and TMF models

The parameters of these two models are not changed and set according to the original models [12] [15].

In this subsection, we assess F_RF and F_NN frameworks on each dataset ∆ ∈ {Δ 1 , Δ 2 , Δ 3 }. Moreover, we compare our models to find which one is more accurate on rare adverse reactions.

3.3.1. The assessment of F_RF and F_NN frameworks on ∆ 1 , ∆ 2 and ∆ 3 datasets

For each dataset ∆ ∈ {Δ 1 , Δ 2 , Δ 3 } (see Table 2 ), we consider ∆=< , > where D and A indicate drug and adverse reaction sets. All known drug-adverse reaction associations are considered as positive data ( ∆ ). Similar to Zhang et al. [26] , we suppose that if a drug is prescribed for one of the adverse reactions, it does not cause that adverse reaction. According to this assumption, we extract drug-indication associations as negative data ( ∆ ).

To form a test set from the dataset, we randomly select 10% of negative data ( ∆ ) and the exact size of positive data ( ∆ ). The rest of the positive ( ∆ = ∆ − ∆ ) and negative data ( ∆ = ∆ − ∆ ) are considered as the training data (see Table 6 ).

As it can be seen in Table 6 , the training data extracted from Δ 1 dataset is imbalanced because the number of negative data is less than positive data. We balance this training data by oversampling strategy to repeat the negative data.

For each dataset ∆, we train F_RF and F_NN frameworks on ∆ ⋃ ∆ and test on ∆ ⋃ ∆ . Table 7 shows the values of corresponding evaluation criteria on the test set.

Both frameworks have better performance on Δ 1 dataset because the number of training data in 24239  1498  24089  1348  150  150  4591  3788  4212  3409  379  379  3017  2786  2738  2507 279 279 this dataset is more than the others with common adverse reactions. The average accuracy on all evaluation criteria on F_RF is more than in the F_NN framework. 3.3.2. The assessment of F_RF and F_NN frameworks on the rare adverse reactions An adverse reaction associated with a maximum of two drugs (and a minimum of one drug) is considered as adverse reaction rare. Predicting rare adverse reactions is an obstacle because the known associations of them are too limited. Some studies [24] [19] exclude the rare adverse reactions from their dataset to increase their performance. The number of drug-rare adverse reaction associations in each dataset Δ 1 , Δ 2 and Δ 3 contain 582, 668, and 1797, respectively. Fig.   1 ., Fig. 2 ., and Fig. 3 . depict the ratio of the adverse reaction numbers and their related drugs in each dataset ∆ ∈ {Δ 1 , Δ 2 , Δ 3 }. Each adverse reaction in datasets ∆ 1 , ∆ 2 , and ∆ 3 is associated with average 16.52, 2.99, and 1.05 drugs, respectively.

To evaluate the accuracy of F_RF and F_NN frameworks on rare adverse reactions in ∆ dataset, we choose positive test data ( ∆ ) with 10% drug-adverse reaction associations known as rare adverse reactions. The sets of ∆ , ∆ and ∆ are generated the same as the previous section. Table 8 illustrates the results for predicting rare adverse reactions on the test set. In both models, the performance of ∆ 3 dataset is better than other ones because it has more rare adverse reactions than the other datasets. The results show that F_RF is generally more accurate than the F_NN framework. 

In this subsection, we assess the performance of the improved CS model [15] called ℎ , and TMF model [12] in three versions, , ℎ and ℎ .

As mentioned above, these models are extended by adding the new drug and adverse reaction features to the original ones. The main version of these models are performed on datasets Δ 4 [15] and Δ 5 [28] , respectively. Therefore, we perform the improved CS model, ℎ , and three different versions of the TMF model, , ℎ and ℎ , on these datasets.

In these models, positive and negative data are defined based on all known drug-adverse reaction associations and unknown drug-adverse reaction associations, respectively.

To evaluate improved models, we perform cross-validation similar to the original version of CS [15] and TMF [12] . Here, we divide our known drug-adverse reaction associations randomly into equal subsets. One of them is chosen randomly, and its associations are set as 0 in the drugadverse reaction associations matrix, called the test set. Then, the model is trained by the remaining subsets. For prediction evaluation, the test set is added to the whole matrix as positive samples again. This process is repeated for every subset. 

This paper makes two similarity matrices of drugs and adverse reactions based on different features of drugs and adverse reactions. Meanwhile, we design four frameworks. Two of them are based on machine learning, F_RF and F_NN, and the others are based on matrix factorization, improvement of CS [15] , and TMF [12] .

In machine learning frameworks, the contention of each row of similarity matrices is given as input to F_RF and R_NN. In matrix factorization frameworks, the similarity matrices are integrated into the original similarity matrices of CS and TMF approaches.

The results show that although we improve the CS and TMF models as matrix factorization approaches, F_RF and R_NN represent better performance as machine learning approaches. In addition, fewer features are considered for drug and adverse reactions to computing the similarity in machine learning frameworks (see Table 7 and Table 9 ).

According to corresponding results of applying F_RF and F_NN on datasets (see Table 7 and Table 8 ), random forest performs more accurately on datasets and predicts rare adverse reactions with higher performance.

We compare the best-proposed framework, F_RF, with three machine learning models [24] [19] [22] that predict drug-adverse reaction associations using similarity-based methods. In addition, we compare F_RF with the logistic regression model introduced by Zhang in 2021 [26] . This model defines negative data as ours by considering drug-indication associations for negative data [26] . Table 10 illustrates the values of evaluation criteria for each one.

It should be noted, the Liang et al. model [24] excluded all adverse reactions with less than six associations from its dataset, and it positively affected the performance. However, the performance of F_RF on datasets includes rare adverse reactions. In addition to considering rare adverse reactions, F_RF uses fewer features than the others while the results are competitive with the rest models.

Moreover, the performance of F_RF is more accurate than Zheng et al. [22] , which performs a classifier for each adverse reaction separately. 

We also use several antiviral drugs out of our drug set to test the performance of the F_RF model. It should be noted, the association of these drug-adverse reactions are not available in the training set. We assume that positive association is determined based on the predicted probability of an association between a drug-adverse reaction which is more than 0.5. In addition, we check these pairs with F_NN, too.

For drug darunavir, F_RF predicts two adverse reactions Cardiac arrest, a type of heart disorder, and Hepatic steatosis, a type of liver disorder with the probability of association 0.98 and 0.61 (see Table 11 ). it means darunavir causes these adverse reactions. According to [35] [36], the model correctly makes the prediction.

Moreover, F_RF suggests drug ribavirin and adverse reaction Liver disorder as a negative association. It is mentioned in [37] which this drug can treat Liver disorder. 

This study proposed a framework called F_RF based on a random forest classifier to predict drug-adverse reaction associations. For this aim, a similarity vector is suggested by the drugdrug similarity score and the adverse reactions similarity function as the drug and adverse reaction representations. As the performance of machine learning methods depends on the training data, similarly to Zhang et al. [26] , the drug-adverse reactions and drug-indication are considered positive and negative data, respectively. Then, another framework was introduced using a neural network named F_NN. Comparing the corresponding of these frameworks indicated the F_RF got higher evaluation scores than F_NN for predicting rare and non-rare adverse reactions. Later, two state-of-the-art matrix factorization methods, CS and TMF, were improved to ℎ and , ℎ and ℎ and contrast with F_RF. According to the results, F_RF is performed more accurately than these models. Moreover, F_RF framework was compared with some popular machine learning approaches in ADR problem.

Although some methods exclude rare adverse reactions [24] [19] or use more features to solve the ADR problem, F_RF utilized all drug-adverse reactions, including rare ones, and fewer features. The results announced that the F_RF performance was better than most of them.

Meanwhile, F_RF correctly predicted cardiac arrest and hepatic steatosis as darunavir adverse reactions and suggested ribavirin to treat the liver disorder.

We conclude that using similarity vectors as drug and adverse reaction features and considering drug indications as negative data can improve drug-adverse reaction association prediction.

Moreover, applying a random forest classifier with less computational complexity than other models achieves higher performance scores.

In the future, we aim to assess the 3D structures of drugs to increase the performance of drugadverse reaction association prediction. In addition, applying drug-related clinical information can improve the accuracy of the model.

The authors declare no conflicts of interest relevant to the manuscript contents.

Coronavirus (COVID-19) Update: FDA Revokes Emergency Use Authorization for Chloroquine and Hydroxychloroquine | FDA

Hydroxychloroquine: a comprehensive review and its controversial role in coronavirus disease 2019

Hydroxychloroquine-induced cardiomyopathy and heart failure in twins

Analysis of the Adverse Drug Reactions and Associated Cost Burden on the Patients in a South Indian Teaching Hospital

Analysis of Reporting Adverse Drug Reactions in Paediatric Patients in a University Hospital in the Netherlands

Post marketing surveillance of suspected adverse drug reactions through spontaneous reporting: current status, challenges and the future

Rosiglitazone: a review of its use in the management of type 2 diabetes mellitus

Collaborative Filtering Recommender Systems

A methodology to explain neural network classification

Classification and Regression by Random Forest. R News

Recommender systems

A Novel Triple Matrix Factorization Method for Detecting Drug-Side Effect Association Based on Kernel Target Alignment

A Recommender System Approach for Predicting Drug Side Effects

Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem

Predicting serious rare adverse reactions of novel chemicals

Predicting drugs side effects based on chemical-chemical interactions and proteinchemical interactions

Drug side-effect prediction using machine learning methods

The SIDER database of drugs and side effects

A similarity-based method for prediction of drug side effects with heterogeneous information

Causality assessment of adverse drug reaction reports using an expert-defined Bayesian network

Predicting adverse drug reactions through interpretable deep learning framework

Inverse similarity and reliable negative samples for drug side-effect prediction

DeepSide: A Deep Learning Framework for Drug Side Effect Prediction

Prediction of Drug Side Effects with a Refined Negative Sample Selection Strategy

PubChem as a public resource for drug discovery

Prediction of adverse drug reactions based on knowledge graph embedding

NDDSA: A network-and domainbased method for predicting drug-side effect associations

Relating drug-protein interaction network with drug side effects

DrugBank: a knowledgebase for drugs, drug actions and drug targets

SuperTarget and Matador: resources for exploring drug-target relationships

PubChem: Integrated Platform of Small Molecules and Biological Activities

Gaussian interaction profile kernels for predicting drug-target interaction

UMLS-Interface and UMLS-Similarity : open source software for measuring paths and semantic similarity

Comparative Toxicogenomics Database (CTD): update 2021

Side effect information for Darunavir

Pharmacotherapy Management for COVID-19 and Cardiac Safety: A Data Mining Approach for Pharmacovigilance Evidence from the FDA Adverse Event Reporting System (FAERS)

Oral Ribavirin for Prevention of Severe Liver Disease Caused by Hepatitis C Virus During Allogeneic Bone Marrow Transplantation