key: cord-0996713-nc3avvvp
authors: Groza, Vlad; Udrescu, Mihai; Bozdog, Alexandru; Udrescu, Lucreţia
title: Drug Repurposing Using Modularity Clustering in Drug-Drug Similarity Networks Based on Drug–Gene Interactions
date: 2021-12-08
journal: Pharmaceutics
DOI: 10.3390/pharmaceutics13122117
sha: 4ac36ee6a26ba2e3b093e04a75b69ae5adf5f115
doc_id: 996713
cord_uid: nc3avvvp

Drug repurposing is a valuable alternative to traditional drug design based on the assumption that medicines have multiple functions. Computer-based techniques use ever-growing drug databases to uncover new drug repurposing hints, which require further validation with in vitro and in vivo experiments. Indeed, such a scientific undertaking can be particularly effective in the case of rare diseases (resources for developing new drugs are scarce) and new diseases such as COVID-19 (designing new drugs requires too much time). This paper introduces a new, completely automated computational drug repurposing pipeline based on drug–gene interaction data. We obtained drug–gene interaction data from an earlier version of DrugBank, built a drug–gene interaction network, and projected it as a drug–drug similarity network (DDSN). We then clustered the DDSN by optimizing modularity resolution, used the ATC code distribution within each cluster to identify potential drug repurposing candidates, and verified repurposing hints with the latest DrugBank ATC codes. Finally, using the best modularity resolution found with our method, we applied our pipeline to the latest DrugBank drug–gene interaction data to generate a comprehensive drug repurposing hint list.

The growth in the number of newly approved pharmaceutical substances has stagnated despite the ever-growing resources that the industry allocates [1–4]. Designing, developing, and testing new medicines is an expensive, long, and cumbersome process [5], which becomes especially problematic for new rare diseases (because funds are limited) and new pathogen epidemics (because stopping the disease spread requires a rapid therapeutic solution) [6,7]. One convenient alternative to the pharmaceutical industry's productivity challenges is drug repurposing, underpinned by R&D in the pharmaceutical industry as well as by observations and long-standing experience indicating the favorable polypharmacological profile of drugs (in other words, most pharmaceutical substances tend to have multiple functions) [8–10]. The trend that calls for drug repurposing techniques is in sync with the recent expansion of Big Data and machine learning in genetics, biology, and medicine; consequently, a wide array of computer-based methodologies has been developed to uncover new drug repurposing opportunities [11–13].

A significant area in computational repurposing (or repositioning) relies on the complex network representations of various drug interaction/relationship types, e.g., drug-drug [14], drug-target [15–17], drug-side effect [18], and drug-gene. The networks consist of nodes/vertices (representing drugs, targets, genes, or side effects) and links/edges (representing interactions or other types of relationships) [19]. The network of specific drug interactions allows for the characterization of a complex biological system under therapy; therefore, researchers can use computational techniques and network science principles to

Figure 1. The overview of our proposed computational drug repurposing pipeline. In the first step, we use drug-gene interaction information from DrugBank 5.0.9 to build the (bipartite) drug-gene interaction network, which we then project as a drug-drug similarity network (DDSN). In the second step, we use modularity class network clustering to identify drug communities with shared properties, analyze the DrugBank 5.0.9 first-level ATC code histograms in each community to predict new drug properties, and check these predictions against the latest DrugBank 5.1.8 level 1 ATC codes. The procedure in the second step allows maximizing the number of confirmed repositionings by adjusting modularity resolution. The third step uses our method with the optimized resolution value determined in the second step to generate a repurposing hints list according to DrugBank 5.1.8.

Three arguments support the novelty of the research presented in this paper. First, this manuscript is, to the best of our knowledge, the first to build and process a DDSN based on drug-gene interaction data. Second, we present a novel method (based on level 1 ATC codes) that labels clusters and generates repositioning hints automatically. Third, we tuned modularity resolution algorithmically and automatically confirmed repositioning hints by comparing two chronologically distinct DrugBank versions.

From a pharmacological perspective, our overarching contribution is to develop and promote, for the first time, drug-gene interaction networks as a valuable analytical, screening, and visualization tool in drug repositioning. Our method can complement existing computational repositioning pipelines; therefore, it can be integrated into more sophisticated ensemble methods.

In this section, we present the conceptual description of our algorithmic drug repositioning method from Figure 1. The thorough technical implementation and description are provided on our GitHub page https://github.com/GrozaVlad/Drug-repurposing-using-DDSNs-and-modularity-clustering (last commit on 21 October 2021). We used Node.js with the packages xml-js (for parsing the DrugBank XML files) and pg (for interacting with the PostgreSQL database), and Docker and Docker Compose for containerized databases [34]. For building and clustering the DDSN, we used the Python packages Psycopg2, Pandas [35], NetworkX [36], and Cdlib [37]; for visualizing the networks, we used Gephi [38]. The hardware platform for running this project was a MacBook Pro with an Intel Core i9 at 2400 MHz, 16 GB RAM, and a Radeon Pro 560X GPU with 4 GB.

To facilitate an automated validation of our drug repurposing pipeline, we used the earlier DrugBank version 5.0.9 to generate repurposing predictions in one of the anatomical or pharmacological groups described by the first-level ATC codes; then, we validated the predictions against the ATC codes in the latest DrugBank version 5.1.8 (last accessed on 30 September 2021).
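The validation rule above can be illustrated with a minimal sketch. It assumes the standard ATC layout, in which the five levels are prefixes of 1, 3, 4, 5, and 7 characters of the full code; the helper names (`atc_level`, `level1_codes`, `is_confirmed`) and the placeholder code `N02AAxx` are hypothetical and are not taken from our implementation.

```python
# Prefix lengths of the five ATC levels within a full 7-character code.
ATC_LEVEL_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}


def atc_level(code, level):
    """Return the prefix of an ATC code that identifies the given level."""
    return code[:ATC_LEVEL_LENGTH[level]]


def level1_codes(atc_codes):
    """Collapse a drug's list of full ATC codes to its set of level 1 letters."""
    return {atc_level(code, 1) for code in atc_codes}


def is_confirmed(predicted, earlier_codes, later_codes):
    """A prediction counts as confirmed when the level 1 code is absent from
    the earlier snapshot but present in the later one."""
    return (predicted not in level1_codes(earlier_codes)
            and predicted in level1_codes(later_codes))


# Naloxone's code in the earlier version, as quoted in this paper.
naloxone_earlier = ["V03AB15"]
# Hypothetical later listing: "N02AAxx" is a placeholder, not a real entry.
naloxone_later = ["V03AB15", "N02AAxx"]

print(atc_level("V03AB15", 1), atc_level("V03AB15", 4))
print(is_confirmed("N", naloxone_earlier, naloxone_later))
```

Working on level 1 letters only keeps the comparison coarse, which matches the conservative character of an ATC-based confirmation.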

In DrugBank version 5.0.9, there are 1966 drugs, 2352 genes, and 7249 drug-gene interactions; the interaction types belong to the set $I_e$ = {inhibitor, agonist, antagonist, other/unknown, ligand, partial agonist, inducer, other, suppressor, binder, antibody, modulator, allosteric modulator, potentiator, neutralizer, stimulator, activator, component of, substrate, inactivator, blocker, antisense oligonucleotide}. In the latest DrugBank version 5.1.8, there are 3117 drugs, 4108 genes, and 8396 drug-gene interactions, with interaction types belonging to the set $I_l$ = {inhibitor, agonist, antagonist, other/unknown, antibody, substrate, ligand, partial agonist, inducer, other, suppressor, binder, potentiator, modulator, activator, cofactor, degradation, positive allosteric modulator, incorporation into and destabilization, allosteric modulator, neutralizer, stimulator, binding, inactivator, inverse agonist, blocker, chaperone, inhibition of synthesis, antisense oligonucleotide, gene replacement, regulator}. Refer to Section 4.1 for explanations.

We chose DrugBank [33] because it is a comprehensive, versioned, and scientifically curated (i.e., robust) database with consistent support for in silico drug design and repositioning space exploration [32] .

The bipartite drug-gene interaction network is a graph $G = (V, E)$, where $V$ is the set of vertices or nodes, and $E$ is the set of edges. The network $G$ is bipartite because

$$V = V_D \cup V_G, \quad V_D \cap V_G = \emptyset,$$

where $V_D$ is the set of drugs and $V_G$ is the set of genes. The edges $e_{ij} \in E$ represent interactions between a drug $D_i \in V_D$ and a gene $G_j \in V_G$ (the interaction is of the type $T_k \in I$, with $I$ defined in Section 2.1). An example of such a drug-gene bipartite graph is presented in Figure 2a, with 4 drugs, 3 genes, and 3 types of drug-gene interactions.

Figure 2. An illustrative example of projecting the bipartite drug-gene interaction graph $G$ (a) into a weighted drug-drug similarity network $W$ (b). In our example, $G$ has 4 drugs ($D_1$, $D_2$, $D_3$, and $D_4$), 3 genes ($G_1$, $G_2$, and $G_3$), and 3 types of drug-gene interactions. In the drug-drug similarity network from panel (b), nodes are drugs, and the weight of a link between two drugs represents the number of genes with which the drugs interact in the same manner. For instance, as shown, the link $w_{1,3}$ between nodes/drugs $D_1$ and $D_3$ has a weight of 3 because $D_1$ and $D_3$ have the same type of interaction with genes $G_1$, $G_2$, and $G_3$.

From the drug-gene bipartite network $G$, we generated the weighted drug-drug similarity network $W = (V_D, W)$ using network projection [39]. In the DDSN, the nodes represent drugs, and a link between two nodes exists if there is at least one gene with which the two drugs interact in the same manner (i.e., the interactions are of the same type $T_k \in I$). In Figure 2b, we present the DDSN projection of the drug-gene example network in Figure 2a. The network is weighted because two drugs $D_i$ and $D_j$ can have the same type of interactions with $m$ genes; therefore, the weight of edge $w_{ij} \in W$ is $m$.
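The projection rule can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than our implementation: the function name `project_to_ddsn` and the toy interaction list are hypothetical, and only the $D_1$–$D_3$ weight of 3 is modeled on the example described for Figure 2.

```python
from collections import defaultdict
from itertools import combinations


def project_to_ddsn(interactions):
    """Project (drug, gene, interaction_type) triples onto a weighted
    drug-drug similarity network.

    Returns a dict mapping a sorted drug pair to the number of genes on
    which both drugs have the same interaction type.
    """
    # Group drugs by the (gene, type) combination they share.
    drugs_by_gene_and_type = defaultdict(set)
    for drug, gene, kind in interactions:
        drugs_by_gene_and_type[(gene, kind)].add(drug)

    # Collect, for every drug pair, the genes they interact with identically.
    shared_genes = defaultdict(set)
    for (gene, _kind), drugs in drugs_by_gene_and_type.items():
        for a, b in combinations(sorted(drugs), 2):
            shared_genes[(a, b)].add(gene)

    return {pair: len(genes) for pair, genes in shared_genes.items()}


# Hypothetical toy data: D1 and D3 share the same interaction on three genes.
toy_interactions = [
    ("D1", "G1", "inhibitor"), ("D1", "G2", "agonist"), ("D1", "G3", "antagonist"),
    ("D3", "G1", "inhibitor"), ("D3", "G2", "agonist"), ("D3", "G3", "antagonist"),
    ("D2", "G1", "inhibitor"),
    ("D4", "G3", "agonist"),
]
ddsn = project_to_ddsn(toy_interactions)
print(ddsn)
```

In this toy input, D4 interacts with G3 in a different manner than D1 and D3, so it receives no link; a drug pair is connected only when the interaction types match.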

The clustering of network $G = (V, E)$ is the process of classifying all nodes $v_i \in V$ in one of the $n$ (disjoint) subsets $C_j$, with $V = \bigcup_{j=1}^{n} C_j$, according to their topological properties. In this paper, we use modularity-based clustering because of its proven effectiveness in drug network analysis [14,22,23]. As defined in [40], the modularity of a clustering $C$ in a weighted network such as our DDSN (represented as $W$) is defined as follows.

$$M_C = \frac{1}{2a} \sum_{ij} \left[ w_{ij} - \frac{k_i k_j}{2a} \right] p(C_i, C_j) \qquad (1)$$

In Equation (1), $a = \frac{1}{2} \sum_{ij} w_{ij}$; $i$ and $j$ are the indexes of nodes $v_i, v_j \in V_D$; $k_i$ and $k_j$ are the node degrees (i.e., the sums of weights of incident edges) for nodes $v_i, v_j \in V_D$; $w_{ij}$ are the entries of the adjacency matrix of $W$; $C_i$ and $C_j$ are the communities that include nodes $v_i$ and $v_j$, respectively; and $p$ is a function $p(x, y)$ that returns 1 if $x = y$ and 0 otherwise. (In our DDSN, nodes $v_i$ and $v_j$ are drugs $D_i$ and $D_j$, respectively.)
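As an illustration of these definitions, the following sketch evaluates the standard weighted modularity for a given partition using plain dictionaries. It assumes an undirected network without self-loops; the function name `modularity` and the two-triangle toy network are hypothetical and serve only to make the quantities $a$, $k_i$, $w_{ij}$, and $p(C_i, C_j)$ concrete.

```python
def modularity(weights, community_of):
    """Weighted modularity of a partition (no self-loops assumed).

    weights: dict mapping an undirected edge (u, v) to its weight w_uv.
    community_of: dict mapping each node to its community label.
    """
    # Node degrees k_i: sums of the weights of incident edges.
    degree = {node: 0.0 for node in community_of}
    for (u, v), w in weights.items():
        degree[u] += w
        degree[v] += w

    # a = (1/2) * sum_ij w_ij, i.e., the total edge weight of the network.
    a = sum(weights.values())
    if a == 0:
        return 0.0

    # Sum of w_ij over ordered pairs in the same community: every undirected
    # intra-community edge appears twice in the adjacency matrix.
    intra = sum(2 * w for (u, v), w in weights.items()
                if community_of[u] == community_of[v])

    # Sum of k_i * k_j over ordered pairs in the same community equals the
    # squared total degree of that community.
    community_degree = {}
    for node, label in community_of.items():
        community_degree[label] = community_degree.get(label, 0.0) + degree[node]
    expected = sum(d * d for d in community_degree.values()) / (2 * a)

    return (intra - expected) / (2 * a)


# Hypothetical toy network: two disconnected triangles with unit weights.
two_triangles = {
    ("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1,
    ("D", "E"): 1, ("E", "F"): 1, ("D", "F"): 1,
}
by_triangle = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
single_cluster = {node: 0 for node in by_triangle}

print(modularity(two_triangles, by_triangle))
print(modularity(two_triangles, single_cluster))
```

Placing each triangle in its own community yields a clearly positive value, whereas merging everything into one community yields zero, which is the behavior the modularity search exploits.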

The modularity of clustering $C$ is a value $M_C \in [-1, 1]$, representing the edge density within the clusters with respect to the edge density between clusters. Modularity-based clustering algorithms search for the partitioning $C$ of the node set that maximizes the value of $M_C$. The problem is that an exhaustive search for the best modularity entails a large computational burden. Consequently, in practice, heuristic algorithms approximate optimal modularity clustering. However, if the network is very large, such approximations cannot identify small-size clusters, even if the density of internal edges is high and the density of edges between these small clusters and the rest of the network is low.

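In this paper, we use the modularity-based clustering algorithm from [41], which controls the resolution of the clustering using a recursive procedure that starts with each node being a cluster and then moves nodes $v_i$ (i.e., $D_i$ in our DDSN) to a different cluster $C_j$ if this generates a positive modularity gain expressed as follows.

$$\Delta M = \left[ \frac{K^*_{C_j} + K^{C_j}_i}{2a} - \left( \frac{K_{C_j} + K_i}{2a} \right)^2 \right] - \left[ \frac{K^*_{C_j}}{2a} - \left( \frac{K_{C_j}}{2a} \right)^2 - \left( \frac{K_i}{2a} \right)^2 \right] \qquad (2)$$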

In Equation (2), $K^*_{C_j}$ is the sum of the weights of all edges within cluster $C_j$; $K_{C_j}$ is the sum of the weights of all edges incident to nodes in cluster $C_j$; $K_i$ is the sum of the weights of all edges incident to node $v_i$ ($D_i$ in the DDSN); and $K^{C_j}_i$ is the sum of the weights of links from $v_i$ to all nodes in cluster $C_j$. The algorithm controls the clustering resolution using the value of $\lambda = \Delta M$; a lower $\lambda$ determines a higher number of clusters.

Using Algorithm 1, we tune the modularity resolution to achieve efficiency in predicting new drug properties. To this end, we try $\lambda$ values in the $[0.1, 5]$ interval, with a step of 0.1, generate the modularity clustering $C$ for each resolution value ($Clustering(G, \lambda)$), and determine the dominant property $P_i$ in each cluster $C_i \in C$. The dominant property $P_i$ corresponds to the level 1 ATC code of the majority of drugs $D_j \in C_i$, as resulting from the level 1 ATC code histogram of $C_i$, denoted $A^1(C_i)$. Then, for each drug $D_j$ in each cluster $C_i$, we check the list of first-level ATC codes for drug $D_j$ (denoted $A^1_{D_j}$) against the dominant property $P_i$ of the drug's cluster. If $P_i$ is not in the list of DrugBank 5.0.9 level 1 ATC codes for $D_j$ (i.e., $A^1_{D_j}$) but is present in the list of DrugBank 5.1.8 level 1 ATC codes (i.e., $A^{1c}_{D_j}$), then we consider this a confirmed repositioning of $D_j$ to property $P_i$. As such, we add drug $D_j$ to the list of repositionings confirmed with DrugBank 5.1.8 level 1 ATC codes, $R_c$. The value $\lambda_{max}$ corresponds to the $R_c$ with the largest number of elements, namely $\max\{|R_c|\}$.

Algorithm 1. Find the parameter $\lambda$ such that the clustering $C$ of nodes/drugs $D_i$ in $G$ with modularity resolution $\lambda$ (i.e., $Clustering(G, \lambda)$) produces the largest number of repositionings confirmed with the level 1 ATC codes in DrugBank 5.1.8.

for $\lambda \Leftarrow 0.1$ to $5$ with step $0.1$ do
    $C \Leftarrow Clustering(G, \lambda)$
    for all $C_i \in C$ do
        $P_i \Leftarrow$ dominant level 1 ATC code in $A^1(C_i)$
        $R_c^i \Leftarrow \emptyset$
        for all $D_j \in C_i$ do
            if $P_i \notin A^1_{D_j}$ and $P_i \in A^{1c}_{D_j}$ then
                $R_c^i \Leftarrow R_c^i \cup \{D_j\}$
            end if
        end for
    end for
    $R_c \Leftarrow \bigcup_i R_c^i$
end for
Return the value of $\lambda_{max}$ corresponding to $\max\{|R_c|\}$
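The resolution scan can be sketched as follows, as a minimal illustration and not the implementation from our repository. The clustering step is passed in as a callable (`cluster_fn`), and all names, including the stand-in `fake_cluster_fn` and the toy ATC snapshots, are hypothetical. Ties keep the lowest resolution found, which is an arbitrary choice made for this sketch.

```python
from collections import Counter


def dominant_code(cluster, atc_level1):
    """Most frequent level 1 ATC code among the drugs of a cluster."""
    histogram = Counter(code for drug in cluster
                        for code in atc_level1.get(drug, ()))
    return histogram.most_common(1)[0][0] if histogram else None


def confirmed_repositionings(clusters, atc_earlier, atc_later):
    """Drugs whose cluster-dominant code is missing from the earlier ATC
    listing but present in the later one."""
    confirmed = set()
    for cluster in clusters:
        dominant = dominant_code(cluster, atc_earlier)
        if dominant is None:
            continue
        for drug in cluster:
            if (dominant not in atc_earlier.get(drug, ())
                    and dominant in atc_later.get(drug, ())):
                confirmed.add(drug)
    return confirmed


def tune_resolution(graph, cluster_fn, atc_earlier, atc_later):
    """Scan resolutions 0.1, 0.2, ..., 5.0 and return the best resolution
    together with its set of confirmed repositionings."""
    best_lambda, best_confirmed = None, set()
    for step in range(1, 51):
        lam = step / 10  # avoids accumulating floating-point error
        clusters = cluster_fn(graph, lam)
        confirmed = confirmed_repositionings(clusters, atc_earlier, atc_later)
        if best_lambda is None or len(confirmed) > len(best_confirmed):
            best_lambda, best_confirmed = lam, confirmed
    return best_lambda, best_confirmed


# Hypothetical stand-in for the modularity clustering step.
def fake_cluster_fn(graph, lam):
    if lam < 1.0:
        return [{"d1"}, {"d2"}, {"d3"}, {"d4"}, {"d5"}]
    return [{"d1", "d2", "d3"}, {"d4", "d5"}]


# Hypothetical level 1 ATC snapshots (earlier and later database versions).
toy_earlier = {"d1": {"N"}, "d2": {"N"}, "d3": {"V"}, "d4": {"C"}, "d5": {"C"}}
toy_later = {"d1": {"N"}, "d2": {"N"}, "d3": {"V", "N"}, "d4": {"C"}, "d5": {"C"}}

print(tune_resolution(None, fake_cluster_fn, toy_earlier, toy_later))
```

Passing the clustering routine as a parameter keeps the scan independent of any particular community detection library.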

We generated a list of new repositioning hints using the modularity clustering with the resolution value determined by Algorithm 1 in Section 2.4. Algorithm 2 presents the method we follow: cluster the DDSN built with drug-gene interaction information from DrugBank 5.1.8 using the tuned resolution $\lambda_{max}$ ($C = Clustering(G, \lambda_{max})$); determine the dominant property $P_i$ of each cluster $C_i \in C$ as resulting from the level 1 ATC code histogram of $C_i$ (denoted $A^1(C_i)$); and check, for each drug $D_j$ in each cluster $C_i$, the list of first-level ATC codes of $D_j$ (denoted $A^1_{D_j}$) against its cluster's dominant property $P_i$. If the cluster's dominant property $P_i$ is not in $A^1_{D_j}$ (the list of $D_j$ level 1 ATC codes), we hint that $D_j$ can be repositioned to $P_i$. Consequently, we add these repositioning cases as drug-predicted property pairs $\langle D_j, P_i \rangle$ to the repositioning hints list $N$.

Algorithm 2. Generate the list of drug repurposing hints by clustering the DDSN $G$ with the tuned modularity resolution.

Input: Drug-drug similarity network $G = (V_D, E)$ based on drug-gene interaction data from DrugBank 5.1.8, $\lambda_{max}$, and the ATC codes for drugs in DrugBank 5.1.8.
Output: The repositioning hints $N$ as a list of drug-predicted property pairs $\langle D_j, P_i \rangle$.

$C \Leftarrow Clustering(G, \lambda_{max})$
$N \Leftarrow \emptyset$
for all $C_i \in C$ do
    $P_i \Leftarrow$ dominant level 1 ATC code in $A^1(C_i)$
    for all $D_j \in C_i$ do
        if $P_i \notin A^1_{D_j}$ then
            $N \Leftarrow N \cup \{\langle D_j, P_i \rangle\}$
        end if
    end for
end for
Return the list of drug repositionings $N$ as drug-predicted property pairs

3. Results

3.1. DDSN Using Drug-Gene Interactions from DrugBank 5.0.9

Following the algorithmic approach presented in Figure 1, according to the methods described in Sections 2.2-2.5, we employ cluster-based network analysis on the drug-drug similarity network (DDSN) built with drug-gene interaction information from DrugBank 5.0.9 to search for the most effective modularity resolution $\lambda_{max}$, in other words, the modularity resolution that produces the highest number of drug repositionings confirmed with level 1 ATC codes from DrugBank 5.1.8. Figure 3 presents the result of running Algorithm 1 from Section 2.4; the best results correspond to resolutions 1.9 and 2.0 (the same nine confirmed repositionings in both cases). Henceforth, we will consider $\lambda_{max} = 2.0$.

Figure 4. Drug-drug similarity network (DDSN) built with drug-gene interaction data from DrugBank 5.0.9, clustered using modularity classes for resolution $\lambda_{max} = 2.0$. We indicate the position of the drugs whose repositionings were confirmed (with level 1 ATC codes from DrugBank 5.1.8) by labeling the corresponding nodes with their names. The brown nodes represent drugs in cluster $C_0$ (512 drugs), yellow nodes represent drugs in cluster $C_1$ (238 drugs), green nodes represent drugs in cluster $C_2$ (197 drugs), pink nodes represent drugs in cluster $C_3$ (143 drugs), and light blue nodes represent drugs in cluster $C_4$ (88 drugs).

In Figure 4 , nodes represent drugs, and links represent similarity relationships based on drug-gene interactions, as described in Section 2.2; node colors correspond to specific clusters, as determined by the modularity class, and all links are represented with grey lines.

In Appendix A.1, Figures A1-A3 present zoomed details of the DDSN from Figure 4 in the vicinity of the nine confirmed repositionings corresponding to $\lambda_{max} = 2.0$. The repositionings come from cluster $C_0$ (brown nodes) and cluster $C_2$ (green nodes). We indicated the drug repositionings confirmed with DrugBank 5.1.8 data with red arrows (→) in Figures A1 and A2; Figure A3 contains many confirmed repurposed drugs and a high density of nodes; hence, red diamonds were used instead of arrows.

The zoomed details provided by Figures A1 and A2 show that mepolizumab and naloxone are within cluster $C_0$ (brown nodes), where the dominant property is given by the level 1 ATC code N-Nervous system, followed by code R-Respiratory system. As such, our method automatically predicts that mepolizumab (listed as L-Antineoplastic and immunomodulating agents in DrugBank 5.0.9) acts as a drug with level 1 ATC code R. (In Appendix A.2, Figure A4 shows that in cluster $C_0$, in addition to the dominant level 1 ATC code N, we also have many subcluster drugs with level 1 ATC codes A-Alimentary tract and metabolism; R-Respiratory system; and C-Cardiovascular system.) Our method predicts that naloxone (an opioid overdose antidote in DrugBank 5.0.9) also acts on the nervous system (first-level ATC N). The more recent DrugBank 5.1.8 confirms the predictions, listing mepolizumab with first-level ATC code R and naloxone with N (see more details in Section 3.3.1).

In Appendix A.1, Figure A3, we zoom in to the region of the DrugBank 5.0.9 DDSN with the confirmed repositionings in cluster $C_2$ (green nodes), with the dominant level 1 ATC code G-Genitourinary system and sex hormones (see the histogram in Appendix A.2, Figure A4). The confirmed repositionings in cluster $C_2$ are torasemide (ATC level 1 code C, cardiovascular system), quinetazone (C), methazolamide (S, sensory organs), acetazolamide (S), dorzolamide (S), and brinzolamide (S). Zonisamide (N, nervous system) is a brown node (cluster $C_0$) but in the close vicinity of cluster $C_2$; therefore, one can expect functional overlaps [14]. Our method automatically predicts that all these drugs have genitourinary system properties, and DrugBank 5.1.8 confirms the predictions (see the detailed description in Section 3.3.1).

Using ATC codes as references for drug repurposing is already established in state-of-the-art contexts, although confirmations based on ATC codes are very conservative (i.e., the World Health Organization assigns new ATC codes after a long and thorough process) [25,42]. Confirming the predicted drug repositionings through a review of the research literature will reveal many more confirmations [25,43]. By this logic, our analysis of DrugBank 5.0.9 does not reveal many confirmed repurposings, yet it helps tune the modularity resolution $\lambda$.

According to the algorithmic approach presented in Figure 1, we generated the DDSN based on the drug-gene interactions reported in DrugBank 5.1.8 and clustered the DDSN using the modularity classes obtained for resolution $\lambda_{max}$ (by employing Algorithm 1 with the results presented in Section 3.1). We display the largest connected component of the DrugBank 5.1.8 DDSN in Figure 5, with cluster $C_0$ (brown nodes) having the dominant level 1 ATC code N-Nervous system; clusters $C_1$ and $C_3$ (green and orange nodes) J-Anti-infectives for systemic use; cluster $C_2$ (light blue nodes) L-Antineoplastic and immunomodulating agents; and cluster $C_4$ (pink nodes) A-Alimentary tract and metabolism.

By running Algorithm 2 on the DDSN built with DrugBank 5.1.8 data and clustered with modularity classes at resolution $\lambda_{max}$, we generated lists of drug repurposing hints for each drug cluster. In the Supplementary Materials (Table S1, file DDSN-results.xls, tab DB 5.1.8 resolution 2.0), we present the first 10 drug clusters and the entire list of drug repurposing candidates generated with Algorithm 2 (759 candidates).
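The hint-generation step of Algorithm 2 can be sketched as below, assuming the clusters have already been computed. The function name `repurposing_hints`, the cluster memberships, and the drugs named `drug_a` to `drug_d` are hypothetical; only the methotrexate example (current level 1 code L, predicted J) mirrors a case discussed in this paper.

```python
from collections import Counter


def repurposing_hints(clusters, atc_level1):
    """Return (drug, predicted level 1 ATC code) pairs for every drug that
    does not already carry its cluster's dominant code."""
    hints = []
    for cluster in clusters:
        histogram = Counter(code for drug in cluster
                            for code in atc_level1.get(drug, ()))
        if not histogram:
            continue  # no annotated drug, so no dominant property
        dominant = histogram.most_common(1)[0][0]
        for drug in sorted(cluster):
            if dominant not in atc_level1.get(drug, ()):
                hints.append((drug, dominant))
    return hints


# Hypothetical clusters and level 1 ATC annotations.
toy_clusters = [
    {"drug_a", "drug_b", "methotrexate"},
    {"drug_c", "drug_d"},
]
toy_atc = {
    "drug_a": {"J"},
    "drug_b": {"J"},
    "methotrexate": {"L"},
    "drug_c": {"N"},
    "drug_d": set(),  # a drug without any ATC code also receives a hint
}

print(repurposing_hints(toy_clusters, toy_atc))
```

Drugs are visited in sorted order inside each cluster only to make the output reproducible; the set of hints does not depend on that order.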

Because experimentally confirming all 759 drug repurposing candidates generated with the latest DrugBank data is beyond the scope of our paper, we selected the first 10 drugs in each cluster in terms of betweenness/degree centrality (the methodology used in [22]) and checked them against the state-of-the-art scientific literature. For checking repositioning hints, we searched for articles in PubMed. The terms we used to search the literature were the name of the drug and the words/pharmacological terms that form level 1 of the ATC code. For example, our methodology predicted for methotrexate the level 1 ATC code J-Anti-infectives for systemic use; we searched for the confirmation of this prediction by using the keywords methotrexate anti-infective, as well as keywords representing therapeutic groups included in class J (i.e., methotrexate antiviral, methotrexate antibacterial, or methotrexate antimycotic). The confirmation results of our extensive literature check are presented in Table 1, showing the drug name, cluster number, current level 1 ATC code, predicted level 1 ATC code, and confirmation references. We also added a detailed discussion of the repurposing hints from Table 1 in Section 3.2.

Figure 5. Drug-drug similarity network (DDSN) built with drug-gene interaction data from DrugBank 5.1.8, clustered using modularity classes for resolution $\lambda_{max} = 2.0$. The brown nodes represent drugs in cluster $C_0$ (479 drugs), green nodes represent drugs in cluster $C_1$ (346 drugs), light blue nodes represent drugs in cluster $C_2$ (270 drugs), orange nodes represent drugs in cluster $C_3$ (129 drugs), and pink nodes represent drugs in cluster $C_4$ (12 drugs).

We present the topological DDSN placement of pyridoxal phosphate (a predicted repositioning from cluster $C_0$) in Figure 6, where a red diamond marks the exact position.

In Figure 7, we illustrate the position of albendazole and methotrexate in the DDSN built with DrugBank 5.1.8 data as predicted drug repositionings from cluster $C_1$. Other drug repurposing candidates from cluster $C_1$ (presented in Table 1) are shown in Appendix B.1, Figure A5: simvastatin, fluvastatin, lovastatin, and atorvastatin. Figure 8 displays the DrugBank 5.1.8 DDSN placement of cholecalciferol, ergocalciferol, and calcifediol, which are drug repurposing candidates from cluster $C_2$. In Appendix B.1, Figures A6-A8, we identify the topological positions of the other drug repurposing candidates in cluster $C_2$ (Table 1): meloxicam, theophylline, and chloroquine.

Table 1. The list of drug repurposing candidates generated with our methodology in Figure 1 on data from DrugBank 5.1.8, and confirmed with scientific literature. The rows correspond to drugs or drug classes (for example, simvastatin, fluvastatin, lovastatin, and atorvastatin are statins). The columns indicate, from left to right, the name, the cluster, the current level 1 ATC code in DrugBank 5.1.8, the predicted level 1 ATC code, and the confirmation references for the drug (or drug class) in each row.

We also show the placement of drug repurposing candidates mecasermin and mecasermin rinfabate (in Figure 9, in cluster $C_4$, marked with red diamonds) and ornithine (in Figure 10, in cluster $C_{25}$, marked with a red arrow →).

The histograms showing the dominant properties (as level 1 ATC codes) in clusters $C_0$, $C_1$, $C_2$, and $C_4$ are presented in Appendix B.2, Figure A9.

This section discusses the drug repositioning hints generated with our methodology in DrugBank 5.0.9 and confirmed with the level 1 ATC codes in DrugBank 5.1.8. Our procedure confirmed the predicted hints in modularity classes 0 and 2.

Modularity Cluster $C_0$

In modularity cluster $C_0$, DrugBank 5.1.8 confirms mepolizumab and naloxone (see Figures A1 and A2). Naloxone (ATC code V03AB15) is a µ-opioid receptor antagonist indicated in the treatment of opioid overdose. In DrugBank 5.0.9, naloxone's first-level ATC is V-Various; its level 4 (V03AB) means naloxone is in the Antidotes category.

Our methodology predicts naloxone's level 1 ATC as N-Nervous system; the latest DrugBank 5.1.8 adds two N level 1 ATC codes to naloxone (level 4 ATC category Natural opium alkaloids for the combinations with hydromorphone and oxycodone), thus confirming our prediction.

Mepolizumab (ATC code L04AC06) is a monoclonal antibody acting as an antagonist of interleukin-5, included in the L-Antineoplastic and immunomodulating agents level 1 ATC category by DrugBank 5.0.9. DrugBank 5.1.8 does not list the L04AC06 code anymore for mepolizumab; instead, it uses the level 1 ATC code R-Respiratory system (the level 4 ATC is R03DX, which includes other systemic drugs for obstructive airways diseases, as mepolizumab is indicated in severe eosinophilic asthma).

In modularity cluster $C_2$, DrugBank 5.1.8 confirms torasemide, methazolamide, acetazolamide, dorzolamide, brinzolamide, zonisamide, and quinetazone (see Figure A3).

Torasemide, quinetazone, methazolamide, acetazolamide, dorzolamide, zonisamide, and brinzolamide (ATC codes: C03CA04, C03BA02/C03BB02, S01EC05, S01EC01, S01EC03, N03AX15, and S01EC04/S01EC54, respectively) are sulfonamide compounds with various pharmacodynamic effects. According to DrugBank 5.0.9, torasemide and quinetazone are diuretics used as antihypertensive drugs, included in the C-Cardiovascular system level 1 ATC category. Zonisamide is an antiepileptic drug (level 1 ATC N-Nervous system). Methazolamide, acetazolamide, dorzolamide, and brinzolamide are carbonic anhydrase inhibitors used in glaucoma (level 1 ATC S-Sensory organs).

Our methodology predicts G-Genito urinary system and sex hormones as the level 1 ATC code for torasemide, quinetazone, methazolamide, acetazolamide, dorzolamide, zonisamide, and brinzolamide. Indeed, the latest DrugBank 5.1.8 version includes all these drugs in the G level 1 ATC category, more precisely, in the G01AE level 4 ATC category of anti-infectives and antiseptics with a sulfonamide-based chemical structure.

This section discusses the validity of some drug repositioning hints generated with our methodology in DrugBank 5.1.8; as this is the latest database version, we cannot use the same confirmation procedure based on ATC codes. Consequently, we provide evidence found in the state-of-the-art literature as confirmation clues. However, as both the number of clusters and their size prohibit an exhaustive literature search, we focus on the clusters with confirmed drug repurposing candidates: clusters $C_0$, $C_1$, $C_2$, $C_4$, and $C_{25}$.

Pyridoxal phosphate (cluster $C_0$, ATC code A11HA06) is the active form of vitamin B6 and belongs to the A-Alimentary tract and metabolism level 1 ATC category, along with the rest of the water-soluble and fat-soluble vitamins. Our method predicts pyridoxal phosphate as level 1 ATC code N-Nervous system (see Figure 6); H.-S. Wang et al. reported that pyridoxal phosphate controls idiopathic intractable epilepsy in children [44]. P.B. Mills et al. identified two groups of patients with neonatal epileptic encephalopathy (determined by PNPO mutations) that respond to pyridoxal phosphate [45].

Albendazole (cluster $C_1$, ATC code P02CA03) is an antiparasitic drug (first-level ATC P-Antiparasitic products, insecticides and repellents) efficient in various helminthic infections. Our methodology predicts J as the level 1 ATC code, suggesting potential systemic anti-infective effects (see Figure 7). Of note, ATC lists drug classes such as antivirals, antibacterials, antimycotics, and vaccines in the J-Anti-infectives for systemic use category. In vitro results show that albendazole exerts antifungal activity against Aspergillus spp. [46]; moreover, experiments on mice revealed antifungal effects against Pneumocystis carinii [47], confirming the new potential antifungal medical use of albendazole.

Methotrexate (cluster $C_1$, ATC codes L04AX03, L01BA01) is an anticancer and immunosuppressant agent; therefore, its level 1 ATC is L-Antineoplastic and immunomodulating agents. We predict the first level J-Anti-infectives for systemic use (see Figure 7). The literature survey reveals several papers reporting dose-dependent in vitro antiviral effects of methotrexate on SARS-CoV-2 [48] and Zika virus replication [49]; methotrexate also prevents the replication of human cytomegalovirus and inhibits viral DNA synthesis [50].

Simvastatin, fluvastatin, lovastatin, and atorvastatin (cluster $C_1$, ATC codes A10BH51/C10AA01/C10BX04/C10BA02/C10BX01/C10BA04, C10AA04, C10AA02/C10BA01, and C10BX15/C10AA05/C10BX03/C10BA05/C10BX11/C10BX08/C10BX06/C10BX12) are HMG-CoA reductase inhibitors (also called statins) that lower serum lipid levels, reducing the risk of cardiovascular events caused by hyperlipidemia; they are in the level 1 ATC C-Cardiovascular system class. The first level of their ATC code, as predicted by our method, is J-Anti-infectives for systemic use (see Figure A5), which is confirmed by the literature; for instance, simvastatin exhibits an in vitro antimicrobial effect on methicillin-susceptible Staphylococcus aureus [51]. S.P. Parihar et al. [52] review the literature reporting preclinical and clinical evidence of statin effects in viral, parasitic, fungal, and bacterial infections, pointing out the factors that influence the response to statins, such as human polymorphism, metabolism, and drug interactions; this review includes data on all mentioned statins. Our algorithm predicts that all statins in cluster $C_1$ are potential anti-infective agents. As shown, for the statins we highlighted in Figure A5, we found literature confirming our prediction; for the other statins, new experiments and studies may provide confirmation.

Theophylline (cluster $C_2$, ATC codes R03DA54, R03DA74, R03DA20, R03DA04, and R03DB04) is a methylxanthine derivative used to treat obstructive respiratory conditions, such as asthma and COPD, hence having R-Respiratory system as its first-level ATC code. Our methodology indicates theophylline's anticancer and immunomodulating properties, as reflected by the predicted ATC first level L (see Figure A7), thus further confirming the repositioning proposed by our previous research [14]. Indeed, recent literature demonstrates the anticancer properties of theophylline in breast and cervical cell lines [53].

Meloxicam (cluster $C_2$, ATC codes M01AC56 and M01AC06) is an oxicam derivative with anti-inflammatory and antirheumatic properties of the M-Musculo-skeletal system ATC category. Our network-based methodology predicts L as the first level of the ATC code (see Figure A6). The literature confirms our prediction of the anticancer properties of meloxicam: meloxicam inhibits tumor growth in COX-2 positive colorectal cancer [54]. Tsubouchi et al. report that COX-2 plays a significant role in the pathogenesis and progression of non-small cell lung cancer (NSCLC), demonstrating the inhibitory effect of meloxicam on NSCLC growth by preferentially inhibiting COX-2 [55]. Reference [56] shows that meloxicam is efficient in osteosarcoma in both COX-2-dependent and COX-2-independent inhibitory manners.

Cholecalciferol, ergocalciferol, and calcifediol (cluster $C_2$, ATC codes M05BB09/M05BX53/M05BB07/M05BB08/A11CC55/M05BB05/A11CC05/M05BB03/M05BB04, A11CC01, and A11CC06) are vitamin D analogs. Cholecalciferol (vitamin D3) is a fat-soluble vitamin (ATC level 1 A-Alimentary tract and metabolism, a category which includes hydro-soluble and lipo-soluble vitamins) with a well-established role in bone mineralization (ATC second level M05-Musculo-skeletal system, drugs for treatment of bone diseases). Ergocalciferol and calcifediol are also grouped in the A-Alimentary tract and metabolism level 1 ATC. We predict these drugs as targeting diseases at level 1 ATC code L-Antineoplastic and immunomodulating agents (see Figure 8). There is extensive literature reporting the beneficial effects of vitamin D analogs in different cancers and highlighting the epidemiological, preclinical, and clinical results; all these back up their evolution as prophylactic and curative anticancer drugs [57,58].

Chloroquine (cluster C 2 , ATC code P01BA01) is an antimalarial drug; consequently, it belongs to the P-Antiparasitic products, insecticides and repellents level 1 ATC category. According to our results, the predicted first-level ATC for chloroquine is L-Antineoplastic and immunomodulating agents (dominant in cluster C 2 , see Figure A8 ). Multiple reviews report in vitro, in vivo, and clinical studies testing chloroquine's anticancer effect in glioblastoma [59] and other types of cancer [60] [61] [62] [63] , hence supporting the potential repositioning of chloroquine as an anticancer drug, as uncovered by our methodology.

Mecasermin and mecasermin rinfabate (cluster C 4 , ATC codes H01AC03, H01AC05) are recombinant insulin-like growth factor-1 drugs indicated in growth failure in children with primary IGF-1 deficiency and, hence, are included in the H-Systemic hormonal preparations, excluding sex hormones and insulins level 1 ATC category. The literature and reports from medicine regulatory authorities describe the secondary pharmacologic actions of mecasermin and mecasermin rinfabate, including the anabolic and insulin-like effects (i.e., hypoglycemia) [64] [65] [66] ; these pharmacologic effects could place the drugs in the A-Alimentary tract and metabolism level 1 ATC, as predicted by our methodology (see Figure 9 ).

Ornithine (cluster C 25 , ATC code A05BA06) is a non-essential amino acid indicated as nutritional supplementation and for a good liver function and included in the A-Alimentary tract and metabolism level 1 ATC. M. Miyake et al. suggest that L-ornithine may interfere with the Central Nervous System, following a randomized, double-blind controlled trial that demonstrated that L-ornithine relieved stress and improved sleep quality in humans compared to the placebo group [67] . Indeed, we predicted ornithine at level 1 ATC N-Nervous system (see Figure 10 ).

In this section, we discuss the particularities of our method, namely the data we use, the limitations of our method and its validation with ATC codes, and the way to integrate it into an ensemble drug repositioning framework.

The method we propose in this paper uses drug-gene interaction data from DrugBank versions 5.0.9 and 5.1.8. Table 2 presents examples of drug-gene interactions and their corresponding types, as defined by DrugBank 5.1.8. A detailed list of drug-gene interaction types is available in the Supplementary Materials (Table S1, file DDSN-results.xls), and the procedure for retrieving such drug-gene interactions from DrugBank is described on the GitHub page https://github.com/GrozaVlad/Drug-repurposing-using-DDSNs-and-modularity-clustering (last commit on 21 October 2021).

The mechanisms that influence the polypharmacological profile of drugs are highly complex. Indeed, the medicinal compound interacts with a complex system represented by the human organism. Complex systems are context-dependent; in other words, any detail at the micro-scale influences the macro-scale behavior. As such, many factors can be considered when analyzing the functions of any pharmaceutical substance: from the chemical structure to various types of relationships and interactions, as well as pharmacokinetics and pharmacodynamics. By this logic, our approach is limited because it considers a narrow informational angle, namely drug-gene interactions. Nonetheless, considering many mechanisms and types of data simultaneously within the same model would be prohibitively complex, and the networks would become much too dense for any centrality or community analysis. Even a single type of information has become significantly complex; for instance, the drug-drug interaction (DDI) networks in DrugBank 3.0 had an average degree of ∼20, whereas in DrugBank 5.1.8 the average DDI network degree is ∼600. Recent literature [68] [69] [70] advances the so-called ensemble methods to address this new situation of being confronted with an overabundance rather than a scarcity of data (see Section 4.4).

Employing computational methods (i.e., data mining and machine learning) in drug repositioning is generally hampered by the lack of a robust ground truth. Indeed, databases such as DrugBank record positive information about the drugs' known properties and functions, yet the absence of evidence is not evidence of absence (some drug properties may be hidden, and only future experiments can fully reveal them). That is why performance evaluation and validation of computational drug repositioning models are still an open issue; therefore, researchers adopt ad hoc, particular strategies, which are hard to compare [71] . Consequently, we resorted to making predictions with an older database version and then validating them with the latest version. However, even the latest database cannot contain exhaustive information about drug functions. Furthermore, negative information on drug functions/effects (stating what properties a drug does not have) would help prune the vast search space in drug repositioning. Unfortunately, negative information is scarce and scattered throughout the literature; to the best of our knowledge, no comprehensive dataset contains such data based on experimental results. As such, the existing negative information cannot be used algorithmically/automatically. As explained, one feasible method for filtering the noise and navigating the search space affected by uncertainty, an approach supported by recent research, is to integrate tools (such as the one we propose here) in ensemble methods.

Many computational drug repositioning methods based on complex networks rely on community detection and community labeling. However, labeling can be cumbersome and subjective; thus, we decided to use ATC codes, since this system is the standard for classifying medicines accepted by the WHO. Furthermore, the ATC code favors automation because it aggregates the information about a drug in a combination of letters and digits, which is easy to process algorithmically. The ATC code classifies drugs on five levels considering three criteria simultaneously: anatomical (A), the first level; therapeutic (T), levels 2 and 3; and chemical (C), levels 4 and 5. The anatomical criterion indicates the anatomical level or the physiological organ systems on which a specific drug acts. Each anatomical level is indicated in the ATC code by a letter (e.g., A-Alimentary tract and metabolism, C-Cardiovascular system, M-Musculoskeletal system, or R-Respiratory system); the ATC system contains 14 anatomical groups. Level 2 represents the therapeutic classification criterion and is encoded by two digits. Level 3 (encoded by a letter) indicates the particular pharmacological group of the drug. Level 4 (encoded by a letter) indicates the chemical class of the drug. Level 5 (encoded by two digits) indicates the chemical structure of the drug. This paper only used the first-level ATC codes for labeling and validating predictions, although drug function is more precisely expressed by levels 1-3; we opted to do so because the sophisticated hierarchical clustering algorithms entailed by a multi-level approach would have unnecessarily increased the computational complexity of our study.
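As an illustration of why the ATC code is easy to process algorithmically, the following minimal Python sketch splits a full seven-character code into its five levels. The names `ATCLevels`, `split_atc`, and `first_levels` are hypothetical helpers introduced here for illustration; they are not part of our pipeline's published code.

```python
import re
from typing import Iterable, NamedTuple, Set


class ATCLevels(NamedTuple):
    """Prefixes of a full ATC code, one per classification level."""
    level1: str  # anatomical group: one letter
    level2: str  # therapeutic subgroup: adds two digits
    level3: str  # pharmacological subgroup: adds one letter
    level4: str  # chemical subgroup: adds one letter
    level5: str  # chemical substance: adds two digits


# Letter, two digits, letter, letter, two digits (e.g., R03DA04).
_ATC_PATTERN = re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")


def split_atc(code: str) -> ATCLevels:
    """Split a seven-character ATC code into its five nested levels."""
    code = code.strip().upper()
    if not _ATC_PATTERN.fullmatch(code):
        raise ValueError(f"not a full level-5 ATC code: {code!r}")
    return ATCLevels(code[:1], code[:3], code[:4], code[:5], code)


def first_levels(codes: Iterable[str]) -> Set[str]:
    """Collect the distinct level 1 letters of a drug's ATC codes."""
    return {split_atc(code).level1 for code in codes}


print(split_atc("R03DA04"))
print(first_levels(["M05BB09", "A11CC55", "A11CC05"]))
```

Applied to the theophylline code R03DA04, the helper yields the prefixes R, R03, R03D, R03DA, and R03DA04; only the first of these is used for cluster labeling in this paper.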

When the problem at hand is too complex to solve by employing a single model, machine learning uses an ensemble strategy [72] , which trains several models on the same set of data to operate collectively for solving the problem. This strategy is already used in bioinformatics to approach complex problems such as motif discovery in ChIP-Seq data [73] . The problem of drug repositioning is also very complex; however, prediction accuracy is not the primary indicator of success (the benefit of correctly predicting even a few drug repositionings is more significant than the cost of experiments entailed by testing the wrong predictions [74] ). As such, very recent literature advances the idea of using ensemble methods for drug repositioning [69, 70] .

In this context, considering that our method uses drug-gene interaction data that only partially describe the behavior of drugs (as explained in Section 4.2), we indicate the ensemble strategy as the appropriate way to use our method. As shown in Figure 11 , drug repositioning prediction based on drug-gene interaction data may be Method i from the group of machine learning methods based on distinct models {Method 1, Method 2, . . . , Method m}. The repositioning hints lists generated by the individual methods are aggregated (i.e., via voting, averaging, or other procedures) to produce a final drug repositioning hints list. The aggregation process may use pharmacological expertise, e.g., to adjust the weights of a weighted average. However, implementing the ensemble strategy is beyond the scope of this paper, which aims to analyze and promote, for the first time, the beneficial role of drug-gene interaction networks for computational drug repositioning.

Figure 11. Overview of the ensemble strategy in drug repositioning. A group of machine learning and data mining methods {Method 1, Method 2, . . . , Method m} implements various models and uses distinct features (e.g., drug-drug interactions, drug-target interactions, drug-gene interactions, drug-adverse reactions relationships, pharmacokinetic properties) from the same comprehensive dataset to predict drug repositioning hints. Each method Method i generates its repositioning hints list, and an aggregation process assembles all lists into the final repurposing hints list.
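The aggregation step can be sketched as a weighted vote over the hint lists. The following Python fragment is only an illustration under stated assumptions: the function name `aggregate_hints`, the threshold `min_score`, and the placeholder drug names are hypothetical, and the actual ensemble is not implemented in this paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Hint = Tuple[str, str]  # (drug name, predicted level 1 ATC code)


def aggregate_hints(
    hint_lists: Sequence[Sequence[Hint]],
    weights: Optional[Sequence[float]] = None,
    min_score: float = 2.0,
) -> List[Tuple[Hint, float]]:
    """Weighted vote: keep hints whose summed method weights reach min_score."""
    if weights is None:
        weights = [1.0] * len(hint_lists)  # plain majority-style voting
    if len(weights) != len(hint_lists):
        raise ValueError("one weight per method is required")

    scores: Dict[Hint, float] = defaultdict(float)
    for hints, weight in zip(hint_lists, weights):
        for hint in set(hints):  # each method votes at most once per hint
            scores[hint] += weight

    kept = [(hint, score) for hint, score in scores.items() if score >= min_score]
    # Highest score first; alphabetical order breaks ties deterministically.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy input with placeholder drugs; these are not results of any real method.
method_1 = [("drug_a", "L"), ("drug_b", "L")]
method_2 = [("drug_a", "L"), ("drug_c", "N")]
method_3 = [("drug_b", "L"), ("drug_a", "L")]
print(aggregate_hints([method_1, method_2, method_3]))
```

Pharmacological expertise could enter such a scheme through the `weights` argument, for instance by giving more weight to methods built on better-curated features.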

In this paper, we propose a new drug repurposing methodology based on algorithmic complex network analysis. To this end, we introduce an original method of building the Drug-Drug Similarity Network (DDSN) using drug-gene interactions from DrugBank, clustering DDSN with modularity classes, and labeling each cluster with the dominant first level ATC code of drugs within the cluster. The assumption that results in drug repurposing hints is that drugs in a cluster share the dominant property of the cluster. We use an automated procedure to tune modularity resolution, to apply our methodology on a DDSN built with data from DrugBank 5.0.9, to generate the list of drug repurposing hints (i.e., drugs for which the first level ATC does not match the dominant cluster label), and to check it against ATC codes in DrugBank 5.1.8.
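The labeling and hint-generation steps summarized above can be sketched in a few lines of Python. This is a simplified illustration, not our released implementation: the function names, the alphabetical tie-breaking rule, and the decision to skip drugs without any ATC code are assumptions made for the example, and the clusters are assumed to have been computed beforehand by modularity clustering.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple


def dominant_label(cluster: Iterable[str],
                   atc_level1: Dict[str, Set[str]]) -> Optional[str]:
    """Return the most frequent level 1 ATC code among the cluster's drugs."""
    counts = Counter(
        label for drug in cluster for label in atc_level1.get(drug, ())
    )
    if not counts:
        return None  # no drug in the cluster has a known ATC code
    # Assumption: ties are broken alphabetically to keep results deterministic.
    return min(counts, key=lambda label: (-counts[label], label))


def repurposing_hints(clusters: Iterable[Set[str]],
                      atc_level1: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """List (drug, cluster label) pairs where the drug lacks the cluster label."""
    hints = []
    for cluster in clusters:
        label = dominant_label(cluster, atc_level1)
        if label is None:
            continue
        for drug in sorted(cluster):
            known = atc_level1.get(drug, set())
            # Assumption: drugs without any ATC code are skipped, not hinted.
            if known and label not in known:
                hints.append((drug, label))
    return hints


# Toy example with placeholder drugs and labels.
toy_clusters = [{"drug_a", "drug_b", "drug_c"}, {"drug_d", "drug_e"}]
toy_atc = {
    "drug_a": {"L"}, "drug_b": {"L"}, "drug_c": {"R"},
    "drug_d": {"A"}, "drug_e": {"A", "H"},
}
print(repurposing_hints(toy_clusters, toy_atc))
```

In the toy input, the first cluster is labeled L, so the drug carrying only R becomes a hint for L; the second cluster is labeled A and yields no hints because both of its drugs already carry that code.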

By running our method on the DrugBank 5.1.8 DDSN, we generated a consistent list of drug repositioning candidates; we selected the top betweenness/degree drugs in each cluster and performed a preliminary validation with state-of-the-art experimental results reported in the literature. Because we collected many literature confirmations of our method's predictions, we argue that our fully automated pipeline, based on Big Data and unsupervised machine learning, is a practical tool that can substantially narrow the enormous search space in drug repositioning.

To summarize, the overarching methodological contributions of our paper are listed as follows:

(i) A new method to build weighted drug-drug similarity networks based on drug-gene interactions;

(ii) An automated procedure to optimize the modularity resolution such that network clustering maximizes the number of identified drug repurposings. A known/confirmed drug repurposing is a drug with more level 1 ATC codes in the latest drug database than in the earlier database (the one used to generate the drug-drug similarity network);

(iii) A new drug repurposing list, generated with our pipeline from the latest DrugBank 5.1.8 by analyzing the three most representative clusters.
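Contribution (ii) can be illustrated with a short Python sketch of the validation criterion and of the resolution selection. The sketch assumes that the hint lists for each candidate resolution were already computed, and it interprets a confirmed repurposing as a hint whose predicted level 1 code is among the codes the drug gained between the two database versions; the function names and the tie-breaking rule are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Hint = Tuple[str, str]  # (drug name, predicted level 1 ATC code)


def confirmed_hints(hints: List[Hint],
                    atc_old: Dict[str, Set[str]],
                    atc_new: Dict[str, Set[str]]) -> List[Hint]:
    """Keep hints whose predicted code was added in the newer database."""
    confirmed = []
    for drug, label in hints:
        gained = atc_new.get(drug, set()) - atc_old.get(drug, set())
        if label in gained:
            confirmed.append((drug, label))
    return confirmed


def best_resolution(hints_by_resolution: Dict[float, List[Hint]],
                    atc_old: Dict[str, Set[str]],
                    atc_new: Dict[str, Set[str]]) -> float:
    """Pick the resolution whose hints contain the most confirmed ones."""
    # Assumption: among equally good resolutions, the smallest one is kept
    # (max returns the first maximal item of the sorted candidates).
    return max(
        sorted(hints_by_resolution),
        key=lambda r: len(confirmed_hints(hints_by_resolution[r], atc_old, atc_new)),
    )


# Toy data with placeholder drugs; level 1 codes per database version.
old_codes = {"drug_x": {"L"}, "drug_y": {"C"}}
new_codes = {"drug_x": {"L", "R"}, "drug_y": {"C"}}
candidate_hints = {
    0.5: [("drug_y", "G")],
    1.0: [("drug_x", "R"), ("drug_y", "G")],
    2.0: [("drug_x", "R")],
}
print(best_resolution(candidate_hints, old_codes, new_codes))
```

In this toy setting, the resolutions 1.0 and 2.0 each yield one confirmed hint, so the smaller of the two is returned.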

In the present context, affected by the COVID-19 pandemic, we believe that the most promising findings/results presented in our paper are the anti-infective effects of statins, especially their potential antiviral effects. Indeed, the very recent comprehensive study [6] also finds, following in vitro screening, that fluvastatin presents what the authors call "strong effect" against SARS-CoV-2.

Considering all aspects presented in Section 4.2, we will extend our research on drug-gene interaction networks by implementing hierarchical clustering to predict ATC codes on levels 1-3, developing a dedicated cluster overlapping algorithm as a drug repositioning prediction strategy (i.e., one would reasonably expect that drugs in the overlapping zone would inherit the dominant properties of the respective clusters), and integrating the drug-gene network method into an ensemble strategy. These future objectives require substantial reliance on developing bioinformatic tools, entailing algorithm design, machine learning, and Big Data analytics.

Figure A1. The zoomed detail of the DDSN network built with drug-gene interaction data from DrugBank 5.0.9, which shows the relative position of mepolizumab within cluster C 0 (brown nodes) with a red arrow (→). Our repositioning pipeline predicts that mepolizumab (listed as antineoplastic in DrugBank 5.0.9) also acts as a drug with level 1 ATC code R (Respiratory system), which is confirmed by the more recent DrugBank version 5.1.8.

Figure A2. The zoomed detail of the DrugBank 5.0.9 DDSN network showing the relative position of naloxone within cluster C 0 (brown nodes) with a red arrow (→). Our repositioning pipeline predicts that naloxone (listed as opioid overdose antidote in DrugBank 5.0.9) also acts as a drug with level 1 ATC code N (Nervous system), which is confirmed by the more recent DrugBank version 5.1.8.

Figure A3. The zoomed detail of the DrugBank 5.0.9 DDSN network showing the confirmed repositionings within cluster C 2 (green nodes), marked with red diamonds. Our repositioning pipeline predicts that torasemide and quinetazone (both with ATC level 1 code C-Cardiovascular system in DrugBank 5.0.9), methazolamide, acetazolamide, dorzolamide, and brinzolamide (all with ATC level 1 code S-Sensory organs in DrugBank 5.0.9) are Genito urinary system and sex hormones drugs (first level ATC G). Zonisamide (N-Nervous system) is a brown node (cluster C 0 ) but lies in the close vicinity of cluster C 2 ; therefore, it is also predicted at level 1 ATC code G.

Figure A6. The zoomed detail of the DrugBank 5.1.8 DDSN network showing a repositioning within cluster C 2 (light blue nodes), marked with a red diamond. Our repositioning pipeline predicts that meloxicam (currently at ATC level 1 code M-Musculo-skeletal system) has properties described by the level 1 ATC code L-Antineoplastic and immunomodulating agents.

Figure A7. The zoomed detail of the DrugBank 5.1.8 DDSN network showing a repositioning within cluster C 2 (light blue nodes), marked with a red diamond. Our repositioning pipeline predicts that theophylline (currently at ATC level 1 code R-Respiratory system) has properties described by the level 1 ATC code L-Antineoplastic and immunomodulating agents.

Figure A8. The zoomed detail of the DrugBank 5.1.8 DDSN network showing a repositioning within cluster C 2 (light blue nodes), marked with a red diamond. Our repositioning pipeline predicts that chloroquine (currently at ATC level 1 code P-Antiparasitic products, insecticides and repellents) has properties described by the level 1 ATC code L-Antineoplastic and immunomodulating agents.

Appendix B.2. DDSN Cluster Histograms

Figure A9. Histograms of level 1 ATC codes in the DrugBank 5.1.8 DDSN clusters holding drug repositionings confirmed by literature review: cluster C 0 (brown nodes), cluster C 1 (green nodes), cluster C 2 (light blue nodes), and cluster C 4 (pink nodes). The dominant property is N-Nervous system in cluster C 0 , J-Anti-infectives for systemic use in cluster C 1 , L-Antineoplastic and immunomodulating agents in cluster C 2 , and A-Alimentary tract and metabolism in cluster C 4 .

Lessons from 60 years of pharmaceutical innovation

The cost of new drug discovery and development

Discovery pharmaceutics-Challenges and opportunities

The productivity crisis in pharmaceutical R&D

The role of the medicinal chemist in drug discovery-Then and now

Network medicine framework for identifying drug-repurposing opportunities for COVID-19

Developing therapeutic approaches for twenty-first-century emerging infectious viral diseases

Drug repositioning: Identifying and developing new uses for existing drugs

Drug discovery for the future

Drug repurposing and polypharmacology to fight SARS-CoV-2 through inhibition of the main protease

Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data

Artificial intelligence in COVID-19 drug repurposing

Drug repurposing: Progress, challenges and recommendations

Clustering drug-drug interaction networks with energy model layouts: community analysis and drug repurposing

Network-based approach to prediction and population-based validation of in silico drug repurposing

A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information

Drug-target network

Construction of drug network based on side effects and its application for drug repositioning

A review of network-based approaches to drug repositioning

Data clustering based on complex network community detection

A comparative analysis of community detection algorithms on artificial networks

Uncovering New Drug Properties in Target-Based Drug-Drug Similarity Networks

Topological network measures for drug repositioning

A COVID-19 Drug Repurposing Strategy through Quantitative Homological Similarities Using a Topological Data Analysis-Based Framework

Discovery of drug mode of action and drug repositioning from transcriptional responses

Prediction of polypharmacological profiles of drugs by the integration of chemical, side effect, and therapeutic space

PREDICT: A method for inferring novel drug indications with application to personalized medicine

A diseasome cluster-based drug repurposing of soluble guanylate cyclase activators from smooth muscle relaxation to direct neuroprotection

Exploring the human diseasome: The human disease network

The human disease network

Limits of modularity maximization in community detection

DrugBank: A comprehensive resource for in silico drug discovery and exploration

0: A major update to the DrugBank database for

Docker: Lightweight linux containers for consistent development and deployment

Pandas: A foundational Python library for data analysis and statistics

Exploring Network Structure, Dynamics, and Function Using NetworkX

CDLIB: A python library to extract, compare and evaluate communities from complex networks

Gephi: An open source software for exploring and manipulating networks

Bipartite network projection and personal recommendation

Community structure in social and biological networks

Fast unfolding of communities in large networks

Drug repositioning by prediction of drug's anatomical therapeutic chemical code via network-based inference approaches

Drug repositioning by applying 'expression profiles' generated by integrating chemical structure similarity and gene semantic similarity

Pyridoxal phosphate is better than pyridoxine for controlling idiopathic intractable epilepsy

Epilepsy due to PNPO mutations: genotype, environment and treatment affect presentation and outcome

In vitro susceptibility of Aspergillus spp. clinical isolates to albendazole

Albendazole inhibits Pneumocystis carinii proliferation in inoculated immunosuppressed mice

Methotrexate inhibits SARS-CoV-2 virus replication "in vitro"

Mechanism of action of methotrexate against Zika virus

Human cytomegalovirus stimulates cellular dihydrofolate reductase activity in quiescent cells

Unexpected antimicrobial effect of statins

Statins: A viable candidate for host-directed therapy against infectious diseases

Theophylline exhibits anti-cancer activity via suppressing SRSF3 in cervical and breast cancer cell lines

Meloxicam inhibits the growth of colorectal cancer cells

Meloxicam inhibits the growth of non-small cell lung cancer

Meloxicam inhibits osteosarcoma growth, invasiveness and metastasis by COX-2-dependent and independent routes

Vitamin D signalling pathways in cancer: potential for anticancer therapeutics

The anti-cancer actions of vitamin D

Re-purposing chloroquine for glioblastoma: potential merits and confounding variables

Repurposing Drugs in Oncology (ReDO)-Chloroquine and hydroxychloroquine as anti-cancer agents

Anticancer autophagy inhibitors attract 'resurgent' interest

Dissecting pharmacological effects of chloroquine in cancer treatment: Interference with inflammatory signaling pathways

Chloroquine against malaria, cancers and viral diseases

Mecasermin rinfabate for severe insulin-like growth factor-I deficiency

Mecasermin Rinfabate [rDNA Origin] Injection)

INCRELEX, INN: Mecasermin. Scientific Discussion. Available online

Randomised controlled trial of the effects of L-ornithine on stress markers and sleep quality in healthy workers

EMUDRA: Ensemble of multiple drug repositioning approaches to improve prediction accuracy

Predicting Drug-Disease Association Based on Ensemble Strategy

Drug repositioning prediction using voting ensemble. arXiv 2021

A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions

Ensemble learning: A survey

A review of ensemble methods for de novo motif discovery in ChIP-Seq data

Review of drug repositioning approaches and resources

Figure A4. Histograms of level 1 ATC codes in the DrugBank 5.0.9 DDSN clusters holding drug repositionings confirmed by DrugBank 5.1.8: cluster C 0 (brown nodes) in the left panel and cluster C 2 (green nodes) in the right panel. The dominant property in cluster C 0 is N-Nervous system, with many subcluster drugs with level 1 ATC codes A, R, and C (Alimentary tract and metabolism, Respiratory system, and Cardiovascular system, respectively). The dominant properties in cluster C 2 are G, C, and D (Genito urinary system and sex hormones, Cardiovascular system, and Dermatologicals, respectively).

Figure A5. The zoomed detail of the DrugBank 5.1.8 DDSN network showing four repositionings within cluster C 1 (green nodes), marked with red diamonds. Our repositioning pipeline predicts that simvastatin, fluvastatin, lovastatin, and atorvastatin (currently at ATC level 1 code C-Cardiovascular system) have properties described by the level 1 ATC code J-Anti-infectives for systemic use.