key: cord-0681756-f51jorqh
authors: Gallo, Kathleen; Goede, Andrean; Eckert, Andreas; Moahamed, Barbara; Preissner, Robert; Gohlke, Björn-Oliver
title: PROMISCUOUS 2.0: a resource for drug-repositioning
date: 2020-11-16
journal: Nucleic Acids Res
DOI: 10.1093/nar/gkaa1061
sha: 4163968c60387fb5a59976ba030bc33776b8f6f4
doc_id: 681756
cord_uid: f51jorqh

The development of new drugs for diseases is a time-consuming, costly and risky process. In recent years, many drugs have been approved for indications other than their original ones. This repurposing process can effectively reduce development costs and time and, ultimately, save patients’ lives. During the ongoing COVID-19 pandemic, drug repositioning has gained widespread attention as a fast way to find potential treatments against the newly emerging disease. To open this field to researchers with varying levels of experience, from novices to experts in cheminformatics, we significantly improved the entry-level user experience. The browsing functionality can be used as a global entry point to collect further information on small molecules (∼1 million), side-effects (∼110 000) or drug-target interactions (∼3 million). The drug-repositioning tab for small molecules also suggests possible drug-repositioning opportunities to the user by applying structural similarity measurements for small molecules in two different approaches. Additionally, using information from the Promiscuous 2.0 database, lists of candidate drugs for given indications were precomputed, including a section dedicated to potential treatments for COVID-19. All the information is interconnected by a dynamic network-based visualization to identify new indications for available compounds. Promiscuous 2.0 is unique in its functionality and is publicly available at http://bioinformatics.charite.de/promiscuous2.

The promiscuity of small molecule compounds and proteins can be described as their ability to bind to a number of targets and, specifically, in the case of drugs, to different targets in addition to the intended main target (1). Since the modulation of off-target activity can lead to unintended and potentially harmful side-effects, promiscuity was often regarded as a negative trait (2, 3). However, following the increased availability of compound-target interaction data, an increased number of formerly unknown promiscuous interactions and proteins became known, and the term polypharmacology was coined for multitarget binding activity (4, 5). Since the development of drugs is an expensive process which can cost up to two billion US dollars (6, 7), pharmaceutical companies quickly became interested in exploiting drug promiscuity as a means to cut development and approval costs of new drugs by reassigning already approved drugs to different application areas derived from their off-targets (8-10). Due to the increased interest in using drug-target relations to reposition drugs for new indications, a number of databases were developed in recent years with a focus on providing information and features useful for this purpose. Regarding the ongoing COVID-19 pandemic, drug repositioning is an especially promising approach to find treatments or cures, since in addition to cutting development costs, it also significantly decreases the time for approval, which is of great importance during the rapid spread of a new disease (11-13).

In recent years, several different approaches, both biological and computational, were used to find new areas of application for already approved drugs. Experimental approaches include target screening approaches, cell assay approaches, animal model approaches and clinical approaches, whereas computational approaches are, for instance, structure-, network- or text-mining-based (14). A variety of biological databases, such as DrugBank (15) and ChEMBL (16), provide the information necessary to build a potential drug repositioning pipeline.

Unfortunately, the scope of existing databases that focus directly on drug repurposing is limited, as they either serve as a pure compilation of successfully repositioned drugs (17, 18) or completely lack interactive features (19, 20). Most importantly, there is currently no database that provides possible indications for a compound of interest in an easily accessible way or shows data in a visual and interactive fashion. With Promiscuous 2.0, we aim to close this gap by providing a powerful yet easy-to-use resource, which enables experts and non-experts alike to create complex interaction networks in an intuitive and interactive way, and potentially infer new indications for existing compounds. Moreover, it is possible to search for existing drugs or even submit new molecular structures and be presented with suggested application areas or, vice versa, to get potential drug candidates for disease indications of interest. This feature is unique across all published databases and was also tested and applied to create a list of candidate drugs against COVID-19.

Promiscuous was designed as a comprehensive database of information to identify potential drug repositioning opportunities. For this, an extensive integration of available data is necessary. To ensure high data quality, different quality control and filter steps were included.

In comparison to the first version of Promiscuous, the number of drugs and drug-like compounds was vastly increased from formerly 25,000 to almost a million.

Drug-target information was included from DrugBank (15), ChEMBL (16), SuperDRUG2 (21) and SuperTarget (22) and combined into this comprehensive database. To consider only highly accurate and direct interactions between small molecules and proteins, which are necessary to provide viable drug repositioning strategies, the data was filtered and standardized.

Afterwards a variety of information was compiled to further enrich the data, most of which is neither included in the old Promiscuous database nor in comparable resources.

Information regarding side effect occurrence and frequency for the extracted drugs was obtained from the SIDER resource (v. 4.1) (23), which does not only contain adverse events occurring during the clinical trials phase but also those identified post-approval.

Newly included in Promiscuous 2.0 are predicted targets for a total of almost 600 000 drugs and small molecule compounds, which were inferred using the SuperPred webserver (24), as well as disease indications for both drugs and targets. The indications were obtained in the form of ICD-10 codes from the Therapeutic Target Database (TTD) (25).

The entirety of the data contained in Promiscuous 2.0 is stored in a relational MySQL database and hosted on the Charité IT system. Backend functionality is provided via a lab-based LAMP (Linux/Apache/MySQL/PHP) server. PHP serves as the backend language, connecting to the database through the MySQL interface and delivering data to the frontend via a mixture of classic HTML form submission responses and AJAX requests.

As the website functionalities rely strongly on JavaScript and its library jQuery (https://jquery.com/), the use of a JavaScript-capable browser is essential and, due to more extensive testing, Google Chrome is recommended. Various functionality from the CSS framework Bootstrap (https://getbootstrap.com/) is used on the website. Additionally, tables were created using the jQuery plug-in DataTables (https://datatables.net), along with its absolute sorting extension (https://datatables.net/plug-ins/sorting/absolute). Charts on the website were created with the JavaScript library amCharts 4 (https://www.amcharts.com/), and the network uses the JavaScript library D3.js (https://d3js.org/).

The machine learning was implemented in Python using a random forest approach provided by the Python package scikit-learn (https://scikit-learn.org/stable/). To assess molecular features, molecule structures were converted to MACCS molecular fingerprints (26) using RDKit (https://www.rdkit.org/) and, based on the indication mapping provided by the TTD database (25), used as input to predict the effectiveness of given molecular structures against disease indications.
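As a rough illustration of how per-indication models could be applied to a query fingerprint, the following sketch uses only the Python standard library. It assumes one model per ICD-10 3-character category and reports the categories whose predicted probability reaches 0.8, mirroring the 80% cut-off described for the repositioning search. The function names, the toy fingerprints and the `toy_model` stand-in are hypothetical; in the actual resource, the probabilities would presumably come from trained random forest classifiers rather than from the bit-overlap placeholder used here.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]            # indices of the fingerprint bits that are set
Model = Callable[[Fingerprint], float]  # returns a probability in [0, 1]


def icd10_category(code: str) -> str:
    """Reduce an ICD-10 code such as 'G43.1' to its 3-character category 'G43'."""
    return code.strip().upper()[:3]


def toy_model(key_bits: FrozenSet[int]) -> Model:
    """Hypothetical stand-in for a trained classifier: the 'probability' is
    simply the fraction of key_bits that are present in the query fingerprint."""
    def predict(fp: Fingerprint) -> float:
        return len(fp & key_bits) / len(key_bits)
    return predict


def report_indications(
    fp: Fingerprint, models: Dict[str, Model], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (category, probability) pairs at or above the threshold, best first."""
    scored = [(category, model(fp)) for category, model in models.items()]
    kept = [(category, p) for category, p in scored if p >= threshold]
    return sorted(kept, key=lambda hit: (-hit[1], hit[0]))


# Toy models keyed by ICD-10 category (bit patterns are invented for illustration).
models = {
    icd10_category("G20"): toy_model(frozenset({1, 2, 3, 4, 5})),
    icd10_category("G43.1"): toy_model(frozenset({2, 3, 9, 10})),
    icd10_category("I48.0"): toy_model(frozenset({1, 2, 3, 4, 7})),
}
query = frozenset({1, 2, 3, 4, 5, 8})
hits = report_indications(query, models)
```

Keeping the threshold as a parameter makes it easy to inspect weaker associations without changing the models themselves.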

The main aim of Promiscuous 2.0 is to provide a freely available, powerful and easily accessible resource for drug repositioning opportunities. By providing different searching levels, non-experts as well as experts can work with this resource. The interactive network visualization provides easy access for all users to complex drug-target and target/drug to indication interplay.

The major new feature in Promiscuous 2.0 provides the unique opportunity to search for drugs and compounds with the goal of obtaining potential areas of interest for drug repositioning, as well as to suggest drugs and molecular structures from the Promiscuous 2.0 database for indications of interest. There are therefore two options to explore potential drug repositioning opportunities: starting with a molecular structure, possible areas of application are provided; starting from an indication of interest, potentially applicable drugs are reported.

When exploring drug repositioning opportunities for a molecular structure in question, the input compound is defined by the user providing its molecular structure (Figure 1E), either from PubChem via entering its PubChem name, by providing a SMILES string or MOLFile/ChemDoodle JSON, or by drawing the structure. As a result, the drug repositioning option is not only viable for known and approved drugs, but also for experimental compounds which are not included in the Promiscuous 2.0 database at the time of the search. In order to provide a comprehensive result, two different approaches are implemented, based on machine learning and overall structural similarity to known drugs.

In order to use a machine learning approach for the prediction of possible areas of application, indications were grouped according to their ICD-10 3-letter categories, due to the requirement of a suitable number of known drugs to build a viable prediction model. Upon entering a molecular structure of interest, it is compared to each of the precalculated models. As a result, the tool lists each indication group for which the corresponding model reported a probability of at least 80% of being associated with the structure in question. Even structures known to the database are

The detection of potential drug repositioning areas for a given molecular structure according to structural similarity is based on the assumption that structural similarity leads to similar interaction profiles. Therefore, a similarity search using extended connectivity fingerprints (ECFP4) for compounds is performed (27). Identical compounds are excluded by requiring the Tanimoto similarity to be less than one. If there are suitable candidates, their corresponding known targets are evaluated and, based on associated indications, potential repositioning areas are suggested.
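The similarity step can be sketched as follows, assuming fingerprints are represented as sets of "on" bit indices. The Tanimoto coefficient is the size of the intersection divided by the size of the union; identical compounds (similarity of exactly 1) are dropped, as described above. The compound names, bit patterns and the `min_similarity` cut-off are hypothetical, since the text does not state which lower bound counts as sufficiently similar.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the fingerprint bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two bit sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_compounds(
    query: Fingerprint,
    database: Dict[str, Fingerprint],
    min_similarity: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs, most similar first, excluding
    identical structures by requiring the similarity to be below 1."""
    hits = []
    for name, fp in database.items():
        score = tanimoto(query, fp)
        if min_similarity <= score < 1.0:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Toy database with invented fingerprints.
database = {
    "compound_A": frozenset({1, 2, 3, 4}),     # identical to the query
    "compound_B": frozenset({1, 2, 3, 5}),     # 3 shared / 5 in union
    "compound_C": frozenset({1, 2, 3, 4, 5}),  # 4 shared / 5 in union
    "compound_D": frozenset({7, 8}),           # nothing shared
}
query = frozenset({1, 2, 3, 4})
hits = similar_compounds(query, database)
```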

Should the specified drug already be contained in the database and have indications associated with it, the corresponding ICD-10 codes are excluded from the proposed results. Instead, these known indications are displayed separately in the result screen for reference.

Additionally, for structures previously contained in the database, all known targets that are associated with indications are also reported and suggestions based on shared targets with the reference compounds are excluded from the main results.
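The exclusion rules described above can be sketched as a small filtering step over the similarity hits. The sketch assumes three lookup tables (similar reference compounds, their known targets, and the indications linked to each target); all identifiers and the ICD-10 codes attached to the toy targets are hypothetical placeholders, not data from the database.

```python
from typing import Dict, FrozenSet, List, Tuple

Evidence = Tuple[str, float, str]  # (reference compound, similarity, target)


def suggest_indications(
    similar: List[Tuple[str, float]],
    compound_targets: Dict[str, List[str]],
    target_indications: Dict[str, List[str]],
    known_indications: FrozenSet[str] = frozenset(),
    known_targets: FrozenSet[str] = frozenset(),
) -> Dict[str, List[Evidence]]:
    """Map each proposed ICD-10 code to the evidence it was derived from.

    Codes already known for the query compound are skipped, as are
    suggestions that rest on a target the query shares with a reference.
    """
    suggestions: Dict[str, List[Evidence]] = {}
    for reference, similarity in similar:
        for target in compound_targets.get(reference, []):
            if target in known_targets:
                continue  # shared target: not a new hypothesis
            for code in target_indications.get(target, []):
                if code in known_indications:
                    continue  # already an indication of the query compound
                suggestions.setdefault(code, []).append(
                    (reference, similarity, target)
                )
    return suggestions


# Toy inputs (placeholders only).
similar = [("ref_1", 0.85), ("ref_2", 0.72)]
compound_targets = {"ref_1": ["T1", "T2"], "ref_2": ["T2", "T3"]}
target_indications = {"T1": ["G20"], "T2": ["I48"], "T3": ["G43", "G20"]}

proposed = suggest_indications(
    similar,
    compound_targets,
    target_indications,
    known_indications=frozenset({"I48"}),
    known_targets=frozenset({"T2"}),
)
```

Returning the evidence alongside each code keeps track of which reference compound and similarity value a suggestion was derived from, which is the information the result table is described as showing.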

The network, established from either a compound or an indication, interlinks compounds, side-effects, indications, and target- as well as drug-drug interactions. Spherical markers represent side-effects (blue), indications (red), target interactions (yellow) and drug-drug interactions (green). Selecting a graph node shows further information and allows additional connected nodes to be added. Nodes that are not needed can be removed from the graph and, additionally, each node can be dragged to rearrange it. Scrolling over a particular area provides zoom and it is also possible to hover over a node to enlarge its name.

In addition to the interactive functionalities of the database, a compilation of supporting information on drugs and targets was included. Since it is both directly searchable and browsable by indications and ATC codes, it provides functionalities usable by experts and non-experts alike.

Searching for a drug or compound in Promiscuous 2.0 is based on comparing a user-specified input structure to all compound structures contained in the database via their respective molecular fingerprints. Therefore, researchers interested in a particular drug or compound are given different options to specify their molecular structure of interest, the simplest of which is to enter the PubChem (28) name, which results in the corresponding molecular structure being uploaded to the integrated ChemDoodle (29) structure viewer. Here, it is possible to preview the transmitted structure as well as make modifications, such as the deletion or addition of single atoms or substructures. Similarly, it is also possible to draw a molecule structure completely from scratch using the provided drawing tools. Additionally, it is possible to enter a SMILES string (30) describing the molecular structure or to upload a file containing the molecular structure (Figure 1E).

Upon starting the similarity search, the database is queried with the input structure and, once the comparisons are finished, all highly similar compounds are listed and sorted according to their Tanimoto coefficient, with a Tanimoto similarity equal to 1 indicating that the structure from the database is identical to the queried structure. Here, it is possible to choose a molecule from the results to display more detailed information, such as known and predicted targets or associated indications and side effects.

To search for a specific target, there are also different identifiers available. In addition to UniProt accession numbers and names, it is also possible to use gene names to identify a target or to perform a simple text search for target names. Except when searching for a specific UniProt accession number, a substring search is performed, which can potentially produce multiple results, although only the first 20 hits are reported to the user.
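The search behaviour described above (exact match for accession numbers, substring match otherwise, at most 20 hits) can be sketched as follows. The record layout, field names and the synthetic "toy kinase" entries are assumptions for illustration; matching is made case-insensitive here, which the text does not specify.

```python
from typing import Dict, List

Target = Dict[str, str]
SUBSTRING_FIELDS = ("uniprot_name", "gene", "name")


def search_targets(
    query: str, field: str, targets: List[Target], limit: int = 20
) -> List[Target]:
    """Exact match on 'accession'; substring match on the other fields,
    truncated to the first `limit` hits."""
    needle = query.strip().lower()
    if not needle:
        return []
    if field == "accession":
        return [t for t in targets if t["accession"].lower() == needle]
    if field not in SUBSTRING_FIELDS:
        raise ValueError(f"unknown search field: {field!r}")
    hits = [t for t in targets if needle in t[field].lower()]
    return hits[:limit]


# Illustrative records: one entry for GLUL plus 30 synthetic placeholders.
targets: List[Target] = [
    {
        "accession": "P15104",
        "uniprot_name": "GLNA_HUMAN",
        "gene": "GLUL",
        "name": "Glutamine synthetase",
    }
] + [
    {
        "accession": f"X{i:05d}",
        "uniprot_name": f"TOY{i}_HUMAN",
        "gene": f"TOYK{i}",
        "name": f"Toy kinase {i}",
    }
    for i in range(30)
]

by_accession = search_targets("p15104", "accession", targets)
by_gene = search_targets("glu", "gene", targets)
by_name = search_targets("kinase", "name", targets)  # 30 matches, 20 reported
```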

The result page contains an overview of all obtained protein targets, including tables containing indications and drugs if available for the specific target. From here, it is also possible to directly select one of the associated drugs and display the corresponding drug details.

As an option that requires minimal background knowledge, a browsing functionality was newly implemented in Promiscuous 2.0. This feature enables users to find drugs and targets that are associated with specific diseases or work in specific organs. For this purpose, drugs were associated with diseases via the International Classification of Diseases 10th Revision (ICD-10) as well as according to the Anatomical Therapeutic Chemical Classification System (ATC). By either scrolling through the whole list of available ICD-10 and ATC codes or filtering the list of options by typing in a disease or area of interest, it is possible to select a specific code and search for drugs or targets associated with it. The obtained result screens resemble the result pages of the direct drug and target searches, respectively, by displaying a list of drugs and targets with the possibility to select a drug to gain additional information.

As an example drug, warfarin, one of the most widely used anticoagulants worldwide, was chosen (31). To use the drug repositioning option, simply entering the name 'warfarin' is sufficient to load its structure and start the similarity search. The result screen reports a total of four potential new indications for warfarin, which can be traversed by clicking on parts of the pie charts. The larger chart corresponds to a broader compilation of ICD-10 categories, and clicking on a single slice changes the small chart to represent the ICD-10 subcategories corresponding to the selected slice. Upon clicking on either of the pie charts, a table with detailed information for the chosen ICD-10 categories is displayed underneath the graphic (Figure 1C).

When evaluating the results for warfarin, the newly reported indications are given as follows according to their ICD-10 codes: G20: Parkinson disease, G43: Migraine, G47: Sleep disorders and F51: Nonorganic sleep disorders. For each of the indications, it is reported from which reference compound(s) the indication was derived and how high the Tanimoto similarity between the compounds was. For example, Parkinson disease was suggested as a new application area for warfarin due to the structural similarity to ethyl biscoumacetate (Figure 1C).
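The two-level pie chart can be thought of as a grouping of ICD-10 categories into broader ranges. The sketch below groups the four codes reported for warfarin by ICD-10 block; it assumes that the "broader compilation" corresponds to ICD-10 blocks, which the text does not state explicitly, and the `BLOCKS` table is only a small excerpt sufficient for this example.

```python
from typing import Dict, List, Optional, Tuple

# Excerpt of ICD-10 block ranges (start category, end category).
BLOCKS: List[Tuple[str, str]] = [
    ("F50", "F59"),
    ("G20", "G26"),
    ("G40", "G47"),
]


def block_of(code: str) -> Optional[str]:
    """Return the block label (e.g. 'G40-G47') containing an ICD-10 code."""
    category = code.strip().upper()[:3]
    for start, end in BLOCKS:
        # Plain string comparison is sufficient because categories share the
        # fixed format of one letter followed by two digits.
        if start <= category <= end:
            return f"{start}-{end}"
    return None


def drill_down(codes: List[str]) -> Dict[str, List[str]]:
    """Group codes by block: keys feed the large chart, values the small one."""
    grouped: Dict[str, List[str]] = {}
    for code in codes:
        grouped.setdefault(block_of(code) or "other", []).append(code)
    return grouped


warfarin_codes = ["G20", "G43", "G47", "F51"]
slices = drill_down(warfarin_codes)
```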

As an alternative, it is possible to use the network visualization function to obtain new indications for existing drugs or, vice versa, to find new compounds possibly associated with a disease of interest. By starting with the ICD-10 categories G20-G26 (Figure 2A), which include Parkinson disease, it is possible to find medications with a potential connection to the indications. The addition of one set of associated targets brings up GLUL (Figure 2B), which is identified as glutamine synthetase when clicking on the corresponding node. When searching for drugs and compounds that interact with the enzyme, after two drug additions, ethyl biscoumacetate (Figure 2C) is found. Finally, after one addition of drugs that interact with ethyl biscoumacetate, warfarin is added to the network (Figure 2D). As apparent from the network (Figure 2), there is no direct connection from warfarin to Parkinson disease, but via interacting compounds and targets a possible mechanism of action via drug-drug interactions, which suggest a similar interaction profile, is found.
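The stepwise exploration in this walkthrough amounts to expanding neighbours in a graph until the compound of interest appears. The following sketch models that with a toy undirected network and a breadth-first search. The edges are a simplified reading of the walkthrough (plus one invented distractor node), not data taken from the database, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# (node, node, edge type); a simplified toy version of the example network.
EDGES: List[Tuple[str, str, str]] = [
    ("G20-G26", "GLUL", "indication-target"),
    ("G20-G26", "hypothetical target X", "indication-target"),
    ("GLUL", "ethyl biscoumacetate", "drug-target"),
    ("ethyl biscoumacetate", "warfarin", "drug-drug"),
]


def build_graph(edges: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map, ignoring the edge type."""
    graph: Dict[str, Set[str]] = {}
    for a, b, _kind in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def expand(graph: Dict[str, Set[str]], shown: Set[str], node: str) -> Set[str]:
    """Add the neighbours of `node` to the displayed set; return the new nodes."""
    new_nodes = graph.get(node, set()) - shown
    shown |= new_nodes
    return new_nodes


def shortest_path(
    graph: Dict[str, Set[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of nodes from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(EDGES)
shown = {"G20-G26"}
step1 = expand(graph, shown, "G20-G26")               # adds the targets
step2 = expand(graph, shown, "GLUL")                  # adds interacting drugs
step3 = expand(graph, shown, "ethyl biscoumacetate")  # adds interacting drugs
path = shortest_path(graph, "G20-G26", "warfarin")
```

In this toy network there is no direct edge between warfarin and the indication; the link only emerges through the intermediate target and drug, which is the point the walkthrough makes.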

The potential treatment of Parkinson disease with warfarin is supported by warfarin already having completed phase 1 trials for the treatment of Parkinson's disease in 2014. Similarly, for the other suggested application areas, warfarin is frequently reported to be effective in the treatment of migraine (32-34).

Another opportunity for drug repositioning is presented with the addition of predicted targets to Promiscuous 2.0. For warfarin, there are three potential targets predicted (Figure 1B). Searching for these targets reveals that they are associated with indications such as Huntington's disease or osteoporosis. Likewise, there are reports showing that warfarin can be effective against Huntington's disease (35). In contrast, in the case of osteoporosis, there are articles showing that warfarin can negatively affect bone density (36, 37).

These reports support the validity of the results, but also stress the importance of carefully considering the positive or negative nature of the inferred associations. This is additionally reinforced by the application of the machine learning models to predict possible interactions of warfarin. In addition to a number of heart diseases, which are closely connected to the approved indication of warfarin as an anticoagulant (31), two other clusters of indications are found, the first being various forms of joint disorder, such as arthropathies, the second various forms of neoplasms. Warfarin is frequently associated with hemarthrosis, a joint bleeding disorder which can be induced by anticoagulation therapy (38, 39), whereas a 2017 population-based cohort study in 1.2 million people found a significantly lower incidence rate of cancer among warfarin users compared to non-users, which indicates that warfarin might be a protective factor for cancer (40).

Additionally, a machine learning model specifically for the proposal of drugs that could potentially be active against COVID-19 was developed. To obtain such a model, 53 candidate drugs, including angiotensin-converting enzyme 2 (the host cell receptor for the S protein of SARS-CoV-2) inhibitors, and a series of various antiviral drugs were used as training data for a random forest approach. Using this method, a total of 230 drugs that could potentially limit the spread of SARS-CoV-2 or the mortality of COVID-19 were obtained. Among the highest-ranking drugs are corticosteroids, such as budesonide, dexamethasone and betamethasone, of which the first two are already being evaluated as candidate drugs against COVID-19, whereas the last is not. Interestingly, estradiol is also among the proposed drugs against COVID-19, whereas testosterone as well as various testosterone derivatives did not make the cut, reinforcing the theory that the worse clinical course in infected men could be linked to testosterone levels (41). Also among the suggested drugs is loperamide, which was shown to inhibit the in vitro replication of MERS- and SARS-CoV (42). However, loperamide has a low oral absorption rate and is unable to cross the blood-brain barrier (43), both of which are factors that are not taken into consideration when predicting the effectiveness against COVID-19. Therefore, even though the in vitro effectiveness against related coronaviruses supports it as a potential finding against COVID-19, this example also shows that the given suggestions need to be carefully evaluated. Since the prediction is solely based on structural features, other deterring properties regarding (low) drug absorption or mechanism of action might exist, which are not addressed by the machine learning methodology. These properties need to be considered when evaluating the predictions.

Promiscuous 2.0 represents an extensive update to the previously existing Promiscuous database, both in the amount of data and in the introduction of entirely new features. Apart from having new information added to it, the previous Promiscuous database has been expanded and advanced in almost all areas (Table 1). The focus of the database was shifted towards drug repositioning features by adding predicted targets and indications for drugs as well as targets.

Combining the information provides easily accessible and unique features that list potential drug repositioning opportunities in a comprehensible way.

Even though the amount of data was vastly increased, it is, of course, impossible to claim absolute completeness of the data, which can lead to oversights in the suggested predictions. Additionally, the quality of prediction algorithms always depends on the quality of the underlying data, which, due to its size, is not feasible to curate manually. Therefore, the data was collected and filtered from established databases, such as ChEMBL (16), DrugBank (15) and TTD (25), in order to achieve a compromise between amount and quality of the data. Still, as is always the case for predictions, the suggestions need to be carefully evaluated and interpreted, as illustrated by the potential detrimental role of warfarin in osteoporosis and the problematic absorption of loperamide.

In recent years, a number of databases with a focus on drug repositioning have emerged, though none as comprehensive as Promiscuous 2.0 and for the most part applying different methods and functionalities.

RepurposeDB serves as a compendium of successfully repositioned drugs, annotated amongst others with primary and secondary indications (17). In contrast to Promiscuous 2.0, it completely lacks interactive features like the suggestion of new indications or an interactive network representation. Additionally, with only 250 compounds it contains a very low number of drugs that are only searchable either as a whole list or by submitting a complete input structure.

Similarly, repoDB lists drugs with annotated indications, but contains no further information or features at all (18).

The Experimental Knowledge-Based Drug Repositioning Database (19) is based on a similar principle as Promiscuous 2.0 and compiles drug repositioning opportunities upon searching for specific drugs, but does so solely based on information associated with the drug in question, such as known targets or scientific articles describing researched repositioning areas for this drug. In addition, as no similarity search or similar means are performed, the functionality is not applicable to newly derived structures. Moreover, it has considerably fewer entries than Promiscuous 2.0 (Table 1).

The Drug ReposER webserver operates on the assumption that targets with structurally similar binding sites bind to similar drugs and therefore identifies new potential targets for compound structures via molecular docking and proposes them as protein targets for drug repositioning (44). However, the method solely provides predicted targets and fails to report associated indications or to take any additional criteria into account, such as structural similarity to known drugs.

The DrugSig resource for computational drug repositioning uses the comparison of gene expression signatures to the signatures of known drugs as the basis for drug repositioning (20). Since a gene expression profile of up- and downregulated genes is required as input, its accessibility is quite limited. Moreover, it only contains 1300 drugs, 800 targets and 6000 gene expression signatures, which further limits its scope.

We will regularly update the database with new entries to ensure excellent coverage and data quality standards. In particular, all data used for suggestions of drug repositioning opportunities will be further enriched. The similarity method will also be further developed and improved with regard to validity. We also plan to improve the network representation to meet the requirements of the users, and to include comorbidities in a future version to ensure better suggestions.

To further enhance the functionality, we plan to include additional data from relevant databases, especially regarding clinical and pharmacological information, to improve the prediction algorithm and simplify the filtering and evaluation of the results. Additionally, a REST API will be implemented to increase programmatic access to our Promiscuous database.

The data that support the findings of this study are openly available at http://bioinformatics.charite.de/promiscuous2.

Large-scale detection of drug off-targets: hypotheses for drug repurposing and understanding side-effects

Drug promiscuity

Mechanisms of unpredictable adverse drug reactions

Polypharmacology in precision oncology: current applications and future prospects

Rationalizing promiscuity cliffs

Estimating the cost of new drug development: is it really 802 million dollars?

The cost of drug development: a systematic review

Compound promiscuity: what can we learn from current data?

Drug repositioning: identifying and developing new uses for existing drugs

Editorial: drug repositioning: current advances and future perspectives

Current status of COVID-19 therapies and drug repositioning applications. iScience, 23

Rapid repurposing of drugs for COVID-19

Drug repositioning is an alternative for the treatment of coronavirus COVID-19

Review of drug repositioning approaches and resources

DrugBank 5.0: a major update to the DrugBank database

Systematic analyses of drugs and disease indications in RepurposeDB reveal pharmacological, biological and epidemiological factors influencing drug repositioning

A standard database for drug repositioning

EK-DRD: a comprehensive database for drug repositioning inspired by experimental knowledge

DrugSig: A resource for computational drug repositioning utilizing gene expression signatures

SuperDRUG2: a one stop resource for approved/marketed drugs

SuperTarget goes quantitative: update on drug-target interactions

The SIDER database of drugs and side effects

SuperPred: update on drug classification and target prediction

Therapeutic target database update 2018: enriched resource for facilitating bench-to-clinic research of targeted therapeutics

Reoptimization of MDL keys for use in drug discovery

Extended-connectivity fingerprints

PubChem 2019 update: improved access to chemical data

ChemDoodle Web Components: HTML5 toolkit for chemical graphics, interfaces, and informatics

SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules

Warfarin: almost 60 years old and still causing problems

Warfarin treatment and migraine

An unusual case report on the possible role of Warfarin in migraine prophylaxis

Migraine responsive to warfarin: an update on anticoagulant possible role in migraine prophylaxis

Huntington's disease: improvement with an anticoagulant-psychotherapy regimen

Reduced bone density in patients on long-term warfarin

Long-term warfarin therapy and biomarkers for osteoporosis and atherosclerosis

Warfarin-related recurrent knee haemarthrosis treated with arterial embolisation and intra-articular injection of tranexamic acid

Hemarthrosis associated with sodium warfarin therapy

Association of warfarin use with lower overall cancer incidence among patients older than 50 years

Worse progression of COVID-19 in men: Is testosterone a key factor?

Screening of an FDA-approved compound library identifies four small-molecule inhibitors of Middle East respiratory syndrome coronavirus replication in cell culture

Loperamide: a pharmacological review

Drug ReposER: a web server for predicting similar amino acid arrangements to known drug binding interfaces for potential drug repositioning

Funding for open access charge: Charité. Conflict of interest statement. None declared.