key: cord-0015060-9lu8c8o9
authors: Bobrowski, Tesia; Alves, Vinicius M.; Melo-Filho, Cleber C.; Korn, Daniel; Auerbach, Scott; Schmitt, Charles; Muratov, Eugene N.; Tropsha, Alexander
title: Computational models identify several FDA approved or experimental drugs as putative agents against SARS-CoV-2
date: 2020-04-22
journal: ChemRxiv
DOI: 10.26434/chemrxiv.12153594
sha: 88c34d9c2e7c40089c3c32a38a06333cad382305
doc_id: 15060
cord_uid: 9lu8c8o9

The outbreak of a novel human coronavirus (SARS-CoV-2) has evolved into a global health emergency, infecting hundreds of thousands of people worldwide. In an effort to find antiviral medications, many computational groups have pursued the 3C-like protease of the virus, also known as the main protease (Mpro), as a drug target. We identified experimental data on the inhibitory activity of compounds tested against the closely related protease of SARS-CoV (96% sequence identity, 100% active site conservation) and used these data to build Quantitative Structure-Activity Relationship (QSAR) models. We employed these models for virtual screening of all marketed, withdrawn, experimental, and investigational drugs in DrugBank, including compounds in clinical trials. Molecular docking and similarity search approaches were explored in parallel with QSAR modeling. However, molecular docking failed to correctly discriminate between experimentally active and inactive compounds, so we did not rely on it for prospective virtual screening. As a result of our studies, we recommended 41 approved, experimental, or investigational drugs as potential agents against SARS-CoV-2 acting as putative inhibitors of Mpro. Ten compounds with feasible prices were purchased and are awaiting experimental validation. This manuscript will be updated once results are available. It will be submitted for peer-reviewed publication if compounds are found to be active in a SARS-CoV-2 phenotypic screen.
On December 8th, 2019, Chinese health authorities in Hubei detected the first case of an infection caused by a novel coronavirus since named SARS-CoV-2.[1,2] On January 31, less than two months later, the World Health Organization declared the SARS-CoV-2 outbreak a global health emergency.[3] The new coronavirus is most similar to a bat betacoronavirus that does not infect humans. It also belongs to the same family as the notorious human coronaviruses SARS-CoV (severe acute respiratory syndrome coronavirus) and MERS-CoV (Middle East respiratory syndrome coronavirus), which have reported fatality rates of 10% and 35%, respectively.[4,5] Current estimates (as of April 16th, 2020) of the fatality rate of COVID-19 vary by age cohort. To date, the virus is estimated to have infected over two million people. These statistics are approximate because of established asymptomatic transmission, likely underreporting, and lack of testing by health authorities.[6,7] The fatality rate of the current virus is estimated to be lower than those of SARS-CoV and MERS-CoV, but it has been shown to be highly transmissible. It infected the first 1,000 patients in only 48 days, whereas SARS took 130 days and MERS took 2.5 years.[8] The initial speed of the spread of SARS-CoV-2 was enough to indicate pandemic potential at the start of the outbreak. Hundreds of thousands of cases have now been reported worldwide despite strict quarantine and travel protocols in many countries.

No antivirals or vaccines exist against SARS-CoV-2 or past epidemic betacoronaviruses, which reflects a broader paucity of data on this genus of viruses.[9] Genomic sequences of SARS-CoV-2 continue to be uploaded to GenBank, hosted by the National Center for Biotechnology Information (NCBI), where 1,084 distinct sequences are listed to date.
[10] The first protein crystal structure for SARS-CoV-2 deposited in the Protein Data Bank was released in February 2020. It was the 2019-nCoV main protease (also known as 3C-like protease or Mpro) in complex with […]

The workflow employed in this study can be seen in […]

We collected 201 data points for the SARS-CoV Mpro assay (ChEMBL ID: X). After curation, 91 compounds were kept (27 actives and 64 inactives, using an activity threshold of 10 µM). We found 22 additional compounds in the PDB (13 actives and 9 inactives) that were not available in ChEMBL. In total, 113 compounds (40 actives and 73 inactives) were kept for modeling. All chemical structures and the corresponding biological data were carefully standardized using Standardizer v.20.8.0 (ChemAxon, Budapest, Hungary, http://www.chemaxon.com) according to the protocols proposed by Fourches and colleagues.[31,32] Briefly, inorganics, counterions, metals, organometallic compounds, and mixtures were removed. In addition, specific chemotypes such as aromatic rings and nitro groups were normalized. Finally, duplicates were analyzed and excluded as follows:
(i) if duplicates presented discordant biological activities, both entries were excluded;
(ii) if the reported outcomes of the duplicates were the same, one entry was retained and the other excluded.

The QSAR models were developed using three types of descriptors: Morgan fingerprints,[33] […] type, but also for other atomic characteristics that may affect the biological activity of molecules. Examples include partial charge, lipophilicity, refraction, and the ability of an atom to act as a hydrogen-bond (H-bond) donor or acceptor. Detailed descriptions of HiT QSAR and SiRMS can be found elsewhere.[35] Dragon descriptors were also calculated at the 2D level. For both SiRMS and Dragon, descriptors with variance below 0.01 were removed. Correlated descriptors were also removed.
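The curation rules above can be sketched in plain Python. This is a minimal illustration, not the authors' ChemAxon workflow. It makes three assumptions:
- Structures have already been standardized into canonical strings (the SMILES keys in the example are hypothetical).
- A potency at or below 10 µM counts as active (whether the cutoff is inclusive is not stated).
- The 0.01 descriptor filter uses population variance.

```python
from collections import defaultdict
from statistics import pvariance

# Assumption: "active" means a potency (e.g. IC50) at or below 10 µM.
ACTIVITY_THRESHOLD_UM = 10.0


def label_activity(potency_um):
    """Binarize a potency value in µM: 1 = active, 0 = inactive."""
    return 1 if potency_um <= ACTIVITY_THRESHOLD_UM else 0


def deduplicate(records):
    """Apply the duplicate rules to (standardized_structure, label) pairs.

    Discordant duplicates are dropped entirely; concordant duplicates keep one
    entry. Structures are assumed to be canonical strings from a prior
    standardization step, so string equality identifies duplicates.
    """
    labels = defaultdict(set)
    first_seen = []
    for structure, label in records:
        if structure not in labels:
            first_seen.append(structure)
        labels[structure].add(label)
    return [(s, next(iter(labels[s]))) for s in first_seen if len(labels[s]) == 1]


def drop_low_variance(rows, names, min_variance=0.01):
    """Remove descriptor columns whose population variance is below min_variance."""
    keep = [j for j in range(len(names))
            if pvariance([row[j] for row in rows]) >= min_variance]
    return [[row[j] for j in keep] for row in rows], [names[j] for j in keep]


if __name__ == "__main__":
    # Hypothetical (structure, potency in µM) pairs.
    raw = [("CCO", 3.2), ("CCO", 7.5), ("c1ccccc1O", 4.0),
           ("c1ccccc1O", 50.0), ("CCN", 25.0)]
    labeled = [(s, label_activity(v)) for s, v in raw]
    print(deduplicate(labeled))  # [('CCO', 1), ('CCN', 0)]
```

In this example, the concordant "CCO" pair collapses to one active entry. The phenol pair is active in one record and inactive in the other, so it is discarded completely.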
QSAR models were built and rigorously validated following best practices.[36] The models were built using the Random Forest (RF) algorithm [37] implemented in scikit-learn (http://scikit-learn.org). Random Forest hyperparameters were tuned using the GridSearchCV module of scikit-learn. Trees were decorrelated in two ways. First, the compound instances used in modeling were bootstrapped (sampled with replacement). Second, a random sample of √N features was drawn when splitting each node, where N is the total number of available features. Trees were configured to evaluate features on classification accuracy at the median value and to use the Gini impurity as the split criterion.

A five-fold external cross-validation procedure was performed as follows. The full set of compounds with known experimental activity is randomly divided into five subsets of equal size. One subset (20% of all compounds) is set aside as the external validation set, and the remaining four subsets form the modeling (training) set (80% of all compounds). This procedure is repeated five times, so that each of the five subsets is used once as the external validation set. Models are built using the training set only. Importantly, a compound is never simultaneously part of both the training set and the external validation set.

Two types of consensus prediction were used:
- Consensus is the majority vote of the predictions from the independent models developed with Morgan, SiRMS, and Dragon descriptors.
- Consensus AD is the majority vote of the predictions from the independent models, counting only predictions that fall inside the applicability domain (AD) of the respective model.

The local (tree-based) applicability domain approach,[38] with a threshold of 70%, was used for all RF models developed in this study.

Molecular docking experiments were performed using the structure of Mpro from SARS-CoV-2 (PDB ID: 6LU7).
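The five-fold external cross-validation protocol and the two consensus rules described above can be sketched as follows. This is a hypothetical, library-free illustration with three assumptions:
- Fold assignment uses a seeded shuffle; the authors' exact splitting procedure is not specified.
- Ties in a vote are counted as inactive.
- The applicability-domain flags are taken as given rather than computed with the tree-based AD method.

```python
import random


def cv_splits(n_compounds, k=5, seed=42):
    """Yield (modeling_idx, external_idx) pairs for k-fold external cross-validation.

    Each compound lands in exactly one external fold and is never in the
    modeling set of the split where it is external.
    """
    idx = list(range(n_compounds))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # near-equal fold sizes
    for i, external in enumerate(folds):
        modeling = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(modeling), sorted(external)


def consensus(predictions):
    """Majority vote over binary predictions (1 = active); ties count as inactive (assumption)."""
    return 1 if sum(predictions) > len(predictions) / 2 else 0


def consensus_ad(predictions, in_domain):
    """Majority vote restricted to models whose applicability domain covers the compound.

    Returns None when no model covers the compound, i.e. no prediction is made.
    """
    votes = [p for p, inside in zip(predictions, in_domain) if inside]
    return consensus(votes) if votes else None


if __name__ == "__main__":
    for modeling, external in cv_splits(113):
        print(len(modeling), len(external))  # 90 23 or 91 22 for each split
    # Hypothetical predictions from the Morgan, SiRMS, and Dragon models for one compound.
    print(consensus([0, 1, 1]))                          # 1
    print(consensus_ad([0, 1, 1], [True, False, True]))  # 0 (1-1 tie among in-domain votes)
```

With three descriptor-specific models, the plain majority vote cannot tie. Ties only arise in the AD-restricted vote when two models cover a compound, so the tie rule matters only there.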
To enable the docking calculations, the structure was prepared in Maestro [39] at pH 7.0 ± 2.0 and optimized with the OPLS3e force field. All ligands were prepared under the same conditions and docked using Glide [12] with the standard precision (SP) option.

Similarity search was performed in the KNIME platform (https://www.knime.com/) using Morgan fingerprints. The queries were the three compounds described by Wang et al. [12] as active in the phenotypic screen (remdesivir, chloroquine, and nitazoxanide). A Tanimoto similarity threshold of 75% was used to select compounds from DrugBank as putative actives.

As seen in Figure 1, we employed three different computational strategies to screen a wide array of compounds from DrugBank. The goal was to identify existing compounds with possible inhibitory activity against SARS-CoV-2. We started by collecting all publicly available data on SARS-CoV-2 and other coronaviruses. We excluded all phenotypic assays from modeling on the basis of a recent study by Wang et al.[40] That study demonstrated that some compounds active against SARS-CoV were not active against SARS-CoV-2 in a phenotypic screen. The replicase polyprotein 1ab was discarded because its whole structure is not available in the PDB, only those of its derivatives. Using the Basic Local Alignment Search Tool (BLAST) available in UniProt (https://www.uniprot.org/blast/),[41] we observed that the primary sequences of Mpro in SARS-CoV and SARS-CoV-2 share 96% identity (Figure 2a). The crystal structure of SARS-CoV-2 Mpro was recently elucidated. Superposition of the respective 3D protein structures (PDB IDs: 5N19, 6LU7) revealed a conserved binding site around the co-crystallized inhibitors, including the catalytic dyad formed by His41 and Cys145 (Figures 2b and 2c). […]

The 113 compounds (40 actives and 73 inactives) kept after curation were used for binary QSAR modeling.
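The Tanimoto similarity search described in the Methods above reduces to comparing fingerprints. The sketch below makes the following assumptions:
- Each fingerprint is represented as a set of "on" bit indices. In practice these would be Morgan fingerprints generated by a cheminformatics toolkit, for example the KNIME nodes the authors used.
- The toy bit sets and compound names in the example are hypothetical.
- The 75% cutoff is treated as inclusive, which is not stated.

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_hits(queries, library, threshold=0.75):
    """Return (name, best_similarity) for library entries whose maximum Tanimoto
    similarity to any query fingerprint reaches the threshold, best first."""
    hits = []
    for name, fp in library.items():
        best = max(tanimoto(fp, q) for q in queries.values())
        if best >= threshold:
            hits.append((name, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy fingerprints, not real Morgan bits.
    queries = {"query_a": {1, 4, 9, 16, 25}, "query_b": {2, 3, 5, 7}}
    library = {
        "cand_1": {1, 4, 9, 16, 25, 36},  # 5/6 vs query_a -> hit
        "cand_2": {2, 3, 5, 11},          # 3/5 = 0.60 vs query_b -> no hit
        "cand_3": {100, 200},             # no shared bits -> no hit
    }
    print(similarity_hits(queries, library))  # [('cand_1', 0.8333333333333334)]
```

The function takes the maximum similarity over all queries. This mirrors the criterion of being similar to "any of" the reference drugs.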
The statistical characteristics of our QSAR models are available in Table 1. Due to the limited size of the dataset, models were validated only by five-fold external cross-validation. The external results were:
- correct classification rate (CCR): 71-83%
- sensitivity: 55-72%
- positive predictive value (PPV): 72-100%
- specificity: 88-100%
- negative predictive value: 78-85%

Models were generated with the entire (unbalanced) dataset. Sensitivity was only at an acceptable level[36] (> 60%) for the majority of the models and fell below this threshold for the Dragon models. We nevertheless decided to proceed with these models because their PPV was higher. This means that fewer hits are expected, but with higher confidence.

Publicly available SARS-CoV-2 Mpro assay data are scarce. Mpro is also highly similar between SARS-CoV and SARS-CoV-2 (96% sequence identity, including a conserved active site; see above). We therefore hypothesized that compounds predicted to be active in the SARS-CoV Mpro assay [45] (used for the compounds in our modeling set) could be active against SARS-CoV-2.

In addition, we predicted Mpro activity for twenty-three compounds reported to be undergoing clinical trials (as of March 23, 2020) [46] (see Table S1 in Supplementary Materials). Of these compounds, lopinavir, ritonavir, tetrandrine, cobicistat, losartan, ribavirin, remdesivir, aviptadil, and danoprevir were predicted as active by the SiRMS models. Lopinavir was also predicted as active by Dragon. None of the molecules were predicted as active by the Morgan models. Lopinavir is an established protease inhibitor approved for use in HIV patients. It is usually used in conjunction with ritonavir, another protease inhibitor.[47] Lopinavir and lopinavir/ritonavir have previously been tested against SARS [48] and MERS-CoV.[49] However, recent clinical trials suggest that the drug combination is not as successful as expected against SARS-CoV-2.
[50] No data are available to build models for SARS-CoV-2 Mpro, and the two targets are highly similar. We therefore decided to employ these models to virtually screen the curated DrugBank dataset and to submit the predicted hits for experimental evaluation. Applying our models to this dataset of 9,615 compounds yielded 41 compounds predicted as active by the Consensus and Consensus AD models.

In parallel, we also conducted molecular docking experiments using the structure of Mpro from SARS-CoV-2 (PDB ID: 6LU7).[11] Before using docking as a virtual screening tool, it is crucial to validate the approach with known experimental data. Therefore, known inhibitors and non-inhibitors of Mpro were used to evaluate whether the docking score could rank active compounds above inactives. For this purpose, the curated dataset (CHEMBL3927) used for QSAR modeling was submitted to a docking validation run. Compound ranking by docking score was then compared with activity in the ChEMBL assay. The docking scores were poorly correlated with binding affinity, as indicated by an area under the receiver operating characteristic (ROC) curve of 0.49 (Figure 3). This implies that docking separated actives from inactives no better than random assignment. Early enrichment was also poor. Sensitivity was only 0.11 for the top 10% of ranked compounds, meaning that inactives rather than actives occupied the top of the list of virtual hits. The top 15% also showed poor sensitivity (0.14). Sensitivity reached a reasonable value (0.70) only after the top 69% of the list was considered. Based on these results, docking was discarded as a virtual screening approach.

We also employed a similarity search using the three compounds described by Wang et al.
[12] as active in the phenotypic screen (remdesivir, chloroquine, and nitazoxanide). Only the following 13 compounds from the curated DrugBank dataset had a Tanimoto similarity coefficient higher than 75% to any of those three drugs: anhydrovinblastine, hydroxychloroquine, lurbinectedin, quinacrine, quinacrine mustard, rifalazil, vinblastine, vincristine, vindesine, vinflunine, vinorelbine, […] Five of the 13 compounds were predicted as active by the SiRMS models, including […]

Thus, we selected 41 hits from DrugBank based on QSAR predictions. These include four compounds identified by similarity search and predicted as active by both SiRMS and Dragon. The hits were found among the commercially available compounds listed in the ZINC database.[59] The vendors selling these compounds were identified using our in-house ZINC-Express software (https://zincexpress.mml.unc.edu/) (Table S1 in Supplementary Materials). We purchased 10 compounds (Table 2) that were financially feasible for testing. These were submitted for experimental evaluation by our collaborators at the University of Kentucky. The experimental data from testing these compounds in the Mpro assay will be reported in the updated version of this manuscript once the results become available. The complete list of hits is available in the Supplementary Materials.

In this study, we used previous experimental data on SARS-CoV Mpro to develop a QSAR model. We then used this model to virtually screen DrugBank for novel potential hits against SARS-CoV-2 Mpro. As shown in Figure 2, the binding site of Mpro is conserved across SARS-CoV and SARS-CoV-2. Indeed, the high conservation of Mpro among coronaviruses has been noted in the past. Previous studies have explored the potential of developing broad-spectrum antivirals by targeting this enzyme.
[16] Molecular docking was not sufficient to discriminate between experimental actives and inactives and was ultimately not used to select hits. The generation of QSAR models according to best practices resulted in 41 virtual hits. Among the other compounds evaluated, several that are currently being tested in clinical trials, such as lopinavir and ritonavir, were predicted to be active by our models.

The 41 virtual hits were analyzed for availability and price feasibility using our in-house ZINC Express software (https://zincexpress.mml.unc.edu/). In the end, 10 compounds (Table 2) were selected for experimental testing by our collaborators at the University of Kentucky. Our group has also selected compound combinations through other methods that will be tested at the National Center for Advancing Translational Sciences. All collected and curated data, models, and virtual screening results are publicly available in the Supplementary Materials of this paper and on GitHub (https://github.com/alvesvm/sars-cov-mpro). The curated data are also available in the […]

Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study.
Statement on the second meeting of the International Health Regulations (2005) Emergency Committee regarding the outbreak of novel coronavirus.
Worldwide reduction in MERS cases and deaths since 2016.
Transmission of 2019-nCoV infection from an asymptomatic contact in […]
Comparing the Wuhan coronavirus outbreak with SARS and MERS.
The crystal structure of 2019-nCoV main protease in complex with an inhibitor N3.
Remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus (2019-nCoV) in vitro.
[…] Drug Ivermectin inhibits the replication of SARS-CoV-2 in vitro.
An orally bioavailable broad-spectrum antiviral inhibits SARS-CoV-2 in human airway epithelial cell cultures and multiple coronaviruses in mice.
Structure of Mpro from COVID-19 virus and discovery of its inhibitors.
[…] Exoribonuclease Activity Are Susceptible to Lethal Mutagenesis: Evidence for […]
Per aspera ad astra: Application of Simplex QSAR approach in antiviral research.
The effects of characteristics of substituents on toxicity of the nitroaromatics: HiT QSAR study.
Best practices for QSAR model development, validation, and exploitation.
QSAR analysis of the toxicity of nitroaromatics in Tetrahymena pyriformis: Structural factors and possible modes of action. SAR and QSAR in […]
Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy.
From SARS to MERS: crystallographic studies on coronaviral proteases enable antiviral drug design.
UniProt: A worldwide hub of protein knowledge.
Structure of Mpro from COVID-19 virus and discovery of its inhibitors.
Identification of Genotypic Changes in Human Immunodeficiency Virus Protease That Correlate with Reduced Susceptibility to the Protease Inhibitor Lopinavir among Viral Isolates from Protease Inhibitor-Experienced Patients.
Role of lopinavir/ritonavir in the treatment of SARS: Initial virological and clinical findings.
Treatment with lopinavir/ritonavir or interferon-β1b improves outcome of MERS-CoV infection in a nonhuman primate model of common marmoset.
A Trial of Lopinavir-Ritonavir in Adults Hospitalized with Severe Covid-19.
Peganum harmala seed extract and its total alkaloids against Influenza virus.
Cytotoxicity, antiviral and antimicrobial activities of alkaloids, flavonoids, and phenolic acids.
Antiviral activity of selected Catharanthus alkaloids.
Antiviral activity of natural and semi-synthetic chromone alkaloids.
The human immunodeficiency virus protease inhibitor ritonavir inhibits lung cancer cells, in part, by inhibition of survivin.
Marine Natural Products in Medicinal Chemistry.
Development potential of rifalazil and other benzoxazinorifamycins. Expert Opinion on Investigational Drugs.
ZINC - A free database of commercially available compounds for virtual screening.
Nitazoxanide, a new drug candidate for the treatment of Middle East respiratory syndrome coronavirus.
Protease inhibitors targeting coronavirus and filovirus entry.
Strategies of Development of Antiviral Agents Directed Against Influenza Virus Replication.
Nitazoxanide: A first-in-class broad-spectrum antiviral agent.
Repurposing of Kinase Inhibitors as Broad-Spectrum Antiviral Drugs.
Concurrent Antiviral and Immunosuppressive Activities of Leflunomide in Vivo.
Combination of MEK inhibitors and oseltamivir leads to synergistic antiviral effects after influenza A virus infection in vitro.
ProTides of BVdU as potential anticancer agents upon efficient intracellular delivery of their activated metabolites.