key: cord-1029650-924y05d1 authors: Ozdemir, E. Sila; Le, Hillary H.; Yildirim, Adem; Ranganathan, Srivathsan V. title: In silico screening and testing of FDA approved small molecules to block SARS-CoV-2 entry to the host cell by inhibiting Spike protein cleavage date: 2022-03-07 journal: bioRxiv DOI: 10.1101/2022.03.07.483324 sha: dc3cbf7f84c1df6ea248370655c42b16b1bd2462 doc_id: 1029650 cord_uid: 924y05d1 The COVID-19 pandemic began in 2019, but it is still active. The development of an effective vaccine reduced the number of deaths; however, a treatment is still needed. Here, we aimed to inhibit viral entry to the host cell by inhibiting Spike (S) protein cleavage by several proteases. We develop a computational pipeline to repurpose FDA-approved drugs to inhibit protease activity and thus prevent S protein cleavage. We tested some of our drug candidates and demonstrated a decrease in protease activity. We believe our pipeline will be beneficial in identifying a drug regimen for COVID-19 patients. Introduction 21 The outbreak of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in China 22 is followed by a global level emergency. COVID-19, the disease caused by SARS-CoV-2, is a global 23 pandemic, has resulted in over five million deaths, and the death toll continues to climb, which is 24 devastating as the virus is expected to be in circulation for a long time (1) . The novel coronavirus has 25 been sequenced and it has been revealed that it shares substantial similarity with the SARS-CoV that 26 caused a similar epidemic in 2003 (2), they are both zoonotic coronaviruses belonging to the 27 betacoronavirus family (3). 28 The world is in search of a robust drug with universal applicability and high efficacy, that can be 29 scaled rapidly to high levels of production in order to fight this virus family. Considering the fact that 30 trials and approval process of a novel designed small-molecules take approximately 10 to 15 years in 31 the US (4), it would be a wise choice to repurpose the already well characterized small-molecules to 32 respond to this emergency. 33 To this date, there is a significant amount of work done to test previously approved antiviral drugs, 34 resulting in a few successes, like Remdesivir by Gilead and Molnupiravir by Merck with modest 35 levels of improvement in treatment outcomes. However, there is still a need for a more 36 comprehensive treatment of SARS-CoV-2 to further decrease the fatality and minimize the spread of 37 virus. 38 As an early therapeutic strategy, there are already numbers of studies to identify small molecules to 39 inhibit SARS-CoV-2 activity in the host cells. To our knowledge, there are six main strategies/routes 40 to target the viral proteins (5) ( Figure 1A targeted to prevent viral entry into the host cell (9). iv. Inhibiting the proteases essential for 50 proteolytic processing of viral polyproteins (10, 11). After entering the host cell, the virus continues 51 its lifecycle by translating its genes which express two co-terminal polyproteins, pp1a and pp1ab. 52 Proteolytic processing of viral polyproteins by main protease (Mpro) and papain-like protease 53 (PLpro) results in nonstructural proteins which basically inhibit most of the function of the host cells. 54 v. Inhibiting reverse transcription of the viral genomic material and its multiplication (12). Finally, 55 vi. Inhibiting assembly, packaging and release of newly formed viruses (12). 56 Here, we focused on inhibiting viral entry (route ii, Figure 1A ) by performing a structure-based 57 screening of FDA approved drug compounds against the proteases trypsin, catL, and TMPRSS2 as 58 well as the viral S protein. S protein consists of two main domains; S1 domain responsible for 59 receptor binding and the S2 domain mediating membrane fusion. In the case of SARS-CoV, the S 60 protein is proteolytically cleaved at the S1/S2 boundary (13). However, recent studies have revealed 61 that cleavage at a secondary site (S2') is also critical for viral activation by serine proteases, like 62 TMPRRS2 and trypsin, and entry into the host cell (13, 14). Therefore, we aim to identify small 63 molecules to be repurposed to disrupt S protein cleavage at both S1/S2 boundary and S2'. Even 64 though the SARS-CoV endosomal entry requires catL protease (15), it has been shown that 65 TMPRRS2-mediated activation, governed by the S2' site, is required for viral pathogenesis (16, 17). 66 able to dock the active (47) and decoy molecules (1652), and compare the performance against 114 docking to the crystal structure, which we refer to as the naïve method. The results are summarized in 115 terms of a Receiver Operating Curve (ROC) shown in Figure 4 . The naïve method performs decently 116 as characterized by an Area Under the Curve (AUC) of 0.71, which is in good agreement with 117 previously published performance of Vina (AUC = 0.72) (21). Remarkably, the ensemble method 118 showed a significant improvement in performance with an AUC of 0.84, aided by molecular 119 simulations in explicit water. Therefore, we proceeded with this ensemble method for screening of 120 the drugs against our protease and spike protein targets. In the future, we plan on validating this 121 method to a wide range of targets from DUD database to comprehensively evaluate its performance. 122 The S2' site of the S protein was not subject to REMD simulations, as the target region is a rigid 125 alpha helix bundle. Therefore, the ensemble of monomer receptors comprised a total of 17 structures 126 are as listed: 3 each of trypsin, catL, TMPRSS2, S1/S2-a, S1/S2-b, and one of S2' & S2'_D936Y. 127 S1/S2-a refers to docking against S1/S2 boundary cleavage site targeted by TMPRSS2 and trypsin 128 (R685-S686), while S1/S2-b indicates docking against S1/S2 boundary cleavage site targeted by only 129 catL (T696-M697). S2', by itself, is the secondary cleavage site on S protein targeted by TMPRSS2 130 and trypsin. 131 85,000 docking calculations using the 17 conformations of S protein and proteases were performed 132 and the top 19-21 hits for each ligand was listed in Table S1 . The docking scores varies from -13.6 to 133 -7.4. From the list, 96 distinct molecules (hereinafter referred to as ligands) were identified targeting 134 either/both S protein cleavage sites or/and catalytic sites of proteases. Considering the similarity 135 between catalytic sites of TMPRSS2 and trypsin, we expected to see some common molecules 136 targeting both of these proteases, as seen in the table. 137 The total of 24 complex structures are as listed; trypsin_1-S1/S2_1, trypsin_2-S1/S2_1, trypsin_3-138 S1/S2_1, trypsin_1-S1/S2_2, trypsin_2-S1/S2_2, trypsin_3-S1/S2_2, trypsin_1-S2'_1, trypsin_2-139 S2'_1, trypsin_3-S2'_1, trypsin_1-S2'_D936Y_1, trypsin_2-S2'_D936Y_1, trypsin_3-140 S2'_D936Y_1, catL_1-S1/S2'_1, catL_2-S1/S2'_1, catL_3-S1/S2'_1, catL_1-S1/S2'_2, catL_2-141 S1/S2'_2, catL_3-S1/S2'_2, TMPRSS2_1-S1/S2_1, TMPRSS2_2-S1/S2_1, TMPRSS2_1-142 S1/S2_2, TMPRSS2_2-S1/S2_2, TMPRSS2_1-S2', TMPRSS2_2-S2'. Here is an example to 143 clarify the nomenclature, trypsin_1-S1/S2_1 refers to complex between the first configuration of 144 trypsin and the first configuration of S1/S2 boundary domain of S protein. 145 120,000 docking calculations using these 24 complexes were performed and the top 19-21 hits for 146 each ligand was listed in Table S2 . The docking scores varies from -13.7 to -9.2. From the list, 151 147 distinct ligands were identified targeting S protein-protease interaction interfaces. From the 96 and 151 distinct ligands from in silico screening results for the monomer and the 151 complex structures, respectively, we used three different selection criteria to arrive at a more refined 152 list of ligands. The first selection criterion is based on the toxicity of the ligands. As mentioned 153 before, FDA approved drugs were used in this study to meet the urgent need for a treatment. To 154 further pursue this approach, the ligands with high acute toxicity rate (such as chemotherapeutics), 155 which cannot be delivered at high concentrations, were given the least preference. The second and 156 third selection criteria are based on the structural and functional similarities of the ligands, 157 respectively. Chemical clustering of ligands was performed according to their Tanimoto coefficients 158 with a similarity cut-off of 70%. Furthermore, the mechanism of action was assigned to each ligand. 159 For example, Tyrosine kinase inhibitors, NS5A inhibitors, ergosterol binders were among the most 160 commonly occurring action mechanism of the ligands. Finally, the least toxic ligands with a unique 161 mechanism of actions were selected from each cluster. The atopical ligands are eliminated. The 162 refined list of 55 ligands with their LD50s and functions is shown in Table 1 . Interestingly, drugs 163 previously proposed or/and used for COVID-19 treatment such as Remdesivir, Gleevec, Saquinavir, 164 Digoxin, Camostat, Drospirenone, Dexamethasone M and Dihydroergotamine are among these 55 165 ligands. However, their potential for blocking viral entry needs to be further explored. 166 Considering both TMPRSS2 and trypsin are serine proteases, some of the 55 ligands were targeting 168 more than one complex or monomer. There were 173 initial configurations consisting of ligand-169 monomer or ligand-protease-S protein complexes. Atomistic MD simulations were performed on 170 these complexes for 100 ns. Using a MMPBSA energy cut-off (∆G < 0) (Table S3) , we identified a 171 small number of ligands for each protease and S protein to specifically inhibit proteolytic cleavage of 172 S protein (Table 2 and Figure 5 ). The ligands listed in Table 2 can be seen in Figure 5 , which shows 173 the positioning of the ligands in the interfaces of the complex or cleavage/catalytic sites of S 174 protein/proteases. 175 The in silico screening approach has allowed us to narrow down the possible viral inhibitors from 176 ~5,000 to ~ 20, which can be tested experimentally for their efficacy to block viral entry into host 177 cells. To demonstrate proof of concept, we proceeded to validate some of our lead hits 178 experimentally at least in one case. We designed an in vitro protease activity assay for catL, and 179 observed the possible inhibitory effects of the four drug molecules that are predicted to interact with 180 the active site of the proteases. We introduced the four putative drugs and a known inhibitor (as 181 positive control) over a range of drug concentrations. As expected, the known inhibitors had a 182 significant decrease in fluorescence activity. Excitingly, we found that Nystatin and Rifapentine had 183 measurable effects on the activity of catL. Specifically, Nystatin has an IC50 of ~2 μM and 184 Rifapentine had an IC50 of ~30 μM for inhibiting the protease activity ( Figure 6A ). In our handful of 185 lead hits, we found at least two molecules that show potential to inhibit protease activity in μM 186 concentrations. These results are also corroborated by the MD simulations, as evidenced by the 187 RMSD plots of the drugs bound to catL ( Figure 6B In this study, we used homology modeling and REMD to generate an ensemble of configurations for 204 ensemble docking. This approach enabled us to sample a wide variety of conformations of the target 205 proteins (S protein and proteases). The structures chosen for docking following RMSD clustering 206 accounted for more than 70% of the conformational states sampled, which produced the diversity 207 needed for ensemble docking. We screened 5,000 small molecules against our ensembles of 208 configurations for our 17 monomer and 24 complex structures, summing up to over 205,000 docking 209 calculations. Following docking, we narrowed down the lead hits by filtering them based on acute 210 toxicity (LD50) and Tanimoto coefficient of our ligands to identify unique ligands with low toxicity, 211 leading us to a refined list of 55 drugs. This narrowing down allowed to study the 55 drugs in 212 complex with their targets (173 complexes) in more detail using all-atom explicit water MD 213 simulations to assess the stability of the interaction and quantify the binding energy in the presence of 214 the solvent explicitly. The RMSD plots and the binding energies from the MD simulations were used 215 to finally rank the drug-target interactions, which allows further experimental investigation of a short 216 list of drugs for their antiviral potential. 217 As we mentioned SARS-CoV-2 variants can acquire mutations that increase their virulence, like 218 enhanced binding to the ACE2 receptor. Therefore, if the treatment regimen involves targeting the 219 ACE2 binding domain of the S protein, its efficacy might decrease when the mutations alter the 220 binding to the drug. In this study, we have targeted the human proteases and cleavage sites of S 221 protein, which tend to be conserved even in the variants. Nevertheless, we identified one variant with 222 a mutation D936Y on S protein, which is close to the S2' cleavage site. We modeled the variant with 223 the mutation, and identified Irbesartan and Nebivolol as potential drugs specific to D936Y variant. 224 The computational pipeline allowed us to screen a large number of drugs, and narrowed the list of 225 interest to a few that can be tested at the bench. Two of the four drugs, Nystatin and Rifapentine, 226 showed measurable inhibitory effects on catL activity. Nystatin is an antifungal that is usually used to 227 treat fungal infection in mouth, which rifapentine is an antibiotic used for treating tuberculosis. It is 228 important to note that these drugs have not been tested for their potential for treat COVID19, and the 229 authors recognize such premature applications of the drug can be harmful and result in shortage of 230 the drugs for patients who need them currently. We are hoping that extensive testing of these drugs 231 for their potential to treat COVID19 is conducted together with the other recently proposed potential 232 proteolytic inhibitory molecules (24, 25), starting from in vitro studies for their potential to block 233 viral entry. 234 In this study, we generated a computational pipeline for screening drugs from the target fasta 235 sequence to the bench ( Figure 1B The crystal structure of SARS-CoV-2 spike ectodomain has been already solved (PDB ID: 6vyb 247 (26)), however it has some unstructured missing region. Therefore, the structure of whole length S 248 protein was modeled using SWISS-MODEL web server (27); the crystal structure, 6vyb, and FASTA 249 sequence (NCBI Reference Sequence: YP_009724390.1) of SARS-CoV-2 S protein were used as 250 template. S1/S2 boundary and S2' structures were adapted from our whole length S protein model. 251 S1/S2 boundary structure covered residues from 293 to 326 and from 591 to 700, S2' structure 252 consisted of residues from 715 to 1070 (Figure 2, bottom panel) . 253 TMPRSS2 extracellular domain was modeled giving Hepsin crystal structure (PDB ID: 1z8g (28)), a 254 closely-related serine protease with more than 50% sequence identity, and FASTA sequence of 255 TMPRSS2 (UniProt ID: O15393) as template using SWISS-MODEL. 256 catL and trypsin crystal structures were already available in PDB (PDB ID: 2xu1 (29) and 1h4w (30), 257 respectively) and they were used in this study. The protein structures were centered in a 3D periodic cubic box with initial dimensions ~ 7-8 nm in 264 the three orthogonal directions. The box was then solvated with ~ 12,000 to 14,000 water molecules, 265 depending on the size of the box, and a few Na + or Clcounterions, depending on the charge of the 266 protein, to keep the system charge neutral. After an initial equilibration of 1 ns, a REMD schedule 267 was setup with 60 windows in the temperature range of 300-425K. The temperature schedule was 268 generated using the webserver (http://virtualchemistry.org/remd-temperature-generator/). The REMD 269 simulations were then performed on the 60 windows for 15 ns with exchanges between neighboring 270 windows tried every 0.4 ps, and the protein co-ordinates stored every 50 ps. The first 5 ns were 271 ignored, and the last 10 ns, consisting of 200 frames were used for clustering analysis. 272 Top three configurations from each ensemble covering more than 70% of all configurations were 273 used (except for S2') for in silico screening of three unbound protease structures and one S1/S2 274 boundary structure. RMSD values within each ensemble vary from 0.9 to 1.3 Å. For the screening of 275 S2', one equilibrated configuration was used (Figure 2 ) 276 The configurations generated by REMD for two cleavage domains of S protein (S1/S2 and S2'), 278 TMPRSS2, trypsin and catL were used to obtain ensembles of configurations for each S protein-279 protease complex. Top two configurations from S1/S2 boundary and TMPRSS2 ensembles and top 280 three configurations from trypsin and catL ensembles covering ~70% of all configurations were 281 selected for the docking. RMSD values within each ensemble vary from 1.9 to 5.9 Å. One 282 equilibrated S2' configuration was used. TMPRSS2 and trypsin were docked to both S1/S2 283 boundary (R685-S686) and S2', while catL was only docked to a different cleavage site on S1/S2 284 boundary (T696-M697) using HADDOCK web server (31) (Figure 3 ). Active residues for flexible 285 docking were defined as follows; catalytic triad of TMPRSS2 and trypsin, active sites of catL 286 identified in Fujishima et al (32) , and cleavage sites on S1-S2 boundary and S2'. The cleavage sites 287 on S1/S2 boundary and S2' for TMPRSS2, trypsin and catL were obtained by aligning protein 288 sequences of SARS-CoV S protein (UniProt ID: P59594) and SARS-CoV-2 S protein. The sequences 289 of previously defined cleavage sites on SARS-CoV (14) were conserved in SARS-CoV-2 but residue 290 numbers were different (Table 3 ). In order to obtain, mutant S2' monomer and mutant S2'-trypsin 291 complexes, PyMol (version 2.3.2) built-in Mutagenesis tool was used. Filtering of our ~160 lead candidates were done using chemical similarity & toxicity. Chemical 310 similarity was quantified using the Tanimoto coefficient as calculated using the Molecular Operating 311 Environment (MOE). The acute toxicity as quantified by the mice lethal dose (LD50) was 312 downloaded from FDA drug reports & go.drugbank.com. The mechanism of action data was curated 313 from PubChem database (38). The least toxic drugs were selected from clusters that were generated 314 using a 70% chemical similarity, yielding a list of ~ 55 molecules. 315 All-atom AMBER-type force-field parameters were generated for the lead 55 molecules. We created 317 a pipeline to start from a pdb file of the lead molecules to producing gromacs type forcefields for the 318 protein-drug complexes. This pipeline included mostly AMBER tools (39), to calculate partial 319 charges at HF/6-31G (AM1-BCC) (40), and use the General AMBER Force-Field (GAFF) (41) for 320 bonded and van der Waals interactions, after ensuring the right protonation state for the molecules. University (Portland, OR). MD simulations in an aqueous environment were carried out for a total of 325 173 initial configurations of small molecule-protein complexes. The TIP3P water model was used to 326 simulate the system in an aqueous environment with proper number of counterions (Na+ or Cl-) to 327 ensure charge neutrality. A 3D periodic box was used to center the complex with at least 1.0 nm from 328 the edge. In the equilibration and production runs, the temperature was maintained at 300K and the 329 pressure was maintained at 1 bar, using V-rescale thermostat (42) and Parrinello-Rahman barostat, 330 respectively. The MD simulations incorporated leap-frog algorithm with a 2 fs time-step to integrate 331 the equations of motion. The long-ranged electrostatic interactions were calculated using particle 332 mesh Ewald (PME) (43) algorithm with a real space cut-off of 1.2 nm. LJ interactions were also 333 truncated at 1.2 nm. To monitor the systems to reach equilibration, root-mean-square deviation 334 (RMSD) of the complex structures were calculated as a function of time. Furthermore, to analyze the 335 interaction of the drug with the proteins, we performed RMSD of the drug during the course of the 336 simulation, while minimizing the RMSD of the protein monomer/complex. 337 The average binding free energy is derived from the Gibbs free energy, 339 configurations for TMPRSS2, trypsin, catL and S1/S2 as well as one configuration for S2' were used 505 for in silico screening. 506 TMPRSS2-S1/S2 complex, two configurations for TMPRSS2-S2' complex, six configurations for 508 trypsin-S1/S2 complex, three configurations for trypsin-S2' complex, six configurations for catL-509 S1/S2 complex were used for in silico screening. Catalytic sites of the proteases and cleavage sites of 510 the S protein are shown with spheres. 511 Projecting the transmission 372 dynamics of SARS-CoV-2 through the postpandemic period A new coronavirus associated with 374 human respiratory disease in China COVID-19: a novel zoonotic disease caused by a coronavirus 376 from China: what we know and what we don't Part 2: An Overview of Approval Processes: 378 FDA Approval of Medical Devices Current progress in antiviral strategies Identification of SARS-CoV-2 Cell Entry Inhibitors by 382 Drug Repurposing Using in silico Structure-Based Virtual Screening Approach. Frontiers in 383 Immunology Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19 Coronaviruses: an overview of their replication and pathogenesis. 388 Fusion mechanism of 2019-nCoV and 390 fusion inhibitors targeting HR1 domain in spike protein Identification of key interactions between SARS-CoV-2 392 main protease and inhibitor drug candidates Coronavirus Protease Identified by Virtual Screening of 606 Million Compounds. International 395 journal of molecular sciences The antiviral activity of iota-, kappa-, and lambda-carrageenan against 397 COVID-19: A critical review Activation of the SARS coronavirus spike protein via 399 sequential proteolytic cleavage at two distinct sites Different residues 401 in the SARS-CoV spike protein determine cleavage and activation by the host cell protease 402 TMPRSS2 Identification of a 404 broad-spectrum antiviral small molecule against severe acute respiratory syndrome coronavirus Nipah viruses by using a novel high-throughput screening assay Protease 408 inhibitors targeting coronavirus and filovirus entry Proteolytic activation of SARS-CoV-2 spike protein The role of dynamic conformational ensembles in 412 biomolecular recognition D936Y and Other Mutations in the 414 Fusion Core of the SARS-CoV-2 Spike Protein Heptad Repeat 1: Frequency, Geographical 415 Distribution, and Structural Effect Directory of useful decoys, enhanced 417 (DUD-E): better ligands and decoys for better benchmarking AutoDock Vina 1.2.0: New Docking 419 Expanded Force Field, and Python Bindings Simultaneous treatment of 421 human bronchial epithelial cells with serine and cysteine protease inhibitors prevents severe acute 422 respiratory syndrome coronavirus entry Evidence that 424 TMPRSS2 activates the severe acute respiratory syndrome coronavirus spike protein for membrane 425 fusion and reduces viral control by the humoral immune response Hydroxychloroquine-mediated inhibition 427 of SARS-CoV-2 entry is attenuated by TMPRSS2 Camostat mesylate inhibits SARS-CoV-2 activation by TMPRSS2-related proteases and its 430 metabolite GBPA exerts antiviral activity. EBioMedicine. 2021;65. 431 26 SWISS-MODEL: An automated protein 434 homology-modeling server Hepatocyte growth factor is 436 a preferred in vitro substrate for human hepsin, a membrane-anchored serine protease implicated in 437 prostate and ovarian cancers Systematic 439 investigation of halogen bonding in protein-ligand interactions The 444 HADDOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes The crystal 447 structure of human cathepsin L complexed with E-64 ZINC--a free database of commercially available compounds for 449 virtual screening DrugBank 5.0: a major 451 update to the DrugBank database for 2018 The Binding Database: data management and interface 453 design Open Babel: 455 An open chemical toolbox AutoDock Vina: improving the speed and accuracy of docking with a new 457 scoring function, efficient optimization, and multithreading PubChem in 2021: new data content 459 and improved web interfaces Automatic atom type and bond type perception in 461 molecular mechanical calculations Fast, efficient generation of high-quality atomic charges AM1-BCC model: II. Parameterization and validation Development and testing of a 465 general amber force field Canonical sampling through velocity rescaling Molecular modeling: an experimental tool. Environ Health 469 Perspect The MM/PBSA and MM/GBSA methods to estimate ligand-binding 471 affinities Electrostatics of nanosystems: 473 application to microtubules and the ribosome