key: cord-0794426-5wc258os authors: Airas, Justin; Bayas, Catherine A.; N’Ait Ousidi, Abdellah; Ait Itto, Moulay Youssef; Auhmani, Aziz; Loubidi, Mohamed; Esseffar, M’hamed; Pollock, Julie A.; Parish, Carol A. title: Investigating Novel Thiazolyl-Indazole Derivatives as Scaffolds for SARS-Cov-2 MPro Inhibitors date: 2022-02-10 journal: European Journal of Medicinal Chemistry Reports DOI: 10.1016/j.ejmcr.2022.100034 sha: 8408aa0620536416fdbdf4571f167dd5f99204f5 doc_id: 794426 cord_uid: 5wc258os COVID-19 is a global pandemic caused by infection with the SARS-CoV-2 virus. Remdesivir, a SARS-CoV-2 RNA polymerase inhibitor, is the only drug to have received widespread approval for treatment of COVID-19. The SARS-CoV-2 main protease enzyme (MPro), essential for viral replication and transcription, remains an active target in the search for new treatments. In this study, the ability of novel thiazolyl-indazole derivatives to inhibit MPro is evaluated. These compounds were synthesized via the heterocyclization of phenacyl bromide with (R)-carvone, (R)-pulegone and (R)-menthone thiosemicarbazones. The binding affinity and binding interactions of each compound were evaluated through Schrödinger Glide docking, AMBER molecular dynamics simulations, and MM-GBSA free energy estimation, and these results were compared with similar calculations of MPro binding various 5-mer substrates (VKLQA, VKLQS, VKLQG) and a previously identified MPro tight-binder X77. From these simulations, we can see that binding is driven by residue specific interactions such as π-stacking with His41, and S/π interactions with Met49 and Met165. The compounds were also experimentally evaluated in a MPro biochemical assay and the most potent compound containing a phenylthiazole moiety inhibited protease activity with an IC50 of 92.9 μM. This suggests that the phenylthiazole scaffold is a promising candidate for the development of future MPro inhibitors. 
Owing to their significance in the viral life cycle and lack of related human homologues, the SARS-CoV-2 RNA-dependent RNA polymerase (RdRp) and 3C-like main protease (3CL Pro or M Pro) have been identified as potentially promising drug targets. [11, 12] The RdRp catalyzes the synthesis of viral RNA and is the target of the nucleotide analog prodrug remdesivir. [11, 13] With the Food and Drug Administration's Emergency Use Authorization of remdesivir, the RdRp is currently the only SARS-CoV-2 drug target with an approved medicinal therapy. [13] [12, 14, 15] This function makes M Pro a potential target of therapeutic drugs. [14, 16] While the Pfizer PF-07321332 covalent M Pro inhibitor has recently shown remarkably effective clinical results, there are currently no FDA-approved non-covalent inhibitors. The M Pro catalytic site consists of four binding pockets or subsites. These are the S1', S1, S2, and S4 subsites, occupied by the substrate P1', P1, P2, and P4 residues, respectively. [12, 17] The His41, Val42, Asn119, Thr25, Cys145, and Gly143 sidechains and the Thr26 backbone form the S1' subsite. His41 and Cys145 form the catalytic dyad. The S1 subsite, which accommodates the P1-Gln, is formed by the Phe140, Asn142, Ser144, Cys145, His163, His172, and Glu166 sidechains and the Leu141, Gly142, His164, and Met165 backbones. The P2-Leu accommodating S2 subsite is formed by the His41, Met49, Tyr54, Met165 and Asp187 sidechains and the Arg188 and Gln189 backbones. Lastly, the S4 subsite is formed by the Met165, Leu167, Pro168, Ala191, and Gln192 sidechains and the Glu166, Arg188, and Thr190 backbones. These subsites are displayed in Figure 1. Various studies have shown that the residues forming these subsites are essential targets of M Pro inhibitors. [17] [18] [19] [20] [21] [22] [23] [24] Currently no drug has been approved for M Pro inhibition despite the screening of many structurally diverse compounds. 
[25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] Notable candidates have included the peptidyl Michael acceptor N3 (kobs/[I] = 11,300 ± 880 M⁻¹ s⁻¹ [...] nM, KI = 12 ± 1.4 nM). [12, 34, 35] Both inhibitors function by covalent bond formation with the catalytic Cys145 residue and form binding interactions with additional S1'-S4 subsite residues. [12, [37] [38] [39] [40] Figure 1: Subsites of the SARS-CoV-2 M Pro with the N3 inhibitor covalently bound (PDB Code 6lu7). Subsites S1', S1, S2 and S4 are displayed in green, red, cyan, and yellow, respectively. Catalytic residues His41 and Cys145 are shown in black below the surface of the subsites. Thiazole and indazole derivatives have displayed a wide array of biological functions, and there is significant interest in their pharmacologic applications. Many bioactive natural products, such as Vitamin B1, bacitracin, and penicillin, contain thiazole ring structures. [31] Additionally, synthetic thiazole-based compounds have been shown to function as antineoplastic agents, anti-HIV drugs, antifungal agents, antiparasitic agents, anti-inflammatory agents, and antiulcer agents. [41, 42] Most recently, thiazole-based inhibitors have been reported for SARS-CoV-2. [43, 44] Synthetic indazole-based compounds have exhibited function as anti-inflammatory, antiarrhythmic, antitumor, antifungal, antibacterial, and anti-HIV drugs. [45] Indazole-based drugs are also being considered as SARS-CoV-2 M Pro inhibitors. [46] Previously, we reported the synthesis and characterization of two novel thiazolyl-indazole derivatives. [47] These compounds were synthesized through the heterocyclization of phenacyl bromide with (R)-carvone and (R)-pulegone thiosemicarbazones, producing (R)-2-(2-(5-isopropyl-2-methylcyclohex-2-enylidene)hydrazinyl)-4-phenylthiazole I and (3aR,6R)-3,3,6-trimethyl-2-(4-phenylthiazol-2-yl)-3,3a,4,5,6,7-hexahydro-2H-indazol-3a-ol II, respectively. 
While it was initially expected that these reactions would result in compounds I and (R)-2- (2-(5-methyl-2-(propan-2-ylidene) cyclohexylidene) hydrazinyl)-4-phenylthiazole III, an unexpected rearrangement of the (R)-pulegone thiosemicarbazone occurred in which the thioureido group underwent an N-H addition to the C=C double bond. Subsequent condensation of this intermediate with phenacyl bromide followed by an unexpected oxidation reaction resulted in compound II with two potential diastereomers (3aR,6R) and (3aS,6R). X-Ray and computational analysis indicated that the (3aR,6R) diastereomer was synthesized and is energetically favored. [47] In addition to exploring the inhibitory properties of I -III, we are also reporting the synthesis of a version of II lacking the hydroxy group (IV), and a version of III with a single bond connecting the isopropyl group to the cyclohexane ring (V). These compounds will provide further insight into the structural features underlying the binding of the thiazolyl-indazole derivatives. All structures are shown in Scheme 1. M. [43, 44] A machine learning approach to identifying investigational or off-market drug targets for SARS-CoV-2 identified an indazole containing compound as one of the most promising with an estimated binding affinity of -9 kcal/mol. [46] Given the wide-ranging and beneficial pharmacological impacts of thiazole-and indazole-containing compounds, and the community's interest in these drugs as potential SARS-CoV-2 M Pro inhibitors, we seek to elucidate the usefulness of thiazolyl-indazole derivatives as potential scaffolds for further pharmaceutical development and exploration. [41] [42] [43] [44] [45] [46] 48] In this work, we explore the potential for compounds I -V to function as reversible, noncovalent inhibitors of M Pro through chemical property prediction, biochemical assays, docking analysis and molecular dynamics simulations. 
For benchmarking purposes, we also include a detailed binding comparison with the viral substrate as well as a recently reported tight-binding non-covalent inhibitor X77 (PDB code 6w63: estimated Kd = 0.057 µM). [26] We have identified M Pro amino acids that consistently interact with these compounds and our simulations reveal the importance of π interactions in the mechanism of binding. We demonstrate experimentally that compound I is capable of inhibiting the activity of M Pro; this, taken together with our binding analyses, suggests that the phenylthiazole scaffold is a good candidate for future M Pro inhibitor drug development. Two in-silico Pan-Assay Interference Compounds (PAINS) identification screens were conducted (http://www.cbligand.org/PAINS/ [49] and http://zinc15.docking.org/patterns/home/) for I-V. Additionally, molecular properties and predicted absorption, distribution, metabolism, and excretion (ADME) values were calculated using Schrödinger's QikProp program. [50] All default settings were used. In the Results section, we note any properties that violate the 95% range of known drugs. Compounds I and II have been previously reported and their spectral details are included in Supplemental Information (Figures S1-S6). [47] The synthesis of the thiazolyl-indazole heterocycles IV and V is reported here for the first time. Each compound was prepared individually in a one-pot reaction of the corresponding natural monoterpenic ketone, thiosemicarbazide, and phenacyl bromide in refluxing ethanol (Scheme 2). The general synthetic procedure was as follows. To an ethanolic solution (50 mL of absolute ethanol), 1 equivalent each of phenacyl bromide 1 and thiosemicarbazide 2 was added to 1 equivalent of (R)-Pulegone to produce IV, or to (R)-Menthone to produce V. The reaction mixture was heated under reflux for 1 h. 
After evaporating the solvent, the residue was diluted with water (10 mL) and extracted with dichloromethane. The organic layer was separated, dried on anhydrous MgSO4, evaporated to dryness, and then purified by silica gel column chromatography, using hexane/ethyl acetate as eluent, to obtain the corresponding thiazolyl-indazole compounds in good yield (IV: 82%; V: 70%). The structures of the two synthesized products were confirmed based on their spectral data (Figures S7-S12). [...] The endpoint values were recorded using the Softmax Pro software and all raw data were normalized with the "blank" solutions. The structure of the SARS-CoV-2 M Pro was obtained from the Protein Data Bank (PDB code 6lu7). [51] All waters and N3 were removed. A dimerized model of M Pro was created by aligning 6lu7 polypeptides to chains A and B of Human Coronavirus NL63 M Pro (PDB code 5gwy). Previous studies have shown that the monomer is functionally inactive due to the role N-terminal residues of the second chain play in forming the active site. [52] Schrödinger's Protein Preparation Wizard was used together with Prime, Epik, and PROPKA to prepare the protein. [53] [54] [55] Further protein preparation details can be found in the Supplemental Information. [53] [54] [55] [56] Schrödinger's Receptor Grid Generation program was used to generate a 40 Å by 40 Å by 40 Å receptor grid with a ligand size cutoff of 20 Å. This grid was centered on the previously identified catalytic site of M Pro and used for all subsequent ligand docking using default parameters. [12] Structures of compounds I-V were manually built and optimized according to the GAFF force field using Avogadro 1.2. [57, 58] Geometry optimizations in PCM water at the B3LYP/6-31G* level of theory were also performed using Gaussian 16. [59] Both diastereomers (3aR,6R) and (3aS,6R) of IV were constructed (IV-E and IV-Z, respectively). 
Compound V, with the exocyclic double bond removed, resulted in an altered cyclohexane structure that placed the methyl and isopropyl groups cis (V-E) or trans (V-Z) to each other (Table 1). As such, V-E and V-Z were also built. Schrödinger's Glide Docking program was used to dock these molecules into the catalytic site of M Pro. [60, 61] Two poses, the primary pose (1º) that reports the most favorable GScore and an additional top scoring secondary pose (2º) with different binding interactions, were selected for further MD evaluation. Docking tools such as Glide are best used as idea generators, and selecting poses with different binding interactions helps speed surface coverage in subsequent MD simulation. These poses and their GScores are viewable in Table S1. Additional detail on Glide docking is provided in the Supplemental Information. Three 5-mer P4-P3-P2-P1-|-P1' substrates, based on the best-recognized proteogenic amino acid M Pro substrate reported by Rut et al., were built using Schrödinger's Maestro program. [62, 63] The C-terminal ACC dye was replaced with common P1'-site residues Ser, Ala, and Gly. [15] To protect against the influence of terminal charges, all substrates were ACE and NME capped. The sequences of these substrates are ACE-Val-Lys-Leu-Gln-[Ser, Ala, Gly]-NME (Figure S15A). These substrates will be referred to as VKLQS, VKLQA, and VKLQG. As with our treatment of I-V, binding site docking using the Schrödinger Glide Docking program was conducted. However, as this program is not intended for peptide docking, it failed, as expected, to generate poses aligning with previously described substrate recognition sequence-catalytic site interactions. [15, 62, 64] Therefore, using UCSF Chimera, each substrate was manually positioned within the M Pro catalytic site to maximize the interactions between the substrate recognition sequence residues P1-Gln and P2-Leu and the M Pro S1 and S2 pockets. 
[65] The S1 and S2 pockets have been previously noted to be invariably occupied by the P1-Gln and P2-Leu residues, respectively, while low specificity is noted for the P3 and P4 residues. [62, 64] Our docking results were used merely as a starting structure for subsequent molecular dynamics simulations. To speed surface coverage for each system studied, we utilized two distinct conformations as simulation starting structures. Ideally, experimental structures (or modifications thereof) are best for initiating MD simulation; however, in cases such as the M Pro:substrate complex, where there is no experimental structure available, subsequent surface sampling imparted by the use of molecular dynamics mitigates concerns associated with our use of manually docked initial structures. After manual positioning, a minimization calculation was performed under default settings using UCSF Chimera's Minimize Structure tool. The positions of the P1-Gln and P2-Leu sidechains were fine-tuned using Schrödinger's Maestro program. [63] Atoms/sidechains were manually moved, with localized minimizations performed with each movement. The P1-Gln sidechain was positioned to allow for contacts with Phe140, His163, and Glu166 in the S1 pocket, while the P2-Leu sidechain was moved into the S2 pocket. Additionally, the P1-Gln-P1' amide bond was positioned between catalytic residues His41 and Cys145. Following these movements, optimization and minimization were performed using Schrödinger's Protein Preparation Wizard according to the same steps previously detailed. This resulted in a common binding pose for all three substrates. This pose is detailed in Figure S15B. Additionally, ligand interaction diagrams for each substrate are viewable in Table S2. The structure of X77 was extracted from PDB code 6w63. Schrödinger's Glide Docking program was used to dock X77 into the catalytic site of M Pro. This was conducted using the same protocol as previously detailed for I-V. 
Two poses were selected for further MD evaluation. The first pose reported the most favorable GScore and the second pose was obtained from 6w63 by aligning the structure onto chain A of our dimerized 6lu7 model and subsequently removing all 6w63 waters and peptide. These poses are viewable in Table S1. Unrestrained molecular dynamics (MD) simulations were conducted on M Pro bound to compounds I-V, X77, and three 5-mer substrates, using the AMBER18 suite. [66] The 1º and 2º poses for I-V and X77 were selected for MD analysis. The manually oriented binding poses detailed above were used to initiate MD for the 5-mer substrates. The ff14SB force field was applied to M Pro and the 5-mer substrates. [67] The program antechamber was used to apply the GAFF force field and AM1-BCC charges to I-V and X77. [57, [68] [69] [70] All models were neutralized with Na+ ions and explicitly solvated in a TIP3P unit cell using the program tleap. [71] All simulations were performed using the GPU-accelerated pmemd code of AMBER18. [72, 73] Further details describing the MD protocol can be found in the Supplemental Information. In total, nineteen 1000 ns ensembles were generated (two for each compound I-III, IV-(E,Z), and V-(E,Z), one for each of the three possible 5-mer substrates, and two for X77). The total number of simulations performed is detailed in Table S3. From each 1000 ns ensemble, all frames in which the ligand either sampled binding positions outside of the catalytic site or dissociated from the protein entirely were removed. All data analysis was conducted on the resultant truncated ensembles, which include only frames in which the ligand interacts with the catalytic site. As such, percent occurrence and thermodynamic data reported below are relative to each truncated ensemble. 
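The truncation step described above can be illustrated with a minimal sketch. The source does not state how a frame was classified as bound or unbound, so the per-frame ligand-to-site distance, the `cutoff` value, and the function names below are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def truncate_ensemble(
    site_distances: Sequence[float], cutoff: float
) -> Tuple[List[int], float]:
    """Split a trajectory into bound frames and a percent-dissociation value.

    site_distances: one ligand-to-catalytic-site distance per frame (in Å);
        using a distance as the criterion is an assumption, not the paper's rule.
    cutoff: hypothetical distance (in Å) beyond which a frame counts as unbound.
    """
    if not site_distances:
        raise ValueError("trajectory contains no frames")
    # Keep the indices of frames where the ligand is still at the site.
    bound_frames = [i for i, d in enumerate(site_distances) if d <= cutoff]
    n_removed = len(site_distances) - len(bound_frames)
    percent_dissociated = 100.0 * n_removed / len(site_distances)
    return bound_frames, percent_dissociated


def percent_occurrence(
    event_flags: Sequence[bool], bound_frames: Sequence[int]
) -> float:
    """Percent of bound frames in which an event (e.g. a hydrogen bond) occurs.

    The denominator is the truncated ensemble, not the full trajectory.
    """
    if not bound_frames:
        return 0.0
    hits = sum(1 for i in bound_frames if event_flags[i])
    return 100.0 * hits / len(bound_frames)
```

Reporting both numbers together matters: a percent occurrence computed on the truncated ensemble says nothing about how often the ligand left the site, which is why the percent dissociation is tracked separately.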
This approach allows us to evaluate the specific interaction energies between the active site of the M Pro and the various ligands without the variability associated with normal thermal motion and the random components of the MD algorithm. To avoid any binding over-estimation associated with this approach, we also track percent dissociation times for each compound. Trajectory visualization was conducted using UCSF Chimera and UCSF ChimeraX. [65, 74] MM-GBSA binding free energy, per-residue decomposition, and normal mode entropy analyses were conducted using the AmberTools MMPBSA.py package. [75] Entropic analysis has been shown to scale binding free energies closer to experimental values while also providing improved comparison of binding affinities across models. [76] This is especially important for comparison between diverse structures like peptide substrates and small molecule inhibitors. Entropy calculations were performed on each truncated ensemble of I and III (with II, IV and V excluded due to experimental and computational shortcomings reported in the Results section) with a 12.5 ns interval. Entropy calculations were also performed on the substrates and the X77 inhibitor. Hydrogen bonding, center-of-mass distance (COM), and root-mean-squared deviation (RMSD) analyses were conducted on I -III using the AmberTools cpptraj module. Potential aromatic -π interactions were screened using literature-based COM distance cutoffs as detailed in Table S4 . [77] [78] [79] [80] [81] [82] Ensemble averaged binding structures were created for the I -III truncated ensembles using cpptraj RMSD-based clustering. Compounds IV and V were excluded from this analysis due to shortcomings relative to I -III. The RMSD of each frame within each ensemble relative to its respective average binding structure was then calculated. Frames reporting an RMSD value below 1.75 Å were considered to sample the average structure. 
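As a rough illustration of the RMSD criterion above, the following sketch counts the fraction of frames that fall below the 1.75 Å cutoff relative to an average structure. It assumes the frames have already been superimposed on the reference and uses plain coordinate lists; in the study itself this analysis was done with cpptraj, so the function names and data layout here are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-squared deviation between two pre-aligned coordinate sets (Å)."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(frame, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(frame))


def percent_sampling_average(
    frames: Sequence[Sequence[Coord]],
    reference: Sequence[Coord],
    cutoff: float = 1.75,  # Å, the cutoff quoted in the text
) -> float:
    """Percent of frames whose RMSD to the average structure is below cutoff."""
    if not frames:
        raise ValueError("no frames supplied")
    hits = sum(1 for f in frames if rmsd(f, reference) < cutoff)
    return 100.0 * hits / len(frames)
```

A single fixed cutoff is simple to apply across ensembles, at the cost of penalizing ligands that fluctuate more around the same binding mode.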
This cutoff value, intended to account for the dynamic behavior of each structure, was determined from trajectory visualization. Note that the percent occurrence of a structure that undergoes more fluctuation will be underreported. When RMSD analysis alone is deemed insufficient, center-of-mass (COM) analysis is additionally used with cutoff distances selected on a case-by-case basis. Our simulation protocol is optimized for non-covalent inhibitors and cannot be applied to the covalent inhibitors that have been shown to bind to the SARS-CoV-2 M Pro. As such, a computational comparison to the biochemical assay control compound GC376 could not be conducted. In lieu of this, comparison to X77 and the substrates is provided. All compounds passed both PAINS screens with no points of concern. There were no 95% range violations or reactive functional groups reported for I-III; however, IV and V generated range violations, and ADME screening indicated a reactive functional group in V. [50] Taken together, these results suggest that I displays the most favorable drug-like properties. Compounds IV and V appear to be less suitable candidates, as both display shortcomings relative to I-III. ADME screening indicates that the removal of the hydroxy group from II, to form IV, results in IV having an aqueous solubility outside the 95% range of known drug-like molecules, along with an increased potential for CNS activity and hERG K+ channel blockage. This suggests that the hydroxy group of II may be necessary for maintaining drug-like properties. Likewise, ADME screening indicates that removal of the isopropylcyclohexane double bond in III to produce V results in the cyclohexane-linking imine becoming a potentially reactive carbonyl center (Table 1). In order to evaluate the ability of the compounds to disrupt the activity of M Pro, we employed a commercially available kit (see methods). 
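Inhibition in an assay of this kind is usually summarized by an IC50, the inhibitor concentration at which enzyme activity falls to half of its uninhibited value. The sketch below evaluates a simple Hill-type dose-response curve for orientation. The source does not state which curve model was used to fit the data, so the functional form, the default Hill slope of 1, and the function name are assumptions.

```python
def fractional_activity(conc_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Remaining enzyme activity (0 to 1) at a given inhibitor concentration.

    conc_um: inhibitor concentration in µM.
    ic50_um: concentration giving 50% activity, in µM.
    hill: Hill slope; 1.0 is an assumed default, not a fitted value.
    """
    if conc_um < 0 or ic50_um <= 0 or hill <= 0:
        raise ValueError("concentration must be >= 0; IC50 and Hill slope must be > 0")
    if conc_um == 0:
        return 1.0  # no inhibitor, full activity
    return 1.0 / (1.0 + (conc_um / ic50_um) ** hill)
```

Under this model, activity is exactly 0.5 when the concentration equals the IC50, and an inhibitor that can only be tested below its IC50 (for example because of limited solubility) will never show more than 50% inhibition.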
We confirmed the validity of the assay using the known covalent inhibitor GC376, obtaining an IC50 of 32.5 nM, similar to literature values (Figure S13). [37] Since the assay relies on the cleavage of an internally quenched EDANS fluorophore (λex 360 nm, λem 460 nm), we first examined the background fluorescence of the synthesized compounds I, II, IV, and V. As seen in Figure S16, compound IV exhibited high fluorescence when excited at 360 nm, so it was excluded from biological testing. Due to solubility issues, we were only able to test compounds II and V up to 100 µM and did not observe inhibition of M Pro activity at these concentrations (Figure S14). There were no solubility issues with compound I, in agreement with our ADME prediction that I is the most water soluble while II-V lie close to or even beyond the 95% solubility range of known drug-like molecules; we were therefore able to characterize its inhibitory activity (Figure 2, IC50 = 92.9 µM). To better understand the atomistic nature of the experimental results, MD simulations initiated from the structurally distinct 1º and 2º Glide poses (Table S1) [...] show promise as good binders (Table 2). Increased dissociation of II and IV is likely the result of increased structural rigidity caused by the presence of the indazole group. Trajectory visualization suggests that this increased rigidity results in a diminished ability of II and IV to conform to catalytic site dynamics. To further evaluate binding, we calculated MM-GBSA binding free energies and average structures for I-V using truncated ensembles that included only frames where ligand dissociation did not occur (Table 2). Taken together, the estimated binding free energies, percent dissociation values, and property predictions suggest that I, III and V-Z show the most promise as potential inhibitors. 
This agrees with our experimental results for I; unfortunately, we were unable to synthesize III for experimental testing, and V did not show inhibition at concentrations lower than 100 µM. Limited solubility of V prevented us from testing at higher concentrations. It is noteworthy that our synthesis of V produced both the E and Z isomers and so it is possible that our experimental binding analysis at these concentrations was diminished by the presence of the E isomer. Data for each individual seed and the full non-truncated 1000 ns ensembles is detailed in Tables S5 -S11. Compound V is not likely to be a viable candidate due to its physical properties; however, a comparison to the results obtained with this molecule provide structural insight into the role of the exocyclic isopropyl moiety on the binding of I and III. While our trajectory analysis of I and III does not provide evidence of any specific interactions between the isopropylcyclohexane double bond and the M Pro catalytic site (see below), the slightly reduced MM-GBSA binding estimations for V suggest that the positioning of the isopropyl group and overall conformation of the cyclohexane ring in I and III affect binding affinity. It is notable that I, with its non-planar isopropyl group, displays a somewhat less favorable binding affinity and higher percent dissociation than III which has a planar isopropyl. Likewise, V-E with a non-planar isopropyl oriented similarly to I, displays a less favorable binding affinity and higher percent dissociation than the V-Z diastereomer with the isopropyl group oriented to the opposite side of the molecule. These results suggest that the orientation of the isopropyl is important and that a cisoriented isopropyl may be less suitable than a planar or trans-oriented isopropyl. It is similarly possible that the presence of the double bond in or adjacent to the cyclohexane group affects binding affinity. 
Geometry optimizations performed on I at the B3LYP/6-31G* level of theory in solvent (PCM water) indicate that this ring is largely planar with a pucker at the isopropyl group. The cyclohexane of III is also largely planar; however, the pucker is more pronounced and present at the two cyclohexane carbons between the methyl and isopropyl groups. Keeping in mind that II and V did not produce inhibition in our experimental studies, it is notable that the cyclohexane ring in II is chair-like whereas in V-E and V-Z the cyclohexane adopts twisted boat- and chair-like conformations, respectively (Table 2). Taken together, our experimental binding results and molecular modeling suggest that some planarity in the cyclohexane ring may be important to binding. Per-residue energy decomposition analysis was conducted on each truncated ensemble of I-III (Table 3). [...] Visualization of the truncated ensembles suggests close interactions between many of the above residues and the phenyl and thiazole groups of I-III. To further elucidate the importance of this interaction, COM distance analysis was conducted between these aromatic rings and the functional groups of the noted sidechains of I-III (Table S12). This analysis suggests the occurrence of various interactions, most notably π-stacking interactions with His41, and S/π interactions with Met49 and Met165. These compounds display similar per-residue decomposition profiles, and thus the occurrence of similar π interactions is not surprising. Additionally, the potential for OH/π interactions with Ser144, SH/π interactions with Cys145, and anion/π interactions with Glu166 are noted with II and III. These Glu166 interactions resemble Asp anion/π interactions reported in the work of Ellenbarger et al. (face-on packing to the aromatic π-cloud). [83] The Cys SH/π interaction occurs only with II, while potential Ser OH/π interactions occur only with the phenyl group of II and with the thiazole of III. 
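The COM distance screen used above to flag potential π interactions can be sketched as follows. This is an illustration under assumptions: the literature cutoffs live in Table S4 and are not reproduced here, so `cutoff` is a caller-supplied placeholder, and the function names and plain-list data layout are hypothetical (the study used cpptraj).

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def center_of_mass(coords: Sequence[Coord], masses: Sequence[float]) -> Coord:
    """Mass-weighted center of a group of atoms."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and of equal length")
    total = sum(masses)
    if total <= 0:
        raise ValueError("total mass must be positive")
    x = sum(m * c[0] for m, c in zip(masses, coords)) / total
    y = sum(m * c[1] for m, c in zip(masses, coords)) / total
    z = sum(m * c[2] for m, c in zip(masses, coords)) / total
    return (x, y, z)


def com_distance(
    group_a: Sequence[Coord], masses_a: Sequence[float],
    group_b: Sequence[Coord], masses_b: Sequence[float],
) -> float:
    """Distance (Å) between the centers of mass of two atom groups."""
    return math.dist(center_of_mass(group_a, masses_a),
                     center_of_mass(group_b, masses_b))


def percent_within_cutoff(distances: Sequence[float], cutoff: float) -> float:
    """Percent of frames whose COM distance is at or below a given cutoff."""
    if not distances:
        raise ValueError("no distances supplied")
    hits = sum(1 for d in distances if d <= cutoff)
    return 100.0 * hits / len(distances)
```

A distance screen of this kind only identifies candidate contacts; it does not check ring orientation, which is why the text speaks of potential interactions.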
While the interactions in II suggest promise for the indazole scaffold as the basis of a potential M Pro inhibitor, we do not see experimental inhibition at lower concentrations. In addition to the π/π, S/π, OH/π and anion/π interactions, we also looked at hydrogen bonding patterns for the binding of I. Compound I donates a hydrogen bond to Gln189 for 28.56% of the truncated ensembles, in agreement with the decomposition energies reported in Table 3. Solvent-bridged hydrogen bonds are additionally noted with Glu166 (11.17%) and Gln189 (9.33%). Hydrogen bonding patterns for I-III can be found in the Supplemental Information (Tables S13-S14). Compound I has experimentally verified M Pro binding behavior, suggesting that it may prove useful as an inhibitor scaffold for future development. Compound III is [...] The sampling of these structures suggests the possibility of an energetic barrier separating these important conformations. The repeated occurrence of the I dominant structure across individual trajectories suggests that it is entropically favored and explains similarities in per-residue decomposition interaction energies between the two ensembles of I despite structural differences in the poses used to initiate simulation (Table 3). [...] I dominant structure where the (R)-carvone group is oriented inwards towards His41, Met49, and Met165 (Figure 4B). As this variation orients the phenylthiazole groups away from the three noted residues, His π-stacking and S/π interactions are not possible in this conformation. It should be noted that in order to generate a classic-textbook, non-dynamic pharmacophore, Figure 3A needs to be combined with 3B; atomistic physics-based simulations capture the thermal motion and energetic variability associated with real molecules, and as such almost never produce rigid pharmacophores. 
To obtain such a pharmacophore, the conformations in Figure 3A are combined with the energetics and percent occurrences (or importance weightings) shown in 3B. Such a composite analysis would result in an image for compound I with only one orientation, owing to its high percent occurrence (56%) and overall conformer stability (-26 kcal/mol; second most stable, higher than the lowest energy conformer by only 2 kcal/mol and close enough to be considered approximately isoenergetic). Compound III does in fact sample two distinct orientations, as noted by the two conformations with similar weightings and energies. Pharmacophores with multiple orientations are not unusual. [84, 85] [...] subsites. This agrees with our per-residue decomposition (Table 3) and COM distance data (Table S12) suggesting interactions with His41, Met49, Met165, and Gln189. Additionally, the reversed I dominant structure (Figure 5B), the I non-dominant structure (14.62%) (Figure 5D), and the III dominant structure (Figure 5E) project their isopropyl and cyclohexane groups into the S2 pocket. Positioning of the isopropyl group into this pocket is expected due to the S2 subsite favoring occupation with Leu. As discussed above, simulations on V suggest that the positioning of the isopropyl group affects affinity, and so we were not surprised to see it appear as a common binding motif in I and III; the ability to occupy the S2 subsite is likely related to the conformational preferences of the various isopropyl moieties. Notably, I is the only compound to significantly interact with the S4 subsite, while the dominant structure of III (Figure 5E) is the only average structure to bridge both the S1 and S2 subsites. Overall, all average structures demonstrate occupation of the previously noted important subsites for inhibitor binding. 
Further structural modification and optimization of these scaffolds may allow for additional interactions with various subsites and overall improved binding and inhibition. [Figure caption, truncated in the source: ... pocket residues, cyan indicates S2 pocket residues, red indicates S1 pocket residues, and green indicates S1' pocket residues. Each average binding structure is displayed, with the I dominant structure and its reversal shown in A and ...] We performed MD simulations on three 5-mer M Pro substrates to provide a detailed binding comparison with the thiazole compounds. As above, truncated ensembles were produced and MM-GBSA binding free energy and per-residue decomposition analyses were conducted. Binding affinities of the VKLQA and VKLQS substrates appear highly similar, with both reporting average ∆∆G values of -34.88 kcal/mol. The VKLQG substrate reports a notably less favorable binding affinity and a higher percent dissociation from the catalytic site. This is expected, as coronavirus M Pro appears to more readily select for Ser and Ala in the P1' position over Gly. [86, 87] These energies, reported in Table 4, suggest favorable binding for all three substrates, on a scale that is relatively comparable to I. (Data for each individual seed is viewable in Table S15.) Normal mode calculations were performed in order to include entropic effects in our binding free energy estimations, and to allow for comparison of I and III to the 5-mer substrates. These results are displayed in Table 5. As expected, the 5-mer substrates report greater average entropic penalties than the phenylthiazole compounds. Corrected binding free energies, calculated by assuming the MM-GBSA ∆∆G to be roughly equivalent to ∆H and subtracting the entropic penalty, suggest that I and III show greater binding affinities than the 5-mer substrates. 
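The entropy correction just described can be written compactly. This is only a restatement of the approximation given in the text; the symbols ΔG_corr and ΔS_NM (the normal mode entropy change) and the explicit sign convention are notational assumptions, not taken from the source.

```latex
\begin{align}
  \Delta G_{\mathrm{bind}} &= \Delta H - T\Delta S \\
  \Delta H &\approx \Delta\Delta G_{\mathrm{MM\text{-}GBSA}} \\
  \Delta G_{\mathrm{corr}} &\approx \Delta\Delta G_{\mathrm{MM\text{-}GBSA}} - T\Delta S_{\mathrm{NM}}
\end{align}
```

With this convention, binding normally reduces the entropy of the system, so TΔS_NM is negative and the term −TΔS_NM is a positive penalty. The corrected values are therefore less negative than the raw MM-GBSA estimates, and a larger, more flexible ligand such as a peptide substrate is penalized more heavily than a small molecule.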
Compound III stands out amongst the phenylthiazole compounds, with the primary and secondary truncated ensembles displaying corrected binding free energies of -9.07 and -6.52 kcal/mol, respectively, while compound I displays corrected binding free energies in the range of -5.4 to -5.9 kcal/mol. This further supports the promise of these compounds to serve as scaffolds for further inhibitor development. It should be noted that in these benchmarking tests we are using truncated peptides and as such likely underestimate binding. Also, our simulations do not include any chaperoned folding or other in vivo effects. Per-residue decomposition and hydrogen bonding analyses reveal interactions similar to those described by Chan et al. [27] Many of the same residues noted to interact with their 11-mer substrates also interact favorably with our 5-mer substrates (Table S16). Interactions with these residues are also noted with I (Table 3). While the VKLQG substrate does not demonstrate any consistent hydrogen bonds, backbone-backbone interactions between Glu166 and P3 Lys are reported with both VKLQA and VKLQS. This has previously been noted as a primary interaction responsible for holding the substrate in place. [27] Notable interactions are also observed between P2 Leu and His163 as well as between P1 Gln and Gly143, Cys145, and His163. Some previously unreported, but less consistent, hydrogen bonds are also observed. Most notable of these are interactions between the sidechains of P1 Gln and Ser144 and His41, as well as between the P3 Lys backbone and the Gln189 sidechain (Table S17). Peptide:M Pro models are available at https://github.com/Parish-Lab/sarscov2

Our benchmarking results on the tight-binder X77 suggest that our methodology is appropriate. An average MM-GBSA ΔΔG of -44.16 kcal/mol is reported for X77, and the simulation did not produce any dissociated frames, which is indicative of a tight-binding ligand.
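The notions of percent dissociation and of an ensemble restricted to bound frames can be sketched as follows. This is one plausible reading only: it assumes a frame counts as dissociated when the ligand-to-active-site center-of-mass (COM) distance exceeds a cutoff. The 10 Å cutoff, the function names, and the per-frame data are hypothetical placeholders, and the criterion actually used in this work may differ.

```python
def percent_dissociated(com_distances, cutoff):
    """Percentage of frames whose ligand-site COM distance exceeds cutoff."""
    if not com_distances:
        raise ValueError("no frames supplied")
    dissociated = sum(1 for d in com_distances if d > cutoff)
    return 100.0 * dissociated / len(com_distances)


def bound_frame_energies(frame_energies, com_distances, cutoff):
    """Keep only the energies of frames in which the ligand is still bound."""
    return [e for e, d in zip(frame_energies, com_distances) if d <= cutoff]


# Hypothetical per-frame data for illustration only; not from the paper.
distances = [4.1, 4.5, 5.0, 12.3, 4.8]        # COM distances in angstroms
energies = [-30.0, -28.0, -29.0, -2.0, -31.0]  # per-frame energies, kcal/mol
CUTOFF = 10.0                                  # assumed cutoff, angstroms

pct = percent_dissociated(distances, CUTOFF)
bound = bound_frame_energies(energies, distances, CUTOFF)
mean_bound = sum(bound) / len(bound)
print(f"{pct:.1f}% dissociated; mean bound-frame energy {mean_bound:.2f} kcal/mol")
```

Under this reading, a ligand with no frames beyond the cutoff, as reported for X77, has 0% dissociation, and its bound-frame average equals the average over the full trajectory.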
The ensemble initiated from the 6w63 binding pose reports a significantly more favorable MM-GBSA binding energy (-49.53 kcal/mol) than that initiated from the top Glide pose (-38.74 kcal/mol), confirming that experimental structures make the best initial poses for subsequent MD, and that there may be a variety of possible active site orientations with a range of favorability. Binding free energies for each individual seed and per-residue decomposition data are reported in Tables S18-S19. The entropic contribution and adjusted ΔΔG are reported in Table 5. Our simulations confirm that the enthalpy of X77 binding is more favorable than any of the phenylthiazole scaffolds but less favorable than the various substrates. When entropy is considered, X77 is a markedly more favorable binder than either the substrates or the phenylthiazole compounds, underscoring the importance of the rigidity imparted by small-molecule drugs and the need for further development of the thiazole scaffolds.

Our experimental binding analysis suggests that the previously synthesized phenylthiazole compound I is a promising scaffold for future inhibitor development. Computational analysis suggests that I, along with the unsynthesized III, passes PAINS screening and displays ADME properties in line with known drug-like molecules. Docking analysis predicts that these molecules bind favorably to the SARS-CoV-2 M Pro catalytic site, and molecular dynamics simulations suggest prolonged interaction with the enzyme. Subsequent MM-GBSA binding free energy and per-residue decomposition calculations suggest that I and III experience energetically favorable binding driven predominantly by interactions with residues His41, Met49, Met165, and Gln189. Residues Ser144, Cys145, Glu166, and Asp187 are also important across some of the ensemble-averaged structures.
A detailed structural comparison suggests that the orientation of the isopropyl moiety and the conformation of the cyclohexane ring may be important for M Pro binding. Trajectory visualization, per-residue decomposition, and COM analysis support the importance of specific π interactions between the phenylthiazole moiety and M Pro. Binding is driven by π-stacking interactions with His41 and possibly S/π interactions with Met49 and Met165. Other notable interactions are supported, including OH/π and SH/π interactions with Ser144 and Cys145, respectively, and anion/π interactions with Glu166. An entropically corrected binding comparison to 5-mer substrates with demonstrated experimental affinity further suggests that these compounds show promise as scaffolds for future development as SARS-CoV-2 M Pro inhibitor drugs.

A Novel Coronavirus from Patients with Pneumonia in China
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
A Pneumonia Outbreak Associated with a New Coronavirus of Probable Bat Origin
A New Coronavirus Associated with Human Respiratory Disease in China
The COVID-19 Epidemic
Presymptomatic SARS-CoV-2 Infections and Transmission in a Skilled Nursing Facility
The Lancet
Alterations in Smell or Taste in Mildly Symptomatic Outpatients With SARS-CoV-2 Infection
Estimating the Infection-Fatality Risk of SARS-CoV-2 in New York City during the Spring 2020 Pandemic Wave: A Model-Based Analysis. The Lancet Infectious Diseases
Association of Cardiovascular Disease and 10 Other Pre-Existing Comorbidities with COVID-19 Mortality: A Systematic Review and Meta-Analysis
Structure of the RNA-Dependent RNA Polymerase from COVID-19 Virus
Structure of Mpro from SARS-CoV-2 and Discovery of Its Inhibitors
ACTT-1 Study Group Members.
Remdesivir for the Treatment of Covid-19 - Final Report
The SARS-CoV-2 Main Protease as Drug Target
Crystal Structure of SARS-CoV-2 Main Protease Provides a Basis for Design of Improved α-Ketoamide Inhibitors
COVID-19: Targeting Proteases in Viral Invasion and Host Immune Response
Optimization Rules for SARS-CoV-2 Mpro Antivirals: Ensemble Docking and Exploration of the Coronavirus Protease Active Site
First Structure-Activity Relationship Analysis of SARS-CoV-2 Virus Main Protease (Mpro) Inhibitors: An Endeavor on COVID-19 Drug Discovery
Identification of Proteasome and Caspase Inhibitors Targeting SARS-CoV-2 Mpro
Ligand-Based Design, Molecular Dynamics and ADMET Studies of Suggested SARS-CoV-2 Mpro Inhibitors
Targeting the Main Protease of SARS-CoV-2: From the Establishment of High Throughput Screening to the Design of Tailored Inhibitors
Prediction of Novel Inhibitors of the Main Protease (M-pro) of SARS-CoV-2 through Consensus Docking and Drug Reposition
A Blueprint for High Affinity SARS-CoV-2 Mpro Inhibitors from Activity-Based Compound Library Screening Guided by Analysis of Protein Dynamics
Identification of Key Interactions between SARS-CoV-2 Main Protease and Inhibitor Drug Candidates
Rapid Identification of Potential Inhibitors of SARS-CoV-2 Main Protease by Deep Docking of 1.3 Billion Compounds
Computational Discovery of Small Drug-like Compounds as Potential Inhibitors of SARS-CoV-2 Main Protease
Mpro Peptide Inhibitors from Modelling Substrate and Ligand Binding
Ehrt Christiane; Ewert Wiebke; Oberthuer Dominik Dunkel Ilona Seychell Brandon; Gieseler Henry; Norton-Baker Brenna; Escudero-Pérez Beatriz Saouane Sofiane Groessler Michael; Fleckenstein Holger; Trost Fabian; Galchenkova Marina; Gevorkov Yaroslav Awel Salah; Peck Ariana Andaleeb Hina; Ullah Najeeb Schwinzer Martin; Brognaro Hévila; Rogers Cromarte Kuzikov Maria Zhang Linlin; Sun Xinyuanyuan; Pletzer-Zelgert Jonathan Usenik Aleksandra; Loboda Jure; Tidow Henning; Chari Ashwin; Hilgenfeld Rolf
X-Ray Screening Identifies Active Site and Allosteric Inhibitors of SARS-CoV-2 Main Protease
Structure of Papain-like Protease from SARS-CoV-2 and Its Complexes with Non-Covalent Inhibitors
Potent Noncovalent Inhibitors of the Main Protease of SARS-CoV-2 from Molecular Sculpting of the Drug Perampanel Guided by Free Energy Perturbation Calculations
Conserved Interactions Required for Inhibition of the Main Protease of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)
Structural Basis of Potential Inhibitors Targeting SARS-CoV-2 Main Protease
Lead Compounds for the Development of SARS-CoV-2 3CL Protease Inhibitors
Ensemble Docking Coupled to Linear Interaction Energy Calculations for Identification of Coronavirus Main Protease (3CLpro) Non-Covalent Small-Molecule Inhibitors
Prioritisation of Compounds for 3CLpro Inhibitor Development on SARS-CoV-2 Variants
Design of Wide-Spectrum Inhibitors Targeting Coronavirus Main Proteases
Discovery of M Protease Inhibitors Encoded by SARS-CoV-2
Mechanism of Inhibition of SARS-CoV-2 Mpro by N3 Peptidyl Michael Acceptor Explained by QM/MM Simulations and Design of New Derivatives with Tunable Chemical Reactivity
Both Boceprevir and GC376 Efficaciously Inhibit SARS-CoV-2 by Targeting Its Main Protease
GC-376, and Calpain Inhibitors II, XII Inhibit SARS-CoV-2 Viral Replication by Targeting the Viral Main Protease
Significance of Thiazole-Based Heterocycles for Bioactive Systems
Recent Applications of 1,3-Thiazole Core Structure in the Identification of New Lead Compounds and Drug Discovery
Imidazole, and Thiazole-Based Compounds as Potential Agents against Results in Chemistry
Mapping Major SARS-CoV-2 Drug Targets and Assessment of Druggability Using Computational Fragment Screening: Identification of an Allosteric Small-Molecule Binding Site on the Nsp13 Helicase
Recent Advances in Indazole-Containing Derivatives: Synthesis and Biological Perspectives
Repositioning of 8565 Existing Drugs for COVID-19
Synthesis of New Thiazolyl-Indazole Derivatives from R-Carvone: A Combined Experimental and Theoretical Study
Computational Exploration of Molecular Scaffolds in Medicinal Chemistry
New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays
Schrödinger Release 2019-3: QikProp, Schrödinger, LLC
The Protein Data Bank
An Overview of Severe Acute Respiratory Syndrome-Coronavirus (SARS-CoV) 3CL Protease Inhibitors: Peptidomimetics and Small Molecule Chemotherapy
Schrödinger Release 2019-3: Protein Preparation Wizard; Epik
Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of PKa Values
PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical PKa Predictions
OPLS3: A Force Field Providing Broad Coverage of Drug-like Small Molecules and Proteins
Development and Testing of a General Amber Force Field
Avogadro: An Advanced Semantic Chemical Editor, Visualization, and Analysis Platform
A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy
Glide: A New Approach for Rapid, Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening
SARS-CoV-2 Mpro Inhibitors and Activity-Based Probes for Patient-Sample Imaging
Coronavirus Main Proteinase (3CLpro) Structure: Basis for Design of Anti-SARS Drugs
UCSF Chimera-A Visualization System for Exploratory Research and Analysis
Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB
Automatic Atom Type and Bond Type Perception in Molecular Mechanical Calculations
Efficient Generation of High-Quality Atomic Charges. AM1-BCC Model: I. Method
Efficient Generation of High-Quality Atomic Charges. AM1-BCC Model: II. Parameterization and Validation
Quantum and Statistical Mechanical Studies of Liquids. 25. Solvation and Conformation of Methanol in Water
Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born
Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald
UCSF ChimeraX: Meeting Modern Challenges in Visualization and Analysis
Py: An Efficient Program for End-State Free Energy Calculations
Assessing the Performance of the MM/PBSA and MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on Molecular Dynamics Simulations
Aromatic Side-Chain Interactions in Proteins. Near- and Far-Sequence His-X Pairs
the Protein Data Bank: Searching for
Origin of the Attraction and Directionality of the NH/π Interaction: Comparison with OH/π and CH/π Interactions
Insights into Interactions: A Stereoelectronic Basis for S-H/π Interactions
Models of S/π Interactions in Protein Structures: Comparison of the H2S-Benzene Complex with PDB Data
Sulphur-Aromatic Interactions in Proteins
Anion-π Interactions in Computer-Aided Drug Design: Modeling the Inhibition of Malate Synthase by Phenyl-Diketo Acids
Structural Basis of NR2B-Selective Antagonist Recognition by N-Methyl-D-aspartate Receptors
Characterization of Two Pharmacophores on the Multidrug Transporter P-Glycoprotein
Conservation of Substrate Specificities among Coronavirus Main Proteases
Virus-Encoded Proteinases and Proteolytic Processing in the