key: cord-0989406-273q5m4n
authors: Glaser, Jens; Sedova, Ada; Galanie, Stephanie; Kneller, Daniel W.; Davidson, Russell B.; Maradzike, Elvis; Del Galdo, Sara; Labbé, Audrey; Hsu, Darren J.; Agarwal, Rupesh; Bykov, Dmytro; Tharrington, Arnold; Parks, Jerry M.; Smith, Dayle M. A.; Daidone, Isabella; Coates, Leighton; Kovalevsky, Andrey; Smith, Jeremy C.
title: Hit Expansion of a Noncovalent SARS-CoV-2 Main Protease Inhibitor
date: 2022-04-04
journal: ACS Pharmacol Transl Sci
DOI: 10.1021/acsptsci.2c00026
sha: 9b3ef5f70e1d70c478df429cd414cd400dd2f819
doc_id: 989406
cord_uid: 273q5m4n

[Image: see text] Inhibition of the SARS-CoV-2 main protease (M(pro)) is a major focus of drug discovery efforts against COVID-19. Here we report a hit expansion of non-covalent inhibitors of M(pro). Starting from a recently discovered scaffold (The COVID Moonshot Consortium. Open Science Discovery of Oral Non-Covalent SARS-CoV-2 Main Protease Inhibitor Therapeutics. bioRxiv 2020.10.29.339317) represented by an isoquinoline series, we searched a database of over a billion compounds using a cheminformatics molecular fingerprinting approach. We identified and tested 48 compounds in enzyme inhibition assays, of which 21 exhibited inhibitory activity above 50% at 20 μM. Among these, four compounds with IC(50) values around 1 μM were found. Interestingly, despite the large search space, the isoquinolone motif was conserved in each of these four strongest binders. Room-temperature X-ray structures of co-crystallized protein–inhibitor complexes were determined up to 1.9 Å resolution for two of these compounds as well as one of the stronger inhibitors in the original isoquinoline series, revealing essential interactions with the binding site and water molecules. Molecular dynamics simulations and quantum chemical calculations further elucidate the binding interactions as well as electrostatic effects on ligand binding. The results help explain the strength of this new non-covalent scaffold for M(pro) inhibition and inform lead optimization efforts for this series, while demonstrating the effectiveness of a high-throughput computational approach to expanding a pharmacophore library.

The development of antiviral therapeutics is a major focus of COVID-19 research. The SARS-CoV-2 main protease, M pro (3CLpro), is responsible for cleaving the viral polypeptide pp1a and pp1ab into functional protein subunits essential for viral replication. Because of this key role, and together with the low mutation rate of the active site 1−5 which suggests that mutations will not broadly impact the efficacy of SARS-CoV-2 M pro inhibitors, as well as a lack of homologous proteases in humans, M pro is a prime target for antiviral drug discovery. 3, 6 Recent studies have confirmed the effectiveness of targeting M pro for inhibiting viral replication. 7−9 Several inhibitors of varying affinity have been discovered, including some with activities as low as 20 nM, by optimizing compounds based on structure−activity relationships (SARs). 10, 11 Pfizer has developed two M pro inhibitors, PF-07321332 (Nirmatrelvir) 12, 13 and PF-07304814. 14 The former is orally bio-available and has been authorized for emergency use in the USA with the cytochrome P450 3A4 inactivator Ritonavir under the brand name Paxlovid. 15 The latter inhibitor is given intravenously and is currently in clinical trials. 16 The PF-07321332 inhibitor has a kinetic inhibition constant (K i ) of 3.1 nM. 13 For M pro both covalent and non-covalent inhibitors have been discovered. We focus here on non-covalent inhibitors, which can act alone or provide starting points for optimized covalent and reversible covalent inhibitors.

Over the past two decades, efforts to develop non-covalent inhibitors of the main proteases of human coronaviruses such as MERS and SARS-CoV-1 have resulted in only a handful of non-covalent pharmacophores (scaffolds) that can be used to derive molecular series for optimization. These optimization efforts have resulted in relatively few non-covalent inhibitors with IC 50 less than 1 μM in in vitro kinetic assays. 17−19 We use as a starting point here selected results from the PostEra COVID Moonshot, 20,21 which, through massive crowdsourcing and high-throughput experimental assays, has helped to discover a number of new motifs for non-covalent inhibitors with IC 50 values below 2 μM. A large proportion of these are isoquinoline-containing compounds, such as four of the compounds shown in Figure 1 . Crystal structures of many such compounds were determined through a collaboration between the Diamond Light Source XChem X-ray project and PostEra and were produced by high-throughput, automated methods. For all compounds in the isoquinoline series the isoquinoline nitrogen forms a strong hydrogen bond (2.6 Å to 2.8 Å) with His163 of M pro . Another hydrogen bond can be formed between a carbonyl oxygen and Glu166 of M pro ; this interaction is also found in many of the compounds from this series. Of the 362 unique quinoline-containing compounds from the PostEra dataset, 327 experimentally measured fluorescence IC 50 values have been reported. Of these, 179 have an IC 50 under 5 μM, 48 between 5 and 20 μM, 23 between 20 and 30 μM, and 77 over 30 μM.

Having discovered inhibitors (hits), a next step in the hit-tolead process can be hit expansion, in which chemically similar molecules are identified, assayed, and compared. Hit expansion aims at producing inhibitors of similar or increased activity to the original hit, expanding diversity beyond a particular scaffold, and extending understanding of SARs. 22 Highthroughput parallel expansion has been found to be effective in this regard. 23 Furthermore, serendipitous findings arising from the addition of random diversity to the hit compounds often results in discoveries of novel and unpredictable mechanisms of action. 24 These types of efforts can be automated and may include cheminformatics or other highthroughput modeling approaches. 25 Here we explore the ability of a fingerprint-based, automated hit expansion method to produce new M pro inhibitors starting from a parent inhibitor.

As a starting compound for our expansion we selected the chiral COVID-19 Moonshot inhibitor MAT-POS-b3e365b9-1 (bold in Figure 1 ). This compound is an intermediate product of a manual, crowd-sourced SAR optimization strategy, in which a weak (IC 50 ≈ 25 μM) aminopyridine SARS CoV-2 M pro inhibitor was successively modified to include an isoquinoline and a halogen group. Then, an oxane moiety was added to the m-chlorophenyl group to allow for stereoselective binding of the inhibitor, and further improvement of the affinity to an IC 50 = 80 nM was achieved by replacing the out-of-plane hydrogen of the aliphatic heterocycle with a methoxy group. This starting compound was the most potent non-covalent inhibitor characterized by the COVID-19 Moonshot project at the time our expansion was performed and compounds were ordered.

We performed the automated computational hit expansion by examining 1.37 billion compounds in the Enamine REAL library (including stereoisomers and tautomers), a database of commercially available drug-like fragments complying with Lipinki's "rule of five" 26 and Veber criteria 27 for orally active compounds (molecular weight ≤ 500 Da, SlogP ≤ 5, number of hydrogen bond acceptors ≤ 10, number of hydrogen bond donors ≤ 5, number of rotatable bonds ≤ 10, and total polar surface area ≤ 140 Å 2 ). 28 Enzymatic Activity Assays. We tested 48 compounds for activity against M pro (Figure 2 ). Our screening yielded 26 novel inhibitors with less than 50% residual enzymatic activity in the primary screen at 20 μM, and 21 satisfying the same threshold in both the primary and the confirmation screens (cf. Figure 3 and Table 1 ). Of these, we selected five inhibitors (compounds 6, 12, 17, 19, and 21) with high Z-scores for further characterization. The hits include a compound already characterized by the COVID Moonshot project (compound 21, Moonshot ID ADA-UCB-6c2cb422-1) and four others that, to our knowledge, have not been characterized previously. These compounds have the isoquinoline group in common, with either halogen substitutions at the meta position of the phenyl group (compounds 6 and 19), a methoxy (compound 12), methyl (compound 19), or 5-bromo-2,3-dihydrofuran (compound 6) group instead of the oxane moiety, or the addition of a methanesulfonyl functional group (compound 17). These five inhibitors were further characterized in a dose− response experiment ( Figure 4 ). Consistent with the high similarity scores to the starting compound, the IC 50 values of these inhibitors (Table 2) were also similar and between 1.6 and 4.8 μM, but not statistically different from the control compound 21 or from the micromolar inhibitor MCULE-5948770040. 29 Structure−Activity Relationship (SAR) from Similarity Search. Given the size of the screened database it is interesting that, among the 48 similar molecules tested, the most potent ones exhibit only minor modifications to the original scaffold retaining the isoquinoline, the halophenyl ring, and the amide bond connecting the two. These salient features may therefore be critical for the activity of this class of inhibitors. They also demonstrate the necessity for searching a comprehensive molecule database, since more global changes to the molecules did not lead to improved inhibition. In other words, minute modifications of the scaffold would have likely been missing from smaller databases. To test the hypothesis that the similarity in activity is consistent with a SAR model generated from the COVID-19 Moonshot dataset, we used machine learning to predict the importance of individual molecular features for the reference compound and the top five inhibitors ( Figure 5 ).

Although we do not explicitly explore the combinatorial effect of the various substitutions, e.g., by separately varying the scaffold and the chloro substitutions, the results can be interpreted in terms of a manual SAR. The slight increase in 50 increases further with compound 19, where the methylene bridge between the amide and the phenyl ring bears an additional methyl group, and the chlorine is simultaneously replaced by a fluorine. The model predicts that both contributions reduce the activity. A rigidification of the above-mentioned connection between the amide and the phenyl ring was made in compound 6, and chlorine was replaced by bromine, resulting in an even weaker binder. Generally, the most important functional groups of the isoquinoline derivatives are the 4-aminopyridine ring fused to a phenyl ring and the halogen and its closest atoms on the second phenyl ring. On the other hand, the substitutions that differentiate our top inhibitors from the reference compound and from compound 21 are predicted to be of minor relevance for the activity. This finding is in agreement with the observed minor changes in IC 50 values. Finally, the halogen accounts for an order of magnitude improvement in binding strength; this is supported by a compound nearly identical to compound 21 but without the halogen substituent (PostEra ID RAL-THA-2d450e86-1, cf. Figure 1 ) which has a reported IC 50 = 14 μM.

Crystallography. To examine the molecular basis of inhibition of hit compounds, we determined room-temperature X-ray structures of M pro co-crystallized with compound 12, compound 19, and compound 21 up to 1.90 Å resolution (Table S1 ). Other hits selected from the inhibition assay results were attempted but did not co-crystallize. Each ligand was modeled with unambiguous electron density in the active site (Figure 6a−c) .

The structures show that all three ligands form a hydrogen bond (d = 2.9 Å) between the isoquinoline and the ε-nitrogen on His163 in S1 subsite of the binding pocket. Another hydrogen bond (d = 3.1 Å) forms between the ligand carbonyl O and Glu166 backbone N in the S3 site. The amide NH group of the ligand forms a hydrogen bond with a water molecule (the water molecule is not shown in Figure 6 , but it is included in the analysis of water structure below). Moreover, one discerns a weak interaction (d = 3.9 Å to 4.1 Å) between the C2−C3 edge of the m-chlorophenyl group with the imidazole of the catalytic His41 in the S2 site. Halogen bonding has been recognized as a useful tool in drug design, in part for its tunability; 30 a halogen bond, with an average distance of 3 Å, is a relatively weak interaction but can contribute several kcal/mol to the binding energy of a ligand. 31, 32 While the geometries found in our crystal structures do not correspond to a classical RX−Y halogen bond, 33 the interaction may be of a halogen−π 30−32 nature. It may alternatively, or also, involve CH−π interactions. The methoxy and methyl substitutions in compounds 12 and 19, respectively, do not lead to additional interactions with the protein, which partially explains their lack of (significant) effect on activity. They can thus be understood as neutral substitutions of the ligand scaffold, potentially only having a steric or entropic effect on binding. Potential Role of Water Molecules in Stabilizing the Ligand Pose. Given the polar nature of most of the interactions discussed here, it is natural to look at the complex−solvent interactions, their qualitative change upon ligand binding and to elucidate the role of individual water molecules. Such an approach can form the basis for even more quantitative modeling of binding strength. The important role of water in mediating protein−ligand interactions is well known and can make modeling and prediction in drug discovery difficult. 34, 35 We performed a set of analyses to identify the locations of both trapped and displaced water molecules that impact the stability of the ligand binding pose.

The efforts of the global community to find drugs targeting SARS-CoV-2 proteins has led to an explosion in the number of crystal structures of these proteins, creating an unprecedented collection of structures of M pro for analysis. Making use of this wealth of data, we aligned 550 SARS-CoV-2 M pro structures deposited in the Protein Data Bank (see Table S2 for the list of structures) and interpolated the water oxygen positions onto a 3D grid around the active site (see Methods). In Figure 7 , we show the crystallographic water molecule loci satisfying our density threshold in the crystallographic ensemble as red surfaces.

To elucidate water positions specific to these isoquinoline ligands and to determine if stable water molecules with kinetics too fast to capture with crystallography were also important, we performed a constrained molecular dynamics (MD) simulation of the ligand compound 21 in the binding pocket in explicit water and analyzed the water molecule positions in the same way, showing these as blue surfaces.

Generally, crystallographic and MD water locations coincide, demonstrating good qualitative agreement between simulated and crystallographic water molecule positions. A large cluster found only in the simulation and not in the crystallographic database analysis is found to interact with the amide nitrogen on the ligand, an interaction that appears to be ligand-specific (a majority of the crystal structures either do not contain bound ligands or contain ligands bound in poses that differ from the isoquinoline series). Notably, some crystallographic waters found in the set of M pro crystal structures overlap with ligand compound 21 atoms and are therefore absent in the MD simulation, indicating water displacement. Panels (b) and (c) of Figure 7 show ligand substructures that form hydrogen bonds with the backbone of Glu166 and the side chain of His163, respectively, together with clusters of crystallographic waters found in the same positions as ligand atoms. In lieu of the ligand forming hydrogen bonds with the protein in the complex, water molecules are bonding partners in the fully solvated apo-protein structure. Key non-covalent protein− ligand interactions arise from the displacement of non-catalytic hydration water observed at reproducible positions in the binding pocket, as well as from displacement of water from the ligand solvation shell. Displacement of water molecules by ligands in protein binding sites is known to contribute to binding affinity with standard free energies of ΔG 0 ≈ 8 kcal/ mol 36, 37 and is likely a contributing factor to the binding strength of this series.

Density Functional Theory-Optimized Binding Pocket Geometry and Electrostatic Surface Analysis. Finally, using quantum chemistry, we examine some aspects of the electrostatic interactions of ligands with the active site. The electrostatic potential V el was calculated for the ligand alone, for the protein environment alone, and for the ligand bound to protein, all using a continuum solvent mimicking an aqueous environment. The density functional theory (DFT)-optimized geometry is in excellent agreement with the corresponding Xray structure (RMSD 0.71 Å), indicating that the choices of interacting atoms used to define the cluster and the water placement were robust. Figure 8 displays a particular result from these calculations. The electrostatic potentials of the mchlorophenyl moiety as calculated for the ligand alone (panel a) and in the protein active site (panel b) are substantially different. The full system used for the protein−ligand calculation is shown in panels (b) and (c). The anisotropic nature of the electrostatic potential on the Cl suggests that it is interacting with three or four neighboring protein sites. This result supports the order of magnitude change in affinity associated with the halogen group discussed above, and also the effect of various halogen substitutions in a hit expansion around the similar MCULE-5948770040 scaffold, as discussed by Kneller et al. 38 

Inhibition of the SARS-CoV-2 main protease (M pro ) is a major focus of drug discovery efforts against COVID-19. The list of M pro inhibitors is rapidly growing, 39 and SARs have been identified for several series. Various modes of inhibition have been developed, including small-molecule covalent inhibitors, peptidomimetic covalent inhibitors, non-covalent inhibitors, and metal-conjugated inhibitors. We report here a hit expansion of non-covalent inhibitors of M pro , focusing on an isoquinoline scaffold discovered as part of the PostEra COVID Moonshot. 20 A novelty of the present work is the use of a cheminformatics-based hit expansion, in which we performed an automated computational search of a billion-compound database using a molecular fingerprinting approach. In this hit expansion, 48 selected compounds were identified and tested, of which 21 exhibited inhibitory activity above 50% (IC 50 ) at 20 μM. Four new non-covalent inhibitors with IC 50 ≈ 1 μM were found. The isoquinoline motif is present in each of the four strongest binders, and the success of this hit expansion in demonstrating its importance was chiefly enabled by the enormous size of the database searched. A simple machinelearning model trained on COVID-19 Moonshot data suggests that the strong homology of the ligands is consistent with the SAR implied by the Moonshot dataset, which amounts to a manual hit expansion method guided by human intuition. Room-temperature X-ray structures of co-crystallized protein− inhibitor complexes reveal essential hydrogen bonds with the binding site and water molecules. MD simulation and quantum chemical DFT calculations further probe the nature of these interactions as well as charge effects on ligand binding. These compounds will benefit from optimization to improve binding affinity, solubility, desired anti-viral effect in live cells, and other key properties such as selectivity and metabolic stability. Nevertheless, we have demonstrated the potential of high-throughput, automated cheminformatics-based computational hit expansions for rapidly expanding the size of a set of hits for lead optimization.

■ METHODS Selection of Compounds. We selected compounds by scaffold similarity to the starting compound PostEra MAT-POS-b3e365b9-1. To this end, we employed the MAP4 MinHash-based, atom-pair molecule fingerprint with 1024 permutations. 40 MAP4 combines chemical environments of radius r = 1 and r = 2 around pairs of atoms with their topological distance, and for every such set of descriptors returns the member with the minimum SHA-1 hash under a random permutation, resulting in a similarity measure between molecules that is an unbiased estimator of the Jaccard index. 41 MAP4 has been shown to outperform comparable methods in the task of separating active binders from decoys by similarity. 40 We computed MAP4 fingerprints for the Enamine REAL library and employed the dask 42 -distributed and NVIDIA RAPIDS 43 GPU-accelerated data analytics libraries to parallelize the calculation of fingerprints and to reduce the dataset to the most similar compounds. We selected 47 unique scaffolds (cf. Supporting Information) having the highest fingerprint similarity (≥0.28125) and included a control compound (Enamine ID Z1530724813/PostEra COVID Moonshot ID ADA-UCB-6c2cb422-1, similarity 0.2549), for which an IC 50 has been reported by the PostEra project. These compounds were filtered for potential pan-assay interference compounds (PAINS) and purchased from Enamine. Our experimental characterization did not include the starting compound itself, as it was unavailable from Enamine; however, we included an additional control (MCULE-5948770040) that was previously found to be a potent, non-covalent M pro inhibitor 29 in a study unrelated to the PostEra project.

SARS-CoV-2 M pro Expression, Purification, and Enzyme Inhibition Assay. Protein purification supplies were purchased from Cytiva (Piscataway, NJ, USA). A gene construct encoding M pro (NSP5) from SARS-CoV-2 was cloned into plasmid pD451-SR 44 (Atum, Newark, CA) and expressed and purified with protocols detailed in ref 45 . Briefly, the authentic N-terminus was achieved by including an NSP4-NSP5 autoprocessing sequence flanked by maltose binding protein and M pro . At the C-terminus, a sequence encoding the human rhinovirus 3C (HRV-3C) cleavage site was followed by a His6-tag. The authentic N-terminal sequence was then created by autocleavage during expression, while the Cterminus was generated by HRV-3C treatment following Niimmobilized metal affinity chromatography.

Compounds were purchased from Enamine as 10 mM stock solutions in DMSO and stored at 20°C. All compounds are >90% pure by LC-MS. The assays were performed in 40 μL total volume in black half-area 96-well plates at 25°C as previously described. 29, 46, 47 The assay buffer contained 20 mM Tris-HCl pH 7.3, 100 mM NaCl, 1 mM EDTA, and 2 mM reduced glutathione with 5% v/v final DMSO concentration. Reaction final concentrations were 250 nM M pro enzyme, 20 μM inhibitor, and 40 μM FRET peptide substrate. The FRET substrate DABCYLKTSAVLQSGFRKM-E(EDANS) trifluoroacetate salt was purchased from Bachem (PN 4045664). Initial rates were determined for time points in the linear range by linear regression in Excel, residual activities were determined by normalizing candidate initial rates to the average of the positive controls, and Z-scores were determined by dividing the difference between the candidate initial rate and average positive control initial rate by the standard deviation of the positive control initial rates. The Z′ statistics for the plate were calculated using the published equation. 48 Figure S1 shows good agreement between the primary and the confirmation screen (Spearman-ρ = 0.44, P = 0.00164).

IC 50 Determination. To determine the concentration at which a compound was able to achieve 50% inhibition of M pro activity in vitro (IC 50 ), the FRET assay was performed at seven concentrations of inhibitor (0.03−31.6 μM) in duplicate. Initial rates were normalized to no inhibitor control (100% activity) and no enzyme control (0% activity), and nonlinear regression of the [Inhibitor] vs normalized response IC 50 equation was performed to fit the data using GraphPad Prism 9, yielding IC 50 and its 95% confidence interval.

Structure−Activity Model. To derive a structure−activity model, we trained a support vector regression model on 1365 fluorescence IC 50 values from the COVID-19 Moonshot dataset 20 using scikit-learn. 49 For every molecule we computed the 2048-bit Morgan fingerprint as a feature vector using RDKit 50 and the label as −log 10 IC 50 [μM], normalized by the mean and standard deviation of the training dataset. We then split the training fingerprints and labels into training and test sets using a 9:1 ratio. This model reproduced the order of experimentally measured activity values of the test set with a Spearman-ρ rank correlation coefficient of 0.73 and a meansquare error of 0.353 (in log 10 IC 50 units). To determine which parts of the molecule are important for predicting the affinity value, we used the GetSimilarityMapForModel function in RDkit, which removes a single atom and then recomputes the fingerprint for every atom in the molecule.

Crystallization. Three compounds with favorable IC 50 values were crystallized in complex with M pro and their structures determined using X-ray diffraction. Crystallization reagents were purchased from Hampton Research (Aliso Viejo, CA, USA). Crystallographic tools were purchased from MiTeGen (Ithaca, NY, USA) and Vitrocom (Mountain Lakes, NJ, USA). M pro concentrated to ∼5.0 mg/mL in 20 mM Tris pH 8.0, 150 mM NaCl, and 1 mM TCEP was used for crystallization by sitting-drop vapor diffusion. Conditions for growing crystalline aggregates of ligand-free M pro were identified by high-throughput screen at the Hauptman-Woodward Research Institute, 51 reproduced locally, and then used for microseeding to nucleate M pro crystals in subsequent co-crystallization experiments. Lyophilized samples of compounds 12 (Z1530718726), 19 (Z1530724963), and 21 (Z1530724813) (Enamine, Monmouth Jct., NJ, USA) were dissolved in 100% DMSO as 50 mM stocks and stored at −20°C. Compound 21 corresponds to PostEra COVID Moonshot ID ADA-UCB-6c2cb422-1. Compounds were mixed with M pro in a 5:1 molar ratio and allowed to incubate on ice for a minimum of 1 h. Crystals suitable for roomtemperature X-ray diffraction grew after 1 week in 20 μL drops at a 1:1 mixture with 18−20% PEG3350, 0.1 M Bis-Tris pH 6.5, nucleated with 0.2 μL of 1:200 dilution microseeds after incubation at 14°C.

Room-Temperature X-ray Data Collection and Structure Refinement. Protein crystals were mounted using a MiTeGen (Ithaca, NY) room-temperature capillary system. Xrays for crystallography were generated from a Rigaku HighFlux HomeLab employing a MicroMax-007 HF X-ray generator with Osmic VariMax optics. Diffraction images were collected using an Eiger R 4M hybrid photon-counting detector. Diffraction datasets were reduced and scaled using Rigaku CrysAlis Pro software package. Molecular replacement was performed using the ligand-free room-temperature M pro structure (PDB ID 6WQF, 44 ) using Phaser. 52 Structure refinement was performed with Phenix.refine from the Phenix suite 53 and manual refinement in COOT 54 assisted by Molprobity. 55 Data collection and refinement statistics are listed in Table S1 . The structures and corresponding structure factors of the room-temperature co-crystal complexes have been deposited into the Protein Data Bank (PDB).

Crystal Structure Hydration Analysis. To determine the average locations of water molecules around the active site in the ensemble of M pro structures deposited in the PDB, the Dali server 56 was used to identify homologues of the SARS-CoV-2 M pro . The query structure was the M pro :compound 21 protein−ligand complex reported here. The search across the full PDB database returned 277 crystal structures with strong structural homology. The three room-temperature X-ray structures reported here were also included in this set of structures. Alignment of all chains in the 280 M pro -analogous structures was performed using the align function in PyMOL, 57 with the M pro :compound 21 complex used as the target structure. Chains with strong alignments (RMSD smaller than 5.0 Å, number of aligned residues greater than 50, and alignment score greater than 150) were included in the structural ensemble; all crystallographic waters and resolved small molecules within 5.0 Å of any protein atom were maintained in the aligned structures. The final count of monomeric M pro homologues was 550 structures. Once all structures were aligned, a 3D histogram of crystal water oxygen atom positions was calculated around the active site of the enzyme. Each of the original 280 crystal structures was given an equal weight during the creation of the histogram to avoid over-weighting of the results toward structures from multimeric M pro or NMR datasets. Each cubic bin (voxel) was 0.25 Å × 0.25 Å × 0.25 Å. The resulting 3D histogram was then imported into Visual Molecular Dynamics (VMD) 58 to enable the visualization of hydration hot spots identified from the ensemble of crystal structures.

Molecular Dynamics Sampling of Water Positions in the Constrained M pro :Compound 21 Complex. To obtain a dynamic view of the hydration structure of M pro specific to the isoquinoline ligands, all-atom explicit solvent MD simulations of the protonated M pro :compound 21 complex were performed to model the protein in a bulk solvent environment. Here, both the protein and the ligand were completely restrained to prevent deviation away from the crystal structure geometry so that the solvation hot spots for the specific bound structure could be identified. Additional details regarding the MD simulation protocol are provided below. The 5 ns trajectory was analyzed in a similar fashion to ACS Pharmacology & Translational Science pubs.acs.org/ptsci Article the crystal structure ensemble to generate a 3D histogram of solvent oxygen atom positions around the ligand. Parameters for the solvated M pro :compound 21 complex were assigned in AmberTools21 tleap, 59 using the ff19SB force field 60 and GAFF/AM1-BCC parameters 61, 62 for the ligand compound 21. Resolved crystal waters were maintained. Protonation states of His residues were assigned using PropKa, 63, 64 and all His side chains were singly protonated. His172 and His163 were protonated at the ε-nitrogen position. The other histidines, His41 and His164, were protonated at the δ-nitrogen. The system was solvated in a box of TIP3P water molecules with a minimum distance of 12 Å from the protein to the nearest face of the box. Sodium and chloride ions were added to neutralize charge and maintain a 0.1 M ionic concentration. The OpenMM molecular simulation software package 65 was used to perform the MD simulations. The protein and ligand atoms were constrained by setting their masses to zero. The particle mesh Ewald (PME) method was used to account for long-range electrostatics interactions. A Langevin thermostat was used to keep the temperature of the solvent at 310 K. A Monte Carlo barostat was used to maintain the pressure at 1 atm. A time step of 2 fs was used to propagate the solvent atoms, with frames written every 2 ps. A total of 5 ns was performed for adequate sampling around the static protein−ligand complex.

Quantum Chemical Calculations. To examine specific static binding interactions in detail, quantum mechanical calculations were performed using DFT with the ORCA package. 66 We optimized the geometry of a cluster model of the ligand in the binding pocket and used this geometry to calculate the electrostatic potential in the ligand-only, proteinonly, and complex configurations. Starting geometries of the ligands were derived from crystallography, and the boundcomplex geometries included all residues within 5 Å of the ligand as well as six stably bound water molecules identified in the MD water analysis. Two carboxylic groups, belonging to Glu166 and Asp187, were protonated (i.e., COO − → COOH) to mimic an overall neutral charge of the protein. 67 The two carboxylic groups were chosen due to their peripheral position in the model and hence limited impact on the substrate− binding pocket interaction. Constraints were imposed on the protein backbone. The model with bound ligand contained 268 atoms and 2571 basis functions. The optimized geometry and the list of all constraints are available as Supporting Information.

The model was optimized at the BP86/Def2-SVP level of theory. 68−70 Grimme's D3 dispersion corrections were applied in all calculations. 71, 72 Single-point energy calculations at the optimized geometries were performed at the B3LYP/Def2-SVP level of theory. 73, 74 These calculations included D3 dispersion corrections as well as the CPCM polarizable continuum solvent model. 75 The dielectric constant was set to ε r = 4 during geometry optimization to mimic a protein environment. All calculations used the resolution-of-theidentity (RI) approximation and automatically generated auxiliary basis sets as implemented in ORCA. 76−78 For any single point in the vicinity of an atom, the electrostatic potential was computed using the ORCA_vpot module, with the density computed at the B3LYP/Def2-SVP level of theory in a continuum solvent with ε r = 80. This computation was performed for the complex, the ligand, and the environment (including water molecules). A Python script 79 was used to call the ORCA_vpot module to generate electrostatic potentials around the 4096000 = 160 × 160 × 160 points specified to surround the model of the protein active site and to convert the computed electrostatic potentials to the .cube file format for visualization. The .cube file was visualized using the Chimera program. 80

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsptsci.2c00026. Figure S1 , validation of the enzymatic assay; Table S1 , crystallographic data collection and refinement statistics for room-temperature structures of Mpro in complex with compounds 12, 19, and 21; and ■ REFERENCES

The interplay of SARS-CoV-2 evolution and constraints imposed by the structure and functionality of its proteins

Discovery of potential multi-target-directed ligands by targeting host-specific SARS-CoV-2 structurally conserved main protease

Inhibitor binding influences the protonation states of histidines in SARS-CoV-2 main protease

Tuning proton transfer thermodynamics in SARS-Cov-2 main protease: implications for catalysis and inhibitor design

From SARS to MERS: 10 years of research on highly pathogenic human coronaviruses

Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors

Structure of M pro from SARS-CoV-2 and discovery of its inhibitors

SARS-CoV-2 Mpro inhibitors with antiviral activity in a transgenic mouse model

Potent noncovalent inhibitors of the main protease of SARS-CoV-2 from molecular sculpting of the drug perampanel guided by free energy perturbation calculations

Structure-Based Optimization of ML300-Derived, Noncovalent Inhibitors Targeting the Severe Acute Respiratory Syndrome Coronavirus 3CL Protease (SARS-CoV-2 3CLpro)

Pfizer unveils its oral SARS-CoV-2 inhibitor

An oral SARS-CoV-2 Mpro inhibitor clinical candidate for the treatment of COVID-19

A tale of two antiviral targets-and the COVID-19 drugs that bind them

Pfizer's novel COVID-19 antiviral heads to clinical trials

Discovery of Non-Covalent Inhibitors of the SARS Main Proteinase 3CLpro. Probe Reports from the NIH Molecular Libraries Program

The development of Coronavirus 3C-Like protease (3CLpro) inhibitors from 2010 to 2020

20) The COVID Moonshot Consortium

COVID moonshot: open science discovery of SARS-CoV-2 main protease inhibitors by combining crowdsourcing, highthroughput experiments, computational simulations, and machine learning

Synthesis and evaluation of isatin derivatives as effective SARS coronavirus 3CL protease inhibitors

Discovery of a potent inhibitor of the antiapoptotic protein Bcl-xL from NMR and parallel synthesis

Virtual screening: is bigger always better? Or can small be beautiful?

Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings

Molecular properties that influence the oral bioavailability of drug candidates

High Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Noncovalent Inhibitor

The halogen bond

Computer modeling of halogen bonds and other σ-hole interactions

It's time to stop and think

The roles of water in the protein matrix: a largely untapped resource for drug discovery

Standard free energy of releasing a localized water molecule from the binding pockets of proteins: double-decoupling method

Classification of water molecules in protein binding sites

Structural, Electronic, and Electrostatic Determinants for Inhibitor Binding to Subsites S1 and S2 in SARS-CoV-2 Main Protease

One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome

Etude comparative de la distribution florale dans une portion des Alpes et des Jura

Dask: Library for dynamic task scheduling

Structural plasticity of SARS-CoV-2 3CL M pro active site cavity revealed by room temperature X-ray crystallography

Roomtemperature neutron and X-ray data collection of 3CL Mpro from SARS-CoV-2

Characterization of SARS main protease and inhibitor assay using a fluorogenic substrate

Malleability of the SARS-CoV-2 3CL Mpro activesite cavity facilitates binding of clinical antivirals

A simple statistical Parameter for Use in evaluation and validation of High Throughput screening assays

A deliberate approach to screening for initial crystallization conditions of biological macromolecules

Phaser crystallographic software

PHENIX: a comprehensive Python-based system for macromolecular structure solution

Features and development of Coot

MolProbity: all-atom structure validation for macromolecular crystallography

The PyMOL Molecular Graphics System, Version 1.8

VMD: visual molecular dynamics

Amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution

Development and testing of a general amber force field

Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation

Improved treatment of ligands and coupling effects in empirical calculation and rationalization of p K a values

PROPKA3: consistent treatment of internal and surface residues in empirical p K a predictions

OpenMM 7: Rapid development of high performance algorithms for molecular dynamics

Quantum Chemical Studies of Mechanisms for Metalloenzymes

Density-functional approximation for the correlation energy of the inhomogeneous electron gas

Density functional calculations of molecular bond energies

Fully optimized contracted Gaussian basis sets for atoms Li to Kr

A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu

Effect of the damping function in dispersion corrected density functional theory

Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density

Density-functional thermochemistry. III. The role of exact exchange

Energies, structures, and electronic properties of molecules in solution with the C-PCM solvation model

On some approximations in applications of Xα theory

Self-consistent molecular

Slater calculations I. The computational procedure

Accurate Coulomb-fitting basis sets for H to Rn

ChimeraA visualization system for exploratory research and analysis