key: cord-0824573-plpb5i5p authors: Wan, Hongbin; Aravamuthan, Vibhas; Pearlstein, Robert A. title: Probing the Dynamic Structure–Function and Structure-Free Energy Relationships of the Coronavirus Main Protease with Biodynamics Theory date: 2020-11-06 journal: ACS Pharmacol Transl Sci DOI: 10.1021/acsptsci.0c00089 sha: 6cae636f63392f27af13d4a5ead493e3e1df0137 doc_id: 824573 cord_uid: plpb5i5p [Image: see text] The SARS-CoV-2 main protease (M(pro)) is of major interest as an antiviral drug target. Structure-based virtual screening efforts, fueled by a growing list of apo and inhibitor-bound SARS-CoV/CoV-2 M(pro) crystal structures, are underway in many laboratories. However, little is known about the dynamic enzyme mechanism, which is needed to inform both assay development and structure-based inhibitor design. Here, we apply biodynamics theory to characterize the structural dynamics of substrate-induced M(pro) activation under nonequilibrium conditions. The catalytic cycle is governed by concerted dynamic structural rearrangements of domain 3 and the m-shaped loop (residues 132–147) on which Cys145 (comprising the thiolate nucleophile and half of the oxyanion hole) and Gly143 (comprising the second half of the oxyanion hole) reside. In particular, we observed the following: (1) Domain 3 undergoes dynamic rigid-body rotation about the domain 2–3 linker, alternately visiting two primary conformational states (denoted as M(1)(pro) ↔ M(2)(pro)); (2) The Gly143-containing crest of the m-shaped loop undergoes up and down translations caused by conformational changes within the rising stem of the loop (Lys137–Asn142) in response to domain 3 rotation and dimerization (denoted as M(1/down)(pro) ↔ 2·M(2/up)(pro)) (noting that the Cys145-containing crest is fixed in the up position). We propose that substrates associate to the M(1/down)(pro) state, which promotes the M(2/down)(pro) state, dimerization (denoted as 2·M(2/up)(pro)–substrate), and catalysis. Here, we explore the state transitions of M(pro) under nonequilibrium conditions, the mechanisms by which they are powered, and the implications thereof for efficacious inhibition under in vivo conditions. M pro is of current interest as an antiviral drug target, and experimental and in silico efforts toward the discovery of potent, efficacious inhibitors are currently underway in many laboratories. However, drug discovery is typically a trial-anderror/hit-and-miss undertaking due in no small measure to key deficiencies in the fundamental understanding of molecular and cellular structure-free energy relationships, as well as heavy reliance on equilibrium potency metrics (e.g., IC 50 , K d ) that are of limited relevance to the nonequilibrium in vivo setting. 1, 2 In this work, we break from traditional screening and structure-based drug design approaches, and examine M pro inhibition from a theoretical, in vivo relevant perspective based on multiscale biodynamics principles outlined in our previous work. 1, 2 Our theory addresses the fundamental nature of dynamic molecular structure and function under aqueous cellular conditions (which are powered principally by desolvation and resolvation costs), 2,3 and the general means by which cellular function is derived from interacting molecular species undergoing time-dependent cycles of exponential buildup and decay. As such, the enzyme structure−function relationship is necessarily considered in the overall context of cellular function and dysfunction (consisting of viral infection in this case), and in particular the following: (1) Synchrony between substrate k 1 , k cat , and k −1 , in which the bound substrate lifetime (t 1/2 ) is comparable to 1/k cat (a general kinetic paradigm that was first described by van Slyke and Cullen), 4, 5 and product inhibition is circumvented via fast leaving group dissociation; (2) Synchrony between the rates of enzyme and substrate buildup and product formation. We assume that infection proceeds in the following general phases: 6, 7 (1) Virion capture. (2) Virion "factory" construction. (a) Translation of ORF1a and ORF1ab into polyproteins pp1a containing nonstructural protein (nsps) 1−11 and pp1ab, containing nsp1−16, respectively. (b) Cleavage of the pp1a and pp1ab polyproteins into their constituent nsps. (i) Autocleavage of nsp3 (papain-like protease, PPL pro ) in cis is followed by nsp3-mediated cleavage of nsp4 in trans. (ii) Autocleavage of nsp5 (M pro ) in cis, followed by nsp5-mediated cleavage of nsp6 through nsp11/16 in trans. As such, M pro and its substrates are built together, the consequences of which are of critical importance to therapeutic inhibition (c) Buildup of the replication−transcription complex (RTC) within cytoplasmic endosome-derived double membrane vesicles. 8−11 (3) Virion production. (a) RNA replication (b) Structural protein translation/processing. (c) Virion assembly/export. 12 Therapeutic intervention is targeted optimally at proteins, such as M pro , that drive the earliest steps of viral infection prior to, or during, the factory construction phase. Clinical antiviral success depends on reducing the active M pro population below that required for RTC buildup and virion production at a threshold fractional inhibition of the protein population over time, which may be relatively high, given that many substrate copies can be cleaved by each free enzyme copy (constituting "leakage" from the inhibited system). Efficacious dynamic occupancy is achieved under nonequilibrium conditions at the lowest possible exposure when the rates of drug association and dissociation are tuned to the rates of target or binding-site buildup and decay. 1 In the case of enzymes, fractional occupancy depends on the inhibitor on-rate relative to that of the substrate (denoted as k 1 ·[substrate](t)·[enzyme](t) and k on ·[inhibitor](t)·[enzyme](t), respectively, where [enzyme]-(t) is denoted henceforth as k i ). The challenge in achieving efficacious M pro inhibition is greatest when substrate−M pro binding is kinetically tuned (versus mistuned), as reflected in the following M pro /substrate buildup scenarios: (1) Buildup coincides with polyprotein expression, thereby maintaining an approximately constant 1:1 M pro /substrate ratio throughout the "factory construction" phase of infection. This scenario is consistent with kinetically tuned substrate−M pro binding at the lowest possible substrate concentration. Efficacy at the lowest possible inhibitor concentration depends on high inhibitor−M pro occupancy under this scenario ( Figure 1A) , which in turn, depends on parity between k on and k i . (2) Buildup lags behind polyprotein expression, consistent with kinetically mistuned substrate−M pro binding, in which k 1 < k i ( Figure 1B) . Ideal fractional inhibition of the M pro population depends on k on < k i , whereas the minimal efficacious inhibition depends on k on > k 1 . The kinetic tuning requirement may be relaxed in the case of covalent inhibition, in which the inhibited enzyme fraction accumulates over time. However, accumulation rates ≪ k i can likewise result in "leakage" of uninhibited M pro and its downstream products. Covalent inhibition has been used successfully with other antiviral targets, including hepatitis C NS3 protease. 13, 14 Understanding the mechanism and dynamics of M pro cleavage and its subsequent activation is essential for differentiating among these scenarios and informing in vivo relevant inhibitor design. We collected, classified, and overlaid representative dimeric and monomeric ligand-bound and apo SARS-CoV and CoV-2 M pro crystal structures (see the "Materials and Methods" section ). We then explored and compared these structures using an integrated approach, consisting of 3D visualization and molecular dynamics (MD)-based solvation analysis (WATMD) 2, 15 to qualitatively assess the free energy barriers governing the intramolecular states of the monomeric (denoted as M 1 pro and M 2 pro ) and dimeric protein (denoted as 2·M 2 pro ), as well as the association and dissociation barriers governing substrate and inhibitor occupancy. We then investigated the structure and function of M pro , focusing on the inter-relationship between the catalytic and substrate-/ and the downstream cleavage products thereof (reflecting substrate association, dissociation, and turnover in aggregate) under the two general scenarios described in the text (the mathematical basis of these plots is explained elsewhere). 1 Worst case scenario, in which the rate of product buildup is comparable to k i . The plot includes the following quantities: autocleaved M pro buildup (green tracing) (noting that M pro decay depends on the existence of a degradation pathway), collective product buildup (purple tracing), and buildup and decay of inhibitorbound M pro under conditions in which the inhibitor k on ≈ k i (blue tracing), and inhibitor k on < k i (red tracing). (B) Same as A, except for the best case scenario, in which the rate of product buildup < k i . inhibitor-binding mechanisms and the means by which they are powered. Questions of interest include the following: (1) The basis of substrate-and dimer-induced activation and specificity of the catalytic site; (2) The interplay between covalent/mechanism-based inhibitor binding kinetics and the structural dynamics of the protein; and (3) The interplay between catalytic turnover and viral dynamics governing the buildup of postcleaved M pro and its substrates. Whereas biomolecular processes are considered in terms of equilibrium free energy models throughout mainstream cell biology and pharmacology, living systems (including virally infected cells) depend to a very large degree on nonequilibrium operation, where the state distributions of the participating molecular populations are transient. Spontaneous noncovalent intra-and intermolecular interactions, by definition, lower the total system free energy (i.e., for ΔG = −RT·ln(K) = G ∞ − G interacting < 0, where ΔG, R, T, and K are the free energy change, gas constant, temperature, and equilibrium constant, respectively). However, K, and therefore ΔG, are undefined under conditions in which the concentrations of the participating species vary over time. The nonequilibrium fractional occupancy of a given state is proportional to the relative rates of entry and exit to/from that state. Under such conditions, binding free energy is defined strictly in terms of the barriers governing the rates of entry and exit to/from each available state (denoted as ΔG in ⧧ and ΔG out ⧧ ). As such, the transient fractional occupancy of a given state is proportional to the relative rates of entry and exit to/from that state, rather than ΔG per se (ΔG ≠ ΔG in ⧧ − ΔG out ⧧ ). We previously reported a first-principles multiscale theoretical treatment of nonequilibrium structure−function−free-energy relationships referred to as Biodynamics. 1, 2 According to our theory, under aqueous conditions ΔG in ⧧ and ΔG out ⧧ are contributed predominantly by H-bond free energy differences between solvating water and bulk solvent. Such differences vary across solvent-exposed solute surfaces (both external and interior surfaces of buried cavities) as a function of losses and gains in water H-bond propensity (number/strength) relative to that of bulk solvent ( Figure 2 ). Free energy is increased/stored and decreased/released relative to unperturbed bulk solvent (which serves as the reference state) in the form of disrupted and enhanced water H-bond propensity (H-bond/enthalpically depleted and Hbond/enthalpically enriched/entropically depleted), respectively. Stored solvation free energy (constituting unfavorable potential energy) is released via the expulsion of each highenergy-solvating water to bulk solvent in response to intra-and intermolecular rearrangements. The rearrangement-induced return of water from bulk solvent to each high-energy solvation position incurs a cost equivalent to |(G bulk − G solv )|. Under nonequilibrium conditions, water transfer costs to/from bulk solvent and solvation are given strictly by ΔG to_or_from ⧧ = ∑ i (ΔG to_or_from ⧧ ) i summed over the i water transfers (versus the net free energy change over favorable + unfavorable transfers at equilibrium). ΔG to ⧧ and ΔG from ⧧ equate to the mutual desolvation and resolvation costs of the interacting solute groups, respectively. Solvating water is always entropically depleted (i.e., S solv < S bulk ), but enthalpically enriched or depleted (i.e., H solv < H bulk or H solv > H bulk ) in accordance with the local H-bond propensity at each position of a given solvent-accessible surface. The maximum desolvation cost per water molecule incurred during entry to a given state j is proportional to the maximum possible loss of water H-bond free energy from that state, which in turn, is proportional to the degree of H-bond enrichment of the solvating water (noting that the cost of transferring H-bond depleted and trapped water to bulk Free energy of solvating water molecules varies as a function of position on a given solvent-accessible surface. Solute surfaces are imprinted in ("written to") their solvating water in the form of H-bond propensity patterns, analogous to a three-dimensional bitmap (H-bond depleted and enriched solvating water molecules are denoted by a lightning bolt and heart, respectively), resulting in highly nonisotropic solvation free energy fields. Solvation free energy fields are "read" by state transition-induced unfavorable water transfers to/from bulk solvent and solvation (such that the overall state transition barrier equates to the total cost of such transfers). Polar/charged surfaces promote H-bond enriched solvation relative to bulk solvent, resulting in decreased solvation free energy (the expulsion of which incurs a free energy cost). Nonpolar surfaces promote H-bond depleted solvation relative to bulk solvent, resulting in increased solvation free energy (the expulsion of which results in a free energy gain). solvent is zero). The actual desolvation cost depends on the extent to which lost water-solute H-bonds across the rearrangement interface (e.g., the binding interface) are mutually replaced by intra-or intersolute H-bonds (which is typically, a zero sum game at best). The rate of entry to state j is therefore proportional to the total desolvation cost of that state, the occupancy of which increases as the rate of entry increases at a constant rate of exit (noting that the loss of Hbond propensity in even a single water molecule can slow the rate of entry). The resolvation cost per water incurred at Hbond depleted positions in the solvation shells of all participating solute atoms during exit from state j is proportional largely to the total loss of H-bond free energy relative to bulk solvent (noting that the cost of transferring water from bulk solvent to H-bond enriched positions is zero). The rate of exit from state j is therefore proportional to the resolvation cost of exiting that state, the occupancy of which increases as the rate of exit decreases at a constant rate of entry. The dynamic occupancy of a given state accumulates when the rate of entry > rate of exit, where the rate constants are proportional to ΔG in ⧧ and ΔG out ⧧ . The driving force of all noncovalent rearrangements under aqueous conditions (including protein folding) is attributed by Biodynamics to potential energy stored within solvating water, as follows: (1) The release of solvation free energy (i.e., potential energy) stored in H-bond depleted or trapped solvation via the displacement of such water by overlapping solute atoms. The persistence of a given state j (kinetic stability) is proportional to the resolvation cost incurred at H-bond depleted or trapped positions upon exiting that state. Highly persistent states result from the expulsion of large amounts of H-bond depleted solvation, whereas dynamic rearrangeability depends on the conservation of H-bond-depleted or -trapped solvation across the available states (i.e., conservation of local instability within a "Goldilocks zone" of global stability) ( Figure 3 ). (2) The generation of H-bond-enriched solvation in the folded state, which counterbalances the unfavorable energy contribution from residual H-bond depleted solvation (such that the global free energy remains within the Goldilocks stability zone). The molecular structure−function relationship is driven energetically by the following dynamically generated solvation patterns: (1) H-bond-enriched solvation serves as a "gatekeeper" for entry into a subsequent state from the penultimate state. Selective entry into one or more specific states (i.e., recognition) is proportional to the desolvation cost of those states. The lowest cost state(s) are entered fastest; (2) Hbond-depleted solvation governs the rate of decay of all states. The generation of such solvation upon entry to state j depends on the storage of solvation free energy during the penultimate state i (i.e., some of this energy is used to stabilize state i, and some is reserved to stabilize state j). Noncovalent rearrangements under aqueous conditions are therefore powered largely by solvation free energy (which we refer to as "hydropower"). We set about to characterize the catalytic cycle of M pro on this basis, including substrate binding, rearrangement of the catalytic site, and dimerization as a prelude to inhibitor design (which was not attempted in this work). Monomeric M pro is organized into three domains (denoted as domains 1−3; Figure 4A ), 16 which respectively comprise the "ceiling", "floor", and "basement" of the active site (AS). The domains are organized in a loosely packed arrangement that promotes high sensitivity of protein structure−function to the monomeric versus dimeric forms, the unbound versus substrate-bound forms, and the substrate versus productbound forms. Whereas the geometric relationship between domains 1 and 2 (which subserve the protease function) is relatively invariant throughout the available M pro crystal structures, rearrangeability of domain 3 is apparent in the monomeric versus dimeric forms of the protein. As such, we denote the hierarchical interdomain relationship as {1−2}−3 throughout the remainder of this work. The M 1 pro state is captured in Protein Databank (PDB) structure 2QCY, and the 2·M 2 pro state is captured in PDB structures 6M03 and 2Q6G, as well as many others (noting that monomeric M 2 pro is unobserved experimentally). The backbone NH groups comprising the oxyanion hole (contributed by Cys145 and Gly 143) reside on a 3D doublecrested, m-shaped loop, hereinafter denoted as the "m-shaped loop" ( Figure 4B ). The N-terminal leader and C-terminal tail sequences (denoted as NTL and CTT, respectively), the latter of which includes a small helix, play key roles in organizing the AS, m-shaped loop, and dimer interface. We assume in this work that the substrate binding and catalytic machineries are well-conserved in CoV and CoV-2, which differ by only 12 residues, half of which are located in domain 1 (including one Figure 3 . Cyclic nonequilibrium transitions between states i and j depend on conservation of H-bond depleted and/or trapped solvation, wherein the decay rate of state j is driven by H-bond depleted solvation transduced during state i (analogous to a "whack-amole" paradigm). Such a paradigm would equate to a perpetual motion machine in the absence of an external energy input requirement, such as the continual buildup and decay of one or more participating species (substrates in the case of M pro ), which are in turn, powered by covalent free energy sources, such as ATP. located at the upper boundary of the AS), two in domain 2, and four in domain 3. 17 As such, CoV and CoV-2 structures were used interchangeably throughout this work (noting that the residue numbering is that of the SARS-CoV-2 variant). Serine proteases function via a common catalytic mechanism conveyed by an Asp−His−Ser triad. However, a His-Cys dyad appears sufficient for proton abstraction from the more acidic Cys (relative to Ser) of cysteine proteases, leading to an activated thiolate−His ion pair. 19−21 The M pro catalytic mechanism may be summarized as follows: (1) Abstraction of the Cys145 proton by His41, resulting in a nucleophilic thiolate moiety (stage 1 proton transfer); (2) Substrate binding, followed by nucleophilic attack on the scissile bond, resulting in a transient tetrahedral intermediate (TI) which is stabilized by the oxyanion hole. This step is claimed to be extremely fast in other cysteine proteases (requiring stopped flow measurement) 20 ; (3) Spontaneous TI decay to the Nterminal leaving group (product 1) and thioester adduct; (4) Hydrolysis of the thioester adduct (stage 2 proton transfer, resulting in the C-terminal leaving group (product 2). This step is claimed to be rate-determining in other cysteine proteases. 22 However, alternate catalytic triad-based mechanisms have been proposed for M pro , including the following: (1) Substitution of the canonical Asp of the catalytic triad by a high-occupancy water molecule is observed near His41 in many M pro crystal structures and our WATMD results (possibly a weaker surrogate for Asp), 21,23 noting the absence of this water in subunit B of 2Q6G due to repositioning of Asp187 (which if catalytically essential, would result in enzyme inactivation); (2) Rearrangement of Asp187 from its observed pairing with Arg40 to His41. 13 Given the strategic location of the Arg40-Asp187 ion pair opposite to the domain 1−2 linker in all of the structures that we examined (exhibiting a latchlike appearance), stabilization of the domain 1−2 interface by this shielded ion pair is the more likely scenario ( Figure 5 ). Structural Data and Visualization. All M pro structures used in our study were obtained from the RCSB Protein Databank 24 and grouped according to species, site-directed mutants, apo versus ligand/substrate-bound forms, and dimeric and mutant monomeric forms ( Table 1) . All calculations and structure visualizations were performed using WATMD V9, 2,15,31 AMBER 16, 32 Maestro 2020−1 (Schrodinger, LLC), and PyMol 2.0 (Schrodinger, LLC). 2QCY, 2Q6G, and 6M03 were prepared for WATMD calculations using the PPrep tool in Maestro, and the resulting structures were aligned using PyMol. The aligned dimeric structures and their disassembled A and B chains were compared visually using PyMol and Maestro. We emphasize that this is a first-principles theoretical study with limited reliance on conventional molecular modeling techniques. WATMD Calculations. We mapped the following solvation properties around the solvent-accessible surfaces of M pro on a time-averaged basis: (1) H-bond enriched positions, in which the number/strength of solvating water H-bonds is increased/ enhanced compared with bulk solvent, resulting in an enthalpic preference for the solvation shell. Such solvation occurs at donor/acceptor-containing regions of the protein surface; (2) H-bond depleted positions, in which the number/strength of solvating water H-bonds is decreased/weakened compared with bulk solvent, resulting in an enthalpic and entropic preference for bulk solvent. Such solvation occurs at regions of the protein surface at which donors/acceptors are absent or scarce; (3) Trapped/buried positions within the protein surface, in which exchanges between solvating water and bulk solvent are highly limited or absent, resulting in an enthalpic and entropic preference for bulk solvent. Trapped water molecules typically H-bond with a single protein acceptor or donor, but in some cases, may be fully devoid of H-bonds; (4) Bulklike positions, in which no preference exists for solvation versus bulk solvent. WATMD is based on the fundamental assumption that the H-bond free energy of the solvation shell at each position of the solvent-accessible surface is correlated with the time-averaged occupancy of water atoms at that position. Dynamic water exchanges between bulk solvent and the solvation shell are estimated using unrestrained pubs.acs.org/ptsci Article molecular dynamics (MD) simulations, consisting of a 0.5 ns equilibration step, followed by a 30 ns production run. WATMD analysis is limited to the last 10 ns of the trajectory (40 000 frames), in which quasi-equilibrium exchanges between water and bulk solvent have been achieved. Water oxygen (O) and hydrogen (H) occupancies (referenced to the atomic centers) are sampled along a stationary 3D grid of 1 Å 3 voxels over the last 40 000 frames of the trajectory (noting that this voxel size was chosen to ensure single atom occupancy within the same simulation frame). Bulk and bulklike voxel occupancies are assigned based on six criteria representing the isotropic environment of bulk solvent, in which the H and O positions within each voxel are fully uncorrelated (corresponding to no orientational preference of the occupying water molecule). Voxels outside of the solvation shell (corresponding to bulk solvent) are omitted from the downstream analysis. The overall O and H counts accumulated during the simulation are distributed across all voxels in all cases in a Gaussian-like manner ( Figure 6A ,B, respectively), the mean of which corresponds to bulklike occupancy, and the low and high tails of which correspond to the following: (1) Left tail: graded H-bond depletion, ranging from low-to ultralow-occupancy voxels relative to bulk solvent (noting that many low and ultralow occupancy voxels result from competition between water and protein atoms); (2) Right tail: (a) H-bond enriched solvation, ranging from moderate occupancy voxels (just above bulk) to high occupancy voxels far above bulk solvent; (b) Water that is trapped within buried channels/cavities (or the rate of exchange with bulk solvent is slowed significantly), which manifests in many cases as ultrahigh occupancy voxels. The results are annotated on the grid using spheres encoded with the following information: (1) The relative percentage of O versus H counts accumulated over the 40 000 frames of the simulation, which are color-coded as follows: (a) Bright red ≈ 100% O occupancy over time, reflecting a voxel environment dominated by one or more protein donors; (b) Bright blue ≈ 100% H occupancy over time, reflecting a voxel environment dominated by one or more protein acceptors; (c) Red−white− blue spectrum = a mixture of O and H occupancies, reflecting a mixed voxel environment comprised of both protein donor(s) and acceptor(s). The spectrum is tipped toward the following: (i) Pink to red as the normalized percentage is tipped increasingly toward O; (ii) Purple to blue as the normalized percentage is tipped increasingly toward H; (iii) White when the normalized percentages are approximately equal; (d) Yellow = bulklike occupancy, reflecting an H-bonding environment that is iso-energetic to bulk solvent; (2) The normalized occupancy levels, which are encoded in the relative radii. The ∼30 Å 3 volume of a single water molecule maps to a supervoxel comprised of approximately 3 × 3 × 3 primary voxels. However, multiple groupings of primary voxels are possible, depending on the following: (1) The number of water molecules that are bound simultaneously around a given region of the protein surface (during all or a fraction of the 40 000 frames of the simulation), which in turn, depends on the local surface shape. Flat or convex surfaces/cavities are solvated by multiple waters (noting that primary voxel groupings are ambiguous in such cases), whereas concave surfaces are solvated by a limited (possibly single-digit) number of water molecules, commensurate with the available volume of the cavity; (2) The number of orientations of each water molecule over the 40 000 frames of the simulation, where each orientation corresponds to a unique primary voxel grouping. High-occupancy voxels residing in mixed protein acceptor/ donor environments often occur in clusters, reflecting the various time-averaged orientations of H-bond enriched water molecules. The occupancies within the primary voxels of each supervoxel necessarily sum to a 2:1 H/O ratio, given that water behaves as a rigid body (i.e., adjacent primary voxel occupancies cannot differ significantly for the H and O atoms of the same molecule). The resulting voxel maps (which we refer to as the "solvation structures") inform qualitatively about the time-averaged preferences for H or O, together with the preferences of water for solvation versus bulk solvent (i.e., proportional to the free energy content of the solvation, which putatively equates to the free energy content of the protein), at each grid position relative to the corresponding solventaccessible protein surface (exemplified in Figure 7 ), as follows: (1) Bulklike occupancy voxels (BLOVs) that are typically present within the outer to middle strata (3 and 2) of the grid (denoted by small yellow spheres). The corresponding solvation is approximately iso-energetic to bulk solvent. (2) Supra-bulk-like occupancy voxels (SBLOVs), which are typically present in stratum 3 (transitioning between bulk solvent and water solvating moderately nonpolar protein surfaces). Occupation of these voxels, which dominate the grid (denoted by white/gray spheres with radii moderately greater than those of bulklike voxels), is assumed to consist of laterally H-bonded water participating in water−water networks exhibiting free energies slightly below that of bulk solvent. (3) Low-occupancy voxels (LOVs), corresponding to exchangeable H-bond depleted solvation that is weakly Hbonded to a single protein donor or acceptor. The small red or blue spheres (radii < BLOVs) are typically positioned within stratum 1, directly adjacent to protein surfaces containing a single H-bond partner. (4) Ultra-low-occupancy voxels (ULOVs) located in the far lower tail, corresponding to exchangeable H-bond depleted solvation at nonpolar protein surface positions (effectively translating to holes in the solvation shell). The dot-sized (typically white) spheres are positioned within stratum 1, directly adjacent to fully or highly nonpolar protein surfaces. ULOVs are ubiquitous on both concave and convex surfaces (although sparsely distributed) within stratum 3 of the solvation shell. It is reasonable to believe that binding is greatly enhanced at concave surfaces capable of maximal desolvation, despite the ubiquitous presence of ULOVs on convex surfaces (noting that such surfaces may bind to concave surfaces on cognate partners, including antibodies). (5) High-occupancy voxels (HOVs), corresponding to exchangeable H-bond enriched solvation, which are likewise typically positioned within stratum 3, adjacent to concave, fully polar protein surfaces containing multiple H-bond donors and/or acceptors. Such water often exhibits multiple orientational preferences with respect to protein H-bond partners, as reflected in clusters of HOVs. H-bond enriched solvation governs access to H-bond depleted solvation within concave surface regions, reflected in ULOVs (serving as "gatekeepers"), and counterbalances the high energy of this solvation (thereby stabilizing the overall folded protein structure). The dynamic structure−function relationship depends on a Goldilocks zone of structural stability (i.e., a narrow window of rearrangements residing between structural collapse and unfolding), which is subserved by counterbalancing between favorable and unfavorable solvation free energy contributions. (6) Ultra-high-occupancy voxels (UHOVs) located in the far upper tail, typically corresponding to water trapped within buried surfaces ("bubbles"), which may be devoid of H-bonds (white spheres) or H-bonded to a single donor/acceptor (blue or red spheres, respectively). Such water is expected to be both enthalpically and entropically depleted, and can drive structural rearrangements (similar to that occupying ULOVs). The M pro structures listed in Table 1 were prepared and simulated using the following protocol: (1) Protonation states, Asn/Gln and His flips, missing atoms, and net charge were corrected manually using the PPrep tool in Maestro; (2) The prepared protein structures were simulated using AMBER 16 (ff14SB force-field) 32 at 300 K without restraints under periodic boundary conditions in a TIP3 water box, with the box boundaries residing 8 Å from the closest protein atoms. The pH-dependent M pro structure and substrate recognition and the possibility of pH-driven structure switching has been suggested by other workers on the basis of the observed pH dependence of M pro structure. 16, 33 However, similar structures were obtained over a wide range of pH (Table 1) ; furthermore, M pro appears to operate exclusively within the cytoplasmic double-membrane vesicle environment (pH 7.0−7.4). As such, M pro simulations at pH 7.0 seem justified. We assume that solvating water moves in concert with flexible protein substructures (a boundary layer effect). However, due to the fixed reference frame of the grid relative to the flexible protein and its solvating water, occupancy of certain voxels by both protein and water atoms over the 40 000 frames of the trajectory is expected (resulting in artificial reduction of the water atom counts in such voxels). We circumvented this problem via rigid-body alignment of the protein + water across the 40 000 frames of the simulation (relative to the stationary grid) to a common set of template residues located within each region of interest, such that the flexible moieties and their solvation are stationary with respect to the grid (analogous to the tail wagging the dog, in which the analysis is limited to the tail). The alignment residues for each region of interest in our study are listed in Table 2 . We simulated the time-averaged structures and voxel occupancies for the following M pro states, from which we qualitatively inferred the solvation free energy barrier magnitudes governing the M 1/down pro ↔ M 2/up pro state transitions, together with those governing dimerization and substrate and inhibitor association and dissociation: (1) The apo form of monomeric M 1/down pro (2QCY) and the putative substrate-bound Figure 7 . Stereo views of the WATMD annotations described in the text, exemplified for monomeric M pro (2QCY). In general, solvation shells are loosely organized into three major strata (demarcated by yellow lines) spanning between the protein surface and bulk solvent (noting that bulk solvent per se is omitted from WATMD analyses), as follows: (1) Stratum 1: ULOVs that are largely or fully devoid of protein H-bond partners, which reside directly adjacent to nonpolar protein surface patches, as well as HOVs residing directly adjacent to Voxels are denoted by spheres, which are scaled in proportion to their relative time-averaged H and O occupancies, and color-coded according to relative preference for O versus H (red and blue, respectively), or lack thereof (white). (A) Full WATMD grid, viewed toward the protein surface in the direction of strata 3 to 1. A crystallized M pro substrate extracted from 2Q6G (magenta) is overlaid on the active site for reference. Bulk solvent surrounding the solvation shell has been removed, resulting in an irregular grid boundary. (B) AS of M pro viewed approximately parallel to the pocket. Stratum 3 voxels typically consist of BLOVs (yellow spheres) and SBLOVs (white spheres). (C) Same as B, except zoomed into stratum 2 voxels, which typically consist of LOVs occupied by solvation that is weakly H-bonded to a single protein donor or acceptor (small dark blue spheres). (D) Same as B, except zoomed into stratum 3 voxels, which typically consist of HOVs occupied by solvation that is strongly Hbonded to multiple protein donor(s) and/or acceptor(s) (large spheres) or ULOVs occupied by solvation that is largely or fully devoid of H-bonds (dot-sized spheres). UHOVs corresponding to trapped water within buried channels/ cavities are not shown. form of monomeric M 2/up pro (PDB structure 2Q6G with one chain removed), focusing on the following: (a) The AS solvation structure in 2QCY informs qualitatively about substrate k 1 and k −1 , as well as inhibitor k on and k off . We examined the correspondences between low-versus highoccupancy voxels and (i) substrate atoms extracted from 2Q6G, which we overlaid on the time-averaged protein and solvation structures of 2QCY; and (ii) the atoms of representative inhibitors (Table 1) Overview of M pro Structural Dynamics. Analysis of the M pro crystal structures in our study suggests the existence of a complex substrate-binding mechanism in both CoV and CoV-2 variants. This mechanism can be dissected into four interdependent switchable dynamic contributions, consisting of the following ( Figure 8) : (1) Rigid-body rotation of domain 3 relative to domains {1−2}, where domain 3 oscillates between the M 1 pro and M 2 pro states (noting that dimerization occurs fastest in the substratebound M 2 pro state). The trajectory is guided by transient rearrangements over a large H-bond network spanning within and between the dimeric subunits. (2) Cooperative state transitions between domain 3 and the rising stem of the m-shaped loop, in which the 3 10 helix melts into the extended chain (denoted as M 1/down pro ↔ M 2/up pro ). The free energy difference between these states is attributable to solvation-mediated rearrangements (see below). Monomeric M 1/up pro is ruled out by our mechanism, and monomeric M 2 pro is highly transient (noting that neither of these states is observed experimentally). ( across the domain {1−2}−3 interface in the monomeric form and additionally across the dimer interface. Here, we focus on the configurational rearrangements within this network that switch M pro between the substrate binding, dimerization, and catalytically competent states. The detailed effects of these rearrangements on the domain {1−2}−3 interface, m-shaped loop conformation, and dimer interface are addressed in the following sections. The dilemma for all dynamic intra-and intermolecular rearrangements relates to the trade-off between specificity and transience/throughput, which according to Biodynamics, is achieved via counterbalancing between energetically favorable and unfavorable contributions (which we refer to as "yins" and "yangs"). 2 The fastest rearrangements prevail, and the balance is tipped transiently toward specific condition-dependent states, so as to avoid equilibration. Specificity Figure 15A ), which terminates below the AS βhairpin (denoted as entrance 1; Figure 15B ) and above the domain 3 C-terminal helix (denoted as entrance 2; Figure 15C ). The channel consists largely of Arg131, Glu290, Lys137, Asp240, and Asp289, the H-bond network of which is disrupted in the M 1 pro state ( Figure 12) . A second buried channel appears elsewhere within the domain {1−2}−3 interface in the M 2 pro state (denoted as channel 2; Figure 15A ), which terminates within the dimer interface (noting that this entrance is closed in all substrate-/inhibitor-bound structures ( Figure 15D ). The channel lining consists largely of Lys5, Met6, Ala7, and Phe8 of the NTL, together with Phe291, Thr292, Asp295, Val296, Arg298, Gln299, and Cys300 of domain 3. Rearrangement of the domain {1−2}− 3 interface during the M 1 pro → M 2 pro state transition results in the loss of channel 1, mediated largely by Arg131 and two βstrands of domain 2 that occupy the channel in the M 2 pro state (Figures 11 and 15E ). Reverse rearrangement of the interface during the M 1 pro → M 1 pro state transition results in the loss of channel 2, mediated largely by Arg4, Lys5, and Met6 of the NTL backbone that occupies the channel in the M 1 pro state (Figures 12 and 15F) . Channel 1 is occupied by expellable ULOVs and HOVs ( Figure 15G ). Although HOVs typically correspond to H-bond enriched solvation, the narrowness of the channel is consistent with slowly exchanging water between the channel and bulk solvent (via entrances 1 and 2). We therefore hypothesize that water occupying channel 1 in the M 1 pro state is entropically/ enthalpically unfavorable, and as such, promotes local instability of the domain {1−2}−3 interface. This water is displaced to bulk solvent during the M 1 pro → M 2 pro state transition. Channel 2 is likewise occupied by HOVs and ULOVs, which in the absence of an open entrance, necessarily correspond to fully trapped/nonexpellable solvation ( Figure 15H ). As such, potential energy released by the expulsion of water from channel 1 during domain 3 rotation is partially stored in the water trapped within channel 2. This water is vented subsequent to product dissociation upon completion of the catalytic cycle ( Figure 15I ), thereby driving the M 2 pro → M 1 pro state transition. The overall mechanism can be summarized as follows ( Figure 16 ): (1) M 1/down pro is destabilized within a Goldilocks zone (globally stable/locally unstable) by impeded (though Hbonded) and H-bond depleted solvation within buried channel 1. (2) Spontaneous rigid-body rotation of domain 3 underlying the M 1 pro → 2·M 2 pro transition is powered by the expulsion of channel 1 solvation through entrance 1, which is accompanied by the creation of channel 2 and the solvation thereof by trapped water (analogous to loading a spring). ( 3) The open state of channel 1/entrance 1 may be stabilized transiently by substrate binding (a key external energy input to the system), as inferred from the close proximity of this entrance to the β-hairpin substrate binding site. (4) Dimerization (i.e., 2·M 2 pro formation) depends on specific positioning of the NTL, part of which comprises the lining of channel 2. Dimerization is well-explained by the expulsion of H-bond depleted solvation from the dimer interface (see below), which further stabilizes the watertrapped state of channel 2. (5) Opening of channel 2 subsequent to product dissociation (as captured in 6M03), followed by venting of the trapped water, drives the reverse 2·M 2 pro → M 1 pro state transition. Putative Conformational Transitions of the m-Shaped Loop. The m-shaped loop, which contains the catalytic Cys (resident on crest A of the loop) and oxyanion hole (resident on crests A and B), is common to all members of the chymotrypsin family. Crest B of M pro switches between the down (S1-subpocket-accessible) ( Figure 17A ) and up (S1subpocket-inaccessible) positions ( Figure 17B ) corresponding to the M 1/down pro and 2·M 2/up pro states of the enzyme, respectively. The S1 subpocket switches between the open/oxyanion hole misaligned and closed/oxyanion hole aligned states in M 1/down pro and in 2·M 2/up pro , respectively. Although access to the S1 subpocket is sterically blocked by Asn142 in the crest B up position, the cavity itself remains intact and occupiable (as such, Asn142 acts as a gatekeeper rather than a plug; Figure 17C ). We postulate that the complex m-shaped loop mechanism of M pro is tailored for lowering the otherwise high desolvation cost of the polar P1 Gln side chain during substrate association with the S1 subpocket (which appears to be only partially desolvated in the bound state). The need for this mechanism is obviated in hepatitis C NS3 protease and chymotrypsin due to the preference of those enzymes for Cys/ Thr and aromatic P1 side chains, respectively. As such, the mshaped loops of these proteins are instead rigidified via an extra crest in NS3 ( Figure 18A ) and a disulfide bond to an adjacent chain in chymotrypsin ( Figure 18B ), resulting in continuous S1 subpocket accessibility ( Figure 18C,D) . We explored the M 1/down pro ↔ 2·M 2/up pro transition mechanism via comparison of the monomeric and representative dimeric CoV and CoV-2 structures to better understand the functional purpose and detailed structural and energetic basis of the up/ down bidirectional state transition of crest B. We now turn to exploration of the following: Figure 19A ). The domain 3 position in 2BX3 is similar to that in 2Q6G, but the m-shaped loop conformation is extended in the latter structure, and the Lys5-Gln127 Hbond in zone 3 that promotes the M 1/down pro state is also present in the 2·M 2/up pro state, suggesting that the m-shaped loop conformation and domain 3 positioning are decoupled anomalously. These and other differences do not appear to be pH-dependent, noting that boceprevir crystallized in CoV-2 M pro at pH 6.5 (PDB structure 7BRP), pH 7.5 in 6WNP, and pH 4 in 6XHM exhibit only slight structural differences. A comparison of the m-shaped loops in 2QCY and 6WNP reveals the detailed differences between these two conformations ( Figure 19B (1) A buried water channel (denoted as channel 3) is present in the time-averaged apo 2·M 2/up pro state (6M03) ( Figure 20A ), which is absent in the time-averaged M 1/down pro state (2QCY) ( Figure 20B ). This channel, which is occupied by several HOVs and ULOVs, resides largely within the opposite subunit of the dimer, projecting behind the m-shaped loop, and connecting to the protein surface directly below the S1 subpocket. (2) Two UHOVs (representing high energy trapped water) residing between and below crests A and B of the m-shaped loop are present in M 1/down pro , whereas a single UHOV (likewise representing high energy trapped water) is present in the 2· M 2/up pro state, near the descending stem of the m-shaped loop ( Figure 20C ). These findings suggest that trapped solvation Enzyme Dynamics in cis. Dimer-independent catalytic activity of precleaved M pro was observed by Chen et al., who nevertheless proposed the existence of an "intermediate" which Asn142 points away from the S1 subpocket), awaiting substrate association. Middle: substrates associate into the AS, projecting their P1 side chain into the open S1 subpocket. Right: crest B undergoes substrate-and dimerization-induced rearrangement to the up position, with Asn142 facing the S1 subpocket. We postulate that this mechanism facilitates partial desolvation of the highly polar P1 Gln side chain of cognate M pro substrates. dimeric form of the enzyme. 35 A more plausible explanation is that precleaved M pro exists exclusively as monomers embedded within the polyprotein, whereas the postcleaved species necessarily exists as a mixture of monomers and dimers, in which the monomeric form binds substrates that are cleaved by the dimeric form (such that k cat monomer ≪ k cat dimer ). The precleaved monomeric form of M pro cannot be fully represented in 2QCY because the C-terminal peptide is spatially far from the AS (noting that the Gln306 C-terminus serves as the P1 residue of the precleaved protein). We propose the existence of two distinct forms of monomeric M pro , consisting of: (1) The postcleaved species captured in 2QCY. subpocket is alternately blocked and unblocked in the two rotamers, suggesting that the rate of repositioning may be rate-limiting for cognate substrate binding (i.e., the Met165 side chain is energetically "frustrated").] The catalytic cycle is energetically self-consistent, beginning with substrate association-induced expulsion of H-bond depleted solvation from the AS. Cleavage of the Gln306 peptide bond ( Figure 22A ) results in two products, consisting Figure 22B ,C, respectively), noting that the chain inserts into the AS in the N-to C-terminal direction. Dissociation of the C-terminal leaving group has no impact on the intramolecular/dimeric state of M pro (Figure 22D ), whereas that of the N-terminal Figure 22E ). The S1 subpocket is comprised of the residues shown in Figure 23 (M 1/down pro ) and 24 (2·M 2/up pro ), together with the substrate P3 side chain. A subset of these residues plays a dual (2) Closing the S1 subpocket via the crest B down → crest B up transition, which repositions the Asn142 gatekeeper over the subpocket. We postulate that the desolvation cost of the polar amide group of the P1 Gln side chain is reduced via this mechanism, such that the side chain binds with its solvation partially intact (noting that the S1 The S1 subpocket is lined by Glu166 (orange), His172 (green), His163 (not visible), Ser139 (blue), Phe140 (blue), Leu141 (blue), Asn142 (coral), and the substrate P3 side chain (yellow). The subpocket is occupied by the P1 Gln side chain (pink). Many of the residues lining the S1 subpocket play dual roles: the backbone of Glu166 H-bonds with the substrate P3 backbone (thereby directly connecting the β-sheet formed by the substrate and β-hairpin to the S1 subpocket). (B) Same as A, except showing the solvent-accessible surface (noting that the accessibility of the S1 subpocket is underestimated by the smoothed solvent-accessible surface). Furthermore, the S1 subpocket appears to be coupled to channel 1 within the domain {1−2}−3 interface (see above). (3) Orienting the scissile bond toward the attacking Cys145 side chain. Access to the S1 subpocket is blocked by Asn142 in the extended conformation of the m-shaped loop in the M 2/up pro state ( Figure 24A ), which is pointed away from the subpocket in the 3 10 helical M 1/down pro state ( Figure 23A ). As such, we postulate that substrates bind to the M 1/down pro state, which then rotates about the domain 2−3 linker into the substrate-stabilized2· M 2/up pro state, followed by dimerization. Dimer Interface. Dimerization is widely assumed to govern both the activation and substrate complementarity of M pro . 36 The dimer interface bridges the H-bond networks within the individual subunits via their NTL chains (Figure 25) . Deletion of the NTL results in an alternate tail−tail dimer interface about domain 3 of the member subunits. 37 The Putative Hydropowered Dimerization Mechanism. We used WATMD to explore dimerization of substrate-bound M 2/up pro (2Q6G) (i.e., M 2/down pro + M 2/down pro → 2·M 2/up pro ), which we postulate is driven by mutual desolvation of the monomeric subunits in and around their NTL regions. Expulsion of solvating water during dimerization is expected in regions where the side chain/backbone atoms of each subunit overlap with the solvation structure of the opposite subunit. We calculated the solvation properties of subunit A (the reference subunit) of the time-averaged 2Q6G structure in and around the NTL region. We then overlaid subunit B and examined the overlaps between the atoms of subunit B and the occupied voxels of subunit A ( Figure 26A ). The results demonstrate high complementarity between the HOVs and ULOVs of subunit A and the NTL of subunit B (and vice versa), consistent with the expulsion of H-bond depleted water during dimerization (noting that the dimerization K d is lower in the substrate-bound than the empty dimer, 38, 39 suggesting that the substrate plays a key role in determining the solvation properties of the dimer interface). We then calculated the solvation structure of the dimer (2Q6G), which corresponds to the residual solvation within the postdimerization interface ( Figure 26B) . The Putative Hydropowered Substrate/Inhibitor Binding Mechanism. We calculated the solvation structures in and around the AS of apo M 1/down pro (2QCY; Figure 27A ,B), substrate-bound 2·M 2/up pro (2Q6G; Figure 27C ), and apo 2· M 2/up pro in 6M03 ( Figure 27D ) using WATMD. We aligned (rather than docked) the substrate-and inhibitor-bound complexes included in our study (2Q6G, 6XHM, 6WNP, 6LU7, and 4MDS) to the time-averaged monomeric M 1/down pro structure, and extracted the ligands. We then characterized the degree of complementarity between the overlaid ligand groups and voxel occupancies and H-bond donor/acceptor preferences. We assume that the core solvation structure of the apo form is comparable to that of the induced fit forms present in the substrate-and inhibitor-bound protein structures, which is borne out by the excellent observed qualitative overlaps between polar substrate and inhibitor groups and HOVs in the aligned structures (keeping in mind that HOVs are fuzzy representations of the occupying water due to dynamic Hbond rearrangeability among the donors/acceptors in the local protein environment, and the exchangeability of water molecules with bulk solvent). The results are summarized below (close-up views with detailed voxel overlap information for the substrate and inhibitors are provided, as noted in the Supporting Information). Substrate-Solvation Structure Complementarity ( Figures S1−S5) . Recognition of M pro substrates depends largely on gatekeeper HOVs located within the backbone binding region and S1 subpocket ( Figure 27A,B) , which binds the fully conserved Gln (Table 3 ). Our results suggest that the M pro solvation structure, together with the size/shape of the AS, Figure 24 . Stereo views of the S1 subpocket in the 2·M 2/up pro state and the bound substrate P1 group in 2Q6G. The substrate peptide (red cartoon) is visible at the top of the image. (A) The donut-shaped S1 subpocket is lined by Glu166 (orange), His172 (green), His163 (not visible), Ser139 (blue), Phe140 (blue), Leu141 (blue), Asn142 (coral), and the substrate P3 side chain (yellow). The P1 Gln side chain (pink) occupies the "donut hole", with the open side serving as a solvent-accessible cavity for the Gln amide, thereby reducing the desolvation cost of this group. Many of the residues lining the S1 subpocket play dual roles: The backbone of Glu166 H-bonds with the substrate P3 backbone (thereby directly connecting the β-sheet formed by the substrate and β-hairpin to the subpocket), and Asn142 serves as the gatekeeper of the subpocket. Tyr118 equate to the lowest common denominator of solvation complementarity/recognition among the twelve nsp substrates of M pro (namely, P1 Gln and P2 Leu), and further suggest that this sequence is possibly rare throughout both the viral and host genomes. Activation of the catalytic His in NS3 protease has been attributed to P2 Leu-induced desolvation of the S2 subpocket 14 (noting that this side chain overlaps unfavorably with a HOV cluster at this position in M pro ). The polar environments of the HOVs located in the S4 subpocket and beyond (many of which exhibit more moderate water occupancy) likely lower the desolvation cost of substrates containing polar side chains at these positions (noting the existence of unfavorable overlaps with the P4 side chain of the crystallized substrate). Conversely, numerous ULOVs reside throughout the envelope of the overlaid substrate ( Figure 27A,B) . We calculated the voxel occupancies in the timeaveraged substrate-bound 2·M 2/up pro crystal structure (2Q6G), representing the residual nonexpelled solvation in the bound state ( Figure 27C ). The results suggest that the solvation corresponding to many of the HOVs residing within the substrate envelope is expelled (possibly unfavorably) during association. However, in the absence of quantitative solvation free energy predictions, the absolute magnitude of such energy losses cannot be determined. The solvation structure of the apo 2·M 2/up pro state (6M03) is shown in Figure 27D . The HOVs within the S1 subpocket are considerably larger than those in the M 1/down pro structure, suggesting that Gln-induced expulsion of the corresponding solvation in 2·M 2/up pro is potentially hampered (i.e., k 1 is slowed) in this state (consistent with our hypothesis that substrate binding is limited to the M 1/down pro state). A possible connection between these larger HOVs and the open buried water channel adjacent to the m-shaped loop in the dimeric protein is conceivable. Inhibitor-Solvation Structure Complementarity. Next, we sampled the complementarity between the protein and solvation structures in the M 1/down pro state (2QCY) and four representative inhibitors (Table 1) . Substrates and covalent inhibitors are assumed to interact initially with this state (i.e., prior to induced-fit conformational changes). All of the inhibitors overlap with a subset of ULOVs to varying degrees, which putatively slows k off in proportion to the resolvation costs at those positions during dissociation of the bound complex. However, the inhibitors exhibit variable degrees of complementarity with the HOVs in each subpocket, which putatively speeds or slows k on in proportion to the desolvation costs at those positions during association. Both potency and the observed B-factors of the crystallized inhibitors ( Figure 28A ) can be explained qualitatively in terms of favorable and unfavorable complementarity between overlapping inhibitor groups and ULOVs and HOVs. PF00835321 ( Figures 28B and S2 ): Favorable overlaps between HOVs and polar groups of PF00835321 include the cyclic amide (a Gln mimetic) located in the S1 subpocket the and amide O in the S3 subpocket (corresponding to the backbone O of the substrate P3). Unfavorable overlaps between HOVs and nonpolar groups are largely avoided (in the S4 subpocket, in particular), with the exception of the S2 subpocket, which contains lower occupancy HOVs. These findings are consistent with the high measured potency of this inhibitor (fast k on and slow k off are predicted). Figure 28 . Stereo views of four representative crystallized inhibitors overlaid on the time-averaged M 1/down pro structure (2QCY) and the solvation structure thereof calculated using WATMD. ULOVs are distributed diffusely across the S1′ through S4 subpockets, each of which additionally contain clusters of HOVs representing one or two water molecules per cluster (noting that the sphere sizes are proportional to occupancy, rather than the spatial expanse of the voxels). Inhibitor-solvation structure complementarity assessment is based on overlaps between polar/nonpolar inhibitor R-groups and SID 24808289 (Figures 28C and S3 ): Favorable overlaps between HOVs and polar groups of SID 24808289, include the benzotriazole ring in the S1 subpocket, amide O in the S3 subpocket (similar to PF00835321), and amide O in the S4 subpocket (corresponding to the substrate P4 backbone O). The isopentyl group overlaps unfavorably with HOVs in the S3 subpocket. These findings are likewise consistent with the high measured potency of analog 17a of this inhibitor 30 (faster k on and slower k off are predicted). Boceprevir ( Figures 28D and S4 ): The urea NH of boceprevir overlaps favorably with a HOV in the S4 subpocket (corresponding to the P4 backbone O). However, multiple mismatches are present between nonpolar groups of this inhibitor and HOVs in the S1 (most critically), S2, and S4 subpockets. These findings are consistent with the lower measured potency of this inhibitor (slow k on is predicted). N3 ( Figures 28E and S5) : The amide NH of N3 overlaps favorably with a HOV in the S4 subpocket (corresponding to the P4 backbone O). However, unfavorable overlaps are present between nonpolar groups of N3 and HOVs in the S2 and S4 subpockets. These findings are likewise consistent with the low measured potency of this inhibitor (slow k on is predicted). Nonequilibrium Perspective on M pro Catalysis and Inhibition. Enzyme kinetics are typically measured and analyzed under the assumption that the rate of enzyme− substrate complex formation and turnover are equivalent (the steady state assumption). However, this assumption need not apply under native cellular conditions, in which the enzyme and substrate concentrations vary over time, and the rate of enzyme−substrate complex formation is necessarily described using ordinary differential equations (ODEs) of the form: 1) where ES denotes the enzyme−substrate complex, and k 1 , k −1 , and k cat denote the association, dissociation, and turnover rates, respectively. At constant free enzyme and substrate concentrations, eq 3 reduces to K M = k k where k exp and k deg are the rates of monomer synthesis and monomer degradation, respectively (assuming the possible existence of one or more protein degradation pathways). where k 1(1) and k −1 (1) are the rates of substrate−M pro association and dissociation, respectively, and k b(2) and k b(−2) are the rocking rates between the domain 3 positions 1 and 2 in the substrate-bound M pro monomer. where product 1 is the hydrolyzed C-terminal product, k on(1) , k off(1) , and k cat(1) are the rates of dimerization, dimer−substrate dissociation, and turnover, respectively where 2·(M 2/up pro ∼thioester) is the thioester adduct, which is equal to the rate of product 1 generation. where product 2 is the hydrolyzed C-terminal product, and k cat(2) is the turnover rate constant for thioester adduct decay (where the functional unit is dimeric). Under nonequilibrium conditions, the catalytic efficiency of M pro depends on synchronous dimerization, substrate binding, and turnover, where the following are true: (1) The substrate−M 2/up pro association rate approaches the turnover rate (k 1(1) ≳ k cat ). The slowest binding step is otherwise rate-determining. (2) The lifetime of the 2·(M 2/up pro ∼substrate) dimer approaches the reaction time constant (1/k off(1) < 1/k cat ). Turnover is disrupted when the dimer and/or bound substrate dissociate prior to product formation (noting that K d is agnostic to binding partner exchanges, whereas enzyme-mediated turnover is not). 5 For noncovalent inhibitors: where k on (2) and k off (2) are the inhibitor association and dissociation constants, respectively. We assume that inhibitors bind to the 3 10 where k 1(3) and k −1(3) are the unreacted inhibitor−M pro association and dissociation constants, respectively. where k 1(4) , k −1 (4) , and k cat(4) are the rates of inhibitor−M pro association, dissociation, and adduct formation, respectively, and k on(4) and k off(4) are the dimerization and dimer dissociation rates, respectively (noting that slow dimer dissociation may result in the presence of irreversible adduct formation). The solution to the above set of coupled ODEs consists of a time-dependent exponential function, commensurate with rapid growth in polyprotein processing and virion production over time. However, implementation of this model leads to a catch-22, in which experimental parameter measurement and analysis depend on the assumed kinetics model, and vice versa. The enzyme kinetics data reported for SARS-CoV-2 M pro is out of line with respect to that of other known enzymes, 42 as follows: (1) K M ranges between 189.5 and 228.4 μM for three model substrates 43 (consistent with other reported values), 36, 44, 45 compared with the median K M of 130 μM reported for 5194 enzymes. (2) k cat ranges between 0.05 and 0.178 s −1 , compared with the median k cat of 13.7 s −1 reported for 1942 enzymes. Slow turnover by CoV 3CL pro has been attributed to slow hydrolysis of the acyl adduct (reaction step 2), rather than slow proton abstraction or TI formation (reaction step 1). 20 (3) k cat /K M ranges between 219 and 859 M −1 s −1 , compared with the median k cat /K M of 125 × 10 3 reported for 1882 enzymes. The k cat /K M equates to unrealistically slow processing throughput (e.g., ∼1 mM of substrate is needed to achieve an overall processing rate of 1 s −1 , compared with 8 μM at the median k cat /K M ). The above discrepancies may result from neglect of the substrate and dimerization contributions to M pro activation, in which case, data analysis cannot be based simply on fixed concentrations of the enzyme and substrates. Our model suggests that the dimerization K d differs between the substratebound and unbound states, which is consistent with the K d values of 0.8 and 2 μM reported by Cheng et al. for substrate-bound and unbound CoV M pro , respectively. 39 Graziano et al. reported a somewhat higher dimerization K d for the unbound form (ranging between ∼5 and 7 μM) based on three orthogonal measurement techniques. 38 Dimer buildup is a nonequilibrium process under in vivo conditions due to the time-dependence of the total M pro and polyprotein concentrations resulting from first-order autocleavage; furthermore, the monomer−dimer−substrate distribution is highly nonlinear due to the three-way relationship among the participating species. We calculated the equilibrium dimer concentration as a function of substrate-independent free monomer concentration in multiples of K d = 5 and 0.8 μM ( Table 4 ). The results suggest that the substrate-independent fractional dimer concentration increases slowly as a function of the total M pro concentration (i.e., dimer + monomer). A large fraction of monomer is present at physiologically meaningful total M pro concentrations (which we assume to be ≪ 5 μM) in the absence of substrates, which is tipped toward the dimer in the presence of substrates (e.g., ≪ 50% dimer at concentrations ≪ 5 μM versus 50% at 800 nM). A similar activation mechanism for caspase-1 was reported by Datta et al., in which a 20-fold increase in the dimer/ monomer ratio was observed in the presence of substrate (corresponding to a 10-fold increase in the k cat /K M ), compared with a 2.5-and 9-fold increase in the dimer/monomer ratio with M pro at the K d values listed respectively in Table 4 . 46 The primary aim of early/preclinical drug discovery consists of predicting efficacious/nontoxic chemical entities via a combination of experimental and in silico data modeling techniques. Whereas drug discovery is predicated on equilibrium drugtarget/off-target structure-free energy relationships (expressed as nK d or nIC 50 , where n is a scaling factor between the drug concentration at 50% occupancy versus that at the efficacious occupancy), cellular function and pharmacodynamics in the in vivo setting depend on nonequilibrium structure-kinetics relationships, in which the concentrations of target/off-target, endogenous cognate partner(s), and drug vary over time. The equilibrium and nonequilibrium regimes rarely converge, due in no small measure to the fact that free energy, occupancy, and concentration/exposure are frequently disconnected between the in vitro and in vivo settings (noting that the relationship between ΔG and −RT·ln(K d ) applies solely at fixed species concentrations and that the occupancy− concentration relationship is underestimated by the Hill and Michaelis−Menten equations). In the absence of theoretical principles on which to base drug-target occupancy predictions under in vivo conditions, drug discovery is relegated to a stepwise trial-and-error process centered on empirical approaches and data fitting techniques (i.e., inductive reasoning). We proposed in our previous work the following: (1) Optimal dynamic drug-target occupancy depends first and foremost on the drug-target association rate constant (k on , k 1 ), and that the k on of many marketed drugs is fast, even when the k off is slow (if the train is missed, it matters not how long the trip). 1 (2) ΔG association ⧧ and ΔG dissociation ⧧ are contributed largely by H-bond depleted/trapped and enriched solvation, 3,47−50 and that achieving high dynamic occupancy depends on optimal desolvation of this water. Here, we propose the following: (1) The catalytically important structural transitions in M pro , which are powered putatively by potential energy stored in unfavorable H-bond depleted/trapped solvation (rather than protein structure per se). (2) The spatial distribution of solvation free energy (which we refer to as the "solvation structure") across the AS and domain {1−2}−3 and dimer interfaces. In principle, optimal ligand structures can be inferred from computed solvation structures consisting of voxel occupancies and donor/acceptor preferences, so as to maximize and minimize resolvation and desolvation costs to/from enriched ("gatekeeper") and depleted protein surface positions represented by exposed HOVs and UHOVs; and exposed or trapped ULOVs and trapped UHOVs, respectively. (3) The specific mechanisms by which solvation free energy is stored and released cyclically by intra-and intermolecular state transitions, including substrate and covalent inhibitor binding. The time-dependence of all processes in which M pro participates under native conditions in vivo, including monomer expression and degradation, rearrangement, and solvation free-energy-driven substrate/inhibitor binding are key considerations in inhibitor design. Two nonmutually exclusive M pro inhibition approaches are conceivable: (1) Inhibition of M pro autocleavage in cis: Under this approach, the inhibitor k on must necessarily keep pace with the rate of polyprotein synthesis and remain bound throughout the protein lifetime. However, this approach is likely nonviable under the likely scenario that the cleavage substrate folds within the AS. (2) Inhibition of M pro -mediated polyprotein cleavage in trans: We assume that most covalent inhibitors containing substrate-like P1 groups bind to the monomeric M 1/down pro (S1subpocket-accessible) form of postcleaved M pro . From a systems perspective, efficacious M pro inhibition depends on lowering the active enzyme population below a critical threshold at which downstream processing can no longer proceed, and maintaining this inhibition level over time (noting that M pro inhibition during the virion production phase may have little impact on disease outcome, given that the ship has already sailed). The validity of the slow reported M pro k cat / K M derived from the Michaelis−Menten approach is questioned by the caspase-1 study 46 performed using a dynamic enzyme model (described in the Supporting Information of ref 46) , suggesting the need for a similar model in M pro enzyme studies. Furthermore, inhibitor-induced activation of caspase-1 was observed at suboptimal inhibitor concentrations, which is likewise of potential concern for M pro . In our previous work, we demonstrated the high sensitivity of noncovalent dynamic drug occupancy to the rates of binding site buildup and decay (in order of precedence: k on , [drug concentration](t), and k off ). 1 Efficacious inhibition (i.e., high dynamic occupancy of the AS) at the lowest possible concentration depends on kinetically tuned inhibitor binding, where k on ≈ k i or k 1 and k off approaches the protein lifetime or k −1 . Fast k on and slow k off depend on high mutual AS-inhibitor complementarity between the solvation structures of both partners, as follows: (1) The H-bonds of expelled H-bond enriched binding partner solvation are replaced one-for-one by polar inhibitor groups (i.e., H-bond acceptors are matched to water O and Hbond donors are matched to water H). Optimal H-bond replacements are predicted to speed k on toward the diffusion limit, corresponding to the minimum possible ΔG association ⧧ . (2) H-bond depleted/trapped water molecules are maximally expelled, resulting in large free energy losses during resolvation of the dissociating partners, corresponding to the maximum possible ΔG dissociation ⧧ . (3) The absence of additional H-bond depleted solvation and gain of additional H-bond enriched solvation in the bound versus unbound state (which is predicted to slow k on and k off , respectively). Both covalent and noncovalent M pro inhibition strategies are being pursued by other laboratories. In the former case, efficacy is assumed to depend on occupancy accumulation, although the rate of accumulation may likewise be important (noting that uninhibited M pro and its downstream products may result from slow occupancy accumulation due to slow k on and/or k cat ). In the latter case, efficacy is assumed to depend on fast k on in relation to the rate of M pro buildup and/or slow k off (noting that noncovalent inhibitors may likewise accumulate via slow k off , given sufficient expulsion of H-bond depleted solvation). The advantages and limitations of the two strategies can be summarized as follows: (1) Covalent inhibition depends on delivering the reactive warhead to the catalytic Cys145 in a state-dependent fashion (i.e., M 1/down pro ) via a noncovalent prereaction step, in which the 2·M 2/up pro state is stabilized (just as for native substrates). Conversely, noncovalent inhibitors could conceivably bind to any M pro state. (2) Both classes depend on achieving the fastest possible k on and the slowest possible k off . However, these rates may tip toward slow k off versus fast k on in the case of covalent and noncovalent inhibitors, respectively. Optimization of covalent inhibitors is aimed at both k cat (a necessary but insufficient condition for achieving efficacious M pro occupancy) and k on . Rapid adduct formation is conceivable based on the general cysteine protease mechanism reported by other workers, where the rate-determining step consists of hydrolysis (step 2) rather than thioester formation (step 1). 22 Optimization of noncovalent inhibitors is necessarily aimed at both k on and k off . The exquisite measured potency of PF00835321 is consistent with fast k on and a fast rate of reaction. The nanomolar potency of analog 17a of SID 24808289 suggests that noncovalent inhibitor occupancy need not be k off -limited, which is consistent with the large number of inhibitoroverlapped ULOVs ( Figure 28C ), together with the low Bfactors of this inhibitor ( Figure 28A ). Interestingly, the Rgroups of both compounds are well-matched to overlapped HOVs ( Figure 28B,C) , whereas the weaker inhibitors are poorly matched ( Figure 28D,E) . However, the actual quality of H-bond replacements is difficult to assess quantitatively in the absence of inhibitor k on and k off data. Less is more when it comes to drugs. Pharmacodynamic and pharmacokinetic behaviors (including solubility and permeability) are governed largely by drug, target binding site, and membrane surface desolvation and resolvation costs, which in turn are governed largely by polar/nonpolar scaffold composition. Balanced polar/nonpolar composition, as prescribed by the Pfizer rule of 5, may be achieved, as follows: (1) Limiting the polar composition to approximately that needed for replacing the H-bonds of gatekeeper solvation (corresponding to HOVs) in polar environments, thereby minimizing both drug and binding site desolvation costs. (2) Limiting the nonpolar composition to approximately that needed for expelling H-bond depleted solvation from nonpolar environments (corresponding to ULOVs), thereby maximizing the resolvation costs of the dissociating drug and binding site. Property imbalances result from mismatches between HOVs and ligand groups, leading to a vicious circle, in which: (1) Nonpolar group incorporations are needed to overlap additional k off -slowing ULOVs in compensation for inadequate k on (2) Additional polar group incorporations are needed to rebalance logP, at the cost of increased molecular weight. Inhibitor−M pro occupancy may be impacted negatively by the following: (1) The high entropic cost of binding flexible peptidomimetic inhibitors (reflecting the cost of ordering), which contributes to the association free energy barrier. (2) The lack of an optimal P1 group, which is expected to slow k on (and likely k cat as well) and speed k off due to higher inhibitor desolvation cost and indirect loss of substrate-induced enzyme activation in the M 1/down pro state. The lack of inhibitor−AS solvation complementarity in the S2, S3, and S4 subpockets can result in independent binding/rebinding behavior ("wagging") of the occupying P2, P3, and P4 groups due to local solvation free energy losses in the affected subpockets (reflected in high inhibitor B-factors of these groups in 6LU7). (3) Simultaneous overlaps between nonpolar ligand groups, ULOVs, and HOVs represent a tradeoff between slowed k off and slowed k on . Optimization of k off to < the rate of binding site decay at the expense of k on < [the rate of binding site buildup] is typically counterproductive. We have showed that the dynamic noncovalent intra-and intermolecular rearrangements underlying M pro structure− function, We have further demonstrated that solvation free energy is ideally suited for powering the aforementioned rearrangements via counterbalanced, position-/state-specific H-bond enriched and depleted solvation, the desolvation and resolvation of which govern the rates of entry and exit of molecular populations to/from the available enzyme states (including substrate and inhibitor-bound states). Finally, we have challenged the reported enzyme kinetics data for M pro , in which the enzyme efficiency and inhibitory requirements may be underestimated by the classical Michaelis−Menten approach used in those studies. Pearlstein − Global Discovery Chemistry, Computer-Aided Drug Discovery Email: robert.pearlstein@novartis.com ■ REFERENCES Biodynamics: A Novel Quasi-First Principles Theory on the Fundamental Mechanisms of Cellular Function/Dysfunction and the Pharmacological Modulation Thereof Building New Bridges between In Vitro and In Vivo in Early Drug Discovery: Where Molecular Modeling Meets Systems Biology Toward in vivo-relevant HERG safety assessment and mitigation strategies based on relationships between non-equilibrium blocker binding, three-dimensional channel-blocker interactions, dynamic occupancy, dynamic exposure, and cellular arrhythmia. bioRxiv (Pharmacology and Toxicology The Mode of Action of Urease and of Enzymes in General Application of the Van Slyke-Cullen Irreversible Mechanism in the Analysis of Enzymatic Progress Cu Processing of the SARS-CoV Pp1a/Ab Nsp7−10 Region Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2): An Overview of Viral Structure and Host Response SARS-Coronavirus Replication Is Supported by a Reticulovesicular Network of Modified Endoplasmic Reticulum Ultrastructure of the Replication Sites of Positive-Strand RNA Viruses Dynamics of Coronavirus Replication-Transcription Complexes Expression and Cleavage of Middle East Respiratory Syndrome Coronavirus Nsp3−4 Polyprotein Induce the Formation of Double-Membrane Vesicles That Mimic Those Associated with Coronaviral RNA Replication Downloaded From Topographic Changes in SARS Coronavirus-Infected Cells during Late Stages of Infection Three-Dimensional Model of a Substrate-Bound SARS Chymotrypsin-Like Cysteine Proteinase Predicted by Multiple Molecular Dynamics Simulations Inhibitor Binding Induces Active Site Stabilization of the HCV NS3 Protein Serine Protease Domain Estimation of Solvation Entropy and Enthalpy via Analysis of Water Oxygen-Hydrogen Correlations Structure and Dynamics of SARS Coronavirus Main Proteinase (M Pro) Structural and Evolutionary Analysis Indicate That the Sars-COV-2 Mpro Is a Challenging Target for Small-Molecule Inhibitor Design On the Mode of Activation of the Catalytically Essential Sulfhydryl Group of Papain Steady-State and Pre-Steady-State Kinetic Evaluation of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) 3CLpro Cysteine Protease: Development of an Ion-Pair Model for Catalysis Evidence for Substrate Binding-Induced Zwitterion Formation in the Catalytic Cys-His Dyad of the SARS-CoV Main Protease On the Reactivity of the Thiol Group of Thiolsubtilisin Mechanism for Controlling the Dimer-Monomer Switch and Coupling Dimerization to Catalysis of the Severe Acute Respiratory Syndrome Coronavirus 3C-Like Protease The Protein Data Bank/Biopython Structures of Two Coronavirus Main Proteases: Implications for Substrate Binding and Antiviral Drug Design Structure of Mpro from COVID-19 Virus and Discovery of Its Inhibitors Structure of Alpha-Chymotrypsin Refined at 1.68 A Resolution Discovery of Danoprevir (ITMN-191/R7227), a Highly Selective and Potent Inhibitor of Hepatitis C Virus (HCV) NS3/4A Protease Discovery of N-(Benzo[1,2,3]Triazol-1-Yl)-N-(Benzyl)Acetamido)Phenyl) Carboxamides as Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) 3CLpro Inhibitors: Identification of ML300 and Noncovalent Nanomolar Inhibitors with an Induced-Fit Binding Time-Averaged Distributions of Solute and Solvent Motions: Exploring Proton Wires of GFP and PfM2DH Ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from Ff99SB The Crystal Structures of Severe Acute Respiratory Syndrome Virus Main Protease and Its Complex with an Inhibitor Mutation of Glu-166 Blocks the Substrate-Induced Dimerization of SARS Coronavirus Main Protease Liberation of SARS-CoV Main Protease from the Viral Polyprotein: N-Terminal Autocleavage Does Not Depend on the Mature Dimerization Mode Critical Assessment of the Important Residues Involved in the Dimerization and Catalysis of MERS Coronavirus Main Protease Without Its N-Finger, the Main Protease of Severe Acute Respiratory Syndrome Coronavirus Can Form a Novel Dimer through Its C-Terminal Domain SARS CoV Main Proteinase: The Monomer-Dimer Equilibrium Dissociation Constant The Proteins of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS CoV-2 or n-COV19), the Cause of COVID-19 Both Boceprevir and GC376 Efficaciously Inhibit SARS-CoV-2 by Targeting Its Main Protease The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters Substrate Specificity Profiling of SARS-CoV-2 M pro Protease Provides Basis for Anti-COVID-19 Drug Design Ligand-Induced Dimerization of Middle East Respiratory Syndrome (MERS) Coronavirus Nsp5 Protease (3CL pro) Implications for Nsp5 Regulation and the Development of Antivirals Biosynthesis, Purification, and Substrate Specificity of Severe Acute Respiratory Syndrome Coronavirus 3C-like Proteinase Substrate and Inhibitor-Induced Dimerization and Cooperativity in Caspase-1 but Not Caspase-3 Contributions of Water Transfer Energy to Protein-Ligand Association and Dissociation Barriers: Watermap Analysis of a Series of P38α MAP Kinase Inhibitors New Hypotheses about the Structure-Function of Proprotein Convertase Subtilisin/Kexin Type 9: Analysis of the Epidermal Growth Factor-like Repeat A Docking Site Using Watermap The Translocation Kinetics of Antibiotics through Porin OmpC: Insights from Structure-Based Solvation Mapping Using WaterMap Structure-Kinetic Relationship of Carbapenem Antibacterials Permeating through E. Coli OmpC Porin The authors declare no competing financial interest. The authors thank Dr. Andrei Golosov for his important contributions to the WATMD parts of this work and helpful suggestions during the manuscript preparation.