key: cord-0146793-u82xtsjf authors: Folescu, Dan; Onufriev, Alexey V. title: A closed-form, analytical approximation for apparent surface charge and electric field of molecules date: 2021-06-22 journal: nan DOI: nan sha: dcf650c782ff6f2035cc791bc3feca699bd5dcb1 doc_id: 146793 cord_uid: u82xtsjf Closed-form, analytical approximations for electrostatic properties of molecules are of unique value, as these can provide computational speed, versatility, and physical insight. Here, we derive a simple, closed-form formula for the apparent surface charge (ASC), as well as for the electric field, generated by a molecular charge distribution in aqueous solution. The approximation, with no fitted parameters, is tested against numerical solutions of the Poisson equation, where it yields a significant speed-up. For neutral small molecules, the hydration free energies estimated from the closed-form ASC formula are within 0.8 kcal/mol RMSD from the numerical Poisson reference; the electric field at the surface is in quantitative agreement with the reference. Performance of the approximation is also tested on larger structures, including a protein, a DNA fragment, and a viral receptor-target complex. For all structures tested, a near quantitative agreement with the numerical Poisson reference is achieved, except in regions of high negative curvature, where the new approximation is still qualitatively correct. A unique efficiency feature of the proposed"source-based"closed-form approximation is that the ASC and electric field can be estimated individually at any point or surface patch, without the need to obtain the full global solution. An open source software implementation of the method is available: http://people.cs.vt.edu/~onufriev/CODES/aasc.zip Accurate and efficient modeling of solvation effects at the atomistic level is a critical component of modern efforts to understand molecular structure and function [1] [2] [3] [4] [5] . Analysis and visualization of electrostatic properties of biomolecules, including the electric field and surface charge generated by the molecular charge distribution, have made an impact in qualitative reasoning about biomolecules 1,6 . There are two broad approaches to the modeling of molecular electrostatics and solvation effects: explicit and implicit solvation methods 7 . Arguably the most widely used model of solvation is that for which individual solvent molecules are treated explicitly, on the same footing with the target molecule. However, accuracy of the explicit solvent representation comes at high price, computationally, limiting the practical utility of atomistic simulations in many areas. The implicit, continuum solvation approach -treating solvent as a continuum with the dielectric and non-polar properties of water -can offer much greater effective simulation speeds compared to the explicit solvent models [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] . The Poisson equation 9, 10, [19] [20] [21] [22] [23] of classical electrostatics 24 provides an exact formalism -within the continuum, local, linearresponse dielectric approximation of solvent in the absence of mobile ions -for computing the electrostatic potential V (r) produced by a molecular charge distribution ρ(r) characterizing the solute: where (r) is the dielectric constant. Once V (r) is obtained, the electrostatic part of the solvation free energy is easily computed 24 . The problem of finding V (r) is mathematically equivalent 25, 26 to finding a continuous charge density, σ, on the dielectric boundary (DB), such that: where ρ(r) is the discrete charge distribution, formed by n point charges q 1 , · · · , q n , and σ(s) is the apparent surface charge (ASC) associated with each surface patch s. The second term in the above equation represents the so-called reaction field potential [27] [28] [29] . Conceptually, once the ASC, σ(s), is found, all of the solvation effects, at the level of the Poisson equation, can be computed. A great variety of practical, widely used methods, including multiple modern derivatives of PCM 25 and COSMO 30 , utilize this general idea -the apparent surface charge (ASC) formalism, see Refs. 31, 32 for comprehensive reviews. The reformulation of the Poisson problem via equation 2 has a number of technical advantages made apparent over the years, especially in quantum mechanical (QM) calculations 8, 32 . A number of ASC-based methods yield numerically exact solutions to the Poisson equation, in the sense that the exact solution can, at least in principle, be approximated with an arbitrary precision. Formally exact, linear-scaling implementations of numerical ASC methods, based on conjugate gradient or domain decomposition, exist 32 . Concerns related to computational cost of numerically exact approaches 31 bounding-box grid spacing of 0.5 Å and 1.0 Å, respectively. The total number of NPB grid points was determined by setting a volumetric bounding-box side length slightly larger (+1 Å) than the maximal intra-molecular distance. We numerically approximate the electric 6 field normal at a point r by a two-point stencil: where r + hn and r − hn are two sampling points distance r ± h from the DB along the surface normaln; see Ref. 36 for additional details of the sampling protocol. Here, h was chosen to minimize the distance between sampled points, while still being large enough so that the sampled points are distinct. Notice that if h were too small, the sampling protocol would sample the exact same two points of the cubic lattice used in the NPB reference calculations. The largest possible distance between two such grid points is the diagonal of the cubic grid, which determines the minimum h. To illustrate this numerical constraint, h must be larger than √ 3/(0.1) 2 ∼ 0.173 Å for members of the small molecule dataset, whose potential grids were computed with the NPB reference at a grid spacing of 0.1 Å. Additionally, to avoid numerical artifacts of the NPB reference near the DB, and mitigate possible effects of minor differences between the internal representations of the SES computed by our NPB reference and NanoShaper, the field is computed a distance p > h from the DB, Figure 4 : doing so ensures that we do not accidentally sample grid points inside the molecule. The need to use a non-zero projection distance in the NPB calculations makes it necessary to consider the electric field normal values near the DB as the numerical reference for assessing the accuracy of the analytical ASC, rather than the apparent surface charge itself (which, up to a prefactor, is essentially the normal component of the field right at the DB, see equation 5). We utilize the IGB5 95 GB model from AMBER package 81 . This model was parameterized against reference NPB hydration free energies calculated based on SES surface, Bondi radii, and 1.4 Å water probe radius. We test our ASC approximation, first against the exact PB (EPB, see section III A ) reference, and then against NPB and GB references. Our comparison with the EPB 7 reference is used directly for the apparent surface charge, employing the two test charge configuration in Figure 2b . Per-vertex electric field normal values are compared against the NPB reference, averaging over each vertex in a given biomolecule, and over each biomolecule in the comparison set. Electrostatic hydration free energies are compared against the NPB and GB references with inner and outer dielectrics in = 1 and out = 80, unless otherwise specified. All results will be in e/Å 2 for apparent surface charge, kcal/( mol · e · Å) for electric field normals, and kcal/mol for electrostatic hydration free energies. All computations and visualizations were completed on a commodity desktop computer with an Intel Core i7 (or equivalent) processor, using a maximum of 32 GB of memory. In the context of implicit solvation, the simplest scenario is that of a solute with a sharp, spherical DB, Figure In each case, a perfectly spherical, sharp DB is utilized, with in and out denoting inner and outer dielectric constants, respectively. The angle θ (resp. π − θ) is subtended by the lines connecting the point of observation, r, and the charge(s) to the spherical center. In panel 2a, a single point charge, q i , is located r i away from the center of a spherical boundary with radius A. In panel 2b, two point charges, q 1 and q 2 , are located on the vertical diameter of the sphere. The charges are of equal distance, r 1 = r 2 , from the spherical center. The electrostatic potential V (r) is computed at the point of observation r. For such a spherical boundary, as in Figure 2a , Kirkwood 96 gave the exact, analytical solution of equation 1 for the potential V i at the DB due to a single charge q i inside the boundary. Here, we use the solution valid on or exterior to the spherical DB 97 , without consideration for mobile ions. At r = A: The apparent surface charge σ is related to the normal component of the electric field 9 E ⊥ = ∂V ∂ n out at (just outside) the boundary via: 24,25 From this, and equation 4, we obtain an exact, analytical expression for the apparent surface charge on the spherical DB: where the summation is over all of the enclosed charges. Equation 6 will provide a key check for our analytical ASC approximation. The presence of the indexing term, (l + 1), is notable in its effects on the convergence characteristics of equation 6, by increasing the number of terms necessary to obtain a converged, accurate reference. As the ratio r i /A approaches 1 -that is, as the charge approaches the DB -slow convergence of the approximate solution manifests itself. shown with blue, orange, green, and red lines, respectively. In Panel (b) of Figure 3 , we see that, even for r i A = 0.8, it is possible to achieve both qualitatively and quantitatively reasonable results. The analytical reference appears well- The boundary separates the inner (blue) and outer (white) dielectric regions, with constants in and out , respectively. For a non-spherical DB, the distance r from the spherical center, Figure 2a , to the sampling point r is replaced with the generalized expression r = A + p. Here, A is the so-called electrostatic size of S, which characterizes the dimension of the molecule 98 , and p denotes the projection distance from r to ∂S along the surface normal, n. q i is the source charge under consideration, with d i denoting the distance between q i and r. When p > 0, the electric field normal E ⊥ (r) of S at r is computed via equation 9. When p = 0, the ASC, σ(r), is computed via equation 10. To derive our ASC approximation, we begin with the previously derived closed-form approximation for the electrostatic potential around (outside) an arbitrary molecular shape 97 , 12 see Figure 4 : We utilize the polar orthonormal frame, e r = ∂ ∂r ; e θ = 1 r ∂ ∂θ , to take its derivative, for use in equation 5. The derivative vanishes in the direction of e θ , yielding: Exploiting the geometry in Figure 4 , we relate cos(ϕ i ) as a dot product of the surface unit normal,n, and the vector from Applying equation 7 to 8, and summing over the charge distribution ( Figure 4 ) we arrive at: where we have made the substitution r = A + p described in Figure 4 . At the dielectric boundary, p = 0, applying equation 5 to equation 9 gives the following closed-form, analytical approximation for the apparent surface charge: Arguably the simplest self-consistency check of analytical ASC is that the total surface charge produced by equation 10 should be zero for any of the neutral small molecules making up our main test set: In the discrete DB case, as the triangulation density is increased, we expect the numerical approximation of the total charge integral, equation 11, to approach zero. Thus, the same simple check automatically tests both the analytical ASC and the DB discretization (triangulation) used. As seen from Figure 5 , our ASC implementation follows the expected trend. We stress that the only purpose of this simple test is a "sanity check" of our code implementation; the accuracy of the derived approximation is tested thoroughly below. The exact result for the total surface charge is zero; as NanoShaper grid density is increased, the molecular charge estimated via our implementation also tends to zero. An average over the entire set of small neutral molecules is shown. NanoShaper inverse grid spacings are given in Å −1 , while average total molecular charges are given in |e|. C. Accuracy against the Analytical PB Reference The apparent surface charge formulation allows us to gain insights into a variety of solvation effects 28,100 , including the estimate of hydration free energy, which we will use extensively here to evaluate the accuracy of our new approach against accepted reference. Within the implicit solvation framework 8 , hydration free energy of a molecule is often approximated 101 as the sum of polar (electrostatic) and non-polar components: Of the two components in equation 12, the electrostatic solvation free energy, ∆G el , often contributes the most to the total in polar solvents such as water, especially for macro- where V (r i ) and V (r i ) vac are the electrostatic potentials due to the given charge distribution in the solvent and in vacuum, respectively, sampled at each point charge q i located at r i . In the special case when the inner and outer dielectric constants, in and out , are equal to 1 and 80, respectively, we call ∆G el the electrostatic hydration free energy. We use equations 2 and 13 to write: Though equation 14 is valid for any choice of DB, the surface integral is non-trivial to compute. We approximate the surface integral using a specific triangulation of the DB, see section II B. This discrete representation approximates equation 14 as: where σ T , A T , and r T are the apparent surface charge, area, and center of the triangle T (to express ∆G el in kcal/mol, which is often convenient, the equation above is multiplied by 166, while using atomic units of length, Å, and charge, |e| ). The ASC on T is found by averaging the ASC at its comprising vertices. The triangular center is simply the centroid of T , with its area calculated using Heron's formula: where a, b, and c are the side lengths of T . Here we present general running time descriptions for each tested method, rather than In algorithmic time complexity, the three methods we compare in Table I are can be clearly seen when we focus on hen-egg lysozyme (2LZT) and double-stranded DNA wall running times. Though the 2LZT structure has about 400 more atoms than the DNA structure, the intra-molecular width of the DNA structure is almost double that of 2LZT. This means that the DNA structure has both a larger total surface area and requires a bigger volumetric bounding box; as seen in Table I , we find longer running times for our ASC approximation and the NPB reference for 2LZT as compared to DNA, but not for the IGB5 reference. Hence, this contrasting algorithmic complexity affects computational timings between each model for structures of different size and characteristic. A principal consequence is the following: The best case scenario for the efficiency of our ASC approximation is for structures having many atoms, but a comparatively low surface area; in terms of the derivation of our model, it is coincidental that in three dimensions the shape maximizing total inner "volume" (number of atoms) and minimizing outer surface area is that of a sphere. Though the efficiency of our analytical ASC implementation is not at the level of the IGB5 reference, it occupies a different niche: its main purpose is the estimation of the ASC and the electric field. It is worth noting that the surface integration required for equation 15 is trivially parallel, since equations 9 and 10 can be computed independent of adjacent surface elements on the DB. This is in addition to the more fundamental parallelism present in equations 9 and 10, due to the independence of per-charge contributions to the electric field and ASC, respectively. Improvements in the efficiency of our implementation would greatly improve performance as surface resolution is increased, or in "worst-case" scenarios such as those seen in the DNA fragment, Table I to the discrete sampling of the NPB potential map, the electric field normals computed with our ASC approximation are, visually, almost indistinguishable from those computed with the NPB reference. The near quantitative agreement between the analytical ASC and the NPB reference, Figure 7 is nontrivial, and goes beyond the visual match of the "reds" and the "blues" with the NPB reference. Notice that, when α is set to 0 in equations 7 and 9, the analytical ASC reduces to the so-called Coulomb Field Approximation (CFA), which is often used, and can be considered a Null model here. field that is approximately 36% weaker than our analytical ASC method, which reproduces the NPB reference closely. The significant deviation of the CFA surface charge from the reference translates into its poor accuracy in estimation of the hydration free energy, Figure 9 below. 21 b. Quantitative assessment of ASC accuracy Next, we examine how our ASC approximation compares quantitatively to the NPB reference. Though electric field normals (or linearly related surface charges) are not the most intuitive accuracy metrics, these quantities are those that our method directly computes, and so a direct comparison with the NPB is in order. In section IV B 3, we discuss a physical interpretation of the deviation of these quantities from the reference. Relative to the NPB reference, our ASC approximation achieves an average RMSD and absolute difference of 0.14 and 0.11 kcal/( mol · e · Å) on calculated electric field normal values, respectively. From our findings above, our ASC approximation has an approximately 20% reduced RMSD to NPB reference hydration free energies than the IGB5 reference. This accuracy gain is encouraging, particularly from the perspective of design: GB models are designed to analytically approximate the electrostatic solvation free energy obtained from solving the Poisson equation. That our model can estimate solvation energies more accurately than a widely used GB model suggests that the approximation of the ASC and electric field normals by the model is reasonably accurate to be considered for practical applications. Additionally, the results achieved above give another encouraging conclusion, with respect to running times efficiencies. To achieve a deviation from the NPB reference of just slightly above kT , we do not require an overly fine triangulation density for the analytical ASC. When the grid resolution is set to what is used in the timing section above (Table I) , RMSD against the NPB reference, differs only in non-significant digits. Thus, our analytical ASC can achieve a very similar accuracy, without incurring a heavy 1-2 order of magnitude time penalty, as seen with the NPB reference at this fine grid resolution. In analyzing the performance of our model on structures of increased size and complexity, we examine a fragment of double-stranded DNA and the hen-egg lysozyme. Structures of this type -with regions of the DB having deep, negative curvature "pockets" -present some of the toughest tests for our model, due to certain theoretical considerations we touch on below. Double-Stranded DNA. First, we examine our analytical approximation on a doublestranded DNA fragment. Qualitatively, Figure 10 shows that our analytical approximation reproduces the NPB reference quite well, although some discrepancies are clearly seen in the grooves, that is in the regions of negative curvature, see a discussion below. Triclinic Hen Egg White Lysozyme Next, we compare our analytical ASC to the NPB reference on the triclinic hen egg white lysozyme. Table II in a better context, it may be helpful to consider a hypothetical situation. Suppose a biomolecule of interest has a constant electric field strength near its dielectric boundary. When compared to the reference, there is, on average, a ∼ 0.4 kcal/( mol · e · Å) RMSD error in the electric field normal values. If a unit electric charge was moved 1 Å away from the biomolecular boundary, along the surface normal, the error in the total work done by the electric field would be less than ∼ 0.4 kcal / mol -small when compared to the "gold-standard" 1 kcal / mol difference against reference. Several caveats are due here. First, the estimate is based on the RMS error; it is possible that the errors are larger in some spots near the DB. That concern is mitigated to an extent by the fact that the electric field strength is inversely related to the square of distance from the surface, and would, in reality, decrease away from the surface, making any discrepancy with the reference smaller. Second, this test is free from the danger of a specific error cancellation that may affect the commonly used error assessment based on solvation free energies. As Figure 10 and the binding cleft in Figure 11 , do not occur on a sphere; this is where our model is not expected to perform well. A resulting loss in performance had been noted previously 36 for the approximate electrostatic potential (equation 7). It is therefore surprising that a qualitative agreement with the NPB reference is still seen, even in these regions. More shallow negative curvature regions are also present in the small molecules, Figure 7 , but apparently these do not present a significant challenge to the new model. Confounding this effect, deep negative curvature regions on the DB can restrict water molecule conformational freedom, making nearby solvent behave less similar to that of the 28 bulk 108, 109 . It has been shown that regions of this type can significantly modify interactions between small molecule inhibitors and their target proteins 110 , prompting investigations related to their identification 111, 112 . In our context, this change in the behavior of water bulk has negative implications on the performance of our model, but the same is true for the NPB reference, which is also based on the continuum solvent. Because both of these models are expected to deviate from the correct physical behavior within these regions of negative curvature, we argue that a qualitative agreement with the NPB reference may be acceptable here, in place of a strong quantitative agreement. c. Computational Considerations Though not implemented in this work, one very important consequence of our analytical approach to the ASC confers a key additional theoretical benefit -the trivially parallel nature of our solution. Equation 10 is computed over our discrete DB representation, and is totally independent from surrounding surface elements. Coupled with a similarly parallelizable expression for the computation of electrostatic solvation free energy, equation 15, our method holds significant potential for the computationally efficient ASC treatment of large biomolecules. For a real-world application, we examine our approximation on a much larger (∼6500 atom) complex, with important relevance today -the ACE2/SARS-CoV-2 complex (PDB ID:6M0J). SARS-CoV-2 mutants of concern 114 , especially in high throughput studies. Our method can potentially be useful in this area due to its targeted, source-based approach to computation of ASC, where only a small portion the entire DB, and hence a small subset of the surface elements, is included in the computation, as demonstrated in Figure 12 . As a result, the compute time is reduced dramatically, see the discussion on time complexity above. In this work, we have derived a closed-form, analytical approximation for biomolecular apparent surface charge (ASC), and the normal component of the electric field outside the molecule. To the best of our knowledge, this is the first such fully analytical approximation. The approximation is constructed to closely reproduce the exact infinite series solution for a perfect spherical boundary; more importantly, the approximation yields a reasonably close agreement with the standard numerical PB for realistic molecular structures. Specifically, quantitative reproduction of results from the standard numerical PB reference is achieved on most of the tested molecules, except within prominent regions of negative curvature, where the new approximation is still qualitatively correct. Comparisons with a popular fast GB model in AMBER (IGB5) shows that our method is more accurate in reproducing the hydration free energy, albeit at higher computational expense, which may be expected of proof-of-concept code package that is not highly optimized. At the same time, standard numerical PB is still 1-2 orders of magnitude slower than the proposed approximation, which puts it "in-between" fast analytical GB and numerical PB. We stress that solvation free energy estimates are used here as a common and convenient accuracy metric, and is not where we believe the potential benefits of the proposed analytical ASC may be. These potential benefits stem from the unique features of the method. There are at least two features of the new approximation absent from the GB: the ability to estimate the apparent surface charge (and, hence, the potential everywhere); and the ability to estimate the normal component of the electric field. Another noteworthy feature of the approach sets it apart from other existing approximations that can estimate ASC, including those aimed at computing ASC directly -the fact that the new approximation is "source-based". This means that the normal electric field and the ASC can be estimated at any individual point or surface patch, without the need for self-consistent computation over the entire surface or volume. This feature is in contrast to "field-based" methods such as numerical solutions of the Poisson equation or DPCM. As an illustration, we showed that the "source-based" feature of our ASC approximation allows a rapid examination of the ACE2/SARS-CoV-2 RBD electrostatics, reproducing conditions posited to contribute to the spike protein's high binding strength. An area which, in our view, can benefit the most from the proposed analytical ASC is the development of new implicit solvation methods that require fast estimates of local polarization charges or/and fields. We also believe the new approach may have the potential to compete with existing ASC-based approaches in QM applications, especially where computational efficiency is key; further extensive testing and analysis beyond the scope of the proof-of-concept work will be necessary to explore the potential of the method in this area. As it stands, the proposed method has several limitations. First, it does not yet include salt effects explicitly. However, in the future, it should be relatively easy to add into the model salt dependence at the Debye-Huckel level, following an approach outlined in Ref. 97 . Another limitation of the model is its qualitative nature in the regions of high negative curvature, at least relative to the standard NPB reference. Overcoming this specific limitation will require a significant extension of the underlying theory, and extensive testing on biomolecular structures. A careful and detailed comparison of our proof-of-concept approximation within the broader category of existing, optimized implementations ASC methods has not been performed, and is warranted in the future; nonetheless, promising results so far have pointed to the potential of our approach in forming the basis of novel implicit models of solvation. We would like to thank Nitin Passa for his contributions to the numerical ideas behind this application of ASC. In addition, we appreciate the help of Dr. Igor Tolokh for supplying the double-stranded DNA snapshot, and Dr. Andrew Fenley for his useful comments and suggestions. Biological Applications of Electrostatic Calculations and Brownian Dynamics Simulations Implicit Solvent Electrostatics in Biomolecular Simulation Proceedings of the National Academy of Sciences of the United States of America Continuum Electrostatics Solvent Modeling with the Generalized Born Model Crystal structure of SARS-CoV-2 spike receptor-binding domain bound with ACE2 An object-oriented programming suite for electrostatic effects in biological molecules An experience report on the MEAD project Proceedings of the Royal Society of London. Series B The authors have no conflicts to disclose. The data that support the findings of this study are available from the corresponding author upon reasonable request.