key: cord-0809745-2uix5lca
authors: Kucherova, Anna; Strango, Selma; Sukenik, Shahar; Theillard, Maxime
title: Modeling the Opening SARS-CoV-2 Spike: an Investigation of its Dynamic Electro-Geometric Properties
date: 2020-10-29
journal: bioRxiv
DOI: 10.1101/2020.10.29.361261
sha: cd440706afd484499d7e076ab75426560d7b9bfa
doc_id: 809745
cord_uid: 2uix5lca

The recent COVID-19 pandemic has brought about a surge of crowd-sourced initiatives aimed at simulating the proteins of the SARS-CoV-2 virus. A bottleneck currently exists in translating these simulations into tangible predictions that can be leveraged for pharmacological studies. Here we report on extensive electrostatic calculations done on an exascale simulation of the opening of the SARS-CoV-2 spike protein, performed by the Folding@home initiative. We compute the electric potential as the solution of the non-linear Poisson-Boltzmann equation using a parallel sharp numerical solver. The inherent multiple length scales present in the geometry and solution are reproduced using highly adaptive Octree grids. We analyze our results focusing on the electro-geometric properties of the receptor-binding domain and its vicinity. This work paves the way for a new class of hybrid computational and data-enabled approaches, where molecular dynamics simulations are combined with continuum modeling to produce high-fidelity computational measurements serving as a basis for protein bio-mechanism investigations.

Since its emergence in December 2019, the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to be a major concern due to its high transmission rate and worldwide presence [2, 41, 42, 44] . The mechanism by which the virus interacts with cells is mediated by the receptor-binding domain (RBD) on the SARS-CoV-2 spike protein (S protein). The full S protein, composed of three identical S proteins to form a homotrimer, binds to the human angiotensin-converting enzyme 2 (ACE2), and this interaction is the principal instrument for viral infection [2, 18, 20, 33] .

Multiple studies maintain that the spike protein's structure and fxunction are two features highly responsible for cell infection because the conformational rearrangement of the S protein reveals its binding interface [15, 18, 37, 39] . Recently, Zimmerman et al. [46] produced the first molecular dynamics simulation of the spike opening. This computational tour de force involved millions of citizen scientists collaborating through the Folding@home initiative [1] , and produced an overall 0.1s of molecular trajectories. For a full characterization of the protein interaction, the molecular trajectory may not be enough.

One aspect that may provide some insight into the interactions of the S protein is the electrostatic potential it generates. Indeed, according to a previous study, the affinity constant for the RBD of SARS-CoV-2 to the ACE2 is 10 to 15 times greater than that of SARS-CoV, potentially contributing to its transmission efficiency [37] . The reason for the higher binding affinity was attributed to several mutations, most notably from the residue Val404, found in SARS-CoV, to the positively charged Lys417 in SARS-CoV-2. This mutation resulted in an intensified electrostatic potential complementarity between the negatively charged ACE2 binding site and the now more positively charged RBD of SARS-CoV-2 [15, 37] . To understand the contribution of charges, and also to help inform drug design strategies that leverage this distribution, it is imperative to understand how the S protein electrostatic potential changes during its opening.

Although the electrostatic surface between the closed and open conformations of the spike protein is well characterized, all of the aforementioned work examine static structural data [2, 15, 37] . There is a deficit of numerical quantification regarding the surface of the entire S protein, the electrostatic potential on and around the protein's surface, and how these properties vary as the protein transitions from a closed to an open conformation.

In this work, we leverage the S protein opening trajectory created by the Folding@home initiative to examine and characterize the electrostatic potential of the SARS-CoV-2 S protein during trimer opening. The novelty of our exploration lies in the combination of molecular dynamics simulation with partial differential equation modeling to obtain dynamical electric potential maps, thus revealing variations of the electro-geometric properties of the S protein. At each frame of the simulated trajectory, we reconstruct the S protein surface and calculate the generated electric potential with the approach developed by Mirzadeh et al. [26, 27] . Using adaptive non-graded Octree grids and sharp discretizations we efficiently produce high-fidelity solutions to the non-linear Poisson-Boltzmann equation. In the continuum model, all temporal variation may be neglected, and as a consequence, all potential maps are independent, making their computation parallelizable.

Our study of the electrostatic dynamics of the S protein opening reveals dramatic rearrangements of the electrostatic field during this process. These rearrangements act to localize a negatively charged field towards the interior of the S protein and expose a positive surface of its residue binding domain. This is in line with the negative charge of the target binding region on the ACE2 receptor and may aid the S protein in binding to its target with high affinity.

The paper is arranged as follows: in section 2.1, we present our molecular trajectory emphasis and investigate the spike protein's conformational changes. The dynamical electrostatic and geometric properties of the spike protein are characterized in section 2.2, followed by an analysis and fundamental takeaways in section 3. Section 4.1 entails a thorough mathematical design for the calculation of the potential field that was carried out on each frame in the trajectory. Section 4.2 encapsulates the numerical method used accompanied by a convergence study for computational validation.

The S protein exists primarily in the closed configuration, hiding the three identical RBDs in its core [46] . The Folding@home trajectory focuses on the opening of the unglycosylated, uncleaved spike configuration through the detachment of a single monomer from the spike's core, revealing its associated binding site. A similar (but not identical) monomeric-opening state has recently shown to be populated in ≈ 16% of the unbound spike population in a recent cryoEM study [5] .

The trajectory, from Zimmerman et al. [46] contains a 71-frame sequence that captures the most populated pathway of the spike protein's transition from its closed to open state. This path was calculated by means of a goal-oriented adaptive sampling algorithm (FAST, [45] ) to favorably sample spike opening.

Each frame consists of a list of N = 51, 671 atoms, represented by their positions x i=1..N , radius r i=1..N and fixed partial charge z i=1..N . Throughout these frames, the protein undergoes continuous deformations which shift the receptor-binding domain of one of the monomers from being hidden, while the spike is in its closed conformation, to being fully exposed once the spike opens. Figure 1a and 1b represent the protein's molecular structure in the open and closed configurations. Each one of its chains is depicted in a different color to illustrate the protein's trimeric structure. The revealed receptor-binding domain is located at the opening extremity of the red chain (see Fig. 1a ,1b).

To provide preliminary quantification of the spike opening, dozens of atom pairs on opposing sides of the spike were chosen and the distance between them measured as the spike transitions from one conformation to another (i.e. as a function of frame) ( Fig.1a and b) . One of the atoms in the pair was chosen from the binding interface, the monomer depicting in red in Fig.1(a and b) , while the second atom in the pair was chosen across the top of the spike opening. Although all pairs presented the same general trajectory, four of these atom pairs were arbitrarily chosen as a subset for illustration.

The resulting relative distance between the atom pairs is measured in Angstroms (Å), as shown in Fig. 1c . The lines labeled 1, 2, 3, and 4 refer to the behavior of the four-pair subset, with their average behavior depicted in black. All four measurements generally follow the same trend. This extension happens continuously outside of frames 50 to 55 which show abrupt variations.

To measure the total structural deformation we compute the root-mean-square deviation (RMSD). It measures the average distance between atoms in the current position and some reference configuration, defined here as the initial closed configuration. Specifically

where the superscripts n and 1 are for the current and initial states. Despite large variations at the initial and final stages, the RMSD evolution (depicted in Fig. 1d ), shares similar features with the evolution of the relative opening. These similarities suggest that the significant transformation that occurs during the opening are recapitulated by the 4 distances in Fig. 1c .

The S protein remains predominantly in the closed conformation to mask its receptor-binding domains (RBDs), thereby impeding their binding. To bind with ACE2, the S protein transforms into its open conformation, revealing its binding interface [46] .

In describing our results, we refer to the part of the spike containing the three RBDs as the top part, and the predominantly negatively charged portion binding to the virus capsid as the bottom part.

Our simulations (see Fig. 2a ) illustrate that the top part of the spike protein is predominantly positively charged, in accordance with the negative charge of the ACE2 binding site (see Fig. 3a) . Surprisingly, the top of the spike also reveals an underlying core with a surface area that has a dense negative charge. As the spike opens, this area could be exposed to the solvent, generating a negatively charged electrostatic cloud in the upper part of the protein, which may repulse the negatively charged ACE2 receptor. Therefore, we wanted to characterize the dynamics of the geometrical and electric properties of this negatively charged core χ (Fig. 2b) , the resulting repulsive cloud C (Fig. 2c) , the binding site B (Fig. 3c) , and the far electric field of the spike protein as they shift from the dynamics that occur during spike opening.

We defined the negatively charged core χ, illustrated in Fig. 2b , as the area of the SES in the top of the protein where the electric potential is more negative than a threshold value c χ = −5. Figure 2 c and d depict the evolution of its surface and average charge. As the spike opens, the area shrinks by about 50%, while the variation in the average potential remains small (≈ 10%). In both cases, the most significant variations happen during the first 10 iterations. In comparison, the molecular structure analysis (see Fig. 1 ) displays continuous variation over the entirety of the trajectory, indicating that the structural and electro-geometric transformations are non-trivially coupled. The diminution of both quantities could be explained by the fact that the spike opening removes a barrier, one of the monomers, between the core and solvent, causing the dilution of the surface charges in the solvent, and therefore a contraction of the core and its charge. 

We define the negatively charged electric cloud C, generated by the core χ, as the region in the upper part of the solvent where the potential is below the threshold value c C = −2 (see Fig. 2c ). The measurements presented in Fig. 2d and e indicate C expands by ≈ 42% as the spike opens, while its average potential decreases in magnitude by 30%. Again we interpret this phenomenon as an effective dissolution of the negative charges induced by the removal of the protecting monomer.

For both geometries χ and C, we observe large variations in the first 10 frames. The abrupt change (around frame 50) in the protein structure, discussed in section 2.1, is only perceptible in the size variation of the electric cloud C.

The interface between the SARS-CoV-2 receptor-binding domain (RBD) and ACE2 are of particular interest as the binding of the two facilitates virus entry into cells [2, 18, 20, 33] . Lan et al. [18] undertook the task of determining the residues of the RBD and ACE2 interface which form a connection, finding the location of the binding site on the spike, B, to consist of 17 residues between Lys417-Tyr505 and the ACE2 site to be formed of 20 residues between Gln24-Tyr83 and Asn330-Arg393.

For this analysis we define the binding site, B, for the spike and the ACE2 receptor as the portion of the proteins SES generated by the residues sequences [417−505] and [24−42] respectively. The spike RBD is generated by the sequence [333−527]. When the spike is in the closed position B is located near its center, but as the spike shifts to its open conformation, B is moved outward and exposed to the solvent, as seen in Fig. 4 . As the spike opens, we monitor the evolution of the binding site area, average absolute curvature, and potential. The average is computed over the binding site surface. We interpret the average absolute curvatures as measures of the global convexity of the binding site. For the potential evolution, we distinguish between the average potential, average positive potential, and average negative potential. Figure 3 provides a depiction of the resulting information. A relative decrease is observed for the area and average absolute curvature: as the spike opens the binding site shrinks and flattens, in each case constant, resulting in the total average potential of B following very closely with the changes in average ψ − and ultimately nearing 0 in the final frame. This increase in average potential is consistent with the knowledge that the ACE2 receptor it is designed to bind to is negatively charged, and may be important in directing the ACE2 binding to the correct region.

The spike electric far-field streamlines are depicted in Fig. 4 . In the closed configuration, the majority of streamlines emanating from the bottom part of the spike are pointing away from the spike. Only a few of them, caused by the rare presence of positive charges, return rapidly to the protein. In the upper part, the streamlines are predominantly closed, suggesting that in this configuration the spike protein may not be able to attract charges of any polarity at long range, and therefore has limited attracting potential. This may cause a lower affinity. As the spike opens, the streamlines in the bottom remain unchanged. However, at the top of the protein, the streamlines are now predominantly open. Fig 5(a)-(b) depicts the entire electric field structure along with the region in the solvent where the electric field point away from the protein, or equivalently where negative charges would be drawn to the spike. In the closed configuration, this region is predominantly located above the spike, while in the open configuration it is split into three sub-regions, each centered around one of the RBDs. In the latter, the sub-region centered around the revealed RBD represents 45% of the total region and carries 34% of its total electric potential energy.

To characterize the restructuring of the electric field of the S protein, we define the first four moments of the non-dimensional charge distribution q(x) = sinh(−ψ(x)) in the solvent

Dipole

Quadrupole

Octupole

Note that the last two moments are defined in their symmetric traceless form, as they would be for a standard electrostatic multipole expansion. Because here our continuum model is more complex, it should be reminded that these four moments are not guaranteed to be the ones appearing in the multipole expansion. Nonetheless, they remain a pertinent tool to characterize the structure of the electrostatic solution. Their evolution throughout the opening, illustrated in Fig. 5c , show that the strength of the dipole (D) is decreasing, while all other moments norms are increasing. The two most informative moments for the structure of the electric field, the dipole and the quadrupole, are diminished by 9% and increased by 67% respectively. The principal direction of Q associated to the positive eigenvector appears to remain quasiparallel to the dipole direction. In fact, all four directions (the dipole direction and the three principal directions of Q) appear to undergo the same rotation. The opening also reinforces the anisotropy in the tensor Q, by amplifying the difference between the magnitude of its eigenvalues, compressing one of its directions and effectively turning it into a two-dimensional quadrupole.

The overarching objective was to reveal the impact of the SARS-CoV-2 spike protein's transition from its open to closed conformation. Thus, we examined the entire protein's electrostatic potential dynamics, focusing particularly on its receptor-binding domains. In the Folding@home trajectory we analyzed, one of the three monomers detaches from the core of the complex and becomes visible to the surrounding environment, consistent with recent cryo-EM derived structures [5] . Our results show that despite the dramatic molecular displacement engendered by this strategic re-positioning, the geometric and electric properties of the RBD itself remain largely unaltered. Rather, as we describe below, changes emanating from the exposure of the core of the S protein cause a change in the electric fields surrounding the RBD.

Our continuum analysis, both of the surface potential and the volumetric potential in the vicinity of the binding region point to the existence of an inner negatively charged core on the surface of the spike, which is revealed to the solvent as the spike opens. This negatively charged surface shrinks as the spike opens, inducing a negatively charged region between the exposed RBD and the two hidden ones. The emergence of this electric cloud is a priori puzzling: the binding site of the ACE2 receptor being almost entirely negatively charged, we would expect this cloud to repulse the receptor. However, this cloud does not envelop the spike's binding site, B, which becomes more positively charged. This is in line with what we expect is a strong binder of the ACE2 receptor, and indicates that targeting the open state of the S protein may be a more viable drug design strategy than the closed configuration. Indeed, recent cryoEM studies indicated that ≈ 16% of a recombinantly expressed S protein population is in an open state, with a single monomer "erected" from the core [5] .

The electric field, which we observe uncoiling above the protein, exhibits a dramatic rearrangement. This is correlated to the emergence of the negatively charged cloud and manifests in the multipole moments decomposition. In particular, we observe the quadrupole moment growing in magnitude, rotating, and compressing one of its principal directions. This transformation can be interpreted as a strategic transition from an undirected configuration where the entire top part of the spike is attracting negatively charged structures, such as the ACE2 binding site, to one where the attraction is directed toward a specific RBD.

At the computational level, we have developed a novel framework, combining molecular dynamics simulations with a continuum physics numerical solver to produce high-resolution multiscale dynamical potential maps of deforming proteins and their surrounding environment. Our framework includes practical mathematical tools to quantify the numerical error and characterize protein electro-geometric properties. The molecular trajectories alone are insufficient for understanding protein electrostatic interactions and thus, protein core bio-mechanisms. On the other hand, it is unrealistic to envision an approach where trajectories of such a large protein could be obtained solely through continuum modeling. We believe hybrid approaches, such as the one presented here, can provide the scientific community with invaluable information, which we hope can be used to elucidate how changes to physical properties such as electrostatics translate into function.

The progression of time will uncover a multitude of breakthroughs regarding the behavior of the SARS-CoV-2 S protein. In the short time since the conception of our investigation, numerous discoveries by other researchers have been made public already. For example, the trajectory presented in [46] contains glycans, chemical compounds known to coat the exterior of many viruses, which may have an impact on the results presented here. Recall that the trajectory we explored here is simply one possible path the spike may follow, and so there is a possibility that a different trajectory would be a more accurate representation of the true function of the spike protein. Recent studies displayed cryo EM structures for the spike, presenting an open configuration that differs from the open configuration represented here [40, 36] . A similar analysis to the one presented here may be required on different structures, as more probable trajectories are predicted and novel molecular structures uncovered, to capture a more relevant image of the electric potential on and around the spike. Utilizing the same scientific pipeline, we are confident high-fidelity quantitative insights -into the electric potential of molecules unrestricted to the one investigated here -could be obtained efficiently, supporting the global scientific effort.

The Poisson-Boltzmann equation has long been recognized as the representation of choice to model electric potential. It has drawn significant interest from the computational community since the pioneering calculations of Warwicker and Watson in the early 1980s [38] , which has lead to the production of a broad variety of numerical solvers [4, 3, 6, 7, 13, 21, 17, 22] and open source software [12, 16, 32] , built over the traditional spectrum of numerical methods. Such tools are, for example, employed in the context of drug development and discovery, to calculate solvation free energies [10, 14, 28] . Using massively parallel architectures [11] , these calculations can be carried out on considerably large proteins, such as the entire HIV-1 capsid (Protein database entry: 3J3Q, 4, 884, 312 atoms).

This section describes the reconstruction of the potential map at each iteration from the current protein structure using the non-linear Poisson-Boltzmann (PB) equation. Because of the way the trajectory has been obtained, all frames are independent. Thus, the potential maps are decoupled and can be computed separately. A more comprehensive model, such as the Poisson-Boltzmann-Nernst-Planck model [43, 25] , would consider the diffusion of the ions and introduce time derivatives in the partial differential equations.

Because at the atomic scale (L = 1Å), for typical diffusivities (D ≈ 10 −9 m 2 s −1 ), the ionic diffusion time scale (τ D ≈ L 2 D = 10 −11 s ) is orders of magnitude smaller than the estimated opening duration (τ O >> 10 −9 s), these effects can be neglected and the static PB equation is a pertinent model. Readers not interested in the numerical specifics can skip the following sections (4.1 and 4.2).

A common way of portraying a molecular surface is by use of the van der Waals Surface depiction (see Fig. 6b ). Each atom in a molecule is depicted by a sphere with location x i=1...N and radius r i=1...N that is defined by an isosurface on their electron density. The union of these spheres, depicted in gray in Fig. 6 , forms the van der Waals Surfaces (vdWS) of that molecule. The nature of the vdWS means that some regions might be identified as being exposed to the solvent, while in fact, their geometry makes them inaccessible to the solvent particles. As a consequence, we define the Solvent Accessible Surface (SAS [19] ) of the protein, depicted as the blue outline in Fig. 6 b; it is formed through the addition of the solvent's particle radius, r P , to each r i=1..N resulting in a buffer around the vdWS. While the puffed-up SAS depiction may be useful for some areas of study the Solvent Excluded Surface (SES) is preferable when discussing surface details of a molecule [31] , as shown in Fig. 6b . As the name suggests, the inner molecule region defined by this surface includes all locations the solvent cannot occupy including the vdWS and the tiny crevasses on its exterior, shown as concave black triangles at the meeting of atoms A and B in Fig. 6b . 

To represent the biomolecule, we employ the level set method [30] and capture the SES location as the zero level set of an auxiliary field φ(x) defined over the domain of interest Ω. The solvent Ω − , the Solvent Excluded Surface Γ, and the inside of the molecule Ω + are defined as

The normal n to the interface Γ is defined as pointing toward Ω + . It is calculated as the normalized level set gradient n = ∇φ |∇φ| . 

7.039 × 10 3 Table 1 : Problem parameters and non-dimensional numbers.

The level set function is constructed following the approach proposed by Mirzadeh et al. in [27] . We start by constructing the level set function φ SAS (x) representing the SAS as

The SAS level set function is then reinitialized to be a signed-distance function (i.e. |∇φ SAS | = 1) by solving the reinitialization equation in fictitious time τ until the steady state is reached

From the reinitialized function R(φ SAS ), the SES level set function is then obtained as

As it was pointed out in [27] , this procedure can create non-physical inner cavities. They are identified by finding physical points disconnected from the contour of the computational domain ∂Ω where the level set function φ(x) is positive. In practice, such points are isolated by solving the following Laplace problem

and detecting where φ(x) < 0 and c = 0. The cavities are removed by switching the sign of the level set function at these problematic positions. Finally, for computational purposes the SES level set is systematically reinitialized.

The electrostatic potential, Ψ, around a biomolecule immersed in a binary z:z electrolyte solution can be described by the following nonlinear Poisson-Boltzmann (PB) equation,

where is the relative permittivity, being equal to + in the molecule (Ω + ) and − in the electrolyte (Ω − ).

0 is the permittivity of a vacuum, N A is the Avogadro number, c b is the bulk salt concentration, e is the elementary charge, k B is the Boltzmann constant, T is the temperature, z is the valence of the background electrolyte, q i and x i are the partial charge and position of the i th atom respectively, N is the total number of atoms in the molecule. c b (x) = 0 inside the molecule. In non-dimensional form, the Poisson-Boltzmann equation takes the following form

where the characteristic length L is chosen to be 1Å, the potential is scaled by the thermal voltage ( k B T e ), z i is the non-dimensional partial charge on the i th atom, and κ D = 2c b N A e 2 z 0kB T L 2 is the non-dimensional inverse of the Debye length. Inside the molecule, κ D is null. The constant λ B = e 2 k B T 0L is the non-dimensional Bjerrum length. The above non-dimensional PDE is completed with the following jump conditions on non-dimensional potential

where the jump operator is defined for any quantity ζ defined in both domain as [ζ] Γ = ζ + − ζ − . All parameters and non-dimensional numbers, along with their values for this study, are summarized in table 1.

Following [27] , we treat the singularities arising in the solution due to the singular charges inside the molecules by using the decomposition proposed by Chern et al. [8] . Doing so we split the non-dimensional potential ψ, into regular and singular parts:ψ andψ, respectively ψ =ψ +ψ.

The singular part is itself split into two parts ψ * and ψ 0

where ψ * is the Coulombic potential due to singular charges,

and ψ 0 satisfies the following Poisson's problem

Utilizing the decomposition shown above, the regular (non-singular) part of the solution is given by solving subject to the following jump conditions:

Since ψ * is known analytically, the gradient, ∇ψ * , appearing in the right hand side of (26) can be computed exactly

while ∇ψ 0 must be numerically approximated.

The numerical approach for the resolution of the above Poisson-Boltzmann problem is presented in this section, along with novel practical a postieri error estimates. We then verify the method for the entire trajectory by conducting a systematic convergence study for these estimators.

The numerical method, implemented on non-graded adaptive Octree grids, follows the general description given in [27] , with each snapshot of the protein treated independently. The surface and grid generation is done using the level set framework developed by Min and Gibou [23] . In particular, as Fig. 7 illustrates, the mesh is systematically adapted to the SES location. All quantities are stored at the nodes of the mesh for improved accuracy and facilitated manipulation. The numerical solutions of the Poisson systems (13)-(14)- (15) and (22)-(23) (for the cavities detection and the construction of the regular part of the solution ψ 0 respectively) are obtained using the second-order approach presented in [35] , itself based on [24] . The solution,ψ, to the problem defined by Eqs. (24) , (25) and (26) is constructed using a nodal version of the jump solver presented in [34] . The non-linearity of Eq. (24) is addressed using Newton's method [8, 26, 27] , with a relative error tolerance of 10 −6 , chosen to be orders of magnitude smaller than the desired overall numerical error. The whole method is parallelized in a shared memory fashion using OpenMP [9, 29] .

The entire computational domain is defined as the cube of side length 400Å (about twice the size of the whole protein) and center x c = 1 N N i=1 x i . Calculations were performed on Octrees of maximum level ranging from 7 to 11. On the finest grids, the minimal spatial resolution is 0.18Å.

To monitor the convergence of the overall method we construct a posteriori numerical error estimators. They only rely on the decomposition presented in section 4.1.4, and can therefore be employed independently of the numerical approach. For the current implementation, detailed in section 4.2, we refer the interested reader to [26, 27] for formal convergence studies, using analytic solutions and order estimations.

The error on the interface representation, gradient of the solution (i.e. electric field) and solution itself can be estimated using the following metrics e Γ , e E , e ψ

where |Γ| denotes the surface area of Γ. In virtue of Gauss's Theorem the first two integrals are null. Because of the boundary condition (23), the third quantity should also be null. When computing e Γ , because the gradient of the Coulombic and the total charge are known exactly, numerical errors can only arise from calculating the local normal n or approximating the surface integral. Therefore, this metric focuses on the geometry and its manipulation only. The numerical errors in the gradient of the component ψ 0 are the main source of errors in the second metric e E . Thus, it can be interpreted as a lower bound estimate for the average total normal electric field on the interface. The metric e ψ , the average error in ψ 0 on the interface, can similarly be used to estimate the numerical error in the total potential on the interface. Figure 8a depicts the spike protein's electrostatic potential at the initial frame (closed configuration) as the mesh is refined. Positive electrostatic potential is colored in blue, neutral (no charge) in white, and the negative electrostatic potential is shown in red. The coarsest simulations (max level = 7, 8) are only able to reproduce the general structure of the protein but fail at creating an accurate electrostatic map. As the maximal resolution reaches the characteristic atomic radius (r min = 1Å), the finest geometrical features are correctly reproduced, leading to appreciably more accurate results (max level = 9, minimal resolution 0.79Å). Further increasing the spatial resolution refines these molecular structures and the small scale potential variations even more (max level = 10, 11).

The time evolution of our three error estimates for all examined resolutions is presented in Fig. 8 . As expected, all three metrics converge with increasing resolution. The impact of using subatomic resolution (i.e. max level ≥ 9) is well illustrated with the convergence of the electric field and potential error estimates: it is unclear for super-atomic resolutions (max level = 7, 8), and evident for subatomic ones (max level = 9, 10, 11). The error estimation for the total potential (e ψ ) is significantly larger than the variations between consecutive maximum levels observed in Fig. 8a , which is a proxy for the error on the regular part of the solutionψ. Since e ψ involves the singular part of the solution, which exhibits large spatial variations over small length scales, it is prone to higher numerical errors and is expected to be larger than the actual error on the regular part of the solution. Closer inspection of our measurements reveals that these two metrics may differ by at least one order of magnitude. The error estimation for ψ, using max level = 10 is approximately equal to the maximum absolute surface potential value (≈ 10) observed on Fig. 8a . This misleadingly suggests the relative error on the total solution is as large as 100%. The comparison between the potential maps for max level = 10 and max level = 11 indicates that in practice the relative error on the surface potential probably lies between 1% and 10%.

From all these remarks, we are confident that the method is correctly implemented and that the most refined simulations accurately capture the S protein's electrostatic potential. For the finest resolution, the estimated average error on the total potential is close to 2, which in light of the above discussion suggests that the average practical relative error on the surface potential is under 2%.

Comparing the binding interactions in the receptor binding domains of sars-cov-2 and sars-cov

Improving implicit solvent simulations: a Poisson-centric view. Current opinion in structural biology

Implicit solvent electrostatics in biomolecular simulation. New algorithms for macromolecular simulation

Receptor binding and priming of the spike protein of SARS-CoV-2 for membrane fusion

A fast and robust poisson-boltzmann solver based on adaptive cartesian grids

Recent advances in implicit solvent-based methods for biomolecular simulations

Accurate evaluation of electrostatics for macromolecules in solution

Openmp: An industry-standard api for shared-memory programming

Implicit solvent methods for free energy estimation

Fast and scalable algorithms for constructing solvent-excluded surfaces of large biomolecules

Pb-am: An open-source, fully analytical linear poisson-boltzmann solver

Treatment of charge singularities in implicit solvent models

Problems of robustness in poisson-boltzmann binding free energies

Considerations around the sars-cov-2 spike protein with particular attention to covid-19 brain infection and neurological symptoms

Improvements to the apbs biomolecular solvation software suite

Electrostatics calculations : latest methodological advances

Structure of the sars-cov-2 spike receptor-binding domain bound to the ace2 receptor

The interpretation of protein structures: estimation of static accessibility

Functional assessment of cell entry and receptor usage for sars-cov-2 and other lineage b betacoronaviruses

Progress in developing poisson-boltzmann equation solvers

Recent Progress in Numerical Methods for the Poisson Boltzmann Equation in Biophysical Applications

A second order accurate level set method on non-graded adaptive Cartesian grids

A supra-convergent finite difference scheme for the variable coefficient Poisson equation on non-graded grids

Enhanced charging kinetics of porous electrodes: Surface conduction as a short-circuit mechanism

A second-order discretization of the nonlinear PoissonBoltzmann equation over irregular geometries using non-graded adaptive Cartesian grids

An adaptive, finite difference solver for the nonlinear Poisson-Boltzmann equation with applications to biomolecular computations

Accurate, robust, and reliable calculations of poisson-boltzmann binding energies

OpenMP application program interface version 5.0

Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations

Model the solvent-excluded surface of 3d protein molecular structures using geometric pde-based level-set method

Extending the applicability of the nonlinear poisson-boltzmann equation: Multiple dielectric constants and multivalent ions

Characterization of the receptor-binding domain (rbd) of 2019 novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine

Sharp numerical simulation of incompressible two-phase flows

A multigrid method on non-graded adaptive Octree and Quadtree Cartesian grids

Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein

Enhanced receptor binding of sars-cov-2 through networks of hydrogen-bonding and hydrophobic interactions

Calculation of the electric potential in the active site cleft due to alpha-helix dipoles

Cryo-em structure of the 2019-ncov spike in the prefusion conformation

Sars-cov-2 and bat ratg13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects

The sars-cov-2 outbreak: What we know

Biochemical characterization of sars-cov-2 nucleocapsid protein

Poisson-boltzmann-nernst-planck model

A novel coronavirus from patients with pneumonia in china

Fast conformational searches by balancing exploration/exploitation trade-offs

The authors would like to thank M. Zimmerman, G. Bowman, and the Folding@home project for creating and providing us with the S protein opening trajectory. We also thank D. Strubbe and J. Grasis for valuable discussions. This research was supported by a COVID-19 seed grant from the Center for Information Technology 

The authors declare no competing interests.