key: cord-0918758-xdbtwiqw authors: Mansbach, Rachael A.; Chakraborty, Srirupa; Nguyen, Kien; Montefiori, David C.; Korber, Bette; Gnanakaran, S. title: The SARS-CoV-2 Spike Variant D614G Favors an Open Conformational State date: 2020-07-26 journal: bioRxiv DOI: 10.1101/2020.07.26.219741 sha: 311858eb1b09bfa98f0c641b7b131265a6a8c6d6 doc_id: 918758 cord_uid: xdbtwiqw The COVID-19 pandemic underwent a rapid transition with the emergence of a SARS-CoV-2 variant that carried the amino acid substitution D614G in the Spike protein that became globally prevalent. The G-form is both more infectious in vitro and associated with increased viral loads in infected people. To gain insight into the mechanism underlying these distinctive characteristics, we employed multiple replicas of microsecond all-atom simulations to probe the molecular-level impact of this substitution on Spike’s closed and open states. The open state enables Spike interactions with its human cellular receptor, ACE2. Here we show that changes in the inter-protomer energetics due to the D614G substitution favor a higher population of infection-capable (open) states. The inter-protomer interactions between S1 and S2 subunits in the open state of the D-form are asymmetric. This asymmetry is resolved in the G-form due to the release of tensile hydrogen bonds resulting in an increased population of open conformations. Thus, the increased infectivity of the G-form is likely due to a higher rate of profitable binding encounters with the host receptor. It is also predicted to be more neutralization sensitive due to enhanced exposure of the receptor binding domain, a key target region for neutralizing antibodies. The COVID-19 pandemic underwent a rapid transition with the emergence of a SARS-23 CoV-2 variant that carried the amino acid substitution D614G in the Spike protein that became 24 globally prevalent. The G-form is both more infectious in vitro and associated with increased viral 25 loads in infected people. To gain insight into the mechanism underlying these distinctive 26 characteristics, we employed multiple replicas of microsecond all-atom simulations to probe the Thus, the increased infectivity of the G-form is likely due to a higher rate of profitable binding increase in the flexibility of residues 610 to 650 in the G614 form ( Fig. S9G-J) . In general (Figs 136 S9-S10), we do not observe compensatory hydrogen bond formation with THR 859. Thus, it is 137 conceivable that this hydrogen bond reduction leads to a conformational relaxation. The distinct contact signatures associated with the D614G substitution may be associated 143 with alterations in the relative population of the all-down and one-up ensembles of the Spike 144 protein (Fig. 2) . For the G-form, the symmetry in the number of contacts among the inter-145 protomer interfaces suggests that the energetics between the protomers are similar. Since the G-146 form preserves these interactions in both the all-down and one-up conformations, the interaction 147 energy between up-and down-protomers would be similar to the energetics between two down-148 protomers. Inspired by this rationale, we use an Ising model (12) to demonstrate how the D614G 149 substitution can affect the relative stability of the open and closed Spike conformations. In this 150 model, each protomer can adopt an "up" or "down" state, and each protomer interacts with its two 151 neighbors with an energy that depends only on the states of both protomers. Defining the Spike as 152 a periodic three-spin system, the Ising model suggests that symmetrization of the energetics of the bond. We hypothesize that these three symmetrizations lead to a decrease in the energy 162 differences between the all-down and one-up states, leading to an increase in infectivity through 163 an increase in the one-up population of the G-form. 164 165 One-up state enhances exposure of RBD to ACE2 binding and epitope sites 166 Since Spike is a glycoprotein, we also modeled the impact of highly-flexible glycan 167 ensembles on RBD exposure and epitope shielding using a quantitative measure called the glycan 168 encounter factor (GEF) (13) . A low GEF value indicates a high exposure of the protein surface in 169 a given region. We perform GEF analysis here to assess changes in exposure between all-down 170 and one-up states and between the D-and G-forms (for details on local changes see 171 Supplementary Material and Fig. S11 ). Independent of D-or G-forms, the ACE2 binding site 172 and epitope regions are significantly more exposed in the one-up state than in the all-down state 173 ( Fig. 4A-B) . Specifically, the exposure increases by ~40% at the Receptor Binding Motif 174 (residues 438-506). This is accompanied by larger exposure of both the ACE2 binding site and 175 RBD targeting antibody epitopes. For example, there is a ~64% increase in exposure of the C105 176 epitope region (7) when the RBD adopts the open state, as opposed to the closed orientation 177 where it is buried by surrounding protein and glycans ( Fig. 4C-E) Further, they found that the G614 form is more sensitive to neutralization by sera raised from 193 D614 form vaccinations in animal models (14) . Our simulations provide mechanistic details that 194 can explain the increased occupancy of the one-up state in the G614 form, which in turn accounts 195 for both its enhanced infectivity (5), (6) and neutralization sensitivity (14) . 196 We found that the different interactions between protomers in the D614 form are different 197 in the one-up and all-down states. By contrast, in the G614 form there is a symmetrization in the 198 number of inter-protomer contacts between S1 and S2 subunits, which is associated with 199 allosteric fluctuations in RBD, distal to the 614 site. Thermodynamically, the effect of 200 symmetrization can be captured using a periodic three-spin Ising model. Using the observed 201 50:50 ratio of all-down to one-up conformations in the D614 form (4), we show that 202 symmetrization of the G-form energetics in comparison to the D-form would lead to a 75:25 ratio 203 of one-up to all-down. It is noted that this ratio will become higher if there is a concomitant 204 deviation from symmetry in the all-down state of the G614 form. The shift towards one-up Spike 205 trimers would increase the likelihood of binding events of RBD and ACE2, thus explaining the 206 experimentally observed increase in infectivity (5) and sensitivity to neutralization activity in from all-down to one-up; the glycan coverage disappears at the apex of the trimer when in the 210 one-up conformation. Thus, regardless of the D-or G-form, the one-up conformation exposes the 211 RBD for binding to the ACE2 receptor while simultaneously exposing more of the RBD protein 212 surface for antibody binding to RBD epitopes. There are no significant global changes coming 213 from the D614G substitution on the ACE2 binding site exposure in RBD once the Spike trimer is 214 in the one-up conformation. Thus, given the natural preference for a more open Spike 215 conformation in the G-form, it is possible that this form may have advantages as a vaccine 216 antigen. 217 In summary, our findings suggest that the overall protein exposure remains globally 218 similar between D-and G-forms, but predicts a dramatic increase in the one-up state with the G-219 form, meaning that both ACE2 binding and RBD-targeting antibody binding are likely to increase 220 in the one-up state. Therefore, a change towards a higher one-up state population is likely the 221 dominant effect of the D614G substitution. The mechanistic studies presented here, the structural 222 data (14) , the experimentally determined increase in infectivity (5) , and the enhanced neutralizing 223 antibody sensitivity (14) all come together in a consistent story. 224 225 Acknowledgments 226 We thank Drew Weissman for sharing unpublished neutralization data and being willing to 228 consider co-submission of our papers. We thank Priyamvada Acharya for sharing unpublished 229 negative stain EM data on G614. We are grateful to Paul Weber for securing extensive 230 computational resources that made this study possible. protomer. For instance, LS1-U2 represents the interactions between the S1 region of the L-down protomer and the S2 region of the Atomic structures derived from cryo-EM (PDB IDs: 6VXX and 6VYB; Walls et al. (1)) were used to prepare the all-down and one-up configurations for simulation. As described in Ref (1), residues 986-987 were kept as Proline-Proline (PP), and Fusion Peptide residues 682-685 as SGAG instead of the viral form residues KV and RRAR, to maintain the stable soluble form of the spike extracellular domain protein. In the original 6VXX and 6VYB models, several flexible regions were unresolved, which vary in lengths from 2-27 residues. We used a datadriven structure-based modeling approach to build a complete model that accurately captures secondary structures of these missing regions, without introducing artificial 'knotting' of loops. For this, we employed homology modeling using numerous SARS-COV1 structures as templates (see Supplementary Information) . Missing residues in the RBD were built using an ACE2bound SARS-CoV2 structure (PDB: 6M0J) (2). Structure-based sequence alignment was performed using the 3D-Coffee program (3) and homology modeling was performed using the MODELLER 9.20 suite (4). We generated 10 models for each configuration and selected the top model based on DOPE and Procheck scores (5), (6) . After testing that CryoEM fitting did not significantly improve the model employing Chimera 1.13.0 (7) rigid fitting, and the MDFF (8) flexible fitting, we arbitrarily chose as starting structures the all down model after 2 ns in vacuum with gscale = 0.3 and a further 2 ns in vacuum with gscale = 0.2 and the one up model after 2 ns in vacuum with gscale = 0.3. To avoid artificial charges at the protein ends, we introduced N-terminal acetylated and C-terminal N-methylamide capping groups. All histidines were modeled as the neutral tautomer where the epsilon nitrogen is protonated. 15 disulfide bonds found in 6VXX and 6VYB were modeled in the forcefield for simulation. All-atom explicit-solvent simulations were performed with the Gromacs v5.1.2 and 2018.3 software packages (9), (10), using the CHARMM36m (11) forcefield and TIP3P water model (12) . Each configuration was solvated and centered in a cubic box. The side length of the box was defined such that there is at least 15Å padding around the molecule. Each system was neutralized with an excess of 150mM KCL. Energy minimization was conducted using the steepest descent algorithm. Equilibration simulations were performed under constant numbervolume-temperature (NVT, 2ns) and constant number-pressure-temperature (NPT, 10ns) ensembles. During the equilibration stages, harmonic position restraints were imposed on all heavy atoms of the molecule. Temperature coupling was achieved using Langevin dynamics at 310K with a relaxation time of 1ps (13) . The Berendsen barostat with isotropic coupling was employed to maintain a constant pressure of 1 bar, with a relaxation time of 4ps and compressibility of 4.5x10E-5 / bar (14) . Covalent bonds were constrained by implementing the LINCS algorithm (15) . Van der Waals interactions were evaluated using a cutoff where forces smoothly switch to zero between 1.0 and 1.2nm. Coulomb interactions were calculated using the particle mesh Ewald (PME) method, with a cutoff of 1.2nm, a Fourier spacing of 0.12nm, an interpolation order of 4, and a tolerance of 1x10E-5 (16) . Unrestrained production simulations were performed in the NPT ensemble, with an integration time step of 2fs. For each configuration, we simulated five replicas. Each replica was run for 1.1 µs. The first 100 ns of each run was considered as further equilibration and was not included for analysis. In total, we generated 20 1-µs production simulations. We used the g_contacts plugin for Gromacs (17) to identify contact frequencies of all residue-residue contacts formed, both intra-and inter-protomer. A contact was defined as any heavy atom of one residue coming within 6Å of the heavy atom of another residue. For the contact analysis, we defined persistent contacts as any of the identified contacts that appeared with a frequency of greater than or equal to 0.5 -that is, that appeared in at least 50% of analyzed simulation frames. Average contacts with region S2 excluded the CD1 domain because of observations that it "frayed" in simulation and was extremely flexible, which we presume to be an artefact due to the lack of embedding of the Spike in the viral membrane. The cross-correlation matrix is defined as which is the normalized covariance where %& ! (() is the fluctuations of atom i with respect to its average coordinates. We used the -cov, -norm, and -dot flags of the program carma to perform the analysis (18) . We performed the hydrogen bond analysis with VMD 1.9.3 Hbonds plugin (19) , where the definition of a hydrogen bond was the requirement of donor-acceptor atoms being polar, distance cutoff of 4Å and angle cutoff of 30 o . This is a significantly more stringent criterion than that employed for the contact identification. A cutoff of 10% occupancy was used to identify extant hydrogen bonds. There are 22 potential N-glycosylation sites (PNGS) in the Spike, each of which has a unique glycoform distribution. We chose glycan types with the highest occurrence at each site, as determined by mass spectrometry studies (20) for 19 of these glycans present within our model range (see Fig. S11A and Table S3 ). There are two putative O-glycan sites on the Spike; however, they have been reported to have less than 2% occupancy (21) and were therefore not included in our models. We performed a Jarvis Patrick based cluster analysis where a structure is added to a cluster if it has at least 6 neighbors in common with another neighbor. Clustering was performed on all five replicas, for each of the four different systems, using the "gmx cluster" command with a cutoff of 0.2 nm to identify a subset of distinct conformations based on Ca RMSDs. We identified 63 clusters for D614 all-down, 78 clusters for D614 one-up, 54 clusters for D614G alldown, and 103 clusters for D614G one-up. We took the closest to the mean structure from each of the top twenty largest clusters of each system, and used these snapshots as the basis for building glycan models. Glycan ensemble modeling was performed using a previously established pipeline (22) with the ALLOSMOD (23) suite of MODELLER. An initial glycan conformation was added at each PNGS with randomized atomistic deviations from CHARMM36 ideal carbohydrate geometries, and random orientations. This was followed by simulated annealing relaxation of the entire glycoprotein. 50 glycosylated conformations were built on each of the 20 selected models, resulting in an ensemble of 1000 distinct glycoprotein conformations for each of the four systems. Glycoform for each PNGS was chosen as the one with the highest probability of occurrence from site-specific mass spectrometry results (20) . The Glycan Encounter Factor (GEF) was calculated for each residue exposed on the surface of the protein as established in our previous study (22) . It is defined as the number of glycan heavy atoms encountered by an approaching probe of 6Å diameter mimicking a typical hairpin loop of antibodies interacting with epitopes. Geometric mean of this value measured in the three cardinal directions (perpendicular to the surface, and along the plane) was taken to cover different orientations. Although the results from these extensive MD simulations represent an extensive investment of computational resources, large-scale conformational shifts in the dynamics, such as the actual transition between the all-down and one-up states, are beyond the microsecond timescales considered here. Nonetheless, as long as they are applied on the appropriate timescale within the known limitations of the MD force field, the results of this article are of great significance in terms of explaining the molecular-level changes that occur as a result as a result of the D614G amino acid shift and its effects on the stability of different conformational states. Also, the current structure does not include the heptad repeat 2, trans-membrane or cytoplasmic regions that could differentially alter the presentation of Spike on the membrane in the G-form. We cannot comment directly on the S1/S2 cleavage site conformation or the effect of two prolines used for stabilization, as we used the sequence from Walls et. al (1) that was mutated to stabilize a soluble protein. where N is the total number of atoms, " = ∑ -! ' !() is the total mass, ri ref is the coordinates of the 27 reference structure, and ri is the coordinates of the ith frame. All calculations were performed with 28 the gmx rms tool from Gromacs 5.1.2 (6) . 29 Root mean square fluctuation (RMSF) of atom i is calculated as, 30 We also investigated spatial details of the fluctuations by presenting the RMSFs of each 51 atom in the RBD. In the D-form (Fig. S1 E,H) noted that these trends are largely preserved in the G-form (Fig. S1B,D) region. We observe that due to the increased flexibility around 614 resulting from the D to G 73 change, the glycan at 616 can now sample a larger volume and interact with residues 46, 280 and 74 281 in the neighboring protomer in ~24% of the ensemble (Fig. S11C) . Due to this reorientation 75 of glycan 616 near the D614G, there is an overall increase in glycan coverage for residues 826 to 76 855 in the all-down conformation (Fig. S11C) . 77 S2 . S1-S1 and S2-S2 contacts. Average total number of contacts at the S1-S1 and S2-S2 interfaces in (A) the all-down system 105 and (B) the one-up system. For each set of simulations, error bars were calculated as standard error across five replicas. In the x-axis 106 of all panels, 'L' denotes the L-down protomer, 'U' the Up protomer, and 'R' the R-down protomer. For instance, LS1-US1 107 represents the interactions between the S1 region of the L-down protomer and the S1 region of the Up protomer. LU: !"# ± !# !%# ± !# !!& ± % !'" ± ( UR: !"# ± !# !!) ± & !%" ± & *( ± ) RL: !"& ± ( !%( ± !# !"& ± ( !%( ± !# Here we use a periodic three-spin Ising model to describe the occupation probability-and hence the population-of a particular thermodynamic macrostate through the Boltzmann distribution, which includes taking the exponential of the energy. It captures how a change in energies can lead to a large shift in the ratios of the respective populations. Consider the one-dimensional periodic Ising model not in an external field, in which the Hamiltonian is defined as, where the sum is across all nearest-neighbor pairs, including the pair {0, N} where N is the total number of spins. We may think of the COVID Spike trimer in a highly simplified way as being a three-spin periodic Ising model, where each of the protomers is a "spin" that can take on an "up" state or a "down" state. Let us assume the following definitions for the interaction energies: E UU is the magnitude of the interaction between a pair of protomers both in the up state, E DD is the magnitude of the interaction between a pair of protomers both in the down state, E UD is the magnitude of the interaction between protomers where the S1 region of the protomer in the up state interacts with the S2 region of the protomer in the down state, and E DU is the magnitude of the interaction between protomers where the S2 region of the protomer in the up state interacts with the S1 region of the protomer in the down state. We do not assume these are equal since there is a known asymmetry in these interactions. Then the Hamiltonians of all microstates in the system are given as follows, where we assume all interactions are overall favorable with negative interaction energies at body temperature, room temperature, etc. The probability of being in a particular microstate is given by the Boltzmann distribution, where S 2 {U, D}, = (k B T ) 1 is the Boltzmann factor, and Z ⌦ is the normalizing constant or partition function. Then let us consider the ratios of di↵erent macrostates to one another. We have four possible macrostates: all down (none up) (0U ), three (all) up (3U ), one up (1U ), and two up (2U ). Of these, all down and three up correspond to a single microstate, and one up and two up correspond to three microstates. Then the set of potentially interesting probability ratios is, P (3U ) P (2U ) Assuming that most Cryo-EM and similar experiments have been performed at equilibrium, we note that there is a general lack of observation of the 2U and 3U states, which suggests that the di↵erence between E UU and E DD is large with E UU ⌧ E DD , which would lead to a rapidly vanishing exponential in the comparison between 2U and 0U , 3U and 0U , and 2U and 1U , etc. Since most studies have seen an approximate 50/50 = 1 ratio of 1U to 0U populations, this gives an order of magnitude estimate for the di↵erence in E DD and E DU if we assume E DU ⇡ E UD , T = 310 K, and k b = 1.38 ⇥ 10 23 m 2 kg s 2 K 1 , of, 3e (2E DU 2E DD ) ⇡ 1 (12) E DU E DD ⇡ k B T log(3) 2 ⇡ 10 21 m 2 kg-s 2 (13) or approximately -0.006 eV, which is still a significant energy di↵erence when considering the system as single particles in a highly idealized model. If our results do point to an equalization of the energies between E DU and E DD , then that would point to a shift in population from a 1:1 to a 3:1 ratio for 1U :0U as the argument of the exponential goes to zero and the exponential goes to one, meaning 75% 1U and only 25% 0U , which could already be responsible for an increase in transmissibility. It is unclear from our current results whether the mutation could also e↵ect the E UU interaction, but it is certainly not impossible. Since the Boltzmann distribution is an exponential, even relatively small changes in the argument can lead to large changes in the overall population of the states. A good avenue for future work would be to investigate the 2U and 3U states if possible. One caveat to this interpretation is that it solely addresses thermodynamic contributions and does not consider the rates at which di↵erent states can tran-sition from one to the other. If the rate of state transition is very low in comparison to the lifetime of the Spike between creation and ACE2 binding, then the thermodynamics are not the relevant quantity and what is more relevant is the initial configuration and-assuming that to be all down-the ease with which it can transition into the 1U conformation. This potential kinetic aspect of state transition would need to be addressed through di↵erent techniques, but it is also possible that the relaxation of the hydrogen bonding network induced by the mutant could have a significant impact on the kinetics as well as the thermodynamics. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee Comparative Protein Structure Modeling Using Modeller Statistical potential for assessment and prediction of protein structures PROCHECK: a program to check the stereochemical quality of protein structures UCSF Chimera--a Visualization System for Exploratory Research and Analysis Molecular dynamics flexible fitting: A practical guide to combine cryo-electron microscopy and Xray crystallography GROMACS: fast, flexible, and free GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers CHARMM36m: an improved force field for folded and intrinsically disordered proteins Comparison of simple potential functions for simulating liquid water Efficient algorithms for langevin and DPD dynamics Molecular dynamics with coupling to an external bath LINCS: A linear constraint solver for molecular simulations A smooth particle mesh Ewald method g _ contacts : Fast contact search in bio-molecular ensemble data Grcarma: A Fully Automated Task-Oriented Interface for the Analysis of Molecular Dynamics Trajectories VMD: Visual molecular dynamics Site-specific glycan analysis of the SARS-CoV-2 spike Deducing the N-and Oglycosylation profile of the spike protein of novel coronavirus SARS-CoV-2 Quantification of the Resilience and Vulnerability of HIV-1 Native Glycan Shield at Atomistic Detail All-Atom Ensemble Modeling to Analyze Small-Angle X-Ray Scattering of Glycosylated Proteins Spike ASN residue Glycan Type Constituent pyranose rings 17 Fucosylated 3-antennae (FA3) Fucosylated 3-antennae (FA3) HexNAc(5)Hex(3)Fuc(1) Fucosylated 2-antennae (FA2) HexNAc(5)Hex(3)Fuc(1) Fucosylated 2-antennae (FA2) HexNAc(5)Hex(5)Fuc(1) Fucosylated 2-antennae (FA2) HexNAc(5)Hex(3)Fuc(1) Fucosylated 2-antennae (FA2) Fucosylated 2-antennae (FA2) HexNAc(5)Hex(3)Fuc(1) Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein JPred4: a protein secondary 189 structure prediction server I-TASSER: A unified platform for automated 192 protein structure and function prediction Comparative Protein Structure Modeling Using Modeller PROCHECK: a program to check the stereochemical quality of protein structures GROMACS: High performance molecular simulations through 202 multi-level parallelism from laptops to supercomputers Site-specific glycan 205 analysis of the SARS-CoV-2 spike Matplotlib: A 2D Graphics Environment The PyMOL Molecular Graphics System, Version 1.8 Glycans on the SARS-CoV-2 Spike Control the Receptor Binding 211 Soluble form of the SARS-CoV-2 Envelope spike protein was modeled based on available PDB 3 structures. Structure 6VXX was used as the base template for all RBD down, closed model and 4 6VYB was used for the one-up model (1) . These cryo-EM structures have several missing residues 5 around the flexible loops. Ten such segments are more than 4 residues long, some are missing up 6 to 29 residues. Analyzing the sequence with secondary-structure prediction-software JPRED-4 (2) 7 and iTASSER (3), shows that many of them may have stabilized secondary structure features of 8 helices and beta strands. Moreover, modeling of such long loops ab initio may lead to artifacts and 9'self-knotting'. Hence, we utilized available CoV-1 structure data as starting template to fill in these 10 gaps whenever possible. Templates used for different loops are provided in Table S2 . 11Structure driven sequence alignment between CoV-1 and CoV-2 was used to determine template 12 ranges, and is provided in Fig. S12 . Following template selection, homology modeling was 13 performed using a Variable Target Function Method of refinement, and 300 steps of template-14 restrained MD, with MODELLER 9.20 suite (4). 10 models were generated for each of the four 15 systems. Model selection was based on DOPE score and PROCHECK stereochemistry scores (5) . 16 15 disulfide bonds between cysteines pairs were identified and restrained with patches during 17 modeling and appropriate force terms for these bonds were included for simulations.