key: cord-0297393-xusm06t4 authors: Tekin, E.Deniz title: Investigation of the Effects of N-Linked Glycans on the Stability of the Spike Protein in SARS-CoV-2 by Molecular Dynamics Simulations date: 2022-01-12 journal: bioRxiv DOI: 10.1101/2022.01.07.475397 sha: 448293a9a185dd537ebfcec56cc906ef747d58d1 doc_id: 297393 cord_uid: xusm06t4 We perform all-atom molecular dynamics simulations to study the effects of the N-linked glycans on the stability of the spike glycoprotein in SARS-CoV-2. After a 100 ns of simulation on the spike proteins without and with the N-linked glycans, we found that the presence of glycans increases the local stability in their vicinity; even though their effect on the full structure is negligible. Graphical Abstract Main structural components of SARSCoV-2 are made up of several different types of proteins: the transmembrane glycoprotein, the spike (S) glycoprotein and the envelope protein; with each protein having definite functions in binding to a receptor and sneaking into the host cell. The largest of these proteins, the S protein, assumed to be the most important one in entering the host cell, is the target of antibodies. For this reason, the three-dimensional shape and dynamics of this protein is extremely important to understand [1] . The SARSCoV-2 S glycoprotein exists as a homotrimer with each monomer composed of two sub-units: the S1 subunit that binds to the host cell receptor, and the S2 subunit that takes part in the fusion of the viral and the cellular membranes. In addition, a furin cleavage site at the boundary between the S1/S2 subunits exists [2] . The boundary, cleaved during the biosynthesis, is a novel feature of this virus which does not exist in the other known coronaviruses such as the SARS-CoV and the SARSr-CoV. These two domains of the S glycoprotein are shown in Figure 1 . Walls et al. [1] determined the cryo-EM structures of the SARSCoV-2 S glycoprotein in two distinct conformations and stored the data as 6VXX.pdb (closed SARS-CoV-2 S) and 6VYB.pdb (SARS-CoV-2 S with one S B open) in RCSB PDB. Various related coronavirus S glycoproteins are extensively patched with heterogeneous N-linked glycans (NAG) hanging from the trimer surface that are important for proper folding [3] and "for modulating accessibility to host proteases and neutralizing antibodies" [4] [5] [6] . Extensive N-linked glycans have also been characterized at the interface of SARS-CoV-2 S ( Figure 2 ); and the existence of glycans is believed to be one of the reasons why the coronavirus has caused a large number of infections and mortality [7] [8] [9] . Some comparative studies have been conducted between the coronavirus spikes [10] [11] [12] [13] [14] and the coronavirus spikes with other viral proteins [15, 16] . Wang et al. summarized the importance of the S protein glycans in viral entry and antibody production in [17] . To date, a large amount of experimental [18] [19] [20] [21] [22] [23] and theoretical studies [24] [25] [26] [27] [28] [29] [30] [31] [32] have been conducted elucidating the structure of the coronavirus and how it infects. Among them, Amaro et al. [33] showed that two N-glycans at positions N165 and N234 have an important structural role in stabilizing the receptor-binding domain (RBD) of S1 "up" conformation using all-atom molecular dynamics simulation (MD). Another MD study conducted by Woods et al. [9] revealed the importance of the S protein glycan in shielding the immune recognition. 16 out of 22 SARS-CoV-2 S N-linked glycans (NAGs) per monomer (in total 48) were observed in cryo-EM [1] (Table-1, Figure 2b ); however, computer simulation of closed form (6Vxxx.pdb) spike protein with NAG molecules is not easy for the reason that the existing force fields are generally defined for amino acids, nucleic acids and for some special cofactors such as the ATP; but not defined for such mixed molecules. So, I introduced the NAG-linked amino acids to the system as pseudo-amino acids. Details are in the "Material and Methods" section. and SARS-CoV-2 S with NAG (S-with-NAG) have been performed to better understand the role of the NAG molecule on the stability of the S protein. The structure of the SARS-CoV-2 S glycoprotein (closed state) was taken from the Protein Data Bank with the 6vxx code [1] . The missing residues were added using the Swiss Pdb-Viewer program [34] in order to complete the structure before running the molecular dynamics simulations. The acetyl (ACE) and amino (NH2) groups were capped to each -mer of the S protein, in order to keep the protein in a fully folded state. The difficulty one encounters is that the NAG molecule, and so the NAG-linked amino acid, is not a known entity in any force field of GROMACS [35] . In the S protein, NAG is linked to the Asparagine (Asn) in two configurations: Mono-Saccharide NAG bound to ASN which I called as ANX ( Figure 3a ) and Poly-Saccharide NAG residues bound to ASN which I called as NAN ( Figure 3b ). To be able to introduce the NAG-linked amino acids, ANX and NAN, into an existing force field, one needs to parametrize them. To this end, geometries of the ANX and NAN were created by ArgusLab [36] and the bonded parameters (bonds, angles, dihedrals and impropers) were added to the aminoacids.rtp file of Gromacs 54a7 force field [37] . The partial charges derived from the electrostatic potential were obtained by single point calculations at the B3LYP-D3/6-31G**++ level of theory using the Jaguar 10.7 program package [38] . Then, ANX and NAN were defined as Protein in the residuetypes.atp file. Thus, we obtained the complete model of the S glycoprotein of SARS-CoV-2 (S-with-NAG). Besides, in order to understand what effects the NAG molecule has on the S protein, we also created a system which is devoid of the NAG molecules (S-without-NAG) for comparison. S-without-NAG and S-with-NAG proteins were put in separate rhombic dodecahedron box filled with a SPC [39] type water molecules. Minimum image convention is set with the S-trimers in the center of the boxes and at a distance 1.0 nm to the edges of the boxes. We added 15 Na + ions (5 Na + ions permer) to neutralize the charge of the systems. Having obtained electro-neutral S-trimers in boxes of water, a two-stage minimization was carried out with the steepest-descent algorithm for S-with-NAG; first with define=-dflexible option and then run without it (rigid water) and latter minimization for Swithout-NAG. Following minimizations, each structure was equilibrated in two steps as 100 ps NVT and 100 ps NPT respectively with position constraints for the protein, to stabilize the temperature at 300 and long-range electrostatic interactions were calculated using the particle mesh Ewald (PME) [44] algorithm. Long range dispersion corrections for both the energy and the pressure were applied to crank into the truncation of van der Waals terms. Periodic boundary conditions were applied in the x, y and z directions. All the calculations were carried out using the GROMACS 2019 [35] code and the GROMOS 54a7 force field [37] extended to include the ANX and NAN residues. The snapshots were obtained using the visual molecular dynamics (VMD) software [45] . The coronavirus spike protein has a coating of N-linked glycans in various locations (see Table 1 ). To get the glycan bound to the S protein, we modified the Gromos 54a7 force field including ANX and NAN residues. Then, to understand if the NAG molecules lead the changes in the dynamics of the S protein, we analyzed the MD trajectories in the presence (S-with-NAG) and absence of NAG (S-without-NAG). Structural stability along the simulations was analyzed using the backbone root-mean-squaredeviation (RMSD) relative to the energy-minimized configuration of the starting structures in three different manners. First, the RMSD of the S-without-NAG and S-with-NAG were compared. The RMSD values for both systems range from 0.5 nm to 0.6 nm which are consistent with the results of [27] for the corresponding temperature; and it was observed that both systems reach stability after the first 20 ns of the simulation time (Figure 4a) . Second, since the S protein is made up of two domains, S1 and S2, we also analyzed the RMSDs of these parts without and with NAG separately. The results are as follows. -RMSD values of the S1 region is greater than that of the S2 region for both without and with NAG structures ( Figure SI-1a and Figure SI-1b) . Thus, it might be concluded that the S2 region is slightly more stable than the S1 region which is consistent with [27] -For the S1 region (Figure 4b ), after the 20 ns simulation, the difference in the RMSD values between S1-without-NAG and S1-with-NAG ranges from 0.04 nm to 0.09 nm and S1-with-NAG has greater RMSD values than the S1-without-NAG. -In the S2 region (Figure 4c) , the difference in the RMSD values between the S2-without-NAG and S2-with-NAG ranges from 0.02 nm to 0.07 nm and these differences are less than that of the S1 region. Moreover, unlike the S1 region (Figure 4c ), S2-wihout-NAG has greater RMSD values than S2-with-NAG. It can be inferred that the presence of the NAG molecule makes the S2-domain more stable. Third, since the NAG molecules are known to have important roles in the viral infection, we examined the vicinity of the NAG-linked residues and the corresponding ASN residues. Namely, we calculated the RMSD values between the NAG-linked amino acids (ANX and NAN; Table 1 ) and only amino acids (that is ASN aminoacids without NAG). Striking differences between them are observed; that is to say the RMSD value of NAG-linked amino acids, which is 0.9 nm and below, appears to be more stable than the one without NAG which is 2.7 nm on average (Figure 4d ). It can be inferred that the N-linked glycans make the ASN amino acids more stable. As a measure of compactness, that is to get an idea of the overall dimensions of the structures, the radius of gyration (Rg) was calculated throughout the simulations in three different ways. As in the case of RMSD results, gyration (Rg) calculation of S-without-NAG and S-with-NAG (Figure 5a) showed that the two structures exhibited nearly similar fluctuations. Gyration values of the S1 region is greater than the S2 region for both without and with NAG structures ( Figure SI-2a and Figure SI-2b) ; 4.5 nm for S1 and 3.5 nm for S2 for both. S1 and S2 regions in S-with-NAG (Figure SI-2c) are larger than S1 and S2 regions in S-without-NAG ( Figure SI-2d) , as expected. However, if one zooms into the vicinity of the NAG-linked amino acids, one observes that the structure without NAG is more floppy; compactuncompact, than the structure with NAG ( Figure 5b) . Moreover, solvent accessible surface area (SASA) as a function of time was analyzed. SASA plot of Swithout-NAG and S-with-NAG showed no considerable difference ( Figure 6a ); that is, there is no remarkable shielding by glycans (consistent with [9] ). SASA of S1 is greater than S2 for both structures ( Figures SI-3a and SI-3b) . The presence of NAG did not cause much fluctuation in the SASA of the S1 region ( Figure 6b ), but reduced the fluctuation in S2 region (Figure 6c ). SASA of NAG-linked amino acids We calculated the solvent accessibility of the individual glycan residues and the corresponding ASN residues. As seen from the Table 2 and Figure 6d , for all three-chains, residues ANX and NAN have almost 2 times more solvent-accessible surface areas than the corresponding ASN residues without NAG. Also, NAN has a larger solvent-accessible surface area compared to ANX. In the case of hydrogen bonds ( Figure 7 ) the cut-off distance and the cut-off angle criteria were set to The spike glycoprotein is the trimer structure decorated with the NAG molecules, located at the outermost part of the SARS-CoV-2, which enters the respiratory system by interacting with the human receptor ACE2. Some experiments and theoretical simulations show the importance of the NAG molecule on the stability of the different molecules. At the molecular level, however, the effect of NAG molecules on the SARS-CoV 2 structure is still not clear. In this study, 100 ns MD simulations of closed SARS-CoV-2 S (6VXX.pdb) were performed without and with NAG to better understand the structures in atomic details and the importance of NAG molecules on the stability of the SARS-CoV-2 S glycoprotein. This comparative study revealed that even though there are no striking differences between the stability of the full structures in the absence or presence of the NAG molecules, glycans increase the local stability; that is to say the vicinity of NAG molecules are much more stable than the corresponding ASN residues. Moreover, the S1 domain is more flexible than the S2 domain. Overall, these results can enlighten the nature of the SARS-CoV-2 providing an information at the atomic level, as well as be useful for vaccines and therapeutics development. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Phylogenetic analysis and structural modeling of SARS-CoV-2 spike protein reveals an evolutionary distinct and proteolyticallysensitive activation loop The Viral Spike Protein Is Not Involved in the Polarized Sorting of Coronaviruses in Epithelial Cells Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy Glycan shield and fusion activation of a deltacoronavirus spike glycoprotein fine-tuned for enteric infections Two Mutations Were Critical for Bat-to-Human Transmission of Middle East Respiratory Syndrome Coronavirus Virus-receptor interactions of glycosylated SARS-CoV-2 spike and human ACE2 receptor Site-specific glycan analysis of the SARS-CoV-2 spike Analysis of the SARS-CoV-2 spike protein glycan shield reveals implications for immune recognition Comprehensive Structural and Molecular Comparison of Spike Proteins of SARS-CoV-2, SARS-CoV and MERS-CoV, and Their Interactions with ACE2 A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses SARS-CoV-2, SARS-CoV, and MERS-CoV: A comparative overview Vulnerabilities in coronavirus glycan shields despite extensive glycosylation Similarities and differences between HIV and SARS-CoV-2 Glycans of SARS-CoV-2 Spike Protein in Virus Infection and Antibody Production Structure-based design of prefusion-stabilized SARS-CoV-2 spikes A new coronavirus associated with human respiratory disease in China The architecture of SARS-CoV-2 transcriptome SARS-coronavirus-2 replication in Vero E6 cells: replication kinetics, rapid adaptation and cytopathology Isolation, sequence, infectivity, and replication kinetics of severe acute respiratory syndrome coronavirus 2 Computer Simulations of the interaction between SARS-CoV-2 spike glycoprotein and different surfaces Structure, Dynamics, Receptor Binding, and Antibody Binding of the Fully Glycosylated Full-Length SARS-CoV-2 Spike Protein in a Viral Membrane Fighting COVID-19 Using Molecular Dynamics Simulations Investigation of the Effect of Temperature on the Structure of SARS-CoV-2 Spike Protein by Molecular Dynamics Simulations Dynamics of the ACE2-SARS-CoV-2/SARS-CoV spike protein interface reveal unique mechanisms Computational design of SARS-CoV-2 spike glycoproteins to increase immunogenicity by T cell epitope engineering Computer simulations of the interaction between SARS-CoV-2 spike glycoprotein and different surfaces SARS-CoV-2 Main Protease: A Molecular Dynamics Study Microsecond MD Simulation and Multiple-Conformation Virtual Screening to Identify Potential Anti-COVID-19 Inhibitors Against SARS-CoV-2 Main Protease Beyond Shielding: The Roles of Glycans in SARS-CoV-2 Spike Protein Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective Hess, and the GROMACS development team, GROMACS User Manual version Molecular Docking Using ArgusLab, an Efficient Shape-Based Search Algorithm and the a Score Scoring Function Definition and testing of the GROMOS force-field versions 54A7 and 54B7 FriesnerJaguar: A highperformance quantum chemistry software program with strengths in life and materials sciences The viscosity of SPC and SPC/E water at 277-K and 300-K Canonical sampling through velocity rescaling Molecular-Dynamics with Coupling to an External Bath LINCS: a linear constraint solver for molecular simulations Computer 'experiments' on classical uids. I. Thermodynamical properties of Lennard-Jones molecules Particle mesh Ewald -an n log(n) method for Ewald sums in large systems VMD: visual molecular dynamics Dictionary of protein secondary structure -pattern-recognition of hydrogen-bonded and geometrical features The numerical calculations reported in this paper were fully/partially performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).