key: cord-1026135-m1q1sjr4 authors: Majumder, Satyabrata; Chaudhuri, Dwaipayan; Datta, Joyeeta; Giri, Kalyan title: Exploring the intrinsic dynamics of SARS-CoV-2, SARS-CoV and MERS-CoV spike glycoprotein through normal mode analysis using anisotropic network model date: 2020-10-16 journal: J Mol Graph Model DOI: 10.1016/j.jmgm.2020.107778 sha: fa06982dd080f2ceacfda2e0950f2b8dd9e40424 doc_id: 1026135 cord_uid: m1q1sjr4 COVID-19 caused by SARS-CoV-2 have become a global pandemic with serious rate of fatalities. SARS-CoV and MERS-CoV have also caused serious outbreak previously but the intensity was much lower than the ongoing SARS-CoV-2. The main infectivity factor of all the three viruses is the spike glycoprotein. In this study we have examined the intrinsic dynamics of the prefusion, lying state of trimeric S protein of these viruses through Normal Mode Analysis using Anisotropic Network Model. The dynamic modes of the S proteins of the aforementioned viruses were compared by root mean square inner product (RMSIP), spectral overlap and cosine correlation matrix. S proteins show homogenous correlated or anticorrelated motions among their domains but direction of C(α) atom among the spike proteins show less similarity. SARS-CoV-2 spike shows high vertically upward motion of the receptor binding motif implying its propensity for binding with the receptor even in the lying state. MERS-CoV spike shows unique dynamical motion compared to the other two S protein indicated by low RMSIP, spectral overlap and cosine correlation value. This study will guide in developing common potential inhibitor molecules against closed state of spike protein of these viruses to prevent conformational switching from lying to standing state. Coronaviruses are zoonotic pathogens which belong to the largest group of Nidovirales order [1] and has been shown to infect many avian and mammalian species [2] . Severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV) and novel coronavirus (SARS-CoV-2) are in betacoronavirus family but with different lineages. SARS-CoV-2, SARS-CoV belongs to lineage B whereas MERS-CoV belongs to lineage C but a common factor is that they all cause acute lung injury (ALI) and acute respiratory distress syndrome (ARDS) which leads to pulmonary failure and result in fatality. COVID-19 caused by novel coronavirus (SARS-CoV-2), has now emerged as global pandemic and high rate of fatalities have been reported globally [3, 4, 5] . The major surface protein of all these 3 coronaviruses is called spike glycoprotein (S) which is responsible for host attachment of the virus and entry into cell by membrane fusion. The protein protrudes out of the surface and is found in a trimeric configuration [6] . The protein has 2 subunits called S1 and S2. S1 subunit has the receptor binding domains (RBD) which help to stabilize the prefusion state. The S2 subunit remains anchored to the membrane and contains the fusion machinery and further cleaved to form S2' in certain coronaviruses thus activating the membrane fusion cascade [7, 8] . The receptors recognized by the viruses are also different in case of MERS-CoV it is 5-N-acetyl-9-O-acetyl-sialosides found on glycoproteins and glyco-lipids at the host cell surface while for SARS-CoV and SARS-CoV-2 it is hACE2 receptor [9, 10] . Spike glycoprotein undergoes conformational switch in order to bind with the receptor. In lying or closed state the receptor binding domain is buried deeply into the trimer core and so it is not accessible to the receptor making the conformational switch essential. In standing or open state one of the RBD protrudes out of the trimer core and becomes accessible to the receptor [11] . A detailed study to explore the intrinsic dynamical motion of the prefusion lying state spike protein trimer is needed for proper understanding of its function from conformation perspective. In recent years characterization of whole protein collective and domain-specific motions have been accomplished using normal mode calculations [12, 13] . Coarse-grained normal mode analysis (CG-NMA) has now become a valuable tool for studying the biologically relevant conformational motions of large protein or protein complex. Although coarse-grained NMA (CG-NMA) does not capture the detailed local dynamics that are obtained from all atom molecular dynamics simulation, CG-NMA has been shown to be equally effective in J o u r n a l P r e -p r o o f demonstrating the global functional motions of proteins [14, 15] . NMA with anisotropic network model (ANM), a variant of elastic network model (ENM), has been proved to be successful in describing the collective dynamics of a wide range of biomolecular systems [16] [17] [18] . Notable application of NMA using ANM especially in membrane proteins has been thoroughly mentioned in the study and review done by Bahar et al [19, 20] . In this study we have explored the intrinsic dynamics of the closed or lying state spike (S) glycoprotein of SARS-CoV-2, SARS-CoV and MERS-CoV through NMA studies using ANM. We found noticeable variations in the dynamics of the proteins in different low frequency modes in terms of both magnitude and direction of displacement vectors. We found that receptor binding motif (RBM) of a single chain of SARS-CoV-2 spike show high fluctuation in the lying state indicating its high probability to bind with the receptor implying the high infectivity of SARS-CoV-2 than other two viruses. MERS-CoV S shows distinct dynamics than SARS-CoV S and SARS-CoV-2 S indicates its low sequence similarity and distant phylogenetic relationship with other. Detailed domain-wise motion, atomic fluctuation data and quantitative comparison between S protein structures of the mentioned viruses will aid in identifying common potential drug binding hotspots in the receptor binding domain of the protein which will guide in designing common inhibitor or modified version of these molecules through structure-based approach. J o u r n a l P r e -p r o o f All the X-ray crystal structures for SARS-CoV-2, SARS-CoV and MERS-CoV S proteins in the lying state were taken from protein data bank [21] . The corresponding PDB IDs are 6VXX, 5X58 and 6Q05 respectively [11, 22, 23] . CHARM-GUI was used to prepare the protein (i.e adding missing atoms and assigning correct protonation states) [24] . As a large part of this study is involved in the comparative analysis purposes, we have discarded part of the S2 region of S protein of all viral strains so that each protein model has equal number of Cα atoms (1087 in all three chains). S2 region of the protein does not take active part in receptor interaction. Detailed mathematical description about normal mode analysis and how it is applicable to protein molecule is given in many studies [20, [25] [26] [27] . NMA is a technique to study vibrational motion of harmonic oscillating system near the equilibrium state. The motions are of small amplitude in a potential well and cannot cross energy barriers. At a minimum state , the potential energy V can be obtained from the approximation of Taylor series expansion. For the generalized coordinates : Where specifies the deviation from equilibrium ( = + ). The kinetic energy T also approximated from Taylor expansion. The resulting Lagrangian is L = T-V, which results in linear differential equations of motion: Assuming oscillatory solution, = cos ( + ) and substituting it in equation 2 results in eigenvalue problem: A is the matrix of amplitudes , and V is the matrix of the second derivative of the potential energy and is referred to as Hessian. $ is a diagonal matrix of eigenvalues. The pattern of motions is fully specified by the vibrational normal modes, i.e. the eigenvectors (% ) and their associated eigenvalues ($ ). The vector describes the directionality and magnitude of motion of each particle relative to other particles. Here . and . are instantaneous and equilibrium distances between the nodes i and j, ( is Here we consider (3N -6) non zero modes, $ corresponds to eigenvalues and the corresponding eigenvectors are 7 . The inverse H -1 is also organized in N x N submatrices of size 3 x 3, each. The ij th submatrix H ij -1 defines the covariance between the fluctuations of residues i and j. The cross-correlation (CC) between the equilibrium fluctuations of residues i and j, 8Δ9 • Δ9 ;is expressed in terms of the trace (tr) of these submatrices as: We have performed the NMA and all the analysis functions including root mean squared inner product (RMSIP) [28] , spectral overlap [29] and cosine correlation or overlap [30] using ProDy python package [31] . RMSIP computes quantitative comparison value between the sets of normal modes and expressed as: Structural alignments of the models were done using PyMol. We have computed the eigenvalues of first hundred non-zero normal modes of the SARS-CoV-2, SARS-CoV, and MERS-CoV spike (S) proteins in the lying state ( Fig. 1 ). The energy associated with a given normal mode is directly proportional to its eigenvalue [28] . We have discarded the We have assessed the extent to which the trimer conformation influences the dynamics of each S protein chain (Fig. 3) . Overall the pattern of the cross correlation matrix (CCM) looks similar for three models and have similar domain wise correlated movement but the magnitude of correlation coefficients (CC) are different to some extent. According to PDB and Uniprot protein feature view, the position of S1 NTD, S1 CTD and RBD for the protein models are as follows: SARS-CoV: 14-290 is S1 NTD (N-terminal domain), 321-513 is S1 CTD (C terminal domain), 306-527 is RBD (binding motif is 424-494); SARS-CoV-2: 13-303 is S1 NTD, 334-527 is S1 CTD, 319-541 is RBD (binding motif is 437-508); MERS-CoV: 18-350 is S1 NTD, 381-588 is J o u r n a l P r e -p r o o f spike proteins are shown. Color coding is as follows: Orange: S1 NTD, Violet: S1 CTD, Cyan: S2 region and Blue: RBD. Multiple sequence alignment of the proteins is given in supplementary. Emphasizing the motions among the domains, it was found that SARS-CoV and SARS-CoV-2 S proteins have homogenous correlated displacements. S1 CTD and NTD of each chain have weak negative correlation with each other (-0.1 to -0.4). S1 CTD of one chain has weak negative correlation with CTD of other chains and same scenario is seen with the NTD. S1 CTD and NTD in each chain are negatively correlated with each other implying the RBM tries to move further from NTD in order to be accessible by the receptor. One of the interesting facts is that S1 NTD of one chain shows strong positive correlation with CTD of at least one of the two other chains which is true for both SARS-CoV and SARS-CoV-2 S protein but variation is seen in the CCM of MERS-CoV S protein (Fig. 3C ). S1 NTD of each chain is negatively correlated with NTD of other chains with magnitude range -0. around the interfaces of domains [32] . In order to comprehensibly describe the motion of the trimer chains of the first four low frequency (low eigenvalue) modes (λ7, λ8, λ9, λ10), first we have computed three unique symmetry axis of the spike trimer using VMD Symmetry Tool. Among the three symmetry axis, one pointed outward through the trimer core, which we have named the Z axis (default 2 in VMD nomenclature) and the other two axis were named as X and Y axis (default VMD nomenclature 1 and 0 respectively). We have used default PDB chain ids of the three spike protein structures to address the chain and trimer motion individually. As previously described we will confined ourselves to discuss the motion of S1 domain of the three chains as it harbors the functional receptor binding domain. In all the structures, the β-strand region (BSR) of S1 dominates overall motions of the trimer in first four low frequency non-zero modes (supplementary 4, 5 and 6). We did not notice any concerted motion of the three chains We have computed the squared atomic fluctuations of the S protein models using first 10 nonzero normal modes (Fig. 6) . The amplitude of the fluctuations shows variation within the models. In In order to compare the normal modes between the structures, we have calculated the RMSIP, Normal mode analysis of equilibrium structures falls in the category of principal component analysis-based methods like singular value decomposition of molecular dynamics or Monte Carlo trajectories or essential dynamics analysis of the covariance matrices retrieved from MD runs [20] . These studies have proven to be useful in studying the equilibrium dynamics of proteins which are spanned by low frequency end of the mode spectrum [33] . Anisotropic , which is proportional to energy, will lead to energetically unfavorable motions [20] . Spectra of eigenvalues shows that SARS-CoV-2 S has more collective motion than other two S protein implying SARS-CoV-2 S protein has higher stability [32] and has lower energetically expensive motion than the latter. This stability of the S trimer of SARS-CoV-2 in terms of eigenvalue frequency may give a part of the explanation about why SARS-CoV-2 is more infective than other two but do not provide detailed description. It is shown that in SARS-CoV S, which is the lying state, the RBD is buried in the S protein trimer therefore interaction with ACE receptor is hindered. In order to bind with receptor, S protein must switch to standing state where RBD of any one of the chains is exposed and binds with receptor [8] (Fig. 8) . From the homogenous CCM plot of whole trimer (Fig. 2) , we hypothesize that the molecular mechanism of this conformational switching at atomic scale may be similar in nature in the three viruses but the variation seen in the CCM of RBM and RBD are responsible for the difference in infectivity. It can be said from the CCM that RBD of all three chains have equal probability to be switched into standing state. From the first four low frequency non-zero modes (λ 7 , λ 8, λ 9, λ 10 ) localized variation in the direction of C α atom motion in the models are worth noticeable (Fig. 5 ). We have also described the motions of the C α respect to the symmetry axis of the trimer (supplementary receptor. This leads to more tight binding of SARS-CoV-2 S protein RBD to the receptor than SARS-CoV and it explains partially, why SARS-CoV-2 is 10 to 15 times more infective than SARS-CoV [31] . Cryo-EM studies have shown that, in SARS-CoV spike, the RBD is mostly in the standing state [8, 11] however , in SARS-CoV-2 spike, the RBD is mostly in the lying state The present comprehensive study demonstrated the intrinsic dynamics of the S proteins from SARS-CoV-2, SARS-CoV and MERS-CoV and their inter-relationship through normal mode analysis using ANM. High fluctuation of the RBM of SARS-CoV-2 spike indicates its propensity to bind with ACE2 even in the lying state inferring its higher infectivity than SARS-CoV and MERS-CoV. The distant relationship of MERS-CoV with the SARS-CoV and SARS-CoV-2 in phylogenetic tree at sequence level resembles that at the dynamics level due to its unique dynamical motion. Sites of very low atomic fluctuation at the S1 site which harbours the RBD could serve as the potential drug binding site. Our study could serve as the basis for J o u r n a l P r e -p r o o f designing common potential drugs against the spike of the three mentioned viruses which may inhibit the conformational changes from lying to standing state. COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses Jumping species-a mechanism for coronavirus persistence and survival WHO | Summary of probable SARS cases with onset of illness from 1 Middle East respiratory syndrome coronavirus (MERS-CoV) Immunogenicity of a DNA vaccine candidate for COVID-19 Structural insights into coronavirus entry Pre-fusion structure of a human coronavirus spike protein Cryo-electron microscopy structures of the SARS-CoV spike glycoprotein reveal a prerequisite conformational state for receptor binding Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor Stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains Analysis of domain motions in large proteins Analysis of domain motions by approximate normal mode calculations Global Dynamics of Proteins: Bridging Between Structure and Function Probing the global and local dynamics of aminoacyl-tRNA synthetases using all-atom and coarse-grained simulations Insights into Equilibrium Dynamics of Proteins from Comparison of NMR and X-Ray Data with Computational Predictions Coarse-grained normal mode analysis in structural biology Normal Mode Analysis: Theory and Applications to Biological and Chemic Anisotropic network model: systematic evaluation and a new web interface Normal mode analysis of biomolecular structures: Functional mechanisms of membrane proteins Protein Data Bank | Nucleic Acids Research | Oxford Academic Structures of MERS-CoV spike glycoprotein in complex with receptors Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein CHARMM-GUI: A web-based graphical user interface for CHARMM Normal mode analysis as a method to derive protein dynamics information from the Protein Data Bank Comparing the intrinsic dynamics of multiple protein structures using elastic network models Comparison of intrinsic dynamics of cytochrome p450 proteins using normal mode analysis On the convergence of the conformational coordinates basis set obtained by the essential dynamics analysis of proteins' molecular dynamics simulations Convergence of sampling in protein simulations Cosine Similarity -an overview | ScienceDirect Topics ProDy: Protein Dynamics Inferred From Theory and Experiments -PubMed Enhanced receptor binding of SARS-CoV-2 through networks of hydrogen-bonding and hydrophobic interactions Investigating protein dynamics in collective coordinate space Anisotropic Network Model: Systematic Evaluation and a New Web Interface -PubMed Phylogenetic Analysis and Structural Modeling of SARS-CoV-2 Spike Protein Reveals an Evolutionary Distinct and Proteolytically Sensitive Activation Loop This study was supported by FRPDF grant of Presidency University. Protocol designed, conceptualized and analyzed by S.M., manuscript preparation done by D.C. and J.D. Project was done under the supervision of K.G. The manuscript was reviewed and approved by all authors. The authors declare no conflict of interest.