key: cord-0997975-94degu58
authors: Baldwin, Quenisha; Sumpter, Bobby G; Panagiotou, Eleni
title: The local topological free energy of the SARS-CoV-2 Spike protein
date: 2021-02-07
journal: bioRxiv
DOI: 10.1101/2021.02.06.430094
sha: b97152c91b75356340d44b88e9653eb0ec5da4b5
doc_id: 997975
cord_uid: 94degu58

The novel coronavirus SARS-CoV-2 infects human cells using a mechanism that involves binding and structural rearrangement of its spike protein. Understanding protein rearrangement and identifying specific residues where mutations affect protein rearrangement has attracted a lot of attention for drug development. We use a mathematical method introduced in [9] to associate a local topological/geometrical free energy along the SARS-CoV-2 spike protein backbone. Our results show that the total local topological free energy of the SARS-CoV-2 spike protein monotonically decreases from pre- to post-fusion and that its distribution along the protein domains is related to their activity in protein rearrangement. By using density functional theory (DFT) calculations with inclusion of solvent effects, we show that high local topological free energy conformations are unstable compared to those of low topological free energy. By comparing to experimental data, we find that the high local topological free energy conformations in the spike protein are associated with mutations which have the largest experimentally observed effect on protein rearrangement.

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to the COVID-19 global pandemic, which has taken over 2 million lives. Stopping the spread of this highly infectious virus requires disrupting the infection process, which has become the focus of many scientists. Moreover, the ability to control and stop a future global pandemic is of great interest. Fusion of the membranes of both a host cell and a viral cell is necessary for infection [10].
Viral glycoproteins aid in this process by facilitating the binding of the two cells. Viral glycoproteins are folded proteins on the enveloped viral cell membrane which, when triggered, undergo irreversible dramatic conformational changes [14, 19, 24, 42, 48, 62, 63]. The SARS-CoV-2 S protein is a class I viral glycoprotein that consists of two subdomains (S1 and S2) and is triggered by cleavage at the S1 cleavage site [14, 20, 25, 30, 37, 63]. S1, containing a receptor binding domain (RBD), binds to a host cell receptor, angiotensin-converting enzyme 2 (ACE2), and leads to a second cleavage at an S2' cleavage site, adjacent to the fusion peptide. The protein undergoes several structural rearrangements that lead to a stable post-fusion state which brings the two membranes together [14]. In this manuscript, we quantify these changes with the aim of understanding how local changes can trigger global conformational changes.

Folded proteins are defined by their primary, secondary, tertiary and quaternary structure [1]. The primary structure refers to the amino acid sequence of the protein. The secondary structure refers to the sequence of 3-dimensional building blocks the protein attains (β-sheets, α-helices, coils). The tertiary structure refers to the 3-dimensional conformation of the entire polypeptide chain. The rearrangement of viral proteins during protein fusion changes both their tertiary structure and their secondary structure. We use tools from knot theory, namely the Writhe and the Torsion, to characterize the viral protein conformations at the length scale of 4 consecutive residues along the backbone. In the last decades, measures from knot theory have been applied to biopolymers [3-8, 12, 13, 16, 21, 29, 35, 38, 39, 46, 50, 51, 54, 56, 58] and in particular to proteins to classify their conformations [6-8, 15, 23, 31, 41, 46, 53, 55].
One of the simplest measures of conformational complexity of proteins that does not require an approximation of the protein by a knot dates back to Gauss: the Writhe of a curve. In [9] the Writhe and the Torsion were used to define a novel topological/geometrical free energy that can be assigned locally to the protein. The results therein showed that high local topological free energy conformations are independent of the local sequence and may be involved in the rate-limiting step in protein folding. In this paper we apply this method to the spike protein of SARS-CoV-2 to characterize its conformation in various phases of viral fusion. Our results show that the local topological free energy of the spike protein decreases monotonically as the protein undergoes various conformational changes from pre-fusion to post-fusion, in agreement with a transition of the protein from a metastable to a stable state. Our results in combination with DFT calculations suggest that local conformations of high local topological free energy are unstable. By comparing our results to experimental data, we find that residues in high local topological free energy conformations are possible candidates for mutations with impact on protein rearrangement.

The paper is organized as follows: Section 2 describes the topological and geometrical functions for characterizing 3-dimensional conformation used in this paper and the density functional theory calculations used to evaluate conformational stability. Section 3 describes the results of this method for SARS-CoV-2. Finally, in Section 4, we summarize the findings of our analysis.

In this Section we give some definitions necessary for the rest of the manuscript. We represent proteins by their CA atoms, as linear polygonal curves in space. A measure of conformational complexity of curves in 3-space is the Gauss linking integral. When applied to one curve, this integral is called the Writhe of a curve.
For an oriented curve $l$ with arc-length parameterization $\gamma(t)$, the Writhe, $Wr$, is the double integral over $l$:

\[ Wr = \frac{1}{4\pi} \int_{l} \int_{l} \frac{(\gamma'(t) \times \gamma'(s)) \cdot (\gamma(t) - \gamma(s))}{\lVert \gamma(t) - \gamma(s) \rVert^{3}} \, dt \, ds . \]

It is a measure of the number of times a chain winds around itself and can have both positive and negative values. The total Torsion of the chain describes how much it deviates from being planar. For an oriented curve $l$ with arc-length parameterization $\gamma(t)$, the Torsion, $T$, is the integral over $l$:

\[ T = \frac{1}{2\pi} \int_{l} \frac{(\gamma'(t) \times \gamma''(t)) \cdot \gamma'''(t)}{\lVert \gamma'(t) \times \gamma''(t) \rVert^{2}} \, dt . \]

The Writhe and the Torsion have successfully been applied to study entanglement in biopolymers and proteins in particular [2, 6-8, 17, 18, 40, 43-45, 47, 49, 52]. An important property of the Gauss linking integral and the Torsion which makes them useful in practice is that they can be applied to polygonal curves of any length to characterize 3-dimensional conformations at different length scales. In this work, we use the Writhe and the Torsion to characterize the local conformation of parts of the protein at the length scale of 4 residues. We call these the local Writhe and the local Torsion, respectively. The local Writhe is a measure of the local orientation of a polygonal curve and a measure of its compactness. For example, a very tight right-handed (resp. left-handed) turn will have a positive (resp. negative) Writhe value close to 1 (resp. −1), while a relatively straight segment will have a value close to 0. Similarly, the Torsion is 0 for a planar segment and increases to ±1 as the segment deviates from being planar.

In this Section we give the definition of the local topological free energy, originally defined in [9]. To assign a topological/geometrical free energy along a protein backbone, we first derive the distributions of the local Writhe, local Torsion and local ACN in the ensemble of folded proteins. To do this in practice, we use a curated subset of the crystal structures provided in the PDB [11]. Namely, we use the dataset of unbiased, high-quality 3-dimensional structures with less than 60% homology identity from [60].
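As an illustration of the local Writhe at the length scale of 4 residues defined earlier in this Section, the following is a minimal sketch that evaluates the Gauss integral for the open polygonal curve through 4 consecutive CA atoms. It assumes the standard closed-form solid-angle expression for a pair of straight segments (commonly attributed to Klenin and Langowski); the source does not state which numerical scheme was used. The function names `local_writhe` and `local_writhe_profile` are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(v):
    n = math.sqrt(_dot(v, v))
    return None if n < 1e-12 else (v[0] / n, v[1] / n, v[2] / n)

def local_writhe(p1, p2, p3, p4):
    """Writhe of the open polygonal curve p1-p2-p3-p4.

    Adjacent edges contribute nothing, so only the pair of edges
    (p1, p2) and (p3, p4) matters. Assumed scheme: closed-form
    solid-angle formula for two segments. The value lies in [-1, 1].
    """
    r12, r34 = _sub(p2, p1), _sub(p4, p3)
    r13, r14 = _sub(p3, p1), _sub(p4, p1)
    r23, r24 = _sub(p3, p2), _sub(p4, p2)
    normals = [_unit(_cross(r13, r14)), _unit(_cross(r14, r24)),
               _unit(_cross(r24, r23)), _unit(_cross(r23, r13))]
    if any(n is None for n in normals):
        return 0.0  # degenerate (collinear) configuration
    omega = 0.0
    for i in range(4):
        c = _dot(normals[i], normals[(i + 1) % 4])
        omega += math.asin(max(-1.0, min(1.0, c)))
    # Sign matches (gamma'(t) x gamma'(s)) . (gamma(t) - gamma(s)).
    s = _dot(_cross(r34, r12), r13)
    if s == 0:
        return 0.0  # coplanar points: no winding
    sign = 1.0 if s > 0 else -1.0
    # The double integral counts the edge pair twice: 2 * omega / (4 * pi).
    return sign * omega / (2.0 * math.pi)

def local_writhe_profile(ca_coords):
    """Local Writhe for every window of 4 consecutive CA atoms."""
    return [local_writhe(*ca_coords[i:i + 4])
            for i in range(len(ca_coords) - 3)]
```

In this sketch the value at index i of the profile describes the conformation formed by residues i to i+3. A right-handed helical turn gives a positive value and its mirror image gives the opposite value.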
Then for each residue of a given protein we compare its local Writhe (resp. Torsion) value to those of the ensemble, and a free energy is assigned to the residue based on the population of that value in the ensemble. Let $X$ denote a topological parameter (local Writhe, local Torsion). Let $d_X$ denote the density (i.e., the number of occurrences) of $X$ in the folded ensemble ($d_{Wr}$, $d_T$, respectively). Let $m_X$ (resp. $m_{Wr}$, $m_T$) denote the maximum occurrence value for $X$. To any value $p$ of $X$, we associate a purely topological/geometrical free energy:

\[ \Pi_X(p) = \ln \frac{d_X(m_X)}{d_X(p)} . \]

The total local topological free energy of a protein is defined as the sum of local topological free energies in Writhe (resp. Torsion) along the protein backbone. In [9] it was shown that the experimentally observed folding rates of a set of 2-state single domain proteins decrease with increasing total local topological free energy of the proteins. We will say that a residue is rare, or in high local topological free energy, in a parameter $X$ (local Writhe or local Torsion) if its value $X = p$ is such that $\Pi(p) \geq w$, where $w$ is a threshold corresponding to the 95th percentile of $\Pi$-values across the set of folded proteins. We stress that our definition of a rare residue involves 4 consecutive residues, starting from the one we label as rare. We will say that a residue is in a high local topological free energy conformation, denoted LTE, when it is one of the 4 residues composing a conformation with high local topological free energy.

Geometry optimizations of the identified high local topological free energy conformations within the SARS-CoV-2 spike protein were carried out in a model aqueous solution phase without imposing geometrical restrictions by using the NWChem suite of programs (version 7.0.2) [59]. All residues were optimized at the M06-2X/6-311++G** level, i.e., with a hybrid meta functional [66], and solvent effects were accounted for by using the Solvation Model Based on Density (SMD) [36].
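The free energy assignment and the rare-residue criterion defined above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the form Π_X(p) = ln(d_X(m_X) / d_X(p)), a fixed-width histogram as the density d_X, and a nearest-rank percentile. The bin width and all function names are hypothetical.

```python
import math
from collections import Counter

def density(values, bin_width=0.05):
    """Histogram d_X of a parameter X over the reference ensemble.

    Maps a bin index to its number of occurrences. The bin width is
    an assumption; the source does not specify the binning.
    """
    return Counter(math.floor(v / bin_width) for v in values)

def free_energy(p, d, bin_width=0.05):
    """Assumed form: Pi_X(p) = ln(d_X(m_X) / d_X(p)).

    d_X(m_X) is the largest count in the histogram. Values never seen
    in the ensemble get an infinite free energy.
    """
    count = d.get(math.floor(p / bin_width), 0)
    if count == 0:
        return math.inf
    return math.log(max(d.values()) / count)

def percentile_threshold(energies, q=0.95):
    """Nearest-rank q-th percentile of the finite Pi-values."""
    finite = sorted(e for e in energies if math.isfinite(e))
    idx = min(len(finite) - 1, math.ceil(q * len(finite)) - 1)
    return finite[idx]

def rare_residues(values, d, w, bin_width=0.05):
    """Indices i whose 4-residue window (i..i+3) has Pi >= w."""
    return [i for i, v in enumerate(values)
            if free_energy(v, d, bin_width) >= w]

def total_free_energy(values, d, bin_width=0.05):
    """Sum of the local free energies along the backbone."""
    return sum(free_energy(v, d, bin_width) for v in values)
```

A typical use would build `d` from the local Writhe values of the reference ensemble, compute the threshold `w` from the Π-values of that same ensemble, and then call `rare_residues` on the local Writhe profile of the protein of interest.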
The optimization of the identified residues via DFT was done to test our hypothesis that high local topological free energy conformations are indicative of unstable structures.

Figure 2.

We find that the total local topological free energy in Torsion decreases from pre-fusion to post-fusion for all proteins. Similarly, we find that the total local topological free energy in Writhe also decreases from pre-fusion to post-fusion, with the exception of SARS. This suggests that viral proteins during fusion are guided towards a minimum of the total local topological free energy, in agreement with a transition from a metastable to a stable state.

We found that the total topological free energy of SARS-CoV-2 decreases from pre- to post-fusion (see Figure 2). We next analyze how the local topological free energy changes in various pre-fusion stages. In pre-fusion, the SARS-CoV-2 glycoprotein may be in an uncleaved conformation. A closed conformation entails that all three RBDs are in the down position [14]. An open conformation entails that there is an RBD in the up position, accessible for the angiotensin-converting enzyme 2 (ACE2) receptor to bind [14, 30, 64]. A cleaved protein indicates that the protein has been proteolytically cleaved at the cleavage site by a furin protease into the receptor binding subunit of S1. This is necessary for the conformational change of the RBD in human coronaviruses. The fusion subunit of S2 remains associated after cleavage until post-fusion [65]. An intermediate conformation indicates that cleavage at the RBD has occurred and the RBD has been removed, yet refolding has not occurred [14].

In this Section we compare this variant to two other variants of interest, G614 and HexaPro [28]. Our results in Figure 4 (Right) show that the G614 variant has an overall much higher local topological free energy in comparison to the HexaPro variant and the D614 variant.
This is in agreement with experimental results which suggest that the D614 and HexaPro variants are more stable. The D614 variant is proposed to form a hydrogen bond with T859 (of the neighboring protomer or chain), limiting its flexibility, while the G614 variant does not form this bond [34]. The HexaPro variant consists of 6 mutations [28] and has been shown to stabilize the pre-fusion structure and to produce high yield. In terms of the distribution of the local topological free energy in domains, we find that the G614 variant has increased local topological free energy in the SD1, SD2, CD and HR1 domains. The results for $\Pi_T$ are similar and are shown in SI.

In this section we focus on those local conformations in SARS-CoV-2 with high local topological free energy.

In this section we use DFT calculations to compare the minimal energy configurations of high local topological free energy versus medium/low topological free energy conformations in SARS-CoV-2. obtained conformations versus those of the PDB. We do this for the high local topological free energy conformations versus medium or low local topological free energy conformations in SARS-CoV-2. The distribution of the differences is shown in Figure 5. Orange indicates the difference for high local topological free energy conformations and blue for medium/low local topological free energy conformations.

Figure 5: The distribution of $(\Pi_{Wr})_{DFT} - (\Pi_{Wr})_{PDB}$. The difference for residues in high local topological free energy conformations is shown in orange and that of medium or low local topological free energy conformations is shown in blue. The distribution of high LTE minimized differences has a skewness of −0.006 while that of medium/low LTE has a skewness of 1.717. We note that 2 of the largest positive differences in medium/low LTE (outliers in blue) are for conformations that precede a gap in the PDB sequence.
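The skewness values quoted for the two distributions of differences can be computed as in the sketch below. It assumes the Fisher-Pearson moment coefficient g1 = m3 / m2^(3/2); the source does not state which skewness estimator was used, and the function name `skewness` is hypothetical.

```python
import math

def skewness(xs):
    """Fisher-Pearson moment coefficient of skewness (assumed estimator).

    xs would hold the per-conformation differences
    (Pi_Wr)_DFT - (Pi_Wr)_PDB for one group (high LTE or medium/low LTE).
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry
    return m3 / m2 ** 1.5
```

A value near 0 indicates a roughly symmetric distribution of differences, while a clearly positive value indicates a tail of large positive differences.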
Our results show that the distribution of $(\Pi_{Wr})_{DFT} - (\Pi_{Wr})_{PDB}$ is broader for the high LTE conformations than for the medium/low LTE conformations. The skewness of the distribution of the medium/low LTE conformations is 1.717, while that of the high LTE conformations is −0.006. In particular, we find that for the majority of medium/low topological free energy conformations, their DFT structure is either very similar or has lower local topological free energy. In contrast, many of the DFT reduced conformations of the high local topological free energy conformations have even higher local topological free energy. These results further corroborate the idea that high local topological free energy conformations are indicative of unstable structures. The high LTE conformations in the SARS-CoV-2 S protein at various stages pre-fusion, post-fusion and for some of its variants are given in Table 1.

In this section we compare the experimentally reported ability of a mutation to impact 3-dimensional rearrangement and the local topological free energy at that site for SARS and SARS-CoV-2 known mutations. Our results are shown in Tables 2 and 3 in SI. In the following, all high LTE conformations are in Writhe, unless stated otherwise. Overall, we find that 75% of the mutations which are known experimentally to change the 3-dimensional properties of the spike protein are at residues in high local topological free energy conformations. The effect on protein conformation of the new mutants that have naturally arisen is, to our knowledge, undetermined. We find that only 42% of those natural mutants occurred at residues in high local topological free energy conformations. However, we note that the only one of the naturally occurring mutants for which we have a crystal structure is G614.
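The percentages reported above count mutation sites that fall inside a high LTE conformation, that is, inside one of the 4-residue windows starting at a rare residue. A minimal sketch of this bookkeeping is given below. The function names are hypothetical and the residue numbers in the usage note are toy values, not data from this study.

```python
def high_lte_residues(rare_starts, window=4):
    """All residues covered by a window of `window` consecutive
    residues starting at a residue labelled as rare."""
    covered = set()
    for start in rare_starts:
        covered.update(range(start, start + window))
    return covered

def fraction_in_high_lte(mutation_sites, rare_starts, window=4):
    """Fraction of mutation sites lying in a high LTE conformation,
    together with the list of sites that do."""
    covered = high_lte_residues(rare_starts, window)
    hits = [site for site in mutation_sites if site in covered]
    return len(hits) / len(mutation_sites), hits
```

For example, with toy rare residues at positions 10 and 40 and toy mutation sites 12, 13, 25, 43, 44 and 50, the call `fraction_in_high_lte([12, 13, 25, 43, 44, 50], [10, 40])` reports that sites 12, 13 and 43 lie in high LTE conformations.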
We found that residue 614 was in a high LTE conformation in Torsion in D614 only in the open state, while it is found in a high LTE conformation in Torsion in the closed mutant G614. More precisely, mutations at SARS residues K968 and V969 (known as 2P, a double proline mutation) caused disruption of conformational changes upon binding [33]. We find both residues in high local topological free energy conformations. The cleavage site of SARS-CoV-2, residue A688, which has been identified as important in viral rearrangement [27], is in a high local topological free energy conformation. Residue 614, where the G614 natural mutation occurred, is in a high local topological free energy conformation in Torsion for D614 and G614 [14]. Also, residue 985, which is involved in a stabilizing mutation, is in a high local topological free energy conformation [37].

In [28] 43 substitutions were studied. The most efficient was the HexaPro variant (which involves 6 mutations), which stabilized the pre-fusion structure and produced high yield [28]. We find two out of the six mutations composing HexaPro (F817P, A892P, A899P, A942P, K986P, V987P) to be in a high local topological free energy conformation before the mutation and none to be in a high local topological free energy conformation after the mutation.

In [26], using a different method, specific sites for mutation that would change global conformation were identified. These were a double cysteine mutant, S383C D985C (RBD to S2 double mutant (rS2d)), a triple mutant, D398L S514L E516L (RBD to NTD triple mutant (rNt)), a double mutant, N866I A570L (subdomain 1 to S2 double mutant (u1S2d)), a quadruple mutant, A570L T572I F855Y N856I (subdomain 1 to S2 quadruple mutant (u1S2q)) and finally, a double cysteine mutant, G669C and T866C, to link SD2 to S2 (subdomain 2 to S2 double mutant (u2S2d)). We found that out of these 5 mutants, 4 contained residues in high local topological free energy conformations.
Moreover, one of the most efficient mutations contained 2 residues in a high local topological free energy conformation.

In [32], combinations of mutations were examined. Categorized by location, the mutations included N532P, T572I, D614N, D614G of SD1, and A942P, T941G, T941P, S943G, A944P, A944G, and A892P, F888 and G880C, S884C and A893C, and K986P and V987P of the C-terminus of HR1 [32]. We find that residues 572, 614, 942, 943, 944, 884, 986 and 987 are in high local topological free energy conformations (884 in Torsion). A892P was reported to increase closed trimers while A942 decreased closed trimers. The double proline mutation (2P) at K986P and V987P stabilized the pre-fusion structure. K986P was reported to have higher ACE2 binding affinity while D614N and T572I were reported to have low binding affinity [32].

The top 10 naturally prevalent mutations of the SARS-CoV-2 S protein (the recently discovered UK and South African mutations [22, 57]) are shown in Table 4. These mutations are believed to increase infectivity and transmission, and the majority of them are located in S1 [61].

Table 4.

One of these residues, residue 570, is in a high local topological free energy conformation. The South African variant involves N501Y, E484K, and K417N. We find none of these residues to be in high local topological free energy conformations before the mutation.

We used the local topology/geometry of protein crystal structures alone to associate a local topological free energy to the protein backbone. We find that the total local topological free energy decreases from pre- to post-fusion. In addition, the total local topological free energy of the spike protein of SARS-CoV-2 pre-fusion decreases continuously in the steps leading to protein rearrangement, in agreement with a transition to an energetically more stable state. We found that the total local topological free energy of the G614 mutant was much higher than that of the D614 and HexaPro mutants.
Experimental results have shown that the G614 mutant is less stable than the D614 and HexaPro mutants. This finding further supports that the purely topological free energy can quantify protein stability.

fusion in a fashion that plunges the FP into the host membrane. HR2, located at the tail of the protein, folds to bring the viral membrane to close proximity to the host membrane during fusion [30].

The linking of uniform random polygons in confined spaces
The effects of density on the topological structure of the mitochondrial DNA from trypanosomes
DNA knots reveal a chiral organization of DNA in phage capsids
Knotting probability of DNA molecules confined in restricted volumes: DNA knotting in phage capsids
Exploring the correlation between the folding rates of proteins and the entanglement of their native state
Sequence and structural patterns detected in entangled proteins reveal the importance of co-translational folding
Linking in domain-swapped protein dimers
The local topological free energy of proteins
Mechanisms of coronavirus cell entry mediated by the viral spike protein
Magnetic helicity in a periodic domain
Predicting knot or catenane type of site-specific recombination products
A topological characterization of knots and links arising from site-specific recombination
Distinct conformational states of SARS-CoV-2 spike protein
Protein knotting by active threading of nascent polypeptide chain exiting from the ribosome exit channel
Tangle analysis of difference topology experiments: applications to a Mu protein-DNA complex
The average inter-crossing number of equilateral random walks and polygons
The mean-squared writhe of alternating random knot diagrams
The many mechanisms of viral membrane fusion proteins
Cryo-EM analysis of the post-fusion structure of the SARS-CoV spike glycoprotein
Topological descriptions of protein folding
European Centre for Disease Prevention and Control. Rapid increase of a SARS-CoV-2 variant with multiple spike protein mutations observed in the United Kingdom
Topological methods for open-knotted protein chains using the concepts of knotoids and bonded knotoids
Mechanism of membrane fusion by viral envelope proteins
Viral membrane fusion
Controlling the SARS-CoV-2 spike glycoprotein conformation
A multibasic cleavage site in the spike protein of SARS-CoV-2 is essential for infection of human lung cells
Structure-based design of prefusion-stabilized SARS-CoV-2 spikes
Random state transitions of knots: a first step towards modeling unknotting by type II topoisomerases
Structural and functional properties of SARS-CoV-2 spike protein: potential antivirus drug development for COVID-19
KnotProt: a database of proteins with knots and slipknots
Stabilizing the closed SARS-CoV-2 spike trimer
Stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis
Tracking changes in SARS-CoV-2 spike: Evidence that D614G increases infectivity of the COVID-19 virus
DNA-DNA interactions in bacteriophage capsids are responsible for the observed DNA knotting
Universal solvation model based on solute electron density and on a continuum model of the solvent defined by the bulk dielectric constant and atomic surface tensions
Structure-guided covalent stabilization of coronavirus spike protein trimers in the closed conformation
Knotting of random ring polymers in confined spaces
Efficient sampling of knotting-unknotting pathways for semiflexible Gaussian chains
Linear random knots and their scaling behavior
Complex lasso: new entangled motifs in proteins
Early steps of retrovirus replicative cycle
Pulling-force-induced elongation and alignment effects on entanglement and knotting characteristics of linear polymers in a melt
Writhe and mutual entanglement combine to give the entanglement length
Topological methods for polymeric materials: characterizing the relationship between polymer entanglement and viscoelasticity
A topological study of protein folding kinetics. Topology and Geometry of Biopolymers
Backbone free energy estimator applied to viral glycoproteins
On the mean and variance of the writhe of random polygons
The Rabl configuration limits topological entanglement of chromosomes in budding yeast
Effect of knotting on the shape of polymers
Automatic classification of protein structure by using Gauss integrals
FtsK-dependent XerCD-dif recombination unlinks replication catenanes in a stepwise manner
Pathways of DNA unlinking: a story of stepwise simplification
Conservation of complex knotting and slipknotting patterns in proteins
Untangling DNA
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. medRxiv
Novel display of knotted DNA molecules by two-dimensional gel electrophoresis
NWChem: a comprehensive and scalable open-source solution for large scale molecular simulations
PISCES: a protein sequence culling server
Decoding SARS-CoV-2 transmission and evolution and ramifications for COVID-19 diagnosis, vaccine, and medicine
Virus membrane fusion
Structures and mechanisms of viral membrane fusion proteins: multiple variations on a common theme
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects
A new local density functional for main-group thermochemistry