key: cord-0768252-7xd0msyk
authors: Pal, Debnath
title: Spike protein fusion loop controls SARS-CoV-2 fusogenicity and infectivity
date: 2020-08-23
journal: bioRxiv
DOI: 10.1101/2020.07.07.191973
sha: 5d06743396ef0f7686b5a484666b5a2fe4f59229
doc_id: 768252
cord_uid: 7xd0msyk

Compared to the other human coronaviruses, SARS-CoV-2 has a higher reproductive number, which is driving the COVID-19 pandemic. The high transmission of SARS-CoV-2 has been attributed to environmental, immunological, and molecular factors. The Spike protein is the foremost molecular factor responsible for virus fusion, entry and spread in the host, and thus holds clues to the rapid viral spread. The dense glycosylation of Spike, its high binding affinity for the human ACE2 receptor, and its efficient priming by cleavage have already been proposed as drivers of efficient virus-host entry, but these do not explain its unusually high transmission rate. I have investigated the Spike from six β-coronaviruses, including SARS-CoV-2, and find that their surface-exposed fusion peptides, which constitute the defined fusion loop, are spatially contiguous to each other and can work synergistically to trigger the virus-host membrane fusion process. The architecture of the Spike quaternary structure ensures the participation of the fusion peptides in the initiation of the host membrane contact for the virus fusion process. The SARS-CoV-2 fusion peptides have unique physicochemical properties, accrued in part from the presence of consecutive prolines that impart backbone rigidity, which aids virus fusogenicity. The specific contribution of these prolines is inferred from comparative studies of their deletion mutant in a fellow murine β-coronavirus, MHV-A59, which shows significantly diminished fusogenicity in vitro and associated pathogenesis in vivo. The priming of the Spike by its cleavage and the subsequent fusogenic conformational transition steered by the fusion loop may be critical for the SARS-CoV-2 spread.
Significance Statement

The three proximal fusion peptides constituting the fusion loop in the Spike protein are the membranotropic segments most suitable for engaging the host membrane surface for its disruption. Spike's unique quaternary structure architecture drives the fusion peptides to initiate the protein-host membrane contact. The SARS-CoV-2 Spike trimer surface is more hydrophobic than that of the other human coronavirus Spikes, and this includes the fusion peptides, which are structurally more rigid owing to the presence of consecutive prolines, aromatic/hydrophobic clusters, a stretch of consecutive β-branched amino acids, and hydrogen bonds. The synergy accrued from the location of the fusion peptides, their physicochemical features, and the fusogenic conformational transition appears to drive the virus fusion process and may explain the high spread of SARS-CoV-2.

The degree of infectivity of SARS-CoV-2 is significantly high, and the virus is far more aggressive, as evidenced by the current global pandemic. This can be quantified by the preliminary reproductive number (R0) of COVID-19 (2.0-2.5), which is higher than the R0 of SARS (1.7-1.9) and far higher than that of MERS (<1) (3). The significant difference in R0 may accrue from environmental, immunological or molecular reasons. COVID-19 transmission has been attributed to the long life of SARS-CoV-2 outside the host, as it increases the chances of infection through cross-contamination by contact in the population (4). The wide dispersal of SARS-CoV-2 particles from the infected person through activities like sneezing and coughing (5), and the tiny size of the virus droplets, which may penetrate deeply into the pulmonary system, could allow rapid spread of the disease (6). However, SARS-CoV has high genomic similarity with SARS-CoV-2, and one would have expected it to have similar transmission behavior and R0, which is evidently not the case.
For that matter, the environmental spread of other viruses should have been far wider than that of coronaviruses, given that the coronaviruses have the largest RNA viral genomes and therefore the largest particle size and consequently a larger aerosol size compared to many other viruses. This is again not what we observe in practice. Another possibility is that high viral spread may accrue from intense viral shedding, where SARS-CoV-2 has succeeded early on in rapid viral replication and cell-to-cell spread before the onset of acute inflammatory [...] CoV, and 9-O-Ac-Sia for HCoV-OC43 and HCoV-HKU1. The S2 domain has also diverged, but comparatively less, with >33% overall pairwise sequence identity, the heptad repeat regions being the most conserved (Fig. 1B).

It is pertinent to ask what features of Spike sequence and structure determine the virus fusogenicity and infectivity. The densely glycosylated Spike protein has been suggested as the prime reason for the high SARS-CoV-2 infectivity (7). This extensive glycosylation of the S2 domain is driven by an intracellular N-terminal signaling peptide for transport and retention in the endoplasmic reticulum (see Fig. 1B, Multiple Sequence Alignment, bottom panel). However, this signal sequence is absent in the Spike protein of a fellow murine β-coronavirus, MHV-A59 (10), indicating limited glycosylation. Yet, MHV-A59 aggressively infects the mouse liver and brain. Upon intracranial inoculation in mice, it can cause acute stage meningoencephalitis and myelitis, chronic stage demyelination, and axonal loss (11-13). It infects the neurons profusely and can spread from neuron to neuron. Its propagation from grey matter neuron to white matter and its release at the nerve ends to infect the oligodendrocytes by cell-to-cell fusion (12) are robust mechanisms to evade immune responses and induce chronic stage progressive neuroinflammatory demyelination concurrent with axonal loss in the absence of functional virions (12, 14).
Therefore, high infectivity of the MHV-A59 Spike does not appear to be contingent on glycosylation, and one can argue the same for the SARS-CoV-2 Spike, where glycosylation may only marginally raise the basal fusion efficiency. Surface glycosylation thus may not be a contributing factor to host cell binding, although the successive virus-to-cell and cell-to-cell fusion may altogether play an important role in higher virus infectivity.

It has been suggested that enhanced virus-to-cell infection can be propelled by the increased number of hydrogen-bonded contacts between the SARS-CoV-2 RBD and the ACE2 receptor, leading to higher affinity and improved host targeting compared to SARS-CoV (7, 15). However, a significantly higher affinity between ACE2 and SARS-CoV-2 has not been experimentally corroborated (16). Besides, such a proposition is weak because the RBDs in all HCoVs are diverse, including that of SARS-CoV, where the minimal RBD (residues 318 to 510) (17) shares only 74% sequence identity with SARS-CoV-2 (Fig. S1). Also, the SARS-CoV-2 Spike may interact with other receptors such as DC-SIGN and DC-SIGNR, as in SARS-CoV, to increase tropism (18) and viral spread. Therefore, ACE2 recognition has no direct consequence for infectivity unless virus entry can be realized; however, when RBDs interact with ACE2 in large numbers during the acute stage of the infection, they may modulate the host immune response by downregulating hydrolysis of the pro-inflammatory angiotensin II to the anti-inflammatory angiotensin 1-7 in the renin-angiotensin signaling pathway (19). This can alter the immune response and increase infectivity. But such effects can manifest only beyond the early stage of the infection, and for that to happen the efficiency of viral entry is the rate-limiting step.

The cleavage of the Spike protein is said to prime it for the efficient virus-host membrane fusion process.
How essential this is for virus fusogenicity and infectivity is an important consideration. The Spike cleavage potentially removes any in situ covalent and noncovalent constraints that the S1 domain may impose on the S2 domain, which would otherwise impede the conformational transition that facilitates virus entry. It has been proposed that the SARS-CoV-2 Spike is preactivated by cleavage at the S1/S2 site when it is packaged inside the host, and that the S2' site is cleaved when the Spike attaches to the host receptor, which makes the priming process very efficient (20). However, a comparison of the S1/S2 cleavage signal sequence …RXXR… shows that the SARS-CoV-2 "…RRARS…" Furin recognition site is similar to the MHV-A59 Spike's "…RRAHR…", and others like the MERS, HCoV-OC43 and HCoV-HKU1 Spikes have a conserved motif sequence as well (Fig. 1B). The cleavage site signal at S2', embedding a single Arginine, is highly conserved across all HCoVs. Therefore, the efficient priming advantage available to the SARS-CoV-2 Spike is equally present for the MHV-A59, MERS, HCoV-OC43, and HCoV-HKU1 Spikes. In contrast, the canonical S1/S2 cleavage recognition sequence is missing in SARS-CoV, with only a single Arginine present there. A regular cleavage at this site has not been reported, and cleavage by trypsin has been shown to activate the virus independent of the pH due to the presence of a single Arginine. The importance of this region has been aptly corroborated by S2' site cleavage studies in SARS-CoV (21). Besides, it is also possible for Spike to be activated by the low pH environment through protonation of residues if it internalizes in the endosome after interaction with the host receptor. In contrast, fusion processes are known to happen in the MHV-A59 Spike without cleavage as well (22).
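The motif comparison above can be illustrated with a minimal sketch. It assumes the simplest reading of the quoted consensus, R-X-X-R with X as any residue, and scans only the short fragments quoted in the text; the flanking residues elided there are not filled in. The function name `rxxr_positions` and the single-arginine control fragment in the usage note are hypothetical and used only for illustration.

```python
import re

# Minimal consensus quoted in the text: R-X-X-R, where X is any residue.
# The lookahead reports overlapping matches as well.
RXXR = re.compile(r"(?=R..R)")


def rxxr_positions(fragment):
    """Return the 0-based start position of every R-X-X-R match."""
    return [m.start() for m in RXXR.finditer(fragment.upper())]


# Fragments exactly as quoted in the text; flanking residues are elided there.
fragments = {
    "SARS-CoV-2": "RRARS",
    "MHV-A59": "RRAHR",
}

for virus, fragment in fragments.items():
    print(virus, fragment, rxxr_positions(fragment))
```

Both quoted fragments contain one R-X-X-R match, although at different offsets, which is consistent with the statement that the two sites are similar. A fragment carrying only a single arginine, such as the hypothetical control "AAARS", returns an empty list. Real furin specificity depends on more than this four-residue pattern, so the sketch is a pattern check and not a cleavage predictor.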
Therefore, based on the similarity of the SARS-CoV-2 Spike to the others, one can argue that it is competent to access multiple pathways for priming that enhance its infection capability, including the possibility that it can infect without a cleavage as well, though these advantages are not unique.

Among all the components of the fusion apparatus in the Spike S2 domain, the FPs are the least studied, although they have been suggested to contribute to the trigger that drives the virus-host fusion process by initiating the protein-host membrane contacts. The limited experimental information available shows that mutations in the FP of SARS-CoV Spike can significantly perturb the fusion efficiency (23), by as much as >70% (24). Most studies of the FP regions have used synthetic peptides in a fusion assay system to understand their membrane perturbing capabilities (25-28), and how Ca2+ ions may interact with these peptides to modulate fusion (9). Interestingly, the FPs also contain a central proline (29) [...] to each other in the sequence, the whole segment spanning the beginning of FP-I to the end of FP-III can be termed together as the "fusion loop" (Fig. 1A). But how these FPs in the loop can act synergistically to rapidly trigger the membrane fusion process is what we explore further in this study.

Spike receptor binding and the fusion loop. To understand the synergy of the FPs in the trigger process, it is important to understand the structure of the Spike protein in the proper context. FP-I to FP-III are surface exposed and are contiguous to each other in space, as seen from the three-dimensional structures of the Spike fusion domain (Fig. 2, cartoon diagrams). While FP-I to FP-III are always surface exposed in the full-length Spike, FP-IV is deeply buried and interfaces with the virus membrane. (The location of FP-IV, although not available from the three-dimensional structure, can be inferred from the primary structure and the location of the transmembrane domain (Fig. 1B).)
Therefore, while FP-I to FP-III are always available for early contacts with the host membrane, FP-IV can participate in the process only after the conformational transition, which may expose it for interaction with the host membrane. To understand what guarantees that the FP surfaces make the initial protein-host membrane contact, one must look at the possible modes of virus-host attachment mediated by the Spike. For this, we propose a new contact initiation model, in which there is no requirement for a fusion peptide to be at the N-terminal of the conformationally transformed pre-hairpin intermediate (9). To understand the model, let us consider the different options for host receptor binding. For example, if all three RBDs in the trimeric structure find host receptors, the Spike can attain a tripod binding mode (Fig. 3A). A recent structure of the trimeric Spike complexed with a host receptor reveals the precise geometry of CEACAM1 binding the RBD of Spike from MHV-A59 (37). However, tripod binding requires receptor molecules on the host membrane to be already available in a specific arrangement. High expression of receptor molecules on the host cell surface is expected to increase the probability of tripod binding, but there is no existing information on whether such a precise arrangement, suitable for interaction with the trimeric RBD, is present on the host surface. Moreover, the membrane bilayer structure does not contain any feature that can direct such a regular host receptor arrangement. It may be noted that only in a tripod arrangement will the N-terminal segment of the S2 domain interact early during the protein-membrane contact after intermediate structure formation, and in such a case the pre-hairpin/pre-bundle helices of the fusion domain are expected to interact head-on with the host membrane (9). However, FP-I and FP-II are located at the middle of the cylinder-like Spike S2 structure (Fig.
3A) and an intermediate formation through conformational transition is needed to place them near the head of the cylinder. Here, FP-I is expected to make the early host-membrane contact if the cleavage is at the S1/S2 site, and FP-II if the cleavage is at the S2' site (see Fig. 1A). The FP-III region has limited scope to make any early contact because it is positioned farthest from the host membrane surface. In the tripod binding mode the virus membrane is still ~150 Å away, and the three HR2 regions need to fold back and bind to the hydrophobic grooves of the HR1 trimer in an antiparallel manner to bridge this gap and form a hemifusion structure with the host membrane (9).

The Contact Initiation Model: Spike fusion peptide trigger. If only one or two RBDs bind the receptor, the vertical anchoring of the Spike fusion domain relative to the host surface lacks the third anchor, rendering the vertical orientation unstable and unfeasible. Also, a recent trimeric structure of the SARS-CoV-2 Spike has shown a single RBD to be in the open conformation (7), where it is swiveled away from its core structure, which originally interacts with the NTD and the fusion domain. In such a state, the interaction of the open RBD with the host (Fig. 3B) is not expected to stabilize the Spike anchoring in any specific orientation relative to the host, due to the weak interaction with the fusion domain. Here a post-cleaved S2 domain is expected to interact side-on with the membrane surface through a "belly" landing to trigger the fusion process (Fig. 3C). Such a process would be sterically facile if there are no other Spikes in the vicinity on the virus surface. It is to be noted that the shape of the trimeric S2 domain is not a proper cylinder, but one with a bulge in the mid-segment, which we call a "belly". The FP-II and FP-I surfaces are located at the crest of this bulge, such that they are able to make the initial contact with the host membrane.
The structural constraints that guarantee the "belly" landing can be understood from the overall geometry of the Spike (Fig. 3D). It is to be noted that the most stable and eventual landing posture of an object having an uneven surface will be the one that guarantees the largest surface area of contact with the landing surface. The Spike architecture is such that the NTDs approximately form the three vertices of a triangle, while FP-I to FP-III are located at the midpoints of the sides, forming the vertices of an inner triangle. Based on the maximum landing surface premise, if we consider the receptor attachment to be at two RBD locations, the S2 landing will be close to the midpoint of the two vertices of the outer triangle, coincident with the FP surfaces. Even for one-legged attachment, the contact must always be directed towards the midpoint of the two NTDs because that allows the maximum contact surface to be formed, where the Spike can stably rest on the host surface.

During [...] FP-III fragment revealed its unique ability to form a cis-peptide at the central P-P peptide bond (39). A cis-trans isomerization during the membrane fusion process has the potential to expose the hydrophobic residues efficiently around the isomerized-peptide neighborhood, enhancing the fusion trigger potential. While the structural role of proline cannot be discounted, other stabilizing interactions also dominate the FPs, among which the formation of the aromatic/hydrophobic clusters is of relevance to the fusion process (see Fig. S3 for examples). As observed from the packing of the loops, a combination of aromatic and Val/Ile/Leu side chains pack tightly to exclude water. But when these regions become exposed during the conformational transition, they would enhance the hydrophobic interaction of the protein surface with the membrane.
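The hydrophobic character attributed to Val/Ile/Leu-rich segments can be illustrated with a simple per-residue score. The sketch below assumes the Kyte-Doolittle hydropathy scale, which is a common choice but is not stated in this study as the measure used; the function name `mean_hydropathy` and the toy peptides are hypothetical and are not Spike sequences.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
# Assumption: this scale is swapped in for illustration only.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle value over a peptide (higher = more hydrophobic)."""
    residues = peptide.upper()
    if not residues:
        raise ValueError("empty peptide")
    unknown = sorted(set(residues) - set(KD))
    if unknown:
        raise ValueError("unknown residue codes: " + ", ".join(unknown))
    return sum(KD[aa] for aa in residues) / len(residues)


# Hypothetical toy peptides, chosen only to contrast the two extremes.
toy_branched = "VIL"   # Val/Ile/Leu, the side chains named in the text
toy_charged = "DEK"    # charged residues for comparison

print(round(mean_hydropathy(toy_branched), 2))
print(round(mean_hydropathy(toy_charged), 2))
```

A sequence-averaged score of this kind ignores three-dimensional packing and solvent exposure, so it only indicates which segments are compositionally hydrophobic and cannot by itself show which residues become exposed during the conformational transition.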
In all S2 domains, however, FP-III regions are in part masked by the FP-I segment, as evident from the relative solvent accessible surface area (SASA) values indicated in Fig. 4. The FP-III region can get fully exposed after cleavage at the S2' site, which dislodges the sheathing FP-I segment, or when the site gets progressively exposed during the conformational transition amidst the fusion process.

Target for therapy. Our study brings out the importance of the fusion loop region, which could be a legitimate target for the design of vaccines or synthetic agents for therapy against COVID-19. The S2 domain serves as a better therapeutic target than S1 due to its higher evolutionary conservation, but the few attempts made have mainly focussed attention on the heptad repeat regions (9). Two important features also make the fusion loop an attractive therapeutic target: its accessibility, due to its surface exposure in the full-length Spike, and the relatively high conservation of residues in the fusion loop, especially around the FP-II region, which increases its scope as a pancoronavirus target that can cater to future pandemic threats as well. Mimetic peptides can be designed to bind to the fusion loop to inhibit the fusogenic conformational transition of the S2 domain. Impairment of the fusion trigger would have a direct bearing on the fusogenicity of the virus and contribute to the reduction of lung invasion and damage that clinically results in acute pneumonia. Systematic studies can identify the minimal motif in the fusion loop serving as the fusogenic determinant, to improve our selection of a potential therapeutic target to prevent cell-to-cell fusion and subsequent pathogenesis.

The interplay of the outlined physicochemical features determines how efficient the virus-entry process becomes. The local and global stability of the S2 domain is important.
Since the S2 domain undergoes a conformational transition, local stability means reasonably rigid moving parts, and global stability means a well-defined conformational transition pathway from the metastable to the stable state. This local and global stability requirement could be attributed to the physicochemical efficiency needed in disturbing the host membrane. Secondly, the electrostatic potential of the surface patches derived from the fusion peptides must be neutral or positive to be able to engage the host membrane. The presence of glycosylation sites adds to the hydrophobicity of the S2 domain surface, but it is a small fraction of the surface available for interaction with the host membrane. The fusogenic conformational transition requires optimal synergy between the physical and chemical properties of the fusion loop to allow a concurrent scything action that rapidly facilitates the transition to the host-virus hemifusion membrane state. The free energy released by the conformational transition of S2 to a more relaxed helical bundle is available to disrupt the host membrane and overcome the kinetic barrier to bringing the host and virus membrane lipid bilayers together. The Contact Initiation Model ensures that the virus and host membranes are in close proximity for the formation of a hemifusion structure. The pre-hairpin S2 intermediate, suggested to exist by many researchers, may be one of the many conformational states interacting with the host membrane. Priming by Spike cleavage is important for facilitating the rapid fusion process and is therefore a part of the synergy at play. However, the open conformation of the RBD seen in PDB ID: 6VSB for SARS-CoV-2 suggests that flexible linker segments loosely connect the RBD back to the fusion domain, leaving it relatively free for an unfettered conformational transition. Therefore, multifarious options to prime and trigger appear to be available to SARS-CoV-2 for viral entry, which contributes to its increased infectivity.
Preventing the trigger by inhibiting the fusion loop is therefore a suitable strategy for therapy. Given its importance, a more extensive study of the SARS-CoV-2 Spike protein and of the mechanistic hypothesis described here is warranted.

The sequences used in this study were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov). The multiple sequence alignments were performed using the T-Coffee webserver (http://tcoffee.crg.cat/) with the default alignment parameter values available on the server. The server combines several methods to come up with an optimal multiple sequence alignment (42). All protein three-dimensional structures were downloaded from the Protein Data Bank [...]

[Multiple sequence alignment excerpt: the surviving rows include 229E|NP_073551 and the Consensus line.]

The model is created using candle wax, where the base represents the triangular shape found for the trimeric Spike S1 domain, and the cylindrical stem represents the trimeric S2 domain. The stem is attached to the virus envelope through trimeric flexible linkers at the C-terminal end of the Spike ectodomain. The model on the extreme left shows that an unanchored structure on a triangular base, when bumped from the side, will fall into a position where one of the base sides rests on the surface. This in turn would ensure that the fusion peptide surface on the stem, aligned along the middle of the base, will make the initial contact and trigger the fusion process (please check the triangles marked in Fig. 2D alongside). The fact that an orientation resting on a corner of the base is not stable is emphasized in the third and fourth illustrations, and the model always falls such that it rests on a side of the base. This architecture guarantees that the fusion peptides in the S2 domain make the initial contact, given that they are located at the crest of the bulge on the S2 domain surface.
The last figure emphasizes that three anchors are needed for a stable vertical "tripod" orientation of the Spike with respect to the host surface.

Fig. 4 (main text). The ball and stick model of the starting structures of all the fusion peptides with a central proline is shown along with their backbone trace. The central prolines are marked by "P". Carbon atoms are in green, oxygen in red, nitrogen in blue, and sulfur in yellow. The location of the aromatic cluster is marked by an oval in the SARS-CoV-2 and SARS-CoV Spike proteins. The spread of the conformation in each FP can be visually estimated from the contours projected onto the X-Y plane.

COVID-19 infection: the perspectives on immune responses
SARS-CoV-2: an Emerging Coronavirus that Causes a Global Threat
COVID-19, SARS and MERS: are they closely related?

The sequences for HCoV-NL63 and HCoV-229E are not marked because they belong to the α-coronaviruses, whereas all others are from β-coronaviruses. The RED marked region, spanning the C-terminal of heptad repeat region 2 and the N-terminal region of the trans-membrane domain, is also called the aromatic domain or the fourth fusion peptide, FP-IV.

[Table columns: Accession number of Genome/Spike protein sequences used; References used to annotate the alignment]

Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Cellular entry of the SARS coronavirus
Inhibition of SARS-CoV-2 (previously 2019-nCoV) infection by a highly potent pancoronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion

Acknowledgments

DP thanks the Department of Biotechnology, New Delhi for supporting the computational facilities, and Prof. Jayasri Das Sarma, Department of Biological Sciences, Indian Institute of Science Education and Research Kolkata for her generous support.