key: cord-0757907-uc8xabac authors: Podder, Soumita; Ghosh, Avishek; Ghosh, Tapash title: Mutations in membrane‐fusion subunit of spike glycoprotein play crucial role in the recent outbreak of COVID‐19 date: 2020-10-14 journal: J Med Virol DOI: 10.1002/jmv.26598 sha: 69c41e85d7a3aa0c4e01e0aa3f0b4c11323303db doc_id: 757907 cord_uid: uc8xabac
COVID‐19, the ongoing pandemic caused by SARS‐CoV2, is a major threat to the entire human race. SARS‐CoV2 has been reported to have relatively low pathogenicity and higher transmissibility than SARS‐CoV, which caused the earlier outbreak. To explore the reasons for the increased transmissibility of SARS‐CoV2 compared with SARS‐CoV, we performed a comparative analysis of the structural proteins (spike, envelope, membrane, and nucleoprotein) of the two viruses. Our analysis revealed that extensive substitutions of hydrophobic amino acids by polar and charged ones in the spike glycoprotein of SARS‐CoV2 create an intrinsically disordered region (IDR) at the beginning of the membrane‐fusion subunit, as well as intrinsically disordered residues in the fusion peptide. The IDR provides a potential site for proteolysis by furin, and the enriched disordered residues facilitate prompt fusion of SARS‐CoV2 with the host membrane by recruiting molecular recognition features (MoRFs). We hypothesize that the mutation‐driven accumulation of intrinsically disordered residues in the spike glycoprotein plays a dual role in enhancing viral transmissibility relative to the earlier SARS coronavirus. These analyses may help in epidemic surveillance and in preventive measures against COVID‐19. This article is protected by copyright. All rights reserved.
nucleocapsid (N) protein, membrane (M) protein, and envelope (E) protein, all of which are required to produce a structurally complete viral particle [7]. Coronaviruses enter host cells using the transmembrane spike (S) glycoprotein, which forms homotrimers protruding from the viral envelope [8].
The S protein comprises two functional subunits: S1, which binds the host cell receptor, and S2, which mediates fusion of the viral envelope with the host cell membrane. For many CoVs, the S protein is cleaved at the boundary between the S1 and S2 subunits, which remain non-covalently associated in the prefusion conformation [9]. The distal S1 subunit comprises the receptor-binding domain (RBD) and helps stabilize the prefusion state of the membrane-anchored S2 subunit, which contains the fusion machinery [10]. Cleavage at the S1/S2 boundary is thought to activate the protein for membrane fusion through irreversible conformational changes [11]. The host proteases that cleave the S protein differ among coronaviruses, and these differences play crucial roles in determining the epidemiological and pathological features of each virus, including host range, tissue tropism, transmissibility, and mortality. For example, a variety of human proteases, such as trypsin, tryptase Clara, human airway trypsin-like protease (HAT), and transmembrane protease serine 2 (TMPRSS2), have been reported to cleave and activate the S protein of SARS-CoV [12, 13]. Depending on the viral species, coronaviruses use a variety of entry receptors to infect the host. SARS-CoV and several SARS-related coronaviruses (SARSr-CoV) interact directly with angiotensin-converting enzyme 2 (ACE2) via the S protein to enter target cells [14]. Recently, mutations in the receptor-binding motif (RBM) of SARS-CoV-2 have been reported to make human-to-human transmission more efficient [15]. In addition, the SARS-CoV-2 S glycoprotein possesses a furin cleavage site at the S1/S2 boundary, which helps activate the fusion machinery of the virus [16, 17]. These two distinctive features could partially explain the efficient transmission of SARS-CoV-2 in humans. A recent study by S. Zhao et al.
[18] estimated the basic reproduction number (R0) of 2019-nCoV in the early phase of the outbreak and found that the mean R0 of SARS-CoV2 ranges from 3.3 to 5.5, which overlaps with but extends above the range reported for SARS-CoV (R0: 2-5). The higher transmissibility of this virus turned the outbreak into a pandemic. It is therefore of prime interest to identify the distinctive features of this newly emerged coronavirus by comparing it with SARS-CoV, the earlier human-infecting coronavirus, in order to design protective measures against it. We studied the mutational spectra and evolutionary dynamics of the four structural proteins in depth by comparing SARS-CoV2 with human-infecting SARS-CoV. Analyzing the impact of these mutations, we found that SARS-CoV2 has acquired an intrinsically disordered region at the beginning of the fusion subunit (S2), which harbors the furin cleavage site. Moreover, the S2 subunit, which has a higher proportion of intrinsically disordered residues, contains three MoRFs. We therefore hypothesize a unique fusion mechanism favored by the MoRFs present in the fusion peptide of the novel coronavirus. Our study thus provides new insight into the genomic features responsible for the rapid transmission of SARS-CoV2 and could help in designing preventive measures against COVID-19. Coordinates of the S1 and S2 subunits of the S proteins in SARS-CoV and SARS-CoV2 were
All statistical tests were performed using the SPSS package. Bayesian evolutionary rate and divergence date estimates showed that the nonsynonymous-to-synonymous substitution rate ratio decreases from SARS (1.
experience significantly (P = 0.001) higher synonymous substitution rates than the other proteins, which helps retain the overall conservation of the S proteins (Table 1). It is well established that RNA viruses have higher mutation rates than DNA viruses because the RNA polymerases they encode lack proofreading activity [33]. However, it would be interesting to investigate whether the preferential accumulation of non-synonymous mutations in the S proteins, relative to the other structural proteins, benefits the virus by enhancing its infectivity. We
Table 2.
Several high-throughput studies of protein structure have shown that protein regions enriched in polar and charged amino acids tend to form intrinsically disordered regions (IDRs) [34]. Moreover, intrinsically disordered residues confer on viral proteins several structural features associated with viral pathogenicity [35]. We therefore predicted IDRs in all structural proteins of SARS-CoV2 with PONDR-VLXT and compared their IDR content with that of the corresponding proteins of SARS-CoV. The M and E proteins of both SARS-CoV and SARS-CoV2 contain no IDR (defined as more than 30 consecutive disordered residues) (Table 3). The N proteins contain three IDRs, and their percentage of intrinsically disordered residues is remarkably high; however, IDR enrichment in the N proteins is similar in both viruses (Table 3). Interestingly, we identified an IDR (residues 671-708) in the S protein of SARS-CoV2, whereas no IDR was found in its SARS-CoV ortholog (Figure 1A, Table 3). Moreover, the percentage of intrinsically disordered residues (PID) is significantly (P = 0.035) higher in the S protein of SARS-CoV2 than in SARS-CoV, implying that disordered residues became enriched in the S protein during evolution
the host. It is therefore imperative to explore the connection between the IDR and the elevated transmissibility of SARS-CoV2. S proteins contain two subunits, S1 and S2. Pfam prediction on the S proteins of the two viruses
[16, 17] We found that this cleavage site actually lies within the IDR. Because IDPs/IDPRs lack stable, well-folded 3D structures, their structural instability makes them exceptionally sensitive to proteolysis [36].
Proteins expressed on the surface of a pathogen are expected to be more accessible to immune surveillance than those in its interior [40]. Greater genetic variation in surface proteins is therefore a signature of host-pathogen coevolution. In this study, we found that, among the four structural proteins, the spike glycoprotein of SARS-CoV2 has experienced the highest rate of non-synonymous substitution relative to the human-infecting SARS-CoV strain. Besides amino acid substitutions with neutral effects on viral fitness, the S protein also acquired five deleterious mutations that may destabilize the viral structure. According to the neutral theory of molecular evolution, mutations that decrease the carrier's fitness tend to disappear from populations through negative, or purifying, selection (dN/dS < 1) [41]. Accordingly, the S protein has also experienced a higher synonymous substitution rate, which balances the overall selection pressure on it. It has also been shown that slightly deleterious and slightly advantageous mutations are swamped by neutral mutations. Consequently, although the dN/dS ratio is frequently used to study positive Darwinian selection at highly variable genetic loci, it cannot detect the individual adaptively important codons that benefit an organism [42]. We therefore studied in detail the amino acid changes that distinguish all structural proteins of SARS-CoV2 from those of SARS-CoV, to identify mutations that give the novel virus an advantage for systemic infection of the human body. The mutations in the four structural proteins of SARS-CoV2 produce significantly more hydrophobic-to-polar and hydrophobic-to-charged amino acid exchanges in the S protein than in the E, M, and N proteins (Table 2).
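Two quantities drive the comparison above: the dN/dS ratio and the physicochemical direction of each amino acid change. The sketch below illustrates both under stated assumptions. `dn_ds` follows the Nei-Gojobori (1986) counting method that the authors cite, with a Jukes-Cantor correction; it is not the PAML analysis, it scores changes to stop codons as nonsynonymous sites, and it averages differences only over pathways that avoid intermediate stop codons. `substitution_classes` tallies aligned amino acid changes using one common grouping (hydrophobic GAVLIMFWP, polar STCNQY, charged DEKRH), which is an assumption; the grouping behind Table 2 may differ. All function names and toy sequences are hypothetical.

```python
import itertools
import math
from collections import Counter

# Standard genetic code; codons enumerated in TCAG order, '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def syn_sites(codon: str) -> float:
    """Synonymous sites: each single-base change contributes 1/3 if it keeps the amino acid."""
    aa = CODE[codon]
    return sum(1 / 3 for i in range(3) for b in BASES
               if b != codon[i] and CODE[codon[:i] + b + codon[i + 1:]] == aa)


def syn_nonsyn_diffs(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, nonsynonymous) differences averaged over mutational pathways
    that avoid intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    sd = nd = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        cur, s, n = c1, 0, 0
        for i in order:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        else:  # inner loop finished without break: pathway is valid
            sd, nd, n_paths = sd + s, nd + n, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return sd / n_paths, nd / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction; saturated (infinite) when p >= 0.75."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple[float, float]:
    """(dN, dS) for two aligned, gap-free coding sequences of equal length."""
    seq1, seq2 = seq1.upper().replace("U", "T"), seq2.upper().replace("U", "T")
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned codon by codon")
    S = N = Sd = Nd = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3], seq2[k:k + 3]
        if "*" in (CODE[c1], CODE[c2]):
            continue  # stop codons are excluded
        s = (syn_sites(c1) + syn_sites(c2)) / 2  # average over the two sequences
        S, N = S + s, N + 3 - s
        sd, nd = syn_nonsyn_diffs(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


def selection_regime(omega: float) -> str:
    """Conventional reading of omega = dN/dS."""
    if omega < 1:
        return "purifying"
    return "positive" if omega > 1 else "neutral"


# Assumed physicochemical grouping; the scheme behind Table 2 may differ.
AA_CLASS = {**dict.fromkeys("GAVLIMFWP", "hydrophobic"),
            **dict.fromkeys("STCNQY", "polar"),
            **dict.fromkeys("DEKRH", "charged")}


def substitution_classes(old: str, new: str) -> Counter:
    """Count (from_class, to_class) pairs at aligned positions that differ.
    Gaps ('-') and unrecognized symbols are skipped."""
    if len(old) != len(new):
        raise ValueError("protein sequences must be aligned to equal length")
    return Counter((AA_CLASS[a], AA_CLASS[b]) for a, b in zip(old, new)
                   if a != b and a in AA_CLASS and b in AA_CLASS)


# Toy codon alignment (hypothetical): two synonymous changes, one nonsynonymous change.
dn, ds = dn_ds("ATGAAACTTGGTGCT", "ATGAAGCTTGGCACT")
print(f"dN={dn:.4f} dS={ds:.4f} regime={selection_regime(dn / ds)}")
# Toy protein alignment (hypothetical).
print(substitution_classes("ALVG-KDS", "SLTE-KDA"))
```

A ratio below 1 is conventionally read as purifying selection. Because a whole-gene ratio cannot single out individual adaptive codons, the direction of each amino acid change is tallied separately rather than inferred from dN/dS.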
This trend of amino acid exchange in the S protein generates an intrinsically disordered region (38 residues) upstream of the fusion peptide in the membrane-anchored S2 subunit of SARS-CoV2. In contrast, amino acid substitutions in the M, E, and N proteins did not create any new IDR in SARS-CoV2 compared with SARS-CoV. The N protein of SARS-CoV was earlier reported to be extensively enriched in intrinsically disordered residues [43], and we found that the proportion of disordered residues in the N protein of SARS-CoV2 (49.6%) is nearly the same as in SARS-CoV (50.7%). The enrichment of disordered residues in N proteins has been suggested to be crucial for their transmission by the respiratory route [43]. Whereas, lower content of disordered residues in
of charged and polar amino acids in disordered proteins makes them more efficient at generating lateral pressure, which in turn drives membrane curvature. Many IDRs have been shown to induce membrane curvature by recruiting Molecular Recognition Features (MoRFs) [47]. MoRFs are relatively short (10-70 residues) and typically rich in hydrophilic amino acids and prolines [48, 49]. They can play vital roles in protein-protein interactions, metal binding, and cellular communication [50], and several roles of MoRFs have also been documented in Chikungunya virus [51]. We have noticed that abundance of disordered residues in SARS-
In summary, these analyses provide insight into how mutations give rise to intrinsically disordered residues in the S2 subunit of the SARS-CoV2 spike glycoprotein. We have also hypothesized a unique MoRF-mediated mechanism for fusion of the viral envelope with the host membrane.
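The disorder statistics used in this analysis can be computed from any per-residue disorder profile. The sketch below assumes scores in [0, 1] (for example, exported from a PONDR-VLXT run) and treats residues scoring at or above 0.5 as disordered, a common cut-off that is assumed here. It reports IDRs under the definition used in the text (more than 30 consecutive disordered residues), the percentage of disordered residues (PID), and disordered runs of 10-70 residues together with their share of hydrophilic residues and prolines. That last filter only checks the two MoRF properties stated above; it is not MoRFchibi or any other MoRF predictor. The hydrophilic set, function names, and toy scores are hypothetical.

```python
from typing import Sequence

# Assumed set of hydrophilic residues (prolines are counted separately below).
HYDROPHILIC = frozenset("DEKRHSTNQ")


def disordered_runs(scores: Sequence[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Maximal runs of residues with score >= threshold, as 1-based inclusive (start, end)."""
    runs, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a disordered run begins
        elif s < threshold and start is not None:
            runs.append((start + 1, i))  # run covered 0-based start..i-1
            start = None
    if start is not None:  # run reaches the C-terminus
        runs.append((start + 1, len(scores)))
    return runs


def idrs(scores: Sequence[float], threshold: float = 0.5, min_len: int = 31) -> list[tuple[int, int]]:
    """IDRs as defined in the text: more than 30 consecutive disordered residues."""
    return [(a, b) for a, b in disordered_runs(scores, threshold) if b - a + 1 >= min_len]


def percent_disordered(scores: Sequence[float], threshold: float = 0.5) -> float:
    """PID: percentage of residues predicted to be disordered."""
    return 100.0 * sum(s >= threshold for s in scores) / len(scores)


def morf_length_candidates(seq: str, scores: Sequence[float], threshold: float = 0.5,
                           min_len: int = 10, max_len: int = 70) -> list[tuple[int, int, float]]:
    """Disordered runs of min_len..max_len residues with their hydrophilic+proline fraction.

    A crude filter on length and composition only, NOT a MoRF predictor.
    """
    if len(seq) != len(scores):
        raise ValueError("sequence and score profile must have the same length")
    out = []
    for a, b in disordered_runs(scores, threshold):
        if min_len <= b - a + 1 <= max_len:
            segment = seq[a - 1:b]
            frac = sum(aa in HYDROPHILIC or aa == "P" for aa in segment) / len(segment)
            out.append((a, b, frac))
    return out


# Hypothetical profile: 10 ordered, 38 disordered, 12 ordered residues.
scores = [0.2] * 10 + [0.8] * 38 + [0.3] * 12
print(idrs(scores))                          # [(11, 48)]
print(round(percent_disordered(scores), 2))  # 63.33
```

Keeping run detection separate from the IDR and MoRF-length filters means the same profile can be summarized under different length definitions without re-scanning the scores.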
However, these propositions are based mainly on our sequence analyses and on experimental evidence from other organisms; further experimental validation is required to confirm this mechanism in coronaviruses.
We are thankful to Mr. Sanjib Kumar Gupta, Senior Technical Assistant, Bioinformatics Center, Bose Institute, for his kind help. We also thank two anonymous referees for their valuable suggestions.
SP designed and executed the experiments and wrote the manuscript. AG performed parts of the experiments. TCG helped in manuscript preparation.
The authors declare no conflict of interest.
All data will be available in the Supplementary
Total disordered residues: 14 | 13
No. of disordered regions (>30 aa): nil | nil
PID: 6.28 | 5.86
Total disordered residues: 10 | 9
Interspecies transmission and emergence of novel viruses: Lessons from bats and birds
Bat-to-human: Spike features determining 'host jump' of coronaviruses SARS-CoV, MERS-CoV, and beyond
World Health Organization (WHO). Summary of probable SARS cases with onset of illness from 1
World Health Organization (WHO). WHO MERS-CoV Global Summary and Assessment of Risk
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet
World Health Organization (2020). Coronavirus disease (COVID-19) outbreak. Situation report 117, 16th
The molecular biology of coronaviruses
Structural insights into coronavirus entry
Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites
Cryo-electron microscopy structures of the SARS-CoV spike glycoprotein reveal a prerequisite conformational state for receptor binding
Host cell proteases: Critical determinants of coronavirus tropism and pathogenesis
Cathepsin L functionally cleaves the severe acute respiratory syndrome coronavirus class I fusion protein upstream of rather than adjacent to the fusion peptide
Cleavage and activation of the severe acute respiratory syndrome coronavirus spike protein by human airway trypsin-like protease
Ward. Stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis
Receptor Recognition by the Novel Coronavirus from Wuhan: An Analysis Based on Decade-Long Structural Studies of SARS Coronavirus
A Unique Protease Cleavage Site Predicted in the Spike Protein of the Novel Pneumonia Coronavirus (2019-nCoV) Potentially Related to Viral Transmissibility. Virologica Sinica
Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the early phase of the outbreak
ViPR: an open bioinformatics database and analysis resource for virology research
Phylogenetic analysis by maximum likelihood (PAML). Version 2. 1999
Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions
PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels
Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm
Predicting protein disorder for N-, C-, and internal regions
Sequence complexity of disordered protein
First Experimental Assessment of Protein Intrinsic Disorder Involvement in an RNA Virus Natural Adaptive Process
Deciphering the dark proteome of Chikungunya virus
Structural disorder in the proteome and interactome of Alkhurma virus (ALKV)
MoRFchibi SYSTEM: software tools for the identification of MoRFs in protein sequences
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic
On the origin and continuing evolution of SARS-CoV-2. Microbiology
Exploring the Differences in Evolutionary Rates between Monogenic and Polygenic Disease Genes in Human
Evolutionary rate at the molecular level
Complexities of Viral Mutation Rates
Why are "natively unfolded" proteins unstructured under physiologic conditions?
Overlapping Regions in HIV-1 Genome Act as Potential Sites for Host-Virus Interaction
Intrinsically Disordered Proteins and Their "Mysterious
Membrane fission by protein crowding
Intrinsically disordered proteins in synaptic vesicle trafficking and release
Shaping membranes with disordered proteins
Harnessing bioinformatics to discover new vaccines
Neutral theory: The null hypothesis of molecular evolution
Selectionism and Neutralism in Molecular Evolution
Understanding Viral Transmission Behavior via Protein Intrinsic Disorder Prediction: Coronaviruses
High-throughput characterization of intrinsic disorder in proteins from the protein structure initiative
Intrinsically Disordered Side of the Zika Virus Proteome
The importance of being flexible: the case of basic region leucine zipper transcriptional regulators. Current Protein & Peptide Science
Mining α-helix-forming molecular recognition features with cross species sequence alignments
Characterization of molecular recognition features, MoRFs, and their binding partners
Computational Prediction of MoRFs, Short Disorder-to-order Transitioning Protein Binding Regions
Analysis of Molecular Recognition Features (MoRFs) in membrane proteins
Understanding the interactability of chikungunya virus proteins via molecular recognition feature analysis
Alpha-synuclein induces both positive mean curvature and negative Gaussian curvature in membranes
Intrinsically disordered proteins drive membrane curvature
ArfGAP1 responds to membrane curvature through the folding of a lipid packing sensor motif
The number of α-synuclein proteins per vesicle gives insights into its physiological function