key: cord-0684775-7axo7k51 authors: Satarker, Sairaj; Nampoothiri, Madhavan title: Structural Proteins in Severe Acute Respiratory Syndrome Coronavirus-2 date: 2020-05-25 journal: Arch Med Res DOI: 10.1016/j.arcmed.2020.05.012 sha: 849c68250076639ed0f7f6364a51856bfb1960f9 doc_id: 684775 cord_uid: 7axo7k51 Abstract What began with signs of pneumonia-related respiratory disorders in China has now become a pandemic, named Covid-19 by the WHO and known to be caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). SARS-CoV-2 is a newly emerged β coronavirus belonging to the Coronaviridae family. It has a positive-sense RNA genome expressing open reading frames that code for structural and non-structural proteins. The spike, nucleocapsid, membrane, and envelope proteins are the structural proteins. The S1 subunit of the spike protein facilitates ACE2-mediated virus attachment, and the S2 subunit facilitates membrane fusion. The presence of glutamine, asparagine, leucine, phenylalanine and serine amino acids in SARS-CoV-2 enhances ACE2 binding. The N protein is composed of a serine-rich linker region sandwiched between an N terminal domain (NTD) and a C terminal domain (CTD). These terminal domains play a role in viral entry and in processing after entry. The NTD of the SARS-CoV-2 N protein forms orthorhombic crystals and binds to the viral genome. The linker region contains phosphorylation sites that regulate its functioning. The CTD promotes nucleocapsid formation. Envelope proteins contain an NTD, a hydrophobic domain and a C terminal, and form viroporins needed for viral assembly. Membrane proteins have a hydrophilic C terminal and an amphipathic N terminal. The long form of the membrane protein promotes spike incorporation, and its interaction with E facilitates virion production. As each protein is essential in viral functioning, this review describes insights into the SARS-CoV-2 structural proteins that would help in developing therapeutic strategies targeting each protein to curb the rapidly growing pandemic.
S-E-M-N -3' flanked by untranslated regions on both ends. The rep gene codes for the non-structural proteins, the membrane and envelope proteins are essential in viral assembly, and the nucleocapsid protein is important for RNA synthesis (15). The spike protein is responsible for the attachment of the virus to the host cell and its subsequent entry (6, 15, 16). Recently, mutations in the ORF1, ORF8 and N regions of SARS-CoV-2 were observed (17). Mutations in non-structural proteins (nsp) 2 and 3 could be the reason for its unique mechanism of action compared with SARS. The presence of glutamine, serine, and proline at various positions in the sequence of SARS-CoV-2 is believed to affect its properties, as shown in Table 1, Table 2 and Table 3 (18). Only 3.8% variability was seen between the genome sequence of SARS-CoV-2 and that of the bat coronavirus strain RaTG13, indicating 96.2% similarity between the two. SARS-CoV-2 genomes were found to fall into two types of strains, namely L and S. In a recent analysis of the genomes from 103 patients infected with SARS-CoV-2, 101 showed linkage between single-nucleotide polymorphisms. Of these, 72 were L type, named for the leucine codon involved, and 29 were S type, named for the serine codon involved. This change was observed at sites 8,782 and 28,144 of the sequence (19). Similar analyses of the SARS-CoV-2 genome identified Type I and Type II genotypes that differ at sites 8750, 29063 and 28112 (20), as shown in Figure 5, and genome types A (the ancestral genome), B (derived from type A by the mutations C28144T and T8782C) and C (derived from type B by the mutation G26144T) (21).
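The figures quoted above are simple proportions, and a short calculation makes their relationship explicit. The sketch below is only illustrative: the counts (103, 101, 72, 29) and the 3.8% variability come from the text, while the function names and the choice to round to one decimal place are assumptions made here.

```python
# Counts quoted in the text for the L/S typing analysis (ref. 19).
TOTAL_GENOMES = 103
LINKED_GENOMES = 101
L_TYPE = 72
S_TYPE = 29


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def similarity_from_variability(variability_pct):
    """Similarity is the complement of the reported sequence variability."""
    return round(100.0 - variability_pct, 1)


# The L and S strains together account for all linked genomes.
assert L_TYPE + S_TYPE == LINKED_GENOMES

l_share_of_all = percent(L_TYPE, TOTAL_GENOMES)
s_share_of_all = percent(S_TYPE, TOTAL_GENOMES)
ratg13_similarity = similarity_from_variability(3.8)

if __name__ == "__main__":
    print("L type share of all genomes (%):", l_share_of_all)
    print("S type share of all genomes (%):", s_share_of_all)
    print("Similarity to RaTG13 (%):", ratg13_similarity)
```

Computing the shares against all 103 genomes rather than the 101 linked ones is a design choice of this sketch; either denominator can be passed to `percent`.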
Full genome sequencing of various SARS-CoV-2 isolates has shown multiple variations in different ORFs when compared with the sequence of the bat coronavirus RaTG13, which shows that as the virus has spread globally it has evolved through numerous mutations (22), as shown in Supplementary Table 1, and this could place the medical fraternity in an even more alarming situation.

Structural Proteins of SARS-CoV-2

The envelope of the coronavirus virion carries projections protruding from its surface, called the large surface glycoproteins or spike proteins, which are responsible for recognizing the host's receptor, binding to it and fusing with the host membrane (23). Because of the crown-shaped appearance of these projections, the virus has been named coronavirus {corona-a crown (Latin)} (24). The amino acid sequence of the SARS-CoV-2 spike protein has ~75% homology with the SARS-CoV spike protein (25). Other findings have reported a 70% similarity of the S1 subunit and 99% similarity of the S2 subunit of SARS-CoV-2 with SARS-CoV (11). The spike protein contains 1273 amino acids and has a molecular weight of about 141,178 Da, i.e. roughly 141 kDa (26). Generally, a coronavirus particle may contain about 50-100 trimers of spikes (27). The spike protein consists of an ectodomain element, a transmembrane moiety and a short intracellular C fragment. The viral ectodomain comprises two subunits, namely S1, which facilitates receptor binding, and S2, which facilitates membrane fusion. It is structured like a clove built from three S1 subunits and a trimeric S2 stem (10), as shown in Figure 6A. The genomic sequence of spike with its amino acids is shown in Figure 6B. The functional components of the spike protein were mapped according to their amino acid positions, as shown in Table 4. Spike proteins also promote adhesion of infected cells to adjacent non-infected cells, which enhances the spread of the virus (29).
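As a rough plausibility check on the spike protein size quoted earlier in this section, the mass of a protein can be estimated from its length. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the text, and reads the reported figure of 141178 as daltons; the function and constant names are hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue (an assumption of this
# sketch, not a value from the review).
AVERAGE_RESIDUE_MASS_DA = 110.0

# Values quoted in the text for the SARS-CoV-2 spike protein (ref. 26).
SPIKE_RESIDUES = 1273
REPORTED_MASS_DA = 141178


def estimate_mass_kda(n_residues, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Estimate protein mass in kDa from the number of residues."""
    return n_residues * residue_mass_da / 1000.0


estimate_kda = estimate_mass_kda(SPIKE_RESIDUES)
reported_kda = REPORTED_MASS_DA / 1000.0
relative_gap = abs(estimate_kda - reported_kda) / reported_kda

if __name__ == "__main__":
    print("Estimated mass (kDa):", estimate_kda)
    print("Reported mass (kDa):", reported_kda)
    print("Relative difference:", round(relative_gap, 4))
```

Under these assumptions the estimate and the reported value agree to within a few percent, which is consistent with a mass of about 141 kDa for a 1273-residue polypeptide (glycosylation is ignored here).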
The additional 12 nucleotides towards the arginine cleavage site correspond to a furin-like cleavage site, and this site is known to be cleaved during virus activation (24). Angiotensin-Converting Enzyme 2 (ACE2) and Type II transmembrane serine protease (TMPRSS2) are co-expressed in type II pneumocytes. TMPRSS2 initiates cleavage and subsequent activation of the spike protein, which promotes membrane fusion, viral entry into cells and transmission of the infection to neighbouring cells, specifically under low-pH conditions (30). The spike protein of SARS-CoV-2 undergoes a structural transformation so that the viral membrane can bind to the host membrane. A Receptor Binding Domain (RBD) in the S1 subunit recognizes and binds human ACE2 with 10-20 times higher affinity than the SARS-CoV domain (31), possibly due to a polymorphism at 501T, thus enhancing receptor-spike interactions and promoting infectivity (32). A unique feature is seen at the S1 and S2 sites of SARS-CoV-2, where both sites form an extended loop that lies towards the outer end of the trimer, rendering SARS-CoV-2 more susceptible to proteolytic processing by host cell proteases (4). The S1 subunit occurs in the prefusion trimer phase. However, upon binding to the host cell receptor, it destabilizes and is shed, causing the S2 subunit to transition into a stable postfusion state. The spike protein can present itself in an upward phase (available for a receptor) or a downward phase (unavailable for a receptor). In the downward phase of SARS-CoV-2, the RBD lies near the trimer's central pocket (31). The S2 subunit comprises a single fusion peptide (FP), an additional proteolytic site with an internal fusion peptide, and, just before the transmembrane domain, two heptad repeats (24). A recent analysis has suggested four novel inserts in the spike protein that are unique to SARS-CoV-2.
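The furin-like cleavage site mentioned at the start of this passage lends itself to a small worked example: 12 nucleotides correspond to four codons, and a furin-like site can be searched for as a short residue pattern. The sketch below is illustrative only. It assumes the minimal Arg-X-X-Arg pattern as the definition of a furin-like motif, and the peptide it scans is a short illustrative fragment chosen for this example rather than a sequence quoted in the text; all names are hypothetical.

```python
import re

CODON_LENGTH = 3  # nucleotides per amino acid


def residues_encoded(n_nucleotides):
    """Number of amino acids encoded by an in-frame stretch of nucleotides."""
    if n_nucleotides % CODON_LENGTH != 0:
        raise ValueError("length is not a whole number of codons")
    return n_nucleotides // CODON_LENGTH


def find_furin_like_sites(peptide):
    """Return 1-based start positions of Arg-X-X-Arg matches.

    A lookahead is used so that overlapping matches are also reported.
    The R-X-X-R pattern is the minimal motif assumed in this sketch.
    """
    return [m.start() + 1 for m in re.finditer(r"(?=R..R)", peptide)]


# Illustrative fragment in one-letter code (an assumption of this example).
TOY_PEPTIDE = "NSPRRARSVAS"

inserted_residues = residues_encoded(12)
sites = find_furin_like_sites(TOY_PEPTIDE)

if __name__ == "__main__":
    print("Residues encoded by 12 nucleotides:", inserted_residues)
    print("Furin-like motif start positions:", sites)
```

A stricter pattern (for example requiring a basic residue at the third position) could be substituted in `find_furin_like_sites` without changing the rest of the sketch.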
In the S1 subunit, insert I is associated with the N terminal domain, while inserts II and III have been implicated in the C terminal domain. Insert IV lies at the junction of subdomain 1 and subdomain 2 of the S1 subunit. Interestingly, inserts I, II and III have shown remarkable homology with HIV-1 gp120, while insert IV shows homology to Gag proteins (33). Another study revealed characteristic features of certain amino acids in the S protein that give SARS-CoV-2 a high affinity for the ACE2 receptor. The presence of glutamine 493, asparagine 501, leucine 455, phenylalanine 486 and serine 494 has been shown to boost ACE2 binding (32). Therefore, the S protein plays a crucial role in viral entry into host cells, and its structural features in the newly discovered SARS-CoV-2 enhance this function. Because these protruding spikes are the first point of contact with host receptors, therapeutic strategies can be applied to prevent their binding to target receptors and thereby prevent further actions.

The N protein is the most abundant viral protein and is expressed in host samples during the early stages of infection. It is known to bind viral RNA to form a ribonucleoprotein core, which helps in host cell entry and in interaction with cellular processes following fusion of the virus (34). The sequence of the SARS-CoV-2 N protein shows ~90% similarity with the N protein of SARS-CoV (25). The Replication Transcription Complexes (RTC) formed by the non-structural proteins (nsp) play an essential part in viral genome synthesis (35). In the N protein, a linker region is sandwiched between an N terminal domain (NTD) and a C terminal domain (CTD), as shown in Figure 7. In SARS-CoV, the N protein activated the Cyclooxygenase-2 (COX-2) enhancer, leading to the expression of COX-2, which causes inflammation in the lungs (37).
The N protein is also involved in inhibiting the phosphorylation of the B23 phosphoprotein, which is essential for cell cycle progression during centrosome duplication (38). The N protein interacts with the p42 proteasome subunit, which is known to degrade viral proteins (39). It was also shown to inhibit type I Interferon (IFN), restricting the immune responses generated by the body against viral infections (40). Cell line studies showed that the inhibitory action of the N protein on the cyclin-cyclin-dependent kinase complex (C-CDK) reduced the progression of S phase (41). Another cell line study showed reduced cell proliferation owing to inhibition of cytokinesis and protein translation, caused by N protein-induced aggregation of a translation factor named Human elongation factor 1 α (HEF1 α) (42). Viral RNA synthesis is known to increase when the N protein interacts with Heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) (43). Structural elucidation of the SARS-CoV-2 N protein NTD showed that a single asymmetric unit containing four N protein NTD molecules was arranged in an orthorhombic crystal form. Essentially, it resembles a hand, with a wrist made up of acidic moieties, a palm of basic components and a β-sheet core extending like fingers (36). The NTD of the N protein is known to bind the RNA genome: sequencing of the different ordered and disordered regions of the N protein showed that RNA binding occurs at the N45-181 region, which exists as a monomer (44). It was also reported that arginine at position 94 and tyrosine at position 122 might be essential for the binding of SARS-CoV RNA (45).
However, further studies showed that the combination of the linker region with the NTD, and even with the CTD, of the N protein was essential for enhanced binding to viral RNA. The latter showed about 6-8 times higher binding than the former, as the CTD is dimeric in nature, with two disordered regions around it compared with the single disordered region around the NTD (46). The central linker region is rich in serine and arginine residues (SR region) and possesses essential phosphorylation sites that may regulate N protein function (47). This region forms the major phosphorylation site, which enhances protein interactions, directs the localization of linker proteins within the cell, and mediates the desired activity. Increasing the number of alanine substitutions reduced the amount of N protein phosphorylation (48). In the linker region, interactions of the SR moieties with the central region lead to the formation of N protein dimers that are essential for activity (49). The CTD is hydrophobic, rich in helices, and is also known as the dimerization domain, because it contains residues that form homodimers by self-association (45). Structurally, each asymmetric unit of the CTD consists of four homodimers joined to form an octamer. These units are arranged in the shape of the letter "X", forming two symmetric fold structures, each perpendicular to the midpoint of the structure. Owing to the positive charges in this region, the N terminus appears to be basic, which could hint that it is a site for nucleic acid binding (50). The domain components carry large repulsive forces that prevent interactions between the domains. This provides an electrostatically larger binding surface and prevents oligomer and nucleocapsid formation.
Therefore, when nucleic acid binds to it, the charges are neutralized, allowing the protein molecules to come closer and oligomerize, thus forming nucleocapsids (46). The N protein carries out various activities essential to the functioning and proliferation of the virus and is thus another essential component after the spike protein. Along with S, N is a promising target for developing effective therapeutics to prevent the proliferation of viral progeny.

The E protein is a tiny integral membrane protein composed of an NTD, a hydrophobic domain and a C-terminal chain (51), with 76-109 amino acids (52) and a size of 8-12 kDa (53). The N terminal spans amino acids 1-9, the hydrophobic region amino acids 10-37 and the C terminal amino acids 38-76 of the sequence (54). Amino acids 1-11 are present in the virion, while the hydrophobic tail faces the cytoplasm (55). The hydrophobic region oligomerizes to form an ion pore across membranes. Structural elucidation of this protein shows a pentameric form containing 35 α-helical regions and 40 looped regions. Both these structures showed randomized movements that modulated the normal activity of the ion channels, thus enhancing viral pathogenicity (56). In contrast, interactions within the C terminal domain may affect this pentameric conformation (57). A novel feature is seen at position 69 of the SARS-CoV-2 E protein sequence, where arginine takes the place of the alanine, glutamine and aspartate found in other similar coronaviruses. Also, at positions 55-56 threonine and valine have been identified (58). After translation, E can undergo palmitoylation at cysteine residues or N-linked glycosylation at an asparagine residue, but these modifications may not be necessary for virus-like particle (VLP) formation (59). This protein forms viroporins, which are small hydrophobic proteins.
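The E protein domain boundaries given above (N terminal 1-9, hydrophobic region 10-37, C terminal 38-76) can be represented as a small lookup structure. The sketch below is a minimal illustration under the assumption of a 76-residue E protein with 1-based, inclusive coordinates; the class and function names are hypothetical and not part of any library.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A named stretch of residues with 1-based, inclusive boundaries."""
    name: str
    start: int
    end: int

    @property
    def length(self):
        return self.end - self.start + 1


# Boundaries as quoted in the text (ref. 54).
E_PROTEIN_SEGMENTS = [
    Segment("N terminal", 1, 9),
    Segment("hydrophobic region", 10, 37),
    Segment("C terminal", 38, 76),
]


def is_contiguous(segments):
    """True if each segment starts right after the previous one ends."""
    return all(b.start == a.end + 1 for a, b in zip(segments, segments[1:]))


def segment_of(position, segments=E_PROTEIN_SEGMENTS):
    """Return the name of the segment containing a residue position."""
    for seg in segments:
        if seg.start <= position <= seg.end:
            return seg.name
    raise ValueError(f"position {position} is outside the annotated range")


total_length = sum(seg.length for seg in E_PROTEIN_SEGMENTS)

if __name__ == "__main__":
    for seg in E_PROTEIN_SEGMENTS:
        print(seg.name, seg.start, seg.end, seg.length)
    print("Total annotated length:", total_length)
    print("Position 69 lies in:", segment_of(69))
```

With these boundaries the three segments tile the sequence without gaps, and positions 55-56 and 69 discussed in the text all fall in the C terminal segment.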
These viroporins are essential for viral assembly and release. They also mediate pathogenic processes and induce cytotoxicity (60). Heterotypic interactions of nsp2 and nsp3 are essential for inducing the required curvature in the Endoplasmic Reticulum (ER) membrane, and co-expression of M and E facilitates the production of spherical virulent particles (29). The cytoplasmic tail of the SARS-CoV E protein targets the cis-Golgi complex region with the help of the proline residues it contains. The N terminal of the E protein also contains additional Golgi complex-associating elements, so mutations in the tail region do not affect Golgi complex targeting (61). E may dissipate the ionic gradient in the ERGIC and Golgi compartments, which may lead to the exit of the virion (52). The final four amino acids of the C terminal of the E protein host a motif named the postsynaptic density protein/Discs Large/Zonula occludens-1 (PDZ) binding motif (PDM). Therefore, the E protein may facilitate disruption of the lung epithelium through binding of Protein Associated with Caenorhabditis elegans Lin-7 protein 1 (PALS1) to the PDM (62).

The M protein is present in high amounts among all the proteins in coronaviruses (63). It is about 220-260 amino acids long, with a short N terminal domain attached to triple transmembrane domains, which are in turn connected to a carboxyl-terminal domain; it belongs to the N-linked glycosylated proteins and contains a conserved domain of 12 amino acids (64). Structural analysis of this protein shows that it exists in two forms, namely a long and a compact form. Both forms are homodimers of the N terminal ectodomain and C terminal endodomain that differ in conformation. The endodomain may undergo elongation or compression, which gives the long and compact forms their names.
Spike protein can be seen on both of these forms but predominantly on the long form, which may suggest that this form promotes spike installation. The tyrosine residue at position 211 may be essential for the stability of the long form of M. This form bends the membrane, creating a spherical structure that encircles the ribonucleoprotein (65). The M protein is organized in a 2D lattice and provides a scaffold for viral assembly (27). M proteins are translated on membrane-bound polysomes, inserted in the ER and carried to the Golgi complex, where they interact with E proteins to generate virions. Of the three transmembrane domains (TMD), the first is sufficient to promote self-association of M proteins, improved membrane affinity and retention in the Golgi (66). Proline is expected to create steric bulk and rigidity, due to which the structure of SARS-CoV-2 may undergo a change in its conformation (18).

The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
World Health Organization
COVID-19: A novel zoonotic disease caused by a coronavirus from China: What we know and what we don't
Phylogenetic Analysis and Structural Modeling of SARS-CoV-2 Spike Protein Reveals an Evolutionary Distinct and Proteolytically-Sensitive Activation Loop
Transmission electron microscopy imaging of SARS-CoV
Structure, Function, and Evolution of Coronavirus Spike Proteins
Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition
Betacoronavirus Adaptation to Humans Involved Progressive Loss of Hemagglutinin-Esterase Lectin Activity
COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses
Structure analysis of the receptor binding of 2019-nCoV
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
Subunit Vaccines Against Emerging Pathogenic Human Coronaviruses
The origin, transmission and clinical therapies on coronavirus disease 2019 (COVID-19) outbreak - an update on the status
Identification of diverse bat alphacoronaviruses and betacoronaviruses in China provides new insights into the evolution and origin of coronavirus-related diseases
From SARS to MERS, thrusting coronaviruses into the spotlight
Commentary Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China
The establishment of reference sequence for SARS-CoV-2 and variation analysis
COVID-2019: The role of the nsp2 and nsp3 in its pathogenesis
On the origin and continuing evolution of SARS-CoV-2
Genomic variations of SARS-CoV-2 suggest multiple outbreak sources of transmission
Phylogenetic network analysis of SARS-CoV-2 genomes
Evidence of the Recombinant Origin and Ongoing Mutations in Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)
The Coronaviridae
The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade
Return of the coronavirus: 2019-nCoV
Drug and vaccine design against Novel Coronavirus (2019-nCoV) spike protein through Computational approach
Supramolecular Architecture of Severe Acute Respiratory Syndrome Coronavirus Revealed by Electron Cryomicroscopy
Fusion mechanism of 2019-nCoV and fusion inhibitors targeting HR1 domain in spike protein
Coronavirus envelope protein: Current knowledge
Evidence that TMPRSS2 Activates the Severe Acute Respiratory Syndrome Coronavirus Spike Protein for Membrane Fusion and Reduces Viral Control by the Humoral Immune Response
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 2020;1-9
Receptor recognition by novel coronavirus from Wuhan: An analysis based on decade-long structural studies of SARS
Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag
Structure of the N-terminal RNA-binding domain of the SARS CoV nucleocapsid protein
Determination of host proteins composing the microenvironment of coronavirus replicase complexes by proximity-labeling
Crystal structure of SARS-CoV-2 nucleocapsid protein RNA binding domain reveals potential unique drug targeting sites
Nucleocapsid protein of SARS-CoV activates the expression of cyclooxygenase-2 by binding directly to regulatory elements for nuclear factor-kappa B and CCAAT/enhancer binding protein
The nucleocapsid protein of SARS-associated coronavirus inhibits B23 phosphorylation
Interactions of SARS Coronavirus Nucleocapsid Protein with the host cell proteasome subunit p42
SARS-CoV nucleocapsid protein antagonizes IFN-β response by targeting initial step of IFN-β induction pathway, and its C-terminal region is critical for the antagonism
The nucleocapsid protein of severe acute respiratory syndrome-coronavirus inhibits the activity of cyclin-cyclin-dependent kinase complex and blocks S phase progression in mammalian cells
The Nucleocapsid Protein of Severe Acute Respiratory Syndrome Coronavirus Inhibits Cell Cytokinesis and Proliferation by Interacting with Translation Elongation Factor 1
The nucleocapsid protein of SARS coronavirus has a high binding affinity to the human cellular heterogeneous nuclear ribonucleoprotein A1
Modular organization of SARS coronavirus nucleocapsid protein
The coronavirus nucleocapsid is a multifunctional protein
Multiple Nucleic Acid Binding Sites and Intrinsic Disorder of Severe Acute Respiratory Syndrome Coronavirus Nucleocapsid Protein: Implications for Ribonucleocapsid Protein Packaging
The SARS coronavirus nucleocapsid protein - Forms and functions
Analysis of SARS-CoV E protein ion channel activity by tuning the protein and lipid charge
Small envelope protein E of SARS: Cloning, expression, purification, CD determination, and bioinformatics analysis
In-silico approaches to detect inhibitors of the human severe acute respiratory syndrome coronavirus envelope protein ion channel
Structural model of the SARS coronavirus E channel in LMPG micelles
Sars-CoV-2 Envelope and Membrane proteins: differences from closely related proteins linked to cross-species transmission
SARS-CoV envelope protein palmitoylation or nucleocapid association is not required for promoting virus-like particle production
Role of the Coronavirus E Viroporin Protein Transmembrane Domain in Virus Assembly
Identification of a Golgi Complex-Targeting Signal in the Cytoplasmic Tail of the Severe Acute Respiratory Syndrome Coronavirus Envelope Protein
and Alters Tight Junction Formation and Epithelial Morphogenesis
Membrane binding proteins of coronaviruses
A Conserved Domain in the Coronavirus Membrane Protein Tail Is Important for Virus Assembly
A structural analysis of M protein in coronavirus assembly and morphology
Self-assembly of severe acute respiratory syndrome coronavirus membrane protein
The Membrane Protein of SARS-CoV Suppresses NF-kB Activation
The SARS-coronavirus membrane protein induces apoptosis via interfering with PDK1PKB/Akt signalling
The Membrane Protein of Severe Acute Respiratory Syndrome Coronavirus Functions as a Novel Cytosolic Pathogen-Associated Molecular Pattern To Promote Beta Interferon Induction via a Toll