key: cord-0897522-862mvafw authors: Nafea Alharbi, Sultan; Fahad Alrefaei, Abdulwahed title: Comparison of the SARS-CoV-2 (2019-nCoV) M protein with its Counterparts of SARS-CoV and MERS-CoV Species date: 2021-01-07 journal: J King Saud Univ Sci DOI: 10.1016/j.jksus.2020.101335 sha: 86422bcff5cce97170bc0861da79349fa44c59b5 doc_id: 897522 cord_uid: 862mvafw

Coronavirus M proteins are the major protein component of the viral envelope. During viral assembly, they play an important role through their association with all other viral structural proteins. Despite their crucial functions, very little information regarding the structures and functions of M proteins is available. Here we apply bioinformatic tools to the available sequences and 3D structures of the SARS-CoV, SARS-CoV-2, and MERS-CoV M proteins in order to predict potential B-cell epitopes and assess antibody binding affinity. This study aims to aid the search for more effective vaccines and the identification of neutralizing antibodies. We found some notable differences between the SARS-CoV-2, SARS-CoV, and MERS-CoV M proteins. Two SARS-CoV-2 peptides with significant antigen presentation scores for human cell surface proteins were identified. The results reveal that the N-terminal domains of the SARS-CoV and SARS-CoV-2 M proteins are translocated (outside), whereas in MERS-CoV the N-terminal domain is inside (cytoplasmic side).

Members of the coronavirus (CoV) family are mostly responsible for enzootic infections. In the last two decades, CoVs have noticeably emerged in human populations; each species within this family has its own characteristic features but also shares some similarities with the others. The family has been widely known since the emergence of severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002. CoVs are a group of viruses that cause diseases in mammals and birds (Perlman and Netland, 2009).
Unlike other species within this family, such as SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), SARS-CoV-2 (or 2019-nCoV) has spread widely through the population (Huang et al., 2020). With the number of infections rising well above 56 million and confirmed deaths above 1.3 million as of 19 November 2020, it has become the paramount healthcare concern for the global community. The high mortality rate of some CoVs, along with their ease of transmission, increases the demand for further investigation into CoV molecular biology, which will help in the development of effective anti-coronaviral drugs. The development of effective therapeutic and preventive strategies is clearly limited by the lack of detailed structural information on viral proteins. Even so, such proteins have been considered a good model for this class of proteins (Armstrong et al., 1984). The shape of the viral envelope is mainly determined by its membrane (M) protein, which is the most abundant structural protein in the CoV family (Neuman et al., 2011). Analysis of several types of CoVs showed that viral size presumably depends on the interaction of the M protein with the spike (S) and nucleocapsid (N) proteins and the viral genomic RNA (Neuman et al., 2011). The M protein is also considered the central organiser of CoV assembly because it interacts with all other structural proteins (Masters, 2006). For example, interaction of the M protein with the S protein is required for retention of S in the ER-Golgi intermediate compartment and for its integration into new virions (Opstelten et al., 1995). In addition, the M protein plays an important role in stabilizing the structure of the N protein, which is located in the internal core of virions (Mortola and Roy, 2004; Glowacka et al., 2011; Narayanan et al., 2000). It has been shown that the M proteins of some CoVs are much more immunogenic for T-cell responses than nonstructural viral proteins (Li et al., 2008).
In addition, the M protein plays a critical role in the virus-specific B-cell response because it can elicit efficient neutralizing antibodies in SARS patients (Pang et al., 2004). Vaccine development is considered one of the most important means of preventing infectious diseases, particularly when no treatment is yet available. The infection rate of CoVs could be limited by developing an effective vaccine. Bioinformatics tools for predicting B-cell epitope candidates are currently used in several applications, including vaccine design, development of diagnostics, and monitoring of unwanted immune responses against protein therapeutics (Larsen et al., 2010; Lund et al., 2011; Robson, 2020). Antibodies produced by B cells are important in designing effective vaccines (Olsson et al., 2007). Although the human immune system can mount antibodies against pathogens, only neutralizing antibodies can completely block pathogen entry into the human body (Suarez and Schultz-Cherry, 2000). The body's ability to produce neutralizing antibodies depends mainly on unique epitopic sites on viral surface proteins to which those antibodies can bind.

In this study, we performed bioinformatic and homology structural modeling analyses of three species of betacoronaviruses: SARS-CoV, SARS-CoV-2, and MERS-CoV. We analysed the homology of the M protein sequences of these three species and identified all amino acid changes among them. We also used IEDB to predict epitopes on the M proteins of these species that are likely to be recognized in humans.

The M protein sequences of the three CoV species were retrieved from the National Center for Biotechnology Information (NCBI): SARS-CoV-2 (protein ID: YP_009724393), SARS-CoV (protein ID: NP_828855), and MERS-CoV (protein ID: YP_009047210).
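Identifying the amino acid changes between two aligned M protein sequences can be illustrated with a small sketch. It assumes the sequences have already been aligned (for example with a multiple alignment program) and are given as equal-length strings with '-' marking gaps. The function name `compare_aligned`, the choice to skip gapped columns when computing percent identity, the numbering of substitutions on the first sequence, and the toy sequences are our own illustrative assumptions, not the authors' procedure.

```python
from typing import List, Tuple


def compare_aligned(seq_a: str, seq_b: str) -> Tuple[float, List[str]]:
    """Hypothetical helper: percent identity and substitutions between two
    pre-aligned sequences ('-' = gap).

    Gapped columns are skipped (an assumption); substitutions are labelled
    as <residue in A><position in A><residue in B>, e.g. 'S4N'."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    changes: List[str] = []
    identical = 0
    compared = 0
    pos_a = 0  # 1-based residue position in seq_a, ignoring its gaps
    for a, b in zip(seq_a, seq_b):
        if a != "-":
            pos_a += 1
        if a == "-" or b == "-":
            continue  # indel column: not counted as match or mismatch
        compared += 1
        if a == b:
            identical += 1
        else:
            changes.append(f"{a}{pos_a}{b}")
    identity = 100.0 * identical / compared if compared else 0.0
    return identity, changes


# Toy example (not real M protein sequences)
identity, changes = compare_aligned("MADSNGT", "MADNNGS")
print(f"{identity:.1f}% identity; substitutions: {changes}")
```

Excluding gapped columns is only one of several identity conventions; dividing by the full alignment length or by the shorter sequence gives different numbers, so the convention should be stated alongside any reported value.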
These protein sequences were then compared using different bioinformatics prediction tools. We used the M protein sequences of the three CoV species as queries to search the NCBI Protein Database and identify M proteins across diverse coronavirus species. The full-length amino acid sequences were then selected for multiple alignment using the CLUSTALX 2.1 program (Thompson et al., 1997). A bootstrap re-sampling technique was used to assess the robustness of the generated tree topology. Neighbor-joining (NJ) phylogenetic analysis was conducted in Geneious Prime software (Kearse et al., 2012). The secondary structures of the M proteins of the three CoVs were predicted using the PSIPRED server (McGuffin et al., 2000). Subsequently, three-dimensional (3D) structures of these proteins were predicted by submitting them to the Phyre2 server (Kelley et al., 2015). The Immune Epitope Database and Analysis Resource (IEDB) (Vita et al., 2015) was used to list available data closely related to coronaviruses. The BepiPred method in IEDB was used to predict linear B-cell epitopes (Jespersen et al., 2017) from the conserved regions with a default threshold value of 0.55 (81% specificity and 29% sensitivity). The method combines the predictions of a hidden Markov model and a propensity scale approach (Larsen et al., 2006). The complete M protein sequences of the three species were analyzed with the BepiPred method to predict potential B-cell epitopes.

The hydropathy profile shows that the M protein clearly consists of three domains: the amino (…

…The multiple sequence alignment analysis also shows the highly conserved dileucine (LL) motif at the C-terminal domains of all three proteins (orange box). In addition, the highly conserved phenylalanine (F95) and S110 residues play an important role in virus assembly (Tseng et al., 2010).
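The thresholding step can be sketched as follows, assuming per-residue BepiPred scores have already been obtained (for example exported from IEDB) as a list of floats, one per residue. Residues scoring at or above the 0.55 threshold are grouped into contiguous candidate peptides. The helper name `epitope_segments`, the inclusive comparison, and the minimum peptide length `min_len` are hypothetical choices not specified in the text; this is only post-processing of scores, not the BepiPred model itself.

```python
from typing import List, Tuple


def epitope_segments(
    seq: str, scores: List[float], threshold: float = 0.55, min_len: int = 5
) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: group residues with score >= threshold into
    contiguous segments of at least min_len residues.

    Returns (1-based start, 1-based inclusive end, peptide) tuples."""
    if len(seq) != len(scores):
        raise ValueError("one score per residue is required")
    segments: List[Tuple[int, int, str]] = []
    start = None  # 0-based start of the current above-threshold run
    # A -inf sentinel forces any run that reaches the C-terminus to be closed.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start + 1, i, seq[start:i]))
            start = None
    return segments


# Toy scores, not real BepiPred output
print(epitope_segments("MADSNGTITV",
                       [0.1, 0.6, 0.7, 0.8, 0.9, 0.6, 0.2, 0.6, 0.6, 0.1],
                       min_len=3))
```

Raising the threshold trades sensitivity for specificity, which is consistent with the 81% specificity and 29% sensitivity quoted for 0.55.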
Figure 3: Phylogenetic analysis of M protein sequences from 29 orthologues. The M protein sequences were aligned using MUSCLE (Edgar, 2004). The neighbor-joining tree was generated from the alignment and rooted using the Bat-CoV HKU9-5 M protein sequence as the outgroup. The closely related betacoronaviruses MERS-CoV (blue), SARS-CoV (red), and SARS-CoV-2 (green) are highlighted. Numbers at nodes indicate bootstrap support (1000 replicates), and the scale bar 2 represents the estimated number of substitutions per site. Accession numbers of the sequences used in the analyses are shown next to each species.

The hydropathy of a particular amino acid sequence assumes added significance when structural proteins are considered. The hydrophobicity or hydrophilicity of a protein can be characterized from its amino acid sequence by its grand average of hydropathy (GRAVY) value (Kyte and Doolittle, 1982). To make an initial assessment of the shared and specific features of the M protein, multiple sequence alignment was performed to compare the M protein sequence of SARS-CoV-2 with those of SARS-CoV and MERS-CoV. The alignment model was based on a profile hidden Markov model (HMM). The M protein is conserved across the three coronaviruses. The multiple sequence alignment also shows the highly conserved dileucine (LL) motif at the C-terminal domains of all three proteins. Mutations in this motif weaken the interaction and packaging between the M and N proteins (Tseng et al., 2013; Saikatendu et al., 2007; Tseng et al., 2010). In addition, the highly conserved phenylalanine (F95) and S110 residues play an important role in virus assembly (Tseng et al., 2010).
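The GRAVY value and a sliding-window hydropathy profile can both be computed from the Kyte and Doolittle (1982) hydropathy scale, as in the minimal sketch below. GRAVY is the mean hydropathy over all residues; the profile averages the scale over a window centred on each position. The function names and the default window of 9 residues are our own assumptions; wider windows (around 19 residues) are commonly used when looking for candidate transmembrane segments. The text does not state which window the authors used.

```python
from typing import List, Tuple

# Kyte-Doolittle (1982) hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Raises KeyError for non-standard residue codes such as 'X'."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> List[Tuple[int, float]]:
    """Sliding-window hydropathy profile (window size is an assumption).

    Returns (1-based centre position, mean hydropathy of the window) pairs;
    sequences shorter than the window give an empty list."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    half = window // 2
    return [
        (start + half + 1, sum(values[start:start + window]) / window)
        for start in range(len(values) - window + 1)
    ]


# Toy sequence, not a real M protein
print(round(gravy("MADSNGTITV"), 3))
print(hydropathy_profile("MADSNGTITV", window=5))
```

A positive GRAVY indicates an overall hydrophobic sequence and a negative one an overall hydrophilic sequence; for a membrane protein such as M, the profile is more informative than the single average because it localizes the hydrophobic stretches.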
We constructed a phylogenetic tree using the primary structure (sequences) of the M proteins of these three species as queries to retrieve 29 orthologous sequences from various CoV species (Figure 3), in order to provide insights into their evolutionary and functional relationships at the protein level. B-cell epitopes are sites on a protein that can be recognized by antibodies of the immune system. Identifying such regions can inform the design of suitable vaccines and diagnostic tests. Traditional experimental epitope scanning is clearly not practicable on a genomic scale, whereas prediction approaches are less time-consuming, more cost-effective, and dependable. This study aimed to apply the IEDB software to predict suitable CoV epitope vaccine candidates for alleles common in the world population, based on the M protein and its modified sequences, following the pandemic spread of SARS-CoV-2 in late 2019. The results of this study revealed that the M proteins of SARS-CoV, SARS-CoV-2, and MERS-CoV and their modified sequences can be regarded as protective immunogens with strong conservation, owing to their high capacity to elicit neutralizing antibodies. We predicted likely human antibody binding sites (B-cell epitopes) on the SARS-CoV, SARS-CoV-2, and MERS-CoV M proteins with BepiPred. Because of the diversity of existing intrinsic disorder prediction methods, we combined them into a more accurate meta-prediction (Xue et al., 2010a) to infer the potential intrinsically disordered regions of the M protein. In our study, three predictors were used to predict disordered regions: VSL2 (Various Short-Long, version 2; Peng et al., 2006), XL1-XT (Romero et al., 1997), and VL-XT (Li et al., 1999). All three predictors employ the same attributes, based on amino acid composition.
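One simple way to combine several disorder predictors is to average their per-residue scores and call a residue disordered when the mean reaches a cut-off, as in the sketch below. It assumes each predictor's output (for example from VSL2, XL1-XT, and VL-XT) is already available as a list of scores between 0 and 1 for the same residues, and it uses a 0.5 cut-off, the usual convention for PONDR-family scores. The function name `consensus_disorder` is hypothetical, and a plain average is a much cruder combination than a trained meta-predictor such as PONDR-FIT, so this should not be read as the authors' exact method.

```python
from typing import Dict, List, Tuple


def consensus_disorder(
    predictions: Dict[str, List[float]], threshold: float = 0.5
) -> Tuple[List[float], List[bool], float]:
    """Hypothetical consensus: average per-residue scores across predictors.

    Returns the mean score per residue, a disordered call per residue
    (mean >= threshold), and the percentage of residues called disordered."""
    lengths = {len(scores) for scores in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one predictor, all covering the same residues")
    n = lengths.pop()
    if n == 0:
        raise ValueError("empty score lists")
    k = len(predictions)
    mean = [sum(scores[i] for scores in predictions.values()) / k for i in range(n)]
    calls = [m >= threshold for m in mean]
    percent = 100.0 * sum(calls) / n
    return mean, calls, percent


# Toy scores for a 4-residue stretch, not real predictor output
toy = {
    "VSL2": [0.1, 0.9, 0.6, 0.3],
    "VL-XT": [0.2, 0.8, 0.7, 0.2],
    "XL1-XT": [0.3, 0.7, 0.8, 0.1],
}
mean, calls, percent = consensus_disorder(toy)
print(calls, percent)
```

The percentage returned is one way to quantify the "amount" of disorder when comparing M proteins across species.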
The amount and distribution of such regions play important roles in the behavior and transmission modes of coronaviruses (Goh et al., 2012; Goh et al., 2008a; Goh et al., 2008b; Xue et al., 2010b). In general, the M proteins of coronaviruses are predicted to contain less disorder than other structural viral proteins such as the N proteins (Li et al., 1999). As the main function of M proteins is to protect the virion, it is tempting to suppose that differences in the overall disorder of M proteins may be associated with the way they protect the virus under different environmental conditions, which in turn may reflect differences in viral transmission mode. A model of how coronaviruses behave in terms of transmission would be highly valuable not only for medical but also for fundamental research. Such a model would also provide a tool to aid the implementation of public health policies for dealing with established and newly emerging pathogenic viruses.

Sequence and topology of a model intracellular membrane protein, E1 glycoprotein, from a coronavirus.
Assembly of the coronavirus envelope: homotypic interactions between the M proteins.
MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Evidence that TMPRSS2 activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response.
A comparative analysis of viral matrix proteins using disorder predictors.
Protein intrinsic disorder toolbox for comparative analysis of viral proteins.
Understanding viral transmission behavior via protein intrinsic disorder prediction: Coronaviruses.
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet.
BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes.
Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.
The Phyre2 web portal for protein modeling, prediction and analysis.
A simple method for displaying the hydropathic character of a protein.
Improved method for predicting linear B-cell epitopes.
Identification of CD8+ T cell epitopes in the West Nile virus polyprotein by reverse-immunology using NetCTL.
T cell responses to whole SARS coronavirus in humans.
Predicting protein disorder for N-, C- and internal regions.
Human leukocyte antigen (HLA) class I restricted epitope discovery in yellow fever and dengue viruses: importance of HLA binding strength.
The molecular biology of coronaviruses.
The PSIPRED protein structure prediction server.
Efficient assembly and release of SARS coronavirus-like particles by a heterologous expression system.
Characterization of the coronavirus M protein and nucleocapsid interaction in infected cells.
A structural analysis of M protein in coronavirus assembly and morphology.
Induction of immune memory following administration of a prophylactic quadrivalent human papillomavirus (HPV) types 6/11/16/18 L1 virus-like particle (VLP) vaccine.
Envelope glycoprotein interactions in coronavirus assembly.
Protective humoral responses to severe acute respiratory syndrome-associated coronavirus: implications for the design of an effective protein-based vaccine.
Length-dependent prediction of protein intrinsic disorder.
Coronaviruses post-SARS: update on replication and pathogenesis.
Membrane protein molecules of transmissible gastroenteritis coronavirus also expose the carboxy-terminal region on the external surface of the virion.
Preliminary bioinformatics studies on the design of a synthetic vaccine and a preventative peptidomimetic antagonist against the SARS-CoV-2 (2019-nCoV, COVID-19) coronavirus. Computers in Biology and Medicine.
Sequence data analysis for long disordered regions prediction in the calcineurin family.
Ribonucleocapsid formation of severe acute respiratory syndrome coronavirus through molecular action of the N-terminal domain of N protein.
A hidden Markov model for predicting transmembrane helices in protein sequences.
Immunology of avian influenza virus: a review.
The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools.
Identifying SARS-CoV membrane protein amino acid residues linked to virus-like particle assembly.
Self-assembly of severe acute respiratory syndrome coronavirus membrane protein.
The immune epitope database (IEDB) 3.0. Nucleic Acids Research.
Studies on membrane topology, N-glycosylation and functionality of SARS-CoV membrane protein.
PONDR-FIT: a meta-predictor of intrinsically disordered amino acids.
Viral disorder or disordered viruses: do viral proteins possess unique features? Protein and Peptide Letters.

We extend our appreciation to the Research Support Project (number RSP-2020/218), King Saud University, Riyadh, Saudi Arabia.