key: cord-0819435-g0u5qoap
authors: Yurina, Valentina
title: Coronavirus epitope prediction from highly conserved region of spike protein
date: 2020-07-31
journal: Clin Exp Vaccine Res
DOI: 10.7774/cevr.2020.9.2.169
sha: 2f5712ad99f83b726c0f465aff85f017c97bf5db
doc_id: 819435
cord_uid: g0u5qoap

PURPOSE: The aim of this research was to predict epitopes for the spike protein of the coronavirus family. The coronavirus family consists of highly evolved viruses that have caused several outbreaks in the past decades. Therefore, it is crucial to design a global vaccine candidate to prevent future coronavirus outbreaks.

MATERIALS AND METHODS: The spike protein amino acid sequences from nine members of the coronavirus family were retrieved from the UniProt database. The spike protein sequences were aligned using the Clustal method. The highly conserved amino acid region was analyzed for linear and discontinuous B cell epitopes and for T cell epitopes.

RESULTS: The alignment showed that there is a highly conserved region in the extracellular domain of the spike protein. Using prediction methods on this highly conserved region, B cell and T cell epitopes from the spike protein were derived.

CONCLUSION: Several different prediction methods identified B cell and T cell epitopes in the highly conserved region, which makes this region a promising coronavirus vaccine candidate.

virus, which has been circulating in Saudi Arabia since March 2012, had never been found before and has characteristics that differ from those of the SARS coronavirus that infected 32 countries in 2003 [3]. All types of coronaviruses cause clinical symptoms that can include fever, coughing, acute respiratory distress, pneumonia, fatigue, headaches, dyspnea, lymphopenia, and, infrequently, gastrointestinal symptoms such as diarrhea. Severe COVID-19 infection can be characterized by opacities in the subpleural areas of both lungs, acute respiratory distress syndrome, and acute cardiac injury.
Critical patients develop both local and systemic immune responses, which lead to intense inflammation [1, 4].

Vaccination is still the most effective way to prevent viral infection. Among the latest developments in vaccine technology are peptide-based vaccines, or epitope vaccines. An epitope-based vaccine is designed from in silico analyses using an immunoinformatics approach. In silico studies reduce the cost and time needed to develop vaccines and can help construct vaccines with higher efficacy and safety than conventional vaccines [5] [6] [7].

Given the COVID-19 pandemic and the MERS and SARS outbreaks, all caused by coronaviruses, it is considered necessary to develop an effective vaccine against all types of coronavirus. Nine strains of coronavirus were aligned, and a highly conserved region of the S2 spike protein was found. Highly conserved regions are potential vaccine candidates because immune responses directed against them can recognize various strains of the coronavirus. The spike protein is a coronavirus surface protein that binds to receptors and facilitates membrane fusion. The spike S1 protein binds virions to the cell membrane through its interaction with the receptors and thereby initiates the infection process. The S2 protein facilitates fusion between the virion and cell membranes [8, 9].

Spike protein sequences from nine coronavirus strains were collected from the UniProt database (https://www.uniprot.org/) (Table 1). The nine spike protein sequences were aligned using COBALT (constraint-based multiple alignment tool), which is available at https://www.ncbi.nlm.nih.gov/tools/cobalt/cobalt.cgi. Highly conserved sequences were selected and analyzed for B cell epitopes using several tools: Emini Surface Accessibility Prediction, Chou and Fasman Beta-turn Prediction, Parker Hydrophilicity Prediction, and Kolaskar and Tongaonkar Antigenicity for linear epitopes, and DiscoTope for discontinuous epitopes.
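The linear B cell epitope tools named above are generally described as propensity-scale methods that score each residue from a window of its neighbours. The following is a minimal sketch of that general idea only, not the implementation of any of those tools: the scale values, the window size, and the threshold are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy scale: +1 for residues commonly regarded as polar/charged,
# -1 for hydrophobic ones. These are NOT the published Parker, Emini,
# Chou-Fasman, or Kolaskar-Tongaonkar values.
TOY_SCALE: Dict[str, float] = {
    **{aa: 1.0 for aa in "DEKRNQSTGPH"},
    **{aa: -1.0 for aa in "AVLIMFWCY"},
}


def window_scores(seq: str, scale: Dict[str, float], window: int = 7) -> List[Tuple[int, float]]:
    """Return (1-based centre position, mean scale value) for each full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    scores = []
    for center in range(half, len(seq) - half):
        segment = seq[center - half:center + half + 1]
        # Residues missing from the scale (e.g. 'X') contribute 0.0.
        mean = sum(scale.get(aa, 0.0) for aa in segment) / window
        scores.append((center + 1, mean))
    return scores


def residues_above(scores: List[Tuple[int, float]], threshold: float) -> List[int]:
    """Positions whose window score exceeds the (assumed) threshold."""
    return [pos for pos, score in scores if score > threshold]


if __name__ == "__main__":
    demo = "AAADDDDDAAA"  # made-up sequence, not taken from the spike protein
    print(residues_above(window_scores(demo, TOY_SCALE, window=3), threshold=0.5))
```

Residues near the ends of the sequence receive no score in this sketch because a full window cannot be centred on them; real tools may handle the ends differently.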
The T cell epitopes were predicted using NetCTL, Immune Epitope Database (IEDB)-major histocom-

Spike protein sequences from nine strains of coronavirus that infect humans were collected. The alignment showed a highly conserved region at amino acids 945-1100 of the severe acute respiratory syndrome coronavirus 2 (2019-nCoV, SARS-CoV-2) spike protein (Fig. 1). This region was used to predict T and B cell epitopes. Several T cell epitope prediction tools identified epitopes that are presented by MHC class I and II (Table 2). The linear B cell epitope predictions are presented in Table 3, and the discontinuous B cell epitopes are shown in Fig. 2. In summary, all of the epitopes identified in the highly conserved region are shown in Fig. 3.

Vaccination is one of the most effective approaches to prevent viral infections. However, vaccine development requires a long time and high costs because large arrays of potential epitope candidates must be screened. In silico prediction methods can dramatically reduce the cost of vaccine development. The immune system recognizes antigens through humoral and cellular mechanisms, which are mediated by B cells and T cells, respectively. Both types of immune cells recognize not the antigen as a whole but only portions of the pathogen's components, called epitopes. Antigen recognition by B cells and by T cells involves different processes [10]. We predicted epitopes from the spike glycoprotein (S protein) because this protein has been reported to be the most antigenic part of the virus [11]. Prior to epitope prediction, the S protein sequences of nine coronavirus strains were aligned. This alignment showed that the highly conserved region spans amino acid residues 945-1100. Both B cell and T cell epitopes were then predicted from this highly conserved region.
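One simple way to locate a conserved stretch in a multiple sequence alignment is to score each column by how often its most common residue occurs and then look for the longest run of high-scoring columns. The sketch below illustrates that idea under stated assumptions; it is not the procedure used by COBALT or by this study, and the function names, the 0.8 threshold, and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def column_identity(aligned: Sequence[str]) -> List[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ('-') never count as the shared residue, but they do count in the
    denominator, so gappy columns score low.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    fractions = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            fractions.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        fractions.append(top_count / len(column))
    return fractions


def longest_conserved_run(fractions: List[float], threshold: float = 0.8) -> Tuple[int, int]:
    """Longest run of columns with identity >= threshold, as a half-open
    0-based (start, end) range of alignment columns; (0, 0) if none."""
    best = (0, 0)
    start = None
    # A trailing -inf sentinel closes a run that reaches the last column.
    for i, value in enumerate(list(fractions) + [float("-inf")]):
        if value >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


if __name__ == "__main__":
    toy = ["MKTAYLLV", "MRTAYLLI", "M-TAYLLV"]  # made-up aligned sequences
    print(longest_conserved_run(column_identity(toy)))
```

The returned range refers to alignment columns; mapping it back to residue numbers of one particular sequence (such as 945-1100 in the SARS-CoV-2 spike protein) requires skipping the gap characters in that sequence's aligned row.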
Epitope prediction was performed in the highly conserved region so that the vaccine can be used against a variety of coronavirus strains; it is also expected that, if a new virus strain emerges in the future, this region will remain conserved and vaccination will remain effective. Our findings provide a sequence from a highly conserved region of the S2 protein that can help guide new experimental efforts to develop a coronavirus vaccine candidate.

B cell epitope prediction was performed for both linear and discontinuous epitopes. Prediction of linear epitopes showed that the highly conserved region contains several potential epitopes. Prediction of discontinuous epitopes gave similar results, indicating the presence of epitopes recognized by B cells in the spike protein. T cell epitope prediction in the highly conserved region also gave similar results. Taken together, these predictions indicate that epitopes are present in the highly conserved region, so that they can be developed as vaccine candidates. The results of this study can serve as a reference for the next stage of coronavirus vaccine development.

A delivery strategy that can be useful in the development of the coronavirus vaccine is the mucosal route, using live bacterial vectors as carriers. Live bacteria are important carriers because they can induce the mucosal immune system in addition to the systemic immune system [12]; the mucosal immune system is very important for defense against viral infections that attack the respiratory tract.

Fig. 3. The highly conserved region selected for epitope prediction is shown in yellow, T cell epitopes are underlined, and linear B cell epitopes are shown in red; numbers indicate amino acid positions.
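When candidate T cell epitopes are drawn from a sub-region such as residues 945-1100, each peptide needs to keep the residue numbering of the full-length protein. The sketch below enumerates overlapping peptides with their original start and end positions. The peptide length of 9 is an assumption (a length commonly used for MHC class I predictions; the text above does not state which lengths were used), and the function name is hypothetical.

```python
from typing import List, Tuple


def enumerate_peptides(region: str, start_position: int, length: int = 9) -> List[Tuple[int, int, str]]:
    """List every overlapping peptide of the given length in `region`.

    `start_position` is the 1-based residue number, in the full-length
    protein, of the first residue of `region`. Each result is
    (first residue number, last residue number, peptide sequence).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    peptides = []
    for offset in range(len(region) - length + 1):
        first = start_position + offset
        peptides.append((first, first + length - 1, region[offset:offset + length]))
    return peptides


if __name__ == "__main__":
    # Made-up placeholder sequence, not the actual spike protein region.
    for first, last, peptide in enumerate_peptides("ABCDEFGHIJK", start_position=945):
        print(first, last, peptide)
```

A region covering residues 945-1100 contains 156 residues and therefore yields 148 overlapping 9-mers, each of which could then be submitted to a prediction tool.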
1. The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak
2. Epidemiologic clues to SARS origin in China
3. Comparative epidemiology of Middle East respiratory syndrome coronavirus (MERS-CoV) in Saudi Arabia and South Korea
4. Pathological findings of COVID-19 associated with acute respiratory distress syndrome
5. Proof of principle for epitope-focused vaccine design
6. Recent advances in B-cell epitope prediction methods
7. Computer-aided drug discovery and development (CADDD): in silico-chemico-biological approach
8. Preliminary identification of potential vaccine targets for the COVID-19 Coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies
9. Strong evolutionary convergence of receptor-binding protein spike between COVID-19 and SARS-related coronaviruses
10. Fundamentals and methods for T- and B-cell epitope prediction
11. Candidate targets for immune responses to 2019-novel coronavirus (nCoV): sequence homology- and bioinformatic-based predictions
12. Live bacterial vectors: a promising DNA vaccine delivery system