key: cord-0893831-hf7fgtnp
authors: Vashi, Yoya; Jagrit, Vipin; Kumar, Sachin
title: Understanding the B and T cell epitopes of spike protein of severe acute respiratory syndrome coronavirus-2: A computational way to predict the immunogens
date: 2020-05-27
journal: Infect Genet Evol
DOI: 10.1016/j.meegid.2020.104382
sha: f6f03679898f1ab4fd917301297b8b266360320d
doc_id: 893831
cord_uid: hf7fgtnp

The 2019 novel severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) outbreak has caused a large number of deaths, with thousands of confirmed cases worldwide. The present study used computational approaches to identify B- and T-cell epitopes of the spike (S) glycoprotein of SARS-CoV-2 and their predicted interactions with human leukocyte antigen (HLA) alleles. We identified 24 peptide stretches on the SARS-CoV-2 S protein that are well conserved among the reported strains. Mapping the predicted peptides onto the S protein structure showed that 20 of them are surface exposed and predicted to have reasonable epitope binding efficiency. The work could be useful for understanding the immunodominant regions of the SARS-CoV-2 surface protein and could potentially help in designing peptide-based diagnostics. The identified T-cell epitopes might also be considered for incorporation into vaccine designs.

The emerging severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has caused a pandemic and has been declared a public health emergency by the World Health Organization (WHO, 2020b). The disease spread rapidly across the globe and has wreaked havoc on humanity (Wu and McGoogan, 2020). By the start of May, SARS-CoV-2 had spread to 215 countries and infected over 3,862,676 people (WHO, 2020a). The WHO is continuously monitoring the situation and updating health-related plans to curtail the spread of the disease. The absence of a specific treatment or vaccine worsens the situation and threatens the world.
The International Committee on Taxonomy of Viruses (ICTV) classified SARS-CoV-2 under the family Coronaviridae of the order Nidovirales. The genomic sequence of SARS-CoV-2 isolated from the bronchoalveolar lavage fluid of a patient from Wuhan, China, is 29,903 nucleotides long (GenBank accession number NC_045512). SARS-CoV-2 has a positive-sense, single-stranded RNA genome with 5′ and 3′ untranslated regions. From 5′ to 3′, the genome encodes ORF1a, ORF1b, Spike (S), ORF3a, ORF3b, Envelope (E), Membrane (M), ORF6, ORF7a, ORF7b, ORF8, ORF9b, ORF14, Nucleocapsid (N), and ORF10 (Zhu et al., 2020). The S glycoprotein forms a homotrimer and mediates viral entry into host cells, which makes it a potential target for therapeutic and vaccine design against SARS-CoV-2 infection in humans (Li, 2016; Tortorici et al., 2019). The S glycoprotein comprises two functional subunits: the S1 subunit binds the host cell receptor, and the S2 subunit mediates fusion of the virus with the cell membrane. In CoVs, S is usually cleaved at the boundary between the S1 and S2 subunits, which remain non-covalently bound in the prefusion conformation; this cleavage activates the protein for membrane fusion via extensive, irreversible conformational changes (Burkard et al., 2014; Park et al., 2016; Walls et al., 2017). Unlike other SARS-CoVs, the S glycoprotein of SARS-CoV-2 harbors a furin cleavage site at the S1/S2 boundary (Walls et al., 2020). It is now evident that SARS-CoV-2 S uses the angiotensin-converting enzyme 2 (ACE2) receptor for entry into cells. Some studies report similar binding affinities of the SARS-CoV-2 and SARS-CoV S proteins for human ACE2 (Letko et al., 2020; Walls et al., 2020), whereas others suggest that SARS-CoV-2 binds ACE2 with higher affinity than SARS-CoV (Tai et al., 2020; Wang et al., 2020; Wrapp et al., 2020).
As the situation worsens, there is a growing need to develop suitable therapeutics, vaccines, and diagnostics against SARS-CoV-2 for effective disease management. Peptide-based vaccines and diagnostic assays have become increasingly important because of their advantages over conventional methods (Li et al., 2014; Mohanraj et al., 2017). The present study aimed to locate epitopes within a protein antigen that can elicit an immune response and could be selected for the synthesis of immunogenic peptides. Using a computational approach, we explored the S glycoprotein of SARS-CoV-2 to identify immunodominant epitopes for the development of diagnostics and vaccines. The results could also help clarify how T and B cells respond to the SARS-CoV-2 surface protein.

The S protein amino acid sequences of SARS-CoV-2 available at the time of the study (n=98) were downloaded from the National Center for Biotechnology Information (NCBI) database. To identify an immunodominant region, it is essential to select a conserved region within the S protein of SARS-CoV-2. All sequences were therefore compared with one another for variability using the Protein Variability Server with the Shannon method (Garcia-Boronat et al., 2008). The average solvent accessibility (ASA) profile of each sequence was predicted using the SABLE server (Adamczak et al., 2004). The BepiPred 1.0 linear epitope prediction module of the Immune Epitope Database (IEDB) was used to predict potential epitopes within the S protein (Haste Andersen et al., 2006; Larsen et al., 2006; Ponomarenko and Bourne, 2007; Vita et al., 2019), with the FASTA sequence of the target protein as input and default parameters. We used two web-based tools for B-cell epitope prediction: the IEDB and ABCpred servers (Saha and Raghava, 2006).
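The Shannon-method variability comparison described above can be illustrated with a short sketch. It assumes the sequences have already been aligned to equal length (the alignment step and the exact Protein Variability Server settings are not specified here). It treats every symbol, including any gap character, as a distinct state and reports per-position entropy in bits, where 0 marks a fully conserved position. The function name `shannon_variability` is hypothetical and is not the server's interface.

```python
import math
from collections import Counter


def shannon_variability(aligned_seqs):
    """Per-position Shannon entropy (in bits) of equal-length aligned sequences.

    H = -sum(p_i * log2(p_i)) over the symbols observed at each position.
    Gap characters, if present, are counted as just another symbol
    (an assumption, not necessarily what the Protein Variability Server does).
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    n = len(aligned_seqs)
    entropies = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        # Summing -(p * log2 p) term by term keeps a fully conserved column at 0.0.
        entropies.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return entropies


if __name__ == "__main__":
    # Hypothetical toy alignment: only the third position varies.
    toy = ["MFVF", "MFVF", "MFLF"]
    print([round(h, 3) for h in shannon_variability(toy)])  # prints [0.0, 0.0, 0.918, 0.0]
```

Low-entropy positions are the conserved candidates; in the study, conservation was then combined with the ASA profile to select peptide stretches.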
The S protein structure from the Protein Data Bank (PDB ID 6VSB) was analyzed for linear and discontinuous B-cell epitopes using the ElliPro module on the IEDB server with default settings (Ponomarenko et al., 2008; Wrapp et al., 2020). The ABCpred server was also used to detect B-cell epitopes with its artificial neural network (ANN) method. T-cell epitopes with binding affinity for MHC-I and MHC-II alleles were selected to boost both cytotoxic and helper T-cell-mediated immune responses. The IEDB server was used to predict major histocompatibility complex (MHC)-I and MHC-II binding epitopes for the target protein, using the reference set of alleles (Karosiene et al., 2012; Nielsen et al., 2007; Nielsen et al., 2003; Peters and Sette, 2005; Sturniolo et al., 1999).

In our study, we targeted the S glycoprotein of SARS-CoV-2 because it is exposed on the viral surface and interacts with the host receptor. At the time of the study, 98 sequences were available for this protein. The S glycoprotein is 1,273 amino acids long, except in the virus isolated from Kerala (India), whose S glycoprotein is 1,272 amino acids long (GenBank accession number MT012098). Our aim was first to determine the conserved regions and then to determine which of them are surface exposed, since such regions are potential epitopes for generating an immune response. We found that the S protein sequences in the analysis show low variability and are highly conserved. Regions with high ASA values are more surface exposed than others. We identified a total of 24 peptides of varying lengths, selected on the basis of high ASA values (Table 1). The potential epitope regions were predicted using the S protein sequence of SARS-CoV-2 that showed the least variability (GenBank accession number NC_045512).
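Selecting stretches that are both conserved and surface exposed can be sketched as a single pass over two per-residue profiles. The thresholds below (`max_entropy`, `min_asa`, `min_len`) are hypothetical placeholders, because the cut-offs used in the study are not reported, and the ASA scale depends on the predictor's output. The function name is also hypothetical.

```python
from typing import List, Sequence, Tuple


def select_stretches(
    entropy: Sequence[float],
    asa: Sequence[float],
    max_entropy: float = 0.0,  # hypothetical: keep only fully conserved positions
    min_asa: float = 4.0,      # hypothetical exposure cut-off on the predictor's scale
    min_len: int = 8,          # hypothetical minimum peptide length
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where every residue is
    conserved (entropy <= max_entropy) and exposed (asa >= min_asa)."""
    if len(entropy) != len(asa):
        raise ValueError("entropy and ASA profiles must have equal length")
    stretches = []
    start = None  # 0-based index where the current qualifying run began
    for i, (h, a) in enumerate(zip(entropy, asa)):
        ok = h <= max_entropy and a >= min_asa
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            # Run covered indices start..i-1, i.e. positions start+1..i (1-based).
            if i - start >= min_len:
                stretches.append((start + 1, i))
            start = None
    if start is not None and len(entropy) - start >= min_len:
        stretches.append((start + 1, len(entropy)))
    return stretches


if __name__ == "__main__":
    # Hypothetical profiles: position 4 is buried and splits two exposed runs.
    ent = [0.0] * 10
    acc = [5, 5, 5, 1, 5, 5, 5, 5, 5, 5]
    print(select_stretches(ent, acc, min_len=3))  # prints [(1, 3), (5, 10)]
```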
The potential epitopes are represented by blue peaks, and green slopes represent non-epitopic regions (Figure 2). The presence of linear and discontinuous (conformational) B-cell epitopes within the identified segments could help us identify the peptides that can elicit an immune response (Purcell et al., 2007). ElliPro (IEDB) predicted 18 linear epitopes, which contained regions from 19 of our selected peptides (highlighted in red in Table 2). These linear B-cell epitopes were listed by position and score; epitopes with higher scores have greater potential for antibody binding. Five of our selected peptides (peptides 3, 5, 19, 23, and 24 in Table 1) were not predicted as linear B-cell epitopes. Parts of our identified epitopes agree with epitopes reported in an earlier study (Ahmed et al., 2020), which further supports their credibility. Using the same module, we predicted discontinuous B-cell epitopes, obtaining 16 epitope regions that contained regions from 18 of our selected peptides (highlighted in red in Table S1). Six peptides (peptides 3, 5, 14, 19, 23, and 24 in Table 1) were not predicted as discontinuous B-cell epitopes. To confirm these results, we used the ABCpred server to detect B-cell epitopes with the default threshold of 0.51. It identified various epitopes of different lengths and scores; those that contained our selected peptides are highlighted in red in Table 3. A higher score indicates a more confident epitope prediction; most of our peptides scored above 0.7 and were predicted as linear B-cell epitopes. We used the IEDB server to determine binding affinity for human leukocyte antigen (HLA) alleles.
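Deciding whether a predicted epitope contains regions from a selected peptide amounts to an interval-overlap test. The sketch below assumes 1-based inclusive coordinates and treats the 0.51 ABCpred default threshold quoted above as an inclusive cut-off (an assumption). Peptide coordinates come from Table 1. The `Prediction` record format and the scores in the usage example are hypothetical and are not results from this study.

```python
from typing import Dict, List, NamedTuple, Tuple


class Prediction(NamedTuple):
    """A predicted epitope (hypothetical record format)."""
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: float


def overlapping_peptides(
    predictions: List[Prediction],
    peptides: Dict[int, Tuple[int, int]],
    threshold: float = 0.51,
) -> Dict[int, List[Prediction]]:
    """Map each peptide number to the at-or-above-threshold predictions that overlap it."""
    hits: Dict[int, List[Prediction]] = {}
    for pid, (p_start, p_end) in peptides.items():
        matched = [
            pred for pred in predictions
            if pred.score >= threshold
            # Two inclusive ranges overlap if each starts before the other ends.
            and pred.start <= p_end and p_start <= pred.end
        ]
        if matched:
            hits[pid] = matched
    return hits


if __name__ == "__main__":
    # Peptide coordinates from Table 1; the predictions are made-up examples.
    peptides = {8: (407, 428), 10: (461, 485), 14: (597, 607)}
    preds = [Prediction(410, 425, 0.85), Prediction(470, 485, 0.45),
             Prediction(600, 615, 0.72)]
    for pid, found in sorted(overlapping_peptides(preds, peptides).items()):
        print(pid, [(p.start, p.end, p.score) for p in found])
```

In this toy run, peptide 10 is not reported because its only overlapping prediction scores below the threshold.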
As recommended by the IEDB server, reference HLA allele sets were used to predict MHC-I and MHC-II T-cell epitopes, as they provide broad population coverage. All predictions were made using IEDB-recommended procedures. The binding affinities for MHC-I T-cell epitopes are listed in Table S2, where a lower rank indicates higher binding affinity. Similarly, the binding affinities for MHC-II T-cell epitopes are listed in Table 4. Regions from our selected peptides are highlighted in red. Epitopes with a percentile rank below 1%, indicating very high binding affinity, were selected. We also observed that some of the peptides identified as potential B-cell epitopes were also predicted as T-cell epitopes with good binding affinities. Overall, the regions identified in Table 1 not only had good B-cell and T-cell affinities, but most of them also overlapped with discontinuous epitopes (Table S1).

The peptide segments identified from the set of 98 SARS-CoV-2 S glycoprotein sequences appear to hold reasonable potential as immunogens. Peptide-based diagnostics and vaccines have previously been proposed against viral outbreaks (Dey et al., 2017; Ichihashi et al., 2011; Navalkar et al., 2015; Oany et al., 2014; Zhao et al., 2009). The availability of a 3D structure (6VSB) of the SARS-CoV-2 S glycoprotein provided an opportunity to inspect the predicted peptides. Placing the peptide segments identified by ASA and conserved-sequence analysis on the S glycoprotein structure showed that 20 of the identified regions lie on the surface (Figure 3). To limit recognition and evade the host immune response, coronaviruses use conformational masking and glycan shielding (Xiong et al., 2018).
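The below-1% percentile-rank filter can be expressed as a small post-processing step over prediction output. The sketch assumes the results have already been parsed into dictionaries with hypothetical `allele`, `peptide`, and `rank` keys (real IEDB output column names vary by tool and version). It groups strong binders by peptide so that peptides predicted to bind several alleles stand out. The peptide labels and rank values in the usage example are placeholders, not predictions from this study.

```python
from typing import Dict, Iterable, Mapping, Set


def strong_binders(rows: Iterable[Mapping], max_rank: float = 1.0) -> Dict[str, Set[str]]:
    """Group alleles by peptide for rows whose percentile rank is below max_rank.

    A lower percentile rank means stronger predicted binding, so the filter
    keeps rank < max_rank (strictly below 1% by default).
    """
    by_peptide: Dict[str, Set[str]] = {}
    for row in rows:
        if row["rank"] < max_rank:
            by_peptide.setdefault(row["peptide"], set()).add(row["allele"])
    return by_peptide


if __name__ == "__main__":
    rows = [
        {"allele": "HLA-A*02:01", "peptide": "pepA", "rank": 0.4},
        {"allele": "HLA-B*07:02", "peptide": "pepA", "rank": 0.9},
        {"allele": "HLA-A*02:01", "peptide": "pepB", "rank": 1.0},
        {"allele": "HLA-B*07:02", "peptide": "pepB", "rank": 2.5},
    ]
    for peptide, alleles in strong_binders(rows).items():
        print(peptide, sorted(alleles))  # prints: pepA ['HLA-A*02:01', 'HLA-B*07:02']
```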
The SARS-CoV-2 S trimer also exists in multiple distinct conformational states, which are necessary for receptor engagement and the initiation of fusogenic conformational changes (Walls et al., 2020). The considerable number of peptides in the surface region of the S glycoprotein supports the potential use of these peptide regions as immunogens. Binding to the ACE2 receptor is a critical initial step for SARS-CoV-2 to enter target cells, and recent studies have also highlighted the vital role of ACE2 in mediating SARS-CoV-2 entry (Hoffmann et al., 2020). The receptor-binding motif (RBM) is part of the receptor-binding domain (RBD) of SARS-CoV-2 and contains most of the residues that contact ACE2 (Lan et al., 2020). Some of our identified peptides (peptides 7-12 in Table 1) fall in the RBD (residues 319-540) and RBM (residues 438-506) regions, which makes them potentially useful peptide regions.

The emergence of new viruses such as SARS-CoV-2 imposes a substantial global disease burden. Over the past few months, research efforts toward the design and development of diagnostics and vaccines for SARS-CoV-2 have increased, and some related analyses have been reported in separate, parallel studies (Baruah and Bose, 2020; Bhattacharya et al., 2020; Grifoni et al., 2020). Our study leverages available resources and computational methods and adds to ongoing research on diagnostics and vaccines against SARS-CoV-2. Beyond the peptides already reported, we identified additional peptides, expanding the library of peptides that are likely to be recognized by human immune responses. Traditional vaccines based on antibody-mediated protection are often poor inducers of T-cell responses and, given high viral mutation rates, can have limited success (Rosendahl Huber et al., 2014).
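Whether a peptide falls in the RBD or RBM can be checked by counting the residues shared by inclusive coordinate ranges. The sketch below uses the RBD and RBM boundaries and the Table 1 coordinates quoted in this paper. It is only a positional check, not a structural one. For peptides 6-13, it finds overlap with the RBD for peptides 7-12 and overlap with the RBM for peptides 9-11. Peptide 7 overlaps the RBD only partially, at residues 319-325.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # 1-based, inclusive (start, end)

RBD: Range = (319, 540)
RBM: Range = (438, 506)

# Start and end positions of peptides 6-13, copied from Table 1.
PEPTIDES: Dict[int, Range] = {
    6: (278, 287), 7: (314, 325), 8: (407, 428), 9: (437, 450),
    10: (461, 485), 11: (493, 506), 12: (521, 533), 13: (567, 581),
}


def overlap(a: Range, b: Range) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def domain_overlaps(peptides: Dict[int, Range]) -> Dict[int, Tuple[int, int]]:
    """Return (residues in RBD, residues in RBM) for each peptide."""
    return {pid: (overlap(rng, RBD), overlap(rng, RBM)) for pid, rng in peptides.items()}


if __name__ == "__main__":
    for pid, (in_rbd, in_rbm) in domain_overlaps(PEPTIDES).items():
        print(f"peptide {pid}: {in_rbd} residues in RBD, {in_rbm} in RBM")
```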
Sensitive and rapid peptide-based diagnostic kits are considered a better alternative to conventional serological tests that use whole antigenic proteins (Mohanraj et al., 2017). In our study, we predicted both B-cell and T-cell epitopes, which could confer immunity in different ways. We speculate that the identified epitopes with considerably good epitope binding efficiency could be immunodominant peptides. The predicted peptides could therefore serve as immunogens for the development of diagnostics and vaccines against SARS-CoV-2.

In the present study, peptide segments were identified on the S protein for the development of diagnostics and vaccines against SARS-CoV-2. The recent availability of 3D structural data on the SARS-CoV-2 S glycoprotein aided the search. As an RNA virus, SARS-CoV-2 has a high mutation rate and undergoes active recombination (Yi, 2020). Although the identified peptides are promising candidate immunogens for peptide-based diagnostics and vaccines, further refinement and laboratory validation are still required and should be undertaken early, before the identified epitopes are rendered obsolete.

A high ASA value indicates that the solvent accessibility score of a region is relatively high and that the region is more surface exposed than its neighbours.

NetMHCcons: a consensus method for the major histocompatibility complex class I predictions.
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor.
Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses.
Structure, Function, and Evolution of Coronavirus Spike Proteins.
Peptide Vaccine: Progress and Challenges.
Peptide Based Viral Detection Systems for Effective Diagnosis of Common Viral Infections in India.
Peptide based diagnostics: are random-sequence peptides more useful than tiling proteome sequences?
Prediction of MHC class II binding affinity using SMMalign, a novel stabilization matrix alignment method.
Reliable prediction of T-cell epitopes using neural networks with novel sequence representations.
Design of an epitope-based peptide vaccine against spike protein of human coronavirus: an in silico approach.
Proteolytic processing of Middle East respiratory syndrome coronavirus spikes expands virus tropism.
Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method.
Antibody-protein interactions: benchmark datasets and prediction tools evaluation.
More than one reason to rethink the use of peptides in vaccine design.
T cell responses to viral infections - opportunities for Peptide vaccination.
Prediction of continuous B-cell epitopes in an antigen using recurrent neural network.
Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices.
Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine.
Structural basis for human coronavirus attachment to sialic acid receptors.
The Immune Epitope Database (IEDB): 2018 update.
Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell.
Tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion.
Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion.
Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2.
WHO Director-General's opening remarks at the media briefing on COVID-19 - 11.
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.
A new coronavirus associated with human respiratory disease in China.
Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China.
Glycan Shield and Fusion Activation of a Deltacoronavirus Spike Glycoprotein Fine-Tuned for Enteric Infections.
2019 novel coronavirus is undergoing active recombination.
Screening of specific diagnostic peptides of swine hepatitis E virus.
A Novel Coronavirus from Patients with Pneumonia in China.

Figure 3. Selected peptides highlighted on the SARS-CoV-2 spike protein structure downloaded from the PDB (ID: 6VSB).

Table 1
Peptide  Start  End   Length  Sequence
5        249    261   13      LTPGDSSSGWTAG
6        278    287   10      KYNENGTITD
7        314    325   12      QTSNFRVQPTES
8        407    428   22      VRQIAPGQTGKIADYNYKLPDD
9        437    450   14      NSNNLDSKVGGNYN
10       461    485   25      LKPFERDISTEIYQAGSTPCNGVEG
11       493    506   14      QSYGFQPTNGVGYQ
12       521    533   13      PATVCGPKKSTNL
13       567    581   15      RDIADTTDAVRDPQT
14       597    607   11      VITPGTNTSNQ
15       625    648   24      HADQLTPTWRVYSTGSNVFQTRAG
16       654    661   8       EHVNNSYE
17       673    691   19      SYQTQTNSPRRARSVASQS
18       700    713   14      GAENSVAYSNNSIA
19       768    780   13      TGIAVEQDKNTQE
20       788    799   12      IYKTPPIKDFGG
21       805    816   12      ILPDPSKPSKRS
22       1134   1150  17      NNTVYDPLQPELDSFKE
23       1153   1171  19      DKYFKNHTSPDVDLGDISG
24       1255   1267  13      KFDEDDSEPVLKG
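The positions in Table 1 are 1-based and inclusive. Each peptide's length should therefore equal end - start + 1 and match the number of residues in its sequence. For example, peptides 18 and 20 span 14 and 12 residues, respectively. A minimal consistency check over a few Table 1 rows could look like the sketch below. The function name is hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, int, int, str]  # (peptide no., start, end, sequence)

# A few rows copied from Table 1.
ROWS: List[Row] = [
    (10, 461, 485, "LKPFERDISTEIYQAGSTPCNGVEG"),
    (18, 700, 713, "GAENSVAYSNNSIA"),
    (20, 788, 799, "IYKTPPIKDFGG"),
    (22, 1134, 1150, "NNTVYDPLQPELDSFKE"),
]


def inconsistent_rows(rows: List[Row]) -> List[int]:
    """Peptide numbers whose sequence length differs from end - start + 1."""
    return [pid for pid, start, end, seq in rows if len(seq) != end - start + 1]


if __name__ == "__main__":
    for pid, start, end, seq in ROWS:
        print(pid, end - start + 1, len(seq))
    print("inconsistent:", inconsistent_rows(ROWS))  # prints: inconsistent: []
```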