key: cord-0835463-wbg1wmcm authors: Lin, Li; Ting, Sun; Yufei, He; Wendong, Li; Yubo, Fan; Jing, Zhang title: Epitope-based peptide vaccines predicted against novel coronavirus disease caused by SARS-CoV-2 date: 2020-07-01 journal: Virus Res DOI: 10.1016/j.virusres.2020.198082 sha: 6f2add322eaa7e97e9ff5c2995feb18d0620c46f doc_id: 835463 cord_uid: wbg1wmcm The outbreak of the 2019 novel coronavirus (SARS-CoV-2) has infected millions of people with a large number of deaths across the globe. The existing therapies are limited in dealing with SARS-CoV-2 due to the sudden appearance of the virus. Therefore, vaccines and antiviral medicines are in desperate need. We took immune-informatics approaches to identify B- and T-cell epitopes for surface glycoprotein (S), membrane glycoprotein (M) and nucleocapsid protein (N) of SARS-CoV-2, followed by estimating their antigenicity and interactions with the human leukocyte antigen (HLA) alleles. Allergenicity, toxicity, physiochemical properties analysis and stability were examined to confirm the specificity and selectivity of the epitope candidates. We identified a total of five B cell epitopes in RBD of S protein, seven MHC class-I, and 18 MHC class-II binding T-cell epitopes from S, M and N protein which showed non-allergenic, non-toxic and highly antigenic features and non-mutated in 55,179 SARS-CoV-2 virus strains until June 25, 2020. The epitopes identified here can be a potentially good candidate repertoire for vaccine development. The primary sequence of SARS-CoV-2 protein was retrieved from the NCBI database using accession number MN908947.3 [3] . Experimentally known 3D structure of SARS-CoV-2 S protein (PDB ID: 6VSB) and N protein (PDB ID: 6VYO) were retrieved from Protein Data Bank [1] . There is no 3D structure of M protein available yet. The predicted interaction conformation between RBD of SARS-CoV-2 S protein and human ACE-2 was retrieved from a very recent report [27] [28] [29] [30] [31] [32] . 
The protein sequences were analyzed for their chemical and physical properties, including GRAVY (grand average of hydropathicity), half-life, molecular weight, stability index, and amino acid atomic composition, via the online tool ProtParam [33]. TMHMM v2.0 (http://www.cbs.dtu.dk/services/TMHMM/) was applied to examine the transmembrane topology of the S and M proteins. The secondary structure of the SARS-CoV-2 S, M and N proteins was analyzed by PSIPRED [34]. The existence of disulfide bonds was examined with the online tool DiANNA v1.1, which uses a trained neural network to make predictions [35]. The antigenicity of the full-length S, M and N proteins was evaluated by VaxiJen v2.0 [36]. IEDB (Immune Epitope Database and Analysis Resource) [37] was used to predict linear B-cell epitopes using BepiPred and BepiPred 2.0 with default parameter settings, Kolaskar and Tongaonkar antigenicity, Parker hydrophilicity, Chou and Fasman beta-turn, and Karplus and Schulz flexibility. BcePred [38] was also used to predict linear B-cell epitopes using accessibility, antigenic propensity, exposed surface, flexibility, hydrophilicity, polarity, and turns. The linear B-cell epitopes predicted by IEDB and BcePred were combined into the linear B-cell epitope candidate list. Based on the transmembrane topology of the S and M proteins predicted by TMHMM v2.0, only epitopes on the outer surface were retained, and intracellular epitopes were eliminated. VaxiJen v2.0 [36] was applied to evaluate the antigenicity of the remaining epitopes. A stringent criterion was used: epitopes with an antigenicity score of at least 0.9 were considered adequate to initiate a defensive immune reaction. A discontinuous B-cell epitope forms the antigen-binding interface through fragments scattered along the protein sequence. DiscoTope 2.0 [39] with a DiscoTope score threshold of -3.7 was used to predict discontinuous epitopes.
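The combination and antigenicity-filtering steps for linear B-cell epitopes described above can be illustrated with a minimal Python sketch. It is not the authors' code: the `Epitope` record, the function names, and the example peptides, positions and scores are all hypothetical, and the antigenicity scores are assumed to have been obtained separately (for example from VaxiJen). Only the 0.9 cutoff comes from the text, and the topology filter is left out here.

```python
from dataclasses import dataclass

ANTIGENICITY_CUTOFF = 0.9  # threshold stated in the text


@dataclass(frozen=True)
class Epitope:
    sequence: str
    start: int           # 1-based position of the first residue (assumed convention)
    antigenicity: float  # score obtained separately, e.g. from VaxiJen


def combine_candidates(*prediction_sets):
    """Merge predictions from several tools, dropping exact duplicates."""
    merged = {}
    for predictions in prediction_sets:
        for ep in predictions:
            # The first occurrence of a (sequence, start) pair is kept.
            merged.setdefault((ep.sequence, ep.start), ep)
    return sorted(merged.values(), key=lambda ep: ep.start)


def filter_by_antigenicity(epitopes, cutoff=ANTIGENICITY_CUTOFF):
    """Keep epitopes whose antigenicity score is at least the cutoff."""
    return [ep for ep in epitopes if ep.antigenicity >= cutoff]


# Hypothetical predictions; sequences, positions and scores are made up.
iedb_hits = [Epitope("AAAKKK", 10, 1.20), Epitope("GGGSSS", 40, 0.55)]
bcepred_hits = [Epitope("AAAKKK", 10, 1.20), Epitope("TTTNNN", 75, 0.90)]

candidates = combine_candidates(iedb_hits, bcepred_hits)
selected = filter_by_antigenicity(candidates)
print([ep.sequence for ep in selected])
```

Treating the cutoff as inclusive follows the "larger than or equal to 0.9" wording used in the results.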
As the 3D structure of M protein is not available, open-source PyMOL was used to examine the positions of selected linear and discontinuous epitopes on the 3D structure of SARS-CoV-2 S protein or on the interacting conformation of the S protein RBD and human ACE-2 [28] [29] [30]. The Peptide_binding_to_MHC_class_I_molecules tool of IEDB and the HLA class I set [40] were utilized to predict MHC class I binding T-cell epitopes. The Peptide_binding_to_MHC_class_II_molecules tool of IEDB and the HLA class II set [41] were utilized to predict MHC class II binding T-cell epitopes. Percentile-rank thresholds of 1% for MHC class I binding epitopes and 10% for MHC class II binding epitopes were used to filter out peptide-allele pairs with weak binding affinity. The antigenicity score of each epitope was calculated by VaxiJen v2.0. A highly stringent standard was used to retain peptides with an antigenicity score larger than or equal to 1 and a number of binding alleles larger than or equal to 3 for MHC class I binding epitopes and 5 for MHC class II binding epitopes. The [43] with the top prediction chosen from a total of 10 epitope-protein interaction reports. pepATTRACT [44] was adopted to estimate the docking score of each peptide with the corresponding HLA allele. ConSurf [45] was used to examine the conservation status of each residue of SARS-CoV-2 by analyzing the amino acid sequences of S, M, and N protein from seven known coronaviruses including SARS-CoV- S protein is an important target for vaccine development because of its important function in entering the host cell. M protein and N protein of coronavirus have also been reported to generate immunogenic epitopes. Table 2 ). Antigenicity analysis of the full-length proteins by VaxiJen confirmed that they were expected antigens, with an antigenicity score of 0.4646 for S protein, 0.5102 for M protein, and 0.5059 for N protein.
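The T-cell epitope selection criteria stated above (percentile rank, number of binding alleles, antigenicity score) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the input layout of `(peptide, allele, percentile_rank)` tuples, the function name, and the example peptides and scores are hypothetical, and the percentile-rank thresholds are assumed to be inclusive.

```python
from collections import defaultdict

# Thresholds taken from the text; treating them as inclusive is an assumption.
RANK_CUTOFF = {"I": 1.0, "II": 10.0}   # percentile rank, in percent
MIN_ALLELES = {"I": 3, "II": 5}        # minimum number of binding alleles
MIN_ANTIGENICITY = 1.0                 # minimum antigenicity score


def select_t_cell_epitopes(binding_rows, antigenicity, mhc_class):
    """Return {peptide: sorted binding alleles} for peptides passing all filters.

    binding_rows: iterable of (peptide, allele, percentile_rank) tuples
    antigenicity: dict mapping peptide to its antigenicity score
    mhc_class: "I" or "II"
    """
    alleles = defaultdict(set)
    for peptide, allele, rank in binding_rows:
        if rank <= RANK_CUTOFF[mhc_class]:
            alleles[peptide].add(allele)
    return {
        peptide: sorted(bound)
        for peptide, bound in alleles.items()
        if len(bound) >= MIN_ALLELES[mhc_class]
        and antigenicity.get(peptide, float("-inf")) >= MIN_ANTIGENICITY
    }


# Hypothetical binding predictions and scores (peptide names are placeholders).
rows = [
    ("PEPTIDE_A", "HLA-A*02:01", 0.3),
    ("PEPTIDE_A", "HLA-A*24:02", 0.8),
    ("PEPTIDE_A", "HLA-B*07:02", 1.0),
    ("PEPTIDE_A", "HLA-B*35:01", 4.0),   # too weak for class I
    ("PEPTIDE_B", "HLA-A*02:01", 0.2),
    ("PEPTIDE_B", "HLA-A*01:01", 0.5),   # only two alleles in total
    ("PEPTIDE_C", "HLA-A*02:01", 0.1),
    ("PEPTIDE_C", "HLA-A*03:01", 0.2),
    ("PEPTIDE_C", "HLA-B*08:01", 0.4),   # enough alleles, low antigenicity
]
scores = {"PEPTIDE_A": 1.35, "PEPTIDE_B": 1.60, "PEPTIDE_C": 0.70}

class_one = select_t_cell_epitopes(rows, scores, "I")
print(class_one)
```

Counting distinct alleles per peptide, rather than rows, keeps a duplicated prediction from inflating the allele count.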
As the S and M proteins are transmembrane proteins, their transmembrane topologies were predicted by TMHMM. In the S protein, residues 1 to 1213 were exposed on the surface, residues 1214 to 1236 were inside the transmembrane region, and residues 1237 to 1273 were within the core region (Supplementary Fig. 4A). In the M protein, residues 1-19 were exposed on the surface, residues 20-99 were inside the transmembrane region, and residues 100-222 were within the core region (Supplementary Fig. 5A). B-cell epitopes can bind to antigen receptors on the surface of B cells, but the N protein is inside the virus. Because both the S and M proteins are transmembrane proteins, we predicted B-cell epitopes only for the S and M proteins in the downstream analysis (even though no neutralization activity is well established for the M protein). We predicted T-cell epitopes for the S, M, and N proteins. B-cell epitopes can guide B cells to recognize viral infection and activate defense responses against it. Recognition of B-cell epitopes depended on predictions of linear epitopes, antigenicity, hydrophilicity, surface accessibility, beta-turn, and flexibility [46]. B-cell epitopes of the S and M proteins were predicted by methods with default settings provided in IEDB [37], including BepiPred and BepiPred 2.0, Kolaskar and Tongaonkar antigenicity, Parker hydrophilicity, Chou and Fasman beta-turn, and Karplus and Schulz flexibility. BcePred [38] was used to predict B-cell epitopes using accessibility, antigenic propensity, exposed surface, flexibility, hydrophilicity, polarity, and turns. Overall, we obtained a total of 129 and 24 linear B-cell epitopes for the S and M proteins, respectively (Supplementary Table 3B). VaxiJen v2.0 was further used to estimate the antigenicity of all linear B-cell epitopes, resulting in a total of 80 and 4 epitopes for the S and M proteins, respectively, with an antigenicity score larger than or equal to 0.9 (Supplementary Table 3C). Based on the transmembrane topology of the S and M proteins predicted by TMHMM v2.0, intracellular epitopes were further eliminated.
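The topology-based elimination step can be illustrated with a short Python sketch that uses the residue ranges reported above. The function names and the example epitope positions are hypothetical, and requiring every residue of an epitope to lie in the surface-exposed segment is an assumption about how partial overlaps would be handled.

```python
# Residue ranges (1-based, inclusive) as reported in the text for TMHMM v2.0.
S_TOPOLOGY = [(1, 1213, "outside"), (1214, 1236, "transmembrane"), (1237, 1273, "core")]
M_TOPOLOGY = [(1, 19, "outside"), (20, 99, "transmembrane"), (100, 222, "core")]


def region_of(position, topology):
    """Return the topology label of a single residue position."""
    for first, last, label in topology:
        if first <= position <= last:
            return label
    raise ValueError(f"position {position} is outside the protein")


def is_surface_exposed(start, length, topology):
    """True if every residue of the epitope lies in the 'outside' segment."""
    return all(
        region_of(pos, topology) == "outside"
        for pos in range(start, start + length)
    )


# Hypothetical epitope positions, chosen only to exercise the boundaries.
print(is_surface_exposed(500, 12, S_TOPOLOGY))   # residues 500-511 of S
print(is_surface_exposed(1205, 12, S_TOPOLOGY))  # residues 1205-1216 cross into the membrane
print(is_surface_exposed(15, 10, M_TOPOLOGY))    # residues 15-24 of M cross into the membrane
```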
As a result, 78 linear B-cell epitopes from S protein were retained as candidates (Supplementary Table 3C 'KCVNFNFNGLTG' located in the RBD region of the spike head, which is the most exposed region (Table 1) (Fig. 1C). Based on the predicted interacting conformation between the RBD of SARS-CoV-2 S protein and ACE-2 [27] [28] [29] [30], the ten linear B-cell epitopes in the spike head substantially overlap with the interacting surface where ACE-2 binds to the RBD [28] [29] [30], suggesting that an antibody binding to this surface may block viral entry into cells (Fig. 1D). After examining the antigenicity of recently reported B-cell epitopes [47, 48], we found that all except one epitope from Orf3a (antigenicity score of QGEIKDATPSDF: 1.1542) (Supplementary Table 3D) have much lower antigenicity scores than the ten linear B-cell epitopes we identified from the most exposed region of the spike protein (antigenicity scores ranging from 0.9567 to 1.6969) (Table 1). As there is no 3D structure of M protein available, discontinuous B-cell epitopes were predicted only for S protein, by DiscoTope 2.0 using the A, B, and C chains of the 3D structure of S protein (PDB ID: 6VSB). The positions of the discontinuous epitopes were mapped on the surface of the 3D structure of S protein (Fig. 2A, Supplementary Fig. 6). Most discontinuous B-cell epitopes were mapped on the fully exposed 'spike head' region (Fig. 2B) (Supplementary Table 4) and the exposed 'spike stem' region, while a few were located in the 'spike root' region (Supplementary Table 3E). The main discontinuous B-cell epitopes on the 'spike head' region overlapped with the interacting surface of ACE-2 binding to S protein (Fig. 2C), suggesting a role in blocking viral fusion with cells.
As the ten linear B-cell epitopes in the RBD of the S protein were predicted to be both non-allergenic and non-toxic, we further examined their hydrophobicity, hydropathicity, hydrophilicity, and charge with ToxinPred, a support vector machine (SVM) based method (Supplementary Table 5A). The stability of the ten linear B-cell epitopes was evaluated by the number of peptide-digesting enzymes predicted by the protein digest server (http://db.systemsbiology.net:8080/proteomicsToolkit/proteinDigest.html). A larger number of non-digesting enzymes predicted for an epitope suggests potentially higher stability. All ten linear B-cell epitopes were found to have multiple non-digesting enzymes, varying from 2 to 8 enzymes (Supplementary Table 5B). The Peptide_binding_to_MHC_class_I_molecules tool of IEDB and the HLA class I set [40] were utilized to predict T-cell epitopes for the S protein. A percentile-rank threshold of 1% was used to filter out peptide-allele pairs with weak binding affinity. The antigenicity score of each peptide was calculated by VaxiJen v2.0. A peptide that has both a high antigenicity score and the capacity to bind a larger number of alleles is considered to have high potential to initiate a strong defense response. Highly stringent criteria were used to retain peptides with an antigenicity score larger than or equal to 1 and a number of binding alleles larger than or equal to 3. Using the evaluation method above, we obtained a total of 27 MHC class-I allele binding peptides from S, M, and N protein (Supplementary Table 6A Allergenicity of T-cell epitopes was assessed by AllergenFP 1.0. Results showed that two of nine, three of nine, and two of nine MHC class-I binding peptides from S, M, and N protein were probably non-allergenic, respectively (Supplementary Table 8A-C).
Nine of thirteen and nine of thirteen MHC class-II binding peptides from S and M protein were predicted to be non-allergenic, respectively (Supplementary Table 8A-C). The toxicity of T-cell epitopes, along with their hydrophobicity, hydropathicity, hydrophilicity, and charge, was evaluated by ToxinPred. All but two T-cell epitopes were predicted to be non-toxic (Supplementary Table 8A-C). The stability of T-cell epitopes was evaluated through the number of peptide-digesting enzymes predicted by the protein digest server. All T-cell epitopes but 'KMKDLSPRWY' were found to have multiple non-digesting enzymes, varying from 3 to 11 enzymes (Supplementary Table 9A Fig. 8A-B) . Fig. 9 ). The highly conserved and exposed residues were mainly located from 711 to 1221 in S protein (Supplementary Fig. 9), from 21 to 204 in M protein (Supplementary Fig. 10), and from 18 to 311 in N protein (Supplementary Fig. 11). Particularly, the epitopes without allergenicity and toxicity containing one functional residue (highly conserved and exposed) included B-cell Table 10 ). No mutations were observed in non-allergenic and non-toxic T-cell epitopes from S, M and N protein. The emergence of SARS-CoV-2 is a serious health threat for the whole society; thus there is an urgent need for drugs and preventative measures. SARS-CoV-2 infection is characterized by lung infections with symptoms including fever, cough, and shortness of breath. Based on information from the CDC (Centers for Disease Control and Prevention), symptoms can appear in as few as 2 days or as long as 14 days after exposure to the virus, which can be transmitted from human to human or through contact with infected surfaces and objects [5] [6] [7]. It is essential to identify immune epitopes as quickly as possible. The S protein is crucial in the fusion and entry of the virus into host cells [1]; therefore, it is a primary target for neutralizing antibodies.
The specificity of epitope-based vaccines can be enhanced by selecting parts of the S protein exposed on the surface [52]. Medical biotechnology is important in developing vaccines against SARS-CoV-2 [18]. Computer-based immune-informatics can reduce the time and cost involved, and it is therefore also an essential method in immunogenic analysis and vaccine development. In this study, we characterized the physicochemical characteristics of the SARS-CoV-2 viral genome for epitope candidates and adopted an immune-informatics based pipeline with highly stringent criteria to identify B- and T-cell epitopes targeting the S, M and N proteins that may potentially promote an immune response in the host. The antigenicity, flexibility, solvent accessibility, and disulfide bonds of the predicted epitopes were evaluated, yielding a small repertoire of potential B-cell epitope and vaccine candidates. Allergenicity and toxicity analysis suggested that the ten linear B-cell epitopes in the RBD region are non-allergenic and non-toxic. Stability analysis revealed that they cannot be digested by multiple enzymes. Also, two MHC class-I and nine MHC class-II binding T-cell epitopes were predicted to interact with numerous HLA alleles and to be highly antigenic. Allergenicity, toxicity, and physicochemical properties of T-cell epitopes were analyzed to increase specificity and selectivity. The stability and safety were supported by digestion analysis. Conservation analysis of seven known coronaviruses revealed that the RBD region is not conserved. Mutations derived from 51,150 SARS-CoV-2 sequences in the NGDC database were observed in five of the ten linear B-cell epitopes in the RBD region. The B- and T-cell (MHC class I and II) epitopes without mutations would be considered vaccine candidates with full antigenic potential. We expect that the B- and T-cell epitopes identified here may assist the development of potent peptide-based vaccines to address the SARS-CoV-2 challenge.
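The mutation screen summarized above, in which epitopes are checked against substitutions observed in circulating sequences, can be illustrated with a minimal Python sketch. The function names, the `(start, sequence)` epitope layout, and all positions and peptides are hypothetical; the list of mutated residue positions is assumed to have been derived beforehand from a sequence database such as NGDC.

```python
def mutated_sites_within(start, sequence, mutated_sites):
    """Return the sorted mutated positions that fall inside one epitope.

    start is the 1-based position of the first epitope residue (assumed convention).
    """
    end = start + len(sequence) - 1
    return sorted(site for site in set(mutated_sites) if start <= site <= end)


def split_by_mutation(epitopes, mutated_sites):
    """Partition (start, sequence) epitopes into conserved and mutated lists."""
    conserved, mutated = [], []
    for start, sequence in epitopes:
        hits = mutated_sites_within(start, sequence, mutated_sites)
        (mutated if hits else conserved).append((start, sequence, hits))
    return conserved, mutated


# Hypothetical epitopes and observed mutation positions.
epitopes = [(100, "AAAKKKGGG"), (200, "TTTNNNSSS"), (300, "LLLVVVIII")]
observed_sites = [105, 108, 250, 308, 308]

conserved, mutated = split_by_mutation(epitopes, observed_sites)
print("conserved:", conserved)
print("mutated:", mutated)
```

Only the conserved list would be carried forward as mutation-free candidates under this scheme.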
Particularly, those epitopes without mutations from the conserved regions could generate immunity that is not only cross-protective across betacoronaviruses but also relatively resistant to ongoing virus evolution [47]. The epitopes predicted here can also potentially be used in the design of more sensitive serological assays for epidemiological or vaccine efficiency assessments. However, the replication of SARS-CoV-2 is likely to be error-prone, similar to that of SARS-CoV, which has a reported mutation rate of 4 × 10^-4 substitutions/site/year [53]. Anti-viral vaccines therefore need to be developed before the predicted epitopes potentially become obsolete. Moreover, our immune-informatics based pipeline provides a framework for identifying B- and T-cell epitopes that is not limited to SARS-CoV-2. At the same time, there are limitations in predicting T-cell epitopes. A prerequisite for an epitope to elicit a T-cell response is that it can bind to both MHC alleles and T-cell receptors. However, while the prediction of binding between MHC alleles and an epitope is relatively accurate, the binding between an epitope and T-cell receptors is extremely difficult to predict. In short, the results here will be useful to guide the design and evaluation of efficient and specific serological assays against epitopes, as well as help prioritize vaccine target designs during this unprecedented crisis [54].
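To give a sense of scale for the mutation rate quoted in the discussion, the following back-of-the-envelope Python sketch estimates how many nucleotide substitutions would be expected per year within the coding region of a single epitope. It rests on several assumptions that are not from the source: the SARS-CoV rate is applied unchanged to SARS-CoV-2, the rate is uniform across sites, substitutions follow a Poisson process, and synonymous substitutions (which leave the peptide unchanged) are not separated out. The function names are hypothetical.

```python
import math

RATE_PER_SITE_PER_YEAR = 4e-4  # SARS-CoV figure quoted in the text


def expected_substitutions(n_residues, years=1.0, rate=RATE_PER_SITE_PER_YEAR):
    """Expected nucleotide substitutions in the codons of an epitope."""
    n_nucleotides = 3 * n_residues  # three nucleotides per residue
    return rate * n_nucleotides * years


def prob_at_least_one(n_residues, years=1.0, rate=RATE_PER_SITE_PER_YEAR):
    """Poisson probability of one or more substitutions (modeling assumption)."""
    return 1.0 - math.exp(-expected_substitutions(n_residues, years, rate))


# A 12-residue epitope, the length of the 'KCVNFNFNGLTG' example in the text.
lam = expected_substitutions(12)
p = prob_at_least_one(12)
print(f"expected substitutions per year: {lam:.4f}")
print(f"probability of at least one per year: {p:.4f}")
```

Under these assumptions the expected count for a 12-residue epitope is 36 × 4 × 10^-4 = 0.0144 substitutions per year along a single lineage, which is why mutations in individual epitopes only become visible when many sequences are surveyed.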
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding A new coronavirus associated with human respiratory disease in China A pneumonia outbreak associated with a new coronavirus of probable bat origin A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia Molecular biology of flaviviruses The coronavirus spike protein is a class I virus fusion protein: structural and functional characterization of the fusion core complex Structure, Function, and Evolution of Coronavirus Spike Proteins Tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion A structural analysis of M protein in coronavirus assembly and morphology Protective humoral responses to severe acute respiratory syndrome-associated coronavirus: implications for the design of an effective protein-based vaccine The coronavirus nucleocapsid is a multifunctional protein Antibody response of patients with severe acute respiratory syndrome (SARS) targets the viral nucleocapsid Progress and Prospects on Vaccine Development against SARS-CoV-2. Vaccines (Basel) Peptide vaccine against chikungunya virus: immuno-informatics combined with molecular docking approach The SARS-CoV-2 Vaccine Pipeline: an Overview From SARS to MERS, Thrusting Coronaviruses into the Spotlight. Viruses Recent Advances in the Vaccine Development Against Middle East Respiratory Syndrome-Coronavirus. Front Microbiol Middle East Respiratory Syndrome Vaccine Candidates: Cautious Optimism. 
Viruses Immune responses in COVID-19 and potential vaccines: Lessons learned from SARS and MERS epidemic T-cell immunity of SARS-CoV: Implications for vaccine development against MERS-CoV Epitope-Based Vaccine Target Screening against Highly Pathogenic MERS-CoV: An In Silico Approach Applied to Emerging Infectious Diseases Recent progress in adjuvant discovery for peptide-based subunit vaccines Preliminary Identification of Potential Vaccine Targets for the COVID-19 Coronavirus (SARS-CoV-2) Based on SARS-CoV Immunological Studies Potential T-cell and B-cell epitopes of 2019-nCoV Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine Structural basis of receptor recognition by SARS-CoV-2 Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor Predicting the angiotensin converting enzyme 2 (ACE2) utilizing capability as the receptor of SARS-CoV-2. Microbes Infect Role of changes in SARS-CoV-2 spike protein in the interaction with the human ACE2 receptor: An in silico analysis ExPASy: The proteomics server for in-depth protein knowledge and analysis Scalable web services for the PSIPRED Protein Analysis Workbench DiANNA 1.1: an extension of the DiANNA web server for ternary cysteine classification VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines The immune epitope database and analysis resource: from vision to blueprint BcePred: Prediction of Continuous B-Cell Epitopes in Antigenic Sequences Using Physico-chemical Properties Reliable B cell epitope predictions: impacts of method development and improved benchmarking Comprehensive analysis of dengue virus-specific responses supports an HLAlinked protective role for CD8+ T cells Functional classification of class II human leukocyte antigen (HLA) molecules reveals seven different supertypes and a surprising degree of repertoire sharing 
across supertypes Epitope-based peptide vaccine design and target site depiction against Middle East Respiratory Syndrome Coronavirus: an immune-informatics study PepSite: prediction of peptide-binding sites from protein surfaces The pepATTRACT web server for blind, large-scale peptide-protein docking ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids Influence of protein flexibility and peptide conformation on reactivity of monoclonal anti-peptide antibodies with a protein alpha-helix A Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2 Structural basis to design multi-epitope vaccines against Novel Coronavirus 19 (COVID19) infection, the ongoing pandemic emergency: an in silico approach Decamer-like conformation of a nona-peptide bound to HLA-B*3501 due to non-standard positioning of the C terminus Nonstandard peptide binding revealed by crystal structures of HLA-B*5101 complexed with HIV immunodominant epitopes Bound water structure and polymorphic amino acids act together to allow the binding of different peptides to MHC class I HLA-B53 Immunoinformatic analysis of glycoprotein from bovine ephemeral fever virus Residue analysis of a CTL epitope of SARS-CoV spike protein by IFN-gamma production and bioinformatics prediction Two linear epitopes on the SARS-CoV-2 spike protein that elicit neutralising antibodies in COVID-19 patients
We thank Qiwen Gan (Postdoctoral research scientist at Columbia University Medical Center) for the discussion about the 3D structure of the RBD of S protein and ACE-2.
JZ and YBF conceived and designed this study; JZ, LL, TS, YFH, and WDL performed the immune-informatics analysis. JZ and YBF wrote the manuscript. JZ, YBF, LL, TS, YFH, and WDL improved and revised the manuscript. All authors read and approved the final manuscript.
The authors declare no potential conflicts of interest.
This work was supported by grants from the National Natural Science Foundation of China (NSFC No. The locations of the ten non-allergenic and non-toxic B-cell epitopes in the spike head, which is the most exposed region. (D) The locations of the ten linear non-allergenic and non-toxic B-cell epitopes mapped to the predicted interacting conformation between the RBD of SARS-CoV-2 S protein and ACE-2. From A to C, green, cyan and purple are chain A, chain B and chain C, respectively; blue, red and pink are the locations of the 34 non-allergenic and non-toxic linear B-cell epitopes in chain A, chain B, and chain C, respectively. In D, light green is ACE-2; grey is the RBD of S protein; light red marks the locations of the ten non-allergenic and non-toxic B-cell epitopes in the RBD.
Residues marked e (orange) are exposed according to the neural-network algorithm; residues marked b (green) are buried according to the neural-network algorithm; residues marked f (red) are predicted functional residues (highly conserved and exposed); residues marked s (dark blue) are predicted structural residues (highly conserved and buried). The conservation scale represents the status of conservation, from variable through average to conserved. (D) The mutations observed in the ten non-allergenic and non-toxic linear B-cell epitopes in the RBD. Brown, yellow and light brown represent chain A, chain B and chain C. Red represents the locations of the ten non-allergenic and non-toxic linear B-cell epitopes in the RBD region; blue represents the observed mutations. The black amino acids in the epitopes are the mutated ones.