key: cord-0788304-jjmzktfi
authors: He, Jinlei; Huang, Fan; Zhang, Jianhui; Chen, Qiwei; Zheng, Zhiwan; Zhou, Qi; Chen, Dali; Li, Jiao; Chen, Jianping
title: Vaccine design based on 16 epitopes of SARS‐CoV‐2 spike protein
date: 2020-11-01
journal: J Med Virol
DOI: 10.1002/jmv.26596
sha: bf790366956cf6b1fdb327e677822aa9aa34a326
doc_id: 788304
cord_uid: jjmzktfi

The global outbreak of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) urgently requires an effective vaccine for prevention. In this study, 66 epitopes containing pentapeptides of SARS‐CoV‐2 spike protein in the IEDB database were compared with the amino acid sequence of SARS‐CoV‐2 spike protein, and 66 potentially immune‐related peptides of SARS‐CoV‐2 spike protein were obtained. Based on the single‐nucleotide polymorphisms analysis of spike protein of 1218 SARS‐CoV‐2 isolates, 52 easily mutated sites were identified and used for vaccine epitope screening. The best vaccine candidate epitopes in the 66 peptides of SARS‐CoV‐2 spike protein were screened out through mutation and immunoinformatics analysis. The best candidate epitopes were connected by different linkers in silico to obtain vaccine candidate sequences. The results showed that 16 epitopes were relatively conservative, immunological, nontoxic, and nonallergenic, could induce the secretion of cytokines, and were more likely to be exposed on the surface of the spike protein. They were both B‐ and T‐cell epitopes, and could recognize a certain number of HLA molecules and had high coverage rates in different populations. Moreover, epitopes 897‐913 were predicted to have possible cross‐immunoprotection for SARS‐CoV and SARS‐CoV‐2. The results of vaccine candidate sequences screening suggested that sequences (without linker, with linker GGGSGGG, EAAAK, GPGPG, and KK, respectively) were the best. The proteins translated by these sequences were relatively stable, with a high antigenic index and good biological activity.
Our study provided vaccine candidate epitopes and sequences for the research of the SARS‐CoV‐2 vaccine.

We downloaded the spike protein amino acid sequence of SARS-CoV-2 isolate Wuhan-Hu-1 from GenBank (GenBank ID: QHD43416.1). The sequences of the 66 epitopes containing pentapeptides of SARS-CoV-2 spike protein were taken from the report by Lucchese G and checked in the IEDB database. 4 These epitope sequences were then aligned with the amino acid sequence of SARS-CoV-2 spike protein to obtain the 66 peptides at the corresponding positions of the spike protein, which might serve as candidate epitopes for a vaccine.

Because nonsynonymous mutation sites in the viral amino acid sequence may affect the recognition of vaccine antigens, conserved sequences are generally preferred for vaccine candidate antigens. 7, 8 Therefore, mutation sites should be avoided as far as possible in candidate epitopes of SARS-CoV-2. We searched the 2019 Novel Coronavirus Resource (2019nCoVR, https://bigd.big.ac.cn/ncov) from the China National Center for Bioinformation (CNCB) to obtain high-quality genomic data of SARS-CoV-2 clinical isolates. A total of 1218 isolates from 34 countries around the world, sampled from June 1, 2020 to June 30, 2020, were selected for analysis. The countries are listed in Table S1. Among the single-nucleotide polymorphisms (SNPs) of the spike protein, we counted the nonsynonymous mutations, that is, those that cause amino acid changes. Amino acid sites with nonsynonymous mutations that appeared twice or more in the 1218 isolates were considered to be easily mutated. The 66 peptides of SARS-CoV-2 spike protein were checked for the presence of easily mutated amino acid sites, and peptides containing such sites were flagged for attention in subsequent screening.
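The mutation screen described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: it assumes that each isolate has already been reduced to a set of spike amino acid positions carrying a nonsynonymous mutation, and it reads "appeared twice or more" as "present in at least two isolates". The function names and the toy data are hypothetical.

```python
from collections import Counter


def easily_mutated_sites(isolate_mutations, min_isolates=2):
    """Return positions mutated in at least `min_isolates` isolates.

    `isolate_mutations` is an iterable of collections, one per isolate,
    each holding the 1-based spike positions with a nonsynonymous mutation.
    """
    counts = Counter()
    for sites in isolate_mutations:
        counts.update(set(sites))  # count each position once per isolate
    return {pos for pos, n in counts.items() if n >= min_isolates}


def flag_peptides(peptides, mutated_sites):
    """Map each (start, end) peptide range to the mutated sites inside it."""
    return {
        (start, end): sorted(p for p in mutated_sites if start <= p <= end)
        for start, end in peptides
    }


# Toy input (hypothetical, not the 1218 isolates used in the study).
isolates = [{614}, {614, 5}, {614, 5}, {1000}]
sites = easily_mutated_sites(isolates)
flags = flag_peptides([(1, 10), (600, 620), (990, 1010)], sites)
print(sites)
print(flags)
```

Position 1000 occurs in only one toy isolate, so it is not counted as easily mutated and the peptide covering it is returned with an empty list.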
Immunoprotective antigens among the peptides of SARS-CoV-2 spike protein were predicted using the immunoinformatics tool Vaxijen v2.0, 9 toxic peptides were predicted using ToxinPred, 10 and allergenic peptides were predicted using AllergenFP v.1.0. 11 The ability of the epitopes to induce interferon-γ (IFN-γ), interleukin-4 (IL-4), and IL-10 secretion was predicted using IFNepitope, 12 IL4Pred, 13 and IL-10Pred, 14 respectively. Peptides predicted to be nonprotective antigens, toxic, or allergenic were removed, and the remaining peptides were used as antigen epitopes for subsequent screening.

The solvent accessibility of each amino acid of spike protein (template 6xr8.1 15 ) was predicted by SWISS-MODEL 16 to screen for epitopes that were more likely to be exposed on the surface of the spike protein. ABCpred 17 and IEDB Bepipred Linear Epitope Prediction 2.0 18 were used to predict B-cell epitopes. NetMHC 4.0 Server, 19 Rankpep, 20 and SYFPEITHI 21 were used to predict T-cell epitopes and HLA molecules. As different HLA types are expressed at dramatically different frequencies in different ethnicities, 22 after obtaining the HLA class I and class II molecules recognized by these epitopes, we predicted the coverage rate of each epitope in different populations using Population Coverage in the IEDB Analysis Resource. 22

Although some epitopes contained easily mutated sites, some of them might be strong neutralizing epitopes that induce strong protection, and these should also be considered in vaccine design. Therefore, according to the above analysis, the selected vaccine candidate epitopes for SARS-CoV-2 were those predicted to be relatively conserved, immunoprotective, nontoxic, and nonallergenic, able to promote the secretion of cytokines, and more likely to be exposed on the surface of the spike protein. They were both B- and T-cell epitopes, could recognize a certain number of HLA molecules, and had high coverage rates in different populations.
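The removal step for nonprotective, toxic, or allergenic peptides can be illustrated with a small filter. The sketch below assumes the per-peptide scores have already been collected by hand from the web tools; it does not call Vaxijen, ToxinPred, or AllergenFP. The cutoffs follow those reported with the results (a protective-antigen score below 0.4 counts as nonprotective, a toxicity score above 0 counts as toxic). The class, field names, and example values are hypothetical.

```python
from dataclasses import dataclass

ANTIGEN_CUTOFF = 0.4  # scores below this are treated as nonprotective
TOXIN_CUTOFF = 0.0    # scores above this are treated as toxic


@dataclass
class PeptidePrediction:
    name: str
    antigen_score: float  # protective-antigen score recorded from the tool
    toxin_score: float    # toxicity score recorded from the tool
    allergen: bool        # True if predicted to be an allergen


def keep_candidate(p: PeptidePrediction) -> bool:
    """Keep peptides that are protective, nontoxic, and nonallergenic."""
    return (
        p.antigen_score >= ANTIGEN_CUTOFF
        and p.toxin_score <= TOXIN_CUTOFF
        and not p.allergen
    )


# Made-up example values, for illustration only.
predictions = [
    PeptidePrediction("pep_A", 0.55, -0.8, False),
    PeptidePrediction("pep_B", 0.30, -0.5, False),  # nonprotective
    PeptidePrediction("pep_C", 0.70, 0.2, False),   # toxic
    PeptidePrediction("pep_D", 0.90, -1.0, True),   # allergenic
]
kept = [p.name for p in predictions if keep_candidate(p)]
print(kept)
```

Keeping the cutoffs as named constants makes it easy to rerun the filter if a different tool version or threshold is used.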
2.4 | Acquisition, analysis, and screening of vaccine candidate sequences

DNAStar software. 24 Expasy ProtParam tool was used to predict the half-life and stability of the candidate proteins. 25 Finally, through a comprehensive analysis, the best candidate vaccine sequences were selected; they will be prepared into vaccines and their immune effects verified through animal experiments.

After comparing the amino acid sequences of the 66 epitopes in the IEDB database with those of the corresponding positions of SARS-CoV-2 spike protein, 66 peptides belonging to SARS-CoV-2 spike protein were obtained and shown in Table 1 28 which are underlined in Table 1. CR3022 is a neutralizing antibody previously isolated from a convalescent SARS patient; it targets a highly conserved epitope that enables cross-reactive binding between SARS-CoV and SARS-CoV-2. 28,29 CR3022-related epitopes may produce cross-protective antibody responses against SARS-CoV and SARS-CoV-2. Therefore, these peptides deserve particular attention in subsequent experiments.

After analyzing the spike protein SNPs of 1218 SARS-CoV-2 clinical isolates, we found a total of 52 nonsynonymous mutation sites that occurred twice or more; these were considered to be easily mutated and are marked in Figure 1A. The D614G mutation was the most frequent, appearing in 1101 of the SARS-CoV-2 clinical isolates. The D614G mutation was also reported by Korber et al., 30 and might change the virulence of SARS-CoV-2, but further research is needed. We checked the 66 peptide sequences of SARS-CoV-2 to determine whether they contained easily mutated sites, so that peptides containing such sites could be flagged in subsequent screening. In total, 21 peptides containing easily mutated sites were found and are shown in Table 1. Peptides 15-44, 195-226, 683-699, 690-706, and 690-707 even contained more than two easily mutated sites, and should not be considered as vaccine epitopes.
3.3 | Prediction of protective antigen, toxicity, allergenicity, and cytokine secretion of the 66 peptides

The prediction results for protective antigen, toxicity, allergenicity, and cytokine secretion of the 66 peptides are shown in Table 2. There were 26 peptides without immune protection (score lower than 0.4 in the analysis tool), 6 peptides with toxicity (score higher than 0 in the analysis tool), and 19 peptides with allergenicity. There were 28 epitopes able to induce IFN-γ secretion, 42 able to induce IL-4 secretion, and 24 able to induce IL-10 secretion. After removing the nonimmunoprotective, toxic, or allergenic peptides, 28 peptides remained as candidate epitopes for further screening. Among These six epitopes would be noted in the subsequent screening.

The solvent accessibility prediction results for the spike protein and the remaining 28 epitopes are shown in Figure S1, and the average solvent accessibility scores of the amino acids of the 28 epitopes are shown in Table 3. There were 15 epitopes with an average solvent accessibility score higher than 20, which might be considered as vaccine candidates.

The prediction results for B- and T-cell epitopes and for the HLA class I and class II molecules identified by the 28 epitopes are shown in Table 3. Except for epitope 899-906, whose amino acid sequence was too short for prediction, the other 27 epitopes were all predicted to contain B-cell epitopes, which might induce the production of neutralizing antibodies. The analysis also suggested that all 28 epitopes were T-cell epitopes: 25 could recognize both HLA class I and class II molecules, two could recognize only HLA class I molecules, and one could recognize only HLA class II molecules.
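The average solvent accessibility score of an epitope, and the cutoff of 20 used above, can be illustrated as follows. This sketch assumes that per-residue accessibility values for the spike protein are already available as a plain list indexed from residue 1; it does not reproduce how SWISS-MODEL derives those values. The function name and the toy numbers are hypothetical.

```python
ACCESSIBILITY_CUTOFF = 20.0  # epitopes averaging above this are treated as exposed


def mean_accessibility(per_residue, start, end):
    """Average accessibility over 1-based, inclusive positions start..end."""
    if start < 1 or end < start or end > len(per_residue):
        raise ValueError("epitope range lies outside the scored sequence")
    window = per_residue[start - 1:end]
    return sum(window) / len(window)


# Toy per-residue scores for a five-residue sequence (made-up values).
scores = [10.0, 30.0, 50.0, 5.0, 5.0]
epitopes = [(1, 3), (4, 5)]
averages = {ep: mean_accessibility(scores, *ep) for ep in epitopes}
exposed = [ep for ep, avg in averages.items() if avg > ACCESSIBILITY_CUTOFF]
print(averages)
print(exposed)
```

Only the first toy epitope passes the cutoff, since its residues average 30 while the second averages 5.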
However, among the six epitopes we focused on,

Combining these prediction results, we selected from the 28 epitopes those with an average accessibility score of more than 20 or a world population coverage rate of more than 50%. A total of 21 epitopes were therefore selected. However, among the 21 epitopes, eight of them (15-44, 195-

The bolded peptides were consistent with the corresponding epitope sequences in the IEDB database. N/A meant undetectable because the peptide length was beyond the range of the analysis system (≤30 amino acids).

nontoxic, and nonallergenic, and could induce the secretion of cytokines, and more likely to be exposed on the surface of the spike protein. They were both B- and T-cell epitopes, could recognize a certain number of HLA molecules, and their population coverage rates in the world were more than 50%. The 16 candidate epitopes were eventually merged into 11 peptides and connected with different linkers to obtain vaccine candidate sequences. The schematic diagram of the tandem sequences of the 11 peptides is shown in Figure 2A.

Vaccine design is a complex issue with many factors to consider, the most important of which are the safety and effectiveness of the vaccine. 38 When screening candidate epitopes in our study, nonsynonymous mutation sites in the sequence were considered so that the candidate epitopes did not contain easily mutated sites that could affect antigen recognition. 7, 8 The toxicity and allergenicity of epitopes were considered to ensure the safety of the epitopes. 39, 40 The immunogenicity of antigens, the secretion of cytokines, the solvent accessibility of amino acids, and the recognition of MHC molecules were considered to ensure the effectiveness of the epitopes. 38, [41] [42] [43] The coverage of epitopes in different populations was also considered to ensure the effectiveness of the epitopes in most populations. 44

Moreover, when expressing a fusion protein, choosing the appropriate linker is very important for the design of the vaccine candidate sequence. Different linkers affect the correct folding, stability, biological activity, and immunogenicity of proteins. 45 Verifying these effects requires extensive experiments. However, the application of immunoinformatics tools to vaccine design has greatly improved the efficiency and accuracy of epitope screening and the rationality of vaccine design, and has been used in many vaccine studies. 46, 47 In this study, 16 epitopes of spike protein were predicted to be Epitope 1025-1041 also had a low average solvent accessibility score (12.33) and could not recognize HLA class I molecules. Therefore, these three epitopes were not selected as vaccine candidates.

Another interesting finding was that, in the population coverage results for the 28 epitopes, the coverage rate of each epitope was high in Europe, North America, East Asia, and Oceania, but low in East Africa, West Africa, South Africa, and Central Africa. We attribute this to differences in the recognition of HLA molecules among populations. 50 However, this difference might leave people in Africa less protected by the same vaccine than people in Europe, North America, East Asia, and Oceania. Whether it will be necessary to prepare a specific vaccine based on the ability of the African population to recognize HLA subclasses remains to be studied.

According to the results of mutation and immunoinformatics analysis, GPGPG, and KK, respectively) were predicted to be relatively stable, with a high antigenic index and good biological activity. We recommend these five sequences as candidate sequences for a SARS-CoV-2 vaccine. Our next project is to synthesize the gene sequences for cloning and expression to prepare vaccines for SARS-CoV-2 and verify their immune effects.
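The in silico construction of tandem candidate sequences with different linkers can be sketched as simple string joining. This is an illustration under stated assumptions, not the authors' procedure: the peptide order is taken as given, the same linker is placed between every adjacent pair of peptides, and only the five variants named in the text (no linker, GGGSGGG, EAAAK, GPGPG, KK) are built. The peptide strings are short placeholders, not the 11 merged spike peptides, and the function names are hypothetical.

```python
# The five linker variants named in the text; "none" means direct fusion.
LINKERS = {
    "none": "",
    "GGGSGGG": "GGGSGGG",
    "EAAAK": "EAAAK",
    "GPGPG": "GPGPG",
    "KK": "KK",
}


def join_with_linker(peptides, linker):
    """Concatenate peptides in order, inserting the linker between neighbours."""
    return linker.join(peptides)


def build_constructs(peptides):
    """Return one tandem sequence per linker variant."""
    return {name: join_with_linker(peptides, seq) for name, seq in LINKERS.items()}


# Placeholder peptides for illustration only.
placeholder_peptides = ["AAA", "CCC", "DDD"]
constructs = build_constructs(placeholder_peptides)
for name, sequence in constructs.items():
    print(name, sequence)
```

Each resulting sequence could then be submitted separately to the stability and antigenicity tools, which is why the sketch keeps the variants in a dictionary keyed by linker name.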
The bioinformatics analysis method used in our study can improve the accuracy and effectiveness of vaccine epitope screening and the rationality of vaccine design, and can also be applied to vaccine design for other infectious diseases.

Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus
Contributions of the structural proteins of severe acute respiratory syndrome coronavirus to protective immunity
Immunological responses against SARS-coronavirus infection in humans
Epitopes for a 2019-nCoV vaccine
The Immune Epitope Database (IEDB): 2018 update
Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission
Mutations in hepatitis D virus allow it to escape detection by CD8+ T cells and evolve at the population level
Matrix-M™ adjuvant enhances immunogenicity of both protein- and modified vaccinia virus Ankara-based influenza vaccines in mice
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
In silico approach for predicting toxicity of peptides and proteins
AllergenFP: allergenicity prediction by descriptor fingerprints
Designing of interferon-gamma inducing MHC class-II binders
Prediction of IL4 inducing peptides
Computer-aided designing of immunosuppressive peptides based on IL-10 inducing potential
Distinct conformational states of SARS-CoV-2 spike protein
SWISS-MODEL: homology modelling of protein structures and complexes
Prediction of continuous B-cell epitopes in an antigen using recurrent neural network
BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes
Gapped sequence alignment using artificial neural networks: application to the MHC class I system
Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles
SYFPEITHI: database for searching and T-cell epitope prediction
Predicting population coverage of T-cell epitope-based diagnostics and vaccines
PredictProtein-an open resource for online prediction of protein structural and functional features
DNASTAR's Lasergene sequence analysis software
Protein identification and analysis tools on the ExPASy server
SARS corona virus peptides recognized by antibodies in the sera of convalescent cases
Identification of immunodominant sites on the spike protein of severe acute respiratory syndrome (SARS) coronavirus: implication for developing SARS diagnostics and vaccines
A highly conserved cryptic epitope in the receptor-binding domains of SARS-CoV-2 and SARS-CoV
Human monoclonal antibody combination against SARS coronavirus: synergy and coverage of escape mutants
Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2
How long is a piece of loop?
Draft landscape of COVID-19 candidate vaccines
Understanding modern-day vaccines: what you need to know
Prime-boost vaccine strategy against viral infections: mechanisms and benefits
Heterologous prime-boost vaccination
A prime-boost vaccination protocol optimizes immune responses against the nucleocapsid protein of the SARS coronavirus
Adenovirus-based vaccine prevents pneumonia in ferrets challenged with the SARS coronavirus and stimulates robust immune responses in macaques
Principles of vaccination
Study designs for the nonclinical safety testing of new vaccine products
T-cell epitope prediction
Development of a multivalent enterovirus subunit vaccine based on immunoinformatic design principles for the prevention of HFMD
Analysis of conformational B-cell epitopes in the antibody-antigen complex using the depth function and the convex hull
Recombinant and epitope-based vaccines on the road to the market and implications for vaccine design and production
Fusion protein linkers: property, design and functionality
Immunoinformatics-aided identification of T cell and B cell epitopes in the surface glycoprotein of 2019-nCoV
Bioinformatics analysis of four proteins of Leishmania donovani to guide epitopes vaccine design and drug targets selection
Mining of epitopes on spike protein of SARS-CoV-2 from COVID-19 patients
COVID-19 patients form memory CD8+ T cells that recognize a small set of shared immunodominant epitopes in SARS-CoV-2
HLA supertype variation across populations: new insights into the role of natural selection in the evolution of HLA-A and HLA-B polymorphisms

The authors declare that there are no conflicts of interest.

The data that support the findings of this study are available in the supplementary material of this article.

http://orcid.org/0000-0001-7617-2800