key: cord-0812059-0rgo6kyz authors: Marchan, Jose title: A vaccine built from potential immunogenic pieces derived from the SARS-CoV-2 spike glycoprotein: A computational approximation date: 2022-01-07 journal: J Immunol Methods DOI: 10.1016/j.jim.2022.113216 sha: 6e4503e3547f8c77b2fbdce51c308a1401aff15b doc_id: 812059 cord_uid: 0rgo6kyz

Coronavirus Disease 2019 (COVID-19) represents a new global threat demanding a multidisciplinary effort to fight its etiological agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In this regard, immunoinformatics may help predict prominent immunogenic regions of critical SARS-CoV-2 structural proteins, such as the spike (S) glycoprotein, for use in prophylactic or therapeutic interventions against this highly pathogenic betacoronavirus. Accordingly, in this study, an integrated immunoinformatics approach was applied to identify cytotoxic T cell (CTC), T helper cell (THC), and linear B cell (BC) epitopes from the S glycoprotein in an attempt to design a high-quality multi-epitope vaccine. The best CTC, THC, and BC epitopes showed high viral antigenicity and lacked allergenic or toxic residues, and the CTC and THC epitopes showed suitable interactions with HLA class I (HLA-I) and HLA class II (HLA-II) molecules, respectively. Remarkably, the SARS-CoV-2 receptor-binding domain (RBD) and its receptor-binding motif (RBM) harbour several potential epitopes. The structure prediction, refinement, and validation data indicate that the multi-epitope vaccine has an appropriate conformation and stability. Three conformational epitopes and efficient binding between Toll-like receptor 4 (TLR4) and the vaccine model were observed. Importantly, the population coverage analysis showed that the multi-epitope vaccine could be used globally. Notably, computer-based simulations suggest that the vaccine model has a robust potential to evoke and maximize both immune effector responses and immunological memory to SARS-CoV-2.
Further research is needed to comply with the mandatory international guidelines for human vaccine formulations.

On 31 December 2019, a dramatic increase in the number of patients with a potentially fulminant respiratory disease was reported in Wuhan, China. The etiological agent was eventually identified as a novel, highly pathogenic betacoronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [2]. Thereafter, SARS-CoV-2 caused an overwhelming wave of coronavirus disease 2019 (COVID-19) cases across Asia, Europe, Oceania, the Americas, and Africa, which led the World Health Organization to declare COVID-19 a pandemic on 11 March 2020 [1]. Unfortunately, at the time of this research, no clinically approved treatment was available to fight SARS-CoV-2, whose rapid spread had generated an explosive second wave of COVID-19 all over the world [1]. The SARS-CoV-2 outer membrane is decorated with several structural proteins, including the S glycoprotein, the membrane protein, and the envelope protein [2]. The S glycoprotein forms homotrimers containing a receptor-binding domain (RBD), which in turn includes a receptor-binding motif (RBM) [3]. The latter mediates contacts with human angiotensin-converting enzyme 2 (hACE2), thereby allowing SARS-CoV-2 to enter host cells [3]. This critical role in viral pathogenesis makes the SARS-CoV-2 S glycoprotein an attractive target for vaccine development [4]. Multi-epitope vaccines designed with immunoinformatics tools could help elicit a protective immune response against SARS-CoV-2, as reported previously for other infectious agents [5]. In this regard, recent data indicate that the SARS-CoV-2 S glycoprotein harbours prominent immunologically active regions, which may serve as candidates for multi-epitope vaccine models [6]. Accordingly, the present study aimed to design a multi-epitope vaccine construct against SARS-CoV-2 using an integrated in silico approach.
with a small percentile rank have high affinity for HLA alleles. This percentile rank is produced on IEDB-AR by comparing the IC50 of each predicted peptide against those of random peptides from the SWISSPROT database. In this work, epitopes were selected following this guideline and using a percentile rank cut-off of ≤ 20, as recommended previously [18], a criterion that has also been successfully applied in other in silico studies focused on SARS-CoV-2 [19, 20]. In addition, HLA-II-binding peptides were also chosen for their potential to induce interferon-gamma (IFN-γ) (Fig. 1), a cytokine necessary to fight viral infections [10]. Epitopes with a high potential to induce IFN-γ production were selected using the IFNepitope server (http://crdd.osdd.net/raghava/ifnepitope/) [21]. This server offers three models (motif-based, SVM-based, and hybrid), which have been trained on 10,433 experimentally validated IFN-γ-inducing and non-inducing MHC class II peptides [21]. BCPRED (http://ailab.ist.psu.edu/bcpred/) [22] was used to predict linear BC epitopes based on several physicochemical properties: hydrophilicity, flexibility, accessibility, and antigenicity propensity (threshold = 1 for each parameter). In parallel, the S glycoprotein amino acid sequence was also submitted to iBCE-EL (http://thegleelab.org/iBCE-EL/) [23] and BepiPred-2.0 (http://www.cbs.dtu.dk/services/BepiPred/) [24] for additional predictions of linear BC epitopes. To evaluate the presentation of the best epitopes in the context of HLA molecules, a molecular docking study was conducted (Fig. 1). Because HLA-A*02:01 and HLA-DRB1*01:01 were predicted as commonly interacting HLA alleles, they were selected for this purpose. The molecular docking simulation and the Gibbs free energy (ΔG) of the HLA-viral peptide complexes were evaluated as recently reported [20].
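The percentile-rank criterion described above can be illustrated with a minimal Python sketch. It assumes that a peptide's rank is the percentage of background (random) peptides predicted to bind more tightly, i.e. with a lower IC50, than the candidate, so that smaller ranks mean higher affinity. The background IC50 values and peptide names are hypothetical; IEDB-AR computes its ranks internally with its own predictors and background set.

```python
from bisect import bisect_left


def percentile_rank(ic50, background_ic50s):
    """Percent of background peptides with a lower (better) predicted IC50.

    `background_ic50s` must be sorted in ascending order.
    """
    better = bisect_left(background_ic50s, ic50)  # number of values < ic50
    return 100.0 * better / len(background_ic50s)


def select_by_rank(predictions, background_ic50s, cutoff=20.0):
    """Keep peptides whose percentile rank is <= cutoff, best (lowest rank) first."""
    background = sorted(background_ic50s)
    ranked = [(percentile_rank(ic50, background), pep) for pep, ic50 in predictions.items()]
    return [pep for rank, pep in sorted(ranked) if rank <= cutoff]


if __name__ == "__main__":
    # Hypothetical background: predicted IC50 values (nM) for 100 random peptides.
    background = [float(i) for i in range(1, 101)]
    # Hypothetical candidate peptides and their predicted IC50 values (nM).
    candidates = {"pep_a": 5.0, "pep_b": 50.0, "pep_c": 20.5}
    print(select_by_rank(candidates, background))  # ['pep_a', 'pep_c']
```

Here `pep_a` and `pep_c` fall at ranks 4 and 20 and pass the ≤ 20 cut-off, while `pep_b` (rank 49) is discarded.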
Journal Pre-proof
High-potential CTC, THC, and linear BC epitopes were selected to generate the amino acid sequence of the multi-epitope vaccine. The CTC and THC epitopes were linked using AAY and GPGPG linkers, respectively, whereas linear BC epitopes were connected by KK linkers (Fig. 1). Moreover, a TLR4 agonist known as RS09 (sequence: APPHALS) [25] was added as an adjuvant at the N-terminus via an EAAAK linker (Fig. 1). For future validation studies, the vaccine molecule must be expressed in vitro and then purified. Therefore, a polyhistidine tag (6x-His tag) was included at the C-terminus (Fig. 1) to allow its purification [26]. The ProtParam tool (https://web.expasy.org/protparam/) [27] was used to examine relevant physicochemical parameters of the multi-epitope vaccine. To reconfirm its viral antigenicity and lack of allergenicity and toxicity, the web tools described in section 2.2 were applied. In addition, vaccine solubility was predicted using the SOLpro server (http://scratch.proteomics.ics.uci.edu/) [28]. PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/) [29] and GalaxyWEB (http://galaxy.seoklab.org/) [30] were used to predict the secondary and tertiary structure, respectively, of the multi-epitope vaccine construct. The best model was refined with GalaxyWEB [30]. The vaccine structure was validated by comparison with experimentally determined 3D protein structures. To this end, the vaccine structure was submitted to ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php), which provides an overall quality score for a given structure [31]. Furthermore, a Ramachandran plot was generated on the PROCHECK website (https://servicesn.mbi.ucla.edu/PROCHECK/), which validates a protein structure according to whether the backbone dihedral angles phi (φ) and psi (ψ) of its residues fall in energetically allowed or disallowed regions [32].
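The linker scheme described above can be sketched as simple string concatenation. The adjuvant (RS09, APPHALS), the EAAAK, AAY, GPGPG and KK linkers, and the C-terminal 6x-His tag come from the text. The epitope strings are hypothetical placeholders (runs of X), and both the block order (CTC, then THC, then BC) and the `junction` spacer between blocks are assumptions, because those details are shown only in Fig. 1.

```python
# Sequences stated in the text.
ADJUVANT = "APPHALS"       # RS09 TLR4 agonist, placed at the N-terminus
ADJUVANT_LINKER = "EAAAK"  # joins the adjuvant to the epitope region
CTC_LINKER = "AAY"         # between cytotoxic T cell epitopes
THC_LINKER = "GPGPG"       # between T helper cell epitopes
BC_LINKER = "KK"           # between linear B cell epitopes
HIS_TAG = "H" * 6          # C-terminal 6x-His purification tag


def assemble_construct(ctc, thc, bc, junction=""):
    """Concatenate epitope blocks into one multi-epitope sequence.

    `junction` is hypothetical: it is placed between the CTC, THC and BC
    blocks and before the His tag; the real junctions are shown in Fig. 1.
    """
    blocks = [CTC_LINKER.join(ctc), THC_LINKER.join(thc), BC_LINKER.join(bc)]
    return ADJUVANT + ADJUVANT_LINKER + junction.join(blocks) + junction + HIS_TAG


if __name__ == "__main__":
    # Hypothetical placeholder epitopes ("X" = any residue): 9-mers for HLA-I,
    # 15-mers for HLA-II, and 10-residue linear B cell epitopes.
    ctc = ["X" * 9, "X" * 9]
    thc = ["X" * 15, "X" * 15]
    bc = ["X" * 10, "X" * 10]
    construct = assemble_construct(ctc, thc, bc)
    print(len(construct), construct[:12], construct[-6:])
```

Keeping each linker as a named constant makes it easy to check that, for example, the GPGPG spacer only ever appears between T helper epitopes.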
B cells are also professional antigen-presenting cells, and they initiate this function by recognizing the antigen through B-cell receptors [10]. Therefore, conformational epitopes of the multi-epitope vaccine construct were predicted using ElliPro (http://tools.iedb.org/ellipro/) [33], which represents the protein structure as an ellipsoid and calculates protrusion indexes for residues lying outside such an ellipsoid [33]. A minimum score of 0.8 and a maximum distance of 6.0 Å were applied. The epitopes were visualized with VMD (version 1.9.3) to illustrate their position and 3D structure, as previously reported [20]. Since TLR4 may serve as a sensor for the recognition of coronavirus S glycoproteins [34], this germline-encoded pattern recognition receptor was selected for the docking study. The 3D structure of TLR4 was obtained from the PDB (accession number: 4G8A). The refined model of the multi-epitope vaccine was used as the ligand. The TLR4-vaccine docking simulation and its 3D visualization were performed as recently reported [20]. To further characterize the potential immune response to the multi-epitope vaccine, immune simulations were performed using the C-ImmSim server (http://150.146.2.1/C-IMMSIM/index.php) [35]. Three injections were applied four weeks apart, as described previously [36]. Furthermore, 12 injections were applied four weeks apart to simulate repeated exposure to the potential immunogen. The Simpson index D was used to interpret the diversity of the immune response. Global population coverage of the multi-epitope vaccine construct was calculated with IEDB-AR (http://tools.iedb.org/population/) [37]. The HLA allele genotypic frequencies available on IEDB-AR were obtained from the Allele Frequency Database (AFD).

The in silico approach (Fig. 1) predicted a total of 47 T cell epitopes, of which 7 cytotoxic T cell (CTC) and 11 T helper cell (THC) epitopes were identified as the best (Table 1). These epitopes showed potent viral antigenicity (ranging from 0.63 to 1.52) and lacked allergenic or toxic residues (Table 1). Moreover, the THC epitopes were characterized by their potential capacity to induce IFN-γ (Table 1). Although "EGFNCYFPLQSYGFQ" (E47 in Table 1) could be categorized as a strong potential THC epitope, it was predicted to be toxic. Therefore, this epitope was not included in the amino acid sequence of the multi-epitope vaccine. The selected CTC epitopes (Table 1)

A total of 10 linear BC epitopes of varying length were predicted (Table 2). Most of these epitopes showed robust viral antigenicity (≥0.5) and were identified as non-allergenic and non-toxic (Table 2). However, only 7 epitopes were selected for the vaccine design, because only these were predicted simultaneously by 3 different web tools (BCPRED, iBCE-EL, and BepiPred-2.0) (Table 2). Interestingly, overlapping residues were observed between some linear BC and T cell epitopes.

To evaluate the presentation of the best epitopes in the context of HLA, molecular docking simulations were conducted. For this purpose, HLA-A*02:01 and HLA-DRB1*01:01 were chosen as representative alleles. HLA-I and HLA-II alleles were docked with CTC and THC epitopes, respectively, using the ClusPro server, which has recently been applied to successfully dock epitopes from SARS-CoV-2 non-structural proteins into HLA molecules [20]. Inspection in VMD revealed different binding patterns in which the viral peptides interact with the active-site residues of the HLA groove in a similar way to the control peptides (Fig. 2A).
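The requirement that a linear BC epitope be predicted by all three tools can be sketched as an interval-overlap check. This is a minimal sketch assuming that "predicted simultaneously" means a candidate region overlaps at least one prediction from every tool; the paper does not state its exact matching rule, and the residue intervals below are hypothetical.

```python
def overlaps(a, b):
    """True if two inclusive residue intervals (start, end) share a residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def consensus(candidates, predictions_by_tool):
    """Keep candidate intervals that overlap a prediction from every tool."""
    return [
        cand
        for cand in candidates
        if all(any(overlaps(cand, p) for p in preds) for preds in predictions_by_tool.values())
    ]


if __name__ == "__main__":
    # Hypothetical 1-based, inclusive residue intervals on the S glycoprotein.
    candidates = [(10, 25), (100, 114), (400, 420)]
    predictions = {
        "BCPRED": [(10, 25), (100, 114), (400, 420)],
        "iBCE-EL": [(12, 20), (405, 415)],
        "BepiPred-2.0": [(8, 18), (390, 402)],
    }
    print(consensus(candidates, predictions))  # [(10, 25), (400, 420)]
```

The region at residues 100 to 114 is dropped because iBCE-EL and BepiPred-2.0 do not support it.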
Moreover, several viral peptides (e.g., E18 and E33) formed a bulge that projected from their respective HLA allele (Fig. 2B and 2C), which is relevant, for instance, for activating CTCs against SARS-CoV-2-infected cells [10] (Fig. 2E). Importantly, each HLA-viral peptide complex showed robust potential interactions (ΔG < -7 kcal/mol), comparable to the control peptides (Fig. 2D). To design the amino acid sequence of the multi-epitope vaccine, epitopes were organized using several linkers (Fig. 1). This sequence consists of 425 amino acid residues (Fig. 1). Of particular note, several epitopes selected for the vaccine design (E19, E42, E43, E44, and E45 in Table 1; E10 in Table 2) harbour residues that are usually involved in the interaction between the SARS-CoV-2 S glycoprotein and hACE2 [3, 38, 39]. For instance, N501, which is present in the amino acid sequences of E19 (Table 1) and E10 (Table 2), has recently been described as one of the critical hACE2-binding residues in SARS-CoV-2 [3]. The vaccine showed strong viral antigenicity (0.64), and neither allergenic nor toxic residues were observed in its amino acid sequence. Furthermore, the physicochemical properties examined with the ProtParam tool, including molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index, and GRAVY, yielded conventional results (Table 3). The vaccine construct was analysed with the PSIPRED server to predict its secondary structure, which identified 309, 65, and 51 amino acids forming coil, helix, and strand regions, respectively (Fig. 3). The predicted tertiary structure was refined using the GalaxyRefine server, which returned four potential models. Model 1 (Fig. 4A) was ranked best by the web tool and was therefore selected for further analysis.
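Two of the ProtParam parameters listed above, GRAVY and the aliphatic index, have simple closed-form definitions and can be reproduced in a few lines. The sketch below assumes the standard Kyte-Doolittle hydropathy scale and Ikai's aliphatic-index coefficients (2.9 for Val, 3.9 for Ile and Leu), which are the definitions ProtParam documents. It is applied only to the RS09 adjuvant sequence given in the text, because the full 425-residue construct sequence is not reproduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues.

    Raises KeyError for non-standard residue letters.
    """
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    n = len(seq)
    mole_pct = {aa: 100.0 * seq.count(aa) / n for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


if __name__ == "__main__":
    rs09 = "APPHALS"  # RS09 adjuvant sequence given in the text
    print(round(gravy(rs09), 3), round(aliphatic_index(rs09), 2))
```

A positive GRAVY indicates a net hydrophobic sequence, and a higher aliphatic index is commonly read as a sign of greater thermostability.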
In this regard, the Ramachandran plot (Fig. 4B) showed that 98.1% of residues were located in allowed regions, whereas the remaining 1.9% were in disallowed regions. In addition, the Z-score (-2.4) (Fig. 4C) suggests that the vaccine structure is similar to native proteins of comparable size. Four conformational BC epitopes (CEs) were predicted using ElliPro (Fig. 4A). These CEs showed high scores (CE1: 0.911, CE2: 0.819, CE3: 0.803, and CE4: 0.803), suggesting considerable accessibility for antibodies (Fig. 4A). These results also support the immunogenic potential of the multi-epitope vaccine construct. The vaccine model interacted suitably with the region between TLR4 chain B and the myeloid differentiation factor 2 (MD-2) molecule, which are known to initiate the signalling cascade in vivo [10], suggesting that such a vaccine model could elicit an appropriate immune response against SARS-CoV-2. Importantly, the adjuvant inserted in the vaccine sequence was located in the interaction zone with TLR4 and MD-2, which suggests that this adjuvant is relevant to the potential efficacy of the present vaccine model. The immune response simulations with the multi-epitope vaccine construct (3 doses given 4 weeks apart) showed both cell-mediated and humoral responses. As expected, an increase in the number and activity of natural killer (NK) cells, a relevant line of attack against viruses [10], and of macrophages was observed (Fig. 6). Regarding the adaptive immune response, CTC and THC populations showed a proliferative burst, effector cell generation, and a dramatic contraction in cell number (Fig. 6). Importantly, IL-2, which is necessary for T cell activation and optimal proliferation [10], was amplified after each dose (Fig. 6). Moreover, the vaccine model increased BC and plasma cell populations, particularly those of the immunoglobulin M (IgM) and IgG1 isotypes (Fig. 6).
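In C-ImmSim, injection times are entered as simulation time steps rather than days. The hypothetical helper below converts the two dosing schedules used here (3 doses and 12 doses, each 4 weeks apart) into step numbers. It assumes that one time step corresponds to 8 hours and that the first dose is given at step 1; both are assumptions to check against the server's own documentation.

```python
HOURS_PER_STEP = 8  # assumption: one C-ImmSim time step = 8 hours


def injection_steps(n_doses, interval_days, first_step=1):
    """Return the simulation time step of each injection for an evenly spaced schedule."""
    steps_per_interval = interval_days * 24 // HOURS_PER_STEP  # 28 days -> 84 steps
    return [first_step + i * steps_per_interval for i in range(n_doses)]


if __name__ == "__main__":
    print(injection_steps(3, 28))   # three doses, four weeks apart
    print(injection_steps(12, 28))  # repeated exposure: twelve doses, four weeks apart
```

Under these assumptions, the 12-dose schedule ends at step 925, so the total number of simulation steps must be set above that value for the final dose to take effect.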
In this regard, titres of IgM, IgG1, and IgG2 were higher in the secondary and tertiary responses than in the primary response (Fig. 6). Of note, immunogen concentrations decreased after the antibody response (Fig. 6). Notably, repeated exposure with 12 injections (given 4 weeks apart) increased IgG1 levels and stimulated CTC and THC populations (Fig. S1). Taken together, these results suggest that the multi-epitope vaccine could evoke and maximize both effector responses and immunological memory to SARS-CoV-2. To investigate whether the multi-epitope vaccine could be used in different ethnic groups or globally, a population coverage analysis was performed. Remarkably, the multi-epitope vaccine construct showed a high global population coverage of 99.69%. For instance, several countries with many reported COVID-19 cases (>6000) [1] obtained the highest values, including Australia, Brazil, Ecuador, Chile, China, France, Germany, India, Iran, Israel, Italy, Japan, Mexico, Morocco, Peru, the Philippines, Russia, Singapore, South Korea, Spain, Sweden, the USA, and the UK, among others (Fig. 7). Immunoinformatics is a valuable tool with which the limitations in selecting appropriate antigens and immunodominant epitopes may be overcome [40]. Previous in silico reports have shown that the SARS-CoV-2 S glycoprotein contains potential epitopes [8]. Accordingly, researchers have recently attempted to design epitope-based vaccine candidates against SARS-CoV-2 [8]. In the present study, high-potential B and T cell epitopes from the SARS-CoV-2 S glycoprotein were predicted, and the best were selected to design a high-quality multi-epitope vaccine candidate. Remarkably, this vaccine harbours 2 epitopes (E19 in Table 1 and E10 in Table 2) that could evoke immune responses against the SARS-CoV-2 RBM, the region mainly responsible for virus entry into human cells [3], whereas 4 epitopes (E43, E44, E45, and E46 in Table 1) may direct the immune attack against other regions of the SARS-CoV-2 RBD.
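The reported population coverage can be understood with a simplified model. The sketch below assumes Hardy-Weinberg equilibrium and independently inherited HLA loci: an individual is covered at a locus if at least one of their two alleles there is recognized by an epitope, and is covered overall if covered at any locus. This is a simplification in the spirit of the IEDB tool [37], not a reimplementation of it, and the allele frequencies are hypothetical.

```python
from math import prod


def locus_coverage(recognized_allele_freqs):
    """P(at least one of the two alleles at a locus is recognized), assuming HWE."""
    f = sum(recognized_allele_freqs)
    if not 0.0 <= f <= 1.0:
        raise ValueError("summed allele frequencies must lie in [0, 1]")
    return 1.0 - (1.0 - f) ** 2


def population_coverage(freqs_by_locus):
    """P(covered at >= 1 locus), assuming loci are inherited independently."""
    return 1.0 - prod(1.0 - locus_coverage(f) for f in freqs_by_locus.values())


if __name__ == "__main__":
    # Hypothetical frequencies of the epitope-restricting alleles at each locus.
    freqs = {"HLA-A": [0.25, 0.25], "HLA-DRB1": [0.5]}
    print(population_coverage(freqs))  # 0.9375
```

Because each added locus multiplies the uncovered fraction, combining HLA-I and HLA-II epitopes raises coverage quickly, which is consistent with the high global value reported above.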
These results are consistent with in vitro data demonstrating the antigenicity of the SARS-CoV-2 S glycoprotein [41]. The T cell epitopes included in the vaccine sequence meet relevant requirements for a suitable multi-epitope vaccine candidate. First, they showed marked antigenicity and immunogenicity and lacked allergenic or toxic residues. Second, the THC epitopes were predicted as potent inducers of IFN-γ, a crucial cytokine for CTC activation [10]. Third, both CTC and THC epitopes interacted properly with the grooves of HLA-I and HLA-II alleles, respectively, in agreement with other computer-based reports [20], suggesting that the T cell epitopes identified and selected in the present study could be successfully presented by HLA molecules. In addition, most of the peptides arched away from the HLA alleles and are therefore more exposed, which suggests that they could interact more directly with the T-cell receptor, possibly leading to proper T cell activation [10]. The purpose of an adjuvant is to make a vaccine "detectable" for antigen-presenting cells such as dendritic cells [10]. Here, the TLR4 agonist RS09 [25] was included as an adjuvant in the multi-epitope vaccine sequence. The molecular docking simulation showed that the multi-epitope vaccine interacts with this innate immune receptor in a similar way to previous works [42]. Notably, the immunoinformatics simulations in this study show the induction of both innate and adaptive responses to SARS-CoV-2. In this regard, NK cell and macrophage activation, high production of typical antibodies (IgM and IgG) and cytokines (IFN-γ and IL-2), and a proliferative burst of CTC and THC were observed after three injections. The generation and expansion of plasma cells were also documented. Furthermore, B and T cell populations decreased along with immunogen levels.
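The methods state that the Simpson index D was used to interpret the diversity of the simulated immune response. A minimal sketch of one common form of the index follows. It assumes D is the probability that two cells drawn without replacement belong to the same clone, so that a lower D means a more diverse response; which variant C-ImmSim reports is not stated in the text, and the clone counts are hypothetical.

```python
def simpson_index(clone_counts):
    """Simpson's D = sum n_i(n_i - 1) / (N(N - 1)) over clones i."""
    total = sum(clone_counts)
    if total < 2:
        raise ValueError("need at least two cells")
    return sum(n * (n - 1) for n in clone_counts) / (total * (total - 1))


if __name__ == "__main__":
    # Hypothetical clone sizes: a skewed (oligoclonal) vs an even (diverse) repertoire.
    skewed = [97, 1, 1, 1]
    even = [25, 25, 25, 25]
    print(simpson_index(skewed), simpson_index(even))
```

Some tools report 1 - D instead, in which case higher values indicate greater diversity.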
These simulated responses are comparable to those of previous investigations focused on vaccine development against Mycobacterium ulcerans [36] and filarial diseases [43], and they agree, at least partially, with a recent study that demonstrated a positive correlation between robust CD4+ THC responses and anti-SARS-CoV-2 IgG and IgA titres in COVID-19 convalescent patients [44]. These immune responses were directed against the SARS-CoV-2 S glycoprotein [44]. Recently, Kar et al. (2020) [45] reported a similar potential vaccine model whose epitopes were also derived from the S glycoprotein. However, there are important differences to point out. For instance, it is not clear why the authors docked the entire vaccine model with HLA class I and class II alleles, which are known to interact only with relatively short peptides [10] (9-mers for HLA class I molecules and 15-mers for HLA class II molecules), as shown in the present and other similar works focused on SARS-CoV-2 and vaccine development [20]. In addition, their docking simulations between the vaccine model and TLR4 showed that the vaccine does not interact with the region between TLR4 and MD-2 (Kar et al. 2020), which is pivotal for antigen recognition and initiation of the immune response [46]. Importantly, the population coverage of the vaccine molecule reported in the present study is higher compared to the (2020) [47] selected seven SARS-CoV-2 proteins, including the S glycoprotein, to design a vaccine construct. Interestingly, they did not detect the S glycoprotein as antigenic on the VaxiJen website, which contrasts with previous reports in which the S glycoprotein has been shown to induce immune responses [19].
This result may be due to the fact that complex proteins harbour both antigenic and non-antigenic regions, and identifying and selecting the former requires an integrated approach in which different algorithms are used to validate the predicted output [19, 20]. This work has two main limitations. A) The population coverage analysis did not include some countries, particularly from Africa, Central America, Eastern Europe, and Central Asia, mainly because data on their HLA allele frequencies were not available. Nevertheless, the highest population coverage was observed in several of the countries worst hit by COVID-19 (e.g., Brazil, China, France, Italy, Iran, Peru, Spain, and the USA) [1]. B) This study did not explore whether the epitopes used for vaccine design are conserved in other betacoronaviruses. However, former reports have already shown that SARS-CoV-2 shares 79.5% and 50% sequence identity with SARS-CoV and MERS-CoV, respectively [2]. In summary, this research provides a novel multi-epitope vaccine built from high-potential epitopes derived from the SARS-CoV-2 S glycoprotein. This immunoinformatics study suggests that such a multi-epitope vaccine could simultaneously activate robust humoral and cell-mediated responses against SARS-CoV-2, and the population coverage analysis indicates that it could be used globally. However, further rigorous in vitro and in vivo studies, which would take months or even years, are imperative to confirm its immunogenic properties, safety, and efficacy.

[1] World Health Organization. Coronavirus disease (COVID-19) Pandemic.
[2] Genomic characterization and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding.
[3] Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor.
[4] SARS-CoV-2 Vaccines: Status Report.
[5] Therapeutic efficacy of a multi-epitope vaccine against Helicobacter pylori infection in BALB/c mice model.
[6] Development of epitope-based peptide vaccine against novel coronavirus 2019 (SARS-COV-2): Immunoinformatics approach.
[7] AllergenFP: allergenicity prediction by descriptor fingerprints.
[8] In silico approach for predicting toxicity of peptides and proteins. PLoS One.
[9] VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines.
[10] Kuby Immunology.
[11] Immune epitope database analysis resource.
[12] Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding.
[13] A consensus epitope prediction approach identifies the breadth of murine T(CD8+)-cell responses to vaccinia virus.
[14] NetMHCpan, a method for MHC class I binding prediction beyond humans.
[15] A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach.
[16] Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan.
[17] Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method.
[18] Development and validation of a broad scheme for prediction of HLA class II restricted T cell epitopes.
[19] A Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2.
[20] Conserved HLA binding peptides from five non-structural proteins of SARS-CoV-2: An in silico glance.
[21] Designing of interferon-gamma inducing MHC class-II binders.
[22] Prediction of continuous B-cell epitopes in an antigen using recurrent neural network.
[23] A New Ensemble Learning Framework for Improved Linear B-Cell Epitope Prediction.
[24] BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes.
[25] Synthetic Toll like receptor-4 (TLR-4) agonist peptides as a novel class of adjuvants.
[26] Purification of Polyhistidine-Tagged Proteins.
[27] Protein identification and analysis tools in the ExPASy server.
[28] SOLpro: accurate sequence-based prediction of protein solubility.
[29] The PSIPRED Protein Analysis Workbench: 20 years on.
[30] GalaxyWEB server for protein structure prediction and refinement.
[31] ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins.
[32] PROCHECK: a program to check the stereochemical quality of protein structures.
[33] ElliPro: a new structure-based tool for the prediction of antibody epitopes.
[34] Toll-like receptors in antiviral innate immunity.
[35] Computational immunology meets bioinformatics: the use of prediction tools for molecular binding in the simulation of the immune system.
[36] Structural basis and designing of peptide vaccine using PE-PGRS family protein of Mycobacterium ulcerans: An integrated vaccinomics approach.
[37] Predicting population coverage of T-cell epitope-based diagnostics and vaccines.
[38] Role of changes in SARS-CoV-2 spike protein in the interaction with the human ACE2 receptor: An in silico analysis.
[39] Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor.
[40] Moving from Empirical to Rational Vaccine Design in the 'Omics' Era.
[41] Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein.
[42] Novel Immunoinformatics Approaches to Design Multi-epitope Subunit Vaccine for Malaria by Investigating Anopheles Salivary Protein.
[43] In-silico design of a multiepitope vaccine candidate against onchocerciasis and related filarial diseases.
[44] Targets of T cell responses to SARS-CoV-2 coronavirus in humans with COVID-19 disease and unexposed individuals.
[45] A candidate multi-epitope vaccine against SARS-CoV-2.
[46] The structural basis of lipopolysaccharide recognition by the TLR4-MD-2 complex.
[47] Designing of a next generation multiepitope based vaccine (MEV) against SARS-COV-2: Immunoinformatics and in silico approaches.