key: cord-0759454-mnutyc0k authors: Agarwal, Vidhu; Varadwaj, Pritish; Tiwari, Akhilesh title: Designing of epitope-based vaccine from the conserved region of spike glycoprotein of SARS-CoV-2 date: 2020-08-27 journal: bioRxiv DOI: 10.1101/2020.08.27.269456 sha: e5bb4afbde5efe40c463db297e0f759248365b85 doc_id: 759454 cord_uid: mnutyc0k
The emergence of COVID-19 as a pandemic with a high morbidity rate is a serious global concern. There is an urgent need to design a suitable therapy or vaccine that could fight SARS-CoV-2 infection. As the spike glycoprotein of SARS-CoV-2 plays a crucial role in receptor binding and membrane fusion inside the host, it could be a suitable target for designing an epitope-based vaccine. SARS-CoV-2 is an RNA virus and is therefore prone to mutation, so a conserved peptide region of the spike glycoprotein was used for predicting suitable B cell and T cell epitopes. Four T cell epitopes were selected based on stability, antigenicity, allergenicity and toxicity. The MHC-I molecules that interact best with the selected epitopes were then identified from the immune database. Population coverage analysis was also done to check the presence of the identified MHC-I molecules in the human population of the affected countries. The T cell epitope that binds its respective MHC-I molecule with the highest affinity was chosen. Molecular dynamics simulation results support the selection of this epitope. This in silico study predicts a novel T cell epitope from the conserved spike glycoprotein that could act as a target for designing an epitope-based vaccine. B cell epitopes have also been found, but the main work focuses on the T cell epitope, as the immunity it generates is longer lasting than that generated by a B cell epitope.
be present for an effective, safe and stable T cell epitope: antigenicity, non-allergenicity, non-toxicity and stability.
Based on these properties, four T cell epitopes were selected from the 23 T cell epitopes obtained from the immune database. Antigenicity determination establishes whether an epitope is capable of eliciting an immune response inside the host. VaxiJen v2.0 predicts protective antigens and subunit vaccines. Allergenicity determination checks whether an epitope produces any kind of allergic reaction or hypersensitivity. AllerTOP v. 2.0 predicts whether an epitope can be an allergen. ToxinPred predicts the toxicity of the epitope. Here, the Swiss-Prot based support vector machine (SVM) prediction method was used with an E-value cut-off of 10 for the motif-based method, and the SVM threshold was set to 0.0. The ProtParam tool computes physical and chemical parameters, such as the stability of the epitope, for a protein stored in Swiss-Prot or TrEMBL or for a user-supplied sequence.

Population coverage analysis was done to check whether the MHC-I molecules that bind the T cell epitopes (as predicted by the immune database) are present in the human population of the SARS-CoV-2 affected countries. Only if these MHC-I molecules are present in the population can they bind the T cell epitope and elicit an immune response against SARS-CoV-2 in the host. The IEDB analysis tool was used for population coverage analysis.

2.6. 3D structure prediction of T cell epitopes and MHC-I:
As the 3D structures of the T cell epitopes and the MHC-I molecules were not available in the database, they were modeled. The 3D structures of the T cell epitopes and the MHC-I molecules were predicted using PEP-FOLD and SWISS-MODEL, respectively.

Molecular docking of the T cell epitopes was done to check the binding affinities between the chosen T cell epitopes and their corresponding MHC-I molecules, which were derived from the immune databases.
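The four selection criteria described above (antigenicity, non-allergenicity, non-toxicity and stability) amount to a conjunction of filters. The following is a minimal sketch of that logic, not the authors' pipeline. The cut-off of 0.4 for the antigenicity score is an assumption, since the text does not state the value used; the stability cut-off follows the ProtParam convention that an instability index below 40 is predicted stable. The class `EpitopeProfile`, the function names and the example records are hypothetical and carry invented values.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed cut-offs for illustration; the source does not state the values used.
ANTIGENICITY_THRESHOLD = 0.4   # hypothetical antigenicity score cut-off
INSTABILITY_CUTOFF = 40.0      # ProtParam convention: index below 40 is predicted stable


@dataclass(frozen=True)
class EpitopeProfile:
    """Predicted properties of one candidate epitope (hypothetical container)."""
    label: str
    antigenicity_score: float
    is_allergen: bool
    is_toxic: bool
    instability_index: float


def passes_all_filters(profile: EpitopeProfile) -> bool:
    """Return True only if the candidate satisfies all four criteria."""
    return (
        profile.antigenicity_score >= ANTIGENICITY_THRESHOLD
        and not profile.is_allergen
        and not profile.is_toxic
        and profile.instability_index < INSTABILITY_CUTOFF
    )


def select_epitopes(profiles: Iterable[EpitopeProfile]) -> List[str]:
    """Keep the labels of candidates that pass every filter, in input order."""
    return [p.label for p in profiles if passes_all_filters(p)]


# Placeholder records with invented values, not results from the study.
EXAMPLE_PROFILES = [
    EpitopeProfile("candidate_1", 0.95, False, False, 12.3),  # passes all filters
    EpitopeProfile("candidate_2", 0.20, False, False, 10.0),  # fails antigenicity
    EpitopeProfile("candidate_3", 0.80, True, False, 15.0),   # predicted allergen
    EpitopeProfile("candidate_4", 0.70, False, False, 55.0),  # predicted unstable
]
```

Calling `select_epitopes(EXAMPLE_PROFILES)` keeps only the first placeholder candidate. In practice the four fields would be filled from the outputs of the respective prediction servers.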
The AutoDock flexible receptor (ADFR) tool was used for molecular docking to characterize the receptor-ligand interactions and binding affinities. The most probable T cell epitope was selected and subjected to molecular dynamics simulation to assess the deviations and atomic fluctuations when the T cell epitope binds to the MHC-I molecule. GROMACS v 2018.1 was used for the molecular dynamics simulation.

B-cell epitopes interact with B lymphocytes to elicit an immune response 5. The IEDB tool was used to identify B-cell epitopes and their antigenicity using the following methods: Kolaskar and Tongaonkar (KT) antigenicity scale prediction, Emini surface accessibility prediction, Karplus and Schulz (KS) flexibility prediction, BepiPred linear epitope prediction and Chou and Fasman (CF) beta-turn prediction.

Ten sequences of the S glycoprotein of SARS-CoV-2, with PDB IDs 6X2A, 6X2B, 6X2C, 6X29, 6YM0, 6YLA, 6VXX, 6VSB, 6WPT and 6WPS, were retrieved from the RCSB PDB. Multiple sequence alignment (MSA) identified the conserved region of the S glycoprotein shown in Figure 2, which starts at proline 289 and ends at asparagine 461. The conserved S glycoprotein region is 172 amino acids long, and this peptide sequence was used in further analysis.

From the conserved S glycoprotein region, potential T cell epitopes were predicted with the NetCTL server under preselected settings. 23 T cell epitopes were found and used for further analysis, as shown in Table 1.

The proteasome complex cleaves peptide bonds and converts proteins into small peptides. These small peptides associate with class I MHC molecules and are then presented to cytotoxic (CD8+) T cells. MHC-I binding and processing predictions were made with the IEDB tool, which generates proteasomal cleavage, TAP transport, MHC-I binding and processing scores. The overall score predicts the peptide's potential to be a T-cell epitope, as shown in Table 2.
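To make the relationship between the individual scores and the overall score concrete, here is a minimal sketch that assumes an additive, log-scale combination: the processing score is taken as the sum of the proteasomal cleavage and TAP transport scores, the MHC score as the negative base-10 logarithm of the predicted IC50 in nM, and the total score as their sum. The text does not spell out the exact scaling used by the IEDB tool, so this scheme and the function names are assumptions, and the candidate values in the usage note are invented.

```python
import math
from typing import Dict, List, Tuple


def mhc_score(ic50_nm: float) -> float:
    """Assumed MHC score: -log10 of the predicted IC50 (nM); stronger binders score higher."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm)


def processing_score(proteasome: float, tap: float) -> float:
    """Assumed processing score: proteasomal cleavage score plus TAP transport score."""
    return proteasome + tap


def total_score(proteasome: float, tap: float, ic50_nm: float) -> float:
    """Assumed overall score: processing score plus MHC score."""
    return processing_score(proteasome, tap) + mhc_score(ic50_nm)


def rank_candidates(
    candidates: Dict[str, Tuple[float, float, float]]
) -> List[Tuple[str, float]]:
    """Rank candidates, given as (proteasome, tap, ic50_nm), by descending total score."""
    scored = [(name, total_score(*values)) for name, values in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under this assumption an IC50 of 131.7 nM corresponds to an MHC score of about -2.12, and a call such as `rank_candidates({"pep_a": (1.0, 0.5, 100.0), "pep_b": (1.0, 0.5, 1000.0)})` places the stronger binder `pep_a` first.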
MHC-I binding predictions yielded a range of MHC-I alleles that interact with the selected T cell epitopes. The MHC-I alleles with the highest binding affinity were chosen for further analysis.

For an epitope to be effective and safe for the host, it must be antigenic, non-allergenic, non-toxic and stable. On the basis of these parameters, four epitopes were selected for further analysis (LDSKVGGNY, SKVGGNYNY, NDLCFTNVY and GQTGKIADY), as shown in Table 3.

The population coverage of the predicted epitopes is given in Table 4. A T cell recognizes a complex of an MHC molecule and a pathogen-derived epitope. The particular MHC molecule therefore needs to be present in the individual so that it can bind the pathogen-derived epitope and elicit an immune response. This is known as MHC restriction of the T-cell response. MHC molecules are polymorphic, and different human leukocyte antigen (HLA) alleles are present in the human population. If the selected peptides bind with high affinity to an HLA allele that is present in the target population, the epitope-based vaccine could be more effective. Care must therefore be taken that the vaccine is not ethnically biased. To address this, IEDB population coverage analysis calculates the fraction of individuals that carry the predicted MHC alleles.

3.6. 3D structure prediction of T cell epitopes:
The 3D structures of the T cell epitopes were modeled using a de novo approach, which predicts the 3D structure of a peptide from its linear amino acid sequence. The MHC-I molecules were modeled by homology modeling using the SWISS-MODEL web server, which is fully automated and freely available.

The ADFR tool was used to find the binding affinities between each T-cell epitope and the MHC-I molecule having a lower IC50 value.
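The population coverage reasoning in this section, the fraction of individuals carrying at least one MHC allele that binds the epitope, can be sketched with a simplified model. The sketch below assumes Hardy-Weinberg proportions within each HLA locus and independence between loci. It is an illustration of the idea and not a reimplementation of the IEDB tool; the function names and the allele frequencies used in the usage note are hypothetical.

```python
from typing import Mapping, Sequence


def locus_coverage(allele_frequencies: Sequence[float]) -> float:
    """Fraction of individuals carrying at least one covered allele at a single locus.

    Assumes Hardy-Weinberg proportions: an individual has two allele copies, so the
    chance of carrying no covered allele is (1 - f) ** 2, where f is the summed
    frequency of the covered alleles.
    """
    covered = sum(allele_frequencies)
    if not 0.0 <= covered <= 1.0:
        raise ValueError("summed allele frequency must lie between 0 and 1")
    return 1.0 - (1.0 - covered) ** 2


def population_coverage(loci: Mapping[str, Sequence[float]]) -> float:
    """Fraction of individuals covered by at least one allele at any locus.

    Assumes the loci are independent, so the per-locus uncovered fractions multiply.
    """
    uncovered = 1.0
    for frequencies in loci.values():
        uncovered *= 1.0 - locus_coverage(frequencies)
    return 1.0 - uncovered
```

For example, with hypothetical frequencies of 0.1 for one covered HLA-A allele and 0.2 for one covered HLA-C allele, `population_coverage({"HLA-A": [0.1], "HLA-C": [0.2]})` gives roughly 0.48, that is, about 48% of individuals would carry at least one of the two alleles under these assumptions.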
The T-cell epitope GQTGKIADY binds HLA-C*03:03 most strongly, with the highest binding affinity, as described in Table 5 and Figure 3. MD simulations were run for the modeled complex of the T-cell epitope GQTGKIADY and HLA-C*03:03 to understand the stability and dynamics of the complex over 50 ns in GROMACS. The system was solvated in a TIP3P solvation box using the CHARMM36 all-atom force field. The root mean square deviation (RMSD) was calculated as the average distance between the backbone C-alpha (Cα) atoms of the superimposed frames, as shown in Figure 4a. An initial change can be observed between 0 and 10 ns, and another between 30 and 40 ns, after which the system becomes quite stable. Next, the root mean square fluctuation (RMSF) was calculated from the system trajectories, as shown in Figure 4b. RMSF measures the average mobility of each residue of the complex about its mean position. Only minor fluctuations are observed, which means that the complex is stable, except between 30 and 40 ns. As Figures 4a and 4b show, the complex of the T-cell epitope GQTGKIADY and HLA-C*03:03 is stable after 40 ns.

Different analysis methods from the IEDB tool were used to identify B-cell epitopes. This tool uses amino acid scale-based methods. Antigenicity was determined on the basis of the physicochemical properties of the amino acids. The average antigenic propensity of the conserved S glycoprotein of SARS-CoV-2 was 1.043, with a maximum of 1.214 and a minimum of 0.907. The antigenic determination threshold was 1.043 (values >1.00 are potential antigenic determinants). Six epitopes satisfy the threshold value and so have the capacity to elicit a B-cell response. The results are summarized in Figure 5 and Table 6.

A potential B-cell epitope must be surface accessible. Therefore, this method is used to predict the surface accessibility of the peptide.
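The scale-based B-cell methods in this section share one scheme: each residue receives a value from an amino acid scale, the values are averaged over a sliding window, and positions whose window score exceeds a threshold are reported as candidate regions. The following is a minimal sketch of that scheme, assuming an odd window size and a threshold equal to the mean window score, in line with the thresholds quoted in this section that equal the average. The function names are hypothetical, and any scale passed in must be supplied by the user; the two-letter scale in the usage note is a toy example, not a published scale.

```python
from typing import Dict, List, Mapping, Optional, Tuple


def window_scores(
    sequence: str, scale: Mapping[str, float], window: int = 7
) -> Dict[int, float]:
    """Average the per-residue scale values over a centred sliding window.

    Returns a mapping from the 0-based centre position to the window average.
    Positions too close to either end to hold a full window are skipped.
    """
    if window < 1 or window % 2 == 0 or window > len(sequence):
        raise ValueError("window must be odd, positive and no longer than the sequence")
    values = [scale[residue] for residue in sequence]
    half = window // 2
    return {
        centre: sum(values[centre - half : centre + half + 1]) / window
        for centre in range(half, len(sequence) - half)
    }


def regions_above_threshold(
    scores: Mapping[int, float], threshold: Optional[float] = None
) -> List[Tuple[int, int]]:
    """Group consecutive positions scoring above the threshold into (start, end) runs.

    If no threshold is given, the mean of all window scores is used (an assumption).
    Both ends of each run are 0-based and inclusive.
    """
    if not scores:
        return []
    if threshold is None:
        threshold = sum(scores.values()) / len(scores)
    regions: List[Tuple[int, int]] = []
    start = previous = None
    for position in sorted(scores):
        if scores[position] > threshold:
            if start is None:
                start = position
            previous = position
        elif start is not None:
            regions.append((start, previous))
            start = None
    if start is not None:
        regions.append((start, previous))
    return regions
```

With the toy scale `{"A": 0.9, "K": 1.2}`, the sequence `"AAAAKKKKAAAA"` and a window of 3, the only region above the mean window score spans positions 4 to 7, which is the lysine stretch.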
The average value of peptide antigenic propensity was 1.00, with maximum and minimum values of 4.805 and 0.073, respectively. The antigenic determination threshold was 1.00. The region between amino acid residues 90 and 100 was found to be more accessible in the conserved S glycoprotein of SARS-CoV-2, as shown in Figure 6 and Table 7.

Experimentally, antigenicity is correlated with peptide flexibility 6. Therefore, this method was used to investigate the flexibility of the peptide. The average value of peptide antigenic propensity was 0.989, with maximum and minimum values of 1.112 and 0.896, respectively. The threshold value of antigenic determination was 0.989. The region from amino acid 80 to 88 was found to be the most flexible, as shown in Figure 7.

This method uses a hidden Markov model (HMM) and is considered the best method for linear B-cell epitope prediction. The average antigenic propensity of the peptide was 0.075, with a maximum value of 1.896 and a minimum value of 0.021. The threshold value of antigenic determination was 0.350. The peptide sequence from residue 165 to 178 is capable of inducing the desired B-cell immune response. The result is shown in Figure 8 and Table 8.

Beta turns are often hydrophilic and accessible. These two properties are characteristic of the antigenic regions of a protein 7.

Most vaccine development is based on B-cell immunity. However, CD8+ T cells generate a stronger immune response than B-cell immunity 26. Because of antigenic drift, antibody memory fails over time, whereas the T-cell immune response is long-lasting. Owing to advances in computational biology and sequence-based technology, large databases are available that could be used in the treatment of such an infection. Therefore, an effort has been made in this paper to find T-cell and B-cell epitopes for designing an epitope-based vaccine.
Although the paper proposes a novel T-cell epitope, some B-cell epitopes are also reported. This study is in silico, and the data were extracted from various immune databases, but studies of this type have previously been validated with wet-lab results 27. The proposed B-cell and T-cell epitopes could therefore also be effective in eliciting an immune response and clearing the infection caused by SARS-CoV-2. This study can help in designing an epitope-based vaccine while saving considerable wet-lab effort.

A novel T-cell epitope, GQTGKIADY, has been proposed because it has all the required features: antigenicity, non-allergenicity, non-toxicity and stability. Further, the GQTGKIADY epitope binds HLA-C*03:03 very well, as predicted from the molecular docking and MD simulation results. Population coverage analysis supports the presence of HLA-C*03:03 in the human populations of the most affected countries, such as India, China, Italy, the United States, the United Kingdom and Russia. The proposed T-cell epitope can therefore be a very good candidate for eliciting an immune response in the most affected populations. Some B-cell epitopes have also been found that could be effective in producing antibodies inside the host. This work focuses mainly on the T-cell epitopes, as a T-cell based immune response could be longer lasting than a B-cell response in fighting the infection caused by SARS-CoV-2.
Coronavirus envelope protein: current knowledge
Cryo-EM structure of the SARS coronavirus spike glycoprotein in complex with its host cell receptor ACE2
The Economic Value of Vaccination: Why Prevention Is Wealth
Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding
Epitope recognition by diverse antibodies suggests conformational convergence in an antibody response
Antigenic determinants in proteins coincide with surface regions accessible to large probes (antibody domains)
Turns in peptides and proteins
Developing countries face double burden of disease
Genome-derived vaccines
Immunoinformatics comes of age
Emerging and re-emerging infectious diseases: influenza as a prototype of the host-pathogen balancing act
More than one reason to rethink the use of peptides in vaccine design
Epitope-Based Vaccine Designing of Nocardia asteroides Targeting the Virulence Factor Mce-Family Protein by Immunoinformatics Approach
In silico vaccine design based on molecular simulations of rhinovirus chimaeras presenting HIV-1 gp41 epitopes
A computational approach for identification of epitopes in dengue virus envelope protein: a step towards designing a universal dengue vaccine targeting endemic regions
A computational assay to design an epitope-based peptide vaccine against chikungunya virus
A computational assay to design an epitope-based Peptide vaccine against Saint Louis encephalitis virus
Inferring the rate and timescale of dengue virus evolution
Evolution of hypervariable region 1 of hepatitis C virus in primary infection
Immune selection and genetic sequence variation in core and envelope regions of hepatitis C virus
Intranasal vaccination with recombinant receptor-binding domain of MERS-CoV spike protein induces much stronger local mucosal immune responses than subcutaneous immunization: Implication for designing novel mucosal MERS vaccines
A DNA vaccine induces SARS coronavirus neutralization and protective immunity in mice
Evaluation of serologic and antigenic relationships between middle eastern respiratory coronavirus and other coronaviruses to develop vaccine platforms for the rapid response to emerging coronaviruses
The spike protein of SARS-CoV - a target for vaccine and therapeutic development
Middle East Respiratory Syndrome Vaccine Candidates: Cautious Optimism
Role of CD8+ T cells in control of West Nile virus infection
In silico predicted mycobacterial epitope elicits in vitro T-cell responses

We acknowledge the support of IIIT Allahabad for providing the facilities and infrastructure required for the completion of this work. The authors declare no funding and no conflict of interest.

Table 2: 23 potential T cell epitopes along with their interacting MHC-I alleles, proteasomal cleavage score, TAP transport score, MHC score, processing score and the overall total score, which predicts the peptide's intrinsic potential to be a T cell epitope that could elicit an immunogenic response in the host.