key: cord-0683630-6o1gqe8v authors: Sanami, Samira; Zandi, Milad; Pourhossein, Behzad; Mobini, Gholam-Reza; Safaei, Mohsen; Abed, Atena; Arvejeh, Pooria Mohammadi; Chermahini, Fatemeh Amini; Alizadeh, Morteza title: Design of a Multi-epitope Vaccine against SARS-CoV-2 using Immunoinformatics approach date: 2020-07-15 journal: Int J Biol Macromol DOI: 10.1016/j.ijbiomac.2020.07.117 sha: add2d6d531398dd43adae803b328adb6beb306d9 doc_id: 683630 cord_uid: 6o1gqe8v Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused COVID-19 disease in China. So far, no vaccine has licensed to protect against infection with COVID-19, therefore an effective COVID-19 vaccine needed. The aim of this study was to predict antigenic peptides of SARS-CoV-2 for designing the COVID-19 vaccine using immunoinformatic analysis. In this study, T and B-cell epitopes of S protein were predicted and screened based on the antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. The epitopes were joined by the appropriate linker. LT-IIc as an adjuvant was attached to the end of the structure. The secondary and 3D structure of the vaccine was predicted. The refinement process was performed to improve the quality of the 3D model structure; the validation process is performed using the Ramachandran plot and ProSA z-score. The proposed vaccine's binding affinity to the HLA-A11: 01 and HLA-DRB1_01: 01 molecule was evaluated by molecular docking. Using molecular dynamics, the stability of vaccine-HLA complexes was also evaluated. Finally, in silico gene cloning was performed in the pET30a (+) vector. The findings suggest that the current vaccine may be a promising vaccine to prevent SARS-CoV-2 infection. In December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused COVID-19 disease in China, which was associated with a cluster of respiratory infections [1] . Infection with SARS-CoV-2 has spread to various countries and created a significant threat to public health. Following the outbreak of SARS-CoV in 2002 and 2003 and MERS-CoV outbreak in 2012, the SARS-CoV-2 outbreak is a recurrence of the previous occurrence with new pressure, and the bat is highly probable to be the origin of all types of CoV [2] . Researchers have confirmed the transmission of this virus from human to human, which has led to a further outbreak of the disease [3] . Coronaviruses are classified into four genera: Alpha, Beta, Gamma, and Delta Coronavirus. Alpha and Bata coronaviruses infect bats and human [4, 5] , SARS-CoV-2 is a member of β coronavirus genera [6] . SARS-CoV-2 is a single strand, positive-sense RNA virus with a genome of around 30 Kb in length [7] . In all coronaviruses, the structural proteins comprise spike (S), envelope (E), membrane protein (M), and nucleoprotein (N) [8, 9] . SARS-CoV-2 uses the S protein as the key to binding and entering host cells therefore understanding its structure and function is important. The S protein is a type I transmembrane protein and highly glycosylated. Spike glycoproteins assemble into trimers on the viral particle surface. S glycoprotein has three domains including ectodomain, transmembrane, and endodomain. Ectodomain subdivided into S1 and S2 subunits, S1 is important for binding to host cell receptors, the amino acid sequences of S1 are more variable than S2 [10] . S protein is an attractive target for vaccine development for the following reasons: S protein is located at the virus surface and uses angiotensin-converting enzyme 2 receptor in respiratory epithelial cells to enter the host cells [11, 12] , it also can induce neutralizing antibodies [13] . The development and production of a vaccine are costly and will take many years to achieve this goal, the various strategies were used to reduce the time and costs of this process [14] . Reverse vaccinology or vaccinomics is a new method that combines immunogenomics and bioinformatics to develop novel vaccines [15] . This method has many benefits over conventional vaccinology, it decreases the vaccine development time and costs and it also makes possible to study non-cultivable or risky microorganisms [16] . Scientists are working very hard to combat this virus, but still, there is no drug or vaccine available to protect against infection with COVID-19 [17] . Thus, in this regard, we decided to design a new multiepitope vaccine against SARS-CoV-2 infection using reverse vaccinology approaches. In this research, first, the CTL, HTL, and B-cell epitopes of the S protein were predicted using ProPred-1, ProPred, and ABCPred servers, respectively, and then were selected base on antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. The selected epitopes were joined together using the appropriate linker to construct the main core of the multi-epitope vaccine. Since the multi-epitope vaccine has poor antigenicity, LT-IIc as an adjuvant to overcome this weakness was attached to the end of the structure. LT-IIc is a type II subfamily of LTBs consisting of 121 amino acids. Moreover, when coadministered with a multi-epitope vaccine it shows an increase in CD8+ T cell immune responses [18] . Next, the physicochemical properties of the construct were investigated, the 3D structure of the protein was predicted, and finally, its affinity to the MHC I and II molecules was investigated through docking, following that, was performed the molecular dynamics (MD) simulation of docking complexes. Finally, the vaccine coding gene expression in Escherichia coli (E. coli) was evaluated, and in silico gene cloning was performed in the pET30a (+) vector. The amino acid sequences of SARA-COV-2 S protein (QII57328.1) and LT-IIc (H6W8F2) were obtained in FASTA format from NCBI (https://www.ncbi.nlm.nih.gov/) and Uniprot (https://www.uniprot.org/) databases, respectively. The binding epitopes to MHC I alleles were predicted using the ProPred-1 server (http://crdd.osdd.net/raghava/propred1/). ProPred-1 server is based on a method of predicting epitopes that bind to 47 different alleles of MHC I [19] . To predict HTL epitopes was used the ProPred server (http://crdd.osdd.net/raghava/propred/) which uses quantitative matrices to predict T-cell epitopes binding to 51 different alleles of MHC II [20] . In this study, peptide fragments with default parameters for MHC I and II alleles with the highest coverage of Iran's population were predicted. Gamma interferon (IFN-γ) is a cytokine that plays a key role in both innate and adaptive immune response; IFNepitope server (https://webs.iiitd.edu.in/raghava/ifnepitope/predict.php) determined the IFN-γ-inducing HTL epitopes. ABCpred server (http://crdd.osdd.net/raghava/abcpred/) using an artificial neural network predicted the B-cell epitopes. This is the first server developed using fixed-length patterns based on a recurrent neural network. The server predicts peptides 10-20 mer in length and the threshold is among 0.1 to 1. In this study, the server was predicted peptide fragments of 16 mer length, and the threshold was set at 0.89 [21] . Antigenicity, toxicity, and allergenicity were evaluated for predicted epitopes. The antigenicity of the epitopes were calculated by VaxiJen v2.0 server (http://www.ddgpharmfac.net/vaxijen/VaxiJen/VaxiJen.html), which is based on the transformation of the protein sequences auto cross-covariance (ACC) into uniform vectors of main amino acid properties. The performance accuracy of the VaxiJen server varied from 70 to 89%, based on the selected model organism. In this study, the VaxiJen v2.0 server determined the epitopes antigenicity based on a threshold of 0.4 and a virus model [22] [23] [24] . The epitopes toxicity were determined using the ToxinPred server (http://crdd.osdd.net/raghava/toxinpred/). This server predicts of toxic or non-toxic peptides from a wide variety of peptides, along with their important physical and chemical properties such as hydrophobicity, hydropathicity, amphipathicity, molecular weight, and pI charge [25] . The AllerTOP v. 2.0 Server (https://www.ddg-pharmfac.net/AllerTOP/index.html) predicted epitopic allergenicity. The approach used in this server is based on the automatic cross-covariance (ACC) transformation of protein sequences into uniform vectors of equal length [26] . The PIR peptide matching program (https://research.bioinformatics.udel.edu/peptidematch/index.jsp) was used to investigate the presence or absence of similarity with the human proteome in predicted epitope sequences [27] . The selected T and B-cell epitopes from the previous step were placed in the vaccine structure. KK and GPGPG linkers joined the screened T and B-cell epitopes in a structure of multi-epitope vaccine, respectively. The LT-IIc amino acid sequence was attached to the merged epitopes at the N-terminus using the EAAAK linker to boost the immunogenicity of the vaccine. An effective vaccine should not only cause strong immune responses but should also have appropriate physicochemical properties during the J o u r n a l P r e -p r o o f development cycle. Protparam server (https://web.expasy.org/protparam/) calculated the amino acid content and some of the designed vaccine's physicochemical properties such as molecular weight, theoretical pI, half-life in the mammalian reticulocytes, instability index, aliphatic index, and grand average of hydropathy (GRAVY) [28] . A standard method for determining the molecular weight of a protein, given the number of amino acids that make it up, is to multiply that number by the average weight of an amino acid, which is 110Da [29] . The isoelectric point is the pH at which amino acid or protein loses its charge and is, therefore, unable to move in a direct current electric field. Knowing the theoretical pI of a protein can be very useful for selecting and optimizing the methods used for protein purification, including ion-exchange chromatography and isoelectric focusing electrophoresis [30] . The half-life is the time it takes for half of the protein to disappear inside the cell after synthesis [31] . The instability index of proteins indicates their stability in the test tube so that proteins with an instability index of less than 40 are predicted as stable proteins [32] . The aliphatic index of a protein is the relative volume of protein occupied by aliphatic side chains, this factor indicates the stability of the protein against heat [33] . The GRAVY is computed by dividing the sum of the hydropathic values of all amino acids by the number of residues in the sequence [34] . The GRAVY demonstrates the amphipathic nature of the proteins, the negative and positive values suggest the hydrophilic and hydrophobic nature of the designed vaccine, respectively [35] . SOLpro server (http://scratch.proteomics.ics.uci.edu/) predicted the protein tends to be soluble when overexpression in E. coli [36] . ANTIGENpro server (http://scratch.proteomics.ics.uci.edu/) was used to estimate the antigenicity of the vaccine, the accuracy of the server was determined to be 76 % based on cross-validation experiments using the combined dataset [37] . Allergenicity and toxicity of the vaccine were determined by AllerTOP v. 2.0 and ToxinPred servers, respectively. Prabi server (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_gor4.html) used GOR4 method to predict the secondary structure of the engineered vaccine. GOR IV with a mean accuracy of 64.4% uses all possible pair frequencies within a 17 amino acid residues window [38] . 3Dpro server (http://scratch.proteomics.ics.uci.edu/) predicted the tertiary structure of the multiepitope vaccine [36] . 3Drefine server (http://sysbio.rnet.missouri.edu/3Drefine/) improved the quality of the 3D model from the previous stage. 3Drefine refining protocol includes a two-step process as follows: (i) Optimization of the hydrogen bonding network and (ii) minimization of atomic energy by integrating physics with knowledge-based force fields [39] . Model validation is an important step to detect possible errors in crude structures and to compare the quality of models before and after the refinement process [40] , for this purpose, RAMPAGE (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) and ProSA (https://prosa.services.came.sbg.ac.at/prosa.php) servers were used in this study. Ramachandran plot was generated using the RAMPAGE server. The most important application of the Ramachandran plot is to predict the quality of different protein structures regarding experimental methods (X-ray crystallography, NMR, and Cryo-EM). Acceptable and nonacceptable quality protein structures consisted of a collection of torsional angles in the allowed and disallowed regions, respectively [41] . The ProSA J o u r n a l P r e -p r o o f server evaluated the overall quality of the protein models in the form of a z-score. When the predicted model's z-scores are outside the normal range of native proteins, it indicates an incorrect structure [42] . ClusPro 2.0 server (https://cluspro.org) performed molecular docking between vaccine candidate and two of the most common binding alleles in the population of Iran including HLA-A11:01 with PDB entry of 1X7Q, and HLA-DRB1_01:01 with PDB entry of 1AQD. ClusPro can discriminate the hypothetical usercreated structures and run any server-compatible docking algorithms. Additional functions allow the users to modify the structure, determine the attraction and repulsion, or identify pairwise distance restrictions [43] [44] [45] . MD simulation was performed using Gromacs 5.1.5 (GROningen MAchine for Chemical Simulations) software to investigate the behavior of vaccine, HLA-A11:01, and HLA-DRB1_01:01 molecules in docking structure [46] . The input structures were prepared using the amber ff99SB force field. The correct hydrogen status of histidines was established for all the histidines presented in the structure, and the disulfide bonds were defined for the protein. Then, the surface charge of the structure was neutralized by adding several sodium and chlorine ions. The protein was inserted into a layer of TIP3P water molecules at 10 angstroms using gmx solvate software. Energy minimization was performed on structures with the1000-step by steepest descent method to eliminate Van der Waals interactions and to form the hydrogen bonds between water and complex molecules. In the next step, the system temperature gradually increased from 0 up to 300 K for 200 ps in constant volume and was then equilibrated in constant pressure at 1000 ps. MD simulation was performed at 27°C (300 K) for 40 ns. Non-bonding interactions with a distance of 10 angstroms were calculated using the PME method. To increase the computational speed, the SHAKE algorithm was used to limit the bonds involved in the hydrogen atom. Backtranslation of the amino acid sequence of the multi-epitope vaccine was conducted using the Gene infinity server (http://www.geneinfinity.org/sms/sms_backtranslation.html). The Codon Adaptation Index (CAI) and GC content of optimized DNA were assessed by the GenScript server (https://www.genscript.com/tools/rare-codon-analysis). Finally, the multi-epitope vaccine sequence was inserted into the pET-30a (+) vector by the SnapGene tool. Using the ProPred-1 server, which was set to default parameters, were predicted 15 CTL epitopes (9 mer) for human MHC-I alleles including HLA-A*0201, HLA-A*1101, HLA-B*3501, HLA-B*5101 and HLA-B*5301 ( Table 1 ). The ProPred server predicted 657 HTL epitopes (9 mer) for the alleles of MHC-II, namely, DRB1*01:01, DRB1*03:01, DRB1*07:01, DRB1*11:01, DRB1*13:01 and DRB1*15:01. Next, 52 epitopes capable of binding to at least four types of MHC alleles were assessed using the IFNepitope server for their ability to induce IFN-γ, the result of this step suggested that 23 HTL epitopes were capable of inducing IFN-γ (Table 2 ). Using the default parameters of the ABCPred server were predicted J o u r n a l P r e -p r o o f 15 linear B-cell epitopes ( Table 3) . The predicted CTL, HTL, and B-cell epitopes were evaluated for overlapping, antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. It is noteworthy that among the overlapped epitopes were selected epitopes that had a higher antigenicity and improved quality of the second and three-dimensional structure of the vaccine by being in the chimeric structure. Finally, 4, 8, and 6 epitopes were selected from CTL, HTL, and B-cell epitopes, respectively. The final vaccine structure consists of four domains: LT-IIc, Linear B-cell, CTL, and HTL epitopes, which were placed in the vaccine construct by different linkers (Fig. 1) . The Protparam results revealed that the multi-epitope vaccine comprised of 380 amino acids. The vaccine's molecular weight was 41.65 kDa and its theoretical pI value was 9.96. The half-life of the vaccine was 30 h in mammalian reticulocytes, more than 20 h in yeast, and more than 10 h in E. coli. The instability index, aliphatic index, and GRAVY were predicted 22.73, 91.08, and -0.035, respectively. The SOLpro server result showed the vaccine to be soluble. The antigenicity of the final sequence (including the adjuvant ) and the main sequence (without adjuvant) of vaccine were predicted using the ANTIGENpro server to be 0.584792 and 0.512677, respectively. The vaccine sequence was predicted to be non-allergenic and non-toxic using AllerTOP v. 2.0 and ToxinPred servers, respectively. The Prabi server predicted that 24.47, 25.26 and 50.26% of the total 380 amino acids of vaccine construct were alpha helix, extended strand, and random coil, respectively. Vaccine primary 3D model was predicted using the 3Dpro server. The 3Drefine server performed the process of refining the protein model. Five parameters, including 3Drefine, GDT-TS, GDT-HA, RMSD, RWplus, and MolProbity, ranked the optimized models. The higher GDT-TS, RMSD and GDT-HA scores, and the low scores of 3Drefine, MolProbity, and RWplus showed a higher quality model. Based on these factors, refined model No. 5 was selected, which is indicated in the first row of Table 4 , the raw 3D structure and refined model of the vaccine are depicted in Fig. 2A and b . The validation process was performed using the Ramachandran plot and the ProSA z-score. Ramachandran plot result revealed that in the primary model, 93.4%, 3.2%, and 3.4% of residues were located in favored, allowed, and outlier regions, respectively (Fig. 3A) , while in the refined model, 93.1%, 5%, and 1.9% of residues were found J o u r n a l P r e -p r o o f in favored, allowed, and outlier regions, respectively (Fig. 3B) . Z-Score of the initial model was calculated -0.82 (Fig. 4A) , while the Z-Score of the refined model was estimated -1.08 (Fig. 4B ). The interaction of the refined model of vaccine with HLA-I and HLA-II molecules was investigated using the ClusPro 2.0 server. Proteins with PDB entry of 1X7Q and 1AQD were identified separately as receptor and vaccine as a ligand to do molecular docking. The server provided 30 clusters from each complex and ranked the clusters based on the amount of energy. Cluster No. 0.00 in the results of the interaction between the HLA-A11:01 and HLA-DRB1_01:01 molecules with the multi-epitope vaccine showed the lowest energy weighted score of -1165.4 and-1279.1 Kcal/mol, respectively ( Fig. 5A and 6A ). The PDBsum server was used to obtain accurate information on the interaction between residues in docking complexes. A total of 41 vaccine residues coupled with 39 residues of chain A from HLA-A11:01 molecule. A number of 18 hydrogen bonds were formed between the residues of the chain A from the HLA-A11: 01 molecule and the residues of the vaccine, which included Pro105-His113, Glu53-Lys250, Glu53-His211, Asn174-Gln183, Glu166-Tyr109, Arg169-Tyr109, Lys146-Tyr361, Asp39-Lys259, Glu58-Lys27, Lys176-Gly376, Lys68-Asn112, Lys68-Ala379, Lys68-Leu377, Arg163-Ser18, Asp106-Asn98, Arg75-Ile367, Arg75-Ala368 and Thr80-Trp360 (Fig. 5B ). In the vaccine-HLA-DRB1_01:01 complex 15 and 23 residues of the vaccine associated with 14 residues of the chain A and 24 residues of the chain B from the HLA-DRB1_01: 01 molecule, respectively. Hydrogen bond of Thr162-Ser53 was established between the vaccine and chain A from HLA-DRB1 01:01 molecule and 14 hydrogen bonds, including Glu176-Lys4, Glu128-Lys4, Gln92-Ala34, Gly84-Arg149, His81-Arg149, Glu87-Arg149, Glu87-Lys39, Thr21-Lys39, Glu22-Gln42, Asn19-Val41, Asn82-Gly145, Asn82-Thr141, Thr77-Thr141and Gln70-Arg137 were formed between the chain B from HLA-DRB1 01:01 molecule and vaccine (Fig. 6B ). The best molecular docking complexes were used as inputs to MD simulation. At this stage, MD simulation of both complexes was performed for 40 ns and the results were evaluated as root mean square deviation (RMSD), Root Mean Square Fluctuation (RMSF), and radius of gyration (Rg). RMSD between the created structures during MD simulation is an appropriate parameter to ensure the stability of the vaccine-HLA complexes. RMSD of the HLA-A11:01 molecule in the vaccine -HLA-A11:01 complex reached 0.2 nm after approximately 1000 ps and showed a slight variation around this amount until the end of the MD simulation. The RMSD of the vaccine showed a significant increase from the beginning of the MD simulation until 3500 ps, after that, there was a downward trend up to 5,000, from this point to until 36000 ps the trend was approximately increasing (Fig. 7A) . The profile of RMSD changes of HLA-DRB1_01: 01 molecule in the vaccine -HLA-DRB1_01:01 complex was similar to HLA-A11: 01, while the rate of RMSD changes in the vaccine was relatively different. At the beginning of the MD simulation, the value of RMSD showed a sharp increase until 20000 ps which reached 2 nm, of this point until the end of the MD simulation was observed no significant changes. (Fig. 7B) . RMSF determined the flexibility of any residue in the vaccine-HLA complexes. RMSF result showed that the RMSF of most amino acids in the chain A of the HLA-A11: 01 molecule was high and the maximum and minimum RMSF values were 0.17 and 0.05, respectively (Fig. 8A) . Two regions on the chain B of the molecule showed relatively high fluctuations, which included regions 12-21 and 70-78. In the range of J o u r n a l P r e -p r o o f 12-21, the RMSF reached 0.23 nm (Fig. 8B) . The study of vaccine structure in this complex showed high RMSF in most of its amino acids. The two regions of the vaccine showed very high levels of RMSF of 0.57 and 0.5 nm, which included amino acids in regions 65-80 and 300-320, respectively (Fig. 8C ). Most parts in chain A of HLA-DRB1_01: 01 molecule in the vaccine -HLA-DRB1_01: 01 complex including amino acids in areas 32-38, 48-51, 92-100, 125-130, 154-157 and 167-171 showed high fluctuations (Fig. 9A) . In chain B, the lowest and highest value of RMSF was 0.05 and 0.15 nm, respectively (Fig. 9B) . The vaccine showed very high fluctuations in most areas and had a maximum RMSF of 0.55 nm (Fig. 9C) . Rg for the HLA-A11: 01 molecule in the vaccine -HLA-A11:01 complex was 2.35 nm at the beginning of the simulation and showed no change from the beginning of the simulation to the end (Fig. 10A) . While Rg of the HLA-DRB1_01: 01 molecule in the vaccine -HLA-DRB1_01: 01 complex was 2.4 nm at the beginning of the MD simulation and had a slight increase up to 1000 ps, then remained constant until the end of the MD simulation (Fig. 10B) . In the presence of the HLA-A11: 01 molecule, the Rg value of the vaccine was 3.7 at the beginning of the simulation, and during the simulation period, a slight increase and decrease were observed in this parameter, and at the end of the simulation time, the Rg value reached 3.5 (Fig. 10A) . The Rg level of the vaccine in the presence of the HLA-DRB1_01: 01 molecule showed a relatively significant decrease from the beginning of the simulation to 17500 ps then remained constant from that point until the end of the simulation period (Fig. 10B ). The Gene infinity server conducted backtranslation of the vaccine construct. GenScript Tool was used to evaluating the key properties of the gene sequence including the Codon Adaptation Index (CAI) and GC content. The optimized nucleotide sequence CAI value was 1 and the GC content of the optimized sequence was 55.47% in E. coli. Recognition sites for BamHI and XhoI restriction enzymes were added to 5′ and 3′ end of the optimized gene, respectively. The adapted codon sequence was inserted into the pET30a (+) vector using SnapGene software (Fig. 11 ). The COVID-19 outbreak demonstrates a pandemic threat to global public health [47] . The SARS-CoV-2 can be transmitted from person to person, even when the infected individual has no clinical symptoms, and sometimes cannot be detected in the infected individual for several days or even weeks. There are cases where a person has clinical symptoms of COVID-19, but the diagnosis test is negative, so the best way to prevent coronavirus from occurring is to get a vaccine [48] , but unfortunately, because of the novelty of the virus, there is still no licensed vaccine for the disease. However, researchers and pharmacists around the world are working at an unprecedented pace to develop an effective virus vaccine, and Clinical trials of several candidate vaccines have begun, including mRNA-1273, INO-4800, and ChAdOx1 nCoV-19 [49] . Reverse vaccinology technology has been widely used to determine the epitopes for the development of multi-epitope vaccines in various organisms. In recent months, several reverse vaccinology studies have also been conducted to design the COVID-19 vaccine. In a study conducted by Enayatkhani et al. the B-cell and T epitopes of N, ORF3a, and M proteins were predicted, a multi-epitope vaccine candidate was constructed, then were performed bioinformatics evaluations [50] . Peele et al. designed a vaccine candidate using B-cell and T epitopes of S protein by in silico methods [51] . Kalita et al. selected N, M, and S proteins as the target antigen for the prediction of T and B-cell epitopes and designed a multi-epitope vaccine against SARS-CoV-2 [48] . Guglielmo Lucchese predicted S J o u r n a l P r e -p r o o f protein antigenic peptides from SARS-CoV-2 using immunoinformatics methods [52] . Ahmed et al. predicted a collection of B cell and T cell epitopes derived from the S and N proteins of SARS-CoV-2 [53] . In each of the studies mentioned above, different proteins have been used to predict epitopes, also, the type of bioinformatics softwares, Linkers, and many other factors are different in each study, therefore the laboratory evaluations of each of these designed vaccines show different results. This work focussed on the in silico design of a potential multi-epitope vaccine for COVID-19 using S protein. It is important to identify B-cell and T epitopes of the target antigens in order to develop a multi-epitope vaccine. In the current study, we predicted a large number of CTL, IFN-γ-inducing HTL, and B-cell epitopes for S protein then selected 18 high-ranked epitopes based on antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. Linkers play a dual role in the multi-epitope vaccine structure : (i) the prevention of the generation of junctional epitopes (neoepitopes), which is the major concern in designing multi-epitope vaccines, and (ii) promotion of the immune processing and presentation of HLA-II binding epitopes [40, 54] . To design the vaccine construct, the selected T and B-cell epitopes were fused using the KK and GPGPG linkers, respectively. LT-IIc was attached to the N-terminal of the multiepitope sequence by an EAAAK linker. The designed vaccine was evaluated for its physicochemical features, antigenicity, allergenicity, and toxicity .The vaccine's molecular weight was 41.65 kDa, which makes it an acceptable vaccine since proteins with a molecular weight of less than 110 kDa are considered to be more appropriate targets for the development of vaccines, as proteins with a lower molecular weight can be easily purified [29] , The vaccine construct was basic in nature and the total numbers of negatively and positively charged residues were 16 and 48, respectively. The vaccine structure has a half-life of 30, 20, and 10 h in mammalian, yeast, and E. coli, respectively, therefore the vaccine is exposed to the immune system for a relatively long time and causes more immune responses [28] . The recombinant vaccine's instability index was calculated at 22.73, and as it was less than 40, it was considered to be a stable protein [28] . The calculated aliphatic index and the GRAVY were 91.08 and -0.035, respectively. The higher aliphatic index value suggests greater thermostability and the negative GRAVY value shows the vaccine's hydrophilic nature therefore it can show strong interaction with water molecules [55] . The sequences of vaccine (with and without adjuvant) were both antigenic, and adding the adjuvant to the N-terminal of the vaccine sequence increased the vaccine's antigenicity. Results also suggested that the vaccine was soluble, non-allergenic, and non-toxic. To obtain a high-quality 3D structure, the initial 3D structure was subjected to the refinement process. The quality of the initial and refined models were assigned using Ramachandran plot and ProSA z-score. Ramachandran plot data showed that 3.4% of residues of the initial model was located in the outlier region while this value was reduced to 1.9% after the refinement process. In addition, the z-score of primary model was calculated -0.82, while this parameter was determined -1.08 after refining. These results suggest that the three-dimensional structure of the vaccine has improved after refinement. Cluspro server conducted the docking process between the multi-epitope vaccine with HLA-A11:01 and HLA-DRB1_01:01 molecules, the value of lowest energy in the selected complexes of vaccine-HLA-A11: 01 and vaccine-HLA-DRB1_01: 01 were -1165.4 and -1279.1, respectively, which indicated that the interaction of the vaccine with the HLA-DRB1_01:01 to be stronger than its interaction with the HLA-A11:01. The results of the RMSD evaluation showed that The vaccine -HLA-DRB1_01:01 complex reached stability at 20 ns times, while the vaccine -HLA-A11:01 complex required more time to stabilize. The RMSF result suggested that the flexibility in the regions between vaccine-HLA in the both complex and the results of Rg showed that the vaccine structure in both complexes was densified during simulation, but gained more density in the vaccine-HLA-DRB1_01: 01 complex. In practice, some essential factors including CAI and GC content of the gene need to be optimized to achieve an optimal J o u r n a l P r e -p r o o f level protein expression in E. coli. According to the results, the value of the CAI was 1, the CAI value was greater than 0.8, which suggests a higher probability of protein expression at a high level [56] . GC content was 55.47%, and the ideal percentage of GC content ranges from 30 to 70%. The results of bioinformatics evaluation of the vaccine showed that this vaccine candidate may have a high potency against SARS-CoV-2 but the in vitro and in vivo studies are needed to confirm these results. The purpose of this research was to develop a COVID-19 vaccine using bioinformatics approaches. To achieve this aim were predicted T and B-cell epitopes of S protein and then screened for antigenicity, toxicity, allergenicity, and cross-reactivity with human proteomes. The eighteen high-ranked epitopes and LT-IIc were incorporated into the final construct. After evaluating the physicochemical properties of the vaccine construct, its secondary and three-dimensional structure was predicted. Structural quality, binding affinity to MHC-I and II, and stability of the docked complexes were assessed. Finally, the expression level of the vaccine in E. coli strain K12 was evaluated. Significant results were reported in this research, experimental confirmation of the engineered vaccine required to further assess the efficacy of this vaccine candidate. J o u r n a l P r e -p r o o f Clinical features of patients infected with 2019 novel coronavirus in The 2019 novel coronavirus disease (COVID-19) pandemic: A zoonotic prospective SARS-CoV-2 causing pneumoniaassociated respiratory disorder (COVID-19): diagnostic and proposed therapeutic options Host factors in coronavirus replication, Roles of Host Gene and Non-coding RNA Expression in Virus Infection Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus A novel coronavirus from patients with pneumonia in China COVID-19: a new challenge for human beings The genome sequence of the SARS-associated coronavirus Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage Mechanisms of coronavirus cell entry mediated by the viral spike protein Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation Consider TLR5 for new therapeutic development against COVID-19 Identification of an antigenic determinant on the S2 domain of the severe acute respiratory syndrome coronavirus spike glycoprotein capable of inducing neutralizing antibodies History of vaccination Application of pharmacogenomics to vaccines Reverse vaccinology Probable Molecular Mechanism of Remdesivir for the Treatment of COVID-19: Need to Know More LT-IIc, a new member of the type II heat-labile enterotoxin family encoded by an Escherichia coli strain obtained from a nonmammalian host ProPred1: prediction of promiscuous MHC Class-I binding sites ProPred: prediction of HLA-DR binding sites Prediction of continuous B-cell epitopes in an antigen using recurrent neural network VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines Identifying candidate subunit vaccines using an alignmentindependent method based on principal amino acid properties Bioinformatic approach for identifying parasite and fungal candidate subunit vaccines In silico approach for predicting toxicity of peptides and proteins DNA and peptide sequences and chemical processes multivariately modelled by principal component analysis and partial least-squares projections to latent structures A fast peptide match service for UniProt knowledgebase Protein identification and analysis tools on the ExPASy server Exoproteome and secretome derived broad spectrum novel drug and vaccine candidates in Vibrio cholerae targeted by Piper betel derived compounds Isoelectric point separations of peptides and proteins How are substrates recognized by the ubiquitin-mediated proteolytic system Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence Thermostability and aliphatic index of globular proteins A simple method for displaying the hydropathic character of a protein Development of multi-epitope driven subunit vaccine against Fasciola gigantica using immunoinformatics approach SCRATCH: a protein structure and structural feature prediction server High-throughput prediction of protein antigenicity using protein microarray data GOR secondary structure prediction method version IV MESHI: a new library of Java classes for molecular modeling Designing an efficient multiepitope peptide vaccine against Vibrio cholerae via combined immunoinformatics and protein interaction based approaches ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins The ClusPro web server for protein-protein docking New additions to the C lus P ro server motivated by CAPRI How good is automated protein docking? GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers SARS-CoV-2: an emerging coronavirus that causes a global threat Design of a peptide-based subunit vaccine against novel coronavirus SARS-CoV-2 A review of SARS-CoV-2 and the ongoing clinical trials Reverse vaccinology approach to design a novel multi-epitope vaccine candidate against COVID-19: an in silico study Design of multi-epitope vaccine candidate against SARS-CoV-2: a in-silico study Epitopes for a 2019-nCoV vaccine Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies A novel design of a multi-antigenic, multistage and multi-epitope vaccine against Helicobacter pylori: an in silico approach Exploring dengue genome to construct a multi-epitope based subunit vaccine by utilizing immunoinformatics approach to battle against dengue infection Codon adaptation index as a measure of dominating codon bias We would like to thank Dr. Shahram Tahmasebian and Dr. Sorayya Ghasemi for their valuable feedback and suggestions. The Authors have declared no competing interest. None Table 1 . Results of CTL epitope prediction. Four CTL epitopes were selected (labeled with *) for incorporation in the proposed vaccine amino acid sequence.